SWI-Prolog has 71 Git repositories and 4 maintainers. Spreading so few people over so many repositories creates difficulties for maintainers, contributors, and would-be contributors alike. Fortunately, there is a reasonably easy solution.
We can combine all those repositories into one (or some small number) using git-subtree which preserves history. For the issue tracker, we can make a label for each package. Such consolidation will simplify maintenance and reduce the barrier to people who want to contribute.
How it will simplify maintenance: We would only need to make one commit to update the CMake version, instead of tens of commits spread across tens of repositories.
How it will simplify issue reporting: Now people don’t have to find out which repository to file the issues in. They open an issue in swipl-devel, and a maintainer attaches a suitable label that indicates the package.
Interesting idea. I kind of got used to submodules. The model might seem a bit complicated, but it is easy to understand, and once you get it, it is quite obvious what to do. The number of modules is also not that bad: just 37 make up SWI-Prolog. The rest is stuff that is not (or no longer) related to the core system.
That said, the submodule system surely leads to confusion, some of which you already mention. I’ll do some reading and see which problems subtree solves and which it creates. Might take some time though, but there is no hurry.
Right now the repository is not contributor friendly, at least if you need to contribute to one of the packages/submodules (which is the most common case). It took me almost an hour of googling to find the solution.
The reason it is easy for Jan is because he is the owner of the repo, and this problem doesn’t show up for owners of the repo, and also because he always has the proper parent directory structure.
I think Erik’s idea has merit, look at the docs from git:
Unlike submodules, subtrees do not need any special constructions (like .gitmodules
files or gitlinks) be present in your repository, and do not force end-users of your
repository to do anything special or to understand how subtrees work. A subtree is
just a subdirectory that can be committed to, branched, and merged along with your
project in any way you want.
It will solve a lot of the module problems experienced by contributors. It will be painful for Jan in the beginning, but I think it is worth it to expand the contributions.
I’ll surely read into it. As is, the simple way is to clone the whole thing, init all submodules. That allows you to build the system. To make a PR for a module, fork the module on github, use git remote add to make your fork accessible from the cloned submodule. Then do your edit work and push to your added remote.
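The module-level workflow described above could be sketched roughly as follows. This is only an illustration: the helper name `push_fix_to_fork`, the remote name `myfork`, and the branch name are placeholders that do not come from the thread, and it assumes you have already cloned the main repository, initialised the submodules, forked the module on GitHub, and committed your change inside the submodule directory.

```shell
#!/bin/sh
# Sketch: make a personal fork of ONE submodule reachable from its checkout
# inside an already-cloned swipl-devel tree, then push the current commit
# to that fork so a pull request can be opened from it.
# Assumptions (not from the thread): the remote name "myfork", the helper
# name, and the branch name are placeholders.

# push_fix_to_fork <submodule-dir> <fork-url> <branch>
push_fix_to_fork() {
    dir=$1
    fork_url=$2
    branch=$3
    (
        cd "$dir" || exit 1
        # Register the fork as an additional remote of the submodule checkout.
        git remote add myfork "$fork_url" || exit 1
        # Publish the current commit as <branch> on the fork.
        git push myfork "HEAD:refs/heads/$branch"
    )
}

# Typical use, after editing and committing inside packages/ssl
# (replace <you> with your GitHub account):
#   push_fix_to_fork packages/ssl https://github.com/<you>/packages-ssl.git doc-fix
```

Pushing `HEAD` to an explicit branch name avoids depending on which branch, if any, the submodule checkout happens to be on.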
First thing is to understand what exactly git subtree does, what gets easier and what gets harder.
The typical casual contributor just wants to make a quick patch to the documentation, or fix some small bug with a one-line change, etc.
I’ll show you what happens to that contributor who has forked swipl-devel in his github user account:
$ git clone https://github.com/someuser/swipl-devel
Cloning into 'swipl-devel'...
remote: Enumerating objects: 184191, done.
remote: Total 184191 (delta 0), reused 0 (delta 0), pack-reused 184191
Receiving objects: 100% (184191/184191), 80.62 MiB | 3.26 MiB/s, done.
Resolving deltas: 100% (147549/147549), done.
$ cd swipl-devel
$ git submodule update --init
Submodule 'bench' (https://github.com/erlanger/bench.git) registered for path 'bench'
Submodule 'debian' (https://github.com/erlanger/distro-debian.git) registered for path 'debian'
Submodule 'packages/PDT' (https://github.com/erlanger/packages-PDT.git)
[....submodule registration...]
Submodule 'packages/zlib' (https://github.com/erlanger/packages-zlib.git) registered for path 'packages/zlib'
Cloning into '/tmp/swipl-devel/bench'...
Username for 'https://github.com': <<<--------- LOOK HERE
Uhh? It is asking for the user name? The user who just wants to make a one-line change will simply say: “Why is it asking me for the user name? This is too hard, I’ll do it sometime later.” The end result: we’ll never get the contribution.
The more persistent user will start googling around, and figure out that it is asking for the user name because of the way .gitmodules is set up. An hour later he will figure out that he has to change .gitmodules the way it is described in the PR I showed above. This is why Travis can’t build SWI-Prolog without the patch in the PR I mentioned.
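One way to see where the prompt comes from, assuming .gitmodules uses relative URLs such as `../bench.git` (the transcript above suggests this, but the file itself is not quoted in this thread): Git resolves a relative submodule URL against the superproject's origin, so in a clone of a fork the submodules point at repositories under the fork owner's account. `resolve_relative_url` below is a hypothetical helper that mimics that resolution in simplified form; it is not part of Git.

```shell
#!/bin/sh
# Simplified model of how Git resolves a relative submodule URL
# ("../name.git") against the superproject's origin URL.
# resolve_relative_url is a hypothetical helper for illustration only;
# real Git handles more cases (scp-style URLs, local paths, and so on).

# resolve_relative_url <origin-url> <relative-url>
resolve_relative_url() {
    base=${1%/}          # drop a trailing slash, if any
    rel=$2
    # Each leading "../" removes one trailing path component from the base.
    while [ "${rel#../}" != "$rel" ]; do
        rel=${rel#../}
        base=${base%/*}
    done
    rel=${rel#./}        # a leading "./" keeps the base as it is
    printf '%s/%s\n' "$base" "$rel"
}

# Cloned from upstream: the submodule URL stays inside the upstream account.
resolve_relative_url https://github.com/SWI-Prolog/swipl-devel.git ../bench.git
# Cloned from a fork: the URL lands in the fork owner's account instead.
resolve_relative_url https://github.com/someuser/swipl-devel ../bench.git
```

The first call prints `https://github.com/SWI-Prolog/bench.git` and the second prints `https://github.com/someuser/bench.git`. If `someuser` never forked `bench`, the second repository does not exist, which would explain why GitHub responds with a credentials prompt.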
The reason why Jan has never experienced this is because he is the owner of the repo.
Jan, you would see the above if you fired up a VM, forked swipl-devel from a new GitHub account, and tried to make a one-line patch as if you were not the author of the project.
One problem: Before we merge, we have to make sure that there is no code left behind in the branches of the submodules. For example, the packages-ssl repository has several branches (base64_newline, cmake, etc.). We have to merge them all into master if we don’t want to lose the changes in those branches.
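A possible way to check for left-behind work, sketched under the assumption that each package's mainline is `origin/master` in a fresh clone: list the remote-tracking branches that contain commits not yet reachable from the mainline. The helper name `list_unmerged_branches` is made up for this example.

```shell
#!/bin/sh
# Sketch: show remote branches of a package repository whose commits are
# not yet contained in the mainline, i.e. branches whose work would be
# lost if only master were imported.
# Assumption: the mainline is called "master" unless told otherwise.

# list_unmerged_branches <repo-dir> [mainline-branch]
list_unmerged_branches() {
    git -C "$1" branch -r --no-merged "origin/${2:-master}"
}

# Typical use, from a directory holding clones of the package repositories:
#   for repo in packages-*; do
#       echo "== $repo"
#       list_unmerged_branches "$repo"
#   done
```

An empty listing for a repository means every remote branch is already contained in its mainline. Anything that is listed still needs a decision from someone who knows the code: merge it or discard it.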
One way to simplify development (which I use myself) is to not use branches:
Do everything in the master branch.
The master branch must always work.
It does not have to be Jan who merges all the repositories.
As an example, I have merged packages-ssl:master into my fork of swipl-devel:master. The difference from the original is: Everyone who clones this repository also immediately gets the contents of packages-ssl:e9d0a9e in the swipl-devel/packages/ssl directory. That is, vanilla git-clone works as expected. Then, we can delete the packages-ssl GitHub repository.
I can easily merge the other packages (it’s just git subtree add -P packages/<name> <commit>), but only Jan can verify that all branches of the child repositories have been correctly merged into their respective master branches, or discarded if that code is unwanted.
One downside of subtree: It may slow down the repository if there are too many files. In my experience, with a 100,000-file repository, git rebase is unbearably slow.
After a bit of reading I’m not convinced git subtree is worth the trouble. It all feels a little like “I think (Prolog) modules are too complicated, put everything in a single file”. Enough people program that way anyway.
Submodules have had their value in the past when several of the modules were practically managed by other people. At the moment all package modules are practically in maintenance stage and this doesn’t matter too much, but I still like to be able to do so. Submodules were also intended to be shared with other Prolog systems. That too isn’t active right now, but work is going on between XSB and SWI, so who knows? I’m a big fan of branching and rebasing and the warnings do not make me very happy (we have about 45,000 files).
Git is not a distributed file system. I rather like, if I recall correctly, Linus Torvalds’s view that a software system is a set of patches. So for now, I think we should educate people on how to contribute in a comfortable way.
Hmm. I always do a git pull on the main repo and that also fetches the submodules. Possibly because devel is also a local branch for me? In fact the only submodule you probably do not want is debian, as it is only used to build the Ubuntu PPAs.
Agree. I did not foresee those use cases. Let’s stick to submodules and forget about subtrees for now. Submodules are not too hard.
We can help future contributors avoid swi’s problem by putting a prominent note in the contributor guide: clone before fork, and do not fork before clone. It turns out that this instruction is already in unix.html, but it is two clicks away from SubmitPatch.html. That is, it exists, but it is hard to find.
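As a rough sketch of what “clone before fork” could mean in practice: clone from the upstream repository first, so that `origin` and the submodule URLs derived from it refer to upstream, and only afterwards add your fork as a second remote. The thread does not quote the exact instructions in unix.html, so treat this as one interpretation; the helper name `clone_then_add_fork` and the remote name `myfork` are placeholders.

```shell
#!/bin/sh
# Sketch of "clone before fork": origin stays the upstream repository, so
# submodules are fetched from upstream; the personal fork is added later
# as an extra remote and is used only for pushing branches.
# Assumptions: the helper name and the remote name "myfork" are
# placeholders, not something the thread prescribes.

# clone_then_add_fork <upstream-url> <fork-url> [target-dir]
clone_then_add_fork() {
    upstream=$1
    fork=$2
    dir=${3:-swipl-devel}
    git clone "$upstream" "$dir" &&
    git -C "$dir" submodule update --init &&
    git -C "$dir" remote add myfork "$fork"
}

# Typical use (replace someuser with your GitHub account):
#   clone_then_add_fork https://github.com/SWI-Prolog/swipl-devel.git \
#       https://github.com/someuser/swipl-devel.git
```

With this order, `git submodule update --init` never needs the fork's account to contain copies of the submodule repositories.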
It turns out that this is not the first time Jan has written the instructions. He wrote them once in 2015 in Google Groups. Thus Jan has written them at least three times: once on the website, once in Google Groups, and once in Discourse.
Thus, I think we have found the real problem: the newcomers cannot find the instructions, because the instructions are three clicks away from the home page.
I added a “clone before fork” note to SubmitPatch.html (it may take an hour for the CDN to update). That saves one click. Still, people tend not to read these things. There is already a link from COMMUNITY.
So, I guess the question becomes "what is a good place for people to find this info"?
That question is insightful.
People tend not to read these things because they are not cloning when they are on that page. The information should be where they are when they are cloning: the swipl-devel GitHub repository. The information, the person, and the task must be near each other in space and time. Ideally, the information is presented right where people need it, when they need it.
The question becomes “Where are they when they need that information?”
The answer: They probably are at swipl-devel at GitHub, after searching for “swi prolog source code” in Google. (I may be wrong. You may have a more accurate answer from the website statistics.)
Thus, I think the best place for that information is the README.md file in swipl-devel, because people will be looking at that when they are cloning. The readme is as close to the “Fork” and “Clone” buttons as GitHub allows. The readme is the only place that is zero clicks away from where people are when they are cloning.
Also, we can assume that people want to build the source right after they clone it, so the information about building should be placed right after the information about cloning. Then, they will want to install it, run it, learn about it, play with it, write big programs in it, contribute to it, and so on. Thus the sequence of information in README.md should follow that most likely sequence of tasks done by a new contributor.