Improving contributor guide discoverability (was: Consolidating the 71 GitHub repositories to simplify maintenance and contribution)

edom · April 5, 2019, 2:54pm

Dear SWI-Prolog maintainers,

SWI-Prolog has 71 Git repositories and 4 maintainers. Spreading the code this thinly creates difficulties for maintainers, contributors, and would-be contributors alike. Fortunately, there is a reasonably easy solution.

We can combine all those repositories into one (or a small number) using git-subtree, which preserves history. For the issue tracker, we can make a label for each package. Such consolidation will simplify maintenance and lower the barrier for people who want to contribute.

How it will simplify maintenance: We would need only one commit to update the CMake version, instead of tens of commits spread over tens of repositories.

How it will simplify issue reporting: People would no longer have to find out which repository to file an issue in. They open an issue in swipl-devel, and a maintainer attaches a suitable label that indicates the package.

What do you think?

Best regards,

Erik

jan · April 5, 2019, 3:22pm

Interesting idea. I kind of got used to submodules. The model might seem a bit complicated, but it is easy to understand, and once you get it, it is quite clear what to do. The number of modules is also not that bad: just 37 make up SWI-Prolog. The rest is stuff that is not (or no longer) related to the core system.

That said, the submodule system surely leads to confusion, some of which you already mention. I’ll do some reading and see which problems git subtree solves and which it creates. Might take some time though, but there is no hurry.

Thanks for the tip — Jan

swi · April 5, 2019, 5:16pm

Right now the repository is not contributor friendly if you need to contribute to one of the packages/submodules (which is the most common case). It took me almost an hour of googling to find the solution.

.gitmodules needs to be changed as done in this PR: https://github.com/SWI-Prolog/swipl-devel/pull/356 . The .gitmodules modification is meant for travis, but it works for any user.

The reason it is easy for Jan is because he is the owner of the repo, and this problem doesn’t show up for owners of the repo, and also because he always has the proper parent directory structure.

The problem only shows up for contributors.

swi · April 5, 2019, 5:37pm

In particular, I mean this line from the PR referenced above:

if [ "$TRAVIS_OS_NAME" == "linux" ] ; then sed -i 's~url = ..~url = https://github.com/SWI-Prolog~' .gitmodules; fi
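To make the effect of that substitution concrete, here is a minimal sketch that applies it to a made-up .gitmodules fragment. The two entries and the temporary file are assumptions for illustration, not a copy of the real file. Unlike the PR line, the dots are escaped here so that they match only a literal `..`, and the result is written to standard output rather than edited in place.

```shell
# Hypothetical .gitmodules fragment with relative URLs (not the real file).
gitmodules=$(mktemp)
cat > "$gitmodules" <<'EOF'
[submodule "bench"]
    path = bench
    url = ../bench.git
[submodule "packages/zlib"]
    path = packages/zlib
    url = ../packages-zlib.git
EOF

# Replace the leading ".." of each URL with the upstream organisation URL.
# The dots are escaped so they match a literal "..", not any two characters.
rewritten=$(sed 's~url = \.\.~url = https://github.com/SWI-Prolog~' "$gitmodules")
printf '%s\n' "$rewritten"
rm -f "$gitmodules"
```

After the rewrite every submodule URL is absolute, so it no longer depends on where the main repository was cloned from.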

swi · April 5, 2019, 5:47pm

I think Erik’s idea has merit, look at the docs from git:

Unlike submodules, subtrees do not need any special constructions (like .gitmodules
files or gitlinks) be present in your repository, and do not force end-users of your
repository to do anything special or to understand how subtrees work. A subtree is
just a subdirectory that can be committed to, branched, and merged along with your
project in any way you want.

It will solve a lot of the module problems experienced by contributors, but it will be painful for Jan in the beginning.

Still, I think it is worth it to get more contributions.

jan · April 5, 2019, 5:54pm

I’ll surely read into it. As is, the simple way is to clone the whole thing and init all submodules. That allows you to build the system. To make a PR for a module, fork the module on GitHub and use git remote add to make your fork accessible from the cloned submodule. Then do your edit work and push to your added remote.
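The remote-based part of this workflow can be tried without touching GitHub. The sketch below is a local simulation under stated assumptions: two bare repositories in a temporary directory stand in for the upstream module and the contributor's fork, and the names `myfork` and `my-fix` are hypothetical. Creating a branch before committing is an addition that the post does not mention.

```shell
set -e
work=$(mktemp -d)
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# Local stand-ins for the upstream module repository and its fork.
git init -q "$work/seed"
echo "initial" > "$work/seed/README"
git -C "$work/seed" add README
git -C "$work/seed" commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/upstream.git"
git clone -q --bare "$work/seed" "$work/fork.git"

# Stand-in for the submodule checkout inside a cloned swipl-devel.
git clone -q "$work/upstream.git" "$work/module"
cd "$work/module"

# Make the fork reachable, do the edit on a branch, push to the fork.
git remote add myfork "$work/fork.git"
git checkout -q -b my-fix
echo "one-line fix" >> README
git commit -q -a -m "Fix typo in README"
git push -q myfork my-fix

# The branch now exists in the fork only; upstream is untouched.
fork_branches=$(git --git-dir="$work/fork.git" branch --list my-fix)
upstream_branches=$(git --git-dir="$work/upstream.git" branch --list my-fix)
echo "fork: ${fork_branches:-none} upstream: ${upstream_branches:-none}"
```

With a real fork, the path given to `git remote add` would be the URL of the forked module on GitHub, and the pushed branch would be the source of the pull request.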

First thing is to understand what exactly git subtree does, what gets easier and what gets harder.

swi · April 5, 2019, 6:24pm

The normal quick contributor just would like to do a quick patch of the documentation, or some small bug with a one-line change, etc.

I’ll show you what happens to that contributor who has forked swipl-devel in his github user account:

$ git clone https://github.com/someuser/swipl-devel
Cloning into 'swipl-devel'...
remote: Enumerating objects: 184191, done.
remote: Total 184191 (delta 0), reused 0 (delta 0), pack-reused 184191
Receiving objects: 100% (184191/184191), 80.62 MiB | 3.26 MiB/s, done.
Resolving deltas: 100% (147549/147549), done.
$ cd swipl-devel
$ git submodule update --init
Submodule 'bench' (https://github.com/erlanger/bench.git) registered for path 'bench'
Submodule 'debian' (https://github.com/erlanger/distro-debian.git) registered for path 'debian'
Submodule 'packages/PDT' (https://github.com/erlanger/packages-PDT.git) 
[....submodule registration...]
Submodule 'packages/zlib' (https://github.com/erlanger/packages-zlib.git) registered for path 'packages/zlib'
Cloning into '/tmp/swipl-devel/bench'...
Username for 'https://github.com':     <<<--------- LOOK HERE

Uhh? It is asking for the user name? The user who just wants to make a one-line change will simply say: “Why is it asking me for the user name? This is too hard, I’ll do it sometime later.” The end result: we never get the contribution.

The more persistent user will start googling around and figure out that it is asking for the user name because of the way .gitmodules is set up. An hour later, he will figure out that he has to change .gitmodules in the way described in the PR I showed above. This is why travis can’t build SWI-Prolog without the patch in the PR I mentioned.
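The transcript above suggests that the submodule URLs are relative, so git derives them from the URL the main repository was cloned from. The helper below is a simplified sketch of that resolution, not git's actual implementation: `resolve_submodule_url` is a hypothetical name, it handles only a single leading `../`, and `../bench.git` is an assumed entry.

```shell
# Sketch of how a relative submodule URL is combined with the URL the
# superproject was cloned from. Handles only a single leading "../".
resolve_submodule_url() {
    origin=${1%/}          # superproject URL without a trailing slash
    relative=$2            # e.g. ../bench.git
    case $relative in
        ../*) printf '%s/%s\n' "${origin%/*}" "${relative#../}" ;;
        *)    printf '%s\n' "$relative" ;;
    esac
}

# Cloning upstream resolves to the upstream organisation:
resolve_submodule_url https://github.com/SWI-Prolog/swipl-devel ../bench.git
# Cloning a fork resolves to the forking account instead
# (someuser is the placeholder account from the transcript):
resolve_submodule_url https://github.com/someuser/swipl-devel ../bench.git
```

In the second case the resolved repository most likely does not exist under the contributor's account, which would explain the credential prompt.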

The reason why Jan has never experienced this is because he is the owner of the repo.

Jan, you would see the above if you fired up a VM, forked swipl-devel from a new GitHub account, and tried to make a one-line patch as if you were not the author of the project.

edom · April 5, 2019, 6:39pm

One problem: Before we merge, we have to make sure that no code is left behind in the branches of the submodules. For example, the packages-ssl repository has several branches (base64_newline, cmake, etc.). We have to merge them all into master if we don’t want to lose the changes in those branches.

One way to simplify development (which I use myself) is to not use branches:

Do everything in the master branch.
The master branch must always work.

It does not have to be Jan who merges all the repositories.
As an example, I have merged packages-ssl:master into my fork of swipl-devel:master. The difference from the original is: Everyone who clones this repository also immediately gets the contents of packages-ssl:e9d0a9e in the swipl-devel/packages/ssl directory. That is, vanilla git-clone works as expected. Then, we can delete the packages-ssl GitHub repository.

I can easily merge the other packages (it’s just git subtree add -P packages/<name> <commit>), but only Jan can verify that all branches of the child repositories have been correctly merged into their respective master branches, or discarded if that code is unwanted.
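For readers who have not used the command, here is a small local sketch of `git subtree add` in the single-commit form quoted above. Everything in it is a stand-in: the two throwaway repositories, the file `lib.pl`, and the prefix `packages/demo` are assumptions for illustration. It also assumes that git-subtree, which ships in git's contrib directory, is installed.

```shell
set -e
work=$(mktemp -d)
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# Stand-in for a package repository with its own history.
git init -q "$work/pkg"
echo "package code" > "$work/pkg/lib.pl"
git -C "$work/pkg" add lib.pl
git -C "$work/pkg" commit -q -m "Package commit"

# Stand-in for the main repository.
git init -q "$work/main"
cd "$work/main"
echo "main project" > README
git add README
git commit -q -m "Main commit"

# Fetch the package history, then graft it under packages/demo.
git fetch -q "$work/pkg" HEAD
git subtree add -P packages/demo FETCH_HEAD

# The package file is now an ordinary file, and its commit is in the log.
ls packages/demo
git log --oneline
```

The log of the main repository then contains the package's own commit in addition to the merge commit created by the subtree command, which is the sense in which history is preserved.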

One downside of subtree: It may slow down the repository if there are too many files. In my experience, with a 100000-file repository, git rebase is unbearably slow.

swi · April 5, 2019, 6:49pm

An intermediate step, without so much work, is to simply convert modules into subtrees, and leave the merging for later.

Even if we don’t use subtrees, I think .gitmodules has to be fixed if we want to make contribution easy.

jan · April 6, 2019, 7:43am

After a bit of reading I’m not convinced git subtree is worth the trouble. It all feels a little like “I think (Prolog) modules are too complicated, put everything in a single file”. Enough people program that way anyway.

Submodules have had their value in the past when several of the modules were practically managed by other people. At the moment all package modules are practically in the maintenance stage and this doesn’t matter too much, but I would still like to keep that option. Submodules were also intended to be shared with other Prolog systems. That too isn’t active right now, but work is going on between XSB and SWI, so who knows? I’m a big fan of branching and rebasing, and the warnings about slow rebasing in large repositories do not make me very happy (we have about 45,000 files).

Git is not a distributed file system. I am more inclined towards, if I recall correctly, Linus Torvalds’s view that a software system is a set of patches. So for now, I think we should educate people how to contribute in a comfortable way.

edom · April 6, 2019, 8:11am

I ran git submodule update and got this error message:

Fetched in submodule path 'debian', but it did not contain c4718ab1a3fddaa0b2e02d52694a94c641df3b48. Direct fetching of that commit failed.

It only appears when I run git submodule update. When I git clone debian normally, I can git-log that commit, and it does indeed exist.

This page suggests that you may have forgotten to push something to swipl-devel. Is this the case?

(Update: This solved the problem. It seems that git submodule update only fetches the remote master.)

cd debian
git fetch origin devel

jan · April 6, 2019, 1:43pm

Hmm. I always do a git pull on the main repo and that also fetches the submodules. Possibly because devel is also a local branch for me? In fact, the only submodule you probably do not want is debian, as it is only used to build the Ubuntu PPAs.

swi · April 6, 2019, 3:50pm

This makes sense.

edom · April 6, 2019, 4:28pm

Agree. I did not foresee those use cases. Let’s stick to submodules and forget about subtrees for now. Submodules are not too hard.

We can help future contributors avoid swi’s problem by putting a prominent note in the contributor guide: clone before fork, and do not fork before clone. It turns out that this instruction is already in unix.html, but it is two clicks away from SubmitPatch.html. That is, it exists, but it is hard to find.

It turns out that this is not the first time Jan has written the instructions. He wrote them once in 2015 in Google Groups. Thus Jan has written them at least three times: once on the website, once in Google Groups, and once in Discourse.

Thus, I think we have found the real problem: the newcomers cannot find the instructions, because the instructions are three clicks away from the home page.

jan · April 6, 2019, 5:03pm

I added a “clone before fork” note to SubmitPatch.html (may take an hour for the CDN to update). That saves one click. Still, people tend not to read these things. There is already a link from COMMUNITY.

So, I guess the question becomes "what is a good place for people to find this info"?

edom · April 6, 2019, 9:40pm

[…] people tend not to read these things […]

So, I guess the question becomes "what is a good place for people to find this info"?

That question is insightful.

People tend not to read these things because they are not cloning when they are at that page. The information should be where they are when they are cloning: the swipl-devel GitHub repository. The information, the person, and the task must be near to each other in space and time. Ideally, the information is presented right where people need it when they need it.

The question becomes “Where are they when they need that information?”

The answer: They probably are at swipl-devel at GitHub, after searching for “swi prolog source code” in Google. (I may be wrong. You may have a more accurate answer from the website statistics.)

Thus, I think the best place for that information is the README.md file in swipl-devel, because people will be looking at that when they are cloning. The readme is as close to the “Fork” and “Clone” buttons as GitHub allows. The readme is the only place that is zero clicks away from where people are when they are cloning.

Also, we can assume that people want to build the source right after they clone it, so the information about building should be placed right after the information about cloning. Then, they will want to install it, run it, learn about it, play with it, write big programs in it, contribute to it, and so on. Thus the sequence of information in README.md should follow that most likely sequence of tasks done by a new contributor.

jan · April 7, 2019, 8:02am

Makes sense. See https://github.com/SWI-Prolog/swipl-devel. Please suggest improvements if you think this can be done better.

edom · April 7, 2019, 9:43am

I made a draft pull request. Let’s continue the discussion there.

swi · April 7, 2019, 2:41pm

The new README.md solves this problem much better, and I think it will help a lot, especially because the instructions are at the top of the README.
