SWI-Prolog has a lot of git submodules. Most of them live under packages/ and are, predictably, the standard packages that come with most SWI-Prolog installations. In addition, there’s ‘bench’, which has benchmark code, and ‘debian’ (distro-debian), which contains Debian packaging.
There are some downsides to git submodules. The separation introduces some extra maintenance burden, since every package change requires a submodule update. It also requires more involved build and contribution instructions, and it makes the tarballs that GitHub automatically generates from tags unbuildable.
Despite the separation of the codebase into all these submodules, most work on this set of repositories is done by Jan, meaning development of all these submodules is pretty much just integrated with development of core SWI-Prolog itself. So why isn’t this development just taking place in a single repository?
Jan has been successfully developing and maintaining SWI-Prolog with this setup for years. @jan, since this is mostly your burden, I assume this setup is actually making your work easier, and you’re willingly paying the price of slightly more git fiddling because you are getting some benefit out of it.
What is that benefit? What am I missing here?
…
With my question out of the way, let me spend some time ranting about Nix builds, for some additional background to my question. There’s no reason to read this, I just had to write it.
In Nix, builds are considered functions. The idea is that given the same inputs and enough sandboxing, a build process should produce the same outputs. This assumption allows Nix to efficiently cache builds. It also allows Nix to generate so-called closures, minimal sets of packages that are needed to run a program, and which can then be deployed on any system (provided it has the same architecture) without having to worry about missing dependencies (or worse, having it load incompatible dependencies that then spew out seemingly unreproducible errors).
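To make the "builds are functions" idea a bit more concrete, here is a toy Python model of a build cache keyed by its inputs. This is only a sketch of the principle, not how Nix is actually implemented; the input names and the placeholder hash strings are made up for illustration.

```python
import hashlib
import json

_store = {}       # toy stand-in for the Nix store: input key -> build output
build_count = 0   # how many times the "expensive" build step actually ran


def input_key(inputs: dict) -> str:
    """Derive a stable key from all build inputs (sources, compiler, flags)."""
    canonical = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def build(inputs: dict) -> str:
    """Return the output for these inputs, building only on a cache miss."""
    global build_count
    key = input_key(inputs)
    if key not in _store:
        build_count += 1
        _store[key] = f"output-{key[:12]}"  # pretend this was an expensive compile
    return _store[key]


# Hypothetical inputs; the hash strings are placeholders, not real hashes.
inputs = {"src": "sha256-placeholder-a", "compiler": "gcc-13", "flags": "-O2"}
first = build(inputs)
second = build(dict(inputs))                              # same inputs: cache hit
third = build({**inputs, "src": "sha256-placeholder-b"})  # changed source: rebuild
print(build_count)  # 2
```

The same inputs always map to the same key, so the second call never rebuilds. Change any input and you get a different key and a fresh build.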
For all these assumptions to work, it is very, very important that Nix is able to validate that source inputs do not change. Therefore, Nix packages, when they need to download some source code, also need to specify a hash. This way, Nix can verify that the download corresponds with what the original packager thought the files were, and inductively, that anything built from such sources produces deterministic outputs.
For source tarballs this is super easy. Nix can just download them and run sha256sum or something similar on them. For git repositories, it’s slightly more difficult, as Nix really just wants to care about the source code, not the commit history or any other git-specific thing. So before calculating any checksum, Nix first has to remove the .git subdirectory. When there are submodules, Nix has to do a little dance of recursively checking them out in the right places, then removing all those .git subdirectories, to finally get a complete source tree which can be hashed.
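As a rough illustration of hashing a checkout while ignoring git metadata, here is a small Python sketch. It is not what Nix really does: Nix hashes its own archive serialization of the tree, which also records things like the executable bit, and this sketch ignores permissions and symlinks entirely. The `make_checkout` helper and its directory layout are hypothetical, just there to have something to hash.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def tree_hash(root: Path) -> str:
    """Hash relative paths and file contents under root, skipping every .git entry."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git directories in place so os.walk never descends into them,
        # and sort so the traversal order is deterministic.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            if name == ".git":
                # Inside a submodule checkout, .git is usually a file, not a directory.
                continue
            path = Path(dirpath) / name
            data = path.read_bytes()
            rel = path.relative_to(root).as_posix().encode("utf-8")
            # Length-prefix each field so path and content cannot blur together.
            digest.update(len(rel).to_bytes(8, "big") + rel)
            digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()


def make_checkout(root: Path, with_git_metadata: bool) -> None:
    """Create a tiny fake checkout with one submodule-like directory (hypothetical layout)."""
    (root / "packages" / "demo").mkdir(parents=True)
    (root / "README").write_text("core\n")
    (root / "packages" / "demo" / "demo.pl").write_text("hello.\n")
    if with_git_metadata:
        (root / ".git").mkdir()
        (root / ".git" / "HEAD").write_text("ref: refs/heads/master\n")
        (root / "packages" / "demo" / ".git").write_text("gitdir: ../../.git/modules/demo\n")


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    make_checkout(Path(a), with_git_metadata=True)
    make_checkout(Path(b), with_git_metadata=False)
    same = tree_hash(Path(a)) == tree_hash(Path(b))
print(same)  # True
```

The checkout with git metadata and the one without hash to the same value, which is the whole point of stripping .git before hashing.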
So far so good. SWI-Prolog packaging works just fine today in nixpkgs using a git checkout with submodules.
There is just one little annoyance. Since building a Nix package requires us to know up front what all inputs look like, you can’t just build a Nix package with an updated source and have it work first try. Instead, the general procedure most people take is to
- Update the package definition to point at the updated version and clear the expected hash.
- Trigger a build, which will fail, because the hash is not set. It will tell us what the hash actually should have been.
- Update the package definition with the hash copied out of the failed build.
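The three steps above can be modelled in a few lines of Python. This is a toy simulation under my own assumptions, not real Nix code and not how swipl-nix works internally: `fetch_checked`, `bump_version`, `fake_download`, and the version strings are all made-up names for illustration.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when fetched source does not match the hash in the package definition."""

    def __init__(self, expected: str, actual: str):
        super().__init__(f"hash mismatch: specified {expected!r}, got {actual!r}")
        self.expected = expected
        self.actual = actual


def fetch_checked(source: bytes, expected_hash: str) -> bytes:
    """Accept the source only if its sha256 matches the expected hash."""
    actual = hashlib.sha256(source).hexdigest()
    if actual != expected_hash:
        raise HashMismatch(expected_hash, actual)
    return source


def bump_version(package: dict, new_version: str, download) -> dict:
    # Step 1: point at the new version and clear the expected hash.
    candidate = {**package, "version": new_version, "hash": ""}
    try:
        # Step 2: this "build" is expected to fail, since the hash is empty.
        fetch_checked(download(candidate["version"]), candidate["hash"])
    except HashMismatch as err:
        # Step 3: copy the real hash out of the failure.
        candidate["hash"] = err.actual
    # A second attempt now passes verification.
    fetch_checked(download(candidate["version"]), candidate["hash"])
    return candidate


def fake_download(version: str) -> bytes:
    """Hypothetical stand-in for downloading a source tree."""
    return f"swipl sources for {version}".encode("utf-8")


old = {
    "name": "swi-prolog",
    "version": "1.0",  # placeholder version, not a real release
    "hash": hashlib.sha256(fake_download("1.0")).hexdigest(),
}
new = bump_version(old, "1.1", fake_download)
print(new["version"], new["hash"] != old["hash"])  # 1.1 True
```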
Not exactly user-friendly! While this works fine for the occasional source import into nixpkgs, if you need to regularly build different versions of the source code, this gets annoying.
This is why I wrote swipl-nix, to automate that hash-calculating for as many versions of SWI-Prolog as I can. But actually, there is a much better solution, though submodules make it less ideal than it could be.
While most Nix packages are built by third parties, there’s nothing stopping a package from providing its own packaging, and checking it into the same repository as the code which it packages. In that case, there’s no need for a source import. Or more precisely, if you already have some mechanism in place that fetches the nix code, you’d get the source code with it for free.
In the wonderful world of Nix, there are various programs that help out with this, but the most popular workflow at the moment is flakes. A flake is a bit of nix code, usually in a git repository, which acts as a big wrapper around nix dependencies. You give it a bunch of imprecise inputs, such as other git repositories you want to get the latest version from, and you define a function which turns this into outputs, usually a package. The flake subsystem does all the required work of fetching those inputs, calculating the hashes, and writing a lockfile. This lockfile is then used as a deterministic input.
Long story short, if we could just make SWI-Prolog a flake, it could effectively package itself. People could get the latest development version of swipl just by having their flake point at github:SWI-Prolog/swipl-devel, and the flake subsystem would do all the required boilerplate hash calculating work. As an extra bonus, this would also let you do nix run github:SWI-Prolog/swipl-devel and immediately get the latest dev build, or nix run github:SWI-Prolog/swipl-devel/feature-branch to check out a potential bugfix on feature-branch.
But flakes do not work very well with submodules. It is not impossible to use them, but it was clearly an afterthought in the whole flake design process, and it requires the user to know submodules are in use and modify their commands and code accordingly, thereby breaking the abstraction a little bit. Basically, to pull in a dependency that needs its submodules, the importing flake (or user, when doing a nix run or similar) has to provide an extra flag to also fetch all submodules. For example, we’d have to do nix run 'github:SWI-Prolog/swipl-devel?submodules=1' to run the latest swipl. [edit: turns out this actually does not work for remote git repositories, only for local ones that are already checked out and have all their submodules initialized. There appears to be no way to just nix run a remote git repo that needs submodules.]
It’s not the end of the world. But it is mildly annoying, and that is almost as bad.
For me, things would be much nicer if everything lived directly in the swipl-devel repository. But since probably only a handful of people worldwide care about the niche intersection of Nix and SWI-Prolog, I understand if my concerns aren’t very important here.
But if the submodule situation is more of a historical accident with no clear benefits today, I’d be very willing to help out with a migration to a source tree without submodules, as I think it would be beneficial regardless.