SWI-Prolog feature sizes

Hi all,

I wrote a script to compare the full size of different feature combinations for SWI-Prolog. By full size I mean the stored size of the SWI-Prolog binary, library and its runtime files, along with all dependencies needed to support that particular feature set.

It’s not a very clever script. I just run a lot of different nix builds of the swipl-devel master branch, then ask nix about the total dependency closure size. You can find it in the swipl-nix repository (sizes.nix), and run it with nix run github:matko/swipl-nix#sizes, though I don’t recommend doing so unless you’re prepared to simultaneously build and store 12 different versions of SWI-Prolog.
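A minimal sketch of the measuring step, assuming the closure size is read with `nix path-info --closure-size`; the actual sizes.nix script may well do this differently. The helper names `parse_closure_sizes` and `closure_size` and the sample store path are hypothetical.

```python
import subprocess


def parse_closure_sizes(output: str) -> dict[str, int]:
    """Parse lines of the form '<store path> <size in bytes>' into a dict."""
    sizes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or unexpected lines
        sizes[fields[0]] = int(fields[-1])
    return sizes


def closure_size(installable: str) -> dict[str, int]:
    """Ask nix for the closure size of an installable (needs nix on PATH)."""
    result = subprocess.run(
        ["nix", "path-info", "--closure-size", installable],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_closure_sizes(result.stdout)


# Hypothetical sample output; the store path is made up for illustration.
sample = "/nix/store/aaaaaaaa-swi-prolog-minimal\t  85150856\n"
sizes = parse_closure_sizes(sample)
print(sizes)
```

Only the parser runs here; `closure_size` is left uncalled because it needs a working nix installation and a built package.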

I am now running this against an experimental branch where I’m additionally working on cutting down the gperftools dependency to just the tcmalloc library (*). You should be able to get something close to what I got by explicitly referring to the swipl-nix commit I used (929c70e, corresponding to e54300d in swipl-devel). This is on x86_64-linux.

> nix run github:matko/swipl-nix/929c70e#sizes
       PKG         SIZE       MBSIZE       CHANGE
minimal        85150856       81.20M

withYaml       85345360       81.39M        +.18M
withOdbc       86437168       82.43M       +1.22M
withPcre       87183920       83.14M       +1.93M
withDb         89610072       85.45M       +4.25M
withGui       112569312      107.35M      +26.14M
withPython    204178024      194.71M     +113.51M
withJava     1027237752      979.65M     +898.44M

noJava        239292448      228.20M     -880.67M
noPython     1045644416      997.20M     -111.67M
noGui        1150413224     1097.11M      -11.76M

full         1162748240     1108.88M

In this overview, minimal and full are versions of SWI-Prolog with all 7 measured features (yaml, odbc, pcre, db, gui, python, java) disabled and enabled, respectively. Each withX row is minimal with only that one feature enabled. Each noX row is full with only that one feature disabled.
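The MBSIZE and CHANGE columns can be reproduced from the byte counts. The sketch below assumes the values are mebibytes (1024 × 1024 bytes) truncated to two decimals rather than rounded; that is inferred from the numbers in the table, not from the script itself. The function name `to_mib` is hypothetical.

```python
MIB = 1024 * 1024


def to_mib(n_bytes: int) -> str:
    """Format a byte count as MiB with two decimals, truncating toward zero."""
    sign = "-" if n_bytes < 0 else ""
    # Integer arithmetic avoids floating-point rounding surprises.
    hundredths = abs(n_bytes) * 100 // MIB
    return f"{sign}{hundredths // 100}.{hundredths % 100:02d}"


# Byte counts taken from the table above.
minimal = 85150856
full = 1162748240
with_yaml = 85345360
no_java = 239292448

print(to_mib(minimal))              # MBSIZE of minimal
print(to_mib(with_yaml - minimal))  # CHANGE of withYaml, relative to minimal
print(to_mib(no_java - full))       # CHANGE of noJava, relative to full
```

The withX changes are measured against minimal and the noX changes against full. The table prints small values without a leading zero (+.18M), while this sketch prints 0.18.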

The reason for providing both a ‘withX’ and ‘noX’ for some packages is that there’s dependency reuse between features. The size saved in a noX row is therefore smaller than the size added in the corresponding withX row.
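A small illustration of why reuse makes the two numbers differ. The dependency names and sizes below are made up for the example and are not measurements of SWI-Prolog.

```python
# Hypothetical dependency sizes (in MB), for illustration only.
SIZES = {"core": 80, "libx": 12, "libshared": 14, "libother": 5}

# Hypothetical features; both pull in "libshared".
FEATURE_DEPS = {
    "gui": {"libx", "libshared"},
    "other": {"libshared", "libother"},
}


def closure_size(features: set[str]) -> int:
    """Total size of the core plus the union of all feature dependencies."""
    deps = {"core"}
    for feature in features:
        deps |= FEATURE_DEPS[feature]
    return sum(SIZES[dep] for dep in deps)


minimal = closure_size(set())
full = closure_size({"gui", "other"})

with_gui_delta = closure_size({"gui"}) - minimal  # everything gui needs
no_gui_delta = full - closure_size({"other"})     # only what gui alone needs

print(with_gui_delta, no_gui_delta)
```

Adding the feature to minimal pays for the shared dependency, while removing it from full does not get that dependency back, because another feature still needs it.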

What stands out to me is how insignificant the GUI support is compared to what java and python are pulling in. Currently the common approach distros take is to provide SWI-Prolog in a version with and without GUI. Shouldn’t there be some consideration about these other two dependencies?

I understand that the java feature doesn’t actually reach beyond the single jar file that gets generated, which a packaging system could put in its own package, eliminating the dependency for those that don’t need it without needing a custom build.

I’m less sure about the python support, but isn’t it contained to its own library, which could potentially be installed separately?

(*) Currently on swipl-nix master another 63MB of dependencies is pulled in, due to tcmalloc (which itself is just 215 kilobytes) being packaged along with all of gperftools. This is solvable, but will probably take some collaboration.

/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 seems to be part of both package libtcmalloc-minimal4 and libgoogle-perftools-dev (although the latter appears to just contain the symlink /usr/lib/x86_64-linux-gnu/libtcmalloc.so -> libtcmalloc.so.4.5.10, which is in libtcmalloc-minimal4).

Peter, regarding gperftools packaging issues I was specifically talking in the context of nixpkgs, which is what is used here to calculate closure sizes. Debian packages tcmalloc correctly.

I think it’ll be requiring a significant amount of Qt libraries to be installed, though.

Certainly. The “do you want the Java behemoth on your system, and then following on from that, do you want the Java behemoth integrated into everything possible” is a huge elephant-in-the-room packaging question.

E.g. Gentoo solves this by making it a user decision at compilation time. Pre-compiled distros instead should offer several variations of packages pre-compiled, when there are such significant choices to be made.

Qt is not required in xpce, as discussed in a previous topic. xpce just requires a bunch of X libraries, which are admittedly large when compared to small libraries like libyaml, but nowhere near the size of any modern gui framework.

The sizes reported are runtime dependency sizes. If Qt had been a runtime dependency, it would have been counted.

That said, to do anything with xpce you do need some sort of X server, which on most systems is either going to be Xorg or Xwayland. Probably you’d have some sort of desktop environment to go with that. I’m not accounting for any of that.

I am not sure if this choice needs to be made at compile time. For sure, that’s how gentoo would have to do it, being fully source based, but for any distro that uses a cache of pre-built things, it is possible to use a single source package to produce several output packages. Compile time can just build everything, as long as the build output is properly split up in several packages. People depending on SWI-Prolog with a particular feature set can then just pull in the right set of packages to get everything.

This is possible as long as java and python support are things that live in files that can safely be removed while retaining a functioning SWI-Prolog that just lacks these features.
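As a rough sketch of what such a split could look like, the snippet below assigns the files of one full install to several output packages by path pattern. The patterns, package names and file paths are hypothetical; the real layout depends on how SWI-Prolog installs its files.

```python
from fnmatch import fnmatchcase

# Hypothetical rules mapping path patterns to output packages.
# The first matching rule wins; anything unmatched goes to "core".
RULES = [
    ("java", "*/jpl*"),
    ("python", "*/janus*"),
    ("gui", "*/xpce/*"),
]


def split_outputs(files: list[str]) -> dict[str, list[str]]:
    """Partition the files of a single build into several packages."""
    outputs: dict[str, list[str]] = {"core": []}
    for path in files:
        for package, pattern in RULES:
            if fnmatchcase(path, pattern):
                outputs.setdefault(package, []).append(path)
                break
        else:
            outputs["core"].append(path)
    return outputs


# Made-up manifest of a full build, for illustration only.
manifest = [
    "bin/swipl",
    "lib/swipl/library/lists.pl",
    "lib/swipl/lib/jpl.jar",
    "lib/swipl/library/janus.pl",
    "lib/swipl/xpce/prolog/lib/pce.pl",
]
packages = split_outputs(manifest)
print(packages)
```

The split only works if the core package still functions once the other packages' files are absent, which is exactly the condition stated above.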

I hope this is possible, because I’m not looking forward to installing swi-prolog-nox-nojava-python.

By the way, looks like that Gentoo packaging is lacking python support, unless it’s not behind a use flag. It’s also very outdated.

Was reminded of this today. This isn’t just a possibility for the ‘big’ dependencies. Presumably, more parts of SWI-Prolog, like many of the libraries, could be split off and made into separate installables, so that you could have a minimal SWI-Prolog core that you can just add extra packages on top of to add features.

@jan - any thoughts?

Yes, but contradictory ones. A good infrastructure for add-ons is a must. We are not that far off (although there are still various issues with the pack system). A coherent core that is guaranteed to fit together is also great. Note that the Debian based systems generate multiple apt packages from the source. Notably, creating a separate installable package for xpce, jpl, odbc and possibly Janus (Python) is a good idea to allow for minimal installations. My experience with Python pip taught me that a good portable infrastructure for foreign extensions is too hard for the Python community and (thus) surely too hard for us :frowning:

I had hoped that the pack system would provide new stuff for the coherent centrally maintained system. In practice that did not work so well because several of the main contributors have their own “base packs” that contain common utilities that have a lot of overlap with the core libraries. We cannot adopt these as that leads to too much code duplication. Instead, the new functionality of these should first find a place in the core.

I don’t know how this fits nix.

I’ll study the debian packaging. To be clear, these aren’t just completely separate packages that just ran the entire build with certain features enabled/disabled, but actually a split output, where one build resulted in several different packages which together make up a fully featured SWI-Prolog?

Packaging is hard, and I’m very skeptical about ever solving this problem independently within the infrastructure of a language when there are foreign dependencies involved. We just cannot know where they are supposed to come from. And making the decision for the user (which I’ve seen some Python packages do: just bundle a whole bunch of native stuff in a wheel), while it might work out for a majority of users, just cannot work for everyone.

This is why package managers exist. Different distributions, operating systems and independent package managers want to make different choices here, and allow their users different ways to specify their preferences. I think it is important that language infrastructure, like our pack system, is able to play nicely with such package managers.

There’s a bit of a gray area where some dependencies could be automatically fetched by a build vs retrieved from a package system (do you get that pip from pypi or from apt?). Though I don’t think anyone is currently doing this, it should entirely be possible to actually package some prolog pack, allowing people to apt-get install them, instead of swipl pack install. In that case, any of the native dependency mess can just be solved by debian. Packs can even be fully precompiled that way, or receive any OS-specific patches that might be necessary to have it run well.

You hoped that some of the work done by people in packs could be pulled into SWI-Prolog itself? What are some of these base packs that people are using?

The way I see it, there’s sort of 3 tiers here.

  1. libraries bundled with SWI-Prolog, maintained as part of SWI-Prolog. This is our standard library
  2. packs maintained by others which a significant chunk of SWI-Prolog users will want
  3. Everything else

In my view, distros should package all of 1 and 2, and make a good effort for 3. SWI-Prolog development should only have to be concerned with 1.

My last thoughts on this are, if there’s base packs that do a lot we’d want to have as part of the core libraries, can’t we just make an opinionated choice to selectively pull some in, leave out some other things, and refactor whatever currently doesn’t mesh with the standard library? Given the current state of the prolog ecosystem I do not think a consensus will just arise among prolog developers, but practically, if the next version of SWI-Prolog ships with new core bits, those bits will just become the ‘standard’.

It’s just another package manager.
It supports two different ways of customizing what gets installed. One is to provide the user with flags that manipulate how the build is done. This is the approach currently taken in the Nix packaging for SWI-Prolog.

The other is to produce split output from a build with all features enabled. Since packages in the official NixOS package distribution (Nixpkgs) are automatically built, and their build outputs are cached, this lets people just download the parts they need without actually having to run their own build. This is what I think the Nix packaging should actually do.

Roughly, method one is comparable to Gentoo and method two is comparable to Debian.

They build the package once and then they run the installer with options to install certain components and create the packages. So, one build, and a series of install+package runs.

Fair enough. It makes us depend on the packagers for each of these package managers though (if I get you right). There are a lot of them, so we can’t do that. I did/do a few and it is a lot of work :frowning: For some package managers there are well maintained packages for SWI-Prolog, for some they are outdated or (partly) broken, and for some they are lacking completely. SWI-Prolog is too small to find a packager for each package system :frowning:

Note that for Python, too, one commonly has to use pip on e.g. Ubuntu because the extension is not available as an apt package or the apt package is too old. As a smaller community we will do worse :frowning:

That was my hope. A little like how it works/used to work with GNU Emacs (at least in my perception, I might be wrong). If you look at the current packs, there are some base packages like lambda, various xlib/util/…/, various libraries extensively using coroutining, etc. These are used by packs that are more application oriented. It would be nice to have a dependency diagram :slight_smile: One example I did pull in was format_spec, which analyses format/2 strings. I needed it for library(check) as well as library(sandbox). The original code had dependencies on stuff that wasn’t really needed, so I had to do a serious rewrite.

I think 1 and 3 are no-brainers. The issue is 2. Should they be merged gradually into 1, where we deal centrally with all build and dependency issues? Or should they remain as is, leaving it to package maintainers to turn them into packages, and have them installed using swipl pack install, usually with some trouble, if the package manager of choice did not create a package?

Let me be clear, I don’t know. I know people like the fact that simply installing SWI-Prolog gives them a system with a lot of goodies that are (almost) always nicely compatible. But, the downside is that this does not scale :frowning: