Where is libswipl.a?

@jan thanks for that detailed post describing the problem in more depth; it was extremely useful for me, and it made me realize that perhaps we can find a solution.

As you say, it is a mess, but if we are able to think about our needs in a different way we may be able to find a solution.

The reason is that, for foreign functions, we don’t really need a full C runtime environment with a dynamic linker, etc.; our needs, I think, are rather modest.

In order to support foreign predicates from both static and dynamic libraries, what we need is:

  1. The address of the foreign function and some of its other properties, e.g. number of parameters, some flags
  2. The code of the function available in memory, ready to be executed
  3. To be able to call the function pointer and pass it the arguments

Dynamic libs

For regular dynamic libraries as we use today, the three needs are satisfied as follows in the current code:

  1. Number 1 is satisfied by dlsym and by the registration/installation function when the shared lib is loaded (see the sketch after this list)
  2. Number 2 is satisfied by the dynamic linker
  3. Number 3 - There is no problem with this, C provides this functionality
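
To make the dlsym route concrete, here is a minimal sketch of what such a registration function typically looks like; the library name mylib and the example predicate are made up for illustration:

    #include <SWI-Prolog.h>

    /* A trivial foreign predicate: succeeds iff its argument is an atom. */
    static foreign_t
    pl_is_atom_demo(term_t t)
    { if ( PL_is_atom(t) )
        PL_succeed;
      PL_fail;
    }

    /* Registration function.  After the dynamic linker has mapped
       mylib.so into memory (need 2), dlsym() finds this function by its
       conventional name (need 1) and Prolog calls it; calling through
       the resulting function pointer (need 3) is plain C. */
    install_t
    install_mylib(void)
    { PL_register_foreign("is_atom_demo", 1, pl_is_atom_demo, 0);
    }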

Static binary

What I was thinking is that we are not bound to using dlsym to get the function address, so we could store in the saved state a per-architecture foreign symbol table with the relative addresses of the foreign functions (relative to a stable function, like PL_initialise(...)).

This foreign symbol table (foreign function table) would also contain all the other needed information such as number of arguments, flags, etc.

The symbol table would be created by qsave_program/2 (which is run from the static swipl binary that will be used for the standalone version) and stored in the saved state. This symbol table would later be used by that same static swipl binary (to which the saved state is appended) to find the address at which to call each foreign function. This way we have a stand-alone file consisting of the static swipl binary, including add-ons, foreign static libraries, packages and res://, with a saved state appended (which includes the foreign symbol table).
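
Just to illustrate the idea (this is not an existing SWI-Prolog data structure; all names are made up), one entry of such a foreign symbol table could look roughly like this:

    #include <stdint.h>

    /* Hypothetical entry of the foreign symbol table that qsave_program/2
       would embed in the saved state; one entry per foreign predicate. */
    typedef struct ffi_table_entry
    { const char *predicate_name;  /* e.g. "is_atom_demo" */
      int         arity;           /* number of arguments */
      int         flags;           /* PL_FA_* flags for PL_register_foreign() */
      intptr_t    offset;          /* function address relative to PL_initialise() */
    } ffi_table_entry;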

In this case the needs above would be satisfied as follows:

  1. dlsym would be substituted by code that reads from the foreign symbol table in the saved state.
  2. The code of the function is already loaded because it is a static binary.
  3. This is no problem in any case.

To obtain the relative addresses of the foreign functions, an initialization routine could be called for each static library that is used (the name of the registration function can be based on the name of the library, just like today we can have something like install_mylib() for shared libraries). The address relative to PL_initialise would then be calculated and stored in the foreign symbol table when qsave_program/2 is called. This address offset should be stable for each architecture (and, of course, for the static binary used).
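
A rough sketch of the two sides of that computation, reusing the hypothetical ffi_table_entry from above (again purely illustrative, not existing code):

    #include <SWI-Prolog.h>
    #include <stdint.h>

    /* At qsave_program/2 time: record the distance between a foreign
       function and PL_initialise().  Both live in the same static
       binary, so the offset does not change between runs. */
    static intptr_t
    relative_offset(pl_function_t f)
    { return (intptr_t)f - (intptr_t)PL_initialise;
    }

    /* At startup of the (identical) static binary: turn the stored
       offset back into a callable address and register the predicate,
       replacing what dlsym() does for shared libraries. */
    static void
    register_from_table(const ffi_table_entry *e)
    { pl_function_t fn = (pl_function_t)((intptr_t)PL_initialise + e->offset);

      PL_register_foreign(e->predicate_name, e->arity, fn, e->flags);
    }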

This is the main idea; some things would have to be fleshed out, like versioning, building the swipl binary with the required libs, etc. Am I missing something? Would this work?

EDIT: I don’t think static binaries are really needed for Windows since bundling dlls and needed files in a directory works just about everywhere.

I’m not really there yet. As is, the code that goes into what is now a shared object file X has an install function that should be called install_X (or just install, but as that causes issues on some platforms we’ll discard it). How am I going to find the relative address of this function? Using nm? As we need a C compiler and linker anyway, why not simply add a function that calls all the install functions and call that from main()?

I think there is a point in disregarding Windows for pure static binaries, as they are both less needed there and generating a static library is most complicated on that platform.

The main issue I see is how we should collect the dependencies and link requirements of the various individual static libraries from which we compose the final executable. That might all be doable using CMake: exporting all dependencies from CMake and providing a CMake skeleton for assembling a fully static binary from the user extensions and the parts of the system that are needed.

If someone wants to give it a try …

The solution that Google took for Python is to not try for static executables, but for “hermetic” executables – that is, self-contained ones. It’s similar to Java’s “JAR” files – the “executable” is actually a zip file that automatically unzips itself and executes. This presents some problems on diskless machines and I don’t know the solution (if any) for them – a modified dlopen that can read from file-like streams rather than requiring a file name would be one solution.

Here’s an old presentation by one of Jan’s fellow Amsterdammers (and a member of the Python Software Foundation):

We surely can go that way. I wonder how it relates to snap/flatpak/… techniques? These approaches create a controlled environment using files mounted using loopback mounts (similar to MacOS bundles). The good news is that these technologies are widespread in the Linux world these days and support both static and dynamic libraries while controlling dependencies. Yes, quite a bit of Linux kernel support is required for this to work. MacOS has a somewhat simplified version of this that leads to more duplication, as every bundle needs to be fully self-contained except for the standard MacOS libraries. In contrast, snaps depend on layers stacked on top of each other. The difference is not terrible: the MacOS bundle is 28Mb while the Linux snap is 19Mb, as it gets Qt and KDE from another layer.

The solution I mentioned above has the following advantages:

  1. It works on read-only filesystems
  2. It can open the path to Android apps by bundling the static binary with the Android Java app and talking to it using a Pengines Java client.

Here is the process, fleshing out some more details.

The static binary creation lifecycle

Instead of a one-step process run by CMake, the making of the static binary would follow a two-step process:

  1. Making the base Prolog system.

    • Controlled by: CMake
    • Executed by: Jan, OS packagers, swipl power users
    • Artifacts produced:
      • libswipl_base.a, which contains all the static libraries corresponding to the shared libs in the current SWI-Prolog source. This includes the libraries for packages/*. It also includes static versions of desired and required third-party libraries, such as libz.a, libgmp.a, etc.
      • swipl: the base static binary linked with libswipl_base.a
  2. Configure and produce a deployable static binary with the Prolog application:

    • Controlled by: qsave_program/2
    • Executed by: the Prolog developer wanting a deployable artifact
    • Artifacts produced:
      • One static binary containing/linked with:
        ** libswipl_base.a, produced above
        ** libmylib.a, libmyaddon.a, libxxx.a: these implement user-provided foreign functions to be imported by use_foreign_library/2, load_foreign_files/3. These include downloadable add-ons, etc.
        ** the saved state of the Prolog program
    • Steps performed by qsave_program/2 to produce the deployable static binary:
      1. Gather the names of the necessary static add-ons, libraries, etc. from the qsave_program/2 options.
      2. Produce a small C source file containing an install_static_libraries function and a main function which calls install_static_libraries and runs Prolog just like the swipl binary above (a sketch of such a file follows this list).
      3. The install_static_libraries function produces a foreign function symbol table with all the information needed by use_foreign_library/2 and load_foreign_files/3.
      4. Compile and link the C source produced in step 2 with libswipl_base.a, libmylib.a, libmyaddon.a, libxxx.a.
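
A sketch of what the small generated C file of step 2 could look like (the install_* names are made up; a table-driven variant, as in step 3, would be structured the same way):

    #include <SWI-Prolog.h>

    /* Install functions contributed by the user's static libraries
       (libmylib.a, libmyaddon.a, ...); names are illustrative. */
    install_t install_mylib(void);
    install_t install_myaddon(void);

    /* Generated by qsave_program/2: make every statically linked
       foreign extension known to the system, so that
       use_foreign_library/2 and load_foreign_files/3 find the
       predicates without dlopen()/dlsym(). */
    static void
    install_static_libraries(void)
    { install_mylib();
      install_myaddon();
    }

    int
    main(int argc, char **argv)
    { install_static_libraries();        /* registration may precede PL_initialise() */

      if ( !PL_initialise(argc, argv) )  /* also restores the appended saved state */
        PL_halt(1);

      return PL_toplevel() ? 0 : 1;
    }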

The artifacts from Step 1 can be installed by the OS package manager, while Step 2 is executed by the Prolog developer by calling qsave_program/2.

It is completely reasonable to expect the presence of a C compiler/linker if the Prolog developer wants to produce a deployable static binary of their application. Rust, Go, etc. of course require the compiler to be present to produce a static binary.

Notice that the process above also has an advantage if the user doesn’t have any additional add-ons or foreign code beyond what the base system of Step 1 provides: a static binary can be produced even without a C compiler/linker, by simply appending the saved state to the base static binary produced in Step 1.

What do you think?


That should mostly work.

That is one solution. The other is to simply generate a C function, linked with the main function, that initializes the foreign components. use_foreign_library/1 is then simply a dummy. Both may have advantages. Leaving the task of initializing the foreign code to use_foreign_library/1 has the advantage that the initialization happens as part of module loading, as it normally does. Might be better. We can simply generate a foreign predicate that initializes the foreign extensions. That should be solvable.

This would imply that the default distribution contains the dynamic stuff, the static libraries and the static executable. Not sure I like blowing up the installation that much. Anyway, that is step two.

I guess this first of all requires a volunteer to set up, notably, CMake generation of static versions of the add-ons and CMake infrastructure to link a static executable given a set of add-ons. That is surely not on my shortlist …

Yes, I think there is an advantage to keeping it as it normally happens.

Yes, I don’t like the bloating either. Since they are only two static, self-contained files, they could be downloaded on first use from the swi-prolog.org website.

Yes, this is the first step to get it done. I would be happy to do it, but at the moment I am not able to; perhaps someone is able to volunteer?