SQLite dependency in a pack

Boris · July 22, 2024, 1:28pm

I found time to work on my (probably misguided) effort to create yet another way of interfacing SWI-Prolog with SQLite. Along the way I relearned something I once knew and had since forgotten. From the SQLite docs:

The use of the amalgamation is recommended for all applications.

As it turns out, while most Linux distributions and Mac OS do come with SQLite, those copies are compiled with different options and in fact behave differently. I haven't double-checked, but the installed versions might differ, too.

Question: is it a good idea to include the sqlite3.c and sqlite3.h files, as they come, in the c subdirectory of my pack? SQLite is public domain, so from that angle there don't seem to be any issues; is that right? Are there any other considerations I am missing? One small detail is that the two files together take up 9.5 MB on disk…

PS: The “amalgamation” is interesting enough as a concept. All ~300K lines of C (quite some of it is docs embedded in the comments) are “amalgamated” in those two files, and the docs claim:

And because all code is in a single translation unit, compilers can do better inter-procedure and inlining optimization resulting in machine code that is between 5% and 10% faster.

The docs then go on to discuss what to do if your IDE cannot handle source files of this size.

jan · July 22, 2024, 2:25pm

Smells like a semi-religious discussion. Some like modularity and reuse, others like a single file. I tend to prefer the first. If you copy the source, you have to watch for upstream (security) fixes and enhancements. If you reuse a distributed version, your Linux packager or MacOS package manager deals with that, but you need to deal with the possible incompatibilities. There is no free lunch.

As for merging everything into a single file to gain performance: modern C compilers can do link-time optimization, and at least gcc can do "whole project" compilation.

I guess it is a matter of taste and, to some extent, priorities…

I am having a similar discussion with @peter.ludemann about the space package, which suffers from portability issues due to the unstable C++ API of libgeos.

Boris · July 22, 2024, 2:31pm

Thank you for your answer, @jan. I have no opinion on what is better; I was just trying to ~~summarize the things I learned~~ repeat the thing I read while trying to figure out what is going on and why I get different results on Linux and Mac OS.

Let me rephrase the question:

Question: assuming I want consistent behavior from my embedded SQLite, should I just use the provided "amalgamation" by including the full source in my pack? Is there any reason not to do it?

You gave one reason: I need to update it myself. That is correct, and I am not sure how to deal with it. My thinking goes that the probability that my code needs fixing and enhancing is infinitely higher than the probability that SQLite does. But I really don't know.

What is the other option? I'm not even sure. Conditional compilation of my C code depending on how the pre-installed SQLite library was built? Something else?

Mac OS is a bit of a mystery to me anyway. It ships BSD-derived versions of most of the standard command-line tools (things like wc, awk and grep) that are subtly incompatible with the GNU versions, coming from a completely different pedigree than what you find on Linux. It is sometimes annoying, but apparently you deal with that by installing the GNU versions.

jan · July 22, 2024, 3:22pm

Checking the changelog of SQLite may tell you how serious the update problem is. Do they have frequent security patches? Portability patches? Enhancements?

Normally, you'd use CMake to configure the project, find SQLite and, if necessary, find the details about it that you need to react to. The environ pack is a complete pack with a CMake configuration.

That is a good traditional reason, and it was very useful in the days when there were zillions of Unix dialects. Possibly an important reason for GNU's popularity. For SWI-Prolog, most of the build scripting is in Prolog.

peter.ludemann · July 22, 2024, 5:00pm

On the other hand, rocksdb has a stable C++ API (although not necessarily ABI stable) and I routinely update it to the latest without problems (pack(rocksdb) has facebook/rocksdb as a submodule because we want to compile it to a .a file rather than a .so).

I’ve tried using that and it was so slow that I gave up.

I think the discussion is somewhat related to .so files versus .a files ("static" executables that have no dependencies). I've used both at different companies and I much prefer static executables, but this was when working in a very big mono-repo, with tools that would detect changes to dependencies. (Would dependabot solve this? I subscribe to a few projects that use dependabot and it seems to work for them, but I don't know how hard it is to set up a workflow with dependabot.)

maren · July 22, 2024, 8:32pm

While it is true that different OSes might ship slightly different versions of a package, the situation is not quite as bad as with the incompatibilities between GNU and BSD tools, since there's still only one vendor of sqlite with one source history. You might get a different feature set depending on which options it was compiled with, but it will still be the same library, with the same mechanism for discovering which version you are on and with which options it was compiled.

A big advantage of not directly embedding sqlite in your pack, but instead getting it from the environment, is that you give more freedom to your pack user to use a very specific compile of sqlite that fits a problem better. If you’re embedding all the sqlite code in your pack, you’re making it a lot harder to override.

Note that while you can probably do compile-time checks, for sqlite there's also the option to do all your feature checks at run time, as both the query toplevel and the C/C++ API let you discover which features it was compiled with (see "Run-Time Library Compilation Options Diagnostics" and "Pragma statements supported by SQLite").

peter.ludemann · July 23, 2024, 12:34am

If you're embedding the code (as a git submodule), all you need to do is remove the submodule update from the Makefile or config file, then do a git checkout <tag> in the submodule and rebuild. This is easier than specifying a version with apt (and on my Chromebook, I'm more or less locked into whichever version the Debian maintainers decided to use).

maren · July 23, 2024, 12:57am

It is not impossible. There are always ways to make things work.
But there are disadvantages to doing it that way. Now you have to go patch the source, essentially making you maintain a private fork just to import a different version of sqlite. No easy pack_install for you.

It also forces people using the pack to always compile sqlite from source when installing the pack, even if they already have a perfectly fine dynamic library or could easily download one from their distro.

There are many ways of combining dependencies into a functional system, and this is why we have so many different distros and package managers. Your particular environment might be a little inflexible, and arguably that’s a feature of a lot of distros. It is what lets them provide some stability. Part of that stability is using tested versions of a package like sqlite, which is completely subverted when software packages just import and build their own version.

In other environments, such as when building dedicated containers, or when using Nix, you have a lot more flexibility about the combination you make. It is not necessarily always obvious though that a particular software package is using some custom source that you might want to override. People might reasonably assume that if they replace sqlite system-wide (for example, by building from source and installing in a container, or by using an overlay in Nix), that this would affect all programs using sqlite, but with no formal dependency relation, that just won’t happen.

So yeah, I’m a big believer in not making these packaging decisions for people. And especially in the case of sqlite, given there’s clearly a lot of effort there to provide a stable and inspectable cross-platform interface, why would you?

peter.ludemann · July 23, 2024, 5:31am

Instead of having to edit the Makefile or config, we could allow options to pack_install/2 that specify key=value pairs for the environment, make and configure. These could do things such as specify a particular version of sqlite3, whether or not to do a submodule update, etc.

On my Chromebook I get sqlite3 3.40.1-2 (the latest release is 3.46) … no choice as far as I can tell (at least, I haven’t figured out how to add a custom PPA).
I could, of course, download sqlite3 from GitHub or wherever and install it in ~/.local, but that requires ensuring that LIBRARY_PATH is set appropriately. Sadly, Linux doesn't have a good solution for libraries that aren't in a standard location such as /usr/lib/x86_64-linux-gnu (unlike, e.g., Plan9), although in this case sqlite3 is an executable, so it merely requires setting PATH.

Boris · July 23, 2024, 6:04am

Thank you for your answers @jan, I will specifically look at the recent history of the SQLite project to get a feel of how good or bad it looks.

@peter.ludemann Something like dependabot or renovate is a great idea really. At the very least I will get alerted if there is a patch upstream. I will look into it.

Boris · July 23, 2024, 7:07am

There is some misunderstanding here. I shouldn’t have posted a question without enough code to show, so I apologize for this.

The SQLite distribution model seems to be that you take the full source, drop it into your project, and configure SQLite using compile-time options. You could have as many differently configured versions of SQLite as you have compilation units. In practice, many applications come with SQLite embedded this way, and they all co-exist happily on the same system.

What you certainly cannot do is use a feature that has been disabled at compile time, or change a compile-time option that doesn't fit your use case.

This is exactly where it goes wrong. On Mac OS, when running CMake, I get:

-- Found SQLite3: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.5.sdk/usr/include (found version "3.43.2")

Version 3.43.2 was released in Oct 2023, and there are about 10 releases after it (currently at 3.46.0). I am at Apple Inc’s mercy when it comes to fixes and patches.

The concrete problem I faced: it seems that the SQLite found by CMake on my Mac is compiled with the option SQLITE_OMIT_AUTORESET, in contrast to the SQLite provided by my Linux distro. You should read the full doc entry for that option, but here is a small snippet:

But that change caused issues in other improperly implemented applications that were actually looking for an SQLITE_MISUSE return to terminate their query loops. (Anytime an application gets an SQLITE_MISUSE error code from SQLite, that means the application is misusing the SQLite interface and is thus incorrectly implemented.) The SQLITE_OMIT_AUTORESET interface was added to SQLite version 3.7.5 (2011-02-01) in an effort to get all of the (broken) applications to work again without having to actually fix the applications.

This is one of tens of compile-time options. In another example, foreign key support depends on a combination of two compile-time options. Foreign keys are a major differentiator for relational databases and reason enough for me to bother with SQLite at all.

jan · July 23, 2024, 7:30am

I agree with most of what is being said in this topic. The final choice is, in general, not easy.

Well, I decided to add rocksdb as a git submodule for a number of reasons:

In those days the library was rather unstable. Quite often the packaged version in Ubuntu was broken.
Packaged rocksdb was often linked to jemalloc, which conflicts with SWI-Prolog’s memory allocation. That is not a problem if you link librocksdb.so into an application as the application will have one consistent allocator. It is a problem if you dynamically load the library as you end up with two allocators.
We need rocksdb compiled with C++ RTTI, which is often not enabled for the packaged versions.

So, there was little choice. For HDT I decided to embed a modified version because the "official" version is often not available as a package and, lacking a proper maintainer, patches are not merged centrally. For the Rserve client I also embedded the code, for lack of a proper central version. I had to take the central version, apply various patches from forks and add some additional R type support myself to make it do what I needed.

Still, my overall preference is to dynamically link official versions of dependencies.

peter.ludemann · July 31, 2024, 5:37pm

The space pack has problems because the “official versions” have an unstable C++ API (the C API is stable). There’s a separate discussion about that here: Future of the space package? - #7 by peter.ludemann

All of this presumes that the binary formats from various compilers are compatible. My understanding is that assumption isn’t always true on Windows.
