Internal data representation changes in GIT version

The changes announced in “Cooking: internal data representation” have been pushed to the swipl-devel.git master branch.

Users should be aware that a lot has changed internally. The commits affect nearly 4,000 lines spread over 115 files. Unfortunately, while all tests pass, some regressions are to be expected. Please test and report issues.

Consequences

  • The QLF format is incompatible. If you do an incremental build, you may have to delete old *.qlf files; you can do so by running this in the build directory:

    swipl qlf clean -ra
    
  • The 64 bit version should be fully compatible.

  • On the 32 bit version (still distributed for Windows, can be built on small embedded Linux systems such as the Raspberry Pi (although these are also moving to 64 bits), and used for the WASM version) there are a few more consequences:

    • The C API type PL_atomic_t changed from 32 to 64 bits (used for dict keys).
    • The system can use all system memory for the Prolog stacks rather than 128Mb per stack (see the example after this list). Stack usage practically doubles.
    • Program size (compiled code) is not affected.
    • Internal hash tables map 64 bits to 64 bits. This should have little impact on most programs, but the space used for tabling is significantly higher. Future versions may improve on that.
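
Related to the stack consequence above: the limit that applies on a given build can be inspected via the standard stack_limit flag (and changed with set_prolog_flag/2 or the --stack-limit command line option), e.g.:

    ?- current_prolog_flag(stack_limit, Limit).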

Next steps

Timing for the next steps is still unclear. Assuming these changes hold, several steps come into view. For example:

  • Get rid of relative addressing for Prolog data. That will almost surely improve performance. It will also simplify the garbage collector and allow for Prolog data that does not live on the stack of the running engine. Think of shared ground data, or allowing engines to access each other's data.
  • Simplify the data type tags. That should simplify the system and improve performance.
  • Consider using 32 bit VM instructions on 64 bit hardware. That would seriously reduce the size of compiled code on 64 bit machines.

The changes are running OK so far, but the size of the executable increased (this is a 64 bit system):

3029307 bin/b.after-64bitdata
2287166 bin/b.b4-64bitdata

859361 bin/x.after-64bitdata
832724 bin/x.b4-64bit-data

The b executable bundles data inside itself, so the size increase is noticeable (~32%). My guess is that this could cause problems for people with large datasets; e.g., 1TB could become 1.4TB or more, depending on the type of data. This could be significant (without much advantage).

If this problem also happens for RAM usage then it certainly is a major issue, especially for small systems.

Just for comparison, the ARM 32-bit version (compiled with an older swipl, without the 64 bit changes):

303530 bin/x.arm.b4-64bitdata

Unfortunately I am not able to compile a new binary on 32-bit ARM at the moment, but I suspect the size will double.

I produce the executables with:

$(SWIPL)   --no-pce --stand_alone=true --foreign=save  -o <out> -c <pl files>

That is rather odd. I hadn’t tested this and there should not be much change. On sCASP I get a small decrease, from 876,033 to 867,671 bytes. Not sure what to think of this. Everything should be the same except for the integer instructions (in the head, body and optimized arithmetic). In all these scenarios, in the old version we have

  • Tagged ints as *_SMALLINT + value
  • 64 bit ints as *_INTEGER + value
  • Bigints as size + bits.

In the new version we have

  • Tagged ints as *_SMALLINT + value, with the remark that on 32 bit systems there are two variants of this, one for numbers that fit in 32 bits and one for numbers that fit in 64 bits. The state/qlf code is the same, i.e., the mapping is done while loading the .qlf file to make qlf files portable.
  • Bigints as size + bits.

I.e., the explicit 64 bit integer instructions are gone. That should lead to somewhat more overhead in the compiled binary and some slowdown. This applies to (signed) integers that require 57 to 64 bits.
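
For reference, with 57 tagged value bits that range roughly covers integers whose absolute value is at least 2^56 while still fitting in a signed 64 bit word:

    ?- X is 1<<56, Y is (1<<63)-1.
    X = 72057594037927936,
    Y = 9223372036854775807.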

To get a 30% increase you’d have to have a lot of such integers in your code. Does that ring a bell? Note that we are talking about integers in the code, not those computed during execution. If this is not the case, could you bisect your code to find the predicates responsible for the change and use vm_list/1 on them to see whether they are compiled differently? With a little hacking in pl-qlf.c it should be possible to print the (file) space occupied by every predicate.
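
For example, assuming the facts live in a file and predicate like the placeholders below, the compiled VM code of a suspect predicate can be listed with vm_list/1 and compared between the two versions:

    ?- [mydata].          % placeholder for the file holding the facts
    ?- vm_list(data/3).   % prints the VM instructions of each clause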

That should not be the case. The VM code should be precisely the same (as the two versions of the smallint instruction are mapped onto one more abstract code) and, when loaded into memory, each VM instruction is the size of a void* on both architectures.

Yes, that is correct. In the code I bundle facts of the form:

data(Integer,Integer,String).

And I have about 35_000 such facts. This is a small data set. I plan to bundle about 30 times more facts in the executable.

It is a rather common use case to have more than 100_000 facts with integers, for instance facts about cities/municipalities/states and other geographic places, with integers representing population, counts of births and deaths, counts of sick people, counts of hospitals, etc. Something like:

geolocation(Name,CountOfBirths,CountOfDeaths,CountofHospitals,CountOfEtc).

I can easily see having 1_000_000 of those facts, and to want them bundled in an executable or stored in a qlf file.
I got an increase of 32% with just two such integers, but with more integers per fact (the example above has 4) the growth would be even larger.

I really think this is a no-go for the 64-bit data change unless there is a way to solve this problem. This is especially true because Prolog (and SWI-Prolog in particular) is well suited to handling large knowledge bases.

The change from 32-bit to 64-bit would increase your memory usage by 8MB. Is that a problem?
(A million facts doesn’t seem very large to me; some of my test code has that many facts, and I anticipate a production system would have 100x or 1000x as many, which has led me to start looking into things like HDT, Redis and RocksDB.)
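
A back-of-envelope that matches that figure, assuming the extra cost is 4 bytes for each of the two integer arguments in a million clauses:

    ?- Bytes is 1_000_000 * 2 * 4.
    Bytes = 8000000.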

Thanks for sharing your concerns. There are a number of issues.

Representation

  • Cheap vs expensive ints. In the old version we have
    • Inlined ints. They take a single cell and have (had) 57 bits on 64-bit hardware and 25 on 32-bit hardware. They take a single VM code inside a clause and use zigzag encoding in the QLF.
    • 64 bit ints. On the stacks they take 3 (64 bit) or 4 (32 bit) words. They take 1 (64 bit) or 2 (32 bit) VM arguments and use the same zigzag encoding in the qlf file (a sketch of zigzag encoding follows below).
    • GMP/LibBF numbers. On the stacks these use 2 words, a size and the bits rounded up to the word (32/64) size. The clause representation is the same, except that the final guard word is lacking, so it is one word shorter. In Qlf files they currently use 4 bytes size followed by the exported bits rounded to bytes.
  • In the new version the inlined ints are always 57 bits, which makes integers in the range 25-57 bits a lot cheaper on the stacks. 64 bit ints are gone. Big ints are the same, except that they use rounding to 64 bits rather than 32 bits on 32 bit hardware.

This means that handling 25-57 bit integers on 32 bit hardware improves, while handling 57-64 bit integers degrades on all platforms. As a result, several hundred lines of code are removed, avoiding a lot of opportunities for bugs and gaining some performance.
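
As background, zigzag encoding maps signed integers to unsigned codes so that values of small magnitude get small codes, which keeps them compact in the QLF file. A minimal sketch in Prolog, assuming 64-bit framing (this is not the actual pl-qlf.c implementation):

    % zigzag(+Int, -Code) encodes; zigzag(-Int, +Code) decodes.
    zigzag(N, Z) :-
        integer(N), !,
        Z is (N << 1) xor (N >> 63).      % 0,-1,1,-2,... -> 0,1,2,3,...
    zigzag(N, Z) :-
        N is (Z >> 1) xor -(Z /\ 1).

For example, zigzag(-3, Z) gives Z = 5, and zigzag(N, 5) gives N = -3 back.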

Your problem

What I do not really get from your description is why so many of your integers end up in the range 57-64 bits. If that is true, why? If it is not true, there is probably some implementation bug. If there are indeed so many integers in this range, it would be fairly simple to enhance the qlf format. I’ll probably do that anyway. That still blows up the clause size for 57-64 bit integers from 1 to 3 machine words. That is a bit harder to improve on. The stack representation is not dramatically affected: on 64 bit hardware it will typically grow from 3 to 4 machine words.

In short, the crucial question is why 57-64 bit integers are dominant in your application.

P.s. In my current plans the small-and-fast integers may go down to 56 bits, but surely not smaller.

Hmm. I start to suspect something else is wrong in your case. Consider the program below, which is a rather extreme case of your example.

“Old” version (stable 9.2.3):

~/src/swipl/build.pgo/src/swipl qlf compile big.pl 
QLF file is 4,300,704 bytes
p/3 uses 14,400,120 bytes

“New” version

swipl qlf compile big.pl 
QLF file is 4,000,704 bytes
p/3 uses 19,200,120 bytes

Program (big.pl)

% Expand the ints(N) marker below into N generated p/3 facts.
term_expansion(ints(N), Clauses) :-
    length(Clauses, N),
    maplist(mk_clause, Clauses).

% Each fact gets three random integers >= 2^56, i.e. outside the
% tagged (small) integer range.
mk_clause(p(N1,N2,N3)) :-
    N1 is 1<<56 + random(1<<62-1<<56),
    N2 is 1<<56 + random(1<<62-1<<56),
    N3 is 1<<56 + random(1<<62-1<<56).

ints(100_000).

% Report the QLF file size and the in-memory size of p/3.
stats :-
    size_file('big.qlf', FileSize),
    format('QLF file is ~D bytes~n', [FileSize]),
    predicate_property(p(_,_,_), size(Size)),
    format('p/3 uses ~D bytes~n', [Size]).

:- initialization
    stats.
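
For completeness, loading the compiled file runs the initialization goal, which prints both sizes; one way to see the numbers on your own machine (assuming the program is saved as big.pl):

    swipl qlf compile big.pl
    swipl
    ?- ['big.qlf'].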

Hmmm… this is strange. All the facts I have use integers below 1000, so surely they fit in 57 bits.

I will try to make a test case, but it may take me some time since it is a busy time for me.

@jan, I found the cause of the discrepancy. The file increase was not due to the 64 bit changes, but to shared libraries included in the standalone executable.

Here is the real change caused by the 64-bit code. I extracted the member sizes with unzip -l <standalone file>:

Before 64 bit data (swipl stable 9.2.2):

 7943190  2024-04-08 15:56   $prolog/state.qlf
 [... 19 shared libraries...]

After 64 bit data (swipl 9.3.3-173-g33e8ef5b6):

7894232  2024-04-08 15:58   $prolog/state.qlf
[... 19 shared libraries ...]

So, as expected, there is no real change, because I am using small integers that fit in 57 bits.

The size quoted in my original post comes from an older codebase that used only 9 shared libraries, whereas the new codebase uses 19 shared libraries. That was the main source of the discrepancy. Sorry for my mistake.
