Cooking: internal data representation

SWI-Prolog’s internal representation of Prolog data was shaped in the days of 32 bit machines. In the early days, 32 bits were plenty to address 8Mb of physical memory and still keep various bits for type tagging and garbage collection. Physical memory followed Moore’s law and more types were added. The 32 bits became problematic and more and more indirect ways to detect the type were added. Concentrating on threads, where each thread has its own stacks, I decided at some point to move to relative addresses, representing Prolog data as a tagged (typed) offset to the base of the denoted stack. That limited the stack size to 128Mb per stack on 32 bit systems.

64 bit hardware allows a much simpler and more efficient data representation. The x64 architecture is (if my information is correct) capable of 48 bit virtual addresses and 52 bit physical addresses (I was rather surprised to see the virtual address space is smaller than the physical). We are only concerned with the virtual address space, so that gives us 16 bits to use as we please without limiting the addressing capability. We only need 8: 2 for GC and 6 for the type tag are enough to avoid all the clumsy type inference we have now and still keep plenty of space for extensions. I think that is a plan that should be future proof, at least for several decades.

But, as @jfmc also confirmed, we cannot dispose of the 32 bit architecture yet as we have small embedded systems and WASM. A dual representation is too complicated to maintain. So, I decided to explore making all Prolog data 64 bit, also on 32-bit architectures. That is currently available as a branch 64-bits in swipl-devel.git. The awkward data representation is almost completely kept. Only the representation of full 64 bit integers without big number support has been dropped, leaving only inline integers (now 57 bits) and using GMP/LibBF for anything that does not fit.

On 64 bit systems there is at the moment very little visible impact. It is a little (barely measurably) faster. On 32 bit systems it means stacks are only limited by system memory rather than 128Mb. Of course, it requires twice as much stack space. Program size is barely affected as VM instructions are still 32 bits. Some data structures now use 64 bit instead of 32 bit values. As is, I think this mostly affects tabling, while it is probably possible to reduce the table size again. Performance on 32 bit Windows has decreased a bit while, somewhat unexpectedly, performance on WASM has improved a bit. I think the latter is more interesting.

I think the experiment can be considered a success and this is a viable route to simplify the data representation. On 64 bit systems this should allow for better performance and possibly less memory usage as not everything needs to be 64 bits and we now have support to reduce the size of some data structures.

Current plan

  • Backport current swipl-devel to stable (almost all can be copied)
  • Move the development series to this new code.
  • Redesign the data representation and develop a step-by-step plan to realize this.

If you see problems or opportunities to do things better/differently, please share.


This wouldn’t surprise me if WASM were actual hardware, because 64-bit-aligned memory access might be faster (128-bit-aligned might be even better, depending on how the cache is implemented).

There were quite a few changes in branch 64-bits … is a reasonable summary of the changes: “changed int to int64_t or equivalent”? (And also unsigned int to uint64_t). Presumably this breaks binary compatibility for the API, so it might be a chance to make some other changes that have been postponed due to binary compatibility issues.

Actually, this version is pretty much binary compatible. The VM instructions for (small) integers changed, so .qlf files are not compatible. The only other external API type that changed is PL_atomic_t, dealing with the somewhat dubious dict API where we want to be able to pass both atoms and tagged integers as keys. The other side effect is that PL_get_integer() no longer accepts a float that happens to be an integer. That was more a leftover from the old days when SWI-Prolog only had small tagged integers and floats that were also used for larger integers.

I’m almost ready to merge. All platforms work fine, except for the VS2022 build due to some padding issue with LibBF large integers. This is probably an old issue, but the test uses a 63 bit integer that used to be tagged and is now a big int. I’ll find that :slight_smile: The whole exercise fixed several small issues.

Nothing will change there. We will probably still write variables as relative addresses to keep the numbers small. Preserving variable ordering during GC is rather important, as without it we cannot use order-based set representations for variables. While one should be careful to avoid the ordering being changed as a result of instantiation or even variable sharing, this is a rather important feature IMO (and one that is not guaranteed by ISO AFAIK).

If I recall correctly, Scryer considered representing small atoms “inlined”. This has also been proposed by Paul Tarau. With a uniform 64 bit representation this becomes more interesting. I ran some statistics on some programs a while ago and was not convinced. I do not recall the exact figures, but it didn’t seem worthwhile to introduce the complexity of another atom representation.

For the char representation, SWI-Prolog uses an array of code-page arrays to cache the mapping from a code point to an atom. In the other direction we have to get the atom data and convert it. It is not terribly efficient, but I have no evidence that it causes serious slowdown.

Nice to know. I have no intention of going in that direction though. Floats are not of much importance in Prolog. Richard O’Keefe talked me into separating floats rigorously from integers (which was not the case in really old versions) and I think for good reason. In JavaScript we can end up with an integer that is the result of float rounding. I don’t think that is what we want.

I was only referring to NaN boxing, not to blurring the integer and float types. But it turned out that my hypothesis was wrong, since JavaScript doesn’t use NaN boxing; at least the V8 engine doesn’t use NaN boxing per se according to reddit opinions, but it seems other browsers and JavaScript engines did use NaN boxing or variants of it. On the other hand, it seems that LuaJIT uses or used NaN boxing:

LuaJIT - Design aspects of the VM

  • NaN-tagging: 64 bit tagged values are used for stack slots and
    table slots. Unboxed floating-point numbers (doubles) are
    overlayed with tagged object references. The latter can be
    distinguished from numbers via the use of special NaNs as tags.
    It’s a remote descendant of pointer-tagging.

http://lua-users.org/lists/lua-l/2009-11/msg00089.html

The benefit of NaN boxing would be that you get the float directly, without unmasking a tagged pointer. Only everything non-number has some tag bits set. Could be of interest to clpBNR.

Edit 08.04.2024
The reddit thread discusses mixing 32-bit floats and 64-bit floats as well. You can also use NaN boxing to introduce a 32-bit float type. But I don’t think SWI-Prolog has such a float type. Formerly Jekejeke Prolog had such a float type for a while, but I yeeted it. Sadly I haven’t found a more academic and scientific paper about NaN boxing yet, only reddit threads and the lua mailing list. Is there no paper?

NaN boxing seems a nice idea for applications that predominantly use floats. I don’t think Prolog is in that category. Also, SWI-Prolog uses mark-and-sweep GC, which implies we need two bits on each cell, which we cannot fit in an unboxed double. Well, we could use the two least significant bits, but @ridgeworks would get unhappy :frowning:

Finally, I don’t want to end up in the same trap as in the old days, when I thought 32 bits was plenty. Now I think 64 bits allows me to take 8 of them and be good for decades. Having only the 48 bit NaN space (if my memory serves me) for all the non-float types is tight.

Stuff that does not fit into the tagged 8 byte cell is pushed onto the stack guarded by two “guard cells”, which contain the size and type of the guarded data. So, a double on the stack is

<guard 1 cell, float> <the float> <guard 1 cell, float>

This trick is used for doubles, strings, big integers (in the new version > 57 bits) and rational numbers.

These can of course become garbage. Even compounds may be split up by GC, i.e., the compound itself is garbage, but some of its arguments remain accessible. This is all nicely described in Mats Carlsson’s TPLP paper on the SICStus garbage collector.

This doesn’t hold for tagged integers, since they usually have value semantics, not pointer semantics.

So you might have two compounds f("abc") and g("abc") where the string "abc" is shared; the f-compound might become unreachable while the g-compound stays reachable, meaning "abc" is also still reachable:

         [  f  ,  *--]--v 
                        abc 
         [  g ,   *--]--^

With value semantics there is no sharing: if you have f(123) and g(123) you don’t need to mark 123. The tagged integer is aggregated into the compound. If the compound f goes away then 123 goes away with it; there is no “split up”:

         [  f ,  123 ] 
                        
         [  g ,  123 ]

The same would happen with NaN boxing and floats. If you have 64 bit pointers you can give floats value semantics, just like short integers via tagged pointers usually have, and thus they would get aggregated into the compound:

         [  f ,  3.1415 ] 
                        
         [  g ,  3.1415 ]

Edit 10.04.2024
Interestingly, Java has refused to adopt tagged integers in the past. They did other tricks, like preallocating cached objects for the integers -128 to 127. But somehow I have the feeling this is changing now. Recently new Integer() was deprecated; one should use Integer.valueOf(). Maybe so that applications always capitalize on the cache, or as an early sign of tagging?

Have to check what is cooking. The LuaJIT reference is already old, like 10 years old. The problem is that you need special casing: invoking a method cannot assume a pointer-semantics receiver anymore; suddenly a value-semantics receiver might be around. But since SWI-Prolog already has tagged integers, I suspect there already exists some experience with this special casing of having in the same place sometimes value objects and sometimes pointer objects; the same experience could extend to floats.

Stealing two bits from the mantissa would be a mistake since that breaks IEEE rounding semantics, but I suppose you could steal them from the exponent, decreasing the total range of finite floating point values. Of course that would require additional software to mask/restore GC bits and to check any results for (62 bit) overflow, which would probably cancel out any performance advantage.

In the bigger picture, wouldn’t using the extended NaN range foreclose on anybody else (application or library) using floating point NaNs for their own purposes, whatever those might be?

So IMO you would need some compelling reasons to consider this approach.


First, you don’t need to steal any bits from the mantissa or the exponent, since in NaN boxing the non-NaN float representation is unaffected, which works if you use floats with value semantics.

NaN boxing would only be used for types other than floats. For example, if the qNaN and sNaN range is used, anything with exponent 7FF is a NaN. But the mantissa has to be non-zero:

x111 1111 1111 xxxx 
xxxx xxxx xxxx xxxx 
xxxx xxxx xxxx xxxx 
xxxx xxxx xxxx xxxx 

So the NaN space is more than 48 bits; it’s in fact 53 bits. A NaN constant is typically a qNaN; for example, JavaScript uses the qNaN below as its NaN constant, so you would reserve 7FF8000000000000. But I guess the Chrome browser doesn’t use NaN boxing, since I find that it preserves different NaNs. Try this code:

const f2b = (x) => new Uint8Array(new Float64Array([x]).buffer);
const b2f = (x) => new Float64Array(x.buffer)[0];

const b = f2b(NaN);
console.log(b);
b[1] = 255;
console.log(typeof b2f(b));
console.log(f2b(b2f(b)));

Gives me:

> Uint8Array [0, 0, 0, 0, 0, 0, 248, 127]
> "number"
> Uint8Array [0, 255, 0, 0, 0, 0, 248, 127]

Well, that’s also not 100% conclusive. It could also pretend not to use NaN boxing, like if you have a dual implementation with unwrapped and wrapped floats.

Just like you can promote small ints to big ints after a certain operation, like when a successor operation leaves the small int range, it could also switch from value semantics to pointer semantics for exotic values. This would protect against forging of NaN-boxed pointers, and would give back the full NaN range to the application programmer. But maybe this is too far fetched. Also, IEEE does not have many operations to distinguish NaNs, except if you ask for the binary representation of a float.

I understand this is still a topic under discussion. I was referring specifically to @jan’s point:

I interpreted this to mean that even float values would have to contain the two GC bits, which means that the IEEE semantics of the bits are no longer enforced.

I also understand that the NaN “domain” is significantly larger than a single value most of which is unused, at least in SWIP. The bigger question is that if that additional data space is used by SWI-Prolog for other tagged, non-float values, does that conflict with any other, application or library specific, uses of the NaN data space? And if so, does it matter? This may be the reason other platforms, e.g., Chrome, avoid this approach.

I don’t understand why anybody would even reach this conclusion. Especially, this below is not part of my proposal; it is not needed for NaN boxing that you treat floats like > 57 bit integers:

The above is the antithesis of NaN boxing. That is not what is understood by NaN boxing. That’s a total misinterpretation of NaN boxing. NaN boxing means you would treat non-NaN floats just like < 57 bit integers. The 56 bits result from subtracting 2 bits from 64 bits for GC marks, and then 6 bits from 62 bits for the type, resulting in 56 bits. Well, that doesn’t work since in NaN boxing there are only 53 bits. So if you subtract 2+6 you get 45 bits.

So you cannot make the small ints that large. Maybe you can reduce the number of bits that you use for the type, since floats now already have a type. Don’t know. You could use this encoding,

M = mark bits, and T = type bits:

M111 1111 1111 MTTT 
TTTx xxxx xxxx xxxx 
xxxx xxxx xxxx xxxx 
xxxx xxxx xxxx xxxx 

But if you want to keep 48 bit virtual memory addresses, I guess you would try to reduce the number of type bits. Not sure whether this is possible; maybe do this:

T111 1111 1111 TTMM
xxxx xxxx xxxx xxxx 
xxxx xxxx xxxx xxxx 
xxxx xxxx xxxx xxxx 

Only 3 type bits might indeed be tight. This gives only 8 types, although a simple Prolog has only 5 types (integer, float, atom, compound, variable), and you don’t need to encode float as a type; it’s just exponent =\= 7FF. And just as for floats, you would not use any M bits for small ints, since they don’t need to be marked; they are already inline in a surrounding compound. This would give 2 additional bits for small ints. But you need to code carefully so that they don’t get marked. With the additional two bits the small ints would have 50 bits.

Edit 2024.04.11
What I don’t know is whether it can get utterly complicated. Different platforms store floats with different byte orders, big-endian or little-endian. So NaN boxing might be non-trivial.