Cooking: internal data representation

SWI-Prolog’s internal representation of Prolog data was shaped in the days of 32 bit machines. In the early days, 32 bits were plenty to address 8Mb of physical memory and keep various bits for type tagging and garbage collection. Physical memory followed Moore’s law and more types were added. The 32 bits became problematic and more and more indirect ways to detect the type were added. Concentrating on threads, where each thread has its own stack, I decided at some point to move to relative addresses, representing Prolog data as a tagged (typed) offset to the base of the denoted stack. That limited the stack size to 128Mb per stack on 32 bit systems.

64 bit hardware allows a much simpler and more efficient data representation. The x64 architecture is (if my information is correct) capable of 48 bit virtual addresses and 52 bit physical addresses (I was rather surprised to see the virtual address space is smaller than the physical one). We are only concerned with the virtual address space, so that gives us 16 bits to use as we please without limiting the addressing capability. We only need 8: 2 for GC and 6 for the type tag, which is enough to avoid all the clumsy type inference we have now and keep plenty of space for extensions. I think that is a plan that should be future proof, at least for several decades.
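
To make this concrete, here is a minimal sketch of such a layout. It is not the actual VM encoding; the names and tag codes are made up. The idea is simply that the top byte of a 64 bit cell carries 2 GC bits and a 6 bit type tag, while the remaining bits comfortably hold a value or a 48 bit virtual address.

```c
/* Minimal sketch, NOT the actual SWI-Prolog encoding: names and tag codes
 * are made up.  Top byte of a 64 bit cell: 2 GC bits + 6 bit type tag. */
#include <stdint.h>

typedef uint64_t word;                         /* one Prolog cell */

#define TAG_SHIFT   56                         /* type tag in bits 56..61 */
#define GC_SHIFT    62                         /* GC mark bits in 62..63 */
#define VALUE_MASK  ((UINT64_C(1) << TAG_SHIFT) - 1)

#define TAG_VAR      0x01                      /* hypothetical type codes */
#define TAG_ATOM     0x02
#define TAG_SMALLINT 0x03

static inline word
mk_tagged(unsigned tag, uint64_t value)
{ return ((word)tag << TAG_SHIFT) | (value & VALUE_MASK);
}

static inline unsigned
tag_of(word w)
{ return (unsigned)((w >> TAG_SHIFT) & 0x3f);  /* 6 bit tag */
}

static inline uint64_t
value_of(word w)
{ return w & VALUE_MASK;
}
```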

But, as @jfmc also confirmed, we cannot dispose of the 32 bit architecture yet as we have small embedded systems and WASM. A dual representation is too complicated to maintain. So, I decided to explore making all Prolog data 64 bit, also on 32-bit architectures. That is currently available as the branch 64-bits in swipl-devel.git. The awkward data representation is almost completely kept. Only the representation of full 64 bit integers without big number support has been dropped, leaving only inline integers (now 57 bits) and using GMP/LibBF for anything that does not fit.

On 64 bit systems there is at the moment very little visible impact. It is a little (barely measurably) faster. On 32 bit systems it means the stacks are only limited by system memory rather than 128Mb. Of course, it requires twice as much stack space. Program size is barely affected as VM instructions are still 32 bits. Some data structures now use 64 bit instead of 32 bit values. As is, I think this mostly affects tabling, though it is probably possible to reduce the table size again. Performance on 32 bit Windows has decreased a bit while, somewhat unexpectedly, performance on WASM has improved a bit. I think the latter is more interesting.

I think the experiment can be considered a success and this is a viable route to simplify the data representation. On 64 bit systems this should allow for better performance and possibly less memory usage as not everything needs to be 64 bits and we now have support to reduce the size of some data structures.

Current plan

  • Backport current swipl-devel to stable (almost all can be copied)
  • Move the development series to this new code.
  • Redesign the data representation and develop a step-by-step plan to realize this.

If you see problems or opportunities to do things better/differently, please share.

This wouldn’t surprise me if WASM were actual hardware, because 64-bit-aligned memory access might be faster (128-bit-aligned might be even better, depending on how the cache is implemented).

There were quite a few changes in branch 64-bits … is a reasonable summary of the changes: “changed int to int64_t or equivalent”? (And also unsigned int to uint64_t.) Presumably this breaks binary compatibility for the API, so it might be a good moment to make some other changes that have been postponed due to binary compatibility issues.

Actually, this version is pretty much binary compatible. The VM instructions for (small) integers changed, so .qlf files are not compatible. The only other external API type that changed is PL_atomic_t, dealing with the somewhat dubious dict API where we want to be able to pass both atoms and tagged integers as keys. The other side effect is that PL_get_integer() no longer accepts a float that happens to be an integer. That was more a leftover from the old days when SWI-Prolog only had small tagged integers and floats were also used for larger integers.
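
For foreign code that relied on the old lenient behaviour, a small workaround sketch: the helper name below is made up, it just combines the existing PL_get_integer() and PL_get_float() calls and does the rounding check explicitly.

```c
/* Sketch of a workaround for code that relied on PL_get_integer() also
 * accepting a float with an integral value.  Helper name is hypothetical. */
#include <SWI-Prolog.h>
#include <limits.h>
#include <math.h>

static int
get_int_or_integral_float(term_t t, int *i)
{ double d;

  if ( PL_get_integer(t, i) )                  /* a (small) integer */
    return TRUE;
  if ( PL_get_float(t, &d) &&                  /* a float with an integral value */
       d == floor(d) && d >= INT_MIN && d <= INT_MAX )
  { *i = (int)d;
    return TRUE;
  }

  return FALSE;
}
```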

I’m almost ready to merge. All platforms work fine, except for the VS2022 build due to some padding issue with LibBF large integers. This is probably an old issue, but the test uses a 63 bit integer that used to be tagged, but is now a big int. I’ll find that :slight_smile: The whole exercise fixed several small issues.

Nothing will change there. We will probably still write variables as relative addresses to keep the numbers small. Preserving variable ordering during GC is rather important, as without it we cannot use order-based set representations for variables. While one should be careful to avoid the ordering changing as a result of instantiation or even variable sharing, this is a rather important feature IMO (and one that is not guaranteed by ISO AFAIK).
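
As an illustration only (the encoding below is assumed, not the real one): a variable represented as a tagged offset from the stack base keeps the number small, and comparing two variables then simply compares their offsets, so the standard order stays stable as long as GC preserves the relative positions of the cells it moves.

```c
/* Illustration, assumed encoding: a variable as a tagged offset from the
 * stack base; ordering follows stack position. */
#include <stdint.h>

typedef uint64_t word;

#define TAG_VAR    0x01
#define TAG_SHIFT  56
#define VALUE_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

extern word *stack_base;                       /* base of the denoted stack */

static word
make_var_ref(word *cell)
{ return ((word)TAG_VAR << TAG_SHIFT) | ((word)(cell - stack_base) & VALUE_MASK);
}

static int
compare_vars(word v1, word v2)                 /* <0, 0 or >0, like strcmp() */
{ uint64_t o1 = v1 & VALUE_MASK;
  uint64_t o2 = v2 & VALUE_MASK;

  return (o1 > o2) - (o1 < o2);
}
```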

If I recall correctly, Scryer consider(s/ed) representing small atoms “inlined”. This has also been proposed by Paul Tarau. With a uniform 64 bit representation this becomes more interesting. I ran some statistics on some programs a while ago and was not convinced. I do not recall the exact figures, but it didn’t seem worthwhile to introduce the complexity of another atom representation.

For the char representation, SWI-Prolog uses an array of code-page arrays to cache the mapping from a code point to an atom. In the other direction we have to get the atom data and convert it. It is not terribly efficient, but I have no evidence that it causes serious slowdown.
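
For illustration, a sketch of such a two-level cache. The layout and the make_char_atom() constructor are assumptions, not the actual source, and a real implementation would also need locking or atomic updates for threaded use.

```c
/* Sketch of a two-level code-page cache mapping a code point to its
 * one-character atom; layout and constructor are assumed. */
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t atom_t;                      /* stand-in for the atom handle */

#define PAGE_BITS 8
#define PAGE_SIZE (1 << PAGE_BITS)
#define NUM_PAGES (0x110000 >> PAGE_BITS)      /* covers all Unicode code points */

static atom_t *char_atom_pages[NUM_PAGES];     /* lazily allocated code pages */

extern atom_t make_char_atom(int code);        /* assumed one-char atom constructor */

static atom_t
char_to_atom(int code)
{ atom_t *page = char_atom_pages[code >> PAGE_BITS];
  int i = code & (PAGE_SIZE - 1);

  if ( !page )
  { if ( !(page = calloc(PAGE_SIZE, sizeof(atom_t))) )
      return (atom_t)0;                        /* out of memory */
    char_atom_pages[code >> PAGE_BITS] = page;
  }
  if ( !page[i] )
    page[i] = make_char_atom(code);

  return page[i];
}
```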

Nice to know. I have no intention of going in that direction though. Floats are not of much importance in Prolog. Richard O’Keefe talked me into separating floats rigorously from integers (which was not the case in really old versions) and I think for a good reason. In JavaScript we can end up with an integer that is the result of float rounding. I don’t think that is what we want.

NaN boxing seems a nice idea for applications that predominantly use floats. I don’t think Prolog is in that category. Also, SWI-Prolog uses mark-and-sweep GC, which implies we need two bits on each cell, which we cannot fit in an unboxed double. Well, we could use the two least significant bits, but @ridgeworks would get unhappy :frowning:

Finally, I don’t want to end up in the same trap as in the old days when I thought 32 bits was plenty. Now I think 64 allows me to take 8 of them and be good for decades. Having only the 48 bit NaN space (if my memory serves me) for all the non-float types is tight.
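
For readers unfamiliar with the technique, a small illustrative sketch of NaN boxing (not something SWI-Prolog does): non-float values are hidden in the payload of a quiet NaN. With the exponent and quiet bit fixed, only about 51 mantissa bits remain, of which 48 are used below for a pointer and 3 for a tag. That is exactly the tightness mentioned above; a real scheme would also have to canonicalise incoming NaNs and deal with the sign bit.

```c
/* Illustrative NaN boxing: non-float values live in a quiet NaN payload. */
#include <stdint.h>
#include <string.h>

#define QNAN_BITS UINT64_C(0x7ff8000000000000) /* exponent all ones + quiet bit */
#define PAYLOAD   UINT64_C(0x0000ffffffffffff) /* low 48 bits */

static uint64_t
box_pointer(void *p, unsigned tag3)            /* tag3 in 0..7, stored in bits 48..50 */
{ return QNAN_BITS | ((uint64_t)tag3 << 48) | ((uintptr_t)p & PAYLOAD);
}

static int
is_boxed(uint64_t w)                           /* boxed value vs. a genuine double */
{ return (w & QNAN_BITS) == QNAN_BITS && (w & ~QNAN_BITS) != 0;
}

static void *
unbox_pointer(uint64_t w)
{ return (void*)(uintptr_t)(w & PAYLOAD);
}

static double
as_double(uint64_t w)                          /* reinterpret the cell as a double */
{ double d;
  memcpy(&d, &w, sizeof d);
  return d;
}
```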

Stuff that does not fit into the tagged 8 byte cell is pushed onto the stack guarded by two “guard cells”. They contain the size and type of the guarded bits. So, a double on the stack is

<guard 1 cell, float> <the float> <guard 1 cell, float>

This trick is used for doubles, strings, big integers (in the new version > 57 bits) and rational numbers.
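
A minimal sketch of how pushing such an indirect double could look. The guard encoding (size and type packed in one word) is assumed, not the actual SWI-Prolog layout.

```c
/* Sketch: push a double between two guard cells; guard encoding is assumed. */
#include <stdint.h>
#include <string.h>

typedef uint64_t word;

#define TAG_FLOAT_GUARD 0x10                   /* hypothetical type code */

static word *top;                              /* assumed to point into a large
                                                  enough stack segment */

static word *
push_double(double d)
{ word guard = ((word)1 << 8) | TAG_FLOAT_GUARD;  /* size = 1 cell, type = float */
  word *start = top;

  *top++ = guard;                              /* leading guard cell */
  memcpy(top, &d, sizeof d);                   /* the raw 64 bit float */
  top++;
  *top++ = guard;                              /* trailing guard cell */

  return start;                                /* reference to the boxed float */
}
```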

These can of course become garbage. Even compounds may be split up by GC, i.e., the compound itself is garbage, but some of its arguments remain accessible. This is all nicely described in Mats Carlsson’s TPLP paper on the SICStus garbage collector.

Stealing two bits from the mantissa would be a mistake since that breaks IEEE rounding semantics, but I suppose you could steal them from the exponent, decreasing the total range of finite floating point values. Of course that would require additional software to mask/restore the GC bits and to check any results for (62 bit) overflow, which would probably negate any performance advantage.

In the bigger picture, wouldn’t using the extended NaN range prevent anybody else (application or library) from using floating point NaNs for their own purposes, whatever those might be?

So IMO you would need some compelling reasons to consider this approach.

I understand this is still a topic under discussion. I was referring specifically to @jan’s point:

I interpreted this to mean that even float values would have to contain the two GC bits, which means that the IEEE semantics of those bits are no longer enforced.

I also understand that the NaN “domain” is significantly larger than a single value, and that most of it is unused, at least in SWIP. The bigger question is: if that additional data space is used by SWI-Prolog for other tagged, non-float values, does that conflict with any other, application- or library-specific, uses of the NaN data space? And if so, does it matter? This may be the reason other platforms, e.g., Chrome, avoid this approach.