SWI-Prolog’s internal representation of Prolog data was shaped in the days of 32 bit machines. In the early days, 32 bits were plenty to address 8Mb of physical memory and keep various bits for type tagging and garbage collection. Physical memory followed Moore’s law and more types were added. The 32 bits became problematic and more and more indirect ways to detect the type were added. Concentrating on threads, where each thread has its own stack, I decided at some point to move to relative addresses, representing Prolog data as a tagged (typed) offset to the base of a denoted stack. That limited the stack size to 128Mb per stack on 32 bit systems.
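To make the relative addressing concrete, here is a minimal sketch of a 32 bit word that holds a tagged offset to a stack base. The split into 7 metadata bits and 25 offset bits, counted in 4-byte cells, is an assumption chosen because it reproduces the 128Mb figure; the names `word32`, `make_ref` and `deref_ref` are hypothetical and not taken from the SWI-Prolog sources.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t word32;                 /* one Prolog cell on a 32 bit system */

#define REF_META_BITS   7                /* ASSUMED: tag + GC bits together */
#define REF_OFFSET_BITS (32 - REF_META_BITS)
#define REF_META_MASK   ((UINT32_C(1) << REF_META_BITS) - 1)

/* Largest stack reachable by an offset counted in cells */
#define REF_MAX_STACK_BYTES \
        ((UINT64_C(1) << REF_OFFSET_BITS) * sizeof(word32))

/* Encode a reference to `cell` as a tagged offset from `base` */
static word32 make_ref(const word32 *base, const word32 *cell, unsigned meta)
{ uint32_t off = (uint32_t)(cell - base);

  assert(off < (UINT32_C(1) << REF_OFFSET_BITS));
  assert(meta <= REF_META_MASK);
  return (off << REF_META_BITS) | meta;
}

/* Recover the absolute address by adding the offset to the stack base */
static word32 *deref_ref(word32 *base, word32 w)
{ return base + (w >> REF_META_BITS);
}

/* Extract the metadata (tag and GC) bits */
static unsigned ref_meta(word32 w)
{ return (unsigned)(w & REF_META_MASK);
}
```

With these numbers, 2^25 cells of 4 bytes each give exactly 128Mb, which is where the per-stack limit comes from under this assumed layout.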
64 bit hardware allows a much simpler and more efficient data representation. The x64 architecture is (if my information is correct) capable of 48 bit virtual addresses and 52 bit physical addresses (I was rather surprised to see that the virtual address space is smaller than the physical). We are only concerned with the virtual address space, so that gives us 16 bits to use as we please without limiting the addressing capability. We only need 8: 2 for GC and 6 for the type tag are enough to avoid all the clumsy type inference we have now and keep plenty of space for extensions. I think that is a plan that should be future proof, at least for several decades.
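The bit budget above can be sketched as follows. This is only an illustration under assumptions: the post does not fix where the bits go, so placing the 2 GC bits lowest, the 6 tag bits above them and the 48 bit address above those is a hypothetical layout, as are the names `word`, `make_word` and the accessors.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t word;                   /* one 64 bit Prolog cell */

#define GC_BITS    2                     /* from the text: 2 bits for GC */
#define TAG_BITS   6                     /* from the text: 6 bits for the type */
#define ADDR_BITS  48                    /* x64 virtual address width */
#define META_BITS  (GC_BITS + TAG_BITS)  /* 8 of the 16 spare bits */

#define GC_MASK    ((UINT64_C(1) << GC_BITS) - 1)
#define TAG_MASK   ((UINT64_C(1) << TAG_BITS) - 1)
#define ADDR_MASK  ((UINT64_C(1) << ADDR_BITS) - 1)

/* Pack address, type tag and GC bits into one word (ASSUMED layout) */
static word make_word(uint64_t addr, unsigned tag, unsigned gc)
{ assert(addr <= ADDR_MASK);
  assert(tag <= TAG_MASK);
  assert(gc <= GC_MASK);
  return (addr << META_BITS) | ((word)tag << GC_BITS) | gc;
}

/* The type is read directly from the word, no indirection needed */
static unsigned word_tag(word w)
{ return (unsigned)((w >> GC_BITS) & TAG_MASK);
}

static unsigned word_gc(word w)
{ return (unsigned)(w & GC_MASK);
}

static uint64_t word_addr(word w)
{ return (w >> META_BITS) & ADDR_MASK;
}

/* Update the GC bits without touching tag or address */
static word set_gc(word w, unsigned gc)
{ return (w & ~GC_MASK) | (gc & GC_MASK);
}
```

This uses 56 of the 64 bits, so 8 bits remain free for extensions, and 6 tag bits allow up to 64 distinct types to be told apart with a single shift and mask.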
But, as @jfmc also confirmed, we cannot dispose of the 32 bit architecture yet as we have small embedded systems and WASM. A dual representation is too complicated to maintain. So, I decided to explore making all Prolog data 64 bit, also on 32 bit architectures. That is currently available as the branch 64-bits in swipl-devel.git. The awkward data representation is almost completely kept. Only the representation of full 64 bit integers without big number support has already been dropped, leaving only inline integers (now 57 bits) and using GMP/LibBF for anything that does not fit.
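As a sketch of what a 57 bit inline integer means in practice, the code below checks whether a value fits and packs it into a 64 bit word. It assumes the 57 bits are a signed two's complement value and that the remaining 7 bits hold tag and GC information in the low end of the word; both are assumptions, and `fits_inline`, `encode_inline` and `decode_inline` are hypothetical names. Anything that does not fit would go to GMP/LibBF instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INLINE_INT_BITS 57
#define INLINE_TAG_BITS (64 - INLINE_INT_BITS)   /* 7 bits, ASSUMED use */
#define INLINE_TAG_MASK ((UINT64_C(1) << INLINE_TAG_BITS) - 1)
#define INLINE_INT_MAX  ((INT64_C(1) << (INLINE_INT_BITS - 1)) - 1)
#define INLINE_INT_MIN  (-INLINE_INT_MAX - 1)

/* True if v can be stored inline; otherwise a big number is needed */
static bool fits_inline(int64_t v)
{ return v >= INLINE_INT_MIN && v <= INLINE_INT_MAX;
}

/* Store the integer in the high 57 bits, the tag in the low 7 bits */
static uint64_t encode_inline(int64_t v, unsigned tag)
{ assert(fits_inline(v));
  assert(tag <= INLINE_TAG_MASK);
  return ((uint64_t)v << INLINE_TAG_BITS) | tag;
}

/* Clear the tag bits and divide exactly, which preserves the sign */
static int64_t decode_inline(uint64_t w)
{ return (int64_t)(w & ~INLINE_TAG_MASK) / (INT64_C(1) << INLINE_TAG_BITS);
}
```

Under these assumptions the inline range is -2^56 to 2^56-1, so only integers beyond roughly 7.2e16 in magnitude need the big number library.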
On 64 bit systems there is at the moment very little visible impact. It is a little (barely measurable) faster. On 32 bit systems this means that stacks are limited only by system memory rather than by the 128Mb bound. Of course, it requires twice as much stack space. Program size is barely affected as VM instructions are still 32 bits. Some data structures now use 64 bit instead of 32 bit values. As is, I think this mostly affects tabling, although it is probably possible to reduce the table size again. Performance on 32 bit Windows has decreased a bit while, somewhat unexpectedly, performance on WASM has improved a bit. I think the latter is more interesting.
I think the experiment can be considered a success and this is a viable route to simplify the data representation. On 64 bit systems this should allow for better performance and possibly less memory usage as not everything needs to be 64 bits and we now have support to reduce the size of some data structures.
Current plan
- Backport current swipl-devel to stable (almost all can be copied).
- Move the development series to this new code.
- Redesign the data representation and develop a step-by-step plan to realize this.
If you see problems or opportunities to do things better or differently, please share.