Ann: SWI-Prolog 9.3.14

While my original post started with a ChatGPT lead, where I
proposed a rehash threshold of 0.75. I have a new ChatGPT
lead for what was done in release 9.3.14:

  • No Linked Lists for Overflow: Unlike separate chaining, where multiple elements can hash to the same index by storing them in a linked list, open addressing has to resolve collisions within the array itself. This creates a scenario where even a single collision forces a probe sequence, leading to more potential conflicts along the way.

  • Cluster Formation: As more elements are added, clusters (groups of filled slots) can form, especially in linear probing. These clusters grow, making future insertions or lookups within these clusters more prone to further collisions. This effect is called primary clustering and significantly impacts performance as the load factor approaches 1.

We see this, the new implementation is slower. My code snippet
with keeping separate chaining would have improved the 2.204 seconds.
Now the open adressing slows down, needs more inferences:

/* SWI-Prolog 9.3.13 */
?- time(test3).
% 34,383,478 inferences, 2.141 CPU in 2.204 seconds (97% CPU, 16062355 Lips)
true.

/* SWI-Prolog 9.3.14 */
?- time(test3).
% 39,694,226 inferences, 2.609 CPU in 2.657 seconds (98% CPU, 15212158 Lips)
true.

Whether the present open addressing is more memory savy I don’t
know. One surely spares the separate chaining. On the other hand
the new implementation uses a rehash threshold of 0.5,

making the primary table bigger. Test case:

distgen.p.log (842 Bytes)