Load Factor in SWI-Prolog library(nb_set)

Just out of curiosity, I followed up on this lead:

  • Low Load Factor (< 0.5): Reduces collisions, so lookups and insertions are faster but at the cost of more memory overhead. Hash tables are less densely packed, making the average lookup time closer to O(1).

  • High Load Factor (> 0.7): Increases collisions, especially in hash tables with open addressing or linear probing, which can degrade lookup and insertion to O(n) in the worst case. It saves memory but may require more rehashing as the table fills up.
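In terms of library(nb_set), the load factor is simply the number of stored elements divided by the number of hash buckets. The helper below is not part of the library; it is only a small inspection sketch that assumes the internal representation also used by the experiment further down (bucket array in argument 1, element count in argument 2):

load_factor(Set, LF) :-
    arg(1, Set, Buckets),                       % argument 1: bucket array
    compound_name_arity(Buckets, _, BCount),    % number of buckets
    arg(2, Set, Size),                          % argument 2: element count
    LF is Size / BCount.

For example, 3 elements spread over 4 buckets give a load factor of 0.75.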

So I made a little test with SWI-Prolog library(nb_set):

/* Load Factor 1.0, Current Implementation */
?- time((between(1,1000,_), test, fail; true)).
% 34,150,149 inferences, 1.687 CPU in 1.744 seconds (97% CPU, 20237125 Lips)
true.

/* Load Factor 0.75, Experiment Below */
?- time((between(1,1000,_), test2, fail; true)).
% 30,188,706 inferences, 1.578 CPU in 1.641 seconds (96% CPU, 19129477 Lips)
true.

The test cases were random insertions:

test :-
   empty_nb_set(X),
   (   between(1,1000,_),
       random(Y), Z is floor(Y*1000),   % random integer key in 0..999
       add_nb_set(Z, X), fail
   ;   true
   ).

test2 :-
   empty_nb_set(X),
   (   between(1,1000,_),
       random(Y), Z is floor(Y*1000),   % random integer key in 0..999
       add_nb_set2(Z, X), fail
   ;   true
   ).
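As a back-of-the-envelope check: each run draws 1000 keys uniformly from 0..999, so the expected number of distinct keys per run is 1000 * (1 - (999/1000)^1000) ≈ 632, i.e. each set ends up holding roughly 630 elements.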

The experimental load factor of 0.75 was implemented with:

% Like add_nb_set/2,3 from library(nb_set), but rehashing at load factor 0.75.
add_nb_set2(Key, Set) :-
    add_nb_set2(Key, Set, _).
add_nb_set2(Key, Set, New) :-
    arg(1, Set, Buckets),
    compound_name_arity(Buckets, _, BCount),
    nb_set:hash_key(Key, BCount, Hash),   % internal hash helper of library(nb_set)
    arg(Hash, Buckets, Bucket),
    (   member(X, Bucket),
        Key =@= X
    ->  New = false
    ;   New = true,
        duplicate_term(Key, Copy),
        nb_linkarg(Hash, Buckets, [Copy|Bucket]),
        arg(2, Set, Size0),
        Size is Size0+1,
        nb_setarg(2, Set, Size),
        (   4*Size > 3*BCount              % rehash once Size/BCount exceeds 0.75
        ->  nb_set:rehash(Set)
        ;   true
        )
    ).

A little clue, at least. Thinking about it, I suspect that a closed table implementation would probably be both faster and use less memory.
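Assuming "closed table" here means closed hashing (open addressing), a minimal sketch of the idea could look like the code below. The names empty_oa_set/1 and add_oa_set/2 and the '$empty' free-slot marker are made up for illustration; the sketch uses linear probing over a flat compound and omits rehashing and deletion (it loops if the table fills up):

/* Sketch of a closed (open addressing) table with linear probing.
   Not a drop-in replacement for library(nb_set). */

empty_oa_set(oa_set(Slots, 0)) :-
    length(L, 32),
    maplist(=('$empty'), L),         % 32 free slots
    Slots =.. [slots|L].

add_oa_set(Key, Set) :-
    arg(1, Set, Slots),
    compound_name_arity(Slots, _, N),
    term_hash(Key, H),
    I0 is (H mod N) + 1,             % home slot
    probe(I0, N, Slots, Key, Set).

probe(I, N, Slots, Key, Set) :-
    arg(I, Slots, Slot),
    (   Slot == '$empty'             % free slot: store the key here
    ->  nb_setarg(I, Slots, Key),    % nb_setarg/3 copies Key for us
        arg(2, Set, Size0),
        Size is Size0 + 1,
        nb_setarg(2, Set, Size)
    ;   Slot =@= Key                 % already in the set
    ->  true
    ;   I1 is (I mod N) + 1,         % linear probing: try the next slot
        probe(I1, N, Slots, Key, Set)
    ).

Each element then occupies one slot of the flat compound instead of a list cell per bucket entry, which is presumably where the memory saving would come from.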