Greetings, everybody! I would like to ask if you know of any library anywhere that has got efficient (amortized constant time) Hash Tables. I am in dire need of a Hash Table for SWI Prolog, and it will contain 10k-700k key-value mappings. (of course, I will provide a hash function since keys are going to be tuples of ints: thinking of implementing those by 123’/'123 e.g.)
Having tried the library(assoc), I can tell that its O(logn) performance with those big constants isn’t favoring my application unfortunately… Therefore, speed is of utmost importance to me, and backtrackability is not (in this case) for me. I would prefer in this part of my code destructive modifications and as fast as possible.
Are you aware of any such implementations somewhere, or how I could make this work?
Note: I have also searched the newly proposed dict structure, and, as the documentation on its own states, it is totally inappropriate for speed needs…
In that case, the first solution is to use a dynamic predicate and let clause indexing do the work. That has no problems with millions of clauses and gives excellent performance.
10 ?- kv_get_test(100 000).
% 599,999 inferences, 0.119 CPU in 0.119 seconds (100% CPU, 5049332 Lips)
true.
That is 1.2 µsecs per lookup, and that is INCLUDING generating a random number for each lookup!!! I wouldn’t look any further- thanks to Jan for such a great work
UPDATE: If I take the random number generation out, I am getting about 190 NANO-SECONDS per loopkup!! (I7 core). Hard to beat that.
assoc is a AVL (balancing binary tree) that requires log(N) calls to compare/3 and accompanying insert/re-balance operations. Compare that to clause index hash tables that compute that hash key from the value and do an O(1) lookup (provided a good distribution). Finally, clause indexing is in C and library(assoc) in Prolog.
No. The database is not backtrackable.
It is not very hard to write a hash table in Prolog that manages its data on the stacks. Several people have done so. library(nb_set) implements a non-backtrackable hash table. This implementation is easily changed to be backtrackable. It should also be pretty straightforward to write the whole thing in C and gain some extra performance. As is, we get this (distinct/2 uses nb_sets and spends most of its time there).
In what way would such a hash table be (non) functionality different from AVLs – what is the key benefit of having an AVL as hash table – is it access time – but, i guess, that would also be improved in the “naive” implementation due to, for example, better indexing.
Access time for a hash table is typically better, but it is hard to define hash table update operations as a relation. This requires a copy of the main array, whereas updating a tree requires copying log(N) small terms. In other words, hash tables only make sense when implemented as a destructively updating data structure and this isn’t very Prolog minded.
distinct/1,2 plays a fundamentally different role: it modifies an answer stream without buffering the whole thing before producing the first answer. That property is quite interesting if you are not interested in all answers. As a consequence distinct answers are defined as the store not containing a variant. I doubt that can be avoided.
It just passes on the first answer for which it doesn’t have a variant in its hash table. Possibly it should unify the answer with the created copy. I think that complicates the code quite a bit though and I’m not 100% sure that is better or worse. The code is pretty simple and just ?- edit(distinct). should show it.