New hashtable library

With the new library(hashtable) and the following code:

hashtable_gen(N) :-
   ht_new(HT),
   forall(  between(1,N,I),
            (  ht_put(HT,I,I)
	    )
   ),
   assertz(hashtable(HT)).

hashtable_lookup(N) :-
   hashtable(HT),
   forall(  between(1,N,I),
            (  ht_get(HT,I,I)
	    )
   ).

hashtable_cleanup(N) :-
   hashtable(HT),
   forall(  between(1,N,I),
            (  ht_del(HT,I,I)
	    )
   ),
   retractall(hashtable(_)).

I get the following error:

1 ?- hashtable_gen(100).
true.

2 ?- hashtable_lookup(100).
ERROR: Arithmetic: evaluation error: `zero_divisor'
ERROR: In:
ERROR:   [12] _62620 is 8085836 mod 0
ERROR:   [11] hashtable:ht_get(ht(0,0,[](_62674)),1,1) at /home/u/tmp/swipl-devel/build.release/home/library/hashtable.pl:264
ERROR:   [10] hashtable_lookup(100) at /home/u/bld/prolog/tmp/db_bench.pl:170
ERROR:    [9] <user>
3 ?- hashtable_cleanup(100).
false.

4 ?- 

Also, how do you delete the entire hashtable? We have a predicate for deleting keys (ht_del/3), but how do we delete the entire hashtable?

I was planning to look at this library more carefully, but now in a hurry:

I don’t think you should be using forall/2 for this. This is a backtrackable data structure?

So, if you use ht_pairs/2 instead:

:- use_module(library(hashtable)).

:- dynamic hashtable/1.

hashtable_gen(N) :-
    numlist(1, N, Xs),
    pairs_keys_values(Pairs, Xs, Xs),
    ht_pairs(HT, Pairs),
    assertz(hashtable(HT)).

hashtable_lookup(N) :-
    hashtable(HT),
    forall(between(1, N, X),
           ht_get(HT, X, X)).

and now:

?- hashtable_gen(100).
true.

?- hashtable_lookup(100).
true.

But I don’t like this code either, because I don’t see a good reason to “save” the hashtable like this.

Also, why “clean” the hashtable yourself? Just throw it away and make a new one seems easy enough? (SWI-Prolog is garbage collected, right?)

If you have saved it like you did, I think the retractall/2 should be good enough to get rid of it.

Disclaimer: this is now a bit in a hurry, I am almost certainly overlooking something.

1 Like

OK, I think I was too hasty. If you actually want to test if you can add 100 values one by one, you’d have to write more “procedural” code, like this:

hashtable_gen(N) :-
    ht_new(HT),
    hashtable_add_n(N, HT),
    assertz(hashtable(HT)).

hashtable_add_n(0, _) :- !.
hashtable_add_n(N, HT) :-
    ht_put(HT, N, N),
    succ(N0, N),
    hashtable_add_n(N0, HT).

@jan or whoever else can help: I admit I do not quite understand the code snippet in the docs to ht_put/5.

This:

ht_put_list(HT, Key, Value) :-
    ht_put(HT, Key, [Value|Tail], [], Tail).

It says, “This can be used to bootstrap managing a list of values”. Do you mean a list of values associated with a key? So you add a new value to the front of the list already associated with this key?

Yes. I don’t even know how to use foreach/2 to emulate calling ht_put/3 N times in a row. How would you do it?

PS: the way I see it, forall/2 is the leakiest possible abstraction for \+ (Condition, \+ Action) (and it is not advertised as anything more than that!).

It also seems that there could be a helper predicate for adding a list of Key-Value pairs to an already existing hashtable (does it exist already? didn’t find it). Something like:

ht_put_list(HT, List) :-
    maplist(ht_put_pair(HT), List).

ht_put_pair(HT, Key-Value) :-
    ht_put(HT, Key, Value).

The semantics are up for discussion though… what should happen with duplicate keys? Replace? Fail? Add to a list? Maybe this is why it is “missing” :slight_smile:

I don’t think you want to assert hash tables in general.

This is a bug in ht_get/3. Pushed a fix. Thanks!

The docs say so :slight_smile: It is a term, just like library(assoc(, etc.

Yes. More in general, you compose a new aggregated value from the already existing one or the IfNew argument if there is no existing one. You can also use this to count, as in

ht_put(HT, Key, New, 0, Old),
New is Old+1.

These loops are from Joachim Schimpf (ECLIPSe). I have an implementation based on the original paper (which is also used by SICStus). I played a bit and in general got quite confused with several corner cases. ECLIPSe seems to have improved on that later. The current code is quite deeply depending on ECLIPSe though, so I left it for now. I’m not against logical loops, but mostly as a portability helper.

I thought it was as simple as this, but this does not work because foreach/2 copies the hash table.

?- ht_new(HT), foreach(between(1, 10, X), ht_put(HT, X, X)).

Not sure. As you state, there are several possible variations. These are dealth with using the ht_put variations. We do not really want to duplicate this. Also the much older library(assoc) has translation from pairs both ways, but no incremental adding of a list.

2 Likes

library(yall) translates if the expression is found and translated by goal_expansion/2 and uses copy_term/2 otherwise. So, the semantics when dealing with destructive assignment in terms is rather poorly defined. Hash tables, being terms subject to destructive assignment, should be handled with care wrt. many high level constructs, some of which making implicit copies.

Yes, I agree that for application code this is true. The code snippet I showed was simply for a micro-benchmark that I did for different data structures. and I wanted to add hashtables to it.

I thought it was a bug, thanks for the quick fix!

EDIT: Is there a predicate to completely remove the hashtable? i.e. the opposite of ht_new/1?

With the same code as above, now (after the fix) I am getting:

1 ?- hashtable_gen(100).
true.

2 ?- hashtable_lookup(100).
false.

3 ?- hashtable_cleanup(100).
false.

I didn’t expect line 2 and 3 to fail. Am I missing something?

You cannot use forall/2 for this. You also cannot use foreach/2. You need to do the loop yourself, for example (quoting myself ;-):


Try the following, to see what happens:

?- ht_new(HT), ht_put(HT, a, 1).

and now try as if it was inside forall/2:

?- ht_new(HT), \+ ( \+ ht_put(HT, a, 1) ).
1 Like

You shouldn’t need to do that. SWI-Prolog has garbage collection.

If you really insist, you could probably do something like:

setarg(1, HT, 0),
setarg(2, HT, 0),
setarg(3, HT, [](_)).
1 Like

Thanks, that clears it up for me; although I didn’t expect it givent that HTs are mutable structures.

It is just a Prolog term like all others and thus, when using threads, any thread gets a copy and any subsequent modification in the thread only applies to this specific copy.

Mutable, but backtrackable. This is the difference between setarg/3 and nb_setarg/3 (“nb” stands for “non-backtrackable”). You should read the whole section probably, “Non-logical operations on terms”.

I had myself tried to read it carefully more than once and didn’t quite “get it” for a long time.

It would be nice if there were a consistent set of predicates for
hashtable
dict
assoc
rbtree

For example, rbtree lookup has the key as the 2nd arg; the others have it as the 1st. (I prefer it as the 2nd because that plays nicer with predicates like maplist/3). Also, each of these libraries has a slightly different way of iterating or backtracking over all values. And little details, such as dict_create/3 is probably redundant. Etc.

Also: any benchmarks on hashtable versus assoc/rbtree?

Database micro-benchmark

Ahh I see, this is what threw me off. I had a mental model that the hashtable was a blob object produced in C and it had to be “released” somehow.

Indeed we get inconsistent results with the following code:

ht_put1(HT,I) :-
   ht_put(HT,I,I).

ht_get1(HT,I) :-
   ht_get(HT,I,I).

ht_del1(HT,I) :-
   ht_del(HT,I,I).

loop(_,0) :- !.
loop(Goal,N) :-
   N > 0,
   call(Goal,N),
   N1 is N - 1,
   loop(Goal,N1).

Query:

15 ?- ht_new(HT), loop(ht_put1(HT),10), aggregate_all(count,ht_gen(HT,K,V),Cnt).
HT = ht(10, 32, [](_12110, _12112, _12114, _12116, _12118, _12120, _12122, _12124, _12126, _12128, _12130, _12132, _12134, _12136, _12138, _12140, _12142, _12144, _12146, _12148, _12150, _12152, _12154, _12156, 7, 7, 1, 1, 9, 9, _12170, _12172, 2, 2, _12178, _12180, 8, 8, _12186, _12188, 3, 3, _12194, _12196, _12198, _12200, _12202, _12204, 10, 10, 4, 4, _12214, _12216, 6, 6, _12222, _12224, _12226, _12228, 5, 5, _12234, _12236, _12238)),
Cnt = 10.

15 ?- ht_new(HT), loop([I]>>ht_put(HT,I,I),10), aggregate_all(count,ht_gen(HT,K,V),Cnt).
HT = ht(0, 0, [](_15340)),
Cnt = 0.

So the new hashtable library doesn’t work with library(yall).

EDIT: IMHO, this makes it less useful, and would cause hard to debug problems if one is not familiar with the intricacies of the new hashtable library.

Dear @peter.ludemann,

I have updated the micro-benchmark code to support hashtables and these are the results (for version 8.3.4-11-g1db629e24):

          Database raw and crude micro benchmark           
                     1,000,000 entries                     
Database  Operation                              Wall time
--------  ---------                              ---------
nb        1st lookup  .......................... 0.137 secs.
trie      1st lookup  .......................... 0.203 secs.
rec       1st lookup  .......................... 0.262 secs.
consult   1st lookup  .......................... 0.316 secs.
asrt      1st lookup  .......................... 0.801 secs.
hashtable 1st lookup  .......................... 2.317 secs.

nb        2nd lookup  .......................... 0.135 secs.
trie      2nd lookup  .......................... 0.201 secs.
rec       2nd lookup  .......................... 0.259 secs.
asrt      2nd lookup  .......................... 0.301 secs.
consult   2nd lookup  .......................... 0.309 secs.
hashtable 2nd lookup  .......................... 2.361 secs.

nb        insert      .......................... 0.236 secs.
trie      insert      .......................... 0.267 secs.
rec       insert      .......................... 0.294 secs.
asrt      insert      .......................... 0.432 secs.
hashtable insert      .......................... 7.498 secs.
consult   insert      .......................... 23.479 secs.

However, the hastable numbers are skewed because I couldn’t use forall/2 and had to use a manual loop due to the backtracking nature of the hashtable library. The real numbers are probably better.

The full benchmark code is here:

Micro-Benchmark code
:- dynamic keydb/2,
           trie/1,
	   consult_db/2,
	   consult_file/1,
	   hashtable/1,
	   db_bench/2.

:- use_module(library(assoc)).
:- use_module(library(apply_macros)).

go :-
   writeln('Database raw and crude micro benchmark'),
   N = 1 000 000,
   format('   -> Testing ~:d entries~n',[N]),
   retractall(db_bench(_,_)),
   test_db(rec,N),
   test_db(nb,N),
   test_db(asrt,N),
   test_db(consult,N),
   test_db(trie,N),
   test_db(hashtable,N),
   print_header(N),
   print_results.


test_db(Name,N) :-
   atomic_list_concat([Name,'_',gen],Gen),
   atomic_list_concat([Name,'_',lookup],Lookup),
   atomic_list_concat([Name,'_',cleanup],Cleanup),

   setup_call_cleanup(
      (  writeln(Gen),
         benchmark_goal( [Name,insert], call(Gen,N))
      ),
      (  writeln(Lookup),
         benchmark_goal( [Name,'1st lookup'], call(Lookup,N)),
         write(Lookup), writeln(' (2nd time)'),
         benchmark_goal( [Name,'2nd lookup'], call(Lookup,N))
      ),
      (  catch( call(Cleanup,N),
                error(existence_error(procedure,_),_),
	        true)
      )
   ) .




                         /**********************
                          *    Db operations   *
                          **********************/
rec_gen(N) :-
   forall(  between(1,N,I),
            (  recordz(I, I)
	    )
   ).

rec_lookup(N) :-
   forall(  between(1,N,I),
            (  recorded(I, I)
	    )
   ).

rec_cleanup(N) :-
   forall(  between(1,N,I),
            (  recorded(I, I, Ref), erase(Ref)
	    )
   ).



nb_gen(N) :-
   forall(  between(1,N,I),
            %(  atom_number(A,I), nb_setval(A, I)
            (  nb_setval('500', I)
	    )
   ).

nb_lookup(N) :-
   forall(  between(1,N,_),
            (  nb_getval('500', _)
	    )
   ).

nb_cleanup(_) :-
   nb_delete('500').



asrt_gen(N) :-
   %retractall(keydb(_,_)),
   forall(  between(1,N,I),
            (  assertz(keydb(I,I))
	    )
   ).

asrt_lookup(N) :-
   forall(  between(1,N,I),
            (  keydb(I,I)
	    )
   ).

asrt_cleanup(_) :-
   retractall(keydb(_,_)).



trie_gen(N) :-
   trie_new(Trie),
   assertz(trie(Trie)),
   forall(  between(1,N,I),
            (  trie_insert(Trie,I,I)
	    )
   ).

trie_lookup(N) :-
   trie(Trie),
   forall(  between(1,N,I),
            (  trie_lookup(Trie,I,I)
	    )
   ).

trie_cleanup(_) :-
   trie(Trie),
   trie_destroy(Trie),
   retractall(trie(_)).



consult_gen(N) :-
   File = '/tmp/consult_db.db_bench',
   assertz(consult_file(File)),
   setup_call_cleanup(
      open(File,write,Stream),
      consult_write_terms(Stream,N),
      close(Stream)
   ),
   load_files([File]).

consult_write_terms(Stream,N) :-
   forall(  between(1,N,I),
            (  format(Stream,'~w.~n',[consult_db(I,I)])
	    )
   ).

consult_lookup(N) :-
   forall(  between(1,N,I),
            (  consult_db(I,I)
	    )
   ).

consult_cleanup(_) :-
   consult_file(F),
   retractall(consult_file(_)),
   delete_file(F).


hashtable_gen(N) :-
   ht_new(HT),
   loop( ht_put1(HT), N),
   assertz(hashtable(HT)).

hashtable_lookup(N) :-
   hashtable(HT),
   loop( ht_get1(HT), N).

hashtable_cleanup(N) :-
   hashtable(HT),
   loop( ht_del1(HT), N),
   retractall(hashtable(_)).

ht_put1(HT,I) :-
   ht_put(HT,I,I).

ht_get1(HT,I) :-
   ht_get(HT,I,I).

ht_del1(HT,I) :-
   ht_del(HT,I,I).

loop(_,0) :- !.
loop(Goal,N) :-
   N > 0,
   call(Goal,N),
   N1 is N - 1,
   loop(Goal,N1).


                         /**********************
                          *    Nice printout   *
                          **********************/
print_header(N) :-
   nl,
   format('~t~w~t~59|~n',['Database raw and crude micro benchmark']),
   format('~t~:d entries~t~59|~n',[N]),
   format('~w ~t~10|~w ~t~12+~t~48| ~w~n',['Database','Operation','Wall time']),
   format('~w ~t~10|~w ~t~12+~t~48| ~w',  ['--------','---------','---------']).

print_results :-
   bagof( b(Db,W),
          S^(db_bench([Db,Type|_], S), get_dict(wall_time,S,W)),
	  Res),
   sort(2,'@<',Res,Sorted),
   nl,
   maplist( {Type}/[b(Db,Wall)]>>
            format('~w ~t~10|~w ~t~12+~`.t~48| ~3f secs.~n',[Db,Type,Wall]),
            Sorted),

   fail.

print_results.



                            /****************
                             *    Helpers   *
                             ****************/

% Measure, print and store wall and cpu time
% used by Goal.
benchmark_goal(BenchMark,Goal) :-
   get_time(OldWall),
   statistics(cputime, OldCpu),
   call(Goal),
   get_time(NewWall),
   statistics(cputime, NewCpu),
   UsedWall is NewWall - OldWall,
   UsedCpu  is NewCpu  - OldCpu,
   assertz( db_bench(BenchMark, stat{  cpu_time: UsedCpu,
                                       wall_time: UsedWall })),
   print_message(information, bench(UsedCpu, UsedWall)).

prolog:message(bench(Cpu, Wall)) -->
   { Perc is round(Cpu*100/Wall) },
   [ '~3f CPU in ~3f seconds (~w% CPU)' - [Cpu, Wall, Perc] ].

3 Likes

Interesting. I only compared the performance to the CHR hash tables. Here the result was close time-wise and much better memory wise. Hash tables should also compare positively to trees memory wise. Note that using -O (optimize) may help as hash tables require arithmetic while the others do not.

The other potential advantage is that the code is pretty easily moved largely to C.