While library(persistency) allows for tracking and restoring clauses associated with a predicate, it doesn’t scale all that well. After some recent scalability discussions I thought I would explore the option of storing clauses in RocksDB. The intermediate results are on GitHub. Below is a copy of the README. The current module provides this interface:
:- module(rocks_preds,
          [ rdb_open/2,                 % +Dir, -DB
            rdb_assertz/1,              % +Clause
            rdb_assertz/2,              % +Dir, +Clause
            rdb_clause/2,               % +Head, -Body
            rdb_clause/3,               % +Dir, +Head, -Body
            rdb_clause/4,               % +Dir, +Head, -Body, ?CRef
            rdb_nth_clause/3,           % +Head, ?Nth, ?Reference
            rdb_nth_clause/4,           % +Dir, +Head, ?Nth, ?Reference
            rdb_load_file/1,            % +File
            rdb_load_file/2,            % +Dir, +File
            rdb_current_predicate/1,    % ?PI
            rdb_current_predicate/2,    % +Dir, ?PI
            rdb_predicate_property/2,   % :Head, ?Property
            rdb_predicate_property/3,   % ?Dir, :Head, ?Property
            rdb_index/2,                % :PI, +Spec
            rdb_index/3,                % +Dir, :PI, +Spec
            rdb_destroy_index/2,        % :PI, +Spec
            rdb_destroy_index/3         % +Dir, :PI, +Spec
          ]).
A lot is missing, such as rdb_abolish, rdb_retract, rdb_retractall, combined argument indexes, updating the index on assert and, notably, actually running the predicates rather than only providing clause/3 access. Performance can probably be boosted a couple of times by moving the encoding and decoding of database keys and values from Prolog to C++.
I know persistent predicates have been part of several Prolog systems in the past. I’m wondering what the experiences are with these and what functionality is crucial?
Thanks --- Jan
Readme from the repo
This library builds on top of the rocksdb add-on. Right now it requires the GIT version of SWI-Prolog and recompiling the
rocksdb pack from source.
The idea of this library is to see whether we can build a persistent predicate store on top of RocksDB. The current implementation is there to assess different database organizations, to see how realistic this is and what kind of performance and scalability is achievable.
The database maps strings to Prolog terms. The key is a string because this allows for a RocksDB prefix seek (using rocks_enum_from/4) to enumerate objects. Keys:
- <PI>\u0001<0-padded hex clause no> → (Head :- Body)
  Key for the Nth clause of PI.
- Key we can enumerate to find defined predicates.
- Key for properties of PI.
- Key for a clause index created on PI.
- If arguments related to Index have term_hash/4 Hash, return list of candidate clauses.
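To make the clause key layout concrete, here is a minimal sketch of how such a key could be built in plain SWI-Prolog. The predicate name clause_key/3, the 8-digit pad width and the way the predicate indicator is written are assumptions for illustration, not the encoding the library actually uses.

```prolog
%!  clause_key(+PI, +ClauseNo, -Key) is det.
%
%   Hypothetical helper: build "<PI>\u0001<0-padded hex clause no>".
%   The pad width of 8 hex digits is an assumption.
clause_key(Name/Arity, ClauseNo, Key) :-
    % Right-align the hex number in an 8-column field filled with 0.
    format(string(Hex), "~`0t~16r~8|", [ClauseNo]),
    format(string(Key), "~w/~w\u0001~w", [Name, Arity, Hex]).
```

Because the clause number is zero-padded to a fixed width, keys of one predicate sort in clause order under plain string comparison, which is what makes a prefix seek enumerate clauses in order. For example, the key for clause 9 of p/1 sorts before the key for clause 10.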
Measured on an AMD 3950X-based system with 64 GB of memory and an M.2 SSD.
- 21 files, 21 predicates, 821,492 clauses, 34 MB source text
- Load time: 11.7 sec.
- RocksDB size: 99 MB
- rdb_index(hyp/2, 1) (89,089 clauses) → 1.35 sec.
- random query time on hyp(+,-): 10 usec
- From 1.7 GB HDT file (+ 850 MB index)
- Load 123,020,820 triples in 2088 sec (CPU)
- Load performance (clauses/sec) is constant.
- RocksDB size: 5.2 GB
- Count triples: 165 sec (152 sec for HDT).
- rdb_index(rdf/3,1) → 1383 sec.
- random query time on rdf(+,-,-): 30 usec (HDT: 7.8 usec)
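The indexed query times above depend on mapping a hash of the indexed argument to candidate clauses. The following in-memory sketch illustrates that idea with term_hash/4 and dynamic predicates standing in for the RocksDB store. The names fact/2, arg_index/2, build_index/1 and indexed_fact/2, as well as the hash range of 1000, are assumptions made for this example and do not reflect the library's internals.

```prolog
:- dynamic arg_index/2.     % arg_index(Hash, ClauseNo): stand-in for index keys

% fact(ClauseNo, Head): stand-in for the stored clauses.
fact(1, hyp(a, b)).
fact(2, hyp(b, c)).
fact(3, hyp(a, d)).

%!  build_index(+ArgNo) is det.
%
%   Record, for every stored fact, the hash of argument ArgNo.
build_index(ArgNo) :-
    retractall(arg_index(_, _)),
    forall(fact(No, Head),
           (   arg(ArgNo, Head, Arg),
               term_hash(Arg, 1, 1000, Hash),
               assertz(arg_index(Hash, No))
           )).

%!  indexed_fact(+ArgNo, +Head) is nondet.
%
%   Head must be a callable term; its arguments may be unbound.
indexed_fact(ArgNo, Head) :-
    arg(ArgNo, Head, Arg),
    term_hash(Arg, 1, 1000, Hash),      % Hash stays unbound if Arg is unbound
    (   nonvar(Hash)
    ->  arg_index(Hash, No),            % enumerate candidate clauses
        fact(No, Head)                  % unification filters hash collisions
    ;   fact(_, Head)                   % no usable hash: scan everything
    ).
```

After calling build_index(1), a query such as indexed_fact(1, hyp(a, Y)) only visits the clauses whose first argument hashes to the same value as a. Candidates are still unified against the query because different terms may share a hash.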
For now, the access predicate is rdb_clause/3. This is fine for facts. We could execute the code by calling the body as below.
p(X) :- rdb_clause(p(X), Body), call(Body).
The disadvantage of this is that the cut is then local to Body: a cut in the stored body does not prune the remaining clauses of the predicate. This can be fixed by interpreting the body or by having a call/1 variant that does not scope the cut.
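The effect of the cut scoping can be shown without RocksDB. In this sketch, stored/2 is a hypothetical stand-in for rdb_clause/2 that holds the same two clauses as the natively compiled t_native/1.

```prolog
% stored(Head, Body): stand-in for clauses fetched with rdb_clause/2.
stored(t(X), (member(X, [1,2,3]), !)).
stored(t(4), true).

% Running the stored clauses through call/1, as in the wrapper above.
t_meta(X) :-
    stored(t(X), Body),
    call(Body).             % the cut inside Body is local to this call/1

% The same two clauses compiled as an ordinary predicate.
t_native(X) :- member(X, [1,2,3]), !.
t_native(4).
```

Here findall(X, t_native(X), L) gives L = [1], because the cut commits to the first clause. With findall(X, t_meta(X), L) the result is L = [1,4]: the cut only prunes the choice points of member/2 inside call/1, so the second stored clause is still tried on backtracking.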
If a predicate is small, simply extract it from the database and call
If it is larger, create a predicate
p(Index, Hash, Arg1, … ArgN) :-