While library(persistency) allows for tracking and restoring the clauses associated with a predicate, it doesn't scale all that well. After some recent scalability discussions I thought I'd explore the option of storing clauses in RocksDB. The intermediate results are on GitHub. Below is a copy of the README. The current module provides this interface:
    :- module(rocks_preds,
              [ rdb_open/2,                 % +Dir, -DB
                rdb_assertz/1,              % +Clause
                rdb_assertz/2,              % +Dir, +Clause
                rdb_clause/2,               % +Head, -Body
                rdb_clause/3,               % +Dir, +Head, -Body
                rdb_clause/4,               % +Dir, +Head, -Body, ?CRef
                rdb_nth_clause/3,           % +Head, ?Nth, ?Reference
                rdb_nth_clause/4,           % +Dir, +Head, ?Nth, ?Reference
                rdb_load_file/1,            % +File
                rdb_load_file/2,            % +Dir, +File
                rdb_current_predicate/1,    % ?PI
                rdb_current_predicate/2,    % +Dir, ?PI
                rdb_predicate_property/2,   % :Head, ?Property
                rdb_predicate_property/3,   % ?Dir, :Head, ?Property
                rdb_index/2,                % :PI, +Spec
                rdb_index/3,                % +Dir, :PI, +Spec
                rdb_destroy_index/2,        % :PI, +Spec
                rdb_destroy_index/3         % +Dir, :PI, +Spec
              ]).
A lot is missing, such as rdb_abolish, rdb_retract, rdb_retractall, combined argument indexes, updating the index on assert and (notably) actually running the predicates rather than only providing clause/3 access. Performance can probably be improved severalfold by moving the encoding and decoding of database keys and values from Prolog to C++.
I know persistent predicates have been part of several Prolog systems in the past. I’m wondering what the experiences are with these and what functionality is crucial?
Thanks --- Jan
Readme from the repo
Store predicates in RocksDB
This library builds on top of the rocksdb add-on. Right now it requires the Git version of SWI-Prolog and recompiling the rocksdb pack from source.
The idea of this library is to see whether we can build a persistent predicate store on top of RocksDB. The current implementation is there to try different database organizations and to assess how realistic this is and what kind of performance and scalability is achievable.
Database organization
The database maps strings to Prolog terms. The key is a string because this allows for a RocksDB prefix seek (using rocks_enum_from/4) to enumerate objects. Keys:
- <PI>\u0001<0-padded hex clause no> → (Head :- Body)
  Key for the Nth clause of PI.
- meta\u0001<PI> → true
  Key we can enumerate to find defined predicates.
- <PI>\u0002<Property> → Term
  Key for properties of PI.
- <PI>\u0003<Index> → Status
  Key for a clause index created on PI.
- <PI>\u0004<Index>\u0002<Hash> → list(ClauseRef)
  If the arguments related to Index have term_hash/4 value Hash, return the list of candidate clauses.
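To make the key layout above concrete, here is a minimal sketch of how clause and meta keys could be built. It is not taken from the library: the predicate names, the 8-digit padding width and the use of ~q to print the predicate indicator are all assumptions for illustration.

```prolog
% Hypothetical helpers, not the library's actual code.
% Assumes default SWI-Prolog 7+ flags, where "..." denotes a string.

%!  clause_key(+PI, +Nth, -Key) is det.
%   Key for the Nth clause of PI. Zero-padding the hex number makes
%   lexical key order agree with numeric clause order.
%   The width of 8 digits is an assumption.
clause_key(PI, Nth, Key) :-
    format(string(Hex), "~`0t~16r~8|", [Nth]),
    format(string(Key), "~q\u0001~w", [PI, Hex]).

%!  meta_key(+PI, -Key) is det.
%   Key that marks PI as a defined predicate.
meta_key(PI, Key) :-
    format(string(Key), "meta\u0001~q", [PI]).

%!  clause_prefix(+PI, -Prefix) is det.
%   Prefix shared by all clause keys of PI, usable for a prefix seek.
clause_prefix(PI, Prefix) :-
    format(string(Prefix), "~q\u0001", [PI]).
```

Because the separator follows the predicate indicator directly, the prefix for p/1 does not match keys of p/10: after `p/1` the next character is `0` rather than \u0001. The padding matters for the same reason ordering does: unpadded, clause 16 (hex `10`) would sort before clause 9.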
First results:
Measured on an AMD 3950X based system with 64 GB of memory and an M.2 SSD drive.
WordNet 3.0
- 21 files, 21 predicates, 821,492 clauses, 34 MB source text
- Load time: 11.7 sec.
- RocksDB size: 99 MB
- rdb_index(hyp/2, 1) (89,089 clauses) → 1.35 sec.
- random query time on hyp(+,-): 10usec
RDF (Geonames)
- From a 1.7 GB HDT file (+ 850 MB index)
- Load 123,020,820 triples in 2088 sec (CPU)
- Load performance (clauses/sec) is constant.
- RocksDB size: 5.2 GB
- Count triples: 165 sec (152 sec for HDT).
- rdb_index(rdf/3,1) → 1383 sec.
- random query time on rdf(+,-,-): 30usec (HDT: 7.8usec)
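For comparing the two data sets, the load figures above can be turned into rates. The sketch below only divides the reported counts by the reported times; per_second/3 and report/0 are names made up for this example.

```prolog
%!  per_second(+Count, +Seconds, -Rate) is det.
%   Average number of items handled per second.
per_second(Count, Seconds, Rate) :-
    Rate is Count / Seconds.

% Print the average load rates for the two data sets reported above.
report :-
    per_second(821492, 11.7, WordNet),
    per_second(123020820, 2088, Geonames),
    format("WordNet:  ~0f clauses/sec~n", [WordNet]),
    format("Geonames: ~0f triples/sec~n", [Geonames]).
```

This works out to roughly 70,000 clauses per second for WordNet and roughly 59,000 triples per second for Geonames, so the two loads are in the same ballpark despite the large difference in size.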
Future
For now, the access predicate is rdb_clause/3. This is fine for facts. We could execute the code by calling the body as below.
    p(X) :-
        rdb_clause(p(X), Body),
        call(Body).
The disadvantage of this is that a cut in Body is local to the call/1 goal, so it does not prune the remaining clauses of the predicate. This can be fixed by interpreting the body or by having a call/1 variation that does not scope the cut.
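The effect of the cut scoping can be seen without RocksDB. In the sketch below, stored_clause/2 is a made-up stand-in for the persistent store, and p_native/1 is the same predicate compiled normally for comparison.

```prolog
:- use_module(library(lists)).
:- dynamic stored_clause/2.

% Stand-in for the persistent store: p/1 as (Head, Body) pairs.
% Compiled natively this would be:
%     p(X) :- member(X, [1,2,3]), !.
%     p(4).
stored_clause(p(X), (member(X, [1,2,3]), !)).
stored_clause(p(4), true).

% Wrapper in the style discussed above: fetch a body and call it.
% The cut inside Body is local to call/1.
p(X) :-
    stored_clause(p(X), Body),
    call(Body).

% Reference: the same two clauses compiled directly.
p_native(X) :- member(X, [1,2,3]), !.
p_native(4).
```

Here `findall(X, p_native(X), L)` gives `L = [1]`, because the cut commits to the first clause. With the wrapper, `findall(X, p(X), L)` gives `L = [1,4]`: the cut only prunes member/2, and backtracking still reaches the second stored clause.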
Cached execution
- If a predicate is small, simply extract it from the database and call it.
- If it is larger, create a predicate

      p(Index, Hash, Arg1, … ArgN) :-
          Body.
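One possible reading of the first option is to copy all clauses of a small predicate into ordinary dynamic clauses on first use and call those from then on. The sketch below illustrates that reading only: db_clause/2 stands in for the persistent store, and ensure_cached/1, cached_pred/1 and call_cached/1 are hypothetical names, not part of the library.

```prolog
:- use_module(library(lists)).
:- dynamic db_clause/2, cached_pred/1.

% Stand-in for the persistent store, holding q/1.
db_clause(q(X), (member(X, [1,2,3]), !)).
db_clause(q(4), true).

%!  ensure_cached(+PI) is det.
%   Copy all stored clauses of PI into the in-memory database, once.
ensure_cached(PI) :-
    cached_pred(PI),
    !.
ensure_cached(Name/Arity) :-
    functor(Head, Name, Arity),
    forall(db_clause(Head, Body),
           assertz((Head :- Body))),
    assertz(cached_pred(Name/Arity)).

%!  call_cached(:Goal) is nondet.
%   Make sure the predicate is cached, then run it as normal code.
call_cached(Goal) :-
    functor(Goal, Name, Arity),
    ensure_cached(Name/Arity),
    call(Goal).
```

Since the extracted clauses become real clauses, a cut in a body is scoped to the predicate again: `findall(X, call_cached(q(X)), L)` gives `L = [1]`. This sketch does not address invalidating the cache when the stored predicate changes.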