Persistent predicates based on RocksDB

Hmmm … I’d need a clean machine to check what’s actually needed, but I’m too lazy to do that.

It’s a good idea to point to the build instructions page … the full package list is probably overkill (e.g., right now cmake and ninja aren’t needed); I’ll add some instructions for what to do if there’s a build failure.

I can live with that. I destroy and build Ubuntu machines on a weekly basis using WSL 2 on Windows, so with regard to RocksDB, just let me know what you need checked, and the next time I build one I will add it to the list.


Yes, it is. I just wanted to show the exact command, as people would be confused if I did not.


Just so I am not misunderstood: I was not expecting any code changes, just a line or two in the documentation pointing users where to go if that error occurs.

If SWI-Prolog with RocksDB works as we expect, then perhaps it will bring many more users to SWI-Prolog (sorry for excluding the other Prologs that don’t have this), including users who don’t know computer systems but do know Prolog, e.g. bioinformaticians, R users, and users of the other logics included with SWI-Prolog that could work with large sets of facts.

Typically, you should be able to build most C-based programs on Debian-based systems after

 apt install build-essential

All the other deps @EricGT mentions from the build instructions are for SWI-Prolog and its core foreign language extensions. If you got SWI-Prolog itself as a package, you don’t need those.

On macOS, you need Xcode for compiling basic extensions.

On Windows, you need MinGW-w64. It would be nice to have a good setup description for that.

For this package you also need git, as it is downloaded using git.

I am trying to move away from using MinGW-w64 and MSYS2 because, from what I have seen, if one has a Linux-based system installed under WSL 2, one can leverage that instead. I have not done that for building SWI-Prolog for Windows, but every day I work with RocksDB adds more weight to the side of the scale that says to just do it.

If I do get it working, I will surely post the instructions on this site.


EDIT

I found this set of instructions, which looks similar to what is needed for building on Windows using MinGW-w64.

I am slowly moving toward actually building SWI-Prolog for Windows on Windows.

Personal notes

For the instruction

pacman -S base-devel git mingw-w64-x86_64-toolchain mingw-w64-x86_64-gcc mingw-w64-x86_64-cmake mingw-w64-x86_64-openssl mingw-w64-x86_64-qt5 mingw-w64-x86_64-ninja

pacman is a package management tool and is also a command.
Just copy the command as is and paste it into the MSYS2 console.
Note that Ctrl-V does not work for pasting; instead, use the middle mouse button, which is mapped to paste.

While trying to build fastogt, I ran into an error with the instruction

python3 build_env.py

This GitLab repository might be needed.

pyfastogt · master · FastoGT / pyfastogt · GitLab

I’ve added this to the README, and also the comments about MacOS and Windows (neither of which I can test).

I copied two pieces of SWI-Prolog code:
1. an example of file upload with the multipart library, and
2. a simple web server with ODBC and PostgreSQL.

Can the uploaded file be embedded in the SQL query so that it is inserted into a PostgreSQL blob record? Or else inserted into RocksDB as a blob?

1:

save_file(Request, Parts0, In, file(FileName, Path), Options) :-
    copy_term(Parts0, Parts),
    once(append(Parts, [], _)),                 % close the list
    debug(upload, 'Params so far: ~p', [Parts]),
    (   option(filename(FileName), Options),
        FileName \== blob
    ->  true
    ;   part(qqfilename, Parts, FileName)
    ),
    file_name_extension(_, Ext, FileName),
    part(qquuid, Parts, UUID),
    upload_file(UUID, Ext, Dir, Path),
    enforce_file_size_limit(Parts, FileName, Size),
    part_offset(Parts, Size, IsMulti, Offset),
    debug(upload, 'IsMulti = ~p, Offset = ~p', [IsMulti, Offset]),
    make_directory_path(Dir),
    setup_call_cleanup(
        open(Path, update, Out,
             [ type(binary)
             ]),
        (   seek(Out, Offset, bof, NewOffset),
            assertion(Offset == NewOffset),
            copy_stream_data(In, Out)
        ),
        close(Out)),
    (   IsMulti == false
    ->  broadcast(file_upload(FileName, Path, Request))
    ;   true
    ).

2:

sql_escape_single_quotes(StringIn, StringOut) :-
  split_string(StringIn, "'", "", List),
  atomics_to_string(List, "''", StringOut).

db_insert(Title, Art) :-
  sql_escape_single_quotes(Title, ETitle),
  sql_escape_single_quotes(Art, EArt),  
  odbc_connect('blog', Connection, []),
  odbc_query(Connection, "INSERT INTO arts (title, art) VALUES ('~w', '~w')"-[ETitle, EArt]),
  odbc_disconnect(Connection).

For (1), you should be able to replace the open(Path,update,Out) code with a rocks_put/3 call. If you’re appending, then you’d need to first do rocks_get/3 to get the existing blob. (And, of course, you’d need to have rocks_open/3 and rocks_close/1 at suitable places, similar to how you’d do a database connect.) The “merge” facility might help avoid the extra work when appending to a blob (I’ve only skimmed this; @jan can probably give better advice if it’s not clear from the documentation).
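A minimal sketch of (1), assuming library(rocksdb) is installed. The helper names store_upload/3 and load_upload/3, the database directory argument, and the value(binary) value type are assumptions for illustration (check the library(rocksdb) documentation for the exact type names). It reads the whole upload into memory and ignores the chunk Offset handling in save_file/5, so chunked uploads would still need the rocks_get/3 append step (or the merge facility) mentioned above.

```prolog
:- use_module(library(rocksdb)).

%!  store_upload(+Dir, +Key, +In) is det.
%
%   Hypothetical helper: read all remaining bytes from the input stream In
%   and store them under Key in the RocksDB database located in Dir.
%   Assumes value(binary) is an accepted value type.
store_upload(Dir, Key, In) :-
    set_stream(In, encoding(octet)),        % one character code per byte (0..255)
    read_string(In, _Length, Bytes),        % whole upload held in memory
    setup_call_cleanup(
        rocks_open(Dir, DB, [key(atom), value(binary)]),
        rocks_put(DB, Key, Bytes),
        rocks_close(DB)).

%!  load_upload(+Dir, +Key, -Bytes) is semidet.
%
%   Hypothetical helper: fetch the blob stored by store_upload/3.
load_upload(Dir, Key, Bytes) :-
    setup_call_cleanup(
        rocks_open(Dir, DB, [key(atom), value(binary)]),
        rocks_get(DB, Key, Bytes),
        rocks_close(DB)).
```

In save_file/5 you could then call something like store_upload(RocksDir, UUID, In) in place of the setup_call_cleanup/3 block that opens Path, where RocksDir is a hypothetical directory of your choosing. In a long-running server you would more likely open the database once and keep the handle, rather than opening it per upload as this sketch does.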

For (2), I think that the answer is “yes”. I’m not sure of the ODBC syntax … shouldn’t it use "?"s to avoid SQL injection? Also, you’d probably want to use a format like ~q or ~k to ensure round-trip of terms. If you use library(rocksdb), use the option value(term), which uses an optimized read and write that’s faster and more compact than regular term reading and writing.
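On the SQL-injection point, a minimal sketch of a parameterized insert using library(odbc): the SQL contains ? placeholders, odbc_prepare/4 prepares it, and odbc_execute/2 supplies the values, so no manual quote escaping is needed. It reuses the blog DSN and the arts table from the code above. The default parameter types are an assumption; a binary (blob/bytea) column may need a different parameter type, as described in the odbc_prepare/4 documentation.

```prolog
:- use_module(library(odbc)).

%!  db_insert(+Title, +Art) is det.
%
%   Insert a row using placeholders instead of splicing escaped text into
%   the SQL string; the driver passes Title and Art as parameters.
db_insert(Title, Art) :-
    setup_call_cleanup(
        odbc_connect(blog, Connection, []),
        setup_call_cleanup(
            odbc_prepare(Connection,
                         'INSERT INTO arts (title, art) VALUES (?, ?)',
                         [default, default],   % let the driver describe the types
                         Statement),
            odbc_execute(Statement, [Title, Art]),
            odbc_free_statement(Statement)),
        odbc_disconnect(Connection)).
```

With this version sql_escape_single_quotes/2 is no longer needed, because the values never become part of the SQL text.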

I think this is slightly inaccurate. It’s the Xcode command-line tools that one needs. It is in fact possible to have these and not Xcode itself, as my own system proves.


Ian

@jan @peter.ludemann

I built a new Ubuntu machine from scratch, installed the SWI-Prolog development version via the PPA, and then ran check_installation/0. As I always find with the PPA install, tcmalloc is not present. (Note that with a source build it is present.)

Is tcmalloc needed/advantageous for using SWI-Prolog with #RocksDB?

I built a new Ubuntu machine using the PPA.
I used git clone to bring down the SWI-Prolog repository for RocksDB.

I tried
pack_rebuild(rocksdb).
which failed with Error: program 'make' does not exist

I installed build-essential:

sudo apt update
sudo apt install build-essential

I retried pack_rebuild(rocksdb).
The error no longer appears.

:slightly_smiling_face:


Just to check which make commands are installed.

$ make -v
GNU Make 4.3
...

$ cmake -v
Command 'cmake' not found, but can be installed with: 
...

$ gmake -v
GNU Make 4.3
...

The Makefile for pack(rocksdb) has
ROCKSENV=ROCKSDB_DISABLE_JEMALLOC=1 ROCKSDB_DISABLE_TCMALLOC=1

@jan made a comment about the tcmalloc warning here:


I think I read a comment from @jan somewhere that enabling tcmalloc for RocksDB caused some problem (but I can’t find this comment) … as RocksDB is multi-threaded, it would be nice if it could use tcmalloc … but possibly its use of tcmalloc somehow conflicts with SWI-Prolog using it?

I think the one you are referring to is the RocksDB pack readme.md.

I had totally forgotten about this, and while reinstalling from scratch and double-checking items, I read it again a few minutes ago.

There are a number of issues with several pre-built versions of librocksdb:

  • Shared objects are often linked to jemalloc or tcmalloc. This prevents lazy loading of the library, causing either problems loading or running the embedded rocksdb.

AFAIK problems emerge if a shared object has tcmalloc as dependency while the main program has not. Some RocksDB packages are linked against jemalloc, so you have the main program loading tcmalloc or ptmalloc and then RocksDB using jemalloc. That does not work. The safe option is to not link RocksDB against a particular malloc library and leave the choice of allocator to the main program. It is possible that RocksDB actually uses the tcmalloc/jemalloc extension functions. That would be a pity as some optimizations may be lost. libswipl.so solves the tcmalloc dependency dynamically, i.e., it queries the linker to see whether tcmalloc is in use and, if so, uses the added functionality.


A scan of the RocksDB source doesn’t find tcmalloc/malloc_extension.h (or anything ending in extension.h), so presumably RocksDB doesn’t use the extension functions. And, as far as I can tell, the only references to tcmalloc are in build files.
On my system, rocksdb4pl.so doesn’t have any tcmalloc dependencies (nor does libswipl.so, so I should probably change my build options).

No. For exactly the same reasons that cause problems when rocksdb.so depends on tcmalloc, libswipl.so doesn’t depend on it. The dependency is in swipl itself (the main program):

> ldd src/swipl
	linux-vdso.so.1 (0x00007fffebf54000)
	libtcmalloc_minimal.so.4 => /lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 (0x00007f58b1097000)
	libswipl.so.8 => /home/janw/src/swipl-devel/linux/src/libswipl.so.8 (0x00007f58b0ed9000)
    ...

SWI-Prolog does use the extensions, but libswipl.so discovers them dynamically such that the library works fine with any malloc implementation.

Maybe someone will explain this while I am digging into the code to understand it.

Normally, when accessing a resource such as a database via a connection, access is maintained until the connection is closed, after which access is lost.

With RocksDB, that appears not to be the case. When using code such as

load :-
    mrfiles_path(Mrfiles_data_path),
    rocksdb_directory(Rocksdb_directory),
    setup_call_cleanup(
        (   open(Mrfiles_data_path, read, Files_data_stream),
            rdb_open(Rocksdb_directory, RocksDB)
        ),
        mrfiles_lines(Files_data_stream, RocksDB),
        (   rdb_close(RocksDB),
            close(Files_data_stream)
        )
    ).

Note: mrfiles_lines/2 calls rdb_assertz/1 for each record.

I was surprised to find that with a wrapper predicate like

mrfile(A1,A2,A3,A4,A5,A6) :-
   rdb_clause(mrfile(A1,A2,A3,A4,A5,A6),true).

the query worked from the top level. I was expecting that after load/0 ran successfully, calling rdb_close/1 and returning control to the top level, the RocksDB resource would no longer be accessible.

See: rdb_close not closing as expected. RocksDB can still be queried from top level after rdb_close. · Issue #2 · JanWielemaker/rocks-predicates · GitHub

Feedback on predicates.


Item: 1

rdb_assertz/2 will write duplicate entries.

When adding facts that should be unique in the DB, it would be nice to have an option, or a different predicate, that checks whether the fact already exists, does not add a duplicate, and still succeeds if the fact is a duplicate. :slightly_smiling_face: I would send a pull request, but my Git skills are horrendous.

See: New or option for rdb_assertz/2 that does not create duplicate facts and succeeds if duplicate. · Issue #3 · JanWielemaker/rocks-predicates · GitHub
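Until such an option exists, a minimal user-side sketch of the wanted behaviour, assuming rocks_preds is loaded and a default database is open. The name rdb_assertz_unique/1 is hypothetical; it relies on rdb_clause/2 and rdb_assertz/1 as used elsewhere in this thread. The variant check (=@=) means a stored fact only counts as a duplicate if it is identical up to variable renaming. The lookup may scan many clauses when the arguments are not indexed, so it is not free for large predicates, and facts for a predicate in another module would be passed module-qualified, e.g. umls:mrfile(...).

```prolog
:- use_module(library(rocks_preds)).

%!  rdb_assertz_unique(+Fact) is det.
%
%   Hypothetical helper: add Fact to the default RocksDB unless a variant
%   of it is already stored. Succeeds in both cases.
rdb_assertz_unique(Fact) :-
    copy_term(Fact, Probe),            % do not bind variables in Fact
    (   rdb_clause(Probe, true),       % enumerate stored facts that unify
        Probe =@= Fact                 % ... and keep only an exact variant
    ->  true                           % already present: succeed, add nothing
    ;   rdb_assertz(Fact)
    ).
```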


Item: 2

Using the most general query.

Retrieve the first result.
There are more valid results.
Press Enter instead of the space bar.

Example:

?- mrfile(A,B,C,D,E,F).
A = 'AMBIGLUI.RRF',
B = 'Ambiguous term identifiers',
C = ['LUI', 'CUI'],
D = 2,
E = 394198,
F = 7578465 .

ERROR: Arguments are not sufficiently instantiated
ERROR: In:
ERROR:    [5] '$execute_goal2'('<garbage_collected>',['A'='AMBIGLUI.RRF',...|...],_2420)

See: Most general query (goal) results in error if enter used to exit instead of space bar to continue. · Issue #5 · JanWielemaker/rocks-predicates · GitHub


Item: 3

It seems that information is being tabled; the concern is that RAM is being used behind the scenes, and if millions or tens of millions of facts are asserted into RocksDB, the tabling could use up all of the RAM.

?- current_table(M:H,Table),trie_property(Table,size(Bytes)).
M = rocks_preds,
H = rdb_clause_index(<rocksdb>(0x560f3641a570), umls:mrfile/6, _),
Table = <trie>(0x560f365231b0),
Bytes = 168 ;
false.

For now, my take on this item is to just be aware that tabling is needed and that, with the example query above, one can check the RAM used as needed. In other words, if the sky is not falling, there is no need to panic.

See: Reduce or remove need for tabiling to avoid using RAM. · Issue #4 · JanWielemaker/rocks-predicates · GitHub
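To keep an eye on that RAM, the per-table query above can be rolled up into a single number. A minimal sketch using only built-ins (current_table/2, trie_property/2, aggregate_all/3); the name rocks_table_bytes/1 is hypothetical, and it only counts tables owned by module rocks_preds.

```prolog
:- use_module(library(aggregate)).

%!  rocks_table_bytes(-Total) is det.
%
%   Hypothetical helper: total size in bytes of all tables (tries) that
%   belong to module rocks_preds. Total is 0 when there are none.
rocks_table_bytes(Total) :-
    aggregate_all(sum(Bytes),
                  (   current_table(rocks_preds:_Variant, Trie),
                      trie_property(Trie, size(Bytes))
                  ),
                  Total).
```

Calling ?- rocks_table_bytes(B). now and then, for example after a large load, shows whether the tables are actually growing.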

Nice to know commands

For downloading packs via Git clone

On Windows (work in progress)

> cd %LOCALAPPDATA%\swi-prolog\pack
> git clone https://github.com/JanWielemaker/rocksdb.git
> git clone https://github.com/JanWielemaker/rocks-predicates.git
> cd %USERPROFILE%

On Ubuntu

$ sudo apt install libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev libgflags-dev
$ mkdir -p  ~/.local/share/swi-prolog/pack
$ cd ~/.local/share/swi-prolog/pack
$ git clone https://github.com/JanWielemaker/rocksdb.git
$ git clone https://github.com/JanWielemaker/rocks-predicates.git
$ cd rocksdb
$ git clone https://github.com/facebook/rocksdb.git

Nice to know queries.

Is the module rocksdb loaded?

?- current_module(rocksdb).
true.

Load the rocksdb module.

?- use_module(library(rocksdb)).
true.

For the error:

ERROR: rocksdb4pl: cannot open shared object file: No such file or directory

If the RocksDB pack was recently installed, built, or rebuilt within SWI-Prolog, try halt/0, then restart swipl and try the command again; this might resolve the error.

Is the module rocks_preds loaded?

?- current_module(rocks_preds).
true.

Load the rocks_preds module.
Note: This expects the code to be under the pack directory.
Note: You will need to change the user name accordingly.

?- assertz(user:file_search_path(library, '/home/groot/.local/share/swi-prolog/pack/rocks-predicates')).
true.
?- use_module(library(rocks_preds)).
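To avoid editing the user name by hand, a small sketch that lets expand_file_name/2 expand ~ to the home directory. It assumes the Ubuntu pack location used in the git clone commands above; the helper name add_rocks_preds_library/0 is hypothetical.

```prolog
%!  add_rocks_preds_library is det.
%
%   Hypothetical helper: add the rocks-predicates checkout under the
%   user's pack directory to the library search path, once.
add_rocks_preds_library :-
    expand_file_name('~/.local/share/swi-prolog/pack/rocks-predicates',
                     [Dir]),                 % ~ expands to the home directory
    (   user:file_search_path(library, Dir)
    ->  true                                 % already registered
    ;   assertz(user:file_search_path(library, Dir))
    ).
```

Usage: ?- add_rocks_preds_library, use_module(library(rocks_preds)).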

What predicates are in module rocksdb?

?- listing(rocksdb:P).
...

What predicates are in module rocks_preds?

?- listing(rocks_preds:P).
...

Is there currently a RocksDB?

?- current_blob(Blob,rocksdb).
Blob = <rocksdb>(0x560f3641a570) ;
false.

Note: More than one RocksDB can be open at a time, e.g.

?- current_blob(Blob,rocksdb).
Blob = <rocksdb>(0x557a667b1960) ;
Blob = <rocksdb>(0x557a66692920) ;
false.

What is the default RocksDB?
Note: Identified by file directory.

?- rocks_preds:default_db(Dir).
Dir = '/mnt/d/...'.

Note: If default_db/1 returns Dir = 'predicates.db', then no default RocksDB has been set. This helps to understand error messages like

ERROR: rocksdb `'predicates.db'' does not exist

Note: predicates.db may actually be created, used, and contain data. Its parent directory will be whatever the current working directory was at the time of creation. So if you load some data and it then seems to go missing, look for a directory named predicates.db somewhere.

What is SWI-Prolog using for the current working directory?

?- pwd.
% /home/groot/
true.

What predicates exist in the RocksDB?

?- current_table(rocks_preds:H,_),H=rdb_clause_index(_,M:P/I,_).
H = rdb_clause_index(<rocksdb>(0x560f3641a570), umls:mrcol/8, '$VAR'('_')),
M = umls,
P = mrcol,
I = 8 ;
H = rdb_clause_index(<rocksdb>(0x560f3641a570), umls:mrfile/6, '$VAR'('_')),
M = umls,
P = mrfile,
I = 6 ;
H = rdb_clause_index(<rocksdb>(0x560f3641a570), '$VAR'('M'):'$VAR'('P')/'$VAR'('I'), '$VAR'('_')) ;
H = rdb_clause_index(<rocksdb>(0x560f3641a570), '$VAR'('M'):'$VAR'('P')/'$VAR'('I'), '$VAR'('_')) ;
false.

A variation that also reports each predicate as a predication.

?- current_table(rocks_preds:H,_),H=rdb_clause_index(_,M:P/I,_),pi_head(M:P/I,Predication).
H = rdb_clause_index(<rocksdb>(0x55dff497bd10), issue_5:fact_01/9, _),
M = issue_5,
P = fact_01,
I = 9,
Predication = issue_5:fact_01(_, _, _, _, _, _, _, _, _) ;
false.
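If only the distinct predicate indicators are wanted, the query above can be wrapped so that the non-ground '$VAR' entries are dropped and duplicates removed. A sketch using only current_table/2, findall/3 and sort/2; the name rocks_stored_predicates/1 is hypothetical. It relies on the rdb_clause_index/3 table layout seen in the output above, which is an internal detail of rocks_preds that may change, and it only lists predicates for which that table has been created in the current session.

```prolog
%!  rocks_stored_predicates(-PIs) is det.
%
%   Hypothetical helper: sorted, duplicate-free list of Module:Name/Arity
%   for which rocks_preds has an rdb_clause_index/3 table in this session.
rocks_stored_predicates(PIs) :-
    findall(M:P/A,
            (   current_table(rocks_preds:H, _Trie),
                H = rdb_clause_index(_DB, M:P/A, _),
                ground(M:P/A)            % skip the variant with unbound M:P/A
            ),
            PIs0),
    sort(PIs0, PIs).                     % sort/2 also removes duplicates
```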

What are the properties for the predicates?

?- rocks_preds:default_db(Dir),rocks_preds:rdb_predicate_property(Dir,umls:mrfile(_,_,_,_,_,_),Property).
Dir = '/mnt/d/...',
Property = database('/mnt/d/...') ;
Dir = '/mnt/d/...',
Property = defined ;
Dir = '/mnt/d/...',
Property = number_of_clauses(100).
?- rocks_preds:default_db(Dir),rocks_preds:rdb_predicate_property(Dir,umls:mrcol(_,_,_,_,_,_,_,_),Property).
Dir = '/mnt/d/...',
Property = database('/mnt/d/...') ;
Dir = '/mnt/d/...',
Property = defined ;
Dir = '/mnt/d/...',
Property = number_of_clauses(329).

A variation using predication; since a predication is a most general query (goal), it can be executed as such.

?- current_table(rocks_preds:H,_),H=rdb_clause_index(_,M:P/I,_),pi_head(M:P/I,Goal),Goal.
H = rdb_clause_index(<rocksdb>(0x55dff497bd10), issue_5:fact_01/9, _),
M = issue_5,
P = fact_01,
I = 9,
Goal = issue_5:fact_01('1', 'Mars 2MV-3 No.1', '04 Nov 1962', '25 Nov 1962', '890', -, -, 'Failed', 'Soviet Union') ;

...

Once issue 2 is corrected, it will be desirable to open the RocksDB and then run queries from the top level. As noted, break/0 can accomplish this.

interactive :-
    absolute_file_name(rocksdb('.'),Rocksdb_directory),
    setup_call_cleanup(
        rdb_open(Rocksdb_directory,RocksDB),
        break,
        rdb_close(RocksDB)
    ).

Example run:

?- interactive.
% Break level 1
[1]  ?- mars_lander(Sequence,Lander,Launch_date,Landing_date,Mass_kg,Landing_site,Region,Status,Country_of_origin).
Sequence = '1',
Lander = 'Mars 2MV-3 No.1',
Launch_date = '04 Nov 1962',
Landing_date = '25 Nov 1962',
Mass_kg = '890',
Landing_site = Region, Region = (-),
Status = 'Failed',
Country_of_origin = 'Soviet Union' .

...

Sequence = '21',
Lander = 'Tianwen-1',
Launch_date = '23 July 2020',
Landing_date = '14 May 2021',
Mass_kg = '240',
Landing_site = '109.7°E, 25.1°N',
Region = 'Utopia Planitia',
Status = 'Operational',
Country_of_origin = 'China'.

[1]  ?- ^D
% Exit break level 1
true.

?-

Note: ^D means holding down the Ctrl key and pressing D.

It should be possible to use initialization/1 (which is only used as a Prolog directive) to open the RocksDB and thus enter the top level with the predicates loaded, and to use at_halt/1 to close the RocksDB when SWI-Prolog halts; working out the details is just not high on my priority list at present.
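A rough sketch of that idea, assuming rocks_preds is on the library path and that rdb_open/2 can open (or create) the directory and rdb_close/1 closes it, as in the load/0 code above. The predicate open_rocks_at_startup/0 and the database location in rocks_directory/1 are hypothetical placeholders. Given issue 2, the close on halt may not change much today, but it keeps the open/close pairing explicit.

```prolog
:- use_module(library(rocks_preds)).

%   Hypothetical location of the database; adjust to your setup.
rocks_directory('predicates.db').

%!  open_rocks_at_startup is det.
%
%   Open the database once the file is loaded and register a hook that
%   closes it again when SWI-Prolog halts.
open_rocks_at_startup :-
    rocks_directory(Dir0),
    absolute_file_name(Dir0, Dir),    % resolve relative to the working directory
    rdb_open(Dir, DB),
    at_halt(rdb_close(DB)).

:- initialization(open_rocks_at_startup).
```

Loading the file, for example with swipl file.pl, runs the initialization goal after the file is loaded, so the top level starts with the database already open.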

rdb_close/1 closes the RocksDB database and then removes the entry from the dynamic predicate pred_table/2. When another rdb_open/2 is done, there’s no entry in pred_table/2, so the database is re-opened.

This doesn’t seem quite right … probably pred_table/2 should store the options given to the initial rdb_open/2. (Please open an issue on this and assign it to me, so that I don’t forget.)

BTW, there are also the alias(Name) and how(once) options to rdb_open/3.
