AFAIK problems emerge if a shared object has tcmalloc as a dependency while the main program does not. Some RocksDB packages are linked against jemalloc, so you have the main program loading tcmalloc or ptmalloc and then RocksDB using jemalloc. That does not work. The safe option is to not link RocksDB against a particular malloc library and leave the choice of allocator to the main program. It is possible that RocksDB actually uses the tcmalloc/jemalloc extension functions. That would be a pity, as some optimizations may be lost. libswipl.so resolves the tcmalloc dependency dynamically, i.e., it queries the linker to see whether tcmalloc is in use and, if so, uses the added functionality.
A scan of the RocksDB source doesn’t find tcmalloc/malloc_extension.h (or anything ending in extension.h), so presumably RocksDB doesn’t use the extension functions. And, as far as I can tell, the only references to tcmalloc are in build files.
On my system, rocksdb4pl.so doesn’t have any tcmalloc dependencies (nor does libswipl.so, so I should probably change my build options).
No. For exactly the same reasons that cause problems when rocksdb.so depends on tcmalloc, libswipl.so doesn’t depend on it. The dependency is in swipl itself (the main program):
> ldd src/swipl
linux-vdso.so.1 (0x00007fffebf54000)
libtcmalloc_minimal.so.4 => /lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 (0x00007f58b1097000)
libswipl.so.8 => /home/janw/src/swipl-devel/linux/src/libswipl.so.8 (0x00007f58b0ed9000)
...
SWI-Prolog does use the extensions, but libswipl.so discovers them dynamically such that the library works fine with any malloc implementation.
Maybe someone will explain this while I am digging into the code to understand it.
Normally, when accessing a resource such as a database via a connection, access is maintained until the connection is closed; after that, access is lost.
When using RocksDB that appears not to be the case. When using code such as
load :-
    mrfiles_path(Mrfiles_data_path),
    rocksdb_directory(Rocksdb_directory),
    setup_call_cleanup(
        (
            open(Mrfiles_data_path,read,Files_data_stream),
            rdb_open(Rocksdb_directory,RocksDB)
        ),
        mrfiles_lines(Files_data_stream,RocksDB),
        (
            rdb_close(RocksDB),
            close(Files_data_stream)
        )
    ).
Note: mrfiles_lines/2 calls rdb_assertz/1 for each record.
I was surprised to find that with a wrapper predicate like
mrfile(A1,A2,A3,A4,A5,A6) :-
    rdb_clause(mrfile(A1,A2,A3,A4,A5,A6),true).
that the query worked from the top level. I was expecting that after load/0 ran successfully, calling rdb_close/1 and returning control back to the top level would leave the RocksDB resource inaccessible.
Feedback on predicates.
Item: 1
rdb_assertz/2 will write duplicate entries.
When adding facts that should be unique in the DB, it would be nice to have an option or a different predicate that checks whether the fact already exists, does not add a duplicate, and also succeeds if the fact is a duplicate. I would send a pull request, but my Git skills are horrendous.
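In the meantime, a wrapper along these lines should do the job (a sketch only; rdb_assertz_unique/1 is a hypothetical name, not part of rocks_preds, built on the pack's rdb_clause/2 and rdb_assertz/1 and assuming the facts being asserted are ground):

```prolog
:- use_module(library(rocks_preds)).   % assumes the rocks-predicates pack is installed

%!  rdb_assertz_unique(+Fact) is det.
%
%   Assert Fact only if an identical fact is not already in the
%   database; succeed either way.  Hypothetical helper.
rdb_assertz_unique(Fact) :-
    (   rdb_clause(Fact, true)
    ->  true                           % duplicate: succeed without asserting
    ;   rdb_assertz(Fact)
    ).
```

Note that for non-ground facts rdb_clause/2 would match any unifiable clause, so a stricter version would need a variant check.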
Item: 2
Using the most general query, retrieve the first result; then, while more valid results remain, press Enter instead of the space bar.
Example:
?- mrfile(A,B,C,D,E,F).
A = 'AMBIGLUI.RRF',
B = 'Ambiguous term identifiers',
C = ['LUI', 'CUI'],
D = 2,
E = 394198,
F = 7578465 .
ERROR: Arguments are not sufficiently instantiated
ERROR: In:
ERROR: [5] '$execute_goal2'('<garbage_collected>',['A'='AMBIGLUI.RRF',...|...],_2420)
Item: 3
It seems that information is being tabled; the concern is that RAM is being used behind the scenes, and if millions or tens of millions of facts are asserted into RocksDB, the tabling could use up all of the RAM.
?- current_table(M:H,Table),trie_property(Table,size(Bytes)).
M = rocks_preds,
H = rdb_clause_index(<rocksdb>(0x560f3641a570), umls:mrfile/6, _),
Table = <trie>(0x560f365231b0),
Bytes = 168 ;
false.
For now my take on this item is just to be aware that tabling is used and that, using the example code above, one can check on the RAM used as needed. In other words, if the sky is not falling, no need to cause a panic.
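The per-table check above can be wrapped into a small helper that totals the bytes held by all answer tries (plain SWI-Prolog, no pack needed; tabling_bytes/1 is a name I made up, and fib/2 is just a stand-in tabled predicate so there is something to measure):

```prolog
% A small tabled predicate so there is something to measure.
:- table fib/2.
fib(0, 0).
fib(1, 1).
fib(N, F) :-
    N > 1,
    N1 is N - 1, N2 is N - 2,
    fib(N1, F1), fib(N2, F2),
    F is F1 + F2.

%!  tabling_bytes(-Total) is det.
%
%   Sum the size of every answer trie, mirroring the
%   current_table/2 + trie_property/2 query above.
tabling_bytes(Total) :-
    aggregate_all(sum(Bytes),
                  ( current_table(_Variant, Trie),
                    trie_property(Trie, size(Bytes)) ),
                  Total).
```

After ?- fib(20, F). the tables are populated and ?- tabling_bytes(T). reports the total bytes they occupy.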
Nice to know commands
For downloading packs via Git clone
On Windows (work in progress)
> cd %LOCALAPPDATA%\swi-prolog\pack
> git clone https://github.com/JanWielemaker/rocksdb.git
> git clone https://github.com/JanWielemaker/rocks-predicates.git
> cd %USERPROFILE%
On Ubuntu
$ sudo apt install libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev libgflags-dev
$ mkdir -p ~/.local/share/swi-prolog/pack
$ cd ~/.local/share/swi-prolog/pack
$ git clone https://github.com/JanWielemaker/rocksdb.git
$ git clone https://github.com/JanWielemaker/rocks-predicates.git
$ cd rocksdb
$ git clone https://github.com/facebook/rocksdb.git
Nice to know queries.
Is the module rocksdb loaded?
?- current_module(rocksdb).
true.
Load the rocksdb module.
?- use_module(library(rocksdb)).
true.
For the error:
ERROR: rocksdb4pl: cannot open shared object file: No such file or directory
If the RocksDB pack was recently installed/built/rebuilt within SWI-Prolog, try halt/0, then restart swipl and try the command again; that may resolve the error.
Is the module rocks_preds loaded?
?- current_module(rocks_preds).
true.
Load the rocks_preds module.
Note: This expects the code to be under the pack directory.
Note: You will need to change the user name accordingly.
?- assertz(user:file_search_path(library, '/home/groot/.local/share/swi-prolog/pack/rocks-predicates')).
true.
?- use_module(library(rocks_preds)).
What predicates are in module rocksdb?
?- listing(rocksdb:P).
...
What predicates are in module rocks_preds?
?- listing(rocks_preds:P).
...
Is there currently a RocksDB?
?- current_blob(Blob,rocksdb).
Blob = <rocksdb>(0x560f3641a570) ;
false.
Note: More than one RocksDB can be open at a time, e.g.
?- current_blob(Blob,rocksdb).
Blob = <rocksdb>(0x557a667b1960) ;
Blob = <rocksdb>(0x557a66692920) ;
false.
What is the default RocksDB?
Note: Identified by file directory.
?- rocks_preds:default_db(Dir).
Dir = '/mnt/d/...'.
Note: If default_db/1 returns Dir = 'predicates.db', then no default RocksDB has been set. This helps to understand error messages like
ERROR: rocksdb `'predicates.db'' does not exist
Note: predicates.db may actually be created, used and contain data. The parent directory at the time of creation will be the current working directory. So if you load up some data and then it seems to go missing, look for a directory named predicates.db somewhere.
What is SWI-Prolog using for the current working directory?
?- pwd.
% /home/groot/
true.
What predicates exist in the RocksDB?
?- current_table(rocks_preds:H,_),H=rdb_clause_index(_,M:P/I,_).
H = rdb_clause_index(<rocksdb>(0x560f3641a570), umls:mrcol/8, '$VAR'('_')),
M = umls,
P = mrcol,
I = 8 ;
H = rdb_clause_index(<rocksdb>(0x560f3641a570), umls:mrfile/6, '$VAR'('_')),
M = umls,
P = mrfile,
I = 6 ;
H = rdb_clause_index(<rocksdb>(0x560f3641a570), '$VAR'('M'):'$VAR'('P')/'$VAR'('I'), '$VAR'('_')) ;
H = rdb_clause_index(<rocksdb>(0x560f3641a570), '$VAR'('M'):'$VAR'('P')/'$VAR'('I'), '$VAR'('_')) ;
false.
A variation that also reports the predicate as a predication.
?- current_table(rocks_preds:H,_),H=rdb_clause_index(_,M:P/I,_),pi_head(M:P/I,Predication).
H = rdb_clause_index(<rocksdb>(0x55dff497bd10), issue_5:fact_01/9, _),
M = issue_5,
P = fact_01,
I = 9,
Predication = issue_5:fact_01(_, _, _, _, _, _, _, _, _) ;
false.
What are the properties for the predicates?
?- rocks_preds:default_db(Dir),rocks_preds:rdb_predicate_property(Dir,umls:mrfile(_,_,_,_,_,_),Property).
Dir = '/mnt/d/...',
Property = database('/mnt/d/...') ;
Dir = '/mnt/d/...',
Property = defined ;
Dir = '/mnt/d/...',
Property = number_of_clauses(100).
?- rocks_preds:default_db(Dir),rocks_preds:rdb_predicate_property(Dir,umls:mrcol(_,_,_,_,_,_,_,_),Property).
Dir = '/mnt/d/...',
Property = database('/mnt/d/...') ;
Dir = '/mnt/d/...',
Property = defined ;
Dir = '/mnt/d/...',
Property = number_of_clauses(329).
A variation using a predication; since a predication is a most general query (goal), it can be executed as such.
?- current_table(rocks_preds:H,_),H=rdb_clause_index(_,M:P/I,_),pi_head(M:P/I,Goal),Goal.
H = rdb_clause_index(<rocksdb>(0x55dff497bd10), issue_5:fact_01/9, _),
M = issue_5,
P = fact_01,
I = 9,
Goal = issue_5:fact_01('1', 'Mars 2MV-3 No.1', '04 Nov 1962', '25 Nov 1962', '890', -, -, 'Failed', 'Soviet Union') ;
...
Once issue 2 is corrected, one will still want to open the RocksDB and then run queries from the top level. As noted, break/0 can accomplish this.
interactive :-
    absolute_file_name(rocksdb('.'),Rocksdb_directory),
    setup_call_cleanup(
        rdb_open(Rocksdb_directory,RocksDB),
        break,
        rdb_close(RocksDB)
    ).
Example run:
?- interactive.
% Break level 1
[1] ?- mars_lander(Sequence,Lander,Launch_date,Landing_date,Mass_kg,Landing_site,Region,Status,Country_of_origin).
Sequence = '1',
Lander = 'Mars 2MV-3 No.1',
Launch_date = '04 Nov 1962',
Landing_date = '25 Nov 1962',
Mass_kg = '890',
Landing_site = Region, Region = (-),
Status = 'Failed',
Country_of_origin = 'Soviet Union' .
...
Sequence = '21',
Lander = 'Tianwen-1',
Launch_date = '23 July 2020',
Landing_date = '14 May 2021',
Mass_kg = '240',
Landing_site = '109.7°E, 25.1°N',
Region = 'Utopia Planitia',
Status = 'Operational',
Country_of_origin = 'China'.
[1] ?- ^D
% Exit break level 1
true.
?-
Note: ^D is the visualization of holding down the Ctrl key and pressing D.
It should be possible to use initialization/1 (which is only used as a Prolog directive) to open the RocksDB and thus enter the top level with the predicates loaded, and to use at_halt/1 to close the RocksDB when SWI-Prolog halts; it is just not high on my priority list at present to work out the details.
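A minimal sketch of that idea (assuming the rocks-predicates pack is installed and the rocksdb file-search alias from the earlier example; db_handle/1, open_default_db/0 and close_default_db/0 are names invented here):

```prolog
:- use_module(library(rocks_preds)).   % assumes the pack is installed

:- dynamic db_handle/1.                % illustrative storage for the handle

open_default_db :-
    absolute_file_name(rocksdb('.'), Dir),
    rdb_open(Dir, DB),
    asserta(db_handle(DB)),
    at_halt(close_default_db).         % close when SWI-Prolog halts

close_default_db :-
    (   retract(db_handle(DB))
    ->  rdb_close(DB)
    ;   true
    ).

% Run after the program is loaded, so the top level
% starts with the database already open.
:- initialization(open_default_db).
```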
rdb_close/1 does a RocksDB close and then removes the entry from the dynamic predicate pred_table/2. When another rdb_open/2 is done, there’s no entry in pred_table/2, so the database is re-opened.
This doesn’t seem quite right … probably pred_table/2 should store the options given to the initial rdb_open/2. (Please open an issue on this and assign it to me, so that I don’t forget)
BTW, there are also the alias(Name) and how(once) options to rdb_open/3.
Thanks.
I was actually hoping it was a feature and not a bug.
Now that I know it is a bug, I suspect that break/0 or something will be needed to keep the RocksDB session alive so that manual queries can be entered at the top level.
Right now, you’d need to check, using rdb_clause/2. But this seems like a reasonable feature, and not difficult to implement, so please create an Issue for it. Issues · JanWielemaker/rocks-predicates · GitHub
That tabling might not be necessary … RocksDB has its own LRU cache of key/value pairs and its use of RAM can be controlled by options.
Again, please open an issue on this … I’ll have to think a bit about the ramifications.
Please file a bug with a reproducible example.
I’m in the middle of some code clean-up, so it’ll be at least a few days before I can work on this.
I have no problem trying to make a reproducible example, but now, knowing that rdb_close/1 is not working as expected, this might be an artifact of using the system in a way not intended.
I know a lot of work has gone into this and more is needed, so just want to say THANKS!.
That’s still a bug.
[All systems get used in ways not intended …]
For those wanting to try the rocks-predicates module and needing to see just what is required: these instructions are specific to Ubuntu but should also work with Debian. If you use Windows, Ubuntu can be installed using WSL 2.
Note: This was created as a minimal working example for issue 5 but is still useful for others.
Note: Because of issue 2, running queries from the top level after rdb_close/1 should not be possible, yet currently it is. In the future, if the modules this relies on are changed, the example Prolog code here will probably need to use break/0 or similar.
Installing SWI-Prolog via PPA
Based on: Installing from PPA (Ubuntu Personal Package Archive)
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install software-properties-common
$ sudo apt-add-repository ppa:swi-prolog/devel
$ sudo apt-get update
$ sudo apt-get install swi-prolog
$ swipl
After starting SWI-Prolog check the version to make sure it is a recent version.
As of 07/12/2022 it is 8.5.14
?- version.
Welcome to SWI-Prolog (threaded, 64 bits, version 8.5.14)
Installing SWI-Prolog pack RocksDB using Git instead of pack_install/1 so that different commits can be tried.
$ sudo apt install libsnappy-dev liblz4-dev libzstd-dev libgflags-dev
$ mkdir -p ~/.local/share/swi-prolog/pack
$ cd ~/.local/share/swi-prolog/pack
$ git clone https://github.com/JanWielemaker/rocksdb.git
$ git clone https://github.com/JanWielemaker/rocks-predicates.git
$ cd rocksdb
$ git clone https://github.com/facebook/rocksdb.git
As this only downloads the source code for the packs, they still need to be compiled.
$ swipl
?- pack_rebuild(rocksdb).
To see the checked out commit for a directory
groot@System:~/.local/share/swi-prolog/pack/rocksdb$ git show --oneline -s
e253458 (HEAD -> master, tag: V0.10.0, origin/master, origin/HEAD) ENHANCED: added statistics and logging options
OS: Ubuntu 22.04 LTS
SWI-Prolog: Install via PPA development version 8.5.14
RocksDB pack commit: e253458
RocksDB commit: a9565ccb2
rocks-predicates pack commit: 3071007
Example Prolog code with data (facts) for loading into RocksDB.
Note: This was originally created as a MWE for issue 5, hence the odd name of the module.
:- module('issue_5',
          [ check/1,
            mars_lander/9
          ]).
% -------------------------------------------------------------------------
% Use the fact dummy and user:file_search_path/2 to set up the alias myapp.
dummy.
user:file_search_path(myapp,Dir) :-
    source_file(dummy,File),
    file_directory_name(File,Dir).
user:file_search_path(library, '/home/groot/.local/share/swi-prolog/pack/rocks-predicates').
% ----------------------------------------------------------------------------
:- use_module(library(rocksdb)).
:- use_module(library(rocks_preds)).
% ----------------------------------------------------------------------------
% Source: https://en.wikipedia.org/wiki/List_of_Mars_landers#Mars_landers
fact_01('1','Mars 2MV-3 No.1','04 Nov 1962','25 Nov 1962','890','-','-','Failed','Soviet Union').
fact_01('2','Mars 2','19 May 1971','27 Nov 1971','1210','45°S 47°E','-','Failed','Soviet Union').
fact_01('3','Mars 3','28 May 1971','02 Dec 1971','1210','45°S 202°E','Sirenum Terra','Partial Success','Soviet Union').
fact_01('4','Mars 6','05 Aug 1973','12 Mar 1974','635','23.90°S 19.4°W','Margaritifer Terra','Failed','Soviet Union').
fact_01('5','Mars 7','09 Aug 1973','-','635','-','-','Failed','Soviet Union').
fact_01('6','Viking 1','20 Aug 1975','20 Jul 1976','572','22.27°N 47.95°W','Chryse Planitia','Success','USA').
fact_01('7','Viking 2','09 Sep 1975','03 Sep 1976','572','47.64°N 225.71°W','Utopia Planitia','Success','USA').
fact_01('8','Phobos 1','07 Jul 1988','-','2600','-','-','Failed','Soviet Union').
fact_01('9','Phobos 2','12 Jul 1988','-','2600','-','-','Failed','Soviet Union').
fact_01('10','Mars 96','16 Nov 1996','-','3159','41°31N 153°77 W♦','-','Failed','Russia').
fact_01('11','Mars Pathfinder','04 Dec 1996','04 Jul 1997','361','19°7′48″ N 33°18′12″W','Ares Vallis','Success','USA').
fact_01('12','Mars Polar Lander','03 Jan 1999','03 Dec 1999','583','76°S 195°W','Ultimi Scopuli','Failed','USA').
fact_01('13','Beagle 2','02 Jun 2003','25 Dec 2003','33.2','11.5265°N 90.4295°E','Isidis Planitia','Failed','United Kingdom').
fact_01('14','Spirit rover','10 Jun 2003','4 Jan 2004','174','14.5684°S 175.4726°E','Gusev Crater','Success','USA').
fact_01('15','Opportunity rover','07 Jul 2003','25 Jan 2004','174','1.9462°S 354.4743°E','Meridiani Planum','Success','USA').
fact_01('16','Phoenix lander','04 Aug 2007','5 May 2008','350','68.22°N 125.7°W','Vastitas Borealis','Success','USA').
fact_01('17','Curiosity rover','26 Nov 2011','5 Aug 2012','899','4.5895°S 137.4417°E','Gale Crater','Operational','USA').
fact_01('18','Schiaparelli EDM','14 Mar 2016','19 Oct 2016','577','2.052°S 6.208°W','Meridiani Planum','Failed','European UnionESA/Russia').
fact_01('19','InSight Mars Lander','5 May 2018','26 Nov 2018','727','4.5°N 135.9°E','Elysium Planitia','Operational','USA').
fact_01('20','Perseverance rover','30 Jul 2020','18 Feb 2021','1,025','18.4447°N 77.4508°E','Jezero crater','Operational','USA').
fact_01('21','Tianwen-1','23 July 2020','14 May 2021','240','109.7°E, 25.1°N','Utopia Planitia','Operational','China').
% ----------------------------------------------------------------------------
user:file_search_path(rocksdb,myapp('RocksDB')).
% ----------------------------------------------------------------------------
check(1) :-
    load.

load :-
    absolute_file_name(rocksdb('.'),Rocksdb_directory),
    setup_call_cleanup(
        rdb_open(Rocksdb_directory,RocksDB),
        load_records,
        rdb_close(RocksDB)
    ).

load_records :-
    forall(
        fact_01(A1,A2,A3,A4,A5,A6,A7,A8,A9),
        rdb_assertz(fact_01(A1,A2,A3,A4,A5,A6,A7,A8,A9))
    ).

mars_lander(A1,A2,A3,A4,A5,A6,A7,A8,A9) :-
    rdb_clause(fact_01(A1,A2,A3,A4,A5,A6,A7,A8,A9),true).
Example run.
groot@Galaxy:~$ swipl
Welcome to SWI-Prolog (threaded, 64 bits, version 8.5.14)
...
?- working_directory(_,'/mnt/c/Users/groot/Projects/rocks-predicates_issue_5').
true.
?- [issue_5].
true.
?- check(1).
true.
?- current_blob(Blob,rocksdb).
Blob = <rocksdb>(0x557084c04070) ;
false.
?- rocks_preds:default_db(Dir).
Dir = '/mnt/c/Users/groot/Projects/rocks-predicates_issue_5/RocksDB'.
?- current_table(rocks_preds:H,_),H=rdb_clause_index(_,M:P/I,_).
H = rdb_clause_index(<rocksdb>(0x563e1abac0f0), issue_5:fact_01/9, _),
M = issue_5,
P = fact_01,
I = 9 ;
false.
?- rocks_preds:default_db(Dir),rocks_preds:rdb_predicate_property(Dir,issue_5:fact_01(_,_,_,_,_,_,_,_,_),number_of_clauses(N)).
Dir = '/mnt/c/Users/groot/Projects/rocks-predicates_issue_5/RocksDB',
N = 21.
?- issue_5:fact_01(A1,A2,A3,A4,A5,A6,A7,A8,A9).
A1 = '1',
A2 = 'Mars 2MV-3 No.1',
A3 = '04 Nov 1962',
A4 = '25 Nov 1962',
A5 = '890',
A6 = A7, A7 = (-),
A8 = 'Failed',
A9 = 'Soviet Union' ;
A1 = '2',
A2 = 'Mars 2',
A3 = '19 May 1971',
A4 = '27 Nov 1971',
A5 = '1210',
A6 = '45°S 47°E',
A7 = (-),
A8 = 'Failed',
A9 = 'Soviet Union' ;
A1 = '3',
A2 = 'Mars 3',
A3 = '28 May 1971',
A4 = '02 Dec 1971',
A5 = '1210',
A6 = '45°S 202°E',
A7 = 'Sirenum Terra',
A8 = 'Partial Success',
A9 = 'Soviet Union'
...
?-
Note: As this is a MWE for issue 5, here is how to recreate the issue.
?- mars_lander(A1,A2,A3,A4,A5,A6,A6,A8,A9).
A1 = '1',
A2 = 'Mars 2MV-3 No.1',
A3 = '04 Nov 1962',
A4 = '25 Nov 1962',
A5 = '890',
A6 = (-),
A8 = 'Failed',
A9 = 'Soviet Union' ;
Press the space bar to see the next result.
A1 = '5',
A2 = 'Mars 7',
A3 = '09 Aug 1973',
A4 = A6, A6 = (-),
A5 = '635',
A8 = 'Failed',
A9 = 'Soviet Union' .
Pressing Enter here instead of the space bar causes the error.
ERROR: Arguments are not sufficiently instantiated
ERROR: In:
ERROR: [5] '$execute_goal2'(user:mars_lander('5','Mars 7','09 Aug 1973',-,'635',-,-,'Failed','Soviet Union'),['A1'='5',...|...],true)
?-
Enjoy.
Debian doesn’t seem to support adding the PPA - it needs some kind of key, which I don’t know how to generate. In the end, I build SWI-Prolog from the github source with this, which puts it in $HOME/.local/bin/swipl:
cd ~/src/swipl-devel && git pull --recurse && \
mkdir -p ~/src/swipl-devel/build && \
cd ~/src/swipl-devel/build && \
cmake -DCMAKE_INSTALL_PREFIX=$HOME/.local -G Ninja .. && \
ninja && \
ctest -j8 && \
ninja install
The first trial load of the SemMedDB data completed; it took about 14 hours.
This line from the RocksDB log is notable.
Note: The line is split into multiple lines to improve readability.
Cumulative writes:
339M writes,
339M keys,
339M commit groups,
1.0 writes per commit group,
ingest:
34.14 GB,
0.70 MB/s
Since module rocks_preds uses tabling, and tabling eats RAM, is there some safeguard built in to keep RAM from being exhausted?
Based on my experience with SQL databases, batch loading would give you a significant performance improvement. RocksDB has a batch facility, and the rocksdb interface supports it; but rocks_preds doesn’t use it (yet). It’s probably worth opening an enhancement request for this.
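For reference, the lower-level rocksdb pack exposes batches via rocks_batch/2, which applies a list of operations as one atomic write. A sketch (the directory name, keys and values here are made up for illustration):

```prolog
:- use_module(library(rocksdb)).   % the rocksdb pack, not rocks_preds

batch_demo :-
    rocks_open('demo.db', DB,
               [ key(atom), value(term) ]),
    % One atomic write containing several put operations.
    rocks_batch(DB,
                [ put(lander1, lander('Viking 1', 'USA')),
                  put(lander2, lander('Tianwen-1', 'China'))
                ]),
    rocks_get(DB, lander1, Value),
    format("Got: ~p~n", [Value]),
    rocks_close(DB).
```

An enhancement for rocks_preds would presumably collect the key/value pairs produced by a run of rdb_assertz/1 calls and flush them with one such batch.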
What do you get from the following query after you load your data and do your queries?
?- forall(predicate_property(rocks_preds:rdb_clause_index(_,_,_), P), writeln(P)).
You could, of course, remove the table directive if it turns out to use a lot of memory. It’s not clear to me that it’s needed, although removing it might require some modification of the rocks_preds or rocksdb code to maintain efficiency.
There’s an open issue on this, but I’m not able to look into it right now.
Thanks.
I have been eyeballing that feature since I first saw it, but it requires that the data be presented in key-sorted order. (ref). Also, that would be great for the initial load of the data, but for regular updates it is back to things as normal.
Another option, also under consideration and not on my priority list, is the alternative Bulkloading by ingesting external SST files, but obviously that requires that the SST files were created first, a chicken-and-egg problem. If there are many users who follow in loading many of these databases (biological in my case) into RocksDB and don’t mind delays of a few days while the SST files are updated and compacted for each monthly release, then these datasets could be distributed as SST files. Also, there is no requirement that there be a single RocksDB; one could put each dataset into a different RocksDB and then, instead of loading the data, just replace the RocksDB with the fresh SST files as needed. Pointer and handle manipulation is becoming one of my favorite swords of choice.
Also, I did not check with Windows Process Monitor to see whether any virus checker or other such process hooking file updates needs to be configured correctly for this. (ref)
Does SWI-Prolog really need it?
Since RocksDB is meant to be an embedded DB, the files should be local or at least accessible, and if one can access the files, then one could just write some simple C++ and do it that way. Also, since the code should be pretty much boilerplate with some options, I would not be surprised if such code is easily found in a Git repository or such.
What do you get from the following query after you load your data and do your queries?
?- forall(predicate_property(rocks_preds:rdb_clause_index(_,_,_), P),
writeln(P)).
EDIT 1 of just this answer
The second attempt almost finished cleanly; it seems an RRF file was missing, so the code threw an exception entering trace mode, at which point pressing the space bar just completed the commands on the stack, often failing, including failing on rdb_close. I plan to use ldb and other RocksDB tools to inspect the files.
While your desired query did work this time, the result is not what I think you seek.
?- forall(predicate_property(rocks_preds:rdb_clause_index(_,_,_),P),writeln(P)).
interpreted
visible
static
file(/home/eric/.local/share/swi-prolog/pack/rocks-predicates/rocks_preds.pl)
line_count(348)
number_of_clauses(1)
number_of_rules(1)
last_modified_generation(7084)
defined
tabled
tabled(variant)
size(488)
true.
Obviously something is amiss.
Second attempt clearly loaded more data.
Cumulative writes:
671M writes, 671M keys,
671M commit groups,
1.0 writes per commit group,
ingest:
60.01 GB,
0.50 MB/s
That part of the code is still a mystery to me.
I do know that rocks_preds actually creates two RocksDBs, one for the data and one in a folder called predicates; I have yet to work out the details.
I did find a tool to try to inspect the RocksDB files, but the precompiled version is not free, and building for Windows resulted in an error. I might try a Linux build if I am hard pressed for a way to understand the RocksDB files.
See: RocksDB Tools.
Read the issues:
8081 was surprising.
You really are busy. You asked me to open that one.
Another item of concern: journal and/or log files eventually consuming all disk space.
Since RocksDB, like other databases, keeps a journal and/or logs, over time these files will continue to grow and possibly accumulate, so one could run out of disk space. I don’t know how serious this is, but it is something that needs to be understood up front when setting up RocksDB.
Based on reading:
What I Wish Someone Would Have Told Me About Using Rabbitmq Before It Was Too Late
and a few others in the last few days.
Batch loading doesn’t require that the keys be in order, although it might help performance. The reference you mention even says “Bulk loading of data into RocksDB can be modeled as a large parallel sort where the dataset doesn’t fit into memory, …”.
Compaction deletes those journals and/or logs when it’s safe to do so (and the logs with human-readable messages, such as statistics, are rotated). That’s a standard feature of journaling file systems (such as NTFS, ext3, ext4, zfs, etc.) and also of databases that use this technique, such as PostgreSQL. Of course, if the database is being written to faster than compaction can happen, then you can run out of disk space, and that wouldn’t happen with a non-journaling system. But I suspect with modern hardware, that would require an incredibly high data rate.