Could a fast key-value store also be built with the SWI-Prolog red-black tree library? But then you would have to write the whole tree to a file when updating a single key-value pair?
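A rough sketch of that idea using the real library(rbtrees) (the file name 'store.db' and the save_store/2 helper are hypothetical). It shows exactly the pain point: persisting means serialising the entire tree on every update:

```prolog
:- use_module(library(rbtrees)).

% Toy in-memory key-value store backed by a red-black tree.
demo :-
    rb_new(Empty),
    rb_insert(Empty, key1, value1, T1),
    rb_insert(T1, key2, value2, T2),
    rb_lookup(key2, Value, T2),
    format("key2 -> ~w~n", [Value]),
    save_store(T2, 'store.db').

% Persisting after a single update still writes the whole tree.
save_store(Tree, File) :-
    setup_call_cleanup(
        open(File, write, Out),
        ( writeq(Out, Tree), write(Out, '.\n') ),
        close(Out)).
```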
The types, I think, are quite flexible.
You are probably referring to the create table examples, in which case they have to match SQL datatypes.
It should be trivial to extend the query datatypes to anything the backend supports. pack(bio_db) progressed things in that direction, but I had no time to propagate the changes to pack(db_facts).
Something Swish does is put variable names in those table positions where SQL would put NULL. I find this handy in that it implies columns in a given row holding the same variable name need to be equal once the value is set.
In Prolog, ungrounded variables tend to be seen as “types” in their own right rather than just placeholders for whatever type, and I think it would be cool to make ungrounded variables a “type” (replacing NULL) in a persistent database system.
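To illustrate the idea with plain unification (a toy query, not Swish or database code): two columns that share a variable become equal the moment either is bound:

```prolog
% The shared variable Email plays the role SQL gives to NULL,
% plus an equality constraint between the two columns.
?- Row = row(alice, Email, Email),
   Row = row(alice, 'a@example.com', Work).
% Both Email and Work are now bound to 'a@example.com'.
```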
Does this mean it needs the Git tip or has enough time passed that the latest dev version can be used?
The current dev release will do. You need to fetch the pack from git and build it yourself.
With the RocksDB pack installed and working, and a real-world project using Prolog code with millions of facts that exceed a local system's RAM (think 16 GB), it is now occurring to me that what I first envisioned is not how the sample RocksDB code works.
What I envisioned was that if a fact was needed, it would transparently be accessed from an SSD. In other words, if I had ten million facts (predicates without bodies) of DNA segments available as
dna(Index, Nucleotides) then the goal
dna(2048547, Nucleotides) would transparently retrieve the fact from an SSD using RocksDB. Thus the data that can be accessed with SWI-Prolog would no longer be limited by RAM but by how much SSD storage can be attached. Granted, such facts would have to be identified as working differently than standard facts; I am thinking of a Prolog directive to identify them as such and term expansion to change how the fact goal is processed. However, library(persistency) is similar code worth leveraging for the task.
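A minimal sketch of that envisioned behaviour, using rocks_open/3 and rocks_get/3 from the rocksdb pack (the file name 'dna.db', the key scheme, and the dna/2 wrapper are my own assumptions; this version still needs Index to be bound):

```prolog
:- use_module(library(rocksdb)).

:- dynamic dna_db/1.

open_dna_db :-
    rocks_open('dna.db', DB, [key(int64), value(term)]),
    assertz(dna_db(DB)).

% Fetch a single fact from the SSD on demand instead of
% holding ten million dna/2 clauses in RAM.
dna(Index, Nucleotides) :-
    dna_db(DB),
    rocks_get(DB, Index, Nucleotides).
```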
Is this one of the ways that others are thinking/expecting?
Is it just me thinking like this?
If it is just me then I don’t mind but I would need help if C/C++ code is needed for increasing efficiency, much more help if it has to be C++. Also if anything needs to be special for Mac OS then count me out on that part.
Noting this because others might find it useful.
“Syntax and Semantics of a Persistent Common Lisp” by J. H. Jacobs and M. R. Swanson (pdf)
What kept my attention while reading this was:
In our view, a Persistent Lisp should follow three principles.
- First, it should conform to established Lisp syntax, semantics, and programming style; if it does not look like Lisp, it cannot be called Lisp.
- Second, the persistence features provided should be powerful enough so that the programmer need not resort to using Lisp I/O features or operating systems calls to implement persistent programs.
- Third, the programs constructed using Persistent Lisp should be sufficiently efficient that programmers will be able to use the programs that they construct.
Replacing pointers with handles was necessary so that heap data which becomes persistent can be moved into the protected region of memory reserved for persistent values.
I found this by first thinking of the concept not as persistent predicates but as saving predicate state based on the Prolog memory structures (ref). Realizing that, I tried to come up with a different name and thought of persistent Prolog. Knowing that many programming languages are not homoiconic, I limited my searches to such languages. Thus I Googled for
persistent lisp pdf, but quickly found that Google associates persistent lisp with speaking rather than the language, so I added language; Googling persistent lisp language pdf turned up the paper.
I tried the wordnet file that @jan sent me.
Setting read-only made almost no difference in random lookups or counting the triples (the RocksDB documentation says that read-only avoids some locks, so I suppose the performance improvement for read-only only shows up when multiple threads access the database).
Interestingly, limiting the VM with ulimit -v 1000000 sped things up by about 20%.
If I reduced VM by much more, I got a crash with “terminate called without an active exception”.
If I reduced the RAM too much (using ulimit -l and ulimit -v), I got a Prolog "fail" (presumably there's a return code from RocksDB that isn't being checked properly).
This was on a ChromeBook (AMD Ryzen 7 3700C) 2.3-4.0GHz with 16GB RAM, 4 cores (8 threads) and PCIe SSD. My file was 6.75x larger than Jan’s (# triples; 3.2x larger by MB) and his benchmark numbers are roughly 3x to 5x faster than mine when I adjust for size of the database. My guess is that this is due to more CPU cache and faster memory – after all, I have a laptop and Jan has a server. Or, it could be that Jan’s larger RAM allows better file caching, especially as my system has very little spare RAM (I have a lot of open tabs). I’m going to write some code that creates a large set of “random” predicates, so that I can hopefully reduce the effect of file caching and RAM. There are also some RocksDB tuning parameters, but it’ll take me a while to add the code for setting RocksDB options.
% Loaded /home/peter/Downloads/wordnet.hdt in 255.331 sec (255.101 sec CPU)
% Count triples (5557296): 12.102 seconds
% Random triples: 5.546 seconds for 100000 = 57 microsec each
% DB size: 316MB
% When reduced memory by ulimit -v 1000000
%   (no effect: ulimit -l 10000; ulimit -m 10000)
% random triples improved to 4.4 seconds, count to 10.863 sec
% read-only made essentially no difference.
For comparison, @jan reported (AMD3950X based system, 64Gb mem and M2 SSD drive; 3.5-4.7GHz, 16 cores, 32 threads) :
- 21 files, 21 predicates, 821,492 clauses, 34Mb source text
- Load time: 11.7 sec.
- RocksDB size: 99Mb
- rdb_index(hyp/2, 1) (89,089 clauses) → 1.35 sec.
- random query time on hyp(+,-): 10usec
I think for the next few weeks those of us trying out SWI-Prolog with #RocksDB will be adding such code for RocksDB options, as that seems the quickest path to enlightenment. I will have to brush up on my C/C++ coding and set up a toolchain.
Or maybe I can get by with SWI-Prolog Foreign Language Interface.
To make it easier to add a link for RocksDB, it has been added as #RocksDB to this site's Linkify Words list.
I know how to add the options, and my C++ is good enough.
There’s already a foreign language interface - rocksdb/rocksdb4pl.cpp at 5034a96229e2e7d366718ccb200537e6ab6723a8 · JanWielemaker/rocksdb · GitHub
which uses the C++ interface (section('packages/pl2cpp.html'))
The “options” handling for RocksDB is a bit complicated – there’s already some code in rocksdb4pl.cpp, but it needs to be adapted for a wider range of options. The new
PL_scan_options() might help.
If anybody wants to work on this, please message me so that we don’t do duplicate work. Otherwise, I’ll work on it on-and-off. (Jan’s away, so it might take some time for him to accept PRs)
I’ve written the code for handling most of the rocksdb options, but I need to put together a test suite to verify what I’ve written.
If you want to see what options are available (these are most of
rocksdb/include/rocksdb/options.h): rocksdb/rocksdb.pl at 5aedb4acb2a8fd2cb5e6b9c007a6739b29aaa910 · kamahen/rocksdb · GitHub
I found some tuning guides (not sure which are best, and there might be others):
You actually found one by EighteenZi, which is one more than I previously knew of.
The other two are from what I now consider my main go-to pages for information.
While this is not something I plan to do, possibly never, maybe someday someone somewhere will take up this idea (think thesis):
use a neural network (thinking LSTM) to analyze the performance counters and adjust the options, or even swap out functionality in real time, e.g. convert a bloom filter to a ribbon filter, to improve performance.
Thanks for the info, please keep it coming.
If @Jan considers some of these replies as going off topic and needs them moved just let me know. As it is your topic it is your call.
Created my first RocksDB from scratch, and in reviewing the files I saw it automatically created an INI-style file holding the RocksDB options. I have not checked, but it appears that all of the options are there.
@jan and @peter.ludemann
Just to make sure I am understanding how to use RocksDB with Prolog.
In my first naive attempt using just (JanWielemaker/rocksdb) to store facts, rocks_put/3 and rocks_get/3 were used.
rocks_put/3 worked as expected, but then, thinking like Prolog, I used rocks_get/3 with a variable for the key, which obviously caused an error. So RocksDB expects the key to be bound.
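A small sketch of that constraint (the file name 'facts.db' is arbitrary; rocks_put/3, rocks_get/3, rocks_enum/3 and rocks_close/1 are from the rocksdb pack, though the option names are worth double-checking against the pack docs):

```prolog
:- use_module(library(rocksdb)).

demo :-
    rocks_open('facts.db', DB, [key(term), value(term)]),
    rocks_put(DB, f(1), hello),
    rocks_get(DB, f(1), V),           % fine: the key is bound
    format("~w~n", [V]),
    % rocks_get(DB, K, _) with K unbound errors out;
    % enumeration goes through rocks_enum/3 instead:
    forall(rocks_enum(DB, Key, Val),
           format("~w -> ~w~n", [Key, Val])),
    rocks_close(DB).
```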
Looking at the examples from (JanWielemaker/rocks-predicates) one finds wrapper predicates around rdb_clause/N that stand in for facts, e.g.
rdf(S,P,O) :- rdb_clause('rdf.db', rdf(S,P,O), true).
hyp(A1,A2) :- rdb_clause(hyp(A1,A2), true).
Backtracking through the code examples and data structures, one observes that the RocksDB keys/values should not be loaded directly using rocks_put/3 but with rdb_assertz/N.
Once a RocksDB has been created by loading facts with rdb_assertz/N, most general queries can be run even from the top level.
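Putting those pieces together, the intended flow looks roughly like this (a sketch only; the module name and the exact arities of rdb_assertz and rdb_clause should be checked against the rocks-predicates prototype):

```prolog
:- use_module(library(rocks_preds)).  % from JanWielemaker/rocks-predicates

% Load facts via rdb_assertz rather than rocks_put/3, so the
% library's own clause bookkeeping stays consistent.
load_dna :-
    rdb_assertz(dna(1, [a,c,g,t])),
    rdb_assertz(dna(2, [t,g,c,a])).

% Wrapper in the style of the hyp/2 example above; afterwards the
% most general query dna(X, Y) enumerates the stored facts.
dna(Index, Nucleotides) :-
    rdb_clause(dna(Index, Nucleotides), true).
```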
Obviously that knowledge is just dipping a toe in the water, but at least nothing bit back.
Is this RocksDB about the same as the QDBM library?
Yes in the sense that it is a key/value store and that it stores the data as files (ref) but there are many Key/Value stores.
SWI-Prolog also has a library to use Redis (GitHub). At the inception of Redis it was memory only but that has changed so that it can now use online secondary storage. I don’t know the details but Googling for such info is effective.
The benefit that some of us find with RocksDB is that the data can be accessed from online secondary storage, which dramatically increases the amount of data (think Prolog predicates, which include facts) that can be accessed, think 100s of GBs, while lowering the cost. The downside is that the access time increases. The other big benefit of RocksDB is that it has been used in production at Facebook for several years in some areas, is open source (GitHub), and is actively updated. (ref)
If you look through the history of online questions about large data sets and Prolog you will find that many over the years have encountered the problem and used traditional means to store and access the data, think SQL databases. With the data in a database then the data no longer acts like traditional Prolog, e.g. most general query, but is often accessed through some in-between code.
At least with Prolog facts I am finding success using them and being able to use a most general query. Doing so currently requires the prototype code from
Dear Eric, thank you for your information; it is very valuable to me. Do you know if RocksDB can be configured and used such that it saves its info (keys and values) in multiple locations, so that it mirrors all the info to 2, 3, 4 or 5 servers / data locations? Or should that be built on top, using SWI-Prolog to perform the storage to multiple different RocksDB locations?
With so many configuration options (ref), with my RocksDB knowledge only a week old, and with multiple locations not part of my needed scenario, I cannot definitively say no.
Similar questions to yours are prevalent on the internet and the typical answers that I see are to look at other Key/Value stores, YMMV.
In theory what Jan W has done for SWI-Prolog by interfacing with one Key/Value store, namely RocksDB, should be transferable to other Key/Value stores.
While I am currently experimenting with the two code repositories noted:
I have no problem moving on to something else if it does not work.
Evolution of Development Priorities in Key-value Stores Serving
Large-scale Applications: The RocksDB Experience by Siying Dong, Andrew Kryczka, Yanqin Jin and Michael Stumm
RocksDB is a key-value store targeting large-scale distributed systems and optimized for Solid State Drives (SSDs). This paper describes how our priorities in developing RocksDB have evolved over the last eight years. The evolution is the result both of hardware trends and of extensive experience running RocksDB at scale in production at a number of organizations. We describe how and why RocksDB’s resource optimization target migrated from write amplification, to space amplification, to CPU utilization. Lessons from running large-scale applications taught us that resource allocation needs to be managed across different RocksDB instances, that data format needs to remain backward and forward compatible to allow incremental software rollout, and that appropriate support for database replication and backups are needed. Lessons from failure handling taught us that data corruption errors needed to be detected earlier and at every layer of the system.
You can use library(redis) for that; redis is a very versatile distributed key value store supporting fault tolerance and clustering (on the network or even unix sockets on the local machine), and it is used by many many companies. It can be used for small databases (it consumes very little resources and it is simple to install) or for very large databases using clusters and shards.
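As a minimal sketch with SWI-Prolog's library(redis) (assumes a Redis server on the default localhost port; the key name is arbitrary):

```prolog
:- use_module(library(redis)).

demo :-
    redis_connect(C),                 % default: localhost:6379
    redis(C, set(mykey, "hello")),
    redis(C, get(mykey), Value),
    format("mykey -> ~w~n", [Value]),
    redis_disconnect(C).
```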