Persistent predicates based on RocksDB

That is great to know, thank you. I will try to use Redis with SWI-Prolog for this.

I don’t know the exact use case for RocksDB at Facebook, but I do know how LevelDB is used at Google (RocksDB is based on LevelDB).

The underlying file system at Google is called “Colossus” (an older version was called GFS for Google File System, but it had limitations, such as performance problems with more than 10 million files). This provides a data-center scale file system, with replication and fault tolerance – it’s like having a file system with several hundred PB or so of capacity. So, replication is handled by the file system, not by LevelDB (and, presumably, not by RocksDB).

To allow higher performance, Google provides BigTable (for datacenter scale tables) and Spanner (for multi-datacenter scale). The tables are sharded to individual “tablet” servers, and each column gets its own LevelDB tables (for more details, see Bigtable: A Distributed Storage System for Structured Data – Google Research – the “sstables” in the paper are an internal form of the open-source LevelDB tables). The individual servers are for CPU scaling – data scaling is already done by the underlying distributed file system. (When I was first at Google 15 years ago, the largest table was roughly 40PB with ~5000 servers … it took about 2 weeks for a mapreduce job with a separate ~5000 servers to read through and summarize all the data)

RocksDB seems to have taken a hybrid approach, where a single RocksDB table can contain multiple columns and can also be multi-threaded. This scales better than LevelDB, but not as much as the BigTable approach; on the other hand, it avoids the complexity of BigTable or Spanner and can better take advantage of multi-core hardware (when BigTable was developed, datacenter CPUs seldom had more than 4 cores).

BTW, I found this replicator for RocksDB: GitHub - pinterest/rocksplicator: RocksDB Replication


@EricGT – be sure to use the appropriate values for the rocks_open/3 options key and value, which can take on the values atom, string, binary, int32, int64, float, double, term – you probably want key(term),value(term) for your use case – and that’s what pack(rocks_preds) uses. (rdb_open/3 sets the key(term) and value(term))
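For reference, a minimal sketch of opening a database with those options using pack(rocksdb); the directory name here is just an example:

```prolog
% Minimal sketch: open a RocksDB database whose keys and values are
% arbitrary Prolog terms. The directory name is only an example.
:- use_module(library(rocksdb)).

open_term_db(DB) :-
    rocks_open('predicates.rdb', DB,
               [ key(term),    % keys stored as Prolog terms
                 value(term)   % values stored as Prolog terms
               ]).
```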

I’ve updated the code for pack(rocksdb) and hope to have a new version with many more options available within a few days.


Peter, thank you for the information – that is great to know. I need to create a database system for someone. They want the data to be mirrored on multiple servers for stability, and it should be fast too, of course. The total data will not be as big as in Google's case. They themselves suggested making a multi-location version of PostgreSQL. I myself would prefer to use a good key-value system (Redis or RocksDB) and store the rest of the data in sequential text files.

One interesting note related to what you write.

Some database systems are actually replacing their storage engine with RocksDB. (ref) Too bad PostgreSQL is not on the list. :slightly_frowning_face:

Sorry; yes, I meant rocks_open/3. I’ll edit my post.

I have mixed feelings about NoSQL vs relational databases. NoSQL is definitely faster (at Google, adding a form of transactions to BigTable (which is a NoSQL key-value store) resulted in a ~10x performance hit, even though it used “eventual consistency” semantics). On the other hand, relational databases are much better for handling multiple indexes, transactions and consistency (note that RocksDB has a form of transactions, although pack(rocksdb) doesn’t support it). [@jan’s rocks-predicates adds simple multi-indexing, and I intend to add more indexing features.]

Why would you want to do this? Both RocksDB and PostgreSQL do a decent job of handling “blobs” of data, and they take care of the replication issues (in the case of RocksDB, by an add-on).

PostgreSQL uses something similar to RocksDB for its data storage under the covers – an immutable key-value store that is periodically compacted. That’s why PostgreSQL doesn’t need to hold write locks in many situations where, e.g., MySQL needs to. So, it might be that there’s not much advantage in using RocksDB with PostgreSQL, whereas other databases can benefit more by taking advantage of the different record-locking model. (This is pure speculation on my part, and I could easily be wrong – I had bad experiences using MySQL and have assiduously avoided it for the past 15 years.)

Thanks. Did not know this.


This module implements the hstore data type for storing sets of key/value pairs within a single PostgreSQL value. This can be useful in various scenarios, such as rows with many attributes that are rarely examined, or semi-structured data. Keys and values are simply text strings.

Peter, thank you for the information.

I had a similar experience with MySQL, where it crashed and all the data was lost, whereas with text files to which the data is appended I never had this problem. Also, with text files you don't need to create a database in advance, which is a big advantage.
Text files also result in much smaller backups when zipped.
In several data-storage cases you don't need an expensive key; appending to a file is enough. I thought that PostgreSQL and MySQL store this kind of data together with the indexes in one file, so that very large blob data is mixed with index data.

Because I thought that RocksDB was only for key-value storage, the same as the QDBM “depot” library is/was.

For a long time there has been a bundled interface for BerkeleyDB. See bdb.pl -- Berkeley DB interface. That also maps binary blobs to binary blobs. There are many differences though. For example, multiple values may be associated with the same key. On the other hand, there is no seek-to-prefix and key enumeration as in RocksDB, etc. In short, there are a large number of K-V stores out there and they all have pros and cons compared to each other. The interfaces differ fundamentally; backup, distribution, shared access, scalability, robustness against crashes, etc. all vary. Once upon a time BerkeleyDB was the dominant store. Now, I think there is not a single dominant store. RocksDB came into the picture because for some project we needed something that was as scalable as possible and BDB wasn’t going to do the job. RocksDB came out best. Whether or not that is still the case, I do not know.
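For comparison with the rocksdb sketch above, here is a small hedged example of library(bdb); the file name and the duplicates(true) option are only illustrative:

```prolog
% Sketch of the bundled BerkeleyDB interface: with duplicates(true),
% several values can be stored under the same key.
:- use_module(library(bdb)).

bdb_demo :-
    bdb_open('test.db', update, DB, [duplicates(true)]),
    bdb_put(DB, color, red),
    bdb_put(DB, color, blue),      % a second value under the same key
    bdb_getall(DB, color, Values), % collect all values for the key
    writeln(Values),               % e.g. [red, blue]
    bdb_close(DB).
```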


Redis is a lot more :slight_smile: It can also act as a message-brokering system. As a K-V store it has some limitations though: being networked rather than embedded, it is a lot slower, and it is less scalable because it keeps all data in memory. OK, it can distribute keys over a cluster and thus has no hard upper bound, but that implies cluster/LAN networking rather than Unix domain sockets.

But, it has interesting data structures and is pretty good stuff for horizontal scaling and coordinating micro services.
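As a rough illustration of the K-V side with library(redis), assuming a Redis server is reachable on localhost:6379 (the server name and key are made up for the example):

```prolog
% Sketch: register a server under the name `default`, then issue
% plain SET/GET commands against it.
:- use_module(library(redis)).

redis_demo :-
    redis_server(default, localhost:6379, []),
    redis(default, set(greeting, "hello"), _),
    redis(default, get(greeting), Reply),
    writeln(Reply).
```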


Is the current implementation comparable to the indexing that Scryer Prolog currently provides for its in-memory predicates? Some hybrid between first-argument indexing and multi-argument indexing:

First instantiated argument indexing
Scryer Prolog indexes on the leftmost argument that
is not a variable in all clauses of a predicate’s definition.
https://github.com/mthom/scryer-prolog#first-instantiated-argument-indexing

Or maybe it's a little bit more powerful? Can it happen in the RocksDB interface of SWI-Prolog that there are Index1 and Index2 with Index1 =\= Index2 for the same predicate indicator PI here:

No. There are (as yet) only explicitly created indexes for the RocksDB based predicates. The design focuses on predicates with many ground facts.

Yes. Multiple indexes are supported. (Also for SWI-Prolog’s internal predicates).

I’ve had my bad experiences with MySQL. RocksDB seems to be designed to avoid data loss (e.g., with the write-ahead journal, which allows recovery after a crash) … the same design (via LevelDB) is the core of Google’s BigTable data storage, and millions (billions?) of people depend on it. (Google also uses Reed-Solomon encoding and replication, but that’s at the underlying “file system” level.)

RocksDB does data compression transparently, so it probably takes less space than text files, unless you compress the text files and not just the backups.

You have a database anyway, so text files just add complexity – they need to go into a separate directory, etc.

Appending to a file has its own dangers, unless you’re using a journaling file system. (RocksDB is, in effect, journaled.)
FWIW, I’ve seen a POSIX-like file system that was implemented on top of BigTable – the keys were just the full paths to file names (RocksDB does prefix compression on the keys, so a hierarchical key compresses nicely). If you always read the entire text at once (rather than doing successive read-lines), then storing large blobs of text in a database (or key-value store like RocksDB) makes a lot of sense IMHO.
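A hedged sketch of that path-as-key idea with pack(rocksdb); the directory, paths and text are made up for the example:

```prolog
% Sketch: store whole text blobs under hierarchical, path-like string
% keys. RocksDB's prefix compression keeps such keys cheap to store.
:- use_module(library(rocksdb)).

blob_demo :-
    rocks_open('texts.rdb', DB, [key(string), value(string)]),
    rocks_put(DB, "articles/2022/rocksdb-notes.txt",
              "All of the text, read and written in one go."),
    rocks_get(DB, "articles/2022/rocksdb-notes.txt", Text),
    writeln(Text),
    rocks_close(DB).
```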

And this below is the metadata that records what is indexed?

What values can “Status” have?

From looking at the code:
indexing
destroying
true

I have some simple test code that I can post if you wish (it makes a few entries and then dumps the database in human-readable form).
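Not that actual test code, but a sketch of the idea – make a few entries, then enumerate the whole database in readable form; the directory and facts are invented for the example:

```prolog
% Sketch only: insert a couple of entries and dump every key/value
% pair; rocks_enum/3 backtracks over all pairs in the database.
:- use_module(library(rocksdb)).

dump_demo :-
    rocks_open('test.rdb', DB, [key(term), value(term)]),
    rocks_put(DB, user(1), name(alice)),
    rocks_put(DB, user(2), name(bob)),
    forall(rocks_enum(DB, Key, Value),
           format("~q -> ~q~n", [Key, Value])),
    rocks_close(DB).
```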

For the SWI-Prolog RocksDB packs page

https://www.swi-prolog.org/pack/list?p=rocksdb

When installing the pack on an Ubuntu machine the option

Run post installation scripts for pack “rocksdb” Y/n?

Y needs to be entered to start the build scripts. However, if make is not installed, it results in the following error:

ERROR: Could not find executable file “make” in $PATH

Perhaps the instructions should link to setting up a system for building, e.g. Build SWI-Prolog from source

It is not always the same (although, when you have everything set up to build from source, you are probably fine). General instructions are pretty hard to give and are often platform and package dependent. If there are foreign components, you always need make and a suitable compiler for the foreign components, typically a C or C++ compiler with a compatible runtime environment (the latter is mostly a Windows problem, as the runtime libraries are generally well standardized on other platforms).

I don’t see an obvious solution. One might be a Prolog program that deals with most of these issues. But, the rule set will be large and hard to maintain. Making cmake part of the solution is probably more promising. CMake roughly allows you to specify “I want to create a shared object from these source files with these properties and this tool chain” and it has most of the knowledge to properly use these tools to get the desired result.


I’ll add a comment to the README about make. It seems to be part of the base system for Debian, but it doesn’t hurt to do sudo apt install make.

(I should have a new version of pack(rocksdb) in a day or two; right now, I’m doing a bit of cleanup on SWI-cpp.h, which was obviously written by a C programmer. :grin: )

I know I said make when I wrote the reply, as that was what the error message noted, but your command made me wonder whether the error should mention cmake, and whether your note should use sudo apt install cmake, since that is what the SWI-Prolog build instructions page uses.

sudo apt-get install \
        build-essential cmake ninja-build pkg-config \
        ncurses-dev libreadline-dev libedit-dev \
        libgoogle-perftools-dev \
        libgmp-dev \
        libssl-dev \
        unixodbc-dev \
        zlib1g-dev libarchive-dev \
        libossp-uuid-dev \
        libxext-dev libice-dev libjpeg-dev libxinerama-dev libxft-dev \
        libxpm-dev libxt-dev \
        libdb-dev \
        libpcre2-dev \
        libyaml-dev \
        default-jdk junit4