Wait on database changes

As is, assert uses a lock to include the new clause into the linked list of clauses. Probably could be achieved lock-free as well, but I’m not aware of this lock suffering from significant contention. Retracts merely sets the retract generation. Asynchronous clause garbage collection (running in the gc thread) detects clauses that are erased and there is no open query with a generation in which the clause is visible. It uses the same lock as above to remove the clause from the predicate. The clause is placed in quarantine as reading threads may be scanning it. Eventually it can be really reclaimed. So yes, executing a predicate is lock free, both for static and dynamic code.

This doesn’t affect the current prototype wait-on-change mechanism. That basically makes the POSIX condition variable fully available to the Prolog user. It notably allows zero or more agents to listen to database changes that can include zero or more elementary changes. This is more flexible than a message queue. The extra power comes with extra responsibilities though: lock-based consistency management is easy to get wrong. The use case I have in mind is mostly a multi-threaded blackboard design This is hard to implement with message queues only as a term written on a queue can only be picked up by a single thread.

1 Like

The main thing I want is isolation without locking. As we have all the visibility stuff in place to deal with the logical update view this isn’t particularly hard to implement: make sure that the thread performing the database changes does so using far-in-the-future generations. That mechanism is currently already used for reloading code: the thread that reloads the code does all its work using far-in-the-future generations. When the reload is done, the permanent changes are moved to just above the current generation and finally the notion of the generation is adjusted to make the changes visible. As a result, threads other than the one reloading see a simple atomic change to the database.

Now this is a bit simplified scenario as only one thread is allowed to (re)compile a file. Other threads detect some thread is recompiling the file and wait for the compiling thread to finish. We probably want multiple threads using transactions. We might get away by serializing the individual transactions though. That also avoids the issues of resolving conflicting transactions.

I think we might want so-called snapshot isolation and MVCC for IO-based assert/retract. From docs it seems like we already have predicate versions to implement logical updates but currently it only works in a scope of a single predicate invocation (and backtracking) and not in a scope of some arbitrary group of calls.

This would automatically free reads from waiting behind a mutex until some concurrent write transation is doing its IO. For IO-free assert/retract, imho, it’s not worth doing.

1 Like

This could indeed be a module based lock. Not sure how useful this is. The library isn’t designed for highly volatile database management. A lock over a rather complex I/O operation isn’t too bad either.

If you want to send a PR managing a per-module lock created when attaching the database I’m happy to include it, otherwise I’ll wait until this proves to be problematic.

In my experience so far, for virtually any usage of the persistency library I would have used shared, either because there are no threads, so it doesn’t matter or because multiple threads share the data.

For now, I see no big problems with this library. It is useful for relatively simple databases with not too many modifications like and application user database.

That is your assumption, but it isn’t true for SWI-Prolog. Dynamic predicates are executed without locking. They use exactly the same technology, except there is a globally visible predicate that holds a dynamic array of thread-specific implementations that are otherwise just dynamic predicates. I guess it is barely measurable but I’d expect thread local predicates to be a little slower.

Mnesia is a distributed DBMS built with Erlang. It’s not a “native” part of Erlang.

Incidentally, Google Bigtable and Spanner offer distributed atomic transactions, which are a layer above the basic key-value functionality (where changes to a single row are atomic). There’s a large overhead for doing this: “… uses twice as many resources as the previous system to process the same [processing] rate. … Roughly 30 times more CPU per transactions than a standard DBMS.”

(And I think there are a few other papers)

It’s probably not relevant to discuss Bigtable, replication, Spanner, Paxos, etc. further. (BTW, Bigtable can also have replication; it’s not only a Spanner feature.)

I just wanted to point out that transactions can have a considerable overhead – and that’s for a design that uses eventual consistency (partly for performance and partly because of replication). So, whatever solution is decided on for transactions, you probably want to include the ability to have single-row (single assertion) atomicity without any transaction overhead.

(Bigtable and friends also have a notification API – it’s also not cheap, even though it piggy-backs on the garbage collector.)

Transactions as we are discussing comes pretty cheap:

101 ?- time(forall(between(1, 1 000 000, I), (retractall(p(_)), asserta(p(I))))).
% 3,000,001 inferences, 1.240 CPU in 1.244 seconds (100% CPU, 2418825 Lips)
true.

102 ?- time(forall(between(1, 1 000 000, I), transaction((retractall(p(_)), asserta(p(I)))))).
% 6,000,002 inferences, 1.722 CPU in 1.725 seconds (100% CPU, 3484241 Lips)
true.

SWI-Prolog’s dynamic database handles this scenario quite poorly though :frowning: