Were there some changes to the dynamic database?

I tried this test case:

:- dynamic foo/2.

test :-
   retractall(foo(_,_)),
   between(1,1000000,K),
   assertz(foo(K, bar)),
   J is K-5,
   retract(foo(J, bar)),
   fail.
test.

In version 9.1.x the test took 1.5 secs.
In recent version 9.3.x it takes 2.0 secs.


SWI-Prolog’s dynamic predicate design is not well suited for this scenario. The concurrent clause garbage collector may fall behind reclaiming the dead clauses. There are no serious changes between the two versions, but the timing of the clause garbage collector can make a big difference. Using asserta/1 rather than assertz/1 cuts the time from 1.9 sec to 0.9 sec on my machine, because retract/1 no longer has to skip over the dead clauses.

I have been thinking for a long time that we need a mechanism to deal with a dynamic predicate that has exactly one clause and supports atomic updates. For shared dynamic predicates that would simplify thread synchronization. As is, we need to wrap the update in transaction/1 to avoid other threads seeing two clauses or none (or we need to use with_mutex/2 for both reading and writing). For thread_local/1 predicates, we could simply discard the old clause immediately.
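
As an illustration of the current workaround: a shared “global variable” predicate can be updated inside transaction/1 so that other threads never observe zero or two clauses. A minimal sketch (the names flag_value/1 and set_flag_value/1 are made up for this example):

```prolog
:- dynamic flag_value/1.

flag_value(0).                          % initial value

%% Replace the single clause atomically; other threads see
%% either the old or the new value, never none or both.
set_flag_value(New) :-
    transaction(( retractall(flag_value(_)),
                  assertz(flag_value(New)) )).
```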

I wonder about a good API though. One option I have been considering is a max_clauses(Count) predicate property, causing asserta/1 to drop the last clause when the maximum is reached and assertz/1 to drop the first clause. Having one clause is then a special case of this. That API would also cover using a dynamic predicate for caching. We could additionally have a property that makes the assert block when the maximum is reached, turning the dynamic predicate into an alternative to message queues.
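
A purely hypothetical sketch of how such a declaration could look (neither this syntax nor the max_clauses property exists in any SWI-Prolog version):

```prolog
%% HYPOTHETICAL -- not implemented; for discussion only.
%% :- dynamic cache/2 as max_clauses(100).
%%
%% assertz(cache(K,V)) would then drop the first (oldest) clause
%% once 100 clauses exist, giving a bounded FIFO-style cache.
%%
%% :- dynamic setting/1 as max_clauses(1).
%% asserta(setting(X)) would atomically replace the single clause,
%% turning the predicate into a (shared) global variable.
```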

Another option would be a predicate to replace a clause. But, how do you reference the old one? Using a clause reference? Using a term as retract/1, but this may match zero clauses or multiple clauses. What then? Such an API is interesting for updating dynamic predicates, but less ideal for the global variable scenario and not usable for the cache/queue scenario.

Is there any known current practice in this area?


I think max_clauses(Count) is a very good idea. I often find myself using assert to cache the last value of a long computation or a network-provided value. This is, I think, a very common use case for which max_clauses would be a very good solution.

With the rest of SWI-Prolog staying as it is, does that mean that we can efficiently access the clauses inserted last, but not the ones that were inserted first in this message queue?

It would mean we get normal clause indexing in the queue. The system message queues allow picking messages out of order, but the indexing is limited. If I recall correctly, queues index similarly to first-argument indexing, e.g., on atoms, small integers and functors. Using predicates as message queues is more Prolog-like.

If I recall correctly from some talk, Ciao seems to use predicates for communication, though a little differently. I recall they allow a thread to enumerate a “queue predicate” using backtracking, where it blocks, waiting for new clauses, when it reaches the end.

The test case is not related to queues. It’s only the simplest test
case that came to my mind that has assertz/1 and retract/1 in it.
The old dynamic database test case I had was Fibonacci memoization,
but that tested only assertz/1. Now I also wanted a test case that
exercises retract/1. Just imagine a test case for a little database
representing a warehouse, where articles get randomly inserted and deleted.

Most likely the originally posted test case is slow because the retract/1
choice point is not cut away, and I guess it is skipping over some list
of clauses marked as deleted or something? For example, this test case is much faster:

/* SWI-Prolog 9.3.3 */
?- time(test).
% 7,001,001 inferences, 0.000 CPU in 1.110 seconds (0% CPU, Infinite Lips)
true.

That it shows 0.000 CPU is quite a funny bug? But it takes only 1.0
second for 1,000,000 transactions, and the retract/1 choice point is cut away via (->)/2:

:- dynamic(foo/2).

test :-
   retractall(foo(_,_)),
   warehouse(1000000, 1, 18884).

warehouse(0, _, _) :- !.
warehouse(N, X, Y) :-
   assertz(foo(Y, bar)),
   (retract(foo(X, bar)) -> true; true),
   zx81(X, Z), zx81(Y, T),
   M is N-1,
   warehouse(M, Z, T).

zx81(X, Y) :- Y is (X*75+74) mod 65537.

No clue why you get 0 sec CPU. Does this happen always? Looks like a platform issue. The assert/retract example of the benchmark suite implements the sieve of Eratosthenes for prime numbers. Not exciting either … See bench/programs/sieve.pl at 20144be63495a71560641243d9f1f0a66490b89e · SWI-Prolog/bench · GitHub

Shared dynamic predicates are fairly complicated. We must keep clauses around as long as some thread is accessing the predicate in some generation. We also want the predicate access to be lock free. In practice, this means that retract/1 merely sets the erased generation and a flag. The dead clause remains there and must be skipped. If we re-hash the predicate we ignore these clauses. Some thread may still need access, but it will be using the old hash (we cannot delete hashes without making sure no thread is using them).

At some point, triggered by excessive skipping of erased clauses or by the memory involved in erased clauses, clause GC is triggered. This finds the oldest generation accessed by some thread and removes all clauses that were erased before that generation from the predicate. It also discards old indexes. Indexes and the predicate maintain a reference count on the clause. If this drops to zero, we still need to prove the clause is not being executed. Once that is done, we can finally reclaim the memory.

Luckily almost all the work is done by the gc thread, and thus happens concurrently. As is, thread_local/1 predicates get no special treatment. That could of course be improved.

Overall, dynamic predicates are indexed and as fast as static code. Some compile-time optimizations are not performed, but most dynamic predicates are facts, so that has few implications. Dynamic predicates use either simple first-argument indexing or hash tables. The dedicated clause selection used for static code is never considered, as it would need to be redone with each change. Assert is fairly fast. Retract itself is as well, but it may cause slow-downs by leaving many dead clauses behind. If you want it to be really slow, ensure some thread maintains an open choice point on a dynamic predicate :slight_smile:

A general advice is to use asserta/1 rather than assertz/1 if there is no requirement for assertz/1.

It’s not related to which version of assert is used. You can repeat the
warehouse test with asserta/1 as well. It’s approximately the same speed:

/* assertz/1 */
?- time(test).
% 7,001,001 inferences, 0.000 CPU in 1.110 seconds (0% CPU, Infinite Lips)
true.

/* asserta/1 */
?- time(test).
% 7,001,001 inferences, 0.000 CPU in 1.129 seconds (0% CPU, Infinite Lips)
true.

In the warehouse test the retract/1 is random, so you cannot
influence retract/1 by choosing assertz/1 or asserta/1.
Since it is random, it will do 50% of the retracts near the start
and 50% of the retracts near the end of the clause list. If you use
the other assert, it will still be 50/50, only this time 50%
near the end and 50% near the start of the clause list.

True for this case. There are many more scenarios where asserta/1 produces faster code than assertz/1 does. I think it holds as a rule of thumb. Deep understanding is always better :slight_smile:

I still cannot reproduce your claim of such a general rule in any way.
It seems to depend on the distance: if I change the distance from 5 to
50, assertz/1 and asserta/1 behave the same again:

/* asserta/1 and distance 50 */
?- time(test).
% 3,000,000 inferences, 0.344 CPU in 0.927 seconds (37% CPU, 8727273 Lips)
true.

/* assertz/1 and distance 50 */
?- time(test).
% 3,000,000 inferences, 0.281 CPU in 0.913 seconds (31% CPU, 10666667 Lips)
true.

This is the first test case with 5 replaced by 50:

:- dynamic(foo/2).

test :-
   retractall(foo(_,_)),
   between(1,1000000,K),
   assertz(foo(K, bar)),
   J is K-50,
   retract(foo(J, bar)),
   fail.
test.

Why would a smaller distance of 5 make such a difference?

Wouldn’t max_clauses(Count) cause some subtle bugs? If the value for Count is 100 and the program works fine for 50 clauses, adding another 51 clauses (by someone who doesn’t know that the magic limit is 100) might suddenly cause the program to misbehave, because the oldest clause is silently removed.
(Or maybe I misunderstand the intended use of max_clauses(Count).)

As it is, it’s easy enough to check the number of clauses in a predicate and do a retract if there are too many – hiding this behind the well-known asserta/1 and assertz/1 seems dangerous (at the least, I would suggest creating new predicates with the “max” semantics).

It is not so easy. SWI-Prolog has a number_of_clauses predicate property, but for most Prolog systems obtaining this is much more expensive. Even in SWI-Prolog, fetching number_of_clauses cannot always avoid scanning and counting the clauses.

An important aspect of this would be to make the assert and accompanying retract atomic.

But yes, you are right that a disadvantage of this is that assert/1 no longer has the traditional semantics on all predicates. This makes it harder to understand what is going on. I think the current palette lacks proper support for

  • Get a simple atomic update of a dynamic predicate with one clause that acts as a (shared) global variable
  • Allow a predicate to act as a message queue for communication between threads.
  • Use a predicate for caching without the cache growing unlimited.
  • Update clauses that are not the first or last clause of a predicate.

Some of this is supported now, but in rather complicated ways:

  • Atomic updates can be implemented using transaction/1
  • A queue can be implemented using thread_wait/2
  • Limiting the size of a predicate can be implemented by using assertz/1, getting the number of clauses with predicate_property/2, and removing the excess leading clauses with retract/1.
  • There is no sensible way to update a clause that is neither first nor last. The best we have is to do a findall/3, a retractall/1, update the list, and assert it again.
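
The third point, a size-limited cache, can be sketched with today’s primitives (assertz/1, predicate_property/2 with number_of_clauses, and retract/1 are real APIs; the names cache/2, max_cached/1 and cache_put/2 are invented, and note the sequence is not atomic with respect to other threads):

```prolog
:- dynamic cache/2.

max_cached(100).                        % invented limit for the sketch

%% Add a cache entry; if the predicate now exceeds the limit,
%% retract the oldest (first) clauses.  Not atomic across threads.
cache_put(Key, Value) :-
    assertz(cache(Key, Value)),
    max_cached(Max),
    predicate_property(cache(_, _), number_of_clauses(N)),
    (   N > Max
    ->  Excess is N - Max,
        forall(between(1, Excess, _),
               retract(cache(_, _)))
    ;   true
    ).
```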

I’m open to suggestions to improve on this.


Rather hard to say, but I think the difference is that with just a few clauses the system does not create an index. There are several parameters that control the whole picture, and the values for these parameters are just guesses. Some are also trade-offs between memory usage and performance. Getting such parameters right is an interesting challenge. SWI-Prolog tries to be smart with garbage collection and stack expansion, using parameter settings that prefer minimizing memory as a start, but modifying the parameters if GC time exceeds a certain percentage of the total CPU time. There are many parameters for clause garbage collection and indexing that may profit from a more dynamic approach.

In this case the asserta/1 solution should run slower, since it creates an
unfavorable order. You can check via listing(foo/2) what the clause
order is after executing the original test case.

With assertz/1, retract/1 always removes the first clause:

/* assertz/1 */
?- time(test), listing(foo/2).
% 3,000,000 inferences, 2.063 CPU in 2.478 seconds (83% CPU, 1454545 Lips)
:- dynamic foo/2.

foo(999996, bar).
foo(999997, bar).
foo(999998, bar).
foo(999999, bar).
foo(1000000, bar).

true.

With asserta/1, on the other hand, retract/1 always removes the last clause:

?- time(test), listing(foo/2).
% 3,000,000 inferences, 0.578 CPU in 0.854 seconds (68% CPU, 5189189 Lips)
:- dynamic foo/2.

foo(1000000, bar).
foo(999999, bar).
foo(999998, bar).
foo(999997, bar).
foo(999996, bar).

true.

But maybe what we see is an effect of having no index, which also means
clause scanning during backtracking. And because with asserta/1 the
retract/1 removes the last clause, it can do choice point elimination.
Is what we see in the timing the difference between choice point creation
and no choice point creation? Are retract/1 choice points that expensive?

What is the use case for max_clauses? (Except for max_clauses=1)

Using them as a cache with limited size (cache the last N lookups), and as an alternative to message queues if we also add an option that makes the assert block when the limit is reached rather than dropping an old clause.
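
The consumer side of such a predicate-based queue can already be approximated with thread_wait/2, if I read its documentation correctly (the names msg/1, send/1 and receive/1 are invented; the blocking producer side discussed above has no counterpart yet):

```prolog
:- dynamic msg/1.

send(M) :-
    assertz(msg(M)).

%% Block until some msg/1 clause can be retracted.  thread_wait/2
%% re-tries the goal whenever one of the listed predicates changes.
receive(M) :-
    thread_wait(retract(msg(M)),
                [ wait_preds([msg/1]) ]).
```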

I use the caching approach regularly, but as is it takes a fair amount of code, and since you cannot remove the last clause easily, the only way to implement it is to use assertz/1, get the number of clauses and, if above the limit, backtrack over retract/1 N times.