Recorded DB problem

I’m trying to use the recorded DB for my project, but I encountered some weird behaviour. If I record a few different terms under one key, a strange effect occurs: erase/1 can’t delete a record.

Here is the code. I add a value-holding predicate, then another, unrelated predicate under the same key. After that I query the value-holding record, erase it and assert a new one. Then I query the new value. After each step I dump the recorded DB contents. It appears that the record is not erased! And it only happens if I add anotherRecord after value(0). I’m very confused. Is it an error, or do I just not get how the recorded DB works?

% dump(+Title): print a header, then every term recorded under key model.
dump(Title) :-
	writef("%w ----------\n", [Title]),
	findall(_, (recorded(model, X), write(X), nl), _).

		
rdb_test :-
	% Initial records:
	recordz(model, value(0)), dump("Added value 0"),
	recordz(model, anotherRecord), dump("Added other record"),	
	
	% Query & erase old record
	recorded(model, value(_), Ref), erase(Ref), dump("Erased value 0"), 
	% Add new record
	recordz(model, value(1)), dump("Added value 1"),
	
	% Query new record. 
	recorded(model, value(T)), dump("Queried value"),
	writef("Value is %w\n", [T]).

Your code leaves choicepoints (you can see them by entering “*” when you get the true response).

The Update View section in the manual might explain what’s going on.

Avoiding the choice point of the recorded/3 that produces the reference for erase indeed fixes the problem. It is a bug though. Pushed a fix.
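For example, committing to the first solution with once/1 removes the problematic choicepoint in the test above:

	% Query & erase old record, committing to the first match
	once(recorded(model, value(_), Ref)), erase(Ref),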

That said, the recorded database is old school. For most applications the normal dynamic database will perform better, has better update semantics and is part of the standard. Why would you use the recorded database?
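For comparison, the same test against the dynamic database would look something like this (a sketch, using a dynamic model/1 predicate):

:- dynamic model/1.

ddb_test :-
	assertz(model(value(0))),
	assertz(model(anotherRecord)),
	% retract/1 is nondet, so commit to the first match
	once(retract(model(value(_)))),
	assertz(model(value(1))),
	model(value(T)),
	writef("Value is %w\n", [T]).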

It is not clear enough from the docs to the uninitiated that this is old school; I guess that is the only reason to use it. Without additional context it is difficult to tell whether it is a good idea to use it or not. In particular, this sentence:

There are few reasons to use this database in SWI-Prolog due to the good performance of dynamic predicates.

While there is a difference between “there are A FEW reasons” and “there are FEW reasons”, it is a bit of a stretch to expect everyone to catch it :smiley: Maybe it should be worded more strongly, something like:

Since the introduction of X.Y…

or

Since the improvements to Y.Z it is a better choice to use dynamic predicates. The only exception is [some rare use case].

And I am sorry but I don’t know enough to be able to propose the actual PR.

Actually, for me the recorded DB outperformed dynamic predicates. Interestingly enough, most of the time my code spent was in the retract/1 predicate. But I’ve since dropped both dynamic predicates and the recorded DB in favor of storing a list of terms in a global variable and using it as a dynamic DB. This approach outperforms both dynamic predicates and the recorded DB. I can optimize it even more by using ordered sets.
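The idea, roughly (a simplified sketch; the predicate names are made up for illustration):

% keep the model as a plain list in a non-backtrackable global variable
init_model :-
	nb_setval(model, []).

model_add(T) :-
	nb_getval(model, L),
	nb_setval(model, [T|L]).

% remove the first matching term; fails if there is none
model_del(T) :-
	nb_getval(model, L),
	selectchk(T, L, L1),
	nb_setval(model, L1).

% query terms by unification; enumerates all matches on backtracking
model_get(T) :-
	nb_getval(model, L),
	member(T, L).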

You might be interested in resizing arrays without re-allocating them, which is described in

Does SWI-Prolog have N+K-trees?

I know the documentation in that topic is not easy to find or follow and there is no official documentation on it, but for certain cases it is rather useful.


Thanks. Can’t wrap my mind around that yet, but looks interesting!

It would be interesting to know what exactly was happening there. Is it possible that the just-in-time indexing was somehow not working as intended? Or could it be that retract was leaving behind choice points you didn’t need but didn’t cut?

Altogether I try to use retractall/1 if at all possible. I guess you need to model your data in such a way that your primary key can be used together with retractall/1. I have not measured what will work better in combination with deep indexing.

(Sorry for the basic stuff, just to make sure it is clear what I mean)
Consider these two ways to represent your data:

Approach 1:

p(a, 1, foo).
p(a, 2, bar).
p(b, 1, baz).
p(b, 2, foobar).

Approach 2:

p(a(1), foo).
p(a(2), bar).
p(b(1), baz).
p(b(2), foobar).

You can use retractall/1 to delete all rows that have the compound key {a, 1} using:

Approach 1:

retractall(p(a, 1, _)).

Approach 2:

retractall(p(a(1), _)).

and to delete all rows that only match the {a}, either

retractall(p(a, _, _)).

or

retractall(p(a(_), _)).

Long story short, I have not measured what would be more efficient, and in what situations. Obviously, if you use the second approach you cannot easily use retractall/1 to do the alternative to:

retractall(p(_, 1, _)).
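With the second representation you need an explicit loop instead. Something like this sketch (the predicate name is mine; it relies on the logical update view, so the retracts do not disturb the enumeration):

% delete all rows whose key wraps N, whatever the outer functor
retract_second(N) :-
	forall(( p(Key, Val), Key =.. [_, N] ),
	       retract(p(Key, Val))).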

Looks like the number of state variables is small. In that case using a data structure and passing it around wins. Probably dicts are the fastest easy-to-use structure (rather than lists). The nice thing is that this way of storing data is transparent to backtracking and the use of variables.
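For example (a small sketch; the names are mine):

% thread a dict of object properties through the computation
demo :-
	S0 = state{},
	put_dict(object1, S0, point(10, 20), S1),
	get_dict(object1, S1, P),
	writeln(P).   % prints point(10,20)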

Only if that doesn’t work, start looking at alternatives. Think about the semantics wrt backtracking and copying and the scale (how much data). The alternatives are the dynamic database, recorded database (indeed wins in a few scenarios), global variables and non-backtrackable assignment.

Another reason to use the recorded database is that variable attributes are preserved. (Attributes are preserved in lots of places… just not the dynamic database.)


So far that list contains around 300-500 terms, but in the future it will hold much, much more. I expect to run into trouble when scaling the data up. For that reason I’ve abstracted away all operations on that model behind a stable API. This way I can change the actual storage approach with minimal trouble.

Frankly speaking, I was surprised by the slow performance of dynamic predicates, especially by the fact that most of the CPU time was spent on retract/1. Could it be caused by that other problem with choicepoints I wrote about in another topic?

I haven’t looked deep into that. I found that one run of the program took around 150 seconds and most of the time was spent on retract/1. retractall/1 didn’t fix the issue. The recorded DB worked better, but for now I use a list of terms stored in a global variable. The final variant performs the same amount of computation in 8 seconds. For now that is more than enough for me, so I stopped further optimization efforts.

Maybe it is somehow mixed up with that other problem I wrote about in another topic - rogue choicepoints launching my stack into space ) Don’t know )

Yeah, it was choicepoints :frowning:
Dynamic predicates are fully vindicated, their performance is awesome! But choicepoints somehow slow it down, leading to retract/1 chewing up more and more CPU. I can’t really see the logic behind this dependency, but there should be some.


To comment seriously we’d need some more info on how you represent, update and query the dynamic DB. Typically you should make sure that clause indexing can deal with it. The minimum portable Prolog promise is first-argument indexing, which implies that the first argument must be a unique atom, small integer or compound term with unique name/arity. SWI-Prolog indexing does a lot more, though many of its extensions do not guarantee there is no choicepoint.

How bad a choicepoint is depends a lot on the program. For programs that do mostly backtracking search, it doesn’t matter much. For programs that are intended not to do any backtracking, the cost can be pretty high. Choice points prevent last-call optimization, causing the program to use a lot of stack space and slow down garbage collection.

I have some code that needs to store tens of thousands of terms (a global symbol table). After a few experiments, I ended up using library(rbtrees). (I didn’t try @EricGT’s suggestion of N+K trees.) I also wrote some custom code to read and write them quickly, using library(fastrw).
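For example, a minimal lookup table with library(rbtrees) looks like this (a sketch, not my actual code):

:- use_module(library(rbtrees)).

symtab_demo :-
	rb_new(T0),                    % empty tree
	rb_insert(T0, foo, 1, T1),     % add foo -> 1
	rb_insert(T1, bar, 2, T2),     % add bar -> 2
	rb_lookup(foo, V, T2),         % fetch the value stored under foo
	writeln(V).                    % prints 1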

I represent the dynamic DB as a single dynamic predicate model(X), where X is a concrete term asserted to that DB. For example:

model(location(object1, (10, 20)))

Then I can query it by running the goal:

model(location(object1, Location)).

If I need to update an object’s location, I retract the old clause and assert a new one.
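So an update is roughly:

% replace the old location clause with a new one
update_location(Obj, NewLoc) :-
	retract(model(location(Obj, _))),
	assertz(model(location(Obj, NewLoc))).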

I suspect the details about the data you were inserting/deleting are not the most interesting part; it is rather what else was going on in your program. Do you think you could explain how you profiled? Did you just do

?- profile( run ).

and then looked at the profile window that popped up?

Yeah. Today I ran a load test with 500,000 terms in that dynamic DB. The profile was something like:

retract/1 - 98%
model/1 - 0.4%
assertz/1 - 0.1%

That quite likely is going to give you very poor indexing. You are probably better off using multiple predicates and making the key (object) the first argument. Also, terms like (10,20) for a point are not ideal. Better is

location(object1, 10, 20).

You could also go for a triple model like below. That typically performs worse and leaves more unwanted choicepoints, but it easily allows enumerating all properties of some object.

model(object1, location, point(10,20)).

And if a call should be deterministic, make sure it really is.
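For instance (a sketch with made-up predicate names):

:- dynamic location/3.

% replace any previous location; retractall/1 is deterministic
set_location(Obj, X, Y) :-
	retractall(location(Obj, _, _)),
	assertz(location(Obj, X, Y)).

% first-argument indexing on Obj; the cut discards any leftover choicepoint
get_location(Obj, X, Y) :-
	location(Obj, X, Y),
	!.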