Behavior of a backtracking query when rdf_assert/1 or assert/1 are made during execution

ericzinda · November 14, 2020, 8:22pm

I have an application that uses RDF to dynamically modify one graph in an RDF database each time a query in a different RDF graph is successful and then backtrack to find other answers. Like this:

% Backtrack over all places and all rocks in the graph called 'default'
% adding 600 items to a different graph called 'otherGraph' 
% between each solution
test(Place, Rock) :-
    % Find all things named "place" in 'default' graph
    rdf(Place, hasName, 'place', default),
    % Find all things named "rock" in 'default' graph
    rdf(Rock, hasName, 'rock', default),
    % Add 600 items to a graph named 'otherGraph'
   addItems(600).

% Iterates N times and asserts something slightly different into 
% Graph 'otherGraph'
addItems(0) :- !.  
addItems(N) :- 
    N>0, 
    uuid(NewArgID), 
    rdf_assert(NewArgID, rel, value, otherGraph), 
    S is N-1, 
    addItems(S).  

% This is the data being queried
:- rdf_assert(rock, hasName, rock, default).
:- rdf_assert(rock1, hasName, rock, default).
:- rdf_assert(rock2, hasName, rock, default).
:- rdf_assert(france, hasName, place, default).
:- rdf_assert(japan, hasName, place, default).
:- rdf_assert(hawaii, hasName, place, default).

The problem is that the rdf_assert/4 predicate has side effects due to re-hashing which can cause a temporary change of ordering as well as temporary duplicates. You can see that here if you run the query multiple times in a row. The data in the graph being queried is never changed, but spurious duplicates happen the second time the test is run:

?- findall([Place, Rock], test(Place, Rock), Test).
Test = [
[france,rock],[france,rock1],[france,rock2],
[japan,rock],[japan,rock1],[japan,rock2],
[hawaii,rock],[hawaii,rock1],[hawaii,rock2]].

?- findall([Place, Rock], test(Place, Rock), Test).
Test = [
[france,rock],[france,rock1],[france,rock2],
[japan,rock],[japan,rock1],[japan,rock2],
[hawaii,rock],[hawaii,rock1],[hawaii,rock2],
[hawaii,rock],[hawaii,rock1],[hawaii,rock2],
[hawaii,rock1],[hawaii,rock],[hawaii,rock2]].

This is by design as described here. My problem is that the spurious duplicates cause a big performance problem because my application has a ton of data and these can cause a lot of unnecessary work. So, I’m looking for alternatives.

Looking at my application, I’ve realized that I can probably rewrite to avoid using the RDF database and just use plain assert/1 (along with setup_call_cleanup/3) . Will the Prolog semantics for assert/1 and retract/1 have similar re-hashing side effects as RDF? Or are they written such that data that is not being queried can be updated/added without affecting the results of the query (including duplicates or ordering)?

ericzinda · November 15, 2020, 9:01pm

FWIW, I went ahead and swapped out my use of the RDF predicates for plain old assertz/1 and retractall/1 since I wasn’t using anything especially RDF, except for the transaction support.

I got a 50% performance improvement basically across the board (against a huge test suite that I have)! Wow. No idea how much of that was from a different storage engine and how much was removing spurious backtracking (which happened a lot in my scenario).

I was able to do this because it appears as though the new transaction/1 is exactly what I needed. Although it doesn’t appear to be in the current stable build (8.2.1) yet unless I’m missing something.

Still curious about the answer to my question above about the behavior of assert/1 and retract/1 with respect to backtracking, though.

jan · November 16, 2020, 8:32am

No. While for RDF, duplicates nor order has any semantic value, it is important in Prolog. Thus, changing these would be a bug. The RDF store uses a rather peculiar data representation that is good at providing many indexes using not too much memory. It is good at bulk loading and small concurrent changes in a large dataset. It is not good at changing large portions of the data regularly as that triggers the rather awkward re-hashing approach. It has advantages though: RDF rehashing progresses concurrently with querying and writing while SWI-Prolog rehashing of a clause index freezes both reads and writes. Re-hashing a clause index does not affect on going reads though and does not need to wait until these are completed.

Yeah. When implemented the RDF database easily outperformed Prolog. SWI-Prolog’s indexing has improved a lot though, so now that tables have turned. The RDF database does use less memory. If you need N-tuples rather than triples you need a lot of extra triples and Prolog may still win

Transactions are new in 8.3. They won’t go into 8.2. They will go to 8.4. There is no schedule for 8.4 though. In general though, there is little reason to wait for stable. The devel series are a rolling release that is pretty stable most of the time. It comes with new stuff that is not always stable (exact semantics, API, implementation). Occasionally there are larger rewrites that may affect overall stability. They are announced here.

Rules of thumb if you want the safe route

If the stable version works for you, use it. Do be aware that stability issues are quicker fixed in the devel versions.
Else pick the devel version and just stick to the version you started with until some bug or new feature asks for action. In that case you have to decide between GIT cherry-picking the required change(s) or move to the latest devel version.

Topic		Replies	Views
Rdf/3 rdf/4 ordering of results? Help!	2	366	June 2, 2020
Rdf_assert_list -- and adding to a list Help!	1	333	February 28, 2019
Incrementally tabling a predicate that uses rdf Help!	3	425	June 30, 2020
Issue with rdf_assert of date/3 Predicate bug	2	395	March 20, 2020
RDF Literals with Prefixes General	2	291	November 9, 2022

Behavior of a backtracking query when rdf_assert/1 or assert/1 are made during execution

Related topics