I have an application that uses RDF to dynamically modify one graph in an RDF database each time a query in a different RDF graph is successful and then backtrack to find other answers. Like this:
% Backtrack over all places and all rocks in the graph called 'default'
% adding 600 items to a different graph called 'otherGraph'
% between each solution
test(Place, Rock) :-
% Find all things named "place" in 'default' graph
rdf(Place, hasName, 'place', default),
% Find all things named "rock" in 'default' graph
rdf(Rock, hasName, 'rock', default),
% Add 600 items to a graph named 'otherGraph'
addItems(600).
% Iterates N times and asserts something slightly different into
% Graph 'otherGraph'
addItems(0) :- !.
addItems(N) :-
N>0,
uuid(NewArgID),
rdf_assert(NewArgID, rel, value, otherGraph),
S is N-1,
addItems(S).
% This is the data being queried
:- rdf_assert(rock, hasName, rock, default).
:- rdf_assert(rock1, hasName, rock, default).
:- rdf_assert(rock2, hasName, rock, default).
:- rdf_assert(france, hasName, place, default).
:- rdf_assert(japan, hasName, place, default).
:- rdf_assert(hawaii, hasName, place, default).
The problem is that the rdf_assert/4 predicate has side effects due to re-hashing which can cause a temporary change of ordering as well as temporary duplicates. You can see that here if you run the query multiple times in a row. The data in the graph being queried is never changed, but spurious duplicates happen the second time the test is run:
?- findall([Place, Rock], test(Place, Rock), Test).
Test = [
[france,rock],[france,rock1],[france,rock2],
[japan,rock],[japan,rock1],[japan,rock2],
[hawaii,rock],[hawaii,rock1],[hawaii,rock2]].
?- findall([Place, Rock], test(Place, Rock), Test).
Test = [
[france,rock],[france,rock1],[france,rock2],
[japan,rock],[japan,rock1],[japan,rock2],
[hawaii,rock],[hawaii,rock1],[hawaii,rock2],
[hawaii,rock],[hawaii,rock1],[hawaii,rock2],
[hawaii,rock1],[hawaii,rock],[hawaii,rock2]].
This is by design as described here. My problem is that the spurious duplicates cause a big performance problem because my application has a ton of data and these can cause a lot of unnecessary work. So, I’m looking for alternatives.
Looking at my application, I’ve realized that I can probably rewrite to avoid using the RDF database and just use plain assert/1 (along with setup_call_cleanup/3) . Will the Prolog semantics for assert/1 and retract/1 have similar re-hashing side effects as RDF? Or are they written such that data that is not being queried can be updated/added without affecting the results of the query (including duplicates or ordering)?