I have a few million facts (mostly triples, except the node labels have 5 components) that I run some exhaustive validation queries over, using forall/2. I’ve found that the time to run these queries is sensitive to which query is first, which seems to affect the JIT indexes, and I’d like to know how to convince swipl to do the best job of indexing.
The tests are something like this:
    forall(lookup(V, some_edge, some_value),
           assertion((node(V1,V2,V3,V4,V5, another_edge, _),
                      node(V1,V2,V3,V4,V5, another_edge2, _)))),
    forall(node(V1,V2,V3,V4,V5, some_edge, some_value),
           assertion(another_pred(V1,V2,V3,V4,V5, _))),
    ...
I ran each set of tests twice and subtracted the two times to get the time to create the JIT index and the time to run the queries. The query times varied by about 2x (and the faster queries required about 20% more indexing time). Query and indexing times were both about 10 seconds; this will become significant when I have a much larger set of nodes.
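The run-twice-and-subtract measurement described above can be sketched roughly as follows. This is only an illustration on toy data: `fact/3`, `fill/1`, `cpu_time/2` and `index_and_query_time/3` are hypothetical names, not part of the real test suite, and on small inputs the difference can be dominated by timer noise (it may even come out slightly negative).

```prolog
:- dynamic fact/3.

% Hypothetical toy data standing in for the real node/7 facts.
fill(N) :-
    retractall(fact(_,_,_)),
    forall(between(1, N, I),
           ( K is I mod 1000,
             assertz(fact(I, K, v))
           )).

:- meta_predicate cpu_time(0, -).
:- meta_predicate index_and_query_time(0, -, -).

% cpu_time(:Goal, -Seconds): CPU time spent running Goal once.
cpu_time(Goal, Seconds) :-
    statistics(cputime, T0),
    call(Goal),
    statistics(cputime, T1),
    Seconds is T1 - T0.

% The first run pays for any index that is built on demand; the second
% run reuses it. The difference is a rough estimate of the indexing cost.
index_and_query_time(Goal, IndexTime, QueryTime) :-
    cpu_time(Goal, First),
    cpu_time(Goal, QueryTime),
    IndexTime is First - QueryTime.
```

For example, `?- fill(200000), index_and_query_time(forall(between(0, 999, K), fact(_, K, _)), I, Q).` binds `I` and `Q` to the two estimates.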
Here’s what jiti_list had to show about the two different indexes.
For the slow queries:
Predicate              Indexed    Buckets    Speedup  Flags
============================================================================
src_browser:node/7     1+7      2,097,152   536379.5
                       1+4        524,288   298994.2
                       6+7        262,144     4821.1
and for the fast queries:
src_browser:node/7     1+4        524,288   298977.9
                       4+7        524,288    28394.0
                       6+7        262,144     4821.1
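For anyone wanting to reproduce listings like these on a small case, here is a minimal sketch. It assumes that jiti_list/1 from library(prolog_jiti) accepts a Name/Arity specification and that predicate_property/2 reports existing indexes through its indexed(_) property; `item/3`, `fill_items/1` and `show_indexes/0` are made-up names. Whether the system builds any index for a toy predicate, and which one, is up to the JIT heuristics, so no particular output is implied.

```prolog
:- use_module(library(prolog_jiti)).   % jiti_list/0, jiti_list/1

:- dynamic item/3.

% Hypothetical toy data; the second argument takes 1000 distinct values.
fill_items(N) :-
    retractall(item(_,_,_)),
    forall(between(1, N, I),
           ( K is I mod 1000,
             assertz(item(I, K, x))
           )).

% Run a query with only the second argument bound, then report whatever
% indexes the system decided to create for item/3.
show_indexes :-
    ignore(once(item(_, 7, _))),
    (   predicate_property(item(_,_,_), indexed(Indexes))
    ->  format("indexes: ~p~n", [Indexes])
    ;   format("no extra indexes reported~n")
    ),
    jiti_list(item/3).
```

Calling `?- fill_items(100000), show_indexes.` before and after different first queries is one way to compare the index sets.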
So, the questions are:
- Is this working-as-intended?
- How should I “prime” the JITI to get maximum speedup? (e.g., query with once(node(_,_,_,_,_,_,_)), or once(node(some_value,_,_,_,_,_,_)), or something else?)
- Is there a way to show the variation on the first argument? (I get 2,012,781 entries (clauses), with 164,067 unique values for the first argument, by using findall/3 and length/2 … which took 0.3 seconds. I’m just wondering if there’s a faster/better way: predicate_property(..., number_of_clauses(_)) gives me half the answer, but it’d be nice to get the JITI-like information for the first argument.)
- The predicates are dynamic … does it make sense to compile them? (They’re just facts.)
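To make the first-argument question concrete, this is roughly the kind of counting meant, shown on three toy facts. `edge/3`, `clause_count/1` and `distinct_first_args/1` are illustrative names only, and the sort/2 step is an assumption about how the unique values were obtained.

```prolog
:- dynamic edge/3.

% Hypothetical toy facts; the first argument has two distinct values.
edge(a, label, 1).
edge(a, kind,  2).
edge(b, label, 3).

% clause_count(-N): number of clauses, without enumerating them.
clause_count(N) :-
    predicate_property(edge(_,_,_), number_of_clauses(N)).

% distinct_first_args(-N): number of distinct first-argument values.
% sort/2 removes duplicates, so the length of the sorted list is the
% number of unique keys.
distinct_first_args(N) :-
    findall(K, edge(K,_,_), Ks),
    sort(Ks, Unique),
    length(Unique, N).
```

Here `clause_count(N)` gives `N = 3` and `distinct_first_args(N)` gives `N = 2`; the second predicate has to visit every clause, which is the cost the question is trying to avoid.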
And follow-on questions:
- Is there any way to save the indexes in a .qlf file?
- Can I run separate queries in separate threads to create the indexes for multiple predicates in parallel?
- What happens if I run a query in one thread, which starts creating an index, and simultaneously run a similar query in another thread – does the second query stall until the index is ready from the first thread?
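As an illustration of the threading idea, a minimal sketch using concurrent/3 from library(thread) to run one "priming" query per predicate in its own worker thread. `p/2`, `q/2`, `fill/0` and `prime_in_parallel/0` are hypothetical, and the sketch does not answer the questions above: it only shows the shape of the experiment, not whether index construction really overlaps or whether one thread waits for another.

```prolog
:- use_module(library(thread)).   % concurrent/3

:- dynamic p/2, q/2.

% Hypothetical toy data for two independent predicates.
fill :-
    retractall(p(_,_)),
    retractall(q(_,_)),
    forall(between(1, 10000, I),
           ( K is I mod 100,
             assertz(p(I, K)),
             assertz(q(K, I))
           )).

% Run one priming query per predicate, each in a separate worker thread.
% ignore/1 keeps a failing lookup from failing the whole batch.
prime_in_parallel :-
    concurrent(2,
               [ ignore(once(p(_, 42))),
                 ignore(once(q(42, _)))
               ],
               []).
```

Timing `prime_in_parallel` against the same two queries run sequentially on the real data would be one way to see whether anything is gained.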