Optimizing the JIT indexes

I have a few million facts (mostly triples, except the node labels have 5 components) that I run some exhaustive validation queries over, using forall/2. I’ve found that the time to run these queries is sensitive to which query is first, which seems to affect the JIT indexes, and I’d like to know how to convince swipl to do the best job of indexing.

The tests are something like this:

forall(lookup(V, some_edge, some_value),
       assertion((node(V1,V2,V3,V4,V5, another_edge, _),
                  node(V1,V2,V3,V4,V5, another_edge2, _))),
forall(node(V1,V2,V3,V4,V5, some_edge, some_value),
       assertion(another_pred(V1,V2,V3,V4,V5, _))),
...

I ran each set of tests twice and subtracted the two times to get the time to create the JIT index and the time to run the queries. The query times varied by about 2x (and the faster queries required about 20% more indexing time). Query and indexing times were both about 10 seconds; this will become significant when I have a much larger set of nodes.

Here’s what jiti_list had to show about the two different indexes.

For the slow queries:

Predicate                                     Indexed  Buckets Speedup Flags
============================================================================
src_browser:node/7                              1+7      2,097,152 536379.5   
                                                1+4      524,288 298994.2   
                                                6+7      262,144  4821.1   

and for the fast queries:

src_browser:node/7                              1+4      524,288 298977.9   
                                                4+7      524,288 28394.0   
                                                6+7      262,144  4821.1   

So, the questions are:

  • Is this working-as-intended?
  • How should I “prime” the JITI to get maximum speedup? (e.g., query with once(node(_,_,_,_,_,_,_)_, or once(node(some_value,_,_,_,_,_,_)), or something else?)
  • Is there a way to show the variation on the first argument? (I get 2,012,781 entries (clauses), with 164,067 unique values for the first argument, by using findall/3 and length/2 … which took 0.3 seconds; just wondering if there’s faster/better way – predicate_property(...,number_of_clauses(_) gives me half the answer, but it’d be nice to get the JITI-like information for the first argument)
  • The predicates are dynamic … does it make sense to compile them? (They’re just facts.)

And follow-on questions:

  • Is there any way to save the indexes in a .qlf file?
  • Can I run separate queries in separate threads to create the indexes for multiple predicates in parallel?
  • What happens if I run a query in one thread, which starts creating an index, and simultaneously run a similar query in another thread – does the second query stall until the index is ready from the first thread?

I guess you read the docs? I don’t have much to add. The key thing is that it only creates a new index if it considers its existing indexes poor. That way the result can depend on the order.

Won’t index anything because there is nothing to gain using an index: each clause is a candidate anyway. If you want an index on the second argument, make a call with only that argument instantiated (doesn’t matter to what as long as it is indexable). Now if you want an index in the 2+3 args (2nd and 3rd), you can call with these two arguments given, but it might conclude that the previous 2nd argument index is good enough. You could improve by first calling with 2+3 bound. Now it will analyze the index for 2 and for 3 and if it considers both not very selective it will create a combined one (if that is selective). You can than call with only 2 bound. It cannot use the 2+3 index, so it will create a new index for the 2nd arg.

In general though, indexing is up to the system and poor indexing should be considered a candidate for improvement. Trying to fool the system may work now, but may not in the future. It is a bit like the good old register declaration in C. That helped a lot in old C compilers. Modern compilers ignore it. Most of the time the compiler is probably right, but quite likely there are also cases where the human was right and the compiler wrong.

In this case I see two ways forward. One would be a flag that controls its space vs time heuristics that would make it more aggressive in creating more indexes. Another might be sampling: count the number of times we use a sub-optimal index and if this exceeds some threshold build the additional index.

No. There is a predicate property giving information about the indexes, but I’d rather keep the exact statistics used private as this stuff changes over time. Currently it considers unique values/clauses and the stdev of the numbers of duplicate values, where it prefers arguments with a more even distribution of duplicates over one where we say have one value that appears 1,000 times while all the others are unique.

In the current system, no. The only difference between a dynamic predicate and a static is a flag that prevents assert/retract. At least, for many clauses. For a few clauses the system may use dedicated clause selection for specific patterns in static predicates which it does not do for dynamic ones.

No. It probably takes less time to build them then to load them from a file, so there is little point. Besides, the building is lazy while the loading would not be.

Yes. Each thread looks for existing indexes and does the work to decide on which index to create. Once deciding to create an index it checks whether some other thread is already doing this and if so it waits for the other thread to complete the index.

I think that was enough for today :slight_smile:

1 Like

Running the “create JITI queries” in parallel worked very nicely – thank-you. Loading 2 million facts from a .qlf file took 0.5 seconds (vs 12.6 seconds from a .pl file); adding 4 indexes in parallel took 1.4 seconds (thank-you concurrent_maplist/2).

I’m seeing wide variation in the time my queries take (+/- 20%), so it’s a bit difficult to determine what are the best indexes for my data. (Feels a bit like tuning a database …)

2 Likes

Sounds good. I don’t really recall 20% wide. Just slightly little different data characteristics may already cause larger differences. I guess the ultimate route to these things is to monitor bottlenecks and take measures at runtime. For example, the running thread could count calls with poor indexing and at some point either take action itself or notifies some other thread to take a closer look and possibly create an additional index. Something like this already happens for clause and atom GC to some extend. Also for normal GC there is now a hook that is triggered if GC takes too much time and that can try to reason on the basis of some key statistics over the last few (5?) garbage collections to adjust parameters.

I get at least 20% variation in timings with the same data. My hypothesis is that memory usage (both within swipl and with other running things such as chrome) is a factor; but statistics/2 for that doesn’t seem to work – e.g.:

1 ?- statistics(core, M).
M = [0, -1].
2 ?- statistics(memory, M).
M = [0, -1].

and I don’t know how to interpret globalused and similar.

These are old compatibility keys that are hard to support in modern OSes. If you want to know overall resource usage, use the OS tools. All the other stuff is somewhat documented. I doubt anyone wants to describe exhaustively what is possible …

5 posts were split to a new topic: Poor memory reuse (tcmalloc?)

Answering myself, this worked, to prime the indexes (except I ran them in parallel):

( node(sig,corpus,root,path,lang,_,_) -> true ; true ),
( node(_,_,_,_,_,fact,value) -> true ; true),
( edge(vname(sig,corpus,root,path,lang),_,_) -> true ; true ), % deep indexing
 ...

Also, a second execution of the queries ran about 30% faster, despite there being no change in the output from jiti_list for any of the predicates in the queries as far as I can tell.

1 Like