JIT indexing question

I have two predicates with identical argument patterns, one with 9 clauses and one with 8. The first two arguments in each case form a unique pair of atoms. After executing some code, jiti_list/1 displays the following index info for the two predicates:

Predicate                                     Indexed  Buckets Speedup Flags
============================================================================
clpBNR:multCase/6                               1            4     3.0   
clpBNR:odivCase/6                               1+2         16     8.0   

multCase has 9 clauses and only gets indexed on the first argument, so each first-argument value selects 3 clauses. According to the numbers, the speedup is less than half that of odivCase (8 clauses), which is indexed on the first and second arguments.

Any explanation for how this happens? Does clause ordering matter? Number of required buckets?

Not a big deal in the grand scheme, but I am curious.

Not an answer, but I’ve found that I sometimes need to “warm up” the JITI by calling the predicates with the argument patterns I expect, and then looking at what jiti_list/1 tells me. For example, I have this code for warming up the JITI for the nodes and edges in a large source code graph, with the resulting indexes as comments (these depend very much on the nature of the data; for this particular data there is only one “corpus” and one “root”, so the indexing doesn’t consider those):

concurrent_maplist(index_pred,
  [node(sig,_,_,_,_,_,_),                          % node/7 1
   node(sig,corpus,root,path,lang,_,_),            % node/7 1+4
   node(sig,_,_,_,_,_,value),                      % node/7 1+7
   node(_,_,_,path,_,_,value),                     % node/7 4+7
   node(_,_,_,_,_,fact,value),                     % node/7 6+7
   edge(sig,corpus,root,path,lang,_,_,_,_,_,_),    % edge/11 1
   edge(_,_,_,_,_,_,sig,corpus,root,path,lang),    % edge/11 7
   edge(_,_,_,_,_,lang,sig,corpus,root,path,lang), % edge/11 6
     ...])

index_pred(Goal) :- ( Goal -> true ; true ).
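To see what that warm-up actually produced, the indexes can then be inspected per predicate; a minimal example, assuming jiti_list/1 accepts the usual Name/Arity specification:

?- jiti_list(node/7).
?- jiti_list(edge/11).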

The order of the calls may impact the indexes as well. If, for example, there is first a call with just the first argument instantiated, it will create a first-argument index. If we now get a call with the first two arguments instantiated, it may conclude the first-argument index is “good enough”. If the calls happen in the other order, it may first decide on a combined index. But then it cannot use the combined index for the first argument alone, so it will create a first-argument index as well.
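A minimal sketch of that order dependence, using a made-up predicate p/3; the comments only describe what one might expect, since the actual decision depends on the heuristics discussed here:

p(a, x, 1).
p(a, y, 2).
p(b, x, 3).
p(b, y, 4).

warm_first :- ( p(a, _, _) -> true ; true ).   % may build an argument-1 index
warm_both  :- ( p(a, y, _) -> true ; true ).   % may build a 1+2 index, or reuse argument 1

% Running warm_first before warm_both may leave only the argument-1 index,
% while the reverse order may build a combined 1+2 index first and later add
% a separate argument-1 index for the first-argument-only call.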

Probably there should be some flags that control the indexing. There are complicated tradeoffs between time (execution and analysis), space and choicepoint creation. There may also be better heuristics. I think being a bit more aggressive on predicates with a small number of clauses may help (notably in choicepoint reduction). In the long term I think such predicates need a more subtle decision-tree based solution that also considers checks immediately after the neck, such as type tests. The current indexing is first of all designed for (dynamic) predicates with lots of clauses.

One direction to think about might be some hotspot warning mechanism?

In this case all the calls are made with the first two arguments instantiated to specifically match the first two arguments in the head. Each of these arguments is one of 3 atoms so 3x3=9 possibilities ==> 9 clauses. One of the predicates disallows one of the possibilities, so has only 8 clauses.
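As a toy illustration of that shape (invented predicate and atom names; the real predicates have six arguments), the clause heads enumerate all 3×3 pairs, so a call with both of the first two arguments bound matches exactly one clause:

sign_case(neg,  neg,  pos).
sign_case(neg,  zero, zero).
sign_case(neg,  pos,  neg).
sign_case(zero, neg,  zero).
sign_case(zero, zero, zero).
sign_case(zero, pos,  zero).
sign_case(pos,  neg,  neg).
sign_case(pos,  zero, zero).
sign_case(pos,  pos,  pos).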

I suspect there’s one of those internal heuristics that’s just tipping the decision one way or the other. As I said, I was just curious.

Curiosity won the day and I think I found it. In pl-index.c:

2634  if ( best >= 0 &&
2635       (float)clist->number_of_clauses/best_speedup > 3 )
2636  { ....

In my example there were 9 clauses and best_speedup was 3 (the same argument value occurs in 3 clauses). 9/3 =:= 3, so the test just failed. If I change > to >=, indexing occurs on the first two arguments for a speedup of 9.
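The failing test can be replayed at the top level; with 9 clauses and a best single-argument speedup of 3, the strict > comparison fails while >= succeeds:

?- X is 9/3.0, X > 3.
false.

?- X is 9/3.0, X >= 3.
X = 3.0.

With the test passing, the deeper index gets built, which matches the jiti_list/1 output below.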

Predicate                                     Indexed  Buckets Speedup Flags
============================================================================
clpBNR:multCase/6                               1+2         16     9.0   
clpBNR:odivCase/6                               1+2         16     9.0   

Not that many clauses here, and nothing dynamic, but an impressive speedup in clause selection if the numbers are to be believed. (I modified the odivCase/6 to fit the 9 clause pattern).

Might be nice to have this tuneable for situations that really matter, but not sure what the best way to do that might be.

:slight_smile: The real question is of course “what is the difference in actual execution performance?” The Speedup merely indicates that the number of clauses for which we try to do full head unification is reduced by this factor. Even that is only true if none match or we try all on backtracking.

Note that none of the hard coded numbers in pl-index.c have been validated. They are merely (not so) well informed guesses.

I would also not really know how to validate these numbers. In the end this is a multi-dimensional problem. What is the price of clause unification (in comparison to execution)? Does a missed choicepoint elimination do much harm? In a program with lots of backtracking search the price is typically small. With a lot of forward progress (without backtracking), the dangling choicepoint may severely impact the amount of live data (and GC time).

Nice challenge :slight_smile:

Valid point. Not that it matters much, but I assume you would get a corresponding speedup even if a full head unification isn’t done. In this particular example, the head unification will succeed or fail on the first two arguments, so the relative speedup will be the same, won’t it? But on average you may only get a factor of half the number of clauses, e.g., 4.5 vs 1 for 9 clauses, plus a bit of head unification on each attempted clause. That probably isn’t much in this example, although these predicates are called a lot.

This is maybe a bigger issue IMO. If you have the latter case (mostly forward progress) and you want to protect against dangling (unnecessary) CPs, your options are limited. You can’t be sure if indexing will do the job, even though it may be obvious. So you end up with either a bunch of green cuts spread through the clauses, or wrapping the call in a once/1 (or the cheaper Goal -> true), neither of which is very pretty, and both incur an additional cost (creating/removing an unnecessary CP).
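For concreteness, the two workarounds look roughly like this (lookup/3 and lookup2/3 are made-up deterministic lookups; neither version relies on indexing removing the choicepoint):

% Option 1: green cuts inside the clauses.
lookup(a, x, r1) :- !.
lookup(a, y, r2) :- !.
lookup(b, x, r3) :- !.

% Option 2: cut-free clauses, protected at the call site with once/1
% or the slightly cheaper (Goal -> true).
lookup2(a, x, r1).
lookup2(a, y, r2).
lookup2(b, x, r3).

call_lookup(K1, K2, V) :-
    ( lookup2(K1, K2, V) -> true ).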

Indeed.

Thinking about this, I might have an idea. What if we allowed for some instrumentation of failed head unifications (optionally extended with other things we may consider part of clause selection early in the body, such as type tests)? A failed head unification is the indication that indexing is non-optimal. Then we can do some examination and look for a solution. If there is nothing better, we should be able to record that we know about this situation and don’t want to examine it any more, at a cost that is low enough to have negligible impact.

Has something like this been tried?

This replaces the current indexing on first call? Other than dynamic predicates, does it make any difference?

Good point. Possibly. After all, there is no point looking for indexing options if no head unification failed. That is not the primary goal though. The primary goal is to figure out for which predicates it might be worthwhile to seek better indexing. Using head unification failure is quite ideal for this. As we will backtrack right after, we have access to the current (partially) unified arguments, the arguments at call time, and the program pointer that will tell us exactly where the unification failed.

There are still a lot of open questions though:

  • Trigger immediately or collect some failures before deciding this is a hot spot?
  • If we do not trap immediately, what information do we collect and when do we trap?
  • If we do trap immediately, what information to pass to the hook?
  • How to prevent further unwanted traps?

It seems we can put a flag on the VM instruction without noticeable impact on performance, so that might be a way out.

Some kind of “mode” declaration would be good. From looking at the jiti_list, I have a pretty good idea of what indexing I want; there’s no need for the system to try to guess this.

Having said this, some of my big indexing (a) takes a long time [I “warm up” the indexing by running a set of queries in parallel using concurrent_maplist/2] and (b) is data dependent [right now, some arguments aren’t useful for indexing but in future they will be].

A declaration will probably only make it somewhat faster. So far, experience with the old explicit indexing indicated the declarations were often wrong. Sometimes due to misunderstanding of how the index worked, sometimes due to misunderstanding of the program and sometimes due to forgetting to update the index after changes to the usage.

Surely automatic indexing also gets it wrong at times. That is what we need to improve upon. The advantage is that the indexes are built lazily and that we can easily introduce new indexing techniques, some of which may be hard to specify. I am considering decision-tree style clause selection for static code with not too many clauses, which often lacks the nice uniform clauses one typically finds with dynamic code.

In the meantime, asking some dedicated queries can get you what you want, as you are already doing.


Fair enough, but it can be pretty difficult figuring out, when it matters, how to “influence” the indexing. For the motivating example, the predicate was designed to work with indexing on the first two arguments and all calls have that instantiation pattern. However, the automatic indexing was sub-optimal, perhaps due to a slightly over-restrictive heuristic. Would a “hint” have been useful?

On the other hand, my jiti_list/1 output also contains:

Predicate                                     Indexed  Buckets Speedup Flags
============================================================================
clpBNR:select_min/3                             1            2     1.0   

where:

select_min(n(L,R,Min),Min,R) :- var(L), !, 
	nonvar(Min).  % fail if tree was empty
select_min(n(L,R,KData),Min,n(NewL,R,KData)) :-
	select_min(L,Min,NewL).

Why did it bother? And about half the items in the list are similar in nature or not much better, so it looks like a fair amount of work for no obvious benefit.

It’s true that the focus of clause indexing isn’t examples like these, and it does do a good job for the couple of predicates in the library that stand to benefit the most (“speedups” > 25), but I don’t think an additional mechanism to provide info for indexing is a bad thing for those who really need it. Training strikes me as a bit of a black art at the best of times.

But if ways to improve automatic indexing can be found so this was a non-issue, that’s even better.

Good question. Looking at the code though, this should probably be written something like below. This is of course a transformation that could be automated.

select_min(n(L,R,D),Min,Res) :-
    (   var(L)
    ->  nonvar(D),
        Min = D, Res = R
    ;   Res = n(NewL, R, D),
        select_min(L, Min, NewL)
    ).

Clause indexing is not really going to avoid matching against the n/3 term twice if the first clause fails, unless we get something like

   if ( var(A1) ) use clause 1
   else if ( A1 is an n/3 term )
        if ( var(arg(1, A1)) ) use clause 1
        else use clause 2
   else fail

These are rules that you can deduce from the predicate without runtime information. Surely an index directive would not have helped given the current state of affairs.

But not the point of this thread. All things being equal, I usually prefer a style which avoids “if-then” and their ilk. But I do recognize they have their uses. (While on this topic, perhaps better compiling of once/1 would be beneficial so I wouldn’t have to write “Goal -> true” instead.)

My main point was why did it try and index this clause (and many others in the list) at all; not how would one index it better. I really wasn’t expecting it to.

On the other hand, it’s not obvious to me how to get optimal indexing on my initial example, so perhaps a “hint” would help there. Not that that would be my preferred solution.

I wasn’t really expecting this either and it makes no sense. Shouldn’t be too hard to track down the “why” with a bit of tracing. Could be a wrong heuristic. I don’t know whether or not it matters. Might give some slowdown, but possibly just costs a little memory.

Load library(apply_macros) to do the transformation for you. This is not ideal. Eventually we would like a more general partial evaluation to do more of these transformations automatically. As is, it does allow zero overhead for a number of meta-predicates. See the lib for details.
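A minimal usage sketch; if I read the library correctly, once/1 (among a handful of other meta-predicates) is expanded at compile time, so the goal expansion does the (Goal -> true) rewrite for you:

:- use_module(library(apply_macros)).
:- use_module(library(lists)).

% once/1 here is expanded at load time, so there is no meta-call overhead.
first_solution(X, List) :-
    once(member(X, List)).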

Wouldn’t it be even better to write the code like this:

select_min(n(L,R,Min),Min,R), var(L) => 
    nonvar(Min).  % fail if tree was empty
select_min(n(L,R,KData),Min, R) =>
    R = n(NewL,R,KData),
    select_min(L,Min,NewL).

However, unlike @ridgeworks, I couldn’t cause either of these to generate JITI.

Not on topic, but wouldn’t I have to write the first clause as:

select_min(n(L,R,Val),Min,T), var(L) => 
    nonvar(Min),  % fail if tree was empty
    Min = Val,
    T = R.

Colour me old fashioned but I’m not a big fan of SSU. Additional syntax. Additional semantics. Another portability issue. And not much upside IMO. YMMV.

This is wrong (depending on how it is intended to be used). This clause will only fire when called with the same Min and R in the two positions.

If you want best performance though, you should do the matching of the n/3 term only once. That is why the single-clause version is “eventually” faster. I don’t know whether it is in practice. Probably it is, as one clause removes the whole indexing machinery.

The fact that this creates an index at all is indeed still a bit weird. Without knowing all properties that may matter (e.g., dynamic, multifile) and the call patterns it is hard to say.

I surely disagree on that. It saved me many hours of debugging :slight_smile: It makes writing “functional” code much less error prone. Also pattern matching on terms with unknown instantiation becomes a lot easier (no need for subsumes_term/2 or var/1 tests all over the place). And yes, it is not portable, but as @j4n_bur53 has shown, you can automatically translate => rules to ISO Prolog.
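For the record, a rough hand translation of one => clause into plain Prolog might look like the sketch below (an approximation only; it ignores the error SWI raises when no rule matches). subsumes_term/2 captures the single-sided part of the match, after which the bindings are made explicit:

% SSU clause:
%   select_min(n(L,R,Val), Min, T), var(L) =>
%       nonvar(Min), Min = Val, T = R.
%
% Approximate rendering: the first argument must already be an n/3 term
% (matching may not bind variables in the caller's arguments), then the
% guard, then commit, then the output unifications in the body.
select_min_iso(Tree, Min, T) :-
    subsumes_term(n(_,_,_), Tree),
    Tree = n(L, R, Val),
    var(L),
    !,
    nonvar(Min),
    Min = Val,
    T = R.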

As I said, YMMV. I spend way more time debugging design problems than problems due to unwanted/unexpected non-determinism. My other main issue is misinterpreting library APIs.