First and second argument indexing -- which should be a first argument

Hello,

I have a predicate that represents a fact along with a type of the fact, such as so:

fact(type_of_fact, fact_name).

There are only a hand-full of types, but there could be many fact names.

Given just in time indexing of arguments, is there a difference in lookup speed if i define one or the other as follows:

fact(fact_name, type_of_fact).
fact(type_of_fact, fact_name).

Note: fact_names are unique across types as well – so, if fact_name is first – one lookup is sufficient.

Does, just in time indexing, take into account the results of the first lookup – i.e. is there some kind of partitioned hash space based on first argument instantiation.

thanks,

Dan

Typically, see jiti_list/1. Given enough clauses there should be no difference. The system will (given calls using the modes (+,-), (-,+) and possibly (+,+)) create two indexes and always first checks whether the most selective argument is instantiated.

1 Like

Thanks Jan,

I expect both to be instantiated most of the time.

Daniel

As one of the arguments is (I understand) fully selective, it will use that one. I.e., it creates a list of indexes, most selective to least selective and tries their conditions in that order. As soon as one is found and it is good enough it will use it. If the found index is poor (not very selective), it will try to create a better index (remembering what it tried).

I have a multifile predicate that consists of facts of the following form (all predicates are completely ground).

protobufs:proto_meta_field_name('.google.protobuf.compiler.CodeGeneratorRequest',1,file_to_generate,'.google.protobuf.compiler.CodeGeneratorRequest.file_to_generate').
protobufs:proto_meta_field_name('.google.protobuf.compiler.CodeGeneratorRequest',3,compiler_version,'.google.protobuf.compiler.CodeGeneratorRequest.compiler_version').

When I load this and run jiti_list/1, I get

?- jiti_list(protobufs:proto_meta_field_name/4).
Predicate                                     Indexed  Buckets Speedup Flags
============================================================================
protobufs:proto_meta_field_name/4               2+3         16    13.0   
                                                2            8     3.4      

I want to have it indexed on 1+2 because that’s one of my call patterns, but there doesn’t seem to be a way of convincing SWI-Prolog to change its indexing. Also, the predicate is deterministic with arguments 1+2 ground:

?- protobufs:proto_meta_field_name(X,Y,A,B), protobufs:proto_meta_field_name(X,Y,A2,B2), [A,B]\=[A2,B2].
false.

What’s weird is that sometimes when the first two arguments are ground, I get a choicepoint, and sometimes not. Here’s a query that leaves no choicepoint:

?- protobufs:proto_meta_field_name('.google.protobuf.compiler.CodeGeneratorRequest',3,FN,Fqn).
FN = compiler_version,
Fqn = '.google.protobuf.compiler.CodeGeneratorRequest.compiler_version'.

And here’s a query that leaves a choicepoint, although there’s no other result.

?- protobufs:proto_meta_field_name('.google.protobuf.compiler.CodeGeneratorRequest',1,FN,Fqn).
FN = file_to_generate,
Fqn = '.google.protobuf.compiler.CodeGeneratorRequest.file_to_generate' ;
false.

In both cases, there are more results when just argument 2 is specified:

?- forall(protobufs:proto_meta_field_name(A,1,FN,Fqn), writeln(A:FN)).
.google.protobuf.compiler.Version:major
.google.protobuf.compiler.CodeGeneratorRequest:file_to_generate
.google.protobuf.compiler.CodeGeneratorResponse:error
.google.protobuf.compiler.CodeGeneratorResponse.File:name
true.

?- forall(protobufs:proto_meta_field_name(A,3,FN,Fqn), writeln(A:FN)).
.google.protobuf.compiler.Version:patch
.google.protobuf.compiler.CodeGeneratorRequest:compiler_version
true.

I suppose one possibility is to change the predicate to combine the first 2 arguments (created using atomic_list_concat/2). But the code would be cleaner if the indexing would do what I want …

@Jan,

I wonder if one could add indexing directives, to explicitly mention on which argument(s) index-- to help the indexer in cases it can’t figure out the intent of the programmer …

Would this make sense …

If you call this typically with the first 3 arguments instantiated it may well decide that 2+3 is “good enough”. You can force a 1+2 index by calling the predicate simply as proto_meta_field_name(1,2,_,_), i.e. by instantiating the arguments you want to index with anything that can be indexed and leave the other arguments uninstantiated. There are (rare) cases where this can be useful. If Prolog’s decision seems wrong, please share the complete data and calling patterns. In general you can get a sub-optimal result if you first make (in this case) a call with args 2+3 and next make calls with 1+2+3 where it considers the existing 2+3 index good enough.

From the call it seems it is just using the first argument index. Even if not though, hash indexes are not guaranteed to avoid a choice point even if there is only one solution. That is guaranteed for indexing on a single argument where the arguments are atoms, small integers or compounds (on the functor). In any other case the key that is being compared may collide and thus leave a choice point.

That may, according to the single argument index, work. Alternatively, if you know a call must be unique, use once/1 or a cut.

It is not my intent. On just about any program automatic indexing has proven to be doing a good job. Manual indexing tends to index stuff that is never used, forgets important indexes, etc. Also, considering the ever more advanced automatic indexing, the syntax needs to be adapted and may get rather nasty. As is, we are not limited by complicates of the specification. The predicate property that describes the current indexing used is deliberately documented to be subject to changes without notice.

From this example we can learn that it may be interesting to be able to state that a single or combined index is (by the user) guaranteed to be unique. Also here we have the syntax problem mentioned above. It does however state important information about the structure of the database. This information could be used for automatic verification. If can be a hint to the indexing code and it has a runtime advantage: if we found a matching clause we do not have to look further.

1 Like

The fact that we know some combination of arguments must be unique can be used easily to add a wrapper for assert/1 that guarantees that this is actually the case. We could use that in SWI-Prolog’s transactions as a constraint, but that is another matter. To me, documenting such a fact has value and we can do something useful with it.

As for the hash, we compute a word-sized key for each argument. If the argument is an atom or small integer, the key is the argument. If it is compound, its functor is the key and otherwise (strings, big integers, etc.) it is a hash key of the content. If the index concerns multiple arguments these are combined into a single word-size hash.
We do compare this word-sized key, but not the actual content. That would require complicated poking around in the clause (or a lot of additional storage).

Perhaps for those examples where a hash seemingly gives a non-determinstic result while the rule is tagged as deterministic – a detailed check could be made – at the contents level – at least for debugging purposes.

The code is here, in the latest commit to contrib-protobufs:

which updates the submodules to use

You can reproduce this (assuming you put the source in $HOME/src/swipl-devel):

$HOME/src/swipl-devel/build/src/swipl -l $HOME/src/swipl-devel/packages/protobufs/bootstrap/protoc-gen-swipl
?- jiti_list(protobufs:proto_meta_field_name/4).
Predicate                                     Indexed  Buckets Speedup Flags
============================================================================
protobufs:proto_meta_field_name/4               4          128   127.0   
true.

… which is a different result than I reported above (but still not what I want). I tried running some of my test data (which would call proto_meta_field_name/4, and jiti_list showed that the first argument was now indexed:

Predicate                                     Indexed  Buckets Speedup Flags
============================================================================
protobufs:proto_meta_field_name/4               4          128   127.0   
                                                1            4     1.5   

Here’s sample test data (file d_wire.pl):

You can run this by:

?- [d_wire].
?- wire_codes(WS), protobuf_parse_from_codes(WS, '.google.protobuf.compiler.CodeGeneratorRequest', Request).

The files with the facts are these:

$HOME/src/swipl-devel/build/home/library/protobufs/protoc_gen_prolog_pb/google/protobuf/descriptor_pb.pl
$HOME/src/swipl-devel/build/home/library/protobufs/protoc_gen_prolog_pb/google/protobuf/compiler/plugin_pb.pl

which are symlinks to

$HOME/src/swipl-devel/packages/protobufs/bootstrap/protoc_gen_prolog_pb/google/protobuf/descriptor_pb.pl
$HOME/src/swipl-devel/packages/protobufs/bootstrap/protoc_gen_prolog_pb/google/protobuf/compiler/plugin_pb.pl

One possibly confounding factor is that the facts files have a term_expansion that attempts to prevent duplicate clauses (I can’t depend on the normal Prolog module mechanism because I’m putting all the facts into the protobufs module … this might be a bad choice, and I’m open to suggestions of better ways of doing things). The term expansion uses clause/2 for look-up; if this uses/sets the index, it could skew the automation because the initial lookups will have zero, one, two facts and they won’t be representative.

Is there a way to force removal of an index/indexes so that the next call will re-envoke the JIT indexing?

term_expansion(protobufs:Head, Clause) :-
    (   clause(protobufs:Head, true)
    ->  Clause=[]
    ;   Clause=[protobufs:Head]
    ).
term_expansion((protobufs:Head:-Body), Clause) :-
    (   clause(protobufs:Head, Body)
    ->  Clause=[]
    ;   Clause=[(protobufs:Head:-Body)]
    ).

I’ll try removing this and will also try a few other things and report the results.

Yes, that’s a trick I discovered a while ago and used successfully. pykythe/src_browser.pl at 1e8b37e173bbcd12e50118fff3258a2d0811ebdb · kamahen/pykythe · GitHub
It didn’t do anything here, possibly because of the term_expansion that called clause/2?

@jan - I thought that SWI-Prolog did 1st-argument indexing by default and that the JIT indexing was for additional indexes. (The typical WAM does 1st-argument indexing; it’s somewhat implicit in the “switch_on_…” opcodes) But it’s been many years since I’ve worked with the details of WAM, so my memory might be faulty; and swipl’s VM is a bit different anyway.

Yes, I did that (using an if-then-else: line 1413 in protobufs.pl - protobufs:field_and_type/7). I found the problem because the directive `det(field_and_type/7) caused an run-time exception, due to a choicepoint.

One curious thing is that adding the if-then-else removed what appeared to be an infinite backtracking loop with some test data. I haven’t figured out what’s going on there - the code should be quite straightforward, although there could be some perturbations due to my use of when/2 and dif/2. (And if anyone can suggest techniques for finding infinite backtracking loops, please tell me! - the best I can think of is adding print statements until I start to see a pattern.)

I had a det/1 declaration which caused an exception due to a choicepoint that shouldn’t have been created with that combination of arguments, if swipl’s JITI was working the way I expected.

There was an additional issue, in that some inputs appeared to cause an infinite loop (I’m guessing that it’s a backtrack loop, based on some other debugging of this code). It’s difficult for me to track down the cause because this code bootstraps itself and it’s called as a protoc compiler “plugin”. The problem seems to be related to the extra choicepoints, which is puzzling but not entirely surprising – there’s possibly a failure path that gets into infinite backtracking (again, if anyone has suggestions for finding such situations, please tell me).

I think SWI-Prolog -- Manual still provides a fair description of the current indexing. And yes, clause/2 also causes the predicate to be indexed (as well as retract/1).

Run the profiler for a while? I’d (temporarily) add a goal to

prolog_ide(thread_monitor)

in the code. That allows you to view patterns in the stacks and temporarily profile the goal for a while. That should give an idea what it is doing.

1 Like

Section 2.18.4 says “The base-line functionality of Prolog implementations provides indexing on constants and functor (name/arity) on the first argument”. Section 2.18 seems to say that this applies only if there are 10 or fewer clauses (Linear scan on first argument)

Can I use term_hash/2 or term_hash/4 to check for hash collisions on predicate arguments? If the JITI uses term_hash/4, what are the values for Depth and Range? And how are hashes of compound indexes (e.g., 2+3) computed? The values of hash_term/2 are supposed to be the deterministic, so what could cause the “Indexed” value from jiti_list in one run to be 2+3,2 (buckets: 16, 8) and 4 (buckets: 128) in another run? (I think the clauses were the same in both cases, but I’m not 100% sure.)

AFAIK, no. The clause indexing uses its own hashes. The key computation is in indexOfWord().

1 Like

From looking at the code, the hash of an atom is its “id” (wherever this is computed … I guess I need to start learning about the swipl engine - where should I start?). And for integers, it’s the “murmer hash”.

Anyway, indexing on arg 4 of protobufs:proto_meta_field_name/4 would be great if actual calls had arg 4 instantiated; but they’re only instantiated during the load phase (by the call to clause/2). When I took out the term-expansion (and call to clause/2), then the correct indexes were built (I don’t think that there are any calls that have only arg1 instantiated, but I presume that having index 1 doesn’t hurt, given that there’s also index 1+2):

Predicate                                     Indexed  Buckets Speedup Flags
============================================================================
protobufs:proto_meta_field_name/4               1+2        256   140.0   
                                                1           32    16.2   

@jan - is there a way to delete the index(es) for a predicate so that the next call with rebuild it? It seems that the automatic rebuild mechanism doesn’t work for my particular case, because the number of clauses doesn’t change but the calling patterns do change.

Hi Peter,

Given the importance of indexing to write optimized code, perhaps you could put together a short tutorial on how to think about indexing that would be great. In particular additional pitfalls to be aware of.

Markus has a tutorial where he introduces to indexing, perhaps, your insights could expand on it a bit (no pun intended :slight_smile:

Reading your postings, i still feel its a bit of a magic to me …

I presume you mean this video: Argument Indexing in Prolog - YouTube ?
I don’t think I can add much to that or to what @jan wrote upthread.
If the argument is an atom, then it’s just a hash-map from the atom’s “id” to a clause. Even if there’s a hash collision, the result will still be deterministic (no spurious choicepoint) by the normal techniques of hash buckets. (There’s nothing that requires a hash or hash buckets; other hash collision techniques exist; or instead of hashing, the atoms could be sorted and a binary search used.)

If you want to dig into the abstract machine: http://cliplab.org/logalg/slides/8_wam_AitKaci_slides.pdf (slide 123), https://www.it.uu.se/edu/course/homepage/logpro/ht04/LP_Impl.pdf (slide 40)
etc etc

Modulo some word size I guess? Or modulo hash table size?
So if you have a large atom table but the predicate with the hash table
index, has only a few entriesm, collision are quite frequent.

I think SWI-Prolog will do both, first modulo hash table size, it must so to go into
the search. And then it doesn’t compare the content, but what was computed
into word size, so the atom table would need to be larger than word size.

What is the word size 16-bit or 32-bit or 64-bit in SWI-Prolog hashes?

If word size is same size as atom table adress size, then you never get this larger.
So you would need to check other data types for collision. Or your code has
spurious choice points because its only data deterministic but

not structurally deterministic.

The “key” for an indexed value is always a machine word (32 or 64 bits). For atoms that is the atom handle and thus guaranteed to be unique. This uniqueness guarantee exists for arguments where all values are either atoms, compounds (on the functor) and small integers (wordsize-7bits). This can be a mixture of these types. If any value is something for which there is no uniqueness guarantee (float, string, big integer, rational number) the uniqueness guarantee is no longer true, also not for calls with e.g., an atom as the computed key for one of these larger objects may collide with an atom handle.

But as always, indexing is not set in stone. Future versions may do better. In extreme cases they may do a little worse. The above uniqueness guarantee should not be dropped.

You always get spurious choice points in Prolog.

Example:

?- member(b,[a,b,c]).
true ;
false.

The usual cure is two fold. You have the choices, variant 1) and variant 2):

  1. Keep calling a multi predicate with deterministic data, but place a cut:
    ?- member(b,[a,b,c]), !.
    true.
    
  2. Make a specialized version of the multi predicate that is structurally deterministic:
    ?- memberchk(b,[a,b,c]).
    true.
    

Both approach 1) and approach 2) do eliminate the suprious choice point.

The same approach also holds for DCG based Prolog programms.
The current Protobuf implementation has like 2-3 locations of a DCG
predicates that leave a spurious choice point.

Its not so difficult to spot these locations.