How to write data for fast indexing?

So I can’t claim to understand the SWI-Prolog indexing system, but it would be useful to have some kind of heuristic for writing code that’s quick to index.

I have a large for manually written static data set. So I like to write it in a frame-like syntax:

frame(id, [slot-"value"]). % with may a dozen slots max

I’m going to want to query this data with typically either the id or slot ground, so I’ll write a program to convert the syntax. The question is, what syntax?

Triple:

triple(id, slot, "value").

or id/2 slot/2 pairs:

query(ID, Slot, Value) :-
    atom(ID), !, call(ID, Slot, Value).
query(ID, Slot, Value) :-
    atom(Slot), !, call(Slot, ID, Value).

id(slot, "value").
slot(id, "value").

If I query with only the slot ground would the triple lookup be comparable to if the subject were ground? Or would the second option with id/2 and slot/2 pairs be the faster option exploiting first-argument and functor/arity indexing?

As my dad always told me “to measure is to know”. See this notebook. Note that the query is done twice as the first query creates the index and is thus slower.

As you can see, the difference is not exciting since SWI-Prolog performs JIT deep argument indexing. If you want possibly multiple arguments indexed, the plain frame/3 version will win as SWI-Prolog does combine multiple bad indexes at one level, but does not combine indexes at different depths. If you have a fixed number of columns in your table, always use an n-ary flat predicate and if you prefer some wrapper around that to make it available in a syntax you prefer best. Typically use goal expansion to do the rewrite from the preferred syntax to the flat predicate.

Using multiple predicates is required in Prolog systems with just first argument indexing. It complicates your program and wastes memory. Performance-wise you get about the same result. In other words, you do not need JIT indexing as some Prolog systems support, but it makes your life a lot simpler.

Another rule from Richard O’Keefe: “never use a list if the number of arguments is fixed”. Use a compound term instead. These are more compact and you can access elements at O(1) time. Finally, the functor name allows you to properly name and identify the data.

2 Likes

Good advice!

In this case I’m working with the belief that the number of arguments is not fixed nor ordered. I updated the notebook to reflect this with a couple of access methods to the slots. In this case the indexing doesn’t seem to help, which makes sense.

But it’s easy to compile to those triples and there the indexing works a treat.