How to write data for fast indexing?

PaulBrownMagic · August 11, 2019, 7:29am

So I can’t claim to understand the SWI-Prolog indexing system, but it would be useful to have some kind of heuristic for writing code that’s quick to index.

I have a large for manually written static data set. So I like to write it in a frame-like syntax:

frame(id, [slot-"value"]). % with may a dozen slots max

I’m going to want to query this data with typically either the id or slot ground, so I’ll write a program to convert the syntax. The question is, what syntax?

Triple:

triple(id, slot, "value").

or id/2 slot/2 pairs:

query(ID, Slot, Value) :-
    atom(ID), !, call(ID, Slot, Value).
query(ID, Slot, Value) :-
    atom(Slot), !, call(Slot, ID, Value).

id(slot, "value").
slot(id, "value").

If I query with only the slot ground would the triple lookup be comparable to if the subject were ground? Or would the second option with id/2 and slot/2 pairs be the faster option exploiting first-argument and functor/arity indexing?

jan · August 11, 2019, 8:56am

As my dad always told me “to measure is to know”. See this notebook. Note that the query is done twice as the first query creates the index and is thus slower.

As you can see, the difference is not exciting since SWI-Prolog performs JIT deep argument indexing. If you want possibly multiple arguments indexed, the plain frame/3 version will win as SWI-Prolog does combine multiple bad indexes at one level, but does not combine indexes at different depths. If you have a fixed number of columns in your table, always use an n-ary flat predicate and if you prefer some wrapper around that to make it available in a syntax you prefer best. Typically use goal expansion to do the rewrite from the preferred syntax to the flat predicate.

Using multiple predicates is required in Prolog systems with just first argument indexing. It complicates your program and wastes memory. Performance-wise you get about the same result. In other words, you do not need JIT indexing as some Prolog systems support, but it makes your life a lot simpler.

Another rule from Richard O’Keefe: “never use a list if the number of arguments is fixed”. Use a compound term instead. These are more compact and you can access elements at O(1) time. Finally, the functor name allows you to properly name and identify the data.

PaulBrownMagic · August 12, 2019, 12:11am

Good advice!

In this case I’m working with the belief that the number of arguments is not fixed nor ordered. I updated the notebook to reflect this with a couple of access methods to the slots. In this case the indexing doesn’t seem to help, which makes sense.

But it’s easy to compile to those triples and there the indexing works a treat.

Topic		Replies	Views
Ann: SWI-Prolog 9.3.19 Releases	0	126	January 23, 2025
Significant speed difference in predicate usage? Help!	3	404	June 16, 2019
Indexing strategy on lists Help!	1	309	March 5, 2020
JIT indexing question Help!	19	303	February 14, 2023
A little benchmark shows a slow first access to tabled predicates Help!	0	42	January 23, 2025

How to write data for fast indexing?

Related topics