Indexing atoms vs. pairs -- performance

Hello,

Currently I am asserting items whose first argument is an atom serving as an identifier, like so:

item(item_1, item_type_1).
item(item_2, item_type_2).

When accessing an item by identifier, it is indexed on the first argument – getting the full speed of first-argument indexing.

What if I were to assert item as follows:

item((item_1, item_type)).

i.e. make the first argument a composite, or a pair.

The idea here is to pass along the type of an item together with its identifier throughout my system.

Would this negatively impact item lookup – i.e. indexing a composite or pair instead of an atom?

I am assuming, for now, that all lookups will be of the form item(X), with X being the composite or pair that is passed around as well.
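For concreteness, a rough sketch of what such facts and a lookup could look like (the identifiers are just placeholders, not my actual data):

item((item_1, item_type_1)).
item((item_2, item_type_2)).

?- item((item_1, item_type_1)).
true.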

Alternatively, I could create a new concatenated atom instead – but I would then need to split the concatenated atom whenever I want to access the type information, a very frequent operation – which, I think, would be an expensive step. It would be best if I could access the type via binding.
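Just to illustrate the splitting step I am worried about, a sketch using atomic_list_concat/3 with an arbitrary / separator (the names are made up):

?- atomic_list_concat([item_1, item_type_1], /, A).
A = 'item_1/item_type_1'.

?- atomic_list_concat([Id, Type], /, 'item_1/item_type_1').
Id = item_1,
Type = item_type_1.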

Btw, another approach is to create additional predicates for type information, such as item_type_1(item_1) and item_type_2(item_1) – e.g. if more than one type applies to an item.

This, however, adds a predicate lookup for each type query – I am hoping to get to the type information with at most one lookup.
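A sketch of that variant, just for concreteness (the predicate names are made up):

item(item_1, item_type_1).
item_type_1(item_1).        % extra per-type facts
item_type_2(item_1).        % an item may carry more than one type

% each type query then costs one more predicate lookup:
?- item_type_2(item_1).
true.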

Edit: *************

I did a little benchmark, and it seems that using a compound term or pair does cause a slowdown – perhaps in the 12% to 16% range overall (via profile/1) – and it seems that a pair (X-T) is a bit speedier than a compound term, as first argument.

I would need to see what speedup I get from reducing fact lookups when using such a solution, and whether it's worth it … it would, however, need major refactoring of my code.

Does this make sense, more broadly?


Any thoughts are much appreciated,

Dan

A pair is a compound term:

?- write_canonical(a-1).
-(a,1)
true.

Other than being a representation used by some built-in predicates (e.g. keysort/2) and library predicates, there’s nothing special about it. It indexes as poorly as using:

?- write_canonical((a,1)).
','(a,1)
true.

When the first argument of your facts is a compound term, you’re indexing on its functor (name and number of arguments), not on the compound term’s arguments themselves.
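A quick way to see what plain first-argument indexing gets to look at – both pairs below present exactly the same principal functor:

?- functor(item_1-item_type_1, Name, Arity).
Name = (-),
Arity = 2.

?- functor(item_2-item_type_2, Name, Arity).
Name = (-),
Arity = 2.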


Thank you for the explanation …

So it’s a “compound” index that is needed to index on it … which would explain the slowdown, since more machinery is working to find the functor and its two arguments.

Which leads me to believe that if I can find a “binary” solution – such as an integer with 96 or 128 bits that encodes both identifier and type information – it would “by default” speed up indexing, since only one index is needed – but it may still cause some slowdown given the size of the integer that is passed around.

I vaguely recall “packaged” integers that are exactly aligned with word sizes for speedups – that kind of benefit is probably only available in C / C++.

Edit: re: passing around integer

Perhaps, in Prolog, the actual 96- or 128-bit integer “atom” isn’t passed around at all, just a pointer to it, indicative of the bound logic variable – so the bit size doesn’t matter …

What matters is the speed of “unpacking” the long integer to retrieve type information vs. obtaining the same type information via one (or more) additional indexed predicate lookups.
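Something like the following is what I have in mind – a purely hypothetical sketch, with made-up predicate names and an arbitrary 8-bit type tag, using standard integer arithmetic:

pack_key(IdNum, TypeNum, Key) :-      % assumes TypeNum fits in 8 bits
    Key is (IdNum << 8) \/ TypeNum.

key_type(Key, TypeNum) :-             % "unpacking" the type tag
    TypeNum is Key /\ 0xFF.

key_id(Key, IdNum) :-
    IdNum is Key >> 8.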

Dan

Paulo’s explanation is true for many Prolog implementations. The traditional way of improving a predicate’s indexing is to manually hash on the terms.

SWI-Prolog has deep indexing. You can see what swipl does by using jiti_list – I think you have to run your program with some representative data to get an accurate result.

My experience with using term hashes shows that the cost of computing a hash in order to do a lookup can sometimes be significant compared with the speedup gained from using the hash to benefit from first-argument indexing. Curious about other people’s experience here.
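For reference, a minimal sketch of the manual-hashing idea using term_hash/2 (the predicate names are made up, and both key parts must be ground at lookup time for the hash to be computable):

:- dynamic item_hashed/3.

store_item(Id, Type) :-
    term_hash(Id-Type, Hash),
    assertz(item_hashed(Hash, Id, Type)).

find_item(Id, Type) :-                % relies on first-argument indexing on Hash
    term_hash(Id-Type, Hash),
    item_hashed(Hash, Id, Type).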

Thanks for mentioning SWI-Prolog’s recent implementation of deep indexing. I usually avoid relying on it as very few Prolog systems provide it (YAP is one of them). In cases where one of the compound term arguments acts as a key (as seems to be the case in the OP), it is usually simple to make it a separate argument, especially as the key is used to access the value. I also expect first-argument indexing on the key, followed by accessing the value(s) as additional predicate arguments, to be a bit faster and more memory efficient than accessing an indexed compound term and then accessing the value(s) as arguments of that compound term. Jan, can you shed some light here?

Just a little experiment:

i0(N) :-                             % key and value as two separate arguments
    forall(between(1, N, X),
           assertz(d0(X,X))),
    time(forall(between(1, N, X),
                d0(X, _))),          % lookup on the first argument
    time(forall(between(1, N, X),
                d0(_, X))).          % lookup on the second argument

i1(N) :-                             % key and value as a single Key-Value pair
    forall(between(1, N, X),
           assertz(d1(X-X))),
    time(forall(between(1, N, X),
                d1(X-_))),           % lookup on the first argument of the pair
    time(forall(between(1, N, X),
                d1(_-X))).           % lookup on the second argument of the pair
101 ?- i0(1 000 000).
% 2,000,001 inferences, 0.739 CPU in 0.745 seconds (99% CPU, 2706333 Lips)
% 2,000,000 inferences, 0.728 CPU in 0.734 seconds (99% CPU, 2748151 Lips)
true.

102 ?- i1(1 000 000).
% 2,000,000 inferences, 0.894 CPU in 0.894 seconds (100% CPU, 2238378 Lips)
% 2,000,000 inferences, 0.803 CPU in 0.803 seconds (100% CPU, 2491910 Lips)
true.

103 ?- jiti_list(d0).
Predicate                                     Indexed  Buckets Speedup Flags
============================================================================
user:d0/2                                       1        1,048,576 1000000.0   
                                                2        1,048,576 1000000.0   
true.

104 ?- jiti_list(d1).
Predicate                                     Indexed  Buckets Speedup Flags
============================================================================
user:d1/1                                       1            2     1.0  L
                                                1/1      1,048,576 1000000.0   
                                                1/2      1,048,576 1000000.0   
true.

I think that explains it all: the second is a little slower because it first uses the indexing machinery to find the clauses (all of them, via the -/2 functor), after which it is in the same situation, having to deal with two arguments.

The lesson is, as per very old advice: use as few compound terms as you can for storage and write a wrapper around it that turns the data into the shape you want. That makes the clauses shorter and builds fewer indexes. Of course, this is mostly interesting if you have a lot of clauses and notably if memory is becoming a problem.
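For instance, sticking to the item example from this thread (the wrapper name is just an illustration):

% store the facts in the flat, well-indexed shape
item(item_1, item_type_1).
item(item_2, item_type_2).

% and provide a wrapper for code that wants to pass Id and Type as one term
item_pair(Id-Type) :-
    item(Id, Type).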
