Puzzling Performance Issue

Using: SWI-Prolog version 8.4.0

I have two versions of a simple predicate which takes a string Input and an “cursor” position PosIn and advances the cursor to any skip white space characters (defined by ws_char/1) to position PosOut:

skip_ws0(Input,PosIn,PosOut) :- 
	sub_atom(Input,PosIn,1,_,C),
	ws_char(C), !,
	P is PosIn+1,
	skip_ws0(Input,P,PosOut).
skip_ws0(_Input,PosIn,PosIn).

skip_ws1(Input,PosIn,PosOut) :- 
	((sub_atom(Input,PosIn,1,_,C), ws_char(C)) 
	 -> P is PosIn+1,
	    skip_ws1(Input,P,PosOut)
	 ;  PosOut = PosIn
	).

ws_char(' ').
ws_char('\t').
ws_char('\n').
ws_char('\r').

Timing the difference:

?- S="1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 ",
string_length(S,L),time((between(1,10000,_),between(0,L,I),skip_ws0(S,I,_),fail)).
% 5,530,001 inferences, 0.383 CPU in 0.385 seconds (100% CPU, 14426855 Lips)
false.

?- S="1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 ", 
string_length(S,L),time((between(1,10000,_),between(0,L,I),skip_ws1(S,I,_),fail)).
% 6,540,001 inferences, 0.315 CPU in 0.315 seconds (100% CPU, 20751433 Lips)
false.

So skip_ws1 outperforms skip_ws0 by about 25% on this 100 character input string. On a considerably longer test input (over 30,000 characters), the difference is much, much larger (Note the difference in Lips).

?- json_sample1K(S),string_length(S,L),time((between(1,10,_),between(0,L,I),skip_test:skip_ws0(S,I,_),fail)).
% 1,949,421 inferences, 4.509 CPU in 4.510 seconds (100% CPU, 432372 Lips)
false.

?- json_sample1K(S),string_length(S,L),time((between(1,10,_),between(0,L,I),skip_test:skip_ws1(S,I,_),fail)).
% 2,254,701 inferences, 0.109 CPU in 0.109 seconds (100% CPU, 20608945 Lips)
false.

Other observations:

  • Although the large string was not tested, the effect is observable on SWISH.
  • profile/1 port count data is identical for the two versions on the large input string.
  • looking at the vm_list output, pretty much the only differences are a) 1 clause vs. 2, and b) i_cut in skip_ws0 vs. c_cut in skip_ws1.

While it’s obvious which version I’ll be using, I’m curious as to whether there’s any explanation for this effect.

2 Likes

Nice puzzle :slight_smile: I had to use valgrind to understand it. The problem is that skip_ws0/3 has two clauses and (thus) it computes the hash of first argument and holds it against the first arguments of the clauses. The hash of a string is the hash of its content, so the longer the string the more expensive it gets to compute. For skip_ws1/3 this is not an issue as there is only one clause and thus the system simply calls the clause.

Now the problem is what to do? We could figure out that all clauses have an unbound first argument and thus there is no point computing the hash. We could also make computing the hash of a string for clause indexing purposes faster, for example by not considering the whole string but (say) just the start, end and length.

This is a general problem for predicates with more than one clause that is called with a string argument and thus needs to be improved. Thanks for reporting!

edit This also applies for (really) big integers and rational numbers. An alternative could be to place the hash in front of the actual string/integers/rational such that we do not have to compute it. That would also speedup failing (==) comparison as that will often already fail on the hash.

You are looking at the decompiler (clause/2 and friends). To find what instructions do, see pl-vmi.c.

1 Like

Thanks for the explanation; I knew there must be one.

I understand very little about how clause indexing works in detail, but I do wonder whether it’s worth hashing anything “big” (strings, MP ints and rationals) and instead just rely on clause by clause head unification to do the work in those cases. It seems unlikely (to me) that these “large literal values” would actually be arguments in the heads, so even computing a hash seems pointless.

But, as I said, I don’t know a lot about indexing.

1 Like

I wouldn’t expect strings to be used for clause indexing at all … to me, the main difference between atoms and strings is that atoms can be compared (and indexed) with O(1) but strings are O(n).

As for big MP ints and rationals … wouldn’t they have to be really really huge to have much effect on hashing?

I think there exist “incremental” hashes that allow one to compute the a hash on the first “n” bytes, and then - if necessary - compute use the next “m” bytes without recomputing the hash on the first “n+m” bytes. I don’t know if such a hash would be useful in this case, though.

I wouldn’t have any problem with this. Perhaps the only
“special case” would be the empty string, but that’s a frill.

I would think so, but they are bounded only by available memory (like strings). But other than small integers, I don’t see much value in indexing based on any numeric value.

Pushed

where the first is a refactor to get all hash computations on blobs on the stack (used by these types) in one place and the second bases the hash on the first word, last word and size for objects that use more than 4 machine words. As a result the max slowdown seems to be about 10%. This is a “quick fix”. As these changes have no compatibility impact I think something that may end up to be an intermediate solution is good enough.

Future versions should handle static predicates with a few clauses more cleverly.

1 Like

This fix is probably more than good enough. At least the cost of indexing on unbounded arguments is bounded (and fairly small).

Not that I have the SWI-Prolog C level skills at present to do this but would a hook for the hash computation on the blob type make sense?

In other works if one creates a new blob type, or for an existing blob type, one can create a custom hash computation that would be faster and lead to less collisions.

I already feared the term “blob” could lead to confusion. The blob type you refer to is the super type of atoms that is used to encapsulate arbitrary binary data (often handles to streams, threads, clauses, etc). Strings, floats, big integers and rational numbers live on the (global) stack and are internally called indirect types and are also blobs in the sense that the content of these types is simply an array of binary data. As yet, it is not possible for the user to add new types to this mechanism as it is for blobs (and there are no plans to change that either).

1 Like