Another day, another predicate, another choicepoint!

Bratko exercise 4.3 - define nth_member(N, L, M) where M is the Nth element in the list L.

Assuming numbering is 1-based, not 0-based, my solution (which was identical to the author’s) was:

nth_member(1, [M|_], M).
nth_member(N, [_|T], M) :-
  N1 #= N - 1,
  nth_member(N1, T, M).

This works fine, but gives a choicepoint. I think I understand why (argument indexing can’t distinguish between the first argument as they are both integers), but can’t see a way to rewrite it so that it doesn’t give a choicepoint.
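For example (assuming library(clpfd) is loaded for #=/2):

?- nth_member(2, [a,b,c], M).
M = b ;    % the ; shows the leftover choicepoint
false.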

The usual (in my limited experience so far) way of avoiding choicepoints is to rewrite the predicate so that the first argument is a list that gets whittled away during recursion, leaving you with two clauses: one whose first argument is an empty list, and one whose first argument is a non-empty list.
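For example, the pattern looks something like this (a generic sketch with a made-up predicate, not this exercise):

list_length([], 0).
list_length([_|T], N) :-
    list_length(T, N0),
    N is N0 + 1.

?- list_length([a,b,c], N).
N = 3.    % no choicepoint: [] and [_|_] are distinguished by first-argument indexing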

However, I can’t see how to apply that idea here. It doesn’t seem to make sense to look for the 0th (or any) element of an empty list, so I’m not sure how you’d write the first clause.

I tried reordering the arguments, so it had a list first, but it didn’t help, as I still couldn’t manage to get an empty list in the first clause.

Anyone able to advise? Thanks

Normally it’s a case of: see nth1/3, and click on the weird orange :- system badge towards the top-right of the page to see the code. But that code is a bit complex, for performance reasons.
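Alternatively (assuming a standard interactive SWI-Prolog setup, with an editor configured), the source can be inspected from the toplevel, e.g.:

?- listing(lists:nth1/3).   % decompiled clauses
?- edit(nth1/3).            % jump to the definition in library(lists)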

Here’s a quick attempt, similar in style to peano_int:

nth1_elem(I, L, E) :-
    (   integer(I)
    ->  I @>= 1
    ;   var(I)
    ),
    nth1_elem_(L, 1, I, E).

nth1_elem_([H|T], U, I, E) :-
    (   U == I
    ->  !
    ;   U = I
    ),
    % After cut
    H = E,
    (   T == []
        % No more elements to check
    ->  !
    ;   true
    ).
nth1_elem_([_|T], U, I, E) :-
    U1 is U + 1,
    nth1_elem_(T, U1, I, E).

This works sensibly with e.g.:

?- nth1_elem(I, L, E).
I = 1,
L = [E|_] ;
I = 2,
L = [_, E|_] ;
I = 3,
L = [_, _, E|_] ;

?- nth1_elem(3, L, E).
L = [_, _, E|_].

?- dif(E, a), nth1_elem(1, [a|T], E).
false.  % Doesn't fall into the trap of an infinite loop

?- E = b, nth1_elem(I, [a,b], E).
E = b,
I = 2.  % No unwanted choicepoint

?- E = b, nth1_elem(I, [a,b,c], E).
E = b,
I = 2 ;  % Unavoidable choicepoint, when only 1 pass through list
false.


When all else fails, take a look at the library source, in this case library(lists): https://github.com/SWI-Prolog/swipl-devel/blob/5310dab7d72e5e66c373d956b87c39f296fb2f5d/library/lists.pl#L251

Actually, looks like my nth1_elem is faster than the current nth1/3 - comparisons:

% Increase memory to 4GB-ish, to test with large lists
?- set_prolog_flag(stack_limit, 3_647_483_648).
?- garbage_collect, numlist(1, 20_000_000, NL),
time((nth1(I, NL, E), false)).
% 60,000,001 inferences, 2.944 CPU in 2.947 seconds (100% CPU, 20381175 Lips)

?- garbage_collect, numlist(1, 20_000_000, NL),
time((nth1_elem(I, NL, E), false)).
% 40,000,002 inferences, 2.041 CPU in 2.043 seconds (100% CPU, 19595093 Lips)
?- garbage_collect, time((between(1, 10_000, Len),
numlist(1, Len, NL), nth1(I, NL, 1_000), false)).
% 150,094,002 inferences, 7.206 CPU in 7.212 seconds (100% CPU, 20830450 Lips)

?- garbage_collect, time((between(1, 10_000, Len),
numlist(1, Len, NL), nth1_elem(I, NL, 1_000), false)).
% 100,109,001 inferences, 6.624 CPU in 6.630 seconds (100% CPU, 15113713 Lips)
?- garbage_collect, time((between(1, 10_000, Len),
length(L, Len), nth1(I, L, E), false)).
% 150,055,001 inferences, 6.037 CPU in 6.043 seconds (100% CPU, 24854631 Lips)

?- garbage_collect, time((between(1, 10_000, Len),
length(L, Len), nth1_elem(I, L, E), false)).
% 100,060,001 inferences, 5.244 CPU in 5.250 seconds (100% CPU, 19079618 Lips)

@brebs Thanks for the reply, and for the tip about seeing the source code.

Your code is going to take me some time to grok, but I have one immediate question. Why don’t you get a choicepoint for nth1_elem_, given that both clauses have a non-empty (i.e. not the constant []) list as their first argument?

Thanks again.

Because the cuts remove the choicepoint for the second clause, nth1_elem_([_|T], U, I, E).
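A stripped-down sketch of the same shape (with a made-up predicate name, and ignoring the unbound-I case) might make it clearer:

pick_([H|_], U, I, H) :-
    U == I,      % only when the index is certainly reached...
    !.           % ...commit, discarding the second clause
pick_([_|T], U, I, E) :-
    U1 is U + 1,
    pick_(T, U1, I, E).

?- pick_([a,b,c], 1, 2, E).
E = b.           % no choicepoint

The full nth1_elem_ above additionally handles an unbound I via the U = I branch, and cuts again when the end of the list is reached, so the final element is also returned without a choicepoint.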

The thing is - when learning Prolog, it’s best to basically accept choicepoints, because attempts to remove them add complexity, and are a distraction from usefully solving the problem.

… except for a bit of first-argument indexing, I’d say.

Cuts are subtle, and easily abused. Notice that I am only cutting after an ==/2 comparison, which requires that both arguments are ground/1 or the same variable.
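For comparison, at the toplevel:

?- 1 == I.    % fails: I is unbound, so the match is not certain
false.

?- 1 = I.     % succeeds by binding I; cutting here would discard solutions
I = 1.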


Hopefully of interest: to prevent an unwanted choicepoint with:

?- E = b, nth1(I, [a,b,c], E).

… could use a (slower than single-pass) 2-pass method:

nth1_2pass(I, L, E) :-
    remove_mismatches_nth1(E, L, Short),
    % Unify with matches
    member(m(I, E), Short).

remove_mismatches_nth1(E, Long, Short) :-
    remove_mismatches_nth1_(Long, E, 1, Short).

remove_mismatches_nth1_([], _, _, []).
remove_mismatches_nth1_([H|T], E, U, Short) :-
    (   H \= E
        % Definitely not a match
    ->  Short = Short0
        % Potentially a match
    ;   Short = [m(U, H)|Short0]
    ),
    U1 is U + 1,
    remove_mismatches_nth1_(T, E, U1, Short0).

Result:

?- E = b, nth1_2pass(I, [a,b,c], E).
E = b,
I = 2.

\=/2 also “integrates” nicely with dif/2 - example:

?- dif(E, c), nth1_2pass(I, [a,b,c], E).
E = a,
I = 1 ;
E = b,
I = 2.  % No unwanted choicepoint on c

A declarative variant would simply use (#>)/2 as well:

nth_member(1, [M|_], M).
nth_member(N, [_|T], M) :-
  N #> 1, N1 #= N - 1,
  nth_member(N1, T, M).

Seems to terminate:

?- nth_member(X, [2,1,3], 1).
X = 2 ;
false.

The spurious choice point is an artefact of the “procedural world”; it’s not something that would impact the “declarative world”, since having no spurious choice points is a non-functional requirement.

But from a functional requirements viewpoint the above is correct,
despite the spurious choice point, since the answer set is
nevertheless {2}. Just like the nth1/3 predicate from library(lists):

?- findall(X, nth_member(X, [2,1,3], 1), L).
L = [2].

?- findall(X, nth1(X, [2,1,3], 1), L).
L = [2].

There is no “declarative world” way to detect spurious choice points; they are only something you see at the top-level. Eliminating them is a kind of nice-to-have from a “procedural world” viewpoint.

Using #=/2 kills performance:

?- garbage_collect, time((between(1, 200, Len),
length(L, Len), nth_member(I, L, E), false)).
% 94,524,626 inferences, 3.605 CPU in 3.609 seconds (100% CPU, 26219482 Lips)

?- garbage_collect, time((between(1, 200, Len),
length(L, Len), nth1_elem(I, L, E), false)).
% 41,201 inferences, 0.002 CPU in 0.002 seconds (100% CPU, 18584190 Lips)

Can squeeze out more performance, at the cost of readability:

nth1_elem(I, L, E) :-
    (   integer(I)
    ->  I @>= 1
    ;   var(I)
    ),
    nth1_elem_(L, 1, I, E).

nth1_elem_([H|T], U, I, E) :-
    (   (   U == I
        ->  ! 
        ;   U = I
        ),
        % After cut
        H = E,
        (   T == []
            % No more elements to check
        ->  !
        ;   true
        )
    ;   U1 is U + 1,
        nth1_elem_(T, U1, I, E)
    ).

Performance:

?- garbage_collect, numlist(1, 20_000_000, NL),
time((nth1_elem(I, NL, E), false)).
% 60,000,001 inferences, 1.300 CPU in 1.301 seconds (100% CPU, 46167444 Lips)

?- garbage_collect, time((between(1, 10_000, Len),
numlist(1, Len, NL), nth1_elem(I, NL, 1_000), false)).
% 100,118,001 inferences, 5.001 CPU in 5.006 seconds (100% CPU, 20020685 Lips)

?- garbage_collect, time((between(1, 10_000, Len),
length(L, Len), nth1_elem(I, L, E), false)).
% 150,055,001 inferences, 3.349 CPU in 3.351 seconds (100% CPU, 44811446 Lips)

Maybe it’s a slow CLP(FD) implementation,
did you try GNU Prolog?

/* GNU Prolog 1.5.0 */
?- between(1,200,N), length(L,N), nth_member(I,L,E), fail; true.
(94 ms) yes

/* SWI-Prolog 9.1.14 */
?- time((between(1,200,N), length(L,N), nth_member(I,L,E), fail; true)).
% 94,504,524 inferences, 4.766 CPU in 4.767 seconds (100% CPU, 19830457 Lips)
true.

That was measured on my machine,
where SWI-Prolog was a factor of ca. 50x slower.

But it’s already a non-trivial problem for a CLP(FD) system, since for the
above test case, it has to build a constraint chain, which is then resolved
when the first clause of nth_member/3 is reached, i.e. when N becomes 1.

I was wondering. The code was still using succ/2. In the old days, this foreign predicate was faster than using X is Y+1, but these days FreshVar is Var+/-Constant is translated into a single VM instruction. This pattern wins a lot because it is mostly used with small integers, and therefore the implementation is optimized to first check the simple case where Var is a small int, do the arithmetic and simply store the result, as we know it is a fresh variable. succ/2 goes through the foreign interface, does not know which of the two arguments is bound and needs unification to bind the other variable. Replacing succ/2 with is/2 cut the time on my machine from 3.2 sec to 1.3 sec.
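A rough way to compare the two on your own machine (just a sketch; the numbers will vary by machine and build):

?- time((between(1, 10_000_000, X), succ(X, _), fail ; true)).
?- time((between(1, 10_000_000, X), _ is X + 1, fail ; true)).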

Yes. SWI-Prolog’s clp(fd) is rather slow, especially for these simple cases. I thought that was mostly due to SWI-Prolog’s rather slow constraint wakeup, but according to profile/1 the bottleneck is inside library(clpfd), where it spends nearly 90% of its time.

The (#>)/2 isn’t needed for any mode (_, +, _), i.e. a mode where the list is given. Such a mode is used in the benchmarking test case above. It gives a slight speed-up for any mode (+, _, _), since the predicate no longer traverses the whole list in all cases. You could benchmark with and without (#>)/2, using examples like:

?- nth_member(2, [5,4,6], X).
X = 4 ;
false.

But one sees already the theoretical benefit of using CLP(FD). Putting aside spurious choice points, it really allows a concise, declarative formulation of predicates, with a few rules, that stay bidirectional. CLP(FD) is therefore sometimes used together with DCGs, i.e. parsing/unparsing, to make DCGs more bidirectional. But CLP(FD) also has its limitations, and might not always work.
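A tiny illustration of that combination (a made-up example, not from the exercise above): a digit nonterminal that both parses and generates.

:- use_module(library(clpfd)).

digit(D) --> [C], { D in 0..9, C #= D + 0'0 }.

?- phrase(digit(D), `7`).
D = 7.

?- phrase(digit(3), Cs).
Cs = [51].       % the code of '3'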

Thanks for the clarification. All becoming clearer (slowly :smiley:)

Understood, but it’s a useful exercise to expand my Prolog skills. I’m not obsessing about them, but finding ways to avoid them often leads to learning useful features, such as…

Yup, that one was an eye-opener for me. Just couldn’t see how to use it here.

I could be wrong, but I get the impression that cuts are generally discouraged, and I’ve tried not to use them. Something I may come back to when I understand all this a lot better, though.

Thanks again

Yes, that makes the built-in nth1/3 competitive on time.

However, I still find the 2nd attempt consistently slightly faster than the built-in nth1/3, for those 3 tests I listed… as well as being simpler code :grinning:

In my measurement the new built-in wins by about 10% on all three. The C compiler and CPU may matter (for me gcc 11 on AMD3950X). The real difference comes from the mode (+,+,-)
though, where the built-in wins almost 30 times :slight_smile: This is due to the use of ‘$seek_list’/4, which does the work at the lowest possible level.

110 ?- Max = 100_000, garbage_collect, numlist(1, Max, L), time(forall(between(1,Max,I), nth1(I, L, _))).
% 399,999 inferences, 8.992 CPU in 8.992 seconds (100% CPU, 44486 Lips)
Max = 100000,
L = [1, 2, 3, 4, 5, 6, 7, 8, 9|...].

111 ?- Max = 100_000, garbage_collect, numlist(1, Max, L), time(forall(between(1,Max,I), nth1_elem(I, L, _))).
% 5,000,349,999 inferences, 246.156 CPU in 246.166 seconds (100% CPU, 20313731 Lips)
Max = 100000,
L = [1, 2, 3, 4, 5, 6, 7, 8, 9|...].

Ah, I didn’t recompile.

Very nice - can’t argue/compete with that performance :grinning:

So, for deterministic code, might we expect a 10-30x performance speed-up with a compiler (like Aquarius) that translates to C?

For some operations, probably yes. ‘$seek_list’/4 only does the simple stuff, following the linked list cells to skip N cells of the list or until it hits the end (which may be a variable if the list is partial). So, if we do nth1(20, [1,2,3,4|T], E), the ‘$seek_list’/4 will skip the 4 known elements and say it stopped 16 short. Now we know the other part: we need to create a partial list with 15 variables followed by E.
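Something like this plain-Prolog sketch (made-up predicate names, integer-index mode only; the real code does this at the C level with raw pointers) captures the idea:

nth1_partial(N, List, Elem) :-
    integer(N), N >= 1,
    Skip is N - 1,
    skip_or_extend(Skip, List, Rest),
    Rest = [Elem|_].

% Walk the known cells; if the list ends in a variable,
% extend it with the remaining number of fresh cells.
skip_or_extend(0, List, List) :- !.
skip_or_extend(N, List, Rest) :-
    (   nonvar(List)
    ->  List = [_|T],
        N1 is N - 1,
        skip_or_extend(N1, T, Rest)
    ;   length(Prefix, N),
        append(Prefix, Rest, List)
    ).

?- nth1_partial(20, [1,2,3,4|T], E).
% binds T to 15 fresh variables followed by [E|_]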

If you want to do this as fast as possible at the low level, you first do this skipping, because it does not change any data structures and thus can never cause stack shifts or GC calls (and thus we can use raw pointers rather than indirect pointers known to GC). For the second step we use Prolog, but now we know exactly which data structure to build. The fastest would be to make sure there is enough stack space for the target partial list and then build it using (again) raw pointers. That technique is used by e.g. findall/3 to quickly create the output list: while collecting the results it computes the required stack space for the output list, so it can do all the work using raw pointers, without caring about stack shifts/GC and without checking available memory.

The other benefit of computing the size is that we can claim out-of-stack if the output list exceeds the stack. This is done in two steps. First we compare the accumulated size to the free space. If that is exceeded we call GC and compare again against the (hopefully larger) free space.

Finally, we know we are counting list lengths, so a counter of type size_t cannot overflow, as such a list would not fit in the address space.

So, yes and no. You can gain a lot, but not merely by translating what Prolog would do. In addition we need to reason about shift/GC and integer sizes.
