Pattern Matching Order

brebs · February 20, 2025, 7:22pm

Here is a code solution which correctly handles non-ground variables. It takes pains to avoid unwanted choicepoints. I’m deliberately avoiding using if_ in reif package so that the code is easier to grok for beginners.

As with clumped/2, this is only preventing consecutive duplicates, hence its predicate name.

I prefer to use “ne” vs “eq” (not equal vs equal), to be more meaningful than the generic true vs false.

no_cons_dupes([], []).
no_cons_dupes([H|T], [H|R]) :-
    no_cons_dupes_(T, H, R).

no_cons_dupes_([], _, []).
no_cons_dupes_([H|T], P, R) :-
    (   H \= P
        % Certain mismatch
    ->  no_cons_dupes_eq_(ne, T, H, R)
    ;   H == P
        % Certain match
    ->  no_cons_dupes_eq_(eq, T, H, R)
        % Unsure whether matches
    ;   H = P,
        no_cons_dupes_eq_(eq, T, H, R)
    ;   dif(H, P),
        no_cons_dupes_eq_(ne, T, H, R)
    ).

% "ne" means "not equal"
no_cons_dupes_eq_(ne, T, H, [H|R]) :-
    no_cons_dupes_(T, H, R).
no_cons_dupes_eq_(eq, T, H, R) :-
    no_cons_dupes_(T, H, R).

Example output:

?- no_cons_dupes([a, a, a, a, b, c, c, a, a, d, e, e, e, e], L).
L = [a, b, c, a, d, e].

This Prolog code has the advantage of being able to produce logical/sensible output when given uncertainty, e.g.:

?- no_cons_dupes([a, a, b, b, b, c, c, a], L).
L = [a, b, c, a].

?- no_cons_dupes([a, a, b, b, b, c, CV, a], L).
CV = c,
L = [a, b, c, a] ;
CV = a,
L = [a, b, c, a] ;
L = [a, b, c, CV, a],
dif(CV, c),
dif(CV, a).

?- no_cons_dupes([A, B, C], L).
A = B, B = C,
L = [C] ;
A = B,
L = [B, C],
dif(C, B) ;
B = C,
L = [A, C],
dif(C, A) ;
L = [A, B, C],
dif(B, A),
dif(C, B).

wlodr · February 20, 2025, 8:58pm

As I explained, we make it tail recursive by an obvious swapping (of the two conjuncts in the body).

wlodr · February 20, 2025, 11:37pm

A simplification of the code of brebs. Works with nonground lists, with similar results.

no_dup([], []).
no_dup([H|T], Out) :-
    ( T = [] -> Out = [H]
    ; no_dup_(H, T, Out)
    ).
no_dup_(H1, [H2|T], Out) :-
    ( H1 \= H2  % Certain mismatch
      -> Out = [H1|Out2]
    ; H1 == H2  % Certain match
      -> Out = Out2
    ;           % Match or mismatch
      ( H1 = H2, Out = Out2
      ; dif(H1, H2), Out = [H1|Out2]
      )
    ),
    no_dup([H2|T], Out2).

The program remains correct (but leaves choice-points) when we drop the case with ==/2

brebs · February 21, 2025, 12:14am

Nope:

?- no_dup(L, R).
L = R, R = [] ;
L = R, R = [_].

wlodr · February 21, 2025, 12:44am

Luckily, I wrote that it works for nonground lists Here L is not a list.

brebs · February 21, 2025, 12:54am

You mean ground, not nonground. The first argument must be a ground list.

What is the point of writing such crippled code, which will happily produce a wrong result (with no guard) and pretend it’s a correct result? It’s an example of what not to do.

brebs · February 21, 2025, 2:17pm

This could be improved though:

?- no_cons_dupes(L, [1, 2]).
ERROR: Stack limit (1.0Gb) exceeded

E.g. L could be [1, 2] or [1, 1, 2] or [1, 2, 2] etc., going off into infinite length.

The code below acts a bit nicer, by changing the position of dif/2

no_cons_dupes([], []).
no_cons_dupes([H|T], [H|R]) :-
    no_cons_dupes_(T, H, R).

no_cons_dupes_([], _, []).
no_cons_dupes_([H|T], P, R) :-
    (   H \= P
        % Certain mismatch
    ->  R0 = R
    ;   H == P
        % Certain match
    ->  R = [H|R0]
        % Unsure whether matches
        % Try match first, to reduce length of R0
    ;   H = P,
        R = [H|R0]
    ;   dif(H, P),
        R0 = R
    ),
    no_cons_dupes_(T, H, R0).

Result:

?- no_cons_dupes(L, [1, 2]).
L = [1, 2, 2] ;
L = [1, 2, 2, _A],
dif(_A, 2) ;
L = [1, 2, 2, _A, _B],
dif(_A, 2),
dif(_B, _A) ;
L = [1, 2, 2, _A, _B, _C],
dif(_A, 2),
dif(_B, _A),
dif(_C, _B) ;
...

peter.ludemann · February 22, 2025, 10:27pm

So, this solution is a variant of append/3, which is tail-recursive in Prolog but not tail-recursive in functional languages (logical variables FTW).
Nice!

(Unfortunately, your edit didn’t show when subscribing by email)

wlodr · February 23, 2025, 4:43pm

I disagree. My no_dup/2 does correctly work also on non-ground lists. In particular, on the tests from the former post of brebs it produces the same results.
Indeed, no_dup(L,R) produces only two answers out of infinitely many, but L is not a non-ground list (as it has instances which are not lists).

wlodr · February 23, 2025, 5:31pm

On the other hand, my program can be made nicer – more declarative – by using indexing instead of the first if-then-else. Here is the result.

%  no_dupl(LL, L) - L is the list LL without duplicates
no_dupl([], []).
no_dupl([H|T], Out) :-
    no_dupl_(H, T, Out).

%  no_dupl_(H, LL, L) - L is the list [H|LL] without duplicates
no_dupl_(H, [], [H]).
no_dupl_(H1, [H2|T], Out) :-
      ( H1 \= H2  % Certain mismatch
        -> Out = [H1|Out2]
      ; H1 == H2   % Certain match
        -> Out = Out2
      ;          % Match or mismatch
        ( H1 = H2, Out = Out2
        ; dif(H1, H2), Out = [H1|Out2]
        )
      ),
      no_dupl([H2|T], Out2).

This program produces infinitely many answers for query no_dupl(LL,L).
It is incomplete, like the program of brebs – in the sense that both produce only lists with the same (variable) element repeated.
However moving no_dupl([H2|T], Out2) to the beginning of the clause body, results in a program producing all the answers. Here is a fragment of the answers:

LL = [_A, _A, _A],
L = [_A] ;
LL = [_A, _B, _B],
L = [_A, _B],
dif(_A, _B) ;
LL = [_A, _A, _B],
L = [_A, _B],
dif(_A, _B) ;
LL = L, L = [_A, _B, _C],
dif(_B, _C),
dif(_A, _B) ;

Added later
I got here an old lesson: Avoid to use extra-logical features of Prolog (unless really unnecessary). Getting rid of a ->/2 resulted in a more general and easier to understand code.

wlodr · February 23, 2025, 5:42pm

This program is however incorrect. For instance

?- no_cons_dupes([a,a,a], L).
L = [a, a, a].

brebs · February 23, 2025, 5:45pm

No. There’s an infinite amount of answers to e.g.:

L = [_|_], no_cons_dupes(L, [a, b]).

We generally do not care about showing the infinite options in a balanced way. E.g. this is consistently choosing the initial option of dif:

?- L = [_|_], no_cons_dupes(L, [a, b]).
L = [a, b, b] ;
L = [a, b, b, _A],
dif(_A, b) ;
L = [a, b, b, _A, _B],
dif(_A, b),
dif(_B, _A) ;

A more human-readable presentation would be to show all options based on limited length:

?- L = [_|_], length(L, _), no_cons_dupes(L, [a, b]).
L = [a, b, b] ;
L = [a, b, b, _A],
dif(_A, b) ;
L = [a, _A, b, b],
dif(_A, a),
dif(b, _A) ;
L = [a, b, b, _A, _B],
dif(_A, b),
dif(_B, _A) ;

This is not “incomplete” i.e. a bug, it’s just the awkwardness of infinity.

The same issue can be seen with e.g.:

?- no_cons_dupes(L, [1, 2]).
L = [1, 2] ;
L = [1, 2, 2] ;
L = [1, 2, 2, 2] ;
...

vs:

?- length(L, _), no_cons_dupes(L, [1, 2]).
L = [1, 2] ;
L = [1, 2, 2] ;
L = [1, 1, 2] ;
L = [1, 2, 2, 2] ;
L = [1, 1, 2, 2] ;
L = [1, 1, 1, 2] ;
...

Another example, using the built-in append/3 is:

?- append(L1, L2, LA).
L1 = [],
L2 = LA ;
L1 = [_A],
LA = [_A|L2] ;
L1 = [_A, _B],
LA = [_A, _B|L2] ;
L1 = [_A, _B, _C],
LA = [_A, _B, _C|L2] ;
...

… which is going off to infinite length with the 1st argument (L1).

brebs · February 23, 2025, 6:09pm

Ah yes, this is a bug, thanks. This is fixed in the version below, by keeping the important distinction between H and P:

no_cons_dupes([], []).
no_cons_dupes([H|T], [H|R]) :-
    no_cons_dupes_(T, H, R).

no_cons_dupes_([], _, []).
no_cons_dupes_([H|T], P, R) :-
    (   H \= P
        % Certain mismatch
    ->  no_cons_dupes_eq_(ne, T, H, P, R)
    ;   H == P
        % Certain match
    ->  no_cons_dupes_eq_(eq, T, H, P, R)
        % Unsure whether matches 
    ;   dif(H, P),
        no_cons_dupes_eq_(ne, T, H, P, R)
    ;   H = P,
        no_cons_dupes_eq_(eq, T, H, P, R)
    ).

% "ne" means "not equal"
no_cons_dupes_eq_(ne, T, H, _P, [H|R]) :-
    no_cons_dupes_(T, H, R).
no_cons_dupes_eq_(eq, T, _H, P, R) :-
    % Continuing comparison against original element P
    no_cons_dupes_(T, P, R).

Result:

?- no_cons_dupes([a,a,a], L).
L = [a].

wlodr · February 24, 2025, 10:16am

I find it strange. You aggressively criticized my former no_dup/2 because it did not produce all the answers to the most general query (which NB was outside of the specification). As a good one you present a program that also does not produce all the answers to the query.
(And you see no advantage in a program that actually does produce all of them.)
Added later (as you require explanations):
– For the most general query ?- no_cons_dupes(L, R)., your program produces only answers with L being a list with all elements equal. (My post 30 presented a program producing all the answers.)
– I wrote here “outside of specification”, as my post 23 with the program for no_dup/2 stated that the first argument should be a (possibly non-ground) list.

brebs · February 24, 2025, 11:11am

Code should follow Principle of least astonishment - Wikipedia , as decreed in The Tao of Programming - Wikipedia

Example?

Where was it specified that the program should sometimes produce a wrong result? How can that be justified? Can just fix the code, or throw/1 an exception - pretty much anything is better than producing a wrong result and pretending that it’s a correct result.

What? Please re-read my “showing the infinite options in a balanced way” post.

brebs · February 24, 2025, 6:43pm

Using my latest version, posted earlier in this thread:

?- no_cons_dupes(L, R).
L = R, R = [] ;
L = R, R = [_] ;
L = R, R = [_A, _B],
dif(_B, _A) ;
L = R, R = [_A, _B, _C],
dif(_B, _A),
dif(_C, _B) ;

These are all different, using dif/2.

I’ve already explained in this thread why it’s OK to go off to infinity with 1 variable - choicepoints are being created for the alternatives. It’s a consequence of infinite solutions, not a bug in the code. E.g. try to explain why the built-in append/3 is or isn’t equally buggy, since it appears to be ignoring the potentials for L2 (but isn’t - they are choicepoints):

?- append(L1, L2, LA).
L1 = [],
L2 = LA ;
L1 = [_A],
LA = [_A|L2] ;
L1 = [_A, _B],
LA = [_A, _B|L2] ;

My point from earlier, when your code was using ->/2 to commit to [] rather than have the option of [_|_] also, is that “possibly non-ground” can mean var/1, as shown by nonground/2 e.g.:

?- nonground(L, NG).
L = NG.

Anyway, I’ve explained the points enough times now.

Topic		Replies	Views
Sorting predicates in PROLOG Help! how-to	23	3297	August 3, 2022
Please help with this Prolog question Help!	3	339	October 27, 2022
Creating a unique list of elements Help!	4	973	December 22, 2023
A little doubt on maplist/2 Help! discussion	3	774	April 29, 2020
Confusing behavior of sequence//3 and sequence//5 from dcg/high_order Help! how-to	18	1365	December 25, 2020

Pattern Matching Order

Related topics