Pattern Matching Order

Here is a code solution which correctly handles non-ground variables. It takes pains to avoid unwanted choicepoints. I’m deliberately avoiding using if_ in reif package so that the code is easier to grok for beginners.

As with clumped/2, this is only preventing consecutive duplicates, hence its predicate name.

I prefer to use “ne” vs “eq” (not equal vs equal), to be more meaningful than the generic true vs false.

no_cons_dupes([], []).
no_cons_dupes([H|T], [H|R]) :-
    no_cons_dupes_(T, H, R).

no_cons_dupes_([], _, []).
no_cons_dupes_([H|T], P, R) :-
    (   H \= P
        % Certain mismatch
    ->  no_cons_dupes_eq_(ne, T, H, R)
    ;   H == P
        % Certain match
    ->  no_cons_dupes_eq_(eq, T, H, R)
        % Unsure whether matches
    ;   H = P,
        no_cons_dupes_eq_(eq, T, H, R)
    ;   dif(H, P),
        no_cons_dupes_eq_(ne, T, H, R)
    ).

% "ne" means "not equal"
no_cons_dupes_eq_(ne, T, H, [H|R]) :-
    no_cons_dupes_(T, H, R).
no_cons_dupes_eq_(eq, T, H, R) :-
    no_cons_dupes_(T, H, R).

Example output:

?- no_cons_dupes([a, a, a, a, b, c, c, a, a, d, e, e, e, e], L).
L = [a, b, c, a, d, e].

This Prolog code has the advantage of being able to produce logical/sensible output when given uncertainty, e.g.:

?- no_cons_dupes([a, a, b, b, b, c, c, a], L).
L = [a, b, c, a].

?- no_cons_dupes([a, a, b, b, b, c, CV, a], L).
CV = c,
L = [a, b, c, a] ;
CV = a,
L = [a, b, c, a] ;
L = [a, b, c, CV, a],
dif(CV, c),
dif(CV, a).

?- no_cons_dupes([A, B, C], L).
A = B, B = C,
L = [C] ;
A = B,
L = [B, C],
dif(C, B) ;
B = C,
L = [A, C],
dif(C, A) ;
L = [A, B, C],
dif(B, A),
dif(C, B).
1 Like

As I explained, we make it tail recursive by an obvious swapping (of the two conjuncts in the body).

A simplification of the code of brebs. Works with nonground lists, with similar results.

no_dup([], []).
no_dup([H|T], Out) :-
    ( T = [] -> Out = [H]
    ; no_dup_(H, T, Out)
    ).
no_dup_(H1, [H2|T], Out) :-
    ( H1 \= H2  % Certain mismatch
      -> Out = [H1|Out2]
    ; H1 == H2  % Certain match
      -> Out = Out2
    ;           % Match or mismatch
      ( H1 = H2, Out = Out2
      ; dif(H1, H2), Out = [H1|Out2]
      )
    ),
    no_dup([H2|T], Out2).

The program remains correct (but leaves choice-points) when we drop the case with ==/2

Nope:

?- no_dup(L, R).
L = R, R = [] ;
L = R, R = [_].

Luckily, I wrote that it works for nonground lists :slight_smile: Here L is not a list.

You mean ground, not nonground. The first argument must be a ground list.

What is the point of writing such crippled code, which will happily produce a wrong result (with no guard) and pretend it’s a correct result? It’s an example of what not to do.

This could be improved though:

?- no_cons_dupes(L, [1, 2]).
ERROR: Stack limit (1.0Gb) exceeded

E.g. L could be [1, 2] or [1, 1, 2] or [1, 2, 2] etc., going off into infinite length.

The code below acts a bit nicer, by changing the position of dif/2

no_cons_dupes([], []).
no_cons_dupes([H|T], [H|R]) :-
    no_cons_dupes_(T, H, R).

no_cons_dupes_([], _, []).
no_cons_dupes_([H|T], P, R) :-
    (   H \= P
        % Certain mismatch
    ->  R0 = R
    ;   H == P
        % Certain match
    ->  R = [H|R0]
        % Unsure whether matches
        % Try match first, to reduce length of R0
    ;   H = P,
        R = [H|R0]
    ;   dif(H, P),
        R0 = R
    ),
    no_cons_dupes_(T, H, R0).

Result:

?- no_cons_dupes(L, [1, 2]).
L = [1, 2, 2] ;
L = [1, 2, 2, _A],
dif(_A, 2) ;
L = [1, 2, 2, _A, _B],
dif(_A, 2),
dif(_B, _A) ;
L = [1, 2, 2, _A, _B, _C],
dif(_A, 2),
dif(_B, _A),
dif(_C, _B) ;
...

So, this solution is a variant of append/3, which is tail-recursive in Prolog but not tail-recursive in functional languages (logical variables FTW).
Nice!

(Unfortunately, your edit didn’t show when subscribing by email)

I disagree. My no_dup/2 does correctly work also on non-ground lists. In particular, on the tests from the former post of brebs it produces the same results.
Indeed, no_dup(L,R) produces only two answers out of infinitely many, but L is not a non-ground list (as it has instances which are not lists).

On the other hand, my program can be made nicer – more declarative – by using indexing instead of the first if-then-else. Here is the result.

%  no_dupl(LL, L) - L is the list LL without duplicates
no_dupl([], []).
no_dupl([H|T], Out) :-
    no_dupl_(H, T, Out).

%  no_dupl_(H, LL, L) - L is the list [H|LL] without duplicates
no_dupl_(H, [], [H]).
no_dupl_(H1, [H2|T], Out) :-
      ( H1 \= H2  % Certain mismatch
        -> Out = [H1|Out2]
      ; H1 == H2   % Certain match
        -> Out = Out2
      ;          % Match or mismatch
        ( H1 = H2, Out = Out2
        ; dif(H1, H2), Out = [H1|Out2]
        )
      ),
      no_dupl([H2|T], Out2).

This program produces infinitely many answers for query no_dupl(LL,L).
It is incomplete, like the program of brebs – in the sense that both produce only lists with the same (variable) element repeated.
However moving no_dupl([H2|T], Out2) to the beginning of the clause body, results in a program producing all the answers. Here is a fragment of the answers:

LL = [_A, _A, _A],
L = [_A] ;
LL = [_A, _B, _B],
L = [_A, _B],
dif(_A, _B) ;
LL = [_A, _A, _B],
L = [_A, _B],
dif(_A, _B) ;
LL = L, L = [_A, _B, _C],
dif(_B, _C),
dif(_A, _B) ;

This program is however incorrect. For instance

?- no_cons_dupes([a,a,a], L).
L = [a, a, a].

No. There’s an infinite amount of answers to e.g.:

L = [_|_], no_cons_dupes(L, [a, b]).

We generally do not care about showing the infinite options in a balanced way. E.g. this is consistently choosing the initial option of dif:

?- L = [_|_], no_cons_dupes(L, [a, b]).
L = [a, b, b] ;
L = [a, b, b, _A],
dif(_A, b) ;
L = [a, b, b, _A, _B],
dif(_A, b),
dif(_B, _A) ;

A more human-readable presentation would be to show all options based on limited length:

?- L = [_|_], length(L, _), no_cons_dupes(L, [a, b]).
L = [a, b, b] ;
L = [a, b, b, _A],
dif(_A, b) ;
L = [a, _A, b, b],
dif(_A, a),
dif(b, _A) ;
L = [a, b, b, _A, _B],
dif(_A, b),
dif(_B, _A) ;

This is not “incomplete” i.e. a bug, it’s just the awkwardness of infinity.

The same issue can be seen with e.g.:

?- no_cons_dupes(L, [1, 2]).
L = [1, 2] ;
L = [1, 2, 2] ;
L = [1, 2, 2, 2] ;
...

vs:

?- length(L, _), no_cons_dupes(L, [1, 2]).
L = [1, 2] ;
L = [1, 2, 2] ;
L = [1, 1, 2] ;
L = [1, 2, 2, 2] ;
L = [1, 1, 2, 2] ;
L = [1, 1, 1, 2] ;
...

Ah yes, this is a bug, thanks. This is fixed in the version below, by keeping the important distinction between H and P:

no_cons_dupes([], []).
no_cons_dupes([H|T], [H|R]) :-
    no_cons_dupes_(T, H, R).

no_cons_dupes_([], _, []).
no_cons_dupes_([H|T], P, R) :-
    (   H \= P
        % Certain mismatch
    ->  no_cons_dupes_eq_(ne, T, H, P, R)
    ;   H == P
        % Certain match
    ->  no_cons_dupes_eq_(eq, T, H, P, R)
        % Unsure whether matches 
    ;   dif(H, P),
        no_cons_dupes_eq_(ne, T, H, P, R)
    ;   H = P,
        no_cons_dupes_eq_(eq, T, H, P, R)
    ).

% "ne" means "not equal"
no_cons_dupes_eq_(ne, T, H, _P, [H|R]) :-
    no_cons_dupes_(T, H, R).
no_cons_dupes_eq_(eq, T, _H, P, R) :-
    % Continuing comparison against original element P
    no_cons_dupes_(T, P, R).

Result:

?- no_cons_dupes([a,a,a], L).
L = [a].

I find it strange. You aggressively criticized my former no_dup/2 because it did not produce all the answers to the most general query (which NB was outside of the specification). As a good one you present a program that also does not produce all the answers to the query.
(And you see no advantage in a program that actually does produce all of them.)