Two-way deterministic zero-padding of a list

Hi there, I’ve been learning prolog for a few months now, on and off, and I thought I was getting it, but this one looked simple but has me stumped.

I should note I’m using CLP(FD) for this, which I have only just started using on the basis of the voices who say it is more relational and easier to understand its behaviour than the usual prolog arithmetic. I am quite new to it so maybe there is something I don’t understand.

I want to have a predicate that relates a list to its padded version and a number representing the length of a padded version, and I would like this to be two-way, and it should be deterministic as there is only one solution. And of course being pure would be nice (avoiding cuts etc).

E.g. lzp([1,2], L, 4) should give L=[0,0,1,2], and lzp(L, [0,0,1,2], 4) should give L=[1,2].

‘Great’, I thought. ‘An opportunity to try out my relational thinking!’

This seems straightforward enough. A list L is related to it’s padded version P and the padded length N by:

  1. P is the concatenation of a list of zeroes Z and L.
  2. the length of Z + the length of L = N
  3. the first element of L > 0

(condition 3 is true in my use case, and means stripping the padding is unambiguous — without a condition of this sort you can’t tell where the padding ends and the list begins, of course.)

(I should also note in passing N could be treated better in the below, at the moment in my implementations it always needs to be specified, even though it’s redundant in the stripping case and doesn’t allow for the most general case of lzp(X, Y, Z). This isn’t the biggest problem though…)

This seemed like a reasonable implementation:

lzp_rel(L, L1, N) :-
    length(L, L_len),
    #(L_len) #=< #(N), 
    #(N_es) #= N - L_len,
    #(N_es) #>= 0, 
    length(Es, N_es),
    L = [H |_ ],
    #(H) #> 0,
    maplist(=(0), Es),
    append(Es, L, L1).

and it works:

?- lzp_rel([1,2], L, 4). 
L = [0, 0, 1, 2].

?- lzp_rel(L, [0,0,1,2], 4). 
L = [1, 2];

except that last call isn’t deterministic, and if you hit that ; it goes off into infinity.

The reason is that length keeps trying larger and larger lists.

So I thought “maybe I need a CLP(FD) aware length”, leading to:

len([], 0).

len([H | T], N) :-
    #(N1) #= #(N) - 1,
    len(T, N1). 

len_to_n(L, N) :-
    Len in 1..N,
    length(L, Len). 

But this still winds up in infinity after the last ; :

?- len_to_n(L, 3). 
L = [_] ;
L = [_, _] ;
L = [_, _, _] ;

I don’t understand the behaviour here. doing a trace only shows the following after the last answer:

L = [_, _, _] ;
   Redo: (11) length([_5716, _8774, _11842|_11844], _4408{clpfd = ...}) ? 

and then no further output. It looks like it’s starting to try larger lists again?

(I think I can probably figure out how to get lists of length ≤ N on backtracking, so that’s my next avenue of attack.)

Anyway, so I tried a recursive approach, which seems, well, more like you’d do it in Lisp:

lzp_rec_(L, L, N) :-
    length(L, Length),
    #(Length) #= #(N).  

lzp_rec_(L, L2, N) :-
      #(N) #>= 0, %needed otherwise backtracking will take this to -inf. 
      L2 = [0 | L3],   
      #(N1) #= #(N) - 1,
      lzp_rec_(L, L3, N1). 

lzp_rec(L, L2, N) :-
    lzp_rec_(L, L2, N),
    L = [H | _],
    #(H) #> #(0). 

This behaves better but isn’t deterministic in either case, even though there’s only one
result.

?- lzp_rec(L, [0,0,1,2], 4). 
L = [1, 2] ;
false.

?- lzp_rec([1, 2], L, 4). 
L = [0, 0, 1, 2] ;
false.

The reason of course is that the second lzp_rec_ definition is there to try after the base case. Many prolog list recursion definitions have the base case and recursive case distinguished by pattern matching, e.g. [] in the first and [H | T] in the second, so there isn’t another definition that matches the pattern to try. This must be the first time I’ve ended up writing something different to that, so I haven’t had to think about that before…

I can of course ‘fix’ my definitions by wrapping them in once but that isn’t very satisfactory.

Anyway, I’ve got an avenue to try to writing something that’ll generate lists up to a certain length which might fix the first definition. And perhaps pattern matching (or (shiver) a cut) could fix the second.

But I feel like I must have missed something, because this seemingly simple problem has been hard to find a satisfactory solution to!

Hello there,

Instead of trying to reexplain badly this topic, let me link to a very well informed source about constraint programming: https://ridgeworks.github.io/clpBNR/CLP_BNR_Guide/CLP_BNR_Guide.html#toc4Example_1,_List_Length

And here is a general piece of advice: clpfd is constraint programming on integers, not lists ! The moment you mix clpfd with the predicate length/2, you already lost.

Hum, recursion is the most fundamental control flow we have in prolog, what does lisp have to do with anything here ?

So, here is my take on your original problem:

A list L is related to it’s padded version P (let’s ignore N for now) if:

  1. the first element of P is 0 and the rest of P is padded L.
  2. the first element of P is not 0 and the rest of P is L.

I notice that the condition distinguishing 1 and 2 is an arithmetic condition, so I will use clpfd to reify the condition to a boolean and pattern match on the boolean:

lzp([H | T], L) :-
  % if H = 0, then order is 1
  % if H \= 0, then order is 0 and the reverse
  % if Order = 1, then H = 0
  % if Order = 0, then H \= 0 
  (H #= 0) #<==> Order,
  lzp(Order, [H | T], L).

lzp(0, L, L).
lzp(1, [0 | T], L) :-
  lzp(T, L).

Now, let’s count the number of time we found a 0 prefix in P. clpfd here can be useful to avoid having an accumulator:

lzp([H | T], L, N) :-
  (H #= 0) #<==> Order,
  (N #> 0) #<==> Order,
  lzp(Order, [H | T], L, N).
lzp(0, L, L, 0).
lzp(1, [0 | T], L, N) :-
  N1 #= N - 1,
  lzp(T, L, N1).

Just keep adding elements of 0 to the front, until the length is reached (or to infinity).

I don’t see clpfd actually helping here.

First I assume from specification that if N=0 the padded list is the same as the unpadded one, i.e., L=P. Otherwise, N must be a positive integer or padding will fail. It follows that an undefined N is the same as the no-padding case. Following up on @brebs suggestion, how about:

lzp_rel(L,L,0).           % if N=0, padded = unpadded
lzp_rel(L,P,N) :- 
	integer(N), N>0,      % imposed conditions on N
	lzp_rel_(L,P,N).
	
lzp_rel_(L,P,N) :-
	P=[0|L],              % condition 1 - P starts with 0 
	length(P,N),          % condition 1 & 2 
	L \= [0|_].           % condition 3
lzp_rel_(L,[0|P1],N) :-   % first clause not true, add another 0 to the padding
	N > 0, N1 is N-1,
	lzp_rel_(L,P1,N1).    % true if remaining padding has adjusted length

Now:

?- lzp_rel([1,2],L,4).
L = [0, 0, 1, 2] ;
false.

?- lzp_rel(L,[0,0,1,2],4).
L = [1, 2] ;
false.

?- lzp_rel(L,P,4).
L = [],
P = [0, 0, 0, 0] ;
false.

?- lzp_rel(L,P,0).
L = P ;
false.

?- lzp_rel(L,P,N).
L = P,
N = 0 ;
false.

If the superfluous choicepoint bothers you, you can add a green cut to the first clause or manage it in the call, e.g.,

?- once(lzp_rel(L,[0,0,1,2],4)).
L = [1, 2].

?- lzp_rel([1,2],L,4) -> true.
L = [0, 0, 1, 2].

Note that without a green cut, the clause ordering doesn’t semantically matter, but if you add the cut it must appear in the first clause (whichever it may be).

Constraints (e.g., clpfd) can be helpful in restoring logical behaviour to arithmetic, but in this case, given you accept the conditions on N, there doesn’t seem to be any illogical behaviour to restore - indeed there’s very little arithmetic.

1 Like

Can tame a list using e.g.:

len_between(Min, Max, Lst) :-
    Len is Max - Min,
    len_list_tail(Min, Lst, Tail),
    max_len_list(Len, Tail).

len_list_tail(Len, Lst, Tail) :-
    len_list_tail_(Lst, Len, Tail).

len_list_tail_(Tail, 0, Tail) :- !.
len_list_tail_(Lst, Len, Tail) :-
    ( var(Len) -> true ; Len > 0 ),
    Len0 is Len - 1,
    len_list_tail_(Lst, Len0, [_|Tail]).

max_len_list(MaxLen, Lst) :-
    MaxLen >= 0,
    max_len_list_([], Lst, MaxLen).
    
max_len_list_(Lst, Lst, MaxLen) :-
    ( MaxLen == 0 -> ! ; true ).
max_len_list_(Upto, Lst, MaxLen) :-
    MaxLen >= 1,
    MaxLen0 is MaxLen - 1,
    max_len_list_([_|Upto], Lst, MaxLen0).

Can then use as:

?- len_list_tail(3, L, T).
L = [_, _, _|T].

?- max_len_list(2, L).
L = [] ;
L = [_] ;
L = [_, _].

Putting the two together:

?- len_between(3, 5, L).
L = [_, _, _] ;
L = [_, _, _, _] ;
L = [_, _, _, _, _].

Why couldn’t one use between/3 and length/2 for this? Like,

?- between(3, 5, N), length(L, N).

?

Can do. But where’s the fun in that? :grinning:

I don’t really understand how you reached this conclusion ?
Isn’t a more natural/classical interpretation that an undefined N describe all possible padded lists ?

Note that I am not saying that you are wrong or anything. Just asking if you could explain a bit more the reason why you took that decision ?

By the way, I forgot to add some examples of use for my implementation.

Here is the most general query:

?- lzp(P, L, N).
P = L, L = [_A|_],
N = 0,
_A in inf.. -1\/1..sup ;
P = [0, _A|_B],
L = [_A|_B],
N = 1,
_A in inf.. -1\/1..sup ;
P = [0, 0, _A|_B],
L = [_A|_B],
N = 2,
_A in inf.. -1\/1..sup ;
...

So by default, we generate longer and longer padding on a partial list with at least one element which should be different to 0.

Here, is a query with a fixed padding length:

?- lzp(P, L, 3).
P = [0, 0, 0, _A|_B],
L = [_A|_B],
_A in inf.. -1\/1..sup.

Notice the determinism of the solution here, given by reification of the condition on N: (N #> 0) #<==> Order.

Here is a query with a fixed padded list:

?- lzp([0, 0, 0, 1, 2, 3], L, N).
L = [1, 2, 3],
N = 3.

Again, note the determinism, given by the reification of the condition on P: (H #= 0) #<==> Order.

Finally, a query with a constrained N between 3 and 6:

?- N in 3..6, lzp(P, [1, 2, 3], N).
N = 3,
P = [0, 0, 0, 1, 2, 3] ;
N = 4,
P = [0, 0, 0, 0, 1, 2, 3] ;
N = 5,
P = [0, 0, 0, 0, 0, 1, 2, 3] ;
N = 6,
P = [0, 0, 0, 0, 0, 0, 1, 2, 3].

Again, without a superfluous choice point at the end.

I believe this implementation has multiple advantages:

  • fully pure, meaning it is very easy to debug declaratively
  • fully deterministic under all possible conditions (which is important for performance)
  • very compact definition by using constraints library like clpfd

This implementation has also disadvantages:

  • reification like #<==> is very slow, so bad for performance. one solution would be to use reified if_ library, but I don’t know it will be faster…
  • this implementation chains clpfd variables to compute N, which is going to be extremely slow for large N. I think it should be possible to change this to a grounded accumulator with basic prolog arithmetic.

Another aspect of the implementation, which is either bad or good depending on usage, is that the implementation is fully eager. This means that a non grounded N will lead to the eager generation of all possible values of N and instantiation of the padding in P.
I suppose a lazy version could be done if anyone is interested ?

Totally arbitrary decision on my part. The OP didn’t specify what undefined N meant so I gave it one based on:

but your suggestion/implementation, or e.g., failure/error is equally valid as long as it’s included in the problem specification.