Lazy processing

Say I’m doing some list processing in the usual way:

process([X|Xs], Acc, Final) :-
   do_work(X,Y), !, process(Xs, [Y|Acc], Final).
process([], Final, Final).

I want to put together a chain like this:

process(In, [], Out1),
process(Out1, [], Out2),
process(Out2, [], Final),
foldl(do_more_work, Final, 0, Done).

However, I want it all to be lazy. That is, nothing should happen until foldl starts pulling items, and then the work should happen just in time. None of the intermediary lists nor the final list should ever be materialised.

How would you change the process predicate to make this sort of lazy processing possible? I am aware of lazy lists but it is not obvious to me how to convert process to use them.

Please assume process outputs items in the same order as the input (although that is not shown).

You can use the ISO core standard call/n:

% random(-Number, -Lazy)
random(X, random) :- random_between(1,100,X).

% add(+Lazy, +Number, -Number, -Lazy)
add(L, A, B, add(J, A)) :-
   call(L, H, J),
   B is H+A.

% mul(+Lazy, +Number, -Number, -Lazy)
mul(L, A, B, mul(J, A)) :-
   call(L, H, J),
   B is H*A.

% take(+Integer, +Lazy, -List)
take(0, _, []).
take(N, L, [H|T]) :- N > 0,
   call(L, H, J),
   M is N-1,
   take(M, J, T).

Example queries:

?- take(8, random, X).
X = [83, 39, 28, 95, 18, 72, 16, 42] .

?- take(8, mul(random,2), X).
X = [70, 190, 68, 6, 100, 194, 28, 32] .

?- take(8, add(mul(random,2),100), X).
X = [260, 298, 282, 166, 184, 248, 206, 120] .
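To connect this back to the original question: a per-element worker like do_work/2 can be lifted into the same protocol with a generic mapper (a sketch; map/4 is my own name, not part of the library above):

```prolog
% map(+Closure, +Lazy, -Term, -Lazy)
% Yields one element from the underlying lazy source,
% applies Closure to it, and returns the next lazy state.
map(F, L, Y, map(F, J)) :-
   call(L, X, J),
   call(F, X, Y).
```

Chaining three stages is then take(N, map(do_work, map(do_work, map(do_work, Source))), Out); no intermediate list is ever built, each element flows through all three stages as take/3 pulls it.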

The problem there is that the source (random) is not a list, and the outputs do not chain lists. Both of these things are an issue for my use case.

Here is something that seems to work for me:

do_work([], [], 0) :- false, !.
do_work([X|Xs], Xs, Final) :- Final is X + 1.

test(Total) :-
    lazy_list(do_work, [1,2,3,4], L1),
    lazy_list(do_work, L1, L2),
    lazy_list(do_work, L2, L3),
    foldl([X,V0,V]>>(V is V0+X), L3, 0, Total).

It is an example of piping do_work into itself via list 3 times, and then using foldl to sum up the final output.

You can embed a finite list, turn it lazy:

% lazy(+List, -Term, -Lazy)
lazy([H|T], H, lazy(T)).

Here is a query:

?- take(6, lazy([a,b,c,d,e,f]), X).
X = [a, b, c, d, e, f] .

Last thing I need is to be able to chain things which do not return 1-for-1 lists. For example, say at the top I insert [1,2,3] and the next process creates [4,5,6,7,8,9] from that input. I don't see an obvious way to do that with lazy lists or otherwise.

A lazy list is free to do whatever it wants to as long as it adheres to the protocol. The protocol I showed is a little weak, since it cannot detect the end of a list. It is only call(Lazy, Head, TailLazy), basically assuming infinite lists. But you can already do funny stuff with it, like duplicate each element in place:

% dup(+Lazy, -Term, -Lazy)
dup(L, H, dup2(H, J)) :-
   call(L, H, J).

% dup2(+Term, +Lazy, -Term, -Lazy)
dup2(H, J, H, dup(J)).

So the lazy list can have state besides mapping elements:

?- take(6, dup(lazy([a,b,c])), X).
X = [a, a, b, b, c, c] .

Chaining f1 |> f2 is written as f2(f1). So the last processor
is the outer functor of the argument to take/3, and the first
processor is the inner functor of the argument.
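Concretely, combining the dup/3, add/4 and lazy/3 combinators defined above, the pipeline lazy |> dup |> add 10 reads inside-out:

```prolog
?- take(6, add(dup(lazy([1,2,3])), 10), X).
X = [11, 11, 12, 12, 13, 13].
```

Here dup/3 stretches the three-element source to six elements, and add/4 consumes them one at a time, so the chaining need not be 1-for-1.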

I’ve got what I need working with lazy_list, except it’s useless for the purpose since it’s about 20x slower than a fully materialized version. I had the same experience when using lazy_findall/3.

Sorry about the long reply :smiley:

The way you show it is slightly unusual. It is impossible to tell without a concrete example, of course, so I am only guessing; still, the usual way would be more like:

process([], []).
process([X|Xs], [Y|Ys]) :-
    do_work(X,Y),
    process(Xs, Ys).

… which of course is just a maplist:

maplist(do_work, Xs, Ys).

(And I mean, the two are semantically equivalent, and if you have library(apply_macros), they are also identical).

There could be reasons why this is not good enough, for example

  • your do_work/2 leaves choice points, but you know for certain that you don’t need them;
  • you need to reverse the order;
  • something else?

By reversing the order, here is a silly example to show what I mean:

% your original definition
process([X|Xs], Acc, Final) :-
   do_work(X,Y), !, process(Xs, [Y|Acc], Final).
process([], Final, Final).

do_work(X, Y) :- char_code(X, Y).

and now:

?- process([a,b,c], [], Ys).
Ys = [99, 98, 97].

In comparison:

?- maplist(char_code, [a,b,c], Ys).
Ys = [97, 98, 99].

You should play around and see that reversing the order of the clauses makes no difference; and that in the specific case of char_code/2, removing the cut makes no difference. For a non-deterministic predicate “do_work” it will of course make a difference: you will get all possible combinations. Here is a silly example just to demonstrate:

?- maplist(between(0), [1,2], Ys).
Ys = [0, 0] ;
Ys = [0, 1] ;
Ys = [0, 2] ;
Ys = [1, 0] ;
Ys = [1, 1] ;
Ys = [1, 2].

About lazy evaluation. The Prolog alternative to lazy evaluation is backtracking. You can get very far with it. The huge advantage is that it is very fast and needs far less memory; you never need to materialize any lists. The huge disadvantage is that you lose solutions on backtracking. If you can provide a concrete example, it might be easier to judge if you need lists or lazy lists.
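As a sketch of that backtracking style (stage/2 here is a stand-in for do_work/2, not a predicate from this thread), the three-stage pipeline plus fold becomes a single aggregation over a non-deterministic generator:

```prolog
% stage(+X0, -X): stand-in worker, increments its input.
stage(X0, X) :- X is X0 + 1.

% pipeline_sum(-Total): run each element through three stages,
% summing the results; no intermediate list is ever built.
pipeline_sum(Total) :-
    aggregate_all(sum(X3),
                  ( member(X0, [1, 2, 3, 4]),
                    stage(X0, X1),
                    stage(X1, X2),
                    stage(X2, X3)
                  ),
                  Total).
```

Each element flows through all three stages before backtracking generates the next; ?- pipeline_sum(T). gives T = 22, since [1,2,3,4] maps to [4,5,6,7].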

Same with lazy lists, if you don’t materialize. Like if you don’t use lazy/3 or take/3: lazy/3 provides a materialized input, and take/3 provides a materialized output. For example one could define:

% thaw(+Integer, +Lazy)
thaw(0, _).
thaw(N, L) :- N > 0,
   call(L, _, J),
   M is N-1,
   thaw(M, J).

Then this here:

?- thaw(8, add(mul(random,2),100)).
true.

uses the same memory as this:

?- between(1,8,_), random_between(1,100,X), 
Y is X*2+100, fail; true.
true.

But lazy is usually faster in Haskell, because it’s natively supported. In SWI-Prolog the overhead looks like this, only ca. 50% slower:

?- time(thaw(1000000, add(mul(random,2),100))).
% 9,999,999 inferences, 0.969 CPU in 0.985 seconds (98% CPU, 10322580 Lips)
true .

?- time((between(1,1000000,_), random_between(1,100,X), 
Y is X*2+100, fail; true)).
% 4,999,998 inferences, 0.656 CPU in 0.650 seconds (101% CPU, 7619045 Lips)
true.

But maybe it is an artifact of the meta-goal argument to the time/1 built-in that the backtracking version isn’t faster still?

I assume you know that SWI-Prolog lazy lists are implemented as constraints using attributed variables, and that unification of attributed variables is going to be orders of magnitude slower than unification that doesn’t involve attributed variables.

To make it concrete, here is the attr_unify_hook/2 that gets invoked whenever a variable with a lazy_list attribute is unified:

attr_unify_hook(State, Value) :-
    State = lazy_list(Next, Read),
    (   var(Read)
    ->  call(Next, NewList, Tail),
        (   Tail == []
        ->  nb_setarg(2, State, NewList)
        ;   put_attr(Tail, lazy_lists, lazy_list(dummy, _)),  % See (*)
            nb_setarg(2, State, NewList),
            arg(2, State, NewListCP),
            '$skip_list'(_, NewListCP, TailCP),
            get_attr(TailCP, lazy_lists, LazyList),
            nb_linkarg(1, LazyList, Next)
        ),
        arg(2, State, Value)
    ;   Value = Read
    ).

Bottomline: if your focus includes performance, you should understand the tradeoffs involved in using lazy evaluation (i.e. constraints).


Note that several of the predicates involving lazy lists have a chunk size argument or option. That reduces the number of wakeups. Still, there is quite a bit of work involved in extending the lazy list.
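For example, SWI-Prolog's lazy_findall/4 takes the chunk size as its first argument, so each wakeup of the attributed variable materializes a whole chunk rather than a single element (a sketch; check the library(lazy_lists) documentation for your version):

```prolog
?- use_module(library(lazy_lists)).

% Materializes 1000 solutions per wakeup instead of one,
% even though only the first three are asked for here.
?- lazy_findall(1000, X, between(1, inf, X), L),
   L = [A, B, C|_].
```

Larger chunks trade memory for fewer hook invocations, which directly addresses the 20x overhead reported above.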

Thanks, no I didn’t know any of those things, but it all makes sense now.

Here is an example of me trying to use lazy lists to solve an AoC2024 problem. Ironically, the memory blows up at the default limit; I’ve no idea why.

% 353,135,695 inferences, 32.161 CPU in 32.171 seconds (100% CPU, 10980304 Lips)
ERROR: Stack limit (1.0Gb) exceeded
ERROR:   Stack sizes: local: 2Kb, global: 1.0Gb, trail: 78.1Mb
ERROR:   Stack depth: 1,967, last-call: 99%, Choice points: 7
ERROR:   In:
ERROR:     [1,967] '$attvar':uhook(lazy_lists, <compound lazy_list/2>, [length:1|_2157958])
ERROR:     [1,966] '$attvar':call_all_attr_uhooks(<compound att/3>, [length:1|_2157996])
ERROR:     [1,965] '$attvar':'$wakeup'(<compound wakeup/3>)
ERROR:     [1,964] lists:append([length:1,546], _2158052, [length:1|_2158066])
ERROR:     [1,513] user:tree([length:451|_2158100], 1996, _2158094)
ERROR: 
ERROR: Use the --stack_limit=size[KMG] command line option or
ERROR: ?- set_prolog_flag(stack_limit, 2_147_483_648). to double the limit.

Lazy lists via attributed variables, aka constraints, as in the corresponding lazy list library from SWI-Prolog, are different from the direct lazy lists I provided. They do two things, namely a) lazy evaluation (call-by-name) and b) memoization, not only the one thing, lazy evaluation. This means they are more memory hungry and require smarter garbage collection, and end-user Prolog code that leaves no choice points blocking garbage collection. That means procedural Prolog code with cuts and less declarative Prolog code, in case one really wants to minimize the memory footprint. Also, I don’t know whether the library is developed and tested well enough that it doesn’t leave garbage.

The difference is seen here.

/* direct lazy lists, no memoization */
?- Lazy = random, take(3, random, X), take(3, random, Y).
Lazy = random,
X = [51, 76, 72],
Y = [73, 26, 21] .

/* SWI-Prolog lazy lists, memoization */
?- lazy_list(random2, Lazy), take2(3, X, Lazy, _), take2(3, Y, Lazy, _).
Lazy = [15, 54, 30|_A],
X = Y, Y = [15, 54, 30],
lazy_list(user:random2, _A) .

The code to test SWI-Prolog lazy lists:

% random2(-List, +List)
random2 --> {random_between(1,100,X)}, [X].

% take2(+Integer, -List, +List)
take2(0, []) --> [].
take2(N, [X|L]) --> {N>0}, [X], {M is N-1}, take2(M, L).

Haskell does the same for its delayed expressions, namely lazy evaluation and memoization, the latter to avoid repeated and hence costly re-evaluation of the delayed expression. The memoization is garbage collected when the delayed expression becomes unreachable.

The lazy_list abstraction is practical. It’d be great if you could write something using a similar contract which can be used to pipe lists into processes and processes into each other via lists.

I’m not really sure how memoisation fits into all this, but I think it’s strange that a lazy list ends up exploding the memory due to overhead.
