Fold over answers

Boris · December 29, 2024, 7:11am

I think I missed the real question by @emiruz. The foldl at the end of their question: in library(aggregate) there are templates, but you cannot provide your own aggregation operation. I think @j4n_bur53 showed something in another thread but I can no longer find it. What would be a possible implementation for a fold for a backtrackable goal?

The idea is just to provide the aggregation loop though a meta-predicate. The first argument is an aggregation operation that is applied in turn to the solutions of the backtrackable goal. Here is a naive attempt:

foldl_bt(Op, Goal, V0, V) :-
    State = state(V0),
    (  call(Goal),
           arg(1, State, S0),
           call(Op, S0, S),
           nb_setarg(1, State, S),
           fail
    ;  arg(1, State, V)
    ).

This seems to work. If I define:

sum(A, B, S) :- S is A + B.

then:

?- aggregate_all(sum(X), between(1, 3, X), Sum).
Sum = 6.

?- foldl_bt(sum(X), between(1, 3, X), 0, Sum).
Sum = 6.

?- foldl_bt(sum(1), between(1, 3, X), 0, Count).
Count = 3.

It also nicely reverses concatenation like the original foldl.

?- foldl(atom_concat, [a,b,c], '', L).
L = cba.

?- foldl_bt(atom_concat(X), (X=a ; X=b ; X=c), '', L).
L = cba.

What am I missing?

PS: Back to the original question

Lazy processing

I want to put together a chain like this:
process(In, [], Out1),
process(Out1, [], Out2),
process(Out2, [], Final),
foldl(do_more_work, Final, 0, Done).
However, I want it all to be lazy. That is, nothing should happen until foldl starts pulling items, and then the work should happen just in time. None of the intermediary lists nor the final list should ever be materialised.

Depending on the processing that is involved, this could look roughly like this:

foldl_bt(aggregation_op(X),
    (   foo(A, B),
        bar(B, C),
        baz(C, X)
    ), 0, Result).

jan · December 29, 2024, 8:45am

Nothing in particular. As is, it is even quite competitive in speed:

?- time(aggregate_all(sum(X), between(1, 1 000 000, X), Sum)).
% 3,000,001 inferences, 0.145 CPU in 0.146 seconds (100% CPU, 20656958 Lips)
Sum = 500_000_500_000.

?- time(foldl_bt(sum(X), between(1, 1 000 000, X), 0, Sum)).
% 4,000,000 inferences, 0.196 CPU in 0.197 seconds (100% CPU, 20414091 Lips)
Sum = 500_000_500_000.

This though, is due to the fact that the state is a small integer, which makes nb_setarg/3 fast … Well, it is not so bad. Consider

sc(X, C0-S0, C-S) :-
    C is C0+1,
    S is S0+X.

And now We should recall that aggregate_all/3 using sum(X) or count uses a state-based implementation, but the count-sum(X) (currently) uses findall/3 and aggregation. But, still this is not bad at all.

?- time(foldl_bt(sc(X), between(1, 1 000 000, X), 0-0, SumCount)).
% 3,000,002 inferences, 0.260 CPU in 0.261 seconds (100% CPU, 11541819 Lips)
SumCount = 1_000_000-500_000_500_000.

?- ?- time(aggregate_all(count-sum(X), between(1, 1 000 000, X), SumCount)).
% 14,004,607 inferences, 0.575 CPU in 0.578 seconds (99% CPU, 24354554 Lips)
SumCount = 1_000_000-500_000_500_000.

Profiling shows the picture below.

I am in favour of adding this to the library. Now the name. foldl_bt/4 is surely not good. “fold” must be part of it. The “l” is for left, which is not relevant here. Maybe just “foldall/4” or “fold_all/4”?

Boris · December 29, 2024, 12:20pm

I am happy that this is useful! I have been thinking about it for a while but never needed it so some credit goes to @emiruz for asking.

I don’t have any way to make a judgement. After reading the Wikipedia article on the topic, if this is a fold at all, it is still a left fold:

On lists, there are two obvious ways to carry this out: either by combining the first element with the result of recursively combining the rest (called a right fold), or by combining the result of recursively combining all elements but the last one, with the last element (called a left fold). This corresponds to a binary operator being either right-associative or left-associative, in Haskell’s or Prolog’s terminology. With a right fold, the sum would be parenthesized as 1 + (2 + (3 + (4 + 5))), whereas with a left fold it would be parenthesized as (((1 + 2) + 3) + 4) + 5.

But I am certain that others here have deeper understanding of the field.

If it is part of library(aggregate), yet another option would be aggregate_fold, similar to re_fold? But really, both foldall and fold_all are good enough, and more importantly, shorter names.

jan · December 29, 2024, 1:15pm

Right.

I’m not a fan of that. Folding is a form of aggregation, I’d say. Other candidates might be fold_answers/4?

Where is yet another issue. library(aggregate) comes to mind. It is a de-facto standard library though, so adding things is less obvious. On the other hand it already contains some only vaguely related predicates such as foreach/2. If this is the target library, fold_all/4 is the name that fits best (IMO).

CapelliC · December 29, 2024, 1:34pm

Maybe library(solution_sequences) could be an appropriate place

emiruz · December 29, 2024, 2:05pm

This is quite useful; thank you. I can see now how backtracking obviates lazy lists for my use cases. I also think that a fold over a Goal is badly needed in the standard library. It has occurred many times that I have to do findall first and then feed it to foldl, or in some of my AoC code, I’ve use lazy_findall + foldl to do roughly what you’ve done above.

jan · December 29, 2024, 8:01pm

Some more good news. In my internal version I enhanced nb_setarg/3 to avoid re-copying the old value. That allows for implementing findall/3 as

join(X, L, [X|L]).

my_findall(T, G, L) :-
    foldl_bt(join(T), G, [], L0),
    reverse(L0, L).

Tested over between(1,1_000_000,X), the performance of this is roughly less than twice as slow as native findall/3. I expect it already outperforms findall/3 on a low number of answers (not tested). Timing is less reproducible as GC is involved.

These results are so good that one starts wondering adding this as foldall/4 to the core?

j4n_bur53 · December 29, 2024, 9:07pm

jan:

?- time(aggregate_all(count-sum(X), between(1, 1 000 000, X), SumCount)).
% 14,004,607 inferences, 0.575 CPU in 0.578 seconds (99% CPU, 24354554 Lips)
SumCount = 1_000_000-500_000_500_000.

The problem is that aggregate_all/3 doesn’t run always where possible
in constant memory, making it slow. You can try this test case:

/* SWI-Prolog 9.3.16 */
?- aggregate_all((count,sum(X)), between(1, 10000000, X), CountSum).
ERROR: Stack limit (1.0Gb) exceeded

But its relatively easy to re-implement aggregate_all/3 to allow
constant memory also for composite aggregates:

/* Dogelog Player 1.2.5 */
?- aggregate_all((count,sum(X)), between(1, 10000000, X), CountSum).
CountSum = (10000000, 50000005000000).

I still do not give up the hope that there are competent people around
who can make a change and then a pull request.

Edit 29.12.2024
fold_bt/4 with sc/3 has the disadvantage that you do
re-create a result cell for the pair C-S on each iteration.

But why not make a pair value holder so that sys_aggr_next/2
doesn’t create a new result cell each time? Like here:

sys_aggr_init(count, X) :- X = v(_), change_arg(1, X, 0).
sys_aggr_init(sum(_), X) :- X = v(_), change_arg(1, X, 0).
sys_aggr_init((F,G), Z) :- Z = (_,_),   %%% make pair value holder
   sys_aggr_init(F, X), change_arg(1, Z, X),
   sys_aggr_init(G, Y), change_arg(2, Z, Y).

sys_aggr_next(count, Y) :- arg(1, Y, H), J is H+1, change_arg(1, Y, J).
sys_aggr_next(sum(X), Y) :- arg(1, Y, H), J is H+X, change_arg(1, Y, J).
sys_aggr_next((F,G), (X,Y)) :-
   sys_aggr_next(F, X),
   sys_aggr_next(G, Y).

sys_aggr_fini(count, X, A) :- arg(1, X, A).
sys_aggr_fini(sum(_), X, A) :- arg(1, X, A).
sys_aggr_fini((F,G), (X,Y), (C,D)) :-
   sys_aggr_fini(F, X, C),
   sys_aggr_fini(G, Y, D).

And then simply:

aggregate_all(A, G, S) :-
   sys_aggr_init(A, T),
   (G, sys_aggr_next(A, T), fail; true),
   sys_aggr_fini(A, T, S).

For SWI-Prolog replace change_arg/3 by nb_linkarg/3.

jan · December 29, 2024, 9:43pm

I know. I feared performance would be a show-stopper. It turns out that the slowdown from this is quite bearable though. Together with the simple definition and quite clear semantics I think this turns this into a pretty nice building block. How often do we not see findall/3 with some aggregation over the list? Some of this is covered by aggregate_all/3, but “folding” the answers seems way more flexible.

what stops you?

Boris · December 30, 2024, 6:58am

j4n_bur53:

But its relatively easy to re-implement aggregate_all/3 to allow
constant memory also for composite aggregates:
/* Dogelog Player 1.2.5 */
?- aggregate_all((count,sum(X)), between(1, 10000000, X), CountSum).
CountSum = (10000000, 50000005000000).

Yes, this was the post that I could no longer find, thank you for re-posting it. Is this as general as allowing the client code to provide an aggregation operation, or is it about allowing composite aggregates for existing templates?

Topic		Replies	Views
Generalizing aggregate/3 by adding foldl semantics Discussion discussion	0	242	June 23, 2024
Can we use foldl with bool values? General	22	557	August 31, 2023
Fun exercise: Transform foldr into pseudo-foldl using CLP(FD) (only for integer arithmetic of course) Nice to know	0	313	April 21, 2020
Use of findall for aggregate_all Predicate	1	1095	April 10, 2020
Aggregation Help!	27	1172	May 6, 2020

Fold over answers

Related topics