Multiple min and max for library(aggregate)?

Currently, there is only one solution for min and max in library(aggregate), regardless if we use aggregate/3 or aggregate_all/3. (or the 4-argument version with the discriminator). Here is what I mean:

foo(1, a).
foo(2, b).
foo(3, c).
foo(3, a).

Then:

?- aggregate_all(max(A, B), foo(A, B), Max).
Max = max(3, c).

I don’t see any way to collect both values in the second argument associated with 3, the maximum of the first argument.

I assume this is a conscious decision when developing the library. Still, how would I do it? The best at the moment (short of writing my own aggregation) is to first find the max, then findall using the value:

?- aggregate_all(max(A), foo(A, _), Max), findall(B, foo(Max, B), Bs).
Max = 3,
Bs = [c, a].

This is fine, but not ideal?

I would have proposed an extension to library(aggregate); for what I see, it means adding a case for new value equal to the min or max, for example here in the aggregate_all definition.

But I cannot figure out a clean interface for this that doesn’t break the current behaviour.

So how would I do this? The silent assumption is that if I have a backtrackable predicate, I don’t necessarily want to collect everything to a list. The best I can come up with is write my own aggregation, using the code in library(aggregate) as an inspiration.

I mean something like this:

my_aggregate_all(max(X,W), Goal, max(Max,Witness)) :-
    !,
    State = state(false, _Max, _Witness),
    (  call(Goal),
           (   State = state(true, Max0, Witness0)
           ->  (   X > Max0
               ->  nb_setarg(2, State, X),
                   nb_setarg(3, State, [W])
               ;   X =:= Max0
               ->  nb_setarg(3, State, [W|Witness0])
               )
           ;   number(X)
           ->  nb_setarg(1, State, true),
               nb_setarg(2, State, X),
               nb_setarg(3, State, [W])
           ;   type_error(number, X)
           ),
           fail
    ;  State = state(true, Max, Witness)
    ).

This obviously reverses the order of the solutions, but it was just to demonstrate what I mean.

The most realistic option I can see quickly is to add a new template, say allmax(Max, Witnesses) where Witnesses is a list of all elements sharing the same max. I have little clue how useful that is in general. Seems there are cases where it makes sense. Did you check the latest SICStus docs? This library is a direct reimplementation of the SICStus lib. If they extended the API we should copy this.

Note that when enumerating from facts the second findall is probably barely slower than extending the library. If the generator is slow and does not do the required indexing this is of course not a good idea.

I see nothing in the SICStus docs that is different from what SWI has. Here: lib-aggregate (SICStus Prolog)

The reason why I am asking is that I “discovered” that backtracking over a (large) file is both fast and easy to write with library(aggregate), and runs in constant memory. The other option is to use something like phrase_from_file/2 but then I need to do the “max” myself.

The more theoretical objection is that for max(X), there is one solution, so current behavior fine; but for max(X, Witness), there can be more than one Witness, and only showing the first one is actually hiding information.

Something like that can work. As suggested, I’d use a different template. Also the way you collect will result in quadratic behavior as nb_setarg/3 copies the argument that gets bigger and bigger. That is nasty to avoid. It is doable though. I once write a findall/3 alternative using this technique call findfew/3 as it is faster if the goal only a few solutions, but gets slower if there are many. Here is the code:

:- meta_predicate
	find_few(?, 0, -).

find_few(T, G, L) :-
	L0 = [dummy|_],
	Result = list(L0),
	(   call(G),
	    duplicate_term(T, T2),
	    NewLastCell = [T2|_],
	    arg(1, Result, LastCell),
	    nb_linkarg(2, LastCell, NewLastCell),
	    nb_linkarg(1, Result, NewLastCell),
	    fail
	;   arg(1, Result, [_]),
	    L0 = [_|L]
	).
2 Likes

Some mode directed tabling have plugable aggregates. SWI-Prolog has for example lattice/2 aggregate that takes a ternary closure. But you can also add plugable aggregates not only to library(tabling), but also to library(aggregate).

Here is an example for a reduce/3 aggregate that takes a initial value and a ternary closure. This works in my system:

maxlist(V-_,W-E,W-[E]) :- W > V, !.
maxlist(V-L,W-E,V-R) :- W =:= V, !, last(R,E,L).
maxlist(V-L,_,V-L).

?- aggregate_all(reduce(0-[],maxlist,X-Y), foo(X,Y), R).
R = 3-[c, a]

The above is without nb_linkarg/3 etc… so it could be improved. Only wanted to show how something common in mode directed tabling can be also applied to aggregates.

The only problem is when you have pre/post processing like arg(1, Result, [_]), L0 = [_|L], etc.. you need possibly more parameters in the plugable aggregate.

That would roughly create foldl/4 over a solution set. Well, we can do that:

?- lazy_findall(1000, X, between(1, 1 000 000, X), List),
   foldl(plus, List, 0, Sum).
List = [1, 2, 3, 4, 5, 6, 7, 8, 9|...],
Sum = 500000500000 .

As the list is exposed in the toplevel it is fully materialized. If you use this inside a predicate GC will keep the list short. Timing is not that good. 0.6 sec vs 0.2 for aggregate_all/3. Surely one could gain a bit by improving the implementation. Still, it implies term copying to materialize the list and GC to keep the memory low.

We could have something like this, providing the aggregation predicate, the initial value and the variable(s) it should aggregate on.

?- aggregate_all(plus, X, between(1, 1 000 000, X), 0, Sum).