Multiple min and max for library(aggregate)?

Boris · February 22, 2021, 9:43am

Currently, there is only one solution for min and max in library(aggregate), regardless if we use aggregate/3 or aggregate_all/3. (or the 4-argument version with the discriminator). Here is what I mean:

foo(1, a).
foo(2, b).
foo(3, c).
foo(3, a).

Then:

?- aggregate_all(max(A, B), foo(A, B), Max).
Max = max(3, c).

I don’t see any way to collect both values in the second argument associated with 3, the maximum of the first argument.

I assume this is a conscious decision when developing the library. Still, how would I do it? The best at the moment (short of writing my own aggregation) is to first find the max, then findall using the value:

?- aggregate_all(max(A), foo(A, _), Max), findall(B, foo(Max, B), Bs).
Max = 3,
Bs = [c, a].

This is fine, but not ideal?

I would have proposed an extension to library(aggregate); for what I see, it means adding a case for new value equal to the min or max, for example here in the aggregate_all definition.

But I cannot figure out a clean interface for this that doesn’t break the current behaviour.

So how would I do this? The silent assumption is that if I have a backtrackable predicate, I don’t necessarily want to collect everything to a list. The best I can come up with is write my own aggregation, using the code in library(aggregate) as an inspiration.

Boris · February 22, 2021, 10:03am

I mean something like this:

my_aggregate_all(max(X,W), Goal, max(Max,Witness)) :-
    !,
    State = state(false, _Max, _Witness),
    (  call(Goal),
           (   State = state(true, Max0, Witness0)
           ->  (   X > Max0
               ->  nb_setarg(2, State, X),
                   nb_setarg(3, State, [W])
               ;   X =:= Max0
               ->  nb_setarg(3, State, [W|Witness0])
               )
           ;   number(X)
           ->  nb_setarg(1, State, true),
               nb_setarg(2, State, X),
               nb_setarg(3, State, [W])
           ;   type_error(number, X)
           ),
           fail
    ;  State = state(true, Max, Witness)
    ).

This obviously reverses the order of the solutions, but it was just to demonstrate what I mean.

jan · February 22, 2021, 10:05am

The most realistic option I can see quickly is to add a new template, say allmax(Max, Witnesses) where Witnesses is a list of all elements sharing the same max. I have little clue how useful that is in general. Seems there are cases where it makes sense. Did you check the latest SICStus docs? This library is a direct reimplementation of the SICStus lib. If they extended the API we should copy this.

Note that when enumerating from facts the second findall is probably barely slower than extending the library. If the generator is slow and does not do the required indexing this is of course not a good idea.

Boris · February 22, 2021, 10:18am

I see nothing in the SICStus docs that is different from what SWI has. Here: lib-aggregate (SICStus Prolog)

The reason why I am asking is that I “discovered” that backtracking over a (large) file is both fast and easy to write with library(aggregate), and runs in constant memory. The other option is to use something like phrase_from_file/2 but then I need to do the “max” myself.

The more theoretical objection is that for max(X), there is one solution, so current behavior fine; but for max(X, Witness), there can be more than one Witness, and only showing the first one is actually hiding information.

jan · February 22, 2021, 10:22am

Something like that can work. As suggested, I’d use a different template. Also the way you collect will result in quadratic behavior as nb_setarg/3 copies the argument that gets bigger and bigger. That is nasty to avoid. It is doable though. I once write a findall/3 alternative using this technique call findfew/3 as it is faster if the goal only a few solutions, but gets slower if there are many. Here is the code:

:- meta_predicate
	find_few(?, 0, -).

find_few(T, G, L) :-
	L0 = [dummy|_],
	Result = list(L0),
	(   call(G),
	    duplicate_term(T, T2),
	    NewLastCell = [T2|_],
	    arg(1, Result, LastCell),
	    nb_linkarg(2, LastCell, NewLastCell),
	    nb_linkarg(1, Result, NewLastCell),
	    fail
	;   arg(1, Result, [_]),
	    L0 = [_|L]
	).

jan · February 27, 2021, 8:21am

That would roughly create foldl/4 over a solution set. Well, we can do that:

?- lazy_findall(1000, X, between(1, 1 000 000, X), List),
   foldl(plus, List, 0, Sum).
List = [1, 2, 3, 4, 5, 6, 7, 8, 9|...],
Sum = 500000500000 .

As the list is exposed in the toplevel it is fully materialized. If you use this inside a predicate GC will keep the list short. Timing is not that good. 0.6 sec vs 0.2 for aggregate_all/3. Surely one could gain a bit by improving the implementation. Still, it implies term copying to materialize the list and GC to keep the memory low.

We could have something like this, providing the aggregation predicate, the initial value and the variable(s) it should aggregate on.

?- aggregate_all(plus, X, between(1, 1 000 000, X), 0, Sum).

Ataraxia · December 4, 2021, 2:03am

I’m bumping this because I totally ran aground here today.

I had a series of aggregations in mind that made total logical sense, but I wasn’t able to implement it because there’s no way to set a choice point only if you have multiple maximum values.

Not only was what I wanted to do not possible, but I couldn’t understand why—it just makes total sense to have this allmax functionality! So I spent at least an hour banging my head against a wall.

allmax(Ct) would totally do it for me. I’m begging you! (Or maybe I should just have a poke in the source and try and implement it myself? Would be a good challenge )

Topic		Replies	Views
`nb_setarg` used to get the max number - Reply 1 General	7	371	July 12, 2022
Pandas Aggregation in Prolog General	10	255	November 10, 2024
Generalizing aggregate/3 by adding foldl semantics Discussion discussion	0	240	June 23, 2024
Use of findall for aggregate_all Predicate	1	1092	April 10, 2020
Library(aggregate): Weird stuff! Help! how-to	3	2755	April 26, 2020

Multiple min and max for library(aggregate)?

Related topics