Trying to understand solution_sequences:group_by/4

Are there any examples of using group_by/4? From the test case, it appears that it’s intended to be wrapped in a “find-all” aggregator:

data(1, a, a1).
data(1, a, ax).
data(1, b, a2).
data(2, a, n1).
data(2, b, n0).
data(2, b, n2).

data2(A, B) :- data(A, B, _).

% The following all give the same results: [1-[a, a, b], 2-[a, b, b]].

group1(Groups) :- findall(A-Group, group_by(A, B, data(A,B,_), Group), Groups).

group2(Groups) :- bagof(A-Group, bagof(B, data2(A,B), Group), Groups).

group3(Groups) :- bagof(A-Group, bagof(B, C^data(A,B,C), Group), Groups).

In other words, group_by/4 is like findall/3 (all free variables other than By are existentially quantified), except that it yields one group per solution on backtracking and fails rather than giving a result of [].
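To make the comparison concrete, here is a small sketch that reuses the data/3 facts from above. The predicate names groups/1, no_rows_findall/1 and no_rows_group_by/1 are made up for illustration, and key 3 is just a value that has no rows.

```prolog
:- use_module(library(solution_sequences)).

% Facts from the question.
data(1, a, a1).
data(1, a, ax).
data(1, b, a2).
data(2, a, n1).
data(2, b, n0).
data(2, b, n2).

% Each solution of group_by/4 binds A to one key and Group to its bag;
% findall/3 collects those solutions into a list of Key-Bag pairs.
groups(Groups) :-
    findall(A-Group, group_by(A, B, data(A, B, _), Group), Groups).

% No rows have key 3: findall/3 still succeeds, with an empty list.
no_rows_findall(Bs) :-
    findall(B, data(3, B, _), Bs).

% The same query through group_by/4 simply fails.
no_rows_group_by(Bag) :-
    group_by(A, B, (A = 3, data(A, B, _)), Bag).
```

Calling groups(Gs) should give the [1-[a, a, b], 2-[a, b, b]] result quoted above, no_rows_findall(Bs) gives Bs = [], and no_rows_group_by(Bag) fails.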

I think the docs are quite accurate. It is basically bagof/3 with the specification reversed: instead of naming the existential variables, you name the free ones.

I just had my “Ah-ha!” – group_by/4 and the other solution_sequences predicates are all intended to run inside an aggregation (with the others, it was obvious; I was confused by pairs:group_pairs_by_key/2, which transforms lists).

Also, bagof/3 is slightly different: it doesn’t have the By argument, and it took some staring at the definition of group_by/4 to understand what’s going on there. I can’t think of a situation where By would have free variables in it or be different from the outer aggregator’s template…

Whether you run them inside an aggregation is up to you. These predicates are intended to alter the solution sequence generated by a non-deterministic predicate. This allows you to specify new non-deterministic predicates as simple logical conjunctions and disjunctions of already defined primitives, even if intermediate steps are normally required to (notably) improve efficiency. These predicates also support SWISH, which represents solutions in a table, much like SQL clients do.

For example, suppose we have a/1 and b/1 and want to get the solutions for a(X),b(X). Now suppose a/1 is cheap but produces a lot of duplicate answers and b/1 is expensive. Using this library we simply write distinct(a(X)), b(X) and we are done. Without it you get

findall(X,a(X),Xs), sort(Xs,Us), member(X,Us), b(X).

(or setof, but that may have its own problems). Besides being more compact, distinct also produces answers immediately instead of enumerating all answers for a/1. In addition, there is reduced/1 which doesn’t guarantee full uniqueness but runs in limited memory.
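A minimal sketch of the two variants side by side. The a/1 facts and the b/1 definition are hypothetical stand-ins (b/1 is of course not expensive here), and eager/1 and lazy/1 are names invented for the example.

```prolog
:- use_module(library(solution_sequences)).

% Hypothetical stand-ins: a/1 is cheap but repeats answers,
% b/1 plays the role of the expensive test.
a(1). a(2). a(1). a(3). a(2).

b(X) :- X > 1.

% Eager version: collect all answers of a/1, deduplicate, then enumerate.
eager(X) :-
    findall(Y, a(Y), Ys),
    sort(Ys, Us),
    member(X, Us),
    b(X).

% Lazy version: distinct/1 drops duplicates as answers arrive,
% so b/1 can start before a/1 has been fully enumerated.
lazy(X) :-
    distinct(a(X)),
    b(X).
```

With these facts both versions yield X = 2 and then X = 3. In general the order can differ: the eager version enumerates in standard order of terms, while distinct/1 keeps the order in which a/1 first produces each answer.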

These operators combine nicely and allow, for example, processing the best 10 a(X) using b/1 in a concise and (IMO) readable syntax:

limit(10, order_by([desc(X)],a(X))),
b(X).
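Here is a runnable toy version of that pattern, scaled down to the best 3 instead of 10. The a/1 facts, the even-number test used for b/1, and the name top3_even/1 are assumptions made for the example.

```prolog
:- use_module(library(solution_sequences)).

% Hypothetical data, deliberately unordered.
a(3). a(8). a(4). a(1). a(5). a(9). a(2). a(6).

% Hypothetical filter standing in for b/1: keep even numbers.
b(X) :- 0 is X mod 2.

% Take the 3 largest answers of a/1, then pass each one to b/1.
% Note that limit/2 applies before b/1 runs, not after.
top3_even(X) :-
    limit(3, order_by([desc(X)], a(X))),
    b(X).
```

The three largest values are 9, 8 and 6, so top3_even(X) yields X = 8 and then X = 6. If you wanted the three largest even values instead, b(X) would have to move inside the goal passed to order_by/2.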

group_by/4 is pretty handy if you want existential quantification for all variables except a few, notably if variables may be hidden in the arguments you pass to bagof/3. Library yall provides a good alternative, I think as bagof(X, {V1,V2,...}>>Goal, Xs). This predicate was most of all added to complete the common vocabulary to manage rows in database answers.
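The difference in quantification can be seen with the data/3 facts from the start of the thread. This is only a sketch; per_row/1, quantified/1 and grouped/1 are names made up for the comparison.

```prolog
:- use_module(library(solution_sequences)).

% Facts from the question.
data(1, a, a1).
data(1, a, ax).
data(1, b, a2).
data(2, a, n1).
data(2, b, n0).
data(2, b, n2).

% Plain bagof/3: C is free, so there is one bag per distinct A-C pair.
per_row(Bags) :-
    findall(A-C-Bs, bagof(B, data(A, B, C), Bs), Bags).

% bagof/3 with C^: every existential variable must be named explicitly.
quantified(Bags) :-
    findall(A-Bs, bagof(B, C^data(A, B, C), Bs), Bags).

% group_by/4: only the grouping variable A is named; the rest is
% treated as existential, including variables hidden inside the goal.
grouped(Bags) :-
    findall(A-Bs, group_by(A, B, data(A, B, _), Bs), Bags).
```

per_row/1 returns six singleton bags because every A-C combination is unique in this data, while quantified/1 and grouped/1 both return [1-[a, a, b], 2-[a, b, b]].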