Aggregation

I have had a few bad experiences with the scoping rules in the default aggregation meta-predicates, and thought that it might be handy to use yall to use aggregation instead. i.e. the aggregator template is just the single argument to a predicate which acts as a generator.

:- meta_predicate sol_set(1,?).
sol_set(Predicate,Result) :-
    findall(Template,
            apply(Predicate, [Template]),
            Templates),
    sort(Templates, Result).

:- meta_predicate sol_bag(1,?).
sol_bag(Predicate,Result) :-
    findall(Template,
            apply(Predicate, [Template]),
            Result).

However, I also noticed that you can capture the formal parameter list from yall! I had never considered this before for some reason.

I wonder is there any way that this can be overcome with goal-expansion magic?

And if not, I wonder if there is a way to have a fairly conservative expansion of prolog that includes “lambdas” of some form.

Not sure I get this. There are a few remarks possible though. First of all, use call(Pred,Template) instead of apply(Pred, [Template]). apply/2 is deprecated and call/N is ISO (and implemented in the VM to reduce the overhead).

Given proper meta predicate declarations yall is compiled away and creates an intermediate predicate if necessary. I personally do not like lambda’s too much. I consider them harder to read than writing my own helper predicate in most cases (except for simply swapping arguments).

Can you give examples of things that went wrong or looked awkward? Note that the library(solution_sequences) and tabling can sometimes provide good alternatives to the bagof/setof/sort family.

2 Likes

Thanks Jan, I’ve replaced apply with call in the definition now.

My prediliction for lambdas probably comes from spending too much time using functional programming languages, so perhaps not something others are concerned with.

However, in the context of aggregation I think it generally makse sense to have explicit quantification of variables - which are closed over and which are existential. And the syntax here always seemed a bit more akward to me than the yall style.

The fact that templates can be seen as the accumulation of a generator makes it fairly natural to define the aggregators as a higher-order predicate rather than as a special form. And indeed one need not use yall at all and can instead define the predicate separately with a helper.

Supposing I do use yall, the following gives an example of how I might use it to create solutions with a generator.

?- sol_set([X]>>(member(X,[1,2,3])), Results). 
Results = [1, 2, 3].

Yet here we see the capture in the formal parameter list.

?- X = 3, sol_set([X]>>(member(X,[1,2,3])), Results).
X = 3,
Results = [3].

Not a true lambda if the lambda args are not always treated fresh.

I recall we have been at that square on this forum. Maybe @pmoura recalls? Note that in SWI-Prolog, goal expansion does have access to the context term and can (in theory) figure out that the lambda variables are used also in the remainder of the clause. This is currently not used. A downside is also that if for some reason the lambda is not term expanded but called you cannot verify this constraint.

Anyway, I should leave this to Paulo. I have little opinion on or experience with lambdas in Prolog.

Could you expand a bit on the motivation behind sol_set/2 and sol_bag/2?

I read what you wrote:

… but I cannot easily imagine the exact problems you ran into. And by “default aggregation meta-predicates”, do you mean bagof/setof or the predicates in library(aggregate)?

As may be aware, library(yall) implements Logtalk lambda expressions. The Logtalk linter includes several checks for lambda expressions. In this particular case, if we create an object with your code:


:- object(gavin).

	:- public(sol_set/2).
	:- meta_predicate(sol_set(1,*)).
	sol_set(Predicate, Result) :-
		findall(Template, call(Predicate, Template), Templates),
		sort(Templates, Result).
	
	:- public(sol_bag/2).
	:- meta_predicate(sol_bag(1,*)).
	sol_bag(Predicate, Result) :-
		findall(Template, call(Predicate, Template), Result).

	:- public(p/1).
	p(Result) :-
		sol_set([X]>>(list::member(X,[1,2,3])), Result). 

	:- public(q/1).
	q(Result) :-
		X = 3,
		sol_set([X]>>(list::member(X,[1,2,3])), Result).

:- end_object.

We will get:

?- {gavin}.
*     Lambda expression [A]>>(list::member(A,[1,2,3])) parameter variable A used elsewhere in clause
*       while compiling object gavin
*       in file /Users/pmoura/Desktop/gavin.lgt between lines 20-22
*     
% [ /Users/pmoura/Desktop/gavin.lgt loaded ]
% 1 compilation warning
true.

?- gavin::p(Results).
Results = [1, 2, 3].

?- gavin::q(Results).
Results = [3].

This warning is not yet, however, implemented in library(yall). From a brief discussion sometime ago with Jan, it may be doable (if anyone wants to sponsor that work, get in touch privately).

As the warning clarifies, variables in lambda parameters must not appear elsewhere in the clause, notably, before the lambda expression.

The implicit suggestion would be interpret any variable used as a lambda parameter as unique even if appearing elsewhere in the clause. That should be possible but still a bit odd when we look at lambda expressions in a clause as a sub-term and not as a closure.

1 Like

Hello Boris,

Yes, I mean setof/bagof. The semantics of setof, are in my opinion “wrong”, and I don’t think I’m alone in thinking this. It’s not sensible from the point of view of proving the set of results to conflate the empty set with failure.

But in addition to this, there is a special purpose language which gives unusual behaviour for the treatment of variables inside of the body of the aggregation predicates. And to make things a bit more confusing, it is a different treatment for different aggregators and not uniform.

In reality the problem stems from quantification - so it would seem that the appropriate way to dispell the problem would be a consistent method of quantification. This can be solved by making closures with a predicate, ensuring existentials for the body as per:

contains(L,X) :- member(X, L).

example(Result) :- L = [1,2,3], sol_set(contains(L),Result). 

Or using yall syntax:

example(Result) :- L = [1,2,3], sol_set({L}/[X]>>(member(X,L)),Result). 

The gotachas I have experienced in the past around the special treatment of forms in the aggregation predicates include unexpected capture of free variables. However it also includes behaviour such as the following:

?- LL = [[1,2,3],[4,5,6]], setof(X, (member(L, LL), member(X,L)), Result). 
LL = [[1, 2, 3], [4, 5, 6]],
L = Result, Result = [1, 2, 3] ;
LL = [[1, 2, 3], [4, 5, 6]],
L = Result, Result = [4, 5, 6].

?- LL = [[1,2,3],[4,5,6]], setof(X, L^(member(L, LL), member(X,L)), Result).
LL = [[1, 2, 3], [4, 5, 6]],
Result = [1, 2, 3, 4, 5, 6].

Of course I’m aware that this is the intended semantics of setof, but I still find it to be unusual and very rarely what I actually want.

Is worth reading about the bagof/3 and setof/3 semantics as described by the author of these predicates, David H. D. Warren:

https://aitopics.org/download/classics:C65CF540

See page 12 for the explanation.

2 Likes

Richard O’Keefe (Craft of Prolog page 363) quotes Lee Naish in pointing out that findall/3 can be used to define all sorts of nasty (non-logical) things such as not/1 and var/1:

not(Goal) :- findall(., Goal, []).
var(Var) :- findall(., (Var = a; Var = b), [_,_]).

If you want to conflate failure with [], it’s easy enough:

setof_or_nil(Template, Goal, Result) :-
    ( setof(Template, Goal, Result) -> true ; Result = [] ).

and it’s up to you to confirm that Goal and Template allow only sound results.

And if you don’t like the way existential variables are specified with bagof and setof, see library(solution_sequences). And library(aggregate) gives some generalizations on bagof/setof.

2 Likes

… we can also use a lambda expression! :stuck_out_tongue: E.g. instead of:

setof(
    Currency,
    Country^Capital^Population^country(Country, Capital, Population, Currency),
    Currencies
)

we can write:

setof(
   Currency,
   {Currency}/country(_, _, _, Currency),
   Currencies
)

Taken from this example.

1 Like

Thanks! That finally explains why the empty set is considered failure! I’m glad to hear that there is a consistent rationale.

However it appears to be to ensure back-tracking behaviour that I’ve literally never wanted. :smiley:

@peter.ludemann those are very interesting extra-logical definitions!

As per setof_or_nil, I’ve a predicate like that in a library.

However my point about the syntax of existential variables is not merely that I dislike the syntax - I wouldn’t mind much if it were consistent. The bigger point is that they aren’t actually quantifiers! They are essentially a macro which performs some magic and can have unusual behaviour if the variable exists in context.

X=1, setof(X, X^(member(X, [1,2,3])), Xs).

It seems to me that proper quantification would be desirable given that we feel that higher-order features are desirable enough to have them.

1 Like

Which magic exactly do you have in mind? In the query you showed, you first bind X to 1; at that point, X and 1 are the same thing. This would suggest that if I just replace X with 1 in everything that follows, I should get the same result to the query. I will try this out (skipping unnecesary parentheses):

?- X = 1, setof(X, X^member(X, [1,2,3]), Xs).
X = 1,
Xs = [1].

?- setof(1, 1^member(1, [1,2,3]), Xs).
Xs = [1].

OK.

I am also struggling to understand the motivation behind telling setof to not bind X (it seems you actually want to bind X in this scenario?)

?- X = 1, setof(X, member(X, [1,2,3]), Xs).
X = 1,
Xs = [1].

?- setof(X, member(X, [1,2,3]), Xs).
Xs = [1, 2, 3].

?- setof(1, member(1, [1,2,3]), Xs).
Xs = [1].

To me it seems that for this particular example you get identical behavior (please correct me if I made a mistake!).

I am a completely out of my depth here, but I will attempt to make a suggestion: take a look at the predicates in library(aggregate), and specifically at the source code. For example, there is the free_variables/4 predicate. You can also reproduce your problems with setof using aggregate :slight_smile:

?- aggregate(bag(X), member(X, [1,2,1,3]), Xs).
Xs = [1, 2, 1, 3].

?- aggregate(set(X), member(X, [1,2,1,3]), Xs).
Xs = [1, 2, 3].

?- X = 1, aggregate(set(X), member(X, [1,2,1,3]), Xs).
X = 1,
Xs = [1].

If it were actually a quantifier, the other X would be irrelevant as is normal with lexical scoping of quantifiers in logic.

OK, I think I understand. Please correct me if I am wrong, but for you the underlying problem is that you don’t like the syntactic scoping of variables in Prolog?

EDIT: the clean solution to this is to use helper predicates. But of course this is a topic that has been chewed to death, and to me at least it seemst that this is about personal preference and there is no “one correct way to do it”.

EDIT2: There is no one correct way to do it but there are still ways that are more difficult than others, since they are in conflict with the design decisions of the language/implementation.

I personally have been on the fence in regards to (anonymous) lambdas. Some languages kinda steer you towards lambdas, Prolog is not one of those. I still use library(yall) every now and then. I also keep on making mistakes with the variable names when I do :slight_smile:

(This is copied from an old tweet of mine and is a good example when you do want the backtracking behavior.)

It’s often overlooked in Prolog learning materials that bagof/3 and setof/3 support aggregating results. E.g.

i(1,odd).
i(2,even).
i(3,odd).
i(4,even).

| ?- bagof(I, i(I,K), L).

K = even
L = [2,4] ? ;

K = odd
L = [1,3]

yes

This is possible only due to how free variables are handling in the query. We cannot do the same with findall/3.

3 Likes

This is basically the killer feature of bagof/setof, for me. It is the only reason to use those instead of findall.

I think you are right here. Scoping in Prolog is always clause-based though and I see little room for changing that. The compiler (term rewriter) could (in theory) be smarter, but we are also faced with the fact that we are supposed to be able to call arbitrary body terms that are created at runtime. This probably gets nasty. Tooling that warns against possible issues is the best I can think of if we do not want to depart too far from good old Prolog :frowning:

Note that similar problems also bite in other places. Consider (A = x; write(A)), where A is not a classical singleton, but is one in the semantic sense. The SWI-Prolog compiler warns against this, but only knows the built-in control structures. findall(X, ...), write(X) is also a singleton, but passes undetected.

1 Like

There’s an alternative way of specifying free/existential variables instead of

Using group_by/4 from library(solution_sequences):

?- group_by(EvenOdd, Value, i(Value, EvenOdd), Bag).
EvenOdd = even,
Bag = [2, 4] ;
EvenOdd = odd,
Bag = [1, 3].

Also:

?- setof(EvenOdd-Bag, group_by(EvenOdd, Value, i(Value,EvenOdd), Bag), Groups).
Groups = [even-[2, 4], odd-[1, 3]].

?- setof(EvenOdd-Bag, bagof(Value, i(Value,EvenOdd), Bag), Groups).
Groups = [even-[2, 4], odd-[1, 3]].

(If you look at the code for group_by/4, you can see how to easily create your own notation if you don’t like setof, bagof, and friends.)

4 Likes

Yeah, it’s more than that I don’t like it. It’s that I think it is actually wrong - it trips people up when we have implicitly or explicitly quantified forms like group_by, setof, bagof, anonymous lambda etc. and we have variables already bound in context. These forms have extended the pure fol fragment which prolog uses and are a kludge rather than a clean addition utilising a higher-order syntax.

We can of course live with a kludge, and it has been done, but I’m not sure why we should have to. Adding a syntax for existential and universally quantified variables is unlikely to…

[EDIT] This was a partial response, accidentally sent and not well thought out!

I guess you are right. There are few ways out though. As is, Prolog read/1 does variable scoping at the term level, being completely unaware of the semantics. Well, terms have no semantics themselves. It is the context that gives them semantics. X^setof(…) is just a term. It only gets the usual semantics if this appears in a subgoal position of a larger term that represents a valid clause and we give this clause to assertz/1 (or it appears in a subgoal position of a body term that is handed to call/1 and probably you can think of some more contexts).

I don’t think we want to give up read/1 being unaware of the context of the term it is reading. That is an important basis for the code-is-data that allows us to reason about programs, construct programs at runtime (dynamic programming), do meta-interpretation, etc. Picat took that turn and it was for me the main reason not to get involved in this (otherwise) interesting route.

You see this problem also with singleton variables. They are a useful clue for finding common mistakes in clauses, but a variable that appears twice but only in two different branches of a disjunction is still a singleton in de semantic view: it doesn’t carry information from one place in the clause to another. That is why SWI-Prolog does semantic singleton checking, but unfortunately this is limited as the compiler only knows about the control structures and a few built-ins such as true, fail, etc.

So, I fear we have to live with the kludge. What we can do is find dubious sharing variables as part of the clause analysis. This allows us to warn about variables appearing in existential qualification of setof/3, etc. that do not appear in the setof/3 goal or also appear elsewhere in the clause. I think there is quite a bit to gain by doing so and most likely it isn’t even that hard. Logtalk seems to be doing some of that already.

2 Likes