As a general rule of thumb, setof/3 is far too complicated for the average user IMO. It is great if you need its features. In database speak setof/3 and bagof/3 are group by operators (see also group_by/4). If you do not want to group and you merely want a list without duplicates simply use findall/3 + sort/2.
There’s some discussion about the “exists” notation and other nuances of findall/setof (and other aggregators) here:
As a general rule, there shouldn’t be any “unused” variables in a predicate to findall/3, bagof/3, setof/3 – use an auxiliary predicate if necessary. And be especially careful of “_” in the predicate to findall/3 et al – in that context, it almost never does what you think it does.
Theoretically you are right. I’ve seen so many people messing up with bagof/3 and setof/3 that I’m getting less and less convinced about this advice for practical use though. If the template instantiation is ground findall/3 is fine. If not, yes you get fresh variables. You have to be aware of that. If it is not what you want, you need bagof/3 or setof/3. For these predicates you have to be aware of existential variables though. Note that these might be nicely visible in your program, but may also be hidden in the goal term somewhere, i.e., come from the caller of the predicate you are defining. After all those years I’ve been tricked too often by such issues
Note that typically you use the aggregation predicates to get all solutions from a database query. As you get fresh variables from this execution, making another copy doesn’t hurt and avoids all the overhead to restore the variable bindings implies by bagof/3 and setof/3. It does hurt for queries like below if Term is not ground.
findall(Y, (member(X, Term), process(X,Y)), Ys)
In these cases high order predicates like (in this case) maplist/3 are often a better (and faster) choice. Note that if constraints are used copying terms, even while restoring variable bindings may upset the constraint solver.
I am always surprised by the hidden complexities of seemingly simple predicates. What can be complicated in a forall/3 call, and yet there is so much hidden complexity.
I guess, all programming languages have the subtleties …
Even the great list of examples Peter provides are very terse and need study to understand.
It would be good if you, or someone else (Peter?), could create a more elaborate blog post to help uninitiated understand in a step by step way the underlying complexites that derive from logic variables, “fresh variables” and thinking in terms of backtracking and whatever else is hidden here.
That would be a lot of work! And I’m not sure I could say it better than people like Ricard O’Keefe (although his chapter on findall/setof/etc is easier to read if you skip the implementation stuff).
I remember when I took a mathematical logic course, and I was amazed that the professor skipped over anything approaching a formal definition of “free variable”. (If I took the same course now, I’d ask the professor for a formal definition … but in those days I cared a bit more about getting along and getting good grades.) There are plenty of English-language definitions, but I don’t recall ever seeing a formal definition (perhaps page 400 of Principia Mathematica has one …)
(And if you look at Wikipedia, there are lots of examples, but the “formal” definition feels rather hand-wavey.)
And it’s more complicated than just “avoid free variables in findall and setof; and by the way free variables behave differently in findall and setof; and use of anonymous variables (”_“) is almost always a mistake”. For example:
I am unsure what you mean. To make this discussion concrete …
if one, say, write, as a logic expression
x = y^2
Then, its unclear what we say because both variables are unbound. So, its ambiguous – does this hold for every x and every y or only for some x and some y.
Edit:
Now that i think of the above – there is context missing – what are the domains of x and y – is crucial to understand whether the relation holds or not and whether what kind of quantification holds
Once one existentially quantifies y, then logically says, that there exists a y for which x = y^2.
By binding y existentially, and having x defined through y, i guess, y is considered existentially bound as well – i.e. there exists at least one x for which the bound y^2 holds.
Can you elaborate on where you see the difficulty of definition of unbound variable here – are there cases covered in prolog, not covered by unbound variables of expressions that have other variables existentially or universal quantified.
It is really difficult, my feeling is that you have many small typos in the things you write so I don’t even know if we are all talking about the same things or just saying some stuff that happens to use the same words.
Your original example:
I can’t escape the feeling something is botched here, the second argument of your findall looks different from the second argument of your setof, doesn’t it? So already the premise of the whole thread is unclear to me. Then, you write:
Good question. What is a forall/3? Do you mean findall/3 or forall/2? We don’t know.
Writing a minimal reproducible example will of course take care of such problems, but experience shows that people are wiling to sacrifice anything, just to avoid sharing a reproducible example.
There is so much written on findall/3 and bagof/3 and setof/3. You can both read why those predicates exist, and you can also study how they are implemented, if you are interested enough.
Don’t do that. Conjunctions are not tuples. The thing that breaks conjunctions is that you cannot use syntactic unification to tell apart a pair from a triple from a quadruple etc. Because:
?- (X, Y) = (a, b, c).
X = a,
Y = (b, c).
There might be some good reason to want this. But in the code that I have seen in the wild using this “tuple” construct, (X, Y), I am quite certain people who write it just don’t understand what it means.
I really wish to know what started this silliness. I think I first noticed this in questions on various forums where some well-meaning and overworked teaching assistant was trying to write equivalent programs in several “niche” languages like Haskell, Scheme, Prolog, etc. A similar phenomenon is overusing parentheses in Prolog code, making it superficially similar to Lisp code I guess?
I don’t understand this: it’s not a “logic expression” that I’m familiar with.
Anyway, to expand a bit on my examples. Here’s a clause where all variables are bound:
p(Zs, X) :- setof(Z, q(Z, X), Zs).
And here’s one where X is not bound (it’ll generate a “singleton” warning):
q(Zs) :- setof(Z, q(Z, X), Zs).
In both cases, as far as the setof is concerned, X is unbound, so backtracking can generate multiple Zs on backtracking. In the first clause, the the X is also available to whatever calls q/2; the X is not available in the second; and in the second clause the X is also unbound at the clause level.
I leave it as an exercise to the reader to consider which of the other clauses this is equivalent to:
s(Zs) :- setof(Z, qq(Z), Zs).
qq(Z) :- q(Z, _X).
The notion of “bound variable” is easy to explain (at least to people whose language includes relative pronouns), and straightforward to implement (if you know how to write parsers and compilers); but I don’t think I’ve ever seen a formal definition. As @jan observes, unbound variables and existential quantifiers in setof/bagof can be confusing to beginners; I’m an “imtermediate” and I find them confusing and usually write auxiliary predicates instead of using “^”, and also avoid things like setof(X-Z,(p(X,Y),q(Y,Z)),XZs) by using an auxiliary predicate for p(X,Y),q(Y,Z).
Relative pronouns can also be tricky in natural language, where they’re one way of producing crashblossoms.
I wrote a little post on my blog a little while ago attempting to explain just this issue & how bagof/3 and findall/3 differ in this respect – perhaps it will prove useful (and/or someone can tell me what I got wrong )