# Forall/2, foreach/2, double negation, relational division and SQL

Recently I revisited database system and found a problem that is very easy to express in Prolog but difficult in SQL.

It is a classic student_take_course problem.

Given the following database.

``````student table:
sname
-------
alice
bob
clark
david
emily
gavin
(6 rows)

course table:
cname
---------
english
history
math
(3 rows)

student_course table:
sname |  cname
-------+---------
alice | english
alice | history
bob   | english
bob   | history
bob   | math
clark | math
emily | english
emily | history
emily | math
emily | biology
ferry | english
ferry | history
ferry | math
gavin | english
gavin | history
gavin | biology
(16 rows)
``````

How do you find all the students in `student` table who have taken all courses?

If you are not familiar with SQL, you might think of using a subquery with `ALL`:

``````SELECT *
FROM student
WHERE sName student_course ALL (
SELECT *
FROM course
);
``````

Unfortunately, SQL does not support this kind of operation. The predicate between `WHERE` and `ALL` can be only comparison operators (>, =, <, …).

Fortunately, we can express it by forall/2 in SWI-Prolog.

``````?- student(SName), forall(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
false.
``````

This is because the above intended SQL and Prolog query have the same FOL logical reading:

``````∃SName. student(SName) ∧ ∀CName. course(CName) → student_course(SName, CName)
``````

We can further transform this formula via logic equivalence:

``````    ∃SName. student(SName) ∧ ∀CName. course(CName) → student_course(SName, CName)
<=> ∃SName. student(SName) ∧ ¬ ¬ ∀CName. course(CName) → student_course(SName, CName)
<=> ∃SName. student(SName) ∧ ¬ ∃CName. ¬ (course(CName) → student_course(SName, CName))
<=> ∃SName. student(SName) ∧ ¬ ∃CName. ¬ (¬ course(CName) ∨ student_course(SName, CName))
<=> ∃SName. student(SName) ∧ ¬ ∃CName. course(CName) ∧ ¬ student_course(SName, CName)
``````

The resulting formula can be used to derive a legal SQL query:

``````SELECT *
FROM student
WHERE NOT EXISTS (
SELECT *
FROM course
WHERE NOT EXISTS (
SELECT *
FROM student_course
WHERE student.sName = student_course.sName AND course.cName = student_course.cName
)
);
--  sname
-- -------
--  bob
--  emily
-- (2 ROWS)
``````

Comparing Prolog and SQL, it is clear that the Prolog approach is much easier to understand.

P.s. 1. Coincidentally, forall/2 self is also implemented by double negation. The derivation see On a logical reading of `foreach/2` predicate - #15 by chansey97

P.s. 2. This problem can be also solved by foreach/2. Note that under certain circumstances, the behaviors of forall/2 and foreach/2 are the same, but forall/2 is more efficient. For more details see On a logical reading of `foreach/2` predicate - #15 by chansey97

If you are familiar with relational algebra, this problem can also be solved by relational division, except that relational division involves only two tables instead of three.

Now let us ignore `student` table to consider relational division, the new problem can be re-stated as:

How to find all the students who have taken all courses?

If we had relational division, the solution would be a simple `student_course ÷ course`. Unfortunately, SQL does not support it.

In SWI-Prolog, relational division can be easily implemented by “copy-paste” its definition. According to the Wikipedia, `student_course ÷ course` should be defined as:

``````{sc[sName] : sc ∈ student_course ∧ ∀c ∈ course. (t[sName] ∪ c) ∈ student_course}
``````

We can re-write it as FOL:

``````∃sName. student_course(SName, _), ∀CName. course(CName) → student_course(SName, CName)
``````

and write Prolog:

``````?- distinct(SName, student_course(SName, _)), forall(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry ;
false.
``````

Here the `distinct` reflects the set semantics of relational algebra.

Notice that since `distinct(SName, student_course(SName, _))` plays the original role of `student(SName)`, the new SQL query can be obtained by replacing the original `student` WITH a subquery:

``````WITH student AS (
SELECT DISTINCT sName
FROM student_course
)
SELECT *
FROM student
WHERE NOT EXISTS (
SELECT *
FROM course
WHERE NOT EXISTS (
SELECT *
FROM student_course
WHERE student.sName = student_course.sName AND course.cName = student_course.cName
)
);
``````

There are many other ways to simulate relational division in SQL. The standard way is using `EXCEPT` with a cartesian product, something like:

``````WITH student AS (
SELECT DISTINCT sName
FROM student_course
)
SELECT sName
FROM student
EXCEPT
SELECT sName
FROM (
SELECT sName, cName
FROM student, course
EXCEPT
SELECT sName, cName
FROM student_course
) As students_not_take_all_courses;
``````

This standard way is closer to foreach/2, since foreach/2 underlyingly generates a cartesian product. Thus we can also implement relational division by foreach/2:

``````?- distinct(SName, student_course(SName, _)), foreach(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry ;
false.
``````

Finally, there is a dangerous but interesting trap:

Now that `foreach(p(x), q(x, y))` can be read as `∃y.∀x.(p(x) → q(x,y)) `, can we replace relational division with foreach/2 directly? (i.e. throw away `distinct(SName, student_course(SName, _))`)

This might work at first glance. For example,

``````?- foreach(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry ;
false.
``````

However, what if `course` table is empty?

``````?- retractall(course(_)), foreach(course(CName), student_course(SName, CName)).
true.
``````

Prolog returns `true`, but relational division requires all possible `SName`s.

Does it mean foreach/2 is flawed?

No. foreach/2 is correct.

When the antecedent of a implcation is false, the whole implication is true, but the consequent doesn’t have to be true. Here `course(CName)` is always false, so the `∃SName.∀x.(course(CName) → student_course(SName, CName))` is true, but it doesn’t care about the truth of `student_course(SName, CName)`. The resulting `∃SName` could be any value (no matter in `student_course(SName, CName)` or not).

Therefore, the answer is no. We can not replace relational division with foreach/2 directly. Relational division does more than foreach/2.

3 Likes

Your `student/1` table has `ferry` missing!

The other way around somehow works:

``````foreach(p(X), q(Y,X)) <=> r(Y), forall(p(X), q(Y,X))
``````

where `r(Y)` is a superset of the distinct projection `q(Y,_)`. Yes or No?

Lets try, here is the database:

Prolog Text with the Database
``````course(english).
course(history).
course(math).

student(alice).
student(bob).
student(clark).
student(david).
student(emily).
student(gavin).
student(ferry).

student_course(alice,english).
student_course(alice,history).
student_course(bob,english).
student_course(bob,history).
student_course(bob,math).
student_course(clark,math).
student_course(emily,english).
student_course(emily,history).
student_course(emily,math).
student_course(emily,biology).
student_course(ferry,english).
student_course(ferry,history).
student_course(ferry,math).
student_course(gavin,english).
student_course(gavin,history).
student_course(gavin,biology).
``````

And now the two queries:

``````?- foreach(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry ;
false.

?- student(SName), forall(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry.
``````

Yes. This is intentional, i.e. I allow some junk data in `student_course` table.

forall/2 and foreach/2 are interchangeable only if `Y` is a ground value when executing forall/2 and foreach/2, see On a logical reading of `foreach/2` predicate - #15 by chansey97

``````?- retractall(course(_)),
foreach(course(CName), student_course(SName, CName)).
true.

?- retractall(course(_)),
student(SName), forall(course(CName), student_course(SName, CName)).
SName = alice ;
SName = bob ;
SName = clark ;
SName = david ;
SName = emily ;
SName = gavin ;
SName = ferry.
``````

Yeah, funny, when `p(X)` is empty, you get different answer substitutions.
This equality would still hold I guess:

``````foreach(p(X), q(Y,X)), r(Y) <=> r(Y), forall(p(X), q(Y,X))
``````

For the above equality you can also relax the condition that `r(Y)`
should be a superset of `q(Y,_)`, it can be anything.

Disclaimer: This is a Datalog thingy, if `Y` becomes non-ground as a result
of `r(Y)` the `forall/2` does a semantic shifting of its meaning,

and the equation possibly breaks again. Not 100% sure.

I agree and I think:

``````foreach(p(X), q(Y,X)), r(Y) <=> r(Y), forall(p(X), q(Y,X)) <=> r(Y), foreach(p(X), q(Y,X))
``````

Of course, `r(Y)` must make Y ground.