Forall/2, foreach/2, double negation, relational division and SQL

Recently I revisited database system and found a problem that is very easy to express in Prolog but difficult in SQL.

It is a classic student_take_course problem.

Given the following database.

student table:
 sname 
-------
 alice
 bob
 clark
 david
 emily
 gavin
(6 rows)

course table:
  cname  
---------
 english
 history
 math
(3 rows)

student_course table:
 sname |  cname  
-------+---------
 alice | english
 alice | history
 bob   | english
 bob   | history
 bob   | math
 clark | math
 emily | english
 emily | history
 emily | math
 emily | biology
 ferry | english
 ferry | history
 ferry | math
 gavin | english
 gavin | history
 gavin | biology
(16 rows)

How do you find all the students in student table who have taken all courses?

If you are not familiar with SQL, you might think of using a subquery with ALL:

SELECT *
FROM student
WHERE sName student_course ALL (
  SELECT *
  FROM course
);

Unfortunately, SQL does not support this kind of operation. The predicate between WHERE and ALL can be only comparison operators (>, =, <, …).

Fortunately, we can express it by forall/2 in SWI-Prolog.

?- student(SName), forall(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
false.

This is because the above intended SQL and Prolog query have the same FOL logical reading:

∃SName. student(SName) ∧ ∀CName. course(CName) → student_course(SName, CName) 

We can further transform this formula via logic equivalence:

    ∃SName. student(SName) ∧ ∀CName. course(CName) → student_course(SName, CName)
<=> ∃SName. student(SName) ∧ ¬ ¬ ∀CName. course(CName) → student_course(SName, CName)
<=> ∃SName. student(SName) ∧ ¬ ∃CName. ¬ (course(CName) → student_course(SName, CName))
<=> ∃SName. student(SName) ∧ ¬ ∃CName. ¬ (¬ course(CName) ∨ student_course(SName, CName))
<=> ∃SName. student(SName) ∧ ¬ ∃CName. course(CName) ∧ ¬ student_course(SName, CName)

The resulting formula can be used to derive a legal SQL query:

SELECT *
FROM student
WHERE NOT EXISTS (
  SELECT *
  FROM course
  WHERE NOT EXISTS (
    SELECT *
    FROM student_course
    WHERE student.sName = student_course.sName AND course.cName = student_course.cName
  )
);
--  sname 
-- -------
--  bob
--  emily
-- (2 ROWS)

Comparing Prolog and SQL, it is clear that the Prolog approach is much easier to understand.

P.s. 1. Coincidentally, forall/2 self is also implemented by double negation. The derivation see On a logical reading of `foreach/2` predicate - #15 by chansey97

P.s. 2. This problem can be also solved by foreach/2. Note that under certain circumstances, the behaviors of forall/2 and foreach/2 are the same, but forall/2 is more efficient. For more details see On a logical reading of `foreach/2` predicate - #15 by chansey97

If you are familiar with relational algebra, this problem can also be solved by relational division, except that relational division involves only two tables instead of three.

Now let us ignore student table to consider relational division, the new problem can be re-stated as:

How to find all the students who have taken all courses?

If we had relational division, the solution would be a simple student_course ÷ course. Unfortunately, SQL does not support it.

In SWI-Prolog, relational division can be easily implemented by “copy-paste” its definition. According to the Wikipedia, student_course ÷ course should be defined as:

{sc[sName] : sc ∈ student_course ∧ ∀c ∈ course. (t[sName] ∪ c) ∈ student_course}

We can re-write it as FOL:

∃sName. student_course(SName, _), ∀CName. course(CName) → student_course(SName, CName)

and write Prolog:

?- distinct(SName, student_course(SName, _)), forall(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry ;
false.

Here the distinct reflects the set semantics of relational algebra.

Notice that since distinct(SName, student_course(SName, _)) plays the original role of student(SName), the new SQL query can be obtained by replacing the original student WITH a subquery:

WITH student AS (
   SELECT DISTINCT sName
   FROM student_course
)
SELECT *
FROM student
WHERE NOT EXISTS (
  SELECT *
  FROM course
  WHERE NOT EXISTS (
    SELECT *
    FROM student_course
    WHERE student.sName = student_course.sName AND course.cName = student_course.cName
  )
);

There are many other ways to simulate relational division in SQL. The standard way is using EXCEPT with a cartesian product, something like:

WITH student AS (
   SELECT DISTINCT sName
   FROM student_course
)
SELECT sName
FROM student
EXCEPT
SELECT sName
FROM (
  SELECT sName, cName
  FROM student, course
  EXCEPT
  SELECT sName, cName
  FROM student_course
) As students_not_take_all_courses;

This standard way is closer to foreach/2, since foreach/2 underlyingly generates a cartesian product. Thus we can also implement relational division by foreach/2:

?- distinct(SName, student_course(SName, _)), foreach(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry ;
false.

Finally, there is a dangerous but interesting trap:

Now that foreach(p(x), q(x, y)) can be read as ∃y.∀x.(p(x) → q(x,y)) , can we replace relational division with foreach/2 directly? (i.e. throw away distinct(SName, student_course(SName, _)))

This might work at first glance. For example,

?- foreach(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry ;
false.

However, what if course table is empty?

?- retractall(course(_)), foreach(course(CName), student_course(SName, CName)).
true.

Prolog returns true, but relational division requires all possible SNames.

Does it mean foreach/2 is flawed?

No. foreach/2 is correct.

When the antecedent of a implcation is false, the whole implication is true, but the consequent doesn’t have to be true. Here course(CName) is always false, so the ∃SName.∀x.(course(CName) → student_course(SName, CName)) is true, but it doesn’t care about the truth of student_course(SName, CName). The resulting ∃SName could be any value (no matter in student_course(SName, CName) or not).

Therefore, the answer is no. We can not replace relational division with foreach/2 directly. Relational division does more than foreach/2.

3 Likes

Your student/1 table has ferry missing!

The other way around somehow works:

foreach(p(X), q(Y,X)) <=> r(Y), forall(p(X), q(Y,X)) 

where r(Y) is a superset of the distinct projection q(Y,_). Yes or No?

Lets try, here is the database:

Prolog Text with the Database
course(english).
course(history).
course(math).

student(alice).
student(bob).
student(clark).
student(david).
student(emily).
student(gavin).
student(ferry).

student_course(alice,english).
student_course(alice,history).
student_course(bob,english).
student_course(bob,history).
student_course(bob,math).
student_course(clark,math).
student_course(emily,english).
student_course(emily,history).
student_course(emily,math).
student_course(emily,biology).
student_course(ferry,english).
student_course(ferry,history).
student_course(ferry,math).
student_course(gavin,english).
student_course(gavin,history).
student_course(gavin,biology).

And now the two queries:

?- foreach(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry ;
false.

?- student(SName), forall(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry.

Yes. This is intentional, i.e. I allow some junk data in student_course table.

forall/2 and foreach/2 are interchangeable only if Y is a ground value when executing forall/2 and foreach/2, see On a logical reading of `foreach/2` predicate - #15 by chansey97

For your example,

?- retractall(course(_)),
   foreach(course(CName), student_course(SName, CName)).
true.

?- retractall(course(_)),
   student(SName), forall(course(CName), student_course(SName, CName)).
SName = alice ;
SName = bob ;
SName = clark ;
SName = david ;
SName = emily ;
SName = gavin ;
SName = ferry.

Yeah, funny, when p(X) is empty, you get different answer substitutions.
This equality would still hold I guess:

foreach(p(X), q(Y,X)), r(Y) <=> r(Y), forall(p(X), q(Y,X)) 

For the above equality you can also relax the condition that r(Y)
should be a superset of q(Y,_), it can be anything.

Disclaimer: This is a Datalog thingy, if Y becomes non-ground as a result
of r(Y) the forall/2 does a semantic shifting of its meaning,

and the equation possibly breaks again. Not 100% sure.

I agree and I think:

foreach(p(X), q(Y,X)), r(Y) <=> r(Y), forall(p(X), q(Y,X)) <=> r(Y), foreach(p(X), q(Y,X))

Of course, r(Y) must make Y ground.