Recently I revisited database system and found a problem that is very easy to express in Prolog but difficult in SQL.
It is a classic student_take_course problem.
Given the following database.
student table:
sname
-------
alice
bob
clark
david
emily
gavin
(6 rows)
course table:
cname
---------
english
history
math
(3 rows)
student_course table:
sname | cname
-------+---------
alice | english
alice | history
bob | english
bob | history
bob | math
clark | math
emily | english
emily | history
emily | math
emily | biology
ferry | english
ferry | history
ferry | math
gavin | english
gavin | history
gavin | biology
(16 rows)
How do you find all the students in student
table who have taken all courses?
If you are not familiar with SQL, you might think of using a subquery with ALL
:
SELECT *
FROM student
WHERE sName student_course ALL (
SELECT *
FROM course
);
Unfortunately, SQL does not support this kind of operation. The predicate between WHERE
and ALL
can be only comparison operators (>, =, <, …).
Fortunately, we can express it by forall/2 in SWI-Prolog.
?- student(SName), forall(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
false.
This is because the above intended SQL and Prolog query have the same FOL logical reading:
∃SName. student(SName) ∧ ∀CName. course(CName) → student_course(SName, CName)
We can further transform this formula via logic equivalence:
∃SName. student(SName) ∧ ∀CName. course(CName) → student_course(SName, CName)
<=> ∃SName. student(SName) ∧ ¬ ¬ ∀CName. course(CName) → student_course(SName, CName)
<=> ∃SName. student(SName) ∧ ¬ ∃CName. ¬ (course(CName) → student_course(SName, CName))
<=> ∃SName. student(SName) ∧ ¬ ∃CName. ¬ (¬ course(CName) ∨ student_course(SName, CName))
<=> ∃SName. student(SName) ∧ ¬ ∃CName. course(CName) ∧ ¬ student_course(SName, CName)
The resulting formula can be used to derive a legal SQL query:
SELECT *
FROM student
WHERE NOT EXISTS (
SELECT *
FROM course
WHERE NOT EXISTS (
SELECT *
FROM student_course
WHERE student.sName = student_course.sName AND course.cName = student_course.cName
)
);
-- sname
-- -------
-- bob
-- emily
-- (2 ROWS)
Comparing Prolog and SQL, it is clear that the Prolog approach is much easier to understand.
P.s. 1. Coincidentally, forall/2 self is also implemented by double negation. The derivation see On a logical reading of `foreach/2` predicate - #15 by chansey97
P.s. 2. This problem can be also solved by foreach/2. Note that under certain circumstances, the behaviors of forall/2 and foreach/2 are the same, but forall/2 is more efficient. For more details see On a logical reading of `foreach/2` predicate - #15 by chansey97
If you are familiar with relational algebra, this problem can also be solved by relational division, except that relational division involves only two tables instead of three.
Now let us ignore student
table to consider relational division, the new problem can be re-stated as:
How to find all the students who have taken all courses?
If we had relational division, the solution would be a simple student_course ÷ course
. Unfortunately, SQL does not support it.
In SWI-Prolog, relational division can be easily implemented by “copy-paste” its definition. According to the Wikipedia, student_course ÷ course
should be defined as:
{sc[sName] : sc ∈ student_course ∧ ∀c ∈ course. (t[sName] ∪ c) ∈ student_course}
We can re-write it as FOL:
∃sName. student_course(SName, _), ∀CName. course(CName) → student_course(SName, CName)
and write Prolog:
?- distinct(SName, student_course(SName, _)), forall(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry ;
false.
Here the distinct
reflects the set semantics of relational algebra.
Notice that since distinct(SName, student_course(SName, _))
plays the original role of student(SName)
, the new SQL query can be obtained by replacing the original student
WITH a subquery:
WITH student AS (
SELECT DISTINCT sName
FROM student_course
)
SELECT *
FROM student
WHERE NOT EXISTS (
SELECT *
FROM course
WHERE NOT EXISTS (
SELECT *
FROM student_course
WHERE student.sName = student_course.sName AND course.cName = student_course.cName
)
);
There are many other ways to simulate relational division in SQL. The standard way is using EXCEPT
with a cartesian product, something like:
WITH student AS (
SELECT DISTINCT sName
FROM student_course
)
SELECT sName
FROM student
EXCEPT
SELECT sName
FROM (
SELECT sName, cName
FROM student, course
EXCEPT
SELECT sName, cName
FROM student_course
) As students_not_take_all_courses;
This standard way is closer to foreach/2, since foreach/2 underlyingly generates a cartesian product. Thus we can also implement relational division by foreach/2:
?- distinct(SName, student_course(SName, _)), foreach(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry ;
false.
Finally, there is a dangerous but interesting trap:
Now that foreach(p(x), q(x, y))
can be read as ∃y.∀x.(p(x) → q(x,y))
, can we replace relational division with foreach/2 directly? (i.e. throw away distinct(SName, student_course(SName, _))
)
This might work at first glance. For example,
?- foreach(course(CName), student_course(SName, CName)).
SName = bob ;
SName = emily ;
SName = ferry ;
false.
However, what if course
table is empty?
?- retractall(course(_)), foreach(course(CName), student_course(SName, CName)).
true.
Prolog returns true
, but relational division requires all possible SName
s.
Does it mean foreach/2 is flawed?
No. foreach/2 is correct.
When the antecedent of a implcation is false, the whole implication is true, but the consequent doesn’t have to be true. Here course(CName)
is always false, so the ∃SName.∀x.(course(CName) → student_course(SName, CName))
is true, but it doesn’t care about the truth of student_course(SName, CName)
. The resulting ∃SName
could be any value (no matter in student_course(SName, CName)
or not).
Therefore, the answer is no. We can not replace relational division with foreach/2 directly. Relational division does more than foreach/2.