I've put an introductory Prolog tut for SQL programmers on Swish

https://swish.swi-prolog.org/p/sql2prolog.swinb

I’ve translated the basic SQL select examples used by Stanford University’s database MooC in the Swish note above.

I’m hoping to get more advanced, making insertions and updates etc. But that’s it for now.

Any suggestions on improvements welcome.

3 Likes

To add my 2 cents …

Its very nice to see SQL and Prolog side-by-side …

But, so far the student will ask why go through the effort to use Prolog if SQL does it all – and SQL is much more pervasively used.

I would look out for examples that can’t be done with SQL, or are much more elegant in Prolog.

I think recursive (graph) structures are hard to handle in SQL but are a natural fit with Prolog.

1 Like

Nice.

I didn’t know that SQL supports recursive calls.

I always thought that Datalog is an “extension” of SQL that supports recursion but no composite terms and functions – to avoid infinite recursions.

So, where does expressivity of SQL end and Prolog (and Datalog) continues …

I think this is the only item that relates to expressivity – indicating that Prolog can say/do more than SQL.

Btw, Conceptbase, a deductive database system, based on Prolog/Datalog, seems to have a notion of compositionality through parameterized combination of query classes.

I always thought that Datalog is an “extension” of SQL that supports recursion but no composite terms and functions – to avoid infinite recursions.

Datalog isn’t an extension of SQL, but rather a subset of Prolog. One of the issues I haven’t gone into is implementation, which is makes competing against open source projects like Postgresql (which has enormous commercial support from big banks) or commercial products like Oracle (which kidnapped Postgresql’s main open source competitor, MySQL, to kill it) fairly impossible.

Datalog strips stuff from Prolog which simply can’t be implemented in a system designed to handle enormous amounts of data, not just stuff small enough to fit into dynamic memory.

Anyways, I’m finding it very educational to revise my SQL basics using Prolog/Datalog. Will expand by tutorial tomorrow.

Yes. That’s why i put it into quote “extension” … i meant it, in the sense of providing capabilities that go beyond SQL’s ability to describe queries

One can write this SQL statement in DES and see its translation to (an extension of) Datalog:

DES> /show_compilations on
DES> WITH RECURSIVE temp (n, fact) AS 
(SELECT 0, 1 -- Initial Subquery
  UNION ALL 
 SELECT n+1, (n+1)*fact FROM temp -- Recursive Subquery 
        WHERE n < 9)
SELECT * FROM temp;

Info: SQL statement compiled to:

answer(A,B) :-
  temp(0,1)
  /\
  (temp(C,D) :-
    temp(E,F),
    E<9,
    C=E+1,
    D=(E+1)*F)
  =>
  temp(A,B).

answer(temp.n:int,temp.fact:int) ->
{
  answer(0,1),
  answer(1,1),
  answer(2,2),
  answer(3,6),
  answer(4,24),
  answer(5,120),
  answer(6,720),
  answer(7,5040),
  answer(8,40320),
  answer(9,362880)
}
Info: 10 tuples computed.          

Curiously, this translation uses hypothetical reasoning for solving the CTE (Common Table Expression in SQL jargon for referring to the local view declaration). This translation was made for the sake of experimenting with this kind of reasoning, where the fact temp(0,1) and the rule temp(C,D) :- temp(E,F), E<9, C=E+1, D=(E+1)*F are assumed in order to solve the goal temp(A,B) (the symbol => is known as embedded implication in hypothetical Datalog). Note also that, since terms are not allowed, a goal such as C=E+1 in this particular system evaluates the arguments of the equality before unifying them.

The more expected translation of the original SQL statement is got from another (quasi) equivalent formulation:

DES> CREATE VIEW temp (n, fact) AS 
(SELECT 0, 1 -- Initial Subquery
  UNION ALL 
 SELECT n+1, (n+1)*fact FROM temp -- Recursive Subquery 
 WHERE n < 9);

Info: SQL statement compiled to:

temp(0,1).
temp(A,B) :-
  temp(C,D),
  C<9,
  A=C+1,
  B=(C+1)*D.

This SQL syntax for recursive queries seems (to me) more natural, though a view must be created for the self-reference. It is not supported in the standard.

We have proposed students to solve SQL puzzles for fun, and wrote a paper around this:
Experiencing Intuitionistic Logic Programming in SQL Puzzles

1 Like

I’ve just expanded it to work through union and intersection, and I’ve found the Prolog versions really nifty, and led me to some enlightenment on the relation between set and logic theory.

Union

select cName as name from College
union
select sName as name from Student;
college(Name, _, _) ; student(_, Name, _, _). 

Intersection

select sID from Apply where major = 'CS'
intersect
select sID from Apply where major = 'EE';
apply(SID, _, 'CS', _) , apply(SID, _, 'EE', _)

I recall reading a history of Augustus De Morgan which said what are commonly known as De Morgan’s laws didn’t originally apply to logic, but to set theory. (Furthermore, he never claimed to have thought of them first, but introduced them to the west by studying eastern mathematics).

What these illustrate is that A ∪ B can generally be replaced by A ∨ B, and A ∩ B by A ∧ B, which was news to me.

I think the formalization of set union in set theoretic terms shows the connection – if you had this in mind:

A ∪ B = {∀ t / t ∈ A ∨ t ∈ B}

I think what is left implicit in your code is the forall – SQL returns all tuples whereas in Prolog you ask to request them one by one.

Jennifer Widom gives the database courses I’m redoing in Prolog which my notes are based on. I’ve also found Jeffrey Ullman to be a brilliant teacher. I did his MooC on automata.

But please don’t just write the atom null there! :wink: One possible solution is to use [] for this, since it isn’t an atom but atomic. It still compares equal to itself, though, which is a problem.

What about representing NULL as a Prolog variable with an attribute that stops it from unifying to anything? The main drawback is that it is hard to write. You get

 table1(e, NULL, f)  :- null(NULL).

With (untested):

null :- put_attr(null, true).

null:attr_unify_hook(_,_) :- fail.

Moving on further, I’ve done the exercise on what SQL calls except, ie what’s written in maths as A - B and called difference textbooks:

select sID from Apply where major = 'CS'
except
select sID from Apply where major = 'EE';

I initially went on a big tangent, but after an aha moment realised this is really simple in Prolog:

apply(SID, _, 'CS', _), \+apply(SID, _, 'EE', _).

I’m now busy with subqueries in the where clause, which doesn’t really tie into Prolog that clearly. This example taught me a few things:

select cName
from College C1
where not exists (select * from College C2
                  where C2.enrollment > C1.enrollment);
college(CName1, _, _),
\+ (    college(CName1, State1, Enrollment1),
        college(CName2, State2, Enrollment2),
        Enrollment2 > Enrollment1    ).

What I learnt here:

  1. \+(A, B, C ...) doesn’t parse, ie a space is needed as in \+ (A, B, C, ...).
  2. I’ve no idea what the style convention is for when you need to group ands, as
    in the part after + between brackets.
  3. I would have abused lists by using max_list (+List:list(number), -Max:number) without the SQL example to show how this can be done with set notation.

Good point. I’ve rewritten it thus:

college(CName1, _State1, _Enrollment1),
\+ (    college(_CName2, _State2, Enrollment2),
        Enrollment2 > _Enrollment1    ).

Keep in mind that the SQL statement that you give to a relational database is rewritten somehow. This is what @j4n_bur53 was trying to show with the execution plan.

As far as I understand, a Prolog program, in contrast to an SQL query, does not magically reorganize itself. Maybe indexing changes things.

Someone correct me if I am wrong, but my impression is that a Prolog program that has the same structure as an SQL query will not have the same operational semantics. On the relational database side, the way a query is executed depends on the current state of the database: indexes, gathered statistics on the data, and so on.

Recall that SQL works with duplicates and, thus, implementing set operations in any language need some considerations and may not be as straightforward as it seems. For example, the following is run on PostgreSQL (using DES as a front-end):

DES> CREATE TABLE t(a int);
DES> INSERT INTO t VALUES (1),(1),(1);
Info: 3 tuples inserted.          
DES> CREATE TABLE s(a int);
DES> INSERT INTO s VALUES (1),(1);
Info: 2 tuples inserted.          
DES> SELECT * FROM t UNION SELECT * FROM s;
answer(a:integer(4)) ->
{
  answer(1)
}
Info: 1 tuple computed.          
DES> SELECT * FROM t UNION ALL SELECT * FROM s;
answer(a:integer(4)) ->
{
  answer(1),
  answer(1),
  answer(1),
  answer(1),
  answer(1)
}
Info: 5 tuples computed.          
DES> SELECT * FROM t INTERSECT SELECT * FROM s;
answer(a:integer(4)) ->
{
  answer(1)
}
Info: 1 tuple computed.          
DES> SELECT * FROM t INTERSECT ALL SELECT * FROM s;
answer(a:integer(4)) ->
{
  answer(1),
  answer(1)
}
Info: 2 tuples computed.          
DES> SELECT * FROM t EXCEPT SELECT * FROM s;
answer(a:integer(4)) ->
{
}
Info: 0 tuples computed.          
DES> SELECT * FROM t EXCEPT ALL SELECT * FROM s;
answer(a:integer(4)) ->
{
  answer(1)
}
Info: 1 tuple computed.

The same operations, as translated and processed by DES with its Datalog engine, are the following:

DES> CREATE TABLE t(a int);
DES> INSERT INTO t VALUES (1),(1),(1);
Info: 3 tuples inserted.          
DES> CREATE TABLE s(a int);
DES> INSERT INTO s VALUES (1),(1);
Info: 2 tuples inserted.          
DES> SELECT * FROM t UNION SELECT * FROM s;
Info: SQL statement compiled to:
answer(A) :-
  distinct(answer_1_2(A)).
answer_1_2(A) :-
  t(A).
answer_1_2(A) :-
  s(A).
answer(t.a:int) ->
{
  answer(1)
}
Info: 1 tuple computed.          
DES> SELECT * FROM t UNION ALL SELECT * FROM s;
Info: SQL statement compiled to:
answer(A) :-
  t(A).
answer(A) :-
  s(A).
answer(t.a:int) ->
{
  answer(1),
  answer(1),
  answer(1),
  answer(1),
  answer(1)
}
Info: 5 tuples computed.          
DES> SELECT * FROM t INTERSECT SELECT * FROM s;
Info: SQL statement compiled to:
answer(A) :-
  distinct((answer_1_1(A),answer_1_3(A))).
answer_1_1(A) :-
  t(A).
answer_1_3(A) :-
  s(A).
answer(t.a:int) ->
{
  answer(1)
}
Info: 1 tuple computed.          
DES> SELECT * FROM t INTERSECT ALL SELECT * FROM s;
Info: SQL statement compiled to:
answer(A) :-
  group_by(t(A),[A],B=count),
  group_by(s(A),[A],C=count),
  D=min(B,C),
  answer_1_4(_E,D).
answer_1_4(1,_A).
answer_1_4(A,B) :-
  answer_1_4(C,B),
  C<B,
  A=C+1.
answer(t.a:int) ->
{
  answer(1),
  answer(1)
}
Info: 2 tuples computed.          
DES> SELECT * FROM t EXCEPT SELECT * FROM s;
Info: SQL statement compiled to:
answer(A) :-
  distinct((answer_1_1(A),not answer_1_3(A))).
answer_1_1(A) :-
  t(A).
answer_1_3(A) :-
  s(A).
answer(t.a:int) ->
{
}
Info: 0 tuples computed.          
DES> SELECT * FROM t EXCEPT ALL SELECT * FROM s;
Info: SQL statement compiled to:
answer(A) :-
  group_by(t(A),[A],B=count),
  group_by(s(A),[A],C=count),
  D=max(B-C,0),
  answer_1_4(_E,D).
answer(A) :-
  t(A),
  not s(A).
answer_1_4(1,A) :-
  A>0.
answer_1_4(A,B) :-
  answer_1_4(C,B),
  C<B,
  A=C+1.
answer(t.a:int) ->
{
  answer(1)
}
Info: 1 tuple computed.

Jan, yours is a good proposal for dealing with nulls. However, nulls are quite a weird concept in the SQL arena. For example, an autojoin could be written as:

?- table1(_,NULL1,_), table1(_,NULL2,_), NULL1 == NULL2.
false.

Which is fine, but the following:

?- table1(_,NULL1,_), table1(_,NULL2,_), NULL1 \== NULL2.
put_attr(NULL1, user, true),
put_attr(NULL2, user, true).

should also fail because a comparison involving nulls should always fail. In SQL, such conditions always returns false.

2 Likes

I am not sure about what is the question about DES. But if it refers to what execution plans or optimisations are applied, then it does not do much at all because it is not intended as a production system. Partial deductions, folding/unfolding and simplifications are some techniques used during translating SQL to Datalog, but nothing really geared towards performance.

DES translates SQL statements into Datalog, but not into Prolog (this could be an idea to explore and develop). Instead, the above translation is executed by the DES tabled-based deductive engine.

Unfortunately, this system does not apply optimisations such as the naive differential optimisation for recursive queries that could be applied but still remains as future work.

I think that that translation can be easily adapted to work as a SWI-Prolog program by declaring temp as a tabled predicate and changing some ‘=’ by ‘is’.

Yes, it is possible. Moreover, you can mix several languages, such as in:

DES> :-persistent(t(a:int), access)
DES> create table s(a int)
DES> select * from t,s
Warning: [Sem] Missing join condition for [t,s].
answer(t.a:int,s.a:int) ->
{
}
Info: 0 tuples computed.

By the way, the warning is issued by the semantic checker. Details can be found in:
[Sae19c] [BibTeX] F. Sáenz-Pérez, “ Applying Constraint Logic Programming to SQL Semantic Analysis

1 Like