What's the difference between Prolog and Datalog?

I’m staring a fork of the What’s is up with the cut? thread here, since possibly the main difference between Prolog and Datalog is Datalog forbids cuts.

Confession: I’ve heard of Datalog via things like Robert Kowalski’s History of Logic Programming and Jeffrey Ullman’s Bottom-up beats Top-Down, but can’t claim to really understand the distinction.

My guess is implementation: Datalog is pitched as a direct competitor to SQL, which means it is primarily intended for permanent as opposed to dynamic storage. (This is generally a very vague topic, since SWI Prolog has SWI-Prolog -- persistent/1 while there’s SQLite which I’m a bit concerned many users don’t realise is a teaching tool, not a proper SQL database).

Since Datalog is described as a subset of Prolog, I’ve never bothered learning it properly, simply assuming that if I don’t use cuts, nested compounds, and I’m not sure what else, I can simply use Prolog.

However, things like magic sets indicate that under the hood, Datalog is not supposed to be just a stripped down Prolog.

At least one implementation of Datalog, DES, is promoted in this group, but I’ve no experience with any of them. So one question I have is, are there are open source versions of Datalog which offer any advantages to using SWI Prolog with persistence or its ODBC interface to Postges or whatever?

Persistency is orthogonal to the different execution model. Note that tabling provides all the advantages of Datalog. The downside, as with other more pure logical subsets of Prolog, is that the border is not entirely clear. Valid Datalog programs are handled fine by Prolog after adding the necessary :- table directives. A much larger set of programs can be handled correctly by tabled programs though, i.e., having function symbols is fine, provided the rules do not allow them to grow unbounded. Using high-order programming is fine (not sure here whether Datalog has that). Even cuts can be used, although under the restriction that they do not limit solutions in a so called strongly connected component (SCC), i.e., a set of goals involved in mutual recursion.

Datalog engines have been optimized for the job much further than (especially) SWI-Prolog, so when it comes to scaling they probably win on most dimensions.

2 Likes

I guess the datalog community wants a language that is decidable and (i guess) tractable by “design” – i.e. the end user can not express something that will render a query undecidable or non-tractable.

The overall motivation I guess, is that queries, written by mere-mortal data scientists should always return with a result, and return in a reasonable amount of time.

I think that the main difference between Prolog and Datalog is that the latter is not meant as a general-purpose language; rather, it is targeted to querying relational databases where one typically wants all solutions from a query. This also means that compound terms are not allowed because they are not described in the relational model. Thus I agree with joeblog’s guess: “Datalog is pitched as a direct competitor to SQL”.

As Jan says, Datalog systems are optimized for typical operations such as the transitive closure (e.g., with the naive differential optimization, c.f. paper about DLV), non linear recursion and persistence (Semmle, LogicBlox).

In turn, DES was born as an educational system and is not (performance-wise) competitive with other systems, but it offers some features that might be appealing. In response to joeblog, this system allows both predicate persistence in external databases, and access to ODBC connections. In this last case, external tables and views are seen as Datalog predicates and can be seamlessly queried from Datalog. Also, other languages such as (recursive) SQL and relational algebra can be used. The main features of DES can be consulted in this page, but one recent and interesting addition is the graphical SQL declarative debugger.

6 Likes

The major difference between Prolog systems and Datalog systems is possibly that Datalog systems might do query planning for you. So for a query or inside rules it might re-order goals whatever before execution. Just like SQL does.

A small example of reordering, not sure whether a Datalog system can really reorder it, but just to show execution time differences. Even second argument indexing cannot help:

enrolled(anna, 1993).
enrolled(bob, 1994).
enrolled(carla, 1993).

rector(1989, paul).
rector(1990, peter).
rector(1991, jack).
rector(1992, jill).
rector(1993, hector).
rector(1994, nestor).

?- time((between(1,1000000,_), enrolled(X,Y), rector(Y,Z), fail; true)).
% 5,000,000 inferences, 0.264 CPU in 0.266 seconds (99% CPU, 18955334 Lips)
true.

?- time((between(1,1000000,_), rector(Y,Z), enrolled(X,Y), fail; true)).
% 8,000,000 inferences, 0.504 CPU in 0.509 seconds (99% CPU, 15860177 Lips)
true.