I’ve had my first encounter with NULL while redoing the Stanford SQL MOOC with this problem:
```sql
select sName, sID, cName, major
from Student left outer join Apply using(sID);
```
There are four students in the Student table who aren’t in the Apply table, and the requirement is not to ignore them (as a normal join would do) but include them with their cName and major set to NULL.
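In Prolog, the same left-outer-join behaviour can be sketched directly. This is only an illustration; the `student/3` and `apply/4` fact schemas below are my assumption, not the course’s actual tables:

```prolog
% Assumed fact schemas (not the course's actual tables):
%   student(SID, SName, GPA), apply(SID, CName, Major, Decision).
% Every student produces at least one row; students with no matching
% apply/4 fact get placeholder values instead of being dropped.
student_row(SName, SID, CName, Major) :-
    student(SID, SName, _GPA),
    (   apply(SID, CName, Major, _Decision)
    ;   \+ apply(SID, _, _, _),
        CName = null, Major = null      % the atom null, discussed below
    ).
```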
I will bring this up again, but I promise it is the last time: do not use the atom null for representing a missing value, especially if any other column is represented using atoms. Otherwise the text “null”, represented as the atom null, becomes exactly the same term as a missing value.
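A minimal illustration of the collision, using hypothetical `value/2` facts:

```prolog
% One row where null means "missing", one where the stored text is "null":
value(1, null).      % intended: no value
value(2, null).      % intended: the actual text "null"

% The two intentions are now the same term:
?- value(1, V1), value(2, V2), V1 == V2.   % succeeds
```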
I am not sure what to use instead. The atomic, not-an-atom [] is one option for SWI-Prolog, but it has other problems. See the comments by @jan and @j4n_bur53.
In another domain I had the problem of a “missing” value that would cause a failure if not taken care of. I decided to introduce a custom-rolled optional value based on a dict.
Something like below:
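The original code isn’t shown here; the following is only my guess at what such a dict-based optional could look like, with invented names:

```prolog
% ok{value:V} wraps a present value; err{reason:R} carries an error atom.
opt_just(V, ok{value:V}).

% Apply Goal inside the optional; errors pass through untouched, and a
% failing Goal turns into an error value (a monadic "map").
opt_map(Goal, ok{value:V0}, Out) :-
    !,
    (   call(Goal, V0, V)
    ->  Out = ok{value:V}
    ;   Out = err{reason:goal_failed}
    ).
opt_map(_Goal, err{reason:R}, err{reason:R}).
```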
Errors would be some error message atoms created and, eventually, checked, throughout.
Not sure how appropriate this is in an SQL context, but it could show that Prolog can work with optionals, including via monads.
I think the correct solution for null (which has its own problems in SQL – see Errors considered harmful - #102 by peter.ludemann) is to do something like Haskell’s Maybe = Just a | Nothing … that is, a non-defaulty representation. (In Prolog, this would be just(X) and nothing, for example, which can always be distinguished.)
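Sticking with the Student/Apply example (the `apply/4` schema is my assumption), the non-defaulty representation could look like:

```prolog
% just(CName) and nothing are distinct from any stored value,
% unlike the atom null.
student_cname(SID, just(CName)) :-
    apply(SID, CName, _Major, _Decision).
student_cname(SID, nothing) :-
    \+ apply(SID, _, _, _).
```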
Although, when designing a database, it’s often possible to avoid null completely: specify not null for a column when creating a table in SQL, and specify a suitable default value for strings (typically a zero-length string).
The first relational database I used seriously was Oracle DB; in Oracle, a zero-length string is in practice a NULL. Imagine my shock when I discovered that not all databases do that.
Of course missing values can be avoided completely: just normalize your data. The SQL null is just a pragmatic concession to not normalizing it. There are, in turn, many practical reasons not to normalize your data; enough has been written on the topic and I am definitely not an expert.
PS: faking a missing value instead of using null, just because the column has been defined as “not null”, doesn’t sound right.
Yes, but the person and whatever number they have are separate entities. Not having a serial number does not make you less of a person.
This is not what I meant by “normalizing your data”. Fully normalizing the data would mean that you have a table person, and another table linking the person to some number. There will be simply no row in that other table for people that have a missing number.
But I am certain I am not telling you anything you didn’t know.
One major practical reason for allowing missing values in a relational database is that you can denormalize your data, and keep pieces of data that are usually used at the same time in close physical proximity to each other. So, getting the data is faster/cheaper. But again, those are obvious things I guess?
No, they are not supported. I think there is no need for them for what DES is targeted to. The number of columns in a predicate depends on the host Prolog system. Currently, it runs in both SICStus and SWI-Prolog. Ciao and GNU Prolog were also supported up to series 3.
Maybe techniques such as vertical partitioning, as found in big-data systems (e.g., Hadoop/Hive), might be appropriate for efficiently dealing with large amounts of data.
I agree. In DES, since there are no user compound terms, a null is internally represented as ‘$NULL’(Id), where Id is a unique identifier, so that two nulls are not equal. Moreover, comparisons need to receive a special treatment, as in:
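The DES source isn’t reproduced here; the following sketch only illustrates the idea that a '$NULL'(Id) never compares equal to anything, not even to itself:

```prolog
% Sketch only, not DES's actual code: SQL comparisons involving a
% null are "unknown", so an equality test simply fails.
sql_eq(X, Y) :-
    X \= '$NULL'(_),     % fails if X is a null
    Y \= '$NULL'(_),     % fails if Y is a null
    X == Y.
```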
I made the mistake of using Facebook’s GraphQL in the past. It’s utter shite nobody should touch with a hazmat suit and a bargepole.
I wasted hours trying to figure out Facebook’s “technology” clearly developed by a bunch of pimply 11-year-olds who can’t document zilch, only to see Facebook’s “standard” completely rewritten by lawyers with zero computer science after the Cambridge Analytica scandal (thereby breaking everyone’s code).
Long story short, friends don’t let friends use “technology” developed by Facebook, Google, Amazon, Microsoft, IBM…
Nice job. I see a lot of findall/3 with sort/2 and sort/4. This works, but breaks the uniformity of solutions being answers to the query. library(solution_sequences) avoids this, providing distinct/1,2, order_by/2, offset/2, limit/2, etc. This library was added to make translating SQL-like queries easier.
Note that the SWISH menu on queries allow you to quickly add some of these wrappers to the current query.
I wasn’t aware of the “Modify solution sequences” library, so many thanks for the pointer. I’m in the process of rewriting the tutorial to use them. So far, I translated:
```sql
select distinct sName, major
from Student, Apply
where Student.sID = Apply.sID;
```
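For reference, one possible rendering with distinct/2 (the `student/3` and `apply/4` fact schemas are my assumption, not the course’s actual tables):

```prolog
:- use_module(library(solution_sequences)).

% select distinct sName, major from Student, Apply
% where Student.sID = Apply.sID;
distinct_name_major(SName, Major) :-
    distinct(SName-Major,
             ( student(SID, SName, _GPA),
               apply(SID, _CName, Major, _Decision)
             )).
```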
The witness can be any term. If there are no variables, or the variables do not share with the goal, you only get one answer. Otherwise it simply uses the variables in the term, regardless of the term’s shape.
I would look out for examples that can’t be done with SQL, or are much more elegant in Prolog.
I had some experience in the past with heavy SQL statements. The differences that matter from my point of view include:
writing a complex SQL statement with nesting, existence checks, various join types (inner, outer, left, right) can be very painful and consume several hours to get it right
in Prolog, if you don’t get the expected results, you start the tracer and step through the “joins” = “unifications” one by one. In SQL your best choice is to break the statement into pieces, debug those separately, and then put them together again, which is quite time-consuming
if your SQL is part of an application those additional hours don’t really matter, it’s more important what database is in operation serving concurrent users, which DB drivers you may use etc.
on the other hand, if you try to stitch together your personal information base from heterogeneous sources, where others use MS Excel or MS Access, and use it ad hoc whenever someone asks for a particular view/extract of the data, you can’t use SQL. You never get the time to consolidate your data to the degree where an SQL query would generate clean output. Your only choices are some kind of Excel thing or Prolog. Excel is easier at the beginning, but becomes painful because of redundancies and manual errors as your data collection grows. In Prolog you need more time and thought at the beginning to organize your data, but later you can produce reliable output with decent effort. Normally, towards the end of a project, you are under a lot more pressure, so Prolog worked well for me in those conditions.
use cases where this worked out well for me: dependency analysis of legacy code, identifying missing or inconsistent application documentation (architecture, components, interfaces, etc.), specifying network topology such as cabling between devices and switches, routing, firewall rules, etc.