This sounds really great! Just a word of warning:
This is a rabbit hole, in a way. Take a simple question like “What is the longest line/record that you have?”
If your database is in a file, you can use a combination of GNU `wc` and `awk` to answer that:
```
$ wc -L /usr/share/dict/words
24 /usr/share/dict/words
$ awk '{ if (length($0) == 24) print }' /usr/share/dict/words
formaldehydesulphoxylate
pathologicopsychological
scientificophilosophical
tetraiodophenolphthalein
thyroparathyroidectomize
```
You could do the same with awk alone, but it is a bit more roundabout.
With SQL it is kinda straightforward. Using SQLite (no need to set anything up):
```
$ sqlite3
SQLite version 3.32.3 2020-06-18 14:16:19
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> Create table words ( w text );
sqlite> .import /usr/share/dict/words words
sqlite> Select w from words where length(w) = ( Select max(length(w)) from words );
formaldehydesulphoxylate
pathologicopsychological
scientificophilosophical
tetraiodophenolphthalein
thyroparathyroidectomize
```
Now the question: how do you do this in Prolog? And the answer is: “it depends”.
To set it up similarly to the SQLite example above, you could use this:
```prolog
?- setup_call_cleanup(open('/usr/share/dict/words', read, In),
                      read_string(In, _, Str),
                      close(In)),
   split_string(Str, "\n", "", Words),
   forall(member(W, Words), assertz(w(W))).
```
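Reading the whole file into one string and splitting it is only one way to load the words. Below is a minimal sketch of a line-at-a-time alternative, assuming SWI-Prolog and `read_line_to_string/2` from `library(readutil)`; the predicate names `assert_lines/1` and `load_words/1` are made up for this example. One side effect worth knowing: `split_string/4` on text that ends with a newline (as text files usually do) yields a trailing empty string, so the query above also asserts `w("")`, while this loop does not.

```prolog
:- use_module(library(readutil)).   % read_line_to_string/2

:- dynamic w/1.

% assert_lines(+Stream): read Stream one line at a time and assert each
% line as a w/1 fact. read_line_to_string/2 strips the line terminator
% and returns the atom end_of_file once the input is exhausted.
assert_lines(Stream) :-
    read_line_to_string(Stream, Line),
    (   Line == end_of_file
    ->  true
    ;   assertz(w(Line)),
        assert_lines(Stream)
    ).

% load_words(+File): hypothetical convenience wrapper that opens File,
% loads it, and always closes the stream, even on error.
load_words(File) :-
    setup_call_cleanup(open(File, read, In),
                       assert_lines(In),
                       close(In)).
```

Usage: `?- load_words('/usr/share/dict/words').` The file is never held in memory as a single string, at the cost of one predicate call per line.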
OK, now we have all the words in a table `w/1`. How do you query for the longest word(s)?
Like this:
```prolog
?- w(W),
   string_length(W, N),
   \+ ( w(W0),
        string_length(W0, N0),
        N0 > N
      ).
```
Or maybe like this:
```prolog
?- aggregate_all(max(N0), ( w(W0), string_length(W0, N0) ), N),
   w(W),
   string_length(W, N).
```
Or maybe like this:
```prolog
?- aggregate_all(max(N0), ( w(W0), string_length(W0, N0) ), N),
   findall(W, ( w(W), string_length(W, N) ), Words).
```
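If the words are already in a list (for example `Words` as produced by `split_string/4` in the setup query), it is not strictly necessary to assert them at all. Here is a sketch of a single pass over the list with `foldl/4`, keeping the current maximum length together with the strings that have it; `longest_strings/3` and `keep_longest/3` are hypothetical names, not library predicates.

```prolog
:- use_module(library(apply)).   % foldl/4
:- use_module(library(lists)).   % reverse/2

% longest_strings(+Strings, -MaxLen, -Longest)
% One pass over Strings. The accumulator is MaxSoFar-Acc, where Acc holds
% the strings of length MaxSoFar seen so far, in reverse order.
% For an empty list, MaxLen = 0 and Longest = [].
longest_strings(Strings, MaxLen, Longest) :-
    foldl(keep_longest, Strings, 0-[], MaxLen-Rev),
    reverse(Rev, Longest).

% keep_longest(+S, +Max0-Acc0, -Max-Acc): update the accumulator with S.
keep_longest(S, Max0-Acc0, Max-Acc) :-
    string_length(S, N),
    (   N > Max0
    ->  Max = N, Acc = [S]             % new maximum: start a fresh list
    ;   N =:= Max0
    ->  Max = Max0, Acc = [S|Acc0]     % tie: keep this one as well
    ;   Max = Max0, Acc = Acc0         % shorter: ignore it
    ).
```

Usage: `?- longest_strings(Words, N, Longest).` This touches each word once instead of twice (once for the maximum, once for the lookup), but the accumulator logic is arguably less obvious than the `aggregate_all/3` versions.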
Is there a more efficient way to do it? A more obvious way to do it? Did we have to insert all words into the database? Would there be a better way to read the file and assert the words? Did we choose the correct data type?
It depends, I guess.
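For instance, on the data-layout question: if “longest word” is a query you run often, one option is to compute the length once at load time and store it as the first argument of the fact. A sketch under that assumption; `wl/2`, `add_word/1` and `longest_words/2` are hypothetical names. Finding the maximum still visits every fact, but the final lookup `wl(Max, W)` can use SWI-Prolog's first-argument indexing instead of calling `string_length/2` on every word again.

```prolog
:- use_module(library(aggregate)).   % aggregate_all/3

:- dynamic wl/2.

% add_word(+W): store W together with its length, as wl(Length, Word).
add_word(W) :-
    string_length(W, N),
    assertz(wl(N, W)).

% longest_words(-Max, -Words): Max is the greatest stored length and
% Words are all words of that length, in insertion order.
% Fails if no words have been stored.
longest_words(Max, Words) :-
    aggregate_all(max(N), wl(N, _), Max),
    findall(W, wl(Max, W), Words).
```

Whether this is worth the extra argument depends on how the table is used; it is one more “it depends”.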
One approach would be to aggressively limit the scope of the course, so that you don’t have to get into such questions at all.