Inline subterm naming syntax (was: Syntax for unify-and-remember in head?)

Aha! This makes it fall into place - everything about the existing functionality makes perfect sense when you look at cyclic terms as bug rather than as feature :smiley: (I think at one point I knew that the “proper” Prolog behavior was to disallow creation of cyclic terms, but sometime between then and now I forgot that bit, ha.) And just like that our write/read contract is solid again, with the entirely reasonable caveat that it only holds for terms that are valid in standard Prolog. So I’m gonna downgrade this proposal from “omg why aren’t we doing this already” to “gosh it’d sure be nice if” :rofl:

And I do still think there’s value in it - in promoting rational trees from “unfortunate unavoidable side-effect” to “supported data structure” and consequently adding an unambiguous syntax to the representation to allow describing it - especially if what we want is for SWI-Prolog to be a language capable of describing (and thus analyzing, and thus optimizing) the internal behavior of software. (Consider: the internal, C-level data structures of swipl itself can themselves be modeled as Prolog terms, but only if cyclic and shared-subterm values are allowed.) You’re absolutely right that that’s a bigger discussion, though!

You may be interested to know that some of my inspiration for this comes from work I’ve been doing that wouldn’t even directly benefit from the syntax changes. In my quest for “more convenient programming of dicts” I wrote a library to support “formal dicts”, which are simply dicts with well-known, centrally-declared tags and key sets. They’re somewhat analogous to “classes” in OOP languages, in that by referencing them (“X is a book published in year Y”) you’re also implicitly referencing a bunch of other things (X also has an author, a title, a publisher, an ISBN, etc). It’s a prime use case for a dict (as I understand the dict use cases), but of course we run into the head-unification dilemma that originally inspired my partial dict RFC (linked above). In short:

A statement like “Cindy likes books written after 2000, anything by Heinlein, and anything published by Del Rey” could be represented with the following:

Example 1: Standard dicts with field operator
% Example 1: Standard dicts with field operator
book(book{title: "I Am Malala",
          author: 'Malala Yousafzai',
          publisher: 'Little, Brown & Co',
          year: 2013,
          isbn: 9780316322409,
          edition: 1}).

likes(cindy, Book) :- Book.year > 2000.
likes(cindy, Book) :- Book.author = 'Robert Heinlein'.
likes(cindy, Book) :- Book.publisher = 'Del Rey'.

% ?- book(B), likes(cindy, B).

Of course, since this has no head-unification, the SWI VM can’t optimize it at all, and this will get slower and slower the more “facts” (or, what should be facts but can’t be) are added to the database. To avoid that, you could name every key in the dict in every rule:

Example 2: Standard dicts with head unification
% Example 2: Standard dicts with head unification
likes(cindy, book{title: _, author: _, publisher: _, year: Y, isbn: _, edition: _}) :- Y > 2000.
likes(cindy, book{title: _, author: 'Robert Heinlein', publisher: _, year: _, isbn: _, edition: _}).
likes(cindy, book{title: _, author: _, publisher: 'Del Rey', year: _, isbn: _, edition: _}).

But of course, that gets massively unwieldy in the syntax, and that’s for a dict with only 6 keys. Even if you abandon dicts and just use vanilla compound terms, it’s a pretty heavy syntax:

Example 3: Compound terms with head unification
% Example 3: Compound terms with head unification

likes(cindy, book(_, _, _, Y, _, _)) :- Y > 2000.
likes(cindy, book(_, 'Robert Heinlein', _, _, _, _)).
likes(cindy, book(_, _, 'Del Rey', _, _, _)).

And of course, the legibility of this version is terrible and the maintainability is even worse. Want to add a new field to the book definition? Be sure to add , _ to the end of every single rule and fact in the source (and stylecheck won’t help you, since all of these rules are still likes/2). Want to remove a field? Good luck. But the actual code is much more efficient, since all of these can index on the head.

So I used term_expansion/2 to make a new syntax that allows predeclaring the keys that belong to a given dict tag, using :: as an infix operator:

Example 4: Formal dicts with head unification
% Example 4: Formal dicts with head unification
book::formal_dict_keys([title, author, publisher, year, isbn, edition]).

likes(cindy, book::{year: Y}) :- Y > 2000.
likes(cindy, book::{author: 'Robert Heinlein'}).
likes(cindy, book::{publisher: 'Del Rey'}).

This code is semantically identical to Example 2, but I think you’ll agree that the syntax is much more legible. The code is also much more maintainable, since the list of dict keys (like a list of class fields) is declared in just one place. And, since the unification is happening in the head, the Prolog VM can optimize the code that queries the database.

Performance note

As an aside, this approaches the performance I was envisioning for the first-class partial/formal dict support, though it doesn’t quite reach it. The place it falls short is that in this version, the VM must still compare every key name and ensure that it matches the skeleton. My thought was for a declaration like this to create, in this case, an arity-6 functor; this functor’s name would be, not an atom or a reserved symbol, but the compound term book(author, edition, isbn, publisher, title, year). As such, each of these head unifications would require only two checks: first, that the functor matches, and second, that the argument (which needs no runtime search when mentioned literally in the source like this) unifies.

There is one downside to this syntax, though, which is that it works wonderfully for cases like this where you want to specify some fields and silently discard the rest, but not so well when you want to match fields and you still want to have access to the full formal dict (to pass it along to another predicate expecting a formal dict, for example). This is where we see the “unify-and-remember in head” problem popping up again, because optimise_unify allows us to write the following if we, say, wanted to show a book record with special formatting if it was an old first edition:

Example 5: Formal dicts and optimise_unify flag
% Example 5: Formal dicts and optimise_unify flag
portray_book(Book) :-
    Book = book::{edition: 1, year: Y},
    Y < 1900,
    !,
    format('FIRST EDITION FROM ~w: ~p', [Book.year, Book]).
portray_book(Book) :-
    Book = book::{},
    format('Book: ~p', [Book]).

Of course, since I’m inventing the :: formal dict syntax, there’s nothing to stop me from adding something to it which also allows specifying a variable that represents the dict as a whole. So I did, extending the syntax from tag::{dict-keyval-syntax} to tag[::DictVar]::{dict-keyval-syntax} (with the [] meaning “optional”, not “Prolog list”). That gives us the following:

Example 6: Formal dicts with full-dict-variable syntax
% Example 6: Formal dicts with full-dict-variable syntax
portray_book(book::Book::{edition: 1, year: Y}) :-
    Y < 1900,
    !,
    format('FIRST EDITION FROM ~w: ~p', [Book.year, Book]).
portray_book(book::Book::{}) :-
    format('Book: ~p', [Book]).

This seems to have basically everything I’ve been looking for: partial dict syntax, unify-and-remember (at least for formal dicts), and optimal performance regardless of dict size. Right?

Well, that’s what I thought, but it turns out, not quite. And the reason why is directly related to (a) limitations imposed by doing this with term_expansion and (b) the compiler not knowing what to do with shared subterms. In my first attempt at implementing this full-dict-variable syntax, the expanded form looked basically like Example 5 above (with the keys written out in full), adding an explicit unification goal at the top of the body and taking advantage of the optimize_unify behavior to move the unification into the head.

However, this requires four (well, five really) implementations: one for a fact, one for a rule, one for a nonterminal, and one for an SSU rule (or two if supporting (?=>)/2, which I didn’t). And more importantly, we see this in the listing:

Example 6a: Implementation using optimise_unify
% Example 6a: Implementation using optimise_unify
:- op(90, xfy, ::).

kvs_dict({KVC}, Dict) :-
    comma_list(KVC, KVL),
    dict_create(Dict, _, KVL).
kvs_dict({}, _{}).

term_expansion(Term0, Term) :-
    Term0 = (portray_book(book::Book::KVS) :- Body),
    BSkel = book{author:_, edition:_, isbn:_, publisher:_, title:_, year:_},
    kvs_dict(KVS, BData),
    BData :< BSkel,
    Term = (portray_book(Book) :- (Book = BSkel, Body)).

% portray_book/2 from Example 6

?- listing(portray_book).
portray_book(Book) :-
    Book=book{author:_, edition:1, isbn:_, publisher:_, title:_, year:Y},
    Y<1900,
    !,
    '.'(Book, year, A),
    format('FIRST EDITION FROM ~w: ~p', [A, Book]).
portray_book(Book) :-
    Book=book{author:_, edition:_, isbn:_, publisher:_, title:_, year:_},
    format('Book: ~p', [Book]).

Since term_expansion happens prior to field/method handling and the unify is syntactically in the body, the field/method translator has no choice but to translate the reference to Book.year in the first clause literally, despite the fact that “we” know that Book.year has already been unified with Y.

So, rather than adding a synthetic goal (and having to deal with all the different formats, which I didn’t do in this example), I simply unified Book prior to compilation:

Example 6b: Implementation using a-priori unification
% Example 6b: Implementation using a-priori unification
term_expansion(Term0, Term) :-
    Term0 = (portray_book(book::Book::KVS) :- Body),
    Book = book{author:_, edition:_, isbn:_, publisher:_, title:_, year:_},
    kvs_dict(KVS, BData),
    BData :< BSkel,
    Term = (portray_book(Book) :- Body).

?- listing(portray_book).
portray_book(book{author:A, edition:1, isbn:B, publisher:C, title:D, year:Y}) :-
    Y<1900,
    !,
    format('FIRST EDITION FROM ~w: ~p',
           [ Y,
             book{ author:A,
                   edition:1,
                   isbn:B,
                   publisher:C,
                   title:D,
                   year:Y
                 }
           ]).
portray_book(book{author:A, edition:B, isbn:C, publisher:D, title:E, year:F}) :-
    format('Book: ~p',
           [ book{ author:A,
                   edition:B,
                   isbn:C,
                   publisher:D,
                   title:E,
                   year:F
                 }
           ]).

The field/method translator now has enough information to simplify out the field access, but now we run into problems with the compiler, which doesn’t know how to handle shared subterms - so it recreates the dict structure in full, doubling the length of the opcode list and significantly increasing the stack usage.

So yes, there you have it! It’s worth noting, the post-expansion clauses that get passed to the compiler in Example 6b are 100% identical to the clauses that get passed to the compiler in Example 5a, using subterm naming. So that’s where all of this - subterm naming, unify-and-remember, etc - started.


This actually dovetails with another idea I’ve had (no, I haven’t run out of ideas to radically modify the SWI VM, why do you ask :rofl:) - and that’s expanded control over operator semantics. At present, the only things you can do with operator definitions are “translate this operator and its operands into a compound Prolog term”. With expanded control, however, perhaps via a predicate op/4 or the like, we could describe additional behaviors, like “apply a name to this subterm” (your idea), or “introduce a line comment” (if you wanted to be able to use // for comments) or for that matter “don’t introduce a line comment” (if you wanted to be able to use % as modulo, like in C). Just like all other operator definitions, these would be local to a given module and only affect literal syntax - in other words, it would only take effect when read_term/2 was instructed to use that module’s operator definitions.