Inline subterm naming syntax (was: Syntax for unify-and-remember in head?)

Boris · September 2, 2021, 2:24pm

Thank you, I actually did try it out before posting Either way this would mean that the docs have a bit of a bug I guess?

jan · September 2, 2021, 2:47pm

Yes. The docs should at least point at !/0 Probably there is a bit too much to say about these control structures to describe them as a normal predicate. If you want to know about them as predicates use

?- edit(;).

The comment also explains why the true predicates exist.

dmchurch · September 2, 2021, 9:30pm

Apologies! I should have clarified: B_INLINEVAR is my own invention, to parallel the similarly-new H_INLINEVAR I mentioned in the previous bullet point. B_UNIFY_FIRSTVAR is defined on line 1243 of src/pl-vmi.c; the display_record/2 examples don’t give any examples of named subterms in the body, so let me provide you with one that shows the opcode (and by the way, I love your notes!):

shared_subterm(Term) :-
    Subterm = f(0),
    Term = term(Subterm, Subterm).

This actually comes from a small module I’ve made that provides a simulation (and only a simulation) of the subterm-naming functionality, using term_expansion/2 and expand_query/4. The simulation uses {Var} Term in place of (Var) Term so that it works with the SWI-Prolog parser (though note that “compatibility with the existing SWI-Prolog parser” is by definition a showstopping defect as far as this proposal is concerned, which is part of why this is only a simulation), under the cut:

inline_subterm_names.pl

:- module(inline_subterm_names, [
    increment_arg/2,
    shared_subterm/2,
    test_inline_names/0,
    op(650, fy, {})
]).
:- use_module(library(terms), [mapsubterms/3]).

extract_inline_subterm(ITerms, {Var} Subterm, Term) =>
    var(Var), % must be unused variable, but we can't actually enforce that
    Term = Subterm,
    arg(1, ITerms, ITail),
    setarg(1, ITerms, [Var = Term|ITail]).
extract_inline_subterm(_, _, _) => fail.

extract_inline_subterms(Term0, Term, Inlines) :-
    ITerms = inlines([]),
    arg(1, ITerms, Tail),
    mapsubterms(extract_inline_subterm(ITerms), Term0, Term),
    arg(1, ITerms, Inlines),
    Inlines \== Tail.

expand_inline_names(Term0, Term) :-
    extract_inline_subterms(Term0, Term, Inlines),
    maplist(call, Inlines).

:- multifile user:term_expansion/2, user:expand_query/4.
user:expand_query(Goal, Expanded, Bindings, ExpandedBindings) :-
    expand_inline_names(Goal, Expanded),
    Bindings = ExpandedBindings.

user:term_expansion(Term0, Term) :-
    expand_inline_names(Term0, Term).

report_success(Goal) :-
    (   Goal
    ->  format(' (SUCCESS)~n', [])
    ;   format(' (FAILURE)~n', [])
    ).

:- set_prolog_flag(optimise_unify, false).
increment_arg(inline, {Arg} f(InitVal)) :-
    NextVal is InitVal + 1,
    setarg(1, Arg, NextVal).

increment_arg(body_nonopt, Arg) :-
    Arg = f(InitVal),
    NextVal is InitVal + 1,
    setarg(1, Arg, NextVal).

:- set_prolog_flag(optimise_unify, true).
increment_arg(body_opt, Arg) :-
    Arg = f(InitVal),
    NextVal is InitVal + 1,
    setarg(1, Arg, NextVal).

increment_arg(nonunify, Arg) :-
    arg(1, Arg, InitVal),
    NextVal is InitVal + 1,
    setarg(1, Arg, NextVal).


:- set_prolog_flag(optimise_unify, false).
shared_subterm(inline, Term) :-
    Term = term({Subterm} f(0), Subterm).

shared_subterm(body_nonopt, Term) :-
    Subterm = f(0),
    Term = term(Subterm, Subterm).

:- set_prolog_flag(optimise_unify, true).
shared_subterm(body_opt, Term) :-
    Subterm = f(0),
    Term = term(Subterm, Subterm).

shared_subterm(nonunify, Term) :-
    Term = term(f(0), 'INVALID'),
    arg(1, Term, Subterm),
    nb_linkarg(2, Term, Subterm).

test_inline_args(Mode) :-
    TestArg = f(0),
    format('Calling increment_arg(~p, ~p), current stored value is 0...', [Mode, TestArg]),
    increment_arg(Mode, TestArg),
    TestArg = f(Final),
    format('final argument is ~p, with stored value ~p (expecting 1).', [TestArg, Final]),
    report_success(Final == 1).

test_shared_subterms(Mode) :-
    shared_subterm(Mode, Term),
    arg(1, Term, Sub1),
    arg(2, Term, Sub2),
    arg(1, Sub1, Sub1Arg),
    arg(1, Sub2, Sub2Arg),
    format('Created term for mode ~p is ~p, subterm args are ~p and ~p, testing same_term(~p, ~p)...', [Mode, Term, Sub1Arg, Sub2Arg, Sub1, Sub2]),
    report_success(same_term(Sub1, Sub2)),
    format('  ...incrementing subterm 1 arg using increment_arg(nonunify, ~p)...', [Sub1]),
    increment_arg(nonunify, Sub1),
    arg(1, Sub1, Sub1Arg_),
    arg(1, Sub2, Sub2Arg_),
    format('new term ~p, subterm args ~p and ~p (expecting 1 and 1).', [Term, Sub1Arg_, Sub2Arg_]),
    report_success((Sub1Arg_ == 1, Sub2Arg_ == 1)).

test_inline_names :-
    test_inline_args(body_nonopt),
    test_inline_args(body_opt),
    test_inline_args(nonunify),
    test_inline_args(inline),
    nl,
    test_shared_subterms(body_nonopt),
    test_shared_subterms(body_opt),
    test_shared_subterms(nonunify),
    test_shared_subterms(inline).

This has some builtin test cases showing a couple tasks with various implementation modes, which you can run as test_inline_names:

inline: the inline subterm naming proposal, which obviously fails since SWI-Prolog doesn’t support it.
body_opt: separated subterm unification in the body, with optimise_unify turned on. This is a close representation of the VM code that the SWI compiler would produce for inline if it were supported.
body_nonopt: identical to above, but with optimise_unify turned off.
nonunify: an implementation that avoids unification entirely by using setarg/3 etc.

And, while the SWI compiler doesn’t support inline naming and thus none of the compiled inline predicates will function, you can see how they would function by putting the clause bodies directly into your query, since that sidesteps those compiler limitations:

?- ( % body of shared_subterm(inline, Term):
       Term = term({Subterm} f(0), Subterm)
   ),
   arg(1, Term, Sub1),
   arg(2, Term, Sub2),
   same_term(Sub1, Sub2), % both args are the same subterm
   Sub1-Sub2 == f(0)-f(0), % both subterms have argument 0

   Subterm = Arg, Arg = f(InitVal), % arg unification, can't emulate in single query :(
   ( % body of increment_arg(inline, Subterm)/1
       NextVal is InitVal + 1,
       setarg(1, Arg, NextVal)
   ),
   Sub1-Sub2 == f(1)-f(1). % both subterms now have argument 1

Term = term(f(1), f(1)),
Subterm = Sub1, Sub1 = Sub2, Sub2 = f(1),
InitVal = 0,
NextVal = 1.

In my eyes, this sentence says it all, really. The way I interpret it is something along these lines:

There is a commonly-known impedance mismatch (if you’ll forgive an electronics engineering term) between what is representable in the Prolog runtime versus what is representable in the Prolog syntax.
This mismatch causes problems that are serious enough that implementations have devised workarounds that allow them to operate properly in the presence of such terms while still adhering to standard Prolog syntax in their input/output.
This mismatch is common enough that such workarounds have been created not just once, but multiple times.

Among the consequences of this are two I consider significant:

If a Prolog implementation serializes a pure-Prolog value using write_canonical/1, and that value does not have any cyclic subterms, then another Prolog implementation can probably read that serialization using read/1 and reconstruct the same term or a variant thereof.
If a Prolog implementation serializes such a value using write_canonical/1, and that value DOES have cyclic subterms, then another Prolog implementation can probably read that serialization using read/1, but the reconstructed term will NOT be the same.

Personally, I consider #2 to be an underspecification failure. In my opinion, the only two proper outcomes of a write_canonical/1 ⇒ read/1 round-trip should be (a) the receiving application successfully reconstructs (a variant of) the sender’s term, or (b) the receiving application throws an error from read/1.

For basically every data type other than “cyclic term”, SWI-Prolog adheres to this round-trip contract of “naive/pure implementation parses correctly or doesn’t parse at all”:

Data type	SWI-Prolog write_canonical/1	SWI read/1	Naïve read/1
dict	`mydict{key:value}`	Success	Syntax error
blob	`<clause>(0x5628bc649000)`	Syntax error¹	Syntax error¹
string	`"hello world"`	Success	SMIR²
mpz	`1234567890123…890`	Success	SMIR or overflow error
term	`f(A,A)`	Success	Success
cyclic term	`@(_30,[_30=f(_30)])`	Success iff `cycles(true)`	succeeds with wrong term³

- contract upheld
- contract partially upheld
- contract violated

¹ Blobs are by nature write-only, though as an edge case this turns into a successful read (and, arguably, a contract violation) if the receiver has < defined as a prefix operator.
² “success modulo internal representation” - it’s the same value but the receiving system can’t put it into the same form because of storage limitations, like “has no concept of string values” or “floating-point format doesn’t have that much precision”. Another Prolog implementation may parse "hello world" as a list of codes or char atoms, but it’s still ten lowercase English letters with a single space in the middle.
³ The written term is a compound term with arity 1, name f, and whose first argument is also a compound term. A naïve read (including a read from SWI-Prolog, unless cycles(true) is passed to read_term/2) generates a compound term with arity 2, name @, whose first argument is unbound. Contrast the example directly above this one, which will always be a compound term with name f, arity 2, both arguments unbound but sharing identity. Also, consider that write_canonical/2 doesn’t serialize dicts as @(D_1, [dict_create(D_1, mydict, [key-value])]), despite that also being a valid way to represent a SWI dict term.

I would love if other Prolog systems implemented this! (Not necessarily the compiler optimizations, I don’t have any stake in how their engines work, but the named-subterm syntax yes.) Heck, I personally think it ought to go into the ISO standard if they ever get around to updating it. However, I don’t have any contact with any of those people, so I’m saying it here instead

It’s worth noting that, as far as pure-logic Prolog is concerned, all the compiler changes I mentioned are only optimizations and have no semantic impact whatsoever—the only time this makes any semantic difference is when dealing with extra-logical predicates like same_term/2 or setarg/3. And, given that they are optimizations, I would have quite happily written the compiler/engine code to perform them already, except that there’s no syntax I can use to express the basic concept in the source! Then I started to ask myself, “wait, these are all internally-consistent and valid Prolog terms, so why can’t I represent them in the source…?” with the end result you see here

jan · September 3, 2021, 12:01pm

Thanks for the write up. It makes a lot of sense. There is, as usual, also some context and history First of all, shared subterms and cyclic terms are very different beasts. We should discuss them separately. Let me start with the cyclic ones.

A cyclic terms doesn’t exists in the logical reading of Prolog. X = f(X) should simply fail and thus there is no way to create such a beast. Unfortunately this is unpractical as it cannot be implemented efficiently. ISO introduced unify_with_occurs_check/2. SWI-Prolog added the occurs_check flag that you can use to do the classical Prolog thing, implicitly check for cycles everywhere and then either fail (the logic meaning) or raise an exception (nice for debugging). In fact, many programs run fine with occurs check enabled, suffering only a barely measurable performance impact. It is still the case that it is not that hard to create programs that turn O(n^2) from O(n) and suffer badly. Such programs also exist in the wild. As a consequence, X = f(X) succeeds in virtually any Prolog system.

If I recall well there has been a write up on how various Prolog implementations the resulting cyclic term. That varies. A small minority (among SWI-Prolog and notably SICStus) behave pretty well. That means all built-in predicates do something sensible with a cyclic term, meaning they do not crash and do not go into an infinite loop (if not, this is a bug). The sensible thing is either a correct answer or an exception if there is no sensible answer. Some systems avoid at least the toplevel from filling your screen ad infinitum. Many crash or loop on simple things such as

?- X = f(X), Y = f(Y), X = Y.

Implemented properly, cyclic terms are not a huge challenge for the Prolog built-ins. There is one exception, and that is write/1 and friends. Ignoring the fact that the term is cyclic is obviously not an option. Raising an exception is not a great idea either So SICStus came up with the @(Skeleton, Unifications) solution that SWI-Prolog followed. The choice is (we have seen it) ambiguity or doing something outside the standard. SICStus is serious about the standard.

Now shared (sub) terms. There is no logical problem with those. Logically they are the same as a term where the shared subterm appears in multiple places. So

?- X = f(a), T = f(X,X), T == f(f(a), f(a)).

is true. In fact, there is no way to tell the two apart in pure Prolog. SWI-Prolog is (possibly) the only system providing same_term/2 that allows you to figure out that two terms actually live on the same address. That is mainly there to assess the consequences of setarg/3 and friends. Typically it is a good idea to add a variable at the end of a compound you use for setarg/3, so copy_term/2 actually creates a copy.

Because the terms are indistinguishable Prolog has some freedom. If you write the clause below, q/1 may be passed the argument you pass to p/1 or an identical new term may be passed. copy_term/2 may not really copy ground (sub) terms (SWI-Prolog does not create a copy for these). Even findall(X, member(X, [f(a)]), Xs) may or may not copy f(a). There have been discussions about GC discarding duplicate terms, sharing a single copy. I’m not aware of any system actually doing this. Besides finding the duplicates you must also prove they remain the same under backtracking.

p(f(X)) :- q(f(X)).

A lot of the built-ins act cleverly on terms with shared sub terms. That is primarily so because you get that for free if you want to handle cyclic terms without looping or crashing . As Prolog code cannot distinguish them from just identical subterms it just does its normal walk over the share terms, typically processing them multiple times.

All this means there is not much opportunity to really use shared subterms. They sometimes emerge. Sometimes some operation causes them to be copied, etc. Consistently using them to express something like a double linked list is possible, but really hard and in general non-portable. For all the above reasons with all the choices it is not that likely to even become portable as well. So called mutable data structures, though available in many Prolog systems, has been discussed for ages in the ISO committee without any hope for consensus

Where does this leave us? The above makes me wary about using sharing in clause terms for deciding how to optimize the clause and (thus) where exactly copying does (not) take place. Note that we can adjust read/1 as you propose. After read/1 though we get term and goal expansion that may easily replace your shared subterm with a copy.

We also have the contract between write/1 and read/1 that should ideally always result in a variant. Now duplicating a shared term maintains this relation, but for a cyclic term we are in trouble. As we have seen, we should typically not attach meaning to sharing, so this is no more than wasting bandwidth in communication and space on the receiving side. But then, terms with sharing can typically be avoided for communication purposes. What about cyclic terms? Although not widely supported, there are cases where they are useful. As is, we must carry them across using an option to read/2 and causing an ambiguity. A safe way to carry them across in portable way is not that relevant as most random Prolog systems will crash on them anyway

For SWI-Prolog you can use term_factorized/3 to reliably and efficiently transfer cyclic and shared terms. This predicate smashes identical sub terms. There is also an internal version that does the same trick but returns the existing physical sharing rather than the logical one (and is a lot faster ).

As for Prolog programmers, what is shared and what now should not matter. You can write your clause using explicit unification to indicate you demand sharing. Although not implemented, I think the Prolog compiler should be allowed to use sharing for identical sub terms in a clause (as done by at least XSB). If you really want no sharing, add an additional variable to the data and/or (in SWI-Prolog) use an explicit duplicate_term/2.

jan · September 3, 2021, 1:11pm

It actually compiles it rather cleverly. Too clever for the decompiler though

edit With 867d3a39460c2ae2c9046c424c15d76758592dd6 we get the correct result.

?- listing(foo).
:- dynamic foo/1.

foo(A) :-
    A=f(A).

grossdan · September 5, 2021, 5:53am

While much of the discuss here goes quite above my head – i wonder if a solution could be envisioned with scoping of prolog flags.

For example to enclose Record@record(_,_,_,_) into a scope where the @ operator is disambiguated through a directive – in one way or another (e.g. via precedence setting for scope only).

Although, i guess, the scope also has to be on client side only – so a call will likely need to revert back to a a prior flag lest the other operator is encountered.

dmchurch · September 8, 2021, 4:39pm

Aha! This makes it fall into place - everything about the existing functionality makes perfect sense when you look at cyclic terms as bug rather than as feature (I think at one point I knew that the “proper” Prolog behavior was to disallow creation of cyclic terms, but sometime between then and now I forgot that bit, ha.) And just like that our write/read contract is solid again, with the entirely reasonable caveat that it only holds for terms that are valid in standard Prolog. So I’m gonna downgrade this proposal from “omg why aren’t we doing this already” to “gosh it’d sure be nice if”

And I do still think there’s value in it - in promoting rational trees from “unfortunate unavoidable side-effect” to “supported data structure” and consequently adding an unambiguous syntax to the representation to allow describing it - especially if what we want is for SWI-Prolog to be a language capable of describing (and thus analyzing, and thus optimizing) the internal behavior of software. (Consider: the internal, C-level data structures of swipl itself can themselves be modeled as Prolog terms, but only if cyclic and shared-subterm values are allowed.) You’re absolutely right that that’s a bigger discussion, though!

You may be interested to know that some of my inspiration for this comes from work I’ve been doing that wouldn’t even directly benefit from the syntax changes. In my quest for “more convenient programming of dicts” I wrote a library to support “formal dicts”, which are simply dicts with well-known, centrally-declared tags and key sets. They’re somewhat analogous to “classes” in OOP languages, in that by referencing them (“X is a book published in year Y”) you’re also implicitly referencing a bunch of other things (X also has an author, a title, a publisher, an ISBN, etc). It’s a prime use case for a dict (as I understand the dict use cases), but of course we run into the head-unification dilemma that originally inspired my partial dict RFC (linked above). In short:

A statement like “Cindy likes books written after 2000, anything by Heinlein, and anything published by Del Rey” could be represented with the following:

Example 1: Standard dicts with field operator

% Example 1: Standard dicts with field operator
book(book{title: "I Am Malala",
          author: 'Malala Yousafzai',
          publisher: 'Little, Brown & Co',
          year: 2013,
          isbn: 9780316322409,
          edition: 1}).

likes(cindy, Book) :- Book.year > 2000.
likes(cindy, Book) :- Book.author = 'Robert Heinlein'.
likes(cindy, Book) :- Book.publisher = 'Del Rey'.

% ?- book(B), likes(cindy, B).

Of course, since this has no head-unification, the SWI VM can’t optimize it at all, and this will get slower and slower the more “facts” (or, what should be facts but can’t be) are added to the database. To avoid that, you could name every key in the dict in every rule:

Example 2: Standard dicts with head unification

% Example 2: Standard dicts with head unification
likes(cindy, book{title: _, author: _, publisher: _, year: Y, isbn: _, edition: _}) :- Y > 2000.
likes(cindy, book{title: _, author: 'Robert Heinlein', publisher: _, year: _, isbn: _, edition: _}).
likes(cindy, book{title: _, author: _, publisher: 'Del Rey', year: _, isbn: _, edition: _}).

But of course, that gets massively unwieldy in the syntax, and that’s for a dict with only 6 keys. Even if you abandon dicts and just use vanilla compound terms, it’s a pretty heavy syntax:

Example 3: Compound terms with head unification

% Example 3: Compound terms with head unification

likes(cindy, book(_, _, _, Y, _, _)) :- Y > 2000.
likes(cindy, book(_, 'Robert Heinlein', _, _, _, _)).
likes(cindy, book(_, _, 'Del Rey', _, _, _)).

And of course, the legibility of this version is terrible and the maintainability is even worse. Want to add a new field to the book definition? Be sure to add , _ to the end of every single rule and fact in the source (and stylecheck won’t help you, since all of these rules are still likes/2). Want to remove a field? Good luck. But the actual code is much more efficient, since all of these can index on the head.

So I used term_expansion/2 to make a new syntax that allows predeclaring the keys that belong to a given dict tag, using :: as an infix operator:

Example 4: Formal dicts with head unification

% Example 4: Formal dicts with head unification
book::formal_dict_keys([title, author, publisher, year, isbn, edition]).

likes(cindy, book::{year: Y}) :- Y > 2000.
likes(cindy, book::{author: 'Robert Heinlein'}).
likes(cindy, book::{publisher: 'Del Rey'}).

This code is semantically identical to Example 2, but I think you’ll agree that the syntax is much more legible. The code is also much more maintainable, since the list of dict keys (like a list of class fields) is declared in just one place. And, since the unification is happening in the head, the Prolog VM can optimize the code that queries the database.

Performance note

As an aside, this approaches the performance I was envisioning for the first-class partial/formal dict support, though it doesn’t quite reach it. The place it falls short is that in this version, the VM must still compare every key name and ensure that it matches the skeleton. My thought was for a declaration like this to create, in this case, an arity-6 functor; this functor’s name would be, not an atom or a reserved symbol, but the compound term book(author, edition, isbn, publisher, title, year). As such, each of these head unifications would require only two checks: first, that the functor matches, and second, that the argument (which needs no runtime search when mentioned literally in the source like this) unifies.

There is one downside to this syntax, though, which is that it works wonderfully for cases like this where you want to specify some fields and silently discard the rest, but not so well when you want to match fields and you still want to have access to the full formal dict (to pass it along to another predicate expecting a formal dict, for example). This is where we see the “unify-and-remember in head” problem popping up again, because optimise_unify allows us to write the following if we, say, wanted to show a book record with special formatting if it was an old first edition:

Example 5: Formal dicts and optimise_unify flag

% Example 5: Formal dicts and optimise_unify flag
portray_book(Book) :-
    Book = book::{edition: 1, year: Y},
    Y < 1900,
    !,
    format('FIRST EDITION FROM ~w: ~p', [Book.year, Book]).
portray_book(Book) :-
    Book = book::{},
    format('Book: ~p', [Book]).

Obviously, this is a prime use case for the inline subterm naming syntax specified above:
Example 5a: Formal dicts and inline subterm naming
% Example 5a: Formal dicts and inline subterm naming
portray_book((Book) book::{edition: 1, year: Y}) :-
    Y < 1900,
    !,
    format('FIRST EDITION FROM ~w: ~p', [Book.year, Book]).
portray_book((Book) book::{}) :-
    format('Book: ~p', [Book]).

Of course, since I’m inventing the :: formal dict syntax, there’s nothing to stop me from adding something to it which also allows specifying a variable that represents the dict as a whole. So I did, extending the syntax from tag::{dict-keyval-syntax} to tag[::DictVar]::{dict-keyval-syntax} (with the [] meaning “optional”, not “Prolog list”). That gives us the following:

Example 6: Formal dicts with full-dict-variable syntax

% Example 6: Formal dicts with full-dict-variable syntax
portray_book(book::Book::{edition: 1, year: Y}) :-
    Y < 1900,
    !,
    format('FIRST EDITION FROM ~w: ~p', [Book.year, Book]).
portray_book(book::Book::{}) :-
    format('Book: ~p', [Book]).

This seems to have basically everything I’ve been looking for: partial dict syntax, unify-and-remember (at least for formal dicts), and optimal performance regardless of dict size. Right?

Well, that’s what I thought, but it turns out, not quite. And the reason why is directly related to (a) limitations imposed by doing this with term_expansion and (b) the compiler not knowing what to do with shared subterms. In my first attempt at implementing this full-dict-variable syntax, the expanded form looked basically like Example 5 above (with the keys written out in full), adding an explicit unification goal at the top of the body and taking advantage of the optimize_unify behavior to move the unification into the head.

However, this requires four (well, five really) implementations: one for a fact, one for a rule, one for a nonterminal, and one for an SSU rule (or two if supporting (?=>)/2, which I didn’t). And more importantly, we see this in the listing:

Example 6a: Implementation using optimise_unify

% Example 6a: Implementation using optimise_unify
:- op(90, xfy, ::).

kvs_dict({KVC}, Dict) :-
    comma_list(KVC, KVL),
    dict_create(Dict, _, KVL).
kvs_dict({}, _{}).

term_expansion(Term0, Term) :-
    Term0 = (portray_book(book::Book::KVS) :- Body),
    BSkel = book{author:_, edition:_, isbn:_, publisher:_, title:_, year:_},
    kvs_dict(KVS, BData),
    BData :< BSkel,
    Term = (portray_book(Book) :- (Book = BSkel, Body)).

% portray_book/2 from Example 6

?- listing(portray_book).
portray_book(Book) :-
    Book=book{author:_, edition:1, isbn:_, publisher:_, title:_, year:Y},
    Y<1900,
    !,
    '.'(Book, year, A),
    format('FIRST EDITION FROM ~w: ~p', [A, Book]).
portray_book(Book) :-
    Book=book{author:_, edition:_, isbn:_, publisher:_, title:_, year:_},
    format('Book: ~p', [Book]).

Since term_expansion happens prior to field/method handling and the unify is syntactically in the body, the field/method translator has no choice but to translate the reference to Book.year in the first clause literally, despite the fact that “we” know that Book.year has already been unified with Y.

So, rather than adding a synthetic goal (and having to deal with all the different formats, which I didn’t do in this example), I simply unified Book prior to compilation:

Example 6b: Implementation using a-priori unification

% Example 6b: Implementation using a-priori unification
term_expansion(Term0, Term) :-
    Term0 = (portray_book(book::Book::KVS) :- Body),
    Book = book{author:_, edition:_, isbn:_, publisher:_, title:_, year:_},
    kvs_dict(KVS, BData),
    BData :< BSkel,
    Term = (portray_book(Book) :- Body).

?- listing(portray_book).
portray_book(book{author:A, edition:1, isbn:B, publisher:C, title:D, year:Y}) :-
    Y<1900,
    !,
    format('FIRST EDITION FROM ~w: ~p',
           [ Y,
             book{ author:A,
                   edition:1,
                   isbn:B,
                   publisher:C,
                   title:D,
                   year:Y
                 }
           ]).
portray_book(book{author:A, edition:B, isbn:C, publisher:D, title:E, year:F}) :-
    format('Book: ~p',
           [ book{ author:A,
                   edition:B,
                   isbn:C,
                   publisher:D,
                   title:E,
                   year:F
                 }
           ]).

The field/method translator now has enough information to simplify out the field access, but now we run into problems with the compiler, which doesn’t know how to handle shared subterms - so it recreates the dict structure in full, doubling the length of the opcode list and significantly increasing the stack usage.

So yes, there you have it! It’s worth noting, the post-expansion clauses that get passed to the compiler in Example 6b are 100% identical to the clauses that get passed to the compiler in Example 5a, using subterm naming. So that’s where all of this - subterm naming, unify-and-remember, etc - started.

This actually dovetails with another idea I’ve had (no, I haven’t run out of ideas to radically modify the SWI VM, why do you ask ) - and that’s expanded control over operator semantics. At present, the only things you can do with operator definitions are “translate this operator and its operands into a compound Prolog term”. With expanded control, however, perhaps via a predicate op/4 or the like, we could describe additional behaviors, like “apply a name to this subterm” (your idea), or “introduce a line comment” (if you wanted to be able to use // for comments) or for that matter “don’t introduce a line comment” (if you wanted to be able to use % as modulo, like in C). Just like all other operator definitions, these would be local to a given module and only affect literal syntax - in other words, it would only take effect when read_term/2 was instructed to use that module’s operator definitions.

dmchurch · September 8, 2021, 5:37pm

I mean you’re right, certainly, but I’m not sure how that’s relevant here. Head unification isn’t sufficient for speed improvement, no, but it is required, and this topic and proposed feature doesn’t discuss or suggest indexing methods. SWI-Prolog already has deep-indexing as a feature, and that’s good enough for me.

dmchurch · September 8, 2021, 7:36pm

Again, you’re absolutely correct, but what that does is to move more implementation details into the Prolog code. If my goal were to run a particular application as efficiently as possible on the existing SWI-Prolog VM, I would probably do something exactly like what you’re suggesting.

However, I don’t particularly care about a particular application, as my goal is to make all applications run more efficiently on the SWI VM. Your example requires that the programmer decide at programming time that one of these fields is a better primary key than any other. What if the programmer is wrong? ISBN is shared amongst reprints of books, what if our database needs to be able to hold and distinguish between different printings of the same book? What if the author or publisher or warehouser ends up using our database to store all the physical copies of the book individually, and thus there are only a few ISBNs amongst tens of thousands of entries? Then ISBN becomes a terrible field to key on, and there’s no way for you as the programmer to predict that.

The ethos of Prolog, such as there is and such as I understand and interpret it, is that programmers shouldn’t need to specify how things need to get done, only what things need to get done. Reality always falls short of the ideal, though, and that’s as true for Prolog implementations as anything else, but that doesn’t mean I’m not always going to be trying to push things in that direction.

Consider this. What is the definition of a Prolog “fact”? A fact is a Prolog term that has been added to the database, such that if you issue a query (either at the ?- prompt or via call/1) and the query term unifies with that fact, then the query succeeds. The database we’re considering here in these examples isn’t trying to establish ownership or series ordering or anything like that; the only thing that this database asserts to be true is “these are books”. Or, more literally, “these value-groups are valid combinations of the fields title, author, publisher, year, isbn, and edition, in the context of ‘book’”.

Viewed that way, even the fact that these dicts are wrapped in book/1 functors is superfluous. Dicts are, themselves, compound terms. From a theoretical standpoint, there’s no reason you couldn’t write this in your source file:

book{title: "I Am Malala",
     author: 'Malala Yousafzai',
     publisher: 'Little, Brown & Co',
     year: 2013,
     isbn: 9780316322409,
     edition: 1}.

This doesn’t work in SWI-Prolog, but not because the theory is unsound; it fails because SWI has implemented an additional restriction that forbids the use of dicts as clause heads. If not for that, we could write predicates whose arguments were named, rather than positional. And at that point your predicates become trivial to the point of being unnecessary:

assertz_book(Book) :- assertz(Book).
fetch_book(Select) :- call(Select).
retract_book(Select) :- retract(Select).

(Mind you, I probably would still wrap this in a book/1, but that’s mainly just defensive programming.)

And, just in case you’re thinking that it’s absurd to use a data structure as a predicate head, remember that in Prolog, code is data. You can use a compound term for its value, then turn around and use that same term as a goal. You can even call a list and define rules with a list as the head, though I wouldn’t recommend making that a usual part of your coding practices.

Digression: callable lists

Calling lists is common practice, of course:

?- L=[X], X = user, L.
|: ^D% user://1 compiled 0.03 sec, 0 clauses
L = [user],
X = user.

And you can even assert and retract lists, though that causes conflicts with the above:

?- assert([a,b,c,d]).
true.

?- listing('[|]').
:- meta_predicate[:|+].

system:[X] :-
    !,
    consult(X).
system:[M:F|R] :-
    consult(M:[F|R]).

%   NOTE: system definition has been overruled for '[|]'/2
:- dynamic'[|]'/2.

[a, b, c, d].

true.

?- [X,Y|Z].
X = a,
Y = b,
Z = [c, d].

?- retract([H0,H1|T]).
H0 = a,
H1 = b,
T = [c, d].

Dicts, on the other hand, would seem to make fantastic facts and clause heads, because a dict’s tag seems a lot like a functor’s name, but sadly that’s not the case; SWI dicts are compound terms, yes, but at present they all share the same functor name. That means that, if you were able to write rules with dict heads, then all dicts of the same size would be part of the same, massive predicate. And that wouldn’t be fantastic.

dmchurch · September 9, 2021, 1:14am

anon95304481:

Jan W. did that already, maybe only partially?
For this purpose a dict is translated into a goal using goal_expansion/2 . So, we can define a traditional Prolog predicate as below and list it to see what happens:
apple_stock_price(Date, Price) :-
   apple{'AAPL_x':Date, 'AAPL_y':Price}.

Not quite. When you’re talking about goal_expansion/2, what you’re doing is translating an arbitrary prolog term into a compound term that refers to a valid predicate. The input term can be whatever you want it to be. As a frivolous example, I can cause all integers to be treated as goals:

goal_expansion(I, format('~NCalled int ~p', [I])) :- integer(I).

?- 1.
Called int 1
true.

That doesn’t mean that integers are valid goals, just that Prolog’s macro expansion functionality is quite flexible. I couldn’t assert an integer into the database, for example, or use clause/2 to get a reference to 1 despite the fact that it “looks” like I’m treating the integer as a goal.

What I was describing was the idea of the dict itself actually being asserted in the database, exactly as any other compound term. And at that point, you can indeed simply assert, retract, and likewise on the dict itself.

If that seems odd, consider that the typical way to represent complex data objects is as a compound term with positional arguments, like using point(0, 0) to represent the origin of the 2D Cartesian plane. Yes, you would typically only use that value as data being passed between functions, because points are usually just a small aspect of the domain being modeled. But what if you were, say, writing a program that did logic manipulation of points along an elliptic curve? Well, then it might make a lot of sense to have in the database a registry of valid points, and asserting facts of the simple form point(X, Y) suddenly isn’t as unreasonable.

Code is data, data is code. If you have a data structure that specifies both a domain and a value within that domain, then it’s a reasonable candidate for a database fact. (That’s the reason that bare integers aren’t candidates for facts: they specify a value, but not the domain.) Hope that makes sense!

grossdan · September 9, 2021, 5:37am

First, i very much appreciate your discussions - i always learn a lot from them.

I wonder what the status of dict is in swi-prolog – is it a new type of term such as atom, compound, list, – or is it a short-hand for a json structure that could be represented by the other terms but that was optimized via special VM treatment as well.

I guess the question you are raising is, that if dict is already known to the VM then why not take it the whole way and make it a first class term.

Dan

jan · September 9, 2021, 7:54am

There is surely some value in that. I once used them to solve one of the ICLP programming context problems and also used them at some point to solve some graph isomorphism problem in real life. The are still difficult beasts though. They are typically hard to handle with Prolog code as it is hard to keep track where you have been and thus you easily end up with an infinite loop. Also note that X = f(X) and X = f(f(X)) are physically different terms, but they cannot be distinguished (they unify and test equal under ==)

This gets pretty close to that ECLiPSe is doing. They declare fields for a named struct (say book) and than translate book{author:'Robert Heinlein'} to book(_, _, 'Robert Heinlein', _, _). Now all the usual unification and optimization works. ECLiPSe implements this as read macros, a concept that SWI-Prolog doesn’t have. Not sure I want it either, though I’m happy to discuss the pros and cons and see where it goes. Also Ciao has some package that allows writing compounds while specifying only some fields by name. This is quite different from what SWI-Prolog dicts aim at. They exist to have an efficient key/value structure that has a syntax. I.e., they are in spirit similar to library(assoc), but assocs have horrible syntax. Dicts are also way more efficient in storage and lookup. They only loose on inserts with over about 1,000 k/v pairs.

Possibly the “formal” dict approach can be unified with ECLiPSe compatibility. All in all, The tag in SWI-Prolog dicts is barely used AFAIK. It was main required to avoid ambiguity.

jan · September 9, 2021, 8:08am

Mostly because it leads to a lot of confusion (and as you point out, collision). Note that there is no reason why efficient unification of dicts in the head cannot work. The problem is more that book{title:x} and book{title:X,author:a} to not unify. The clause indexing code is fairly modular and nothing prevents it from detecting that some argument only holds dicts and do something smart to index on particular keys. That doesn’t solve the above unification issue though. Either mapping to a dict with a consistent set of keys or a compound are the only ways out. Otherwise we need to redefine unification for dicts to do essentially what (>:<)/2 does. But then A=B, A==B no longer holds for dicts

Boris · September 9, 2021, 11:17am

Did you check out the docs yet?

grossdan · September 9, 2021, 12:00pm

Yes, i did …
“SWI-Prolog version 7 introduces dicts as an abstract object …”

It’s referred to as an “abstract object” …

Dan

Boris · September 9, 2021, 1:08pm

Keep reading or jump straight to the last subsection, “Implementation notes about dicts”.

You can also try to use =.. to see what the term is made of, really. Then try out the different type checking predicates to figure out what is the type of the name.

dmchurch · September 14, 2021, 1:59pm

Oh, I want to be clear, I think the SWI restriction against dicts as clause heads is a good one! Because of confusion and collision of course, and even more because the semantics of dicts aren’t fixed and well-defined. The consequences of allowing dicts as rule heads and then changing anything about their semantics or even their internal representation are much, much worse than doing the same when their use has been restricted to a few limited cases. The only reason I brought it up was to explain why I’d chosen to wrap the dict in a functor in my example!

I can’t help but nitpick ever so slightly here, but for a good reason, I promise, and I hope you’ll take it in the lighthearted tone I’m intending! The two terms funct(X, f(X)) and funct(X, f(f(X))) are unifiable but non-equal (in the == sense), even when funct/2 is replaced by =/2.

Now imagine if inline subterm naming were part of the Prolog spec: you could have rephrased that as

Also note that (X) f(X) and (X) f(f(X)) are physically different terms…

and then it would have been literally true, with no chance of misinterpretation

To address the point you were actually making, though: you’re absolutely right that promoting rational trees into the formal domain of Prolog terms would be an enormous change which would require a number of other ancillary changes in the language, especially surrounding the concepts of equality, unifiability, and variance. That’s why your reminder that they currently aren’t in the domain made such an impact that I withdrew the suggestion - I’m not ready to suggest any changes that are quite that sweeping! (Not that I wouldn’t, of course - you know me, after all. But I’d put a lot more thought into the problem first! )

Oh, neat! Yes, that is indeed very similar to my formal dict concept, and in general to what I’ve been thinking when it comes to dict efficiency and usage. I think, though, that if my goal were solely to make functors with named arguments, I’d take a page out of Python’s and (believe it or not) Powershell’s playbooks and choose a syntax that allows for specifying both positional and named arguments.

Digression: Powershell

On an extremely tangential note and almost entirely unrelated to Prolog, it took me a long time to understand the what and why of Powershell, and until I did I absolutely hated it. Why take a command shell and do things that are so very un-command-shell-like? I had an epiphany last year that completely changed my mind about it, though:

Despite its name and appearance, Powershell isn’t a command shell.

Powershell is a programming language in the .NET family which happens to share some features with command shells, among them the concept of current location (or current “focus”, perhaps) and the fact that a bare string with no control characters, delimited by whitespace and terminated by newline, is interpreted as an instruction to run a single command or internal function. Because of those properties, the Powershell REPL works decently well as a command shell, even for users that aren’t familiar with the language.

A good example of where this makes a difference is the ls (or dir, if you prefer) builtin. Under a Unix system, the ls binary has the responsibility to interpret the textual arguments passed to it, get the contents of any directories listed, sort them, format the output, and write it to standard out as text. In Powershell, however, the compiler does argument processing (so you never have to guess if this command uses single-hyphen options, or double-hyphen, or slash, or plus, or just tar/ar-style plain letters), and once ls (which is an alias to the builtin function Get-ChildItem) is finally called, its only job is to fetch and optionally sort the entries in the specified directories, then output them as CLR file objects on its output. When you make a pipeline in Powershell, you’re not necessarily connecting two text streams to each other, you’re allowing a procedure to pass any kind of object to another.

To see why this makes a difference, try the following commands if you have access to a Win/Mac/Linux system with Powershell installed:

ls (shows you a list of files, just as Linux would)

$files = ls (syntax aside, this seems like the same thing as FILES=$(ls)…)

$files (you can put echo before that but it’s unnecessary, gives the same output as above)

$files[0] (!!! this shows just the first file, with the same formatting)

$files[0].LastWriteTime (shows the mtime of that file in long form)

$files[0].FullName (shows the full pathname of that file, despite that not appearing in the original output)

And, if you still don’t think of Powershell as a first-class programming language, try these:

$files.FullName (shows the full pathnames of all files listed by the original ls “command”)

(ls).FullName (cut out the variable, just take a common field out of this stream of objects)

Dunno about you, but I was absolutely blown away by that, because I can’t think of any other language where taking a field from a group of objects is every bit as easy as taking a field from a single object. And, to bring this back to Prolog, the reminder that languages don’t have to resemble each other or prioritize the same things syntactically is a good one, as is the reminder that, quite often, the ability to process groups of things is as or more important than being able to process a single thing, and the language might as well reflect that.

(That last is already displayed in SWI-Prolog, at least behaviorally; take, for example, the optimized code route available for two-clause predicates where the first arguments are [H|T] and [].)

All that said, I’m not sure I want the “read macro” functionality either, though I suspect possibly for different reasons than you; I’m not sure they’d offer as much functionality as I’d like! What I’d really like to do is extend the control that op/3 gives us over the parser. At present there’s a single parsing model that we can tweak parameters on via op/3 and flags like var_prefix, but someday I’d love to see all of those subsumed into a conceptual model where we can tweak behaviors in addition to values. For example, perhaps someday the Prolog parser could get implemented as a DCG, which could be compiled into a form that both read_term/2 and write_term/2 could use, and then parts of that DCG could get overridden on a per-module basis? Food for thought

And yes, I’m aware that at present, there’s both a chicken-and-egg problem and a performance problem, which is why “someday” is a very operative term in that sentence. But I’m gonna keep working on reducing those issues!

jan · September 14, 2021, 3:29pm

Yes. One of the beauties of Prolog is that it a term and an equivalent term allocated elsewhere cannot be distinguished. This means we don’t have to worry about copies and we have only one identity operation (==/2). This falls apart at the moment you start using setarg/3 and similar destructive assignment operations that break the immutable nature of (ground) terms. I think all we may want to change to this is to provide a safer notion of mutable terms. SICStus has something they call mutable terms. I never studied their behavior in enough detail to tell you whether they solve the entire problem or they are essentially the same as using setarg/3 on a compound with with an additional free variable. The SICStus emulation library implements them this way and I’m not aware of that causing any trouble.

That is typically a nice property. The Prolog alternative is typically to pass positional arguments and an option list or dict to do the “rare” arguments. Not the same as the caller cannot decide on what argument to pass by position and what by name. Typically acceptable though.

Difficult issue. There are quite some Prolog users out there who think the introduction of operators was a mistake. They surely have serious disadvantages, just think about IDEs that typically cannot deal with dynamic operators. On the other hand, I don’t particularly like Lisp syntax either

dmchurch · September 14, 2021, 4:27pm

Yes, you may be right about that! I think that, if it were me, I’d probably introduce it as something like the concept of a mutable reference, perhaps with syntactic sugar that makes it easy to “change an argument” or the like, in much the same way put_dict/4 works. That way, code expecting standard, immutable terms don’t have them change in unexpected ways, but code written for the concept of mutability can alter it freely. (And if you wanted to combine the two, you could just make a standard immutable compound term where one argument is a mutable reference.)

That’s just my off-the-cuff thoughts, though - I’ve got no plans to actualize anything like that in the near future!

Understandable, especially if they’re coming from a functional background! Personally, my view on the matter is that source code is written for people, not computers. Anything that makes the source code easier and quicker for developers to understand without changing the semantics is a net gain, in my book! SWI’s quasi-quotations are a good example - they’re effectively a way to tell the parser “go into a different parsing mode, please, and return to the current mode when you see this character”. Only, the “other mode” in that case is also a baked-in, limited parse model. I absolutely love that in SWI-Prolog, lambda syntax is just a library, even though it introduces what is, effectively, a new syntactic construct. Usually it takes a new revision of the language itself to do that!

As far as I’m concerned, all of the following should be able to be unified under the general umbrella of “parser control”:

Operator translation to compound terms
SWI dict syntax
Quasi-quotation syntax
Standard quotation syntax (and what kind of term it parses to)
The prolog flags:
- allow_dot_in_atom
- allow_variable_name_as_functor
- back_quotes
- char_conversion
- character_escapes
- character_escapes_unicode
- double_quotes
- rational_syntax
- var_prefix
The use of . to terminate/separate terms
Which characters are combinable with others to form atoms, and which aren’t

One of the lovely things about operators is that, since they can be included in a module’s exports, you can just use_module/1 and now you can use that syntax yourself. Now just imagine, if the concept of “exportable parser control” were extended beyond just operators, you could pull in quasi-quotation syntax, dict syntax, even an entire different dialect of Prolog just as easily! And you wouldn’t have to add yet another global, compiled-in Prolog flag whenever you wanted to add a new syntax to the language

oskardrums · September 14, 2021, 8:18pm

This idea of generalizing the syntax the of the language reminds me of Racket’s #lang feature, that lets you have different surface languages for the same underlying abstract language, in their case a Lisp. I didn’t get a chance to really play with it yet so I don’t know how good of a reference it is, but on paper it sounds similar. Might be worthwhile checking out the sources.

Topic		Replies	Views
Named constants? Help!	37	3155	May 9, 2020
Syntax/semantics for partial dicts Request For Comments	28	1458	July 20, 2021
Recently started woring with prolog and decided to make a text documenting my annoyances General	18	637	October 21, 2023
Series of "procedural" predicates -- removing choice points Help!	33	2385	June 4, 2019
Ann: SWI-Prolog 8.3.21 Releases	1	629	March 26, 2021

Inline subterm naming syntax (was: Syntax for unify-and-remember in head?)

Related topics