Inline subterm naming syntax (was: Syntax for unify-and-remember in head?)

Problem

I often find when I’m writing Prolog that I both want to unify an argument against a template and use that argument in full in the body. As it stands, my options are to (a) postpone the unification until the body, which changes (meta-)semantics and can impact performance, or (b) recreate the argument in full in the body, which violates DRY and can lead to overallocation of local variables. In other words:

display_record(Record) :-
    Record = record(_, _, _, _),
    format('Record: ~p~n', [Record]).

display_record(record(A, B, C, D)) :-
    format('Record: ~p~n', [record(A, B, C, D)]).

What I’d like would be a syntax that allows me to bind a head variable and specify a unification template at the same time. What’s funny is that the SWI-Prolog compiler already does this (if prolog flag optimise_unify is set), and the decompiler can’t handle it:

?- listing(display_record).
display_record(record(_, _, _, _)) :-
    format('Record: ~p~n', [_]).
display_record(record(A, B, C, D)) :-
    format('Record: ~p~n', [record(A, B, C, D)]).

Note the format/2 goal in the first clause body has an anonymous variable as its argument, which we also see in reflection with clause/2:

?- Head = display_record(Arg), clause(Head, Body), numbervars((Head,Body)).
Head = display_record(record(A, B, C, D)),
Arg = record(A, B, C, D),
Body = format('Record: ~p~n', [E]) ;

Head = display_record(record(A, B, C, D)),
Arg = record(A, B, C, D),
Body = format('Record: ~p~n', [record(A, B, C, D)]).

So, not only is clause/2 returning “incorrect” values (by rights, I should be able to get a reference to the first clause with clause(display_record(Arg), _, Clause), var(Arg), but that fails) but the values it returns won’t even work if you assert/1 them back into the database.

Existing Alternatives

As noted above, this behavior already exists in the compiler, controlled by the flag optimise_unify. However, depending on the type of rule, the syntax required to activate it could take one of three forms:

display_record(Record) :-
    Record = record(_, _, _, _),
    ... .
display_record(Record),  Record = record(_, _, _, _) =>
    ... .
display_record(Record) -->
    { Record = record(_, _, _, _) },
    ... .

The only way to inhibit this behavior (that is, to actually put the unification in the body) is to move it to later in the body or to disable optimise_unify altogether.

Proposed Syntax

I’d like to add a syntax that reifies the optimise_unify behavior back into the language itself. The syntax should allow me to specify a head-unification template while also providing a variable name that can be used to refer to that argument. My proposal is the following:

display_record((Record) record(_, _, _, _)) :-
    format('Record: ~p~n', [Record]).

More specifically, the syntax consists of a literal variable name in parentheses preceding a standard Prolog term; in terms of precedence I’d suggest that the construction be equivalent to a Prolog fy operator at precedence 650 or thereabouts, just above A:B and just below the comparison/unification operators. The variable name must not have already been introduced in the scope of this term, as this isn’t a unification construct, it’s just a way to name a subterm during parsing.

The variable name is in scope immediately after the close parenthesis. Thus, (Foo) Foo is a very silly way to describe an unbound variable, and (Cyc) [1,2,3|Cyc] is a way to describe a cyclic list in a self-contained syntax, not requiring multiple goals or template substitution.

Semantics

This construct doesn’t have any semantics on its own, since it’s entirely transparent to the structure of a term. Put another way, taking any valid Prolog term and prepending (_) is a syntactic no-op and will not change how the term parses in any way. However, the compiler can take note of names declared this way and will tweak the compilation accordingly:

  • An inline name declaration for a clause head parameter generates the same head-unification code as the bare subterm - that is, the presence or absence of an inline name on a top-level parameter does not change the head-unification code. However, any references to that name in the body (or later in the head) will just reference the corresponding argument slot, rather than creating a new variable in the local frame.

  • An inline name declaration for a subterm of a clause head parameter requires a new VMI; I’ll call it H_INLINEVAR. The behavior is exactly like H_FIRSTVAR except that it doesn’t increment ARGP (and thus, it requires another H_* afterwards to define the contents of the INLINEVAR).

  • An inline name declaration within a clause body compiles as though factorized into two subgoals, where the first subgoal independently sets the value of the inline variable and the second references it. The compiler generates a no-op, arity-0 VMI B_INLINEVAR immediately prior to the initial B_UNIFY_FIRSTVAR, as a hint to the decompiler.

  • Body unification to a variable declared with an inline name in the head never triggers the optimise_unify behavior. So, it’s possible to force body-unification of an argument with the following syntax:

    no_head_unify((Arg) _) :-
        Arg = 1.
    

Predicate Interactions

Most existing predicates don’t have to care about this new syntax at all, since it’s a syntax-only construct. Those that do generally deal with reflection or term I/O:

  • read_term/2
    This is the big one, of course. The actual term read never changes, as described above, so a term written as
    same_term((Arg) values(1,2,3), Arg)
    
    will be read as same_term(values(1,2,3), values(1,2,3)), where the second argument is a reference to the first. (Thus, calling this term as a goal will succeed.) However, quite a few options are impacted:
    • variable_names(Vars)
      This will not change, which is to say that inline variable name declarations attached to non-variable subterms will not be present in the Vars list. This is for backwards and ISO compatibility, as existing code may reasonably assume that all members of the Vars list have a variable term as their second argument. For an inline-name-compatible replacement, use
    • subterm_names(Assignments) (new)
      Exactly as variable_names(Vars), but the Name = Subterm elements don’t require that the second argument be a variable.
    • subterm_positions(TermPos)
      Will have a format describing inline names.
    • cycles(Bool)
      Does not change, but will probably not be needed as much.
    • factorized_subterms(Substitution) (new)
      If this option is given, named subterms will not be included as their value, but will instead be replaced by anonymous variables, and Substitution is unified with a list of variables and their values in Var=Value form. This option behaves as though the following definition were in effect:
      read_term(Term, Options) :-
          select_option(factorized_subterms(Substitution), Options, RestOpts),
          read_term(FullTerm, RestOpts),
          term_factorized(FullTerm, Term, Substitution).
      
      It also alters slightly the semantics of the variable_names and subterm_names options; the former now describes the anonymous variables returned in Term, and the latter describes the values returned in factorized_subterms(Substitution).
    • singletons(Vars)
      will return inline names if they are singletons. If factorized_subterms has not been given, the second argument to the = will be an anonymous variable (and, itself, a singleton).
  • write_term/2
    Again, lots of changes here, and all in option handling. Note, though, that the default representation of cyclic terms changes due to the change in the default value of cycles(_).
    • cycles(Atom) (changed from Bool)
      The meanings of true and false are unchanged, but they are joined by the new value inline, which represents cyclic terms using inline names instead of @(Template, Substitutions) syntax. The default changes from true to inline.
    • variable_names(Vars)
      Remains unchanged, as with the read_term/2 option of the same name. The stipulation that “Terms where Var is bound or is a variable that does not appear in Term are ignored.” remains in effect, which is to say that this option cannot be used to provide inline subterm names, even if those subterms are written with names due to cycles(inline).
    • subterm_names(Assignments) (new)
      Behaves similarly to variable_names, but works on bound terms as well as variables. Subterm comparison is performed with the same logic as same_term/2. NOTE: this option is expensive, probably. Subterm comparison must be performed at every level of writing, unlike with variable_names which just sets the variables beforehand.
    • factorized_subterms(Substitution) (new)
      Allows full customization of which subterms are named. Takes a list in the same format as the same option in read_term/2, takes a template as the Term argument in the same way, and interacts with the options variable_names and subterm_names in the same way.
    • factorized(Bool) (new)
      If present and true, all compound terms that appear more than once have a name applied. Semantically equivalent to the following definition:
      write_term(Term, Options) :-
          select_option(factorized(Bool), Options, RestOpts),
          Bool == true,
          term_factorized(Term, Skeleton, Substitution),
          write_term(Skeleton, [factorized_subterms(Substitution)|RestOpts]).
      
  • listing/1, portray_clause/1, et al
    These handle decompilation of inline-named variables properly, as noted above under Semantics.
  • clause/2,3
    These return the head and body of a given clause as they are. Bound subterms are bound, named (and subsequently shared) subterms are shared. This means that given these definitions:
    display_record(record(A, B, C, D)) :-
        format('Record: ~p~n', [record(A, B, C, D)]).
    display_record((Record) record(_, _, _, _)) :-
        format('Record: ~p~n', [Record]).
    
    a call to clause(display_record(HRec), format(_, [BRec])) will match both clauses, but only the second will also satisfy same_term(HRec, BRec).
  • assert/1 et al
    This (like all dynamic programming) is basically unaffected, since assert operates after the parse step. However, the variable-sharing optimizations of the compiler will activate based on shared subterms, so the return from clause/2 will compile into the same code as the original when asserted.
  • term_expansion/2
    Like assert/1 above, this is largely unaffected. The only thing to note is that binding what was originally a variable in the read term will often cause subterm sharing in the resultant clause.

That’s all I can think of at the moment, though I’m sure I’ve missed something. What thoughts do folks have on this? Is the new syntax something other people would find useful?

1 Like

(The following is somewhat half-baked, but I couldn’t resist writing a “have you considered this?”, even if it’s not as good as @dmchurch’ proposal.

There’s an alternative that doesn’t require any new syntax, at the price of looking a bit like Erlang. I’m not saying I prefer it – there’s a lot to like about @dmchurch’s proposal.

Consider a predicate that matches multiple different terms:

display_record(x(A,B))     :- format('Record: ~p~n', [record(A,B)]).
display_record(y(A,B,C))   :- format('Record: ~p~n', [record(A,B,C)]).
display_record(z(A,B,C,D)) :- format('Record: ~p~n', [record(A,B,C,D)]).

could be written:

display_record(Record) :-
  Record = x(A,B),     format('Record/1: ~p~n', [Record]);
  Record = y(A,B,C),   format('Record/2: ~p~n', [Record]);
  Record = z(A,B,C,D), format('Record/3: ~p~n', [Record)]).

To avoid repeating Record =, something like Erlang’s “case” could be used (I think that this can be done with op/3 declarations):

display_record(Record) :-
  case Record in
     x(A,B)     => format('Record/1: ~p~n', [Record]);
     y(A,B,C)   => format('Record/2: ~p~n', [Record]);
     z(A,B,C,D) => format('Record/3: ~p~n', [Record]).

and the => could be replaced by -> if if-then-else is desired instead of possible backtracking. Also, the “case” could be used to factor out common goals:

display_record(Record) :-
  ( case Record in
      x(A,B)     => true;
      y(A,B,C)   => true;
      z(A,B,C,D) => true
   ),
   format('Record: ~p~n', [Record]). 
1 Like

I still have to digest what you wrote in more detail but upon a first cursory readthrough this comes to mind.

Effects on the Byrd Box model and possible added ports.


Is the proposal only for deterministic code, e.g. could the clauses be rewritten using single sided unification?

The patterns @peter.ludemann and you show remind me of C# Deconstructors and F# Pattern Matching which leads to Algebraic Data Types. Is this where you are headed?


I gave your post a like. The like is not to be interpreted as my liking the proposal, instead a like by an admin will give it more points toward being put in the e-mails for those that don’t visit the site regularly.


Personal Notes (Click triangle to see details)

violates DRY

DRY -Don’t Repeat Yourself (Wikipedia)


optimise_unify

SWI-Prolog Documentation 2.18.3 Indexing for body code


Displaying the VM Instructions

File: examples_optimise_unify_true.pl

:- module(examples_optimise_unify_true,
    [
        display_record/1
    ]).

:- set_prolog_flag(optimise_unify,true).

display_record(Record) :-
    Record = record(_, _, _, _),
    format('Record: ~p~n', [Record]).

display_record(record(A, B, C, D)) :-
    format('Record: ~p~n', [record(A, B, C, D)]).
?-current_prolog_flag(optimise_unify,Optimise_unify),clause(examples_optimise_unify_true:display_record(_),Body,Ref),clause_vm(Ref,VM_instructions);true.

Optimise_unify = true,
Body = format('Record: ~p~n', [_17216]),
Ref = <clause>(0000000006F3BAE0),
VM_instructions = [
   vmi(h_functor(record/4), 2), 
   vmi(h_pop, 1),
   vmi(i_enter, 1),
   vmi(b_atom('Record: ~p~n'), 2),
   vmi(b_list, 1),
   vmi(b_argvar(0), 2),
   vmi(b_nil, 1),
   vmi(b_pop, 1),
   vmi(i_depart(system:format/2), 2),
   vmi(i_exit, 1)
] ;

Optimise_unify = true,
Body = format('Record: ~p~n', [record(_20546, _20548, _20550, _20552)]),
Ref = <clause>(0000000006E01250),
VM_instructions = [
   vmi(h_functor(record/4), 2),
   vmi(h_firstvar(1), 2),
   vmi(h_firstvar(2), 2),
   vmi(h_firstvar(3), 2),
   vmi(h_firstvar(4), 2),
   vmi(h_pop, 1),
   vmi(i_enter, 1),
   vmi(b_atom('Record: ~p~n'), 2),
   vmi(b_list, 1),
   vmi(b_functor(record/4), 2),
   vmi(b_argvar(1), 2),
   vmi(b_argvar(2), 2),
   vmi(b_argvar(3), 2),
   vmi(b_argvar(4), 2),
   vmi(b_pop, 1),
   vmi(b_nil, 1),
   vmi(b_pop, 1),
   vmi(i_depart(system:format/2), 2),
   vmi(i_exit, 1)
] ;
true.

File: examples_optimise_unify_false.pl

:- module(examples_optimise_unify_false,
    [
        display_record/1
    ]).

:- set_prolog_flag(optimise_unify,false).

display_record(Record) :-
    Record = record(_, _, _, _),
    format('Record: ~p~n', [Record]).

display_record(record(A, B, C, D)) :-
    format('Record: ~p~n', [record(A, B, C, D)]).
?- current_prolog_flag(optimise_unify,Optimise_unify),clause(examples_optimise_unify_false:display_record(_),Body,Ref),clause_vm(Ref,VM_instructions);true.
Optimise_unify = false,
Body =  (_20=record(_96, _98, _100, _102), format('Record: ~p~n', [_20])),
Ref = <clause>(0000000006A1CDA0),
VM_instructions = [
   vmi(i_enter, 1),
   vmi(b_unify_var(0), 2),
   vmi(h_functor(record/4), 2),
   vmi(h_pop, 1),
   vmi(b_unify_exit, 1),
   vmi(b_atom('Record: ~p~n'), 2),
   vmi(b_list, 1),
   vmi(b_argvar(0), 2),
   vmi(b_nil, 1),
   vmi(b_pop, 1),
   vmi(i_depart(system:format/2), 2),
   vmi(i_exit, 1)
] ;

Optimise_unify = false,
Body = format('Record: ~p~n', [record(_3908, _3910, _3912, _3914)]),
Ref = <clause>(0000000006E227E0),
VM_instructions = [
   vmi(h_functor(record/4), 2),
   vmi(h_firstvar(1), 2),
   vmi(h_firstvar(2), 2),
   vmi(h_firstvar(3), 2),
   vmi(h_firstvar(4), 2),
   vmi(h_pop, 1),
   vmi(i_enter, 1),
   vmi(b_atom('Record: ~p~n'), 2),
   vmi(b_list, 1),
   vmi(b_functor(record/4), 2),
   vmi(b_argvar(1), 2),
   vmi(b_argvar(2), 2),
   vmi(b_argvar(3), 2),
   vmi(b_argvar(4), 2),
   vmi(b_pop, 1),
   vmi(b_nil, 1),
   vmi(b_pop, 1),
   vmi(i_depart(system:format/2), 2),
   vmi(i_exit, 1)
] ;
true.


SWI-Prolog is not based on WAM VM instructions but ZIP VM instructions.
ref 1
ref 2
Bowen et al. , 1983 - " A portable Prolog compiler" - (ResearchGate PDF download)
Neumerkel, 1993 - “The binary WAM,a simplified Prolog engine” (pdf)


SWI-Prolog VM Instructions at the C level
pl-comp.c
pl-wam.c
pl-vmi.c


Ref: pl-vmi.c

Virtual machine instruction names.  Prefixes:
  I_	General instructions
  B_	Body specific version
  H_	Head specific versin
  A_	Arithmetic compilation specific
  C_	Control (compilation of ;/2, etc.)
  S_    Supervisor instructions.  See pl-supervisor.c

SWI-Prolog VM Instructions

virtual machine variables: pl-vmi.c

Within the scope of this file, the following virtual machine variables are available.
* FR
Current environment frame
* NFR
Next frame (used to share in various calling instructions)
* BFR
Backtrack frame: current choicepoint
* PC
Program Counter
* ARGP
Argument pointer
* CL
Running clause (= FR->clause)
* DEF
Running definition

H_FIRSTVAR: pl-vmi.c

H_FIRSTVAR: A variable in the head, which is not anonymous, but
encountered for the first time. So we know that the variable is still a
variable. Copy or make a reference. Trailing is not needed as we are
writing in this frame. ARGP is walking the argument list, but is always
in some compound term as H_FIRSTVAR is not generated for plain variables
in the head.

B_UNIFY_FIRSTVAR: pl-vmi.c

B_UNIFY_VAR, B_UNIFY_EXIT: Unification in the body. We compile A = Term
into one of the following:
Normal: A is firstvar:
B_UNIFY_VAR B_UNIFY_FIRSTVAR

B_UNIFY_EXIT B_UNIFY_EXIT
We need B_UNIFY_FIRSTVAR for the debugger as well as for
clearUninitialisedVarsFrame() in pl-gc.c. When in debug mode we simply
create a frame for =/2 and call it. Note that the `slow unify’ mode must
be consistently applied in B_UNIFY_VAR and B_UNIFY_EXIT, which is why we
copy the global value into a local variable.
TBD: B_UNIFY_CONST ,
B_UNIFY_VAR ,
Note that the B_UNIFY_FIRSTVAR assumes write mode, but this is
unimportant because the compiler generates write (B_*) instructions.

Invented by dmchurch.

B_INLINEVAR 
H_INLINEVAR

(Prolog) Conformity Testing I: Syntax (ref)
(Works for ISO/IEC JTC1 SC22 WG17)


That is clearly a bug. In general it does deal with that. In this case it doesn’t seems to realize it must create the unification instruction. Will have a look.

The above bug is of course in clause/2. Your call should work. Not in general though because if the delayed unification is only used in the head the produced VM code cannot be distinguished from the term appearing in the head and thus clause/2 leaves the term in the head. This is fine: the standard doesn’t even allow clause/2 on static code. SWI-Prolog does allow, but does not guarantee that the returned clause is identical to the asserted clause. It should of course return a clause that is semantically equivalent. This happens notably for compiled unification, comparison and arithmetic which may be returned a little different (arguments swapped, subtraction of a constant replaced by addition of its negative value, etc.)

Hmm. Syntax discussions managed to break the SWI-Prolog community once :frowning: This implies we need a really good reason before making such changes. The second essential issue is what term is represented by (X) T? Note that we need a term representation to maintain the “code is data” (homoiconicity) of the language. If this is a new type of term, the impact on the system and all the infrastructure around it is really serious. Picat decided to give up on homoiconicity, which was the main reason for me not to join the Picat initiative. Your term “coloring” ideas might solve this. Finally, we get some ambiguities. If T is a postfix operator it will create the term T(X). T could also be both a prefix and infix operator, making (X) T Arg ambiguous.

1 Like

Pushed a fix (0644ffb47bd46578f31fdad7877d56ff495286d8)

The advantage is that this is standard syntax and we can deal with it using term expansion (as we already have an efficient encoding). The disadvantage is the ambiguity if you really want to match against an @(X,Y) term :frowning:

1 Like

Mmm, I disagree with that bit - or (more likely) I don’t quite understand your objection here. Operators themselves are a syntax-only construct; you get the same term regardless of if you write X=Y or =(X, Y). There are no new term representations in play here; the only change is that there were terms that were previously unrepresentable (and which could only be crafted programmatically using multiple goals in sequence) that can now be written as literals.

I did say that “the compiler treats these terms differently” but that’s a bit of an intentional oversimplification. The compiler doesn’t have to know what variables were named with inline names; these rules will apply to any term with two compound subterms that share the same stack identity. It’s just that in classical Prolog there is no way to write a term literal with shared-identity subterms.

My assertion about inhibiting optimise_unify using a construct like (Arg) _ probably only works with term painting, though. I’ll strike that bit.

I wasn’t aware of that! I suppose I really ought to shell out for a copy of the ISO Prolog spec at some point. I do somewhat hate that it isn’t freely available :confused:

Ah, this is what comes of writing the whole thing straight through front-to-back. One of the things I’d intended to say was that the (Var) construct is only interpreted as an inline subterm name if that’s the only way to parse it. So, implementing this is easy: it activates at the moment when you would get a “syntax error, operator expected”.

More generally speaking: if you, the coder, are writing a phrase that could be interpreted ambiguously, it’s your responsibility to add parentheses to disambiguate. If you’ve written a term (X) T where T is a postfix operator, then change it to (X) (T). In the postfix/infix case, (X) + X parses to +(X, X), and thus it will not be parsed as an inline name. On the other hand, (X) (+ X) does not parse as a classical Prolog term parses to the cyclic term ++++++++…+++X. Does that clear it up?

I’m with you on that! The homoiconicity trait is one of the things I love most about Prolog! I mean in general I love self-consistent structures, and that’s one of the reasons I went with this particular formulation. Adding inline naming to the language means that any conceivable Prolog term (that is, any term that can be represented in the Prolog engine) is also representable (has a syntax that will produce that term, when read as input).

And this is precisely why I went with something that is, presently, an invalid syntax at the core level! You don’t have to say “this syntax only has meaning if used in the head” or use some convoluted escaping mechanism to get a “literal” version of the syntax you wrote.

So, as far as I’m concerned, the fact that Record@record(_, _, _, _) is (for some operator definitions) standard Prolog syntax is a disadvantage, not an advantage :slightly_smiling_face:

Similarly (and this addresses @j4n_bur53’s point as well, I believe), if you have to use post-parse expansion to realize your subterm sharing solution, then (a) it only works on code, not data, and thus it violates homoiconicity, (b) you have by necessity caused backwards incompatibilities, as there will be some currently-valid code that changes semantics, and (c) you’re introducing complexity and inconsistency to the system, and that way likes Ruby and Too Much Magic. I’m already sad about the fact that you can’t write a literal (.)/2 anywhere in a SWI-Prolog goal, I’m not about to add another construct to that list.

I don’t think so, no. I’ve definitely had thoughts along those lines before, but this particular idea doesn’t add any semantic restrictions to the language (nor, for that matter, does it introduce any additional semantic features). Any terms that can be created using this syntax can already be created by other means, like term_expansion/4, assert/1, or the C API; this just provides a syntax that allows a programmer to write one of these terms directly, without one of those metaprogramming techniques.

The issue with this is that moving unifications into the body makes the compiler’s job more difficult. Unifications in the head are fantastic, because pattern-matching can be used against head terms - and, in fact, one of Prolog’s superpowers is that it can use head-term matching to pick its clause. The two implementations you list here are semantically similar, yes, but not identical; the second version enforces an ordering that isn’t present in the first version. Similarly, note the pitfalls listed in the SSU docs, regarding clause matching:

1 Like

I don’t get that. (Var) Term is a syntactic construct that must fit into Prolog’s data. As it contains two things this can only be a compound term. So this construct must have a canonical representation as X(Var,Term), where we can chose our X freely and for SWI-Prolog we could use a reserved symbol as we do with dicts. Is that what you had in mind?

If you restrict the syntax to really just (Var) the syntax ambiguity is probably bearable as embracing a single variable is not that likely to happen often.

The fact that Picat has something for it suggests there is at least some demand. I’m not aware of any other prior art. Most Prolog systems do something clever with explicit unification directly after the head and some (e.g., XSB) factorize identical subterms in a clause and avoid the copy. AFAIK this is source-to-source translation.

Still, I think this will make more enemies that friends :frowning:

This is not true (unless I’m mistaken). They are semantically equivalent. Yes, it requires additional compiler technology to make them execute with the same efficiency and avoiding the creation of unnecessary choice points. There is no fundamental reason why this is impossible. But I do like multiple clauses more :slight_smile:

That was my intention, yes. The text within parentheses must be a sequence of nonblank characters that the Prolog parser will interpret as a variable name (must start with a capital letter or underscore, or must start with an underscore, depending on the var_prefix Prolog flag). As you say, that construct isn’t likely to be written as a literal anywhere! I also like it because (to my mind at least) it’s a fairly intuitive reading of the term. If I saw the syntax and I wasn’t familiar with it, I could still guess that it was in some way relating this name here to that term there.

Mmm, I don’t think so. (Though that is a very interesting read, thank you!) My point with this concept is specifically not to change semantics, but instead to provide a syntactic representation for a semantic that already exists. Anything that can be done via term/goal/function expansion is by necessity something that changes the semantics of an existing syntactic construct. (If it didn’t change the semantics, it would just be the declaration term_expansion(X, X), which is an expensive no-op.)

You’d probably know better than I would! (Again, I really ought to pick up a copy of that spec.) My understanding of (;)/2 is that the second half of the argument will only be evaluated if the first half of the argument fails, while two consecutive clauses aren’t strictly required to follow that ordering if head-unification pattern match finds a better option.

Sounds like a fun challenge! Definitely on my list :smile:

Hmm, I may have stated that too broadly. What I meant was: in SWI-Prolog, you cannot write a rule which has a (.)/2 subterm anywhere in it and have that subterm be treated as itself. Put another way:

represent_dot2(Left, Right) :-
    write_canonical(YOUR_CODE_GOES_HERE).

My point is that there is nothing that you can put in place of YOUR_CODE_GOES_HERE that causes a query of ?- represent_dot2(foo, bar). to generate the output .(foo, bar). You cannot represent that term in a SWI-Prolog source file.

Precisely it does! But there are two key differences here: first, (–>)/2 already exists in the core Prolog syntax, as it has since the beginning. Thus, there is no backwards compatibility risk.

Second, term/goal expansion only applies to code. Similarly, (–>)/2 only applies to code. The term (A → B) can be used, on its own, and its meaning is simply “the functor with arity 2, name '-->', and its arguments are two distinct variables” - it doesn’t have anything to do with DCGs unless (–>)/2 is used as the primary functor of a rule declaration. Even the names aren’t part of the semantics, they’re only there to relate to other instances of that name in this term. That’s what the inline-naming syntax does: it provides with other ways to allow you to relate two different parts of a single term to each other, without requiring any additional transformation.

Consider: if I wanted to write a singly-linked list as a single literal without using Prolog’s builtin list syntax, I could still write node(V1, node(V2, node(V3, node(V4, nil)))). If I wanted to make a binary tree, I could write node(Root, node(Left, node(LL, nil, nil), node(LR, nil, nil)), node(Right, node(RL, nil, nil), node(RR, nil, nil))). An array can be represented array(V0, V1, V2, V3, V4). Yet there’s no way to represent a doubly linked list as a literal; there’s no way to create that term without multiple unifications in a row. But with inline naming, I can write:

% node(Prev, Value, Next)
(N1) node(nil,
          V1,
          (N2) node(N1,
                    V2,
                    (N3) node(N2,
                              V3,
                              node(N3, V4, nil)
                             )
                   )
         )

Now, I’m not suggesting that anyone wants to write this type of complex data structure by hand! But this syntax makes it possible to represent, which it wasn’t previously. And, since this is Prolog and code is data, anything which makes it possible to represent more data also makes it possible to represent more algorithms. Does that make sense?

You’re missing the point, a bit. What the top-level is showing you, in that output, is three goals that, once all three have been executed, will result in R having the doubly-linked structure.

It has always been possible to create terms like that in Prolog, using multiple unifications in sequence. However, the term itself is unrepresentable.

Actually, it’s entirely different. _S2 = node(...) is a goal. Its functor is (=)/2, the first argument is a variable whose name is irrelevant, and whose second argument is a compound term of the functor node/3. Before that goal is executed, the variable named _S2 has no value. After that goal is executed, the variable named _S2 is bound to a compound term of the functor node/3.

On the other hand, (_S2) node(...) is a compound term (whose name, again, is irrelevant) of the functor node/3. It is a compound term before the VM looks at it, it is a compound term after the VM looks at it. You can use it as an argument to another compound term. You can put it in a list. You can call it as a goal, if you want.

It’s the exact same difference as between "hello world", which is a term that represents a string with 11 characters, and string_codes(Str, [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]). The former is a term. The latter is a goal that allows you to create that term.

You’re correct that this particular usage of the functor @/2 is very similar to this proposal - it’s a way to assign names to subterms. You missed the biggest disadvantage to that syntax, though, and it’s a complete showstopper - because it uses existing Prolog parsing, implementing this feature breaks any code that might have ever wanted to use @/2 for its own purposes. In fact, SWI-Prolog itself has a completely different meaning for @/2, which Discourse is very helpfully linking us to.

This! This is exactly what I’m talking about, thank you. You’re saying “synthetic goal” as though that’s a given, but the point behind this syntax is that it doesn’t require synthetic goals. You can simply represent a term of whatever complexity as a term. No synthetic goals, no rewriting using *_expansion, no magic compiler optimization “knowing” what you probably meant when you put a unification at the start of your clause body. Just writing the term you meant to write, the way you meant to write it, and having that be the code that gets executed.

Incorrect. If you have a way to declare - as part of the syntax - that “V” and “Term” are actually two names for the same thing, then there’s no need for unification. It’s just two ways to refer to the same thing. Remember, as far as the Prolog VM is concerned, neither of these two names actually even exist. Naming variables is just a convenient thing that the Prolog language syntax lets us do to make things easier for our human brains to understand. The parser/compiler associates these names with local variables so that we don’t have to say “local variable 1” “local variable 2” everywhere, and it would be quite happy to be told that “V” and “Term” could both be ways to describe the local variable in slot 1.

Right! The current version of the SWI-Prolog VM does not support this functionality.

Please remember that this is a proposal for functionality that I would like to include in SWI-Prolog. By necessity, that means that the things I’m describing do not yet exist and are not yet supported :slightly_smiling_face: The feedback I’m looking for here is along the lines of “what if it were supported” and “what might go wrong if this feature were added”.

No one else picked up on this. If I understand what you are saying, this is incorrect.

Both arguments of ;/2 will be evaluated (unless of course something throws). The order is fixed, first the first and then the second argument, exactly as documented:

Goal1 ; _Goal2 :- Goal1.
_Goal1 ; Goal2 :- Goal2.

And this also proves that the ordering for the clauses is strict.

So by now (after typing this) I am quite convinced that you must have meant something else?

I now finally see what you’re after :slight_smile: It does make sense. As mentioned further down it would require changing the compiler and VM at various places. As is, a clause term is considered a strict tree. In your proposal the compiler should figure out shared subterm in the clause representation and do something clever with them. What exactly depends on the location of the shared subterm. Currently we only handle this special case where a term is both a plain argument to the call and used elsewhere.

Various systems found ways around cyclic terms. Following SICStus, SWI-Prolog does this, i.e., representing a cyclic term as a term @(Skeleton, Unifications)

?- X = f(X), write(a(X)).
@(a(S_1),[S_1=f(S_1)])
X = f(X).

We’ve also see Picat’s approach and we have XSB’s approach to factorize common subterms (regardless of sharing in the source term). Your proposal would (for the compiler) be similar to XSB except that you (only?) consider actual sharing rather than equivalence.

Although (Var) Term looks good, I do consider this dubious in the sense that it introduces something new to the Prolog syntax. As is, we have primitives (variables and atomics) and compounds that use f(…) or an operator. Such a change requires several people to look at it carefully to decide this is ok and won’t get us into trouble now or later.

Should we agree on that, we are faced with the tradeoff between what we gain and what we loose by moving further away from compatibility. I’m afraid my current judgement is that the tradeoff is negative. Things would change if other relevant Prolog systems would be willing to implement this.

2 Likes

I am not finding those VM instruction in the SWI-Prolog source code. They are from this sentence

The compiler generates a no-op, arity-0 VMI B_INLINEVAR immediately prior to the initial B_UNIFY_FIRSTVAR , as a hint to the decompiler.

Need help in translating that to the actual VM instructions which were extracted and in my personal notes. (Click triangle to see details).

OK, would that be a counter example to show my mistake, in regards to the cut transparency?

?- (X = 1, !) ; X = 2.

??

I know this would have been the same: ?- X = 1, ! ; X = 2. but I put the extra parentheses for clarity.