Problem
I often find when I’m writing Prolog that I both want to unify an argument against a template and use that argument in full in the body. As it stands, my options are to (a) postpone the unification until the body, which changes (meta-)semantics and can impact performance, or (b) recreate the argument in full in the body, which violates DRY and can lead to overallocation of local variables. In other words:
display_record(Record) :-
Record = record(_, _, _, _),
format('Record: ~p~n', [Record]).
display_record(record(A, B, C, D)) :-
format('Record: ~p~n', [record(A, B, C, D)]).
What I’d like would be a syntax that allows me to bind a head variable and specify a unification template at the same time. What’s funny is that the SWI-Prolog compiler already does this (if prolog flag optimise_unify
is set), and the decompiler can’t handle it:
?- listing(display_record).
display_record(record(_, _, _, _)) :-
format('Record: ~p~n', [_]).
display_record(record(A, B, C, D)) :-
format('Record: ~p~n', [record(A, B, C, D)]).
Note the format/2 goal in the first clause body has an anonymous variable as its argument, which we also see in reflection with clause/2:
?- Head = display_record(Arg), clause(Head, Body), numbervars((Head,Body)).
Head = display_record(record(A, B, C, D)),
Arg = record(A, B, C, D),
Body = format('Record: ~p~n', [E]) ;
Head = display_record(record(A, B, C, D)),
Arg = record(A, B, C, D),
Body = format('Record: ~p~n', [record(A, B, C, D)]).
So, not only is clause/2 returning “incorrect” values (by rights, I should be able to get a reference to the first clause with clause(display_record(Arg), _, Clause), var(Arg)
, but that fails) but the values it returns won’t even work if you assert/1 them back into the database.
Existing Alternatives
As noted above, this behavior already exists in the compiler, controlled by the flag optimise_unify
. However, depending on the type of rule, the syntax required to activate it could take one of three forms:
display_record(Record) :-
Record = record(_, _, _, _),
... .
display_record(Record), Record = record(_, _, _, _) =>
... .
display_record(Record) -->
{ Record = record(_, _, _, _) },
... .
The only way to inhibit this behavior (that is, to actually put the unification in the body) is to move it to later in the body or to disable optimise_unify
altogether.
Proposed Syntax
I’d like to add a syntax that reifies the optimise_unify
behavior back into the language itself. The syntax should allow me to specify a head-unification template while also providing a variable name that can be used to refer to that argument. My proposal is the following:
display_record((Record) record(_, _, _, _)) :-
format('Record: ~p~n', [Record]).
More specifically, the syntax consists of a literal variable name in parentheses preceding a standard Prolog term; in terms of precedence I’d suggest that the construction be equivalent to a Prolog fy
operator at precedence 650 or thereabouts, just above A:B
and just below the comparison/unification operators. The variable name must not have already been introduced in the scope of this term, as this isn’t a unification construct, it’s just a way to name a subterm during parsing.
The variable name is in scope immediately after the close parenthesis. Thus, (Foo) Foo
is a very silly way to describe an unbound variable, and (Cyc) [1,2,3|Cyc]
is a way to describe a cyclic list in a self-contained syntax, not requiring multiple goals or template substitution.
Semantics
This construct doesn’t have any semantics on its own, since it’s entirely transparent to the structure of a term. Put another way, taking any valid Prolog term and prepending (_)
is a syntactic no-op and will not change how the term parses in any way. However, the compiler can take note of names declared this way and will tweak the compilation accordingly:
-
An inline name declaration for a clause head parameter generates the same head-unification code as the bare subterm - that is, the presence or absence of an inline name on a top-level parameter does not change the head-unification code. However, any references to that name in the body (or later in the head) will just reference the corresponding argument slot, rather than creating a new variable in the local frame.
-
An inline name declaration for a subterm of a clause head parameter requires a new VMI; I’ll call it
H_INLINEVAR
. The behavior is exactly likeH_FIRSTVAR
except that it doesn’t incrementARGP
(and thus, it requires anotherH_*
afterwards to define the contents of theINLINEVAR
). -
An inline name declaration within a clause body compiles as though factorized into two subgoals, where the first subgoal independently sets the value of the inline variable and the second references it. The compiler generates a no-op, arity-0 VMI
B_INLINEVAR
immediately prior to the initialB_UNIFY_FIRSTVAR
, as a hint to the decompiler. -
Body unification to a variable declared with an inline name in the head never triggers the
optimise_unify
behavior. So, it’s possible to force body-unification of an argument with the following syntax:no_head_unify((Arg) _) :- Arg = 1.
Predicate Interactions
Most existing predicates don’t have to care about this new syntax at all, since it’s a syntax-only construct. Those that do generally deal with reflection or term I/O:
- read_term/2
This is the big one, of course. The actual term read never changes, as described above, so a term written as
will be read assame_term((Arg) values(1,2,3), Arg)
same_term(values(1,2,3), values(1,2,3))
, where the second argument is a reference to the first. (Thus, calling this term as a goal will succeed.) However, quite a few options are impacted:-
variable_names(Vars)
This will not change, which is to say that inline variable name declarations attached to non-variable subterms will not be present in theVars
list. This is for backwards and ISO compatibility, as existing code may reasonably assume that all members of theVars
list have a variable term as their second argument. For an inline-name-compatible replacement, use -
subterm_names(Assignments)
(new)
Exactly asvariable_names(Vars)
, but theName = Subterm
elements don’t require that the second argument be a variable. -
subterm_positions(TermPos)
Will have a format describing inline names. -
cycles(Bool)
Does not change, but will probably not be needed as much. -
factorized_subterms(Substitution)
(new)
If this option is given, named subterms will not be included as their value, but will instead be replaced by anonymous variables, andSubstitution
is unified with a list of variables and their values inVar=Value
form. This option behaves as though the following definition were in effect:
It also alters slightly the semantics of theread_term(Term, Options) :- select_option(factorized_subterms(Substitution), Options, RestOpts), read_term(FullTerm, RestOpts), term_factorized(FullTerm, Term, Substitution).
variable_names
andsubterm_names
options; the former now describes the anonymous variables returned inTerm
, and the latter describes the values returned infactorized_subterms(Substitution)
. -
singletons(Vars)
will return inline names if they are singletons. Iffactorized_subterms
has not been given, the second argument to the=
will be an anonymous variable (and, itself, a singleton).
-
- write_term/2
Again, lots of changes here, and all in option handling. Note, though, that the default representation of cyclic terms changes due to the change in the default value ofcycles(_)
.-
cycles(Atom)
(changed from Bool)
The meanings oftrue
andfalse
are unchanged, but they are joined by the new valueinline
, which represents cyclic terms using inline names instead of@(Template, Substitutions)
syntax. The default changes fromtrue
toinline
. -
variable_names(Vars)
Remains unchanged, as with the read_term/2 option of the same name. The stipulation that “Terms where Var is bound or is a variable that does not appear in Term are ignored.” remains in effect, which is to say that this option cannot be used to provide inline subterm names, even if those subterms are written with names due tocycles(inline)
. -
subterm_names(Assignments)
(new)
Behaves similarly tovariable_names
, but works on bound terms as well as variables. Subterm comparison is performed with the same logic as same_term/2. NOTE: this option is expensive, probably. Subterm comparison must be performed at every level of writing, unlike withvariable_names
which just sets the variables beforehand. -
factorized_subterms(Substitution)
(new)
Allows full customization of which subterms are named. Takes a list in the same format as the same option in read_term/2, takes a template as theTerm
argument in the same way, and interacts with the optionsvariable_names
andsubterm_names
in the same way. -
factorized(Bool)
(new)
If present andtrue
, all compound terms that appear more than once have a name applied. Semantically equivalent to the following definition:write_term(Term, Options) :- select_option(factorized(Bool), Options, RestOpts), Bool == true, term_factorized(Term, Skeleton, Substitution), write_term(Skeleton, [factorized_subterms(Substitution)|RestOpts]).
-
- listing/1, portray_clause/1, et al
These handle decompilation of inline-named variables properly, as noted above under Semantics. - clause/2,3
These return the head and body of a given clause as they are. Bound subterms are bound, named (and subsequently shared) subterms are shared. This means that given these definitions:
a call todisplay_record(record(A, B, C, D)) :- format('Record: ~p~n', [record(A, B, C, D)]). display_record((Record) record(_, _, _, _)) :- format('Record: ~p~n', [Record]).
clause(display_record(HRec), format(_, [BRec]))
will match both clauses, but only the second will also satisfysame_term(HRec, BRec)
. - assert/1 et al
This (like all dynamic programming) is basically unaffected, since assert operates after the parse step. However, the variable-sharing optimizations of the compiler will activate based on shared subterms, so the return from clause/2 will compile into the same code as the original when asserted. - term_expansion/2
Like assert/1 above, this is largely unaffected. The only thing to note is that binding what was originally a variable in the read term will often cause subterm sharing in the resultant clause.
That’s all I can think of at the moment, though I’m sure I’ve missed something. What thoughts do folks have on this? Is the new syntax something other people would find useful?