Parsing text using a formal grammar: SWIPL Example

Upgrading a git pack simply runs git pull. If you make local modifications that fails. Under swipl-win git user interaction probably fails, so you need to run git by hand.

Hmm. Seems a Windows thing. Seems remove() doesn’t like to remove read-only files on Windows. Yip, documents say so :frowning:

When you remove manually, like for example on WSL:

> rm -r <git folder>

Then the shell will ask you a yes/no question for certain folders.
But it will be able to completely delete the git folder, if you confirm all.
On windows you can delete SWI pack folders in the explorer.

But I don’t know whether this is allowed, does SWI keep some
information elsewhere, or is what is stored on Windows in
'c:/users/foobar/appdata/local/swi-prolog/pack on per

package basis, all that is stored? Is some pack_cleanup call needed?

Yes. A pack can simply be deleted by deleting its directory.

The current git version can delete read-only files for consistency with other operating systems. It is a bit tricky though because it may not be allowed to delete a file for several reasons in Windows. So, we try to delete the file. If that returns a permission error and the file is read-only, we clean the read-only flag and try again. If it still fails we restore the read-only flag. This makes the operation non-atomic :frowning: Usually Windows provides Func() and a later FuncEx() that provides flags for such cases, but not so for DeleteFile() :frowning:

Was preparing a table with some critical cases. With the new pPEG SWIPL
Example version 02.05.2022, I still find a unary/binary confusion bug:

?- ppeg_parse([-, =:=, :-, -, x0], X), write_canonical(X), nl.
X =  (=:=(-):- -x0).

And even a zeroarity/unary confusion bug:

?- ppeg_parse([-, -, *, :-, x0], X), write_canonical(X), nl.
X =  (*(- (-)):-x0).

Sorry, I’ve lost the plot here. What string will be given to the parser corresponding to the list argument in the calls to ppeg-parse?

The strings are:

?- use_module(library(pPEG)).

?- ['pPEGpl/Examples/SWIP-grammar/'].

?- ['pPEGpl/Examples/SWIP-grammar/'].

?- string_termList("- =:= :- - x0 .", [X]), write_canonical(X), nl.
X =  (=:=(-):- -x0).

?- string_termList("- - * :- x0 .", [X]), write_canonical(X), nl.
X =  (*(- (-)):-x0).

But maybe its a problem with error handling, realization of
your philosophical objection, that the above should fail?
I was using verbose(silent)?

Edit 08.05.2022:
If you take SWI-Prolog as the reference implementation, the two
should indeed fail in pPEG SWIPL example, since in SWI-Prolog
they throw an error:

?- X = (- =:= :- - x0), write_canonical(X), nl.
ERROR: Syntax error: Operator priority clash

?- X = (- - * :- x0), write_canonical(X), nl.
ERROR: Syntax error: Operator priority clash

Yes there’s a bug somewhere in the expression generator since the infix operator =:= ends up with a single argument. I think it could be parsed if =:= was treated as an atom:

?- X = (- (=:=) :- -x0), write_canonical(X).
X =  (- (=:=):- -x0).

?- string_termList("- (=:=) :- - x0 .", [X]), write_canonical(X), nl.
X =  (- (=:=):- -x0).

but this should probably fail as originally written.

The verbose flag should only affect the message output, not succeed/fail semantics.

One source of the bug is this here, I instrumented it with write/1 and nl/0:

% simple postfix
build_term_([V, op(_P,_A,Op)], VarsIn, VarsOut, Term) :-                    
	write('simple postfix, op='), write(op(_P,_A,Op)), nl,
	build_term_([V], VarsIn, NxtVars, Term1),
	reduce_term_(Op, Term1, NxtVars, VarsOut, Term).

You see the bug here:

?- string_termList("- =:= :- - x0 .", [X]).
simple postfix, op=op(700,xfx,=:=)
X =  (=:=(-):- -x0).

So the pPEG SWIPL Example trusts too much in the position
it finds an operator, i.e. [V, op(_P,_A,Op)], and doesn’t check
the operator icon, i.e. xfx, when forming a new term.

The position is not a reliable indicator, since what I was testing
with fuzz4 was errorneous sentences with displaced operators, that
are not in a correct position. Seen in SWI-Prolog that it throws

an error for the example sentence.

Yes, the “simple” prefix, postfix, and infix rules (the ones that call reduce_term_) all suffer from this bug. For now, I’m using:

build_term_([op(_P,A,Op), V], VarsIn, VarsOut, Term) :-                     % simple prefix
	not_op(V), sub_atom(A,0,1,1,'f'),                  % f_
	build_term_([V], VarsIn, VarsNxt, Term1), 
	reduce_term_(Op, Term1, VarsNxt, VarsOut, Term).

build_term_([V, op(_P,A,Op)], VarsIn, VarsOut, Term) :-                     % simple postfix
	not_op(V), sub_atom(A,1,1,0,'f'),                  % _f
	build_term_([V], VarsIn, NxtVars, Term1),
	reduce_term_(Op, Term1, NxtVars, VarsOut, Term).

build_term_([V1, op(_P1,A1,Op1), V2], VarsIn, VarsOut, Term) :-             % simple infix
	not_op(V1), not_op(V2), sub_atom(A1,1,1,1,'f'),    % _f_ 
	reduce_term_(Op1, [Term1,Term2], NxtVars2, VarsOut, Term).

but I haven’t had the time to test this fix or determine that it’s sufficient.

My fuzzer is open source. You find a link
at the end of this medium article here.
What one can try to see, whether the pPEG
SWIPL Example approaches SWI-Prolog.

Currently the results are:

% fuzz3, swi: 0
% fuzz4, swi: 278
% fuzz3, ppeg: 887
% fuzz4, ppeg: 596

I guess with every fix these figures will go down.
I can also do some testing when a new version is
available on GitHub. Maybe you want to test something
else, I tested versus my reference implementation,

but you could also test pPEG against SWI-Prolog.

It’s a bit of work to try and extract the root causes of “disagreement” between the parsers as there’s no error output and the fuzz tests sem to have a lot of duplication. (I counted 30 instances of the same failed test case in one run of fuzz4.) From what I can see, the vast majority of discrepancies involve errors as opposed to parsing a valid term in different ways.

Here’s a start on root causes:

  1. There is a bug in the pPEG parser right reducing non-associative operator sequences. That will be fixed.

  2. Class of expressions exemplified by “x0 ** -x1”, so an xfx infix operator followed by a fy prefix operator with the same precedence:

?- current_op(P,A,-).
P = 200,
A = fy ;
P = 500,
A = yfx.

?- current_op(P,A,**).
P = 200,
A = xfx.

?- X=(x0 ** - x1).
ERROR: Syntax error: Operator priority clash
ERROR: X=(x0 *
ERROR: ** here **
ERROR: * - x1) . 

?- string_termList("x0 ** - x1.",[T]).
T = x0**(-x1).

The pPEG parser supports this as a valid right associative sequence:


which seems to work well in practice, so I’ve left it in place; SWIP (and others?) do not. When I temporarily remove this capability, along with fix for 1., the fuzz3 count drops to 0.

  1. Class of expressions exemplified by “- * x0”. These cannot be parsed successfully if the first “token” is interpreted as a prefix operator. But it can be parsed if it’s interpreted as an atom, which both SWIP and the pPEG parser do:
?- X=(- * x0).
X =  (-)*x0.

?- string_termList("- * x0.",[T]).
T =  (-)*x0.

but the test parser apparently does not:

?- simp_parse([-,*,x0],T).
T = error.
  1. Somewhat related to 3, I recall (perhaps incorectly) reading somewhere that a prfix operator cannot be immediately followed by an infix operator. I do enforce this in the grammar with limited lookahead so:
?- string_termList("- * ** x0.",[T]).
% pPEG Error: Prolog.InfixOp failed, expected <pl_grammar:testOp infix> at line 1.10:
%   1 | - * ** x0.
%                ^

SWIP also treats this as an error:

?- X=(- * ** x0).
ERROR: Syntax error: Operator expected
ERROR: X=(- * **
ERROR: ** here **
ERROR:  x0) . 

but the test parser does not.

?- simp_parse([-,*,**,x0],T).
T = - (*)**x0.

So here’s three root causes of differences mainly confined to corner cases involving errors. In most (all?) cases, the non-error interpretation seems reasonable to me, so the errors are either bugs or differences in interpretation due to incomplete specification.

I guess the first step is to identify all such root causes and use them to define additions/clarifications to existing specifications. At that point, Prolog implementors would have something concrete on which to base further development, should they so choose.

1 Like

There might be some useful ideas in the papers referenced here: (plus the thread mentions an ancient technique for operator precedence handling in Fortran parsers).

Well if that’s what the standard says, I agree with your opinion. Parsing SWIP syntax requires lookahead as the grammar explicitly states:

	expr   = PrefixOp " " (&PrefixOp expr / !InfixOp expr)
	       / term " " ( InfixOp " " expr " " / PostfixOp " " )*

&” and “!” are PEG lookahead operators. There seem to be a couple of cases where single token lookahead is currently used, but maybe they’re mainly for convenience, e.g., to exclude “,” as an operator in argument lists. (Note that the pPEG parser produced from the grammar is more than a simple tokenizer.)

Not sure how this is parseable if restricted to LL(1), but I’m no expert.

The above has a bug I guess. HEAD(PrefixOp) and HEAD(term) are not disjoint.
In as far it violates the following, since your pPEG / might decide to go for term
even when it has seen a PrefixOp:

The word directly basically means without some right hand side
context. So when you see a PrefixOp, you need already commit.
Here is a possible fix:

expr   = PrefixOp " " expr
	       / !PrefixOp  term " " ( InfixOp " " expr " " / PostfixOp " " )*

The fix does the job, but I might have broken some other stuff,
you see the before and after:

?- string_termList("- * x0 .", [T]).
T =  (-)*x0.

?- string_termList("- x0 + x1 .", [T]).
T = -x0+x1.

% ppegpl/examples/swip-grammar/pl_grammar compiled ...
?- string_termList("- * x0 .", [T]).
% pPEG Error: Prolog.InfixOp failed, expected ...
%   1 | - * x0 .
%             ^

?- string_termList("- x0 + x1 .", [T]).
T = -x0+x1.

What did I break, this here:

?- string_termList("(-).", [T]).
% pPEG Error: Prolog.PrefixOp failed, expected &[0-9] at line 1.3:
%   1 | (-).
%         ^

But, as you found, the syntax for a PrefixOp is ambiguous - it could either be a PrefixOp or an atom, so you can’t commit immediately. In the case of the first choice there may be no “operand” to apply the PrefixOp to. On the other hand, the !PrefixOp choice also fails if the “token” matches a PrefixOp syntactically.

So isn’t this the root of the problem? The syntax is ambiguous - is it a PrefixOp or an atom (a type of term) `? - and the ambiguity can’t be resolved without lookahead, which isn’t allowed under LL(1).

Another example: is a PrefixOp followed immediately by a “(” a PrefixOp? I would say no.

No I didn’t find that. The syntax is not ambiguous, since the
ISO core standard contains this clause, the only grammarly “not”
in the ISO core standard? (corresponds to pPEG !): Atoms

Basically excluding PrefixOp from expr in here, is a first
step towards implementing the above rule:

Since when you exclude PrefixOp from term, you will not have
PrefixOp as an immediate operand in the subsequent InfixOp expr
or PostfixOp . But the matter is more complex, the above is

only a sketch. You need to be also able to handle cases like this here:

In the above case the minus sign - is not an immediate operand. I don’t
know how to fix your pPEG grammar towards the end, that PrefixOp is only
excluded from term if it follows by InfixOp or PostfixOp, i.e. to have the cake

and eat it too, fully realizing the Atoms clause.

Edit 17.05.2022:
In simp_parse/2 I solved the problem differently and more tolerant
than the ISO core standard, nevertheless the minus sign - , since it is
a PrefixOp, will not appear as an immediate operand on the left hand

side. But I allow it on the right hand side, but then I have differences
in term building with SWI-Prolog:

/* SWI-Prolog */
?- X = (- -), write_canonical(X), nl.
X = - (-).

?- simp_parse([-,-], T), write_canonical(T), nl.
T = - (-).

?- X = (- - * - -), write_canonical(X), nl.
X = - (-)* - (-).

?- simp_parse([-,-,*,-,-], T), write_canonical(T), nl.
T = - - (*)-(-).

Which is in both cases a relaxing of the ISO core standard, since
it violates the Atoms clause, check out Scryer Prolog:

$ target/release/scryer-prolog -v
$ target/release/scryer-prolog
?- X = (- -), write_canonical(X), nl.
?- X = (- - * - -), write_canonical(X), nl.

Looks like somebody did also try to realize this rule in your pPEG
SWIPL Example. If I check the source code of
I find for example the following:

% simple infix
build_term_([V1, op(_P1,_A1,Op1), V2], VarsIn, VarsOut, Term) :-            
	not_op(V1), not_op(V2),

The only problem is that this is done too late. Is suspect doing
this in the would be more appropriate. But in there is hardly a not_op/1.

What I found useful was pPEGs negation !. But I only found
a partial solution and not a full solution. Since ( op ) cannot
be parsed anymore.

So by ISO would prohibit:

?- X = (- -).
X = - (-).


?- X = (- * -).
X =  (-)*(-).

If so, would you really want to enforce it?

I see only two practical alternatives to the expr grammar rule; either (as written):

	expr   = PrefixOp " " (&PrefixOp expr / !InfixOp expr)
	       / term " " ( InfixOp " " expr " " / PostfixOp " " )*

This appears to exploit to a degree since it says an InfixOp cannot be an operand of a PrefixOp. So this fails (as does SWIP):

?- string_termList("- * .",[T]).
% pPEG Error: Prolog.expr failed, expected Prolog.expr at line 1.5:
%   1 | - * .
%           ^


	expr   = PrefixOp " " expr
	       / term " " ( InfixOp " " expr " " / PostfixOp " " )*

and this fails:

?- string_termList("- * - .",[T]).
% pPEG Error: Prolog.expr failed, expected Prolog.expr at line 1.7:
%   1 | - * - .
%             ^

You have suggested a third alternative with a !PrefixOp guard, but then this fails:

% /Users/rworkman/Documents/PrologDev/SWI_Prolog/pPEG/SWIP-grammar/pl_grammar compiled into pl_grammar 0.02 sec, 0 clauses
?- string_termList("- .",[T]).
% pPEG Error: Prolog.expr failed, expected Prolog.expr at line 1.3:
%   1 | - .
%         ^

which seems even more restrictive than ISO, so it wouldn’t make my short list.

Scryer seems to take literally. That seems too restrictive to me and, it appears, most Prolog implementations agree. For what it’s worth, here’s what Eclipse has to say on the matter (at least they say something):

A.3.3 Operator Ambiguities
Unlike the canonical syntax, operator syntax can lead to ambiguities.

For instance, when a prefix operator is followed by an infix or postfix operator, the prefix is often not meant to be a prefix operator, but simply the left hand side argument of the following infix or postfix. In order to decide whether that is the case, ECLiPSe uses the operator’s relative precedences and their associativities, and, if necessary, a two-token lookahead. If this rules out the prefix-interpretation, then the prefix is treated as a simple atom. In the rare case where this limited lookahead is not enough to disambigute, the prefix must be explicitly enclosed in parentheses.

Another source of ambiguity are operators which have been declared both infix and postfix. In this case, ECLiPSe uses a one-token lookahead to check whether the infix-interpretation can be ruled out. If yes, the operator is interpreted as postfix, otherwise as infix. Again, in rare cases parentheses may be necessary to enforce the interpretation as postfix.

When a binary prefix operator is followed by an infix operator, then either of them could be the main functor. Faced with the ambiguity, the system will prefer the infix interpretation. To force the binary prefix to be recognised, the infix must be enclosed in parentheses.

Ref: Formal definition of clause syntax

Note: Eclipse supports a binary prefix operator which is outside the scope of this discussion so the last paragraph can be ignored.

And this also fails:

But there should be some way to make it work.

Edit 17.05.2022:
It also works in simp_parse/2:

?- simp_parse([-], T).
T =  (-).

?- simp_parse(['(',-,')'], T).
T =  (-).

Thanks. Here’s another I found:

“Precedences in specifications and implementations of programming languages” :

But to my untrained eye, they don’t address the root of the problem with Prolog expressions, namely that operators and atoms are syntactically indistinguishable. So “- *” could be parsed as “-(*)” (it’s actually a syntax error in SWIP) and “- * -” as “*((-),(-))”. Note in the first, “-” is and operator and “*” is an atom, while the roles are reversed in the second. In theory with sufficient lookahead this can be resolved, but it gets a bit tricky in practice for the general case and it gets worse when trying to define it in a grammar which recognizes Prolog syntax, which is my main focus here.

In practice, it’s not a big deal since you can just add enough parentheses to produce the right semantics. But as @j4n_bur53 rightly points out, the implicit assumptions built into various builtin Prolog parsers can adversely affect portability. At least, a grammar forces you to make some of them explicit.