Parsing text using a formal grammar: SWIPL Example

This is an interesting pPEG example bug. Actually I wanted to hunt
parenthesis parsing bugs, so I was up to generating some test cases that
contain parenthesis. But this test case doesn’t contain any parenthesis,

but it seems a binary infix operator is parsed as an unary functor?

?- string_termList(":- x0 =:= :- x1 - x2 .", [Y]), write_canonical(Y), nl.
:-(:-(=:=(x0),-(x1,x2)))
Y =  (:- (=:=(x0):-x1-x2)).

I am still using pPEG from 14.04.2022. How do I do a package update?

Edit 30.04.2022:
Concerning parenthesis I only found this bug, but its not really parenthesis
related. So unlike Trealla it could be that parenthesis are not a problem.
This here gives the same associativity hickup with and without parenthesis:

?- string_termList("- ( x0 * x1 ) ** x2 =:= x3 .", [Y]), 
   write_canonical(Y), nl.
-(=:=(**(*(x0,x1),x2),x3))
Y = - ((x0*x1)**x2=:=x3).

?- string_termList("- x4 ** x2 =:= x3 .", [Y]), write_canonical(Y), nl.
-(=:=(**(x4,x2),x3))
Y = - (x4**x2=:=x3).

?- Y = (- x4 ** x2 =:= x3), write_canonical(Y), nl.
=:=(-(**(x4,x2)),x3)
Y =  (-x4**x2=:=x3).

I believe this is the old right reduction bug. Using the current version:

?- string_termList("- x4 ** x2 =:= x3 .", [Y]), write_canonical(Y), nl.
=:=(-(**(x4,x2)),x3)
Y =  (-x4**x2=:=x3).

You should be able to use pack_upgrade/1, i.e., pack_upgrade(pPEGpl). Failing that, use pack_install to re-install after removing it. (Ref: SWI-Prolog -- library(prolog_pack): A package manager for Prolog .)

The examples, including SWIP grammar and parser, are part of the pack so can be found wherever your copy of SWI-Prolog installs packs. Or you can find them in the github repository (https://github.com/ridgeworks/pPEGpl).

I get false in bold red letters, what does this mean?

?- pack_upgrade(pPEGpl).
% Running "git fetch" in 'c:/users/foobar/appdata/local/swi-prolog/pack/pPEGpl'
% From https://github.com/ridgeworks/pPEGpl
%    ab024cd..6b7e334  main       -> origin/main
false.

Maybe it didn’t update, since I changed one file. (See
other thread, I did a small change for better error handling).

Does SWI-Prolog ship with a GIT client? Can it do merge?

Edit 02.05.2022:
Pack remove is also a bumpy road:

?- pack_remove(pPEGpl).
% Removing 'c:/users/foobar/appdata/local/swi-prolog/pack/pPEGpl' and contents
ERROR: No permission to delete file `'c:/users/foobar/appdata/local/swi-prolog/pack/pPEGpl/.git/objects/21/a031023db3ac04b268c5118b304954a03edb98'' (Permission denied)

I tried another machine, could install new version what was there today 02.05.2022.

Now this test case works:

?- string_termList("- x4 ** x2 =:= x3 .", [Y]), write_canonical(Y), nl.
=:=(-(**(x4,x2)),x3)
Y =  (-x4**x2=:=x3).

The other error is also gone:

?- string_termList(":- x0 =:= :- x1 - x2 .", [Y]), write_canonical(Y), nl.
% prolog_parser Error, operator clash in: op(1200,fx,:-) 
% =:=(x0) op(1200,fx,:-) x1 op(500,yfx,-) x2
false.

Will use version downloaded 02.05.2022 for some further investigations.

Upgrading a git pack simply runs git pull. If you make local modifications that fails. Under swipl-win git user interaction probably fails, so you need to run git by hand.

Hmm. Seems a Windows thing. Seems remove() doesn’t like to remove read-only files on Windows. Yip, documents say so :frowning:

When you remove manually, like for example on WSL:

> rm -r <git folder>

Then the shell will ask you a yes/no question for certain folders.
But it will be able to completely delete the git folder, if you confirm all.
On windows you can delete SWI pack folders in the explorer.

But I don’t know whether this is allowed, does SWI keep some
information elsewhere, or is what is stored on Windows in
'c:/users/foobar/appdata/local/swi-prolog/pack on per

package basis, all that is stored? Is some pack_cleanup call needed?

Yes. A pack can simply be deleted by deleting its directory.

The current git version can delete read-only files for consistency with other operating systems. It is a bit tricky though because it may not be allowed to delete a file for several reasons in Windows. So, we try to delete the file. If that returns a permission error and the file is read-only, we clean the read-only flag and try again. If it still fails we restore the read-only flag. This makes the operation non-atomic :frowning: Usually Windows provides Func() and a later FuncEx() that provides flags for such cases, but not so for DeleteFile() :frowning:

Was preparing a table with some critical cases. With the new pPEG SWIPL
Example version 02.05.2022, I still find a unary/binary confusion bug:

?- ppeg_parse([-, =:=, :-, -, x0], X), write_canonical(X), nl.
:-(=:=(-),-(x0))
X =  (=:=(-):- -x0).

And even a zeroarity/unary confusion bug:

?- ppeg_parse([-, -, *, :-, x0], X), write_canonical(X), nl.
:-(*(-(-)),x0)
X =  (*(- (-)):-x0).

Sorry, I’ve lost the plot here. What string will be given to the parser corresponding to the list argument in the calls to ppeg-parse?

The strings are:

?- use_module(library(pPEG)).
true.

?- ['pPEGpl/Examples/SWIP-grammar/pl_grammar.pl'].
true.

?- ['pPEGpl/Examples/SWIP-grammar/pl_parser.pl'].
true.

?- string_termList("- =:= :- - x0 .", [X]), write_canonical(X), nl.
:-(=:=(-),-(x0))
X =  (=:=(-):- -x0).

?- string_termList("- - * :- x0 .", [X]), write_canonical(X), nl.
:-(*(-(-)),x0)
X =  (*(- (-)):-x0).

But maybe its a problem with error handling, realization of
your philosophical objection, that the above should fail?
I was using verbose(silent)?

Edit 08.05.2022:
If you take SWI-Prolog as the reference implementation, the two
should indeed fail in pPEG SWIPL example, since in SWI-Prolog
they throw an error:

?- X = (- =:= :- - x0), write_canonical(X), nl.
ERROR: Syntax error: Operator priority clash

?- X = (- - * :- x0), write_canonical(X), nl.
ERROR: Syntax error: Operator priority clash

Yes there’s a bug somewhere in the expression generator since the infix operator =:= ends up with a single argument. I think it could be parsed if =:= was treated as an atom:

?- X = (- (=:=) :- -x0), write_canonical(X).
:-(-(=:=),-(x0))
X =  (- (=:=):- -x0).

?- string_termList("- (=:=) :- - x0 .", [X]), write_canonical(X), nl.
:-(-(=:=),-(x0))
X =  (- (=:=):- -x0).

but this should probably fail as originally written.

The verbose flag should only affect the message output, not succeed/fail semantics.

One source of the bug is this here, I instrumented it with write/1 and nl/0:

% simple postfix
build_term_([V, op(_P,_A,Op)], VarsIn, VarsOut, Term) :-                    
	not_op(V),
	!,
	write('simple postfix, op='), write(op(_P,_A,Op)), nl,
	build_term_([V], VarsIn, NxtVars, Term1),
	reduce_term_(Op, Term1, NxtVars, VarsOut, Term).

You see the bug here:

?- string_termList("- =:= :- - x0 .", [X]).
simple postfix, op=op(700,xfx,=:=)
X =  (=:=(-):- -x0).

So the pPEG SWIPL Example trusts too much in the position
it finds an operator, i.e. [V, op(_P,_A,Op)], and doesn’t check
the operator icon, i.e. xfx, when forming a new term.

The position is not a reliable indicator, since what I was testing
with fuzz4 was errorneous sentences with displaced operators, that
are not in a correct position. Seen in SWI-Prolog that it throws

an error for the example sentence.

Yes, the “simple” prefix, postfix, and infix rules (the ones that call reduce_term_) all suffer from this bug. For now, I’m using:

build_term_([op(_P,A,Op), V], VarsIn, VarsOut, Term) :-                     % simple prefix
	not_op(V), sub_atom(A,0,1,1,'f'),                  % f_
	!,
	build_term_([V], VarsIn, VarsNxt, Term1), 
	reduce_term_(Op, Term1, VarsNxt, VarsOut, Term).

build_term_([V, op(_P,A,Op)], VarsIn, VarsOut, Term) :-                     % simple postfix
	not_op(V), sub_atom(A,1,1,0,'f'),                  % _f
	!,
	build_term_([V], VarsIn, NxtVars, Term1),
	reduce_term_(Op, Term1, NxtVars, VarsOut, Term).

build_term_([V1, op(_P1,A1,Op1), V2], VarsIn, VarsOut, Term) :-             % simple infix
	not_op(V1), not_op(V2), sub_atom(A1,1,1,1,'f'),    % _f_ 
	!,
	build_term_([V1],VarsIn,NxtVars1,[Term1]),
	build_term_([V2],NxtVars1,NxtVars2,[Term2]), 
	reduce_term_(Op1, [Term1,Term2], NxtVars2, VarsOut, Term).

but I haven’t had the time to test this fix or determine that it’s sufficient.

My fuzzer is open source. You find a link
at the end of this medium article here.
What one can try to see, whether the pPEG
SWIPL Example approaches SWI-Prolog.

Currently the results are:

% fuzz3, swi: 0
% fuzz4, swi: 278
% fuzz3, ppeg: 887
% fuzz4, ppeg: 596

I guess with every fix these figures will go down.
I can also do some testing when a new version is
available on GitHub. Maybe you want to test something
else, I tested versus my reference implementation,

but you could also test pPEG against SWI-Prolog.

It’s a bit of work to try and extract the root causes of “disagreement” between the parsers as there’s no error output and the fuzz tests sem to have a lot of duplication. (I counted 30 instances of the same failed test case in one run of fuzz4.) From what I can see, the vast majority of discrepancies involve errors as opposed to parsing a valid term in different ways.

Here’s a start on root causes:

  1. There is a bug in the pPEG parser right reducing non-associative operator sequences. That will be fixed.

  2. Class of expressions exemplified by “x0 ** -x1”, so an xfx infix operator followed by a fy prefix operator with the same precedence:

?- current_op(P,A,-).
P = 200,
A = fy ;
P = 500,
A = yfx.

?- current_op(P,A,**).
P = 200,
A = xfx.

?- X=(x0 ** - x1).
ERROR: Syntax error: Operator priority clash
ERROR: X=(x0 *
ERROR: ** here **
ERROR: * - x1) . 

?- string_termList("x0 ** - x1.",[T]).
T = x0**(-x1).

The pPEG parser supports this as a valid right associative sequence:

op_associativityEq(xfx,fy,right).

which seems to work well in practice, so I’ve left it in place; SWIP (and others?) do not. When I temporarily remove this capability, along with fix for 1., the fuzz3 count drops to 0.

  1. Class of expressions exemplified by “- * x0”. These cannot be parsed successfully if the first “token” is interpreted as a prefix operator. But it can be parsed if it’s interpreted as an atom, which both SWIP and the pPEG parser do:
?- X=(- * x0).
X =  (-)*x0.

?- string_termList("- * x0.",[T]).
T =  (-)*x0.

but the test parser apparently does not:

?- simp_parse([-,*,x0],T).
T = error.
  1. Somewhat related to 3, I recall (perhaps incorectly) reading somewhere that a prfix operator cannot be immediately followed by an infix operator. I do enforce this in the grammar with limited lookahead so:
?- string_termList("- * ** x0.",[T]).
% pPEG Error: Prolog.InfixOp failed, expected <pl_grammar:testOp infix> at line 1.10:
%   1 | - * ** x0.
%                ^
false.

SWIP also treats this as an error:

?- X=(- * ** x0).
ERROR: Syntax error: Operator expected
ERROR: X=(- * **
ERROR: ** here **
ERROR:  x0) . 

but the test parser does not.

?- simp_parse([-,*,**,x0],T).
T = - (*)**x0.

So here’s three root causes of differences mainly confined to corner cases involving errors. In most (all?) cases, the non-error interpretation seems reasonable to me, so the errors are either bugs or differences in interpretation due to incomplete specification.

I guess the first step is to identify all such root causes and use them to define additions/clarifications to existing specifications. At that point, Prolog implementors would have something concrete on which to base further development, should they so choose.

1 Like

There might be some useful ideas in the papers referenced here: https://twitter.com/jurgenvinju/status/1526123558539337728 (plus the thread mentions an ancient technique for operator precedence handling in Fortran parsers).

Well if that’s what the standard says, I agree with your opinion. Parsing SWIP syntax requires lookahead as the grammar explicitly states:

	expr   = PrefixOp " " (&PrefixOp expr / !InfixOp expr)
	       / term " " ( InfixOp " " expr " " / PostfixOp " " )*

&” and “!” are PEG lookahead operators. There seem to be a couple of cases where single token lookahead is currently used, but maybe they’re mainly for convenience, e.g., to exclude “,” as an operator in argument lists. (Note that the pPEG parser produced from the grammar is more than a simple tokenizer.)

Not sure how this is parseable if restricted to LL(1), but I’m no expert.

The above has a bug I guess. HEAD(PrefixOp) and HEAD(term) are not disjoint.
In as far it violates the following, since your pPEG / might decide to go for term
even when it has seen a PrefixOp:

The word directly basically means without some right hand side
context. So when you see a PrefixOp, you need already commit.
Here is a possible fix:

expr   = PrefixOp " " expr
	       / !PrefixOp  term " " ( InfixOp " " expr " " / PostfixOp " " )*

The fix does the job, but I might have broken some other stuff,
you see the before and after:

?- string_termList("- * x0 .", [T]).
T =  (-)*x0.

?- string_termList("- x0 + x1 .", [T]).
T = -x0+x1.

?- 
% ppegpl/examples/swip-grammar/pl_grammar compiled ...
?- string_termList("- * x0 .", [T]).
% pPEG Error: Prolog.InfixOp failed, expected ...
%   1 | - * x0 .
%             ^
false.

?- string_termList("- x0 + x1 .", [T]).
T = -x0+x1.

What did I break, this here:

?- string_termList("(-).", [T]).
% pPEG Error: Prolog.PrefixOp failed, expected &[0-9] at line 1.3:
%   1 | (-).
%         ^
false.

But, as you found, the syntax for a PrefixOp is ambiguous - it could either be a PrefixOp or an atom, so you can’t commit immediately. In the case of the first choice there may be no “operand” to apply the PrefixOp to. On the other hand, the !PrefixOp choice also fails if the “token” matches a PrefixOp syntactically.

So isn’t this the root of the problem? The syntax is ambiguous - is it a PrefixOp or an atom (a type of term) `? - and the ambiguity can’t be resolved without lookahead, which isn’t allowed under LL(1).

Another example: is a PrefixOp followed immediately by a “(” a PrefixOp? I would say no.

No I didn’t find that. The syntax is not ambiguous, since the
ISO core standard contains this clause, the only grammarly “not”
in the ISO core standard? (corresponds to pPEG !):

6.3.1.3 Atoms
Unbenannt

Basically excluding PrefixOp from expr in here, is a first
step towards implementing the above rule:

Since when you exclude PrefixOp from term, you will not have
PrefixOp as an immediate operand in the subsequent InfixOp expr
or PostfixOp . But the matter is more complex, the above is

only a sketch. You need to be also able to handle cases like this here:

In the above case the minus sign - is not an immediate operand. I don’t
know how to fix your pPEG grammar towards the end, that PrefixOp is only
excluded from term if it follows by InfixOp or PostfixOp, i.e. to have the cake

and eat it too, fully realizing the 6.3.1.3 Atoms clause.

Edit 17.05.2022:
In simp_parse/2 I solved the problem differently and more tolerant
than the ISO core standard, nevertheless the minus sign - , since it is
a PrefixOp, will not appear as an immediate operand on the left hand

side. But I allow it on the right hand side, but then I have differences
in term building with SWI-Prolog:

/* SWI-Prolog */
?- X = (- -), write_canonical(X), nl.
-(-)
X = - (-).

?- simp_parse([-,-], T), write_canonical(T), nl.
-(-)
T = - (-).

?- X = (- - * - -), write_canonical(X), nl.
*(-(-),-(-))
X = - (-)* - (-).

?- simp_parse([-,-,*,-,-], T), write_canonical(T), nl.
-(-(-(*)),-)
T = - - (*)-(-).

Which is in both cases a relaxing of the ISO core standard, since
it violates the 6.3.1.3 Atoms clause, check out Scryer Prolog:

$ target/release/scryer-prolog -v
"v0.9.0-117-gf2d3e55e"
$ target/release/scryer-prolog
?- X = (- -), write_canonical(X), nl.
   error(syntax_error(incomplete_reduction),read_term/3:1).
?- X = (- - * - -), write_canonical(X), nl.
   error(syntax_error(incomplete_reduction),read_term/3:1).