Parsing text using a formal grammar: SWIPL Example

What I could exclude as a source of error, is ordering of
the operator table itself. At least for SWI-Prolog C code it does not
have an impact, which sequence of operator definitions I use:

This sequence of op/3 calls:

?- op(9, fy, fy), op(9, yfx, yfx).
true.

?- X = (fy 1 yfx 2), write_canonical(X), nl.
yfx(fy(1),2)
X = fy 1 yfx 2.

And this sequence of op/3 calls, give the same:

?- op(9, yfx, yfx), op(9, fy, fy).
true.

?- X = (fy 1 yfx 2), write_canonical(X), nl.
yfx(fy(1),2)
X = fy 1 yfx 2.

This does not exclude the possibility, that reordering the
operator table, inside SWI-Prolog C code, wouldn’t give
another result. It only shows that op/3 cannot confuse the parser.

Edit 15.04.2022:
Here is an example where operator table ordering, respective
reduction rules ordering, influence the reduction result.
Reduction rules ordered like this:

[fy,X] ~> [fy(X)].
[X,yfx,Y] ~> [yfx(X,Y)].

?- reduce([fy,1,yfx,2],X), write_canonical(X), nl.
[yfx(fy(1),2)]
X = [fy 1 yfx 2].

Or the same reduction rules ordered like this, gives a different result:

[X,yfx,Y] ~> [yfx(X,Y)].
[fy,X] ~> [fy(X)].

?- reduce([fy,1,yfx,2],X), write_canonical(X), nl.
[fy(yfx(1,2))]
X = [fy 1 yfx 2].

The reducer code itself is here below, it has a cut (!), so its
sensitive to the reduction rule ordering:

:- op(1200,xfx,~>).

reduce(X, Y) :-
   (A  ~> B),
   append(U, V, X),
   append(A, H, V),
   append(B, H, W),
   append(U, W, Z), !, reduce(Z, Y).
reduce(X, X).

So I think I’ve finally figured this one out. It’s a little nasty because within a Prolog expression a “-” can be a prefix operator, an infix operator, an atom, or the leading character of a number. So here’s a little test that uses all of them:

?- T= - -1- -, write_canonical(T), nl.
-(-(-1),-)
T = - -1-(-).

It get’s a little tricky to cleanly separate the syntax (defined by a grammar) from the semantics in order to figure out which is which.

Cool!

When this is pushed, its possible to change the fuzzer to also
include - operator, prefix and infix, and negative numbers, so
that a greater audience could also life experience some validation

run. To automatically hunt for, more or now less?, easter eggs.

Edit 15.04.2022:
Right now I get, based on the 14.04.2022 version:

?- between(1,100,_), random_expr(0, _, A), swi_parse(A, X), 
ppeg_parse(A, Y), X \== Y, write('expr: '), write(A), write('\nswi: '), 
write_canonical(X), write('\nppeg: '), write_canonical(Y), nl, nl, fail.

expr: - fx x0 - x1 yf - x2
swi: -(-(-(fx(x0)),yf(x1)),x2)
ppeg: -(-(fx(x0),-(yf(x1),x2)))

expr: - - -8 yf
swi: -(-(yf(-8)))
ppeg: -(-(-),yf(8))

Etc...

The new random actions are:

random_expr(N, M, A) :-
   K is 14+N*7, %%% increase here
   random(0, K, I),
   random_action(I, N, M, A).

%%% add these actions before the last action
random_action(7, N, N, A) :-
   random(0, 21, X),
   Y is X - 10,
   number_codes(Y, L),
   atom_codes(A, L).
random_action(8, N, M, A) :- !,
   random_expr(N, H, B),
   random_expr(H, M, C),
   atom_concat(B, ' - ', D),
   atom_concat(D, C, A).
random_action(9, N, M, A) :- !,
   random_expr(N, M, B),
   atom_concat('- ', B, A).

I also changed the last random action so that it generates x<number> and not only <number>.

These are even cases where the ISO core standard agrees with
SWI-Prolog. Possibly because the operator - infix and prefix gives
no ambiguity, because the priorities are different:

?- current_op(X,Y,-).

X = 200
Y = fy ? ;

X = 500
Y = yfx

The test results are, for these two test cases:

Dogelog Player: Pass ✓
Formerly Jekejeke Prolog: Pass ✓
ECLiPSe Prolog: Pass ✓
Tau Prolog: Fail ✗
Scryer Prolog: Pass ✓
SICStus Prolog: Pass ✓
GNU Prolog: Pass ✓

Tau Prolog fails the second test case…

Yes, I agree different precedences render the associativity values irrelevant.

After yet more fiddles to my development version:

?- string_termList("- fx x0 - x1 yf - x2.", [T]), write_canonical(T), nl.
-(-(-(fx(x0)),yf(x1)),x2)
T = -fx x0-x1 yf-x2.

?- string_termList("- - -8 yf.", [T]), write_canonical(T), nl.
-(-(yf(-8)))
T = - - -8 yf.

Hard to know when you’ve reached the end of the long tail.

A simple reader, without negative numbers, is a few lines of DCG.

?- reader(X,1200,[-,fx,x0,-,x1,yf,-,x2],[]), write_canonical(X), nl.
-(-(-(fx(x0)), yf(x1)), x2)
X = -fx x0-x1 yf-x2.

Here is the source code, unfortunately only for ISO core standard:

% reader(-Term, +Integer, +List, -List)
reader(X, L) -->
   reader_primary(Z, L, K), reader_secondary(Z, X, L, K).

% reader_primary(-Term, +Integer, -Integer, +List, -List)
reader_primary(H, L, R) --> [A], {current_op(R, M, A), is_prefix(M, E)}, !,
   {L < R -> throw(error(syntax_error(operator_clash),_)); true},
   {T is R-E}, reader(Z, T), {H =.. [A,Z]}.
reader_primary(X, _, 0) --> [X].

% reader_secondary(+Term, -Term, +Integer, +Integer, +List, -List)
reader_secondary(H, X, L, C) --> [A],
   {current_op(R, M, A), is_infix(M, D, E), L >= R}, !,
   {R-D < C -> throw(error(syntax_error(operator_clash),_)); true},
   {T is R-E}, reader(Z, T),
   {J =.. [A,H,Z]},
   reader_secondary(J, X, L, R).
reader_secondary(H, X, L, C) --> [A],
   {current_op(R, M, A), is_postfix(M, D), L >= R}, !,
   {R-D < C -> throw(error(syntax_error(operator_clash),_)); true},
   {J =.. [A,H]},
   reader_secondary(J, X, L, R).
reader_secondary(H, H, _, _) --> [].

% is_infix(-Atom, -Integer, -Imteger)
is_infix(xfx, 1, 1).
is_infix(yfx, 0, 1).
is_infix(xfy, 1, 0).

% is_prefix(-Atom, -Integer)
is_prefix(fx, 1).
is_prefix(fy, 0).

% is_postfix(-Atom, -Integer)
is_postfix(xf, 1).
is_postfix(yf, 0).

If I load the reader into SWI-Prolog, and if I run also the fuzzer, I can
compute some defect rate. I get that SWI-Prolog has a half as high
error rate than the pPEG parser from 14.04.2022:

?- aggregate_all(count, (between(1,10000,_), random_expr(0, _, A), 
ppeg_parse(A, X), simp_parse(A, Y), X \== Y), C).
C = 583.

?- aggregate_all(count, (between(1,10000,_), random_expr(0, _, A), 
swi_parse(A, X), simp_parse(A, Y), X \== Y), C).
C = 273.

The defect rates are ca. 6% for pPEG 14.04.2022 and ca. 3% for SWI-Prolog.

This is an interesting pPEG example bug. Actually I wanted to hunt
parenthesis parsing bugs, so I was up to generating some test cases that
contain parenthesis. But this test case doesn’t contain any parenthesis,

but it seems a binary infix operator is parsed as an unary functor?

?- string_termList(":- x0 =:= :- x1 - x2 .", [Y]), write_canonical(Y), nl.
:-(:-(=:=(x0),-(x1,x2)))
Y =  (:- (=:=(x0):-x1-x2)).

I am still using pPEG from 14.04.2022. How do I do a package update?

Edit 30.04.2022:
Concerning parenthesis I only found this bug, but its not really parenthesis
related. So unlike Trealla it could be that parenthesis are not a problem.
This here gives the same associativity hickup with and without parenthesis:

?- string_termList("- ( x0 * x1 ) ** x2 =:= x3 .", [Y]), 
   write_canonical(Y), nl.
-(=:=(**(*(x0,x1),x2),x3))
Y = - ((x0*x1)**x2=:=x3).

?- string_termList("- x4 ** x2 =:= x3 .", [Y]), write_canonical(Y), nl.
-(=:=(**(x4,x2),x3))
Y = - (x4**x2=:=x3).

?- Y = (- x4 ** x2 =:= x3), write_canonical(Y), nl.
=:=(-(**(x4,x2)),x3)
Y =  (-x4**x2=:=x3).

I believe this is the old right reduction bug. Using the current version:

?- string_termList("- x4 ** x2 =:= x3 .", [Y]), write_canonical(Y), nl.
=:=(-(**(x4,x2)),x3)
Y =  (-x4**x2=:=x3).

You should be able to use pack_upgrade/1, i.e., pack_upgrade(pPEGpl). Failing that, use pack_install to re-install after removing it. (Ref: SWI-Prolog -- Manual .)

The examples, including SWIP grammar and parser, are part of the pack so can be found wherever your copy of SWI-Prolog installs packs. Or you can find them in the github repository (https://github.com/ridgeworks/pPEGpl).

I get false in bold red letters, what does this mean?

?- pack_upgrade(pPEGpl).
% Running "git fetch" in 'c:/users/foobar/appdata/local/swi-prolog/pack/pPEGpl'
% From https://github.com/ridgeworks/pPEGpl
%    ab024cd..6b7e334  main       -> origin/main
false.

Maybe it didn’t update, since I changed one file. (See
other thread, I did a small change for better error handling).

Does SWI-Prolog ship with a GIT client? Can it do merge?

Edit 02.05.2022:
Pack remove is also a bumpy road:

?- pack_remove(pPEGpl).
% Removing 'c:/users/foobar/appdata/local/swi-prolog/pack/pPEGpl' and contents
ERROR: No permission to delete file `'c:/users/foobar/appdata/local/swi-prolog/pack/pPEGpl/.git/objects/21/a031023db3ac04b268c5118b304954a03edb98'' (Permission denied)

I tried another machine, could install new version what was there today 02.05.2022.

Now this test case works:

?- string_termList("- x4 ** x2 =:= x3 .", [Y]), write_canonical(Y), nl.
=:=(-(**(x4,x2)),x3)
Y =  (-x4**x2=:=x3).

The other error is also gone:

?- string_termList(":- x0 =:= :- x1 - x2 .", [Y]), write_canonical(Y), nl.
% prolog_parser Error, operator clash in: op(1200,fx,:-) 
% =:=(x0) op(1200,fx,:-) x1 op(500,yfx,-) x2
false.

Will use version downloaded 02.05.2022 for some further investigations.

Upgrading a git pack simply runs git pull. If you make local modifications that fails. Under swipl-win git user interaction probably fails, so you need to run git by hand.

Hmm. Seems a Windows thing. Seems remove() doesn’t like to remove read-only files on Windows. Yip, documents say so :frowning:

When you remove manually, like for example on WSL:

> rm -r <git folder>

Then the shell will ask you a yes/no question for certain folders.
But it will be able to completely delete the git folder, if you confirm all.
On windows you can delete SWI pack folders in the explorer.

But I don’t know whether this is allowed, does SWI keep some
information elsewhere, or is what is stored on Windows in
'c:/users/foobar/appdata/local/swi-prolog/pack on per

package basis, all that is stored? Is some pack_cleanup call needed?

Yes. A pack can simply be deleted by deleting its directory.

The current git version can delete read-only files for consistency with other operating systems. It is a bit tricky though because it may not be allowed to delete a file for several reasons in Windows. So, we try to delete the file. If that returns a permission error and the file is read-only, we clean the read-only flag and try again. If it still fails we restore the read-only flag. This makes the operation non-atomic :frowning: Usually Windows provides Func() and a later FuncEx() that provides flags for such cases, but not so for DeleteFile() :frowning:

Was preparing a table with some critical cases. With the new pPEG SWIPL
Example version 02.05.2022, I still find a unary/binary confusion bug:

?- ppeg_parse([-, =:=, :-, -, x0], X), write_canonical(X), nl.
:-(=:=(-),-(x0))
X =  (=:=(-):- -x0).

And even a zeroarity/unary confusion bug:

?- ppeg_parse([-, -, *, :-, x0], X), write_canonical(X), nl.
:-(*(-(-)),x0)
X =  (*(- (-)):-x0).

Sorry, I’ve lost the plot here. What string will be given to the parser corresponding to the list argument in the calls to ppeg-parse?

The strings are:

?- use_module(library(pPEG)).
true.

?- ['pPEGpl/Examples/SWIP-grammar/pl_grammar.pl'].
true.

?- ['pPEGpl/Examples/SWIP-grammar/pl_parser.pl'].
true.

?- string_termList("- =:= :- - x0 .", [X]), write_canonical(X), nl.
:-(=:=(-),-(x0))
X =  (=:=(-):- -x0).

?- string_termList("- - * :- x0 .", [X]), write_canonical(X), nl.
:-(*(-(-)),x0)
X =  (*(- (-)):-x0).

But maybe its a problem with error handling, realization of
your philosophical objection, that the above should fail?
I was using verbose(silent)?

Edit 08.05.2022:
If you take SWI-Prolog as the reference implementation, the two
should indeed fail in pPEG SWIPL example, since in SWI-Prolog
they throw an error:

?- X = (- =:= :- - x0), write_canonical(X), nl.
ERROR: Syntax error: Operator priority clash

?- X = (- - * :- x0), write_canonical(X), nl.
ERROR: Syntax error: Operator priority clash

Yes there’s a bug somewhere in the expression generator since the infix operator =:= ends up with a single argument. I think it could be parsed if =:= was treated as an atom:

?- X = (- (=:=) :- -x0), write_canonical(X).
:-(-(=:=),-(x0))
X =  (- (=:=):- -x0).

?- string_termList("- (=:=) :- - x0 .", [X]), write_canonical(X), nl.
:-(-(=:=),-(x0))
X =  (- (=:=):- -x0).

but this should probably fail as originally written.

The verbose flag should only affect the message output, not succeed/fail semantics.

One source of the bug is this here, I instrumented it with write/1 and nl/0:

% simple postfix
build_term_([V, op(_P,_A,Op)], VarsIn, VarsOut, Term) :-                    
	not_op(V),
	!,
	write('simple postfix, op='), write(op(_P,_A,Op)), nl,
	build_term_([V], VarsIn, NxtVars, Term1),
	reduce_term_(Op, Term1, NxtVars, VarsOut, Term).

You see the bug here:

?- string_termList("- =:= :- - x0 .", [X]).
simple postfix, op=op(700,xfx,=:=)
X =  (=:=(-):- -x0).

So the pPEG SWIPL Example trusts too much in the position
it finds an operator, i.e. [V, op(_P,_A,Op)], and doesn’t check
the operator icon, i.e. xfx, when forming a new term.

The position is not a reliable indicator, since what I was testing
with fuzz4 was errorneous sentences with displaced operators, that
are not in a correct position. Seen in SWI-Prolog that it throws

an error for the example sentence.

Yes, the “simple” prefix, postfix, and infix rules (the ones that call reduce_term_) all suffer from this bug. For now, I’m using:

build_term_([op(_P,A,Op), V], VarsIn, VarsOut, Term) :-                     % simple prefix
	not_op(V), sub_atom(A,0,1,1,'f'),                  % f_
	!,
	build_term_([V], VarsIn, VarsNxt, Term1), 
	reduce_term_(Op, Term1, VarsNxt, VarsOut, Term).

build_term_([V, op(_P,A,Op)], VarsIn, VarsOut, Term) :-                     % simple postfix
	not_op(V), sub_atom(A,1,1,0,'f'),                  % _f
	!,
	build_term_([V], VarsIn, NxtVars, Term1),
	reduce_term_(Op, Term1, NxtVars, VarsOut, Term).

build_term_([V1, op(_P1,A1,Op1), V2], VarsIn, VarsOut, Term) :-             % simple infix
	not_op(V1), not_op(V2), sub_atom(A1,1,1,1,'f'),    % _f_ 
	!,
	build_term_([V1],VarsIn,NxtVars1,[Term1]),
	build_term_([V2],NxtVars1,NxtVars2,[Term2]), 
	reduce_term_(Op1, [Term1,Term2], NxtVars2, VarsOut, Term).

but I haven’t had the time to test this fix or determine that it’s sufficient.