Formal grammar: SWIPL Example error handling

How can I catch errors? I tried:

?- ppeg_parse('0 xf yfx 1 xfy 2 xf xf yf xf', T).
% prolog_parser Error, operator clash in: 0 xf op(9,yfx,yfx) 1 ...
false.

?- swi_parse('0 xf yfx 1 xfy 2 xf xf yf xf', T).
T = syntax_error(operator_clash).

SWI-Prolog seems to emit an error and pPEG does what?
Display an error and fail? Was using this here:

% swi_parse(+Atom, -Term)
swi_parse(A, T) :-
   catch(atom_to_term(A, T, _), error(T, _), true).

% ppeg_parse(+Atom, -Term)
ppeg_parse(A, T) :-
   atom_concat(A, '.', B),
   atom_string(B, C),
   catch(string_termList(C, [T]), error(T, _), true).

Edit 15.04.2022:
Ok, a little mod of pl_parser.pl did the job:

% build_term_(Exp, VarsIn, _VarsOut, _Term) :-
% 	print_message(informational, prolog_parser(op_conflict(Exp,VarsIn))),
%	fail.
build_term_(_, _, _VarsOut, _Term) :-
   throw(error(syntax_error(operator_clash), _)).

Now I get:

?- ppeg_parse('0 xf yfx 1 xfy 2 xf xf yf xf', T).
T = syntax_error(operator_clash).

Using a fuzzer, I find differences what SWI-Prolog native C code
considers an operator clash and what the pPEG SWIPL examples
considers an operator clash. It seems pPEG is more tolerant.

This is how I used the fuzzer:

?- between(1,100,_), random_expr(0, _, A), swi_parse(A, X), 
ppeg_parse(A, Y), X \== Y, write('expr: '), write(A), write('\nswi: '), 
write_canonical(X), write('\nppeg: '), write_canonical(Y), nl, nl, fail.

expr: 0 xfx fy 1 yfx 2 yfx 3
swi: syntax_error(operator_clash)
ppeg: xfx(0,yfx(yfx(fy(1),2),3))

expr: fy fx fx 0
swi: syntax_error(operator_clash)
ppeg: fy(fx(fx(0)))

expr: fy fx 0 yf yf yf
swi: yf(yf(yf(fy(fx(0)))))
ppeg: fy(yf(yf(yf(fx(0)))))

expr: fy 0 xf yfx 1 yfx 2 yf
swi: yf(yfx(yfx(fy(xf(0)),1),2))
ppeg: fy(yf(yfx(yfx(xf(0),1),2)))

false.

The fuzzer code is here:

:- op(9, xfx, xfx).
:- op(9, yfx, yfx).
:- op(9, xfy, xfy).
:- op(9, fx, fx).
:- op(9, fy, fy).
:- op(9, xf, xf).
:- op(9, yf, yf).

% random_expr(+Integer, -Integer, -Atom)
random_expr(N, M, A) :-
   K is 10+N*5,
   random(0, K, I),
   random_action(I, N, M, A).

% random_expr(+Integer, +Integer, -Integer, -Atom)
random_action(0, N, M, A) :- !,
   random_expr(N, H, B),
   random_expr(H, M, C),
   atom_concat(B, ' xfx ', D),
   atom_concat(D, C, A).
random_action(1, N, M, A) :- !,
   random_expr(N, H, B),
   random_expr(H, M, C),
   atom_concat(B, ' yfx ', D),
   atom_concat(D, C, A).
random_action(2, N, M, A) :- !,
   random_expr(N, H, B),
   random_expr(H, M, C),
   atom_concat(B, ' xfy ', D),
   atom_concat(D, C, A).
random_action(3, N, M, A) :- !,
   random_expr(N, M, B),
   atom_concat('fx ', B, A).
random_action(4, N, M, A) :- !,
   random_expr(N, M, B),
   atom_concat('fy ', B, A).
random_action(5, N, M, A) :- !,
   random_expr(N, M, B),
   atom_concat(B, ' xf', A).
random_action(6, N, M, A) :- !,
   random_expr(N, M, B),
   atom_concat(B, ' yf', A).
random_action(_, N, M, A) :-
   number_codes(N, L),
   atom_codes(A, L),
   M is N+1.

Maybe some priority update missing, this clashes:

?- string_termList("a=b=d.", [T]).
ERROR: Syntax error: Operator priority clash

This doesn’t clash, but it should:

?- string_termList("a=b*c=d.", [T]).
T =  (a=(b*c=d)).

Edit 16.04.2022:
I tested arithmetic expressions and pPEG SWIPL example, and I now
find that the original SWI-Prolog C code is more robust here, than the
pPEG SWIPL example. I get:

?- aggregate_all(count, (between(1,10000,_), random_expr(0, _, L, []), 
swi_parse(L, X), simp_parse(L, Y), X \== Y), C).
C = 0.

?- aggregate_all(count, (between(1,10000,_), random_expr(0, _, L, []), 
ppeg_parse(L, X), simp_parse(L, Y), X \== Y), C).
C = 2134.

So the defect rate is now ca. 22% for pPEG 14.04.2022 whereas its 0% for SWI-Prolog. I used these random actions, only common arithmetic operators, and a few other Prolog operators. Also this version of the fuzzer is now written in DCG.

% random_expr(+Integer, +Integer, -Integer, +List, -List)
random_action(0, N, M) --> !,
   random_expr(N, H), [=:=], random_expr(H, M).
random_action(1, N, M) --> !,
   random_expr(N, H), [-], random_expr(H, M).
random_action(2, N, M) --> !,
   random_expr(N, H), [','], random_expr(H, M).
random_action(3, N, M) --> !,
   [:-], random_expr(N, M).
random_action(4, N, M) --> !,
   [-], random_expr(N, M).
random_action(5, N, M) --> !,
   random_expr(N, H), [*], random_expr(H, M).
random_action(6, N, M) --> !,
   random_expr(N, H), [**], random_expr(H, M).
random_action(_, N, M) -->
   {number_codes(N, L),
   atom_codes(A, [0'x|L]),
   M is N+1}, [A].

Also due to same right reduction bug as previous example; from dev. version:

?- string_termList("a=b*c=d.", [T]).
% prolog_parser Error, operator clash in: a op(700,xfx,=) b*c op(700,xfx,=) d
false.

Ok, will be happy to test it again. Now I found that one other
Prolog system is also a little tolerant. SICStus Prolog tolerates this:

/* SICStus Prolog 4.7.1 */
?- X = (:- :- x0).
X = ((:-):-x0) ?

SWI-Prolog and Scryer Prolog are more thight:

/* Scryer Prolog */
?- X = (:- :- x0).
caught: error(syntax_error(incomplete_reduction),read_term/3:1)

/* SWI-Prolog */
?- X = (:- :- x0).
ERROR: Syntax error: Operator priority clash 

I am almost through with fuzzing, whats still on my todo is ECLiPSe Prolog
and Tau Prolog. The difficulty is finding a predicate that does something
like GNU Prologs read_term_from_atom/3.

That’s interesting. Given:

?- current_op(P,A,:-).
P = 1200,
A = fx ;
P = 1200,
A = xfx.

While (:- :- x0) cannot be :-(:- x0) it can be :-(':-',x0), which is how SICStus has apparently parsed it. I think that’s technically correct, but I’m not sure when these infix semantics would be useful (clause with a head of ':-' ?). I suspect that anybody who wrote this example wouldn’t expect the infix interpretation, so maybe the parse failure is actually more useful, even if not correct.

Another interesting corner case is fx(Exp). This could be parsed as either a compound term of arity 1 or a prefix operator applied to an expression in parentheses. If the latter, then it’s subject to operator precedence, so I suspect most Prolog’s treat fx as a functor (effectively binding precedence = 0).

If you still have the patience I’ve pushed a new version up to github. It should fix any right reduction issues and I think it handles the all the “-” overloading issues.

Warning! Long response…

The ISO core standad says the later cannot be, the plain argument is
wrong. Since :- is an operator, so it has level 1201, when you want it as
an atom. But :- is operator of priority 1200 with mode xfx, so neither the
left argument nor the right argument can be 1201:

Unbenannt

The only way to have it in an atom, is then in parenthesis:

You see the parenthesis, that would be required, in the SICStus Prolog output:

The ISO core standard also says the former cannot be, since the operator is fx.
:- x0 is an expression with a principal functor that is an operator of priority 1200,
so it cannot become an argument of an operator with priority 1200 and mode fx.
Since mode fx says the argument must be 1199:

Unbenannt6

Edit 13.04.2022:
The code I published, the parser I published here, doesn’t have
parenthesis parsing. Its a little simplified, but it also rejects the
input, but it does so by the second rule. There are multiple rules
that reject it, the rejection of :-(:- x0) and the rejection of :-(':-',x0).

Some third rule from the parser I published is the one token lookahead rule. If you
see :- you need to be able to immediately take it as fx, and then the error is
later a result of an operator clash, and not of a missing operator priority
override, as the parenthesis rule allows. What my simple parser can reject is

therefore :-(:- x0). Nevertheless the parser I published is a little bit
more tolerant than the ISO core standard. But it doesn’t give differently
built result terms, still the same as ISO core standard, also mostly the
same rejected as ISO core standard, a few more accepted.

Nope. You need same lexical information to distinguish ‘- 1’ and ‘-1’ in SWI-Prolog
dialect. f(...) goes always into the compound production if there is nothing
between f and the left parenthesis (. If your prototype does not yet implement

this rule, you might consider to do so. In the ISO core standard this is
modelled by two different parenthesis:

Unbenannt3

The “open” is a right parenthesis which does not immediately follow.
And the “open ct” is a right parenthesis that does immediately follow. The
compound rule explicitly requires an “open ct”:

So SICStus is non-ISO compliant in this regard. I’ve heard comments that ISO operators are “too restrictive”, maybe this is one of those cases they’re talking about(?).

I’m think a parse failure is appropriate (and helpful) here, so I’m quite happy with ISO interpretation.

But not required in the input which seems odd.

I think the first case should be rejected because you can’t “cascade” fx operators. Not sure why the second should be rejected for ':-' :- x0, but maybe I’m misunderstanding.

I’ve recently added the grammar restriction that a prefix operator cannot be immediately followed by a “(”; instead that’s the start of a compound term, so I think it’s covered.

You didn’t say that you also changed the input from :- :- x0 to an
input such as ':-' :- x0. So its not clear what you are asking.

The SICStus Example so far was only :- :- x0.

Maybe you are aware that SWI-Prolog has introduced some
additional specifics in its dialect concerning quoted atoms

and the parsing of operators, not found elsewhere?

More testing results: ECLiPSe Prolog is somewhere
between SWI-Prolog and SICStus Prolog, concerning
tolerating extra syntax.

SICStus Prolog, quite tolerant:

?- X = (- :- x0).
X = ((-):-x0) ?
?- X = (:- :- x0).
X = ((:-):-x0) ?

ECLIPSe Prolog, half way tolerant:

[eclipse 36]: X = (- :- x0).
X = ((-) :- x0)
[eclipse 37]: X = (:- :- x0).
syntax error: postfix/infix operator expected

SWI-Prolog, intolerant, good!!!

?- X = (- :- x0). 
ERROR: Syntax error: Operator priority clash
?- X = (:- :- x0).
ERROR: Syntax error: Operator priority clash 

My reader prototype sides with SWI-Prolog.

How do I get different error messages. This shows me the same error message:

?- string_termList("X = (:- x0.",[T]), write_canonical(T).
% pPEG Error: Prolog.PrefixOp failed, 
% expected <pl_grammar:testOp prefix> at line 1.11:
%   1 | X = (:- x0.
%                 ^
false.

?- string_termList("X = :- x0).",[T]), write_canonical(T).
% pPEG Error: Prolog.PrefixOp failed, 
% expected <pl_grammar:testOp prefix> at line 1.10:
%   1 | X = :- x0).
%                ^
false.

I am still using the old version from 13.04.2022.
In SWI-Prolog I get two different errors:

?- X = (:- x0.
ERROR: Syntax error: Operator expected
ERROR: X = (:- x0
ERROR: ** here **
ERROR:  . 
?- X = :- x0).
ERROR: Syntax error: Illegal start of term
ERROR: X = :- x
ERROR: ** here **
ERROR: 0) . 

Edit 28.04.2022:
This is related to bailing out when reading multiple times. It seems
string_termList/2 gives up on the first error?

?- string_termList("X = :- x0). X = (:- x0).",L), write_canonical(L).
% pPEG Error: Prolog.PrefixOp failed, 
% expected <pl_grammar:testOp prefix> at line 1.10:
%   1 | X = :- x0). X = (:- x0).
%                ^
false.

This is because the pPEG prototype already checks the balancing
of parenthesis in the first tokenization phase and aborts,
whereas SWI-Prolog pursues another strategy.

Yes, a pPEG grammar is like a regular expression on steroids. When given a string to match, it proceeds as far as it can based on the grammar specification producing a parse tree as it goes. When the grammar fails to match the input string, it fails and, unlike regular expressions, produces a message indicating how far it got and what rule failed to match. This doesn’t necessarily pinpoint the actual cause of failure but should usually help diagnose the problem.

The formal grammar defining the SWI-Prolog syntax matches a string containing a sequence of Prolog terms separated by “full-stops”. Bad syntax causes failure with a message as described above and as demonstrated by your example. This particular grammar facilitates testing because it can recognize whole files of syntactically correct Prolog and (hopefully) fail when it encounters bad syntax . But I probably wouldn’t choose to use it as is for a production Prolog parser where a more incremental approach would seem to be necessary, e.g., to enforce directives. (It wouldn’t be that difficult to modify the existing example to do this.)

As I keep saying, the Prolog parser based on this grammar is not a production parser; it exists to facilitate testing of the formal SWI-Prolog grammar. It is intended to recognize SWI-Prolog syntax but doesn’t pretend to reproduce all the behaviours of the builtin Prolog parser.

Does pPEG allow {}/1 like DCG does?

I found a solution today: Have a grammer that covers non-errorneous
sentences and errorneous sentences. How to do this simply?
Its more difficult to resync the tokenization, then to resync the

parsing. For parsing I found this useful, since Prolog has the
concept of terminating period. Take the very simplified
prototype posted elsewhere:

reader_secondary(H, X, L, C) --> [A],
   {current_op(R, M, A), is_infix(M, D, E), L >= R}, !,
   {R-D < C -> throw(error(syntax_error(operator_clash),_)); true},
   {T is R-E}, reader(Z, T),
   {J =.. [A,H,Z]},
   reader_secondary(J, X, L, R).

It throws an exception inside an auxiliary action {}/1 of DCG. But
you can define throw//1, i.e. make throw itself a DCG non-terminal:

throw(T) --> read_sync, {throw(T)}.

And then replace the infix parsing, its only an example, works also
for all other parser errors:

reader_secondary(H, X, L, C) --> [A],
   {current_op(R, M, A), is_infix(M, D, E), L >= R}, !,
   ({R-D < C} -> throw(error(syntax_error(operator_clash),_)); {true}),
   {T is R-E}, reader(Z, T),
   {J =.. [A,H,Z]},
   reader_secondary(J, X, L, R).

The parser is now calling throw//1 instead of throw/1. And throw//1
can do some syncing.

Edit 28.04.2022:
I implemented this already for the full fledged parser. But the simplified
version doesn’t have it yet, because the simplified version didn’t parse a
list of period terminated terms anyway yet. If I have time I will do the

simplified version as well. Currently I get nice syncing, like for example:

?- X = (:- :- x0).
error(syntax_error(operator_clash), [user:1])
?- X = (:- x0.
error(syntax_error(parenthesis_balance), [user:2])
?- X = :- x0).
error(syntax_error(operator_clash), [user:3])
?- X = (:- x0).
X = (:- x0).

It can parse all 4 queries, and doesn’t get out of sync. read_sync//0 is
currently implemented by skipping tokens until the terminating period
is reached or end of file. A more mature implementation of throw//1

can do much more, like fetching line number before it does read_sync//0.

I am little bit behind schedule of a Fuzzer that also
does parenthesis. So far the ultra simplified 100%
pur Prolog parser does only operators without parenthesis.

But here is a an interesting new find:

$ ./tpl -v
Trealla Prolog (c) Infradig 2020-2022, v1.27.12-31-g3575df
$ ./tpl
?- X = (:- (:- x0)).
Error: operator clash, line 1
false.
?-

There is something wrong with Ulrich Neumerkels
compliance test suite, and maybe some QuickCheck
resp. Fuzzer approach is indeed needed,

to chop off all heads of the hydra.

How can I make error handling for this error:

?- catch(string_termList("- x4 ** x2 =:= .", [Y]), error(E,_), true).
% pPEG Error: Prolog.expr failed, expected Prolog.expr at line 1.16:
%   1 | - x4 ** x2 =:= .
%                      ^
false.

Nothing is thrown… :frowning:

Nothing is thrown… :frowning:

I really don’t want to re-open this can of worms, but if you recall from a lengthy discussion on this forum some time back, I have a philosophical objection to errors compared to failure in logic programs, so you won’t see exceptions generated from most code I write, unless there’s no recourse, e.g., resource errors.

But if you want to modify the parser to generate exceptions that’s your call. In this particular case there are two sources of failure: syntax parsing (using the grammar) and expression reduction (semantics) and you’ve only trapped failure due to the latter. To convert the former failures to exceptions, you need to add a “failure” handling clause to string_termList/2 to throw the error. In fact this will catch both classes of error if you choose to remove your mod.

Or you could do the same to ppeg_parse (your predicate I assume) and not change the example Prolog parser at all. The failures are “noisy” so you won’t lose any diagnostic information.

How do I then switch off that it writes something on the console?
I can of course handle failure. Just change my utility, all errors are
mapped to error, I do not test the exact error term:

ppeg_parse(B, T) :-
   catch(string_termList(B, [T]), error(_, _), T = error).

I have also failure in my simplified parser, since its easier
for certain unaccepted sentences. Maybe will turn it into error,
like the usual read_term/2 does. I can change the utility Into this:

ppeg_parse(B, T) :-
   catch(string_termList(B, [T]), error(_, _), T = error), !.
ppeg_parse(_, error).

But it wont help me in any way, since your code will still write on
the console, like hooligans shouting in a city. Its not about failure
versus errors. Its about making the code silent, so that I can run

thousend of test cases automatically. So how do I switch off all
console errors from pack pPEG? Starting command line with
> /dev/null isn’t an option since I want to still interact with your code.

Edit 02.05.2022:
The SWI-Prolog built-in with_output_to/2 does also not work. Would need
something like a with_error_to/2 ? Does this exist? I only get:

?- with_output_to(atom(_), (write('I am chatty'), nl)).
true.

?- with_output_to(atom(_), string_termList("- x4 ** x2 =:= .", [Y])).
% pPEG Error: Prolog.expr failed, expected Prolog.expr at line 1.16:
%   1 | - x4 ** x2 =:= .
%                      ^
false.

I didn’t start a new testing campaign with pPEG yet, where I get
the new type of error, which I do not yet know how to silence.
But meanwhile I have a funny test case where

SWI-Prolog and SICStus disagree. The pPEG SWIPL Example
sides with SWI-Prolog:

/* SWI-Prolog */
?- X = (-  -  -  *  -  -  -  x0), write_canonical(X), nl.
*(-(-(-)),-(-(-(x0))))
X = - - (-)* - - -x0.

/* SICStus Prolog */
?- X = (-  -  -  *  -  -  -  x0), write_canonical(X), nl.
-(-(-(-(*))),-(-(x0)))
X = - - - (*)- - -x0 ? 

/* pPEG SWIPL Example */
?- string_termList("-  -  -  *  -  -  -  x0 .", [Y]), write_canonical(Y), nl.
*(-(-(-)),-(-(-(x0))))
Y = - - (-)* - - -x0.

ECLiPSe Prolog has an option “quiet”:

term_string(T, S, [variable_names(N), syntax_errors(quiet)]).

The option is inherited from read_term/2, so you can choose in which philosophical
camp you want to live in, you are not violently forced into a particular camp:

modifies the treatment of syntax errors: if ‘quiet’, the predicate fails quietly; if ‘fail’, an error message is printed and then the predicate fails; if ‘error’, a term of the form error(syntax_error(MessageString),context(…)) is thrown. The default is ‘fail’ if the per-module syntax_option ‘syntax_errors_fail’ is set, otherwise ‘error’.
https://www.eclipseclp.org/doc/bips/kernel/ioterm/read_term-3.html

But from the documentation it seems I should rather use “error” than “quiet”.

Edit 02.05.2022:
Usually its more difficult get out of the verbose “fail” camp, then out of the
“error” camp. You can always get out of the “error” camp via something like:

catch(pred, error(E,_), (print(E), fail)).

But how do you get out of the verbose “fail” camp, there is no with_error_to/2 ?

The SWIP message system is used to generate the messages (recommended for libraries) so the easiest way is to just set the verbose prolog environment flag to silent. Alternatively, you can use the message_hook to do whatever you want with the messages.

You get out of the “fail” camp by turning into an error just like you’ve done. Seems much easier than being forced to catch errors particularly since, in my experience, I often want to turn them into failures to avoid resorting to some extra-logical mechanism. But I said I didn’t want to reopen that can of worms, so let’s chalk it up to philosophical differences.

The Eclipse solution recognizes there is more than one way to do it and I suppose in a production parser one might choose that route. For this example, I prefer to leave that up to the user. I think he has the tools to do whatever he chooses, e.g., the Eclipse option could be implemented in a wrapper predicate like ppeg_parse with an options argument.