Parsing text using a formal grammar: SWIPL Example

And this also fails:

But there should be some way to make it work.

Edit 17.05.2022:
It also works in simp_parse/2:

?- simp_parse([-], T).
T =  (-).

?- simp_parse(['(',-,')'], T).
T =  (-).

Thanks. Here’s another I found:

“Precedences in specifications and implementations of programming languages” :

But to my untrained eye, they don’t address the root of the problem with Prolog expressions, namely that operators and atoms are syntactically indistinguishable. So “- *” could be parsed as “-(*)” (it’s actually a syntax error in SWIP) and “- * -” as “*((-),(-))”. Note that in the first, “-” is an operator and “*” is an atom, while the roles are reversed in the second. In theory, with sufficient lookahead, this can be resolved, but it gets a bit tricky in practice for the general case, and it gets worse when trying to define it in a grammar that recognizes Prolog syntax, which is my main focus here.
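To see the overlap concretely, you can ask the system which operator roles an atom currently has. This is only a minimal sketch using the standard current_op/3; the name op_roles/2 is made up for illustration.

```prolog
% op_roles(+Atom, -Roles)
% Roles is the sorted list of Type-Priority pairs under which Atom is
% currently declared as an operator. It is [] for a plain atom.
% (op_roles/2 is a hypothetical helper, not part of any library.)
op_roles(Atom, Roles) :-
    findall(Type-Priority, current_op(Priority, Type, Atom), Found),
    sort(Found, Roles).
```

With the standard operator table, `?- op_roles((-), R).` reports both a prefix and an infix role for “-”, while `?- op_roles(x0, R).` gives an empty list. The reader has to pick one of those roles, or the plain-atom reading, from context alone.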

In practice, it’s not a big deal since you can just add enough parentheses to produce the right semantics. But as @j4n_bur53 rightly points out, the implicit assumptions built into various builtin Prolog parsers can adversely affect portability. At least, a grammar forces you to make some of them explicit.
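As an illustration of the "add enough parentheses" approach, here are the two readings mentioned above written so that no operator atom appears as a bare operand. The predicate name reading/2 and its labels are made up for this sketch.

```prolog
% reading(?Label, ?Term)
% Two explicit spellings that leave nothing for the reader to guess.
% (reading/2 and both labels are hypothetical, for illustration only.)
reading(minus_is_operator, -(*)).        % - applied to the atom *
reading(star_is_operator,  (-) * (-)).   % infix * applied to two atoms -

% principal(+Label, -Name/Arity): the principal functor of that reading.
principal(Label, Name/Arity) :-
    reading(Label, Term),
    functor(Term, Name, Arity).
```

Querying `?- principal(L, F).` should show “-” with arity 1 for the first reading and “*” with arity 2 for the second, which makes the role reversal explicit.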

I guess the ECLiPSe realization is not ISO compatible.

Since it violates this here:

You can try yourself:

[eclipse 1]: X = (- * x0).
X = (-) * x0

Are you starting to see a trend here? I am.

My conclusion is that the ISO standard may not be ambiguous (I don’t really know for reasons stated earlier) but most implementations consider it too restrictive. But in loosening the restrictions ambiguities have been introduced and the strategies for resolving these often haven’t been well documented. In practice, nobody cares much since you can always add parentheses to get the desired semantics. Well, they don’t care until you try to port a Prolog program to a different system with different resolution strategies and things break.

What can be done at this point is unclear. If everyone documented their strategy like ECLiPSe does, that would be a start.
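Short of documentation, you can at least record what a given system does with a questionable text. Here is a minimal sketch, assuming SWI-Prolog's term_to_atom/2 to read a term from an atom (other systems would need their own way of reading from text). The name probe/2 is made up.

```prolog
% probe(+Text, -Outcome)
% Outcome is read(Term) if Text parses as a term on this system, or
% syntax_error(Kind) if the reader rejects it.
% (probe/2 is a hypothetical helper; term_to_atom/2 is assumed to be
% the SWI-Prolog builtin.)
probe(Text, Outcome) :-
    catch(( term_to_atom(Term, Text),
            Outcome = read(Term) ),
          error(syntax_error(Kind), _),
          Outcome = syntax_error(Kind)).

% show(+Text): print the outcome without operator notation.
show(Text) :-
    probe(Text, Outcome),
    write_canonical(Outcome), nl.
```

Running something like `?- show('- * x0').` on each system and comparing the printed outcomes gives the kind of evidence discussed in this thread, without relying on how the toplevel chooses to print the answer.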

I just pushed a new version updating the SWIP Example. Fixes include:

  • proper treatment of trailing escape sequences in quoted atoms
  • check for proper operator class in expression reduction
  • proper handling of non-associative operators in right reduction of expressions
  • added special “''” escape sequence in char codes (only grammar change)

I think this includes everything discussed so far. The known discrepancy with SWIP “xfx fy” associativity is unchanged from previous versions. In such cases, SWIP will generate an error while the pPEG version will not.

With computer assistance, I found more expressions that exemplify the differences.
Your [x0, **, -, x1] and your [-, *, x0] were indeed found again as well.

For fuzz3:

+--------- [-, :-, x0]
         +------------------------- [x0, **, -, x1]
                                  +------ ppeg3
                                  +------ []
                                        + trealla3
                                        + []
                                        + swi3
                                        + []
                                        + scryer3
                                        + []
                                        + jekejeke3
                                        + gnu3
         +---------------------- [:-, :-, x0]
                               +--------- eclipse3
                               +--------- sicstus3

And for fuzz4:

+---------- [**, **, x0]
          +----------------------- [-, **, x0]
                                 +------- trealla4
                                 +------- ['(', '(', ')', -, x0]
                                        + scryer4
                                        + gnu4
          +-------- [-, **, x0]
                  +--------- [-, *, x0]
                           +------- [-, :-, x0]
                                  +------ swi4
                                  +------ eclipse4
                           +------------- jekejeke4
                  +-------- [-, :-, x0]
                          +-------------- ppeg4
                          +-------------- sicstus4

There is no reference to the simplified parser simp_parse/2 anymore;
it's about differences between the different Prolog systems.

Warning: I used an algorithm that picks the first shortest example that
discriminates two paths. There are more examples not shown.
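One possible reading of "first shortest example", sketched under my own assumptions: the term case/3 and the name first_shortest/2 are invented here and are not taken from the actual fuzzing code.

```prolog
% first_shortest(+Cases, -Tokens)
% Cases is a list of case(Tokens, OutcomeA, OutcomeB) in enumeration
% order, where OutcomeA and OutcomeB are what the two sides produced for
% the token list. Tokens is the first among the shortest token lists on
% which the outcomes differ. Fails if no case discriminates.
% (case/3 and first_shortest/2 are hypothetical, for illustration only.)
first_shortest(Cases, Tokens) :-
    findall(Len-Toks,
            ( member(case(Toks, A, B), Cases),
              A \== B,                 % the two sides disagree
              length(Toks, Len) ),
            Keyed),
    keysort(Keyed, [_-Tokens|_]).      % stable sort keeps enumeration order
```

Because keysort/2 is stable, ties on length are broken by enumeration order, so any other discriminating examples of the same length are simply not reported.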