Parsing text using a formal grammar: SWIPL Example

j4n_bur53 · May 17, 2022, 1:44pm

ridgeworks:

You have suggested a third alternative with a !PrefixOp guard, but then this fails:
% pPEG/SWIP-grammar/pl_grammar compiled into pl_grammar 0.02 sec, 0 clauses
?- string_termList("- .",[T]).

And this also fails:

But there should be some way to make it work.

Edit 17.05.2022:
It also works in simp_parse/2:

?- simp_parse([-], T).
T =  (-).

?- simp_parse(['(',-,')'], T).
T =  (-).

ridgeworks · May 17, 2022, 1:48pm

Thanks. Here’s another I found:

“Precedences in specifications and implementations of programming languages” :

But to my untrained eye, they don’t address the root of the problem with Prolog expressions, namely that operators and atoms are syntactically indistinguishable. So “- *” could be parsed as “-(*)” (it’s actually a syntax error in SWIP) and “- * -” as “*((-),(-))”. Note that in the first, “-” is an operator and “*” is an atom, while the roles are reversed in the second. In theory this can be resolved with sufficient lookahead, but it gets a bit tricky in practice for the general case, and it gets worse when trying to define it in a grammar which recognizes Prolog syntax, which is my main focus here.
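A quick way to see how a given system resolves the cases above is to feed it the text and print the resulting term with operators ignored. The following is only a minimal sketch for SWI-Prolog: `probe/1` and `probe_all/0` are hypothetical helper names, not part of pPEG or any library, and `term_string/2` is SWI-specific, so other systems would need their own way of reading a term from text.

```prolog
% probe(+Text): show how the host system reads Text, or report that
% it rejects it. Hypothetical helper, SWI-Prolog only (term_string/2).
probe(Text) :-
    catch(
        (   term_string(Term, Text),
            format("~w  reads as  ", [Text]),
            % ignore_ops(true) prints functional notation, which makes
            % the operator/atom roles chosen by the reader visible.
            write_term(Term, [ignore_ops(true), quoted(true)]),
            nl
        ),
        error(syntax_error(Reason), _),
        format("~w  is a syntax error (~w)~n", [Text, Reason])
    ).

% Run the inputs mentioned in this thread. No expected output is given
% here because the results are exactly what differs between systems.
probe_all :-
    forall(member(Text, ["- *", "- * -", "- * x0", "x0 ** - x1"]),
           probe(Text)).
```

Calling `probe_all.` on each system of interest and comparing the printed lines gives a rough picture of where the readers disagree.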

In practice, it’s not a big deal since you can just add enough parentheses to produce the right semantics. But as @j4n_bur53 rightly points out, the implicit assumptions built into various builtin Prolog parsers can adversely affect portability. At least, a grammar forces you to make some of them explicit.

j4n_bur53 · May 17, 2022, 1:49pm

I guess the ECLiPSe realization is not ISO compatible.

Since it violates this here:

You can try yourself:

[eclipse 1]: X = (- * x0).
X = (-) * x0

ridgeworks · May 17, 2022, 2:04pm

Are you starting to see a trend here? I am.

My conclusion is that the ISO standard may not be ambiguous (I don’t really know for reasons stated earlier) but most implementations consider it too restrictive. But in loosening the restrictions ambiguities have been introduced and the strategies for resolving these often haven’t been well documented. In practice, nobody cares much since you can always add parentheses to get the desired semantics. Well, they don’t care until you try to port a Prolog program to a different system with different resolution strategies and things break.
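As a small illustration of the "add parentheses" workaround: bracketing an operator atom, or writing the term in functional notation, removes the dependence on how a particular reader resolves the ambiguity. This is only a sketch, and `portable_forms/0` is a hypothetical name chosen for the example.

```prolog
% portable_forms/0 is a hypothetical name used only for this sketch.
% Both spellings avoid the bare "- * x0" form that readers disagree on.
portable_forms :-
    X = (-) * x0,     % operator atom bracketed, so it can only be an operand
    Y = *(-, x0),     % the same term in functional notation
    X == Y.           % both denote the term *(-, x0)
```

The goal `portable_forms.` should succeed on any system whose reader accepts both spellings.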

What can be done at this point is unclear. If everyone documented their strategy like ECLiPSe does, that would be a start.

ridgeworks · May 17, 2022, 2:46pm

I just pushed a new version updating the SWIP Example. Fixes include:

proper treatment of trailing escape sequences in quoted atoms
check for proper operator class in expression reduction
proper handling of non-associative operators in right reduction of expressions
added special “''” escape sequence in char codes (only grammar change)

I think this includes everything discussed so far. The known discrepancy with SWIP “xfx fy” associativity is unchanged from previous versions. In such cases, SWIP will generate an error while the pPEG version will not.

j4n_bur53 · May 21, 2022, 7:19pm

With computer assistance I found more expressions that exemplify differences.
Your [x0, **, -, x1] and your [-, *, x0] were indeed found again as well.

For fuzz3:

+--------- [-, :-, x0]
         +------------------------- [x0, **, -, x1]
                                  +------ ppeg3
                                  +------ []
                                        + trealla3
                                        + []
                                        + swi3
                                        + []
                                        + scryer3
                                        + []
                                        + jekejeke3
                                        + gnu3
         +---------------------- [:-, :-, x0]
                               +--------- eclipse3
                               +--------- sicstus3

And for fuzz4:

+---------- [**, **, x0]
          +----------------------- [-, **, x0]
                                 +------- trealla4
                                 +------- ['(', '(', ')', -, x0]
                                        + scryer4
                                        + gnu4
          +-------- [-, **, x0]
                  +--------- [-, *, x0]
                           +------- [-, :-, x0]
                                  +------ swi4
                                  +------ eclipse4
                           +------------- jekejeke4
                  +-------- [-, :-, x0]
                          +-------------- ppeg4
                          +-------------- sicstus4

There is no reference to the simplified parser simp_parse/2 anymore;
it's only about differences between the different Prolog systems.

Warning: I used an algorithm that picks the first shortest example that
discriminates two paths. There are more examples not shown.
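For readers wondering what "first shortest example" could look like in code, here is a rough sketch of one way to select it. It is not the algorithm actually used for the trees above. The predicate names `outcome/3` and `discriminator/3` are hypothetical, and the facts are invented placeholders (abstract systems, tokens and results), not measured data.

```prolog
% outcome(System, Tokens, Result): placeholder facts, NOT measured data.
% System names, tokens and results are invented for illustration only.
outcome(sys_a, [t1],         r1).
outcome(sys_b, [t1],         r1).
outcome(sys_a, [t1, t2, t3], r1).
outcome(sys_b, [t1, t2, t3], r2).
outcome(sys_a, [t1, t2],     r1).
outcome(sys_b, [t1, t2],     r2).
outcome(sys_a, [t2, t1],     r1).
outcome(sys_b, [t2, t1],     r3).

% discriminator(+S1, +S2, -Tokens): the first shortest token list on
% which the two systems disagree. keysort/2 is stable, so among equally
% short candidates the one enumerated first is kept.
discriminator(S1, S2, Tokens) :-
    findall(Len-Toks,
            (   outcome(S1, Toks, R1),
                outcome(S2, Toks, R2),
                R1 \== R2,
                length(Toks, Len)
            ),
            Candidates),
    keysort(Candidates, [_-Tokens|_]).
```

With the placeholder facts, `discriminator(sys_a, sys_b, T)` gives `T = [t1, t2]`: two candidates have length 2, and the one listed first wins.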
