Parsing text using a formal grammar: SWIPL Example

Unlike the ISO standard, the source code of SWI-Prolog is publicly available, here. It is licensed with one of the more permissive licenses in common use.

Questions with “Why” are notoriously difficult to answer; this makes me wonder what kind of answer you are hoping to get. On the other hand, “how and where SWI-Prolog got it wrong” is possible to answer. Are you prepared to read the code and would you like my assistance?

Just to be clear, I did not set out to produce any kind of definitive Prolog parser. What would be the point? But I did want to produce a formal and testable specification of SWI-Prolog syntax, which I think is sadly lacking but perhaps I’m in the minority.

Now a formal grammar IMO does not specify semantics (well some grammar systems allow you to imbed so called semantic actions, but leaving that aside), which is what most of this discussion is about. To complete the “testability” objective you have to add enough semantics to be able to compare it with the results of the real SWIP parser. And so far it seems to have accomplished that objective to the extent it even mimics so-called faults in the SWIP parser.

It is lamentable that there’s no active Prolog standards activity to air these kinds of issues. Until that happens I fear Prolog will continue to exist only in the backwater of programming languages in spite of its obvious merits.

(Nothing of Impartance here)

(Nothing of Impartance here)

FWIW:
http://www.complang.tuwien.ac.at/ulrich/iso-prolog/conformity_testing

Written abbreviation for for what it’s worth: used, for example in emails, when you are giving someone information and you do not know if it is useful or not

The referenced page, by Ulrich Neumerkel (editor of ISO/IEC 13211-1), has some syntax conformity tests. I don’t know whether the particular examples given in this Discourse thread are covered by these conformity tests. (Also, it’s out of date for SWI but I assume that not much changed for parsing between SWI6 and SWI7)

At the bottom of this page there’s:

Prolog portability quest. My attempt to vulgarize the ISO standard.
(I tried them, and they all work for SWI-Prolog)

There are many other references to parsing on those pages.
I gave some other references to Prolog parsers vs ISO Prolog, but @j4n_bur53 removed them as off-topic.

(Nothing of Impartance here)

Thats quite some news:

Do I understand you correctly. You say avoid certain operator combinations?
How would a normal end-user or DSL designer notice them?

Do they have some special characteristics? Or do you have an explicit list of
those operator combinations that should be avoided?

Edit 13.04.2022:
Here is a funny example in the wild, the authors seems not to have known
the ridgeworks rules, since he used:

% Christophe Meudec
% Eclipse 6.0 program
:- op(30, fy, [not]).      % must be fy for 'not not a' expressions
:- op(30, yfx, [**, abs]). % must be left to right for 'a ** b ** c' expressions

https://github.com/echancrure/PTC-Solver/blob/f02c64925fbc06a9f2cbbef252684ebccd5bd217/source/util__post_precedence.pl#L12

Which is an attempt to model ADA programming language. The poor guy will have
quite a surprise when migrating from ECLiPSe to SWI-Prolog:

/* ECLiPSe Prolog */
[eclipse 2]: X = (not not not 2 ** 3 ** 4), write_canonical(X), nl.
not(not(not(**(**(2, 3), 4))))

X = not not not 2 ** 3 ** 4
Yes (0.00s cpu)

And then:

/* SWI-Prolog */
?- op(30, fy, not), op(30, yfx, **).
true.

?- X = (not not not 2 ** 3 ** 4), write_canonical(X), nl.
**(**(not(not(not(2))),3),4)
X = not not not 2**3**4.

If you evaluate the two expressions with bitwise arithmetic you get different results:

[eclipse 6]: X is \(\(\((2^3)^4))).
X = -4097

[eclipse 7]: X is (\(\(\(2)))^3)^4.
X = 531441

They are just flagged, I can still read them

I understand the rationale behind reporting as bugs the parsing differences between SWI-Prolog, on the one hand, and other implementations and the standard on the other. I don’t see that anyone is arguing about this.

The way forward seems to be that if there is anyone willing and able to try and fix it, Jan W would do his part as maintainer to consider it. I am referring to this post higher up and particular the last sentences:

I apologize if I misunderstood or misinterpreted that.

Are you @j4n_bur53 interested in working on this and what kind of help would you need? Of course reporting these problems is already very useful.

I have accepted the flags. Seems the posts stay there greyed? Anyway, debating the ISO standard is off topic. Most of would like SWI-Prolog’s parser to accept all valid ISO text and produce the term dictated by the standard. That is not the discussion. @ridgeworks did a good job trying to make the rules explicit. Handling the fy 1 yfx 2 is clearly not standard compliant and thus a bug. It is just that I happen to have other priorities and will for this only fulfill my duties handling a possibly pull request. I have no clue whether this is simple or hard to fix. The code handling operators is a bit of a mess :frowning: That is typically what happens if you write such a beast without a standard around and a variety of implementations. As just implementing the standard and nothing more will break a lot of existing code this is not really an option either.

Woke up in the morning and my brain only produced new challenges…

I am still trying to find a case in the wild in favor of SWI-Prolog. For
example if ADA would effectively parse:

abs x ** y

As this here:

**(abs(x), y)

Then the ISO core standard is somehow not “right”, or lets say more “arbitrary” than
I tought. Which could be an interesting turn of events.

The problem with my suggested fix, using “fx” instead “fy”, it does not work when you
you want to support multiple braketless occurences of an operator.

Edit 13.04.2022:
I also wonder how a recursive descent parser would work, that implements
the SWI-Prolog building of terms. What changes need to be done to lets
say a Prolog parser written in 100% Prolog found on the net, that does the

other builiding of terms, what changes need to be applied to such a parser
to yield the SWI-Prolog building of terms? This is all kind of my Prolog living
standard side quest, that shows that I have only scratched the tip of the ice

berg of parsing. Who knows, maybe there comes a situation where this is handy?

The other way around seems easier, take the pPEG parsers rule:

op_associativityEq(fy,yfx,left).

Change it into this:

op_associativityEq(fy,yfx,left) :- current_prolog_flag(iso, false).
op_associativityEq(fy,yfx,right) :- current_prolog_flag(iso, true).

What will the pPEG parser do now? Is there also a pPEG unparser?

Edit 13.04.2022:
For those who might not be fluent what an unparser is. A parser
would be used in read_term/2 whereas an unparser would be used in
write_term/2. In Prolog usually both input/output predicates make use

of the same operator table, which is therefore the single source that
influences the behaviour of these predicates concerning operators.
Is this also possible for pPEG, with the additional “parameterization”

of the ambiguity issue through op_associativityEq/3.

An old answer said:

But its about write_term/2, and not about write_canonical/2.

Is there a pPEG unparser?

Dogelog Player has both a parser and unparser written in 100%
Prolog. Means it has both read_term/2 and write_term/2 written
100% in Prolog, and currently follows ISO core standard term

building strategy, not the SWI-Prolog term building strategy.

(Surely I will not say anything you don’t know already, but for posterity I will type it out)

I don’t see how one can in the general case make a strong argument for the one or the other parse, given that the two operators have the same priority. In some context (accepted meaning of the operator) the one would make more sense and in other context the other. “Ausnahmen bestätigen die Regel” is one “wisdom” that is commonly used to brush away specific counterexamples that seem to destroy the otherwise beautiful logic of a grammar. In fact, the person who uses this phrase confirms the rule by explicitly stating that this counterexample is exceptional!

The benefit of a standard and strict adherence to it: there is a single source of truth. Sometimes it will be arbitrary but this is OK. If this single source of truth is not widely available and widely accepted and implemented, its usefulness diminishes.

I believe they’re those combinations of operators that make Table 6 necessary, but that table seems to have been deleted from this thread.

Given equal operator precedence values, I think the cases of concern are (fy,yfx), as discussed in this thread, (xfy,yf), and (fy,yf).

No, because currently the parser only exists to test the grammar.

The one exception being cascading the same operator, e.g., X is A-B-C. For that you need some kind of associativity value to “break the tie”.

Yes, sure. I was specifically talking about

should have been more explicit

I want to try this:

How do I do this? I did this:

?- pack_install(pPEG,[url('https://github.com/ridgeworks/pPEGpl.git')]).
% Cloning into 'c:/users/foobar/appdata/local/swi-prolog/pack/pPEGpl'...
Verify package status (anonymously)
        at "https://www.swi-prolog.org/pack/query" Y/n? 
% Contacting server at https://www.swi-prolog.org/pack/query ... ok
% "pPEGpl.git" was downloaded 5 times
Package:                pPEGpl
Title:                  Pack to support parsing text using pPEG grammars
Installed version:      1.0.1
Author:                 Rick Workman <ridgeworks@mac.com>
Home page:              https://github.com/ridgeworks/pPEGpl
Download URL:           https://github.com/ridgeworks/pPEGpl.git
Activate pack "pPEGpl" Y/n? 
true.

But what now? Did it also install the example?

How do I run it? How do I modify it?

Edit 13.04.2022:
Ok the example was also installed. But when when I consult pl_grammar.pl,
I guess it does not have a use_module(library(pPEG)) statement somewhere
on its own, and then stumbles. Or it has, but the order might be problematic:

ERROR: c:/users/foobar/appdata/local/swi-prolog/pack/ppegpl/examples/swip-grammar/pl_grammar.pl:37:21: Syntax error: unknown_quasi_quotation_syntax(pPEG,pl_grammar)
%   library(strings) compiled into strings 0.00 sec, 60 clauses
%   library(pcre) compiled into pcre 0.00 sec, 94 clauses
%  library(pPEG) compiled into pPEG 0.03 sec, 209 clauses
ERROR: Exported procedure pl_grammar:prolog_grammar/1 is not defined
% c:/Users/foobar/AppData/Local/swi-prolog/pack/pPEGpl/Examples/SWIP-grammar/pl_grammar.pl compiled into pl_grammar 0.03 sec, 12 clauses

Ok it wurks. I changed pl_parser.pl accordingly, and I now get:

?- string_termList("fy 1 yfx 2.", [T]), write_canonical(T), nl.
yfx(fy(1),2)
T = fy 1 yfx 2.

?- set_prolog_flag(iso, true).
true.

?- string_termList("fy 1 yfx 2.", [T]), write_canonical(T), nl.
fy(yfx(1,2))
T = fy 1 yfx 2.

Eh voila SWI-Prolog has become member of the ISO core standard club.
Maybe it can even apply for V.I.P. membership, because it is so important?

Edit 13.04.2022:
Interestingly the standard unparser of SWI-Prolog doesn’t give a damn.
But this is another defect which I have posted about here already in some thread.

aLSO who_ever wrote this code of pPEG deserves to burn in hell, for
mixing underscore and camelcase. Took me quite a while to run the query.

Maybe they should organize the ISO core standard club like here:

Platin Sponsor
Gold Sponsor
Silver Sponsor
https://www.jug.ch/sponsors.php

Then you can buy different grades of VIPness with Mamon.
Sad Story: Any poor John Doe Prolog system will be non-VIP.

Can I mark my own post as off topic?
Ok the system says I should wait 3 hours…

Guilty as charged. I guess I’m religious about some things but not that particular convention. (E.g., op_associativityEq/3.) I probably should be more disciplined about exported predicate names.

So I’m not sure what the standard convention is for examples. (Remember the pack is not a prolog parser, it’s a pPEG grammar system.) I guess I assumed users would download example files from github directly, rather than picking them out of the intsalled pack.

So first use_module(library(pPEG)). like a conventional pack and then just manually load/consult pl_grammar.pl and pl_parser.pl in that order. But I admit it’s all a little clumsy. If it’s going to serve as something other than a pPEG example, it will need to be properly packaged.

I think I missed (xfy,yfx)? Definitely a pattern: (..y,y..).