I am a little confused by the op/3
behavior.
:- op(100, fy, xxx).
:- op(200, xfy, yyy).
:- op(100, yf, zzz).
goal :- display(xxx yyy zzz), nl.
Why the above prints yyy(xxx,zzz)
?
I would expect it to print zzz(xxx(yyy))
.
I am a little confused by the op/3
behavior.
:- op(100, fy, xxx).
:- op(200, xfy, yyy).
:- op(100, yf, zzz).
goal :- display(xxx yyy zzz), nl.
Why the above prints yyy(xxx,zzz)
?
I would expect it to print zzz(xxx(yyy))
.
Did you mean this? (Although it gives the same result):
:- op(100, yf, zzz).
Anyway, with this definition (zzz
is a postfix operator; if I use your definition for zzz
, I get a syntax error):
?- X = (xxx 1 zzz), write_canonical(X), nl.
zzz(xxx(1))
X = xxx 1 zzz.
Ah yes. Going to edit the question.
Changed zzz to be postfix.
I get this issue in
SWI-Prolog version 9.2.8 for x86_64-darwin
and in
SWI-Prolog version 9.3.14 for fat-darwin
According to the ISO rules, this is a syntax error. SWI-Prolog and various others try to be a bit more flexible. Roughly, we have
The ISO rules are strict, but require parenthesis in many places where humans think it is not really ambiguous. Many real Prolog systems are more relaxed. This typically works fine for many of the obvious cases, but at some point things get ambiguous Unfortunately no two systems apply the same rules …
The other flexible Prolog systems, in the sense that they
accept the input and don’t throw a syntax error as
some interpretation of the ISO core standard suggest,
do it differently:
?- member(X, [xxx,yyy,zzz]), current_op(Y,Z,X).
X = (xxx), Y = 100, Z = fy
; X = (yyy), Y = 200, Z = xfy
; X = (zzz), Y = 100, Z = yf.
/* SWI-Prolog 9.3.14 */
?- X = (xxx yyy zzz), write_canonical(X), nl.
yyy(xxx,zzz)
X = (xxx)yyy(zzz).
/* SICStus 4.9.0 */
X = (xxx yyy zzz), write_canonical(X), nl.
xxx(zzz(yyy))
X = xxx (yyy)zzz
/* Dogelog 1.2.5, Trealla 2.59.18 */
?- X = (xxx yyy zzz), write_canonical(X), nl.
xxx(zzz(yyy))
X = xxx yyy zzz.
Dogelog and Trealla unparser has evolved to not write (yyy)
, but
otherwise it seems to give the same parsing result as SICStus.
Did SICStus change something since Quintus, sadly I don’t know
how to test Quintus. But maybe the ISO core standard took it
too far, in banning the input, since SICStus Prolog allows it.
And at some moment in time SWI-Prolog invented a different
parsing strategy from modern SICStus.
What was the motivation for the above rule? It needs more
look ahead fiddling. Its also not consistently applied. Note
both (+)/2
and (*)/2
are infix operators:
/* SWI-Prolog 9.3.14 */
?- X = (- + -), write_canonical(X), nl.
-(+(-))
X = - + (-).
?- X = (- * -), write_canonical(X), nl.
*(-,-)
X = (-)*(-).
So I guess the rule is more complex. It gets more complex
when yyy is an infix and prefix operator at the same time,
like in the (+)/2
case above.
Otherwise everything of the rule is satisfied,
like yyy has higher priority than xxx:
?- current_op(X, fy, -).
X = 200.
?- current_op(X, yfx, +).
X = 500.
For reference:
/* Ciao 1.24 */
?- op(100, fy, xxx), op(200, xfy, yyy), op(100, yf, zzz).
yes
?- X = (xxx yyy zzz), write_canonical(X), nl.
xxx(zzz(yyy))
X = xxx yyy zzz ?
History, I guess. I had a discussion with SICStus several years ago, trying to sync. That did not lead to a specification and I failed to reverse engineer the SICStus rules.
I would love a new set of standard rules that is less restrictive than ISO, but probably more restrictive than SICStus. What I learned is that it is basically non-deterministic Prolog code that commits on the first solution.
For now, I have no plans to change things unless we can come up (PIP) with a decent proposal that is carried by several other systems such that I can motivate the change as “standard”.
Well by “motivation” I mean what is the business case of a back-
tracking parser, instead of a greedy very intolerant recursive decending
parser? Is there some Prolog code in the wild that would need it?
SICStus Prolog is sometimes more relaxed than SWI-Prolog:
?- member(X,[xxx,yyy]), current_op(Y,Z,X).
X = xxx, Y = 100, Z = yfx ? ;
X = yyy, Y = 100, Z = fy ? ;
no
/* SWI-Prolog 9.3.14 */
?- X = (yyy xxx xxx yyy xxx xxx), write_canonical(X), nl.
ERROR: Syntax error: Operator priority clash
ERROR: X = (yyy xxx xxx yyy xxx xx
ERROR: ** here **
ERROR: x), write_canonical(X), nl .
/* SICStus Prolog 4.9.0 */
| ?- X = (yyy xxx xxx yyy xxx xxx), write_canonical(X), nl.
yyy(xxx(xxx(xxx,yyy),xxx))
X = yyy (xxx)xxx(yyy)xxx(xxx) ?
Maybe a result that SICStus Prolog does more backtracking?
I know a few relaxations which can also be implemented
in a greedy recursive decending parser. Whereby greedy means
no backtracking. I have never seen somebody publishing
an example of a Prolog term that would need backtracking
based parsing, and that was used in some real world Prolog
code. But maybe I didn’t look closely enough.
Edit 11.11.2024
A motivation i.e. business case could be operator based DSLs.
But this most of the time fails, because unary (prefix and postfix) and
binary (infix) operators cannot model multi-fix operators, right?
The typical exercise is this here:
Homework: Model a Pascal Begin End Block
Pascal | Block statement: begin end | Easy language reference
Can this be done with Prolog operators.? If yes, which
Prolog system can do it and how does it do it?
AFAIK, SICStus Prolog uses a backtracking parser. SWI-Prolog does not. It uses a parser technique that I got from some book I’ve forgotten about long ago that is some form of shift-reduce parsing. At some point, your are stuck and have to drop the operator property of one of the atoms or raise a syntax error. Here, we have fx xfy. These cannot both be operators. At least in theory, one can describe the logic.
A fully backtracking parser accepts more. We’d have to define what do do in case there are multiple parses possible though. AFAIK, SICStus commits to the first, so the order generated by the implementation matters. This is, again AFAIK, not well described though.
The bottom line remains that both parses are IMO equally justifiable. The standard says we should reject this input. Ideally we’d come up with understandable rules that disambiguates ambiguous (ISO errornous) input. Did I miss some de-facto standard algorithm that is in use by a good deal of the other Prolog systems? If so, I’m happy to adopt it. Preferably after agreement.
Well, it is a real world code in the sence that I have picked up the compiler and started to write down a demonstratory example. My goal was not producing a compiler bug report, but rather to demonstrate an applicability of precedence parsing to some NLP tasks.
But then I realised that for the next sentence I get a wrong parse.
The code is the following.
:- op(100, fy, березовой).
:- op(100, fy, высокий).
:- op(100, fy, даже).
:- op(100, fy, еще).
:- op(100, fy, сам).
:- op(100, fy, каждый).
:- op(100, fy, каждого).
:- op(100, fy, много).
:- op(100, fy, надо).
:- op(100, fy, небольшой).
:- op(100, fy, не).
:- op(100, fy, ни).
:- op(100, fy, огурцовой).
:- op(100, fy, одном).
:- op(100, fy, один).
:- op(100, fy, очень).
:- op(100, fy, ростом).
:- op(100, fy, сказочном).
:- op(100, fy, цветочным).
:- op(100, fy, этот).
% postfix
:- op(100, yf, ведь).
:- op(100, yf, вовсе).
:- op(100, yf, бы).
:- op(100, fy, в).
:- op(100, fy, вокруг).
:- op(100, fy, за).
:- op(100, fy, из).
:- op(100, fy, с).
:- op(100, fy, на).
:- op(100, fy, по).
:- op(100, fy, у).
:- op(100, fy, через).
:- op(120, yf, васильков).
:- op(120, yf, колокольчиков).
:- op(120, yf, ромашек).
:- op(120, yf, цветов).
:- op(120, yf, огурцов).
:- op(120, yf, ручья).
:- op(180, xfy, @).
:- op(190, fy, лазить).
:- op(190, fy, тащить).
:- op(190, fy, пилить).
:- op(190, fy, собирать).
:- op(190, fy, сорвать).
:- op(200, xfy, были).
:- op(200, xfy, был).
:- op(200, xfy, было).
:- op(200, xfy, делали).
:- op(200, xfy, жили).
:- op(200, xfy, называли).
:- op(200, xfy, назывались).
:- op(200, xfy, назывался).
:- op(200, xfy, переплывали).
:- op(200, xfy, приходилось).
:- op(200, xfy, росли).
:- op(200, xfy, росло).
:- op(200, xfy, стоял).
:- op(200, xfy, смог).
:- op(200, xfy, ходили).
:- op(300, yfx, зпт).
:- op(300, fy, что).
:- op(400, yfx, потому).
:- op(400, yfx, да).
:- op(400, yfx, и).
:- op(400, yfx, двтч).
:- op(500, fy, а).
goal :-
display(в одном сказочном городе жили коротышки), nl,
display(коротышками @ их называли 0 потому что они были очень маленькие), nl,
display(каждый коротышка был ростом с небольшой огурец), nl,
display(в городе @ у них было очень красиво), nl,
display(вокруг каждого дома росли цветы двтч маргаритки зпт ромашки зпт одуванчики), nl,
display(там @ даже улицы назывались именами цветов двтч улица колокольчиков зпт аллея ромашек зпт бульвар васильков), nl,
display(а сам город назывался цветочным городом), nl,
display(он стоял на берегу ручья), nl,
display(этот ручей @ коротышки называли огурцовой рекой потому что по берегам ручья росло много огурцов), nl,
display(за рекой был лес), nl,
display(коротышки делали из березовой коры @ лодочки зпт 0 переплывали через реку и 0 ходили в лес @ (за ягодами зпт за грибами зпт за орехами)), nl,
display(собирать ягоды было трудно потому что коротышки ведь были крошечные), nl,
display(0 да /*еще*/ тащить с собой @ пилу), nl,
display(а за орехами /*и вовсе*/ приходилось лазить на высокий куст да /*еще*/ тащить с собой @ пилу), nl,
display((ни один коротышка) @ (не смог бы) @ (сорвать орех @ руками)), nl,
display(их @ надо было пилить пилой), nl.
bug :- display(не смог бы). % WTF??? смог(не, бы) Why ???
The program prints out the following if you are interested
жили(в(одном(сказочном(городе))),коротышки)
потому(называли(@(коротышками,их),0),что(были(они,очень(маленькие))))
был(каждый(коротышка),ростом(с(небольшой(огурец))))
было(@(в(городе),у(них)),очень(красиво))
двтч(росли(вокруг(каждого(дома)),цветы),зпт(зпт(маргаритки,ромашки),одуванчики))
двтч(назывались(@(там,даже(улицы)),цветов(именами)),зпт(зпт(колокольчиков(улица),ромашек(аллея)),васильков(бульвар)))
а(назывался(сам(город),цветочным(городом)))
стоял(он,ручья(на(берегу)))
потому(называли(@(этот(ручей),коротышки),огурцовой(рекой)),что(росло(ручья(по(берегам)),огурцов(много))))
был(за(рекой),лес)
и(зпт(делали(коротышки,@(из(березовой(коры)),лодочки)),переплывали(0,через(реку))),ходили(0,@(в(лес),зпт(зпт(за(ягодами),за(грибами)),за(орехами)))))
потому(было(собирать(ягоды),трудно),что(были(ведь(коротышки),крошечные)))
да(0,тащить(@(с(собой),пилу)))
а(да(приходилось(за(орехами),лазить(на(высокий(куст)))),тащить(@(с(собой),пилу))))
@(ни(один(коротышка)),@(смог(не,бы),сорвать(@(орех,руками))))
было(@(их,надо),пилить(пилой))
Mostlikely you need further application domain specific criterias.
Your example has multiple parse trees with the given operator table:
?- term(X, M, [не, смог, бы], []), write_canonical(X), nl.
'не'('бы'('смог'))
X = не (смог)бы,
M = 100 ;
'бы'('не'('смог'))
X = не (смог)бы,
M = 100 ;
'смог'('не','бы')
X = (не)смог(бы),
M = 200 ;
false.
And then an operator table driven parser, without left recursion, so that it terminates:
term(X, M) --> factor(Y, L), rest(Y, L, X, M).
rest(Y, L, X, M) --> more(Y, L, Z, N), rest(Z, N, X, M).
rest(Y, L, Y, L) --> [].
more(Y, L, X, M) --> [O], {current_op(M, yf, O)},
{M >= L, X =.. [O,Y]}.
more(Y, L, X, M) --> [O], {current_op(M, xf, O)},
{M > L, X =.. [O,Y]}.
more(Y, L, X, M) --> [O], {current_op(M, yfx, O)},
term(Z, R),
{M >= L, M > R, X =.. [O,Y,Z]}.
more(Y, L, X, M) --> [O], {current_op(M, xfy, O)},
term(Z, R),
{M > L, M >= R, X =.. [O,Y,Z]}.
more(Y, L, X, M) --> [O], {current_op(M, xfx, O)},
term(Z, R),
{M > L, M > R, X =.. [O,Y,Z]}.
factor(X, M) --> [O], {current_op(M, fy, O)},
term(Y, R), {M >= R, X =.. [O,Y]}.
factor(X, M) --> [O], {current_op(M, fx, O)},
term(Y, R), {M > R, X =.. [O,Y]}.
factor(X, 0) --> [X].
The parser uses directly the builtin current_op/3 for operator table access.
Disclaimer: Not for production use, highly backtracking!
Yeah, I was under impression there is some left to right ordering of op/3 applications, but looks like I was wrong.
Unfortunately the parsing strategy is not described in the docs (and of course there are all kinds of good reasons for that). So I had some intuitive model of the parse, but what exactly happens when op/3 is used is unclear from the docs.
Edited the parser, turns out there are 3 and not only 2 parse trees.
The case is listed in the ISO core standard docs:
Plus a few examples, mainly giving parsing strategy directions:
SWI-Prolog especially fails the 4th test case:
?- X = (fy 2 yf), write_canonical(X), nl.
yf(fy(2))
Result should be fy(yf(2))
, so SWI-Prolog is a different dialect.
You can query the dialect via a Prolog flag:
?- current_prolog_flag(dialect, X).
X = swi.
But I think SWI-Prolog has an unparsing bug:
/* SWI-Prolog 9.3.14 */
?- X = 'не'('бы'('смог')).
X = не (смог)бы.
?- X = 'бы'('не'('смог')).
X = не (смог)бы.
Twice the same output, but different term argument!
Edit 13.11.2024
Scryer Prolog can do it, but it suffers from a little bit too many parenthesis:
/* Scryer Prolog 0.9.4-201 */
?- X = 'не'('бы'('смог')).
X = не (смог)бы.
?- X = 'бы'('не'('смог')).
X = (не (смог))бы.
A system with a more minimal output but still showing a difference is seen here:
/* Dogelog Player 1.2.4 */
?- X = 'не'('бы'('смог')).
X = не смог бы.
?- X = 'бы'('не'('смог')).
X = (не смог)бы.
Again, for reference:
/* Ciao 1.24 */
?- X = 'не'('бы'('смог')).
X = не смог бы ?
yes
?- X = 'бы'('не'('смог')).
X = (не смог)бы ?
yes
?- X = не смог бы.
X = не смог бы ?
yes
?- X = (не смог)бы.
X = (не смог)бы ?
yes
?-
Interestingly the examples by @horsh can be used as a
parser test. I find for various Prolog systems after consult:
/* SWI-Prolog */
?- aggregate_all(count, test(_), C).
C = 16
/* Ciao Prolog */
?- aggregate_all(count, test(_), C).
C = 16
/* Dogelog Player */
?- aggregate_all(count, test(_), C).
C = 15 %%% test(16) cannot be parsed
/* Trealla Prolog */
?- aggregate_all(count, test(_), C).
C = 14. %%% test(9) and test(16) cannot be parsed
/* Scryer Prolog */
?- findall(hit, test(_), L), length(L, C).
C = 13. %%% test(9), test(15) and test(16) cannot be parsed
/* SICStus Prolog */
?- aggregate_all(count, test(_), C).
C = 0. %%% doesn't understand UTF-8 ?
These are the examples turned into test cases:
test(1) :- _ = (в одном сказочном городе жили коротышки).
test(2) :- _ = (коротышками @ их называли 0 потому что они были очень маленькие).
test(3) :- _ = (каждый коротышка был ростом с небольшой огурец).
test(4) :- _ = (в городе @ у них было очень красиво).
test(5) :- _ = (вокруг каждого дома росли цветы двтч маргаритки зпт ромашки зпт одуванчики).
test(6) :- _ = (там @ даже улицы назывались именами цветов двтч улица колокольчиков зпт аллея ромашек зпт бульвар васильков).
test(7) :- _ = (а сам город назывался цветочным городом).
test(8) :- _ = (он стоял на берегу ручья).
test(9) :- _ = (этот ручей @ коротышки называли огурцовой рекой потому что по берегам ручья росло много огурцов).
test(10) :- _ = (за рекой был лес).
test(11) :- _ = (коротышки делали из березовой коры @ лодочки зпт 0 переплывали через реку и 0 ходили в лес @ (за ягодами зпт за грибами зпт за орехами)).
test(12) :- _ = (собирать ягоды было трудно потому что коротышки ведь были крошечные).
test(13) :- _ = (0 да /*еще*/ тащить с собой @ пилу).
test(14) :- _ = (а за орехами /*и вовсе*/ приходилось лазить на высокий куст да /*еще*/ тащить с собой @ пилу).
test(15) :- _ = ((ни один коротышка) @ (не смог бы) @ (сорвать орех @ руками)).
test(16) :- _ = (их @ надо было пилить пилой).
Edit 13.11.2024
About UTF-8, the test cases are tricky, they have a few Homoglyphs.
Ciao Prolog playground is so nice to show them:
I don’t recommend putting the code into any test codebase.
It is an expert from an existing book.