Question on comma and semicolon vs dot in Prolog grammar

Still with reference to Tau Prolog’s grammar, and SWI appears to do the same:

Looking at the atom terminal, why is . treated as any other symbolic character, e.g. +. or .. are valid atoms, while , and ; are treated specially such that they can only occur alone, e.g. ,, is not a valid atom? I.e., why not have , and ; be just as any other symbolic character?

I am especially puzzled since, in parsing, the treatment of dot is no less tricky than that of comma: so, again, what is the problem/reason with , and ; such that they have to occur alone (in a “symbolic” atom)?

P.S. The same question also for !, why is e.g. !! not allowed…

I was hoping you could explain :grinning_face:

Got some code to demonstrate a problematic scenario?

I am not an expert of Prolog’s grammar and the reasons or history for all the choices, so I am rather asking.

Don’t need to be an expert.

A bit of actual code would make your question much clearer - can you show some code? E.g.:

?- A = ',,'.
A = ',,'.

?- A = ','.
A = (',').

?- A = (',').
A = (',').

?- A = '.'.
A = ('.').

?- A = '!!'.
A = '!!'.

?- A = '!'.
A = !.

Looks fine to me, although the brackets in the output are interesting. Is that what you are referring to? Why not make it clearer? Especially state any problem area clearly with example(s).

I’m afraid you are just missing the point, even though this is question #2 along the same lines. I do want to learn about Prolog’s grammar and its reasons, that is the question: a question of learning. For a bit of motivation (a scenario), think of me writing a Prolog parser: there are possible variants with different costs/benefits, so I need to understand as much as possible why the standard makes the choices it does. And, in most cases where I have a doubt, it turns out I am indeed simply missing something, as in my previous question about expression levels.

No, that is a quoted atom, and printed as such, and of course in quotes we can do anything.

I am talking about the symbols that make up “symbolic” (unquoted) atoms in the atom rule of the grammar, namely the alternatives for !, for , and for ; (one occurrence allowed each), vs the [#\$\&\*\+\-\.\/\:\<\=\>\?@\^\~\\]+ alternative, which happens to also contain the dot.

The Tau-Prolog spec, unfortunate concerning Le Dot, writes:

For simplicity, the terminal symbols comma and dot in the
grammar denote atom symbols (see Table 1) whose values
are ‘,’ and ‘.’, respectively.

That statement by Tau-Prolog is not 100% correct. First of all,
comma and dot cannot be used as atoms, as the syntactic
category atom from Table 1 would indicate. Atoms can be Prolog
terms in themselves, but comma and dot cannot:

?- X = , .
ERROR: Syntax error: Operand expected, unquoted comma or bar found

?- X = . .
ERROR: Syntax error: Unbalanced operator

Second, comma is from the syntactic category called punctuation,
and dot is its own syntactic category, called the terminal period.
Neither syntactic category is nicely covered in Table 1.

Punctuation is a small number of characters that can only be written
alone, like , and ; etc. This explains why ,, is not recognized.
Terminal period is a period . followed by layout (newline, space, etc.)
or a line comment starting with %:

/* ISO 6.4.8 Other tokens; the ISO core standard literally says:
An end char shall be followed by a layout character or a %. */
terminal_period = "." (layout | line_comment).

/* ISO 6.5.3 Solo characters, I have only listed those
 that can appear as operators and added period */
punctuation = "," | ";" | "|" | "!" | "."
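To make the two rules above concrete, here is a minimal tokenizer sketch in Prolog. It is only an illustration under stated assumptions: it relies on SWI-Prolog's char_type/2, all predicate names (tokenize/2, run/4, etc.) are made up for this example, it does not distinguish variables or numbers from names, and it treats end of input after a period as terminal, which ISO does not say.

```prolog
% Minimal tokenizer sketch (assumes SWI-Prolog for char_type/2).
% All predicate names are hypothetical, not taken from ISO or Tau Prolog.

symbol_char(C) :- atom_chars('#$&*+-/:<=>?@^~\\.', Cs), memberchk(C, Cs).
solo_char(C)   :- atom_chars('!,;|()[]{}', Cs), memberchk(C, Cs).
name_char(C)   :- char_type(C, csym).     % letter, digit or underscore
layout_char(C) :- char_type(C, space).

% A period is terminal when followed by layout or a % comment.
% Accepting end of input as well is an assumption of this sketch.
terminal_follow([]).
terminal_follow([C|_]) :- ( layout_char(C) ; C == '%' ), !.

% tokenize(+Chars, -Tokens)
tokenize([], []).
tokenize([C|Cs], Ts) :-                   % skip layout
    layout_char(C), !,
    tokenize(Cs, Ts).
tokenize(['.'|Cs], [end|Ts]) :-           % terminal period
    terminal_follow(Cs), !,
    tokenize(Cs, Ts).
tokenize([C|Cs], [solo(A)|Ts]) :-         % solo characters: always one char
    solo_char(C), !,
    atom_chars(A, [C]),
    tokenize(Cs, Ts).
tokenize([C|Cs], [symbol(A)|Ts]) :-       % longest run of symbol characters
    symbol_char(C), !,
    run(symbol_char, Cs, Run, Rest),
    atom_chars(A, [C|Run]),
    tokenize(Rest, Ts).
tokenize([C|Cs], [name(A)|Ts]) :-         % longest run of name characters
    name_char(C),
    run(name_char, Cs, Run, Rest),
    atom_chars(A, [C|Run]),
    tokenize(Rest, Ts).

% run(+Class, +Chars, -Run, -Rest): longest prefix of Chars in Class.
run(Class, [C|Cs], [C|Run], Rest) :-
    call(Class, C), !,
    run(Class, Cs, Run, Rest).
run(_, Rest, [], Rest).
```

With this sketch, `atom_chars('X = 1..3.', Cs), tokenize(Cs, Ts)` should give a single `symbol(..)` token in the middle and an `end` token for the final period, since only the last dot is followed by nothing but the end of the text.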

There are two dots now, the terminal period and the non-terminal
period from punctuation, which caused the ISO committee some
headaches and some Corrigenda, especially regarding quoting during
writing. And since period is not a solo character, we find for
example .. in CLP(FD).

Third, the grammar should say that comma and non-terminal period
can nevertheless be used as operators, usually only as infix or
postfix, but not as prefix operators:

?- read(X).
|: (A,B).
X = (_, _).

?- read(X).
|: (A.B).
X = _._.

Correctness is relative to requirements; rather (after several hours working with it) I’d say that the grammar is just not 100% precise/formal. Indeed, I don’t find it ambiguous either: for example, the dot and comma rules really are not tokens but are like the op rule: atoms such that so and so; or, as for the lexing, there must be an unstated assumption that the regexes are tried in the order presented, otherwise some lookahead in some places becomes necessary…

That said, I very much appreciate your comparison with the details of the ISO standard, which may very well shed some light. But does ISO explain why “solo characters”, i.e. some characters, have to appear “solo”?

IOW, I do not understand the reason for that restriction, such that e.g. !! (not quoted) is not a valid atom, as I am not finding anything in the grammar itself that would necessitate such a restriction. (All the more so, as said, considering, by contrast, that the restriction does not apply to ., even though . is non-trivial: it can be the sentence/rule terminator, but also an operator when alone, a character of a “symbolic” atom when not alone, and it also appears in the representation of numbers/floats.)

Hi,

They do not need to be solo. For example Ciao Prolog
has a proposal for partial strings where one would write:

?- Y = "abc"||X.

Instead of:

?- Y = [a,b,c|X].

It has been adopted by Trealla Prolog, and is in the works
for Scryer Prolog. But most likely you have to remove | from
the solo category, or use a parsing trick that sometimes expects
two solo characters. The solo character reference I gave
reads in full:

And is from this document:

Part 1: General core, ISO/IEC 13211-1:1995(E)
https://www.iso.org/standard/21413.html

Bye

P.S.: Here you see the double || already in action. But unlike
the ISO core standard, which has a well-defined document
numbering, I haven’t yet found an authoritative reference for this
particular form of tokenizer and parser. Could be an
instance of real programmers don’t document. They just
do some stuff and don’t write documents:

Which is my point! :slight_smile:

But there may be historical and even so-far-unforeseen technical reasons why they are: and some people around here might know, especially those who have been using Prolog since the very old days (there are few as far as I can tell, Jan W. of course being one of them).

Moreover, of course we can extend the language, but I am trying to start with a strictly Tau/ISO Prolog, so the extensions ideally should be conservative: and there I find a few choices whose details are as annoying, sometimes even showstoppers, as they are unexplained…

real programmers don’t document

(BTW, on the “democratization of programming”: real programmers who do not design and document are rather charlatans: which was true 30 years ago as it is today…)

To avoid character aliasing with other tokens, certain characters
are put into the solo category, sometimes also known as delimiters,
so that you can do in Prolog:

?- X = [a|1], display(X), nl.
'[|]'(a,1)

?- X = (*,!), display(X), nl.
','(*,!)

If the vertical bar | were in the category of a or 1 and formed
a bigger token, the result would be different. Or if the comma
, were in the category of * or ! and formed a bigger token,
the results would also be different. Basically, Prolog has
the following two types of delimiters put into the category
solo characters:

  • Field Delimiter:
    comma , , semicolon ; ,
    vertical bar |.

  • Bracket Delimiters:
    parentheses ( and ), braces { and },
    square brackets [ and ].

And it has also put into the solo character category:

  • Other Delimiters:
    cut !, terminating period .

Interestingly unlike in Excel CSV format, Prolog has no empty
field values so to speak. So you could enlarge the field
delimiters, which is done in Trealla Prolog, by allowing ||.

You could also allow ,,, or ;;;; etc.

! and . are not treated in the same way, I thought that much was clear. And, to reiterate, I am not asking why there are solo characters, I am asking why some specific characters are solo: but I won’t repeat the exact list and the whole question.

There might be more reasons than Field Delimiter and Bracket Delimiters.
But I didn’t find some perfect names for it, so I just wrote Other Delimiters.
For example @kuniaki.mukai often writes:

/* Typical Prolog code by @kuniaki.mukai  */
p:-!,q.

This somehow shows that putting the cut ! into the solo
character category, and making it a delimiter, is not totally wrong.
If the cut ! aliased with either :- or , the result would not be
as expected, unless a space were put between the cut
and the other tokens around it. So you could say the Other Delimiters
serve the same purpose as the Field Delimiters and Bracket
Delimiters: they don’t need spaces around them to be recognized
as a single token, since they were put into the solo character
category by the ISO core standard. You can also read the solo
character category as:

solo character = hey I don't need spaces around me
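A small check of this with the real reader, as a sketch assuming SWI-Prolog's term_to_atom/2 (the predicate name same_reading/0 is made up for the example): the clause text without any spaces should read as the same term as the spaced-out version.

```prolog
% Sketch: parse the same clause body with and without spaces and
% compare the resulting terms. Assumes SWI-Prolog's term_to_atom/2.
same_reading :-
    term_to_atom(T1, 'p:-!,q'),
    term_to_atom(T2, 'p :- ! , q'),
    T1 == T2,                      % both texts give the same term
    T1 = (Head :- Body),
    Head == p,
    Body == (!, q).                % the cut and the comma stayed separate tokens
```

Calling `?- same_reading.` is expected to succeed, because neither the cut nor the comma glues onto :- or onto each other.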

But somehow, although the solo characters do not marry with
other tokens and stay solitary their whole life long, some solo
characters can join with other solo characters, as the Ciao Prolog
example of || shows. So while the ISO standard makes
solo characters celibate, this is probably not the whole
story. Some solo characters can theoretically alias with other
solo characters, forming bigger tokens, as Ciao Prolog shows.

Wrong is like (not) correct: it’s relative. Indeed, I am mostly perplexed by the !, i.e. the fact that it is “solo” (still talking of the Tau/ISO standard) makes no sense to me, and is rather annoying as e.g. I have found myself wanting a !! or a ?!.

But then you give me a clue with “punctuation” [and “delimiters” is even better]: i.e., maybe the idea is/was simply to provide a toolset such that, even among the punctuation symbols, some are solo for “distinctness”, and just some others are not for more flexibility in (typically) operator naming, indeed probably following some already acquired practices/conventions.

Though, even in that light, that !, , and ;, and maybe even |, can only be “solo” I find an unnecessary restriction, and one that prevents conservative extensions of the language where those characters are not solo: which is really annoying, as loss of conservativity, especially at tokenization time so how source code is read, and considering the reality of Prolog code and systems in the large, means that developing an ISO conformant system is essentially pointless. We’d rather need a new Prolog standard, a 2.0 so to speak…


P.S. Actually, I had missed the significance of Jan B.'s example above: indeed, how should e.g. p :- !, q. be parsed if neither ! nor , are necessarily solo? Sure, I suppose we could do it and anything, just make the grammar and parser smarter and smarter (e.g. the tokenizer might have to know about operators and/or be able to backtrack, etc.): whence, the standard choice might rather be the preferable one, i.e. clean and simple…
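To see the problem in miniature, here is a sketch of plain "longest match" over a single character class, once with the usual symbol characters and once with a hypothetical extended class where ! and , are ordinary symbol characters too. The predicate names (longest_run/3, iso_class/1, extended_class/1) are invented for the illustration and not part of any standard.

```prolog
% longest_run(+ClassAtom, +Text, -Token):
% Token is the longest prefix of Text made only of characters in ClassAtom.
longest_run(Class, Text, Token) :-
    atom_chars(Class, Set),
    atom_chars(Text, Chars),
    take(Set, Chars, Run),
    atom_chars(Token, Run).

take(Set, [C|Cs], [C|Run]) :-
    memberchk(C, Set), !,
    take(Set, Cs, Run).
take(_, _, []).

% Usual symbol characters: ! and , are NOT among them.
iso_class('#$&*+-./:<=>?@^~\\').

% Hypothetical extension: ! and , treated like any other symbol character.
extended_class('#$&*+-./:<=>?@^~\\!,').

% ?- iso_class(C), longest_run(C, ':-!,q.', T).
%    T is the atom ':-', so the cut and the comma follow as separate tokens.
% ?- extended_class(C), longest_run(C, ':-!,q.', T).
%    T is the atom ':-!,', a single token that is no known operator.
```

So with the extended class, p:-!,q. would only read as intended if written with spaces, or if the tokenizer consulted the operator table and backtracked over shorter matches.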

Need to think more about it, just one thing is for sure: I need a standard that is one, to begin with.