Question on comma and semicolon vs dot in Prolog grammar

Still with reference to Tau Prolog’s grammar, and SWI appears to do the same:

Looking at the atom terminal, why is the dot (.) treated like any other symbolic character, so that e.g. +. and .. are valid atoms, while the comma (,) and the semicolon (;) are treated specially and can only occur alone, so that e.g. ,, is not a valid atom? In other words, why not treat , and ; just like any other symbolic character?

I am especially puzzled because, in parsing, the dot is no less tricky to handle than the comma. So, again, what is the problem with , and ; such that they have to occur alone (in a “symbolic” atom)?

P.S. The same question applies to !: why is e.g. !! not allowed?

I was hoping you could explain :grinning_face:

Got some code to demonstrate a problematic scenario?

I am not an expert on Prolog’s grammar, nor on the reasons or history behind its choices, which is why I am asking.

Don’t need to be an expert.

A bit of actual code would make your question much clearer - can you show some code? E.g.:

?- A = ',,'.
A = ',,'.

?- A = ','.
A = (',').

?- A = (',').
A = (',').

?- A = '.'.
A = ('.').

?- A = '!!'.
A = '!!'.

?- A = '!'.
A = !.

Looks fine to me, although the brackets in the output are interesting. Is that what you are referring to? Why not make it clearer? Especially state any problem area clearly with example(s).

I’m afraid you are just missing the point, even though this is question #2 along the same lines. I do want to learn about Prolog’s grammar and the reasons behind it. That is the question: a question of learning. For a bit of motivation (a scenario), suppose I am writing a Prolog parser and there are possible variants with different costs and benefits, so I need to understand as well as possible why the standard makes the choices it does. In most cases where I have a doubt, it turns out I am indeed simply missing something, as in my previous question about expression levels.

No, that is a quoted atom, and printed as such, and of course in quotes we can do anything.

I am talking about the symbols that make up “symbolic” (unquoted) atoms in the atom rule of the grammar: namely the three alternatives for !, for the comma (,) and for ;, each allowing exactly one occurrence, vs the [#\$\&\*\+\-\.\/\:\<\=\>\?@\^\~\\]+ alternative, which happens to also contain the dot (.).
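To make the contrast concrete, here is a minimal Python sketch of those two kinds of alternatives. It is only my simplified reading of the rule quoted above, not Tau Prolog's actual source; the names `SOLO`, `SYMBOLIC` and `is_symbolic_atom` are made up for illustration.

```python
import re

# Assumption: a simplified reading of the "symbolic" atom alternatives quoted
# above. This is an illustration, not Tau Prolog's actual implementation.
SOLO = r"[!,;]"  # exactly one occurrence of !, the comma, or ;
SYMBOLIC = r"[#\$\&\*\+\-\.\/\:\<\=\>\?@\^\~\\]+"  # one or more symbol chars
SYMBOLIC_ATOM = re.compile(f"(?:{SOLO}|{SYMBOLIC})")


def is_symbolic_atom(text: str) -> bool:
    """Return True if the whole text is one unquoted symbolic atom."""
    return SYMBOLIC_ATOM.fullmatch(text) is not None


for candidate in ["+.", "..", ",", ",,", "!", "!!", "?!"]:
    print(repr(candidate), is_symbolic_atom(candidate))
```

Under this reading, +. and .. are accepted because the dot belongs to the repeatable character class, while ,, and !! and ?! are rejected because the comma and ! only appear in the single-occurrence alternatives.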

Correctness is relative to requirements. Rather (after several hours working with it), I’d say that the grammar is just not 100% precise/formal. Indeed, I don’t find it ambiguous either. For example, the dot and comma rules really are not tokens but, like the op rule, atoms satisfying certain conditions. As for the lexing, there must be an unstated assumption that the regexes are tried in the order presented, otherwise some lookahead becomes necessary in some places…

That said, I very much appreciate your comparison with the details of the ISO standard, which may very well shed some light. But does ISO explain why “solo characters”, i.e. some characters, have to appear “solo”?

IOW, I do not understand the reason for that restriction, such that e.g. !! (not quoted) is not a valid atom, as I am not finding anything in the grammar itself that would necessitate it. (All the more so, as said, considering that, by contrast, the restriction does not apply to the dot, even though the dot is non-trivial: it can be the sentence/rule terminator, but also an operator when alone, a character of a “symbolic” atom when not alone, and it appears in the representation of numbers/floats.)

Which is my point! :slight_smile:

But there may be historical and even so-far-unforeseen technical reasons why they are, and some people around here might know, especially those who have been using Prolog since the very old days (there are few as far as I can tell, Jan W. of course being one of them).

Moreover, of course we can extend the language, but I am trying to start with a strictly Tau/ISO Prolog, and then the extensions should ideally be conservative. That is where I find a few choices whose details are as annoying, sometimes even showstoppers, as they are unexplained…

real programmers don’t document

(BTW, on the “democratization of programming”: real programmers who do not design and document are rather charlatans, which was as true 30 years ago as it is today…)

! and . are not treated in the same way, I thought that much was clear. And, to reiterate, I am not asking why there are solo characters, I am asking why some specific characters are solo: but I won’t repeat the exact list and the whole question.

Wrong is like (not) correct: it’s relative. Indeed, I am mostly perplexed by the !, i.e. the fact that it is “solo” (still talking of the Tau/ISO standard) makes no sense to me, and is rather annoying, as I have found myself wanting e.g. a !! or a ?!.

But then you give me a clue with “punctuation” [and “delimiters” is even better]: i.e., maybe the idea is/was simply to provide a toolset such that, even among the punctuation symbols, some are solo for “distinctness”, and just some others are not for more flexibility in (typically) operator naming, indeed probably following some already acquired practices/conventions.

Though, even in that light, I find it an unnecessary restriction that !, the comma and ;, and maybe even |, can only be “solo”. It is one that prevents conservative extensions of the language where those characters are not solo, which is really annoying: loss of conservativity, especially at tokenization time (so in how source code is read), and considering the reality of Prolog code and systems in the large, means that developing an ISO-conformant system is essentially pointless. We’d rather need a new Prolog standard, a 2.0 so to speak…


P.S. Actually, I had missed the significance of Jan B.'s example above: indeed, how should e.g. p :- !, q. be parsed if neither ! nor the comma is necessarily solo? Sure, I suppose we could make it work by making the grammar and parser smarter and smarter (e.g. the tokenizer might have to know about operators and/or be able to backtrack, etc.). Hence the standard choice might well be the preferable one, i.e. clean and simple…
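To see the problem concretely, here is a small Python sketch of a longest-match tokenizer run in two modes. It is a toy under stated assumptions: `make_tokenizer` and `SYMBOL_CHARS` are hypothetical names, only lowercase-initial names and symbolic atoms are handled, and it is not how Tau Prolog or SWI-Prolog actually tokenize.

```python
import re

# Assumption: the symbol characters of the repeatable alternative in the atom
# rule discussed in this thread, written as the inside of a character class.
SYMBOL_CHARS = r"#$&*+\-./:<=>?@^~\\"


def make_tokenizer(solo: bool):
    """Build a toy tokenizer; with solo=False, '!', ',' and ';' glue like '+'."""
    if solo:
        symbolic = rf"[!,;]|[{SYMBOL_CHARS}]+"
    else:
        symbolic = rf"[{SYMBOL_CHARS}!,;]+"
    pattern = re.compile(rf"\s*([a-z][A-Za-z0-9_]*|{symbolic})")

    def tokenize(text: str) -> list[str]:
        text = text.rstrip()
        tokens, pos = [], 0
        while pos < len(text):
            match = pattern.match(text, pos)
            if match is None:
                raise ValueError(f"cannot tokenize at position {pos}")
            tokens.append(match.group(1))
            pos = match.end()
        return tokens

    return tokenize


standard = make_tokenizer(solo=True)
greedy = make_tokenizer(solo=False)

print(standard("p :- !, q."))  # ['p', ':-', '!', ',', 'q', '.']
print(greedy("p :- !, q."))    # ['p', ':-', '!,', 'q', '.']
print(standard("a:-!."))       # ['a', ':-', '!', '.']
print(greedy("a:-!."))         # ['a', ':-!.']
```

With the solo rule, the cut and the comma come out as separate tokens with no extra machinery. Without it, a plain longest-match tokenizer fuses !, into a single atom, so either the programmer has to insert whitespace everywhere or the tokenizer has to consult the operator table or backtrack.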

Need to think more about it, just one thing is for sure: I need a standard that is one, to begin with.