There are two ways of escaping some Unicode characters: `\u` and `\x`.
For output, the quoted form prefers `\x` and the portray form prefers `\u`:
X = '\x2', Y = '\u0002',
atom_codes(X, Codes), atom_chars(X, Chars),
format('quoted(~q) canonical(~k) atom(~a) print(~p)~n', [X, X, X, X]).
quoted('\x2\') canonical('\x2\') atom() print('\u0002')
X = Y, Y = '\u0002',
Codes = [2],
Chars = ['\u0002'].
Is there a reason for quoted output preferring the `\x` form and portray preferring the `\u` form?
From reading the section of the manual on character escape sequences, I infer that the `\u` form isn’t in the ISO standard.
Also, I don’t understand the sentence “where \x defines a numeric character code, it doesn’t specify the character set in which the character should be interpreted” … what character set is being referred to, and how does the `\u` notation fix this problem?
It could be an IBM EBCDIC code, if the Prolog processor
character set is IBM EBCDIC. From the ISO Core Standard:
To that extent, a Prolog processor could even support
more decimal digits than just the Latin decimal digits, since the
standard says:
But something tells me the ISO core standard didn’t have this
use case in mind. It was more about atoms and variables, I guess.
On the same page more or less:
Edit 22.06.2022:
In the above terminology, a Prolog processor that supports
more character codes than those listed in 6.5 supports extended
characters; this holds for Prolog processors that support Unicode.
Each Unicode glyph that is not from 6.5 would be an extended
character. But then usually a Prolog processor also supports
the Unicode code point collation, which is a further ingredient
as per 6.6, i.e. the numbering of the glyphs as per Unicode.
If the latter is the case, then `\xXXXX\` and `\uXXXX` say the same.
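A quick check of that equivalence, as a minimal sketch. It assumes a Prolog system whose character codes are Unicode code points and which accepts the non-ISO `\u` escape (SWI-Prolog does both); the predicate name same_escape/0 is only chosen for illustration:

```prolog
% Assumption: character codes are Unicode code points (as in SWI-Prolog),
% so both escapes denote the same one-character atom.
same_escape :-
    X = '\x20AC\',            % ISO-style hex escape, closed by a backslash
    Y = '\u20AC',             % \u escape with exactly four hex digits
    X == Y,                   % identical atoms
    atom_codes(X, [0x20AC]).  % and the code is the Unicode code point
```

On a system where the 6.6 numbering were not Unicode, only the `\x` half of this would be expected to change meaning.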
But I wonder whether any Prolog system has ever had a `\uXXXX`
that was coded differently internally, because its 6.6 numbering was not
the Unicode code point numbering? Maybe there is some such Prolog system?
What would make sense is an ISO core standard working
group that would draft these stream creation properties:
- bom(Bool)
  Specify detecting or writing a BOM.
- encoding(Atom)
  Specify a file encoding.
After all, it is already 2022, with 50 years of Prolog behind us. But
can we be sure that Prolog texts are exchangeable if
they use Unicode code points?
What if a UTF-16 file, handy for CJK, comes along?
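For comparison, SWI-Prolog's open/4 already accepts options of this shape, although they are not part of the ISO core standard. The following is a minimal sketch, assuming SWI-Prolog; the predicate name roundtrip/3 is hypothetical. It writes a term with a UTF-8 BOM and reads it back, letting BOM detection pick the encoding:

```prolog
% Assumption: SWI-Prolog, whose open/4 supports bom(Bool) and encoding(Enc).
% roundtrip/3 is an illustrative name, not a library predicate.
roundtrip(File, Term, Read) :-
    setup_call_cleanup(
        open(File, write, Out, [encoding(utf8), bom(true)]),
        format(Out, '~q.~n', [Term]),     % quoted, so it can be read back
        close(Out)),
    setup_call_cleanup(
        open(File, read, In, [bom(true)]), % a detected BOM sets the encoding
        read(In, Read),
        close(In)).
```

A query such as `roundtrip('/tmp/bom_demo.pl', 'h\u00E9llo', T)` should then give back the same atom. Whether another Prolog system reads that file the same way is exactly the exchangeability question.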
Edit 25.06.2022:
I feel I don’t have the relevant experience. I will redo this stuff now
for Dogelog Player, after having had it in formerly Jekejeke Prolog and still
discovering new corners. Maybe I can tell more in a few years.
Also, experience from formerly Jekejeke Prolog is that things
might look different for UrlConnection, since the server might do
the BOM detection so that the client doesn’t need to. I guess SWI-Prolog
also has some experience here through http_open/3.