- string_codes/2 with escape sequences \xXX for CR with LF

EricGT · July 3, 2020, 10:42am

Using SWI-Prolog (threaded, 64 bits, version 8.3.3) on Windows 10

When converting some input into character codes for DCGs, the CR followed by LF combination is not translated into the expected codes.

Correct examples

?- string_codes("\x0D",Codes).
Codes = [13].

?- string_codes("\x0A",Codes).
Codes = [10].

?- string_codes("\x0D \x0A",Codes).
Codes = [13, 32, 10].

?- string_codes("\x0D\\x0A",Codes).
Codes = [13, 10].

Example with what seems to be an invalid conversion.

?- string_codes("\x0D\x0A",Codes).
Codes = [13, 120, 48, 65].

The documentation notes:

The closing \ is obligatory according to the ISO standard, but optional in SWI-Prolog to enhance compatibility with the older Edinburgh standard.

So is this a bug, a misreading of the documentation or something else?

EDIT

Based on reply by Jan W. to use \uXXXX

NB \uXXXX needs four hex digits

?- string_codes("\u000D\u000A",Codes).
Codes = [13, 10].

\uXXXX results in error when two hex digits are given.

?- string_codes("\u0D\u0A",Codes).
ERROR: Syntax error: Illegal \u or \U sequence
ERROR: string_codes("
ERROR: ** here **
ERROR: \u0D\u0A",Codes) .

Boris · July 3, 2020, 11:42am

The docs you linked, for the \xXX..\ say,

The code \xa\3 emits the character 10 (hexadecimal ‘a’) followed by ‘3’. Characters specified this way are interpreted as Unicode characters. See also \u .

If I apply this logic to your invalid conversion, I can rephrase that to:

The code \x0D\x0A emits the character 13 (hexadecimal ‘d’) followed by ‘x0A’. Characters specified this way are interpreted as Unicode characters. See also \u .

This obviously does not answer your question but at least it seems consistent?

PS: if there is a problem, it has nothing to do with string_codes/2; it is about how the string literal is read. Try:

X = "\x0D\x0A".
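
To make that reading concrete, here is a minimal sketch that puts the literal from the question next to the form in which each escape carries its own closing backslash. The predicate names ambiguous_codes/1 and closed_codes/1 are made up for illustration, and the sketch assumes SWI-Prolog, where the closing backslash is optional.

```prolog
% Hypothetical helper names, for illustration only.

% The second backslash is read as the closing backslash of \x0D,
% so the characters x, 0 and A are left as plain text.
ambiguous_codes(Codes) :-
    string_codes("\x0D\x0A", Codes).      % Codes = [13, 120, 48, 65]

% Each escape is closed by its own backslash (the ISO form),
% so the reader sees two separate escapes.
closed_codes(Codes) :-
    string_codes("\x0D\\x0A\", Codes).    % Codes = [13, 10]
```

The first result is the one reported in the question. The second follows from the same rule: once \x0D\ is closed, \x0A\ starts a fresh escape.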

EricGT · July 3, 2020, 11:54am

\xa\3 gets interpreted as \xa\ then 3. Since there is no escape for \3 then the only valid parse is as noted.

\x0D\x0A should be interpreted as \x0D then \x0A because the code should look ahead when seeing a backslash (\) to see if it is the start of an escape sequence. In this case \ is followed by x and thus signifies it is an escape sequence.

jan · July 3, 2020, 11:59am

Unless you care about portability to other Prolog systems, use \uXXXX or \UXXXXXXXX instead of \x… The ISO Prolog syntax for character codes is awkward and the result means nothing as it is undefined what encoding should be respected. \u and \U are widely used in virtually all languages these days and are defined to be Unicode code points.
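
As a rough sketch of that advice applied to the original DCG use case, the following shows both Unicode escape forms and a nonterminal for CR followed by LF. The names crlf_short/1, crlf_long/1 and crlf//0 are hypothetical, and the sketch assumes SWI-Prolog 7 or later, where a double-quoted literal in a DCG body matches a list of codes.

```prolog
% Hypothetical predicate names, for illustration only.

% \uXXXX takes exactly four hex digits, \UXXXXXXXX exactly eight.
crlf_short(Codes) :- string_codes("\u000D\u000A", Codes).
crlf_long(Codes)  :- string_codes("\U0000000D\U0000000A", Codes).

% A DCG nonterminal matching CR followed by LF in a code list.
crlf --> "\u000D\u000A".

% Usage:
% ?- crlf_short(Cs), crlf_long(Cs), phrase(crlf, Cs).
% Cs = [13, 10].
```

Because \u and \U have a fixed number of digits, no closing backslash is needed and two escapes can sit side by side without ambiguity.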
