Using SWI-Prolog (threaded, 64 bits, version 8.1.22) on Windows 10.
Unable to create Unicode code point U+10000 in a string.
Based on the documentation section Character Escape Syntax, using the character a in a string works, and so do the escape variations:
?- C = "a".
C = "a".
?- C = "\x61".
C = "a".
?- C = "\x000061".
C = "a".
?- C = "\141".
C = "a".
?- C = "\u0061".
C = "a".
?- C = "\U00000061".
C = "a".
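The session above can be condensed into one check. The following is a minimal sketch with a hypothetical predicate name, `escape_forms_agree/0`. It asserts that every escape form shown reads as the single code 0x61. The hex and octal escapes are written with the closing backslash that ISO requires, which SWI-Prolog treats as optional.

```prolog
% Hypothetical helper: every escape variant from the session above
% should read as the one-code string "a" (code point 0x61 = 97).
escape_forms_agree :-
    Forms = ["a", "\x61\", "\x000061\", "\141\", "\u0061", "\U00000061"],
    forall(member(S, Forms),
           string_codes(S, [0x61])).
```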
U+FFFF also works
?- C = "\xFFFF".
C = "".
?- C = "\x00FFFF".
C = "".
?- C = "\uFFFF".
C = "".
?- C = "\U0000FFFF".
C = "".
However, U+10000 does not work:
?- C = "\x010000".
ERROR: Syntax error: Illegal character code
ERROR: C = "\
ERROR: ** here **
ERROR: x010000" .
?- C = "\U00010000".
ERROR: Syntax error: Illegal character code
ERROR: C = "
ERROR: ** here **
ERROR: \U00010000" .
?-
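One plausible reading of these sessions, consistent with the wchar_t note further down, is that the cut-off sits exactly at the edge of the Basic Multilingual Plane (BMP). U+FFFF is the largest value that fits in a 16-bit `wchar_t`, and `wchar_t` is 16 bits on Windows. U+10000 is the first supplementary code point. The sketch below uses hypothetical predicate names and only classifies integers on either side of that boundary. Because it never builds a character, it runs even where the escapes fail.

```prolog
% Hypothetical helpers: split the Unicode range at the 16-bit boundary.
% bmp_code_point/1: U+0000 .. U+FFFF, fits in a 16-bit wchar_t.
bmp_code_point(C) :-
    integer(C),
    C >= 0, C =< 0xFFFF.

% supplementary_code_point/1: U+10000 .. U+10FFFF (planes 1 to 16),
% which need more than 16 bits.
supplementary_code_point(C) :-
    integer(C),
    C >= 0x10000, C =< 0x10FFFF.
```

For example, `?- bmp_code_point(0xFFFF).` succeeds, while `?- bmp_code_point(0x10000).` fails and `?- supplementary_code_point(0x10000).` succeeds.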
Changing the encoding with the Prolog flag does not change the result either.
?- current_prolog_flag(encoding,Encoding).
Encoding = text.
?- set_prolog_flag(encoding,utf8).
true.
?- current_prolog_flag(encoding,Encoding).
Encoding = utf8.
?- C = "\x010000".
ERROR: Syntax error: Illegal character code
ERROR: C = "\
ERROR: ** here **
ERROR: x010000" .
?- C = "\U00010000".
ERROR: Syntax error: Illegal character code
ERROR: C = "
ERROR: ** here **
ERROR: \U00010000" .
Is this because it is being done on Windows 10, or is it something else?
EDIT
I also tried changing the encoding of the stream user_input,
but that raised a permission error.
?- stream_property(user_input,encoding(Encoding)).
Encoding = wchar_t.
?- set_stream(user_input,encoding(utf8)).
ERROR: No permission to encoding stream `user_input'
ERROR: In:
ERROR: [10] set_stream(user_input,encoding(utf8))
ERROR: [9] <user>
This came up while adding test cases for strings for this post.
Personal Notes
- Docs
- Windows
- Internationalization for Windows Applications
- Unicode and Character Sets
- About Unicode and Character Sets
- Character Sets
- SO Q&A Escaped Characters Outside the Basic Multilingual Plane (BMP) in Prolog
- Surrogates and Supplementary Characters
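The surrogate mechanism mentioned in these notes is how UTF-16 (and a 16-bit `wchar_t`) represents code points above U+FFFF. Each such code point is stored as two 16-bit units: a high surrogate in D800..DBFF and a low surrogate in DC00..DFFF. The following is a minimal sketch with hypothetical predicate names, assuming plain integers as input. Lone surrogate values are not rejected by the first clause.

```prolog
% Hypothetical helper: encode a code point as UTF-16 code units.
% BMP code points map to themselves; supplementary code points
% (U+10000 .. U+10FFFF) become a high/low surrogate pair.
code_point_utf16(CP, [CP]) :-
    CP >= 0, CP =< 0xFFFF, !.
code_point_utf16(CP, [Hi, Lo]) :-
    CP >= 0x10000, CP =< 0x10FFFF,
    V is CP - 0x10000,            % 20-bit offset into the supplementary planes
    Hi is 0xD800 + (V >> 10),     % top 10 bits -> high surrogate
    Lo is 0xDC00 + (V /\ 0x3FF).  % bottom 10 bits -> low surrogate

% Hypothetical inverse: combine a surrogate pair back into a code point.
utf16_pair_code_point(Hi, Lo, CP) :-
    Hi >= 0xD800, Hi =< 0xDBFF,
    Lo >= 0xDC00, Lo =< 0xDFFF,
    CP is 0x10000 + ((Hi - 0xD800) << 10) + (Lo - 0xDC00).
```

For example, U+1F600 encodes as the pair D83D DE00, and U+10000 (the code point from the failing escapes) encodes as D800 DC00.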
Possible solutions to enable SWI-Prolog to work with all the Unicode code points
- Convert the SWI-Prolog internal representation from wchar_t (16 bits on Windows) to something wider, such as a 32-bit or 64-bit type, that can hold every Unicode code point (up to U+10FFFF).
- Don't convert the Unicode text to code points, but leave it in an encoding such as UTF-8 or UTF-16, and create predicates to work with it at the Prolog level, or C functions to work with the encoding at the lower level. Example: How to make python 3 print() utf8
- A temporary solution: when converting from an encoding such as UTF-8 to code points, convert code points larger than U+FFFF to U+FFFD, the Unicode replacement character.
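The second and third ideas can be sketched at the Prolog level as follows. The predicate names are hypothetical, and this is hand-rolled code, not an existing library API. `code_point_utf8/2` encodes one code point as 1 to 4 UTF-8 bytes, which illustrates keeping text in an encoding rather than as wide characters. `bmp_or_replacement_codes/2` implements the temporary workaround by mapping everything above U+FFFF to U+FFFD. Both operate on plain integers, so they do not depend on what the platform's `wchar_t` can hold.

```prolog
% Hypothetical helper for the "keep it encoded" idea: encode one code
% point (0 .. 0x10FFFF) as a list of UTF-8 byte values.
code_point_utf8(CP, Bytes) :-
    must_be(between(0, 0x10FFFF), CP),
    (   CP =< 0x7F                      % 1 byte: 0xxxxxxx
    ->  Bytes = [CP]
    ;   CP =< 0x7FF                     % 2 bytes: 110xxxxx 10xxxxxx
    ->  B1 is 0xC0 \/ (CP >> 6),
        B2 is 0x80 \/ (CP /\ 0x3F),
        Bytes = [B1, B2]
    ;   CP =< 0xFFFF                    % 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    ->  B1 is 0xE0 \/ (CP >> 12),
        B2 is 0x80 \/ ((CP >> 6) /\ 0x3F),
        B3 is 0x80 \/ (CP /\ 0x3F),
        Bytes = [B1, B2, B3]
    ;   B1 is 0xF0 \/ (CP >> 18),       % 4 bytes: supplementary planes
        B2 is 0x80 \/ ((CP >> 12) /\ 0x3F),
        B3 is 0x80 \/ ((CP >> 6) /\ 0x3F),
        B4 is 0x80 \/ (CP /\ 0x3F),
        Bytes = [B1, B2, B3, B4]
    ).

% Hypothetical helper for the temporary workaround: replace every code
% point above U+FFFF with U+FFFD (REPLACEMENT CHARACTER).
bmp_or_replacement(CP, Out) :-
    (   CP > 0xFFFF
    ->  Out = 0xFFFD
    ;   Out = CP
    ).

bmp_or_replacement_codes(Codes0, Codes) :-
    maplist(bmp_or_replacement, Codes0, Codes).
```

For example, U+10000 encodes as the bytes F0 90 80 80, and the code list `[0x61, 0x10000]` becomes `[0x61, 0xFFFD]` under the workaround. The workaround loses information, which is why it is listed only as a temporary solution.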