Some predicates do not work properly with supplementary Unicode characters in Windows

I saw that the stable 9.0.x versions now also support supplementary Unicode characters (i.e., above U+FFFF) on Windows. Thanks for that.

I’ve done some checks with the 64-bit Windows version of SWI-Prolog 9.0.3 using such characters and found the following issues:

xml_quote_attribute/2 does not create the XML entities (numeric character references) correctly on Windows. It converts the high and low surrogates separately, which produces invalid content:

?- xml_quote_attribute('🙂',X).
Windows:
X = '&#55357;&#56898;'.
Linux:
X = '&#128578;'.
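To make the surrogate problem concrete, here is a small sketch that computes the UTF-16 surrogate pair for a supplementary code point. The predicate name surrogate_pair/3 is hypothetical and only for illustration; it is not part of SWI-Prolog.

```prolog
% surrogate_pair(+Code, -High, -Low)
% Hypothetical helper: split a supplementary code point (above U+FFFF)
% into its UTF-16 high and low surrogate code units.
surrogate_pair(Code, High, Low) :-
    Code > 0xFFFF,
    Code =< 0x10FFFF,
    Offset is Code - 0x10000,
    High is 0xD800 + (Offset >> 10),     % top 10 bits of the offset
    Low  is 0xDC00 + (Offset /\ 0x3FF).  % bottom 10 bits of the offset
```

For U+1F642 (🙂), `?- surrogate_pair(0x1F642, H, L).` gives H = 55357, L = 56898. Both values lie in the range 0xD800..0xDFFF, which XML does not allow as a character on its own, so quoting each half separately cannot produce well-formed output.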

string_length/2 on atoms does not correctly count the number of characters (Unicode code points), but atom_length/2 does. According to the documentation, they should be functionally equivalent.

?- atom_length('🙂',AL),string_length('🙂',SL).
Windows:
AL = 1, SL = 2.
Linux:
AL = SL, SL = 1.

Only the core system and some of the extensions are supposed to handle the full Unicode range on Windows :frowning: library(sgml) is not one of them. I pushed some patches that surely deal with this and a lot more. I do not know how complete the UTF-16 support is. If you want to help, download the next daily build and provide test cases.
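A test case for the two reports could look roughly like the plunit sketch below. The unit and test names are made up for illustration. The expected value '&#128578;' assumes that xml_quote_attribute/2 emits a single decimal character reference for U+1F642, as reported for Linux, so adjust it if the patched version behaves differently.

```prolog
:- use_module(library(plunit)).
:- use_module(library(sgml)).

:- begin_tests(supplementary_unicode).

% '\U0001F642' is the escape for U+1F642, used here so the test file
% does not depend on the source encoding.

% atom_length/2 and string_length/2 should both count code points.
test(atom_and_string_length_agree) :-
    atom_length('\U0001F642', AL),
    string_length('\U0001F642', SL),
    AL == 1,
    SL == 1.

% The character should be quoted as one reference, not as two
% separate surrogate references (assumed expected value).
test(xml_quote_attribute_single_reference, Q == '&#128578;') :-
    xml_quote_attribute('\U0001F642', Q).

:- end_tests(supplementary_unicode).
```

After loading the file, `?- run_tests(supplementary_unicode).` runs both tests. Comparing the result on Windows and Linux should show whether a given build still splits the character into surrogates.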

Thanks for reporting. atom_length/2 and string_length/2 now share the implementation, so this problem is gone :slight_smile:
