Alpha-- support for UTF-16: emoji on Windows

I have uploaded a lot of patches that establish support for:

  • Reading and writing UTF-16. The encoding names have changed from unicode_le and unicode_be to utf16le and utf16be; the old names are still accepted as aliases. This affects all platforms.
  • Handling wchar_t arrays in the Windows version as UTF-16 rather than UCS-2.

As a result, the Windows version should be able to represent the full Unicode range, except the code points reserved for UTF-16 surrogates (U+D800 to U+DFFF).
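
For readers less familiar with UTF-16, here is a minimal sketch of how a code point above U+FFFF (such as an emoji) maps to a surrogate pair of two 16-bit code units, and back. The function names utf16_encode and utf16_decode are made up for illustration; this is not the code from the patches, and it uses uint16_t rather than wchar_t so it behaves the same on all platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit code units written (1 or 2), or 0
 * if cp is a surrogate code point or lies beyond U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                       /* not representable */
    if (cp < 0x10000) {                 /* BMP: one code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Hypothetical helper: decode one code point from s, where n code
 * units are available. Returns the number of units consumed (1 or 2),
 * or 0 on an unpaired surrogate or empty input. */
size_t utf16_decode(const uint16_t *s, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;
    if (s[0] < 0xD800 || s[0] > 0xDFFF) {   /* plain BMP unit */
        *cp = s[0];
        return 1;
    }
    if (s[0] >= 0xDC00)                     /* lone low surrogate */
        return 0;
    if (n < 2 || s[1] < 0xDC00 || s[1] > 0xDFFF)
        return 0;                           /* high without low */
    *cp = 0x10000 + ((uint32_t)(s[0] - 0xD800) << 10)
                  + (uint32_t)(s[1] - 0xDC00);
    return 2;
}
```

For example, U+1F600 encodes as the pair 0xD83D 0xDE00. Code that treats each wchar_t as one character (the UCS-2 view) sees two separate "characters" there, which is why the surrogate range itself cannot be used for ordinary characters.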

Status

  • These changes are quite large and may have caused some regressions, also on non-Windows platforms.
  • Most things seem to work on Windows. Some known glitches:
    • The console (swipl-win.exe) doesn’t know about UTF-16, causing some problems with copy/paste and with deleting/inserting the individual characters of a surrogate pair. It also doesn’t handle color fonts. These seem to be unsupported by the old Windows GDI functions. Anyone with Windows C API experience willing to pick this up?
  • XPCE has not been updated and thus will often go wrong on surrogate pairs.
  • Documentation is not updated.
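
To illustrate the console glitch with deleting/inserting inside a surrogate pair: an editor working on a UTF-16 buffer has to move its cursor by whole characters rather than by single code units. The sketch below shows one way to do that. The names utf16_prev_char and utf16_next_char are hypothetical and this is not taken from swipl-win.exe; it only assumes the line is held as an array of 16-bit code units.

```c
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: index where the character before pos starts.
 * Steps back two units if they form a valid surrogate pair, else one. */
size_t utf16_prev_char(const uint16_t *buf, size_t pos)
{
    if (pos == 0)
        return 0;
    pos--;
    if (pos > 0 && is_low_surrogate(buf[pos]) && is_high_surrogate(buf[pos - 1]))
        pos--;                          /* skip the high half as well */
    return pos;
}

/* Hypothetical helper: index just past the character starting at pos,
 * in a buffer of len code units. */
size_t utf16_next_char(const uint16_t *buf, size_t len, size_t pos)
{
    if (pos >= len)
        return len;
    if (is_high_surrogate(buf[pos]) && pos + 1 < len && is_low_surrogate(buf[pos + 1]))
        return pos + 2;                 /* whole surrogate pair */
    return pos + 1;
}
```

A backspace at cursor position pos would then remove the units in the range from utf16_prev_char(buf, pos) up to pos, so both halves of an emoji go together instead of leaving an unpaired surrogate behind.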

Availability

Should be in tomorrow’s (June 26, 2022) daily build.
