Prolog Tip: Use the UTF-8 code page on Windows

I just noticed that Windows 10/11 provides a beta UTF-8
code page. You can enable it in the administrative language
settings ("Beta: Use Unicode UTF-8 for worldwide language support"),
after which Windows must be restarted. chcp then shows:

C:\Users\foo>chcp
Active code page: 65001.

A nice benefit (tested with SWI-Prolog 9.1.12): you no longer
need the encoding/1 directive when your Prolog text contains
non-ASCII characters but has no BOM. This UTF-8 file worked:

/* Not needed anymore
:- encoding(utf8).
*/
% ...
lawvere((A /\ _), C, π1, H, H) :-
   unify_with_occurs_check(A, C).
lawvere((_ /\ B), C, π2, H, H) :-
   unify_with_occurs_check(B, C).
% ...

It consulted the Greek letter pi without any hassle. Is this
expected to work like this in SWI-Prolog? I never understood
the encoding/1 directive, or why it's necessary.

% c:/bar/baz/graph.p compiled 0.00 sec, 8 clauses

Disclaimer: I don't know whether some applications would
stumble over a global account setting of code page 65001,
so you will of course need to test this.

Python performs some UTF-8 coercion because it wants its runtime
to use UTF-8 where possible. You can read about it here:

PEP 538 – Coercing the legacy C locale to a UTF-8 based locale
https://peps.python.org/pep-0538/

I wonder whether SWI-Prolog could also work towards something
like that, so that changing the code page becomes unnecessary.
Otherwise there might be a divergence between SWI-Prolog and Python:

“However, it comes at the cost of making CPython’s encoding assumptions
diverge from those of other locale-aware components in the same process,
as well as those of components running in subprocesses that share
the same environment.”
– PEP 538

But I haven't tested that far yet, i.e. I cannot yet pinpoint any
issues, since the Python bundling for SWI-Prolog is new.

As is, SWI-Prolog tries to blend in with the locale of the system/user. And yes, it detects when the Windows 65001 code page is active and then automatically switches to UTF-8. That was partly the result of recent work on improving Unicode support on Windows, during which I found out about this code page. Still, the Windows console doesn't seem to understand Unicode code points above 0xffff. PowerShell handles them, with some minor hiccups in copy/paste if I recall correctly.
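To see what was actually detected on a given machine, one can query the encoding Prolog flag and the encoding of the stream aliased as user_output. A minimal sketch; the predicate name show_encodings/0 is hypothetical. With code page 65001 active, the flag would be expected to read utf8 as described above, but verify this on your own setup:

```prolog
% Hypothetical helper: print the default encoding SWI-Prolog chose at
% startup (Prolog flag `encoding`) and the encoding of user_output.
show_encodings :-
    current_prolog_flag(encoding, Default),
    format("Default encoding: ~w~n", [Default]),
    (   stream_property(S, alias(user_output)),
        stream_property(S, encoding(Out))
    ->  format("user_output encoding: ~w~n", [Out])
    ;   format("user_output encoding: unknown~n")
    ).
```

Running ?- show_encodings. in a fresh console, once with and once without the UTF-8 code page enabled, shows whether the detection kicked in.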

The encoding directive can be used to make source code work regardless of the locale settings of the hosting system. It seems that practically every system is converging on UTF-8, so this nightmare is likely to end in a few years.
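As a sketch of that portable style, a source file saved as UTF-8 can declare its own encoding, so it loads the same way whatever the host locale or Windows code page is. The file name greek.pl and the predicate projection/1 are hypothetical, modeled on the lawvere/5 example:

```prolog
% Hypothetical file greek.pl, saved as UTF-8 without a BOM.
% The directive tells the reader to decode the remainder of this
% file as UTF-8, independent of the system's default encoding.
:- encoding(utf8).

% Atoms containing the Greek letter pi, as in the lawvere/5 clauses.
projection(π1).
projection(π2).
```

Without the directive, the same file only loads correctly when the default encoding already happens to be UTF-8, for example with code page 65001 active.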

Finally, the Prolog flag encoding may be set to override the initial detection, so

 :- set_prolog_flag(encoding, utf8).

in your init.pl causes the system to use UTF-8 by default for all its (text) streams, regardless of the locale.
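If only particular files should be treated as UTF-8, rather than changing the global default, the encoding can also be fixed per stream via the encoding/1 option of open/4. A minimal sketch; the predicate name utf8_roundtrip/3 is hypothetical:

```prolog
% Hypothetical helper: write Text to File and read it back, forcing
% UTF-8 on both streams instead of relying on the default encoding.
utf8_roundtrip(File, Text, Back) :-
    setup_call_cleanup(
        open(File, write, Out, [encoding(utf8)]),
        write(Out, Text),
        close(Out)),
    setup_call_cleanup(
        open(File, read, In, [encoding(utf8)]),
        read_string(In, _, Back),
        close(In)).
```

For example, ?- tmp_file(demo, F), utf8_roundtrip(F, "\u03C0 = 3.14159", S). should give back the same string, with the pi character intact, regardless of the locale.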