qcompile and load_files load files twice for Unicode paths

I have a Prolog project containing several pl files, which are partly generated. During runtime, I use qcompile/1 to create qlf files if needed for some of the files and after that load the complete set of files using load_files/1. I know that qcompile already loads the file, but for simplicity I load the compiled qlf files again. This is no problem until I use non-ASCII characters in the file paths.

Here is a minimal example to reproduce the issue:

start.pl

:- encoding(utf8).
mymain :-
	qcompile('test.pl'),
	qcompile('täst.pl'),
	load_files(['test', 'täst']),
	write('OK\n').

test.pl

:- encoding(utf8).
constant_ascii('ABC').

täst.pl

:- encoding(utf8).
constant_unicode('ABC').

I run the program using the Windows build (tested with 7.6.4, 8.2.1 and the nightly build 2020-07-30):
swipl.exe -l start.pl -g mymain

As output I get:

Warning: Redefined static procedure constant_unicode/1
Warning: Previously defined at e:/projekte/prolog/test/täst.pl:2
OK

So täst.pl is obviously loaded twice, but test.pl only once. It seems to me that load_files only recognizes already loaded files if the file path contains only ASCII characters.

Is this a bug in load_files?

Seems to be a bug in the Windows version. There is no problem if I run this on Linux, but running the Windows version on Linux using Wine reproduces the issue.

edit

Turns out to be a bug in reading .qlf files.
Fixed with efab329cd1cb5918868f4c90db1c4f37fae9c8ea

I can confirm that it works now with the newest nightly Windows build. Thank you.

I did some more tests with the nightly Windows build (2020-08-07). It seems that sequentially loading single qlf files using load_files/2 removes the content of previously loaded files if their paths contain non-Latin-1 Unicode characters.

The following is a minimal example to reproduce it:

file1.pl

:- encoding(utf8).
constant_file1('A').

file2.pl

:- encoding(utf8).
constant_file2('B').

start.pl

:- encoding(utf8).

load_my_file(BASEFN) :-
	file_name_extension(BASEFN, 'pl', PLFN),
	file_name_extension(BASEFN, 'qlf', QLFFN),
	format("Checking ~w...\n", [QLFFN]),
	(   exists_file(QLFFN)
	->  write(' exists -> load_files\n'),
		load_files([BASEFN], [silent(true)])
	;   write(' exists not -> qcompile\n'),
		qcompile(PLFN)
	).

mymain() :-
	load_my_file('file1'),
	constant_file1(X), format('constant_file1: ~w\n',[X]),
	load_my_file('file2'),
	constant_file2(Y), format('constant_file2: ~w\n',[Y]),
	constant_file1(Z), format('again constant_file1: ~w\n',[Z]),
	write('OK\n').

So start.pl loads two files, either by qcompiling them or by loading the previously compiled qlf files. Then it prints their content.

Put these files into a folder containing a Unicode character not in Latin-1, for example Tあst, cd to this folder and run
swipl.exe -l start.pl -g mymain.

Output:

Checking file1.qlf...
 exists not -> qcompile
constant_file1: A
Checking file2.qlf...
 exists not -> qcompile
constant_file2: B
again constant_file1: A
OK

Run it again:

Checking file1.qlf...
 exists -> load_files
constant_file1: A
Checking file2.qlf...
 exists -> load_files
constant_file2: B
ERROR: -g mymain,halt.: mymain/0: Unknown procedure: constant_file1/1
ERROR:   However, there are definitions for:
ERROR:         constant_file2/1

This error does not occur in a folder named Test or Täst.
Interestingly, if the first run happens in a folder named Test and the folder is then renamed to Tあst, the second run does not print this error either.

Your steps are very understandable except for one detail. Thanks.
Just doing a separate check for Jan using the latest Windows daily build and I can reproduce the error using

SWI-Prolog (threaded, 64 bits, version 8.3.5-8-ge14460a94)

The one detail in the instructions that needs clarification is

Put these files into a folder containing a Unicode character not in Latin-1, for example Tあst

should be

Put these files, except the two qlf files, into a folder containing a Unicode character not in Latin-1, for example Tあst

Thanks. Also reproduces on Linux.

Fixed with 39abeceec7736d8319a91798475973b84f3b3222. Your test runs fine now, both on Windows and Linux as well as saved on Windows, loaded on Linux and the other way around.

Thank you. I can confirm that this is working. But:
Rename file1.pl to file🙂.pl (🙂 = U+1F642) and change the call in the UTF-8 encoded start.pl accordingly: load_my_file('file🙂').
Remove the qlf files and run it again.

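ERROR: -g mymain,halt: source_sink `'file.pl'' does not exist

The character printed between file and .pl in this message is U+F642 (a private-use code point, so it may not display).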

So it seems that the Unicode code point is truncated. I read in another thread that Unicode support for U+10000 and above is currently not working properly. However, if I rename only the folder Tあst to T🙂st, it works.
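
The truncation guess can be checked with plain integer arithmetic: keeping only the low 16 bits of U+1F642 gives U+F642, the code point reported in the error message. A minimal sketch (low16/2 and show_truncation/0 are names made up for illustration, not part of SWI-Prolog). It only shows that the numbers are consistent with a 16-bit truncation, not where in the system it happens:

```prolog
% low16(+CodePoint, -Truncated): keep only the low 16 bits of a code point.
% Hypothetical helper for illustration; not part of SWI-Prolog.
low16(CodePoint, Truncated) :-
    Truncated is CodePoint /\ 0xFFFF.

% Print the truncated value of U+1F642 as an upper-case hex code point.
show_truncation :-
    low16(0x1F642, Truncated),
    format("U+~16R~n", [Truncated]).
```

Calling show_truncation should print U+F642.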

On Windows, SWI-Prolog currently limits Unicode characters to U+FFFF and below. Do not confuse Unicode code points with UTF-8, UTF-16, etc., as those are encodings of Unicode code points.
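
To make the difference between a code point and its encoding concrete, here is a small sketch that computes the UTF-16 code units for a code point using the standard surrogate-pair arithmetic. The predicate name utf16_units/2 is made up for illustration, and inputs inside the surrogate range are not checked:

```prolog
% utf16_units(+CodePoint, -Units): UTF-16 code units for a code point.
% Hypothetical helper for illustration; not part of SWI-Prolog.
% Code points up to U+FFFF fit into a single 16-bit unit.
utf16_units(CP, [CP]) :-
    CP =< 0xFFFF, !.
% Code points above U+FFFF need a high and a low surrogate.
utf16_units(CP, [High, Low]) :-
    CP =< 0x10FFFF,
    V is CP - 0x10000,
    High is 0xD800 + (V >> 10),
    Low is 0xDC00 + (V /\ 0x3FF).
```

For あ (U+3042) this gives the single unit 0x3042, while 🙂 (U+1F642) becomes the two units 0xD83D and 0xDE42. A character type that holds only one 16-bit unit cannot represent the second case as a single character.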

See: Unicode code point for U+10000 on Windows OS. Use in string results in Illegal character code

In short, supporting code points > 0xFFFF on Windows has been discussed a couple of times. It is not working. It is doable but certainly not trivial to fix, and is thus waiting for someone with the resources (money or programming effort) to get it done.

For anyone considering investing the time: I’ve had several discussions over the years on how it can best be done. Please contact me before you start hacking.