Hello,
I am running into problems processing Unicode from this Chinese dictionary: https://handedict.zydeo.net/de/download (handedict.u8.gz) within this predicate:
kmp_isub(T1,T2,Simil) :-
    catch( % some invalid Unicode characters occur !?
        (
            downcase_atom(T1,S1),
            downcase_atom(T2,S2),
            isub(S1,S2,Simil,[zero_to_one(true)])
        ),
        E,
        (
            % on error: print the error and the offending entry, then treat it as no match
            writeln(E),
            zhde(Zh,T2),
            writeln(Zh),
            Simil = 0.0
        )
    ),
    Simil > 0.5 .
When run, it prints:
Thread 1 (main): foreign predicate downcase_atom/2 did not clear exception:
error(representation_error(code_point),context(system:downcase_atom/2,_4458584))
error(instantiation_error,context(isub: $isub/5,_4458638))
〥 〥 [wu3]
Thread 1 (main): foreign predicate downcase_atom/2 did not clear exception:
error(representation_error(code_point),context(system:downcase_atom/2,_4466104))
error(instantiation_error,context(isub: $isub/5,_4466158))
〤 〤 [si4]
I suppose the issue might be in isub, since downcase_atom works fine on one of those lines:
?- downcase_atom('〥 〥 [wu3] /5 (Num) (im Suzhou Zahlenystem 蘇州碼子|苏州码子[su1 zhou1 ma3 zi5])/',A).
A = '〥 〥 [wu3] /5 (num) (im suzhou zahlenystem 蘇州碼子|苏州码子[su1 zhou1 ma3 zi5])/'.
echo '〥 〥 [wu3] /5 (Num) (im Suzhou Zahlenystem 蘇州碼子|苏州码子[su1 zhou1 ma3 zi5])/'|hexdump -C
00000000 e3 80 a5 20 e3 80 a5 20 5b 77 75 33 5d 20 2f 35 |... ... [wu3] /5|
00000010 20 28 4e 75 6d 29 20 28 69 6d 20 53 75 7a 68 6f | (Num) (im Suzho|
00000020 75 20 5a 61 68 6c 65 6e 79 73 74 65 6d 20 e8 98 |u Zahlenystem ..|
00000030 87 e5 b7 9e e7 a2 bc e5 ad 90 7c e8 8b 8f e5 b7 |..........|.....|
00000040 9e e7 a0 81 e5 ad 90 5b 73 75 31 20 7a 68 6f 75 |.......[su1 zhou|
00000050 31 20 6d 61 33 20 7a 69 35 5d 29 2f 0a |1 ma3 zi5])/.|
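To narrow down whether downcase_atom/2 or isub is the one that actually fails, a small diagnostic along these lines might help. It is only a sketch: bad_codes/2 and downcase_fails/1 are hypothetical helper names that are not part of my code, and it assumes the error can be reproduced on single characters, which may not be the case.

```prolog
:- use_module(library(apply)).   % include/3

% Hypothetical helper: Bad is the list of character codes in Atom for
% which downcase_atom/2 raises an exception when called on that single
% character.
bad_codes(Atom, Bad) :-
    atom_codes(Atom, Codes),
    include(downcase_fails, Codes, Bad).

% Succeeds only if downcase_atom/2 throws for the one-character atom
% with the given code; if the call completes normally, the inner
% goal fails and so does this predicate.
downcase_fails(Code) :-
    catch(( char_code(Char, Code),
            downcase_atom(Char, _),
            fail
          ),
          _Error,
          true).
```

It could be called on both arguments of kmp_isub/3, for example `?- bad_codes(T1, Bad).` with T1 bound to the other atom being compared, since only the T2 side is printed in the output above.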