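And just to add more confusion, code point representations aren't necessarily unique: the same text can be stored as different sequences of code points.
For example, try this query, which uses the normalization predicates from SWI-Prolog's library(unicode). Depending on your terminal, you may be able to see that InNfc and InNfd differ: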
?- In = "İzmir", % https://en.wikipedia.org/wiki/%C4%B0zmir
   string_codes(In, Codes),
   unicode_nfc( In, InNfc), string_codes(InNfc, CodesNfc),
   unicode_nfkc(In, InNfkc), string_codes(InNfkc, CodesNfkc),
   unicode_nfd( In, InNfd), string_codes(InNfd, CodesNfd),
   unicode_nfkd(In, InNfkd), string_codes(InNfkd, CodesNfkd).
In = "İzmir",
Codes = CodesNfc, CodesNfc = CodesNfkc, CodesNfkc = [304, 122, 109, 105, 114],
InNfc = InNfkc, InNfkc = 'İzmir',
InNfd = InNfkd, InNfkd = 'İzmir',
CodesNfd = CodesNfkd, CodesNfkd = [73, 775, 122, 109, 105, 114].
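To cross-check the Prolog output, here is a minimal Python sketch of the same experiment using the standard library's unicodedata.normalize. The helper names code_points and canonically_equal are my own and purely illustrative. The second helper shows the usual workaround: compare strings only after normalizing both to the same form. NFC is chosen here by assumption, and NFD would work equally well for canonical equivalence.

```python
import unicodedata

FORMS = ('NFC', 'NFKC', 'NFD', 'NFKD')
S = '\u0130zmir'  # "İzmir": U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE + "zmir"

def code_points(s, form):
    """Hypothetical helper: code points of s after applying a normalization form."""
    return [ord(c) for c in unicodedata.normalize(form, s)]

def canonically_equal(a, b):
    """Hypothetical helper: treat a and b as equal if their NFC forms match."""
    return unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)

for form in FORMS:
    print(form, code_points(S, form))
# NFC  -> [304, 122, 109, 105, 114]
# NFKC -> [304, 122, 109, 105, 114]
# NFD  -> [73, 775, 122, 109, 105, 114]
# NFKD -> [73, 775, 122, 109, 105, 114]

nfc = unicodedata.normalize('NFC', S)
nfd = unicodedata.normalize('NFD', S)
print(nfc == nfd)                    # False: different code point sequences
print(canonically_equal(nfc, nfd))   # True: the same text once normalized
```

With this, the NFC and NFD versions of "İzmir" compare unequal with ==, but canonically_equal() treats them as the same text.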
This comes from the following Python test case, which illustrates even more Unicode madness by applying the lower() operation (I was too lazy to do the equivalent in the Prolog example):
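import unicodedata

base_s = 'İzmir'
# Try the string as written, then in each of the four normalization forms.
for form, s in [('(unnormalized)', base_s)] + [
        (form, unicodedata.normalize(form, base_s))
        for form in ('NFC', 'NFKC', 'NFD', 'NFKD')]:
    # Lowercase one character at a time...
    print(form, [c.lower() for c in s])
    # ...versus lowercasing the whole string, then splitting it into characters.
    print(form, [c for c in s.lower()])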
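Here is a rough sketch of what that test case exposes. It prints code points instead of glyphs, so the result doesn't depend on your terminal. The helper lower_report is hypothetical and only for illustration. The expected output in the comments assumes Python 3's str.lower(), which applies Unicode's full case mappings, so a single character can lowercase to more than one code point.

```python
import unicodedata

BASE = '\u0130zmir'  # "İzmir", starting with U+0130 (capital I with dot above)

def lower_report(s):
    """Hypothetical helper: (code points of s,
    per-character lowercased pieces, code points of s.lower())."""
    return ([ord(c) for c in s],
            [c.lower() for c in s],
            [ord(c) for c in s.lower()])

for form in ('NFC', 'NFD'):
    codes, pieces, lowered = lower_report(unicodedata.normalize(form, BASE))
    # Middle list: how many code points each input character lowercases to.
    print(form, codes, [len(p) for p in pieces], lowered)
# NFC [304, 122, 109, 105, 114] [2, 1, 1, 1, 1] [105, 775, 122, 109, 105, 114]
# NFD [73, 775, 122, 109, 105, 114] [1, 1, 1, 1, 1, 1] [105, 775, 122, 109, 105, 114]
```

Lowercasing the NFC string grows it from 5 to 6 code points, because U+0130 lowercases to "i" followed by U+0307 COMBINING DOT ABOVE. After lowercasing, the NFC and NFD variants end up identical, even though they weren't before.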