If Normalize is true, isub/4 applies string normalization as implemented by the original authors: Text1 and Text2 are mapped to lowercase and the characters ".", "_" and " " (space) are removed. Lowercase mapping is done with the C-library function towlower(). In general, the required normalization is domain-dependent and is better left to the caller.
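Here is a minimal Prolog sketch of that normalization step. isub_normalize/2 is a hypothetical helper, not part of library(isub). It uses string_lower/2 instead of calling towlower() on each character, so the two may disagree on some non-ASCII input.

```prolog
:- use_module(library(apply)).

%!  isub_normalize(+Text, -Normalized:string) is det.
%
%   Hypothetical helper (not part of library(isub)) that mimics the
%   documented normalization: map to lowercase, then remove '.', '_'
%   and ' '. string_lower/2 stands in for the per-character towlower().
isub_normalize(Text, Normalized) :-
    string_lower(Text, Lower),
    string_chars(Lower, Chars0),
    exclude(is_ignored_char, Chars0, Chars),
    string_chars(Normalized, Chars).

% Characters dropped by the normalization.
is_ignored_char(Char) :-
    memberchk(Char, ['.', '_', ' ']).
```

For example, isub_normalize('BANANA', N) gives N = "banana", and ananas comes back unchanged. The two strings share characters only after this step.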
With Normalize set to true, isub/4 “normalizes” the first and second arguments before comparing them. Here is an example where this makes a difference:
?- isub('BANANA', ananas, false, D).
D = 0.0.
?- isub('BANANA', ananas, true, D).
D = 0.8974358974358975.
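Both values of D above can be reproduced by hand under some assumptions. Assume isub/4 implements the I-Sub measure of Stoilos et al. (2005) with difference parameter p = 0.6, and that it rescales the raw score from [-1, 1] to [0, 1] as (sim + 1)/2. The quoted docs do not state these constants.

```latex
\begin{aligned}
&\textbf{Normalize = true:}\ s_1 = \texttt{banana},\ s_2 = \texttt{ananas},\ \text{longest common substring } \texttt{anana}\ (5 \text{ chars})\\
&\mathrm{comm} = \frac{2 \cdot 5}{6 + 6} = \frac{5}{6}, \qquad u_1 = u_2 = \frac{6 - 5}{6} = \frac{1}{6}\\
&\mathrm{dif} = \frac{u_1 u_2}{p + (1 - p)(u_1 + u_2 - u_1 u_2)} = \frac{1/36}{0.6 + 0.4 \cdot 11/36} = \frac{1}{26}\\
&\mathrm{winkler} = 0 \quad (\text{no common prefix})\\
&\mathrm{sim} = \mathrm{comm} - \mathrm{dif} + \mathrm{winkler} = \frac{5}{6} - \frac{1}{26} = \frac{31}{39}, \qquad D = \frac{\mathrm{sim} + 1}{2} = \frac{35}{39} \approx 0.8974358974\\[4pt]
&\textbf{Normalize = false:}\ \texttt{BANANA} \text{ and } \texttt{ananas} \text{ share no character, so } \mathrm{comm} = 0,\ u_1 = u_2 = 1\\
&\mathrm{dif} = \frac{1}{0.6 + 0.4 \cdot 1} = 1, \qquad \mathrm{sim} = -1, \qquad D = \frac{-1 + 1}{2} = 0
\end{aligned}
```

With these assumed constants, both printed values come out exactly. That suggests, but does not prove, that they match the C implementation.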
I don’t know if D is “the normalized Levenshtein distance”. From the docs:
… a similarity measure between strings, i.e., something similar to the Levenshtein distance. This method is based on the length of common substrings.
I didn’t read the C source too carefully, but you will have to if you want to know exactly what is being calculated.
It is not computing the Levenshtein distance. See “A string metric for ontology alignment” by Giorgos Stoilos et al., 2005. As far as I recall, they propose a measure that they claim works better than existing string metrics for ontology alignment. It works best for single-word names, identifiers, etc. Still, these distance measures are fully domain-independent. They do not take into consideration that some letters (or letter sequences) are considered very close to others, and which ones count as close is language-dependent.
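To show what such a common-substring measure computes, here is a brute-force Prolog sketch of I-Sub as I understand the paper. isub_sketch/4 and its helpers are hypothetical names, not part of library(isub). Several details are assumptions: substrings must be longer than 2 characters to count, p = 0.6, the Winkler prefix is capped at 4 and taken from the compared strings, and the raw score is rescaled as (Raw+1)/2. Tie-breaking between equally long substrings may also differ from the C code.

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).
:- use_module(library(aggregate)).

%!  isub_sketch(+Text1, +Text2, +Normalize, -Similarity) is det.
%
%   Hypothetical re-implementation of I-Sub (Stoilos et al., 2005).
%   Assumed constants, not checked against the isub C source: common
%   substrings count only if longer than 2 chars, p = 0.6, Winkler
%   prefix capped at 4, raw score in [-1,1] rescaled to [0,1].
isub_sketch(Text1, Text2, Normalize, Similarity) :-
    prepare(Normalize, Text1, S1),
    prepare(Normalize, Text2, S2),
    raw_isub(S1, S2, Raw),
    Similarity is (Raw + 1) / 2.

% Optional normalization: lowercase, then drop '.', '_' and ' '.
prepare(true, Text, Chars) :- !,
    string_lower(Text, Lower),
    string_chars(Lower, Chars0),
    exclude(ignored_char, Chars0, Chars).
prepare(false, Text, Chars) :-
    string_chars(Text, Chars).

ignored_char(Char) :-
    memberchk(Char, ['.', '_', ' ']).

raw_isub([], [], 1.0) :- !.
raw_isub(S1, S2, -1.0) :-
    ( S1 == [] ; S2 == [] ), !.
raw_isub(S1, S2, Raw) :-
    length(S1, L1),
    length(S2, L2),
    common_length(S1, S2, 0, Common),
    Comm is 2 * Common / (L1 + L2),            % commonality term
    U1 is (L1 - Common) / L1,                  % unmatched fractions
    U2 is (L2 - Common) / L2,
    Sum is U1 + U2,
    Prod is U1 * U2,
    (   Sum - Prod =:= 0
    ->  Diff = 0
    ;   Diff is Prod / (0.6 + 0.4 * (Sum - Prod))   % difference term, p = 0.6
    ),
    common_prefix_length(S1, S2, 0, Prefix0),
    Prefix is min(4, Prefix0),
    Winkler is Prefix * 0.1 * (1 - Comm),      % prefix bonus
    Raw is Comm - Diff + Winkler.

% Repeatedly remove the longest common substring, summing its
% length while it is longer than 2 characters.
common_length(S1, S2, Acc, Common) :-
    (   longest_common_substring(S1, S2, Len, R1, R2),
        Len > 2
    ->  Acc1 is Acc + Len,
        common_length(R1, R2, Acc1, Common)
    ;   Common = Acc
    ).

% Brute force over all substrings; fails if no character is shared.
% R1/R2 are the inputs with the chosen substring cut out.
longest_common_substring(S1, S2, Len, R1, R2) :-
    aggregate_all(max(L, r(Bf1, Af1, Bf2, Af2)),
                  common_substring(S1, S2, L, Bf1, Af1, Bf2, Af2),
                  max(Len, r(B1, A1, B2, A2))),
    append(B1, A1, R1),
    append(B2, A2, R2).

common_substring(S1, S2, L, B1, A1, B2, A2) :-
    append(B1, Rest1, S1),
    append(Sub, A1, Rest1),
    Sub = [_|_],
    append(B2, Rest2, S2),
    append(Sub, A2, Rest2),
    length(Sub, L).

common_prefix_length([X|Xs], [X|Ys], N0, N) :- !,
    N1 is N0 + 1,
    common_prefix_length(Xs, Ys, N1, N).
common_prefix_length(_, _, N, N).
```

Under these assumptions, isub_sketch('BANANA', ananas, false, D) gives D = 0.0 and isub_sketch('BANANA', ananas, true, D) gives D ≈ 0.8974, which matches the isub/4 results earlier in the thread. The exhaustive substring search is only suitable for short names and identifiers.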
Thank you, Jan and Boris: I now understand that “normalization” refers to the strings and not to the Levenshtein distance. May I suggest that the documentation of isub/4 refer to the Stoilos paper explicitly? Thanks again for your help!