I’m using: SWI-Prolog (threaded, 64 bits, version 8.1.15)

I want the code to: calculate the normalized Levenshtein Distance as implemented in isub/4.

My code looks like this:

?- isub('Andesite', 'Enderbite', false, D).
D = 0.8008578431372549.

?- isub('Andesite', 'Enderbite', true, D).
D = 0.8008578431372549.

I do not understand the normalization, nor the role of the third parameter. Can anyone please tell me what is going on here?


In the docs:

If Normalize is true, isub/4 applies string normalization as implemented by the original authors: Text1 and Text2 are mapped to lowercase and the characters "._ " are removed. Lowercase mapping is done with the C-library function towlower(). In general, the required normalization is domain dependent and is better left to the caller.

It “normalizes” the first and second argument before comparing. Here is an example where it makes a difference:

?- isub('BANANA', ananas, false, D).
D = 0.0.

?- isub('BANANA', ananas, true, D).
D = 0.8974358974358975.
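
To make the quoted description concrete, here is a minimal sketch of doing that normalization yourself and then calling isub/4 with Normalize set to false. The predicate names normalize_text/2, dropped_char/1 and isub_manual/3 are made up for illustration, and the sketch only follows the wording of the docs (lowercase, then drop ".", "_" and space); I have not checked it against the C source, so the result is not guaranteed to match Normalize = true exactly. The argument order of isub/4 is the one used in this thread (8.1.15) and may differ in other releases.

```prolog
:- use_module(library(isub)).

% Characters that the docs say are removed during normalization.
dropped_char('.').
dropped_char('_').
dropped_char(' ').

% normalize_text(+Text, -Normalized)
% Hypothetical helper: lowercase Text and drop the characters above.
normalize_text(Text, Normalized) :-
    downcase_atom(Text, Lower),
    atom_chars(Lower, Chars),
    exclude(dropped_char, Chars, Kept),
    atom_chars(Normalized, Kept).

% isub_manual(+Text1, +Text2, -Similarity)
% Hypothetical wrapper: normalize in Prolog, then compare without
% the built-in normalization.
isub_manual(Text1, Text2, Similarity) :-
    normalize_text(Text1, N1),
    normalize_text(Text2, N2),
    isub(N1, N2, false, Similarity).
```

Doing it this way also makes it easy to swap in a normalization that fits your own domain, which is what the docs recommend.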

I don’t know if D is “the normalized Levenshtein distance”. From the docs:

… a similarity measure between strings, i.e., something similar to the Levenshtein distance. This method is based on the length of common substrings.

I didn’t read the C source carefully, but you will have to if you want to know exactly what is being calculated.

It is not computing the Levenshtein distance. See "A string metric for ontology alignment" by Giorgos Stoilos, 2005. As far as I recall, they do something that they claim works better. It works best for single-word names, identifiers, etc. Still, these distance measures are fully domain independent and do not take into consideration that some letters (or letter sequences) are considered very close to others, which is language dependent.
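
For comparison, if what you actually want is a normalized Levenshtein distance, it can be sketched in plain Prolog as below. This is not what isub/4 computes and it is not part of library(isub); the predicate names are made up for illustration. Dividing by the length of the longer text is one common normalization choice, assumed here, and other conventions exist.

```prolog
% levenshtein(+Text1, +Text2, -Distance)
% Classic dynamic programming, keeping one row of the table at a time.
levenshtein(Text1, Text2, Distance) :-
    atom_chars(Text1, Cs1),
    atom_chars(Text2, Cs2),
    length(Cs2, N),
    numlist(0, N, Row0),              % distances from "" to prefixes of Text2
    foldl(next_row(Cs2), Cs1, Row0, Row),
    last(Row, Distance).

% next_row(+Cs2, +C1, +PrevRow, -Row): add one character of Text1.
next_row(Cs2, C1, [Diag|Prev], [Left|Row]) :-
    Left is Diag + 1,                 % first column: delete C1
    fill_row(Cs2, C1, Diag, Prev, Left, Row).

% fill_row(+Cs2, +C1, +Diag, +PrevRest, +Left, -RowRest)
fill_row([], _, _, [], _, []).
fill_row([C2|Cs2], C1, Diag, [Up|Prev], Left, [Cell|Row]) :-
    (   C1 == C2 -> Cost = 0 ; Cost = 1 ),
    Cell is min(min(Up + 1, Left + 1), Diag + Cost),
    fill_row(Cs2, C1, Up, Prev, Cell, Row).

% normalized_levenshtein(+Text1, +Text2, -Normalized)
% Assumption: normalize by the length of the longer text, giving 0..1.
normalized_levenshtein(Text1, Text2, Normalized) :-
    levenshtein(Text1, Text2, D),
    atom_length(Text1, L1),
    atom_length(Text2, L2),
    M is max(L1, L2),
    (   M =:= 0
    ->  Normalized = 0.0
    ;   Normalized is D / M
    ).
```

Note that this is a distance (0 means identical), whereas the value returned by isub/4 is a similarity, so the two numbers are not directly comparable.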

Thank you Jan and Boris: I now understand that “normalization” refers to the strings and not to the Levenshtein distance. May I suggest that the documentation of isub/4 refer to the Stoilos paper explicitly? Thanks again for your help!

I think it is there already. If you open the section docs, you should see it at the very top:

Giorgos Stoilos
See also
A string metric for ontology alignment by Giorgos Stoilos, 2005.

Thanks Boris for pointing this out to me. I was looking at this page, and it was not immediately obvious to me that I had to expand the link on the left to see that reference.