Isub/4

andrea.cortis · November 20, 2019, 9:05pm

I’m using: SWI-Prolog (threaded, 64 bits, version 8.1.15)

I want the code to: calculate the normalized Levenshtein Distance as implemented in isub/4.

My code looks like this:

?- isub('Andesite', 'Enderbite', false, D).
D = 0.8008578431372549.

?- isub('Andesite', 'Enderbite', true, D).
D = 0.8008578431372549.

I do not understand the normalization, nor the role of the third parameter. Can anyone please tell me what is going on here?

Thanks

Boris · November 21, 2019, 4:53am

In the docs:

If Normalize is true, isub/4 applies string normalization as implemented by the original authors: Text1 and Text2 are mapped to lowercase and the characters "._ " are removed. Lowercase mapping is done with the C-library function towlower(). In general, the required normalization is domain dependent and is better left to the caller.

It “normalizes” the first and second argument before comparing. Here is an example where it makes a difference:

?- isub('BANANA', ananas, false, D).
D = 0.0.

?- isub('BANANA', ananas, true, D).
D = 0.8974358974358975.
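Since the docs say normalization is better left to the caller, here is a rough sketch of doing that step by hand and then calling isub/4 with Normalize set to false. The helper name normalize_text/2 is made up for illustration and is not part of library(isub). It lowercases with string_lower/2, which may not agree with towlower() for every character, so treat it as an approximation of the built-in normalization rather than an exact copy.

```prolog
:- use_module(library(isub)).
:- use_module(library(apply)).
:- use_module(library(yall)).

% normalize_text(+Text, -Normalized)
% Hypothetical helper (not part of library(isub)): map Text to
% lowercase and drop the characters '.', '_' and ' ', then return
% the result as an atom.
normalize_text(Text, Normalized) :-
    text_to_string(Text, String0),
    string_lower(String0, String1),
    string_chars(String1, Chars0),
    exclude([C]>>memberchk(C, ['.', '_', ' ']), Chars0, Chars),
    atom_chars(Normalized, Chars).

% Example queries, using the isub/4 argument order shown in this thread:
%
% ?- normalize_text('B.A_N ANA', N).
% N = banana.
%
% ?- normalize_text('BANANA', A),
%    normalize_text(ananas, B),
%    isub(A, B, false, D).
```

Doing it this way makes it easy to swap in a different normalization (for example, also stripping hyphens or digits) when the domain calls for it.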

I don’t know if D is “the normalized Levenshtein distance”. From the docs:

… a similarity measure between strings, i.e., something similar to the Levenshtein distance. This method is based on the length of common substrings.

I didn’t read the C source too carefully, but you will have to if you want to know what is really being calculated.

jan · November 21, 2019, 7:56am

It is not computing the Levenshtein distance. See "A string metric for ontology alignment" by Giorgos Stoilos, 2005. As far as I recall, they do something that they claim works better. It works best for single-word names, identifiers, etc. Still, these distance measures are fully domain independent and do not take into consideration that some letters (or letter sequences) are considered very close to others, and that this is language dependent.

andrea.cortis · November 21, 2019, 11:27am

Thank you Jan and Boris: I now understand that “normalization” refers to the strings and not to the Levenshtein distance. May I suggest that the documentation of isub/4 refer to the Stoilos paper explicitly? Thanks again for your help!

Boris · November 21, 2019, 12:02pm

I think it is there already. If you open the section docs, you should see it at the very top:

author
Giorgos Stoilos
See also
A string metric for ontology alignment by Giorgos Stoilos, 2005.

andrea.cortis · November 21, 2019, 12:42pm

Thanks Boris for pointing this out to me. I was looking at this page,

https://www.swi-prolog.org/pldoc/doc_for?object=isub/4

and it was not immediately obvious to me that I had to expand the link on the left to see that reference.
