Predsorting a List and retaining All Elements with Equal Keys

Hello,

I would like to sort a list of key-element pairs, using predsort/3. I have defined a custom comparator that I pass to predsort/3; but in the resulting list, only the first element of the “equal” elements is retained; the others are dismissed.

This is a list that I want to sort according to the keys which are floating point numbers:
List = [4.001512214151045-[a,m],4.0-[b,m],3.9962607522532854-[c,m],3.996210705155572-[d,m],7.996561761157103-[a,c],2.4278591392418134-[c,e],5.568707210834485-[a,e],7.35185010728592-[b,d],3.6017356926904003-[d,e],3.7501199980800615-[b,e],4.459831835394693-[b,c],2.2271281956816043-[c,n12],2.2327113561766105-[b,n12],5.2559014450425146-[n11,n12],2.232576986354558-[e,n12],3.0233259830855155-[e,n11],6.635209115016647-[a,d],1.9580091930325554-[d,n11],4.6772000171042505-[a,n11],6.7180428697649734-[a,b],4.346147719532781-[c,d]]

This is the call to the sorting routine:
predsort(compare_with_tolerance, List, Sorted).

This is the sorted list:
Sorted = [2.4278591392418134-[c,e],3.0233259830855155-[e,n11],4.001512214151045-[a,m],5.568707210834485-[a,e],6.635209115016647-[a,d],7.35185010728592-[b,d],7.996561761157103-[a,c]]

This is the custom comparator which basically considers keys as equal as long as they do not differ from each other by more than a given threshold.

compare_with_tolerance(=, Key1-_, Key2-_):-
		Tolerance = 0.5,
		Diff is abs(Key1-Key2),
		Diff =< Tolerance.
		
compare_with_tolerance(<, Key1-_, Key2-_):-
		Tolerance = 0.5,
		Diff1 is Key1-Key2,
		Diff is abs(Diff1),
		Diff > Tolerance,
		Diff1 < 0.		
		
compare_with_tolerance(>, Key1-_, Key2-_):-
		Tolerance = 0.5,
		Diff1 is Key1-Key2,
		Diff is abs(Diff1),
		Diff > Tolerance,
		Diff1 > 0.		

How do I make the predsort/3 retain all elements with “tolerably-equal” keys?

Thanks.

I think I messed up big. This predicate is meant to remove duplicates, so I guess no fixing that.

The big puzzle for me is why you want a fuzzy order and keep duplicates. Your fuzzy comparison may (I think) claim A = B, A < B and A > B at the same time, which will confuse most sorting algorithms.

I would simply sort the stuff first using keysort/2 or sort/4 and then post process the sorted list, dealing with fuzziness. Can’t that work?

I am trying to rely solely on built-in predicates to produce the result I want. My main objective is to partition a list of key-element pairs into sets such that the keys of the elements of each set are equal within some margin of error. I was hoping that a built-in sorting algorithm would produce a list of the form [k1-e11, k1-e12,...,k2-e21,k2-e22,k2-e23,...] that I feed to group_pairs_by_key/2 to give me the partitioning I want i.e. a bunch of subsets e.g. a set of elements whose key is k1, another whose key is k2, etc. I just naively assumed that the sorting algorithm would pick the first occurring key as the representative key for a group of elements with almost-equal keys.

As for the fuzzy part, what is it I am doing wrong? Are there Key1, Key2 combinations that would make compare_with_tolerance/3 evaluate to true in all three clauses? Is this what you meant?

I thought that was the case, but it isn’t. The transitivity of = is lost though, which seems worrying. Sorting and a modified version of group_pairs_by_key/2 is probably the way out, but there we see the transitivity problem re-appears: if consecutive elements are within the tolerance, can we keep adding them to the same set? This may mean that the first and last element of a set are above the tolerance. If that is not allowed, we get an ambiguity, if [a,b,c] cannot be a group, we may have to choose between [a] and [b,c] or [a,b] and [c]. Of course, Prolog can handle non-determinism :slight_smile:

I don’t think you are going to get there using library and built-in only …

Thanks :slight_smile: I was afraid of that…

I believe the algorithm should act as follows to change [1-a1,1.2-a2,1.5-a3, 2.1-b, 3.2-c1,3.4-c2] to [1-[a1,a2,a3], 2.1-[b], 3.2-[c1,c2]]

group_pairs_by_key_with_tolerance(SortedPairsByKey, Result):-
    % Initially: SortedPairsByKey = [1-a1,1.2-a2,1.5-a3, 2.1-b, 3.2-c1,3.4-c2]
    % Initially: Result = []
    % Get tolerance T=0.5,
    % Read first element a1 of SortedPairsByKey which is first element of first subset , now Result=[1-[a1]]
    % Read next element a2 of SortedPairsByKey, compare it to previous element a1, and if difference is within acceptable margin, attach it to a1, Result=[1-[a1,a2]]
    % Read next element a3 of SortedPairsByKey, compare it to previous element a2, and if difference is within acceptable margin, attach it to a2, Result=[1-[a1,a2,a3]]
    % Read next element b of SortedPairsByKey, compare it to previous element a3, and if difference is within acceptable margin, attach it to a3, in this case it is not, so Result=[1-[a1,a2,a3], 2.1-[b]]
	% Read next element c1 of SortedPairsByKey, compare it to previous element b, and if difference is within acceptable margin, attach it to b, in this case it is not, so Result=[1-[a1,a2,a3], 2.1-[b], 3.2-[c1]]
	% Read next element c2 of SortedPairsByKey, compare it to previous element c1, and if difference is within acceptable margin, attach it to c1, so Result=[1-[a1,a2,a3], 2.1-[b], 3.2-[c1,c2]]

How should I go about doing this? I mean constructing a list of sublists where the number of sublists is unknown apriori? I have never done this, and I cannot get my head around it.

Any head start will be very much appreciated.

What you describe sounds a lot like clustering. Start for example by trying out hierarchical clustering.

Your original question:

Use predsort/3 but do not remove equivalent elements

You can trick predsort/3 to not remove duplicates: just make sure your comparison never says two elements are equivalent.

?- [user].
|: my_compare(Order, X, Y) :- compare(Order0, X, Y), 
|:     ( Order0 == '=' -> Order = '<' ; Order0 = Order ).
|: ^D% user://1 compiled 0.03 sec, 1 clauses
true.

?- predsort(my_compare, [a,b,c,a,b,c], Sorted).
Sorted = [a, a, b, b, c, c].

I am not sure how this helps you with your problem though.

2 Likes