# Predsort variants

Hi All! I’m trying to use predsort/3 to sort a list and remove variant duplicates. The code si like this:

``````order_variants(=, A, B) :-
A =@= B.
order_variants(Order, A, B) :-
A \=@= B,   % i know, that this is not optimal
compare(Order, A, B).

sorted_sorted(Sorted_1, Sorted_2) :-
List_1 = [
key(A)-value(A),
key(B)-different_value(B),
key(C)-value(C)
],
List_2 = [
key(P)-value(P),
key(Q)-value(Q),
key(R)-different_value(R)
],
predsort(order_variants, List_1, Sorted_1),
predsort(order_variants, List_2, Sorted_2).
``````

Even though the List_1 and List_2 should be (variantly) the same, only with changed order of the elements, predsort/3 does not remove the duplicate key()-value() in the Sorted_1, but does it in the Sorted_2.

``````sorted_sorted(Sorted_1 ,Sorted_2).
Sorted_1 = [key(_19434)-value(_19434), key(_19454)-different_value(_19454), key(_19474)-value(_19474)],
Sorted_2 = [key(_19494)-value(_19494), key(_19534)-different_value(_19534)].
``````

All the other checks confirm, that those two elements are considered as variants and there are no other solutions:

``````?- order_variants(Order, key(A)-value(A), key(C)-value(C)).
Order =  (=) ;
false.

?- key(A)-value(A) =@= key(C)-value(C).
true.
``````

Could somebody please explain why? Thanks.

Same problem here, but not for sort, but for list_to_set/2. Is there
some built-in around that gives a variant reduced list_to_set/2?

``````?- list_to_set([p(X),p(Y)], L).
L = [p(X), p(Y)].
``````

We would get:

``````?- list_to_set([p(X),p(Y)], L).
L = [p(X)].
``````

Use `?- trace(order_variants).` Sorting relies on the comparison operator to be transitive, In this case it compares elements 1 and 2, where (1) is smaller as the A is introduced earlier and thus an older variable. Then it compares 2 and 3, where 2 is smaller. So, 1 < 3 holds based on these two comparisons. 1 is never directly compared to 3.

The problem is a bit harder than I expected. I though you could sort and next find the variants adjacent, so you can remove them in a linear scan. That however falls into the same trap. You’d need a comparison operator that can tell you what triggered the inequality, and notably whether or not that was a variable.

A correct way is to use keys that compare equal on variants:

``````variant_sort(Terms, Sorted) :-
map_list_to_pairs(variant_key, Terms, Keyed),
sort(1, @<, Keyed, SortedPairs),
pairs_values(SortedPairs, Sorted).

variant_key(Term, Key) :-
copy_term(Term, Key),
numbervars(Key, 0, _).
``````

You can also use variant_sha1/2 to create the keys if you trust a secure hash, i.e., theoretically you may loose non-variants due to hash collisions.

1 Like

Not in the library, but it is fairly easy to adjust library predicate using the key based approach to create a sort/2 for variants. Possibly these make good additions to library(terms). Would be even better if there is prior art we can copy (notably the predicate name and the library name ).

You could use `variant_key/2` to define a comparator and then use predsort/3:

``````variant_compare(C,X,Y) :-
variant_key(X,K), variant_key(Y,J), compare(C,K,J).

variant_key(Term, Key) :-
copy_term(Term, Key),
numbervars(Key, 0, _).
``````

Maybe not the fastest compared to the keying and unkeying method:

``````?- time((between(1, 100000, _),
predsort(variant_compare, [3,f(X),2,f(Y),1], _), fail; true)).
% 10,799,998 inferences, 1.000 CPU in 0.980 seconds (102% CPU, 10799998 Lips)
true.

?- time((between(1, 100000, _),
variant_sort([3,f(X),2,f(Y),1], _), fail; true)).
% 3,499,998 inferences, 0.297 CPU in 0.301 seconds (98% CPU, 11789467 Lips)
true.
``````

It is hard to beat the built-ins sort/2, keysort/2, msort/2 or sort/4. Besides, there are N*log(N) comparisons to be expected while you need two variant_key/2 calls for each. Space-wise it is less clear. you don’t need to carry the keys around using predsort/3, but predsort/3 itself uses more internal lists. sort/2 and friends create a new list as a copy of the original and do the work in-place.

The use case of a variant based list_to_set/2 would be a reified distinct/1.
You sure there is not somewhere a variant based list_to_set/2 already
in SWI-Prolog? Currently list_to_set/2 cannot reify distinct/1,

since list_to_set/2 is not variant based. Different results:

``````p(f(_)).
p(f(_)).

?- distinct(p(X)).
X = f(_) ;
false.

?- findall(X, p(X), L), list_to_set(L, R).
L = R, R = [f(_), f(_)].
``````

Also observe that table/1 directive causes a distinct/1, i.e. a variant
based list_to_set/2 more or less, but its also not reified. You can
only access it tuple by tuple? Unless you wrap it with findall/3:

``````:- table q/1.
q(f(_)).
q(f(_)).

?- q(X).
X = f(_).

?- findall(X, q(X), L).
L = [f(_)].
``````

Ok I gave it a try as follows:

``````list_to_set2(L, R) :-
empty_nb_set(S),
nb_set_to_list(S, R).
``````

Seems to work:

``````?- list_to_set2([f(_), f(_)], R).
R = [f(_)].
``````

But the documentation of add_nb_set/2 is outdated, it says:

If Key is not in Set and New is unified to `true`

Whereas inside the source code I read, more specific that variant is used:

Insert Key into the set. If a variant (see =@=/2) of Key is already in the set …

Edit 14.05.2023
What about the performance of `list_to_set2/2`. Here some results.
`list_to_set2/2` will not sort, but nevertheless deduplicate. Which might
be the desired functionality:

``````?- time((between(1, 100000, _),
list_to_set2([3,f(X),2,f(Y),1], _), fail; true)).
% 15,353,440 inferences, 1.250 CPU in 1.247 seconds (100% CPU, 12282752 Lips)
true.

?- time((between(1, 100000, _),
predsort(variant_compare, [3,f(X),2,f(Y),1], _), fail; true)).
% 10,799,998 inferences, 1.000 CPU in 0.980 seconds (102% CPU, 10799998 Lips)
true.

?- time((between(1, 100000, _),
variant_sort([3,f(X),2,f(Y),1], _), fail; true)).
% 3,499,998 inferences, 0.297 CPU in 0.301 seconds (98% CPU, 11789467 Lips)
true.
``````

Ok, was hoping that it excels. But even slower than predsort/3 !

Pretty much

Why not use this is you want a list of distinct (variant) answers?

``````?- findall(X, distinct(p(X)), L).
``````

I do see value in these variant-based predicates. I’m mostly interested in existing practice that may give names and library names.

It has the advantage over list_to_set2/2 that it preserves the input
order. The input order is not preserved through nb_set_to_list/2,
since nb_set only realizes a hash table, but not a linked hash table

or something similar that preserves input order. So I get these results:

``````?- list_to_set2([3,f(X),2,f(Y),1], Z).
Z = [1, 2, 3, f(_)].

?- findall(T, distinct(T,member(T,[3,f(X),2,f(Y),1])), L).
L = [3, f(_), 2, 1].
``````

Now I thought nb_set_to_list/2 has better performance. But this is not the case:

``````?- time((between(1, 100000, _),
list_to_set2([3,f(X),2,f(Y),1], _), fail; true)).
% 15,353,440 inferences, 1.250 CPU in 1.247 seconds (100% CPU, 12282752 Lips)
true.

?- time((between(1, 100000, _), findall(T,
distinct(T,member(T,[3,f(X),2,f(Y),1])), L), fail; true)).
% 8,600,000 inferences, 0.719 CPU in 0.730 seconds (98% CPU, 11965217 Lips)
true.
``````

I guess some slow down comes from an additional sort/2 in nb_set_to_list/2:

``````nb_set_to_list(nb_set(Buckets, _Size), OrdSet) :-
compound_name_arguments(Buckets, _, Args),
append(Args, List),
sort(List, OrdSet).
``````