Predsort variants

Hi All! I’m trying to use predsort/3 to sort a list and remove variant duplicates. The code si like this:

order_variants(=, A, B) :-
  A =@= B.
order_variants(Order, A, B) :-
  A \=@= B,   % i know, that this is not optimal
  compare(Order, A, B).

sorted_sorted(Sorted_1, Sorted_2) :-
  List_1 = [
    key(A)-value(A),
    key(B)-different_value(B),
    key(C)-value(C)
  ],
  List_2 = [
    key(P)-value(P),
    key(Q)-value(Q),
    key(R)-different_value(R)
  ],
  predsort(order_variants, List_1, Sorted_1),
  predsort(order_variants, List_2, Sorted_2).

Even though the List_1 and List_2 should be (variantly) the same, only with changed order of the elements, predsort/3 does not remove the duplicate key()-value() in the Sorted_1, but does it in the Sorted_2.

sorted_sorted(Sorted_1 ,Sorted_2).
Sorted_1 = [key(_19434)-value(_19434), key(_19454)-different_value(_19454), key(_19474)-value(_19474)],
Sorted_2 = [key(_19494)-value(_19494), key(_19534)-different_value(_19534)].

All the other checks confirm, that those two elements are considered as variants and there are no other solutions:

?- order_variants(Order, key(A)-value(A), key(C)-value(C)).
Order =  (=) ;
false.

?- key(A)-value(A) =@= key(C)-value(C).
true.

Could somebody please explain why? Thanks.

Use ?- trace(order_variants). Sorting relies on the comparison operator to be transitive, In this case it compares elements 1 and 2, where (1) is smaller as the A is introduced earlier and thus an older variable. Then it compares 2 and 3, where 2 is smaller. So, 1 < 3 holds based on these two comparisons. 1 is never directly compared to 3.

The problem is a bit harder than I expected. I though you could sort and next find the variants adjacent, so you can remove them in a linear scan. That however falls into the same trap. You’d need a comparison operator that can tell you what triggered the inequality, and notably whether or not that was a variable.

A correct way is to use keys that compare equal on variants:

variant_sort(Terms, Sorted) :-
    map_list_to_pairs(variant_key, Terms, Keyed),
    sort(1, @<, Keyed, SortedPairs),
    pairs_values(SortedPairs, Sorted).

variant_key(Term, Key) :-
    copy_term(Term, Key),
    numbervars(Key, 0, _).

You can also use variant_sha1/2 to create the keys if you trust a secure hash, i.e., theoretically you may loose non-variants due to hash collisions.

1 Like

Not in the library, but it is fairly easy to adjust library predicate using the key based approach to create a sort/2 for variants. Possibly these make good additions to library(terms). Would be even better if there is prior art we can copy (notably the predicate name and the library name :slight_smile: ).

You could use variant_key/2 to define a comparator and then use predsort/3:

variant_compare(C,X,Y) :-
    variant_key(X,K), variant_key(Y,J), compare(C,K,J).

variant_key(Term, Key) :-
    copy_term(Term, Key),
    numbervars(Key, 0, _).

Maybe not the fastest compared to the keying and unkeying method:

?- time((between(1, 100000, _),
   predsort(variant_compare, [3,f(X),2,f(Y),1], _), fail; true)).
% 10,799,998 inferences, 1.000 CPU in 0.980 seconds (102% CPU, 10799998 Lips)
true.

?- time((between(1, 100000, _),
   variant_sort([3,f(X),2,f(Y),1], _), fail; true)).
% 3,499,998 inferences, 0.297 CPU in 0.301 seconds (98% CPU, 11789467 Lips)
true.

It is hard to beat the built-ins sort/2, keysort/2, msort/2 or sort/4. Besides, there are N*log(N) comparisons to be expected while you need two variant_key/2 calls for each. Space-wise it is less clear. you don’t need to carry the keys around using predsort/3, but predsort/3 itself uses more internal lists. sort/2 and friends create a new list as a copy of the original and do the work in-place.

The use case of a variant based list_to_set/2 would be a reified distinct/1.
You sure there is not somewhere a variant based list_to_set/2 already
in SWI-Prolog? Currently list_to_set/2 cannot reify distinct/1,

since list_to_set/2 is not variant based. Different results:

p(f(_)).
p(f(_)).

?- distinct(p(X)).
X = f(_) ;
false.

?- findall(X, p(X), L), list_to_set(L, R).
L = R, R = [f(_), f(_)].

Also observe that table/1 directive causes a distinct/1, i.e. a variant
based list_to_set/2 more or less, but its also not reified. You can
only access it tuple by tuple? Unless you wrap it with findall/3:

:- table q/1.
q(f(_)).
q(f(_)).

?- q(X).
X = f(_).

?- findall(X, q(X), L).
L = [f(_)].

Pretty much :slight_smile:

Why not use this is you want a list of distinct (variant) answers?

?- findall(X, distinct(p(X)), L).

I do see value in these variant-based predicates. I’m mostly interested in existing practice that may give names and library names.

It has the advantage over list_to_set2/2 that it preserves the input
order. The input order is not preserved through nb_set_to_list/2,
since nb_set only realizes a hash table, but not a linked hash table

or something similar that preserves input order. So I get these results:

?- list_to_set2([3,f(X),2,f(Y),1], Z).
Z = [1, 2, 3, f(_)].

?- findall(T, distinct(T,member(T,[3,f(X),2,f(Y),1])), L).
L = [3, f(_), 2, 1].

Now I thought nb_set_to_list/2 has better performance. But this is not the case:

?- time((between(1, 100000, _),
      list_to_set2([3,f(X),2,f(Y),1], _), fail; true)).
% 15,353,440 inferences, 1.250 CPU in 1.247 seconds (100% CPU, 12282752 Lips)
true.

?- time((between(1, 100000, _), findall(T, 
    distinct(T,member(T,[3,f(X),2,f(Y),1])), L), fail; true)).
% 8,600,000 inferences, 0.719 CPU in 0.730 seconds (98% CPU, 11965217 Lips)
true.

I guess some slow down comes from an additional sort/2 in nb_set_to_list/2:

nb_set_to_list(nb_set(Buckets, _Size), OrdSet) :-
    compound_name_arguments(Buckets, _, Args),
    append(Args, List),
    sort(List, OrdSet).