Predsort variants

mjano314 · January 28, 2021, 1:20pm

Hi All! I’m trying to use predsort/3 to sort a list and remove variant duplicates. The code si like this:

order_variants(=, A, B) :-
  A =@= B.
order_variants(Order, A, B) :-
  A \=@= B,   % i know, that this is not optimal
  compare(Order, A, B).

sorted_sorted(Sorted_1, Sorted_2) :-
  List_1 = [
    key(A)-value(A),
    key(B)-different_value(B),
    key(C)-value(C)
  ],
  List_2 = [
    key(P)-value(P),
    key(Q)-value(Q),
    key(R)-different_value(R)
  ],
  predsort(order_variants, List_1, Sorted_1),
  predsort(order_variants, List_2, Sorted_2).

Even though the List_1 and List_2 should be (variantly) the same, only with changed order of the elements, predsort/3 does not remove the duplicate key()-value() in the Sorted_1, but does it in the Sorted_2.

sorted_sorted(Sorted_1 ,Sorted_2).
Sorted_1 = [key(_19434)-value(_19434), key(_19454)-different_value(_19454), key(_19474)-value(_19474)],
Sorted_2 = [key(_19494)-value(_19494), key(_19534)-different_value(_19534)].

All the other checks confirm, that those two elements are considered as variants and there are no other solutions:

?- order_variants(Order, key(A)-value(A), key(C)-value(C)).
Order =  (=) ;
false.

?- key(A)-value(A) =@= key(C)-value(C).
true.

Could somebody please explain why? Thanks.

jan · May 13, 2023, 4:56pm

Use ?- trace(order_variants). Sorting relies on the comparison operator to be transitive, In this case it compares elements 1 and 2, where (1) is smaller as the A is introduced earlier and thus an older variable. Then it compares 2 and 3, where 2 is smaller. So, 1 < 3 holds based on these two comparisons. 1 is never directly compared to 3.

The problem is a bit harder than I expected. I though you could sort and next find the variants adjacent, so you can remove them in a linear scan. That however falls into the same trap. You’d need a comparison operator that can tell you what triggered the inequality, and notably whether or not that was a variable.

A correct way is to use keys that compare equal on variants:

variant_sort(Terms, Sorted) :-
    map_list_to_pairs(variant_key, Terms, Keyed),
    sort(1, @<, Keyed, SortedPairs),
    pairs_values(SortedPairs, Sorted).

variant_key(Term, Key) :-
    copy_term(Term, Key),
    numbervars(Key, 0, _).

You can also use variant_sha1/2 to create the keys if you trust a secure hash, i.e., theoretically you may loose non-variants due to hash collisions.

jan · May 13, 2023, 5:04pm

Not in the library, but it is fairly easy to adjust library predicate using the key based approach to create a sort/2 for variants. Possibly these make good additions to library(terms). Would be even better if there is prior art we can copy (notably the predicate name and the library name ).

j4n_bur53 · May 13, 2023, 10:07pm

You could use variant_key/2 to define a comparator and then use predsort/3:

variant_compare(C,X,Y) :-
    variant_key(X,K), variant_key(Y,J), compare(C,K,J).

variant_key(Term, Key) :-
    copy_term(Term, Key),
    numbervars(Key, 0, _).

Maybe not the fastest compared to the keying and unkeying method:

?- time((between(1, 100000, _),
   predsort(variant_compare, [3,f(X),2,f(Y),1], _), fail; true)).
% 10,799,998 inferences, 1.000 CPU in 0.980 seconds (102% CPU, 10799998 Lips)
true.

?- time((between(1, 100000, _),
   variant_sort([3,f(X),2,f(Y),1], _), fail; true)).
% 3,499,998 inferences, 0.297 CPU in 0.301 seconds (98% CPU, 11789467 Lips)
true.

jan · May 14, 2023, 6:35am

It is hard to beat the built-ins sort/2, keysort/2, msort/2 or sort/4. Besides, there are N*log(N) comparisons to be expected while you need two variant_key/2 calls for each. Space-wise it is less clear. you don’t need to carry the keys around using predsort/3, but predsort/3 itself uses more internal lists. sort/2 and friends create a new list as a copy of the original and do the work in-place.

j4n_bur53 · May 14, 2023, 8:16pm

The use case of a variant based list_to_set/2 would be a reified distinct/1.
You sure there is not somewhere a variant based list_to_set/2 already
in SWI-Prolog? Currently list_to_set/2 cannot reify distinct/1,

since list_to_set/2 is not variant based. Different results:

p(f(_)).
p(f(_)).

?- distinct(p(X)).
X = f(_) ;
false.

?- findall(X, p(X), L), list_to_set(L, R).
L = R, R = [f(_), f(_)].

Also observe that table/1 directive causes a distinct/1, i.e. a variant
based list_to_set/2 more or less, but its also not reified. You can
only access it tuple by tuple? Unless you wrap it with findall/3:

:- table q/1.
q(f(_)).
q(f(_)).

?- q(X).
X = f(_).

?- findall(X, q(X), L).
L = [f(_)].

jan · May 15, 2023, 7:24am

Pretty much

Why not use this is you want a list of distinct (variant) answers?

?- findall(X, distinct(p(X)), L).

I do see value in these variant-based predicates. I’m mostly interested in existing practice that may give names and library names.

j4n_bur53 · May 16, 2023, 1:14pm

It has the advantage over list_to_set2/2 that it preserves the input
order. The input order is not preserved through nb_set_to_list/2,
since nb_set only realizes a hash table, but not a linked hash table

or something similar that preserves input order. So I get these results:

?- list_to_set2([3,f(X),2,f(Y),1], Z).
Z = [1, 2, 3, f(_)].

?- findall(T, distinct(T,member(T,[3,f(X),2,f(Y),1])), L).
L = [3, f(_), 2, 1].

Now I thought nb_set_to_list/2 has better performance. But this is not the case:

?- time((between(1, 100000, _),
      list_to_set2([3,f(X),2,f(Y),1], _), fail; true)).
% 15,353,440 inferences, 1.250 CPU in 1.247 seconds (100% CPU, 12282752 Lips)
true.

?- time((between(1, 100000, _), findall(T, 
    distinct(T,member(T,[3,f(X),2,f(Y),1])), L), fail; true)).
% 8,600,000 inferences, 0.719 CPU in 0.730 seconds (98% CPU, 11965217 Lips)
true.

I guess some slow down comes from an additional sort/2 in nb_set_to_list/2:

nb_set_to_list(nb_set(Buckets, _Size), OrdSet) :-
    compound_name_arguments(Buckets, _, Args),
    append(Args, List),
    sort(List, OrdSet).

Topic		Replies	Views
Predsorting a List and retaining All Elements with Equal Keys Help!	6	348	February 18, 2020
Sorting predicates in PROLOG Help! how-to	23	3296	August 3, 2022
On sorting non-ground terms and the standard order of terms General	26	1772	January 2, 2020
Custom order in order_by Help!	10	801	August 11, 2020
What does sort/2 do? Predicate	34	7673	December 22, 2020

Predsort variants

Related topics