Is_ordset/1 with atoms and variables

When using is_ordset/1 with atoms and variables, the variables need to come before the atoms.

?- is_ordset([a,A]).
false.

fails because the variable is in the list after the atom.

However,

?- is_ordset([A,a]).
true.

succeeds because the variable is in the list before the atom.


Another example.

?- L = [_,_,A,B,a,b],is_ordset(L),write_term(L,[]).
[_722,_728,_734,_740,a,b]
L = [_, _, A, B, a, b].
true.

In general, it’s a bad idea to have variables in terms in an ordset because they can change their comparison when they become partially or fully instantiated. Taking your example, but modifying it slightly:

?- L = [_,_,A,B,a,b], A=B, is_ordset(L),write_term(L,[]).
false.

?- L = [_,_,A,B,a,b], A=_, is_ordset(L),write_term(L,[]).
[_8038,_8044,_8050,_8056,a,b]
L = [_, _, A, B, a, b].

I’d not go that far. For many computations about term sets ordered sets are a good solution. For example, below is a perfectly sensible way to find out which variables are shared between T1 and T2. Vars is also an ordered set, but one should simply consider it a list unless it is certain nothing touches the variables.

shared_variables(T1, T2, Vars) :-
    term_variables(T1, V1), sort(V1, SV1),
    term_variables(T2, V2), sort(V2, SV2),
    ord_intersection(SV1, SV2, Vars).

For short, ordered sets, library(assoc) and library(rbtress), etc. are perfect with variables as long as one is certain the set of elements/keys is not instantiated during the relevant part of the computation.

Note that attributed variables can be a good alternative for reasoning about variables.

2 Likes

But - is this ever needed? Many (most?) programming languages insist that the key for a lookup be immutable, and that doesn’t seem to bother anyone.

As for shared_variables/3, this does the job without any variables to ord_intersection/3 (but the code feels rather clunky), using numbervars/1 to replace all the variables with '$VAR'(_) terms and then looking up the intersection '$VAR'(_) terms, to get the original variables:

shared_variables(T1, T2, Vars) :-
    term_variables(T1-T2, T1T2Vars),
    term_variables(T1, V1),
    term_variables(T2, V2),
    copy_term(V1-V2, CV1-CV2),
    numbervars(CV1-CV2),
    sort(CV1, SCV1),
    sort(CV2, SCV2),
    ord_intersection(SCV1, SCV2, Vars0),
    maplist(lookup_var(T1T2Vars), Vars0, Vars).

lookup_var(T1T2Vars, VarN, Var) :-
    var_number(VarN, N),
    nth0(N, T1T2Vars, Var).
1 Like

Being interested in the peter.ludemann’s code, which looks persuasive to me, I have compared speed with the Jan’s code, which I am used to be more familiar with. Unfortunately, a result is that the Jan’s code is about 30 time faster, which is more than I expected.

% ?- length(L, 1000), time(repeat(10000, shared_variables(L, L, Vs))).
%@ % 918,490,001 inferences, 70.236 CPU in 70.341 seconds (100% CPU, 13077220 Lips)
%@ L = [_, _, _, _, _, _, _, _, _|...].

% ?- length(L, 1000), time(repeat(10000, jw_shared_variables(L, L, Vs))).
%@ % 40,090,001 inferences, 2.177 CPU in 2.179 seconds (100% CPU, 18413813 Lips)
%@ L = [_, _, _, _, _, _, _, _, _|...].

That doesn’t surprise me – there’s probably a lot of overhead from copy_term/2, numbervars/1, maplist(lookup_var).

For meta-level manipulation of terms, there are two ways:

  1. var/1, atom/1, number/1, and similar to detect variables vs terms.
  2. transform the terms into a non-“defaulty” form (e.g., transform X=a into term('=', [var('X'), atom(a)]).

For option 2, there’s a one-time overhead (unless there’s a read_term/2 option that returns the non-“defaulty” form), but after that, the cost of the various operations should be very similar – that is testing for Term=var(_) would be similar to the cost of var(Term). The non-“defaulty” form has the advantage of being able to keep extra information information (e.g., variable name, source location) that otherwise would need separate parallel book-keeping (similar to read-term/2 subterm_positions(_)).

For meta-level programming, such as term expansion, the use of var/1 is enticing; but I suspect that a non-“defaulty” representation would be more robust and easier to use in the long term.

Thank you for explaining your deep background thought in your codes.
As I have poor knowledge about such related meta-level programming technology,
I have nothing to say here. BTW, I learned var_number/2 first from your codes, which I wanted for a long time. Thanks again.