Upstreaming `ord_lookup`, `ord_range` and `rb_visit_range`

Hello all,

While working on my egraph library, I have written some helper predicate to use with ordsets and rbtrees.
I am wondering if these could be useful for the community, so I am exposing them here.
Let me know if one of them could be useful to you, in which case, I’ll do a pull request on swipl-devel to upstream them.

  1. ord_lookup((+Key)-(-Value), +SortedPairs): efficiently retrieve a Value given a Key from a sorted list of pairs
  2. ord_range(+Set, +Min, +Max, -Range): efficiently retrieve a range of elements from a set given a Min and Max bounds
  3. rb_visit_range(+Tree, +Min, +Max, -Range): retrieve a range of Pairs from an rbtree given a Min and Max bounds on keys.

Let’s start with ord_lookup/2:

%!  ord_lookup(+Pair, +SortedPairs) is semidet.
%
%   Retrieves a value from a sorted list of pairs using standard term comparison.
%   The search is unrolled for performance. Adapted from ord_memberchk/2.
%
%   @arg Pair A `Key-Value` pair where `Key` is the target key to find, and `Value` is unified with the associated value.
%   @arg SortedPairs A list of Key-Value pairs sorted by Key.

ord_lookup(Item-V, [X1-V1, X2-V2, X3-V3, X4-V4|Xs]) :-
   !,
   compare(R4, Item, X4),
   (   R4=(>)
   ->  ord_lookup(Item-V, Xs)
   ;   R4=(<)
   ->  compare(R2, Item, X2),
      (   R2=(>)
      ->  Item==X3, V = V3
      ;   R2=(<)
      ->  Item==X1, V = V1
      ;   V = V2
      )
   ;   V = V4
   ).
ord_lookup(Item-V, [X1-V1, X2-V2|Xs]) :-
   !,
   compare(R2, Item, X2),
   (   R2=(>)
   ->  ord_lookup(Item-V, Xs)
   ;   R2=(<)
   ->  Item==X1, V = V1
   ;   V = V2
   ).
ord_lookup(Item-V, [X1-V1]) :-
   Item==X1, V = V1.
?- numlist(0, 8, L), random_permutation(L, V), pairs_keys_values(P, L, V),
lookup(4-Value, P).
L = [0, 1, 2, 3, 4, 5, 6, 7, 8],
V = [5, 7, 1, 3, 2, 8, 0, 4, 6],
P = [0-5, 1-7, 2-1, 3-3, 4-2, 5-8, 6-0, 7-4, … - …],
Value = 2.

ord_range/4 follows the same unrolled search logic to skip smaller elements:

%!  ord_range(+Set, +Min, +Max, -Range) is det.
%
%   Retrieves a range of elements between `Min` and `Max` (inclusive)
%   from a set using standard term comparison.
%
%   @arg Set the ordset
%   @arg Min minimum bound
%   @arg Max maximum bound
%   @arg Range Sorted list of elements between `Min` and `Max`
ord_range([_X1, _X2, _X3, X4|Xs], Min, Max, Range) :-
   X4 @< Min, !,
   ord_range(Xs, Min, Max, Range).
ord_range([X|Xs], Min, Max, Range) :-
   X @< Min, !,
   ord_range(Xs, Min, Max, Range).
ord_range([X|Xs], _Min, Max, Range) :-
   ord_take([X|Xs], Max, Range).
ord_range([], _Min, _Max, []).

ord_take([X|Xs], Max, Range) :-
   (   X @> Max
   ->  Range = []
   ;   Range = [X|Rest],
       ord_take(Xs, Max, Rest)
   ).
ord_take([], _Max, []).
?- numlist(0, 8, L), ord_range(L, 4, 6, R).
L = [0, 1, 2, 3, 4, 5, 6, 7, 8],
R = [4, 5, 6].

And finally, the same thing, but with rbtrees as rb_visit_range/4:

%!  rb_visit_range(+Tree, +Min, +Max, -Range) is det.
%
%   Retrieves a range of pairs with keys between `Min` and `Max` (inclusive)
%   from a Tree using standard term comparison.
%
%   @arg Tree the rbtree
%   @arg Min minimum bound
%   @arg Max maximum bound
%   @arg Range Sorted list of pairs between `Min` and `Max`
rb_visit_range(t(_,T), Min, Max, Pairs) =>
   visit_range(T, Min, Max, [], Pairs).

visit_range(black('',_,_,_), _Min, _Max, L0, Lf) => 
   L0 = Lf.
visit_range(red(L,K,V,R), Min, Max, L0, Lf) =>
   (   K @< Min
   ->  visit_range(R, Min, Max, L0, Lf)
   ;   K @> Max
   ->  visit_range(L, Min, Max, L0, Lf)
   ;   visit_range(L, Min, Max, [K-V|L1], Lf),
       visit_range(R, Min, Max, L0, L1)
   ).
visit_range(black(L,K,V,R), Min, Max, L0, Lf) =>
   (   K @< Min
   ->  visit_range(R, Min, Max, L0, Lf)
   ;   K @> Max
   ->  visit_range(L, Min, Max, L0, Lf)
   ;   visit_range(L, Min, Max, [K-V|L1], Lf),
       visit_range(R, Min, Max, L0, L1)
   ).
?- numlist(0, 8, L), pairs_keys_values(P, L, L), ord_list_to_rbtree(P, T),
|    rb_visit_range(T, 4, 6, Range).
L = [0, 1, 2, 3, 4, 5, 6, 7, 8],
P = [0-0, 1-1, 2-2, 3-3, 4-4, 5-5, 6-6, 7-7, … - …],
T = t(...),
Range = [4-4, 5-5, 6-6].

It is a little unclear where that should go. library(ordsets) is not about pairs. library(pairs) could be better. Problem is that this library both deal with ordered and not-ordered pairs, so that problem remains :frowning:

Did you check for current practice on any of these? If that exists we must consider aligning as first option.

I couldn’t find anything in other prolog implementation yet.
I found this earlier discussion: Selecting a range in a dictionary
which shows that at least rb_visit_range might be more broadly useful than just myself.

N.B.: Sicstus prolog has a predicate called avl_range but the equivalent is rb_values

Yes, I’m confused too as to where ord_lookup should go.
I would slightly favor putting in ordsets as the predicate only advantage over member(K-V, L), K == MyKey is that the list of pairs is sorted, meaning we can unroll the search.

Trying ord_lookup/2 as you did with a list of 100,000 elements does outperform member/2, but not memberchk/2 (which is cheating using a C helper). Is this worthwhile, also considering its unclear placement?

As for the others, please create a PR.

Ok, will do.

Wow, did not know that.
Did you do an element check or did you do a key-value lookup ?
Also, I’m using it to lookup partially instantiated keys through == which I don’t think you can replicate with memberchk ?

If nobody else is showing interest, I don’t want to push something that nobody wants.
I guess we will have to wait and see.

Hi both,

An ord lookup for pairs is needed, as ord_memberchk/2 is implemented for check only, which is strange in my opinion.

?- ord_memberchk( b-B, [a-1,b-2,b-4,c-3] ).
false.

?- ord_memberchk( b-2, [a-1,b-2,b-4,c-3] ).
true.

Now, if memberchk/2 is faster/fast enough, then it is another matter.
It is worth adding to the docs, that memberchk/2 is cheating and faster than Prolog implementation.
Of course, your tests might not measuring quite the whole truth.
The speed of ord_look_up/2 is primarily not on the true cases, but on the failures.
It might/should fail much faster than memberchk/2- depending on implementation.

In my opinion the predicate is best suited in the pairs library. You could have a sorted pairs library, but maybe an overkill.

In pack(stoics_lib), my medley predicates pack, there is a subdirectory kv, for kv specific predicates. The name convention is more streamlined and lets itself a bit better to having preds that operate on special cases. All predicates start with kv. Predicates that operate on ordered lists start with kvo and on those for sorted lists with kvs. Where order lists allow key repetition and sorted do not (not 100% sure this a widely accepted distinction).

The library defines kvo_k_memberchk/3 and kvs_k_memberchk/3.

kvs_k_memberchk( b, [a-1,b-2,c-3], V ).
V = 2.

kvo_k_memberchk( b, [a-1,b-2,b-4,c-3], V ).
V = 2;
V = 4;
false.

In the case of kvo the chk is a bit of misnomer- maybe member would have been better.

Btw, most of the kv preds are now operating on arbitrary arity 2 or arity N>2 terms. Probably they are slower because of that.

?- kvs_k_memberchk( b, [t(a,1,x),t(b,2,y),t(c,3,z)], V ).
V = 2.

Regards,

Nicos

Thanks for showing interest and prior implementation.
After thinking more about this, I think we can explain the current lack of ord_lookup because of this combination:

  • if you just have a list of pairs, just use memberchk(k-V, L)
  • if you have an ordset of pairs, just use assoc or rbtree

In my usecase, the reason I didn’t use an rbtree is because I need to “heal” the datastructure after unification is done on the keys.
For an ordset, it is just a sort, while for an rbtree, I would need to convert from/to a list before doing a sort.
Given that my use-case is pretty niche, I am not at all confident we should upstream ord_lookup.

@nicos what was your use for kvs_k_memberchk ? why couldn’t you use an rbtree ?

Not sure what the original use case(s) might have been.
These are helper preds accumulated over the years.
Occasionally spending some time to make some cohesive story.
Such as using kv for naming pair names.

I suspect a list of pairs is a simpler data structure that likely constructed
quickly and can be more easily inspected and operated on.

Nicos