Order and sorting

I don’t know if that’s what they meant. In that case they would have said “keys (or values”) not “keys (i.e. values)”.

Thanks. I hadn’t thought about a formal definition of sorting, but of course one exists. I think that sounds a bit complicated though, especially when it defines such a simple concept as sorting to some total (or I guess partial) order. If you asked me to come up with a definition of sorting I’d have tried to say something like “a function that maps the elements of a set to the elements of a set of ordinals”. But I guess there must be details I haven’t thought of.

Just a guess: It might be that a certain formalism is better suited for constructive proofs about the problem. Such constructive proofs are far more useful for developing algorithms that are meant to be executed on a machine.

I think the proofs come after the algorithms, usually. I do usually scratch up my algos in a semi-formal notation before I hack at my code, but I don’t think I’ve ever derived a proof before I knew that an algorithm works to begin with.

EDIT: sorry, I’m in a chatty mood today. It’s a Greek national holiday (though I’m in the UK). I see what you mean and you’re probably right. I’ll shut up now :sweat_smile:

From the other thread:

Not necessarily. Multisets should be able to contain equivalent elements that do not compare equal. Equivalence is of course meaningless for compare/3; but we have a precedent in keysort/2, where two pairs are equivalent if the keys are equal. Same with sort/4, so we can do:

?- keysort([x-y,a-foo,a-bar], R).
R = [a-foo, a-bar, x-y].

?- sort(1, @=<, [x-y,a-foo,a-bar], R).
R = [a-foo, a-bar, x-y].

?- sort([x-y,a-foo,a-bar], R).
R = [a-bar, a-foo, x-y]. % this flipped foo with bar!

This cannot be done if equivalent subsets are represented as a term and a count.

But it also cannot be done easily if the set predicates are implemented on top of compare/3.

For this, the user should be able to provide a comparison predicate; there is a precedent for this, too, in predsort/3.

Currently predsort/3 needs a first argument Pred:

Pred(-Delta, +E1, +E2) . This call must unify Delta with one of <, > or =.

Side-note: some existing algorithms on sorted sequences only require a semi-det relation predicate p/2 (usually, “less than”). The comparison can be bootstrapped from it:

compare_p(Order, X, Y) :-
    (   p(X, Y)
    ->  Order = (<)
    ;   p(Y, X)
    ->  Order = (>)
    ;   Order = (=)
    ).

Either way, I tried to address my own suspicion from the original post:

Right now I suspect that the widely accepted, “textbook” algorithms that work on Prolog “sets” are by definition correct also for “multisets”.

This was not a useful way to formulate it. Instead:

An algorithm that is correct for multisets represented as ordered lists with repeats will also be correct for sets represented as lists without repeats.

I won’t prove that, it follows from the fact that sets in that representation are a special case of multisets. (ie, repeats are at most 1 long…)

Here it got interesting! @stassa.p already observed that the current implementation “just works”. I went ahead and did the following:

  • I copied from here some of the algorithms that work on sorted or partitioned ranges, and translated them to Prolog.
  • For the algorithms that were implemented in library(ordsets) already:
    • I compared the existing implementation;
    • I also tested the existing implementation along with my own (mechanic) translation.

I am attaching two files:
msorted.pl (6.6 KB)
test_msorted.pl (6.3 KB)

msorted.pl contains the algorithms I copied, and test_msorted.pl contains tests for some of the predicates and the corresponding library(ordsets) predicates. You can do for example

?- use_module(test_msorted).
true.

?- run_tests.

Some results:

  • The library(ordsets) implementation of union, intersection, difference (subtract), and symmetric difference (symdiff) is very close to the mechanical translation I did from procedural code to Prolog.
  • The implementation in library(ordsets) seems to be correct for multisets, and so are my mechanical translations in msorted.pl. Some of the implementations in library(ordsets) however do not follow the strict specifications for the msorted_* predicates (see the file msorted.pl for details).

All of this is in practice useless :smiley: however, I can now allow the user to provide the comparison predicates to the algorithms in msorted.pl. For example, with this:

key_compare(Order, A-_, B-_) :- compare(Order, A, B).

… the msorted_* predicates will be correct for multisets, as produced by keysort/2.

Any thoughts? Sorry for just pasting the files here, I am not sure about the names of things so I would like to see if there is any interest or comments before proceeding.

This is much clearer and shorter than what I tried to say over a dozen posts, thanks. Now that I see it written so simply I also have no doubt that it’s obviously true. The interesting thing is that library(ordsets) got that from the other end, i.e. starting from lists without repeats.

It’s interesting that library(ordsets) union and intersection both swap the order compared to your implementation. I haven’t looked carefully enough to tell why.

Ultimately a set isn’t ordered unless explicitly defined as an ordered set, so ordering is not 100% necessary but I guess you were more interested in the sorting in the first place.

This might require detective work. The original oset.pl file seems to be from 1993 and contained the relevant code that is now in library(ordsets). That code is again very similar to the code in “The Craft of Prolog” by O’Keefe, which was published in 1990. I have seen that before; but I also took care to translate the procedural code (for multisets!) I linked above in a “mechanical” fashion (see below for a demonstration). And I ended up with basically the same predicates and helper predicates (I did order my arguments differently…). But I don’t see any references in O’Keefe’s text, so I assume it is just such an obvious algorithm that you end up with the same anyway.

It is an optimization, when the head and tail of the second argument are already unpacked, they are used in a call to the helper predicate that initially takes the head and tail of the first argument. You save two predicate calls.

Yes, correct. Because a set is not ordered, the implementation is free to throw away the initial order and work with lists of sorted elements, since this allows for a better algorithm. Python for example has a container called OrderedDict which maintains the order in which key-value pairs were inserted.

Demonstration of how I did a mechanical translation of procedural code to Prolog and ended up with the exact same thing :smiley:


If I take “union” as an example, here was the code I started with (provided as “possible implementation”):

template<class InputIt1, class InputIt2, class OutputIt>
OutputIt set_union(InputIt1 first1, InputIt1 last1,
                   InputIt2 first2, InputIt2 last2, OutputIt d_first)
{
    for (; first1 != last1; ++d_first)
    {
        if (first2 == last2)
            return std::copy(first1, last1, d_first);
 
        if (*first2 < *first1)
            *d_first = *first2++;
        else
        {
            *d_first = *first1;
            if (!(*first1 < *first2))
                ++first2;
            ++first1;
        }
    }
    return std::copy(first2, last2, d_first);
}

I start from the for(; first1 != last1; ++d_first) at the top. first1 != last1 says “until the first list is not exhausted”, so,

union([], ?B, ?U).
union([X|A], ?B, ?U).

The ++d_first informs me that inside the loop, for every possible path, there will be something written to the output list. I keep that in mind.

Next,

        if (first2 == last2)
            return std::copy(first1, last1, d_first);

This now tells me that if I am at the end of the second list, the first list is the remaining of the output list, and I am done. So:


union([], ?B, ?U).
union([X|A], B, U) :-
    union_b(B, X, A, U).

union_b([], X, A, [X|A]).
union_b([Y|B], X, A, U) :- ?.

Then, the first comparison of the two heads:

        if (*first2 < *first1)
            *d_first = *first2++;

I have committed to using compare/3 so I will write compare(Order, X, Y) and add that case. The *d_first = *first2++; tells me to consume the head of the second list and add it to the result.


union([], ?B, ?U).
union([X|A], B, U) :-
    union_b(B, X, A, U).

union_b([], X, A, [X|A]).
union_b([Y|B], X, A, U) :-
    compare(Order, X, Y),
    union_cmp(Order, X, Y, A, B, U).

union_cmp(>, X, Y, A, B, [Y|U]) :-
    union([X|A], B, U).

The “else” of that “if” needs to be read at once because the procedural reading is important:

else
        {
            *d_first = *first1;
            if (!(*first1 < *first2))
                ++first2;
            ++first1;
        }

So, in both other cases of Order, we consume the head of the first list (++first1; at the bottom) and we also add it to the output list (*d_first = *first1; at the top).

In addition, when the two heads are equal, I can consume the head of the second list, too.

            if (!(*first1 < *first2))
                ++first2;

So:


union([], ?B, ?U).
union([X|A], B, U) :-
    union_b(B, X, A, U).

union_b([], X, A, [X|A]).
union_b([Y|B], X, A, U) :-
    compare(Order, X, Y),
    union_cmp(Order, X, Y, A, B, U).

union_cmp(<, X, Y, A, B, [X|U]) :-
    union(A, [Y|B], U).
union_cmp(=, X, Y, A, B, [X|U]) :-
    union(A, B, U).
union_cmp(>, X, Y, A, B, [Y|U]) :-
    union([X|A], B, U). % here we can take the shortcut

“The shortcut” is that we have already unpacked the head and tail of the first list in the last case above, so we can directly call union_b/4.

Finally, at the bottom, when the second list is longer than the first, we take the rest to the output:

return std::copy(first2, last2, d_first);

This informs me how to resolve the first clause of union/3, so that I get, at the end:

union([], B, B).
union([X|A], B, U) :-
    union_b(B, X, A, U).

union_b([], X, A, [X|A]).
union_b([Y|B], X, A, U) :-
    compare(Order, X, Y),
    union_cmp(Order, X, Y, A, B, U).

union_cmp(<, X, Y, A, B, [X|U]) :-
    union(A, [Y|B], U).
union_cmp(=, X, Y, A, B, [X|U]) :-
    union(A, B, U).
union_cmp(>, X, Y, A, B, [Y|U]) :-
    union_b(B, X, A, U). % shortcut

Compare this to the original code for set union from library(oset) as linked above, complete with the argument-swapping shortcut. (Since we only have equality here, this is completely legal!)

% oset_union(+OSet1, +OSet2, -Union).
% -----------------------------
oset_union([], Union, Union).
oset_union([H1|T1], L2, Union) :-
    union2(L2, H1, T1, Union).

union2([], H1, T1, [H1|T1]).
union2([H2|T2], H1, T1, Union) :-
    compare(Order, H1, H2),
    union3(Order, H1, T1, H2, T2, Union).

union3(<, H1, T1,  H2, T2, [H1|Union]) :-
    union2(T1, H2, T2, Union). % this shortcut swaps!
union3(=, H1, T1, _H2, T2, [H1|Union]) :-
    oset_union(T1, T2, Union).
union3(>, H1, T1,  H2, T2, [H2|Union]) :-
    union2(T2, H1, T1, Union).

PS: I should actually try some LLM, give it the C++ code and ask it to give me a Prolog equivalent. Stay tuned! :smiley:
PPS: well it ain’t gonna give it to me easily.

You could have started with:

The Craft of Prolog
Richard A. O’Keefe
4 Methods of Programming
4.2 The Problem of Merging Two Ordered Lists
https://mitpress.mit.edu/9780262150392/the-craft-of-prolog/

Only the SWI solution and your solution has the optimization
that in ord_union_3/6 there is a call to ord_union2/4, and as
you observe there is swap in the first clause.

But he does it twice, in chapter 5 again. Given the
extensive procedural discussions he has inserted in
his book, he probably thinks this is the best solution.

But your shortcut surely saves a list consing, which might
have slipped Richard O’Keefes attention. As it happens
its the same as SWI implements with single sided

unification (=>)/2. Only in single sided unification
the output needs to be determined by (=)/2. But here
is the Richard O’Keefe original code from his 1990 book:

Yes, but I had a reason not to. That code ends up in the same place but does not include multisets (lists as sorted by msort/2) in the spec. The fact that I started elsewhere and ended up in the same place is interesting enough to me.

Either way, I would like to hear opinions on whether a new library (maybe distributed as an “add-on package”) is of any interest to anyone. We roughly have the following purely additive features not available at the moment in library(ordsets):

  • A spec that covers multisets as produced by msort/2 (already in the code I provided)
  • A C implementation of the same algorithms, if this makes it that much faster (I would know how to do it and the work seems mechanical enough. I specifically used compare/3 because it is available as PL_compare() in the foreign language interface)
  • A version of the predicates that takes a custom comparator.

This last one is not well defined at the moment. One possible way is to have the client code provide a comparison predicate like predsort/3, another would be to have some convention similar to sort/4 which covers a lot of use cases (but not all).

The only important question is if there is any use whatsoever for any of this. Until Stassa popped up with her questions a month ago, no one ever seems to have needed any of this :smiley: I would have needed it a couple of times but I just wrote custom code or circumvented the issue altogether.

What do you mean by that? Amazingly he explicitly considers
duplicates in chapter 4 where he defines what a multi-set
list is, and he can do so without msort/2. This is from his 1990

book. Could be that SWI-Prolog got lost track of that spec:

/* The Craft of Prolog */
/* Richard A. O’Keefe */
/* 4 Methods of Programming */
/* 4.2 The Problem of Merging Two Ordered Lists */

ordered_list([]).
ordered_list([_]).
ordered_list([X1,X2|Rest]) :-
     X1 @=< X2,
     ordered_list([X2|Rest]).

merge(L1, L2, M) :-
    append(L1, L2, L),
    permutation(L, M),
    ordered_list(M).

So he has a declarative definition what the union of two
multi-set lists is, without refering to msort/2. You only
need append/3, permutation/2 and ordered_list/1, to

specify it. Its not fastly excutable but specified. If you use
something else than ordered_list/1 you get non-multi
set lists. You have only to replaced (@=<) by (@<) to move

from the msort/2 based sets to the sort/2 based sets,
To specify a merge/3 will become a little complicated
then, you have to add a duplicate filter after permutation/2.

In as far his solution has the same (=) rule:

You can compare with Richard O’Keefes version:

/* The Craft of Prolog */
/* Richard A. O’Keefe */
/* 5 Data Structure Design */
/* 5.2 Writing a Set-Union Predicate */

oset_union_3(=, H1, T1, _, T2, [H1|Union]) :-
    oset_union(T1, T2, Union).

He also drops only one head. So that duplicates
don’t disappear after performing union. At that time
the verbally specified it as "a predicate that returns the

union of two sets in ordered representation". Actually
I was mistaken, the short cut and swap idea also
oppears on the next page of the book:

Maybe we could say Richard O’Keefe nicely covers the merge/3
operation for multi-set lists and non-multi set lists. But there are
usually more operations like intersection/3, subtraction/3, etc..

And their semantic on multi-sets lists doesn’t need to follow a
unique common practice. Frankly I don’t know whether Richard
O’Keefe would help for these other operations. On the other

hand his coverage of merge/3 is rather extensive.

Yes, sure. You can also find the msorted_until/4 predicate in msorted.pl I shared above. The ordered_list/1 you show can be bootstrapped as

ordered_list(L) :-
    msorted_until(L, _, _, Rest),
    Rest == [].

I had versions of all predicates in msorted.pl that used the weaker less_than/2 comparison, but got rid of them completely because in SWI-Prolog this is anyway a call to compare/3. With the exception of some edge cases, using a compare/3 instead of a less_than/2 does not make a difference. Again, a reminder, compare/3 can be bootstrapped from a less_than/2 like this:

compare(Order, A, B) :-
    (   less_than(A, B)
    ->  Order = (<)
    ;   less_than(B, A)
    ->  Order = (>)
    ;   Order = (=)
    ).

I took a shortcut in msorted.pl and just used the merge/3 available in backcomp.pl:

%!  merge(+List1, +List2, -List3)
%
%   Merge the ordered sets List1 and List2 into a new ordered  list.
%   Duplicates are not removed and their order is maintained.
%
%   @deprecated     The name of this predicate is far too general for
%                   a rather specific function.

merge([], L, L) :- !.
merge(L, [], L) :- !.
merge([H1|T1], [H2|T2], [H|R]) :-
    (   H1 @=< H2
    ->  H = H1,
        merge(T1, [H2|T2], R)
    ;   H = H2,
        merge([H1|T1], T2, R)
    ).

I did it with the intention of re-implementing it using compare/3 and potentially re-writing it in C.

What I mean is that unless I am very confused, the code in “The Craft of Prolog” from 1990, just like the code in the original oset.pl from 1993, talks about ordered lists as created by sort/2. Incidentally, my belief that this is an unnecessary requirement that can be relaxed is what is motivating all those posts.

But it obviously does not in the book for merge/3 and ord_union/3.
Or Richard O’Keefe is not aware what he did. This is also possible.
He says his version is a cleaned up version of some Quintus Prolog code.

What do you mean? Either way, I am not interested in archeology that much. Note that I have not claimed any authorship in the source files I have shared because I believe this is all in the public domain. All I want is to have a useful library, and in fact I do already.

Well you are right that he considers only sets. Because he defines:

“A good representation to use is the one that the built-in predicate setof/3
returns, namely a list of elements in standard order (as defined by compare/3)
with no duplicates. This is called the ordered representation.”

So the book is a little ambivalent. In section 4 he deals with multi-sets
lists explicitly. In section 5 he deals with non-multi set lists explicitly.
But his ord_union/3 has a glitch that it can deal with multi-sets lists.

The same glitch is now found in the SWI-Prolog library. The archeology
here is like an anamnesis, explaining a anomaly in terms of a medical
history. But it also gives the opportunity to remix, like saying hey,

we take section 4 and overlay it on section 5. Like a phoenix, creating
something new and beautiful from a broken history.

Yes, exactly! This is what motivated me to open this topic. On the other hand, it seems that multisets are not that popular :grinning_face:

Well now I got myself confused. ord_union/3 doesn’t implement
merge/3. You see it here:

?- ord_union([a],[a,a,a], X).
X = [a, a, a].

?- ord_union([a,a,a,a,a],[a,a,a], X).
X = [a, a, a, a, a].

On the other hand:

?- merge([a],[a,a,a], X).
X = [a, a, a, a].

?- merge([a,a,a,a,a],[a,a,a], X).
X = [a, a, a, a, a, a, a, a].

The predicate merge/3 exists in library(backcomp) in SWI-Prolog.
It implements the plus (+) semantics of union. On the other hand
the predicate ord_union/3 from library(ordsets) only

implements the maximum (max) semantics of union. So the outer
type signature of a predicate, like accepting multi-set lists does not
unambiguisly hint to a semantic, the inner workings.

So I was also wrong in another aspect, its not only intersection
and subtraction that might have varying semantics, also union
itself has already different incarnations.

One reason for using merge/3 as a temporary drop-in: I know that inside sort/4 there is already a more useful merge, defined in C, and supporting the sort/4 semantics. I just haven’t looked into it yet.