Unexpected semantics of library(ordsets) for ordered _lists_ (not sets)

Better explain this with an example:

190 ?- _Ls = [a,b,c], _Rs = [a,a,a,b,b,c,c,c,c], ord_intersect(_Ls,_Rs,Is).
Is = [a, b, c].

191 ?- _Ls = [a,a,b,c], _Rs = [a,a,a,b,b,c,c,c,c,d,d], ord_intersect(_Ls,_Rs,Is).
Is = [a, a, b, c].

192 ?- _Ls = [a,a,b,b,c], _Rs = [a,a,a,b,b,c,c,c,c,d,d], ord_intersect(_Ls,_Rs,Is).
Is = [a, a, b, b, c].

193 ?- _Ls = [a,a,b,b,c,c], _Rs = [a,a,a,b,b,c,c,c,c,d,d], ord_intersect(_Ls,_Rs,Is).
Is = [a, a, b, b, c, c].

194 ?- _Ls = [a,a,b,b,c,c,c], _Rs = [a,a,a,b,b,c,c,c,c,d,d], ord_intersect(_Ls,_Rs,Is).
Is = [a, a, b, b, c, c, c].

195 ?- _Ls = [a,a,b,b,c,c,c,c], _Rs = [a,a,a,b,b,c,c,c,c,d,d], ord_intersect(_Ls,_Rs,Is).
Is = [a, a, b, b, c, c, c, c].

196 ?- _Ls = [a,a,b,b,b,c,c,c,c], _Rs = [a,a,a,b,b,c,c,c,c,d,d], ord_intersect(_Ls,_Rs,Is).
Is = [a, a, b, b, c, c, c, c].

In other words, ord_intersect/3 is happy to operate on a sorted list with duplicates (essentially, a multiset), despite the documentation saying that the entire library only operates on ordered sets:

Ordered sets are lists with unique elements sorted to the standard order of terms (see sort/2). Exploiting ordering, many of the set operations can be expressed in order N rather than N^2 when dealing with unordered sets that may contain duplicates.

From: SWI-Prolog -- library(ordsets): Ordered set manipulation

At least some of the other predicates in the library work the same way:

201 ?- _Ls = [a,a,a,b,b,c,c,c,c], _Rs = [a,a,a,b,b,c,c,c,c,d,d], ord_subset(_Ls,_Rs).
true.

202 ?- _Ls = [a,a,b,b,c,c], _Rs = [a,a,a,b,b,c,c,c,c,d,d], ord_subtract(_Ls,_Rs,Ds).
Ds = [].

203 ?- _Ls = [a,a,b,b,c,c], _Rs = [a,a,a,b,b,c,c,c,c,d,d], ord_subtract(_Rs,_Ls,Ds).
Ds = [a, c, c, d, d].

% Ugh, this one may be slightly off!
204 ?- _Ls = [a,a,b,b,c,c], _Rs = [a,a,a,b,b,c,c,c,c,d,d], ord_union(_Ls,_Rs,Ds).
Ds = [a, a, a, b, b, c, c, c, c, d, d].

205 ?- _Ls = [a,a,b,b,c,c], _Rs = [a,a,a,b,b,c,c,c,c,d,d], ord_symdiff(_Ls,_Rs,Ds).
Ds = [a, c, c, d, d].

As it happens I went looking at the implementation because I wanted an intersection predicate operating on lists with duplicates, and I thought that sorting the list would improve performance. I went to look at the implementation of ord_intersect/3 to see if I could cannibalize it for patterns to use in my predicate, and I thought to check, and realised it works on ordered lists with duplicates.

Now, I would like to keep this behaviour, but what is the right thing to do here in terms of the library? Should the documentation change to reflect the true behaviour of the code? Or should the code change to match the informal specification given in the documentation?

I’m for the former: keep the behaviour, change the documentation. After all, the current implementation works with ordered sets just fine, it’s just that it seems to work also with essentially multisets. I don’t think that’s a bad thing, it’s just that the library is less restricted than advertised. I don’t think having multiples changes the performance much either, but haven’t checked to be honest.

Sorry if I read this too fast. In the docs to is_ordset/1 I find:

True if Term is an ordered set. All predicates in this library expect ordered sets as input arguments. Failing to fullfil this assumption results in undefined behaviour. Typically, ordered sets are created by predicates from this library, sort/2 or setof/3.

(emphasis mine)

In other words, it is up to the programmer to ensure that the pre-condition holds.

Sorry, is the programmer the user, or the person implementing the library?

From a quick eyballing it looks like the behaviour is not so much undefined, as undocumented, and it seems to make sense for the most part, except from what I’ve seen so far for ordset_union/3.

Having this unexpected behaviour documented to help the user better understand how to use the library can’t be a bad thing, no?

Besides, if we wanted a new library that operates on ordered multisets we’d have to basically copy most of the code from the current library, and wouldn’t that be a bit daft?

Leaving a behaviour undefined allows more flexibility in the implementation – if the behaviour is documented, then it’s possible that some future optimisation would be impossible without breaking compatibility.

Would you expect ordset_union/3 to work if one of the lists were an unordered list? An “ordset” is defined as having no duplicates (that’s what sort/2 does, unlike msort/2). And the library is defined to only work with “ordsets”, that is sorted with no duplicates (and not multisets). If Prolog were a typed language, we might be able to make this obvious at compile time; as it is, you’ll just get a wrong answer (and we’ve had long discussions in the past about failure versus throwing an error, although not about ordsets specifically).

If in doubt, use list_to_ord_set/2.

Yes, I agree, we could argue that the code is the spec and so on. However, in this case, the docs explicitly say that for anything but lists as created by sort/2 and setof/3 the result is undefined behaviour.

I am starting to doubt a bit. What would be the intersection if you have duplicates? My gut feeling tells me the intersection shouldn’t have duplicates.

Or are you aiming to keep some number of duplicates in your intersection?

I agree with your comment in general terms, but library(ordsets) is an old library (apparently inherited from Quintus if I’m not mistaken?). I’m guessing it’s very unlikely that the implementation will change any time soon. Besides, it’s not like there’s any problem in the way it works with ordsets, so why would the implementation change? We could change it so that it doesn’t “work” (depending on how we define that) with ordered multisets, but I’m just unsure what would be the motivation for that.

It’s a library for ordsets, that works just as expected on ordsets and happens to (at least to some extent) work on ordered multisets also. What’s the right thing to do, is my question really.

EDIT: I guess it’s always possible there’s some bug in the library that was not found in the probably many years of its use, or some optimisation still to be made, that would change the behaviour on “ordered multisets” as I call them, so that’s two cases where the implementation could change.

Ah, that’s a good question. It didn’t occur to me to look around for a definition of set operations on multisets. It’s possible the results I see are wrong for multisets. I’ll have to check and come back to you on that.

In my use case I want to count the duplicates in the intersection of two sets. The sets are adjacency relations in an image and the intersection is the common adjacency relations. I want to know what are the adjacency relations most frequently shared between the two images, as part of a measure of similarity between two images.

It’s a very particular use case and I don’t know if it extends outside the strict use case I have.

I guess for my use case the best option is to just copy/paste the code in my project and keep a copy of it that I know works the way I want it “safe” from future changes, that might take away the undocumented behaviour (unlikely as I think those to be). I just don’t like the code duplication.