Unexpected semantics of library(ordsets) for ordered _lists_ (not sets)

Thanks, but you underestimate the complexity of the endeavour. I suck at C. :sweat_smile:

Though I guess that would be one nice C exercise for me.

Well, see, the thing is I don’t think library(ordsets) needs any modification at all. Neither for my use case, nor for any use case that requires multiset operations. My working hypothesis is that all the set operations in library(ordsets) also happen to be correct multiset operations.

See for example @Johnny_Rotten’s comment above:

That’s how the intersection of two multisets should work: it should give you a new multiset containing the elements that occur in both, where each element of the new multiset has multiplicity (i.e. number of repetitions) equal to the smaller of that element’s multiplicities in the two original multisets.
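For instance, relying on the current (undocumented) merge-based implementation, ord_intersection/3 shows exactly this min-multiplicity behaviour on ordered lists with duplicates:

```prolog
?- use_module(library(ordsets)).
true.

% a has multiplicity 2 and 1 in the inputs, so 1 in the result;
% b has multiplicity 1 and 2, so also 1:
?- ord_intersection([a,a,b], [a,b,b], I).
I = [a, b].
```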

I think the rest of the library works the same way. I’d have to check, but so far it looks like every other operation is dual-purpose: it works on both sets and multisets.

Granted, that’s just the result of my quick eyeballing of a few of the library(ordsets) predicates. To make sure would take more effort than I’ve put in yet. Perhaps that’s what I should spend some time on, then? I could even write some unit tests to make it easier.

Ah, this just clicked. The sum of two multisets is a different operation from the union of two multisets, IIUC. That operation is missing from library(ordsets), but ord_union/3 is, AFAICT, correct as a multiset union.
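To make the distinction concrete, here is a sketch of the missing multiset sum (multiset_sum/3 is my own naming, not part of the library):

```prolog
:- use_module(library(ordsets)).

% Multiset sum adds multiplicities, whereas union takes their maximum.
% msort/2 sorts without removing duplicates, so it preserves them all.
multiset_sum(MSet1, MSet2, Sum) :-
    append(MSet1, MSet2, All),
    msort(All, Sum).
```

With [a,a] and [a,b], multiset_sum/3 gives [a,a,a,b] (multiplicities added), while ord_union/3 gives [a,a,b] (maximum multiplicity), which is the standard multiset union.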

Ah, that’s why you want the “undefined” behaviour to be defined. Given that the code is old, doesn’t seem to be buggy, and does what you want, documenting that it works for multi-sets seems reasonable – you’d want to add list_to_ord_multiset(List, MSet):-msort(List,MSet) (also ord_set_to_ord_multiset/2 for completeness); add some comments to the various predicates; and add some multi-set tests to tests/library/test_ordsets.pl.
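Spelled out, the two predicates suggested above could look like this (a sketch):

```prolog
% An ordered multiset is a sorted list that keeps duplicates,
% so msort/2 does all the work:
list_to_ord_multiset(List, MSet) :-
    msort(List, MSet).

% An ordered set is already an ordered multiset (each element
% with multiplicity 1), so the conversion is the identity:
ord_set_to_ord_multiset(Set, Set).
```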

Yeah, somehow (min, max) on multisets is a lattice and (min, sum) on
multisets is a semiring. You can do a kind of fuzzy logic
Datalog construction all the time.

If you have a lattice or semiring C for the values, then the function
space D → C, the key-value pairs, is also a lattice or semiring.
You can verify all the axioms again. For example, mode-directed

tabling is one big key-value machinery, compensating for
the deduplication via tabling, which erases all the multiplicity
information known from ordinary Prolog execution:

mode directed tabling
https://www.swi-prolog.org/pldoc/man?section=tabling-mode-directed

Currently it allows plugging in a lattice, but since it also has sum,
one could maybe make the sum pluggable as well, by plugging in a
semiring. The semiring over numbers in question goes by the name tropical semiring.

BTW: Could also try matrices and AI accelerators.

That’s right! Sorry, I think I confused the point in my first comments, but then I was confused myself and still am unsure. I think those are good suggestions and I’ll see if I can find the time to follow them.

Heh. I am a category theory illiterate, sorry. What does it mean, in practical terms, that something is a semiring?

The connection to tabling sounds really interesting.

Well, the tropical semiring (C, min, +) has this distributivity:

a + min(b, c) = min(a + b, a + c)

So it’s very close to a semiring (C, +, *), an algebraic structure
from mathematical logic consisting of a set with two binary operations
(addition + and multiplication *), which has this distributivity:

a * (b + c) = a * b + a * c.
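A quick Prolog sanity check of the tropical law (sampled values only, not a proof; the predicate name is my own):

```prolog
% Check a + min(b,c) =:= min(a+b, a+c) for given values,
% using the built-in arithmetic function min/2.
tropical_distributes(A, B, C) :-
    L is A + min(B, C),
    R is min(A + B, A + C),
    L =:= R.

% ?- forall(( member(A, [0,1,3,5]),
%             member(B, [1,2,4]),
%             member(C, [0,2,7]) ),
%           tropical_distributes(A, B, C)).
% true.
```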

You could use such laws to derive test cases for the multi-set
operations. The multi-set operations are more powerful than
tabling, since tabling would see the data through the lens
of variant keys. But semiring distributivity also holds, right?

:- table p(_, min).               :- table p1(_, sum).
p(X, Y) :- b(X, Y).               p1(X, Y) :- a(X, Y).
p(X, Y) :- c(X, Y).               p1(X, Y) :- b(X, Y).

:- table q(_, sum).               :- table p2(_, sum).
q(X, Y) :- a(X, Y).               p2(X, Y) :- a(X, Y).
q(X, Y) :- p(X, Y).               p2(X, Y) :- c(X, Y).

                                  :- table q2(_, min).
                                  q2(X, Y) :- p1(X, Y).
                                  q2(X, Y) :- p2(X, Y).

?- q(X,Y).                        ?- q2(X,Y).           
X = bar, Y = 5 ;                  X = bar, Y = 5 ;
X = foo, Y = 7 ;                  X = foo, Y = 7 ;
X = baz, Y = 6.                   X = baz, Y = 6.

Was using this test data:

a(foo, 3).                        b(foo, 4).
a(bar, 4).                        b(bar, 2).
a(baz, 5).                        b(baz, 1).

c(foo, 5).
c(bar, 1).
c(baz, 2).

I am inclined to disagree. ordsets is a nice consistent library that could be optimized a lot further, even completely rewriting it using low-level primitives. I’d rather keep this option open. Then, multisets are probably best represented as a list of pairs Term-Count and supported by a proper library.

Of course, anyone can take the risk and use the ordsets library beyond its documentation. A simple assertion/1 call in a directive could be enough to warn that things changed.

What would change in that case? I still think (though I still need to check) that the reason the library works both ways (on ordered lists with either unique elements or with duplicates) is its efficient implementation that exploits list ordering.

What I mean is that the natural way to walk through a pair of ordered lists efficiently is the same whether the lists have duplicates or not. “Natural” is a bit hard to define, but what I mean is that it would take extra code to force a predicate to only accept a list with unique elements (for example, you’d have to keep track of the last element seen and check that it’s different from the current element).

I think this extra code is missing from the library because it is not necessary to ensure the library works correctly on lists with unique elements. It’s hard to justify writing extra code to make an implementation less general, let alone to notice that it is more general in the first place: usually we check that the code does what its spec says, not that it avoids doing a bit of extra stuff that doesn’t get in the way of its main purpose.
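What I mean by the “natural” walk can be sketched like this (my own predicate name, mirroring the library’s merging style):

```prolog
% Intersection by merging two ordered lists. Nothing here checks for
% or removes duplicates, so the very same code is a set intersection
% on duplicate-free lists and a min-multiplicity intersection otherwise.
merge_intersection([], _, []).
merge_intersection([_|_], [], []).
merge_intersection([H1|T1], [H2|T2], Intersection) :-
    compare(Order, H1, H2),
    (   Order == (=)
    ->  Intersection = [H1|Rest],
        merge_intersection(T1, T2, Rest)
    ;   Order == (<)
    ->  merge_intersection(T1, [H2|T2], Intersection)
    ;   merge_intersection([H1|T1], T2, Intersection)
    ).
```

Restricting this to duplicate-free lists would take additional bookkeeping; the general behaviour comes for free.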

On the other hand, I now seem to remember an older discussion, probably not on this incarnation of the mailing list, about representing multisets as key-value pairs, like you say and like @Johnny_Rotten suggested. I wonder where I saw this before. It may even have been in a textbook. Craft of Prolog? Not sure.

Well, if you’re saying the implementation can change then I guess the best thing for me is to keep a local copy of the library to make sure I don’t lose the functionality. It sucks a little, but at least I think I can distribute it with my code if I respect the license? Not that I’m currently planning to distribute the code, but I may put it on github at some point.

Anyway the main reason I’ve raised this is that I think it’s an interesting quirk of Prolog: sometimes you write a bit of code in the way that feels natural and is most efficient and it turns out to not only solve your immediate problem but also the generalisation of your immediate problem. Kind of like the way append/3 doesn’t just append lists but also splits them. I suppose that’s what you get for having a language where relations, rather than functions, are the first-class citizens. I’d like to be able to point to library(ordsets) as an example of this peculiarity of Prolog.
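For completeness, the append/3 behaviour I mean:

```prolog
% The same definition that concatenates also enumerates splits:
?- append([a], [b,c], L).
L = [a, b, c].

?- append(X, Y, [a,b]).
X = [], Y = [a, b] ;
X = [a], Y = [b] ;
X = [a, b], Y = [].
```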

Thanks for taking the time to write that, but I don’t understand the terminology at all :frowning:

Me neither, but the WAM Book author recently laid hands on it:

Generic Linear Fixed-Point Equation Solving in Arbitrary Semi-Rings
Hassan AĂŻt-Kaci - December 2019
https://www.researchgate.net/publication/335082687

In general, OO languages have a little advantage in keeping things
opaque, like changing an implementation, since they have language
support to publish an interface. For example, JavaScript has

recently grown a kind of library(ordsets), basically on top of its Set()
data structure, which only specifies, for the data structure itself,
insertion order and that lookup should be better than linear, which makes

sets speedier than lists with a memberchk/2 test. And then they added:

/* lib.esnext.collection.d.ts */
interface Set<T> {
    union<U>(other: ReadonlySetLike<U>): Set<T | U>;
    intersection<U>(other: ReadonlySetLike<U>): Set<T & U>;
    difference<U>(other: ReadonlySetLike<U>): Set<T>;
    symmetricDifference<U>(other: ReadonlySetLike<U>): Set<T | U>;
    /* etc. */
}

Why is it like ordsets? Well, almost: it doesn’t work for compounds, but it
shares the trait with Prolog logic programming that it creates a new result
object or shares input, and doesn’t do inline modification, i.e. Readonly
args. The fancy thing is how the TypeScript inferred types are annotated.

BTW: The paper has some taxonomy graphs, can one trust them?

That is the ultimate safe way. Just relying on the current implementation is fairly safe, but not guaranteed.

That is IMO a different thing. append/3 describes a relation. Here we are talking about how some piece of code acts on unanticipated input. Consider phrase/2,3. Once upon a time that could be used with non-lists and was used for general threaded variables occasionally. The introduction of strings in SWI-Prolog made unintended calls that would just fail too likely, so now phrase/2,3 does a type check. Likewise, the ordset library could be implemented differently, or static checking could be added that would validate that the input is indeed a valid ordset.

append/3 is not a great example, sorry. A better example is when I want to add or subtract a number to or from another. One way to do that is to write something like:

% Version 1
add_or_subtract(+,X,Y,Z):-
  Z is X + Y.
add_or_subtract(-,X,Y,Z):-
  Z is X - Y.

But I can also do it like this:

% Version 2
add_or_subtract(S,X,Y,Z):-
  Op =.. [S,X,Y]
  ,Z is Op.

And now, suddenly, my predicate works not only on addition and subtraction but also on every other arity-2 arithmetic function. Here, I didn’t make my code more efficient, but I made it shorter and more general, and it turns out to be in a sense maximally general. That’s the kind of quirk of Prolog I was trying to point to.
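For example, assuming the Version 2 definition above, operators I never had in mind just work:

```prolog
?- add_or_subtract(+, 2, 3, Z).
Z = 5.

?- add_or_subtract(min, 2, 3, Z).
Z = 2.

?- add_or_subtract(max, 2, 3, Z).
Z = 3.
```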

Now, if I don’t want that maximal generality I have to explicitly restrict S to be in {+,-}, for example like this:

% Version 3
add_or_subtract(S,X,Y,Z):-
  must_be(oneof([+,-]),S)
  ,Op =.. [S,X,Y]
  ,Z is Op.

But, now, what have I achieved and what was my motivation? I’ve made my predicate more special, by adding extra code, or in other words I’ve done extra work to take away its generality.

But, why? I’m sure there are use cases where restricting the value of S to {+,-} may be necessary. But there are also cases where it’s not. If I go with Version 2, I can add that predicate to a library and reuse it in both kinds of use cases.

Whenever I need strict type-checking, I can do it before calling add_or_subtract/4, and in fact this type-checking can restrict the values of S to a different set of values than {+,-}. For instance, for one project I might call add_or_subtract/4 like this:

... 
,must_be(oneof([+,-]),S)
,add_or_subtract(S,X,Y,Z)
, ... 

And for another I might call it like this:

... 
,must_be(oneof([arc,tan,cos]),S)
,add_or_subtract(S,X,Y,Z)
, ... 

At that point, the only problem that remains is that the predicate is misnamed: it should be named something like apply_arithmetic_function instead.

So my argument is that I can’t see the utility of modifying code to take away useful behaviour. If the motivation is to make the code agree with the documentation, it makes more sense to me (in this particular case, and not as a general rule) to update the documentation to better describe the behaviour of the code, if we can agree that the code is not doing something that nobody wants it to do. If the code turns out to behave in a way that there are use cases for (and I think in this case it does) then I think it makes more sense to keep the code the way it is, or improve it without breaking its undocumented behaviour, and instead document that behaviour.

And, again, what’s the alternative, if one wants the multiset behaviour found in library(ordsets) like I do? Copy the library, and rename it. But why duplicate code like that? Or write a new library from scratch, that will end up looking just like the old library. That’s not great either.

It already has some implicit type checking, through the use of
Picat-inspired subsumption-based matching (=>)/2, like here:

union2([], H1, T1, Union) =>
    Union = [H1|T1].
union2([H2|T2], H1, T1, Union) =>
    compare(Order, H1, H2),
    union3(Order, H1, T1, H2, T2, Union).

But it doesn’t automatically translate to meaningful error
messages, a little dilemma, since you get neither the common
instantiation error nor the common type error for lists:

?- ord_union([foo, bar], baz, X).
ERROR: No rule matches ordsets:union2(baz,foo,[bar],_168)

?- ord_union([foo, bar], _, X).
ERROR: No rule matches ordsets:union2(_2460,foo,[bar],_2466)

This is quite a dilemma: although the Picat-inspired (=>)/2
leads to more robust code, it lacks the common standard
of error messages that usually appear.

For example must_be/2 gives other error messages:

?- must_be(list, baz).
ERROR: Type error: `list' expected, found `baz' (an atom)

?- must_be(list, _).
ERROR: Arguments are not sufficiently instantiated

Thanks for the reference.

That’s fancy indeed. Reminds me more of C# than JavaScript.

Honestly, I’ve no idea. I’ve kept well away from category theory. It reminds me a bit of collecting Pokémon. Too much work, and for what?

This hadn’t occurred to me but the use of =>/2 implies that the library is not as old as I thought, it must have been re-worked more recently. I guess I could have a look at the repo. I thought this was an ancient library that nobody would touch anymore, kind of like the Cobol code running on banks’ mainframes (don’t ask). I guess I was wrong.

I am honored that you think it’s category theory, basically abstract
nonsense by some expert mathematicians. But it’s not category theory.
Category theory is typically invoked when one wants to go beyond

traditional sets, but the definition has only:

Since the carrier is a set, you can use classical mathematical logic
enriched by ZFC. ZFC even has function spaces; that’s nothing unique
to category theory. But of course you could also look at algebraic

structures through the category theory lens. You might indeed see
Pokémons or white elephants or pink unicorns.

I guess my knowledge stops at FOL. I have a very vague idea even about ZFC. You could argue I’m stuck somewhere in the 1920’s, between Hilbert and Gödel, in terms of my outlook on mathematics. If asked, I say that I just like to play it safe :stuck_out_tongue:

FOL is also enough. FOL invokes the idea of the
signature of a structure all the time:

Signature (logic)
https://en.wikipedia.org/wiki/Signature_(logic)

It’s just a catalogue of the domains and their constants,
functions and relations.

These things were discussed and invented before
1920; see Peirce’s relations, Russell’s complexes, etc.

That, I can work with. But what is the relation with e.g. semirings? FOL functions don’t really work like algebraic functions.

They are supposed to be total, just like + and *
in a semiring. So if you write in FOL:

∀x ∀y ∀z (x * (y + z) = x * y + x * z)

And if you don’t use multi-sorted FOL, there is
one domain of discourse D and:

* : D x D -> D

+ : D x D -> D

But things are tricky from inside your FOL formulas:
D is only a class, not a set. You cannot reify it.

But everyday mathematics is more flexible.