Thanks, but you underestimate the complexity of the endeavour. I suck at C.
Though I guess that would be one nice C exercise for me.
Well, see, the thing is I don't think library(ordsets) needs any modification at all. Neither for my use case, nor for any use case that requires multiset operations. My working hypothesis is that all the set operations in library(ordsets) also happen to be correct multiset operations.
That's how the intersection of two multisets should work: it should give you a new set with the elements that are in both sets, and so that each element in the new set has multiplicity (i.e. number of repetitions) equal to the smallest multiplicity of that element in the original sets.
I think the rest of the library works the same way too. I'd have to check, but so far it looks like every other operation is dual-purpose: it works on both sets and multisets.
Granted, that's just the result of my quick eyeballing of a few of the library(ordsets) predicates. To make sure would take more effort than I've yet put in. Perhaps that's what I should spend some time on, then? I could even write some unit tests to make it easier.
Ah, this just clicked. The sum of two multisets is a different operation than the union of two multisets, IIUC. That operation is missing from library(ordsets), but ord_union/3 is, AFAICT, correct as a multiset union.
Ah, that's why you want the "undefined" behaviour to be defined. Given that the code is old, doesn't seem to be buggy, and does what you want, documenting that it works for multisets seems reasonable. You'd want to add list_to_ord_multiset(List, MSet) :- msort(List, MSet) (also ord_set_to_ord_multiset/2 for completeness), add some comments to the various predicates, and add some multiset tests to tests/library/test_ordsets.pl.
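For concreteness, here is a minimal sketch of the suggested helper, together with a quick demonstration of the hypothesized multiset behaviour of ord_union/3 (list_to_ord_multiset/2 is the name proposed above, not an existing library predicate):

```prolog
:- use_module(library(ordsets)).

%!  list_to_ord_multiset(+List, -MSet) is det.
%
%   True when MSet is an ordered multiset (a sorted list that
%   keeps duplicates) holding the elements of List.
list_to_ord_multiset(List, MSet) :-
    msort(List, MSet).
```

Example session:

```prolog
?- list_to_ord_multiset([b,a,b], M).
M = [a, b, b].

?- ord_union([a,a,b], [a,b,b], U).   % max multiplicity per element
U = [a, a, b, b].
```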
Yeah, somehow min and max over multisets form a lattice, and min and sum
over multisets form a semiring. You can do a kind of fuzzy-logic
Datalog construction all the time.
If you have a lattice or semiring C, the values, then the function
space D → C, the key-value pairs, is also a lattice or semiring.
You can verify all axioms again. For example, mode-directed
tabling is one big key-value machinery, compensating for
the deduplication via tabling, which erases all multiplicity
information known from ordinary Prolog execution:
Currently it allows plugging in a lattice, but since it also has sum,
one could maybe make the sum pluggable as well, by plugging in a
semiring. The numbers semiring goes by the name tropical semiring.
That's right! Sorry, I think I confused the point in my first comments, but then I was confused myself and still am unsure. I think those are good suggestions and I'll see if I can find the time to follow them.
Heh. I am a category theory illiterate, sorry. What does it mean, in practical terms, that something is a semiring?
The connection to tabling sounds really interesting.
Well the tropical semiring (C, min, +) has this distributivity:
a + min(b, c) = min(a + b, a + c)
So it's very close to a semiring (C, +, *), an algebraic structure
from mathematical logic, consisting of a set with two binary operations
(addition + and multiplication *), which has this distributivity:
a * (b + c) = a * b + a * c.
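The tropical law above is easy to sanity-check in plain Prolog arithmetic (the numbers here are just illustrative, not from the thread):

```prolog
% Check a + min(b, c) =:= min(a + b, a + c) for a given triple.
tropical_distributes(A, B, C) :-
    L is A + min(B, C),
    R is min(A + B, A + C),
    L =:= R.
```

```prolog
?- tropical_distributes(3, 5, 2).
true.
```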
You could use such laws to derive test cases for the multiset
operations. The multiset operations are more powerful than
tabling, since tabling would see the data through the lens
of variant keys. But semiring distributivity also holds, right?
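Following that suggestion, here is a sketch of how the distributivity law could be turned into a test for the multiset reading of library(ordsets). It assumes ord_intersection/3 takes the minimum multiplicity (as hypothesized earlier in the thread) and uses msort/2 over a concatenation as the multiset sum; msum/3 is a hypothetical helper, not a library predicate:

```prolog
:- use_module(library(ordsets)).
:- use_module(library(lists)).

% Multiset sum: multiplicities add up.
msum(A, B, Sum) :-
    append(A, B, AB),
    msort(AB, Sum).

% A (+) (B /\ C) == (A (+) B) /\ (A (+) C), mirroring
% a + min(b, c) = min(a + b, a + c) per element.
sum_distributes_over_intersection(A, B, C) :-
    ord_intersection(B, C, BC),
    msum(A, BC, Left),
    msum(A, B, AB),
    msum(A, C, AC),
    ord_intersection(AB, AC, Right),
    Left == Right.
```

```prolog
?- sum_distributes_over_intersection([a,a], [a,b], [b,b]).
true.
```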
:- table p(_, min).
p(X, Y) :- b(X, Y).
p(X, Y) :- c(X, Y).

:- table q(_, sum).
q(X, Y) :- a(X, Y).
q(X, Y) :- p(X, Y).

?- q(X, Y).
X = bar, Y = 5 ;
X = foo, Y = 7 ;
X = baz, Y = 6.

:- table p1(_, sum).
p1(X, Y) :- a(X, Y).
p1(X, Y) :- b(X, Y).

:- table p2(_, sum).
p2(X, Y) :- a(X, Y).
p2(X, Y) :- c(X, Y).

:- table q2(_, min).
q2(X, Y) :- p1(X, Y).
q2(X, Y) :- p2(X, Y).

?- q2(X, Y).
X = bar, Y = 5 ;
X = foo, Y = 7 ;
X = baz, Y = 6.
I am inclined to disagree. ordsets is a nice consistent library that could be optimized a lot further, even completely rewriting it using low-level primitives. I'd rather keep this option open. Then, multisets are probably best represented as a list of pairs Term-Count and supported by a proper library.
Of course, anyone can take the risk and use the ordsets library beyond its documentation. A simple assertion/1 call in a directive could be enough to warn that things changed.
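Such a guard could look like the following (a hypothetical load-time check, not existing code), which fails loudly if a future rewrite stops preserving duplicates:

```prolog
:- use_module(library(ordsets)).

% Fails at load time if ord_union/3 no longer behaves as a
% multiset union, i.e. if duplicates stop being preserved.
:- assertion(ord_union([a,a], [a,b], [a,a,b])).
```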
What would change in that case? I still think (but I still need to check also) that the reason the library works both ways (on ordered lists with either unique elements or with duplicates) is its efficient implementation that exploits list ordering.
What I mean is that the natural way to walk through a pair of ordered lists efficiently is the same whether the lists have duplicates or not. "Natural" is a bit hard to define, but what I mean is that it would take extra code to force a predicate to only accept a list with unique elements (for example, you'd have to keep track of the last element seen and check that it's different from the current element).
I think this extra code is missing from the library because it is not necessary to ensure the library works correctly on lists with unique elements, and it's hard to justify writing extra code to make an implementation less general, let alone to notice that it is more general in the first place (usually we check that the code does what its spec says, not that it doesn't do a bit of extra stuff that doesn't get in the way of its main purpose).
On the other hand, I now seem to remember an older discussion, probably not on this incarnation of the mailing list, about representing multisets as key-value pairs, like you say and like @Johnny_Rotten suggested. I wonder where I saw this before. It may even have been in a textbook. Craft of Prolog? Not sure.
Well, if you're saying the implementation can change then I guess the best thing for me is to keep a local copy of the library to make sure I don't lose the functionality. It sucks a little, but at least I think I can distribute it with my code if I respect the license? Not that I'm currently planning to distribute the code, but I may put it on github at some point.
Anyway, the main reason I've raised this is that I think it's an interesting quirk of Prolog: sometimes you write a bit of code in the way that feels natural and is most efficient and it turns out to not only solve your immediate problem but also the generalisation of your immediate problem. Kind of like the way append/3 doesn't just append lists but also splits them. I suppose that's what you get for having a language where relations, rather than functions, are the first-class citizens. I'd like to be able to point to library(ordsets) as an example of this peculiarity of Prolog.
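For the record, the append/3 behaviour alluded to: the same relation both concatenates lists and nondeterministically splits them.

```prolog
?- append([a], [b,c], Xs).
Xs = [a, b, c].

?- append(Front, Back, [a,b]).
Front = [], Back = [a, b] ;
Front = [a], Back = [b] ;
Front = [a, b], Back = [].
```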
In general, OO languages have a little advantage in keeping things
opaque, like changing an implementation, since they have language
support to publish an interface. For example, JavaScript has
recently created a kind of library(ordsets), basically on top of its Set()
data structure, which only specifies, for the data structure itself, input
order and better-than-linear lookup, which makes it
speedier than lists with a memberchk/2 test. And then they added:
Why is it like ordsets? Well, almost; it doesn't work for compounds, but it
shares the trait with Prolog logic programming that it creates a new result
object or shares input, and doesn't do inline modification, i.e. read-only
args. The fancy thing is how the TypeScript inferred types are annotated.
BTW: The paper has some taxonomy graphs, can one trust them?
That is the ultimate safe way. Just relying on the current implementation is fairly safe, but not guaranteed.
That is IMO a different thing. append/3 describes a relation. Here we are talking about how some piece of code acts on unanticipated input. Consider phrase/2,3. Once upon a time that could be used with non-lists and was used for general threaded variables occasionally. The introduction of strings in SWI-Prolog made unintended calls that would just fail too likely, so now phrase/2,3 does a type check. Likewise, the ordset library could be implemented differently, or static checking could be added that would validate that the input is indeed a valid ordset.
append/3 is not a great example, sorry. A better example is when I want to add or subtract a number to or from another. One way to do that is to write something like:
% Version 1
add_or_subtract(+, X, Y, Z) :-
    Z is X + Y.
add_or_subtract(-, X, Y, Z) :-
    Z is X - Y.
But I can also do it like this:
% Version 2
add_or_subtract(S, X, Y, Z) :-
    Op =.. [S, X, Y],
    Z is Op.
And now, suddenly, my predicate works not only on addition and subtraction but also on every other arity-2 arithmetic function. Here, I didn't make my code more efficient, but I made it shorter and more general, and it turns out to be in a sense maximally general. That's the kind of quirk of Prolog I was trying to point to.
Now, if I don't want that maximal generality I have to explicitly restrict S to be in {+,-}, for example like this:
% Version 3
add_or_subtract(S, X, Y, Z) :-
    must_be(oneof([+,-]), S),
    Op =.. [S, X, Y],
    Z is Op.
But, now, what have I achieved and what was my motivation? I've made my predicate more special by adding extra code; in other words, I've done extra work to take away its generality.
But, why? I'm sure there are use cases where restricting the value of S to {+,-} may be necessary. But there are also cases where it's not. If I go with Version 2, I can add that predicate to a library and reuse it in both kinds of use cases.
Whenever I need strict type-checking, I can do it before calling add_or_subtract/4, and in fact this type-checking can restrict the values of S to a different set of values than {+,-}. For instance, for one project I might call add_or_subtract/4 like this:
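The concrete example appears to have been lost here; a hypothetical call along the lines described might look like the following, where both the wrapper name safe_call/4 and the restricted set {min, max} are illustrative only:

```prolog
:- use_module(library(error)).

% Hypothetical caller-side check: restrict S to a different
% set of arity-2 arithmetic functions before the call.
safe_call(S, X, Y, Z) :-
    must_be(oneof([min, max]), S),
    add_or_subtract(S, X, Y, Z).
```

```prolog
?- safe_call(min, 2, 3, Z).
Z = 2.
```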
At that point, the only problem that remains is that the predicate is misnamed: it should be named something like apply_arithmetic_function/4.
So my argument is that I can't see the utility of modifying code to take away useful behaviour. If the motivation is to make the code agree with the documentation, it makes more sense to me (in this particular case, and not as a general rule) to update the documentation to better describe the behaviour of the code, if we can agree that the code is not doing something that nobody wants it to do. If the code turns out to behave in a way that there are use cases for (and I think in this case it does), then I think it makes more sense to keep the code the way it is, or improve it without breaking its undocumented behaviour, and instead document that behaviour.
And, again, what's the alternative, if one wants the multiset behaviour found in library(ordsets), like I do? Copy the library and rename it. But… why duplicate code like that? Or write a new library from scratch that will end up looking just like the old library. That's not great either.
But it doesn't automatically translate to meaningful error
messages; a little dilemma, like the common instantiation
error or the common type error for lists:
?- ord_union([foo, bar], baz, X).
ERROR: No rule matches ordsets:union2(baz,foo,[bar],_168)
?- ord_union([foo, bar], _, X).
ERROR: No rule matches ordsets:union2(_2460,foo,[bar],_2466)
This is quite a dilemma: although the Picat-inspired (=>)/2
leads to more robust code, it lacks the common standard
error messages that usually appear.
For example must_be/2 gives other error messages:
?- must_be(list, baz).
ERROR: Type error: `list' expected, found `baz' (an atom)
?- must_be(list, _).
ERROR: Arguments are not sufficiently instantiated
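One way out of the dilemma, sketched here as a hypothetical wrapper (checked_ord_union/3 is not part of library(ordsets)), is to validate arguments with must_be/2 before handing them to the internal implementation, so callers get the standard ISO errors:

```prolog
:- use_module(library(ordsets)).
:- use_module(library(error)).

% Hypothetical checked front end for ord_union/3.
checked_ord_union(Set1, Set2, Union) :-
    must_be(list, Set1),
    must_be(list, Set2),
    ord_union(Set1, Set2, Union).
```

With this, a call like checked_ord_union([bar, foo], baz, X) raises the familiar type error for baz instead of the internal clause-matching error shown above.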
This hadn't occurred to me but the use of =>/2 implies that the library is not as old as I thought, it must have been re-worked more recently. I guess I could have a look at the repo. I thought this was an ancient library that nobody would touch anymore, kind of like the Cobol code running on banks' mainframes (don't ask). I guess I was wrong.
I am honored that you think it's category theory, basically abstract
nonsense, by some expert mathematicians. But it's not category theory.
Category theory is typically invoked when one wants to go beyond
traditional sets, but the definition has only:
Since the carrier is a set, you can use classical mathematical logic
enriched by ZFC. ZFC even has function spaces; that's nothing unique
to category theory. But you could of course also look at algebraic
structures through the category theory lens. You might indeed see
Pokemons or white Elephants or pink Unicorns.
I guess my knowledge stops at FOL. I have a very vague idea even about ZFC. You could argue I'm stuck somewhere in the 1920s, between Hilbert and Gödel, in terms of my outlook on mathematics. If asked, I say that I just like to play it safe.