I am working on a natural language application. Here, it is natural to have variables range over small enumerated sets of constants, e.g. lexical categories {noun, verb,preposition,...}, features such as number (range over {singular, plural}), person (range over {first,second,third}).
Is there some way I can get the ability to declare that certain variables range over such (finite, enumerated) set of atoms, besides encoding them arbitrarily into 1..n (for some n)?
This seem a very natural ask, I suspect something is out there to support this. (OTOH, it should not be too hard to build this up on attributed variables … basically CLP(FD) without integers :-)…)
I once used the code for Attributed Variables in the documentation to do finite domain over a set of atoms. Unfortunately, I don’t have the code now; and in the end I worked out a different way of solving the problem, using iteration to a fixed point … finite domain constraints can be difficult to debug when the constraints fail.
Yes, this is pretty simple. The attribute contains the set of admissible values. The unification hook succeeds if a concrete value is member of the set, fails if not. If two such variables are unified you compute the intersection. If empty, fail, if one element, choose the element and otherwise set the new set.
Whether it is overall a good idea is hard to say. Constraints are pretty expensive, though it seems SWI-Prolog is doing something wrong here are several Prolog systems do a better (=faster) job. But, sometimes there are other ways to get to the same result, for example using dynamic reordering of goals. If you go this way, some metrics can determine the proper set representation. Options are a list, ordered set, hash table, AVL tree, but also a bitmap. The latter can be interesting if the total set of concrete values is small.