I am using clpfd to solve a math problem. Once setup, I have 4 variables in the range 0..255 and ca 5k constraints.
Trying to find solutions using labeling([A, B, C, D]) in this case does not return a timely answer.
Any idea to approach this?
Is there a way to do concurrent labeling?
Any suggestions welcome
You’d have to look into clpfd itself to see whether there is some way to split the labeling search space into independent subspaces. Then you could do what you want. What you can do is run multiple threads trying different labeling strategies. Finally, possibly you can split your problem into parts where you can simplify/reduce the constraints before combining them?
Just some ideas … One of the things I plan to implement at some point is a “fork” like functionality where you create a new Prolog thread that is simply a clone of a running one. That should be fairly simple, but I don’t know how useful it would be. There is quite a bit of literature about concurrent Prolog (as opposed to threaded). I don’t know whether that provides interesting insights for clpfd labeling.
Rearranging calculations to use intermediate variables, and adding clp constraints to those intermediate variables, can improve performance - such hints are mentioned in clpBNR guide (search for “performance”, e.g. in the Travelling Salesman section).
If you haven’t done so already, see SWI-Prolog -- Optimisation ; labeling/2 supports different strategies that can have an effect on search performance.
Thank you for the hint.
I have already tested “a lot” of the suggested strategies. However, in my case, none of them returns after a reasonable time.
One option I am missing in the labeling, would be trying random values, rather than counting up/down.
It would equally be excellent if there was a debugging option, to see whats going on inside the process.
Thank you for the pointer to the literature; this was already helpful.
Unfortunately, my problem cannot be simplified, which is by design.
It involves a lot of bit rotations, xor and modulo (2^32) calculations, Each of them leaving a lot of constraints
However, I will try harder to simplify.
e.g.
suml_32(Ls, W):- sum(Ls, #=, S), W #= S mod 4294967296.
?- suml_32([A, B], W), A = 33.
A = 33,
33+B#=_A,
_A mod 4294967296#=W,
W in 0..4294967295.
% B is A rotated by 3 bits to the right
?- rotate_right(A, 3, B).
A*536870912#=_A,
A div 8#=_B,
_C#=_B/\4294967295,
B#=_C/_D,
_D#=_A/\4294967295.
Thanks for your insights. Please allow me to comment:
- look inside clpfd itself: unfortunately, calling labeling/2 seems to be quite an opaque process, or is there a way to see what’s going on inside e.g. debug(whatever)?
- trying different labeling strategies: Yes I did, but no luck so far
- split problem into parts thus simplify/reduce the constraints before combining them: unfortunately, I could not find a way to do so. The system is essentially the resolution of f(f(f(…f(x, y…).
One hope I had was to do a “concurrent labeling” e.g. something naively along the idea of:
concurrent_label([ X | Xs], ):-
foreach(
label([X[)
, thread_create_in_pool(pool, concurrent_label(X, Xs))
)
But that needs more knowledge of threading in Prolog.
Anyway, thanks for your support
In this example, the largely unconstrained domain of W
is 0..4294967295
, i.e., over 4 billion values. Searching such a domain is going to be problematical from a performance perspective. bisect
labelling would seem to be the only practical option. And that’s only effective if the values are constrained enough that there are large regions of the domain that can be removed at some of the steps. But as you can see, there are many solutions to this, so the problem is severely under-constrained:
?- Ls=[33,B], sum(Ls, #=, S), W #= S mod 4294967296, labeling([bisect],[W]).
Ls = [33, B],
W = 0,
-33+S#=B,
S mod 4294967296#=0 ;
Ls = [33, B],
W = 1,
-33+S#=B,
S mod 4294967296#=1 ;
Ls = [33, B],
W = 2,
-33+S#=B,
S mod 4294967296#=2 ;
Ls = [33, B],
W = 3,
-33+S#=B,
S mod 4294967296#=3 ;
There will be a performance overhead to concurrency, so I’m not sure it’s going to be that useful on this simple, under-constrained example.
I assume there’s a lot more to your program than this. If you can provide something more complete, including a query that takes a long time, you might get more helpful advice.
That could work if labeling multiple variables is independent, but it is not SWI-Prolog threads are completely independent solvers. They only share the program and some global state like flags and I/O streams, etc.
That’s really what I have feared, and that explains my poor results in my attempts.
But then, is there a way to copy a new set of constraints into the new thread?
e.g. Something along the idea:
concurrent_label([X | Xs]):-
foreach(
label([X])
,(
copy_constraints(Xs, Cs)
, thread_create_in_pool(pool, concurrent_label(X, Cs))
)).
I do fully agree with you, but my problem domain is that large.
The suml_32/2 was an example, and it is only a fraction of my domain of investigation.
For now, I am trying to make “tactical improvements”, e.g. my rotate_right/3 clause below, seems slightly more efficient then putting the “result #= …” on the last line of the code, although the number of constraints is not reduced, just different.
More ideas for improvement welcome!
rotate_right(Input, Bits, Result) :-
C is 32 - Bits,
Result #= ShiftedRight \/ Underflow % Combine shifted and underflowed bits
, ShiftedRight #= (Input >> Bits) /\ 0xffffffff % Right shift with masking
, Underflow #= (Input << C) /\ 0xffffffff % Calculate bits that underflow
.
N.B. The expresion “C is 32 - Bits” is safe in my specific use case, as Bits is always provided!
This is an interesting suggestion. However, rather than recalculating the model again and again (probably expensive), wouldn’t copy_term/2 do the trick as in:
?- A#=B-3, copy_term([A, B], T).
T = [_A, _B],
A+3#=B,
_A+3#=_B.
``
SWI-Prolog message queues behave as copy_term/2, i.e., they copy attributes and preserve internal sharing and cycles of the input term.
I don’t know what original design your are referring to. Message queues as well as e.g., findall/3 stacked terms have always used PL_record() and this has supported cycles when cycle support was added and attributed variables when that was added.
Thus, if you create a thread from a goal that has constraints, the associated attributed variables are copied to the new thread. Note that this does not faithfully copy all constraints. For example, library(chr)
constraints also involve global variables. It should work fine for clp(fd) AFAIK.
So, according to what you are saying, copy_term/2 should be a good replacement rather than recalculating the model in each thread?
With a complex model, this should be more efficient.
Also, if I understand you well, I could use message queues to parallelise my search?
I will definitely have to look into this!
Picat is about 40 times faster than swi-prolog 9.2.9 for clpfd-heavy calculation in 15 puzzle.
How would that help? A message queue is just a FIFO queue of terms where the term that comes out is a copy of the term that goes in. Normally they are used to transfer terms between threads. Note that each thread has its own stacks and (thus) terms cannot be shared between threads.
Surely SWI-Prolog’s clp(fd) implementation is among the slower ones. First of all, the library is completely written in Prolog while the systems with more performant clp(fd) are partially written in C. They typically do also not support unbounded integers. Second, SWI-Prolog’s attributed variable wakeup seems relatively slow. That should be carefully reviewed at some point.