CLP Optimizations Solved! Should I have known?

You might have noticed from other posts that I’ve been getting into CLP.

The problem relates to board-game maps that must satisfy constraints as follows. (Feel free to skip this.)

  1. the map is a grid of known size
  2. each location in the grid has one of several known “properties”
  3. for any property type, it must occur between a minimum and maximum number of times
  4. some properties must occur with spatial relation to others
  5. some properties are constrained by proximity to the map edges
  6. there are guarantees about columns
  7. there are guarantees about rows

Problem:
Generating the constraints is fine.
Labeling (solving) was impossibly slow. I was doing a maplist(labeling(...), Rows) because this was natural for printing results. Even when pre-seeding the map with some values, a solution would take many hours.

Insight/Solution:
After much experimentation and thought, each informing the other, the following emerged:

  1. columns have more constraints imposed on them than rows
  2. while maplist(labeling(...), Rows) was unusably slow, even when pre-seeded, transpose(Rows, Cols), maplist(labeling(...), Cols) was nearly immediate, even without any entries seeded

I know that labeling has parameters that affect variable ordering solution strategy, but none had any meaningful impact in this case.

What I’m wondering is:
Is there a systematic way for choosing a labeling strategy, and labeling ordering? And/Or is there a way of profiling the constraint solver to find a good strategy? (Especially if the constraint solving is so slow as to not provide solutions during profiling).

Or is this essentially witchcraft?

Thanks.

1 Like

Don’t use maplist/2 with labeling/2

Call labeling/2 once, at the end, providing a list of the relevant variables. So it can do its own clever optimizations.

2 Likes

Oh right! Of course.

Bratko’s Prolog for Artificial Intelligence has a really good chapter on solving these kinds of problems, as well as tips for helping the programs return faster. I won’t be able to do the chapter justice in a summary, but it was stuff like having a mental model of how your program is going to backtrack, and helping it minimize that backtracking.

He had several solutions to the N-Queens problem, and demonstrated how a recursive predicate for setting the Y value of each queen in each column would first “recurse” across the board to the rightmost column, then fill in the columns right to left as it backed out again. Each queen only had to consider the queens to the right, the ones that had already been placed, thus minimizing or eliminating backtracking.

There were other tricks like, if you have multiple variables of different-sized domains, state constraints on the smallest domains first. Try to structure things so that whole areas of the problem space are invalidated early, and don’t have to be explored at all.

Anyway, it’s a great book! If you haven’t spent some time thinking about how your program backtracks, I’d recommend doing that.

2 Likes

Other than trial and error, none that I’m aware of. In general, optimal labeling in any CLP seems to be very much problem and data dependent.

For your particular problem it would appear that ordering the variables strictly by number of constraints might be the best strategy, but one that clpfd doesn’t directly support. But you could try sorting your list of variables (using predsort/3 ?) based on fd_degree/2 before labeling with the leftmost (default) option.

1 Like

Presumably you mean in descending order by number of constraints? And what you’re trying to guess is the number of possible values, so you’d want the variables with the least number of possible values labeled first.

That’s what I meant.

Optional. In general, labelling the variable with the least number of possible values isn’t necessarily optimal in terms of pruning. Otherwise, why is first fail an option?

Is there a systematic way for choosing a labeling strategy, and labeling ordering?

There can be some rules of thumbs but labelling is more of an art than a science, i.e you have to try for your particular problem.

you have to try for your particular problem

Yes. So an update for anyone interested.

With the knowledge that the map rules (in my case) often relate a location to it’s neighbours, and that a map will have one location known, I wrote a predicate that orders points on a grid based on their distance from a given point. So a “radial fill” sort.

Solving from a given point outwards drastically improves performance in nearly all cases.