Understanding Gradients in Prolog

Now suppose I have real values a1, .., an,
and I want to maximize L(p) on the interval [0,1]:

L(p) = p Π_j (1 - p^aj)

If a1 = .. = an = 1 the solution is p = 1/(n+1), but what about the general case?
How would I do that in Prolog, with or without a gradient?
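For concreteness, the objective can be written as a plain Prolog predicate (lobj/3 is just a name I made up for illustration), with the exponents a1..an passed as a list:

```prolog
% lobj(+As, +P, -L): L(P) = P * product over Aj in As of (1 - P**Aj).
lobj([], P, P).
lobj([A|As], P, L) :-
    lobj(As, P, L0),
    L is L0 * (1 - P**A).
```

For example, lobj([2,3], P, L) computes the objective P*(1-P**2)*(1-P**3) used below.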

P.S.: I can use CLP(BNR) for SWI-Prolog like this:

?- current_prolog_flag(clpBNR_default_precision,P).
P = 6.

?- P::real(0,1), time(global_maximum(P*(1-P**2)*(1-P**3), Q)), midpoint(P, R).
% 2,632,164 inferences, 0.844 CPU in 0.881 seconds (96% CPU, 3119602 Lips)
R = 0.5749041546976932,
P::real(0.48290488838067125, 0.666903421014715),
Q:: 0.328600... .

But the result R = 0.5749041546976932 is not near the maximizer.
Wolfram Alpha gives me P = 0.484755.. and Q = 0.3286.. . So
since Q is nevertheless correct, maybe there is a bug in CLP(BNR),
i.e. some problem in returning the result of the maximization?

Ok, maybe I was misled by this paper, which also talks about
gradients. But the function L(p) has only one parameter, so there
is no vector gradient anyway:

Lifted discriminative learning of probabilistic logic programs
A. N. Fadja and F. Riguzzi - 2018
https://link.springer.com/article/10.1007/s10994-018-5750-0

Even ChatGPT didn’t know that the derivative is not needed.
One has to look closely at the curve: is it unimodal?

Now one can use ternary search:

% trisect(+A, +B, :F, -X): ternary search for the maximizer of a
% unimodal function F on [A,B], stopping at interval width 1e-6.
trisect(A, B, _, X) :- B - A < 1e-6, !,
   X is (A+B)/2.
trisect(A, B, F, X) :-
   P is (2*A+B)/3,           % left third point
   call(F, P, U),
   Q is (A+2*B)/3,           % right third point
   call(F, Q, V),
   (U < V ->
      trisect(P, B, F, X);   % maximum cannot lie in [A,P]
      trisect(A, Q, F, X)).  % maximum cannot lie in [Q,B]

This is very fast, needs no derivative, and reaches the requested precision:

fun(P, F) :-
   F is P*(1-P**2)*(1-P**3).

?- time(trisect(0,1,fun,P)), call(fun,P,Q).
% 332 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
P = 0.4847548172992615,
Q = 0.32860038445085504.

I don’t think there is a bug in clpBNR, but a misunderstanding on your part:
the predicate global_maximum(P*(1-P**2)*(1-P**3), Q) tells clpBNR to find the maximum Q with a precision of 6 digits, that’s all.
If you want to narrow P, you can tell it to do so with the solve/1 predicate:

?- P::real(0,1), time(global_maximum(P*(1-P**2)*(1-P**3), Q)), solve(P).
% 2,361,153 inferences, 0.199 CPU in 0.199 seconds (100% CPU, 11866925 Lips)
P:: 0.484263...,
Q:: 0.3285999... .

This is a classic “problem” of interval arithmetic where the interval will blow up when used with exponents.

Ok, gotcha, but using solve/1 roughly doubles the runtime. It jumps from
0.844 secs to 1.781 secs, since solve/1 forces computing a root of
a function, i.e. f(x) - ymax = 0, where f is the non-trivial objective function:

?- P::real(0,1), time((global_maximum(P*(1-P**2)*(1-P**3), Q), solve(P))).
% 5,062,053 inferences, 1.781 CPU in 1.802 seconds (99% CPU, 2841854 Lips)
P:: 0.4843558...,
Q:: 0.3286000... .

With ternary search the time is even below 0.001 secs. But one would
need an expression analysis and/or a maximizer option, since not all
functions are unimodal. Unimodality is not even the same as concavity.
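To illustrate that last point, here is a toy example of mine: a function that is unimodal on [0,1] but not concave (it has a cusp at P = 0.3), on which ternary search still finds the maximizer. trisect/4 is repeated so the snippet stands alone:

```prolog
% Unimodal on [0,1] but not concave: a cusp at P = 0.3.
cusp(P, F) :- F is -sqrt(abs(P - 0.3)).

% trisect/4 as above, repeated so the snippet is self-contained.
trisect(A, B, _, X) :- B - A < 1e-6, !, X is (A+B)/2.
trisect(A, B, F, X) :-
    P is (2*A+B)/3, call(F, P, U),
    Q is (A+2*B)/3, call(F, Q, V),
    (U < V -> trisect(P, B, F, X) ; trisect(A, Q, F, X)).
```

?- trisect(0, 1, cusp, X). yields X ≈ 0.3, even though no derivative exists at the maximizer.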

Yeah. Although @ridgeworks will know better about this, I believe the solve predicate uses a very simple search strategy, where it tries to narrow the interval bit by bit from the ends.

You can always combine your trisect strategy with clpBNR.

I don’t know. Do you mean doing this here?

fun(P, F) :-
   ... /* something clpBNR defined */ ...

But what if I want to bring trisect to CLP(BNR) ?

So using clpBNR:

  1. global_maximum searches for the maximum value of the objective function and applies that as a constraint, but does not fully constrain the values of any variables in the function. It does this so that, if there are multiple values for the “maximizer(s)”, a non-deterministic search (e.g., solve) can be used to find all such values.
  2. If you are only interested in one such set of maximizer(s), global_maximize will unify the maximizer(s) with the value(s) that determined the maximum value of the function, in roughly the same time as global_maximum without the solve, e.g.,
?- P::real(0,1), time(global_maximize(P*(1-P**2)*(1-P**3), Q)).
% 2,519,681 inferences, 0.642 CPU in 0.644 seconds (100% CPU, 3927557 Lips)
P:: 0.484755...,
Q:: 0.328600... .
  3. The non-deterministic solve predicate that comes with clpBNR is a simple bifurcating search on an interval (or list of intervals) that (recursively) splits it at a point which is not a solution. The deterministic predicate absolve nibbles at the ends of the interval to attempt additional narrowing that may not be achieved with either propagation or solve. Both solve and absolve know nothing about the system of constraints containing the intervals - they’re brute-force but general-purpose algorithms.
  4. There is nothing special about the builtin search predicates; they could equally well be done in user code, so I don’t see any impediment to implementing custom search predicates better suited to particular problems.
  5. Ternary search appears to be an alternative search for a min/max value. The builtin global minimum/maximum predicates use the Moore-Skelboe algorithm from New Computer Methods for Global Optimization. I haven’t done any comparison, so I don’t profess to know the strengths/weaknesses of the two. Note however that the builtin (and probably also the ternary search) has a precision/performance tradeoff, e.g.,
?- P::real(0,1), time(global_maximize(P*(1-P**2)*(1-P**3), Q, 4)).
% 215,321 inferences, 0.056 CPU in 0.057 seconds (99% CPU, 3820254 Lips)
P:: 0.484...,
Q:: 0.3286... .
  6. If you know that the global optimum is also a local optimum, the local_maxima predicate from library(clpBNR_toolkit) can improve performance and precision. It does so by applying additional constraints based on the derivative of the objective function. This adds some overhead, so altering the precision parameter has much less impact:
?- P::real(0,1), OF = P*(1-P**2)*(1-P**3),local_maxima(OF),time(global_maximize(OF,Q)).
% 402,609 inferences, 0.084 CPU in 0.085 seconds (100% CPU, 4774831 Lips)
OF = P*(1-P**2)*(1-P**3),
P:: 0.48475471541839...,
Q:: 0.3286003844508... .

?- P::real(0,1), OF = P*(1-P**2)*(1-P**3),local_maxima(OF),time(global_maximize(OF,Q,2)).
% 300,543 inferences, 0.063 CPU in 0.064 seconds (99% CPU, 4737885 Lips)
OF = P*(1-P**2)*(1-P**3),
P:: 0.4847547...,
Q:: 0.3286004... .
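As an aside on the description of solve above: the bifurcating idea, stripped of interval arithmetic, is just recursive splitting. A toy caricature in plain Prolog (bisect/4 is my sketch, not clpBNR’s actual code):

```prolog
% bisect(+A, +B, :F, -X): recursively split [A,B] at the midpoint,
% keeping a half across which F changes sign.
bisect(A, B, _, X) :- B - A < 1e-9, !, X is (A+B)/2.
bisect(A, B, F, X) :-
    M is (A+B)/2,
    call(F, A, FA),
    call(F, M, FM),
    (  sign(FA) =\= sign(FM)
    -> bisect(A, M, F, X)
    ;  bisect(M, B, F, X)
    ).
```

E.g. bisect(1, 2, [X,Y]>>(Y is X*X - 2), R) narrows in on sqrt(2); the real solve additionally lets constraint propagation prune each half.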

HTH.

But somehow they don’t agree:

?- X::real(0,pi/4), time((global_maximum(X*sin(4*X), Y), solve(Y))).
% 1,914,172 inferences, 0.578 CPU in 0.579 seconds (100% CPU, 3311000 Lips)
X:: 0.50...,
Y:: 0.454926... .

?- X::real(0,pi/4), time(global_maximize(X*sin(4*X), Y)).
% 1,285,825 inferences, 0.375 CPU in 0.381 seconds (98% CPU, 3428867 Lips)
X:: 0.507189...,
Y:: 0.454926... .

Is this to be expected? The two X values are different.

What does it do when I have a 2-dimensional problem?

?- V::real(0,1), W::real(0,1), global_minimum(V**2+W**2+(1-V-W)**2, E).
%%% hangs ?

Does it give up when there is more than one parameter?

They’re different but not inconsistent; the second X interval is contained in the first:

?- X::real(0,pi/4), time((global_maximum(X*sin(4*X), Y), solve(Y))), writeln(X).
% 1,936,549 inferences, 0.389 CPU in 0.391 seconds (99% CPU, 4980247 Lips)
_208{real(0.5068002638043855,0.5075827918577587)}
X:: 0.50...,
Y:: 0.454926... .

?- X::real(0,pi/4), time(global_maximize(X*sin(4*X), Y)), writeln(X).
% 1,308,217 inferences, 0.266 CPU in 0.267 seconds (99% CPU, 4921865 Lips)
_198{real(0.5071892508109634,0.5071896253179917)}
X:: 0.507189...,
Y:: 0.454926... .

They are different because the narrowing process applied to X is different. The first uses solve to search for the value (or values) of X that satisfy the computed maximum value of Y.

In the second case X is set to a value that is used to generate the max value of Y. The difference is subtle, but the second can’t be used if one is interested in finding more than one maximizer. Unless that is the case, it’s better to use global_maximize.

No, but that particular example takes a long time at the default precision; using a precision value of 3:

?- V::real(0,1), W::real(0,1), global_minimum(V**2+W**2+(1-V-W)**2, E,3).
V::real(0, 0.577773167543888),
W::real(0, 0.577773167543888),
E:: 0.33... .

?- V::real(0,1), W::real(0,1), global_minimize(V**2+W**2+(1-V-W)**2, E,3).
V:: 0.333...,
W:: 0.333...,
E:: 0.33... .

Again, using the local_minima constraint (because the OF is differentiable in both variables) is a big improvement:

?- V::real(0,1), W::real(0,1), OF = V**2+W**2+(1-V-W)**2, local_minima(OF), global_minimum(OF,E).
OF = V**2+W**2+(1-V-W)**2,
V:: 0.3333333333333333...,
W:: 0.3333333333333333...,
E:: 0.333333333333333... .

You can solve it directly with the Moore–Penrose inverse, since
it is a linear least squares problem. You can also solve it iteratively with
various gradient methods, using the already present L2 norm as a loss
function. What method does clp(BNR) currently use?
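To make “various gradient methods” concrete for this objective: plain gradient descent, hand-coded (gd/5 is my own naming; this is a sketch and has nothing to do with how clpBNR works internally):

```prolog
% Gradient descent on E(V,W) = V**2 + W**2 + (1-V-W)**2,
% with fixed step size 0.1; gd(N, V0, W0, V, W) runs N steps.
gd(0, V, W, V, W) :- !.
gd(N, V0, W0, V, W) :-
    GV is 2*V0 - 2*(1 - V0 - W0),   % dE/dV
    GW is 2*W0 - 2*(1 - V0 - W0),   % dE/dW
    V1 is V0 - 0.1*GV,
    W1 is W0 - 0.1*GW,
    M is N - 1,
    gd(M, V1, W1, V, W).
```

Starting from (0,0), this converges to V = W = 1/3, matching the clpBNR result above.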

Not sure I understand the question, but the set of global_* predicates use the Moore-Skelboe algorithm referenced above. The local_m* predicates without additional explicit constraints just add the constraint(s) that the derivative of the objective function with respect to each variable in the function expression is 0 at a local minimum or maximum.
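For the earlier 1-D example, that derivative constraint can be written out by hand (my own expansion, not clpBNR output): f(P) = P*(1-P**2)*(1-P**3) = P - P**3 - P**4 + P**6, so f'(P) = 1 - 3*P**2 - 4*P**3 + 6*P**5, and local_maxima in effect adds f'(P) == 0 as a constraint:

```prolog
% Hand-expanded derivative of f(P) = P*(1-P**2)*(1-P**3).
dfun(P, D) :- D is 1 - 3*P**2 - 4*P**3 + 6*P**5.
```

At the maximizer P ≈ 0.484755 found above, dfun gives a value near zero; that is the extra condition propagation can prune against.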

There is a version of local_m* which permits additional constraints to be defined at the local optima; these constraints are derived from the Karush–Kuhn–Tucker conditions. The example from library(clpBNR_toolkit)'s pldoc:

?- [X1,X2]::real, OF = X1**4*exp(-0.01*(X1*X2)**2),
local_maxima(OF,{2*X1**2+X2**2==10}), global_maximum(OF,Z), solve([X1,X2]).
OF = X1**4*exp(-0.01*(X1*X2)**2),
X1:: -2.23606797749979...,
X2:: 0.0000000000000000...,
Z:: 25.0000000000000... ;
OF = X1**4*exp(-0.01*(X1*X2)**2),
X1:: 2.23606797749979...,
X2:: 0.0000000000000000...,
Z:: 25.0000000000000... .

The Global Optimization chapter of the CLP(BNR) User Guide might also help to fill in some blanks.

I had a look at this algorithm a few days ago, but didn’t get
how it also works in the multi-dimensional case.

So in the end, what is it? KKT conditions combined somehow
with the MS algorithm? I still don’t understand what your system
does, and why the local_minima constraint is such a big improvement.
How does the constraint change the search algorithm
and improve it? It doesn’t change the shape of the objective
function. How does it change the search?

Is the key here your Taylor Contractor stuff?

The interaction with the MS algorithm is through the additional constraints imposed by local_m* on the variables in the objective function before global_* is called. The MS search runs exactly the same except that failure (via local propagation) occurs faster due to these additional constraints on the variables. As an added bonus, the results are “sharpened” as part of the constraint propagation process.

So it’s just the classic argument: improve performance by using constraints to prune the search while generating answers, rather than a generate-and-test approach.

This is unrelated to global optimization, but maybe there’s a way of composing them; I haven’t thought about it.