C float comparison based on "extended precision" floats fails unexpectedly

Thanks. If it is true that the mpq_canonicalize() can be omitted, the performance may actually be better.

I’m looking at the definition of mpz_addmul_ui(). I assume I’m missing something obvious, but it doesn’t seem to be doing what the GMP docs state!?

Anyway, please have a look at the need for mpq_canonicalize() and make a PR.

I’m happy to take your word that canonicalization is unnecessary; I removed it and indeed there appears to be no issue. However, there’s only a minimal effect on performance since it is only done once on the final result.

The test set is minimal: rationalize(pi). While absolute performance is dependent on the data, I was only looking for relative performance, so didn’t pursue it.

The code in bf_gmp.h:

static inline void
mpz_addmul_ui(mpz_t r, const mpz_t n1, unsigned long n2)
{ mpz_t add;

  mpz_init(add);
  bf_mul_ui(add, n1, n2, BF_PREC_INF, BF_RNDN);
  bf_add(r, n1, add, BF_PREC_INF, BF_RNDN);
  mpz_clear(add);
}

As I read it, this calculates r = n1 + n1*n2; my interpretation of the spec says it should calculate r = r + n1*n2.

However, looking at the code base I don’t think this ever gets called. The only references are in function ar_add_si in pl-arith.c for adding a long to a Number when that Number is a rational (V_MPQ). However, ar_add_si is only called from builtin predicates between/3 and bounded_number/3 where Number is always an integer. This probably makes it safe to fix (if necessary) but hard to test.


First post in 2023? Happy new year everybody. I don’t know
what is slowing down your implementation. I was doing this test:

?- X is pi, time((between(1,10000,_), rat_rationalize(X, Y), fail; true)).
% 1,010,000 inferences, 0.312 CPU in 0.321 seconds (97% CPU, 3232000 Lips)
X = 3.141592653589793.

?- X is pi, time((between(1,10000,_), rat_rationalize2(X, Y), fail; true)).
% 880,000 inferences, 0.188 CPU in 0.196 seconds (96% CPU, 4693333 Lips)
X = 3.141592653589793.

Some speed doubling! The two versions are:

rat_iter((V,W), (M,N), (P,Q), Flt, Rat) :-
   divmod(V, W, D, U),
   A is D*M+P,
   B is D*N+Q,
   Try is A rdiv B,
   (  (float(Try) =:= Flt ; U == 0)
   -> Rat = Try
   ;  rat_iter((W,U), (A,B), (M,N), Flt, Rat)
   ).

rat_iter2((V,W), (M,N), (P,Q), Flt, Rat) :-
   divmod(V, W, D, U),
   A is D*M+P,
   B is D*N+Q,
   (  (A/B =:= Flt ; U == 0)
   -> Rat is A rdiv B
   ;  rat_iter2((W,U), (A,B), (M,N), Flt, Rat)
   ).
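
For completeness, a minimal sketch of the rat_rationalize/2 and rat_rationalize2/2 wrappers assumed in the timings above (rational/1 converts the float to an exact rational and rational/3 splits it into numerator and denominator; the (1,0) and (0,1) pairs seed the convergents):

rat_rationalize(Flt, Rat) :-
   R is rational(Flt), rational(R, N, D),
   rat_iter((N,D), (1,0), (0,1), Flt, Rat).

rat_rationalize2(Flt, Rat) :-
   R is rational(Flt), rational(R, N, D),
   rat_iter2((N,D), (1,0), (0,1), Flt, Rat).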

In SWI-Prolog (I tested version 9.1.0), A/B is different from
float(A rdiv B); the latter, I assume, is mpz_fdiv(). At least
this test case shows me a difference:

/* SWI-Prolog 9.1.0 */
?- X is float(300000000000000000000009r9).
X = 3.333333333333333e+22.

?- X is 300000000000000000000009/9.
X = 3.3333333333333336e+22.

But I am not sure what is desired. In my implementation of rational
numbers I do not yet have an mpz_fdiv(), so there is currently
no difference between A/B and float(A rdiv B). But for high fidelity,
possibly float(A rdiv B) is needed. There is already a fast path
in the mpz_fdiv() implementation; for example I find the snippet below.
But maybe it’s too late: you have already converted a smallint
to a bigint, which caused an overhead:

if ( sa <= 53 && sb <= 53 )
    return mpz_get_d(a) / mpz_get_d(b);

https://github.com/SWI-Prolog/swipl-devel/blob/eedadd503ee9bee350006dbb7e01b110fdd9d4a7/src/pl-gmp.c

Testing would get easier if I could bypass (rdiv)/2 and build
a rational in SWI-Prolog somehow. I guess the speed doubling
is mostly caused by calling (rdiv)/2 only once in rat_iter2.
But how would I test, in pure Prolog, the omission of
mpq_canonicalize()? In my Prolog system I can call
rational/3 with mode (-, +, +). So I am a little bit at a loss in
fine-tuning and optimizing the rationalize/1 implementation.

Edit 01.01.2023
I guess something between rat_iter and rat_iter2 would be the
current latest prototype, reverse engineered from C back to Prolog.
But it is not possible to completely reverse engineer it into
100% Prolog, since there is no Prolog way in SWI-Prolog
to create a rational number without mpq_canonicalize(). So
some parts of the C code cannot be transferred back to pure
Prolog: the omission of mpq_canonicalize() and the call of
mpz_fdiv(), which is different from A/B, make it difficult to model
the C code back into Prolog so that (rdiv)/2 is never called.

On SWIP, the differences appear to be small:

?- X is pi, time((between(1,10000,_), rat_rationalize(X, Y), fail; true)).
% 310,004 inferences, 0.298 CPU in 0.301 seconds (99% CPU, 1039093 Lips)
X = 3.141592653589793.

?- X is pi, time((between(1,10000,_), rat_rationalize2(X, Y), fail; true)).
% 310,000 inferences, 0.291 CPU in 0.291 seconds (100% CPU, 1066010 Lips)
X = 3.141592653589793.

But you’re probably correct in stating that the termination test for float equivalence comes at a significant cost. mpz_fdiv is a fairly beefy function entailing special cases, rounding modes, etc.

But the C version is fairly quick IMO, particularly using GMP:

?- X is pi, time((between(1,10000,_), Y is rationalize(X), fail; true)).
% 20,000 inferences, 0.037 CPU in 0.037 seconds (100% CPU, 541697 Lips)
X = 3.141592653589793.

That’s less than 4 microseconds (on my 7-year-old hardware) to rationalize pi, which if I recall takes about 14 iteration steps. This seems more than adequate for the identified use case, so I’m not sure further effort to improve it is warranted. Tuning the Prolog versions is a different matter, and dependent on the Prolog implementation, but I’m not sure what the point would be outside academic curiosity.

What do you mean by SWIP? I have never heard
that acronym. I was testing with SWI-Prolog 9.1.0
on a Windows machine. Maybe you used a platform
other than Windows? Why is there such a difference
between platforms?

/* SWI-Prolog 9.1.0, Windows 10 Machine AMD Ryzen */
% 1,010,000 inferences, 0.312 CPU in 0.321 seconds (97% CPU, 3232000 Lips)
% 880,000 inferences, 0.188 CPU in 0.196 seconds (96% CPU, 4693333 Lips)

/* Your 7 year old hardware, what is it? And what SWI-Prolog Version? */
% 310,004 inferences, 0.298 CPU in 0.301 seconds (99% CPU, 1039093 Lips)
% 310,000 inferences, 0.291 CPU in 0.291 seconds (100% CPU, 1066010 Lips)

Why does the number of inferences differ that much? rat_iter2
doesn’t have the Try is A rdiv B goal, therefore it must have at
least 10,000 fewer inferences? But both of your runs have the
same number of inferences. There is something fishy; possibly
you didn’t rename the tail-recursive call from rat_iter to rat_iter2
in rat_iter2. This could be the error and you were running the same
thing twice? Right?

Always thought SWIP = SWI-Prolog.

Anyway, SWI-Prolog V9.1.0, macOS 10.13.6, 3 GHz 8-Core Intel Xeon E5.

Yes, I dropped the ball on this one (unoptimized arithmetic):

?- X is pi, time((between(1,10000,_), rat_rationalize(X, Y), fail; true)).
% 1,010,000 inferences, 0.376 CPU in 0.376 seconds (100% CPU, 2683651 Lips)
X = 3.141592653589793.

?- X is pi, time((between(1,10000,_), rat_rationalize2(X, Y), fail; true)).
% 880,000 inferences, 0.222 CPU in 0.223 seconds (100% CPU, 3956888 Lips)
X = 3.141592653589793.

and optimized (“arithmetic inferences” compiled away):

?- X is pi, time((between(1,10000,_), rat_rationalize(X, Y), fail; true)).
% 310,000 inferences, 0.308 CPU in 0.308 seconds (100% CPU, 1007144 Lips)
X = 3.141592653589793.

?- X is pi, time((between(1,10000,_), rat_rationalize2(X, Y), fail; true)).
% 310,000 inferences, 0.165 CPU in 0.165 seconds (100% CPU, 1877332 Lips)
X = 3.141592653589793.

The prefer_rationals flag impacts the semantics of A/B so that also has an impact.
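
For illustration, a minimal example of the flag’s effect on (/)/2 (assuming a recent SWI-Prolog with rational support; the printed answers are what I would expect with the default output settings):

?- set_prolog_flag(prefer_rationals, false), X is 1/3.
X = 0.3333333333333333.

?- set_prolog_flag(prefer_rationals, true), X is 1/3.
X = 1r3.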

I believe per-iteration canonicalization is avoided if the test is A/B =:= Flt and the prefer_rationals flag is false. I also think this doesn’t affect the C version, since there is no rational generated until the final result and hence no per-iteration canonicalization, i.e., mpz_fdiv(a,b) is equivalent (I think) to A/B with prefer_rationals = false.

Don’t disagree with anything you’ve said. I think this current exercise is a good example of using Prolog to develop/verify the algorithm, which then serves as an executable specification for the C function. That doesn’t mean that the low-level details of data representation or operations must be equivalent. The new C code (not much in common with the existing version) for the iteration structurally resembles the Prolog “specification” (see earlier post), but you’ve already noted some differences. The surrounding setup/cleanup code is quite different; for simplicity, everything (even 0 and 1) is a “bigint”, leaving it to GMP to sort out any optimizations at the expense of a function call. I think that’s the right call in the near term for the available use cases (rationalize(pi) < 4 microseconds), but new requirements may yet come to light.

Yes, the value of pi being used is the closest floating point value to real pi. Given that, the sequence is a match up to the final termination value:

?- rat_rationalize(pi,R).
3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 3, 
R = 245850922r78256779.

A001203 has, for the real (mathematical) value of pi:

    3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, ...

rationalize of the machine floating point value gives:

    3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 3

Converting the two gives:

?- rat_cont([3,7,15,1,292,1,1,1,2,1,3,1,14,2], Rat).
Rat = 165707065r52746197 .

?- rat_cont([3,7,15,1,292,1,1,1,2,1,3,1,14,3], Rat).
Rat = 245850922r78256779 .

The truncation of A001203 gives a pi which is one epsilon above:

?- X is pi.
X = 3.141592653589793.

?- X is 165707065/52746197.
X = 3.1415926535897936.

Edit 03.01.2023
Source code in case somebody wants to play around:

rat_rationalize3(Flt, Rat, List) :-
   R is rational(Flt), rational(R, N, D),
   rat_iter3((N,D), (1,0), (0,1), Flt, Rat, List).

rat_iter3((V,W), (M,N), (P,Q), Flt, Rat, [D|List]) :-
   divmod(V, W, D, U),
   A is D*M+P,
   B is D*N+Q,
   ( (A/B =:= Flt ; U == 0)
       -> Rat is A rdiv B, List=[]
       ;  rat_iter3((W,U), (A,B), (M,N), Flt, Rat, List)
   ).

rat_cont([D], D) :- !.
rat_cont([D,E|List], Rat2) :-
   rat_cont([E|List], Rat),
   Rat2 is D+1 rdiv Rat.
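
As a quick sanity check of rat_cont/2, the first two quotients of pi already give the familiar 22/7 approximation (a minimal example, assuming SWI-Prolog’s default rational syntax):

?- rat_cont([3,7], Rat).
Rat = 22r7.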

Now you can save a further mpq_canonicalize(). Namely here:
I guess rational/1 will call mpq_canonicalize(), right?

rat_rationalize3(Flt, Rat, List) :-
   R is rational(Flt), rational(R, N, D),
   rat_iter3((N,D), (1,0), (0,1), Flt, Rat, List).

Replace the combo of rational/1 and rational/3 by something
else that doesn’t call mpq_canonicalize(). The continued fraction
result will be the same, with and without mpq_canonicalize(),
in this part of the code.

Omitting mpq_canonicalize() in this part of the code means the
underlying GCD algorithm, which is part of the rationalize/1
divmod/4 loop anyway, is done only once and not twice. If we
already do mpq_canonicalize(), we do a full GCD algorithm just
up front, and then the rationalize/1 divmod/4 loop will do a GCD
algorithm again, although not necessarily a full one, because of
the stopping condition. That’s quite some unnecessary redundancy,
the up-front full GCD algorithm. Although it might help reduce
the size of the arguments, I guess the rationalize/1 divmod/4 loop
will do the same anyway, reduce the size of the arguments, only it
will do it on the fly a little later. See also how rationalize/1
does indeed some GCD algorithm:

Continued Fractions on the Stern-Brocot Tree
https://www.cut-the-knot.org/blue/ContinuedFractions.shtml

Edit 03.01.2023
On my system I see indeed a small gain, around 10% speed-up:

/* Jekejeke Prolog 1.5.5, Windows, JDK 19 */

/* With mpq_canonicalize(), i.e. the combo rational/1 and rational/3 */
?- X is pi, time((between(1,10000,_), Y is rationalize(X), fail; true)).
% Threads 610 ms, GC 4 ms, Up 631 ms (Current 01/03/23 12:15:47)
X = 3.141592653589793.

/* Without mpq_canonicalize(), i.e. something else */
?- X is pi, time((between(1,10000,_), Y is rationalize(X), fail; true)).
% Threads 563 ms, GC 5 ms, Up 569 ms (Current 01/03/23 12:29:34)
X = 3.141592653589793.

The “something else” is this:

rat_start(V#W, V, W) :- !.
rat_start(F, V, W) :-
   sys_float_mantissa(F, M),
   sys_float_exponent(F, E),
   sys_float_radix(F, B),
   (E < 0 ->
       V = M,
       user: -(E, H),
       user: ^(B, H, W);
       user: ^(B, E, H),
       user: *(M, H, V),
       W = 1).

It will deliver this numerator and denominator by decomposing the float
into mantissa and exponent, without reducing by the GCD. Already for the
machine floating point value 0.1, omitting mpq_canonicalize() gives a
slightly different starting pair, scaled by a common factor of 2 in this example:

?- R is rational(0.1), rational(R, V, W).
V = 3602879701896397, W = 36028797018963968.

?- rat_start(0.1, V, W).
V = 7205759403792794, W = 72057594037927936.

But there is no harm: rationalize/1 still does the same thing, because the
D in divmod/4 of the new rationalize/1 algorithm will be the same, even
if the arguments are both non-canonical, i.e. scaled by some common factor:

?- X is rationalize(0.1).
X = 1#10.
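
A hedged way to check that claim in Prolog, using the rat_iter3/6 definition above and the two starting pairs shown (assuming the default prefer_rationals setting): both calls should yield the same rational and the same quotient list.

?- rat_iter3((3602879701896397,36028797018963968), (1,0), (0,1), 0.1, R1, L1),
   rat_iter3((7205759403792794,72057594037927936), (1,0), (0,1), 0.1, R2, L2),
   R1 == R2, L1 == L2.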

Almost certainly, but that detail is buried in the GMP function mpq_set_d which sets a rational fraction from a floating point value. It’s called from both the rational arithmetic function and the new C implementation of rationalize.

There may be faster ways of creating the starting “rational” (not necessarily canonicalized) but I can’t think of any easier, so I’m inclined to leave it until some new use case requires an incremental improvement. (There’s now a PR for the new implementation so the whole source is available for review.)