Rounding error in (**)/2

I see a heavy rounding error in (**)/2. Whats the rounding mode
when the arguments are float? Here you see the discrepancy:

?- X is float(3333**10).
X = 1.6918160339379514e+35.

?- X is 3333.0**10.0.
X = 1.6918160339379518e+35.

Its quite pronounced, more than 50%:

/* SWI-Prolog 8.5.17 */
?- aggregate_all(count, (between(1,1000000,N),
     float(N)**10.0 =\= float(N**10)), C).
C = 585342.

I don’t see this extreme discrepancy in my system:

/* Jekejeke Prolog 1.5.4, JDK 19 */
?- aggregate_all(count, (between(1,1000000,N),
    float(N)**10 =\= float(N^10)), C).
C = 551.

Has this something to do with te clpBNR flags? Or maybe
an issue with (=\=)/2? The discrepancy is a little surprise,
wasn’t prepared for that.

Edir 21.09.2022
Scryer Prolog isn’t that lucky either. Trealla Prolog is a
little bit more lucky:

/* Scryer Prolog 0.9.0 */
?- aggregate_all(count, (between(1,1000000,N),
float(N)**10 =\= float(N^10)), C).
C = 500254. 

/* Trealla Prolog 2.2.5 */
?- aggregate_all(count, (between(1,1000000,N),
float(N)**10 =\= float(N^10)), C).
C = 12750. 

I think the standard default round mode is nearest. How that’s implemented depends on underlying the C library support; only a small subset of of the arithmetic guarantee correct results under all rounding modes. Most seem to get within a bit in "nearest’ mode.

On my config:

?- float(3333**10) =:= 3333.0**10.0.
true.

but I think this may be a lucky outcome. I would have perhaps expected a 1 bit (?) difference in the floating point values; I suspect most errors in the aggregate test satisfy this criteria but that needs to be verified.

?- aggregate_all(count, (between(1,1000000,N), float(N)**10 =\= float(N^10)), C).
C = 499338.

Different, slightly better, but not as good as Jekejeke Prolog 1.5.4 or Trealla. With all the layers of software it’s hard to determine where any rounding errors might occur.

I don’t see how clpBNR could affect this. It uses the arithmetic function roundtoward which tightly scopes any rounding mode changes.

Maybe you need also look into float/1, not only (**)/2 ?
Especially float/1 of a bigint, I guess smallint could be ok.

I did more testing:

/* Jekejeke Prolog, JDK 19 */
C = 551.

/* Dogelog Player, Python (CPython and PyPy) */
C = 1991.

/* Dogelog Player, JavaScript (FireFox) */
C = 1991.

/* Jekejeke Prolog, JDK 1.8 */
C = 15586.

/* Dogelog Player, JavaScript (Chrome and NodeJS) */
C = 97722.

The differences can be explained that the different host
languages have different runtime systems, with different
intrinsic Math.pow(double, double) implementations, that

is used for (**)/2 in Jekejeke Prolog and Dogelog Player, it
was all tested on the same hardware, so they possibly use
soft implementation of the pow intrinsic. Or maybe some

use hard implementation delegating to FPU, and I might
even see differences on different hardward. Will check with
a different hardware as welll. Give me some time. Since I

have Ryzen and Intel here, maybe will see a difference.

Yes there is also a hardware dependency, and these figures
are no joke, i.e. that one figure is permutation of the other figure,
thats some random accident. I get the following result:

/* Jekejeke Prolog, JDK 1.8, Ryzen, 5 4500U */
C = 15586.

/* Jekejeke Prolog, JDK 1.8, Intel, i5-1135G7 */
C = 15856.

When ints get big enough, they can’t be represented exactly by a floating point double. How they get converted depends on how big. Floating of bigints is done by the GMP library function mpz_get_d. Floating of normal integers is done via some C compiler magic (n->value.f = (double)n->value.i;).

I wouldn’t like to mess with any of that.

The documentation says. Which indicates that directly used
this is the wrong library routine for a float/1.

Convert op to a double, truncating if necessary (i.e. rounding towards zero).

How does Python, JavaScript or Java do it? I guess there
are two options. The first option is:

/* Variant 1 */
float(x) := mpz_get_d(x + ???)

The second option is:

/* Variant 2 */
float(x) := mpz_get_d(x) + ???

The ??? depends on the exponent and some information
about the number x if you want to get HALF_EVEN rounding.

I’ve pushed to a new branch bigint-to-float for @ridgeworks to review (or anyone interested). Notably whether this messes up @ridgeworks work on rounding? All tests pass. This uses code from mpq_get_d_nearest, adopted to better fit the existing code structure. Results:

  • Ubuntu AMD3950X, gcc-11
    ?- time(aggregate_all(count, (between(1,1000000,N), float(N)**10 =\= float(N^10)), C)).
    % 2,001,621 inferences, 0.557 CPU in 0.557 seconds (100% CPU, 3593785 Lips)
    C = 807.
  • Apple M1, AppleClang 13.1
    % 2,002,457 inferences, 0.449 CPU in 0.451 seconds (100% CPU, 4464727 Lips)
    C = 1225.
  • Windows, MinGW 11.2
    % 3,017,381 inferences, 2.330 CPU in 2.331 seconds (100% CPU, 1295013 Lips)
    C = 508689.

The first two are nice. The MinGW result is probably slow due to slow allocation (as we have also seen from the M^N for small integers) and doesn’t round as expected due to poor rounding of much of the MinGW math library (this uses ldexp()).

1 Like

I have a lot of difficulty following what this new version of mpz_fdiv does. Clearly there must be rounding going on but there’s no reference to the current IEEE rounding mode, so I’m suspicious that this always does rounding to nearest. That won’t be helpful, at least to me.

This should be easy to verify using something like float(N**10) =\= roundtoward(float(N**10), to_nearest) with all the various rounding modes. If my suspicion is correct, to_nearest should have small (0) values of C and all the rest should have significant values.

I’ll continue to study the proposed changes to mpz_fdiv to see what can be done to fix rounding mode support.

A couple of other notes:

  1. Some good news: the code for converting small ints to doubles is “n->value.f = (double)n->value.i;” According to my tests, this does obey the current rounding mode when the integer is to large to be represented as an exact float but not large enough to be a GMP bigint:
?- current_prolog_flag(float_max_integer,FM), X is roundtoward(float(integer(FM)+1),to_negative).
FM = X, X = 9.007199254740992e+15.

?- current_prolog_flag(float_max_integer,FM), X is roundtoward(float(integer(FM)+1),to_positive).
FM = 9.007199254740992e+15,
X = 9.007199254740994e+15.
  1. I raised an issue Rational to float conversion isn't monotonic · Issue #1022 · SWI-Prolog/swipl-devel · GitHub. Perhaps this activity will address that as well.

Likewise :slight_smile: As I interpret the code, it first tries to do the exact cases (upto 2^53,
I guess). Otherwise it prepares for an ldexp() call. That is also suggested by the results on MinGW vs Linux and MacOS. So, if ldexp() supports rounding, so does the new implementation. This seems indeed the case. The first case going wrong with truncate is 41^10. We now get:

102 ?- A is roundtoward(float(41^10), to_nearest).
A = 1.34226593101524e+16.

103 ?- A is roundtoward(float(41^10), to_positive).
A = 1.3422659310152402e+16.

Which seems completely right to me. So, I think we are ok if the math library is ok.

As I understand it, ldexp doesn’t do any rounding. It just uses the mantissa from the first argument.

I haven’t built a system for testing purposes, but what I “think” is happening in the new mpz_fdiv:

  1. Exact cases can be handled by converting integers to exact floats and dividing them. Normal arithmetic ops, like division, obey the current rounding mode so that should be correct.

  2. Otherwise the result exponent is calculated along with a 54 bit (note extra bit) quotient (aa) and remainder (bb) using mpz_tdiv_qr. (/* q = aa/bb*2^(sa-sb) */).

  3. “Custom” code then does 53 bit rounding of the quotient to_nearest using the last bit of the quotient and the remainder. So 53 bits of the quotient are “exact” as per this rounding.

  4. mpz_get_d then converts the quotient to a double with truncation semantics. That’s OK since Step 3. already rounded the quotient to_nearest.

  5. ldexp then applies the exponent from Step 2 to the double calculated in Step 4.

So I think what needs to happen is that Step 3 code should be replaced by a switch statement with the 4 rounding mode cases. The existing code provides the FE_TONEAREST case which should be the most complicated.

Unless you want to take this on, or have a better idea, I guess I’m on the hook for generating a PR for this. Additiional test cases should also be generated, so may not be a trivial amount of work.

Thanks for the analysis. Would be great if you can have a look at this. Should I merge the current version into master as it is already better than we had or would you prefer to add the rounding code before it is merged?

Well forget about rational numbers for the moment and try
solving the problem for bigints first. Which is the topic for this
thread. The solution for bigints involves msb/1 and lsb/1.

And the variant 2 is the faster variant. In as far mpq_get_d_nearest
is also bad, Since it uses variant 1, this makes it a little slow, here
an annotation what is fast and what is slow:

Edit 25.09.2022
The lsb/1 can be used to determine HALF_EVEN rounding
in a border case. Basically the EVEN rule needs only
be applied if you have, i.e. all zeros:

/* Needs the EVEN rule */
x.5000000....

But if you find a 1 you can round up:

/* Doesn't need the EVEN rule */
x.50001001...

Imagine the period in “x.” determined from a bigint by shifting away
from msb/1 by some bits to determine the float mantissa in float(bigint).
But then the 1 can be far far away much more far away. You can

determine the 1 via lsb/1 and I guess its much faster than what
mpq_get_d_nearest does via divmod/4 essentially.

I suspect this is a regression with respect to rounding modes so I’d prefer you wait until either I can test that this isn’t the case or until I’ve fixed it.

Just to be clear, we are talking about conversion of bigints to floats, a topic you did raise in this thread:

It’s just that the same function is used for both rational and bigint conversion.

These are Prolog arithmetic functions on positive integers. For bigints they eventually use functions in GMP, so I’m not sure “much faster” is in play here. Right now, I’m more focused on getting it right for all roundings rather than optimization, but this is open source software.

At the moment, I don’t see that this gives a fast implementation of float/1
for a bigint argument. lsb/1 is faster, it doesn’t modify its bigint argument.
It has the following signature:

lsb : bigint -> smallint

for realization of float/1, you only need to call it in 50% of the cases,
when you need to decided whether the EVEN rule needs to be applied,
decide whether round up applies or doesn’t apply based on an other bit,

or when you can directly round up. But I don’t know how to do the round up
inside SWI-Prolog, based on the float number. The Java implementation
does it on the mantissa before assembling the float number. And the

mantissa is already a smallint, which makes it quite fast.

For the record: I’ve built the bigint-to-float fork and verified rounding is incoorect:

?- aggregate_all(count, (between(1,1000000,N), roundtoward(float(N**10),to_positive) > roundtoward(float(N**10),to_negative)), C).
C = 19.

Where rounding is involved (inexact conversion), this test should always succeed, so the value of C should be approaching 1000000. The failed cases are all “equal” indicating the rounding mode has no effect:

?- aggregate_all(count, (between(1,1000000,N), roundtoward(float(N**10),to_positive) =:= roundtoward(float(N**10),to_negative)), C).
C = 999981.

It depends on the math library … Here are the results using glibc on Linux.

101 ?- aggregate_all(count, (between(1,1000000,N), roundtoward(float(N**10),to_positive) > roundtoward(float(N**10),to_negative)), C).
C = 999941.

102 ?- aggregate_all(count, (between(1,1000000,N), roundtoward(float(N**10),to_positive) =:= roundtoward(float(N**10),to_negative)), C).
C = 59.

Not really what I wanted to hear.

When I add rounding mode support in mpz_fdiv I get the following results:

?- aggregate_all(count, (between(1,1000000,N), roundtoward(float(N**10),to_positive) > roundtoward(float(N**10),to_negative)),C).
C = 999670.

?- aggregate_all(count, (between(1,1000000,N), roundtoward(float(N**10),to_positive) =:= roundtoward(float(N**10),to_negative)),C).
C = 330.

I think I’ve convinced myself that 330 is probably the right number of exact conversion cases. The first 40 cases are small ints which can be exactly converted. But there are a number of test values over that threshold which can be exactly converted: N=42,44,46, ... 80,84,88, ... 800,832,864,....

I’d be interested in the 59 cases that the Linux implementation (without proper rounding support) finds and what the overlap is with the 330 values from above.

If It would appear that there’s something in the GMP library that’s sensitive to rounding mode. Not clear to me whether that’s due to how GMP is built (compiler options?) or whether it’s a difference in the underlying versions of glibc.

Here’s a preliminary view of the modified code in mpz_fdiv; it replaces the current rounding code:

Rounding Mode support fragment for `mpz_fdiv`
  switch(fegetround())
  {
    case FE_TONEAREST:
  if ( mpz_tstbit(aa, 0) == 0 )
  {
  } else if (mpz_cmp_ui(bb, 0) != 0)
  { if ( mpz_sgn(aa) > 0 )
      mpz_add_ui(aa, aa, 1);
    else
      mpz_sub_ui(aa, aa, 1);
  } else /* mid case: round to even */
  { if (mpz_tstbit(aa, 1) == 0)
    { if (mpz_sgn(aa) > 0)
	mpz_sub_ui(aa, aa, 1);
      else
	mpz_add_ui(aa, aa, 1);
    } else
    { if (mpz_sgn(aa) > 0)
	mpz_add_ui(aa, aa, 1);
      else
	mpz_sub_ui(aa, aa, 1);
    }
  } 
  break;
  
    case FE_UPWARD:
  mpz_add_ui(aa, aa, 1);
  break;
  
    case FE_DOWNWARD:
  mpz_sub_ui(aa, aa, 1);
  break;
  
    case FE_TOWARDZERO:
  if ( mpz_sgn(aa) > 0 )
    mpz_sub_ui(aa, aa, 1);
  else
    mpz_add_ui(aa, aa, 1);

  }  /* switch(fegetround()) */

You can use lsb/1 to find these cases. lsb/1 allows you to decompose
a number X into M*2^N, where N = lsb(X):

X = M*2^N                 for              N = lsb(X)

And then X can be converted to float exactly if M is a smallint. Here is
a test. And your count 330 seems to be correct, I get the same:

?- aggregate_all(count, (between(1,1000000,Y), X is Y^10, 
N is lsb(X), M is X>>N, M<(1<<53)), C).
C = 330.

I hope you start liking lsb/1, it can give you an ultrafast float/1 for bigints.
Not directly because of the above decomposition, but because
you can implement the “Rounding Mode support fragment”, in the

case of bigint, and not in the case of a rational number, much
much faster by means of using lsb/1 to decide the rounding.

1 Like

The whole thing doesn’t seem to be trivial. There is a whole discussion on this for Julia at BigInt to Float conversions are unnecessarily slow · Issue #31293 · JuliaLang/julia · GitHub

My thought was that ok, we have something for rationals that seems to work fine. If it is, it should also work for integers as these are just a special case for rationals. Then you can obviously simplify the implementation as a lot of work on the denumerator is known as this is 1.

Anyway, I welcome first of all a correct implementation. Seems @ridgeworks is getting there and then a fast one. I would be surprised if a slow conversion from bigint to float would affect many applications. Anyone who needs this can implement it and send a PR :slight_smile: