Rounding error in (**)/2

Not what I hoped:

102 ?- aggregate_all(count, (between(1,1000000,N), roundtoward(float(N**10),to_positive) > roundtoward(float(N**10),to_negative)),C).
C = 19.

103 ?- aggregate_all(count, (between(1,1000000,N), roundtoward(float(N**10),to_positive) =:= roundtoward(float(N**10),to_negative)),C).
C = 999981.

Except for the float divisions and ldexp(), all the code talks about integers. We should not round those :slight_smile:

Merged into the bigint-to-float branch. I can confirm that I get the same results on the two queries on AMD64/Linux.

Instead of shifting X and materializing M, you can simply use msb/1,
to compute the bit width W of M. Gives the same result:

?- aggregate_all(count, (between(1,1000000,Y), X is Y^10,
     W is msb(X)-lsb(X), W<53), C).
C = 330.

Maybe this explains what msb/1 and lsb/1 can do for you, concerning the
float conversion of a bigint. But I am affraid, you might also need getbit/2.

Edit 27.09.2022
I think @ridgeworks had expressed reservation towards
these operations. But I read for example here:

Function getbit/2
The result is equivalent to (IntExprV >> IntExprI)/\1 , but more
efficient because materialization of the shifted value is avoided.
https://www.swi-prolog.org/pldoc/doc_for?object=f(getbit/2)

These functions all do not materialize a new bigint. Therefore they are fast.
Subsequently a float/1 implementation based on these operations can become
quite speedy. No memory allocation involved at all.

You can try yourself, see the speed difference:

/* With Materialization, even took the constant out */
?- K is 1<<53, time((between(1,1000000,Y), X is Y^10,
       N is lsb(X), M is X>>N, M<K, fail; true)).
% 5,000,001 inferences, 1.344 CPU in 1.335 seconds (101% CPU, 3720931 Lips)

/* Without Materialization */
?- time((between(1,1000000,Y), X is Y^10,
       W is msb(X)-lsb(X), W<53, fail; true)).
% 4,000,001 inferences, 0.812 CPU in 0.807 seconds (101% CPU, 4923078 Lips)

Just to confirm that “same” means the same as my results on Mac-Intel, i.e., number of =:= cases is 330?

Yes. Need to fill 10 characters :slight_smile:

Let’s be clear - so-called bigints in SWIP are GMP’s multi-precision integers represented by the opaque C data type mpz_t. As such there isn’t any lsb, msb, or getbit but there are functions to return the size of the integer value (in bits) or to test any particular bit so there is similar functionality. And these are the functions that are used in converting MPZ integers to floats in mpz_div.

I don’t see any bitshift functions but it’s a large API so I may have missed something. Rounding is done by carrying an extra bit of precision, adding or subtracting 1, and then truncating (using mpz_get_d). I don’t see any other options at the moment. In fact, I didn’t see this one until Jan found it on the GMP forum.

Current performance on my Mac-Intel test version:

?- time((between(1,1000000,_),_ is 42,fail;true)).  % Base line arithmetic overhead
% 2,000,000 inferences, 0.128 CPU in 0.128 seconds (100% CPU, 15677544 Lips)
true.

?- current_prolog_flag(max_tagged_integer,MTI),time((between(1,1000000,_),_ is float(MTI),fail;true)).  % tagged integer
% 2,000,000 inferences, 0.168 CPU in 0.168 seconds (100% CPU, 11912278 Lips)
MTI = 72057594037927935.

?- Int is 2**62-1,time((between(1,1000000,_),_ is float(Int),fail;true)).  % 64 bit integer
% 2,000,000 inferences, 0.173 CPU in 0.173 seconds (100% CPU, 11580038 Lips)
Int = 4611686018427387903.

?- Int is 2**65-1,time((between(1,1000000,_),_ is float(Int),fail;true)).  % MPZ 
% 2,000,000 inferences, 0.813 CPU in 0.813 seconds (100% CPU, 2458748 Lips)
Int = 36893488147419103231.

So the cost of floating a “normal” integer is around 40 ns. The cost of floating a MPZ bigint is around 700 ns. Jan is right in stating that some improvement in the MPZ case is possible assuming a denominator of 1, and also that this is unlikely to make much practical difference. But investigating this is on my short list.

If you check out ar_shift() you see it uses mpz_mul_2exp() for << and mpz_fdiv_q_2exp for <<, i.e., GMP disguises their bit shifts as arithmetic operations. These are also used by the current mpq → float conversion.

Note that for timing with less overhead you should put the body in a clause and use swipl -O. As is, it performs a quick-and-dirty compilation. If the bigints get big enough it doesn’t make a lot of difference :slight_smile:

It might make a fair difference as GMP numbers are rather expensive to allocate and free. Probably about half of the code can go and I’d expect it to be at least twice as fast.

Shall I merge the current version? As I understand it, it provides good rounding for rationals and good (but a bit slow) for bigints. That is IMO an improvement.

Correct me if I’m wrong, but wouldn’t that affect all the tests equally so subtracting out the baseline case should compensate. But in general, I agree. Is there a list somewhere of all the optimizations that -O does? For example, I think it also affects calls to debug.

I think a n approximate limit would limit be the exact rational case which I measure at 416-128 = 288 or about 2.5 times faster. But I’d still have to allocate 1 GMP number, so I would guess no better than 2. If this is significant enough, it’s probably worth spitting it out into a separate function so as to not burden the existing mpz_fdiv with yet more special cases.

Agreed. It seems to fix the monotonic rational problem (Issue #1022) as well as more precise rounding on MPZ ints. I’ll generate a PR for a few additional IEEE754 tests for this modified float functionality in the near future.

1 Like

Not entirely clear. The most important thing -O does is to compile arithmetic into a stack engine using virtual machine instructions rather than just creating the term to evaluate and call the foreign predicate is/2 to do so.

Yes please (and as two distinct functions).

Thanks!

You don’t need any bitshift functions to implement float/1.
Its really trivial, its only an if-then-else:

The shifting is already done in mpz_get_d. You only need to
nudge the result to perform the rounding. This is surely faster

than shifting and adding and then calling mpz_get_d. Because
if you modify x, you cause materialization:

/* alternative idea */
function float(x) {
      retiurn mpz_get_d(x+(1<<(msb(x)-53));
}

But you can try yourself, which algorithm is faster. I am pretty sure
an algorithm without some materialization is faster, also the above

probably does only HALF_UP and not HALF_EVEN. To
get HALF_EVEN you need also do some lsb/1 check.

Here is a 100% pure Prolog implementation of HALF_EVEN, without
any materialization, it assumes that the float/1 does only mpz_get_d:

float_half_even(X,Y) :- 
     (getbit(X, msb(X)-53) =:= 0; 
      getbit(X, msb(X)-52) =:= 0, msb(X)-53 =:= lsb(X)), !, Y is float(X).
float_half_even(X,Y) :-
     Y is nexttoward(float(X), 1E300).

Was using 1E300 for infinity, is there something else that would work?
float_half_even/2 is just a translation of the following into Prolog:

public double doubleValue() {

https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/math/BigInteger.java#l3871

Here some test whether it agrees with BigInteger.doubleValue(), first testing
the old version of SWI-Prolog, where float is mpz_get_d:

/* SWI-Prolog 8.5.17 */
?- between(1,10,N), X is 10^20+N*2^13,
float_half_even(X,Y), write(Y), nl, fail; true.
1.0e+20
1.0000000000000002e+20
1.0000000000000003e+20
1.0000000000000003e+20
1.0000000000000003e+20
1.0000000000000005e+20
1.0000000000000007e+20
1.0000000000000007e+20
1.0000000000000007e+20
1.0000000000000008e+20

Comparing with a Prolog system that calls BigInteger.doubleValue():

/* Jekejeke Prolog 1.5.4 */
?- between(1,10,N), X is 10^20+N*2^13,
Y is float(X), write(Y), nl, fail; true.
1.0E20
1.0000000000000002E20
1.0000000000000003E20
1.0000000000000003E20
1.0000000000000003E20
1.0000000000000005E20
1.0000000000000007E20
1.0000000000000007E20
1.0000000000000007E20
1.0000000000000008E20

Yeah, theay agree! Riddle solved, the misterious veil of the
secret souce behind an ultrafast float/1 for bigint has been lifted.

Today I wanted to do some testing with a more elaborate version,
still no negative numbers, but smallints also handled:

float_half_even(X,Y) :-
    M is msb(X),
    float_half_even(M, X, Y).

float_half_even(M, X,Y) :- 
     (M < 53;
      getbit(X, M-53) =:= 0; 
      getbit(X, M-52) =:= 0, M-53 =:= lsb(X)), !, Y is float(X).
float_half_even(_, X,Y) :-
     Y is nexttoward(float(X), 1E300).

I wanted redo some aggregate_all(count). But I banged my head
here. So a pure 100% Prolog testing got a little bit obstructed.

Not sure whether there is a Prolog predicate that maps to mpz_get_d?
Can I make a foreign predicate declaration or something?

inf, but only if float overflows are not mapped to exceptions as demanded by ISO :frowning:

In old versions this is the float function, but only for integers larger than 64 bits. For values up to 64 bits it is simply the C (float) cast on int64_t that does the job.

Thanks for the pure Prolog solution. I’ll leave this to @ridgeworks

Its really a combination of two problems. float_half_even/2
doesn’t help that much. I could make it run via the mpz_get_d/2
simulation here. And then I can measure:

?- aggregate_all(sum(S), (between(1,1000000,N),
    S is abs(N^10-float(N)**10)), T).
T = 9.901018060037038e+48.

?- aggregate_all(sum(S), (between(1,1000000,N),
    A is N^10, float_half_even(A,B), S is abs(B-float(N)**10)), T).
T = 7.871954492015358e+48.

Whereas JDK 19 gives, aggregate error is 1000 times smaller:

?- aggregate_all(sum(S), (between(1,1000000,N),
  S is abs(N^10-float(N)**10)), T).
T = 7.00914689466423E45.

So on the windows platform it must be some low quality math.h
library. But I am not sure whether ar_pow() even uses some math.h
library. One value that has a problem is for example this here:

/* SWI-Prolog 8.5.17 */
?- X is 978400.0**10.0.
X = 8.038304251093646e+59.

But if it were to use math.h, it would be:

#include <math.h>
printf("X=%.15e", pow(978400.0, 10.0));
X=8.038304251093644e+59

https://www.onlinegdb.com/online_c_compiler

I can use this to validate my float_half_even/2, seems to be ok:

?- X is 978400^10, Y is float(X).
X = 803830425109364419968771652984040429977600000000000000000000,
Y = 8.038304251093643e+59.

?- X is 978400^10, float_half_even(X,Y).
X = 803830425109364419968771652984040429977600000000000000000000,
Y = 8.038304251093644e+59.

So the value is already a next after from 43 to 44. How the heck
does it even next after one more time from 44 to 46?

I’ve implemented an optimized float for GMP bigints based on @j4n_bur53 's suggestion of using mpz-get-d to produce a truncated float and then applying rounding using the rounding mode and MPZ bit level tests, so no allocation required. Precision and rounding mode tests look good to me:

?- aggregate_all(count, (between(1,1000000,N), float(N)**10 =\= float(N^10)), C).
C = 1777.

?- aggregate_all(count, (between(1,1000000,N), -float(N)**10 =\= float(-(N^10))), C).
C = 1763.

?- aggregate_all(count, (between(1,1000000,N), roundtoward(float(-(N**10)),to_positive) > roundtoward(float(-(N**10)),to_negative)), C).
C = 999670.

?- aggregate_all(count, (between(1,1000000,N), roundtoward(float(-(N**10)),to_positive) =:= roundtoward(float(-(N**10)),
C = 330.

?- aggregate_all(count, (between(1,1000000,N), roundtoward(float((N**10)),to_positive) > roundtoward(float((N**10)),to_negative)), C).
C = 999670.

?- aggregate_all(count, (between(1,1000000,N), roundtoward(float((N**10)),to_positive) =:= roundtoward(float((N**10)),to_negative)), C).
C = 330.

On the performance side, here are results for floating small and big ints, and small and big rationals on MacOS-Intel:

?- set_prolog_flag(optimise,true), assert(floatN(N) :- _ is float(N)).
true.

?- N is 42,time((between(1,1000000,_),floatN(N),fail;true)).
% 2,000,000 inferences, 0.167 CPU in 0.167 seconds (100% CPU, 11952786 Lips)
N = 42.

?- N is 2^65+1,time((between(1,1000000,_),floatN(N),fail;true)).
% 2,000,000 inferences, 0.360 CPU in 0.360 seconds (100% CPU, 5554028 Lips)
N = 36893488147419103233.

?- N is 4r3,time((between(1,1000000,_),floatN(N),fail;true)).
% 2,000,000 inferences, 0.534 CPU in 0.534 seconds (100% CPU, 3748140 Lips)
N = 4r3.

?- N is (2^65+1) rdiv 2^65,time((between(1,1000000,_),floatN(N),fail;true)).
% 2,000,000 inferences, 1.092 CPU in 1.092 seconds (100% CPU, 1831623 Lips)
N = 36893488147419103233r36893488147419103232.

Also, for Mac-Intel:

?- X is 978400.0**10.0.
X = 8.038304251093644e+59.

?- X is 978400^10, Y is float(X).
X = 803830425109364419968771652984040429977600000000000000000000,
Y = 8.038304251093644e+59.

?- aggregate_all(sum(S), (between(1,1000000,N),
|      S is abs(N^10-float(N)**10)), T).
T = 2.4540871481165494e+46.

Any other suggestions for tests before I generate a PR?

1 Like

Conscerning systematic unit tests for float : int → real, taking this code:

float_half_even(X,Y) :-
    M is msb(X),
    float_half_even(M, X, Y).

float_half_even(M, X,Y) :- 
     (M < 53;
      getbit(X, M-53) =:= 0; 
      getbit(X, M-52) =:= 0, M-53 =:= lsb(X)), !, Y is mpz_get_d(X).
float_half_even(_, X,Y) :-
     Y is nexttoward(mpz_get_d(X), 1E300).

And since the logic programming (;)/2 and (,)/2 before a cut are all
short circuits, it seems to me, there would be at least 5 test cases,
to arrive at 100% code coverage. So you need to pick 5 different int

values, for each row, a value that satisfies the condition:

Condition Action
M<53 getbit(X, M-53) =:= 0 getbit(X, M-52) =:= 0 M-53 =:= lsb(X) mpz_get_d(X) nexttoward(mpz_get_d(X), 1E300)
T X
F T X
F F T T X
F F T F X
F F F X

But you could also use fuzzy testing, i.e. use random test cases.
I don’t know whether SWI-Prolog PlUnit supports fuzzy testing. Also
difficult to establish an expectation reference. Logtalk has some QuickCheck

utility. Fuzzy testing could be done, by dumping some test cases, and
evaluating them in an other Prolog or Non-Prolog system, and comparing
the results. And a coverage tool could be used, to verify that the fuzzer

reached the same 100% code coverage. An error source for float/1 could
be that M is wrongly computed +/- 1, and even systematic testing could
slip such an error, if the int pattern satisfies some skewed conditions by

chance that lead to the same actions by chance. On the other hand a
fuzzer, might find out about this, since he would use more test cases.

Ok, but windows still does it differently? This is a very strange
test case, that a fuzzer possibly cannot find. It was found via
sweeping, i.e. between(1,1000000,_) was used.

BTW: I was digging a little deeper, I found a couple of
Prolog systems which all prefer 8.038304251093644e+59 on
the windows platform, except for one Prolog system (ECLiPSe):

/* Ciao Playground 1.21.0, WASM, Windows */
?- X is 978400.0**10.0.
X = 8.038304251093644e59 ? 

/* SWI-Prolog 8.5.17, WASM, Windows */
?- X is 978400.0**10.0.
X = 8.038304251093644e+59.

/* ECLiPSe Prolog, 7.0.61, Windows */
?- X is 978400.0**10.0.
X = 8.0383042510936461e+59

I really wonder whats going on?

Sounds good. Please note that I did merge the bigint-to-float branch. I think that is still ok as that includes the mpq_to_double() patches we want anyway. Your code should add a fast-path implementation for mpz_to_double() that now uses mpq_to_double().

8.038304251093644e+59 is correct, no? Using integers we get (_ for alignment)
8_038304251093644199...

SWI-Prolog gives the same result as Windows for the float version. I guess because both are AFAIK compiled with MinGW and the MinGW math library is not great when it comes to precision and rounding.

Little clue what you are measuring with WASM. A search for “wasm pow” gives various results that I can’t interpret quickly.

1 Like

SWI-Prolog 8.5.17 on Windows without WASM gives,
same glitch in the Matrix as in ECLiPSe Prolog:

?- X is 978400.0**10.0.
X = 8.038304251093646e+59.

?- X is 8.038304251093646e+59-8.038304251093644e+59.
X = 1.78405961588245e+44.

But with WASM on Windows, to be precise with Chrome
(Version 105.0.5195.127 (Official Build) (64-Bit)) I get
a different result. Don’t know what FireFox does:

https://dev.swi-prolog.org/wasm/shell

Edit 03.10.2022
Ok Firefox (Version 105.0.1 (64-Bit)) on Windows agrees:

https://dev.swi-prolog.org/wasm/shell

This kind of #define should always be avoided as it is undefined behavior according to C standard (see C11 7.1.3p1 and J.2 item 106).