Test results new floats

Was lazy making manual test cases, so used a fuzzer.
So far only testing float/1 with bigint argument from
new SWI-Prolog 8.5.18. The fuzzer test cases might not

be that good, especially for HALF_EVEN, since they might
not pick up the border cases, but nevertheless its a first start.
Might do more test cases. The test cases are simply:

cases.p.log (7,9 KB)

My Prolog systems, formerly Jekejeke Prolog and Dogelog
Player on its various target platforms, all pass these test cases.
Here are some very first results for other Prolog systems:

/* SWI-Prolog 8.5.18 (Windows and WASM) Ok */
?- case(N, X, Y), Y =\= float(X).
false.

/* Trealla Prolog 2.4.3 Ok */
?- case(N, X, Y), Y =\= float(X).
false.

/* Scryer Prolog 0.9.0 Nok */
?- case(N, X, Y), Y =\= float(X).
   N = 1, X = -572504891324561953821040518484, Y = -5.7250489132456196e29
;  N = 3, X = 884996183305110611102854483978, Y = 8.849961833051106e29
;  N = 4, X = 1251939306673717603656775488197, Y = 1.2519393066737177e30
Etc..

So somehow these test cases already do their job!

Edit 10.10.2022
More results:

/* Ciao Prolog 1.22.0 (WSL and WASM) Ok */
?- case(N, X, Y), Y =\= float(X).
no

/* ECLiPSe Prolog 7.0.61 Nok */
?- case(N, X, Y), Y =\= float(X).
N = 1
X = -572504891324561953821040518484
Y = -5.7250489132456196e+29
Etc..

Now I found a test case where SWI-Prolog fails on Windows
platform. Was using a slightly different fuzzer, eh voila I found:

/* SWI-Prolog 8.5.18 Nok */
?- case2(N, X, Y), Z is float(X), Y =\= Z.
N = 56,
X = -1190359501396335678156200476672,
Y = -1.1903595013963357e+30,
Z = -1.1903595013963356e+30 ;
false.

The test case is Ok in all my systems, like formerly
Jekejeke Prolog and Dogelog Player. The test
case is also Ok here:

/* Trealla Prolog 2.4.3 Ok */
?- case2(N, X, Y), Y =\= float(X).
   false.

/* Ciao Prolog 1.22.0 Ok */
?- case2(N, X, Y), Y =\= float(X).
no

Edit 10.10.2022
The error is some asymmetry in the sign:

/* SWI-Prolog 8.5.18 */
?- X is float(-1190359501396335678156200476672).
X = -1.1903595013963356e+30.

?- X is float(1190359501396335678156200476672).
X = 1.1903595013963357e+30.

But I guess the IEEE rounding applied to bigints, would
require a more symmetric behaviour. If I remember well,
when discussing my prototype float_half_even/2 this was

also noted, that there is a missing case for negative numbers as
well. But I don’t remember what the conclusion was how to
handle negative numbers and what the test cases would be?

Yes, rounding “to nearest” in the negative case isn’t correct, at least for this value. I’ll investigate.

?- test_ieee754:rounding(float(-1190359501396335678156200476672),r(Rc,Rp,Rn,Rz)).
Rc = Rp, Rp = Rz, Rz = -1.1903595013963356e+30,
Rn = -1.1903595013963357e+30.

?- test_ieee754:rounding(float(1190359501396335678156200476672),r(Rc,Rp,Rn,Rz)).
Rc = Rp, Rp = 1.1903595013963357e+30,
Rn = Rz, Rz = 1.1903595013963356e+30.

Also it seems to me float/1 is still running with breaks in SWI-Prolog.
On my machine I get the following measurement, which could be
due that float/1 for bigint still uses the detour over rational numbers?

/* SWI-Prolog 8.5.18 */
?- time((between(1,50000,_), case(N, X, Y), Y =\= float(X), fail; true)).
% 5,100,003 inferences, 2.469 CPU in 2.459 seconds (100% CPU, 2065824 Lips)
true.

?- time((between(1,50000,_), case2(N, X, Y), Y =\= float(X), fail; true)).
% 5,100,001 inferences, 2.438 CPU in 2.429 seconds (100% CPU, 2092308 Lips)
true.

/* Jekejeke Prolog 1.5.5 */
?- time((between(1,50000,_), case(N, X, Y), Y =\= float(X), fail; true)).
% Threads 859 ms, GC 0 ms, Up 875 ms (Current 10/11/22 03:00:27)
true.

?- time((between(1,50000,_), case2(N, X, Y), Y =\= float(X), fail; true)).
% Threads 860 ms, GC 1 ms, Up 857 ms (Current 10/11/22 03:00:30)
true.

Unfortunately my pure Prolog implementation is not competetive.
I guess it will also not work fast enough in SWI-Prolog, here is the
throughput of the emulated function:

/* Jekejeke Prolog 1.5.5 */
?- time((between(1,50000,_), case(N, X, Y),
   float_half_even(X,Z), Y =\= Z, fail; true)).
% Threads 12,016 ms, GC 73 ms, Up 12,244 ms (Current 10/11/22 03:06:15)
true.

?- time((between(1,50000,_), case2(N, X, Y),
   float_half_even(X,Z), Y =\= Z, fail; true)).
% Threads 11,312 ms, GC 70 ms, Up 11,490 ms (Current 10/11/22 03:05:58)
true.

I guess the emulated float for bigint, would only beat an emulated
float with detour over rational numbers, but otherwise not beat something
more native. So correct and fast remains a challenge!

Edit 11.10.2022
Trealla Prolog is also not extremly fast. But speed depends also on
how facts with bigints are invoked. In Jekejeke Prolog there is pointer
sharing with the bigint in the fact, so there is no copying.

/* Trealla Prolog 2.4.3 WSL2 */
?- time((between(1,50000,_), case(N, X, Y), Y =\= float(X), fail; true)).
Time elapsed 1.34s
   true.

?- time((between(1,50000,_), case2(N, X, Y), Y =\= float(X), fail; true)).
Time elapsed 1.32s
   true.

Dogelog Player does the same pointer sharing, it can almost compete
with SWI-Prolog, due to the breaks in float/1:

/* Dogelog Player 1.0.3 */
?- time((between(1,50000,_), case(N, X, Y), Y =\= float(X), fail; true)).
% Wall 3363 ms, gc 3 ms, 1531449 lips
true.

?- time((between(1,50000,_), case2(N, X, Y), Y =\= float(X), fail; true)).
% Wall 3331 ms, gc 1 ms, 1546161 lips
true.

Assuming you test on Windows, GMP performance seems poor. I get 2.1 sec on Windows, 1.1 on Linux, same hardware. Here is the time breakdown according to valgrind.

For the actual conversion to double we get

1 Like

Reimplemented “to_nearest” rounding and it now passes all 100 test cases in cases.p plus the failed case reported. Any additional test data is welcome.

This is a completely separate implementation from rational number conversion and requires no additional bigints to be created. But it does involve 4-5 calls to GMP library functions. Execution time on my MacOS Intel config is 1.7 sec for 50000 reps (5,000,000 calls to =\=).

Great! I guess I’ll see a PR soon?

How do other Prolog systems react on these new test cases,
the case3/3 facts? I am still on the sunny side with Dogelog Player,
does pass in command lines (JavaScript and Python) and in browsers

(Chrome and Firefox). Also these Prolog systems still agree:

/* Trealla Prolog 2.4.3 Ok (on WSL2) */
?- case3(N, X, Y), Y =\= float(X).
   false.

/* Ciao Prolog 1.22.0 Ok (on WASM) */
?- case3(N, X, Y), Y =\= float(X).
no

Thanks. I’ve added these to my case.pl tests and they now all (total of 107) pass using my current experimental version. It would seem I can’t use your fuzzers to generate my own tests on SWIP because it uses the same software to generate the tests, so it would always succeed.

Hope to do this in the next couple days pending other “surprises”.

1 Like

It depends, you could use the emulated version float_half_even/2 to
generate the test cases. I posted it somewhere on discourse SWI-Prolog.
Actually I posted two version now, one with ulp/1 and one with nexttoward/2.

But I didn’t do a re-validation yet against case/3, case2/3 and case3/3 with
the SWI-Prolog version of float_half_even/2.

Edit 13.10.2022
This seems to work:

emulatorswi.p.log (2,3 KB)

It can be used to find test cases:

?- case3(N, X, Y), float_half_even(X,Z), Y =\= Z.
false.

?- repeat, L is -(1<<57), H is (1<<57)+1, random(L, H, X), Z is X<<43,
float_half_even(Z, Y), Y =\= float(Z).
Z = -374221666956444521791366365184;
Z = 459815319774385442098847416320;
Etc..

Some timings for Ciao Prolog:

Ciao 1.22-v1.21-397-g5bb5c6fb53 (2022-09-29 10:17:58 +0200) [DARWINx86_64]
?- use_module('cases.p.log').

yes
?- time((between(1,50000,_), case(N, X, Y), Y =\= float(X), fail; true)).
% 0.766826 seconds

yes
?- time((between(1,50000,_), case2(N, X, Y), Y =\= float(X), fail; true)).
% 0.7644180000000002 seconds

yes
?- time((between(1,50000,_), case3(N, X, Y), Y =\= float(X), fail; true)).
% 0.7597860000000001 seconds

I could retest the floating point, redoing my test suite,
with SWI-Prolog 8.5.20. Most of the errors are gone.
On Windows there is still a (**)/2 glitch, mainly

when the exponent is negative. Some test cases
require Prolog flag iso to be set to true:

/* SWI-Prolog 8.5.20 */
case, swi: 0
case2, swi: 0
case3, swi: 0
case4, swi: 0
case5, swi: 0
case6, swi: 49

Here is a negative exponent that still differs:

/* SWI-Prolog 8.5.20 */
?- X is -2705** -4.0.
X = 1.867802370428037e-14.

/* http://numcalc.com/ */
> \p f64 
> (-2705)**-4.0
1.8678023704280373e-14

But this is really a nasty glitch, because the strange
thing is, that even with Prolog flag iso set to true, in
principle SWI-Prolog could do it?

/* SWI-Prolog 8.5.20 */
?- X is 1/(-2705 ^ 4).
X = 1.8678023704280373e-14.

Due to platform dependant libm. On my Mac Intel:

?- X is -2705** -4.0.
X = 1.8678023704280373e-14.

Note that libm pow() will be used in evaluating -2705** -4.0 but not 1/(-2705 ^ 4).

1 Like

Should raise an issue from wherever this library comes from.
What is its origin exactly? I have heard MinGW, is this what is use?

This is the source code of the microsoft pocket calculator desk
acessory. Often pow() branches into pown() for an integer exponent:

EXPLANATION: This uses x^y=e(y*ln(x)), or a
more exact calculation where y is an integer
https://github.com/microsoft/calculator/blob/78cd6d52da65983f5c0f8f0859126e9ccf701d6f/src/CalcManager/Ratpack/exp.cpp

Does MinGW have pown()? But I am afraid,
this calculator gives a lot of digits:

1.8678023704280373400344119736524e-14

but the routines can be possibly called with lower desired precision?
It seems the code shows how to bootstrap floats form rational numbers.

Edit 19.10.2022
What is the pown() function? See also IEEE:

Too much power - pow vs powr, powd, pown, rootn, compound
754-2008 set out to rectify this by dividing the power functions into
those with a floating-point power (pow and powr - and in 2019 - powd),
those with an integer power (pown and compound),
and those whose power was a reciprocal of an integer (rootn).
https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/power.txt

Explanation why case/3, case2/3 and case3/3 are that fast,
and maybe also case4/4 and case5/4, in Jekejeke Prolog,
namely I guess because:

  • Structure Sharing: bigints need not to be moved around,
    so in the test cases the Java BigInteger values are accessed
    simply by a pointer reference.

  • lsb/1 Caching: The doubleValue() function, used for float/1,
    of Java BigInteger calls getLowestSetBit(), which does cache
    the compute valued inside a field lowestSetBit. The Java BigInteger
    class is quite heavy and has a few such cache fields.

The structure sharing multiplies this lsb/1 caching. The lsb/1
caching is written into the test case Java BigInteger values,
similar like a just in time index is computed at runtime. It is not

that a lsb/1 value is computed for an intermediate result. So
different test cases would probably show a different performance
profile if more intermediate values would appear. On the other

hand the caching is quite suitable for bigint constants appearing
in a Prolog text. Maybe Dogelog Player is slower because nodeJS
doesn’t have the same. Ciao Prolog performs well!

Interesting. On Windows? At least for SWI-Prolog it is rather dependent on how you execute this on what. First of all, meta-calling does not do optimization of arithmetic. We can also implement this as:

:- set_prolog_flag(optimise, true).

swi_perf2 :-
    forall(between(1,6,Case), time_case(Case)).

time_case(Case) :-
    format('case~w, swi: ', [Case]),
    time((between(1,20000,_), case(Case), fail ; true)).

case(1) :- case(_, X, Y), Y =\= float(X).
case(2) :- case2(_, X, Y), Y =\= float(X).
case(3) :- case3(_, X, Y), Y =\= float(X).
case(4) :- case4(_, P, Q, Y), Y =\= float(P)/Q.
case(5) :- case5(_, P, Q, Y), Y =\= float(P)/Q.
case(6) :- case6(_, P, Q, Y), Y =\= float(P)**Q.

Now, we get the results in the table below, where the first should be yours except for different hardware. The measurements are not repeating very well. Some runs some differences up to about 5%, so it is not very bad.

Condition case case2 case3 case4 case5 case6
Windows, swi_perf 0.780 0.830 0.820 0.470 0.450 0.690
Windows, swi_perf 2 0.660 0.710 0.770 0.840 0.830 0.360
Linux, swi_perf 0.390 0.379 0.390 0.382 0.379 0.914
Linux, swi_perf 2 0.317 0.284 0.291 0.353 0.370 0.252

This is a bit hard to interpret. Comparing for example the CHAT80 benchmark, we see Linux 0.654 vs Windows 0.840, a lot closer than these arithmetic tests. Both the Windows and Linux versions are compiled with GCC 11, using the same optimization options, so that can’t be it. The measurements for Windows below are using Wine under Linux, assuming that the OS doesn’t matter much for a CPU bounded task. Indeed, running the same under a Windows 11 VM gives roughly the same result (a little worse). Looks as if libGMP is really slow on Windows. That too is rather strange as it is the version that ships with Fedora Linux as part of the MinGW cross-compiler suite and I assume compiled with the same MinGW cross compiler.

On my machine Windows and WSL2
(Ubuntu 22.04) compare as follows:

System case case2 case3 case4 case5 case6
swi 8.4.2 WSL2 421 417 420 564 604 530
swi 8.5.20 Windows 989 941 979 528 532 861

But the case/3, case2/3 and case3/2 test cases are slower
than Jekejeke Prolog, even on WSL2. Maybe it is the
lowestSetBit caching and a very light doubleValue()

implementation, that makes a dent? Don’t know,
was just recalling that Java does this, noticed this only
now. You would need to run other test cases, that do not

depend on this caching, to get a different picture. If
lsb/1 is not cached, you get a bit scan in around 25% of
the bigints. With caching this bit scan is negligible.

But swi 8.4.2 does anyway only rounding DOWN and
no HALF_EVEN? So maybe there are other brakes around,
like copying the bigint, the pointer sharing gives speed.

Just another data point - SWIP 8.5.20 on Mac-Intel:

?- swi_perf.
case, swi: 
% 2,040,001 inferences, 0.706 CPU in 0.707 seconds (100% CPU, 2888141 Lips)
case2, swi: 
% 2,040,001 inferences, 0.689 CPU in 0.689 seconds (100% CPU, 2962500 Lips)
case3, swi: 
% 2,040,001 inferences, 0.702 CPU in 0.702 seconds (100% CPU, 2905194 Lips)
case4, swi: 
% 2,040,001 inferences, 0.583 CPU in 0.583 seconds (100% CPU, 3499654 Lips)
case5, swi: 
% 2,040,001 inferences, 0.587 CPU in 0.587 seconds (100% CPU, 3476638 Lips)
case6, swi: 
% 2,040,001 inferences, 2.643 CPU in 2.643 seconds (100% CPU, 771979 Lips)
true.

The big outlier here is case6 (**). Compiler or libm ?

@j4n_bur53 runs these tests with the Prolog flag iso set to true, which does float pow() rather than bigint. That makes quite a difference in performance.

First is likely to be an issue for SWI-Prolog. There is a lot of copying in bigint handling. These GMP objects are serialized to be placed on the Prolog stacks as well as when stored inside clauses. That makes object lifetime management easy, but at the cost of a lot of copying. Possibly it would be faster to handle them as external resources and get rid of them using (dedicated) garbage collection.

I very much doubt lsb caching does much on these tests. You bit ints are pretty small. I assume libGMP walks over the limbs (64 bits) until it finds a non-zero one and then uses either assembly or one of the GCC built-ins to find the lsb of the first limb it finds to be non-zero. Msb starts at the know size and thus skips no/few zeros and lsb could in theory have to scan a lot of zeros, but not on these test cases.

libGMP is great in dealing with really large numbers and carefully handling (de) allocation in C/C++. It is not that great for fairly small “big ints” embedded in some language :frowning:

As for Ciao, it doesn’t appear to link to any (dynamic) bigint library. Possibly they implemented their own? If you do so it should be easy enough to win for small bigints against libGMP.