Making the case for rational numbers ("Rump example")

For those who may be interested in mathematically correct arithmetic, here’s one case for supporting the use of SWIP’s rational numbers.

I recently came across a paper (https://www.academia.edu/25498082/Formal_Aspects_of_Correctness_and_Optimality_of_Interval_Computations) which used the supposedly well known Rump example to demonstrate one of the pitfalls of floating point arithmetic. The purpose of the example was to promote the use of interval arithmetic (e.g., as implemented in pack(clpBNR)) for reasons of correctness.

The example was to compute:

y = 333.75b^6 + a^2(11a^2b^2 - b^6 - 121b^4 - 2) + 5.5b^8 + a/(2b)
for a = 77617.0 and b = 33096.0.

Rump computed this function in an IBM S/370 main frame, he used single, double and extended precision arithmetic for that and obtained the following results:

  1. single precision: y = 1.172603 …;
  2. double precision: y = 1.1726039400531 …;
  3. extended precision: y = 1.172603940053178 …;

All results lead any user to conclude that IBM S/370 returned the correct result. However this result is WRONG and the correct result lies in the interval
-0.82739605994682135 ± 5e-17

A subsequent paper published in 2002 (https://www.researchgate.net/publication/225180314_Rump’s_Example_Revisited) reproduced the result on modern computers using IEEE 754 arithmetic.

Using SWI-Prolog (C) double floating point arithmetic to compute the Rump example:

?- A=77617.0,B=33096.0, Y is 333.75*B**6+A**2*(11*A**2*B**2-B**6-121*B**4-2)+5.5*B**8+A/(2*B).
A = 77617.0,
B = 33096.0,
Y = -1.1805916207174113e+21.

So this is also wrong and different from the original result. But a simple rearrangement of the expression generates the value from the original paper:

?- A=77617.0,B=33096.0, Y is (333.75-A**2)*B**6+A**2*(11*A**2*B**2-121*B**4-2)+5.5*B**8+A/(2*B).
A = 77617.0,
B = 33096.0,
Y = 1.1726039400531787.

Two different results, both wrong, from mathematically equivalent expressions.

Using SWIP’s rational arithmetic:

?- A=77617,B=33096, RY is 33375r100*B**6+A**2*(11*A**2*B**2-B**6-121*B**4-2)+11r2*B**8+A/(2*B),Y is float(RY).
A = 77617,
B = 33096,
RY = -54767r66192,
Y = -0.8273960599468214.

producing the correct answer (-54767r66192 ≈ -0.8273960599468214).

The referenced “Formal Aspects ..” paper seems to use this example to justify the case for the correctness of interval arithmetic. But using clpBNR’s interval arithmetic implementation:

?- {IY is 333.75*B**6+A**2*(11*A**2*B**2-B**6-121*B**4-2)+5.5*B**8+A/(2*B)}, A=77617.0,B=33096.0.
B = 33096.0,
A = 77617.0,
IY::real(-9.44473296573929e+21, 8.26414134502188e+21).

While this is correct (the answer is in the interval), the interval is too wide to be useful. Arithmetic using the interval’s floating point bounds is susceptible to the same problem. As above, the answer is to use rational arithmetic:

?- {IY is 33375r100*B**6+A**2*(11*A**2*B**2-B**6-121*B**4-2)+11r2*B**8+A/(2*B)}, A=77617,B=33096.
IY = -54767r66192,
B = 33096,
A = 77617.

The issue is the loss of precision when floating point values are normalized for addition. The alternative is to use a numeric representation that is not susceptible to the underlying root cause.

Summary: interval arithmetic (with floating point bounds) exposes the problem, rational arithmetic solves it.

4 Likes