Correct floating point arithmetic

OK. A platform independent, correctly rounded math library is tantalizingly close based on this discussion, but the proposed solutions are far from perfect (added library dependency, performance issues, functional overkill), so I’m happy to let this slide in the hope of something better in the future.

Now I nearly fell into a rabbit hole. I was having doubts about (=:=)/2 and friends. What are your plans for these predicates? They currently work in the first mode (less precision, more speed), converting arguments to float; this is not the exact comparison of the second mode (more precision, less speed).
This is in the ISO core standard:

[image: the relevant clause of the ISO core standard]

Currently we have the following in most Prolog systems, which is correct
as a first mode (less precision, more speed):

/* SWI-Prolog 8.5.20 */
?- 563^56 =:= 1.0677513069269998e+154.
true.

?- 1/(563^56) =:= 9.365476712718912E-155.
true.

On the other hand, in the second mode (more precision, less speed),
we would get this result:

?- 563^56 =:= integer(1.0677513069269998e+154).
false.

Edit 20.10.2022
But there is no number type that works for all floats. integer/1 does the
job of transferring the full precision from float to bigint, but only for
numbers which are large enough and don’t really have a fraction. Usually
systems that offer high precision also implement the second mode (more
precision, less speed) of comparison, at least it seems so to me from some
experimentation. For example http://numcalc.com/ gives me:

> 1000 == 1.0E3
true

> 563^56 == 1.0677513069269998e+154
false

> 1/(563^56) == 9.365476712718912E-155
false

Quite a dilemma: there are now also two modes of comparison, and the
second mode is difficult to provide, since integer/1 doesn’t always do
the job. I didn’t try SWI-Prolog rational numbers. Would 1 rdiv (563^56)
compared to a float give a different result? Is this the hidden door to
a second mode? Maybe dependent on some flags?

I have no plans. Is it really true that ISO demands conversion to float for =:=/2? That can’t be the case unless they did not anticipate the existence of big integers. They did, as there is a flag bounded. The rules are simple: when both are of the same type, we compare directly. Otherwise we convert to a common type. In practice we now only have rationals and floats (integers are rationals), so if one is a float we convert both to a float and compare, otherwise we simply compare. Rational comparison is the same as term equivalence (==/2) anyway.
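
As a rough sketch, the rule above could be written down as a Prolog predicate (hypothetical name, just to make the conversion explicit; =:=/2 itself already behaves this way):

% mixed_eq(+ExprX, +ExprY): evaluate both sides; if either result
% is a float, compare as floats, otherwise compare exactly.
mixed_eq(X, Y) :-
    EX is X,
    EY is Y,
    (   ( float(EX) ; float(EY) )
    ->  float(EX) =:= float(EY)    % mixed case: compare as floats
    ;   EX =:= EY                  % both exact: compare exactly
    ).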

So, in 563^56 =:= integer(1.0677513069269998e+154) above, we compare a bigint to another bigint. The float represents an interval of real numbers, and floats this big also represent a set of integers. A good question is which of them is picked? I guess it should ideally be the one in the middle.

Since floats are a subset of rationals, shouldn’t the float be converted to a rational before comparing?

I was hoping that http://numcalc.com/ could do it,
but it seems to give up on rational numbers:

> \p 53

> 12846184174843000 / 2**53 == 1.4262129449486016
true

> 12846184174843001 / 2**53 == 1.4262129449486016
true

Which leads to a nice logical contradiction, via symmetry
and transitivity, since it has nevertheless:

> 12846184174843000 / 2**53 == 12846184174843001 / 2**53
false

But symmetry and transitivity would give a=b & c=b => a=c. Then again, I guess
you find such symmetry and transitivity violations also in Prolog without
rational numbers, just because of the float rounding. Can there be a symmetric
and transitive (=:=)/2, obtained by making it more precise?

Not in my view. A rational is exact, while a float represents an interval. In Prolog arithmetic we may implicitly convert from the exact world to the float world, but not the other way around, except for explicit conversion. Ideally we use prefer_rationals (which I have enabled by default), so 1/3 is indeed the rational 1/3 (or 1r3, as agreed with ECLiPSe for portable syntax), so we do not have to move to the imprecise world if not needed.
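
For example, with the flag set (assuming a SWI-Prolog version that supports rational numbers):

?- set_prolog_flag(prefer_rationals, true).
true.

?- X is 1/3.
X = 1r3.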

But we fundamentally have R → R functions that demand floats, e.g.

 ?- A is e**log(5).
 A = 4.999999999999999

Here we see that equivalence using floats is difficult :frowning:

This is very surprising to me… Can you explain better? Which interval is represented by a float F?

I also found this surprising, i.e., a float is not a number(?). And floating point arithmetic in no way obeys the rules of interval arithmetic, i.e. arithmetic on sets of numbers, so floats certainly don’t “act” like intervals.

I don’t believe there are “two worlds”, just the “real” world. In that world floats are a subset of rationals, which in turn are a subset of reals. All represent precise values, but clearly floats cannot precisely represent all rationals (or reals). Also, floating point arithmetic is mathematically unsound, so problems do arise when you try to treat it as if it were sound.

The example A is e**log(5) is a little problematic since the e here is an imprecise float approximating e, Euler’s number. So maybe the calculated value of A is a correct (precise) value.

But the set of real numbers is ordered, so it should always be possible to compare two real values, regardless of their representation. The original query would be mathematically sound if floats were truly viewed as a subset of rationals and so were converted to rationals before comparing.

?- X is 563^56, Y is float(X), Z is rational(Y), X =\= Z.
X = 10677513069269997516399447194695704099555397064807727001054698543834721136271816976528146321903788538392823298685745841931165971147583689889896977012707041,
Y = 1.0677513069269998e+154,
Z = 10677513069269997545269698019591811751464710995362620267227333714081530862014611278262343441335941736142189062670101631597639432277660465201020053907046400.
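
A minimal sketch of such a rational-based comparison (hypothetical predicate name; it relies on rational/1 converting a float to its exact rational value):

exact_eq(X, Y) :-
    RX is rational(X),   % a float becomes its exact rational value
    RY is rational(Y),
    RX =:= RY.           % exact comparison, no rounding involved

With this, exact_eq(563^56, 1.0677513069269998e+154) should fail, consistent with X =\= Z above.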

OTOH, what you do about it, if anything, is an open question.

IMO, the ISO spec fragment is just irrelevant (if not wrong) given the state of arithmetic in modern Prolog systems.

I don’t find this surprising … as someone who has been subjected to a few numerical analysis courses (stiff differential equations, ill-conditioned matrices, epsilon, etc. etc.) plus a few physics courses (number of significant figures, etc.), I think of floating point numbers as an abbreviation for a range of values, and they don’t obey many of the normal rules of arithmetic (associative or distributive properties, even simple identities such as x/y*y=x).
The accuracy calculations tend to get somewhat nasty (which is probably why interval arithmetic libraries aren’t terribly popular), but the accuracy is always in the back of my mind – a floating point number is nothing more than a best guess of a value with an inaccuracy (often unstated). Part of the “fun” of numerical analysis is finding better ways of computing values that reduce the accumulated errors while also doing the computations quickly.
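
A throwaway query (the classic case, shown here in SWI-Prolog syntax) makes the point about such broken identities:

?- 0.1 + 0.2 =:= 0.3.
false.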

This, of course, doesn’t fit well with constraints over inequalities, especially because the epsilon more-or-less says that it’s sometimes impossible to know whether one number is less than, equal to, or greater than another, and therefore it might be impossible to order a given set of floating point numbers.


I think my wording was not very good. @peter.ludemann gives a more accurate description of the way I look at floats.

Thinking further about this, I think the crux is in equality. As long as we stay in the rational number domain, arithmetic is 100% precise and equality is well defined. Once we get to the float domain this gets hard. As @peter.ludemann says, and how I have always looked at it as someone trained as an engineer, a float is (normally?) an approximation of some (real) value; I guess we can see “real” in two ways here, a mathematical number and the value of something in the real world. Given two floats that are equal, but that came to us through different calculations, they may both be approximations of two different real numbers, so the equivalence is false. Similarly, as 5.0 =:= e**log(5) shows, two floats that result from two different computations representing the same real value may not compare equal in the float world. Even for ordering, the float ordering may differ from the ordering of the actual values the numbers approximate.

Possibly there are two views: one considering floats an ordered set of points in the space of real numbers, as a mathematical concept, and one considering them approximations of real numbers/values. I’m in favor of the second, probably due to my background?

In the second view, float comparison is pretty much useless without considering the accuracy of the approximation at the same time.

One addition I would make to this: for calculations with variables, one should try to do as much of the math symbolically as possible before obtaining a numerical result.

This is why I am a fan of

IMHO your model is wrong: the point is not that floating point numbers are approximations (in your words, that they “represent an interval”), but that the operations on floating point numbers are approximate.

Once we confuse the fp “+” (rounded floating point addition) with the “+” (arithmetic addition) operator, we might be led to think that floating point numbers are approximations, but they are not.

In summary, floating point numbers are a subset of rationals (as @ridgeworks noted) and the operators on floating point numbers are not the mathematical operators.

I would call that the mathematical way to see it. I’m an engineer :slight_smile: I don’t know whether we have a right/wrong or practical/impractical or whatever way to view these matters here.


Well, my use of “wrong” is definitely improper, sorry.

What I meant is that I believe that following the model “floating point numbers are approximations” leads to a number of problems (e.g. why say that 1.0 is an approximation? why say that 0.0 =\= 0?) that are easily avoided by considering floating point numbers as exactly representing a rational number, and fp “+” and mathematical “+” as two different operators (despite having the same symbol).
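
For instance, each float literal denotes one exact rational; in SWI-Prolog the exact value behind 0.1 can be inspected directly (the expected result on a system with 64-bit doubles):

?- X is rational(0.1).
X = 3602879701896397r36028797018963968.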

No worries. What I do worry about is what fp “+” means. And, while this might still be fairly easy with +, it gets fuzzier with many of the mathematical R → R functions, as we have seen in this forum. Well, maybe there is a sound definition, but real-world implementations do not appear to behave according to that definition :frowning:

Your view doesn’t help me much in understanding why 5.0 =:= e**log(5) is false. Yes, because we have fp “e” and fp “log” which are not their mathematical counterparts, I guess. But in practice we are faced with a rather arbitrary subset of equations that are true in real math and that are also true in the float world.
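
For example, sqrt of an exact square survives the trip through floats (IEEE sqrt is correctly rounded), while the e**log(5) case above does not:

?- sqrt(4.0) =:= 2.0.
true.

?- e**log(5) =:= 5.0.
false.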

Maybe this is the time to fork the thread again. The title says “Correct
floating point arithmetic”, but the thread initially searched for a crlibm,
which is an acronym for “correctly-rounded elementary functions
in double-precision”.

There is possibly a subtle difference between the two? I believe the scope
of correctly-rounded elementary functions in double-precision is much
narrower. In my opinion, from skimming the crlibm documentation,
it excludes/includes the following issues:

  • Since it refers to elementary functions, it only deals with the
    accuracy of a single function invocation, mapping F -> F in the
    floating point domain. It doesn’t deal with function composition like
    e**log(5), nor does it ask for a mapping R -> R in the real number domain.

  • You can take any document that does the usual specification
    of correctly-rounded, be it the ISO core standard document
    or any other document which reflects the basic idea of a
    rounding function rnd : R -> F.

  • In this setting the only problem that is solved is for an elementary
    mathematical unary or binary etc… function g : R -> R or h : R x R -> R,
    etc… to find machine functions g' : F -> F and h' : F x F -> F,
    such that they obey:

    g'(u)   = rnd(g(u))
    h'(u,v) = rnd(h(u,v))

  • Symbolic computation and other methods can do much more,
    especially for the composition of functions like e**log(5). You
    have notions such as the aggregate rounding error resulting from
    the composition of multiple functions, which was already introduced
    by Gauss (1777 - 1855), sometimes called “error propagation”.

  • crlibm itself is a misnomer: libm should already round correctly. crlibm
    additionally offers functions for multiple rounding modes, so that interval
    arithmetic can also be realized. This is also a tool to compute aggregate
    rounding errors, namely to compute so-called error bounds.

  • But I guess interval arithmetic itself is again not in the scope of a
    crlibm. Only the different rounding modes that could lead to the forming
    of intervals are. You see this in crlibm, in what they spend their
    precious time on:

I doubt you will ever find a crlibm, which has a very narrow
specification, if you start discussing heaven and earth. On the
other hand I admire everybody’s passion for heaven and earth,
and maybe my scoping of crlibm is too narrow?

There is a further nifty detail which makes the crlibm topic extremely
narrow. For example it says “correctly-rounded elementary functions
in double-precision”. So it also fixes the binary precision to p=52.

But error propagation or interval arithmetic, for example when the
interval bounds are rational numbers, can calculate with smaller
precisions or with larger precisions. Also, dynamic methods that
start with a small precision and move to a larger precision are possible.
This wouldn’t be covered by a classical crlibm with p=52.

Possibly things along the lines of http://numcalc.com/ could help getting
out of the cage of p=52. But do you want to get out of the cage?
You could open a new thread for that. The current architecture
of SWI-Prolog seems to rather assume p=52, snapshot 8.5.20:

  • ar_XXX() functions in SWI-Prolog that use cr_XXX() functions only
    change the rounding mode, with the intent of producing double-precision
    floating point values?

  • cr_XXX() functions in SWI-Prolog that can respect different rounding
    modes only offer them for double-precision floating point values?

Looking at the above two bullet points, it seems to me that SWI-Prolog
as of version 8.5.20 currently has an architecture that heavily leans
towards a crlibm in the sense of a crlibm for f64, i.e. p=52.

Isn’t the reason this fails that floating point arithmetic is mathematically unsound? And x = e**log(x) is a mathematical identity that isn’t necessarily true in an unsound system of numeric arithmetic. If I use mathematically sound arithmetic (as in clpBNR’s interval arithmetic):

?- {5.0 =:= e**log(5)}.
true.

IMO, the reason floating point arithmetic is widely used is that it’s very fast (hardware support) and usually good enough. Adapt whatever mental model you need to make sense of its inherent idiosyncrasies.

And for the record, I am an engineer.


Pretty good summary IMO. The only kind of float that SWIP (currently) supports is the C double. There does exist a partial crlibm (the library actually named crlibm) which would help, but it’s unsupported and can’t easily be accommodated within SWIP’s licensing and building constraints. All the discussion about mpfr and libBf was largely focused on whether it made sense to build a “custom” crlibm using one of those libraries.


Yes, that was my proposal. Access to the libraries would be hidden behind the “cr_* → double” functions in such a way that the underlying library could be easily changed in a future release. Obviously those libraries are overkill for just this functionality.

Using the same libraries in other ways, e.g., multi-precision or alternative radixes, is a new ball game (from my perspective) and all your questions would need to be addressed. That’s the kind of thing that should probably be done using the foreign language interface and kept separate from the core SWIP (along the lines of @jan’s suggestion earlier in the thread).