Start of IEEE 754 support

Following the discussions with @ridgeworks and Joachim Schimpf from ECLiPSe, I’ve pushed a couple of commits that provide the first steps towards IEEE-754 support following this proposal

  • (Already since 2014) Support NaN, Inf, -0.0 and copysign/2
  • Added nexttoward/2
  • Have 4 flags, one for each float event (underflow, overflow, zero-div, undefined), that allow either generating one of the IEEE float special values or throwing an exception. By default, as before, all but underflow generate exceptions.
  • Have control over float rounding using the flag float_rounding.

Not all is set in stone. One thing to be done is verify all the special value generation and subsequent handling in functions is correct. I’ll wait for a test suite :slight_smile:

Another issue is float rounding. As is, you can set the flag to to_negative, compute, set it to to_positive, compute again and (optionally) set it back to nearest to get an interval. Unfortunately it is hard to guarantee the flag remains unchanged at the end of the process. The ECLiPSe solution is to have a type breal (bounded real). It feels like a bit of overkill to add another atomic type to deal with this. ECLiPSe is primarily a constraint system where this often makes sense; SWI-Prolog is not. Ideas welcome. One might be an alternative to is/2 that computes the bounds as a compound term. If we define the same compound as a function that, depending on the rounding mode, returns the upper or lower bound, we have something similar to what we used to have for rdiv that allows getting the job done fairly easily with little impact (but a little less elegantly).


I guess it’s not so easy to fully support IEEE-754, and the ECLiPSe proposal also only covers it cursorily. For example we have:

Welcome to SWI-Prolog (threaded, 64 bits, version 8.1.21-1-g58fb90ecb)

?- X is 0**0.
X = 1.
?- X is 0^0.
X = 1.

But 0**0 should return NaN. See also:

Edit 07.02.2020:
I get the same result in ECLiPSe Prolog. ECLiPSe Prolog initially leaves (**)/2 undefined, so I guess it could be modified. But when I start it with the “-L iso_strict” command line option I get:

Version 7.0 #52 (x86_64_nt), Mon Feb  3 23:59 2020

[eclipse 1]: X is 0**0.

X = 1.0
Yes (0.00s cpu)
[eclipse 2]: X is 0^0.

X = 1
Yes (0.00s cpu)
[eclipse 3]:
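For what it’s worth, this is also what a C99 libm is required to do: Annex F specifies pow(x, ±0) as 1 for every x, while IEEE 754-2008 reserves the NaN result for the separate powr operation. A quick check:

```c
#include <math.h>

/* C99 Annex F: pow(0.0, 0.0) is 1.0, not NaN. */
double zero_pow_zero(void)
{
    return pow(0.0, 0.0);
}
```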

I’ll leave all the hairy details to e.g., @ridgeworks who is interested in IEEE support or anyone else who cares enough to study the standards, write test cases and submit PRs to make it compliant. There are most likely a lot of cases where the wrong value or exception is raised. I merely provided the core infrastructure …


Jan got a bit ahead of me; a good thing because I’m still stuck in rational exponents.

My main interest is in turning the multitude of exceptions that can happen in arithmetic evaluation into testable values, even if it isn’t the ISO Prolog way. Testing via unification is cheaper in many ways than wrapping every is/2 in a catch/3 just to map the exceptions back to numeric values (like infinity). I think the new flags will permit this to be done.

I think it’s best to start with the IEEE standard as the guiding document for this. (They don’t make it easy by charging $$ for a copy.) And on top of that we’re limited by what C99 (and C11 I guess) allows us to do, but I have to assume they’re IEEE compliant. Mostly it’s about generating infinities (of the appropriate sign) from overflows. But there are some interesting corner cases like the one above (0**0) as well as 0/0, inf-inf, sqrt(-2), log(-2) etc., which generally mean nan. This may necessitate changes in the current functions and, if so, there will be backwards compatibility issues to be discussed. I think most generate exceptions now so it shouldn’t be a big issue, but 0**0 might be such a case.

Regarding the arithmetic function nexttoward/2, I don’t think the Eclipse version is quite what I’m looking for, although it’s pretty close. The second argument is fine; it’s just an indicator of the rounding direction. I’d probably just use -inf, 0, inf for most cases, but there may be situations where you want to round to a specific number. But based on the first argument (expression to “round”) I would suggest the following semantics, which differ from Eclipse:

  1. If the first argument is a rational (including integers), an infinity, or NaN, the result is that value, i.e., 1 is nexttoward(1,inf). The theory is that rational bounds of an interval are precise and shouldn’t be rounded outwards. Similarly the rounded value of infinity is infinity (same as infinity-1). (And NaN’s are just better left alone.)

  2. If the first argument is a float, there are several cases depending on rounding direction and proximity to zero. E.g., 0.0 should round down to -DBL_MIN and round up to DBL_MIN. Similarly DBL_MIN should round down to 0.0 and -DBL_MIN should round up to 0.0. This avoids generating denormalized floats.
    If the argument is already a denormalized float, the result should be the next normalized value in the direction of rounding.
    Otherwise round the argument to the next floating point value in the indicated direction. Note that the result of rounding a finite float can be an infinity (just overflowed).

  3. If the argument is an expression (compound term):

  • save the current rounding mode
  • set the rounding mode to to_positive if the second argument is positive and to_negative if negative.
  • evaluate the expression.
  • restore the saved rounding mode
  • return the result of the expression, no further rounding necessary. (IEEE floating point hardware should have done the job.)

I don’t think this is close enough to the Eclipse nexttoward/2 function to use the same name, so some decisions lie ahead. However, with this function the need for a separate float_rounding flag may be rare, although it probably doesn’t hurt to have it.

As the principal instigator, I’m prepared to take on the test suite for this unless someone else is desperately interested. In doing so I’ll flag any backward compatibility issues I find on this discussion.

All feedback and/or other participation welcome.

I know sometimes you can get official copies at reduced prices by looking around. At the ANSI site (ref) it is $100.

If others find it for less, please post.

We’ll see. ECLiPSe nexttoward/2 is however a simple direct wrapper around its C99 cousin. I think we should not touch that. Otherwise I agree with your observations and look forward to further input :slight_smile:

I agree we shouldn’t touch it if it’s C99 semantics.

In that case, I have two choices:

  1. Lobby for a new and different function with the semantics I described. (Most efficient, but maybe special purpose.)
  2. Implement these semantics in Prolog using nexttoward and the float_rounding flag as building blocks. This may not be too bad; at some point I’ll assess what this entails.

Here’s an (obviously untested) Prolog implementation of what I tried to describe earlier:

:- op(700, xfx, 'isL').  % adjust result toward -Infinity
:- op(700, xfx, 'isH').  % adjust result toward +Infinity

R isL R :- rational(R), !.  % including integers
R isL F :- float(F), !,
	once(nextDown(F,R)).  % remove choicepoints
3.141592653589793 isL pi :- !.  % atom irrational constant
2.718281828459045 isL e  :- !.  % atom irrational constant
R isL Exp :-
	current_prolog_flag(float_rounding, SaveRM),
	set_prolog_flag(float_rounding, to_negative),
	R is Exp,  % ignoring exceptions
	set_prolog_flag(float_rounding, SaveRM).

% Note: clause order is significant
nextDown(P,0.0) :- 0.0 < P, pos_DBL_MIN(PDM), P =< PDM.  % avoid denormalized numbers
nextDown(N,NDM) :- N =< 0.0, neg_DBL_MIN(NDM), NDM < N.  %     ditto
nextDown(F,FL)  :- FL is nexttoward(F,-inf).             % round downward

pos_DBL_MIN( 2.22507385850720138e-308).
neg_DBL_MIN(-2.22507385850720138e-308).

% isH/2 similar but "reversed".

Note that a C implementation of this limited use of nexttoward -INFINITY is just:

double d;
-- *(int64_t*) &d;   /* positive, finite, non-zero d only */

So not optimal, but not too bad either. It’s just that in a relational arithmetic context, a simple constraint like {Z==X*Y} requires 6 of these operations.
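A slightly safer spelling of that trick, using memcpy to sidestep the strict-aliasing issue and covering the sign cases as well (next_down is my own helper name, not an existing API):

```c
#include <stdint.h>
#include <string.h>

/* Step a finite double one ulp toward -infinity via its bit pattern:
   decrement the magnitude of positives, increment it for negatives,
   and map 0.0 to the smallest negative subnormal. */
double next_down(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    if (d > 0.0)
        bits -= 1;
    else if (d < 0.0)
        bits += 1;
    else
        bits = 0x8000000000000001ULL;   /* -DBL_TRUE_MIN */
    memcpy(&d, &bits, sizeof d);
    return d;
}
```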

In Java there is a function ulp(). I have ported it to Prolog. You can then compute x+ulp(x) or x-ulp(x). I don’t know whether this is exactly nextUp() or nextDown(). I have extended it to float32 and BigDecimal, types available in my Prolog system, but not in the ISO core standard:

ulp: integer -> integer
ulp: float -> float (should be both float32 and float64, have to check)
ulp: decimal -> decimal

ulp is an acronym for “unit of least precision”. This is not an absolute notion like epsilon, which is a constant, but a relative notion: it’s a function that takes an argument. The Java version can also deal with NaN and consorts, but I do not support this in the ALU (Arithmetic Logic Unit) of my Prolog system, and subsequently also not for the ported Prolog function. Here are some sample runs:

Jekejeke Prolog 4, Runtime Library 1.4.2 (January 21, 2020)

/* ISO core float = float64 in my Prolog system */
?- X is 34.543, Y is X-ulp(X), Z is X+ulp(X).
X = 34.543, 
Y = 34.54299999999999, 
Z = 34.543000000000006 
/* float32 */
?- X is 0f34.543, Y is X-ulp(X), Z is X+ulp(X).
X = 0f34.543, 
Y = 0f34.542995, 
Z = 0f34.543003
/* decimal */
?- X is 0d34.543, Y is X-ulp(X), Z is X+ulp(X).
X = 0d34.543, 
Y = 0d34.542, 
Z = 0d34.544

Interesting. Yes it’s pretty close to nextUp/Down other than the checking for corner cases, i.e., checking for infinities and nan’s, catching denormalized values, and not rounding precise values. ulp looks functionally equivalent to nexttoward.

I needed to do something similar using epsilon because I can’t actually add to the set of built-in arithmetic functions in SWIP:

roundDown(R,Z) :-  Z is R - abs(R)*epsilon.  % could overflow to -infinity

This is done after evaluation in my use-case, so R is always a number (as opposed to an expression). Seems to work OK but not a particularly efficient way of doing it.

The Java version also handles some border cases. It’s written in pure Java (from java.lang.Math, author Joseph D. Darcy):

public static double ulp(double d) {
    int exp = getExponent(d);
    switch(exp) {
    case DoubleConsts.MAX_EXPONENT+1:       // NaN or infinity
        return Math.abs(d);
    case DoubleConsts.MIN_EXPONENT-1:       // zero or subnormal
        return Double.MIN_VALUE;
    default:
        exp = exp - (DoubleConsts.SIGNIFICAND_WIDTH-1);
        if (exp >= DoubleConsts.MIN_EXPONENT) {
            return powerOfTwoD(exp);
        } else {
            return Double.longBitsToDouble(1L <<
                (exp - (DoubleConsts.MIN_EXPONENT -
                    (DoubleConsts.SIGNIFICAND_WIDTH-1))));
        }
    }
}

You could also try to give Prolog a few new builtins, like getExponent(), and then bootstrap things from this. It’s even so that getExponent() itself is bootstrapped:

public static int getExponent(double d) {
    return (int)(((Double.doubleToRawLongBits(d) & DoubleConsts.EXP_BIT_MASK) >>
                  (DoubleConsts.SIGNIFICAND_WIDTH - 1)) - DoubleConsts.EXP_BIAS);
}

I had some problems in the past where I used these things as well. That’s quite fun. For example powerOfTwoD can also be bootstrapped via float bit patterns.
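For instance, a C sketch of the powerOfTwoD idea for the normal range: place the biased exponent next to an all-zero significand (power_of_two is a hypothetical name; subnormal exponents are not handled):

```c
#include <stdint.h>
#include <string.h>

/* 2^n for normal exponents (-1022 <= n <= 1023): the IEEE double bit
   pattern is the biased exponent in bits 62..52 over a zero mantissa. */
double power_of_two(int n)
{
    uint64_t bits = (uint64_t)(n + 1023) << 52;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```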

Edit 08.02.2020:
It seems that since Java 1.6 resp. Java 1.8, they also have nextUp() and nextDown(). And interestingly they are bootstrapped from some low-level ops, just as discussed above.

You can check the source. It’s not that difficult to realize: you don’t need flags to instruct the FPU. It could even be the case that the Java nextAfter() is the binary nexttoward/2 from Joachim Schimpf. Not sure. But it all boils down to what you already wrote, I guess: having some primitive representation access.

I wonder whether this is mathematically sound:

What can Exp be? What do you get for such an expression:

?- R isL (-(X+Y))+Z.

The sign change would turn to_negative into to_positive.

Exp can be any expression that is/2 can evaluate. I think of it this way:

  1. Use is to evaluate the expression. Assume for now that it is correct to within 1 ulp.
  2. Calculate two bounds, one by rounding the value down 1 ulp and 1 by rounding it up.

Now the “real” value may not be the original calculated value due to floating point imprecision, but it will be somewhere between the two bounds. My unsound calculation just became sound by representing it as a bounded set of reals.

Back to the assumption that the calculation is accurate to within 1 ulp. One way that can be done is to restrict Exp to a single arithmetic operation. My understanding is that IEEE compliant hardware will make this guarantee. So that’s good but somewhat restrictive.

The other way is to relax this restriction by ensuring the calculated result is always lower than the real value for a lower bound calculation. That’s what the rounding mode is for. isH will use to_positive to achieve the same result for the upper bound. But the result is the same: the “real” answer is guaranteed to be between the bounds even if it isn’t representable as a floating point value.

I think you may be thinking the rounding mode flag (to_negative, to_positive) modifies the expression semantics somehow. It doesn’t; it only controls how the hardware maps an internal result (with guard bits) to a 64 bit float. Default is round to nearest, but the rounding mode flag can be used to change that.

A pair of bounds maps into an Eclipse breal (I think). But evaluating an expression of breal values is a totally new game.

Try this here with isL/2 and isH/2, or some data variation of it:

?- X = 0.05, Y = 0.049999999999999996, Z is -(X+Y).

Edit 09.02.2020:
ECLiPSe Prolog also cannot make extended floats visible through intervals.
I am interpreting your “guard bits” as extended floats or something while evaluating
an expression. At least on the Windows platform this doesn’t work. I get:

[eclipse 15]: X is 0.1, Y is 2^54, Z is breal(X+X/Y).

X = 0.1
Y = 18014398509481984
Z = 0.1__0.1

It only works when the arguments are already intervals:

[eclipse 18]: X is breal(0.1), Y is 2^54, Z is X+X/Y.

X = 0.1__0.1
Y = 18014398509481984
Z = 0.1__0.10000000000000002
Yes (0.00s cpu)

You don’t need the breal/1 around the expression, only the values need
to be breals, and then special interval operations are invoked. I don’t know
how you want to port this behaviour to SWI-Prolog through some rounding flags.

You possibly need a separate data type, similar to the rational number type, to
get the ECLiPSe Prolog behaviour.

How would you get an interval for pi?
Contrary to the documentation of ECLiPSe Prolog,
it doesn’t work as advertised:

   /* what the documentation says */
   ?- X is breal(pi).
   X = 3.1415926535897927__3.1415926535897936

What I get:

   /* what I get */
   Version 7.0 #52 (x86_64_nt), Mon Feb  3 23:59 2020

   [eclipse 3]: X is breal(pi).
   X = 3.1415926535897931__3.1415926535897931 

The documentation error is here:

Edit 09.02.2020:
You get an interval in ECLiPSe Prolog, if you try this:

[eclipse 5]: X is 4*atan(breal(1)).

X = 3.1415926535897927__3.1415926535897936
Yes (0.00s cpu)

I don’t know what’s the best interval right now. Maybe you want to
optimize the error to 1/2 ULP or something: instead of only looking at
to_negative/to_positive, rather try to minimize the distance to (L+H)/2.

The to_negative to_positive interval would be:


But the (L+H)/2 error is ca. 1.1*10^(-16), whereas the
atan (L+H)/2 interval error is only ca. 8.8*10^(-17). On the
other hand the bigger interval might propagate more errors.

Current implementation:

?- X = 0.05, Y = 0.049999999999999996, clpBNR:isL(Z,-(X+Y)).
X = 0.05,
Y = 0.049999999999999996,
Z = -0.10000000000000003.

?- X = 0.05, Y = 0.049999999999999996, clpBNR:isH(Z,-(X+Y)).
X = 0.05,
Y = 0.049999999999999996,
Z = -0.09999999999999998.

So the bounds are [-0.10000000000000003,-0.09999999999999998].

Lower bound is always less than the upper bound, which holds. It looks a bit odd because the magnitude of the LB is greater than the UB due to the negative sign.

No, the guard bits are the extended bits in the CPU hardware that are carried during the computation. It’s what is used to ensure the 64 bit result is within 1 ulp. Extended (80 bit) floats are something different.

Yes, breal's. But as Jan indicated, that’s not currently an option for SWIP. But I don’t need them to be “built-in”. I can use attributed variables and build the arithmetic for them in Prolog. For your Eclipse example:

?- {Z is 0.1+0.1/2**54}.
Z = _55494{real(0.09999999999999998,0.10000000000000003)}.

Two things: interval (similar to breal) constraints are specified using {...}, similar to some other constraint systems, but unlike Eclipse and clp(fd). Z is a variable inside a constraint so it’s automatically converted to an interval.

The second point is that 0.1, or any float for that matter, cannot be guaranteed to be an exact real value. So they are rounded out before any calculation. (The Eclipse answer is a bit suspect IMO.)

The constant pi is just a floating point constant which is “fuzzed” like any other:

?- {Pi == pi}.
Pi = _56544{real(3.1415926535897922,3.141592653589794)}.

This is even coarser than Eclipse probably due to the outward rounding using
epsilon. nexttoward or your ulp would probably give a tighter answer. But that’s of secondary concern to me; the main thing is that the interval contains the real value of pi so all the math is sound, even if floating point arithmetic is used.

Also note that the answer that Eclipse gives for X is breal(pi). is incorrect in that the breal does not contain the real value of pi (3.141592653589793238463...). The documentation looks correct to me so it’s an implementation bug. Of course pi is an irrational number so it has no finite machine representation.


?- {X is 4*atan(1)}.
X = _57946{real(3.1415926535897922,3.141592653589794)}.

Same as Pi which is a good thing.

Joachim Schimpf confirmed a documentation error on comp.lang.prolog just now. But I guess this kind of guesswork is part of the ongoing research. It’s always difficult to judge what a system does, especially since you are now dealing with 3 systems: SWI-Prolog, ECLiPSe Prolog and your CLP(BNR). Is it possible to get some hands-on experience with CLP(BNR)? Is there a pre-built Windows .EXE for those poor souls that do not build on their own?

Edit 09.02.2020:
Of course the pi/0 of type float thingy could also lead to a system change in ECLiPSe Prolog, like having a pi_breal/0 constant of type breal somewhere. Is there such a constant? ECLiPSe Prolog would need to provide “fuzzified” variants for all common constants.

The CLP(BNR) approach is of course more elegant. Maybe we should not compare with breal/1 but rather with CLP(R) in ECLiPSe Prolog. Didn’t check yet what’s around.

I think I finally (sorry about that) understand the point you’re making, and it’s valid. Each operation, e.g., +, must be properly rounded before proceeding. Internally:

?- [X,Y]::real(100.0,101.0),{T==X+Y,Z== -T}.
X = _62238{real(99.99999999999997,101.00000000000003)},
Y = _62400{real(99.99999999999997,101.00000000000003)},
T = _62906{real(199.9999999999999,202.0000000000001)},
Z = _63942{real(-202.0000000000001,-199.9999999999999)}.

So all is good if the rounding is done at each step. As you noted, some don’t require rounding at all, e.g., unary -. But all this is hidden from the user, he just writes:

?- [X,Y]::real(100.0,101.0),{Z== -(X+Y)}.
X = _67084{real(99.99999999999997,101.00000000000003)},
Y = _67246{real(99.99999999999997,101.00000000000003)},
Z = _67914{real(-202.0000000000001,-199.9999999999999)}.

Where the rounding mode is helpful is on each bounds calculation, e.g., for Z==X+Y, lb(Z) isL lb(X)+lb(Y) and ub(Z) isH ub(X)+ub(Y).

For Z== -T then lb(Z) is -ub(T) and ub(Z) is -lb(T) (note no rounding).

It’s all in Prolog (SWI-Prolog to be precise), so no platform specific code (one of my original objectives). It’s a SWIP package, so it should install from the package manager (use pack_install/2). But you can download from github directly if that’s more convenient. Instructions are at ; let me know if they’re not clear.

Although it all seems to be fairly stable, it’s still an alpha release as far as I’m concerned and documentation is even more alpha.

Quite right, or the ic library in Eclipse. Interval arithmetic (i.e., breal's) is just the foundation. (In this world pi is like any other floating point constant which must be fuzzed when constructing a constraint.)

BTW, none of this is particularly new. Some 20-30 years ago I worked on the original CLP(BNR), which was a Unix based Prolog. Also CLIP (Hickey et al.) and Numerica (Van Hentenryck), a DSL focusing on global optimization problems using relational interval arithmetic. But none of these seem to be still available in working form.

And interval arithmetic goes back to the 60’s (R.E. Moore).


Ok, currently you are just fuzzing up and down by 1 ULP,
blindly it seems. And then for example you use:

% Z := X + Y  (add)
+([Xl,Xh], [Yl,Yh], [Zl,Zh]) :-
	Zl isL Xl+Yl, Zh isH Xh+Yh.            % Z := [Xl+Yl, Xh+Yh].

This would probably give you a much larger interval
than what ECLiPSe Prolog produces, assuming ECLiPSe Prolog
has access to more accurate rounding. Example from
comp.lang.prolog by Joachim Schimpf:

?- T is breal(1)/10, One is T+T+T+T+T+T+T+T+T+T.  % interval arithmetic
T = 0.099999999999999992__0.1
One = 0.99999999999999978__1.0000000000000007
Yes (0.00s cpu) 

Did not yet check, but I guess it currently comes out worse in the
SWI-Prolog port of CLP(BNR). This would also explain the
coarser atan/1 result.

But SWI-Prolog would have the option to incorporate BigFloat
from GMP. This would even allow selecting the number of bits
one wants to have. Maybe I can find a Java library for BigFloat.

BigDecimal is standard in Java, but doesn’t have trigonometric
functions. BigFloat is not standard in Java. Maybe slower than the
FPU; on the other hand you could go arbitrary precision, something
that ECLiPSe Prolog might be missing.

Edit 10.02.2020:
Here they show computing e to 200 bits:

$ ./sample
Sum is 2.7182818284590452353602874713526624977572470936999595749669131

Yes. I don’t really have much choice since I don’t know which way the hardware rounded the answer when it converted back to a 64 bit float. But I think it’s correct to say that if the rounding mode on the hardware was used, the bounds should be 1 ulp apart for any single operation, rather than 2 ulps for the current scheme. And it should be faster.

Good question. But to be honest, I don’t care much. 64 bit floats provide lots of precision for most purposes so throwing some away is a good tradeoff for mathematical soundness. For me, a precise answer isn’t of much value if it’s wrong.