LibBF binding

I have a partially working version of the integer root functions based on Newton’s method:

?- M is (2^32+3), N is M^3, nth_integer_root_and_remainder(3, N, Root, Remainder).
M = Root, Root = 4294967299,
N = 79228162680285034372894031899,
Remainder = 0.

but there’s an issue when I try to find the same root in an arithmetic expression:

?- M is (2^32+3), N is M^3, R is N**1r3.

It either produces the wrong result or hangs, requiring a manual abort. With some print statements added for debugging, the code looks like:

mpz_rootrem

void
mpz_rootrem(mpz_t rop, mpz_t rem, const mpz_t op, unsigned long int n)
{ mpz_t cn, cn1, nxt, x;

  mpz_init_set_ui(cn, n);     // const n
  mpz_init_set_ui(cn1, n-1);  // const n-1
  mpz_init_set(nxt, op);
  mpz_init(x);

  printf("n= %lu ", n); bf_print_str("op ", op);
  do  // using rop and rem as temporaries
  { mpz_set(x, nxt);                            // x = nxt
    bf_print_str("op before pow ", op);
    // mpz_pow_ui(rop, x, n-1);                 // rop = pow(x, n-1)
    bf_pow(rop, x, cn1, BF_PREC_INF, BF_RNDN);  // rop = pow(x, n-1)
    bf_print_str("op after pow ", op);
    mpz_fdiv_qr(rop, rem, op, rop);             // rop = op // rop
    // mpz_addmul_ui(rop, x, n-1);              // rop = rop + x * (n-1)
    mpz_mul_ui(rem, x, n-1);                    // rem = x * (n-1)
    mpz_add(rop, rop, rem);                     // rop = rop + rem
    mpz_fdiv_qr(nxt, rem, rop, cn);             // nxt = rop // n
  } while (mpz_cmp(nxt, x) < 0);

  mpz_set(rop, x);

  bf_pow(rem, rop, cn, BF_PREC_INF, BF_RNDN);   // rem = pow(rop, n)
  bf_sub(rem, op, rem, BF_PREC_INF, BF_RNDN);   // rem = op - rem

  mpz_clear(cn);
  mpz_clear(cn1);
  mpz_clear(nxt);
  mpz_clear(x);
}
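For reference, the Newton update the loop implements is nxt = ((n-1)*x + op/x^(n-1)) / n, starting from x = op and stopping once the sequence no longer decreases. A hypothetical standalone sketch on machine integers (the function name `nth_rootrem` is mine; it uses the GCC/Clang `unsigned __int128` so x^(n-1) cannot overflow for 64-bit op):

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine-integer version of the same iteration, not the libBF code. */
static uint64_t nth_rootrem(uint64_t op, unsigned n, uint64_t *rem)
{ uint64_t x, nxt = op;

  if (op < 2 || n == 1)
  { *rem = 0;
    return op;
  }
  do
  { unsigned __int128 p = 1;

    x = nxt;
    for (unsigned i = 0; i < n-1; i++)
      p *= x;                                   /* p = x^(n-1) */
    nxt = (uint64_t)(((unsigned __int128)(n-1)*x + op/p) / n);
  } while (nxt < x);

  unsigned __int128 p = 1;
  for (unsigned i = 0; i < n; i++)
    p *= x;                                     /* p = x^n */
  *rem = op - (uint64_t)p;
  return x;
}
```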

When I look at the debug output from the working query:

?- M is (2^32+3), N is M^3, nth_integer_root_and_remainder(3, N, Root, Remainder).
n= 3 op =0x0.800000048000000d8000000d80000000p97
op before pow =0x0.800000048000000d8000000d80000000p97
op after pow =0x0.800000048000000d8000000d80000000p97
op before pow =0x0.800000048000000d8000000d80000000p97
op after pow =0x0.800000048000000d8000000d80000000p97
op before pow =0x0.800000048000000d8000000d80000000p97
op after pow =0x0.800000048000000d8000000d80000000p97
 .
 .

This is good, i.e., no change to the op value. But the failing query outputs:

?- M is (2^32+3), N is M^3, R is N**1r3.
n= 3 op =0x0.800000048000000d8000000d80000000p97
op before pow =0x0.800000048000000d8000000d80000000p97
op after pow =0x0.80000009000000438000010e0000025f800002d90000016c8000000000000000p193
op before pow =0x0.800000048000000d8000000dc0000000p98
op after pow =0x0.e38e38f38e38e40638e39019c71c76071c71cc4471c71f12p191
op before pow =0x0.aaaaaab0aaaaaabcaaaaaabd80000000p97
op after pow =0x0.ca4587f4f03291cca45880181948b4ce9e0656e1948b1244p190
op before pow =0x0.e38e38eb8e38e3a638e38e5300000000p96
op after pow =0x0.b3cc07129c9eba7d03dce3a3a4b267d41ab04d3a4b264920p189
^C^C
WARNING: By typing Control-C twice, you have forced an asynchronous
WARNING: interrupt.  Your only SAFE operations are: c(ontinue), p(id),
WARNING: s(stack) and e(xit).  Notably a(abort) often works, but
WARNING: leaves the system in an UNSTABLE state

This indicates that bf_pow is somehow modifying the op value, even though op isn’t one of its arguments, suggesting memory corruption of some kind. Using the mpz_pow_ui function (which calls bf_pow) instead doesn’t change anything. Also note that the op value is modified further before the start of the next iteration.

I don’t have the expertise/tools to tackle this, but any suggestions for resolving it are welcome.


Thanks. I’ll have a look. LibBF generally cannot handle an output argument that is the same as one of the input arguments, and typically raises an assertion error when this is violated. I do not know whether the checks are exhaustive; on Apple they are by default only enabled for debug builds. The mpz emulation should compensate for this, but probably doesn’t in all places.

Seems to work now :slight_smile: mpz_fdiv_qr() could not handle shared output and input arguments, and your mpz_rootrem() also modified its own input when arguments were shared.
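The hazard and the usual fix can be illustrated with a toy example (not the actual bf_gmp.h code; `divmod` and `divmod_safe` are hypothetical names): a function that writes one of its outputs while it still needs to read an input gives a wrong answer when the two alias, and a wrapper that copies aliased inputs to temporaries repairs it.

```c
#include <assert.h>

/* Unsafe: writes *q before it has finished reading *a, so it breaks
   when the caller passes the same object for q and a. */
static void divmod(long *q, long *r, const long *a, const long *b)
{ *q = *a / *b;        /* if q aliases a, a is clobbered here...       */
  *r = *a % *b;        /* ...and the remainder uses the wrong value    */
}

/* Safe wrapper: copy the inputs to temporaries first -- the same
   guard pattern the mpz emulation needs around bf_* calls. */
static void divmod_safe(long *q, long *r, const long *a, const long *b)
{ long ta = *a, tb = *b;

  divmod(q, r, &ta, &tb);
}
```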

The first indeed resulted in an assertion error. Now I should check mpz_to_double(), as

 ?- M is (2^32+3), N is M^3+1,R is N**1r3.

should promote to float, but crashes :frowning:

Yes, I guess I need to clean that up. I was looking to minimize the number of temporary “registers”, but that was not a good idea.

I worked around it. Whether it is a good or bad idea to reduce the number of temporary bignums is not clear to me. As things stand it probably is, but I plan to avoid allocation for “smallish” bignums altogether.

Just pull the latest version :slight_smile:

My recollection is that square root can be done in 18 machine instructions on an IBM 370. :wink:

I did a Google search for [IBM Research square root] and found a few interesting items, e.g., Computation of elementary functions on the IBM RISC System/6000 processor (Journal Article) | OSTI.GOV, and some patents.

(I have no particular interest in these things … but in ~1987 I was TL for the IBM/370 C runtime, and part of that involved productizing IBM Research’s elementary functions library (written in assembler). I didn’t deal directly with this work (it was done by a ulp-zealot) and I don’t remember any of the researchers’ names; and no doubt there have been improvements in subsequent years. This was pre-IEEE floating point; and I don’t know how much depended on things like the ability to do unnormalized arithmetic)

In further debugging mpz_rootrem, e.g., for negative values, I noticed there were other functions in bf_gmp.h that need to guard against use of the same argument as input and output. The one I tripped over was mpz_tdiv_qr, but it looks like there are others, e.g., most of the basic arithmetic predicates as well as mpz_fdiv_r, mpz_tdiv_r, … . Maybe the general rule of not sharing inputs and outputs isn’t always applicable?

It also appears that there’s a bug somewhere in pl-arith in processing the remainder value from mpz_rootrem:

%  plus increment
?- M is (2^32+3), N is M^3+5, nth_integer_root_and_remainder(3,N,R,Rem).
M = R, R = 4294967299,
N = 79228162680285034372894031904,
Rem = 5.

?- M is (2^32+3), N is M^3+5, R is N**1r3.
M = 4294967299,
N = 79228162680285034372894031904,
R = 4294967298.999994.

% minus increment
?- M is (2^32+3), N is M^3-5, nth_integer_root_and_remainder(3,N,R,Rem).
M = 4294967299,
N = 79228162680285034372894031894,
R = 4294967298,
Rem = 55340232285553164302.

?- M is (2^32+3), N is M^3-5, R is N**1r3.
M = 4294967299,
N = 79228162680285034372894031894,
R = 4294967298.999994.

but the same also occurs in 8.4.1, so it has nothing to do with the current activity. I’ll put this in my queue for investigation once the libbf stuff has basic functionality.


These are the two I want to compare:

% root slightly more than 4294967299 NOK
?- M is (2^32+3), N is M^3+5, R is N**1r3.
M = 4294967299,
N = 79228162680285034372894031904,
R = 4294967298.999994.

% root slightly less than 4294967299 OK
?- M is (2^32+3), N is M^3-5, R is N**1r3.
M = 4294967299,
N = 79228162680285034372894031894,
R = 4294967298.999994.

I added the nth_integer_root_and_remainder calls to show that the results of the underlying C function appear to be correct; how those results get mapped to a float is what is suspect.

The libBF power function seems to be woefully slow:

% SWIP 8.5.20 - result is bigint
?- M is (2^32+3), time((between(1,100000,_), X is M^2, fail; true)).
% 200,000 inferences, 0.028 CPU in 0.028 seconds (100% CPU, 7092953 Lips)
M = 4294967299.

% using libBF - result is bigint
?- M is (2^32+3), time((between(1,100000,_), X is M^2, fail; true)).
% 200,000 inferences, 4.472 CPU in 4.473 seconds (100% CPU, 44722 Lips)
M = 4294967299.

% libBF app but result is not bigint
?- M is (2^31+3), time((between(1,100000,_), X is M^2, fail; true)).
% 200,000 inferences, 0.021 CPU in 0.021 seconds (100% CPU, 9544262 Lips)
M = 2147483651.

Is libBF going to be usable in its current state?

The whole point of this exercise is to figure that out :slight_smile: On the other hand, multiplication is doing quite well, so we could roll our own integer power, no?
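Rolling our own integer power on top of multiplication is straightforward with binary exponentiation, which needs only O(log exp) multiplications. A sketch on machine integers (a bignum version would do the same with bf_mul() on temporaries; `ipow` is a hypothetical name, and overflow is ignored here):

```c
#include <assert.h>
#include <stdint.h>

/* Binary (square-and-multiply) exponentiation: examine the exponent
   bit by bit, squaring the base each step and multiplying it into the
   result when the current bit is set. */
static uint64_t ipow(uint64_t base, unsigned exp)
{ uint64_t r = 1;

  while (exp)
  { if (exp & 1)
      r *= base;        /* fold in the current power of the base */
    base *= base;       /* base^(2^k) for the next bit           */
    exp >>= 1;
  }
  return r;
}
```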

There are three possible continuations:

  1. Forget about LibBF. All we have left then is some abstraction that allows easier replacement for big integers later.
  2. The LibBF version gets roughly the same performance as the GMP version and will be the default. That is my hope.
  3. The LibBF version is easy enough to keep around as alternative and provides bigints if GMP cannot be used (typically for license reasons).

Functionality-wise we are getting quite close. There are surely bugs and some functions are really slow. Some of that could be easy to fix. Hope that is enough.

Note that we already replaced random and have mostly working rationals, so we now only use a fairly small set of integer functions from LibBF. That makes another switch fairly simple. I hope that is not needed, so that we can use LibBF and some of its other floating point goodies.

Yes, this is the root of the problem. When there is a remainder, the base and exponent are converted to floats and the rounding errors take over. The end result is that, unlike the mathematical function, ** is not monotonic over bigints, e.g.:

?- M is (2^32+3), (M**3+1)**1r3 < (M**3)**1r3.
M = 4294967299.

With a small integer this does not happen, e.g.:

?- M is (2^16+3), (M**3+1)**1r3 < (M**3)**1r3.
false.

Let’s define loss as loss of accuracy. Bigint-to-float conversion is not lossless; it introduces an error which can go both ways, i.e., be positive or negative.

That’s because it mixes float-converted results, which have some loss, with exact results, which are lossless. The exact result is lossless because it doesn’t first convert the base or exponent to float:

?- M is (2^32+3), X2 is (M**3+1)**1r3, X1 is (M**3)**1r3, X1 > X2.
M = X1, X1 = 4294967299,
X2 = 4294967298.999994.

This doesn’t happen if you don’t mix, i.e., if you make everything float-converted:

?- M is (2^32+3), X2 is (M**3+1)**(1.0/3), X1 is (M**3)**(1.0/3), X1 > X2.
false.

Normally a correctly rounded float (**)/2 is monotonic wherever the original real-valued function is monotonic, because the rounding function is monotonic and the composition of monotonic functions is monotonic. If you start mixing lossy and lossless results, you destroy this monotonicity: in the neighbourhood of an exact, lossless result you can have an inexact result with loss, and the error of the loss can be so big that it crosses the boundary of the exact result and invalidates monotonicity.

In current version:

51 ?- M is (2^32+3), time((between(1,100000,_), X is M^2, fail; true)).
% 200,000 inferences, 0.072 CPU in 0.072 seconds (100% CPU, 2771180 Lips)
M = 4294967299.

vs. 0.027 for 2^31. When I implement more efficient memory handling the difference should get smaller. Seems this is not going to be a show stopper either :slight_smile:

LibBF version (Ubuntu 22.04, AMD3950X, -O2 (no PGO))

56 ?- time((between(1,1000000,_), fail; true)).
% 1,000,002 inferences, 0.041 CPU in 0.041 seconds (100% CPU, 24361580 Lips)
true.

57 ?- M is (2^32+3), time((between(1,1000000,_), X is M^2, fail; true)).
% 2,000,000 inferences, 0.442 CPU in 0.442 seconds (100% CPU, 4520779 Lips)
M = 4294967299.

Normal GMP version (same machine and compiler options)

101 ?- time((between(1,1000000,_), fail; true)).
% 1,000,002 inferences, 0.042 CPU in 0.042 seconds (100% CPU, 24020820 Lips)
true.

102 ?- M is (2^32+3), time((between(1,1000000,_), X is M^2, fail; true)).
% 2,000,000 inferences, 0.192 CPU in 0.192 seconds (100% CPU, 10441695 Lips)
M = 4294967299.

LibBF has no square function, and a quick look suggests bf_mul() doesn’t check whether the two inputs are the same, so it seems to do a general multiplication. That is good, as it means we can optimize if we want :slight_smile:
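The reason a dedicated square routine pays off: for i ≠ j the digit product a[i]*a[j] appears twice in the schoolbook product, so it can be computed once and doubled, roughly halving the digit multiplications. A toy sketch on 16-bit limbs (not libBF code; a real version would work on bf_t limbs):

```c
#include <assert.h>
#include <stdint.h>

enum { LIMBS = 4 };  /* toy base-2^16 bignum, little-endian limbs */

/* Squaring with the symmetric-product trick: diagonal terms once,
   off-diagonal terms computed once and doubled. */
static void sqr(uint16_t out[2*LIMBS], const uint16_t a[LIMBS])
{ uint64_t acc[2*LIMBS] = {0};
  uint64_t carry = 0;

  for (int i = 0; i < LIMBS; i++)
  { acc[2*i] += (uint64_t)a[i] * a[i];          /* diagonal terms       */
    for (int j = i+1; j < LIMBS; j++)
      acc[i+j] += 2 * (uint64_t)a[i] * a[j];    /* off-diagonal doubled */
  }
  for (int i = 0; i < 2*LIMBS; i++)             /* propagate carries    */
  { uint64_t t = acc[i] + carry;
    out[i] = (uint16_t)(t & 0xffff);
    carry  = t >> 16;
  }
}
```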

As we have seen before, GMP is slow on Windows. On Linux:

1 ?- M is (2^16+3), time((between(1,1000000,_), X is M^2, fail; true)).
% 2,000,002 inferences, 0.128 CPU in 0.128 seconds (100% CPU, 15579571 Lips)
M = 65539.

2 ?- M is (2^32+3), time((between(1,1000000,_), X is M^2, fail; true)).
% 2,000,000 inferences, 0.209 CPU in 0.209 seconds (100% CPU, 9569868 Lips)
M = 4294967299.

3 ?- M is (2^64+3), time((between(1,1000000,_), X is M^2, fail; true)).
% 2,000,000 inferences, 0.217 CPU in 0.217 seconds (100% CPU, 9211140 Lips)
M = 18446744073709551619.

The libbf branch now compiles the entire system and runs all tests. This means we can at least provide a fully functional version for users that need that and cannot handle the LGPL license conditions (embedded systems, game consoles, etc., or simply company policy). There are still some loose ends, but this is the status:

  • Compiles and runs. Tested on Ubuntu/AMD64 and MacOS/M1. I think all 64 bit systems should be fine. Not sure about 32-bit systems. Surely can be made to work though.
  • Passes tests. There are not that many tests for arithmetic correctness though, assuming GMP gets that part right. As we now do various parts ourselves this claim no longer holds.
  • Did not yet test whether this branch still compiles and runs using GMP. Will do that soon.

The real surprise: the benchmark suite used to train the PGO (Profile Guided Optimization) runs 20% faster on AMD64 (both PGO compiled) and at the same performance on the M1 (PGO doesn’t help on MacOS, so we do not use it). The benchmark set mostly contains old programs that do quite a bit of arithmetic, but I think only one uses big integers. I’m puzzled, but this result seems solid. If that really holds, I think we can accept some loss on big numbers.


Big picture:

So 2. or 3.? How does it compare to GMP? I hesitate to draw any conclusions from my builds, but I’m guessing it is an order of magnitude slower on, e.g., ?- bench(F,1000,200,300). That would suggest 3., unless something can be done about it.

Yes, it seems 1 is not needed :slight_smile: The impact of the branch on the overall code base is rather minimal. Only the GMP/BF serialization was not pretty and is still not pretty (a bit prettier by abstracting a little, and a bit less so because of additional #ifdefs :frowning: ). It can be abstracted a bit further. Considering we can remove all the conditional code for big number and rational support, I’d say the total code base becomes simpler.

2 or 3 is still open to me. I’m tempted to merge the current state using GMP as default. We are not too far off making LibBF the default though. The only serious bottleneck seems to be rational normalization. You have already made it a lot faster. I haven’t checked the details, but I saw another gcd approach that seems to be faster for finding “co-primes”, i.e., detecting that the gcd is 1. I’d suspect that to be a rather common case in rational normalization.
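For reference, the gcd that rational normalization needs is often implemented as the binary GCD (Stein’s algorithm), which replaces divisions with shifts and subtractions; this is not the specific co-prime shortcut mentioned above, just the standard workhorse. A machine-integer sketch (uses the GCC/Clang builtin `__builtin_ctzll`):

```c
#include <assert.h>
#include <stdint.h>

/* Binary GCD: strip common factors of two, then repeatedly subtract
   the smaller odd value from the larger until one operand reaches 0. */
static uint64_t bgcd(uint64_t a, uint64_t b)
{ if (a == 0) return b;
  if (b == 0) return a;

  int shift = __builtin_ctzll(a | b);   /* common power of two   */
  a >>= __builtin_ctzll(a);             /* make a odd            */
  do
  { b >>= __builtin_ctzll(b);           /* make b odd            */
    if (a > b)
    { uint64_t t = a; a = b; b = t; }   /* keep a <= b           */
    b -= a;                             /* even result, loop     */
  } while (b);
  return a << shift;
}
```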

I would volunteer for testing, performance and accuracy. But I can’t find any publicly visible repository with an early prototypical binding that I can build from. Or maybe this is the “libbf branch”:

https://github.com/SWI-Prolog/swipl-devel/tree/libbf

How does one build this branch? I am on WSL2. Alternatively, a pre-built Windows .exe or a PPA would also be fine.

I have merged the libbf branch into master. I’ve done some random testing on part of the arithmetic involving big integers and rational numbers; see the attached test. The idea is to have two Prolog systems talking through TCP: the first generates random expressions and sends them to the peer, both compute, and the results are compared. That turned out to crash the GMP version rather easily. I pushed several enhancements, notably to ar_pow(): mpz_get_ui() returns the value modulo ULONG_MAX+1, which may give wrong results. Before calling mpz_pow_ui() we had better do some sanity checking first, or GMP may call abort() (not the Prolog one, but the C one that kills the process :frowning: ). Then I had some infinite loops (or at least too long to wait for), both in our LibBF binding and in GMP. I solved the ones in the LibBF binding.
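The mpz_get_ui() hazard is just modular truncation: taking the low word of a value that doesn’t fit silently reduces it modulo 2^(word size). A sketch of the effect and the fits-check a caller needs first, shown by narrowing 64 bits to 32 (the helper names `low_word` and `fits_word` are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Mimics mpz_get_ui() on a 32-bit ulong: returns the value modulo 2^32. */
static uint32_t low_word(uint64_t v)
{ return (uint32_t)v; }

/* The sanity check to do before trusting low_word()'s result,
   analogous to mpz_fits_ulong_p() before mpz_get_ui(). */
static int fits_word(uint64_t v)
{ return v <= UINT32_MAX; }
```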

All in all, the GMP version has been improved with these patches, and the LibBF version compiles, passes the tests and seems to crash less :slight_smile: I have reasonable trust in all functions that involve big integers and rational numbers.

I’m a little puzzled by the behavior of the (\)/1 function. I think there are some problems; see test_bitwise_negation/1 in test.pl, whose results seem incorrect.

test.pl (5.0 KB)
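For bitwise negation over arbitrary-precision integers, the usual definition is infinite two’s complement, i.e., \X =:= -X-1 for every integer X, and any bignum emulation of (\)/1 should satisfy that identity. A quick machine-integer check (`check_not` is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Verifies the two's-complement identity ~x == -x - 1 for one value. */
static int check_not(int64_t x)
{ return ~x == -x - 1; }
```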

As it is now in master, just check out the git repository and follow the normal build instructions. To get the LibBF version, add -DUSE_GMP=OFF to the CMake configure run. So, to combine all instructions for Debian based systems, we get:

sudo apt-get install \
        build-essential git cmake ninja-build pkg-config \
        ncurses-dev libreadline-dev libedit-dev \
        libgoogle-perftools-dev \
        libgmp-dev \
        libssl-dev \
        unixodbc-dev \
        zlib1g-dev libarchive-dev \
        libossp-uuid-dev \
        libxext-dev libice-dev libjpeg-dev libxinerama-dev libxft-dev \
        libxpm-dev libxt-dev \
        libdb-dev \
        libpcre2-dev \
        libyaml-dev \
        default-jdk junit4

git clone --recursive https://github.com/SWI-Prolog/swipl-devel.git
cd swipl-devel
mkdir build.libbf
cd build.libbf
cmake -DUSE_GMP=OFF -G Ninja ..
ninja

Now run using src/swipl and/or test using ctest -j N, where N is the concurrency (number of cores is a good start).

You can make other subdirectories and build with different configurations. Add -DCMAKE_BUILD_TYPE=PGO for maximum performance.

This should not affect accuracy in any way. We use the same algorithm for float ↔ big integer/rational conversion, so any different answer means there is a bug in one of the GMP emulation functions. No floating point behavior is affected.
