Benchmarking [CLP(B) libraries and other stuff]

Well almost, in your old ZDD post, you did XOR representation counting
by converting into OR representation and you didn’t have a primitive
zdd_xor, so you had to do it like this here:

gf2_sat_count(P, C, As, U, V, S):- cofact(P, t(_-J, L, R), S),
	C0 is C * (2^J-1),
	gf2_sat_count(R, C0, [L-C|As], U, V, S).

I don’t know whether this is correct. Do you have the GF(2) already converted
to ZDD, otherwise I don’t see this as correct for an XOR representation.
You find a counting based on XOR representation in my tree2.p.

I am also using this as a benchmark for Dogelog Player now. Dogelog Player
has currently a garbage collection problem in these test cases. So they give
me a nice playground to experiment with improvements:


GNU Prolog is blazing fast, but you need to increase memory setting GLOBALSZ.

Its good to publish benchmarks, but I cannot find a link to the source. That makes them useless, even more so if you do not include the data for the Prolog system this forum is about.

It is in the Novacore GIT, initially:

git clone

For updates as desired:

cd novacore
git pull

Mostlikely you can also test it for SWI-Prolog. I don’t use any exotic Prolog
features. Its the same educational tree.p and tree2.p of 4000 bytes I already
shared. Nothing new for this forum. But since GNU Prolog and SWI-Prolog

are very similar performance wise I did exclude SWI-Prolog. But recently
SWI-Prolog has become a little bit faster, I noticed, beats GNU Prolog now,
so I guess should include it again. Its quite a moving target. Sorry, not sorry.

There are quite different claims at places and also your

suggests GNU Prolog is top (speed). It is still widely believed SWI-Prolog is (very) slow and I think that assumption should be corrected where fair.

Adding a time (wall) that is a bit easier to read and a total, I get (AMD3950X, Ubuntu 22.04)

swipl -O -l samples/common/17_bench/crypto/tests/suite.p

5 ?- suite, total.
sator18 0ms
sat2or18 162ms
satxor18 422ms
sat2xor18 0ms
countor18 0ms
count2or18 115ms
countxor18 106ms
count2xor18 129ms
Total: 934ms

What makes you think I share this believe? I wrote:

But to have a test that says something, since you test on
a different machine, you would need to create a test report
like I did, with SWI-Prolog and GNU Prolog side by side.

Maybe you can use my library(runner) (*) and library(report)
for that. Or roll your own report generator. Or juggle manually
from commend line. I also still use command line from time

to time to get quick comparisons. Maybe you can also change
peoples believe systems if you show side-by-side results?
But doing such reports is a lot of initial investment, but

once you have them, its payback time.

Edit 19.11.2023
(*) Funny approach I am using now, I started with a kind of PlUnit,
which counts success and failure, and turned it additionally into a benchmark
utility not counting success and failure, but summing

measured time. A lot of code reuse possible! Plus you need to make the
PlUnit more flexible so that it can generate and integrate results from
different runs, like with different Prolog systems or different platforms.

I tried to generalize GitHub - SWI-Prolog/bench: Prolog benchmarks (`van Roy' set). I managed to revive the YAP and SICStus ports. We know these systems are fast. This results in

Note that SWI-Prolog did make some progress. These timings were calibrated on 1 seconds for 8.5.2 on the same hardware. In part, this is due to improvements in GCC. The missing bars are not really fast, but benchmarks that do not run on the target system. queens_clpfd can surely run on SICStus with porting.

Upto zebra, these are the classical “van Roy” benchmarks. Others were added to get wider coverage:

  • sieveSieve of Eratosthenes using assert/1 and retract/1 (to benchmark these)
  • queens_clpfd – as the name says, for clp(fd). By Markus Triska.
  • pingpong – Tabling test
  • fib – Tabling and big integer test
  • moded_path – Tabling with answer subsumption
  • det – Single sided unification and det/1 evaluation.
  • eval – Evaluate large expressions (from discussion here)
  • average – Not a benchmark, but the average over all of them.

I tried to include

  • Scryer – Says caught: error(syntax_error(inconsistent_entry),use_module/1) on anything it doesn’t like, which makes debugging a little hard.
  • Ciao – Probably possible, but I could not find a way to load a non-module file into a module.
  • Trealla – Crashes on a segmentation fault when loading the driver (current GIT, passes tests)
  • GNU-Prolog – Needs a way to make the deal with Prolog systems without modules.
  • XSB – Doesn’t like nested conditional compilation (there is a bit of :- if(...).)

Please contribute benchmarks or ports. Unfortunately some old benchmarks have unknown copyright. New ones must have an open source license that is accepted by the Linux distributions.

1 Like

For example this here .sh, what shell does it even expect?

If I want to run these tests under Windows 10, I have to create my own .bat right?

There can be also a lot of troubles with dump files created during
a bechmark run and if your test cases are connected to
a GIT, SVN, what ever repository. You are creating .csv files.

On the other hand I have the location where to put the dumps
configurable. Same with the location where to put the reports. So
that test cases do not get polluted with this stuff. Otherwise for

example in GIT, you would need to put them on ignore, but you
might still see them in an IDE, creating a big mess. Or to avoid the
mess you would remove them after a run? Well I got already

conformatble that the dump folder is under local history of the IDE,
so I can see changes there. Which is also quite convenient.

I am afraid that it sounds as “馬の耳に念仏” in Japanese (“reading the nenbutsu into a horse’s ear”).In fact, too much for me to digest. Anyay, I notice that I should learn from you and ZDD experts.

I have managed to write a couple of codes to count the number of solutions (assignments) for given a polynomial or
a sytem of polynomias over GF(2). Both are below but only main parts.
Rest are untouched as previous version. One of new ones uses XOR operation.
Both of them works well compared with my previous one, but it is not drastically efficient.

I borrowed the same benchmark on the unitary matricies over GF(2).

Codes on counting directly.
gf2_card(_,  0, 0, _):-!.
gf2_card(Vs, 1, N, _):-!, length(Vs, K), N is 2^K.
gf2_card(Vs, X, N, S):- memo(gf2_card(X, Vs)-N, S),
	(	nonvar(N) -> true
	; 	cofact(X, t(A, L, R), S),
		divide_at(A, Vs, H, T),
		gf2_card(T, L, I, S),
		gf2_join(L, R, U, S),
		gf2_card(T, U, J, S),
		N0 is I + J,
		length(H, K),
		N is N0*(2^K)
Codes on building the assignment space first.
gf2_solve(_,  0, 0, _):-!.
gf2_solve(Vs, 1, P, S):-!, zdd_power(Vs, P, S).
gf2_solve(Vs, X, P, S):- memo(gf2_unit(X, Vs)-P, S),
	(	nonvar(P) -> true
	; 	cofact(X, t(A, L, R), S),
		divide_at(A, Vs, H, T),
		gf2_solve(T, R, R0, S),
		zdd_insert(A, R0, R1, S),
		(	L = 0 -> P0 = R1
		;	gf2_solve(T, L, L0, S),
			cofact(L1, t(A, L0, L0), S),
			zdd_xor(L1, R1, P0, S)
		zdd_power_rev(H, P0, P, S)
Benchmark on unitary matrix over GF(2)

% ?- between(1, 7, I), orth_matrix(I, M),
%	time((zdd gf2_poly_solve_list(M, Sols), card(Sols, C))),
%	format("Num of sols for orth-matrix(~w) = ~w~n~n", [I, C]),
%	fail; true.
%@ % 308 inferences, 0.000 CPU in 0.000 seconds (91% CPU, 4812500 Lips)
%@ Num of sols for orth-matrix(1) = 1
%@ % 3,942 inferences, 0.000 CPU in 0.000 seconds (83% CPU, 10950000 Lips)
%@ Num of sols for orth-matrix(2) = 2
%@ % 27,493 inferences, 0.002 CPU in 0.002 seconds (98% CPU, 14439601 Lips)
%@ Num of sols for orth-matrix(3) = 6
%@ % 181,676 inferences, 0.013 CPU in 0.013 seconds (97% CPU, 14279337 Lips)
%@ Num of sols for orth-matrix(4) = 48
%@ % 1,475,875 inferences, 0.117 CPU in 0.119 seconds (98% CPU, 12665737 Lips)
%@ Num of sols for orth-matrix(5) = 720
%@ % 25,199,511 inferences, 2.618 CPU in 2.667 seconds (98% CPU, 9625857 Lips)
%@ Num of sols for orth-matrix(6) = 23040
%@ % 844,701,675 inferences, 108.134 CPU in 110.328 seconds (98% CPU, 7811610 Lips)
%@ Num of sols for orth-matrix(7) = 1451520
%@ true.

Whats the time without memo/2 ? Without memo/2 also without
using a shared representation at all, just the plain tree.p and tree2.p
of 4000 bytes, and with n=6 I get:

/* tree.p, OR representation */
% Time 4376 ms, GC 627 ms, Wall 16/11/2023 17:03

/* tree.p, XOR representation [GF(2) Read-Muller expansion] */
% Time 5790 ms, GC 696 ms, Wall 16/11/2023 15:40

My conclusion, memo/2 is pretty useless, if it only gives speed-up 2x.
Memo and sharing, besides the sharing Prolog can do itself, seems to
be pretty useless, because in the ortho example there is nothing to share.

But to be sure, I might also do a test with sharing. Probablility of sharing gets
higher the closer one is to the leave nodes. But too close to the leave nodes
memo/2 also only incurs additional cost, doesn’t give any advantage. I am

quite ambivalent here about sharing, except maybe for a trick I have not yet
implemented that has to do with taking the negation of a node.

The file that works with my tree.p and tree2.p is as follows:

ortho.p.log (3,7 KB)

Edit 20.11.2023
But maybe memo/2 is nevertheless useful to reduce the space
requirememt? Because for n=7 I get the folowing, in formely Jekejeke
Prolog and JDK 21, so I am running out of memory:

% Error: Execution aborted since memory threshold exceeded.

How do we measure space requirements? But reducing space
also requires that there is something that can be shared or memoized,
so I have also my doubts that this works for the ortho example.

Maybe I should just buy a computer like you have with 128 GB memory?

Lets make a different competition. Run ortho until n=6 with:

Only 512m bytes (half giga bytes) memory

Can you do that? I can do it, but its slower. Results so far
are that it runs by this speed on only 512m bytes
(half giga bytes) memory:

/* treelow.p, OR representation */
% Time 30262 ms, GC 250 ms, Wall 16/11/2023 21:52

/* tree2low.p, XOR representation [GF(2) Read-Muller expansion] */
% Time 30699 ms, GC 233 ms, Wall 16/11/2023 21:51

The trick is to use Quine Algorithm for counting over a list of OR respectively
XOR representations. Quine Algorithm was presented by Joseph Vidal-Rosset
here. But I am only using a very naive implementation, also no product

heuristic which would use multiplication of cardinalities sometimes:

Implementing Quine's algorithm

Many CLP(B) solvers use certain heuristics to choose a variable in
the Quine Algorithm that gives additional speed. DPLL might also use unit
preference and pure preference heuristics, I did also not implement

such stuff since my prototype is only 6000 bytes and not based on clauses.

treelow.p.log (5,7 KB)
tree2low.p.log (5,9 KB)
ortholow.p.log (4,7 KB)

Yes. surely memo is useless for the case. Do you think memo is useless
in general for counting solutions in GF(2) ? I am not sure, but it may be. It is my custom to use it as table of SWI-Prolog without considering polynomials in GF(2).

Without memo.

%@ % 25,196,259 inferences, 1.856 CPU in 1.902 seconds (98% CPU, 13574166 Lips)
%@ Num of sols for orth-matrix(6) = 23040

With memo.

%@ % 25,199,511 inferences, 2.618 CPU in 2.667 seconds (98% CPU, 9625857 Lips)
%@ Num of sols for orth-matrix(6) = 23040

(Nothing of importance)

(Nothing of importance)

I have tested for a case like counting the number of elements in a power set, in which memo is necessary. With memo, counting for solutions for a polynomial with 1000 variables is done in a few second. Otherwise (without memo) for only 20 variables, counting seems going to looping. So I have to conclude that memo is necessary also for counting solutions for GF2.

(Nothing of importance)

Let me add comments to the test, which made me to conclude that memo (like table ) is necessary.

I used the “power polynomial” over variables x1, x2, …, xn defined by this.

 P1 = 1 + x1
 P(n+1) = Pn + x(n+1)*P(n)

One can produce in prolog by power_poly below, using 1,2,3,
as variables in the test below for simplicity.

% ?- power_poly(1, X).
% ?- power_poly(2, X).
%@ X = 1+x(1)+x(2)*(1+x(1)).
% ?- power_poly(3, X).
%@ X = 1+x(1)+x(2)*(1+x(1))+x(3)*(1+x(1)+x(2)*(1+x(1))).

power_poly(1, 1 + x(1)):-!.
power_poly(N, P+x(N)*P):- N0 is N-1, power_poly(N0, P).

The polynomial P(1000) is too large as a usual prolog term. So I created it
in ZDD using ZDD commands as below

BTW, according to the result by gf2_solve, solution seems unique, right? I tested for only 1, 2 by hand. If true, I rediscovered a " theorem" though
I have not a proof yet.

EDIT Aha, it is obvious by induction like a joke, though I checked by hand.

% ?- numlist(1, 1000, Ns),
%	time(((zdd zdd_power(Ns, P), gf2_solve(Ns, P, Sols),
%	card(P, D), card(Sols, C)))).
%@ % 256,597 inferences, 0.019 CPU in 0.021 seconds (94% CPU, 13240983 Lips)
%@ Ns = [1, 2, 3, 4, 5, 6, 7, 8, 9|...],
%@ P = 1001,
%@ Sols = C, C = 1,
%@ D = 10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376.

%@ % 222,001 inferences, 0.018 CPU in 0.020 seconds (90% CPU, 12586518 Lips)
%@ Ns = [1, 2, 3, 4, 5, 6, 7, 8, 9|...],
%@ P = 1001,
%@ Sols = C, C = 1.

For now, portability of the driver and tests themselves is a bigger issue. The most serious issue is that most benchmark are small programs, many of which stress arithmetic. That is not very representative. SWI-Prolog mostly likes big programs with large data sets. On various real world programs its performance has been reported to be pretty similar to e.g., YAP and SICStus. Sometimes much better (notably when large data sets or using lots of dynamic predicates), sometimes much worst (notably when constraints or tabling are important).

To avoid pollution of the directory with the intermediate files is easy. More annoying is that several Prolog systems insist adding compiled versions to the directory.

Possibly I’ll rewrite the scripts that connect the whole thing and makes the final output in (SWI-)Prolog :slight_smile:

(Nothing of importance)

No, I can’t do that. For dimension n = 7, was needed. (Activity Monitor) . I remember that for n = 8, 80 gb was needed several years ago. Anyway there must be much room in my codes to be revised. But for the time being, I have no plan to revisions. No idea for revision.

% ?- orth_matrix(7, _M),
%	profile(time((zdd gf2_poly_solve_list(_M, Sols), card(Sols, C)))).
%@ % 955,921,898 inferences, 119.263 CPU in 124.051 seconds (96% CPU, 8015238 Lips)
%@ Sols = 5138659,
%@ C = 1451520.

(Nothing of importance)