Performance SWI-Prolog 8.3.5 vs YAP 6.5.0

I have redone the performance comparison on the classical van Roy benchmark suite, extended with some more recent benchmarks (this suite is used for profile-guided optimization). I first made the benchmarks run again on YAP 6.5.0 and disabled the additions for YAP, except sieve, which needed no porting.

Details:

  • YAP @3298a680300b350a8c56634d020efae252d09422 (Jan 5, 2020)
    • Default ./configure && make && make install
  • SWI-Prolog V8.3.5-18-g7f94d2ebf
    • cmake -G Ninja .. && ninja
  • SWI-Prolog V8.3.5-18-g7f94d2ebf (PGO)
    • cmake -G Ninja .. && ../scripts/pgo-compile.sh && ninja
  • System
    • Ubuntu 20.04, gcc 9.3.0, AMD Ryzen 9 3950X, 64GB memory

| Benchmark (time in ms) | yap | swipl | swipl -O | swipl -O (PGO) | swipl / yap | swipl -O / yap | swipl -O (PGO) / yap |
|---|---|---|---|---|---|---|---|
| boyer | 199 | 273 | 275 | 189 | 137.19% | 138.19% | 94.97% |
| browse | 150 | 258 | 259 | 176 | 172.00% | 172.67% | 117.33% |
| chat_parser | 219 | 390 | 393 | 298 | 178.08% | 179.45% | 136.07% |
| crypt | 255 | 265 | 131 | 109 | 103.92% | 51.37% | 42.75% |
| fast_mu | 266 | 317 | 239 | 175 | 119.17% | 89.85% | 65.79% |
| flatten | 295 | 316 | 279 | 185 | 107.12% | 94.58% | 62.71% |
| meta_qsort | 152 | 243 | 234 | 165 | 159.87% | 153.95% | 108.55% |
| mu | 160 | 320 | 327 | 211 | 200.00% | 204.38% | 131.88% |
| nreverse | 92 | 269 | 273 | 122 | 292.39% | 296.74% | 132.61% |
| poly_10 | 231 | 271 | 209 | 159 | 117.32% | 90.48% | 68.83% |
| prover | 186 | 334 | 337 | 225 | 179.57% | 181.18% | 120.97% |
| qsort | 179 | 378 | 306 | 195 | 211.17% | 170.95% | 108.94% |
| queens_8 | 373 | 303 | 163 | 117 | 81.23% | 43.70% | 31.37% |
| query | 247 | 298 | 146 | 132 | 120.65% | 59.11% | 53.44% |
| reducer | 219 | 337 | 331 | 229 | 153.88% | 151.14% | 104.57% |
| sendmore | 204 | 364 | 140 | 120 | 178.43% | 68.63% | 58.82% |
| simple_analyzer | 238 | 343 | 313 | 222 | 144.12% | 131.51% | 93.28% |
| tak | 285 | 299 | 207 | 136 | 104.91% | 72.63% | 47.72% |
| zebra | 170 | 332 | 340 | 190 | 195.29% | 200.00% | 111.76% |
| sieve | 1524 | 277 | 239 | 195 | 18.18% | 15.68% | 12.80% |

Percentages are relative to the YAP time.

table generated using https://www.tablesgenerator.com

As a chart (shorter bar is better):

5 Likes

Interesting that nreverse is 3 times slower in swi.
Do you have an idea why this is?

EDIT: Since it is only 30 integers, probably most of the cost is in some kind of initial setup for the call.

For a long time the performance on naive reverse was the holy grail in Prolog country :slight_smile:. I never bothered. Nrev basically evaluates the performance of append/3. Quite likely some systems have dedicated recognition and VM support to speed up what append needs. SWI-Prolog has some of that too: dedicated indexing for predicates with two clauses, one having [] as first argument and the other [_|_].
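
For reference, here is a minimal sketch of the usual textbook definitions (the benchmark's actual code may differ in detail). Nearly all of the work in nrev is done by append/3, which has exactly the two-clause []/[_|_] shape mentioned above:

% Naive reverse: reversing a list of length N performs O(N^2) append/3 steps.
nrev([], []).
nrev([H|T], Reversed) :-
    nrev(T, ReversedTail),
    append(ReversedTail, [H], Reversed).

% The benchmark defines its own append/3 rather than using library(lists);
% one clause has [] and the other [_|_] as first argument.
append([], List, List).
append([H|T], List, [H|Rest]) :-
    append(T, List, Rest).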

1 Like

Updated the above. Where I claimed the comparison was against the PGO-compiled version, this was actually not the case. I have added a column using the actually PGO-compiled version of SWI-Prolog. This gives a considerably better picture :slight_smile:

1 Like

Does it make sense to have such a comparison with one of the commercial Prologs, such as SICStus?

thanks,

Dan

Sure, but I do not have access to SICStus. If anyone has, please extend the portability of the benchmarks and run them. I’ve created the charts the lazy way with LibreOffice Calc :frowning:

Some more insight comes from the Pereira benchmarks, which test specific aspects of Prolog. I’ve created a plot comparing the same YAP and SWI-Prolog (with PGO).

Having a closer look at this I (tentatively) come to these observations:

  • The arg(N) test stands out. It doesn’t really measure arg/3 but a 100-fold chain of calls like this:
    arg5(N, T, R) :- arg(N, T, X), arg6(N, X, R).
    Only a small part of this is spent on arg/3 itself (see the sketch after this list).
  • Also considering the args(N) tests, the main bottleneck wrt YAP seems to be pushing the arguments for the next call, in particular for calls subject to LCO (Last Call Optimization).
  • The shallow_backtracking test assumes no multi-argument indexing. As both YAP and SWI-Prolog have this we get 0/0 :slight_smile:
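
To make the first point concrete, here is a hypothetical, shortened sketch of the kind of call chain meant above (the predicate names and the example term are mine, not the actual Pereira code). Each link performs one arg/3 call and then has to push all arguments for the next call, which is the part that dominates; that next call is also the last call of the clause and thus subject to LCO:

% Shortened chain; the real test chains about 100 such predicates.
arg1(N, T, R) :- arg(N, T, X), arg2(N, X, R).
arg2(N, T, R) :- arg(N, T, X), arg3(N, X, R).
arg3(N, T, R) :- arg(N, T, X), arg4(N, X, R).
arg4(N, T, R) :- arg(N, T, R).              % end of the shortened chain

% Example query on a nested term:
% ?- arg1(1, f(f(f(f(done)))), R).
% R = done.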

I was more or less aware of this. I’ve made similar comparisons long ago (around version 5.4 I think). Lots of things have improved since. I have the impression that the attention Vitor and colleagues have put in writing the C code such that the C compiler gets everything right is no longer required due to improvements of the C compiler technology, in particular when helping the compiler using profile guiding.

What remains mostly the consequence of a register based design (the WAM) vs. the stack based design of the ZIP VM. AFAIK this is not fundamental. B-Prolog (now Picat) does get last calls efficient AFAIK and also uses a stack based design.


SWI-Prolog vs YAP on the Pereira benchmarks. Below 100% means SWI-Prolog is faster, above 100% YAP is faster

Thanks for the benchmarks, what does PGO stand for?

Profile Guided Optimization.

There is a section on it in here: https://github.com/SWI-Prolog/swipl-devel/blob/7c37bf00d2195ca25a32bca855c0daaad88780b5/CMAKE.md

2 Likes

Update for the classical benchmark set (see first post). Versions:

  • SWI-Prolog 8.3.6-9-g47015b5c4 (upgraded from 8.3.5)
  • YAP 6.5.0
  • SICStus 4.6.0 (new)

| Benchmark (time in seconds) | SWI | YAP | SICStus |
|---|---|---|---|
| boyer | 0.116 | 0.200 | 0.044 |
| browse | 0.163 | 0.140 | 0.040 |
| chat_parser | 0.297 | 0.208 | 0.065 |
| crypt | 0.107 | 0.242 | 0.023 |
| fast_mu | 0.159 | 0.252 | 0.037 |
| flatten | 0.170 | 0.270 | 0.038 |
| meta_qsort | 0.157 | 0.148 | 0.032 |
| mu | 0.191 | 0.153 | 0.041 |
| nreverse | 0.101 | 0.094 | 0.020 |
| poly_10 | 0.150 | 0.222 | 0.031 |
| prover | 0.228 | 0.192 | 0.044 |
| qsort | 0.166 | 0.170 | 0.029 |
| queens_8 | 0.114 | 0.348 | 0.019 |
| query | 0.129 | 0.236 | 0.034 |
| reducer | 0.220 | 0.214 | 0.042 |
| sendmore | 0.127 | 0.192 | 0.032 |
| simple_analyzer | 0.201 | 0.229 | 0.043 |
| tak | 0.142 | 0.275 | 0.023 |
| zebra | 0.197 | 0.162 | 0.070 |
| sieve | 0.208 | 1.546 | 0.697 |
| queens_clpfd | 0.238 | - | - |
| pingpong | 0.216 | 0.073 | - |
| fib | 0.220 | 0.055 | - |

Or, as chart (shorter bar is better)

[chart: swi-yap-sicstus]

The performance of SICStus 4.6.0 on these benchmarks is truly impressive. Note that SICStus 4 compiles to native code for the AMD64 instruction set.

Now the question is: what is the value of all this? I’m doing some work on ALE, notably the not-yet-public version 4. ALE compiles a grammar into a Prolog program; for ALE4, into a constraint program. Now we get:

| SWI | SICStus |
|---|---|
| 105 sec | 143 sec |

Running the grammar is still a bit dubious to compare, as there is a porting issue due to which not all sentences are parsed correctly by the SWI-Prolog port. The current result on a few sentences is below.

| SWI | SICStus |
|---|---|
| 3.3 sec | 1.9 sec |

Note that the code was written for SICStus 3 and ported to SICStus 4 by the developers and to SWI-Prolog by me. It is not unlikely that the choices are not optimal for each Prolog. For example, the code quite heavily uses clause/2 on facts holding huge structures in the head arguments, rather than calling the dynamic predicate directly; this is claimed to be faster in SICStus 3, but is surely slower in SWI-Prolog. The benchmark set indicates SWI-Prolog is only faster than SICStus on the dynamic database. Profiling the grammar compilation says only 2.1% of the time is spent on assert/1.
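
To make the contrast concrete, here is a hypothetical sketch of the two access styles (the predicate names are made up; ALE's real code differs):

% Hypothetical dynamic predicate holding large terms in its arguments.
:- dynamic grammar_entry/2.

% Style used in the ported code: enumerate matching facts via clause/2,
% which is claimed to be faster in SICStus 3.
lookup_via_clause(Key, Term) :-
    clause(grammar_entry(Key, Term), true).

% Plain call to the dynamic predicate, which is the faster route in SWI-Prolog.
lookup_via_call(Key, Term) :-
    grammar_entry(Key, Term).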

Another interesting observation is that the recent VM changes to SWI-Prolog for faster last calls, and in the current GIT version for inline arg/3 calls (these are heavily used by ALE), have a significant impact on some dedicated benchmarks such as the Pereira set, but hardly affect larger programs.

The overall picture is rather unclear to me :frowning:

1 Like

This is looking better and better; thanks!

Is this with PGO enabled for the benchmarks?

Yes. SWI-Prolog was compiled on Ubuntu 20.04, gcc 9.3 on AMD 3950X using

../scripts/pgo-compile.sh
ninja
1 Like

It won’t make much difference for this. SWI-Prolog does have specific code to deal with two clauses, one of which has [] and the other [_|_] as first argument.

This code would be analyzed as something like the following (the compiler probably does something slightly different, but effectively equivalent for this situation):

sorted([X|YZ]) :-
    YZ=[Y|Z],
    after(X,Y,T), sat(T), sorted([Y|Z]).
sorted([_|_]).

Both clauses’ heads match [|], so there’s a choice point.

As discussed before, SWI-Prolog’s hash-based indexes are for many clauses. Except for a few special cases and the classical first-argument indexing, nothing happens for predicates with just a few clauses.

Nope, I saw this in Kuniaki Mukai’s code :spider::

inner_product([X], [Y], X*Y):-!.
inner_product([X|Xs], [Y|Ys], #((X*Y),Z)):- inner_product(Xs, Ys, Z).

Since [] only appears at the end of the list, and deep indexing is not even possible because the second clause has a variable as tail, it will probe the first clause for every element it traverses. Changing the code could also lead to a slight performance improvement (a possible rewrite is sketched below).
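
For example, a possible rewrite (my sketch, not Mukai's code) peels off the first element so that the worker predicate gets plain []/[_|_] first-argument indexing, needs no cut and leaves no choice point, while building the same right-nested #/2 term:

inner_product([X|Xs], [Y|Ys], P) :-
    inner_product_(Xs, Ys, X*Y, P).

% First argument is now [] or [_|_], so clause selection is deterministic.
inner_product_([], [], P, P).
inner_product_([X|Xs], [Y|Ys], P0, #(P0, P)) :-
    inner_product_(Xs, Ys, X*Y, P).

Whether this actually pays off would of course have to be measured.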

I don’t understand … [_|_] is the same as [_]:

?- [A|B] = [C].
A = C,
B = [].

Anyway, to get full indexing, you’d need to rewrite like the following, I think (code might contain typos):

sorted([X|YZ]) :- sorted2(YZ, X).

sorted2([Y|Z], X) :-
   after(X, Y, T),
   sat(T),
   sorted([Y|Z]).
sorted2([], _).

It would be nice if SWI-Prolog did this transformation automatically (and a lot of other transformations). I think that Peter Van Roy’s Aquarius compiler did such things, but it’s a non-trivial amount of work to write such a compiler.

1 Like

Oops. I meant: [_] is the same as [_|Zs], Zs=[] (or [_] is the same as [_|[]]).