Trying to understand release 9.1.16 arithmetic

I am trying to understand release 9.1.16 arithmetic.
I didn’t test whether this already happened before
release 9.1.16. I am using this popular test case:

tarai(X, Y, _, R) :- X =< Y, !, R = Y.
tarai(X, Y, Z, R) :- X_1 is X-1, tarai(X_1, Y, Z, R_X),
   Y_1 is Y-1, tarai(Y_1, Z, X, R_Y),
   Z_1 is Z-1, tarai(Z_1, X, Y, R_Z),
   tarai(R_X, R_Y, R_Z, R).

Testing with optimise flag false and true. I get these results:

/* optimise=false */
?- time(tarai(12, 11, 0, X)).
% 54,182,800 inferences, 2.625 CPU in 2.616 seconds (100% CPU, 20641067 Lips)
X = 12.

/* optimise=true */
?- time(tarai(12, 11, 0, X)).
% 27,091,399 inferences, 1.906 CPU in 1.918 seconds (99% CPU, 14211881 Lips)
X = 12.

Interestingly when using vm_list/1 I don’t see a difference. Already
optimise=false does generate a_add_fc instruction. But how is the
inference count different between the two? And as a result the LIPS

lower, although optimise=true is faster and does the same thing?

Edit 27.09.2023
There is also the suspicion that SWI-Prolog LIPS are too low in
general. Maybe the tail recursive call is not counted? For example
this Prolog system reports a different count of inferences, much higher:

$ ./tarai 12 11 0
% 196412655 inferences, 3.34256 CPU seconds (58761215.967661 Lips)
12

Guarded Horn Clauses - tadashi9e, 2023
https://qiita.com/tadashi9e/items/45cef62cda6d38dda0c7

Or the SWI-Prolog approach is the more reasonable one, it might not count
(=<)/2, (!)/0 and (=)/2, which also appears in Tarai? But then how does
optimise=true influence the counting, when these built-ins are not counted?

The first clause is different as -O inlines the X =< Y. As we have seen, calling out to C from the VM is fairly expensive. I should do some profiling on that. We currently make a mark (part of a normal choicepoint) to allow for backtracking and we must publish several VM registers that are needed in case the foreign code triggers a GC or stack shift. On return these published registers may have changed, so we need to restore these. The inlined =< doesn’t need all that and also seems to optimize a bit on the common small int/int and float/float comparisons.

It seems to be counting call and redo ports. It surely does not count anything that is inlined into the VM. Redos that are eliminated by clause indexing are missing too.

Alright, now I see it. Was just looking at the number of
dissassembled lines, and somehow thought they have the
same byte code. But now I see it. So line numbers can make

+1 and +2 increments in vm_list/1? I also see +3 and +4
increments elsewhere. What do the numbers measure, they are
not really line numbers, is this documented somewhere?

/* optimise=false */
?- vm_list(tarai/4).
       [...]
       1 b_var0
       2 b_var1
       3 i_call(system:(=<)/2)
       5 i_cut
       [...]

/* optimise=true */
?- vm_list(tarai/4).
       [...]
       1 a_enter
       2 a_var0
       3 a_var1
       4 a_le
       5 i_cut
       [...]

But otherwise I am right that the byte code is the same? So a_XX
instead of b_XX and i_call in the (=<)/2 is responsible for the
0.7 sec speed up? Quite amazing!

They are not line numbers but addresses relative to the start of the clause. Each instruction has an op-code and zero or more arguments. Some have variable length arguments, e.g., strings and big numbers. And no, it is not documented. The VM listing has no formal status. it is just a useful tool for helping to understand and debug what is going on. To understand it in detail, read the source :slight_smile:

1 Like

Wouldn’t it be possible to make optimise=true the default
for arithmetic. Or is this still an experimental flag? I don’t
see some functional difference between optimise=true

and optimise=false. For example one can try:

/* optimise=false */
?- time(tarai(8+4, 6+5, 0, X)).
% 54,182,800 inferences, 2.641 CPU in 2.647 seconds (100% CPU, 20518930 Lips)
X = 8+4.

/* optimise=true */
?- time(tarai(8+4, 6+5, 0, X)).
% 27,091,399 inferences, 1.906 CPU in 1.896 seconds (101% CPU, 14211881 Lips)
X = 8+4.

So unlike other Prolog systems, that sometimes forbid expressions
when inlining arithmetic, SWI-Prolog does not forbid expressions.
This would get you out of options hell, fewer options. Or lets say

better defaults, that give you the best performance, and not defaults
that deprive you of some performance. When I install SWI-Prolog freshly
arithmetic optimisation is not enabled, which somehow makes no sense.

Or is there a use case which I am not aware of, where optimise=false
would be needed? What would be such a use case?

Edit 28.09.2023
If the use case can be specified, the arithmetic optimization could be
turned from an opt-in option, into an opt-out option. So that default
would be arithmetic optimization is on, and those banging their head

on such a special use case, will have the option to opt-out. It looks
to me that the flag is Prolog text local, right? So special use cases
can be opted in isolation, without affecting the rest of the code base.

Possibly. It would probably lead to better results in performance comparisons as many people do not enable this and many benchmarks are biased towards arithmetic. On the other hand, it only affects programs that spend significant time in arithmetic and most (interesting) Prolog programs do not. The somewhat odd observation is that programs using a little arithmetic tend to get slower. That probably is related to caching and likely depends on hardware and C compiler. The second problem is that optimization causes arithmetic to disappear from the debugger.

Yes. The flag is scoped by the compiler and applies all code (including code loaded from this file) following the flag to the end of the file. The -O changes the initial value.

1 Like

This principle is not consistently applied. And maybe you would get
a little speed advantage if you would have two instructions instead
of only one instruction a_add_fc, One instruction for optimise=true

and one instruction for optimise=false. I am assuming that the
debugger branching needs a few CPU cycles? Here is proof that the
principle is not consistently applied. I find when optimise=true:

/* SWI-Prolog 9.1.16, optimise=true */
?- trace, tarai(12,11,0,X).
   Call: (11) tarai(12, 11, 0, _30674) ? creep
   Call: (12) _32114 is 12+ -1 ? 

On the other hand the (=<)/2 indeed disappears from debugging.

The consistency is that some really simple arithmetic is always compiled and for these we take the trouble to support it in the debugger. In theory that is also possible for compiled arithmetic: execute the instructions in another mode to create the evaluable term and call normal arithmetic. That is IMO too much work and code.

A post was merged into an existing topic: Ann: SWI-Prolog 9.1.16