Have you any information on Prolog’s performance in the 2025 “Billion Nested Loop Iterations” test? Three hundred different programming languages are said to have taken part. [My apologies if a thread already exists on this issue]
So, this is kind of the worst type of benchmark for Prolog, because it relies on two features from procedural languages that Prolog completely lacks:
- loops (for x in y)
- in-place assignment (+=)
But I enjoy applying Prolog where it doesn’t belong, so here is my take, following the Python code and the official description of the task:
loops(U, Return) :-
    random(0, 10_000, R),
    compound_name_arity(A, array, 10_000),
    init(A, 10_000),
    loops(U, R, A, 10_000),
    R1 is R + 1,
    arg(R1, A, Return).

init(A, I) :-
    (   I == 0
    ->  true
    ;   arg(I, A, 0),
        I1 is I - 1,
        init(A, I1)
    ).

loops(U, R, A, I) :-
    (   I == 0
    ->  true
    ;   loops(U, R, A, I, 10_000),
        arg(I, A, V1),
        V2 is V1 + R,
        nb_setarg(I, A, V2),
        I1 is I - 1,
        loops(U, R, A, I1)
    ).

loops(U, R, A, I, J) :-
    (   J == 0
    ->  true
    ;   arg(I, A, V1),
        J1 is J - 1,
        V2 is V1 + J1 mod U,
        nb_setarg(I, A, V2),
        loops(U, R, A, I, J1)
    ).
Notes on the implementation:
- I used a compound term to model the array rather than a Prolog list (just try to write this task with a Prolog list and you will understand why I didn’t ^^).
- Loops are obviously translated to recursion. One twist: I iterate from 10_000 down to 0 instead of 0 up to 10_000, because I find it easier and I believe slightly faster. I believe it doesn’t change the spirit of the benchmark.
- The in-place operation += is translated to: arg(I, Array, V1), V2 is V1 + Other, nb_setarg(I, Array, V2). nb_setarg is slightly faster than setarg, but I don’t really know whether it is portable.
- The code needed to be adapted to use 1-based indexing for arg.
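To make the translation of += concrete, here is the destructive-update pattern factored into a small helper. This is just a sketch; increment_arg/3 is my own name, not part of the benchmark code, and nb_setarg/3 is SWI-specific (setarg/3 is more widely available, but still not standard):

```prolog
% increment_arg(+I, +Term, +Delta)
% Destructively adds Delta to the I-th argument of Term,
% emulating term[i] += delta via arg/3 + nb_setarg/3.
increment_arg(I, Term, Delta) :-
    arg(I, Term, V0),          % load:  V0 = term[i]
    V is V0 + Delta,           % add
    nb_setarg(I, Term, V).     % store (non-backtrackable)
```

Usage: `?- T = t(0, 0), increment_arg(1, T, 5), arg(1, T, V).` binds V to 5, with the update surviving even if we later backtrack over the call.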
Here are the results on my machine:
$ lscpu
12th Gen Intel(R) Core(TM) i7-1255U % on Fedora 43
$ swipl -O loops.pl
Welcome to SWI-Prolog (threaded, 64 bits, version 9.2.9)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.
For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).
?- time(loops(40, R)).
% 200,072,550 inferences, 13.724 CPU in 13.782 seconds (100% CPU, 14578271 Lips)
R = 201415.
Profiling reveals that 99.9% of the time is spent in loops/5 (the inner loop), of which 77.5% is spent in itself and 22.5% in nb_setarg.
Impressive!
Thank you for your response; but you did not tell us where this result would have placed Prolog among the 300 or so languages.
Regarding the actual configuration of the test, how about organizing a parallel test more attuned to the characteristics of higher-level languages? Could SWI-Prolog take the lead?
Hm, how about last!
I tried to track down the runtimes of other languages, and I believe I saw something like 4 sec for Python.
So no, Prolog can’t take the lead among higher-level languages on a procedural benchmark.
What would be interesting would be a backtracking-heavy benchmark. Maybe Prolog could take the lead there?
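For instance, here is a toy sketch of my own (not an official benchmark) of what a backtracking-heavy workload could look like: counting all permutations of a list via select/3, which exercises exactly the choicepoint creation and backtracking machinery that procedural languages would have to simulate explicitly.

```prolog
% perm_count(+N, -Count): count the N! permutations of [1..N],
% generated purely by backtracking through select/3.
perm_count(N, Count) :-
    numlist(1, N, L),
    aggregate_all(count, a_permutation(L, _), Count).

a_permutation([], []).
a_permutation(L, [X|P]) :-
    select(X, L, R),       % choicepoint: pick any remaining element
    a_permutation(R, P).
```

For example, `?- perm_count(8, C).` explores 8! = 40320 solutions by backtracking alone, with no arithmetic in the hot path.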
This is indeed just about the worst task imaginable for comparing Prolog performance. I think the code is about as good as it gets. On my AMD Ryzen 9 3950X, Fedora 43, using 10.1.1, I get 8.8 sec. Compared to ~4 sec for Python, I think this is excellent!
Python is considered to be a slow language; twice as slow as Python is not excellent. It might be understandable given Prolog’s nature, but it is certainly not excellent. Improving it might require a complex optimization subsystem.
I’m not sure nb_setarg is a key Prolog function, but what else does Prolog have if you want O(1) update time?
Using :- set_prolog_flag(optimize, true). on the original function reduces execution time by 50%. Replacing the original code with your code, keeping optimize on, reduces another 30% of the execution time.
Wait: the original code has a 37% distortion in profiler time, and the new code has 2%. This accounts for all of the time savings. But what accounts for the 37% distortion?
“What would be interesting would be a backtracking heavy benchmark”
… and why not! It is not just a question of relative advantage, but also one of a higher level of computational capability.
Well, profiling shows that nb_setarg is 22% of the whole runtime, so pretty important I would say.
Well, I believe your proposed implementation using a Prolog list doesn’t really follow the challenge rules. The task description explicitly says:
Note: Like with the reference implementations the innermost sum should be assigned directly to the array index.
You don’t assign the result of each innermost sum to the list, reducing the 10^8 array accesses to just 10^4, which of course leads to your mistaken observation that nb_setarg doesn’t contribute much to the runtime.
I wonder how Prolog compares with Haskell?
Not sure how to read this. Prolog is designed to search for solutions (variable bindings) that satisfy some logic-based specification. Pattern-matching-based data transformation could also be considered something Prolog is good at. Loops over integers and handling arrays are not what it is good at. Then there are low-level operations: incrementing an array element is expensive. Imperative languages typically have low-level support for array[i]++; Prolog has not. It doesn’t have a generally accepted, usable, and portable notion of arrays. Few array-based algorithms, notably those that update arrays, can be represented comfortably and efficiently in Prolog.
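For completeness, the closest pure alternative that many Prolog systems ship is library(assoc) (AVL trees), which gives O(log N) backtrackable updates instead of O(1) destructive ones. A small sketch, where inc_assoc/4 is my own helper name:

```prolog
:- use_module(library(assoc)).

% inc_assoc(+Key, +Delta, +Assoc0, -Assoc)
% Pure "a[key] += delta": yields a new tree, the old one is unchanged.
inc_assoc(Key, Delta, Assoc0, Assoc) :-
    (   get_assoc(Key, Assoc0, V0)
    ->  true
    ;   V0 = 0                       % missing keys default to 0
    ),
    V is V0 + Delta,
    put_assoc(Key, Assoc0, V, Assoc).
```

Since every update allocates O(log N) fresh nodes, this would be far slower than nb_setarg on this benchmark, but it survives backtracking and stays pure.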
Note that Python is not that good at these tasks either. It compensates for that using vector libraries that are implemented in C(++). That can be done for Prolog as well.
Richard O’Keefe implemented FFTs in Prolog: Learning to implement data structures & algorithms in Prolog - #4 by peter.ludemann
It’s explained in the README of the project:
Changes to the benchmarks compared to legacy runner
…
- loops: The inner loop is now 10k, again to allow slower languages to complete more runs.
It was 100k before.
Perhaps a move away from plain number crunching to logic intensive number crunching? Even better, provide Prolog with the missing “low-level” extra capabilities? … Sounds like F1 racing.
Hi, with your code as is, this is the result I get with the same query as you. I compared this to a naive “count nested loops” query:
101 ?- time( loops(40, R) ).
% 200,051,123 inferences, 5.729 CPU in 5.742 seconds (100% CPU, 34922005 Lips)
R = 198000.
101 ?- time( aggregate_all(count, ( between(1, 10_000, _), between(1, 10_000, _) ), N) ).
% 200,010,003 inferences, 5.090 CPU in 5.099 seconds (100% CPU, 39296557 Lips)
N = 100000000.
It seems that using aggregation to count doesn’t add much in run time, or number of inferences. I then used the following query to try and guess how much of this time is caused by aggregation:
108 ?- time( forall( between(1, 10_000, _), forall( between(1, 10_000, _), true ) ) ).
% 100,009,998 inferences, 2.237 CPU in 2.240 seconds (100% CPU, 44705238 Lips)
Could anyone comment on the correctness of this approach in regards to the problem statement? Am I measuring the right things?
[EDIT]
The above timings were done with swipl -O. Without -O, I get such numbers:
101 ?- time( loops(40, R) ).
% 300,061,123 inferences, 17.066 CPU in 21.231 seconds (80% CPU, 17582905 Lips)
R = 204685.
102 ?- time( forall( between(1, 10_000, _), forall( between(1, 10_000, _), true ) ) ).
% 200,019,999 inferences, 4.503 CPU in 4.511 seconds (100% CPU, 44415360 Lips)
true.
So, the benchmark/challenge measures 100 million recursive calls where each call does a load_memory + add + store_memory operation. This took you 5.7 sec.
You are comparing that with:
- 100 million backtracking steps + a simple non-backtracking counter: that took 5 sec;
- 100 million backtracking steps alone: that took you 2 sec.
I’m not sure we can draw any conclusion from this, as you are not comparing the same things.
But here are some observations:
- There is a big performance drop from doing nothing to doing something on each backtracking/recursion step. This is shown in the big difference between your two comparisons (5 sec vs 2 sec). I attribute this to the shuffling of registers that needs to happen when you do something, instead of just jumping back to the same instruction (this is very hand-wavy, I don’t really know what I am talking about).
- swipl -O is extremely effective when there is arithmetic in a hot path.
- Your backtracking + non-backtracking counter takes almost the same time as the full benchmark with recursion. I believe this hints that backtracking is in general slower than recursion.
This compares a 100-million-step non-backtracking counter implemented with recursion vs with backtracking:
$ swipl -O
Welcome to SWI-Prolog (threaded, 64 bits, version 9.2.9)
?- [user].
|: loop(T) :- arg(1, T, N), (N == 0 -> true ; N1 is N - 1, nb_setarg(1, T, N1), loop(T)).
|: ^D
% user://1 compiled 0.00 sec, 1 clauses
true.
?- time(loop(t(100_000_000))).
% 200,000,001 inferences, 2.849 CPU in 2.856 seconds (100% CPU, 70211263 Lips)
true.
?- time( aggregate_all(count, between(1, 100_000_000, _), N) ).
% 200,000,002 inferences, 5.204 CPU in 5.219 seconds (100% CPU, 38433252 Lips)
N = 100000000.
A big problem with this type of code in the current system is calling out to predicates defined in C. If we try the code below, doing a simple loop and only retrieving elements from “the array”, according to the profiler, 44% of the time is spent in arg/3, the only involved foreign predicate.
Reconsidering the foreign predicate interface might be worthwhile. It will remain problematic though because we need to save enough of the VM state to allow the foreign code to call the garbage collector, setup callbacks to Prolog, etc. Possibly we should define a more low level foreign interface …
t(N) :-
    functor(State, state, 100),
    loop(N, State).

loop(N, State) :-
    (   N > 0
    ->  loop2(State, 1, 100),
        N2 is N - 1,
        loop(N2, State)
    ;   true
    ).

loop2(State, I, Max) :-
    (   I =< Max
    ->  arg(I, State, _),
        I2 is I + 1,
        loop2(State, I2, Max)
    ;   true
    ).
Would that be a “no call-backs allowed” limitation that would help make faster foreign predicates? Or something completely different?
An additional twist to the benchmark could also help: a coefficient for terseness/brevity of the code.
There are several things we do before calling out to a foreign predicate:
- Save several VM registers, so we get the complete view for running GC.
- More VM state to allow for callbacks
- Create a choicepoint, so we can backtrack and we can implement backtracking foreign predicates.
In addition, we create a normal environment stack frame holding the arguments. This allows us to have a uniform simple calling convention and provide a backtrace that includes the foreign predicate in case of an exception.
Omitting some of these steps will reduce the calling overhead, but also complicates the codebase. It is not that clear what the net effect would be …