New Personal Records for Grid Graph Path Counting (c(14) and c(15)) using ZDDs

I am excited to share some recent progress on a long-term project of mine.

Let c(n) be the number of simple paths in an undirected n × n rectangular grid graph connecting the bottom-left node to the top-right node. For context, c(2) = 2 and c(3) = 12.
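As a sanity check of the definition, here is a naive brute-force counter. This is plain DFS enumeration, nothing like the ZDD approach, and only feasible for tiny n; all predicate names here are my own:

```prolog
% edge(N, P, Q): Q is a horizontal or vertical neighbour of node P
% in the N x N grid; nodes are X-Y pairs with 1 <= X,Y <= N.
edge(N, X-Y, X1-Y) :- X < N, X1 is X+1.
edge(_, X-Y, X1-Y) :- X > 1, X1 is X-1.
edge(N, X-Y, X-Y1) :- Y < N, Y1 is Y+1.
edge(_, X-Y, X-Y1) :- Y > 1, Y1 is Y-1.

% path(N, From, Goal, Visited): nondeterministically extend a simple
% path; each solution is one complete simple path from From to Goal.
path(_, Goal, Goal, _).
path(N, From, Goal, Visited) :-
    From \== Goal,
    edge(N, From, Next),
    \+ memberchk(Next, Visited),
    path(N, Next, Goal, [Next|Visited]).

% grid_count(N, C): count the simple paths from the bottom-left node
% 1-1 to the top-right node N-N by exhaustive enumeration.
grid_count(N, C) :-
    aggregate_all(count, path(N, 1-1, N-N, [1-1]), C).
```

?- grid_count(3, C). indeed gives C = 12.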

About a year ago, I reached c(13) using my ZDD library in SWI-Prolog but hit a wall. Recently, after implementing a very simple modification to the code, I managed to calculate c(14) and c(15).

Results

  • c(14) = 69,450,664,761,521,361,664,274,701,548,907,358,966,488
  • c(15) = 227,449,714,676,812,739,631,826,459,327,989,863,387,613,323,440

Benchmarks

c(14) [2026-02-23]

  • Hardware: M4 MacBook Pro (48 GB RAM)
  • OS/Env: macOS Tahoe 26.3 | SWI-Prolog 10.1.3 (arm64-darwin)
  • Performance: 601,379,081,724 inferences
  • Time: 35,600.805s CPU / 85,676.193s Wall (42% CPU)
  • Memory: 21.58 GB used

c(15) [2026-02-25]

  • Hardware: iMac (Intel 2020, 128 GB RAM)
  • OS/Env: macOS Tahoe 26.3 | SWI-Prolog 10.1.3
  • Performance: 2,422,258,053,335 inferences
  • Time: 288,955.899s CPU / 327,655.109s Wall (88% CPU)
  • Memory: 85.89 GB used

I have heard that the current world record stands at c(29). To me, that is incredible. I am curious what kind of “magic” (or advanced frontier-based pruning) is required to reach such heights!


Correction: the world record is c(26).

With your numbering there is also a c(27) already:

https://oeis.org/A007764/b007764.txt

You could maybe solve c(16) with 1.5 TB RAM, who knows?
A pity RAM prices are high right now; 1.5 TB is beyond the
consumer sweet spot and might need server-class ECC RDIMMs.

Selected: 1.5 TB via 12 DIMMs at 6400 MHz.
Price goes up to: $58,661.00
https://www.thinkmate.com/system/rax-re95-1104-01

Can we go cheaper? It seems that RAM prints money now.

Thanks for updating the current world record for c(n). I will have to find articles about that on the net.

BTW, I now guess the current version of my algorithm for c(n)
could be rewritten without the ZDD, simply using the built-in assoc library instead,
mainly because of its automatic GC. In addition, I am interested in whether there is room to introduce foreign C code for some critical operations in c(n), though as far as I can see there seems to be no room for now.
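For reference, a minimal sketch of how library(assoc) could serve as such a node table. The key and value shapes below (node/3 terms mapped to integer ids) are made up for illustration, not the actual c(n) code:

```prolog
:- use_module(library(assoc)).

% demo_assoc(-V): store one ZDD-like node under a compound key in an
% AVL tree and look it up again; node(Var, Lo, Hi) and the id 42 are
% made-up shapes for this illustration.
demo_assoc(V) :-
    empty_assoc(A0),
    put_assoc(node(3, 0, 1), A0, 42, A1),
    get_assoc(node(3, 0, 1), A1, V).
```

Since an assoc is an ordinary Prolog term, versions that become unreachable are reclaimed by the normal garbage collector, which is the automatic GC mentioned above.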

Same question here. What occurred to me: can we use a
hard disk and an AI accelerator? This would then also work with
smaller RAM, in which case we would avoid H. Iwashita and try some
A. J. Guttmann ideas.

But even if we can apply some transfer
matrix method, there is still the problem that GPU, NPU or
TPU floats are too small for the big integer results.

To get around the big integer problem, Ruben Spaans uses
modulo counting; one can then later use the Chinese remainder theorem.
It's compact C code, and he writes that n=24 required around 700 GB:

efficient and parallel version for the sequence A007764
OEIS/A007764-fast.c at master · stubbscroll/OEIS · GitHub
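The recombination step can be sketched in a few lines of Prolog. This is a generic Chinese remainder computation, not Spaans' actual code, and the predicate names are mine:

```prolog
% egcd(A, B, G, X, Y): extended Euclid, G = gcd(A,B) and G =:= A*X + B*Y.
egcd(A, 0, A, 1, 0) :- !.
egcd(A, B, G, X, Y) :-
    Q is A // B,
    R is A mod B,
    egcd(B, R, G, X1, Y1),
    X = Y1,
    Y is X1 - Q*Y1.

% crt(R1, M1, R2, M2, R, M): R is the unique residue modulo M = M1*M2
% with R mod M1 =:= R1 and R mod M2 =:= R2; assumes gcd(M1, M2) = 1.
crt(R1, M1, R2, M2, R, M) :-
    egcd(M1, M2, 1, X, _),          % X is the inverse of M1 modulo M2
    M is M1*M2,
    R is (R1 + M1*X*(R2 - R1)) mod M.
```

For example, c(3) = 12 leaves residues 5 mod 7 and 1 mod 11, and crt(5, 7, 1, 11, R, M) recovers R = 12 with M = 77; folding crt/6 over enough coprime moduli recovers the full big-integer count.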

Thanks @Johnny_Rotten for the comments advising me to consider "virtual RAM", as I understand them, though I have no experience in that direction. However, I remember that a year ago, when c(13) succeeded, the Activity Monitor showed 800 GB (not sure) of virtual memory used by the Prolog process in an Emacs buffer, and I do not understand what was going on in that process.

I also thought for a while about storing the computation state for c(n) onto SSD, so that a suspended c(n) could be resumed later elsewhere. If I have understood your comments correctly, I would like to think again about the details. The computation state is a simple array of terms t(a, L, R), where a is drawn from a certain given small set, but L and R are indexes into the array, possibly big integers. I have no clear idea yet how to treat such an array of text data involving big numbers.

Not necessarily virtual RAM. You could also use streaming processing.
When your computation is basically binary WAM, you can open the
same stream twice, doing a double loop, and write a new stream.
Iterate this until you have the bottom-up fixpoint as a solution.

The new thing I am suggesting would be to use some AI accelerator
in between. So the streams would not provide small elements but rather
big blocks, the blocks being processed and combined quite quickly.
Then maybe use a special streaming TPU, i.e. a tensor processing unit
that has a special streaming architecture, like for example a chipset
from Groq, Inc.:

“Groq’s initial name for their ASIC was the Tensor Streaming
Processor (TSP), codenamed “Alan” but was later rebranded
by Mark Heaps - VP of Brand, and Jonathan Ross - CEO/Founder,
from the TSP to the Language Processing Unit (LPU) to make
the nature of the processor more obvious.”
https://en.wikipedia.org/wiki/Groq

BinProlog(*) by Paul Tarau used the idea of a binary WAM:
basically, break down all clauses into clauses with at most
two body literals. The main challenge for counting is deduplication.
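A toy illustration of that break-down (my own example, not BinProlog's actual continuation-passing transformation): a clause with three body literals gets an auxiliary predicate, so that no clause keeps more than two body goals.

```prolog
% Made-up facts for the illustration:
b(1, 2).
c(2, 3).
d(3).

% Original clause with three body literals:
%   a(X) :- b(X, Y), c(Y, Z), d(Z).
% Binarized form: the fresh auxiliary predicate a1/1 carries the rest
% of the body, so every clause has at most two body literals.
a(X) :- b(X, Y), a1(Y).
a1(Y) :- c(Y, Z), d(Z).
```

?- a(1). succeeds exactly as the original three-literal clause would.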

Here is a binary counting computation that gives you the
paths in a triangle. The triangle is narrow at the top
and wide at the bottom, and you can go left and right
when you are inside the triangle:

p(R, C, M) :- 1 < C, C < R, !,
   R2 is R-1, 
   C2 is C-1, 
   p(R2, C2, M1),    /* right */
   p(R2, C, M2),     /* left */
   M is M1+M2.
p(_, _, 1).

Works fine so far:

?- between(1,10,R), (between(1,R,C), p(R,C,M), 
    write(M), write(' '), fail; nl), fail; true.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
true.

But more illustrative would be an example with M is M1*M2,
and an aggregate sum to compute the next count. I think
this is called a convolution in combinatorics.
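A toy example of such a convolution, unrelated to the grid problem (my own illustration): the Catalan numbers, where each new count is an aggregate sum of products M1*M2 of two earlier counts.

```prolog
% catalan(N, C): the N-th Catalan number via the convolution
%   C(0) = 1,  C(n) = sum_{I=0..n-1} C(I)*C(n-1-I).
% Naive exponential recursion; fine for small N, memoize for larger N.
catalan(0, 1) :- !.
catalan(N, C) :-
    N1 is N-1,
    aggregate_all(sum(P),
        (   between(0, N1, I),
            J is N1 - I,
            catalan(I, A),
            catalan(J, B),
            P is A*B
        ),
        C).
```

?- catalan(5, C). gives C = 42.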

See also:

The BinProlog Experience
https://arxiv.org/abs/1102.1178


I realized that I lack the knowledge to understand your comments about using a "hard disk". So far I have been trying to find only in-core solutions using the built-in Prolog libraries. I need to take time to catch up with your comments somehow.

Most likely a hard disk is not necessary for smaller n. You can also
bring the time and space requirements down with a clever frontier
encoding, like for example this paper and C++ code does:

Efficient Computation of the Number of Paths
https://github.com/kunisura/GGCount

I find that it uses much less time and space already for c(14)
and c(15). Compared with your figures, it needs multiple
orders of magnitude fewer seconds and megabytes:

time in s       SWI-Prolog    GGCount C++ (Iwashita)
c(14)           35,601        5
c(15)           288,956       14

memory in MB    SWI-Prolog    GGCount C++ (Iwashita)
c(14)           21,580        41
c(15)           85,880        111

Their own benchmark is mainly against the simpath algorithm
with ZDD encoding, I suppose, from another paper and code of
theirs from 2012. The 2013 improvement is then the frontier
encoding without any ZDD, using hash tables?

No progress or new ideas on using external storage for the ZDD computation yet.
Instead, I added an additional eager pruning by accident, which seemed trivial but
showed about a twofold time improvement for c(10).

% ?- time(rect_path_count(rect(9,9), C)).  %% for 10x10 nodes
%@ % 2,146,483,380 inferences, 95.845 CPU in 99.984 seconds (96% CPU, 22395249 Lips)
%@ C = 41044208702632496804.


I changed the Pascal triangle backward chaining rules, going from goals
to subgoals, as ordinary Prolog does, into Pascal triangle
forward chaining rules, going from facts to derived facts:

% rule(-Term)
rule(p(1, 1, 1)).

% rule(+Term, -Term)
rule(p(R2, 1, 1), p(R, 1, 1)) :-
   R is R2+1.
rule(p(R2, R2, 1), p(R, R, 1)) :-
   R is R2+1.

% rule(+Term, +Term, -Term)
rule(p(R2, C2, M1), p(R2, C, M2), p(R, C, M)) :-
   R is R2+1,
   C =:= C2+1,
   M is M1+M2.

Now I use a predicate init/1 that writes all the results of rule/1 to a
file, and a predicate step/2 that applies and writes all the results
of rule/2 and rule/3 by reading an old file and writing a new file:
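One possible sketch of these two predicates, with rule/1, rule/2 and rule/3 as above. The details are my guesses, reading the old facts into a list for the double loop; a genuinely streaming version would instead open the input file twice:

```prolog
% init(+File): write every rule/1 fact to File, one clause per line.
init(File) :-
    setup_call_cleanup(open(File, write, Out),
        forall(rule(F), portray_clause(Out, F)),
        close(Out)).

% step(+InFile, +OutFile): apply rule/2 to every old fact, and rule/3
% to every ordered pair of old facts (the inner and outer loop).
step(InFile, OutFile) :-
    read_facts(InFile, Facts),
    setup_call_cleanup(open(OutFile, write, Out),
        (   forall((member(A, Facts), rule(A, F)),
                   portray_clause(Out, F)),
            forall((member(A, Facts), member(B, Facts), rule(A, B, F)),
                   portray_clause(Out, F))
        ),
        close(Out)).

% read_facts(+File, -Facts): read all terms of File into a list.
read_facts(File, Facts) :-
    setup_call_cleanup(open(File, read, In),
        read_terms(In, Facts),
        close(In)).

read_terms(In, Facts) :-
    read(In, T),
    (   T == end_of_file
    ->  Facts = []
    ;   Facts = [T|Rest],
        read_terms(In, Rest)
    ).
```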

?- init('even.p').
?- step('even.p', 'odd.p').
?- step('odd.p', 'even.p').
?- step('even.p', 'odd.p').
?- step('odd.p', 'even.p').

It's basically BinProlog but bottom-up and streaming; step/2 trivially
uses an inner and outer loop for the two inputs of rule/3. After doing
the above, inspecting the file 'even.p' shows a valid Pascal triangle
row, and it is indeed a frontier calculation:

p(5, 1, 1).
p(5, 2, 4).
p(5, 3, 6).
p(5, 4, 4).
p(5, 5, 1).

My count_rect_path/3 calls udg_path_count directly, and
the latter is for general undirected graphs; in particular, it does not assume planarity like the rectangular grid graph.

This query is for path counting in the complete graph K(n). n=13 seems to be the maximum for this test. It is well known that the complete graph K(n) with n nodes is not planar for n >= 5.

% ?- time((between(2, 13, N), findall(I-J, (between(1, N, I), K is I + 1, between(K, N, J)), E),
%	udg_path_count(1-N, E, C), writeln(count_path_of_K(N)=C), fail)).
%@ count_path_of_K(2)=1
%@ count_path_of_K(3)=2
%@ count_path_of_K(4)=5
%@ count_path_of_K(5)=16
%@ count_path_of_K(6)=65
%@ count_path_of_K(7)=326
%@ count_path_of_K(8)=1957
%@ count_path_of_K(9)=13700
%@ count_path_of_K(10)=109601
%@ count_path_of_K(11)=986410
%@ count_path_of_K(12)=9864101
%@ count_path_of_K(13)=108505112
%@ % 1,364,231,392 inferences, 61.385 CPU in 63.662 seconds (96% CPU, 22224211 Lips)
%@ false.

Looks like a graphillion library is in the making!

Seems to agree with:

Total number of ordered k-tuples (k=0..n) of
distinct elements from an n-element set.
https://oeis.org/A000522

It also has an easy recursion formula:

factorial(0, 1) :- !.
factorial(N, X) :- M is N-1, factorial(M, Y), X is N*Y.

Seems to work, except for N=2:

?- between(2, 13, N), M is N-2, factorial(M, Y), 
Z is floor(e*Y), write(paths(N)=Z), nl, fail; true.
paths(2)=2
paths(3)=2
paths(4)=5
paths(5)=16
paths(6)=65
paths(7)=326
paths(8)=1957
paths(9)=13700
paths(10)=109601
paths(11)=986410
paths(12)=9864101
paths(13)=108505112
true.
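The floor(e*Y) detour relies on floating point, which will lose precision once the factorial grows, besides the N=2 glitch. A000522 also satisfies the integer recurrence a(0) = 1, a(n) = n*a(n-1) + 1, and with it count_path_of_K(N) = a(N-2) holds exactly, including N=2:

```prolog
% a000522(N, X): X = A000522(N) by the integer recurrence
%   a(0) = 1,  a(n) = n*a(n-1) + 1  (no floating point involved).
a000522(0, 1) :- !.
a000522(N, X) :-
    M is N-1,
    a000522(M, Y),
    X is N*Y + 1.
```

?- M is 13-2, a000522(M, X). gives X = 108505112, matching count_path_of_K(13) above, and N = 2 yields a(0) = 1, matching count_path_of_K(2).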

I somewhere claimed the Pascal triangle can be computed
with a window of N = 2, replacing BinProlog with its two streams
by SlideProlog with only one stream but a sliding window.

Here is some code; the first stream/2 clause is a sliding window of N = 2:

init([p(1,1,1)]).

step([p(R2, 1, 1)|L], [p(R, 1, 1)|S]) :- 
   R is R2+1,
   stream([p(R2, 1, 1)|L], S).

stream([p(R2, C2, M1), p(R2, C, M2)|L], [p(R, C, M)|S]) :-
   R is R2+1,
   C =:= C2+1,
   M is M1+M2,
   stream([p(R2, C, M2)|L], S).
stream([p(R2, R2, 1)],[p(R, R, 1)]) :- 
   R is R2+1.

Seems to work:

?- init(X).
X = [p(1, 1, 1)].
?- init(_X), step(_X, Y).
Y = [p(2, 1, 1), p(2, 2, 1)].
?- init(_X), step(_X, _Y), step(_Y, Z).
Z = [p(3, 1, 1), p(3, 2, 2), p(3, 3, 1)].
?- init(_X), step(_X, _Y), step(_Y, _Z), step(_Z, T).
T = [p(4, 1, 1), p(4, 2, 3), p(4, 3, 3), p(4, 4, 1)].
?- init(_X), step(_X, _Y), step(_Y, _Z), step(_Z, _T), step(_T, R).
R = [p(5, 1, 1), p(5, 2, 4), p(5, 3, 6), p(5, 4, 4), p(5, 5, 1)].