New Personal Records for Grid Graph Path Counting ($c(14)$ and $c(15)$) using ZDDs

I am excited to share some recent progress on a long-term project of mine.

Let c(n) be the number of simple paths in an undirected n × n rectangular grid graph connecting the bottom-left node to the top-right node. For context, c(2) = 2 and c(3) = 12.
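These small values are easy to sanity-check with a naive depth-first count. A throwaway sketch in SWI-Prolog (nothing like the ZDD approach used for the records; predicate names here are made up):

```prolog
% Count simple paths from corner (1,1) to corner (N,N) of the
% N x N grid graph by plain depth-first search.  Only feasible
% for tiny N, but enough to confirm c(2) = 2 and c(3) = 12.
c_naive(N, Count) :-
    aggregate_all(count, grid_path(N, 1-1, [1-1]), Count).

grid_path(N, N-N, _) :- !.                 % reached the target corner
grid_path(N, Pos, Visited) :-
    neighbor(N, Pos, Next),
    \+ memberchk(Next, Visited),           % keep the path simple
    grid_path(N, Next, [Next|Visited]).

neighbor(N, R-C, R2-C) :- ( R2 is R+1 ; R2 is R-1 ), R2 >= 1, R2 =< N.
neighbor(N, R-C, R-C2) :- ( C2 is C+1 ; C2 is C-1 ), C2 >= 1, C2 =< N.
```

For example, `?- c_naive(3, M).` gives `M = 12`.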

About a year ago, I reached c(13) using my ZDD library in SWI-Prolog but hit a wall. Recently, after implementing a very simple modification to the code, I managed to calculate c(14) and c(15).

Results

  • c(14) = 69,450,664,761,521,361,664,274,701,548,907,358,966,488
  • c(15) = 227,449,714,676,812,739,631,826,459,327,989,863,387,613,323,440

Benchmarks

c(14) [2026-02-23]

  • Hardware: M4 MacBook Pro (48 GB RAM)
  • OS/Env: macOS Tahoe 26.3 | SWI-Prolog 10.1.3 (arm64-darwin)
  • Performance: 601,379,081,724 inferences
  • Time: 35,600.805s CPU / 85,676.193s Wall (42% CPU)
  • Memory: 21.58 GB used

c(15) [2026-02-25]

  • Hardware: iMac (Intel 2020, 128 GB RAM)
  • OS/Env: macOS Tahoe 26.3 | SWI-Prolog 10.1.3
  • Performance: 2,422,258,053,335 inferences
  • Time: 288,955.899s CPU / 327,655.109s Wall (88% CPU)
  • Memory: 85.89 GB used

I have heard that the current world record stands at c(29). To me, that is incredible. I am curious what kind of “magic” (or advanced frontier-based pruning) is required to reach such heights!


Correction: the world record is c(26).

With your numbering, there is also already a c(27):

https://oeis.org/A007764/b007764.txt

You could maybe solve c(16) with 1.5 TB RAM, who knows?
Pity that RAM prices are so high right now; 1.5 TB is beyond the
consumer sweet spot, so it might need server-class ECC RDIMMs.

Selected: 1.5 TB via 12 DIMMs at 6400 MHz.
The price comes to $58,661.00:
https://www.thinkmate.com/system/rax-re95-1104-01

Can one go cheaper? It seems that RAM prints money these days.

Thanks for updating the current world record for c(n). I will have to find articles about that on the net.

BTW, I now guess that the current version of my c(n) algorithm
could be rewritten without ZDDs, simply using the built-in assoc library instead,
mainly because of its automatic GC. In addition, I am interested in whether there is room to introduce foreign C code for some critical operations in c(n), though as far as I can see there seems to be no room for now.
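I cannot judge the real program from here, but as a toy illustration of the direction: library(assoc) used as a memo table, on a small Pascal-style triangle recurrence rather than the actual c(n) frontier state (all names here are made up):

```prolog
:- use_module(library(assoc)).

% Memoized triangle counting with an AVL tree from library(assoc)
% as the cache, threaded through the computation as an accumulator.
% Big-integer counts are no problem; SWI integers are unbounded.
tpaths(R, C, M) :-
    empty_assoc(Cache0),
    tpaths(R, C, M, Cache0, _).

tpaths(R, C, M, Cache, Cache) :-
    get_assoc(R-C, Cache, M), !.          % cache hit
tpaths(R, C, M, Cache0, Cache) :-
    (   1 < C, C < R
    ->  R1 is R - 1, C1 is C - 1,
        tpaths(R1, C1, M1, Cache0, Cache1),
        tpaths(R1, C,  M2, Cache1, Cache2),
        M is M1 + M2
    ;   M = 1, Cache2 = Cache0            % triangle border
    ),
    put_assoc(R-C, Cache2, M, Cache).
```

With the cache threaded like this there is no global state, and whatever becomes unreachable is garbage collected automatically.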

Same question here. What occurred to me: can we use a
hard disk and an AI accelerator? This would then also work with
smaller RAM. In that case we would avoid H. Iwashita's approach and try some of

A. J. Guttmann's ideas. But even if we can apply some transfer
matrix method, there is still the problem that GPU, NPU or
TPU floats are too small for the big integer results.

To get around the big-integer problem, Ruben Spaans uses
modulo counting; one can then later reconstruct the exact value
via the Chinese remainder theorem. It is compact C code, and he
writes that n=24 required around 700 GB:

efficient and parallel version for the sequence A007764
OEIS/A007764-fast.c at master · stubbscroll/OEIS · GitHub
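The reconstruction step at the end is ordinary Chinese remaindering. A minimal sketch in SWI-Prolog (my own toy code, not from Spaans; it assumes the moduli are distinct primes, so the modular inverse can use Fermat's little theorem via the GMP-backed powm/3 function):

```prolog
% Combine residue-modulus pairs R-M into the unique X modulo the
% product of all moduli, folding Garner-style over the list.
crt(Pairs, X) :-
    foldl(crt_pair, Pairs, 0-1, X-_).

crt_pair(R-M, X0-M0, X-M1) :-
    M1 is M0 * M,
    Inv is powm(M0 mod M, M - 2, M),      % inverse of M0 mod prime M
    T is ((R - X0) mod M) * Inv mod M,
    X is X0 + M0 * T.
```

For example, `crt([72-97, 34-101, 62-103], X)` yields `X = 123456`, since 123456 mod 97 = 72, mod 101 = 34, and mod 103 = 62.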

Thanks @Johnny_Rotten for the comments advising me to consider “virtual RAM”, as I understand them, though I have no idea about that direction. However, I remember that one year ago, when c(13) succeeded, the Activity Monitor showed 800 GB (not sure) of virtual memory used by the Prolog process in an Emacs buffer, and I do not understand what was going on in that process.

I also thought for a while about storing the computation state of c(n) onto an SSD, so that a suspended c(n) could be resumed later elsewhere. If your comments have inspired me correctly, I would like to think about the details again. The computation state is a simple array of terms t(A, L, R), where A is drawn from a certain given small set, but L and R are indexes into the array, possibly big integers. I have no clear idea yet how to treat such an array of text data involving big numbers.
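For the concrete question of putting such t(A, L, R) terms on disk: write_canonical/2 output reads back exactly, and big integers are written in full decimal, so arbitrary precision survives the round trip. A minimal sketch (file layout and predicate names are my own invention):

```prolog
% Dump a list of state terms to File, one term per line, and read
% them back.  Suitable as a crude checkpoint/resume mechanism.
save_state(File, Terms) :-
    setup_call_cleanup(open(File, write, Out),
        forall(member(T, Terms),
               ( write_canonical(Out, T), write(Out, '.\n') )),
        close(Out)).

load_state(File, Terms) :-
    setup_call_cleanup(open(File, read, In),
        read_terms(In, Terms),
        close(In)).

read_terms(In, Terms) :-
    read_term(In, T, []),
    (   T == end_of_file
    ->  Terms = []
    ;   Terms = [T|Rest],
        read_terms(In, Rest)
    ).
```

For very large states one would rather stream term by term instead of materializing the whole list, but the encoding question is the same.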

Not necessarily virtual RAM; you could also use streaming processing,
since your computation is basically binary WAM. You can open the
same stream twice, do a double loop, and write a new stream.

Iterate this until you have the bottom-up fixpoint as a solution. The new
thing I am suggesting would be to use some AI accelerator in between,
so that the streams would not carry small elements but rather provide
big blocks, the blocks being processed and combined quite quickly.
Then maybe use a special streaming TPU, i.e. a tensor processing unit
that has a special streaming architecture, like for example a chipset
from Groq, Inc.:

“Groq’s initial name for their ASIC was the Tensor Streaming
Processor (TSP), codenamed “Alan” but was later rebranded
by Mark Heaps - VP of Brand, and Jonathan Ross - CEO/Founder,
from the TSP to the Language Processing Unit (LPU) to make
the nature of the processor more obvious.”
https://en.wikipedia.org/wiki/Groq

BinProlog(*) by Paul Tarau used the idea of a binary WAM:
basically, break down all clauses into clauses with at most
two body literals. The main challenge for counting is deduplication.
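A rough sketch of the binarization idea on a tiny ancestor program (illustrative only, not BinProlog's actual transformation): every predicate gets a continuation argument, so each clause keeps at most one call plus the continuation.

```prolog
% Original program:
%   anc(X, Z) :- par(X, Y), anc(Y, Z).
%   anc(X, Z) :- par(X, Z).
%   par(a, b).  par(b, c).
% Binarized: facts call their continuation, rules pack the rest
% of the body into the continuation of their first goal.
par(a, b, K) :- solve(K).
par(b, c, K) :- solve(K).

anc(X, Z, K) :- par(X, Y, anc(Y, Z, K)).
anc(X, Z, K) :- par(X, Z, K).

solve(true).
solve(anc(X, Z, K)) :- anc(X, Z, K).
solve(par(X, Y, K)) :- par(X, Y, K).
```

The query `?- anc(a, Z, true).` enumerates the ancestors of a, namely Z = c and Z = b.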

Here is a binary counting computation that gives you the
paths in a triangle. The triangle is narrow at the top
and wide at the bottom, and you can go left and right
when you are inside the triangle:

p(R, C, M) :- 1 < C, C < R, !,
   R2 is R-1, 
   C2 is C-1, 
   p(R2, C2, M1),    /* right */
   p(R2, C, M2),     /* left */
   M is M1+M2.
p(_, _, 1).

Works fine so far:

?- between(1,10,R), (between(1,R,C), p(R,C,M), 
    write(M), write(' '), fail; nl), fail; true.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
true.

But more illustrative would be an example with M is M1*M2
and an aggregate sum to compute the next count. I think
this is called a convolution algorithm in combinatorics.
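Such a convolution can be sketched directly, for instance with the Catalan numbers, where each count is an aggregate sum over products of two smaller counts (my example, with SWI tabling doing the deduplication of subcalls):

```prolog
:- table cat/2.

% cat(N, M): M is the N-th Catalan number, computed by the
% convolution cat(N) = sum over K of cat(K) * cat(N-1-K).
cat(0, 1).
cat(N, M) :-
    N > 0,
    N1 is N - 1,
    aggregate_all(sum(P),
        ( between(0, N1, K),
          K2 is N1 - K,
          cat(K, M1),
          cat(K2, M2),
          P is M1 * M2 ),
        M).
```

For example, `?- cat(10, M).` gives `M = 16796`.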

See also:

The BinProlog Experience
https://arxiv.org/abs/1102.1178


I realized that I lack the knowledge to understand your comments about using a “hard disk”. So far I have been trying to find only in-core solutions using the built-in Prolog libraries. I need to take time to catch up with your comments somehow.

Most likely a hard disk is not necessary for smaller n. You can
also bring the time and space requirements down with a clever
frontier encoding, like for example this paper and C++ code do:

Efficient Computation of the Number of Paths
https://github.com/kunisura/GGCount

I find that it uses much less time and space already for c(14)
and c(15). If I compare with your figures, it uses multiple
orders of magnitude fewer seconds and megabytes:

compute in s    SWI       Kuniaki/Iwashita C++
c(14)           35601     5
c(15)           288956    14

memory in MB    SWI       Kuniaki/Iwashita C++
c(14)           21580     41
c(15)           85880     111

Their own benchmark is mainly against the simpath algorithm
with ZDD encoding, I suppose, from another paper and code of
theirs from 2012, with the 2013 improvement being the frontier
encoding without any ZDD, and using hash tabling?