Memory accumulation on backtracking

I’m searching a large graph in a depth-first fashion, and the memory usage accumulates until I’m out of memory. I don’t really understand why. Here is some simple code for a depth-first search to a finite depth:

walk(0, [Node|_], Node) :- !.          % depth exhausted: report the current node
walk(N0, [Node|T], Final) :-
    N is N0 - 1,                       % one less step of depth remaining
    connect(Node, Next),               % follow an edge
    \+ memberchk(Next, T),             % Next must not already be on the path tail
    walk(N, [Next,Node|T], Final).

which I invoke like this, with the aim of counting all the final nodes:

aggregate_all(count, Final, walk(64, [Start], Final), Count).

Why does it run out of memory? How can I write it so that it does not?

Oops, I feel responsible, since I was beating the drum for aggregates!

Because you use aggregate_all/4 and not aggregate_all/3.
aggregate_all/4 is still based on findall/3 and doesn’t optimize count;
it’s not implemented via a state and a failure-driven loop.
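
For comparison, here is a minimal sketch (not the library’s code; count_solutions/2 is a made-up name) of how counting can be done via a state term and a failure-driven loop, so that no list of solutions is ever materialized:

count_solutions(Goal, Count) :-
    State = state(0),
    (   call(Goal),               % enumerate solutions by backtracking
        arg(1, State, C0),
        C is C0 + 1,
        nb_setarg(1, State, C),   % destructively bump the counter
        fail                      % force the next solution
    ;   arg(1, State, Count)      % all solutions exhausted: read the counter
    ).

For example, ?- count_solutions(member(_, [a,b,c]), N). yields N = 3.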

SWI-Prolog is open source, so just inspect the implementation;
the Prolog code for aggregate_all/4 starts at line 265:

GitHub Repository - SWI Devel
https://github.com/SWI-Prolog/swipl-devel/blob/1cb3a63efb08819f453432d760c7c83cab017cbb/library/aggregate.pl

Edit 23.12.2023
One way around the fact that library(aggregate) is not always optimized would
be to use mode-directed tabling in SWI-Prolog. With mode-directed tabling
you can, for example, get an aggregate/3 without a findall/3 under the hood.

Another way out would be an improved library(aggregate).


Would you mind providing a short example of how I could achieve a count of the unique final nodes using mode-directed tabling?

An alternative is to just not use backtracking, by having cuts all over the place and managing the stack myself, but that seems to defeat the point of using Prolog…
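
To make that concrete, here is a rough sketch (made-up names, using library(assoc) to hold the set of final nodes; connect/2 as above) of what I mean by managing the stack myself with an explicit agenda instead of backtracking:

:- use_module(library(assoc)).

% Depth-Path pairs on an explicit agenda replace the backtracking;
% unique final nodes are collected in an assoc and counted at the end.
walk_count(Start, MaxDepth, Count) :-
    empty_assoc(Seen0),
    walk_agenda([MaxDepth-[Start]], Seen0, Seen),
    assoc_to_keys(Seen, Finals),
    length(Finals, Count).

walk_agenda([], Seen, Seen).
walk_agenda([0-[Node|_]|Rest], Seen0, Seen) :-
    !,
    put_assoc(Node, Seen0, true, Seen1),       % record a final node
    walk_agenda(Rest, Seen1, Seen).
walk_agenda([N0-[Node|T]|Rest], Seen0, Seen) :-
    N is N0 - 1,
    findall(N-[Next,Node|T],
            ( connect(Node, Next), \+ memberchk(Next, T) ),
            Children),
    append(Children, Rest, Agenda),            % depth-first: children first
    walk_agenda(Agenda, Seen0, Seen).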

I don’t know exactly. But you can use the “sum” mode with the value
1 to achieve counting. A declaration like this:

:- table p(_, sum).

This gives you a grouping on the first argument, summing over the second. It basically
makes p/2 behave like the following, except that the result is memoized as well:

p(X,Y) :- aggregate(sum(Z), q(X,Z), Y).

where q/2 holds what would otherwise be the definition of p/2.
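
For instance, with some made-up q/2 facts the free first argument acts as a grouper, and p/2 enumerates one sum per group on backtracking:

q(apple, 2).
q(apple, 3).
q(pear,  1).

?- p(X, Sum).
X = apple, Sum = 5 ;
X = pear, Sum = 1.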

Edit 23.12.2023
aggregate/3 is very much designed after SQL, although one can
also understand it in terms of bagof/3. aggregate/3 shares a property
with SQL, namely that the non-aggregated result variables work as “groupers”:

SELECT ProductName, SUM(Price)
FROM Sales
GROUP BY ProductName

Isn’t the GROUP BY clause in SQL optional? Wouldn’t a ProductName grouping
be the default if it appears in the SELECT? I don’t remember exactly.

OK, managing the stack myself looks neater to be honest, so I’ll just do that. I also strongly suspect that tabling will explode the memory too.

No, because it doesn’t table q/2:

Mode-directed tabling tables after the aggregate: it will table p/2,
not before the aggregate, so it will not table q/2. You need to add a
predicate with some new name, and not place the table/1 directive
before walk/3. If you use a table directive for q/2, it will
probably explode, but that is not what I was suggesting.

How would I do that with the walk predicate above to count unique final states? A final state is a function of the path that led to it. How would I avoid tabling paths, so as not to explode the memory?

What’s your start value?

:- table walk_grouped_and_counted(_, sum).

walk_grouped_and_counted(Final, 1) :- walk(64, [start], Final).

?- walk_grouped_and_counted(Final, Count).

Or give walk_grouped_and_counted a larger arity and pass some parameters?
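
For example (just a sketch with made-up argument names), the depth and the start node could be passed as extra non-moded arguments, so they become part of the grouping as well:

:- table walk_grouped_and_counted(_, _, _, sum).

walk_grouped_and_counted(Depth, Start, Final, 1) :-
    walk(Depth, [Start], Final).

?- walk_grouped_and_counted(64, start, Final, Count).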

Edit 23.12.2023
Works fine; here is a small test case:

walk(64, [start], stop1) :- between(1,1000000,_).
walk(64, [start], stop2) :- between(1,500000,_).
walk(64, [start], stop1) :- between(1,500000,_).

I then get:

?- walk_grouped_and_counted(Final, Count).
Final = stop2,
Count = 500000 ;
Final = stop1,
Count = 1500000.

It didn’t give me a Prolog stack overflow. But maybe I didn’t try hard enough?


Marvelous, thank you. I’ll try it.