I’m using Janus to write a Prolog solver that returns proofs as Python objects.
As there are millions of proofs, I loop over the iterator returned by Janus and process each proof immediately, so that the object created by the solver can be freed. However, the objects are not removed from the heap by Python’s automatic garbage collection (does that mean there is still a reference to them somewhere?).
Here is a simplified test case to illustrate the problem.
engine.pl defines get_dummies/3 (its source is not included here). The Python side:
```python
import time

import janus_swi as janus
import numpy as np


class Dummy:
    def __init__(self):
        # 1024**2 integers (about 8 MB with 64-bit ints), so a leak shows up quickly
        self.mem = np.arange(0, 1024**2)


class DummyFactory:
    def make_dummy(self):
        return Dummy()


variables = {
    "Factory": DummyFactory(),
    "N": 1000,
}

janus.consult("engine.pl")
results = janus.query("get_dummies(Factory, Dummy, N)", variables)

# Handle each answer as soon as it arrives so its Dummy can be freed.
for x in results:
    print(x)
    time.sleep(0.1)
```
When the Dummy objects are created through Janus, memory usage increases steadily, whereas when I create the objects directly in Python, memory usage stays constant because each object is freed immediately.
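One way to confirm the pure-Python baseline without watching process memory is to attach a `weakref.finalize` callback to each object and count how many have been reclaimed. This is a minimal sketch that assumes CPython's reference counting; the `bytearray` stands in for the numpy array so the check has no third-party dependency, and `python_only_loop` is a hypothetical helper name.

```python
import weakref


class Dummy:
    """Stand-in for the question's Dummy; a 1 MiB bytearray replaces numpy."""

    created = 0

    def __init__(self):
        Dummy.created += 1
        self.mem = bytearray(1024 * 1024)


freed = 0


def _on_free():
    # Invoked by weakref.finalize when a Dummy is reclaimed.
    global freed
    freed += 1


def python_only_loop(n):
    """Create n Dummy objects in pure Python and return the peak live count."""
    peak_live = 0
    for _ in range(n):
        d = Dummy()  # rebinding `d` drops the previous Dummy's last reference
        weakref.finalize(d, _on_free)
        peak_live = max(peak_live, Dummy.created - freed)
    return peak_live  # the last Dummy is released when this frame ends


peak = python_only_loop(100)
print(peak, freed)  # 1 100
```

At most one Dummy is alive at a time, which matches the constant memory usage observed when Janus is not involved.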
Prolog references to Python objects are represented as blobs. A blob is a generalization of a Prolog atom. Eventually, Prolog should run the atom garbage collector (AGC), and each reclaimed blob that represents a Python object decrements that object’s Python reference count.
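The deferred release can be pictured with a toy model in plain Python. This is not janus or SWI-Prolog code: `BlobTable` is a hypothetical stand-in for the atom table, each blob holds a strong reference to a Python object, and collection only runs once more than `margin` blobs have been created since the previous collection, loosely mirroring the agc_margin flag.

```python
import weakref


class BlobTable:
    """Toy model of Prolog blobs that hold references to Python objects."""

    def __init__(self, margin):
        self.margin = margin      # loosely analogous to Prolog's agc_margin
        self.blobs = {}           # blob id -> Python object (strong reference)
        self.in_use = set()       # blob ids still referenced from Prolog
        self.new_since_gc = 0
        self.next_id = 0

    def new_blob(self, obj):
        blob_id = self.next_id
        self.next_id += 1
        self.blobs[blob_id] = obj
        self.in_use.add(blob_id)
        self.new_since_gc += 1
        if self.new_since_gc > self.margin:
            self.collect()
        return blob_id

    def release(self, blob_id):
        # Prolog no longer references the blob, but it is not reclaimed yet.
        self.in_use.discard(blob_id)

    def collect(self):
        # Reclaiming a blob drops its Python reference (the "decrement").
        for blob_id in [b for b in self.blobs if b not in self.in_use]:
            del self.blobs[blob_id]
        self.new_since_gc = 0


class Obj:
    pass


def simulate(n, margin):
    """Return how many of n Python objects are still alive after the loop."""
    table = BlobTable(margin)
    refs = []
    for _ in range(n):
        obj = Obj()
        refs.append(weakref.ref(obj))
        table.release(table.new_blob(obj))  # answer processed, blob is garbage
        del obj
    return sum(r() is not None for r in refs)


print(simulate(1000, margin=10_000))  # 1000: no collection has run yet
print(simulate(1000, margin=100))     # far fewer survive
```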
So collection is not immediate. By default, Prolog starts the atom garbage collector once there are 10,000 new atoms that could be garbage. You only have 1,000 iterations, so AGC has probably not been triggered yet. You can verify this by calling garbage_collect_atoms/0 from time to time. You can also set the Prolog flag agc_margin to a lower value, e.g. `set_prolog_flag(agc_margin, 1000)`.
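A minimal sketch of calling garbage_collect_atoms/0 from time to time while consuming the answers. The helper is written against a generic goal runner so it can be tested without Prolog; with janus one would presumably pass `janus.once`, which runs a goal given as a string. The function name and the `every` parameter are hypothetical.

```python
from typing import Any, Callable, Iterable


def consume_with_periodic_agc(
    answers: Iterable[Any],
    process: Callable[[Any], None],
    run_goal: Callable[[str], Any],
    every: int = 1000,
) -> int:
    """Process each answer and force Prolog atom GC every `every` answers.

    `run_goal` is assumed to run a Prolog goal given as a string
    (for example janus.once from janus_swi). Returns the answer count.
    """
    count = 0
    for answer in answers:
        process(answer)
        count += 1
        if count % every == 0:
            # Reclaim blobs Prolog no longer references; each reclaimed blob
            # releases its reference to the wrapped Python object.
            run_goal("garbage_collect_atoms")
    return count
```

With the test case above this would be called roughly as `consume_with_periodic_agc(results, print, janus.once, every=100)`. Lowering the threshold once with `janus.once("set_prolog_flag(agc_margin, 1000)")` is the alternative that lets Prolog trigger AGC on its own.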
You can check Prolog’s AGC behaviour using statistics/0 and you can examine references to Python objects from Prolog using
```prolog
?- current_blob(X, 'PyObject').
```
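To watch the same count from the Python side, one option is to aggregate that `current_blob/2` query. This sketch again takes a goal runner; it assumes the runner returns a dict of bindings the way `janus.once` does, and that SWI-Prolog's autoloaded `aggregate_all/3` is available. The helper names are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

Runner = Callable[[str], Dict[str, Any]]

PY_BLOB_COUNT_GOAL = "aggregate_all(count, current_blob(_, 'PyObject'), Count)"


def count_python_blobs(run_goal: Runner) -> int:
    """Return how many Prolog blobs currently reference Python objects."""
    return run_goal(PY_BLOB_COUNT_GOAL)["Count"]


def report_agc_effect(run_goal: Runner) -> Tuple[int, int]:
    """Count PyObject blobs before and after forcing atom GC."""
    before = count_python_blobs(run_goal)
    run_goal("garbage_collect_atoms")
    after = count_python_blobs(run_goal)
    return before, after
```

If `after` stays close to `before` even though the answers have been processed, something on the Prolog side still references the blobs.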
Then there is one more nasty detail. The Prolog atom garbage collector runs in a background thread, and it cannot grab the Python GIL without risking a deadlock. Instead, it maintains a store of Python objects whose reference counts must be decremented. The next time another thread takes the GIL, it empties this store and decrements the Python reference counts.
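The hand-off can be pictured as a lock-protected pending list: the collector thread only parks the objects it wants released, and a thread that later holds the GIL drains the list, which is where the references are actually dropped. This is a conceptual model in plain Python, not how janus is implemented internally; `PendingDecrefs` is a hypothetical name.

```python
import threading
import weakref


class PendingDecrefs:
    """Model of a store of objects whose references are dropped later."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []

    def schedule(self, obj):
        # Collector thread: it may not touch the reference count itself,
        # so it only parks the object here.
        with self._lock:
            self._pending.append(obj)

    def drain(self):
        # Thread holding the GIL: take the batch and let it go, which
        # drops the parked references. Returns how many were released.
        with self._lock:
            batch, self._pending = self._pending, []
        count = len(batch)
        batch.clear()
        return count


class Payload:
    pass


store = PendingDecrefs()
payload = Payload()
alive = weakref.ref(payload)

collector = threading.Thread(target=store.schedule, args=(payload,))
del payload  # from here on, only the thread args and then the store hold it
collector.start()
collector.join()

print(alive() is not None)  # True: parked, not yet released
print(store.drain())        # 1
print(alive() is None)      # True: released by the draining thread
```

Until some thread drains the store, the objects stay alive even though the collector has already decided they are garbage.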
Finally, you can call py_free/1 if you do not want to wait for GC. This decrements the reference count immediately. It does not free the blob itself, but it ensures that the eventual garbage collection of the blob only has to free the blob.
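The effect of py_free/1 can be shown with the same kind of toy model: freeing drops the Python reference at once, while the now-empty blob shell remains until atom GC reclaims it. This is an illustration only; `BlobHandle` is hypothetical, and the real py_free/1 is called from Prolog on the Python object reference.

```python
import weakref


class BlobHandle:
    """Toy stand-in for a Prolog blob wrapping a Python object."""

    def __init__(self, obj):
        self.obj = obj

    def free(self):
        # Like py_free/1: release the Python reference now; the handle
        # itself stays around until a later collection reclaims it.
        self.obj = None


class Payload:
    pass


payload = Payload()
alive = weakref.ref(payload)
handle = BlobHandle(payload)
del payload

print(alive() is not None)  # True: the blob still owns the object
handle.free()
print(alive() is None)      # True: released without waiting for GC
print(handle.obj is None)   # True: only the empty blob shell remains
```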
If the blobs are correctly reclaimed and the memory is still not freed, we may be facing a bug.