Memory problem while using the Janus library to call Prolog from Python

I have encountered a memory issue while using the Janus library to call Prolog from Python.

We use Janus to consult a 62 MB Prolog file. At this point, the memory usage of our program is low. We then perform many query_once calls, during which the memory usage of our program quickly grows (>10 GB).

We are not adding anything to the Prolog database, so I am unsure what is wrong.

A memory profile shows that the issue is our query_once calls. We are making many queries, such as:

findall(_ID, (pos_index(_ID, stop(_V0)),( collidingclose(_V2,_V0),leftof(_V1,_V2),not_higherpri(_V2,_V1)-> true)), S)

Is there a reason why the Janus interface might keep accumulating memory? If so, is there a way to prevent this memory explosion?

Might it be related to this feature, described in the Janus documentation (section 'packages/janus.html'):

Note that the input argument may also be passed literally. Below we give two examples. We strongly advise against using string interpolation for three reasons. Firstly, the query strings are compiled and cached on the Prolog side and (thus) we assume a finite number of distinct query strings.

Thanks,

Andrew

It is a bit hard to say from this terse description. If this example query is passed as a string to query_once() and you make lots of other queries that are similar, you build up one clause per query. You can find the space used by that using

?- predicate_property(janus:py_call_cache(String, Input, TV, M, Goal, Dict, Truth, OutVars),
                      size(Bytes)).
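
If it is easier to check from Python, something like this should work (a minimal sketch; py_call_cache/8 is a janus internal, so its shape may change between versions):

from janus_swi import query_once

# Size in bytes of janus's compiled-query cache; py_call_cache/8 is an
# internal predicate, so this is version-dependent.
cache_bytes = query_once(
    "predicate_property(janus:py_call_cache(_,_,_,_,_,_,_,_),"
    " size(Bytes))")["Bytes"]
print(cache_bytes)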

The size property is available on many objects, so you can figure out where the space is. For example

?- current_module(M), module_property(M, size(Bytes)).

Or, a bit more advanced:

?- order_by([desc(Bytes)], (current_module(M), module_property(M, size(Bytes)))).

That should tell you the module. Next, you can play the same trick using predicate_property to zoom in on the predicate. Maybe we should add a little library for this …
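
For example, driven from the Python side, a minimal sketch that ranks the predicates of a module by size (the module name user is just a placeholder):

from janus_swi import query

# Ten largest predicates of module 'user' (a placeholder), by their
# size property, largest first.
goal = ("limit(10, order_by([desc(Bytes)],"
        " (predicate_property(user:_Head, size(Bytes)),"
        "  functor(_Head, Name, Arity))))")
for d in query(goal):
    print(d["Name"], d["Arity"], d["Bytes"])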

Note that you can get a Prolog shell from Python by calling

>>> janus.prolog()

Hi Jan,

Thank you for the helpful answer. Using the information in your answer, I have determined that Janus is not the cause of the memory issue.

Thanks!

Hi Jan,

Janus is indeed causing problems. In our program, we use Janus to perform many (100k+) queries. Using the code you suggested, I can see that the memory use of Janus grows linearly with the number of queries. For long-running instances, Janus uses almost 1 GB of memory.

Here are example goals/queries:

findall(_ID, (pos_index(_ID, next_has_arson(_V0,_V1)),( mypos_1(_V3),true_sown(_V0,_V1,_V3,_V2),true_plowed(_V0,_V4,_V3,_V5)->  true)), S)
findall(_ID, (pos_index(_ID, next_has_arson(_V0,_V1)),( mypos_1(_V3),true_sown(_V0,_V1,_V3,_V2),true_plowed(_V0,_V4,_V2,_V5)->  true)), S)
findall(_ID, (pos_index(_ID, next_has_arson(_V0,_V1)),( mypos_1(_V3),true_sown(_V0,_V1,_V3,_V2),true_plowed(_V0,_V4,_V5,_V2)->  true)), S)
findall(_ID, (pos_index(_ID, next_has_arson(_V0,_V1)),( mypos_1(_V3),true_sown(_V0,_V1,_V3,_V2),true_plowed(_V0,_V4,_V5,_V3)->  true)), S)

The only output variable is S.

In other words, our Python code is:

query_once('findall(_ID, (pos_index(_ID, next_has_arson(_V0,_V1)),( mypos_1(_V3),true_sown(_V0,_V1,_V3,_V2),true_plowed(_V0,_V4,_V5,_V3)->  true)), S)')['S']

Is there a way to reduce this memory consumption, such as by reformulating our queries or manually clearing the cache? For the latter, I looked at the janus module (using listing(janus:_)), but it is tricky to understand. Would I need to retract some of the py_call_cache facts?

Thanks,

Andrew

To answer my own question, this query clears the cache:

query_once('retractall(janus:py_call_cache(_String,_Input,_TV,_M,_Goal,_Dict,_Truth,_OutVars))')
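
For long runs, this can be done periodically; a minimal sketch, assuming the internal py_call_cache/8 keeps the shape above:

from janus_swi import query_once

# Bound the cache growth by clearing janus's compiled-query cache
# every N queries; py_call_cache/8 is an internal, version-dependent
# detail.
CLEAR_EVERY = 10_000

def query_s(q, i):
    if i and i % CLEAR_EVERY == 0:
        query_once("retractall(janus:py_call_cache("
                   "_String,_Input,_TV,_M,_Goal,_Dict,_Truth,_OutVars))")
    return query_once(q)["S"]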

I wonder if the __del__ method is being called implicitly in all cases?

I can provide a patch that adds a context manager, which ought to rule that out. (I haven't submitted the PR yet because I've encountered something strange with exception handling that I want to fix first, and I also did a bit of refactoring; however, if it's important, I could probably make a PR quickly.)

My work-in-progress (with some test cases that show how to use it): WIP context manager for swipy · GitHub
For more on how to use context managers: Problem with Calling Prolog from Python - #5 by peter.ludemann
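
In the meantime, contextlib.closing gives similar explicit cleanup; a minimal sketch that assumes only janus.query() and its close() method:

from contextlib import closing
import janus_swi as janus

# Explicitly close the query iterator instead of relying on __del__;
# the WIP patch would make the query object a context manager itself.
with closing(janus.query("between(1, 3, X)")) as q:
    for d in q:
        print(d["X"])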

The design assumes that strings passed to Janus queries are not dynamically generated. In other words, each of these strings is stated literally in the source, and the dynamic part is passed in as data using the "in" dict. Generating these strings from data is vulnerable to injection attacks, is slow, and indeed saturates the cache. It is not clear to me whether you violate that or not.
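
To make that concrete, here is a minimal sketch of the two styles (between/3 is used purely for illustration):

from janus_swi import query_once

# Anti-pattern: interpolating data into the query string makes every
# call a distinct string, and each distinct string is compiled and
# cached as a new clause on the Prolog side.
def count_bad(low, high):
    return query_once(f"findall(_X, between({low}, {high}, _X), S)")["S"]

# Intended pattern: one literal query string, compiled and cached once;
# the dynamic part is passed through the bindings dict.
def count_good(low, high):
    return query_once("findall(_X, between(Low, High, _X), S)",
                      {"Low": low, "High": high})["S"]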

Hi both,

I am unsure if there is another issue somewhere.

Even if I avoid using dynamic strings, the query_once method uses a lot of memory. Here is a screenshot from profiling with Memray (Memray: the endgame memory profiler):

I am struggling to find more useful information to determine whether this is indeed a Janus issue. Sorry for not providing more detail.

Might there be an issue with memory management?

Thanks,

Andrew

===== EDIT =====

I have created a simple example of the issue, available here:

Here is the script. Its memory usage grows without bound, and it soon crashes my computer.

import os
import time
from janus_swi import query_once, consult

class Tester():
    def __init__(self):

        bk_pl_path = "bk.pl"
        exs_pl_path = "exs.pl"
        test_pl_path = "test.pl"

        for x in [exs_pl_path, bk_pl_path, test_pl_path]:
            if os.name == 'nt': # if on Windows, SWI requires escaped directory separators
                x = x.replace('\\', '\\\\')
            print('consulting', x)
            consult(x)

        query_once('load_examples')

    def test_prog_pos(self, q):
        return query_once(q)['S']

tester = Tester()

for i in range(100000):
    print(i)
    q = "findall(_ID, (pos_index(_ID, stop(_V0)),( not_higherpri(_V0,_V2),rightof(_V1,_V2)->  true)), S)"
    tester.test_prog_pos(q)

I am unsure if I am doing something wrong on the Python side, but I would not expect this program to use more and more memory as it runs.

Just to add, this simpler case also illustrates the issue:

from janus_swi import query_once
for i in range(10000000):
    query_once("findall(_ID, between(1, 10000, _ID), S)")

When i is around 30,000, the Python process uses 10 GB of memory.

Looks like a bug. Thanks for the simple code to reproduce. I’m traveling this week, so you’ll have to wait or search yourself :slight_smile:

Did have some fun using the Asahi Fedora remix for Apple silicon. Much more comfortable for development than macOS :slight_smile: Pushed a fix for this issue. The bottom line was a missing decrement of the reference count of a Python object, which caused Python memory to fill up.


That is brilliant, thanks!