Proper use of temporary module?

I’m using: SWI-Prolog 9.2.9

I want the code to: I want to be able to create a context for temporary use and then get rid of it. I see that one can set a module property of temporary, but I don’t see an explanation of what this means. Is it only for use with in_temporary_module, or can programmers use it directly? If one creates a module and marks it as temporary, how does one signal that one is done with it?


I think this is a great question, I often wish I had this. The point is that while I can create a module dynamically, it’s not clear to me if it can be garbage collected when I don’t need it anymore. In practice, I retract every fact contained in that module, and then pretend it never existed, figuring the overhead is probably minimal. But I wish I knew of a more elegant solution.

In general, I find that using modules as temporary containers for facts instead of data structures (like a list of facts) is a very natural idea, but it doesn’t get enough attention from a usability perspective in Prolog. And I don’t really understand why.

Hope to hear from more experienced users!

Temporary modules are rather fragile. in_temporary_module/3 captures the scenario for which it was intended. If that doesn’t suit you, please explain your scenario.

That is exactly what in_temporary_module/3 does: the Setup argument creates the context in the module and Goal is executed in this context.
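For reference, a minimal sketch of what this looks like; the fact/1 predicate is made up purely for illustration:

```prolog
% Create a scratch module, populate it in Setup, query it in Goal;
% the module is reclaimed when in_temporary_module/3 completes.
% fact/1 is a hypothetical example predicate.
demo(X) :-
    in_temporary_module(
        M,
        assertz(M:fact(42)),     % Setup: build the context in M
        M:fact(X)).              % Goal: runs against that context
```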

I think you have already discussed this question a little elsewhere: this is by way of being a follow-up.

I am using the Popper ILP system, which uses SWI-Prolog to evaluate learned rule sets (invoking SWI through the Janus library). A challenge is that Janus does not (as far as I can tell) permit resetting the Prolog state to work on a new rule set; attempting to do so causes Prolog errors.

More specifically, the test component of Popper queries against the learned ruleset (and some fixed background knowledge) to see if the ruleset correctly labels all of the positive and negative examples. Currently this is done with multiple queries (the Python code has approximately 25 invocations of query_once), so it’s not obvious how to make it use in_temporary_module/3, which handles only a single goal.

The simplest and most robust way is probably to create a single query :slight_smile: The second option is to look at the implementation of in_temporary_module/3 and use the building blocks directly. That is not officially supported, as it uses undocumented predicates whose names start with $. It is not very likely that these will change shortly, though. You have to obey two rules:

  • Make sure no other module imports anything from the temporary module.
  • Make sure no query is running when you destroy the temporary module.

Failure to do so most likely results in a crash.
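To illustrate the first suggestion, the whole test phase could be wrapped in one predicate and called once from Python via a single janus query_once; this is only a sketch, and evaluate_ruleset/4 and test_examples/4 are hypothetical names:

```prolog
% Hypothetical entry point: install the candidate rules in a temporary
% module, run the whole evaluation there as one goal, and let the
% module be destroyed when in_temporary_module/3 completes.
evaluate_ruleset(Rules, Pos, Neg, Result) :-
    in_temporary_module(
        M,
        forall(member(R, Rules), assertz(M:R)),   % Setup: install rules
        test_examples(M, Pos, Neg, Result)).      % Goal: all checks in one call
```

From the Python side this would reduce the ~25 query_once invocations to a single call on evaluate_ruleset/4.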

Thank you very much for the advice! I will look into the implementation directly.

You were kind enough to explain to me how to use a temporary module so that I could garbage-collect a bunch of code in an ILP system (Popper).

I have tested this out, and I find that it works correctly, but the version using a temporary module runs much more slowly, especially for a large problem. I conjecture that this is because the code in the temporary module is not being compiled. Is that correct?

If so, is there any way to abolish all code in the user module so that it can be reloaded?

I believe it’s possible to look at the ILP problem and determine whether it will be expensive to solve, and modify one’s approach to handling it.

That should not be the case. There is nothing special about temporary modules in how code is handled. You should only avoid having any other code rely on the code in temporary modules, as in the end that code is deleted.

I’d first try using the profiler to see whether you can find out what exactly is slower.

Thank you for the quick and helpful reply! I will try the profiler, and will also be examining what specific problem features are correlated with the slow-down.

It’s interesting to see that this use case arises in the context of another ILP system: I have the same need in my own ILP systems (Louise and now Poker).

Every time I try to explain the use case I get the feeling I confuse Jan because of the terminology, but the problem is really simple: I have a dataset of rules and facts that I want to train and test on (appropriately partitioned), and I want to be able to change that dataset “on the fly” without having to restart the Prolog session. The complication is that I want my learning systems to be able to point to every dataset through a common module name, e.g. training_data or test_data, and the individual data in each dataset must be terms with the same functor/arity, say positive_example/2 or negative_example/2. This ends up raising errors.

Suppose I have a dataset of cats and dogs, and another of dolphins and whales. I want to load the first dataset, train on the cats and test on the dogs, then, in the same session, unload the first dataset, load the second one, train on the dolphins and test on the whales.

It seems like associating the cats and dogs with a module named cats_and_dogs and the dolphins and whales with one named dolphins_and_whales, and then loading each into a temporary module called my_dataset when I need it should work, but the way to achieve this is complicated (I do it via load_files/2 with redefine_module(my_dataset)) and the end result is I get errors that say I’m trying to redefine a previously defined predicate.

I could give more detail on the errors and how they happen (I think I have it figured out) but the point is that modules sound like the ideal abstraction for a collection of self-encapsulated worlds of facts and rules with a common interface that can be loaded and unloaded as needed, but it turns out it’s very fiddly and error-prone to use them that way.

I appreciate that this has everything to do with the horrible module standard, though, and is not SWI’s fault alone.


Surely the module system is not designed for this. The Quintus module system is pretty similar in spirit to static and non-static functions in C. This is understandable, as it was designed in the late 80s … But, SWI-Prolog has some extensions that allow for more dynamic usage and, maybe, we can fix that using some creativity :slight_smile: Let me try.

  1. Put each dataset into a module. That is a no-brainer. These things are big (potentially) and you want to swap between them.

Now the (candidate) rule sets. They need to call out to a dataset or a combination/selection of datasets.

  2. We can combine/select datasets by using a new module that makes module-qualified calls into the real datasets.

We are still faced with the ruleset. Ideally, we create a ruleset as a module and then make calls that combine the ruleset with a (possibly combined) dataset. There are roughly three ways to do so (that I can think of right now):

  • Rule sets are (I assume) fairly small compared to the datasets. So, we could create a (possibly temporary) module that imports the dataset and copies the ruleset. That can use copy_predicate_clauses/2.
  • Make all predicates in the ruleset transparent. Module-transparent predicates are a bit of a leftover of the SWI-Prolog module system, but they still partly underpin meta predicates. With this, we can call @(RuleSet:Goal, DataSet), meaning we run Goal using the predicates from RuleSet and make calls to the database using call(AccessData) (or any other meta predicate). Might need some tweaks, but I’m quite sure this can work.
  • Set a global variable (these are thread local) to point at the dataset module. Now define something like call_data(Goal) :- b_getval(dataset, Module), call(Module:Goal).
  • Swap the dataset of a rule module by abolishing all imported dataset predicates and using import/1 to import them from the right dataset module.
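As a sketch of the global-variable option, assuming the variable is set with b_setval/2 at the start of each run (use_dataset/1 is an illustrative name):

```prolog
% Point the thread-local global variable at the current dataset module.
use_dataset(Module) :-
    b_setval(dataset, Module).

% Route all data access through whichever module is currently selected.
call_data(Goal) :-
    b_getval(dataset, Module),
    call(Module:Goal).
```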

Note that if you know the set of predicates provided by the data, you can use goal_expansion/2 when compiling the ruleset to make this all fully transparent (except for the last one). If you have the clauses as a list of terms, first call expand_term/2 on the list before asserting them.
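A hedged sketch of the goal_expansion/2 idea, assuming the dataset interface is known up front (dataset_pred/1 and call_data/1 are illustrative names):

```prolog
% Declare the predicates that form the dataset interface.
dataset_pred(positive_example/2).
dataset_pred(negative_example/2).

% At compile time, rewrite calls to dataset predicates so they are
% routed through call_data/1 (which resolves the module at runtime).
goal_expansion(Goal, call_data(Goal)) :-
    callable(Goal),
    functor(Goal, Name, Arity),
    dataset_pred(Name/Arity).
```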

The first three options allow multiple threads to combine a ruleset with different datasets. The last one does not.

Note that modules are pretty cheap, so don’t be afraid to use a lot of them. Cross-module calls have no overhead. Meta calls that need to be resolved at runtime have some overhead. In many cases this will only be a couple of % of the total execution time.

Does one (or more) of these seem a good solution to you?


Thank you, again Jan. You’ve helped me with this before. I’m sorry that I keep coming back to it.

This is indeed what I’ve done up to now. I create the new module (the one with the database) like this:

%!      load_experiment_file(+Filename) is det.
%
%       Load a new experiment file, unloading the old one.
%
load_experiment_file(F):-
        load_files(F, [module(experiment_file)
                      ,redefine_module(true)
                      ]).

As to the candidate rules (what we call the “hypothesis”, i.e. a logic program that we want the system to learn from examples and background knowledge in the experiment_file module), I generally don’t need to keep them separate, so I just assert them into the experiment_file module where they can easily find the “database” (i.e. the examples and background knowledge), so that’s not a problem.

So what’s the problem? I really need to get you a proper bug report, if this is indeed a bug, but here’s what I understand. Say I have a dataset file (the “database”) called family_relations.pl. To load it, I’d pass its (expanded) path to load_experiment_file/1. No problem with that, I can use the data in family_relations.pl and learn a hypothesis and so on.

Trouble begins when I edit family_relations.pl and then call make/0 to reload everything. There is a directive in my code that calls load_experiment_file/1, so the edited family_relations.pl gets re-loaded. I want that, because I want the changes I made to it to be loaded in memory. But if I do this, at some point down the line I invariably get a permission error (along the lines of "ERROR: import/1: No permission to import … already imported from …"). Sorry I don’t have the exact error; I posted about it here previously (Re-defining a predicate in a new module file, vs. in a traditional file). In fact, I set everything up with the redefine_module options after your kind advice in that thread.

I think what happens at that point is that I have two Prolog modules both exporting the same predicates, which is to say, the predicates defined in family_relations.pl. There’s a copy of those predicates now exported from both the experiment_file and family_relations modules. That happens only if I edit, save, and reload the family_relations.pl module, so that’s what’s causing the trouble.

I’ve been working around this by starting two separate SWI-Prolog sessions, one where I run my system, and one where I edit the experiment files. The one that is running my system is without the IDE so I can restart often without messing up my workflow. So this is not blocking me, but I’m worried that any user that picks up my system will stumble upon the error, swear at me, and never use my system again.

I don’t even know if this is a bug. I think it’s kind of expected to happen that way, I just haven’t worked out a way to avoid it in the code yet.

I have some clue what you are doing, but this all depends on the details :slight_smile: I find it rather dubious to put the hypothesis program into the same module. For one thing, it does not allow you to explore multiple hypotheses concurrently. A better model (I think) is to have a database module that exports the data and hypothesis modules that import it.

Not sure, but possibly the problem comes from your load_experiment_file/1 being called multiple times? That could be a bug: it might cause make/0 to load both the old and the new file. Using unload_file/1 and reloading could be a better alternative.
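Under that suggestion, load_experiment_file/1 could first unload whatever file currently provides the experiment_file module; a sketch, assuming module_property/2 with file(File) reports the file the module was loaded from:

```prolog
% Unload the file that currently defines experiment_file, if any,
% then load the new experiment file under the same module name.
load_experiment_file(F) :-
    (   module_property(experiment_file, file(Old))
    ->  unload_file(Old)
    ;   true
    ),
    load_files(F, [module(experiment_file)]).
```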

Anyway, it would help if we had some small example that reproduces the problem. I think that is possible, no? Then we can rework that example to make it work properly. I’m pretty sure that can be done.

Absolutely, it’s been on my backlog for a while. I’ll see if I can do it in the next few days.

Yes, I thought that was why you suggested a separate module for hypotheses! That’s not useful for my systems though. The main learning algorithm first creates a “master” program (called the “Top Program”) that includes as a subset each hypothesis that is consistent with the positive examples. Then, it throws out all the subset hypotheses that accept the negative examples.

The first step, constructing the Top Program, is done by meta-interpretation (no bottom clauses here; this is not inverse entailment as in Aleph), so nothing needs to be written to the database at that point (I made sure that was the case to avoid the overhead of recompiling the dynamic db, since it’s possible to create many hundreds of thousands of initial hypotheses). The second step, throwing out hypotheses that accept the negative examples, uses the dynamic db: I write each hypothesis to the dynamic db before testing it against the negative examples, but at that point I don’t need to explore different hypotheses, just weed out over-general ones. This is an advantage of the algorithm; it makes everything significantly faster. Like, a lot.

Concurrency, or rather multithreading, would be nice to have in both steps, regardless of use of the dynamic db. I tried my hand at it and made a good old mess that ended up slowing everything down significantly. I don’t think that was because the problems were too small to benefit from multiple threads; it’s just that I bodged it. I might try again in the future. A PhD student I worked with re-implemented my algorithms in Rust with multi-threading and they ran 500 times faster than my meta-interpreter (although there were proooobably some other differences that made this a bit of an over-estimate of the advantages of multithreading).