I am experimenting with JPL in one of the projects I am working on.
I am looking for a good way to transfer a large list from Java to Prolog. My current approach is the following:
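```java
import java.util.List;
import org.jpl7.Atom;
import org.jpl7.Query;
import org.jpl7.Term;
// One atom per detection, packed into a single Prolog list and passed to detection/1
List<Atom> detection_terms = detections.stream()
        .map(detection -> new Atom(detection.toString())).toList();
new Query("detection", Term.termArrayToList(detection_terms.toArray(new Term[0]))).hasSolution();
```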
There are about 1,400,000 detections, but the final code will handle about 10,000,000 detections. The current code gives me a StackOverflowError because of the recursive function Term.putTerms. I can of course raise the stack limit, but I feel this would only move the problem to a later point in time.
One solution I can think of is passing the detections one by one (or in batches) and then using assertz to save them in Prolog. Then, when I am done passing the list, I can call my main predicate that reassembles the list on the Prolog side. But I was wondering if anyone here has another suggestion?
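A minimal sketch of the Java side of that batching idea, assuming each batch is sent as a single assertz/1 goal for a hypothetical dynamic predicate detection_batch/2 (the predicate name and goal layout are illustrative only). The goal text could then be run with JPL's Query(String) constructor; that call is left out so the sketch runs without JPL.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchGoals {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            result.add(items.subList(from, to));
        }
        return result;
    }

    /**
     * Builds one assertz/1 goal per batch. detection_batch/2 is a hypothetical
     * dynamic predicate; elements are assumed to already be valid Prolog term text.
     */
    static List<String> assertGoals(List<String> termTexts, int batchSize) {
        List<String> goals = new ArrayList<>();
        List<List<String>> parts = batches(termTexts, batchSize);
        for (int i = 0; i < parts.size(); i++) {
            // The batch index lets the Prolog side reassemble the batches in order
            goals.add("assertz(detection_batch(" + i + ", [" + String.join(",", parts.get(i)) + "]))");
        }
        return goals;
    }

    public static void main(String[] args) {
        List<String> detections = List.of("d1", "d2", "d3", "d4", "d5");
        // Each goal would be sent with e.g. new org.jpl7.Query(goal).hasSolution()
        assertGoals(detections, 2).forEach(System.out::println);
    }
}
```

Keeping an explicit batch index means the main predicate can collect the batches in order before reassembling the list, and the batch size bounds the size of each individual term that JPL has to convert.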
Due to technical constraints, I cannot connect to the database containing the detections using ODBC.
Right now I am just passing the detections as strings, but I would like to pass them as compounds later on.
Hmm. Hard. JPL isn’t well suited for that (AFAIK; I only have a rough overview of JPL and have dealt with some really low-level issues, but never used it). When creating a Prolog term, it first creates a Java equivalent and, when starting a query, transfers this to Prolog, packs up the result and translates it into the Java representation. All this is fine for small terms, but for big ones it becomes a real problem. Prolog itself can handle lists with 10M elements fine.
The C(++) interface can handle this fine too, as it builds the Prolog term directly without an intermediate representation. Could that be an option?
Sending in batches, asserting and recollecting will also work, but it still creates a huge overhead, and with terms this big you don’t want that.
Another option might be to turn things around, i.e., add a Java-defined predicate that picks up a chunk of the data. You can then use library(lazy_lists) to define a lazy list based on the callback that transfers a chunk. Depending on the processing you want to do, this may be a nice solution, because you may not need to have the entire list in Prolog memory at all. Compare with library(pure_input).
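A minimal sketch of what the Java side of such a callback could look like, assuming Prolog pulls chunks through something like JPL's jpl_call/4 and feeds them into a lazy list from library(lazy_lists). The class DetectionChunks and its nextChunk method are hypothetical names, and using an empty chunk to signal the end of the data is an assumption of this sketch.

```java
import java.util.List;

public class DetectionChunks {
    private final List<String> detections; // assumed not to be modified while chunks are handed out
    private int position = 0;              // index of the next detection to hand out

    public DetectionChunks(List<String> detections) {
        this.detections = detections;
    }

    /** Returns up to maxSize further detections; an empty array means "no more data". */
    public synchronized String[] nextChunk(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        // Compute the count first so position + maxSize cannot overflow
        int count = Math.min(maxSize, detections.size() - position);
        String[] chunk = detections.subList(position, position + count).toArray(new String[0]);
        position += count;
        return chunk;
    }
}
```

The chunk size trades the per-call overhead of crossing the Java/Prolog boundary against how much data is converted in one go.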
Yet another option might be to create a pipe and have a Java thread writing to it and a Prolog thread reading the data. Not ideal, but probably still faster than the JPL term exchange in chunks.
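A rough sketch of the pipe idea in plain Java, assuming the data is written as Prolog term text, one term per line, each ending in a full stop so that a Prolog thread could read the terms one at a time with read/1. To keep it self-contained, a Java reader stands in for the Prolog thread; how the Prolog side would open the stream (a named pipe or a socket, for instance) is left open here. The class TermPipe is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TermPipe {

    /**
     * Writes each term text from a separate producer thread into a pipe and
     * returns what the consumer side read, one entry per term.
     * Assumes the term texts contain no newlines.
     */
    static List<String> pump(List<String> termTexts) {
        try {
            PipedWriter sink = new PipedWriter();
            PipedReader source = new PipedReader(sink, 1 << 16); // 64K char buffer
            Thread producer = new Thread(() -> {
                // Closing the PrintWriter closes the pipe, which the reader sees as end of input
                try (PrintWriter out = new PrintWriter(new BufferedWriter(sink))) {
                    for (String term : termTexts) {
                        out.print(term);
                        out.print(".\n"); // full stop + newline terminates the term for read/1
                    }
                }
            });
            producer.start();

            // Stand-in for the Prolog reader thread
            List<String> received = new ArrayList<>();
            try (BufferedReader in = new BufferedReader(source)) {
                String line;
                while ((line = in.readLine()) != null) {
                    received.add(line);
                }
            }
            producer.join();
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the producer", e);
        }
    }
}
```

The writer never holds more than the pipe buffer in flight, so neither side needs the whole list in memory at once.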
Ideally, though, you would get the data out of the database directly, bypassing Java, using ODBC or a C/C++ plugin for Prolog.
The solution I went with for now is serializing the whole list, passing it as an atom, and then using atom_to_term/3 to get the list back. This works and is decent speed-wise: the 1.4M detections are passed in about 2 s. I think this will have a similar result to using a pipe.
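A sketch of building that single text efficiently on the Java side, assuming each detection is sent as a quoted atom so that arbitrary toString() output (spaces, capitals, quotes) still parses on the Prolog side. The class name PrologListText is hypothetical; the escapes used inside single quotes (\\, \' and \n) are standard Prolog quoted-atom escapes.

```java
import java.util.List;

public class PrologListText {

    /** Appends s as a single-quoted Prolog atom, escaping backslash, quote and newline. */
    static void appendQuotedAtom(StringBuilder sb, String s) {
        sb.append('\'');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\' -> sb.append("\\\\");
                case '\'' -> sb.append("\\'");
                case '\n' -> sb.append("\\n");
                default -> sb.append(c);
            }
        }
        sb.append('\'');
    }

    /** Serializes all items into one Prolog list text, e.g. ['a','b']. */
    static String toPrologList(List<?> items) {
        StringBuilder sb = new StringBuilder(items.size() * 16 + 2); // rough pre-size guess
        sb.append('[');
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            appendQuotedAtom(sb, String.valueOf(items.get(i)));
        }
        sb.append(']');
        return sb.toString();
    }
}
```

Once the detections become compounds, the element writer would emit text such as detection(Id, X, Y) (a hypothetical layout) instead of a quoted atom; the single-call structure stays the same.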
Good to know the C(++) interface can handle this fine. Sadly I cannot change this part of the program.
I really like the idea of using lazy lists. Will definitely give it a shot to see how it goes.
library(protobufs) might be of use to you. Protobufs were invented at Google for sending large amounts of data between C++, Java and Python; since then, implementations have appeared for JavaScript, Go, and quite a few other languages.
Protobufs give you access to “plain old data” (roughly speaking, the equivalent of C structs with some extensions such as arbitrary size strings or arrays). In Prolog, this is represented as nested dicts and lists.
It might be a bit better to use a string and term_string/2 or term_string/3. That avoids creating a giant atom that needs to be garbage collected and, if you have bad luck, won’t be, as atom GC is conservative.
It’s sad that this works better than the JPL way. I did not design it. The advantage of the design as-is is that you need far less understanding of the lifetime issues that come with raw Prolog term handles, and since the result is a bunch of Java objects, you can use all the OO goodies on them.
This is now fixed. It was an implementation issue: JPL used recursion for term traversal during transput, which is neat and self-evidently correct. What’s not to like? OK, apart from the (JVM) stack use.
The new non-recursive implementations are correct by intention and unit tests, and will be in the next release (didn’t entirely make it into 8.5.14). As they only affect jpl.jar, you could get this from the Windows nightly build…
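To illustrate the difference (this is not JPL's actual code): a minimal sketch using a hypothetical cons-cell record in place of JPL's list compounds. Walking the tail recursively needs one stack frame per element, which is why a recursive traversal such as Term.putTerms overflowed on 1.4M elements; walking it in a loop needs only constant stack depth.

```java
public class ListWalk {

    /** Hypothetical cons cell, like '[|]'(Head, Tail); a null tail stands for []. */
    record Cons(Object head, Cons tail) {}

    /** Builds the list back to front in a loop, so no recursion is needed. */
    static Cons fromArray(Object[] items) {
        Cons list = null;
        for (int i = items.length - 1; i >= 0; i--) {
            list = new Cons(items[i], list);
        }
        return list;
    }

    /** Recursive traversal: one stack frame per element, so long lists can overflow. */
    static int lengthRecursive(Cons list) {
        return list == null ? 0 : 1 + lengthRecursive(list.tail());
    }

    /** Iterative traversal: constant stack depth regardless of list length. */
    static int lengthIterative(Cons list) {
        int n = 0;
        for (Cons c = list; c != null; c = c.tail()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Cons big = fromArray(new Object[1_400_000]);
        System.out.println(lengthIterative(big)); // prints 1400000
        // lengthRecursive(big) would typically throw StackOverflowError with default JVM stack sizes
    }
}
```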