Best way to weave list processing predicates?

I’m getting stymied by a task that I’d find simple in other languages, but I just don’t know what the best way is to implement it in Prolog. Specifically, I want to perform several processing steps on a stream of data, each of which may or may not change the cardinality of the dataset, and which may or may not be deterministic, and I’d like them to operate without having to hold the entire stream (or, worse, multiple copies of the stream) in memory. Basically, I want to do the following:

process_stream(List0, Processed) :-
    transformation1(List0, List1), % no cut
    transformation2(List1, List2), % no cut
    transformation3(List2, Processed).

only, without storing the whole list in between. If all the streams had the same cardinality, I could just do:

process_stream([H0|T0], [HP|TP]) :-
    transform1(H0, H1),
    transform2(H1, H2),
    transform3(H2, HP),
    process_stream(T0, TP).
process_stream([], []).

but that only works if the lists remain in lockstep, and each can process a single element at a time, without context.

In, say, Python or C#, I could just use generators/iterators, with each transformer consuming the elements it needs before yielding whatever values it can produce to its output. I’ve glanced at library(lazy_lists), but that really seems geared only towards deterministic processes, and especially those that can opportunistically pull in a whole block of data at once. Basically, for reading from file/network. And honestly, constraint programming in general seems pretty ill-suited to this, because unless you are dealing with a det/semidet source and you use extra-logical storage for stream blocks, it only ever works in post-hoc mode. Basically, if you had this definition for transform2:

transform2(foo, bar).
transform2(apple, banana).
transform2(bear, eel).

and you called it with a delayed/attributed variable in the first argument that would eventually resolve to ‘bear’, Prolog would have no choice but to try unifying it with foo, calling the unification hook, failing to unify foo = bear, undo the H1 = foo unification, try it again with H1 = apple, and so on. Not precisely the most practical or efficient way to go about things.

So, how do I do this in Prolog? And, to extend it a little, how do I do it with DCGs, which are even more limited in their calling convention?

1 Like

Pehaps Coroutining using Prolog engines.

HTH

1 Like

Yeah, I was thinking about that, though I wasn’t sure whether the context-switching would impair efficiency too much. Also, would there be any way to feed input to a DCG from a pengine yield? Also, how would you avoid extra backtracking (if the upstream predicate produces more stream elements and choicepoints than necessary) or overenthusiastic consuming (if the downstream transformers end up instantiating variables that haven’t been populated by upstream yet)?

!! I’ve never seen this use of pushback lists! How does it avoid double-processing the expansions and getting into an infinite recursion?

…wait, that’s not actually a difference list, is it? That’s using the DCG List\Tail syntax but with two completely unrelated lists, isn’t it? Mind blown. :astonished: :exploding_head:

I have not used this ever, so I won’t be giving any other info. :neutral_face:

To learn more read the section named " Implicitly passing states around" in this tutorial

See, that part I understand (and I used that tutorial myself, way back when I was learning DCGs), and I’ve actually used that syntax myself from time to time. The problem is that you can’t do that and do list-processing of any significant complexity (you know, the thing DCGs are ostensibly used for when they’re not being used to emulate procedural languages) in the same DCG unless you’re willing to jump through some pretty significant syntactic hoops and lose some runtime efficiency, because the states get put into the same list as the data.

To illustrate, here’s what happens to that tree example if, instead of counting leaves in a compound term, we wanted to count newline characters in a list of charcodes:

state(S), [S] --> [S].
state(S0, S), [S] --> [S0].

count_newlines(Codes, Newlines) :-
    phrase(nlcount, [0 | Codes], [Newlines]).

nlcount --> checknl, nlcount.
nlcount --> [], state(N), {format('Found ~p newlines~n', [N])}.

checknl --> state(N0, N), `\n`, {N #= N0 + 1}.
checknl --> [NonNewline], {NonNewline \= 0'\n}.

That’s not gonna work nearly as well as you might want it to, because the `\n` isn’t going to match a character code in the Codes list; it’s going to match whatever the number was that was generated by state(N0, N). Basically, if you’re mixing state variables and data in a DCG, then every nonterminal that you use has to understand that there’s gonna be a state that comes first before the data.

In any case, though, this is kinda beside point when it comes to the original question I wanted help with. I’m not trying to maintain a state through a single complex DCG, I’m trying to get multiple list-processing predicates (which may or may not be DCGs) to work well together.

1 Like

I see, you may want to try the EDCG package maintained by @peter.ludemann, it solves exactly this problem as it allows you to have DCGs handling separate lists.

This is something of an unsolved problem for me too. By unsolved I mean that I don’t really know how to do it efficiently, with clean obvious code, entirely from Prolog. But my understanding is that you want something equivalent to Unix pipes, without giving away multiple solutions, is that right? Like,

would be actually (in some unix shell):

< List0 transformation1 \
        | transformation2 \
        | transformation3 \
        > Processed

… but you also want to keep the non-determinism? Meaning that any of the transformations can succeed more than once, with different lists pushed down to the next transformation, and the whole pipeline would succeed more than once with different solutions?

Do you want the transformations to be unaware of each other? In other words, that you can write two parts of the pipeline ignorant of each other, but you can chain them if they happen to produce/consume compatible elements?

EDIT: I strongly suspect that Prolog Engines as suggested by @EricGT above is what will get you the closest to what I described above. Not sure yet if this is what you are after, though. Also haven’t myself tried to see how this combines with non-determinism. I have this suspicion that long lists + non-determinism somewhere along the list requires buffering and can blow up really fast.

That is not really true. Lazy lists (and pure input, where it all started) allow processing external data (typically using a DCG) as it comes in. GC discards all already processed data before the oldest point to where we can still backtrack. If your DCG is fully nondet there is little choice but holding the entire list in memory.

Otherwise it is an interesting problem. Your comparison to Python and C# seems a little unfair to me as it is precisely the required non-determinism that makes this problem hard to deal with. If we do not have non-determinism that needs to be communicated between the layers we have at least one theoretical option: use a thread for each layer, make one layer send the elements to the message queue of the next thread and there use a lazy list to process the input using a DCG. That is pretty much the shell analogy mentioned in this thread.

If you need non-determinism between the steps it gets nasty. The trivial solution is to merge the steps into one big DCG. Often not so elegant :frowning: If there is another solution (I don’t know) I’d suspect it to use delimited continuations or, as mentioned, engines.

This has been my go to solution. I wonder what is not elegant about it. With good enough building blocks, as available in library(dcg/basics) for example, the “one big DCG” is not so big after all.

Lately I have been doubting the idea that everything must be composable using a particular pattern. In my experience, too often this creates too much overhead in terms of design, initially, and then programming the adapter code. It ends up more confining than liberating.

1 Like

This is really unnecessary. There isn’t any apparent unnecessary non-determinism in this library, and if you find it, you should report it or suggest an improvement.

This has been written. There is a chapter about this in “The Craft of Prolog”. Maybe we can kindly ask Richard O’Keefe if we can use his chapter and make a tutorial based on it available on the internet.