Yielding prolog engines from within a foreign predicate

Hello friends,

I’m spending my christmas holiday trying to integrate tokio, a rust library for asynchronous code, with SWI-Prolog. Tokio allows one to write rust code where certain operations, like network communication, will suspend a task if the operation cannot be completed immediately, to be woken up later by the tokio runtime when it detects that the task is able to continue. This allows many tasks to share a limited number of threads.

My goal is to allow prolog engines to participate in this. SWI-Prolog allows prolog engines to yield and be resumed later on, so it seems like this would be a very good fit. Specifically, what I’d like to accomplish is for prolog to be able to call a native predicate which may suspend. If it does, I want the prolog engine to suspend too. Tokio should then later on resume the whole thing when it detects that the operation is able to continue. For super bonus points I’d really like this to work even if we call into native from prolog, then back into prolog, then back into native.

There is however one big hurdle I’m running into. Suspending a prolog engine is only possible from within prolog, through '$engines':'$engine_yield'(Term,Code). This unfortunately means that to suspend a prolog engine, I first need to return out of my foreign predicate, which has the unfortunate consequence that any foreign frames will be closed in the process. What I would really like to be able to do is to suspend a prolog engine while leaving all those frames fully intact, and then on resume, to call into a continuation that directly drops me back into native code, in a context that expects all those frames to still be there.

I’m currently working on a workaround where whenever I need to open a frame, allocate some terms, or call a prolog predicate, I return back into prolog and do the required work there, before calling into a continuation. This ensures that the native code always lives at the top of the stack, and all terms it accesses live on the stack below, allowing native code to return to prolog and yield as many times as necessary without losing terms. But this is obviously not ideal.

Am I perhaps missing a feature in SWI-Prolog that’d make it possible to yield from a foreign predicate without closing the foreign frames? Or if not, is there some way to extend SWI-Prolog to have the yielding behavior that I desire?

Thanks,
Matthijs

Interesting. I have little idea what problems this will face though. If you are in Rust from a call to Prolog we probably have to deal with some state represented in the C stack. Is that going to be a bottleneck?

I don’t think the c stack has to be a problem. I can keep all required state around on the heap, as part of a continuation closure, and return from the foreign predicate. That means the c stack would be back into the state it was at the time the foreign predicate was called. The prolog stack however would have any number of extra frames on it, with a bunch of active terms that I would like to not go away, as the continuation closure wants to make use of those on resume.

So I think what would work is if there was some way of returning from a foreign predicate where I get to say ‘please yield the prolog engine with result term X and yield code Y, and when it is resumed please call native function Z (a pointer) with continuation data Q (another pointer).’ Since the prolog stack is separate from the c stack, it should in theory be possible for the frames there to remain open, even though we pop the c stack, right?

Edit: There does need to be a way to clean up such closure data if we never end up continuing, but I figure this could be pretty similar to the existing prune functionality on nondeterministic predicates. I really don’t now enough of the underlying c code to adequately judge this though.

I see. I think that should be technically possible. I’m now doing holidays though :slight_smile: The rough idea is to allow foreign predicates to return some special code and make PL_next_solution() perform a yield. The special return could be a return code. Alternatively we could exploit the exception mechanism, i.e., leave something in the environment and return FALSE. PR welcome :slight_smile:

I would love to do a PR but studying up on the swipl internals is probably going to take longer than my holiday. But I’ll see what I can do :slight_smile: .

I wonder if we can’t expand the way nondeterministic predicates work to also allow yielding, as nondets already have a way to pass in a bit of data between calls. All that would be required is some way to differentiate between being called due to needing a next result, or being called because of continuing from a yield. Exploiting the exception mechanism could maybe be used for returning the actual yield term, and the return code could be used to signify that a yield should happen (maybe everything above 256 can be interpreted as a yield code?)
Also, perhaps predicates that can be expected to yield should be registered somewhat differently, to prevent them from being called from within a context that doesn’t allow yielding.

Let me try to get this straight. Just for myself, I tend to get confused about engines and yielding :slight_smile: We have a Prolog engine calling a predicate defined in Rust. In Rust you want to call Rust’s yield and when the Prolog engines is resumed using engine_next/2, you want the Rust function that called the Rust yield to continue, right?

I guess we need to exchange two pieces of information: the term we normally provide as an argument to engine_yield/1. When resuming we probably have some Rust handle/object/… we need to activate. Still ok?

So, roughly we could implement the yield from Rust as a Prolog predicate that returns two values: the term we need to give back and the above Rust handle. Now we can do the trick in Prolog:

    ...,
    rust_pred(Return, Handle),
    engine_yield(Return),
    rust_resume(Handle),
    ...,

Here, rust_pred is the thing that can yield in Rust. This returns the two arguments, but may have more arguments to do its normal work. Still right?

And the next question is whether we can do this more elegantly?

Non-det predicates can return a small integer or pointer. They have indeed the logic to restart. If we accept that predicates either can be nondet or yield we could define that when we yield we get a pointer to the two pieces of info we need. That might be better that using exceptions as we can define a predicate to be det, but we cannot define one not to be capable of raising exceptions. The advantage of the exception route is that it is not time critical.

All in all, this might get rather tricky and I wonder whether we cannot elegantly pack the above Prolog route and keep this as a possible future extension. I’m not too fond of complications to the VM if they are not needed.

I’m sorry but I’m going to be contradictory and confusing right away. I’m actually not interested in any of the existing predicates around coroutining using prolog engines. Though I guess, in a sense, this is right enough to talk about the problem at hand, let me just make sure that my use case is fully clear here.

Instead of having a bunch of engines around that were started with engine_create/3 and friends, I’ll be starting engines from rust, using PL_create_engine(...). I intend to have a pool of these engines which I’ll use with hyper, a rust async web server library. Any time a request comes in, I’ll pick up an available prolog engine and call a hander predicate for that request (using PL_open_query(...) and PL_next_solution(...)), after which control passes to prolog.
I then want prolog to manipulate this request using foreign predicates that potentially have to wait. For example, I may want to read some data that the user sent, but the data is not available yet. In this case, the foreign predicate should indicate that the prolog engine must yield.
At that point, I want to return back into my calling rust context. PL_next_solution(...) will return with a yield code, and from here I can tell the rust async runtime (tokio) that this task is waiting for some result, and should be suspended until that result is available. Tokio will actually know what I’m waiting for, because through its own mechanisms it will still remember the inner call that caused the suspend, in this case, trying to read a network socket that didn’t have any data ready. When that data is ready for consumption, it’ll wake up my task, at which point I’ll have it do another PL_next_solution(...) and the whole dance starts again. In the mean time, while this task is waiting, tokio may schedule any number of other tasks to run on the same thread. It may also resume the task on a completely different thread when data does become available. In this way, it should be possible to efficiently use a large number of prolog engines to handle requests without ever suspending any OS threads, and, ideally, limiting the amount of thread scheduling the OS has to do, thus raising throughput.

But all that said, yes, essentially I want to have a predicate defined in rust, called by a prolog engine, and in rust I want to be able to yield, and then later on continue from that yield.

I think I would also need to communicate a third item, namely the yield code. Reading the code, engine_yield(Term) is actually just $engine_yield(Term, 256), where ‘256’ is a special code used to indicate that this specifically is an engine coroutining yield, and not any other kind of yield. As of yet, it doesn’t seem like there are other kinds of yield, but as I described above, I intend to build one. So I probably need to use a different number there.

Unless I am significantly misunderstanding the prolog stack, the problem here is that after rust_pred returns, all the foreign frames go invalid. The stack top resets to whatever it was before rust_pred is called. so when calling rust_resume, using any term or frame it might still remember from its time as rust_pred results in undefined behavior. It might work, up to some point, but if rust_resume allocates its own term refs, opens a few frames, does a few queries, etc, all that stack will be overwritten. The only safe terms that can be used by both are those that were already on the stack before rust_pred was called.

What I very much want is to be able to create frames and term refs in rust_pred, which I can then keep using in rust_resume. This is why this bit of prolog on its own is not enough. It is only enough if we accept that the part after the yield is not able to use any of the frames or terms from the part before the yield, except those that were already on the stack before calling rust_pred. This is the problem I’m trying to solve.

This could work, as long as there’s no NULL pointers involved, as that’d just be interpreted as a nondet predicate returning false. Considering we always need to return 2 (or 3) pieces of information anyway though, that’d not be the case. So yeah, that’s probably better.

It is tricky, but for me the question really is, which path is more tricky? Is it more tricky to expand SWI-Prolog in such a way that we can yield from a foreign predicate in a way that keeps the prolog stack intact, or is it more tricky to build an elaborate prolog trampoline that takes care of any term and frame allocations and prolog calls which potentially need to survive over multiple continuation calls.

Currently, if any terms or rewindable frames need to be created from rust in a way that keeps them available for future continuations, there’s no choice but to return to prolog, do the required allocations there, and call into a continuation, ensuring that they remain valid over future continuation calls.
A similar problem occurs with calling prolog from rust. If I can expect a prolog call to yield due to running into a waiting operation, I cannot call this predicate from inside of rust, because I’m unable to propagate the wait outward. The moment I return from the predicate, my inner prolog query is closed too. So when calling into (potentially yielding) prolog, it looks like I’m forced to first return to prolog, do the call there, and then call into a continuation to continue with the native rust bits.

I am actually looking into doing this sort of approach, and it is probably doable, but it looks like it could be quite a hefty prolog trampoline and I’m not sure if the constant context switching is great for performance. I’m also not sure if this is a faster way of building this than figuring out the SWI-Prolog internals and expanding the VM to allow the sort of yield I need.

You want to suspend inside the foreign (Rust) function and resume? Roughly, yield stores all needed information in a foreign frame and the queryFrame struct and then returns from PL_next_solution(). Resuming calls PL_next_solution() with the QID that is a handle to the queryFrame. PL_next_solution() sees it is called to resume, restores the environment and jumps to the next VMI.

That should be fairly close to what we want. I don’t have the Rust picture clear though. In my model, the stack is like this

Rust ← PL_next_solution() ← C foreign function ← Rust ← Something blocking

This thing needs to suspend. How does that happen? Notably, Rust seems to have some way to save/restore the state that is in the C functions in between. I do not yet have a mental model on how that could work?

Rust has the async keyword which is able to turn functions into state machine objects:

async fn demonstration() {
   if do_a_thing().await {
      do_a_second_thing().await;
   }
   else {
      do_something_else().await;
   }
}

When calling this function like this:

let foo = demonstration();

The function won’t actually execute. Instead, a Future object is created and stored in the foo variable. This object has a method poll(...), which when called will execute as much of the state machine it can until running into a wait, and then either returns Poll::Ready(..output..) when it was all done or Poll::NotReady when it encountered a wait. On the next call to poll, the state machine picks up from where it left off.
This object doesn’t have to live on the stack. It may just as well live on the heap, as long as we keep some pointer to it around allowing us to continue its execution later.
This is what the tokio runtime does. It has the concept of a top-level task (an original future that was scheduled on its runtime), which in turn may call other futures inside of it. It then automates the calling of poll on this future by keeping track of why things suspended (such as waiting for data on a socket) and then doing an OS-appropriate thing to be notified of when that result is available (such as epoll on linux), calling poll again on the whole task when necessary.

A call into prolog can be implemented as a special type of Future, one whose poll calls PL_next_solution(...), and which returns ready or not ready depending on whether a yield code comes back. This allows prolog calls to automatically be integrated into the kind of state machine described above.

In the scenario I describe in my earlier post, a ‘task’ from tokio’s perspective would be the handling of an incoming request. Nested inside of that task would be a call into a prolog engine, which in turn may call back into native, which in turn may call back into prolog, etc. But as soon as the innermost part of that call encounters a wait (some future that returns Poll::NotReady, such as the socket that doesn’t have data yet), we can propagate back to the original call of the prolog engine by yielding any prolog query, and returning Poll::NotReady from any rust Future on the way.
Tokio will ensure that poll will be called again when the data is ready, causing a new call of PL_next_solution(...), which should in turn trigger calls to PL_next_solution(...) for any inner queries, the innermost of which will cause poll(...) to be called again on the Future for the foreign predicate that caused the wait in the first place, which, in turn, will be calling poll(...) on the actual thing that caused the wait (the socket whose data should now be there).

It’s all a little bit tricky to describe, but the short of it is that we should have an object, a state machine, that internally keeps track of where it is at in a function, without persistently using the c stack. Whenever it returns, we are safe to pop stack frames from the c stack without trampling on anything. All we need to keep around is a reference to that state machine object.

Sorry, I may have misunderstood.
Is the scenario you describe here one where a c foreign function is being called from inside prolog, and this c foreign function in turn initiates a prolog call?

In this case, I don’t see how we could make that work. If the intermediate foreign bits don’t know about yielding and how to handle that kind of thing, they’re screwed. To my understanding, this is actually already the case. If the inner prolog bit uses engine_yield/1, c will get a return code 256 back from its call to PL_next_solution(...), the code around it will probably not know what to do with that, and probably either crash or do something inappropriate.

swipl-rs, the rust library for building native predicates for SWI-prolog, will currently panic if it ever encounters an unexpected return code from PL_next_solution(...), which, if done from within a foreign predicate, will result in an error being raised in prolog. To me, this seems like the only sensible thing to do when calling foreign predicates that are not expecting to be put to sleep, but which due to an inner call to prolog are attempted to be put to sleep anyway.

So I’m really only thinking of a scenario here where all the parts cooperate to make a yield work.

EDIT: Come to think of it, since you need to open queries with yielding explicitly enabled, the c function would probably not get 256 back from the query, but -1, with an exception available of the form

error(permission_error(execute,vmi,'I_YIELD'), context(...))

So dealing with yield is opt-in. If a native predicate cannot do it, yielding will just result in an error. Which is fine.

You did a pretty good job :slight_smile: So, roughly we have Prolog calling an async Rust function (Future). Meaning after creating the future it calls poll() on it and may get Poll::NotReady as a result. Now it should return from the foreign function using some special return value that saves everything relevant into the current query and make PL_next_solution() return a yield indication. This is probably called from an other Rust Future that will yield in return, etc. If the Tokio engine thinks it is time to restart the whole thing goes into the other direction, starting a chain of poll calls. The future encapsulating Prolog will call PL_next_solution() using the QID we already have and the restarted foreign wrapper will call poll() on the Rust Future, etc.

Do I have it right now? If so, what needs to happen is some way to return from the foreign predicate that causes the VM to do more or less what the VMI I_YIELD does now. I guess somehow we should preserve the identity of the future over the Prolog yield/resume, no? So, we’d need something similar to the foreign non-det success that allows passing a pointer that is handed back to the foreign predicate on a redo.

Let me do a little research on that (and please correct if I still got it wrong)

Ok. It seems this could at least theoretically fit in the API used for nondet foreign code. I’ll assume the foreign function can store the data required to resume (poll) in a pointer that is 4-byte aligned, not being the NULL pointer. I’m still not sure whether or not we also want to/can support yielding together with non-determinism.

I’ll try to find some time to work on this. No promises …

1 Like

Yes, yes and yes :).

Fingers crossed…

Could you draw this out in a diagram

My mind is already spinning :slight_smile:

Ok. pushed some stuff to the git repo. After building, use

?- help('foreign-yield').

for the docs. This is all pretty experimental. The good news is that the impact on the machinery turned out to be really small.

I’m curious whether you manage to get this to work with Rust and how well it works. Comments, on implementation, docs and API are of course welcome.

1 Like

Maybe Matthijs could also try out PlantUML. :slightly_smiling_face:

image

Note: If the text is hard to read, open image in a new tab should do the trick.

Jan pushed out working and high quality code before a diagram could be made; confirming what I’ve said before: Jan is one of the best software developers I have seen :grin:

This is looking very promising! Judging from the documentation, this is pretty much exactly what is needed. I’m currently working on integrating it in the library, and I’ll let you know when I have something to show for it.

One question I have is, is it possible to know that my foreign predicate is being called in a context where yielding is allowed? I may want to write my predicates in such a way that, if yielding is impossible, it just blocks instead. This would ensure that I don’t need two sets of a bunch of predicates that may be called in either context.

Great work, Jan :slight_smile:

1 Like

I wanted to play with PlantUML a little bit, so here is a diagram showing the sample problem that is solved with what Jan just implemented (of course this is already solvable with threads or engines in Prolog, but this allows another option: yielding from C).

3 Likes

Makes sense. Added int PL_can_yield(void). It is a pretty cheap test. I’m not too happy with the name, so better suggestions are welcome.

In fact, I’m not sure we should refer to this stuff as yield anyway. The technology is similar to engine_yield/1, etc. The intend is rather different though. Where the Prolog engine API aims at interacting with another Prolog engine, this merely aims at suspending and resuming engines and is only related to asynchronous programming. For example, Prolog engines can be used to implement e.g., findall/3 without storing the intermediate results (it is not faster as we still have to copy the term between engines).