Exceptions in foreign code - more questions

Hello friends,

I’m back with more questions on exception handling in foreign code. Now that I know how resource exceptions work in the case of unification and friends, I have some questions about how ordinary exceptions work, whether thrown by a Prolog query or by foreign code. I just want to make sure all my assumptions are correct.

I’ve been reading some of the underlying code, and I think what is going on is the following. Please correct me if I’m wrong on anything.
When a query that was opened in foreign code throws an exception, it stores the exception term reference in the query frame. When PL_next_solution(qid) returns, this term reference is, strictly speaking, no longer valid, as it belongs to an inner frame that we have already returned from. That’s why PL_exception(qid) is needed to copy it into a fresh term ref.

Foreign code can throw exceptions with PL_raise_exception(term). This works much like resource exceptions in unification: it duplicates the exception term and stores the duplicate in the thread exception term, which stays valid as long as the thread (actually the Prolog engine) lives, and won’t change when foreign frames are rewound or discarded.

Since an exception raised by foreign code behaves much like the resource exceptions in unification, it can be retrieved with PL_exception(0) and cleared with PL_clear_exception().
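To make the lifetime argument above concrete, here is a toy model in C. It is not SWI-Prolog's implementation and uses no real FLI calls: terms are strings on a small stack, a frame is just a saved stack top, and toy_raise() copies the term into an engine-wide slot, the way the thread exception term is described here. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: a term stack that is reset when a frame is discarded,
   plus an engine-wide exception slot that lives outside all frames. */
#define STACK_SIZE 64

typedef struct {
  char   cells[STACK_SIZE][32]; /* "terms" are plain strings here */
  size_t top;                   /* next free cell */
  char   exception[32];         /* engine-wide copy of the pending exception */
  bool   has_exception;
} ToyEngine;

typedef size_t ToyFrame;        /* a frame is just a saved stack top */

static ToyFrame toy_open_frame(ToyEngine *e) { return e->top; }
static void toy_discard_frame(ToyEngine *e, ToyFrame f) { e->top = f; }

static size_t toy_new_term(ToyEngine *e, const char *text) {
  assert(e->top < STACK_SIZE);
  strncpy(e->cells[e->top], text, sizeof e->cells[0] - 1);
  e->cells[e->top][sizeof e->cells[0] - 1] = '\0';
  return e->top++;
}

/* Analogue of PL_raise_exception(): copy the term out of the frame. */
static bool toy_raise(ToyEngine *e, size_t term) {
  strncpy(e->exception, e->cells[term], sizeof e->exception - 1);
  e->exception[sizeof e->exception - 1] = '\0';
  e->has_exception = true;
  return false;  /* the foreign predicate would now return FALSE */
}

/* Analogues of PL_exception(0) and PL_clear_exception(). */
static const char *toy_exception(const ToyEngine *e) {
  return e->has_exception ? e->exception : NULL;
}
static void toy_clear_exception(ToyEngine *e) { e->has_exception = false; }
```

Discarding the frame and reusing its cells does not touch the copy, which is the property that lets PL_exception(0) keep working after frames have been rewound.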

Now for my questions…

  • Is the above description accurate?
  • Is there no chance that the PL_new_term_ref() inside of PL_exception(qid) will overwrite the query exception term on the stack? And if I run a whole bunch of PL_new_term_ref() after PL_next_solution(qid) returns with an exception, will I eventually overwrite the exception term?
  • What exactly does PL_Q_CATCH_EXCEPTION do? In my own tests, it seems that it only suppresses an ERROR from being written to the console, but PL_next_solution(qid) still returns the proper status code, and PL_exception(qid) still retrieves the exception. Is there more to it?
  • What exactly is PL_Q_PASS_EXCEPTION for? Documentation on it is a bit scarce, saying that the term is not invalidated on closing the query, but not what this is good for. Does this also apply to discarding frames? What is this option for? And I assume that it’s still not exactly safe to keep using the term ref, as it may eventually get overwritten. Is that true?

Thanks,
Matthijs

Yes. Applause!

I’m afraid that can happen. SWI-Prolog keeps a bit of spare stack space that it normally does not use (i.e., it generates a resource exception instead of using the spare stack space). As long as an exception is pending, the system allows using this spare stack space. That should normally prevent handling the exception from overwriting the exception itself. As there is no fixed upper limit on how many resources are needed to handle an exception, your initial exception can still be replaced by a resource exception. That can get into a loop if handling the resource exception triggers a new resource exception. The system tries to be smart: should this happen, it discards the exception context, and if even that fails, it raises the abort exception (‘$aborted’). SWISH in particular has helped a lot in producing test cases, so we are now quite sure we don’t crash out completely.
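A toy sketch of that policy, assuming a single stack with a fixed spare area at the top. The sizes, names and the two-step escalation are illustrative assumptions; the real system also has the intermediate step of discarding the exception context, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the spare-stack policy; not SWI-Prolog code. */
typedef enum { EX_NONE, EX_USER, EX_RESOURCE, EX_ABORTED } ExKind;

typedef struct {
  size_t used;
  size_t limit;    /* total stack size */
  size_t spare;    /* reserved part at the top of the stack */
  ExKind pending;  /* currently pending exception, if any */
} ToyStack;

/* Try to allocate n cells. Normally the spare area is off limits; it
   becomes usable only while an exception is pending. */
static bool toy_alloc(ToyStack *s, size_t n) {
  size_t usable = s->pending == EX_NONE ? s->limit - s->spare : s->limit;
  if (s->used + n <= usable) {
    s->used += n;
    return true;
  }
  /* Out of space: a fresh resource error replaces the pending exception;
     running out again while handling that one escalates to an abort. */
  s->pending = (s->pending == EX_RESOURCE || s->pending == EX_ABORTED)
                   ? EX_ABORTED : EX_RESOURCE;
  return false;
}
```

The point of the model is the order of events: the original exception survives as long as the spare area suffices, and only gets replaced once handling it needs more than that.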

It is a compatibility thing. In the old days SWI-Prolog (and most Prolog systems) did not have exceptions, so the caller of PL_next_solution() would not handle exceptions and PL_next_solution() itself had to do the job. New code should always use this flag (or PL_Q_PASS_EXCEPTION) and deal with exceptions.

It is intended for foreign predicates that make a callback on Prolog. When using PL_Q_PASS_EXCEPTION they can (and should) simply return FALSE from the foreign predicate and the system will propagate the exception.


I’m afraid the exception handling API is a bit of a mess. First there were no exceptions, next foreign code handled them using the C setjmp()/longjmp() interface, and finally they were handled as they should have been from the beginning: using return codes. Unfortunately, these should have followed the POSIX convention: return 0 on success, some code for logical failure and one or more other codes for the exception (type).
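For comparison, a hypothetical tri-state status in the spirit of that POSIX-style convention, and how it can be recovered from the boolean convention, where FALSE together with a pending exception means "exception". The enum and function names are made up for illustration and are not part of the FLI.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes: 0 for success, distinct non-zero codes
   for logical failure and for a pending exception. */
typedef enum {
  CALL_OK        = 0,
  CALL_FAILED    = 1,   /* logical failure */
  CALL_EXCEPTION = 2    /* an exception is pending */
} CallStatus;

/* Under the boolean convention FALSE means either failure or an
   exception, so the caller has to check for a pending exception to
   tell the two apart. rc is assumed to be TRUE (1) or FALSE (0). */
static CallStatus classify(int rc, bool exception_pending) {
  assert(rc == 0 || rc == 1);
  if (rc)
    return CALL_OK;
  return exception_pending ? CALL_EXCEPTION : CALL_FAILED;
}
```

A binding in a language with result types could map these three cases onto success, a failure value and an error value.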

The C++ wrapper translates these forward and backward to C++ exceptions. This may provide some inspiration as the problem is most likely similar, no?

I’m not sure I follow. It is my understanding that when you do your first PL_next_solution(qid), a foreign frame is created, which will be closed or discarded on cut or close. Are you saying that in case of an exception, this foreign frame is not actually used, and we instead end up in a pre-reserved exception-handling part of the stack? Where on the stack is this? And what will actually happen when I close the query? Do we then jump back? Or does the foreign frame created in PL_next_solution() have some exception-handling space reserved on it before we go into Prolog?

Incidentally, while on the subject of the PL_next_solution(qid) context, the documentation is pretty explicit about not being allowed to have more than one query open at a time. However, I discovered that after PL_next_solution(qid) returns successfully, I can just open a foreign frame and call something else with no apparent issues, as long as I’ve closed that frame before calling PL_next_solution(qid) on the original query again. Is this actually true, or am I breaking things without realizing it?

Rust does not actually have exceptions. While there is a stack unwinding mechanism, panic, it’s really only intended for when we basically give up on handling an error. There is no expectation that user code will recover from a panic. The normal way of handling errors in Rust is through result types, which contain either a value or an error. So I’ll have to do this slightly differently.

More important though is that Rust places heavy emphasis on safety. Rust libraries that interface with the scary unsafe non-rust world should be written in such a way that users of such a library are not exposed at all to these horrors. So these Rust bindings cannot just be a type wrapper around the C foreign language interface. Special care must be taken to ensure that nothing the user does can result in undefined behavior. So it’s not enough to just wrap a term_t in a class and hope for the best. Care must be taken to ensure that such a term object does not survive the context in which it is allowed to be active, that it is not used on a different engine than where it was created, etc. This is why I have these questions. It is important to know what is actually happening here, so I can judge if some operation is actually safe, or if I need to take some extra precautions to make it so.
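One way to get the checks described above is to tag each handle with the engine it belongs to and the identity of the frame it was created in, and to validate both before every use. The following is a hypothetical C sketch of that bookkeeping; it makes no real FLI calls, and the names and the epoch scheme are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DEPTH 16

/* Per-engine bookkeeping: which frame is currently open at each depth. */
typedef struct {
  int      engine_id;
  int      depth;                    /* number of open frames */
  uint64_t next_epoch;
  uint64_t epoch_at[MAX_DEPTH + 1];  /* identity of the frame at each depth */
} ToyContext;

/* What a wrapper would store next to the raw term_t. */
typedef struct {
  int      engine_id;
  int      depth;   /* depth of the frame the term was created in */
  uint64_t epoch;   /* identity of that frame */
} CheckedTerm;

static void ctx_open_frame(ToyContext *c) {
  assert(c->depth < MAX_DEPTH);
  c->depth++;
  c->epoch_at[c->depth] = ++c->next_epoch;
}

static void ctx_close_frame(ToyContext *c) {
  assert(c->depth > 0);
  c->depth--;
}

static CheckedTerm ctx_new_term(const ToyContext *c) {
  CheckedTerm t = { c->engine_id, c->depth, c->epoch_at[c->depth] };
  return t;
}

/* Usable only on its own engine, while the frame it was created in is
   still the one open at that depth. */
static bool term_usable(const ToyContext *c, CheckedTerm t) {
  return t.engine_id == c->engine_id &&
         t.depth <= c->depth &&
         c->epoch_at[t.depth] == t.epoch;
}
```

Opening a new frame at the same depth gets a fresh epoch, so a stale handle from an earlier frame is rejected even though the depth matches.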

Final question (for now): if handling an exception in the normal way, right after PL_next_solution(qid) has returned with an exception, can result in undefined behavior, isn’t it a better strategy to immediately do ex = PL_exception(qid); PL_raise_exception(ex); and from then on work with the term returned by PL_exception(0)?

This is slowly turning into a “tutorial reconstruction of the SWI-Prolog VM” :slight_smile: I guess that is useful to have. Some of the complications in the exception API are related to the fact that it used to work differently than it does now. Let me clarify. At this moment, PL_raise_exception() uses duplicate_term/2 and global stack freezing to protect the exception term. The term reference that holds the exception is one of the first term references created when the Prolog stacks are set up, so it remains valid on backtracking and stack unwinding.

The stack freezing mechanism was another lesson from Bart Demoen, to deal with the non-backtrackable global data on the stacks that is required by, e.g., CHR. Normally a data mark holds a mark on the top of the global and trail stacks. Undoing rewinds all trailed variables and resets the top of the global stack. It is easy enough to protect an assignment from backtracking by not trailing it, but if the term lives above the data mark for the global stack this doesn’t help. So, stack freezing maintains a pointer just above the latest frozen term (the frozen bar). Now the global stack is reset to the max of the frozen bar and the mark.
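The reset rule at the end fits in a few lines. This is a toy model of the global stack only (trailing is left out), with hypothetical names; it shows why a duplicated exception term survives undoing to an older data mark once the stack has been frozen above it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the data mark / frozen bar interplay; not SWI-Prolog code. */
typedef struct {
  size_t top;      /* top of the global stack */
  size_t frozen;   /* frozen bar: just above the latest frozen term */
} GlobalStack;

typedef struct { size_t global_top; } DataMark;

static DataMark mark_data(const GlobalStack *g) {
  DataMark m = { g->top };
  return m;
}

/* Push a term of the given size; returns its offset. */
static size_t push_term(GlobalStack *g, size_t cells) {
  size_t at = g->top;
  g->top += cells;
  return at;
}

/* Protect everything below the current top, e.g. a duplicated exception. */
static void freeze(GlobalStack *g) { g->frozen = g->top; }

/* Undo to a mark: reset to the max of the frozen bar and the mark. */
static void undo_to(GlobalStack *g, DataMark m) {
  g->top = m.global_top > g->frozen ? m.global_top : g->frozen;
}
```

In this model everything below the frozen bar survives, including ordinary data that happened to be pushed between the mark and the frozen term; without a freeze, undoing simply restores the mark.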

In the old days we didn’t have all this. The exception handling code did some dirty tricks using a temporary copy of the exception term to be able to backtrack the data while preserving the exception. This implied we needed to be really careful with the exception between the time it was raised and when it could be handled. Notably, getting it from one PL_next_solution() to its parent was tricky. Possibly all this can be simplified a lot now. Maybe we only need PL_exception(0) these days. We still need PL_Q_CATCH_EXCEPTION or PL_Q_PASS_EXCEPTION. The latter is used by the engine to figure out whether an exception is uncaught by checking the enclosing query for a matching catch/3 call.

Hope this more or less explains exceptions …

This should be described better. What is not allowed is to enumerate over two queries at the same time. At any point in the computation you can open a query, play with it and close it; just be sure to close it before continuing. Thus, this is not allowed:

   q1 = PL_open_query(...);
   q2 = PL_open_query(...);   /* q1 is still open */
   PL_next_solution(q1);      /* not allowed: enumerating q1 while q2 is open */
   PL_next_solution(q2);
   /* ... */
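The rule can be modeled as a stack of open queries where only the innermost one may be advanced. This is a toy C sketch with hypothetical names, not how SWI-Prolog tracks queries internally; rejecting an out-of-order close is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QUERIES 8

/* Toy model: open queries form a stack, innermost last. */
typedef struct {
  int open[MAX_QUERIES];
  int count;
  int next_id;
} QueryStack;

static int q_open(QueryStack *s) {
  assert(s->count < MAX_QUERIES);
  s->open[s->count++] = ++s->next_id;
  return s->next_id;
}

/* Advancing is legal only for the innermost open query. */
static bool q_may_advance(const QueryStack *s, int qid) {
  return s->count > 0 && s->open[s->count - 1] == qid;
}

/* In this model, closing must also happen innermost first. */
static bool q_close(QueryStack *s, int qid) {
  if (!q_may_advance(s, qid))
    return false;
  s->count--;
  return true;
}
```

This also matches the earlier observation that opening and closing something in between two PL_next_solution() calls on the same query is fine: once the inner work is closed, the original query is innermost again.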

That is going to be quite a challenge. As the lifetime of Prolog objects follows rather unconventional rules, this is really nasty. If you need to encapsulate each term_t in a separately allocated object with additional information about where it belongs and additional bookkeeping to know when it is still valid, I have my doubts whether a low-level interface like this is still viable. You could also opt for the JPL route. This represents a Prolog term as a Java object and uses the low-level stuff to hand it to the Prolog engine when needed. A returned Prolog term goes the other way and is immediately materialized as a Java object. Now the entire lifetime issue is gone. The price is a lot of duplication :frowning:

:slight_smile: From the start of this reply, I think the conclusion is that this is no longer even required and you can just handle the exception as PL_exception(0). Just try. If it works, I think it is safe.

Once again, thanks a lot for the detailed answer!

After some testing, it looks like when I use PL_Q_CATCH_EXCEPTION, the exception is not available from PL_exception(0), but when I use PL_Q_PASS_EXCEPTION, it is. Neat!

Are you saying that PL_Q_PASS_EXCEPTION behaves differently in the context of a foreign predicate whose call is wrapped by a catch/3 in Prolog? Can I expect PL_exception(0) and PL_clear_exception() to work regardless of what the caller of my foreign predicate has set up?

I’m actually quite a way along already. Hopefully I can post a proper announcement with example programs soon. It’s probably impossible to support everything the SWI-Prolog FLI makes possible, but it looks very possible to build a restricted model on top of it where guarantees can be given. I’ll keep you posted :slightly_smiling_face:.

ISO defines exception handling as “unwind the deepest call until this is a catch/3 call for which the exception unifies with the 2nd argument”. SWI-Prolog is a little different:

  • It walks up the stack to find such a catch/3 goal without unwinding.
  • If it finds one in the current PL_next_solution() environment, it unwinds to that frame, unifies the exception with the ball and calls the recovery goal.
  • If not, the behaviour depends on the PL_Q_CATCH_EXCEPTION/PL_Q_PASS_EXCEPTION flags of the current PL_next_solution(). If neither is given, the command line debugger is started. If PL_Q_CATCH_EXCEPTION is given, it considers the exception unhandled and saves it as discussed before returning from PL_next_solution(). If PL_Q_PASS_EXCEPTION is given, it checks the enclosing PL_next_solution() to see whether or not the exception is caught there. If so, it behaves as discussed.
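The decision procedure in the list above can be sketched as a walk from the innermost frame outwards that stops at the first matching catch/3 or at a query boundary whose flag ends the search. This toy C model uses string comparison instead of unification, uses hypothetical names, and leaves out the call to prolog_exception_hook/4 that happens before unwinding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { Q_NO_FLAG, Q_CATCH_EXCEPTION, Q_PASS_EXCEPTION } QueryFlag;
typedef enum { FRAME_GOAL, FRAME_CATCH, FRAME_QUERY } FrameKind;

typedef struct {
  FrameKind   kind;
  const char *catcher;   /* FRAME_CATCH: 2nd argument, "_" matches anything */
  QueryFlag   flag;      /* FRAME_QUERY: flag given to this query */
} Frame;

typedef enum {
  CAUGHT,                /* unwind to the catch/3 frame, run recovery */
  START_DEBUGGER,        /* no flag given: uncaught */
  RETURN_TO_CALLER       /* PL_Q_CATCH_EXCEPTION: saved, query returns */
} Outcome;

/* frames[0] is the outermost frame, frames[n-1] the innermost one.
   Nothing is unwound here; on CAUGHT, *at receives the catch/3 index. */
static Outcome find_handler(const Frame *frames, size_t n,
                            const char *ball, size_t *at) {
  for (size_t i = n; i-- > 0;) {
    const Frame *f = &frames[i];
    if (f->kind == FRAME_CATCH &&
        (strcmp(f->catcher, "_") == 0 || strcmp(f->catcher, ball) == 0)) {
      *at = i;
      return CAUGHT;
    }
    if (f->kind == FRAME_QUERY) {
      if (f->flag == Q_NO_FLAG)
        return START_DEBUGGER;
      if (f->flag == Q_CATCH_EXCEPTION)
        return RETURN_TO_CALLER;
      /* Q_PASS_EXCEPTION: keep looking in the enclosing environment. */
    }
  }
  return START_DEBUGGER;  /* model assumption: running out of frames = uncaught */
}
```

With a PL_Q_PASS_EXCEPTION callback nested inside a catch/3, the outer catch/3 is found; switching the inner query to PL_Q_CATCH_EXCEPTION stops the search at the callback instead.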

Before doing the unwinding, though, it calls prolog_exception_hook/4, passing it who caught the exception or the fact that the exception is uncaught. This hook is used by library(prolog_stack) to print a backtrace on uncaught exceptions and by the debugger to start tracing on specified exceptions. By default, the debugger is activated if an exception is uncaught. After the debugger starts, ISO-style unwinding is done one frame at a time, stopping after each frame. This allows the user to retry the goal.

Thanks. A good Rust interface is surely valuable and from your feedback I see you are really into creating such a beast.

After some more testing, I’d like to write a short wrap-up of my findings.

It looks like PL_Q_PASS_EXCEPTION does exactly what I need. It stores my exception in a way that is retrievable by PL_exception(0) and that survives frame discards. I can choose to handle it and then clear it with PL_clear_exception(), or leave it in place, as long as my foreign predicate returns -1 to signify that an error was thrown.

If I don’t return -1, though, swipl notices my error, prints a warning and then clears the exception anyway. The same thing happens if I call any predicate while there’s still an uncleared exception.

If I return -1, even though there’s no exception thrown, swipl will catch that too, and throw a domain error for me, guaranteeing that there’s always going to be an exception term waiting for me.

This behavior remains the same regardless of whether there’s a catch/3 or catch_with_backtrace/3 in the calling context. Your earlier comments had me worried that it might not report an error on PL_exception(0) at all if it’s caught later on, but it’s definitely there regardless, and I get a chance to clear it if I choose to handle it. In the case of catch_with_backtrace/3, the Prolog exception hook does its job (provided the prolog_stack library is loaded) and fills in the stack trace, but I still get to clear the exception, and the catch won’t run its handler.

I’ll be using this as the basis for exception handling in the rust bindings.

The only glitch in this that I can see is that if you clear the exception, you fool the logic that detects whether or not the exception is caught (in some outer PL_next_solution() call). That doesn’t really change normal operation. It merely changes the debugging experience, which tries to find uncaught exceptions and act on them by starting the debugger as soon as it can rather than just reporting that there was an uncaught exception.

Given the current status it might make sense to modify PL_Q_CATCH_EXCEPTION to be much closer to PL_Q_PASS_EXCEPTION. I’d have to look at the impact that may have.

I see. I had not tested that. Admittedly all my tests so far have been from rust, without a proper prolog toplevel. I should actually build a proper module and test from there too.

Having PL_Q_CATCH_EXCEPTION be much closer to PL_Q_PASS_EXCEPTION would help me, but I think I could fake it with my earlier proposal - use PL_Q_CATCH_EXCEPTION, but immediately rethrow the returned error. This should give the behavior of PL_Q_PASS_EXCEPTION, minus the bit where the debugger will step in on seemingly uncaught exceptions. Unless I am missing something.

It would be nice if a foreign predicate calling back into Prolog could somehow specify what it intends to catch. That way the debugger’s behavior could be made correct in combination with PL_Q_PASS_EXCEPTION. I don’t know enough about the internals to tell if that’s a viable approach, but presumably it’s just a bit of data on the stack, and that data could somehow be written from the FLI, no?
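A hypothetical design sketch of that idea: each frame, whether a catch/3 goal or a foreign callback, declares what it intends to handle, and the "is this uncaught?" check consults all of them before deciding to start the debugger. Nothing like this exists in the FLI as far as this thread goes; the names and the string-based matching are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Either a catch/3 goal or a foreign callback with a declared catcher. */
typedef struct {
  const char *const *catches;  /* exceptions this frame will handle */
  size_t n;
} HandlerFrame;

static bool frame_handles(const HandlerFrame *f, const char *ball) {
  for (size_t i = 0; i < f->n; i++)
    if (strcmp(f->catches[i], ball) == 0)
      return true;
  return false;
}

/* The debugger would only step in if no enclosing frame, Prolog or
   foreign, has declared that it handles the ball. */
static bool is_uncaught(const HandlerFrame *frames, size_t n, const char *ball) {
  for (size_t i = 0; i < n; i++)
    if (frame_handles(&frames[i], ball))
      return false;
  return true;
}
```

With such declarations, clearing an exception in foreign code would no longer fool the uncaught-exception detection, since the foreign frame announced up front that it would handle that exception.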