Cpp2 exceptions

I looked around the code, and saw that PL_raise_exception() calls copy_exception(), presumably to make sure that the exception term is available outside the current frame. (I may have misunderstood this code … there are at least two levels of foreign frames that are created as part of copy_exception() – one in copy_exception() and one in duplicate_term()→copy_term_refs() – and I don’t understand them.) And it’s possible to extract an exception from a foreign predicate by unifying it with an argument.

So, I don’t see why calling PL_raise_exception() in the context of PL_Q_PASS_EXCEPTION (assuming FALSE is returned), or calling PL_clear_exception() and returning TRUE, is a problem. And, by experiment, this code works. My test code, with error checking and intermediate debug output, is in https://github.com/kamahen/packages-cpp/tree/exceptions-2023-01-26 … you can try it with test_ffi:catch(ffi_call_pass_2(unknown_pred, Ex2), Ex, true) (for clearing the exception, but unifying it with the 2nd arg) and test_ffi:catch(ffi_call_pass_3(unknown_pred), Ex, true) (for changing the exception). [The catch/3 wrappers are to prevent the debugger from confusing things, but it does what I expect without the catch/3.]

PL_raise_exception() roughly is responsible for

  • If the raised exception is the current exception, we are done.
  • If there is a current exception, decide which one is more “urgent”. If that is the pending one we are done.
  • Make sure the exception term survives the backtracking that will take place. That is the reason for the duplicate_term(): get a term without trail pointer references. All the foreign frames in there are merely for bookkeeping.
  • Ensure the term_t handle for the exception term is not part of some foreign frame we are going to abandon. That is done by using a pre-allocated term_t that was created at the very start of this Prolog engine/thread (there is a handful of such reserved references to escape from term_t lifetime issues).
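
The steps above can be read as a small decision procedure. The following is a toy C++ model of that logic only, not the actual SWI-Prolog implementation: ToyTerm, ToyEngine, toy_raise() and the integer urgency field are all hypothetical stand-ins, "same exception" is approximated by comparing text, and the real urgency rules are not spelled out in this thread.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for an exception term; `urgency` is an assumed ranking.
struct ToyTerm {
  std::string text;
  int urgency;
};

// Hypothetical engine state: one reserved slot that outlives any foreign frame.
struct ToyEngine {
  std::optional<ToyTerm> pending;
};

// Models the decision steps. Always returns false, mirroring a foreign
// predicate that signals "exception pending" by failing.
inline bool toy_raise(ToyEngine& engine, const ToyTerm& raised) {
  if (engine.pending && engine.pending->text == raised.text)
    return false;                    // already the current exception: done
  if (engine.pending && engine.pending->urgency >= raised.urgency)
    return false;                    // the pending one is more urgent: done
  engine.pending = raised;           // copy into the reserved slot
  return false;
}
```

The copy into the reserved slot plays the role of both duplicate_term() and the pre-allocated term_t: the stored value no longer depends on anything that backtracking or frame disposal would discard.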

And yes, you may use PL_Q_PASS_EXCEPTION, detect failure, close/cut the query and use return PL_raise_exception(PL_exception(0)). This is the same as simply return FALSE because the exception is already pending and as described above, PL_raise_exception() for an already pending exception is the same as return FALSE. If the first is more comfortable in the C++ execution it is fine.

What you may not do is clear the exception or raise another exception. That is only because of the debugger. If the debugger decides the exception is not caught and the Prolog flag debug_on_error is set, it will start the debugger ASAP. That conflicts with clearing the exception (in which case there evidently was no reason to debug) and with passing a different exception that may be caught, in which case, again, we should not have trapped into the debugger.

By experiment (and looking at the code), if I use the flag PL_Q_PASS_EXCEPTION|PL_Q_NODEBUG, the debugger isn’t entered.

Anyway, it seems strange to disallow something that otherwise makes sense only because of the way the debugger works … if the PL_Q_NODEBUG flag has other side effects than just blocking the debugger inside a PL_open_query(), then we could define another flag (e.g., PL_Q_PROCESS_EXCEPTION) that has the semantics I want.

If it’s unacceptable to use PL_Q_PASS_EXCEPTION|PL_Q_NODEBUG, an alternative is to change the PlException class to not inherit from PlTerm, but instead to use PL_record() and friends (I can avoid a memory leak by calling PL_erase() in ~PlException()). I’m not sure how this would interact with the “context” part of the error term.

ADDED:
Since composition is generally a better design paradigm than inheritance, it’s probably best to change the PlException class to not inherit from PlTerm but instead to contain the exception term (whether as term_t or record_t). And PlException should probably be a subclass of std::exception, for better C++ compatibility.
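
A minimal sketch of that shape, assuming a hypothetical class name PlExceptionSketch and a plain integer standing in for term_t so that it compiles without the SWI-Prolog headers. It is not the packages-cpp implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <string>
#include <utility>

using toy_term_t = std::uintptr_t;  // stand-in for term_t (assumption)

class PlExceptionSketch : public std::exception {
public:
  PlExceptionSketch(toy_term_t term, std::string message)
    : term_(term), message_(std::move(message)) {}

  // Contained, not inherited: callers ask for the term explicitly.
  toy_term_t term() const noexcept { return term_; }

  const char* what() const noexcept override { return message_.c_str(); }

private:
  toy_term_t term_;
  std::string message_;
};

// Generic C++ code can catch it without knowing anything about Prolog.
inline std::string describe_failure() {
  try {
    throw PlExceptionSketch(42, "resource_error(memory)");
  } catch (const std::exception& e) {
    return e.what();
  }
}
```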

I’m still wondering what you are after. Using PL_record(), etc to bypass lifetime issues for exceptions seems an unnecessary and possibly harmful way to go. Roughly, I think we (should) have

  • a PlQuery that uses PL_Q_PASS_EXCEPTION. If this is the case then the PlQuery destructor should close the query and throw the pending Prolog exception just as all other API calls. That should probably be the default, in particular when doing a call to Prolog from a foreign predicate. It follows the standard rules: if an API call raises an exception, propagate it to Prolog ASAP.
  • a PlQuery that uses PL_Q_CATCH_EXCEPTION, where PL_next_solution re-throws the exception. The application should handle this inside the lifetime of the query. I guess the query can run the content in a catch block if this flag is used to prevent the exception from propagating.

That is more your expertise :slight_smile:

I’ve made a Google doc that contains my thoughts and hopefully clarifies what I want to do. Anyone with the link can add comments to it (I’ve added a couple of comments for Jan to answer).

I think we’re in agreement about PlException and PlFail. I’ll rework the code with the new definitions and hope that some solution to PlQuery::next_solution() occurs in the process. But it might not be feasible to allow rethrowing the exception – and it might not be very useful anyway, given how the debugger works.

EDIT: might not be feasible to rethrow the exception

Thinking more about turning Prolog exceptions into C++ exceptions … my first attempt was to create a separate subclass of PlException for each of the PL_*_error() functions (e.g., PlResourceError for PL_resource_error(), etc.). However, it turns out that there are a lot more kinds of exceptions in the source code (PL_error() in pl-error.c has dozens of them), which makes this impractical, even without considering user- and package-defined errors (e.g., package/odbc defines errors).

This means that the following code wouldn’t be supported:

try {
  ...
}
catch (PlResourceError& e) {
  cerr << "Resource error suppressed: " << e.as_string() << endl;
}
catch (PlPermissionError& e) {
  cerr << "debug: Permission error: " << e.as_string() << endl;
  throw;
}

but instead the code would be something like this:

try {
  ...
}
catch (PlException& e) {
  if (e.error_type == "resource_error" && e.error_arity == 1) {
    cerr << "Resource error suppressed: " << e.as_string() << endl;
  } else if (e.error_type == "permission_error" && e.error_arity == 1) {
    cerr << "debug: Permission error: " << e.as_string() << endl;
    throw;
  } else {
    throw;
  }
}

This is a bit clunkier and possibly more error-prone (although I could provide helpers for the most common exceptions, such as e.is_resource_error()), but I think it’s a reasonable trade-off between capability and extensibility.
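
A minimal sketch of what such helpers could look like, assuming a hypothetical ToyPlException that only carries the name and arity of the error term. The member names follow the snippet above; everything else is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in exposing the name/arity of the error term.
struct ToyPlException {
  std::string error_type;
  std::size_t error_arity;

  // Generic test, usable for user- and package-defined errors too.
  bool is_error(const std::string& name, std::size_t arity) const {
    return error_type == name && error_arity == arity;
  }

  // Convenience wrapper for one common case.
  bool is_resource_error() const { return is_error("resource_error", 1); }
};

// Catch the single exception class, then dispatch on its content.
inline bool run_and_suppress_resource_errors(const ToyPlException& to_throw) {
  try {
    throw to_throw;
  } catch (const ToyPlException& e) {
    if (e.is_resource_error())
      return true;   // handled here
    throw;           // everything else propagates unchanged
  }
}
```

Keeping the generic is_error() next to the named helpers means new error kinds need no new C++ class, which is the extensibility side of the trade-off.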

The existing convenience subclasses for PlResourceError, PlDomainError, etc. would remain, but they would only be for creating user-defined exceptions and shouldn’t be used in try-catch code – that is, there would be no attempt to transform a system-generated resource_error into a PlResourceError - it would be just a PlException that can be tested for whether it’s a resource_error.

Once I get this sorted out, I’ll get back to fixing the handling of exceptions generated by PlQuery.

Not really. Many map to the same permission/type/… error, but from different inputs.

That said, I agree that having C++ classes for each error is probably overkill. Moreover, in all the C code I have written for SWI-Prolog and for connecting it to so many different systems, you’ll find only a handful of cases where the exception term is inspected. In 99% of the cases you propagate the exception to Prolog or, if there is no Prolog to propagate to, you print the exception and either die or continue.

Back to this code (a slight variation of code from @mgondan1 ), which doesn’t work.

TL;DR: It shouldn’t work, will never work, and there are some simple work-arounds.

PREDICATE(do_query, 1)
{
  PlQuery q(A1.as_string(), PlTermv(0)); // PL_Q_PASS_EXCEPTION is the default
  try {
    PlCheck(q.next_solution()); // throw PlFail on non-exception failure
  } catch(PlException& ex) {
    std::cerr << ex.as_string() << std::endl; // log the error
    throw; // rethrow the exception
  }
  return true;
}

The underlying problem (which I suppose is obvious to @jan but wasn’t to me) is:

  • Prolog frames and C++ frames and object lifetimes have almost no connection with each other.
  • PlQuery is implemented by a PL_open_query() / PL_cut_query() block; the PL_cut_query() is done when the throw is executed by ~PlQuery as part of the C++ stack unwinding.
  • PL_cut_query() (in ~PlQuery) causes any terms created inside the PL_open_query() / PL_cut_query() scope to become invalid and this includes the exception term that’s accessed by PL_exception().

This situation occurs with both foreign frames and queries. Perhaps it would be better to rename PlQuery to PlQueryFrame to make this more obvious.
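
The lifetime problem can be reproduced without Prolog at all. The following toy model is a sketch under stated assumptions: ToyStack, ToyHandle, ToyQueryFrame and ToyException are hypothetical, a vector stands in for the Prolog stacks, and an index stands in for term_t.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyStack { std::vector<std::string> terms; };
struct ToyHandle { std::size_t index; };

inline bool is_valid(const ToyStack& s, ToyHandle h) {
  return h.index < s.terms.size();
}

// RAII frame: everything pushed after construction is discarded on destruction.
class ToyQueryFrame {
public:
  explicit ToyQueryFrame(ToyStack& s) : stack_(s), mark_(s.terms.size()) {}
  ~ToyQueryFrame() { stack_.terms.resize(mark_); }

  ToyHandle raise(const std::string& term) {
    stack_.terms.push_back(term);
    return ToyHandle{stack_.terms.size() - 1};
  }

private:
  ToyStack& stack_;
  std::size_t mark_;
};

struct ToyException { ToyHandle term; };

struct Observation { bool valid_inside; bool valid_outside; };

inline Observation rethrow_across_frame() {
  ToyStack stack;
  Observation seen{false, false};
  try {
    ToyQueryFrame q(stack);
    try {
      throw ToyException{q.raise("some_error_term")};
    } catch (const ToyException& ex) {
      seen.valid_inside = is_valid(stack, ex.term);  // frame still open
      throw;                          // unwinding now runs ~ToyQueryFrame
    }
  } catch (const ToyException& ex) {
    seen.valid_outside = is_valid(stack, ex.term);   // frame already discarded
  }
  return seen;
}
```

The inner handler sees a valid handle and the outer one does not, which is the same shape as the do_query/1 code: the C++ exception object survives the rethrow, but what it refers to does not.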

When PL_raise_exception() is called, the exception term is put into the local stack frame and will stay in scope only if it is unified with a variable from an outer scope. (Alternatively, it can be preserved using PL_record() or otherwise serialized, e.g., using PL_get_chars() or fast_term_serialized/2.)

@jan: foreign.doc (line 206) contains this sentence and footnote:

Term references that are accessed in “write” (-) mode will refer to an invalid term if the term is allocated on the global stack and backtracking takes us back to a point before the term was written

This could have been avoided by trailing term references when data is written to them. This seriously hurts performance in some scenarios though. If this is desired, use PL_put_variable() followed by one of the PL_unify_*() functions.

I interpret this to mean the following … is my interpretation correct?

  • There is no mechanism for forcing a term onto the trail (which would protect it when the frame is unwound)
  • The PL_unify_*() preserves the term only if the unification is with a term that is in an outer scope (e.g., passed as an argument to the foreign predicate).

Anyway, getting back to the do_query/1 code above, which is the kind of thing I’ve done plenty of times when debugging servers written in C++ because some bugs can require searching through millions of lines of server logs, looking for a situation that triggers the bug, then adding more “print” statements to narrow things down … perhaps it’s not needed with Prolog?

  • PL_Q_CATCH_EXCEPTION allows printing the log message and then setting some kind of error indication to the caller (and if do_query takes an additional parameter, the exception can be unified with it and further processed outside of do_query).
  • The default handling of PL_Q_PASS_EXCEPTION will output an error message and – if it’s running as an http_server thread, the thread will die and a new worker thread can be started (at least, I think that’s what happens)

So, for those situations, there’s not much need for the re-throw.

For the situation where we might want to sometimes handle the exception and sometimes re-throw it, the following should work (@jan - please confirm that it’s OK to do PL_clear_exception() in this situation):

PREDICATE(do_query, 1)
{
  PlQuery q(A1.as_string(), PlTermv(0), PL_Q_PASS_EXCEPTION);
  try {
    PlCheck(q.next_solution()); // throw PlFail on non-exception failure
  } catch(PlException& ex) {
    if (handle_exception(ex)) {
      PL_clear_exception();
      return true;
    } else {
      return false; // "re-throw" the exception to Prolog
    }
  }
  return true;
}

This refers to something that is more theory than practice. It says that if you create a term reference, use e.g. PL_put_functor() to put a term in it and then somehow backtrack to before the PL_put_functor() without hitting the normal invalidation of term_t, the term_t becomes invalid. The simplest (but not so realistic) example is

  term_t t = PL_new_term_ref();
  fid_t f = PL_open_foreign_frame();
  PL_put_functor(t, ...);
  PL_discard_foreign_frame(f);

Now t points at an invalid location on the global stack as the created term is gone. It is realistically possible to get into this scenario, but should be uncommon. If you do and get a GC, the system crashes.

I think we should reconsider (and simplify) all this a bit. First, the default PL_Q_PASS_EXCEPTION is a little dubious. There are two main scenarios: in the first, the query is created while we are in a foreign predicate, which means we are called from Prolog; in the second, the control is in C++. Please do not forget that the latter is an important use case and, although I think it is often not the preferred one, probably the most common. When in the first scenario (defining a predicate), callbacks to Prolog are actually fairly rare. If it happens, we are often dealing with a callback (a redefined C++ virtual method) from some C++ library that we want to relay to Prolog. This can be a nasty case.

One case is simple: if we are in a predicate and make a callback to Prolog we have to decide what we want with exceptions, which in 99% of the cases is to pass them on. Use PL_Q_PASS_EXCEPTION, check that a failing PL_next_solution() is due to an exception and return from the predicate (with exception). If you want, you can access the exception as PL_exception(0) after the PL_cut_query(), but you can only inspect it, not change the flow.

We could get the desire to catch the exception. That is typically not wise as it also leads to ignoring abort and timeout. If we do, we must act on it in C++. We could throw some other exception (after PL_cut_query()), but we must consume the original one before the PL_cut_query().

The other is also quite easy: if main control is in C++, PL_Q_PASS_EXCEPTION makes no sense as there is nothing to pass to. Use PL_Q_CATCH_EXCEPTION if you want to do something with it or use nothing and let Prolog handle the exception and return FALSE.

If, from C++, it is desirable to use PL_Q_PASS_EXCEPTION in this scenario, this is fine as the system will nevertheless consider the exception as not caught because there is no environment outside this one that can catch it.

The scenario with something in the middle is complicated. It depends who must handle the exception and whether or not it can be propagated. For example RocksDB has a merge operator that may be called from an arbitrary RocksDB thread that can or can not be associated with a Prolog thread. You basically cannot pass on the Prolog exception.

Maybe PlQuery::next_solution() should return a DoNotUsePlCheckOnThis object that has methods for getting the return code and the exception (so, no implicit transformation of a Prolog exception into a C++ exception) and leave it at that? And try to cover the various issues in the documentation (both in foreign.doc and the C++ API documentation). [The PlQuery constructor and destructor would still transform Prolog exceptions into C++ exceptions.]

Also, because of the different semantics of exception-handling, I’m wondering if PlQuery should be replaced by PlQueryPassException and PlQueryCatchException. As it is, the code for PlQuery::next_solution() is likely to be a bit different, depending on whether it’s “pass” or “catch” exception handling.

In the pass scenario it should probably be ~PlQuery() that creates the exception. I recall there was something fishy with exceptions from destructors, no? Is there something fishy anyway? If we have

  { PlQuery q(....);
    while( q.next_solution() )
       ...
    <do more stuff with Prolog>
  }

We execute the <do more stuff with Prolog> before closing the query. That is rather dangerous :frowning: Might be allowed as long as you do not start any queries and only waste some stack space. If we delay the exception to ~PlQuery() though, we violate the rule to return from the predicate ASAP.

There is also the option with neither, which makes little sense inside a predicate, but often makes perfect sense if the main control is in C++. I’d guess we can deal with the differences by keeping the flags around in the PlQuery object and do runtime checks?

If the stack is being unwound because of another exception, throwing an exception from a destructor will result in program termination (C++ doesn’t chain exceptions nor apply any kind of prioritization to them, so it just crashes in this situation).

So, throwing an exception from ~PlQuery() would mean that the <do more stuff with Prolog> mustn’t throw an exception. I’m proposing things like throw PlFail() being implicit from PlCheck(), so it would seem that we shouldn’t throw an exception in ~PlQuery() or else we could trigger a difficult-to-debug runtime crash.

However, the current ~PlQuery() calls PL_cut_query() or PL_close_query() – and that could theoretically throw an exception, so we already might have a problem. Looking at PL_cut_query() – it appears to return FALSE if there’s a pending exception … is that because any failure from PL_next_solution() should immediately call PL_cut_query() (or PL_close_query()) and return FALSE and the easiest way of doing this is:
if ( ! PL_next_solution(qid) ) return PL_close_query(qid)?

[I think that there’s a work-around for dealing with a pending exception from PlQuery::next_solution(), but it would depend on <do more stuff with Prolog> not calling any PL_*() functions.]

(As to PlQueryPassException() etc. - it’s probably easiest to just keep the flags in the PlQuery object and do runtime checks, even if it ends up being a little bit messier than the “pure object-oriented” style.)

That is probably a good idea, provided you add some state to PlQuery that tells you it is already closed, so you don’t do this twice. PL_close_query() invalidates the query id. Would that actually resolve the problem if you turn this into a C++ exception after the close?

And yes, close/cut can result in a Prolog exception if pruning open choice points using setup_call_cleanup/3 results in an exception. So, it can only result in an exception if PL_next_solution() returned TRUE (and there is still an open choice point).

I already have state in PlQuery because the user can directly call PL_close_query() (using PlQuery::close()) and that shouldn’t be done again (with an invalid qid) in ~PlQuery(). But that doesn’t solve the general problem of PL_close_query() raising a Prolog exception when called from ~PlQuery() in the context of an exception (such as PlFail()) that is unwinding the stack. I will need to think some more about this. There might not be a 100% solution, given what C++ does with exceptions in destructors and the fact that there’s no mechanism AFAIK in C++ for checking whether the destructor is in the context of an exception stack unwinding.
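
On that last point: since C++17 the standard library provides std::uncaught_exceptions(), which returns the number of exceptions currently in flight and can be compared against a count recorded at construction. The following is a minimal sketch, assuming a hypothetical ToyQuery class and standard exceptions as stand-ins for PlFail and for an error raised while closing; whether this is appropriate for ~PlQuery() is a separate design question.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

class ToyQuery {
public:
  explicit ToyQuery(bool* suppressed)
    : suppressed_(suppressed), in_flight_(std::uncaught_exceptions()) {}

  // Destructors are noexcept by default, so throwing must be opted into.
  ~ToyQuery() noexcept(false) {
    if (std::uncaught_exceptions() > in_flight_) {
      *suppressed_ = true;  // unwinding: throwing here would call std::terminate
      return;
    }
    throw std::runtime_error("error while closing query");
  }

private:
  bool* suppressed_;
  int in_flight_;
};

// Normal scope exit: the destructor is free to throw.
inline bool throws_on_normal_exit() {
  bool suppressed = false;
  try {
    ToyQuery q(&suppressed);
  } catch (const std::runtime_error&) {
    return !suppressed;
  }
  return false;
}

// Exit by exception: the destructor notices the unwinding and stays quiet.
inline bool quiet_during_unwinding() {
  bool suppressed = false;
  try {
    ToyQuery q(&suppressed);
    throw std::logic_error("stand-in for PlFail");
  } catch (const std::logic_error&) {
    return suppressed;
  }
  return false;
}
```

In the unwinding case the close-time error is dropped rather than reported, so this only avoids the crash; it does not give a way to deliver both exceptions.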

BTW, when PL_next_solution() returns FALSE, I assume that it doesn’t make sense to call PL_cut_query() but PL_close_query() should always be called (although they both might end up doing essentially the same thing, in the context of a fail?).

I think the solution is not so hard, at least not for the passing situation. Just have PlQuery::next_solution() call

    if ( !PL_next_solution(qid) )
    { close();
      <the check that turns the current Prolog exception
       into a C++ one>
    }

Now ~PlQuery does nothing, so we are safe. Actually we are pretty much safe anyway, as PL_close/cut_query() can only raise an exception after the preceding PL_next_solution() returned TRUE, because the only reason for an exception is a triggered cleanup handler from pruning the choice points.

Note that PL_close_query() is basically PL_cut_query() + backtracking to the start of the query. If an exception is pending, PL_close_query() omits the backtracking to preserve the exception.

If I’ve read the code for PL_close_query() correctly, it can only return FALSE if there is already a pending Prolog exception. Is that correct? - If so, I think I can make an API that’s safe (the “check that turns the current Prolog exception in a C++ one” in PlQuery::next_solution() would do a PL_clear_exception(), and that exception would be reinstated in the try-catch code in PREDICATE).

PL_close_query() returns FALSE if either there is already an exception or discard_query(), which may call cleanup handlers, generated one. The cleanup handlers only run if the last PL_next_solution() returned non-deterministically (with TRUE).

So, I think it is as simple as

  • test that PL_next_solution() returned FALSE
  • close the query
  • if an exception is pending, throw it, just like you do with all the other API functions.
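
The three steps above can be sketched as follows. This is a toy model, not the PlQuery code: ToyBackend stands in for the Prolog engine, ToyPlException for PlException, and all names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical backend standing in for the Prolog engine.
struct ToyBackend {
  int solutions_left = 0;
  bool raise_when_exhausted = false;
  bool exception_pending = false;
  int close_calls = 0;

  bool next() {
    if (solutions_left > 0) { --solutions_left; return true; }
    exception_pending = raise_when_exhausted;
    return false;
  }
  void close() { ++close_calls; }
};

struct ToyPlException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

class ToyQuery {
public:
  explicit ToyQuery(ToyBackend& b) : backend_(b) {}
  ~ToyQuery() { close(); }  // no-op if next_solution() already closed

  void close() {
    if (!closed_) { closed_ = true; backend_.close(); }
  }

  bool next_solution() {
    if (backend_.next()) return true;
    close();                               // step 2: close the query first
    if (backend_.exception_pending)        // step 3: then surface the exception
      throw ToyPlException("pending Prolog exception");
    return false;
  }

private:
  ToyBackend& backend_;
  bool closed_ = false;
};
```

Because the query is already closed when the C++ exception is thrown, the destructor that runs during unwinding has nothing left to do and therefore nothing it could throw.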

Using PL_clear_exception() is almost always wrong. The above should deal perfectly with the scenario where we do a callback to Prolog from a C++ defined predicate. Once we agree on that we must think about the scenario where there is no Prolog context outside the C++ one.

I don’t think it’s quite as simple, because my design has two kinds of C++ exceptions: PlFail() and PlException() and no way to tell whether either one (or even another kind of C++ exception) is unwinding the stack. Perhaps if the C++ API could have a flag in the foreign stack frame, then it could record that it’s doing a throw PlFail() or throw PlException() (but I’m not suggesting this unless there’s no other way of doing things).

What are the circumstances where there could be a cleanup handler? The documentation says to see setup_call_cleanup/3 – is that the only thing that can create a cleanup handler? If so, the cleanup handler would be inside the called goal (to PL_open_query()), so the cleanup handler should have already been run by the time PL_next_solution() returns, and therefore wouldn’t happen inside PL_cut_query() – or did I misunderstand?

Is this for the situation where there is an enclosing PlEngine, and no PREDICATE? @mgondan1’s sample code had a try-catch (similar to what’s generated by the PREDICATE macro), and anyone using PlEngine would need to do something similar. (The try-catch would need to catch both PlFail and PlException). This should be straightforward unless there’s a need to distinguish whether the exception came from PlEngine, PlQuery, PlQuery::next_solution() or the destructors. And we could provide a utility function that creates an engine, calls a query and cleans up (C++ has a “lambda”, so this could be made fairly flexible).
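
A rough sketch of such a utility, assuming hypothetical stand-ins ToyPlFail and ToyPlException and leaving out engine creation and cleanup entirely. It only shows the lambda-plus-try-catch shape, with the body returning true for success and false for failure.

```cpp
#include <cassert>
#include <string>

struct ToyPlFail {};                               // stand-in for PlFail
struct ToyPlException { std::string message; };    // stand-in for PlException

enum class Outcome { Succeeded, Failed, Raised };

// Runs `body` and maps both kinds of C++ exception onto a result code,
// similar in spirit to the try-catch generated by the PREDICATE macro.
template <typename Body>
Outcome run_guarded(Body&& body, std::string* error = nullptr) {
  try {
    return body() ? Outcome::Succeeded : Outcome::Failed;
  } catch (const ToyPlFail&) {
    return Outcome::Failed;
  } catch (const ToyPlException& ex) {
    if (error) *error = ex.message;
    return Outcome::Raised;
  }
}
```

A caller would pass the query logic as a lambda, for example run_guarded([] { return true; }), and inspect the returned Outcome instead of writing the two catch clauses at every call site.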