Merits of Erlang style for real-time systems

Fair enough, this time it may well have been me who failed to adhere to the stick-to-the-topic discipline. On the other hand, error handling is often hopelessly intertwined with normal operation, and I guess it’s sometimes difficult to determine what should be thought of as an error. Leaving a process alive (but idle) on a remote node when there is no longer any use for it, because, as in your example, a cut on the client side made it clear that no more backtracking over nodes will take place - is that an error? It’s a programmer’s error in the sense that the programmer could have avoided it (e.g. by wrapping the query in once/1), and it certainly leads to a problem, especially when the node on which the “stale” actor resides is owned by someone else. SWISH and library(pengines) deal with this problem by imposing a time limit on how long a stale pengine is allowed to hang around. When this limit is exceeded, the pengine is terminated.

Erlang is often described as a concurrent and distributed language with a functional core. When it comes to error handling, Erlangers seem to argue (in my mind correctly) that there’s a need for two fundamentally different ways of handling errors. In sequential Erlang code, catch and throw can and should be used, but for concurrent and distributed programming, tools such as error messages, monitors and links are more important. (In addition, Erlangers often seem to argue that too much “defensive programming” using catch and throw is a mistake, and that “let it crash” (and then restart) is often a better error recovery model.)
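To make the “let it crash” idea a bit more concrete, here is a minimal sketch in Python (not Erlang, and not how OTP supervisors are implemented). The names supervise, task and max_restarts are made up for the illustration. The point is only that the worker contains no defensive code at all, and recovery lives in a separate supervising loop that restarts it.

```python
def supervise(task, max_restarts=3):
    """Run task(attempt); if it crashes, restart it, up to max_restarts times.

    Hypothetical helper for illustration: returns the task's result together
    with the list of crashes observed before it finally succeeded.
    """
    crashes = []
    for attempt in range(max_restarts + 1):
        try:
            return task(attempt), crashes
        except Exception as exc:
            # The worker did not try to recover; we just note the crash
            # and start it again from a clean state.
            crashes.append(exc)
    raise RuntimeError(f"gave up after {len(crashes)} crashes") from crashes[-1]


def flaky(attempt):
    # No defensive programming here: the first two attempts simply crash.
    if attempt < 2:
        raise ValueError(f"crash on attempt {attempt}")
    return "done"
```

Calling supervise(flaky) returns the result of the third attempt along with the two recorded crashes. A real Erlang supervisor works across processes and has restart strategies and intensity limits, none of which are modelled here.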

So how about error detection and error recovery in Web Prolog - thinking about it as a concurrent and distributed language with a relational/logic programming core? I believe we should allow ourselves to be inspired by Erlang here too. Now, when I think about setup_call_cleanup/3, I find it easy to think of it as a construct, alongside catch and throw, for error detection and recovery in sequential Web Prolog, but not as a very useful construct for dealing with errors and recovery from errors outside sequential code. For this, error messages, monitors and links seem to be more useful.

I think you need to do a lot less of that in Erlang (or in Web Prolog) - most of that is supposed to happen automatically.

In the context of the definition of rpc/2-3 it serves to “convert” an error message into an exception, which is, as far as I can see, the only sane way to deal with it in this context. In other, more “asynchronous” contexts, e.g. when pengine_spawn/1-2 and friends are used as is, other ways to deal with this message are probably called for, and they are left for the programmer to decide.
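The “convert an error message into an exception” step can be sketched as follows. This is a Python model under stated assumptions, not the actual definition of rpc/2-3: the names worker, rpc and RemoteError are hypothetical, a thread stands in for the remote pengine, and a queue stands in for the caller’s mailbox.

```python
import queue
import threading


class RemoteError(Exception):
    """Raised in the caller when the worker reports an error message."""


def worker(goal, mailbox):
    # Hypothetical stand-in for a remote pengine: it runs the goal and
    # reports the outcome as a message; nothing is raised in the caller.
    try:
        mailbox.put(("success", goal()))
    except Exception as exc:
        mailbox.put(("error", exc))


def rpc(goal, timeout=5.0):
    # Synchronous wrapper: spawn the worker, wait for a single message,
    # and turn an error message into an exception in the calling thread.
    mailbox = queue.Queue()
    threading.Thread(target=worker, args=(goal, mailbox), daemon=True).start()
    tag, payload = mailbox.get(timeout=timeout)
    if tag == "error":
        raise RemoteError(repr(payload)) from payload
    return payload
```

A caller that uses rpc never sees the messages, only a return value or an exception. A caller that talks to the worker directly gets the ("error", ...) message in its mailbox and is free to decide what to do with it, which corresponds to the “asynchronous” case above.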

Yes, that’s what it means.

Let’s look at the following statechart instead, which is the one to keep in mind when discussing Web Prolog.

[Figure: PLAP_statechart]

It’s important to understand that the statechart captures only one (but important!) aspect of what’s going on, not everything. One thing that it doesn’t capture is the effect of the exit option we discussed before. If exit(true) is passed when pengine_spawn/1-2 is called, the pengine will always be automatically destroyed upon reception of the messages success(false), stop, failure, or error. That is, the pengine is set up to run just one query to completion, which may, of course, involve the reception of several next messages and the sending of several answer messages.
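The effect of the exit option can be modelled with a few lines of Python. This is a toy model only, with made-up names (PengineModel, exit_option, emit), and it assumes that success(false) means a success message saying that no more solutions exist. It tracks nothing except whether the pengine is still around after each message.

```python
class PengineModel:
    """Toy model of a pengine's lifetime under the exit option."""

    def __init__(self, exit_option=False):
        self.exit_option = exit_option
        self.destroyed = False
        self.log = []  # message kinds seen so far

    def emit(self, kind, more=None):
        if self.destroyed:
            raise RuntimeError("pengine already destroyed")
        self.log.append(kind)
        # Messages that end the current query: success with no more
        # solutions, stop, failure, or error.
        ends_query = kind in {"stop", "failure", "error"} or (
            kind == "success" and more is False
        )
        if self.exit_option and ends_query:
            self.destroyed = True


p = PengineModel(exit_option=True)
p.emit("success", more=True)   # more solutions may follow, pengine stays
p.emit("success", more=False)  # last solution, pengine is destroyed

q = PengineModel(exit_option=False)
q.emit("failure")              # query is over, but the pengine lives on
```

With exit_option=False the pengine survives the end of a query and can be asked again, which is exactly the situation where a stale pengine can be left behind.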