Merits of Erlang style for real-time systems

Ok, or perhaps it’s about both error handling and (a particular kind of) “garbage collection”, and how these two things interact?

In any case, I agree that ensuring the termination of pengines and other kinds of actors (and don't forget those other kinds!) at the proper point in time is an important topic. Part of the reason it is both important and surprisingly tricky is that the program and the node on which we want to run the program may have different owners. In that case, who is responsible for cleaning up the garbage in the form of stale pengines - the person who wrote and is running the program, or the owner of the node? So far, as I see it, we've touched upon a couple of ideas:

  1. The client (i.e. the person who authored the program or the query) is responsible. As far as I can see, this would only work if the client and the owner of the node are the same person (or organisation), e.g. if you're setting up a cluster of nodes inside a firewall and/or with proper authorisation. This is how distribution in Erlang normally works. However, in the case of Web Prolog, we want a scheme which is more open, in the sense that the Web is open.

  2. We also have the case illustrated by the way SWISH works. As long as the code passes the sandbox check, SWISH allows unauthorised clients to run any query or program on the SWISH server. So far at least, this has worked fine. If a client sends the query ?- repeat, fail to the SWISH backend, it will run, but only for a while: after a minute or so (depending on an owner-controlled setting) a timeout occurs, which kills the pengine running the query. And in a situation like this

    ?-member(X,[a,b]).
    X = a 
    

    if the client waits long enough (perhaps 5 minutes or so, again depending on a setting), then the pengine will be destroyed, and the second solution to the query will no longer be available.

    So, in this case, the client has a lot of freedom and almost full control over the remote pengine, but the final say, when it comes to looking after resources, belongs to the owner of the node.
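To make the resource-reclamation side of this concrete, here is a rough sketch (in Python, purely illustrative - SWISH has its own implementation, and the setting name IDLE_LIMIT is my invention) of how a node might reap pengines that have sat idle past an owner-controlled limit:

```python
import time

IDLE_LIMIT = 300   # seconds; assumed name for the owner-controlled setting

class Pengine:
    """A stand-in for a suspended pengine awaiting its next request."""
    def __init__(self):
        self.last_activity = time.monotonic()
        self.alive = True

    def touch(self):
        # Called whenever the client asks for another solution.
        self.last_activity = time.monotonic()

    def kill(self):
        self.alive = False

def reap_idle(pengines, idle_limit=IDLE_LIMIT):
    """Destroy every pengine that has been idle longer than idle_limit."""
    now = time.monotonic()
    for p in pengines:
        if p.alive and now - p.last_activity > idle_limit:
            p.kill()
```

With a reaper like this running periodically, the second solution to ?- member(X,[a,b]) simply stops being available once the engine is gone, exactly as in the SWISH scenario above.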

  3. In the case of this situation

    ?- rpc('http://ex.org', member(X,[a,b])).
    X = a 
    

    the same scheme as in 2) can be used by the owner of http://ex.org, but there may be another way which is better - the way I had in mind when I wrote this post. Here's what I wrote in the Erlang'19 paper about the idea (which was first proposed by Jan Wielemaker back in 2009 or so):

    "Interestingly, it also turns out that since rpc/2-3 does not produce output or request input, it can be run over HTTP instead of over the WebSocket protocol. In our proof-of-concept implementation this is the default transport.

    To retrieve the first solution to ?-mortal(X) using HTTP, a GET request can be made with the following URI:

    http://remote.org/ask?query=mortal(X)&offset=0 
    

    Here too, responses are returned as Prolog or as Prolog variable bindings encoded as JSON. Such URIs are simple, they are meaningful, they are declarative, they can be bookmarked, and responses are cachable by intermediates.

    To ask for the second and third solution to ?-mortal(X), another GET request can be made with the same query, but setting offset to 1 this time and adding a parameter limit=2. In order to avoid recomputation of previous solutions, the actor manager keeps a pool of active pengines. For example, when the actor manager received the first request it spawned a pengine which found and returned the solution to the client. This pengine - still running - was then stored in the pool where it was indexed on the combination of the query and an integer indicating the number of solutions produced so far (i.e. 1 in this case). When a request for the second and third solution arrived, the actor manager picked a matching pooled pengine, used it to compute the solutions, and returned them to the client. Note that the second request could have come from any client, not necessarily from the one that requested the first solution. This is what makes the HTTP API stateless.

    The maximum size of the pool is determined by the node’s settings. To ensure that the load on the node is kept within limits, the oldest active pengines are terminated and removed from the pool when the maximum size is reached. This may mean that some solutions to some subsequent calls must be disposed of, but this will not hurt the general performance."
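The pooling scheme described in the quoted passage can be sketched in a few lines. This is a hypothetical Python rendering, not the PoC's actual code: still-running pengines are indexed on the pair (query, solutions produced so far), a matching engine is removed from the pool while it computes, and the oldest engine is terminated when the pool is full.

```python
from collections import OrderedDict

class PenginePool:
    """Sketch of the actor manager's pool of still-running pengines."""
    def __init__(self, max_size=4):
        self.max_size = max_size
        # Insertion-ordered: the first entry is always the oldest.
        self.pool = OrderedDict()   # (query, solutions_so_far) -> pengine

    def put(self, query, solutions_so_far, pengine):
        if len(self.pool) >= self.max_size:
            # Terminate and drop the oldest active pengine; any solutions
            # it could still have produced are disposed of.
            _, oldest = self.pool.popitem(last=False)
            oldest.terminate()
        self.pool[(query, solutions_so_far)] = pengine

    def take(self, query, offset):
        # A request with offset=N matches a pengine that has already
        # produced N solutions; remove it from the pool while it works.
        return self.pool.pop((query, offset), None)
```

Because any client whose request matches the key can pick up the pooled engine, the HTTP API stays stateless from the client's point of view.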

    As I wrote before, if you want to know (much!) more, have a look at sections 6.3 - 6.5 in the longer manuscript. The tutorial in the PoC provides a couple of examples.
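For completeness, here is how a client might assemble the stateless /ask URI from the quoted passage. The node URL is the paper's example, and the default limit of 1 is my assumption:

```python
from urllib.parse import urlencode

def ask_uri(node, query, offset=0, limit=1):
    """Build the GET URI for the stateless /ask endpoint."""
    # urlencode percent-encodes characters such as parentheses.
    return node + "/ask?" + urlencode({"query": query,
                                       "offset": offset,
                                       "limit": limit})
```

Requesting the second and third solutions to ?- mortal(X) is then just ask_uri('http://remote.org', 'mortal(X)', offset=1, limit=2), and since such URIs are plain GET requests, intermediates can cache the responses.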

    Using this scheme, the node becomes 100% responsible for the maintenance of resources, which, in many cases, may be a good thing.