Based on an early prototype, I’ve deployed two Web Prolog ISOBASE profile nodes. This is alpha-quality code. I’m putting this up early because I want to find security issues now, rather than later when there are important things on these machines and I’m relying on them for public testing by the masses.
There’s not much to see from the outside other than the fact that queries work, and distributed queries work, too. All of this has been provided by Torbjörn; I’ve only fixed a few bugs to make it work in a cluster on named hosts instead of only ‘localhost’.
There are two nodes with sample knowledge bases at:
This allows truly stateless distributed Web queries, with the query and index cached on the client side.
We need to know if these nodes can be hacked. They use the built-in sandboxing provided by SWI-Prolog. I can imagine all kinds of ways that sandboxing might be broken, and, like I said, I want to find out now if there’s a security hole.
We are now looking into a good way to do performance and scalability testing, too. Ideas appreciated!
As you can see, unless we are querying an ISOTOPE or ACTOR node, source code injection isn’t supported.
Suppose you made the following query:
A = american_beauty,
B = 1999 <blinking cursor>
Now, if Damon decides to restart his nodes tonight, you would still be able to continue exploring the answers to your query tomorrow, and get the correct second solution, just as if the node had not been restarted:
A = american_beauty,
B = 1999 ;
A = anna,
B = 1987 <blinking cursor>
That’s a true sign of a stateless API. Only the client keeps track of the state of the interaction, and it represents it as a pair consisting of the current query and an integer; that’s all. The node doesn’t need to remember anything from the previous interaction.
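Concretely, the entire client-side state after seeing the first answer need be nothing more than a term like the one below. (I’m assuming here that the query was ?- movie(A, B), as in the later examples; the state/2 functor is just my own illustrative choice.)

```prolog
%  Everything the client must remember between requests:
%  the query itself and the index of the next solution to fetch.
State = state(movie(_A, _B), 1).
```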
The statelessness of the API makes it necessary to handle slow or non-terminating queries in a special way, as we cannot abort a query once it has been submitted. Here’s an example:
?- repeat, fail.
Error: Time limit (1s) exceeded.
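Since the client cannot abort a running query, the node has to enforce the limit on its own side. In SWI-Prolog, the obvious building block is call_with_time_limit/2 from library(time); the following is only a minimal sketch of how a node might use it (ask_with_limit/3 and the error term are my own names, not the demo’s actual code), which maps the timeout exception to an error answer for the client:

```prolog
:- use_module(library(time)).  % call_with_time_limit/2

%% ask_with_limit(?Query, +Seconds, -Answers) is det.
%
%  Collect all solutions to Query under a time limit;
%  on timeout, return an error term instead of answers.
ask_with_limit(Query, Seconds, Answers) :-
    catch(call_with_time_limit(Seconds,
                               findall(Query, Query, Answers)),
          time_limit_exceeded,
          Answers = error(time_limit_exceeded)).
```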
To get an idea of how the stateless HTTP API works, you may want to select the following links. (How the responses are shown depends on your browser.)
The rpc/2-3 predicate is a meta-predicate for making non-deterministic remote procedure calls. It allows a process running in a node A to call and try to solve a query in the Prolog context of another node B, taking advantage of the data and programs being offered by B, just as if they were local to A.
That is why, as Damon showed, the following works from the shell at http://one.prolog.computer:3010:
?- actress(M, scarlett_johansson, _),
D = peter_webber,
M = girl_with_a_pearl_earring ;
D = sofia_coppola,
M = lost_in_translation ;
(It’s a silly example, of course, since director/2 is available at http://one.prolog.computer:3010 as well. We need to come up with better examples here.)
Here’s how rpc/2-3 is implemented on top of the stateless HTTP API:
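The actual implementation didn’t make it into this post, so here is a hedged sketch of the general shape: fetch solutions page by page from the /ask endpoint and backtrack through them locally. The ask/5 helper and the assumption that the node returns its answers as a single Prolog-readable list are mine, not necessarily how the demo nodes respond.

```prolog
:- use_module(library(http/http_open)).
:- use_module(library(uri)).
:- use_module(library(option)).

%% rpc(+URI, ?Query) is nondet.
%% rpc(+URI, ?Query, +Options) is nondet.
%
%  Solve Query against the node at URI over the stateless
%  HTTP API, fetching one page of solutions at a time.
rpc(URI, Query) :-
    rpc(URI, Query, []).

rpc(URI, Query, Options) :-
    option(limit(Limit), Options, 1),
    rpc_page(URI, Query, 0, Limit).

rpc_page(URI, Query, Offset, Limit) :-
    ask(URI, Query, Offset, Limit, Answers),
    (   member(Query, Answers)            % backtrack through the page
    ;   length(Answers, Limit),           % page was full: maybe more
        Offset1 is Offset + Limit,
        rpc_page(URI, Query, Offset1, Limit)
    ).

%  One stateless request.  We assume (hypothetically) that the
%  node serialises its answers as one Prolog-readable list.
ask(URI, Query, Offset, Limit, Answers) :-
    term_to_atom(Query, QAtom),
    uri_encoded(query_value, QAtom, QEnc),
    format(atom(AskURI), '~w/ask?query=~w&offset=~w&limit=~w',
           [URI, QEnc, Offset, Limit]),
    setup_call_cleanup(http_open(AskURI, In, []),
                       read_term(In, Answers, []),
                       close(In)).
```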
The most important thing here is that the HTTP API is stateless. This is what makes it different from the HTTP API that’s available in library(pengines), and the pengine_rpc/2-3 that this library offers.
A naive implementation of the stateless HTTP query API is easy to build. For a request of the form <BaseURI>/ask?query=<Q>&offset=<N>&limit=<M> the node needs to compute the slice of solutions 0..N+M, drop the solutions 0..N and respond with the rest of the slice. Even a CGI script could do this, but this would be slow, not only because CGI by its nature is slow, but also since such a naive approach may involve a lot of recomputation. Computing the first slice (i.e. the one starting at offset 0) is as fast as it can be, but computing the second slice involves the recomputation of the first slice and, more generally, computing the nth slice involves the recomputation of all preceding slices, the results of which are then just thrown away. This, of course, is a waste of resources and puts an unnecessary burden on the node.
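In SWI-Prolog, the naive slice computation described above is essentially a one-liner with library(solution_sequences). Note that offset/2 still enumerates and discards the first Offset solutions on every request, which is exactly the recomputation being criticised:

```prolog
:- use_module(library(solution_sequences)).  % offset/2, limit/2

%% ask_naive(?Query, +Offset, +Limit, -Answers) is det.
%
%  Enumerate solutions from the start, drop the first Offset,
%  keep the next Limit.  Correct, stateless, but wasteful:
%  every preceding slice is recomputed and thrown away.
ask_naive(Query, Offset, Limit, Answers) :-
    findall(Query, limit(Limit, offset(Offset, Query)), Answers).
```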
To implement the HTTP API in a way that makes it both stateless and scalable we use a novel kind of caching scheme that Jan W and I experimented with some years ago, and which has now been refined and reimplemented. It’s somewhat complicated, and I won’t describe it here unless you ask me to, but it seems to work just fine. One way to see that recomputation is avoided is to run the following query at http://one.prolog.computer:3010:
?- (sleep(0.9), X=a ; X=b).
You will then find that because of the call to sleep/1, the answer X=a will take around a second to appear, but when you ask for the second answer, it will appear immediately. This shows that the first disjunct isn’t recomputed.
Yes, thanks for spotting these bugs. We are not going to fix them yet, though, since the terminal is just a quick hack to make it possible to demo Web Prolog online. In the future, I think SWISH should be used as a front-end instead.
I don’t think that HTTP becomes stateful when keep-alive is used, at least not in any interesting sense. WebSocket, however, is a truly stateful protocol. But I’m glad you asked for source code, because here comes the implementation of the API. It is not exactly the implementation used in the demo nodes (as that’s a bit too hacky to show), but it should give you an idea of how the API can be built on top of pengines.
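The code itself didn’t survive into this text, so here is my hedged reconstruction of the central idea, consistent with the cache/3 clauses discussed later in the thread. The helpers spawn_pengine/2, advance/2, next_answers/4 and destroy_pengine/1 are hypothetical stand-ins for calls into library(pengines):

```prolog
:- dynamic cache/3.   % cache(QueryTemplate, NextOffset, Pid)

%  Handle one stateless /ask request.
ask(Query, Offset, Limit, Answers) :-
    (   retract(cache(Query, Offset, Pid))  % atomic: only one thread wins
    ->  true                                % reuse a suspended pengine
    ;   spawn_pengine(Query, Pid),          % no cache hit: start afresh
        advance(Pid, Offset)                % skip the first Offset solutions
    ),
    next_answers(Pid, Limit, Answers, More),
    (   More == true
    ->  Offset1 is Offset + Limit,
        assertz(cache(Query, Offset1, Pid)) % park it for the next client
    ;   destroy_pengine(Pid)                % query ran to completion
    ).
```

The point of the sketch is that a request at offset N finds a pengine already suspended at solution N, so no earlier slice is ever recomputed.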
To make it easier to understand, the above code illustrates the main idea of our caching scheme, but leaves out many of the details that are necessary for a fully working node. For example, it does not prune the cache when it has grown too big (that should be done from the top when a max-size has been reached), it does not implement the mechanism that will terminate a pengine when it has run for too long, and it does not implement the format option. Code for handling all of this would be easy to add, and our demonstrator adds it.
I’ll be happy to explain even further, if you want me to. Also, the following illustration might help:
My long term goal is to create a p2p secure distributed search engine with a conversational agent interface on mobile. The sponsored hardware would be your mobile phone and everyone else’s. My hope, and this is beyond the scope of just Web Prolog, is that our prolog.computer cluster will prove out Web Prolog and seed the network using Web Prolog as the interaction language.
This requires that the other profiles beyond ISOBASE be implemented as well to do things like code injection for peer knowledge bases and distributed Web search queries. Also, I foresee WebAssembly playing a big role.
This is very much a research project. I’m stepping out on a limb here as not all the details are understood, but I’m a big believer in Release Early Release Often. Always interested in hearing everyone’s thoughts.
I’m grateful for Torbjörn’s work here as it fleshes out the security model in many ways and enables the use of Prolog as a central feature and implementation path. I’m interested in algebraic and so-called Explainable AI approaches to indexing and caching derived from the queries and knowledge bases themselves. That is all very far-future so I cannot really answer detailed questions there as I’m still researching.
Please be kind. Don’t shred my ideas just yet. I need help exploring these ideas. Helpful comments and criticism are definitely welcome.
Yes. Edge-AI is exactly what I’m talking about. Cell phones are the primary deployment target. Like that article says, 80% of mobile devices will have AI chips by 2020. A secure p2p search engine which protects your identity and keeps your queries safe from prying eyes. Using distributed queries which hop from host to host means they can be anonymized. This should boost security.
You mean better as in never fails? No, I don’t think so. Computers fail, software fails - that’s life. Or do you mean better as in faster? For queries that both can handle, an SWI-Prolog implementation of an ISOBASE node might be marginally faster, since there are things it doesn’t need to do in order to deliver a response, such as creating a module or registering its identity. On the other hand, it needs to search the cache before it tries to compute a response, so I don’t think the difference is measurable.
What I don’t understand is how the cache should work for actors.
It works for pengines that don’t do I/O or dynamic database modifications. Thus it works for pure Prolog, but also for many Prolog programs that are not pure, as long as they avoid those two things. Note that a pengine is a kind of actor, so it works for some actors, but not for all. It’s easy to see that a strict request-response protocol cannot work for arbitrary actors.
Arbitrary actors, such as pengines that do I/O or dynamic database modifications, should be queried over the WebSocket API. In such cases, an ACTOR node is needed. This will be much faster than SWISH, especially for applications that do a lot of “server push” - a game engine which very frequently needs to communicate the state of the game to a client, for example.
The library behind SWISH (library(pengines)) relies on something akin to long polling. Unfortunately, when it comes to concurrent and distributed programming, library(pengines) promises more than it can deliver. Hopefully, in the future, a Web Prolog node can replace the SWISH backend.
Of course it can - some of them. But try to do the following in SWISH, a straightforward translation of an Erlang program into Web Prolog. Two actors, residing on two different nodes, playing ping-pong:
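The program itself is not shown above, so here is a sketch of the classic Erlang ping-pong example as I understand it would look in the actor syntax from the Web Prolog proposal (spawn/2-3, the ! send operator, receive/1, self/1); treat the exact syntax, and the remote node URL in the comment, as approximate:

```prolog
%  ping(+N, +Pong): send N pings to Pong, await a pong after each.
ping(0, Pong) :- !,
    Pong ! finished.
ping(N, Pong) :-
    self(Self),
    Pong ! ping(Self),
    receive({ pong -> true }),
    N1 is N - 1,
    ping(N1, Pong).

%  pong: answer every ping until told we are finished.
pong :-
    receive({
        finished ->
            true;
        ping(Ping) ->
            Ping ! pong,
            pong
    }).

%  Spawn pong on a remote node (hypothetical URL), ping it locally:
%  ?- spawn(pong, Pong, [node('http://two.example.org:3020')]),
%     spawn(ping(3, Pong), _Ping).
```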
I have already pointed you to a PoC implementation of an ACTOR node. It’s here: https://github.com/Web-Prolog/swi-web-prolog. Unfortunately, it is not yet safe to open up to the world, so you have to download it, install it locally and run it there. If you do, you’ll find there is a tutorial too, which will let you try examples such as the above ping-pong program, as well as many other examples.
It’s a good example of something that cannot be done in SWISH, not even when library(pengines) is imported.
At this point in time, no. Later, yes, when we have figured out how to use library(sandbox) in this case.
Time is measured from the moment a pengine starts to compute a solution to the query until it has found one. There is a setting that allows the owner of a node to change the time limit.
Telescript seems to be a framework for mobile agents. Web Prolog does not aim for mobility, except in the sense that a running actor can decide to create a copy of itself on a remote node. However, note that if the original actor then decides to terminate, the copy running remotely will (by default) be forced to terminate too. This is an effect of the linking mechanism that Web Prolog inherits from Erlang. When running on the Web it is important that the default for the link option is true, and that only an authorised user (e.g. the owner of the node) can set it to false.
An obvious example of the use of linking is that when a programmer closes/leaves/reloads the web-based IDE, the top-level pengine which served the shell as well as actors that may have been spawned by this pengine are forced to terminate. (This is what happens in SWISH too.)
Note that the discussion above is only relevant when we are programming against the ACTOR profile. Let’s return to the properties of the ISOBASE profile. When making a stateless HTTP request to a node, notions such as linking aren’t really relevant. For example, if you submit a query ?-movie(A,B) in the shell at http://one.prolog.computer:3010 and then, after having seen the first answer, close or reload the window/tab, the remote pengine will still be kept in the cache in a suspended state. Later, it may become useful to another client who is requesting the second answer to ?-movie(A,B), and so on. For the benefit of any client that can make use of it, the pengine will be around until the query has either been run to completion, or the max size of the cache is exceeded so that some pooled pengines - which may include “your” pengine - must be terminated and removed from the cache.
And suppose two identical requests of the following form are made at the same time:
When they try to retract the first cache/3 clause, only one of them will succeed. (At least that’s how I understand the documentation on retract/1 at https://www.swi-prolog.org/pldoc/doc_for?object=retract/1) It will thus grab the pengine with the pid 55321100. The other request will grab the pengine with the pid 66783421.
For a short while, there will only be one clause in the cache, namely
cache(p(_), 1, 84926378).
but when both requests have returned their responses, the cache has been updated and might look as follows:
So far, since retract/1 is an atomic operation, nothing bad has happened. And AFAICS, in this scenario, nothing bad can happen. So far, I see no need for a mutex.
However, the above implementation is not complete. We also need a mechanism that will prune the cache when its max size is exceeded. Suppose the max size is 3. (Of course, normally it would be 1,000, or 10,000, or something like that.)
Now, suppose the following request is made:
The cache will be of no help here, so a new pengine will be spawned which will do the job. Since there are more solutions, a clause will be added to the cache:
Now, the cache has exceeded its max size and must be pruned. This is done from the top since that’s where we have the oldest entry. The clause cache(p(_),3,84926378) must be removed and the pengine with the pid 84926378 must be forced to terminate.
This is where we have a potential race condition. Because when the pruning mechanism tries to retract cache(p(_),3,84926378), there may also be an incoming request of the form:
We have two cases:
The incoming request retracts the clause cache(p(_),3,84926378) before the pruning mechanism has a chance to do so. The pruning mechanism will then instead retract cache(mortal(_),2,66783421). This, AFAICS, won’t lead to any problems.
The pruning mechanism retracts the clause cache(p(_),3,84926378) before the incoming request has a chance to do so. This only means that the incoming request needs to spawn a new pengine. No problem here either.
There is no need for a mutex in the context of the pruning mechanism either. The “entry point” into the cache is always through the atomic retract/1. Once a thread has succeeded in retracting a cache/3 clause, no other thread can get at it, and thus no other thread can get hold of the corresponding pengine. (Note that for each pengine in the system, there can exist at most one cache/3 clause.)
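The pruning mechanism I have in mind can be sketched as follows. Clause order gives us age for free, since assertz/1 appends and retract/1 removes the first matching clause, i.e. the oldest entry; destroy_pengine/1 is again a hypothetical helper:

```prolog
%% prune_cache(+MaxSize) is det.
%
%  Retract the oldest cache entries, killing their pengines,
%  until the cache is back at or under MaxSize.
prune_cache(MaxSize) :-
    aggregate_all(count, cache(_, _, _), Size),
    (   Size > MaxSize,
        retract(cache(_, _, Pid))   % the same atomic entry point as above
    ->  destroy_pengine(Pid),       % hypothetical: terminate the pengine
        prune_cache(MaxSize)
    ;   true
    ).
```

Because the only way to get hold of a pengine is through the same atomic retract/1, an incoming request and the pruner can never both claim the same entry.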
I know only too well how difficult programming with concurrency becomes in the presence of shared state, so I’m perhaps a bit too optimistic when I boldly claim that 1) the proposed scheme will work, and 2) that there is no need for mutexes. (I thought there was, but I now believe there isn’t.)
Do you, or anyone else, see any holes in my line of reasoning?
My tabling idea doesn’t apply to your current prototype, since in your prototype you cache processes(*) and not results, whereas tabling caches results. Nevertheless, one could also try an actor prototype that is closer to tabling.
I guess that in shared tabling the result caching is extended to cache results plus some process(*) info, because of deadlock management. So this type of tabling is a little closer to your prototype, but it is still not the same.
(*) Erlang terminology, but it is in fact typically a thread inside an OS process.