Note that these are dependencies that a particular implementation of Web Prolog has (and BTW, it no longer uses with_mutex/2). Neither flag/3 nor with_mutex/2 are likely to be built-ins in a standard for Web Prolog.
Right, that should amount to just a couple of days work.
Why did you leave out Jekejeke Prolog?
Web Prolog is not a vendor, but a language. And if I can bring Web Prolog as far as Paulo has brought the Logtalk language (a solid implementation, real users, etc.), I will retire feeling really good about myself. It is, as I’m sure you know, incredibly difficult to succeed in this arena.
D = peter_webber,
M = girl_with_a_pearl_earring,
Y = 2003 ;
D = sofia_coppola,
M = lost_in_translation,
Y = 2003 ;
D = ethan_coen,
M = the_man_who_wasn_t_there,
Y = 2001 ;
Thanks, that’s an interesting example of what you should not expect to work.
Yes, maybe it can be fixed, I’m not sure, but even if it can’t, it would just add to the things that can go wrong in Prolog when you treat variables as just another datatype.
It works if we instantiate the variables:
?- X = 1, Y = 2, X @< Y, rpc('http://two.prolog.computer:3010', Y @> X).
X = 1,
Y = 2.
?-
(BTW, the transport and pid options don’t work in our PoC. They only make sense in the ACTOR profile.)
As for semantics, as long as we deal with pure Prolog (which your example certainly doesn’t), the traditional approach to semantics for logic programs will probably work in the distributed case as well. Recall the quote by Fitting. I don’t think he had examples like yours in mind.
Thanks for suggesting benchmarking. I’m not going to present a “real” benchmark here, but instead run a number of queries and reason about their performance. This might be a good step to take before designing a real benchmark. I won’t use the data you suggest, but instead use something from a Prolog file with Wordnet data (available here):
I only use the predicate s/6 from the Wordnet file wn_s.pl and define two predicates, word/2 and type/2, like so:
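Assuming the documented argument order of s/6 in wn_s.pl, i.e. s(SynsetID, WordNum, Word, SynsetType, Sense, TagCount), the two predicates would look something like this:

```prolog
% word(S, W): W is one of the words in synset S
word(S, W) :-
    s(S, _, W, _, _, _).

% type(S, T): T is the type (n, v, a, s or r) of synset S
type(S, T) :-
    s(S, _, _, T, _, _).
```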
Here are timings for some queries, which also show how many solutions there are:
% How many words are there in Wordnet?
?- time(findall(.,word(S,_), L)), length(L, N).
% 212,567 inferences, 0.033 CPU in 0.037 seconds (88% CPU, 6431485 Lips)
N = 212556.
% How many verbs are there in Wordnet?
?- time(findall(.,type(S,v), L)), length(L, N).
% 25,058 inferences, 0.004 CPU in 0.004 seconds (91% CPU, 6962490 Lips)
N = 25047.
?- time(findall(.,(type(S,v), word(S,_)),L)), length(L, N).
% 124,306 inferences, 0.018 CPU in 0.019 seconds (95% CPU, 6751358 Lips)
N = 74201.
Note the fantastic performance here – thanks, Master Wielemaker, for your great work!
Let’s move on to your ideas for a benchmark of rpc/2-3. Here’s the timing for solving the query ?-type(S,v),word(S,_) wrapped in rpc/2:
?- time(findall(.,rpc('http://localhost:3010',(type(S,v),word(S,_))),L)), length(L, N).
% 150,283 inferences, 0.071 CPU in 0.148 seconds (48% CPU, 2110924 Lips)
N = 74201.
So that’s efficient enough. But what you really wanted was a query that does searches on two different nodes. You wanted them on two different computers, but I will use only one, on which two nodes are running, http://localhost:3010 and http://localhost:3011.
As we can see below, performance is absolutely no problem if we want to look at the solutions one by one in the shell:
?- time((rpc('http://localhost:3010', type(S,v)),
rpc('http://localhost:3011', word(S,W)))).
% 5,060 inferences, 0.010 CPU in 0.029 seconds (33% CPU, 527083 Lips)
S = 200001740,
W = breathe ;
% 4 inferences, 0.000 CPU in 0.000 seconds (57% CPU, 500000 Lips)
S = 200001740,
W = 'take a breath' ;
% 2 inferences, 0.000 CPU in 0.000 seconds (50% CPU, 285714 Lips)
S = 200001740,
W = respire ;
% 2 inferences, 0.000 CPU in 0.000 seconds (71% CPU, 133333 Lips)
S = 200001740,
W = suspire ;
...
That’s not much of a benchmark, though. Let’s instead measure how long it takes to compute all 74201 solutions:
?- time(forall((rpc('http://localhost:3010', type(S,v)),
rpc('http://localhost:3011', word(S,W))),true)).
% 45,131,335 inferences, 7.305 CPU in 46.703 seconds (16% CPU, 6177767 Lips)
true.
That’s pretty slow. The main reason is that since the first occurrence of the call to rpc/2 has 25047 solutions, the second occurrence of rpc/2 is called 25047 times!
The following query confirms that making 25047 network roundtrips to localhost takes around 45 seconds, so that is by far where most of the time is spent:
?- time(forall((between(1,25047,_), rpc('http://localhost:3010', true)),true)).
% 43,080,839 inferences, 6.803 CPU in 45.369 seconds (15% CPU, 6332984 Lips)
true.
Note that rpc/2 was used here, which means that the default limit=infinite applied. As we see below, and as expected, setting limit to a smaller value such as 100 is even slower:
?- time(forall((rpc('http://localhost:3010', type(S,v), [limit(100)]),
rpc('http://localhost:3011', word(S,W), [limit(100)])),true)).
% 44,963,746 inferences, 7.533 CPU in 62.859 seconds (12% CPU, 5968555 Lips)
true.
By using the once option we lose completeness, but we can save a lot of time:
?- time(findall(.,(rpc('http://localhost:3010', type(S,v), [limit(100),once(true)]),
rpc('http://localhost:3011', word(S,W), [limit(100),once(true)])),L)),
length(L,N).
% 202,258 inferences, 0.028 CPU in 0.139 seconds (20% CPU, 7139358 Lips)
N = 256.
In addition to options such as limit and once, rpc/2-3 should very likely support at least some of the options that apply to http_open/3. They are here. I’ve only tried one of them, namely the connection option. As we see below, it turns out that passing connection('Keep-alive') makes a significant difference to performance:
?- time(forall((between(1,25047,_), rpc('http://localhost:3010', true, [connection('Keep-alive')])),true)).
% 48,466,242 inferences, 6.155 CPU in 15.133 seconds (41% CPU, 7874288 Lips)
true.
That’s only a third of the time this took when the option wasn’t passed, so it should cut most of the timings above to roughly a third.
The options (or at least most of them) that can be passed to rpc/3 can be seen as pragmas, i.e. as language constructs that specify how the conversation between the client and the node should be conducted. Passing such pragmas has no effect on the meaning of a query, but can have a significant effect on performance when running the query over a cluster of nodes.
By means of the src_uris option in combination with the special-purpose URI localnode, we can imagine doing something like this:
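A hypothetical sketch of such a call (using the src_uris option and the special-purpose URI localnode as introduced above):

```prolog
?- rpc(localnode, (type(S,v), word(S,W)),
       [ src_uris([ 'http://localhost:3010',
                    'http://localhost:3011'
                  ])
       ]).
```

The idea being that the client first downloads the source from the listed URIs and then runs the conjunction locally, avoiding network roundtrips during backtracking.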
This will only work under certain circumstances, namely when the node-resident source code for http://localhost:3010 and http://localhost:3011 is self-contained. When this is true, this configuration is likely to be the fastest. I haven’t implemented this yet, but it should mean that the task is solved in just a few seconds again. This time, the time it takes to download the resources will dominate.
Yes, that might be yet another way to improve upon the performance of a query running over a cluster of nodes.
But again, leveraging caching performed by already existing intermediaries such as proxies and reverse proxies would be another (and, I believe, more important) key to good performance and scalability of the Prolog Web, as they can cache web content at a number of different locations along the path between a client and an origin node. A proxy deployed close to clients can eliminate the need to contact the origin node. A reverse proxy located in front of a set of nodes can cache popular responses and, at the same time, act as a load balancer.
As far as I understand, such proxies don’t need to know anything about Prolog. What they do need in order to make a difference are truly stateless HTTP requests, which don’t rely on cookies or other forms of session identifiers. This is why a request such as
GET http://n1.org/ask?query=p(X)&offset=10
is more useful for caching than a request such as
GET http://n1.org/ask?id=0f453c62-03b4-11ea-813d-17802f9bfeab&query=p(X)&offset=10
That’s why I think the statelessness of a node’s HTTP API is very important.
I must admit that although I appreciate the importance of caching on the Web, I don’t know a whole lot about it. Hopefully, if a W3C Community Group pushing for Web Prolog can be formed, it will attract members well versed in such things.
First, sorry everybody for the long and excessively detailed posts. Writing them helps me get things straight, so for me it’s worth it, even if no one were to read them. And thank you, Jan B, for actually reading, for your comments and suggestions, and for keeping the conversation going.
I’m not sure I follow all of what you say above, and as for your question
I’m afraid the answer is that I have no such ideas.
But since you mention HTTP POST, I should perhaps say something about the ISOTOPE profiles, which is the next level up in the hierarchy of profiles, and where POSTing has a (small) role to play. As can be seen in this diagram,
the ISOTOPE profile doesn’t seem to bring much in addition to what’s already available in the ISOBASE profile. It adds the options/parameters src_text, src_list, src_predicates and src_uris to those that ISOBASE already supports. (As we saw in my previous post, the src_uris option can be supported already by the ISOBASE profile, but only in combination with the localnode special-purpose URI.)
We don’t have a secure ISOTOPE node up and running yet, but once we have that, it will allow us to add an editor to the GUI which appears alongside the shell when we direct our browsers at http://one.prolog.computer. In other words, the ISOTOPE profile can, in contrast to the ISOBASE profile, support a simple IDE, like this one:
(The request doesn’t change the state of the node, so for this reason it’s still a task for a GET request, but some browsers restrict the length of query strings, so we should support POST too.)
This is still a stateless API, so it has the same nice properties as the ISOBASE node in that we can ask for the first solution
?- mortal(Who).
Who = socrates <blinking cursor>
wait for an arbitrary long time (and even restart the node) and then ask for the next solution
?- mortal(Who).
Who = socrates ;
Who = plato <blinking cursor>
Only the client keeps track of the state of the interaction, this time represented with a triple consisting of the query, the source text, and an integer. Again, the node doesn’t need to remember anything from the previous interaction.
When talking to an ISOTOPE node, the src_text option can be used when calling rpc/3 too:
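For example (a sketch only; the program text is of course hypothetical):

```prolog
?- rpc('http://one.prolog.computer:3010', mortal(X),
       [ src_text("mortal(X) :- man(X).
                   man(socrates).
                   man(plato).")
       ]).
```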
Compared to the listing here, some things have changed: The default for the limit option is now infinite, the once option is in place, and a trick using term_variables/2 is used in order to avoid shipping around unnecessarily large terms.
Support for the src_* options is not there yet, but it’s easy to see that in order to add it we must first translate the src_predicates and src_list options into a src_text parameter which can then be passed to http_open/3.
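A minimal sketch of that translation step, assuming src_predicates carries a list of predicate indicators whose local clauses should be shipped (the predicate name src_options_to_text is made up for illustration):

```prolog
%% src_options_to_text(+Options0, -Options) is det.
%
%  Replace a src_predicates(PIs) option with an equivalent
%  src_text(Text) option, by listing the local clauses of
%  each predicate indicator into a string.
src_options_to_text(Options0, [src_text(Text)|Options]) :-
    select(src_predicates(PIs), Options0, Options), !,
    with_output_to(string(Text),
                   forall(member(PI, PIs), listing(PI))).
src_options_to_text(Options, Options).
```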
Some things will have to change in the implementation of the node’s HTTP handler as well. The hash which in the ISOBASE node is computed from the query only, must now be computed from the query plus any source code to be injected into the pengine to be created.
Note that each HTTP request for a solution to a query must carry along the source text, or else the cache mechanism wouldn’t be able to locate the right pengine. This may seem wasteful, but it’s a consequence of going stateless. In contrast, in the ACTOR profile, when transport over the WebSocket API is an option, the source only needs to be shipped with the initial call.
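A hypothetical sketch of such a call:

```prolog
?- rpc('http://localhost:3010', mortal(X),
       [ transport(websocket),
         src_text("mortal(X) :- man(X). man(socrates).")
       ]).
```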
I don’t think your proposal is compatible with HTTP, and since you mention actors, I guess it’s time to take another step up the hierarchy of Web Prolog profiles, and have a look at the ACTOR profile:
This profile contains a lot, and even has its own version of the rpc/2-3 predicate. Here’s the implementation of rpc/2-3 that runs when the value of the transport option is set to websocket:
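Roughly, and with the caveat that the primitives (pengine_spawn/2, pengine_ask/3, pengine_next/1, receive/1) and the message forms (success/3, failure/1, error/2) are assumed here as they appear elsewhere in this thread, the definition might look like:

```prolog
rpc(URI, Query) :-
    rpc(URI, Query, []).

rpc(URI, Query, Options) :-
    pengine_spawn(Pid, [node(URI)|Options]),
    pengine_ask(Pid, Query, Options),
    wait_answer(Query, Pid).

wait_answer(Query, Pid) :-
    receive({
        failure(Pid) ->
            fail;
        error(Pid, Exception) ->
            throw(Exception);
        success(Pid, Solutions, More) ->
            (   member(Query, Solutions)
            ;   More == true,
                pengine_next(Pid),
                wait_answer(Query, Pid)
            )
    }).
```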
It only works over a WebSocket connection, since this is what the remote pengine doing the actual work requires.
This implementation of rpc/2-3 and the implementation built on top of HTTP are almost functionally equivalent, in the sense that they will behave in almost exactly the same way when called with a URI and a query and (possibly) some options such as limit or any of the src_* options. However, there’s one difference: since it runs over the WebSocket protocol, this version allows the remote process to produce output “written” by the program and the calling process to receive it. Here’s an example:
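For instance (a sketch only; the output/2 message form and the exact behaviour of the pid option are assumptions):

```prolog
?- rpc('http://localhost:3010',
       ( X = 42, write(hello) ),
       [ transport(websocket), pid(Pid) ]),
   receive({
       output(Pid, Term) ->          % accept output only from Pid
           format("~p said: ~q~n", [Pid, Term]),
           Pid ! thanks              % and send a message back
   }).
```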
As seen above, the pid option is passed with a free variable which will be bound to the pid of the remote pengine. This means that we can (as we do above) check that the message is coming from the right source and potentially also send messages back to it, using e.g. the send operator !/2 (as we also do above). Of course, using the pid option kind of breaks the abstraction for remote procedure calling, so it should probably be used with care. In the above case, it might be better to be more explicit and use pengine_spawn/2-3 to create a remote pengine instead of calling rpc/3.
I suspect that both implementations of rpc/2-3 are useful, and if we’re running against an ACTOR node we often have a choice. Running rpc/2-3 over the stateless HTTP protocol can be a blessing for caching performed by intermediates, whereas caching by intermediates won’t work for communication over the WebSocket protocol. Running rpc/2-3 over the stateless HTTP protocol can’t deal with I/O, whereas the stateful WebSocket protocol can handle it. Running rpc/2-3 over the stateless protocol often involves the shipping of the same source code more than once, whereas rpc/2-3 over websockets only needs to ship such code when making the initial call. So, all other things being equal, what would be the most performant choice – rpc/2-3 with option transport(http) or with transport(websocket)? I have no idea, I guess we need to do some benchmarking in order to determine this.
So far in this thread we’ve only dealt with rpc/2-3 and two roles that pengines can play for its specification and implementation: as cached processes that have “more to give” in the implementation of the HTTP API, and as the sole process responsible for the computation on the remote node when rpc/2-3 is defined as above. For the sake of completeness, recall a third role that pengines play: when we’re interacting with a shell attached to an ACTOR node, we’re actually interacting with a pengine running there. This pengine is created with the option exit(false), which means it will be there for a whole session.
?- write(hello).
hello
true.
?- read(Term).
|: goodbye.
Term = goodbye.
?- assert(foo(bar)).
true.
?- retract(foo(X)).
X = bar.
?-
Except for database manipulation spanning two queries (which it currently isn’t configured to do), SWISH can also do this. But SWISH (supported by library(pengines)) doesn’t provide us with Erlang-style concurrent programming with pengines and other kinds of actors, and it doesn’t support efficient pushing of messages from node to client, but relies on a kind of long-polling.
In contrast, when working against an ACTOR node, we are able to program using Erlang-style concurrency:
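For instance, a classic echo actor can be written with spawn/2, self/1, the send operator !/2 and receive/1 (a sketch, assuming these primitives as they are used elsewhere in this thread):

```prolog
% An actor that echoes every message back to its sender,
% until it is told to stop.
echo :-
    receive({
        stop ->
            true;
        From-Message ->
            From ! Message,
            echo
    }).

% Interacting with it:
?- spawn(echo, Pid),
   self(Self),
   Pid ! Self-hello,
   receive({ hello -> true }),
   Pid ! stop.
```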
So against this background, let’s return to your question/comment:
As I said in the beginning of this post, I don’t think your model is compatible with HTTP, but using pengines running over websockets I guess your problem might be solvable. For a start, it’s clear that when the pengine top creates the pengine foo, it has to send along its pid, and when foo creates bar it has to forward this pid to bar as well, so that bar knows where to send answers. A lot more must be done, of course, and I doubt something like that can be hidden under an API as simple as rpc/2-3. In any case, I don’t think it would be worth the attempt, but would love if you could prove me wrong.
If it can be done in Erlang it can perhaps be done in Web Prolog too. They really are very similar, except that running over nodes on the open Web rather than over a closed cluster of Erlang nodes may well make some things harder, and maybe impossible.
But “routinely”? In Erlang, you’d need the third-party skel library for it, and probably have to read the tutorial you link to in order to learn how to use it first. That doesn’t sound routine at all.
You’re welcome to port the skel library to Web Prolog if you want, but I’m giving up on your example now as I don’t think it contributes anything of value.
Sure, why not? But a node running the ACTOR profile of Web Prolog is of course needed, as this is the only profile that supports the WebSocket protocol.
I think it’s more interesting to ask what can be done with Web Prolog but not with Erlang. With Web Prolog we have pengines, and they are not available in Erlang. And on top of pengines and other actors we can build a Prolog Web, which, when the Web Prolog code involved is pure, is a web of pure logic. An Erlang cluster – which is not an extension of the open Web, but a closed layered network of nodes and processes – doesn’t give us that.
The way I implement it costs very little extra. But if you’re asking whether Erlang has a spawn/1, the answer is yes. Web Prolog can have it too. You would not be able to send to an actor spawned by means of it, though, since you lack an address.
Pure Erlang primitives realization, no pengine_xxx.
Pure Erlang primitives realization, no http_xxx.
No separate pengine_ask needed for the first solution.
Repeat fail realization of the wait_answer loop.
Automatic tear down via exit/2 from events in the continuation.
The tear down can be seen in this example where a cut happens in the continuation. No stop message support is needed by the remote part of the Prolog RPC:
?- rpc('localhost:3010', father(tom, X)), !.
X = sally
Won’t leave any orphaned actor on the remote site. Instead, the remote part is simply exited immediately, with the help of a local setup_call_cleanup/3.
Maybe this complete reduction to the actor model has a few advantages:
transport: We can delegate provisioning of some transport to the actor platform; it’s not necessary to write a different rpc for different actor platforms. As long as the end point that we want to invoke is representable as something spawn understands, rpc/2 should work.
interleaving: In as far as the actor platform allows multiplexing, multiple rpc calls can interleave their messages that are exchanged between local part and remote part, increasing resource utilization.
state: Like actors, the RPC call will optionally have state. At the remote part of the Prolog RPC the execution can involve side effects and blocking; in particular, backtracking will always address the same actor on the remote site.
Open issues:
security: It might be desirable to identify the messages that the remote part sends back to the local part as replies, and have some rules in place. Best would be if the actor platform would already provide this security.
I saw that the Java DatagramSocket calls SecurityManager. I guess it’s possible to implement policies such that an applet is only allowed to communicate with its originating server, etc. But then maybe we also want a SecurityManager on the server.