Web Prolog ISOBASE Test Cluster - Hack Away, Find Security Bugs

Tabling is certainly cool and I’m looking forward to when we can have nodes which provide users with built-ins for tabling in addition to the built-ins available in the ISOTOPE or ACTOR profile. But I think we should start simple, and design something that can be easily implemented and easily standardised, so I think tabling on that level must wait.

You seem to suggest that tabling can be used for internal purposes as well, as a technique for doing what we do with our caching scheme. Isn’t what you describe yet another way to cache solutions, rather than caching pengine processes that, as it were, have “more to give”? For caching of solutions, I think what’s already available on the Web in the form of intermediates that do caching of responses to stateless requests will take us a long way towards scalability.

I must say I’m very happy with the current caching scheme. It shows, I believe, that there exists at least one method that allows us to offer a stateless HTTP API which avoids a lot of recomputation. It may be that your proposal would also work, and may be better than ours, and there may be methods that are far superior to both of our proposals. People should be allowed to build an ISOBASE node using the old CGI technology if they want to. It won’t be fast, but it’s possible to make it conform to a future Web Prolog ISOBASE standard, precisely since such a standard won’t prescribe a particular way to achieve statelessness.


Why do you think it avoids recomputation? Define recomputation! It doesn’t avoid recomputation in the general sense, because you can end up with entries like these:

cache(mortal(_), 2, 66783421).
cache(mortal(_), 2, 55321100).

Possibly you compare with a scenario where, for a solution index k, the node recomputes solutions 1, 2, 3, …, k-1 before it returns solution k. Yes, that would be a recomputation as well. But tabling solves the unnecessary-recomputation problem from the viewpoint of a query, not from the viewpoint of a solution index. I was a little surprised that you cannot share solutions for the same query and reuse a cached process here. But I guess you cannot provide this benefit because you don’t store any results in the cache.

The question is what your real-world query frequency scenario is. If you have many queries that require a long-running process to answer, and if they all have only one solution, and if many of the queries are similar, then avoiding your duplicate cache entries and storing the results would be beneficial.
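A minimal sketch of this solution-sharing idea in SWI-Prolog; the predicate names here (solution_table/2, solutions/2) are hypothetical and not part of any proposal in this thread:

```prolog
% Hypothetical solution table: one entry per query variant, holding the
% solutions computed so far, shared by every client asking that query.
:- dynamic solution_table/2.

solutions(Query, Solutions) :-
    (   solution_table(Key, Solutions0),
        Key =@= Query                      % same query, up to variable renaming
    ->  Solutions = Solutions0             % reuse the stored results
    ;   findall(Query, Query, Solutions),  % compute once...
        assertz(solution_table(Query, Solutions))  % ...and share with later askers
    ).
```

Two clients asking the same query (up to variable renaming) would then both be served from the single stored entry, rather than each creating a duplicate cache entry.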

You can also extend this to queries that have side effects, i.e. whose results change over time. You can use some REST technology to convey the information that a result didn’t change. But then an actor needs to provide, for each query, an approximate result together with an ETag or Last-Modified header.
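For example, a conditional request in the REST style might look like this (the URL and the ETag value are made up for illustration):

```
GET /ask?query=mortal(X) HTTP/1.1
Host: n1.org
If-None-Match: "66783421"

HTTP/1.1 304 Not Modified
```

If the result has not changed since the ETag was issued, the node answers 304 and the client reuses its locally cached solutions.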

Possibly REST could be used to combine distributed and shared tabling with incremental tabling, I’m not sure. It would be a kind of ask-based detection of whether an increment happened, instead of a tell approach that announces an increment. I’m also not sure whether REST would be of benefit in the iterations needed for shared tabling in case there is some recursion.


I’ve already given an example of in what sense recomputation is avoided:

Avoiding recomputation in this sense is all we’ve been looking for. I’m sure there are ways to avoid recomputation of solutions in your sense that can be combined with our method, but I’m not going to explore this myself.

I guess an ISOBASE node shouldn’t be too hard to implement in JekeJeke Prolog. It would be great if you could show us a working demo of your way of doing it. After all, if we want to standardise the ISOBASE profile of Web Prolog in (say) the W3C, we need at least two working implementations.

Just a sketch of an implementation in SWI-Prolog would also be interesting, one that replaces ... below. But note that statelessness must be preserved, so no cookies or session IDs are allowed.

Such a sketch would help me to better understand your ideas.

Sorry about that. For the rest of my life, I’ll make sure to carry my worry beads, clutch a crucifix tight to my heart, and smear myself in garlic paste.

That’s right. It would be nice to use a cache which is infinite in size, but in the absence of this, a finite one will have to do. Note that when queries are deterministic, the cache will not be used, and for many non-deterministic queries, the limit parameter (or the limit option for rpc/2-3) can be used in such a way that one request is enough to retrieve all solutions, so again the cache will not be used. What the cache does is to guarantee that backtracking between (say) a web browser and a node, or backtracking over nodes, will work too, when this makes sense (and it often does). This is when the cache kicks in and lets us do it without performing recomputations. Sometimes, when the max size has been reached and the cached pengine that should have been there to help is no longer around, recomputation will have to be performed anyway, and some solutions discarded. With a max size of (say) 1000 or even 10000, this won’t happen very often, though.
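To make this concrete, here is a rough sketch of the lookup a node might perform on a stateless request, using the cache/3 representation shown earlier in the thread; ask_pengine/3 and recompute_up_to/3 are hypothetical helper names, not part of the actual implementation:

```prolog
% On a request for solution K of Query: if a cached pengine is already
% positioned at index K, resume it; otherwise fall back on recomputing
% solutions 1..K-1 before answering with solution K.
answer(Query, K, Solution) :-
    (   retract(cache(Query, K, Pid))
    ->  ask_pengine(Pid, Query, Solution)     % "more to give": no recomputation
    ;   recompute_up_to(Query, K, Solution)   % cache miss: recompute, then answer
    ).
```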

What I like about this is that a system of nodes equipped with the stateless HTTP API, where each node holds a program written in a Prolog dialect extended with rpc/2-3, constitutes something that really does deserve to be thought of as a Prolog Web. And when programs are written in pure Prolog (Horn clauses only) plus uses of the rpc/2-3 predicate, we can think of it as a web of pure logic. I like that, and I like that it’s so simple and straightforward to achieve.

I think that the following diagram illustrates the idea, and also nicely suggests where the line between declarative and procedural ought to be drawn when pure Web Prolog is used to program the Prolog Web.

Ok, fair enough, and I suppose that was your reason for withdrawing the previous post, i.e. the one that describes what turned out to be a non-issue. But two other posts were withdrawn as well, and I fail to understand why. I think the suggestion you made about the FAP in this post made sense. And the pointer to the paper “Caching for Definite-Clause Theorem Proving” here was also welcome as it allowed me to pick up some useful terminology. I think you should stop withdrawing posts, or at least do it much less often. It doesn’t make sense, especially since they don’t disappear anyway. (You know that anyone can click the pen symbol and read the “withdrawn” post anyway, don’t you?)

Thanks for the reference and the quote. I’ve made a similar (or at least related) observation in my book manuscript. First, a tiny Prolog Web:

And then a simulation using modules:

You seem to have misunderstood my motives. I’m not looking for originality. Here’s something I’ve written about this:

As it is based on the ISO standard of Prolog, the de facto standard of Erlang, and a number of well-established web standards, Web Prolog does not aspire to much originality or novelty, except perhaps in the ways underlying concepts, models and mechanisms are combined in our ambition to introduce Web Prolog as an embedded and sandboxed scripting language for the Web. With traits inherited from Prolog as well as from Erlang, Web Prolog should be seen as a product of small steps of evolution, rather than the result of a revolution. This should make it easier to standardise.

Because when it comes to standardisation, a lack of originality and novelty may actually be an advantage, since it makes it more probable that a standardisation effort will actually succeed. Few hitherto unknown problems are likely to surface, so by borrowing as much as possible rather than inventing, we may be able to create a viable and trustworthy standard much quicker than would otherwise have been the case.

So if you happen to find some old ideas concerning the computational or implementation issues, and not only the semantic ones, please let me know!


I once suggested to @pmoura that Logtalk would be a great implementation language for Web Prolog (although for other reasons than the one you mention). I still think so, although I guess it would only run with SWI-Prolog as a backend to Logtalk, at least until the other back-end implementations catch up on WebSockets, HTTP libraries, etc. Unfortunately, I don’t think Paulo warmed to the idea, and he’s very busy with the development of Logtalk and with commercial contracts. I’m secretly hoping that might change some day, and that I’ll be able to turn him. :slight_smile:


I agree that rpc/2-3 is somehow related to call/1. I also agree that call/N is a good idea, but I fail to see why rpc/2-N would be a good idea. If you need to use closures in that way, why don’t you just call rpc(Site,call(Closure,Argument)), rpc(Site,call(Closure,Argument,Argument2)), etc., directly instead? It will cost you six characters extra, but hey…
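That is, assuming the node at http://n1.org defines (or allows) succ/2, a closure and its extra arguments can simply be wrapped in call/N on the remote side:

```prolog
?- rpc('http://n1.org', call(succ, 1, X)).
X = 2.
```

The call/N goal is evaluated remotely, so no extra arity variants of rpc are needed.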

You would have to convince me that the reasons for wanting call/N are valid also for RPC. Can you do that?

(Also, you ignored the fact that there is an optional but very important third argument to rpc/2-3 - a list of options such as limit.)


But as I said, your proposal faces difficulties because of the third argument of rpc/2-3, doesn’t it?

And just as using

?- rpc('http://n1.org', findall(X, p(X), L)).

should always be preferred over

?- findall(X, rpc('http://n1.org', p(X)), L).

it’s likely that using

?- rpc('http://n1.org', maplist(succ, [1,2,3], L)).

should always be preferred over

?- maplist(rpc('http://n1.org', succ), [1,2,3], L).

Why should the one always be preferred over the other? To me it looks like they mean different things. What if you have a parallel maplist, for example?

Or do you mean that rpc is not actually a meta-calling predicate and shouldn’t be mixed up with those?


Because the first goal makes a single RPC call while the second one makes 3 RPC calls? Or am I missing something in the example?


Hi Boris and Paulo, thanks for chiming in!

Yes, that would be the reason. More obvious here, perhaps,

but, AFAICS, the same kind of thinking applies.

That’s the point I thought? If you have a parallel maplist and if the other side has multiple connections/workers?

@Boris, do you mean like this?:

?- rpc('http://n1.org', concurrent_maplist(succ, [1,2,3], L)).

?- concurrent_maplist(rpc('http://n1.org', succ), [1,2,3], L).

The same thinking about the difference in the number of roundtrips required would still apply, wouldn’t it?

Regarding the options to rpc/2-3, I’m starting to think that the default value of the limit option ought to be infinite rather than 1 (which is now the case). And the same should probably be true of the limit parameter for the HTTP API. (In fact, rpc/2-3 will “inherit” this default from the HTTP API.) It would mean that if a client makes a request

GET http://n1.org/ask?query=p(X)

the response will contain all solutions to ?-p(X) rather than just the first, and if a call such as

?- rpc('http://n1.org', p(X)).

is made, then, under the hood, all solutions to ?-p(X) will be retrieved (although the behaviour of rpc/2-3 will be the same as before).

This would mean that such requests/calls, which retrieve all solutions in one go and are thus effectively deterministic, will never leave entries in the node’s cache.

There will be fewer reasons to make use of the limit parameter/option. But, of course, our simple shell would still have to set limit to 1

GET http://n1.org/ask?query=p(X)&limit=1

and if the number of solutions to ?-p(X) is very large, or even infinite, it might be necessary to use

?- rpc('http://n1.org', p(X), [limit(N)]).

with a suitable N in order to avoid exceeding the timeout.

So given this change, rpc(URI,G) becomes equivalent to rpc(URI,G,[limit(infinite)]), which in turn is equivalent to rpc(URI,findall(G,G,L)).
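One way to realise the proposed default, as a sketch only; the option processing is illustrative, and rpc_request/3 is a hypothetical name for the underlying transport:

```prolog
% rpc/2 defers to rpc/3 with an empty option list ...
rpc(URI, Goal) :-
    rpc(URI, Goal, []).

% ... and rpc/3 falls back on limit(infinite) when no limit is given,
% using option/3 from SWI-Prolog's library(option).
rpc(URI, Goal, Options) :-
    option(limit(Limit), Options, infinite),
    rpc_request(URI, Goal, Limit).   % hypothetical transport predicate
```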

Another potentially useful parameter/option might be once. For example, assuming ?-p(X) has three solutions, using once in combination with limit would only get us two of them:

?- rpc('http://n1.org', p(X), [
       limit(2),
       once(true)
   ]).
X = a ;
X = b.

Again, this will not add an entry to the node’s cache. Thus, if you know two solutions would be sufficient for your purpose, you are being “nice” to the node and its owner by using once in this way.

BTW, note that using the once parameter/option is not the same as running a goal once(Goal), which will only provide one solution:

?- rpc('http://n1.org', once(p(X)), [
       limit(2)
   ]).
X = a.

Thanks, yes, it’s a good idea to keep a record as it gives us something to start from once we begin working on a standard for Web Prolog.

(For clarity, I replaced “site” with a real URI.) Well, my proposal to make limit(infinite) the default solves this particular problem as the initial call to rpc/2-3 retrieves all solutions to ?-p(X) and therefore doesn’t add an entry to the node’s cache. Compare with this, which does something similar, and doesn’t add an entry to the cache of the remote node either:

?- rpc('http://n2.org', findall(X, p(X), L)), member(X, L), X=2, !.

With my proposal for a new default for the limit option, the problem will only occur if you do this (but why would you?):

?- rpc('http://n2.org', p(X), [limit(1)]), X=2, !.

The problem would also occur if you hit Return instead of a semi-colon in the shell after having seen only the first two solutions:

?- p(X).
X = 1 ;
X = 2.

That would leave an entry in the cache of the current node. Eventually, either another request (made by another client) can make use of it, or (more likely) it will be removed by the pruning mechanism. So it doesn’t create much of a problem.
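The pruning mechanism might, as a sketch, look something like the following; stop_pengine/1 is a hypothetical operation that kills the cached process, and eviction order is assumed to follow assertz order (oldest first):

```prolog
% Evict cache entries until at most MaxSize remain, stopping the
% pengine attached to each evicted entry.
prune_cache(MaxSize) :-
    predicate_property(cache(_, _, _), number_of_clauses(N)),
    (   N > MaxSize
    ->  retract(cache(_, _, Pid)),   % oldest entry first
        stop_pengine(Pid),
        prune_cache(MaxSize)
    ;   true
    ).
```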

It would of course be great if setup_call_cleanup/3 is added to the ISO core, but I don’t agree that a W3C standard for Web Prolog depends on it. Note that the implementation of rpc/2-3 that I provide is only an example of how this Web Prolog built-in predicate may be implemented. As long as it adheres to the specification, it doesn’t have to be implemented like that.

We depend on the availability of setup_call_cleanup/3 in the ISO core only if we find that it must be provided as a built-in in the Web Prolog standard too, but that’s still an open question I would say, at least for the ISOBASE profile.

I’m not sure what setup_call_cleanup/3 has to do with it, but let me show what happens with a number of queries if we are talking to a node http://n1.org which implements the ISOBASE profile and uses limit(infinite) as the default:

?- p(X).
X = 1 ;
X = 2 ;
X = 3.
?-

In this case, as the shell uses limit=1 rather than the default, the clause cache(p(X),3,<pid>) will be left in the cache, but will eventually be removed by the pruning mechanism.

?- p(X), X=2, !.
X = 2.
?-

In this case, the query is deterministic and will therefore not leave any entry in the cache.

?- findall(X, p(X), L).
Error: Time limit exceeded
?-

In this case, the query is both deterministic and non-terminating, and will (for both of these reasons) not leave any entry in the cache.

?- rpc('http://n2.org', p(X)).
X = 1 ;
X = 2 ;
X = 3.
?-

In this case, a clause cache(rpc('http://n2.org',p(X)),3,<pid>) will be left in the cache at http://n1.org and a clause cache(p(X),3,<pid>) will be left in the cache at http://n2.org. Eventually, they will both be removed by the pruning mechanisms.

?- rpc('http://n2.org', p(X)), X=2, !.
X = 2.
?-

In this case, a clause cache(p(X),3,<pid>) will be left in the cache at http://n2.org but will eventually be removed. No entry will be added to the cache at http://n1.org as the query is deterministic.

?- rpc('http://n2.org', findall(X, p(X), L)).
Error: Time limit exceeded
?-

In this case, the query is both deterministic and non-terminating, and will (for both of these reasons) not leave any entry in either of the two caches.

(Hope I got all of that right.)