You might want to think of using the word “interface”, then, since in most OO languages that means “a thing you can use without knowing the implementation details”. I’m also put in mind of GDB/MI, GDB’s machine interface protocol, so that’s another point there. And, since this is effectively a replacement UI for Prolog’s classic REPL toplevel, perhaps call the toplevel predicate `machine_query_interface` and call the various other-language libraries “Prolog-MQI”? That reinforces that you’re only exposing half of the language (the query half, not the programming half), and a command line like
swipl /path/to/program.pl -t machine_query_interface
makes it incredibly clear what you’re getting (a way to send queries to the Prolog engine for this particular program) and how it’s supposed to be used (connect it to another piece of software). “Prolog-MQI” is also very googleable, which more software really ought to take into account (there’s a reason I stopped using the Awesome Window Manager, and it has a lot to do with not being able to find documentation and assistance online).
It also opens up the concept space a bit, which I think is welcome. If this is intended to make Prolog become just a local implementation detail, then it needs to be able to be transparently replaced with something else, whether that’s another Prolog engine, a unit test mock, or even an app written in some other language that’s still able to respond to some sort of query with a similar kind of machine interface. That’s going to be a big draw for someone who might be on the fence about implementing part of their project in Prolog, because it leaves their options open.
Happy to! Shouldn’t be tough, once the protocol is finalized, and it’s always nice to play around in other languages.
As for the protocol itself, let’s see. Since this is a two-way, line-based text protocol, I think you’re missing something by not including a direct stdin/stdout interface. It should probably be the default for the arity-0 toplevel predicate unless you’ve specified otherwise in code, so you can use it as above. Requiring socket/network access to be able to use this makes for a significantly more difficult integration, depending on the language, but I can’t think of a single general-purpose language that doesn’t have a simple, built-in way to access standard input and output. Yes, that might restrict you to a single connection stream (though see below), but I imagine many use cases would only need one. Case in point: if pengines hadn’t been such a hassle to get up and running for inter-language queries, I probably would have done my project with a Python main talking to a Prolog query interface, since all the third-party code I’m linking up with is either Python or C/C++.
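To give a feel for how little client code a stdin/stdout transport would need, here is a rough Python sketch. Everything about the framing is an assumption on my part (one JSON message per line in each direction), and `StdioQueryClient` and `ECHO_SERVER` are made-up names; the echo child process only stands in for a Prolog process started with the hypothetical `-t machine_query_interface` toplevel.

```python
import json
import subprocess
import sys

# Stand-in "server": a child Python process that echoes every line back.
# A real setup would launch swipl here instead (hypothetical invocation).
ECHO_SERVER = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n",
]


class StdioQueryClient:
    """Exchange one JSON message per line over a child's stdin/stdout."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes in text mode
        )

    def ask(self, message):
        # json.dumps escapes newlines inside strings, so each message
        # always occupies exactly one line on the wire.
        self.proc.stdin.write(json.dumps(message) + "\n")
        self.proc.stdin.flush()
        line = self.proc.stdout.readline()
        if not line:
            raise EOFError("server closed its output stream")
        return json.loads(line)

    def close(self):
        # Closing stdin signals end-of-input; then wait for the child to exit.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

With the echo stand-in, `StdioQueryClient(ECHO_SERVER).ask({"query": "atom(a)"})` simply returns the message it sent. The point is only that no socket, port, or password handling is involved on the client side.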
Other thoughts, in no particular order:
- The JSON mapping of Prolog terms is more cumbersome than it probably needs to be. (It also needs to be documented, since I’m only getting this by looking at the protocol dumps you’ve provided.) My suggestion would be something along these lines:
| Prolog type | Prolog representation | JSON representation | Notes |
|---|---|---|---|
| integer | 1234 | 1234 | [1] |
| float | 1234.0 | 1234.0 | [1] |
| atom | '1234' | "1234" | [2] |
| variable | _1234 | {"_":"_1234"} | |
| string | "1234" | {"\"":"1234"} | |
| list | `1234` | [49,50,51,52] | |
| compound | foo(12,34) | {"foo":[12,34]} | [3] |
| dict | foo{12:34} | {"{":["foo",{12:34}]} | |
| improper list | `[1,2,3\|4]` | {"[":[[1,2,3],4]} | |
| unrepresentable | '['([1,2,3],4) | {"'":["compound","[",[1,2,3],4]} | [4] |

[1] If it fits in a pointer-size signed integer (or in a double). Otherwise translate to an “unrepresentable” functor, because many JSON libraries (including, of course, Javascript itself) have limitations about the size of numbers they’ll read.

[2] Between atoms and SWI strings, atoms get a lot more use, so they get to use the bare JSON string representation.

[3] Any compound whose functor can be represented as an unquoted atom will be represented this way. Otherwise wrap with `T0 =.. L, T1 =.. ['\'',compound|L]`.

[4] This can be used for any unrepresentable type, or any value outside the representable domain of one of the other above types.

This keeps the representation compact (not actually a small consideration, if you’re transferring large data sets), makes it more readable (functor names before arguments), and gives every value (modulo dicts, whose keys can be reordered, and floats which are weird) a single canonical, invariant representation - which means I don’t even need a JSON library, if my results are predictable. Thus:
true([[threads(language_server1_conn2_comm, language_server1_conn2_goal)]])
% Original:
{"args": [[[{"args": ["language_server1_conn2_comm", "language_server1_conn2_goal"], "functor":"threads"}]]], "functor":"true"}
% New:
{"true": [[[{"threads": ["language_server1_conn2_comm", "language_server1_conn2_goal"]}]]]}
- I’d suggest using the same base encoding in both directions. This doesn’t necessarily mean you need to translate your query from Prolog-syntax into JSON-syntax, but even something like
{"query":"atom(a)"}
gives you a good way to extend the command set without having to create new predicates to handle command variants or fall back on the (quite idiosyncratic) Prolog option-list convention. Plus, you could even do away with the byte-length prefix if you wanted, and just make the protocol line-oriented.
- I think it’s a mistake for the default `run` to fetch all solutions to a query. Given that this is intended as a lightweight way to use Prolog in a project that has a different primary language (and thus by definition will have developers familiar with the primary language, but not necessarily with Prolog), you’re one `run(member(foo, L), -1)` away from the connection stalling while Prolog gobbles up all the system memory it possibly can. Do what the console toplevel does: return the first result, and indicate there are more results that can be fetched if desired. If the client runs another query while results are pending, just discard them.
- Return variable bindings as a JSON object rather than a list of =(Var, Value) functors. Don’t make the possibly-rudimentary client library do the work; the whole point is to reduce the client library footprint.
- Provide for the ability to bind variables to pre-query values without using Prolog syntax, to avoid injection vulnerabilities. Something like
{"query":"player_exists(X)", "bindings":{"X": "Johnny\"),abort. %"}}
will save you a lot of headaches.
- It’d be nice to have the option to pass along an ID with every command that the server will echo back with the reply. That avoids potential desync of the command/response pattern, and it also opens the door for:
- Make `run_async` actually async. Right now it’s just a long-form for `run` with a server-side results cursor, which as I mentioned ought to be a function of the basic `run` command. The purpose of async is to allow you to do other things while waiting for the long-running process to complete, and currently you can’t issue any other queries while the async one is running. (Obviously the client program can do other things while the query is running. It’s in a separate process.) Prolog has great coroutining facilities; use them to let you issue multiple queries and let them run independently, or at least sequentially. Use command IDs (or have the server return a query ID in the `run_async` reply) to distinguish which `async_result` the client wants to fetch, and then the server can start the engine working on the next result for that query after it returns this one.
- Support or at least provide room in the protocol for push messages from the server. Without server-initiated messages, the only way to figure out when an async operation is finished is to poll repeatedly or issue a long-timeout fetch, at which point you’re no longer async. (Bonus: your `.....` keepalives can just be turned into a subclass of push message.) Push messages could include “results available for async query 5” but also “data written to user_output: hello world”. Especially important if using stdin/stdout comms.
- Allow access to and control of multiple threads as first-class options in the protocol, rather than requiring a separate OS-level connection for each thread. This could tie in nicely with the stdin/stdout comm layer, so you wouldn’t need to pull in networking just to have separate Prolog threads. Blocking protocol commands should block on a per-thread basis, so you can have each thread doing a `run`, and they’ll each return their results (sensibly multiplexed) when available.
- Add named pipes and raw file descriptors to the list of comm layers, possibly. They’re available on Windows and on Unixlikes, and they also represent a much lower barrier to entry, while still enabling the simplex thread-per-connection model. It could be something as simple as issuing a “here I made a FIFO pair, please connect to it” command to an already-connected thread.
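Going back to the term mapping in the first item of this list, here is a rough Python sketch of what a client-side decoder for it might look like. This only illustrates the table as proposed, not anything the current server emits; the class names (`Atom`, `Var`, `PlString`, `Compound`, `ImproperList`, `Unrepresentable`) are invented for the example, and dicts are deliberately left out because the `{12:34}` form in the table would need string keys to be valid JSON.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class PlString:
    text: str


@dataclass(frozen=True)
class Compound:
    functor: str
    args: tuple


@dataclass(frozen=True)
class ImproperList:
    items: tuple
    tail: Any


@dataclass(frozen=True)
class Unrepresentable:
    parts: tuple


def decode(value):
    """Turn parsed JSON (in the proposed mapping) into tagged Python values."""
    if isinstance(value, bool) or value is None:
        raise ValueError("true/false/null are not part of the mapping")
    if isinstance(value, (int, float)):
        return value                      # integers and floats pass through
    if isinstance(value, str):
        return Atom(value)                # bare JSON strings are atoms
    if isinstance(value, list):
        return [decode(v) for v in value] # proper lists
    if isinstance(value, dict) and len(value) == 1:
        (key, body), = value.items()
        if key == "_":
            return Var(body)
        if key == '"':
            return PlString(body)
        if key == "[":
            items, tail = body
            return ImproperList(tuple(decode(i) for i in items), decode(tail))
        if key == "'":
            return Unrepresentable(tuple(decode(p) for p in body))
        if key == "{":
            raise NotImplementedError("dicts are omitted from this sketch")
        # Any other single key is a functor name with its argument list.
        return Compound(key, tuple(decode(a) for a in body))
    raise ValueError(f"not a term in the proposed mapping: {value!r}")
```

Feeding it the “New” example from above, `decode(json.loads('{"true": [[[{"threads": ["a", "b"]}]]]}'))` gives a `Compound` named `true` whose single argument is a nested list containing a `threads` compound with two atoms. Because every term is either a scalar, a list, or a one-key object, the whole decoder is a single dispatch on that key.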
Right! That’s a lot, I know. Most of it is opt-in on the part of the client, though, so it shouldn’t make a big difference in how much code is needed for the foreign-language shim layer. And these are all only suggestions! Feel free to use or ignore as you see fit; I just wanted to offer some ideas to consider.