Prolog Language Server: Enabling swipl integration with Python and other languages

Very cool! Looking through the code, it seems like this is a great tutorial showing how to use the SWI-Prolog HTTP functionality to build the start of a local server that can serve static pages as well as answer Prolog queries. If I’m understanding correctly, it seems like you were targeting a different design center with that than the “language_server” (or whatever we call it).

Were you bringing it up as an example of something that could be used to compare notes on and do possible code borrowing?

(aside, if you’re looking for an LSP for Prolog, I have one I’ve been developing here)


The HTTP functionality is there for serving queries (the queries themselves are in swipl-server-js-client/simple_client.js at commit 9494950e461105f41e7a7f2b1123e1876aa2a057 of kamahen/swipl-server-js-client on GitHub), which just use the text that a user types in – obviously this is not very safe. The query is handled in swipl-server-js-client/simple_server.pl at the same commit.

Because of browser same-origin restrictions (XSS protections), it’s simplest to serve both the query results and the static HTML from the same server, so the static HTML server code is also in the Prolog server.

I didn’t understand some stuff in the client-server tutorial, so I wrote this as a kind of test code, then stripped it down and added some documentation (which evidently isn’t good enough). I then used the code in a more complex server.

You might want to think of using the word “interface”, then, since in most OO languages that means “a thing you can use without knowing the implementation details”. I’m also put in mind of GDB/MI, gdb’s machine interface protocol, so that’s another point there. And, since this is effectively a replacement UI for Prolog’s classic REPL toplevel, perhaps call the toplevel predicate machine_query_interface and call the various other-language libraries “Prolog-MQI”? That reinforces that you’re only exposing half of the language (the query half, not the programming half), and a command line like

swipl /path/to/program.pl -t machine_query_interface

makes it incredibly clear what you’re getting (a way to send queries to the Prolog engine for this particular program) and how it’s supposed to be used (connect it to another piece of software). “Prolog-MQI” is also very googleable, which more software really ought to take into account (there’s a reason I stopped using the Awesome Window Manager and it has a lot to do with not being able to find documentation and assistance online :rofl:).

It also opens up the concept space a bit, which I think is welcome. If this is intended to make Prolog become just a local implementation detail, then it needs to be able to be transparently replaced with something else, whether that’s another prolog engine, a unit test mock, or even an app written in some other language that’s still able to respond to some sort of query with a similar kind of machine interface. That’s going to be a big draw for someone that might be on the fence about implementing part of their project in Prolog, because it leaves their options open.

Happy to! Shouldn’t be tough, once the protocol is finalized, and it’s always nice to play around in other languages :slight_smile:

As for the protocol itself, let’s see. Since this is a two-way, line-based text protocol, I think you’re missing something by not including a direct stdin/stdout interface. It should probably be the default for the arity-0 toplevel predicate unless you’ve specified otherwise in code, so you can use it as above. Requiring socket/network access to be able to use this makes for a significantly more difficult integration, depending on the language, but I can’t think of a single general-purpose language that doesn’t have a simple, builtin way to access standard input and output. Yes, that might restrict you to a single connection stream (though see below), but I imagine many use cases would only need one. Case in point: if pengines hadn’t been such a hassle to get up and running for inter-language queries, I probably would have done my project with a Python main talking to a Prolog query interface, since all the third-party code I’m linking up with is either Python or C/C++.
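To make the stdio point concrete: every mainstream language can drive a line-based child process with a few lines of standard library code. Here’s a hedged Python sketch; the `swipl ... -t machine_query_interface` command line is the hypothetical one from above, so a trivial echo child stands in for it to keep the sketch runnable:

```python
import subprocess
import sys

# In a real integration the command line might be something like
# ["swipl", "/path/to/program.pl", "-t", "machine_query_interface"]
# (hypothetical name, per the naming suggestion above). A trivial
# echo child stands in for it here so the sketch runs anywhere.
child = subprocess.Popen(
    [sys.executable, "-c",
     "import sys\n"
     "for line in sys.stdin:\n"
     "    sys.stdout.write('echo: ' + line)\n"
     "    sys.stdout.flush()"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

child.stdin.write("member(X, [1,2,3]).\n")   # one line-based query out
child.stdin.flush()
reply = child.stdout.readline()              # one line-based reply back
child.stdin.close()
child.wait()
print(reply.strip())
```

No sockets, no port allocation, no firewall prompts – just pipes, which every general-purpose language has had forever.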

Other thoughts, in no particular order:

  • The JSON mapping of Prolog terms is more cumbersome than it probably needs to be. (It also needs to be documented, since I’m only getting this by looking at the protocol dumps you’ve provided.) My suggestion would be something along these lines:
Prolog type      Prolog representation  JSON representation               Notes
integer          1234                   1234                              1
float            1234.0                 1234.0                            1
atom             '1234'                 "1234"                            2
variable         _1234                  {"_":"_1234"}
string           "1234"                 {"\"":"1234"}
list             `1234`                 [49,50,51,52]
compound         foo(12,34)             {"foo":[12,34]}                   3
dict             foo{12:34}             {"{":["foo",{12:34}]}
improper list    [1,2,3|4]              {"[":[[1,2,3],4]}
unrepresentable  '['([1,2,3],4)         {"'":["compound","[",[1,2,3],4]}  4

[1] If it fits in a pointer-size signed integer (or in a double). Otherwise translate to an “unrepresentable” functor, because many JSON libraries (including, of course, Javascript itself) have limitations about the size of numbers they’ll read.
[2] Between atoms and SWI strings, atoms get a lot more use, so they get to use the bare JSON string representation.
[3] Any compound whose functor can be represented as an unquoted atom will be represented this way. Otherwise wrap with T0 =.. L, T1 =.. ['\'',compound|L].
[4] This can be used for any unrepresentable type, or any value outside the representable domain of one of the other above types.

This keeps the representation compact (not actually a small consideration, if you’re transferring large data sets), makes it more readable (functor names before arguments), and gives every value (modulo dicts, whose keys can be reordered, and floats which are weird) a single canonical, invariant representation - which means I don’t even need a JSON library, if my results are predictable. Thus:

true([[threads(language_server1_conn2_comm, language_server1_conn2_goal)]])
% Original:
{"args": [[[{"args": ["language_server1_conn2_comm", "language_server1_conn2_goal"], "functor":"threads"}]]], "functor":"true"}
% New:
{"true": [[[{"threads": ["language_server1_conn2_comm", "language_server1_conn2_goal"]}]]]}
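To illustrate the “I don’t even need a JSON library” claim, here’s a rough Python decoder for the compact encoding proposed above, turning it back into Prolog-syntax text. It covers only the simple rows of the table (dicts, improper lists, and the unrepresentable-functor escape are left out), and atom quoting is deliberately naive – it’s a sketch of the mapping, not a library:

```python
import json

# Decode the proposed compact term encoding into Prolog-syntax text.
# Handles numbers, atoms, variables, strings, lists, and compounds;
# dicts, improper lists, and the unrepresentable escape are omitted.
def compact_to_prolog(v):
    if isinstance(v, (int, float)):
        return str(v)                  # integer / float
    if isinstance(v, str):
        return v                       # atom (no quoting applied)
    if isinstance(v, list):
        return "[" + ",".join(compact_to_prolog(x) for x in v) + "]"
    (functor, args), = v.items()       # every object is single-key
    if functor == "_":                 # variable: {"_": "_1234"}
        return args
    if functor == '"':                 # string: {"\"": "1234"}
        return json.dumps(args)
    return functor + "(" + ",".join(compact_to_prolog(x) for x in args) + ")"

msg = '{"true": [[[{"threads": ["language_server1_conn2_comm", "language_server1_conn2_goal"]}]]]}'
print(compact_to_prolog(json.loads(msg)))
# true([[threads(language_server1_conn2_comm,language_server1_conn2_goal)]])
```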
  • I’d suggest using the same base encoding in both directions. This doesn’t necessarily mean you need to translate your query from Prolog-syntax into JSON-syntax, but even something like {"query":"atom(a)"} gives you a good way to extend the command set without having to create new predicates to handle command variants or fall back on the (quite idiosyncratic) Prolog option-list convention. Plus, you could even do away with the byte-length prefix if you wanted, and just make the protocol line-oriented.
  • I think it’s a mistake for the default run to fetch all solutions to a query. Given that this is intended as a lightweight way to use Prolog in a project that has a different primary language (and thus by definition will have developers familiar with the primary language, but not necessarily with Prolog), you’re one run(member(foo, L), -1) away from the connection stalling while Prolog gobbles up all the system memory it possibly can. Do what the console toplevel does, return the first result, and indicate there are more results that can be fetched if desired. If the client runs another query while results are pending, just discard them.
  • Return variable bindings as a JSON object rather than a list of =(Var, Value) functors. Don’t make the possibly-rudimentary client library do the work, the whole point is to reduce the client library footprint.
  • Provide for the ability to bind variables to pre-query values without using Prolog syntax, to avoid injection vulnerabilities. Something like {"query":"player_exists(X)", "bindings":{"X": "Johnny\"),abort. %"}} will save you a lot of headaches.
  • It’d be nice to have the option to pass along an ID with every command that the server will echo back with the reply. That avoids potential desync of the command/response pattern, and it also opens the door for:
  • Make run_async actually async. Right now it’s just a long-form for run with a server-side results cursor, which as I mentioned ought to be a function of the basic run command. The purpose of async is to allow you to do other things while waiting for the long-running process to complete, and currently you can’t issue any other queries while the async one is running. (Obviously the client program can do other things while the query is running. It’s in a separate process.) Prolog has great coroutining facilities, use them to let you issue multiple queries and let them run independently, or at least sequentially. Use command IDs (or have the server return a query ID in the run_async reply) to distinguish which async_result the client wants to fetch, and then the server can start the engine working on the next result for that query after it returns this one.
  • Support or at least provide room in the protocol for push messages from the server. Without server-initiated messages, the only way to figure out when an async operation is finished is to poll repeatedly or issue a long-timeout fetch, at which point you’re no longer async. (Bonus: your ..... keepalives can just be turned into a subclass of push message.) Push messages could include “results available for async query 5” but also “data written to user_output: hello world”. Especially important if using stdin/stdout comms.
  • Allow access to and control of multiple threads as first-class options in the protocol, rather than requiring a separate OS-level connection for each thread. This could tie in nicely with the stdin/stdout comm layer, so you wouldn’t need to pull in networking just to have separate Prolog threads. Blocking protocol commands should block on a per-thread basis, so you can have each thread doing a run, and they’ll each return their results (sensibly multiplexed) when available.
  • Add named pipes and raw file descriptors to the list of comm layers, possibly. They’re available on Windows and on Unixlikes, and they also represent a much lower barrier of entry, while still enabling the simplex thread-per-connection model. It could be something as simple as issuing a “here I made a FIFO pair, please connect to it” command to an already-connected thread.
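Several of the suggestions above (same JSON encoding in both directions, command IDs, out-of-band variable bindings) combine naturally into a single message shape. A hedged Python sketch – the `make_command` helper and the message fields are my assumptions for illustration, not the current protocol:

```python
import json

# Sketch combining several suggestions above: a client-chosen "id"
# the server echoes back in its reply, the query as plain Prolog
# text, and pre-bound variables passed out-of-band so no Prolog
# quoting or escaping is ever needed on the client side.
def make_command(cmd_id, query, bindings=None):
    msg = {"id": cmd_id, "query": query}
    if bindings:
        msg["bindings"] = bindings
    return json.dumps(msg)

# A value that would be an injection if spliced into Prolog text:
wire = make_command(7, "player_exists(X)", {"X": 'Johnny"),abort. %'})
parsed = json.loads(wire)                 # server side: lossless
assert parsed["bindings"]["X"] == 'Johnny"),abort. %'
print(wire)
```

Because the hostile string travels as a JSON value rather than as Prolog source text, the server can bind it to X directly and nothing ever needs to be escaped.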

Right! That’s a lot, I know. Most of it is opt-in on the part of the client, though, so it shouldn’t make a big difference in how much code is needed for the foreign-language shim layer. And these are all only suggestions! Feel free to use or ignore as you see fit, I just wanted to offer some ideas to consider :slight_smile:

Yes, some libraries (and languages) have support for it. But it often requires explicit calling or handling; in JS, for example, a BigInt is a completely different type than a Number. If you try to parse a huge JSON number in the raw, it’ll show up as floating point, and at that point you’re screwed, because you’ve lost the value (due to conversion) and you have no idea where the representation is in the big mess of JSON. Representing the number in string form allows the client to decide whether to opt in to the special parsing/handling modes, if the client language has support for it.
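The precision loss is easy to demonstrate. Python’s own json module keeps integers exact (Python ints are unbounded), so to see what a double-based parser like JavaScript’s does, we can force integers through float with the parse_int hook:

```python
import json

big = 2**53 + 1   # 9007199254740993: one past the largest integer a
                  # 64-bit double can represent exactly

# Python's json parses it exactly...
assert json.loads(str(big)) == big

# ...but a double-based parser (JavaScript's JSON.parse; simulated
# here with parse_int=float) silently rounds, and the value is gone:
as_double = json.loads(str(big), parse_int=float)
print(as_double)   # 9007199254740992.0

# Shipping the number as a JSON string instead lets the client opt in
# to exact handling (int, BigInt, ...) where its language supports it:
assert int(json.loads('"9007199254740993"')) == big
```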

Note 1 applies to two different items, integer rep and float rep. The “double” part was for the float rep, not the integer. And JavaScript’s Number is a double across the board, so integers are only exact up to 2^53 (and the bitwise operators truncate to 32 bits).

Also, these are implementation details that probably don’t need to be ironed out here until and unless @ericzinda decides he wants to go with the type of representation I was suggesting :slightly_smiling_face:

Oooh, I like that. I haven’t messed around with the GDB/MI, but perusing it it seems very similar in how it is used. I also like using the term “interface” as it gets around all of the “server/service” stuff and doesn’t lead people down the path of thinking this is a centralized server they should run. Nice!

I’ll see if @jan has any objections when he’s back online and, if not, go with that name for both the Prolog predicate (machine_query_interface/N) and the Python library: PrologMQI.

Thanks!

Edit: added to the next feature push here: Rename language_server and swiplserver · Issue #9 · EricZinda/swiplserver · GitHub


Yeah. Maybe it’s my non-Unix background, or all the crazy issues I’ve had in the past using STDIO and what seem to be the byzantine challenges there (especially cross-platform), but my gut was that sockets would actually be easier. That was partially confirmed by the number of Google queries I did just trying to get swipl to launch correctly from Python with correctly quoted arguments…

Ignoring my whining, though, I definitely agree I need to investigate it. @Jan has mentioned it as well. It doesn’t necessarily complicate the interface since the library doesn’t have to use it (unless the library supports both TCP and command line, which I suspect many might feel they have to…). And: it may turn out to be simpler as you say :slight_smile: .

I’ll add that to the list to think about and investigate: Support STDIO interface in addition to sockets · Issue #10 · EricZinda/swiplserver · GitHub

I just used the builtin json_to_prolog/2 since that is what is used by the http package already. You’re right that I should have pointed at that in the docs. Need to document the JSON format used · Issue #11 · EricZinda/swiplserver · GitHub

I believe using the same encoding there is good for the platform as a whole. If we want to fix that and add an optional different encoding, it seems like both could benefit. I don’t think I want to take this one on, but I would use it if someone did :slight_smile: .

The ironic thing is that this format is actually what I use internally in my big use of swiplserver at the moment. So I agree with you that it is nicer; I just wanted to keep it consistent with the rest of the platform. I convert with this function before I use the JSON:

    def ConvertStdJson(self, value):
        # Convert the http/json_convert "std" term encoding
        # ({"functor": F, "args": [...]}) into the compact {F: [...]}
        # form, recursing through lists and leaving leaves unchanged.
        if isinstance(value, dict):
            return {value["functor"]: self.ConvertStdJson(value["args"])}
        elif isinstance(value, list):
            return [self.ConvertStdJson(x) for x in value]
        else:
            return value

Edit: Consider adding an option to use a different JSON Format · Issue #12 · EricZinda/swiplserver · GitHub

A few things here:

Passing the string length along with the command lets us get around the issue that Prolog’s read_term/3 will hang waiting for more characters if the caller sends a Prolog term that is invalid in that it isn’t finished – something like my_atom(foo, bar. With the length known up front, we can read the whole message and hand it to read_term_from_atom/3, which will throw instead. That way, we can report on invalid Prolog without resorting to something like a timeout. It just seemed cleaner.
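To make the tradeoff concrete, here’s a hedged sketch of length-prefixed framing in Python. The `<byte length>.\n<payload>` shape is my reading of the protocol and is assumed here for illustration, not taken verbatim from the spec:

```python
import io

# Sketch of why a length prefix helps: the reader always knows
# exactly how many bytes to consume, so an unterminated term like
# "my_atom(foo, bar" can be handed to the parser whole (and rejected)
# instead of leaving read_term/3 blocked waiting for the closing ")".
# The "<byte length>.\n<payload>" framing is an assumption.
def frame(message: str) -> bytes:
    payload = message.encode("utf-8")
    return str(len(payload)).encode("ascii") + b".\n" + payload

def unframe(stream) -> str:
    header = b""
    while not header.endswith(b".\n"):   # read the "<digits>.\n" header
        header += stream.read(1)
    length = int(header[:-2].decode("ascii"))
    return stream.read(length).decode("utf-8")

wire = frame("my_atom(foo, bar")         # an incomplete term, but...
msg = unframe(io.BytesIO(wire))          # ...still framed cleanly
print(msg)                               # my_atom(foo, bar
```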

My intention here was to have a tiny number of commands to keep the overhead of writing a new language library very low. So I’m hoping to not have to wish that it were easier to add more :slight_smile:

I did debate making the input JSON instead of a string, though, just for consistency. In my usage of the system, strings seemed more natural from the Python side, but that’s just my usage. I’ll add an issue to the list and see what other folks think as they use it.

BTW: Let’s do any further discussion using the issues at GitHub since I fear this thread will become unreadable…

My thoughts on this were:

Thought 1: I strongly suspect that people new to building a system with Prolog will start by iterating in the REPL toplevel a bunch until they get their Prolog code into approximately the right form. Otherwise, it will be just too painful and slow to get things right. Once in roughly the right form, they’ll switch to the Python side. So the queries they are running will mostly be things they’ve already tried out.

Thought 2: I also suspect that it is rare to want to run a query and only use the first result (otherwise you’d use once/1). If my first thought is right, then you’ll normally want all the answers by default, unless you want to stream the answers one by one. And for that case you’ll use the async version.

The challenge here is that several things can get returned: true([with answers]), false, or an exception(). It seemed to me that keeping a consistent model, where the result on the Prolog side is a single term that gets converted to JSON using the standard serialization, simplified understanding of the protocol and made handling of the responses on the language-library side easier.

It also doesn’t feel too onerous on the Python side, for example. Here’s my library code that converts the true() answers into a form that is more like what I think you’re suggesting.

answerList = []
for answer in prolog_args(jsonResult)[0]:
    if len(answer) == 0:
        # No variable bindings: the goal simply succeeded
        answerList.append(True)
    else:
        answerDict = {}
        for answerAssignment in answer:
            # These will all be =(Variable, Term) terms
            variable, term = prolog_args(answerAssignment)
            answerDict[variable] = term
        answerList.append(answerDict)
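The other two top-level responses can be handled the same way. Here’s a sketch that dispatches on the functor of the std-format JSON described above; the `handle_response` name and the exact exception payload are my assumptions for illustration:

```python
# Sketch of dispatching on the three top-level response terms the
# protocol can return (true/1, false, exception/1), in the same
# {"functor": ..., "args": [...]} "std" JSON shape as the loop
# above. The exception payload here is an assumption.
def handle_response(jsonResult):
    functor = jsonResult["functor"]
    if functor == "true":
        return jsonResult["args"][0]   # the list of answers
    if functor == "false":
        return []                      # the query failed: no answers
    if functor == "exception":
        raise RuntimeError(jsonResult["args"][0])
    raise ValueError("unexpected response: " + functor)

print(handle_response({"functor": "false", "args": []}))   # []
```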

I broke this out into a separate thread for discussion: Language_server feedback: bind variables to pre-query values to avoid injection vulnerabilities

So, if you yourself use that format because it’s more convenient, why is the protocol sending a format that’s less convenient? If your goal is to make it easy to write new language glue libraries, shouldn’t the protocol be as close to the intended usage as possible?

Remember, no one outside the Prolog community has used the builtin Prolog term conversion functions (or, most likely, any of the rest of it), so I feel like it’d make more sense to base your communications protocol around what is standard for the rest of the tech world, not around what is standard for Prolog.

I guess I’d argue that run_async is async, from a “the client is not synchronously waiting for a result” point of view, but I see your point.

My thinking here was to keep the interface as straightforward as possible. Each connection represents a thread that can run one query at a time. The async API on a thread allows you to retrieve answers without waiting (i.e. asynchronously) as they are available. If you want to run queries concurrently, you can create a new connection (i.e. thread) and run them concurrently.

I see the benefit, though, of having a way to fire off a bunch of queries on one thread and checking back to see when they are done. Certainly the library writer, or the developer, could each do this at their layer, given what is provided already.

OK, let me think about this a bit, I do agree it is going to be something people will want. One way or another.

That’s exactly why I suggest wrapping the input in JSON. You can’t depend on people knowing the right format of a Prolog term, but everyone knows (by which I mean there are libraries available for basically every language) how to read and write JSON. That way you’re never in doubt as to what the intended input is.

You may be underestimating how intimidating the Prolog REPL is to newcomers :slightly_smiling_face: My expectation would be that, if this project does take off and more people start using it, most people’s introduction will probably be in “hey, there’s this really cool package that lets you add programmable AI to your project, you just have to put this one library in there and make these method calls and it’ll just work”.

Really, it points to what the target audience is, and only you can answer that. If the target audience is “Prolog developers who want to be able to work in another language”, then it makes sense that you should do things with Prolog syntax and expect people to use Prolog constructs like once/1. It sounded to me, though, like you want your audience to be “Application developers who don’t know Prolog but find it interesting and want to try it out in conjunction with something they’re familiar with,” so avoiding penalizing newbie missteps like “didn’t put a period on the end of a term” or “didn’t realize that the non-determinacy of this predicate was a problem” will make things friendlier for new Prolog developers.

I remembered this but haven’t read it recently. I don’t know how up to date it still is, but the ideas are in the same concept area.

“Syntactic integration of external languages in Prolog” by Jan Wielemaker and Nicos Angelopoulos (pdf)


If you read it and think others should know about it, perhaps add it to Useful Prolog references.

Conversely, you can run a goal over all results using forall/2 or concurrent_forall/2.

Love reading these threads !

I think much will depend on the IDE in use – how well it supports debugging in an integrated way.

Otherwise, from an engineering perspective, a developer will probably want to work in a Prolog development environment with a proxy/surrogate client in Prolog and related test cases to get the AI working before writing the Python application code.

Otherwise, there would be too many moving parts that can go wrong, to deal with concurrently.

just my hunch …

Dan