Prolog Language Server: Enabling swipl integration with Python and other languages

Your example code is:

while True:
    result = prolog_thread.query_async_result()
    if result is None:
        break
    else:
        print(result)

It’s more “Pythonic” to package this as an iterator:

for result in prolog_thread.query_iter():
    print(result)
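A generator makes this wrapper almost free to write. A minimal sketch, assuming the `query_async_result()` method shown above (`query_iter` itself is a hypothetical name, not part of the current library):

```python
def query_iter(prolog_thread):
    """Hypothetical helper: wrap the query_async_result() polling
    loop in a generator so callers can simply iterate."""
    while True:
        result = prolog_thread.query_async_result()
        if result is None:  # None signals there are no more results
            return
        yield result
```

With this in place, the caller's loop collapses to `for result in query_iter(prolog_thread): print(result)`.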

Also, you might want to consider an interface that is truly asynchronous, so that it works with await. (I’m not a big fan of await, but it seems that a lot of people are.)
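One way such an awaitable interface could look: run the blocking call in the event loop's executor and expose the results as an async iterator. This is only a sketch under the assumption that `query_async_result()` is a blocking call, as above:

```python
import asyncio

async def aresults(prolog_thread):
    """Hypothetical async wrapper: run the blocking
    query_async_result() call in a thread-pool executor so the event
    loop stays free, and yield results until None."""
    loop = asyncio.get_running_loop()
    while True:
        result = await loop.run_in_executor(
            None, prolog_thread.query_async_result)
        if result is None:
            return
        yield result

# Usage (inside an async function):
#     async for result in aresults(prolog_thread):
#         print(result)
```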

I guess PDT [1] is also a language server in this sense.

I think PDT offers a Java API [2] to access swi-prolog, which is then used to build the PDT Eclipse plugin.

Dan

[1] https://sewiki.iai.uni-bonn.de/research/pdt/docs/start
[2] https://sewiki.iai.uni-bonn.de/research/pdt/connector/start

Yeah, I agree. I planned to add that as an alternative as I iterate the code forward, but forgot to put it in the backlog. Added it in there so it’ll be in the next turn of the crank. Thanks Peter!

Thank you, I was about to point this out myself. @ericzinda, @jan, can we get this naming collision sorted out?


I was taking the fact that “language server” has other meanings in the wild as a small thing (as Peter said), given that reading even a bit of the docs (I think) makes it clear that it’s not an IDE plug-in. Do we have IDE language servers in the build (or on the books somewhere) that are going to collide and cause confusion?

Naming this thing was a little challenging. I felt like the broader suggested names like “Prolog server” or “SWI-Prolog server” were too broad: they would steal that name from something more grandiose, and they misrepresent what the language_server is actually doing.

The best I came up with was “json_server” since it isn’t too broad and at least explains functionally what is going on. Any other suggestions?

Edit: the other problem with “json_server” is that we may want other protocols at some point (Jan suggested a binary one, for example), so then that would be confusing…

FWIW, I have a simple example program of a Javascript web front-end communicating with a SWI-Prolog backend, using JSON (and Javascript “fetch”). There’s a bit of discussion here: Simple Prolog server with JavaScript client

and the repo is here: GitHub - kamahen/swipl-server-js-client: Sample SWI-Prolog server with JavaScript client

I mean, the thing I instantly thought of when I saw the topic title was that someone had written an LSP server for Prolog, which I was very excited about because it meant cleaner IDE integration as I work on Prolog stuff, until I realized it wasn’t actually that. Of course, I’ve personally done work with LSP clients and servers in the past, so I may be a bad example? Or maybe it means I’m a good one, who knows.

Amusingly enough, I’m doing roughly the same thing in the opposite direction for a personal project of mine. I need to integrate Prolog and Python, though in my case I decided to make Prolog the controlling agent, rather than Python, so I wrote a very tiny Python shim script that Prolog can send queries to. Very similarly, queries are sent in a decimal-length-prefixed string format, and replies are in JSON. Funny!

Anyway, as far as names go, perhaps something like “query_responder” would work? That avoids the issue entirely, and it also means there’s no assumption that this is necessarily a long-running process of some sort (as opposed to what it is, a component of a single running application that happens to be compartmentalized into a separate system process). You could also call it “prolog_oracle” if you wanted to get all theory-of-compsci on it :slightly_smiling_face:

I do have some thoughts on the library/implementation itself, but I wanted to check, before I start diving into it - what kind of feedback are you interested in getting on the idea, and what kinds of changes are you open to making (beyond, I’m hoping, the name)?

Another name that just popped into my head, if you wanted to stick with the “server” nomenclature - perhaps integration_server or prolog_integration_server, since the primary goal is to integrate prolog with other languages?

Yeah, it seemed like a nice simple approach. Nice! I’d love to get the reverse at some point too: a javascript library talking to “language_server” (modulo name). Don’t know if I can interest you in that… :slight_smile: Next up for me was writing a C# binding to it.

I am open to changing the name if we can come up with a better one, for sure.

As for other changes: my goal with this component is to keep the native language interface (i.e. the protocol) as small and simple as possible so we can get lots of different languages integrating with Prolog without a huge amount of effort writing the native language libraries. So I guess that means:

  • Internal fixes for perf, security, cleanliness, etc are fair game if they don’t change the protocol.
  • Protocol changes are fair game if they simplify the interface or add key missing functionality that is worth the cost they incur in writing the native language libraries.
  • I view the Python library that calls the “language_server” as more open to feedback/enhancements since it only affects that component and not all future languages.

Does that make sense?

When I’m evaluating names, I’m thinking of two key facets:

  • Designed for using Prolog as a local implementation detail
  • “Easy” to integrate into a new language

OK, here’s what I think we have so far, including some I just added. I roughly sorted them with “my personal preferred” on top – no disrespect meant to suggesters!:

  • language_query_service
  • local_query_service
  • query_responder (I like that it gets rid of ‘server/service’)
  • query_service (shorter but sounds a bit more grandiose)
  • prolog_oracle (I like the idea of just picking a name like “oracle” – kind of like “pengines” did)
  • language_server (obvious issues above)
  • local_binding_service (pretty vague)
  • local_server (seems a bit too broad and vague)
  • json_server (may support different protocols some day, also maybe too central)
  • prolog_server (sounds like a central server)
  • swi_prolog_server (ditto)

How do you guys feel about “language_query_service”? Seems like it gets the point across and doesn’t seem too grandiose and maybe far enough away from “language server”?

Or “local_query_service” if we feel like it still might get confused?

Very cool! Looking through the code, it seems like this is a great tutorial showing how to use the SWI-Prolog HTTP functionality and build the start of a local server that can serve static pages as well as answer Prolog queries. If I’m understanding correctly, it seems like you were targeting a different design center with that than the “language_server” (or whatever we call it).

Were you bringing it up as an example of something that could be used to compare notes on and do possible code borrowing?

(aside, if you’re looking for an LSP for Prolog, I have one I’ve been developing here)


The HTTP functionality is there for serving queries (the queries themselves are at swipl-server-js-client/simple_client.js at 9494950e461105f41e7a7f2b1123e1876aa2a057 · kamahen/swipl-server-js-client · GitHub), which just uses the text that a user types in – obviously this is not very safe. The query is handled here: swipl-server-js-client/simple_server.pl at 9494950e461105f41e7a7f2b1123e1876aa2a057 · kamahen/swipl-server-js-client · GitHub

Because of XSS/same-origin security restrictions, it’s simplest to serve both the query results and the static HTML from the same server, so the static HTML server code is also in the Prolog server.

I didn’t understand some stuff in the client-server tutorial, so I wrote this as a kind of test code, then stripped it down and added some documentation (which evidently isn’t good enough). I then used the code in a more complex server.

You might want to think of using the word “interface”, then, since in most OO languages that means “a thing you can use without knowing the implementation details”. I’m also put in mind of GDB/MI, gdb’s machine interface protocol, so that’s another point there. And, since this is effectively a replacement UI for Prolog’s classic REPL toplevel, perhaps call the toplevel predicate machine_query_interface and call the various other-language libraries “Prolog-MQI”? That reinforces that you’re only exposing half of the language (the query half, not the programming half), and a command line like

swipl /path/to/program.pl -t machine_query_interface

makes it incredibly clear what you’re getting (a way to send queries to the Prolog engine for this particular program) and how it’s supposed to be used (connect it to another piece of software). “Prolog-MQI” is also very googleable, which more software really ought to take into account (there’s a reason I stopped using the Awesome Window Manager and it has a lot to do with not being able to find documentation and assistance online :rofl:).

It also opens up the concept space a bit, which I think is welcome. If this is intended to make Prolog become just a local implementation detail, then it needs to be able to be transparently replaced with something else, whether that’s another prolog engine, a unit test mock, or even an app written in some other language that’s still able to respond to some sort of query with a similar kind of machine interface. That’s going to be a big draw for someone that might be on the fence about implementing part of their project in Prolog, because it leaves their options open.

Happy to! Shouldn’t be tough, once the protocol is finalized, and it’s always nice to play around in other languages :slight_smile:

As for the protocol itself, let’s see. Since this is a two-way, line-based text protocol, I think you’re missing something by not including a direct stdin/stdout interface. It should probably be the default for the arity-0 toplevel predicate unless you’ve specified otherwise in code, so you can use it as above. Requiring socket/network access to be able to use this makes for a significantly more difficult integration, depending on the language, but I can’t think of a single general-purpose language that doesn’t have a simple, builtin way to access standard input and output. Yes, that might restrict you to a single connection stream (though see below), but I imagine many use cases would only need one. Case in point: if pengines hadn’t been such a hassle to get up and running for inter-language queries, I probably would have done my project with a Python main talking to a Prolog query interface, since all the third-party code I’m linking up with is either Python or C/C++.
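To make the “simple, builtin” point concrete: talking to a child process over stdin/stdout takes only a few lines of standard-library Python. The sketch below uses a trivial echo child as a stand-in; with a real server you would launch something like `["swipl", "program.pl", "-t", "toplevel_goal"]` instead (those flags are an assumption for illustration):

```python
import subprocess
import sys

# Stand-in child: echoes each line back, the way a stdio query
# interface would read a command and write a reply.
child = subprocess.Popen(
    [sys.executable, "-c",
     "import sys\n"
     "for line in sys.stdin:\n"
     "    sys.stdout.write(line)\n"
     "    sys.stdout.flush()\n"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

child.stdin.write("atom(a)\n")   # send a "query"
child.stdin.flush()
reply = child.stdout.readline().strip()  # read the "answer"

child.stdin.close()
child.wait()
```

No sockets, ports, or connection handshakes are involved, which is the appeal of a stdio transport for single-connection use cases.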

Other thoughts, in no particular order:

  • The JSON mapping of Prolog terms is more cumbersome than it probably needs to be. (It also needs to be documented, since I’m only getting this by looking at the protocol dumps you’ve provided.) My suggestion would be something along these lines:
Prolog type      Prolog representation   JSON representation                 Notes
integer          1234                    1234                                1
float            1234.0                  1234.0                              1
atom             '1234'                  "1234"                              2
variable         _1234                   {"_":"_1234"}
string           "1234"                  {"\"":"1234"}
list             `1234`                  [49,50,51,52]
compound         foo(12,34)              {"foo":[12,34]}                     3
dict             foo{12:34}              {"{":["foo",{12:34}]}
improper list    [1,2,3|4]               {"[":[[1,2,3],4]}
unrepresentable  '['([1,2,3],4)          {"'":["compound","[",[1,2,3],4]}    4

[1] If it fits in a pointer-size signed integer (or in a double). Otherwise translate to an “unrepresentable” functor, because many JSON libraries (including, of course, Javascript itself) have limitations about the size of numbers they’ll read.
[2] Between atoms and SWI strings, atoms get a lot more use, so they get to use the bare JSON string representation.
[3] Any compound whose functor can be represented as an unquoted atom will be represented this way. Otherwise wrap with T0 =.. L, T1 =.. ['\'',compound|L].
[4] This can be used for any unrepresentable type, or any value outside the representable domain of one of the other above types.

This keeps the representation compact (not actually a small consideration, if you’re transferring large data sets), makes it more readable (functor names before arguments), and gives every value (modulo dicts, whose keys can be reordered, and floats which are weird) a single canonical, invariant representation - which means I don’t even need a JSON library, if my results are predictable. Thus:

true([[threads(language_server1_conn2_comm, language_server1_conn2_goal)]])
% Original:
{"args": [[[{"args": ["language_server1_conn2_comm", "language_server1_conn2_goal"], "functor":"threads"}]]], "functor":"true"}
% New:
{"true": [[[{"threads": ["language_server1_conn2_comm", "language_server1_conn2_goal"]}]]]}
  • I’d suggest using the same base encoding in both directions. This doesn’t necessarily mean you need to translate your query from Prolog-syntax into JSON-syntax, but even something like {"query":"atom(a)"} gives you a good way to extend the command set without having to create new predicates to handle command variants or fall back on the (quite idiosyncratic) Prolog option-list convention. Plus, you could even do away with the byte-length prefix if you wanted, and just make the protocol line-oriented.
  • I think it’s a mistake for the default run to fetch all solutions to a query. Given that this is intended as a lightweight way to use Prolog in a project that has a different primary language (and thus by definition will have developers familiar with the primary language, but not necessarily with Prolog), you’re one run(member(foo, L), -1) away from the connection stalling while Prolog gobbles up all the system memory it possibly can. Do what the console toplevel does, return the first result, and indicate there are more results that can be fetched if desired. If the client runs another query while results are pending, just discard them.
  • Return variable bindings as a JSON object rather than a list of =(Var, Value) functors. Don’t make the possibly-rudimentary client library do the work, the whole point is to reduce the client library footprint.
  • Provide for the ability to bind variables to pre-query values without using Prolog syntax, to avoid injection vulnerabilities. Something like {"query":"player_exists(X)", "bindings":{"X": "Johnny\"),abort. %"}} will save you a lot of headaches.
  • It’d be nice to have the option to pass along an ID with every command that the server will echo back with the reply. That avoids potential desync of the command/response pattern, and it also opens the door for:
  • Make run_async actually async. Right now it’s just a long-form for run with a server-side results cursor, which as I mentioned ought to be a function of the basic run command. The purpose of async is to allow you to do other things while waiting for the long-running process to complete, and currently you can’t issue any other queries while the async one is running. (Obviously the client program can do other things while the query is running. It’s in a separate process.) Prolog has great coroutining facilities, use them to let you issue multiple queries and let them run independently, or at least sequentially. Use command IDs (or have the server return a query ID in the run_async reply) to distinguish which async_result the client wants to fetch, and then the server can start the engine working on the next result for that query after it returns this one.
  • Support or at least provide room in the protocol for push messages from the server. Without server-initiated messages, the only way to figure out when an async operation is finished is to poll repeatedly or issue a long-timeout fetch, at which point you’re no longer async. (Bonus: your ..... keepalives can just be turned into a subclass of push message.) Push messages could include “results available for async query 5” but also “data written to user_output: hello world”. Especially important if using stdin/stdout comms.
  • Allow access to and control of multiple threads as first-class options in the protocol, rather than requiring a separate OS-level connection for each thread. This could tie in nicely with the stdin/stdout comm layer, so you wouldn’t need to pull in networking just to have separate Prolog threads. Blocking protocol commands should block on a per-thread basis, so you can have each thread doing a run, and they’ll each return their results (sensibly multiplexed) when available.
  • Add named pipes and raw file descriptors to the list of comm layers, possibly. They’re available on Windows and on Unixlikes, and they also represent a much lower barrier of entry, while still enabling the simplex thread-per-connection model. It could be something as simple as issuing a “here I made a FIFO pair, please connect to it” command to an already-connected thread.
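The command-ID suggestion above can be sketched as a thin client-side table that tags every outgoing command with a fresh ID and routes replies back by that ID. All names and the message shapes here are hypothetical, purely to illustrate the bookkeeping:

```python
import itertools
import json

class PendingQueries:
    """Hypothetical client-side bookkeeping for ID-tagged commands:
    every outgoing command carries a fresh id, and replies (or push
    messages) are matched back to their query by that id."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}

    def send(self, goal):
        """Build the wire message for an async query; in a real
        client this JSON would be written to the connection."""
        qid = next(self._ids)
        self._pending[qid] = goal
        return json.dumps({"id": qid, "run_async": goal})

    def on_reply(self, raw):
        """Route an incoming reply back to the query it answers."""
        msg = json.loads(raw)
        goal = self._pending.pop(msg["id"])
        return goal, msg.get("result")

pq = PendingQueries()
wire = pq.send("member(X, [1,2,3])")
```

Because replies carry the ID, the client no longer depends on strict command/response ordering, and multiple async queries can be in flight at once.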

Right! That’s a lot, I know. Most of it is opt-in on the part of the client, though, so it shouldn’t make a big difference in how much code is needed for the foreign-language shim layer. And these are all only suggestions! Feel free to use or ignore as you see fit, I just wanted to offer some ideas to consider :slight_smile:

Yes, some libraries (and languages) have support for it. But it often requires explicit calling or handling; in JS, for example, a BigInt is a completely different type than a Number. If you try to parse a huge JSON number in the raw, it’ll show up as floating point, and at that point you’re screwed, because you’ve lost the value (due to conversion) and you have no idea where the representation is in the big mess of JSON. Representing the number in string form allows the client to decide whether to opt in to the special parsing/handling modes, if the client language has support for it.

Note 1 applies to two different items, integer rep and float rep. The “double” part was for the float rep, not the integer. And Javascript Numbers can only represent integers exactly up to 53 bits, which falls well short of a 64-bit pointer-size integer.

Also, these are implementation details that probably don’t need to be ironed out here until and unless @ericzinda decides he wants to go with the type of representation I was suggesting :slightly_smiling_face:

Oooh, I like that. I haven’t messed around with GDB/MI, but perusing it, it seems very similar in how it is used. I also like using the term “interface”, as it gets around all of the “server/service” stuff and doesn’t lead people down the path of thinking this is a centralized server they should run. Nice!

I’ll see if @jan has an objections when he’s back online and, if not, go with that name for both the Prolog predicate (machine_query_interface/N), and the Python library: PrologMQI.

Thanks!

Edit: added to the next feature push here: Rename language_server and swiplserver · Issue #9 · EricZinda/swiplserver · GitHub


Yeah. Maybe it’s my non-Unix background, or all the crazy issues I’ve had in the past using STDIO and what seem to be the byzantine challenges there (especially cross-platform), but my gut was that sockets would actually be easier. Partially confirmed by the number of Google queries I did just trying to get swipl to launch correctly from Python with correctly quoted arguments…

Ignoring my whining, though, I definitely agree I need to investigate it. @Jan has mentioned it as well. It doesn’t necessarily complicate the interface since the library doesn’t have to use it (unless the library supports both TCP and command line, which I suspect many might feel they have to…). And: it may turn out to be simpler as you say :slight_smile: .

I’ll add that to the list to think about and investigate: Support STDIO interface in addition to sockets · Issue #10 · EricZinda/swiplserver · GitHub

I just used the builtin json_to_prolog/2 since that is what is used by the http package already. You’re right that I should have pointed at that in the docs. Need to document the JSON format used · Issue #11 · EricZinda/swiplserver · GitHub

I believe using the same encoding there is good for platform consistency. If we want to fix that and add an optional different encoding, it seems like both could benefit. Don’t think I want to take this one on, but would use it if someone did :slight_smile: .

The ironic thing is that this format is actually what I use internally in my main use of swiplserver at the moment. So I agree with you it is nicer; I just wanted to keep it consistent with the rest of the platform. I convert with this function before I use the JSON:

    def ConvertStdJson(self, value):
        # Recursively rewrite {"functor": F, "args": A} nodes into
        # the compact {F: A} form; lists are converted element-wise
        # and everything else passes through unchanged.
        if isinstance(value, dict):
            return {value["functor"]: self.ConvertStdJson(value["args"])}
        elif isinstance(value, list):
            return [self.ConvertStdJson(x) for x in value]
        else:
            return value
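For reference, here is a standalone version of that conversion (method lifted into a plain function, with a guard for dicts that aren’t functor terms), applied to the `true(...)` reply shown earlier in the thread:

```python
def convert_std_json(value):
    """Standalone sketch of the conversion: turn the standard
    {"functor": F, "args": A} representation into the compact
    {F: A} form, recursing through lists."""
    if isinstance(value, dict) and "functor" in value:
        return {value["functor"]: convert_std_json(value["args"])}
    if isinstance(value, list):
        return [convert_std_json(x) for x in value]
    return value

original = {"functor": "true",
            "args": [[[{"functor": "threads",
                        "args": ["language_server1_conn2_comm",
                                 "language_server1_conn2_goal"]}]]]}
compact = convert_std_json(original)
```

`compact` comes out as the `{"true": [[[{"threads": [...]}]]]}` form shown in the earlier example.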

Edit: Consider adding an option to use a different JSON Format · Issue #12 · EricZinda/swiplserver · GitHub

A few things here:

Passing the string length along with the command lets us get around the issue that Prolog’s read_term/3 will hang waiting for more characters if the caller sends an invalid (i.e. unfinished) term, something like my_atom(foo, bar. With the length known up front, we can read the whole string and hand it to read_term_from_atom/3, which throws instead. That way we can report invalid Prolog without resorting to something like a timeout. Just seemed cleaner.
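The benefit of the length prefix is easy to show in a few lines: because the reader always knows exactly how many bytes to consume, even an unfinished term reaches the parser whole instead of leaving a blocking read stuck. The framing below (decimal byte count, a period, then the payload) is an illustration, not necessarily the exact wire format:

```python
import io

def write_message(out, text):
    # Frame a message as: decimal byte length, '.', then the payload.
    data = text.encode("utf-8")
    out.write(str(len(data)).encode("ascii") + b"." + data)

def read_message(inp):
    # Consume the length first, then exactly that many bytes, so the
    # reader never blocks waiting for a term's closing ')'.
    digits = b""
    while True:
        ch = inp.read(1)
        if ch == b".":
            break
        if not ch:
            raise EOFError("stream closed before length prefix ended")
        digits += ch
    return inp.read(int(digits)).decode("utf-8")

# Round trip through an in-memory buffer; a socket or stdio stream
# would work the same way.
buf = io.BytesIO()
write_message(buf, "my_atom(foo, bar")  # unfinished term, still framed
buf.seek(0)
message = read_message(buf)
```

The unfinished term is delivered intact to the receiving side, which can then reject it with a parse error rather than hanging.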

My intention here was to have a tiny number of commands to keep the overhead of writing a new language library very low. So I’m hoping to not have to wish that it were easier to add more :slight_smile:

I did debate making the input JSON instead of a string, though, just for consistency. In my usage of the system strings seemed more natural from the Python side, but that just my usage. I’ll add an issue to the list and see what other folks think as they use it.

BTW: Let’s do any further discussion using the issues at GitHub, since I fear this thread will become unreadable…