Prolog Language Server: Enabling swipl integration with Python and other languages

I’ve been using SWI-Prolog with Python for a couple of years now. When I first started, I had trouble finding a Python library that worked well for me and wanted to find a simple approach I could use for integrating Python and SWI-Prolog. After consulting with @Jan, we settled on a model that would be simple to integrate with many different languages: calling a JSON service that runs in an application-dedicated SWI-Prolog process. This is not a general-purpose “client/server” or “micro-service” server. It is designed to be dedicated to a particular running instance of a program. It is for applications that want to have Prolog as a local implementation detail – applications that might be described as wanting to use a “Prolog library”.

This approach puts very few requirements on the “integrating” language (e.g. Python). The language has to be able to launch/kill processes, read STDOUT, use sockets, and read and write JSON. Hopefully that will lower the bar for integrating Prolog into lots of languages.
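
To make that concrete, here is a rough Python sketch of those steps using only the standard library. The launch goal, the “server prints its port on STDOUT” handshake, and the unframed JSON exchange below are assumptions made for illustration; the real protocol and startup handshake are described in the project’s documentation.

import json
import socket
import subprocess

# Launch a dedicated SWI-Prolog process for this application.
# NOTE: the goal and flags here are illustrative, not the documented way to start it.
proc = subprocess.Popen(
    ["swipl", "my_program.pl", "-g", "language_server([])", "-t", "halt"],
    stdout=subprocess.PIPE,
    text=True,
)
port = int(proc.stdout.readline())      # assumed handshake: the server prints its port

# Exchange JSON over a local socket (message framing omitted for brevity).
with socket.create_connection(("127.0.0.1", port)) as sock:
    sock.sendall(json.dumps({"query": "atom(a)"}).encode("utf-8"))
    print(json.loads(sock.recv(65536).decode("utf-8")))

proc.terminate()                        # the process lives and dies with the application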

I’ve published both the Prolog “Language Server” (language_server/1) code and the Python library code that calls it on GitHub for feedback. The Python library is also meant to act as a reference implementation for others who want to integrate other languages with SWI-Prolog. The GitHub project has documentation that describes how it works, why, performance measurements, etc.

The intention is to include language_server/1 and the Python library as part of SWI-Prolog.

I’m looking for feedback from the community on both language_server/1 itself and the Python library. Feedback is welcome in this thread or, if it gets too noisy, in the GitHub discussions for the project.

(BTW: I’ve been using the library “in production” with my Perplexity natural language game project for a while now. That runs on Ubuntu and I do development on the mac. The approach has been working well for me as a “beta tester”, but of course I’m biased :slight_smile: .)

Let me know what you think!


One small thing – “language server” is used by IDEs and editors to do syntax highlighting.
https://langserver.org/
https://microsoft.github.io/language-server-protocol/

You might want to change your terminology to “Prolog server” or “SWI-Prolog server”.


Your example code is:

while True:
    result = prolog_thread.query_async_result()
    if result is None:
        break
    else:
        print(result)

It’s more “Pythonic” to package this as an iterator:

for result in prolog_thread.query_iter():
    print(result)
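
A minimal sketch of that wrapper, reusing the names from the snippet above (query_async/query_async_result are assumed to behave as shown there; treat the names as illustrative rather than the library’s final API):

def query_iter(prolog_thread, goal):
    """Yield each result of an asynchronous query until it is exhausted."""
    prolog_thread.query_async(goal)
    while True:
        result = prolog_thread.query_async_result()
        if result is None:      # None signals that no more results are available
            break
        yield result

# for result in query_iter(prolog_thread, "member(X, [1, 2, 3])"):
#     print(result)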

Also, you might want to consider an interface that is truly asynchronous, so that it works with await. (I’m not a big fan of await, but it seems that a lot of people are.)

I guess PDT [1] is also a language server in this sense.

I think PDT offers a Java API [2] to access SWI-Prolog, which is then used to build the PDT Eclipse plugin.

Dan

[1] https://sewiki.iai.uni-bonn.de/research/pdt/docs/start
[2] https://sewiki.iai.uni-bonn.de/research/pdt/connector/start

Yeah, I agree. I planned to add that as an alternative as I iterate the code forward, but forgot to put it in the backlog. Added it in there so it’ll be in the next turn of the crank. Thanks Peter!

Thank you, I was about to point this out myself. @ericzinda, @jan, can we get this naming collision sorted out?


I was taking the fact that “language server” has other meanings in the wild as a small thing (as Peter said), given that reading even a bit of the docs (I think) makes it clear that it’s not an IDE plug-in. Do we have IDE language servers in the build (or on the books somewhere) that are going to collide and cause confusion?

Naming this thing was a little challenging. I felt like the suggested names such as “Prolog server” or “SWI-Prolog server” were too broad: they would steal the name from something more grandiose and also misrepresent what the language_server is actually doing.

The best I came up with was “json_server” since it isn’t too broad and at least explains functionally what is going on. Any other suggestions?

Edit: the other problem with “json_server” is that we may want other protocols at some point (Jan suggested a binary one, for example), so then that would be confusing…

FWIW, I have a simple example program of a JavaScript web front-end communicating with a SWI-Prolog backend, using JSON (and JavaScript “fetch”). There’s a bit of discussion here: Simple Prolog server with JavaScript client

and the repo is here: https://github.com/kamahen/swipl-server-js-client (a sample SWI-Prolog server with JavaScript client)

I mean, the thing I instantly thought of when I saw the topic title was that someone had written an LSP server for Prolog, which I was very excited about because it meant cleaner IDE integration as I work on Prolog stuff, until I realized it wasn’t actually that. Of course, I’ve personally done work with LSP clients and servers in the past, so I may be a bad example? Or maybe it means I’m a good one, who knows.

Amusingly enough, I’m doing roughly the same thing in the opposite direction for a personal project of mine. I need to integrate Prolog and Python, though in my case I decided to make Prolog the controlling agent, rather than Python, so I wrote a very tiny Python shim script that Prolog can send queries to. Very similarly, queries are sent in a decimal-length-prefixed string format, and replies are in JSON. Funny!
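
For comparison, here is a generic Python sketch of that style of decimal-length-prefixed framing with JSON replies over a stream. It illustrates the general idea only; it is not the exact wire format of either project.

import json
import socket

def send_message(sock: socket.socket, text: str) -> None:
    """Send a message prefixed with its byte length in decimal, then a newline."""
    payload = text.encode("utf-8")
    sock.sendall(str(len(payload)).encode("ascii") + b"\n" + payload)

def receive_json(sock: socket.socket):
    """Read a decimal length prefix up to the newline, then that many bytes of JSON."""
    prefix = b""
    while not prefix.endswith(b"\n"):
        prefix += sock.recv(1)
    length = int(prefix.decode("ascii").strip())
    data = b""
    while len(data) < length:
        data += sock.recv(length - len(data))
    return json.loads(data.decode("utf-8"))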

Anyway, as far as names go, perhaps something like “query_responder” would work? That avoids the issue entirely, and it also means there’s no assumption that this is necessarily a long-running process of some sort (as opposed to what it is, a component of a single running application that happens to be compartmentalized into a separate system process). You could also call it “prolog_oracle” if you wanted to get all theory-of-compsci on it :slightly_smiling_face:

I do have some thoughts on the library/implementation itself, but I wanted to check, before I start diving into it - what kind of feedback are you interested in getting on the idea, and what kinds of changes are you open to making (beyond, I’m hoping, the name)?

Another name that just popped into my head, if you wanted to stick with the “server” nomenclature - perhaps integration_server or prolog_integration_server, since the primary goal is to integrate prolog with other languages?

Yeah, it seemed like a nice simple approach. Nice! I’d love to get the reverse at some point too: a JavaScript library talking to “language_server” (modulo name) – don’t know if I can interest you in that… :slight_smile: Next up for me was writing a C# binding to it.

I am open to changing the name if we can come up with a better one, for sure.

As for other changes: my goal with this component is to keep the native language interface (i.e. the protocol) as small and simple as possible so we can get lots of different languages integrating with Prolog without a huge amount of effort writing the native language libraries. So I guess that means:

  • Internal fixes for perf, security, cleanliness, etc. are fair game if they don’t change the protocol.
  • Protocol changes are fair game if they simplify the interface or add key/missing functionality that is worth the cost they add to writing the native language library.
  • I view the Python library that calls the “language_server” as more open to feedback/enhancements since it only affects that component and not all future languages.

Does that make sense?

When I’m evaluating names, I’m thinking of two key facets:

  • Designed for using Prolog as a local implementation detail
  • “Easy” to integrate into a new language

OK, here’s what I think we have so far, including some I just added. I roughly sorted them with my personal preferences on top – no disrespect meant to the suggesters!:

  • language_query_service
  • local_query_service
  • query_responder (I like that it gets rid of ‘server/service’)
  • query_service (shorter but sounds a bit more grandiose)
  • prolog_oracle (I like the idea of just picking a name like “oracle” – kind of like “pengines” did)
  • language_server (obvious issues above)
  • local_binding_service (pretty vague)
  • local_server (seems a bit too broad and vague)
  • json_server (may support different protocols some day, also maybe too central)
  • prolog_server (sounds like a central server)
  • swi_prolog_server (ditto)

How do you guys feel about “language_query_service”? Seems like it gets the point across, doesn’t sound too grandiose, and is maybe far enough away from “language server”?

Or “local_query_service” if we feel like it still might get confused?

Very cool! In looking through the code it seems like this is a great tutorial for showing how to use the SWI-Prolog HTTP functionality and build the start of a local server that can serve static pages as well as Prolog queries. If I’m understanding correctly, it seems like you were targeting a different design center with that than the “language_server” (or whatever we call it).

Were you bringing it up as an example of something we could compare notes on and possibly borrow code from?

(aside, if you’re looking for an LSP for Prolog, I have one I’ve been developing here)


The HTTP functionality is there for serving queries (the queries themselves are at https://github.com/kamahen/swipl-server-js-client/blob/9494950e461105f41e7a7f2b1123e1876aa2a057/static/simple_client.js#L16), which just uses the text that a user types in – obviously this is not very safe. The query is handled here: https://github.com/kamahen/swipl-server-js-client/blob/9494950e461105f41e7a7f2b1123e1876aa2a057/simple_server.pl#L211

Because of XSS/same-origin security restrictions, it’s simplest to serve both the query results and the static HTML from the same server, so the static HTML server code is also in the Prolog server.

I didn’t understand some stuff in the client-server tutorial, so I wrote this as a kind of test code, then stripped it down and added some documentation (which evidently isn’t good enough). I then used the code in a more complex server.

You might want to think of using the word “interface”, then, since in most OO languages that means “a thing you can use without knowing the implementation details”. I’m also put in mind of GDB/MI, gdb’s machine interface protocol, so that’s another point there. And, since this is effectively a replacement UI for Prolog’s classic REPL toplevel, perhaps call the toplevel predicate machine_query_interface and call the various other-language libraries “Prolog-MQI”? That reinforces that you’re only exposing half of the language (the query half, not the programming half), and a command line like

swipl /path/to/program.pl -t machine_query_interface

makes it incredibly clear what you’re getting (a way to send queries to the Prolog engine for this particular program) and how it’s supposed to be used (connect it to another piece of software). “Prolog-MQI” is also very googleable, which more software really ought to take into account (there’s a reason I stopped using the Awesome Window Manager and it has a lot to do with not being able to find documentation and assistance online :rofl:).

It also opens up the concept space a bit, which I think is welcome. If this is intended to make Prolog become just a local implementation detail, then it needs to be able to be transparently replaced with something else, whether that’s another prolog engine, a unit test mock, or even an app written in some other language that’s still able to respond to some sort of query with a similar kind of machine interface. That’s going to be a big draw for someone that might be on the fence about implementing part of their project in Prolog, because it leaves their options open.

Happy to! Shouldn’t be tough, once the protocol is finalized, and it’s always nice to play around in other languages :slight_smile:

As for the protocol itself, let’s see. Since this is a two-way, line-based text protocol, I think you’re missing something by not including a direct stdin/stdout interface. It should probably be the default for the arity-0 toplevel predicate unless you’ve specified otherwise in code, so you can use it as above. Requiring socket/network access to be able to use this makes for a significantly more difficult integration, depending on the language, but I can’t think of a single general-purpose language that doesn’t have a simple, builtin way to access standard input and output. Yes, that might restrict you to a single connection stream (though see below), but I imagine many use cases would only need one. Case in point: if pengines hadn’t been such a hassle to get up and running for inter-language queries, I probably would have done my project with a Python main talking to a Prolog query interface, since all the third-party code I’m linking up with is either Python or C/C++.
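
As a purely hypothetical illustration of how low that barrier would be, a client could look like this if such a mode existed (the machine_query_interface toplevel and the line-oriented JSON exchange here are assumptions, not anything that exists today):

import json
import subprocess

# Hypothetical: a stdin/stdout toplevel mode does not currently exist.
proc = subprocess.Popen(
    ["swipl", "/path/to/program.pl", "-t", "machine_query_interface"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# One command out, one reply back, all over ordinary pipes.
proc.stdin.write(json.dumps({"query": "atom(a)"}) + "\n")
proc.stdin.flush()
print(proc.stdout.readline())

proc.terminate()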

Other thoughts, in no particular order:

  • The JSON mapping of Prolog terms is more cumbersome than it probably needs to be. (It also needs to be documented, since I’m only getting this by looking at the protocol dumps you’ve provided.) My suggestion would be something along these lines (a small decoding sketch follows this list):
Prolog type      Prolog representation    JSON representation                Notes
integer          1234                     1234                               1
float            1234.0                   1234.0                             1
atom             '1234'                   "1234"                             2
variable         _1234                    {"_":"_1234"}
string           "1234"                   {"\"":"1234"}
list             `1234`                   [49,50,51,52]
compound         foo(12,34)               {"foo":[12,34]}                    3
dict             foo{12:34}               {"{":["foo",{12:34}]}
improper list    [1,2,3|4]                {"[":[[1,2,3],4]}
unrepresentable  '['([1,2,3],4)           {"'":["compound","[",[1,2,3],4]}   4
[1] If it fits in a pointer-size signed integer (or in a double). Otherwise translate to an “unrepresentable” functor, because many JSON libraries (including, of course, JavaScript itself) have limitations on the size of numbers they’ll read.
[2] Between atoms and SWI strings, atoms get a lot more use, so they get to use the bare JSON string representation.
[3] Any compound whose functor can be represented as an unquoted atom will be represented this way. Otherwise wrap with `T0 =.. L, T1 =.. ['\'', compound | L]`.
[4] This can be used for any unrepresentable type, or any value outside the representable domain of one of the other above types.

This keeps the representation compact (not actually a small consideration, if you’re transferring large data sets), makes it more readable (functor names before arguments), and gives every value (modulo dicts, whose keys can be reordered, and floats which are weird) a single canonical, invariant representation - which means I don’t even need a JSON library, if my results are predictable. Thus:

true([[threads(language_server1_conn2_comm, language_server1_conn2_goal)]])
% Original:
{"args": [[[{"args": ["language_server1_conn2_comm", "language_server1_conn2_goal"], "functor":"threads"}]]], "functor":"true"}
% New:
{"true": [[[{"threads": ["language_server1_conn2_comm", "language_server1_conn2_goal"]}]]]}
  • I’d suggest using the same base encoding in both directions. This doesn’t necessarily mean you need to translate your query from Prolog-syntax into JSON-syntax, but even something like {"query":"atom(a)"} gives you a good way to extend the command set without having to create new predicates to handle command variants or fall back on the (quite idiosyncratic) Prolog option-list convention. Plus, you could even do away with the byte-length prefix if you wanted, and just make the protocol line-oriented.
  • I think it’s a mistake for the default run to fetch all solutions to a query. Given that this is intended as a lightweight way to use Prolog in a project that has a different primary language (and thus by definition will have developers familiar with the primary language, but not necessarily with Prolog), you’re one run(member(foo, L), -1) away from the connection stalling while Prolog gobbles up all the system memory it possibly can. Do what the console toplevel does, return the first result, and indicate there are more results that can be fetched if desired. If the client runs another query while results are pending, just discard them.
  • Return variable bindings as a JSON object rather than a list of =(Var, Value) functors. Don’t make the possibly-rudimentary client library do the work, the whole point is to reduce the client library footprint.
  • Provide for the ability to bind variables to pre-query values without using Prolog syntax, to avoid injection vulnerabilities. Something like {"query":"player_exists(X)", "bindings":{"X": "Johnny\"),abort. %"}} will save you a lot of headaches.
  • It’d be nice to have the option to pass along an ID with every command that the server will echo back with the reply. That avoids potential desync of the command/response pattern, and it also opens the door for:
  • Make run_async actually async. Right now it’s just a long-form for run with a server-side results cursor, which as I mentioned ought to be a function of the basic run command. The purpose of async is to allow you to do other things while waiting for the long-running process to complete, and currently you can’t issue any other queries while the async one is running. (Obviously the client program can do other things while the query is running. It’s in a separate process.) Prolog has great coroutining facilities, use them to let you issue multiple queries and let them run independently, or at least sequentially. Use command IDs (or have the server return a query ID in the run_async reply) to distinguish which async_result the client wants to fetch, and then the server can start the engine working on the next result for that query after it returns this one.
  • Support or at least provide room in the protocol for push messages from the server. Without server-initiated messages, the only way to figure out when an async operation is finished is to poll repeatedly or issue a long-timeout fetch, at which point you’re no longer async. (Bonus: your ..... keepalives can just be turned into a subclass of push message.) Push messages could include “results available for async query 5” but also “data written to user_output: hello world”. Especially important if using stdin/stdout comms.
  • Allow access to and control of multiple threads as first-class options in the protocol, rather than requiring a separate OS-level connection for each thread. This could tie in nicely with the stdin/stdout comm layer, so you wouldn’t need to pull in networking just to have separate Prolog threads. Blocking protocol commands should block on a per-thread basis, so you can have each thread doing a run, and they’ll each return their results (sensibly multiplexed) when available.
  • Add named pipes and raw file descriptors to the list of comm layers, possibly. They’re available on Windows and on Unixlikes, and they also represent a much lower barrier of entry, while still enabling the simplex thread-per-connection model. It could be something as simple as issuing a “here I made a FIFO pair, please connect to it” command to an already-connected thread.
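
To make the shape of the mapping suggested in the first bullet concrete, here is a toy Python decoder for it. It is only an illustration of that suggestion, not the protocol’s current encoding, and it deliberately keeps both atoms and SWI strings as plain Python strings.

import json
from collections import namedtuple

Compound = namedtuple("Compound", ["functor", "args"])
Variable = namedtuple("Variable", ["name"])

def decode(term):
    """Turn the proposed JSON term encoding into simple Python structures."""
    if isinstance(term, (int, float, str)):
        return term                                # integers, floats, atoms
    if isinstance(term, list):
        return [decode(t) for t in term]           # proper lists
    if isinstance(term, dict) and len(term) == 1:
        (functor, args), = term.items()
        if functor == "_":
            return Variable(args)                  # variable, e.g. {"_": "_1234"}
        if functor == '"':
            return args                            # SWI string, kept as a Python str here
        # ordinary compounds, plus the special "[", "{" and "'" functors
        return Compound(functor, [decode(a) for a in args])
    raise ValueError(f"unrecognized term encoding: {term!r}")

reply = '{"true": [[[{"threads": ["language_server1_conn2_comm", "language_server1_conn2_goal"]}]]]}'
print(decode(json.loads(reply)))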

Right! That’s a lot, I know. Most of it is opt-in on the part of the client, though, so it shouldn’t make a big difference in how much code is needed for the foreign-language shim layer. And these are all only suggestions! Feel free to use or ignore as you see fit, I just wanted to offer some ideas to consider :slight_smile:

Yes, some libraries (and languages) have support for it. But it often requires explicit calling or handling; in JS, for example, a BigInt is a completely different type than a Number. If you try to parse a huge JSON number in the raw, it’ll show up as floating point, and at that point you’re screwed, because you’ve lost the value (due to conversion) and you have no idea where the representation is in the big mess of JSON. Representing the number in string form allows the client to decide whether to opt in to the special parsing/handling modes, if the client language has support for it.
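
A quick Python illustration of that loss: once a large integer has been routed through a double (which is what a naive JSON parse in JavaScript does), the exact value is gone.

big = 12345678901234567890
print(float(big) == big)   # False: the double cannot hold this value exactly
print(int(float(big)))     # a nearby, but different, integer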

Note 1 applies to two different items, integer rep and float rep. The “double” part was for the float rep, not the integer. And no, JavaScript’s standard Number type can’t represent full 64-bit integers exactly; its bitwise integer operations are 32-bit only.

Also, these are implementation details that probably don’t need to be ironed out here until and unless @ericzinda decides he wants to go with the type of representation I was suggesting :slightly_smiling_face:

Oooh, I like that. I haven’t messed around with GDB/MI, but perusing it, it seems very similar in how it’s used. I also like using the term “interface”, as it gets around all of the “server/service” stuff and doesn’t lead people down the path of thinking this is a centralized server they should run. Nice!

I’ll see if @jan has any objections when he’s back online and, if not, go with that name for both the Prolog predicate (machine_query_interface/N) and the Python library: PrologMQI.

Thanks!

Edit: added to the next feature push here: https://github.com/EricZinda/swiplserver/issues/9


Yeah. Maybe it’s my non-Unix background, or all the crazy issues I’ve had in the past using STDIO and its seemingly byzantine challenges (especially cross-platform), but my gut was that sockets would actually be easier. That was partially confirmed by the number of Google searches it took just to get swipl to launch correctly from Python with correctly quoted arguments…

Ignoring my whining, though, I definitely agree I need to investigate it. @Jan has mentioned it as well. It doesn’t necessarily complicate the interface since the library doesn’t have to use it (unless the library supports both TCP and command line, which I suspect many might feel they have to…). And: it may turn out to be simpler as you say :slight_smile: .

I’ll add that to the list to think about and investigate: https://github.com/EricZinda/swiplserver/issues/10