Sweep: SWI-Prolog Embedded in Emacs

Hi all,
I’ve been working for the past couple of weeks on a little project for improving the integration of SWI-Prolog with GNU Emacs. The idea was to create a setup where Emacs can call Prolog predicates and examine their output as seamlessly as possible, in order to allow Emacs to utilize the Prolog runtime in time-sensitive tasks, such as semantic highlighting of Prolog source code, on-the-fly documentation, etc.

Background

I started by reexamining @jamesnvc’s lsp_server as the obvious suspect for these use cases, but I felt that the overhead imposed by the JSON-based Language Server Protocol made it a poor starting point. Instead, I set up a UDP-based channel between Emacs and Prolog: Emacs sends raw Prolog queries, Prolog executes them, and it responds with raw strings obtained from each query’s output.
Last week I showed @jan a demo of this work, which included semantic highlighting, autocompletion, interactive code formatting, and similar bells and whistles built on top of this setup. Jan provided some good feedback and asked about the performance of this setup on large buffers. This made me wonder whether it would be possible to match the performance of the built-in editor, whose biggest inherent advantage AFAICT is the ability to open buffers directly as Prolog streams without going through costly IPC mechanisms.
So I decided to take a step back and see if I can make SWI-Prolog and Emacs talk to each other without resorting to OS-based IPC, which brings me to the actual subject of this post:

sweep

sweep is an embedding of SWI-Prolog in Emacs. It uses the C interfaces of both SWI-Prolog and Emacs Lisp to create a dynamically loaded Emacs module that contains the SWI-Prolog runtime.
Its core functionality is the ability to execute Prolog queries from Emacs Lisp and examine their results. This is achieved via a set of C-implemented Elisp functions, sweep-open-query, sweep-cut-query, etc., which expose corresponding functions from the SWI-Prolog C interface.

I’m sharing this work in its initial stage (currently lacking most of its intended user-facing features) to collect some comments on the core functionality of embedding Prolog. Notably, there is currently an annoying limitation that I’m wondering how best to overcome:

Currently, sweep builds SWI-Prolog without GMP support (using the cmake flag -DUSE_GMP=OFF). Initializing SWI-Prolog inside Emacs with GMP support enabled causes Emacs to crash during garbage collection, AFAICT because the SWI-Prolog mp_free routine is mistakenly invoked to clean up Elisp big integers.
Aside from the obvious problem of not being able to utilize SWI-Prolog support for unbounded integer arithmetic, this issue also prevents us from using an existing libswipl since it is most likely to have been built with GMP enabled.

Thoughts on how to address this issue will be greatly appreciated :slight_smile:

The source code for sweep is available at ~eshel/sweep (sourcehut git), and some initial documentation can be found at “sweep: SWI-Prolog Embedded in Emacs”.
I’ve included Emacs commands for listing and jumping to modules and predicate definitions (sweep-find-module and sweep-find-predicate) to show the basic utility:

EDIT [27/08]:
Added a command sweep-pack-install for interactively installing SWI-Prolog packages

Screenshot

Cheers!

Your project sounds interesting. About ten years ago, I wrote “emacs-handler.pl” and “emacs-jockey.pl” to control GNU Emacs via the Emacs Lisp “start-process” function. The two Prolog source files are in my pack pac. Clearly my approach is more complex than necessary, but I could not find an alternative. Anyway, it works for my purposes, and I still use it from time to time. Two days ago, I added a “handle” that applies lualatex to a region of LaTeX code to compile and preview it, as below. It works, but too slowly, so I went back to my existing use of latexmk(rc) on the Emacs buffer. As such, I am looking forward to the integration of SWI-Prolog and GNU Emacs.

(global-set-key (kbd "s-p")
                (lambda ()
                  (interactive)
                  ;; Kill any running "PAC" process before starting a new handler.
                  (when (get-process "PAC")
                    (delete-process "PAC"))
                  (start-emacshandler)))

% In emacs-jockey.pl

handle([luatex, region])--> region,
	current(R),
	peek([]),
	{
		expand_tilda("~/tmp/deldel.tex", TeXFile),
		expand_tilda("~/tmp/preamble.tex", Preamble),
		Fs=	[text("\\RequirePackage{luatex85}\n"),
			text("\\documentclass{ltjsarticle}\n"),
			text("\\usepackage[hiragino-pron,jis2004]{luatexja-preset}\n"),
			file(Preamble),
			text("\\begin{document}\n"),
			codes(R),
			text("\\end{document}\n") ],
			assemble(Fs, TeXFile)
	},
	{	expand_tilda("~/tmp", TMP),
		qshell(	cd(TMP) ;
				lualatex("deldel") ;
				open(-a("Preview"), "deldel.pdf")
			)

	}.
% In emacs-handler.pl
assemble(Fs, F) :- expand_file_search_path(F, F1),
        open(F1, write, FX, [encoding(utf8)]),
        maplist(assemble_basic(FX), Fs),
        close(FX).
%
assemble_basic(FX, text(F)) :- !, clean_io(FX, write, basic:smash(F)).
assemble_basic(FX, file(F)) :- !, expand_file_search_path(F, F1),
        open(F1, read, FY, [encoding(utf8)]),
        clean_io(FY, read, eh:getstring(D)),
        maplist(put_code(FX), D).
assemble_basic(FX, codes(Codes)) :-!, maplist(put_code(FX), Codes).
assemble_basic(FX, region(Codes)):-!, maplist(put_code(FX), Codes).
assemble_basic(FX, buffer) :-
	call_lisp(list('point-min'(), 'point-max'()), [value(L), string(t)]),
	list_number_list(L, [Min, Max]),
	get_buffer_region(Min, Max, R),
	maplist(put_code(FX), R).

Nice! The GMP issues can possibly be fixed. Currently, SWI-Prolog takes over GMP allocation for two reasons: to avoid program termination due to giant allocations in GMP on some operations, and to ensure cleanup of GMP objects, also in the case of exceptions. This more or less assumes Prolog is the only user of GMP in the process. Possibly this can be relaxed. It could also be that more recent versions of GMP allow for a better design of the memory management integration. The current design is rather tricky, but it was as good as possible when it was written :frowning:

The overall story raises some questions. Most of the power of the built-in tools is that they can (concurrently) operate in the same environment as the running (user) Prolog program, so the tools have access to all aspects of the process in the correct state. That can also work for running SWI-Prolog in an Emacs (shell) window and at the same time have the editor side of Emacs talking to some network service that Prolog provides from another thread. That is also how the embedding in Java Eclipse (PDT) works. How would that translate to linking Prolog and Emacs in the same process?

Thanks, that’s about what I thought. I’m not very familiar with GMP but I’ll try to look into it.

Basically, the idea is to use threads for this kind of concurrent use. For example, if we want a top-level, we spawn a Prolog thread from the embedded environment that provides one, and then either connect to this new thread as if it were a remote process or implement some other bidirectional channel that doesn’t block the main thread, such as message passing.

Here’s a quick and dirty example (I’ll push a more robust implementation soon):

(defun sweep-top-level (dir port)
  "Start a Prolog top-level in DIR and connect to it over PORT."
  (interactive "DStart top level in directory: \nnConnect over port: ")
  (sweep-open-query "user" "sweep" "sweep_new_top_level" (list (expand-file-name dir) port))
  (let ((sol (sweep-next-solution)))
    (sweep-close-query)
    (if (sweep-true-p sol)
        (progn
          (sit-for 0.5) ; give the new thread a chance to start listening
          (comint-run "telnet" (list "127.0.0.1" (number-to-string port))))
      (user-error "Top level initialization failed"))))

On the Prolog side we add the predicate sweep_new_top_level/2:

sweep_new_top_level(Args, []) :-
    thread_create(sweep_top_level(Args), _Thread, []).

sweep_top_level([Dir,Port]) :-
    working_directory(_, Dir),
    prolog_server(Port, []).

With this we can call M-x sweep-top-level and get a top-level running in the context of the embedded environment that works concurrently with other Emacs interactions.

Note that even though we resort to an IPC mechanism for the top-level interaction in this implementation, we still retain the ability to communicate directly and efficiently (without OS overhead) with the main Prolog thread through the Elisp interface to Prolog provided by the shared linking.

WDYT?

:+1: Sounds like a promising approach.

Pushed “ADDED: sweep-top-level command and appropriate mode” to ~eshel/sweep (sourcehut git). This is somewhat cleaner than the example implementation above. The relevant parts for starting the top-level are documented in the manual.

This looks very cool! My dream when I first started on the Prolog LSP server was to have something like Clojure’s CIDER for Prolog. LSP seemed like the easiest way to get started, but I really like your approach: having access to the actual running process would make a lot of stuff much easier and more powerful. I look forward to seeing how this evolves and helping if I can!

There is one other thing to worry about that also (partly) applies to the built-in tools. If for some reason you need to restart Prolog, it is tied to the Emacs process. For the built-in tools this really means restarting. In the Emacs case we may often get away with calling PL_cleanup() and re-initializing the Prolog process. In both cases a crash in Prolog takes down the IDE :frowning:

I normally work around these issues by having one instance of Prolog with the program loaded running the IDE and another instance for running the program under development. This is typically only needed when developing programs that maintain a lot of state or foreign resources.

I have good hopes this may result in something that can really compete with the built-in tools :slight_smile: Only, it is bound to GNU-Emacs …

Thank you! If you get a chance to try it out let me know how it goes :slight_smile:

Yes, I have exposed PL_cleanup() and PL_initialise() as sweep-cleanup and sweep-initialize, exactly to allow the user (and myself when debugging this, really) to re-initialize Prolog, perhaps even with different start-up flags.

Indeed, that’s an inherent problem with this approach :frowning:

Thanks, glad to hear that you have faith in this direction.

I’ve pushed another update that introduces foreign Prolog predicates sweep_funcall/2,3 which are available in the context of Prolog queries initiated from Elisp. These predicates allow for calling Elisp from Prolog (from Elisp). This means we can write Prolog code that manipulates Emacs buffers directly. For more information, see the relevant section in the manual.

To showcase this feature, I added (initial) semantic highlighting to queries in the Prolog top-level interface:

The query is colorized whenever the user stops typing for 0.2 seconds or more. This is achieved by setting up an Elisp function, sweep-colourise-query, to run whenever Emacs has been idle for long enough. This function invokes the Prolog predicate sweep_colourise_query/2 through the sweep-open-query interface, passing it the query string as an argument. sweep_colourise_query/2 in turn analyzes the query and uses the new sweep_funcall/3 interface to invoke the Elisp function sweep--colourise with arguments specifying the text range and color to use for each semantic token.

I think I may have found a workaround for this issue that is runtime based only, meaning it can be used with an existing SWI-Prolog installation:
Basically I learned about PL_action(PL_GMP_SET_ALLOC_FUNCTIONS, FALSE), which asks SWI-Prolog to leave the GMP memory function pointers untouched.
Luckily, Emacs doesn’t set the GMP memory functions to anything special, just simple wrappers for malloc() and friends. So if I call PL_action() before PL_initialise(), everything seems to work fine with a libswipl compiled with GMP support, including unbounded integers:

% sweep with host swipl (built with GMP):
?- A is 2**80, integer(A).
A = 1208925819614629174706176.

Compare to the previous state:

% sweep with dedicated swipl (built without GMP):
?- A is 2**80.
A = 1.2089258196146292e+24.

?- A is 2**80, integer(A).
false.
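
For anyone wondering where the float in the second transcript comes from: 2**80 no longer fits in a 64-bit integer, and as far as I understand a build without GMP then falls back to a double. A double holds a power of two of this size exactly, so the value survives but its integer type is lost. Below is a small sketch of the same arithmetic in plain C. Nothing here is SWI-Prolog code, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Compute 2**n as a double by repeated doubling.
   Each step is exact as long as the result stays finite. */
static double pow2_double(int n)
{
    double x = 1.0;
    for (int i = 0; i < n; i++)
        x *= 2.0;
    return x;
}

/* Format a double with 17 significant digits, which is enough
   to round-trip any IEEE 754 double. */
static void format_double(double x, char *buf, size_t size)
{
    snprintf(buf, size, "%.17g", x);
}
```

Calling format_double(pow2_double(80), buf, sizeof buf) fills buf with 1.2089258196146292e+24, the same digits as in the transcript above, and pow2_double(80) compares greater than (double)INT64_MAX.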

The only problem is that I’m not sure I fully understand the implications of this setup, namely in what circumstances it may cause the problems described above.

I’ve tried creating large integers and immediately raising an exception to see if it leaks memory, but couldn’t see a dent in the memory usage of Emacs+Prolog. Is there a recommended test/benchmark that could show such a leak?

Surely better than no GMP :slight_smile:

Try running src/test.pl. I think one of the affected tests is below.

gmp(shift-3) :-
	unbound(A),
	forall(between(1, 100, X),
	       catch(A is 1<<(1<<X), error(resource_error(stack), _), true)).

Normally, after computing a result, the resulting GMP number is copied to the Prolog (global) stack. It is there almost in its normal GMP representation such that GMP can read from these numbers directly. It cannot write them directly though.

The allocation hooks in pl-gmp.c do two things: watch for too big allocations (those that will not fit on the stack anyway) and collect the allocations done. The latter is used if some sequence of arithmetic operations raises an error and we find ourselves with some dangling intermediate GMP objects. See AR_BEGIN(), etc. in pl-gmp.h

I think it should be possible to adjust the allocation hook to do its normal thing if LD->gmp.context is NULL or LD is NULL (not a Prolog thread).

I see, that makes a lot of sense. I’ve just recompiled swipl after changing the current if ( LD->gmp.persistent ) checks to:

if ( LD == NULL || LD->gmp.context == NULL || LD->gmp.persistent )
    malloc()/realloc()/free()

And everything works without using PL_action(PL_GMP_SET_ALLOC_FUNCTIONS, FALSE) in sweep :slight_smile:
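
For readers following along, here is a minimal, self-contained C model of that fall-through. It is not the pl-gmp.c code: the struct names, the size cap, and the counters are hypothetical, and the real hooks keep more bookkeeping than this. It only shows the shape of the check: behave like plain malloc() when the caller is not inside a tracked Prolog arithmetic evaluation, otherwise apply the size watch and record the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for SWI-Prolog's per-thread data (LD). */
typedef struct {
    size_t tracked;          /* allocations recorded for cleanup on error */
} ar_context;

typedef struct {
    ar_context *context;     /* non-NULL only while evaluating arithmetic */
    int         persistent;  /* result must outlive the evaluation */
} local_data;

/* NULL models a thread that Prolog does not know about (e.g. Emacs GC). */
static local_data *LD = NULL;

/* Made-up cap standing in for "will not fit on the stack anyway". */
#define MODEL_MAX_BYTES ((size_t)1 << 20)

static size_t plain_calls = 0;   /* how often we behaved like plain malloc */

/* Same shape as the allocation function GMP accepts: void *(*)(size_t). */
static void *model_gmp_alloc(size_t bytes)
{
    if (LD == NULL || LD->context == NULL || LD->persistent) {
        plain_calls++;           /* not a tracked Prolog evaluation */
        return malloc(bytes);
    }
    if (bytes > MODEL_MAX_BYTES)
        return NULL;             /* model only: Prolog reports an error here */
    LD->context->tracked++;      /* remember it so an error can free it */
    return malloc(bytes);
}
```

With LD left at NULL, which is how an allocation made by Emacs for its own big integers would look in this model, the call goes straight to malloc() and nothing is tracked.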

I’ll open a PR with this change, thank you!

Quick update:

I’ve pushed a new version of sweep (version 0.1.1) which includes a specialized Emacs mode for reading and editing Prolog source buffers, as well as a few other enhancements.
The new mode sweep-mode currently provides rich semantic highlighting, context-aware indentation, and module-aware completion for predicate calls.
Notably, the indentation implementation is highly experimental, so any reports of non-optimal behavior or improvement suggestions will be appreciated.

Here's how a Prolog buffer may look with this mode:

Refer to the relevant part of the manual for more information. It’s not much but I’m expanding the manual along the way.

Very cool! I had to do some fiddling to build it, but I can see it highlighting my code. I’ll send a patch for one of the changes I made; looking forward to continuing to play with this & contributing more!

Hey all,
I’ve just released a new revision of sweep, tagged as version 0.3.0.

The most important update with respect to the previous versions is a new indentation engine that takes operator precedence into account according to the current operator definitions.
This required non-trivial changes but was necessary to resolve some indentation issues reported by @jan.
There’s now also a test suite for the indentation logic that runs in CI to avoid regressions in the future.

Cheers

This is absolutely fantastic stuff. I’ve just been playing around with it and it’s a big improvement over the current TAGS table approach I’m using. So much value to be won from having a tight integration with a REPL that is as introspective as swipl.

I was playing around with the new sweep package in the master branch (V8.5.17-37-gd778c9d88), but I get the following error:

$ emacs
emacs: symbol lookup error: /tmp/swipl-devel/build.release/packages/sweep/sweep-module.so: undefined symbol: PL_register_foreign

It is as if libswipl.so had not been linked in, but isn’t this running within an internal swipl process?

Thanks for checking it out!

May I ask which OS and which Emacs version you’re trying this with?
If you’re using Emacs 27 or earlier on Linux, you may need to preload libswipl, as in:

$ LD_PRELOAD=/tmp/swipl-devel/build.release/src/libswipl.so emacs

This is due to the way Emacs before version 28 invokes dlopen when loading dynamic modules.

Thanks!
It is Emacs 28.2 on Arch Linux.

EDIT: it works fine with LD_PRELOAD