I happened to look at some C# code today disassembled into .NET IL “byte code” – which reminded me that SWI-Prolog also has its byte code and (WAM) VM.
I am curious – how do these compare in practice in terms of performance? The question also applies to Java byte code and its VM. I guess all of these are highly optimized but, then again, they are VMs.
The reason I ask: I was asked whether my code could be made enterprise-ready – with a hint that a rewrite in Java EE would be appropriate …
In terms of enterprise architecture – perhaps it's sufficient to work with parallel engines, each running in its own thread – although some writes to a shared Prolog store would be needed.
The keyword I use to search the SWI-Prolog source code for where Prolog predicates are implemented in C is PRED_IMPL.
The way I think of the code: what can be done in Prolog is done in Prolog, while the low-level actions such as unification are done in C. Then, if a predicate needs to be faster, it is implemented with calls straight into C.
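For readers who want to see what "straight into C" looks like, here is a minimal foreign predicate written against the public C interface (SWI-Prolog.h) rather than the internal PRED_IMPL macro used inside the system itself. The predicate succ_nat/2 is an invented example of mine, not something from the sources.

```c
#include <SWI-Prolog.h>

/* succ_nat(+N, -M): M is N + 1 for a non-negative integer N.
   Purely illustrative; real arithmetic is of course built in. */
static foreign_t
pl_succ_nat(term_t n, term_t m)
{ long i;

  if ( !PL_get_long(n, &i) || i < 0 )
    return FALSE;                       /* simply fail */

  return PL_unify_integer(m, i+1);
}

install_t
install_succ_nat(void)
{ PL_register_foreign("succ_nat", 2, pl_succ_nat, 0);
}
```

Built into a shared object (e.g. with swipl-ld) and loaded, succ_nat(3, X) binds X to 4, and the predicate behaves like any other Prolog predicate as far as the caller is concerned.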
Sometimes, as Jan notes, the code is so tricky that it is better to implement in C than Prolog.
That is why I say that, AFAIK, it does not use a Virtual Machine such as the WAM or an Intermediate Language such as JVM byte code or CIL.
I know Jan will correct me if this is wrong and then I will have learned something.
This module (pl-comp.c) forms together with the module ‘pl-wam.c’ the complete
kernel of SWI-Prolog.
Excerpts from Bowen et al., 1983 (D. L. Bowen, L. M. Byrd, and W. F. Clocksin, A portable Prolog compiler):
We have opted for the structure-copying method of [Mellish 80] and [Bruynooghe 80], rather than the structure-sharing of [Warren 77].
Our storage management strategy is basically that of [Warren 77], i.e. there is a heap containing the program, a “local” stack for control information and variable bindings, a “global” stack for structures, and a “trail” stack which keeps track of when variables are bound so that they can be reset to “uninstantiated” at the appropriate time on backtracking. One change is that a reference count is maintained for each clause so that pointers to clauses can safely be included in asserted terms.
As our run-time system is based on previously published work [Warren 77], [Warren 80] …
Bowen et al., 1983
D. L. Bowen, L. M. Byrd, and W. F. Clocksin. A portable Prolog compiler. In L. M. Pereira, editor, Proceedings of the Logic Programming Workshop 1983, Lisbon, Portugal, 1983. Universidade Nova de Lisboa. – Explains low-level concepts such as the control instructions: enter, call, and exit.
Neumerkel, 1993
Ulrich Neumerkel. The binary WAM, a simplified Prolog engine. Technical report, Technische Universität Wien, 1993. – Notes design concepts of Prolog abstract machines (as opposed to virtual machines) at the time, including the ZIP.
EDIT:
After the response by Jan B.
This reminds me of minProlog and specifically its execution engine (solve.ml).
This also reminds me of the MIT OpenCourseWare lectures on advanced data structures (videos). There was one in particular, but I can’t remember which.
And then, if I search the C unification code for i_fcalldetva, it is not found.
So what is the relationship between vm_list/1, the source files pl-wam.c and pl-vmi.c, and the implementation of the predicates using PRED_IMPL?
When I look at the instructions for the WAM, e.g. unify_value, I would expect to see them in the result of vm_list(=). So what is vm_list/1 really returning?
The choice between Prolog and C is not so much about trickiness. Some things cannot be done in pure Prolog, such as opening a file, so these must be in C. Then there are notably term manipulations that can be done way more efficiently in C, in part because C is faster but also, for a large part, because we can temporarily modify the terms we are working on to avoid the need for tables to keep track of the state. This saves space and turns many operations into O(n), where n is the complexity of the term. Then there is simple deterministic stuff that needs to be fast, such as sort/2 and variations.
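To make that "temporarily modify the terms" idea concrete, here is a toy sketch (with an invented term representation, not SWI-Prolog's) of counting the distinct nodes of a term that may share sub-terms: instead of a table of visited nodes, a bit in the header word is set during the walk and cleared afterwards, which keeps the operation O(n) without extra storage.

```c
#include <stdio.h>

#define MARK_BIT  ((size_t)1)
#define MAX_ARITY 2                     /* enough for this toy example */

typedef struct term                     /* invented representation */
{ size_t       header;                  /* (arity << 1) | mark bit */
  struct term *args[MAX_ARITY];
} term;

static size_t arity(const term *t)  { return t->header >> 1; }
static int    marked(const term *t) { return (int)(t->header & MARK_BIT); }

/* Count distinct nodes, marking each node in place as we visit it. */
static size_t
count_marking(term *t)
{ size_t count, i;

  if ( !t || marked(t) )
    return 0;
  t->header |= MARK_BIT;                /* temporary in-place modification */
  count = 1;
  for(i=0; i<arity(t); i++)
    count += count_marking(t->args[i]);
  return count;
}

static void
unmark(term *t)                         /* restore the original words */
{ size_t i;

  if ( !t || !marked(t) )
    return;
  t->header &= ~MARK_BIT;
  for(i=0; i<arity(t); i++)
    unmark(t->args[i]);
}

int
main(void)
{ term a = { 0 << 1, { NULL, NULL } };  /* an atomic leaf          */
  term f = { 2 << 1, { &a, &a } };      /* f(a, a): `a` is shared  */
  term g = { 1 << 1, { &f, NULL } };    /* g(f(a, a))              */

  size_t n = count_marking(&g);
  unmark(&g);
  printf("%zu distinct nodes\n", n);    /* prints 3, not 4 */
  return 0;
}
```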
I don’t think the (SWI-)Prolog VM is conceptually that different from the Java one. There are some clear differences:
The (SWI-)Prolog VM instruction set is not stable. This means you cannot load VM code on a different version (let alone a different Prolog). There is no real reason why this shouldn’t be possible in theory.
A Prolog VM has a rather different instruction set. Many instructions are related to unification.
The WAM is a register VM, but despite some wrong names in the sources, the SWI-Prolog VM is based on a minimal version of the ZIP, which passes arguments over the stacks rather than using registers.
I don’t know how the JVM deals with stuff such as OS access. The (SWI-)Prolog VM can deal with such things in two ways: add a VM instruction or add a foreign predicate. Most not-very-time-critical stuff uses foreign predicates, as that keeps the VM small, is easier to manage and even allows loading such extensions at runtime.
A predicate is implemented by what is called the supervisor code. That can be regarded as a preamble and depends on how the predicate is defined. If it is foreign, this uses one of the i_fcall* instructions, which creates the arguments for the C function and calls it. If it is a normal predicate, it depends on things such as being thread-local, dynamic or static and, in the latter case, on the indexing opportunities. The supervisor is created on the first call (actually, the initial supervisor is S_VIRGIN, which analyses the predicate and replaces itself by the real supervisor on the first call; reloading the predicate resets the supervisor to S_VIRGIN).
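A caricature of that "replace yourself on the first call" pattern, with invented names and none of the real data structures, just to show the control flow:

```c
#include <stdio.h>
#include <stddef.h>

typedef struct predicate predicate;
typedef int (*supervisor_fn)(predicate *p, void *args);

struct predicate
{ supervisor_fn supervisor;             /* what a call jumps to */
  int           is_foreign;             /* toy stand-in for the real analysis */
};

static int
call_foreign(predicate *p, void *args)  /* i_fcall*-style entry */
{ (void)p; (void)args;
  return 1;
}

static int
call_clauses(predicate *p, void *args)  /* clause selection/indexing entry */
{ (void)p; (void)args;
  return 1;
}

static int
s_virgin(predicate *p, void *args)
{ /* analyse the predicate, install the real supervisor ... */
  p->supervisor = p->is_foreign ? call_foreign : call_clauses;
  return p->supervisor(p, args);        /* ... and run it for this call */
}

int
main(void)
{ predicate p = { s_virgin, 1 };        /* pretend it is a foreign predicate */

  p.supervisor(&p, NULL);               /* first call runs s_virgin */
  printf("supervisor is now %s\n",
         p.supervisor == call_foreign ? "call_foreign" : "call_clauses");
  p.supervisor(&p, NULL);               /* later calls go straight there */
  return 0;
}
```

Resetting p.supervisor back to s_virgin is then all that "reloading the predicate" has to do to force a fresh analysis on the next call.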
Unification is a very special case. The VM defines many unification instructions and the predicate is hardly ever called. It is there to support meta-calling it (e.g., maplist(=(x), List)) and such that you can reason about programs using reflexive predicates such as predicate_property/2. Being foreign, =/2 only has a supervisor and no clauses.
So, it seems that Prolog would, for an equivalent task, be quite a bit slower – speed is mostly gained by those predicates that are executed in compiled C.
And my assumption that VMs are inherently slower than native code is not really true … once just-in-time compiled, they run at native speed.
So, I guess, C++'s key benefit over, say, Java and .NET is more control over low-level details such as memory layout. I do remember reading a paper claiming that automatic garbage collection is often more efficient than manual memory management in non-garbage-collected languages …
The granularity of most of the Prolog VM instructions is a lot higher, meaning a single instruction does far more work than it would in an imperative language. The price of a VM (without compiling it) is roughly threefold:
Find and jump to the next instruction
Possible loss of CPU pipelining
Lost opportunities to schedule instructions smartly due to the split into VM instructions. I.e., if you create a jumbo instruction out of a series of simple VM instructions, the compiler may reschedule the instructions in a more efficient way.
There are also some benefits for VM code, especially for Prolog:
As VM instructions are quite large and very different from the bare metal instructions, the VM code is a lot shorter while using the same bits of the VM interpreter over and over. That reduces cache misses.
The Prolog garbage collector, debugger and many of the reflexive capabilities reason about the VM instructions. This is lost if we do native code compilation and thus we need either annotations in the code or duplicate representations or give up some of these goodies.
All in all, the debate about the usefulness of compiling Prolog to native code is not settled AFAIK. Surely, some systems showed significant speedups on tight loops running static code, but overall performance on large real-world programs is far less clear.
In the past there was a big deal about adding jumbo instructions to the VM that represent common sequences of primitive instructions. That has the clear value of reducing the size of the compiled program and of allowing the compiler that compiles the VM to reschedule the instructions within these jumbo instructions. The biggest advantage was attributed to keeping the CPU pipeline running longer. In the old days, a switch(*PC++) was typically the end of the CPU pipeline, i.e., the CPU had to start from scratch to load the code and plan the execution. SWI-Prolog, like many modern VM languages, uses the GCC extension that provides the code address of a C label and allows goto ptr, turning the VM's main operation of jumping to the next instruction into goto *PC++. According to timings I carried out long ago, modern CPUs can deal with that quite efficiently.
The other two still apply, and creating jumbo instructions still pays off. Unfortunately, it typically implies a lot of code duplication when using, e.g., C as the language for implementing the VM. This makes the VM harder to maintain and larger. Quite a bit of work has been done in this area. I must admit I followed only some of that from a distance. I never bothered much.
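To make the dispatch discussion concrete, here is a stripped-down toy VM (opcodes and instruction bodies invented, nothing from pl-wam.c) showing the same loop written once with switch(*PC++) and once with the labels-as-values extension and goto *PC++:

```c
#include <stdio.h>
#include <stdint.h>

enum { I_PUSH, I_ADD, I_PRINT, I_HALT };

/* Classic dispatch: one indirect branch at the top of the loop. */
static void
run_switch(const int *PC)
{ int stack[32], *sp = stack;

  for(;;)
  { switch(*PC++)
    { case I_PUSH:  *sp++ = *PC++;           break;
      case I_ADD:   sp--; sp[-1] += sp[0];   break;
      case I_PRINT: printf("%d\n", sp[-1]);  break;
      case I_HALT:  return;
    }
  }
}

/* Threaded dispatch: the "compiled" code holds label addresses and
   every instruction ends in its own goto *PC++ (GCC/Clang extension). */
static void
run_goto(void)
{ int stack[32], *sp = stack;
  void *prog[] = { &&i_push, (void*)(intptr_t)2,
                   &&i_push, (void*)(intptr_t)3,
                   &&i_add, &&i_print, &&i_halt };
  void **PC = prog;

  goto **PC++;

i_push:  *sp++ = (int)(intptr_t)*PC++;  goto **PC++;
i_add:   sp--; sp[-1] += sp[0];         goto **PC++;
i_print: printf("%d\n", sp[-1]);        goto **PC++;
i_halt:  return;
}

int
main(void)
{ const int prog[] = { I_PUSH, 2, I_PUSH, 3, I_ADD, I_PRINT, I_HALT };
  run_switch(prog);                     /* both print 5 */
  run_goto();
  return 0;
}
```

A jumbo instruction in this setting would simply be one more case or label whose body does the work of several primitive ones (say push-push-add), trading two dispatches and some instruction-scheduling freedom for a larger, more repetitive VM source.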
As is, if we want to make SWI-Prolog faster, these are in my perception the most promising:
Improve the implementation of last call optimization. That is rather clumsy right now.
Improve clause selection for predicates with some (say 2…10) clauses.
Reorganize the data representation, notably for 64-bit systems. The tagging scheme for 64-bit is just a slightly modified version of the 32-bit scheme. That can be a lot better, reducing indirections and thus cache misses when accessing data.
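Not SWI-Prolog's actual tagging scheme, but to make the idea behind that last point concrete: on a 64-bit machine, aligned data leaves the low bits of a word free, so a tag plus payload fits in a single word and reading a value becomes a mask-and-shift rather than an indirection. Everything below is an invented illustration.

```c
#include <stdint.h>
#include <assert.h>

typedef uint64_t word;

enum tag
{ TAG_INT  = 0,                         /* payload: small signed integer */
  TAG_ATOM = 1,                         /* payload: index into an atom table */
  TAG_REF  = 2                          /* payload: 8-byte-aligned pointer */
};

#define TAG_BITS 3
#define TAG_MASK (((word)1 << TAG_BITS) - 1)

static word    tag_of(word w)    { return w & TAG_MASK; }
static word    mk_int(int64_t i) { return ((word)i << TAG_BITS) | TAG_INT; }
static int64_t int_of(word w)    { return (int64_t)w >> TAG_BITS; } /* assumes arithmetic shift */
static word    mk_ref(void *p)   { assert(((uintptr_t)p & TAG_MASK) == 0);
                                   return (word)(uintptr_t)p | TAG_REF;
                                 }
static void   *ref_of(word w)    { return (void*)(uintptr_t)(w & ~TAG_MASK); }

int
main(void)
{ static word cell;                     /* something aligned to point at */
  word i = mk_int(-42);
  word r = mk_ref(&cell);

  assert(tag_of(i) == TAG_INT && int_of(i) == -42);
  assert(tag_of(r) == TAG_REF && ref_of(r) == &cell);
  return 0;
}
```

How many tag bits there are, and whether small integers, atoms and pointers all fit in a word without boxing, is exactly the kind of trade-off such a redesign would revisit.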
Too bad that this effort is not focused on expanding / improving SWI-Prolog.
From what I learned about Rust, one key aim is to be fully interoperable with C / C++ – with the intent that existing code bases can be ported to Rust piece by piece.
Wouldn’t it (have been) great to continue improving on a code base that embodies a decade (and more) of experience in Prolog design, by further adding the safety of Rust?
In terms of performance – I think C / C++ is still somewhat faster than Rust, while Rust provides pointer safety that is harder to accomplish in C++ (with ownership models and the like).
I’m not very convinced about using Rust for developing a VM. Better management of object lifetimes is of course great, especially for applications. For a VM, though, you typically need dedicated techniques. Just consider clauses. Prolog must keep them after deletion until no thread has an open call on the involved predicate with an older generation, and until the clause is no longer involved in an explicit database reference as returned (for example) by assertz/2. You cannot do this using reference counting, as multiple threads running the same clauses would cause a huge slowdown. These things are handled in SWI-Prolog using a dedicated garbage collector. Generic GC is also problematic as it scans far too much to find references, seriously slowing down Prolog programs with many gigabytes of memory in use.
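A sketch of the generation idea (invented structures, not the real ones): each clause records when it was created and erased, each goal records the generation it started in, and reclaiming a clause requires that no running goal can still see it and that no explicit database reference pins it.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

typedef uint64_t gen_t;
#define GEN_MAX (~(gen_t)0)

typedef struct clause
{ gen_t        created;                 /* generation it was asserted in */
  gen_t        erased;                  /* GEN_MAX while still live */
  unsigned int db_refs;                 /* explicit clause references held */
} clause;

/* A goal running at generation `gen` sees the clause iff it existed
   then and had not yet been erased (the logical update view). */
static bool
visible(const clause *c, gen_t gen)
{ return c->created <= gen && gen < c->erased;
}

/* Stub: the real thing would scan all engines for their oldest open goal. */
static gen_t
oldest_active_generation(void)
{ return 0;
}

static bool
may_reclaim(const clause *c)
{ return c->erased != GEN_MAX &&                    /* it was erased      */
         c->erased <= oldest_active_generation() && /* nobody can see it  */
         c->db_refs == 0;                           /* nothing pins it    */
}

int
main(void)
{ clause c = { 1, GEN_MAX, 0 };

  assert(visible(&c, 5));               /* a goal started at generation 5 sees it */
  c.erased = 7;                         /* retracted at generation 7 */
  assert(visible(&c, 5) && !visible(&c, 8));
  assert(!may_reclaim(&c));             /* the stub pretends generation 0 is still active */
  return 0;
}
```

The attraction over reference counting, as described above, is that a visibility check like this reads shared data without writing it, whereas a per-clause reference count would be bounced between CPU caches by every thread running the predicate.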
I’ve tried using the Boehm garbage collector, as getting rid of old objects safely is a big challenge when using lock-free algorithms. There are still various traces of it in the code. Initially it seemed quite nice, but I reverted quickly after running some really big programs and observing a very large performance degradation.
Keri and I also tried reference counting on several objects, but we needed to abandon that idea for most object types as well.
So yes, I believe you can relatively quickly implement a safe multi-threaded Prolog in Rust that will perform well single-threaded. I doubt it will do a good job on real concurrent workloads though. It may be possible to bypass Rust’s default coping mechanisms to improve on that. Possibly you still gain in terms of readability. Possibly you lose.
Hi,
To be honest, I don’t understand every detail in the posts here. But my conclusion would be: with some adaptation such that the SWI VM runs on the .NET CLI, SWI-Prolog would run in the .NET ecosystem. But how much effort would that be? Nearly impossible, or some work but doable? The motivation here would be to participate in the .NET ecosystem via the CLI.
But I have to read more about the implementation of SWI-Prolog; it’s an interesting topic.
I read somewhere that the Java VM is not built to optimize tail recursion – so a programming language such as Clojure (a Lisp on the Java VM), which could significantly benefit from such an optimization, does not have it.
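For what it is worth, "optimizing tail recursion" just means turning a recursive call in tail position into a jump so the stack does not grow; the JVM has no general tail-call instruction, which is why Clojure makes the programmer write loop/recur instead. A C sketch of the equivalence (gcc and clang typically perform this transformation at -O2):

```c
#include <stdio.h>

/* Tail-recursive sum: the recursive call is the last thing done ... */
static long
sum_to(long n, long acc)
{ if ( n == 0 )
    return acc;
  return sum_to(n - 1, acc + n);        /* tail call: can become a jump */
}

/* ... so it is equivalent to this loop, which needs constant stack. */
static long
sum_to_loop(long n)
{ long acc = 0;

  while(n > 0)
  { acc += n;
    n--;
  }
  return acc;
}

int
main(void)
{ printf("%ld %ld\n", sum_to(10000, 0), sum_to_loop(10000));  /* 50005000 twice */
  return 0;
}
```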
Now that M$ is (literally) buying into FOSS, I have reconsidered C# and took some time to help on SO about using SwiPLCSharp.
Have you already tried it ?
As Jan remarked some time ago, SWI-Prolog is a complex system that needs strict control over essential resources, and conflicts with other complex runtimes can be a nightmare…
Thanks for the pointer. This lib was unknown to me. But the point is, it is a connector, meaning that SWI-Prolog would run unmanaged as normal, right?