Curious: How does Prolog "Byte code" compare to .NET IL (and some thoughts about enterprise ready systems)

Hi,

I happened to look at some C# code today disassembled into .NET IL “byte code” – which reminded me that SWI-Prolog also has its byte code and (WAM) VM.

I am curious – how do these compare in practice in terms of performance. Question is also applicable to Java byte code and VM. I guess all these are highly optimized but, then again, they are VMs.

The reason i was asked is whether my code could be made enterprise ready – with a hint that a rewrite with Java EE would be appropriate …

In terms of enterprise architecture – perhaps its sufficient to work with parallel engines each running in an own thread – although some writes to a shared Prolog store would be needed.

any thoughts are much appreciated,

Dan

1 Like

AFAIK SWI-Prolog is implemented with C, there is no IL, it does not use WAM.

Thanks.

I mean, swi-prolog programs are translated into byte code and executed through a WAM VM

Dan

I started searching the source at GitHub and found pl-wam.c so now I have to question my own reply.

There is also the command:

vm_list(Predicate_name)

The keyword I use to search the SWI-Prolog source code for where the Prolog predicates are implemented in C is PRED_IMPL

The way I think of the code is that what can be done with Prolog is done in Prolog, the low level actions such as unification are done in C. Then if a predicate needs to be faster it is done with calls straight into C.

Sometimes, as Jan notes, the code is so tricky that it is better to implement in C than Prolog.

That is why I say AFAIK it does not use a Virtual Machine such as WAM or an Intermediate Language such as JVM or CIL.

I know Jan will correct me if this is wrong and then I will have learned something.

Related questions:

How can a foreign language predicate also be a built-in predicate?
“decompiled prolog”

Efficiency of Prolog


Personal notes for my own use

vm_list/1.

?- vm_list(append).
========================================================================
append/2
========================================================================
   0 s_trustme(clause(638140))
----------------------------------------
clause 1 (<clause>(000001707FC6F2F0)):
----------------------------------------
   0 i_enter
   1 b_atom(list)
   3 b_var0
   4 i_call(error:must_be/2)
   6 b_var0
   7 b_var1
   8 i_depart(lists:append_/2)
  10 i_exit
========================================================================
append/3
========================================================================
   0 s_list(clause(735160),clause(641508))
----------------------------------------
clause 1 (<clause>(000001707FCCDEE0)):
----------------------------------------
   0 h_nil
   1 h_void
   2 h_var(1)
   4 i_exitfact
----------------------------------------
clause 2 (<clause>(000001707FC72790)):
----------------------------------------
   0 h_list_ff(3,4)
   3 h_void
   4 h_list
   5 h_var(3)
   7 h_firstvar(5)
   9 h_pop
  10 i_enter
  11 b_var(4)
  13 b_var1
  14 b_var(5)
  16 i_depart(lists:append/3)
  18 i_exit
========================================================================
append/1
========================================================================
   0 i_fopen
   1 i_fcalldetva(-4611686413639185926)
   3 i_fexitdet
true.

Source code:

pl-wam.c
pl-vmi.c
pl-supervisor.c
pl-comp.c
pl-export

From: pl-vmi.c
Virtual machine instruction names. Prefixes:

I_ General instructions
B_ Body specific version
H_ Head specific version
A_ Arithmetic compilation specific
C_ Control (compilation of ;/2, etc.)
S_ Supervisor instructions. See pl-supervisor.c

References:

Logic Programming Implementation - Part I: The WAM
Limits on memory areas - Notes trail stack which is also part of WAM.
Warren’s Abstract Machine - A Tutorial Reconstruction - List the basic instructions of WAM such as put_structure, set variable, unify_value, unify_variable.
An Abstract Prolog Instruction Set

From pl-comp.c

This module (pl-comp.c) forms together with the module ‘pl-wam.c’ the complete
kernel of SWI-Prolog.

Excerpts from: _ Bowen et al. , 1983 D. L. Bowen, L. M. Byrd, and WF. Clocksin. A portable Prolog compiler.

We have opted for the structure-copying method of [Mellish 80] and [Bryunooghe 80], rather than the structure-sharing [Warren 77].

Our storage management strategy is basically that of [Warren 77], i.e. there is a heap containing the program, a “local” stack for control information and variable bindings, a “global” stack for structures, and a "trail stack which keeps track of when variables are bound so that they can be reset to “uninstantiated” at the appropriate time on backtracking. One change is that a reference count is maintained for each clause so that pointers to clauses can safely be included in asserted terms.

As our run-time system is based on previously published work [Warren 77] [warren 80]


EDIT: After responses (1) (2) by Jan W.

References:

SWI-Prolog Implementation history

Bowen et al. , 1983
D. L. Bowen, L. M. Byrd, and WF. Clocksin. A portable Prolog compiler. In L. M. Pereira, editor, Proceedings of the Logic Programming Workshop 1983 , Lisabon, Portugal, 1983. Universidade nova de Lisboa. - Explains low level concepts such as control instructions: enter, call, and exit.

Neumerkel, 1993
Ulrich Neumerkel. The binary WAM, a simplified Prolog engine. Technical report, Technische Universität Wien , 1993 - The binary WAM, a simplified Prolog engine - Notes design concepts of Prolog abstract machines (as opposed to virtual machine VM) at time, including ZIP.

EDIT:

After response by Jan B.
This reminds me of minProlog and specifically the execution engine (solve.ml)
This also reminds me of the MIT open couseware lecture of advanced data structures (videos) . There was one in particular but I can’t remember the exact one.

What confuses me is that if I use vm_list/1 on unification (=)

?- vm_list(=).
========================================================================
=/2
========================================================================
   0 i_fopen
   1 i_fcalldetva(4611685623215523968)
   3 i_fexitdet
true.

and then if I search the C unification code for i_fcalldetva, i_fcalldetva is not found.

So what is the relationship between vm_list/1, the source files pl-wam.c
pl-vmi.c and the implementation of the predicates using PRED_IMPL?

When I look at the instructions for the WAM, e.g. unify_value I would expect to see them in the result from vm_list(=). So what is vm_list/1 really returning?

The choice between Prolog and C is not so much about tricky. Some things cannot be done in pure Prolog, such as opening a file. So this must be in C. Then there are notably term manipulations that can be done way more efficiently in C, in part because C is faster but also for a large part because we can temporary modify the terms we are working on to avoid the need for tables to keep track of the state. This saves space and turns many operations into O(n), where n is the complexity of the term. Then there is simple deterministic stuff that needs to be fast, such as sort/2 and variations.

I don’t think the (SWI-)Prolog VM is conceptually that different from the Java one. There are some clear differences:

  • The (SWI-)Prolog VM instruction set is not stable. This means you cannot load VM code on a different version (let alone a different Prolog). There is no real reason why this shouldn’t be possible in theory.
  • A Prolog VM has a rather different instruction set. Many instructions are related to unification.
  • The WAM is a register VM, but despite some wrong names in the sources, the SWI-Prolog VM is based on a minimal version of the ZIP, which passes arguments over the stacks rather than using registers.
  • I don’t know how the JVM deals with stuff such as OS access. The (SWI-)Prolog VM can deal with such things in two ways: add a VM instruction or add a foreign predicate. Most not very time critical stuff uses foreign predicates as that keeps the VM small, is easier to manage and even allows loading such extensions at runtime.
1 Like

A predicate is implemented by what is called the supervisor code. That can be regarded a preamble and depends on how the predicate is defined. If it is foreign, this uses one of the i_fcall* instructions which creates the arguments for the C function and calls it. If is a normal predicate it depends on things such as being thread-local, dynamic or static and in the latter case on the indexing opportunities. The supervisor is created on the first call (actually, the initial supervisor is S_VIRGIN, which analyses the predicate and replaces itself by the real supervisor on the first call. reloading the predicate resets the supervisor to S_VIRGIN).

Unification is a very special case. The VM defines many unification instructions and the predicate is hardly ever called. It is there to support meta calling it (e.g, maplist(=(x), LIst)) and such that you can reason about programs using reflexive predicates such as predicate_property/2. Being foreign, =/2 only has a supervisor and no clauses.

1 Like

Most JVM don’t execute bytecode beause this would be too slow. They only use it to translate it to machine code, and then execute the machine code. What helps in doing this translation is also the class format, which is bytecode plus type information. Bytecode is only the body of a Java method.

There are different ways to translate bytecode:

Ahead-of-time-Compilation
https://en.wikipedia.org/wiki/Ahead-of-time_compilation

Just-in-time-Compilation
https://en.wikipedia.org/wiki/Just-in-time_compilation

A JVM cannot execute Prolog satisficing like a WAM can do, because it has no notion of logical variable, and subsequently last call optimization, the variable garbage collection part is not available directly in the JVM.

I don’t know whether JVM extension exist, that would change the situation, or whether it can be nevertheless done with a JVM. Even tail recursion calls is not on the agenda of the usual JVM. Most JVMs use an ordinary stack for recursion.

What do you mean by a logic variable? Never heard that term before.

EDIT

Found this: What precisely is a “logical variable” and what is the general approach for implementing the language feature?

Hi,

Thank you for the information.

So, it seems that Prolog would, for an equivalent task be quite slower – speed is mostly gained by those predicates that executed in compiled C.

And my assumption that VMs are inherently slower than native code is not really true … once just in time compiled they run at native speed.

So, i guess, C++ key benefit over, say, Java and .Net, is more control over low level details such as memory layout . I do remember reading a paper that garbage collection done automatically is often more efficient than manual GC in non-garbage collection languages …

Dan

Logical variables are at the core of a WAM. See also:

What precisely is a “logical variable”
https://stackoverflow.com/questions/33538812/

A logical variable is a memory cell, that can only go from uninstantiated to instantiated and nothing else. And this effect is consistently undone during backtracking. Unused such memory cells can be garbage collected. This is called environment trimming in the WAM. See also page 1 here:


Warren, David H. D.: An abstract Prolog instruction set (PDF; 269 kB) ,
Technical Note 309, SRI International, Menlo Park, CA, October 1983.

The problem is you need to trail them somehow, to allow the backtracking. Also effects of determinate predicates need to be undoable. In the same time you need some garbage collection. Maybe some tools inside the Java Runtime allow to model this close to the JVM.

The Java Runtime has things like Phantom, Weak and Soft references. See reachability of Java Runtime. Maybe something could be done with these tools. Even if tools would be used, I don’t know whether they perform as well as other approaches that are also available.

The granularity of most of the Prolog VM instructions is a lot higher, meaning a single instruction does far more work than it would in an imperative language. The price of a VM (without compiling it) is roughly) threefold:

  • Find and jump to the next instruction
  • Possible loss of CPU pipelining
  • Loose opportunities to schedule instructions smartly due to the split in VM instructions. I.e., if you create a jumbo instruction out of a series of simple VM instructions the compiler may reschedule the instructions in a more efficient way.

There are also some benefits for VM code, especially for Prolog:

  • As VM instructions are quite large and very different from the bare metal instructions, the VM code is a lot shorter while using the same bits of the VM interpreter over and over. That reduces cache misses.
  • The Prolog garbage collector, debugger and many of the reflexive capabilities reason about the VM instructions. This is lost if we do native code compilation and thus we need either annotations in the code or duplicate representations or give up some of these goodies.

All in all, the debate about the usefulness of compiling Prolog to native code is not settled AFAIK. Surely, some systems showed significant speedup on tight loops running static code, but overall performance on large real wold programs is far less clear.

In the past there was a big deal about adding jumbo instructions to the VM that represents common sequences of primitive instructions. That has the clear value of reducing the size of the compiled program and provide the compilation of the VM to reschedule the instructions in these jumbo instructions. The biggest advantage was attributed to keeping the CPU pipeline running longer. In the old days, a switch(*PC++) was typically the end of the CPU pipeline, i.e., the CPU had to start from scratch to load the code and plan the execution. SWI-Prolog, like many modern VM languages uses the GCC extension to get the code addresses for a C label and use goto ptr, turning the main VM operation to jump to the next instruction into goto *PC++. According to timing I caried out long ago, modern CPUs can deal with that quite efficiently.

The other two still apply and creating jumbo instructions still pays of. Unfortunately it typically implies a lot of code duplication when using e.g., C as language for implementing the VM. This makes the VM harder to maintain and larger. Quite a bit of work has been done in this area. I must admit I followed only some of that from a distance. I never bothered much.

As is, if we want to make SWI-Prolog faster, these are in my perception the most promising:

  • Improve the implementation of last call optimization. That is rather clumsy right now.
  • Improve clause selection for predicates with some (say 2…10) clauses.
  • Reorganize the data representation, notably for 64-bit systems. The tagging scheme for 64-bit is just a slightly modified version of the 32-bit scheme. That can be a lot better, reducing indirections and thus cache misses when accessing data.
  • Move some critical foreign predicates to the VM.
1 Like

There is ongoing work on Scryer Prolog, which aims to be the high performant one, especially since it is written in Rust.

Interesting,

Too bad that this effort is not focused expanding / improving swi-prolog.

From what I learned about Rust, one key aim is to be fully inoperable with C / C++ – with the intent to port existing code bases to Rust in a piece by piece manner.

Wouldn’t it (have been) great to continue improving on code base that embodies a decade (and more) of experience in Prolog design, by further adding the safety of Rust.

In terms of performance – i think C / C++ is still somewhat faster than Rust, while Rust provides the pointer safety harder to accomplish in C++ (with ownership models and the like).

Dan

I’m not very convinced using Rust for developing a VM. Better management of object lifetime is of course great, especially for applications. For a VM though you typically need dedicated techniques. Just consider clauses. Prolog must keep them after deletion until no thread has an open call on the involved predicate with an older generation and until it is no longer involved in an explicit database reference as returned (for example) using assertz/2. You cannot do this using reference counting as multiple threads running the same clauses will cause a huge slowdown. These things are in SWI-Prolog handled using a dedicated garbage collector. Generic GC is also problematic as it scans far too much for finding references, seriously slowing down Prolog programs with many gigabytes of in use memory.

I’ve tried using the Boehm garbage collector as getting rid of old objects safely is a big challenge when using lock free algorithms. There are still various traces in the code. Initially it seemed quite nice, but I reverted quickly after running some really big programs and observing a very large performance degradation.

Keri and I also tried reference counting on several objects, but we needed to abandon that idea for most object types as well.

So yes, I believe you can relatively quickly implement a safe multi-threaded Prolog in Rust that will perform well single threaded. I doubt it will do a good job on real concurrent workloads though. It may be possible to bypass Rust’s default coping mechanisms to improve on that. Possibly you still gain in terms of readability. Possibly you loose.

1 Like

2 posts were split to a new topic: Is SWI-Prolog being used in domains that requires multi-threading at scale?

10 posts were split to a new topic: Encouraging industry about Prolog

Sometimes another unit is used. There is some relationship between logical variables and argument positions in compounds. And during last call optimization, when we drop a frame, we can view it as going from one compound to another compound (my ideas, not from the paper below).

For example I find:

“Linearity analysis [33] for Mode Flat GHC is more directly concerned with the
number of access paths. Under reasonable conditions, it enables us to implement
setarg=5 as destructive update as long as the original structure is not shared.”

A Pure Meta-interpreter for Flat GHC - Ueda 2003
https://www.researchgate.net/publication/2479811

But it possibly breaks if the functor or arity changes. So simple reuse of previously allocated compounds falls appart as soon as the compound indicator changes. What I would like to see is Scryer-Prolog running, but it seems broken at the moment. [user] Ctrl-D doesn’t work as advertised.