Elevator Example and the Prolog VM

@grossdan expressed a lot of doubt that Prolog can execute deterministic code as fast as C. I, on the other hand, am rather optimistic about that, although my own Prolog system is still far from that goal.

@grossdan objected mainly that Prolog code needs to execute non-determinism-related instructions. From my system I don’t see this as a problem: deterministic code only moves forward in a “call” fashion.

A “redo” is only searched for when a call fails. But “deterministic” means not failing, and choice points are omitted, so deterministically succeeding predicates shouldn’t leave any choice-point-related trace.

Here is a little comparison, take this deterministic code:

% elevator(+N, -M): counts N down while counting an accumulator up, so M = N.
elevator(N, M) :-
    elevator(N, 0, M).

elevator(0, N, N) :- !.
elevator(N, X, Y) :-
    M is N-1,
    H is X+1,
    elevator(M, H, Y).

SWI-Prolog already performs very well:

?- time((between(1,10000,_), elevator(10000,_), fail; true)).
% 100,030,001 inferences, 2.969 CPU in 2.972 seconds (100% CPU, 33694316 Lips)
true.

But then I find for ECLiPSe Prolog:

[eclipse 10]: between(1,10000,_), elevator(10000,_), fail; true.
Yes (1.72s cpu)

Where is the bottleneck? I think both have tagged integers and bignums, so I rather assume this is not the bottleneck; it has to do with code generation.

If I look at the generated code, I find that SWI-Prolog detects a self-call via i_tcall,
but isn’t sure about it and needs an l_nolco check:

?- vm_list(elevator/3).
========================================================================
elevator/3
========================================================================
       0 s_static
       1 i_exit
----------------------------------------
clause 1 (<clause>(0000000006A25970)):
----------------------------------------
       0 h_smallint(0)
       2 h_void
       3 h_var(1)
       5 i_enter
       6 i_cut
       7 i_exit
----------------------------------------
clause 2 (<clause>(00000000067BC950)):
----------------------------------------
       0 i_enter
       1 a_add_fc(3,0,-1)
       5 a_add_fc(4,1,1)
       9 l_nolco('L1')
      11 l_var(0,3)
      14 l_var(1,4)
      17 i_tcall
L1:   18 b_var(3)
      20 b_var(4)
      22 b_var2
      23 i_depart(user:elevator/3)
      25 i_exit
true.

ECLiPSe Prolog, on the other hand, always tail-recurses with a jmp,
and needs less permutation of variables:

[eclipse 9]: disasm(elevator/3, L), member(I, L), write(I), nl, fail; true.
label(_1311)
savecut(a(4))
integer_switch(a(1), [0 - ref(_676)], ref(_638))
label(_676)
try_me_else(0, 4, ref(_728))
get_integer(a(1), 0)
get_value(a(3), a(2))
cut(a(4))
ret
label(_728)
trust_me(9)
label(_638)
bi_addi(a(1), -1, a(1), 24)
bi_addi(a(2), 1, a(2), 24)
jmp(ref(_1311))
code_end

Maybe the static ECLiPSe Prolog integer_switch is also faster than SWI-Prolog’s just-in-time indexing. At least in my Prolog system, indexing still incurs some overhead.

Java and C are only faster because, one could say, the argument to the elevator is a primitive datatype. The check for tagged integers, or the branching into bignums, is then not necessary, so the arithmetic gets quite a few ticks faster.

Here is the Java code:

public class Elevator {

    public static int elevator(int n) {
        return elevator(n, 0);
    }

    public static int elevator(int n, int m) {
        if (n==0) {
            return m;
        } else {
            return elevator(n-1, m+1);
        }
    }

}

It runs ca. 6x faster than ECLiPSe Prolog on the same machine:

time=298 (ms)
time=295 (ms)

Thank you for looking into this.

Could you test the performance of Elevator if it is fully written in imperative style and not as a recursion – a simple loop?

I think this is justified because, if I were to write this as a foreign function, that is what I would do rather than mimic the recursive style of Prolog. I think this is also the intent behind allowing imperative-looking code in Picat.

Daniel

To aid in further exploring my line of argument, let me try to summarize my considerations as to why I believe C will outperform Prolog when it comes to system-level programming:

  1. Prolog isn’t always capable of identifying on its own, at compile time, whether a call is deterministic or not. Hence, some overhead may be generated by assuming that the call is non-deterministic.

  2. One key purpose of cuts is to guide Prolog in making a predicate deterministic – however, cut is a runtime feature and hence only prunes choice points that were already generated – i.e. the overhead has already happened (see the max/3 sketch after this list).

  3. During recursive calls, not all Prolog implementations ensure fully optimized argument passing, in particular for context arguments (and accumulators?), and unnecessarily push arguments onto a stack rather than eliminating argument passing and turning those arguments into “real” local variables.

  4. When it comes to data structures stored in the fact base, Prolog only supports hash-based indexed lookup – which for some purposes, such as graph-based or array-based data, is not as efficient as the pointers and arrays possible in C (or Rust).

(Btw, as an aside, it turns out that Rust has serious problems with pointer-based graph structures due to its strict ownership model enforced at compile time – graphs are either implemented with raw pointers and unsafe, or with some kind of adjacency arrays – I am not sure what the performance implications of that are when graphs need to steadily grow.)
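To make point 2 concrete, here is a minimal sketch (max/3 is my own illustrative example, not taken from the benchmark above): nothing in the clause heads tells the compiler that only one clause can apply, so a choice point is created when the call is made, and the cut merely discards it afterwards.

% Both heads match any call max(X, Y, _), so a choice point is
% pushed before the first clause body runs; the cut only removes it.
max(X, Y, X) :- X >= Y, !.
max(_, Y, Y).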

I think in systems-near programming, having full control over how data is stored and accessed in memory is a key part of writing performant code. With data stored in the Prolog fact base, this is not possible – it will always be relational and indexed.

Transient data – i.e. terms – generated during calls can be made more efficient, for example by use of “holes” (e.g. difference lists, sketched below) and by use of compound terms as arg/3-style arrays – a professional Prolog programmer can surely optimize here – but such optimizations are then limited to term data and do not apply to data in the fact base.
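As a minimal difference-list sketch (queue_empty/1 and queue_put/3 are illustrative names): the unbound tail is the “hole”, so adding an element at the end costs O(1) instead of a walk over the whole list.

% A queue is a pair Head-Tail where Tail is the unbound "hole".
queue_empty(Q-Q).
% Filling the hole with [X|NewTail] appends X in constant time.
queue_put(X, Head-[X|Tail], Head-Tail).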

Graph-based and array-based data structures that by design should be part of the fact base would therefore perform significantly better when implemented in C than in Prolog.

  5. VM-based instructions in Prologs ultimately always involve some interpretive step – even if the VM then calls a C-based implementation of a predicate – whereas C is compiled.

  6. Perhaps somewhat controversially: C, C++ and Rust do not support garbage collection out of the box – by design – and hence aim to give programmers full control over the overhead produced by memory management, including pre-allocation and programming techniques for distributing more evenly the overheads that come from resizing memory structures, and the like.

Btw, implementing C code as foreign predicates carries an overhead when called, coming from the call across the foreign predicate interface – so a Prolog implementation that can minimize or even eliminate such overhead would perform better. What if both Prolog and the foreign predicate live in the same Java VM? “Foreign” calls should then perform better than calls over a traditional FFI …

This is what I can currently think of.

What else is relevant?

thanks,

Dan

It’s a great example of a Prolog system whose VMIs are (JIT-)compiled, and hence should be faster than an interpreted one – at least so it seems to me.

I thought that every Java program gets JIT-compiled – that is why Java, these days, performs (almost) as fast as compiled programs.

But you are right, I got confused – SWI-Prolog is also, in this sense, compiled – C-compiled.

As I mentioned, I got confused.

SWI-Prolog interprets its VMIs and your Prolog interprets its VMIs. SWI’s binary is compiled C, and your “binary” is initially Java bytecode which is then JIT-compiled.

Dan

I actually had a brief discussion with Jan about using attributed variables to represent a graph – after Markus (Triska) had mentioned that approach to me.

I was unsure how having a graph referenced as an argument would work when multiple clients try to access it concurrently. There is also the issue of making the changed graph available – since it is threaded through variables…

It seems that Jan preferred having it reside in the fact base.

Edit:

I am currently implementing a messaging-based API via WebSockets – so multiple WebSocket clients could create, update and query the graph-based structure concurrently.

“Deterministic code” usually does not refer to issues of data modelling. It refers more to whether you can execute the control flow of deterministic code without overhead in Prolog.

Maybe open a separate thread for your graph modelling problem. I guess you want deterministic and non-deterministic Prolog code to run efficiently over your graphs?

Not exclusively deterministic queries? (Otherwise Prolog wouldn’t make any sense)

Indeed, a lot of code is deterministic – and Prolog non-determinism doesn’t offer any advantage for this code.

Within the deterministic code there are islands of non-determinism – for example, when checking for semantic correctness and non-violation of integrity constraints – which have to be checked prior to committing changes.

Hence my quest to make deterministic code in Prolog very efficient, while allowing me to stay within Prolog’s uniquely simple syntax and productive language environment for both kinds of code.

In my mind the ideal Prolog implementation would enable a seamless imperative style of programming that would be automatically compiled, say, into optimized C, and wrapped into a very efficient foreign language call interface.

How flexible, seamless and performant this can be made is an open research question – since it requires integrating natively running code with VMIs that run in a Prolog VM – but, I guess, Java seems a great candidate for such research, since Prolog-VM-interpreted VMIs and imperative code run in the same Java VM.

The ideal Prolog implementation would also allow indicating that code should be treated deterministically – quite like the use of once/1 – just that it compiles into a deterministic call rather than including a cut/prune after the call. This could also include a once for a block of code, which would treat each conjunctive call in the block deterministically – if one fails, all (prior calls) fail.
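For reference, what exists today (first_solution/2 is just an illustrative wrapper name): once/1 commits to the first solution, but it does so by pruning choice points after they have been created, not by compiling a deterministic call.

:- use_module(library(lists)).

% once/1 runs the goal and then cuts away the choice points it created.
first_solution(X, Xs) :-
    once(member(X, Xs)).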

Dan

Hi,

I do think I understand your comment – but your code was recursive, and recursion is likely expensive to execute in C as well – I think the comparison should be between how loops are written – and optimized – in Prolog and how loops are typically written in imperative languages.

Hence, the Java should encode a simple loop.

Clearly a more compact / optimized data / memory structure will also play a role …

Why not compare the recursive Java with the loop Java implementation …
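For the Prolog side of that comparison, the usual "loop" is exactly the tail recursion of the elevator example above; the alternative is a failure-driven or forall/2 loop, which only works for side effects since all bindings are discarded on each round (count_to/1 is an illustrative name):

% Iterates N times; any bindings made inside are undone each iteration.
count_to(N) :-
    forall(between(1, N, _), true).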

Dan

You can try a loop yourself; the Java code will only get faster. Right now it is around 6x faster, which already supports my argument; if it were 9x faster, it would support my argument just as well. My argument boils down to the fact that arithmetic over the Java type int is much faster than arithmetic over the Prolog type term_t.

A further difference is then how the addition is realized. In JIT-ed Java it will be a 32-bit addition machine instruction, while in Prolog, when current_prolog_flag(bounded, false) holds, it will be a complex routine that also handles overflow, which Java doesn’t. The Prolog flag bounded is from the ISO core standard. It shows again that I was comparing apples with oranges: Java doesn’t have such a flag, but Java int is basically bounded=true, not bounded=false like SWI-Prolog, ECLiPSe and a few other Prolog systems. There was an old WebAssembly version of SWI-Prolog which had bounded=true, but then you cannot do a lot of nice things with your Prolog system anymore.
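For illustration, here is what bounded=false buys you; a minimal transcript re-typed on SWI-Prolog (not from the original posts):

?- current_prolog_flag(bounded, false).
true.

% the arithmetic quietly grows into a bignum instead of wrapping around:
?- X is 2^100.
X = 1267650600228229401496703205376.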

I think your analysis is great and also very educational for me.

I think this is exactly the kind of analysis needed to get at the performance considerations – and possibly also broaden the scope of the analysis from the time a call is made to the time a call returns – to see what work the Prolog VM is really doing vs. what work an equivalent, but optimized, implementation in Java would be doing – and finally, an optimized implementation in C.

Dan

I think optimizing for numbers is one key optimization technique.

In my work I notice a further need: the ability to efficiently associate backtrackable data with predicated atoms in the (dynamic) fact base.

Currently, one way to do this is to create a mapping, for example by use of library(assoc) (sketched below). A better approach would be something similar to attributed variables – where an attribute is associated with a variable via a pointer.
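A minimal sketch of the assoc-based mapping (atom_attr_put/4 and atom_attr_get/3 are illustrative names; library(assoc) is the standard AVL-tree library): the mapping is an ordinary term threaded through arguments, so updates are undone on backtracking, but every lookup pays an O(log n) tree walk instead of following a pointer.

:- use_module(library(assoc)).

% Associate Value with the atom Key; the new assoc must be threaded
% onward, and the update is undone on backtracking.
atom_attr_put(Key, Value, Assoc0, Assoc) :-
    put_assoc(Key, Assoc0, Value, Assoc).

% Looking it up again is an O(log n) AVL walk, not a pointer chase.
atom_attr_get(Key, Assoc, Value) :-
    get_assoc(Key, Assoc, Value).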

Similarly, I feel that having an atom or, perhaps, a predicated atom (e.g. a first atom argument in the context of a predicate) associated with a backtrackable attribute could significantly improve the efficiency of attribute lookup – once a reference to the atom is loaded into a variable.

Dan

Is there anything better than Prolog’s “pattern matching” (unification)? Haskell and Erlang/Elixir are the next-best that I can think of, but they’re essentially one-way unification. Also, Prolog’s logical variables are really handy, and they allow more opportunities for tail recursion. (Haskell has a built-in “++” because append isn’t tail recursive in Haskell; it is tail recursive in Prolog.(*)) So, even without non-determinism, Prolog has significant advantages.

As for the overhead of dynamically figuring out a data type – in theory, a type-inferencing compiler could help with that (possibly combined with mode declarations, as in Mercury). I once sped up a Common Lisp program by about an order of magnitude using two straightforward optimizations: I changed some lists into structures/arrays, and added some type declarations (in some cases, these were “dangerous”, by omitting even the type check). The first optimization isn’t so attractive with Prolog (although arrays would be nice); the second optimization is probably good for something like 2-3x by itself (lots of caveats on this number, which I picked out of the air).

(*) Of course, there’s no 100% free lunch – Prolog’s tail recursion comes at the cost of extra indirection from unified logical variables; but that overhead is constrained, whereas a non-tail-recursive stack can grow unboundedly.
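For reference, the textbook definition behind the (*) remark: the recursive call is the last goal, and the result list grows through an unbound tail variable, so Prolog systems can execute it as a loop via last-call optimization.

% The third argument is built through an unbound tail; the recursive
% call is last, so no stack frame needs to survive it.
append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :-
    append(Xs, Ys, Zs).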

I keep wondering about the dichotomy between the dynamic fact base and the data structures that are created and processed during goal processing.

And the workarounds I notice because of that – such as “backtrackable” asserts and the like.

I wonder if a more unified approach could be envisioned …

Thank you Peter,

It looks like Erlang is the closest to Prolog in that it has the symbol, list and map as its basic data types, and that it also runs on its VM rather than being compiled. It seems that the core language is also quite simple – which again is similar to Prolog.

I think Haskell is compiled, and oddly enough, I was not able to find an easy-to-read manual for Haskell on the Haskell site – they link to various commercial books – and from trying to decipher the compiler guide it seems that it’s a (very) complex core language – it reminds me of Ada in its complexity – I wonder why they made it so complex – what’s the gain here?

It seems that the learning-curve cost to start working in Haskell is very high, and Erlang would have similar performance considerations to Prolog – probably fewer, given that it only executes in one direction – but system-level programming will still carry overheads one would want to get rid of.

Indeed, if I were to switch from Prolog, Erlang would probably be the way to go …

But I think, given that Erlang is not compiled, it’s probably better to invest the effort into C/C++ foreign-language extensions – or perhaps even Rust-based extensions with C/C++ wrappers to connect them to Prolog.

Dan

P.S. I guess Lisp could be another potential language – and, I guess, it’s compiled – but the language syntax is surely something one needs to get used to – and I am unsure how big the developer community is …

Thanks.

I guess for commercial development one would want to be careful to choose something actively developed / maintained – so a compiler currently not maintained (per the site) would be a problem.

Thank you.

I never warmed up to JavaScript – not sure why … I think it’s such a super-flexible language, with various very subtle semantics, such as the ones governing object property propagation (if I recall correctly), that you are bound to make mistakes – almost as bad as C, just with another layer of complexity given its object and functional programming capabilities :slight_smile: