When are evaluable functions looked up?

I am trying to get my head around how to write test cases
and check error behaviour, in light of this discrepancy:

GNU generates a single error:

/* GNU Prolog */
?- X is append(1,2).
uncaught exception: error(type_error(evaluable,append/2),(is)/2)
?- X is append([1,2],[3]).
uncaught exception: error(type_error(evaluable,append/2),(is)/2)

SWI generates two different errors:

/* SWI-Prolog */
?- X is append(1,2).
ERROR: Arithmetic: `append/2' is not a function
?- X is append([1,2],[3]).
ERROR: Type error: `[]' expected, found `[1,2]' (a list) 
("x" must hold one character)

So it seems SWI-Prolog first evaluates all the arguments, and
only then checks whether the functor is an evaluable function? Right?

It depends. Runtime evaluation, yes. When arithmetic is compiled (using the -O flag) the compiler will already complain that append/2 is not a function and refuse to compile the code. Compiled arithmetic creates a stack engine. Runtime evaluation simply processes the term and indeed first evaluates the arguments.
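The "stack engine" idea can be sketched in a few lines of Python (purely illustrative; the opcode names and the compiler below are made up, not SWI-Prolog's actual a_* instructions). Note that the compiler can already reject an unknown functor like append/2 before any code runs:

```python
# Toy "stack engine" for arithmetic: compile a nested term to postfix
# code, then run it on a value stack. Illustrative sketch only.

OPS = {('+', 2): lambda a, b: a + b,
       ('*', 2): lambda a, b: a * b}

def compile_expr(term):
    """Compile ('+', 1, ('*', 2, 3)) to a list of postfix instructions."""
    if isinstance(term, (int, float)):
        return [('push', term)]
    op, *args = term
    if (op, len(args)) not in OPS:
        # compile-time error, like swipl -O complaining about append/2
        raise TypeError(f"`{op}/{len(args)}' is not a function")
    code = []
    for a in args:
        code.extend(compile_expr(a))
    code.append(('apply', op, len(args)))
    return code

def run(code):
    stack = []
    for instr in code:
        if instr[0] == 'push':
            stack.append(instr[1])
        else:
            _, op, arity = instr
            args = stack[-arity:]
            del stack[-arity:]
            stack.append(OPS[(op, arity)](*args))
    return stack.pop()
```

Here `run(compile_expr(('+', 1, ('*', 2, 3))))` evaluates to 7, while compiling `('append', 1, 2)` raises before any instruction is executed.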

In general, I think there are two schools in the Prolog world. One where is/2 and friends simply call predicates and the other where it is restricted to arithmetic and optimized for this purpose. SWI-Prolog belongs to the latter and defines that an arithmetic function is a function on a set of numeric values returning a numeric value or raising an exception. An advantage of keeping arithmetic restricted to numbers is that we can avoid translation between the native C arithmetic types and their Prolog representation for intermediate results. But yes, the disadvantage is that you can only use it for numbers.

Where do you place the -O flag? How do I compile some Prolog
code? I was looking at the code via vm_list/1. It's not doing anything
to append/2 or complaining about it:

test :- _ is append(1,2).
test2 :- _ is append([5,4],[6]).

?- vm_list(test).
       [...]
       2 b_functor(append/2)
       [...]
       9 i_depart(system:(is)/2)
       [...]
?- vm_list(test2).
       [...]
       2 b_functor(append/2)
       [...]
      17 i_depart(system:(is)/2)
       [...]

The error is a runtime error. I guess any compile time error would
look the same, especially if code is generated for a stack engine,
i.e. give two different errors instead of a single error?

If you generate stack engine code, by first generating the operands
and later generating the functor, you would get the same two different
errors and not a single error, i.e. stumbling over [5,4] which cannot
be evaluated. Namely, the second error in SWI-Prolog is from this:

?- X is [5,4].
ERROR: Type error: `[]' expected, found `[5,4]' (a list) 
("x" must hold one character)

The single errors in GNU Prolog come from another approach:
first checking the functor, and then doing something.
I am more curious whether a certain order has a performance
advantage. Like if you do the functor lookup first, you can
release the functor and keep the handle? Don’t know.
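The two orders can be sketched with a toy model (Python; not either system's real code). Evaluating arguments before the functor lookup gives SWI-style errors, looking up the functor first gives GNU-style errors:

```python
# Toy model of the two evaluation orders. Compound terms are tuples,
# numbers evaluate to themselves, anything else is a type error.

FUNCTIONS = {('+', 2): lambda a, b: a + b}

def lookup(op, arity):
    f = FUNCTIONS.get((op, arity))
    if f is None:
        raise TypeError(f"type_error(evaluable,{op}/{arity})")
    return f

def eval_term(term, functor_first=False):
    if isinstance(term, (int, float)):
        return term
    if not isinstance(term, tuple):
        # e.g. a list such as [1,2]: not evaluable
        raise TypeError(f"type_error(expression,{term!r})")
    op, *args = term
    if functor_first:                      # GNU Prolog style
        f = lookup(op, len(args))
        return f(*(eval_term(a, True) for a in args))
    vals = [eval_term(a) for a in args]    # SWI style: arguments first
    return lookup(op, len(args))(*vals)
```

For `('append', [1, 2], [3])` the functor-first order reports append/2 as not evaluable, while the arguments-first order stumbles over `[1, 2]` before ever looking at append.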

See below. The -O flag is also accessible using the Prolog flag optimise.

swipl -O
...
?- [user].
|: test :- _ is append(1,2).

ERROR: user://1:10:
ERROR:    Arithmetic: `append(1,2)' is not a function (No such arithmetic function)
|: test :- _ is 1+2.
|: ^D% user://1 compiled 0.01 sec, 2 clauses
true.

26 ?- vm_list(test).
========================================================================
test/0
========================================================================
       0 s_virgin
       1 i_exit
----------------------------------------
clause 1 (<clause>(0x556dadee59e0)):
----------------------------------------
       0 i_enter
       1 b_void
       2 a_enter
       3 a_integer(2)
       5 a_integer(1)
       7 a_add
       8 a_is
       9 i_exit
true.

I started using this test case:

test :-
   between(0,1000000,N),
   _ is exp(1+N/1000000),
   fail.
test.

This was to test a new Java foreign function interface. I then
observed that SWI-Prolog's stack engine causes
a little overhead:

/* SWI-Prolog, 9.1.14, optimise=false */
?- time(test).
% 2,000,001 inferences, 0.313 CPU in 0.315 seconds
(99% CPU, 6400003 Lips)
true.

/* SWI-Prolog, 9.1.14, optimise=true */
?- time(test).
% 1,000,002 inferences, 0.172 CPU in 0.176 seconds
(98% CPU, 5818193 Lips)
true.

Interestingly GNU Prolog doesn’t use a stack engine,
it just relies on the native stack. It's quite speedy without
any optimisation:

/* GNU Prolog 1.5.0 (64 bits) */
?- test.
(125 ms) yes

The internal call is tail recursive I guess, since the functor is
already checked, and a looked-up handle, a function pointer,
drives the evaluation. Recently GNU Prolog has moved to GitHub,
so I can find the source code of GNU Prolog stuff more easily, things
like Load_Math_Expression. But I think the GNU Prolog approach is
only feasible if you dare to rely on the native stack.

Optimization avoids building the term and then processing it, as well as finding the function from the functor. A little performance analysis found an easy speedup, so for my current git I now get 113ms against GNU-Prolog's 125 (also for me :slight_smile: ) That is quite remarkable because GNU-Prolog doesn’t deal with large integers, overflow checks, float rounding, etc. The arithmetic primitives were never optimized much; I think it would be quite easy to speed this up a little more.

Some Prologs have allowed the programmer to make what look like “function calls”, for example (where $ is used to mark functional evaluation) foo($bar(1,2), Z), which is equivalent to bar(1,2,R), foo(R,Z). We already have this kind of term expansion for the . operator (for dicts); it should be straightforward to extend this to arbitrary predicates. And it might also make sense to define, e.g., +(A,B,C) :- C is A+B, which would allow foo($1+2, Z).

($ is a terrible choice for an operator, but it’s probably relatively unused; other possibilities are *, #, !, etc.)

The most common places where I would like “functional” notation are:

  • When counting – typically, X2 is X+1, foo(X2, ...).
  • When appending – typically append(A,[B],C), foo(C, ...) … this can sometimes be avoided by using [B|A] (but that reverses the order) or by putting [B|A] in the head, or by using difference lists.
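The expansion described above could be done as a source-to-source rewrite. Here is a toy Python sketch (terms as tuples, ('$', Term) marking a functional call; everything here is hypothetical, not SWI-Prolog's actual dict expansion machinery):

```python
import itertools

_fresh = itertools.count()  # supply of fresh result-variable names

def expand_goal(goal):
    """Rewrite ('foo', ('$', ('bar', 1, 2)), 'Z') into the goal list
    [('bar', 1, 2, '_R0'), ('foo', '_R0', 'Z')]."""
    name, *args = goal
    pre, new_args = [], []
    for a in args:
        if isinstance(a, tuple) and a and a[0] == '$':
            r = f'_R{next(_fresh)}'          # fresh result variable
            inner_name, *inner_args = a[1]
            # recurse so nested $-terms are lifted as well
            pre.extend(expand_goal((inner_name, *inner_args, r)))
            new_args.append(r)
        else:
            new_args.append(a)
    return pre + [(name, *new_args)]
```

So `expand_goal(('foo', ('$', ('bar', 1, 2)), 'Z'))` yields the conjunction bar(1,2,R), foo(R,Z) as a list of two goals with the result variable threaded through.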

Could you please open a different thread? This question applies equally well to
the two worlds that Jan W. mentioned, like when you use foo/3 instead of append/3.
This is also a decoy. For God's sake, just look at the first post of this thread:

Both Prolog systems belong to the same world. In both systems ('.')/2 is
also an evaluable function, provided the tail is [].

You can evaluate ('.')/2 in GNU Prolog:

/* GNU Prolog */
?- X = "a".
X = [97]
yes

?- X is "a".
X = 97

And you can evaluate ('.')/2 in SWI-Prolog:

?- set_prolog_flag(double_quotes, codes).
true.

?- X = "a".
X = [97].

?- X is "a".
X = 97.

That ('.')/2 can be evaluated is a legacy which already existed in DEC-10
Prolog, as an alternative to a codepoint constant notation inside
evaluable expressions. But now I have given more background information,
and the question of my post should be clear, right? Do you even
understand what is asked? Or maybe you got distracted because I used
append/3 as an example? Maybe Jan W. also got a little bit distracted?

Maybe it helps if we recall that there are different tree traversal orders.
If you build a tree visitor for Prolog compounds, you have different options,
when you visit the functor. Here are two examples:

prefix_visitor(T) :-
    T =.. [F|L],
    write(F), write(' '),
    ( member(X, L), prefix_visitor(X), fail ; true ).

postfix_visitor(T) :-
    T =.. [F|L],
    ( member(X, L), postfix_visitor(X), fail ; true ),
    write(F), write(' ').

If we take the example and visit the less controversial foo(bar(1,2),3),
then we find that sometimes the functor foo appears first, and sometimes
the functor foo appears last, even after the functor bar:

?- prefix_visitor(foo(bar(1,2),3)).
foo bar 1 2 3 
true.

?- postfix_visitor(foo(bar(1,2),3)).
1 2 bar 3 foo 
true.

I wonder whether the prefix traversal by GNU Prolog has some performance
advantage over the postfix traversal by SWI-Prolog? Like in the prefix traversal
we can get rid of the functor early, no need to keep it till the end, and keep the handle
until the evaluation results are ready. This question is more an interpreter question
about evaluable expressions and less a compiler question about evaluable
expressions. But it could also be relevant for compilation.

I don’t see it. Yes, depending on the order of visiting we get different errors. Fine. Glad (I hope) ISO didn’t define which we should get in this respect. Comparing GNU-Prolog to SWI-Prolog wrt. arithmetic performance is close to pointless. Native compilation almost surely wins. Probably more important is that GNU-Prolog uses raw C datatypes for its arithmetic, does no overflow checking, stack shifts or garbage collection. That helps. In addition SWI-Prolog is safe against stack overflows and cyclic terms during evaluation. So.

/* GNU Prolog */
| ?- A = (A+1), B is A.
Segmentation fault (core dumped)

/* SWI-Prolog */
?- A = (A+1), B is A.
ERROR: Type error: `expression' expected, found `@(S_1,[S_1=S_1+1])' (a cyclic) (cyclic term)

All this stuff costs CPU cycles :frowning: The order of traversal seems to be no important factor in this. The encoding in postfix is normally done by the compiler. Interpreting an expression uses this technique as well though, while avoiding the C-stack. That way we can process much larger terms and ensure a graceful exception if the term is too large anyway. The cycle detection is executed if we get deep into the structure (I think > 1,000 levels).
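The explicit-stack interpretation can be sketched like this (a Python toy, not the real C code in SWI-Prolog): deep terms cost heap instead of native stack, so the depth limit is just a number you can check and turn into a catchable error:

```python
# Evaluate an expression term with an explicit work stack instead of
# native recursion; a too-large term gives a graceful error instead of
# a native stack overflow. Illustrative sketch only.

MAX_NODES = 1_000_000  # stand-in for a resource limit

def eval_iterative(term, ops={'+': lambda a, b: a + b}):
    todo, values = [('visit', term)], []
    while todo:
        action, t = todo.pop()
        if action == 'visit':
            if isinstance(t, (int, float)):
                values.append(t)
                continue
            op, *args = t
            todo.append(('apply', (op, len(args))))
            for a in reversed(args):       # first argument on top
                todo.append(('visit', a))
            if len(todo) > MAX_NODES:
                raise MemoryError("expression too large")
        else:
            op, arity = t
            args = values[-arity:]
            del values[-arity:]
            values.append(ops[op](*args))
    return values.pop()
```

A left-nested sum 10,000 deep would overflow CPython's default recursion limit with a naive recursive evaluator, but runs fine here.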

Correct results, no limits (except for memory) and trapping all errors as Prolog exceptions you can catch is quite high on the agenda (and quite close to being a fact, although there are still a few places where you can trigger C stack overflows).

The byte-code of GNU Prolog doesn’t use any evaluable
function compilation scheme. I was using byte-code for
my benchmark testing. To view the byte-code you can use:

gplc -w math.pl

It then gives me a file math.wbc: nothing compiled, just a
term handed to (is)/2. Just like set_prolog_flag(optimise, false)
in SWI-Prolog, more or less the same code:

clause(:-(test,','(between(0,1000000,A),
','(is(_,exp(+(1,/(A,1000000)))),fail))),[
    allocate(1),
    put_integer(0,0),
    put_integer(1000000,1),
    put_variable(y(0),2),
    call((between)/3),
    put_void(0),
    put_structure((exp)/1,1),
    unify_structure((+)/2),
    unify_integer(1),
    unify_structure((/)/2),
    unify_local_value(y(0)),
    unify_integer(1000000),
    call((is)/2),
    fail]).
clause(test,[
    proceed]).

The evaluable function lookup happens in Load_Math_Expression.
I double checked the GNU manual. Byte-code, and not native-code, is what
you get when you use user consult [user]; that's how I quickly tested.

ISO asks us to do so (as did SWI-Prolog when it used simple native stack traversal long ago).

Depends on your host language. C is very bad at handling stack overflows. I try to handle them for the two remaining recursive C functions, but notably if some cleanup is required it gets really hard. Doing your own stack is not slower, typically uses less stack space, and allows for portable handling of overflows.

Nice test for expression evaluation, where D controls the expression depth and N
the number of iterations. Here we see GNU Prolog is about 4 times faster. This is probably a fair price for sound arithmetic, handling larger and cyclic expressions and recovering gracefully.

t(D,N) :-
    time(t_(D,N)).

t_(D,N) :-
    add(D, Expr),
    (   between(1, N, _),
        _ is Expr,
        fail
    ;   true
    ).

add(0, 1) :- !.
add(N, (Expr+N)) :-
    N2 is N - 1,
    add(N2, Expr).

I have the feeling that it is magnitudes slower: 0.141 CPU
overhead in SWI-Prolog, measured via the optimise flag and
the original test/0 case.

Is it that bad? Isn't there some C programming standard
which has improved on that? Or is this rather an optional
component, like for example garbage collection?

I have already tested CPython, and I got:

With nodeJS, I now get:

/* Dogelog Player 1.1.1, nodeJS */
?- X=1+X, Y is X.
Unknown exception: 0rReference

Yeah, it offers stack overflow exceptions!

If the original is this, my current SWI-Prolog does the job in 105ms and GNU Prolog in
116ms. And the time is not spent in is/2, but mostly in between/3 and backtracking.

between/3 suffers from the relatively high overhead of calling (non-deterministic) foreign
predicates. Possibly calling foreign code can be improved. Surely here we lose quite
a bit of time preparing to leave the VM and re-entering it. Most of that is due to the need to deal with garbage collection and stack shifts while the foreign predicate runs. Something that is just overhead in this case :frowning:

Not that I am aware of. In MSVC there is try … catch, which is fairly trivial to use, but it is not standard C. A few old systems had a stack_avail() call. On POSIX systems you can get the stack size using getrlimit(). Next you have to figure out the direction in which it grows and the base
(guess by rounding the stack top to the page size immediately at main()). From all that you can write your own stack_avail(). Unfortunately you have to be able to guess how much stack is required until the next point where you can/will check.
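On a POSIX system the first ingredient, the stack size limit, is one call away; here is a minimal Python sketch (the rest, base address and growth direction, you still have to guess as described above):

```python
# POSIX-only sketch: query the stack size limit, the first ingredient
# for a hand-rolled stack_avail().
import resource

# Current (soft) and maximum (hard) stack size limits in bytes;
# either may be resource.RLIM_INFINITY.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
if soft == resource.RLIM_INFINITY:
    print("stack size: unlimited")
else:
    print(f"stack size: {soft} bytes")
```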

The alternative is to setup a signal stack, trap SIGSEGV and use setjmp(). But then you need to figure out whether the SEGV was due to a stack overflow or another bug. Anyway, look for the macro C_STACK_OVERFLOW_GUARDED() in the SWI-Prolog source. It protects read/1 and friends as these (still) use the native stack.

Seems CPython uses a parameter for the max recursion depth that defaults to 1,000. In the functional and logic programming world we may call that “recursion for beginners” :slight_smile:
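That limit is easy to observe and to change from Python itself:

```python
import sys

limit = sys.getrecursionlimit()     # 1000 in a stock CPython

def depth(n):
    """Recurse n levels deep."""
    return 0 if n == 0 else 1 + depth(n - 1)

try:
    depth(limit + 100)              # guaranteed to exceed the limit
    overflowed = False
except RecursionError:
    overflowed = True

# sys.setrecursionlimit(100_000) raises the cap, but you can still hit
# the native C stack underneath if you push it far enough.
```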

That was only with set_prolog_flag(optimise, true). But your new test is
agnostic to this flag, I guess; we are now testing dynamic expression
evaluation, inspired by the question of what a_varXXX does:

How can we make this faster?

As said, it is only 4 times slower. I think that is more or less what to expect comparing no safety vs all safety. Safety is expensive :slight_smile: Using t(100000, 10), 30% of the time is spent on cycle detection which is executed after diving 1,000 nodes deep.

If you go through the code of valueExpression() you can probably squeeze it a little more. valgrind claims the PGO-optimized build compiles it as cold code, which means it is optimized for space rather than time. This is because runtime arithmetic expressions are rare and not part of the PGO training set.

Maybe an argument in favor could be if you have resizable thread stack sizes.
Java is surely inflexible in this respect, at least the last time I looked;
not sure what the current state is. There are also fibers coming now.

I think it can be made faster, I guess also in SWI-Prolog. Now what
is the performance of this use of the native stack? Don't know, let's do
some testing. Running your new example gives me:

First SWI-Prolog non-native stack:

?- t(1000, 10000).
% 21,000 inferences, 0.453 CPU in 0.465 seconds (97% CPU, 46345 Lips)
true.

?- t(10000, 10000).
% 30,000 inferences, 4.781 CPU in 5.102 seconds (94% CPU, 6275 Lips)
true.

Then Jekejeke Prolog, which uses the native stack:

/* Jekejeke Prolog 1.6.3, JDK 8 */
?- t(1000, 10000).
% Time 225 ms, GC 6 ms, Wall 11/09/2023 21:25
true.

?- t(10000, 10000).
% Time 2363 ms, GC 23 ms, Wall 11/09/2023 21:25
true.

Two times faster than SWI-Prolog, two times slower than GNU Prolog.

With some small optimizations (adding this to the PGO training data and using a fast path for small integers; GNU-Prolog only has small integers), we get:

101 ?- t(99, 100000).
% 100,101 inferences, 0.189 CPU in 0.189 seconds (100% CPU, 530888 Lips)
true.

102 ?- t(101, 100000).
% 100,101 inferences, 0.356 CPU in 0.356 seconds (100% CPU, 281294 Lips)
true.

vs GNU-Prolog

| ?- t_(99, 100000).
(136 ms) yes
| ?- t_(101, 100000). 
(137 ms) yes

Note that the cycle check is at depth 100 rather than 1,000 as I claimed before. So, without the cycle test SWI-Prolog is less than 50% slower than GNU-Prolog. Now, is the cycle test a bad idea? Hard to say. Such deeply nested evaluable expressions are rare. Not having a check would eventually raise a memory resource error, but only after eating all the memory it can get from the machine, and eating all the memory of the machine can cause all sorts of issues. Probably we can get this a bit faster, but there is not much point as evaluation of expressions not known at compile time is rare.
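The "check only when deep" trick can be sketched as follows (a Python toy using lists as compound terms so a cycle can actually be built; SWI-Prolog's real check works on its own term representation and is more careful — this sketch would also flag a deep shared acyclic subterm):

```python
CHECK_DEPTH = 100   # start paying for cycle detection only below this

def eval_expr(term, depth=0, seen=None):
    """Recursive evaluator; compound terms are lists like ['+', A, B]."""
    if isinstance(term, (int, float)):
        return term
    if seen is None and depth >= CHECK_DEPTH:
        seen = set()                  # shallow terms never allocate this
    if seen is not None:
        if id(term) in seen:
            raise TypeError("cyclic term in arithmetic expression")
        seen.add(id(term))
    op, *args = term
    vals = [eval_expr(a, depth + 1, seen) for a in args]
    return {'+': lambda a, b: a + b}[op](*vals)
```

A shallow expression pays nothing for the check, while a cyclic term such as `X = ['+', 1, X]` is caught about 100 levels down instead of blowing the stack.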

For fun, a changed version of the test below, where t2/2 compiles the expression. Now for
t(1000, 100000) we get 3.795 sec and for t2(1000, 100000) we get 1.041 sec. For the compiled version the compiler does the cycle check, finds the function (ar_add()) from an array, and gets the constant 1 more efficiently.

t(D,N) :-
    add(D, Expr),
    time(t_(N, Expr)).

t_(N, Expr) :-
    (   between(1, N, _),
        _ is Expr,
        fail
    ;   true
    ).

:- dynamic t2_/1.

t2(D, N) :-
    add(D, Expr),
    retractall(t2_(_)),
    asserta((t2_(N) :-
                 (   between(1, N, _),
                     _ is Expr,
                     fail
                 ;   true
                 ))),
    time(t2_(N)).

add(0, 1) :- !.
add(N, (Expr+N)) :-
    N2 is N - 1,
    add(N2, Expr).

Is there any use case for the cycle check? It gives an error somewhat earlier.
If you have a resource quota, you would also get an error only a little bit
later, when the stack is too large and some memory is exhausted.

I don’t see any use case for a cycle check. You could do a cycle check
after a resource is exhausted, as a kind of error analysis service. But it's
not that eval would implement some rational term unification, where
it would be part of the functional requirement. There is no functional
requirement for a cycle test. You could also simply let it exhaust some
resource limit. For example, via a resource quota I simply get:

/* Jekejeke Prolog 1.6.3, JDK 21 */
?- X=1+X, Y is X.
java.lang.StackOverflowError
	at jekpro.reference.arithmetic.EvaluableElem.arit_add(EvaluableElem.java:369)

Since the default thread stack size is 1m, this is quite quickly exhausted,
and I quickly get a StackOverflowError. But do I care whether
a cycle-induced error or a non-cycle-induced error happened?