Is this doing TCO, and if so how?

::Annie is here, arguing bitterly with herself::

When I gave the SWI-Prolog course a couple years ago I made this exercise, and included the
code below, with the explanation

in double/2 we compute the value of HOut on the recursion down, but tack it onto the front of the output list on the way out. We save stack by using TCO .

Jason Rothfuss, one of my students, writes

The part I’m confused about is in bold about saving stack. I don’t see how stack can be saved here because you have to hold on to the HOut value until you’re on the way up/out. In other words, I don’t believe swipl can achieve TCO because the stack frame has a value that must be preserved to be used on the way up/out. Is that right, or am I missing something?

I agree with him, I guess (though at this point I’m baffled too).
Running the test, I see a spike in global but not in local.

So either this is doing TCO, or I’m misunderstanding the thread monitor output.

double([], []).
double([H|T], [HOut|TOut]) :-
    HOut is 2 * H,
    double(T, TOut).
test :- 
    findall(X, between(1, 1_000_000, X), List),
    writeln('made list'),
    double(List, _).

::Annie is here, looking baffled::

1 Like

I have been told (by Jan himself) that I can check if tail-call optimization is performed by using prolog_current_frame/1. If it is the same in consecutive calls, then it is indeed optimized.

Like this:

double([], []).
double([H|T], [HOut|TOut]) :-
    format("~w~n", [Frame]),
    HOut is 2 * H,
    double(T, TOut).

And when you run it:

?- double([1,2,3,4], R).
R = [2, 4, 6, 8].

I admit this is a functional readout but the good thing about it is that I don’t need to understand the mechanism in order to make a meaningful conclusion.

Here starts the part where I say a lot of stupid things and someone corrects me. I will attempt an explanation that does not involve looking at byte code or the VM:

As you go deeper in the recursion, you are progressively unifying a free variable (the tail of the list in the second argument) with a new list with one element and a free variable as a tail. The algorithm itself only needs to hold on to that free variable, so it is “constant space” (but creates a data structure that grows).

This is identical to having to hold on to only the last element in a linked list if you want to copy it to another one.


Ok, that was my suspicion after looking at the thread monitor, and the prolog_current_frame trick is awsome!

“TCO” is a new term to me … I’m always used to “TRO” (Tail Recursion Optimization"). But “TCO” is more accurate.

I remember once trying to explain (and failing) to a LISPer why append/3 is TRO/TCO in Prolog but not in Lisp:

(defun append (ls1 ls2)
    (if (null? ls1)
      (cons (car ls1) (append (cdr ls1) ls2)))))

compared to Prolog:

append([X|Xs],Ys,[X|Zs]) :-

In LISP, the final call is to cons, so there’s no tail-call (the if isn’t the final call because it can be collapsed). In Prolog (when called with the first two arguments sufficiently instantiated), the final call is to append/3 because the “output” argument ([X|Zs]) contains a “place holder” (logical variable) that’s allocated on the heap (there are technicalities here, but I’m gong to avoid going into the details of Prolog’s stack, trail, heap). And the TCO/TRO isn’t entirely free: the logical variable uses space, even when it’s eventually instantiated.

1 Like

The logical variable is like magic, in the sense that it gives you a lot that you’d do with pointers in other languages, without the pointers. Like having your cake and eating it.


Not a direct answer but something of note that people often forget to check when looking for more info. While Google is great don’t for that sometimes a picture is worth a thousand words and instead of looking for text based answer look for a picture by searching Google images (ref).

Note: I did not check any of the images for a clearer understanding but there are many with different ways of trying to represent how TCO works.

But, as @Boris says, a lot of the magic is in the logical variable, and that magic doesn’t show in most TCO diagrams. TCO is nice for functional languages (Erlang, for example, depends heavily on it) and a lot nicer for Prolog.

1 Like

My preference for an image that displays the significance of TCO would be one without TCO and showing how the stack grows because it is keeping each frame and a second with the code using TCO as showing the same memory being reused for each call. To make the image effective it would also have to show about three recursive calls with the results of the current stack and then the change of either adding a new frame or changing the current frame.

If I wasn’t busy looking into migrating the site I would do this.