I think you literally compile the last call into iteration in some cases,
depending on the determinism of the current clause. This compilation
gives quite a speedup. But I guess it also accounts for a
further budget of 10-100 days of compiler design and implementation:
    app([X|Y], Z, [X|T]) :- app(Y, Z, T).

    ?- vm_list(app).
         0 h_list_ff(3,4)
         3 h_void
         4 h_list
         5 h_var(3)
         7 h_firstvar(5)
         9 h_pop
        10 i_enter
        11 l_nolco('L1')
        13 l_var(0,4)
        16 l_var(2,5)
        19 i_tcall
    L1: 20 b_var(4)
        22 b_var1
        23 b_var(5)
        25 i_depart(app/3)
        27 i_exit

The depart instruction is also found in A Portable Prolog Compiler, Clocksin
et al., from 1983, as an addition in the section “Some Additions to
the Intermediate Language”; it's not part of the 7 basic instructions.
Clocksin refers to a Warren 1980 paper concerning the depart instruction.
What SWI-Prolog added further is l_nolco and i_tcall, with l_nolco doing
the determinism check, right? And i_tcall doing the looping jump. It seems
you always need the check, since freeze/2 can sneak in non-determinism.
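
To make the freeze/2 point concrete, here is a small sketch of what I have in mind. The base clause of app/3 and the helpers twice/1 and count_solutions/1 are my own additions for illustration; they are not from the listing above. The frozen goal is woken when the head unification binds the third argument, and it leaves a choicepoint behind before the last call is reached:

```prolog
:- use_module(library(aggregate)).

% Assumed full definition: the listing above only shows the recursive clause.
app([], Z, Z).
app([X|Y], Z, [X|T]) :- app(Y, Z, T).

% Hypothetical helper: succeeds twice, so waking it leaves a choicepoint.
twice(_).
twice(_).

% Count the solutions of app/3 when the third argument carries a frozen goal.
% The goal is woken when the head unifies L with [X|T].
count_solutions(N) :-
    aggregate_all(count,
                  ( freeze(L, twice(L)),
                    app([1,2], [3], L) ),
                  N).
```

Without the freeze/2 the call app([1,2], [3], L) has exactly one solution. With it, ?- count_solutions(N). gives N = 2, so the clause cannot be known to be deterministic at compile time, at least as far as I can tell.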

