Please test branch no-local-references (Discussion)

Does this have to do with SWI versus YAP?
I found an old micro benchmark:

test1(0).
test1(N) :- N > 0, M is N - 1, test1(M).

test2(N) :- N > 0, !, M is N - 1, test2(M).
test2(0).

test3(N) :-
     (N > 0 ->
         M is N - 1,
         test3(M)
     ;
         true
     ).

The queries are:

?- time(test1(10000000)).
?- time(test2(10000000)).
?- time(test3(10000000)).

The old timings were on some unknown machine:

Test    SWI 5.5.23    YAP 4.4.2
test1   3980          487
test2   3929          728
test3   4231          1094

I reran the tests just now and got:

Test    SWI 8.3.5    YAP 6.3.0
test1   676          359
test2   750          422
test3   750          343

So there was some improvement already in the past.
The average ratio between SWI and YAP went down
from 5.3 to 1.9. Also there is no more difference
between if-then-else and cut for this example in SWI.
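
The 5.3 and 1.9 figures appear to be the ratio of the summed timings rather than the mean of the per-test ratios. A minimal sketch to recompute them from the two tables, where total_ratio/3 is a helper name made up for this note:

```prolog
:- use_module(library(lists)).

% total_ratio(+SwiTimes, +YapTimes, -Ratio)
% Ratio of the summed SWI timings to the summed YAP timings.
total_ratio(SwiTimes, YapTimes, Ratio) :-
    sum_list(SwiTimes, Swi),
    sum_list(YapTimes, Yap),
    Yap > 0,
    Ratio is Swi / Yap.
```

With the old timings, `?- total_ratio([3980,3929,4231], [487,728,1094], R).` gives roughly 5.26; with the new ones, `?- total_ratio([676,750,750], [359,422,343], R).` gives roughly 1.94.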

It would be quite a gas if this sound barrier could be broken.
Disclaimer: Did not yet test with your new variable handling.

Yes, in the sense that notably the Pereira benchmarks show that SWI loses most on tail recursive benchmarks. This is a requirement to improve on that.

Yip. On quite a few aspects. Notably arithmetic, handling (if->then;else) and small central changes. Current results (using this branch) are (225/269/183), while I get (219/281/199) with 8.3.5 on the same machine, so a small gain. Note that -O makes a huge difference on micro tests using arithmetic. Without it we get (463/499/484). The GIT version of YAP (6.5.0) crashes on all three tests :frowning:
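
To try the -O effect from inside a source file: the `optimise` Prolog flag is what -O sets at startup, and it has to be in effect before the clauses are compiled. A small sketch, with count_down/1 as a stand-in name for test3 above; that the directive matches -O exactly for this benchmark is an assumption, not something measured here:

```prolog
% Assumption: this has the same effect on the clauses below as starting
% swipl with -O. The directive must precede the clauses it should affect.
:- set_prolog_flag(optimise, true).

% Same shape as test3/1: count down to zero using if-then-else.
count_down(N) :-
    (   N > 0
    ->  M is N - 1,
        count_down(M)
    ;   true
    ).
```

Comparing `?- time(count_down(10000000)).` with and without the directive should show whether arithmetic compilation accounts for the gap on a given machine.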

Hi Jan,

That looks very nice and all tests for EYE succeed


Looking at the git diff

things also look nice :slight_smile:

I got a number of warnings like this – I presume they’re OK?

[859/2396] Building C object packages/o.../CMakeFiles/plugin_odbc4pl.dir/odbc.c.o
../packages/odbc/odbc.c: In function ‘odbc_report’:
../packages/odbc/odbc.c:404:7: warning: ignoring return value of ‘PL_put_term’, declared with attribute warn_unused_result [-Wunused-result]
       PL_put_term(av+1, msg);

All tests passed: 100% tests passed, 0 tests failed out of 70

When I ran the swipl compiler, I got this error (I did not see this with the regular executable):

$ /home/peter/src/swipl-devel/build.nlr/src/swipl --stand_alone=true --undefined=error --verbose=false \
    --foreign=save \
    -o /tmp/pykythe_test/pykythe.qlf -c pykythe/pykythe.pl
[FATAL ERROR: at Thu Aug 20 12:12:51 2020
	Could not allocate memory: Cannot allocate memory]

I restarted Chrome (I always have too many tabs open) which freed up over 6GB and still got this error.

System information:
Linux 5.4.0-42-generic #46~18.04.1-Ubuntu SMP (Ubuntu 18.04.5 LTS)
   Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


I tested some of my code with the new swipl, and except for the memory allocation error when compiling, everything worked. The new code is noticeably faster (15-30%, although there is some nondeterminism because I use multi-threading a lot). I have not tried PGO versions of the swipl executable.


Fairly innocent. If you sync the submodules these should go away.

Best way to find out more:

gdb --args swipl ...
(gdb) break fatalError
(gdb) run
<wait for the trap>
(gdb) bt
<stack trace>

That is encouraging. I now have a prototype that performs proper LCO (Last Call Optimization). That shows a nice 30% speedup on naive reverse (i.e., append/3). Extending it and trying it on larger programs though shows only very small improvements (more like 5%).
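
For readers without the benchmark at hand, naive reverse is usually written along these lines. The names app/3 and nrev/2 are chosen here to avoid clashing with library predicates; this is the textbook version, not necessarily the exact file behind the timing above:

```prolog
% app(?Xs, ?Ys, ?Zs): Zs is Xs followed by Ys.
% The recursive call is the last call and passes only variables.
app([], Ys, Ys).
app([X|Xs], Ys, [X|Zs]) :-
    app(Xs, Ys, Zs).

% nrev(+Xs, -Rs): quadratic-time reverse built on app/3.
nrev([], []).
nrev([X|Xs], Rs) :-
    nrev(Xs, Rs0),
    app(Rs0, [X], Rs).
```

Nearly all the time goes into the app/3 loop, which is why the benchmark is sensitive to how the last call is handled. For example, `?- nrev([1,2,3], R).` gives `R = [3,2,1]`.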


@jan - here’s the gdb output from running the compiler:

$ gdb --args /home/peter/src/swipl-devel/build.nlr/src/swipl --stand_alone=true --undefined=error --verbose=false --foreign=save     -o /tmp/pykythe_test/pykythe.qlf -c pykythe/pykythe.pl
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/peter/src/swipl-devel/build.nlr/src/swipl...done.
(gdb) break fatalError
Function "fatalError" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (fatalError) pending.
(gdb) run
Starting program: /home/peter/src/swipl-devel/build.nlr/src/swipl --stand_alone=true --undefined=error --verbose=false --foreign=save -o /tmp/pykythe_test/pykythe.qlf -c pykythe/pykythe.pl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff59f2700 (LWP 31352)]

Thread 1 "swipl" hit Breakpoint 1, fatalError (
    fm=fm@entry=0x7ffff7925f60 "Could not allocate memory: %s") at ../src/pl-init.c:1547
1547	{ va_list args;
(gdb) bt
#0  fatalError (fm=fm@entry=0x7ffff7925f60 "Could not allocate memory: %s")
    at ../src/pl-init.c:1547
#1  0x00007ffff78202fa in outOfCore () at ../src/pl-alloc.c:755
#2  0x00007ffff7855275 in compileArgument (arg=<optimized out>, arg@entry=0x7ffff7eb6fd0, 
    where=<optimized out>, where@entry=6, ci=ci@entry=0x7fffffffce00, 
    __PL_ld=__PL_ld@entry=0x7ffff7b86cc0 <PL_local_data>) at ../src/pl-comp.c:2251
#3  0x00007ffff7854c26 in compileArgument (arg=0x7ffff7eb6fd0, arg@entry=0x7ffff7e79c08, where=6, 
    where@entry=2, ci=ci@entry=0x7fffffffce00, 
    __PL_ld=__PL_ld@entry=0x7ffff7b86cc0 <PL_local_data>) at ../src/pl-comp.c:2432
#4  0x00007ffff7858ef9 in compileSubClause (arg=0x7ffff7e79c08, arg@entry=0x7ffff7e79b90, 
    call=call@entry=58, ci=ci@entry=0x7fffffffce00) at ../src/pl-comp.c:2716
#5  0x00007ffff785ee4f in compileBody (body=0x7ffff7e79b90, call=call@entry=58, 
    ci=ci@entry=0x7fffffffce00, __PL_ld=__PL_ld@entry=0x7ffff7b86cc0 <PL_local_data>)
    at ../src/pl-comp.c:2090
#6  0x00007ffff785fa6f in compileBody (body=body@entry=0x7ffff7eb8d30, call=call@entry=59, 
    ci=ci@entry=0x7fffffffce00, __PL_ld=__PL_ld@entry=0x7ffff7b86cc0 <PL_local_data>)
    at ../src/pl-comp.c:1945
#7  0x00007ffff7860619 in compileClause (cp=cp@entry=0x7fffffffd020, head=<optimized out>, 
    body=0x7ffff7eb8d30, proc=proc@entry=0x5555564540c0, module=0x555556253200, 
    warnings=warnings@entry=0, __PL_ld=0x7ffff7b86cc0 <PL_local_data>) at ../src/pl-comp.c:1714
#8  0x00007ffff7860f1b in assert_term (term=412, module=<optimized out>, module@entry=0x0, 
    where=where@entry=0x2, owner=owner@entry=0, loc=loc@entry=0x0, flags=flags@entry=0, 
    __PL_ld=0x7ffff7b86cc0 <PL_local_data>) at ../src/pl-comp.c:3734
#9  0x00007ffff78ebf22 in assert_wrapper (clause=<optimized out>, 
    __PL_ld=0x7ffff7b86cc0 <PL_local_data>) at ../src/pl-wrap.c:200
#10 0x00007ffff78ec3d7 in pl_c_wrap_predicate5_va (PL__t0=408, PL__ac=<optimized out>, 
    PL__ctx=<optimized out>) at ../src/pl-wrap.c:303
#11 0x00007ffff783ab8e in PL_next_solution (qid=qid@entry=24) at ../src/pl-vmi.c:3908
#12 0x00007ffff78846d4 in query_loop (goal=goal@entry=380293, loop=0) at ../src/pl-pro.c:136
#13 0x00007ffff7884f7b in prologToplevel (goal=380293) at ../src/pl-pro.c:482
#14 0x00007ffff78c8990 in PL_initialise (argc=<optimized out>, argv=<optimized out>)
    at ../src/pl-init.c:1151
#15 0x0000555555554756 in main (argc=9, argv=0x7fffffffe078) at ../src/pl-main.c:139

(gdb) show configuration
This GDB was configured as follows:
   configure --host=x86_64-linux-gnu --target=x86_64-linux-gnu
             --with-auto-load-dir=$debugdir:$datadir/auto-load
             --with-auto-load-safe-path=$debugdir:$datadir/auto-load
             --with-expat
             --with-gdb-datadir=/usr/share/gdb (relocatable)
             --with-jit-reader-dir=/usr/lib/gdb (relocatable)
             --without-libunwind-ia64
             --with-lzma
             --with-python=/usr (relocatable)
             --without-guile
             --with-separate-debug-dir=/usr/lib/debug (relocatable)
             --with-system-gdbinit=/etc/gdb/gdbinit
             --with-babeltrace

("Relocatable" means the directory can be moved with the GDB installation
tree, and GDB will still find it.)

The traceback looks weird, so here are my source cksums, so you can confirm it’s the same:

src (no-local-references=)]$ cksum *.[ch]
3192933558  5100    defatom.c
2703422     9481    mkvmi.c
1579298944  4421    pentium.c
3810132944  3225    pentium.h
3679655744  39706   pl-alloc.c
3094503525  5104    pl-alloc.h
168324353   2586    pl-allocpool.c
1877401875  2161    pl-allocpool.h
3266861646  110000  pl-arith.c
244690134   4924    pl-arith.h
696917127   3502    pl-assert.c
1152863446  55332   pl-atom.c
2293389129  32140   pl-attvar.c
419982284   10858   pl-bag.c
1385357881  2966    pl-beos.c
2015293418  3663    pl-btree.c
3043062154  11501   pl-builtin.h
3583501801  3423    pl-codelist.h
4286413215  1653    pl-codetable.c
3284100915  191675  pl-comp.c
2340691908  4017    pl-comp.h
356858181   13328   pl-cont.c
1267734278  20154   pl-copyterm.c
117166466   2064    pl-copyterm.h
194910515   12373   pl-data.h
43749538    6357    pl-dbref.c
173127326   2137    pl-dbref.h
609497925   17219   pl-dde.c
2609824598  9367    pl-debug.c
2285396334  6451    pl-debug.h
320754582   36615   pl-dict.c
2283301360  2568    pl-dict.h
416100957   6416    pl-dwim.c
3336608453  29968   pl-error.c
4011024792  4413    pl-error.h
1110636621  16490   pl-event.c
1688587197  4318    pl-event.h
3032598468  16124   pl-ext.c
3895804789  5446    pl-flag.c
1346891260  105691  pl-fli.c
3086221742  32523   pl-funcs.h
1617660890  13326   pl-funct.c
61021472    145612  pl-gc.c
2807729484  27595   pl-global.h
3739201007  41931   pl-gmp.c
1942644242  5268    pl-gmp.h
1073674255  9291    pl-gvar.c
3857730076  3965    pl-hash.c
3600542337  416     pl-hash.h
822196172   86881   pl-incl.h
1599039188  70776   pl-index.c
2516193089  11995   pl-indirect.c
2416537859  3323    pl-indirect.h
3628676221  41784   pl-init.c
2895163602  4072    pl-init.h
4153803532  17081   pl-inline.h
2542107114  7637    pl-ldpass.h
2126564695  16566   pl-list.c
539450058   9192    pl-load.c
771412655   4402    pl-main.c
4232509298  42337   pl-modul.c
3962144524  16379   pl-mutex.c
3478162866  5607    pl-mutex.h
3590275782  27071   pl-nt.c
2670434290  12158   pl-ntconsole.c
680150452   29299   pl-ntmain.c
1906218269  20249   pl-op.c
232860969   136724  pl-prims.c
4278447739  4608    pl-privitf.c
629313199   3493    pl-privitf.h
4159192138  19879   pl-pro.c
278654607   103793  pl-proc.c
3811368285  31329   pl-prof.c
923087720   2214    pl-prof.h
661425183   127193  pl-read.c
1298716439  52488   pl-rec.c
1807804143  3629    pl-ressymbol.c
3124305088  1907    pl-ressymbol.h
3942209579  3977    pl-rsort.c
1912951200  1856    pl-rsort.h
1263068406  8046    pl-segstack.c
2085862315  5508    pl-segstack.h
8559700     41296   pl-setup.c
1526139889  42151   pl-srcfile.c
1210872503  12617   pl-string.c
377826690   12533   pl-supervisor.c
1217539705  3423    pl-sys.c
4055742569  179499  pl-tabling.c
2559564414  8908    pl-tabling.h
4004779490  9471    pl-term.c
826231697   25086   pl-termhash.c
185749438   12136   pl-termwalk.c
1361635016  176329  pl-thread.c
48307327    18006   pl-thread.h
485020833   59157   pl-trace.c
4069801006  18038   pl-transaction.c
1292592495  1919    pl-transaction.h
3570327249  69803   pl-trie.c
2549518641  7361    pl-trie.h
4202624014  199278  pl-umap.c
1061478064  7781    pl-util.c
3925866677  11011   pl-variant.c
2766394743  1757    pl-version.c
3473203690  152843  pl-vmi.c
3237686623  91854   pl-wam.c
346553956   96132   pl-wic.c
1276735639  11991   pl-wrap.c
1058156007  1994    pl-wrap.h
188413860   52554   pl-write.c
1118340588  8671    pl-xterm.c
3467450833  29622   pl-zip.c
3995774470  3229    pl-zip.h
2645846778  40995   swipl-ld.c
1126481414  51499   SWI-Prolog.h

In case it helps, here’s the last part of strace with the same command. The last mmap seems to be asking for 0x10000001000 bytes (1 TiB plus one 4 KiB page).

stat("/home/peter/src/pykythe/pykythe", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/home/peter/src/pykythe/pykythe/pykythe_utils.pl", {st_mode=S_IFREG|0644, st_size=19005, ...}) = 0
access("/home/peter/src/pykythe/pykythe/pykythe_utils.pl", R_OK) = 0
stat("/home/peter/src/pykythe/pykythe", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
read(5, "a1Hex),\n        SrcSha1Hex == Sh"..., 4096) = 2276
futex(0x7fca622d37d8, FUTEX_WAKE_PRIVATE, 1) = 1
close(5)                                = 0
clock_gettime(0xfffeb2a6 /* CLOCK_??? */, {tv_sec=0, tv_nsec=194220467}) = 0
read(4, "nches).\n:- style_check(+no_effec"..., 4096) = 4096
stat("/home/peter/src/pykythe/pykythe/rdet2.pl", {st_mode=S_IFREG|0644, st_size=3002, ...}) = 0
access("/home/peter/src/pykythe/pykythe/rdet2.pl", R_OK) = 0
stat("/home/peter/src/pykythe/pykythe", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
read(4, " kyfacts_signature_node/5,\n     "..., 4096) = 4096
rt_sigprocmask(SIG_BLOCK, ~[QUIT BUS SEGV CONT STOP PROF RTMIN RT_1], [], 8) = 0
clock_gettime(0xfffeb2a6 /* CLOCK_??? */, {tv_sec=0, tv_nsec=195165735}) = 0
mmap(NULL, 401408, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fca625c5000
munmap(0x7fca62627000, 335872)          = 0
clock_gettime(0xfffeb2a6 /* CLOCK_??? */, {tv_sec=0, tv_nsec=195403868}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
clock_gettime(0xfffeb2a6 /* CLOCK_??? */, {tv_sec=0, tv_nsec=195796708}) = 0
rt_sigprocmask(SIG_BLOCK, ~[QUIT BUS SEGV CONT STOP PROF RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
clock_gettime(0xfffeb2a6 /* CLOCK_??? */, {tv_sec=0, tv_nsec=195944439}) = 0
mmap(NULL, 1099511631872, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "[FATAL ERROR: at Fri Aug 21 12:5"..., 43) = 43
write(2, "Could not allocate memory: Canno"..., 49) = 49
write(2, "]\n", 2)                      = 2
futex(0x7fca622d37dc, FUTEX_WAKE_PRIVATE, 1) = 1
nanosleep({tv_sec=0, tv_nsec=100000000}, 0x7fff2a054e00) = 0
close(4)                                = 0
rt_sigaction(SIGHUP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGILL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGABRT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGFPE, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGALRM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGUSR2, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGXCPU, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGXFSZ, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigaction(SIGVTALRM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, {sa_handler=0x7fca61fe9880, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fca61b8bfd0}, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid()                                = 10667
gettid()                                = 10667
tgkill(10667, 10667, SIGABRT)           = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=10667, si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++

This bug appears to be related to an earlier bug I reported, which couldn’t be reproduced. I’ve sent @jan information on how to reproduce this. (In other words, this is a Heisenbug that probably isn’t related to the no-local-references changes but to something else that’s pretty rare.)

Looks pretty much ok for a stack trace from an optimized executable. To get something more understandable, check out CMAKE.md for how to build for debugging. This simply tells us it runs out of memory while trying to wrap some predicate.

I don’t know exactly which git version you are on. Simply make sure there is no diff wrt the git using either git status or git diff. Note that the no-local-references branch has been updated a few times after rebasing it to the current master. To update, run git fetch and git reset --hard rather than the usual git pull.

This is rather inconsistent with the backtrace. This indeed looks like an older problem. I found the test for this and that doesn’t reproduce.

Got your test case. Thanks. It doesn’t reproduce though. Tried using several versions: the normal build, the PGO build, the debug build and the one compiled with AddressSanitizer. All run fine. Used strace to trace mmap() calls. Looks pretty much sane. Used /bin/time -f "%M" to get the peak RSS usage. This says 21 MB, which doesn’t look alarming to me.

Are you using the tcmalloc() version or the system malloc()?

Next step is probably to compile for debugging and/or with AddressSanitizer (for both see CMAKE.md). Using GDB on the debug version should allow you to look around and get some clue on whether it performs a normal allocation or something weird. The AddressSanitizer should normally trap memory issues. Another tool is valgrind. That is really easy to use, simply run valgrind command arg .... That too finds memory errors. Typically valgrind is more precise, but the emulation causes your program to run 10-20 times slower.


Considering that @peter.ludemann’s problem seems unrelated I’ve merged this into master. This also includes a partial optimization for last call optimization in general and more specifically for tail-recursive calls (i.e., last calls that call the same predicate).

When I got the minimal example running for naive reverse I thought this would be a serious improvement. The nrev benchmark is 25-30% faster now. Running on more general programs shows only small improvements, more like 5%. Hardly worth the trouble :frowning: . As we have this now anyway and this seems a better starting point for further improvements it is probably worth continuing from here.

The new LCO is restricted to tail calls that only pass variables, small integers and atoms. Also passing a variable to an argument to the left of the original disables the new LCO. Still, this optimization applies to roughly 30% of the tail calls.
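
A possible reading of the first restriction, shown on two made-up predicates. Which calls the compiler actually treats this way has not been checked here, and the second restriction (a variable moving to an argument further left) is not illustrated:

```prolog
% Last call passes only variables (T, N1, N): the kind of tail call
% the restriction as worded would allow.
len([], N, N).
len([_|T], N0, N) :-
    N1 is N0 + 1,
    len(T, N1, N).

% Last call passes a compound term ([X|Acc]): outside the new LCO
% as described, since it is not a variable, small integer or atom.
rev([], Acc, Acc).
rev([X|T], Acc, R) :-
    rev(T, [X|Acc], R).
```

Both predicates remain tail recursive in the usual sense; the difference is only whether the new fast path could apply.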

Please test and report issues.


I’m using tcmalloc. However, the “master” branch worked and it was also built with tcmalloc.

I made a fresh clone (just to be sure) and got the same problem. 8.3.5-93-gc2da063e9-DIRTY

I’ll try the various debugging tests you’ve suggested; it could be a few days (or more) before I can report any useful results. I’ll append anything interesting to this message.

My tests all work with the new merged master (except the weird compiler bug that I’m working on isolating and which doesn’t seem related to no-local-references).

I’m not seeing any significant performance difference between /usr/bin/swipl (which doesn’t use tcmalloc) and my locally built versions (which use tcmalloc) … the difference I saw before would seem to have been due to something else (the code uses threads quite heavily to process a partially ordered set of files and can non-deterministically reprocess some of the files - the processing is idempotent, so it doesn’t hurt to process the same file twice at almost the same time).


Which version is that and how did it get there?

Latest version from the PPA (8.3.5-1-g56bf6c148-bionicppa2)

$ /usr/bin/swipl --version
SWI-Prolog version 8.3.5 for x86_64-linux

The problem I reported with running the compiler seems to be fixed with https://github.com/SWI-Prolog/swipl-devel/commit/0a0d9800134b361eb2344f29cece8b0a4b571666

Thank you, @jan!
