Janus and SWISH

I did it and it’s working. However, when I launch SWISH with

swipl run.pl

run the liftcover examples with multiple threads, and then exit with Ctrl+D, swipl hangs and I have to kill the process.
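
The pattern is roughly the following (a minimal sketch rather than the actual liftcover code; hang_test is a made-up name): a few Prolog threads call into Python via Janus, and the hang shows up when the process exits afterwards.

:- use_module(library(janus)).

% Spawn a few threads that each call into Python via Janus and
% wait for them.  After this succeeds, exiting the top level
% (Ctrl+D or halt/0) is where the hang occurs.
hang_test :-
    numlist(1, 4, Ns),
    maplist([N, Id]>>thread_create(py_call(math:sqrt(N), _), Id, []),
            Ns, Ids),
    maplist(thread_join, Ids).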

I managed to reproduce the bug. I ran

swipl ./daemon.pl --port=3050 --pidfile=/home/rzf/swish/swish.pid --output=o

and, after executing a query on https://cplint.eu/p/bongard_em_python.pl, I got:

% Started server at http://localhost:3050/
SWI-Prolog [thread 15 () at Sat Jan 13 12:14:37 2024]: received fatal signal 11 (segv)
C-stack trace labeled "crash":
  [0] save_backtrace() at /home/rzf/swipl-devel/src/os/pl-cstack.c:337 [0x7effb3b31e75]
  [1] print_c_backtrace() at /home/rzf/swipl-devel/src/os/pl-cstack.c:911 [0x7effb3b32040]
  [2] sigCrashHandler() at /home/rzf/swipl-devel/src/os/pl-cstack.c:949 [0x7effb3b3217b]
  [3] __restore_rt() at libc_sigaction.c:? [0x7effb3642520]
  [4] PyErr_Occurred() at ??:? [0x7effa947cc42]
  [5] check_error() at /home/rzf/swipl-devel/packages/swipy/janus/janus.c:475 [0x7effb2281374]
  [6] PL_next_solution___LD() at /home/rzf/swipl-devel/src/pl-vmi.c:4681 [0x7effb3a2edd0]
  [7] PL_call_predicate() at /home/rzf/swipl-devel/src/pl-fli.c:4282 [0x7effb3b02914]
  [8] py_init() at /home/rzf/swipl-devel/packages/swipy/janus/janus.c:1534 [0x7effb227edbe]
  [9] py_call3() at /home/rzf/swipl-devel/packages/swipy/janus/janus.c:2007 [0x7effb22844fd]
  [10] PL_next_solution___LD() at /home/rzf/swipl-devel/src/pl-vmi.c:4675 [0x7effb3a2ee4e]
  [11] callProlog() at /home/rzf/swipl-devel/src/pl-pro.c:482 [0x7effb3a78621]
  [12] start_thread() at /home/rzf/swipl-devel/src/pl-thread.c:2093 [0x7effb3ab0e4f]
  [13] start_thread() at ./nptl/pthread_create.c:442 [0x7effb3694ac3]
  [14] __clone3() at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:83 [0x7effb3726850]
Prolog stack:
Running on_halt hooks with status 139
Killing 538553 with default signal handlers

Running gdb on the core, I got:

gdb -c core._usr_local_lib_swipl_bin_x86_64-linux_swipl.1000.652517a3-43a8-4fcd-9992-94c7d8eff85c.538242.15458584 --args swipl ./daemon.pl --port=3050 --pidfile="$pid_file" --output="$output_file"
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Reading symbols from swipl...
[New LWP 538553]
[New LWP 538242]
[New LWP 538245]
[New LWP 538247]
[New LWP 538246]
[New LWP 538243]
[New LWP 538244]
[New LWP 538249]
[New LWP 538250]
[New LWP 538402]
[New LWP 538404]
[New LWP 538413]
[New LWP 538403]
[New LWP 538552]
[New LWP 538248]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `swipl ./daemon.pl --port=3050 --pidfile=/home/rzf/swish/swish.pid --output=o'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007effa947cc42 in PyErr_Occurred ()
   from /lib/x86_64-linux-gnu/libpython3.10.so.1.0
[Current thread is 1 (Thread 0x7effaa297640 (LWP 538553))]
(gdb) bt
#0  0x00007effa947cc42 in PyErr_Occurred ()
   from /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#1  0x00007effb2281374 in check_error (obj=0x0)
    at /home/rzf/swipl-devel/packages/swipy/janus/janus.c:473
#2  py_initialize_ (prog=<optimized out>, Argv=<optimized out>,
    options=<optimized out>)
    at /home/rzf/swipl-devel/packages/swipy/janus/janus.c:1610
#3  0x00007effb3a2edd0 in PL_next_solution___LD (__PL_ld=<optimized out>,
    qid=qid@entry=0x5588e9436040) at /home/rzf/swipl-devel/src/pl-vmi.c:4681
#4  0x00007effb3b02914 in PL_call_predicate (ctx=ctx@entry=0x0,
    flags=flags@entry=2, pred=<optimized out>, h0=h0@entry=0)
    at /home/rzf/swipl-devel/src/pl-fli.c:4281
#5  0x00007effb227edbe in py_init ()
    at /home/rzf/swipl-devel/packages/swipy/janus/janus.c:1533
#6  py_gil_ensure (state=state@entry=0x7effaa2960b0)
    at /home/rzf/swipl-devel/packages/swipy/janus/janus.c:1891
#7  0x00007effb22844fd in py_call3 (Call=<optimized out>, result=476,
    options=0) at /home/rzf/swipl-devel/packages/swipy/janus/janus.c:2007
#8  0x00007effb3a2ee4e in PL_next_solution___LD (
    __PL_ld=__PL_ld@entry=0x5588e93c5200, qid=qid@entry=0x5588e9378fa0)
    at /home/rzf/swipl-devel/src/pl-vmi.c:4675
#9  0x00007effb3a78621 in callProlog (module=<optimized out>,
    goal=goal@entry=20, flags=flags@entry=8, ex=ex@entry=0x7effaa296d00)
    at /home/rzf/swipl-devel/src/pl-pro.c:475
#10 0x00007effb3ab0e4f in start_thread (closure=0x5588e956d080)
    at /home/rzf/swipl-devel/src/pl-thread.c:2063
#11 0x00007effb3694ac3 in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:442
#12 0x00007effb3726850 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Note that when running the same query on the same file using swipl run.pl, there is no crash.

This looks like something else. It reproduces and gives an important clue:

unknown option --pidfile=swish.pid
usage: /home/janw/src/swipl-devel/build.pgo/src/swipl [option] ... [-c cmd | -m mod | file | -] [arg] ...
Try `python -h' for more information.

That is because we pass all non-Prolog command line options to py_initialize/3. That is probably not a good idea. I have changed this to use a new Prolog flag py_argv, which defaults to [].
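
For completeness, a minimal sketch of how a script can now hand options to the embedded Python interpreter through the new flag; this assumes py_argv takes a list of atoms and is consulted before the first call into Python, and the -X utf8 option and check_argv/1 helper are made up for the example.

:- use_module(library(janus)).

% Must be set before anything initializes Python.
:- set_prolog_flag(py_argv, ['-X', utf8]).

% Inspect what the Python side actually received.
check_argv(Argv) :-
    py_call(sys:argv, Argv).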

This works for me. Of course, the crash should be handled more cleanly; I am not sure how. We get a Python error, but apparently not enough of Python is initialized to handle it cleanly.

Back to “The bug”: cleanly shutting down Python in multi-threaded Prolog is a rather tricky business. It probably needs some improvements, but whether it can ever be 100% safe, I do not know. Reproducing cases where it goes wrong can help us avoid most pitfalls.


Many thanks, it now works; I have updated https://cplint.eu. Thanks as always :pray: