Janus and swish

Hi,
I’m trying to use Janus in liftcover, a pack for machine learning, and make it run under swish in https://cplint.eu. However, after the first successful run of a predicate that uses Janus, the second call gives a segmentation violation. How can I start a different Python engine for each run of the predicate?

Fabrizio

As it stands, you cannot. Shutdown of the Python engine is done as part of Prolog shutdown. It is incomplete in the sense that several global (caching) data structures are not reset. That could, at least in theory, all be fixed such that we could add a py_shutdown/0, after which one could (lazily) create a new Python engine. This would not be thread-safe and thus not usable with SWISH.

Another option would be to allow for explicit creation and destruction of Python engines. The Python C API doesn’t pass an engine (interpreter) argument though, so I assume there can be only one Python interpreter in a process.

If there is no way to reuse the existing Python interpreter for the second run, I fear that doing the Python work in a new process is the only option.
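A minimal sketch of the separate-process route, seen from the Python side. It assumes the work can be expressed as JSON in, JSON out; the worker source and the name run_in_fresh_interpreter are hypothetical and only stand in for the real numpy/torch code. Each call starts a brand-new interpreter, so no Python state survives between runs.

```python
import json
import subprocess
import sys

# Hypothetical worker: stands in for the real numpy/torch computation.
WORKER_SOURCE = """
import json, sys
payload = json.load(sys.stdin)
json.dump({"total": sum(payload["values"])}, sys.stdout)
"""


def run_in_fresh_interpreter(values):
    """Run the worker in a new Python process and return its decoded result."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps({"values": values}),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the worker exits non-zero
    )
    return json.loads(completed.stdout)


# Two independent runs: each gets its own interpreter and its own globals.
first = run_in_fresh_interpreter([1, 2, 3])
second = run_in_fresh_interpreter([4, 5])
print(first, second)
```

The price is process start-up time and serialization on every call, so this only pays off when the Python work is heavy compared to the data being passed.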

Or, is the crash in the second run due to a bug in Janus? Janus for SWI-Prolog does support multiple Prolog and Python threads. This is not thoroughly tested though. Can you produce crash info using gdb from a Prolog+Python compiled in debug mode?

My code was using global variables. I thought that was the problem and removed them, but the problem didn’t go away.
I would be happy to produce crash info but I need a few instructions: I know how to compile swipl with debug, but I don’t know how to compile Python with debug and I’m not sure how to obtain crash info using gdb.

There is also a problem with output from Python and swish: when I run a program printing output from Python, I get the following message

?- Thread 17 (): foreign predicate system:current_prolog_flag/2 did not clear exception:
	error(existence_error(thread,17),context(system:thread_signal/2,_19044))
 0

The Python output appears in the console instead of the query window and SWISH reports the query as dead.
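One possible workaround, sketched under assumptions: capture what the Python function prints and hand it back as a string, so the Prolog side can print it in the query window itself. The helper call_capturing_stdout and the function noisy_score are hypothetical names, not part of Janus or liftcover. Note that contextlib.redirect_stdout swaps the global sys.stdout, so it only catches Python-level prints and is not safe when several threads print at the same time.

```python
import contextlib
import io


def call_capturing_stdout(func, *args, **kwargs):
    """Call func and return (result, everything it printed to sys.stdout)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_score(x):
    # Hypothetical stand-in for a learning routine that prints progress.
    print(f"Random_restart: Score {x}")
    return x * 2


result, text = call_capturing_stdout(noisy_score, 21)
```

The caller then gets both the value and the captured text, and can decide where the text should go.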

So I managed to find out how to produce a crash file (I think).
On my Mac, this is what I got

sudo gdb -ex r --args swipl run.pl
GNU gdb (GDB) 14.1
Reading symbols from swipl...
Starting program: /usr/local/bin/swipl run.pl
[New Thread 0x1503 of process 81198]
[Thread 0x1503 of process 81198 exited]
[New Thread 0x2903 of process 81198]
warning: unhandled dyld version (17)
Warning: /Users/fabrizio/swish/lib/gitty_driver_files.pl:199:
Warning:    Local definition of gitty_driver_files:ensure_directory/1 overrides weak import from files_ex
% Updating GIT version stamps in the background.
% Started server at http://localhost:3050/
?- [New Thread 0x1607 of process 81198]
[New Thread 0x2103 of process 81198]
[New Thread 0x2203 of process 81198]
[New Thread 0x2303 of process 81198]
[New Thread 0x2403 of process 81198]
[New Thread 0x2503 of process 81198]
[New Thread 0x2603 of process 81198]
[New Thread 0x2703 of process 81198]
[New Thread 0x2803 of process 81198]
[New Thread 0x2a03 of process 81198]
[New Thread 0x2b03 of process 81198]
[New Thread 0x2c03 of process 81198]
[New Thread 0x2d03 of process 81198]
[New Thread 0x3d03 of process 81198]
[New Thread 0x3e03 of process 81198]
[New Thread 0x3f03 of process 81198]

Thread 3 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x1607 of process 81198]
0x00007ff815a674ac in ?? ()
(gdb) bt
#0  0x00007ff815a674ac in ?? ()
#1  0x00007000023c9bc0 in ?? ()
#2  0x0000000101b12a43 in ?? ()
#3  0x00007000023c9ef0 in ?? ()
#4  0x00007fd4fbaba2d8 in ?? ()
#5  0x206465696669661f in ?? ()
#6  0x0000000000000002 in ?? ()
#7  0x0000000000000005 in ?? ()
#8  0x00007fd4f0f00000 in ?? ()
#9  0xb8480e1600f900e0 in ?? ()
#10 0x00007fd4f6933c10 in ?? ()
#11 0x000060000737dac8 in ?? ()
#12 0x000060000737dac0 in ?? ()
#13 0x00007fd4f74b1f40 in ?? ()
#14 0x0000000102e7d6a0 in ?? ()
#15 0x00007000023c9be0 in ?? ()
#16 0x0000000101b131d3 in ?? ()
#17 0x0000000000000001 in ?? ()
#18 0x00007000023c9c28 in ?? ()
#19 0x00007000023c9c00 in ?? ()
#20 0x00000001064a7113 in ?? ()
#21 0x0000000000000001 in ?? ()
#22 0x00007fd4f6d08860 in ?? ()
#23 0x00007000023c9ca0 in ?? ()
#24 0x00000001069ed999 in ?? ()
#25 0x0000000000000180 in ?? ()
#26 0x00007fd4f7821660 in ?? ()
#27 0x0000000405f9eba0 in ?? ()
#28 0x00007fd4f6933c10 in ?? ()
#29 0x00007fd4f7820101 in ?? ()
#30 0x00007fd4f7800000 in ?? ()
#31 0x0000000100013200 in ?? ()
#32 0x000000010000e080 in ?? ()
#33 0x0000001e00000000 in ?? ()
#34 0x0000000100013200 in ?? ()
#35 0x0000000000000004 in ?? ()
#36 0x00007fd4f7800000 in ?? ()
#37 0x000000010000e080 in ?? ()
#38 0x0000000000000018 in ?? ()
#39 0x00007000023c9d00 in ?? ()
#40 0x00007ff81588d701 in ?? ()
#41 0x00006000004efab8 in ?? ()
#42 0x00006000004efa80 in ?? ()
#43 0x00007000023c9cc0 in ?? ()
#44 0x0000000102e4ba27 in ?? ()
#45 0x0000000102e78098 in ?? ()
#46 0x00006000004efa80 in ?? ()
#47 0x00007000023c9ce0 in ?? ()
#48 0x0000000102e3768d in ?? ()
#49 0x00006000004efa80 in ?? ()
#50 0x0000000000000000 in ?? ()
(gdb) 

I also tried to reproduce the bug on an Ubuntu machine but I could not get Janus to work: after a clean installation of swipl from the git sources, I get this in SWISH:

induce_par_lift([all],P).
 Exported procedure janus:py_with_gil/1 is not defined
 Exported procedure janus:py_setattr/3 is not defined
 Exported procedure janus:py_call/2 is not defined
 Exported procedure janus:py_iter/3 is not defined
 Exported procedure janus:py_call/1 is not defined
 Exported procedure janus:py_is_object/1 is not defined
 Exported procedure janus:py_iter/2 is not defined
 Exported procedure janus:py_free/1 is not defined
 Exported procedure janus:py_call/3 is not defined
 Exported procedure janus:py_gil_owner/1 is not defined
 [Thread 12]: exception handler failed to define janus:py_call/2
 Exception:janus:py_call(sys:path,_34460)
Unknown procedure: janus:py_call/2 However, there are definitions for: janus:px_call/4 janus:py_call/4

I managed to make Janus work on Linux. The problem was that SWI-Prolog was not finding Python because it was installed by Anaconda in a local environment.
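When Python lives in a local Anaconda environment, it can help to ask the interpreter itself where it is installed before building. A small diagnostic sketch; describe_python is a hypothetical helper name, and the LIBDIR and LDLIBRARY config variables may be None on some platforms.

```python
import sys
import sysconfig


def describe_python():
    """Collect the paths that identify which Python installation is running."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": sysconfig.get_python_version(),
        # Directory and file name of the shared libpython, where available.
        "libdir": sysconfig.get_config_var("LIBDIR"),
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
    }


info = describe_python()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running this with the python from the environment you intend to use shows whether it is the same installation that the build picked up.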
I then produced a core dump and analyzed it with gdb, obtaining:

gdb -c /var/lib/apport/coredump/core._usr_local_lib_swipl_bin_x86_64-linux_swipl.1000.a71a9348-f952-419c-aaa5-1d43ad56ff42.914636.167284548  --args swipl run.pl --public
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Reading symbols from swipl...
[New LWP 914799]
[New LWP 914647]
[New LWP 914636]
[New LWP 914646]
[New LWP 914637]
[New LWP 914644]
[New LWP 914639]
[New LWP 914645]
[New LWP 914806]
[New LWP 914804]
[New LWP 914803]
[New LWP 914802]
[New LWP 914801]
[New LWP 914648]
[New LWP 914798]
[New LWP 914805]
[New LWP 914807]
[New LWP 914800]
[New LWP 914808]
[New LWP 914649]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `swipl run.pl --public'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  alt_segv_handler (sig=11) at /home/rzf/swipl-devel/src/pl-setup.c:772
772	  if ( LD->signal.sig_critical )
[Current thread is 1 (Thread 0x7f466a9f7640 (LWP 914799))]

More info

(gdb) bt
#0  alt_segv_handler (sig=11) at /home/rzf/swipl-devel/src/pl-setup.c:772
#1  <signal handler called>
#2  PL_open_foreign_frame___LD (__PL_ld=0x0)
    at /home/rzf/swipl-devel/src/pl-wam.c:347
#3  PL_open_foreign_frame () at /home/rzf/swipl-devel/src/pl-wam.c:356
#4  0x00007f4670313aa9 in py_unify (t=t@entry=504,
    obj=obj@entry=0x7f466edfbb40, flags=0)
    at /home/rzf/swipl-devel/packages/swipy/janus/janus.c:941
#5  0x00007f46703186b8 in py_call3 (Call=<optimized out>, result=504,
    options=<optimized out>)
    at /home/rzf/swipl-devel/packages/swipy/janus/janus.c:2036
#6  0x00007f4671afde4e in PL_next_solution___LD (
    __PL_ld=__PL_ld@entry=0x556d34d30000, qid=qid@entry=0x556d34d38070)
    at /home/rzf/swipl-devel/src/pl-vmi.c:4675
#7  0x00007f4671b47621 in callProlog (module=<optimized out>,
    goal=goal@entry=20, flags=flags@entry=8, ex=ex@entry=0x7f466a9f6d00)
    at /home/rzf/swipl-devel/src/pl-pro.c:475
#8  0x00007f4671b7fe4f in start_thread (closure=0x556d34cb6480)
    at /home/rzf/swipl-devel/src/pl-thread.c:2063
#9  0x00007f4671894ac3 in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:442
#10 0x00007f4671926660 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thanks. Looks like this may be just a Janus bug. I’m away this week. Could you try to create a reproducible scenario? Then I’ll see what I can do next week.

Download my fork of SWISH from
git@github.com:friguzzi/swish.git
and install the packs:

  • cplint
  • bddem
  • lbfgs
  • auc
  • phil
  • matrix
  • pascal
  • liftcover

Install numpy and torch

pip install numpy
pip install torch

Launch cplint on SWISH by running swipl run.pl from the swish directory.
Create two new files copying
https://cplint.eu/p/bongard_em_python.pl
and
https://cplint.eu/p/bongard_gd_python.pl
Run the query induce_par_lift([all],P). in the first file and then in the second; the error should then appear.

I updated my SWISH fork and now that bug has disappeared. However, there is another one that happens when Python tries to print something to stdout.
I get

swipl run.pl --public
% Updating GIT version stamps in the background.
% Started server at http://localhost:3050/
?- Thread 14 (): foreign predicate system:current_prolog_flag/2 did not clear exception:
	error(existence_error(thread,14),context(system:thread_signal/2,_52456))
 0
Random_restart: Score  -616.6839014805798
Segmentation fault (core dumped)

To reproduce, use the file https://cplint.eu/p/bongard_em_python.pl and run
induce_par_lift([all],P).

The previous bug is still there on Mac; it disappeared only on Ubuntu.

I didn’t even have to run the query twice :slight_smile: Fixed with c968368e9d1ebf6446785ae6a354d56a8a3199d0 on swipl-devel.git. The bug is triggered when Python, called from a Prolog thread not being the main thread, makes callbacks on Prolog.

After that, running both queries works fine.

Might be wise to see what happens if you run these queries at the same time from multiple SWISH windows :slight_smile: If the Python code calls out to C, and that C code enables Python threads while it runs and is thread-safe, one should get good concurrency if (as is often the case) the heavy work is done in C code called from Python. The Python interpreter should then only be involved in getting the Prolog data through Python to the C module and, at the end, transferring the results back.

I tried the new version and now it works perfectly. I also tried multiple windows and no issue arose.

What do you mean? Aren’t threads enabled in Python by default?
Anyway, my Python code just calls numpy, cupy and torch functions, which I guess call C and should be thread-safe.

Threads in Python are a bit weird. As is, it supports native threads, but it has only one VM engine (interpreter) that hops cooperatively between threads. That means that at most one core runs Python byte code at any moment in time. Other native threads may execute non-Python code concurrently. The “may” refers to the requirement to “cooperate”. The current Janus for SWI-Prolog implements this optimally AFAIK: before interacting with the Python interpreter it grabs the GIL. If Python makes a call to Prolog, Janus releases the GIL.

In short, you get fine concurrency with Janus if you use it to call native code that also nicely cooperates in the Python thread model. In that case the single Python interpreter just hops around to deal with transferring a job from Prolog to the Python-managed native code and again to transfer the results back to Prolog.

That is the current situation as I understand it (please correct if I am wrong). Python is about to remove the GIL. I do not yet know what that means for Janus. Most likely it will get better :slight_smile:
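The cooperation described above can be observed from plain Python, without Janus. A rough sketch: time.sleep is used here only as a stand-in for native code that releases the GIL while it runs (an assumption about what numpy or torch calls do for their heavy work); run_threads is a hypothetical helper name.

```python
import threading
import time


def native_like_work(duration):
    # time.sleep releases the GIL while waiting, like cooperative native code.
    time.sleep(duration)


def run_threads(count, duration):
    """Start count threads doing the same work; return the wall-clock time."""
    threads = [
        threading.Thread(target=native_like_work, args=(duration,))
        for _ in range(count)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


elapsed = run_threads(4, 0.2)
print(f"4 threads x 0.2 s took {elapsed:.2f} s")
```

If the four waits overlap, the total stays close to a single 0.2 s wait instead of adding up to 0.8 s. Pure Python byte code in the same four threads would not overlap this way on an interpreter with a GIL.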

I have developed a multithreaded version of liftcover that uses concurrent_maplist, where each thread calls Python. Running it in SWISH I get

?- Thread 15 (): foreign predicate system:current_prolog_flag/2 did not clear exception:
	error(existence_error(thread,15),context(system:thread_signal/2,_16720))

I use current_prolog_flag(cpu_count,Chunks) to get the number of CPU cores.

Does this imply that concurrent_maplist works fine outside SWISH? The error location is probably misleading. Can you share the code (and instructions :slight_smile: )?

yes, it works perfectly :slight_smile:

Now I’m not able to reproduce the bug anymore.
If I remember correctly, you can reproduce my experiment by pulling the liftcover repo, launching cplint on SWISH with swipl run.pl, and running the query induce_lift([train],P). on the file https://cplint.eu/p/bongard_em_python.pl (copy the file again, because I changed it). But now it works fine.

Yip. Works fine. According to SWI-Prolog’s thread monitor it indeed creates a lot of threads (my dev machine has 32 cores :slight_smile: ). Probably there is something fishy :frowning: though. If you see this again, run under gdb and put a breakpoint on PL_raise_exception() using

(gdb) break PL_raise_exception

Then, if it hits the breakpoint, get the gdb backtrace using the bt command. Quite likely that is enough context to figure out what is wrong. You can do this on the normal release build as details on variable bindings are of no interest.

Sure, will do.

I will now put the Python and parallel version of liftcover on the cplint on SWISH server, but I would like to limit the number of threads users can use. At the moment I set it using the directive :- set_lift(threads,t). with t=cpu to indicate the number of cores. What would be the most elegant way of limiting that number only in SWISH?

Typically, add a file to config-available, edit/link it from config-enabled and put the code for loading and/or setting parameters there. That is how most extension libraries are added to SWISH.