Thread error while doing boolean calculations and file writing

Hi, I am performing simple boolean operations on integers with a few thousand bits.
The same code worked in SWIPL 9.0. After a SWIPL update, one particular place sometimes triggers a fatal error, and I am trying to understand why.
The trace at the time of the crash:

SWI-Prolog [thread 17 () at Mon Jun 17 15:48:50 2024]: received fatal signal 11 (segv)
C-stack trace labeled “crash”:
[0] Sopen_string() at ??:? [0x7f3c6c968e3e]
[1] PL_scan_options() at ??:? [0x7f3c6c96dae0]
[2] PL_scan_options() at ??:? [0x7f3c6c96dcda]
[3] __sigaction() at ??:? [0x7f3c6c664ae0]
[4] PL_advance_hash_table_enum() at ??:? [0x7f3c6c95d85f]
[5] Sset_filter() at ??:? [0x7f3c6c96160b]
[6] PL_encoding_to_atom() at ??:? [0x7f3c6c948067]
[7] PL_write_prompt() at ??:? [0x7f3c6c94fd5c]
[8] PL_write_prompt() at ??:? [0x7f3c6c9503f3]
[9] PL_close_query() at ??:? [0x7f3c6c86456a]
[10] PL_check_data() at ??:? [0x7f3c6c8baab4]
[11] PL_get_thread_id_ex() at ??:? [0x7f3c6c8f4ab8]
[12] pthread_condattr_setpshared() at ??:? [0x7f3c6c6baded]
[13] __clone() at ??:? [0x7f3c6c73e0dc]
Prolog stack:
[5] system:open/3
[4] system:<meta-call>/1 [PC=18 in clause -1]
[3] system:catch/3 [PC=2 in clause 1]
[2] system:catch_with_backtrace/3 [PC=6 in clause 1]
[1] thread:fa_worker/4 [PC=51 in clause 1]
[0] system:$c_call_prolog/0 [PC=0 in top query clause]
Running on_halt hooks with status 139

I use concurrent_forall with meta-calls to facts, each of which contains a large integer BXs.
pprod_p_/5 performs some boolean operations on the integer and returns Row, which is then written to a file.


SWIPL version:
SWI-Prolog version 9.2.4 for x86_64-linux


  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   20
  On-line CPU(s) list:    0-19
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz
    CPU family:           6
    Model:                85
    Thread(s) per core:   2
    Core(s) per socket:   10
    Socket(s):            1
    Stepping:             4


Not sure this will help, but I would try using a mutex to protect the shared file access, something like this (untested):

  setup_call_cleanup(
    mutex_create(MutexId),
    concurrent_forall(call(P_M1,X,BXs),
      with_mutex(MutexId,
        true  % placeholder: the goals that open, write and close the file go here
      )),
    mutex_destroy(MutexId)),

Please also note that you are not on the latest stable release, which is now 9.2.5. So you should read the release notes, too.

Might be worth trying. Without a lock or using the open/4 option for exclusive access this probably makes a mess of the file. Still, it should not crash with a segv. Unfortunately it seems the binary has no debugging symbols and thus the C backtrace is bogus. The Prolog backtrace is probably correct.
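
To make the mutex idea concrete, here is a minimal, self-contained sketch. It is not the original program: row/2, write_row/3 and run/1 are hypothetical names, and the bitwise AND only stands in for the real computation. Each worker computes its row in parallel and takes the mutex only around open/write/close, so rows are appended one at a time. As far as I read the manual, the lock(write) option of open/4 is described in terms of other processes, so this sketch relies on the mutex for threads inside one process.

```prolog
:- use_module(library(thread)).   % concurrent_forall/2

% Hypothetical stand-in for the real per-fact computation.
row(X, Row) :-
    Row is X /\ 0xFF.

% Append one row to File. The mutex serialises open/write/close
% across the worker threads; the stream is always closed again.
write_row(Mutex, File, Row) :-
    with_mutex(Mutex,
        setup_call_cleanup(
            open(File, append, Out),
            format(Out, "~d~n", [Row]),
            close(Out))).

% Compute 100 rows concurrently and append each one to File.
run(File) :-
    setup_call_cleanup(
        mutex_create(Mutex),
        concurrent_forall(between(1, 100, X),
                          ( row(X, Row),
                            write_row(Mutex, File, Row) )),
        mutex_destroy(Mutex)).
```

Calling run('rows.txt') (file name chosen arbitrarily) should leave one row per line in the file, in whatever order the workers finish.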

It seems unlikely that this is related to big integers. I also doubt it is related to recent changes. It looks more like a concurrency issue in open/3 that just happens to trigger now.

Can you share a complete program to reproduce?

Thanks for the suggestions. @CapelliC @jan

This program is part of a larger function that computes boolean matrix products. After it returns, the whole program is halted, a temporary output is saved to a file, and a script calls the program again for new operations.

In 9.2.4, the above error appears at arbitrary points after many calls and always comes up after hours of running, even though the Prolog setup is the same across all calls.

Things tried: setting the max_integer_size flag seems to delay the error. I haven’t tried the mutex yet, but I went back to 9.0.4, which did not show this error in extensive tests, even without open/4 or a mutex.