Thread error while doing boolean calculations and file writing

lai1997 · June 18, 2024, 11:15am

Hi, I am performing simple boolean operations on integers with a few thousand bits.
The same code worked in SWI-Prolog 9.0. After an update, one particular place sometimes gives a fatal error. I am trying to understand the reason behind this.
The trace from the crash:

SWI-Prolog [thread 17 () at Mon Jun 17 15:48:50 2024]: received fatal signal 11 (segv)
C-stack trace labeled “crash”:
[0] Sopen_string() at ??:? [0x7f3c6c968e3e]
[1] PL_scan_options() at ??:? [0x7f3c6c96dae0]
[2] PL_scan_options() at ??:? [0x7f3c6c96dcda]
[3] __sigaction() at ??:? [0x7f3c6c664ae0]
[4] PL_advance_hash_table_enum() at ??:? [0x7f3c6c95d85f]
[5] Sset_filter() at ??:? [0x7f3c6c96160b]
[6] PL_encoding_to_atom() at ??:? [0x7f3c6c948067]
[7] PL_write_prompt() at ??:? [0x7f3c6c94fd5c]
[8] PL_write_prompt() at ??:? [0x7f3c6c9503f3]
[9] PL_close_query() at ??:? [0x7f3c6c86456a]
[10] PL_check_data() at ??:? [0x7f3c6c8baab4]
[11] PL_get_thread_id_ex() at ??:? [0x7f3c6c8f4ab8]
[12] pthread_condattr_setpshared() at ??:? [0x7f3c6c6baded]
[13] __clone() at ??:? [0x7f3c6c73e0dc]
Prolog stack:
[5] system:open/3
[4] system:/1 [PC=18 in clause -1]
[3] system:catch/3 [PC=2 in clause 1]
[2] system:catch_with_backtrace/3 [PC=6 in clause 1]
[1] thread:fa_worker/4 [PC=51 in clause 1]
[0] system:$c_call_prolog/0 [PC=0 in top query clause]
Running on_halt hooks with status 139

I use concurrent_forall/2 with a meta-call to facts, each of which contains a large integer BXs.
pprod_p_/5 performs some boolean operations on the integer and returns Row, which is then written to a file.

	concurrent_forall(
	    call(P_M1,X,BXs),
	    (
	        pprod_p_(TP_M2,P_M3,X,BXs,Row),
	        open(M3fname,append,Str),
	        writes(Str,Row),
	        close(Str)
	    )
	)

SWIPL version:
SWI-Prolog version 9.2.4 for x86_64-linux

Specs:

  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   20
  On-line CPU(s) list:    0-19
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz
    CPU family:           6
    Model:                85
    Thread(s) per core:   2
    Core(s) per socket:   10
    Socket(s):            1
    Stepping:             4

Thanks

CapelliC · June 18, 2024, 5:08pm

Not sure this will help, but I would try using a mutex to protect the shared file access, something like this (untested…)

...
  setup_call_cleanup(mutex_create(MutexId), (
    concurrent_forall(call(P_M1,X,BXs), (
      pprod_p_(TP_M2,P_M3,X,BXs,Row),
      with_mutex(MutexId, (
        open(M3fname,append,Str),
        writes(Str,Row),
        close(Str)
      ))
    ))
  ), mutex_destroy(MutexId)),
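
For anyone who wants to try the idea without the rest of the program, here is a minimal self-contained sketch. The facts row/2, the predicate masked/3 and the helpers write_rows/2 and append_line/2 are made-up stand-ins, not part of the original program; only the mutex pattern around open/write/close is the point.

```prolog
:- use_module(library(thread)).   % concurrent_forall/2

% Hypothetical stand-ins for the facts reached through call(P_M1,X,BXs).
row(1, 0b1010).
row(2, 0b0110).
row(3, 0b1111).

% Hypothetical stand-in for pprod_p_/5: AND the integer with a mask.
masked(Mask, BXs, Row) :-
    Row is BXs /\ Mask.

% Compute rows concurrently, but let only one thread at a time append.
write_rows(File, Mask) :-
    setup_call_cleanup(
        mutex_create(Mutex),
        concurrent_forall(
            row(X, BXs),
            (   masked(Mask, BXs, Row),
                with_mutex(Mutex, append_line(File, X-Row))
            )),
        mutex_destroy(Mutex)).

% Open, write one term, and always close the stream, even on error.
append_line(File, Term) :-
    setup_call_cleanup(
        open(File, append, Str),
        format(Str, "~q.~n", [Term]),
        close(Str)).
```

Calling write_rows('out.pl', 0b0110) should leave one term per line in the file, in whatever order the workers finish. Wrapping open/close in setup_call_cleanup/3 is an extra precaution so a failing write does not leak the stream.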

Boris · June 19, 2024, 9:33am

Please also note that you are not on the latest stable release, which is now 9.2.5. So you have to read the release notes, too…

jan · June 19, 2024, 7:41pm

Might be worth trying. Without a lock or using the open/4 option for exclusive access this probably makes a mess of the file. Still, it should not crash with a segv. Unfortunately it seems the binary has no debugging symbols and thus the C backtrace is bogus. The Prolog backtrace is probably correct.
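
A minimal sketch of the open/4 variant, assuming the lock(exclusive) option documented for open/4. The helper name append_locked/2 is hypothetical. As far as I know the lock is taken through fcntl(), which arbitrates between processes, so it may not serialize threads inside a single process; for that case the mutex is probably still needed.

```prolog
% Append one term to File while holding an exclusive file lock.
% append_locked/2 is a hypothetical helper, not part of any library.
append_locked(File, Term) :-
    setup_call_cleanup(
        open(File, append, Str, [lock(exclusive)]),
        format(Str, "~q.~n", [Term]),
        close(Str)).   % closing the stream also releases the lock
```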

Seems unlikely this is related to big integers. I also doubt it is related to recent changes. Looks more like a concurrency issue in open/3 that just happens to trigger now.

Can you share a complete program to reproduce?

lai1997 · June 24, 2024, 11:45am

Thanks for the suggestions. @CapelliC @jan

This program is part of a larger function to compute boolean matrix products. After returning, the whole program is halted, a temp output is saved to file and a script calls the program for new operations.
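
To make "boolean matrix product" concrete for readers: one common way to do it with big integers is to store each matrix row as a bitset and OR together the rows of the second matrix selected by the set bits of a row of the first. The sketch below shows that idea only; row_product/3 and or_if_set/4 are illustrative names and not necessarily what pprod_p_/5 does.

```prolog
:- use_module(library(apply)).   % foldl/4

% row_product(+ARow, +BRows, -CRow)
% ARow is a bitset (integer); BRows is the list of rows of B as bitsets,
% where the K-th list element (0-based) pairs with bit K of ARow.
% CRow is the OR of every B row whose index is a set bit in ARow.
row_product(ARow, BRows, CRow) :-
    foldl(or_if_set(ARow), BRows, 0-0, _-CRow).

% Accumulator is Index-Bits: advance the index, OR in the row if bit K is set.
or_if_set(ARow, BRow, K-Acc0, K1-Acc) :-
    K1 is K + 1,
    (   ARow /\ (1 << K) =\= 0
    ->  Acc is Acc0 \/ BRow
    ;   Acc = Acc0
    ).
```

With B as the 3x3 identity, row_product(0b101, [0b001, 0b010, 0b100], C) gives C = 5, i.e. the row is returned unchanged.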

In 9.2.4, the above error appears at arbitrary points after many calls and always comes up after some hours, even though the Prolog setup is the same across all calls.

Things tried: setting the max_integer_size flag seems to delay the error. I haven’t tried the mutex yet, but I went back to 9.0.4, which did not show this error in extensive tests, even without open/4 or a mutex.
