I’ve been looking at a somewhat challenging CLP application that is resource (time and memory) intensive. When I run it with garbage collection disabled, it quickly runs out of memory with the standard “Stack limit exceeded” exception. When garbage collection is enabled, the run ends the vast majority of the time in a macOS kernel panic (sometimes after hours), with this cause:
...
mp_kdp_enter() timed-out on cpu 10, NMI-ing
mp_kdp_enter() NMI pending on cpus: 0 1 2 3 4 5 6 7 8 9 11 12 13 14 15
mp_kdp_enter() timed-out during locked wait after NMI;expected 16 acks but received 1 after 10945024 loops in 1500000000 ticks
panic(cpu 10 caller 0xffffff8000adb5da): "Machine Check at 0x00000001013a9c23,
...
I interpret this (perhaps incorrectly) as meaning that some thread holds a lock and won’t give it up. The process running is usually swipl, which isn’t surprising since not much else is running at the time. Occasionally the process is the kernel extension responsible for power management (com.apple.driver.AppleIntelCPUPowerManagement(220.0)).
On a couple of occasions the abort was more graceful:
[PROLOG SYSTEM ERROR: Thread 1 (main) at Sun Jul 7 18:43:40 2024
relocation chains = 1
[While in 78345-th garbage collection]
C-stack trace labeled "SYSERROR":
[0] save_backtrace() at [0x104fbfdcc]
PROLOG STACK:
[202241] clpBNR:getValue/2 [PC=25 in clause 1]
[202240] clpBNR:doNode_/7 [PC=35 in clause 1]
[202239] clpBNR:stableLoop_/2 [PC=42 in clause 2]
[199806] clpBNR:stable_/1 [PC=11 in clause 2]
[199793] clpBNR:eval_MS/4 [PC=17 in clause 1]
[199792] clpBNR:iterate_MS/6 [PC=122 in clause 1]
[12] system:<meta-call>/1 [PC=13 in clause -1]
[11] $toplevel:toplevel_call/1 [PC=3 in clause 1]
[10] $toplevel:stop_backtrace/2 [PC=4 in clause 1]
[9] $tabling:$wfs_call/2 [PC=17 in clause 1]
]
[pid=904] Action?
and
Failed to print resource exception due to lack of space
error(resource_error(stack),stack_overflow{choicepoints:8,depth:2731,environments:18,globalused:204039,localused:3,stack:[frame(2731,clpBNR:narrowing_op(mul,_52234302,($)/3,($)/3),[]),frame(2730,clpBNR:evalNode(mul,_52234338,($)/3,($)/3),[]),frame(2729,clpBNR:doNode_(($)/3,mul,_52234376,424,_52234380,(/)/2,_52234384),[]),frame(2728,clpBNR:stableLoop_((/)/2,424),[]),frame(151,clpBNR:stable_((/)/2),[])],stack_limit:204800,trailused:636})
Either of these is preferable to a kernel panic.
I’ve tried changing a few of the Prolog stack parameters (min_free and spare), but the observed behaviour didn’t change. It occurs with both SWI-Prolog 9.2.5 and 9.3.5. I’m running an older version of macOS (Mojave); it may be an OS bug, or even a hardware issue, but it’s certainly reproducible with SWI-Prolog.
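For anyone who wants to repeat the experiment, this is roughly the kind of setup I mean. It is only a sketch: the numeric values are arbitrary placeholders rather than the ones I actually used, and run_guarded/1 is a hypothetical helper name, not part of clpBNR. As far as I understand it, spare is the reserve that lets an overflow be reported at all, which is why I tried it after seeing the “Failed to print resource exception” message.

```prolog
% Placeholder values throughout; tune for your own machine.

% Overall limit for the combined Prolog stacks, in bytes.
:- set_prolog_flag(stack_limit, 4_000_000_000).

% The two per-stack parameters mentioned above (units are cells).
:- set_prolog_stack(global, min_free(1_000_000)).
:- set_prolog_stack(global, spare(4096)).

% Uncomment to reproduce the "garbage collection disabled" runs.
% :- set_prolog_flag(gc, false).

% run_guarded(:Goal) is a hypothetical wrapper: it runs Goal and
% turns a resource error into a printed message plus failure,
% instead of letting the exception reach the toplevel.
:- meta_predicate run_guarded(0).
run_guarded(Goal) :-
    catch(Goal,
          error(resource_error(Which), _Context),
          ( print_message(error,
                          format("resource exhausted: ~w", [Which])),
            fail
          )).
```

The settings can be read back with `prolog_stack_property(global, P)`, and a query is then run as `?- run_guarded(my_test_goal).`, where my_test_goal stands in for the actual application goal. None of this helps once the kernel panics, of course; it only covers the cases where Prolog itself gets to raise the error.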
I’m not really expecting a solution from this post (beyond, perhaps, more graceful error handling), but any insights or suggestions are welcome. If there’s any interest, I can compose a SWISH notebook with the application test code, although I’d be reluctant to run it on a production server.