Core dump in dev release 9.3.24

This happened today when I installed SWI on a new machine running Fedora Linux 40:

Welcome to SWI-Prolog (threaded, 64 bits, version 9.3.24-58-g407ec23a9)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).

?- [load_headless].
Loading poker
% Interactive session; added `.` to Python `sys.path`
Loading experiment file module data(poker_examples/experiment_script_lnf.pl) from exp_script_lnf.
Global stack limit 17,179,869,184
Table space 137,438,953,472
true.

?- experiment_file:abop_plant_a.
% Generating labelled examples...
% Generating 20 abop_plant_a examples of length in [0,6].
% Generating all abop_plant_a_with_vars examples of length in [11,13].
% Generating unlabelled examples...
% Got 77 labelled examples.
% Got 0 unlabelled examples.
% Generalising positive examples
% Derived 2274 sub-hypotheses (unsorted)
% Derived 125 sub-hypotheses (sorted)
% Derived 108 sub-hypotheses (unfolded)
% Generating new atoms...
% Found safe_example/2.
% Generated 497 new atoms.

ERROR: API error: invalid atom_t 1593861 (no valid atom at this index)
Time: Mon Jun  9 19:04:25 2025
Inferences: 870
Thread: 2 (gc)
C-stack trace labeled "API":
  [0] save_backtrace() at /home/yegoblynqueenne/swipl-devel/src/os/pl-cstack.c:337 [0x7f8735d6e5f8]
  [1] vsysError() at /home/yegoblynqueenne/swipl-devel/src/pl-init.c:2088 [0x7f8735ccf181]
  [2] vfatalError() at /home/yegoblynqueenne/swipl-devel/src/pl-init.c:2140 [0x7f8735ccf3dc]
  [3] fetchAtomArray() at /home/yegoblynqueenne/swipl-devel/src/pl-inline.h:387 [0x7f8735cecc17]
  [4] write_closure() at /home/yegoblynqueenne/swipl-devel/src/pl-wrap.c:72 [0x7f8735ce568d]
  [5] dbgAtomName() at /home/yegoblynqueenne/swipl-devel/src/pl-atom.c:1163 [0x7f8735c72ae3]
  [6] unregister_atom_clause() at /home/yegoblynqueenne/swipl-devel/src/pl-proc.c:1751 [0x7f8735d54886]
  [7] forAtomsInCodes() at /home/yegoblynqueenne/swipl-devel/src/pl-comp.c:4188 (discriminator 1) [0x7f8735d259fe]
  [8] freeClause() at /home/yegoblynqueenne/swipl-devel/src/pl-proc.c:1764 [0x7f8735d2e1e2]
  [9] freeClauseRef() at /home/yegoblynqueenne/swipl-devel/src/pl-proc.c:1240 [0x7f8735d2e104]
  [10] pl_garbage_collect_clauses() at /home/yegoblynqueenne/swipl-devel/src/pl-proc.c:2751 [0x7f8735d547a8]
  [11] PL_next_solution___LD() at /home/yegoblynqueenne/swipl-devel/src/pl-vmi.c:4359 [0x7f8735c76dc6]
  [12] PL_call_predicate() at /home/yegoblynqueenne/swipl-devel/src/pl-fli.c:4529 [0x7f8735d41c6b]
  [13] GCmain() at /home/yegoblynqueenne/swipl-devel/src/pl-thread.c:6926 [0x7f8735cc7fc0]
  [14] start_thread() at ??:? [0x7f8735aecfa8]
  [15] __clone3() at :? [0x7f8735b70fcc]


PROLOG STACK (without arguments):
  [4] system:garbage_collect_clauses/0 [PC=2 in supervisor]
  [2] $gc:gc_loop/0 [PC=29 in clause 1]
  [0] system:$c_call_prolog/0 [PC=0 in top query clause]


PROLOG STACK (with arguments; may crash if data is corrupted):
      [4] garbage_collect_clauses
      [2] gc_loop
      [0] '$c_call_prolog'

[pid=26319] Action? Aborted (core dumped)

The last version in which the same query terminated without errors was the Windows daily build 9.3.20-23-ge2557e66e (64 bits).

I can try creating a short example but it’s complicated. Is it possible to find the culprit just from the version?

Looks like an issue in atom reference counting, possibly related to Janus. A simpler example would surely be welcome, though it is probably hard to get. Just a reproducible scenario might be good enough.

My program does use Janus, but I don’t think it is called anywhere on the path that causes the error, other than through the call to py_add_lib_dir/1.

I can try to dig down and find where exactly the error is raised.

Meanwhile, the program that I ran is in a private GitHub repository. I think we had the same issue last time I wanted to share code. Should I send a zip file? Or I can add you to the repo so you can clone it?

OK, so I ran my program with debug(_) and it seems this was the last predicate that was called before the core dump:

verify_program(Cs,W,Es):-
	PM = experiment_file
	,debug_clauses(verify_program_full,'Verifying program:',Cs)
	,S = (assert_program(PM,Cs,Rs)
	     ,table_untable_predicates(table,PM,Cs)
	     )
	,(   poker_configuration:multithreading(W)
	->   G = (concurrent_forall(member(E,Es)
				  ,(debug(examples,'Verifying Example: ~w', [E])
				   ,call(PM:E)
				   )
				  )
		 )
	 ;   G = forall(member(E,Es)
		       ,(debug(examples,'Verifying Example: ~w', [E])
			,call(PM:E)
			)
		       )
	 )
	,C = (erase_program_clauses(Rs)
	     ,table_untable_predicates(untable,PM,Cs)
	     )
	,setup_call_cleanup(S,G,C)
	,debug(verify_program,'Verified program accepts all examples',[]).

%!	table_untable_predicates(+What,+Module,+Clauses) is det.
%
%	Table or untable the predicates defined in a set of Clauses.
%
%	What is one of: [table, untable].
%
%	Module is the module where the programs that are to be tabled or
%	untabled are defined.
%
%	Clauses is a list of clauses that potentially use the predicates
%	to table or untable in their body literals.
%
table_untable_predicates(W,M,Cs):-
	program_symbols(Cs,Ss)
	,forall(member(S,Ss)
	       ,table_untable(W,M,S)
	       ).


%!	table_untable(+What,+Module,+Symbol) is det.
%
%	Table or untable a predicate Symbol.
%
table_untable(_,_M,F/A):-
% Attempt to identify BK predicates. Those are already defined with
% their own properties, and trying to table them raises a permission
% error.
	functor(T,F,A)
	,poker_configuration:experiment_file(_P,M)
	,predicate_property(M:T,static)
	,!.
table_untable(table,M,S):-
	M:table(S)
	,!.
table_untable(untable,M,S):-
	M:untable(S).

I know you’ve said before not to use table/untable that way. I’ve been bad, sorry :0

Is that what’s possibly causing the core dump?

P.S. Also running again now with:

?- leash(-all), visible(+all), trace.

Will let you know if anything else shows up.

What is atom reference counting and how can it go wrong? Is it the number of atoms in memory that can cause problems? Btw, those are Prolog atoms, constants, correct? Not facts (logical atoms)?

Atoms, as in constants, are managed by the atom garbage collector that scans the stacks for references. Data structures that are not on the stacks and reference atoms must increment/decrement a reference count. The atom is subject to GC if its reference count is zero and there are no references from the stacks. Manual reference counting is always a bit scary though. Decrementing too early or forgetting to increment can cause an atom to be collected too early.

Note that “atoms” (actually “blobs”) are also used as handles to streams, clauses, tables, etc.

And no, on 64-bit systems there is no realistic limit to the number of atoms the system can have, except for exhausting memory.

What seems to be happening in the crash is that there is a clause that holds some atom. Erasing the clause should decrement the atom’s reference count, but the atom is already gone.
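
The effect of clause references on atom garbage collection can be observed from Prolog. The following is a minimal sketch, not a reproduction of the crash: holds/1 and the atom name prefixes are hypothetical, and the counts depend on the system, so no output is shown. It uses statistics/2, garbage_collect_atoms/0 and garbage_collect_clauses/0.

```prolog
% holds/1 is a hypothetical predicate used only to keep atoms referenced.
:- dynamic holds/1.

% Number of atoms currently known to the system.
atom_count(Count) :-
    statistics(atoms, Count).

% Create N atoms and drop them at once: nothing references them afterwards.
make_unreferenced_atoms(N) :-
    forall(between(1, N, I),
           atom_concat(tmp_atom_, I, _)).

% Create N atoms and store each one in a clause of holds/1.
make_clause_held_atoms(N) :-
    forall(between(1, N, I),
           ( atom_concat(kept_atom_, I, Atom),
             assertz(holds(Atom))
           )).

demo :-
    atom_count(Start),
    make_unreferenced_atoms(20000),
    make_clause_held_atoms(20000),
    garbage_collect_atoms,        % the kept_atom_* atoms are held by clauses
    atom_count(WithClauses),
    retractall(holds(_)),         % erase the clauses ...
    garbage_collect_clauses,      % ... and ask clause GC to free them
    garbage_collect_atoms,        % now the kept_atom_* atoms are candidates too
    atom_count(AfterErase),
    format("atoms: start=~D, clauses alive=~D, clauses erased=~D~n",
           [Start, WithClauses, AfterErase]).
```

Running ?- demo. prints three atom counts. The crash above happens inside the clause GC step, where freeing a clause tries to release an atom that no longer exists.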

I’m traveling right now, so it can take some time …

I’ve seen something that seems similar with 9.3.24. In my case I was reading in a large JSON file, asserting a lot of facts based on it, retracting the facts and then re-reading the JSON again. I can’t remember if I was using tabling at that point - I am now, and I’ve switched to abolish rather than retract and I haven’t seen the issue recently.

Dunno if that helps much, if I see the problem again I’ll provide a stack trace.

Thanks for the update. Considering earlier communication on this with @stassa.p, this is probably unrelated. If the problem is reproducible, please share. Using abolish/1 rather than retractall/1 should be more likely to cause problems than resolve them. You are probably just lucky.
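
For the load, clear, reload pattern described above, a minimal sketch using retractall/1 could look as follows. The predicate json_fact/2 and the Key-Value pair format are hypothetical stand-ins for whatever is asserted from the JSON file.

```prolog
% json_fact/2 is a hypothetical dynamic predicate holding the loaded data.
:- dynamic json_fact/2.

% Assert one fact per Key-Value pair.
load_facts(Pairs) :-
    forall(member(Key-Value, Pairs),
           assertz(json_fact(Key, Value))).

% Clear the old facts and load new ones. retractall/1 removes the matching
% clauses and leaves the predicate itself in place.
reload_facts(Pairs) :-
    retractall(json_fact(_, _)),
    load_facts(Pairs).
```

After ?- load_facts([a-1, b-2]), reload_facts([c-3])., only json_fact(c, 3) remains.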

Thanks, it was reproducible at one point, then the issue disappeared and I’m not sure why. I’ll switch to retractall if that’s better, but I haven’t had any issues with abolish so far.

Thank you for the explanation, Jan. I’ll wait patiently, no worries if it takes time. I’m trying to go back to earlier tagged versions from the repo to find the latest one I can run my experiments with.

I’m going away too for a bit so I rented a beefy-ass server to run some experiments while I’m traveling, but if I can figure out how to downgrade my SWI version then I’ll be fine.

Meanwhile my attempts to dig further down and find what bit of my code raises the error exactly have so far failed. I’ll try some more later.

Edit: btw, what does “DIRTY” mean next to a version name? I’m guessing it refers to the fact that after checking out an earlier commit there’s a bunch of submodules that look like they’re modified, in the repo?

I mean this:

 git status
HEAD detached at c768457e0
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   packages/PDT (new commits)
        modified:   packages/RDF (new commits)
        modified:   packages/archive (new commits)
        modified:   packages/bdb (new commits)
% more... 

Welcome to SWI-Prolog (threaded, 64 bits, version 9.3.24-DIRTY)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

I might have made a mistake and not actually gone back to the earlier version, though I’m not sure.

Yay! This worked:

yegoblynqueenne@43508:~/vanilla$ ../swipl-devel/build_9.3.23/src/swipl
Welcome to SWI-Prolog (threaded, 64 bits, version 9.3.23-DIRTY)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

    CMake built from "/home/yegoblynqueenne/swipl-devel/build_9.3.23"

For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).

My code terminated without crashing, errors or core dumps, so I guess whatever changed, changed after 9.3.23? That still leaves several commits though.

I’ll see about making you a version of my code you can use to reproduce the error.

Can I say something here? I often have trouble installing projects, and there is a clear demarcation line between the ones that just install and the ones whose installation procedure is a major hassle, filled with unexpected errors that spring up on you without warning. Python is the exemplar of the latter category, as was Ruby when I used to work with it back in the day (maybe it’s fixed now). Python in particular makes such a fiddly, complicated mess of installing projects and keeping track of dependencies that, my god, why? Anaconda? Why? My friend who works with JS a lot tells me npm can really suck too.

But SWI is just not like that. I never had any problem installing SWI on any machine I wanted to. When I was at Imperial for my PhD, one of the tech support guys (Imperial had a stellar tech support team, probably better engineers than anyone in the CS dept.) set up a little script for me to install the latest SWI development version on one of the departmental servers as soon as a new version came out. Not once did I have a problem with these automatic updates. If that had been Python (or JS, I’m told), I’d have had to deal with crashes every other day. Or Lloyd would, the guy who wrote the install script for me. Thanks, Lloyd.

So I just wanted to shower Jan and whoever is responsible for SWI’s build and installation process with praise. Thank you guys. From the bottom of my heart, thank you for hours of frustration I have never had to deal with.

It means the source files do not precisely match the indicated version as it appears in the Git repo, i.e., git diff is not empty. That can indeed point at submodules not being updated. If you switch versions, it is wise to run git submodule update. I do not suspect out-of-sync submodules to be involved in this issue, but it is of course not impossible.
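
A running system can also be checked from Prolog. This is a sketch under the assumption that the banner's version string comes from the version_git flag, which is only set for builds made from a git checkout; dirty_version/1 and running_dirty_build/0 are hypothetical helper names.

```prolog
% True if a version atom, as shown in the banner, ends in "-DIRTY".
dirty_version(Version) :-
    sub_atom(Version, _, _, 0, '-DIRTY').

% Check the running system. Fails if the version_git flag is not set
% or if the version does not carry the -DIRTY suffix.
running_dirty_build :-
    current_prolog_flag(version_git, Version),
    dirty_version(Version).
```

For example, ?- dirty_version('9.3.24-DIRTY'). succeeds, while ?- dirty_version('9.3.24-58-g407ec23a9'). fails.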

Should be fixed with 64a4b19317989858a8b0691ecac0f1471bcbc401. Thanks for reporting.

I’ll try it as soon as I can. Thank you for fixing!