Supported by Claude, I’m looking for a way to implement the stateless HTTP API using engines instead of threads, as engines are cheaper to store in the cache. It’s looking good except for one thing: the engine-backed variant does not currently implement a robust timeout for nonterminating goals. Claude tested the obvious approaches; call_with_time_limit/2 did not interrupt tight loops inside engines, and destroying a running engine from another thread was unsafe in this SWI build.
Claude, the floor is yours …
Environment
Observed on:
SWI-Prolog version 10.1.3 for fat-darwin
Programmatic version:
?- current_prolog_flag(version_data, Version).
Version = swi(10, 1, 3, []).
Issue 1: call_with_time_limit/2 does not interrupt engine_next/2
For ordinary Prolog code, call_with_time_limit/2 behaves as expected:
?- catch(call_with_time_limit(0.1, (repeat, fail)),
         Error,
         writeln(Error)).
time_limit_exceeded
But the equivalent goal inside an engine does not appear to be interrupted:
?- engine_create(true, (repeat, fail), Engine),
   catch(call_with_time_limit(0.1, engine_next(Engine, _)),
         Error,
         writeln(Error)).
Expected behavior:
The call should raise time_limit_exceeded, or the limitation should be documented.
Observed behavior:
The call does not return or throw within several seconds.
During one run, SWI also printed this diagnostic before the process was killed externally:
foreign predicate system:repeat/0 did not clear exception:
error(signal(alrm,14),context(repeat/0,_))
This suggests the alarm signal may be delivered but not converted into the expected time_limit_exceeded exception while executing through engine_next/2.
Minimal reproduction
:- initialization(main, main).

main :-
    current_prolog_flag(version_data, Version),
    format('SWI version: ~q~n', [Version]),
    catch(
        call_with_time_limit(0.1, (repeat, fail)),
        PlainError,
        format('plain loop: ~q~n', [PlainError])
    ),
    engine_create(true, (repeat, fail), Engine),
    catch(
        call_with_time_limit(0.1, engine_next(Engine, _)),
        EngineError,
        format('engine loop: ~q~n', [EngineError])
    ),
    writeln(done).
Expected output:
SWI version: swi(10,1,3,[])
plain loop: time_limit_exceeded
engine loop: time_limit_exceeded
done
Observed output:
SWI version: swi(10,1,3,[])
plain loop: time_limit_exceeded
Then the process hangs in the engine case.
Issue 2: engine_destroy/1 can segfault when called from another thread
As a workaround for the timeout problem above, I tried running engine_next/2 in a worker thread and destroying the engine from the main thread when a timeout expires. This appears unsafe: if one thread is actively running an engine and another thread calls engine_destroy/1 on that engine, SWI segfaults.
Expected behavior:
engine_destroy/1 should fail, throw, or safely interrupt/destroy the engine. It should not crash the process.
Observed behavior:
Segmentation fault.
Minimal reproduction
:- initialization(main, main).

main :-
    current_prolog_flag(version_data, Version),
    format('SWI version: ~q~n', [Version]),
    engine_create(true, (repeat, fail), Engine),
    thread_create(run_engine(Engine), Worker, []),
    sleep(0.05),
    writeln('Destroying engine from main thread while worker is running it...'),
    engine_destroy(Engine),
    writeln('engine_destroy/1 returned'),
    thread_join(Worker, Status),
    format('Worker status: ~q~n', [Status]).

run_engine(Engine) :-
    catch(
        engine_next(Engine, Answer),
        Error,
        (   format('Worker caught: ~q~n', [Error]),
            fail
        )
    ),
    format('Worker answer: ~q~n', [Answer]).
Observed output:
SWI version: swi(10,1,3,[])
Destroying engine from main thread while worker is running it...
ERROR: Received fatal signal 11 (segv)
Why this matters
The use case is a stateless HTTP API that evaluates Prolog goals with paged responses. A thread-backed implementation can cache suspended toplevel actor threads between requests, but engines look like a better fit because cached engines should be much cheaper than cached threads.
For example, an engine-backed cache entry can store:
cache(GoalId, NextOffset, Engine)
instead of a suspended actor thread:
cache(GoalId, NextOffset, Pid)
The engine can be made to yield whole protocol chunks, so it does not need to compute one solution beyond the current response:
chunk_session(Goal, Template, Offset, Answer) :-
    engine_fetch(Limit0),
    Limit = count(Limit0),
    chunk_loop(Goal, Template, Offset, Limit, Answer).

chunk_loop(Goal, Template, Offset, Limit, Answer) :-
    answer(Goal, Template, Offset, Limit, Answer),
    (   Answer = success(_, true)
    ->  engine_yield(Answer),
        engine_fetch(NextLimit),
        nb_setarg(1, Limit, NextLimit),
        fail
    ;   true
    ).
This works for normal finite goals and cached paging. The missing piece is a robust way to bound execution time for nonterminating goals.
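For anyone who wants to try the paging idea in isolation, here is a much simpler sketch that is not the code used in the API. It assumes a fixed page size chosen when the engine is created, and it builds the pages with findnsols/4 instead of the engine_fetch/1 and engine_yield/1 protocol above. The names open_pages/4 and next_page/2 are hypothetical helpers invented for this illustration.

```prolog
:- meta_predicate open_pages(+, ?, 0, -).

%!  open_pages(+PageSize, ?Template, :Goal, -Engine)
%   Hypothetical helper: create an engine whose answers are
%   lists of at most PageSize instances of Template.
open_pages(PageSize, Template, Goal, Engine) :-
    engine_create(Page,
                  findnsols(PageSize, Template, Goal, Page),
                  Engine).

%!  next_page(+Engine, -Page)
%   Hypothetical helper: fetch the next page. Fails once Goal
%   has no more solutions.
next_page(Engine, Page) :-
    engine_next(Engine, Page).

demo :-
    open_pages(3, X, between(1, 8, X), Engine),
    next_page(Engine, P1),
    next_page(Engine, P2),
    next_page(Engine, P3),
    format("~q ~q ~q~n", [P1, P2, P3]).   % prints [1,2,3] [4,5,6] [7,8]
```

Unlike chunk_session/4, this variant cannot change the limit between requests, and it has the same exposure to nonterminating goals: next_page/2 blocks for as long as the goal runs.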
Questions
- Should call_with_time_limit/2 interrupt nonterminating Prolog execution while it is being run through engine_next/2 or engine_post/3?
- If not, is this an intentional limitation that should be documented?
- Should engine_destroy/1 detect and reject attempts to destroy an engine that is currently running in another thread?
- Is there a recommended pattern for implementing timeouts around engines used this way?