Engine timeout and cross-thread destroy issues

Supported by Claude, I’m looking for a way to implement the stateless HTTP API using engines instead of threads, as engines are cheaper to store in the cache. It’s looking good except for one thing: the engine-backed variant does not currently implement a robust timeout for nonterminating goals. Claude tested the obvious approaches: call_with_time_limit/2 did not interrupt tight loops inside engines, and destroying a running engine from another thread was unsafe in this SWI build.

Claude, the floor is yours …

Environment

Observed on:

SWI-Prolog version 10.1.3 for fat-darwin

Programmatic version:

?- current_prolog_flag(version_data, Version).
Version = swi(10, 1, 3, []).

Issue 1: call_with_time_limit/2 does not interrupt engine_next/2

For ordinary Prolog code, call_with_time_limit/2 behaves as expected:

?- catch(call_with_time_limit(0.1, (repeat, fail)),
         Error,
         writeln(Error)).
time_limit_exceeded

But the equivalent goal inside an engine does not appear to be interrupted:

?- engine_create(true, (repeat, fail), Engine),
   catch(call_with_time_limit(0.1, engine_next(Engine, _)),
         Error,
         writeln(Error)).

Expected behavior:

The call should raise time_limit_exceeded, or the limitation should be documented.

Observed behavior:

The call does not return or throw within several seconds.

During one run, SWI also printed this diagnostic before the process was killed externally:

foreign predicate system:repeat/0 did not clear exception:
    error(signal(alrm,14),context(repeat/0,_))

This suggests the alarm signal may be delivered but not converted into the expected time_limit_exceeded exception while executing through engine_next/2.

Minimal reproduction

:- initialization(main, main).

main :-
    current_prolog_flag(version_data, Version),
    format('SWI version: ~q~n', [Version]),

    catch(
        call_with_time_limit(0.1, (repeat, fail)),
        PlainError,
        format('plain loop: ~q~n', [PlainError])
    ),

    engine_create(true, (repeat, fail), Engine),
    catch(
        call_with_time_limit(0.1, engine_next(Engine, _)),
        EngineError,
        format('engine loop: ~q~n', [EngineError])
    ),

    writeln(done).

Expected output:

SWI version: swi(10,1,3,[])
plain loop: time_limit_exceeded
engine loop: time_limit_exceeded
done

Observed output:

SWI version: swi(10,1,3,[])
plain loop: time_limit_exceeded

Then the process hangs in the engine case.

Issue 2: engine_destroy/1 can segfault when called from another thread

As a workaround for the timeout problem above, I tried running engine_next/2 in a worker thread and destroying the engine from the main thread when a timeout expires. This appears unsafe: if one thread is actively running an engine and another thread calls engine_destroy/1 on that engine, SWI segfaults.

Expected behavior:

engine_destroy/1 should fail, throw, or safely interrupt/destroy the engine. It should not crash the process.

Observed behavior:

Segmentation fault.

Minimal reproduction

:- initialization(main, main).

main :-
    current_prolog_flag(version_data, Version),
    format('SWI version: ~q~n', [Version]),
    engine_create(true, (repeat, fail), Engine),
    thread_create(run_engine(Engine), Worker, []),
    sleep(0.05),
    writeln('Destroying engine from main thread while worker is running it...'),
    engine_destroy(Engine),
    writeln('engine_destroy/1 returned'),
    thread_join(Worker, Status),
    format('Worker status: ~q~n', [Status]).

run_engine(Engine) :-
    catch(
        engine_next(Engine, Answer),
        Error,
        (   format('Worker caught: ~q~n', [Error]),
            fail
        )
    ),
    format('Worker answer: ~q~n', [Answer]).

Observed output:

SWI version: swi(10,1,3,[])
Destroying engine from main thread while worker is running it...

ERROR: Received fatal signal 11 (segv)

Why this matters

The use case is a stateless HTTP API that evaluates Prolog goals with paged responses. A thread-backed implementation can cache suspended toplevel actor threads between requests, but engines look like a better fit because cached engines should be much cheaper than cached threads.

For example, an engine-backed cache entry can store:

cache(GoalId, NextOffset, Engine)

instead of a suspended actor thread:

cache(GoalId, NextOffset, Pid)

The engine can be made to yield whole protocol chunks, so it does not need to compute one solution beyond the current response:

chunk_session(Goal, Template, Offset, Answer) :-
    engine_fetch(Limit0),
    Limit = count(Limit0),
    chunk_loop(Goal, Template, Offset, Limit, Answer).

chunk_loop(Goal, Template, Offset, Limit, Answer) :-
    answer(Goal, Template, Offset, Limit, Answer),
    (   Answer = success(_, true)
    ->  engine_yield(Answer),
        engine_fetch(NextLimit),
        nb_setarg(1, Limit, NextLimit),
        fail
    ;   true
    ).

This works for normal finite goals and cached paging. The missing piece is a robust way to bound execution time for nonterminating goals.
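
For readers unfamiliar with the post/yield protocol used above, here is a minimal self-contained sketch of how a request handler might drive such an engine. list_pager/2 is a hypothetical stand-in for chunk_session/4 that pages over a plain list instead of calling answer/5; only engine_create/3, engine_post/3, engine_fetch/1 and engine_yield/1 are actual SWI-Prolog predicates.

```prolog
% list_pager(+List, -Answer): hypothetical stand-in for chunk_session/4.
% Each request posts a page size. Every page except the last is yielded;
% the last page is returned as the engine's own (final) solution.
list_pager(List, Answer) :-
    engine_fetch(N),                    % page size posted by the client
    (   length(Page, N),
        append(Page, Rest, List),
        Rest \== []
    ->  engine_yield(page(Page, more)), % suspend until the next post
        list_pager(Rest, Answer)
    ;   Answer = page(List, done)       % remaining items fit in one page
    ).

% Illustrative client: three "requests" of two items each.
pager_demo(P1, P2, P3) :-
    engine_create(A, list_pager([a,b,c,d,e], A), E),
    engine_post(E, 2, P1),
    engine_post(E, 2, P2),
    engine_post(E, 2, P3).
```

Calling pager_demo(P1, P2, P3) should give P1 = page([a,b],more), P2 = page([c,d],more) and P3 = page([e],done). Between two engine_post/3 calls the engine is suspended, which is the state a cache entry would hold.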

Questions

  1. Should call_with_time_limit/2 interrupt nonterminating Prolog execution while it is being run through engine_next/2 or engine_post/3?
  2. If not, is this an intentional limitation that should be documented?
  3. Should engine_destroy/1 detect and reject attempts to destroy an engine that is currently running in another thread?
  4. Is there a recommended pattern for implementing timeouts around engines used this way?

engine_next/2 is specified as:

% Switch control to Engine and if engine produces a result, switch
% control back and unify the instance of Template from
% engine_create/3,4 with Term.

Most likely call_with_time_limit/2 is bootstrapped from thread_signal/2.
If thread_signal/2 places the signal in the context of the parent
engine that called engine_next/2, then the child engine will not
see this signal.

This also happens in other Prolog systems when a signal is sent to
a task: it will not automatically propagate to other tasks. Erlang
was a little different, since it had a link from an agent to its
spawned agents and did some bookkeeping. But many modern async
libraries have abandoned this link, and the viewpoint of async tasks
is that they are living in a sea of tasks.

One structuring method that has become more popular recently, for
example in the Python world, is the so-called task group. If you
have a task group notion and put a task group parenthesis around
what you are doing, you can tear down the whole task group at once.

I am currently using task groups in a notebook application. The
implementation is a little limited since there is only one task
group, but more mature systems allow multiple task groups, even
addressing some quirks of the sea-of-tasks model that pertain
to the life cycle of such tasks:

The Heisenbug lurking in your async code
Will McGugan - February 11, 2023
https://textual.textualize.io/blog/2023/02/11/the-heisenbug-lurking-in-your-async-code/

More a feature than a bug, i.e. task garbage collection.

Indeed. Not sure that has a reasonable fix. It has a reasonable work-around: apply the time limit inside the engine to the goal running for the next answer.
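
A minimal sketch of that work-around, assuming the computation for one answer can be wrapped as a single deterministic step. timed_goal/4 is a hypothetical helper, not part of SWI-Prolog; call_with_time_limit/2 comes from library(time). Whether the limit actually fires inside an engine on a given build should be checked before relying on it.

```prolog
:- use_module(library(time)).

% timed_goal(+Seconds, :Goal, ?Template, -Result): hypothetical helper.
% Runs Goal as once/1 under a time limit that is installed from inside
% the engine and reports the outcome as a term rather than an exception.
timed_goal(Seconds, Goal, Template, Result) :-
    catch(
        (   call_with_time_limit(Seconds, Goal)
        ->  Result = ok(Template)
        ;   Result = failed
        ),
        time_limit_exceeded,
        Result = timeout
    ).

% Illustrative use: the limit is part of the engine's goal, so
% engine_next/2 itself needs no surrounding call_with_time_limit/2.
timed_first(Seconds, Goal, Template, Result) :-
    engine_create(R, timed_goal(Seconds, Goal, Template, R), Engine),
    engine_next(Engine, Result).
```

For a finite goal such as timed_first(1, member(X, [a,b]), X, R) this gives R = ok(a). Because call_with_time_limit/2 commits to the first solution, a goal that should stay open for backtracking between chunks would need a different wrapper.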

It should not segfault. Pushed a fix that makes the call wait until the engine is done, which implies this example hangs.

You should either do your timeout management inside the engine or use threads. On Linux these scale well enough, i.e., 100K threads won’t starve the machine. MacOS has a low thread limit (some thousands). I don’t know about Windows.
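
As a rough illustration of the thread-based alternative, the sketch below runs a goal in a worker thread and gives up after a deadline. run_with_timeout/3 and timeout_worker/2 are hypothetical names; the thread and message queue predicates are standard SWI-Prolog. Note that the goal is copied into the worker, so bindings come back only through the result term.

```prolog
% run_with_timeout(:Goal, +Seconds, -Result): hypothetical helper.
% Result is ok(GoalInstance), failed, error(E) or timeout.
run_with_timeout(Goal, Seconds, Result) :-
    message_queue_create(Queue),
    thread_create(timeout_worker(Queue, Goal), Worker, []),
    (   thread_get_message(Queue, done(Result0), [timeout(Seconds)])
    ->  Result = Result0
    ;   % Deadline passed: interrupt the worker. The catch/3 covers the
        % race where the worker has just finished on its own.
        catch(thread_signal(Worker, throw(time_limit_exceeded)), _, true),
        Result = timeout
    ),
    thread_join(Worker, _),
    message_queue_destroy(Queue).

% Runs in the worker thread and reports the outcome on Queue.
timeout_worker(Queue, Goal) :-
    (   catch(Goal, Error, true)
    ->  (   var(Error)
        ->  Result = ok(Goal)
        ;   Result = error(Error)
        )
    ;   Result = failed
    ),
    thread_send_message(Queue, done(Result)).
```

Here run_with_timeout((repeat, fail), 0.2, R) is expected to give R = timeout, and run_with_timeout(X = 1, 2, R) gives R = ok(1 = 1). The engine is never touched from a second thread, which avoids the engine_destroy/1 situation from Issue 2.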

Interestingly, grouping exists on all levels. You can group
cooperating multitasking coroutines, you can put truly
multithreading threads into groups, and even processes can
be grouped.

There are also a couple of approaches to do it, like drawing on
Linda tuple stores, which might be seen as the ancestor of
KIF/KQML. You then create a second sea, usually called the
blackboard or pandemonium, which can be used for data transfer
and control transfer.

The structured approach might hide such details, and/or do it
completely differently, with primitives that might even show
signal evaporation. For example, a monitor channel is fire and
forget: if no other monitor user is waiting, a signaled condition
evaporates. Structured approaches might have a DSL like:

<PARALLEL>
  <TASK1/>
  <TASK2/>
  <TASK3/>
</PARALLEL>

This is the case in Ant tasks and in David Harel's statecharts.
But the latter mandates a broadcast communication mechanism for
certain events. I am not sure how to get locality and error
handling with statecharts realized on top of a Linda tuple
store. It seems to be quite some exercise. What are the
alternatives for simple job control missions?
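
One simple form of job control, sketched here with SWI-Prolog threads under stated assumptions: members of a group are recorded in a dynamic table, and the whole group is torn down together. group_spawn/2, group_cancel/2 and group_member/2 are hypothetical names, and this handles threads only, not engines.

```prolog
:- dynamic group_member/2.          % group_member(Group, ThreadId)

% group_spawn(+Group, :Goal): start Goal in a new thread and record
% the thread as a member of Group.
group_spawn(Group, Goal) :-
    thread_create(Goal, Id, []),
    assertz(group_member(Group, Id)).

% group_cancel(+Group, -Statuses): interrupt every member of Group,
% then join them all. Statuses is a list of ThreadId-ExitStatus pairs.
group_cancel(Group, Statuses) :-
    findall(Id, retract(group_member(Group, Id)), Ids),
    forall(member(Id, Ids),
           catch(thread_signal(Id, throw(group_cancelled)), _, true)),
    findall(Id-Status,
            ( member(Id, Ids),
              thread_join(Id, Status)
            ),
            Statuses).
```

After two calls to group_spawn(g, (repeat, fail)), group_cancel(g, Statuses) should report exception(group_cancelled) for both members. The signal is delivered per thread, so the group table is what makes the teardown reach every member.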

Thanks for the input. Yes, there are many ways in which my current approach can be changed or extended, but I really need to focus on what I already have.

The main problem is how to do a catholic, i.e. all-embracing,
design of Paul Tarau engines inside a Prolog system which already
has threads. One possible design would say engines are special
threads, so that you have:

public class Thread {
    public boolean is_engine;
    // etc.
}

But then you see that alertThread might not hit the right engine.
Another design would realize an engine group simply by a parent
pointer, creating a 1-mc multiplicity, namely:

public class Thread {
    // etc.
}

public class Engine {
    public Thread parent;
    // etc.
}

The second modeling might have various semantic traps: if
signal handling involves polling, you need a further dereferencing.
I think SWI-Prolog has both data structures, i.e. both variants,
namely PL_thread_info_t and thread_handle.interactor. But it seems
the SWI-Prolog system is designed around PL_thread_info_t, as
already seen from the PL_ prefix. For example, alertThread takes
such an argument, making it possibly less catholic?