Enhanced thread signal interface

During a proprietary project we needed a richer interface for thread_signal/2, and we documented the behavior of this interface more precisely. The issue that triggered this was that signals got lost if they were queued (because they could not be processed in time) and an earlier signal raised an exception. Behavior should now be consistent, regardless of whether signals are queued or not. Being able to run goals that cannot be interrupted by signals is now part of the public API (sig_atomic/1), and calls have been added to (un)block signals and to inspect and modify the signal queue.

This is in the swipl-devel git repository.

If you see flaws or unconventional issues with this interface, please share.

--- Jan

Below are the docs (copy/paste destroyed the layout a little),

10.3.3 Signalling threads

The predicates in this section provide signalling between threads. A thread signal inserts any goal as an interrupt into the control flow of any target thread. The target thread processes the goal at the first safe opportunity. The mechanism was introduced with two goals in mind: (1) running a goal inside a thread for debugging purposes, such as inspecting its status or getting access to thread-specific data, and (2) forcing a thread to abort its current goal by inserting an exception into its control flow.

Over time, more complicated use cases have been identified that may result in multiple signals occurring (nearly) simultaneously. As of version 8.5.1 the interface has been extended and the interaction with other built-in predicates has been specified in much more detail.

[det] thread_signal(+ThreadId, :Goal)

Make thread ThreadId execute Goal at the first opportunity. The predicate thread_signal/2 itself places Goal into the signalled thread’s signal queue and returns immediately.

ThreadId executes Goal as an interrupt at the first opportunity. Defined opportunities are:

  • At the call port of any predicate except for predicates with the property sig_atomic. Currently this only applies to sig_atomic/1.
  • Before retrying a foreign predicate.
  • Before backtracking to the next clause of a Prolog predicate.
  • When a foreign predicate calls PL_handle_signals(). Foreign predicates that take long to complete should call PL_handle_signals() regularly and return with FALSE after PL_handle_signals() returned -1, indicating an exception was raised.
  • Foreign predicates calling blocking system calls should attempt to make these system calls interruptible. To enable this on POSIX systems, SWI-Prolog sends a SIGUSR2 to the signalled thread while the handler is an empty function. This causes most blocking system calls to return with EINTR. See also the command line option --sigalert. On Windows, PL_handle_signals() is called when the user processes Windows messages.
  • For some blocking (thread) APIs we use a timed version with a 0.25 sec timeout to achieve a polling loop.

If one or more signals are queued, the queue is processed. Processing the queue skips signals blocked due to sig_block/1 and stops when the queue contains no more non-blocked signals or when processing a signal results in an exception. After an exception, other signals remain in the queue and will be processed after unwinding to the matching catch/3. Typically these queued signals will be processed during the Recover goal of the catch/3. Note that sig_atomic/1 may be used to protect the recovery goal.

The thread_signal/2 mechanism is primarily used by the system to insert debugging goals into the target thread (tspy/1, tbacktrace/1, etc.) or to interrupt a thread using e.g. thread_signal(Thread, abort). Predicates from library(thread) use signals to stop workers for e.g. concurrent_maplist/2 if some call fails. Applications may use it, typically for similar purposes such as asynchronously stopping tasks or inspecting the status of a task. Below we describe the behaviour of thread signalling in more detail. The following notes apply to Goal executing in ThreadId:

  • The execution is protected by sig_atomic/1 and thus signal execution is not nested.
  • If Goal succeeds, possible choice points are discarded. Changes to the Prolog stacks such as changes to backtrackable global variables remain.
  • If Goal fails, no action is taken, i.e., failure is not considered a special condition.
  • If Goal raises an exception the exception is propagated into the environment. This allows for forcefully stopping the target thread. The system uses this to implement abort/0 and call_with_time_limit/2.
  • Code into which signals may be injected must make sure to use setup_call_cleanup/3 and friends to ensure proper cleanup in the case of an exception. This is good practice anyway to guard against unpredictable exceptions such as resource exhaustion.
  • Goal may use stack inspection such as prolog_frame_attribute/3 to determine what the thread is doing.
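The notes above on exceptions and cleanup can be illustrated with a minimal sketch. The names worker/1, busy_loop/0 and demo_stop/1 are hypothetical and not part of any library. The sketch assumes the worker is only signalled once it has entered setup_call_cleanup/3, which the `started` message is meant to ensure.

```prolog
% Hypothetical worker: announces itself during Setup, loops until it is
% interrupted, and reports to Parent when Cleanup has run.
worker(Parent) :-
    setup_call_cleanup(
        thread_send_message(Parent, started),
        busy_loop,
        thread_send_message(Parent, cleaned_up)).

% Each iteration passes the call port of sleep/1, so an injected signal
% is picked up quickly.
busy_loop :-
    repeat,
    sleep(0.01),
    fail.

% Start the worker, inject an exception, and collect its exit status.
demo_stop(Status) :-
    thread_self(Me),
    thread_create(worker(Me), Id, []),
    thread_get_message(started),
    thread_signal(Id, throw(stop_requested)),
    thread_get_message(cleaned_up),
    thread_join(Id, Status).
```

Calling demo_stop(Status) should report that the worker ended with an exception status, and the `cleaned_up` message shows that the Cleanup goal ran even though the worker was stopped from the outside.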

[det] sig_pending(-List)
True when List contains all signals submitted using thread_signal/2 that are not yet processed. This includes signals blocked by sig_block/1.

[det] sig_remove(:Pattern, -List)

Remove all signals that unify with Pattern from the signal queue and make the removed signals available in List.

[det] sig_block(:Pattern)

Block thread signals queued using thread_signal/2 that match Pattern.

[det] sig_unblock(:Pattern)

Remove any effect of sig_block/1 for patterns that are more specific (see subsumes_term/2). If any patterns are removed, reschedule blocked signals. Note that sig_unblock/1 normally causes all unblocked signals to be executed immediately.
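A small sketch of how the four queue predicates above might be combined, assuming a version that provides them (8.5.1 or later). The predicates note/1, seen/1, demo_unblock/1 and demo_remove/1 are hypothetical. The exact shape of the terms returned by sig_pending/1 and sig_remove/2 (for example, whether they are module-qualified) is not spelled out above, so the sketch only relies on the lists being non-empty.

```prolog
:- dynamic seen/1.

% Hypothetical signal goal: record that it ran.
note(X) :-
    assertz(seen(X)).

% Block note/1 signals, queue one for ourselves, look at the queue,
% then unblock so the queued signal can run.
demo_unblock(Pending) :-
    thread_self(Me),
    sig_block(note(_)),
    thread_signal(Me, note(hello)),
    sig_pending(Pending),
    sig_unblock(note(_)).

% Same setup, but drop the queued signal before unblocking, so that
% note(dropped) is never executed.
demo_remove(Removed) :-
    thread_self(Me),
    sig_block(note(_)),
    thread_signal(Me, note(dropped)),
    sig_remove(note(_), Removed),
    sig_unblock(note(_)).
```

After demo_unblock/1 the fact seen(hello) is expected to exist, while after demo_remove/1 there should be no seen(dropped).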

[semidet] sig_atomic(:Goal)

Execute Goal as once/1 while blocking both thread signals (see thread_signal/2) and OS signals (see on_signal/3). The system executes some goals while blocking signals. These are:

  • The goal injected using thread_signal/2, i.e., signals do not interrupt a running signal handler.
  • The Setup call of setup_call_cleanup/3 and friends.
  • The Cleanup call of call_cleanup/2 and friends.
  • Compiling a file or loading a quick load file.

The call port of sig_atomic/1 does not handle signals. This may notably be used to prevent interruption of the catch/3 Recover goal. For example, we may ensure the recovery goal of a timeout is called using the code below. Without this precaution another signal may run before writeln/1 and raise an exception to prevent its execution. Note that catch/3 should generally not be used for cleanup of resources in case of an exception and thus it is typically fine if its Recover goal is interrupted. Use setup_call_cleanup/3 or one of the other predicates from the call_cleanup/2 family for cleanup.

…, catch(call_with_time_limit(Time, Goal), time_limit_exceeded, sig_atomic(writeln('Time limit exceeded'))).


Congrats

Congratulations on a well-done and excellent design of this API! At this point I don’t know of any other high-level language that is able to handle sending signals to blocking foreign code, and you handle this nicely in a well-defined manner (compliments on thinking about using SIGUSR2 for this). Not even Erlang provides this.

Queued blocked goals

The only question I have is whether it is possible to tell which goals are in the queue due to sig_block/1 and which ones are in the queue simply because they have not yet been executed.

Is this part of the information returned in List when calling sig_pending/1?

(minor note: the predicate links here seem to be pointing to localhost – presumably those should be swi-prolog.org (once it’s running the latest)?)

I wish I could understand in depth the capabilities of these predicates and the needs they cover.

Are there introductory resources you could suggest for reading up on this?

thanks,

Dan

You would know when you need this kind of API. It is not a generic type of feature, but it is quite useful when you are doing more advanced things with multi-threaded code. The new improvements are especially useful if you have foreign code running in some of those threads, and they also resolve some gray areas in the previous code.

The only typical use (that I can think of) for a regular user is to abort a thread by calling: thread_signal(Thread, abort) as it is mentioned above.
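As a minimal sketch of that typical use, the code below starts a thread that never finishes on its own and aborts it from the outside. The names forever/0 and demo_abort/1 are hypothetical. The exact term inside the exception status differs between SWI-Prolog versions, so the sketch does not rely on it.

```prolog
% Hypothetical task that never terminates by itself.
forever :-
    repeat,
    sleep(0.05),
    fail.

% Abort the task and collect its exit status.
demo_abort(Status) :-
    thread_create(forever, Id, []),
    thread_signal(Id, abort),
    thread_join(Id, Status).
```

Here Status is expected to be of the form exception(_) rather than true or false.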

Thanks …

I am working on a demo that communicates via web-sockets bi-directionally with a game engine – essentially, doing some of the game logic in Prolog.

I tinkered a bit to get this working and also played with threading, i.e., when messages are received from the game, they are all handled by the same goal, but each in a separate Prolog thread.

I guess one use case would then be when a “global” decision has to be taken in the game that needs to affect all currently running threads, for example, to abort them all and do some global action such as a game reset, a level end, or something like this.

Do I see this correctly?

Dan

I think this kind of scenario should be dealt with using regular messaging between the threads. Thread signaling is more for special situations like what Jan mentioned above:

What you described is more part of the regular life-cycle of the application (in this case your game).

Is this to insert an exception in particular, or just to abort a thread?

E.g., if I have an algorithm that is parallelized, and one thread finds a solution or identifies a condition that fails everything, then all threads should be aborted, but not specifically or necessarily with an exception.

Thanks for the positive comment!

You got me :) I was thinking about that. Roughly I guess there are two ways out, (1) is to mark the signals you get from sig_pending/1 and (2) would be to provide a predicate that tests whether a signal is blocked (which already exists internally as that is used to decide whether we should block a signal). So far I’m not convinced it is worth adding. When considered necessary the second solution is probably best as the first requires some ugly wrapping of the signal goals that will probably just be annoying for most users.

Another thing I was considering is to add a possibility for thread_wait/2 to include waiting for a signal. At least that is backed up by the POSIX sigwait() primitive. If you want to wait for something you already can wait for messages or database changes. That may be enough?

That is what first_solution/3 uses thread signals for. I think it is valid usage. The main complication is that you must make sure that it is safe to interrupt your code. Pure Prolog code is. Anything that involves side effects or allocates things such as (file) handles must make sure to properly clean up. Most of this can be achieved using setup_call_cleanup/3, undo/1, transaction/1, etc. The other problem concerns blocking calls. As explained in the docs, this is mostly solved. Still, it requires foreign code to be aware of signals (properly behaved code should be anyway) and some APIs cannot be aborted, which requires fallback to polling (if that is possible). Polling either wastes CPU cycles or results in slow response :(
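A minimal sketch of this use of first_solution/3 from library(thread). The solver predicates slow_answer/1 and fast_answer/1 and the wrapper demo_first/1 are hypothetical. Both solvers are pure apart from sleep/1, so it is safe to interrupt them.

```prolog
:- use_module(library(thread)).

% Hypothetical slow solver: would only answer after five seconds.
slow_answer(X) :-
    sleep(5),
    X = slow.

% Hypothetical fast solver: answers immediately.
fast_answer(fast).

% Run both solvers concurrently and keep the first answer.
demo_first(X) :-
    first_solution(X, [slow_answer(X), fast_answer(X)], []).
```

Calling demo_first(X) should bind X to fast well before the five seconds are up, because the slow solver is stopped once an answer is available.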

Will be corrected at the next release. The quickest way to get the docs here was to view them in PlDoc and copy/paste.

When I read

force a thread to abort its current goal by inserting an exception into its control flow.

the model that came to mind is that of print spooling.


I know that many don’t dig into understanding how to communicate with debuggers via the API but many have experience with trying to kill a large print job while it is printing.


Another similar usage that comes to mind is web crawlers

You’re probably right about this, the use cases are narrow.

Distributed SWI-Prolog

I think this is a great idea. What came to my mind is that this (along with the whole signal API) can provide the infrastructure to make SWI-Prolog truly distributed, allowing inter-process communication by sending signals across the network or to processes on the same machine (it is better and more generic than simple message passing with mailboxes). I think this, together with the redis library, would cover the major use case of having a distributed Prolog system. This, in my humble opinion, will give SWI-Prolog a major advantage given the current distributed world.

Conceptually speaking, it’s probably cleaner to support a “signal native” wait rather than force the programmer to know and use other, unrelated, mechanisms such as messaging and the database.

Unless messaging here means messaging related to signals.

I’m interested to hear what you have in mind. Signalling doesn’t sound like a great idea to solve distributed programming issues, possibly besides dealing with urgent and unexpected situations. If it is unexpected you’re not going to wait for them though :)

Note that you can fairly easily do without sigwait. Just make the signal handler call e.g. assert(signalled) and use thread_wait/2 on signalled/0. I think that will do until we have a good use case.
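The suggestion above could look roughly like the sketch below. The names waiter/1 and demo_wait/0 are hypothetical. The sketch assumes the wait_preds option of thread_wait/2 to re-evaluate the condition when signalled/0 changes.

```prolog
:- dynamic signalled/0.

% Hypothetical waiter: blocks until signalled/0 holds, then reports back.
waiter(Parent) :-
    thread_wait(signalled, [wait_preds([signalled/0])]),
    thread_send_message(Parent, woke_up).

% The "signal handler" is simply assertz(signalled), run inside the
% waiting thread.
demo_wait :-
    retractall(signalled),
    thread_self(Me),
    thread_create(waiter(Me), Id, []),
    thread_signal(Id, assertz(signalled)),
    thread_get_message(woke_up),
    thread_join(Id, true).
```

If the signal is processed before the waiter reaches thread_wait/2, the condition already holds and the wait returns at once, so the order of events does not matter here.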

I’ve used POSIX sigwait() once. It is a famous trick to stop a thread: send it a signal, have the signal handler signal a semaphore such that the signalling thread can wait for the signal to be handled and then make the signal handler block using sigwait(). SWI-Prolog’s atom garbage collector used that before it was able to run without stopping threads.

You can now use a similar trick in Prolog. That is a little easier as Prolog signals can safely be mixed with everything while POSIX signal handlers must be handled with care (but they run truly asynchronously). Still, stopping a thread is typically a bad idea.