Engines and preemptive scheduling


From what I read about how engines work – they are attached to a thread and cooperatively yield once their work is done – it seems that one could not program a preemptive schedule of engines.

Is this correct?



Engines work the same way normal SWI-Prolog multi-threading works.
Even on a CPU with only one core, anything with more than one
thread will be preemptively scheduled by the operating system.

Now you can create 3 threads, create 3 engines, and run each of the
3 engines on one of the 3 threads. They will be preemptively scheduled
by the operating system.
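A minimal sketch of this setup (the predicate names are mine, not a library API): each engine is handed to its own thread, which drives it to completion, so the operating system schedules the three preemptively.

```prolog
% work/1 is an illustrative CPU-bound goal.
work(N) :-
    (   between(1, 100000, _), fail          % busy work
    ;   format("engine ~w finished~n", [N])
    ).

% Create an engine for Goal and drive it from a fresh thread.
run_engine_in_thread(Goal, Thread) :-
    engine_create(done, Goal, Engine),
    thread_create(( engine_next(Engine, _),
                    engine_destroy(Engine)
                  ), Thread, []).

threads_demo :-
    maplist(run_engine_in_thread, [work(1), work(2), work(3)], Threads),
    maplist(thread_join, Threads).
```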

You can run in some of the engines a Prolog program that calls
with_mutex/2, and the engines and their threads will block
as in normal multi-threaded SWI-Prolog.

The blocking means the operating system's preemptive scheduler
suspends, i.e. parks, the threads, and also unsuspends, i.e. unparks,
them depending on the acquisition and release of the mutex.
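For illustration, a sketch of two threads contending for one mutex (with_mutex/2 creates the mutex on demand when given an atom; the names are mine):

```prolog
% Whichever thread arrives second is parked by the OS until the
% holder leaves the critical section.
critical(Id) :-
    with_mutex(demo_mutex,
               (   format("~w enters the critical section~n", [Id]),
                   sleep(0.1),
                   format("~w leaves~n", [Id])
               )).

mutex_demo :-
    thread_create(critical(a), T1, []),
    thread_create(critical(b), T2, []),
    maplist(thread_join, [T1, T2]).
```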

Edit 03.11.2021:
The parking metaphor is possibly the best image for what coroutines
then do. If you now build some extra scheduler that runs in
one thread and manages multiple engines, like the Erlang loop does,

then you have the advantage that when an actor suspends, you don’t
have to park the thread. The thread with its extra scheduler continues
running and handles other actors which are not suspended.

There are basically two levels: threads, and then, inside threads, coroutines.
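Such a scheduler can be sketched with the existing engine API: one thread drives many engines round-robin, and each engine hands back control with engine_yield/1 (the names are mine):

```prolog
% An engine goal that yields three terms before completing.
ticker(Name) :-
    forall(between(1, 3, I),
           engine_yield(tick(Name, I))).

% Round-robin scheduler: resume each engine in turn; an engine whose
% goal has completed answers with the template `done` and is dropped.
round_robin([]).
round_robin([E|Es]) :-
    engine_next(E, Answer),
    (   Answer == done                 % the engine's goal finished
    ->  engine_destroy(E),
        round_robin(Es)
    ;   format("~w~n", [Answer]),
        append(Es, [E], Rest),         % requeue at the back
        round_robin(Rest)
    ).

scheduler_demo :-
    engine_create(done, ticker(a), A),
    engine_create(done, ticker(b), B),
    round_robin([A, B]).
```

Running scheduler_demo interleaves the ticks of both engines within a single thread.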



Thank you —

I was thinking about a virtualization of engines, so that many more engines could be scheduled than there are threads … I guess this is not possible.

That's coroutining – if you do not insist that the scheduling is preemptive.
To allow such coroutining there is engine_yield/1. That's what Erlang does with
its actors: it loads many, many actors onto one thread.
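A minimal sketch of that kind of coroutining: the engine suspends at each engine_yield/1 and the consumer resumes it with engine_next/2.

```prolog
% The engine's goal yields twice, then completes.
producer :-
    engine_yield(hello),
    engine_yield(world).

% ?- engine_create(done, producer, E),
%    engine_next(E, A),   % A = hello
%    engine_next(E, B),   % B = world
%    engine_next(E, C).   % C = done (producer/0 completed)
```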

Actually they have a corresponding principle of only a few schedulers:
a scheduler runs in one thread and vice versa, and it schedules multiple
actors. They align the number of schedulers with the number of cores:

The Scheduler Loop
The current strategy of the load balancer is to use as few schedulers
as possible without overloading any CPU.

But you could also build a preemptive scheduler inside a single thread.
You would then have something like a time-sharing system inside a
time-sharing system. I guess nobody does this.

There is this layering in Erlang: a few schedulers on the level
of preemptive threads, and then inside each scheduler, coroutine-style,
many, many actors, which have special suspension points where they
can transfer control to other actors.


Yes, I misunderstood – I thought that BEAM is preemptive, but it's not.

Instead, apparently, whenever a function is called within a process's code, a "reduction" is counted, and when a limit is reached the process is suspended by a yield.

In Erlang it's apparently a built-in thing that the compiler provides for scheduling – so reduction counting and handling is at the VM instruction level.

In essence the analog would be that every call in an engine would need to be wrapped in order to count a reduction and then yield when a limit for that engine is reached.

Looks like this can be done, but it sounds very expensive – since every predicate call in an engine needs to be wrapped.

SWI-Prolog would need to be extended to support BEAM-like cooperative scheduling of engines by adding VM instruction wrappers to calls, reduction handling and, I guess, hooks for a scheduler.


Btw, there are no loop constructs in Erlang – it's all (tail) recursion, hence function calls – so the reduction-based yield algorithm can be injected into such loops as well.



Yes, I saw this thingy. You could hook into the SWI-Prolog inference counter to do the same, but I guess you need to modify the engine implementation. I think this is done to avoid having one actor that runs indefinitely and never yields its engine, because it never calls receive or friends, which would yield.

Maybe this automatic suspend is also a good opportunity to do garbage collection. In another thread here, from some testing with Fibonacci, I saw that SWI-Prolog does up to 800 garbage collections per second. In Dogelog Runtime I have a setup which does 60 garbage collections per second,

using the inference counter to decide when to do garbage collection.
I don't know how SWI-Prolog times its garbage collection.

Perhaps hooking into the inference counter is a good idea …

In Erlang the reason they use reductions and not preemption is to ensure completion of certain tasks – such as tagging of integers – which cannot be interrupted.

However, I guess in Prolog an engine could simply run for x inference steps and then yield.
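A sketch of that idea, assuming statistics(inferences, I) is cheap to read; maybe_yield/1 is a hypothetical helper that the engine's code must call at safe points:

```prolog
% Remember the inference counter at the start of a budget period.
start_budget :-
    statistics(inferences, I0),
    nb_setval(budget_start, I0).

% Yield once more than Limit inferences have been spent since the
% last reset; after being resumed, start a fresh budget.
maybe_yield(Limit) :-
    statistics(inferences, I),
    nb_getval(budget_start, I0),
    (   I - I0 > Limit
    ->  engine_yield(budget_exceeded),
        start_budget                   % resumed: reset the budget
    ;   true
    ).

% An engine goal that checks its budget on every iteration.
busy :-
    start_budget,
    forall(between(1, 100000, _),
           maybe_yield(1000)).
```

Driving `busy` in an engine, engine_next/2 then returns budget_exceeded every ~1000 inferences until the loop completes.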

How does one hook into the inference count – I noticed call_with_inference_limit/3, but that's not what is needed here.


How is the inference count actually defined – what exactly is counted?
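One way to experiment is to read statistics(inferences, N) around a goal (inference_count/2 is a hypothetical helper; note the measured count also includes the call/1 overhead):

```prolog
% Report how many inferences running Goal once consumed.
inference_count(Goal, Count) :-
    statistics(inferences, I0),
    call(Goal),
    statistics(inferences, I1),
    Count is I1 - I0.

% ?- inference_count(numlist(1, 100, _), N).
```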

Check out/fork SWI-Prolog devel from the GitHub repository. Do some programming and check
it in to your fork. Offer your fork to Jan W. so that it gets integrated back into SWI-Prolog devel.

SWI-Prolog is open source; the above is called a pull request. The programming will
possibly involve C code and Prolog code. If you are a nice person you write docs as well.

Contributions to a source code repository that uses a distributed version control system are commonly made by means of a pull request, also known as a merge request

That’s quite a project – since I have no idea yet how to program hooks going from C back to Prolog, nor how hooks work in Prolog, nor where inferences are counted, etc.

As a first stab at Erlang-style processes I would probably try to use a wrapper to inline a reduction-and-yield check – or simply manually put a yield check into each predicate called from within an engine.
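A sketch of the wrapper variant using wrap_predicate/4 from library(prolog_wrap); the reduction bookkeeping (spend_reduction/0, the tiny budget of 2) is hypothetical:

```prolog
:- use_module(library(prolog_wrap)).

% step/0 stands for any predicate called from within the engine.
step.

% Spend one "reduction"; when the budget is exhausted, yield to the
% scheduler and refill after being resumed.
spend_reduction :-
    (   nb_current(reductions_left, N), N > 0
    ->  N1 is N - 1,
        nb_setval(reductions_left, N1)
    ;   engine_yield(out_of_reductions),
        nb_setval(reductions_left, 2)
    ).

% Every call to step/0 now goes through spend_reduction/0 first.
:- wrap_predicate(step, reductions, Inner,
                  ( spend_reduction, Inner )).

worker :-
    nb_setval(reductions_left, 2),
    step, step, step.    % the third call exhausts the budget and yields
```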

A good start is:

git grep inferences

The docs and implementation of call_with_inference_limit/3 also help.
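For example, when the budget is exhausted, call_with_inference_limit/3 aborts the goal and unifies Result with inference_limit_exceeded:

```prolog
% The goal below needs on the order of millions of inferences,
% so the limit of 10000 is exceeded and the goal is aborted.
limited_demo(Result) :-
    call_with_inference_limit(
        forall(between(1, 1000000, _), true),
        10000,
        Result).

% ?- limited_demo(R).
% R = inference_limit_exceeded.
```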

Yes, you could combine this with engine_yield/1 to achieve a new type of yield. I don’t know how useful that is. Maybe you can do interesting stuff.

As explained before though, SWI-Prolog thread/engine handling doesn’t really scale into the millions. The main reason is memory management, both atom/clause GC and discarding old objects such as outdated clause indexes, hash tables, etc. All these do linear scans through the existing engines and threads (they are internally the same except for a few flags).

Thousands of these work fine. Depending on hardware and Prolog workload, things go wrong (as in get slow) at some point. Applications that do not create (many) atoms and do not retract (many) clauses may scale quite far.

Thank you.

My intent is to use this only for analysis of (graph) structures rather than creation/modification of the graph in the KB. So, perhaps, for this more constrained use case, it may have good merit.

Although, I do need to aggregate results – I need to see how this could be done in such an isolated and concurrent environment – perhaps by passing a dictionary in and out, or something.