The subtle costs of garbage collection

Possibly on some workloads. In the early days of multi-threaded SWI-Prolog running things concurrently typically meant a performance improvement with 2 or 4 threads, after which the performance started dropping down, often quickly getting below the performance of 1 thread. At least on Linux I no longer observe that on most workloads.

Using this driver code

Driver
:- meta_predicate
    conc_table(+,0),
    conc(+,0).

conc_table(Max, Goal) :-
    format('~t~w~15|~t~w~15+~t~w~15+~n', ["#threads", "total", "per run"]),
    between(0,Max,Step),
    Conc is 1<<Step,
    call_time(conc(Conc, Goal), Time),
    PerSol is Time.wall/Conc,
    format('~t~D~15|~t~3f~15+~t~3f~15+~n', [Conc, Time.wall, PerSol]),
    fail.
conc_table(_, _).

conc(N, Goal) :-
    length(Threads, N),
    maplist(thread_create(Goal), Threads),
    maplist(thread_join, Threads).

we get this result. Machine is 16 cores, 32 threads. So, you see close to perfect speedup up to 16 threads (total time remains almost flat). Some more speedup to 32 (hyper threading) and after that flat performance.

?- conc_table(11, rtest_chats(100)).
       #threads          total        per run
              1          0.674          0.674
              2          0.668          0.334
              4          0.690          0.173
              8          0.723          0.090
             16          0.806          0.050
             32          1.201          0.038
             64          2.400          0.038
            128          4.640          0.036
            256          9.139          0.036
            512         18.337          0.036
          1,024         36.686          0.036
          2,048         73.649          0.036

There is still a reason for not using system threads: scheduling gets unfair as each of the threads has equal priority. Well, you can work around that using scheduling settings.

My assessment is roughly this, where details depend on workload and also OS (and possibly hardware).

  • Threads use more resources (stack, OS resources)
  • Fibers probably scale better with many often blocking tasks
  • Threads scale better with many CPU bound tasks
  • Fibers are easier to program with
  • If you use threads+fibers you get probably best scalability. On the other hand, where old OSes used to have that for implementing POSIX threads, e.g., Linux doesn’t bother.
  • With only fibers you can only use one CPU. With threads+fibers you can scale to multiple cores, but you get most of the complications of preemptive multi-threaded programming in return.
1 Like