How to achieve a 1000x performance increase by 2025 and why architecture matters

For those of you interested in understanding how processor architecture matters, especially in AI, here is the link to Intel Architecture Day 2021 … It covers subjects like caching, registers, INT, FLOAT, AMX, matrix multiplication, multi-threading and, most of all, the convergence towards hybrid architectures … Tons of useful under-the-hood information on how maximum performance is achieved and on the coming boost in processors, be it for mobile, desktop or servers.


For small arithmetic tests, do not forget -O, or set the Prolog flag optimise to true before compiling. This changes the runtime for this test from 0.200 to 0.138 for me :slight_smile:
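
A minimal sketch of what I mean (sum_to/3 and bench/0 are just made-up example predicates): the flag has to be set before the arithmetic clauses are compiled, or alternatively start the system with `swipl -O`.

```prolog
% Enable compile-time arithmetic optimisation before the clauses below
% are loaded; `swipl -O` on the command line has the same effect.
:- set_prolog_flag(optimise, true).

sum_to(0, Sum, Sum) :- !.
sum_to(N, Acc0, Sum) :-
    Acc is Acc0 + N,
    N1 is N - 1,
    sum_to(N1, Acc, Sum).

bench :-
    time(sum_to(10000000, 0, Sum)),
    format("Sum = ~d~n", [Sum]).
```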

SWI-Prolog’s GC is pretty aggressive, generally preferring GC over stack expansion. This is due to its focus on multi-threading, which makes it attractive to keep the memory usage of each thread low.

Minor GC pays off for programs that carry around huge terms which are not modified (much). If the code produces a lot of garbage, a major GC will process the entire stacks, including the huge term. GC time is (roughly) linear in the size of the reachable memory (with a much smaller linear component for the non-reachable memory). Using minor GC, memory can be kept close to the memory required by the huge terms without slowing down too much. Using only major GC, we can only keep acceptable performance if we allow memory to grow to a couple of times the amount required for the huge terms. And yes, Mat’s excellent paper on SICStus GC describes both major and minor GC (called generational GC in the paper, if I recall correctly). SWI-Prolog’s GC follows the paper quite closely, but only implements major GC.
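
As a rough, hypothetical illustration of the reachable-memory point (demo/0 and churn/1 are made-up names):

```prolog
% BigTerm stays reachable the whole time, while churn/1 creates lots of
% short-lived garbage.  Each major GC must traverse BigTerm as well, so
% GC cost tracks the reachable data, not just the garbage produced.
churn(0) :- !.
churn(N) :-
    length(_Garbage, 1000),        % short-lived list, immediately garbage
    N1 is N - 1,
    churn(N1).

demo :-
    numlist(1, 1000000, BigTerm),  % a large term that remains live
    time(churn(10000)),
    is_list(BigTerm).              % keep BigTerm reachable until the end
```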

Stack GC is local to a thread, so there is no reason to stop the world. In fact, the shared-object garbage collectors for atoms and clauses do not stop the world either :slight_smile:
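
For reference, if I have the built-in names right, these collections can also be triggered explicitly, and as far as I know the atom and clause collectors run concurrently with other threads:

```prolog
?- garbage_collect.          % GC this thread's own stacks
?- garbage_collect_atoms.    % reclaim unreferenced atoms (shared)
?- garbage_collect_clauses.  % reclaim retracted/dirty clauses (shared)
```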

statistics/0 reports the shifts and time for them separately.
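
For example (the exact layout of the report differs between versions, and the shape of the garbage_collection list is from memory):

```prolog
?- statistics.                              % prints GC and stack-shift counts/times
?- statistics(garbage_collection, Stats).   % roughly [Count, FreedBytes, TimeMs]
```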