@ridgeworks discovered that 8.1.26 is up to nearly twice as fast as the current version on MacOS. I first kind of didn’t believe it, but in the end it proves true. What changed is that I installed the latest (10.15.4) MacOS version on the build machine and I got two Xcode updates since. Current version using clang 11.0.3. I don’t know the other versions
See this on the old but still useful Pereira benchmark suite below.
Anyone has a clue what causes this? Did Apple really mess up so badly or did they change some default flag/added additional security/… that causes this? Hints on how to get back to good old performance are welcome.
8.1.26 from download page
tail_call_atom_atom took 0.38-0.09=0.29/145946 = 2.0 usec/iter.
binary_call_atom_atom took 0.41-0.06=0.35/94737 = 3.7 usec/iter.
cons_list took 0.29-0.06=0.24/91525 = 2.6 usec/iter.
walk_list took 0.36-0.07=0.28/122727 = 2.3 usec/iter.
walk_list_rec took 0.34-0.08=0.26/125581 = 2.1 usec/iter.
args(1) took 0.34-0.07=0.26/120000 = 2.2 usec/iter.
args(2) took 0.29-0.05=0.24/81818 = 3.0 usec/iter.
args(4) took 0.28-0.03=0.25/54545 = 4.5 usec/iter.
args(8) took 0.31-0.02=0.29/33333 = 8.7 usec/iter.
args(16) took 0.32-0.01=0.31/19355 = 15.8 usec/iter.
cons_term took 0.30-0.05=0.24/84375 = 2.9 usec/iter.
walk_term took 0.33-0.07=0.26/110204 = 2.4 usec/iter.
walk_term_rec took 0.48-0.08=0.41/122727 = 3.3 usec/iter.
shallow_backtracking took 0.29-0.27=0.02/415385 = 0.0 usec/iter.
deep_backtracking took 0.23-0.04=0.20/59341 = 3.3 usec/iter.
choice_point took 0.66-0.06=0.60/94737 = 6.3 usec/iter.
trail_variables took 0.56-0.06=0.51/87097 = 5.8 usec/iter.
medium_unify took 0.79-0.51=0.29/771429 = 0.4 usec/iter.
deep_unify took 0.70-0.15=0.55/235161 = 2.3 usec/iter.
integer_add took 0.63-0.03=0.60/49091 = 12.2 usec/iter.
floating_add took 0.53-0.03=0.49/40909 = 12.1 usec/iter.
arg(1) took 0.30-0.03=0.27/40000 = 6.8 usec/iter.
arg(2) took 0.31-0.03=0.28/40909 = 6.9 usec/iter.
arg(4) took 0.28-0.03=0.26/37500 = 6.9 usec/iter.
arg(8) took 0.28-0.03=0.26/38217 = 6.8 usec/iter.
arg(16) took 0.28-0.03=0.26/38298 = 6.8 usec/iter.
index took 0.57-0.07=0.50/100000 = 5.0 usec/iter.
assert_unit took 1.03-0.00=1.03/1525 = 676.1 usec/iter.
access_unit took 0.32-0.02=0.30/26471 = 11.4 usec/iter.
slow_access_unit took 0.02-0.00=0.02/1607 = 11.2 usec/iter.
setof took 0.20-0.01=0.20/7692 = 25.4 usec/iter.
pair_setof took 0.23-0.01=0.22/6522 = 34.3 usec/iter.
double_setof took 0.46-0.00=0.46/1837 = 251.5 usec/iter.
bagof took 0.17-0.01=0.16/10112 = 16.2 usec/iter.
34 benchmarks took 11.16 seconds
8.1.26 rebuilt using latest clang
tail_call_atom_atom took 0.75-0.13=0.62/145946 = 4.2 usec/iter.
binary_call_atom_atom took 0.73-0.08=0.65/94737 = 6.9 usec/iter.
cons_list took 0.68-0.08=0.60/91525 = 6.6 usec/iter.
walk_list took 0.88-0.11=0.77/122727 = 6.2 usec/iter.
walk_list_rec took 0.76-0.12=0.64/125581 = 5.1 usec/iter.
args(1) took 0.69-0.11=0.58/120000 = 4.8 usec/iter.
args(2) took 0.65-0.07=0.58/81818 = 7.1 usec/iter.
args(4) took 0.69-0.05=0.64/54545 = 11.7 usec/iter.
args(8) took 0.76-0.03=0.73/33333 = 22.0 usec/iter.
args(16) took 0.86-0.02=0.84/19355 = 43.6 usec/iter.
cons_term took 0.73-0.09=0.64/84375 = 7.6 usec/iter.
walk_term took 0.89-0.12=0.77/110204 = 7.0 usec/iter.
walk_term_rec took 1.08-0.11=0.97/122727 = 7.9 usec/iter.
shallow_backtracking took 0.42-0.38=0.04/415385 = 0.1 usec/iter.
deep_backtracking took 0.44-0.05=0.39/59341 = 6.5 usec/iter.
choice_point took 1.10-0.09=1.01/94737 = 10.7 usec/iter.
trail_variables took 0.97-0.08=0.89/87097 = 10.3 usec/iter.
medium_unify took 0.98-0.72=0.26/771429 = 0.3 usec/iter.
deep_unify took 0.74-0.22=0.52/235161 = 2.2 usec/iter.
integer_add took 0.85-0.04=0.81/49091 = 16.4 usec/iter.
floating_add took 0.74-0.04=0.70/40909 = 17.2 usec/iter.
arg(1) took 0.49-0.04=0.46/40000 = 11.5 usec/iter.
arg(2) took 0.51-0.04=0.47/40909 = 11.5 usec/iter.
arg(4) took 0.47-0.03=0.43/37500 = 11.5 usec/iter.
arg(8) took 0.48-0.04=0.44/38217 = 11.6 usec/iter.
arg(16) took 0.48-0.04=0.45/38298 = 11.7 usec/iter.
index took 0.95-0.09=0.86/100000 = 8.6 usec/iter.
assert_unit took 0.81-0.00=0.81/1525 = 531.1 usec/iter.
access_unit took 0.57-0.02=0.54/26471 = 20.5 usec/iter.
slow_access_unit took 0.03-0.00=0.03/1607 = 19.9 usec/iter.
setof took 0.24-0.01=0.24/7692 = 30.7 usec/iter.
pair_setof took 0.27-0.01=0.26/6522 = 40.2 usec/iter.
double_setof took 0.54-0.00=0.54/1837 = 292.9 usec/iter.
bagof took 0.24-0.01=0.23/10112 = 22.5 usec/iter.
34 benchmarks took 19.42 seconds
If you say “broke” I guess you mean something different than Ciao being just slower?
(there is quite some variance in the timing). Disabling threaded bytecode does help for clang, but changed the time from 19.4 to 14.8 sec. So it is still useful to disable, but it seems wise to keep building the binary bundles with gcc.