Not much followup discussion (other than Peter’s offer) so thought I’d share my motivating example: The Stigler Diet Problem | OR-Tools | Google Developers
This is a minimization problem with 77 variables and 9 constraints. According to the web page, available tools in C, Java, and Python produce a solution in about 1 ms. (Apparently, when originally solved in 1949, this problem took 120 man-days using desk calculators.) Out of the box, library(simplex)
takes about 1.5 sec on 8.5.14 (MacOS fat binary), so about 1500 times slower than the documented OR tools. Simple tweaks to remove a couple of unnecessary rational arithmetic operations reduces this to about 650 ms. - a nice improvement.
As explained earlier, profiling data indicates nth0/3
is now a significant bottleneck. As suggested arg/3
is an O(1) alternative to nth0
when the data is in a compound term rather than a list so I wrote a simple version of nth0_chk
which copies the list into a term and then uses arg
:
nth0_chk(N,L,X) :- % integer(N), N >=0,
C=..[c|L],
N1 is N+1,
arg(N1,C,X).
Strategically replacing nth0
by nth0_chk
in two locations reduced the execution time from 650 ms. to 440 ms. (now less than a factor of 500). Note that this isn’t near optimal since a full copy of the whole list is produced on each call and, on average, only half the the list is required anyway. A more focused benchmark accessing the 38th element (the average) of a 77 element list shows an almost 2x improvement for this specific case and one would expect a properly implemented primitive to do somewhat better (the last query copies a half list for comparison):
?- numlist(1,77,L), time((between(1,1000000,_), nth0(38,L,X), fail)).
% 15,000,003 inferences, 1.437 CPU in 1.437 seconds (100% CPU, 10441438 Lips)
false.
?- numlist(1,77,L), time((between(1,1000000,_), nth0_chk(38,L,X), fail)).
Correct to: "simplex:nth0_chk(38,L,X)"? yes
% 4,000,001 inferences, 0.758 CPU in 0.759 seconds (100% CPU, 5275028 Lips)
false.
?- numlist(1,38,L),time((between(1,1000000,_),nth0_chk(38,L,X),fail)).
Correct to: "simplex:nth0_chk(38,L,X)"? yes
% 4,000,001 inferences, 0.464 CPU in 0.464 seconds (100% CPU, 8620172 Lips)
false.
As an aside, for me performance analysis is complicated due to the fact that AFAICT MacOS fat binaries appear to run 20-30% slower on Intel than the the old Intel-only releases, presumably due to some C compiler option restriction. Sure would be nice if this wasn’t the case even if it entails rebuilding a custom version for Intel from source.