Hi!
The work described here was done in collaboration with Alfredo Capozucca from University of Luxembourg.
We find time discrepancies when running the same goal using concurrent_maplist w.r.t. when the goal is run sequentially (no parallelization).
Part of the code we use is as follows (the full program can be found here):
countSdk4(S) :-
open('log.txt',write,Log),
findall(_,
(findnsols(S,Y, between(1, 100, Y), L1),
get_time(Ti),
concurrent_maplist(sudoku4234(4),L1),
get_time(Tf),
T is Tf - Ti,
write(Log,T),
write(Log,'\n')
),
_
),
close(Log).
where sudoku4234/2
is a hard goal based on finding all the solutions for a 4x4 sudoku.
So, if we call countSdk4(1)
, the findnsols
call will return only one solution, so L1
is a single element list and so concurrent_maplist(sudoku4234(4),L1)
is basically maplist
and so only one processor is used. Then, through get_time
we take the time needed to solve each call to sudoku4234
.
In a second experiment we call countSdk4(8)
which means that batches of 8 calls to sudoku4234
will be run in parallel using 8 processors. In this case we take the time used to solve each batch of 8 calls.
In machines with more than 8 processors we expected that the time needed to solve a batch of 8 calls will be close to the maximum of the times needed to solve the same 8 calls sequentially (i.e. when countSdk4(1)
is called). But we found (consistent) discrepancies between 20% and 30% more time. That is, instead of the maximum we get 20-30% more time than that. For instance, these are the times (in seconds) needed to solve the first 8 calls to sudoku4234
in a Mac laptop with 12 processors:
33.0132069587707
33.0883748531341
32.6344561576843
32.7681801319122
32.6260249614715
32.5810809135437
26.2362399101257
31.1244869232177
while the same goals run in parallel took 42.5555839538574. As you can see the maximum of these 8 times is 33.0883748531341 meaning that running them in parallel took 28% more time than the maximum. The full data set can be found here.
Are we doing anything wrong? Is our expectation wrong? Can anyone tell us why is this happening?
Although we post the sudoku-based experiments we intend to use the same technique with {log} (setlog) (www.clpset.unipr.it/setlog.Home.html) which is a satisfiability solver for set theory and set relation algebra implemented in Prolog. The goals to be parallelized in {log} take way more time than solving 4x4 sodokus and the discrepancies are even higher than those shown here. This behavior would render the parallel approach not very useful.
Thanks!
Regards,
Maxi