Hi!

The work described here was done in collaboration with Alfredo Capozucca from University of Luxembourg.

We find time discrepancies when running the same goal using concurrent_maplist w.r.t. when the goal is run sequentially (no parallelization).

Part of the code we use is as follows (the full program can be found here):

```
countSdk4(S) :-
open('log.txt',write,Log),
findall(_,
(findnsols(S,Y, between(1, 100, Y), L1),
get_time(Ti),
concurrent_maplist(sudoku4234(4),L1),
get_time(Tf),
T is Tf - Ti,
write(Log,T),
write(Log,'\n')
),
_
),
close(Log).
```

where `sudoku4234/2`

is a hard goal based on finding all the solutions for a 4x4 sudoku.

So, if we call `countSdk4(1)`

, the `findnsols`

call will return only one solution, so `L1`

is a single element list and so `concurrent_maplist(sudoku4234(4),L1)`

is basically `maplist`

and so only one processor is used. Then, through `get_time`

we take the time needed to solve each call to `sudoku4234`

.

In a second experiment we call `countSdk4(8)`

which means that batches of 8 calls to `sudoku4234`

will be run in parallel using 8 processors. In this case we take the time used to solve each batch of 8 calls.

In machines with more than 8 processors we expected that the time needed to solve a batch of 8 calls will be close to the maximum of the times needed to solve the same 8 calls sequentially (i.e. when `countSdk4(1)`

is called). But we found (consistent) discrepancies between 20% and 30% more time. That is, instead of the maximum we get 20-30% more time than that. For instance, these are the times (in seconds) needed to solve the first 8 calls to `sudoku4234`

in a Mac laptop with 12 processors:

33.0132069587707

33.0883748531341

32.6344561576843

32.7681801319122

32.6260249614715

32.5810809135437

26.2362399101257

31.1244869232177

while the same goals run in parallel took 42.5555839538574. As you can see the maximum of these 8 times is 33.0883748531341 meaning that running them in parallel took 28% more time than the maximum. The full data set can be found here.

Are we doing anything wrong? Is our expectation wrong? Can anyone tell us why is this happening?

Although we post the sudoku-based experiments we intend to use the same technique with {log} (setlog) (www.clpset.unipr.it/setlog.Home.html) which is a satisfiability solver for set theory and set relation algebra implemented in Prolog. The goals to be parallelized in {log} take way more time than solving 4x4 sodokus and the discrepancies are even higher than those shown here. This behavior would render the parallel approach not very useful.

Thanks!

Regards,

Maxi