Suggestion: concurrent_findall

swi · July 12, 2020, 6:45pm

I was thinking (for a use case I have now) it would be a good idea to have a:

concurrent_findall(+Template,:Goal,-Bag,:Sort,Opts)

This predicate would work like findall/3 but would do the following:

Start several threads and run Goal in each thread, to partially fill the UnsortedBag
Use Opts for thread creation or other options
Run Bag=Sort(UnsortedBag) to sort the results in the user-expected order

This is very useful for when you have a large database and you want to search it, but need to keep the search order in the results. This assumes the search result is much smaller (in size) than the database, which is usually the case.

What do you guys think?

EDIT: Goal would be called as call(Goal,ThreadNumber) so that the user can decide how to partition the search inside the goal.

peter.ludemann · July 12, 2020, 6:59pm

In the database world, it’s considered a bad idea to depend on search order of results (IIRC, the SQL spec says that you can’t depend on it unless you have an ORDER BY). In they Python world, it’s a bad idea to depend on the ordering of items stored in a hash table (“dict” - see PYTHONHASHSEED).
Etc.

swi · July 13, 2020, 1:11am

I meant the application/business logic needs the results in a certain order; just like it is guaranteed by findall/3 in a single-threaded app. Because findall/3 would already returns the results in the proper order, it would be fitting for concurrent_findall/3 to do the same also or at least to allow it. This is the purpose of the :Sort goal, which would be set to return its input if one doesn’t care about sorting the results.

swi · July 13, 2020, 1:34am

Interesting trick, I’ll play with it some. I still think that a concurrent_findall/_ in the api would be useful though.

peter.ludemann · July 13, 2020, 1:51am

In the database world, where performance is gained by sophisticated indexing, parallelism, replication, and sharding, if you want to have results in the same order you put them in, then you should have a field that contains that ordering information (a time stamp, a sequence number, etc.). It’s a bad idea for people to depend on asserta/assertz and bagof/findall ordering for their results (and even worse if they depend on the data being in bags with possible duplications).

A SQL “ORDER BY” isn’t necessarily a performance problem - there’s often no need to sort the results because there can be indexes that have the ordering information already computed. (One of my pet interview questions for people who claimed database expertise was to ask them why b-tree organization was often preferred over hash indexing, even though hash indexing has O(1) lookup cost vs O(log N) for b-trees.)

swi · July 13, 2020, 2:22am

In my use case, the ordering is not dependent on assert, but because the desired order never changes (it is an invariant in this domain), it is simply a set of facts in a source file. Therefore there is no danger here.

Nonetheless, I agree with you in the general database case. Nonetheless, both cases can be properly handled by the proposed concurrent_findall/_.

peter.ludemann · July 13, 2020, 2:39am

That’s the same as using assertz/1.

Even if it makes no difference now, I would caution against any guarantees in the order of results from concurrent_findall. (The original implementors of SQL took some shortcuts that seemed reasonable at the time but which prevented many useful optimizations later, and which also introduced many subtle inconsistencies in the query language.)

swi · July 13, 2020, 2:41am

The order of results, in my proposal, is guaranteed by calling the user-defined Sort goal, not by anything else.

swi · July 13, 2020, 2:42am

I tried your porposal with concurrent_and and got a 2-2.5 speed up , not bad.

EricGT · July 13, 2020, 9:28am

Perhaps what you seek is a cousin to library(solution_sequences) but instead of working with findall/3 , create a new predicate concurrent_findall/3 (which might be concurrent_and/2 or concurrent_maplist/2 (ref)) and then the cousin library ( library(concurrent_solution_sequences) ) would work with concurrent_findall/3.

PeterChan · February 14, 2021, 8:59pm

Cosytec proposed the partial search strategy to explore huge search spaces. You may be interested in implementing it in SWI in a concurrent fashion. Here is the paper title:
N. Beldiceanu, E. Bourreau, P. Chan, and D. Rivreau. Partial search strategy in CHIP. In
2nd International Conference on Metaheuristics - MIC 97, Sophia Antipolis, France, July
1997.

joeblog · February 16, 2021, 5:52pm

The way I do this is to first listify whatever the inputs to :Goal1 in findall (+Template, :Goal1, -Bag1) are (using findall/3 to get them from the clausal store), then using concurrent_maplist (:Goal2, +Bag1, -Bag2).

I guess it would be more efficient to figure out how to do this without the initial traversal through the clausal store.

Topic		Replies	Views
Concurrent_setof? Discussion	3	492	December 19, 2020
Careful with the findall Help!	0	275	August 5, 2021
Efficiently use concurrency predicates to find n solutions General	1	50	December 31, 2024
Strange behavior with findall/3 Help! discussion	3	111	May 2, 2024
A little doubt on maplist/2 Help! discussion	3	774	April 29, 2020

Suggestion: concurrent_findall

Related topics