Suggestion: concurrent_findall

I was thinking (for a use case I have now) it would be a good idea to have a:

concurrent_findall(+Template,:Goal,-Bag,:Sort,Opts)

This predicate would work like findall/3 but would do the following:

  1. Start several threads and run Goal in each thread, to partially fill the UnsortedBag
  2. Use Opts for thread creation or other options
  3. Run Bag=Sort(UnsortedBag) to sort the results in the user-expected order

This is very useful for when you have a large database and you want to search it, but need to keep the search order in the results. This assumes the search result is much smaller (in size) than the database, which is usually the case.

What do you guys think?

EDIT: Goal would be called as call(Goal,ThreadNumber) so that the user can decide how to partition the search inside the goal.

1 Like

In the database world, it’s considered a bad idea to depend on search order of results (IIRC, the SQL spec says that you can’t depend on it unless you have an ORDER BY). In they Python world, it’s a bad idea to depend on the ordering of items stored in a hash table (“dict” - see PYTHONHASHSEED).
Etc.

I meant the application/business logic needs the results in a certain order; just like it is guaranteed by findall/3 in a single-threaded app. Because findall/3 would already returns the results in the proper order, it would be fitting for concurrent_findall/3 to do the same also or at least to allow it. This is the purpose of the :Sort goal, which would be set to return its input if one doesn’t care about sorting the results.

Interesting trick, I’ll play with it some. I still think that a concurrent_findall/_ in the api would be useful though.

In the database world, where performance is gained by sophisticated indexing, parallelism, replication, and sharding, if you want to have results in the same order you put them in, then you should have a field that contains that ordering information (a time stamp, a sequence number, etc.). It’s a bad idea for people to depend on asserta/assertz and bagof/findall ordering for their results (and even worse if they depend on the data being in bags with possible duplications).

A SQL “ORDER BY” isn’t necessarily a performance problem - there’s often no need to sort the results because there can be indexes that have the ordering information already computed. (One of my pet interview questions for people who claimed database expertise was to ask them why b-tree organization was often preferred over hash indexing, even though hash indexing has O(1) lookup cost vs O(log N) for b-trees.)

2 Likes

In my use case, the ordering is not dependent on assert, but because the desired order never changes (it is an invariant in this domain), it is simply a set of facts in a source file. Therefore there is no danger here.

Nonetheless, I agree with you in the general database case. Nonetheless, both cases can be properly handled by the proposed concurrent_findall/_.

That’s the same as using assertz/1.

Even if it makes no difference now, I would caution against any guarantees in the order of results from concurrent_findall. (The original implementors of SQL took some shortcuts that seemed reasonable at the time but which prevented many useful optimizations later, and which also introduced many subtle inconsistencies in the query language.)

The order of results, in my proposal, is guaranteed by calling the user-defined Sort goal, not by anything else.

I tried your porposal with concurrent_and and got a 2-2.5 speed up , not bad.

Perhaps what you seek is a cousin to library(solution_sequences) but instead of working with findall/3 , create a new predicate concurrent_findall/3 (which might be concurrent_and/2 or concurrent_maplist/2 (ref)) and then the cousin library ( library(concurrent_solution_sequences) ) would work with concurrent_findall/3.

Cosytec proposed the partial search strategy to explore huge search spaces. You may be interested in implementing it in SWI in a concurrent fashion. Here is the paper title:
N. Beldiceanu, E. Bourreau, P. Chan, and D. Rivreau. Partial search strategy in CHIP. In
2nd International Conference on Metaheuristics - MIC 97, Sophia Antipolis, France, July
1997.

1 Like

The way I do this is to first listify whatever the inputs to :Goal1 in findall (+Template, :Goal1, -Bag1) are (using findall/3 to get them from the clausal store), then using concurrent_maplist (:Goal2, +Bag1, -Bag2).

I guess it would be more efficient to figure out how to do this without the initial traversal through the clausal store.