Just noticed that group_by/4 computes the variables and then
delegates to bagof/3. But the latter predicate also computes the
variables, so I suspect quite an overhead:
/* SWI-Prolog 9.3.19 */
group_by(By, Template, Goal, Bag) :-
    ordered_term_variables(Goal, GVars),
    ordered_term_variables(By+Template, UVars),
    ord_subtract(GVars, UVars, ExVars),
    bagof(Template, ExVars^Goal, Bag).
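To see why bagof/3 has to look at the variables of the goal on its own, here is a small sketch in standard Prolog. The predicate names per_key/2 and all_values/1 are made up for illustration and are not part of any library.

```prolog
% Illustration only: per_key/2 and all_values/1 are hypothetical names.

% X is neither in the template nor quantified with ^, so bagof/3
% must discover it as a free variable of the goal and then
% produce one bag per binding of X.
per_key(X, L) :-
    bagof(Y, member(X-Y, [a-1, b-2, a-3]), L).

% Here X is existentially quantified with ^, so bagof/3
% collects all solutions into a single bag.
all_values(L) :-
    bagof(Y, X^member(X-Y, [a-1, b-2, a-3]), L).
```

The query per_key(X, L) gives X = a, L = [1,3] and then X = b, L = [2], while all_values(L) gives L = [1,2,3]. To tell these two cases apart, bagof/3 has to scan the goal for free variables, which is the second variable computation mentioned above.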
I went with another solution. First I provided a variant of aggregate/3
by the name aggregate_by/4, where one can offload the internal
term_variables/2 calculation. Then I use this bootstrapping:
/* Dogelog Player 1.3.0 */
group_by(Witness, Template, Goal, List) :-
    aggregate_by(Witness, bag(Template), Goal, List).
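To illustrate why an explicit witness removes the need for term_variables/2, here is a minimal sketch built on findall/3 and keysort/2. It is not the Dogelog Player implementation of aggregate_by/4. The names group_by_sketch/4, group_pairs/2 and same_key/4 are hypothetical, and the sketch assumes that the witness is ground after each solution of the goal.

```prolog
% Hypothetical sketch, NOT the actual aggregate_by/4 of Dogelog Player.
% Assumption: Witness is ground whenever Goal succeeds.
group_by_sketch(Witness, Template, Goal, Bag) :-
    % Only Witness-Template is copied per solution;
    % the variables of Goal are never enumerated.
    findall(Witness-Template, Goal, Pairs),
    keysort(Pairs, Sorted),
    group_pairs(Sorted, Groups),
    member(Witness-Bag, Groups).

% group_pairs(+SortedPairs, -Groups): merge runs of identical keys.
group_pairs([], []).
group_pairs([K-V|T], [K-[V|Vs]|Gs]) :-
    same_key(K, T, Vs, Rest),
    group_pairs(Rest, Gs).

% same_key(+Key, +Pairs, -Values, -Rest): take the leading pairs
% whose key is identical to Key.
same_key(K, [K2-V|T], [V|Vs], Rest) :-
    K2 == K, !,
    same_key(K, T, Vs, Rest).
same_key(_, Rest, [], Rest).
```

For example, group_by_sketch(X, Y, (between(1,3,Y), between(1,2,X)), L) gives X = 1, L = [1,2,3] and then X = 2, L = [1,2,3]. The timings that follow were measured with the actual group_by/4 definitions, not with this sketch.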
Here is some testing:
/* SWI-Prolog 9.3.19 */
?- length(_H,4000), time((between(1,2000,_),
group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
fail; true)).
% 1,153,998 inferences, 0.562 CPU in 0.568 seconds (99% CPU, 2051552 Lips)
true.
?- length(_H,8000), time((between(1,2000,_),
group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
fail; true)).
% 1,153,998 inferences, 1.047 CPU in 1.060 seconds (99% CPU, 1102326 Lips)
true.
/* Dogelog Player 1.3.0 */
?- length(_H,4000), time((between(1,2000,_),
group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
fail; true)).
% Zeit 399 ms, GC 0 ms, Lips 16987636, Uhr 10.02.2025 10:49
true.
?- length(_H,8000), time((between(1,2000,_),
group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
fail; true)).
% Zeit 400 ms, GC 1 ms, Lips 16945167, Uhr 10.02.2025 10:50
true.
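The running time of the old version depends on term_variables/2:
doubling the length of _H from 4000 to 8000 roughly doubles the
time, from 0.562 to 1.047 CPU seconds. The new version, in contrast,
is immune to the size of the given goal (399 ms versus 400 ms),
since any internal term_variables/2 has been offloaded.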
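The size dependency can be made visible with a small sketch. The helper name goal_variable_count/2 is made up for illustration. It builds the same kind of goal as in the test, nonvar(H) with H a list of fresh variables, and counts what term_variables/2 has to collect.

```prolog
% Hypothetical helper, for illustration only.
% goal_variable_count(+Len, -N): N is the number of variables that
% term_variables/2 finds in a goal carrying a list of Len fresh variables.
goal_variable_count(Len, N) :-
    length(H, Len),
    term_variables(nonvar(H), Vs),
    length(Vs, N).
```

The query goal_variable_count(4000, N) gives N = 4000, and 8000 gives N = 8000. So every call that enumerates the goal variables has to traverse a term whose size grows with the length of the list, even though the goal itself never touches the list elements.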
I couldn't name aggregate_by/4 aggregate/4, since the latter
already exists in SWI-Prolog and SICStus Prolog with different
semantics: it is not the analog of distinct/2, where one can specify witnesses.