Counting actual Prolog lines loaded

(I hope I didn’t misunderstand the conversation)

The aggregate predicate family is useful, but at the moment it is not generic enough. The solution to single-pass aggregation (very large files, streams) that seems obvious to me is as follows:

  1. Create a predicate that backtracks over the solutions;
  2. Use a non-backtrackable data structure for aggregation.

For the second one, I have used library(nb_rbtrees) on top of library(rbtrees), with this tiny bit of code:

:- use_module(library(rbtrees)).     % rb_empty/1, rb_in/3, rb_lookup/3
:- use_module(library(nb_rbtrees)).  % the destructive nb_rb_* predicates

%!  nbdict_apply(!Tree, +Key, :Pred, +Init) is det.
%
%   If Key is in Tree, destructively update its value with Pred;
%   otherwise insert Key with the value Init.
nbdict_apply(X, Key, Pred, Init) :-
    (   nb_rb_get_node(X, Key, Node)
    ->  nb_rb_node_value(Node, Val0),
        call(Pred, Val0, Val1),
        nb_rb_set_node_value(Node, Val1)
    ;   nb_rb_insert(X, Key, Init)
    ).
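
To make the destructive update concrete, here is a tiny sanity check (count_twice/0 is just a name I made up for this example; rb_lookup/3 comes from library(rbtrees)):

% First call inserts Init = 1 for the new key; the second call finds
% the node and applies succ/2 to the stored value, so we end at 2.
count_twice :-
    rb_empty(T),
    nbdict_apply(T, foo, succ, 1),
    nbdict_apply(T, foo, succ, 1),
    rb_lookup(foo, Count, T),
    Count == 2.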

This will either insert the default Init value or apply Pred to the value already associated with Key. This post by Jan W discusses the computational complexity. It also suggests a simpler, arguably better way to count word frequencies, which however stops working once your input is big enough. I guess your question refers in part to that post?
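
For contrast, the in-memory idiom I believe that post has in mind would look roughly like this (a guess at the exact variant, reusing the file_word/2 generator assumed below; clumped/2 is available in recent SWI-Prolog versions):

% findall/3 materializes the complete list of words, which is exactly
% what stops working once the input no longer fits in memory.
word_frequencies(File, Counts) :-
    findall(Word, file_word(File, Word), Words),
    msort(Words, Sorted),      % standard order sort, duplicates kept
    clumped(Sorted, Counts).   % run-length encoding: Word-N pairs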

Going back to the non-backtrackable accumulator: if you do it like this, you can simply use forall/2. Assuming you have defined a predicate file_word/2 that succeeds once for every word in a file, you would do:

rb_empty(Freqs),                       % fresh tree, updated destructively below
forall(file_word(File, Word),          % drive the backtracking generator
    nbdict_apply(Freqs, Word, succ, 1)),
forall(rb_in(Key, Val, Freqs),         % enumerate the accumulated pairs
    format("~w ~w~n", [Key, Val]))
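
For completeness, file_word/2 itself could be a lazy tokenizer along these lines. This is only a sketch: both the predicate and its helper stream_word/2 are names I am assuming here, and splitting on whitespace with read_string/5 stands in for real tokenization.

% Enumerate whitespace-separated words on backtracking, reading the
% stream incrementally instead of loading the whole file at once.
file_word(File, Word) :-
    setup_call_cleanup(
        open(File, read, In),
        stream_word(In, Word),
        close(In)).

stream_word(In, Word) :-
    repeat,
    read_string(In, " \t\n", "", End, String),
    (   String == "", End == -1      % nothing left: stop the repeat loop
    ->  !, fail
    ;   String == ""                 % empty token between separators: retry
    ->  fail
    ;   atom_string(Word, String)
    ).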

But this is still not optimal: you need to hand-roll both the backtracking predicate and the non-backtrackable accumulator for anything non-trivial. Do you have a better idea?