Counting actual Prolog lines loaded

Hello,

I noticed (great) code that counts the lines of a file, so I could use this to load the files in a folder and count their lines.

However, I am curious how many lines of code a loaded program has. Can this be counted?

There is listing/0, but from the documentation I am unsure whether it descends into modules, and it also lists things that are predefined rather than part of the consulted files.

Edit:

Can one identify a valid Prolog term in a line, without enumerating all possible terms it could be?

Using the file-read approach, I am now wondering if I can identify a valid Prolog term in a given line. If such a term can be identified, then it is a valid Prolog line and not something else, i.e. a comment or a blank line.

thanks,

Dan

Edit:

That’s the code I currently have:

%------------------------- 
% all, and only,  *.pl files are in a folder in path
% path atom must end with '\\'
%-------------------------

:- use_module(library(pure_input)).   % for stream_to_lazy_list/2

count_lines(Path, Count) :-
	dir_files(Path, Entries),
	aux_count_file(Path, Entries, Count).

% no entries left: nothing more to count
aux_count_file(_Path, [], 0).

aux_count_file(Path, [First | Rest], New_Count) :-
	First \= '.',
	First \= '..',
	file_nl_count(Path, First, Count),
	aux_count_file(Path, Rest, Count_Rest),
	New_Count is Count + Count_Rest.

% skip '.' and '..' (and anything that could not be counted)
aux_count_file(Path, [_First | Rest], New_Count) :-
	aux_count_file(Path, Rest, New_Count).


dir_files(Path, Entries) :-
	directory_files(Path, Entries).

file_nl_count(Path, File0, Count) :-
	atom_concat(Path, File0, File),
	format('~w~n', [File]),
	setup_call_cleanup(
	    open(File, read, In),
	    stream_nl_count(In, Count),
	    close(In)).

stream_nl_count(In, Count) :-
	stream_to_lazy_list(In, List),
	aggregate_all(count, member(0'\n, List), Count).
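
For example, assuming the *.pl files live in a (made-up) folder c:\myprolog, the intended call would be:

?- count_lines('c:\\myprolog\\', Total).

(The doubled backslashes inside the quoted atom stand for single backslashes.)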

Is there an aggregate that could be used to make a word histogram? A word histogram would not count lines, but count the occurrences of words in a text file, and list the count for every word of the file.

My suspicion is: there is! If you use aggregate/3 and not aggregate_all/3 as you do in your example. But I have also read people suggesting dicts for that purpose. But dicts only have O(n) operations right now.

How would you program it, with or without dicts?

If you want to find all the clauses in a file, you could probably use read_term/2 with the subterm_positions option (or term_expansion/4, which uses read_term/2 with that option).

(I hope I didn’t misunderstand the conversation)
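
For instance, something along these lines (a rough, untested sketch; file_terms/2 and read_terms/2 are just names I made up) would give you every term of a file together with its layout:

file_terms(File, Terms) :-
    setup_call_cleanup(
        open(File, read, In),
        read_terms(In, Terms),
        close(In)).

read_terms(In, Terms) :-
    read_term(In, Term, [subterm_positions(Pos)]),
    (   Term == end_of_file
    ->  Terms = []
    ;   Terms = [Term-Pos|Rest],
        read_terms(In, Rest)
    ).

Comments and blank lines never show up as terms, so the length of Terms already gives a rough measure of the code that matters, clause-wise rather than line-wise.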

The aggregate predicate family is useful but at the moment not generic enough. The solution to single-pass aggregation (very large files, streams) that seems obvious to me is as follows:

  1. Create a predicate that backtracks over the solutions;
  2. Use a non-backtrackable data structure for aggregation.

For the second one, I have used nb_rbtrees with this tiny bit of code:

nbdict_apply(X, Key, Pred, Init) :-
    (   nb_rb_get_node(X, Key, Node)
    ->  nb_rb_node_value(Node, Val0),
        call(Pred, Val0, Val1),
        nb_rb_set_node_value(Node, Val1)
    ;   nb_rb_insert(X, Key, Init)
    ).

This will either insert a default Initial value or apply the Predicate to the existing value associated with the Key. This post by Jan W discusses the computational complexity. It also suggests an obvious and better way to count word frequency that does not work if your input is big enough. I guess your question refers in part to that post?

If you do it like this, you can simply use forall/2. If we assume you have defined a file_word/2 predicate that succeeds once for every word in a file, you would do:

rb_empty(Freqs),
forall(file_word(File, Word),
    nbdict_apply(Freqs, Word, succ, 1)),
forall(rb_in(Key, Val, Freqs),
    format("~w ~w~n", [Key, Val]))

But this is still not optimal: you need to hand-roll both the backtracking predicate and the non-backtrackable accumulator for anything non-trivial. Do you have a better idea?

I was rather expecting some usage of aggregate/3:

test(File) :-
     aggregate(count, file_word(File, Word), Count),
     format("~w ~w~n", [Word, Count]),
     fail; true.

Works on my side. Can simulate the file with a few facts:

file_word(foo, bar).
file_word(foo, baz).
file_word(foo, bar).

This gives me:

?- test(foo).
bar 2
baz 1
true.

How come this works? aggregate/3 performs bagof/3 on the goal.
But maybe an aggregate/3 implementation with nb_rbtrees would be better.
You have to try; most aggregates wouldn’t be problematic.

I think @boris doesn’t like that this program uses memory that corresponds to the length of the input text rather than to the set of words. If you are not concerned with that, the most efficient way is to convert the input into a list of words, sort the list without removing duplicates, and count adjacent equivalent words.

If you want to process in time proportional to the number of words it gets more complicated. A dynamic fact with the word and a counter is rather costly. Hash tables or trees, possibly using destructive assignment, come to mind. My most recent approach for a (different) counting problem was to maintain facts counter(+Obj, -Index), where each new object found gets the next index, and a global array (a compound) where we use nb_setarg/3 to update the count. If the array gets too small we double it in size. You can find the code in the new test coverage code (package plunit).
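
Just to illustrate the idea (my own names and simplifications, not the actual plunit code): counting the occurrences of the elements of a list with a counter/2 fact table plus a growable array that is updated with nb_setarg/3 could look like this:

:- dynamic counter/2.                    % counter(Object, Index)

count_occurrences(Objects, Pairs) :-
    retractall(counter(_, _)),
    mk_array(16, Array0),
    State = state(Array0, 0),            % state(CountArray, LastUsedIndex)
    forall(member(Obj, Objects), add_one(State, Obj)),
    arg(1, State, Array),
    findall(Obj-Count, (counter(Obj, I), arg(I, Array, Count)), Pairs).

add_one(State, Obj) :-
    (   counter(Obj, I)                  % known object: reuse its index
    ->  true
    ;   arg(2, State, I0),               % new object: allocate the next index
        I is I0 + 1,
        nb_setarg(2, State, I),
        ensure_capacity(State, I),
        assertz(counter(Obj, I))
    ),
    arg(1, State, Array),
    arg(I, Array, C0),
    C is C0 + 1,
    nb_setarg(I, Array, C).              % destructive, non-backtrackable update

ensure_capacity(State, I) :-             % double the array when it is full
    arg(1, State, Array),
    functor(Array, counts, Size),
    (   I =< Size
    ->  true
    ;   Size2 is Size * 2,
        mk_array(Size2, New),
        forall(between(1, Size, J),
               ( arg(J, Array, C), nb_setarg(J, New, C) )),
        nb_setarg(1, State, New)
    ).

mk_array(Size, Array) :-                 % a compound with Size slots, all 0
    functor(Array, counts, Size),
    forall(between(1, Size, J), nb_setarg(J, Array, 0)).

so that, for example:

?- count_occurrences([a,a,b,a,c,c], Pairs).
Pairs = [a-3, b-1, c-2].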

How do you do global arrays, is this something new?

Is it not possible to have this:

Globally, like a global variable?

You could then use the nb_rbtree to update values.

Edit 19.07.2021:
Seems not to work since clauses do not point into terms.
Everything gets compiled into instructions. So what are global arrays?

?- [user].
test(foo(bar)).

?- vm_list(test/1).
========================================================================
test/1
========================================================================
       0 s_virgin
       1 i_exit
----------------------------------------
clause 1 (<clause>(0000000006B8CA40)):
----------------------------------------
       0 h_functor(foo/1)
       2 h_atom(bar)
       4 h_pop
       5 i_exitfact
true.

It’s not that I don’t like it :slight_smile: I just thought it is a well-understood solution. You even provided the clumped/2 predicate recently to make this even easier, together with the “counting word frequency” example. I thought the question was “what if you can’t do it the normal way”…

I know that memory is cheap etc., but sometimes there is just too much data, and we are moving more and more to an “Infrastructure as a service” mode of operation where the costs slowly start to add up.

clumped/2 is not the same as a histogram. clumped/2 gives:

?- clumped([a,a,b,a,a,a,a,c,c,c], R).
R = [a-2, b-1, a-4, c-3].

histogram needs to give:

?- histogram([a,a,b,a,a,a,a,c,c,c], R).
R = [a-6, b-1, c-3].

You can implement histogram as follows:

histogram(L, R) :-
    findall(X-C, aggregate(count, member(X, L), C), R).

See also (it seems Wikipedia wants me to call it a bar chart, since the x-axis will be words and not numbers):

https://en.wikipedia.org/wiki/Histogram

Edit 20.07.2021:
But is there a relationship with dicts? Just for fun I did the following, a dictall/4, which is the findall/3 equivalent, but instead of creating a list it creates a dict:

dictall(T, P, G, D) :-
     findall(P, G, L),
     dict_pairs(D, T, L).

We could implement histogram as follows:

histogram(L, R) :-
    dictall(histogram, X-C, aggregate(count, member(X, L), C), R).

The result is then:

?- histogram([a,a,b,a,a,a,a,c,c,c], D).
D = histogram{a:6, b:1, c:3}.

If you look at the example under clumped/2 you will see the thing I was referring to:

msort(Words, Sorted), clumped(Sorted, Counts)

But this is of course unnecessary if you already have your data in the Prolog database. Your aggregate is perfectly good, and before that was available, I have memories of doing something like this:

bagof(true, file_word(foo, X), Xs), length(Xs, Freq)

but of course using aggregate(count, ...) is better in every way. (EDIT: I can’t now remember which came first, library(aggregate) or library(solution_sequences), but I seem to remember that both were not available when I started using SWI-Prolog; could be a false memory though.)

Just to make it clear, I am looking at this from a very specific angle, namely, how to write Prolog programs that have somewhat predictable memory use regardless of the size of the input data. I now know of at least one more user, @swi , who has run into related issues.

That’s a non-functional requirement. It means it’s an issue of “how” you implement something and not “what” you implement. Try this if you are interested in memory:

The msort/2 solution with clumped/2, and the current aggregate/3, which is based on bagof/3 and in turn on keysort/2, all use way too much space. Basically they use n cons cells (the list cells [_|_] in msort or keysort) to arrive at a count of n, whereas the nb_rbtrees solution would only use a single integer for a count of n. For example, a word that occurs a million times costs a million list cells with the sorting approach, but only one integer counter in the tree.

Well, you already tried. You showed use of the nb_rbtrees API, but can it be moved into aggregate/3? You wouldn’t need to do it if all your counts are 1. But if counts are on average 100 or so, then you save roughly a factor of 100 in cons cells (the list cells [_|_] in msort or keysort).

You would trade the many list cells for the fewer tree “cells”, i.e. the nodes that make up the red-black tree. But it need not be a red-black tree; a hash table could also work.
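
Just to sketch what moving this into an aggregate-like predicate could look like (count_by_key/3 is a made-up name, reusing nbdict_apply/4 from above; library(rbtrees) provides rb_empty/1 and rb_in/3, library(nb_rbtrees) the nb_rb_* predicates):

:- use_module(library(rbtrees)).
:- use_module(library(nb_rbtrees)).

% count the solutions of Goal grouped by Key, one integer counter per key
count_by_key(Key, Goal, Pairs) :-
    rb_empty(Tree),
    forall(Goal, nbdict_apply(Tree, Key, succ, 1)),
    findall(K-N, rb_in(K, N, Tree), Pairs).

With the file_word/2 facts from above, count_by_key(W, file_word(foo, W), Pairs) should give Pairs = [bar-2, baz-1], while only ever holding one counter per distinct word.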