How-to line_count

vincent · April 9, 2020, 9:42pm

I’m using: SWI-Prolog version 8.0.2

I want the code to: count the lines in any file

I am trying line_count/2 and it shows 1 no mater what file I give it. I also read that it doesn’t work properly. So I am trying to implement one but have difficulties with the counting the occurrence of the ‘newline’. I want in between get_char(’\n’) and get_char(‘end_of_file’) a counting procedure that counts the ‘newline’-s. I need help with what should I put in the place of 'increase Lines by 1.

Thanks!

My code looks like this:

% your code here
line_co(Lines) :-
       repeat,
       get_char('\n'),
       increase Lines by 1,
       get_char('end_of_file').

Boris · April 10, 2020, 5:53am

It is too much effort to do this with get_char and repeat. At the very least, look at the code sample at the bottom of the docs for repeat:

  repeat,
    read(Term),
    (  Term == end_of_file
    -> !
    ;  process(Term),
       fail
    ).

You’d also need to use a mutable variable for the counting. You can see examples of mutable variables in the implementation of aggregate_all.

Putting those together I get:

line_count(Filename, N) :-
    setup_call_cleanup(open(Filename, read, In),
                       stream_line_count(In, N),
                       close(In)).

stream_line_count(In, N) :-
    Count = count(0),
    repeat,
        get_char(In, X),
        (   X == end_of_file
        ->  !
        ;   (   X == '\n'
            ->  arg(1, Count, N0),
                N1 is N0 + 1,
                nb_setarg(1, Count, N1)
            ),
            fail
        ),
    arg(1, Count, N).

and then:

?- line_count("line_count.pl", N).
N = 19.

One easier way is using read_string/5 and using a loop instead of repeat:

line_count(Filename, N) :-
    setup_call_cleanup(open(Filename, read, In),
                       stream_line_count(In, 0, N), % note the extra argument!
                       close(In)).

stream_line_count(In, N0, N) :-
    read_string(In, "\n", "", Sep, _),
    Sep \== -1,
    !,
    N1 is N0 + 1,
    stream_line_count(In, N1, N).
stream_line_count(_, N, N).

Instead of read_string/5 you could have used one of the predicates in library(readutil). Or maybe you can use a DCG:

:- use_module(library(dcg/basics)).

line_count(N0, N) -->
    string_without("\n", _), "\n",
    !,
    { N1 is N0 + 1 },
    line_count(N1, N).
line_count(N, N) --> [].

and then:

?- phrase_from_file(line_count(0, N), "line_count.pl").
N = 8.

There are endless possibilities (do you have to do it in constant memory? or can you just read the whole file with read_string/3 and count the newlines in this? and so on).

Boris · April 10, 2020, 7:55am

Now I also have a question for everyone:

if I wanted to count the newlines in a stream (or any single character, really), and I decided to use aggregate_all(count, stream_newline(In), N). How would I define stream_newline/1?

This is one attempt that seems to work:

stream_newline(In) :-
    repeat,
        get_char(In, C),
        (   C == end_of_file
        ->  !, fail
        ;   C == '\n'
        ;   fail
        ).

Is there an easier way to achieve the same?

jan · April 10, 2020, 4:02pm

Just for Easter fun …

file_nl_count(File, Count) :-
    setup_call_cleanup(
        open(File, read, In),
        stream_nl_count(In, Count),
        close(In)).

stream_nl_count(In, Count) :-
    stream_to_lazy_list(In, List),
    aggregate_all(count, member(0'\n, List), Count).

It is even pretty fast. Takes 16ms for a file holding 186713 chars and 7331 newlines. Your version uses 31ms and Unix wc does the job in 2ms (3ms for this file and 1ms for a really short file).

We can also do the counting as below. Performs equal to the above.

count_nl(List, Count) :-
    count_nl(List, 0, Count).

count_nl([], Count, Count) :-
    !.
count_nl([H|T], Count0, Count) :-
    (   H == 0'\n
    ->  Count1 is Count0+1,
        count_nl(T, Count1, Count)
    ;   count_nl(T, Count0, Count)
    ).

Boris · April 10, 2020, 7:33pm

@jan this is brilliant!

So what this tells me is the lazy list is the real magic here: it makes a stream look exactly like a list and it is more efficient!

jan · April 11, 2020, 8:08am

Roughly it gets its performance from using block reading rather than character-by-character, avoiding all the argument validation and locking for each character. It is of course also nice that you can use normal list predicates, but be aware that not all list predicates always work. That is the nasty part: it is not very easy to describe where the limits are and you must understand what happens to see where things can go wrong.

Boris · April 17, 2020, 7:34am

For my own piece of mind, I did run different version of the “line counting” program and for the record:

Using string/5 with a newline as the separator beats everything else (like, about 4-5 times faster than using a lazy list). As your experiment showed, using aggregate_all is roughly the same as writing a classical tail-recursive loop.

I can paste my full code and results if there is interest.

Still, I cannot thank you enough for bringing stream_to_lazy_list/2 to my attention!

EricGT · April 17, 2020, 10:09am

Please do.

Topic		Replies	Views
Counting actual prolog lines loaded Help!	5	692	July 20, 2021
Line_count/2 and phrase_from_file Help!	3	796	May 28, 2019
Lines of code in modules -- without comments and with conditional compile applied Help!	4	366	March 17, 2021
I want to read a stream that contains newlines to end-of-file character Help! how-to	16	2030	September 7, 2020
Read lines of STDIN until EOF into a list Help!	1	655	April 28, 2023

How-to line_count

Related topics