How-to line_count

I’m using: SWI-Prolog version 8.0.2

I want the code to: count the lines in any file

I am trying line_count/2 and it shows 1 no mater what file I give it. I also read that it doesn’t work properly. So I am trying to implement one but have difficulties with the counting the occurrence of the ‘newline’. I want in between get_char(’\n’) and get_char(‘end_of_file’) a counting procedure that counts the ‘newline’-s. I need help with what should I put in the place of 'increase Lines by 1.

Thanks!

My code looks like this:

% your code here
line_co(Lines) :-
       repeat,
       get_char('\n'),
       increase Lines by 1,
       get_char('end_of_file').

It is too much effort to do this with get_char and repeat. At the very least, look at the code sample at the bottom of the docs for repeat:

  repeat,
    read(Term),
    (  Term == end_of_file
    -> !
    ;  process(Term),
       fail
    ).

You’d also need to use a mutable variable for the counting. You can see examples of mutable variables in the implementation of aggregate_all.

Putting those together I get:

line_count(Filename, N) :-
    setup_call_cleanup(open(Filename, read, In),
                       stream_line_count(In, N),
                       close(In)).

stream_line_count(In, N) :-
    Count = count(0),
    repeat,
        get_char(In, X),
        (   X == end_of_file
        ->  !
        ;   (   X == '\n'
            ->  arg(1, Count, N0),
                N1 is N0 + 1,
                nb_setarg(1, Count, N1)
            ),
            fail
        ),
    arg(1, Count, N).

and then:

?- line_count("line_count.pl", N).
N = 19.

One easier way is using read_string/5 and using a loop instead of repeat:

line_count(Filename, N) :-
    setup_call_cleanup(open(Filename, read, In),
                       stream_line_count(In, 0, N), % note the extra argument!
                       close(In)).

stream_line_count(In, N0, N) :-
    read_string(In, "\n", "", Sep, _),
    Sep \== -1,
    !,
    N1 is N0 + 1,
    stream_line_count(In, N1, N).
stream_line_count(_, N, N).

Instead of read_string/5 you could have used one of the predicates in library(readutil). Or maybe you can use a DCG:

:- use_module(library(dcg/basics)).

line_count(N0, N) -->
    string_without("\n", _), "\n",
    !,
    { N1 is N0 + 1 },
    line_count(N1, N).
line_count(N, N) --> [].

and then:

?- phrase_from_file(line_count(0, N), "line_count.pl").
N = 8.

There are endless possibilities (do you have to do it in constant memory? or can you just read the whole file with read_string/3 and count the newlines in this? and so on).

1 Like

Now I also have a question for everyone:

if I wanted to count the newlines in a stream (or any single character, really), and I decided to use aggregate_all(count, stream_newline(In), N). How would I define stream_newline/1?

This is one attempt that seems to work:

stream_newline(In) :-
    repeat,
        get_char(In, C),
        (   C == end_of_file
        ->  !, fail
        ;   C == '\n'
        ;   fail
        ).

Is there an easier way to achieve the same?

1 Like

Just for Easter fun …

file_nl_count(File, Count) :-
    setup_call_cleanup(
        open(File, read, In),
        stream_nl_count(In, Count),
        close(In)).

stream_nl_count(In, Count) :-
    stream_to_lazy_list(In, List),
    aggregate_all(count, member(0'\n, List), Count).

It is even pretty fast. Takes 16ms for a file holding 186713 chars and 7331 newlines. Your version uses 31ms and Unix wc does the job in 2ms (3ms for this file and 1ms for a really short file).

We can also do the counting as below. Performs equal to the above.

count_nl(List, Count) :-
    count_nl(List, 0, Count).

count_nl([], Count, Count) :-
    !.
count_nl([H|T], Count0, Count) :-
    (   H == 0'\n
    ->  Count1 is Count0+1,
        count_nl(T, Count1, Count)
    ;   count_nl(T, Count0, Count)
    ).
6 Likes

@jan this is brilliant!

So what this tells me is the lazy list is the real magic here: it makes a stream look exactly like a list and it is more efficient!

2 Likes

Roughly it gets its performance from using block reading rather than character-by-character, avoiding all the argument validation and locking for each character. It is of course also nice that you can use normal list predicates, but be aware that not all list predicates always work. That is the nasty part: it is not very easy to describe where the limits are and you must understand what happens to see where things can go wrong.

2 Likes

For my own piece of mind, I did run different version of the “line counting” program and for the record:

Using string/5 with a newline as the separator beats everything else (like, about 4-5 times faster than using a lazy list). As your experiment showed, using aggregate_all is roughly the same as writing a classical tail-recursive loop.

I can paste my full code and results if there is interest.

Still, I cannot thank you enough for bringing stream_to_lazy_list/2 to my attention!

Please do.