A couple of questions about low-level reading

The assumption here is that I need more low-level stuff than what I can easily achieve with phrase_from_file/3 and phrase_from_stream/2.

From the docs I gather that fill_buffer/1 and read_pending_codes/3 together allow reading in “chunks” the size of the buffer. I am guessing that size is decided by the OS or the C library implementation, or maybe even the SWI-Prolog implementation?

I was mildly confused by the docs of read_pending_codes/3 and the example there: the docs talk about using at_end_of_stream/1, but the code does not use it. Starting from the example code, I wrote:

read_chunks(File) :-
    setup_call_cleanup(open(File, read, In, [encoding(octet)]),
        read_with_pending(In),
        close(In)).

read_with_pending(In) :-
    repeat,
        fill_buffer(In),
        read_pending_codes(In, Codes, Tail),
        \+ \+ ( Tail = [],
                length(Codes, N),
                format(user_error, "read ~d codes chunk~n", [N])
              ),
        (   Tail == []
        ->  !
        ;   fail
        ).

And then:

?- read_chunks('test.txt').
read 4096 codes chunk
% repeats many times
read 4096 codes chunk
read 2416 codes chunk
read 0 codes chunk
true.

There is now an obvious extra read at the end. I can get rid of it by moving the test for Tail == [] before the printing:

read_with_pending(In) :-
    repeat,
        fill_buffer(In),
        read_pending_codes(In, Codes, Tail),
        (   Tail == []
        ->  !
        ;   \+ \+ ( Tail = [],
                    length(Codes, N),
                    format(user_error, "read ~d codes chunk~n", [N])
                  ),
            fail
        ).

Now I get:

?- read_chunks('test.txt').
read 4096 codes chunk
% repeats many times
read 4096 codes chunk
read 2416 codes chunk
true.

First real question: is there a good reason to do it exactly as in the example? For example, can it be that Tail == [] but either:

  • there is still something in Codes; or
  • we are not really yet at_end_of_stream(In)?

And the second question, is the choice of how to test just a matter of opinion or is there a difference between:

  • at_end_of_stream(In); and
  • Codes == []; and
  • Tail == []?

Finally, what is the use case for subsequent calls to fill_buffer/1? As a programmer, how can I tell if the buffer is now full? This is what the docs say:

Fill the Stream’s input buffer. Subsequent calls try to read more input until the buffer is completely filled.

The buffer size is decided at the Prolog level. It can be changed using the set_stream/2 option buffer_size(Bytes). Note that fill_buffer/1 does not guarantee that the buffer is full. It just calls read() with a size that reflects the remaining space in the buffer and adds the returned bytes to the buffer. Whether or not the buffer is full after that depends on the OS and the underlying stream.
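As a rough illustration of the buffer_size(Bytes) option, the sketch below sets the buffer size before the first fill_buffer/1 call and reports how many codes the first chunk holds. The name first_chunk_size/3 is made up for this example, and it assumes an octet-encoded file so that one byte is one code.

```prolog
%!  first_chunk_size(+File, +BufSize, -N) is det.
%
%   Hypothetical helper, not part of SWI-Prolog: N is the number of
%   codes made available by the first fill_buffer/1 call after the
%   stream buffer has been set to BufSize bytes.
first_chunk_size(File, BufSize, N) :-
    setup_call_cleanup(
        open(File, read, In, [encoding(octet)]),
        (   set_stream(In, buffer_size(BufSize)),
            fill_buffer(In),
            % Passing [] as the tail closes the list immediately.
            read_pending_codes(In, Codes, []),
            length(Codes, N)
        ),
        close(In)).
```

N can be smaller than BufSize, since fill_buffer/1 only adds whatever the underlying read() returned.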

The docs claim both Codes and Tail are [] if and only if we got end-of-file.

I don’t think there is a useful scenario. At least, I do not see it. If the buffer is full, it will perform a read() call with count = 0, which may cause incorrect detection of end-of-file. You may be interested in peek_string/3, which allows pre-fetching data from a stream without reading it. This is intended to examine the content of a stream prior to deciding on how to process it.
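A minimal sketch of how peek_string/3 might be used to look at the start of a stream before deciding how to process it. The predicate name stream_starts_with/2 is invented for this example; only peek_string/3, string_length/2 and string_concat/3 are existing built-ins.

```prolog
%!  stream_starts_with(+In, +Prefix) is semidet.
%
%   Hypothetical helper: true if the pending input on In begins with
%   Prefix. The input is only peeked at, so it is still there for
%   whatever reads from In afterwards.
stream_starts_with(In, Prefix) :-
    string_length(Prefix, Len),
    peek_string(In, Len, Head),
    % Fails if Head is shorter than Prefix or differs from it.
    string_concat(Prefix, _, Head).
```

For instance, a caller could check stream_starts_with(In, "%PDF-") and then hand the untouched stream to the appropriate reader.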

The primary intent of these predicates is to process data as it arrives on a stream in chunks. Using character-level I/O poses much more overhead than using larger chunks. This makes, e.g., phrase_from_stream/2 a lot faster.
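To make the chunk-wise style concrete, here is a sketch that folds a goal over the chunks of a stream instead of reading one character at a time. The names fold_chunks/4, count_newlines/3 and file_newlines/2 are made up for this example, and the loop assumes that Tail is bound to [] only at end-of-file, as discussed above.

```prolog
:- use_module(library(aggregate)).
:- use_module(library(lists)).

:- meta_predicate fold_chunks(3, +, +, -).

%!  fold_chunks(:Goal, +In, +Acc0, -Acc) is det.
%
%   Hypothetical helper: call Goal(Codes, A0, A1) once for every
%   non-empty chunk of pending input on In.
fold_chunks(Goal, In, Acc0, Acc) :-
    fill_buffer(In),
    read_pending_codes(In, Codes, Tail),
    (   Tail == []                  % assumed to mean end-of-file
    ->  Acc = Acc0
    ;   Tail = [],                  % close the partial list
        call(Goal, Codes, Acc0, Acc1),
        fold_chunks(Goal, In, Acc1, Acc)
    ).

% Add the number of newline codes in one chunk to the running total.
count_newlines(Codes, N0, N) :-
    aggregate_all(count, member(0'\n, Codes), C),
    N is N0 + C.

% Count the newlines in File, one chunk at a time.
file_newlines(File, N) :-
    setup_call_cleanup(
        open(File, read, In, [encoding(octet)]),
        fold_chunks(count_newlines, In, 0, N),
        close(In)).
```

Each call to the goal sees a whole chunk as a proper list, so the per-character cost is a list traversal rather than a stream operation.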
