Simple example of phrase_from_file/2?

I’m using: SWI-Prolog version 8.3.20

I have a file with the following line:

% Target:           ability/2

Suppose I’d like to parse it with the following grammar:

:- use_module(library(dcg/basics)).

target(T) --> comment, blank, `Target:`, blanks, predicate_symbol(T).

comment --> `%`.

predicate_symbol(F/A) --> string(F), `/`, integer(A).

phrase/2 succeeds:

?- phrase(target(_Cs/A), `% Target:           ability/2`), atom_codes(T, _Cs).
A = 2,
T = ability .

phrase_from_file/2 fails:

?- phrase_from_file(target(T), 'my_file.log').
false.

How can I make phrase_from_file/2 succeed?

Note that the file_contains/2 example in the documentation only shows how to count occurrences of a pattern (via match_count/3), but I want to bind variables in the output. Is this possible with phrase_from_file/2?

Just a guess: the content of the file may end with a newline? Your grammar doesn’t deal with that.

Of course. It is just a grammar and can do anything a grammar can do.

P.S. It is probably better to use "Target:" rather than `Target:` in the grammar. You only need `...` for the input; the DCG compiler handles double-quoted strings in grammar bodies as literals.
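To illustrate the quoting advice, here is a minimal sketch of the grammar from the question with double-quoted literals in the rule bodies. It assumes SWI-Prolog 7 or later with the default flags; only the input handed to phrase/2 still needs backquotes.

```prolog
:- use_module(library(dcg/basics)).

% Same grammar as in the question, but with double-quoted literals
% inside the grammar bodies.
target(T) --> comment, blank, "Target:", blanks, predicate_symbol(T).

comment --> "%".

predicate_symbol(F/A) --> string(F), "/", integer(A).

% The input is still a list of codes, hence the backquotes:
% ?- phrase(target(Cs/A), `% Target:           ability/2`), atom_codes(T, Cs).
```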


Thanks - but, no, the file doesn’t end with a newline.

A bit confused with all the different quotation schemes but I’ll manage.

Next step is tracing, I guess :frowning:

Next step was to roll my own log parser with read_line_from_file/2 (that’s what my program was trying to do, read a log) :slight_smile:

But I’d still like to find out how phrase_from_file/2 works. Is there a way to ask the tracer nicely to render lists of codes as lists of chars?

No, but using portray_text/1 you can get them printed as strings.
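A minimal sketch of switching this on, assuming the default print and debugger write options (which have portray enabled). The exact rendering of the text differs between SWI-Prolog versions, so no output is shown here.

```prolog
:- use_module(library(portray_text)).

% Ask print/1, the toplevel and the tracer to show lists of character
% codes as text instead of lists of integers.
:- portray_text(true).

% Try, for example:
% ?- print(`hello world`).
```

Calling portray_text(false) restores the default behaviour.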


That’s what I was looking for, thanks.

For others who might find this topic and want to know more ways to display info: when format, ansi_format, print_term and print_message don’t seem right, try portray.


Can you post my_file.log? My guess is that it contains more than one line, and your grammar only processes one string.

I added log to the list of authorized file extensions for uploading.


Unfortunately I don’t have it anymore. It was an excerpt from a longer log with about 64 lines, but I don’t remember how many lines I copied into the excerpt. I think you’re right that it was more than one line.

But I’m confused: what do you mean my grammar only processes one string? My understanding of phrase_from_file/2 is that it will nondeterministically process each string in a file and succeed for each line that contains a string that is accepted by the grammar rule. Is that not the case?

Edit: The above is my understanding after copying the file_contains/2 example from the documentation into my source and running it against my full, 64-line log, whereupon it returned a count of 1, if I remember correctly.

I was curious, so I tried again. As you say, my mistake was expecting phrase_from_file/2 to go through the entire file nondeterministically, treating each new line as a new input to the grammar.

Instead, it looks like phrase_from_file/2 treats the entire file as input to the grammar, i.e. from the first character of the first line on to the end-of-file marker. If the grammar does not accept the first line, phrase_from_file/2 fails deterministically (i.e. it doesn’t backtrack to get another line). [Edit: it didn’t help that I was writing a log parser, where it’s natural to want to ignore much of the file’s content and only be interested in some particular bit of information.]
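To make the difference concrete, here is a minimal sketch of the two behaviours. Both helpers are hypothetical names for illustration, not library predicates: phrase_from_whole_file/2 is an eager stand-in for what phrase_from_file/2 does (minus the lazy reading), and phrase_from_each_line/2 is the line-by-line behaviour described above as the mistaken expectation. The target//1 grammar is the one from the question.

```prolog
:- use_module(library(dcg/basics)).
:- use_module(library(readutil)).
:- use_module(library(lists)).

:- meta_predicate
    phrase_from_whole_file(//, +),
    phrase_from_each_line(//, +).

% Grammar from the question: accepts exactly one "% Target: Name/Arity" line.
target(T) --> comment, blank, `Target:`, blanks, predicate_symbol(T).
comment --> `%`.
predicate_symbol(F/A) --> string(F), `/`, integer(A).

% Hypothetical helper: the grammar sees the codes of the whole file at once,
% from the first character to the end of the file.
phrase_from_whole_file(Grammar, File) :-
    read_file_to_codes(File, Codes, []),
    phrase(Grammar, Codes).

% Hypothetical helper: succeeds once, on backtracking, for every line of
% File that Grammar accepts on its own.
phrase_from_each_line(Grammar, File) :-
    read_file_to_string(File, Text, []),
    split_string(Text, "\n", "\r", Lines),
    member(Line, Lines),
    string_codes(Line, Codes),
    phrase(Grammar, Codes).
```

With a file whose first line is not a target line, a query such as phrase_from_whole_file(target(T), File) fails, while phrase_from_each_line(target(T), File) still finds the target on a later line.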

I was confused by the aggregate_all/3 call in the example program file_contains/2 given in the documentation:

:- use_module(library(dcg/basics)).

file_contains(File, Pattern) :-
        phrase_from_file(match(Pattern), File).

match(Pattern) -->
        string(_),
        string(Pattern),
        remainder(_).

match_count(File, Pattern, Count) :-
        aggregate_all(count, file_contains(File, Pattern), Count).

I didn’t realise that the string(_) call before string(Pattern) is meant to match the entire file up to the sought-for pattern, and thought it was supposed to match at the start of a line (and so ignore the input up to that pattern). My example grammar matches an entire log line so it fails when there are lines before and after it.
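The same skip-ahead idea can also bind variables rather than just confirm a match. The following is a minimal sketch, assuming a log in the format from the question; target_anywhere//1 is a hypothetical rule name, not something from the documentation.

```prolog
:- use_module(library(dcg/basics)).

% Hypothetical rule: skip ahead to a "% Target:" line anywhere in the input
% and bind its Name/Arity.
target_anywhere(Name/Arity) -->
    string(_),                        % everything before the target line
    "% Target:", whites,
    string_without(`/\n`, Cs), "/", integer(Arity),
    remainder(_),                     % everything after it
    { atom_codes(Name, Cs) }.

% ?- phrase_from_file(target_anywhere(T), 'my_file.log').
```

Because string(_) is nondeterministic, backtracking into the query yields each further target line in the file, so findall/3 can collect them all.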

To be honest, I find the documentation for phrase_from_file/2 very confusing and it’s not the first time I’ve grappled with it.

I guess I have to propose a better example now?

For the time being I’m uploading a very simple example grammar and two files, one with the target line as the first line, named “works.log” and one with a different first line named “not_works.log”.

Hopefully that could help clear up the confusion to someone else trying to understand how phrase_from_file/2 works.

Source file with very simple grammar:
target.pl (422 Bytes)

Text file with the target string as first line:
works.log (40 Bytes)

Text file with a different first line:
not_works.log (53 Bytes)

Example queries:

?- findall_targets(Ts,'works.log').
Ts = [move/2].

?- findall_targets(Ts,'not_works.log').
Ts = [].

I should also comment that the phrase_from_file/2 example is confusing because it doesn’t show (what I think is) the most natural application for such a predicate, i.e. to extract bits of text containing interesting information from a file. Instead, it shows how to count lines that contain a pattern, which is a bit of an edge case in general, I find. The first time I saw this bit of documentation I formed the impression that, perhaps, the point of the predicate was to confirm that a pattern exists in a file, rather than analyse a pattern as one would do with an ordinary DCG (hence my earlier question to Jan, whether phrase_from_file/2 can bind variables in the output).

There is no problem with phrase_from_file/2. What is happening is that your grammar does not reflect the structure of the file properly. The following will work:

:- module(target, [findall_targets/2]).

:- use_module(library(dcg/basics)).

findall_targets(F, Ts):-
    phrase_from_file(lines(Entries), F),
    exclude(is_comment,Entries,Ts).

is_comment(comment(_)).

% Convert log to a list like this:
% [ comment(SomeComment), P/A, ... ]
lines([Entry|Entries]) -->
   line(Entry), !,
   lines(Entries).
lines([]) --> [].

% Line is either a comment or a comment with a target.
line(Entry) -->
   comment_with_target(Entry), !
   | comment(Entry).

comment_with_target(T) -->
   `%`, blank, `Target:`, blanks, predicate_symbol(T), blanks_to_nl.

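% A comment runs to the end of the line; the last line of the file
% may lack a trailing newline, so end-of-input is accepted as well.
comment(comment(C)) -->
   `%`, string_without("\n",C), ( `\n` | eos ).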

predicate_symbol(F/A) --> string(F), `/`, integer(A).

Query:

29 ?- findall_targets('works.log',Ts).
Ts = [`move`/2].

30 ?- findall_targets('not_works.log',Ts).
Ts = [`move`/2].

Remember the grammar (whether you use phrase/2 or phrase_from_file/2) must describe all possible contents, not just the bits you are interested in. The grammar is not a regex to extract the bits you want, but a description of all possible file contents.

EDIT: This is a more advanced topic, but after you are comfortable using grammars, and want a way to process the file using the grammar but without necessarily building a list, you can use the new library(intercept).