Phrase_from_file vs phrase

Boris · September 13, 2020, 6:43pm

As a matter of fact, you can use string literals in DCG rules. With vanilla SWI-Prolog (version 8.3.7, but this has worked for years now):

?- phrase(`foo`, X).
X = [102, 111, 111].

?- phrase("foo", X).
X = [102, 111, 111].

It correctly doesn’t work the other way:

?- [user].
|: x --> "foo".
|: ^D% user://1 compiled 0.02 sec, 1 clauses
true.

?- phrase(x, `foo`).
true.

?- phrase(x, "foo").
ERROR: Type error: `list' expected, found `"foo"' (a string)
...
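If the input arrives as a string, one way around the type error is to convert it to a code list before calling phrase/2. A minimal sketch, assuming the same x//0 rule as above; parse_string/1 is a made-up helper name, not something from the manual:

```prolog
% Same grammar as in the session above.
x --> "foo".

% parse_string(+String): convert the string to a list of codes
% first, because phrase/2 insists on a list as its second argument.
parse_string(String) :-
    string_codes(String, Codes),
    phrase(x, Codes).
```

With this loaded, `?- parse_string("foo").` should succeed where `?- phrase(x, "foo").` raises the type error.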

EDIT: somewhere towards the bottom of the manual page on strings:

We observe that in many programs, most strings are only handled as a single unit during their lifetime. Examining real code tells us that double quoted strings typically appear in one of the following roles:

A DCG literal
Although represented as a list of codes is the correct representation for handling in DCGs, the DCG translator can recognise the literal and convert it to the proper representation. Such code need not be modified.

jan · September 14, 2020, 7:11am

This

 `hello world|Tail`

is not valid syntax for read/1. It is just used by library(portray_text) to indicate that a list of code points ends in the variable Tail. We could of course add it to the syntax, though using a backslash for disambiguation.

`hello world\|Tail`

I think this makes sense but I think it is not a big enough improvement to justify the incompatibility and difficulties handling this correctly in (IDE) tools. We have partial evaluation of phrase("hello world", List, Tail) as an acceptable and portable alternative.
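To make the phrase/3 alternative concrete, here is a small sketch. The predicate name hello_prefix/2 is only for illustration; the point is that the literal ends up as an open list whose tail is Tail:

```prolog
% hello_prefix(-List, ?Tail): List is the codes of "hello world"
% followed by Tail, i.e. the list that library(portray_text)
% prints as `hello world|Tail`.
hello_prefix(List, Tail) :-
    phrase("hello world", List, Tail).
```

For example, `?- hello_prefix(L, T), T = [0'!], atom_codes(A, L).` should bind A to 'hello world!'.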

EricGT · September 14, 2020, 9:50am

I did not get to doing due diligence with

`hello world\|Tail`

and did think it odd that the operator | would be recognized as an operator and not text in a form of quoted text.

I agree that it should not be added.

anniepoo · September 14, 2020, 10:26am

See

phrase_from_file loads lazily. It has an attributed variable; when you try to unify it, it fires off a goal that loads more text. This kinda-sorta works.

@jan - would it make sense to have a non-lazy phrase_from_file that supported the current line/char reporting? I usually am using phrase_from_file not because it’s lazy but because it supports line tracking, which is pretty much always needed - you will eventually have to deal with invalid files, and ‘not valid’ is rarely the right error message.
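For readers who have not used the line tracking, a rough sketch of what it looks like in practice. This assumes a file with one integer per line; numbers//1 and load_numbers/2 are invented names, while phrase_from_file/2 and syntax_error//1 come from library(pure_input) and integer//1 and eos//0 from library(dcg/basics):

```prolog
:- use_module(library(pure_input)).
:- use_module(library(dcg/basics)).

% numbers(-Ns): one integer per line until the end of input.
numbers([N|Ns]) -->
    integer(N), "\n", !,
    numbers(Ns).
numbers([]) -->
    eos, !.
numbers(_) -->
    % Anything else: throw a syntax error that carries the
    % current position in the lazy list.
    syntax_error(integer_expected).

% load_numbers(+File, -Ns): parse File lazily.
load_numbers(File, Ns) :-
    phrase_from_file(numbers(Ns), File).
```

On a malformed file this should raise a syntax error whose context term includes the file name and position, rather than just failing.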

remy.s · September 14, 2020, 10:47am

remy.s:

is there a way to make phrase_from_file use chars instead of codes?

By the way, I looked up the code of phrase_from_file/3 in module pure_input.
As I understand the code, the actual reading of the file is done in a clause attr_unify_hook_ndebug/2 attached as a handler to an attributed var.
The relevant code there is:
attr_unify_hook_ndebug(State, Value) :-
   State = lazy_input(Stream, _PrevPos, Pos, Read),
   (   var(Read)
   ->  fill_buffer(Stream),
       read_pending_codes(Stream, NewList, Tail),
       (   Tail == []
       ->  nb_setarg(4, State, []),
           Value = []
       ;   stream_to_lazy_list(Stream, Pos, Tail),
           nb_linkarg(4, State, NewList),
           Value = NewList
       )
   ;   Value = Read
   ).
Besides read_pending_codes/3, which is used in this clause, SWI-Prolog also has read_pending_chars/3.
So changing the behaviour of phrase_from_file, should one decide to do so, would mean passing an additional option from phrase_from_file/3 down the line to the attribute hook.
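Until something like that exists, a non-lazy workaround is easy to write by hand. The following is only a sketch: phrase_from_file_chars/2 is a name I made up, it reads the whole file at once, and it gives up the position reporting of library(pure_input):

```prolog
:- use_module(library(readutil)).

:- meta_predicate phrase_from_file_chars(//, +).

% phrase_from_file_chars(:Grammar, +File): read File completely,
% turn the text into a list of chars and run Grammar over it.
phrase_from_file_chars(Grammar, File) :-
    read_file_to_string(File, String, []),
    string_chars(String, Chars),
    phrase(Grammar, Chars).

% Example grammar written against chars instead of codes.
foo --> [f, o, o].
```

A call such as `?- phrase_from_file_chars(foo, 'foo.txt').` would then succeed for a file that contains exactly the text foo.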

I did some more research on that topic and found an excellent tutorial on DCGs from Markus Triska.
In the section about reading from files, he more or less recommends setting double_quotes to chars, which will not work with SWI, as we learned in this thread.
In the same section, he references a “pure io” library from Ulrich Neumerkel (http://www.complang.tuwien.ac.at/ulrich/Prolog-inedit/sicstus/pio.pl).
Interestingly Neumerkel’s library “auto-adjusts” to codes vs chars:

phrase_from_file(NT__0, File) :-
   current_prolog_flag(double_quotes, Value),
   (  Value == chars -> R_3 = get_pending_chars % recommended
   ;  Value == codes -> R_3 = get_pending_codes % suboptimal
   ;  must_be(Value, oneof([chars,codes]), phrase_from_file(NT__0, File), 0)
   ),
   phrase_of_from_file(NT__0, R_3, File).

Later on, R_3 is used as a pointer to the actual reading routine:

reader_step(R_3, Stream, Pos, Xs0) :-
   set_stream_position(Stream, Pos),
   (  at_end_of_stream(Stream)
   -> Xs0 = []
   ;  % phrase(call(call(R_3,Stream)), Xs0,Xs), % conforming call
      call(R_3, Stream, Xs0,Xs), % effective call
      reader_to_lazy_list(R_3, Stream, Xs)
   ).

remy.s · September 14, 2020, 10:51am

anniepoo:

See

The lazy parsing predicate phrase_from_file/2,3 is useful, especially since it provides nice feedback for file position when an error is encountered. It does, however, place certain restrictions on the DCG - particularly in handling eof. I asked Jan to outline those restrictions, he suggested I do so on discourse so everyone could benefit - So this is me asking Jan (or anybody else who can help) what steps one needs to take to make one’s DCG happy with phrase_from_file.

phrase_from_file loads lazily. It has an attributed variable; when you try to unify it, it fires off a goal that loads more text. This kinda-sorta works.

@jan - would it make sense to have a non-lazy phrase_from_file that supported the current line/char reporting? I usually am using phrase_from_file not because it’s lazy but because it supports line tracking, which is pretty much always needed - you will eventually have to deal with invalid files, and ‘not valid’ is rarely the right error message.

Annie, what’s wrong with laziness? Actually I consider this a quality.
Shouldn’t we all be lazy, and aren’t we using Prolog because we want to be lazy?

anniepoo · September 14, 2020, 11:13am

Well, I too love laziness, but it conflicts with the goal of not causing bafflement when used with library(dcg/basics).

anniepoo · September 14, 2020, 11:14am

However, if your goal is to reduce programming work, then definitely: SWI-Prolog.

Boris · September 14, 2020, 11:43am

I was about to bring this up; I thought it isn’t relevant, but apparently it is.

There was some ideological struggle about this at some point in time. I definitely did not understand what it was about, exactly, but the take-home message for me was that it was indeed about ideology and not technology only.

Up until about three years ago (when I stopped wasting time on Stackoverflow) there was also a clique of high-rep prolific answerers on the [prolog] tag; they would always recommend using chars and setting certain global flags and so on. Those would fix some “defects” and break other things at the same time, thus creating some confusion that still persists.

Just to give you some context.

PS: this all was somehow entangled with discussions about purity and ISO standardization and compliance. It was strange.

jan · September 14, 2020, 11:52am

Well, the only thing that works is unifying the lazy list with either [_|_] or []. This is what DCGs do as long as you do not start dirty hacking, so there is not much of a problem. Things go wrong if you write e.g.

at_eof(End,End) :- End == [].

or the application expects the 3rd argument of phrase/3 to be a list and calls, e.g., length/2 on it.
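As I read the explanation above, the difference comes down to comparing versus unifying. A small sketch; at_eof_fragile//0 is just the clause from above under a different name, and at_eof//0 is the variant that should be safe:

```prolog
% Fragile on lazy lists: ==/2 only compares and never unifies,
% so a not-yet-read tail (an attributed variable) is not woken
% up and the test fails even at the real end of the file.
at_eof_fragile(End, End) :- End == [].

% Unifying the remaining input with [] is one of the two
% operations the lazy list supports, so this triggers the read.
at_eof([], []).
```

On ordinary, fully instantiated lists both behave the same, which is why the problem only shows up once phrase_from_file is involved.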

As for a non-lazy phrase_from_file with position reporting: not that much. The way it works is a little tricky. When asked for the position, it scans the list forward to the attributed variable. There it finds the position info for the start of the current block, which enables it to compute the position for the current point in the list. This works fairly well, as the amount to scan is just one block. Doing this without the lazy stuff simply means doing the position calculation from the beginning of the list. If the list is long, that gets pretty expensive …

That makes some sense, but unlike SICStus, most of SWI-Prolog's syntax changes are module-local. This holds for operators, but also for the double_quotes flag. So, this can work, but handling the module context correctly gets pretty hairy if you want to put DCGs in modules. The module-local syntax for double_quotes is intended to allow for using (typically ugly) code that relies all over the place on "..." being a list of character codes (or chars).
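A sketch of what that module-local use looks like; the module name legacy_text and the predicate greeting/1 are made up for illustration:

```prolog
:- module(legacy_text, [greeting/1]).

% Only changes how double-quoted text is read while loading this
% module; other modules keep their own double_quotes setting.
:- set_prolog_flag(double_quotes, codes).

% Read as greeting([104,101,108,108,111]) under the flag above.
greeting("hello").
```

Code elsewhere in the application still reads "hello" as a string, which is exactly why passing such text between modules needs an agreed representation.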

One of the nasty issues with codes/chars is that your entire application has to agree if text is passed as lists between components. As most existing components are written with codes in mind, I think it is best to stick with codes and improve the environment to make working with them as pleasant as possible. With the development of SWI-7 I’ve considered adding code as a primary type. That would have solved this issue, but it became too complicated and expensive and I dropped the idea.
