Through trial and error I’ve arrived at the opinion that setting the double_quotes flag to chars is a bad idea when working with DCGs. It seems to break the functionality of the predicates defined in the dcg/basics library. Before working with DCGs I thought it was recommended (maybe by Markus Triska) to make chars the default setting. But am I correct that that’s not a good idea when using DCGs?
Yes, indeed, you need a very good reason to set double_quotes to chars in SWI-Prolog. Anecdotally, I have never needed this setting. Why exactly it has become so popular to do that is difficult to understand.
Keep in mind that while double-quoted literals are strings in SWI-Prolog, in the context of DCGs they work as intended. In other words, if you leave the flags at their default setting, those two both work:
foo --> "bar".
and
foo --> `bar`.
and they mean the same. But of course the first one is the usual way to define DCGs. This is documented in the same section, where it says:
Although represented as a list of codes is the correct representation for handling in DCGs, the DCG translator can recognise the literal and convert it to the proper representation. Such code need not be modified.
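As a small check of the equivalence described above, the following sketch defines the same rule both ways and parses a backquoted code list with each. The names foo_dq//0 and foo_bq//0 are made up for this illustration, and the default flag values of SWI-Prolog 7 or later are assumed.

```prolog
% Assumes the default flags (double_quotes=string, back_quotes=codes).
foo_dq --> "bar".   % string literal, converted by the DCG translator
foo_bq --> `bar`.   % already a list of character codes

% Both rules accept the same code list:
% ?- phrase(foo_dq, `bar`), phrase(foo_bq, `bar`).
% true.
```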
It is nice for little student exercises on the terminal. At least I think that is the main reason. There is some value of chars over codes as they are easier to read, notably in the debugger. That is why there is the portray_text/1 predicate. Still, chars would have made this easier, as recognizing a list of integers as text is a weaker heuristic than recognizing a list of one-character atoms as text.
But chars and codes do not go well together. Libraries are (often) written for either, and thus the libraries make the choice. As codes are historically used by Prolog systems in the Edinburgh/Quintus tradition, it is hard to switch. Finally, codes are probably a little faster and sometimes you can do meaningful arithmetic on them.
The default value of the double_quotes flag is string, but I have to change it to codes for the following simple DCG example to work.
:- set_prolog_flag(double_quotes, codes).
% This gathers a sequence of list elements (character codes, given the flag above) into a list.
seq([]) --> [].
seq([H|T]) --> [H], seq(T).
% This gathers a sequence of character codes into an atom.
string_seq(S) --> seq(Cs), { atom_codes(S, Cs) }.
hello(Name) --> "Hello, ", string_seq(Name), "!".
I can use this in a REPL session like this:
?- once(phrase(hello(Name), "Hello, World!")).
Name = 'World'.
This breaks if I leave double_quotes set to string.
Is there a different way I need to write these DCG rules so it works in that case?
You don’t need to change the definition, but you do need to change the call:
?- once(phrase(hello(Name), `Hello, World!`)).
Name = 'World'.
Within a DCG rule definition, SWI-Prolog will do the right thing with double-quoted literals, as discussed above. However, the second argument of phrase/2 has to be a list of codes, so you need to quote it in backticks. Other than one-line examples on the top-level, you will rarely be typing in your input by hand, so this is not a huge issue in my experience.
(Footnote: anecdotally, I only use phrase/2 for trying out manually on the top-level. In my code I usually need phrase_from_file/2 and phrase_from_stream/2.)
All this said, string//1 from library(dcg/basics) is what you’d be using in SWI-Prolog (or maybe string_without//2?). The definition of seq//1 as in your example is not necessary. The docs show the more usual place for the cut: not wrapping the call to phrase/2, but right after your delimiter within the parser. You will find the examples in the docs but I will copy them here for posterity.
:- use_module(library(dcg/basics)).
hello_string(Name) -->
    "Hello, ", string(S), "!", !, % you probably should cut here, not outside!
    { atom_codes(Name, S) }.
hello_string_without(Name) -->
    "Hello, ", string_without("!", S), "!", % you don't need the cut!
    { atom_codes(Name, S) }.
It is a very big topic where to cut, I don’t want to go into it…
@Boris Thank you so much! This was very helpful!
Note that operators such as -> work as expected in DCGs, so this could be rewritten (untested):
hello_string(Name) -->
    "Hello, ",
    (   string(S), "!"
    ->  []   % or {true}
    ;   string_without("!", S)
    ),
    { atom_codes(Name, S) }.
This was meant as an exclusive or, either use string//1, or use string_without//2.
What am I missing?
I’m just starting, so take my input with a grain of salt, but I think a good complement to mapping double quotes to codes is to use the portray_text library. The latter allows one to see a list of codes like a string of characters. One can influence for which codes this conversion behaviour occurs (default is fine), what the minimum of the list length needs to be (I set it to 0 to also see a single code as a character), and what the maximum number of codes in a list is before the display is truncated with an ellipsis.
I find this useful when experimenting with DCG code and during debugging.
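A minimal sketch of such a setup, assuming the min_length and ellipsis keys of set_portray_text/2 from library(portray_text). The helper name enable_text_portrayal/0 and the values 1 and 40 are arbitrary choices for this illustration, not settings recommended in this thread.

```prolog
:- use_module(library(portray_text)).

% enable_text_portrayal: hypothetical helper that switches on printing
% of code lists as text and adjusts the two thresholds.
enable_text_portrayal :-
    portray_text(true),                % print code lists as text
    set_portray_text(min_length, 1),   % assumed value: also show very short lists
    set_portray_text(ellipsis, 40).    % assumed value: truncate longer lists
```

Calling enable_text_portrayal once, for example from an initialization file, should be enough for the rest of the session.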
I started with mapping double quotes to chars, as recommended by Markus Triska, but then noticed that SWI-Prolog’s phrase_from_file/2 assumes the DCG to parse codes, rather than chars.
FWIW, if one uses set_prolog_flag(double_quotes, codes), it might be a good idea to also use set_prolog_flag(back_quotes, string) to effectively swap the meaning of `…` and “…”; otherwise back quotes redundantly create codes and one has no quick way of defining a string literal.
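The swap described above could be checked with a sketch like the following. The predicates codes_literal/1 and string_literal/1 are hypothetical names used only to show what each quote style produces; the literals live in the same file as the directives because these flags apply to the code being loaded.

```prolog
:- set_prolog_flag(double_quotes, codes).
:- set_prolog_flag(back_quotes, string).

codes_literal("abc").     % with these flags: a list of character codes
string_literal(`abc`).    % with these flags: a string object

% ?- codes_literal(X), is_list(X).
% ?- string_literal(S), string(S).
```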
FWIW, the SWI-Prolog documentation about “The string type and its double quoted syntax” provides helpful information on the matter.
The codes/chars thing is very old. The ISO people couldn’t decide on either, so they decided to more or less support both. I don’t know whether they realised that you cannot meaningfully support both in the same system and surely not in the same application.
I wouldn’t call it an “ancient status”. The fact that Scryer calls itself modern does not imply they got this right. Many other systems primarily target codes. Both have their advantages. Chars are “readable”; codes require fewer resources, allow for enumerating over code point ranges and for some useful arithmetic (although that is mostly for ASCII). The only drawback for codes I can think of is readability, but then library(portray_text) or something similar deals with that quite well. Codes allow for e.g. between(0'0, 0'9, C), Digit is C-0'0. I don’t think we’d like to support Digit is C - '0' as this would make evaluation rather awkward (and 'e' ambiguous).
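The kind of code arithmetic mentioned above could look like this sketch. digit_value/2 is a hypothetical helper name; note that between/3 takes the two bounds first and the value last.

```prolog
% digit_value(?Code, ?Digit): Code is the character code of a decimal
% digit and Digit is its numeric value.
digit_value(Code, Digit) :-
    between(0'0, 0'9, Code),
    Digit is Code - 0'0.

% ?- digit_value(0'7, D).
% D = 7.
% ?- findall(D, digit_value(_, D), Ds).
% Ds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].
```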
I don’t know. For most applications it is not terrible. SWI-Prolog currently caches the atoms it creates for code points up to 0xffff. Above that it does the atom lookup. The cache is an array of 256 pointers to a page of 256 chars, where we allocate the page if we need a character from it. Most applications will use only a tiny part of the Unicode space. Those that use a large part pay a pretty high price in terms of memory and currently also time for code points > 0xffff.
So, SWI-Prolog defaults to codes and relevant libraries assume this. If you want chars, you can switch the flags and provide your own set of libraries.
I just dealt with this very thing for a program I wrote for $WORK and I was very confused about why phrase/2 was doing one thing with a string literal given at the top level but another thing with an atom read from a CSV, which had been converted to a string. I finally landed on something that was like what @peter.ludemann suggested, except that the conversion was handled by a helper predicate which was then part of a high level predicate that wrapped the call to phrase/2.
Doing the conversion in the helper predicate allowed me to avoid (some) instantiation errors when using my DCG to generate strings. I needed to support this because the program is for validation of “static” (in quotes because they change pretty often) config files, and I needed to quantify how many valid solutions exist for ranges of possible inputs (a lot of the stuff has to do with dates so I have been using julian).
Once I’d figured out what was going on with the string literal thing, it wasn’t that complicated – what is more vexatious is dealing with numbers. It would be really cool if I could use the DCG to generate some numbers but I usually get an instantiation error when the DCG rule has a logic goal involving arithmetic.
Edit: I should clarify that part of the issue I had with the string-vs-atom thing was how the double_quotes flag worked with modules and the top-level.
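A wrapper of the kind described above might be sketched as follows. This is not the poster's actual code: parse_hello/2 and hello//1 are hypothetical names, and the sketch only covers the parsing direction. It relies on string_codes/2 accepting a string, an atom, or a code list as its first argument, so the caller does not need to care which quoting style or flag setting produced the text.

```prolog
:- use_module(library(dcg/basics)).

% Grammar written with default flags; the DCG translator converts the
% double-quoted literals in the rule body.
hello(Name) -->
    "Hello, ", string_without(`!`, Cs), "!",
    { atom_codes(Name, Cs) }.

% parse_hello(+Text, -Name): convert Text to a code list first,
% then hand it to phrase/2.
parse_hello(Text, Name) :-
    string_codes(Text, Codes),
    phrase(hello(Name), Codes).

% ?- parse_hello("Hello, World!", Name).
% Name = 'World'.
% ?- parse_hello('Hello, World!', Name).
% Name = 'World'.
```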
Using library(clpfd) might be useful for a DCG involving arithmetic, if the problem is instantiation errors when running in the “other direction”. If you want an example of doing complicated stuff with a bidirectional DCG, check out frames.pl in my HTTP/2 client - the DCGs written here work to both parse and generate HTTP2 frames.
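A minimal sketch of the idea, assuming a single decimal digit is all that needs to be handled. dec_digit//1 is a hypothetical rule written for this illustration and is not taken from frames.pl. With is/2 in place of the constraint, the generating direction would raise an instantiation error because the code is still unbound when the arithmetic runs.

```prolog
:- use_module(library(clpfd)).

% dec_digit(?D): relates one character code in the list to its
% numeric value D, in either direction.
dec_digit(D) -->
    [C],
    { D in 0..9,
      C #= D + 0'0 }.

% Parsing:    ?- phrase(dec_digit(D), `7`).     gives D = 7.
% Generating: ?- phrase(dec_digit(7), Codes).   gives Codes = [55].
```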