Double_quotes flag and DCGs

mvolkmann · July 21, 2023, 10:48am

Through trial and error I’ve arrived at the opinion that setting the double_quotes flag to chars is a bad idea when working with DCGs. It seems to break the functionality of the predicates defined in the dcg/basics library. Before working with DCGs I thought it was recommended (maybe by Markus Triska) to make chars the default setting. But am I correct that that’s not a good idea when using DCGs?

Boris · July 21, 2023, 10:58am

Yes, indeed, you need a very good reason to set double_quotes to chars in SWI-Prolog. Anecdotally, I have never needed this setting. Why exactly it has become so popular to do that is difficult to understand.

Keep in mind that while double-quoted literals are strings in SWI-Prolog, in the context of DCGs they work as intended. In other words, if you leave the flags at their default setting, those two both work:

foo --> "bar".

and

foo --> `bar`.

and they mean the same. But of course the first one is the usual way to define DCGs. This is documented in the same section, where it says:

Although represented as a list of codes is the correct representation for handling in DCGs, the DCG translator can recognise the literal and convert it to the proper representation. Such code need not be modified.
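
A quick way to check this is to define both rules and call them with the same code list. This is only a sketch, assuming SWI-Prolog 7 or later with the flags left at their defaults; foo_dq//0, foo_bq//0 and same_language/1 are made-up names for the comparison.

```prolog
% Assumes SWI-Prolog 7+ with default flags (double_quotes left at its default).
% foo_dq//0, foo_bq//0 and same_language/1 are hypothetical names.
foo_dq --> "bar".   % double-quoted literal, converted by the DCG translator
foo_bq --> `bar`.   % back-quoted literal, already a list of codes

% Succeeds if both rules accept the same list of codes.
same_language(Codes) :-
    phrase(foo_dq, Codes),
    phrase(foo_bq, Codes).
```

With this loaded, ?- atom_codes(bar, Cs), same_language(Cs). should succeed, while the same query with baz should fail.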

jan · July 21, 2023, 11:13am

It is nice for little student exercises on the terminal. At least I think that is the main reason. There is some value of chars over codes as they are easier to read, notably in the debugger. That is why there is the portray_text/1 predicate. Still, chars would have made this easier, as recognizing a list of integers as text is a weaker heuristic than recognizing a list of one-character atoms as text.

But, chars and codes do not go well together. Libraries are (often) written for either and thus the libraries make the choice. As codes are historically used by Prolog systems in the Edinburgh/Quintus tradition, it is hard to switch. Finally, codes are probably a little faster and sometimes you can do meaningful arithmetic on them.

mvolkmann · July 21, 2023, 9:39pm

The default value of the double_quotes flag is string, but I have to change it to codes for the following simple DCG example to work.

:- set_prolog_flag(double_quotes, codes).

% This gathers a sequence of elements into a list
% (character codes here, because of the flag set above).
seq([]) --> [].
seq([H|T]) --> [H], seq(T).

% This gathers a sequence of character codes into an atom.
string_seq(S) --> seq(Cs), { atom_codes(S, Cs) }.

hello(Name) --> "Hello, ", string_seq(Name), "!".

I can use this in a REPL session like this:

?- once(phrase(hello(Name), "Hello, World!")).
Name = 'World'.

This breaks if I leave double_quotes set to string.
Is there a different way I need to write these DCG rules so it works in that case?

Boris · July 22, 2023, 6:55am

You don’t need to change the definition, but you do need to change the call:

?- once(phrase(hello(Name), `Hello, World!`)).
Name = 'World'.

Within a DCG rule definition, SWI-Prolog will do the right thing with double-quoted literals, as discussed above. However, the second argument of phrase/2 has to be a list of codes, so you need to quote it in backticks. Other than one-line examples on the top-level, you will rarely be typing in your input by hand, so this is not a huge issue in my experience.

(Footnote: anecdotally, I only use phrase/2 for trying out manually on the top-level. In my code I usually need phrase_from_file/2 and phrase_from_stream/2.)
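
If the input arrives as a string or an atom rather than a code list, one option is to convert it before calling phrase/2. The following is a minimal sketch, assuming default flags; greeting//1 and parse_greeting/2 are hypothetical names, and seq//1 is the definition from earlier in this thread.

```prolog
% seq//1 as posted earlier in this thread.
seq([]) --> [].
seq([H|T]) --> [H], seq(T).

% greeting//1 is a hypothetical rule in the style of hello//1 above.
greeting(Name) -->
    "Hello, ", seq(Cs), "!",
    { atom_codes(Name, Cs) }.

% parse_greeting(+Text, -Name): Text may be a string, an atom or a code list.
% text_to_string/2 normalises the input, string_codes/2 yields the code list
% that phrase/2 expects. once/1 commits to the first parse.
parse_greeting(Text, Name) :-
    text_to_string(Text, String),
    string_codes(String, Codes),
    once(phrase(greeting(Name), Codes)).
```

This keeps the grammar itself unchanged and puts the conversion in one place, so callers can pass "Hello, World!" with the default double_quotes setting.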

All this said, string//1 from library(dcg/basics) is what you’d be using in SWI-Prolog (or maybe string_without//2?). The definition of seq//1 as in your example is not necessary. The docs show the more usual place for the cut: not wrapping the call to phrase/2, but right after your delimiter within the parser. You will find the examples in the docs but I will copy them here for posterity.

:- use_module(library(dcg/basics)).

hello_string(Name) -->
    "Hello, ", string(S), "!", !, % you probably should cut here, not outside!
    { atom_codes(Name, S) }.

hello_string_without(Name) -->
    "Hello, ", string_without("!", S), "!", % you don't need the cut!
    { atom_codes(Name, S) }.

It is a very big topic where to cut, I don’t want to go into it…

mvolkmann · July 22, 2023, 4:35pm

@Boris Thank you so much! This was very helpful!

peter.ludemann · July 22, 2023, 6:25pm

Note that operators such as -> work as expected in DCGs, so this could be rewritten (untested):

hello_string(Name) -->
    "Hello, ",
    (  string(S), "!"
    -> []  % or {true}
    ;  string_without("!", S)
    ),
    { atom_codes(Name, S) }.

Boris · July 22, 2023, 6:43pm

This was meant as an exclusive or, either use string//1, or use string_without//2.

What am I missing?

Plotinus · June 21, 2025, 2:35pm

I’m just starting, so take my input with a grain of salt, but I think a good complement to mapping double quotes to codes is to use the portray_text library. The latter allows one to see a list of codes like a string of characters. One can influence for which codes this conversion behaviour occurs (default is fine), what the minimum of the list length needs to be (I set it to 0 to also see a single code as a character), and what the maximum number of codes in a list is before the display is truncated with an ellipsis.

I find this useful when experimenting with DCG code and during debugging.
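
As a rough sketch of that setup, assuming library(portray_text) as shipped with SWI-Prolog: portray_text/1 switches the feature on and set_portray_text/3 adjusts the thresholds. The values below are only illustrative, and the exact option handling may differ between versions, so check the library documentation for the version in use.

```prolog
:- use_module(library(portray_text)).

% Print lists of character codes as text in the top level and the debugger.
:- portray_text(true).

% Lower the minimum list length that is treated as text
% (the second argument receives the old value and is ignored here).
:- set_portray_text(min_length, _, 1).

% Truncate long code lists with an ellipsis beyond this many codes.
:- set_portray_text(ellipsis, _, 40).
```

These directives can go into an init file so that they apply to every interactive session.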

I started with mapping double quotes to chars, as recommended by Markus Triska, but then noticed that SWI-Prolog’s phrase_from_file/2 assumes the DCG to parse codes, rather than chars.

FWIW, if one uses set_prolog_flag(double_quotes, codes), it might be a good idea to also use set_prolog_flag(back_quotes, string) to effectively swap the meaning of `…` and "…"; otherwise back quotes redundantly create codes and one has no quick way of defining a string literal.

FWIW, the SWI-Prolog documentation about “The string type and its double quoted syntax” provides helpful information on the matter.

jan · June 23, 2025, 3:42pm

The codes/chars thing is very old. The ISO people couldn’t decide on either, so they decided to more or less support both. I don’t know whether they realised that you cannot meaningfully support both in the same system and surely not in the same application.

I wouldn’t call it an “ancient status”. The fact that Scryer calls itself modern does not imply they got this right. Many other systems primarily target codes. Both have their advantages. Chars are “readable”; codes require fewer resources, allow for enumerating over code point ranges and support some useful arithmetic (although that is mostly for ASCII). The only drawback for codes I can think of is readability, but then library(portray_text) or something similar deals with that quite well. Codes allow for e.g. between(0'0, 0'9, C), Digit is C-0'0. I don’t think we’d like to support Digit is C - '0' as this would make evaluation rather awkward (and 'e' ambiguous).

I don’t know. For most applications it is not terrible. SWI-Prolog currently caches the atoms it creates for code points up to 0xffff. Above that it does the atom lookup. The cache is an array of 256 pointers to a page of 256 chars, where we allocate the page if we need a character from it. Most applications will use only a tiny part of the Unicode space. Those that use a large part pay a pretty high price in terms of memory and currently also time for code points > 0xffff.
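
To make the arithmetic point concrete, here is a small sketch of a DCG rule that reads one decimal digit from a code list. It uses only between/3 and is/2; digit_value//1 is a hypothetical name.

```prolog
% digit_value(-Digit)//: consume one code in the range 0'0..0'9 and
% convert it to its numeric value by subtracting the code of the character 0.
digit_value(Digit) -->
    [C],
    { between(0'0, 0'9, C),
      Digit is C - 0'0
    }.
```

Because between/3 enumerates when C is unbound, phrase(digit_value(D), Cs) can also be used to list all ten single-digit code lists on backtracking.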

So, SWI-Prolog defaults to codes and relevant libraries assume this. If you want chars, you can switch the flags and provide your own set of libraries.

pizzapal · June 24, 2025, 1:40am

I just dealt with this very thing for a program I wrote for $WORK and I was very confused about why phrase/2 was doing one thing with a string literal given at the top level but another thing with an atom read from a CSV, which had been converted to a string. I finally landed on something that was like what @peter.ludemann suggested, except that the conversion was handled by a helper predicate which was then part of a high level predicate that wrapped the call to phrase/2.

Doing the conversion in the helper predicate allowed me to avoid (some) instantiation errors when using my DCG to generate strings. I needed to support this because the program is for validation of “static” (in quotes because they change pretty often) config files, and I needed to quantify how many valid solutions exist for ranges of possible inputs (a lot of the stuff has to do with dates so I have been using julian).
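
The general shape of such a wrapper might look like the sketch below. This is a guess at the pattern, not the actual code from my program: greeting//1 and greeting_text/2 are hypothetical names, and seq//1 is the definition from earlier in this thread.

```prolog
% seq//1 as posted earlier in this thread.
seq([]) --> [].
seq([H|T]) --> [H], seq(T).

% greeting//1 is a hypothetical grammar that works on code lists only.
greeting(NameCodes) --> "Hello, ", seq(NameCodes), "!".

% greeting_text(?Text, ?Name): hypothetical wrapper around phrase/2.
% Parsing mode:    Text is known, so convert it to codes before parsing.
% Generating mode: Name must be known; the codes become a string afterwards.
greeting_text(Text, Name) :-
    (   nonvar(Text)
    ->  string_codes(Text, Codes),
        once(phrase(greeting(NameCodes), Codes)),
        atom_codes(Name, NameCodes)
    ;   atom_codes(Name, NameCodes),
        once(phrase(greeting(NameCodes), Codes)),
        string_codes(Text, Codes)
    ).
```

Checking nonvar(Text) first means the text conversion is never called with both arguments unbound, which is where the instantiation errors would otherwise come from.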

Once I’d figured out what was going on with the string literal thing, it wasn’t that complicated – what is more vexatious is dealing with numbers. It would be really cool if I could use the DCG to generate some numbers but I usually get an instantiation error when the DCG rule has a logic goal involving arithmetic.

Edit: I should clarify that part of the issue I had with the string-vs-atom thing was how the double_quotes flag worked with modules and the top-level.

jamesnvc · June 24, 2025, 4:11pm

Using library(clpfd) might be useful for a DCG involving arithmetic, if the problem is instantiation errors when running in the “other direction”. If you want an example of doing complicated stuff with a bidirectional DCG, check out frames.pl in my HTTP/2 client - the DCGs written here work to both parse and generate HTTP2 frames.
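
As a small illustration of the idea (not taken from frames.pl), here is a sketch of a two-digit number rule that uses CLP(FD) constraints in place of is/2. The names digit//1 and two_digits//1 are hypothetical.

```prolog
:- use_module(library(clpfd)).

% digit(?D)//: relate a digit value 0..9 to a single character code.
% The constraint can be posted whether D or the code is known first.
digit(D) -->
    [C],
    { D in 0..9,
      C #= D + 0'0
    }.

% two_digits(?N)//: a two-digit field such as a month or a day.
two_digits(N) -->
    digit(Tens), digit(Ones),
    { N #= Tens*10 + Ones }.
```

Parsing works as usual with a code list. When generating, for example phrase(two_digits(7), Cs), calling label(Cs) afterwards makes sure no code is left as an unbound constrained variable.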

j4n_bur53 · June 24, 2025, 6:42pm

(post deleted by author)
