We observe that in many programs, most strings are handled only as a single unit during their lifetime. Examining real code shows that double-quoted strings typically appear in one of the following roles:
A DCG literal
Although a list of codes is the correct representation for processing in DCGs, the DCG translator can recognise the literal and convert it to that representation. Such code need not be modified.
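Here is a minimal sketch of that translation. It assumes SWI-Prolog 7 or later with the default double_quotes flag (string). The literals in the body of the hypothetical nonterminal greeting//0 are read as strings, yet the DCG translator compiles them into code-list terminals, so the rule matches ordinary code lists unchanged. is_greeting/1 is likewise a hypothetical helper.

```prolog
% "hello", " " and "world" are strings when read (SWI-Prolog 7 default),
% but the DCG translator compiles them into code-list terminals.
greeting --> "hello", " ", "world".

% Usage: parse the code list obtained from an atom.
is_greeting(Atom) :-
    atom_codes(Atom, Codes),
    phrase(greeting, Codes).
```

With this, is_greeting('hello world') succeeds and is_greeting('hello there') fails, without touching the double_quotes flag.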
That notation is not valid syntax for read/1; it is only used by library(portray_text) to indicate that a list of code points ends in the variable Tail. We could of course add it to the syntax, though we would need a backslash for disambiguation:
`hello world\|Tail`
I think this makes sense, but it is not a big enough improvement to justify the incompatibility and the difficulty of handling it correctly in (IDE) tools. Partial evaluation of phrase("hello world", List, Tail) already gives us an acceptable and portable alternative.
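For illustration, this is what the portable alternative looks like in a clause body. hello_world_prefix/2 is a hypothetical name. The sketch assumes SWI-Prolog 7 or later, where phrase/3 accepts a string literal as the grammar body.

```prolog
% Hypothetical helper: Codes starts with the text "hello world" and Tail is
% whatever follows. phrase/3 takes the string literal as a DCG body.
hello_world_prefix(Codes, Tail) :-
    phrase("hello world", Codes, Tail).
```

Called with both arguments unbound, it binds Codes to the eleven codes of "hello world" followed by the unbound Tail. That is exactly the partial list the proposed syntax would denote.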
phrase_from_file loads lazily. The list ends in an attributed variable; when you try to unify it, it fires off a goal that loads more text. This kinda-sorta works.
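To make the mechanism concrete, here is a toy lazy list built with freeze/2, which SWI-Prolog implements on top of attributed variables. This is an illustration under assumptions, not the code of library(pure_input). The hypothetical lazy_blocks/2 exposes a list of code blocks, which stand in for chunks read from a file. It appends the next block only when something unifies the still-unbound tail.

```prolog
% lazy_blocks(+Blocks, -List): List is the concatenation of Blocks, but a
% block is only appended once someone binds the still-unbound tail.
lazy_blocks(Blocks, List) :-
    freeze(List, fill(Blocks, List)).

% fill(+Blocks, +List): runs when the frozen tail has been bound.
% No blocks left means the only acceptable binding is [] (end of input).
fill([], []).
fill([Block|Blocks], List) :-
    append(Block, Tail, List),
    lazy_blocks(Blocks, Tail).

% An ordinary DCG; parsing it over a lazy list pulls in blocks on demand.
ab_cd --> "ab", "cd".
```

For example, with Blocks bound to the code lists of ab and cd, phrase(ab_cd, L) succeeds. Given only the first block, it fails at "cd" because there is no more text to load.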
@jan, would it make sense to have a non-lazy phrase_from_file that supported the current line/char reporting? I usually use phrase_from_file not because it’s lazy but because it supports line tracking, which is pretty much always needed: you will eventually have to deal with invalid files, and ‘not valid’ is rarely the right error message.
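For reference, library(pure_input) exposes the line/char reporting discussed here through lazy_list_location//1 and syntax_error//1. The sketch below assumes SWI-Prolog with library(pure_input) and library(dcg/basics). number_lines//1, line_end//0 and parse_text/2 are hypothetical names. On malformed input, syntax_error//1 throws error(syntax_error(What), Location), where Location identifies the file, line and position rather than giving a bare "not valid".

```prolog
:- use_module(library(pure_input)).
:- use_module(library(dcg/basics)).

% number_lines(-Ns)//: one integer per line. On bad input, throw a syntax
% error whose location term is computed from the lazy list.
number_lines([N|Ns]) --> integer(N), !, line_end, number_lines(Ns).
number_lines([])     --> eos, !.
number_lines(_)      --> syntax_error(expected_integer).

line_end --> "\n", !.
line_end --> syntax_error(expected_newline).

% Hypothetical helper: write Text to a temporary file and parse it.
parse_text(Text, Numbers) :-
    tmp_file_stream(text, File, Out),
    write(Out, Text),
    close(Out),
    phrase_from_file(number_lines(Numbers), File).
```

parse_text('1\n22\n333\n', Ns) yields Ns = [1, 22, 333], while parse_text('1\nx\n', _) throws a syntax error located in the second line.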
I did some more research on that topic and found an excellent tutorial on DCGs by Markus Triska.
In the section about reading from files, he more or less recommends setting double_quotes to chars, which will not work with SWI, as we learned in this thread.
In the same section, he references a “pure io” library from Ulrich Neumerkel (http://www.complang.tuwien.ac.at/ulrich/Prolog-inedit/sicstus/pio.pl).
Interestingly, Neumerkel’s library “auto-adjusts” to codes vs. chars:
Annie, what’s wrong with laziness? Actually, I consider it a quality.
Shouldn’t we all be lazy, and aren’t we using Prolog because we want to be lazy?
I was about to bring this up; I thought it isn’t relevant, but apparently it is.
There was some ideological struggle about this at some point. I definitely did not understand exactly what it was about, but the take-home message for me was that it was indeed about ideology and not only about technology.
Up until about three years ago (when I stopped wasting time on Stack Overflow), there was also a clique of high-rep, prolific answerers on the [prolog] tag; they would always recommend using chars and setting certain global flags and so on. Those settings would fix some “defects” and break other things at the same time, creating confusion that still persists.
Just to give you some context.
PS: all this was somehow entangled with discussions about purity, ISO standardization and compliance. It was strange.
Well, the only thing that works is unifying the lazy list with either [_|_] or []. This is what DCGs do as long as you do not start dirty hacking, so there is not much of a problem. Things go wrong if you write e.g.
at_eof(End,End) :- End == [].
or if the application expects the 3rd argument of phrase/3 to be a list and calls, e.g., length/2 on it.
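The difference can be shown without a file. In this sketch, the hypothetical lazy_end/1 uses freeze/2 to stand in for an unread lazy tail that turns out to be the end of input. at_eof_bad/2 is the test from the text. at_eof_good/2 is the unification-based version, which leaves the decision to the attributed variable.

```prolog
% Hypothetical stand-in for the unread tail of a lazy list: an unbound
% variable that, once bound, only accepts [] (i.e. the input is exhausted).
lazy_end(Tail) :-
    freeze(Tail, Tail = []).

% Fails on a lazy list: the tail is still an unbound (attributed) variable,
% so End == [] is false even though the input really has ended.
at_eof_bad(End, End) :- End == [].

% Works: unification wakes the attributed variable and lets it decide.
% This is essentially how eos//0 in library(dcg/basics) is written.
at_eof_good([], []).
```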
Not that much. The way it works is a little tricky. When asked for the position, it scans the list forward to the attributed variable. There it finds the position information for the start of the current block, which enables it to compute the position of the current point in the list. This works fairly well because at most one block has to be scanned. Without the lazy machinery, the position has to be computed from the beginning of the list, and if the list is long that gets pretty expensive …
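Here is a minimal sketch of the non-lazy alternative, assuming the whole input is an ordinary code list. The hypothetical position_in/4 walks from the start of the list to the point a parser has reached, counting newlines. Its cost grows with the amount of input already consumed, which is the expense described above. Lines are 1-based and columns 0-based here by choice.

```prolog
% position_in(+Full, +Rest, -Line, -Col): Rest is the suffix of the code
% list Full that a parser left over. Walk Full from its start, counting
% newlines, until we reach that suffix. Fails if Rest is not a suffix.
position_in(Full, Rest, Line, Col) :-
    walk(Full, Rest, 1, 0, Line, Col).

% Only one suffix of Full can be structurally equal (==) to Rest,
% so == identifies the point the parser reached.
walk(List, Rest, L, C, L, C) :-
    List == Rest, !.
walk([Code|Codes], Rest, L0, C0, L, C) :-
    (   Code == 0'\n
    ->  L1 is L0 + 1, C1 = 0        % newline: next line, column 0
    ;   L1 = L0, C1 is C0 + 1       % any other code: next column
    ),
    walk(Codes, Rest, L1, C1, L, C).
```

Consider a grammar that has consumed ab, a newline and c from the codes of 'ab\ncd'. For that input, position_in/4 reports line 2, column 1.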
That makes some sense, but unlike in SICStus, most SWI-Prolog syntax changes are module-local. This holds for operators, but also for the double_quotes flag. So this can work, but handling the module context correctly gets pretty hairy if you want to put DCGs in modules. The module-local syntax for double_quotes is intended to allow using (typically ugly) code that relies all over the place on “” denoting a list of character codes (or chars).
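Here is a minimal sketch of that intended use. It assumes SWI-Prolog, where the double_quotes flag is maintained per module, so a legacy module can switch the flag to codes for its own clauses only. The module name legacy_text and the predicate greeting_codes/1 are hypothetical.

```prolog
:- module(legacy_text, [greeting_codes/1]).

% Assumption (per the discussion above): this setting applies to this
% module only; other modules keep the SWI-7 default, where "..." is a string.
:- set_prolog_flag(double_quotes, codes).

% Legacy-style clause that relies on "..." being a list of character codes.
greeting_codes("hi").
```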
One of the nasty issues with codes vs. chars is that your entire application has to agree on the representation if text is passed as lists between components. As most existing components are written with codes in mind, I think it is best to stick with codes and improve the environment to make working with them as pleasant as possible. During the development of SWI-7 I considered adding code as a primary type. That would have solved this issue, but it became too complicated and expensive, so I dropped the idea.
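To illustrate the agreement problem, this hypothetical sketch pairs a codes-based grammar (yes_word//0) with a caller that holds chars. Passing the chars straight in silently fails. Every boundary therefore needs an explicit conversion such as chars_to_codes/2, a hypothetical helper built on maplist/3 and char_code/2.

```prolog
% Codes-based consumer: in SWI-Prolog 7 the DCG literal compiles to codes.
yes_word --> "yes".

% Hypothetical boundary adapter: convert a list of chars to a list of codes.
chars_to_codes(Chars, Codes) :-
    maplist(char_code, Chars, Codes).

% A chars-producing caller must convert before calling the consumer.
accepts_chars(Chars) :-
    chars_to_codes(Chars, Codes),
    phrase(yes_word, Codes).
```

Here accepts_chars([y, e, s]) succeeds, whereas phrase(yes_word, [y, e, s]) simply fails, with no error pointing at the representation mismatch.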