Quasi-quotations, again

I am using the latest SWI-Prolog. I want to use quasi-quotations for more complicated things, but first I would like to understand how to do the simplest thing.

So I tried this:

:- use_module(library(quasi_quotations)).

:- quasi_quotation_syntax(multiline).

multiline(Content, _Vars, _Dict, R) :-
    with_quasi_quotation_input(Content, Stream, read_string(Stream, _, R)).


foo(X) :-
    X = {|multiline||
          This is supposed to be
          a "string" over multiple lines.

          But what happens _really_?
          We'll see.
         |}.

Now, this works. I now have a multiline string with quotes in it and so on.

?- foo(X), format("~s~n", [X]).

          This is supposed to be
          a "string" over multiple lines.

          But what happens _really_?
          We'll see.
         
X = "\n          This is supposed to be\n          a \"string\" over multiple lines.\n\n          But what happens _really_?\n          We'll see.\n         ".

A few somewhat related questions, out of ignorance:

  1. Is there an “automatic” way to get rid of the leading space or do I have to handle it in my quasi-quotation parser?

  2. Is there a preferred way of indenting this inside the Prolog code? PceEmacs suggests I do it like this:

baz(X) :-
    X = {|multiline||
        line
        another line
        |}.

Any pointers appreciated.

PS: what is the preferred nomenclature? “Quasi quotations” as two separate words?

OK, after I asked I suddenly remembered that I asked already once and Jan answered… apparently, it is only me who has a “strip leading space” fetish. Here is the question and Jan’s reply (from the old mailing list, June 12 2013):

‘Quoted’ in the first example does not (cannot?) contain line-breaks

Between | and |}, everything is allowed, including an unbounded number
of newlines.

In the second example, we have two lines, with significant white space
(how about white space at the end of the line?)
In the second example, the quoted material is two lines, each ending
in a newline

You’ve misunderstood this. The quasi quoter always gets the raw
material as it appears between | and |}. The only thing that is done
to it by read is to respect the encoding of the file and provide a list
of Unicode code points, regardless of the encoding in the file.

A Quasi quoter may do stripping of newlines, delete ^\s*|, etc, but
this is up to the quoter. It can do anything it likes with the quoted
material.

Note that the quoter may decide that it does not allow for newlines
(or set a maximum or have a newline escape, or …) and throw a
syntax error. To the user, that will simply be transparent.

I might write a parser that uses the following convention and share it:

foo(X) :-
    X = {|multiline||
         |This is supposed to be
         |a "string" over multiple lines.
         |
         |There will be no leading white space!
         |}.
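
A minimal sketch of a quoter that follows this convention, assuming the `|` margin rule shown above. The names `margin`, `strip_margin/2`, `margin_line/2` and `skip_blanks/2` are made up for illustration; only `quasi_quotation_syntax/1` and `with_quasi_quotation_input/3` come from library(quasi_quotations). Lines without a margin marker are kept verbatim here, which is one choice among several; raising a syntax error would be just as reasonable.

```prolog
:- use_module(library(quasi_quotations)).
:- use_module(library(apply)).
:- use_module(library(lists)).

:- quasi_quotation_syntax(margin).

% margin(+Content, +Vars, +Dict, -Text)
% Read the raw quoted material and strip the "|" margin from it.
margin(Content, _Vars, _Dict, Text) :-
    with_quasi_quotation_input(Content, Stream,
                               read_string(Stream, _, Raw)),
    strip_margin(Raw, Text).

% strip_margin(+Raw, -Text)
% Drop whatever follows "||" on the first line and the white space
% before the closing "|}", then strip the margin from each line between.
strip_margin(Raw, Text) :-
    split_string(Raw, "\n", "", [_First|Rest]),
    append(Lines, [_Last], Rest),
    !,
    maplist(margin_line, Lines, Stripped),
    atomic_list_concat(Stripped, "\n", Atom),
    atom_string(Atom, Text).
strip_margin(Raw, Raw).            % no newline at all: nothing to strip

% margin_line(+Line, -Stripped)
% Remove leading blanks up to and including the first "|".
margin_line(Line, Stripped) :-
    string_chars(Line, Chars),
    skip_blanks(Chars, ['|'|Rest]),
    !,
    string_chars(Stripped, Rest).
margin_line(Line, Line).           % no margin marker: keep the line verbatim

% skip_blanks(+Chars, -Rest): skip leading spaces and tabs.
skip_blanks([C|Cs], Rest) :-
    char_type(C, white),
    !,
    skip_blanks(Cs, Rest).
skip_blanks(Cs, Cs).
```

With this loaded and the syntax name in foo/1 changed from multiline to margin, the result should be the four lines of text without the leading white space and without the first and last newline.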

While I am at it I might also try to do a “text” quasi quotation grammar that can embed Prolog variables.

The docs already write it as two separate words. I would have used the hyphen, as in “quasi-quotations”, and I might as well keep doing it :slight_smile: just to be different.

About the leading white space: as I already replied to myself, being able to strip it automagically seems to be a personal preference of mine. I still have to see whether it is possible to compose quasi-quotation grammars somehow. I will follow up on this if I find time.

In this other thread, @jan wrote:

One day we should add a nice quasi quotation rule for plain text.

How did you mean that? That you can just embed (unescaped?) variables in free text? As in,

{|plaintext(Name)||Hello, Name. How are you doing today?|}

Or do you think there would have to be some kind of escaping mechanism? How did you imagine it will work?

I am asking because I actually need a plain text quasi quotation rule and might as well make it, but obviously I don’t even know for sure what the interface should look like.

There are many ways to do it; for example Python has formatted string literals, or “f-strings” (one of maybe 5 ways to format strings in Python… “one obvious way” my a**). Those capture named variables from the surrounding scope (implicitly!), like this:

def say_hello(name):
    greeting = "Hello"
    return f"{greeting}, {name}!"

As you see, Python uses curly braces for the variable. Noweb uses double square brackets for something similar, so with quasi quotations it would be:

{|plaintext(Greeting, Name)||[[Greeting]], [[Name]]!|}

(ugh… on the other hand, you almost never have double square brackets in code, markdown, text…)

Or maybe someone has already done the work and wants to share?

While I am interested in what you are doing with quasiquotations in SWI-Prolog, I did some background research for my own understanding a few weeks ago and found the following of value:

Quasiquotation in Lisp by Alan Bawden
Why It’s Nice to be Quoted: Quasiquoting for Haskell by Geoffrey B. Mainland
Why It’s Nice to be Quoted: Quasiquoting for Prolog by Jan Wielemaker and Michael Hendricks
library(quasi_quotations): Define Quasi Quotation syntax
Package uri_qq by Michael Hendricks - This seems to predate the implementation of quasiquotations in SWI-Prolog and shows some of the same steps others have taken. The reason this is interesting is that it is a natural stepping stone from formatting text to quasiquotations but shows a different syntax than the SWI-Prolog implementation. It reminds me more of string interpolation.
Haskell: Quasiquotation
Wikipedia - String interpolation
Wikipedia - Quasi-quotation

IMHO one should read the history in the papers to see what quasiquotations solve and why they were added to a programming language. This gives one an understanding of why certain options were not used and what specific problems quasiquotations solve, which in turn helps to understand how to use them.

Remember the post about parsing JSON and the difference between parsing JSON with syntax rules and parsing JSON with semantic rules. I plan to see how effective quasiquotations are for the semantic check.

One other thing I am finding is that quasiquotations used with DCGs are in some ways like BNF, BNF is like algebraic data types, and algebraic data types can be used like a type system. So if you have some input that you want to type check, you don’t need to decorate the text with types; instead you pass the text to a quasiquotation, which, if written correctly, does both the semantic check and the type check. I don’t know how close this is to a Hindley–Milner type system, but the bonus is that you can infer types from untyped text.

So what is the point of all of this for where I am headed and why my interest in quasiquotations? When you have a web server or such receiving text input you want to make sure it is not an attack. Thus create a quasiquotation for the allowed text and it should toss out all but valid input. Also along the way the quasiquotation should pull out the needed argument values, (think Command-line argument parsing but on steroids).
:slightly_smiling_face:

Thank you @EricGT. I have a really simple question for the moment though. Or is it 2 questions:

  1. Is there a working quasi quotation for plain text?
  2. If there isn’t, I will make one. Should I quote variables (I think definitely yes but who knows…) and how exactly?

The considerations behind quoting and plain text: it should be possible to achieve quite a lot with “plain text”, even if it isn’t exactly plain text: sometimes it is good enough as a fast-and-dirty way to generate any text.

And because it can be any text, it is nice to have a quoting mechanism that works for as many languages as possible. This is why I actually like the double brackets from Noweb and I think I will now try to write a simplistic quasi quotation parser for plain text that uses double square brackets for this. I will share it when I have it.

Let’s not forget that one of the promises of quasi-quotations is being able to easily make multi-line strings (without any imposed meaning) from inside Prolog source code. This is something I did not say explicitly, I admit.

“Easily” means easy to type, to read, and doesn’t look too bad when it is embedded in the Prolog source.

I did a bit of hacking to see whether I can make the claim for nice multiline text true. Please find a prototype attached. This allows for

:- use_module(string).

test(To) :-
    write({|string(To)||
           |Dear {{To}},
           |
           |I'm happy to announce a string interpolation
           |quasi quoter.
           |}).

After which we can do:

105 ?- test('Prolog user').
Dear Prolog user,

I'm happy to announce a string interpolation
quasi quoter.
true.

Now, there is a lot one might want to be different, in particular:

  • How to detect and remove leading white space? The current convention is that if the first character is a newline (following the ||), we do multiline detection, removing ^\s*|. There are lots of other options.
  • How to interpolate Prolog variables? It now uses {{Var}}.
  • What to do with the bindings? For now these must be atomic, but there are lots of other options.

If you look at the code, you’ll see this is rather tricky. In the end it produces a dict with a function, so that you can use the term anywhere and the string is materialized by evaluating the function. The parser turns the input into a list of atoms and variables, and the function joins the list (by then with hopefully bound variables) to get the final string.

string.pl (1.8 KB)

This already looks better than what I was planning. I will use your code to hack around a bit when I find time. But at face value this looks like exactly what I was looking for. Big thank you!

The code should be posted as a new topic in Useful Code category.

Also didn’t realize there was phrase_from_quasi_quotation/2, will have to remember that one.
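
For reference, a small sketch of how phrase_from_quasi_quotation/2 can be used: it runs a DCG over the quoted material, and a failing grammar is reported as a syntax error. The syntax name `kv`, the grammar `pairs//1` and the example clause `limits/1` are hypothetical and only serve as illustration; `blanks//0`, `integer//1` and `string_without//2` are from library(dcg/basics).

```prolog
:- use_module(library(quasi_quotations)).
:- use_module(library(dcg/basics)).

:- quasi_quotation_syntax(kv).

% kv(+Content, +Vars, +Dict, -Pairs)
% Accept only white-space separated key=integer items and return
% them as a list of Key-Value pairs.
kv(Content, _Vars, _Dict, Pairs) :-
    phrase_from_quasi_quotation(pairs(Pairs), Content).

pairs([Key-Value|T]) -->
    blanks, key(Key), "=", integer(Value),
    !,
    pairs(T).
pairs([]) -->
    blanks.

% key(-Key)//: a non-empty run of characters up to "=" or white space.
key(Key) -->
    string_without(`= \t\n`, Codes),
    { Codes \== [],
      atom_codes(Key, Codes)
    }.

% The quotation is parsed when this clause is read, so L is unified
% with an ordinary list of pairs at run time.
limits(L) :-
    L = {|kv||retries=3 timeout=30|}.
```

Anything that does not match the grammar, such as `retries=many`, makes the grammar fail, so the clause should be rejected when the file is loaded rather than when limits/1 is called.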

Do you think I will ever be able to write Prolog code as fast as you? I am always amazed at how fast you write new Prolog code.

Over 30 years of experience using Prolog and implementing a Prolog system :slight_smile: I had the same feeling when I was visiting Edinburgh and saw Richard O’Keefe writing Prolog code :slight_smile:

Good start. Better yet, settle what it should be doing and add it to the library. Does anyone have a good overview on how several languages deal with similar problems so we can define something that looks pretty much familiar to most people with relevant experience?

To answer your questions in a fashion (and I really don’t mind trying to implement it, since I do need it) (and of course those are my opinions)

  • Leading white space removal as an option that is automagically turned on when the content starts with a newline, followed by optional white space and a |. I think in that case every following line MUST start with an identical prefix (exactly the same white space and a |). This is roughly how “indent-sensitive” languages work (or should work…). Anything else should be a syntax error. However, what would you do with content that starts like this anyway?
  • Interpolating variables is good as it is at the moment. Again, double square brackets are almost never used, and neither are double curly braces.
  • I tend to use the "~w" format specifier since it does “the right thing” for all the things I care about. Not sure if this is a good idea though.
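
To make the last point concrete, here is a sketch of rendering arbitrary (not only atomic) bindings with "~w" before joining them. The predicate names `value_text/2` and `join_values/2` are made up for this example; format/3 with a `string(S)` sink is standard SWI-Prolog.

```prolog
:- use_module(library(apply)).

% value_text(+Value, -Text)
% Render any term the way "~w" would print it.
value_text(Value, Text) :-
    format(string(Text), "~w", [Value]).

% join_values(+Values, -Text)
% Render every value with "~w" and concatenate the results.
join_values(Values, Text) :-
    maplist(value_text, Values, Texts),
    atomic_list_concat(Texts, Atom),
    atom_string(Atom, Text).
```

One thing to keep in mind is that "~w" does not quote, so an atom containing spaces and the same text as a string come out identically, and the result cannot in general be read back as a Prolog term.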

EDIT: some markdowns use {{something}} for quoting preformatted/code.
Question: how strict is the interpolation? Is it an error to have:

{|string(Name)||{{Hello}}, {{Name}}!|}

That is to be defined. As is, {{Var}} is interpolated, where there should be no white space around Var and Var should satisfy the Prolog syntax rules for variables. Anything else between {{...}} is passed verbatim. If this is matched, it is an error if Var does not appear in the parameters.
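
These rules can be sketched as a small DCG. This is not the code from string.pl, just an illustration under the rules stated above; `template_parts/3`, `join_parts/2` and the helper names are assumptions, and the parameters are given as a list of 'Name'=Var pairs. The variables stay in the parts list, so joining has to wait until they are bound.

```prolog
:- use_module(library(lists)).
:- use_module(library(error)).

% template_parts(+Template, +Params, -Parts)
% Parts is a list of single-character atoms and, for every {{Var}},
% the variable paired with 'Var' in Params.
template_parts(Template, Params, Parts) :-
    string_codes(Template, Codes),
    phrase(parts(Params, Parts), Codes).

parts(Params, [Var|Ps]) -->
    "{{", var_name(Name), "}}",
    !,
    { lookup(Name, Params, Var) },
    parts(Params, Ps).
parts(Params, [Char|Ps]) -->          % anything else is passed verbatim
    [C],
    !,
    { char_code(Char, C) },
    parts(Params, Ps).
parts(_, []) -->
    [].

% var_name(-Name)//: text that Prolog would read as a variable name,
% with no white space allowed around it.
var_name(Name) -->
    [C0], { code_type(C0, prolog_var_start) },
    name_rest(Cs),
    { atom_codes(Name, [C0|Cs]) }.

name_rest([C|Cs]) -->
    [C], { code_type(C, prolog_identifier_continue) },
    !,
    name_rest(Cs).
name_rest([]) -->
    [].

% lookup(+Name, +Params, -Var): it is an error if Name is not a parameter.
lookup(Name, Params, Var) :-
    memberchk(Name=Var, Params),
    !.
lookup(Name, _, _) :-
    existence_error(template_parameter, Name).

% join_parts(+Parts, -String): call this once the variables are bound.
join_parts(Parts, String) :-
    atomic_list_concat(Parts, Atom),
    atom_string(Atom, String).
```

Under these rules the example above, with only Name as a parameter, raises an error for {{Hello}}, while something like {{ Name }} or {{hello}} is copied to the output unchanged.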

Note that in my opinion quasi quotations are first of all intended to interpolate given that the input satisfies some language. So, the HTML quasi quotation demands the input to be valid HTML and ensures that interpolated values are properly escaped according to the relevant HTML rules, depending on whether the interpolation happens in CDATA or in an attribute value. That is the real strength of quasi quotations. Plain strings are a relatively simple case :slight_smile:

OK, I admit, I am about to generate SQL. I had to start somewhere though :slight_smile: and plain text sounded easy enough. Now that this is done I can concentrate on the real thing…

And yes, I have heard of bind variables. Sometimes this is not good enough. People come up with all kinds of ways to abuse a database. Those include, but are not limited to:

  • “soft” links to other tables, where the link is a compound of a table name, a column name, and an id into that column (but the table name and the column name are implicit, you have to guess by the shape of the id…)
  • unnecessary replications of schemas and data across separate servers so that you can’t even query it if you wanted to
  • splitting data across tables with different shapes for reasons

It is terrible. :_(

Now that I have this off my chest, it is time to quote my SQL properly (using quasi quotations that actually parse it).

You can have a look at the JavaScript quasi quotation library (in the http libdir). That doesn’t fully understand JavaScript, but it does do the tokenization. That is enough to avoid interpolating in the wrong place and ensure Prolog data gets embedded safely in JavaScript code.

FWIW, Python has a textwrap.dedent() that I’ve often used to nicely format multiline text. It shouldn’t be too difficult to write something similar in Prolog. The Python code is here and this is the documentation.

I think a dedent predicate makes a lot of sense. A lot of the other text wrapping, filling, etc. is already part of the HTML to text rendering library (library/lynx) that is used by help/1 to render the HTML manual on the console.
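
A rough sketch of what such a predicate could look like, loosely modelled on Python's textwrap.dedent(). The name `dedent/2` and its helpers are made up here. It counts every space or tab as one column, which is a simplification, and it turns white-space-only lines into empty lines.

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).

% dedent(+String, -Dedented)
% Remove the indentation that all non-blank lines have in common.
dedent(String, Dedented) :-
    split_string(String, "\n", "", Lines),
    convlist(indent_width, Lines, Widths),   % blank lines are skipped
    (   min_list(Widths, Min)
    ->  true
    ;   Min = 0                              % only blank lines
    ),
    maplist(remove_indent(Min), Lines, Stripped),
    atomic_list_concat(Stripped, "\n", Atom),
    atom_string(Atom, Dedented).

% indent_width(+Line, -Width)
% Number of leading spaces and tabs; fails for empty or blank lines.
indent_width(Line, Width) :-
    string_chars(Line, Chars),
    leading_blanks(Chars, 0, Width, Rest),
    Rest \== [].

leading_blanks([C|Cs], N0, N, Rest) :-
    char_type(C, white),
    !,
    N1 is N0 + 1,
    leading_blanks(Cs, N1, N, Rest).
leading_blanks(Rest, N, N, Rest).

% remove_indent(+Min, +Line, -Stripped)
remove_indent(Min, Line, Stripped) :-
    (   indent_width(Line, _)                % the line has content
    ->  sub_string(Line, Min, _, 0, Stripped)
    ;   Stripped = ""                        % blank line: make it empty
    ).
```

Applied to the raw text of a quasi quotation, this would keep relative indentation between lines while dropping the indentation that only exists because the text is nested inside a clause body.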

I’ve added a tentative library(strings) now providing dedent_string/3, interpolate_string/4 and the string quasi quotations, now based on these two predicates. I think that is useful for anyone working with long strings, although there is a big WARNING telling you that interpolating strings is dangerous if the result is to be processed as some formal language.

See https://github.com/SWI-Prolog/swipl-devel/blob/16f07a5abe9f943425ad06952a36727c4317b7b1/library/strings.pl

Comments (naming, consistency with conventions elsewhere, etc) are welcome.

I was going to offer to write a dedent predicate; and you did it before I could even make the offer (in my defence, I’m 9 time zones behind you).
@jan - you have to slow down and give the rest of us a chance!
