Parser for a simple DSL

Hello,
can you give me a small, simple example of how to load a file containing the three words “for me do” and parse it in SWIPL?
Thanks for helping
Jens

What is there to parse in this file?

$ echo "for me do" > small_file
$ swipl
Welcome to SWI-Prolog (threaded, 64 bits, version 9.1.21)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).

?- phrase_from_file("for me do\n", small_file).
true.

At least explain what kind of AST you expect to get from such input?

To parse into lists of codes:

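% words(-Ws): one or more words, terminated by a newline.
words([W|Ws]) -->
    word(W),
    next_words(Ws).

% word(-Codes): a possibly empty run of alphanumeric codes.
% The shortest match is tried first; longer ones are found on backtracking.
word([]) --> [].
word([C|Cs]) -->
    [C],
    { code_type(C, alnum) },
    word(Cs).

% next_words(-Ws): either the final newline, or a space and more words.
next_words([]) --> "\n".
next_words(Ws) -->
    " ",
    words(Ws).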

Usage:

?- once(phrase_from_file(words(Ws), 'small_file')).
Ws = [[102, 111, 114], [109, 101], [100, 111]].

portray_text/1 is handy for displaying code lists of length 3 or greater as text:

?- once(phrase_from_file(words(Ws), 'small_file')).
Ws = [`for`, [109, 101], [100, 111]].
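
The query above only prints `for` as text once text portrayal has been switched on. A minimal sketch of that step follows, together with an alternative that does not depend on printing settings. The helper names codes_atoms/2 and codes_atom/2 are made up for illustration and are not part of any library.

```prolog
:- use_module(library(portray_text)).

% Ask the toplevel to print sufficiently long code lists as text.
:- portray_text(true).

% codes_atoms(+CodeLists, -Atoms): hypothetical helper that converts
% each list of character codes into an atom.
codes_atoms(CodeLists, Atoms) :-
    maplist(codes_atom, CodeLists, Atoms).

codes_atom(Codes, Atom) :-
    atom_codes(Atom, Codes).
```

Usage:

?- codes_atoms([[102, 111, 114], [109, 101], [100, 111]], As).
As = [for, me, do].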
You really should be using library(dcg/basics) and library(dcg/high_order) for such things. I would nevertheless wait for @paule32 to show what AST they would expect from their example input. Still:

Your words//1 could be a sequence//2.

Your word//1 could maybe be csym//1 or nonblanks//1?

There are also integer//1, xdigits//1, …
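
As a rough sketch of what the library-based version could look like: the grammar below uses sequence//3 from library(dcg/high_order) with white//0 as the separator, and nonblanks//1 from library(dcg/basics) for each word. Returning atoms instead of code lists is an assumption made here for readable output, not something the original example does.

```prolog
:- use_module(library(dcg/basics)).      % nonblanks//1, white//0, blanks//0, eos//0
:- use_module(library(dcg/high_order)).  % sequence//3

% words(-Ws): words separated by a single white-space character,
% followed by optional trailing blanks (including the newline).
words(Ws) -->
    sequence(word, white, Ws),
    blanks,
    eos.

% word(-Atom): a non-empty run of non-blank characters, as an atom.
word(Atom) -->
    nonblanks(Codes),
    { Codes = [_|_],
      atom_codes(Atom, Codes)
    }.
```

Usage, on a code list rather than a file:

?- once(phrase(words(Ws), `for me do\n`)).
Ws = [for, me, do].

With the file from the first reply, the same grammar should work as ?- once(phrase_from_file(words(Ws), small_file)).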

I get different output:
[screenshot]

It does not help beginners to understand a simple example, if parts of the puzzle are hidden in a library they don’t immediately know how to see.

I like DCGs which are specific/explicit, to prevent surprises.

It’s the same as the first of the two outputs I showed.

Sorry for the delay, I was doing housework…
Anyway, the AST in flex could be:

digits   [0-9]*
id        [_a-zA-Z0-9]

and then either:

"for"   { do something }
"me" { do other thing }
"do" { do simple }

or:

id  { get id name, and handle the name. then do things on it }

and in the grammar:

start
  : /* could be empty */
  | for_token
  | me_token
  | do_token
  ;

for_token
  : for do me
  | do me
  | do
  ;

An example source file:

/* empty or whitespace-only line(s) */
do me
me
for me do

Did I miss something?
You show:

Ws = [`for`, [109, 101], [100, 111]].

I get:

Ws = [[102, 111, 114], [109, 101], [100, 111]].

Or did you mean the use of portray_text/1?

This is just busy work. The source code is out there, you can always read it.

Don’t worry. I haven’t used SWIPL for a while.
Thanks for your hints.

I have a bit of trouble following.

How familiar are you with DCGs? The main point is that to parse anything, you should probably use a DCG, unless there is a good reason not to. Then, you could additionally use code_type/2 as @brebs did in their example. From the docs of char_type/2:

csym
Char is a letter (upper- or lowercase), digit or the underscore (_). These are valid C and Prolog symbol characters.

So your [_a-zA-Z0-9] seems to be exactly a csym?
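
A quick way to check this is to test every character of a name against code_type/2. The predicate csym_atom/1 below is a hypothetical helper written only for this check.

```prolog
% csym_atom(+Atom): true if Atom is non-empty and consists only of
% letters, digits and underscores, i.e. [_a-zA-Z0-9] characters.
csym_atom(Atom) :-
    atom_codes(Atom, Codes),
    Codes \== [],
    forall(member(C, Codes), code_type(C, csym)).
```

Usage:

?- csym_atom(for_me_2).
true.

?- csym_atom('do-it').
false.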

It gets interesting when you take a Pascal- or C-style language and use DCGs to parse it into an AST, but that is not clear from your example so far.
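
To make the AST question concrete, here is one possible sketch for the sample source file shown earlier. It assumes, purely for illustration, that every non-empty line is one statement and that the AST is a list of stmt(Keywords) terms; it does not try to reproduce the for_token alternatives of the grammar above. All predicate names except csym//1, blanks//0, whites//0 and eos//0 from library(dcg/basics) are made up.

```prolog
:- use_module(library(dcg/basics)).

% program(-Stmts): skip leading blank lines, then read statements.
program(Stmts) --> blanks, statements(Stmts).

statements([]) --> eos, !.
statements([stmt(Ks)|Ss]) -->
    keywords(Ks),
    blanks,                 % trailing spaces, newlines and empty lines
    statements(Ss).

% keywords(-Ks): one or more keywords on the same line.
keywords([K|Ks]) --> keyword(K), whites, more_keywords(Ks).

more_keywords(Ks) --> keywords(Ks), !.
more_keywords([]) --> [].

% keyword(-K): a whole identifier that is one of the three keywords.
keyword(K) --> word(K), { memberchk(K, [for, me, do]) }.

% word(-Atom): the longest run of csym characters, as an atom.
word(Atom) --> csym(C), csym_rest(Cs), { atom_codes(Atom, [C|Cs]) }.

csym_rest([C|Cs]) --> csym(C), !, csym_rest(Cs).
csym_rest([]) --> [].
```

Usage:

?- phrase(program(Stmts), `\ndo me\nme\nfor me do\n`).
Stmts = [stmt([do, me]), stmt([me]), stmt([for, me, do])].

Reading whole identifiers first and only then checking them against the keyword list keeps an input such as "dome" from being split into "do" and "me".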