Solving left recursion in sub-production rule


#1

Hi, logic programmers! I’m writing a C-like language parser that looks like this:

expression(Input) --> simple_expression(Input).
expression(Input) --> invocation(Input).
simple_expression(Input) --> 
  literal(Input) | symbol(Input) | function(Input).
invocation(funcall(Func, Args)) -->
  expression(Func), ['('], invocation_args(Args), [')'].
function(func(Params, Body)) -->
  ['['], function_params(Params), [']'],
  ['{'], expression(Body), ['}'].

For example, I’d like the following program to be valid and create the following parse tree:

?- string_chars("[x]{ [y]{ x * y } }(2)(3)", Cs), phrase(expression(Tree), Cs, []).
Tree = funcall(
  funcall(
    func([symbol(x)],
      func([symbol(y)],
        op('*', symbol(x), symbol(y)))),
    [number(2)]),
  [number(3)]).

The issue is that this grammar is left-recursive on expression -> invocation -> expression. I’m trying to solve it from some resources online but would appreciate the input of more experienced people:

  1. factoring out the left recursion, the relevant part of the grammar becomes
expression(Input) --> simple_expression(Input).
expression(funcall(Func, Args)) -->
  simple_expression(Func), inv_args(Input).
inv_args(Args) -->
  ['('], invocation_args(Args), [')'].
inv_args(????) -->
  ['('], invocation_args(Args1), [')'], inv_args(Args2).

This would change the parsing tree, and I don’t know how best to handle it.

  1. keep the left recursion and include an additional difference list to invocation, expecting to consume two chars:
invocation(funcall(Func, Args), [_,_|T], T) -->
  expression(Func), ['('], invocation_args(Args), [')'].

In this case, how should the difference list be “propagated”? Do I need to change all production rules to include it?

  1. Finally, if I were using a procedural language, I would use some local state to keep the expression read so far, and lookahead for a parenthesis to parse an invocation.
def parse_expression(stream):
  expr = parse_simple_expression(stream)
  while stream.peek() == '(':
    args = parse_inv_args(stream)
    expr = FunCall(expr, args)
  return expr

Can this be emulated using semicontext pushback?

Thanks for any pointers,

Bruno Kim.


#2

You can use tabling - indeed, parsing left recursive languages was one of the primary targets of this extension recently added to SWI-Prolog.

:- module(lr_parse, [expression//1]).

% deprecated ?
% :- use_module(library(tabling)).

:- use_module(library(dcg/basics)).
:- use_module(library(dcg/high_order)).

:- table expression//1.

expression(Input) --> simple_expression(Input).
expression(Input) --> invocation(Input).

simple_expression(Input) -->
  literal(Input) | symbol(Input) | function(Input).

invocation(funcall(Func, Args)) -->
  expression(Func), "(", invocation_args(Args), ")".

function(func(Params, Body)) -->
  "[", function_params(Params), "]",
  "{", expression(Body), "}".

literal(N) --> number(N).

symbol(symbol(S)) -->
    code_type(csymf,First),
    sequence(code_type(csym),Rest),
    {atom_codes(S,[First|Rest])
    ,debug(lr_parse,'~w',symbol(S))
    }.

code_type(Type,C) -->
    [C], {code_type(C,Type)}.

function_params(Ps) -->
    sequence(symbol, (blanks,",",blanks), Ps).

invocation_args(Args) -->
    sequence(expression, (blanks,",",blanks), Args).

I don’t understand your grammar too well, and I provided some of the missing non terminals and tokenize phase with freedom, but your question was on point for me to explore library(dcg/high_order).

While debugging tabled grammars, you could need to issue

?- abolish_all_tables.

otherwise you could get a failure but no tracing (goals already failed aren’t retried).

My test:

?- phrase(expression(E), `[x]{[y]{x}}(2)(3, 4, u)`).
E = funcall(funcall(func([symbol(x)], func([symbol(y)], symbol(x))), [2]), [3, 4, symbol(u)]).

HTH, Carlo


#3

Thanks for the tips, it worked well! I also didn’t know about dcg/basics, had already recreated some of its stuff.

Regarding your comment on the need to import tabling: I had to import it in my system (SWI-Prolog version 7.6.4).


#4

Sorry, but if not impossible, it will be very difficult.

If under pressure, maybe I could try to emulate (limited) tabling, ‘bracketing’ expression//1 call ports (you know, those verbs you see starting a trace line when debugging…) with a memo mechanism, storing at least the first hidden argument and the outcome of expression//1.

But surely would be buggy, slow, and painful to develop and debug…


#5

I don’t know how to retrieve when tabling has been implemented, but remember my first test was with a DCG. Have you tried to uncomment this directive ? What swipl 7.6.4 does ?

% deprecated ?
% :- use_module(library(tabling)).

#6

Oh, I think I expressed myself badly, I was just remarking that I had to uncomment the use_module line to make it work on my system – but it did work. No need to fake tabling :slight_smile:


#7

Phew …

You expressed rather clearly, but I’m so bad at English, had read too much in your post…