Sequence/3: problematic interaction with dcg/basics

I must say that I don’t know enough about the scoping of cuts, so the source code of sequence/3 is baffling me… in the sense that I cannot find the exact point of failure (the nearest point seems to be located near line 128 in dcg/high_order.pl).

In the following module, I tried to report the problem, and also to show what I was expecting, in a (partial) implementation, named naive_seq/3.

Of course, I would like test(ko1) and test(ko2) to succeed…

:- module(test_parse,
          []).

:- begin_tests(test_parse).

:- use_module(library(dcg/basics)).
:- use_module(library(dcg/high_order)).
:- use_module(library(plunit)).

codes(`1 2 3 | 4 5 6`).

integers_ko1(Is) --> sequence(integer, whites, Is).
integers_ko2(Is) --> sequence(integer, white, Is).

integers_ok1([]) --> [].
integers_ok1([I|Is]) --> integer(I), ( " " -> integers_ok1(Is) ; {Is = []} ).

naive_seq(OnElem, OnSep, Es) -->
    {var(Es)} -> naive_seq_(Es, OnElem, OnSep).

naive_seq_([], _OnElem, _OnSep) --> [].
naive_seq_([H|T], OnElem, OnSep) -->
  call(OnElem, H),
  ( OnSep -> naive_seq_(T, OnElem, OnSep) ; {T = []} ).

integers_ok2(Is) --> naive_seq(integer, whites, Is).
integers_ok3(Is) --> naive_seq(integer, white, Is).

test(ko1, [fail]) :-
    codes(Cs),
    phrase((
        integers_ko1(_),
        whites,
        "|",
        whites,
        integers_ko1(_)), Cs).

test(ko2, [fail]) :-
    codes(Cs),
    phrase((
        integers_ko2(_),
        whites,
        "|",
        whites,
        integers_ko2(_)), Cs).

test(ok1, [true(Is1=[1,2,3]), true(Is2=[4,5,6])]) :-
    codes(Cs),
    phrase((
        integers_ok1(Is1),
        whites,
        "|",
        whites,
        integers_ok1(Is2)), Cs).

test(ok2, [true(Is1=[1,2,3]), true(Is2=[4,5,6])]) :-
    codes(Cs),
    phrase((
        integers_ok2(Is1),
        whites,
        "|",
        whites,
        integers_ok2(Is2)), Cs).

test(ok3, [true(Is1=[1,2,3]), true(Is2=[4,5,6])]) :-
    codes(Cs),
    phrase((
        integers_ok3(Is1),
        whites,
        "|",
        whites,
        integers_ok3(Is2)), Cs).

:- end_tests(test_parse).

Sideway question: is a cut after → (like seen on line 128 of dcg/high_order.pl) an efficiency booster ? I thought that the arrow already implemented the cut. I’m wrong ?

Is that mixture of ok and ko intentional?

Yes, of course. I tried to be explicit about I was expecting…
So, I didn’t expect ko1 (i.e., sequence(integer, whites, Is) to fail, and I show some naive code that performs (i. think) correctly, (i.e. ok1, ok2, ok3).

No. It cuts the call(OnElem,H) choice point left intact by the soft-cut (*->). The sequence//2 in parsing mode commits at each Separator. That is why using whites//0 doesn’t work at this always succeeds, processing no input if the current code is not white space. So, to use sequence//2 to deal with white space delimited content you need to use (white,whites) as separator.

Parsing with DCGs is a bit trickier than what many text books try to tell you. If you do not regularly commit you quickly build up so many choicepoints on a real document that a possible failure will simply cause the grammar to backtrack (practically) forever. The sequence//2 predicate assumes that a matched object followed by the defined separator is a reason to commit.

The motivation for adding the sequence operators was using DCGs as serializers, notable for the HTML generation framework.

Thanks, Jan.

This was one of the first things I tried, because I’m aware of (some) of the subtleties of parsing/generating with DCGs (or others languages that I used to parse/generate data for sizable applicative problems).

Didn’t worked… FWIW, I attach the amended test_parse module, where now there is a test(ko0), with the corresponding integers_ko0//1 using your suggestion.

:- module(test_parse,
          []).

:- use_module(library(plunit)).
:- use_module(library(dcg/basics)).
:- use_module(library(dcg/high_order)).

codes(`1 2 3 | 4 5 6`).

:- begin_tests(test_parse).

integers_ko0(Is) --> sequence(integer, (white,whites), Is).

integers_ko1(Is) --> sequence(integer, whites, Is).
integers_ko2(Is) --> sequence(integer, white, Is).

integers_ok1([]) --> [].
integers_ok1([I|Is]) --> integer(I), ( " " -> integers_ok1(Is) ; {Is = []} ).

naive_seq(OnElem, OnSep, Es) -->
    {var(Es)} -> naive_seq_(Es, OnElem, OnSep).

naive_seq_([], _OnElem, _OnSep) --> [].
naive_seq_([H|T], OnElem, OnSep) -->
  call(OnElem, H),
  ( OnSep -> naive_seq_(T, OnElem, OnSep) ; {T = []} ).

integers_ok2(Is) --> naive_seq(integer, whites, Is).
integers_ok3(Is) --> naive_seq(integer, white, Is).

test(ko0, [fail]) :-
    codes(Cs),
    phrase((
        integers_ko0(_),
        whites,
        "|",
        whites,
        integers_ko0(_)), Cs).

test(ko1, [fail]) :-
    codes(Cs),
    phrase((
        integers_ko1(_),
        whites,
        "|",
        whites,
        integers_ko1(_)), Cs).

test(ko2, [fail]) :-
    codes(Cs),
    phrase((
        integers_ko2(_),
        whites,
        "|",
        whites,
        integers_ko2(_)), Cs).

test(ok1, [true(Is1=[1,2,3]), true(Is2=[4,5,6])]) :-
    codes(Cs),
    phrase((
        integers_ok1(Is1),
        whites,
        "|",
        whites,
        integers_ok1(Is2)), Cs).

test(ok2, [true(Is1=[1,2,3]), true(Is2=[4,5,6])]) :-
    codes(Cs),
    phrase((
        integers_ok2(Is1),
        whites,
        "|",
        whites,
        integers_ok2(Is2)), Cs).

test(ok3, [true(Is1=[1,2,3]), true(Is2=[4,5,6])]) :-
    codes(Cs),
    phrase((
        integers_ok3(Is1),
        whites,
        "|",
        whites,
        integers_ok3(Is2)), Cs).

:- end_tests(test_parse).

I would be very interested in opinions from others about this problem… is just me ?

FWIW, I noticed that the (weird for me) construct ... -> !, ... appears also at line 152 of dcg/high_order.pl, without a soft-cut preceding:

sequence_as([H|T], OnElem, OnSep) -->
    call(OnElem, H),
    (   OnSep
    ->  !,
        sequence_as(T, OnElem, OnSep)
    ;   {T=[]}
    ).

It’s the code I used as a blueprint to naive_seq//3, but - of course - I omitted the cut…

sequence//3 expects the whole input to be a sequence. If you want the sequence to end at something you need sequence//5.

It kills the choice points of call(OnElem,H) again. Note that the ! kills all choicepoints created since the predicate was entered. In both cases, the idea is that if we find a separator we kill possible choice points of the element matcher. Whether that is good or bad, I don’t know. It all depends on the use case. That is the weak point of these high order predicates as parsers: depending on the use case you want to restrain non-determinism of the parser at different places. Of course, some people argue you should not restrain it at all …

Ok, thanks for your help