Dcg/high_order sequence/3 nested with the same separator

dklyons81 · December 2, 2022, 12:03am

Hi,

I’m trying to parse the Advent of Code 2022 Day 1 input file, which looks like this:

For the purposes of this challenge, this should parse to [[1000,2000,3000], [4000], [5000,6000], [7000,8000,9000], [10000]]. From reading an earlier topic, it seems like this should work:

phrase(sequence(sequence(integer, "\n"), "\n", Integers), Input).

However, this does not work, whereas changing the input so that the separators are different does. I suspect this has something to do with lookahead in the sequence parsing.

I don’t understand enough about how sequence//3 and friends are implemented to fix this, and anyway, it seems like a long shot. But I am wondering if anyone has a clever workaround for this.

Thanks!

EricGT · December 2, 2022, 6:57am

This is just a guess, I have not tired this in code.

I suspect the problem is with the last integer.
If I understand your code correctly the inner sequence/3 expects it to have \n but it does not and so fails. However it does have an eof delimiter so the inner rule needs to accept either

integer,‘\n’
integer, eof

Then the outer sequence/3 will fail for the last integer as the sequence does not end with \n. However one can not use the eof for the outer sequence as the inner one ate it. But if one thinks of recursive cases and base cases one so often sees one of the base cases for the empty input and that is what is needed here. So the outer sequence/3 also needs to accept empty but only if it fails for integer,\n

I don’t think lookahead is needed to solve this problem but then as I noted I have not tried solving this with working code.

I also suspect the rules could be modified so that the eof is eaten by the outer sequence and the inner sequence just accepts integer without \n only after failing for integer with \n leaving the eof for the outer sequence/3.

Ian · December 2, 2022, 10:38am

When I did this two years ago(!), I made my own combined version of dcg basics + high order, as the libraries sequence commits greedily, and makes it harder to parse this type of ‘ambiguous’ input. (I also wrote a version for Scryer prolog to contrast and compare hence the naming! ) .I’ve pasted my version below, and the answers to day1 using it.

:- module(dcg_basics_swi, [
    digits//1,
    digit//1,
    blank//0,
    blank//1,
    blanks//0,
    blanks_to_nl//0,
    nonblank//1,
    hex_digit//1,
    integer//1,
    word//1,
    sequence//3,
    sequence//5,
    string_upto//2,
    string_without//2,
    white//0,
    whites//0,
    eos//0
    ]).

word([S|Ss]) --> alphanum(S), word(Ss).
word([]) --> [].

alphanum(S) --> [S], {
   char_type(S,alnum)
}.

digits([D|Ds]) --> digit(D), !, digits(Ds).
digits([]) --> [].

digit(D) --> [D], {
  D >= 0'0, D =< 0'9
}.

hex_digit(D) --> [D], {
  (D >= 0'0, D =< 0'9) ; (D >= 0'a, D =< 0'f )
}.

integer(N) --> "-", digits(Ds), {
  Ds = [_|_],
  number_chars(PosN,Ds),
  N is -PosN
}, !.

integer(N) --> ("+" ; []), digits(Ds), {
  Ds = [_|_],
  number_chars(PosN,Ds),
  N is PosN
}, !.

blank --> [C], {
  char_type(C,white)
}.

nonblank(C) --> [C], {
  \+ char_type(C,white)
}.

blanks --> blank, !, blanks.
blanks --> [].

blanks_to_nl, "\n" --> whites, "\n".

white --> [C], {
  C \= 0'\n,
  char_type(C, white)
}.

whites --> white, !, whites.
whites --> [].

string_without(Chars, [S|Ss])  --> [S], { \+ memberchk(S, Chars) },!, string_without(Chars,Ss).
string_without(Chars, []), [S] --> [S], { memberchk(S, Chars) }, !.
string_without(_, []) --> eos.

string_upto(Guard, S) --> { length(S,_)}, S, Guard, !.

eos --> [].

%%%%%%% Generation   %%%%%%%

white(" ") --> " ". % generating whitespace
blank(" ") --> " ".

%%%%%%% Higher Order %%%%%%%
sequence(Start, Element, Sep, End, Es) --> Start, sequence(Element, Sep, Es), End.

sequence(Element, Sep, [E|Es]) --> call(Element,E), Sep, sequence(Element, Sep, Es).
sequence(Element, _ , [E])     --> call(Element,E).
sequence(_      , _ , [])      --> [].

day 1

:-use_module(dcg_basics_swi).

l(L)-->sequence((sequence(integer,"\n")),"\n",L).

part1:-
    phrase_from_file(l(L),'./day.1.input'),
    maplist(sumlist,L,T),
    sort(0,@>,T,[Max|_]),
    writeln(Max).

part2:-
    phrase_from_file(l(L),'./day.1.input'),
    maplist(sumlist,L,T),
    sort(0,@>,T,[A,B,C|_]),
    format('~d~n',A+B+C).

Boris · December 4, 2022, 9:06pm

The straight-forward solution would be to define “integer on a line” separately. Like this:

int_line(N) --> integer(N), "\n".

and then you can do:

phrase(sequence(int_line, "\n", Ints), ...)

But you should be using phrase_from_file/2 maybe. And sometimes there are choice points left over after using sequence/2 for parsing (instead of generating).

Here is the difference:

?- phrase(sequence(digit, Ds), `123`).
Ds = [49, 50, 51] ;
false.

?- phrase(sequence(atom, [1,2,3]), Out).
Out = [49, 50, 51].

Topic		Replies	Views
Confusing behavior of sequence//3 and sequence//5 from dcg/high_order Help! how-to	18	1366	December 25, 2020
Problem with number//1 Help! dcg	12	968	July 15, 2021
Advent of code 2023 General	248	2883	December 2, 2024
Is there a preferred way to check for valid characters when using DCG? Help!	5	846	March 12, 2019
An experiment on a mutually recursive system of closures General	0	122	June 2, 2024

Dcg/high_order sequence/3 nested with the same separator

Related topics