Controlling backtracking in a DCG grammar?

prodog · July 10, 2019, 1:58am

I’m using: SWI-Prolog version 8.1.9

Given the simple DCG below:

item_specifier(youngest).
item_specifier(son).
item_specifier(oldest).
item_specifier(room).
	
dcg_item_model -->
	dcg_item,
	dcg_item_connector_phrase,
	dcg_item_specifier.
	
dcg_item --> [key].

% Specifiers can occur in a row.
dcg_item_specifier --> [Specifier], { item_specifier(Specifier) }, dcg_item_specifier.
dcg_item_specifier --> [].

dcg_item_connector_phrase --> [for, the].
dcg_item_connector_phrase --> [to, the].
dcg_item_connector_phrase --> [who, is, the].
dcg_item_connector_phrase --> [belong, to].

I get the desired result on the first solution using the following goal:

phrase(dcg_item_model, [key,to,the,youngest,son,room], R).
R = [] ;
R = [room] ;
R = [son, room] ;
R = [youngest, son, room].

However, upon backtracking, the code appears to progressively unwind the successful matches by the dcg_item_specifier rule until all the item specifiers end up in the tail of the difference list.

Some questions:

Suppose I wanted to take the first solution, which seems to me to be the best, and then end the search for further solutions. In other words, in a DCG rule, where do I put the cut or whatever the DCG equivalent is?
Why would I want to keep the inferior solutions offered upon backtracking? Are there use cases or contexts that I am too inexperienced to understand as new DCG user? Or is it always better to simply not allow that behavior (not allow backtracking)?

EricGT · July 10, 2019, 11:33am

The clause(s) that will cause backtracking are

item_specifier/1

?- item_specifier(X).
X = youngest ;
X = son ;
X = oldest ;
X = room.

dcg_item_connector_phrase//0

?- dcg_item_connector_phrase(S0,S).
S0 = [for, the|S] ;
S0 = [to, the|S] ;
S0 = [who, is, the|S] ;
S0 = [belong, to|S].

so to have only one possible solution would require placing a cut after them. Since the question specifically ask about dcg_item_model//0 and dcg_item_connector_phrase//0 is before dcg_item_specifier//0, a cut is only needed after dcg_item_specifier//0.

dcg_item_model -->
	dcg_item,
	dcg_item_connector_phrase,
	dcg_item_specifier, !.

?- make.
% c:/users/eric/documents/projects/prolog/swi-discourse_005 compiled 0.00 sec, 0 clauses
true.

?- phrase(dcg_item_model, [key,to,the,youngest,son,room], R).
R = [].

Note:

dcg_item_specifier//0 causes an infinite loop

?- dcg_item_specifier(S0,S).
ERROR: Stack limit (1.0Gb) exceeded
ERROR:   Stack sizes: local: 0.9Gb, global: 66.6Mb, trail: 61.3Mb
ERROR:   Stack depth: 2,911,799, last-call: 0%, Choice points: 5,823,584
ERROR:   Possible non-terminating recursion:
ERROR:     [2,911,798] user:dcg_item_specifier([length:1|_17470852], _17470846)
ERROR:     [2,911,797] user:dcg_item_specifier([length:2|_17470878], _17470872)
% Execution Aborted

Remember that with DCGs you can use them for more than just parsing. If you are doing a BOM example for bicycles and wanted to know the variations on bikes with three wheels then you would want more than one possible answer. If you are using them to solve combinatoric problems then you might want all the solutions. Using DCGs to generate exhaustive cases is something I do from time to time, but then you have to be ever cautious of combinatorial explosion.

prodog · July 10, 2019, 11:43am

Thanks again Eric.

Interesting point. I want my DCG to be able to handle sentences where there may be one or more item specifiers in a row, which is why it is currently written like that with the recursive call.

Suppose I wanted to avoid putting a cut in dcg_item_specifier to make it a complete generator as you mentioned. How would I still make it handle contiguous item specifiers without leaving it vulnerable to a recursion loop as it is now?

prodog · July 10, 2019, 12:09pm

Please don’t worry about it. I thought you might have an easy answer to it. For now, I’ll just add a counter or something to stop at 5 since there would never be a case of more than that. I just wanted to make sure there wasn’t a DCG idiom for solving the problem that I was unaware of before I added the counter.

EricGT · July 10, 2019, 1:39pm

Still don’t have an answer I like but this was in the ball park. Giving this now because it might help you find the answer before I do.

How to write an efficient DCG acceptor for strings of unique tokens?

While I can probably answer this at present, I don’t like the way it works; my current means generates all possible permutations and combinations first then pulls from that to fill the gap, not very efficient.

See: comb/2

I am starting to think tabling, but haven’t really used it before other than for some quick examples.

EDIT

Markus always has useful information, sometimes not so obvious. See: Pruning the search. I think part of the answer lies in the use of something similar to all_disctinct/1 but a way to limit the size of the list is also needed, e.g. length/2 and one more constraint to keep length/2 from going to infinity.

EDIT

I think I have a working answer that I like, need to kick the tires some more. Care to solve the problem on your own if I give you some hints?

prodog · July 11, 2019, 2:27am

@jan Seems like comb/2 as mentioned in EricGT’s reply is a good candidate for library(lists) alongside `permutation/2’?

EricGT · July 11, 2019, 9:45am

Did you check comb/2 for bugs?

comb/2 by repeat

comb(As,Bs) :-
    same_length(As,Full),
    Bs = [_|_],
    prefix(Bs,Full),
    foldl(selectd,Bs,As,_).

selectd(E,[A|As],Bs1) :-
    if_(A = E, As = Bs1,
               (Bs1 = [A|Bs], selectd(E,As,Bs))).

if_(If_1, Then_0, Else_0) :-
    call(If_1, T),
    (  T == true -> call(Then_0)
    ;  T == false -> call(Else_0)
    ;  nonvar(T) -> throw(error(type_error(boolean,T),_))
    ;  /* var(T) */ throw(error(instantiation_error,_))
    ).

=(X, Y, T) :-
    (  X == Y -> T = true
    ;  X \= Y -> T = false
    ;  T = true, X = Y
    ;  T = false,
        dif(X, Y)                             % ISO extension
        % throw(error(instantiation_error,_)) % ISO strict
    ).

:- begin_tests(comb).

comb_test_case([]     ,[ []                                                                                              ] ).
comb_test_case([a]    ,[ [a]                                                                                             ] ).
comb_test_case([a,a]  ,[ [a],[a,a]                                                                                       ] ).
comb_test_case([a,b]  ,[ [a],[b],[a,b],[b,a]                                                                             ] ).
comb_test_case([a,b,c],[ [a],[b],[c],[a,b],[a,c],[b,a],[b,c],[c,a],[c,b],[a,b,c],[a,c,b],[b,a,c],[b,c,a],[c,a,b],[c,b,a] ] ).

test(001, [forall(comb_test_case(Input,Expected_result)),all(Result == Expected_result)]) :-
    comb(Input,Result).

:- end_tests(comb).

?- run_tests.
% PL-Unit: comb 
ERROR: c:/users/eric/documents/projects/prolog/swi-discourse_005a.pl:47:
        test 1 (forall bindings = [[],[[]]]): wrong "all" answer:
ERROR:     Expected: [[]]
ERROR:        Found: []
.... done
% 1 test failed
% 4 tests passed
false.

Repeat said the fix was simple, but I could not figure out how to fix it. Would be nice to know the fix.

jan · July 11, 2019, 10:08am

In any case, we do not want library(lists) to depend on constraints. None of the rest of the library does and secretly introducing delayed goals can lead to all sort of surprises. I also think the name is (too) short and confusing. The complexity is really bad and any programmer should think three times before using a relation like this.

If anything, it might be nice to have a sub_list/2 or something like that that selects a subset of the elements of a lst while preserving order. Problem is, should it enumerate starting with the empty list or the input list? Such choices can matter enormously for the overall complexity. It might be better to leave that too to the user as (1) it is wise to rethink your problem if you need it and (2) you may want to consider the order of testing.

EricGT · July 11, 2019, 10:11am

Not understanding that statement. Can you elaborate?

jan · July 11, 2019, 10:48am

I guess generating (empty–>all or the other way around) would have been more appropriate. In particular domains you may have heuristics to explore the search space more efficient.

EricGT · July 11, 2019, 12:49pm

DCGs are used many ways and for many problems, the problem is that most of the talk and examples found by Googling may lead someone first learning about them to believe that they are only used for deterministic parsing or NLP which does DCGs a disservice; they can do so much more.

In your original clause (dcg_item_model//0) it was used with no parameters except the two hidden variables used with DCGs and because it was used with phrase/2, the input for the first hidden parameter had an argument. Upon checking dcg_item_specifier//0 it was found to cause an infinite loop when used with the most general query.

So your code worked as expected based on what you wrote it for your need, the problem started when I used a specific clause (dcg_item_specifier//0) with the most general query. The natural thing with the clause being a DCG was to make it work as the most general query, in this case as a generator, and thus the question.

In trying to understand how to get the DCGs to both parse and generate it became apparent that there was nothing wrong with dcg_item_specifier\\0 for recognizing, (a parser returns something and a recognizer just returns true or false)

dcg_item_specifier --> 
   [Specifier], 
   { item_specifier(Specifier) }, 
   dcg_item_specifier.
dcg_item_specifier --> [].

the problem was using it for generating.

So the code needs to be modified to work as a generator.

dcg_item_specifier//0 modified into
dcg_item_specifier//1 to be used as a generator.

dcg_item_specifier([Specifier|Specifiers]) -->
    [Specifier],
    { item_specifier(Specifier) },
    dcg_item_specifier(Specifiers).
dcg_item_specifier([]) --> [].

Still works as a parser,

dcg_item_specifier_test_case([]                 , []                  ,[]).
dcg_item_specifier_test_case([youngest,son,room], [youngest,son,room] ,[]).
dcg_item_specifier_test_case([room]             , [room]              ,[]).

test(001, [nondet,forall(dcg_item_specifier_test_case(Input,Expected_specifiers,Expected_rest))]) :-
    DCG = dcg_item_specifier(Specifiers),
    phrase(DCG,Input,Rest),
    assertion( Specifiers == Expected_specifiers ),
    assertion( Rest == Expected_rest ).

:- end_tests(dcg_item_specifier).

and still fails as a generator

?- dcg_item_specifier(Specifiers,S0,S).
ERROR: Stack limit (1.0Gb) exceeded
ERROR:   Stack sizes: local: 0.8Gb, global: 0.1Gb, trail: 73.5Mb
ERROR:   Stack depth: 2,548,715, last-call: 0%, Choice points: 5,097,417
ERROR:   Possible non-terminating recursion:
ERROR:     [2,548,715] user:dcg_item_specifier(_30584596, _30584598, _30584600)
ERROR:     [2,548,714] user:dcg_item_specifier([length:1|_30584628], [length:1|_30584634], _30584622)

but as hard as this may be to believe, the answer is now much closer than it seems.

While dcg_item_specifier//1 is now working as a parser and not as a generator, the rest the code needs to be converted to work as both a parser and a generator.

item_specifier(youngest).
item_specifier(son).
item_specifier(oldest).
item_specifier(room).

dcg_item_model(Atom) -->
	dcg_item(Item),
	dcg_item_connector_phrase(Connector),
    dcg_item_specifier(Specifiers),
    {
        append([[Item],Connector,Specifiers],List),
        atomic_list_concat(List, ' ', Atom)
    }.

dcg_item(key) --> [key].

% Specifiers can occur in a row.
dcg_item_specifier([Specifier|Specifiers]) -->
    [Specifier],
    { item_specifier(Specifier) },
    dcg_item_specifier(Specifiers).
dcg_item_specifier([]) --> [].

dcg_item_connector_phrase([for,the]) --> [for, the].
dcg_item_connector_phrase([to,the]) --> [to, the].
dcg_item_connector_phrase([who,is,the]) --> [who, is, the].
dcg_item_connector_phrase([belong,to]) --> [belong, to].

Still works as a parser,

:- begin_tests(dcg_item_model).

dcg_item_model_test_case([key,for,the,youngest,son,room], 'key for the youngest son room' ,[]).
dcg_item_model_test_case([key,who,is,the,oldest,son]    , 'key who is the oldest son'     ,[]).
dcg_item_model_test_case([key,for,the,room]             , 'key for the room'              ,[]).

test(001, [nondet,forall(dcg_item_model_test_case(Input,Expected_sentence,Expected_rest))]) :-
    DCG = dcg_item_model(Sentence),
    phrase(DCG,Input,Rest),
    assertion( Sentence == Expected_sentence ),
    assertion( Rest == Expected_rest ).

dcg_item_model_fail_test_case([]).
dcg_item_model_fail_test_case([one]).

test(002, [fail,forall(dcg_item_model_fail_test_case(Input))]) :-
    DCG = dcg_item_model(_Sentence),
    phrase(DCG,Input,_Rest).

test(002,[nondet]) :-
    Input = [key,for,the,room],
    DCG = dcg_item_model(Sentence),
    phrase(DCG,Input,Rest),
    assertion( Sentence == 'key for the room' ),
    assertion( Rest == [] ).

:- end_tests(dcg_item_model).

?- run_tests(dcg_item_model).
% PL-Unit: dcg_item_model ...... done
% All 6 tests passed
true.

and still fails as a generator

?- dcg_item_model(Sentence,S0,S).
ERROR: Stack limit (1.0Gb) exceeded
ERROR:   Stack sizes: local: 0.8Gb, global: 0.1Gb, trail: 73.5Mb
ERROR:   Stack depth: 2,548,715, last-call: 0%, Choice points: 5,097,416
ERROR:   Possible non-terminating recursion:
ERROR:     [2,548,715] user:dcg_item_specifier(_30584614, _30584616, _30584618)
ERROR:     [2,548,714] user:dcg_item_specifier([length:1|_30584646], [length:1|_30584652], _30584640)

% Execution Aborted

Now for the serendipity.

As noted in an earlier reply, I found How to write an efficient DCG acceptor for strings of unique tokens? to be of value. The part of value is in the response by Paulo

have separate grammars for generating and recognizing.

If you use DCGs for both parsing and generating often you will see the problem of how to do both with one clause is very common. The reference was used to give a hint without giving the answer away.

When I first stated coding with Prolog I was writing code as book answers, and not as real world code. Then having spent some time looking at the SWI-Prolog code on GitHub started to see more uses of guard statements and conditionals.

Also years ago learned about using length/2 to generate a bunch of holes that needed filling.

?- length(N,L).
N = [],
L = 0 ;
N = [_914],
L = 1 ;
N = [_914, _920],
L = 2 ;
N = [_914, _920, _926],
L = 3 ;
N = [_914, _920, _926, _932],
L = 4 ;
N = [_914, _920, _926, _932, _938],
L = 5 ;

When dcg_item_specifier//1 generates it really just needs to take values from item_specifier/1 and drop the values in the holes created by the result of length/2. The problem with using length/2 is that it will go on forever and again back to an infinite loop, but by constraining the values for N to be between 1 and the number of item_specifier/1 clauses and making sure the values placed in the holes are unique for a response, the generator should work. Notice how that is a declarative description of the problem and not a procedural description of the problem.

dcg_item_generator(Specifiers,S0,_S) :-
    aggregate_all(count, item_specifier(_Specifier), Count),
    between(1,Count,N),
    length(Specifiers,N),
    dcg_item_specifier(Specifiers,S0,[]),
    list_distinct(Specifiers).

list_distinct(List) :-
    list_to_set(List, Set),
    length(List,List_length),
    length(Set,List_length).

:- begin_tests(dcg_item_generator).

test(001, [all(Specifiers ==
    [
     [youngest],[son],[oldest],[room],
     [youngest,son],[youngest,oldest],[youngest,room],
     [son,youngest],[son,oldest],[son,room],
     [oldest,youngest],[oldest,son],[oldest,room],
     [room,youngest],[room,son],[room,oldest],
     [youngest,son,oldest],[youngest,son,room],[youngest,oldest,son],[youngest,oldest,room],[youngest,room,son],[youngest,room,oldest],
     [son,youngest,oldest],[son,youngest,room],[son,oldest,youngest],[son,oldest,room],[son,room,youngest],[son,room,oldest],
     [oldest,youngest,son],[oldest,youngest,room],[oldest,son,youngest],[oldest,son,room],[oldest,room,youngest],[oldest,room,son],
     [room,youngest,son],[room,youngest,oldest],[room,son,youngest],[room,son,oldest],[room,oldest,youngest],[room,oldest,son],
     [youngest,son,oldest,room],[youngest,son,room,oldest],[youngest,oldest,son,room],[youngest,oldest,room,son],[youngest,room,son,oldest],[youngest,room,oldest,son],
     [son,youngest,oldest,room],[son,youngest,room,oldest],[son,oldest,youngest,room],[son,oldest,room,youngest],[son,room,youngest,oldest],[son,room,oldest,youngest],
     [oldest,youngest,son,room],[oldest,youngest,room,son],[oldest,son,youngest,room],[oldest,son,room,youngest],[oldest,room,youngest,son],[oldest,room,son,youngest],
     [room,youngest,son,oldest],[room,youngest,oldest,son],[room,son,youngest,oldest],[room,son,oldest,youngest],[room,oldest,youngest,son],[room,oldest,son,youngest]
    ]
    )]) :-
    DCG = dcg_item_generator(Specifiers),
    phrase(DCG,_Input,[]).

:- end_tests(dcg_item_generator).

?- run_tests(dcg_item_generator).
% PL-Unit: dcg_item_generator . done
% test passed
true.

dcg_item_generator//1 is correctly generating the specifiers and more importantly using dcg_item_specifier//1, albeit without using comb/2 as mentioned earlier.

To use the parser version and the generator version of the clause requires a choice to be made based on the input given. If the parameter to dcg_item_model//1 is a variable then use the generator, if not then use the parser.

dcg_item_specifiers(Specifiers,S0,S) :-
    (
         var(S0)
    ->
        % Generate if var(S0)
        dcg_item_generator(Specifiers,S0,S)
    ;
        % Parse if not var(S0)
        dcg_item_specifier(Specifiers,S0,S)
    ).

and modify dcg_item_model//1 to use the new clause.

dcg_item_model(Atom) -->
	dcg_item(Item),
	dcg_item_connector_phrase(Connector),
    dcg_item_specifiers(Specifiers),
    {
        append([[Item],Connector,Specifiers],List),
        atomic_list_concat(List, ' ', Atom)
    }.

A complete test case would be to large, so here is just dcg_item_model//1 partially run as the most general query.

?- dcg_item_model(Sentence,S0,S).
Sentence = 'key for the youngest',
S0 = [key, for, the, youngest] ;
Sentence = 'key for the son',
S0 = [key, for, the, son] ;
Sentence = 'key for the oldest',
S0 = [key, for, the, oldest] ;
Sentence = 'key for the room',
S0 = [key, for, the, room] ;
Sentence = 'key for the youngest son',
S0 = [key, for, the, youngest, son] ;
Sentence = 'key for the youngest oldest',
S0 = [key, for, the, youngest, oldest] ;

and to show that dcg_item_model//1 is not an infinite loop, all the results of most general query for the clause will be counted.

?- aggregate_all(count,dcg_item_model(Sentence,S0,S),Count).
Count = 256.

Here is the final version of the code.

item_specifier(youngest).
item_specifier(son).
item_specifier(oldest).
item_specifier(room).

dcg_item_model(Atom) -->
	dcg_item(Item),
	dcg_item_connector_phrase(Connector),
    dcg_item_specifiers(Specifiers),
    {
        append([[Item],Connector,Specifiers],List),
        atomic_list_concat(List, ' ', Atom)
    }.

dcg_item(key) --> [key].

dcg_item_specifiers(Specifiers,S0,S) :-
    (
         var(S0)
    ->
        % Generate if var(S0)
        dcg_item_generator(Specifiers,S0,S)
    ;
        % Parse if not var(S0)
        dcg_item_specifier(Specifiers,S0,S)
    ).

% Specifiers can occur in a row.
dcg_item_specifier([Specifier|Specifiers]) -->
    [Specifier],
    { item_specifier(Specifier) },
    dcg_item_specifier(Specifiers).
dcg_item_specifier([]) --> [].

dcg_item_connector_phrase([for,the]) --> [for, the].
dcg_item_connector_phrase([to,the]) --> [to, the].
dcg_item_connector_phrase([who,is,the]) --> [who, is, the].
dcg_item_connector_phrase([belong,to]) --> [belong, to].

dcg_item_generator(Specifiers,S0,_S) :-
    aggregate_all(count, item_specifier(_Specifier), Count),
    between(1,Count,N),
    length(Specifiers,N),
    dcg_item_specifier(Specifiers,S0,[]),
    list_distinct(Specifiers).

list_distinct(List) :-
    list_to_set(List, Set),
    length(List,List_length),
    length(Set,List_length).

I did not take the time to tweak it, just made it to demonstrate how to do it.

The part I don’t like is how dcg_item_generator//1 has to calculate the number of clauses using aggregate_all/3 for item_specifier/1 each time it enters the clause and why I was thinking about tabling in the earlier comment.

FYI

While the use of length/2 for generating holes to fill may be like a door opening on new ways to use Prolog, for an example that generates a structure more complicated than a list and that even changes the structure as query progresses see: Hints to understand splendid program to solve Queens

Remember the first part of generate and test is generate as in using a DCG to generate data.

Topic		Replies	Views
DCG look ahead, how its done? General	47	294	August 13, 2024
Are DCG's just "syntactic sugar"? Help!	18	1461	July 9, 2019
Trivial parsing and generating DCG Help!	5	877	August 3, 2019
What's the idiomatic way of developing DCGs? Help!	8	682	December 17, 2020
Phrase_from_file with initial state as push back list General	16	611	May 3, 2020

Controlling backtracking in a DCG grammar?

Related topics