Tracking/constraining choicepoints while parsing with DCG

SWI-Prolog (threaded, 64 bits, version 8.0.0)
Windows 10

While developing DCGs to parse large files (230 MB (241,632,543 bytes)) the parse sometimes ends with an exception:

ERROR: Stack limit (2.0Gb) exceeded
ERROR:   Stack sizes: local: 1.0Gb, global: 0.4Gb, trail: 0.1Mb
ERROR:   Stack depth: 204,035, last-call: 0%, Choice points: 4,279,251

In this case, as in many cases before, I suspect the cause is that a DCG rule is not properly written and thus not finding the ending character(s), e.g.


multiline_text_codes(Cs) -->
    \+ "\n;",
    "\n",
    multiline_text_codes(Cs).
multiline_text_codes([C|Cs]) -->
    multiline_text_code(C),
    multiline_text_codes(Cs).
multiline_text_codes([]) --> [].

multiline_text_code(C) -->
    \+ "\n;",
    [C].

and so continues looking further creating more choice points, even to the end of the list of character codes.

Is there a way to track (gain access to) the number of choice points in a predicate (clause) so that an action can be taken such as reporting the predicate if the number of choice points reaches or exceeds a set value?

If this could be done only once per program or pl file, versus once per predicate, it would be desired.

I am aware of Hackers corner but have never used anything from there.

While my main development system is Windows, I can switch to Linux if that is necessary.

I would try to put a cut after a line… but anyway I would write multiline_text_codes//1 very differently, without negation. Do you have specific reasons to use negation for this parse?

You may also benefit from rewriting the first grammar rule as:

multiline_text_codes(Cs) -->
    "\n", [C],
    multiline_text_codes(C, Cs).

multiline_text_codes(0';, Cs) -->
    ...

This would avoid looking for "\n" twice.

1 Like

Thanks. I am aware that cuts can be used but would prefer to avoid cuts when possible by modifying the clauses as necessary for the DCG.

This was better than this.

If you have a better answer to that then let me know so I can ask a new question so that the information, which would be of value to others, would not get obfuscated by being buried in this thread.

I think the most insightful tool for examining non-determinism is the graphical debugger. Set a spy point on the predicate/DCG you are interested in (?- spy(mypred).). When the spy point triggers, use s (skip) to get to the end of the predicate and look around in the top-right stack view. Choice points are marked with a double arrow. Click on a choice point to see the code it comes from. You can then use u (up) to find where the code is called from.

Then think what should really be there. Some hints for creating fewer choice points have been given. I’d typically put a cut at some tactical places, notably after the completion of a statement/record, etc.
Some people really do not want to use cuts. For parsing it is fairly natural: once you know you are on the right track, use a !. Note that this also makes the grammar terminate with false in a reasonable time. Without a cut and a failure far down in the file it will spend very long exhausting all the choice points before.

1 Like

Note that since SWI-Prolog has deep indexing it is no longer needed to get the next character and use it in a subsequent predicate for first argument indexing. Consider:

t(aap) -->
    "aap".
t(noot) -->
    "noot".

Now, we run this:

?- phrase(t(X), `aapnoot`, L).
X = aap,
L = [110, 111, 111, 116].

As you see, there is no choice point. WE can (almost) figure what was done:

?- jiti_list(t).
Predicate                                     Indexed  Buckets Speedup Flags
============================================================================
user:t/3                                        2            2     1.0  L
true.

This tells us it created an index on the second argument. This (outer) index has no speedup as both clauses use [_|_], but the not shown index on the first argument of this list discriminates. This can make your code significantly easier to read, faster and fully declarative :slight_smile:

2 Likes

Deep indexing is a great feature, I agree. But my suggestion is not about exploiting first-argument indexing but avoiding duplicated parsing due to the \+ "\n;", "\n" conjunction in the original grammar rule body. Also, all the arguments in the clause resulting from the expansion of the original grammar rule are variables and thus indexing doesn’t help in this case:

?- expand_term((multiline_text_codes(Cs) -->  \+ "\n;", "\n", multiline_text_codes(Cs)), Clause).
Clause =  (multiline_text_codes(Cs, _7910, _7932):-(\+_7910=[10, 59|_8008], _8046=_7910), _8046=[10|_8086], multiline_text_codes(Cs, _8086, _7932)).

Ok. I’d any way write this as "\n", \+ ";" or more likely as

x([]) --> "\n;", !.
x(L) --> "\n", ...

Or, if I’m in the puristic mode

x(L) --> "\n;", !, {L=[]}.
x(L) --> "\n", ...

I would say instead that cuts must be used - of course where appropriate.
In classical Prolog, there is only negation by failure, nothing more than a cut fail.