DCG optimisation and non-optimisation with multiple accumulators

I modified @jan 's proposal for DCGs with multiple accumulators (Dealing with state), to use records (see library(record)) instead of dicts. Initially, this didn’t speed things up; but when I added some expansion rules to inline the record accessors, I got a nice speed-up. (My goal expansion code is at the end of this post).

However, there was one expansion that looked non-optimal:

dx -->
    value(Prev)<prv,  % Expanded inline
    { Prev \= d },
    [i]<ops.

expanded to this:

dx(#(A, B, Prev, C, D, E), #(F, G, H, I, J, K)) :-
    Prev \= d,
    #(F, [i|G], H, I, J, K)= #(A, B, Prev, C, D, E).

instead of this:

dx(#(A, [i|B], Prev, C, D, E), #(A, B, Prev, C, D, E)) :-
    Prev \= d.

This was easily fixed (I thought) by changing boot/dcg.pl line 125 from

dcg_body({T}, P0, Q, S, SR, (QT, SR = S), P) :-
    !,
    dcg_bt_pos(P0, P1),
    qualify(Q, T, P1, QT, P).

to

dcg_body({T}, P0, Q, S, SR, QT, P) :-
    !,
    dcg_bt_pos(P0, P1),
    SR = S,
    qualify(Q, T, P1, QT, P).

and it did what I expected for my code.

But it failed the unit tests:

ac --> {!}.
ac --> [_].

bx --> {\+ throw(executed)}.

which should have expanded to

ac(A, B) :-
    !,
    B=A.
ac([_|A], A).


bx(A, B) :-
    \+ throw(executed),
    B=A.

but my “optimisation” changed these to the non-steadfast:

ac(A, A) :-
    !.
ac([_|A], A).

bx(A, A) :-
    \+ throw(executed).

This lack of inline expansion of (=)/2 isn’t a problem with regular DCGs, because they generate a simple equality that the compiler can optimise away. But with the multi-accumulator expansion, the equality becomes something like

#(F, [i|G], H, I, J, K)= #(A, B, Prev, C, D, E).

The generated code constructs the two terms and unifies them, which is somewhat slower than the equivalent

F=A,
[i|G]=B,
H=Prev,
I=C,
J=D,
K=E

I have a work-around: replace value(Prev)<prv,{Prev\=d} with not_value(d)<prv (and an appropriate goal expansion) but that more-or-less requires writing a DCG version of every non-DCG goal.

So, I can see three solutions:

  1. Make the compiler smarter in handling goals such as f(A,B)=f(C,D)
  2. Change the DCG expansion to look inside the body for potentially dangerous items (cuts, calls, etc.) and inlining the (=)/2 only if those aren’t present
  3. A new form of {...} (maybe {{...}}) that promises there’s nothing dangerous inside the {{...}.

Option #1 is probably the best because it would also benefit other situations. But I don’t know my way around the compiler – @jan please tell me where to look.

Goal expansion code
goal_expansion(<(On, Name,State0,State), Goal) :-
    expand_record(On, Name, State0, State, Goal).

% expand_record/2 uses record expansion instead of a dict, for faster
% performance. There must be a dcg_record_name/1 fact and an
% appropriate `:- record` directive using the same name.
expand_record(Literal,Name,State0,State, Goal), is_list(Literal) =>
    get_set_record(Name, State0, List, State, Tail),
    append(Literal, Tail, List),
    Goal = true.
expand_record(String,Name,State0,State, Goal), string(String) =>
    get_set_record(Name, State0, List, State, Tail),
    string_codes(String, Literal),
    append(Literal, Tail, List),
    Goal = true.
expand_record(value(Value),Name,State0,State, Goal) =>
    get_set_record(Name, State0, V0, State, V),
    Value = V0,
    V0 = V,
    Goal = true.
expand_record(set_value(Value),Name,State0,State, Goal) =>
    get_set_record(Name, State0, _V0, State, V),
    Goal = (V = Value).
expand_record(not_value(NotValue),Name,State0,State, Goal) =>
    get_set_record(Name, State0, V0, State, V),
    V0 = V,
    Goal = (NotValue \= V0).
expand_record(incr,Name,State0,State, Goal) =>
    get_set_record(Name, State0, V0, State, V),
    Goal = (V is V0 + 1).
expand_record(decr,Name,State0,State, Goal) =>
    get_set_record(Name, State0, V0, State, V),
    Goal = (V0 > 0, V is V0 - 1).
expand_record(Step,Name,State0,State, Goal), callable(Step) =>
    extend_goal(Step, [V0,V], StepEx),
    get_set_record(Name, State0, V0, State, V),
    Goal = StepEx.

dcg_record_name('#').

:- record '#'( % must match dcg_record_name/1
               acc,             % accumulator
               ops,             % list of opcodes
               prv,             % previous opcode
               out,             % result of running the opcodes,
               num, % the number to be output (for limiting search space)
               nsq  % see deadfish/3 (for limiting search space)
             ).

compileBodyUnify() in pl-comp.c. As is, it only deals with Var = Something and notably avoids real unification if Var is fresh (first in the jargon of the code). You can of course also do this expansion at the Prolog side. I’m not convinced it always wins because the compiled unification is not very efficient due to the need for debugger and decompiler support.

(2) could be worth considering.

As a more general remark, this touches on Prolog source → source transformation that should perform simple inlining, moving (notably) unifications around, etc.

On the other hand, possibly we can reach a bigger gain with the same effort with a careful look at the VM instruction set and its implementation. See results on arithmetic. There are more areas with low-hanging fruit.

Off the top of my head, I can think of these as potentially dangerous:

  • cut (!)
  • throw/1
  • call/1

So, if none of these are inside the {...}, then it would be safe to apply my optimisation?

On a related note, I took a look at compileBodyUnify() and decided that modifying it was likely more work than I want to do right now, partly because the code is in C and partly because I could just add a goal_expansion/2 that handles the particular situation … except when I tried adding a goal_expansion/2 rule, it didn’t work – it appears that the optimisation that results in the goal #(F, [i|G],H,I,J,K)=#(A,B,Prev,C,D,E) happens after all the expansion is done. :frowning:

The Aquarius compiler was written in Prolog (that’s why Peter Van Roy invented the EDCG notation); I wonder if we could add a layer to SWI-Prolog that’s written in Prolog, to allow more extensive optimisations?