E-Graph matching with monotonic tabling

Hello,
I have recently stumbled at a very interesting data structure called e-graph (equivalence graph) used to implement term rewriting system (TRS).
When we try to do TRS in prolog, although the formulation is quite elegant, we are immediately confronted to the explosion of possible rewrites.
E-Graph kind of avoid this explosion by grouping equivalent nodes in classes and apply rewrites on classes.
Here is a nice blog post explaining egraph in general, and another that implements it in chr.

I already have tested multiple approach and currently trying to reuse monotonic tabling for applying rewrite rules.
Here is a minimal implementation of the idea with commutative and associative rewrite rules for addition:

:- table (match/2, rewrite/2, rule/2) as monotonic.
:- dynamic node/2 as monotonic.

% flattened representation of (a+b)+c
% node(Node, Class)
node(a, a).
node(b, b).
node(c, c).
node(a+b, ab).
node(ab+c, abc).

% commutativity
rule(A+B, B+A).
% associativity
rule(AB+C, A+BC) :-
   node(A+B, AB),
   % don't insert BC if it already exists
   (  node(B+C, BC)
   -> true
   ;  node(B, BClass),
      node(C, CClass),
      atom_concat(BClass, CClass, BC),
      assert(node(B+C, BC))
   ).

rewrite(X, X).
rewrite(X, Z) :-
   rule(X, Y),
   rewrite(Y, Z).

match(Y, Class) :-
   node(X, Class),
   rewrite(X, Y).

:- prolog_listen(match/2, connection_change).

connection_change(Type, _:match(From, To)) :-
   format('~p: ~p are now match~n', [Type, From-To]).

This already works quite well:

?- match(_Node, _Class), Node = _Node-_Class.
new_answer: c+a-ca are now match
new_answer: a+c-ca are now match
new_answer: c+b-bc are now match
new_answer: b+c-bc are now match
Node = c-c ;
Node = b-b ;
Node = a-a ;
Node = c+b-bc ;
Node = c+a-ca ;
Node = c+ab-abc ;
Node = b+c-bc ;
Node = b+a-ab ;
Node = a+c-ca ;
Node = a+b-ab ;
Node = ab+c-abc.

But it is missing 4 nodes: [a+bc, bc+a, b+ac, ac+b].
Very weirdly, querying [a+bc, bc+a] directly works ???

?- match(a+bc, abc).
true.

?- match(bc+a, abc).
true.

But not [b+ac, ac+b]:

?- match(ac+b, abc).
false.

?- match(b+ac, abc).
false.

I am kind of stumped, could somebody explain why ?

2 Likes

Ah, my bad, it was the construction of the BC class name in the assoc rule which was incorrect:

rule(AB+C, A+BC, assoc) :-
   node(A+B, AB),
   (  node(B+C, BC)
   -> true
   ;  node(B, BClass),
      node(C, CClass),
      % need to sort subclasses so that the resulting class is unique with respect to permutation
      sort([BClass, CClass], [C1, C2]),
      atom_concat(C1, C2, BC),
      assert(node(B+C, BC))
   ).

Now it works:

?- match(_Node, _Class), Node = _Node-_Class.
new_answer: c+a-ac are now match
new_answer: a+c-ac are now match
new_answer: c+b-bc are now match
new_answer: b+c-bc are now match
Node = c-c ;
Node = b-b ;
Node = a-a ;
Node = bc+a-abc ;
Node = c+b-bc ;
Node = c+a-ac ;
Node = c+ab-abc ;
Node = ab+c-abc ;
Node = ac+b-abc ;
Node = b+c-bc ;
Node = b+ac-abc ;
Node = b+a-ab ;
Node = a+c-ac ;
Node = a+b-ab ;
Node = a+bc-abc.
2 Likes

Unfortunately, it seems that the implementation is not completely correct.
I get 195 nodes for 5 additions, whereas I should get 185 nodes…

node(a, a).
node(b, b).
node(c, c).
node(d, d).
node(e, e).
node(a+b, ab).
node(ab+c, abc).
node(abc+d, abcd).
node(abcd+e, abcde).

% use a global counter to generate class ids
next(Next) :-
   nb_getval(next, Next),
   Next1 is Next + 1,
   nb_setval(next, Next1).

:- nb_setval(next, 1).

rule(AB+C, A+BC, assoc) :-
   node(A+B, AB),
   (  node(B+C, BC)
   -> true
   ;  next(BC),
      assert(node(B+C, BC))
   ).
?- time(once(match(_, _))), findall(Node-Class, match(Node, Class), Nodes),
   length(Nodes, N).
Nodes = [3+e-20, 3+b-7, 3+b-32, 3+a-29, 3+a-8, 3+ab-abcd, 2+b-abc, ... + ... - 7, ... - ...|...],
N = 195.

I am wondering if this is because I haven’t written the full congruence closure ?
Anybody has an idea ?

I am sorry to report that after a lot of experimentation with tabling, I couldn’t write an elegant version of the full congruence closure with tabling.

The problem has to do with the fact that when using tabling (so backtracking) with dynamic predicates, we need to ground the class ids.
However, when doing congruence closure, one need to replace all uses of a class id in all pairs that uses it. For example, let’s say you have the following pairs Node-Class: [a-a, f(a)-fa, f(fa)-ffa] and you want to unify a and ffa. You need to replace every instance of ffa in every pairs.

Doing this is extremely slow and very tedious. Moreover, I couldn’t manage to do batch rewrites and would always get incorrect nodes.

I have written another approach based on variables for class ids that is correct and scales quite a bit better.
I will post it when I have figured out how to make cost-based extraction to work.