E-Graph implementation using prolog variables

Hello all,
I have finally managed to cleanup an implementation of E-Graph, this time using prolog variables for class ids.
For a bit of context about egraph, I have recently made another topic explaining that, although that approach was a dead-end.

Implementation

The idea behind this new implementation is to represent an E-Graph as an ordset of pairs, where each pair is a Node-ClassId.
A Node is either:

  • a prolog non-compound term (atom, number, string, variable).
  • or a compound term with its argument replaced with the class id of the arguments like this: A+B where A would be the class id of the first argument, etc

Usage

To use the e-graph, you would use it in three steps:

  1. add_term(Term, Id): convert your prolog term into an e-graph (as an ord_list of pairs as explained above). Don’t loose that Id, it will become the rewritten term.
  2. saturate(Rules, Budget): saturate the e-graph given a set of rewrite rules. The budget is used to limit saturation iterations.
  3. extract: extract the lowest cost term. The Id produced in add_term will be instantiated with the rewritten term

These three predicates are actually DCG predicates, so that you can thread the egraph through easily.

Rewrite Rules

I have currently implemented the following rewrite rules for addition and multiplication:

  • comm: A+B => B+A, A*B => B*A
  • assoc: A+(B+C) => (A+B)+C, A*(B*C) => (A*B)*C
  • reduce: A+0 => A, A*1 => A, A*0 => 0
  • factorize: A+A => A*2, A+B*A => A*(B+1)
  • constant_folding: 1+2 => 3, 1*2 => 2 for any constants

Cost

I have implemented a naive cost system where:

  • a non-compound term has a cost of 1
  • compound term has a cost of the sum of its argument cost + 1
  • cost of a factorized node is slightly lower (0.9) to always prefer it from its non-factorized form

Example

% with this, we can create an egraph from a prolog term, saturate and extract
% the lowest cost term
example3(N, Expr, R) :-
   distinct(R, phrase((
      add_term(Expr, R),
      saturate([comm, assoc, factorize1, factorize2, constant_folding], N),
      saturate([reduce], N), % reduce with comm+assoc+factorize explodes
      extract
   ), [], _)).
% commutativity + associativity
?- egraph:example3(inf, a+b+c, R).
R = a+b+c ;
R = b+a+c ;
...
% Constant folding
?- egraph:example3(inf, 1+2, R).
R = 3 ;
R = 1+2 .
% factorization
?- egraph:example3(inf, a+a+b+2*a, R).
R = b+4*a .
% reduction
?- egraph:example3(inf, a+0, R).
R = a ;
R = 0+R ;
R = R+0.

Performances

I can do full equality saturation for 10 additions with commutativity + association rewrite rules in ~30 seconds.
The blog post on doing egraph with chr reports doing 6 additions in 40 seconds and can’t do 7 and says that a rust implementation of egraph, egglog, can do 10 in less than a second.
So, it is better than the chr implementation, but still far from the rust implementation.

Although this is still highly experimental and buggy, please try and test it and let me know what you think.
If anybody express some interests, I’ll make a pack out of it.

Notes

The design choice of using prolog variables for class ids makes congruence closure much simpler.
However, the implementation of rewrite rules is a real pain.
For this implementation to become practical, I believe we need a rewrite rule compiler to make writing rewrite rules much simpler…

This looks great!!! I’ll have to take a closer look. Have you seen this one? Unification Modulo E-Graphs | Hey There Buddo! The idea here was that egraphs could be used to form CLP(ground_equations), an egraph carried along in the prolog state could give you a richer notion of unification (modulo ground equations). I wonder if your implementation enables that.

No, I haven’t read that blog post and the idea is amazing (although I haven’t took the time to read the python code much).
In a previous post of mine, I thought about extending syntactical unification, but nobody brought egraph to the discussion.

So currently, the implementation treats variables like any other term.
But one could first saturate the egraph of a term, extract and check for syntactical unification:

?- phrase((
     add_term(X+a, Xa),
     add_term(a+b, AB),
     saturate([comm]),
     extract), [], G),
   unifiable(Xa, AB, Unifs).
Xa = X+a,
AB = b+a,
G = [X-node(X, 1), a-node(a, 1), b-node(b, 1), ...],
Unifs = [X=b] ;
Xa = a+X,
AB = a+b,
G = [X-node(X, 1), a-node(a, 1), b-node(b, 1), ...],
Unifs = [X=b] ;
false.

But I suppose this is very, very inefficient.
I would guess a more efficient unification strategy would be to traverse the egraph classes directly without doing any extraction ?

I also don’t really understand where would constraints came into play here ?
Could you help me understand with a simple example ?