New egraph pack v0.1.0 announcement for term rewriting

Hello all,

I have finally managed to publish the first version of my egraph (Equivalence Graph) implementation in SWI-Prolog!
For a bit of context, I have already made an earlier post detailing how I implemented the egraph.

I’m going to make a very big claim here: this pack provides the fastest term rewriting system implementation in the Prolog world!
The reason I say this is that this implementation can do full equality saturation of a 10-addition term with commutativity and associativity rewrite rules in under 15 seconds (on my laptop).
For comparison, this blog post says that egglog (a popular Rust implementation of egraphs) can do this in under a second, while his CHR implementation can’t even handle a 7-addition term.
If anyone knows of a Prolog TRS that can do this faster, please tell us :slight_smile:

What’s new is that I have finally managed to implement a rule compiler to translate rewrite rules into efficient compiled predicates.
Here is an example from the readme:

% myrule.pl
:- use_module(library(egraph)).

% Algebraic rules
egraph:rewrite(comm_add, A+B, B+A).
egraph:rewrite(assoc_add, A+(B+C), (A+B)+C).

% Rules with custom cost
egraph:rewrite(factorize_aa, A+A, 2*A, [cost(9r10)]).

% Rules with left-hand side conditions
egraph:rewrite(reduce_add0, A+B, [const(B, 0)], A, []).

% Rules with a Prolog body
egraph:rewrite(constant_folding, A+B, [const(A, VA), const(B, VB)], VC, [const(VC)]) :-
   VC is VA + VB.

Then you can use it like this:

?- use_module(library(egraph)).
true.

?- phrase((
       % convert a prolog term into an e-graph,
       % Optimized will be the rewritten term
       add_term(a+0, Optimized),
       saturate([reduce_add0]), % apply rewriting rules
       extract % extract optimal cost term in Optimized
   ), [], _Graph).
Optimized = a,
...

The rule compiler has the following features:

  • use Prolog terms to match and rewrite terms
    • use `'$VAR'(Var)` to match a literal Prolog variable
  • the rewritten term can have a custom cost, so that extraction will prioritize lower-cost nodes; the default cost is linked to the size of the Prolog term (atomic cost is 1, compound cost is arity + 1), see the worked example below
  • you can match and attach const number property nodes (this is how the constant-folding rule is implemented), and you can add arbitrary Prolog code in the body of a rule
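
For example, under the default cost model (my arithmetic, following the description above):

% cost of the term a+0:
%   cost(a) = 1 and cost(0) = 1      (atomic terms cost 1)
%   cost of the +/2 node = 2 + 1 = 3 (arity + 1)
%   total cost = 3 + 1 + 1 = 5
% so extraction prefers the rewritten term a (cost 1) over a+0 (cost 5)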

To install, you can just do:

$ swipl pack install egraph

You’ll just need SWI-Prolog > 9.3.23, because the pack uses ==> SSU DCG rules.

Please report any feedback, bugs, or thoughts here or on the GitHub issues page.
I know that TRSs are used in popular clp packages like clpBNR (@ridgeworks) and clpfd.
I would really like to hear the opinions of the authors of those packages on whether this could be useful to them.

@jan I can’t see my egraph package anymore on SWI-Prolog packages?
Did I do something wrong with my package publishing?

I just fixed a bug with dicts and published a new version. Is it because of that?

I’ve installed v0.2.0 and written a small test module based on the README example:

:- module(esimplify,[esimplify/2]).

:- use_module(library(egraph)).

esimplify(Exp,SExp) :-
	phrase((
       add_term(Exp, SExp),
       saturate([comm_add,assoc_add,factorize_aa,reduce_add0,constant_folding]),
       extract
   ), [], _Graph).

% Algebraic rules
egraph:rewrite(comm_add, A+B, B+A).
egraph:rewrite(assoc_add, A+(B+C), (A+B)+C).

% Rules with custom cost
egraph:rewrite(factorize_aa, A+A, 2*A, [cost(9r10)]).

% Rules with left-hand side conditions
egraph:rewrite(reduce_add0, A+B, [const(B, 0)], A, []).

% Rules with a Prolog body
egraph:rewrite(constant_folding, A+B, [const(A, VA), const(B, VB)], VC, [const(VC)]) :-
   VC is VA + VB.

% Dict support
egraph:rewrite(operator_fusion, array{op: array{op: A+B}+C}, array{op: A+B+C}).

When I run your example query:

[debug] 143 ?- esimplify(A+0,S).
ERROR: Unknown procedure: egraph:comm_add/7
ERROR: In:
ERROR:   [20] egraph:comm_add(0,_622,t(black('',_644,_646,''),black(...,_654,...,...)),_626,_628,_630,_632)
ERROR:   [19] egraph:rules([comm_add,assoc_add|...],t(black('',_724,_726,''),black(...,_734,...,...)),0-node(_748,1),_694,_696,_698,_700) at /Users/rickworkman/.local/share/swi-prolog/pack/egraph/prolog/egraph.pl:126
ERROR:   [18] egraph:match([0- ...,...|...],[comm_add,assoc_add|...],t(black('',_834,_836,''),black(...,_844,...,...)),_786,[],_790,[0- ...,...|...]) at /Users/rickworkman/.local/share/swi-prolog/pack/egraph/prolog/egraph.pl:142
ERROR:   [17] egraph:saturate([comm_add,assoc_add|...],inf,[0- ...,...|...],_902) at /Users/rickworkman/.local/share/swi-prolog/pack/egraph/prolog/egraph.pl:207
ERROR:   [16] egraph:saturate([comm_add,assoc_add|...],[0- ...,...|...],_966) at /Users/rickworkman/.local/share/swi-prolog/pack/egraph/prolog/egraph.pl:194
ERROR:   [15] '<meta-call>'(esimplify:(...,...)) <foreign>
ERROR:   [14] call_dcg(esimplify:(...,...),[],_1070) at /Applications/SWI-Prolog10.1.3.app/Contents/Resources/swipl/boot/dcg.pl:368
ERROR:   [13] phrase(esimplify:(...,...),[],_1110) at /Applications/SWI-Prolog10.1.3.app/Contents/Resources/swipl/boot/dcg.pl:360
ERROR:   [12] esimplify:esimplify(_1158+0,_1154) at /Users/rickworkman/Documents/Prolog/egraph_tst/esimplify.pl:6
ERROR:   [11] toplevel_call(user:user: ...) at /Applications/SWI-Prolog10.1.3.app/Contents/Resources/swipl/boot/toplevel.pl:1520

Did I miss something for this use case?

Ah, the rules are expanded in the module in which they are defined, in your case esimplify.
I can now see how this could be counterintuitive given the egraph module qualification of rewrite…
This was done to restrict where rewrite terms would be expanded.

So the quick fix is to module-qualify the rules in saturate, for example:
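
% in your module esimplify, qualify each rule name:
saturate([
    esimplify:comm_add,
    esimplify:assoc_add,
    esimplify:factorize_aa,
    esimplify:reduce_add0,
    esimplify:constant_folding
])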
Does anyone know of a better solution?

Yes, that fixed the exception. Now:

147 ?- esimplify(A+0,S).
S = _S1, % where
    _S1 = _S1+_S2,
    _S2 = 2*_S2 ;
S = _S1, % where
    _S1 = _S1+_S2,
    _S2 = _S2+_S2 ;
S = S+0 ;
S = _S2, % where
    _S1 = 2*_S1,
    _S2 = _S1+_S2 ;
S = _S2, % where
    _S1 = _S1+_S1,
    _S2 = _S1+_S2 ;
S = 0+S ;
A = S ;
A = S ;
A = S ;
false.

Is this the expected result? The desired answer (A = S) is there but why all the others? Should I expect this to be deterministic with “best” answer, or is some additional filtering required?

Yes, unfortunately…
e-graphs can produce highly cyclic terms.
Given that extraction is driven by a structural cost, the extraction code has a hard time prioritizing non-cyclic terms over cyclic ones.

edit: after thinking more about this, the problem is that I compute structural costs through clp(R), and cyclic terms produce cyclic cost constraints, which are never resolved.
Then, when sorting by cost, variables are actually sorted before numbers, so extract will produce all the cyclic terms first!
Definitely not what I wanted; I’ll try to fix it quickly.

No, extract is fully non-deterministic and will produce all possible variations of a term given the rewrite rules.
Fortunately, we can easily filter solutions in Prolog, with something like this:

% if you want the first non-cyclical solution
?- once((esimplify(a+0, S), \+ cyclic_term(S))).
S = a.
% if you want to explore the set of non-cyclical solution
?- distinct((esimplify(a+0, S), \+ cyclic_term(S))).
S = a ;
false.

In the context of term rewriting, is it ever desirable to produce cyclic terms? (Or are they just an artifact of the implementation?) If yes, is the answer the same if the input term is not cyclic?

For my use case, I’m pretty sure cyclic solutions can be eliminated, so I’ll build that into esimplify/2. That leaves me with non-determinism. As per your suggestion, for now I’ll use library(solution_sequences):distinct, also built into esimplify. Now:

?- esimplify(A+0,S).
A = S ;
false.

?- esimplify(0+A,S).
A = S ;
false.

OK, but:

?- esimplify(A+B,S).
S = A+B ;
S = B+A ;
false.

?- esimplify(A+0+A,S).
S = 2*A ;
S = A+A ;
false.

?- esimplify(A+B+A,S).
S = B+2*A ;
S = B+(A+A) ;
S = 2*A+B ;
S = A+A+B ;
S = A+B+A ;
S = B+A+A ;
S = A+(A+B) ;
S = A+(B+A) ;
false.

So this still leaves the additional job of selecting one from the several distinct solutions, and some are definitely better than others for my particular application. Perhaps the situation can be improved with a more selective set of rules, or perhaps by exploring the cost option.

I also did a simple performance measurement to compare this with my current, rather simple-minded, expression simplification library:

% fail to generate all solutions for picking the best
?- time((esimplify(A+0+A,S),fail)).
% 6,805 inferences, 0.002 CPU in 0.003 seconds (54% CPU, 4200617 Lips)
false.

?- time((esimplify(A+B+A,S),fail)).
% 8,564 inferences, 0.003 CPU in 0.003 seconds (88% CPU, 2793216 Lips)
false.


% using library in pack clpBNR
?- time(simplify_constraint:simplify(A+0+A,S)).
% 157 inferences, 0.000 CPU in 0.000 seconds (27% CPU, 1427273 Lips)
S = 2*A.

?- time(simplify_constraint:simplify(A+B+A,S)).
% 249 inferences, 0.000 CPU in 0.000 seconds (79% CPU, 3557143 Lips)
S = 2*A+B.

I admit this is comparing apples to oranges, but it illustrates the performance gap to overcome in replacing the current, customized expression rewriting system in clpBNR.

Somehow the US mirror did not sync and thus the visibility depends on the backend chosen by the CDN. Synced the mirror by hand, but I think something is wrong there :frowning:

In Knuth-Bendix algorithms, if you have an equation a = b, you reorient it to either a |-> b or b |-> a rewriting, depending on a term-size idea. Hopefully you get something for which you can show termination, so that your rewriting becomes terminating.

You can, though, do rewriting without such orienting ideas, like some egraph approaches that do saturation and iterative deepening. But if you are interested in termination and hence normal forms, even for ultra-efficient rewriting without the breadth-first aspect of egraphs, cyclic terms can make it more difficult to define a term-size idea.

Egraphs can possibly also be quite fast in the end, since they don’t have to go down some strictly oriented rewriting rules, and might spot a short connection between two terms, right?

See also:

Reduction Order
https://mathworld.wolfram.com/ReductionOrder.html

It seems not.
I didn’t put enough effort into the extraction part of the library.
Writing the rule compiler already seemed like such a feat that I wanted to share it with others.
Given your feedback, extraction is going to be my next focus for improving this library.

The whole point of tracking the cost per node is to prioritize which term gets extracted first (which is currently somewhat bugged because of cyclic terms).
If the first solution is correct for your needs, you would just need to take the first solution with once/1.
I don’t really have a reason for extract to be non-deterministic. Moreover, it seems like other egraph libraries have dedicated min-cost extraction primitives.

Even if we compare only extracting the first solution, egraph is still something like 10x to 100x slower than your simplify solution in clpBNR.
I think I can still optimize extraction a lot, which should greatly reduce the overhead of using egraph.

On the other hand, egraph can do things that your simplify implementation cannot (I think?):

% adding these rules
egraph:rewrite(factorize_aba, A+B*A, A*(B+1), [cost(9r10)]).
egraph:rewrite(distribute, A*(B+C), A*B+A*C).
egraph:rewrite(cancel_add_sub, A+B-A, B).
?- time(once(esimplify(A+B*A,S))).
% 9,097 inferences, 0.003 CPU in 0.003 seconds (99% CPU, 3173606 Lips)
S = A*(B+1).

?- time(clpBNR:simplify(A+B*A,S)).
% 81 inferences, 0.000 CPU in 0.000 seconds (95% CPU, 1249730 Lips)
S = B*A+A.

?- time(clpBNR:simplify(A*(X+Y) - A*X,S)).
% 201 inferences, 0.000 CPU in 0.000 seconds (97% CPU, 1733177 Lips)
S = A*(Y+X)-X*A.

?- time(esimplify(A*(X+Y) - A*X,S)).
% 7,329 inferences, 0.002 CPU in 0.003 seconds (94% CPU, 2948093 Lips)
S = A*(X+Y)-A*X ;
% 37 inferences, 0.000 CPU in 0.000 seconds (91% CPU, 980210 Lips)
S = A*(Y+X)-A*X ;
% 65 inferences, 0.000 CPU in 0.000 seconds (91% CPU, 1797020 Lips)
S = A*Y ; % extraction is bugged in this case, should be first

I suspect there is a case to be made for the cost function to be user-definable. For my purposes, I defined the cost of an expression to be the sum of the number of operators and variable occurrences in the expression, but that probably doesn’t fit all use cases.

I then collect all the solutions (using bagof) and select the one in the list with the lowest cost:

esimplify(Exp,SExp) :-
	bagof(S,esimplify_(Exp,S),Ss),     % collect all solutions
	ecost(Exp,Cin),                    % cost of input expression
	mincost(Ss,Cin,Exp,SExp).          % select lowest cost alternative

esimplify_(Exp,SExp) :-
	phrase((
       add_term(Exp, SExp),
       saturate([
         esimplify:comm_add,
         esimplify:assoc_add,
         esimplify:factorize_aa,
         esimplify:reduce_add0,
         esimplify:constant_folding
        ]),
       extract
   ), [], _Graph),
   acyclic_term(SExp).
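
For reference, a minimal version of the two helpers, consistent with that cost definition (sum of operator and variable occurrences; the input expression is the fallback if nothing is strictly cheaper):

% ecost(+Exp, -Cost): count operator and variable occurrences
ecost(Exp, 1) :- var(Exp), !.        % a variable occurrence costs 1
ecost(Exp, 0) :- atomic(Exp), !.     % constants are free
ecost(Exp, C) :-                     % an operator costs 1, plus its arguments
    Exp =.. [_|Args],
    foldl([A,C0,C1]>>(ecost(A, CA), C1 is C0 + CA), Args, 1, C).

% mincost(+Solutions, +Bound, +Best, -SExp): keep the strictly cheapest solution
mincost([], _, Best, Best).
mincost([S|Ss], Cmin, Best, SExp) :-
    ecost(S, C),
    (   C < Cmin
    ->  mincost(Ss, C, S, SExp)
    ;   mincost(Ss, Cmin, Best, SExp)
    ).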

Seems to do what I want for these simple tests:

?- esimplify(A+0+A,S).
S = 2*A.

?- esimplify(B+0+A,S).
S = B+A.

So the fact that extract is nondeterministic works in my favour for this use case. But it comes at the expense of generating all solutions then testing them. But maybe there’s something useful in this prototype that can be pushed back into egraph to improve overall performance or usability.

Almost certainly. The clpBNR simplify is really only looking at the low-hanging fruit to generate a hopefully simpler expression, with the proviso that it has to be quick, since pretty much every added constraint is simplified. It’s primarily useful for machine-generated code, e.g., the result of a symbolic partial differentiation. I would think that egraph could be used to find an optimal equivalent of any expression, however optimal is defined. (The clpBNR version also has some special requirements, e.g., no floating point arithmetic, and 0*X can’t be reduced in cases where the domain of X can include infinity.)

Now that I have the basics running, it would be interesting to expand my set of egraph rules to cover what the clpBNR simplify does, to better understand the gap between it and “optimal” (as defined by the egraph version).

I have published a new version of this pack (v0.3.0) with the following fixes:

  • compile the list of rules given to saturate on the fly, to avoid meta-calls and recursion during matching. I can see a performance improvement when doing small-term saturation over and over again with a lot of rules. This also removes the need to module-qualify rules.
  • complete rewrite of extract based on a Dijkstra min-cost computation. extract is now deterministic and always extracts the minimal-cost representation of a rewritten term.
    • it also naturally avoids any cyclic terms
    • WARNING: the interface of extract has changed

Here is the new way of using extract:

?- use_module(library(egraph)).
true.

?- phrase((
       add_term(a+0, Id),
       saturate([reduce_add0]),
       extract(Id, Optimized)
   ), [], _Graph).
Optimized = a,

Unfortunately, the use of Dijkstra didn’t reduce the overhead of using egraph, but at least extract is now correct.
It is still much, much slower than clpBNR:simplify, for example.

Yes, for sure.
I think that your use-case could be cleanly integrated into the current cost mechanism in egraph.
I think you would only need to override the cost for constants to be 0.
Maybe I could add a user-definable hook in add_term to override the default cost assignment?
Or you could add a fake rewrite(A, [const(A, Const)], A, [cost(0)]) rule, but that would be very bad performance-wise?

Yes, maybe there could be some instances where you don’t need to be quick but would prefer more optimized expressions?
For example, when calling a labelling predicate like global_minimum or enumerate, you could optimize once before doing the actual labelling?
Or maybe provide an optimize_constraints predicate that the user could manually call, which would reevaluate all the constraints after an egraph optimization?

So, currently, the left-hand-side options in rewrite rules only include const(A, B) terms.
However, I could let the user put any kind of guard code in there.
Meaning you could write [const(A, B), \+ float(B)] to avoid matching any floats.
Or [domain(X, real(Min, Max)), Min =\= -inf, Max =\= inf].
But that would mean trusting the user to provide correct guards for SSU rules.
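
Written out as a full rule, such a guard might look like this (hypothetical; the extra guard goals are not supported yet):

% hypothetical: reduce A+B to A only when B is a non-float constant equal to 0
egraph:rewrite(reduce_add0_exact, A+B, [const(B, VB), \+ float(VB), VB =:= 0], A, []).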

I would be very interested in implementing any features that would allow you to express the requirements of clpBNR cleanly.
Please list them and I will try to implement each of them.

If you have an appropriate normal form, you can use multiset operations, which is what @Boris was discussing. For example:

    p = X - Y

    q = 2 Y

This is just the following, but with maps because of the negative numbers:

?- ord_merge([X-1,Y-(-1)], [Y-2], R).
R = [X-1, Y-1]

Or more commonly understood:

     p + q = X + Y

But it would require a stable variable order. And since ord_merge/3 is a terminating Prolog program, there also exists a conditional equality rewriting (with a reduction order) that would give the same result.

It could have rules (sketch) for the ordset cases <, = and >:

    (P+X*A)+(Q+Y*B) = (P+Q+X*A)+Y*B          X @< Y
    (P+X*A)+(Q+Y*B) = (P+Q)+X*(A+B)          X == Y
    (P+X*A)+(Q+Y*B) = (P+Q+Y*B)+X*A          X @> Y
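
ord_merge/3 is not a standard SWI-Prolog library predicate; a minimal sketch consistent with the example result above (merge two ordered Var-Coefficient maps, summing coefficients on equal keys and dropping zero sums) would be:

ord_merge([], Qs, Qs).
ord_merge([P|Ps], [], [P|Ps]).
ord_merge([X-A|Ps], [Y-B|Qs], R) :-
    compare(Order, X, Y),               % standard order of terms on the keys
    (   Order = (<)
    ->  R = [X-A|R1], ord_merge(Ps, [Y-B|Qs], R1)
    ;   Order = (>)
    ->  R = [Y-B|R1], ord_merge([X-A|Ps], Qs, R1)
    ;   C is A + B,                     % equal keys: sum the coefficients
        (   C =:= 0
        ->  ord_merge(Ps, Qs, R)        % drop cancelled entries
        ;   R = [X-C|R1], ord_merge(Ps, Qs, R1)
        )
    ).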

But there might be reasons to use egraphs nevertheless.
I only wanted to give an explanation of why you might see a drastic speed difference between the different methods.

That seems to address the major reasons for writing additional user code to filter the results of the previous non-deterministic version. But I don’t really understand the operational semantics of the various forms of rewrite rules. As a simple example, I tried to write a set of rules to simplify expressions using only the add operation, e.g.,

A  ==> A
A + 0  ==> A
A + A  ==> 2*A
A + A + A  ==> 3*A
A + A + A + A  ==> 4*A
A + 0 + A  ==> 2*A
A + B + A  ==> 2*A+B
A + B + A + B  ==> 2*(A+B)
...

My attempt at a set of rules for this:

egraph:rewrite(comm_add, A+B, B+A).

egraph:rewrite(assoc_add, A+(B+C), (A+B)+C).

egraph:rewrite(factorize_add, A+A, 2*A).

egraph:rewrite(reduce_add0, A+B, [const(B, 0)], A, []).
egraph:rewrite(factor_add1, N*A+A, (N+1)*A).
egraph:rewrite(factor_add,  N1*A+N2*A, (N1+N2)*A).

egraph:rewrite(constant_folding_add, A+B, [const(A, VA), const(B, VB)], VC, [const(VC)]) :-
   VC is VA + VB.

but I couldn’t get anything useful. The two typical results were either no simplification or a stack-limit-exceeded (1.0 GB) error, so I’m clearly not understanding something.

I would leave this decision up to the user, i.e., pre-optimize any expressions before adding a constraint via {..} if that’s useful for the problem at hand. I don’t see any merit in trying to optimize anything after the constraints have been defined.

I don’t follow, but then, as I said, I don’t really understand the operational semantics of this pack’s rewrite rules. At least one form of the rewrite rule has a body; why not put this code there? How does SSU impact how the user writes these rules? Shouldn’t they be largely declarative?

rewrite(RuleName,From,To,Cost) :- Body.  % rewrite From to To with cost Cost

The biggest difference between egraph and Knuth-Bendix based approaches seems to me now that in egraph, as you have implemented it, the rules lead to non-deterministic expansion.

This is unlike Knuth-Bendix, which not only takes an equation A = B and orients it into A |-> B or B |-> A rewriting. It is also the case that in each reduction step, where a redex is identified and then reduced by a rule, the result is committed. This means the old term is no longer subject to rewriting; the system proceeds with the new term. It then stops when it finds no more redexes.

Which redex is picked is part of the rewriting strategy:

Reduction strategy
In a term rewriting system a rewriting strategy specifies, out
of all the reducible subterms (redexes), which one should be
reduced (contracted) within a term.
https://en.wikipedia.org/wiki/Reduction_strategy#Term_rewriting

A deterministic strategy picks only one.

So, I have worked on your examples here: egraph/examples/test_clpBNR.pl at d99b1a2db171588530c4262752e2e1c416979d60 · kwon-young/egraph · GitHub

There were two bugs in the rule compiler, which I have now fixed. Thank you!
I have published v0.4.0, which fixes this.

The reason that some rewrites do not terminate is not a bug, but the fact that combining reduction rules with commutativity/associativity/factorize rewrite rules produces an exploding system of rules that generates new eclasses infinitely.
The basic solution is to stratify the saturation by first doing comm/assoc/factorize and then doing the reduction afterwards, as is done here: egraph/examples/test_clpBNR.pl at d99b1a2db171588530c4262752e2e1c416979d60 · kwon-young/egraph · GitHub
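
In sketch form, stratifying just means chaining two saturate calls (rule names taken from your rule set above; see the linked file for the real version):

?- phrase((
       add_term(A+0+A, Id),
       saturate([comm_add, assoc_add, factorize_add]),   % expansion phase
       saturate([reduce_add0, constant_folding_add]),    % reduction phase
       extract(Id, Optimized)
   ), [], _Graph).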

What @Johnny_Rotten has written is relevant in the sense that, if you are very careful with how you specify your rewrite rules, you can actually prove termination through the Knuth-Bendix algorithm.
This puts a lot of restrictions on the rewrite rules one can specify (for example, commutativity A+B => B+A is not allowed), so much so that I have always found Knuth-Bendix to be kind of useless.

So, the separation of guard code and body code is largely there to ease the compilation of the rules.
In theory, I could put the whole body inside the guard of the compiled SSU predicate, but that is hard for practical “reasons”.
Basically, rewrite rules are compiled into nested calls of SSU predicates, where each nested call checks a single node of the left-hand-side pattern.

Now that I have the addition tests working, would you mind listing the other types of rewrite rules needed for clpBNR?

Prof. Wu used some polynomial rewriting in 1980, and found a proof of a theorem that human geometers hadn’t found in the 300 years since Pascal formulated it in 1640, when he was 16 years old.

Prof. Wu used a 9835A; it had a single 16-bit 4-chip hybrid processor and a maximum of 256KB of RAM. It ran HP Enhanced BASIC and showed Pascal’s Mystic Hexagram Theorem.

How fast can Egg or Egraphs solve it, with their bird’s-eye view of the entire labyrinth at once? Although I must admit the computer algebra guys from back in the ’70s and ’80s, like Wu, Buchberger, etc., sometimes also had quite efficient methods subscribing to some matrix arrangement of polynomials.

See also:

Rewrite rules for solving geometric problems
Beat Bruderlin - 1993
https://www.researchgate.net/publication/210231846

Mechanical theorem proving in elementary geometries
Wu Wen-Tsun - 1986
https://link.springer.com/article/10.1007/BF02328447

But you found a need to modify my rule set, so some questions:

% Factoring terms with constant multipliers
egraph:rewrite(factorize_naa, N*A+A, [const(N, VN)], M*A, [const(M)]) :- 
    M is VN + 1.

egraph:rewrite(factorize_ana, A+N*A, [const(N, VN)], M*A, [const(M)]) :- 
    M is VN + 1.

This is an example of what I mean about my lack of understanding of the operational semantics. Why do I need two rules? Isn’t the second achieved by applying comm_add? I.e., A+N*A is equivalent to N*A+A by comm_add, which is then reduced to M*A via factorize_naa.

Also note that my intent was to handle any N in the rule, i.e., the rewrite result is (N+1)*A (N needn’t be a constant). Now if N were a constant, N+1 would be reduced to a single constant using the constant-folding rule (constant_folding_add in my rule set).

Well, that’s a problem!

That doesn’t seem very satisfactory. How would you describe the properties of the rules such that you can divide them into the two required sets? Could this be done inside egraph?

I’m wondering why this should be of concern to a user. Can’t this level of detail be abstracted away?

clpBNR doesn’t “need” any rewrite rules. It has a built-in, quick (and dirty) constraint simplification function which could be implemented with a general-purpose rewriting system, given adequate performance. While I’m not currently contemplating this, the requirements for such a system have largely been discussed:

  • arbitrary code in the body: a) to filter the input expression (type checks, or things that can’t be done with unification alone), and b) to call helper predicates provided by the system (e.g., is) or by the user.
  • the ability to customize the selection of the result through the cost option or some other mechanism. (I’m not sure what the current default criteria are, or how the cost option can be used to modify the behaviour.)

But also, and more importantly for the general user, a rewriting system that’s understandable and easy to use.

In addition, I think there’s a need to provide some tools or guidance on how to “debug” the rewrite rules. The rules themselves seem to be transformed in the compilation process, so it’s difficult to discern root causes when an unexpected result is produced.

Sorry, you are right about all this.
I have corrected the rules exactly as you provided them: egraph/examples/test_clpBNR.pl at f362c3b883386cb2e7f5a1b7a10844bfb2f26971 · kwon-young/egraph · GitHub

The only modification I made was to add a custom cost to the factor_add1 rule:

egraph:rewrite(factor_add1, N*A+A, (N+1)*A, [cost(9r10)]).

Because N*A+A has the same structural cost as (N+1)*A, we lower the cost of the node (N+1)*A to prioritize it during extraction.

Well, as said before, one could try to use the Knuth-Bendix algorithm to prove that a set of rewrite rules terminates, but that puts so many constraints on which rules you can write that I find it quite useless.
Egraphs (in general) are better than naive term rewriting systems because they can natively represent cycles, so rules like commutativity don’t lead to non-terminating saturation.
But it is still possible (and very easy) to write a set of rewrite rules that doesn’t terminate, as we have seen here.
And it is essentially impossible to automatically prove non-termination (this is the halting problem).

Note that saturate has a two-argument version where you can specify a maximum number of iterations. This will bound your saturation budget and stop eventually, even if the rule set is non-terminating.
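
For example (a sketch; the bound as second argument matches the saturate/4 frames visible in the earlier stack trace):

?- phrase((
       add_term(A+B+A, Id),
       saturate([comm_add, assoc_add, factorize_add], 10),  % at most 10 iterations
       extract(Id, Optimized)
   ), [], _Graph).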

Yes, of course. I’m just saying that writing the rule compiler pushed my brain to its limits…
I’ll try, but I can’t promise anything.

Just to be clear, I don’t want clpBNR to integrate egraph or anything.
I am just asking for potential use-cases for my egraph library that I can try.
As seen above, this leads to bugfixes and actual improvements to the library.
I am really thankful that you took the time to try my library’s ideas ^^

I agree, I’ll see what I can do.

Note that you can already do this in the body of a rule, as is done in the constant_folding rewrite rule.
But the body of a rule currently needs to be deterministic.

You can already do that.
Each node in the egraph has a default cost of 1.
When computing the total cost of a compound node for extraction, I sum the cost of each argument plus the cost of the compound node itself.
So, by altering the cost of a rewritten term, you can influence its priority for extraction, as is done in the factorize rules.

Yes, I do feel that need too.
I have tried adding a debug topic, which you can enable as a directive, and which will print the results of expanding the rewrite rules.
The problem is that if you don’t know how rewrite rules work under the hood, you won’t understand a thing.
But I have had some success feeding the whole debug output to an LLM, and it can pinpoint bugs very accurately.
I’ll try to add a section about this in the README.

Yes, of course. Because I spent so much time on the implementation, I don’t really know what will trip up new users trying the library for the first time.
That’s why I am asking for feedback on this forum :slight_smile:

I would also like to take some time to explain why egraph rewrite rules can’t be expressed exactly like this.
For example, with egraphs, you can’t do something like:

rewrite(constant_folding, A+B, C, 1) :-
  number(A), number(B),
  C is A+B.

The reason is that an egraph represents a term by flattening it into a list of nodes.
For example, 0+1 is flattened into the list of nodes [(A+B)-AB, A-0, B-1].
Similarly, when compiling a rewrite rule, the left-hand-side pattern is flattened into a list of nodes that need to be found in the egraph.
For example, 0+D is flattened into [(C+D)-CD, C-0], and then each pair is checked for membership in the list of nodes:

  • (C+D)-CD matches (A+B)-AB
  • C-0 matches A-0

Note that C is never actually unified with 0 here; rather, we check that a node inside the eclass C has the literal 0.

The implication is that when doing:

rewrite(constant_folding, A+B, C, 1) :-
  number(A), number(B), % this won't work as A and B are eclass id variables
  C is A+B.

A and B are not numbers but eclass id variables.
The workaround that egraph libraries provide is to attach custom properties to eclass ids.
In our case, since eclass ids are variables, I use attributed variables to store properties: add_term does the equivalent of put_attr(A, const, 0) when parsing the term, and const(A, VA) in a rewrite rule is translated into get_attr(A, const, VA).
In the end, I would like to make the library modular so that users can define any custom properties on eclass ids.
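
To make the plumbing concrete, here is a toy illustration of that attribute mechanism at the toplevel (just plain attributed variables, not the actual compiled code):

?- put_attr(A, const, 0),    % what add_term does for the literal 0
   get_attr(A, const, VA).   % what const(A, VA) in a rule translates to
VA = 0,
put_attr(A, const, 0).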

@ridgeworks Does this help you better understand the operational semantics of egraph rewrite rules? Or, in other words, why I need to separate guard code operating on eclass ids from the body of the rule?