Glitch in term_factorized/3, stack overflow

Was trying to run some monkey testing over my rep_compare/3.
I guess term_factorized/3 shows a glitch somewhere. Maybe because
it relies on buggy native compare/3 for their rbtrees? I don’t know

how the rbtrees are implemented. But during monkey testing it shows me:

?- aggregate_all(count, (between(1,1000000,_), random_cyclic(A),
random_cyclic(B), random_cyclic(C), rep_less(A,B),
rep_less(B,C), \+ rep_less(A,C)), F).
ERROR: Stack limit (1.0Gb) exceeded
ERROR:   Stack sizes: local: 0.7Gb, global: 0.2Gb, trail: 1Kb
ERROR:   Stack depth: 12,882,088, last-call: 50%, Choice points: 6
ERROR:   In:
ERROR:     [12,882,088] rbtrees:lookup(<compound s/2>,
_64416492, <compound black/4>)
ERROR:     [12,882,086] terms:insert_vars(<compound s/2>,
_64416526, <compound t/2>)

To reproduce the problem, here are the sources:

fuzzer.p.log (1,7 KB)
rep.p.log (1,6 KB)

This is a bad news for me, though the glitch might be fixed soon. I thought efficient C-level implementation of my semi_lex_sotrt based on compare_with_stack. It assumes term_facterized is built-in basic architecture (coalgebraic architecture) with a requirement that for x,y in the domain of the current state (colagebra), expanded x y are equal iff x=y.
I hope this property, i.e. keeping maximum bisimilarity
on the coalgebra( as state), is not expensive.

Please do not take this spontaneous comment serious. Anyway this is a reason why that glitch information sounds a pity for me.

Good ! Now I am wondering if smart “common sub terms identification” could be added to your factorize/3, which looks very likely but I am not sure for now.

EDIT. This is what I have in mind as for “common sub terms identifications”. Of course,
it might look like joking.

% ?- A=f(g(f(g(A)))),  B=f(g(B)),  factorize_minimum(h(A, B), S, L).
%@ A = B, B = f(g(B)),
%@ L = [S=h(_A, _A), _A=f(_B), _B=g(_A)] .

% ?- A=f(g(f(g(A)))),  B=f(g(B)),  factorize(h(A, B), S, L).
%@ A = B, B = f(g(B)),
%@ L = [S=h(_A, _B), _A=f(_C), _C=g(_A), _B=f(_D), _D=g(_B)].

factorize_minimum(T, S, L):-
	factorize(T, S, L0),
	minimize(L0, L).
minimize(X, Y):- select(E, X, X0),
	select(F, X0, X1),
	E = F,
	sort([E|X1], X2),
	minimize(X2, Y).
minimize(X, X).

Anonymous variable are not supported, they should be treated as atoms in coaglebra by definition.
Anyway, that is joking for the time being.

Oh, anonymous variables are not used. I must look into my codes.

Element equation of what factorize returns must be of the form
variable = f(variable, variable, variable,…).
For instance, X =f(1, A) is not allowed. This is required for constructing bisimulation
on the coalgera.

Yes, thats exacly.

I didn’t specify factorize/3 like that. You are making this up! It was specified like this:

Here is a fixed version:

factorize_minimum(T, S, L):-
    factorize(T, S, L0),
	minimize(L0, L).
minimize(X, Y):-
    select(V = E, X, X0),
	member(W = F, X0),
	E == F, !,
	V = W,
	minimize(X0, Y).
minimize(X, X).

The fixed version doesn’t destroy anything:

?- A = s(A,0), B = s(1,B), T = s(A,B), factorize_minimum(T, V, L).
A = s(A, 0),
B = s(1, B),
T = s(A, B),
L = [V=s(_A, _B), _A=s(_A, 0), _B=s(1, _B)].

But it cannot detect common subexpressions, if they are cyclic terms,
works only for common subexpressions that are acyclic, because the
Naish reference, used in my factorize/3 alrgorithm, is not unique.

?- A = s(1,0), T = s(A,A), factorize_minimum(T, V, L).
A = s(1, 0),
T = s(s(1, 0), s(1, 0)),
L = [V=s(_A, _A), _A=s(1, 0)].

?- A = s(A,0), T = s(A,A), factorize_minimum(T, V, L).
A = s(A, 0),
T = s(A, A),
L = [V=s(_A, _B), _A=s(_A, 0), _B=s(_B, 0)].

So you would need to redesign factorize/3 from ground up, to
get something that can do the same work as term_factorized/3 from
SWI-Prolog. There is a little bit more than what meets the eye.

I would like to do experiment modifying your factorize a little bit to see
whether expected minimization is gotten. Give me time.

Now I see a problem of my codes. My free-ride on you has failed.

Summarizing: term_factorized/3 uses library(rbtrees) which depends heavily on compare/3 which is buggy given cyclic terms so, in hindsight, not surprising it isn’t reliable.

In fact many (most, all?) of the problems encountered so far seem to be traceable back to the same bug, so I think a useful step would be to raise a new issue on using compare/3 with left-recursive rational trees (may not be the only issue but it’s simple) based on the following example:

?- X=(X,a), Y=(Y,b), Z=(Y,c), compare(A,X,Y), compare(B,Y,Z), compare(C,X,Z).
X = (X, a),
Y = (Y, b),
Z = (Y, c),
A = B, B = (<),
C = (>).

specifically, the violation of the transitive property of comparison system predicates.

I think Jan essentially localized the problem:

Is there any other useful input that would help resolve this issue?

I have managed to write my own minimum_factorization(X, U, C), where X is a given ground possibly cyclic term, C is minimum system of equations inluding U= A such that X and A are equivalent
when fully expanded.

The Current version may contain bugs.

% ?- A=f(B, 0), B=f(A, 1), factorization(f(A, B), U, F).
%@ A = f(f(A, 1), 0),
%@ B = f(A, 1),
%@ F = [U=f(_A, _B), _A=f(_B, _C), _C=0, _B=f(_A, _D), _D=1].
% ?- minimum_factorization(f(f(a), f(a)), U, L).
%@ L = [U=f(_A, _A), _A=f(_B), _B=a].
% ?- minimum_factorization(f(f(a), f(b)), U, L).
%@ L = [U=f(_A, _B), _B=f(_C), _C=b, _A=f(_D), _D=a].
% ?- A=f(B, B), B=f(A, A), minimum_factorization(f(A, B), U, F).
%@ A = B, B = f(_S1, _S1), % where
%@     _S1 = f(f(_S1, _S1), f(_S1, _S1)),
%@ F = [U=f(U, U)].
% ?- A=f(B, 0), B=f(A, 1), minimum_factorization(f(A, B), U, F).
%@ A = f(f(A, 1), 0),
%@ B = f(A, 1),
%@ F = [U=f(_A, _B), _A=f(_B, _C), _C=0, _B=f(_A, _D), _D=1].

minimum_factorization(T, S, L):-
	factorization(T, S, L0),
	minimize(L0, L).
minimize(X, Y):- select(U=_, X, X0),
	select(V=_, X0, X1),
	U \== V,
	simulate(U, V, X, [], Cs),
	sort(X0, X1),
	minimize(X1, Y).
minimize(X, X).

% ?- A=f(B, 0), B=f(A, 1), factorization(g(A,B), V, [], U, [], V).
% ?- A=f(B, 0), B=f(A, 1), factorization(g(A,B), V, C).
% ?- A=f(B, 0), B=f(A, 1), factorization(f(A,B), V, C).

factorization(X, V, C):- factorization(X, V, [], C, [], _).
factorization(X, _, _, _, _, _):- var(X), !, throw(unexpected_variable).
factorization(X, V, C, C, H, H):- member(U=T, H), T == X, !, V=U.
factorization(X, V, C, [V=U|C1], H, H1):- X=..[F|Xs],
	factorization_list(Xs, Vs, C, C1, [V=X|H], H1),
	U =..[F|Vs].
factorization_list([], [], C, C, H, H).
factorization_list([X|Xs], [V|Vs], C, C0, H, H0):-
	factorization(X, V, C, C1, H, H1),
	factorization_list(Xs, Vs, C1, C0, H1, H0).

unify_cluster([X, X|R]):- unify_cluster([X|R]).

unify_cluster_list([C|Cs]):-unify_cluster(C), unify_cluster_list(Cs).

get_target(U, A, X):- member(V=A, X), U==V, !.

% ?- simulate(U, V, [U=a, V=a], [], Cs).
simulate(U, V, X, Cs, Cs0):-
	get_target(U, A, X),
	get_target(V, B, X),
	functor(A, F, N),
	functor(B, F, N),
	union_find(U, V, Cs, Cs1),
	simulate_list(As, Bs, X, Cs1, Cs0).
simulate_list([], [], _, Cs, Cs):-!.
simulate_list([A|As], [B|Bs], X, Cs, Cs0):-
	(	in_same_cluster(A, B, Cs) -> Cs1 = Cs
	;	simulate(A, B, X, Cs, Cs1)
	simulate_list(As, Bs, X, Cs1, Cs0).
in_same_cluster(U, V, [C|_]):-
	member(X, C), X==U,
	member(Y, C), Y==V,
in_same_cluster(U, V, [_|Cs]):- in_same_cluster(U, V, Cs).

% ?- union_find(A, B, [], X), union_find(C, B, X, Y).
union_find(X, Y, Cs, Ds):-
	find_cluster(X, C, Cs, Cs0),
	(	member(U, C), U==Y -> Ds =[C|Cs0]
	;	find_cluster(Y, C0, Cs0, Cs1),
		append(C0, C, C1),
		Ds =[C1|Cs1]
find_cluster(X, C, [], []):-!, C=[X].
find_cluster(X, C, [D|Cs], Cs):-member(B, D), B==X, !, C=D.
find_cluster(X, C, [D|Cs], [D|Ds]):-find_cluster(X, C, Cs, Ds).

It doesn’t agree with SWI-Prolog:

?- term_factorized(f(f(a),f(b)), U, L).
U = f(f(a), f(b)),
L = [].

If you look at the SWI-Prolog source, I guess it walks all subterms,
and has also a counter, how many times it sees the same subterm.
This way you can detect common subterms, worth of factoring,

if the count>1. If you also arrange that cycles have count>1, you
are practically done. In a first pass you get all the counts. And it
seems to me that a second pass is enough to build the list.

The output system of equations has a unique solution, this is main purpose of
the minimum_factorization. However I am not 100% sure about this property, but 90 %.
Proof is needed.

EDIT. Each variable has a unique value, different from that of other variable.

Your solution doesn’t give the necessary innovation to get
closer to the SWI-Prolog term_factorized/3 predicate.

I mean its quite good:

?- Z=g(f(Z),Z), W=f(g(W,Z)), X=g(f(W),Z), minimum_factorization(X, Y, L).
L = [Y=g(_A, _B), _A=f(_C), _C=f(_B), _B=g(_C, _B)].

But compared to SWI-Prolog there is still something missing:

?- Z=g(f(Z),Z), W=f(g(W,Z)), X=g(f(W),Z), term_factorized(X, Y, L).
Y = g(f(_A), _B),
L = [_A=f(_B), _B=g(_A, _B)].

For implementing compare_with_stack in C-level, expensive == check becomes not necessary.
And also union-find process will become just derefencing. These two is a motivation of the minimum_factorization. I had no intention to give innovation to existing term_factorized.

Meaning of free variable of input terms is not fixed. It seems enough
to treat these free variable as atoms as far as mininimum_factorization concerns, which
is an easiest policy.

However, I see a serious problem behind such free-variable-as-atom policy, which
I have no idea to fix for now.

There is no problem in term_factorized/3 itself. If you accidentially use a new
internal variable for an external variable, the term_factorized/3 result will
be still correct. Only if you use the term_factorized/3 result for comparison

this might make a difference. It depends on the Prolog system and the
used Prolog rules what exactly happens. Also the map in term_factorized/3
doesn’t carry variables directly, you only store compounds in it, so there is

no danger of aliasing from the side of the term map. You can check the
source code of SWI-Prolog by yourself, you will immediately
see that “primitives” are not stored in the map:

primitive(Term) :-
primitive(Term) :-

add_map(Term, Map0, Map) :-
    (   primitive(Term)
    ->  Map = Map0

primitive don’t go into the collection. I didn’t make ‘$VAR’/1 also a primitive in
my code. Is this even documented for SWI-Prolog? So if you don’t make ‘$VAR’/1
primitive the corresponding clause becomes much easier. If atomic/1

would also cover variables I would use this check, but since I also want
to cover variables I simply used the inverted compound/1 check:

Given a rational tree T as coalgebra A, the minimum_factorization
constucts an ‘injective’ coalgebra B for the same T.
However if T has free variables treated as atoms and
if they are instantiated afterward, the solution of B
might be no more injective. This is bad of the purpose T to eliminate expensive test on ‘==’ from compare_with_stack (in C-level efficient implementation.)

This is what I am seeing as a serious problem in allowing free variables in the input terms to the minimum_factorization.

Your suggestion decreases degree of my seriousness for allowing free variables. Thanks.

Does it help forbidding free variables? There is a new surprise:

Because of the many complications of (@<)/2, especially also
the internal/external variable problem, one might speculate that
this problem below is caused by the anonymous variable _,

by the fact that the offending term is non-ground:

But I get the same stack overflow when I use a ground term:

/* SWI-Prolog 9.1.7 */
?- T = _S1, % where
    _S1 = s(s(_S1, _S2), 0),
              _S2 = s(_S2, _S1),  term_factorized(T, Y, L).
ERROR: Stack limit (1.0Gb) exceeded

So the chastity belt of @kuniaki.mukai doesn’t change the problem,
doesn’t make it more manageable. The stack overflow glitch is not only
present in non-ground terms, it sadly happens for ground terms as well.