Glitch in term_factorized/3, stack overflow

Was trying to run some monkey testing over my rep_compare/3.
I guess term_factorized/3 shows a glitch somewhere. Maybe because
it relies on buggy native compare/3 for their rbtrees? I don’t know

how the rbtrees are implemented. But during monkey testing it shows me:

?- aggregate_all(count, (between(1,1000000,_), random_cyclic(A),
random_cyclic(B), random_cyclic(C), rep_less(A,B),
rep_less(B,C), \+ rep_less(A,C)), F).
ERROR: Stack limit (1.0Gb) exceeded
ERROR:   Stack sizes: local: 0.7Gb, global: 0.2Gb, trail: 1Kb
ERROR:   Stack depth: 12,882,088, last-call: 50%, Choice points: 6
ERROR:   In:
ERROR:     [12,882,088] rbtrees:lookup(<compound s/2>,
_64416492, <compound black/4>)
ERROR:     [12,882,086] terms:insert_vars(<compound s/2>,
_64416526, <compound t/2>)

To reproduce the problem, here are the sources:

fuzzer.p.log (1,7 KB)
rep.p.log (1,6 KB)

This is a bad news for me, though the glitch might be fixed soon. I thought efficient C-level implementation of my semi_lex_sotrt based on compare_with_stack. It assumes term_facterized is built-in basic architecture (coalgebraic architecture) with a requirement that for x,y in the domain of the current state (colagebra), expanded x y are equal iff x=y.
I hope this property, i.e. keeping maximum bisimilarity
on the coalgebra( as state), is not expensive.

Please do not take this spontaneous comment serious. Anyway this is a reason why that glitch information sounds a pity for me.

Good ! Now I am wondering if smart “common sub terms identification” could be added to your factorize/3, which looks very likely but I am not sure for now.

EDIT. This is what I have in mind as for “common sub terms identifications”. Of course,
it might look like joking.

% ?- A=f(g(f(g(A)))),  B=f(g(B)),  factorize_minimum(h(A, B), S, L).
%@ A = B, B = f(g(B)),
%@ L = [S=h(_A, _A), _A=f(_B), _B=g(_A)] .

% ?- A=f(g(f(g(A)))),  B=f(g(B)),  factorize(h(A, B), S, L).
%@ A = B, B = f(g(B)),
%@ L = [S=h(_A, _B), _A=f(_C), _C=g(_A), _B=f(_D), _D=g(_B)].

factorize_minimum(T, S, L):-
	factorize(T, S, L0),
	minimize(L0, L).
%
minimize(X, Y):- select(E, X, X0),
	select(F, X0, X1),
	E = F,
	sort([E|X1], X2),
	minimize(X2, Y).
minimize(X, X).

Anonymous variable are not supported, they should be treated as atoms in coaglebra by definition.
Anyway, that is joking for the time being.

Oh, anonymous variables are not used. I must look into my codes.

Element equation of what factorize returns must be of the form
variable = f(variable, variable, variable,…).
For instance, X =f(1, A) is not allowed. This is required for constructing bisimulation
on the coalgera.

Yes, thats exacly.

(Nothing of importance)

I would like to do experiment modifying your factorize a little bit to see
whether expected minimization is gotten. Give me time.

Now I see a problem of my codes. My free-ride on you has failed.

Summarizing: term_factorized/3 uses library(rbtrees) which depends heavily on compare/3 which is buggy given cyclic terms so, in hindsight, not surprising it isn’t reliable.

In fact many (most, all?) of the problems encountered so far seem to be traceable back to the same bug, so I think a useful step would be to raise a new issue on using compare/3 with left-recursive rational trees (may not be the only issue but it’s simple) based on the following example:

?- X=(X,a), Y=(Y,b), Z=(Y,c), compare(A,X,Y), compare(B,Y,Z), compare(C,X,Z).
X = (X, a),
Y = (Y, b),
Z = (Y, c),
A = B, B = (<),
C = (>).

specifically, the violation of the transitive property of comparison system predicates.

I think Jan essentially localized the problem:

Is there any other useful input that would help resolve this issue?

I have managed to write my own minimum_factorization(X, U, C), where X is a given ground possibly cyclic term, C is minimum system of equations inluding U= A such that X and A are equivalent
when fully expanded.

The Current version may contain bugs.

% ?- A=f(B, 0), B=f(A, 1), factorization(f(A, B), U, F).
%@ A = f(f(A, 1), 0),
%@ B = f(A, 1),
%@ F = [U=f(_A, _B), _A=f(_B, _C), _C=0, _B=f(_A, _D), _D=1].
% ?- minimum_factorization(f(f(a), f(a)), U, L).
%@ L = [U=f(_A, _A), _A=f(_B), _B=a].
% ?- minimum_factorization(f(f(a), f(b)), U, L).
%@ L = [U=f(_A, _B), _B=f(_C), _C=b, _A=f(_D), _D=a].
% ?- A=f(B, B), B=f(A, A), minimum_factorization(f(A, B), U, F).
%@ A = B, B = f(_S1, _S1), % where
%@     _S1 = f(f(_S1, _S1), f(_S1, _S1)),
%@ F = [U=f(U, U)].
% ?- A=f(B, 0), B=f(A, 1), minimum_factorization(f(A, B), U, F).
%@ A = f(f(A, 1), 0),
%@ B = f(A, 1),
%@ F = [U=f(_A, _B), _A=f(_B, _C), _C=0, _B=f(_A, _D), _D=1].


minimum_factorization(T, S, L):-
	factorization(T, S, L0),
	minimize(L0, L).
%
minimize(X, Y):- select(U=_, X, X0),
	select(V=_, X0, X1),
	U \== V,
	simulate(U, V, X, [], Cs),
	!,
	unify_cluster_list(Cs),
	sort(X0, X1),
	minimize(X1, Y).
minimize(X, X).

% ?- A=f(B, 0), B=f(A, 1), factorization(g(A,B), V, [], U, [], V).
% ?- A=f(B, 0), B=f(A, 1), factorization(g(A,B), V, C).
% ?- A=f(B, 0), B=f(A, 1), factorization(f(A,B), V, C).

factorization(X, V, C):- factorization(X, V, [], C, [], _).
%
factorization(X, _, _, _, _, _):- var(X), !, throw(unexpected_variable).
factorization(X, V, C, C, H, H):- member(U=T, H), T == X, !, V=U.
factorization(X, V, C, [V=U|C1], H, H1):- X=..[F|Xs],
	factorization_list(Xs, Vs, C, C1, [V=X|H], H1),
	U =..[F|Vs].
%
factorization_list([], [], C, C, H, H).
factorization_list([X|Xs], [V|Vs], C, C0, H, H0):-
	factorization(X, V, C, C1, H, H1),
	factorization_list(Xs, Vs, C1, C0, H1, H0).

%
unify_cluster([]):-!.
unify_cluster([_]):-!.
unify_cluster([X, X|R]):- unify_cluster([X|R]).

%
unify_cluster_list([]).
unify_cluster_list([C|Cs]):-unify_cluster(C), unify_cluster_list(Cs).

%
get_target(U, A, X):- member(V=A, X), U==V, !.

% ?- simulate(U, V, [U=a, V=a], [], Cs).
simulate(U, V, X, Cs, Cs0):-
	get_target(U, A, X),
	get_target(V, B, X),
	functor(A, F, N),
	functor(B, F, N),
	A=..[_|As],
	B=..[_|Bs],
	union_find(U, V, Cs, Cs1),
	simulate_list(As, Bs, X, Cs1, Cs0).
%
simulate_list([], [], _, Cs, Cs):-!.
simulate_list([A|As], [B|Bs], X, Cs, Cs0):-
	(	in_same_cluster(A, B, Cs) -> Cs1 = Cs
	;	simulate(A, B, X, Cs, Cs1)
	),
	simulate_list(As, Bs, X, Cs1, Cs0).
%
in_same_cluster(U, V, [C|_]):-
	member(X, C), X==U,
	member(Y, C), Y==V,
	!.
in_same_cluster(U, V, [_|Cs]):- in_same_cluster(U, V, Cs).

% ?- union_find(A, B, [], X), union_find(C, B, X, Y).
union_find(X, Y, Cs, Ds):-
	find_cluster(X, C, Cs, Cs0),
	(	member(U, C), U==Y -> Ds =[C|Cs0]
	;	find_cluster(Y, C0, Cs0, Cs1),
		append(C0, C, C1),
		Ds =[C1|Cs1]
	).
%
find_cluster(X, C, [], []):-!, C=[X].
find_cluster(X, C, [D|Cs], Cs):-member(B, D), B==X, !, C=D.
find_cluster(X, C, [D|Cs], [D|Ds]):-find_cluster(X, C, Cs, Ds).

(Nothing of importance)

The output system of equations has a unique solution, this is main purpose of
the minimum_factorization. However I am not 100% sure about this property, but 90 %.
Proof is needed.

EDIT. Each variable has a unique value, different from that of other variable.

(Nothing of importance)

For implementing compare_with_stack in C-level, expensive == check becomes not necessary.
And also union-find process will become just derefencing. These two is a motivation of the minimum_factorization. I had no intention to give innovation to existing term_factorized.

Meaning of free variable of input terms is not fixed. It seems enough
to treat these free variable as atoms as far as mininimum_factorization concerns, which
is an easiest policy.

EDIT
However, I see a serious problem behind such free-variable-as-atom policy, which
I have no idea to fix for now.

(Nothing of importance)

Given a rational tree T as coalgebra A, the minimum_factorization
constucts an ‘injective’ coalgebra B for the same T.
However if T has free variables treated as atoms and
if they are instantiated afterward, the solution of B
might be no more injective. This is bad of the purpose T to eliminate expensive test on ‘==’ from compare_with_stack (in C-level efficient implementation.)

This is what I am seeing as a serious problem in allowing free variables in the input terms to the minimum_factorization.

Your suggestion decreases degree of my seriousness for allowing free variables. Thanks.

(Nothing of importance)