Can ILP learn geometry rules with floating point numbers

This is a little bit mangled: procedure Generalise in TPC constructs each clause that entails at least one positive example with respect to the background knowledge, not all constructible clauses. For example, there are constructible clauses that do not entail any positive examples and those are not constructed, because they can’t be proved. In Vanilla there are further constraints that exclude e.g. tautologies (as defined by the user) and arbitrary instantiations of single second-order definite clauses. This way it is possible to avoid constructing left-recursive clauses if that’s not necessary. This is yet another step that requires some insight into the desired hypothesis though. I don’t recall if Prolog^2 has the same kind of mechanism to exclude specific instantiations.

I should also explain that the newer version of TPC constructs not just single clauses that entail an example, but entire multi-clause hypotheses. Because it constructed single clauses only, the earlier version of TPC (in the first two Louise papers in the Machine Learning journal special issues) could not form arbitrary recursive programs. The new version, that uses Vanilla, can. There has really been an overhaul of the framework and the implementations since ca. 2023, if I’m not mistaken.

Algorithm 1 only returns clauses that have positive examples:

So I guess in this context, the term constructible implies being provable? Making the statement by James Trewern less mangled? Of course you could understand constructible in a weaker manner in other contexts, like only derived from the given higher order patterns; making a difference between the -ible form and the -ed form of the same word stem, and creating a higher barrier to understanding even trivial stuff. It’s pretty common for heterogeneous communities to use the same technical term with different connotations, or even ambiguously. That’s why, for example, the ISO core standard starts with a glossary. A glossary (from Ancient Greek: γλῶσσα, glossa; language, speech, wording), also known as a vocabulary or clavis, is an alphabetical list of terms in a particular domain of knowledge with the definitions for those terms.

Recently ChatGPT is nagging me about even the meaning of the simplest words, I dunno, maybe they upgraded their AI. I intend to stop using ChatGPT, it has lost the ability to tolerate sloppiness.

Apologies, when I say “constructible clauses” I mean the set of all instances of the second-order clauses that can be constructed by substitution of their (first- and second-order) variables with predicate symbols and constants from the first-order background knowledge, examples, and any invented predicates. The set of all constructible clauses is called the “Hypothesis Language” in the first TPC paper. The Hypothesis Space then is the powerset of the Hypothesis Language. My argument at the time was that we can construct logic programs by searching only the Hypothesis Language, for exponential gains in efficiency compared to a search of the Hypothesis Space.
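
To make that concrete, a toy illustration (mine, not the notation of the papers): with the chain metarule P(x,y) ← Q(x,z) ∧ R(z,y), a target grandparent/2 and background parent/2, the Hypothesis Language includes, among others:

% Background knowledge:
parent(alice, bob).
parent(bob, carol).

% Some instances of chain, constructible by substituting its
% second-order variables with predicate symbols of the problem:
grandparent(X, Y) :- parent(X, Z), parent(Z, Y).       % entails grandparent(alice, carol)
grandparent(X, Y) :- parent(X, Z), grandparent(Z, Y).  % recursive instance
grandparent(X, Y) :- grandparent(X, Z), parent(Z, Y).  % left-recursive instance

The third instance is the kind of left-recursive clause that the constraints mentioned above can exclude.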

However, Algorithm 1 shows how a single clause is constructed in the older version of Louise. See my note above about the differences with Vanilla. The versions of Louise and TPC before Vanilla and the switch to second-order SLD-Resolution were not complete and could not learn arbitrary recursion, or invent arbitrary predicates.

Metagol, on the other hand, was complete in principle because it was capable of forming hypotheses with more than one clause from each positive example, but its implementation was rendered incomplete by an attempt to control recursion with a cyclicity test. The in-principle completeness came at the cost of exponential time complexity (more specifically, exponential in the number of clauses in the Hypothesis Language) because of its iterative deepening search of the Hypothesis Space. That’s what you get when you search a powerset.

Vanilla overcomes Metagol’s implementation incompleteness thanks to tabling. I think when Metagol was first coded SWI-Prolog didn’t have tabling yet. Reformulating MIL as second-order SLD-Resolution eliminates the need to search a Hypothesis Space and again improves performance while retaining completeness.

You are still searching a space, namely a subspace of the hypothesis space, the so-called target space, specified via 2nd order patterns. Learnability is often feasible only if the hypothesis space is larger than a subspace containing the target hypothesis. You didn’t eradicate search from the construction of hypotheses, in particular the search for proofs that the constructed hypotheses are true, in some cases via SLD, in other cases via SLG tabling, plus the search involved in the construction of hypotheses itself from the given 2nd order patterns.

See also, published like 20 years ago:

Machine Learning and Data Mining
Chapter 13 - Learning Theory
Kononenko & Kukar - 2007
https://www.sciencedirect.com/book/monograph/9781904275213

Feasibility is a quite different notion than completeness, while you can change the notion of completeness if you change the target subspace and the search method. A complete method might still not be feasible enough, i.e. have too high a complexity, even if the complexity is polynomial. Practically speaking, any algorithm that runs in O(n^2) or longer is generally considered too inefficient to be useful in practice. I wouldn’t be that strict, but there is surely a k such that O(n^k) wouldn’t be tolerated anymore because of some time limits. You can try the seven eleven problem as a demonstration:

The Seven-Eleven Problem - 1983
https://www.cs.cornell.edu/gries/TechReports/83-574.pdf

It’s quite magical that humans are nevertheless not so bad learners, especially sensory ones. It has maybe to do with the fact that certain graph algorithms can be made to run faster when the graph is planar. @kuniaki.mukai might be interested in this fact, but I couldn’t yet get my head around how to utilize it. It impacts transitive closure computations and hence SLG tabling, since a planar graph automatically has a sparse edge set. Already a taxonomy tree is planar. The robot, music, geometry, etc. examples could also be “planar-ish”? So yes, let’s do it. Let’s do the impossible and apply learning. On the other hand, I guess Darwinian evolution has had an advantage with its billion years budget. So what are the AI boom and these LLMs exactly cooking?

The claim is that we don’t need to search a set of logic programs, i.e. the Hypothesis Space. SLD-Resolution of course still has to search an SLD-tree which, as you suggest, is a space of proofs. Replacing a search of a set of programs with a search of a set of proofs yields exponential gains in efficiency.

I don’t know what the “target space” is.

That’s right, we haven’t eradicated search. Note that I made no such claim: my claim is that we don’t need to search a Hypothesis space, very specifically.

On the other hand, in the new framework, only one search is needed: a search for a proof. A hypothesis is constructed during the proof by collecting the substitutions of second-order variables in second-order clauses. That means we don’t need to search separately for a hypothesis and its proof. This is another source of improvement in performance compared to earlier ILP approaches, that tend to separate the search for a hypothesis and the testing of “coverage” of examples.
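
To illustrate with a toy sketch (assumed names and a simplified encoding, not Vanilla’s actual code): proving an example against encapsulated second-order clauses, collecting metasubstitutions as the hypothesis.

% An encapsulated metarule: chain, P(X,Y) :- Q(X,Z), R(Z,Y).
metarule(sub(chain, [P,Q,R]), m(P,X,Y), [m(Q,X,Z), m(R,Z,Y)]).

% Encapsulated background knowledge.
m(parent, alice, bob).
m(parent, bob, carol).

% prove(+Atoms, +Acc, -Subs): prove encapsulated atoms, accumulating
% the metasubstitutions used; Subs is the learned hypothesis.
prove([], Subs, Subs).
prove([A|As], Acc, Subs) :-
    prove_atom(A, Acc, Acc1),
    prove(As, Acc1, Subs).

prove_atom(A, Subs, Subs) :-
    call(A).                        % resolve with the background
prove_atom(A, Acc, Subs) :-
    metarule(Sub, A, Body),         % resolve with a metarule and
    prove(Body, [Sub|Acc], Subs).   % record its metasubstitution

% First answer (a real implementation needs tabling to control recursion):
% ?- prove([m(grandparent, alice, carol)], [], Subs).
% Subs = [sub(chain, [grandparent, parent, parent])].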

That, too, is correct. Again, I haven’t claimed magick, only improvements in performance, and completeness like you say.

Completeness is important, but what is more important to me is to bridge the gap between ordinary SLD-Resolution for deduction and its use for induction. If there’s a lesson to draw from the results on second-order SLD-Resolution, it is that the old idea of inverting deduction to achieve induction is not necessary: induction is a form of higher-order deduction; raising deductive inference to the second order of logic suffices to achieve inductive inference. Or, as I like to say: learning and reasoning are one.

Finally, all this tells us that J. Alan Robinson’s Resolution may be a more important procedure than was even recognised at his time, and possibly more important than is still recognised. With its application to MIL, Resolution is now used in three separate applications: (deductive) logic programming, SAT-solving and now machine learning. Resolution is a workhorse symbolic approach, not unlike gradient optimisation in statistical AI. It is a gold mine that is not yet spent and there is possibly much more still to learn and understand about it.

The separation will lead to the same polynomial complexity, if you search in the appropriate target subspace. The key to complexity reduction, or switching to a tamer complexity class, is the choice of the target subspace, not the intertwining. The intertwining itself is not extremely successful, to be honest. If intertwining worked, if we had very powerful 2nd order theorem provers, we could even have nice ZFC provers that deduce sets, or nice Peano provers that can invent formulas for classical mathematical induction. But most provers choke, since this intertwining is relatively poor, mostly lazy building of formulas. Also, in your case it’s only an intertwining at the top, while the prototype of James Trewern allows also dispersed intertwining, I suspect. I think neither does any good. You could equally well use something like the following sketch, where invent/1 draws from a polynomial class only:
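
/* “restart methodology” sketch; invent/1 and positive_example/1
   are assumed names, nothing from an actual system */

% learn(-Rule)
learn(Rule) :-
   repeat,
   invent(Rule2),      % draw a candidate rule from the 2nd order patterns
   validate(Rule2),    % on failure, backtrack into repeat/0 and restart
   !,
   Rule = Rule2.

% validate(+Rule): a Prolog rule is also a Prolog term; its fresh
% variables become universally quantified during assertz/1. Test the
% asserted candidate against a positive example, then retract it.
validate(Rule) :-
   assertz(Rule, Ref),
   (   positive_example(E), call(E)
   ->  erase(Ref)
   ;   erase(Ref), fail
   ).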

Note how Prolog homoiconicity is used for higher order: a Prolog rule (i.e. clause) is also a Prolog term, and the fresh variables in the Prolog rule become universally quantified during assertz/1 in the implementation of validate/1. The intertwining would only fold invent/1 and validate/1 into one thing. Which you could also do automatically via freeze/2 and when/2; but although freeze/2 and when/2 can reorder goals, they do not in principle change complexity classes so much. You can study what folding does already with the Seven-Eleven problem. I don’t think there is a general folding speed-up theorem anywhere? But I might be mistaken.

BTW: The problem is a little bit how to formulate such a speed-up theorem. While there are notions of NP-hard or PSPACE-complete, I am currently short of material on the P class itself. Maybe we have to nevertheless look at it from a combined complexity aspect. So if the patterns for your 2nd order SLD MIL change, and are part of the input metric, we might somehow have a combined complexity again, which has some hardness.

It’s not really second order logic. You cannot do second order logic alone with resolution. Resolution does not yield a second order or higher order logic theorem prover. Substituting atomic names for predicate variables is a very narrow special case. You only specified atomic substitution here, no word of second or higher order unification in constructible clauses:

With ordinary unification, J. Alan Robinson’s Resolution is a first order method only. With gensym, maybe? The most successful intertwining is still first order unification itself, which can invent most general Prolog terms:

mgu(s, t) = ϕ

by exhibiting a substitution ϕ such that:

s ϕ = t ϕ
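
For example, in Prolog, plain first order unification invents the most general binding:

?- f(X, g(X)) = f(Y, Z).
X = Y,
Z = g(Y).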

On the other hand, higher order unification often falls back to backtracking at some point, and there are then many problems of fair backtracking. There is one exception I know of, or let’s say a positive example performance-wise, where for example this here:

:- set_prolog_flag(occurs_check, true).

can do wonders: trying to find a Douglas Hofstadter Quine, i.e. a program P that prints itself. In a higher order setting it would be just a higher order unification question, by a declarative modelling of the input/output:

?- quine.
prints the quine program itself, not using clause/2

Using the declarative modelling you could formulate it as a learning problem, since a program is searched for. Some of the quines found this way are relatively short, so you could give a polynomial bound in advance. It’s a nice exercise in intertwining and also a benchmark for freeze/2 and when/2, and the occurs check. You can also code it with ISO unify_with_occurs_check/2, and with less freeze/2 and when/2 I suspect, by managing the choice points more explicitly. But it’s already some time ago since I did all that; I have to consult the archives.

See also:

miniKanren, Live and Untagged Quine Generation
via Relational Interpreters (Programming Pearl)
William E. Byrd et al. - 2013
http://webyrd.net/quines/quines.pdf

From looking at Algorithm 1, it seems that “learning from mistakes” is missing. Maybe Algorithms 2, 3, 4, .. have the same “restart methodology”. But I don’t see that the information from finite failure inside SLD would be used. The “restart methodology” might nevertheless be quite fitting for 2nd order SLD MIL learning, since there might be a pool of non-mistake rules learnt so far, which are then subject to reuse? I didn’t study the other algorithms:

It only returns zero ∅ if SLD fails. This is unlike neural networks, which thrive on a discrepancy between the target and the output of the forward pass, when they learn via back propagation. It’s also different in genetic programming. We can imagine that the genes form a “kindergarten” pool, with genes that made mistakes and that are less fit. We can then adapt the predicate learn/1 into another predicate learn/1 that does not implement a “restart methodology” via repeat/0, but passes around this “kindergarten” pool and extends it:

/* “kindergarten” pool algorithm */
% learn(-Rule)
learn(Rule) :-
   learn(Rule, []).

% learn(-Rule, +Pool)
learn(Rule, Pool) :-
   invent(Rule2, Pool),
   (  validate(Rule2)
   -> (  Rule = Rule2
      ;  learn(Rule, Pool)
      )
   ;  learn(Rule, [Rule2|Pool])   % a mistaken rule joins the pool
   ).

The learning from mistakes would happen, among other things, via the new invent/2, which also recombines rules that made mistakes, as found in the “kindergarten” pool.

If I am not mistaken, the latest generation of SAT solvers also uses such learning, in their Conflict-Driven Clause Learning (CDCL) incarnation, but I didn’t do much with it yet.

But the ultimate genetic algorithm has both a “kindergarten” pool, with less fit members, and a “graduate school” pool, with more fit members, and policies to keep the combined pool small, for its so-called multi-membered algorithm. For example, the members of the combined pool might have an age limit, and die of old age.

The framework is in second-order logic. The implementation is in Prolog with an “encapsulation” procedure that converts second-order clauses to first-order, with a function that maps second-order existentially quantified variables to first-order universally quantified variables and predicate symbols to Prolog “atoms” (i.e. constants).
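
As a sketch of the idea (simplified, not the exact encoding in the implementation), the chain pattern ∃P,Q,R ∀x,y,z: P(x,y) ← Q(x,z) ∧ R(z,y) is encapsulated as a first-order clause:

% P, Q, R are now universally quantified first-order variables, and
% predicate symbols are ordinary Prolog atoms in argument positions:
m(P, X, Y) :- m(Q, X, Z), m(R, Z, Y).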

It’s important to remember that Prolog is not a first-order language. In FOL, there are disjoint sets of function and predicate symbols. Function symbols are used to create terms, predicate symbols to create atomic formulae (“atoms”). In Prolog there are only “terms”, while “atoms” (i.e. constants) can be both arguments to terms and functors at the same time. This allows all sorts of shenanigans like meta-interpretation which are not allowed in strict FOL: a meta-interpreter is a program that takes another program as argument; clearly not first-order.
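
The classic three-clause “vanilla” meta-interpreter shows this in miniature: a program whose argument is another program, fetched as terms with clause/2.

% solve(+Goal): prove Goal against the clauses of the object program.
solve(true) :- !.
solve((A, B)) :- !,
    solve(A),
    solve(B).
solve(A) :-
    clause(A, Body),   % the object program's clauses are terms, i.e. data
    solve(Body).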

Unification is not decidable in second-order logic, but I don’t know anyone who argues it’s not decidable in Prolog, with or without occurs check. It’s also important to remember that as we go up the orders of logic, definitions vary between authors. For example, AFAIK Frege and Peirce both allowed quantifiers to range over predicate and function symbols and did not really recognise a separation between first and higher orders. Probably for the best.

I mean it’s a form of weak second order. What is missing is this axiom:

/* comprehension axiom */
∃X ∀x (X(x) <=> φ(x))

where φ is an arbitrary formula not containing X. Exactly because you make it polynomial complexity, you do not have arbitrary φ. The enumeration of Quine functions shows that second order problems can have arbitrarily large φ. In the Quine function problem the searched-for function is the 2nd order object. Functions, and not only relations, can be quantified in logics going beyond FOL. In FOL you can only quantify over individuals, which makes it first-order. Even Prolog is only first order, because a goal G that you ship to call/1 is still from the Herbrand domain. But you are right, call/1 is a tricky “meta” predicate, with an infinite definition:

∀x1..∀xn (call(f(x1,..,xn)) ← f(x1,..,xn)).

for all arities n and all functors f. But the above clauses all have an ordinary first order quantifier block ∀x1..∀xn. This call/1 predicate definition usually doesn’t lead to higher order unification. Further, in practical Prolog, the call mode (-) is anyway blocked, as required by the ISO core standard. You could circumvent it by using current_predicate/1, but again you don’t automatically get intertwining of a kind of predicate variable; maybe if freeze/2 or when/2 is deployed?
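
A minimal sketch of that circumvention, staying entirely first order (holds/3 is a made-up name):

% The "predicate variable" P is just an atom enumerated by
% current_predicate/1; univ builds the goal and call/1 runs it.
holds(P, X, Y) :-
    current_predicate(P/2),
    G =.. [P, X, Y],
    call(G).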

BTW: The above comprehension axiom usually leads to two inference rules that go beyond the resolution rule. Because you have arbitrary φ, you would have arbitrary CNF and/or DNF, not to speak of skolemization problems, with multiple clauses ever changing for different φ on one side of a biconditional, while resolution combines only two clauses. The comprehension axiom is a nice tool, you can define with it what 2nd order objects always exist; it is thus used in reverse mathematics.

Do you mean SWI-Prolog tabling? Interestingly, tabling records both success and failure of a predicate, and it could somehow remember mistakes, when used across different hypotheses. The tries in SWI-Prolog store the full answer set of a variant call. Or do you use a custom tabling implemented by Vanilla itself? Somehow I didn’t get my head around yet what the status of tabling is, I read the following disclaimer:

7.13.1 Status of tabling
The current implementation is merely a first prototype. […]

  • The performance needs to be improved.
  • Memory usage needs to be reduced.
    […]

https://www.swi-prolog.org/pldoc/man?section=tabling-status

How do you do execution trace explanation with the intercept feature of SWI-Prolog? As mentioned in the parent thread. It shows me an interesting clutter when I trace a tabled predicate:

?- fib(2,X).
   Call: (14) fib(2, _20592) ? creep
^  Call: (15) start_tabling(<closure>(fib/2), user:fib(2, _20592), [...]
[...]
   Call: (26) _20592 is 1+1 ? creep
   Exit: (26) 2 is 1+1 ? creep
   Exit: (25) fib(2, 2) ? creep
   Call: (20) trie_gen_compiled(<clause>(00000242302c2000), [...]
   Exit: (20) trie_gen_compiled(<clause>(00000242302c2000), [...]
^  Exit: (15) start_tabling(<closure>(fib/2), user:fib(2, 2), [...]
X = 2.

I see not only fib/2, my query, but also start_tabling/3,
trie_gen_compiled/2, .. what else?
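
For reference, I assume a tabled fib/2 along these lines, consistent with the trace, since the definition wasn’t shown:

:- table fib/2.

fib(0, 1).
fib(1, 1).
fib(N, F) :-
    N > 1,
    N1 is N - 1,
    N2 is N - 2,
    fib(N1, F1),
    fib(N2, F2),
    F is F1 + F2.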

The second-order in which MIL exists is a “weak second order” because it’s datalog, with second order variables quantified only over the set of predicate symbols, but not that of function symbols. In the Prolog implementation it doesn’t matter since the two sets are not disjoint.

I haven’t seen the “comprehension axiom” before and I don’t know what it means, but we also do not have “arbitrary formulae”, only definite clauses. Additionally, the set of first-order definite clauses that can be constructed is finite, because it’s a function of the finite sets of clauses in the first- and second-order background knowledge and the examples. For instance, a second-order clause with k second-order variables has at most p^k instances over p predicate symbols, which is polynomial in p for constant k. This is what keeps the number of instances of the second-order clauses to a polynomial size (with some abuse of terminology).

Yes. It was suggested to me that I could have implemented MIL in XSB not only because of its tabling but also because of HiLog which accepts arbitrary terms, including variables, as functors AFAICT. But I’m a long-time user of SWI and preferred to stick with what I know best.

Come to think of it, I have in the distant past used Win-LPA Prolog that allowed variables to be used in the place of functors and it seems that Ciao Prolog and λProlog also allow this.

So I’m now wondering if I could pester @jan to implement MIL as a library for SWI-Prolog, with a native Prolog engine rather than a meta-interpreter. That would require the ability to declare terms with variable functors like in Win-LPA, Ciao and λProlog, e.g. P(X,Y) and also clauses like P(X,Y):- Q(X,Z), R(Z,Y).

From what we saw with James Trewern’s implementation this avoids some overhead that comes with meta-interpretation, though his implementation also used parallelism for maximum speedups. On the other hand I think he had some problems with controlling recursion which has to be done either with tabling (which he didn’t use) or with some sort of constraint over allowed instantiations of second-order clauses.

@jan what do you think? Could this be easily done in SWI? Is there a motivation to do it other than for ILP?

The first Prolog I used (not one of those you mentioned) also had variable functors. One of the things it allowed you to do is replace uses of =.. (univ) with unification when the arity is known. I find that generally useful.
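
A sketch of the difference, with apply2/3 as a made-up name (the variable-functor form is not ISO Prolog, so it appears as a comment):

% Standard Prolog: build a goal of known arity with univ, then call it.
apply2(F, X, Y) :-
    G =.. [F, X, Y],
    call(G).

% With variable functors the same thing is plain unification:
% apply2(F, X, Y) :- G = F(X, Y), call(G).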

P.S. In case anyone’s interested, the Prolog I referred to also had tail variables (sequences) so =.. was defined in a compatibility library (all unification so no need for a builtin) as the fact:

F(X..) =.. [F,X..].

There were probably several Prolog implementations doing similar things pre-standardization.

That’s what enables inductive inference in MIL. Like you say, it is also generally useful, for higher-order programming. Which I guess is a bit like induction, or in any case it’s possible to use MIL for a kind of higher-order programming, if we know the program we’re trying to learn in advance. In fact this is a bit of a limitation of MIL, because it’s easy to lapse into this “I know what I want to learn” mode and force a certain program to be learned.

I wonder if the Prolog you mean was one of the IBM Prolog implementations?

Well, usually the formula in the comprehension axiom is rather a goal than a clause. To arrive at clauses, one might have embedded implication. You can guess clauses via this “what if?” rule, where C is a clause, E is a goal, and (C => E) is also a goal:

 K, C |- E
------------ (Right Implication Introduction)
K |- C => E

It’s also not directly available in Prolog, but is for example found in lambda Prolog. The above assumes C and checks whether an E, for example an example, is satisfied when adding C to the background knowledge K. On the other hand, comprehension leads to these two higher order rules, which are also not directly available in Prolog, but found via a detour in lambda Prolog. The axiom is a biconditional, which I guess can be split into:

  K |- φ(x)
------------- (Right 2nd Order Exist Introduction)
K |- ∃X X(x)

K |- ∀X X(x)
------------- (2nd Order Forall Elimination)
  K |- φ(x)

But the eigenvariable and substitution problems are quite severe; for example, Willard Van Orman Quine spent the rest of his life trying to solve this problem, by reducing it to some subproblems, some more primitive logical rules. On the other hand, the people coming from intuitionism are interested in witnesses, and one arrives pretty quickly at the point that the witness for X in (Right 2nd Order Exist Introduction) could be a witness term involving a lambda. Which then leads to higher order unification very quickly, available for example in lambda Prolog:

X = lambda x.φ(x)

But some people see the replacement of X(x) by φ(x) also as a quantifier elimination method. This theme was already found in Hilbert & Ackermann’s booklet from 1938, sketching a kind of higher order boolean Presburger method. And it seems to be en vogue again right now in 2026, ca. 100 years later, judging from this recent workshop:

Workshop on Craig Interpolation, Beth Definability,
and Second-Order Quantifier Elimination
FLoC 2026 - Lisbon, Portugal, 24–25 July 2026
http://2026.ci-bd.soqe.org/

But I do not want to judge Willard Van Orman Quine badly; his booklet Methods of Logic from 1950 already shows some extensive second order methods, sprinkled with everyday logic examples.

I am really sympathetic with 2nd order MIL. But it’s still a little remote. It’s not integrated into SWI, like CLP(FD) and CLP(B), where I can simply do use_module(library(clpfd)). For example an integration, or a separate system like Teyjus for lambda Prolog, could make 2nd order MIL more accessible. Together with a tutorial: learn 2nd order MIL now.

In Teyjus I can do “what if?” questions; embedded implication is currently broken in the Teyjus top-level, but would work when put into a clause. Maybe ELPI by Enrico Tassi can show it:

/* would martin pass the current cycle if he would grade 6 in logic? */
?- (grade martin logic 6) => (pass martin cycle).
yes or no answer

Or a 2nd order MIL question, also from Teyjus:

/* what should martin do to pass the current cycle? */
?- (Q martin X Y) => (pass martin cycle).
answer with Q, X and Y

BTW: I don’t mind that 2nd order MIL is more Prolog-ish. It seems to be a less intrusive syntax extension. Here is a speculation how a 2nd order MIL system could do it:

/* would martin pass the current cycle if he would grade 6 in logic? */
?- grade(martin, logic, 6) => pass(martin, cycle).
yes or no answer

/* what should martin do to pass the current cycle? */
?- Q(martin, X, Y) => pass(martin, cycle).
answer with Q, X and Y

Maybe we are anyway on the brink of a transition from first order Prolog systems, based for example on a ZIP by Clocksin, to higher order Prolog systems, based on a lambda ZIP? You can reuse the “const” instruction to realize (=>)/2.

Actually one can show that @stassa.p’s CONSTRUCT is second order quantifier elimination. We already saw these inference rules in logic programming beyond Prolog, like for example lambda Prolog, which uses more than only flat resolution but still stays in the BACKCHAIN paradigm of subgoaling:

 K, C |- E
------------  (R=>)
K |- C => E

  K |- φ(x)
------------- (R∃^2)
K |- ∃X X(x)

We have Right Implication Introduction (R=>), and Right
2nd Order Exist Introduction (R∃^2). There is a buddy of
(R∃^2), the left 2nd Order Forall Introduction (L∀^2):

  K, φ(x) |- E
------------------ (L∀^2)
K, ∀X X(x) |- E

In the above you would indeed pick a clause for φ(x). But one can convince oneself that CONSTRUCT uses a sequence of (R=>) and (L∀^2). This is based on the further observation that BACKCHAIN Prolog resolution basically activates clauses on the left hand side of (|-)/2, by using other inference rules, like a variant of (L=>):

K, A -> B |- A
--------------- (L=>)
K, A -> B |- B

The variant (L=>) is not simply the buddy of (R=>); it’s a form of modus ponens. This variant (L=>), together with (R/\), is usually what drives ordinary Prolog. Especially in 2nd order MIL, the goal can still be some E, i.e. some example. In this view 2nd order MIL basically invokes a form of 2nd order quantifier elimination, via (R=>) and (L∀^2).

BTW: One can show that the synthesis of φ(x) from the templates given to the 2nd order MIL can be delayed into a form of the (R∃^2). So 2nd order quantifier elimination cuts much deeper than one would think; it would not only superficially explain the truth seeking in CONSTRUCT but also the template exploration.

Disclaimer: I am exploring 2nd order MIL from the viewpoint of natural deduction and 2nd order logic, as described in Chapter V, Second Order Logic, of this booklet by Dag Prawitz:

Natural Deduction: A Proof-Theoretical Study
Chapter V, Second Order Logic; Prawitz, Dag - 1965
https://archive.org/details/naturaldeduction0000praw/page/62/mode/2up

The booklet is not behind a paywall. My explorations might contain errors, typos, etc.; it’s not something formally verified on paper or on the computer, just informal musing.

It was BNR Prolog - for history, see: