Identity and facts

Thanks. I fully support your skepticism regarding system comparison. I was more interested in a qualitative comparison, and you already point to overcoming one limitation. I am likely to be in need of ILP soon :slight_smile:


Well, for one thing (some) ILP systems can learn from rules, not just facts
(Louise can, for example). So it’s not just about “converting facts into rules”.

The general motivation of ILP is to learn programs from data. There are all
sorts of use cases for this. For example, maybe you have a bunch of examples of
the behaviour you want your program to implement, but you can’t figure out how
to write the program yourself. Some of the big successes of deep learning in
image recognition have come from such a motivation: nobody has yet figured out how
to do image recognition “by hand” as well as Convolutional Neural Networks (the
state-of-the-art approach for image recognition) can do it.

Another motivation of course is that you might have a big database of facts and
rules and you want to know the relations between the predicates they define. In
that case, you don’t even know what program you want to write, let alone how to
write it. Some ILP systems can do this kind of “exploratory analysis” (Louise
can).

Then of course, there’s the automatic programming, or program induction
motivation: rather than writing a program by hand, let an algorithm do it for
you. You are required to define some constraints: examples of the inputs and
outputs of your target program (both positive and negative examples, i.e. correct and
incorrect input-output pairs); a library of sub-programs from which the target program will
be synthesised; and, optionally, some kind of language bias that describes the
general structure of the target program (e.g. whether it should be recursive and
so on). I see this as something like programming by unit tests: you provide the
inputs and outputs of the unit tests and press “enter”, and the computer does
the hard work.
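To make that concrete, here is a minimal sketch of those ingredients in plain Prolog notation. The grandfather/2 task and all the names below are hypothetical, just for illustration; the exact input format varies from system to system:

% Positive examples of the target predicate:
grandfather(adam, charlie).
grandfather(adam, dora).

% A negative example (an input the target program must reject):
% grandfather(charlie, adam).

% Background knowledge: the library of sub-programs:
father(adam, bob).
father(bob, charlie).
father(bob, dora).
parent(X, Y) :- father(X, Y).

% A clause a learner might synthesise from these ingredients:
% grandfather(X, Y) :- father(X, Z), parent(Z, Y).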

Finally, historically speaking, ILP is an offshoot of the early work in
symbolic machine learning that followed in the wake of the Expert Systems era.
That early work was directed at learning the production rules of Expert Systems
from data, a big deal, considering that one of the weaknesses of Expert Systems
was the so-called “knowledge acquisition bottleneck”, i.e. the difficulty of
encoding large amounts of knowledge into hand-crafted rules. Instead of
hand-crafting a rule base, symbolic machine learning approaches (such as
decision tree learners) tried to learn the rules from data. Production rules
were propositional. ILP took things one step further and looked at ways to learn
fully first-order rules, taking advantage of the work in logic programming and
resolution theorem proving.

There are more reasons to consider and I could go all poetic about inductive
logic and the interesting aspects of its inverse relation to deduction, etc etc.
I’ll probably bore you to death though :slight_smile:


Oh cool! If I can help with that, let me know :smiley:

(Edit: not necessarily to use Louise! In general.)


Please go on – it’s very interesting.

What about defeasibility – is that something that is addressed as well? Also, are there use cases for interactive systems that do some kind of guided learning?

Dan

Sorry, I don’t know anything about defeasibility. I’d be happy if you could explain what that means.

Interactive systems were all the rage in the early days of ILP, but nowadays they’re not a major requirement. Could you say what you had in mind, more precisely?

Finally, I’m curious about the turn of phrase in your earlier comment “converting facts into rules”. I hope I didn’t misunderstand your question and start talking about ILP when you were talking about something else? ILP systems don’t “convert facts into rules”. They learn programs from examples. So the “facts” are the examples (positive or negative) and the “rules” are the clauses of a program. The point of course is to learn programs that generalise beyond their examples, to definitions that cover unseen examples. It’s machine learning, yes? The goal is not compression, but generalisation.
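As a toy illustration (hypothetical, not tied to any particular system): given the positive examples even(0), even(s(s(0))) and even(s(s(s(s(0))))), a learner might return the program

even(0).
even(s(s(X))) :- even(X).

which not only covers the three examples but also generalises to unseen ones, e.g. the query ?- even(s(s(s(s(s(s(0))))))). succeeds.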

Thank you,

re: defeasibility and generalization

Sometimes, after generalizing data into rules, you find more specific rules that are like the general rule but not fully; they are, in effect, exceptions.

The classic example is a general rule that all birds fly. And then along comes Tweety the penguin, who gets classified as a bird but is also known not to fly.

So, when generalizing a rule from data, the rule is tentative and a default assumption, i.e. additional data could disprove it and require adjusting it. It seems to me that defeasibility is at work here as well.

I guess this is dealt with in one way or another in inductive programming …

There may be some literature on this kind of thing in ILP, so I’d have to look it up to tell you for sure, but off the top of my head, and assuming I understand what you mean, the kind of problem you describe can be solved in principle by learning theories with “exceptions”, or with negation as failure.

The first case is simpler. If you have a “fact” that tells you that tweety is a bird, and a “rule” that tells you that “all birds fly”, you then have a program that looks like the following:

bird(tweety).
bird(eagle).

flies(X) :- bird(X).

And so on. If you don’t have the fact bird(tweety), you can learn it. Learning atoms (“facts”) is sometimes easier than learning clauses (“rules”), so it’s not a big deal. There’s also abductive learning, which is more finely tuned to learning atoms. (Deduction, induction and abduction are the three forms of logical reasoning typically identified; abduction is the derivation of new facts from facts and rules, although that’s also the case with deduction, so it’s a bit confusing. Suffice it to say that what Sherlock Holmes does is actually abduction, not deduction.)
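As for the second case, here is a minimal sketch of how negation as failure can encode the exception. This is plain Prolog I wrote for illustration, not the output of any particular ILP system:

% “All birds fly, unless they are known to be abnormal.”
flies(X) :- bird(X), \+ abnormal(X).

% Penguins are birds, but abnormal with respect to flying:
bird(X) :- penguin(X).
abnormal(X) :- penguin(X).

penguin(tweety).
bird(eagle).

% ?- flies(eagle).
% true.
% ?- flies(tweety).
% false.   The general rule is defeated for tweety.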

I note also that this kind of example, with birds flying or not, even called “tweety”, is typical of older ILP work, so if I have a closer look at the literature I might find a concrete example, although at this moment my memory is failing me. I’ve seen plenty of tweeties in ILP though, it has to be said.

There’s something that bothers me about your turn of phrase “generalising data into rules”, partly because it’s a very informal thing to say and I’m not sure exactly what it means (I lost the ability to reason about informal speech when I started my PhD). What does it even mean to “generalise data”? I mean, a string of numbers representing temperature readings is “data”, but how do you “generalise” it?

In ILP and logic programming the meaning of “generalisation” is formal: A is more general than B if and only if B is true whenever A is true, or in other words A |= B, read “A entails B”. In ILP, B is usually our set of positive or negative examples, so we want to answer the questions: a) “what entails B?” (when B is a positive example) and b) “what doesn’t entail B?” (when B is a negative example). So we want to find an “A” that entails B when B is a set of positive examples and does not entail B when B is a member of a set of negative examples. In the second case, the entailment-as-generalisation relation between A and B helps to specialise A, by excluding from it clauses that are over-general, i.e. that entail any B that is a negative example.
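To make the entailment reading concrete, here is a toy sketch (the ancestor/2 example below is hypothetical, chosen just to illustrate the relation):

% Background knowledge:
parent(tom, bob).

% A candidate hypothesis A:
ancestor(X, Y) :- parent(X, Y).

% A positive example B:
% ancestor(tom, bob).

% Together with the background knowledge, A entails B:
% ?- ancestor(tom, bob).
% true.
% So A is more general than B in the sense above.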

So ILP is not only generalisation, and it’s certainly not only generalisation of facts. In fact, ILP approaches often go the other way and specialise an over-general theory until it is specific enough to entail the positive examples without entailing the negative examples. Other approaches do take the road you describe informally, of generalising a theory that is over-specialised to begin with. These are the top-down and bottom-up approaches in ILP, respectively. But it’s important to make the distinction between generalising an example as opposed to generalising a theory.
