Non Monotonicity/Defeasibility Strategies and Other Stuff

Yes, that is just the kind of modification I’m talking about. Nice and simple, isn’t it?

Btw, this is the vanilla part in Louise:

prove(true,_K,_MS,_Ss,Subs,Subs):-
    !.
prove((L,Ls),K,MS,Ss,Subs,Acc):-
    prove(L,K,MS,Ss,Subs,Subs_)
    ,prove(Ls,K,MS,Ss,Subs_,Acc).
prove((L),K,MS,Ss,Subs,Acc):-
    L \= (_,_)
    ,L \= true
    ,clause(L,K,MS,Ss,Subs,Subs_,Ls)
    ,prove(Ls,K,MS,Ss,Subs_,Acc).

The differences with solve/1 in your comment are a) the two guard literals in the last clause, b) the call to the custom clause-fetcher clause/7, and c) some additional arguments.
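For readers following along, this is the textbook vanilla meta-interpreter being compared against (essentially the one in Sterling & Shapiro's The Art of Prolog):

```prolog
% The vanilla Prolog meta-interpreter.
solve(true).
solve((A,B)):-
    solve(A)
    ,solve(B).
solve(A):-
    clause(A,B)
    ,solve(B).
```

Note that without the guard literals, backtracking into the last clause with A = true calls clause(true, B), which raises a permission error in SWI-Prolog; that is one reason prove/6 above guards its last clause.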

The custom clause-fetcher is necessary because we want to resolve the clauses of the program derived so far with new clauses, and with second-order clauses. This is what ultimately makes the proof an inductive one. You can omit this custom fetcher and use clause/2 instead, if you write every induced clause to the dynamic database and then remove it when the proof fails, etc., but while that would make the meta-interpreter simpler, the whole program would be a bug-ridden mess that would be much harder to understand. I know, because an earlier version did just that and it was an endless pain in the entailment.

The additional arguments are there to store the set of inducible symbols (Ss), the set of second-order clauses (MS), and two accumulator variables (the last two arguments), necessary to return the induced clauses at the end of the proof. These would also collect abduced atoms in the abductive modification I describe above. Again, if you wanted to keep everything in three lines and ten literals, plus the cuts, you could write everything to the dynamic database, but why? That would just make the program a lot more painful to follow.

Of course the main attraction of the inductive meta-interpreter above is that it’s inductive, not that it’s vanilla. With impeccable reviewer style you focused on the part of the claim that was least consequential :stuck_out_tongue:

But if you think that an inductive Prolog meta-interpreter can be written that is simpler than the one above, and that is “more vanilla”, please go ahead and demonstrate. I for one would be very curious to see what that looked like.

I didn’t think that you invented the term “vanilla”; in any case, I’ve known it for a long time, even if I don’t know where it comes from.

But listen, what I’m trying to say above is that the structure of the meta-interpreter is a distraction from its much more interesting property, that of being inductive. Let’s not waste time with irrelevancies.

You said that SLD-Resolution can only do deduction. It turns out it can also do induction and abduction. You saw yourself how simple and easy it is to do abduction. Induction is a little more complicated… a lot more complicated, actually, but it gets a lot simpler once you figure out how to do it.
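To show how little machinery abduction needs, here is a minimal abductive meta-interpreter sketch. The abducible/1 declaration, the headache/flu example, and the accumulator-passing style are my own illustrative choices, not code from Louise:

```prolog
% Example program (hypothetical): a headache is explained by flu,
% and flu/1 atoms may be assumed (abduced) rather than proved.
headache(X):- flu(X).
abducible(flu(_)).

% solve(Goal, As0, As): prove Goal, accumulating abduced atoms in As.
solve(true, As, As):-
    !.
solve((L,Ls), As0, As):-
    !
    ,solve(L, As0, As1)
    ,solve(Ls, As1, As).
solve(L, As, As):-
    abducible(L)
    ,memberchk(L, As)
    ,!.
solve(L, As, [L|As]):-
    abducible(L)
    ,!.
solve(L, As0, As):-
    clause(L, Ls)
    ,solve(Ls, As0, As).
```

With the example program loaded, the query ?- solve(headache(tom), [], As). succeeds with As = [flu(tom)]: the meta-interpreter cannot prove flu(tom) deductively, so it assumes it, which is abduction in a nutshell.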

This is something new. It is new in logic programming and in Inductive Logic Programming; in fact, it’s exactly what the “inductive” in “ILP” means, and it’s the first time in 30 years that it’s become a reality rather than just an aspiration. And I think that the three modes of inference can actually be combined to create something even more powerful and useful. This should be useful information, right?

Well, if you’re making historical references, that’s more interesting to me. A couple of days ago Prolog and meta-interpreters came up in an online conversation (they tend to, around me). My interlocutor said that they were listening to a talk by Sterling, who said that meta-interpreters were invented by Warren, whereupon Kowalski, who was in the audience, stepped up to say that no, it was Colmerauer. This is the relevant part of my online discussion:

Just one personal recollection (from memory) which probably also influences my view on the Kowalski-Colmerauer relation. At a META 90 tutorial/talk about meta interpreters, Sterling attributed the origin of meta interpreters to Warren and the DEC10 manual. Kowalski responded (publicly during the talk) that this was well known to Colmerauer, well before Warren ever got into Prolog.

From here: [3] What does the DEC10 Prolog manual say about the occurs check? See I.2.1. O... | Hacker News

Which is kind of weird, given the Sterling & Shapiro quote, which also attributes meta-interpreters to Colmerauer. Perhaps my interlocutor misremembered.

I mean that, given a Horn goal and a set of first- and second-order definite clauses, it derives a new set of definite clauses that entail the atoms in the goal.

More formally: if B is a set of first-order definite clauses, M is a set of second-order definite clauses, and ←e is an atomic Horn goal such that B ∧ M |= e, then an inductive meta-interpreter like the one I pasted above derives a set H of first-order definite clauses such that B ∧ M |= H and B ∧ H |= e.

So it’s a meta-interpreter that learns Prolog programs from Prolog programs.
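To make that concrete, here is roughly what the inputs and output look like for the classic ancestor example. This is a schematic sketch of the MIL setting, not Louise's concrete configuration syntax, which differs:

```prolog
% Background knowledge B: first-order definite clauses.
parent(alice, bob).
parent(bob, charlie).

% Second-order clauses M (metarules), where P, Q, R are variables
% ranging over predicate symbols:
%   Identity: P(x,y) <- Q(x,y)
%   Chain:    P(x,y) <- Q(x,z), R(z,y)

% Example atom e: ancestor(alice, charlie).

% A hypothesis H the inductive proof can derive, by instantiating
% the second-order variables P, Q, R during resolution:
%   ancestor(X,Y):- parent(X,Y).
%   ancestor(X,Y):- parent(X,Z), ancestor(Z,Y).
```

Each clause of H is an instance of a metarule in M (here, of Identity and Chain), and B ∧ H |= e as required.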

I mean the meta-interpreter in this comment that I posted above:

I think we’re being confused because I mentioned that “you’ll recognise its structure as that of a vanilla meta-interpreter”, with which you disagree. I meant that it resembles the vanilla meta-interpreter. It’s not the vanilla meta-interpreter. But the structure, or the flavour, is not the point. The function is the point.

The meta-interpreter is found in the source file that I linked to, above, in this comment:

You then replied to say that this meta-interpreter is not “vanilla”. You said so in this comment:

https://swi-prolog.discourse.group/t/non-monotonicity-defeasibility-strategies-and-other-stuff/6038/23?u=stassa.p

I thought that meant you had a look at the source file I linked to and found the meta-interpreter’s flavour not to your liking. Sorry if I misunderstood this, but I’m confused now.

Ah, OK, yes, now we’re on the same page :slight_smile:

Louise is not an acronym. It followed an earlier implementation of MIL, called “Thelma”, which is an acronym, for “Theory Learning Machine”. Then I wrote a new program that I thought was just a curiosity, and I wanted a name for it, so I called it Louise. As these things tend to, it stuck (meaning I made a couple of presentations where I used it, and then I was advised not to change it, to avoid confusing people who had seen my presentations). To be honest, by that point I was all out of funny acronyms anyway.

“MIL” is “Meta-Interpretive Learning” (see my earlier post). This is a new setting for Inductive Logic Programming that is basically SLD-Resolution with a second-order program.

But I did give you a reference to Louise. I’m giving you in-thread references because that’s where I gave the out-of-thread references. See this post:

That’s where I linked you to the Louise repository. I didn’t link you to the paper because it’s long and boring and I didn’t think you would read it anyway.

Also because it’s wrong and confused and I understand things much better now and I’m embarrassed every time anyone mentions that paper :confused:

Don’t be mean! He’s my supervisor and he’s called Muggleton. He’s the guy behind most of the good ideas in ILP, and I can’t do anything about that, neither can he or anybody else.

The breakthrough of MIL is that it can learn recursion and do predicate invention without the restrictions of earlier approaches. Edit: Louise uses a polynomial-time algorithm for MIL and that’s the main contribution of the paper you cite above (“Top Program Construction and Reduction for Polynomial-Time Meta-Interpretive Learning”).

No, that’s the point of my earlier posts here: I’m trying to figure out how to transfer the intuitions in s(CASP) to an inductive logic programming setting, with Louise and MIL. The motivation for that is to handle uncertainty in a purely symbolic framework, like s(CASP) does, but for learning, and not only reasoning.

I can’t tell you much more than that because that’s all I have at the moment, but the idea is to learn Prolog programs with some kind of representation of uncertainty, like the “shades of truth” from the s(CASP) paper discussed earlier.

Edit: there is an ILP approach that learns ASP programs. See here for references:

https://ilasp.com/research

Ooh thank you, I’ll look into that constraint based revision it sounds very similar to what I’ve arrived at!

I think in cases of unrepairable contradiction, for my use case at least, it would fall back to the humans playing to make a judgement call, and would flag the rules involved in the game at hand as unclear and/or broken, so the system designers can clarify or ratify them.

More than once in this endeavour I’ve found genuine contradictions in some of the rule sets. Some arise from linguistic ambiguity, but others have just been unaccounted-for contradictions in very big rule sets.

It’s a very particular use case, but I can see it being useful in other domains filled with contradictions and incomplete rules: business rules, for one, or expert systems where there may be multiple designers or maintainers with inconsistent belief systems.

Edit:
Oh wow, I’ve got some catching up to do. I just realised how long the thread got overnight. Will try to catch up later today! Haha

It’s getting late now and I’ll be traveling tomorrow so I may not have time to continue our conversation which I’m enjoying. I have to say that I don’t know what a valuation lattice is and I couldn’t find out with a quick search on the internet. Can you please clarify?

It’s the non-monotonicity I’m interested in, specifically the non-monotonicity of NAF. NAF, or rather the CWA itself, is already a simple way to handle uncertainty (by ignoring it, essentially). The non-monotonicity of NAF means that a theory can be updated whenever new evidence becomes available, so there is no risk of getting stuck with a bad initial theory, as when reasoning monotonically.

In a learning setting this means that we can continue learning, and updating, new theories as new training examples become available. The only problem is that it’s tricky to have SLDNF-resolution be complete for normal programs.
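A minimal example of that non-monotonicity, for anyone who hasn't seen it spelled out (the tweety example is the traditional one, not anything specific to Louise):

```prolog
% penguin/1 is declared dynamic so NAF can safely query it
% before any penguin facts exist.
:- dynamic penguin/1.

bird(tweety).

% Birds fly unless they are known to be penguins (NAF under the CWA).
flies(X):- bird(X), \+ penguin(X).

% ?- flies(tweety). succeeds, because penguin(tweety) cannot be proved.
% After assert(penguin(tweety)), the same query fails: adding a fact
% has retracted a conclusion, which cannot happen under monotonic
% reasoning.
```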

I think non-monotonicity is more interesting in a learning setting, where we want to be able to update learned theories automatically.

Note that SWI s(CASP) also runs in SWISH. See SWISH -- SWI-Prolog for SHaring for this example.


You can load s(CASP) into the WASM version. It is one of the demo files:

:- use_module('https://raw.githubusercontent.com/SWI-Prolog/sCASP/master/prolog/scasp.pl').

p :- not q.
q :- not p.

Takes a bit long to load :frowning:

The SWISH version has nicer and interactive output though.

And yes, SWISH is under maintenance. It is frequently restarted and it seems the automatic restart on issues does not work. More details will probably follow shortly.

Yes, that is correct, the Ciao Playground for s(CASP) runs fully on the client side.

Thank you for the explanation of “valuation lattice”, I see what you mean, you mean a quantification of uncertainty that imposes an ordering on events, objects etc.

The way I think about this is that I don’t want to explicitly assign un/certainty values to “things”: programs, clauses or atoms. I like how s(CASP) does it: the “shades of truth” are a semantics, they are not explicitly represented in the code. That is, “¬a” is “certainly false” just because a human reading it knows that’s what classical negation means, in this context. Thus proofs can proceed unburdened from human interpretations of their results, which is efficient and sensible (since we’re asking a computer to complete the proof, and not a human). One can then build on that semantics with further code, and explicitly represent the “shades” in a downstream task. For example, I could programmatically look at the output of an s(CASP) program and separate the results by their “truth shades”.

The same goes for NAF, btw. The fact that NAF operates under a Closed World Assumption (CWA) is never explicitly represented in a Prolog program. The programmer, or the user, must keep in mind the semantics of the formalism. I guess that explains why I often find an odd-looking remark in Prolog courses explaining that “true” and “false” are not to be taken as absolutes and must be considered in the context of the CWA.

See my first comment in this thread about my thoughts on mixing probabilities and logic (and generally please see my comments on s(CASP) and why I am so enthusiastic about it, if you are confused by this comment). In summary, there are too many frameworks that attempt to do this and, now that I think about it, I even worked one out myself when I was doing my MSc (I was training some PCFGs). As far as I know there’s no accepted standard and all frameworks are deficient in one way or another. In any case I think probabilistic reasoning itself is broken: it’s an elegant theory that only works in practice for simple cases and turns into a horrible mess everywhere else. And to rant a bit about this, it seems these days, every time I pick up a paper that does “Bayesian this and that”, I find that somewhere in there somebody is trying to integrate over an infinite distribution and ends up approximating it with some data, possibly annotated by their dog for all I know. So much for the elegant mathematics of the Bayesian calculus.

Yeah, absolutely not. I’m doing none of that. That stuff doesn’t work and I don’t want it soiling my clean and tidy logical notations. Begone, probabilities! Oksapodo! Ftou! Ftou! (And other apotropaic expressions in Greek).

Think of it this way: Prolog is the syntax and semantics of logic turned into a programming language (I’m not saying “of FOL” because Prolog is not FOL and it’s not even one logic). It’s not perfect, but it’s useful and usable. But what would it look like if someone tried to turn the Bayesian calculus into a programming language? It would look like an unholy mess, that’s what it would look like. It would be like one of those esoteric languages like Brainfuck that look like line noise, or Piet (where one programs using coloured squares like a Mondrian painting). There’s a reason why this is so: the notation is too abstract and the semantics have nothing to do with the real world. The entire edifice is erected in Plato’s world of ideas, or in Aristophanes’ cloud cuckoo land, more like it.

And yet so much effort in AI has been wasted trying to do just that, use probabilities as a language to program computers to take complex decisions in the real world. That’s done under the pretext that the real world is full of uncertainties and probabilities model uncertainty. Great idea, but it doesn’t work because probabilities are a representation that humans can’t understand and that computers can’t compute: the most widely used probabilistic machine learning algorithm has got to be … Naive Bayes. A simplification, necessary to avoid the intractable calculations that one must always perform in the non-naive case. Everything in probabilistic reasoning is like that, there are intractable calculations all over the place and they crop up every time anyone tries to do anything halfway useful, let alone try to model human decision making.

And we’re still only in the theoretical sphere, where probabilities are not reified and we’re just doing pure mathematics with random variables like A, B and C. But the moment you try to assign actual, numeric values to your variables, the moment you take probabilities into the real world they transform into statistics, like Mogwai transform into Gremlins when you feed them after midnight. And then you’re back to doing dirty, unprincipled heuristics. So what was the point of the elegant and principled approach in the first place? It’s the old switcheroo, it’s a rip-off.

Statistical AI is a mess. Why would I ever want to bring that mess into my beautiful and practical predicate calculus? Never!

It’s not purism, not at all. Exactly the opposite. I’m a pragmatist who wants something that really works. Probabilities work on paper, but not anywhere else.

That’s incomprehensible. You had to draw a Venn diagram to explain it and I still don’t get it.

I’m a simpleton and I want simplicity, mkay?

Indeed, I missed the bit about “rational choice”, mainly because I don’t know what that is and so I didn’t recognise it as a special term.

On the other hand the rant wasn’t aimed specifically at you so I wasn’t expecting a response. Sorry for the misunderstanding. I tend to explode in incontinent rants at the most inappropriate times.

Wow, I did not expect to be partially responsible for kicking off such a discussion. To address a far earlier point, ASP is interesting to me compared to s(CASP) because I’m personally more interested in forward inference / datalog than backward inference. I’ve been in the business of trying to find useful ways to enhance the expressivity of datalog. One such enhancement is to add egraph rewriting to datalog, for which I still do not understand how it could work in backward inference. Maybe somehow something something tabling, but it is not at all clear. ASP is very compelling as one direction of a super datalog.

Since it seems I’ve got abductive and inductive logic programming experts on the horn, is there advice for references or systems to use? In particular, I’m interested in applications to binary and imperative program analysis. Invariant inference, spec inference and other such questions.
