Non Monotonicity/Defeasibility Strategies and Other Stuff

https://groups.google.com/g/ontolog-forum has possibly useful information (search for “kif” or “gql”)

2 Likes

There’s a couple things your post brings to mind which I am not an expert on but have found interesting.

Answer Set Programming is definitely associated with non-monotonic reasoning. It has some interesting capabilities when it comes to modelling time and planning. This paper was neat.
https://personal.utdallas.edu/~gupta/csr-scasp.pdf Automating Commonsense Reasoning with ASP and s(CASP)*. s(CASP) is a swi-prolog library with similar but different capabilities to clingo https://potassco.org/ which is a standalone solver with something of the flavor of a SAT solver or minizinc if you’ve come across those. I believe there is a body of work connecting ASP to narrative production.

You may also find the work of Chris Martens interesting. Chris Martens Chris Martens - Publications They use logic programming (in particular linear logic programming) in different forms for narrative production in their thesis.

2 Likes

The logical negation (-)/1 of ASP isn’t non-monotonic, its monotonic. Thats actually
the advantage of ASP over ordinary Prolog, that it has a logical negation.

Then you can have ASP and add a further negation (not)/1, similar to negation as
failure in ordinary Prolog, which makes it then non-monotonic.

See also here, (-)/1 is called strong negation:

What Is Answer Set Programming?
In the context of logic programming, this idea leads to the need to
distinguish between two kinds of negation: negation as failure, used
above, and strong (or “classical”) negation, which is denoted in the
language of LPARSE by - (Gelfond & Lifschitz 1991).
https://www.cs.utexas.edu/users/vl/papers/wiasp.pdf

1 Like

I agree with what you’re saying. (-)/1 is monotonic and (not)/1 is non-monotonic. But from what I’ve been reading (I’ve been kind of fixated on this stuff the last couple days), the core insight ASP adds is in fact its treatment of (not)/1 and finding declarative meaning to negation as failure via the concept of stable model and relationship with non-monotonic logics such as default logic and autoepistemic logic (not that I know to much about these). The paper I linked above is a good reference on these points and also https://www.cs.utexas.edu/users/vl/papers/13defs.pdf 13 Definitions of Stable Models. By comparison, (-)/1 is almost a syntactic sugar convenience feature (although an excellent one) and I have not seen a reference describing it as the core feature of ASP.

2 Likes

These semantics also translate to ordinary Prolog. Ordinary Prolog is
just a special case of ASP, with the folllowing restrictions:

  • No disjunction in head (allowed in ASP).
  • No empty head (allowed in ASP).
  • No logical negation (allowed in ASP).

Or as the paper expresses it, traditional rules = ordinary Prolog:

These definitions are equivalent to each other when applied to
“traditional rules” – with an atom in the head and a list of atoms,
some possibly preceded with the negation as failure symbol, in the body:
A0 ← A1, . . . , Am, not Am+1, not An. (1)
https://www.cs.utexas.edu/users/vl/papers/13defs.pdf

The challenge is rather to lift these semantics to ASP, which were already
known for long for ordinary Prolog. But the Lifschitz paper isn’t exhaustive,
there are quite some more routes to understand ASP. For example a variant

of unit resolution, that only takes positive literals, might quite well explain the
logical negation of ASP and how stable models are generated. Thats also closer
to how ASP was invented and has less to do with circumscription.

The link to unit resolution tells us also a fundamental difference between ASP
and ordinary Prolog, namely that it is rather based on forward chaining, whereas
ordinary Prolog usually performs backward chaining. Despites its forward

chaining origins of ASP you can of course try to bring in some goal directed
behaviour, similarly like Prolog tabling is both forward and backward chaining.
You might also get lesser-ASP, if you restrict, for example if you don’t allow

disjunction in the head, as is done in s(CASP), and then replacing stable
models, by an abduction mechanism.

1 Like

Hey, have you had your mind blown by the s(CASP) paper, too? I sure have! It’s been a while since I read a paper so well-written, so clear, and so intellectually satisfying.

It’s the first time also that I found a clear motivation for ASP. In the past, I had heard it proposed as a purely declarative logic programming language with classical negation, but that sounded … a bit meh? I also couldn’t understand the execution strategy of grounding the entire Herbrand base which robs the language of expressivity (no lists!) and makes its execution NP-complete (because SAT-solving). Btw, I still don’t get that bit. Why?

In the s(CASP) paper instead I found a motivation of ASP as an elegant framework for reasoning with uncertainty, with certainty.

I have to explain this a bit: I’m coming from the Inductive Logic Programming (ILP) point of view where dealing with uncertainty is a lot more important than in ordinary, deductive logic programming. That’s becaue the real world is a dirty, noisy place, full of uncertainty and we want our machine learning systems to be able to deal with that. In ILP the standard approaches to reasoning with uncertainty are “magic” error tolerance parameters (like the “error” parameter in Aleph) or unholy mixtures of FOL with probabilities (there’s an entire field of Statistical Relational Learning which is all about that).

Magic parameters are magic, but the problem with probabilistic reasoning is that in representing uncertainty by probabilities, one loses the ability to represent certainty, even when certainty is there. As a trivial example, “1 + 1 = 2” and “1 + 1 = 3” must both be assigned a probability value, and the best one can say about these two statements in a probabilistic framework is that one is more likely than the other. And gods help us if the data from which probabilities are calculated is such that “1 + 1 = 2” is the less likely. Yet a proof can be derived of “1 + 1 = 2” from the axioms and theorems of arithmetic, which btw is the only context in which “1 + 1 = x” really makes sense as a statement, so there is no reason to have any uncertainty about it.

Probabilistic reasoning delendum est!

But what to replace it with? Here’s what the s(CASP) paper describes (see the numbered list in section 4.1, page 8), or rather, this is my re-interpretation of it in terms of SLD-resolution proofs with negation as failure (NAF):

Shade of truth          SLDNF representation
--------------          --------------------
p is certainly true     p← is an axiom 
p is probably true      ¬p has no SLD-derivation
p is of unknown truth   p and ¬p both have an SLD-derivation
p is probably false     ¬p has an SLD-derivation
p is certainly false    not_p← is an axiom and p← not not_p is a theorem

Where an “axiom” is what we call a “fact” in Prolog and a “theorem” is what we call a “predicate”, more correctly a predicate definition, i.e. a set of definite clauses. These basically replace ASP’s classical negation (I guess I still can’t get my head around that :P). I say “axiom” instead of “fact” to represent the er fact that all our knowledge is ultimately axiomatic and we can have no greater degree of certainty than the axioms of a theory. Opinion, that!

In any case, this blows my mind right through, not least because I’ve been looking for a formalism that could do all this, reason with uncertainty, with certainty, for a long time. John McCarthy has said that NAF is a simple way to represent uncertainty (Oral History of John McCarthy - YouTube) and now this guy Gupta shows how to do it in practice. Now all that remains for me is to re-interpret all this in an inductive setting. It feels like someone just tied rockets to my ankles.

4 Likes

Oh wow! I haven’t read it yet no, I’ve started to see some mentions of it around but thought I had mostly taken what I needed from ASP already. I might have to revisit!

Those 5 shades of truth reflect similar structures I’ve started using in my current approach. But in my approach they’ve been very intuitive and messy. Bringing a formalism like that to them could really help my understanding of what it is I’ve been trying to accomplish intuitively.

Might have so reading material for today now! Thank you! It’s been a while since I found something close enough to what I’m working on to be helpful.

Solving this so far has mostly been me chasing after an intuition finding systems and formalisms that approximate what I’ve been doing and slowly getting closer to what I need.

1 Like

Same thing here! But the more I look, the more I find that folks have thought of all that well before me and a lot of work has already been done. It’s just a very very tricky subject, if one is indeed looking for an elegant (simple, not cumbersome, intuitive) formalism.

Good luck!

Btw, just to be clear, my tabulation of the “shades of truth” and their SLDNF representations above are my re-interpretation of the s(CASP) paper’s explanation of ASP, and my re-interpretation is in the context of Good, Old-Fashioned, SLD-Resolution because that’s what I know. s(CASP) does something very similar, in that it adds new clauses to a Prolog program that represent classical negation, but it’s not exactly what I show above.

I can follow about half of that but a lot goes above my level of understanding.

What I can say is that in my use case there will often be cases where “p” and “~p” can be derived precisely because many of the systems I’m modelling aren’t entirely logically consistent or because their natural language reasoning isn’t easily (or usefully) reducible to logical cases and are defeasible based on some metric of specificity or case-by-case basis.

Allowing for p & ~p actually becomes a more efficient way of handling a lot of these cases. When that “state” for lack of a better term occurs we know something in the model is at odds with something else, and can introduce reasoning to handle it.

I treat them almost as hypotheticals. One of the rules I use to test whether I’ve got a robust enough system comes from Dungeons and Dragons 5th edition goes something like this.

When a half orc character is reduced to zero hit points, their player can instead choose to drop to 1 hit point instead once per long rest. (If you are unfamiliar a long rest is another game rule which has some preconditions.)

It’s important to note that 5th edition also uses the rather nebulous concept of “specificity” for defeasibility.

You could introduce some logic to make the resultant damage itself defeasible by introducing to all damage logic a check for whether a character is an orc and has chosen etc. but this requires rewriting that whole core damage logic if this rule were changed or introduced or running extra reasoning when it is never used.

Or, what I’ve found to be more elegant is to allow some predicate like a: hp(character, 0) and b: hp(character, 1) to both hold true, and then have the querying reasoner introspect the antecedents that led to that conclusion and see that either/and b can only hold true if a held true, or that b is some how a consequence of a. And so introduced a kind of “consequence of a hypothetical” meta-reasoning that can be employed when these conflicts take place. Or b) the tree of reasoning for b is longer and therefore more specific so a trumps b.

I don’t know how that maps to logic formalisms at all but it’s more efficient to compute (particularly with many continuous queries over steaming data sets).

It’s become quite an elegant way to handle values that change through time also. Rather than having to solve the frame problem or handle world state. You simply take the predicate of value at time, and assume the most recent value before the query time is the current one. If the time is unknown but you know the sequence of antecedent events that triggered the value change you can sequence them and check for conflicts or logical inconsistencies or branching consequence paths and handle them as they arise with defeasibility strategies.

I think part of why I’ve gravitated to Conceptual Graphs as my cornerstone for it is because the semantics of them map fairly closely to natural language. A lot of what I’m doing is connecting and mapping semantics onto each other in ontological space with some reasoning applied but only where it is needed.

And many of the games I’m modelling write their rules in that kind of natural language with a lot of exceptions and edge cases. And you need to be able to reconcile them on the fly much the same way you do as a human, rather than bake them into a singularly consistent, logically provable set. Especially with large rule sets that can be used piecemeal and altered, often during the run of a game, like in Dungeons and Dragons.

Early on I tried to account for this by having a lot of “this is a safe assumption until proven otherwise” clauses built into the rules. But that quickly became unwieldy and not future proof or modular enough when introducing new rules or building large rule sets. Instead it became far more reasonable to build the logic for handling and reasoning about exceptions into the reasoner itself.

It has also opened up a lot of room for proceduralism in specific common types of queries and defeasibility strategies to improve search efficiency.

For me it’s okay if my rules say p & ~p, because o also have rules for how to handle that. And if I ever don’t, I want the system to be able to explain its reasoning to a human so they can make a final decision on the truth.

It’s certainly not a robust solution for solving logical problems, but it’s a very good solution for my use case that avoids a lot of the pitfalls in defeasibility and temporal reasoning without having to worry about the frame problem and all the rest.

I wouldn’t even necessarily call what I’m doing anymore logic programming anymore so much as it is “reasonable” programming. Or even maybe some breed of natural language programming. It’s not so much trying to be provable as approximating the intuitive reasoning practices of a human tackling the same problem efficiently.

1 Like

[This is a very long comment and I missed the rest of the conversation while writing it. Aplogies!]

Yes, that’s absolutely right as usual. It’s my fault of sloppy thinking and sloppy notation: ¬p is meant as an atomic Horn goal, not an atom (in the FOL, not Prolog, sense) with classical negation. It should be denoted as ←p. Thank you for pointing out my error to me.

With that correction in mind, then, if ←p is an atomic Horn goal, and p and ¬p is a complementary pair eliminated during resolution, then the truth of p is uncertain (here, ¬p is a negative literal in a Horn goal so it is correct to denote it with ¬).

More precisely, in that case the truth of p is “uncertain” under a closed world assumption (CWA). NAF under a CWA, as in SLDNF-Resolution, is non-monotonic, which means that if we add clauses to a definite program, its success set can not only increase, but also decrease in cardinality (and the same if we remove clauses). Then a positive literal p that previously had an SLD-derivation (equivalently, a goal ←p that previously had an SLD-refutation) may now no longer have one, and v.v.

And that is where the uncertainty comes from: we may be able to establish a contradiction between p and ¬p now, but we have no idea whether that will still be the case in the future. Thus, a contradiction indicates not inconsistency, but uncertainty.

You have to excuse me if this is still not terminologically spick and span, but I’m still feeling my way around both the ideas, and the notation.

Yes! In the framework above, the atom p of uncertain truth is to be abduced.

I have to say that I started fumbling in the dark towards all this after a conversation with Antonis Kakas in the Hook meeting on Cognitive AI at the Royal Society in London, in September this year, so it’s not an accident that I considered the role of abducton in all this.

I have to say also that I can only “grok” abduction, as well as deduction and induction, in the context of NAF and SLD-resolution - because that’s all I know! In that context, given a definite clause C and a Horn goal ←G: “deduction” is the derivation of a new positive literal p in the head of C; “induction” is the derivation of a new definite clause D of the same predicate as a literal in ←G; and “abduction” is the derivation of a new positive literal p (that is the complement of a literal) in the body of C.

I bring good news: not only am I Greek, but also, in pricinple, all three modes of inference can be performed with Good, Old-Fashioned, SLD-resolution and, in practice, with a suitably modified Prolog meta-interpreter.

Deduction, of course, is done with the standard, “vanilla”, Prolog meta-interpreter (and its many variants).

Induction is possible with the inductive Prolog meta-interpreters in Meta-Interpretive Learning (MIL), a form of Inductive Logic Programming (ILP), that include second-order definite clauses in resolution. You can find the most recent example of one such meta-interpreter in my MIL system, Louise:

The inductive meta-interpreter is prove/5 starting on line 465. You’ll recognise its structure as that of the “vanilla” Prolog meta-interpreter. prove/5 is tabled so it’s really SLG-Resolution. I think prove/5 is the simplest and easiest to understand implementation of MIL to date (but of course I’d think that). The original version is in Metagol:

Abduction of a positive literal p is possible when its complement, ¬p, is derived during the SLD-Resolution of a Horn goal ←G with a definite clause C. Hence, p is of uncertain truth because it is a contradiction under a CWA. This I think accounts for the philosophical perspective of abduction as uncertain inference from evidence (here, incomplete evidence).

Abduction is performed in the hybrid neuro-symbolic MIL system Meta-Abd described in this paper:

To clarify, the MIL component of Meta-Abd uses an inductive-abductive Prolog meta-interpreter to “invent” new atoms of predicates marked as abducible, similar to what you say above about inventing “facts” by abduction.

Having been introduced to the code of Meta-Abd by its author, Wang-Zhou Dai, I know how Louise’s currently inductive-only meta-interpreter can be modified to also perform abduction. I’ve known this for a while but I couldn’t find a clear motivation to make this modification. I guess I couldn’t understand why abduction is proposed as a form of invention, or uncertain inference. Well, I think I get it now, and I’m very interested in handling uncertainty in a purely symbolic framework, for the purpose of (machine) learning and not just reasoning. Hence my remarks above about wanting to “re-interpret all this in an inductive setting”. I mean in the setting of ILP with MIL.

Unfortunately, for the time being I’m not being paid by anyone to work on that so I don’t know when I’ll be able to do it. I might have to suffer for my art, I guess :confused:

1 Like

Regarding temporal reasoning, treating NAF as a form of non-monotonic reasoning with uncertainty, also implies a temporal dimension to reasoning: we assume that “so far” what we know to be true, is true, and we derive all the logical consequences of our current assumptions, but we allow for future information to change our beliefs. Thus, we avoid the risk of being trapped in a world of bad assumptions.

Hah! When you said you’re developing a game scripting system, I thought you meant computer games, not board games. While I don’t play much anymore, I’ve done a bit of work with Magic: the Gathering. I’m interested in all aspects of this, with respect to machine-learning (with ILP): learning the rules, learning the language on the cards (“ability text”), learning to play and so on.

1 Like

Not really. If it doesn’t fit one page, I guess its not anymore sweet
fragrance “vanilla”. More a bagful of potatoes with a musty smell.
How would you make a teaching friendly version of your claim?

Starting with “vanilla” interpreter, pardon the cuts:

solve(true) :- !.
solve((A,B)) :- !, solve(A), solve(B).
solve(H) :- clause(H, B), solve(B).

What would be the extra for “abduction”? In my mind I
have like one extra line and some DCG magic.

Edit 07.01.2023
I took SLD what it is, no negation, unlike SLDNF. The SLDNF meta
interpretater has one extra line for NF. Here is the solution for some
SLD “abduction”, using DCG to return the abducted literals:

/* SWI-Prolog 9.1.0 */
solve(true) --> !.
solve((A,B)) --> !, solve(A), solve(B).
solve(A) --> {abducible(A)}, !, [A].
solve(H) --> {clause(H, B)}, solve(B).

Here is an example run:

/* SWI-Prolog 9.1.0 */
engine(turns) :- battery(nonempty).
abducible(battery(_)).

?- solve(engine(turns), L, []).
L = [battery(nonempty)].

I read the news but I had forgotten his connection to logic programming. I had interacted with him on HN where a barbarous mob of ignorant fools used to downvote his comments because he kicked established computer science orthodoxies in the shins and that’s too much for some peoples’ education to handle, I guess. He was a maverick and a pioneer to the end of his life. Would it that I could be like him.

That’s harsh :stuck_out_tongue:

The inductive meta-interpreter is small, including its clause-fetcher (like clause/2) that fetches second-order clauses and clauses induced so-far. I don’t know if it’s “one page” because I don’t know what is “one page”. About double the space is taken up by my comments, as is typical of my code. The rest of the code is pre- and post-processing, constraints, and a learning loop that passes positive and negative examples to the meta-interpreter. Those are not strictly necessary to understand the meta-interpreter.

Note that in ILP, “vanilla” is a reclaimed word. It is used somewhat derogatorily to mean ILP that isn’t probabilistic. The current fad is to try and mix probabilities with ILP, but that is such a mess that there is at least one such attempt made every time a paper is written. MIL is not probabilistic so Louise’s meta-interpreter is “vanilla”.

Yes, that is just the kind of modification I’m talking about. Nice and simple, isn’t it?

Btw, this is the vanilla part in Louise:

prove(true,_K,_MS,_Ss,Subs,Subs):-
    !.
prove((L,Ls),K,MS,Ss,Subs,Acc):-
    prove(L,K,MS,Ss,Subs,Subs_)
    ,prove(Ls,K,MS,Ss,Subs_,Acc).
prove((L),K,MS,Ss,Subs,Acc):-
    L \= (_,_)
    ,L \= true
    ,clause(L,K,MS,Ss,Subs,Subs_,Ls)
    ,prove(Ls,K,MS,Ss,Subs_,Acc).

The differences with solve/1 in your comment are a) the two guard literals in the last clause, b) the call to the custom clause-fetcher clause/7, and c) some additional arguments.

The custom clause-fetcher is necessary because we want to resolve the clauses of the program derived so-far with new clauses, and with second-order clauses. This is what ultimately makes the proof an inductive one. You can omit this custom fetcher and use clause/2 instead if you write every induced clause to the dynamic database, and then remove it when the proof fails, etc, but while that would make the meta-interpreter simpler, the whole program would be a bug-ridden mess that would be much harder to understand. I know because an earlier version did just that and it was an endless pain in the entailment.

The additional arguments are there to store the set of inducible symbols (Ss) the set of second-order clauses (Ms) and two accumulator variables (the last two arguments) necessary to return the induced clauses at the end of the proof. These would also collect abduced atoms in the abductive modification I describe above. Again, if you wanted to keep everything in three lines and ten literals, plus the cuts, you could write everything to the dynamic database, but, why? That would make it all a lot more painful to follow the program.

Of course the main attraction to the inductive meta-interpreter above is that it’s inductive, not that it’s vanilla. With impeccable reviewer style you focused on the part of the claim that was least consequential :stuck_out_tongue:

But if you think that an inductive Prolog meta-interpreter can be written that is simpler than the one above, and that is “more vanilla”, please go ahead and demonstrate. I for one would be very curious to see what that looked like.

Looking at vanilla separately from the full phrase “vanilla meta-interpreter” is
probably not legit. Thats like looking at butter in buttermilk.

I didn’t invent the term “vanilla meta-interpreter” (*). Also the usual “vanilla meta-interpreter”
is less “meta” than what one would think. Try doing the following queries with the
usual “vanilla meta-interpreter”:

?- solve(solve(true)).

?- solve(clause(p,X)).

A “vanilla meta-interpreter” which is meta-circular would need to be able to answer the
above queries as well, it is then likely that the “vanilla meta-interpreter” can interpret itself.

(*) The non-meta-circular “vanilla meta-interpreter” is for example found here,
thats one of the earliest references I could find.

Program 17.5 A meta-interpreter for pure Prolog
Sterling & Shapiro: The Art of Prolog, MIT Press, 1986
https://mitpress.mit.edu/9780262691635/the-art-of-prolog/

Maybe we can go more back in time?
Well Sterling & Shapiro give this explanation:

Unbenannt

1 Like

I didn’t think that you invented the term “vanilla”, anyway I’ve known it for a long time even if I don’t know where it comes from.

But listen, what I’m trying to say above is that the structure of the meta-interpreter is a distraction from its much more interesting property, that of being inductive. Let’s not waste time with irrelevancies.

You said that SLD-Resolution can only do deduction. It turns out, it can also do induction and adbuction. You saw yourself how simple and easy it is to do abduction. Induction is a little more complicated … a lot more complicated actually but it gets a lot simpler once you figure out how to do it.

This is something new. It is new in logic programming and in Inductive logic programming, in fact it’s exactly what “inductive” in “ILP” means, and it’s the first time in 30 years that it’s become a reality and not just an aspiration. And I think that the three modes of inference can actually be combined to create something even more powerful and useful. This should be useful information, right?

What do you mean by inductive? The usual saying is, it is a pure Prolog
program itself, which is synonymous to it is a definite program, although
this term is not anymore so much in use. What I gave as solve/1

wasn’t pure, because I used cuts. But I demanded pardon. What you gave
as prove/6 wasn’t pure either since it had (\=)/2 and a cut. A pure version
is found in Sterling & Shapiros book. It requires that clause/2 doesn’t bark

at (,)/2 or true/0, so that the “vanilla meta-interpreter” simply falls through
and can do its job. The “vanilla meta-interpreter” is not something practical,
rather a teaching device. Unlike a bagful of potatoes which could be

a more practical thing.

Edit 07.01.2023
Thats the pure thingy, no cuts and no (\=)/2, doesn’t work with the modern
ISO clause/2 anymore. Already the modern ISO clause/2 cannot access
static predicates. So its a clause/2 from another era:

image

Well if you’re making historical references that’s more interesting to me. A couple of days ago I had an online conversation about Prolog and meta-interpreters come up (they tend to, around me). My interlocutor said that they were listening to a talk by Sterling who said that meta-interpreters where invented by Warren, and then Kowalski who was in the audience stepped up to say that no, it was Colmerauer. This is the relevant part of my online discussion:

Just one personal recollection (from memory) which probably also influences my view on the Kowalski-Colmerauer relation. At a META 90 tutorial/talk about meta interpreters, Sterling attributed the origin of meta interpreters to Warren and the DEC10 manual. Kowalski responded (publicly during the talk) that this was well known to Colmerauer, well before Warren ever got into Prolog.

From here: > [3] What does the DEC10 Prolog manual say about the occurs check? See I.2.1. O... | Hacker News

Which is kind of weird given the Sterling & Shapiro quote, who also attributes meta-interpreters to Colmerauer. Perhaps my interlocutor misremembered.

I mean that, given a Horn goal and a set of first- and second-order definite clauses, it derives a new set of definite clauses that entail the atoms in the goal.

More formally, if B is a set of first-order definite clauses, M is a set of second-order definite clauses and ←e is an atomic Horn goal, such that B ∧ M |= e then an inductive meta-interpreter like the one I pasted above derives a set H of first-order clauses such that B ∧ M |= H and B ∧ H |= e.

So it’s a meta-interpreter that learns Prolog programs from Prolog programs.