Weeell, if you're curious then it's your fault
The specific use case that's been troubling me in the past few days is an evaluation module for an Inductive Logic Programming algorithm. The algorithm learns Prolog programs from examples and background knowledge and the evaluation module measures the accuracy of the learned program with respect to a partition of the examples unseen during training (the "testing" partition).

To evaluate a learned program the evaluation module generates its success set. There are two options for this, selected by a configuration option in the application: one is to generate the program's Least Herbrand Model by bottom-up evaluation with a TP operator; the other is to resolve the learned program against the background knowledge nondeterministically and collect all results.
In both cases the program is evaluated (as in executed) in the context of the background knowledge, because the purpose of learning is to find a program that entails the positive examples, and no negative examples, given the background knowledge or, in typical notation, B ∪ H |= E⁺ and, for all e⁻ in E⁻, B ∪ H \|= e⁻, where E⁺ are the positive examples, E⁻ the negative examples, B the background knowledge and H the learned program (the "hypothesis"), and "|=" and "\|=" are my best attempts at ASCII for "entails" and "does not entail".
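To make that concrete, here is a toy example (mine, not from the experiment I discuss below):

% B, the background knowledge:
parent(anna,bob).
parent(bob,carol).

% H, a learned hypothesis:
grandparent(X,Y):- parent(X,Z), parent(Z,Y).

% B ∪ H |= grandparent(anna,carol), a possible positive example.
% B ∪ H \|= grandparent(bob,anna), a possible negative example.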
To measure the accuracy of H, the evaluation module performs set operations on the success set of H, SS(H), and the positive and negative examples. So, for example, to measure the rate of false positives (a component of accuracy) the evaluation module takes the set intersection of SS(H) and E⁻. A "false positive", then, is an atom of the predicate of H that is in the intersection of the set of negative examples and the success set of H. And so on for false negatives, true positives and true negatives.
For reference, the following is the current bit of code that measures false positives. The ground/1 check is to avoid the issues I've been discussing. Rs is the success set, Neg is the negative examples and NP is the false positives:
% If the success set is ground, an ordered-set intersection suffices.
false_positives(Rs,Neg,NP):-
        ground(Rs)
        ,!
        ,ord_intersection(Rs,Neg,NP).
% Otherwise, fall back on intersection/3, which unifies elements.
false_positives(Rs,Neg,NP):-
        intersection(Rs,Neg,NP).
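The remaining measures follow the same pattern. For illustration, here is a sketch of what two of them might look like (my guess at the shape, not the actual module code; Pos is the positive examples, and true negatives are left out since they need the complement of SS(H)):

% Hypothetical sketch, mirroring false_positives/3 above.
true_positives(Rs,Pos,PP):-
        ground(Rs)
        ,!
        ,ord_intersection(Rs,Pos,PP).
true_positives(Rs,Pos,PP):-
        intersection(Rs,Pos,PP).

% False negatives: positive examples not covered by SS(H).
false_negatives(Rs,Pos,NN):-
        ground(Rs)
        ,!
        ,ord_subtract(Pos,Rs,NN).
false_negatives(Rs,Pos,NN):-
        subtract(Pos,Rs,NN).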
The problems begin when the background knowledge (a) is non-ground, (b) cannot be ground, or (c) is not datalog. By "datalog" I mean Horn clauses with no function symbols of arity more than 0 and with no literals negated by negation as failure. If (a), (b) or (c) holds, the TP operator can't be used or will return nonsensical results, so that option is out.
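For intuition, here is a naive, textbook-style TP step (my sketch, not the evaluation module's implementation; clauses are assumed to be Head:-Body terms, with facts written as Head:-true, and interpretations are ordsets of ground atoms):

:- use_module(library(ordsets)).

% One application of the TP operator: add to the interpretation I the
% head of every clause whose body is true in I.
tp_step(Clauses,I,I1):-
        findall(H
               ,(member(C,Clauses)
                ,copy_term(C,(H:-B))
                ,body_true(B,I)
                )
               ,Hs)
        ,list_to_ord_set(Hs,Hs_)
        ,ord_union(I,Hs_,I1).

% A body is true in I if each of its literals unifies with an atom in I.
body_true(true,_I):-
        !.
body_true((A,B),I):-
        !
        ,body_true(A,I)
        ,body_true(B,I).
body_true(A,I):-
        member(A,I).

% Iterate tp_step/3 to a fixpoint: the Least Herbrand Model.
least_herbrand_model(Clauses,M):-
        least_herbrand_model(Clauses,[],M).
least_herbrand_model(Clauses,I,M):-
        tp_step(Clauses,I,I1)
        ,(   I1 == I
         ->  M = I
         ;   least_herbrand_model(Clauses,I1,M)
         ).

For datalog this terminates because the Herbrand base is finite; with function symbols the fixpoint iteration may never stop, and with clauses whose head variables are not bound by the body the "ground" atoms collected are not ground at all, which is one way to end up with the nonsensical results I mentioned.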
The second option, using ordinary SLD resolution to generate SS(H), does not care about grounding, but that doesn't mean it will always generate ground terms. One recent experiment that's been giving me trouble is learning a DCG from a set of example strings (taken from a language used in a popular card game). The background knowledge is also a DCG. Note that examples are always ground. In fact, examples must be ground for the algorithm to work as expected.
Here is an example string, generated by calling phrase/2 on a "true" grammar (hey, it's an experiment):
ability([return,target,planeswalker,from,a,graveyard,to,'its\'','owner\'s',hand],[]).
Here's a clause in the learned grammar covering that example:
ability(A,B):-return_verb(A,C),target_from_graveyard_to_hand(C,B).
And here's the relevant atom of the success set of ability/2:
ability([return,target,planeswalker,from,a,graveyard,to,'its\'','owner\'s',hand|A],A).
This should explain the ground/1 check in my false_positives/3, above. ord_intersection/3 will consider the example string and the success-set atom above to be distinct and will not add them to the intersection of SS(H) and the examples. intersection/3 will unify them and add them to the intersection.
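To see the difference at the top level, with made-up terms standing in for the real ones:

?- Ex = ability([a,b],[]), SS = ability([a,b|T],T),
   ord_intersection([SS], [Ex], Is).
% Is = [] : under the standard order of terms the two atoms are distinct.

?- Ex = ability([a,b],[]), SS = ability([a,b|T],T),
   intersection([SS], [Ex], Is).
% Is = [ability([a,b],[])], T = [] : the two atoms unify.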
However, intersection/3 expects no duplicates, and generating SS(H) cannot be guaranteed not to produce duplicates (e.g. two DCG rules may entail the same examples), so SS(H) must be sorted to remove them. Below is the bit in my code that currently does the generation and grounding:
program_results(F/A,Ps,_BK,Rs):-
        configuration:success_set_generation(sld)
        % Setup: table the target predicate and assert the learned program.
        ,S = (table(user:F/A)
             ,assert_program(user,Ps,Refs_Ps)
             )
        % Goal: collect all atoms of F/A, Skolemise them so sort/2 can
        % remove duplicates, then restore variables with varnumbers/2.
        ,G = (findall(H
                     ,(functor(H,F,A)
                      ,call(user:H)
                      ,numbervars(H)
                      )
                     ,Rs_)
             ,sort(Rs_, Rs_S)
             ,varnumbers(Rs_S, Rs)
             )
        % Cleanup: retract the program and remove the tabling.
        ,C = (erase_program_clauses(Refs_Ps)
             ,untable(user:F/A)
             )
        ,setup_call_cleanup(S,G,C).
Above, F/A is the predicate of H, Ps is the learned program, _BK is the background knowledge, used when the "tp" option is selected, and Rs is SS(H).
(Note the code actually uses SLG resolution to avoid going infinite when the
hypothesis is left-recursive.)
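As a toy illustration of why the tabling matters (not the experiment's grammar), the left-recursive grammar below loops forever under plain SLD resolution, but terminates when its nonterminal is tabled:

:- table gr/2.    % gr//0 compiles to gr/2

gr --> gr, [a].
gr --> [a].

% ?- phrase(gr, [a,a,a]).
% true.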
While it is possible to twist things into loops and ground the background knowledge and the success set of the learned grammar in this case, I personally don't see how this can be done in a general enough manner that will work for different datasets where non-ground terms crop up for different reasons. My preferred method to ground terms is to Skolemise them with numbervars/1 but, for example, in the case of the DCG learning experiment that would cause false_negatives/3 to fail to return an empty intersection of SS(H) and the examples ('$VAR'(1) in ground H atoms would not unify with [] in the example strings).
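Concretely, using the standard numbervars/3 for the demonstration:

?- S = ability([a,b|T],T), numbervars(S, 0, _),
   S = ability([a,b],[]).
false.

% After Skolemisation T is bound to '$VAR'(0), which does not unify
% with the [] at the end of a ground example string.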
My code is on GitHub, you can have a look if you're still curious: the evaluation module is in lib/evaluation/evaluation.pl and the dataset is in data/examples/mtg_fragment.pl. Have fun and please report bugs.