<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Probabilistic Abductive Logic Programming using Possible Worlds</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>F. Rotella</string-name>
          <email>fulvio.rotella@uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Ferilli</string-name>
          <email>stefano.ferilli@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro Interdipartimentale per la Logica e sue Applicazioni - Università di Bari</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Informatica - Università di Bari</institution>
        </aff>
      </contrib-group>
      <fpage>131</fpage>
      <lpage>145</lpage>
      <abstract>
        <p>Reasoning in very complex contexts often requires purely deductive reasoning to be supported by a variety of techniques that can cope with incomplete data. Abductive inference makes it possible to guess information that has not been explicitly observed. Since there are many explanations for such guesses, each one needs to be assigned a probability. This work exploits logical abduction to produce multiple explanations consistent with a given background knowledge and defines a strategy to prioritize them using their chance of being true. Another novelty is the introduction of probabilistic integrity constraints rather than hard ones. We then propose a strategy that learns model and parameters from data and exploits our Probabilistic Abductive Proof Procedure to classify never-seen instances. This approach has been tested on some standard datasets, showing that it improves accuracy in the presence of corruptions and missing data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the field of Artificial Intelligence (AI), two approaches have been
attempted so far: numerical/statistical on one hand, and relational on the other.
Statistical methods are not able to fully seize the complex network of relationships,
often hidden, between events, objects or combinations thereof. In order to apply
AI techniques to learn/reason in the real world, one might be more interested in
producing and handling complex representations of data than flat ones. This first
challenge has been faced by exploiting First-Order Logic (FOL) for representing
the world. This setting is useful when data are certain and complete. However,
since this is far from being always true, there is the need for handling incompleteness
and noise in data. Uncertainty in relational data makes things even more
complex. In particular, it can affect the features, the type or, more generally, the
identity of an object and the relationships in which it is involved. When putting
together logical and statistical learning, the former provides the representation
language and reasoning strategies, and the latter enforces robustness. This gave
rise to a new research area known as Probabilistic Inductive Logic Programming
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] (PILP) or Statistical Relational Learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] (SRL). Although clearly
relevant in complex domains such as social or biological data, this setting inherits well-known
problems from Probabilistic Graphical Models [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (PGM) (parameter and model
learning, inference).
      </p>
      <p>
        Furthermore, reasoning in contexts characterized by a high degree of
complexity often requires purely deductive reasoning to be supported by a variety
of techniques that cope with the lack or incompleteness of the observations.
Abductive inference can tackle incompleteness in the data by making it possible to guess
information that has not been explicitly observed [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For instance, if one is
behind a corner and a ball comes out of it, a good explanation might be that
someone has kicked it. However, there are many other plausible explanations for
the ball’s movement, and thus that inference provides no certainty that someone
really struck the ball. While humans are able to discriminate which
explanations are consistent with their previous knowledge and which ones have to be
discarded, embedding this capability in machines is not easy.
      </p>
      <p>This work faces two issues: the generation of multiple (and minimal)
explanations consistent with a given background knowledge, and the definition of a
strategy to prioritize different explanations using their chance of being true. It
is organized as follows: the next section describes some related works; Section 3
introduces the Abductive Logic Programming framework; our Probabilistic
Abductive Logic Proof procedure is presented in Section 4; then in Section 5 we
propose a strategy to exploit our approach in classification tasks; finally, there is
an empirical evaluation on three standard datasets, followed by some
considerations and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Abductive reasoning is typically used to face uncertainty and incompleteness.
In the literature there are two main approaches: uncertainty has been addressed
by Bayesian probabilistic graphical models [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and incompleteness by means of
the classical approach based on pure logic [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. More recently, many works have combined
logical abduction and statistical inference, which makes it possible to rank all possible
explanations and choose the best one.
      </p>
      <p>
        One of the earliest approaches is [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] where a program contains non-probabilistic
definite clauses and probabilistic disjoint declarations {h1 : p1, ..., hn : pn}, where
an abducible atom hi is considered true with probability pi, for i ∈ {1, ..., n}.
That work focuses on the representation language and proposes a simple
language for integrating logic and probability. It does not integrate any form of
logic-based abductive proof procedure with statistical learning, and makes
hard assumptions on the nature of constraints (i.e. only disjoint declarations)
in order to keep the general procedure simple. This framework does not assign
probabilities to constraints but only to ground literals.
      </p>
      <p>
        PRISM [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is a system based on logic programming with multivalued
random variables. It provides no support for integrity constraints, but includes a
variety of top-level predicates which can generate abductive explanations. Since
a probability distribution over abducibles is introduced, the system chooses the
best explanation using a generalized Viterbi algorithm. Another central feature
of PRISM is its capability to learn probabilities from training data.
      </p>
      <p>
        Two approaches have merged directed and undirected graphical models with
logic. The former [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] exploits Bayesian Logic Programs [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] (BLPs) as a
representation language for abductive reasoning and uses the Expectation
Maximization algorithm to learn the parameters associated to the model. The latter
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], exploiting Markov Logic Networks (MLN), carries out abduction by adding
reverse implications for every rule in the knowledge base (since MLNs provide
only deductive inference). However, the addition of these rules increases the size
and complexity of the model, making it computationally expensive. It is worth
noting that, like MLNs, most SRL formalisms use deduction for logical inference,
and hence they cannot be used effectively for abductive reasoning.
      </p>
      <p>
        An approach for probabilistic abductive Logic programming with Constraint
Handling Rules has been proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It differs from other approaches to
probabilistic logic programming by having both interaction with external constraint
solvers and integrity constraints. Moreover, it exploits probabilities to optimize the
search for explanations using Dijkstra’s shortest path algorithm. Hence, the
approach always explores the most probable direction, so that the investigation
of less probable alternatives is suppressed or postponed. Although the
knowledge representation formalism is similar to ours, there are several differences
in the approaches. The first difference regards the support for negation. Their
framework does not handle negation, so they simulate it by introducing new
predicate symbols (e.g. haspower(X) → hasnopower(X)). The other difference
involves the definition of the constraints which, due to the lack of negation, might
be tricky. Conversely, ours allow a flexible and intuitive representation
without such restrictions. The last difference is the lack of a strategy to learn the
probabilities.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] abduction is conducted with Stochastic Logic Programs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] (SLP) by
considering a number of possible worlds. Abductive reasoning is carried out by
reversing the deductive flow of proof and collecting the probabilities associated to
each clause. Although this approach is probabilistically consistent with the SLP
language, abduction through reversal of deduction is quite hazardous, because
abductive reasoning by means of deduction without constraints may lead to
wrong conclusions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Abductive Logic Programming framework</title>
      <p>
        Our proposal is based on Abductive Logic Programming [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] (ALP), a high-level
knowledge representation framework that makes it possible to solve problems declaratively
on the basis of abductive reasoning. It extends Logic Programming by allowing some
predicates, called abducible predicates, to be incompletely defined. Problem
solving is performed by deriving hypotheses on these abducible predicates (abductive
hypotheses) as solutions of the problems to be solved. These problems can be
either observations that need to be explained (as in classical abduction) or goals
to be achieved (as in standard logic programming). An abductive logic program
is made up of a triple &lt; P, A, IC &gt;, where:
– P is a standard logic program;
– A (Abducibles) is a set of predicate names;
– IC (Integrity Constraints or domain-specific properties) is a set of first order
formulae that must be satisfied by the abductive hypotheses.
      </p>
      <p>These three components are used to define abductive explanations.</p>
      <sec id="sec-3-1">
        <title>Definition 1 (Abductive explanation)</title>
        <p>Given an abductive theory T = &lt; P, A, IC &gt; and a formula G, an abductive explanation Δ for G is a set of
ground atoms of predicates in A s.t. P ∪ Δ |= G (Δ explains G) and P ∪ Δ |= IC
(Δ is consistent). When it exists, T abductively entails G, in symbols T |=A G.</p>
        <p>Consider a clause C: h(t1, ..., tn) :- l1, ..., ln′, where h is the unique literal of the head, n is
its arity, the li with i ∈ {1, ..., n′} are the literals in the body, and ¬li denotes the
negation of literal li. For instance, Example 1 defines a logic program P for the
concept printable(X) that describes the features that a document must have in
order to be printed by a particular printer.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Example 1 (Example theory for paper domain).</title>
        <p>c1 : printable(X) ← a4(X), text(X)
c2 : printable(X) ← a4(X), table(X), black_white(X)
c3 : printable(X) ← a4(X), text(X), black_white(X)
A = { image, text, black_white, printable, table, a4, a5, a3}</p>
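        <p>To fix ideas, the abductive logic program &lt; P, A, IC &gt; of Example 1 can be encoded as plain data structures. The following is a hypothetical Python sketch (the names and the representation are ours, not part of the original framework):</p>

```python
# Hypothetical encoding of the abductive logic program of Example 1.
# Clauses are (head, body) pairs over unary predicates applied to a
# single document variable; abducibles are just predicate names.

P = [  # c1..c3: alternative definitions of printable(X)
    ("printable", ["a4", "text"]),
    ("printable", ["a4", "table", "black_white"]),
    ("printable", ["a4", "text", "black_white"]),
]

A = {"image", "text", "black_white", "printable", "table", "a4", "a5", "a3"}

# Integrity constraints as plain denials (nand): the listed predicates
# may not all hold at once for the same document.
IC = [["text", "color"], ["table", "color"]]

def bodies_for(goal):
    """Return the alternative bodies that define `goal` in P."""
    return [body for head, body in P if head == goal]

print(bodies_for("printable"))
```

Note that, as in Example 1, color is intentionally absent from A: its truth value can only come from the background knowledge.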
        <p>
          In this framework, a proof procedure for abductive logic programs has been
presented in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. It interleaves phases of abductive and consistency derivations:
an abductive derivation is the standard Logic Programming derivation extended
in order to consider abducibles. When an abducible literal δ has to be proved,
it is added to the current set of assumptions (if not already included). Since the
addition of δ must not violate any integrity constraint, a consistency derivation
starts to check that all integrity constraints containing δ fail. In the consistency
derivation an abductive derivation is used to solve each goal. This might cause
an extension of the current set of assumptions.
        </p>
        <p>More specifically, an abductive derivation from (G1 Δ1) to (Gn Δn) in &lt;
P, A, IC &gt; is a sequence</p>
        <p>(G1 Δ1), (G2 Δ2), ..., (Gn Δn)
such that each Gi has the form ← L1, ..., Lk and (Gi+1 Δi+1) is obtained
according to one of the following rules:
1. If Lj is not abducible, then Gi+1 = C and Δi+1 = Δi, where C is the resolvent of
some clause in P with Gi on the selected literal Lj;
2. If Lj is abducible and Lj ∈ Δi, then Gi+1 = ← L1, ..., Lj−1, Lj+1, ..., Lk and
Δi+1 = Δi;
3. If Lj is a ground abducible, Lj ∉ Δi and ¬Lj ∉ Δi, and there exists a consistency
derivation from ({Lj} Δi ∪ {Lj}) to ({} Δ′), then Gi+1 = ← L1, ..., Lj−1, Lj+1, ..., Lk
and Δi+1 = Δ′.
In the first two steps logical resolution is performed exploiting (1) the rules
of P and (2) the abductive assumptions already made. In the last step, before
adding a new assumption to the current set of assumptions, a consistency check
is performed.</p>
        <p>A consistency derivation for an abducible α from (α Δ1) to (Fn Δn) in
&lt; P, A, IC &gt; is a sequence</p>
        <p>(α Δ1), (F1 Δ1), ..., (Fn Δn)
where:
1. F1 is the union of all goals of the form ← L1, ..., Ln obtained by resolving
the abducible α with the constraints in IC, with no such goal being empty;
2. for each i &gt; 1, let Fi have the form {← L1, ..., Lk} ∪ Fi′; then, for some
j = 1, ..., k, each (Fi+1 Δi+1) is obtained according to one of the following
rules:
(a) If Lj is not abducible, then Fi+1 = C′ ∪ Fi′, where C′ is the set of all resolvents
of clauses in P with ← L1, ..., Lk on literal Lj and the empty goal [] ∉ C′, and
Δi+1 = Δi;
(b) If Lj is abducible, Lj ∈ Δi and k &gt; 1, then Fi+1 = {← L1, ..., Lj−1, Lj+1, ..., Lk} ∪ Fi′ and
Δi+1 = Δi;
(c) If Lj is abducible and ¬Lj ∈ Δi, then Fi+1 = Fi′ and Δi+1 = Δi;
(d) If Lj is a ground abducible, Lj ∉ Δi and ¬Lj ∉ Δi, and there exists an abductive
derivation from (← ¬Lj Δi) to ([] Δ′), then Fi+1 = Fi′ and Δi+1 = Δ′;
(e) If Lj is equal to ¬A with A a ground atom and there exists an abductive
derivation from (← A Δi) to ([] Δ′), then Fi+1 = Fi′ and Δi+1 = Δ′.</p>
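        <p>The interleaving of abductive and consistency derivations can be illustrated with a heavily simplified propositional sketch (our own illustration, not the procedure of [8]: no negation, no non-ground literals, constraints reduced to plain denials, and the consistency check collapsed into a single test on the assumption set):</p>

```python
# Simplified propositional sketch of an abductive proof procedure:
# prove a goal from program P, abducing literals in A while checking
# that no denial in IC becomes fully satisfied by facts + assumptions.

P = {"printable": [["a4", "text"]]}   # one clause, for brevity
A = {"text", "table"}                  # abducible predicates
IC = [["text", "color"]]               # denial: text and color cannot both hold
FACTS = {"a4"}

def consistent(delta):
    # Every denial must have at least one literal that is neither
    # a fact nor an assumption (i.e. every denial "fails").
    known = FACTS | delta
    return all(any(l not in known for l in ic) for ic in IC)

def prove(goals, delta):
    """Return a set of assumptions explaining `goals`, or None."""
    if not goals:
        return delta
    g, rest = goals[0], goals[1:]
    if g in FACTS:                       # deductive step on a known fact
        return prove(rest, delta)
    if g in A:                           # rules 2-3: abduce, then check
        new = delta | {g}
        if consistent(new):
            return prove(rest, new)
        return None
    for body in P.get(g, []):            # rule 1: unfold a defined predicate
        res = prove(body + rest, delta)
        if res is not None:
            return res
    return None

print(prove(["printable"], set()))       # abduces text
```

Running the sketch, printable is explained by assuming text, while a4 is proved deductively; adding color to FACTS would make the denial fire and the abduction fail.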
        <p>In the first case 2a, the current branch is split into as many branches as the resolvents
of ← L1, ..., Lk with the clauses in P on Lj. If we get the empty clause, the whole
check fails. In the second case, if Lj belongs to Δi it is discarded and, if it is
alone, the derivation fails. In case 2c the current branch is consistent with the
assumptions in Δi and so it is dropped from the checking. In the last two cases,
2d and 2e, the current branch can be dropped if ← ¬Lj and A, respectively, are
abductively provable.</p>
        <p>
          The procedure returns the minimal abductive explanation set if any,
otherwise it fails. It is worth noting that according to the implementation in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] the Δs
in the abductive explanations must be ground. Thus, if a non-ground abducible
is encountered, it is first unified with a clause in the background knowledge, with
an example, or with a previously abduced literal. If this is not possible, it is
grounded with a Skolem constant.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Probabilistic Abductive Logic Programming</title>
      <p>This work extends the technique shown in Section 3 in order to smooth the
rigid classical approach with a statistical one.</p>
      <p>In order to motivate our approach, we observe that, to abduce a fact, one
needs to check whether there are constraints that prevent such an abduction.
The constraints can be either universally valid laws (such as temporal or physical
ones), or domain-specific restrictions. Constraint verification can involve other
hypotheses that have to be abduced, and others that can be deductively proved.
If a hypothesis is deductively verified, it can safely be
assumed true. Conversely, if a hypothesis involves other abductions, all
possible situations must be evaluated before assuming the best one. In this
view each abductive explanation can be seen as a possible world, since each time
one assumes something he conjectures the existence of that situation in a specific
world. Some abductions might be very unlikely in some worlds, but most likely
in other ones. The likelihood of an abduction can be assessed considering what
we have seen in the real world and what we should expect to see.</p>
      <p>Moreover, this work handles a new kind of constraint: typically,
constraints are only nands of their conditions and are not probabilistic. Our typed constraints
allow a more suitable and understandable combination of situations. The first
kind is the classical nand denial, where at least one condition must be false;
the type or represents a set of conditions where at least one must be true; and
the type xor requires that exactly one condition be true. Moreover, due to
noise and uncertainty in the real world, we have smoothed each constraint with
a probability that reflects the degree of personal belief in the likelihood of
the whole constraint.</p>
      <p>In Example 2 it can be seen that each probabilistic constraint is a triple
⟨Pr, S, T⟩, where Pr is a probability expressing the constraint's reliability (our confidence
in it), T is the type of constraint and represents the kind of denial, and S is the
set of literals of the constraint. For example, ic1 states that a document can have
only one of the three size formats (a3, a4 or a5) and that our personal belief
in the likelihood of this constraint is 0.8; ic2 states that a document can be
composed of tables, images or text; and so on.</p>
      <sec id="sec-4-1">
        <title>Example 2 (Typed Probabilistic Constraints).</title>
        <p>ic1 = ⟨0.8, [a3(X), a4(X), a5(X)], xor⟩
ic2 = ⟨0.9, [table(X), text(X), image(X)], or⟩
ic3 = ⟨0.3, [text(X), color(X)], nand⟩
ic4 = ⟨0.3, [table(X), color(X)], nand⟩
ic5 = ⟨0.6, [black_white(X), color(X)], xor⟩
Our probabilistic approach to logical abductive reasoning can be described from
two perspectives: the logical proof procedure and the computation of
probabilities.</p>
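        <p>The three constraint types can be read as simple checks on the set of literals assumed true in a world. A minimal sketch, with hypothetical names and constraints encoded as (probability, literals, type) triples as in Example 2:</p>

```python
# Minimal sketch of typed constraint satisfaction: a world is the set
# of ground literals assumed true, and each constraint is a triple
# (probability, literals, type) as in Example 2.

def satisfied(constraint, world):
    _prob, literals, ctype = constraint
    true_count = sum(1 for l in literals if l in world)
    if ctype == "nand":   # at least one condition must be false
        return true_count != len(literals)
    if ctype == "or":     # at least one condition must be true
        return true_count != 0
    if ctype == "xor":    # exactly one condition must be true
        return true_count == 1
    raise ValueError("unknown constraint type: " + ctype)

ic1 = (0.8, ["a3", "a4", "a5"], "xor")
ic3 = (0.3, ["text", "color"], "nand")

world = {"a4", "text"}          # an a4 text document
print(satisfied(ic1, world))    # True: exactly one size format
print(satisfied(ic3, world))    # True: color does not hold
```

The probability component plays no role in the check itself; it only weighs the constraint when scoring a world, as described in Section 4.2.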
        <sec id="sec-4-1-1">
          <title>ALP perspective</title>
          <p>
            The logical proof procedure consists of an extended version of the classical one
[
            <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
            ]. While the classical procedure resolves the goal by looking for one minimal
abductive explanation set, our approach is to generate different (minimal)
abductive explanations and then evaluate them all. This goal can be achieved by
changing each assumption that may constrain the subsequent abductive
explanations. For example, supposing that a document is black and white, all subsequent
assumptions must be consistent with this. But if we change this assumption,
assuming that in another world that document is not black and white, we might
obtain another set of hypotheses consistent with this last assumption. Each time
the procedure assumes something, two possible worlds can be considered: one
where the assumption holds and another where it does not. The overall view is
analogous to the exploration of a tree in which each interior node corresponds to
a decision (an assumption), such as changing the truth value of a literal,
an edge represents the consequences of that decision, and a leaf is the conclusion
of the abductive reasoning in a particular possible consistent world. In order to
generate possible worlds in which a literal holds and others in which it does
not, the classical procedure has been extended by introducing two rules
into the derivation procedures. The 4th rule of the abductive derivation is:
4. If Lj is a ground abducible, Lj ∉ Δi and ¬Lj ∉ Δi, and there exists a consistency
derivation from ({¬Lj} Δi ∪ {¬Lj}) to ({} Δ′), then Gi+1 = ← L1, ..., Lj−1, Lj+1, ..., Lk
and Δi+1 = Δ′.
A corresponding rule 2d* for the consistency derivation is introduced between
2d and 2e, and states:
2d* If Lj is a ground abducible, Lj ∉ Δi and ¬Lj ∉ Δi, and there exists an abductive
derivation from (← Lj Δi) to ([] Δ′), then Fi+1 = Fi′ and Δi+1 = Δ′.
It can be noted that rules 3 and 4 of the abductive derivation are the choice
points on which the procedure can backtrack in order to assume both values
(positive and negative) of a literal. In fact, in one possible world Lj is added to the
set of hypotheses by means of the 3rd rule, and in the other possible world ¬Lj holds
by means of the 4th rule. The analogous choice points in the consistency derivation
are respectively rules 2d* and 2d.
          </p>
          <p>The choice of the predicate definition, that is the 1st rule of the abductive
derivation, is another choice point where the procedure can backtrack to explore
different explanations and consequently other possible worlds. In fact, by choosing
different definitions, other literals are taken into account, and thus other
constraints must be satisfied, and so on. It is worth noting that if some assumptions
do not preserve consistency, the procedure discards them along with the
corresponding possible worlds. Section 4.2 will present a probabilistic strategy to
choose the most likely explanation among all possible worlds.</p>
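          <p>The tree of possible worlds described above can also be pictured, in brute-force form, as the set of truth assignments over the abducibles that satisfy all typed constraints; the actual procedure explores this space lazily by backtracking instead of enumerating it. A hypothetical sketch:</p>

```python
from itertools import product

# Brute-force picture of the possible worlds explored by backtracking:
# every truth assignment over a few abducibles is a candidate world, and
# only those satisfying all typed constraints survive. The real proof
# procedure builds only the relevant worlds via choice points.

def satisfied(constraint, world):
    _prob, literals, ctype = constraint
    n_true = sum(1 for l in literals if l in world)
    return {"nand": n_true != len(literals),
            "or": n_true != 0,
            "xor": n_true == 1}[ctype]

abducibles = ["table", "text", "image"]
constraints = [(0.9, ["table", "text", "image"], "or")]   # ic2 of Example 2

worlds = []
for values in product([True, False], repeat=len(abducibles)):
    world = {a for a, v in zip(abducibles, values) if v}
    if all(satisfied(ic, world) for ic in constraints):
        worlds.append(world)

print(len(worlds))   # 7 of the 8 assignments satisfy the or-constraint
```

Only the all-false assignment violates the or constraint here; adding the nand and xor constraints of Example 2 would prune the space further.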
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Example 3 (Observation o1, Query and Possible Explanations).</title>
        <p>a4(o1) ?- printable(o1)
Δ1 = {text(o1), table(o1)} Ic1 = {ic2, ic3, ic4}
Δ2 = {text(o1), ¬table(o1), image(o1)} Ic2 = {ic2, ic3, ic4}
Δ3 = {table(o1), black_white(o1)} Ic3 = {ic2, ic4, ic5}
Δ4 = {text(o1), black_white(o1)} Ic4 = {ic2, ic3, ic5}
In order to motivate our point of view, let’s consider Example 3, belonging to
the “paper” domain presented in Section 3. In order to abductively cover o1
by means of the concept definition printable(X) (i.e. query ?- printable(o1)),
there is the need for abducing other literals, since the concept definitions ci with
i ∈ {1, ..., 3} need some facts that are not in the knowledge base (KB) (i.e. only
a4(o1) holds). This is the classical situation in which deductive reasoning fails
and an abductive one is needed.</p>
        <p>The procedure executes an abductive derivation applying the 1st rule and
obtaining one of the resolvents in P, for instance the first clause c1 : printable(X) ←
a4(X), text(X). In this case a4(o1) holds, so we can exclude it from the
abductive procedure, continuing with the next literal text(o1). This literal does not
hold, so it needs to be abduced. This abduction involves the
verification of ic2 and ic3 by the consistency derivation. Since ic2 is an or constraint,
and thus at least one literal must be true, there are two possible ways to satisfy
it, considering that text(o1) must not hold (by the current hypothesis). Thus there
are two possible worlds: one in which table(o1) holds, and the other in which
it does not.</p>
        <p>In the former world rule 2d* fires and table(o1) is abduced by performing
a consistency derivation on the constraints ic2 and ic4. The former constraint
is already verified by means of the previous abductions (rule 2b). Since ic4 is a
nand constraint and thus at least one literal must be false, it is satisfied because
color is not abducible (see abducible predicates A in Example 1) and does not
hold (if a literal is not abducible, its value depends on background knowledge,
i.e. rule 2a). Thus coming back to the initial abduction text(o1), ic2 is satisfied
abducing table(o1) and ic3 is already satisfied (since it is a nand constraint, it
is falsified by abducing text(o1)). The first abductive explanation Δ1 for the
goal printable(o1) is thus formed by the abductions of text(o1) and table(o1) (see
Example 3).</p>
        <p>Similarly to the classical procedure that in backtracking returns all minimal
abductive explanations, our procedure comes back to each choice point changing
the truth values of the literals in order to explore different possible explanations
(and thus other possible worlds). The first choice point, as mentioned above,
regards the abduction of table(o1). While in the former possible world table(o1)
holds, the procedure now abduces ¬table(o1) (rule 2d), thus exploring the other
possible world. In this world, to abduce ¬table(o1) the constraints ic2 and ic4 are
taken into account. The former constraint is verified by abducing image(o1) (i.e.
an or constraint) and the latter is satisfied by the initial abduction text(o1), as in
explanation Δ1. Then text(o1) can be abduced also in this world, since ic2 has
just been verified and ic3 as before. So Δ2 is the second explanation, obtained by
changing the assumption on literal table(o1) and thus exploring another possible
world.</p>
        <p>The explanations Δ1 and Δ2 are the only two possible worlds given the first
clause c1, because other literal configurations are inconsistent. Hence, the
procedure comes back to the last backtracking point, that is, the choice of the predicate
definition. This time the 1st rule of the abductive derivation can be applied to
obtain the second predicate definition c2 : printable(X) ← a4(X), table(X), black_white(X).
So there is the need for abducing table(o1) and black_white(o1). The former can
be abduced because ic2 is satisfied (i.e. an or constraint), and ic4 is verified since
color(o1) does not hold. The latter literal can be abduced because ic5 is a xor
constraint and thus exactly one literal must be true. In explanation Δ3 no other
literals must be abduced by the consistency derivation, and so no other worlds
must be explored, since the abduced literals table(o1) and black_white(o1) are
constrained by the predicate definition.</p>
        <p>Once again the procedure comes back to last backtracking point and chooses
the predicate definition c3 : printable(X) ← a4(X), text(X), black_white(X).
The abductive derivation, similarly to the previous run, returns only one possible
explanation, Δ4, as can be seen in Example 3.</p>
        <p>It is easy to note that rules (2d) and (2d*) are candidate entry points for
backtracking in the consistency derivation, and rules 1, 3 and 4 are the analogues
in the abductive derivation. In this way all possible (minimal) explanations are
obtained, along with all possible consistent worlds.</p>
        <sec id="sec-4-2-1">
          <title>Probabilistic perspective</title>
          <p>After all different explanations have been found by the above abductive proof
procedure, the issue of selecting the best one arises. The simple approach of
choosing the minimal explanation is reliable when there is only one minimal
proof or when there are no ways to assess the reliability of the explanations.
However, Example 3 shows that there might be different explanations for the
same observation and so we need to assess the probability of each explanation in
order to choose the best one. To face this issue, our approach regards each set of
abductions as a possible world and so a chance of being true can be assigned to
it. The abduction probability of each ground literal through the possible worlds
can be obtained considering two aspects: the chance of being true in the real
world and all sets of assumptions made during the consistency derivation in
each possible world.</p>
          <p>Let’s introduce some notation: Δ = {P1 : (Δ1, Ic1), ..., PT : (ΔT, IcT)} is
the set of the T consistent possible worlds that can be assumed for proving
a goal G (i.e. the observation to be proved). Each (Δi, Ici) is a pair of sets:
Δi = {δ1, ..., δJ} contains the ground literals δj, with j ∈ {1, ..., J}, abduced in a
single abductive proof, and Ici = {ic1, ..., icK} is the set of the constraints ick,
with k ∈ {1, ..., K}, involved in the explanation Δi. Both Δi and Ici may be
empty. Moreover, we use the following symbols in our equations: n(δj) is
the number of true groundings of the predicate used in literal δj, n(cons) is the total
number of constants encountered in the world, a(δj) is the arity of literal δj, and
P (ick) is the probability of the k-th constraint. Thus the chance of a
ground literal δj being true can be defined as:</p>
          <p>P (δj) = n(δj) / [n(cons)! / (n(cons) − a(δj))!]   (1)
Then the unnormalized probability of the abductive explanation can be assessed
by Equation 2, and the probability of the abductive explanation normalized over
all T consistent worlds can be computed as in Equation 3:</p>
          <p>P′(Δi, Ici) = ∏j=1..J P (δj) ∗ ∏k=1..K P (ick)   (2)</p>
          <p>P(Δi, Ici) = P′(Δi, Ici) / ∑t=1..T P′(Δt, Ict)   (3)</p>
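          <p>Equations 1–3 translate directly into code. In this hypothetical sketch (the function names are ours), the denominator of Equation 1 is the number of groundings without repeated constants, n(cons)!/(n(cons) − a(δj))!:</p>

```python
from math import factorial, prod

# Sketch of Equations 1-3 with hypothetical function names.

def p_literal(n_true, n_cons, arity):
    """Equation 1: true groundings over injective groundings."""
    possible = factorial(n_cons) // factorial(n_cons - arity)
    return n_true / possible

def p_explanation(literal_probs, constraint_probs):
    """Equation 2: unnormalized probability of one possible world."""
    return prod(literal_probs) * prod(constraint_probs)

def normalize(unnormalized):
    """Equation 3: normalize over all T consistent worlds."""
    total = sum(unnormalized)
    return [p / total for p in unnormalized]

# A unary literal with 3 true groundings among 10 known constants:
print(p_literal(3, 10, 1))   # 0.3
```

For a unary literal the denominator reduces to n(cons), so Equation 1 is simply the fraction of constants for which the predicate is known to hold.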
          <p>Equation 1 expresses the ratio between true and possible groundings of literal
δj. The intuition behind this equation can be expressed with a simple example:
given a feature f (· ) and an item obj that does not have such a feature, if we want
to assess the probability that obj owns f (· ) (i.e. P (f (obj))), we should consider
how often we found items that hold f (· ) over all the items that might own it in the real
world.</p>
          <p>
            It is worth noting that we are considering the Object Identity [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] assumption
and so within a clause, terms (even variables) denoted with different symbols
must be distinct (i.e. they must refer to different objects). Thus only literal
groundings without constants repetitions are allowed in Equation 1. It is
important to underline that such a bias does not limit the expressive power of
the language, since for any clause/program it is possible to find an equivalent
version under Object Identity. The first part of Formula (2) encloses the
probability that all abduced literals are true in that particular world. The second
part expresses the reliability of the constraints involved in the i-th abductive
explanation. Although our approach focuses on the computation of the most
probable explanation, and hence there is no need to normalize the probabilities
of the explanations over all possible worlds (i.e. some worlds are ruled out due
to the presence of integrity constraints), Equation 3 is presented
for completeness. The probability of ¬δj is equal to 1 − P (δj).
          </p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>Example 4 (Probability assessment of the Abductive Explanations).</title>
        <p>A = {0.2:image, 0.4:text, 0.1:black_white, 0.6:printable, 0.1:table, 0.9:a4, 0.1:a5, 0.1:a3}
P′(Δ1, Ic1) = 0.00486 P′(Δ2, Ic2) = 0.00875
P′(Δ3, Ic3) = 0.00162 P′(Δ4, Ic4) = 0.00648
For the sake of clarity, each abducible literal of the current example has been
labeled with its probability computed using Formula (1), as shown in Example 4. For
instance, the probability of the first explanation using (2) can be computed
as P′(Δ1, Ic1) = P (text(o1)) ∗ P (table(o1)) ∗ P (ic2) ∗ P (ic3) ∗ P (ic4) and thus
P′(Δ1, Ic1) = 0.6 ∗ 0.1 ∗ 0.9 ∗ 0.3 ∗ 0.3 = 0.00486. Then, Example 4 shows the
probability of all explanations computed using Equation (2).</p>
        <p>Finally we can determine the maximum probability among all abductive
explanations. It corresponds to the maximum over all T possible consistent worlds
(in this example T is equal to 4), s.t. P ′(printable(o1)) = max1≤i≤T Pi′(Δi, Ici),
that is Δ2. It is worth noting that this behaviour shows why
logical abductive reasoning needs to be supported by a probabilistic assessment of all
abductive explanations rather than relying on the minimal one.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Improving Classification Exploiting Probabilistic Abductive Reasoning</title>
      <p>
        The above proof procedure can now be exploited to classify never-seen instances.
In particular, we first learn from the data the model (i.e. the Abductive Logic
Program &lt; P, A, IC &gt;) and the parameters (i.e. the literal probabilities), and then
our Probabilistic Abductive Logic proof procedure can be exploited to classify
new instances. The strategy, presented in Algorithm 1, can be split into two
parts: the former prepares the data (model and parameters), and the latter
performs the classification. In the former part, given a training set T raini and a set
of abducible literals A (possibly empty), our approach learns the corresponding
theory Ti (line 1) by exploiting INTHELEX [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a well-known ILP system, and
then obtains the integrity constraints (line 2) with the procedure described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
This procedure returns a set of nand constraints of different sizes, along with the
type domain of each descriptor. This last piece of information can be used to define a new kind of
constraint, called type_domain, that can be treated as an xor constraint. In fact, if the
type domain for the color property is {blue, red, yellow, black, green},
and the object X is part of an observation, it will be impossible to abduce two
different color descriptors from the above set applied to X. However, since our
procedure natively handles a probability value associated with the constraints, and
the constraint-learning procedure does not return such values, the constraints must be manually
labelled, or they will automatically be considered true with probability 1.0. In
any case, the set of abducibles A can be left empty, because our system considers
abducible all predicates without definitions in Ti. The last step of the first part
(line 3) computes Equation 1 for each literal in T raini before starting the
abductive procedure, since those values depend only on T raini.
      </p>
      <p>The second part of the strategy starts at line 5. As can be seen at lines
8 and 9, our algorithm tries to abductively cover the example considering it both
as positive and as negative for the class c. In fact, when an example is considered
negative, our procedure discovers all possible worlds in which it cannot be
abductively proved as an instance of concept c. Symmetrically, if the example is positive, it
discovers all the possible worlds in which something must be abduced to prove
it. These two executions return explanation probabilities that can be compared
to each other in order to choose the best class. The algorithm then selects the best
classification among all concept probabilities, as can be seen at line 16.</p>
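      <p>A minimal sketch of this classification step follows; abductive_prob is a hypothetical stand-in for a call to the probabilistic abductive proof procedure, and comparing the positive and negative scores by their difference is an assumption, not the exact rule of Algorithm 1:</p>
      <preformat>
```python
def classify(example, classes, abductive_prob):
    # For each class, run the proof procedure twice: once treating the
    # example as positive for c, once as negative, and compare the returned
    # best-explanation probabilities (assumed comparison rule).
    scores = {}
    for c in classes:
        p_pos = abductive_prob(example, c, positive=True)
        p_neg = abductive_prob(example, c, positive=False)
        scores[c] = p_pos - p_neg
    # Select the best classification among all concept scores (cf. line 16).
    return max(scores, key=scores.get)
```
      </preformat>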
      <p>It is worth noting that this strategy cannot be performed by a pure abductive
logic proof procedure, since in this context we do not need a logical response
(true/false) but a probabilistic value.</p>
    </sec>
    <sec id="sec-7">
      <title>Experimental Evaluation</title>
      <p>
        The evaluation is aimed at assessing the quality of the results obtained by the
probabilistic classification when it faces incomplete and noisy data. All
experiments were performed on datasets from the UCI [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] machine learning
repository.
      </p>
      <p>A 10-fold split of each dataset has been performed in order to obtain a set
of 10 pairs &lt;T rain, T est&gt;. Then each test-set has been replaced by a set
of corrupted versions, in which we removed at random K% of each example
description, with K varying from 10% to 70% in steps of 10. This procedure has
been repeated 5 times for each corrupted test-set in order to randomize the
example corruption. In this way 35 test-sets for each fold have been obtained
(7 levels of corruption by 5 runs per level). In order to compare the outcomes
with a complete test-set, we exploited deductive reasoning on the original test-set
(i.e. corruption level 0%). Moreover, we applied purely deductive reasoning to the
same corrupted test-sets in order to show the improvement of our approach. The
maximum length of constraints has been set to 4 for all datasets. Since we do
not have any previous knowledge on the datasets, we assumed all obtained
constraints to be true with a probability of 1.0, as described in Section 5.</p>
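      <p>The corruption procedure described above can be sketched as follows (function names are illustrative; an example description is treated as a list of literals):</p>
      <preformat>
```python
import random

def corrupt(description, k_percent, rng):
    # Remove K% of the literals of one example description at random.
    n_remove = round(len(description) * k_percent / 100)
    return rng.sample(description, len(description) - n_remove)

def corrupted_test_sets(test_set, rng=None):
    # 7 corruption levels (10%..70%, step 10) x 5 random runs per level.
    rng = rng or random.Random(0)
    return [[[corrupt(ex, k, rng) for ex in test_set] for _ in range(5)]
            for k in range(10, 80, 10)]
```
      </preformat>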
      <p>The performance of the system can then be evaluated with the aim of
understanding how sensitive the approach is to the progressive lack of knowledge
across the 10 folds. The following synthetic descriptions of the datasets refer to
values averaged over the folds.</p>
      <p>Breast-Cancer contains 201 instances of the benign class and 85 instances
of the malignant class, each described by 9 literals. Fewer than
10 instances have missing values. The learned theory
consists of 30 clauses, each with an average of 6 literals in the body.</p>
      <p>The learned integrity constraints are 1784 (55% are constraints of length 4, 35%
of length 3 and 10% of length 2), and 9 type domain constraints (one for each
literal in the example description language).</p>
      <p>Congressional Voting Records contains 435 instances (267 democrats,
168 republicans) classified as democrats or republicans according to their votes.
Each instance is described by 16 literals. The obtained theory consists of 35
clauses where each one has an average of 4.5 literals in the body. The learned
integrity constraints are 4173 (16% are constraints of length 4, 37% of length 3
and 47% of length 2), and 16 type domain constraints (as before).</p>
      <p>Tic-Tac-Toe contains 958 end-game board configurations of tic-tac-toe
(about 65.3% are positive), where x is assumed to have played first. The
target concept win(x) represents one of the 8 possible ways to create a
three-in-a-row. Thus, each instance is described by 8 literals. The learned theory consists
of 18 clauses, each with an average of 4 literals in the body. The learned
integrity constraints are 1863 (99% are constraints of length 4, 1% of length 3),
and 16 type domain constraints (as before).</p>
      <sec id="sec-7-1">
        <title>Results and Discussion</title>
        <p>Across the corruption levels, the probabilistic classification performed on the first two
datasets allows assessing the strength and robustness of the approach. In fact,
their accuracy curves decrease more slowly than the third's, and even when
70% of each example description has been removed, the system is able to classify
never-seen instances with a mean accuracy of 0.7 and 0.6, respectively.</p>
        <p>In order to understand the non-linear trend of the accuracy on the third
dataset, we can analyze Table 1. It shows the classification performances averaged
within the same fold (i.e. 5 random corruptions) and between different folds
for each level of corruption. We can observe a sharp decay of recall for the third
dataset in correspondence with the descending accuracy curve. This happens for
two reasons: each example is described more sparsely than those of the other datasets
(8 literals per description), and its background theory consists of an average of 4
literals per clause. These two aspects make the approach prone to abducing
few literals (often a single one), leaving positive examples never covered by the
class definitions.</p>
        <p>Comparing these results with the deductive approach, it can be noted that
in the first dataset, beyond 30% corruption, no positive example is
classified correctly. This deficiency never affected our approach in any
experiment. The performance of the deductive approach on the last two datasets degrades
faster than that of the abductive one.</p>
        <p>In general, if the examples are described by fewer literals (as in Tic-Tac-Toe and
Breast-Cancer), the number of misclassifications increases due to missing information.
It follows that corruption levels greater than 50% raise the chance of removing
important object features, and so such experiments have been performed only
to stress-test the proposed approach.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Conclusions</title>
      <p>Reasoning in very complex contexts often requires pure deductive reasoning to
be supported by a variety of techniques that can cope with incomplete and
uncertain data. Abductive inference allows guessing information that has not
been explicitly observed. Since there are many explanations for such guesses,
a probability must be assigned to each of them in order to choose the best
one.</p>
      <p>In this paper we propose a strategy to rank all possible minimal explanations
according to their chance of being true. We have faced two issues: the generation
of multiple explanations consistent with the background knowledge, and the
definition of a strategy to prioritize among different ones using their chance of
being true according to the notion of possible worlds. Another novelty has been
the introduction of probabilistic integrity constraints rather than hard ones as
in the classical Abductive Logic Programming framework.</p>
      <p>The proposal has been described from two perspectives: the abductive proof
procedure and the computation of the probabilities. The former, extending the
classical procedure, allows generating many different explanations for each abduction,
while the latter provides a probabilistic assessment of each explanation in order
to choose the most likely one. Moreover, we introduced a strategy to exploit our
Probabilistic Abductive Proof Procedure in classification tasks. The approach
has been evaluated on three standard datasets, showing that it is able to
correctly classify unseen instances in the presence of noisy and missing information.
Further studies will focus on learning the probabilistic constraints and on
the use of a richer probabilistic model for the literal probability distribution.
We also aim to exploit the Probabilistic Abductive Proof Procedure in other
tasks such as natural language understanding and plan recognition.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Arvanitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Stephen H.</given-names>
            <surname>Muggleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jianzhong</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hiroaki</given-names>
            <surname>Watanabe</surname>
          </string-name>
          .
          <article-title>Abduction with stochastic logic programs based on a possible worlds semantics</article-title>
          .
          <source>In Short Paper Proc. of 16th ILP</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bache</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lichman</surname>
          </string-name>
          .
          <article-title>UCI machine learning repository</article-title>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Henning</given-names>
            <surname>Christiansen</surname>
          </string-name>
          .
          <article-title>Implementing probabilistic abductive logic programming with Constraint Handling Rules</article-title>
          . pages
          <fpage>85</fpage>
          -
          <lpage>118</lpage>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Floriana</given-names>
            <surname>Esposito</surname>
          </string-name>
          , Giovanni Semeraro, Nicola Fanizzi, and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Ferilli</surname>
          </string-name>
          .
          <article-title>Multistrategy theory revision: Induction and abduction in INTHELEX</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>38</volume>
          :
          <fpage>133</fpage>
          -
          <lpage>156</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Teresa M. A.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Di Mauro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Floriana</given-names>
            <surname>Esposito</surname>
          </string-name>
          .
          <article-title>Automatic induction of abduction and abstraction theories from observations</article-title>
          .
          <source>In Proc. of the 15th ILP, ILP'05</source>
          , pages
          <fpage>103</fpage>
          -
          <lpage>120</lpage>
          , Berlin, Heidelberg,
          <year>2005</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Lise</given-names>
            <surname>Getoor</surname>
          </string-name>
          .
          <article-title>Learning statistical models from relational data</article-title>
          .
          <source>PhD thesis</source>
          , Stanford, CA, USA,
          <year>2002</year>
          . AAI3038093.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Kakas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Kowalski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          .
          <source>Abductive logic programming</source>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Antonis C.</given-names>
            <surname>Kakas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Riguzzi</surname>
          </string-name>
          .
          <article-title>Abductive concept learning</article-title>
          .
          <source>New Generation Comput.</source>
          ,
          <volume>18</volume>
          (
          <issue>3</issue>
          ):
          <fpage>243</fpage>
          -
          <lpage>294</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Rohit J.</given-names>
            <surname>Kate</surname>
          </string-name>
          and
          <string-name>
            <given-names>Raymond J.</given-names>
            <surname>Mooney</surname>
          </string-name>
          .
          <article-title>Probabilistic abduction using markov logic networks</article-title>
          .
          <source>In Proceedings of the IJCAI-09 Workshop on Plan, Activity, and Intent Recognition (PAIR-09)</source>
          , Pasadena, CA,
          <year>July 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Muggleton</surname>
          </string-name>
          .
          <article-title>Stochastic Logic Programs</article-title>
          . In L. De Raedt, editor,
          <source>Proceedings of the 5th International Workshop on Inductive Logic Programming</source>
          . Department of Computer Science, Katholieke Universiteit Leuven,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Judea</given-names>
            <surname>Pearl</surname>
          </string-name>
          .
          <article-title>Graphical models for probabilistic and causal reasoning</article-title>
          .
          <source>In The Computer Science and Engineering Handbook</source>
          , pages
          <fpage>697</fpage>
          -
          <lpage>714</lpage>
          .
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>David</given-names>
            <surname>Poole</surname>
          </string-name>
          .
          <article-title>Probabilistic horn abduction and bayesian networks</article-title>
          .
          <source>Artif. Intell.</source>
          ,
          <volume>64</volume>
          (
          <issue>1</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>129</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>David</given-names>
            <surname>Poole</surname>
          </string-name>
          .
          <article-title>Learning, Bayesian probability, graphical models, and abduction</article-title>
          .
          <source>In Abduction and Induction: Essays on their Relation and Integration</source>
          , Chapter
          <volume>10</volume>
          , pages
          <fpage>153</fpage>
          -
          <lpage>168</lpage>
          . Kluwer,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Luc</given-names>
            <surname>De Raedt</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kristian</given-names>
            <surname>Kersting</surname>
          </string-name>
          .
          <article-title>Probabilistic inductive logic programming</article-title>
          .
          <source>In ALT</source>
          , pages
          <fpage>19</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Sindhu V.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          .
          <article-title>Bayesian abductive logic programs: A probabilistic logic for abductive reasoning</article-title>
          . In Toby Walsh, editor,
          <source>IJCAI</source>
          , pages
          <fpage>2840</fpage>
          -
          <lpage>2841</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Taisuke</given-names>
            <surname>Sato</surname>
          </string-name>
          .
          <article-title>EM learning for symbolic-statistical models in statistical abduction</article-title>
          .
          <source>In Progress in Discovery Science, Final Report of the Japanese Discovery Science Project</source>
          , pages
          <fpage>189</fpage>
          -
          <lpage>200</lpage>
          , London, UK, UK,
          <year>2002</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          , Floriana Esposito, Donato Malerba, Nicola Fanizzi, and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Ferilli</surname>
          </string-name>
          .
          <article-title>A logic framework for the incremental inductive synthesis of datalog theories</article-title>
          . In Norbert E. Fuchs, editor,
          <source>LOPSTR</source>
          , volume
          <volume>1463</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>300</fpage>
          -
          <lpage>321</lpage>
          . Springer,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>