<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Probabilistic Abductive Logic Programming using Possible Worlds</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>F. Rotella</string-name>
          <email>fulvio.rotella@uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Ferilli</string-name>
          <email>stefano.ferilli@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro Interdipartimentale per la Logica e sue Applicazioni - Università di Bari</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Informatica - Università di Bari</institution>
        </aff>
      </contrib-group>
      <fpage>131</fpage>
      <lpage>145</lpage>
      <abstract>
        <p>Reasoning in very complex contexts often requires purely deductive reasoning to be supported by a variety of techniques that can cope with incomplete data. Abductive inference makes it possible to guess information that has not been explicitly observed. Since there are many explanations for such guesses, each one needs to be assigned a probability. This work exploits logical abduction to produce multiple explanations consistent with a given background knowledge and defines a strategy to prioritize them using their chance of being true. Another novelty is the introduction of probabilistic integrity constraints rather than hard ones. We then propose a strategy that learns model and parameters from data and exploits our Probabilistic Abductive Proof Procedure to classify never-seen instances. This approach has been tested on some standard datasets, showing that it improves accuracy in the presence of corruptions and missing data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the field of Artificial Intelligence (AI), two approaches have been
attempted so far: numerical/statistical on one hand, and relational on the other.
Statistical methods are not able to fully seize the complex network of relationships,
often hidden, between events, objects or combinations thereof. In order to apply
AI techniques to learn/reason in the real world, one might be more interested in
producing and handling complex representations of data than flat ones. This first
challenge has been faced by exploiting First-Order Logic (FOL) for representing
the world. This setting is useful when data are certain and complete. However,
since this is far from being always true, there is the need for handling incompleteness
and noise in data. Uncertainty in relational data makes things even more
complex. In particular, it can affect the features, the type or, more generally, the
identity of an object and the relationships in which it is involved. When putting
together logical and statistical learning, the former provides the representation
language and reasoning strategies, and the latter enforces robustness. This gave
rise to a new research area known as Probabilistic Inductive Logic Programming
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] (PILP) or Statistical Relational Learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] (SRL). Although clearly
relevant in complex domains such as social or biological data, this setting inherits well-known
problems from Probabilistic Graphical Models [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (PGM) (parameter and model
learning, inference).
      </p>
      <p>
        Furthermore, reasoning in contexts characterized by a high degree of
complexity often requires purely deductive reasoning to be supported by a variety
of techniques that cope with the lack or incompleteness of the observations.
Abductive inference can tackle incompleteness in the data by making it possible to guess
information that has not been explicitly observed [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For instance, if one is
behind a corner and a ball comes out of it, a good explanation might be that
someone has kicked it. However, there are many other plausible explanations for
the ball’s movement, and thus that inference provides no certainty that someone
really struck the ball. While humans are able to discriminate which
explanations are consistent with their previous knowledge and which ones have to be
discarded, embedding this capability in machines is not easy.
      </p>
      <p>This work faces two issues: the generation of multiple (and minimal)
explanations consistent with a given background knowledge, and the definition of a
strategy to prioritize different explanations using their chance of being true. It
is organized as follows: the next section describes some related works; Section 3
introduces the Abductive Logic Programming framework; our Probabilistic
Abductive Logic Proof procedure is presented in Section 4; then in Section 5 we
propose a strategy to exploit our approach in classification tasks; finally, there is
an empirical evaluation on three standard datasets, followed by some
considerations and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Abductive reasoning is typically used to face uncertainty and incompleteness.
In the literature there are two main approaches: uncertainty has been addressed
by Bayesian probabilistic graphical models [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and incompleteness by means of
the classical approach based on pure logic [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. More recently, many works have combined
logical abduction and statistical inference, which makes it possible to rank all possible
explanations and choose the best one.
      </p>
      <p>
        One of the earliest approaches is [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] where a program contains non-probabilistic
definite clauses and probabilistic disjoint declarations {h1 : p1, ..., hn : pn}, where
an abducible atom hi is considered true with probability pi, for i ∈ {1, ..., n}.
That work focuses on the representation language and proposes a simple
language for integrating logic and probability. It does not integrate any form of
logic-based abductive proof procedure with statistical learning, and makes
hard assumptions on the nature of constraints (i.e. only disjoint declarations)
in order to keep the general procedure simple. This framework does not assign
probabilities to constraints but only to ground literals.
      </p>
      <p>
        PRISM [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is a system based on logic programming with multivalued
random variables. It provides no support for integrity constraints, but includes a
variety of top-level predicates which can generate abductive explanations. Since
a probability distribution over abducibles is introduced, the system chooses the
best explanation using a generalized Viterbi algorithm. Another central feature
of PRISM is its capability to learn probabilities from training data.
      </p>
      <p>
        Two approaches have merged directed and undirected graphical models with
logic. The former [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] exploits Bayesian Logic Programs [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] (BLPs) as a
representation language for abductive reasoning and uses the Expectation
Maximization algorithm to learn the parameters associated to the model. The latter
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], exploiting Markov Logic Networks (MLN), carries out abduction by adding
reverse implications for every rule in the knowledge base (since MLNs provide
only deductive inference). However, the addition of these rules increases the size
and complexity of the model, making it computationally expensive. It is worth
noting that, like MLNs, most SRL formalisms use deduction for logical inference,
and hence they cannot be used effectively for abductive reasoning.
      </p>
      <p>
        An approach for probabilistic abductive Logic programming with Constraint
Handling Rules has been proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It differs from other approaches to
probabilistic logic programming by having both interaction with external constraint
solvers and integrity constraints. Moreover, it exploits probabilities to optimize the
search for explanations using Dijkstra’s shortest path algorithm. Hence, the
approach always explores the most probable direction, so that the investigation
of less probable alternatives is suppressed or postponed. Although the
knowledge representation formalism is similar to ours, there are several differences
in the approaches. The first difference regards the support for negation. Their
framework does not handle negation, so they simulate it by introducing new
predicate symbols (e.g. haspower(X) → hasnopower(X)). The other difference
involves the definition of the constraints which, due to the lack of negation, might
be tricky. Conversely, ours allow a flexible and intuitive representation
without such restrictions. The last difference is the lack of a strategy to learn the
probabilities.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] abduction is conducted with Stochastic Logic Programs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] (SLP) by
considering a number of possible worlds. Abductive reasoning is carried out by
reversing the deductive flow of proof and collecting the probabilities associated to
each clause. Although this approach is probabilistically consistent with the SLP
language, abduction through reversal of deduction is quite hazardous, because
abductive reasoning by means of deduction without constraints may lead to
wrong conclusions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Abductive Logic Programming framework</title>
      <p>
        Our proposal is based on Abductive Logic Programming [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] (ALP), a high-level
knowledge representation framework that makes it possible to solve problems declaratively
on the basis of abductive reasoning. It extends Logic Programming by allowing some
predicates, called abducible predicates, to be incompletely defined. Problem
solving is performed by deriving hypotheses on these abducible predicates (abductive
hypotheses) as solutions of the problems to be solved. These problems can be
either observations that need to be explained (as in classical abduction) or goals
to be achieved (as in standard logic programming). An abductive logic program
is made up of a triple &lt; P, A, IC &gt;, where:
– P is a standard logic program;
– A (Abducibles) is a set of predicate names;
– IC (Integrity Constraints or domain-specific properties) is a set of first order
formulae that must be satisfied by the abductive hypotheses.
      </p>
      <p>These three components are used to define abductive explanations.</p>
      <sec id="sec-3-1">
        <title>Definition 1 (Abductive explanation)</title>
        <p>Given an abductive theory T = &lt; P, A, IC &gt; and a formula G, an abductive explanation Δ for G is a set of
ground atoms of predicates in A s.t. P ∪ Δ |= G (Δ explains G) and P ∪ Δ |= IC
(Δ is consistent). When it exists, T abductively entails G, in symbols T |=A G.</p>
        <p>Consider a clause C: h(t1, ..., tn) :- l1, ..., ln′, where h is the unique literal of the head, n is
its arity, the li with i ∈ {1, ..., n′} are the literals in the body, and ¬li denotes the
negation of literal li. For instance, Example 1 defines a logic program P for the
concept printable(X) that describes the features that a document must have in
order to be printed by a particular printer.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Example 1 (Example theory for paper domain).</title>
        <p>c1 : printable(X) ← a4(X), text(X)
c2 : printable(X) ← a4(X), table(X), black_white(X)
c3 : printable(X) ← a4(X), text(X), black_white(X)
A = { image, text, black_white, printable, table, a4, a5, a3}</p>
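        <p>To fix ideas, the abductive logic program &lt; P, A, IC &gt; of Example 1 can be encoded as plain data structures. The following is a hypothetical Python sketch (the names and the representation are ours, not part of the original framework):</p>

```python
# Hypothetical encoding of the abductive logic program of Example 1.
# Clauses are (head, body) pairs over unary predicates applied to a
# single document variable; abducibles are just predicate names.

P = [  # c1..c3: alternative definitions of printable(X)
    ("printable", ["a4", "text"]),
    ("printable", ["a4", "table", "black_white"]),
    ("printable", ["a4", "text", "black_white"]),
]

A = {"image", "text", "black_white", "printable", "table", "a4", "a5", "a3"}

# Integrity constraints as plain denials (nand): the listed predicates
# may not all hold at once for the same document.
IC = [["text", "color"], ["table", "color"]]

def bodies_for(goal):
    """Return the alternative bodies that define `goal` in P."""
    return [body for head, body in P if head == goal]

print(bodies_for("printable"))
```

Note that, as in Example 1, color is intentionally absent from A: its truth value can only come from the background knowledge.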
        <p>
          In this framework, a proof procedure for abductive logic programs has been
presented in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. It interleaves phases of abductive and consistency derivations:
an abductive derivation is the standard Logic Programming derivation extended
in order to consider abducibles. When an abducible literal δ has to be proved,
it is added to the current set of assumptions (if not already included). Since the
addition of δ must not violate any integrity constraint, a consistency derivation
starts to check that all integrity constraints containing δ fail. In the consistency
derivation an abductive derivation is used to solve each goal. This might cause
an extension of the current set of assumptions.
        </p>
        <p>More specifically, an abductive derivation from (G1 Δ1) to (Gn Δn) in &lt;
P, A, IC &gt; is a sequence</p>
        <p>(G1 Δ1), (G2 Δ2), ..., (Gn Δn)
such that each Gi has the form ← L1, ..., Lk and (Gi+1 Δi+1) is obtained
according to one of the following rules:
1. If Lj is not abducible, then Gi+1 = C and Δi+1 = Δi, where C is the resolvent of
some clause in P with Gi on the selected literal Lj;
2. If Lj is abducible and Lj ∈ Δi, then Gi+1 = ← L1, ..., Lj−1, Lj+1, ..., Lk and
Δi+1 = Δi;
3. If Lj is a ground abducible, Lj ∉ Δi and ¬Lj ∉ Δi, and there exists a consistency
derivation from ({Lj} Δi ∪ {Lj}) to ({} Δ′), then Gi+1 = ← L1, ..., Lj−1, Lj+1, ..., Lk
and Δi+1 = Δ′.
In the first two steps logical resolution is performed exploiting (1) the rules
of P and (2) the abductive assumptions already made. In the last step, before
adding a new assumption to the current set of assumptions, a consistency check
is performed.</p>
        <p>A consistency derivation for an abducible α from (α Δ1) to (Fn Δn) in
&lt; P, A, IC &gt; is a sequence</p>
        <p>(α Δ1), (F1 Δ1), ..., (Fn Δn)
where:
1. F1 is the union of all goals of the form ← L1, ..., Ln obtained by resolving
the abducible α with the constraints in IC, with no such goal being empty;
2. for each i &gt; 1, let Fi have the form {← L1, ..., Lk} ∪ Fi′; then, for some
j = 1, ..., k, each (Fi+1 Δi+1) is obtained according to one of the following
rules:
(a) If Lj is not abducible, then Fi+1 = C′ ∪ Fi′, where C′ is the set of all resolvents
of clauses in P with ← L1, ..., Lk on literal Lj and the empty goal [] ∉ C′, and
Δi+1 = Δi;
(b) If Lj is abducible, Lj ∈ Δi and k &gt; 1, then Fi+1 = {← L1, ..., Lj−1, Lj+1, ..., Lk} ∪ Fi′ and
Δi+1 = Δi;
(c) If Lj is abducible and ¬Lj ∈ Δi, then Fi+1 = Fi′ and Δi+1 = Δi;
(d) If Lj is a ground abducible, Lj ∉ Δi and ¬Lj ∉ Δi, and there exists an abductive
derivation from (← ¬Lj Δi) to ([] Δ′), then Fi+1 = Fi′ and Δi+1 = Δ′;
(e) If Lj is equal to ¬A with A a ground atom and there exists an abductive
derivation from (← A Δi) to ([] Δ′), then Fi+1 = Fi′ and Δi+1 = Δ′.</p>
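        <p>The interleaving of abductive and consistency derivations can be illustrated with a heavily simplified propositional sketch (our own illustration, not the procedure of [8]: no negation, no non-ground literals, constraints reduced to plain denials, and the consistency check collapsed into a single test on the assumption set):</p>

```python
# Simplified propositional sketch of an abductive proof procedure:
# prove a goal from program P, abducing literals in A while checking
# that no denial in IC becomes fully satisfied by facts + assumptions.

P = {"printable": [["a4", "text"]]}   # one clause, for brevity
A = {"text", "table"}                  # abducible predicates
IC = [["text", "color"]]               # denial: text and color cannot both hold
FACTS = {"a4"}

def consistent(delta):
    # Every denial must have at least one literal that is neither
    # a fact nor an assumption (i.e. every denial "fails").
    known = FACTS | delta
    return all(any(l not in known for l in ic) for ic in IC)

def prove(goals, delta):
    """Return a set of assumptions explaining `goals`, or None."""
    if not goals:
        return delta
    g, rest = goals[0], goals[1:]
    if g in FACTS:                       # deductive step on a known fact
        return prove(rest, delta)
    if g in A:                           # rules 2-3: abduce, then check
        new = delta | {g}
        if consistent(new):
            return prove(rest, new)
        return None
    for body in P.get(g, []):            # rule 1: unfold a defined predicate
        res = prove(body + rest, delta)
        if res is not None:
            return res
    return None

print(prove(["printable"], set()))       # abduces text
```

Running the sketch, printable is explained by assuming text, while a4 is proved deductively; adding color to FACTS would make the denial fire and the abduction fail.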
        <p>In the first case 2a, the current branch is split into as many branches as the resolvents
of ← L1, ..., Lk with the clauses in P on Lj. If we get the empty clause, the whole
check fails. In the second case, if Lj belongs to Δi it is discarded and, if it is
alone, the derivation fails. In case 2c the current branch is consistent with the
assumptions in Δi and so it is dropped from the checking. In the last two cases,
2d and 2e, the current branch can be dropped if ← ¬Lj and A, respectively, are
abductively provable.</p>
        <p>
          The procedure returns the minimal abductive explanation set if any,
otherwise it fails. It is worth noting that according to the implementation in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] the Δs
in the abductive explanations must be ground. Thus, if a non-ground abducible
is encountered, it is first unified with a clause in the background knowledge, with
an example, or with a previously abduced literal. If this is not possible, it is
grounded with a Skolem constant.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Probabilistic Abductive Logic Programming</title>
      <p>This work extends the technique shown in Section 3 in order to smooth the
rigid classical approach with a statistical one.</p>
      <p>In order to motivate our approach, we observe that, to abduce a fact, one
needs to check whether there are constraints that prevent such an abduction.
The constraints can be either universally valid laws (such as temporal or physical
ones), or domain-specific restrictions. Constraint verification can involve other
hypotheses that have to be abduced, and others that can be deductively proved.
If a hypothesis is deductively verified, it can safely be
assumed true. Conversely, if a hypothesis involves other abductions, all
possible situations must be evaluated before assuming the best one. In this
view each abductive explanation can be seen as a possible world, since each time
one assumes something he conjectures the existence of that situation in a specific
world. Some abductions might be very unlikely in some worlds, but most likely
in other ones. The likelihood of an abduction can be assessed considering what
we have seen in the real world and what we should expect to see.</p>
      <p>Moreover, this work handles a new kind of constraint: typically,
constraints are only nands of their conditions and are not probabilistic. Our typed constraints
allow a more suitable and understandable combination of situations. The first
kind is the classical nand denial, where at least one condition must be false;
the type or represents a set of conditions where at least one must be true; and
the type xor requires that exactly one condition be true. Moreover, due to
noise and uncertainty in the real world, we have smoothed each constraint with
a probability that reflects the degree of personal belief in the likelihood of
the whole constraint.</p>
      <p>In Example 2 it can be seen that each probabilistic constraint is a triple
⟨Pr, S, T⟩, where Pr is a probability expressing the constraint's reliability (our confidence
in it), T is the type of constraint and represents the kind of denial, and S is the
set of literals of the constraint. For example, ic1 states that a document can have
only one of the three size formats (a3, a4 or a5) and that our personal belief
in the likelihood of this constraint is 0.8; ic2 states that a document can be
composed of tables, images or text; and so on.</p>
      <sec id="sec-4-1">
        <title>Example 2 (Typed Probabilistic Constraints).</title>
        <p>ic1 = ⟨0.8, [a3(X), a4(X), a5(X)], xor⟩
ic2 = ⟨0.9, [table(X), text(X), image(X)], or⟩
ic3 = ⟨0.3, [text(X), color(X)], nand⟩
ic4 = ⟨0.3, [table(X), color(X)], nand⟩
ic5 = ⟨0.6, [black_white(X), color(X)], xor⟩
Our probabilistic approach to logical abductive reasoning can be described from
two perspectives: the logical proof procedure and the computation of
probabilities.</p>
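        <p>The three constraint types can be read as simple checks on the set of literals assumed true in a world. A minimal sketch, with hypothetical names and constraints encoded as (probability, literals, type) triples as in Example 2:</p>

```python
# Minimal sketch of typed constraint satisfaction: a world is the set
# of ground literals assumed true, and each constraint is a triple
# (probability, literals, type) as in Example 2.

def satisfied(constraint, world):
    _prob, literals, ctype = constraint
    true_count = sum(1 for l in literals if l in world)
    if ctype == "nand":   # at least one condition must be false
        return true_count != len(literals)
    if ctype == "or":     # at least one condition must be true
        return true_count != 0
    if ctype == "xor":    # exactly one condition must be true
        return true_count == 1
    raise ValueError("unknown constraint type: " + ctype)

ic1 = (0.8, ["a3", "a4", "a5"], "xor")
ic3 = (0.3, ["text", "color"], "nand")

world = {"a4", "text"}          # an a4 text document
print(satisfied(ic1, world))    # True: exactly one size format
print(satisfied(ic3, world))    # True: color does not hold
```

The probability component plays no role in the check itself; it only weighs the constraint when scoring a world, as described in Section 4.2.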
        <sec id="sec-4-1-1">
          <title>ALP perspective</title>
          <p>
            The logical proof procedure consists of an extended version of the classical one
[
            <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
            ]. While the classical procedure resolves the goal by looking for one minimal
abductive explanation set, our approach is to generate different (minimal)
abductive explanations and then evaluate them all. This goal can be achieved by
changing each assumption that may constrain the subsequent abductive
explanations. For example, supposing that a document is black and white, all subsequent
assumptions must be consistent with this. But if we change this assumption,
assuming that in another world that document is not black and white, we might
obtain another set of hypotheses consistent with this last assumption. Each time
the procedure assumes something, two possible worlds can be considered: one
where the assumption holds and another where it does not. The overall view is
analogous to the exploration of a tree in which each interior node corresponds to
a decision (an assumption), such as changing the truth value of a literal,
an edge represents the consequences of that decision, and a leaf is the conclusion
of the abductive reasoning in a particular possible consistent world. In order to
generate possible worlds in which a literal holds and others in which it does
not, the classical procedure has been extended by introducing two rules
into the derivation procedures. The 4th rule of the abductive derivation is:
4. If Lj is a ground abducible, Lj ∉ Δi and ¬Lj ∉ Δi, and there exists a consistency
derivation from ({¬Lj} Δi ∪ {¬Lj}) to ({} Δ′), then Gi+1 = ← L1, ..., Lj−1, Lj+1, ..., Lk
and Δi+1 = Δ′.
A corresponding rule 2d* for the consistency derivation is introduced between
2d and 2e, and states:
2d* If Lj is a ground abducible, Lj ∉ Δi and ¬Lj ∉ Δi, and there exists an abductive
derivation from (← Lj Δi) to ([] Δ′), then Fi+1 = Fi′ and Δi+1 = Δ′.
It can be noted that rules 3 and 4 of the abductive derivation are the choice
points on which the procedure can backtrack in order to assume both values
(positive and negative) of a literal. In fact, in one possible world Lj is added to the
set of hypotheses by means of the 3rd rule, and in the other possible world ¬Lj holds
by means of the 4th rule. The analogous choice points in the consistency derivation
are respectively rules 2d* and 2d.
          </p>
          <p>The choice of the predicate definition, that is the 1st rule of the abductive
derivation, is another choice point where the procedure can backtrack to explore
different explanations and consequently other possible worlds. In fact, by choosing
different definitions, other literals are taken into account, and thus other
constraints must be satisfied, and so on. It is worth noting that if some assumptions
do not preserve consistency, the procedure discards them along with the
corresponding possible worlds. Section 4.2 will present a probabilistic strategy to
choose the most likely explanation among all possible worlds.</p>
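          <p>The tree of possible worlds described above can also be pictured, in brute-force form, as the set of truth assignments over the abducibles that satisfy all typed constraints; the actual procedure explores this space lazily by backtracking instead of enumerating it. A hypothetical sketch:</p>

```python
from itertools import product

# Brute-force picture of the possible worlds explored by backtracking:
# every truth assignment over a few abducibles is a candidate world, and
# only those satisfying all typed constraints survive. The real proof
# procedure builds only the relevant worlds via choice points.

def satisfied(constraint, world):
    _prob, literals, ctype = constraint
    n_true = sum(1 for l in literals if l in world)
    return {"nand": n_true != len(literals),
            "or": n_true != 0,
            "xor": n_true == 1}[ctype]

abducibles = ["table", "text", "image"]
constraints = [(0.9, ["table", "text", "image"], "or")]   # ic2 of Example 2

worlds = []
for values in product([True, False], repeat=len(abducibles)):
    world = {a for a, v in zip(abducibles, values) if v}
    if all(satisfied(ic, world) for ic in constraints):
        worlds.append(world)

print(len(worlds))   # 7 of the 8 assignments satisfy the or-constraint
```

Only the all-false assignment violates the or constraint here; adding the nand and xor constraints of Example 2 would prune the space further.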
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Example 3 (Observation o1, Query and Possible Explanations).</title>
        <p>a4(o1) ?- printable(o1)
Δ1 = {text(o1), table(o1)} Ic1 = {ic2, ic3, ic4}
Δ2 = {text(o1), ¬table(o1), image(o1)} Ic2 = {ic2, ic3, ic4}
Δ3 = {table(o1), black_white(o1)} Ic3 = {ic2, ic4, ic5}
Δ4 = {text(o1), black_white(o1)} Ic4 = {ic2, ic3, ic5}
In order to motivate our point of view, let’s consider Example 3, belonging to
the “paper” domain presented in Section 3. In order to abductively cover o1
by means of the concept definition printable(X) (i.e. query ?- printable(o1)),
there is the need for abducing other literals, since the concept definitions ci with
i ∈ {1, ..., 3} need some facts that are not in the knowledge base (KB) (i.e. only
a4(o1) holds). This is the classical situation in which deductive reasoning fails
and an abductive one is needed.</p>
        <p>The procedure executes an abductive derivation applying the 1st rule and
obtaining one of the resolvents in P, for instance the first clause c1 : printable(X) ←
a4(X), text(X). In this case a4(o1) holds, so we can exclude it from the
abductive procedure, continuing with the next literal text(o1). This literal does not
hold, so it needs to be abduced. This abduction involves the
verification of ic2 and ic3 by the consistency derivation. Since ic2 is an or constraint,
and thus at least one literal must be true, there are two possible ways to satisfy
it, considering that text(o1) must not hold (by the current hypothesis). Thus there
are two possible worlds: one in which table(o1) holds, and the other in which
it does not.</p>
        <p>In the former world rule 2d* fires and table(o1) is abduced by performing
a consistency derivation on the constraints ic2 and ic4. The former constraint
is already verified by means of the previous abductions (rule 2b). Since ic4 is a
nand constraint and thus at least one literal must be false, it is satisfied because
color is not abducible (see abducible predicates A in Example 1) and does not
hold (if a literal is not abducible, its value depends on background knowledge,
i.e. rule 2a). Thus coming back to the initial abduction text(o1), ic2 is satisfied
abducing table(o1) and ic3 is already satisfied (since it is a nand constraint, it
is falsified by abducing text(o1)). The first abductive explanation Δ1 for the
goal printable(o1) is thus formed by the abductions of text(o1) and table(o1) (see
Example 3).</p>
        <p>Similarly to the classical procedure that in backtracking returns all minimal
abductive explanations, our procedure comes back to each choice point changing
the truth values of the literals in order to explore different possible explanations
(and thus other possible worlds). The first choice point, as mentioned above,
regards the abduction of table(o1). While in the former possible world table(o1)
holds, the procedure now abduces ¬table(o1) (rule 2d), thus exploring the other
possible world. In this world, to abduce ¬table(o1) the constraints ic2 and ic4 are
taken into account. The former constraint is verified by abducing image(o1) (i.e.
an or constraint) and the latter is satisfied by the initial abduction text(o1), as in
explanation Δ1. Then text(o1) can be abduced also in this world, since ic2 has
just been verified and ic3 as before. So Δ2 is the second explanation, obtained by
changing the assumption on literal table(o1) and thus exploring another possible
world.</p>
        <p>The explanations Δ1 and Δ2 are the only two possible worlds given the first
clause c1, because other literal configurations are inconsistent. Hence, the
procedure comes back to the last backtracking point, that is, the choice of the predicate
definition. This time the 1st rule of the abductive derivation can be applied to
obtain the second predicate definition c2 : printable(X) ← a4(X), table(X), black_white(X).
So there is the need for abducing table(o1) and black_white(o1). The former can
be abduced because ic2 is satisfied (i.e. an or constraint), and ic4 is verified since
color(o1) does not hold. The latter literal can be abduced because ic5 is a xor
constraint and thus exactly one literal must be true. In explanation Δ3 no other
literals must be abduced by the consistency derivation, and so no other worlds
must be explored, since the abduced literals table(o1) and black_white(o1) are
constrained by the predicate definition.</p>
        <p>Once again the procedure comes back to last backtracking point and chooses
the predicate definition c3 : printable(X) ← a4(X), text(X), black_white(X).
The abductive derivation, similarly to the previous run, returns only one possible
explanation, Δ4, as can be seen in Example 3.</p>
        <p>It is easy to note that rules (2d) and (2d*) are candidate entry points for
backtracking in the consistency derivation, and rules 1, 3 and 4 are the analogues
in the abductive derivation. In this way all possible (minimal) explanations are
obtained, along with all possible consistent worlds.</p>
        <sec id="sec-4-2-1">
          <title>Probabilistic perspective</title>
          <p>After all different explanations have been found by the above abductive proof
procedure, the issue of selecting the best one arises. The simple approach of
choosing the minimal explanation is reliable when there is only one minimal
proof or when there are no ways to assess the reliability of the explanations.
However, Example 3 shows that there might be different explanations for the
same observation and so we need to assess the probability of each explanation in
order to choose the best one. To face this issue, our approach regards each set of
abductions as a possible world and so a chance of being true can be assigned to
it. The abduction probability of each ground literal through the possible worlds
can be obtained considering two aspects: the chance of being true in the real
world and all sets of assumptions made during the consistency derivation in
each possible world.</p>
          <p>Let’s introduce some notation: Δ = {P1 : (Δ1, Ic1), ..., PT : (ΔT, IcT)} is
the set of the T consistent possible worlds that can be assumed for proving
a goal G (i.e. the observation to be proved). Each (Δi, Ici) is a pair of sets:
Δi = {δ1, ..., δJ} contains the ground literals δj, with j ∈ {1, ..., J}, abduced in a
single abductive proof, and Ici = {ic1, ..., icK} is the set of the constraints ick,
with k ∈ {1, ..., K}, involved in the explanation Δi. Both Δi and Ici may be
empty. Moreover, we use the following symbols in our equations: n(δj) is
the number of true groundings of the predicate used in literal δj, n(cons) is the total
number of constants encountered in the world, a(δj) is the arity of literal δj, and
P (ick) is the probability of the k-th constraint. Thus the chance of a
ground literal δj being true can be defined as:</p>
          <p>P (δj) = n(δj) / [n(cons)! / (n(cons) − a(δj))!]   (1)
Then the unnormalized probability of the abductive explanation can be assessed
by Equation 2, and the probability of the abductive explanation normalized over
all T consistent worlds can be computed as in Equation 3:</p>
          <p>P′(Δi, Ici) = ∏j=1..J P (δj) ∗ ∏k=1..K P (ick)   (2)</p>
          <p>P(Δi, Ici) = P′(Δi, Ici) / ∑t=1..T P′(Δt, Ict)   (3)</p>
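          <p>Equations 1–3 translate directly into code. In this hypothetical sketch (the function names are ours), the denominator of Equation 1 is the number of groundings without repeated constants, n(cons)!/(n(cons) − a(δj))!:</p>

```python
from math import factorial, prod

# Sketch of Equations 1-3 with hypothetical function names.

def p_literal(n_true, n_cons, arity):
    """Equation 1: true groundings over injective groundings."""
    possible = factorial(n_cons) // factorial(n_cons - arity)
    return n_true / possible

def p_explanation(literal_probs, constraint_probs):
    """Equation 2: unnormalized probability of one possible world."""
    return prod(literal_probs) * prod(constraint_probs)

def normalize(unnormalized):
    """Equation 3: normalize over all T consistent worlds."""
    total = sum(unnormalized)
    return [p / total for p in unnormalized]

# A unary literal with 3 true groundings among 10 known constants:
print(p_literal(3, 10, 1))   # 0.3
```

For a unary literal the denominator reduces to n(cons), so Equation 1 is simply the fraction of constants for which the predicate is known to hold.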
          <p>Equation 1 expresses the ratio between true and possible groundings of literal
δj. The intuition behind this equation can be expressed with a simple example:
given a feature f (· ) and an item obj that does not have such a feature, if we want
to assess the probability that obj owns f (· ) (i.e. P (f (obj))), we should consider
how often we found items that hold f (· ) over all the items that might own it in the real
world.</p>
          <p>
            It is worth noting that we are considering the Object Identity [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] assumption
and so within a clause, terms (even variables) denoted with different symbols
must be distinct (i.e. they must refer to different objects). Thus only literal
groundings without constants repetitions are allowed in Equation 1. It is
important to underline that such a bias does not limit the expressive power of
the language, since for any clause/program it is possible to find an equivalent
version under Object Identity. The first part of Formula (2) encloses the
probability that all abduced literals are true in that particular world. The second
part expresses the reliability of the constraints involved in the i-th abductive
explanation. Although our approach focuses on the computation of the most
probable explanation, and hence there is no need to normalize the probabilities
of the explanations over all possible worlds (i.e. some worlds are ruled out due
to the presence of integrity constraints), Equation 3 is presented
for completeness. The probability of ¬δj is equal to 1 − P (δj).
          </p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>Example 4 (Probability assessment of the Abductive Explanations).</title>
        <p>A = {0.2:image, 0.4:text, 0.1:black_white, 0.6:printable, 0.1:table, 0.9:a4, 0.1:a5, 0.1:a3}
P′(Δ1, Ic1) = 0.00486 P′(Δ2, Ic2) = 0.00875
P′(Δ3, Ic3) = 0.00162 P′(Δ4, Ic4) = 0.00648
For the sake of clarity, each abducible literal of the current example has been
labeled with its probability computed using Formula (1), as shown in Example 4. For
instance, the probability of the first explanation using (2) can be computed
as P′(Δ1, Ic1) = P (text(o1)) ∗ P (table(o1)) ∗ P (ic2) ∗ P (ic3) ∗ P (ic4) and thus
P′(Δ1, Ic1) = 0.6 ∗ 0.1 ∗ 0.9 ∗ 0.3 ∗ 0.3 = 0.00486. Then, Example 4 shows the
probability of all explanations computed using Equation (2).</p>
        <p>Finally we can determine the maximum probability among all abductive
explanations. It corresponds to the maximum over all T possible consistent worlds
(in this example T is equal to 4), s.t. P ′(printable(o1)) = max1≤i≤T Pi′(Δi, Ici),
that is Δ2. It is worth noting that this behaviour shows why
logical abductive reasoning needs to be supported by a probabilistic assessment of all
abductive explanations rather than relying on the minimal one.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Improving Classification Exploiting Probabilistic Abductive Reasoning</title>
      <p>
        The above proof procedure can now be exploited to classify never-seen instances.
In particular, we first learn from the data the model (i.e. the Abductive Logic
Program &lt; P, A, IC &gt;) and the parameters (i.e. the literal probabilities), and then
our Probabilistic Abductive Logic proof procedure can be exploited to classify
new instances. The strategy, presented in Algorithm 1, can be split into two
parts: the former prepares the data (model and parameters), and the latter
performs the classification. In the former part, given a training set T raini and a set
of abducible literals A (possibly empty), our approach learns the corresponding
theory Ti (line 1) by exploiting INTHELEX [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a well-known ILP system, and
then obtains the integrity constraints (line 2) with the procedure described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
This procedure returns a set of nand constraints of different sizes, along with the
type domain of each descriptor. This last piece of information can be used to define a new kind of
constraint, called type_domain, that can be treated as an xor constraint. In fact, if the
type domain for the color property is {blue, red, yellow, black, green},
and the object X is part of an observation, it will be impossible to abduce two
different color descriptors from the above set applied to X. However, since our
procedure natively handles a probability value associated with the constraints, and
the constraint-learning procedure does not return such values, the constraints must be manually
labelled, or they will automatically be considered true with probability 1.0. In
any case, the set of abducibles A can be left empty, because our system considers
abducible all predicates without definitions in Ti. The last step of the first part
(line 3) computes Equation 1 for each literal in T raini before starting the
abductive procedure, since those values depend only on T raini.
      </p>
      <p>The second part of the strategy starts at line 5. As can be seen at lines
8 and 9, our algorithm tries to abductively cover the example considering it both
as positive and as negative for the class c. In fact, when an example is considered
negative, our procedure discovers all possible worlds in which it cannot be
abductively proved as an instance of concept c. Symmetrically, if the example is positive, it
discovers all the possible worlds in which something must be abduced to prove
it. These two executions return explanation probabilities that can be compared
to each other in order to choose the best class. The algorithm then selects the best
classification among all concept probabilities, as can be seen at line 16.</p>
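      <p>A minimal sketch of this classification step follows; abductive_prob is a hypothetical stand-in for a call to the probabilistic abductive proof procedure, and comparing the positive and negative scores by their difference is an assumption, not the exact rule of Algorithm 1:</p>
      <preformat>
```python
def classify(example, classes, abductive_prob):
    # For each class, run the proof procedure twice: once treating the
    # example as positive for c, once as negative, and compare the returned
    # best-explanation probabilities (assumed comparison rule).
    scores = {}
    for c in classes:
        p_pos = abductive_prob(example, c, positive=True)
        p_neg = abductive_prob(example, c, positive=False)
        scores[c] = p_pos - p_neg
    # Select the best classification among all concept scores (cf. line 16).
    return max(scores, key=scores.get)
```
      </preformat>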
      <p>It is worth noting that this strategy cannot be performed by a pure abductive
logic proof procedure, since in this context we do not need a logical response
(true/false) but a probabilistic value.</p>
    </sec>
    <sec id="sec-7">
      <title>Experimental Evaluation</title>
      <p>
        The evaluation is aimed at assessing the quality of the results obtained by the
probabilistic classification when it faces incomplete and noisy data. All
experiments were performed on datasets from the UCI [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] machine learning
repository.
      </p>
      <p>A 10-fold split of each dataset has been performed in order to obtain a set
of 10 pairs &lt;T rain, T est&gt;. Then each test-set has been replaced by a set
of corrupted versions, in which we removed at random K% of each example
description, with K varying from 10% to 70% in steps of 10. This procedure has
been repeated 5 times for each corrupted test-set in order to randomize the
example corruption. In this way 35 test-sets for each fold have been obtained
(7 levels of corruption by 5 runs per level). In order to compare the outcomes
with a complete test-set, we exploited deductive reasoning on the original test-set
(i.e. corruption level 0%). Moreover, we applied purely deductive reasoning to the
same corrupted test-sets in order to show the improvement of our approach. The
maximum length of constraints has been set to 4 for all datasets. Since we do
not have any previous knowledge on the datasets, we assumed all obtained
constraints to be true with a probability of 1.0, as described in Section 5.</p>
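      <p>The corruption procedure described above can be sketched as follows (function names are illustrative; an example description is treated as a list of literals):</p>
      <preformat>
```python
import random

def corrupt(description, k_percent, rng):
    # Remove K% of the literals of one example description at random.
    n_remove = round(len(description) * k_percent / 100)
    return rng.sample(description, len(description) - n_remove)

def corrupted_test_sets(test_set, rng=None):
    # 7 corruption levels (10%..70%, step 10) x 5 random runs per level.
    rng = rng or random.Random(0)
    return [[[corrupt(ex, k, rng) for ex in test_set] for _ in range(5)]
            for k in range(10, 80, 10)]
```
      </preformat>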
      <p>The performance of the system can then be evaluated with the aim of
understanding how sensitive the approach is to the progressive lack of knowledge
across the 10 folds. The following synthetic descriptions of the datasets refer to
values averaged over the folds.</p>
      <p>Breast-Cancer contains 201 instances of the benign class and 85 instances
of the malignant class, each described by 9 literals. Fewer than
10 instances have missing values. The learned theory
consists of 30 clauses, each with an average of 6 literals in the body.</p>
      <p>The learned integrity constraints are 1784 (55% are constraints of length 4, 35%
of length 3 and 10% of length 2), and 9 type domain constraints (one for each
literal in the example description language).</p>
      <p>Congressional Voting Records contains 435 instances (267 democrats,
168 republicans) classified as democrats or republicans according to their votes.
Each instance is described by 16 literals. The obtained theory consists of 35
clauses where each one has an average of 4.5 literals in the body. The learned
integrity constraints are 4173 (16% are constraints of length 4, 37% of length 3
and 47% of length 2), and 16 type domain constraints (as before).</p>
      <p>Tic-Tac-Toe contains 958 end-game board configurations of tic-tac-toe
(about 65.3% are positive), where x is assumed to have played first. The
target concept win(x) represents one of the 8 possible ways to create a
three-in-a-row. Thus, each instance is described by 8 literals. The learned theory consists
of 18 clauses, each with an average of 4 literals in the body. The learned
integrity constraints are 1863 (99% are constraints of length 4, 1% of length 3),
and 16 type domain constraints (as before).</p>
      <sec id="sec-7-1">
        <title>Results and Discussion</title>
        <p>Across the corruption levels, the probabilistic classification performed on the first two
datasets allows assessing the strength and robustness of the approach. In fact,
their accuracy curves decrease more slowly than the third's, and even when
70% of each example description has been removed, the system is able to classify
never-seen instances with a mean accuracy of 0.7 and 0.6, respectively.</p>
        <p>In order to understand the non-linear trend of the accuracy on the third
dataset, we can analyze Table 1. It shows the classification performances averaged
within the same fold (i.e. 5 random corruptions) and between different folds
for each level of corruption. We can observe a sharp decay of recall for the third
dataset in correspondence with the descending accuracy curve. This happens for
two reasons: each example is described more sparsely than those of the other datasets
(8 literals per description), and its background theory consists of an average of 4
literals per clause. These two aspects make the approach prone to abducing
few literals (often a single one), leaving positive examples never covered by the
class definitions.</p>
        <p>Comparing these results with the deductive approach, it can be noted that
in the first dataset, beyond 30% corruption, no positive example is
classified correctly. This deficiency never affected our approach in any
experiment. The performance of the deductive approach on the last two datasets degrades
faster than that of the abductive one.</p>
        <p>In general, if the examples are described by fewer literals (as in Tic-Tac-Toe and
Breast-Cancer), the number of misclassifications increases due to missing information.
It follows that corruption levels greater than 50% raise the chance of removing
important object features, and so such experiments have been performed only
to stress-test the proposed approach.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Conclusions</title>
      <p>Reasoning in very complex contexts often requires pure deductive reasoning to
be supported by a variety of techniques that can cope with incomplete and
uncertain data. Abductive inference allows guessing information that has not
been explicitly observed. Since there are many explanations for such guesses,
a probability must be assigned to each of them in order to choose the best
one.</p>
      <p>In this paper we propose a strategy to rank all possible minimal explanations
according to their chance of being true. We have faced two issues: the generation
of multiple explanations consistent with the background knowledge, and the
definition of a strategy to prioritize among different ones using their chance of
being true according to the notion of possible worlds. Another novelty has been
the introduction of probabilistic integrity constraints rather than hard ones as
in the classical Abductive Logic Programming framework.</p>
      <p>The proposal has been described from two perspectives: the abductive proof
procedure and the computation of the probabilities. The former, extending the
classical procedure, allows generating many different explanations for each abduction,
while the latter provides a probabilistic assessment of each explanation in order
to choose the most likely one. Moreover, we introduced a strategy to exploit our
Probabilistic Abductive Proof Procedure in classification tasks. The approach
has been evaluated on three standard datasets, showing that it is able to
correctly classify unseen instances in the presence of noisy and missing information.
Further studies will focus on learning the probabilistic constraints and on
the use of a richer probabilistic model for the literal probability distribution.
We also aim to exploit the Probabilistic Abductive Proof Procedure in other
tasks such as natural language understanding and plan recognition.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Arvanitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Stephen H.</given-names>
            <surname>Muggleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jianzhong</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hiroaki</given-names>
            <surname>Watanabe</surname>
          </string-name>
          .
          <article-title>Abduction with stochastic logic programs based on a possible worlds semantics</article-title>
          .
          <source>In Short Paper Proc. of 16th ILP</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bache</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lichman</surname>
          </string-name>
          .
          <article-title>UCI machine learning repository</article-title>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Henning</given-names>
            <surname>Christiansen</surname>
          </string-name>
          .
          <article-title>Implementing probabilistic abductive logic programming with Constraint Handling Rules</article-title>
          . pages
          <fpage>85</fpage>
          -
          <lpage>118</lpage>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Floriana</given-names>
            <surname>Esposito</surname>
          </string-name>
          , Giovanni Semeraro, Nicola Fanizzi, and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Ferilli</surname>
          </string-name>
          .
          <article-title>Multistrategy theory revision: Induction and abduction in INTHELEX</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>38</volume>
          :
          <fpage>133</fpage>
          -
          <lpage>156</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Teresa M. A.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Di Mauro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Floriana</given-names>
            <surname>Esposito</surname>
          </string-name>
          .
          <article-title>Automatic induction of abduction and abstraction theories from observations</article-title>
          .
          <source>In Proc. of the 15th ILP, ILP'05</source>
          , pages
          <fpage>103</fpage>
          -
          <lpage>120</lpage>
          , Berlin, Heidelberg,
          <year>2005</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Lise</given-names>
            <surname>Getoor</surname>
          </string-name>
          .
          <article-title>Learning statistical models from relational data</article-title>
          .
          <source>PhD thesis</source>
          , Stanford, CA, USA,
          <year>2002</year>
          . AAI3038093.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Kakas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Kowalski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          .
          <source>Abductive logic programming</source>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Antonis C.</given-names>
            <surname>Kakas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Riguzzi</surname>
          </string-name>
          .
          <article-title>Abductive concept learning</article-title>
          .
          <source>New Generation Comput.</source>
          ,
          <volume>18</volume>
          (
          <issue>3</issue>
          ):
          <fpage>243</fpage>
          -
          <lpage>294</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Rohit J.</given-names>
            <surname>Kate</surname>
          </string-name>
          and
          <string-name>
            <given-names>Raymond J.</given-names>
            <surname>Mooney</surname>
          </string-name>
          .
          <article-title>Probabilistic abduction using markov logic networks</article-title>
          .
          <source>In Proceedings of the IJCAI-09 Workshop on Plan, Activity, and Intent Recognition (PAIR-09)</source>
          , Pasadena, CA,
          <year>July 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Muggleton</surname>
          </string-name>
          .
          <article-title>Stochastic Logic Programs</article-title>
          . In L. De Raedt, editor,
          <source>Proceedings of the 5th International Workshop on Inductive Logic Programming</source>
          . Department of Computer Science, Katholieke Universiteit Leuven,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Judea</given-names>
            <surname>Pearl</surname>
          </string-name>
          .
          <article-title>Graphical models for probabilistic and causal reasoning</article-title>
          .
          <source>In The Computer Science and Engineering Handbook</source>
          , pages
          <fpage>697</fpage>
          -
          <lpage>714</lpage>
          .
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>David</given-names>
            <surname>Poole</surname>
          </string-name>
          .
          <article-title>Probabilistic horn abduction and bayesian networks</article-title>
          .
          <source>Artif. Intell.</source>
          ,
          <volume>64</volume>
          (
          <issue>1</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>129</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>David</given-names>
            <surname>Poole</surname>
          </string-name>
          .
          <article-title>Learning, Bayesian probability, graphical models, and abduction</article-title>
          .
          <source>In Abduction and Induction: Essays on their Relation and Integration</source>
          , Chapter
          <volume>10</volume>
          , pages
          <fpage>153</fpage>
          -
          <lpage>168</lpage>
          . Kluwer,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Luc</given-names>
            <surname>De Raedt</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kristian</given-names>
            <surname>Kersting</surname>
          </string-name>
          .
          <article-title>Probabilistic inductive logic programming</article-title>
          .
          <source>In ALT</source>
          , pages
          <fpage>19</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Sindhu V.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          .
          <article-title>Bayesian abductive logic programs: A probabilistic logic for abductive reasoning</article-title>
          . In Toby Walsh, editor,
          <source>IJCAI</source>
          , pages
          <fpage>2840</fpage>
          -
          <lpage>2841</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Taisuke</given-names>
            <surname>Sato</surname>
          </string-name>
          .
          <article-title>EM learning for symbolic-statistical models in statistical abduction</article-title>
          .
          <source>In Progress in Discovery Science, Final Report of the Japanese Discovery Science Project</source>
          , pages
          <fpage>189</fpage>
          -
          <lpage>200</lpage>
          , London, UK, UK,
          <year>2002</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          , Floriana Esposito, Donato Malerba, Nicola Fanizzi, and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Ferilli</surname>
          </string-name>
          .
          <article-title>A logic framework for the incremental inductive synthesis of datalog theories</article-title>
          . In Norbert E. Fuchs, editor,
          <source>LOPSTR</source>
          , volume
          <volume>1463</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>300</fpage>
          -
          <lpage>321</lpage>
          . Springer,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>