Tensor-Based Abduction in Horn Propositional Programs

Yaniv Aspis, Krysia Broda, and Alessandra Russo
Department of Computing, Imperial College London
{yaniv.aspis17,k.broda,a.russo}@imperial.ac.uk

Abstract. This paper proposes an algorithm for computing solutions of abductive Horn propositional tasks using third-order tensors. We first introduce the notion of explanatory operator, a single-step operation based on inverted implication, and prove that minimal abductive solutions of a given Horn propositional task can be correctly computed using this operator. We then provide a mapping of Horn propositional programs into third-order tensors, which builds upon recent work on matrix representation of Horn programs. We finally show how this mapping can be used to compute the explanatory operator by tensor multiplication.

Keywords: Abduction · Linear Algebra · High-Order Tensor

1 Introduction

Linear algebra plays a central role in both Artificial Intelligence and Machine Learning. In particular, statistical approaches to AI often involve analysis and manipulation of feature vector spaces for the purpose of learning and inference. For example, high-order tensors and dense vectors have been employed to support logic-based deductive and inductive inference [11,8], but only for a limited class of logic programs. The area of symbolic AI, however, has seen only limited usage of linear algebra. Two methods have recently been proposed for embedding logic programs into vector spaces. The first, [10], allows the computation of the truth value of a First-Order formula using tensor products. The second, [9], offers a characterization of logic programs using linear algebra by mapping Horn logic programs into matrices, and disjunctive and normal programs into third-order tensors, in order to support deductive inference.

Our work builds upon these recent results as a first step towards a tensor-based approach for inductive learning. Abduction plays an important part in knowledge completion and is related to deduction and Clark Completion [1]. It has been used in several inductive logic programming approaches. For instance, XHAIL [7] uses abductive inference to compute suitable head atoms for hypothesis clauses, and both ASPAL [2] and Metagol [6] use an abductive meta-representation of the hypothesis space to search for inductive solutions.

An abductive task includes an observation, background knowledge in the form of logical rules, and a set of abducible atoms with which to construct an explanation of the given observation. Explanations must be consistent with the given background knowledge and integrity constraints (if any). An abductive solution is a subset of the abducibles that, together with the background knowledge, consistently explains the given observation. In this paper we consider abductive tasks where the background knowledge is represented as a set of propositional Horn clauses. We refer to such a task as an abductive propositional Horn task. Drawing on the work of Sakama et al. [9], we propose an algorithm for computing solutions of an abductive propositional Horn task using third-order tensors. We first introduce the notion of explanatory operator, a single-step operation based on inverted implication. We prove that minimal abductive solutions of an abductive propositional Horn task can be correctly computed using this operator.
We then provide a mapping for expressing sets of propositional Horn clauses as third-order tensors and show how this mapping can be used to compute the explanatory operator by tensor multiplication.

The paper is structured as follows. Section 2 introduces the basic notions and terminology. Section 3 presents our approach together with theoretical results, and Section 4 discusses its implementation and future research directions.

2 Background

We assume the reader is familiar with logic programming, linear algebra and the basics of high-order tensors [5]. A Horn rule r is of the form h ← b1 ∧ b2 ∧ ... ∧ bn, where h, b1, b2, ..., bn are propositional variables (called atoms) belonging to an alphabet Σ. The atom h is the head of r, denoted head(r), and the set of atoms {b1, b2, ..., bn} is the body of r, denoted body(r). The constants ⊥ and ⊤ represent false and true respectively and also belong to Σ. A fact is a rule of the form h ← ⊤, and a constraint is a rule of the form ⊥ ← b1 ∧ ... ∧ bn. A Horn program P is a finite set of Horn rules, facts and constraints. The set of atoms that appear in P, together with ⊥ and ⊤, is called the Herbrand base of P and denoted B_P. An interpretation I of P is a subset of B_P, which implicitly also contains ⊤. If I contains ⊥ we say I is inconsistent. I is a model of P if it is consistent and satisfies the property: if body(r) ⊆ I then head(r) ∈ I, for every rule r ∈ P. We say a program P is satisfiable, denoted sat(P), if it has a model, and write unsat(P) otherwise. Horn programs have at most one minimal model, called the Least Herbrand Model LHM(P) [12]. P ∪ I is shorthand for the program obtained by adding the atoms of I as facts to P. The immediate consequence operator T_P of a program P is defined over interpretations by T_P(I) = { head(r) | r ∈ P, body(r) ⊆ I }. For every Horn program P there exists a smallest natural number n such that T_P^{n+1}(∅) = T_P^n(∅); T_P^n(∅) is called a fixed point. If the program is satisfiable, the fixed point is equal to LHM(P) [12].

Sakama et al. have proposed in [9] a mapping of Horn programs into matrices. It is defined only for programs that satisfy a Multiple Definitions (MD) condition: for every two rules r1 and r2 such that head(r1) = head(r2), both |body(r1)| ≤ 1 and |body(r2)| ≤ 1. In other words, if an atom is the head of more than one rule, then those rules must have no more than one propositional variable in their body. It has been shown that every program can be mapped into a program satisfying the MD condition [9].

We now describe a modified version of the mapping proposed in [9]. Let P be a Horn program with Herbrand base B_P = {⊥, ⊤, p3, ..., pN}. First, the elements of B_P are mapped into one-hot vectors in R^N. An interpretation I is mapped into a vector v^I by setting v_i = 1 if p_i ∈ I and v_i = 0 otherwise. A program P is mapped into a matrix D^P ∈ R^{N×N} as follows. For every rule r of the form p_i ← p_{j1} ∧ p_{j2} ∧ ... ∧ p_{jm}, the elements D^P_{i,j1} = D^P_{i,j2} = ... = D^P_{i,jm} = 1/m. The elements D^P_{1,1} = ... = D^P_{N,1} = 1, indicating that "⊥ implies everything", and the elements D^P_{2,1} = ... = D^P_{2,N} = 1, denoting that "everything implies ⊤". Finally, differently from [9], we also set D^P_{1,1} = D^P_{2,2} = ... = D^P_{N,N} = 1, representing the tautologies p_i ← p_i for every atom p_i in P. This addition is required for the abductive procedure we propose. As an example, for the program P = {p ← q ∧ r, q ← t, r ← ⊤, t ← s, ⊥ ← s}, the corresponding matrix D^P is:

             ⊥   ⊤   p    q    r    t   s
        ⊥  [ 1   0   0    0    0    0   1 ]
        ⊤  [ 1   1   1    1    1    1   1 ]
        p  [ 1   0   1   1/2  1/2   0   0 ]
        q  [ 1   0   0    1    0    1   0 ]
        r  [ 1   1   0    0    1    0   0 ]
        t  [ 1   0   0    0    0    1   1 ]
        s  [ 1   0   0    0    0    0   1 ]
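To make the mapping concrete, the following is a minimal NumPy sketch of the construction of D^P, together with the single-step and fixed-point computations justified by Propositions 1 and 2 below. It is illustrative only: the encoding of a program as (head, body) pairs of atom indices and all function names are our own choices, not part of the formal development; index 0 stands for ⊥ and index 1 for ⊤.

    import numpy as np

    def build_program_matrix(rules, num_atoms):
        """Build D^P for a Horn program given as (head, body) pairs of atom
        indices (0 is "false", 1 is "true").  Assumes the MD condition of [9],
        i.e. no atom heads two rules with bodies of size greater than one."""
        D = np.eye(num_atoms)                 # tautologies p_i <- p_i
        D[:, 0] = 1.0                         # "false implies everything"
        D[1, :] = 1.0                         # "everything implies true"
        for head, body in rules:
            for atom in body:
                D[head, atom] = 1.0 / len(body)
        return D

    def interpretation_vector(atoms, num_atoms):
        """One-hot style encoding of an interpretation ("true" is always in)."""
        v = np.zeros(num_atoms)
        v[1] = 1.0
        v[list(atoms)] = 1.0
        return v

    def fixed_point(D, v):
        """Repeatedly apply H1(D v) until nothing changes."""
        while True:
            w = (D @ v >= 1).astype(float)
            if np.array_equal(w, v):
                return v
            v = w

    # The example program {p <- q & r, q <- t, r <- true, t <- s, false <- s}
    # with atom order (false, true, p, q, r, t, s):
    rules = [(2, [3, 4]), (3, [5]), (4, [1]), (5, [6]), (0, [6])]
    D = build_program_matrix(rules, 7)
    v = fixed_point(D, interpretation_vector([], 7))   # encodes LHM(P) = {r}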
Let now H_1 : R → {0, 1} be defined by H_1(x) = 1 if x ≥ 1 and H_1(x) = 0 otherwise. Analogously to Proposition 3.2 in [9] we can prove the following proposition:

Proposition 1. Let I and J be two interpretations. Then J = T_P(I) ∪ I if and only if v^J = H_1(D^P · v^I), where H_1 is applied element-wise.

Proposition 2. Let T'_P(I) = T_P(I) ∪ I. Then T'_P^k(I) is a fixed point for large enough k. If sat(P ∪ I) then T'_P^k(I) = LHM(P ∪ I); otherwise T'_P^k(I) is inconsistent.

Hence, given a program P and an interpretation I, we can construct the corresponding matrix D^P and vector v^I. By repeatedly multiplying by D^P and applying H_1, we can compute a vector v* corresponding to the fixed point of T'_P(I). If P is satisfiable then, by Proposition 2, v* corresponds to LHM(P ∪ I). Otherwise v*_1 = 1, meaning the interpretation includes ⊥.

An abductive task is a triplet <P, Ab, g>, where the program P consists of background knowledge and constraints, Ab ⊆ B_P is a set of atoms called abducibles, and g is an atom called the observation or goal. An abductive solution is a set ∆ ⊆ Ab such that sat(P ∪ ∆) and g ∈ LHM(P ∪ ∆). A solution ∆ is minimal if there does not exist a proper subset ∆′ ⊊ ∆ such that ∆′ is also an abductive solution. We are interested in finding minimal solutions of an abductive task.

3 Method

We first define a single-step operator, the explanatory operator E_P. This is similar in spirit to T_P, and it allows us to generate abductive solutions in a "deductive" fashion. For simplicity, we assume Ab = B_P; we later consider filtering solutions according to a given specific set of abducibles. The idea behind the explanatory operator is to invert implications. For instance, suppose P contains the rules {g ← q, g ← r, r ← s ∧ t}. Then possible (minimal) explanations for g are {g} (since g ∈ LHM(P ∪ {g})), {q}, {r} and {s, t}. To arrive at these solutions, our operator takes an interpretation such as {g} and, for each rule with g as its head, creates a new interpretation consisting of the atoms in the body of that rule. So in the example above, the explanatory operator maps {g} to the two interpretations {q} and {r}. An interpretation I generated in this manner may be inconsistent with the program (that is, unsat(P ∪ I)), so we remove from the output any interpretations that are inconsistent with P. The output of the explanatory operator includes I as well, as we would like to maintain solutions that have already been found as we repeatedly apply the operator.

Definition 1. Let P be a program and I an interpretation such that sat(P ∪ I). The explanatory operator E_P applied to I, denoted E_P(I), is given by

    E_P(I) = {I} ∪ ( { (I \ {head(r)}) ∪ body(r) | r ∈ P, head(r) ∈ I } ∩ { I′ | sat(P ∪ I′) } )

To apply E_P iteratively, we extend its definition to Ê_P, which can be applied to a set 𝓘 of interpretations, all of which should be consistent with P:

    Ê_P(𝓘) = ⋃_{I ∈ 𝓘} E_P(I)

We can now define the powers of Ê_P by setting Ê_P^0(𝓘) = 𝓘 and Ê_P^{n+1}(𝓘) = Ê_P(Ê_P^n(𝓘)). For the program above, Ê_P^0({{g}}) = {{g}}, Ê_P^1({{g}}) = {{g}, {q}, {r}}, Ê_P^2({{g}}) = {{g}, {q}, {r}, {s, t}} and Ê_P^3({{g}}) = Ê_P^2({{g}}). Ê_P^2({{g}}) is therefore a fixed point.
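As an illustration of Definition 1 and of this iteration, here is a small, self-contained Python sketch of E_P and Ê_P working directly on sets of atom names, with consistency checked through a naive least-model computation. The program at the end consists of exactly the three rules of the running example; all helper names are ours and purely illustrative.

    def lhm(rules, atoms):
        """Least Herbrand model of the program together with `atoms` as facts."""
        model = set(atoms) | {'true'}
        while True:
            derived = {h for h, body in rules if set(body) <= model}
            if derived <= model:
                return model
            model |= derived

    def consistent(rules, interp):
        return 'false' not in lhm(rules, interp)

    def explain_step(rules, interp):
        """E_P(I): keep I, invert every rule whose head is in I, and discard
        candidates that are inconsistent with the program."""
        out = {frozenset(interp)}
        for head, body in rules:
            if head in interp:
                candidate = frozenset((set(interp) - {head}) | set(body))
                if consistent(rules, candidate):
                    out.add(candidate)
        return out

    def explain(rules, goal):
        """Iterate the extended operator from {{goal}} until a fixed point."""
        current = {frozenset({goal})}
        while True:
            nxt = set().union(*(explain_step(rules, interp) for interp in current))
            if nxt == current:
                return current
            current = nxt

    rules = [('g', ['q']), ('g', ['r']), ('r', ['s', 't'])]
    explain(rules, 'g')   # the four explanations {g}, {q}, {r} and {s, t}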
The next proposition ensures that a fixed point will always be reached.

Proposition 3. Let 𝓘 be a set of interpretations, all consistent with a program P. Then there exists a smallest natural number n for which Ê_P^{n+1}(𝓘) = Ê_P^n(𝓘).

Proof. Note that, by the definition of E_P, for every I ∈ 𝓘 we have I ∈ E_P(I), meaning 𝓘 ⊆ Ê_P(𝓘). By induction it follows that Ê_P^n(𝓘) ⊆ Ê_P^{n+1}(𝓘). Hence |Ê_P^n(𝓘)| is monotonically increasing in n. But for a finite program P, the Herbrand base B_P is finite and therefore |Ê_P^n(𝓘)| is bounded from above by 2^|B_P|. Therefore, there exists a smallest n for which Ê_P^n(𝓘) = Ê_P^{n+1}(𝓘). □

We denote the fixed point by Ê_P^*({{g}}). A generic abductive algorithm would construct Ê_P^*({{g}}) by iteratively computing Ê_P over interpretations. This is connected to the relationship between abduction and Clark Completion [1].

Proposition 4. Suppose ∆ ∈ Ê_P^*({{g}}). Then sat(P ∪ ∆) and g ∈ LHM(P ∪ ∆), that is, ∆ is an abductive solution.

Proposition 4 shows that Ê_P^*({{g}}) contains only abductive solutions. Although it does not in general find all solutions, it does contain all the minimal ones. To show this, we first define a more general version of minimality.

Definition 2. Given an abductive task <P, Ab, g>, a solution ∆ is n-step minimal if g ∈ T_{P∪∆}^n(∅) and for every proper subset ∆′ ⊊ ∆ we have g ∉ T_{P∪∆′}^n(∅).

Intuitively, ∆ is n-step minimal if g can be derived from P ∪ ∆ within n steps, and every atom in ∆ is necessary to do so. For example, consider the program P = {g ← p ∧ q, p ← t, p ← q}. We have Ê_P^*({{g}}) = {{g}, {p, q}, {t, q}, {q}}. {p, q} is a solution that is not minimal, but it is 2-step minimal. {q} is a minimal solution that is not 2-step minimal, but it is 3-step minimal. {q, t} is another solution that is not minimal, nor is it n-step minimal for any n. It is easy to see that all minimal solutions are n-step minimal for some natural number n.

Proposition 5. Suppose that for the abductive task <P, B_P, g> we have that ∆ is an n-step minimal solution for some natural number n. Then ∆ ∈ Ê_P^*({{g}}).

We now move from a generic algorithm to a linear algebraic one. First, we construct a third-order tensor which realises the Ê_P operator. Interpretations, as before, can be represented as vectors. To represent a set of interpretations, we simply stack these vectors as columns of a matrix. Using 2-mode multiplication [5] of the tensor with this matrix realises an application of the Ê_P operator. This operation, however, does not produce quite the desired result. Firstly, there may be duplicate columns, which can simply be removed. More importantly, some of the columns will represent interpretations inconsistent with P. In essence, tensor multiplication realises the following part of the E_P operation: {I} ∪ { (I \ {head(r)}) ∪ body(r) | r ∈ P, head(r) ∈ I }. Luckily, we have a simple way to remove inconsistent solutions: we can compute the fixed point for each column of the matrix using matrix multiplication as in Section 2. Filtering these columns out, we get exactly the behaviour of E_P and achieve linear algebraic abduction.
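For concreteness, the 2-mode multiplication used here can be sketched as a single einsum contraction: every frontal slice of the tensor multiplies every column of the interpretation matrix. The exact orientation of the matrix relative to the n-mode product definition of [5] is a matter of convention; the function name and shapes below are our own illustrative choices.

    import numpy as np

    def mode2_product(A, U):
        """Contract the column index of each frontal slice A[:, :, k] with the
        rows of U, so that slice (:, m, k) of the result is A[:, :, k] @ U[:, m].
        A has shape (N, N, K); U has shape (N, M), one interpretation per column."""
        return np.einsum('ijk,jm->imk', A, U)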
In more detail, let P be a program with K − 1 non-constraint rules and let B_P have size N. We map P to an abductive tensor A^P ∈ R^{N×N×K}. The first K − 1 frontal slices of A^P correspond to its (inverted) rules. If the k-th rule r has head(r) = p_j and body(r) = {p_{i1}, p_{i2}, ..., p_{im}}, then we set A^P_{i1,j,k} = A^P_{i2,j,k} = ... = A^P_{im,j,k} = 1. This ensures that if p_j is "turned on" in an interpretation, then multiplication with this frontal slice will "turn on" the atoms in the body of the rule. We also set A^P_{1,1,k} = A^P_{2,2,k} = ... = A^P_{j−1,j−1,k} = A^P_{j+1,j+1,k} = ... = A^P_{N,N,k} = 1, with all other elements set to 0 (note that A^P_{j,j,k} = 0). This will "turn off" p_j while leaving all other atoms intact. We do this for all K − 1 rules. Finally, the last frontal slice A^P_{::K} is the identity matrix and corresponds to keeping the current interpretation. As an example, consider the program P = {g ← q ∧ r, q ← t, ⊥ ← t}. The frontal slices of the corresponding tensor are:

    A^P_{::1} (for the rule g ← q ∧ r):

             ⊥  ⊤  g  q  r  t
        ⊥  [ 1  0  0  0  0  0 ]
        ⊤  [ 0  1  0  0  0  0 ]
        g  [ 0  0  0  0  0  0 ]
        q  [ 0  0  1  1  0  0 ]
        r  [ 0  0  1  0  1  0 ]
        t  [ 0  0  0  0  0  1 ]

    A^P_{::2} (for the rule q ← t):

             ⊥  ⊤  g  q  r  t
        ⊥  [ 1  0  0  0  0  0 ]
        ⊤  [ 0  1  0  0  0  0 ]
        g  [ 0  0  1  0  0  0 ]
        q  [ 0  0  0  0  0  0 ]
        r  [ 0  0  0  0  1  0 ]
        t  [ 0  0  0  1  0  1 ]

    A^P_{::3} (identity):

             ⊥  ⊤  g  q  r  t
        ⊥  [ 1  0  0  0  0  0 ]
        ⊤  [ 0  1  0  0  0  0 ]
        g  [ 0  0  1  0  0  0 ]
        q  [ 0  0  0  1  0  0 ]
        r  [ 0  0  0  0  1  0 ]
        t  [ 0  0  0  0  0  1 ]

Our generic algorithm can then be realised as follows. Given a program P and goal g, construct the tensor A^P and the matrix D^P. For a set of interpretations 𝓘, construct the corresponding matrix U^𝓘 and compute Ê_P(𝓘) as follows. Let W = H_1(A^P ×_2 U^𝓘), and remove duplicate columns from W. Next, compute the fixed point for each column of W, using D^P and the method outlined in Section 2. The first element of each column corresponds to ⊥; if it is set to 1 in the fixed point, the column represents an inconsistent interpretation and is removed. If we begin with 𝓘 = {{g}} and perform the above steps iteratively, we compute Ê_P^*({{g}}); we only need to ensure that {g} is consistent with P. Each column of the final matrix corresponds to an abductive solution, with all minimal solutions guaranteed to be included. For example, for the program above the abductive process is as follows. Starting from v^{{g}} = (0, 1, 1, 0, 0, 0)^T, the first step gives

    H_1(A^P ×_2 v^{{g}}) =
             ⊥  [ 0  0  0 ]
             ⊤  [ 1  1  1 ]
             g  [ 0  1  1 ]
             q  [ 1  0  0 ]
             r  [ 1  0  0 ]
             t  [ 0  0  0 ]

whose columns correspond to {q, r}, {g} and {g}. Removing the duplicate column leaves a matrix U with columns {q, r} and {g}, both of which are consistent with P. The second step gives

    H_1(A^P ×_2 U) =
             ⊥  [ 0  0  0  0  0  0 ]
             ⊤  [ 1  1  1  1  1  1 ]
             g  [ 0  0  0  0  1  1 ]
             q  [ 1  0  1  1  0  0 ]
             r  [ 1  1  1  1  0  0 ]
             t  [ 0  1  0  0  0  0 ]
      ⇒  W =
             ⊥  [ 0  0 ]
             ⊤  [ 1  1 ]
             g  [ 0  1 ]
             q  [ 1  0 ]
             r  [ 1  0 ]
             t  [ 0  0 ]

The matrix W is the fixed point, its columns corresponding to {q, r} and {g}, while the column corresponding to {r, t} has been removed as it is inconsistent with the program. The next proposition ensures the correctness of the tensor representation.

Proposition 6. Let I be an interpretation consistent with a program P. Denote 𝒥 = {I} ∪ { (I \ {head(r)}) ∪ body(r) | r ∈ P, head(r) ∈ I }. Let U = H_1(A^P ×_2 v^I). Then the columns of U correspond to the interpretations in 𝒥.

Finally, we consider the case of a general set of abducibles Ab ⊆ B_P. After computing Ê_P^*({{g}}) we can filter out solutions that are not subsets of Ab. Note that ∆ ⊆ Ab if and only if v^∆ ∘ v^Ab = v^∆, where ∘ denotes element-wise multiplication. This is one way to achieve post-filtering of solutions.

Theorem 1. Given an abductive task <P, Ab, g>, the algorithm above produces a subset of the abductive solutions that contains all minimal solutions.

Proof. Propositions 1 and 6 ensure that linear algebraic operations allow the computation of Ê_P^*({{g}}). Proposition 3 ensures the algorithm always halts, and correctness follows from Propositions 4 and 5. □
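Putting the pieces together, the following sketch shows one possible NumPy realisation of the whole procedure. It reuses build_program_matrix, interpretation_vector and fixed_point from the Section 2 sketch; as before, the encoding and names are ours, index 0 stands for ⊥ and index 1 for ⊤, and the example at the end is the program used above.

    import numpy as np

    def build_abductive_tensor(rules, num_atoms):
        """One frontal slice per non-constraint rule, plus a final identity slice."""
        invertible = [(h, b) for h, b in rules if h != 0]
        A = np.zeros((num_atoms, num_atoms, len(invertible) + 1))
        for k, (head, body) in enumerate(invertible):
            A[:, :, k] = np.eye(num_atoms)
            A[head, head, k] = 0.0            # "turn off" the head ...
            for atom in body:
                A[atom, head, k] = 1.0        # ... and "turn on" its body atoms
        A[:, :, -1] = np.eye(num_atoms)       # keep the current interpretation
        return A

    def abduce(rules, num_atoms, goal):
        """Iterate W = H1(A^P x2 U), dropping duplicate and inconsistent columns,
        until a fixed point is reached.  Assumes {goal} is consistent with P."""
        A = build_abductive_tensor(rules, num_atoms)
        D = build_program_matrix(rules, num_atoms)
        U = interpretation_vector([goal], num_atoms).reshape(num_atoms, 1)
        while True:
            W = (np.einsum('ijk,jm->imk', A, U) >= 1).astype(float)
            W = np.unique(W.reshape(num_atoms, -1), axis=1)     # drop duplicates
            # drop columns whose fixed point contains "false" (inconsistent)
            keep = [m for m in range(W.shape[1]) if fixed_point(D, W[:, m])[0] == 0]
            W = W[:, keep]
            if np.array_equal(W, np.unique(U, axis=1)):
                return W            # each column encodes an abductive solution
            U = W

    # Example: P = {g <- q & r, q <- t, false <- t}, atom order (false, true, g, q, r, t)
    rules = [(2, [3, 4]), (3, [5]), (0, [5])]
    solutions = abduce(rules, 6, goal=2)      # columns encode {q, r} and {g}
    # A general set of abducibles Ab (as a 0/1 vector v_ab) can then be applied as
    # a post-filter: keep column m iff np.array_equal(solutions[:, m] * v_ab,
    # solutions[:, m]).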
4 Discussion

We have introduced a linear algebraic approach to abduction in propositional Horn programs via an explanatory operator and outlined the correctness of the approach. We have also provided a characterization of the operator in terms of third-order tensors.

A straightforward, unoptimised implementation has been constructed, and here we consider two possible optimisations. First, to prevent repeated computations, we can maintain a list of interpretations that have already been multiplied by the tensor; at each step of the algorithm we can remove columns from the interpretation matrix U^𝓘 if they already appear in this list. Second, our tensor may be broken down into a sequence of tensors A^P_1, A^P_2, ..., A^P_n, each embedding rules that share the same head. Multiplying U^𝓘 by each tensor in sequence produces the same result but can create fewer duplicate columns. While finding abductive solutions in Horn propositional programs is NP-complete [3], our bottom-up approach together with the suggested optimisations can eliminate many unnecessary computational steps.

In the future, we plan to investigate extending our method to first-order logic, for instance by combining it with Sato's tensor representation of Tarskian semantics [10], initially starting with unary predicates. A second extension is to allow for normal logic programs and exploit the connection between negation-as-failure and abduction [4] to compute models under the stable model semantics.

References

1. Console, L., Theseider Dupré, D., Torasso, P.: On the relationship between abduction and deduction. Journal of Logic and Computation 1(5), 661–690 (1991). https://doi.org/10.1093/logcom/1.5.661
2. Corapi, D., Russo, A., Lupu, E.: Inductive Logic Programming in Answer Set Programming. In: Muggleton, S.H., et al. (eds.) Inductive Logic Programming, pp. 91–97. Springer Berlin Heidelberg, Berlin, Heidelberg (2012)
3. Eiter, T., Gottlob, G.: The complexity of logic-based abduction. J. ACM 42(1), 3–42 (Jan 1995). https://doi.org/10.1145/200836.200838
4. Eshghi, K., Kowalski, R.: Abduction Compared with Negation by Failure. pp. 234–254 (1989)
5. Kolda, T.G., Bader, B.W.: Tensor Decompositions and Applications. SIAM Review 51(3), 455–500 (2009)
6. Muggleton, S.H., Lin, D., Pahlavi, N., Tamaddoni-Nezhad, A.: Meta-interpretive learning: application to grammatical inference. Machine Learning 94(1), 25–49 (Jan 2014). https://doi.org/10.1007/s10994-013-5358-3
7. Ray, O.: Nonmonotonic Abductive Inductive Learning. Journal of Applied Logic 7(3), 329–340 (2009). https://doi.org/10.1016/j.jal.2008.10.007, special issue: Abduction and Induction in Artificial Intelligence
8. Rocktäschel, T., Riedel, S.: End-to-end differentiable proving. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 3788–3800. Curran Associates, Inc. (2017)
9. Sakama, C., Inoue, K., Sato, T.: Linear Algebraic Characterization of Logic Programs. In: KSEM, Lecture Notes in Computer Science, vol. 10412, pp. 520–533. Springer (2017). https://doi.org/10.1007/978-3-319-63558-3_44
10. Sato, T.: Embedding Tarskian Semantics in Vector Spaces (Mar 2017), http://arxiv.org/abs/1703.03193v1
11. Serafini, L., Garcez, A.S.d.: Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge. CoRR abs/1606.04422 (2016)
12. van Emden, M.H., Kowalski, R.A.: The Semantics of Predicate Logic as a Programming Language. J. ACM 23(4), 733–742 (Oct 1976)