Tensor-Based Abduction in Horn Propositional Programs

Yaniv Aspis, Krysia Broda, and Alessandra Russo
Department of Computing, Imperial College London
{yaniv.aspis17,k.broda,a.russo}@imperial.ac.uk

Abstract. This paper proposes an algorithm for computing solutions of abductive Horn propositional tasks using third-order tensors. We first introduce the notion of explanatory operator, a single-step operation based on inverted implication, and prove that minimal abductive solutions of a given Horn propositional task can be correctly computed using this operator. We then provide a mapping of Horn propositional programs into third-order tensors, which builds upon recent work on matrix representation of Horn programs. We finally show how this mapping can be used to compute the explanatory operator by tensor multiplication.

Keywords: Abduction · Linear Algebra · High-Order Tensor

1 Introduction

Linear algebra plays a central role in both Artificial Intelligence and Machine Learning. In particular, statistical approaches to AI often involve analysis and manipulation of feature vector spaces for the purpose of learning and inference. For example, high-order tensors and dense vectors have been employed to support logic-based deductive and inductive inference [11,8], but only for a limited class of logic programs. The area of symbolic AI, however, has seen only limited usage of linear algebra. Two methods have recently been proposed for embedding logic programs into vector spaces. The first, [10], allows the computation of the truth value of a First-Order formula using tensor products. The second, [9], offers a characterization of logic programs using linear algebra by mapping Horn logic programs into matrices, and disjunctive and normal programs into third-order tensors, in order to support deductive inference.

Our work builds upon these recent results as a first step towards a tensor-based approach for inductive learning. Abduction plays an important part in knowledge completion and is related to deduction and Clark Completion [1]. It has been used in several inductive logic programming approaches. For instance, XHAIL [7] uses abductive inference to compute suitable head atoms for hypothesis clauses, and both ASPAL [2] and Metagol [6] use an abductive meta-representation of the hypothesis space to search for inductive solutions.

An abductive task includes an observation, background knowledge in the form of logical rules, and a set of abducible atoms with which to construct an explanation of the given observation. Explanations must be consistent with the given background knowledge and integrity constraints (if any). An abductive solution is a subset of the abducibles that, together with the background knowledge, consistently explains the given observation. In this paper we consider abductive tasks where the background knowledge is represented as a set of propositional Horn clauses. We refer to such a task as an abductive propositional Horn task. Drawing on the work of Sakama et al. [9], we propose an algorithm for computing solutions of an abductive propositional Horn task using third-order tensors. We first introduce the notion of explanatory operator, a single-step operation based on inverted implication. We prove that minimal abductive solutions of an abductive propositional Horn task can be correctly computed using this operator.
We then provide a mapping for expressing sets of propositional Horn clauses as third-order tensors and show how this mapping can be used to compute the explanatory operator by tensor multiplication.

The paper is structured as follows. Section 2 introduces the basic notions and terminology. Section 3 presents our approach together with theoretical results, and Section 4 discusses its implementation and future research directions.

2 Background

We assume the reader is familiar with logic programming, linear algebra and the basics of high-order tensors [5]. A Horn rule r is of the form h ← b1 ∧ b2 ∧ ... ∧ bn, where h, b1, b2, ..., bn are propositional variables (called atoms) belonging to an alphabet Σ. The atom h is the head of r, denoted head(r), and the set of atoms {b1, b2, ..., bn} is the body of r, denoted body(r). The constants ⊥ and ⊤ represent false and true respectively and also belong to Σ. A fact is a rule of the form h ← ⊤, and a constraint is a rule of the form ⊥ ← b1 ∧ ... ∧ bn. A Horn program P is a finite set of Horn rules, facts and constraints. The set of atoms that appear in P, together with ⊥ and ⊤, is called the Herbrand base of P and denoted B_P. An interpretation I of P is a subset of B_P, which implicitly also contains ⊤. If I contains ⊥ we say I is inconsistent. I is a model of P if it is consistent and satisfies the property: if body(r) ⊆ I then head(r) ∈ I, for every rule r ∈ P. We say a program P is satisfiable, denoted sat(P), if it has a model, and write unsat(P) otherwise. Horn programs have at most one minimal model, called the Least Herbrand Model LHM(P) [12]. P ∪ I is shorthand for the program obtained by adding the atoms of I as facts to P. The immediate consequence operator T_P of a program P is defined over interpretations by T_P(I) = { head(r) | r ∈ P, body(r) ⊆ I }. For every Horn program P there exists a smallest natural number n such that T_P^{n+1}(∅) = T_P^n(∅); T_P^n(∅) is called a fixed point. If the program is satisfiable, the fixed point is equal to LHM(P) [12].

Sakama et al. have proposed in [9] a mapping of Horn programs into matrices. It is defined only for programs that satisfy a Multiple Definitions (MD) condition: for every two rules r1 and r2 such that head(r1) = head(r2), both |body(r1)| ≤ 1 and |body(r2)| ≤ 1. In other words, if an atom is the head of more than one rule, then those rules must have no more than one propositional variable in their body. It has been shown that every program can be mapped into a program satisfying the MD condition [9].

We now describe a modified version of the mapping proposed in [9]. Let P be a Horn program with Herbrand base B_P = {⊥, ⊤, p3, ..., pN}. First, the elements of B_P are mapped into one-hot vectors in R^N. An interpretation I is mapped into a vector v^I by setting v_i = 1 if p_i ∈ I and v_i = 0 otherwise. A program P is mapped into a matrix D^P ∈ R^{N×N} as follows. For every rule r of the form p_i ← p_{j1} ∧ p_{j2} ∧ ... ∧ p_{jm}, the elements D^P_{i,j1} = D^P_{i,j2} = ... = D^P_{i,jm} = 1/m. The elements D^P_{1,1} = ... = D^P_{N,1} = 1, indicating that "⊥ implies everything", and the elements D^P_{2,1} = ... = D^P_{2,N} = 1, denoting that "everything implies ⊤". Finally, differently from [9], we also set D^P_{1,1} = D^P_{2,2} = ... = D^P_{N,N} = 1, representing the tautologies p_i ← p_i for every atom p_i in P. This addition is required for the abductive procedure we propose. As an example, for the program P = {p ← q ∧ r, q ← t, r ← ⊤, t ← s, ⊥ ← s}, the corresponding matrix D^P is:

             ⊥   ⊤   p    q    r    t   s
        ⊥  [ 1   0   0    0    0    0   1 ]
        ⊤  [ 1   1   1    1    1    1   1 ]
        p  [ 1   0   1   1/2  1/2   0   0 ]
        q  [ 1   0   0    1    0    1   0 ]
        r  [ 1   1   0    0    1    0   0 ]
        t  [ 1   0   0    0    0    1   1 ]
        s  [ 1   0   0    0    0    0   1 ]
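To make the mapping concrete, the following is a minimal NumPy sketch of the construction of D^P, together with the single-step and fixed-point computations justified by Propositions 1 and 2 below. It is illustrative only: the encoding of a program as (head, body) pairs of atom indices and all function names are our own choices, not part of the formal development; index 0 stands for ⊥ and index 1 for ⊤.

    import numpy as np

    def build_program_matrix(rules, num_atoms):
        """Build D^P for a Horn program given as (head, body) pairs of atom
        indices (0 is "false", 1 is "true").  Assumes the MD condition of [9],
        i.e. no atom heads two rules with bodies of size greater than one."""
        D = np.eye(num_atoms)                 # tautologies p_i <- p_i
        D[:, 0] = 1.0                         # "false implies everything"
        D[1, :] = 1.0                         # "everything implies true"
        for head, body in rules:
            for atom in body:
                D[head, atom] = 1.0 / len(body)
        return D

    def interpretation_vector(atoms, num_atoms):
        """One-hot style encoding of an interpretation ("true" is always in)."""
        v = np.zeros(num_atoms)
        v[1] = 1.0
        v[list(atoms)] = 1.0
        return v

    def fixed_point(D, v):
        """Repeatedly apply H1(D v) until nothing changes."""
        while True:
            w = (D @ v >= 1).astype(float)
            if np.array_equal(w, v):
                return v
            v = w

    # The example program {p <- q & r, q <- t, r <- true, t <- s, false <- s}
    # with atom order (false, true, p, q, r, t, s):
    rules = [(2, [3, 4]), (3, [5]), (4, [1]), (5, [6]), (0, [6])]
    D = build_program_matrix(rules, 7)
    v = fixed_point(D, interpretation_vector([], 7))   # encodes LHM(P) = {r}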
Let now H_1 : R → {0, 1} be defined by H_1(x) = 1 if x ≥ 1 and H_1(x) = 0 otherwise. Analogously to Proposition 3.2 in [9] we can prove the following proposition:

Proposition 1. Let I and J be two interpretations. Then J = T_P(I) ∪ I if and only if v^J = H_1(D^P · v^I), where H_1 is applied element-wise.

Proposition 2. Let T'_P(I) = T_P(I) ∪ I. Then T'_P^k(I) is a fixed point for large enough k. If sat(P ∪ I) then T'_P^k(I) = LHM(P ∪ I); otherwise T'_P^k(I) is inconsistent.

Hence, given a program P and an interpretation I, we can construct the corresponding matrix D^P and vector v^I. By repeatedly multiplying by D^P and applying H_1, we can compute a vector v* corresponding to the fixed point of T'_P(I). If P is satisfiable then, by Proposition 2, v* corresponds to LHM(P ∪ I). Otherwise v*_1 = 1, meaning the interpretation includes ⊥.

An abductive task is a triplet <P, Ab, g>, where the program P consists of background knowledge and constraints, Ab ⊆ B_P is a set of atoms called abducibles, and g is an atom called the observation or goal. An abductive solution is a set ∆ ⊆ Ab such that sat(P ∪ ∆) and g ∈ LHM(P ∪ ∆). A solution ∆ is minimal if there does not exist a proper subset ∆′ ⊊ ∆ such that ∆′ is also an abductive solution. We are interested in finding minimal solutions of an abductive task.

3 Method

We first define a single-step operator, the explanatory operator E_P. This is similar in spirit to T_P, and it allows us to generate abductive solutions in a "deductive" fashion. For simplicity, we assume Ab = B_P; we later consider filtering solutions according to a given specific set of abducibles. The idea behind the explanatory operator is to invert implications. For instance, suppose P contains the rules {g ← q, g ← r, r ← s ∧ t}. Then possible (minimal) explanations for g are {g} (since g ∈ LHM(P ∪ {g})), {q}, {r} and {s, t}. To arrive at these solutions, our operator takes an interpretation such as {g} and, for each rule with g as its head, creates a new interpretation consisting of the atoms in the body of that rule. So in the example above, the explanatory operator maps {g} to the two interpretations {q} and {r}. An interpretation I generated in this manner may be inconsistent with the program (that is, unsat(P ∪ I)), so we remove from the output any interpretations that are inconsistent with P. The output of the explanatory operator includes I as well, as we would like to maintain solutions that have already been found as we repeatedly apply the operator.

Definition 1. Let P be a program and I an interpretation such that sat(P ∪ I). The explanatory operator E_P applied to I, denoted E_P(I), is given by

    E_P(I) = {I} ∪ ( { (I \ {head(r)}) ∪ body(r) | r ∈ P, head(r) ∈ I } ∩ { I′ | sat(P ∪ I′) } )

To apply E_P iteratively, we extend its definition to Ê_P, which can be applied to a set 𝓘 of interpretations, all of which should be consistent with P:

    Ê_P(𝓘) = ⋃_{I ∈ 𝓘} E_P(I)

We can now define the powers of Ê_P by setting Ê_P^0(𝓘) = 𝓘 and Ê_P^{n+1}(𝓘) = Ê_P(Ê_P^n(𝓘)). For the program above, Ê_P^0({{g}}) = {{g}}, Ê_P^1({{g}}) = {{g}, {q}, {r}}, Ê_P^2({{g}}) = {{g}, {q}, {r}, {s, t}} and Ê_P^3({{g}}) = Ê_P^2({{g}}). Ê_P^2({{g}}) is therefore a fixed point.
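As an illustration of Definition 1 and of this iteration, here is a small, self-contained Python sketch of E_P and Ê_P working directly on sets of atom names, with consistency checked through a naive least-model computation. The program at the end consists of exactly the three rules of the running example; all helper names are ours and purely illustrative.

    def lhm(rules, atoms):
        """Least Herbrand model of the program together with `atoms` as facts."""
        model = set(atoms) | {'true'}
        while True:
            derived = {h for h, body in rules if set(body) <= model}
            if derived <= model:
                return model
            model |= derived

    def consistent(rules, interp):
        return 'false' not in lhm(rules, interp)

    def explain_step(rules, interp):
        """E_P(I): keep I, invert every rule whose head is in I, and discard
        candidates that are inconsistent with the program."""
        out = {frozenset(interp)}
        for head, body in rules:
            if head in interp:
                candidate = frozenset((set(interp) - {head}) | set(body))
                if consistent(rules, candidate):
                    out.add(candidate)
        return out

    def explain(rules, goal):
        """Iterate the extended operator from {{goal}} until a fixed point."""
        current = {frozenset({goal})}
        while True:
            nxt = set().union(*(explain_step(rules, interp) for interp in current))
            if nxt == current:
                return current
            current = nxt

    rules = [('g', ['q']), ('g', ['r']), ('r', ['s', 't'])]
    explain(rules, 'g')   # the four explanations {g}, {q}, {r} and {s, t}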
The next proposition ensures that a fixed point will always be reached.

Proposition 3. Let 𝓘 be a set of interpretations, all consistent with a program P. Then there exists a smallest natural number n for which Ê_P^{n+1}(𝓘) = Ê_P^n(𝓘).

Proof. Note that, by the definition of E_P, for every I ∈ 𝓘 we have I ∈ E_P(I), meaning 𝓘 ⊆ Ê_P(𝓘). By induction it follows that Ê_P^n(𝓘) ⊆ Ê_P^{n+1}(𝓘). Hence |Ê_P^n(𝓘)| is monotonically increasing in n. But for a finite program P, the Herbrand base B_P is finite and therefore |Ê_P^n(𝓘)| is bounded from above by 2^|B_P|. Therefore, there exists a smallest n for which Ê_P^n(𝓘) = Ê_P^{n+1}(𝓘). □

We denote the fixed point by Ê_P^*({{g}}). A generic abductive algorithm would construct Ê_P^*({{g}}) by iteratively computing Ê_P over interpretations. This is connected to the relationship between abduction and Clark Completion [1].

Proposition 4. Suppose ∆ ∈ Ê_P^*({{g}}). Then sat(P ∪ ∆) and g ∈ LHM(P ∪ ∆), that is, ∆ is an abductive solution.

Proposition 4 shows that Ê_P^*({{g}}) contains only abductive solutions. Although it does not in general find all solutions, it does contain all the minimal ones. To show this, we first define a more general version of minimality.

Definition 2. Given an abductive task <P, Ab, g>, a solution ∆ is n-step minimal if g ∈ T_{P∪∆}^n(∅) and for every proper subset ∆′ ⊊ ∆ we have g ∉ T_{P∪∆′}^n(∅).

Intuitively, ∆ is n-step minimal if g can be derived from P ∪ ∆ within n steps, and every atom in ∆ is necessary to do so. For example, consider the program P = {g ← p ∧ q, p ← t, p ← q}. We have Ê_P^*({{g}}) = {{g}, {p, q}, {t, q}, {q}}. {p, q} is a solution that is not minimal, but it is 2-step minimal. {q} is a minimal solution that is not 2-step minimal, but it is 3-step minimal. {q, t} is another solution that is not minimal, nor is it n-step minimal for any n. It is easy to see that all minimal solutions are n-step minimal for some natural number n.

Proposition 5. Suppose that for the abductive task <P, B_P, g> we have that ∆ is an n-step minimal solution for some natural number n. Then ∆ ∈ Ê_P^*({{g}}).

We now move from a generic algorithm to a linear algebraic one. First, we construct a third-order tensor which realises the Ê_P operator. Interpretations, as before, can be represented as vectors. To represent a set of interpretations, we simply stack these vectors as columns of a matrix. Using 2-mode multiplication [5] of the tensor with this matrix realises an application of the Ê_P operator. This operation, however, does not produce quite the desired result. Firstly, there may be duplicate columns, which can simply be removed. More importantly, some of the columns will represent interpretations inconsistent with P. In essence, tensor multiplication realises the following part of the E_P operation: {I} ∪ { (I \ {head(r)}) ∪ body(r) | r ∈ P, head(r) ∈ I }. Luckily, we have a simple way to remove inconsistent solutions: we can compute the fixed point for each column of the matrix using matrix multiplication as in Section 2. Filtering these columns out, we get exactly the behaviour of E_P and achieve linear algebraic abduction.
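For concreteness, the 2-mode multiplication used here can be sketched as a single einsum contraction: every frontal slice of the tensor multiplies every column of the interpretation matrix. The exact orientation of the matrix relative to the n-mode product definition of [5] is a matter of convention; the function name and shapes below are our own illustrative choices.

    import numpy as np

    def mode2_product(A, U):
        """Contract the column index of each frontal slice A[:, :, k] with the
        rows of U, so that slice (:, m, k) of the result is A[:, :, k] @ U[:, m].
        A has shape (N, N, K); U has shape (N, M), one interpretation per column."""
        return np.einsum('ijk,jm->imk', A, U)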
In more detail, let P be a program with K − 1 non-constraint rules and let B_P have size N. We map P to an abductive tensor A^P ∈ R^{N×N×K}. The first K − 1 frontal slices of A^P correspond to its (inverted) rules. If the k-th rule r has head(r) = p_j and body(r) = {p_{i1}, p_{i2}, ..., p_{im}}, then we set A^P_{i1,j,k} = A^P_{i2,j,k} = ... = A^P_{im,j,k} = 1. This ensures that if p_j is "turned on" in an interpretation, then multiplication with this frontal slice will "turn on" the atoms in the body of the rule. We also set A^P_{1,1,k} = A^P_{2,2,k} = ... = A^P_{j−1,j−1,k} = A^P_{j+1,j+1,k} = ... = A^P_{N,N,k} = 1, with all other elements set to 0 (note that A^P_{j,j,k} = 0). This will "turn off" p_j while leaving all other atoms intact. We do this for all K − 1 rules. Finally, the last frontal slice A^P_{::K} is the identity matrix and corresponds to keeping the current interpretation. As an example, consider the program P = {g ← q ∧ r, q ← t, ⊥ ← t}. The frontal slices of the corresponding tensor are:

    A^P_{::1} (for the rule g ← q ∧ r):

             ⊥  ⊤  g  q  r  t
        ⊥  [ 1  0  0  0  0  0 ]
        ⊤  [ 0  1  0  0  0  0 ]
        g  [ 0  0  0  0  0  0 ]
        q  [ 0  0  1  1  0  0 ]
        r  [ 0  0  1  0  1  0 ]
        t  [ 0  0  0  0  0  1 ]

    A^P_{::2} (for the rule q ← t):

             ⊥  ⊤  g  q  r  t
        ⊥  [ 1  0  0  0  0  0 ]
        ⊤  [ 0  1  0  0  0  0 ]
        g  [ 0  0  1  0  0  0 ]
        q  [ 0  0  0  0  0  0 ]
        r  [ 0  0  0  0  1  0 ]
        t  [ 0  0  0  1  0  1 ]

    A^P_{::3} (identity):

             ⊥  ⊤  g  q  r  t
        ⊥  [ 1  0  0  0  0  0 ]
        ⊤  [ 0  1  0  0  0  0 ]
        g  [ 0  0  1  0  0  0 ]
        q  [ 0  0  0  1  0  0 ]
        r  [ 0  0  0  0  1  0 ]
        t  [ 0  0  0  0  0  1 ]

Our generic algorithm can then be realised as follows. Given a program P and goal g, construct the tensor A^P and the matrix D^P. For a set of interpretations 𝓘, construct the corresponding matrix U^𝓘 and compute Ê_P(𝓘) as follows. Let W = H_1(A^P ×_2 U^𝓘), and remove duplicate columns from W. Next, compute the fixed point for each column of W, using D^P and the method outlined in Section 2. The first element of each column corresponds to ⊥; if it is set to 1 in the fixed point, the column represents an inconsistent interpretation and is removed. If we begin with 𝓘 = {{g}} and perform the above steps iteratively, we compute Ê_P^*({{g}}); we only need to ensure that {g} is consistent with P. Each column of the final matrix corresponds to an abductive solution, with all minimal solutions guaranteed to be included. For example, for the program above the abductive process is as follows. Starting from v^{{g}} = (0, 1, 1, 0, 0, 0)^T, the first step gives

    H_1(A^P ×_2 v^{{g}}) =
             ⊥  [ 0  0  0 ]
             ⊤  [ 1  1  1 ]
             g  [ 0  1  1 ]
             q  [ 1  0  0 ]
             r  [ 1  0  0 ]
             t  [ 0  0  0 ]

whose columns correspond to {q, r}, {g} and {g}. Removing the duplicate column leaves a matrix U with columns {q, r} and {g}, both of which are consistent with P. The second step gives

    H_1(A^P ×_2 U) =
             ⊥  [ 0  0  0  0  0  0 ]
             ⊤  [ 1  1  1  1  1  1 ]
             g  [ 0  0  0  0  1  1 ]
             q  [ 1  0  1  1  0  0 ]
             r  [ 1  1  1  1  0  0 ]
             t  [ 0  1  0  0  0  0 ]
      ⇒  W =
             ⊥  [ 0  0 ]
             ⊤  [ 1  1 ]
             g  [ 0  1 ]
             q  [ 1  0 ]
             r  [ 1  0 ]
             t  [ 0  0 ]

The matrix W is the fixed point, its columns corresponding to {q, r} and {g}, while the column corresponding to {r, t} has been removed as it is inconsistent with the program. The next proposition ensures the correctness of the tensor representation.

Proposition 6. Let I be an interpretation consistent with a program P. Denote 𝒥 = {I} ∪ { (I \ {head(r)}) ∪ body(r) | r ∈ P, head(r) ∈ I }. Let U = H_1(A^P ×_2 v^I). Then the columns of U correspond to the interpretations in 𝒥.

Finally, we consider the case of a general set of abducibles Ab ⊆ B_P. After computing Ê_P^*({{g}}) we can filter out solutions that are not subsets of Ab. Note that ∆ ⊆ Ab if and only if v^∆ ∘ v^Ab = v^∆, where ∘ denotes element-wise multiplication. This is one way to achieve post-filtering of solutions.

Theorem 1. Given an abductive task <P, Ab, g>, the algorithm above produces a subset of the abductive solutions that contains all minimal solutions.

Proof. Propositions 1 and 6 ensure that linear algebraic operations allow the computation of Ê_P^*({{g}}). Proposition 3 ensures the algorithm always halts, and correctness follows from Propositions 4 and 5. □
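Putting the pieces together, the following sketch shows one possible NumPy realisation of the whole procedure. It reuses build_program_matrix, interpretation_vector and fixed_point from the Section 2 sketch; as before, the encoding and names are ours, index 0 stands for ⊥ and index 1 for ⊤, and the example at the end is the program used above.

    import numpy as np

    def build_abductive_tensor(rules, num_atoms):
        """One frontal slice per non-constraint rule, plus a final identity slice."""
        invertible = [(h, b) for h, b in rules if h != 0]
        A = np.zeros((num_atoms, num_atoms, len(invertible) + 1))
        for k, (head, body) in enumerate(invertible):
            A[:, :, k] = np.eye(num_atoms)
            A[head, head, k] = 0.0            # "turn off" the head ...
            for atom in body:
                A[atom, head, k] = 1.0        # ... and "turn on" its body atoms
        A[:, :, -1] = np.eye(num_atoms)       # keep the current interpretation
        return A

    def abduce(rules, num_atoms, goal):
        """Iterate W = H1(A^P x2 U), dropping duplicate and inconsistent columns,
        until a fixed point is reached.  Assumes {goal} is consistent with P."""
        A = build_abductive_tensor(rules, num_atoms)
        D = build_program_matrix(rules, num_atoms)
        U = interpretation_vector([goal], num_atoms).reshape(num_atoms, 1)
        while True:
            W = (np.einsum('ijk,jm->imk', A, U) >= 1).astype(float)
            W = np.unique(W.reshape(num_atoms, -1), axis=1)     # drop duplicates
            # drop columns whose fixed point contains "false" (inconsistent)
            keep = [m for m in range(W.shape[1]) if fixed_point(D, W[:, m])[0] == 0]
            W = W[:, keep]
            if np.array_equal(W, np.unique(U, axis=1)):
                return W            # each column encodes an abductive solution
            U = W

    # Example: P = {g <- q & r, q <- t, false <- t}, atom order (false, true, g, q, r, t)
    rules = [(2, [3, 4]), (3, [5]), (0, [5])]
    solutions = abduce(rules, 6, goal=2)      # columns encode {q, r} and {g}
    # A general set of abducibles Ab (as a 0/1 vector v_ab) can then be applied as
    # a post-filter: keep column m iff np.array_equal(solutions[:, m] * v_ab,
    # solutions[:, m]).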
4 Discussion

We have introduced a linear algebraic approach to abduction in propositional Horn programs via an explanatory operator and outlined the correctness of the approach. We have also provided a characterization of the operator in terms of third-order tensors.

A straightforward, unoptimised implementation has been constructed, and here we consider two possible optimisations. First, to prevent repeated computations, we can maintain a list of interpretations that have already been multiplied by the tensor; at each step of the algorithm we can remove columns from the interpretation matrix U^𝓘 if they already appear in this list. Second, our tensor may be broken down into a sequence of tensors A^P_1, A^P_2, ..., A^P_n, each embedding rules that share the same head. Multiplying U^𝓘 by each tensor in sequence produces the same result but can create fewer duplicate columns. While finding abductive solutions in Horn propositional programs is NP-complete [3], our bottom-up approach together with the suggested optimisations can eliminate many unnecessary computational steps.

In the future, we plan to investigate extending our method to first-order logic, for instance by combining it with Sato's tensor representation of Tarskian semantics [10], initially starting with unary predicates. A second extension is to allow for normal logic programs and exploit the connection between negation-as-failure and abduction [4] to compute models under the stable model semantics.

References

1. Console, L., Theseider Dupré, D., Torasso, P.: On the relationship between abduction and deduction. Journal of Logic and Computation 1(5), 661–690 (1991). https://doi.org/10.1093/logcom/1.5.661
2. Corapi, D., Russo, A., Lupu, E.: Inductive Logic Programming in Answer Set Programming. In: Muggleton, S.H., et al. (eds.) Inductive Logic Programming, pp. 91–97. Springer Berlin Heidelberg, Berlin, Heidelberg (2012)
3. Eiter, T., Gottlob, G.: The complexity of logic-based abduction. J. ACM 42(1), 3–42 (Jan 1995). https://doi.org/10.1145/200836.200838
4. Eshghi, K., Kowalski, R.: Abduction Compared with Negation by Failure. pp. 234–254 (1989)
5. Kolda, T.G., Bader, B.W.: Tensor Decompositions and Applications. SIAM Review 51(3), 455–500 (2009)
6. Muggleton, S.H., Lin, D., Pahlavi, N., Tamaddoni-Nezhad, A.: Meta-interpretive learning: application to grammatical inference. Machine Learning 94(1), 25–49 (Jan 2014). https://doi.org/10.1007/s10994-013-5358-3
7. Ray, O.: Nonmonotonic Abductive Inductive Learning. Journal of Applied Logic 7(3), 329–340 (2009). https://doi.org/10.1016/j.jal.2008.10.007, special issue: Abduction and Induction in Artificial Intelligence
8. Rocktäschel, T., Riedel, S.: End-to-end differentiable proving. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 3788–3800. Curran Associates, Inc. (2017)
9. Sakama, C., Inoue, K., Sato, T.: Linear Algebraic Characterization of Logic Programs. In: KSEM, Lecture Notes in Computer Science, vol. 10412, pp. 520–533. Springer (2017). https://doi.org/10.1007/978-3-319-63558-3_44
10. Sato, T.: Embedding Tarskian Semantics in Vector Spaces (Mar 2017), http://arxiv.org/abs/1703.03193v1
11. Serafini, L., Garcez, A.S.d.: Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge. CoRR abs/1606.04422 (2016)
12. van Emden, M.H., Kowalski, R.A.: The Semantics of Predicate Logic as a Programming Language. J. ACM 23(4), 733–742 (Oct 1976)