Abduction as Satisfiability

Petr Homola
LoqTek
phomola@loqtek.com

Abstract. Abduction is reasoning in propositional or first-order logic that provides explanations for observations. We show how context-free parsing and automated planning can be formulated using abductive rules and present a fast prover based on propositional satisfiability. Furthermore, we show how weighted abduction can be efficiently implemented in the prover.

Keywords: abduction, satisfiability, parsing, planning

1 Introduction

Abduction is a form of ampliative reasoning that provides explanations for observations given a background theory. While abductive reasoning has proven useful in many areas of artificial intelligence and computer science (e.g., discourse interpretation [11], plan ascription [1], diagnostic expert systems), the fact that even its propositional version is NP-hard restricts its use with large ontologies. This paper presents an algorithm for finding abductive proofs by converting the logical representation into a satisfiability problem. In conjunction with a state-of-the-art SAT solver, the transformation gives us an efficient abduction-based prover.

Abduction can be formally defined as follows: Let T be a first-order theory and O a set of literals ("observations"). Given a set of abductive rules (of the form antecedent → consequent), abduction is the task of finding a set of literals H ("hypothesis") such that

(1) T ∪ H ⊨ O and T ∪ H ⊭ ⊥

that is, T ∪ H entails O and is consistent.

Abduction can be combined with other forms of reasoning, such as deduction. Indeed, most ontologies consist of both deductive and abductive rules. Particularly useful is weighted abduction, a variant of probabilistic abduction that can be used to find optimal proofs. What "optimal" means depends on what is being reasoned about. In discourse analysis, for instance, the least specific proof maximizing redundancy is preferred [11]. In the domain of automated planning, the shortest proof (i.e., the shortest plan leading to the desired goal state) is considered the best, and so forth.

We show how context-free parsing with feature structures (Section 2) and STRIPS-like (goal-driven) automated planning (Section 3) can be formulated by means of abductive rules and present a transformation of abductive problems into propositional satisfiability (Section 4). An extension of the procedure to weighted abduction is described in Section 5. Grounding is discussed in Section 6 and we conclude in Section 7.

2 Context-free Parsing

It is well known that parsing can be thought of as deduction in first-order logic. On the other hand, a context-free grammar can be conceived of as a set of abductive rules and the input as observations. In this setting, finding a parse amounts to proving that there is an abductive proof of the "observed" terminal symbols (the input) that bottoms out in the initial nonterminal symbol S. For example, in the rule

(2) S → NP VP

S is the antecedent while NP and VP constitute the consequent. We use Kowalski's [15] notation and encode the input as a graph whose edges are labelled with the terminal symbols whereas nodes represent "interword points".
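To make this encoding concrete, the following minimal Python sketch (our own illustration with hypothetical function names, not the prover's actual data structures) builds the token literals for a five-word input and enumerates the propositional instances obtained by grounding the abductive form of rule (2) over interword points.

# Illustrative sketch only: the input is encoded as token(i, j, word)
# literals, i.e. edges of a graph whose nodes 0..n are interword points.
def observations(words):
    return [("token", i, i + 1, w) for i, w in enumerate(words)]

# A binary grammar rule such as S -> NP VP becomes the abductive rule
# S(x, z) -> NP(x, y) /\ VP(y, z); grounding it over the interword points
# yields its propositional instances.
def ground_binary_rule(lhs, rhs1, rhs2, n):
    rules = []
    for x in range(n + 1):
        for y in range(x + 1, n + 1):
            for z in range(y + 1, n + 1):
                antecedent = [(lhs, x, z)]
                consequent = [(rhs1, x, y), (rhs2, y, z)]
                rules.append((antecedent, consequent))
    return rules

words = ["time", "flies", "like", "an", "arrow"]
print(observations(words))
print(len(ground_binary_rule("S", "NP", "VP", len(words))))  # 20 instances for a 5-word input

Such ground instances are exactly the rules that the transformation of Section 4 turns into propositional clauses.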
For example, rule (2) is written in the abductive framework as

(3) S(x, z) → NP(x, y) ∧ VP(y, z)

which in turn is an abbreviation for

(4) ∀x, z. S(x, z) → ∃y. NP(x, y) ∧ VP(y, z)

where x, y, and z are interword points. (We could also add the constraint x < y ∧ y < z, but it is not needed if the observations are a translation of a correct graph.) The sentence (i.e., the "observation")

(5) Time flies like an arrow

can be represented as

(6) token(0, 1, time) ∧ token(1, 2, flies) ∧ token(2, 3, like) ∧ token(3, 4, an) ∧ token(4, 5, arrow)

The lexicon consists of rules whose consequent is a terminal symbol while the antecedent is the corresponding preterminal symbol. For example, the ambiguous word flies is represented by two lexical rules:

(7) N(x, y) → token(x, y, flies)
    V(x, y) → token(x, y, flies)

If we have an observation (input tokens) and a set of abductive rules (based on a lexicon and a context-free grammar), we can use abductive reasoning to find a parse (or parses if the input is ambiguous) based on the following principles:

1. Each literal in the proof must be observed or abduced.
2. Each observed or abduced literal must be explained by at most one rule.

Note that we do not require that each observation be explained. However, we can require that S(0, e) (where e is the end node) be abduced, i.e., it must be present in the proof. This integrity constraint ensures that the abductive proof explains all observations and intermediate abduced literals unless there is no proof that bottoms out in S(0, e), in which case the input is not well-formed according to the grammar.

In the abductive framework, functional constraints (such as f-structures in LFG [13, 4]) can be easily added to the rules. Using a representation similar to that of Wedekind [17], the antecedents of the lexical rules are augmented with literals that represent feature structures and the antecedents of the grammar-specific rules are enriched with literals that represent functional constraints (i.e., unification). The literals that represent nonterminal symbols will now have the form

(8) X(x, y, f)

where f is a constant representing a feature structure. The integrity constraint S(0, e) becomes

(9) S(0, e, f1) ∨ S(0, e, f2) ∨ ... ∨ S(0, e, fn)

where f1, ..., fn are all possible feature structures associated with the nonterminal symbol S spanning the whole sentence. The (somewhat simplified) lexical rule for arrow augmented with a feature structure is

(10) N(x, y, f_n(x, y)) ∧ attr(f_n(x, y), pred, arrow) ∧ attr(f_n(x, y), number, sg) → token(x, y, arrow)

that is, the corresponding feature structure has two attributes and its identifier is the Skolem function f_n(x, y), where f_n is unique for every variant of the same word:

(11) [PRED 'arrow', NUMBER sg]

The unification of feature structures can be represented by a reflexive, symmetric, and transitive predicate eq for which the following additional axioms hold (C is the set of constants, F is the set of feature structures, and A is the set of attributes). For all f1, f2 ∈ F, a ∈ A, v1, v2 ∈ F ∪ C, x, y ∈ C:

(12) 1. x ≠ y → ¬eq(x, y)
     2. attr(f1, a, v1) ∧ attr(f2, a, v2) ∧ eq(f1, f2) → eq(v1, v2)
     3. attr(f1, a, v1) ∧ attr(f2, a, v2) ∧ eq(v1, v2) → eq(f1, f2)
     4. attr(f1, a, v1) ∧ eq(f1, f2) → attr(f2, a, v1)

that is, two distinct constants can never be unified and each f ∈ F can be thought of as a function from A to F ∪ C. An example is in order.
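Before turning to the example, we note that the behaviour of these axioms is easy to prototype. The following Python sketch (an illustrative aside with hypothetical constant names, not part of the described prover) computes the closure of a set of attr facts under axioms 2-4 and uses axiom 1 to reject the unification of structures with clashing atomic values.

CONSTANTS = {"sg", "pl", "arrow", "time"}              # hypothetical constants

def closure(attrs, eqs):
    # attrs: set of (f, a, v) facts; eqs: set of unordered eq pairs
    attrs = set(attrs)
    eqs = {frozenset(p) for p in eqs}

    def eq(x, y):
        return x == y or frozenset((x, y)) in eqs

    changed = True
    while changed:
        changed = False
        for (f1, a1, v1) in sorted(attrs):
            for (f2, a2, v2) in sorted(attrs):
                if a1 != a2:
                    continue
                if eq(f1, f2) and not eq(v1, v2):        # axiom 2
                    eqs.add(frozenset((v1, v2)))
                    changed = True
                if eq(v1, v2) and not eq(f1, f2):        # axiom 3
                    eqs.add(frozenset((f1, f2)))
                    changed = True
            for pair in list(eqs):                       # axiom 4
                if f1 in pair:
                    for f2 in pair - {f1}:
                        if (f2, a1, v1) not in attrs:
                            attrs.add((f2, a1, v1))
                            changed = True
    clash = any(pair <= CONSTANTS for pair in eqs)       # axiom 1 violated?
    return attrs, eqs, clash

# Trying to unify a structure [NUMBER sg] with a structure [NUMBER pl] fails:
attrs = {("f1", "number", "sg"), ("f2", "number", "pl")}
print(closure(attrs, {("f1", "f2")})[2])                 # True: clashing constants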
Let us assume that we want to analyze the fragment will go using the rule

(13) VP → Aux V

that is, an auxiliary and a following verb constitute a verb phrase. The corresponding (simplified) feature structures of the words are

(14) will: [TENSE fut]
     go: [PRED 'go']

and we want the rule to unify them. (Note that the feature structure of the auxiliary has no PRED attribute, for it is synsemantic; the main (autosemantic) word carries the semantic information of the phrase.) The following axiom will effect what we want:

(15) VP(x, z, f1) ∧ eq(f1, f2) → Aux(x, y, f1) ∧ V(y, z, f2)

that is, we unify f1 and f2, which yields the expected feature structure

(16) [PRED 'go', TENSE fut]

When backchaining on axiom (15), we assume the antecedent, that is, that Aux and V constitute a VP and their feature structures can be unified. We could also use attrIn (set membership) instead of attr to express multiple dependencies (adjuncts) with the same attribute, such as

N(x, z, f2) ∧ attrIn(f2, adj, f1) → A(x, y, f1) ∧ N(y, z, f2)

which for good old times yields the feature structure

(17) [PRED 'time', NUMBER pl, ADJ {["good"], ["old"]}]

It is not hard to see that the eq-axioms represent the unification of feature structures used in unification grammars. (The axioms define transitive closures in the sense of [17].) Completeness and coherence [4] can be encoded using a special predication, say, hasAttr:

(18) ∀f, x, y. attr(f, x, y) → hasAttr(f, x)

The lexical rules can then specify subcategorization frames by asserting hasAttr(f, gf) or ¬hasAttr(f, gf), where f is a feature structure and gf a subcategorizable grammatical function.

We could also add rules producing first-order logical forms (by augmenting a unification grammar as suggested by Homola [12]). In general, any annotation expressible by means of first-order literals can be added to the antecedents of the abductive rules.

Note that the abductive rules for parsing are effectively propositional (in all rules, all variables in the antecedent appear in the consequent), i.e., we can use grounding and the transformation described below in Section 4 to find all abductive proofs.

3 Goal-driven Planning

Solving a STRIPS-like planning problem [7] can be thought of as finding an abductive proof. Let us assume that we want to find a plan of length n given an initial state, a goal state, and a set of actions with preconditions and effects. The goal state is represented as G(n), that is, the goal predicate(s) are true at time n. The initial state is represented as I(0), i.e., it is an integrity constraint at time 0 that enforces an abductive explanation of the goal state unless there is no plan of length n. Actions are converted into abductive rules. The antecedent consists of the preconditions and an action literal and the consequent consists of the effects. For example,

(19) agentAt(x, t) ∧ boxAt(x, t) ∧ action(moveBox, x, y, t) → agentAt(y, t + 1) ∧ boxAt(y, t + 1)

means that if the agent and the box are at location x at time t, the box can be moved to location y. The consequent represents the effects at time t + 1. (We can add hard constraints such as action(moveBox, x, y, t) → location(x) ∧ location(y) ∧ x ≠ y to restrict the state space.) The frame axioms have the form

(20) agentAt(x, t) → agentAt(x, t + 1)

The rules are effectively propositional since all variables in the antecedent, except those in the action literal, appear in the consequent and the actions do not add any new variables. The same principles used for parsing apply:

1. Each literal in the proof must be observed or abduced.
2. Each observed or abduced literal must be explained by at most one rule.
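As an illustration, the following Python sketch (our own code with hypothetical location names, not the paper's implementation) grounds action rule (19), restricted by the hard constraint mentioned above, together with frame axiom (20) and its analogue for boxAt, into propositional abductive rules for a plan of length n.

LOCATIONS = ["l1", "l2", "l3"]            # hypothetical state space

def ground_rules(n):
    rules = []
    for t in range(n):
        for x in LOCATIONS:
            for y in LOCATIONS:
                if x == y:
                    continue              # hard constraint: source and target differ
                antecedent = [("agentAt", x, t), ("boxAt", x, t),
                              ("action", "moveBox", x, y, t)]
                consequent = [("agentAt", y, t + 1), ("boxAt", y, t + 1)]
                rules.append((antecedent, consequent))
            # frame axiom (20) and its analogue for boxAt (our addition)
            rules.append(([("agentAt", x, t)], [("agentAt", x, t + 1)]))
            rules.append(([("boxAt", x, t)], [("boxAt", x, t + 1)]))
    return rules

print(len(ground_rules(2)))               # 24 ground rules for n = 2 over 3 locations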
These two principles ensure that abductive reasoning finds all plans of length n. Automated planning formulated in the abductive framework has a desirable property: the sequence of actions need not be linear, that is, parallel actions can be generated if one so wishes. On the other hand, we can add constraints that exclude undesired configurations, such as

(21) ∀x, y, t. x ≠ y → ¬agentAt(x, t) ∨ ¬agentAt(y, t)

or rule out nonlinear plans by allowing only one action at any time t. We can even have independent background theories about the modelled world interoperating with the plan, but this goes beyond the scope of this paper.

4 Abduction as Propositional Satisfiability

Let us assume that we have a set of ground literals (observations, integrity constraints, and possibly hard constraints) and a set of propositionalized abductive rules. In this section we show how the rules can be converted into a SAT problem whose valuations encode all abductive proofs of the observations that bottom out in the integrity constraints.

1. We add the observations and constraints to the clause set (as unit clauses).
2. We number the rules and write the i-th rule as

   (22) a_{i,1} ∧ ... ∧ a_{i,n} → c_{i,1} ∧ ... ∧ c_{i,m}

   For every rule, we add

   (23) r_i → a_{i,1} ∧ ... ∧ a_{i,n} ∧ c_{i,1} ∧ ... ∧ c_{i,m}

   to the clause set (r_i is an auxiliary propositional variable representing the i-th rule). That is, if an abductive rule is applied, the literals in its consequent must be observed or abduced and the rule adds the literals in its antecedent to the proof.
3. For every literal l which is not an observation or a known fact we add

   (24) l → ⋁ {r : l appears in the antecedent of r}

   to the clause set. That is, l must be licensed by at least one rule.
4. For every literal l which is observed or abduced we add

   (25) l → ¬r_i ∨ ¬r_j

   to the clause set, where r_i and r_j (i ≠ j) range over all pairs of rules that have l in the consequent. That is, l must be explained by at most one rule.

The clause set obtained in steps 1-4 can be evaluated for satisfiability by a propositional SAT solver (we use MiniSAT). It is not hard to see that there is a one-to-one correspondence between the satisfying valuations of the clause set and abductive proofs. The observations, constraints, applied rules (r_i), and abduced literals are true; all other literals are false. In the case of parsing (Section 2), the parse can be decoded from the positive rule literals. In the case of planning (Section 3), the plan can be decoded from the abduced action literals.

A simple example is given in the form of a graph in Figure 1. The observation is d and the abductive rules are

(26) a ∧ b → d
     b ∧ c → d

[Fig. 1. Abductive graph: the observation d, the rule nodes r1 and r2, and the abducible literals a, b, c.]

d appears in the proof because it is an observation. r1 and r2 exclude each other because they share a propositional variable in their consequents (i.e., ¬r1 ∨ ¬r2). If r1 is applied, a, b, and d must appear in the proof. Likewise, if r2 is applied, b, c, and d must appear in the proof. If a appears in the proof, r1 must be true because it is the only rule with a in the antecedent. If b appears in the proof, r1 or r2 must be true, and so forth. The clause set can be thought of as a set of constraints that ensure that all its valuations conform to the rules of abductive reasoning. The subgraph obtained from the abductive graph by removing all false propositional variables (according to a valuation) represents an abductive proof that explains the observations and satisfies all constraints.
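To make the correspondence concrete, the following self-contained Python sketch encodes the example of Figure 1 by steps 1-4 and enumerates the satisfying valuations by brute force; the brute-force check merely stands in for MiniSAT for the purpose of illustration.

from itertools import product

# Variables are numbered 1..6; positive/negative integers denote positive/
# negated literals (the usual DIMACS convention).
A, B, C, D, R1, R2 = 1, 2, 3, 4, 5, 6
NAMES = {A: "a", B: "b", C: "c", D: "d", R1: "r1", R2: "r2"}

clauses = [
    [D],                            # step 1: observation d
    [-R1, A], [-R1, B], [-R1, D],   # step 2: r1 -> a, b, d
    [-R2, B], [-R2, C], [-R2, D],   # step 2: r2 -> b, c, d
    [-A, R1],                       # step 3: a must be licensed by r1
    [-B, R1, R2],                   # step 3: b must be licensed by r1 or r2
    [-C, R2],                       # step 3: c must be licensed by r2
    [-D, -R1, -R2],                 # step 4: d explained by at most one rule
]

def models(clauses, nvars):
    for bits in product([False, True], repeat=nvars):
        val = {i + 1: b for i, b in enumerate(bits)}
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            yield [NAMES[v] for v in sorted(val) if val[v]]

for m in models(clauses, 6):
    print(m)
# ['d']                   -- d observed but not explained
# ['b', 'c', 'd', 'r2']   -- proof using the second rule
# ['a', 'b', 'd', 'r1']   -- proof using the first rule

The first valuation printed, in which nothing is explained, is exactly the kind of proof that an integrity constraint such as S(0, e) in the parsing setting rules out.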
In the case of parsing, our algorithm seems to be faster in practice than chart-based algorithms (whose theoretical complexity is O(n^3)). However, the algorithm becomes much more interesting if we use it for parsing with unification grammars, which is known to be NP-hard [2] (although natural languages may well be parsable in subexponential time [6]). Note that the parser is neither top-down nor bottom-up since the SAT solver alone decides the order of the steps carried out to find a valuation. In the case of automated planning, the abductive approach does not give us a faster algorithm (it is very similar to SATPLAN [14]), but since the process of finding abductive proofs can be visualized by means of rule graphs, it reveals that SATPLAN and GRAPHPLAN [3] are merely two manifestations of the same principle.

5 Weighted Abduction

We are using SAT-based abduction in a project that includes natural language understanding. A unification grammar is used for parsing. Subsequently, Hobbs et al.'s [11] weighted abduction is used to interpret the logical form(s) in the discourse context at hand. In weighted abduction, rules of the following form are used:

(27) p1^ω1 ∧ ... ∧ pn^ωn → q

When backchaining on a consequent whose weight is ω, the weight of the (assumed) i-th literal in the antecedent is

(28) ω_i ω

If Σ_{i=1}^{n} ω_i is less than 1, proofs that explain the consequent by assuming the literals that make it true are preferred. If Σ_{i=1}^{n} ω_i is greater than 1, it is cheaper to assume the consequent. The weights used in the rules depend on the modelled problem. In the domain of language understanding, for example, in a rule such as

(29) elephant(x)^ω1 → mammal(x)

we want ω1 > 1, for the class of individuals described by the antecedent is more specific than that described by the consequent.

Weighted abduction requires finding all abductive proofs and assigning a weight to each of them. Since the SAT solver returns one valuation at a time, we can iteratively add its negation to the clause set until the set becomes unsatisfiable. In other words, if v1, ..., vn is a valuation, we add the clause

(30) ¬(v1 ∧ ... ∧ vn) ≡ ¬v1 ∨ ... ∨ ¬vn

to the clause set. A valuation obtained from the SAT solver can be used to assign a weight to the corresponding proof using a recursive algorithm. At the beginning, the observations have non-zero weights and the known facts have zero weights. An abduced literal without a weight must appear in at least one rule such that r_i is true. If there are literals in the consequent of the rule that do not have a weight, we recurse and compute those weights first. Otherwise, we compute the weight using the corresponding weight factor in the antecedent of the rule. If the literal appears in the antecedent of more than one applied rule, we use the lowest weight.

As for reasoning, we use a subset of the theory described in [10]. It seems that disjunction occurs only in the reified logical connectives (and', or', etc.) that operate on eventualities for which Rexist or ¬Rexist holds. Thus we use stable model semantics [8] to expand all disjunctions (most disjunctions are ruled out in the process of interpretation) and evaluate queries to the knowledge base(s) using Chen and Warren's [5] SLG resolution, which seems to be very efficient in the domain of commonsense reasoning. (Alternatively, Loveland's [16] resolution for "slightly non-Horn (near Horn) clause sets" could be used.)

Weighted abduction is considered intractable because in order to find the best proof we have to evaluate all possible proofs [1].
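As an illustration of the recursive weight assignment just described, here is a minimal Python sketch. The code and rule format are our own; in particular, summing the weights of multiple consequent literals is our assumption, since formula (28) only covers the single-consequent case.

def proof_weights(applied_rules, observations, facts):
    # applied_rules: rules whose r_i is true in the valuation, each given as
    # (antecedent_weight_factors, antecedent_literals, consequent_literals)
    weights = dict(observations)              # observations carry non-zero weights
    weights.update({f: 0.0 for f in facts})   # known facts have zero weights

    def weight(lit, seen=()):
        if lit in weights:
            return weights[lit]
        best = None
        for factors, antecedent, consequent in applied_rules:
            if lit not in antecedent or lit in seen:
                continue
            # recurse into the consequent first; summing over its literals is
            # our assumption, matching (28) when the consequent is a single literal
            base = sum(weight(c, seen + (lit,)) for c in consequent)
            w = factors[antecedent.index(lit)] * base
            best = w if best is None else min(best, w)   # lowest over applied rules
        weights[lit] = best
        return best

    for _, antecedent, _ in applied_rules:
        for lit in antecedent:
            weight(lit)
    return weights

# Observation q with weight 10 explained by the applied rule p1^0.6 /\ p2^0.6 -> q:
rule = ([0.6, 0.6], ["p1", "p2"], ["q"])
print(proof_weights([rule], {"q": 10.0}, set()))
# {'q': 10.0, 'p1': 6.0, 'p2': 6.0}

In this example the assumed antecedent costs 12 while assuming the consequent costs 10, mirroring the case Σ ω_i > 1 discussed above.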
Possible ways to address this intractability are to extend the algorithm for use with a MaxSAT solver or to design a special SAT solver optimized for abductive tasks. Either way, there is a broad field for experiments and further research.

6 On Grounding

In the abductive framework, ontologies are typically expressed in first-order logic. In order to be able to use a SAT solver to find abductive proofs, we have to propositionalize (ground) the theory. This entails a few problems. In order to find all valid proofs, the clause set (or the abductive graph) must contain all instantiations of the abductive rules that may occur in a proof. In the case of parsing or planning, all variables in the antecedent appear in the consequent (with the exception of the action predicates), which makes the grounding straightforward (in a top-down way starting with the observations). Problems arise if there is an existentially bound variable in the antecedent, as in the rules

(31) a. ∀x. [∃y. infected(y, malaria) ∧ bloodTransfused(y, x)] → infected(x, malaria)
     b. ∀x. [∃y. mosquito(y) ∧ infected(y, malaria) ∧ bite(y, x)] → infected(x, malaria)

that is, if a person is infected with malaria, he may have undergone a blood transfusion from an infected donor or he may have been bitten by an infected mosquito (in short, blood transfusions and mosquito bites transmit malaria). The problem is that when backchaining on axioms (31), the existence of y is either assumed or y is unified with a person or mosquito in the knowledge base. In the example at hand, the donor should be unified with an individual in the knowledge base (medical evidence) whereas in the case of a mosquito bite, the existence of an infected mosquito will virtually always be assumed. In general, we cannot exclude either possibility. Therefore, if there is a rule of the form

(32) ∀x. [∃y. P(y)] → Q(x)

we add to the clause set all rule instantiations in which the individual substituted for y is compatible with P, and a special instantiation in which we substitute a new individual (a Skolem constant) for y. This solution to the grounding problem is particularly useful in the case of weighted abduction, for we can use weights to prefer one of the propositionalizations. In discourse interpretation, for instance, definite noun phrases (such as the man) should be unified with an individual in the knowledge base while indefinite noun phrases (such as a man) are much more likely to introduce a new individual.

Even though every theory with the bounded-term-size property can be propositionalized, the number of rules can grow exponentially. It is therefore important to use a fast grounding algorithm (in the typical case, grounding takes more time than SAT solving). We use a modified RETE algorithm to find all relevant rule instantiations.

7 Conclusions

We showed how context-free parsing and automated planning can be formulated in an abductive framework and how abductive proofs can be efficiently found by converting ontologies into a propositional clause set whose valuations correspond to abductive proofs. Since modern SAT solvers (such as MiniSAT) are very fast, the procedure presented in this paper can be used as an efficient abductive prover. Moreover, integrity constraints and hard (deductive) rules can be used to rule out proofs that are undesirable in the modelled domain.
Since abduction is widely used in many areas of artificial intelligence and computer science, such as discourse interpretation [11], plan ascription [1], diagnostic expert systems, etc., but is NP-hard, it is important to have a fast procedure for finding abductive proofs. We offer one by capitalizing on the enormous progress achieved in the field of SAT solving in the last decade. The most important open problem is to find a fast grounding algorithm for theories based on weighted abduction, as they may contain deductive rules, coercion rules [9], etc. Furthermore, we would like to test the performance of our algorithm with an ontology consisting of (tens of) thousands of facts and axioms.

Bibliography

[1] Appelt, D.E., Pollack, M.E.: Weighted Abduction for Plan Ascription. User Modeling and User-Adapted Interaction 2(1-2), 1-25 (1992)
[2] Berwick, R.: Computational Complexity and Lexical Functional Grammar. American Journal of Computational Linguistics 8, 20-23 (1982)
[3] Blum, A.L., Furst, M.L.: Fast Planning Through Planning Graph Analysis. Artificial Intelligence 90, 281-300 (1997)
[4] Bresnan, J.: Lexical-Functional Syntax. Blackwell Textbooks in Linguistics, New York (2001)
[5] Chen, W., Warren, D.S.: Tabled Evaluation with Delaying for General Logic Programs. Journal of the ACM 43(1), 20-74 (1996)
[6] Dalrymple, M., Kaplan, R.M., Maxwell III, J.T., Zaenen, A.: Formal Issues in Lexical-Functional Grammar. CSLI Publications (1995)
[7] Fikes, R., Nilsson, N.J.: STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence 2, 189-208 (1971)
[8] Gelfond, M., Lifschitz, V.: The Stable Semantics for Logic Programs. In: Kowalski, R., Bowen, K. (eds.) Proceedings of the 5th International Symposium on Logic Programming, pp. 1070-1080 (1988)
[9] Hobbs, J.R.: Syntax and Metonymy. In: Bouillon, P., Busa, F. (eds.) The Language of Word Meaning, pp. 290-311. Cambridge University Press (2001)
[10] Hobbs, J.R.: Encoding Commonsense Knowledge (2005). Ms., Information Sciences Institute, University of Southern California, Marina del Rey
[11] Hobbs, J.R., Stickel, M., Appelt, D., Martin, P.: Interpretation as Abduction. Artificial Intelligence 63, 69-142 (1993)
[12] Homola, P.: Neo-Davidsonian Semantics in Lexicalized Grammars. In: Proceedings of the International Conference on Parsing Technologies, pp. 134-140 (2013)
[13] Kaplan, R.M., Bresnan, J.: Lexical-Functional Grammar: A Formal System for Grammatical Representation. In: Bresnan, J. (ed.) Mental Representation of Grammatical Relations. MIT Press, Cambridge (1982)
[14] Kautz, H., Selman, B.: Planning as Satisfiability. In: Proceedings of ECAI-92, pp. 359-363 (1992)
[15] Kowalski, R.: Logic for Problem Solving. Oxford University Press (1979)
[16] Loveland, D.W.: Automated Theorem Proving: Mapping Logic into AI. In: Proceedings of the International Symposium on Methodologies for Intelligent Systems, pp. 214-229 (1986)
[17] Wedekind, J.: Classical Logics for Attribute-Value Languages. In: Proceedings of the 5th EACL Conference, pp. 204-209 (1991)