Natural Language Understanding as First-Order Abduction via Stable Models

Petr Homola
Codesign Ltd.
phomola@codesign.cz

Abstract

We present a method for finding abductive proofs in first-order Horn theories by using an answer-set solver. We illustrate our solution with examples from the domain of natural language understanding. Furthermore, we describe a novel way of ranking abductive proofs of logical forms of sentences.

1 Introduction

From a pragmatic perspective, natural language understanding (NLU) can be thought of as first-order abduction [Hobbs et al., 1993; Ovchinnikova et al., 2014]. The sentence we want to interpret is what is “observed” and the “best” abductive proof tells us what the sentence actually means. Abductive reasoning is ampliative, that is, what we conclude cannot be proved deductively, but it extends our background knowledge in a coherent way.

Abduction is intractable (cf. [Appelt and Pollack, 1992], “even in the case of propositional Horn clause theories, the problem of computing an explanation for an arbitrary proposition is NP-hard”). Propositional abduction is known to be implementable as stable model enumeration [Gelfond and Kahl, 2014], but in the case of NLU the underlying theory is first-order. In early experiments with interpretation as (weighted, i.e., cost-based) abduction, a Prolog-based abductive theorem prover was used [Hobbs et al., 1993]. Later, a more efficient algorithm based on integer linear programming was developed [Inoue et al., 2012; Ovchinnikova et al., 2014]. But neither approach allows for seamless integration of abduction with full-fledged deduction. In this contribution we present a method for comparatively efficient first-order abduction based on answer-set solving. We use the solver described in [Gebser et al., 2012].

In Section 2 we give an overview of the problem, illustrated with a simple example. Section 3 describes the translation of a first-order abductive problem into an answer-set problem. Section 4 briefly describes possible integrity constraints that significantly constrain the proof search space. Section 5 presents a method for selecting the best proof in the domain of NLU. Section 6 concludes.

2 Preliminaries

Abduction is the reasoning process of finding explanations for observations. In NLU, the “observations” are (logical forms of) sentences and the interpretation of a sentence is the “best” (i.e., most coherent) abductive proof that explains it with respect to a background theory. Formally, I is an interpretation of observations O with respect to a background theory T if[1]

(1) T ∪ I ⊨ O ∧ T ∪ I ⊭ ⊥

that is, T ∪ I entails O and is consistent. The sentence “John is an elephant” may mean that there is an actual elephant whose name is John, i.e., I can be the literal meaning of the sentence. But if we know (from context) that John is a person, T ∪ I will be inconsistent, hence I cannot be the literal meaning of the sentence and we have to find some other (nonliteral) interpretation that makes sense in the given context.

[1] T is a set of first-order formulae and I and O are sets of positive literals.

We use the logical representation proposed by Hobbs (1985), which is a “conjunctivist” scope-free first-order approach to linguistic meaning. Consider the sentence

(2) John sees Mary.

Irrelevant details aside, its logical representation is[2]

(3) (∃e, x, y) see′(e, x, y) ∧ John(x) ∧ Mary(y)

that is, there is an eventuality e which is a seeing, John does it and Mary undergoes it. The logical representation of

(4) John doesn’t see Mary.

would be

(5) (∃e1, e2, x, y) not′(e1, e2) ∧ see′(e2, x, y) ∧ John(x) ∧ Mary(y)

that is, the negation of the eventuality expressed in (3) is asserted. Hobbs (2005) showed that an appropriately rich and precise theory of commonsense reasoning can be expressed in this way.

[2] The relation between the primed and unprimed predicates is given by the following axiom schema (see [Hobbs, 1985]): P(x) ≡ (∃e) P′(e, x) ∧ Rexist(e)
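In the propositional Horn case, condition (1) can be checked directly by forward chaining. The following Python sketch is our illustration only (the predicate names and the designated inconsistency atom "bot" are invented, and the paper's actual setting is first-order): it tests whether a candidate interpretation I explains observations O with respect to a Horn theory T.

```python
def closure(rules, facts):
    """Forward-chain propositional Horn rules (body, head) to a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and set(body) <= derived:
                derived.add(head)
                changed = True
    return derived

def is_interpretation(T, I, O, bottom="bot"):
    """Condition (1): T ∪ I entails O and T ∪ I is consistent."""
    derived = closure(T, I)
    return set(O) <= derived and bottom not in derived

# Toy propositionalization of the "John is an elephant" discussion:
T = [(("elephant_J",), "animal_J"),       # elephants are animals
     (("person_J", "animal_J"), "bot")]   # persons are not animals
O = ["elephant_J"]
print(is_interpretation(T, ["elephant_J"], O))                        # True
print(is_interpretation(T + [((), "person_J")], ["elephant_J"], O))   # False
```

The second call mirrors the discussion above: once the contextual fact person_J is in the theory, the literal reading makes T ∪ I inconsistent and is rejected, forcing a nonliteral interpretation.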
The intended real-world meaning of the logical forms (3) and (5) is their literal meaning. But the logical representation can contain linguistic predications that need to be interpreted in order to be given a real-world meaning. The noun phrase (NP)

(6) the Tbilisi office

is parsed as

(7) (∃x, y) Tbilisi(x) ∧ office(y) ∧ nn(x, y)

that is, there are two entities and an nn (i.e., N-N compound) relation between them, but the predicate nn only tells us that x and y are adjacent NPs and x precedes y. If we know from context that there is an office in Tbilisi, we want to interpret (7) as in(office, Tbilisi). If we have no such information, but our background knowledge contains the facts that Tbilisi is a city and that cities and offices are locations, we can still draw the (defeasible) conclusion that there is an in relation between the entities.

Whichever the case, though, we need axiom (8). We use ⇀ to denote (possibly defeasible) implication in abductive rules and ⊃ to signify implication in “hard” rules, i.e., P(x) ⇀ Q(x) ≡ P(x) ∧ etc(x) ⊃ Q(x), where etc is a literal specific to the rule.

(8) (∀x, y) location(x) ∧ location(y) ∧ in(x, y) ⇀ nn(y, x)

that is, the fact that x is located in y can be expressed by a compound nominal at the lexical level. Of course, axiom (8) is only defeasible, for an in relation can also be expressed by other linguistic constructions. Thus, to interpret (7) we have to backchain on axiom (8) and unify both variables. The proof of (7) is depicted in Figure 1. As can be seen, the abduction process is first-order even for simple examples.

[Figure 1: Proof of “the Tbilisi office”: Tbilisi(x) ∧ office(y) ∧ nn(x, y) is explained by assuming location(x), location(y), and in(y, x).]

3 Translation of deductive and abductive rules

To get the logical form of a sentence we first have to parse it. Since this paper focuses on interpretation, we will only sketch very briefly how the parser is implemented. We have an LFG grammar [Kaplan and Bresnan, 1982; Dalrymple et al., 1995a; Dalrymple, 2001; Bresnan, 2001] whose rules are augmented with annotations that incrementally build up the logical form. Consider the sentence

(9) A boy built a boat.

whose parse tree and variable bindings are given in Figure 2. The lexicon contains a logical expression for every preterminal. For example, the morpholexical entry for “boy” provides boy(x1), the entry for “built” provides build′(e, x2, y2) ∧ Past(e), etc. The syntactic rules unify the variables provided by the nodes they operate on. The rule S → NP VP, for example, will unify x1 and x2, thus expressing the fact that the boy is the agent of e, which is a building event. This way of semantic parsing is easy to implement if one already has a rule-based grammar (lexicon and rules).

[Figure 2: Parse tree and variable bindings for “A boy built a boat” (S → NP VP, VP → V NP), with the logical form (∃e, x1, y1, x2, y2) boy(x1) ∧ build′(e, x2, y2) ∧ Past(e) ∧ boat(y1).]

Alternatively, we could use so-called “glue semantics” [Dalrymple et al., 1993; 1995b], though this would be less straightforward, since glue semantics produces higher-order formulae, thus we would have to “flatten” them and introduce eventualities into the predicates. Yet another possibility is to implement the complete grammar in the abductive framework itself, as suggested by Hobbs (2003), which has the advantage that parsing and interpretation can be seamlessly integrated and carried out in one step. However, if a wide-coverage rule-based grammar is available, it is probably better to use it as a “preprocessor”.

Deduction poses no problem to the solver. We can have rules such as[3]

(10) elephant1(x) ⊃ mammal1(x)
     mammal1(x) ⊃ animal1(x)

and we can use strong negation and disjunction, as in

(11) person1(x) ⊃ ¬animal1(x)
     person1(x) ⊃ man1(x) ∨ woman1(x)

[3] In the remainder of the paper, we use subscripted predicates to represent real-world meaning and unsubscripted predicates to represent lexical meaning. This distinction is necessary in order to accommodate lexical ambiguity and figurative speech such as metaphors and metonymy. For example, the literal real-world meaning of “John is an elephant” (whose logical form is John(x) ∧ elephant(x)) is John1(x) ∧ elephant1(x), but if we already know that John is a person (person1(John)), we are forced into a figurative meaning such as John1(x) ∧ clumsy1(x). Thus in this sense, to interpret a sentence is to steer clear of contradictions in the knowledge base.

The main result reported in this paper is a method for converting abduction with observations that contain variables into an answer-set problem. We represent observations and assumptions as follows:

(12) observations: elephant(x)  ⇒ pred(elephant, obsrv, varx)
     assumptions:  elephant1(x) ⇒ pred(elephant1, asmpt, varx)

that is, variables are encoded as individuals. An abductive rule is encoded as follows:[4]

(13) elephant1(x) ⇀ elephant(x) ⇒
       pred(elephant, x, y) ∧ x ∈ {obsrv, asmpt} ⊃10
         pred(elephant1, asmpt, y) ∧
         explainedBy(elephant, y, rule1(y)) ∧
         assumedBy(elephant1, y, rule1(y))

[4] p ⊃10 q means 0{q}1 :- p, that is, the consequent may or may not be included in the stable model.

that is, if there is a predication we want to explain that can be unified with the consequent of an abductive rule, we may assume the antecedent (but we may ignore the rule, because the cost of assuming the antecedent may be higher than the cost of assuming the consequent, thus we have to consider both cases). The auxiliary predications are used in the following rules, which help us guarantee that the computed stable model is a correct abductive proof (in the sense of [Hobbs et al., 1993]):[5]

(14) assumedBy(p, x, r) ⊃ assumed(p, x)
     pred(p, asmpt, x) ∧ ∼assumed(p, x) ⊃ ⊥
     explainedBy(p, x, r1) ∧ explainedBy(p, x, r2) ∧ r1 ≠ r2 ⊃ ⊥

[5] ∼ denotes default negation.

that is, a predication can be assumed only if it can be derived by backchaining on an abductive rule, and a predication can be explained by no more than one rule.

The most important aspect is how variables in observations are handled. Since an answer-set program (ASP) has to be effectively propositional, we have to “emulate” equality. If a predication containing a “reified” variable (e.g., varx) can be unified with another predication, we can bind the variable. For example, if the knowledge base contains person1(John) and we observe or assume person1(varx), we may add eq(John, varx) to the stable model. We may also decide not to bind a variable, in which case a new individual has to be added to the knowledge base. Of course, we need axioms that guarantee that eq is an equivalence relation. We will not list all of them here, but let us mention the most important axiom schema. If P is a predicate, we need P(x) ∧ eq(x, y) ⊃ P(y) in order for deduction to work. Thus whenever we bind a variable, the knowledge base grows (in the worst case exponentially). There is no way around this problem since answer-set solving is propositional. Luckily for us, the logical form of a sentence contains only a few variables (around a dozen), hence the presented method is viable for NLU. In actual fact, we are sacrificing space for time, since we need new literals for each variable assignment.

The rules described above correctly define all and only the stable models that correspond to an abductive proof in our NLU framework. We use defeasible rules to infer what might be true (thus extending our knowledge) based on what we already know. We process one sentence at a time in order to keep the proof search space as small as possible. Of course, in a connected discourse this may lead to a contradiction. Generally we try to assume as little as possible (see Section 5), but if we arrive at a contradiction, we have to backtrack and reprocess part of the discourse. We plan on using a truth maintenance system in the future, but for the time being, we simply reinterpret the sentences which might be affected by the false assumption.

We will now illustrate with an example how the solver finds proofs. Consider the following observation, abductive rules, and known facts (as usual, x is a variable and a, b are constants):

(15) observation: q(x)
     rule:        p(x) ⇀ q(x)
     rule:        r(x) ⇀ q(x)
     facts:       p(a), r(b)

The complete proof graph for (15) is given in Figure 3. The labelled edges are possible “merges” (variable bindings by unification) and the unlabelled edges are licensed by backchaining on abductive rules.

[Figure 3: Complete proof graph for (15): q(x) backchains to p(x) and r(x), which can be merged (=) with the facts p(a) and r(b), respectively.]

A proof is equivalent to a subgraph of the complete proof graph complying with the following conditions:

1. An observation or assumption is explained by no more than one rule.[6]

2. There is a path from any assumption to an observation (i.e., we eliminate assumptions that do not contribute to the explanation of an observation).

3. Variable assignments conform to the usual constraints on equivalence.

[6] This condition does not mean that a literal cannot be implied by more rules; it only says that only one (defeasible) rule is taken to be its explanation.

All the proofs of (15) are depicted in Figure 4.

[Figure 4: Proofs of (15): five admissible subgraphs of the complete proof graph.]
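The proof space of (15) is small enough to enumerate with a brute-force sketch. The following Python snippet is our illustration only (the paper's actual mechanism is the ASP translation (12)–(14)); the two dictionaries hard-code example (15), and the generator reproduces the five hypotheses listed in (16).

```python
# Example (15): observation q(x); rules p(x) ⇀ q(x), r(x) ⇀ q(x); facts p(a), r(b).
RULES = {"q": ["p", "r"]}          # consequent -> antecedents of abductive rules
FACTS = {"p": ["a"], "r": ["b"]}   # predicate -> individuals known to satisfy it

def proofs(goal, var="x"):
    """Yield hypothesis sets: assume the observation outright, backchain on
    exactly one rule (condition 1) and assume its antecedent, or additionally
    bind the variable to a known individual (a labelled "merge" edge)."""
    yield {f"{goal}({var})"}                           # q(x) assumed as is
    for ante in RULES.get(goal, []):
        yield {f"{ante}({var})", f"{goal}({var})"}     # antecedent assumed
        for ind in FACTS.get(ante, []):
            yield {f"{goal}({ind})"}                   # variable bound: q(a), q(b)

hyps = list(proofs("q"))
print(len(hyps))  # 5 proofs
```

Each yielded set corresponds to one subgraph of the complete proof graph: assuming q(x) outright, backchaining on one of the two rules, or additionally unifying x with one of the known individuals.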
The corresponding hypotheses (including the observation) are:[7]

(16) q(x)
     p(x) ∧ q(x)
     r(x) ∧ q(x)
     q(a)
     q(b)

[7] In the domain of NLU, the logical form of a sentence is an existential closure, so we would have to introduce a new individual for every free variable.

4 Constraining the proof search space

In modern answer-set solvers, aggregate functions such as count, sum, or max can be used. We can thus define a predicate whose argument tells us the size of (that is, the number of assumptions in) a proof and rule out any proof that is too big:

(17) numberOfAssumptions(x) = |{p : assumed(p)}|
     numberOfAssumptions(x) ∧ x > maxNumberOfAssumptions ⊃ ⊥

We can also have a predicate that tells us the length of the proof path from p1 to p2 and an integrity constraint that rules out any proof whose length is greater than an integer constant, maxProofLength:

(18) proofLength(p1, p2, l) ∧ l > maxProofLength ⊃ ⊥

These two simple integrity constraints can significantly constrain the proof search space, thus speeding up the enumeration of proofs.

5 Ranking abductive proofs

Even a relatively simple background theory, as in our experiments, will have at least hundreds of abductive and deductive rules, which means that the logical form of an average sentence can have many different proofs (since both deduction and abduction are explosive). Moreover, in a long discourse consisting of many sentences, the knowledge base will contain many individuals available for unification, which can multiply the number of proofs. The method of weighted abduction [Hobbs et al., 1993; Hobbs, 2001; 2003] seems to yield good results, but it cannot be used in an answer-set program because it works with real numbers. While it is possible to enumerate all the abductive proofs and evaluate them later, we decided to try to solve the ranking problem within ASP. In this section, we describe how to rank proofs using the answer-set solver.

The basic idea, coming from Hobbs et al. (1993), is that one should prefer proofs that unify assumed predications with what we already know and assume as little as possible. In other words, maximally coherent (with respect to the context) and minimally ampliative proofs are preferred. We add one more criterion: salience. Informally, individuals that occurred recently in the discourse have higher salience. If the pronoun “he” is used in a sentence, it is interpreted by backchaining on the following abductive rule:

(19) person1(x) ∧ male1(x) ⇀ he(x)

that is, “he” refers to a male person. To bind the variable, we have to find an individual which conforms to the selectional constraints. But there can be many such individuals in the knowledge base. Thus we rank the proofs with respect to their salience, which is the sum of the saliences of all individuals unified with a variable. Newly introduced constants are assigned the highest salience, as in the case of c for the indefinite NP in

(20) He bought a book.
     he(x) ∧ buy′(e, x, c) ∧ book(c)

If there is elephant(x) among the observations and the knowledge base contains elephant1(E1^1), elephant1(E2^2), elephant1(E3^3),[8] there are three literal interpretations in which the variable x is unified, as illustrated in Figure 5. Based on a linguistic insight, we want to prefer the proof which unifies the variable x with the most salient individual.

[8] The upper index expresses the salience of the individual.

[Figure 5: Literal interpretations of elephant(x): the variable x can be unified (=) with elephant1(E1^1), elephant1(E2^2), or elephant1(E3^3).]

Unification can help us resolve lexical ambiguity even when there are no individuals in the knowledge base which could be unified with the variables. Consider the compound nominal

(21) bank account
     bank(x) ∧ account(y) ∧ nn(x, y)

and assume that we have the following background theory (with five lexical rules and one commonsense rule):[9]

(22) bank1(x) ⇀ bank(x)
     bank2(x) ⇀ bank(x)
     account1(x) ⇀ account(x)
     account2(x) ⇀ account(x)
     appurt1(x, y) ⇀ nn(y, x)
     account1(x) ∧ bank1(y) ⇀ appurt1(x, y)

[9] bankx and accountx are different lexical meanings of “bank” and “account”, respectively.

that is, accounts (defeasibly) appertain to banks, and the relation appurt(enance) can be expressed by a compound nominal (nn). We see in Figure 6 that two predications are unified. If we interpreted bank(x) as bank2(x) and/or account(y) as account2(y), there would be no unification of predications and hence the proof would be less coherent.

[Figure 6: Interpretation of “bank account”: bank(x) ∧ account(y) ∧ nn(x, y) is explained via appurt1(y, x), unifying bank(x) with bank1(x) and account(y) with account1(y).]

As we have just shown, abduction can be used to lexically disambiguate phrases even if there is no additional context or previous discourse. Of course, in order for this method to work, background theories are needed that capture relations between entities that occur in the sentence. Creating commonsense knowledge bases is generally a very complex task, but it is feasible at least for smaller closed domains.

There can be many proofs with the same number of unified predications, thus we need a criterion that will help us distinguish them. The simplest criterion is the size of the proof, i.e., the number of assumptions made. Intuitively, we do not want to assume more than is necessary; if an assumption does not help us arrive at a more coherent proof (that is, a proof with more unifications), it should be omitted. This simple idea seems to be good enough to rule out most undesired proofs. We can use the predicate numberOfAssumptions defined in Section 4 and an analogously defined predicate numberOfUnifications to select the best proof. Our method for ranking abductive proofs yields slightly better results than that proposed by Hobbs et al. (1993) in our evaluation, but this does not mean that it is better, because the difference is not statistically significant and because Hobbs’ method relies on probabilistic weights which are hard to “get right” empirically. Nevertheless, our method is relatively simple and can be implemented in ASP (for it uses only natural numbers).

6 Conclusions

We have presented a method for translating (a fragment of) first-order abduction into an answer-set program in the context of NLU. The heretofore used algorithms do not allow for seamless integration of the process of abduction with deduction. Our method helps the solver confine the search space by ruling out logically impossible proofs (with respect to a background theory). We have also suggested how to rank proofs within the answer-set program, since the original framework of weighted abduction would require an additional step to evaluate the proofs. An evaluation has shown that in conjunction with a state-of-the-art answer-set solver, our method is an order of magnitude faster than the approach based on a general automated theorem prover.

There is no doubt that first-order Horn abduction is useful in many areas of artificial intelligence. Our future work will investigate how the proposed method for ranking proofs can be applied to automated goal-driven planning with incomplete information.

References

[Appelt and Pollack, 1992] Douglas E. Appelt and Martha E. Pollack. Weighted Abduction for Plan Ascription. User Modeling and User-Adapted Interaction, 2(1–2):1–25, 1992.

[Bresnan, 2001] Joan Bresnan. Lexical-Functional Syntax. Blackwell Textbooks in Linguistics, New York, 2001.

[Dalrymple et al., 1993] Mary Dalrymple, John Lamping, and Vijat Saraswat. LFG Semantics via Constraints. In Proceedings of the Sixth Meeting of the European ACL, 1993.

[Dalrymple et al., 1995a] Mary Dalrymple, Ronald M. Kaplan, John T. Maxwell III, and Annie Zaenen. Formal Issues in Lexical-Functional Grammar. CSLI Publications, 1995.

[Dalrymple et al., 1995b] Mary Dalrymple, John Lamping, Fernando Pereira, and Vijat Saraswat. Linear Logic for Meaning Assembly. In Proceedings of Computational Logic for Natural Language Processing, 1995.

[Dalrymple, 2001] Mary Dalrymple. Lexical Functional Grammar, volume 34 of Syntax and Semantics. Academic Press, 2001.

[Gebser et al., 2012] Martin Gebser, Benjamin Kaufmann, and Torsten Schaub. Conflict-Driven Answer Set Solving: From Theory to Practice. Artificial Intelligence, 187–188:52–89, 2012.

[Gelfond and Kahl, 2014] Michael Gelfond and Yulia Kahl. Knowledge Representation, Reasoning, and the Design of Intelligent Agents: The Answer-Set Programming Approach. Cambridge University Press, 2014.

[Hobbs et al., 1993] Jerry R. Hobbs, Mark Stickel, Douglas Appelt, and Paul Martin. Interpretation as Abduction. Artificial Intelligence, 63:69–142, 1993.

[Hobbs, 1985] Jerry R. Hobbs. Ontological Promiscuity. In Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, pages 61–69, 1985.

[Hobbs, 2001] Jerry R. Hobbs. Syntax and Metonymy. In P. Bouillon and F. Busa, editors, The Language of Word Meaning, pages 290–311. Cambridge University Press, 2001.

[Hobbs, 2003] Jerry R. Hobbs. Discourse and Inference. MS, Information Sciences Institute, University of Southern California, Marina del Rey, 2003.

[Hobbs, 2005] Jerry R. Hobbs. Encoding Commonsense Knowledge. MS, Information Sciences Institute, University of Southern California, Marina del Rey, 2005.

[Inoue et al., 2012] Naoya Inoue, Ekaterina Ovchinnikova, Kentaro Inui, and Jerry Hobbs. Coreference Resolution with ILP-based Weighted Abduction. In Proceedings of COLING 2012, pages 1291–1308, 2012.

[Kaplan and Bresnan, 1982] Ronald M. Kaplan and Joan Bresnan. Lexical-Functional Grammar: A Formal System for Grammatical Representation. In Joan Bresnan, editor, Mental Representation of Grammatical Relations. MIT Press, Cambridge, 1982.

[Ovchinnikova et al., 2014] Ekaterina Ovchinnikova, Ross Israel, Suzanne Wertheim, Vladimir Zaytsev, Niloofar Montazeri, and Jerry Hobbs. Abductive Inference for Interpretation of Metaphors. In Proceedings of the Second Workshop on Metaphor in NLP, pages 33–41, 2014.