Natural Language Understanding as First-Order Abduction via Stable Models

Petr Homola
Codesign Ltd.
phomola@codesign.cz

Abstract

We present a method for finding abductive proofs in first-order Horn theories by using an answer-set solver. We illustrate our solution with examples from the domain of natural language understanding. Furthermore, we describe a novel way of ranking abductive proofs of logical forms of sentences.

1 Introduction

From a pragmatic perspective, natural language understanding (NLU) can be thought of as first-order abduction [Hobbs et al., 1993; Ovchinnikova et al., 2014]. The sentence we want to interpret is what is “observed” and the “best” abductive proof tells us what the sentence actually means. Abductive reasoning is ampliative, that is, what we conclude cannot be proved deductively, but it extends our background knowledge in a coherent way.

Abduction is intractable (cf. [Appelt and Pollack, 1992], “even in the case of propositional Horn clause theories, the problem of computing an explanation for an arbitrary proposition is NP-hard”). Propositional abduction is known to be implementable as stable model enumeration [Gelfond and Kahl, 2014], but in the case of NLU the underlying theory is first-order. In early experiments with interpretation as (weighted, i.e., cost-based) abduction, a Prolog-based abductive theorem prover was used [Hobbs et al., 1993]. Later, a more efficient algorithm based on integer linear programming was developed [Inoue et al., 2012; Ovchinnikova et al., 2014]. But neither approach allows for seamless integration of abduction with full-fledged deduction. In this contribution we present a method for comparatively efficient first-order abduction based on answer-set solving. We use the solver described in [Gebser et al., 2012].

In Section 2 we give an overview of the problem, illustrated with a simple example. Section 3 describes the translation of a first-order abductive problem into an answer-set problem. Section 4 briefly describes possible integrity constraints that significantly constrain the proof search space. Section 5 presents a method for selecting the best proof in the domain of NLU. Section 6 concludes.

2 Preliminaries

Abduction is the reasoning process of finding explanations for observations. In NLU, the “observations” are (logical forms of) sentences and the interpretation of a sentence is the “best” (i.e., most coherent) abductive proof that explains it with respect to a background theory. Formally, I is an interpretation of observations O with respect to a background theory T if[1]

(1) T ∪ I ⊨ O ∧ T ∪ I ⊭ ⊥

that is, T ∪ I entails O and is consistent. The sentence “John is an elephant” may mean that there is an actual elephant whose name is John, i.e., I can be the literal meaning of the sentence. But if we know (from context) that John is a person, T ∪ I will be inconsistent, hence I cannot be the literal meaning of the sentence and we have to find some other (nonliteral) interpretation that makes sense in the given context.

[1] T is a set of first-order formulae and I and O are sets of positive literals.

We use the logical representation proposed by Hobbs (1985), which is a “conjunctivist” scope-free first-order approach to linguistic meaning. Consider the sentence

(2) John sees Mary.

Irrelevant details aside, its logical representation is[2]

(3) (∃e, x, y) see′(e, x, y) ∧ John(x) ∧ Mary(y)

that is, there is an eventuality e which is a seeing, John does it and Mary undergoes it. The logical representation of

(4) John doesn’t see Mary.

would be

(5) (∃e1, e2, x, y) not′(e1, e2) ∧ see′(e2, x, y) ∧ John(x) ∧ Mary(y)

that is, the negation of the eventuality expressed in (3) is asserted. Hobbs (2005) showed that an appropriately rich and precise theory of commonsense reasoning can be expressed in this way.

[2] The relation between the primed and unprimed predicates is given by the following axiom schema (see [Hobbs, 1985]): P(x) ≡ (∃e) P′(e, x) ∧ Rexist(e)
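In the propositional Horn case, condition (1) can be checked directly by forward chaining. The following Python sketch is our illustration only (the predicate names and the designated inconsistency atom "bot" are invented, and the paper's actual setting is first-order): it tests whether a candidate interpretation I explains observations O with respect to a Horn theory T.

```python
def closure(rules, facts):
    """Forward-chain propositional Horn rules (body, head) to a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and set(body) <= derived:
                derived.add(head)
                changed = True
    return derived

def is_interpretation(T, I, O, bottom="bot"):
    """Condition (1): T ∪ I entails O and T ∪ I is consistent."""
    derived = closure(T, I)
    return set(O) <= derived and bottom not in derived

# Toy propositionalization of the "John is an elephant" discussion:
T = [(("elephant_J",), "animal_J"),       # elephants are animals
     (("person_J", "animal_J"), "bot")]   # persons are not animals
O = ["elephant_J"]
print(is_interpretation(T, ["elephant_J"], O))                        # True
print(is_interpretation(T + [((), "person_J")], ["elephant_J"], O))   # False
```

The second call mirrors the discussion above: once the contextual fact person_J is in the theory, the literal reading makes T ∪ I inconsistent and is rejected, forcing a nonliteral interpretation.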
The intended real-world meaning of the logical forms (3) and (5) is their literal meaning. But the logical representation can contain linguistic predications that need to be interpreted in order to be given a real-world meaning. The noun phrase (NP)

(6) the Tbilisi office

is parsed as

(7) (∃x, y) Tbilisi(x) ∧ office(y) ∧ nn(x, y)

that is, there are two entities and an nn (i.e., N-N compound) relation between them, but the predicate nn only tells us that x and y are adjacent NPs and x precedes y. If we know from context that there is an office in Tbilisi, we want to interpret (7) as in(office, Tbilisi). If we have no such information, but our background knowledge contains the facts that Tbilisi is a city and that cities and offices are locations, we can still draw the (defeasible) conclusion that there is an in relation between the entities.

Whichever the case, though, we need axiom (8). We use ⇀ to denote (possibly defeasible) implication in abductive rules and ⊃ to signify implication in “hard” rules, i.e., P(x) ⇀ Q(x) ≡ P(x) ∧ etc(x) ⊃ Q(x), where etc is a literal specific to the rule.

(8) (∀x, y) location(x) ∧ location(y) ∧ in(x, y) ⇀ nn(y, x)

that is, the fact that x is located in y can be expressed by a compound nominal at the lexical level. Of course, axiom (8) is only defeasible, for an in relation can also be expressed by other linguistic constructions. Thus, to interpret (7) we have to backchain on axiom (8) and unify both variables. The proof of (7) is depicted in Figure 1. As can be seen, the abduction process is first-order even for simple examples.

[Figure 1: Proof of “the Tbilisi office”: Tbilisi(x) ∧ office(y) ∧ nn(x, y) is explained by assuming location(x), location(y), and in(y, x).]

3 Translation of deductive and abductive rules

To get the logical form of a sentence we first have to parse it. Since this paper focuses on interpretation, we will only sketch very briefly how the parser is implemented. We have an LFG grammar [Kaplan and Bresnan, 1982; Dalrymple et al., 1995a; Dalrymple, 2001; Bresnan, 2001] whose rules are augmented with annotations that incrementally build up the logical form. Consider the sentence

(9) A boy built a boat.

whose parse tree and variable bindings are given in Figure 2. The lexicon contains a logical expression for every preterminal. For example, the morpholexical entry for “boy” provides boy(x1), the entry for “built” provides build′(e, x2, y2) ∧ Past(e), etc. The syntactic rules unify the variables provided by the nodes they operate on. The rule S → NP VP, for example, will unify x1 and x2, thus expressing the fact that the boy is the agent of e, which is a building event. This way of semantic parsing is easy to implement if one already has a rule-based grammar (lexicon and rules).

[Figure 2: Parse tree and variable bindings for “A boy built a boat” (S → NP VP, VP → V NP), with the logical form (∃e, x1, y1, x2, y2) boy(x1) ∧ build′(e, x2, y2) ∧ Past(e) ∧ boat(y1).]

Alternatively, we could use so-called “glue semantics” [Dalrymple et al., 1993; 1995b], though this would be less straightforward, since glue semantics produces higher-order formulae, thus we would have to “flatten” them and introduce eventualities into the predicates. Yet another possibility is to implement the complete grammar in the abductive framework itself, as suggested by Hobbs (2003), which has the advantage that parsing and interpretation can be seamlessly integrated and carried out in one step. However, if a wide-coverage rule-based grammar is available, it is probably better to use it as a “preprocessor”.

Deduction poses no problem to the solver. We can have rules such as[3]

(10) elephant1(x) ⊃ mammal1(x)
     mammal1(x) ⊃ animal1(x)

and we can use strong negation and disjunction, as in

(11) person1(x) ⊃ ¬animal1(x)
     person1(x) ⊃ man1(x) ∨ woman1(x)

[3] In the remainder of the paper, we use subscripted predicates to represent real-world meaning and unsubscripted predicates to represent lexical meaning. This distinction is necessary in order to accommodate lexical ambiguity and figurative speech such as metaphors and metonymy. For example, the literal real-world meaning of “John is an elephant” (whose logical form is John(x) ∧ elephant(x)) is John1(x) ∧ elephant1(x), but if we already know that John is a person (person1(John)), we are forced into a figurative meaning such as John1(x) ∧ clumsy1(x). Thus in this sense, to interpret a sentence is to steer clear of contradictions in the knowledge base.

The main result reported in this paper is a method for converting abduction with observations that contain variables into an answer-set problem. We represent observations and assumptions as follows:

(12) observations: elephant(x)  ⇒ pred(elephant, obsrv, varx)
     assumptions:  elephant1(x) ⇒ pred(elephant1, asmpt, varx)

that is, variables are encoded as individuals. An abductive rule is encoded as follows:[4]

(13) elephant1(x) ⇀ elephant(x) ⇒
       pred(elephant, x, y) ∧ x ∈ {obsrv, asmpt} ⊃10
         pred(elephant1, asmpt, y) ∧
         explainedBy(elephant, y, rule1(y)) ∧
         assumedBy(elephant1, y, rule1(y))

[4] p ⊃10 q means 0{q}1 :- p, that is, the consequent may or may not be included in the stable model.

that is, if there is a predication we want to explain that can be unified with the consequent of an abductive rule, we may assume the antecedent (but we may ignore the rule, because the cost of assuming the antecedent may be higher than the cost of assuming the consequent, thus we have to consider both cases). The auxiliary predications are used in the following rules, which help us guarantee that the computed stable model is a correct abductive proof (in the sense of [Hobbs et al., 1993]):[5]

(14) assumedBy(p, x, r) ⊃ assumed(p, x)
     pred(p, asmpt, x) ∧ ∼assumed(p, x) ⊃ ⊥
     explainedBy(p, x, r1) ∧ explainedBy(p, x, r2) ∧ r1 ≠ r2 ⊃ ⊥

[5] ∼ denotes default negation.

that is, a predication can be assumed only if it can be derived by backchaining on an abductive rule, and a predication can be explained by no more than one rule.

The most important aspect is how variables in observations are handled. Since an answer-set program (ASP) has to be effectively propositional, we have to “emulate” equality. If a predication containing a “reified” variable (e.g., varx) can be unified with another predication, we can bind the variable. For example, if the knowledge base contains person1(John) and we observe or assume person1(varx), we may add eq(John, varx) to the stable model. We may also decide not to bind a variable, in which case a new individual has to be added to the knowledge base. Of course, we need axioms that guarantee that eq is an equivalence relation. We will not list all of them here, but let us mention the most important axiom schema. If P is a predicate, we need P(x) ∧ eq(x, y) ⊃ P(y) in order for deduction to work. Thus whenever we bind a variable, the knowledge base grows (in the worst case exponentially). There is no way around this problem since answer-set solving is propositional. Luckily for us, the logical form of a sentence contains only a few variables (around a dozen), hence the presented method is viable for NLU. In actual fact, we are sacrificing space for time, since we need new literals for each variable assignment.

The rules described above correctly define all and only the stable models that correspond to an abductive proof in our NLU framework. We use defeasible rules to infer what might be true (thus extending our knowledge) based on what we already know. We process one sentence at a time in order to keep the proof search space as small as possible. Of course, in a connected discourse this may lead to a contradiction. Generally we try to assume as little as possible (see Section 5), but if we arrive at a contradiction, we have to backtrack and reprocess part of the discourse. We plan on using a truth maintenance system in the future, but for the time being, we simply reinterpret the sentences which might be affected by the false assumption.

We will now illustrate with an example how the solver finds proofs. Consider the following observation, abductive rules, and known facts (as usual, x is a variable and a, b are constants):

(15) observation: q(x)
     rule:        p(x) ⇀ q(x)
     rule:        r(x) ⇀ q(x)
     facts:       p(a), r(b)

The complete proof graph for (15) is given in Figure 3. The labelled edges are possible “merges” (variable bindings by unification) and the unlabelled edges are licensed by backchaining on abductive rules.

[Figure 3: Complete proof graph for (15): q(x) backchains to p(x) and r(x), which can be merged (=) with the facts p(a) and r(b), respectively.]

A proof is equivalent to a subgraph of the complete proof graph complying with the following conditions:

1. An observation or assumption is explained by no more than one rule.[6]

2. There is a path from any assumption to an observation (i.e., we eliminate assumptions that do not contribute to the explanation of an observation).

3. Variable assignments conform to the usual constraints on equivalence.

[6] This condition does not mean that a literal cannot be implied by more rules; it only says that only one (defeasible) rule is taken to be its explanation.

All the proofs of (15) are depicted in Figure 4.

[Figure 4: Proofs of (15): five admissible subgraphs of the complete proof graph.]
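The proof space of (15) is small enough to enumerate with a brute-force sketch. The following Python snippet is our illustration only (the paper's actual mechanism is the ASP translation (12)–(14)); the two dictionaries hard-code example (15), and the generator reproduces the five hypotheses listed in (16).

```python
# Example (15): observation q(x); rules p(x) ⇀ q(x), r(x) ⇀ q(x); facts p(a), r(b).
RULES = {"q": ["p", "r"]}          # consequent -> antecedents of abductive rules
FACTS = {"p": ["a"], "r": ["b"]}   # predicate -> individuals known to satisfy it

def proofs(goal, var="x"):
    """Yield hypothesis sets: assume the observation outright, backchain on
    exactly one rule (condition 1) and assume its antecedent, or additionally
    bind the variable to a known individual (a labelled "merge" edge)."""
    yield {f"{goal}({var})"}                           # q(x) assumed as is
    for ante in RULES.get(goal, []):
        yield {f"{ante}({var})", f"{goal}({var})"}     # antecedent assumed
        for ind in FACTS.get(ante, []):
            yield {f"{goal}({ind})"}                   # variable bound: q(a), q(b)

hyps = list(proofs("q"))
print(len(hyps))  # 5 proofs
```

Each yielded set corresponds to one subgraph of the complete proof graph: assuming q(x) outright, backchaining on one of the two rules, or additionally unifying x with one of the known individuals.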
The corresponding hypotheses (including the observation) are:[7]

(16) q(x)
     p(x) ∧ q(x)
     r(x) ∧ q(x)
     q(a)
     q(b)

[7] In the domain of NLU, the logical form of a sentence is an existential closure, so we would have to introduce a new individual for every free variable.

4 Constraining the proof search space

In modern answer-set solvers, aggregate functions such as count, sum, or max can be used. We can thus define a predicate whose argument tells us the size of (that is, the number of assumptions in) a proof and rule out any proof that is too big:

(17) numberOfAssumptions(x) = |{p : assumed(p)}|
     numberOfAssumptions(x) ∧ x > maxNumberOfAssumptions ⊃ ⊥

We can also have a predicate that tells us the length of the proof path from p1 to p2 and an integrity constraint that rules out any proof whose length is greater than an integer constant, maxProofLength:

(18) proofLength(p1, p2, l) ∧ l > maxProofLength ⊃ ⊥

These two simple integrity constraints can significantly constrain the proof search space, thus speeding up the enumeration of proofs.

5 Ranking abductive proofs

Even a relatively simple background theory, as in our experiments, will have at least hundreds of abductive and deductive rules, which means that the logical form of an average sentence can have many different proofs (since both deduction and abduction are explosive). Moreover, in a long discourse consisting of many sentences, the knowledge base will contain many individuals available for unification, which can multiply the number of proofs. The method of weighted abduction [Hobbs et al., 1993; Hobbs, 2001; 2003] seems to yield good results, but it cannot be used in an answer-set program because it works with real numbers. While it is possible to enumerate all the abductive proofs and evaluate them later, we decided to try to solve the ranking problem within ASP. In this section, we describe how to rank proofs using the answer-set solver.

The basic idea, coming from Hobbs et al. (1993), is that one should prefer proofs that unify assumed predications with what we already know and assume as little as possible. In other words, maximally coherent (with respect to the context) and minimally ampliative proofs are preferred. We add one more criterion: salience. Informally, individuals that occurred recently in the discourse have higher salience. If the pronoun “he” is used in a sentence, it is interpreted by backchaining on the following abductive rule:

(19) person1(x) ∧ male1(x) ⇀ he(x)

that is, “he” refers to a male person. To bind the variable, we have to find an individual which conforms to the selectional constraints. But there can be many such individuals in the knowledge base. Thus we rank the proofs with respect to their salience, which is the sum of the saliences of all individuals unified with a variable. Newly introduced constants are assigned the highest salience, as in the case of c for the indefinite NP in

(20) He bought a book.
     he(x) ∧ buy′(e, x, c) ∧ book(c)

If there is elephant(x) among the observations and the knowledge base contains elephant1(E1^1), elephant1(E2^2), elephant1(E3^3),[8] there are three literal interpretations in which the variable x is unified, as illustrated in Figure 5. Based on a linguistic insight, we want to prefer the proof which unifies the variable x with the most salient individual.

[8] The upper index expresses the salience of the individual.

[Figure 5: Literal interpretations of elephant(x): the variable x can be unified (=) with elephant1(E1^1), elephant1(E2^2), or elephant1(E3^3).]

Unification can help us resolve lexical ambiguity even when there are no individuals in the knowledge base which could be unified with the variables. Consider the compound nominal

(21) bank account
     bank(x) ∧ account(y) ∧ nn(x, y)

and assume that we have the following background theory (with five lexical rules and one commonsense rule):[9]

(22) bank1(x) ⇀ bank(x)
     bank2(x) ⇀ bank(x)
     account1(x) ⇀ account(x)
     account2(x) ⇀ account(x)
     appurt1(x, y) ⇀ nn(y, x)
     account1(x) ∧ bank1(y) ⇀ appurt1(x, y)

[9] bankx and accountx are different lexical meanings of “bank” and “account”, respectively.

that is, accounts (defeasibly) appertain to banks, and the relation appurt(enance) can be expressed by a compound nominal (nn). We see in Figure 6 that two predications are unified. If we interpreted bank(x) as bank2(x) and/or account(y) as account2(y), there would be no unification of predications and hence the proof would be less coherent.

[Figure 6: Interpretation of “bank account”: bank(x) ∧ account(y) ∧ nn(x, y) is explained via appurt1(y, x), unifying bank(x) with bank1(x) and account(y) with account1(y).]

As we have just shown, abduction can be used to lexically disambiguate phrases even if there is no additional context or previous discourse. Of course, in order for this method to work, background theories are needed that capture relations between entities that occur in the sentence. Creating commonsense knowledge bases is generally a very complex task, but it is feasible at least for smaller closed domains.

There can be many proofs with the same number of unified predications, thus we need a criterion that will help us distinguish them. The simplest criterion is the size of the proof, i.e., the number of assumptions made. Intuitively, we do not want to assume more than is necessary; if an assumption does not help us arrive at a more coherent proof (that is, a proof with more unifications), it should be omitted. This simple idea seems to be good enough to rule out most undesired proofs. We can use the predicate numberOfAssumptions defined in Section 4 and an analogously defined predicate numberOfUnifications to select the best proof. Our method for ranking abductive proofs yields slightly better results than that proposed by Hobbs et al. (1993) in our evaluation, but this does not mean that it is better, because the difference is not statistically significant and because Hobbs’ method relies on probabilistic weights which are hard to “get right” empirically. Nevertheless, our method is relatively simple and can be implemented in ASP (for it uses only natural numbers).

6 Conclusions

We have presented a method for translating (a fragment of) first-order abduction into an answer-set program in the context of NLU. The heretofore used algorithms do not allow for seamless integration of the process of abduction with deduction. Our method helps the solver confine the search space by ruling out logically impossible proofs (with respect to a background theory). We have also suggested how to rank proofs within the answer-set program, since the original framework of weighted abduction would require an additional step to evaluate the proofs. An evaluation has shown that in conjunction with a state-of-the-art answer-set solver, our method is an order of magnitude faster than the approach based on a general automated theorem prover.

There is no doubt that first-order Horn abduction is useful in many areas of artificial intelligence. Our future work will investigate how the proposed method for ranking proofs can be applied to automated goal-driven planning with incomplete information.

References

[Appelt and Pollack, 1992] Douglas E. Appelt and Martha E. Pollack. Weighted Abduction for Plan Ascription. User Modeling and User-Adapted Interaction, 2(1–2):1–25, 1992.

[Bresnan, 2001] Joan Bresnan. Lexical-Functional Syntax. Blackwell Textbooks in Linguistics, New York, 2001.

[Dalrymple et al., 1993] Mary Dalrymple, John Lamping, and Vijat Saraswat. LFG Semantics via Constraints. In Proceedings of the Sixth Meeting of the European ACL, 1993.

[Dalrymple et al., 1995a] Mary Dalrymple, Ronald M. Kaplan, John T. Maxwell III, and Annie Zaenen. Formal Issues in Lexical-Functional Grammar. CSLI Publications, 1995.

[Dalrymple et al., 1995b] Mary Dalrymple, John Lamping, Fernando Pereira, and Vijat Saraswat. Linear Logic for Meaning Assembly. In Proceedings of Computational Logic for Natural Language Processing, 1995.

[Dalrymple, 2001] Mary Dalrymple. Lexical Functional Grammar, volume 34 of Syntax and Semantics. Academic Press, 2001.

[Gebser et al., 2012] Martin Gebser, Benjamin Kaufmann, and Torsten Schaub. Conflict-Driven Answer Set Solving: From Theory to Practice. Artificial Intelligence, 187–188:52–89, 2012.

[Gelfond and Kahl, 2014] Michael Gelfond and Yulia Kahl. Knowledge Representation, Reasoning, and the Design of Intelligent Agents: The Answer-Set Programming Approach. Cambridge University Press, 2014.

[Hobbs et al., 1993] Jerry R. Hobbs, Mark Stickel, Douglas Appelt, and Paul Martin. Interpretation as Abduction. Artificial Intelligence, 63:69–142, 1993.

[Hobbs, 1985] Jerry R. Hobbs. Ontological Promiscuity. In Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, pages 61–69, 1985.

[Hobbs, 2001] Jerry R. Hobbs. Syntax and Metonymy. In P. Bouillon and F. Busa, editors, The Language of Word Meaning, pages 290–311. Cambridge University Press, 2001.

[Hobbs, 2003] Jerry R. Hobbs. Discourse and Inference. MS, Information Sciences Institute, University of Southern California, Marina del Rey, 2003.

[Hobbs, 2005] Jerry R. Hobbs. Encoding Commonsense Knowledge. MS, Information Sciences Institute, University of Southern California, Marina del Rey, 2005.

[Inoue et al., 2012] Naoya Inoue, Ekaterina Ovchinnikova, Kentaro Inui, and Jerry Hobbs. Coreference Resolution with ILP-based Weighted Abduction. In Proceedings of COLING 2012, pages 1291–1308, 2012.

[Kaplan and Bresnan, 1982] Ronald M. Kaplan and Joan Bresnan. Lexical-Functional Grammar: A Formal System for Grammatical Representation. In Joan Bresnan, editor, Mental Representation of Grammatical Relations. MIT Press, Cambridge, 1982.

[Ovchinnikova et al., 2014] Ekaterina Ovchinnikova, Ross Israel, Suzanne Wertheim, Vladimir Zaytsev, Niloofar Montazeri, and Jerry Hobbs. Abductive Inference for Interpretation of Metaphors. In Proceedings of the Second Workshop on Metaphor in NLP, pages 33–41, 2014.