Reasoning about Conditional Beliefs for the Winograd Schema Challenge

Denis Golovin and Jens Claßen
Knowledge-Based Systems Group
RWTH Aachen University
Aachen, Germany

Christoph Schwering
School of Computer Science and Engineering
The University of New South Wales
Sydney, Australia

Abstract

The Winograd Schema Challenge has been proposed as an alternative to the Turing Test for measuring a machine's intelligence by letting it solve difficult pronoun resolution problems that cannot be tackled by statistical analysis alone. While many solutions so far are based on machine learning and natural language processing, we believe that a knowledge-based approach is better suited. In particular, we propose to employ a logic for conditional beliefs that is capable of dealing with incomplete or even inconsistent information (which commonsense knowledge often is). It does so by formalising the observation that humans often reason by picturing different contingencies of what the world could be like, and then choose to believe what is thought to be most plausible. We discuss and evaluate an implementation where relevant commonsense background information is obtained from the ConceptNet semantic network, translated into our formalism, and processed by a reasoner for our logic.

1 Introduction

The Winograd Schema Challenge (WSC) has been proposed by Levesque, Davis and Morgenstern [2012] as an alternative to the Turing Test for measuring a machine's intelligence. While agreeing with Turing's primary concern, which is focusing on the observable behaviour exhibited by the system, the aim is to avoid some of the pitfalls that come with a free-form conversation as Turing had in mind. Most importantly, in order to pass as a person, a machine will have to resort to deception to be able to answer questions about its height, childhood, and so on. Moreover, the judgement of what passes as human-like conversation is highly subjective and may vary depending on the interrogator that is performing the test.

The central idea behind the WSC is to instead let the machine answer a number of questions of a specific form. As an example, consider the statement

  The delivery truck zoomed by the school bus because it was going so slow.

The question is "What was going slow?", i.e., the task is to disambiguate to which of the two parties mentioned in the statement ("the delivery truck", "the school bus") a pronoun ("it") refers. Each schema contains a special word (here: "slow") that, when replaced by an alternate word ("fast"), changes the answer. While obvious to an English-speaking human reader, such questions pose a real challenge for a machine since the given statement alone does not provide enough information to derive an answer, but rather requires having appropriate commonsense background knowledge and being able to reason about it. Winograd schemas are designed to be "Google-proof", meaning that they should not be solvable by simple statistical analysis: if instead of a "delivery truck" the above sentence spoke of a "sports car", it would not be hard to detect a correlation to words associated with fast movement. WSC competitions that were held so far prove that the task is indeed difficult, and likely cannot be solved by classical natural language processing alone. Robust commonsense reasoning, on the other hand, will undoubtedly be beneficial in other areas, in particular when it comes to domestic robots and human-machine interaction in general.

Such a system will not only need access to a large corpus of background information that preferably covers all areas of the wide spectrum of everyday human life, but also be able to function appropriately in light of the fact that such information is often incomplete or even conflicting. Humans are very capable of this, and we usually do so by picturing different contingencies of what the world could be like, and then choosing to believe what we think the most plausible scenario is.

In this paper, we present an approach to the WSC that is based on a logic that formalises these concepts, the logic of conditional beliefs [Schwering et al., 2017]. We designed and implemented a system that
• translates the WSC sentence into a logical formula,
• extracts relevant background information from a knowledge corpus and translates it into our logical language, and
• performs reasoning to decide what the most plausible answer is.

The rest of this paper is organised as follows. In Section 2 we discuss related work. Section 3 shows an overview of our system's architecture. We then present the formal framework of our approach in Section 4. Section 5 describes the implementation of our system, while Section 6 presents an evaluation. Finally we conclude in Section 7.

2 Related Work

Rahman and Ng [2012] view the WSC as a special form of pronoun resolution problem and propose an approach based on machine learning. Their system relies on abstract feature vectors extracted from eight different sources, among which Google and Lexical Features turned out to be most useful in their analysis. For the former, they construct a number of queries from the verb governing the pronoun in the input sentence, the words followed by it, and the two candidate parties, and then count the number of results returned by the search engine. The latter are obtained from a training set and represent the presence or absence of certain predefined structures in the sentence. They achieve an accuracy of 73.1% on their test set, where the overall dataset of 941 sentence pairs was created by 30 students and split using a 70/30 training/test ratio. It should be noted that they considered a relaxed version of the problem: while the WSC only allows for instances where resolving the pronoun requires background knowledge not expressed in the statement, they did not impose this condition on sentences.

Sharma et al. [2015] present a three-step method based on their semantic parser "K-parser": first, the input sentence is brought into a graph representation encoding the conceptual classes of words as well as their dependencies and semantic relations. Next, a series of Google queries is constructed where nominal entities are replaced by wild cards and verbs are substituted by their synonyms. The sentences thus found are then matched against the input by comparing their semantic graphs according to a set of predefined rules. For evaluation, they selected 71 sentences from the WSC corpus, resulting in 53 answers, 49 of which were correct.

Bailey et al. [2015] introduce a new formalism for reasoning about the correlation between sentences, which can either be positive or negative. Intuitively, the correlation between two sentences is positive when after hearing the first sentence, the hearer views the second sentence as more plausible than before. Negative correlation on the other hand lowers the plausibility. For example,

  The delivery truck zoomed by the school bus.

is positively correlated with

  The school bus was going so slow.

but negatively with

  The truck was going so slow.

The authors of the aforementioned paper present the correlation calculus as a deductive system to formalise this idea. It extends classical predicate calculus by a new operator ⊕, where F ⊕ G expresses that F and G are positively correlated, and F ⊕ ¬G denotes a negative correlation. They show how their formalism can be used to derive such correlation formulas from a set of formulas encoding background information and thus justify the answers for several WSC instances. While their work makes an important step towards solving the WSC, they admit that it so far does not address the important problem of generating the needed commonsense background axioms (these were hand coded in their examples), nor did they come up with an implementation that would allow automating the reasoning process and evaluating the approach against others. Their formalism also currently lacks the capability to handle non-monotonic aspects such as the statement

  If the shelf is level, then the sculpture will not roll off.

which, when read as a default rule, allows one to withdraw the conclusion that the sculpture does not roll off in face of new information, e.g. that there is an earthquake.

Finally, Liu et al. [2017] propose another approach based on machine learning, which is used (a) in a knowledge acquisition step to identify cause-effect relationships between commonly used words from large text corpora and (b) to find association relationships in the form of conditional probabilities between pairs of discrete events. They manually selected a subset of 70 Winograd schemas, among which their method achieves an accuracy of 70%.

3 System Architecture Overview

The architecture of our system is visualised in Figure 1. In a preprocessing step, tuples that represent the acting parties, their actions, and their relationships are extracted from the input Winograd schema sentences. For instance, the Winograd schema from Section 1 leads to tuples (truck, zoomed by, bus) and (zoomed by, goes slow). Then a commonsense repository, ConceptNet, is employed to derive further information about the extracted words. These facts come in the form of binary relations; for instance, it may tell us that zooming by is a form of travelling rapidly. From these facts, a conditional knowledge base is constructed in the logic BO, from which the system aims to infer through automated reasoning how to disambiguate the pronoun in the most plausible way. In the school-bus scenario, the system compares the plausibility of party #1 (the truck) versus party #2 (the school bus) being the slow party.

Figure 1: Components overview.

4 A Logic of Plausible Beliefs

Reasoning in our framework is based on a logic of plausible beliefs [Schwering et al., 2017] called BO. This logic features non-monotonic conditionals of the form

  If some premise holds, then plausibly some consequent is true,

which enables us to represent and reason about more and less plausible contingencies. We expect this to be crucial in the context of the WSC, where we aim to find the correct answer by comparing the plausibility of the different candidates. In the following we briefly introduce the formal machinery of this logic.

The terms in our language consist of infinitely many first-order variables, usually denoted as x or y, and infinitely many (standard) names n ∈ N = {#1, #2, ...}. Formulas are of the form
• P(t1, ..., tk), (t1 = t2), (α ∧ β), ¬α, ∀xα,
• B(φ1 ⇒ ψ1), O{φ1 ⇒ ψ1, ..., φm ⇒ ψm},
where P is a predicate symbol other than =, the ti are terms, α and β are formulas, and the φi and ψi are objective formulas, that is, φi and ψi mention no B or O operators. A formula is ground when it contains no variables. A sentence is a formula without free variables. Let ∨, ∃, ⊃, ≡ be the usual abbreviations, ⊤ stand for ∀x(x = x), and ⊥ abbreviate ¬⊤.

Standard names can be seen as special constants that represent all the individuals in the universe. They considerably simplify the semantics compared to the classical Tarskian model theory; in particular, quantification can be handled by simply substituting standard names. The formula B(φ ⇒ ψ) intuitively expresses a conditional belief, namely the belief that if φ is true, then plausibly ψ is also true. The formula O{φ1 ⇒ ψ1, ..., φm ⇒ ψm} goes one step further and says that the conditional beliefs B(φi ⇒ ψi) are all that is believed; everything that is not a consequence of these conditional beliefs is implicitly not believed. The concept is called only-believing; it generalises Levesque's only-knowing [1990] to the case of conditional beliefs and shares many properties with Pearl's System Z [1990]. Only-believing is particularly useful to capture the idea of a conditional knowledge base: the knowledge base is all the agent believes.

To investigate such problems, we define a possible-worlds semantics for BO. A world is a set of ground non-equality atoms. An epistemic state ~e is an infinite sequence of sets of worlds e1, e2, ... such that e1 ⊆ e2 ⊆ ... and for some p ∈ {1, 2, ...}, ep = ep+1 = .... Intuitively, a world is a truth assignment of the ground non-equality atoms, that is, of all P(n1, ..., nj) where the ni are standard names. An epistemic state stratifies sets of such worlds with the subset relation. The intuition behind an epistemic state ~e is to model a system of spheres: e1 contains the most-plausible worlds, e2 adds the second-most-plausible worlds, and so on. Note that in any epistemic state ~e only finitely many different sets of worlds are allowed, since the definition requires that ep = ep+1 = ... for some p. (Just as well we could have defined an epistemic state to be a non-empty finite sequence of supersets; the present definition however is often easier to work with.)

Given an epistemic state ~e and a world w, we can define truth of a sentence α in BO, written ~e, w |= α. We let α[n/x] denote the result of substituting all free occurrences of x by n. The objective part of the semantics is defined inductively as follows:
1. ~e, w |= P(n1, ..., nj) iff P(n1, ..., nj) ∈ w;
2. ~e, w |= (n1 = n2) iff n1 and n2 are identical names;
3. ~e, w |= (α ∧ β) iff ~e, w |= α and ~e, w |= β;
4. ~e, w |= ¬α iff ~e, w ⊭ α;
5. ~e, w |= ∀xα iff ~e, w |= α[n/x] for all n ∈ N.
Notice that Rule 5 handles quantification by substitution of standard names.

Before we proceed with the semantics of conditional belief, we define the plausibility ⌊~e | φ⌋ of an objective sentence φ in ~e as the index of the first sphere consistent with φ:

  ⌊~e | φ⌋ = min{p | p = ∞ or ~e, w |= φ for some w ∈ ep},

where ∞ ∉ {1, 2, ...} represents an "undefined" plausibility with the understanding that p + ∞ = ∞ and p < ∞ for all p ∈ {1, 2, ...}. Then the semantics of beliefs is as follows:
6. ~e, w |= B(φ ⇒ ψ) iff for all p ≥ 1, if p ≤ ⌊~e | φ⌋ and w′ ∈ ep, then ~e, w′ |= (φ ⊃ ψ);
7. ~e, w |= O{φ1 ⇒ ψ1, ..., φm ⇒ ψm} iff for all p ≥ 1, w′ ∈ ep iff ~e, w′ |= ⋀_{i : ⌊~e | φi⌋ ≥ p} (φi ⊃ ψi).

The intuitive meaning of Rule 6 is that B(φ ⇒ ψ) holds iff the most-plausible φ worlds also satisfy ψ. Note that B(φ ⇒ ψ) is vacuously true when there is no φ world. As a consequence, B(¬φ ⇒ ⊥) can be used to express that φ is known. Only-believing O{φ1 ⇒ ψ1, ..., φm ⇒ ψm} has the effect of B(φi ⇒ ψi) plus maximising every sphere: it requires the sphere ep to contain all worlds that satisfy all (φi ⊃ ψi) for which ⌊~e | φi⌋ ≥ p. It turns out that this definition gives rise to a procedure that generates the unique system of spheres ~e that corresponds to O{φ1 ⇒ ψ1, ..., φm ⇒ ψm} [Schwering et al., 2017], which means that a conditional knowledge base has a unique semantic representation as a system of spheres.

While reasoning in the logic as presented here is undecidable given its first-order nature, a decidable (and sometimes tractable) variant based on the theory of limited belief has been developed and implemented [Schwering and Lakemeyer, 2016; Schwering, 2017].

5 Implementation

In order to automate the process of solving the WSC through reasoning with conditional beliefs, we need to (a) translate a given WSC instance into formulas of BO, (b) provide necessary background knowledge in the form of a KB of BO formulas, and (c) call the BO reasoner to answer the query.

5.1 Extraction of Knowledge from ConceptNet 5

Here we adopt an approach based on ConceptNet 5 [Speer and Havasi, 2012], a large semantic network that represents knowledge in a graph structure where each node is a concept (e.g., "zoom", "going fast") and each edge corresponds to one of 28 different relations (e.g., "IsA", "RelatedTo"). Apart from manually entered data, the network integrates information from various sources, including subsets of the OpenCyc and DBPedia ontologies as well as the Wiktionary and WordNet 3.0 dictionaries. The origin of each piece of information is kept in metadata associated to each edge, in particular a numerical weight representing the strength of the assertion. Weights have a default value of 1 but can be higher or lower, where a negative value expresses that the assertion is false or irrelevant.

As opposed to fully-fledged commonsense knowledge bases such as Cyc [Matuszek et al., 2006], ConceptNet 5 does not come with a pre-defined formal semantics and/or inference engine. It can rather be viewed as an intermediate step between the intuitive, informal knowledge used by humans on the one hand and a formal, logic-based representation on the other. As developers, this gives us the freedom to attach meaning to assertions in a way that suits our application.

In our case, we identified 24 of the 28 relations to be useful and mapped them to conditional beliefs. Consider the IsA relation: each corresponding tuple has the form IsA(A, B), where A and B are concepts. Assuming that we represent concepts by unary predicates, we can encode the assertion as an ordinary belief (expressing that it is plausible to assume that if x is an A, it is also a B):

  ⊤ ⇒ ∀x (A(x) ⊃ B(x))   (1)

However, the statement will then only hold in the most plausible sphere, which essentially amounts to doing monotonic reasoning only. We hence instead use the following encoding:

  A(x) ⇒ B(x)   (2)

That is, the most plausible A's are assumed to be B's. Similar mappings have to be defined for the remaining relations. The ones we used are summarised in Table 1, where A ⇒ B stands for formula (2), A ⇒ ¬B is shorthand for A(x) ⇒ ¬B(x), etc.

  Relation         Description                                      Mapping
  IsA              A is a kind of B                                 A ⇒ B
  NotIsA           A is not a kind of B                             A ⇒ ¬B
  RelatedTo        A is related to B                                A ⇒ B and B ⇒ A
  DerivedFrom      A is derived from B                              A ⇒ B
  Antonym          A is the opposite of B                           ¬(A ∧ B)
  MotivatedBy      A is motivated by B                              B ⇒ A
  Synonym          A is a synonym of B                              A ≡ B
  FormOf           B is the root word of A                          A ≡ B
  HasPrerequisite  in order for A to happen, B needs to happen      A ⇒ B
  UsedFor          A is used for B                                  A ⇒ B
  NotUsedFor       A is not used for B                              A ⇒ ¬B
  PartOf           A is part of B                                   B ⇒ A
  HasA             B belongs to A, either as an inherent part or
                   due to a social construct of possession          A ⇒ B
  CapableOf        A is capable of B                                A ⇒ B
  NotCapableOf     A is not capable of B                            A ⇒ ¬B
  ObstructedBy     A is a goal that can be prevented by B           B ⇒ ¬A
  HasProperty      A has B as a property                            A ⇒ B
  NotHasProperty   A does not have B as a property                  B ⇒ ¬A
  Desires          A is a conscious entity that typically wants B   A ⇒ B
  NotDesires       A is a conscious entity that typically does
                   not want B                                       A ⇒ ¬B
  CreatedBy        B is a process or agent that creates A           A ⇒ B
  DefinedAs        A and B overlap considerably in meaning, and
                   B is a more explanatory version of A             A ⇒ B
  Entails          if A is happening, B is also happening           A ⇒ B
  MannerOf         A is a specific way to do B                      A ⇒ B

  Table 1: ConceptNet relations and their mapping

5.2 Representing Winograd Schemas in BO

To create the knowledge base, we start with the given WSC sentence, e.g.

  The fish ate the worm. It was tasty.

Here, the word tasty can be replaced by hungry and the reference of it changes from worm to fish. Note that this is a structure that many (but not all) Winograd schemas exhibit: they start with a statement (unambiguously) describing a situational relationship between two parties ("the fish ate the worm"), followed by an ambiguous description of an effect of it ("it was tasty"). We use the natural language processing tool ReVerb [Fader et al., 2011] to obtain a structured representation for the former in the form of a triple (fish, ate, worm).

Note that it is possible to replace both parties with placeholders and still understand the sentence and resolve the pronoun. The requirement that the names of the two parties do not carry any relevant information for solving a WSC, actually intended as a main difficulty, can here be used to our advantage: instead of "fish" we simply denote the first party by standard name #1, and similarly the "worm" as second party by #2. We will not have to worry about reasoning about the relationship between a fish and a worm, but can rather concentrate on the relation between the concepts ate and tasty.

The triple (fish, ate, worm) now gives us the certain information that the fish (#1) is the active party in the event "ate", which we represent by adding

  ¬ate(#1) ⇒ ⊥   (3)

to the KB. Next, we consult ConceptNet to check if more information about the two parties is available. Specifically, we try to use the fact that the worm (#2) is the passive party in the event. We hence search for a concept which has similar properties as an eaten worm, that is to say something which is commonly known to be eaten, i.e., something which is the passive party in the event of ate. After determining eaten to be the passive form of ate through the common root word eat, we therefore query ConceptNet to find instances of the ReceivesAction relation (that we only use for this specific purpose). In the example, ConceptNet suggests that the concept apple is commonly known to be eaten since ReceivesAction(apple, eaten) has a high weight. We hence represent that the worm is the passive party by the formula

  ⊤ ⇒ apple(#2)   (4)

The new expression can be read as it is believed that the second party is active in being an 'apple'. The fact that the passive form of ate is eaten is further represented as follows:

  ate(x) ⇒ ¬eaten(x)
  ¬ate(x) ⇒ eaten(x)   (5)

The new beliefs can be read as it is believed that if party x is active in the event 'ate' it presumably is not active in the event 'eaten', and vice versa. Finally, we add the belief that if a party is active in the event of eaten, then it is presumably active in the event of being an apple:

  eaten(x) ⇒ apple(x)   (6)

Note that beliefs of the form (4) to (6) are only generated in case a corresponding ReceivesAction actually exists (which is often not the case); only the ones of the form (3) are guaranteed to be present in every WSC instance.

5.3 Reasoning

Starting with the formulas generated for the WSC sentence as demonstrated in Section 5.2, we search for a path in ConceptNet that connects the two target concepts ate and tasty identified from the WSC sentence (using normalisation like elimination of stop words and word stemming if needed). All relations on this path are mapped to formulas as described in Section 5.1, instantiated by the standard names #1, #2 representing the two parties. We then call the BO reasoner on the resulting KB with the two queries

  tasty(#1) ∨ tasty(#2) ⇒ tasty(#1)
  tasty(#1) ∨ tasty(#2) ⇒ tasty(#2)   (7)

A decision is then made if exactly one of them comes out as true and the other as false.

6 Evaluation

6.1 Quantitative Evaluation

We performed a preliminary experimental evaluation of our implemented system on the publicly available Winograd schema collection (http://www.cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html) containing 144 schemas in total. Since ConceptNet does not directly support querying paths between concepts, we search in a two-stage process. First, a subgraph of ConceptNet is generated in memory, which we do by starting with the target concepts and successively adding adjacent edges and nodes, where for each node the (at most) 30 edges with highest weight are considered. We then search for the actual path by means of the Dijkstra algorithm, alternatingly using edge costs of 1/weight or 1 (the latter because sometimes ConceptNet's weights are not reliable).

Table 2 shows the results for increasing depth limits in terms of the number of decisions made (recall that this means that exactly one of the two queries comes out true) and the fraction of correctly answered sentences. While obviously a search depth of zero yields no results, neither does a limit of one, meaning none of the concepts mentioned in Winograd schemas are directly connected in ConceptNet, showing that these schemas are indeed hard in some sense. Once we set the limit to two or above, decisions are made with increasing accuracy (though not much better than guessing). With the current implementation we were not able to increase the depth beyond three due to reaching memory limitations.

  depth   decisions   correctly answered
  0       0           0
  1       0           0
  2       103         52 (50.5%)
  3       159         85 (53.5%)

  Table 2: Quantitative evaluation

After looking into some of the examples where wrong or no decisions were made, we adjusted our method slightly. First, we know that exactly one of the two parties is referred to by the pronoun. Moreover, while one party (the fish) actively participates (eats) in the described scenario as encoded by (3), often the other party (the worm) is not the active one. We hence changed the antecedents of the belief conditionals in the queries (7) to formulas of the form

  (tasty(#1) ⊕ tasty(#2)) ∧ ¬ate(#2)

where ⊕ denotes exclusive-or. Furthermore, we found that the returned paths disproportionally often contain the RelatedTo, Synonym and IsA relations. To discourage such overuse, we introduced a penalty on the weights for the most frequently appearing relations. These modifications, with depth limit three, resulted in 93 decisions and a success rate of 54.8%, i.e., 51 correct and 42 incorrect answers.

6.2 Qualitative Evaluation

To illustrate our system's operation, let us look at a schema successfully resolved by it in more depth, namely "The delivery truck zoomed by the school bus because it is going so fast/slow". ReVerb maps the sentence to the triple (delivery truck, zoomed by, school bus), yielding the two target concepts zoomed by and going fast (or going slow, respectively). A search in ConceptNet returns the following path between them:

  zoomed -RelatedTo-> zoom -Synonym-> whizz -RelatedTo-> sound -RelatedTo-> wave -RelatedTo-> water -RelatedTo-> boat -IsA-> going fast.

The KB constructed from this is then

  {¬zoomed(#1) ⇒ ⊥,
   zoomed(n) ⇒ zoom(n),
   zoom(n) ⇒ zoomed(n),
   ⊤ ⇒ whizz(n) ≡ zoom(n),
   whizz(n) ⇒ sound(n),
   sound(n) ⇒ wave(n),
   wave(n) ⇒ sound(n),
   wave(n) ⇒ water(n),
   water(n) ⇒ wave(n),
   boat(n) ⇒ water(n),
   water(n) ⇒ boat(n),
   boat(n) ⇒ going_fast(n) | n ∈ {#1, #2}}

which correctly suggests that party #1 is the correct one, i.e. the delivery truck. For the alternate sentence we obtain the KB

  {¬zoomed(#1) ⇒ ⊥,
   zoomed(n) ⇒ zoom(n),
   zoom(n) ⇒ zoomed(n),
   zoom(n) ⇒ travel(n),
   rush(n) ⇒ travel(n),
   zoom(n) ⇒ travel_rapidly(n),
   ⊤ ⇒ hurry(n) ≡ rush(n),
   ⊤ ⇒ hurry(n) ≡ travel_rapidly(n),
   ⊤ ⇒ ¬(rush(n) ∧ go_slow(n)) | n ∈ {#1, #2}}

which similarly relies heavily on RelatedTo, Synonym, and IsA, but additionally uses that rush is an antonym of go slow. The system again yields the correct answer #2 (the school bus) to the query "what is going slow?".

The examples demonstrate that most generated conditionals seem reasonable and relevant for the schema, however sometimes interspersed with ones that are not as intuitive. The latter is often due to the RelatedTo relation, whose meaning is on the one hand very vague, and which on the other hand accounts for the majority of edges in ConceptNet. In any case, even if the system yields a wrong answer, we obtain human-understandable justifications for the automatically made decisions, which we believe is a major benefit of a knowledge-based approach to the WSC.

7 Conclusion

We presented a solution to the Winograd Schema Challenge based on reasoning about conditional beliefs, where knowledge bases are generated by means of the NLP tool ReVerb and the semantic net ConceptNet 5. Unlike many other approaches that rely on searching text repositories (Google), we thus adopt a knowledge-based approach that, when it comes to the reasoning part, has the benefit that the generated BO KB can be regarded as a human-readable justification for the decision made by the system, which moreover comes with a clear formal semantics. We also employ a major difficulty of the WSC to our advantage, namely that the two parties in a schema do not carry any relevant information and could equally be replaced by placeholders.

While we do not achieve a very high quantitative success rate on the WSC dataset, our results are consistently better than chance, indicating that this may be a step in the right direction. Our system's performance of course heavily depends on the quality of data fed into it, both from the NLP module's and ConceptNet's side. In particular, information in ConceptNet on many topics is rather sparse and relies heavily on the RelatedTo relation, whose meaning is very vague and ambiguous.

There are many directions for future work. First of all, further experimentation with our system may be in order. Our mapping from ConceptNet relations to belief conditionals is not set in stone, and quite simple in the sense that many relations are represented in the same way. It would be interesting (and not much effort to implement in our existing framework) to try other translations. For example, at the moment our system neglects that there may be interactions between the different relations, e.g. we may want to deduce that X is not capable of doing Y from the facts that X is not capable of doing Z and that Z is a prerequisite of Y.

Furthermore, ConceptNet was only one possible choice for a commonsense knowledge repository, and the system may benefit from trying other alternatives such as the Never-Ending-Learning (NELL) [Mitchell et al., 2015] KB. In contrast to ConceptNet, NELL employs a large number of lower-level relations such as playsInstrument, which on the one hand would reduce ambiguity, but on the other hand increase the required modelling effort, i.e., constructing a mapping.

Acknowledgments

This work was supported by the German Research Foundation (DFG) research unit FOR 1513 on Hybrid Reasoning for Intelligent Systems, project A1.

References

[Bailey et al., 2015] Dan Bailey, Amelia Harrison, Yuliya Lierler, Vladimir Lifschitz, and Julian Michael. The Winograd schema challenge and reasoning about correlation. In Working Notes of the Symposium on Logical Formalizations of Commonsense Reasoning, 2015.

[Fader et al., 2011] Anthony Fader, Stephen Soderland, and Oren Etzioni. Identifying relations for open information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1535–1545. Association for Computational Linguistics, 2011.

[Levesque et al., 2012] Hector J. Levesque, Ernest Davis, and Leora Morgenstern. The Winograd Schema Challenge. In Gerhard Brewka, Thomas Eiter, and Sheila A. McIlraith, editors, Proceedings of the Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning (KR 2012), pages 552–561. AAAI Press, 2012.

[Levesque, 1990] Hector J. Levesque. All I know: a study in autoepistemic logic. Artificial Intelligence, 42(2), 1990.

[Liu et al., 2017] Quan Liu, Hui Jiang, Andrew Evdokimov, Zhen-Hua Ling, Xiaodan Zhu, Si Wei, and Yu Hu. Cause-effect knowledge acquisition and neural association model for solving a set of Winograd schema problems. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI), pages 2344–2350. AAAI Press, 2017.

[Matuszek et al., 2006] Cynthia Matuszek, John Cabral, Michael Witbrock, and John Deoliveira. An introduction to the syntax and content of Cyc. In Proceedings of the 2006 AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, pages 44–49, 2006.

[Mitchell et al., 2015] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling. Never-ending learning. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), pages 2302–2310. AAAI Press, 2015.

[Pearl, 1990] Judea Pearl. System Z: A natural ordering of defaults with tractable applications to nonmonotonic reasoning. In Proceedings of the Third Conference on Theoretical Aspects of Reasoning about Knowledge (TARK), pages 121–135. Morgan Kaufmann, 1990.

[Rahman and Ng, 2012] Altaf Rahman and Vincent Ng. Resolving complex cases of definite pronouns: The Winograd schema challenge. In Jun'ichi Tsujii, James Henderson, and Marius Pasca, editors, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), pages 777–789. ACL, 2012.

[Schwering and Lakemeyer, 2016] Christoph Schwering and Gerhard Lakemeyer. Decidable reasoning in a first-order logic of limited conditional belief. In Proceedings of the Twenty-Second European Conference on Artificial Intelligence (ECAI), pages 1379–1387. IOS Press, 2016.

[Schwering et al., 2017] Christoph Schwering, Gerhard Lakemeyer, and Maurice Pagnucco. Belief revision and projection in the epistemic situation calculus. Artificial Intelligence, 251:62–97, 2017.

[Schwering, 2017] Christoph Schwering. Limbo: A reasoning system for limited belief. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI), pages 5246–5248. AAAI Press, 2017.

[Sharma et al., 2015] Arpit Sharma, Nguyen Ha Vo, Somak Aditya, and Chitta Baral. Towards addressing the Winograd schema challenge – building and using a semantic parser and a knowledge hunting module. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI), pages 1319–1325. AAAI Press, 2015.

[Speer and Havasi, 2012] Robert Speer and Catherine Havasi. Representing general relational knowledge in ConceptNet 5. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), pages 3679–3686. European Language Resources Association (ELRA), 2012.
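The objective fragment of the semantics (Rules 1 to 5) is simple enough to prototype directly. The following Python sketch is our own illustration, not the authors' reasoner; in particular, the logic has infinitely many standard names, whereas this sketch quantifies over a finite set NAMES, so Rule 5 is only an approximation.

```python
# Minimal model checker for the objective fragment of BO (Rules 1-5).
# A world is a set of ground atoms such as ("ate", ("#1",)); an atom holds
# iff it is a member of the world. Formulas are nested tuples.

NAMES = ["#1", "#2"]  # finite approximation of the infinite name set

def holds(world, formula):
    """Evaluate an objective formula in a world (Rules 1-5)."""
    op = formula[0]
    if op == "atom":                       # Rule 1: P(n1,...,nj) in w?
        _, pred, args = formula
        return (pred, args) in world
    if op == "eq":                         # Rule 2: identical names
        _, n1, n2 = formula
        return n1 == n2
    if op == "and":                        # Rule 3
        _, a, b = formula
        return holds(world, a) and holds(world, b)
    if op == "not":                        # Rule 4
        _, a = formula
        return not holds(world, a)
    if op == "forall":                     # Rule 5: substitute names for x
        _, x, a = formula
        return all(holds(world, subst(a, x, n)) for n in NAMES)
    raise ValueError(op)

def subst(formula, x, n):
    """Replace free occurrences of variable x by name n."""
    op = formula[0]
    if op == "atom":
        _, pred, args = formula
        return ("atom", pred, tuple(n if a == x else a for a in args))
    if op == "eq":
        _, a, b = formula
        return ("eq", n if a == x else a, n if b == x else b)
    if op == "and":
        return ("and", subst(formula[1], x, n), subst(formula[2], x, n))
    if op == "not":
        return ("not", subst(formula[1], x, n))
    if op == "forall":
        _, y, a = formula
        return formula if y == x else ("forall", y, subst(a, x, n))
    raise ValueError(op)
```

For instance, in the world {("ate", ("#1",))}, the sentence ∀x ¬eaten(x) holds, since no eaten atom is in the world.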
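The plausibility measure and Rule 6 can likewise be sketched over a finite system of spheres (since epistemic states are eventually constant, representing them as a finite list e1 ⊆ e2 ⊆ ... loses nothing). This is an illustrative reconstruction of ours, not the authors' implementation; worlds are frozensets of ground atoms, and objective sentences are passed as Python predicates on worlds.

```python
# Sketch of the plausibility measure and Rule 6 over a finite list of
# spheres. Each sphere is a set of worlds (frozensets of ground atoms);
# phi and psi are Boolean functions on worlds.

INF = float("inf")

def plausibility(spheres, phi):
    """Index of the first sphere containing a phi-world; INF if none."""
    for p, sphere in enumerate(spheres, start=1):
        if any(phi(w) for w in sphere):
            return p
    return INF

def believes(spheres, phi, psi):
    """Rule 6: B(phi => psi) holds iff every world in the spheres up to
    the plausibility of phi satisfies the material implication."""
    bound = plausibility(spheres, phi)
    for p, sphere in enumerate(spheres, start=1):
        if p > bound:
            break
        if any(phi(w) and not psi(w) for w in sphere):
            return False
    return True
```

Note that when phi has no world at all, the bound is INF, every sphere is checked vacuously, and B(phi => psi) comes out true, matching the remark below that B(¬φ ⇒ ⊥) expresses knowledge of φ.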
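The relation-to-conditional mapping of Table 1 is essentially a lookup from relation names to formula templates. A minimal sketch of this lookup (our own illustration; the string rendering with "=>" for the conditional and "!" for negation is an assumption, not the system's actual syntax):

```python
# Sketch of the Table 1 lookup: each ConceptNet relation maps to one or
# more conditional-belief templates over unary predicates A and B.
# Bidirectional mappings (RelatedTo, Synonym) expand to two conditionals.
MAPPING = {
    "IsA":         ["A(x) => B(x)"],
    "NotIsA":      ["A(x) => !B(x)"],
    "RelatedTo":   ["A(x) => B(x)", "B(x) => A(x)"],
    "DerivedFrom": ["A(x) => B(x)"],
    "Antonym":     ["TRUE => !(A(x) & B(x))"],
    "MotivatedBy": ["B(x) => A(x)"],
    "Synonym":     ["A(x) => B(x)", "B(x) => A(x)"],
    # ... the remaining relations of Table 1 are defined analogously ...
}

def translate(relation, a, b, name):
    """Instantiate the templates for one ConceptNet edge with a standard name."""
    return [t.replace("A(x)", f"{a}({name})").replace("B(x)", f"{b}({name})")
            for t in MAPPING[relation]]
```

For example, the edge IsA(boat, going_fast) instantiated with party #1 yields the single conditional boat(#1) => going_fast(#1), exactly the shape of the entries in the qualitative-evaluation KBs of Section 6.2.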
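The two-stage path search of Section 6.1 (bounded subgraph expansion keeping the 30 highest-weight edges per node, then Dijkstra with edge cost 1/weight) can be sketched as follows. This is our reconstruction under simplifying assumptions, not the original code: the graph is a plain dict from node to (neighbor, weight) pairs, and the alternative unit-cost mode and the relation penalties are omitted.

```python
import heapq

MAX_EDGES = 30  # per-node cap on the highest-weight edges, as in the paper

def subgraph(graph, start, depth):
    """Stage 1: depth-limited expansion keeping the best edges per node."""
    nodes, frontier = {start}, [start]
    for _ in range(depth):
        nxt = []
        for node in frontier:
            best = sorted(graph.get(node, []), key=lambda e: -e[1])[:MAX_EDGES]
            for nb, _w in best:
                if nb not in nodes:
                    nodes.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return nodes

def shortest_path(graph, nodes, start, goal):
    """Stage 2: Dijkstra with cost 1/weight, restricted to the subgraph."""
    queue, seen = [(0.0, start, [start])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nb, w in graph.get(node, []):
            if nb in nodes and nb not in seen and w > 0:
                heapq.heappush(queue, (cost + 1.0 / w, nb, path + [nb]))
    return None
```

With 1/weight costs, higher-weight (stronger) ConceptNet edges are cheaper, so the search prefers paths through strong assertions even when they are longer.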
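The KB-generation step of Section 5.2, producing conditionals (3) to (6) from a ReVerb triple and an optional ReceivesAction hit, can be sketched as below. The function and string syntax are ours for illustration ("!" for negation, "TRUE => ..." for a plain belief, "FALSE" for ⊥); the ConceptNet lookup that would supply `passive` and `receives_action` is stubbed out.

```python
# Sketch (ours, not the authors' code) of the KB generation in Section 5.2:
# given the verb of a ReVerb triple, its passive form (e.g. "eaten"), and a
# concept that ConceptNet's ReceivesAction marks as commonly undergoing it,
# emit the belief conditionals (3)-(6) as strings.

def build_kb(verb, passive=None, receives_action=None):
    kb = [f"!{verb}(#1) => FALSE"]                       # (3): #1 is active
    if passive and receives_action:
        kb.append(f"TRUE => {receives_action}(#2)")      # (4)
        kb.append(f"{verb}(x) => !{passive}(x)")         # (5)
        kb.append(f"!{verb}(x) => {passive}(x)")         # (5)
        kb.append(f"{passive}(x) => {receives_action}(x)")  # (6)
    return kb
```

As the text notes, only the conditional of form (3) is guaranteed: when no ReceivesAction fact exists, the sketch returns just that one formula.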
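The decision step over the two queries of form (7) reduces to an exactly-one-true test. A minimal sketch (ours): the BO reasoner itself is abstracted as an oracle `entails` that answers a query string; the refined exclusive-or antecedent from Section 6.1 would be produced the same way, just with a different antecedent string.

```python
# Sketch of the decision procedure: build the two queries of form (7) for
# the ambiguous target concept and decide only if exactly one comes out
# true. `entails` stands for a call to the BO reasoner (stubbed here).

def decide(entails, concept):
    """Return "#1", "#2", or None (no decision)."""
    q1 = entails(f"{concept}(#1) | {concept}(#2) => {concept}(#1)")
    q2 = entails(f"{concept}(#1) | {concept}(#2) => {concept}(#2)")
    if q1 != q2:                 # exactly one of the two queries holds
        return "#1" if q1 else "#2"
    return None                  # both or neither hold: abstain
```

Abstaining when both or neither query holds is what makes the "decisions" column of Table 2 smaller than the total number of schemas.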