Reasoning about Conditional Beliefs for the Winograd Schema Challenge

Denis Golovin and Jens Claßen
Knowledge-Based Systems Group
RWTH Aachen University
Aachen, Germany

Christoph Schwering
School of Computer Science and Engineering
The University of New South Wales
Sydney, Australia

Abstract

The Winograd Schema Challenge has been proposed as an alternative to the Turing Test for measuring a machine's intelligence by letting it solve difficult pronoun resolution problems that cannot be tackled by statistical analysis alone. While many solutions so far are based on machine learning and natural language processing, we believe that a knowledge-based approach is better suited. In particular, we propose to employ a logic for conditional beliefs that is capable of dealing with incomplete or even inconsistent information (which commonsense knowledge often is). It does so by formalising the observation that humans often reason by picturing different contingencies of what the world could be like, and then choose to believe what is thought to be most plausible. We discuss and evaluate an implementation where relevant commonsense background information is obtained from the ConceptNet semantic network, translated into our formalism, and processed by a reasoner for our logic.

1 Introduction

The Winograd Schema Challenge (WSC) has been proposed by Levesque, Davis and Morgenstern [2012] as an alternative to the Turing Test for measuring a machine's intelligence. While agreeing with Turing's primary concern, which is focusing on the observable behaviour exhibited by the system, the aim is to avoid some of the pitfalls that come with a free-form conversation as Turing had in mind. Most importantly, in order to pass as a person, a machine will have to resort to deception to be able to answer questions about its height, childhood, and so on. Moreover, the judgement of what passes as human-like conversation is highly subjective and may vary depending on the interrogator that is performing the test.

The central idea behind the WSC is to instead let the machine answer a number of questions of a specific form. As an example, consider the statement

  The delivery truck zoomed by the school bus because it was going so slow.

The question is "What was going slow?", i.e., the task is to disambiguate to which of the two parties mentioned in the statement ("the delivery truck", "the school bus") a pronoun ("it") refers. Each schema contains a special word (here: "slow") that, when replaced by an alternate word ("fast"), changes the answer. While obvious to an English-speaking human reader, such questions pose a real challenge for a machine since the given statement alone does not provide enough information to derive an answer, but rather requires having appropriate commonsense background knowledge and being able to reason about it. Winograd schemas are designed to be "Google-proof", meaning that they should not be solvable by simple statistical analysis: if instead of a "delivery truck" the above sentence spoke of a "sports car", it would not be hard to detect a correlation to words associated with fast movement. WSC competitions that were held so far prove that the task is indeed difficult, and likely cannot be solved by classical natural language processing alone. Robust commonsense reasoning, on the other hand, will undoubtedly be beneficial in other areas, in particular when it comes to domestic robots and human-machine interaction in general.

Such a system will not only need access to a large corpus of background information that preferably covers all areas of the wide spectrum of everyday human life, but also be able to function appropriately in light of the fact that such information is often incomplete or even conflicting. Humans are very capable of this, and we usually do so by picturing different contingencies of what the world could be like, and then choosing to believe what we think the most plausible scenario is.

In this paper, we present an approach to the WSC that is based on a logic that formalises these concepts, the logic of conditional beliefs [Schwering et al., 2017]. We designed and implemented a system that
• translates the WSC sentence into a logical formula,
• extracts relevant background information from a knowledge corpus and translates it into our logical language, and
• performs reasoning to decide what the most plausible answer is.

The rest of this paper is organised as follows. In Section 2 we discuss related work. Section 3 shows an overview of our system's architecture. We then present the formal framework of our approach in Section 4. Section 5 describes the implementation of our system, while Section 6 presents an evaluation. Finally we conclude in Section 7.

2 Related Work

Rahman and Ng [2012] view the WSC as a special form of pronoun resolution problem and propose an approach based on machine learning. Their system relies on abstract feature vectors extracted from eight different sources, among which Google and Lexical Features turned out to be most useful in their analysis. For the former, they construct a number of queries from the verb governing the pronoun in the input sentence, the words followed by it, and the two candidate parties, and then count the number of results returned by the search engine. The latter are obtained from a training set and represent the presence or absence of certain predefined structures in the sentence. They achieve an accuracy of 73.1% on their test set, where the overall dataset of 941 sentence pairs was created by 30 students and split using a 70/30 training/test ratio. It should be noted that they considered a relaxed version of the problem: while the WSC only allows for instances where resolving the pronoun requires background knowledge not expressed in the statement, they did not impose this condition on sentences.

Sharma et al. [2015] present a three-step method based on their semantic parser "K-parser": first, the input sentence is brought into a graph representation encoding the conceptual classes of words as well as their dependencies and semantic relations. Next, a series of Google queries is constructed where nominal entities are replaced by wild cards and verbs are substituted by their synonyms. The sentences thus found are then matched against the input by comparing their semantic graphs according to a set of predefined rules. For evaluation, they selected 71 sentences from the WSC corpus, resulting in 53 answers, 49 of which were correct.

Bailey et al. [2015] introduce a new formalism for reasoning about the correlation between sentences, which can either be positive or negative. Intuitively, the correlation between two sentences is positive when after hearing the first sentence, the hearer views the second sentence as more plausible than before. Negative correlation on the other hand lowers the plausibility. For example,

  The delivery truck zoomed by the school bus.

is positively correlated with

  The school bus was going so slow.

but negatively with

  The truck was going so slow.

The authors of the aforementioned paper present the correlation calculus as a deductive system to formalise this idea. It extends classical predicate calculus by a new operator ⊕, where F ⊕ G expresses that F and G are positively correlated, and F ⊕ ¬G denotes a negative correlation. They show how their formalism can be used to derive such correlation formulas from a set of formulas encoding background information and thus justify the answers for several WSC instances. While their work makes an important step towards solving the WSC, they admit that it so far does not address the important problem of generating the needed commonsense background axioms (these were hand coded in their examples), nor did they come up with an implementation that would allow automating the reasoning process and evaluating the approach against others. Their formalism also currently lacks the capability to handle non-monotonic aspects such as the statement

  If the shelf is level, then the sculpture will not roll off.

which, when read as a default rule, allows one to withdraw the conclusion that the sculpture does not roll off in face of new information, e.g. that there is an earthquake.

Finally, Liu et al. [2017] propose another approach based on machine learning, which is used (a) in a knowledge acquisition step to identify cause-effect relationships between commonly used words from large text corpora and (b) to find association relationships in the form of conditional probabilities between pairs of discrete events. They manually selected a subset of 70 Winograd schemas, among which their method achieves an accuracy of 70%.

3 System Architecture Overview

The architecture of our system is visualised in Figure 1. In a preprocessing step, tuples that represent the acting parties, their actions, and their relationships are extracted from the input Winograd schema sentences. For instance, the Winograd schema from Section 1 leads to tuples (truck, zoomed by, bus) and (zoomed by, goes slow). Then a commonsense repository, ConceptNet, is employed to derive further information about the extracted words. These facts come in the form of binary relations; for instance, it may tell us that zooming by is a form of travelling rapidly. From these facts, a conditional knowledge base is constructed in the logic BO, from which the system aims to infer through automated reasoning how to disambiguate the pronoun in the most plausible way. In the school-bus scenario, the system compares the plausibility of party #1 (the truck) versus party #2 (the school bus) being the slow party.

Figure 1: Components overview.

4 A Logic of Plausible Beliefs

Reasoning in our framework is based on a logic of plausible beliefs [Schwering et al., 2017] called BO. This logic features non-monotonic conditionals of the form

  If some premise holds, then plausibly some consequent is true,

which enables us to represent and reason about more and less plausible contingencies. We expect this to be crucial in the context of the WSC, where we aim to find the correct answer by comparing the plausibility of the different candidates. In the following we briefly introduce the formal machinery of this logic.

The terms in our language consist of infinitely many first-order variables, usually denoted as x or y, and infinitely many (standard) names n ∈ N = {#1, #2, ...}. Formulas are of the form
• P(t1, ..., tk), (t1 = t2), (α ∧ β), ¬α, ∀xα,
• B(φ1 ⇒ ψ1), O{φ1 ⇒ ψ1, ..., φm ⇒ ψm},
where P is a predicate symbol other than =, the ti are terms, α and β are formulas, and the φi and ψi are objective formulas, that is, φi and ψi mention no B or O operators. A formula is ground when it contains no variables. A sentence is a formula without free variables. Let ∨, ∃, ⊃, ≡ be the usual abbreviations, ⊤ stand for ∀x(x = x), and ⊥ abbreviate ¬⊤.

Standard names can be seen as special constants that represent all the individuals in the universe. They considerably simplify the semantics compared to the classical Tarskian model theory; in particular, quantification can be handled by simply substituting standard names. The formula B(φ ⇒ ψ) intuitively expresses a conditional belief, namely the belief that if φ is true, then plausibly ψ is also true. The formula O{φ1 ⇒ ψ1, ..., φm ⇒ ψm} goes one step further and says that the conditional beliefs B(φi ⇒ ψi) are all that is believed; everything that is not a consequence of these conditional beliefs is implicitly not believed. The concept is called only-believing; it generalises Levesque's only-knowing [1990] to the case of conditional beliefs and shares many properties with Pearl's System Z [1990]. Only-believing is particularly useful to capture the idea of a conditional knowledge base: the knowledge base is all the agent believes.

To investigate such problems, we define a possible-worlds semantics for BO. A world is a set of ground non-equality atoms. An epistemic state ~e is an infinite sequence of sets of worlds e1, e2, ... such that e1 ⊆ e2 ⊆ ... and for some p ∈ {1, 2, ...}, ep = ep+1 = .... Intuitively, a world is a truth assignment of the ground non-equality atoms, that is, of all P(n1, ..., nj) where the ni are standard names. An epistemic state stratifies sets of such worlds with the subset relation. The intuition behind an epistemic state ~e is to model a system of spheres: e1 contains the most-plausible worlds, e2 adds the second-most-plausible worlds, and so on. Note that in any epistemic state ~e only finitely many different sets of worlds are allowed, since the definition requires that ep = ep+1 = ... for some p. (Just as well we could have defined an epistemic state to be a non-empty finite sequence of supersets; the present definition however is often easier to work with.)

Given an epistemic state ~e and a world w, we can define truth of a sentence α in BO, written ~e, w |= α. We let α[n/x] denote the result of substituting all free occurrences of x by n. The objective part of the semantics is defined inductively as follows:
1. ~e, w |= P(n1, ..., nj) iff P(n1, ..., nj) ∈ w;
2. ~e, w |= (n1 = n2) iff n1 and n2 are identical names;
3. ~e, w |= (α ∧ β) iff ~e, w |= α and ~e, w |= β;
4. ~e, w |= ¬α iff ~e, w ⊭ α;
5. ~e, w |= ∀xα iff ~e, w |= α[n/x] for all n ∈ N.
Notice that Rule 5 handles quantification by substitution of standard names.

Before we proceed with the semantics of conditional belief, we define the plausibility ⌊~e | φ⌋ of an objective sentence φ in ~e as the index of the first sphere consistent with φ:

  ⌊~e | φ⌋ = min{p | p = ∞ or ~e, w |= φ for some w ∈ ep},

where ∞ ∉ {1, 2, ...} represents an "undefined" plausibility with the understanding that p + ∞ = ∞ and p < ∞ for all p ∈ {1, 2, ...}. Then the semantics of beliefs is as follows:
6. ~e, w |= B(φ ⇒ ψ) iff for all p ≥ 1, if p ≤ ⌊~e | φ⌋ and w′ ∈ ep, then ~e, w′ |= (φ ⊃ ψ);
7. ~e, w |= O{φ1 ⇒ ψ1, ..., φm ⇒ ψm} iff for all p ≥ 1, w′ ∈ ep iff ~e, w′ |= ⋀_{i : ⌊~e | φi⌋ ≥ p} (φi ⊃ ψi).

The intuitive meaning of Rule 6 is that B(φ ⇒ ψ) holds iff the most-plausible φ worlds also satisfy ψ. Note that B(φ ⇒ ψ) is vacuously true when there is no φ world. As a consequence, B(¬φ ⇒ ⊥) can be used to express that φ is known. Only-believing O{φ1 ⇒ ψ1, ..., φm ⇒ ψm} has the effect of B(φi ⇒ ψi) plus maximising every sphere: it requires the sphere ep to contain all worlds that satisfy all (φi ⊃ ψi) for which ⌊~e | φi⌋ ≥ p. It turns out that this definition gives rise to a procedure that generates the unique system of spheres ~e that corresponds to O{φ1 ⇒ ψ1, ..., φm ⇒ ψm} [Schwering et al., 2017], which means that a conditional knowledge base has a unique semantic representation as a system of spheres.

While reasoning in the logic as presented here is undecidable given its first-order nature, a decidable (and sometimes tractable) variant based on the theory of limited belief has been developed and implemented [Schwering and Lakemeyer, 2016; Schwering, 2017].

5 Implementation

In order to automate the process of solving the WSC through reasoning with conditional beliefs, we need to (a) translate a given WSC instance into formulas of BO, (b) provide necessary background knowledge in the form of a KB of BO formulas, and (c) call the BO reasoner to answer the query.

5.1 Extraction of Knowledge from ConceptNet 5

Here we adopt an approach based on ConceptNet 5 [Speer and Havasi, 2012], a large semantic network that represents knowledge in a graph structure where each node is a concept (e.g., "zoom", "going fast") and each edge corresponds to one of 28 different relations (e.g., "IsA", "RelatedTo"). Apart from manually entered data, the network integrates information from various sources, including subsets of the OpenCyc and DBPedia ontologies as well as the Wiktionary and WordNet 3.0 dictionaries. The origin of each piece of information is kept in metadata associated to each edge, in particular a numerical weight representing the strength of the assertion. Weights have a default value of 1 but can be higher or lower, where a negative value expresses that the assertion is false or irrelevant.

As opposed to fully-fledged commonsense knowledge bases such as Cyc [Matuszek et al., 2006], ConceptNet 5 does not come with a pre-defined formal semantics and/or inference engine. It can rather be viewed as an intermediate step between the intuitive, informal knowledge used by humans on the one hand and a formal, logic-based representation on the other. As developers, this gives us the freedom to attach meaning to assertions in a way that suits our application.

In our case, we identified 24 of the 28 relations to be useful and mapped them to conditional beliefs. Consider the IsA relation: each corresponding tuple has the form IsA(A, B), where A and B are concepts. Assuming that we represent concepts by unary predicates, we can encode the assertion as an ordinary belief (expressing that it is plausible to assume that if x is an A, it is also a B):

  ⊤ ⇒ ∀x (A(x) ⊃ B(x))   (1)

However, the statement will then only hold in the most plausible sphere, which essentially amounts to doing monotonic reasoning only. We hence instead use the following encoding:

  A(x) ⇒ B(x)   (2)

That is, the most plausible A's are assumed to be B's. Similar mappings have to be defined for the remaining relations. The ones we used are summarised in Table 1, where A ⇒ B stands for formula (2), A ⇒ ¬B is shorthand for A(x) ⇒ ¬B(x), etc.

  Relation         Description                                      Mapping
  IsA              A is a kind of B                                 A ⇒ B
  NotIsA           A is not a kind of B                             A ⇒ ¬B
  RelatedTo        A is related to B                                A ⇒ B and B ⇒ A
  DerivedFrom      A is derived from B                              A ⇒ B
  Antonym          A is the opposite of B                           ¬(A ∧ B)
  MotivatedBy      A is motivated by B                              B ⇒ A
  Synonym          A is a synonym of B                              A ≡ B
  FormOf           B is the root word of A                          A ≡ B
  HasPrerequisite  in order for A to happen, B needs to happen      A ⇒ B
  UsedFor          A is used for B                                  A ⇒ B
  NotUsedFor       A is not used for B                              A ⇒ ¬B
  PartOf           A is part of B                                   B ⇒ A
  HasA             B belongs to A, either as an inherent part or
                   due to a social construct of possession          A ⇒ B
  CapableOf        A is capable of B                                A ⇒ B
  NotCapableOf     A is not capable of B                            A ⇒ ¬B
  ObstructedBy     A is a goal that can be prevented by B           B ⇒ ¬A
  HasProperty      A has B as a property                            A ⇒ B
  NotHasProperty   A does not have B as a property                  B ⇒ ¬A
  Desires          A is a conscious entity that typically wants B   A ⇒ B
  NotDesires       A is a conscious entity that typically does
                   not want B                                       A ⇒ ¬B
  CreatedBy        B is a process or agent that creates A           A ⇒ B
  DefinedAs        A and B overlap considerably in meaning, and
                   B is a more explanatory version of A             A ⇒ B
  Entails          if A is happening, B is also happening           A ⇒ B
  MannerOf         A is a specific way to do B                      A ⇒ B

  Table 1: ConceptNet relations and their mapping

5.2 Representing Winograd Schemas in BO

To create the knowledge base, we start with the given WSC sentence, e.g.

  The fish ate the worm. It was tasty.

Here, the word tasty can be replaced by hungry and the reference of it changes from worm to fish. Note that this is a structure that many (but not all) Winograd schemas exhibit: they start with a statement (unambiguously) describing a situational relationship between two parties ("the fish ate the worm"), followed by an ambiguous description of an effect of it ("it was tasty"). We use the natural language processing tool ReVerb [Fader et al., 2011] to obtain a structured representation for the former in the form of a triple (fish, ate, worm).

Note that it is possible to replace both parties with placeholders and still understand the sentence and resolve the pronoun. The requirement that the names of the two parties do not carry any relevant information for solving a WSC, actually intended as a main difficulty, can here be used to our advantage: instead of "fish" we simply denote the first party by standard name #1, and similarly the "worm" as second party by #2. We will not have to worry about reasoning about the relationship between a fish and a worm, but can rather concentrate on the relation between the concepts ate and tasty.

The triple (fish, ate, worm) now gives us the certain information that the fish (#1) is the active party in the event "ate", which we represent by adding

  ¬ate(#1) ⇒ ⊥   (3)

to the KB. Next, we consult ConceptNet to check if more information about the two parties is available. Specifically, we try to use the fact that the worm (#2) is the passive party in the event. We hence search for a concept which has similar properties as an eaten worm, that is to say something which is commonly known to be eaten, i.e., something which is the passive party in the event of ate. After determining eaten to be the passive form of ate through the common root word eat, we therefore query ConceptNet to find instances of the ReceivesAction relation (that we only use for this specific purpose). In the example, ConceptNet suggests that the concept apple is commonly known to be eaten since ReceivesAction(apple, eaten) has a high weight. We hence represent that the worm is the passive party by the formula

  ⊤ ⇒ apple(#2)   (4)

The new expression can be read as it is believed that the second party is active in being an 'apple'. The fact that the passive form of ate is eaten is further represented as follows:

  ate(x) ⇒ ¬eaten(x)
  ¬ate(x) ⇒ eaten(x)   (5)

The new beliefs can be read as it is believed that if party x is active in the event 'ate' it presumably is not active in the event 'eaten', and vice versa. Finally, we add the belief that if a party is active in the event of eaten, then it is presumably active in the event of being an apple:

  eaten(x) ⇒ apple(x)   (6)

Note that beliefs of the form (4) to (6) are only generated in case a corresponding ReceivesAction actually exists (which is often not the case); only the ones of the form (3) are guaranteed to be present in every WSC instance.

5.3 Reasoning

Starting with the formulas generated for the WSC sentence as demonstrated in Section 5.2, we search for a path in ConceptNet that connects the two target concepts ate and tasty identified from the WSC sentence (using normalisation like elimination of stop words and word stemming if needed). All relations on this path are mapped to formulas as described in Section 5.1, instantiated by the standard names #1, #2 representing the two parties. We then call the BO reasoner on the resulting KB with the two queries

  tasty(#1) ∨ tasty(#2) ⇒ tasty(#1)
  tasty(#1) ∨ tasty(#2) ⇒ tasty(#2)   (7)

A decision is then made if exactly one of them comes out as true and the other as false.

6 Evaluation

6.1 Quantitative Evaluation

We performed a preliminary experimental evaluation of our implemented system on the publicly available Winograd schema collection (http://www.cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html) containing 144 schemas in total. Since ConceptNet does not directly support querying paths between concepts, we search in a two-stage process. First, a subgraph of ConceptNet is generated in memory, which we do by starting with the target concepts and successively adding adjacent edges and nodes, where for each node the (at most) 30 edges with highest weight are considered. We then search for the actual path by means of the Dijkstra algorithm, alternatingly using edge costs of 1/weight or 1 (the latter because sometimes ConceptNet's weights are not reliable).

Table 2 shows the results for increasing depth limits in terms of the number of decisions made (recall that this means that exactly one of the two queries comes out true) and the fraction of correctly answered sentences. While obviously a search depth of zero yields no results, neither does a limit of one, meaning none of the concepts mentioned in Winograd schemas are directly connected in ConceptNet, showing that these schemas are indeed hard in some sense. Once we set the limit to two or above, decisions are made with increasing accuracy (though not much better than guessing). With the current implementation we were not able to increase the depth beyond three due to reaching memory limitations.

  depth   decisions   correctly answered
  0       0           0
  1       0           0
  2       103         52 (50.5%)
  3       159         85 (53.5%)

  Table 2: Quantitative evaluation

After looking into some of the examples where wrong or no decisions were made, we adjusted our method slightly. First, we know that exactly one of the two parties is referred to by the pronoun. Moreover, while one party (the fish) actively participates (eats) in the described scenario as encoded by (3), often the other party (the worm) is not the active one. We hence changed the antecedents of the belief conditionals in the queries (7) to formulas of the form

  (tasty(#1) ⊕ tasty(#2)) ∧ ¬ate(#2)

where ⊕ denotes exclusive-or. Furthermore, we found that the returned paths disproportionally often contain the RelatedTo, Synonym and IsA relations. To discourage such overuse, we introduced a penalty on the weights for the most frequently appearing relations. These modifications, with depth limit three, resulted in 93 decisions and a success rate of 54.8%, i.e., 51 correct and 42 incorrect answers.

6.2 Qualitative Evaluation

To illustrate our system's operation, let us look at a schema successfully resolved by it in more depth, namely "The delivery truck zoomed by the school bus because it is going so fast/slow". ReVerb maps the sentence to the triple (delivery truck, zoomed by, school bus), yielding the two target concepts zoomed by and going fast (or going slow, respectively). A search in ConceptNet returns the following path between them:

  zoomed -RelatedTo-> zoom -Synonym-> whizz -RelatedTo-> sound -RelatedTo-> wave -RelatedTo-> water -RelatedTo-> boat -IsA-> going fast.

The KB constructed from this is then

  {¬zoomed(#1) ⇒ ⊥,
   zoomed(n) ⇒ zoom(n),
   zoom(n) ⇒ zoomed(n),
   ⊤ ⇒ whizz(n) ≡ zoom(n),
   whizz(n) ⇒ sound(n),
   sound(n) ⇒ wave(n),
   wave(n) ⇒ sound(n),
   wave(n) ⇒ water(n),
   water(n) ⇒ wave(n),
   boat(n) ⇒ water(n),
   water(n) ⇒ boat(n),
   boat(n) ⇒ going_fast(n) | n ∈ {#1, #2}}

which correctly suggests that party #1 is the correct one, i.e. the delivery truck. For the alternate sentence we obtain the KB

  {¬zoomed(#1) ⇒ ⊥,
   zoomed(n) ⇒ zoom(n),
   zoom(n) ⇒ zoomed(n),
   zoom(n) ⇒ travel(n),
   rush(n) ⇒ travel(n),
   zoom(n) ⇒ travel_rapidly(n),
   ⊤ ⇒ hurry(n) ≡ rush(n),
   ⊤ ⇒ hurry(n) ≡ travel_rapidly(n),
   ⊤ ⇒ ¬(rush(n) ∧ go_slow(n)) | n ∈ {#1, #2}}

which similarly relies heavily on RelatedTo, Synonym, and IsA, but additionally uses that rush is an antonym of go slow. The system again yields the correct answer #2 (the school bus) to the query "what is going slow?".

The examples demonstrate that most generated conditionals seem reasonable and relevant for the schema, however sometimes interspersed with ones that are not as intuitive. The latter is often due to the RelatedTo relation, whose meaning is on the one hand very vague, and which on the other hand accounts for the majority of edges in ConceptNet. In any case, even if the system yields a wrong answer, we obtain human-understandable justifications for the automatically made decisions, which we believe is a major benefit of a knowledge-based approach to the WSC.

7 Conclusion

We presented a solution to the Winograd Schema Challenge based on reasoning about conditional beliefs, where knowledge bases are generated by means of the NLP tool ReVerb and the semantic net ConceptNet 5. Unlike many other approaches that rely on searching text repositories (Google), we thus adopt a knowledge-based approach that, when it comes to the reasoning part, has the benefit that the generated BO KB can be regarded as a human-readable justification for the decision made by the system, which moreover comes with a clear formal semantics. We also employ a major difficulty of the WSC to our advantage, namely that the two parties in a schema do not carry any relevant information and could equally be replaced by placeholders.

While we do not achieve a very high quantitative success rate on the WSC dataset, our results are consistently better than chance, indicating that this may be a step in the right direction. Our system's performance of course heavily depends on the quality of data fed into it, both from the NLP module's and ConceptNet's side. In particular, information in ConceptNet on many topics is rather sparse and relies heavily on the RelatedTo relation, whose meaning is very vague and ambiguous.

There are many directions for future work. First of all, further experimentation with our system may be in order. Our mapping from ConceptNet relations to belief conditionals is not set in stone, and quite simple in the sense that many relations are represented in the same way. It would be interesting (and not much effort to implement in our existing framework) to try other translations. For example, at the moment our system neglects that there may be interactions between the different relations, e.g. we may want to deduce that X is not capable of doing Y from the facts that X is not capable of doing Z and that Z is a prerequisite of Y.

Furthermore, ConceptNet was only one possible choice for a commonsense knowledge repository, and the system may benefit from trying other alternatives such as the Never-Ending-Learning (NELL) [Mitchell et al., 2015] KB. In contrast to ConceptNet, NELL employs a large number of lower-level relations such as playsInstrument, which on the one hand would reduce ambiguity, but on the other hand increase the required modelling effort, i.e., constructing a mapping.

Acknowledgments

This work was supported by the German Research Foundation (DFG) research unit FOR 1513 on Hybrid Reasoning for Intelligent Systems, project A1.

References

[Bailey et al., 2015] Dan Bailey, Amelia Harrison, Yuliya Lierler, Vladimir Lifschitz, and Julian Michael. The Winograd schema challenge and reasoning about correlation. In Working Notes of the Symposium on Logical Formalizations of Commonsense Reasoning, 2015.

[Fader et al., 2011] Anthony Fader, Stephen Soderland, and Oren Etzioni. Identifying relations for open information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1535–1545. Association for Computational Linguistics, 2011.

[Levesque et al., 2012] Hector J. Levesque, Ernest Davis, and Leora Morgenstern. The Winograd Schema Challenge. In Gerhard Brewka, Thomas Eiter, and Sheila A. McIlraith, editors, Proceedings of the Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning (KR 2012), pages 552–561. AAAI Press, 2012.

[Levesque, 1990] Hector J. Levesque. All I know: a study in autoepistemic logic. Artificial Intelligence, 42(2), 1990.

[Liu et al., 2017] Quan Liu, Hui Jiang, Andrew Evdokimov, Zhen-Hua Ling, Xiaodan Zhu, Si Wei, and Yu Hu. Cause-effect knowledge acquisition and neural association model for solving a set of Winograd schema problems. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI), pages 2344–2350. AAAI Press, 2017.

[Matuszek et al., 2006] Cynthia Matuszek, John Cabral, Michael Witbrock, and John Deoliveira. An introduction to the syntax and content of Cyc. In Proceedings of the 2006 AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, pages 44–49, 2006.

[Mitchell et al., 2015] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling. Never-ending learning. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), pages 2302–2310. AAAI Press, 2015.

[Pearl, 1990] Judea Pearl. System Z: A natural ordering of defaults with tractable applications to nonmonotonic reasoning. In Proceedings of the Third Conference on Theoretical Aspects of Reasoning about Knowledge (TARK), pages 121–135. Morgan Kaufmann, 1990.

[Rahman and Ng, 2012] Altaf Rahman and Vincent Ng. Resolving complex cases of definite pronouns: The Winograd schema challenge. In Jun'ichi Tsujii, James Henderson, and Marius Pasca, editors, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), pages 777–789. ACL, 2012.

[Schwering and Lakemeyer, 2016] Christoph Schwering and Gerhard Lakemeyer. Decidable reasoning in a first-order logic of limited conditional belief. In Proceedings of the Twenty-Second European Conference on Artificial Intelligence (ECAI), pages 1379–1387. IOS Press, 2016.

[Schwering et al., 2017] Christoph Schwering, Gerhard Lakemeyer, and Maurice Pagnucco. Belief revision and projection in the epistemic situation calculus. Artificial Intelligence, 251:62–97, 2017.

[Schwering, 2017] Christoph Schwering. Limbo: A reasoning system for limited belief. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI), pages 5246–5248. AAAI Press, 2017.

[Sharma et al., 2015] Arpit Sharma, Nguyen Ha Vo, Somak Aditya, and Chitta Baral. Towards addressing the Winograd schema challenge – building and using a semantic parser and a knowledge hunting module. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI), pages 1319–1325. AAAI Press, 2015.

[Speer and Havasi, 2012] Robert Speer and Catherine Havasi. Representing general relational knowledge in ConceptNet 5. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), pages 3679–3686. European Language Resources Association (ELRA), 2012.
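The objective fragment of the semantics (Rules 1 to 5) is simple enough to prototype directly. The following Python sketch is our own illustration, not the authors' reasoner; in particular, the logic has infinitely many standard names, whereas this sketch quantifies over a finite set NAMES, so Rule 5 is only an approximation.

```python
# Minimal model checker for the objective fragment of BO (Rules 1-5).
# A world is a set of ground atoms such as ("ate", ("#1",)); an atom holds
# iff it is a member of the world. Formulas are nested tuples.

NAMES = ["#1", "#2"]  # finite approximation of the infinite name set

def holds(world, formula):
    """Evaluate an objective formula in a world (Rules 1-5)."""
    op = formula[0]
    if op == "atom":                       # Rule 1: P(n1,...,nj) in w?
        _, pred, args = formula
        return (pred, args) in world
    if op == "eq":                         # Rule 2: identical names
        _, n1, n2 = formula
        return n1 == n2
    if op == "and":                        # Rule 3
        _, a, b = formula
        return holds(world, a) and holds(world, b)
    if op == "not":                        # Rule 4
        _, a = formula
        return not holds(world, a)
    if op == "forall":                     # Rule 5: substitute names for x
        _, x, a = formula
        return all(holds(world, subst(a, x, n)) for n in NAMES)
    raise ValueError(op)

def subst(formula, x, n):
    """Replace free occurrences of variable x by name n."""
    op = formula[0]
    if op == "atom":
        _, pred, args = formula
        return ("atom", pred, tuple(n if a == x else a for a in args))
    if op == "eq":
        _, a, b = formula
        return ("eq", n if a == x else a, n if b == x else b)
    if op == "and":
        return ("and", subst(formula[1], x, n), subst(formula[2], x, n))
    if op == "not":
        return ("not", subst(formula[1], x, n))
    if op == "forall":
        _, y, a = formula
        return formula if y == x else ("forall", y, subst(a, x, n))
    raise ValueError(op)
```

For instance, in the world {("ate", ("#1",))}, the sentence ∀x ¬eaten(x) holds, since no eaten atom is in the world.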
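The plausibility measure and Rule 6 can likewise be sketched over a finite system of spheres (since epistemic states are eventually constant, representing them as a finite list e1 ⊆ e2 ⊆ ... loses nothing). This is an illustrative reconstruction of ours, not the authors' implementation; worlds are frozensets of ground atoms, and objective sentences are passed as Python predicates on worlds.

```python
# Sketch of the plausibility measure and Rule 6 over a finite list of
# spheres. Each sphere is a set of worlds (frozensets of ground atoms);
# phi and psi are Boolean functions on worlds.

INF = float("inf")

def plausibility(spheres, phi):
    """Index of the first sphere containing a phi-world; INF if none."""
    for p, sphere in enumerate(spheres, start=1):
        if any(phi(w) for w in sphere):
            return p
    return INF

def believes(spheres, phi, psi):
    """Rule 6: B(phi => psi) holds iff every world in the spheres up to
    the plausibility of phi satisfies the material implication."""
    bound = plausibility(spheres, phi)
    for p, sphere in enumerate(spheres, start=1):
        if p > bound:
            break
        if any(phi(w) and not psi(w) for w in sphere):
            return False
    return True
```

Note that when phi has no world at all, the bound is INF, every sphere is checked vacuously, and B(phi => psi) comes out true, matching the remark below that B(¬φ ⇒ ⊥) expresses knowledge of φ.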
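The relation-to-conditional mapping of Table 1 is essentially a lookup from relation names to formula templates. A minimal sketch of this lookup (our own illustration; the string rendering with "=>" for the conditional and "!" for negation is an assumption, not the system's actual syntax):

```python
# Sketch of the Table 1 lookup: each ConceptNet relation maps to one or
# more conditional-belief templates over unary predicates A and B.
# Bidirectional mappings (RelatedTo, Synonym) expand to two conditionals.
MAPPING = {
    "IsA":         ["A(x) => B(x)"],
    "NotIsA":      ["A(x) => !B(x)"],
    "RelatedTo":   ["A(x) => B(x)", "B(x) => A(x)"],
    "DerivedFrom": ["A(x) => B(x)"],
    "Antonym":     ["TRUE => !(A(x) & B(x))"],
    "MotivatedBy": ["B(x) => A(x)"],
    "Synonym":     ["A(x) => B(x)", "B(x) => A(x)"],
    # ... the remaining relations of Table 1 are defined analogously ...
}

def translate(relation, a, b, name):
    """Instantiate the templates for one ConceptNet edge with a standard name."""
    return [t.replace("A(x)", f"{a}({name})").replace("B(x)", f"{b}({name})")
            for t in MAPPING[relation]]
```

For example, the edge IsA(boat, going_fast) instantiated with party #1 yields the single conditional boat(#1) => going_fast(#1), exactly the shape of the entries in the qualitative-evaluation KBs of Section 6.2.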
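The two-stage path search of Section 6.1 (bounded subgraph expansion keeping the 30 highest-weight edges per node, then Dijkstra with edge cost 1/weight) can be sketched as follows. This is our reconstruction under simplifying assumptions, not the original code: the graph is a plain dict from node to (neighbor, weight) pairs, and the alternative unit-cost mode and the relation penalties are omitted.

```python
import heapq

MAX_EDGES = 30  # per-node cap on the highest-weight edges, as in the paper

def subgraph(graph, start, depth):
    """Stage 1: depth-limited expansion keeping the best edges per node."""
    nodes, frontier = {start}, [start]
    for _ in range(depth):
        nxt = []
        for node in frontier:
            best = sorted(graph.get(node, []), key=lambda e: -e[1])[:MAX_EDGES]
            for nb, _w in best:
                if nb not in nodes:
                    nodes.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return nodes

def shortest_path(graph, nodes, start, goal):
    """Stage 2: Dijkstra with cost 1/weight, restricted to the subgraph."""
    queue, seen = [(0.0, start, [start])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nb, w in graph.get(node, []):
            if nb in nodes and nb not in seen and w > 0:
                heapq.heappush(queue, (cost + 1.0 / w, nb, path + [nb]))
    return None
```

With 1/weight costs, higher-weight (stronger) ConceptNet edges are cheaper, so the search prefers paths through strong assertions even when they are longer.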
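The KB-generation step of Section 5.2, producing conditionals (3) to (6) from a ReVerb triple and an optional ReceivesAction hit, can be sketched as below. The function and string syntax are ours for illustration ("!" for negation, "TRUE => ..." for a plain belief, "FALSE" for ⊥); the ConceptNet lookup that would supply `passive` and `receives_action` is stubbed out.

```python
# Sketch (ours, not the authors' code) of the KB generation in Section 5.2:
# given the verb of a ReVerb triple, its passive form (e.g. "eaten"), and a
# concept that ConceptNet's ReceivesAction marks as commonly undergoing it,
# emit the belief conditionals (3)-(6) as strings.

def build_kb(verb, passive=None, receives_action=None):
    kb = [f"!{verb}(#1) => FALSE"]                       # (3): #1 is active
    if passive and receives_action:
        kb.append(f"TRUE => {receives_action}(#2)")      # (4)
        kb.append(f"{verb}(x) => !{passive}(x)")         # (5)
        kb.append(f"!{verb}(x) => {passive}(x)")         # (5)
        kb.append(f"{passive}(x) => {receives_action}(x)")  # (6)
    return kb
```

As the text notes, only the conditional of form (3) is guaranteed: when no ReceivesAction fact exists, the sketch returns just that one formula.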
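The decision step over the two queries of form (7) reduces to an exactly-one-true test. A minimal sketch (ours): the BO reasoner itself is abstracted as an oracle `entails` that answers a query string; the refined exclusive-or antecedent from Section 6.1 would be produced the same way, just with a different antecedent string.

```python
# Sketch of the decision procedure: build the two queries of form (7) for
# the ambiguous target concept and decide only if exactly one comes out
# true. `entails` stands for a call to the BO reasoner (stubbed here).

def decide(entails, concept):
    """Return "#1", "#2", or None (no decision)."""
    q1 = entails(f"{concept}(#1) | {concept}(#2) => {concept}(#1)")
    q2 = entails(f"{concept}(#1) | {concept}(#2) => {concept}(#2)")
    if q1 != q2:                 # exactly one of the two queries holds
        return "#1" if q1 else "#2"
    return None                  # both or neither hold: abstain
```

Abstaining when both or neither query holds is what makes the "decisions" column of Table 2 smaller than the total number of schemas.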