Using Semantic Web Resources for Solving Winograd Schemas: Sculptures, Shelves, Envy, and Success*

Peter Schüller (Computer Engineering Department, Faculty of Engineering, Marmara University, Istanbul, Turkey; peter.schuller@marmara.edu.tr)
Mishal Kazmi (Faculty of Engineering and Natural Science, Sabanci University, Istanbul, Turkey; mishalkazmi@sabanciuniv.edu)

ABSTRACT

Winograd Schemas are sentences where a pronoun must be linked to one of two possible entities in the same sentence. Deciding correctly which entity should be linked was proposed as an alternative to the Turing test. Knowledge is a critical component of solving this challenge, and Linked Data resources promise to be useful to that end. We discuss two example Winograd Schemas and related knowledge that can be discovered by manual search in WikiData, DBPedia, BabelNet, freebase, WordNet, VerbNet, and the Component Library. We find that these resources are difficult to leverage because (i) they mix named entities with expert jargon and generic ontological knowledge, (ii) annotation tools are lacking, and (iii) commonsense knowledge is kept implicit.

1. INTRODUCTION

The Winograd Schema Challenge (WSC) [20, 12] was proposed as a more practical alternative to the Turing Test. An example is the following Winograd Schema (WS):

  [The sculpture]a rolled off [the shelf]b because [it]X wasn't anchored.  (ScAnchor)

  [The sculpture]a rolled off [the shelf]b because [it]X wasn't level.  (ScLevel)

Each sentence in such a schema poses a coreference ambiguity problem between the phrases marked in square brackets. This example has two candidate solutions: X = a and X = b. An important property of WSs is that the correct solution differs between the two sentences, and that the sentences differ only in a single word ('level' vs. 'anchored'): the correct solution of (ScAnchor) is X = a, while for (ScLevel) it is X = b. Because of this property it has been argued that purely statistical approaches will be insufficient for beating the WSC and that methods of (symbolic) knowledge representation and reasoning will be necessary [12].

Reasoning requires knowledge; the biggest repository of knowledge is arguably the Internet, but it consists mostly of unstructured information. The Linked Data effort [4] structures data in a way that makes it machine readable, hence usable for automated reasoning. Therefore, using the Semantic Web as a knowledge resource for tackling the WSC is highly suggestive. But knowledge is more than data, so how far can we get with existing resources?

In this work we discuss two examples of WSs and attempt to resolve them using data repositories typically considered part of the Semantic Web, together with other resources. We show that repositories like WikiData, DBPedia, BabelNet, and freebase are necessary but not sufficient by themselves: they contain mostly taxonomic knowledge and (historical) facts about named entities. Most existing Winograd Schemas [7], on the contrary, do not refer to historical events or entities; they can be understood out of the blue (i.e., without additional context) using commonsense knowledge [14] that is shared by humans because they live in a similar world as other humans.¹ In the above WS such knowledge is that anchoring/fixing an object (usually) prevents its movement. But is such commonsense knowledge represented in Semantic Web resources?

In this work we first outline how to perform reasoning, following the idea that many schemas can be resolved using correlation [2].² Then we give an, in our opinion, representative part of the background knowledge obtainable from existing resources by manual search.

Our contribution is to point out the potential use of Semantic Web resources for tackling Winograd Schemas and to show problems that become apparent while doing so. The main issues we point out are as follows.

• Misinterpreting the topic of a sentence causes annotation of many wrong entities, in particular if knowledge about named entities or expert jargon is preferred over generic concepts. Therefore tools that annotate text with the correct links (concepts or entities) are crucial.

• A high level of detail in textual descriptions, or a varying level of detail across such descriptions, can mislead reasoning.

• Missing commonsense knowledge is a limiting factor, but annotating Web content with links to common vocabularies has the potential to enable future work that mines such knowledge from the (annotated) Web.

* This work has been supported by Scientific and Technological Research Council of Turkey (TUBITAK) Grant 114E777.
¹ We will disregard the question where commonsense knowledge ends and where culture-dependent knowledge starts.
² Note that such reasoning need not be based on symbolic logic; we can similarly envision realizing it statistically.

2. REASONING ABOUT CORRELATION

Why is correlation [2] a possibility for resolving coreferences in the WSC? If we split (ScAnchor) and (ScLevel) into two sentences, including all possible resolutions of the pronoun, we obtain the following simple statements.
The sculpture rolled off the shelf.  (1)
The sculpture wasn't anchored.  (2)
The shelf wasn't anchored.  (3)
The sculpture wasn't level.  (4)
The shelf wasn't level.  (5)

In the original schema, the word 'because' raises an expectation in the reader, namely that the second sentence is a plausible reason for the first sentence. One way to handle this plausibility is to reduce it to correlation: for example, by checking whether (1) has a higher correlation with (2) than with (3), we can find the correct solution.

But what kind of correlation do we use in this case? The topic of all three sentences is related to movement or its impossibility. Hence the correlation can be about properties within that topic. (1) and (2) pertain to movement of the sculpture, while (3) pertains to movement of the shelf. In the absence of knowledge about the meaning of anchoring, this can be enough to infer the correct solution, namely that (1) and (2) are more strongly correlated than (1) and (3). If we additionally know that anchoring prevents rolling and that not anchoring allows rolling (this can be seen as a positive correlation), then we can also infer the correct solution: (2) better fulfills the expectation raised by 'because', yielding the solution X = a. Note that we deal with (ScLevel) in the next section.

Our second example is the following schema.

  [Pete]a envies [Martin]b because [he]X is very successful.  (EnvyBecause)

  [Pete]a envies [Martin]b although [he]X is very successful.  (EnvyAlthough)

Extracted parts of this schema are

Pete envies Martin.  (6)
Pete is very successful.  (7)
Martin is very successful.  (8)

Note that this time the same three sentences are used for resolving (EnvyBecause) and (EnvyAlthough). The only difference between (EnvyBecause) and (EnvyAlthough) is the connective between the sentences: 'because' (again) raises an expectation of positive correlation, whereas 'although' raises an expectation of an exception: although we would normally assume that Pete is not envious (because he is successful), he actually is. Intuitively, the object of 'to envy' is correlated with being successful, and the subject of 'to envy' is correlated with not being successful. Therefore the solution for (EnvyBecause) can be found by maximizing correlation, while we need to minimize correlation for (EnvyAlthough).

Theoretical Justification. The expectation of correlation is explained by linguistic theories about discourse structure and discourse coherence (e.g., [13, 1]). For simplicity, our examples show only explicit discourse structure indicated by 'because' or 'although'. However, in a coherent text all sentences are related in a structure, often a tree structure, and often not explicitly marked. Two examples of frequent implicit discourse relations are temporal order (time usually progresses forward from one sentence to the subsequent one) and elaboration (a topic is explained in more detail in a subsequent sentence). Examples of further explicit discourse connectives are 'but', 'hence', 'in order to', and so on.

3. SEMANTIC WEB KNOWLEDGE

We now investigate how to obtain the required knowledge from resources integrated into the Semantic Web and from similar resources built by other communities. We consider WikiData [19], which is a language- and presentation-independent annotated data backend for Wikipedia; DBPedia [10], which contains RDF triples extracted from Wikipedia infoboxes; freebase [5], which is a community-built knowledge graph repository; and BabelNet [17], which connects several Wikipedia projects with the linguistic resource WordNet [15]. Furthermore we use the linguistic resources WordNet and VerbNet [9], and the Component Library Clib [3], which is a commonsense knowledge resource.

For WikiData, freebase, WordNet, VerbNet, and Clib, we look up single words. We also search in the Falcons Semantic Web search engine [18], which performs search and ranking in most of the above resources. Additionally, we annotate the whole schemas using the annotation engines DBPedia Spotlight³ [6] and Babelfy⁴ [16], the latter linking to BabelNet.

³ http://spotlight.dbpedia.org/rest/annotate
⁴ http://babelfy.org/index

3.1 Disambiguating the Anchored Schema

(ScAnchor) can be disambiguated if we have knowledge that (i) sentences (1), (2), and (3) are about the same topic (movement of a physical object); and that (ii) anchoring prevents movement and rolling is movement. We will take for granted that we have linguistic knowledge and mechanisms that allow us to identify the subject and object of 'rolled' and to handle the predicate 'wasn't': these are research areas of their own.

Useful knowledge. WikiData has definitions for 'sculpture' as well as 'shelf', classifying them as 'three-dimensional work of art' and as 'furniture', respectively, and both are a subclass of 'artificial physical object'. VerbNet contains an entry classifying 'roll off' as 'move'. Babelfy produces several correct annotations: 'rolled' is linked to 'roll', which is a kind of 'move'; 'anchored' is linked to 'anchor', which is a kind of 'fix', which is a kind of 'attach'. Falcons finds WordNet and WikiData concepts for 'sculpture', 'to roll', and 'shelf', ranking correct data high but not in first place. Results for 'anchored' are not helpful, but searching for 'anchor' reveals the correct WordNet entry.

While attaching intuitively prevents moving, this knowledge cannot be found easily. Clib contains nearly enough information to infer this causal relation: 'Move' has a precondition that the object to be moved does not have the property 'Be-Restrained', which inherits from 'Be-Obstructed'. The problem is that obstruction or restraint is caused by 'Move', and the separate event 'Attach' is not causally related to restraining ('Be-Restrained' contains the linguistic annotation 'fixed', but fixing and attaching are modeled as different concepts). While it seems that enough knowledge is available, automatically linking that knowledge in the right way is not trivial.
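The annotation step used here can also be scripted against the Spotlight REST interface given in footnote 3. The sketch below is a minimal illustration: the 'text' and 'confidence' parameters and the JSON field names ('Resources', '@surfaceForm', '@URI') reflect our understanding of the historic Spotlight web service and should be treated as assumptions, since the endpoint may have moved or changed.

```python
# Minimal sketch of programmatic annotation via the DBPedia Spotlight
# REST interface (footnote 3). Parameter and JSON field names are
# assumptions about the historic web service.
import json
import urllib.parse
import urllib.request

SPOTLIGHT_URL = "http://spotlight.dbpedia.org/rest/annotate"

def build_request(text, confidence=0.5):
    """Prepare an annotation request that asks for JSON output."""
    params = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(SPOTLIGHT_URL + "?" + params,
                                  headers={"Accept": "application/json"})

def annotate(text):
    """Send the request and collect (surface form, URI) pairs.
    Requires network access; the endpoint may no longer be online."""
    with urllib.request.urlopen(build_request(text)) as resp:
        data = json.load(resp)
    return [(r["@surfaceForm"], r["@URI"]) for r in data.get("Resources", [])]

req = build_request("The sculpture rolled off the shelf.")
print(req.full_url)
```

Running such a script on the schema sentences and inspecting the returned URIs automates the manual lookup experiment described in this section.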
Next we show that there is also additional knowledge that could be linked and that is counterproductive towards the goal of reasoning about the intuition of our example.

Misleading knowledge. freebase provides several entries for 'sculpture', mostly about the art form of sculpting, and few about (very specific) physical objects. The closest helpful entry is 'statue', which, according to its definition, '... is a sculpture representing one or more people or animals ....' For 'shelf' the first hit is the correct entry; however, in its definition we find that 'A shelf is a flat horizontal plane ... to hold items ... It is raised off the ground and usually anchored/supported on its shorter length sides by brackets.' The definition of sculpture (or statue) does not contain information about anchoring, hence a system that heuristically evaluates correlation will conclude that shelves are more likely to be anchored than sculptures. Therefore (ScAnchor) will be disambiguated in the wrong way, even though we have only truthful evidence. The problem is that the heuristic fails because (by chance) one definition contains misleading information. (Actually, searching the web yields many do-it-yourself forums with information about proper ways to anchor shelves to the wall, and not nearly as many about anchoring statues; hence the correlation with anchoring seems to be higher for shelves than for statues. However, we need to consider the correlation between rolling an object and not anchoring that object.)

Babelfy provides the following annotations: 'sculpture' is a 'three-dimensional figure', which is a 'shape', which is a 'mathematical object'; moreover, 'sculpture' is a subclass of several art forms. 'Shelf' is a 'furniture', which is a 'decorative art', which is a 'perceptible object'; moreover, 'shelf' is a 'support', which is a 'machine' and a 'tool'. In summary, 'sculpture' is classified as an intangible abstract concept, and 'shelf' is also classified as a tool, which can be misleading.

DBPedia Spotlight annotates 'sculpture' with a particular species of sea snail, and 'shelf' with 'Shelf life' ('shelf' here meaning a shallow coastal area of the sea). While 'rolled' and 'anchored' are not associated with any entity, a search on the web reveals more potentially misleading information: there are 'roll anchors' for anchoring ships in the shelf zone; moreover, rolling is a specific movement of ships induced by wind. While the presence of this (expert jargon) knowledge in DBPedia is no problem, its usage is a problem: it should be linked only when significant evidence suggests that the text is about anchoring ships near the coast. It seems that Babelfy performs better than DBPedia Spotlight, although this can be a random effect due to the limited number of examples we are looking at. While Falcons offers a choice between 'object' and 'concept' in the search, this does not seem to provide the required distinction: expert jargon is always contained in the search results.

Note that we mainly discussed 'sculpture' because for other content words misleading knowledge cannot be found to such an extent. Due to the amount of available knowledge, separating useful from irrelevant knowledge is crucial.

3.2 Disambiguating the Level Schema

For (ScLevel) we do not need correlation: if we can show that, among the two candidates (4) and (5), the second one is a reasonable property while the first is an unreasonable one, then we find the correct result. To show this we require the following knowledge: (i) a shelf is usually a flat entity, (ii) a sculpture is usually not flat, and (iii) 'level' is a potential property of flat entities.⁵

Useful knowledge. freebase's entry for 'shelf' states 'A shelf is a flat horizontal plane [...]', and most of its entries for 'level' refer to 'horizontal' or 'plane' in their definitions. This allows us to infer that 'level' is a more likely property of 'shelf' than of 'sculpture', yielding the solution X = b.

Misleading knowledge. DBPedia Spotlight wrongly links 'level' to 'deck of a ship', again using the wrong topic area. Babelfy wrongly links 'level' to 'level of a game'. Falcons provides many pages of search results, but the order of results is misleading: the first five entries are related to 'level of visibility'.

⁵ The words 'usually' and 'potential' point out that this knowledge is default knowledge and can be defeated by more specific knowledge (to account for atypical cases).

3.3 Disambiguating both Envy Schemas

(EnvyBecause) and (EnvyAlthough) differ only in the rhetorical relation; therefore the same knowledge is relevant.

Useful knowledge. DBPedia Spotlight correctly links 'envies' to an emotion which 'occurs when a person lacks another's superior quality, achievement, or possession and either desires it or wishes that the other lacked it'. VerbNet does not contain an entry for 'to envy', but it has one for 'success', which is a potential property of humans according to several of its free-text definitions. WikiData provides as its first result for 'envy' the same entry as DBPedia Spotlight. Additionally, for 'success' it contains an entry for 'achievement of a goal' and one for 'victory'. (For 'successful' there are only entries related to art pieces.) Babelfy links 'envies' to 'to envy', which is a subclass of 'to admire', which is a subclass of 'to think'; moreover, 'successful' is linked to the entry of the same name, but this entry does not contain any classification. Falcons provides useful results, linking 'successful' to the corresponding WordNet entry, and 'to envy' to 'jealousy'. These pieces of knowledge can be sufficient for our purpose, in particular the definition of 'to envy' in connection with linking success to 'achievement'.

Misleading knowledge. However, there is also misleading and missing knowledge. freebase provides many results for 'envy', 'envies', 'success', and 'successful', most of them names of art pieces. DBPedia Spotlight links 'Pete' and 'Martin' to TV programs and characters, respectively. Note that interpreting these names is not useful for disambiguating this schema. Clib does not contain any information about envy or success; it does not even contain the concepts of feeling, emotion, attitude, or thinking.

In summary, (EnvyBecause) and (EnvyAlthough) can be disambiguated automatically with existing resources, if we manage to ignore irrelevant search results.
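The maximize/minimize-correlation strategy for (EnvyBecause) and (EnvyAlthough) discussed in Section 2 can be sketched as a toy program. This is our illustration of the idea, not the actual method of [2]: the co-occurrence counts are invented for illustration, and the role and property names are hypothetical placeholders for what a corpus or knowledge resource would provide.

```python
# Toy sketch of correlation-based pronoun resolution for the envy
# schemas. All counts are invented; role/property names are hypothetical.
from math import log

# count[(role, prop)]: how often the filler of `role` in 'X envies Y'
# is described with property `prop` (invented numbers).
count = {
    ("enviee", "successful"): 40,    # envied people tend to be successful
    ("envier", "successful"): 5,
    ("enviee", "unsuccessful"): 5,
    ("envier", "unsuccessful"): 30,
}
total = sum(count.values())

def pmi(role, prop):
    """Pointwise mutual information between a role and a property."""
    joint = count.get((role, prop), 0)
    if joint == 0:
        return float("-inf")
    r = sum(v for (a, _), v in count.items() if a == role)
    p = sum(v for (_, b), v in count.items() if b == prop)
    return log(joint * total / (r * p))

def resolve(candidates, prop, connective):
    """'because' expects high correlation; 'although' expects an
    exception, i.e. low correlation (cf. Section 2)."""
    pick = max if connective == "because" else min
    return pick(candidates, key=lambda name: pmi(candidates[name], prop))

cands = {"Pete": "envier", "Martin": "enviee"}
print(resolve(cands, "successful", "because"))   # -> Martin
print(resolve(cands, "successful", "although"))  # -> Pete
```

With these invented counts the resolver picks the envied person for 'because' and the envier for 'although', mirroring the intuition that the object of 'to envy' correlates with being successful.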
4. CONCLUSION

Authors of content on the Web rarely describe how the world works; mostly they give an efficient account of what happened, why, when, and how it happened. Such an efficient use of natural language omits certain content that can be inferred by the reader, therefore computers must interpret natural language to reason with it. Similarly, if data is published in non-annotated, unstructured form, humans can often guess which part of that data is a name or a location. Computers cannot do that, therefore the Linked Data initiative aims to annotate data with type information in common ontologies and with information about its relation to other data.

Linked Data, as the name indicates, is about data, annotated with its type and further meta information. However, various Semantic Web resources do not only contain data about named entities and events; they also contain a bit of (mostly taxonomic) commonsense knowledge. This knowledge is used to organize the meta information and is often mixed with the other knowledge.

Using Linked Data for reasoning requires connecting it with additional commonsense knowledge that is currently not contained in existing resources. Moreover, connecting Linked Data to natural language texts (i.e., to unstructured data) requires annotation tools like Babelfy or DBPedia Spotlight that annotate words and phrases in a text with appropriate URIs of resources in the Semantic Web. Such tools are often based on (or supported by) machine learning. In this work we saw that only Babelfy provides reasonable automatic annotations, so its ranking scheme seems to be superior to that of DBPedia Spotlight. Babelfy distinguishes between knowledge about concepts and named entities, and internally uses coherence measures. Our examples of misleading knowledge consider only the correct POS; however, NER detection could help to separate common nouns from names. Falcons distinguishes between 'concepts' and 'objects'; however, it returns expert jargon in both result types, and its tf-idf ranking [18] produces misleading search results.

Regarding the issue of expert jargon (e.g., 'sculpture' as a certain type of mollusc), we note that already in the CYC project [11] there were 'microtheories' for separating more specific from more generic knowledge. However, in none of the resources discussed in this work did we find a method of distinguishing between these types of knowledge.

RDFa allows Web authors to annotate parts of their website content (i.e., words or phrases) with type information and links to common vocabularies such as WikiData. This eliminates the need for disambiguation and can make machine reading more feasible on these websites. Therefore we think that widespread usage of RDFa could be a crucial enabler for mining commonsense knowledge from the web, in efforts similar to [8].

We conclude that Linked Data can be used for reasoning, but we need better tools that automatically annotate a given text with the most suitable Semantic Web URIs. Additionally, only if we manage to integrate Linked Data with commonsense knowledge can we use this data as knowledge.

5. REFERENCES

[1] N. Asher and A. Lascarides. Logics of Conversation. Cambridge University Press, 2003.
[2] D. Bailey, A. Harrison, Y. Lierler, V. Lifschitz, and J. Michael. The Winograd Schema Challenge and Reasoning about Correlation. In Symposium on Logical Formalizations of Commonsense Reasoning, 2015.
[3] K. Barker, B. Porter, and P. Clark. A Library of Generic Concepts for Composing Knowledge Bases. In International Conference on Knowledge Capture (K-CAP), New York, USA, 2001. ACM Press.
[4] T. Berners-Lee, C. Bizer, and T. Heath. Linked data - the story so far. International Journal on Semantic Web and Information Systems (Special Issue on Linked Data), 5(3):1-22, 2009.
[5] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge. In ACM SIGMOD International Conference on Management of Data, pages 1247-1249, 2008.
[6] J. Daiber, M. Jakob, C. Hokamp, and P. N. Mendes. Improving Efficiency and Accuracy in Multilingual Entity Extraction. In Semantic Systems, pages 3-6, 2012.
[7] E. Davis. A collection of Winograd Schemas. http://www.cs.nyu.edu/faculty/davise/papers/WS.html.
[8] A. S. Gordon. Mining Commonsense Knowledge From Personal Stories in Internet Weblogs. In Workshop on Automated Knowledge Base Construction (AKBC), pages 8-15, 2010.
[9] K. Kipper, A. Korhonen, N. Ryant, and M. Palmer. A large-scale classification of English verbs. Language Resources and Evaluation, 42(1):21-40, 2007.
[10] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer, and C. Bizer. DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web Journal, 1(1-5):1-29, 2012.
[11] D. B. Lenat. CYC: A Large-Scale Investment in Knowledge Infrastructure. Communications of the ACM, 38(11):33-38, 1995.
[12] H. J. Levesque, E. Davis, and L. Morgenstern. The Winograd Schema Challenge. In Principles of Knowledge Representation and Reasoning (KR), pages 552-561. AAAI Press, 2012.
[13] W. C. Mann and S. A. Thompson. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243-281, 1988.
[14] J. McCarthy. Programs with Common Sense. www-formal.stanford.edu/jmc/mcc59.ps, 1959.
[15] G. A. Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39-41, 1995.
[16] A. Moro, A. Raganato, and R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics, 2:231-244, 2014.
[17] R. Navigli and S. P. Ponzetto. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217-250, 2012.
[18] Y. Qu and G. Cheng. Falcons concept search: A practical search engine for web ontologies. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 41(4):810-816, 2011.
[19] D. Vrandečić and M. Krötzsch. Wikidata: A Free Collaborative Knowledgebase. Communications of the ACM, 57(10):78-85, 2014.
[20] T. Winograd. Understanding Natural Language. Academic Press, 1972.