Using Semantic Web Resources for Solving Winograd Schemas: Sculptures, Shelves, Envy, and Success*

Peter Schüller (Computer Engineering Department, Faculty of Engineering, Marmara University, Istanbul, Turkey; peter.schuller@marmara.edu.tr)
Mishal Kazmi (Faculty of Engineering and Natural Science, Sabanci University, Istanbul, Turkey; mishalkazmi@sabanciuniv.edu)

ABSTRACT

Winograd Schemas are sentences where a pronoun must be linked to one of two possible entities in the same sentence. Deciding correctly which entity should be linked was proposed as an alternative to the Turing test. Knowledge is a critical component of solving this challenge, and Linked Data resources promise to be useful to that end. We discuss two example Winograd Schemas and related knowledge that can be discovered by manual search in WikiData, DBPedia, BabelNet, freebase, WordNet, VerbNet, and the Component Library. We find that these resources are difficult to leverage because (i) they mix named entities with expert jargon and generic ontological knowledge, (ii) annotation tools are lacking, and (iii) commonsense knowledge is kept implicit.

1. INTRODUCTION

The Winograd Schema Challenge (WSC) [20, 12] was proposed as a more practical alternative to the Turing Test. An example is the following Winograd Schema (WS):

  [The sculpture]a rolled off [the shelf]b because [it]X wasn't anchored.  (ScAnchor)

  [The sculpture]a rolled off [the shelf]b because [it]X wasn't level.  (ScLevel)

Each sentence in such a schema poses a coreference ambiguity problem between the phrases marked in square brackets. This example has two candidate solutions: X = a and X = b. An important property of WSs is that the correct solution differs between the two sentences, and that the sentences differ only in a single word ('level' vs. 'anchored'): the correct solution of (ScAnchor) is X = a, while for (ScLevel) it is X = b. Because of this property it has been argued that purely statistical approaches will be insufficient for beating the WSC and that methods of (symbolic) knowledge representation and reasoning will be necessary [12].

Reasoning requires knowledge; the biggest repository of knowledge is arguably the Internet, but it consists mostly of unstructured information. The Linked Data effort [4] structures data in a way that makes it machine readable, hence usable for automated reasoning. Therefore, using the Semantic Web as a knowledge resource for tackling the WSC is highly suggestive. But knowledge is more than data, so how far can we get with existing resources?

In this work we discuss two examples of WSs and attempt to resolve them using data repositories typically considered part of the Semantic Web, together with other resources. We show that repositories like WikiData, DBPedia, BabelNet, and freebase are necessary but not sufficient by themselves: they contain mostly taxonomic knowledge and (historical) facts about named entities. Most existing Winograd Schemas [7], on the contrary, do not refer to historical events or entities; they can be understood out of the blue (i.e., without additional context) using commonsense knowledge [14] that is shared by humans because they live in a similar world as other humans.¹ In the above WS such knowledge is that anchoring/fixing an object (usually) prevents its movement. But is such commonsense knowledge represented in Semantic Web resources?

In this work we first outline how to perform reasoning, following the idea that many schemas can be resolved using correlation [2].² Then we give an, in our opinion, representative part of the background knowledge obtainable from existing resources by manual search.

Our contribution is to point out the potential use of Semantic Web resources for tackling Winograd Schemas and to show problems that become apparent while doing so. The main issues we point out are as follows.

• Misinterpreting the topic of a sentence causes annotation of many wrong entities, in particular if knowledge about named entities or expert jargon is preferred over generic concepts. Therefore tools that annotate text with the correct links (concepts or entities) are crucial.

• A high level of detail in textual descriptions, or a varying level of detail across such descriptions, can mislead reasoning.

• Missing commonsense knowledge is a limiting factor, but annotating Web content with links to common vocabularies has the potential to enable future work that mines such knowledge from the (annotated) Web.

* This work has been supported by Scientific and Technological Research Council of Turkey (TUBITAK) Grant 114E777.
¹ We will disregard the question where commonsense knowledge ends and where culture-dependent knowledge starts.
² Note that such reasoning need not be based on symbolic logic; we can similarly envision realizing it statistically.

2. REASONING ABOUT CORRELATION

Why is correlation [2] a possibility for resolving coreferences in the WSC? If we split (ScAnchor) and (ScLevel) into two sentences, including all possible resolutions of the pronoun, we obtain the following simple statements.
The sculpture rolled off the shelf.  (1)
The sculpture wasn't anchored.  (2)
The shelf wasn't anchored.  (3)
The sculpture wasn't level.  (4)
The shelf wasn't level.  (5)

In the original schema, the word 'because' raises an expectation in the reader, namely that the second sentence is a plausible reason for the first sentence. One way to handle this plausibility is to reduce it to correlation: for example, by checking whether (1) has a higher correlation with (2) than with (3), we can find the correct solution.

But what kind of correlation do we use in this case? The topic of all three sentences is related to movement or its impossibility. Hence the correlation can be about properties within that topic. (1) and (2) pertain to movement of the sculpture, while (3) pertains to movement of the shelf. In the absence of knowledge about the meaning of anchoring, this can be enough to infer the correct solution, namely that (1) and (2) are more strongly correlated than (1) and (3). If we additionally know that anchoring prevents rolling and that not anchoring allows rolling (this can be seen as a positive correlation), then we can also infer the correct solution: (2) better fulfills the expectation raised by 'because', yielding the solution X = a. Note that we deal with (ScLevel) in the next section.

Our second example is the following schema.

  [Pete]a envies [Martin]b because [he]X is very successful.  (EnvyBecause)

  [Pete]a envies [Martin]b although [he]X is very successful.  (EnvyAlthough)

Extracted parts of this schema are

Pete envies Martin.  (6)
Pete is very successful.  (7)
Martin is very successful.  (8)

Note that this time the same three sentences are used for resolving (EnvyBecause) and (EnvyAlthough). The only difference between (EnvyBecause) and (EnvyAlthough) is the connective between the sentences: 'because' (again) raises an expectation of positive correlation, whereas 'although' raises an expectation of an exception: although we would normally assume that Pete is not envious (because he is successful), he actually is. Intuitively, the object of 'to envy' is correlated with being successful, and the subject of 'to envy' is correlated with not being successful. Therefore the solution for (EnvyBecause) can be found by maximizing correlation, while we need to minimize correlation for (EnvyAlthough).

Theoretical Justification. The expectation of correlation is explained by linguistic theories about discourse structure and discourse coherence (e.g., [13, 1]). For simplicity, our examples show only explicit discourse structure indicated by 'because' or 'although'. However, in a coherent text all sentences are related in a structure, often a tree structure, and often not explicitly marked. Two examples of frequent implicit discourse relations are temporal order (time usually progresses forward from one sentence to the subsequent one) and elaboration (a topic is explained in more detail in a subsequent sentence). Examples of further explicit discourse connectives are 'but', 'hence', 'in order to', and so on.

3. SEMANTIC WEB KNOWLEDGE

We now investigate how to obtain the required knowledge from resources integrated into the Semantic Web and from similar resources built by other communities. We consider WikiData [19], which is a language- and presentation-independent annotated data backend for Wikipedia; DBPedia [10], which contains RDF triples extracted from Wikipedia infoboxes; freebase [5], which is a community-built knowledge graph repository; and BabelNet [17], which connects several Wikipedia projects with the linguistic resource WordNet [15]. Furthermore we use the linguistic resources WordNet and VerbNet [9], and the Component Library Clib [3], which is a commonsense knowledge resource.

For WikiData, freebase, WordNet, VerbNet, and Clib, we look up single words. We also search in the Falcons Semantic Web search engine [18], which performs search and ranking in most of the above resources. Additionally, we annotate the whole schemas using the annotation engines DBPedia Spotlight³ [6] and Babelfy⁴ [16], the latter linking to BabelNet.

³ http://spotlight.dbpedia.org/rest/annotate
⁴ http://babelfy.org/index

3.1 Disambiguating the Anchored Schema

(ScAnchor) can be disambiguated if we have knowledge that (i) sentences (1), (2), and (3) are about the same topic (movement of a physical object); and that (ii) anchoring prevents movement and rolling is movement. We will take for granted that we have linguistic knowledge and mechanisms that allow us to identify the subject and object of 'rolled' and to handle the predicate 'wasn't': these are research areas of their own.

Useful knowledge. WikiData has definitions for 'sculpture' as well as 'shelf', classifying them as 'three-dimensional work of art' and as 'furniture', respectively, and both are a subclass of 'artificial physical object'. VerbNet contains an entry classifying 'roll off' as 'move'. Babelfy produces several correct annotations: 'rolled' is linked to 'roll', which is a kind of 'move'; 'anchored' is linked to 'anchor', which is a kind of 'fix', which is a kind of 'attach'. Falcons finds WordNet and WikiData concepts for 'sculpture', 'to roll', and 'shelf', ranking correct data high but not in first place. Results for 'anchored' are not helpful, but searching for 'anchor' reveals the correct WordNet entry.

While attaching intuitively prevents moving, this knowledge cannot be found easily. Clib contains nearly enough information to infer this causal relation: 'Move' has a precondition that the object to be moved does not have the property 'Be-Restrained', which inherits from 'Be-Obstructed'. The problem is that obstruction or restraint is caused by 'Move', and the separate event 'Attach' is not causally related to restraining ('Be-Restrained' contains the linguistic annotation 'fixed', but fixing and attaching are modeled as different concepts). While it seems that enough knowledge is available, automatically linking that knowledge in the right way is not trivial.
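The annotation step used here can also be scripted against the Spotlight REST interface given in footnote 3. The sketch below is a minimal illustration: the 'text' and 'confidence' parameters and the JSON field names ('Resources', '@surfaceForm', '@URI') reflect our understanding of the historic Spotlight web service and should be treated as assumptions, since the endpoint may have moved or changed.

```python
# Minimal sketch of programmatic annotation via the DBPedia Spotlight
# REST interface (footnote 3). Parameter and JSON field names are
# assumptions about the historic web service.
import json
import urllib.parse
import urllib.request

SPOTLIGHT_URL = "http://spotlight.dbpedia.org/rest/annotate"

def build_request(text, confidence=0.5):
    """Prepare an annotation request that asks for JSON output."""
    params = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(SPOTLIGHT_URL + "?" + params,
                                  headers={"Accept": "application/json"})

def annotate(text):
    """Send the request and collect (surface form, URI) pairs.
    Requires network access; the endpoint may no longer be online."""
    with urllib.request.urlopen(build_request(text)) as resp:
        data = json.load(resp)
    return [(r["@surfaceForm"], r["@URI"]) for r in data.get("Resources", [])]

req = build_request("The sculpture rolled off the shelf.")
print(req.full_url)
```

Running such a script on the schema sentences and inspecting the returned URIs automates the manual lookup experiment described in this section.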
Next we show that there is also additional knowledge that could be linked and that is counterproductive towards the goal of reasoning about the intuition of our example.

Misleading knowledge. freebase provides several entries for 'sculpture', mostly about the art form of sculpting, and few about (very specific) physical objects. The closest helpful entry is 'statue', which, according to its definition, '... is a sculpture representing one or more people or animals ....' For 'shelf' the first hit is the correct entry; however, in its definition we find that 'A shelf is a flat horizontal plane ... to hold items ... It is raised off the ground and usually anchored/supported on its shorter length sides by brackets.' The definition of sculpture (or statue) does not contain information about anchoring, hence a system that heuristically evaluates correlation will conclude that shelves are more likely to be anchored than sculptures. Therefore (ScAnchor) will be disambiguated in the wrong way, even though we have only truthful evidence. The problem is that the heuristic fails because (by chance) one definition contains misleading information. (Actually, searching the web yields many do-it-yourself forums with information about proper ways to anchor shelves to the wall, and not nearly as many about anchoring statues; hence the correlation with anchoring seems to be higher for shelves than for statues. However, we need to consider the correlation between rolling an object and not anchoring that object.)

Babelfy provides the following annotations: 'sculpture' is a 'three-dimensional figure', which is a 'shape', which is a 'mathematical object'; moreover, 'sculpture' is a subclass of several art forms. 'Shelf' is a 'furniture', which is a 'decorative art', which is a 'perceptible object'; moreover, 'shelf' is a 'support', which is a 'machine' and a 'tool'. In summary, 'sculpture' is classified as an intangible abstract concept, and 'shelf' is also classified as a tool, which can be misleading.

DBPedia Spotlight annotates 'sculpture' with a particular species of sea snail, and 'shelf' with 'Shelf life' ('shelf' here meaning a shallow coastal area of the sea). While 'rolled' and 'anchored' are not associated with any entity, a search on the web reveals more potentially misleading information: there are 'roll anchors' for anchoring ships in the shelf zone; moreover, rolling is a specific movement of ships induced by wind. While the presence of this (expert jargon) knowledge in DBPedia is no problem, its usage is a problem: it should be linked only when significant evidence suggests that the text is about anchoring ships near the coast. It seems that Babelfy performs better than DBPedia Spotlight, although this can be a random effect due to the limited number of examples we are looking at. While Falcons offers a choice between 'object' and 'concept' in the search, this does not seem to provide the required distinction: expert jargon is always contained in the search results.

Note that we mainly discussed 'sculpture' because for other content words misleading knowledge cannot be found to such an extent. Due to the amount of available knowledge, separating useful from irrelevant knowledge is crucial.

3.2 Disambiguating the Level Schema

For (ScLevel) we do not need correlation: if we can show that, among the two candidates (4) and (5), the second one is a reasonable property while the first is an unreasonable one, then we find the correct result. To show this we require the following knowledge: (i) a shelf is usually a flat entity, (ii) a sculpture is usually not flat, and (iii) 'level' is a potential property of flat entities.⁵

Useful knowledge. freebase's entry for 'shelf' states 'A shelf is a flat horizontal plane [...]', and most of its entries for 'level' refer to 'horizontal' or 'plane' in their definitions. This allows us to infer that 'level' is a more likely property of 'shelf' than of 'sculpture', yielding the solution X = b.

Misleading knowledge. DBPedia Spotlight wrongly links 'level' to 'deck of a ship', again using the wrong topic area. Babelfy wrongly links 'level' to 'level of a game'. Falcons provides many pages of search results, but the order of results is misleading: the first five entries are related to 'level of visibility'.

⁵ The words 'usually' and 'potential' point out that this knowledge is default knowledge and can be defeated by more specific knowledge (to account for atypical cases).

3.3 Disambiguating both Envy Schemas

(EnvyBecause) and (EnvyAlthough) differ only in the rhetorical relation; therefore the same knowledge is relevant.

Useful knowledge. DBPedia Spotlight correctly links 'envies' to an emotion which 'occurs when a person lacks another's superior quality, achievement, or possession and either desires it or wishes that the other lacked it'. VerbNet does not contain an entry for 'to envy', but it has one for 'success', which is a potential property of humans according to several of its free-text definitions. WikiData provides as its first result for 'envy' the same entry as DBPedia Spotlight. Additionally, for 'success' it contains an entry for 'achievement of a goal' and one for 'victory'. (For 'successful' there are only entries related to art pieces.) Babelfy links 'envies' to 'to envy', which is a subclass of 'to admire', which is a subclass of 'to think'; moreover, 'successful' is linked to the entry of the same name, but this entry does not contain any classification. Falcons provides useful results, linking 'successful' to the corresponding WordNet entry, and 'to envy' to 'jealousy'. These pieces of knowledge can be sufficient for our purpose, in particular the definition of 'to envy' in connection with linking success to 'achievement'.

Misleading knowledge. However, there is also misleading and missing knowledge. freebase provides many results for 'envy', 'envies', 'success', and 'successful', most of them names of art pieces. DBPedia Spotlight links 'Pete' and 'Martin' to TV programs and characters, respectively. Note that interpreting these names is not useful for disambiguating this schema. Clib does not contain any information about envy or success; it does not even contain the concepts of feeling, emotion, attitude, or thinking.

In summary, (EnvyBecause) and (EnvyAlthough) can be disambiguated automatically with existing resources, if we manage to ignore irrelevant search results.
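The maximize/minimize-correlation strategy for (EnvyBecause) and (EnvyAlthough) discussed in Section 2 can be sketched as a toy program. This is our illustration of the idea, not the actual method of [2]: the co-occurrence counts are invented for illustration, and the role and property names are hypothetical placeholders for what a corpus or knowledge resource would provide.

```python
# Toy sketch of correlation-based pronoun resolution for the envy
# schemas. All counts are invented; role/property names are hypothetical.
from math import log

# count[(role, prop)]: how often the filler of `role` in 'X envies Y'
# is described with property `prop` (invented numbers).
count = {
    ("enviee", "successful"): 40,    # envied people tend to be successful
    ("envier", "successful"): 5,
    ("enviee", "unsuccessful"): 5,
    ("envier", "unsuccessful"): 30,
}
total = sum(count.values())

def pmi(role, prop):
    """Pointwise mutual information between a role and a property."""
    joint = count.get((role, prop), 0)
    if joint == 0:
        return float("-inf")
    r = sum(v for (a, _), v in count.items() if a == role)
    p = sum(v for (_, b), v in count.items() if b == prop)
    return log(joint * total / (r * p))

def resolve(candidates, prop, connective):
    """'because' expects high correlation; 'although' expects an
    exception, i.e. low correlation (cf. Section 2)."""
    pick = max if connective == "because" else min
    return pick(candidates, key=lambda name: pmi(candidates[name], prop))

cands = {"Pete": "envier", "Martin": "enviee"}
print(resolve(cands, "successful", "because"))   # -> Martin
print(resolve(cands, "successful", "although"))  # -> Pete
```

With these invented counts the resolver picks the envied person for 'because' and the envier for 'although', mirroring the intuition that the object of 'to envy' correlates with being successful.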
4. CONCLUSION

Authors of content on the Web rarely describe how the world works; mostly they give an efficient account of what happened, why, when, and how it happened. Such an efficient use of natural language omits certain content that can be inferred by the reader, therefore computers must interpret natural language to reason with it. Similarly, if data is published in non-annotated, unstructured form, humans can often guess which part of that data is a name or a location. Computers cannot do that, therefore the Linked Data initiative aims to annotate data with type information in common ontologies and with information about its relation to other data.

Linked Data, as the name indicates, is about data, annotated with its type and further meta information. However, various Semantic Web resources do not only contain data about named entities and events; they also contain a bit of (mostly taxonomic) commonsense knowledge. This knowledge is used to organize the meta information and is often mixed with the other knowledge.

Using Linked Data for reasoning requires connecting it with additional commonsense knowledge that is currently not contained in existing resources. Moreover, connecting Linked Data to natural language texts (i.e., to unstructured data) requires annotation tools like Babelfy or DBPedia Spotlight that annotate words and phrases in a text with appropriate URIs of resources in the Semantic Web. Such tools are often based on (or supported by) machine learning. In this work we saw that only Babelfy provides reasonable automatic annotations, so its ranking scheme seems to be superior to that of DBPedia Spotlight. Babelfy distinguishes between knowledge about concepts and named entities, and internally uses coherence measures. Our examples of misleading knowledge consider only the correct POS; however, NER detection could help to separate common nouns from names. Falcons distinguishes between 'concepts' and 'objects'; however, it returns expert jargon in both result types, and its tf-idf ranking [18] produces misleading search results.

Regarding the issue of expert jargon (e.g., 'sculpture' as a certain type of mollusc), we note that already in the CYC project [11] there were 'microtheories' for separating more specific from more generic knowledge. However, in none of the resources discussed in this work did we find a method of distinguishing between these types of knowledge.

RDFa allows Web authors to annotate parts of their website content (i.e., words or phrases) with type information and links to common vocabularies such as WikiData. This eliminates the need for disambiguation and can make machine reading more feasible on these websites. Therefore we think that widespread usage of RDFa could be a crucial enabler for mining commonsense knowledge from the web, in efforts similar to [8].

We conclude that Linked Data can be used for reasoning, but we need better tools that automatically annotate a given text with the most suitable Semantic Web URIs. Additionally, only if we manage to integrate Linked Data with commonsense knowledge can we use this data as knowledge.

5. REFERENCES

[1] N. Asher and A. Lascarides. Logics of Conversation. Cambridge University Press, 2003.
[2] D. Bailey, A. Harrison, Y. Lierler, V. Lifschitz, and J. Michael. The Winograd Schema Challenge and Reasoning about Correlation. In Symposium on Logical Formalizations of Commonsense Reasoning, 2015.
[3] K. Barker, B. Porter, and P. Clark. A Library of Generic Concepts for Composing Knowledge Bases. In International Conference on Knowledge Capture (K-CAP), New York, USA, 2001. ACM Press.
[4] T. Berners-Lee, C. Bizer, and T. Heath. Linked data - the story so far. International Journal on Semantic Web and Information Systems (Special Issue on Linked Data), 5(3):1-22, 2009.
[5] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge. In ACM SIGMOD International Conference on Management of Data, pages 1247-1249, 2008.
[6] J. Daiber, M. Jakob, C. Hokamp, and P. N. Mendes. Improving Efficiency and Accuracy in Multilingual Entity Extraction. In Semantic Systems, pages 3-6, 2012.
[7] E. Davis. A collection of Winograd Schemas. http://www.cs.nyu.edu/faculty/davise/papers/WS.html.
[8] A. S. Gordon. Mining Commonsense Knowledge From Personal Stories in Internet Weblogs. In Workshop on Automated Knowledge Base Construction (AKBC), pages 8-15, 2010.
[9] K. Kipper, A. Korhonen, N. Ryant, and M. Palmer. A large-scale classification of English verbs. Language Resources and Evaluation, 42(1):21-40, 2007.
[10] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer, and C. Bizer. DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web Journal, 1(1-5):1-29, 2012.
[11] D. B. Lenat. CYC: A Large-Scale Investment in Knowledge Infrastructure. Communications of the ACM, 38(11):33-38, 1995.
[12] H. J. Levesque, E. Davis, and L. Morgenstern. The Winograd Schema Challenge. In Principles of Knowledge Representation and Reasoning (KR), pages 552-561. AAAI Press, 2012.
[13] W. C. Mann and S. A. Thompson. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243-281, 1988.
[14] J. McCarthy. Programs with Common Sense. www-formal.stanford.edu/jmc/mcc59.ps, 1959.
[15] G. A. Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39-41, 1995.
[16] A. Moro, A. Raganato, and R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics, 2:231-244, 2014.
[17] R. Navigli and S. P. Ponzetto. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217-250, 2012.
[18] Y. Qu and G. Cheng. Falcons concept search: A practical search engine for web ontologies. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 41(4):810-816, 2011.
[19] D. Vrandečić and M. Krötzsch. Wikidata: A Free Collaborative Knowledgebase. Communications of the ACM, 57(10):78-85, 2014.
[20] T. Winograd. Understanding Natural Language. Academic Press, 1972.