The Role of Pragmatics in Solving the Winograd Schema Challenge

Adam Richard-Bollans and Lucía Gómez Álvarez and Anthony G. Cohn
School of Computing
University of Leeds, Leeds, UK
{mm15alrb, sc14lga, a.g.cohn}@leeds.ac.uk

Abstract

Different aspects and approaches to commonsense reasoning have been investigated in order to provide solutions for the Winograd Schema Challenge (WSC). The vast complexities of natural language processing (parsing, assigning word sense, integrating context, pragmatics and world-knowledge, ...) give broad appeal to systems based on statistical analysis of corpora. However, solutions based purely on learning from corpora are not currently able to capture the semantics underlying the WSC, which was intended to provide problems whose solution requires knowledge and reasoning, rather than statistical analysis of superficial lexical features. In this paper we consider the WSC as a means for highlighting challenges in the field of commonsense reasoning more generally. We begin by discussing issues with current approaches to the WSC. Following this we outline some key challenges faced, in particular highlighting the importance of dealing with pragmatics. We then argue for an alternative approach which favours the use of knowledge bases where the deep semantics of the different interpretations of commonsense terms are formalised. Furthermore, we suggest using heuristic approaches based on pragmatics to determine appropriate configurations of both reasonable interpretations of terms and necessary assumptions about the world.

Introduction

The Winograd Schema Challenge (Levesque, Davis, and Morgenstern 2012) was conceived as a new benchmark in artificial intelligence, which would improve on the Turing Test (Turing 1950) by removing the need for deception and focusing more on understanding. The task is a particular type of pronoun disambiguation problem. Sentences with a pronoun and two candidate referents are given, and the task is to find the correct referent of the pronoun. As the challenge is intended to require genuine intelligence and understanding, the sentences are supposed to be constructed in such a way that syntactic constraints and semantic preference do not alone enable the disambiguation. This construction is achieved in part by finding pairs of sentences that differ by only one word but in which the pronoun reference is different. For example:

    The large ball crashed right through the table because it was made of [steel/styrofoam]. What was made of [steel/styrofoam]? Answers: The ball/the table.¹ (1)

¹ Taken from www.cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WSCollection

The pronoun ‘it’ refers to either the ball or the table depending on whether ‘steel’ or ‘styrofoam’ is used. In both cases the syntactic structure remains the same and, supposing that clear semantic preferences relating ‘steel’ and ‘crashing through things’ or ‘styrofoam’ and ‘being crashed through’ cannot be easily learned from mining a large corpus, it is hoped that any system which resolves the pronoun must use some sort of genuine understanding.

In the literature discussing the WSC and its motivation as a benchmark we see example reasoning processes incorporating detailed semantics of the language involved (Davis 2013; Levesque 2014; Levesque, Davis, and Morgenstern 2012; Morgenstern and Ortiz Jr 2015). This kind of approach, however, has not been at the forefront of proposals to the challenge. This is in large part due to the enormous complexity of dealing with natural language and constructing large enough knowledge bases to handle such varied contexts.

In order to further the symbolic approach we investigate some of the problems faced, mainly pragmatics. It is hoped that this sort of analysis helps to shed light on what kind of reasoning is needed where, and that heuristic methods will remove a large portion of the burden of reasoning about natural language. Along similar lines, a partial solution is provided in (Schüller 2014), using relevance theory (Sperber and Wilson 2004) to motivate selection of the best knowledge graph to describe a sentence.

In this paper we first explore what kind of reasoning capabilities we expect a system to display when solving the WSC and we analyse how some of the proposed approaches compare to this. We then consider some key challenges for solving the WSC using reasoning we consider appropriate; in particular, that pragmatics and context are very difficult to capture and semantics are hard to formalize due to vagueness. Finally, we show how pragmatic considerations can help in solving the WSC; specifically, we consider how prototype theory and heuristic methods can be used to support symbolic approaches.

What kind of reasoning are we looking for?

We first consider the example above (1) given in (Levesque 2014), using the word ‘styrofoam’. Humans would successfully resolve this by knowing particular properties of styrofoam, maybe some naive physics and even some general properties of balls and tables.

Levesque then considers what should be the outcome if we change styrofoam to XYZZY, where XYZZY is some material that we are given some facts about, one of the facts being ‘It is ninety-eight percent air, making it lightweight and buoyant’. Given this fact, humans would be able to reason that the table is made of XYZZY. This is a part of intelligent behaviour that we would like to replicate, and is clearly dependent on having and being able to reason about detailed knowledge. Further, it has been suggested as a possible extension to the test to add a requirement for the solution to provide a simple explanation of its choice (Morgenstern and Ortiz Jr 2015). This need for explanation would also seem to depend on reasoning with detailed knowledge; in order to explain why the table is made of styrofoam, it seems necessary to have an understanding of the mechanics of the situation. The ability to provide an explanation is also important more generally for the field of commonsense reasoning, for example for decision support systems that need to provide justifications for decisions (Hayes-Roth, Waterman, and Lenat 1984).

Versatile solutions

The WSC was conceived as a new benchmark for artificial intelligence; as such, we hope that solutions to the WSC will provide tools for tackling a broader range of question answering tasks and commonsense challenges. In this way, solutions to the challenge should display versatility as well as making advances in the WSC specifically, thus representing genuine progress towards truly intelligent machines. Solutions which are over-specific to the WSC and only provide insight into this narrow set of coreference resolution problems are not likely to be ‘engaging in behaviour that we would say shows thinking in people’ (Levesque, Davis, and Morgenstern 2012). This is a similar but more general requirement than elaboration tolerance (McCarthy 1998).

The situations described in Winograd sentences (WS) are generally common/normal occurrences; however, it is desirable for AI systems to be able to reason about out-of-place objects and strange scenarios. The ability to do this displays a genuine understanding of what is going on. Levesque gives the example ‘Can a crocodile run a steeplechase?’ (Levesque 2014). Most humans would answer this easily using basic knowledge about crocodiles (in particular that they cannot jump) and what is necessary to be able to complete a steeplechase. Of course, as noted by Levesque, a statistical approach using the closed world assumption would be likely to get the right answer to this question too, as there is little evidence of crocodiles running steeplechases. It would be less likely, however, to answer the question correctly if the animal was a gazelle (which presumably could run a steeplechase).

Having briefly considered the kind of solutions we are aiming for, we now look at how some existing approaches compare to this.

Existing approaches to the challenge

Since the inception of the WSC there has been some theoretical discussion on the purpose of the challenge (Davis and Marcus 2015; Levesque 2014; 2017), various methods suggested for tackling the problem (Sharma et al. 2015; Schüller 2014; Bailey et al. 2015; Rahman and Ng 2012; Peng, Khashabi, and Roth 2015), and four implementations entered into the 2016 challenge² (Liu et al. 2016; Isaak and Michael 2016) (two of the competitors did not release papers). The WSC is a particular type of anaphora resolution task, on which there has been much work done in the natural language processing community already (Ng 2017; Mitkov 2014; Carbonell and Brown 1988); however, because the task necessitates the use of world knowledge, the methods employed are not wholly suitable for the challenge.

² www.cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html

Formalizing the necessary aspects of reasoning to tackle the WSC (spatial, temporal, causal, epistemic, ...) and integrating them in one system is notoriously hard. Therefore, it is not surprising that the space of genuine proposed solutions is sparse, and that existing approaches are mostly based on statistical methods that circumvent the need for a precise understanding of the semantics of the schemas by learning likely answers from text corpora.

In this section we analyse some of the solutions proposed along these lines. We consider both their performance and success on the challenge and also their achievements and relevance for broader commonsense reasoning, which is the ultimate aim of the WSC as a benchmark.

Machine learning approaches

Machine learning methods for anaphora resolution have been used extensively over the past two decades (Ng 2017). In this section we consider some of the best known such approaches for tackling the WSC.

The team that came first in the 2016 WSC challenge² used ‘Commonsense Knowledge Enhanced Embeddings’ (Liu et al. 2016), which works by learning word representation vectors from large text corpora while incorporating commonsense knowledge as constraints in the training process. For the competition the commonsense knowledge was obtained from CauseCom, a set of cause and effect pairs such as ‘winning causes happiness’ (Liu et al. 2016), though the team has also incorporated WordNet (Miller 1995) and ConceptNet (Speer and Havasi 2012). A neural network is then trained to answer yes or no when given candidate/pronoun pairs (as vectors), and this network is then used to answer new disambiguation problems.

Though this system achieves a good performance on the challenge, it would be down to chance whether it correctly answers the XYZZY problem given by Levesque or could be used to solve the crocodile-steeplechase problem, and it is unclear how it could be developed in future to explain how it comes to its conclusion.

Rahman and Ng (2012) have combined multiple methods to resolve the pronoun for a large corpus of WSs. This work achieved high results on their corpus, 73.1%. However, the corpus selection has been criticized for containing redundancy (Sharma 2014). Further, the approach relies heavily on statistical methods for assessing the semantic preferences of types and events, e.g. a lion is a type of predator and being the subject of a kill event makes one more likely to be the object of an arrest event. It is clear that ‘lions eat zebras because they are predators’ is not a ‘Google-proof’ WS and should be discarded. When such type distinctions are not useful, the system may rely on FrameNet (Baker, Fillmore, and Lowe 1998); in the case of ‘John killed Jim so he was arrested’, FrameNet gives John the role of ‘killer’ and Jim the role of ‘victim’ and the system, using statistical methods, concludes that it is more likely for a ‘killer’ (John) to be arrested. In this case the system resolves the pronoun successfully. However, this takes no account of the importance of the connective: changing the sentence to ‘John killed Jim after he was arrested’ should force one to re-evaluate the disambiguation.

Work by Peng et al. (2015) has been successful, achieving higher results (76.4%) than Rahman and Ng on the same corpus. The technique is similar to the FrameNet approach of (Rahman and Ng 2012) but they also take connectives into account. The knowledge acquired by this approach ranges from crude, and clearly problematic, forms such as ‘{flower has pollen} is more likely than {bee has pollen}’ to more reasonable forms such as ‘the subject of “be afraid of” is more likely than the object of “be afraid of” to be the subject of “get scared of”’. Though these sorts of techniques will likely prove very useful for natural language processing, and may even manage to pass the WSC, there is a fundamental issue: these techniques are learning about the likelihood of combinations of words in corpora and there appears to be little in the way of transferable knowledge or understanding. For example, it is clear that the kind of background knowledge necessary to solve the crocodile-steeplechase problem is not present.

Rather than applying reasoning to knowledge, these techniques are geared towards mining what we may call commonsense rules. We discuss the nature of such rules in the following section.

Commonsense rules

In the WSC, it often appears possible to resolve pronoun ambiguity through an appeal to normality: heavy things cannot be lifted, younger people are fitter, useless objects go in the bin while useful tools are kept in storage, etc. Hence, a large part of the suggested approaches to the WSC have been about ways of finding and/or incorporating such ‘commonsense rules’. We believe, however, that this is a rather crude view of commonsense reasoning and outline some problems of these approaches below.

One proposed approach is that we reduce some of the implied causation in WSs to correlation (Bailey et al. 2015). This uses ‘correlation formulas’ of the form F ⊕ G, such as ‘fit_into(x, y) ⊕ large(y)’ to say that ‘stuff fitting into y’ is correlated with ‘y being large’. Some inference rules are given governing such correlation formulas and it is shown how these could be used to justify a solution to a WS. This approach is, however, problematic. It is analogous to a discussion in (Bunt and Black 2000): by reducing to mere convention the reason why ‘There is a howling gale in here!’ is understood as a command to close the window, we are oversimplifying and missing out more important reasoning processes, including context. Similarly, if we were to find a list of commonsense correlations like ‘fit_into(x, y) ⊕ large(y)’ through corpus mining, we are ripping the words out of context and may be missing out important reasoning processes.

This is not to say that conventions do not exist or do not form an important part of commonsense reasoning. Natural language is full of conventions that we may rely upon to communicate. For example, considering the sentence ‘Sam chopped down the tree’ there is a default assumption that the chopping is done with an axe. This kind of convention can be considered as part of linguistic knowledge (Pustejovsky 1991). However, reasoning based solely on conventions may be too crude, as it does not take contextual factors into consideration. Say that we know that Sam is holding a sword; then we may reject the default assumption that Sam chops down the tree with an axe. One way of dealing with the context dependency of such conventions may be to apply context frames, as in (McCarthy 1993), i.e. in the context of Sam holding a sword, the statement ‘Sam chopped down the tree’ suggests that Sam did the chopping with a sword rather than an axe. However, even if we can create appropriate context frames using salient aspects of context, it seems that the process of creating convention/context pairs would continue ad infinitum. We would hope that reasoning removes the necessity for a lot of these rules, e.g. when someone is holding an appropriate tool, T, for performing action, A, and we are told that they performed action A, then we can assume that they have used T to do A.

The tactic for many approaches is to begin by learning commonsense knowledge from large text corpora or by integrating natural language knowledge bases. Part of the appeal of this is that knowledge can be exploited without having to translate between formal and natural language. However, the methods for extracting commonsense knowledge from the Web can be problematic. Language is used in an efficient way and commonsense knowledge is often left implicit (Schüller and Kazmi 2015).

Even if we were able to overcome some of the problems of mining commonsense, do we want to use reasoning that relies solely on these correlations and rules? Though they may be helpful for certain applications, the reasoning mechanisms need to incorporate less crude knowledge. Regarding the desire for versatility and considering some of the problems listed on the Common Sense Problem Page³, it is clear that this approach is over-specific to the WSC. It would also clearly be hard to mine relations between crocodiles and steeplechases in this way! Moreover, any explanation of the disambiguation given by such a system would not be very enlightening. Consider schema (1) with ‘steel’: explaining why ‘it’ refers to the ball by saying that ‘steel things are more likely to crash through things than to be crashed through’ is not a reasonable explanation. Even the ability to cite a salient property of steel like ‘steel is hard’ would be an important improvement.

³ www-formal.stanford.edu/leora/commonsense/

The approaches outlined above at best only incorporate shallow semantic features and do not appear to exhibit the kind of intelligent behaviour the challenge was designed to test. We believe that, in order to carry out complex inferences and really understand the world, defining natural language terms by means of more refined primitives is often necessary. It is necessary to have genuine world knowledge of entities, as well as their physical, social/historical and functional attributes, as in (Bennett 2005), and be able to reason about that knowledge, e.g. crocodiles have short legs and long bodies, making them unsuitable candidates for a steeplechase, rather than superficial knowledge about relations between entities which are mined from corpora, e.g. crocodiles do not run steeplechases. A line may be drawn by the distinction between reasoning from first principles and reasoning by analogy. They can both be valid forms of reasoning, but reasoning by analogy alone is not enough to be considered intelligent.

Key challenges

This section outlines some particular problems that need resolving in order to tackle the WSC and for commonsense reasoning systems more generally.

Pragmatics

A large part of the complexity of the WSC comes from pragmatic considerations. There are varying positions on the definition of pragmatics (Carston 1999); however, it is generally understood as the field concerned with extra-linguistic factors, such as context, and how they allow the understanding of a speaker’s intended meaning.

Semantic considerations are clearly essential but they are generally not enough in order to reach a conclusion about the disambiguation for a WS. This is an example of semantic underdeterminacy: from only considering the literal meanings of terms in a sentence and not accounting for the intended meaning, we do not obtain a truth-evaluable proposition. For example, the sentence ‘Tom threw his school bag down to Ray after he reached the top of the stairs’ does not contain much information if we only consider the semantics. We also need to consider the intention of the speaker and we may infer this from the decisions the speaker takes regarding the specific choice of language, what information is omitted, what is left ambiguous, the phrasing of the sentence, etc. Indeed, Kempson argues that ‘the articulation of semantics [does not alone] provide the full propositional content/logical form/truth conditions expressed by a sentence’ (Kempson 1984).

To evidence this view, we can see that for many WSs wrongly disambiguating the pronoun does not necessarily violate world knowledge. For example, when dealing with the sentence:

    The trophy does not fit into the suitcase because it_x is too large¹ (2)

there are various interpretations of ‘large’ which give no definite disambiguation. If we imagine a trophy and suitcase to be vase-shaped, with a wide base, narrow stem and wide top, and that the trophy fits into the suitcase, it is possible that making the suitcase larger via a scale projection would make the trophy no longer fit. It is in part by making pragmatic considerations that we can assign appropriate interpretations to these terms and thus disambiguate the pronoun.

Moreover, even in the sentences where each term can be precisely and appropriately defined we can still have semantic underdeterminacy. It is often the case that an utterance is not totally explicit and leaves the reader to fill in the gaps with available assumptions and inferences (Carston 1999). One of the ways that a hearer may fill in these gaps and infer a speaker’s intention is by assuming Grice’s Maxims for co-operative communication (Grice 1975); e.g. the ‘Quantity Maxim’, stating: ‘Make your contribution as informative as is required’ and ‘Do not make your contribution more informative than is required’. So for instance, if a speaker goes into a lot of detail when making an utterance, we may assume that there is a particular reason for this and can infer things based on this knowledge. This kind of pragmatic inference is also important for written text, and hence the WSC. Therefore, as it stands, any solution to the WSC needs some mechanisms for coping with this implicit knowledge.

In the next section we consider some particular examples of this sort of inference when addressing a WS.

Assumptions about the world

When facing any WS there are multiple commonsense principles that apply which allow us to create an accurate model of the situation. What we aim to achieve is some guidance on how to choose these principles and when they apply. To this end we examine the following WS:

    Tom threw his school bag down to Ray after he_x reached the [top/bottom] of the stairs. Who reached the [top/bottom] of the stairs? Answer: top: Tom. bottom: Ray.¹ (3)

We will use this example to help elucidate some of the complexities faced, including the initial position of objects and relevant objects.

The main idea of this sentence is that to throw something down to someone, that person must be below you. We then use the idea of what it means to be at the top of something, i.e. that if Ray is at the top of the stairs then he cannot be below Tom. This is, however, not as clear as it seems.

Initial position. It is possible that Tom is on some balcony above the stairs and waits for Ray to reach the top of the stairs before throwing the bag down to Ray. So why do we like the answer ‘Tom’? It appears we assume that Tom and Ray are initially in a similar location, or to be more precise, that they both have the same relation to any given landmark, in this case the stairs. Character x reaching the top of the stairs implies that x has moved upwards. Not given any information on the other character, y, we assume they have not moved and so x is likely to be above y.

Alternatively, x may have been walking along a corridor to reach the top of the stairs. In this scenario we have two locations to consider, the corridor and the stairs. We suppose that Tom and Ray are on the stairs or in the corridor. In this case it would make no sense for Ray to be at the top of the stairs, as then Tom would not be able to throw anything down to him (from the corridor or the stairs); so we suppose that it must be Tom who walks along the corridor to reach the top of the stairs and throw the bag down to Ray.

We appeal to a rule that in some narrative, unless we have reason to infer otherwise, characters are nearby/in the same place. This idea can be explained by Grice’s quantity maxim, i.e. there is no pertinent difference in the positions of either Tom or Ray; if there were then the quantity maxim says it should be made known.

This rule, however, does not always hold. Imagine we replace ‘stairs’ with ‘swimming pool’:

    Tom threw his school bag down to Ray after he_x reached the top of the swimming pool. Who reached the top of the swimming pool? Answer: Ray.

In this scenario x reaches the top of the swimming pool, breaking the surface of the water. x is then not in a position to throw something like a school bag downwards, as it is pretty hard to throw textile objects through water. Hence, we imagine that x is not Tom, but Ray, and that Tom must be standing somewhere above the swimming pool.

Relevant objects. In general in the WSC, to come to a conclusion we only need to reason about entities that are explicitly mentioned. In the school bag example we reason about the two characters in the narrative, Tom and Ray, the staircase and the school bag itself, combining knowledge of actions like ‘throwing’ and ‘reaching the top of’ with knowledge of these objects. In general then, we do not need to appeal to the existence of extra entities in order to come to a conclusion. This can also be explained by the quantity maxim: the sentence should provide the necessary objects for the reader to make sense of the sentence.

However, as previously discussed, certain words or phrasings indirectly suggest the existence of certain entities, as in the ‘Sam chopped down the tree’ example. We can in part account for these entities by encoding them into a lexicon (Pustejovsky 1991), though these are conventions that will not always hold. Therefore a defeasible reasoning process is necessary to select the most appropriate interpretation.

To conclude our discussion about assumptions about the world, we see that appropriate assumptions need to be made in order to reach the right conclusion. Further, we believe that, to varying extents, these kinds of considerations arise when analysing most WSs appearing in the collection maintained by Davis¹. However, the assumptions are dependent on the specific situation and we need to discern somehow when the assumptions are appropriate. Deciding when to accept these assumptions should include pragmatic considerations. For example, it is lexical and semantic knowledge that suggests the existence of an axe in the sentence ‘Sam chopped down the tree’; however, it is a pragmatic task to actually infer this. This motivates a heuristic process which incorporates pragmatics and gives preference to default assumptions; we will discuss this idea later.

Formalizing commonsense knowledge: level of detail and vagueness

An important issue is to recognize the level of semantics that one believes is appropriate for a solution to the WSC. Our discussion so far has motivated a detailed level of knowledge. Further, there is evidence that, even for coreference problems that would be considered easy with respect to the WSC, incorporating shallow semantic features is not enough (Durrett and Klein 2013). Yet, if we are to solve the WSC using deeper semantics, it is clear that the necessary commonsense knowledge would involve the formalization of a notoriously extensive knowledge base. How to obtain and organize such a large knowledge base is unclear.

On the one hand, due to the variety and scope necessary, mining commonsense knowledge is appealing; however, as previously discussed, the available methods and nature of text corpora pose limitations to obtaining deep knowledge, which is complex and commonly not explicit. On the other hand, hand-crafted knowledge bases such as CYC (Lenat 1995), which incorporate a deeper level of knowledge, have had limited success and it is not clear how they should be exploited.

Beyond the problem of its acquisition, it is well known that commonsense knowledge is hard to formalize, particularly if the required level of detail requires the semantics of natural terms to be preserved. Vagueness and ambiguity are inherent to natural language and, for that reason, it is problematic to prescribe single strict interpretations to natural terms. To illustrate this, consider the WS (3) and imagine the case of a naive definition of a relation at_the_top_of(x, y) ≡ x is on y and for any z which is part of y, x is not below z. We see that this fails for multiple reasons.

1. If Tom were one step below the very last one, it could still be considered that he is at the top of the stairs, particularly if Ray were well below him. We call it sorites vagueness when there is a lack of a clear threshold of application of a term.

2. If we change ‘stairs’ to ‘building’ we might say that Tom is at the top of a building because he is on the top floor, rather than on the roof. In that case we are shifting the interpretation of the predicate to something like at_the_top_of(x, y) ≡ z is the top part of y and x is on z. There may also be many admissible interpretations of what it means for z to be the top part of y. We call the multiplicity of conceptually distinct interpretations of natural terms conceptual vagueness. Further discussion on the multiple interpretations of natural language terms and their role in knowledge bases and ontologies can be found in (Bennett 2005).

Much of the work done in acquiring commonsense knowledge circumvents vagueness in different ways, such as using shallow semantics or microtheories that do not need to be consistent with one another. Various theories, however, have been proposed for dealing with vagueness. Fuzzy logic (Zadeh 1965) stands as an intuitive solution for modelling sorites vagueness by assigning degrees of truth. More interesting for this research, supervaluation semantics (Fine 1975) is based on the idea that vague language can be interpreted in many different precise ways, each of which can be logically conceptualised in a precisification (Bennett 2001; Gómez Álvarez and Bennett 2017), thus also offering support for modelling conceptual vagueness.

So where do all these considerations lead us? In order to reach the kind of solution we desire, we must be able to deal with semantic underdeterminacy (part of which involves deciding when to use appropriate commonsense assumptions) and also make use of a vast amount of detailed knowledge while dealing with the associated problems of vagueness. With these issues in mind, we now consider some avenues for further work.

The role of pragmatics in solving the WSC

In the previous sections we have highlighted how current approaches, regardless of their success in solving schemas, have provided limited support for the kind of intelligent behaviour that we would like to replicate. Here, in an attempt to account for some of the key challenges, we propose an alternative approach, favouring the use of knowledge bases where the deep semantics of the different interpretations of commonsense terms are formalised. Furthermore, we suggest using heuristic approaches based on pragmatics to determine, in the context of each particular schema, appropriate configurations of both reasonable interpretations of the terms and necessary assumptions about the world.

For this purpose we first motivate the use of prototypes for categories and relations and then develop how heuristic methods can provide a manageable way of using pragmatic knowledge for the disambiguation of WSs.

Appealing to prototypicality

There is various work in pragmatics and cognitive science highlighting the importance of using prototypes: in utterance interpretation defaults are assigned before contextual and pragmatic considerations are taken into account (Levinson 1995; Recanati 2004) and there is also evidence for the human preference for good examples (prototypes) of some category as opposed to boundary cases and, further, that prototypes are associated with the least processing effort (Rosch 1978). In the particular scenario of a WS, we argue that the way vague terms are presented leads the reader to interpret them considering prototypical instances fitting the described scenario. For instance, when one reasons about the WS (2) involving the trophy and the suitcase, it is not necessary to worry about a precise semantic commitment for the notion of larger, but instead to evaluate the sentence considering clear cases that satisfy most of the possible interpretations.

Some of the previously discussed approaches work along similar lines, using general commonsense rules and a notion of correlation which appeal to a sense of typicality. However, we believe that this should be more nuanced and that the deep semantics of different interpretations should be preserved. Hence, we propose an approach using ideas from prototype theory (Rosch and Mervis 1975) to differentiate prototypical instances of vague terms and relations from borderline cases within a supervaluationist approach.

Much work has been done on how to pinpoint prototypical members of categories, mainly using vector analysis or conceptual spaces to find the centroid of a concept (Verheyen, Ameel, and Storms 2007; Lenci 2011). [...] further, we are not only interested in picking a prototypical example from a category, say from the class ‘pet’ or ‘things that we eat’. Instead, we would also like to find prototypical instances of relations that can be used to compare an infinite number of objects. Although there is some work done on vector analysis for relationships between words (Mikolov, Yih, and Zweig 2013), in particular for analogy problems, it does not appear to be applicable to this sort of reasoning problem.

Suppose we have a vague term, like ‘smaller’. How can we decide on prototypical instances of this relation? Adopting the supervaluation approach we would have a collection of precise interpretations of its meaning. Following motivation from (Rosch and Mervis 1975), which considers shared properties of classes, in an ideal scenario prototypical instances of ‘smaller’ share properties across all instances of ‘smaller’, i.e. a prototypical instance of smaller is considered smaller in all plausible interpretations. Consider the definitions for ‘smaller’ given in (Davis 2013):

1. Smaller(a, b) ≡ VolumeOf(a) < VolumeOf(b)
2. Smaller(a, b) ≡ DiameterOf(a) < DiameterOf(b)
3. Smaller(a, b) ≡ a ⊂ b
4. Smaller(a, b) ≡ ∃s(s > 1 ∧ b = Scale(a, s))

In this scenario, there are certainly pairs of objects that fall into all four categories (e.g. a sphere of radius 1 is smaller than a sphere of radius 2 in all the above senses). Hence, it would be appropriate to take the conjunction of all four definitions as a requirement for an instance to be considered a prototypical case of ‘smaller’. However, in certain scenarios it may be inappropriate to take the conjunction in this way, as some definitions may be conflicting. In this case different metrics can be proposed for selecting prototypes that satisfy most of the interpretations.

Finally, our main claim in this section is twofold. On the one hand, we consider that an understanding of typicality is necessary for commonsense reasoning: by default we should consider prototypes. On the other hand, a process which can only reason over prototypical definitions is clearly flawed in many respects as it creates over-simplification. Humans often use context to help narrow definitions; for example, defining ‘smaller’ in a particular way makes sense when talking about ‘fitting in’. Hence we believe that a good approach should reflect the diversity of possible interpretations of vague terms and that an engine based on pragmatics should guide the selection of appropriate alternatives when the prototype is not suitable.

Heuristics standing in for pragmatics

In this paper we have discussed some approaches proposed for the WSC relying on heuristic methods in different ways (Rahman and Ng 2012; Peng, Khashabi, and Roth 2015; Liu et al. 2016). Overall, we concluded that heuristics do not provide satisfactory solutions when reduced to evaluating shallow semantic notions such as correlation.

Instead, as has been argued, we believe that a good solution
However, it is not to the WSC should disambiguate the pronoun by considering clear how one could reason with this to resolve a WS, and the most plausible configuration of the scenario described, and the process of finding it should incorporate rich syntactic, it came to that disambiguation, potentially satisfying Mor- semantic and pragmatic considerations. However, although genstern and Ortiz’s requirement of a simple explanation. In advocating deeper semantics and symbolic based approaches spite of being preliminary research, in our view its reasonable that allow for the kind of reasoning that we want (see section results suggest that fruitful work can be done in further devel- above), we propose that heuristic methods have a key role oping heuristic methods to assess the pragmatic and semantic in the WS resolution: that of simplifying the space of pos- considerations that govern reasonable disambiguations of sibilities and estimating reasonably good configurations of natural language. precisifications and necessary assumptions about the world. To conclude this section, it is our claim that this use of As we have highlighted above in order to carry out satis- heuristics is much more in keeping with the nature of the factory reasoning we believe a system should give preference WSC. That what should be simplified in order to keep the to both commonsense assumptions about the world as well as task manageable is not so much the deep semantics of natural prototypical interpretations of the terms involved. These how- terms, but the process of selecting and integrating relevant ever should only be preferences rather than concrete rules. interpretations and background knowledge in the particular When to accept or reject these default assumptions requires context of the resolution of each sentence. knowledge and pragmatic understanding. 
The ability for this complex mix of pragmatics and world knowledge to con- Conclusion tradict itself means that possible solutions or configurations In this paper we have discussed the nature of the WSC as of a described scenario are not unique. For example, when a benchmark, highlighting the shortcomings of several cur- discussing the issue of throwing a school bag in a swim- rent approaches and providing motivation for a more detailed ming pool above, the implausibility of throwing a school level of knowledge. We have also analysed some of what we bag through water outweighed the assumption of Tom and consider to be key challenges, in particular drawing attention Ray being in the same place. However, we may also consider to the need to take account of pragmatic considerations. To that the assumption of characters being in the same place begin addressing these challenges, we have suggested using outweighs the usual interpretation of ‘throw down’ and ‘top’: frameworks able to support the detailed semantics of natural supposing Tom and Ray are both stood in the swimming pool, terms while accounting for its vagueness. Moreover, that their we may interpret ‘throw down’ as ‘throw horizontally away complexity can be manageable with the use of prototypes, from the end of the swimming pool’ and ‘top of the swim- which should be identified and used by default, and, finally, ming pool’ to denote the end of the swimming pool. The that heuristic methods can be used to incorporate varying result would then be to disambiguate the pronoun as ‘Tom’ semantic interpretations as well as assumptions about the rather than ‘Ray’. This second interpretation is not wrong, world, which maintain the pragmatic principles of coopera- however when ‘throw down’ and ‘top’ are interpreted in their tive communication. usual way there is a plausible inference that Tom and Ray In conclusion, it is our view that, while heuristic mecha- are not both located in the swimming pool. 
This would then nisms are necessary to deal with natural language and to re- be an example of a ‘conversational implicature’ (Grice 1975) duce the complexity of commonsense reasoning, they should and explain why the writer of the sentence did not explicitly not be used to over-simplify the semantics of natural terms. give Tom and Ray’s initial locations. Hence in the first inter- Instead, we believe that applications along the lines of the- pretation we have a good explanation for violating the default oretical studies in pragmatics can play a significant role in that Tom and Ray are located in the same place and we also the selection of good interpretations of natural terms and interpret all the terms in a usual fashion, therefore making to enrich the provided descriptions of the world with the this interpretation appear to be the valid one. appropriate implicit knowledge. Being able to leverage these kinds of inferences is an im- portant and difficult task in commonsense reasoning. Along Acknowledgements these lines, one avenue (Schüller 2014) adopted in tackling Thanks to Brandon Bennett for helpful discussion and to the the WSC has been to explore relevance theory (Sperber and anonymous reviewers for their useful feedback. Wilson 2004). This theory, inspired by Grice’s work, is based on the idea that an utterance can have a variety of interpre- References tations, and that it is through parsing, disambiguating terms, Bailey, D.; Harrison, A.; Lierler, Y.; Lifschitz, V.; and Michael, resolving pronouns and adding pragmatic inference as well J. 2015. The Winograd Schema Challenge and Reasoning about as appropriate assumptions based on context that one can Correlation. In Working Notes of the Symposium on Logical For- comprehend the meaning of an utterance. The principle guid- malizations of Commonsense Reasoning. ing these tasks is the idea of maximizing relevance4 . Schüller Baker, C. F.; Fillmore, C. J.; and Lowe, J. B. 1998. 
The berkeley uses these ideas to motivate a heuristic process for reasoning framenet project. In Proceedings of COLING/ACL, 86–90. over graphs, where a fitness function is employed to find rel- Bennett, B. 2001. What is a Forest? On the vagueness of certain evant combinations that provide a disambiguation. Moreover, geographic concepts. Topoi 20(2):189–201. the resulting graph can be read off to get some idea of how Bennett, B. 2005. Modes of concept definition and varieties of vagueness. Applied Ontology 1(1):17–26. 4 An input is said to be relevant if a worthwhile conclusion is Bunt, H., and Black, W. 2000. The ABC of Computational Pragmat- drawn from it. An input is more relevant if it yields a greater positive ics. In Bunt, H., and Black, W., eds., Natural Language Processing, cognitive effect for less processing effort. volume 1. Amsterdam: John Benjamins Publishing Company. 1–46. Carbonell, J. G., and Brown, R. D. 1988. Anaphora resolution: a Mitkov, R. 2014. Anaphora resolution. Routledge. multi-strategy approach. In Proceedings of the 12th Conference on Morgenstern, L., and Ortiz Jr, C. L. 2015. The Winograd Schema Computational linguistics, volume 1, 96–101. Challenge: Evaluating Progress in Commonsense Reasoning. In Carston, R. 1999. The semantics/pragmatics distinction: A view AAAI, 4024–4026. from relevance theory. In Turner, K., ed., The semantics/pragmatics Ng, V. 2017. Machine Learning for Entity Coreference Resolution: interface from different points of view. Oxford, UK: Elsevier. 85– A Retrospective Look at Two Decades of Research. In AAAI, 4877– 125. 4884. Davis, E., and Marcus, G. 2015. Commonsense reasoning and com- Peng, H.; Khashabi, D.; and Roth, D. 2015. Solving hard corefer- monsense knowledge in Artificial Intelligence. Communications of ence problems. In Proceedings of NAACL, 809–819. the ACM 58(9):92–103. Pustejovsky, J. 1991. The Generative Lexicon. Computational Davis, E. 2013. 
Qualitative Spatial Reasoning in Interpreting Text linguistics 17(4):409–441. and Narrative. Spatial Cognition & Computation 13(4):264–294. Rahman, A., and Ng, V. 2012. Resolving complex cases of definite Durrett, G., and Klein, D. 2013. Easy Victories and Uphill Battles pronouns: the winograd schema challenge. In Proceedings of the in Coreference Resolution. In EMNLP, 1971–1982. 2012 Joint Conference on EMNLP and CoNLL, 777–789. ACL. Fine, K. 1975. Vagueness, truth and logic. Synthese 30(3):265–300. Recanati, F. 2004. Pragmatics and Semantics. In Handbook of Grice, H. P. 1975. Logic and conversation. In Syntax and Semantics, Pragmatics. Oxford: Blackwell. 442–462. Vol. 3, Speech Acts. New York: Academic Press. 41–58. Rosch, E., and Mervis, C. B. 1975. Family resemblances: Studies in Gómez Álvarez, L., and Bennett, B. 2017. Classification, Individ- the internal structure of categories. Cognitive Psychology 7(4):573 – uation and Demarcation of Forests: formalising the multi-faceted 605. semantics of geographic terms. In 13th International Conference Rosch, E. 1978. Principles of categorization. In Rosch, E., and on Spatial Information Theory. Leibniz International Proceedings Lloyd, B. B., eds., Cognition and categorization, volume 1. Hills- in Informatics. dale, NJ: Lawrence Erlbaum Associates. 27–78. Hayes-Roth, F.; Waterman, D.; and Lenat, D. 1984. Building expert Schüller, P., and Kazmi, M. 2015. Using Semantic Web Resources systems. Reading, MA: Addison-Wesley. for Solving Winograd Schemas: Sculptures, Shelves, Envy, and Isaak, N., and Michael, L. 2016. Tackling the Winograd Schema Success. In SEMANTiCS (Posters & Demos), 22–25. Challenge Through Machine Logical Inferences. In Pearce, D., and Sofia Pinto, H., eds., STAIRS, volume 284 of Frontiers in Artificial Schüller, P. 2014. Tackling Winograd Schemas by formalizing Intelligence and Applications. IOS Press. 75–86. relevance theory in knowledge graphs. 
In Fourteenth International Conference on the Principles of Knowledge Representation and Kempson, R. 1984. Pragmatics, anaphora and logical form. In Reasoning. Schriffin, D., ed., Meaning, form and use in context: linguistic applications. Washington, DC: Georgetown University Press. 1–10. Sharma, A.; Vo, N. H.; Aditya, S.; and Baral, C. 2015. Towards Addressing the Winograd Schema Challenge-Building and Using Lenat, D. B. 1995. CYC: A large-scale investment in knowledge a Semantic Parser and a Knowledge Hunting Module. In IJCAI, infrastructure. Communications of the ACM 38(11):33–38. 1319–1325. Lenci, A. 2011. Composing and updating verb argument expecta- Sharma, A. 2014. Solving Winograd schema challenge: Using tions: A distributional semantic model. In Proc 2nd Workshop on semantic parsing, automatic knowledge acquisition and logical Cognitive Modeling and Computational Linguistics, 58–66. ACL. reasoning. Ph.D. Dissertation, Arizona State University. Levesque, H.; Davis, E.; and Morgenstern, L. 2012. The Winograd Speer, R., and Havasi, C. 2012. Representing General Relational Schema Challenge. In Thirteenth International Conference on the Knowledge in ConceptNet 5. In LREC, 3679–3686. Principles of Knowledge Representation and Reasoning. Sperber, D., and Wilson, D. 2004. Relevance theory. In Handbook Levesque, H. J. 2014. On our best behaviour. Artificial Intelligence of Pragmatics. Oxford: Blackwell. 607–632. 212:27–35. Turing, A. M. 1950. Computing machinery and intelligence. Mind Levesque, H. J. 2017. Common sense, the Turing test, and the quest 59(236):433–460. for real AI. Cambridge, MA: MIT Press. Verheyen, S.; Ameel, E.; and Storms, G. 2007. Determining the Levinson, S. C. 1995. Three levels of meaning. In Grammar and dimensionality in spatial representations of semantic concepts. Be- meaning: Essays in honour of Sir John Lyons. Cambridge University havior Research Methods 39(3):427–438. Press. 90–115. Zadeh, L. 1965. Fuzzy sets. 
Information and Control 8(3):338–353. Liu, Q.; Jiangb, H.; Linga, Z.-H.; Zhuc, X.; Weid, S.; and Hua, Y. 2016. Commonsense Knowledge Enhanced Embeddings for Solving Pronoun Disambiguation Problems in Winograd Schema Challenge. arXiv preprint arXiv:1611.04146. McCarthy, J. 1993. Notes on formalizing context. In Proceedings of the 13th international joint conference on Artifical intelligence- Volume 1, 555–560. Morgan Kaufmann Publishers Inc. McCarthy, J. 1998. Elaboration tolerance. In Common Sense, volume 98. Mikolov, T.; Yih, W.-t.; and Zweig, G. 2013. Linguistic regular- ities in continuous space word representations. In NAACL HLT, volume 13, 746–751. Miller, G. A. 1995. WordNet: a lexical database for English. Com- munications of the ACM 38(11):39–41.
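As an illustration of the prototypicality test described under ‘Appealing to prototypicality’, the four definitions of ‘smaller’ from (Davis 2013) can be sketched as four precisifications evaluated over a pair of objects. This is only a minimal sketch under stated assumptions, not an implementation from the paper: objects are restricted to solid spheres, the names `Sphere`, `PRECISIFICATIONS`, `is_prototypical` and `agreement` are hypothetical, the scaling test compares shape only and ignores position, and the agreement ratio is just one possible metric for ‘satisfying most of the interpretations’.

```python
from dataclasses import dataclass
from math import dist, pi


@dataclass(frozen=True)
class Sphere:
    """Hypothetical object type: a solid sphere with a radius and a centre."""
    radius: float
    center: tuple = (0.0, 0.0, 0.0)


def smaller_by_volume(a: Sphere, b: Sphere) -> bool:
    # Definition 1: VolumeOf(a) < VolumeOf(b)
    return (4 / 3) * pi * a.radius ** 3 < (4 / 3) * pi * b.radius ** 3


def smaller_by_diameter(a: Sphere, b: Sphere) -> bool:
    # Definition 2: DiameterOf(a) < DiameterOf(b)
    return 2 * a.radius < 2 * b.radius


def smaller_by_containment(a: Sphere, b: Sphere) -> bool:
    # Definition 3: a is a proper subset of b (for solid spheres).
    return dist(a.center, b.center) + a.radius <= b.radius and a.radius < b.radius


def smaller_by_scaling(a: Sphere, b: Sphere) -> bool:
    # Definition 4: b = Scale(a, s) for some s > 1.
    # Assumption: only shape is compared, so any two spheres are scalings
    # of each other and the test reduces to comparing radii.
    return a.radius > 0 and b.radius / a.radius > 1


PRECISIFICATIONS = (
    smaller_by_volume,
    smaller_by_diameter,
    smaller_by_containment,
    smaller_by_scaling,
)


def agreement(a: Sphere, b: Sphere) -> float:
    """Fraction of precisifications under which a counts as smaller than b."""
    return sum(test(a, b) for test in PRECISIFICATIONS) / len(PRECISIFICATIONS)


def is_prototypical(a: Sphere, b: Sphere) -> bool:
    """Prototypical case: 'smaller' holds under every precisification."""
    return all(test(a, b) for test in PRECISIFICATIONS)


if __name__ == "__main__":
    unit, double = Sphere(1.0), Sphere(2.0)
    far_unit = Sphere(1.0, center=(5.0, 0.0, 0.0))
    print(is_prototypical(unit, double))   # concentric spheres of radius 1 and 2
    print(agreement(far_unit, double))     # containment fails for the distant sphere
```

Under these assumptions the concentric spheres of radius 1 and 2 satisfy all four definitions, matching the example in the text, while a distant sphere of radius 1 fails only the containment reading. A pragmatics-driven component could use such a score as a default preference and fall back to a single context-appropriate precisification (for instance containment when the sentence is about ‘fitting in’).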