The Role of Pragmatics in Solving the Winograd Schema Challenge

Adam Richard-Bollans and Lucía Gómez Álvarez and Anthony G. Cohn
School of Computing
University of Leeds, Leeds, UK
{mm15alrb, sc14lga, a.g.cohn}@leeds.ac.uk


Abstract

Different aspects and approaches to commonsense reasoning have been investigated in order to provide solutions for the Winograd Schema Challenge (WSC). The vast complexities of natural language processing (parsing, assigning word sense, integrating context, pragmatics and world-knowledge, ...) give broad appeal to systems based on statistical analysis of corpora. However, solutions based purely on learning from corpora are not currently able to capture the semantics underlying the WSC, which was intended to provide problems whose solution requires knowledge and reasoning, rather than statistical analysis of superficial lexical features. In this paper we consider the WSC as a means for highlighting challenges in the field of commonsense reasoning more generally. We begin by discussing issues with current approaches to the WSC. Following this we outline some key challenges faced, in particular highlighting the importance of dealing with pragmatics. We then argue for an alternative approach which favours the use of knowledge bases where the deep semantics of the different interpretations of commonsense terms are formalised. Furthermore, we suggest using heuristic approaches based on pragmatics to determine appropriate configurations of both reasonable interpretations of terms and necessary assumptions about the world.

Introduction

The Winograd Schema Challenge (Levesque, Davis, and Morgenstern 2012) was conceived as a new benchmark in artificial intelligence, which would improve on the Turing Test (Turing 1950) by removing the need for deception and focusing more on understanding. The task is a particular type of pronoun disambiguation problem. Sentences with a pronoun and two candidate referents are given, and the task is to find the correct referent of the pronoun. As the challenge is intended to require genuine intelligence and understanding, the sentences are supposed to be constructed in such a way that syntactic constraints and semantic preference do not alone enable the disambiguation. This construction is achieved in part by finding pairs of sentences that differ by only one word but in which the pronoun reference is different. For example:

   The large ball crashed right through the table because it was made of [steel/styrofoam]. What was made of [steel/styrofoam]? Answers: The ball/the table.¹   (1)

   ¹ Taken from www.cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WSCollection

The pronoun ‘it’ refers to either the ball or the table depending on whether ‘steel’ or ‘styrofoam’ is used. In both cases the syntactic structure remains the same and, supposing that clear semantic preferences relating ‘steel’ and ‘crashing through things’ or ‘styrofoam’ and ‘being crashed through’ cannot be easily learned from mining a large corpus, it is hoped that any system which resolves the pronoun must use some sort of genuine understanding.

In the literature discussing the WSC and its motivation as a benchmark we see example reasoning processes incorporating detailed semantics of the language involved (Davis 2013; Levesque 2014; Levesque, Davis, and Morgenstern 2012; Morgenstern and Ortiz Jr 2015). This kind of approach, however, has not been at the forefront of proposals to the challenge. This is in large part due to the enormous complexity of dealing with natural language and constructing large enough knowledge bases to handle such varied contexts.

In order to further the symbolic approach we investigate some of the problems faced, mainly pragmatics. It is hoped that this sort of analysis helps to shed light on what kind of reasoning is needed where, and that heuristic methods will remove a large portion of the burden of reasoning about natural language. Along similar lines, a partial solution is provided in (Schüller 2014), using relevance theory (Sperber and Wilson 2004) to motivate selection of the best knowledge graph to describe a sentence.

In this paper we first explore what kind of reasoning capabilities we expect a system to display when solving the WSC and we analyse how some of the proposed approaches compare to this. We then consider some key challenges for solving the WSC using reasoning we consider appropriate; in particular, that pragmatics and context are very difficult to capture and semantics are hard to formalize due to vagueness. Finally, we show how pragmatic considerations can help in solving the WSC; specifically, we consider how prototype theory and heuristic methods can be used to support symbolic approaches.

What kind of reasoning are we looking for?

We first consider the example above (1) given in (Levesque 2014), using the word ‘styrofoam’. Humans would successfully resolve this by knowing particular properties of styrofoam, maybe some naive physics and even some general properties of balls and tables.

Levesque then considers what the outcome should be if we change styrofoam to XYZZY, where XYZZY is some material that we are given some facts about, one of the facts being ‘It is ninety-eight percent air, making it lightweight and buoyant’. Given this fact, humans would be able to reason that the table is made of XYZZY. This is a part of intelligent behaviour that we would like to replicate, and is clearly dependent on having and being able to reason about detailed knowledge. Further, it has been suggested as a possible extension to the test to add a requirement for the solution to provide a simple explanation of its choice (Morgenstern and Ortiz Jr 2015). This need for explanation would also seem to depend on reasoning with detailed knowledge; in order to explain why the table is made of styrofoam, it seems necessary to have an understanding of the mechanics of the situation. The ability to provide an explanation is also important more generally for the field of commonsense reasoning, for example for decision support systems that need to provide justifications for decisions (Hayes-Roth, Waterman, and Lenat 1984).

Versatile solutions

The WSC was conceived as a new benchmark for artificial intelligence; as such, we hope that solutions to the WSC will provide tools for tackling a broader range of question answering tasks and commonsense challenges. In this way, solutions to the challenge should display versatility as well as making advances in the WSC specifically, thus representing genuine progress towards truly intelligent machines. Solutions which are over-specific to the WSC and only provide insight into this narrow set of coreference resolution problems are not likely to be ‘engaging in behaviour that we would say shows thinking in people’ (Levesque, Davis, and Morgenstern 2012). This is a similar but more general requirement than elaboration tolerance (McCarthy 1998).

The situations described in Winograd sentences (WS) are generally common/normal occurrences; however, it is desirable for AI systems to be able to reason about out-of-place objects and strange scenarios. The ability to do this displays a genuine understanding of what is going on. Levesque gives the example ‘Can a crocodile run a steeplechase?’ (Levesque 2014). Most humans would answer this easily using basic knowledge about crocodiles (in particular that they cannot jump) and what is necessary to be able to complete a steeplechase. Of course, as noted by Levesque, a statistical approach using the closed world assumption would be likely to get the right answer to this question too, as there is little evidence of crocodiles running steeplechases. It would, however, be less likely to answer the question correctly if the animal was a gazelle (which presumably could run a steeplechase).

Having briefly considered the kind of solutions we are aiming for, we now look at how some existing approaches compare to this.

Existing approaches to the challenge

Since the inception of the WSC there has been some theoretical discussion on the purpose of the challenge (Davis and Marcus 2015; Levesque 2014; 2017), various methods suggested for tackling the problem (Sharma et al. 2015; Schüller 2014; Bailey et al. 2015; Rahman and Ng 2012; Peng, Khashabi, and Roth 2015), and four implementations entered into the 2016 challenge² (Liu et al. 2016; Isaak and Michael 2016) (two of the competitors did not release papers). The WSC is a particular type of anaphora resolution task, on which there has already been much work done in the natural language processing community (Ng 2017; Mitkov 2014; Carbonell and Brown 1988); however, because the task necessitates the use of world knowledge, the methods employed there are not wholly suitable for the challenge.

   ² www.cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html

Formalizing the necessary aspects of reasoning to tackle the WSC (spatial, temporal, causal, epistemic, ...) and integrating them in one system is notoriously hard. Therefore, it is not surprising that the space of genuine proposed solutions is sparse, and that existing approaches are mostly based on statistical methods that circumvent the need for a precise understanding of the semantics of the schemas by learning likely answers from text corpora.

In this section we analyse some of the solutions proposed along these lines. We consider both their performance and success on the challenge and also their achievements and relevance for broader commonsense reasoning, which is the ultimate aim of the WSC as a benchmark.

Machine learning approaches

Machine learning methods for anaphora resolution have been used extensively over the past two decades (Ng 2017). In this section we consider some of the best known such approaches for tackling the WSC.

The team that came first in the 2016 WSC² used ‘Commonsense Knowledge Enhanced Embeddings’ (Liu et al. 2016), which works by learning word representation vectors from large text corpora while incorporating commonsense knowledge as constraints in the training process. For the competition the commonsense knowledge was obtained from CauseCom, a set of cause and effect pairs such as ‘winning causes happiness’ (Liu et al. 2016), though the team has also incorporated WordNet (Miller 1995) and ConceptNet (Speer and Havasi 2012). A neural network is then trained to answer yes or no when given candidate/pronoun pairs (as vectors), and this network is then used to answer new disambiguation problems.

Though this system achieves good performance on the challenge, it would be down to chance whether it correctly answers the XYZZY problem given by Levesque, and it is unclear whether it could be used to solve the crocodile-steeplechase problem or how it could be developed in future to explain how it reaches its conclusions.

Rahman and Ng (2012) combine multiple methods to resolve the pronoun for a large corpus of WSs. This work achieved a high score on their corpus, 73.1%. However, the corpus selection has been criticized for containing redundancy (Sharma 2014). Further, the approach relies
heavily on statistical methods for assessing the semantic preferences of types and events, e.g. a lion is a type of predator, and being the subject of a kill event makes one more likely to be the object of an arrest event. It is clear that ‘lions eat zebras because they are predators’ is not a ‘Google-proof’ WS and should be discarded. When such type distinctions are not useful, the system may rely on FrameNet (Baker, Fillmore, and Lowe 1998); in the case of ‘John killed Jim so he was arrested’, FrameNet gives John the role of ‘killer’ and Jim the role of ‘victim’ and the system, using statistical methods, concludes that it is more likely for a ‘killer’ (John) to be arrested. In this case the system resolves the pronoun successfully. However, this takes no account of the importance of the connective: changing the sentence to ‘John killed Jim after he was arrested’ should force one to re-evaluate the disambiguation.

Work by Peng et al. (2015) has been successful, achieving higher results (76.4%) than Rahman and Ng on the same corpus. The technique is similar to the FrameNet approach of (Rahman and Ng 2012) but they also take connectives into account. This approach can give anything from crude, and clearly problematic, forms of knowledge such as ‘{flower has pollen} is more likely than {bee has pollen}’ to more reasonable knowledge such as ‘the subject of “be afraid of” is more likely than the object of “be afraid of” to be the subject of “get scared of”’. Though these sorts of techniques will likely prove very useful for natural language processing, and may even manage to pass the WSC, there is a fundamental issue: these techniques are learning about the likelihood of combinations of words in corpora, and there appears to be little in the way of transferable knowledge or understanding. For example, it is clear that the kind of background knowledge necessary to solve the crocodile-steeplechase problem is not present.

Rather than applying reasoning to knowledge, these techniques are geared towards mining what we may call commonsense rules. We discuss the nature of such rules in the following section.

Commonsense rules

In the WSC it often appears possible to resolve pronoun ambiguity through an appeal to normality: heavy things cannot be lifted, younger people are fitter, useless objects go in the bin while useful tools are kept in storage, etc. Hence, a large part of the suggested approaches to the WSC have been about ways of finding and/or incorporating such ‘commonsense rules’. We believe, however, that this is a rather crude view of commonsense reasoning and outline some problems of these approaches below.

One proposed approach is that we reduce some of the implied causation in WSs to correlation (Bailey et al. 2015). This uses ‘correlation formulas’ of the form F ⊕ G, such as ‘fit into(x, y) ⊕ large(y)’ to say that ‘stuff fitting into y’ is correlated with ‘y being large’. Some inference rules are given governing such correlation formulas and it is shown how these could be used to justify a solution to a WS. This approach is however problematic. It is analogous to a discussion in (Bunt and Black 2000): by reducing to mere convention the reason why ‘There is a howling gale in here!’ is understood as a command to close the window, we are oversimplifying and missing out more important reasoning processes, including context. Similarly, if we were to find a list of commonsense correlations like ‘fit into(x, y) ⊕ large(y)’ through corpus mining, we would be ripping the words out of context and might be missing out important reasoning processes.

This is not to say that conventions do not exist or do not form an important part of commonsense reasoning. Natural language is full of conventions that we may rely upon to communicate. For example, considering the sentence ‘Sam chopped down the tree’ there is a default assumption that the chopping is done with an axe. This kind of convention can be considered as part of linguistic knowledge (Pustejovsky 1991). However, reasoning based solely on conventions may be too crude, as it does not take contextual factors into consideration. If we know that Sam is holding a sword, then we may reject the default assumption that Sam chops down the tree with an axe. One way of dealing with the context dependency of such conventions may be to apply context frames, as in (McCarthy 1993), i.e. in the context of Sam holding a sword, the statement ‘Sam chopped down the tree’ suggests that Sam did the chopping with a sword rather than an axe. However, even if we can create appropriate context frames using salient aspects of context, it seems that the process of creating convention/context pairs would continue ad infinitum. We would hope that reasoning removes the necessity for a lot of these rules, e.g. when someone is holding an appropriate tool, T, for performing action, A, and we are told that they performed action A, then we can assume that they have used T to do A.

The tactic for many approaches is to begin by learning commonsense knowledge from large text corpora or by integrating natural language knowledge bases. Part of the appeal of this is that knowledge can be exploited without having to translate between formal and natural language. However, the methods for extracting commonsense knowledge from the Web can be problematic. Language is used in an efficient way and commonsense knowledge is often left implicit (Schüller and Kazmi 2015).

Even if we were able to overcome some of the problems of mining commonsense, do we want to use reasoning that relies solely on these correlations and rules? Though they may be helpful for certain applications, the reasoning mechanisms need to incorporate less crude knowledge. Regarding the desire for versatility and considering some of the problems listed on the Common Sense Problem Page³, it is clear that this approach is over-specific to the WSC. It would also clearly be hard to mine relations between crocodiles and steeplechases in this way! Moreover, any explanation of the disambiguation given by such a system would not be very enlightening. Consider schema (1) with ‘steel’: explaining why ‘it’ refers to the ball by saying that ‘steel things are more likely to crash through things than to be crashed through’ is not a reasonable explanation. Even the ability to cite a salient property of steel like ‘steel is hard’ would be an important improvement.

   ³ www-formal.stanford.edu/leora/commonsense/

The approaches outlined above at best only incorporate shallow semantic features and do not appear to exhibit the
kind of intelligent behaviour the challenge was designed to test. We believe that, in order to carry out complex inferences and really understand the world, some definition of natural language in terms of more refined primitives is often necessary. It is necessary to have genuine world knowledge of entities, as well as their physical, social/historical and functional attributes, as in (Bennett 2005), and be able to reason about that knowledge, e.g. crocodiles have short legs and long bodies, making them unsuitable candidates for a steeplechase, rather than superficial knowledge about relations between entities which are mined from corpora, e.g. crocodiles do not run steeplechases. A line may be drawn by the distinction between reasoning from first principles and reasoning by analogy. They can both be valid forms of reasoning, but reasoning by analogy alone is not enough to be considered intelligent.

Key challenges

This section outlines some particular problems that need resolving in order to tackle the WSC and for commonsense reasoning systems more generally.

Pragmatics

A large part of the complexity of the WSC comes from pragmatic considerations. There are varying positions on the definition of pragmatics (Carston 1999); however, it is generally understood as the field concerned with extra-linguistic factors, such as context, and how they allow a speaker’s intended meaning to be understood.

Semantic considerations are clearly essential but they are generally not enough to reach a conclusion about the disambiguation for a WS. This is an example of semantic underdeterminacy: by considering only the literal meanings of terms in a sentence, and not accounting for the intended meaning, we do not obtain a truth-evaluable proposition. For example, the sentence ‘Tom threw his school bag down to Ray after he reached the top of the stairs’ does not contain much information if we only consider the semantics. We also need to consider the intention of the speaker and we may infer this from the decisions the speaker takes regarding the specific choice of language, what information is omitted, what is left ambiguous, the phrasing of the sentence, etc. Indeed, Kempson argues that ‘the articulation of semantics [does not alone] provide the full propositional content/logical form/truth conditions expressed by a sentence’ (Kempson 1984).

To evidence this view, we can see that for many WSs wrongly disambiguating the pronoun does not necessarily violate world knowledge. For example, when dealing with the sentence:

   The trophy does not fit into the suitcase because it_x is too large¹   (2)

there are various interpretations of ‘large’ which give no definite disambiguation. If we imagine a trophy and suitcase to be vase-shaped, with a wide base, narrow stem and wide top, and that the trophy fits into the suitcase, it is possible that making the suitcase larger via a scale projection would make the trophy no longer fit. It is in part by making pragmatic considerations that we can assign appropriate interpretations to these terms and thus disambiguate the pronoun.

Moreover, even in the sentences where each term can be precisely and appropriately defined we can still have semantic underdeterminacy. It is often the case that an utterance is not totally explicit and leaves the reader to fill in the gaps with available assumptions and inferences (Carston 1999). One of the ways that a hearer may fill in these gaps and infer a speaker’s intention is by assuming Grice’s Maxims for co-operative communication (Grice 1975); e.g. the ‘Quantity Maxim’, stating: ‘Make your contribution as informative as is required’ and ‘Do not make your contribution more informative than is required’. So for instance, if a speaker goes into a lot of detail when making an utterance, we may assume that there is a particular reason for this and can infer things based on this knowledge. This kind of pragmatic inference is also important for written text, and hence the WSC. Therefore, as it stands, any solution to the WSC needs some mechanisms for coping with this implicit knowledge.

In the next section we consider some particular examples of this sort of inference when addressing a WS.

Assumptions about the world

When facing any WS there are multiple commonsense principles that apply which allow us to create an accurate model of the situation. What we aim to achieve is some guidance on how to choose these principles and when they apply. To this end we examine the following WS:

   Tom threw his school bag down to Ray after he_x reached the [top/bottom] of the stairs. Who reached the [top/bottom] of the stairs? Answer: top: Tom. bottom: Ray.¹   (3)

We will use this example to help elucidate some of the complexities faced, including the initial position of objects and relevant objects.

The main idea of this sentence is that to throw something down to someone, that person must be below you. We then use the idea of what it means to be at the top of something, i.e. that if Ray is at the top of the stairs then he cannot be below Tom. This is however not as clear as it seems.

Initial position. It is possible that Tom is on some balcony above the stairs and waits for Ray to reach the top of the stairs before throwing the bag down to Ray. So why do we like the answer ‘Tom’? It appears we assume that Tom and Ray are initially in a similar location, or to be more precise, that they both have the same relation to any given landmark (in this case the stairs). Character x reaching the top of the stairs implies that x has moved upwards. Not given any information on the other character, y, we assume they have not moved and so x is likely to be above y.

Alternatively, x may have been walking along a corridor to reach the top of the stairs. In this scenario we have two locations to consider, the corridor and the stairs. We suppose that Tom and Ray are on the stairs or in the corridor. In this case it would make no sense for Ray to be at the top of the stairs, as then Tom would not be able to throw anything down to him (from the corridor or the stairs); so we suppose that it
must be Tom who walks along the corridor to reach the top of the stairs and throws the bag down to Ray.

We appeal to a rule that in some narrative, unless we have reason to infer otherwise, characters are nearby/in the same place. This idea can be explained by Grice’s quantity maxim, i.e. there is no pertinent difference in the positions of either Tom or Ray; if there were then the quantity maxim says it should be made known.

This rule however does not always hold. Imagine we replace ‘stairs’ with ‘swimming pool’:

   Tom threw his school bag down to Ray after he_x reached the top of the swimming pool. Who reached the top of the swimming pool? Answer: Ray.

In this scenario x reaches the top of the swimming pool, breaking the surface of the water. x is then not in a position to throw something like a school bag downwards, as it is pretty hard to throw textile objects through water. Hence, we imagine that x is not Tom, but Ray, and that Tom must be stood somewhere above the swimming pool.

Relevant objects. In general in the WSC, to come to a conclusion we only need to reason about entities that are explicitly mentioned. In the school bag example we reason about the two characters in the narrative, Tom and Ray, the staircase and the school bag itself, combining knowledge of actions like ‘throwing’, ‘reaching the top of’, etc. with knowledge of these objects. In general then, we do not need to appeal to the existence of extra entities in order to come to a conclusion. This can also be explained by the quantity maxim: the sentence should provide the necessary objects for the reader to make sense of the sentence.

However, as previously discussed, certain words or phrasings indirectly suggest the existence of certain entities, as in the ‘Sam chopped down the tree’ example. We can in part account for these entities by encoding them into a lexicon (Pustejovsky 1991), though these are conventions that will not always hold. Therefore a defeasible reasoning process is necessary to select the most appropriate interpretation.

To conclude our discussion about assumptions about the world, we see that appropriate assumptions need to be made in order to reach the right conclusion. Further, we believe that, to varying extents, these kinds of considerations arise when analysing most WSs appearing in the collection maintained by Davis¹. However, the assumptions are dependent on the specific situation and we need to discern somehow when the assumptions are appropriate. Deciding when to accept these

discussion so far has motivated a detailed level of knowledge. Further, there is evidence that, even for coreference problems that would be considered easy with respect to the WSC, incorporating shallow semantic features is not enough (Durrett and Klein 2013). Yet, if we are to solve the WSC using deeper semantics, it is clear that the necessary commonsense knowledge would involve the formalization of a notoriously extensive knowledge base. How to obtain and organize such a large knowledge base is unclear.

On the one hand, due to the variety and scope necessary, mining commonsense knowledge is appealing; however, as previously discussed, the available methods and nature of text corpora pose limitations to obtaining deep knowledge, which is complex and commonly not explicit. On the other hand, hand crafted knowledge bases such as CYC (Lenat 1995), which incorporate a deeper level of knowledge, have had limited success and it is not clear how they should be exploited.

Beyond the problem of its acquisition, it is well known that commonsense knowledge is hard to formalize, particularly if the required level of detail requires the semantics of natural terms to be preserved. Vagueness and ambiguity are inherent to natural language and, for that reason, it is problematic to prescribe single strict interpretations to natural terms. To illustrate this, consider the WS (3) and imagine the case of a naive definition of a relation at the top of(x, y) ≡ x is on y and for any z which is part of y, x is not below z. We see that this fails for multiple reasons.

1. If Tom were one step below the very last one, it could still be considered that he is at the top of the stairs, particularly if Ray were well below him. We call it sorites vagueness when there is a lack of a clear threshold of application of a term.
2. If we change ‘stairs’ to ‘building’ we might say that Tom is at the top of a building because he is on the top floor, rather than on the roof. In that case we are shifting the interpretation of the predicate to something like at the top of(x, y) ≡ z is the top part of y and x is on z. There may also be many admissible interpretations of what it means for z to be the top part of y. We call the multiplicity of conceptually distinct interpretations of natural terms conceptual vagueness. Further discussion on the multiple interpretations of natural language terms and their role in knowledge bases and ontologies can be found in (Bennett 2005).

   Much of the work done in acquiring commonsense knowl-
assumptions should include pragmatic considerations. For            edge circumvents vagueness in different ways, such as us-
example, it is lexical and semantic knowledge that suggest          ing shallow semantics or microtheories that do not need to
the existence of an axe in the sentence ‘Sam chopped down           be consistent with one another. Various theories, however,
the tree’, however it is a pragmatic task to actually infer         have been proposed for dealing with vagueness. Fuzzy logic
this. This motivates a heuristic process which incorporates         (Zadeh 1965) stands as an intuitive solution for modelling
pragmatics and gives preference to default assumptions, we          sorites vagueness by assigning degrees of truth. More in-
will discuss this idea later.                                       teresting for this research, supervaluation semantics (Fine
                                                                    1975) is based on the idea that vague language can be inter-
Formalizing commonsense knowledge: level of                         preted in many different precise ways, each of which can be
detail and vagueness                                                logically conceptualised in a precisification (Bennett 2001;
An important issue is to recognize the level of semantics that      Gómez Álvarez and Bennett 2017), thus also offering support
one believes is appropriate for a solution to the WSC. Our          for modelling conceptual vagueness.
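
The contrast between a fuzzy treatment and a supervaluation-style treatment of ‘at the top of’ can be illustrated with a small sketch. It is a toy model rather than a formalisation taken from the cited work: the `Scene` type, the three candidate precisifications and the linear membership function are all assumptions introduced here purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Scene:
    """Hypothetical toy scene: x stands on one numbered level of y (1 = lowest)."""
    level_of_x: int
    top_level: int


# Each precisification is one precise reading of the vague term 'at the top of'.
# These three readings are illustrative assumptions, not definitions from the paper.
PRECISIFICATIONS: Dict[str, Callable[[Scene], bool]] = {
    "highest_level_only": lambda s: s.level_of_x == s.top_level,
    "within_one_level": lambda s: s.level_of_x >= s.top_level - 1,
    "upper_quarter": lambda s: s.level_of_x > 0.75 * s.top_level,
}


def supervaluate(scene: Scene) -> str:
    """Classify a scene by checking it against every precisification."""
    verdicts = [holds(scene) for holds in PRECISIFICATIONS.values()]
    if all(verdicts):
        return "clearly true"      # holds under every reading
    if not any(verdicts):
        return "clearly false"     # holds under no reading
    return "borderline"            # the readings disagree


def fuzzy_degree(scene: Scene) -> float:
    """Assumed linear membership function for a fuzzy reading of the same term."""
    return scene.level_of_x / scene.top_level


if __name__ == "__main__":
    for level in (12, 11, 2):
        scene = Scene(level_of_x=level, top_level=12)
        print(level, supervaluate(scene), round(fuzzy_degree(scene), 2))
```

The supervaluation-style check keeps each reading separate, so conceptually different interpretations remain visible, whereas the fuzzy function collapses the judgement into a single degree.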
   So where do all these considerations lead us? In order to reach the kind of solution
we desire, we must be able to deal with semantic underdeterminacy (part of which involves
deciding when to use appropriate commonsense assumptions) and also make use of a vast
amount of detailed knowledge while dealing with the associated problems of vagueness.
   With these issues in mind, we now consider some avenues for further work.

   The role of pragmatics in solving the WSC

In the previous sections we have highlighted how current approaches, regardless of their
success in solving schemas, have provided limited support for the kind of intelligent
behaviour that we would like to replicate. Here, in an attempt to account for some of the
key challenges, we propose an alternative approach, favouring the use of knowledge bases
where the deep semantics of the different interpretations of commonsense terms are
formalised. Furthermore, we suggest using heuristic approaches based on pragmatics to
determine, in the context of each particular schema, appropriate configurations of both
reasonable interpretations of the terms and necessary assumptions about the world.
   For this purpose we first motivate the use of prototypes for categories and relations
and then develop how heuristic methods can provide a manageable way of using pragmatic
knowledge for the disambiguation of WSs.

Appealing to prototypicality
There is a range of work in pragmatics and cognitive science highlighting the importance
of using prototypes: in utterance interpretation defaults are assigned before contextual
and pragmatic considerations are taken into account (Levinson 1995; Recanati 2004), and
there is also evidence for the human preference for good examples (prototypes) of some
category as opposed to boundary cases and, further, that prototypes are associated with
the least processing effort (Rosch 1978). In the particular scenario of a WS, we argue
that the way vague terms are presented leads the reader to interpret them considering
prototypical instances fitting the described scenario. For instance, when one reasons
about the WS (2) involving the trophy and the suitcase, it is not necessary to worry about
a precise semantic commitment for the notion of larger, but instead to evaluate the
sentence considering clear cases that satisfy most of the possible interpretations.
   Some of the previously discussed approaches work along similar lines, using general
commonsense rules and a notion of correlation which appeal to a sense of typicality.
However, we believe that this should be more nuanced and that the deep semantics of
different interpretations should be preserved. Hence, we propose an approach using ideas
from prototype theory (Rosch and Mervis 1975) to differentiate prototypical instances of
vague terms and relations from borderline cases within a supervaluationist approach.
   Much work has been done on how to pinpoint prototypical members of categories, mainly
using vector analysis or conceptual spaces to find the centroid of a concept (Verheyen,
Ameel, and Storms 2007; Lenci 2011). However, it is not clear how one could reason with
this to resolve a WS, and further, we are not only interested in picking a prototypical
example from a category, say from the class ‘pet’ or ‘things that we eat’. Instead, we
would also like to find prototypical instances of relations that can be used to compare an
infinite number of objects. Although there is some work on vector analysis for
relationships between words (Mikolov, Yih, and Zweig 2013), in particular for analogy
problems, it does not appear to be applicable to this sort of reasoning problem.
   Suppose we have a vague term, like ‘smaller’. How can we decide on prototypical
instances of this relation? Adopting the supervaluation approach we would have a
collection of precise interpretations of its meaning. Following motivation from (Rosch and
Mervis 1975), which considers shared properties of classes, in an ideal scenario
prototypical instances of ‘smaller’ share properties across all instances of ‘smaller’,
i.e. a prototypical instance of ‘smaller’ is considered smaller in all plausible
interpretations. Consider the definitions for ‘smaller’ given in (Davis 2013):

1. Smaller(a, b) ≡ VolumeOf(a) < VolumeOf(b)
2. Smaller(a, b) ≡ DiameterOf(a) < DiameterOf(b)
3. Smaller(a, b) ≡ a ⊂ b
4. Smaller(a, b) ≡ ∃s(s > 1 ∧ b = Scale(a, s))

   In this scenario, there are certainly pairs of objects that fall into all four
categories (e.g. a sphere of radius 1 is smaller than a sphere of radius 2 in all the
above senses). Hence, it would be appropriate to take the conjunction of all four
definitions as a requirement for an instance to be considered a prototypical case of
‘smaller’. However, in certain scenarios it may be inappropriate to take the conjunction
in this way, as some definitions may be conflicting. In this case different metrics can be
proposed for selecting prototypes that satisfy most of the interpretations.
   Finally, our main claim in this section is twofold. On the one hand, we consider that
an understanding of typicality is necessary for commonsense reasoning, and that by default
we should consider prototypes. On the other hand, a process which can only reason over
prototypical definitions is clearly flawed in many respects, as it over-simplifies. Humans
often use context to help narrow definitions; for example, defining ‘smaller’ in a
particular way makes sense when talking about ‘fitting in’. Hence we believe that a good
approach should reflect the diversity of possible interpretations of vague terms and that
an engine based on pragmatics should guide the selection of appropriate alternatives when
the prototype is not suitable.

Heuristics standing in for pragmatics
In this paper we have discussed some approaches proposed for the WSC relying on heuristic
methods in different ways (Rahman and Ng 2012; Peng, Khashabi, and Roth 2015; Liu et al.
2016). Overall, we concluded that heuristics do not provide satisfactory solutions when
reduced to evaluating shallow semantic notions such as correlation.
   Instead, as has been argued, we believe that a good solution to the WSC should
disambiguate the pronoun by considering the most plausible configuration of the scenario
described,
and the process of finding it should incorporate rich syntactic, semantic and pragmatic
considerations. However, although we advocate deeper semantics and symbolic approaches
that allow for the kind of reasoning that we want (see section above), we propose that
heuristic methods have a key role in WS resolution: that of simplifying the space of
possibilities and estimating reasonably good configurations of precisifications and
necessary assumptions about the world.
   As we have highlighted above, in order to carry out satisfactory reasoning we believe
a system should give preference both to commonsense assumptions about the world and to
prototypical interpretations of the terms involved. These, however, should only be
preferences rather than concrete rules. Deciding when to accept or reject these default
assumptions requires knowledge and pragmatic understanding. Because this complex mix of
pragmatics and world knowledge can contradict itself, possible solutions or configurations
of a described scenario are not unique. For example, when discussing the issue of throwing
a school bag in a swimming pool above, the implausibility of throwing a school bag through
water outweighed the assumption of Tom and Ray being in the same place. However, we may
also consider that the assumption of characters being in the same place outweighs the
usual interpretation of ‘throw down’ and ‘top’: supposing Tom and Ray are both standing in
the swimming pool, we may interpret ‘throw down’ as ‘throw horizontally away from the end
of the swimming pool’ and ‘top of the swimming pool’ to denote the end of the swimming
pool. The result would then be to disambiguate the pronoun as ‘Tom’ rather than ‘Ray’.
This second interpretation is not wrong; however, when ‘throw down’ and ‘top’ are
interpreted in their usual way there is a plausible inference that Tom and Ray are not
both located in the swimming pool. This would then be an example of a ‘conversational
implicature’ (Grice 1975) and would explain why the writer of the sentence did not
explicitly give Tom and Ray’s initial locations. Hence in the first interpretation we have
a good explanation for violating the default that Tom and Ray are located in the same
place and we also interpret all the terms in a usual fashion, therefore making this
interpretation appear to be the valid one.
   Being able to leverage these kinds of inferences is an important and difficult task in
commonsense reasoning. Along these lines, one avenue (Schüller 2014) adopted in tackling
the WSC has been to explore relevance theory (Sperber and Wilson 2004). This theory,
inspired by Grice’s work, is based on the idea that an utterance can have a variety of
interpretations, and that it is through parsing, disambiguating terms, resolving pronouns
and adding pragmatic inference as well as appropriate assumptions based on context that
one can comprehend the meaning of an utterance. The principle guiding these tasks is the
idea of maximizing relevance⁴. Schüller uses these ideas to motivate a heuristic process
for reasoning over graphs, where a fitness function is employed to find relevant
combinations that provide a disambiguation. Moreover, the resulting graph can be read off
to get some idea of how it came to that disambiguation, potentially satisfying Morgenstern
and Ortiz’s requirement of a simple explanation. In spite of this being preliminary
research, in our view its reasonable results suggest that fruitful work can be done in
further developing heuristic methods to assess the pragmatic and semantic considerations
that govern reasonable disambiguations of natural language.

   ⁴ An input is said to be relevant if a worthwhile conclusion is drawn from it. An input
is more relevant if it yields a greater positive cognitive effect for less processing
effort.

   To conclude this section, it is our claim that this use of heuristics is much more in
keeping with the nature of the WSC: what should be simplified in order to keep the task
manageable is not so much the deep semantics of natural terms, but the process of
selecting and integrating relevant interpretations and background knowledge in the
particular context of the resolution of each sentence.

   Conclusion

In this paper we have discussed the nature of the WSC as a benchmark, highlighting the
shortcomings of several current approaches and providing motivation for a more detailed
level of knowledge. We have also analysed some of what we consider to be key challenges,
in particular drawing attention to the need to take account of pragmatic considerations.
To begin addressing these challenges, we have suggested using frameworks able to support
the detailed semantics of natural terms while accounting for their vagueness. Moreover, we
have argued that the complexity of such frameworks can be made manageable with the use of
prototypes, which should be identified and used by default, and, finally, that heuristic
methods can be used to incorporate varying semantic interpretations as well as assumptions
about the world, which maintain the pragmatic principles of cooperative communication.
   In conclusion, it is our view that, while heuristic mechanisms are necessary to deal
with natural language and to reduce the complexity of commonsense reasoning, they should
not be used to over-simplify the semantics of natural terms. Instead, we believe that
applications along the lines of theoretical studies in pragmatics can play a significant
role in the selection of good interpretations of natural terms and in enriching the
provided descriptions of the world with the appropriate implicit knowledge.

   Acknowledgements

Thanks to Brandon Bennett for helpful discussion and to the anonymous reviewers for their
useful feedback.

   References

Bailey, D.; Harrison, A.; Lierler, Y.; Lifschitz, V.; and Michael, J. 2015. The Winograd
Schema Challenge and Reasoning about Correlation. In Working Notes of the Symposium on
Logical Formalizations of Commonsense Reasoning.
Baker, C. F.; Fillmore, C. J.; and Lowe, J. B. 1998. The Berkeley FrameNet project. In
Proceedings of COLING/ACL, 86–90.
Bennett, B. 2001. What is a Forest? On the vagueness of certain geographic concepts. Topoi
20(2):189–201.
Bennett, B. 2005. Modes of concept definition and varieties of vagueness. Applied Ontology
1(1):17–26.
Bunt, H., and Black, W. 2000. The ABC of Computational Pragmatics. In Bunt, H., and Black,
W., eds., Natural Language Processing, volume 1. Amsterdam: John Benjamins Publishing
Company. 1–46.
Carbonell, J. G., and Brown, R. D. 1988. Anaphora resolution: a multi-strategy approach.
In Proceedings of the 12th Conference on Computational Linguistics, volume 1, 96–101.
Carston, R. 1999. The semantics/pragmatics distinction: A view from relevance theory. In
Turner, K., ed., The semantics/pragmatics interface from different points of view. Oxford,
UK: Elsevier. 85–125.
Davis, E., and Marcus, G. 2015. Commonsense reasoning and commonsense knowledge in
Artificial Intelligence. Communications of the ACM 58(9):92–103.
Davis, E. 2013. Qualitative Spatial Reasoning in Interpreting Text and Narrative. Spatial
Cognition & Computation 13(4):264–294.
Durrett, G., and Klein, D. 2013. Easy Victories and Uphill Battles in Coreference
Resolution. In EMNLP, 1971–1982.
Fine, K. 1975. Vagueness, truth and logic. Synthese 30(3):265–300.
Grice, H. P. 1975. Logic and conversation. In Syntax and Semantics, Vol. 3, Speech Acts.
New York: Academic Press. 41–58.
Gómez Álvarez, L., and Bennett, B. 2017. Classification, Individuation and Demarcation of
Forests: formalising the multi-faceted semantics of geographic terms. In 13th
International Conference on Spatial Information Theory. Leibniz International Proceedings
in Informatics.
Hayes-Roth, F.; Waterman, D.; and Lenat, D. 1984. Building expert systems. Reading, MA:
Addison-Wesley.
Isaak, N., and Michael, L. 2016. Tackling the Winograd Schema Challenge Through Machine
Logical Inferences. In Pearce, D., and Sofia Pinto, H., eds., STAIRS, volume 284 of
Frontiers in Artificial Intelligence and Applications. IOS Press. 75–86.
Kempson, R. 1984. Pragmatics, anaphora and logical form. In Schiffrin, D., ed., Meaning,
form and use in context: linguistic applications. Washington, DC: Georgetown University
Press. 1–10.
Lenat, D. B. 1995. CYC: A large-scale investment in knowledge infrastructure.
Communications of the ACM 38(11):33–38.
Lenci, A. 2011. Composing and updating verb argument expectations: A distributional
semantic model. In Proc 2nd Workshop on Cognitive Modeling and Computational Linguistics,
58–66. ACL.
Levesque, H.; Davis, E.; and Morgenstern, L. 2012. The Winograd Schema Challenge. In
Thirteenth International Conference on the Principles of Knowledge Representation and
Reasoning.
Levesque, H. J. 2014. On our best behaviour. Artificial Intelligence 212:27–35.
Levesque, H. J. 2017. Common sense, the Turing test, and the quest for real AI. Cambridge,
MA: MIT Press.
Levinson, S. C. 1995. Three levels of meaning. In Grammar and meaning: Essays in honour of
Sir John Lyons. Cambridge University Press. 90–115.
Liu, Q.; Jiang, H.; Ling, Z.-H.; Zhu, X.; Wei, S.; and Hu, Y. 2016. Commonsense Knowledge
Enhanced Embeddings for Solving Pronoun Disambiguation Problems in Winograd Schema
Challenge. arXiv preprint arXiv:1611.04146.
McCarthy, J. 1993. Notes on formalizing context. In Proceedings of the 13th international
joint conference on Artificial intelligence - Volume 1, 555–560. Morgan Kaufmann
Publishers Inc.
McCarthy, J. 1998. Elaboration tolerance. In Common Sense, volume 98.
Mikolov, T.; Yih, W.-t.; and Zweig, G. 2013. Linguistic regularities in continuous space
word representations. In NAACL HLT, volume 13, 746–751.
Miller, G. A. 1995. WordNet: a lexical database for English. Communications of the ACM
38(11):39–41.
Mitkov, R. 2014. Anaphora resolution. Routledge.
Morgenstern, L., and Ortiz Jr, C. L. 2015. The Winograd Schema Challenge: Evaluating
Progress in Commonsense Reasoning. In AAAI, 4024–4026.
Ng, V. 2017. Machine Learning for Entity Coreference Resolution: A Retrospective Look at
Two Decades of Research. In AAAI, 4877–4884.
Peng, H.; Khashabi, D.; and Roth, D. 2015. Solving hard coreference problems. In
Proceedings of NAACL, 809–819.
Pustejovsky, J. 1991. The Generative Lexicon. Computational Linguistics 17(4):409–441.
Rahman, A., and Ng, V. 2012. Resolving complex cases of definite pronouns: the Winograd
Schema Challenge. In Proceedings of the 2012 Joint Conference on EMNLP and CoNLL, 777–789.
ACL.
Recanati, F. 2004. Pragmatics and Semantics. In Handbook of Pragmatics. Oxford: Blackwell.
442–462.
Rosch, E., and Mervis, C. B. 1975. Family resemblances: Studies in the internal structure
of categories. Cognitive Psychology 7(4):573–605.
Rosch, E. 1978. Principles of categorization. In Rosch, E., and Lloyd, B. B., eds.,
Cognition and categorization, volume 1. Hillsdale, NJ: Lawrence Erlbaum Associates. 27–78.
Schüller, P., and Kazmi, M. 2015. Using Semantic Web Resources for Solving Winograd
Schemas: Sculptures, Shelves, Envy, and Success. In SEMANTiCS (Posters & Demos), 22–25.
Schüller, P. 2014. Tackling Winograd Schemas by formalizing relevance theory in knowledge
graphs. In Fourteenth International Conference on the Principles of Knowledge
Representation and Reasoning.
Sharma, A.; Vo, N. H.; Aditya, S.; and Baral, C. 2015. Towards Addressing the Winograd
Schema Challenge-Building and Using a Semantic Parser and a Knowledge Hunting Module. In
IJCAI, 1319–1325.
Sharma, A. 2014. Solving Winograd schema challenge: Using semantic parsing, automatic
knowledge acquisition and logical reasoning. Ph.D. Dissertation, Arizona State University.
Speer, R., and Havasi, C. 2012. Representing General Relational Knowledge in ConceptNet 5.
In LREC, 3679–3686.
Sperber, D., and Wilson, D. 2004. Relevance theory. In Handbook of Pragmatics. Oxford:
Blackwell. 607–632.
Turing, A. M. 1950. Computing machinery and intelligence. Mind 59(236):433–460.
Verheyen, S.; Ameel, E.; and Storms, G. 2007. Determining the dimensionality in spatial
representations of semantic concepts. Behavior Research Methods 39(3):427–438.
Zadeh, L. 1965. Fuzzy sets. Information and Control 8(3):338–353.