<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>KARaML: Integrating Knowledge-Based and Machine Learning Approaches to Solve the Winograd Schema Challenge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Suk Joon Hong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brandon Bennett</string-name>
          <email>B.Bennett@leeds.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Judith Clymo</string-name>
          <email>J.C.Clymo@leeds.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucía Gómez Álvarez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff3">
          <label>3</label>
          <institution>InfoMining Co.</institution>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>In A. Martin, K. Hinkelmann</institution>
          ,
          <addr-line>H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.)</addr-line>
          ,
          <institution>Proceedings of the AAAI 2022 Spring Symposium on Machine Learning and Knowledge Engineering for Hybrid Intelligence (AAAI-MAKE 2022), Stanford University</institution>
          ,
          <addr-line>Palo Alto, California</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TU Dresden</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Leeds</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Winograd Schema Challenge (WSC) is a commonsense reasoning task introduced as an alternative to the Turing Test. While machine learning approaches using language models show high performance on the original WSC data set, their performance degrades when tested on larger data sets. Moreover, they do not provide an interpretable explanation for their answers. To address these limitations, we present KARaML, a novel asymmetric method for integrating knowledge-based and machine learning approaches to tackle the WSC. A central idea in our work is that semantic roles are key for the high-level commonsense reasoning involved in the WSC. We extract semantic roles using a knowledge-based reasoning system. For this, we use relational representations of natural language sentences and define high-level patterns encoded in Answer Set Programming to identify relationships between entities based on their semantic roles. We then use the BERT language model to find the semantic role that best matches the pronoun. BERT performs better at this task than on the general WSC. We apply our ensemble method to a restricted domain of the large WSC data set, WinoGrande, and demonstrate that it achieves better performance than a state of the art pure machine learning approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Winograd Schema Challenge</kwd>
        <kwd>Knowledge Representation</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Semantic Roles</kwd>
        <kwd>Natural Language Understanding</kwd>
        <kwd>Answer Set Programming</kwd>
        <kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The Winograd Schema Challenge (WSC) is a commonsense reasoning test proposed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to
demonstrate whether a machine is “capable of producing behaviour that we would say required
thought in people”. The task of the WSC is to resolve which noun a pronoun refers to in a
given sentence. Winograd schema (WS) examples are typically written in pairs (which we call
Winograd schema pairs). These differ in only a few words, called the special and the alternate
words. Two candidate nouns are given alongside each schema as possible referents of the target
pronoun (the same candidates for each schema in the pair), and the pronoun must be resolved
in opposite ways depending on which of the special or alternate words was used. The use
of schema pairs is intended to ensure that syntactic clues cannot help in finding the referent
of the pronoun. Instead, this must be done by using world knowledge and reasoning. The
original set of WSs, known as WSC273 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], contains only 273 instances, but more recently a
dataset of around 44,000 examples following the same style was developed through crowd-sourcing [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
An example from WSC273 is given below, where ‘large’ and ‘small’ are the special and the alternate
words respectively:
• The trophy doesn’t fit in the brown suitcase because it is too large.
Candidates: the trophy (answer) / the suitcase.
      </p>
      <p>• The trophy doesn’t fit in the brown suitcase because it is too small.
Candidates: the trophy / the suitcase (answer).</p>
      <p>
        Although the instigators of the WSC had originally envisaged that formalised theories of
commonsense knowledge would be required to address the challenge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], it has been tackled
by a wide variety of approaches and has highlighted some serious difficulties that arise for
Knowledge Representation (KR) approaches when applied to unconstrained, general problems
of natural language understanding. By contrast, language models based on Machine Learning
(ML) have achieved relatively good performance on WSC test sets although they do not employ
any explicit representation of the detailed knowledge that seems to be involved in resolving
WSC problems. Despite this success, the language model approaches have some weaknesses.
Current language model methods are brittle, in that results are sensitive to small changes in
the way a problem is expressed that are irrelevant to its solution. Language model approaches
to the WSC so far do not provide any justification for the answers they give. As the WSC is
supposed to test ‘understanding’, this is a significant limitation.
      </p>
      <p>
        Our current work explores a combined KR and ML approach to the WSC. We call our system
KARaML, standing for Knowledge Assimilation based on Roles and Machine Learning, and use
the semantic roles of the agents participating in the situation described to resolve the WSC
problem. We use the semantic parser K-Parser [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to extract a relational semantic representation
of the schema, and ASP-based rules to determine semantic roles of the candidate nouns. We then
use the language model BERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to match the pronoun to one of the extracted semantic roles.
This allows us to leverage the implicit knowledge in the language model and so avoid manually
building or attempting to explicitly learn a large knowledge base. By using the language model
in a more focused way, rather than asking it to solve the whole task, our system is able to avoid
some of the fragility commonly displayed by language models, and can provide an explanation
alongside its decision. We have tested our approach on a subset of the large WSC data set,
WinoGrande [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and found that it performs better than pure ML methods using BERT [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Winograd schemas have been tackled by both KR and ML approaches. A typical KR approach
would aim to resolve a WS by first translating the textual form of the schema into a logical
representation, then combining this with additional axiomatised background knowledge and
using rules of inference to deduce the reference of the pronoun. Early work on AI systems for
natural language understanding by Hobbs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposed formalised principles of coherence that
can account for co-references in many cases. However, he noted that in some cases, establishing
the reference of a pronoun also requires detailed background knowledge. Indeed, the solution of
most WS examples appears to involve knowledge concerning particular physical and/or social
situations and understanding of vocabulary terms as well as general principles of communication
and inference.
      </p>
      <p>
        Sophisticated formal frameworks such as Segmented Discourse Representation Theory [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
have been developed in order to explain the logic underlying coherence and co-reference.
However, the complexity of such theories has been an obstacle to their implementation in
practical applications. Kehler et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and subsequently Bennett [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] gave formal analyses that
account for certain WS cases. Schüller [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] presented a general method based on relevance
theory and knowledge graphs. But the level of detail required to model knowledge relevant to
specific cases suggests that the extension of these kinds of approaches to incorporate sufficiently
comprehensive knowledge to give general coverage of WS problems would be an enormous
task. Bailey et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposed a ‘correlation calculus’, which uses first-order logic with a
novel correlation connective, to resolve WSs. This offers the prospect of a more general form of
KR-based solution in which the complex types of correlation involved in solving the WSC might
be inferred from simpler assumptions but would still require large numbers of basic correlations
to be represented in order to cover the huge variety of possible WS problems.
      </p>
      <p>
        A possible way to make KR approaches more effective for particular problem types may
be to focus on aspects of semantics that are especially salient for those problems. We believe
that the notion of ‘semantic role’ is such an aspect, which is often decisive in establishing
co-reference and hence in solving WS problems. Semantic Role Labelling (SRL) is considered
to be a significant computational task for natural language understanding, and can be carried
out with high accuracy by some existing systems (such as SENNA [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]), and a method of
using semantic roles for co-reference resolution is described in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In NLP semantic roles are
primarily defined in terms of the linkage of noun phrases to verbs (e.g. as ‘subject’, ‘object’ etc.).
However, in the current paper we advocate a more general idea of semantic role that is held in
relation to an activity (e.g. helping, needing help) and is not strictly tied to particular verbs and
grammar. This idea of semantic role is akin to that adopted in Frame Semantics [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Many systems have been developed that can translate from natural language text into some
form of logical representation [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. This ‘semantic parsing’ task is extremely challenging and
the results obtained are unreliable, especially for complex sentences such as those occurring in
WSs. Nevertheless, the extracted representations do identify entities, properties, relationships
and logical structures that can be processed by KR-based reasoning systems. Sharma [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
developed a semantic parser, K-Parser [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to transform schemas into relational representations,
and used these to resolve WSs. This method enhanced the extracted semantic content using rules
formulated in Answer Set Programming (ASP) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The initial use of this method for solving
WS problems also required hand crafted representation of relevant background knowledge.
The method shows accuracy of around 80% on the original WSC273 set when relational
representations of both schemas and background knowledge principles are manually created.
To address the problem of encoding sufficient knowledge to cover a wide class of commonsense
reasoning problems, various automatic knowledge extraction techniques have been employed.
Sharma was able to achieve a more automated solution by extracting background knowledge
using Google search to obtain identity rules enabling pronoun resolution [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. However, fewer
than half of the required rules could be obtained by this automated method.
      </p>
      <p>
        In our previous work [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] we built on Sharma’s method [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. We used K-Parser with additional
hand-coded ASP rules to extract semantic roles of the candidate nouns, similar to the
pattern-based semantic relation extraction of Al-yahya et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Further logical rules were then used
to determine the pronoun’s referent based on its semantic role and those of the two candidates.
      </p>
      <p>
        Regarding ML approaches, Rahman and Ng [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] obtained promising results using an SVM
ranker based on a variety of linguistic features, both semantic and syntactic. More recently,
approaches based on neural network language models have made significant progress on the
WSC task. Using the BERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] language model, high accuracy for resolving WSs has been
demonstrated [
        <xref ref-type="bibr" rid="ref2 ref23 ref5">23, 5, 2</xref>
        ], with up to 90% accuracy reported for WSC273 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Using the BERT
variant RoBERTa, which has been found to perform better on many tasks, similarly high accuracy
has been obtained [
        <xref ref-type="bibr" rid="ref2 ref24">24, 2</xref>
        ].
      </p>
      <p>
        However, it is too early to claim that machines have reached human-like ability to resolve
Winograd schemas. WSC273 is a very small test set, and accuracy has been found to decrease
by around 10% or more on larger WSC-like data sets. Consequently, some researchers have
suggested that the strong performance on WSC273 may overstate the capability of neural
language models to carry out commonsense reasoning tasks [
        <xref ref-type="bibr" rid="ref2 ref25">2, 25</xref>
        ]. Tests that focus on cases
involving compositional logical structure indicate that BERT does not work well in relation
to function words such as negation [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. BERT also seems to lack robustness with respect
to irrelevant small variations: simply changing proper names can cause it to give incorrect
answers to some WSs which were previously answered correctly [
        <xref ref-type="bibr" rid="ref19 ref23">23, 19</xref>
        ]. This suggests that
language models may work by recognising features that are, at least in some cases, only
indirectly connected with genuine understanding of WS problems. This also relates to issues
of transparency and explainability. Humans would expect answers to be based on general
principles, whereas current methods based on language models do not provide any meaningful
explanation for their answers. Whereas humans appear to employ both commonsense reasoning
and intuition [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], neural language models seem to work in a way that is more similar to intuition
than to logical reasoning.
      </p>
      <p>In this paper we attempt to develop a new way to combine KR and ML in order to address the
WSC and contribute to exploration of the general problem of natural language understanding.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Winograd Schema Structure and Semantic Roles</title>
      <p>In this section we examine the syntactic and semantic structure of Winograd Schema problems
in order to motivate and explain our resolution method.</p>
      <sec id="sec-3-1">
        <title>3.1. Schema Structure</title>
        <p>A WS is a sequence of tokens in which three (non-overlapping) sub-sequences are indicated:
two words or phrases referring to ‘candidate’ entities and one pronoun (normally a single word).
Thus, it is an expression of the form Σ(α, β, π), whose meaning constrains the references of the
candidate terms α and β and the pronoun π. For the expression to be considered satisfactory as
a WS, any reasonable human being should either infer from it that π refers to the same thing as
α, or infer from it that π refers to the same thing as β.</p>
        <p>In nearly all WS examples, there is a clear division into two propositional components, with
the first component describing a situation involving both candidates, α and β, and the second
giving information involving π. Hence, a WS normally has the structure Φ(α, β) # Ψ(π), where
‘#’ represents the type of connection between the two parts. In many cases the two parts are
separate sentences. For these cases we can treat the connective as logical conjunction (although
temporal sequence may also be implied). In other cases, the halves may be connected with
words such as ‘and’, ‘because‘, ‘although‘, ‘since‘ etc. The particular connective is relevant to
pronoun resolution.1</p>
        <p>So the pronoun resolution problem has the following form:
( (Φ(α, β) # Ψ(π)) ∧ π = (α | β) ) ⇝ ( π = γ ),
(1)
where ⇝ represents some kind of rational inference relation and γ is either α or β. The
presupposition that π must be identified with exactly one of the two candidates is represented
by the notation π = (α | β).</p>
        <p>Given that we need to infer an identity between π and either α or β, there must be some aspect
of the content of Φ(α, β) which can be linked to the content of Ψ(π) in such a way that either α
or β can be distinguished as the more likely co-referent of π. One way to approach this would
be to tease out from Φ what is said individually about each of the candidates and try to link
that to Ψ. Indeed, by means of semantic intuitions or by using an automated semantic parser, a
given proposition Φ(α, β) can typically be analysed into a combination of simpler components,
A(α) ∧ B(β) ∧ R(α, β) ∧ C, where A and B represent conditions that are individually ascribed
to candidates α and β respectively, R represents whatever information is asserted about the
relationship between α and β, and C is any additional information that does not directly involve
α or β. More specifically, each of the components A, B, R, C may correspond to a (possibly
empty) set of predicates in the semantic analysis.</p>
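        <p>As a minimal illustration (a sketch of our own, not part of the paper's implementation), the structure Φ(α, β) # Ψ(π) and its decomposition into the components A, B, R and C can be modelled as a small data structure, shown here for the trophy/suitcase example:</p>

```python
from dataclasses import dataclass, field

@dataclass
class WinogradSchema:
    """A WS as Sigma(alpha, beta, pi): two candidate phrases and one pronoun."""
    text: str
    alpha: str      # first candidate
    beta: str       # second candidate
    pronoun: str    # the target pronoun pi

@dataclass
class Decomposition:
    """Phi(alpha, beta) # Psi(pi), with Phi analysed into A, B, R and C.
    Predicates are stored as simple tuples (name, arg1, ...)."""
    a_preds: list = field(default_factory=list)    # A(alpha): conditions on alpha alone
    b_preds: list = field(default_factory=list)    # B(beta): conditions on beta alone
    r_preds: list = field(default_factory=list)    # R(alpha, beta): their relationship
    c_preds: list = field(default_factory=list)    # C: remaining information
    connective: str = "and"                        # '#': 'because', 'although', ...
    psi_preds: list = field(default_factory=list)  # Psi(pi): what is said of the pronoun

ws = WinogradSchema(
    text="The trophy doesn't fit in the brown suitcase because it is too large.",
    alpha="the trophy", beta="the suitcase", pronoun="it")

d = Decomposition(
    r_preds=[("does_not_fit_in", "the trophy", "the suitcase")],
    connective="because",
    psi_preds=[("too_large", "it")])
```

Here A, B and C are all empty: everything said about the candidates in the first part is carried by the relation R.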
        <p>
          In ordinary natural language, there are many examples where the reference of the pronoun
can be resolved just by considering the individual properties of potential candidates (α and β).
Levesque et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] consider the example ‘The women stopped taking the pills because they were
[pregnant/carcinogenic]’. However, although this seems to be a typical use of a pronoun, it is not
considered to be a good schema. Levesque et al. explicitly say that this is a poor example, since
correct resolution can be determined just by considering the types of the candidates (‘women’
and ‘pills’) and the types of entity of which the attributes ‘pregnant’ and ‘carcinogenic’ could
be predicated. Such cases are considered too easy to demonstrate that intelligence is required
to resolve them. They suggest that suitably difficult WS examples must require understanding
of the situation. This would typically involve the relationship between the candidates or some
property that is not merely a simple type attribute of one of the candidates.
        </p>
        <p>
          1For instance, the WSC273 set presented by Levesque et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] includes the example ‘Pete envies Martin
[because/although] he is successful’, where swapping ‘because’ with ‘although’ changes the pronoun reference.
This case was also considered by [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which suggests that, whereas ‘because’ implies positive correlation, ‘although’
implies negative correlation.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Semantic Role Extraction</title>
        <p>In the majority of cases we have examined, inferences based on individual properties A(α) and
B(β) are not enough. In order to resolve the pronoun, one needs to extract further attributes of α
and β from the roles they play in the relation R(α, β). By introducing s to stand for the situation
described by the relationship R(α, β) (i.e. we reify the relationship), we can conceptually unpack
the relation into a conjunction ρ1(α, s) ∧ ρ2(β, s) ∧ D(s) representing the semantic roles ρ1
and ρ2 of the participants in relation to s, together with any other information D(s) attributed to
the situation. Furthermore, if we are concerned with distinguishing α and β in terms of semantic
roles that occur in some particular types of situation (e.g. situations where one person helps
another), then the relevant role information can be represented by unary role properties ρ1(α)
and ρ2(β) (e.g. α gives help, and β receives help).</p>
        <p>In fact, existing semantic parsers (such as K-Parser and SENNA) already assign role attributes
to referring constituents of sentences. However, these tend to be lacking in specific semantic
content and determined largely by syntactic features of their occurrence within the text. For
example, a referential word or phrase might be labelled as the ‘agent’ or ‘object’ of a verb. But,
like entity types, such basic role types can only be used to resolve pronouns in ‘easy’ cases.
In more complex cases, pronoun resolution requires understanding the way in which entities
participate in a situation; and this requires specific knowledge of the situation and the roles it
involves. Thus, we suggest that pronoun resolution in WSC problems requires an additional
role extraction mechanism (RE) going beyond an initial semantic parsing (SP) stage. Hence, the
semantic role extraction process can be represented by the following pattern:
Σ(α, β, π) ≡</p>
        <p>Φ(α, β) # Ψ(π)  =⇒_SP  A(α) ∧ B(β) ∧ R(α, β) ∧ C  =⇒_RE  ρ1(α) ∧ ρ2(β)
(2)</p>
        <p>To illustrate our analysis, we consider the sentence “Maria is struggling with her exams and
asks for help from Rebecca, because she is already successful.”. Semantic parsing will produce a
formal representation similar to the following:</p>
      <p>( struggling_with_exams(Maria) ∧ ask_help(Maria, Rebecca) ) # successful(she),
from which we want to infer she = Rebecca.</p>
        <p>
          In this example, A(α) corresponds to the unary property struggling_with_exams(Maria),
R(α, β) is the relation ask_help(Maria, Rebecca), the connective # is ‘because’ and Ψ(π) is
successful(she). We have not specified any individual condition B predicated of Rebecca,
although if we identified it as a proper name of a person (e.g. by using a named-entity recognition
system [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]) we could add the individual condition ‘Person(Rebecca)’. Role extraction rules, as
explained later in the paper, can then be employed to infer the semantic roles of the participants.
        </p>
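        <p>The role extraction (RE) step can be sketched as high-level patterns over the relational parse. The rule format, predicate names and role names below are illustrative assumptions of ours, standing in for the ASP role-derivation rules described in the paper:</p>

```python
# Map relation predicates to (role of first argument, role of second argument).
# These patterns play the part of ASP role-derivation rules; the predicate
# and role names are illustrative only.
ROLE_RULES = {
    "ask_help": ("help_seeker", "potential_help_giver"),
    "help":     ("help_giver", "help_receiver"),
}

def extract_roles(relation):
    """Given a parsed relation (pred, arg1, arg2), return the role facts
    rho1(arg1) and rho2(arg2), or None if no rule covers the predicate."""
    pred, arg1, arg2 = relation
    if pred not in ROLE_RULES:
        return None
    rho1, rho2 = ROLE_RULES[pred]
    return [(rho1, arg1), (rho2, arg2)]

roles = extract_roles(("ask_help", "Maria", "Rebecca"))
# roles == [("help_seeker", "Maria"), ("potential_help_giver", "Rebecca")]
```

A schema whose relation matches no rule falls outside the covered patterns, which is where the system reverts to the language model alone.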
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Resolving the Pronoun</title>
        <p>The previous subsection examined the semantic structure of WSs and motivated the extraction
of semantic role attributes of the candidates from the first part of the schema. We now explain
how this can be used to identify the reference of the pronoun in the second part of the schema.</p>
        <p>
          Our general idea is related to the approach of Bailey et. al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], who proposed an extension
of first-order logic with a novel propositional connective. The statement φ ⊕ ψ means that
the truth of φ is positively correlated with the truth of ψ, in the sense that if a rational agent
becomes aware of the truth of either of the propositions they will consider the other proposition
more plausible than they would have in the absence of that information. The paper presents
a proof system to capture the logic of the ‘⊕’ operator and suggests that it can be used to
derive complex correlations from basic correlation assumptions and beliefs. Then these derived
correlations can be used for pronoun resolution. Assuming that what is said about the entity
via the pronoun reference is positively correlated with what is said about it in the candidate
phrase, we should be able to infer either Φ(α, β) ⊕ Ψ(α) or Φ(α, β) ⊕ Ψ(β) when given a schema
Φ(α, β) # Ψ(π).
        </p>
        <p>The correlation calculus is proved to be sound with respect to statistical semantics. And,
although specification of the calculus predates the successful application of language models to
the WSC, it seems that it would be well suited to interfacing with a language model. Instead of
requiring correlations to be determined by axioms and logical reasoning, one could potentially
evaluate or compare degrees of correlation by means of language model responses.</p>
        <p>In our setting the relevant notion of correlation is a little different. We aim to find a correlation
between the role description of one of the candidates and the description involving the pronoun.
Also, we look for a preferential rather than an absolute correlation. Thus, we wish to determine
which of the semantic roles of the candidates is more likely to apply to the pronoun, given what
is said regarding the pronoun. Hence, given the extracted roles ρ1(α) and ρ2(β) and assertion
Ψ(π) regarding the pronoun, then if π denotes α we would expect the following inequality of
relative probabilities:
P( ρ1(π) | Ψ(π) ) &gt; P( ρ2(π) | Ψ(π) )
(3)</p>
        <p>Note that what is said in the proposition Ψ(π) does not need to explicitly describe π in terms
of either of the roles ρ1 or ρ2; it only needs to provide some reason to expect that one of the
potential facts ρ1(π) or ρ2(π) is more likely than the other.</p>
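        <p>Inequality (3) can be operationalised by asking a language model which role statement is more plausible given the pronoun clause. The sketch below is our own and stubs out the scoring: in a real system lm_score would be a BERT-based plausibility estimate, and the verbalised role statements are hypothetical examples:</p>

```python
def lm_score(statement, context):
    """Placeholder for a language-model plausibility score of `statement`
    given `context` (e.g. a BERT-derived estimate of P(statement | context)).
    Stubbed here with a toy lookup table for illustration."""
    toy_scores = {
        ("she gives help", "she is already successful"): 0.8,
        ("she receives help", "she is already successful"): 0.2,
    }
    return toy_scores.get((statement, context), 0.0)

def resolve(pronoun_clause, role1_statement, role2_statement):
    """Pick the role whose verbalisation correlates better with what is
    said about the pronoun, per inequality (3). Returns 1 or 2."""
    s1 = lm_score(role1_statement, pronoun_clause)
    s2 = lm_score(role2_statement, pronoun_clause)
    return 1 if s1 > s2 else 2

winner = resolve("she is already successful",
                 "she gives help",      # rho1 applied to the pronoun
                 "she receives help")   # rho2 applied to the pronoun
# winner == 1: the pronoun better matches the help-giver role
```

The candidate carrying the winning role (here Rebecca, the potential help giver) is then returned as the referent.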
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Our Approach: KARaML</title>
      <p>In this section we introduce our system KARaML. We use the semantic roles of the agents to
resolve the WSC, following the analysis in Section 3.</p>
      <p>Figure 1 illustrates the pipeline of our method to resolve WSs. KARaML uses a combination
of KR and ML methods to derive semantic roles of the candidates and pronouns by defining
domain-specific background knowledge relating to these high-level semantic roles. In the figure,
the element labelled ‘Semantic parsing &amp; KR role derivation’ relates to Section 3.2 above, and
the element labelled ‘LM semantic role matching’ to Section 3.3. Finally, the ‘Semantic role
based reasoning’ component uses the previously derived knowledge to infer the solution. If
our combined system does not have suitable rules defined for resolving a schema, we simply
revert to using the language model alone. Other important features of our architecture are the
asymmetric combination of KR and ML, and the selection of conceptually related sentences that
follow target patterns. We address these features in detail in the coming subsections.</p>
      <p>After addressing in detail the architecture of KARaML (in Section 5), we give results (in
Section 6) which show that where our combined reasoning method is applicable, we achieve
better performance than using a language model alone.</p>
      <p>[Figure 1: The KARaML pipeline. A domain filter first checks whether the WS is in a covered domain (e.g. “helping”, “asking”); if so, semantic parsing and KR role derivation are applied, followed by a pattern filter. Schemas passing both filters are resolved by LM semantic role matching and semantic-role-based reasoning; otherwise the language model is used alone.]</p>
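      <p>The control flow of the pipeline in Figure 1 can be summarised as follows (a schematic of our own; the component functions are placeholders for the stages named in the figure):</p>

```python
def karaml_resolve(schema, in_domain, derive_roles, matches_pattern,
                   role_based_answer, lm_answer):
    """Schematic of the KARaML pipeline: fall back to the language model
    alone whenever the knowledge-based path does not apply."""
    # Domain filter: is the schema in a covered domain (e.g. 'helping')?
    if not in_domain(schema):
        return lm_answer(schema)
    # Semantic parsing & KR role derivation.
    roles = derive_roles(schema)
    # Pattern filter: do the high-level rules cover this structure?
    if roles is None or not matches_pattern(roles):
        return lm_answer(schema)
    # LM semantic role matching + semantic-role-based reasoning.
    return role_based_answer(schema, roles)

# Toy usage: a schema outside the covered domains goes straight to the LM.
answer = karaml_resolve(
    "The trophy doesn't fit in the brown suitcase because it is too large.",
    in_domain=lambda s: False,
    derive_roles=lambda s: None,
    matches_pattern=lambda r: False,
    role_based_answer=lambda s, r: "kr",
    lm_answer=lambda s: "lm")
# answer == "lm"
```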
      <p>
        A major difference from our previous work [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] is that we no longer need detailed
axiomatisations of the domain’s background knowledge to infer the semantic role of the pronoun, which
presented a challenge to scalability of the method. Instead, we will show that a minimal set of
high-level rules for the semantic roles, coupled with the usage of a language model, is enough
to obtain significant results.
      </p>
      <sec id="sec-4-1">
        <title>4.1. An Asymmetric Combination of KR and ML</title>
        <p>A notable feature of our approach is that we apply KR and ML methods asymmetrically with
respect to different parts of a WS. Specifically, the KR mode of interpretation is focused on the
part of the WS that describes the candidates, whereas a neural language model is used to match
a correlated semantic role for the pronoun.</p>
        <p>A formal representation of a sentence has a rigid structure composed from specific symbolic
vocabulary. This means that if we have KR representations of two related pieces of information
(such as two successive sentences or clauses within a sentence) we can only draw inferences from
their combined content if we have some way of aligning them. This requires both combining
them in terms of a formal syntax, and also making explicit all significant semantic relationships
between the vocabulary of the two parts. When dealing with representations extracted from
natural language, this is a huge challenge. Not only are there an unbounded number of possible
situations that might be described, but even one situation could be described in a wide variety
of ways, using a wide variety of vocabulary terms. Hence, piecing together KR representations
extracted from different parts of a natural language text is extremely difficult, even when
connections are very clear to our intuitive understanding.</p>
        <p>By contrast, ML techniques are more malleable, in that they do not require exact matching in
order to connect one piece to another, so they can provide a mechanism for flexibly assimilating
or adjoining new information to an existing KR representation. Figure 2 illustrates the potential
advantage of this type of asymmetric combination.</p>
        <p>It may seem puzzling that we always focus the use of KR on the left side of the
WS and reserve ML for interpreting the right side. This is because our KR analysis
is designed to extract roles of the candidates in WSs and, in the majority of examples, these are
described primarily in the first part of the schema. In general, pronouns nearly always occur
after the noun or noun-phrase with which they co-refer. In most WS examples the pronoun
occurs in a following sentence or clause that does not usually make explicit the role of the
pronoun referent in a way that can be directly linked to the roles of the candidates. Nevertheless,
one might intuitively expect that there is a statistical correlation between the roles of the
candidates in the first part and what is then said using the pronoun in the second part. Indeed,
our results indicate that ML techniques can model this correlation.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Identifying Conceptually Related Sentences</title>
        <p>In general, a WS may involve any vocabulary or domain of knowledge. This is problematic for
KR approaches, which require detailed logical modelling of knowledge and semantics. We use
keywords to identify restricted domains that are more manageable. Our aim is to provide a
simple method for selecting related schemas for which high-level background knowledge rules
can be defined.</p>
        <p>
A small number of logical rules should be sufficient to explain a significant proportion of
schemas in a semantic domain. In particular we present in this paper a study of schemas
obtained by identifying instances containing the keyword ‘help’ (or ‘helping’, ‘helped’, ‘helpful’
etc.). We show that the same principles extend to a larger set including also schemas that
contain the keyword ‘ask’ (or ‘asking’, ‘asked’, etc.), for which only six additional rules needed
to be established. This shows that domains defined in this way are flexible and able to
encompass a variety of schemas. We have previously presented work on schemas containing the
‘thanking’ keyword [
          <xref ref-type="bibr" rid="ref19">19</xref>
]. Although our current work focuses on a few hand-selected domains,
it demonstrates a general approach which could be extended to cover a larger proportion of
WinoGrande schemas.
        </p>
        <p>In our system, WSs are first filtered for use of keywords and then compared with high-level
patterns. It should be expected that there will be some overlap between domains, where a
sentence references multiple concepts. If a pattern is matched, this indicates that we have
suitable rules defined to understand this sentence. In the case that a sentence matches multiple
patterns we propose using the correlation between candidate and pronoun roles which is
identified as most significant by the language model (i.e. lowest loss).</p>
        <p>A sentence may use knowledge from a domain without containing the relevant keyword.
Provided that the sentences containing the keywords are representative of the domain and
allow us to generate appropriate rules, this is not a significant limitation. We anticipate that
our methods for identifying semantic roles may be extended to sentences which do not contain
the relevant keyword, allowing more sentences to be resolved using the existing rules.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. KARaML System Architecture</title>
      <p>We now describe how we have implemented each component of KARaML from Figure 1. We first
tackle the domain filter, and subsequently we introduce an ASP pattern filter based on K-Parser
output. The pattern filter selects WSs that match certain semantic roles, for which
domain-specific background knowledge has been encoded in ASP. Next, BERT2 is used to determine
which of a pair of contrasting semantic roles the pronoun has a stronger correlation to. By
using the pattern filter together with the background knowledge we can infer the high-level
(and not necessarily explicit) semantic roles of the candidates that BERT will choose from. Finally, the
derived semantic roles for the candidates and the best-matching role for the pronoun are used
to infer the final answer.</p>
      <p>In what follows we will give a detailed explanation of our system architecture and we will
use a sample schema from WinoGrande as a running example:</p>
      <p>Maria helped Elena cope with the newly diagnosed autism because she was
inexperienced with the disorder.</p>
      <p>Maria / Elena (answer)</p>
      <p>In this case, Maria performs the semantic role of helper and Elena performs the semantic
role of being_helped. Our proposal is that the correlation between the semantic roles of the
candidates and the information we have about the pronoun is a good indicator for the pronoun
resolution. In the example, we note that a person being inexperienced in a situation is more
likely to explain (“because”) needing help than giving help.</p>
      <p>
        While the role Maria : helper can be derived with a relatively simple KR system, deriving
that an inexperienced person is in need of help (she : needing_help) previously required further
manually defined rules [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. In our current work, this task is given to a language model, which
can make use of its implicit understanding of the correlation between inexperience and need.
      </p>
      <sec id="sec-5-1">
        <title>5.1. Domain Filter</title>
        <p>Our pipeline begins with a domain filter that identifies the schemas that may be associated with
a domain. A WS is passed into the filter, which determines whether it belongs to any of the
pre-defined domains by using keywords. Our running example will be categorised using the
“help” keyword. If a schema does not belong to any pre-defined domains, it is categorised as
out-of-domain and will be resolved by BERT.</p>
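        <p>The keyword-based filtering just described can be sketched as follows (an illustrative Python sketch; the keyword patterns and the function name are our own assumptions, not the system's actual code):

```python
import re

# Illustrative keyword stems for the two domains studied in this paper.
# Matching on the stem catches inflected forms such as "helping",
# "helped", "helpful", "asks", "asking", "asked".
DOMAIN_KEYWORDS = {
    "help": re.compile(r"\bhelp\w*", re.IGNORECASE),
    "ask": re.compile(r"\bask\w*", re.IGNORECASE),
}

def domain_filter(schema_text):
    """Return the list of domains a schema may belong to, or
    ['out-of-domain'] if no keyword matches (such schemas are
    resolved by BERT alone)."""
    domains = [name for name, pattern in DOMAIN_KEYWORDS.items()
               if pattern.search(schema_text)]
    return domains if domains else ["out-of-domain"]

schema = ("Maria helped Elena cope with the newly diagnosed autism "
          "because she was inexperienced with the disorder.")
print(domain_filter(schema))  # ['help']
```

The `\b` word boundary prevents spurious matches such as “task” for the ‘ask’ domain; the real filter may use a more careful morphological match.</p>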
        <p>In our experiments, we begin by narrowing our attention to a domain centered around the
keyword “help”, which contains 1356 schemas. Subsequently, we target schemas containing
the keyword “ask”, amounting to 1753, which in fact conform to the same underlying patterns,
thus giving rise to a more general domain. Indeed, these sets of schemas intersect, which gives
further evidence that they share a common underlying semantic structure.</p>
        <p>
          2Specifically, we use BERT_WIKI_WSCR from [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] throughout. This is an instance of BERT which has been
additionally fine-tuned for the WSC.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Parsing and Deriving Semantic Roles</title>
        <p>The schemas that have been assigned to domains are parsed by K-Parser, which produces a
relational semantic representation of the input text containing qualitative information about
the words in the text (e.g. their conceptual classes, and the relationships between predications and
their participants). Then, high-level semantic roles are derived using the output of
K-Parser together with our domain-specific background knowledge rules.</p>
        <p>Let us look at an excerpt from the parsed output of the sample schema:
has_s( helped_2, agent, maria_1 ).
has_s( helped_2, instance_of, help ).
has_s( she_11, trait, inexperienced_13 ).
has_s( inexperienced_13, instance_of, inexperienced ).</p>
        <p>has_s( cope_4, caused_by, was_12 ).</p>
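        <p>For concreteness, such facts can be read into simple triples before any reasoning is applied (a minimal Python sketch; the parsing helper is our own illustration and not part of K-Parser):

```python
import re

# Each K-Parser fact has the form: has_s( node1, relation, node2 ).
FACT_RE = re.compile(r"has_s\(\s*(\w+)\s*,\s*(\w+)\s*,\s*(\w+)\s*\)")

def parse_facts(text):
    """Extract has_s facts as (node1, relation, node2) triples."""
    return [m.groups() for m in FACT_RE.finditer(text)]

output = """
has_s( helped_2, agent, maria_1 ).
has_s( helped_2, instance_of, help ).
has_s( she_11, trait, inexperienced_13 ).
"""
print(parse_facts(output)[0])  # ('helped_2', 'agent', 'maria_1')
```
</p>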
        <p>The output from K-Parser provides us with an initial representation of a given schema, which
is specified by means of predicates of the form has_s( node1, relation, node2 ).
Subsequently, the domain-specific rules are used to expand the output with relevant background
knowledge for the domain, which is mostly focused on the derivation of high-level semantic
roles. Two simple examples of such domain-specific rules are as follows:
has_s( X, semantic_role, helper ) :-
    has_s( Action, agent, X ),
    has_s( Action, instance_of, help ).
has_s( X, semantic_role, being_asked ) :-
    has_s( Ask, recipient, X ),
    has_s( Ask, instance_of, ask ).</p>
        <p>Using the first rule we can straightforwardly derive has_s( maria_1, semantic_role,
helper ) as desired for our running example.</p>
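        <p>To make the derivation step explicit, the first rule can be mirrored in ordinary code over such triples (an illustrative Python sketch; in our system this inference is performed by the ASP solver, not by this code):

```python
def derive_helper_roles(facts):
    """Implements: X has semantic_role helper if X is the agent of
    an action that is an instance of help."""
    derived = []
    for (action, rel, x) in facts:
        if rel == "agent" and (action, "instance_of", "help") in facts:
            derived.append((x, "semantic_role", "helper"))
    return derived

facts = [
    ("helped_2", "agent", "maria_1"),
    ("helped_2", "instance_of", "help"),
]
print(derive_helper_roles(facts))
# [('maria_1', 'semantic_role', 'helper')]
```
</p>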
      </sec>
      <sec id="sec-5-3">
        <title>5.3. The Pattern Filter</title>
        <p>The parsed results together with the derived semantic roles of the schema are used as inputs to
the pattern filter. If a certain pattern is found by the pattern filter, that schema is to be resolved
by our combined framework. If not, just BERT is used for resolving the WSs.</p>
        <p>Our pattern filter exploits the generic structure given in formula (1) to select schemas that
follow recognised patterns, using the high level semantic roles previously inferred in section 5.2.
Patterns in our system are encoded in ASP and will typically fix semantic roles for one or more
of the candidate agents, and possibly impose additional restrictions on other elements of the schema, such
as forcing the connective to be “because” and the pronoun to be a person (rather than an inanimate object).</p>
        <p>In this experiment we use a pattern to identify schemas where the roles of “helper” and “being
helped” are likely to be relevant for pronoun resolution. The filter checks whether the semantic
properties and relations extracted satisfy the conditions that: at least one of the candidate
expressions has one of these roles; the pronoun refers to a person (is “he” or “she”) which is the
agent of a verb in the sentence that has been identified as playing an explanatory role in the
situation. The relevant pattern is defined as follows:</p>
        <p>From the 1356 schemas containing the keyword “help”, 207 satisfy this pattern, and from the
1753 schemas containing the keyword “ask”, 456 schemas match a similar pattern.</p>
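        <p>The conditions checked by the filter can be rendered procedurally as follows (an illustrative Python sketch of the conditions listed above, assuming parsed facts and derived roles are available as triples; the actual pattern is encoded in ASP):

```python
def matches_help_pattern(facts, candidates, pronoun):
    """Check the conditions of the 'help' pattern:
    (1) at least one candidate has role helper or being_helped,
    (2) the pronoun refers to a person ('he' or 'she'),
    (3) the pronoun is the agent of a verb identified as playing
        an explanatory role (a caused_by link in the parse)."""
    roles = {"helper", "being_helped"}
    has_role = any((c, "semantic_role", r) in facts
                   for c in candidates for r in roles)
    is_person = pronoun.split("_")[0] in {"he", "she"}
    explanatory_agent = any(
        (verb, "agent", pronoun) in facts
        and any(rel == "caused_by" and tgt == verb
                for (_, rel, tgt) in facts)
        for (verb, _, _) in facts)
    return has_role and is_person and explanatory_agent

facts = [
    ("maria_1", "semantic_role", "helper"),
    ("was_12", "agent", "she_11"),
    ("cope_4", "caused_by", "was_12"),
]
print(matches_help_pattern(facts, ["maria_1", "elena_3"], "she_11"))  # True
```
</p>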
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Using BERT to Identify Semantic Roles</title>
        <p>
          In this phase, schemas that meet the pattern are given to BERT to extract an implicit semantic
role, where the possible roles are determined by the pattern that the schema matched. Previous
work (e.g. [
          <xref ref-type="bibr" rid="ref2 ref5">5, 2</xref>
          ]) uses language models such as BERT to evaluate the probabilities of each of the
candidates occurring as a replacement of the pronoun. So to resolve the pronoun p in a schema
S(c1, c2, p) one would replace p with [MASK] and compare probabilities for [MASK] to be c1 or c2.
        </p>
        <p>P( [MASK] = c1 | S(c1, c2, [MASK]) ) &gt; P( [MASK] = c2 | S(c1, c2, [MASK]) )</p>
        <p>In contrast, we focus on deriving which of the candidate semantic roles is most correlated
to the information that is given about the pronoun. To derive this using BERT, we extract the
textual fragment containing the pronoun and concatenate it (following a period) with a basic
sentence linking the pronoun with the masked semantic role. In our experiment, we added the
sentence “he/she would [MASK] help”, where the [MASK] can be either give or need. Now BERT
compares the probabilities for [MASK] to be give or need, given the pronoun fragment as context
and the additional linking sentence (“he/she would [MASK] help”).
P( [MASK] = give | fragment. he/she would [MASK] help ) &gt; P( [MASK] = need | fragment. he/she would [MASK] help )
We interpret BERT’s output as indicating the semantic roles “giving_help” and “needing_help” respectively.</p>
        <p>Below we can see the contrast between the input to BERT as part of our knowledge-based
strategy and the input for a full WSC resolution relying on BERT alone:
• “She was inexperienced with the disorder. She would [MASK] help.” (Our usage of BERT to
derive a semantic role.)
• “Maria helped Elena cope with the newly diagnosed autism because [MASK] was
inexperienced with the disorder.” (Full WSC resolution using BERT.)</p>
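        <p>The role-matching step can be sketched as follows. We abstract BERT behind a scoring callable, since loading the fine-tuned model is beyond the scope of this sketch; the stub scorer below stands in for the language model purely for illustration:

```python
def choose_role(pronoun_fragment, score):
    """Concatenate the pronoun fragment with the linking sentence
    and pick the filler ('give' or 'need') that the language model
    scores higher at the [MASK] position. The fillers map to the
    roles giving_help and needing_help respectively. (The real
    system chooses 'he'/'she' to match the schema's pronoun.)"""
    prompt = pronoun_fragment + " She would [MASK] help."
    if score(prompt, "give") > score(prompt, "need"):
        return "giving_help"
    return "needing_help"

# A stub scorer standing in for BERT: it favours 'need' whenever
# the context mentions inexperience, purely for illustration.
def stub_score(prompt, word):
    if "inexperienced" in prompt and word == "need":
        return 0.9
    if word == "give":
        return 0.5
    return 0.1

fragment = "She was inexperienced with the disorder."
print(choose_role(fragment, stub_score))  # needing_help
```

In the full system, `score` would be implemented by querying BERT_WIKI_WSCR for the probability of each filler at the [MASK] position.</p>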
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Reasoning Using Semantic Roles</title>
        <p>This is the last phase of our system. For each schema that was matched to a pattern, the semantic
roles derived in sections 5.2 and 5.4 are used as inputs. With these inputs, some background
knowledge rules are needed to derive the referent of the pronoun. The background knowledge
rules we use are:
• IF a person X {helps / is asked by} a person Y because the pronoun P is giving help THEN
P refers to X.
• IF a person X {is helped by / asks} a person Y because the pronoun P is needing help THEN
P refers to X.</p>
        <p>The encoded forms of the background knowledge rules are given below as:
answer(X) :- is_candidate(X),
    1 { has_s( X, helps, _ ); has_s( _, asks, X ) },
    pronoun(P), has_s( Verb, agent, P ),
    has_s( _, caused_by, Verb ),
    has_s( P, semantic_role, giving_help ).
answer(X) :- is_candidate(X),
    1 { has_s( _, helps, X ); has_s( X, asks, _ ) },
    pronoun(P), has_s( Verb, agent, P ),
    has_s( _, caused_by, Verb ),
    has_s( P, semantic_role, needing_help ).</p>
        <p>Using the domain-specific background knowledge rules and the previously derived semantic
roles, we derive the answer for a WS. In our running example, statements expressing the implicit
semantic role of the pronoun (“needing_help”) from section 5.4 and the semantic roles of the
candidates from section 5.2 are added to the ASP program. Then, the condition defined in the
second background knowledge rule is satisfied, and thus we can derive the answer as “Elena”.</p>
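        <p>To make the final inference explicit, the two rules can be mirrored in ordinary code (an illustrative Python sketch of the ASP rules above, omitting for brevity the check that the pronoun's verb plays an explanatory role):

```python
def answer(facts, candidates, pronoun_role):
    """Mirror of the two background knowledge rules: if the pronoun
    is giving help, it refers to the candidate who helps / is asked;
    if it is needing help, to the candidate who is helped / asks."""
    for x in candidates:
        helps = any(a == x and rel == "helps" for (a, rel, b) in facts)
        asked = any(b == x and rel == "asks" for (a, rel, b) in facts)
        helped = any(b == x and rel == "helps" for (a, rel, b) in facts)
        asks = any(a == x and rel == "asks" for (a, rel, b) in facts)
        if pronoun_role == "giving_help" and (helps or asked):
            return x
        if pronoun_role == "needing_help" and (helped or asks):
            return x
    return None

# Running example: Maria helps Elena, and BERT has matched the
# pronoun to the role needing_help.
facts = [("maria_1", "helps", "elena_3")]
print(answer(facts, ["maria_1", "elena_3"], "needing_help"))  # elena_3
```
</p>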
        <p>Note that, although the implicit semantic role of the pronoun is extracted by an ML method,
the reasoning used to resolve the schemas is based on interpretable rules. Hence, the rules used
in resolving a schema provide an explanation of the answer.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>
        Table 1 shows the results of our method contrasted with two systems using BERT alone
(BERT_LARGE [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and BERT_WIKI_WSCR3 from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) on the 207 sentences including ‘help’
and the 457 sentences containing ‘ask’ that meet the patterns. Our method achieves accuracy of
81.64% and 75.93%, which is higher than the accuracy achieved by BERT by around 5% and
13% respectively. Moreover, for each answer from our method an explanation can be produced,
in contrast to a mere quantification on the certainty of the choice as given by BERT.
      </p>
      <p>In our method, we use BERT to match a pronoun with an appropriate semantic role. We
checked the accuracy of BERT on this task for the sentences including ‘help’. BERT achieved
84.06%, which is higher than its accuracy in resolving Winograd Schemas directly by around 8%.
By integrating KR reasoning we not only increased the overall performance of the framework,
but also made better use of an existing language model’s ability.</p>
      <p>Further strengthening our claim that BERT benefits from being given a small, focused task, we
show that the accuracy for selecting the semantic role is affected by the exact prompt provided.</p>
      <p>3This refers to BERT which has been further fine-tuned for the WSC.</p>
      <p>[Table 1: columns Test, Accuracy (‘help’), Accuracy (‘ask’); rows T0. BERT_LARGE,
T1. BERT_WIKI_WSCR, T2. Our system.]</p>
      <sec id="sec-6-5">
        <p>In our main experiments we gave BERT a short prompt based only on the pronoun part of the
WS, and we later tested (for those containing ‘help’) the selection of semantic roles when a
longer prompt containing the whole WS was given. The accuracy reduced from 84.06% to
63.29% when using the whole context rather than the pronoun part only. For example,
• “She was inexperienced with the disorder. She would [MASK] help.” (Our usage of BERT
to match a semantic role.)
• “Maria helped Elena cope with the newly diagnosed autism because she was inexperienced
with the disorder. She would [MASK] help.” (Additional context given.)</p>
        <p>This finding also supports the view that it is often the semantic role, rather than other
aspects of the semantic content, that is decisive in determining the reference of a pronoun.</p>
        <p>Note that the accuracy of our method as a whole (81.64%) for the schemas including ‘help’ is
only 2.42% lower than BERT’s accuracy in matching semantic roles. So, provided the pronoun
role is correctly identified, our KR reasoning is very accurate. The decreased accuracy of our
method compared to the semantic role prediction from BERT is mainly due to the fact that some
schemas were incorrectly parsed by K-Parser.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>
        Our new method KARaML improves on the work of [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. In our prior work it was necessary to
explicitly define a large number of rules in order to match the semantic roles for the candidates
and pronouns. Here we have significantly reduced the number of rules required by instead
using a language model to establish the correlation between the description of the pronoun and
the semantic roles of the candidates. In addition, we improve on the performance achieved by
BERT alone and we are able to generate an explanation for the chosen answer.
      </p>
      <p>
        Our current implementation can only be applied to a subset of Winograd schemas for which
domain-specific rules have been defined. For future work, if we include more domains and
patterns, we will increase the coverage of our system. We would also like to apply our method
to other language understanding problems such as COPA [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. As some parsing results from
K-Parser were incorrect, we intend to investigate using other parsers such as SENNA.
      </p>
      <p>
        So far we have only made limited use of BERT to identify the likely semantic relationship
of the pronoun to the candidate clause. However, the same method may be applied to identify
other semantic relationships that could be exploited by a KR reasoner. Moreover, BERT could
be replaced by other state-of-the-art language models such as GPT-3 [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].
      </p>
      <p>More generally, our framework represents initial steps towards a progressive assimilation
architecture for language understanding where we use ML to successively combine new
information into a KR representation that we have built up from prior information. This seems to
provide a general way by which information expressed in natural language can be matched
with predicates occurring in the formalised axioms of a KR system.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Levesque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Morgenstern</surname>
          </string-name>
          , The Winograd Schema Challenge,
          <source>in: The 13th Int. Conf. on Principles of Knowledge Representation and Reasoning</source>
          , Italy,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sakaguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Bras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bhagavatula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>WinoGrande: An adversarial Winograd Schema Challenge at scale</article-title>
          ,
          <source>in: AAAI-20</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. H.</given-names>
            <surname>Vo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Aditya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Baral</surname>
          </string-name>
          ,
          <article-title>Identifying various kinds of event mentions in K-Parser output</article-title>
          ,
          <source>in: Procs. of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation</source>
          , Assoc. for Comp. Linguistics,
          <year>2015</year>
          , pp.
          <fpage>82</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          , arXiv:1810.04805 [cs.CL] (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kocijan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Cretu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. M.</given-names>
            <surname>Camburu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yordanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          ,
          <article-title>A surprisingly robust trick for Winograd Schema Challenge</article-title>
          , in: Procs. of the 57th Annual Meeting of the Assoc. for Comp. Linguistics,
          <year>2019</year>
          , pp.
          <fpage>4837</fpage>
          -
          <lpage>4842</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Hobbs</surname>
          </string-name>
          , Coherence and coreference,
          <source>Cognitive science 3</source>
          (
          <year>1979</year>
          )
          <fpage>67</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Asher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lascarides</surname>
          </string-name>
          , Logics of Conversation, Cambridge University Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kehler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kertz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rohde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Elman</surname>
          </string-name>
          ,
          <article-title>Coherence and coreference revisited</article-title>
          ,
          <source>Journal of semantics 25</source>
          (
          <year>2008</year>
          )
          <fpage>1</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <article-title>Semantic analysis of Winograd Schema no. 1</article-title>
          , in: F. Neuhaus, B. Brodaric (Eds.),
          <source>Procs. of the 12th Int. Conf. on Formal Ontology and Information Systems (FOIS</source>
          <year>2021</year>
          ),
          <source>Frontiers in Artificial Intelligence and Applications</source>
          , IOS Press,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Schüller</surname>
          </string-name>
          ,
          <article-title>Tackling Winograd Schemas by formalizing relevance theory in knowledge graphs</article-title>
          ,
          <source>in: Fourteenth Int. Conf. on the Principles of Knowledge Representation and Reasoning</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lierler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lifschitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Michael</surname>
          </string-name>
          ,
          <article-title>The Winograd Schema Challenge and reasoning about correlation</article-title>
          ,
          <source>in: Logical Formalizations of Commonsense Reasoning</source>
          , AAAI Spring Symposium, Stanford University, USA,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Collobert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karlen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kuksa</surname>
          </string-name>
          ,
          <article-title>Natural language processing (almost) from scratch</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2493</fpage>
          -
          <lpage>2537</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <article-title>Using semantic roles for coreference resolution</article-title>
          ,
          <source>in: 2008 Int. Conf. on Advanced Language Processing and Web Information Technology</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>150</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Fillmore</surname>
          </string-name>
          ,
          <article-title>Frame semantics</article-title>
          , in: Cognitive linguistics: Basic readings, De Gruyter Mouton,
          <year>2008</year>
          , pp.
          <fpage>373</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          ,
          <article-title>Wide-coverage semantic analysis with Boxer</article-title>
          ,
          <source>in: Procs. of the 2008 Conf. on Semantics in Text Processing, STEP '08</source>
          ,
          Association for Computational Linguistics, USA,
          <year>2008</year>
          , pp.
          <fpage>277</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          , Frame-semantic parsing,
          <source>Computational linguistics 40</source>
          (
          <year>2014</year>
          )
          <fpage>9</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. H.</given-names>
            <surname>Vo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Aditya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Baral</surname>
          </string-name>
          ,
          <article-title>Towards addressing the Winograd Schema Challenge - building and using a semantic parser and a knowledge hunting module</article-title>
          ,
          <source>in: IJCAI 2015</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1319</fpage>
          -
          <lpage>1325</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gelfond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lifschitz</surname>
          </string-name>
          ,
          <article-title>The stable model semantics for logic programming</article-title>
          ,
          <source>in: Procs. of Int. Logic Programming Conf. and Symposium</source>
          ,
          <year>1988</year>
          , pp.
          <fpage>1070</fpage>
          -
          <lpage>1080</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <article-title>Tackling domain-specific Winograd Schemas with knowledge-based reasoning and machine learning</article-title>
          ,
          <source>in: 3rd Conf. on Language, Data and Knowledge (LDK 2021)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Using Answer Set Programming for commonsense reasoning in the Winograd Schema Challenge</article-title>
          , arXiv:1907.11112 [cs.AI] (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Al-yahya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Aldhubayi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Al Malak</surname>
          </string-name>
          ,
          <article-title>A pattern-based approach to semantic relation extraction using a seed ontology</article-title>
          ,
          <source>in: Procs. 2014 IEEE Int. Conf. on Semantic Computing, ICSC 2014</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <article-title>Resolving complex cases of definite pronouns: The Winograd Schema Challenge</article-title>
          ,
          <source>in: EMNLP-CoNLL</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>P.</given-names>
            <surname>Trichelair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Emami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trischler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Suleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C. K.</given-names>
            <surname>Cheung</surname>
          </string-name>
          ,
          <article-title>How reasonable are common-sense reasoning tasks: A case-study on the Winograd Schema Challenge and swag</article-title>
          , arXiv:1811.01778 [cs.LG] (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          , arXiv:1907.11692 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Emami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Suleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trischler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C. K.</given-names>
            <surname>Cheung</surname>
          </string-name>
          ,
          <article-title>An analysis of dataset overlap on Winograd-style tasks</article-title>
          ,
          <source>in: Procs. of the 28th Int. Conf. on Computational Linguistics</source>
          , Int. Committee on Computational Linguistics, Barcelona, Spain (Online),
          <year>2020</year>
          , pp.
          <fpage>5855</fpage>
          -
          <lpage>5865</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ettinger</surname>
          </string-name>
          ,
          <article-title>What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>34</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kahneman</surname>
          </string-name>
          ,
          <source>Thinking, Fast and Slow</source>
          , Penguin, London,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nadeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sekine</surname>
          </string-name>
          ,
          <article-title>A survey of named entity recognition and classification</article-title>
          ,
          <source>Lingvisticae Investigationes</source>
          <volume>30</volume>
          (
          <year>2007</year>
          )
          <fpage>3</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>M.</given-names>
            <surname>Roemmele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Bejan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Gordon</surname>
          </string-name>
          ,
          <article-title>Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning</article-title>
          ,
          <source>in: AAAI Spring Symposium on Logical Formalizations of Commonsense Reasoning</source>
          , Stanford University,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Krueger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sigler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          , arXiv:2005.14165 [cs.CL] (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>