<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Abstractions for Narrative Comprehension Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yi-Chun Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arnav Jhala</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science North Carolina State University Raleigh</institution>
          ,
          <addr-line>NC 27695</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>12</volume>
      <issue>4</issue>
      <abstract>
<p>This paper presents ongoing work investigating the scale at which semantic abstractions are useful for intelligent reasoning about narrative. One method of evaluating narrative reasoning is to use comprehension tests on stories based on question-answering. Recent advances in language processing have led to promising results in general question-answering. However, current systems fail to accurately answer questions when information is not explicitly mentioned in the input story. Specifically, we are interested in testing whether corpus-based deep learning methods can be extended with classical logic-based approaches to draw inferences beyond those explicitly mentioned in the sentences of the corpus. This paper describes a preliminary reimplementation of current methods on the bAbI corpus for question-answering and then presents an algorithm for reasoning about missing information in the input by removing sentences from the corpus.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
<p>Reading comprehension and question answering tasks have a long history in AI, reaching back to the formative years of the field. Many early approaches [GJWCL61, Sho74, HSSS78] were information retrieval systems which translated natural language questions into logical queries that could be run on a curated database, like a collection of Major League Baseball statistics. Later work [Leh77, SH00] shifted into the realm of natural language processing by building specific knowledge representations and inference techniques to answer user-generated questions about natural language stories, like news articles. Most modern approaches are data-driven and allow many different machine learning architectures to compete on a shared training, validation, and test corpus engineered for a particular purpose. The first of these approaches, Deep Read [HLBB99], used a corpus of 3rd to 6th grade reading comprehension test stories paired with short-answer questions and an answer key. The corpus contained 120 total stories split into 60 development and 60 test stories. Deep Read used a simple bag-of-words technique to retrieve a sentence from the original story in answer to each question and was correct 30-40% of the time on the test set. These initial results were soon improved on by subsequent work in the field that used different models and approaches on the same corpus of stories [NTK00, RT00].</p>
<p>The current generation of reading comprehension datasets began with MCTest [RBR13], a dataset of 500 fictional stories paired with multiple-choice questions generated and curated by crowd-sourced workers. Following MCTest, many specialized large-scale datasets were created to challenge and steer reading comprehension research. Datasets have been built with text sources pulled from Wikipedia articles [YYM15, RZLL16], publicly available news stories [HKG+15], children's literature [HBCW16], short informational excerpts paired with trivia questions [JCWZ17], and English exams for middle and high school students [LXL+17]. Some datasets introduce special variations, like QAngaroo [WSR18], which requires models to synthesize information from several related documents. These large datasets have spurred competition, progress, and development among deep learning architectures [SKFH17, HZSC18, CG18], which has prompted updates to the datasets themselves [RJL18].</p>
<p>While deep learning approaches are making rapid progress on large natural language QA datasets, many commonsense reasoning tasks have yet to be solved in the context of reading comprehension. A popular open commonsense reasoning test is the Winograd Schema Challenge [LDM11], a collection of short sentences with a single ambiguous word that can be resolved in one of two ways. These ambiguities can easily be resolved correctly by humans based on context, but at the first Winograd Schema Challenge competition at IJCAI 2016 [DMOJ17] the best machine score was 58%. A test similar to the Schema Challenge is the Story Cloze Test [MCH+16], where the participant is given a four-sentence story or context. After the context, the participant is given two additional sentences where one is a logical conclusion to the story, but the other is not. The score of machine participants has been rising since the test's introduction, but the best score is currently 75% [MRL+17]. In this paper, we explore a logic-based [WFGH06] approach to narrative modeling, comprehension, and question answering. We use a synthetic QA corpus called bAbI [WBC+15] as a testbed for our approach. bAbI's tests are modular and targeted at specific abilities needed for commonsense reasoning in reading comprehension like deduction, induction, pathfinding, and positional reasoning. We plan to extend and modify these test cases in order to scaffold our approach up to tackle new types of problems.</p>
    </sec>
    <sec id="sec-3">
      <title>Initial Model</title>
<p>Reading comprehension is the ability of the mind to process and understand text, and then connect its meaning with prior knowledge. These underlying information processing systems allow humans to comprehend natural language. The processes interpret incoming information to make suitable responses within the context of questions or situations. In this research, we use a model that imitates the information processing system behind reading comprehension to let a computer accomplish reading comprehension tasks.</p>
<p>In order to examine how our model performs, we ask questions about content with or without explicit information, and let the model give possible answers. Like the key stages suggested by many information processing theories, our model follows a sequential method of input-processing-output. Information gathered from the text (input) is stored and processed by the core of the model for later use (processing). When questions are asked, the model decides what to do with the information and how to give suitable responses (output).
When using question-answering as the testing method for a reading comprehension task, we frame the task as a 3-element set {P, Q, An}, where P denotes the passage, composed of a set of sentences S = {s_1, s_2, ...}. Q represents a set of questions regarding the passage content. By mapping each element of Q through the answer function an_k = answer_function(q_k), we map the questions to a set of possible answers An.
1. Daniel journeyed to the office. 2. Daniel grabbed the football. 3. Daniel left the milk. 4. Where is Daniel?</p>
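<p>The {P, Q, An} framing above can be sketched in code. This is a minimal illustration: the function name answer_function matches the text, but the retrieval logic and the toy location lexicon are our simplifying assumptions, not the paper's implementation.</p>
<p>
```python
# The Daniel example framed as {P, Q, An}.
P = ["Daniel journeyed to the office.",
     "Daniel grabbed the football.",
     "Daniel left the milk."]
Q = ["Where is Daniel?"]
LOCATIONS = {"office", "kitchen", "garden"}   # toy location lexicon (assumed)

def answer_function(question, passage):
    """Return the last location mentioned in a sentence about the subject."""
    subject = question.rstrip("?").split()[-1]     # e.g. "Daniel"
    answer = "unknown"
    for sentence in passage:
        words = [w.strip(".") for w in sentence.split()]
        if subject in words:
            for w in words:
                if w.lower() in LOCATIONS:
                    answer = w.lower()
    return answer

# Map each question through the answer function to get An.
An = [answer_function(q, P) for q in Q]
```
</p>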
<p>This example text is a simplified bAbI [WBC+15] QA framework test. We will use the Daniel example to ground our technical discussion before introducing the full bAbI framework.</p>
      <sec id="sec-3-1">
        <title>Input Stage</title>
<p>In this stage, our model has a process to read and encode information from raw text. Instead of storing the whole text, as a first step we use a logical knowledge representation framework as a tool to extract explicit concepts as well as relations in the text. Inspired by the knowledge representation framework Rensa [Har17], when our model analyzes the text, the passage information is framed as the different relations in Table 1.</p>
<p>In narrative representations, the existence of concepts and how they are related to one another are required in some shape or form. The text base and the story world composed by the text encode these information units. To get these concepts and further use them, we need a representation which can extract the encodings from input data. We take knowledge representation frameworks, like Story Intention Graphs [EM07] and Rensa [Har17], as inspiration for a hand-encoded knowledge representation. We use unary relations to model concepts. A unary relation is a complex concept that includes left-hand (l), right-hand (r), binary relations, and attributes for an information unit, written attr(x). Therefore, each assertion can be written as:
{l(name), relation, r(value), attr_1(x_1), attr_2(x_2), ..., attr_n(x_n)}</p>
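<p>One way to realize the assertion form above in code is a small record type. The field names follow the left-hand/relation/right-hand description in the text; the class itself is a hypothetical sketch of ours, not Rensa's actual API.</p>
<p>
```python
from dataclasses import dataclass, field

@dataclass
class Assertion:
    l: str                      # left-hand concept: the subject label (name)
    relation: str               # binary relation, e.g. "is_a" or "action_at"
    r: object                   # right-hand value: attributes or a nested Assertion
    attributes: dict = field(default_factory=dict)   # optional attr(x) pairs

# Assertions for the Daniel example.
a1 = Assertion("Daniel", "is_a", "entity")
a2 = Assertion("Daniel", "action_at", "office")
```
</p>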
<p>The idea of left-hand and right-hand concepts represents a name and value pair, where the name is the label that expresses the subject of the assertion, whereas the value could be an array of attributes or another assertion to support the nested relation structure. The name and the value are carried by the left-hand and right-hand parts respectively, and the relation symbol represents the relation between them. Following the framework, we implemented some classic relations and variable relations in the context of the Daniel example passage in Table 1. When the raw text is input, we encode the data into assertions. In this step, the entities in the described world and their perceivable actions and attributes are extracted. These concepts are then stored for later use.</p>
<p>Table 1: relation | definition | example
is a / type of | a hyponym is a type of a hypernym (class inclusion) | is_a(Daniel, entity)
action | performs an action | action(Daniel, journeyed)
action at | is located at | action_at(Daniel, office)
has property | is | has_property(garden, uncertain)</p>
<p>In each sentence, the information can be divided into an entity part E and an action part A, where the entities are the nouns in the sentence and the actions are the verbs. Therefore s_k denotes a sentence where s_k = {E_k, A_k}. To reason about the information at the semantic level of the action, we first integrate hand-crafted rules with the predicates and thematic roles of verbs from Verb-Net [KKRP08] (the Verb-Net project is at https://verbs.colorado.edu/verbnet/). Because the verbs in bAbI tasks are limited, we are currently able to separate the verbs into three categories: CONNECT, SEPARATE, and MOVE. If the predicates of a verb include a predicate that shows a change of location, the verb is categorized into the MOVE group. If the predicates of a verb contain the relation between source and theme, it is categorized into the CONNECT group. And if the predicates of a verb include the relation between theme and destination or an end location, it is assigned to the SEPARATE group.</p>
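<p>The three-way verb grouping can be sketched as follows. The predicate strings here are hand-written stand-ins for VerbNet predicates, not real VerbNet entries, and the substring matching is a simplifying assumption.</p>
<p>
```python
# Toy predicate lists for the verbs in the Daniel example (assumed, not VerbNet's).
TOY_PREDICATES = {
    "journeyed": ["motion", "location_change(agent)"],
    "grabbed":   ["cause(agent)", "has_possession(source, theme)"],
    "left":      ["transfer(theme, end_location)"],
}

def categorize_verb(predicates):
    """MOVE on a location change; CONNECT on a source-theme relation;
    SEPARATE on theme-destination or an end location (the text's rules)."""
    if any("location_change" in p for p in predicates):
        return "MOVE"
    if any("source" in p and "theme" in p for p in predicates):
        return "CONNECT"
    if any("destination" in p or "end_location" in p for p in predicates):
        return "SEPARATE"
    return "UNKNOWN"
```
</p>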
<p>After the action in the sentence is put into a group, we process the entities and relations in the sentence. Besides the predicates and thematic roles of a verb, Verb-Net also provides syntax frames for each verb. When processing entities, we match syntax frames with sentences to fill entities into thematic roles. In order to better reason over the relations between entities, they are divided into three different sets. According to the characteristics of the entities, the sets are characters C, locations L, and other objects O.</p>
<p>When we process the entities we categorize them by their characteristics. When a name matches some linguistic rule, such as using a capital letter or being the referent of the subject of a clause, and displays signs of animacy such as taking action, the system considers the entity a character. When an entity is capable of conducting actions, meaning it can be filled into the Agent role in the verb frames of the sentence, or the entity exists in certain external resources such as a baby-name database, it is categorized into the character set C = {c_k ∈ E and isActor(c_k) = True}. Similarly, if an entity is linked by prepositions of place, or it can match the Location role in verb frames, it is categorized as an element of the location set L = {l_k ∈ E and isLocation(l_k) = True}. The remaining entities, which are objects of an action and are not categorized into the character set, are put into the object set O = {o_k ∈ E and o_k ∉ (C ∪ L)}. In these steps, we represent the links between groups and actions, and between sets and entities, by type_of and is_a relations. For example, is_a(Daniel, entity), is_a(Daniel, Character), type_of(grab, CONNECT). This process is described in Algorithm 1.</p>
<p>Algorithm 1 Reason Action and Entities
1: S ← input sentence
2: [E, A] ← ProcessSentence(S)
3: categorize all a ∈ A into {CONNECT, SEPARATE, MOVE} groups
4: create type_of(a, action_group) relations
5: while there exists an unchecked e in E do
6:   pick an e arbitrarily
7:   if isActor(e) == True then
8:     put e in {Character}
9:   else if isLocation(e) == True then
10:    put e in {Location}
11:  else
12:    put e in {Object}
13:  mark e as checked
14:  create is_a(e, {Character, Location, Object})</p>
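<p>Algorithm 1's entity classification can be made runnable with toy isActor/isLocation checks standing in for the linguistic rules and the external name database mentioned above; the lookup sets here are illustrative assumptions.</p>
<p>
```python
# Stand-ins for the linguistic rules and external resources (assumed).
KNOWN_NAMES = {"Daniel", "Mary", "John"}
KNOWN_LOCATIONS = {"office", "kitchen", "garden", "bathroom"}

def is_actor(e):
    return e in KNOWN_NAMES

def is_location(e):
    return e in KNOWN_LOCATIONS

def reason_entities(entities):
    """Classify each entity into Character/Location/Object and emit is_a relations."""
    relations = []
    sets = {"Character": set(), "Location": set(), "Object": set()}
    for e in entities:
        if is_actor(e):
            label = "Character"
        elif is_location(e):
            label = "Location"
        else:
            label = "Object"
        sets[label].add(e)
        relations.append(("is_a", e, label))
    return sets, relations

sets, rels = reason_entities(["Daniel", "office", "football"])
```
</p>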
      </sec>
      <sec id="sec-3-2">
        <title>Processing and Storage Stage</title>
<p>For the information tuple of a sentence i, (C_i, A_i, O_i), we have already constructed the relations between entities and between entities and actions. By matching the syntax frame, we are also able to create the relation action(c_k ∈ C_i, a_k ∈ A_i), which combines the character c_k and the corresponding action a_k. As a next step, we start to process the relation between actions and their effects. If the action belongs to the CONNECT group, we match elements in O_i and create has_possession(c_k, o_k ∈ O_i) relations to link the ownership between objects and characters, and this change is reflected in the world states. When the action belongs to the SEPARATE group, the has_possession(c_k, o_k ∈ O_i) relation is removed to indicate that the ownership no longer holds, and our model updates the world states accordingly. The last case is the MOVE category: when the action belongs to this category, the location information is linked to both actions and characters, we get action(c_k, a_k) and action_at(c_k, l_k ∈ L_i) relation pairs, and the change is also reflected in the world states. Moreover, for each object linked to the character by has_possession(c_k, o_k ∈ O_i), its location information is updated accordingly.</p>
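<p>The effect rules above can be sketched as a small world-state update: CONNECT adds possession, SEPARATE removes it, and MOVE relocates the character and everything they hold. The relation names (has_possession, at) follow the text; the state-dictionary layout is our assumption.</p>
<p>
```python
def apply_action(state, group, character, obj=None, location=None):
    """Apply the effect of an action group to the world state."""
    possession = state.setdefault("has_possession", set())
    at = state.setdefault("at", {})
    if group == "CONNECT":
        possession.add((character, obj))
    elif group == "SEPARATE":
        possession.discard((character, obj))
    elif group == "MOVE":
        at[character] = location
        for c, o in possession:
            if c == character:
                at[o] = location          # carried objects move with the character
    return state

state = {}
apply_action(state, "MOVE", "Daniel", location="office")
apply_action(state, "CONNECT", "Daniel", obj="football")
apply_action(state, "MOVE", "Daniel", location="kitchen")
```
</p>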
<p>This stage includes adding information to the mental schema and encoding it. As described above, we first analyze the basic relations inherited from the input stage and then use a rule-based method to bridge them with the knowledge base to figure out the effects that new incoming information applies to the story world states.
Algorithm 2 Reason Action and Effects
1: if a ∈ CONNECT then
2:   add has_possession(c, o) relation
3: else if a ∈ SEPARATE then
4:   remove has_possession(c, o) relation
5: else if a ∈ MOVE then
6:   update action_at(c, l) and at(c, l) relations
7:   for all has_possession(c, o) do
8:     update at(o, l)
9: update world states
The relations form the assertions about entities. To build a logical model with the assertions, we use
the planning language PDDL [MGH+98] over first-order logic with constant symbols called objects, relation symbols called predicates, and variable symbols. We start with a set of PDDL operators O where each operator o = ⟨l_o, p_o, e_o⟩ ∈ O consists of a unique name or label l, a conjunctive set of first-order literal preconditions p, and a conjunctive set of first-order literal effects e. Preconditions specify what must be true in the story world for an action to take place and effects specify how the world is updated by the action. Operators have parameters that can be grounded by substituting concrete PDDL objects, which represent story world characters, things, and locations, for parameter variables. Table 2 shows the parameters, preconditions, and effects of an example PDDL operator for moving a character from one location to another. The operator is ground with objects from our example story.</p>
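<p>The operator triple ⟨l_o, p_o, e_o⟩ and the grounding shown in Table 2 can be illustrated without a full PDDL parser. Negated literals are encoded here with a "not-at" tag, which is a representational shortcut of ours, not PDDL syntax.</p>
<p>
```python
# The Table 2 move operator as a label/preconditions/effects triple.
MOVE = {
    "label": "move",
    "parameters": ["?mover", "?newlocation", "?oldlocation"],
    "preconditions": [("at", "?mover", "?oldlocation"),
                      ("not-at", "?mover", "?newlocation")],
    "effects": [("at", "?mover", "?newlocation"),
                ("not-at", "?mover", "?oldlocation")],
}

def ground(operator, binding):
    """Substitute concrete objects for parameter variables."""
    sub = lambda literal: tuple(binding.get(x, x) for x in literal)
    return {"label": operator["label"],
            "preconditions": [sub(p) for p in operator["preconditions"]],
            "effects": [sub(e) for e in operator["effects"]]}

# Ground with objects from the example story, as in Table 2.
g = ground(MOVE, {"?mover": "Daniel",
                  "?newlocation": "office",
                  "?oldlocation": "kitchen"})
```
</p>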
        <sec id="sec-3-2-1">
          <title>Algorithm 3 Sentence Processing</title>
<p>1: S ← input sentence
2: use a part-of-speech tagger to get the nouns and the verb of the sentence
3: E ← nouns
4: a ← verb of the sentence
5: return [E, a]</p>
<p>During the process that interprets actions and their effects and adds preconditions to our model's knowledge base, background knowledge is often required to reason about the changed world, because the effects are often unstated. When applying our model to test sets in experiments, we set some reference rules for actions that appear in the text and update the changes of world states accordingly. In this stage, after processing the input assertions from the last stage, our model encodes the changes of world states, conditions, or actions into PDDL format. We represent those results by PDDL-like fluents similar to those used in a planning problem. When the comprehended world is not continuous, we can insert planning methods to fill in the gaps between state changes.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Parameters</title>
          <p>?mover ?newlocation ?oldlocation</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>Ground Parameters</title>
          <p>Daniel office kitchen</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>Preconditions</title>
<p>(at ?mover ?oldlocation)
¬(at ?mover ?newlocation)</p>
        </sec>
        <sec id="sec-3-2-5">
<title>Effects</title>
<p>(at ?mover ?newlocation)
¬(at ?mover ?oldlocation)</p>
        </sec>
        <sec id="sec-3-2-6">
          <title>Ground Preconditions</title>
<p>(at Daniel kitchen)
¬(at Daniel office)</p>
        </sec>
        <sec id="sec-3-2-7">
          <title>Ground E ects</title>
<p>(at Daniel office)
¬(at Daniel kitchen)</p>
          <p>In this stage, the model prepares an appropriate response to outside stimuli. The stimuli are questions that relate to the text content, and the response should be the answer or possible answers generated from the knowledge base constructed by our previous comprehension process. To give answers to the questions, we analyze them through the processes in the previous two stages. Our model extracts information to assign the question to a specific category. As with the input text, we first transform the question into assertions, and then interpret the assertions to get the subject of the question as well as the question type. Example question types are yes/no questions, where, or who. This step not only analyzes the question content, but also prepares for generating answers.</p>
<p>For certain questions, our model provides templates to create answers. For other questions, our model gives the most related information which matches the question topic. When our model is exploring its knowledge base for answers, it can encounter different situations. The easiest situation is when the answer can be directly queried from the existing world states. However, if the answer does not appear in the existing world states because the information is implicit or missing, our model must do further processing to explore answers. One processing method is to seek other subjects which have a relation with the question subject in the current world states, then use that information to fill out missing relations and produce possible answers. Another is to trace back to previous world states in order to look up information.</p>
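<p>The question-analysis step can be sketched as a type classifier plus a direct lookup in the current world states. The question types (where, who, yes/no) follow the text; the keyword heuristics and the direct_answer helper are assumptions of ours.</p>
<p>
```python
def question_type(question):
    """Classify a question into the types named in the text."""
    q = question.lower()
    if q.startswith("where"):
        return "where"
    if q.startswith("who"):
        return "who"
    if q.startswith(("is", "are", "was", "were")):
        return "yes/no"
    return "other"

def direct_answer(question, world):
    """Try the easiest case: look the answer up in the current world states."""
    subject = question.rstrip("?").split()[-1]
    if question_type(question) == "where":
        return world.get("at", {}).get(subject)   # None means implicit or missing
    return None

world = {"at": {"Daniel": "office"}}
```
</p>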
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Planning for Missing Information</title>
<p>When our model gets a question regarding missing information, it tries to find a reference entity and plan for the missing part. For example, the question "Where is Daniel?" asks for location information about the character Daniel. If we cannot find the answer in the current world states, but the question subject has relations with other entities, such as has_possession(Daniel, football), our model marks that entity as a reference. Our system then considers states about the reference from which it can infer possible answers: given at(Daniel, unknown) but has_possession(Daniel, football) and at(football, garden), the possible answer is "garden". It also traces back to the previous world states where it last knew about the question subject (Daniel), and with the possible answer at(Daniel, garden), it uses the following planning procedure to figure out the possible missing information. In this example, our system sets the previous state at(Daniel, unknown) as the initial state and the possible answer at(Daniel, garden) as the goal state. After that, the model assigns previous world states as preconditions. By matching the preconditions, the system gets a series of actions, and then, according to the effects of the actions in the action set, our system chooses a possible missing action.
Algorithm 4 Planning
1: possible_info ← []
2: initial_state ← previous state about the question subject
3: current_state ← initial_state
4: goal_state ← the possible answer
5: precondition_set ← previous world states
6: while current_state does not match goal_state do
7:   if precondition_set matches elements in action groups then
8:     add to doable_set
9:   if the effect of any a ∈ doable_set matches goal_state then
10:    update possible_info
11:    current_state ← effect of a
12:  else
13:    pick an a
14:    update possible_info
15:    current_state ← effect of a
16:    update precondition_set
17: return possible_info</p>
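<p>The reference-entity step that feeds Algorithm 4 can be sketched with the Daniel/football/garden example: when the subject's location is unknown, follow a has_possession relation to a reference entity whose location is known. The helper name infer_location is ours, not the paper's.</p>
<p>
```python
def infer_location(subject, at, has_possession):
    """Return the subject's location, inferring via a reference entity if needed."""
    if subject in at and at[subject] != "unknown":
        return at[subject]
    for owner, obj in has_possession:
        if owner == subject and at.get(obj, "unknown") != "unknown":
            return at[obj]        # possible answer via the reference entity
    return "unknown"

# at(Daniel, unknown), has_possession(Daniel, football), at(football, garden)
at = {"Daniel": "unknown", "football": "garden"}
has_possession = {("Daniel", "football")}
```
</p>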
      </sec>
      <sec id="sec-3-4">
        <title>Performance on bAbI Tasks</title>
<p>The bAbI task is a proxy task which evaluates reading comprehension via question answering. It measures a language understanding system on different aspects of ability, including whether the system can answer questions via chaining facts, simple induction, deduction, and so on. In our experiments, we use the task set to identify successful and insufficient parts of our model, and we get almost 100% correctness on the following task sets. We tested tasks 1, 2, 3, 6, 8, 10, 15, and 16. Tasks 1, 2 and 3 provide test sets that answer</p>
        <sec id="sec-3-4-1">
<title>Table 3: results by Task ID for MemNet, MemNet PE LS RN, MemNet PE LS RN JOINT, and Our Model</title>
<p>questions about single supporting facts, among other facts, or questions about the combination of information from sentences. In the simplest case, the task has questions like "Mary traveled to the office. Where is Mary?" which only ask about information that is provided directly. When the task tests the understanding of two or more facts, the questions require references to other information to get information regarding the question subject. For example: "John is in the playground. John picked up the football. Where is the football?" The question subject is the football, but to answer this question information about John is also needed.</p>
<p>For the simplest type of tasks, our model can directly get an answer from its knowledge base. Because the input sentences already contain explicit information regarding answers, our model can link the effects of actions to world states when processing input sentences. Therefore, the needed information is accessible in the current world states. When answering questions about a combination of information, our model first queries the information about question subjects from its knowledge base, and then finds the reference object from the queried world states. From the information of the reference object, the question can be answered correctly. Task 6 is very similar to task 1 in that it asks questions about supporting facts, but it aims to test the ability of a model to answer true/false type questions, such as "Is John in the playground?" Our model can generate this kind of answer correctly by placing related information into the right category of the answer framework. Task 8 requires models to generate a list composed of a set of single-word answers. An example question is "What is Daniel holding?" Our model answers these questions by searching the states of objects in the described world. Task 10 gives possibility statements instead of facts. Our model preserves uncertain states while processing input sentences at the very beginning, so we can answer the questions. Task 15 and Task 16 test basic deduction and induction, respectively. In Task 15, sentences describe entities with some characteristics. After each characteristic, sentences give a subject that inherits properties from the entity, and then questions are asked about the characteristics. Task 16 gives different properties of subjects in sentences. It uses another subject as a question subject and asks if the question subject shares some of the properties of the described subject. In our processing stage these assertions are connected through the same subjects, and our model stores the relations as states of subjects; the is_a relation helps us to keep the information used in deduction or induction. Therefore, our model can also find answers from its knowledge base.</p>
<p>For these tasks, we compare our results with the MemNet results cited in the bAbI paper [WBC+15] in Table 3.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Extended Task Framework</title>
<p>Although the bAbI tasks cover many aspects of evaluating understanding models, they give explicit information in each task. In addition to answering questions about described things, we are also interested in whether the model can answer questions with implicit information. When people read a paragraph of text, like a story, changing world states are not always clearly stated. We propose a task framework that modifies the bAbI task to test this aspect of the ability of understanding models. The extended task removes information from the original text in order to test whether a model can use known information to infer the implicit part. The following are examples of the extended task. Here is an example of the original bAbI task, Two Supporting Facts:
1. Mary moved to the bathroom. 2. Mary got the football there. 3. John went to the kitchen. 4. Mary went back to the kitchen. 5. Mary dropped the football. 6. John got the football there. 7. Where is the football? Kitchen 4 5</p>
<p>In this case, we can see that the answer to the question is the kitchen because the question subject, the football, was mentioned in sentences 4 and 5. To create an inference task, our modification will be:</p>
<p>We consider items that show up together in a sentence with the question subject. These guarantee that certain relations exist between the question subject and things in the world. Then we remove the information directly related to the question to create the inference part. In this example, because the question asks about location, we remove the sentence which gives location information. To answer these questions, the models must determine information from the relation between the two subjects. For this example, we can answer that John is in the kitchen because he got the football, and the football was dropped in the kitchen. With the same original text, and following the rule we described above, another modification could be:
1. Mary moved to the bathroom. 2. Mary got the football there. 3. John went to the kitchen. 4. 5. Mary dropped the football. 6. John got the football there. 7. Where is Mary? Kitchen 3 5 6</p>
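<p>The extended-task construction can be sketched as a function that blanks the last sentence directly stating the question subject's location, matching the modification shown above. The verb list used to detect location sentences and the helper name are toy assumptions, not the paper's implementation.</p>
<p>
```python
def make_inference_task(sentences, subject, location_verbs=("went", "moved")):
    """Blank the last sentence that gives the subject's location directly."""
    last = None
    for i, s in enumerate(sentences):
        if subject in s and any(v in s for v in location_verbs):
            last = i
    # Keep the sentence number but drop the text of the removed sentence.
    return [f"{i + 1}." if i == last else f"{i + 1}. {s}"
            for i, s in enumerate(sentences)]

story = ["Mary moved to the bathroom.", "Mary got the football there.",
         "John went to the kitchen.", "Mary went back to the kitchen.",
         "Mary dropped the football.", "John got the football there."]
task = make_inference_task(story, "Mary")
```
</p>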
<p>Similarly, the question subject Mary and a described item, the football, are in the same sentence, and the location information of the question subject is removed. We can answer this question because Mary dropped the football somewhere, but John got the football at his location, so we know that Mary should be in the same place as John. These modified rules test not only the ability to infer unstated information from the given sentences, but also evaluate the ability to understand implications, like the fact that if a subject can conduct actions on some object then they must be in the same place. Our model solves the extended task through the following process. In the former modification, the answer searching process starts from the initial state that John is at an unknown location and the football is in the kitchen, and the goal state would be that John is in the same location as the football because he is able to get the football. By searching the possible actions that change John's state from an unknown place to the kitchen, our model can not only answer the question but also fill back the missing part. In the latter case, the initial state will be that Mary is at an unknown location and the football is in the kitchen, and the goal state will be that Mary and the football are in the same location.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
<p>This work, while preliminary, sets up a platform for further research in question-answering and narrative abstraction. There are several areas of further improvement. We currently use a static set of PDDL operators O to build our inferred world states from the knowledge representation. In the future, we'd like to dynamically build this set. One avenue is to learn the set by reading stories [CMW13]. Another would be to use a lexicon, like VerbNet [Sch05], to map relations in the knowledge representation to a database of PDDL operators. bAbI is one of several corpora that are currently being actively developed within the NLP community. We are already looking at expanding this work to include other datasets, such as SQuAD [RJL18].</p>
      <p>[CG18] Christopher Clark and Matt Gardner. Simple and Effective Multi-Paragraph Reading Comprehension. In Meeting of the Association for Computational Linguistics, pages 845-855, 2018.</p>
<p>[CMW13] Stephen N. Cresswell, Thomas L. McCluskey, and Margaret M. West. Acquiring Planning Domain Models Using LOCM. The Knowledge Engineering Review, 28(2):195-213, 2013.</p>
<p>[DMOJ17] Ernest Davis, Leora Morgenstern, and Charles L. Ortiz Jr. The First Winograd Schema Challenge at IJCAI-16. AI Magazine, 38(3), 2017.</p>
<p>[EM07] David K. Elson and Kathleen R. McKeown. A Platform for Symbolically Encoding Human Narratives. In AAAI Fall Symposium on Intelligent Narrative Technologies, 2007.</p>
      <p>[GJWCL61] Bert F. Green Jr., Alice K. Wolf, Carol Chomsky, and Kenneth Laughery. Baseball: An Automatic Question Answerer. In Western Joint IRE-AIEE-ACM Computer Conference, pages 219-224. ACM, 1961.</p>
<p>[Har17] Sarah Harmon. Narrative Encoding for Computational Reasoning and Adaptation. PhD thesis, University of California, Santa Cruz, 2017.</p>
      <p>[HBCW16] Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations. In International Conference on Learning Representations, 2016.</p>
<p>[HKG+15] Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching Machines to Read and Comprehend. In Advances in Neural Information Processing Systems, pages 1693-1701, 2015.</p>
      <p>[HLBB99] Lynette Hirschman, Marc Light, Eric Breck, and John D. Burger. Deep Read: A Reading Comprehension System. In Meeting of the Association for Computational Linguistics, pages 325–332, 1999.</p>
      <p>[HSSS78] Gary G. Hendrix, Earl D. Sacerdoti, Daniel Sagalowicz, and Jonathan Slocum. Developing a Natural Language Interface to Complex Data. Transactions on Database Systems, 3(2):105–147, 1978.</p>
      <p>[HZSC18] Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, and Weizhu Chen. FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension. In International Conference on Learning Representations, 2018.</p>
      <p>[JCWZ17] Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. In Meeting of the Association for Computational Linguistics, 2017.</p>
      <p>[KKRP08] Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. A Large-Scale Classification of English Verbs. Language Resources and Evaluation, 42(1):21–40, 2008.</p>
      <p>[LDM11] Hector J. Levesque, Ernest Davis, and Leora Morgenstern. The Winograd Schema Challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, volume 46, page 47, 2011.</p>
      <p>[Leh77] Wendy G. Lehnert. A Conceptual Theory of Question Answering. In International Joint Conference on Artificial Intelligence, pages 158–164, 1977.</p>
      <p>[LXL+17] Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. RACE: Large-scale ReAding Comprehension Dataset From Examinations. In Conference on Empirical Methods in Natural Language Processing, 2017.</p>
      <p>[MCH+16] Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James F. Allen. A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 839–849, 2016.</p>
      <p>[MGH+98] Drew McDermott, Malik Ghallab, Adele Howe, Craig Knoblock, Ashwin Ram, Manuela Veloso, Daniel Weld, and David Wilkins. PDDL - The Planning Domain Definition Language. Technical Report CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control, 1998.</p>
      <p>[MRL+17] Nasrin Mostafazadeh, Michael Roth, Annie Louis, Nathanael Chambers, and James F. Allen. LSDSem 2017 Shared Task: The Story Cloze Test. In Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, pages 46–51, 2017.</p>
      <p>[NTK00] Hwee Tou Ng, Leong Hwee Teo, and Jennifer Lai Pheng Kwan. A Machine Learning Approach to Answering Questions for Reading Comprehension Tests. In Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 124–132, 2000.</p>
      <p>[RBR13] Matthew Richardson, Christopher J.C. Burges, and Erin Renshaw. MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In Conference on Empirical Methods in Natural Language Processing, pages 193–203, 2013.</p>
      <p>[RJL18] Pranav Rajpurkar, Robin Jia, and Percy Liang. Know What You Don't Know: Unanswerable Questions for SQuAD. In Meeting of the Association for Computational Linguistics, 2018.</p>
      <p>[RT00] Ellen Riloff and Michael Thelen. A Rule-Based Question Answering System for Reading Comprehension Tests. In Workshop on Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems, pages 13–19, 2000.</p>
      <p>[RZLL16] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, 2016.</p>
      <p>[Sch05] Karin Kipper Schuler. VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. PhD thesis, University of Pennsylvania, 2005.</p>
      <p>[SH00] Lenhart K. Schubert and Chung Hee Hwang. Episodic Logic Meets Little Red Riding Hood: A Comprehensive, Natural Representation for Language Understanding. Natural Language Processing and Knowledge Representation: Language for Knowledge and Knowledge for Language, pages 111–174, 2000.</p>
      <p>[Sho74] Edward H. Shortliffe. A Rule-Based Computer Program for Advising Physicians Regarding Antimicrobial Therapy Selection. In ACM Conference, page 739, 1974.</p>
      <p>[SKFH17] Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. Bidirectional Attention Flow for Machine Comprehension. In International Conference on Learning Representations, 2017.</p>
      <p>[WSR18] Johannes Welbl, Pontus Stenetorp, and Sebastian Riedel. Constructing Datasets for Multi-hop Reading Comprehension Across Documents. Transactions of the Association for Computational Linguistics, 6:287–302, 2018.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [YYM15]
          <string-name>
            <given-names>Yi</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wen-tau</given-names>
            <surname>Yih</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Meek</surname>
          </string-name>
          .
          <article-title>WikiQA: A Challenge Dataset for Open-Domain Question Answering</article-title>
          . In
          <source>Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>2013</fpage>
          –
          <lpage>2018</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>