<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Abstractions for Narrative Comprehension Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yi-Chun Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arnav Jhala</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science North Carolina State University Raleigh</institution>
          ,
          <addr-line>NC 27695</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>12</volume>
      <issue>4</issue>
      <abstract>
<p>This paper presents ongoing work investigating the scale at which semantic abstractions are useful for intelligent reasoning about narrative. One method of evaluating narrative reasoning is to use comprehension tests on stories based on question-answering. Recent advances in language processing have led to promising results in general question-answering. However, current systems fail to accurately answer questions when information is not explicitly mentioned in the input story. Specifically, we are interested in testing whether corpus-based deep learning methods can be extended with classical logic-based approaches to draw inferences beyond those explicitly mentioned in the sentences of the corpus. This paper describes a preliminary reimplementation of current methods on the bAbI corpus for question-answering and then presents an algorithm for reasoning about missing information in the input by removing sentences from the corpus.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
<p>Reading comprehension and question answering tasks have a long history in AI, reaching back to the formative years of the field. Many early approaches [GJWCL61, Sho74, HSSS78] were information retrieval systems which translated natural language questions into logical queries that could be run on a curated database, like a collection of Major League Baseball statistics. Later work [Leh77, SH00] shifted into the realm of natural language processing by building specific knowledge representations and inference techniques to answer user-generated questions about natural language stories, like news articles. Most modern approaches are data-driven and allow many different machine learning architectures to compete on a shared training, validation, and test corpus engineered for a particular purpose. The first of these approaches, Deep Read [HLBB99], used a corpus of 3rd to 6th grade reading comprehension test stories paired with short-answer questions and an answer key. The corpus contained 120 total stories split into 60 development and 60 test stories. Deep Read used a simple bag-of-words technique to retrieve a sentence from the original story in answer to each question and was correct 30-40% of the time on the test set. These initial results were soon improved on by subsequent work in the field that used different models and approaches on the same corpus of stories [NTK00, RT00].</p>
<p>The current generation of reading comprehension datasets began with MCTest [RBR13], a dataset of 500 fictional stories paired with multiple-choice questions generated and curated by crowd-sourced workers. Following MCTest, many specialized large-scale datasets were created to challenge and steer reading comprehension research. Datasets have been built with text sources pulled from Wikipedia articles [YYM15, RZLL16], publicly available news stories [HKG+15], children's literature [HBCW16], short informational excerpts paired with trivia questions [JCWZ17], and English exams for middle and high school students [LXL+17]. Some datasets introduce special variations, like QAngaroo [WSR18], which requires models to synthesize information from several related documents. These large datasets have spurred competition, progress, and development among deep learning architectures [SKFH17, HZSC18, CG18], which has prompted updates to the datasets themselves [RJL18].</p>
<p>While deep learning approaches are making rapid progress on large natural language QA datasets, many commonsense reasoning tasks have yet to be solved in the context of reading comprehension. A popular open commonsense reasoning test is the Winograd Schema Challenge [LDM11], a collection of short sentences with a single ambiguous word that can be resolved in one of two ways. These ambiguities can easily be resolved correctly by humans based on context, but at the first Winograd Schema Challenge competition at IJCAI 2016 [DMOJ17] the best machine score was 58%. A test similar to the Schema Challenge is the Story Cloze Test [MCH+16], where the participant is given a four-sentence story or context. After the context, the participant is given two additional sentences where one is a logical conclusion to the story, but the other is not. The score of machine participants has been rising since the test's introduction, but the best score is currently 75% [MRL+17]. In this paper, we explore a logic-based [WFGH06] approach to narrative modeling, comprehension, and question answering. We use a synthetic QA corpus called bAbI [WBC+15] as a testbed for our approach. bAbI's tests are modular and targeted at specific abilities needed for commonsense reasoning in reading comprehension like deduction, induction, pathfinding, and positional reasoning. We plan to extend and modify these test cases in order to scaffold our approach up to tackle new types of problems.</p>
    </sec>
    <sec id="sec-3">
      <title>Initial Model</title>
<p>Reading comprehension is the ability of the mind to process and understand text, and then connect its meaning with prior knowledge. These underlying information processing systems allow humans to comprehend natural language. The processes interpret incoming information to make suitable responses within the context of questions or situations. In this research, we use a model that imitates the information processing system behind reading comprehension to let a computer accomplish reading comprehension tasks.</p>
<p>In order to examine how our model performs, we ask questions about content with or without explicit information, and let the model give possible answers. Like the key stages suggested by many information processing theories, our model follows a sequential method of input-processing-output. Information gathered from the text (input) is stored and processed by the core of the model for later use (processing). When questions are asked, the model decides what to do with the information and how to give suitable responses (output).
When using question-answering as the testing method for a reading comprehension task, we frame the task as a 3-element set {P, Q, An}, where P denotes the passage, composed of a set of sentences S = {s_1, s_2, ...}. Q represents a set of questions regarding the passage content. By mapping each element of Q through the answer function an_k = answer_function(q_k), we map the questions to a set of possible answers An.
1. Daniel journeyed to the office. 2. Daniel grabbed the football. 3. Daniel left the milk. 4. Where is Daniel?</p>
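<p>The {P, Q, An} framing above can be sketched in code. This is a minimal illustration: the function name answer_function matches the text, but the retrieval logic and the toy location lexicon are our simplifying assumptions, not the paper's implementation.</p>
<p>
```python
# The Daniel example framed as {P, Q, An}.
P = ["Daniel journeyed to the office.",
     "Daniel grabbed the football.",
     "Daniel left the milk."]
Q = ["Where is Daniel?"]
LOCATIONS = {"office", "kitchen", "garden"}   # toy location lexicon (assumed)

def answer_function(question, passage):
    """Return the last location mentioned in a sentence about the subject."""
    subject = question.rstrip("?").split()[-1]     # e.g. "Daniel"
    answer = "unknown"
    for sentence in passage:
        words = [w.strip(".") for w in sentence.split()]
        if subject in words:
            for w in words:
                if w.lower() in LOCATIONS:
                    answer = w.lower()
    return answer

# Map each question through the answer function to get An.
An = [answer_function(q, P) for q in Q]
```
</p>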
<p>This example text is a simplified bAbI [WBC+15] QA framework test. We will use the Daniel example to ground our technical discussion before introducing the full bAbI framework.</p>
      <sec id="sec-3-1">
        <title>Input Stage</title>
<p>In this stage, our model has a process to read and encode information from raw text. Instead of storing the whole text, as a first step we use a logical knowledge representation framework as a tool to extract explicit concepts as well as relations in the text. Inspired by the knowledge representation framework Rensa [Har17], when our model analyzes the text, the passage information is framed as the different relations in Table 1.</p>
<p>In narrative representations, the existence of concepts and how they are related to one another are required in some shape or form. The text base and the story world composed by the text encode these information units. To get these concepts and further use them, we need a representation which can extract the encodings from input data. We take knowledge representation frameworks, like Story Intention Graphs [EM07] and Rensa [Har17], as inspiration for a hand-encoded knowledge representation. We use unary relations to model concepts. A unary relation is a complex concept that includes left-hand (l), right-hand (r), binary relations, and attributes for an information unit, written attr(x). Therefore, each assertion can be written as:
{l(name), relation, r(value), attr_1(x_1), attr_2(x_2), ..., attr_n(x_n)}</p>
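<p>One way to realize the assertion form above in code is a small record type. The field names follow the left-hand/relation/right-hand description in the text; the class itself is a hypothetical sketch of ours, not Rensa's actual API.</p>
<p>
```python
from dataclasses import dataclass, field

@dataclass
class Assertion:
    l: str                      # left-hand concept: the subject label (name)
    relation: str               # binary relation, e.g. "is_a" or "action_at"
    r: object                   # right-hand value: attributes or a nested Assertion
    attributes: dict = field(default_factory=dict)   # optional attr(x) pairs

# Assertions for the Daniel example.
a1 = Assertion("Daniel", "is_a", "entity")
a2 = Assertion("Daniel", "action_at", "office")
```
</p>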
<p>The idea of left-hand and right-hand concepts represents a name and value pair, where the name is the label that expresses the subject of the assertion, whereas the value could be an array of attributes or another assertion to support the nested relation structure. The name and the value are carried by the left-hand and right-hand parts respectively, and the relation symbol represents the relation between them. Following the framework, we implemented some classic relations and variable relations in the context of the Daniel example passage in Table 1. When the raw text is input, we encode the data into assertions. In this step, the entities in the described world and their perceivable actions and attributes are extracted. These concepts are then stored for later use.</p>
<p>Table 1: relation | definition | example
is a / type of | a hyponym is a type of a hypernym (class inclusion) | is_a(Daniel, entity)
action | performs an action | action(Daniel, journeyed)
action at | is located at | action_at(Daniel, office)
has property | is | has_property(garden, uncertain)</p>
<p>In each sentence, the information can be divided into an entity part E and an action part A, where the entities are the nouns in the sentence and the actions are the verbs. Therefore s_k denotes a sentence where s_k = {E_k, A_k}. To reason about the information at the semantic level of the action, we first integrate hand-crafted rules with the predicates and thematic roles of verbs from Verb-Net [KKRP08] (the Verb-Net project is at https://verbs.colorado.edu/verbnet/). Because the verbs in bAbI tasks are limited, we are currently able to separate the verbs into three categories: CONNECT, SEPARATE, and MOVE. If the predicates of a verb include a predicate that shows a change of location, the verb is categorized into the MOVE group. If the predicates of a verb contain the relation between source and theme, it is categorized into the CONNECT group. And if the predicates of a verb include the relation between theme and destination or an end location, it is assigned to the SEPARATE group.</p>
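<p>The three-way verb grouping can be sketched as follows. The predicate strings here are hand-written stand-ins for VerbNet predicates, not real VerbNet entries, and the substring matching is a simplifying assumption.</p>
<p>
```python
# Toy predicate lists for the verbs in the Daniel example (assumed, not VerbNet's).
TOY_PREDICATES = {
    "journeyed": ["motion", "location_change(agent)"],
    "grabbed":   ["cause(agent)", "has_possession(source, theme)"],
    "left":      ["transfer(theme, end_location)"],
}

def categorize_verb(predicates):
    """MOVE on a location change; CONNECT on a source-theme relation;
    SEPARATE on theme-destination or an end location (the text's rules)."""
    if any("location_change" in p for p in predicates):
        return "MOVE"
    if any("source" in p and "theme" in p for p in predicates):
        return "CONNECT"
    if any("destination" in p or "end_location" in p for p in predicates):
        return "SEPARATE"
    return "UNKNOWN"
```
</p>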
<p>After the action in the sentence is put into a group, we process the entities and relations in the sentence. Besides the predicates and thematic roles of a verb, Verb-Net also provides syntax frames for each verb. When processing entities, we match syntax frames with sentences to fill entities into thematic roles. In order to better reason over the relations between entities, they are divided into three different sets. According to the characteristics of the entities, the sets are characters C, locations L, and other objects O.</p>
<p>When we process the entities we categorize them by their characteristics. When a name matches some linguistic rule, such as using a capital letter or being the referent of the subject of a clause, and displays signs of animacy such as taking action, the system considers the entity a character. When an entity is capable of conducting actions, meaning it can be filled into the Agent role in the verb frames of the sentence, or the entity exists in certain external resources such as a baby-name database, it is categorized into the character set C = {c_k ∈ E and isActor(c_k) = True}. Similarly, if an entity is linked by prepositions of place, or it can match the Location role in verb frames, it is categorized as an element of the location set L = {l_k ∈ E and isLocation(l_k) = True}. The remaining entities, which are objects of an action and are not categorized into the character set, are put into the object set O = {o_k ∈ E and o_k ∉ (C ∪ L)}. In these steps, we represent the links between groups and actions, and between sets and entities, by type_of and is_a relations. For example, is_a(Daniel, entity), is_a(Daniel, Character), type_of(grab, CONNECT). This process is described in Algorithm 1.</p>
<p>Algorithm 1 Reason Action and Entities
1: S ← input sentence
2: [E, A] ← ProcessSentence(S)
3: categorize all a ∈ A into {CONNECT, SEPARATE, MOVE} groups
4: create type_of(a, action_group) relations
5: while there exists an unchecked e in E do
6:   pick an e arbitrarily
7:   if isActor(e) == True then
8:     put e in {Character}
9:   else if isLocation(e) == True then
10:    put e in {Location}
11:  else
12:    put e in {Object}
13:  mark e as checked
14:  create is_a(e, {Character, Location, Object})</p>
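<p>Algorithm 1's entity classification can be made runnable with toy isActor/isLocation checks standing in for the linguistic rules and the external name database mentioned above; the lookup sets here are illustrative assumptions.</p>
<p>
```python
# Stand-ins for the linguistic rules and external resources (assumed).
KNOWN_NAMES = {"Daniel", "Mary", "John"}
KNOWN_LOCATIONS = {"office", "kitchen", "garden", "bathroom"}

def is_actor(e):
    return e in KNOWN_NAMES

def is_location(e):
    return e in KNOWN_LOCATIONS

def reason_entities(entities):
    """Classify each entity into Character/Location/Object and emit is_a relations."""
    relations = []
    sets = {"Character": set(), "Location": set(), "Object": set()}
    for e in entities:
        if is_actor(e):
            label = "Character"
        elif is_location(e):
            label = "Location"
        else:
            label = "Object"
        sets[label].add(e)
        relations.append(("is_a", e, label))
    return sets, relations

sets, rels = reason_entities(["Daniel", "office", "football"])
```
</p>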
      </sec>
      <sec id="sec-3-2">
        <title>Processing and Storage Stage</title>
<p>For the information tuple of a sentence i, (C_i, A_i, O_i), we have already constructed the relations between entities and between entities and actions. By matching the syntax frame, we are also able to create the relation action(c_k ∈ C_i, a_k ∈ A_i), which combines the character c_k and the corresponding action a_k. As a next step, we start to process the relation between actions and their effects. If the action belongs to the CONNECT group, we match elements in O_i and create has_possession(c_k, o_k ∈ O_i) relations to link the ownership between objects and characters, and this change is reflected in the world states. When the action belongs to the SEPARATE group, the has_possession(c_k, o_k ∈ O_i) relation is removed to indicate that the ownership no longer holds, and our model updates the world states accordingly. The last case is the MOVE category: when the action belongs to this category, the location information is linked to both actions and characters, we get action(c_k, a_k) and action_at(c_k, l_k ∈ L_i) relation pairs, and the change is also reflected in the world states. Moreover, for each object linked to the character by has_possession(c_k, o_k ∈ O_i), its location information is updated accordingly.</p>
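<p>The effect rules above can be sketched as a small world-state update: CONNECT adds possession, SEPARATE removes it, and MOVE relocates the character and everything they hold. The relation names (has_possession, at) follow the text; the state-dictionary layout is our assumption.</p>
<p>
```python
def apply_action(state, group, character, obj=None, location=None):
    """Apply the effect of an action group to the world state."""
    possession = state.setdefault("has_possession", set())
    at = state.setdefault("at", {})
    if group == "CONNECT":
        possession.add((character, obj))
    elif group == "SEPARATE":
        possession.discard((character, obj))
    elif group == "MOVE":
        at[character] = location
        for c, o in possession:
            if c == character:
                at[o] = location          # carried objects move with the character
    return state

state = {}
apply_action(state, "MOVE", "Daniel", location="office")
apply_action(state, "CONNECT", "Daniel", obj="football")
apply_action(state, "MOVE", "Daniel", location="kitchen")
```
</p>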
<p>This stage includes adding information to the mental schema and encoding it. As described above, we first analyze the basic relations inherited from the input stage and then use a rule-based method to bridge them with the knowledge base to figure out the effects that new incoming information applies to the story world states.
Algorithm 2 Reason Action and Effects
1: if a ∈ CONNECT then
2:   add has_possession(c, o) relation
3: else if a ∈ SEPARATE then
4:   remove has_possession(c, o) relation
5: else if a ∈ MOVE then
6:   update action_at(c, l) and at(c, l) relations
7:   for all has_possession(c, o) do
8:     update at(o, l)
9: update world states
The relations form the assertions about entities. To build a logical model with the assertions, we use
the planning language PDDL [MGH+98] over first-order logic with constant symbols called objects, relation symbols called predicates, and variable symbols. We start with a set of PDDL operators O where each operator o = ⟨l_o, p_o, e_o⟩ ∈ O consists of a unique name or label l, a conjunctive set of first-order literal preconditions p, and a conjunctive set of first-order literal effects e. Preconditions specify what must be true in the story world for an action to take place and effects specify how the world is updated by the action. Operators have parameters that can be grounded by substituting concrete PDDL objects, which represent story world characters, things, and locations, for parameter variables. Table 2 shows the parameters, preconditions, and effects of an example PDDL operator for moving a character from one location to another. The operator is ground with objects from our example story.</p>
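<p>The operator triple ⟨l_o, p_o, e_o⟩ and the grounding shown in Table 2 can be illustrated without a full PDDL parser. Negated literals are encoded here with a "not-at" tag, which is a representational shortcut of ours, not PDDL syntax.</p>
<p>
```python
# The Table 2 move operator as a label/preconditions/effects triple.
MOVE = {
    "label": "move",
    "parameters": ["?mover", "?newlocation", "?oldlocation"],
    "preconditions": [("at", "?mover", "?oldlocation"),
                      ("not-at", "?mover", "?newlocation")],
    "effects": [("at", "?mover", "?newlocation"),
                ("not-at", "?mover", "?oldlocation")],
}

def ground(operator, binding):
    """Substitute concrete objects for parameter variables."""
    sub = lambda literal: tuple(binding.get(x, x) for x in literal)
    return {"label": operator["label"],
            "preconditions": [sub(p) for p in operator["preconditions"]],
            "effects": [sub(e) for e in operator["effects"]]}

# Ground with objects from the example story, as in Table 2.
g = ground(MOVE, {"?mover": "Daniel",
                  "?newlocation": "office",
                  "?oldlocation": "kitchen"})
```
</p>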
        <sec id="sec-3-2-1">
          <title>Algorithm 3 Sentence Processing</title>
<p>1: S ← input sentence
2: use a part-of-speech tagger to get the nouns and the verb of the sentence
3: E ← nouns
4: a ← verb of the sentence
5: return [E, a]</p>
<p>During the process that interprets actions and their effects and adds preconditions to our model's knowledge base, background knowledge is often required to reason about the changed world, because the effects are often unstated. When applying our model to test sets in experiments, we set some reference rules for actions that appear in the text and update the changes of world states accordingly. In this stage, after processing the input assertions from the last stage, our model encodes the changes of world states, conditions, or actions into PDDL format. We represent those results by PDDL-like fluents similar to those used in a planning problem. When the comprehended world is not continuous, we can insert planning methods to fill in the gaps between state changes.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Parameters</title>
          <p>?mover ?newlocation ?oldlocation</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>Ground Parameters</title>
          <p>Daniel office kitchen</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>Preconditions</title>
<p>(at ?mover ?oldlocation)
¬(at ?mover ?newlocation)</p>
        </sec>
        <sec id="sec-3-2-5">
<title>Effects</title>
<p>(at ?mover ?newlocation)
¬(at ?mover ?oldlocation)</p>
        </sec>
        <sec id="sec-3-2-6">
          <title>Ground Preconditions</title>
<p>(at Daniel kitchen)
¬(at Daniel office)</p>
        </sec>
        <sec id="sec-3-2-7">
          <title>Ground E ects</title>
<p>(at Daniel office)
¬(at Daniel kitchen)</p>
          <p>In this stage, the model prepares an appropriate response to outside stimuli. The stimuli are questions that relate to the text content, and the response should be the answer or possible answers generated from the knowledge base constructed by our previous comprehension process. To give answers to the questions, we analyze them through the processes in the previous two stages. Our model extracts information to assign the question to a specific category. As with the input text, we first transform the question into assertions, and then interpret the assertions to get the subject of the question as well as the question type. Example question types are yes/no questions, where, or who. This step not only analyzes the question content, but also prepares for generating answers.</p>
<p>For certain questions, our model provides templates to create answers. For other questions, our model gives the most related information which matches the question topic. When our model is exploring its knowledge base for answers, it can encounter different situations. The easiest situation is when the answer can be directly queried from the existing world states. However, if the answer does not appear in the existing world states because the information is implicit or missing, our model must do further processing to explore answers. One processing method is to seek other subjects which have a relation with the question subject in the current world states, then use that information to fill out missing relations and produce possible answers. Another is to trace back to previous world states in order to look up information.</p>
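<p>The question-analysis step can be sketched as a type classifier plus a direct lookup in the current world states. The question types (where, who, yes/no) follow the text; the keyword heuristics and the direct_answer helper are assumptions of ours.</p>
<p>
```python
def question_type(question):
    """Classify a question into the types named in the text."""
    q = question.lower()
    if q.startswith("where"):
        return "where"
    if q.startswith("who"):
        return "who"
    if q.startswith(("is", "are", "was", "were")):
        return "yes/no"
    return "other"

def direct_answer(question, world):
    """Try the easiest case: look the answer up in the current world states."""
    subject = question.rstrip("?").split()[-1]
    if question_type(question) == "where":
        return world.get("at", {}).get(subject)   # None means implicit or missing
    return None

world = {"at": {"Daniel": "office"}}
```
</p>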
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Planning for Missing Information</title>
<p>When our model gets a question regarding missing information, it tries to find a reference entity and plan for the missing part. For example, the question "Where is Daniel?" asks for location information about the character Daniel. If we cannot find the answer in the current world states, but the question subject has relations with other entities, such as has_possession(Daniel, football), our model marks that entity as a reference. Our system then considers states about the reference from which it can infer possible answers: given at(Daniel, unknown) but has_possession(Daniel, football) and at(football, garden), the possible answer is "garden". It also traces back to the previous world states where it last knew about the question subject (Daniel), and with the possible answer at(Daniel, garden), it uses the following planning procedure to figure out the possible missing information. In this example, our system sets the previous state at(Daniel, unknown) as the initial state and the possible answer at(Daniel, garden) as the goal state. After that, the model assigns previous world states as preconditions. By matching the preconditions, the system gets a series of actions, and then, according to the effects of the actions in the action set, our system chooses a possible missing action.
Algorithm 4 Planning
1: possible_info ← []
2: initial_state ← previous state about the question subject
3: current_state ← initial_state
4: goal_state ← the possible answer
5: precondition_set ← previous world states
6: while current_state does not match goal_state do
7:   if precondition_set matches elements in action groups then
8:     add to doable_set
9:   if the effect of any a ∈ doable_set matches goal_state then
10:    update possible_info
11:    current_state ← effect of a
12:  else
13:    pick an a
14:    update possible_info
15:    current_state ← effect of a
16:    update precondition_set
17: return possible_info</p>
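<p>The reference-entity step that feeds Algorithm 4 can be sketched with the Daniel/football/garden example: when the subject's location is unknown, follow a has_possession relation to a reference entity whose location is known. The helper name infer_location is ours, not the paper's.</p>
<p>
```python
def infer_location(subject, at, has_possession):
    """Return the subject's location, inferring via a reference entity if needed."""
    if subject in at and at[subject] != "unknown":
        return at[subject]
    for owner, obj in has_possession:
        if owner == subject and at.get(obj, "unknown") != "unknown":
            return at[obj]        # possible answer via the reference entity
    return "unknown"

# at(Daniel, unknown), has_possession(Daniel, football), at(football, garden)
at = {"Daniel": "unknown", "football": "garden"}
has_possession = {("Daniel", "football")}
```
</p>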
      </sec>
      <sec id="sec-3-4">
        <title>Performance on bAbI Tasks</title>
<p>The bAbI task is a proxy task which evaluates reading comprehension via question answering. It measures a language understanding system on different aspects of ability, including whether the system can answer questions via chaining facts, simple induction, deduction, and so on. In our experiments, we use the task set to identify successful and insufficient parts of our model, and we get almost 100% correctness on the following task sets. We tested tasks 1, 2, 3, 6, 8, 10, 15, and 16. Tasks 1, 2 and 3 provide test sets that answer</p>
        <sec id="sec-3-4-1">
<title>Table 3: results by Task ID for MemNet, MemNet PE LS RN, MemNet PE LS RN JOINT, and Our Model</title>
<p>questions about single supporting facts, among other facts, or questions about the combination of information from sentences. In the simplest case, the task has questions like "Mary traveled to the office. Where is Mary?" which only ask about information that is provided directly. When the task tests the understanding of two or more facts, the questions require references to other information to get information regarding the question subject. For example: "John is in the playground. John picked up the football. Where is the football?" The question subject is the football, but to answer this question information about John is also needed.</p>
<p>For the simplest type of tasks, our model can directly get an answer from its knowledge base. Because the input sentences already contain explicit information regarding answers, our model can link the effects of actions to world states when processing input sentences. Therefore, the needed information is accessible in the current world states. When answering questions about a combination of information, our model first queries the information about question subjects from its knowledge base, and then finds the reference object from the queried world states. From the information of the reference object, the question can be answered correctly. Task 6 is very similar to task 1 in that it asks questions about supporting facts, but it aims to test the ability of a model to answer true/false type questions, such as "Is John in the playground?" Our model can generate this kind of answer correctly by placing related information into the right category of the answer framework. Task 8 requires models to generate a list composed of a set of single-word answers. An example question is "What is Daniel holding?" Our model answers these questions by searching the states of objects in the described world. Task 10 gives possibility statements instead of facts. Our model preserves uncertain states while processing input sentences at the very beginning, so we can answer the questions. Task 15 and Task 16 test basic deduction and induction, respectively. In Task 15, sentences describe entities with some characteristics. After each characteristic, sentences give a subject that inherits properties from the entity, and then questions are asked about the characteristics. Task 16 gives different properties of subjects in sentences. It uses another subject as a question subject and asks if the question subject shares some of the properties of the described subject. In our processing stage these assertions are connected through the same subjects, and our model stores the relations as states of subjects; the is_a relation helps us to keep the information used in deduction or induction. Therefore, our model can also find answers from its knowledge base.</p>
<p>For these tasks, we compare our results with the MemNet results cited in the bAbI paper [WBC+15] in Table 3.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Extended Task Framework</title>
<p>Although the bAbI tasks cover many aspects of evaluating understanding models, they give explicit information in each task. In addition to answering questions about described things, we are also interested in whether the model can answer questions with implicit information. When people read a paragraph of text, like a story, changing world states are not always clearly stated. We propose a task framework that modifies the bAbI task to test this aspect of the ability of understanding models. The extended task removes information from the original text in order to test whether a model can use known information to infer the implicit part. The following are examples of the extended task. Here is an example of the original bAbI task, Two Supporting Facts:
1. Mary moved to the bathroom. 2. Mary got the football there. 3. John went to the kitchen. 4. Mary went back to the kitchen. 5. Mary dropped the football. 6. John got the football there. 7. Where is the football? Kitchen 4 5</p>
<p>In this case, we can see that the answer to the question is the kitchen because the question subject, the football, was mentioned in sentences 4 and 5. To create an inference task, our modification will be:</p>
<p>We consider items that show up together in a sentence with the question subject. These guarantee that certain relations exist between the question subject and things in the world. Then we remove the information directly related to the question to create the inference part. In this example, because the question asks about location, we remove the sentence which gives location information. To answer these questions, the models must determine information from the relation between the two subjects. For this example, we can answer that John is in the kitchen because he got the football, and the football was dropped in the kitchen. With the same original text, and following the rule we described above, another modification could be:
1. Mary moved to the bathroom. 2. Mary got the football there. 3. John went to the kitchen. 4. 5. Mary dropped the football. 6. John got the football there. 7. Where is Mary? Kitchen 3 5 6</p>
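<p>The extended-task construction can be sketched as a function that blanks the last sentence directly stating the question subject's location, matching the modification shown above. The verb list used to detect location sentences and the helper name are toy assumptions, not the paper's implementation.</p>
<p>
```python
def make_inference_task(sentences, subject, location_verbs=("went", "moved")):
    """Blank the last sentence that gives the subject's location directly."""
    last = None
    for i, s in enumerate(sentences):
        if subject in s and any(v in s for v in location_verbs):
            last = i
    # Keep the sentence number but drop the text of the removed sentence.
    return [f"{i + 1}." if i == last else f"{i + 1}. {s}"
            for i, s in enumerate(sentences)]

story = ["Mary moved to the bathroom.", "Mary got the football there.",
         "John went to the kitchen.", "Mary went back to the kitchen.",
         "Mary dropped the football.", "John got the football there."]
task = make_inference_task(story, "Mary")
```
</p>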
<p>Similarly, the question subject Mary and a described item, the football, are in the same sentence, and the location information of the question subject is removed. We can answer this question because Mary dropped the football somewhere, but John got the football at his location, so we know that Mary should be in the same place as John. These modified rules test not only the ability to infer unstated information from the given sentences, but also evaluate the ability to understand implications, like the fact that if a subject can conduct actions on some object then they must be in the same place. Our model solves the extended task through the following process. In the former modification, the answer searching process starts from the initial state that John is at an unknown location and the football is in the kitchen, and the goal state would be that John is in the same location as the football because he is able to get the football. By searching the possible actions that change John's state from an unknown place to the kitchen, our model can not only answer the question but also fill back the missing part. In the latter case, the initial state will be that Mary is at an unknown location and the football is in the kitchen, and the goal state will be that Mary and the football are in the same location.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
<p>This work, while preliminary, sets up a platform for further research in question-answering and narrative abstraction. There are several areas of further improvement. We currently use a static set of PDDL operators O to build our inferred world states from the knowledge representation. In the future, we'd like to dynamically build this set. One avenue is to learn the set by reading stories [CMW13]. Another would be to use a lexicon, like VerbNet [Sch05], to map relations in the knowledge representation to a database of PDDL operators. bAbI is one of several corpora that are currently being actively developed within the NLP community. We are already looking at expanding this work to include other datasets, such as SQuAD [RJL18].</p>
      <p>[CG18] Christopher Clark and Matt Gardner. Simple and Effective Multi-Paragraph Reading Comprehension. In Meeting of the Association for Computational Linguistics, pages 845-855, 2018.</p>
<p>[CMW13] Stephen N. Cresswell, Thomas L. McCluskey, and Margaret M. West. Acquiring Planning Domain Models Using LOCM. The Knowledge Engineering Review, 28(2):195-213, 2013.</p>
<p>[DMOJ17] Ernest Davis, Leora Morgenstern, and Charles L. Ortiz Jr. The First Winograd Schema Challenge at IJCAI-16. AI Magazine, 38(3), 2017.</p>
<p>[EM07] David K. Elson and Kathleen R. McKeown. A Platform for Symbolically Encoding Human Narratives. In AAAI Fall Symposium on Intelligent Narrative Technologies, 2007.</p>
      <p>[GJWCL61] Bert F. Green Jr., Alice K. Wolf, Carol Chomsky, and Kenneth Laughery. Baseball: An Automatic Question Answerer. In Western Joint IRE-AIEE-ACM Computer Conference, pages 219-224. ACM, 1961.</p>
<p>[Har17] Sarah Harmon. Narrative Encoding for Computational Reasoning and Adaptation. PhD thesis, University of California, Santa Cruz, 2017.</p>
      <p>[HBCW16] Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations. In International Conference on Learning Representations, 2016.</p>
<p>[HKG+15] Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching Machines to Read and Comprehend. In Advances in Neural Information Processing Systems, pages 1693-1701, 2015.</p>
      <p>[HLBB99] Lynette Hirschman, Marc Light, Eric Breck, and John D. Burger. Deep Read: A Reading Comprehension System. In Meeting of the Association for Computational Linguistics, pages 325–332, 1999.</p>
      <p>[HSSS78] Gary G. Hendrix, Earl D. Sacerdoti, Daniel Sagalowicz, and Jonathan Slocum. Developing a Natural Language Interface to Complex Data. Transactions on Database Systems, 3(2):105–147, 1978.</p>
      <p>[HZSC18] Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, and Weizhu Chen. FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension. In International Conference on Learning Representations, 2018.</p>
      <p>[JCWZ17] Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. In Meeting of the Association for Computational Linguistics, 2017.</p>
      <p>[KKRP08] Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. A Large-Scale Classification of English Verbs. Language Resources and Evaluation, 42(1):21–40, 2008.</p>
      <p>[LDM11] Hector J. Levesque, Ernest Davis, and Leora Morgenstern. The Winograd Schema Challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, volume 46, page 47, 2011.</p>
      <p>[Leh77] Wendy G. Lehnert. A Conceptual Theory of Question Answering. In International Joint Conference on Artificial Intelligence, pages 158–164, 1977.</p>
      <p>[LXL+17] Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. RACE: Large-scale ReAding Comprehension Dataset From Examinations. In Conference on Empirical Methods in Natural Language Processing, 2017.</p>
      <p>[MCH+16] Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James F. Allen. A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 839–849, 2016.</p>
      <p>[MGH+98] Drew McDermott, Malik Ghallab, Adele Howe, Craig Knoblock, Ashwin Ram, Manuela Veloso, Daniel Weld, and David Wilkins. PDDL - The Planning Domain Definition Language. Technical Report CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control, 1998.</p>
      <p>[MRL+17] Nasrin Mostafazadeh, Michael Roth, Annie Louis, Nathanael Chambers, and James F. Allen. LSDSem 2017 Shared Task: The Story Cloze Test. In Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, pages 46–51, 2017.</p>
      <p>[NTK00] Hwee Tou Ng, Leong Hwee Teo, and Jennifer Lai Pheng Kwan. A Machine Learning Approach to Answering Questions for Reading Comprehension Tests. In Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 124–132, 2000.</p>
      <p>[RBR13] Matthew Richardson, Christopher J.C. Burges, and Erin Renshaw. MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In Conference on Empirical Methods in Natural Language Processing, pages 193–203, 2013.</p>
      <p>[RJL18] Pranav Rajpurkar, Robin Jia, and Percy Liang. Know What You Don't Know: Unanswerable Questions for SQuAD. In Meeting of the Association for Computational Linguistics, 2018.</p>
      <p>[RT00] Ellen Riloff and Michael Thelen. A Rule-Based Question Answering System for Reading Comprehension Tests. In Workshop on Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems, pages 13–19, 2000.</p>
      <p>[RZLL16] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, 2016.</p>
      <p>[Sch05] Karin Kipper Schuler. VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. PhD thesis, University of Pennsylvania, 2005.</p>
      <p>[SH00] Lenhart K. Schubert and Chung Hee Hwang. Episodic Logic Meets Little Red Riding Hood: A Comprehensive, Natural Representation for Language Understanding. Natural Language Processing and Knowledge Representation: Language for Knowledge and Knowledge for Language, pages 111–174, 2000.</p>
      <p>[Sho74] Edward H. Shortliffe. A Rule-Based Computer Program for Advising Physicians Regarding Antimicrobial Therapy Selection. In ACM Conference, page 739, 1974.</p>
      <p>[SKFH17] Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. Bidirectional Attention Flow for Machine Comprehension. In International Conference on Learning Representations, 2017.</p>
      <p>[WSR18] Johannes Welbl, Pontus Stenetorp, and Sebastian Riedel. Constructing Datasets for Multi-hop Reading Comprehension Across Documents. Transactions of the Association for Computational Linguistics, 6:287–302, 2018.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [YYM15]
          <string-name>
            <given-names>Yi</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wen-tau</given-names>
            <surname>Yih</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Meek</surname>
          </string-name>
          .
          <article-title>WikiQA: A Challenge Dataset for Open-Domain Question Answering</article-title>
          . In
          <source>Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>2013</fpage>
          –
          <lpage>2018</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>