Machine Reading as Model Construction

Peter Clark
Allen Institute for AI (AI2), Seattle, WA
peterc@allenai.org

K-CAP2017 Workshops and Tutorials Proceedings, 2017. ©Copyright held by the owner/author(s).

1 WHAT IS MACHINE READING?
With the advent of large datasets of paragraphs + questions, e.g., SQuAD [4] and TriviaQA [3], there has been renewed interest in general-purpose "reading comprehension" (RC) systems, capable of answering questions against those paragraphs, e.g., [5, 6]. These systems have become remarkably effective at factoid QA. However, they require extensive training data, and can still struggle with queries requiring complex inference [1]. The extent to which these systems have truly read and understood the paragraph remains unclear [2].

At the other end of the spectrum, AI has also developed sophisticated formalisms for modeling the world, e.g., situation calculus, event calculus, and qualitative modeling. These frameworks allow systems to represent facts which are known, and infer facts which are unknown. Models built with these frameworks constitute an understanding of the world, in the sense that they are predictive: if the model's computational clockwork moves in a way similar to the world, then the model can predict how the world will behave, which constitutes a degree of understanding. In this context, machine reading can be viewed as the task of constructing such models from text, given a particular modeling framework in which to express those models.

While it is possible that a neural system might eventually be able to infer a predictive, neural model of the world solely from large numbers of examples, we do not believe this is likely in the near future. Rather, we see the way forward as combining the pattern-learning techniques of neural systems with the modeling capabilities of structured representations. AI modeling frameworks provide a set of primitives for constructing predictive models, and neural systems can help construct models within those frameworks that best fit the data. The grand challenge for machine reading, going forward, is combining these two technologies to do this.

2 MACHINE READING ABOUT PROCESSES
At AI2 we have been pursuing a specific genre of machine reading along these lines, namely reading paragraphs describing processes (e.g., photosynthesis). Our goal is not simply to answer lookup questions, but also to answer questions that go beyond the text, in particular about the states that exist during a process. Such questions are challenging because those world states are often implicit, making the questions hard to answer from surface cues alone.

For example, consider the following paragraph about photosynthesis:

    Chloroplasts in the leaf of the plant trap light from the sun. The roots absorb water and minerals from the soil. This combination of water and minerals flows from the stem into the leaf. Carbon dioxide enters the leaf. Light, water and minerals, and the carbon dioxide all combine into a mixture. This mixture forms sugar (glucose) which is what the plant eats.

While reading comprehension (RC) systems can reliably answer lookup questions such as:

    (1) What do the roots absorb? (A: water, minerals)

they struggle when answers are not explicit, e.g.,

    (2) Where is sugar produced? (A: in the leaf)

For example, the RC system BiDAF [5] answers "glucose" to this second question. The question requires knowledge and inference: if carbon dioxide enters the leaf (stated), then it will be at the leaf (unstated), and as it is then used to produce sugar, the sugar production will be at the leaf too. This is the kind of inference that our system, ProComp ("process comprehension"), is able to model, using a structured representation of events and states.

Our approach is illustrated in Figure 1, and we briefly summarize it here. First, ProComp extracts a Process Graph from the paragraph, representing the event sequence in the process. It then performs a STRIPS-like simulation of the process, using a set of precondition/effect rules about events, mined from VerbNet. Finally, a small set of answer procedures operate over that simulation, allowing several classes of questions about change to be answered (e.g., "Where is X at step Y?", "What entities change size during the process?"). Although our initial work has used largely traditional techniques, it is still able to outperform RC systems on questions about change, and thus illustrates the importance of modeling in machine reading.

Figure 1: An illustration of machine reading as model construction and inference. Here the constructed model is a process graph (sequence of events), and inference is state-space simulation, presented graphically as a "Participant Grid". Each row in the Grid is a state (time runs vertically downwards), each column is a process participant, and each cell shows facts true of a participant in a state. For brevity, @ denotes is-at(), yellow lines denote exists(), red denotes a direct consequence of an event, and green an inferred consequence. For example, at line 8 in the Grid (labelled "Assertion"), the "CO2 enters leaf" step asserts that CO2 is @leaf after the event. By inference, the sugar must therefore be produced at the leaf too (green box), a fact not explicitly stated in the text.
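To make the approach concrete, the following is a minimal sketch in Python of the kind of machinery the pipeline involves: a linearized process graph of events, STRIPS-like effect rules keyed on the verb (standing in for the rules mined from VerbNet), a forward simulation that builds a Participant Grid row by row, and one answer procedure ("Where is X at step Y?"). The class names, rule set, and event encodings are illustrative assumptions rather than ProComp's actual implementation, and these toy rules track location only.

    from dataclasses import dataclass
    from typing import Dict, List, Optional

    @dataclass
    class Event:
        verb: str                          # e.g. "enter"
        agent: str                         # the participant the effect applies to
        destination: Optional[str] = None  # location argument, if any
        source: Optional[str] = None       # input entity, for creation events

    # STRIPS-like effect rules keyed on the verb: each rule maps
    # (event, current state) to the location updates it causes.
    EFFECT_RULES = {
        "absorb":  lambda ev, state: {ev.agent: ev.destination},
        "flow":    lambda ev, state: {ev.agent: ev.destination},
        "enter":   lambda ev, state: {ev.agent: ev.destination},
        # a produced entity comes into existence where its input currently is
        "produce": lambda ev, state: {ev.agent: state.get(ev.source)},
    }

    def simulate(events: List[Event],
                 initial: Dict[str, Optional[str]]) -> List[Dict[str, Optional[str]]]:
        """Forward simulation: one grid row (participant -> location) per step."""
        grid = [dict(initial)]
        for ev in events:
            state = dict(grid[-1])
            rule = EFFECT_RULES.get(ev.verb)
            if rule is not None:
                state.update(rule(ev, state))
            grid.append(state)
        return grid

    def where_is(grid: List[Dict[str, Optional[str]]],
                 participant: str, step: int) -> Optional[str]:
        """Answer procedure for 'Where is X at step Y?' (the state after step Y)."""
        return grid[step].get(participant)

    # Linearized process graph for the photosynthesis paragraph above.
    events = [
        Event("absorb",  "water", destination="root"),
        Event("flow",    "water", destination="leaf"),
        Event("enter",   "CO2",   destination="leaf"),
        Event("produce", "sugar", source="CO2"),
    ]
    grid = simulate(events, initial={"water": "soil", "CO2": "air", "sugar": None})
    print(where_is(grid, "sugar", step=4))   # -> "leaf", a fact never stated in the text

Running the sketch reproduces the inference highlighted in Figure 1: sugar ends up @leaf because the preceding "enter" step placed CO2 there.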
3 INTEGRATING NEURAL METHODS
Our initial system uses three basic operations:

• (Event extraction) Given a sentence describing an event, identify the event and the participants within it.
• (State prediction) Given a sentence describing an event, and an entity mentioned in the sentence, predict the state of the entity before/after the event (where the state of the entity is a set of properties associated with it, selected from a predefined set).
• (State inference) Given a partial description of the entities and their states during the process (i.e., a partially filled Participant Grid), fill in the remaining states.

To date, we have collected a large number of hand-annotated examples of these predictions to evaluate our system ProComp. However, clearly this data can also be used for learning, to train a system to make these inferences. Note that this does not obviate the need for ontology design - the appropriate dimensions of modeling still need to be selected. However, it does offer an example-based means for connecting that ontology and the reasoning to data. This is an exciting direction we are pursuing.
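As a rough illustration of how these three operations could be packaged, the sketch below writes them down as a Python interface over a hypothetical Participant Grid data structure. The type names, property set, and persistence rule are assumptions for illustration, not ProComp's actual API; the first two functions are deliberately left as stubs, since they are exactly the slots where trained (e.g., neural) models, supervised by the hand-annotated examples, could be dropped in.

    from dataclasses import dataclass
    from typing import Dict, List, Optional, Tuple

    # The predefined property set: the "dimensions of modeling" that still have
    # to be chosen when designing the ontology.
    STATE_PROPERTIES = ("location", "exists", "size", "temperature")

    State = Dict[str, Optional[str]]   # property -> value (None = unknown)
    GridRow = Dict[str, State]         # participant -> state
    ParticipantGrid = List[GridRow]    # one row per step

    @dataclass
    class ExtractedEvent:
        verb: str
        participants: List[str]

    def extract_event(sentence: str) -> ExtractedEvent:
        """Operation 1 (event extraction): identify the event and its
        participants. Stub for a rule-based or learned extractor."""
        raise NotImplementedError

    def predict_state(sentence: str, entity: str) -> Tuple[State, State]:
        """Operation 2 (state prediction): the entity's state before and after
        the event. Stub for a learned before/after predictor."""
        raise NotImplementedError

    def infer_states(grid: ParticipantGrid) -> ParticipantGrid:
        """Operation 3 (state inference): fill the remaining cells of a
        partially filled Participant Grid. Here, a toy persistence (inertia)
        rule carries the last known value of each property forward; a learned
        model would go beyond simple persistence."""
        filled: ParticipantGrid = []
        last_known: GridRow = {}
        for row in grid:
            new_row: GridRow = {}
            for entity, state in row.items():
                prev = last_known.get(entity, {})
                new_row[entity] = {p: state.get(p) if state.get(p) is not None
                                   else prev.get(p)
                                   for p in STATE_PROPERTIES}
            last_known = new_row
            filled.append(new_row)
        return filled

    # Example: the third row's unknown location is filled in by persistence.
    partial = [
        {"CO2": {"location": "air"}},
        {"CO2": {"location": "leaf"}},
        {"CO2": {}},
    ]
    print(infer_states(partial)[2]["CO2"]["location"])   # -> "leaf"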
4 SUMMARY
Unlike much recent work, we view machine reading as the task of constructing a model from text using a particular modeling framework. The framework provides the building blocks for modeling a certain class of phenomena, and the task of reading is to construct a model within that framework. We have illustrated this for reading text about processes, using a state-based modeling framework.

There is a symbiotic relationship between text and modeling frameworks:

• Text suggests which modeling framework is appropriate (e.g., the text appears to be describing a process, so use a framework suitable for processes).
• The modeling framework provides expectations about what to look for in the text (e.g., given it's a process, expect to see events and their participants).

This approach does not remove the need for learning; rather, it provides a scaffolding within which learning can take place, and a mechanism for then supporting inference and prediction - activities that truly demonstrate that the machine has understood what it has read.

REFERENCES
[1] Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching Machines to Read and Comprehend. In Advances in Neural Information Processing Systems. 1693-1701.
[2] Robin Jia and Percy Liang. 2017. Adversarial Examples for Evaluating Reading Comprehension Systems. In Proc. EMNLP'17.
[3] Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. 2017. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. In Proc. ACL'17.
[4] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proc. EMNLP'16.
[5] Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Bidirectional Attention Flow for Machine Comprehension. In Proc. ICLR'17.
[6] Junbei Zhang, Xiaodan Zhu, Qian Chen, Lirong Dai, and Hui Jiang. 2017. Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering. arXiv preprint arXiv:1703.04617 (2017).