Open the Pod Bay Door: Using Ontology to Understand Instructions¹

Yixin SUN a, Michael GRÜNINGER a
a Department of Mechanical and Industrial Engineering, University of Toronto, Ontario, Canada M5S 3G8

Abstract. There has been a great deal of development and implementation work on conversational robots such as Amazon Alexa and Google Home. Yet little research has addressed how robots can understand more rigorously structured, instruction-style natural language. In this paper, we present an ontology-based approach to converting instructions from natural language into logical formulas, in this case descriptions in the Process Specification Language (PSL). Verbs are treated as indications of actions or processes and are decomposed semantically to capture their meaning. As a method of evaluating this approach, verbs that are originally expressed in PSL are worked back towards natural language. We show how to take natural language instructions and map them to the appropriate cutting classes, and we demonstrate the capability of this ontology-centered approach to work in the reverse direction (from PSL to natural language).

Keywords. process ontology, semantic parsing, instructions, cutting process

1. Introduction

In the movie 2001: A Space Odyssey, the ship’s computer Hal would not follow Dave’s spoken instructions and did not open the pod bay door. When someone utters “open the pod bay door”, only the end result is specified; all of the detailed steps in between are left out. This door example mimics human-to-human instruction, as opposed to the way a programmer would hard-code an autonomous robot, specifying even the exact angle between the robot and the door knob. This paper focuses on a higher level of abstraction and studies how a robot can understand instructions uttered in natural language. The form of input on which we base our research is verbal instructions, a choice inspired by the notion of the Physical Turing Test.
The Physical Turing Test extends the Turing Test by specifying questions that require the integration of perception, reasoning, and action. In particular, Ortiz et al. [1] propose two tracks for the Physical Turing Test. The Construction Track focuses on building predefined structures (such as a tent or modular furniture) given a combination of verbal instructions and images. The research presented in this paper is motivated by the following long-term vision related to the Construction Track:

Given a set of verbal instructions, together with a sequence of annotated images, answer questions about the activities that can possibly occur during performance of the instructions and the various objects that participate in these activity occurrences.

In particular, we write out instructions not in a step-by-step manner, but in the way a worker or practitioner would most commonly follow them. By mapping natural language to the Process Specification Language (PSL) Ontology [2], we approach the problem by matching the action verbs in a sentence to a particular process and then mapping this process to a first-order logic formula. Understanding verbs therefore becomes our primary goal. Furthermore, in order to validate whether the process generated from natural language is correct, we can invert this approach and map the logical formula (that is, the PSL process description) back to natural language. This is a proof-of-concept paper that proposes how semantic parsing can work from an ontology-focused perspective and verifies its feasibility.

¹ Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2. Literature Review

The Alexa Meaning Representation Language (AMRL) [3] introduced a graphical hierarchy of the main Amazon Alexa ontology with nodes and arcs.
Classes (each represented by a node) are connected by properties (arcs), and child classes are connected to their parent classes. In this way, Amazon Alexa can understand more complex and cross-domain conversations. In that study, Amazon Alexa performed better at resolving ambiguities when utterances were mapped to AMRL rather than handled by spoken language understanding. The study mainly sought solutions for flexibility and ambiguity, and its approach could be useful to ours. Nonetheless, our focus differs insofar as we are not interested in everyday conversational language; instead, we seek to understand more formally written language, of which instructions are the best representative.

According to Levin’s definition of English Verb Classes [4], verbs can be divided into four categories: hit verbs, touch verbs, cut verbs, and break verbs. These classes differ by whether a change of state occurs, whether contact with another object is made, and whether the verb denotes a causative/inchoative motion. Under each category lie different subclasses, categorized by whether or not the subclass passes the designated alternation tests. Given the nature of the tests, verbs that belong to the same classes and subclasses are not necessarily synonyms. Classifying verbs in this way can only provide guidance on syntactic use. VerbNet [5] further extends Levin’s idea into a coding library that can be imported into Java/Python/C++. However, this package is not comprehensive enough to be incorporated into our study, as each subclass has a limited number of sample verbs. Nonetheless, Levin has pointed out a possible approach for instruction mapping: verb synonyms under the same subclass can be used in the same sentence and convey the same meaning.

Along these lines, the Cutting Process Ontology [6] proposed nine different ways of cutting a two-dimensional piece of sheet metal.
The paper introduced three new topological components: points, edges and surfaces. An edge is a subset of at least two points, while a surface is a subset of at least two edges. Further definitions can be built on top of these components. For instance, a hole is a hollow surface within a solid one. The PSL Ontology was used to axiomatize the cutting processes in terms of whether any edges, holes and/or surfaces were created or destroyed (see Figure 1).

Figure 1. Classes of cutting processes. [Panels (a) to (h); each panel shows a sheet with labelled edges e1, e2, ...]

Through mapping natural language expressions to the Cutting Process Ontology, synonyms of “cut” such as “chip”, “split” and “snip” all belong to the same subclass, indicating that these synonyms can be represented by one or more of the cutting classes identified in [6]. This indicates that synonyms can share some of the existing ontology axioms. When it comes to translating instructions into PSL, this finding is useful because fewer new terms need to be created.

Furthermore, physical cutting activities are by nature not limited to two-dimensional shapes. Some instructions require the cutting action to be performed on three-dimensional shapes, such as tree trunks or ingredients in food recipes. There are also intangible objects, such as movies, words and genes, that do not change shape when a cutting action is performed on them; instead, they change in sequence. To better describe all cutting processes and instructions, extending the cutting ontology to include three-dimensional shapes can be considered for future work.

3. Approach

We focus on developing an ontology-specific semantic parser, which treats the action verb as an utterance of a process and converts it into a first-order logic process description. For evaluation and verification purposes, our approach can be checked by going backwards through this pipeline: we can test the natural language instructions derived from the axioms against what people understand from natural language. Whether the two utterances yield the same meaning can be treated as an evaluation criterion. The verb “cut” is taken as an example to demonstrate this process flow. See Figure 2 for details.

Figure 2. Process Flow Diagram of Semantic Parsing.

3.1. Semantic Analysis in the context of PSL

A PSL description is a logical formula that can specify an occurrence of an activity in terms of time-points while capturing changes in properties throughout the activity. We consider the verb in a natural language sentence to be the representation of the activity that causes objects to change state. In other words, only when something is done will the subjects be affected. Therefore, the first step is to locate the verb and identify the two words it falls between; this can determine the primitive activity and the agent and/or operator. A syntactic parsing tool, SpaCy [7], was applied to identify part-of-speech tags.

3.2. Cutting Ontology

After natural language processing by SpaCy, we examine verbs specifically to interpret their intended semantics. Motivated by the Cutting Process Ontology, topological instructions were created based on the existing PSL Ontology. Since cutting is the verb in all of the written instructions, by translating them into PSL we hoped to find patterns that could apply universally in converting instructions into logical formulas. Therefore, the primary findings presented in this paper center on the physical action verb “cut”.
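The verb-centred steps described so far can be illustrated with a minimal sketch. It assumes the instruction has already been tokenized, tagged and lemmatized; the tags below are written by hand in the Universal POS style rather than produced by SpaCy. The names `Token`, `VerbContext`, `locate_verb` and `CUT_SYNONYMS` are hypothetical, while the synonyms themselves (“chip”, “split”, “snip”) are the ones named in Section 2, and the example sentence is a variation on “Cut a piece of sheet metal”.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    """One pre-tagged word: surface text, lemma, and part-of-speech tag."""
    text: str
    lemma: str
    pos: str


class VerbContext(NamedTuple):
    """The located verb, the words on either side, and a cutting flag."""
    left: Optional[str]
    verb: str
    right: Optional[str]
    is_cutting: bool


# Synonyms of "cut" named in Section 2 (assumed here to share one process family).
CUT_SYNONYMS = frozenset({"cut", "chip", "split", "snip"})


def locate_verb(tokens: List[Token]) -> Optional[VerbContext]:
    """Find the first verb and the two words it falls between."""
    for i, tok in enumerate(tokens):
        if tok.pos == "VERB":
            left = tokens[i - 1].text if i > 0 else None
            right = tokens[i + 1].text if i + 1 < len(tokens) else None
            return VerbContext(left, tok.lemma, right, tok.lemma in CUT_SYNONYMS)
    return None  # no verb found, so no activity can be identified


# Hand-tagged stand-in for parser output on "Snip a piece of sheet metal".
instruction = [
    Token("Snip", "snip", "VERB"),
    Token("a", "a", "DET"),
    Token("piece", "piece", "NOUN"),
    Token("of", "of", "ADP"),
    Token("sheet", "sheet", "NOUN"),
    Token("metal", "metal", "NOUN"),
]
context = locate_verb(instruction)
```

In this imperative example the verb has no left neighbour, so `context.left` is `None` and the agent stays implicit, while `context.is_cutting` marks the verb as a candidate for the cutting classes.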
Figure 1 illustrates the nine different classes of cutting activities and the corresponding changes to the shape of the object.

3.3. Cutting Instructions

In natural language, cutting is an action that does not require much description; it comes naturally when a human cuts something. This is evident in the data collected from WikiHow.com², in which physical cutting activities do not stand alone as individual instructions with multiple steps to follow. Instead, the action of cutting is mentioned in 193 different activities, ranging from “cutting onions” and “cutting bread” to paper cutting. Yet ambiguity exists when the instruction given is “Cut a piece of sheet metal”: questions such as “what shape should the end result be?” arise. Therefore, to better examine the backward loop of converting natural language into PSL, we created instructions corresponding to the nine classes of cutting from a topological perspective. That is, only relationships between edges, surfaces and holes were explained. Table 1 shows the finalized written instructions with respect to the nine classes of the Cutting Process Ontology [6].

² Instruction data was scraped from the website WikiHow.com, as it was considered to be a go-to place for real-life problems, ranging from math questions to furniture assembly and maintenance. In total, 922 rows of data were scraped from the featured article page on May 6, 2020.

Table 1. Cutting instructions from a topological perspective. The letters in the first column refer to Figure 1.

Illustration No.    Written Instructions
a    On one edge of the rectangle, from any point excluding the corners, cut so that three additional edges are created, while creating no extra surfaces or holes.
b    From one corner of the rectangle, make a cut so that two additional edges are created, while creating no extra surfaces or holes.
c    Make a cut starting from any point (excluding the corners) of the rectangle and ending at any point (excluding the corners) on the opposite edge. The end result should be two separate surfaces that make up four additional edges. No extra holes should be created.
d    Starting from any corner of the rectangle and ending at any point excluding the corners on the opposite edge, the cut should create two separate surfaces from the rectangle with three additional edges. No extra holes should be created.
e    From any corner of the rectangle, make a cut that ends on the opposing corner. Two separate surfaces and two additional edges should be created. No extra holes should be created.
f    From any point inside the rectangle, make a cut that is a closed curve. The cut should not reach any edges of the rectangle. Two additional edges should be created. An additional edge should be created. No extra surfaces should be present.
g    From any point inside the rectangle, make four cuts that create four new edges that do not intersect with any of the existing edges. A hole should be created as a result. No extra surfaces should be present.
h    Continue from h), destroy the existing hole. Make two cuts starting from the width of the outer rectangle, ending at two points (excluding corners) of the closest inner edge. Four additional edges should be created. No extra surfaces should be created.
i    Continue from h), destroy the existing hole. Make a cut (Cut E) starting from any corner (Corner A) of the outer rectangle, ending at a corner (Corner B) of the inner shape. This cut should not cut through any part of the rectangle that does not have any material. Make a second cut (Cut F) starting from any point on the outer edge in which Corner A intersects, ending at any point on the inner edge in which Corner B intersects. This cut should not cut through any part of the rectangle that does not have any material. Two additional edges should be created. No extra surfaces should be created.

4. Shape Cutting Ontology

Since PSL specifies a change of state, location or direction, the starting positions mentioned in the written instructions were neglected. Only the starting state and ending state were taken into account. Following the open-the-door example in Section 1, human instructions are usually given at a higher level of abstraction. Correspondingly, the classes of the Cutting Process Ontology were applied to the instructions shown in Table 1, and the results can be found in Table 2.

Table 2. Axiomatization of cutting processes. The letters in the first column refer to Figure 1 and the axioms in the second column use classes from the Cutting Process Ontology.

Cutting Process    Axiomatization
a    preserve_surface(a) ∧ preserve_hole(a) ∧ change_one_meet(a) ∧ create_three_edge(a)
b    preserve_surface(a) ∧ preserve_hole(a) ∧ change_one_meet(a) ∧ create_two_edge(a)
c    create_surface(a) ∧ preserve_hole(a) ∧ change_one_meet(a) ∧ create_four_edge(a)
d    create_surface(a) ∧ preserve_hole(a) ∧ change_one_meet(a) ∧ create_three_edge(a)
e    create_surface(a) ∧ preserve_hole(a) ∧ change_one_meet(a) ∧ create_two_edge(a)
f    preserve_surface(a) ∧ destroy_hole(a) ∧ preserve_meet(a) ∧ create_two_edge(a)
g    preserve_surface(a) ∧ destroy_hole(a) ∧ change_one_meet(a) ∧ create_four_edge(a)
h    preserve_surface(a) ∧ destroy_hole(a) ∧ change_two_meet(a) ∧ create_four_edge(a)
i    preserve_surface(a) ∧ destroy_hole(a) ∧ change_one_meet(a) ∧ create_two_edge(a)

4.1. Discussion

The instructions that we were able to recreate from the given illustrations include only the starting and ending states of the shape. As a result, only topological relationships were involved, describing whether an edge, a surface, and/or a hole was to be created, preserved or destroyed throughout the process. Yet geometric relationships could also be incorporated, yielding a more detailed and thus more robot-oriented instruction.
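Read as data, Table 2 gives each cutting class a set of four conjuncts, so an instruction whose stated effects are known can be matched against the classes by set inclusion. The following is a minimal sketch of that reading, not the authors' implementation: the dictionary transcribes Table 2, the underscored predicate spellings are an assumption about the original notation, and `matching_classes` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Iterable, List

# Table 2 transcribed as data: class letter -> the conjuncts of its axiomatization.
# Predicate spellings (with underscores) are assumed, not confirmed by the source.
CUTTING_CLASSES: Dict[str, FrozenSet[str]] = {
    "a": frozenset({"preserve_surface", "preserve_hole", "change_one_meet", "create_three_edge"}),
    "b": frozenset({"preserve_surface", "preserve_hole", "change_one_meet", "create_two_edge"}),
    "c": frozenset({"create_surface", "preserve_hole", "change_one_meet", "create_four_edge"}),
    "d": frozenset({"create_surface", "preserve_hole", "change_one_meet", "create_three_edge"}),
    "e": frozenset({"create_surface", "preserve_hole", "change_one_meet", "create_two_edge"}),
    "f": frozenset({"preserve_surface", "destroy_hole", "preserve_meet", "create_two_edge"}),
    "g": frozenset({"preserve_surface", "destroy_hole", "change_one_meet", "create_four_edge"}),
    "h": frozenset({"preserve_surface", "destroy_hole", "change_two_meet", "create_four_edge"}),
    "i": frozenset({"preserve_surface", "destroy_hole", "change_one_meet", "create_two_edge"}),
}


def matching_classes(observed: Iterable[str]) -> List[str]:
    """Return the letters of all classes whose conjuncts include every observed predicate."""
    wanted = frozenset(observed)
    return sorted(
        letter
        for letter, conjuncts in CUTTING_CLASSES.items()
        if wanted <= conjuncts  # subset test: nothing observed contradicts the class
    )


# A fully specified instruction identifies exactly one class.
full = matching_classes(
    {"create_surface", "preserve_hole", "change_one_meet", "create_two_edge"}
)

# An under-specified instruction leaves several candidates.
partial = matching_classes({"create_surface"})
```

An under-specified instruction, such as one that only states that a new surface is created, leaves several candidate classes, which mirrors the ambiguity of “Cut a piece of sheet metal”. A lookup of this kind still captures only the topological start and end states; geometric detail is not represented.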
For instance, a geometric way of describing illustration (a) would be:

On one width of the rectangle, from any point excluding the corners, make a cut towards the centre of the rectangle. The cut should not be parallel to the height of the rectangle. Stop before the cut reaches the opposing width. From the end point of the cut, make another cut that ends on any point of the original width excluding corners.

This would be a detailed version of how to make a cut, instead of only stressing the starting and ending conditions. Geometric concepts are introduced extensively here: “towards the centre”, “parallel to the height”, and so on. If this were to be the simplest version of cutting instructions, then the cutting ontology would need to be extended to include geometric definitions.

5. Conclusion

This paper presents our initial efforts towards mapping natural language instructions to process descriptions using the PSL Ontology. In particular, cutting processes are used as a demonstration of our approach. Meanwhile, as we explored mapping procedures, we found that going in the other direction (from PSL process descriptions to natural language) can help us understand how instructions should be written and interpreted. For example, in a day-to-day work routine, human beings communicate instructions in a way that specifies end results yet inevitably neglects how to make a cut. “How is cutting sheet metal different from cutting a tree trunk?” is one possible question that can arise in a Physical Turing Test.

Instructions, as a form of written natural language, tend to be more straightforward and less metaphorical. In future studies, conversational natural language, which incorporates more rhetorical devices, can be examined. For cutting instructions specifically, the ontology could be extended so that three-dimensional and intangible objects are included, accommodating a wider range of cutting instructions.

References

[1] Ortiz, Jr, C. L. (2016). Why We Need a Physically Embodied Turing Test and What It Might Look Like. AI Magazine, 37(1), 55-62.
[2] Gruninger, M. (2003). Ontology of the Process Specification Language, pp. 599-618, Handbook of Ontologies and Information Systems, S. Staab (ed.). Springer-Verlag.
[3] Kollar, T., Berry, D., Stuart, L., Owczarzak, K., Chung, T., Mathias, L., Kayser, M., Snow, B., Matsoukas, S. (2018). The Alexa Meaning Representation Language. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers).
[4] Levin, B. (1993). English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press.
[5] Brown, S. W., Bonn, J., Gung, J., Zaenen, A., Pustejovsky, J., Palmer, M. (2019). VerbNet Representations: Subevent Semantics for Transfer Verbs. Proceedings of the First International Workshop on Designing Meaning Representations.
[6] Gruninger, M. and Delaval, A. (2009). A First-Order Cutting Process Ontology for Sheet Metal Parts. Fifth Conference on Formal Ontology Meets Industry, Vicenza, Italy.
[7] Choi, J., Tetreault, J., Stent, J. (2015). It Depends: Dependency Parser Comparison Using a Web-based Extraction Tool. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 387-396, Beijing, China, July 26-31, 2015.