=Paper=
{{Paper
|id=Vol-2347/paper1
|storemode=property
|title=Contextualization through Simulation
|pdfUrl=https://ceur-ws.org/Vol-2347/paper1.pdf
|volume=Vol-2347
|authors=John A. Bateman,Mihai Pomarlan,Gayane Kazhoyan
|dblpUrl=https://dblp.org/rec/conf/c3gi/BatemanPK18
}}
==Contextualization through Simulation==
Contextualization through Simulation

John A. Bateman, Mihai Pomarlan and Gayane Kazhoyan

Bremen University, Bremen, Germany

Abstract. It is well known that the semantics of human utterances only partially covers the meaning the speaker wishes to convey to the listener, because the listener can be expected to fill in gaps, specialize descriptions, and resolve ambiguities by using context clues. Context is often broadly construed to include previous utterances, but may also contain information about the present situation or history of the communicators, such as what task apart from communication they are performing now or are about to engage in. This underspecificity of human language is a problem for interactions between humans and service robots because, unlike a human listener, a robot presently cannot be expected to employ contextual clues ably to fill in semantic gaps. Further, because the nature of the context clues is extremely varied, the reasoning mechanisms able to handle this challenge can be expected to be very heterogeneous as well. In this paper, we present a framework that tackles aspects of contextual inference related to the interpretation of spatial relations by combining ontological engineering principles with situated embodied simulations of robotic agents. These simulations serve as a new reasoning tool for finding constraints on the interpretation of human commands.

Keywords. embodiment, simulation, linguistic semantics, ontological analysis, formal ontology

===1. Introduction: background and motivations===

The relation between descriptions, the bearers of meaning exchanged in communication, and the contexts which those descriptions are taken to pick out, is a complex and flexible one. It goes further than just adding contextual information on top of what the descriptions contain (the classic semantics–pragmatics divide [20]). Descriptions suggest or constrain contexts, which makes it challenging to relate levels of description, e.g. linguistic utterances, with actual contexts of use, e.g. a physical environment in which language users are embedded.

Our proposed framework approaches this issue through a combination of ontological engineering principles and situated embodied simulations involving robotic agents. Simulation allows implementation and experimentation at a level of abstraction not usually accessible to linguistic analysis alone. Moreover, it increasingly appears that some form of simulation may play a crucial role in language comprehension and production for humans as well.

We will focus on contextualization issues for the semantics of natural language construals of space, spatial entities, and events and activities unfolding in space and time [7]. Contextualization is seen as embodied simulation in which actions and movements can be performed corresponding to the linguistic descriptions offered. Actions and movements will vary depending on a range of parameters, thereby providing a formalized connection between linguistic descriptions and contextualized representation, as well as considerable flexibility concerning just what actions and movements ensue.

Our problem setting is an application scenario drawn from service robotics, which showcases the issues of contextualization. A robot needs to know in great detail what it is tasked to perform, or it will not do anything.
Natural language instructions, on the other hand, are highly underspecified, requiring inference on the robot's part to fill in the gaps and arrive at an actionable description of behavior. We demonstrate how access to simulation, via appropriately managed intermediate levels of qualitative description, makes relevant information derived from simulation available to the system as a whole.

===2. Problem Setting: Everyday activities and their descriptions===

Consider this scenario: one or several intelligent robots are performing everyday household activities with guidance or instruction from humans using natural language. Human utterances need to be translated somehow into robotic actions and movements. However, natural language describing everyday activities is often highly schematic, multiply ambiguous, and underspecified. At the same time, a robot needs very precise descriptions of the actions it is tasked to perform. The same difficulty must underlie language understanding in humans as well; there, however, the necessary inference mechanisms are opaque. The robotic scenario gives us access to all aspects of cognition and its interaction with the world, including motor control, perception, and sensory feedback.

A cognitive ability of our robotic system that we believe important is simulation: the ability to try out future actions and assess potential outcomes without performing those actions. This is likely an ability of human language comprehenders as well, and there is substantial research suggesting that language use may involve partial simulations [2,17,28,13,1], even if the precise mechanisms are still unclear.

Consider then this instruction: "Put the plates on the table." Apart from grounding [29] discourse elements into context, interpretation also needs to select a location to move the plates to. The sentence to interpret merely constrains this location to be in a certain relation (indicated by 'on') to a relatum (the 'table'). The manner in which 'putting' is to be performed, that is, the exact motion and resulting state of the moved objects, also needs to be selected.

We use a 'linguistically-motivated' ontology that provides an ontology-like level of semantic description for natural language, the Generalized Upper Model (GUM) [7], in which categories and relationships are motivated by grammaticalization patterns within natural language [4]. Such a linguistic semantics alone, however, is insufficient for the contextualization task, even if it does constrain what would be valid states of the world prior to, during, and following the action. These constraints will parameterize the range of simulations employed for contextualization. What a two-level semantics, particularly the GUM treatment of space, movement and action [7], buys us here is a level of semantic representation that is 'rigid' in its characterization of discourse entities but still supportive of flexible contextualization, in which quite diverse properties, aspects, and 'conceptualizations' of states of affairs can occur.

The shallow linguistic semantics of our example sentence has several components derived compositionally during analysis. The utterance as a whole evokes a generalized configuration of 'putting' or 'placing'.
In standard description logic notation:

 (1) AffectingAction ≡ Configuration ⊓ ∃actee.SimpleThing ⊓ ∃placement.GeneralizedLocation

The kind of entity that GUM allows to fill the placement role is a GeneralizedLocation:

 (2) GeneralizedLocation ≡ Circumstance ⊓ ∃hasSpatialModality.SpatialModality ⊓ ∃relatum.SimpleThing

GUM includes a subontology of spatial relations, defined as subtypes of SpatialModality and often characterized in terms of functional requirements [11]. In the above example, use of 'on' invokes a functional support SpatialModality. Definitions of 'on' in terms of geometric properties will often not be satisfied by actual uses of the preposition (e.g., 'the painting is on the wall'), and so the functional characterization offers a more generally valid description [7,3]. Here then is the semantics of the example utterance, expressed in standard SPL notation [16]:

 (3) (e / AffectingAction
       :actee (p / plate)
       :placement (l / GeneralizedLocation
                     :hasSpatialModality Support
                     :relatum (t / table)))
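To make the structure of such semspecs concrete, the following is a minimal sketch of (3) as nested data, together with a check of the role constraints from axioms (1) and (2). The dictionary encoding and the helper roles_complete are our own Python illustration, not part of the GUM or SPL tooling:

<pre>
# Hypothetical Python rendering of semspec (3); the actual system
# manipulates SPL expressions, not Python dictionaries.
semspec = {
    "type": "AffectingAction",
    ":actee": {"type": "plate"},
    ":placement": {
        "type": "GeneralizedLocation",
        ":hasSpatialModality": "Support",
        ":relatum": {"type": "table"},
    },
}

# Role constraints read off axioms (1) and (2).
REQUIRED_ROLES = {
    "AffectingAction": [":actee", ":placement"],
    "GeneralizedLocation": [":hasSpatialModality", ":relatum"],
}

def roles_complete(node):
    """Recursively check that every configuration fills its required roles."""
    if not isinstance(node, dict):
        return True
    if any(role not in node for role in REQUIRED_ROLES.get(node["type"], [])):
        return False
    return all(roles_complete(value) for value in node.values())

# True: the semspec is well-formed, yet it still says nothing about poses.
print(roles_complete(semspec))
</pre>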
Characterization as functional support covers most cases where 'on' invokes a spatial location, but it does not specify an actual pose. However, contextualization may require the receiver of the utterance to form an idea of which poses are appropriate and which are not. The same issue appears for all spatial relationships between objects invoked by language ("in", "over", or "near") when these are to be produced in the real world.

To decide whether a particular pose is appropriate for instantiating a functional relation, the intended behavior for the situation must be considered. Such behavioral characterizations lend themselves well to simulation. Therefore, going from qualitative linguistic descriptions to actual coordinates in space can be achieved by having the linguistic part parameterize a process of sampling and validation of candidate poses.

However, the nature of the behavior described is itself also underspecified. One way to interpret the example command is that the plates should be placed such that the bottom surface of each plate contacts the top surface of the table directly. Stacking the plates on the table is another plausible interpretation. One then has to consider the intended purpose of the described action: why are the plates brought to the table? If people are about to eat from the plates, then leaving the plates stacked is inappropriate. If the plates are simply brought to a place for easy access so they can be washed, stacking may in fact be desired.

Using ad hoc, hard-coded 'if-then' rules for such decisions (e.g., 'if setting the table, do not stack') will not scale. Even a semi-structured environment like the home has too much variability, and such rules contribute nothing to the robot's understanding of the purpose of its activities. What is needed, then, is a capability to answer questions like "would it be appropriate to use stacking now? if not, why not?" Without answers to such questions, a robot will be unable to achieve flexible, human-level contextualizations of even the simplest instructions. Simulation is a powerful means of acquiring such knowledge.

Contextualization problems are also highlighted in the following examples:

 (4) a. place the cups/chairs near to each other
     b. place the plates near the sink, I'll wash them later
     c. don't place the plates too near the table edge (because we will eat from them)
     d. when setting a table for eating, place the fork to the left of a plate and a knife to the right
     e. put the plates and cups in the cupboard

In this case the difficulty stems from 'what is near?'. A nearness criterion will vary with the objects considered, or rather with the activities the objects will participate in.

Finally, the performance of the action is also an issue of contextualization. Some items form stable stacks (like plates) and others do not (like cups). When filling up a shelf, it is better to push the first items placed there to the back, so as not to block access for further items. These constraints are often not linguistically expressed but rather learned, in the case of humans, through embodied experience. For a robot, simulation offers a way to account for them during behavior generation.

Another action that can be performed in many ways is simple picking up. Consider:

 (5) a. pick up the spoon
     b. pick up the spoon from the table and use it to scoop the soup from the plate
     c. you can use the spoon from the table to eat the soup (picking up is implied here)
     d. pick up the spoon then crack the egg (a different grasp may be needed here than in the soup case)

There are many ways to grasp an object; Figure 1 shows some examples from the robotics domain. Deciding which is appropriate often depends on the context in which the instruction is given, in particular with respect to the task for which the picked-up object will be used.

Figure 1. Some different ways to grasp a spoon. Deciding which one is appropriate will depend on context, such as what the spoon is to be used for.

Even apparently 'simple' actions may therefore need to be contextualized very differently, and this flexibility is precisely what is targeted by our approach of combining linguistic specification with embodied simulation, as we shall now see.

===3. Simulation as a method to test interpretations===

In the scenarios we analyze here, the robot is given a task which involves establishing, using, or destroying a spatial relation between objects, and it is unclear from the linguistic semantics how this relation is, or is to be, instantiated in the world. The robot is, however, not acting alone: the task it performs happens in a larger context of other agents doing tasks dependent on what the robot does.

In this section, we describe how we organize contextual information into an object we call the "execution context", and we show how it is used in parameterizing and interpreting a simulation. We first present the robot executive program and the process by which the semantics of a linguistic command is converted into an underspecified executable program.

====3.1. The robot executive====

We use the Cognitive Robot Abstract Machine (CRAM) [8] to generate and control the behavior of the simulated robot. CRAM is a set of tools for developing reactive programs, and includes a library of basic robot actions, logging of task executions [33], and a light-weight simulation environment called "projection" [21]. While projection abstracts away from some aspects of execution, such as arm trajectories between waypoints, it nonetheless captures interesting constraints about which configurations are reachable, which locations would result in object collisions, what the occlusion and visibility constraints are, etc.

An important notion in CRAM is the designator, a feature structure that symbolically describes an object, location, or action. A symbolic description is one that does not necessarily commit to a particular object identifier or set of coordinates; for example, a designator may refer to

 (6) (a location (near (an object (type cup))))

and make no mention of an actual position, or any region boundary, for points satisfying this description. However, to be actionable, a designator needs to be "grounded", that is, such specific selection information needs to be added. This is achieved through various techniques implemented in CRAM. For locations, a set of candidates is randomly sampled from "costmaps" [34,27] and then tested against a battery of validators, such as absence of collision and whatever conditions may be added via the designator description, e.g. visibility or reachability. The costmaps are probability density functions initialized based on the designator content; for example, when looking for a location near an object, the costmap would be a Gaussian with its maximum at the object center. The set of active samplers and validators can be changed at runtime. Programs in CRAM use designators to pass arguments to each other, and these designators are grounded only when needed.
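The following is a minimal sketch of this sampling-and-validation loop, assuming 2D positions and hand-rolled rejection sampling. It is an illustrative Python rendering only; CRAM's actual implementation is Lisp-based, and all names here are hypothetical:

<pre>
import math
import random

def gaussian_costmap(center, sigma):
    """Density peaked at `center`, e.g. for (a location (near <object>))."""
    cx, cy = center
    def density(x, y):
        return math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
    return density

def sample_candidates(density, bounds, n):
    """Rejection-sample n candidate positions from the costmap density."""
    (xmin, xmax), (ymin, ymax) = bounds
    samples = []
    while len(samples) < n:
        x, y = random.uniform(xmin, xmax), random.uniform(ymin, ymax)
        if random.random() < density(x, y):
            samples.append((x, y))
    return samples

def resolve_location(density, bounds, validators, n=200):
    """Ground a location designator: first sample accepted by all validators."""
    for pose in sample_candidates(density, bounds, n):
        if all(valid(pose) for valid in validators):
            return pose
    return None  # the designator could not be grounded

# Usage: a location near a cup at (1.0, 2.0), avoiding a small no-go zone.
near_cup = gaussian_costmap(center=(1.0, 2.0), sigma=0.3)
no_collision = lambda p: not (0.8 <= p[0] <= 1.2 and 1.8 <= p[1] <= 2.2)
print(resolve_location(near_cup, ((0.0, 3.0), (0.0, 3.0)), [no_collision]))
</pre>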
====3.2. Converting an SPL to a CRAM program====

The actions in the CRAM library do not align with linguistically-motivated action categories from GUM; purely linguistic categories are too broad to be directly employed on the execution side. In this subsection we set out a translation mechanism by which underspecified linguistic semantic representations of the kind introduced above can be converted to CRAM programs for driving simulations. The characterization given here draws on the overall mechanism introduced in [25].

The conversion process applies transformation rules to semantic specifications, or semspecs, which are SPL logical forms, gradually replacing GUM configurations with function objects compositionally constructed from building blocks taken from the CRAM action library. The transformation rules have an antecedent (a pattern to look for), a consequent (a pattern to replace the antecedent with), and a scope restriction on where to look for the antecedent.

We will use as a starting semspec the SPL above for the plate placement instruction (3). The first step is to obtain a CRAM designator for the actee via the following rule:

 (7) ((_ / ?x)
      (an object (type ?x))
      (:actee))

where (_ / ?x) is the antecedent, ?x is a variable, and the underscore ('_') unifies with anything. The consequent creates a CRAM designator for "an object of type ?x". Because of the scope restriction, the antecedent will only be matched against the value of an :actee property in the main clause of the semspec. Designators for placement locations are obtained via rules such as:

 (8) ((?l / GeneralizedLocation :hasSpatialModality Support :relatum (_ / ?x))
      (?l / (a location (on (an object (type ?x)))))
      (:placement))

For an affecting action with actee and placement, our rules infer a transport action:

 (9) ((_ / AffectingAction :actee ?a :placement ?l)
      (Fn [(lambda () (perform (an action (type transporting)
                                          (object ?a)
                                          (target ?l))))])
      nil)

where the nil scope restriction means the antecedent may match the top level of the semspec. The consequent is a function object containing an executable CRAM action. Note that this places no constraint on the designator grounding processes. At this point, both "individual plates placed on table" and "stacked plates on table" are considered plausible interpretations for the "on" relation.
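To illustrate the rule format, here is a simplified sketch of antecedent matching and consequent instantiation, with SPL terms rendered as nested Python tuples. This is our own toy rendering of the mechanism described in [25], not its actual code; scope restrictions are omitted:

<pre>
def unify(pattern, term, bindings):
    """Match a pattern against a term; '_' matches anything, '?v' binds."""
    if pattern == "_":
        return bindings
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings and bindings[pattern] != term:
            return None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple) \
            and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            bindings = unify(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None

def substitute(consequent, bindings):
    """Instantiate a consequent pattern with the matched bindings."""
    if isinstance(consequent, tuple):
        return tuple(substitute(c, bindings) for c in consequent)
    return bindings.get(consequent, consequent)

# Rule (7): the filler of an :actee role becomes an object designator.
antecedent = ("_", "/", "?x")
consequent = ("an", "object", ("type", "?x"))

actee_value = ("p", "/", "plate")             # from semspec (3)
bindings = unify(antecedent, actee_value, {})
print(substitute(consequent, bindings))       # ('an', 'object', ('type', 'plate'))
</pre>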
====3.3. The table-setting scenarios====

We now illustrate how the ambiguity noted above about what "on" should mean may be resolved by simulation. The simulation scenario is described by an "execution context", an object containing information about the main task, the agents and objects that participate in it, and the context it is performed in, in particular the task(s) it is supposed to enable. The enabled task is to be run after the main task, and may be performed by a different set of agents. In one of our running examples, the main task is a table-setting task to be performed by one robot, and the enabled task is eating, to be performed by two other agents (humans, though we use simulated robots to model them as well). Table 1 shows the contents of this execution context.

Table 1. Execution context for testing task performance: stacked placement

Scene specification:
 (an object (type kitchen-model))
 (an object (type plate) (at (a location (on (an object (name sink-table))))))
 (an object (type plate) (at (a location (on (an object (name sink-table))))))

Plan to run:
 (enable-location-generation stacking-on)
 (perform (an action (type transport)
                     (actee (an object (type plate) (quantity :plural)))
                     (destination (a location (on (an object (type table)))))))
 (par (perform (an action (type eating)
                          (actee (an object (type plate)
                                            (at (a location (on (an object (type table)))))))
                          (agent (an object (type human-model) (name h-1)))))
      (perform (an action (type eating)
                          (actee (an object (recognizable-as plate)
                                            (at (a location (on (an object (type table)))))))
                          (agent (an object (type human-model) (name h-2))))))

To check on simulation timeline:
 (success? tl)

An execution context contains information about how to set up a simulation scenario, that is, which objects should exist and where they should be placed. It also contains a list of tasks, expressed as action designators. The tasks may be performed by different agents and may vary in terms of which location designator resolution procedures they use. In particular, the execution context of Table 1 uses the stacked-on designator solution candidate generator. Finally, the execution context contains a query to be performed on the simulation timeline, a record of the events in the simulation run. In this case, we want to test that all tasks in the execution context finished successfully.

For the present, we use simulated robots to represent humans because we do not yet have a human model available for the CRAM projection environment; the primary concern in CRAM's development so far has been to reason about and produce robot actions. However, we think that giving the robot some representation of the actions humans expect to perform in the shared environment will help the robot better understand its own tasks and how it is supposed to cooperate with humans, so we will look into adding human models in the future.
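As a rough sketch of how such an execution context might be represented, consider the following hypothetical Python rendering; the actual system expresses scene, plan, and timeline query as CRAM designators and Lisp forms, so all names here are illustrative:

<pre>
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ExecutionContext:
    scene: List[str]                  # object designators to spawn
    location_samplers: List[str]      # e.g. ["stacking-on"]
    plan: List[str]                   # action designators, possibly parallel
    timeline_check: Callable[[List[dict]], bool]  # query over logged events

def all_tasks_succeeded(timeline):
    """The (success? tl) query: no task on the timeline ended in failure."""
    return all(event.get("outcome") == "success" for event in timeline)

eat_stacked = ExecutionContext(
    scene=[
        "(an object (type kitchen-model))",
        "(an object (type plate) (at (a location (on (an object (name sink-table))))))",
        "(an object (type plate) (at (a location (on (an object (name sink-table))))))",
    ],
    location_samplers=["stacking-on"],
    plan=[
        "(perform (an action (type transport) ...))",   # elided; see Table 1
        "(par (perform (an action (type eating) ...)) ...)",
    ],
    timeline_check=all_tasks_succeeded,
)
</pre>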
===4. Evaluation===

We now show results from simulation runs used to interpret spatial relations. The execution context has 'table setting', done by one robot, as the main task, and we allow either 'stacked' or 'individual' samplers for the 'on' relationship. The enabled task in the context is either 'eating', to be done by two robots, or 'washing up', to be done by one robot, resulting in the following execution contexts: set the table for eating with plates stacked ("EatStk" in Table 2), set the table for eating with plates individually placed ("EatInd"), or set the table for washing up ("WashStk" and "WashInd").

{| class="wikitable"
|+ Table 2. Total simulation count, number of simulations ending in failure, and a breakdown of failure types for the various task contexts.
! Task context !! # Sims !! # Failed sims !! Collision !! Obj too far !! Obj not seen
|-
| EatStk || 20 || 18 || 1 || 4 || 13
|-
| EatInd || 20 || 6 || 0 || 6 || 0
|-
| WashStk || 20 || 0 || 0 || 0 || 0
|-
| WashInd || 20 || 0 || 0 || 0 || 0
|}

We simulate each task context 20 times in projection. Initial object placement is randomized, and there is also a random element to the robot's generated placements and motions. Failures were logged for each scenario (cf. Table 2). As this record of failures shows, the 'stacked' vs. 'individually placed' decision has no impact on the washing-up task, which proceeds flawlessly. For eating, however, if the plates are stacked, 18 of the simulations end in failure: either one of the robots reaching for the plate has its line of sight occluded by the other, or end effectors collide, or one of the robots cannot find a convenient place around the table from which to reach the plate. Individually placing plates on the table sometimes results in errors as well, but the error rate is lower (only 6 out of 20 simulations fail) and all these errors are alike: the "eating" robot chose to navigate to a place around the table that is too far from the plate.

Figure 2. Robot configurations at projection end. Above: example end configurations for the eating-from-stacked-plates scenarios; below: eating from individual plates.

The simulation therefore reveals some plausible geometric reasons why it is a bad idea to leave plates stacked if several people expect to eat from them. This leads to filtering out some interpretations for the semantics of 'placing plates on', a filtering that depends on the task context and the geometric and behavioral constraints of the agents and the world they operate in. The way the robot interprets the 'near' relation can also be improved by the simulated experience, since it appears that the sampler for locations near the table is liable to produce locations that are too distant to actually reach items on the table. Such points in the costmap should have their probability density reduced, so that they and their neighbors are less likely to be chosen in the future.

In a similar fashion, other simulation scenarios can be created and evaluated. In the next example we look at a scenario to see how the way an action is performed affects a quality metric. Consider the instruction "take the cups to the table using the tray", which in GUM terms is an AffectingAction where the actor causes itself to move so as to transport some items to a destination GeneralizedLocation [7]. An instrument to assist the transport is specified, which results in adapting the transport program we generate to first place the transported items into (or, in this case, onto) an auxiliary container. However, because the tray has a limited capacity, several trips between the source and destination tables will be needed.

One could look at some metric of quality for the task completion. Optimality may be hard to reach, but one can keep track of the best simulated results as a good-enough substitute. Projection does not measure time, so we use distance traveled by the robot base as a proxy. In our particular case, the robot has to bring 4 cups to 4 locations on a destination table (see Figure 3), but only 3 cups may be placed on the tray. It would seem a good strategy to always use the tray at full capacity, but in this case the robot then makes an extra trip around the table, because the cup destination locations are spread out.

Figure 3. Transporting cups scenario: start and end configurations (top row, left and right respectively). Trips taken by the robot (bottom row): transporting 2 then 2 cups with the tray (left); transporting 3 then 1 cup with the tray (right).

It is often the case that which approach to a task is better depends subtly on geometric aspects of the environment the robot operates in, aspects that linguistic instructions do not attempt to capture. Simulation then comes in as a way to explore possible behaviors, filter out inappropriate ones, and seek good-enough solutions for the task at hand.
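The trade-off can be made concrete with a small back-of-the-envelope computation. The coordinates below are invented for illustration; with goal locations spread out along the destination table, two half-loaded trips can beat the "full tray first" strategy, mirroring what projection revealed:

<pre>
import math

def path_length(points):
    """Total distance along a sequence of 2D waypoints."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

source = (0.0, 0.0)                                      # source table
goals = [(4.0, 0.0), (4.0, 1.0), (4.0, 2.0), (4.0, 3.0)]  # 4 cup destinations

# Strategy A: full tray first (3 cups), then one more trip for the last cup.
three_one = path_length([source] + goals[:3] + [source, goals[3], source])

# Strategy B: two half-loaded trips (2 cups each).
two_two = path_length([source] + goals[:2] + [source] + goals[2:] + [source])

print(f"3+1 cups: {three_one:.2f} m, 2+2 cups: {two_two:.2f} m")
# With these invented coordinates, the 2+2 split travels less distance.
</pre>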
===5. Related work===

Knowledge modeling for robotics presents a heterogeneous mix of representation and reasoning methods, as seen in the KnowRob system [32,31]. Capturing more formally the properties of hybrid reasoning systems for robotics has also been explored [6].

There has also been much research on interpreting language as programs. Some such translations are fairly straightforward when the natural language itself resembles pseudocode, including some control structures [18]. The TellMeDave system [19] learns associations between instructions and human user activity logs. Markov Logic Networks are used in the PRAC system [23] to formalize understanding as probabilistic inference: find the most likely program given the instruction as evidence. PRAC can also perform coreference resolution and fill in some missing information, such as a tool to use for an action.

None of the language understanding systems above takes context into account; they operate purely on the semantics of the linguistic instruction. Interpretation filtering based on checking the feasibility of a plan in a STRIPS domain has been researched [26], but more general contextualization approaches have not yet been presented.

Some sensitivity to the environment has been shown in a logic-based system [10], where an instruction, translated into a first-order logic statement, is verified for consistency with a first-order logic description of the environment. Logic-based approaches, however, may not scale well to everyday activities, since formalizations of these require extremely expressive and complex logics [12]. Instead, several levels of abstraction should be employed in a heterogeneous manner [5], allowing complexity to be off-loaded to modules better suited to particular tasks, as we do here with simulation. Support for heterogeneous knowledge modeling has already received attention [22,24].

Linguistic semantics is also embracing simulation as a necessary component of understanding, as seen in Embodied Construction Grammar [9], but the simulations considered there are rarely detailed enough for the execution of actual tasks on a robot.

===6. Conclusions===

We have described a simulation-based approach to contextualization for robotics and demonstrated how it can be used to interpret spatial relations appropriately to a task context and to improve robot behavior. In the future we will be looking at using the ontological modeling to provide more flexible translations from semspecs to programs.
Notions such as image schemas [15,30], especially if ontologically formalized [14], may help here. We are also looking at retaining simulation results as a body of experiential knowledge so that rules for behavior can be learned, helping robot reactions to become as reflexive as human decisions are.

===Acknowledgments===

The research reported in this paper was supported by the German Research Foundation (DFG), as part of the Collaborative Research Center (Sonderforschungsbereich) 1320 "EASE - Everyday Activity Science and Engineering", University of Bremen (http://www.ease-crc.org/). The research was conducted primarily within subproject P01: 'Embodied Semantics for the Language of Action and Change'.

===References===

[1] Arbib, M.A., Gasser, B., Barrès, V.: Language is handy but is it embodied? Neuropsychologia 55, 57–70 (2014)

[2] Barsalou, L.W.: Situated simulation in the human conceptual system. Language and Cognitive Processes 18(5/6), 543–562 (2003)

[3] Bateman, J.A.: Language and Space: a two-level semantic approach based on principles of ontological engineering. International Journal of Speech Technology 13(1), 29–48 (2010). https://doi.org/10.1007/s10772-010-9069-x

[4] Bateman, J.A.: Ontologies of Language and Language Processing. In: Poli, R., Healy, M., Kameas, A. (eds.) Theory and Applications of Ontology: Computer Applications, pp. 393–410. Springer, Dordrecht, Heidelberg, London and New York (2010)

[5] Bateman, J.A.: Space, Language and Ontology: A Response to Davis. Spatial Cognition & Computation 13(4), 295–314 (2013). https://doi.org/10.1080/13875868.2013.808491

[6] Bateman, J.A., Beetz, M., Beßler, D., Bozcuoglu, A.K., Pomarlan, M.: Heterogeneous ontologies and hybrid reasoning for service robotics: The EASE framework. In: Third Iberian Robotics Conference. ROBOT '17, Sevilla, Spain (2017)

[7] Bateman, J.A., Hois, J., Ross, R.J., Tenbrink, T.: A linguistic ontology of space for natural language processing. Artificial Intelligence 174(14), 1027–1071 (September 2010). http://dx.doi.org/10.1016/j.artint.2010.05.008

[8] Beetz, M., Jain, D., Mösenlechner, L., Tenorth, M., Kunze, L., Blodow, N., Pangercic, D.: Cognition-enabled autonomous robot control for the realization of home chore task intelligence. Proceedings of the IEEE 100(8), 2454–2471 (2012)

[9] Bergen, B.K., Chang, N.: Embodied Construction Grammar in simulation-based language understanding. In: Östman, J.O., Fried, M. (eds.) Construction Grammar(s): Cognitive and Cross-Language Dimensions, pp. 147–190. John Benjamins, Amsterdam (2005)

[10] Bos, J., Oka, T.: A spoken language interface with a mobile robot. Artificial Life and Robotics 11(1), 42–47 (2007)

[11] Coventry, K.R., Garrod, S.C.: Saying, seeing and acting. The psychological semantics of spatial prepositions. Essays in Cognitive Psychology series, Psychology Press, Hove, UK (2004)

[12] Davis, E.: Qualitative spatial reasoning in interpreting text and narrative. Spatial Cognition & Computation 13(4), 264–294 (2013). https://doi.org/10.1080/13875868.2013.824976

[13] Gennari, S.P.: Representing motion in language comprehension: Lessons from neuroimaging. Language and Linguistics Compass 6(2), 67–84 (2012)

[14] Hedblom, M.M., Kutz, O., Neuhaus, F.: Choosing the right path: Image schema theory as a foundation for concept invention. Journal of Artificial General Intelligence 6(1), 21–54 (2015)

[15] Johnson, M.: The body in the mind. University of Chicago Press, Chicago, IL (1987)
[16] Kasper, R.T.: A flexible interface for linking applications to PENMAN's sentence generator. In: Hirschman, L. (ed.) Proceedings of the DARPA Workshop on Speech and Natural Language. Morgan Kaufmann, San Mateo, CA (1989). http://www.cs.mu.oz.au/acl/H/H89/H89-1022.pdf, available from ACL Anthology as H89-1022

[17] Kaup, B., Lüdtke, J., Maienborn, C.: 'The drawer is still closed': Simulating past and future actions when processing sentences that describe a state. Brain & Language 112, 159–166 (2010)

[18] Matuszek, C., Herbst, E., Zettlemoyer, L., Fox, D.: Learning to Parse Natural Language Commands to a Robot Control System, pp. 403–415. Springer International Publishing, Heidelberg (2013)

[19] Misra, D.K., Sung, J., Lee, K., Saxena, A.: Tell Me Dave: Context-sensitive grounding of natural language to manipulation instructions. In: Proceedings of Robotics: Science and Systems. Berkeley, USA (July 2014)

[20] Morris, C.W.: Foundations of the Theory of Signs. University of Chicago Press, Chicago (1938)

[21] Mösenlechner, L., Beetz, M.: Fast temporal projection using accurate physics-based geometric reasoning. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 1821–1827. Karlsruhe, Germany (May 6–10, 2013)

[22] Mossakowski, T., Codescu, M., Neuhaus, F., Kutz, O.: The Distributed Ontology, Modeling and Specification Language – DOL. In: Koslow, A., Buchsbaum, A. (eds.) The Road to Universal Logic, vol. I, pp. 489–520. Birkhäuser (2015). http://www.springer.com/gp/book/9783319101927

[23] Nyga, D., Picklum, M., Koralewski, S., Beetz, M.: Instruction Completion through Instance-based Learning and Semantic Analogical Reasoning. In: International Conference on Robotics and Automation (ICRA) (2017)

[24] OMG: The Distributed Ontology, Modeling, and Specification Language (DOL). Tech. rep., OMG (2015). https://github.com/tillmo/DOL/raw/master/Standard/dol.pdf

[25] Pomarlan, M., Bateman, J.A.: Robot program construction via grounded natural language semantics & simulation. In: Proceedings of the 17th Conference on Autonomous Agents and MultiAgent Systems. AAMAS '18 (2018)

[26] Pomarlan, M., Koralewski, S., Beetz, M.: From natural language instructions to structured robot plans. In: Kern-Isberner, G., Fürnkranz, J., Thimm, M. (eds.) KI 2017: Advances in Artificial Intelligence, pp. 344–351. Springer International Publishing, Cham (2017)

[27] Regier, T., Carlson, L.A.: Grounding spatial language in perception: An empirical and computational investigation. Journal of Experimental Psychology: General 130(2), 273–298 (2001)

[28] Richardson, D.C., Spivey, M.J., Barsalou, L.W., McRae, K.: Spatial representations activated during real-time comprehension of verbs. Cognitive Science 27(5), 767–780 (September 2003)

[29] Roy, D.: Semiotic schemas: A framework for grounding language in action and perception. Artificial Intelligence 167, 170–205 (2005)

[30] Talmy, L.: The fundamental system of spatial schemas in language. In: Hampe, B. (ed.) From perception to meaning: image schemas in cognitive linguistics, pp. 37–47. Mouton de Gruyter, Berlin (2006)

[31] Tenorth, M., Beetz, M.: KnowRob – A Knowledge Processing Infrastructure for Cognition-enabled Robots. Int. Journal of Robotics Research 32(5), 566–590 (April 2013)

[32] Tenorth, M., Jain, D., Beetz, M.: Knowledge Representation for Cognitive Robots. Künstliche Intelligenz 24(3), 233–240 (2010)
[33] Winkler, J., Tenorth, M., Bozcuoglu, A.K., Beetz, M.: CRAMm – memories for robots performing everyday manipulation activities. Advances in Cognitive Systems 3, 47–66 (2014)

[34] Zimmer, H.D., Speiser, H.R., Baus, J., Blocher, A., Stopp, E.: The use of locative expressions in dependence of the spatial relation between target and reference object in two-dimensional layouts. In: Freksa, C., Habel, C., Wender, K.F. (eds.) Spatial Cognition I – An interdisciplinary approach to representing and processing spatial knowledge, pp. 223–240. Springer (1998)