<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A balancing act: Ordering algorithm and image-schematic action descriptors for stacking objects by household robots</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kaviya Dhanabalachandran</string-name>
          <email>kaviya@uni-bremen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria M. Hedblom</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Beetz</string-name>
          <email>beetz@uni-bremen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Artificial Intelligence, University of Bremen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Jönköping Artificial Intelligence Laboratory, Jönköping University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The Eighth Joint Ontology Workshops</institution>
          ,
          <addr-line>JOWO'22</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Optimising object order in stacking problems remains a hard problem for cognitive robotics research. In this paper, we continue our work on using the spatiotemporal relationships called image schemas to represent affordance spaces founded on object properties. Based on object properties, we introduce a stacking-order algorithm and describe the action descriptors in an image-schematic event segmentation format, formalising a small subset using the Image Schema Logic ISL.</p>
      </abstract>
      <kwd-group>
        <kwd>object stacking</kwd>
        <kwd>cognitive robotics</kwd>
        <kwd>affordances</kwd>
        <kwd>algorithms</kwd>
        <kwd>action representation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Problem Space</title>
      <p>Despite the past decades' progress in developing increasingly sophisticated software and
hardware, research in artificial intelligence and cognitive robotics is still struggling to accurately
represent and execute tasks that are learned by human children in early infancy, a phenomenon
known as Moravec's paradox [1]. One such task is the pick-and-place task. Appearing
deceptively simple, the task includes visual mastery of gaze control, understanding of spatial depth and
positioning, as well as object recognition and object permanence. It requires object
manipulation abilities for moving the agent's own body (or body parts) and understanding the spatial and
force-dynamic relationships of grasping objects. Additionally, the agent needs to be able to identify
object properties and reason about how their affordances behave under particular conditions. In this
paper, we look at a particularly complex pick-and-place task by focusing on how to stack objects
along the vertical axis. In addition to the complexities of the pick-and-place task, stacking objects
requires a deeper understanding of the properties and affordances of the objects being stacked.
For instance, it is not possible to stack heavy objects on top of flexible objects, nor is it (under
normal circumstances) possible to stack a flat object onto a convex surface.</p>
      <p>The motivation for this research agenda is the common occurrence of stacking objects in
household activities. Kitchen utensils are stacked on top of one another in the cupboard and
objects are carried on trays in particular arrangements.</p>
      <p>To provide household robots with a more intelligent understanding of stacking, we
draw inspiration from cognitive science that demonstrates how human children core down object
properties and action segments into meaningful components. We then proceed to utilise
these components by formally representing the action descriptors involved in stacking and present
an ordering algorithm that takes the object properties into account to provide a stable stack.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Foundation and Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Theoretical Foundation</title>
        <p>
          Our work is based on the hypothesis that there exists a finite number of conceptual building
blocks that represent the salient features of object relations, including those providing balance
and stackability. These are often referred to as image schemas and encompass spatiotemporal
relationships between objects, agents and their environments<sup>1</sup> [2, 3]. The theory stems from
the notion of embodied cognition, a common theoretical framework in cognitive robotics
research (e.g. [4
          <xref ref-type="bibr" rid="ref5">, 5</xref>
          ]) which assumes that intelligent behaviour, in all its forms, stems from
the body's sensorimotor experiences in its environment. Within this framework the image
schemas represent the atomic yet salient features that distinguish one particular situation from
another. For instance, the image schema Contact is present in a situation in which two objects
are physically touching, and the image schema Link defines objects that may or may not be
touching but have a causal relationship to one another. The salient difference is that, in Contact,
moving one object will not necessarily affect the other object, but if they are Linked, moving
one object will automatically move the other object as well. Likewise, the salient feature of an
object like a cup is defined by its ability to Contain liquids, and objects with flat sturdy surfaces
like a tray are defined by their affordance to Support other objects. In most cases, an event like
stacking objects can be described as a combination of different vertical pick-and-place tasks
(Verticality + Source_Path_Goal) with the Support and Containment constraints of any
involved objects.
        </p>
        <p>Due to the conceptually-rich content of image schemas and their finite number, combinations
of them can be argued to describe all kinds of spatiotemporal situations and events [6, 7]. For
robotic research, this means that it is possible to formally represent the physical states of both
the initial state and the goal state of particular actions, but also to describe in detail the changes
over time that constitute the actions that lead to these changes. In previous work we have
investigated this for different scenarios (see [8, 9]) and in Section 4 we will demonstrate this for
stacking using ISL.</p>
        <p><sup>1</sup> The theory was originally developed in cognitive linguistics to explain the high number of spatial metaphors
in abstract language, but has become a common hypothesis in many research areas dealing with semantic relevance
in relation to spatial reasoning. Not all research fields would agree on defining them as spatiotemporal relationships,
but it is a useful delimitation in cognitive robotics.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Formal Framework</title>
        <p>
          The representation language ISL: We base our representation on the expressive
combination language the Image Schema Logic ISL [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Following a popular temporalisation
strategy (as in [10]), temporal structures are the primary model-theoretic objects, in turn based
on Linear Temporal Logic over the reals (LTL). At each moment of time, we allow for the
employment of secondary semantics to represent complex propositions. These atomic representations
are topological assertions in 3D Euclidean space based on the Region Connection Calculus
(RCC) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], relative movement dimensions using the Qualitative Trajectory Calculus (QTC) [12] and
relative object positions using Ligozat's Cardinal Directions (CD) [13]. Quantification is used to
separate different sortal objects, whilst otherwise the syntax of the language follows a standard
multi-modal logic paradigm. For more details on the logic we refer to [3, 14].
The robotics framework: While preliminary, the robotics framework we utilise is based on
KnowRob [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], a knowledge representation and reasoning system. KnowRob uses the Web Ontology
Language (OWL), based on description logics, to represent the knowledge. Prolog, a logic-based
programming language, is used to reason over the knowledge base and to assert newly
computed facts. We use the formal affordance model defined in [16] and represented in SOMA
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The formal model comprises concepts including Affordance, Disposition and Role, where
Dispositions are properties of objects that take on the role of a bearer and act as a description
of an Affordance. With the model defined in [16], for an event requiring a certain affordance to
execute a task, the task can be executed only if two suitably disposed
objects with a trigger and a bearer role are present in the environment. This model is used in the algorithm
for stacking.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Previous Work</title>
        <p>Image-schematic event representation for robotics: In [7], the authors demonstrate how
it is possible to divide the event of cracking eggs into bowls into image-schematic components.
In previous work, we looked at how robotic action descriptors could be cored down into their
image-schematic relationships. In [8], we defined image-schematic relations like Verticality,
Source_Path_Goal, Containment, Link, Contact and Support to describe the action of cutting.
The functional properties of objects and the scene description, defined using image schema concepts,
are used in [18] to construct a simulation of the scene and estimate the parameters necessary
for performing the action of pouring. In [19], the author uses qualitatively described
functional object relations in a spatial arrangement to enable the agent to understand the causal
structure embedded in the scene.
        </p>
        <p>Object stacking: Robotics research on object manipulation comes in many forms, including
work on stacking objects with a focus on the complexity of balancing objects on top of one
another. The major challenge is to learn the physics of how shapes of different levels of complexity
can be stacked on top of one another without falling. [20] approach the problem by using a
neural network to learn the geometric affordances. In [21], a reinforcement learning algorithm
was used to train a system to stack a series of complex shapes while simultaneously
ignoring irrelevant object features such as colour.</p>
        <p>In more domestic environments, in which household items are to be stacked for more practical
reasons, other challenges present themselves. In [22], the authors introduce a method for
stacking objects on shelves based on crowd-sourced data.</p>
        <p>Learning functional relationships and affordances: To perform a successful stacking, it
is important to reason about the objects' affordance properties. There are works (e.g. [20, 23, 24])
using machine learning to learn extracted features representing the physical properties
of objects in a scene, in order to reason about individual object behaviour and how objects
behave in pairwise interactions. The problem is that such machine learning models do
not enable an understanding of how the object features relate to functional properties. There
needs to be a semantic component, as in image schemas, to perform object stacking and also to be
able to explain failures, both when certain objects do not fit in the stack and when the action fails
as a robot executes the task of object stacking.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Stacking order algorithm</title>
      <p>Computing the stacking order is a search problem to find a stable configuration that can be
stacked from the given set of objects in a scene. The search space of this problem is n!, where n
is the number of objects. Only rigid objects are considered in this work and the stack height is
limited to 5 objects.</p>
      <p>
        Algorithm 1 presents the stacking order by computing the disposition properties of the objects.
By Turvey's [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] definition of dispositions, there need to be two objects whose disposition
properties complement each other for the affordances of the objects to be realised if the situation
demands. For the task of stacking to be performed, one of the objects assumes the role of a
bearer with the ability to support other objects, and the second object, with a trigger role, can be
placed on top of other objects. The roles of the objects can be interchanged, with the object
that is in need of support acting as a bearer and the supporting object acting as a trigger. The
important aspect is the presence of two suitably disposed objects that can enable stacking. The
rule onTopOf is used to verify whether the considered objects satisfy the condition to be stacked; it
does not mean that the object o₁ is placed on top of o₂. The rule is defined below:
∀o₁, o₂ ∶ onTopOf(o₂, o₁) ↔ ∃d₁, r₁ ∶ Object(o₁) ∧ Object(o₂) ∧ Role(r₁) ∧ Disposition(d₁) ∧
hasDisposition(o₁, d₁) ∧ hasTrigger(d₁, r₁) ∧ hasRole(o₂, r₁) ∧ canStack(o₁, o₂)
      </p>
      <p>It comprises the unary predicates Object, Role and Disposition, which are used to assert
that o₁ and o₂ are instances of Object, r₁ is of type Role and d₁ is a Disposition. The predicate
hasDisposition relates the object o₁ with a suitable disposition d₁; similarly, hasTrigger is a
relation between d₁ and the role type r₁ that can act as a trigger for d₁, and hasRole relates
the object o₂ to the role r₁ if the object can take on the role. The last predicate canStack is used
to perform geometric reasoning on whether the objects o₁ and o₂, playing the trigger and bearer roles
afforded by the disposition d₁, can be stacked.</p>
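      <p>As an illustration (this is our own sketch, not the authors' KnowRob/Prolog implementation; all class and function names are hypothetical), the rule can be read as a pairwise predicate over object, disposition and role records:</p>

```python
from dataclasses import dataclass, field

@dataclass
class Disposition:
    name: str          # e.g. "Deposition" or "Containment"
    trigger_role: str  # role the second object must play to trigger the affordance

@dataclass
class Obj:
    name: str
    dispositions: list = field(default_factory=list)
    roles: set = field(default_factory=set)

def can_stack(o1, o2):
    """Stand-in for the geometric canStack check."""
    return True

def on_top_of(o2, o1):
    """onTopOf(o2, o1): some disposition of o1 affords a trigger role that o2 can play."""
    return any(d.trigger_role in o2.roles and can_stack(o1, o2)
               for d in o1.dispositions)

plate = Obj("plate", [Disposition("Deposition", "DepositedObject")])
bowl = Obj("bowl", [Disposition("Containment", "ContainedObject")],
           roles={"DepositedObject"})
print(on_top_of(bowl, plate))  # True: the bowl can be placed on the plate
```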
      <p>The rule above explains how to check whether two objects can be stacked. When a set of objects
is considered, they are separated into two groups based on their disposition properties. The
separation of the objects is done with the reasoning that any object that can support or contain
other objects, with the dispositions Deposition and Containment respectively, should be placed
at the bottom of the stack. Likewise, objects without such dispositions, such as spoons, as well
as spherical objects that have the property of Rollability, are placed at the top.</p>
      <p>The objects are modelled with their top facing the positive Z-axis (height), the length
along the X-axis and the width along the Y-axis, see Figure 1. This is identical to how they are
to be placed on a surface to ensure stability. To avoid comparing each object with the rest of the
objects, we use a simple heuristic of sorting the objects based on their diagonal length along
the XY plane. This is motivated by how it ensures that the object with a large bottom surface is at
the bottom of the stack.</p>
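      <p>A minimal sketch of this heuristic (the object extents, in metres, are invented for illustration):</p>

```python
import math

# hypothetical (length, width, height) extents along the X, Y and Z axes
objects = {"plate": (0.25, 0.25, 0.03),
           "bowl": (0.15, 0.15, 0.08),
           "spoon": (0.18, 0.04, 0.03)}

def xy_diagonal(extent):
    length, width, _height = extent  # the height is ignored
    return math.hypot(length, width)

# descending diagonal length puts the largest bottom surface first
sorted_objects = sorted(objects, key=lambda n: xy_diagonal(objects[n]), reverse=True)
print(sorted_objects)  # ['plate', 'bowl', 'spoon']
```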
      <p>Consider an example scenario with an ordered list of objects comprising a plate, a spoon
and a bowl as shown in Figure 1. First, we apply Rule 1 to the plate and the bowl. The plate,
having the disposition property of Deposition, needs a trigger role of type DepositedObject, as the
flat nature of the plate affords to Support other objects placed on top of it. Based on its
physical features, the bowl can play the role of DepositedObject, and the relation canStack holds
as well. Thus, the bowl can be placed on the plate. Continuing with the bowl and the spoon,
the bowl with the disposition property of Containment needs a trigger of type ContainedObject,
as the concave shape allows Containment of objects inside. If the spoon can be contained in
the given bowl, then the task of Stacking can be realised<sup>2</sup>. The obtained stack is similar to the
left stack in Figure 2. The unstable arrangement of objects is one of the configurations that will
not be considered by the algorithm, as the spoon placed at the bottom will not offer any support
to other objects or contain other objects.</p>
      <p>In the next section, we define the actions necessary for stacking. Stacking is performed by
transporting the objects in the order provided by the algorithm. The defined action sequence
has to be performed recursively until all the objects are stacked.</p>
      <p><sup>2</sup> Note that dispositions often correlate with their image-schematic representations, but we have chosen to
keep their syntactic representation in the text.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Action representation of stacking</title>
      <p>In order to be able to execute successful object stacking, the event of stacking one object on
top of another needs to be cored down and explained in terms of each of the individual actions.
Stacking is a repetition of the task of transporting objects until all objects in our set have been
placed on top of one another following the stacking algorithm.</p>
      <p>
        The actions are classified into atomic and compound as described in the taxonomy of actions,
see [
        <xref ref-type="bibr" rid="ref26">26</xref>
          ]. An atomic action is defined as below.
Definition 1. An action causing a single change in an image-schematic relation is an atomic
action.
      </p>
      <p>For example, consider the action of Lifting: before the action is executed, an object is in contact
with the robot gripper and is supported by a surface. Once the action is executed, there exists
no contact between the surface and the lifted object, and the support is offered by the gripper.
The Lifting action leads to a change in the Support relationship, hence Lifting is an atomic action.
Compound actions are a composition of atomic actions or compound actions. By that definition,
an action like transporting is a compound action comprising two other compound actions:
picking up and placing. Each of the compound actions consists of smaller atomic descriptors
that can be described in image-schematic terms.</p>
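      <p>The taxonomy above can be sketched as plain data: compound actions are ordered compositions of atomic actions or of other compound actions (the action names follow the text; the representation is ours):</p>

```python
# atomic action names as used in the text
PICKING_UP = ["LookingFor", "LookingAt", "MovingTo", "Grasping", "Lifting"]
PLACING = ["MovingTo", "Lowering", "Releasing", "MovingAway"]

def transporting():
    """Transporting is a compound of two compound actions: picking up, then placing."""
    return PICKING_UP + PLACING

print(transporting())
```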
      <p>Further, the pre- and post-scene of a compound action are defined in terms of the spatial
relations between the involved objects. The defined spatial relations and the object roles must
hold for a successful task execution. By defining the pre- and post-scene of an action, it is
possible to infer the necessary sequence of actions to execute a task. The representation of the
action with its initial and terminal scene has the following syntax:
Action(i) ↔ (InitialScene(t₁) → TerminalScene(t₂)), where t₁, t₂ are time points and i is an
interval which begins at t₁ and ends at t₂, during which the action happens.</p>
      <p>This means that to execute an Action, the InitialScene is a prerequisite and, once the action is
performed, it results in the TerminalScene, the goal state of the Action. The action takes place
during the time interval i, resulting in the end state at t₂, given that the prerequisite is satisfied at
some time point t₁. The '→' does not denote implication; it rather means that the defined goal
state is achieved if the action is successfully executed under the given context. The initial scene
of an action comprises objects that can take on suitable roles and relevant spatial relations
among objects. Note that when the action is executed, the relationships between the objects
and the roles the objects initially have, can change.</p>
      <p>Algorithm 1 Rules for Stacking
Input: Objectlist ← given set of objects in the scene
Output: OrderedObjects ← stacking order of the given objects
        RemainingObjects ← objects that are not part of the stack
1: procedure ComputeOrderedObjectList
2:   OrderedObjects ← ∅
3:   Sortedlist ← Objectlist sorted based on their diagonal length in descending order,
     ignoring the height
     /* considers objects that can support other objects */
4:   for object O in Sortedlist do
5:     if hasDisposition(O, D) and D in [Deposition, Containment] then
6:       remove O from Sortedlist
7:       if OrderedObjects is empty then
8:         OrderedObjects ← OrderedObjects ∪ O</p>
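      <p>Under our reading of the (partially printed) pseudocode, the ordering procedure can be sketched in Python as follows; the object extents and disposition assignments are illustrative assumptions:</p>

```python
import math

def compute_ordered_object_list(objects, dispositions, max_height=5):
    """objects: {name: (length, width, height)}; dispositions: {name: set of names}.
    Returns (stacking order, objects left out of the stack)."""
    # sort by XY-plane diagonal, descending, ignoring height
    sorted_list = sorted(objects,
                         key=lambda n: math.hypot(objects[n][0], objects[n][1]),
                         reverse=True)
    # objects that can support or contain others go to the bottom of the stack
    supporters = [n for n in sorted_list
                  if dispositions.get(n, set()) & {"Deposition", "Containment"}]
    others = [n for n in sorted_list if n not in supporters]
    ordered = (supporters + others)[:max_height]  # stack height limited to 5
    remaining = [n for n in sorted_list if n not in ordered]
    return ordered, remaining

objects = {"plate": (0.25, 0.25, 0.03), "bowl": (0.15, 0.15, 0.08), "spoon": (0.18, 0.04, 0.03)}
dispositions = {"plate": {"Deposition"}, "bowl": {"Containment"}}
print(compute_ordered_object_list(objects, dispositions))  # (['plate', 'bowl', 'spoon'], [])
```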
      <p>For a transporting action to happen, the prerequisite is the presence of a Support relation
between two physical objects taking the roles of Deposit and DepositedObject at a time point
t₁. When the task is completed successfully, there is again a Support relation between the
transported object and an object with the Deposit role at a time point t₂. The transporting
predicate is parameterised by three objects o₁, o₂, o₃ and a time interval i, where o₁ is
transported from o₂ to o₃ during i. The predicate hasInterval relates the TimeInterval instance i
with the starting time point t₁ and the end point t₂.
Picking up comprises several atomic actions: LookingFor, LookingAt, MovingTo, Grasping
and Lifting. LookingFor and LookingAt are movements that are necessary to perceive the object.
MovingTo is a movement towards the object so that the object is reachable. Lifting and Grasping
involve interaction with objects: the gripper of the agent comes in contact with the object in
Grasping, and in Lifting, the agent is in control of the object, which is supported by the
gripper. Similar to transporting, there needs to be a Support relation before and after the action
is executed successfully. For picking up o₁ from o₂ to happen, there needs to be a Support
relation between the object and the surface. There is an agent o₃ involved in picking up, and
the object o₁ is supported by the agent once the action is complete. The action happens over a
time interval i which starts at t₁ and ends at t₂:
supportedBy(o₁, o₂, t₁) → supportedBy(o₁, o₃, t₂)
We provide the definitions of the actions involving objects using the ISL logic. The operator
𝒰 (Until) from Temporal Logic, and ↩ and ⇝, used as in the Qualitative Trajectory Calculus, denote
the movement of the object away from, and towards, the other object respectively. Grasping,
defined below, is a combination of the image-schematic aspects of Link and Containment.
When the grasping action is performed, the object o₁ and gripper o₂ are initially disconnected,
and eventually the object becomes a part of the gripper and there exists a contact force between
the gripper and the object.</p>
      <p>∀o₁ ∶ grasping(o₁) ↔ ∃o₂ Object(o₁) ∧ Gripper(o₂) ∧
(DC(o₁, o₂) 𝒰 (P(o₁, o₂) ∧ Contact(o₁, o₂)))
Lifting is equivalent to a combination of the image schema concepts, namely Up, Source_Path and
Support. During the execution of Lifting, the object o₁ to be lifted is supported by the arm o₂,
while the arm supporting the object moves along an upwards path away from the support
surface o₃. The execution is complete when the arm lies above the support surface and there is
no contact between the object and the support surface.</p>
      <p>∀o₁ ∶ lifting(o₁) ↔ ∃o₂, o₃, p Object(o₁) ∧ Arm(o₂) ∧ Object(o₃) ∧
Path(p) ∧ supportedBy(o₁, o₂) ∧
((o₁ ↩ o₃) 𝒰 (¬Contact(o₁, o₃) ∧ ¬Contact(o₂, o₃) ∧ Above(o₁, o₃)))
Placing The placing action performs picking up in reverse order and comprises
the following atomic actions: MovingTo, Lowering, Releasing, MovingAway. MovingTo is the
action of an agent moving towards the object to achieve reachability, and MovingAway is the act
of moving away from the object once it is placed stably on a surface. Again, we define the Lowering
and Releasing actions, which involve objects, using ISL. The condition necessary for placing to
be executed is the existence of Support provided by the agent o₃ to an object o₁ at
some time point t₁. When the placing action is performed successfully during the time interval
i, there exists no contact between the agent and the object, and there is a Support relation
between the placed object o₁ and the object o₂ that can afford to support it at time point t₂.
∀o₁, o₂, i ∶ placing(o₁, o₂, i) ↔ (∃o₃, t₁, t₂ Object(o₁) ∧ Timepoint(t₁) ∧ Timepoint(t₂) ∧
Timeinterval(i) ∧ t₂ &gt; t₁ ∧ Agent(o₃) ∧
hasInterval(i, t₁, t₂) ∧ hasRole(o₁, DepositedObject) ∧
hasRole(o₃, Deposit) ∧ supportedBy(o₁, o₃, t₁)
→ Object(o₂) ∧ hasRole(o₂, Deposit) ∧
supportedBy(o₁, o₂, t₂))
Lowering, an atomic action of Placing, is a combination of Path_Goal and Support. As in
Lifting, there exists a contact force between the object o₁ and the arm o₂ during the execution
phase, and the arm moves towards the supporting surface o₃ along a path p. The action is
successfully completed if, at the end, there exists a contact between the object held by the
gripper and the object with a supporting surface.</p>
      <p>∀o₁ ∶ lowering(o₁) ↔ ∃o₂, o₃, p Object(o₁) ∧ Arm(o₂) ∧ Object(o₃) ∧
Path(p) ∧ supportedBy(o₁, o₂) ∧ ((o₁ ⇝ o₃) 𝒰 Contact(o₁, o₃))</p>
      <p>The next action is Releasing, which begins with a contact between the gripper o₂ and the object
o₁ and ends with the gripper and the object being disconnected. Once the action is complete,
the object has no contact with the agent.</p>
      <p>∀o₁ ∶ releasing(o₁) ↔ ∃o₂ Object(o₁) ∧ Gripper(o₂) ∧
(Contact(o₁, o₂) 𝒰 DC(o₁, o₂))</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Future Work</title>
      <p>The complexity of designing methods for cognitive robots to successfully stack objects extends
the problem of efficient manipulation. Performing a successful manipulation requires an
understanding of how the objects behave in a particular arrangement. In this paper, we utilised
object affordances and introduced an algorithm for intelligent stacking based on individual
object properties, together with a rule that acts on objects pairwise to check whether the onTopOf relation holds
between them. Taking inspiration from human cognition, we argue that stacking objects of
different properties requires the agent to understand the underlying rules pertaining to how
particular objects interact with one another. Our inclination to use object affordances stems
from the fact that they can be described in relation to spatiotemporal relationships, image schemas,
which also constitute the fundamental steps in action execution. Further, we showcase how
image-schematic event representation in the format of ISL can be used to describe the action
descriptors. The described action descriptors can then be used for failure monitoring and,
in case of failures in execution, for explaining the reason behind a failure. For example, with the
image-schematic definition of Lifting, there needs to be a support relation between the lifted
object and the gripper involved in lifting, and the gripper in contact with the object must be
moving away from the supporting surface in such a way that the gripper lies above the supporting
surface. In case of an object slipping while lifting, this can be detected with our definition of
lifting, as the support relation between the lifted object and the gripper is violated.</p>
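      <p>As a toy sketch of such failure detection (our own simplification: the Support relation between gripper and lifted object is approximated by a distance threshold, and all numbers are invented):</p>

```python
def supported_by_gripper(object_pos, gripper_pos, threshold=0.05):
    """Crude stand-in for the Support relation: the object tracks the gripper's height."""
    return abs(object_pos[2] - gripper_pos[2]) < threshold

def monitor_lifting(trajectory):
    """trajectory: list of (object_pos, gripper_pos) samples.
    Returns the index of the first sample violating Support, or None on success."""
    for step, (obj, grip) in enumerate(trajectory):
        if not supported_by_gripper(obj, grip):
            return step  # Support violated: the object slipped
    return None

ok = [((0, 0, z), (0, 0, z + 0.02)) for z in (0.0, 0.1, 0.2)]
slip = ok + [((0, 0, 0.0), (0, 0, 0.3))]  # object fell back while the gripper kept rising
print(monitor_lifting(ok), monitor_lifting(slip))  # None 3
```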
      <p>
        Much work (e.g. [
        <xref ref-type="bibr" rid="ref23 ref7">7, 23</xref>
        ]) has used simulation as a means to understand the scene. Considering
an instance of stacking performed only with a physics simulator, the number of random samples
required by the simulator to understand the object properties when they interact with other
objects is huge. Also, a part of the generated samples might consist of spatial arrangements
of objects that violate what the objects afford. For example, in Figure 2, there is an object
constellation with a spoon at the bottom and a bowl on top. However, this is an invalid
combination according to our qualitative description, while it is a completely valid sample
considered by the simulator. Generating samples that respect our qualitative descriptions of the
scene and object properties can significantly reduce the number of samples needed.
      </p>
      <p>Due to the current theoretical nature of the work, evaluations of the underlying ideas are
still lacking. In the future, we will rectify this by investigating how the formalised image
schemas can be used by the robot control programs. To use image-schema formalisms along
with robot control programs, it is necessary to represent and quantify the defined spatiotemporal
relations using physics-based simulators. The image-schematic event segmentation defined in Section 4
can be used to infer the sequence of actions that the robot has to repeat to perform stacking
until all the objects received from Algorithm 1 are in the stack. To execute the inferred
actions successfully, we need failure handling. With the action descriptors defined in the ISL logic,
failure monitoring can be performed. To combine the action sequence for stacking and failure
monitoring in a modular fashion, behaviour trees will be used. For instance, a behaviour tree
can be generated with a parallel node consisting of two children, one for performing the action
execution and the other monitoring the failures based on the action executed. The action
execution node consists of a child node for each atomic action to be executed and will execute
the actions in sequence until all of them succeed. We intend to use
Giskard [28], a constraint-based robot controller, for executing the stacking task. As Giskard already uses behaviour trees
for planning and executing the goals, it is relatively less complex to integrate our approach with
Giskard. With regards to the rules of stacking, we are interested in the long-term research
goal of extracting the rules by letting the agent interact with the environment. For this, we want
to use observational data for modelling the relation between the physical attributes of the objects
that influence the stacking stability, and also collect intervention data by allowing the agent to
interact with the environment, similar to curiosity-driven exploration in simulation [29].</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The research reported in this paper has been partially supported by the Federal Ministry for
Economic Affairs and Energy BMWi within the Knowledge4Retail project, subproject semantic
Digital Twin 01MK20001M (https://knowledge4retail.org).</p>
      <p>tion from interactive physics-based simulation, in: 2016 IEEE/RSJ International Conference
on Intelligent Robots and Systems (IROS), IEEE, 2016, pp. 4005–4012.
[29] M. Gasse, D. Grasset, G. Gaudron, P.-Y. Oudeyer, Causal reinforcement learning using
observational and interventional data, 2021. arXiv:2106.14421.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Moravec</surname>
          </string-name>
          ,
          <article-title>Mind children: The future of robot and human intelligence</article-title>
          , Harvard University Press,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <article-title>The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason</article-title>
          , The University of Chicago Press, Chicago and London,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hedblom</surname>
          </string-name>
          , Image Schemas and Concept Invention: Cognitive, Logical, and Linguistic Investigations, Cognitive Technologies, Springer Computer Science,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. X.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chernova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Koltun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Mordatch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mottaghi</surname>
          </string-name>
          , et al.,
          <article-title>Rearrangement: A challenge for embodied AI</article-title>
          ,
          <source>arXiv preprint arXiv:2011.01975</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gasser</surname>
          </string-name>
          ,
          <article-title>The development of embodied cognition: Six lessons from babies</article-title>
          ,
          <source>Artif. Life</source>
          <volume>11</volume>
          (
          <year>2005</year>
          )
          <fpage>13</fpage>
          -
          <lpage>30</lpage>
          . URL: https://doi.org/10.1162/1064546053278973. doi:10.1162/1064546053278973.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Oakley</surname>
          </string-name>
          ,
          <article-title>Image schema</article-title>
          , in: D.
          <string-name>
            <surname>Geeraerts</surname>
          </string-name>
          , H. Cuyckens (Eds.),
          <source>The Oxford Handbook of Cognitive Linguistics</source>
          , Oxford University Press, Oxford,
          <year>2010</year>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hedblom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Peñaloza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guizzardi</surname>
          </string-name>
          ,
          <article-title>Image schema combinations and complex events</article-title>
          ,
          <source>KI-Künstliche Intelligenz</source>
          <volume>33</volume>
          (
          <year>2019</year>
          )
          <fpage>279</fpage>
          -
          <lpage>291</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Dhanabalachandran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hassouna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hedblom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kümpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leusmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beetz</surname>
          </string-name>
          ,
          <article-title>Cutting events: Towards autonomous plan adaption by robotic agents through image-schematic event segmentation</article-title>
          ,
          <source>in: Proceedings of the 11th Knowledge Capture Conference, K-CAP '21</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          . URL: https://doi.org/10.1145/3460210.3493585. doi:10.1145/3460210.3493585.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hedblom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pomarlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Porzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Malaka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beetz</surname>
          </string-name>
          ,
          <article-title>Dynamic action selection using image schema-based reasoning for robots</article-title>
          ,
          <source>in: Proc. of the Joint Ontology Workshops</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Finger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Gabbay</surname>
          </string-name>
          ,
          <article-title>Adding a Temporal Dimension to a Logic System</article-title>
          ,
          <source>Journal of Logic, Language and Information</source>
          <volume>1</volume>
          (
          <year>1993</year>
          )
          <fpage>203</fpage>
          -
          <lpage>233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Randell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <article-title>A spatial logic based on regions and connection</article-title>
          ,
          <source>in: Proc. 3rd Int. Conf. on Knowledge Rep. and Reas</source>
          .,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Van de Weghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Tré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>De Maeyer</surname>
          </string-name>
          ,
          <article-title>A qualitative trajectory calculus as a basis for representing moving objects in geographical information systems</article-title>
          ,
          <source>Control and Cybernetics</source>
          <volume>35</volume>
          (
          <year>2006</year>
          )
          <fpage>97</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ligozat</surname>
          </string-name>
          ,
          <article-title>Reasoning about cardinal directions</article-title>
          ,
          <source>J. Vis. Lang. Comput</source>
          .
          <volume>9</volume>
          (
          <year>1998</year>
          )
          <fpage>23</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hedblom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mossakowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Neuhaus</surname>
          </string-name>
          ,
          <article-title>Between contact and support: Introducing a logic for image schemas and directed movement</article-title>
          , in: F. Esposito,
          <string-name>
            <given-names>R.</given-names>
            <surname>Basili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Lisi</surname>
          </string-name>
          (Eds.),
          <source>AI*IA 2017: Advances in Artificial Intelligence</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>256</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Beetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Beßler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Haidu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pomarlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Bozcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bartels</surname>
          </string-name>
          ,
          <article-title>KnowRob 2.0 - A 2nd Generation Knowledge Processing Framework for Cognition-Enabled Robotic Agents</article-title>
          ,
          <source>in: 2018 IEEE Int. Conf. on Robotics and Automation, ICRA 2018, Brisbane, Australia, May 21-25, 2018</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>512</fpage>
          -
          <lpage>519</lpage>
          . doi:10.1109/ICRA.2018.8460964.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Beßler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Porzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pomarlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Malaka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bateman</surname>
          </string-name>
          ,
          <article-title>A formal model of affordances for flexible robotic task execution</article-title>
          ,
          <source>in: ECAI</source>
          <year>2020</year>
          , IOS Press,
          <year>2020</year>
          , pp.
          <fpage>2425</fpage>
          -
          <lpage>2432</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Beßler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Porzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pomarlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Höfner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Malaka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bateman</surname>
          </string-name>
          ,
          <article-title>Foundations of the socio-physical model of activities (soma) for autonomous robotic agents</article-title>
          ,
          <source>arXiv preprint arXiv:2011.11972</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pomarlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hedblom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Porzel</surname>
          </string-name>
          ,
          <article-title>Panta Rhei: Curiosity-driven exploration to learn the image-schematic affordances of pouring liquids</article-title>
          ,
          <source>in: Proceedings of the 29th Irish Conference on Artificial Intelligence and Cognitive Science</source>
          , Dublin, Ireland,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pomarlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Bateman</surname>
          </string-name>
          ,
          <article-title>Embodied functional relations: A formal account combining abstract logical theory with grounding in simulation</article-title>
          .,
          <source>in: FOIS</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>155</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>O.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. B.</given-names>
            <surname>Fuchs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Posner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          ,
          <article-title>Shapestacks: Learning vision-based physical intuition for generalised object stacking</article-title>
          ,
          <source>in: Proceedings of the European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>702</fpage>
          -
          <lpage>717</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A. X.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Devin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lampe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bousmalis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Springenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Byravan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdolmaleki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gileadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Khosid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fantacci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jeong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neunert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Laurens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saliceti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Casarini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedmiller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Nori</surname>
          </string-name>
          ,
          <article-title>Beyond pick-and-place: Tackling robotic stacking of diverse shapes</article-title>
          ,
          <year>2021</year>
          . arXiv:2110.06192.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>N.</given-names>
            <surname>Abdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stachniss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Spinello</surname>
          </string-name>
          , W. Burgard,
          <article-title>Robot, organize my shelves! tidying up objects by predicting user preferences</article-title>
          ,
          <source>in: 2015 IEEE international conference on robotics and automation (ICRA)</source>
          , IEEE,
          <year>2015</year>
          , pp.
          <fpage>1557</fpage>
          -
          <lpage>1564</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Battaglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pascanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rezende</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <article-title>Interaction networks for learning about objects, relations and physics</article-title>
          ,
          <year>2016</year>
          . arXiv:1612.00222.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>D.</given-names>
            <surname>Raposo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Santoro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pascanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lillicrap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Battaglia</surname>
          </string-name>
          ,
          <article-title>Discovering objects and their relations from entangled scene representations</article-title>
          ,
          <source>arXiv preprint arXiv:1702.05068</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Turvey</surname>
          </string-name>
          ,
          <article-title>Ecological foundations of cognition: Invariants of perception and action</article-title>
          . (
          <year>1992</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>P.</given-names>
            <surname>Zech</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Renaudo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Haller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , J. Piater,
          <article-title>Action representations in robotics: A taxonomy and systematic classification</article-title>
          ,
          <source>The International Journal of Robotics Research</source>
          <volume>38</volume>
          (
          <year>2019</year>
          )
          <fpage>518</fpage>
          -
          <lpage>562</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Battaglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Hamrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Tenenbaum</surname>
          </string-name>
          ,
          <article-title>Simulation as an engine of physical scene understanding</article-title>
          ,
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>110</volume>
          (
          <year>2013</year>
          )
          <fpage>18327</fpage>
          -
          <lpage>18332</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fang</surname>
          </string-name>
          , G. Bartels,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beetz</surname>
          </string-name>
          ,
          <article-title>Learning models for constraint-based motion parameterization from interactive physics-based simulation</article-title>
          ,
          <source>in: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>4005</fpage>
          -
          <lpage>4012</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gasse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Grasset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gaudron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-Y.</given-names>
            <surname>Oudeyer</surname>
          </string-name>
          ,
          <article-title>Causal reinforcement learning using observational and interventional data</article-title>
          ,
          <year>2021</year>
          . arXiv:2106.14421.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>