Steps Towards Generalized Manipulation Action Plans - Tackling Mixing Task

Vanessa Hassouna1,*, Alina Hawkin1,* and Michael Beetz1

1 Institute for Artificial Intelligence, University of Bremen, Bremen, Germany

Abstract
In the rapidly evolving field of household robotics, the ability to autonomously execute complex tasks like "Serve Me Breakfast" introduces considerable technical challenges, particularly in the precise and adaptive handling of mixing tasks that involve substances with diverse densities and viscosities. This study advances the field of cognitive robotics by delving into the intricacies of everyday, complex, and highly variable activities within household settings. These scenarios demand that robots manage multiple, interconnected actions, a concept known as schemas. We present a holistic strategy to enhance household robots with sophisticated task execution capabilities, utilizing the Cognitive Robotic Abstract Machine (CRAM) framework for strategic planning and execution. Our method begins with an in-depth analysis of human behaviour to develop a theoretical model that guides the creation of adaptable and comprehensible action plans for robots. A crucial element of our approach involves using Narrative Enabled Episodic Memories (NEEMs), which capture detailed records of task executions to aid performance analysis and experiential learning. We propose incorporating additional criteria based on events recorded in the NEEMs to assess task success. For instance, a robot that maintains a steady grip on the whisk while the whisk interacts with a fluid is likely executing the mixing task correctly. These criteria enable further evaluation of performance through simulations, despite potential limitations in simulation fidelity. This article explores the transition from human expertise to robotic execution of mixing tasks in household environments, the methodology for gathering and analysing NEEMs, and their prospective future applications.
Keywords
Domestic robotics, Mixing tasks, Cognitive Architecture, Undetermined Action Description, CRAM framework, Narrative Enabled Episodic Memories (NEEMs), Task parameters, Reasoning, Simulation, KnowRob

1. Introduction

Enabling robots to perform complex, everyday tasks, such as preparing breakfast, with an understanding and adaptability akin to human behaviour is particularly challenging. This challenge is exciting because it transcends mere mechanical execution, advancing into the realm of cognitive robotics, where robots must not only perform actions but also comprehend their purposes and contextual implications. The challenge is significant because everyday activities in human environments are inherently complex and highly variable, requiring careful deliberation to ensure functional and adaptable robotic actions. These tasks often involve actions within actions, that is, schemas [1] that robots must dynamically interpret and execute. Our research demonstrates that this layered understanding is possible, as shown in the context of object transportation tasks [2], where robotic actions are nested within broader goal-directed behaviours. Less explored, however, is whether similar schema-based approaches can be applied to manipulation functions that demand a deeper understanding of the world and its dynamic properties, namely tasks that fundamentally alter the state of their environment.

This paper focuses on one such category of actions: mixing tasks. Mixing is not only about the physical stirring or blending of substances but also involves understanding the physical and chemical properties that change due to these actions, which is a very challenging problem [3]. Our method offers a perspective by conceptualizing the mixing process as a high-level symbolic description.
This approach allows for a more abstract mixing task execution without necessitating an exact physical representation. Past research has demonstrated that such high-level plans are sufficiently detailed to instruct robots on the required actions effectively [4] [5]. This article offers insights into leveraging the Cognitive Robotic Abstract Machine (CRAM) [6] framework, which supports high-level reasoning and decision-making for robotic agents. This involves analyzing human behaviour to develop theoretical models that guide the creation of understandable and executable action plans for robots. This approach allows robots to interpret the essence of their tasks and make informed decisions during task execution. Furthermore, we propose enhancing the robot’s ability to learn and adapt using Narrative-Enabled Episodic Memories (NEEMs) [7] [8] within the CRAM framework. These memories are detailed recordings: logs enriched with symbolic, environmental, and ontological data that provide a full perspective on each task’s execution. They help identify the conditions necessary for successful performance, allowing for continuous improvement of robotic capabilities.

Workshop on Actionable Knowledge Representation and Reasoning for Robots (AKR³) at Extended Semantic Web Conference (ESWC), May 27, 2024, Heraklion, Crete, Greece
* Corresponding author.
hassouna@uni-bremen.de (V. Hassouna); hawkin@uni-bremen.de (A. Hawkin); beetz@cs.uni-bremen.de (M. Beetz)
ORCID: 0000-0003-1335-5698 (V. Hassouna); 0000-0003-1826-9983 (A. Hawkin); 0000-0002-7888-7444 (M. Beetz)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
The paper’s contributions include a detailed methodology for creating theoretical models essential for programming cognitive tasks in robots, introducing an undetermined action designator for executing mixing tasks, and using NEEMs to record and evaluate robot actions, enabling robots to reason retrospectively about their performance. The paper also highlights the potential of NEEMs to advance robotic capabilities, suggesting a future with more autonomous and adaptive robotic systems across various settings.

2. State of the Art

Research has explored anthropomorphic-inspired approaches to enhance robotic capabilities in tasks like mixing [9]. Robots can replicate human-like movements by studying human motions and implementing inverse kinematics, fostering collaboration between humans and robots across various fields. This approach improves adaptability and opens up new possibilities in areas ranging from manufacturing to healthcare. Building upon this foundation, we introduce a high-level theoretical model for robots to simulate. Although the model is informed by human movement patterns, it does not require the robot to replicate these movements exactly. Instead, robots apply the underlying principles of these actions, optimizing performance to suit robotic capabilities and environmental constraints.

Furthermore, advancements such as "BakeBot" [10] have tackled challenges in item localization and optimal mixing techniques. Robots can adapt to environmental changes while baking cookies by employing high-level motion plans and dynamic mixing trajectories. These strategies, which include compliance and force control, have proven effective in achieving desired outcomes despite potential failure scenarios. Additional studies on human motions, such as whipping cream and tea whisking [11], provide insights into effective agitating motions and skill progression.
Observations of confectionery hygiene experts have revealed efficient agitating motions, highlighting the importance of specific patterns like elliptical and reciprocating motions. Similarly, analysis of motions in the traditional Japanese tea ceremony has helped highlight differences in skill levels and distinct motion processes, which aid in designing theoretical models for robotic tasks like mixing. Our methodology leverages these insights to develop a nuanced motion model that can dynamically adapt to the robot’s tasks.

The "Robot Household Marathon Experiment" [2] explores the complexities of underdetermined actions and dynamic environments in real-world household settings. We utilize insights from this study, applying the same framework to enhance our understanding of action execution in real-world scenarios. Although the experiment primarily demonstrated basic household tasks like setting and cleaning up a breakfast table, these predominantly pick-and-place actions are not sufficient for more complex tasks such as mixing, which require learning from past experiences and adapting strategies over time. Our approach integrates advanced memory and learning capabilities, enabling robots to perform complex mixing tasks with increased autonomy and adaptability.

Regarding learning, significant research has been conducted on acquiring knowledge from human interactions within VR through NEEMs [12]. Recording VR NEEMs involves observing human actions in VR environments to generate NEEMs that can be generalized for robot actions. By extrapolating high-level actions from specific events, such as contact events and object movements, conditions for successful task completion can be inferred. This provides valuable insights into the usefulness of NEEMs for learning actions or reasoning about them.
A key aspect of our approach is enabling the robot to retrospectively analyze its actions and ensure that newly introduced actions are executed successfully in different environments.

3. The Theoretical Planning Methodology

We begin by formulating a theoretical action model, utilizing various knowledge sources to integrate new actions into our robotic system. This model includes an Action Designator, a symbolic underdetermined descriptor that is contextualized during runtime, encompassing preconditions, task-specific requirements resolved at runtime, and postconditions that outline the intended goals [13].

3.1. Observation

For the specific task of developing an action model for mixing, we used the "Max Planck Institute for Informatics Cooking Activities dataset 2.0" [14], hereafter referred to as the dataset. This dataset provides a collection of human-executed cooking tasks, including mixing, that involve various tools, ingredients, and containers. We manually filtered this dataset to select videos relevant to our research based on predefined criteria. After annotating these videos, we analyzed them to identify different types of mixing actions. These actions were broken down into basic elements, such as gripping a tool or holding a bowl, which were then organized into typical phases to formulate a generalized action model. Our analysis was validated against established mixing techniques from WikiHow to ensure the accuracy of our observations [15]. WikiHow has been utilized in our other research to enhance our knowledge base and inform the design process of our model [16].

[Figure 1: Three Types of Mixing Motion. (a) Circular Motion, (b) Elliptic Motion, (c) Orbital Motion]

Of the 273 videos initially considered, the evaluation process concluded with selecting 54 videos deemed relevant to our study. This extensive review revealed that all mixing actions typically begin with the utensil in one hand and the container in the other, both ready for use.
Every container is open-topped, and the utensil, usually gripped by its handle, is aimed towards the centre of the container before being inserted. Following this setup, the mixing actions vary and can be categorized into three distinct types of mixing motions:

1. Circle motion: a rotation that draws a circle on two axes, as seen in Figure 1a.
2. Ellipse motion: an ellipse whose most simplified version is a line drawn back and forth on one axis, as seen in Figure 1b.
3. Orbit motion: small circle motions are drawn along an invisible large circle divided into even segments, with each segment point being the centre of a small circle, as seen in Figure 1c.

We categorize the mixing motions observed in our dataset as circular, ellipse, and orbital mix types. The term "orbital" was inspired by the visual similarity to the moon’s orbit around the sun and does not imply any literal astrodynamics of orbital motion [17]. Analysis of the dataset reveals that these motion types can be repeated and linked together in sequences. However, it is important to note that more videos featuring consistent mix ingredients, tools, and containers are needed to establish any correlation with mix duration. After the various mixing motions, a consistent final step occurs in all observed actions: the tool is withdrawn from the container. This action-to-motion strategy aligns with the Flanagan Model [18].

Subjects in the dataset prefer different mixing motions depending on the context, such as the specific ingredient being mixed and the utensil used. One limitation we encountered in the dataset was the lack of tasks involving the whipping of heavy cream or the goal of incorporating air into a mixture, which typically involves a whisk. To address this, we draw upon insights from the paper "CFD Analysis of Effective Human Motion for Whipping Heavy Cream by Hand" [11]. For incorporating air, a motion with large amplitude is necessary [11, p. 121].
The elliptical and reciprocating motions described in the paper are deemed effective. When comparing these motions with those in our dataset, we observe similarities with our defined circular and ellipse mix types. A notable difference is that the subjects in the paper also tilt the container to increase amplitude during their whipping motion without altering the whisking pattern. This adaptation suggests that our defined mix types are relevant and applicable to whipping, a specialized form of mixing.

3.2. The Resulting Model

Based on our previous observations, a structured timeline for the mixing action has been developed, as shown in Figure 2. This timeline delineates the key stages involved in the mixing process:

Precondition: The initial step involves grasping the tool and the container, which is denoted as grab utensil in Figure 2 (left).
Postcondition: After completing the mixing action, the tool should be set aside, represented as place tool away in Figure 2 (right).

[Figure 2: Timeline of Pre-condition and Postcondition of Mixing. Phases over time: grab utensil, mix designator (mix motion), place tool away]

[Figure 3: Timeline of a Mixing Designator. Phases over time: approach, spiral outward, circular/elliptical/orbital mix motion (on repeat until x duration/amount), spiral inwards, retract]

The mixing Designator is depicted in Figure 3. Initially, the arm equipped with the tool is guided to the container and inserted, a phase referred to as approach. A start position must be established to transition into repeatable circular, elliptical, or orbital motions. To seamlessly connect the ’approach’ to the repeatable motion, a spiral outwards motion is employed. Selecting the appropriate container involves careful management of the contact points with the tool. An intermediate step is implemented to prevent any unexpected collisions during the tool’s withdrawal from the container.
Since the ’spiral outwards’ motion is known to be safe, it is retraced to its starting point before the withdrawal begins; this phase is termed spiral inwards. A retract phase safely concludes the mixing process. It should be noted that the duration of the repeated mixing motions might vary based on factors such as the container’s size and the mixed substance’s viscosity.

4. Integrating New Actions into CRAM

We are developing an undetermined action designator [13] for mixing tasks within the CRAM framework. An action designator translates symbolic action descriptions into concrete ROS action goals or similar data constructs, typically through an inference engine. The effective resolution of an action designator requires defining various predicates; our Designator includes predicates such as type, context, reso (resolution), rounds, arm, object, and source. Importantly, not all parameters need to be explicitly specified by the user; for example, if an arm is not selected, the system will autonomously attempt to resolve the Designator, possibly by determining which arm is currently available. For the Designator to function effectively, it is essential that the tool object and container object are precisely identified. These elements are critical as they inform the system of the specific objects to interact with during the mixing process. Listing 1 presents the undetermined action designator for mixing.

Listing 1: Undetermined Mixing Action Designator
    (desig:an action
              (type :mixing)
              (context :mix-circle)
              (reso 12)
              (rounds 1)
              (arm (:right))
              (object ?object-container)
              (source ?object-desig-source))

The next step is to resolve this high-level and abstract designator. Once resolved, it transforms into a detailed action plan for the robot, incorporating atomic actions such as approaching, each corresponding to an individual motion.
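To make the resolution step more concrete, the following Python sketch fills in unspecified parameters and expands a resolved designator into the ordered phases of Figure 3. It is a minimal illustration under stated assumptions, not the CRAM implementation: the class `MixingDesignator`, the functions `resolve` and `expand`, the rule of taking the first free arm, and the reading of the `source` key as the tool are all hypothetical. The default of one round follows the behaviour the paper describes for an unspecified rounds value.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class MixingDesignator:
    """Symbolic mixing request; field names loosely mirror Listing 1."""
    context: str                  # motion type, e.g. "mix-circle"
    container: str                # the (object ...) entry of Listing 1
    tool: str                     # assumed to be the (source ...) entry
    reso: int = 12                # resolution of the mix motion
    rounds: Optional[int] = None  # None means "not specified"
    arm: Optional[str] = None     # None means "not specified"


def resolve(desig: MixingDesignator, free_arms: List[str]) -> MixingDesignator:
    """Fill in unspecified parameters (hypothetical resolution rules)."""
    if desig.arm is None and not free_arms:
        raise ValueError("no arm available to resolve the designator")
    # Assumption: pick the first currently free arm if none was requested.
    arm = desig.arm if desig.arm is not None else free_arms[0]
    # One completed mix motion is the default when rounds is not given.
    rounds = desig.rounds if desig.rounds is not None else 1
    return replace(desig, arm=arm, rounds=rounds)


def expand(desig: MixingDesignator) -> List[str]:
    """Expand a resolved designator into its ordered phases (Figure 3)."""
    if desig.rounds is None or desig.arm is None:
        raise ValueError("designator must be resolved first")
    return (["approach", "spiral-outwards"]
            + [desig.context] * desig.rounds
            + ["spiral-inwards", "retract"])
```

For example, `expand(resolve(MixingDesignator("mix-circle", "bowl", "whisk"), ["right"]))` yields the five phases with a single `mix-circle` repetition, while setting `rounds=3` repeats the mix phase three times between the two spiral phases.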
To derive these atomic actions, we calculate the entire trajectory of the robot for the task in segmented increments. The functionality and effectiveness of this action designator have been extensively tested in previous research [2], demonstrating its capability to accurately and efficiently guide robotic actions in complex tasks like mixing.

[Figure 4: End Pose of Approach Pose Calculation with Container Vector Result. (a) Bowl-axis, (b) Ladle-axis interpolation with a), (c) Tool-gripper-pose interpolation with b)]

Once compatibility between the tool and container is established, we calculate the end pose. Consider a bowl as the container and a ladle as the tool for illustrative purposes, though this methodology applies to any tool-container combination. The desired end pose is determined by interpolating the container bottom value with the tool height. This interpolation process is depicted in Figure 4. The next step involves determining the gripper’s pose on the tool and integrating this information into the previously interpolated result. Once the pose information is obtained in the object transform frame, it is converted to the base transform frame for further processing. This leads to the definition of the approach-pose, where the gripper is positioned just above the container rim.

Following the approach-pose, we transition to the start-mix-poses, which provide a smooth transition to the actual mixing action through a logarithmic spiral motion. The logarithmic spiral was chosen because it provides a natural motion trajectory. As the gripper remains static on the height axis during this phase, the spiral calculation is performed in two-dimensional space. The height dimension is integrated later to complete the pose configuration within the container space.
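The two-dimensional spiral computation can be sketched in Python as follows. This is an illustrative sketch rather than the CRAM code: the function names, the number of turns, and the rule for choosing the scale `a` (so that the final sample lies exactly on the maximum radius `r_max`) are assumptions of the example. The polar form r = a·e^(k·t) is the standard definition of a logarithmic spiral, and `k` defaults to 0.4, the value the paper reports using.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def spiral_outwards(r_max: float, k: float = 0.4, turns: float = 1.0,
                    samples: int = 12) -> List[Point]:
    """Sample points on a planar logarithmic spiral r = a * exp(k * t).

    Assumption of this sketch: the scale a is chosen so that the last
    sample lies at radius r_max, keeping the tool inside the container.
    """
    if r_max <= 0 or k <= 0 or turns <= 0 or samples < 2:
        raise ValueError("r_max, k, turns must be positive and samples >= 2")
    t_end = 2.0 * math.pi * turns
    a = r_max / math.exp(k * t_end)
    points = []
    for i in range(samples):
        t = t_end * i / (samples - 1)   # evenly spaced parameter values
        r = a * math.exp(k * t)         # radius grows exponentially with t
        points.append((r * math.cos(t), r * math.sin(t)))
    return points


def spiral_inwards(outward: List[Point]) -> List[Point]:
    """Retrace the outward spiral back to its starting point."""
    return list(reversed(outward))
```

Calling `spiral_outwards(0.10)` returns twelve xy-offsets whose distance from the container centre grows monotonically up to 0.10; a constant height would then be attached to each point to obtain full poses. Reversing the same list gives the inward spiral, which reuses a path already known to be collision-free.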
The mathematical representation of a logarithmic spiral in parametric equations is as follows:

\[ x(t) = a \, e^{k t} \cos(t) \tag{1} \]
\[ y(t) = a \, e^{k t} \sin(t) \tag{2} \]

Taken together, equations (1) and (2) provide the xy-coordinates as a function of time $t$, with $a$ representing the section size of the spiral, determined by the number of segments into which the spiral is divided, $k$ an arbitrary constant, and $t$ denoting time. The value $a$ is critical, as it determines when the spiral will terminate at the maximum radius of the container. This maximum value for $a$ is calculated based on the tool and container types, adjusting for the tool width and the distance from the tool to the inner wall of the container, as informed by our object knowledge module. This ensures that the tool remains within the container’s bounds without being pushed out. The constant $k$ is chosen to be positive with no other constraints; a practical value of 0.4 has been selected for our applications. The process for calculating these parameters and their impact on the trajectory of the tool within the container is visualized in Figure 5. This visual aid helps in understanding how the spiral’s section size and the chosen value of $k$ influence the movement pattern within the specified constraints.

[Figure 5: Sampling Rates and the Resulting Trajectories. (a) Sampling rate 12, (b) Sampling rate 2, (c) Sampling rate 4, (d) Sampling rate nearing +∞]

Mix-poses perform the main tasks in mixing, with the "reso" value representing the resolution of the circle or ellipse. Orbital motion combines two circle motions for more complex mixing tasks. This section can be repeated based on the specified number of rounds. If no rounds value is defined, one completed mix-motion is assumed by default. Otherwise, the poses are linked as many times as rounds specifies.

5. Integrating New Actions into NEEM Generation

This section briefly describes the steps needed to achieve NEEM generation of newly introduced actions within the CRAM framework, utilising the Bullet World simulation. It builds upon the previous work of Koralewski et al. [19] and the CRAM Cloud Logger (CCL) presented there, which has since been substantially extended and adapted to KnowRob 2.0 [8]. The newly extended CRAM Cloud Logger establishes the necessary connection between the CRAM action designators and maps the different terminologies of KnowRob, SOMA [20] and CRAM to each other. The integration of Virtual Reality NEEMs into CRAM for reasoning purposes has been previously presented by Kazhoyan et al. [12] and can be seen as a proof of concept for this approach. However, VR NEEMs are very different from the NEEMs recorded from simulation for this paper, as explained in the following section.

5.1. The Differences between Virtual Reality and CRAM Bullet World NEEMs

In the learning from VR NEEMs approach, presented in "Learning Motion Parameterizations of Mobile Pick and Place Actions from Observing Humans in Virtual Environments" [12], the data is obtained by a human user performing everyday manipulation activities in a Virtual Reality environment. That work presents the pipeline of gathering NEEMs in VR, processing them in KnowRob [8] and generalizing the obtained data to determine parameters for executing the VR-shown action within the Bullet World simulation and on the real robot. It also shows that the learned parameters can be generalized across different kitchen environments and used by other robots. Even though the goal of setting the table is given in advance, during the task execution and subsequent NEEM generation in Virtual Reality, it is impossible to know which high-level action the human user is currently performing. This knowledge is extrapolated from specific events that take place [21] [22]. For example,
if a contact event occurs between the simulated hand and a cup, followed by the loss of contact between the cup and the table surface that supported it, it can be assumed that the human user has picked up the cup. Once the cup has lost contact with the hand but a contact event between the cup and the kitchen island has occurred, it can be assumed that the cup has now been transported to a new location and placed there. In contrast, the NEEMs generated from the CRAM Bullet World are based on action designators and their resolution. Since a designator is recorded in both its unresolved and resolved form, this kind of NEEM allows the comparison of different resolution approaches, which was also a crucial part of the evaluation in the Kazhoyan et al. paper [12].

5.2. Setting up the NEEM Generating Environment

The NEEM logging process works by attaching additional functions to the designator execution function within CRAM, which allows the designator and all of its parameters, both before and after the resolution, to be logged via queries to KnowRob, which then writes the data into an instance of MongoDB. However, it is necessary to link the CRAM designator terminology to the SOMA ontology used by the KnowRob framework so that the NEEM can be queried for information and reasoned about later. This is done by introducing additional resources to KnowRob, such as the URDF files of the environment and of the robotic agent. In addition, OWL files are needed to describe the contents of the URDF files semantically and to establish the connection between the robot/environment description and the semantics used within KnowRob. Since KnowRob can utilise "rospack" services, the paths can be given accordingly. This parameter setting for the CRAM Cloud Logger is shown in Listing 2.
Listing 2: Environment parameter settings for the CRAM Cloud Logger; extract of a setup-logging function
    (setf *environment-owl* "'package://iai_semantic_maps/owl/kitchen.owl'")
    (setf *environment-owl-individual-name* "'http://knowrob.org/kb/IAI-kitchen.owl#iai_kitchen_room_link'")
    (setf *environment-urdf* "'package://iai_kitchen/urdf_obj/kitchen.urdf'")
    (setf *environment-urdf-prefix* "'iai_kitchen/'")
    (setf *agent-owl* "'package://knowrob/owl/robots/PR2.owl'")
    (setf *agent-owl-individual-name* "'http://knowrob.org/kb/PR2.owl#PR2_0'")
    (setf *agent-urdf* "'package://knowrob/urdf/pr2.urdf'")

The utilised objects also have to be mapped to the SOMA ontology, e.g. the notion of BOWL must be mapped to:

    http://www.ease-crc.org/ont/SOMA.owl#Bowl

This must be done for every new notion of objects or actions accordingly. Since SOMA has been used previously for robotic NEEM generation, many of the necessary objects are already represented in the ontology. If such a representation does not exist, the overarching class can be used instead. E.g. since WHISK as a tool does not yet exist, it can be represented as a DesignedTool instead:

    http://www.ease-crc.org/ont/SOMA.owl#DesignedTool

which is described as "An item designed to enable some action, in which it will play an instrumental role." Alternatively, the necessary object can be added to the ontology by modifying SOMA directly. This approach would require, however, that the changes be submitted to the official SOMA repository so that everybody can use them. Some CRAM action key-value pair formats, such as poses, must then be converted into a format that KnowRob supports.
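The object mapping with its fallback to an overarching class can be illustrated with a small Python sketch. The two IRIs are the ones quoted above; the lookup table, the function name `soma_class`, and the use of a single fallback class for every unmapped type are simplifying assumptions of this example and not part of the CRAM Cloud Logger.

```python
SOMA = "http://www.ease-crc.org/ont/SOMA.owl#"

# Hypothetical lookup table from CRAM object types to SOMA classes.
# Only Bowl and DesignedTool are named in the text above.
CRAM_TO_SOMA = {
    "bowl": SOMA + "Bowl",
}

# Assumption: one overarching class serves as fallback for unmapped tools.
# In practice the appropriate superclass depends on the object in question.
FALLBACK_CLASS = SOMA + "DesignedTool"


def soma_class(cram_type: str) -> str:
    """Return the SOMA IRI for a CRAM object type, or the fallback class."""
    # Normalise Lisp-style keywords such as ":BOWL" to "bowl".
    key = cram_type.strip().lstrip(":").lower()
    return CRAM_TO_SOMA.get(key, FALLBACK_CLASS)
```

With this table, `soma_class(":BOWL")` resolves to the Bowl class, while `soma_class(":WHISK")` falls back to DesignedTool until a dedicated class is added to the ontology.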
This coupling of SOMA and CRAM links CRAM actions and semantic knowledge. The low-level data, e.g. robot joint positions, is logged directly from the ROS tf topic and is synchronised with the individual performed actions via time stamps.

5.3. NEEM Generation

After the mapping is complete, the NEEM generation can begin. For this, KnowRob has to be running with an instance of MongoDB, and the executed plan has to be wrapped into start- and stop-episode functions. The demo within the start and stop statements can also be executed multiple times to obtain a NEEM containing multiple task executions. An example of a query that is generated during runtime to log actions and their subactions can look like this:

Listing 3: Example of a query which adds a new subaction of type PickingUp.
    add_subaction_with_task('http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Action_FKYRWDBL',
                            SubAction,
                            'http://www.ease-crc.org/ont/SOMA.owl#PickingUp')

The resulting NEEM can then be loaded into a local instance of KnowRob or into openEASE to be queried and utilised. Loading NEEMs into a locally running instance of KnowRob allows them to be queried from CRAM, which is the planning and execution system. The queries can be used to acquire the parameters with which the task execution was successful and with which the simulated robot fulfilled the previously found conditions, e.g. the whisk was not dropped and was touching the bowl and mixture during the mixing action. The results of the queries can then be used within the action designator to perform the action.

6. Discussion

Our theoretical model provides a structured approach to identifying the core aspects of a task and transforming them into executable steps.
To validate this hypothesis, we conducted an experiment in which the PR2 robot, a mobile manipulation platform developed by Willow Garage, performed the specified action designator within Bullet World, a rapid-fire simulation. To support our analysis, the experiment was documented through recorded videos and screenshots, accessible online. Figure 6 displays a visual summary of the experiment, which illustrates how the theoretical model is applied in a practical scenario, showcasing the robot’s capacity to execute the designated tasks.

[Figure 6: Specific Mixing Task variations performed by PR2 robot. (a) Mixing with Whisk in Pot, (b) Mixing with Whisk in Bowl, (c) Mixing with Fork in Wineglass, (d) Mixing with Fork in Bowl]

In the experiment, we used four containers and three types of tools in various combinations, thoroughly documenting the process for evaluation. The containers included a large bowl, a saucepan, a smaller round bowl, and a wineglass, while the tools consisted of a whisk, a fork, and a spoon. It is important to note that some combinations, such as a fork with a wineglass and a whisk with a small round bowl, though operational with the action designator, showed limited success due to the spatial constraints within the containers. Overall, the action designator was successful in most scenarios, except when the tool’s width was too great for the container, which led to errors. This highlights the progression from a theoretical model to a versatile action designator capable of handling diverse scenarios with different tools. Limitations of the simulation persist, however: it does not fully address factors like mixture homogeneity, the viscosities of liquids and substances, and the optimal speed for executing actions.

6.1. Potential Applications of Bullet World NEEMs

To further evaluate the success of the action, we recorded Narrative-Enabled Episodic Memories (NEEMs) during the experiment.
These NEEMs can provide crucial insights into the parameters for successful action execution. We showcase one such recorded NEEM, available online, and discuss the potential insights gained from this and future recordings. So far, we have looked into NEEM generation in VR and in simulation. VR NEEMs have already been used to teach robots new manipulation activities and have their benefits; NEEMs from simulation can also enhance a robot’s performance, but in different ways. Since the Bullet World simulator is a rapid-fire simulator, it is very fast but omits certain visualisations. For example, the robot is teleported to a target location instead of the entire path being simulated and visualised. (Details of the implementation and inner workings can be found in Kazhoyan [13].) This way, time is saved on trajectory simulation, and the gained speed can be used for testing various configurations before execution on the real robot [13]. Nevertheless, the simulation does provide collision checking and physics, which is simulated for a split second to check whether items would fall or slip out of the robot’s hand, and this is enough to estimate whether certain actions would be successful.

From KnowRob [23] [8] and based on previously collected VR NEEMs [21] [22], we could manually gather the conditions which have to be met to achieve a successful mixing action. Such conditions could be: a touching event between the tool and the robot’s gripper; the tool not being dropped during the action; and a touching event between the tool and the contents of the bowl that lasts for some time or is repeated multiple times during the mixing process, so that the effect of combining two ingredients is achieved. If these conditions are met, the simulated mixing action will likely be successful, even though the Bullet World cannot simulate the physical effect of mixing.
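The success conditions listed above can be sketched as a check over logged contact events. The following Python example is a hypothetical illustration only: the `ContactEvent` record, the entity names, and the thresholds `min_contact` and `min_repeats` are assumptions made for this sketch and do not reflect the actual NEEM schema or KnowRob query interface.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ContactEvent:
    """A contact interval between two named entities (hypothetical format)."""
    a: str
    b: str
    start: float  # seconds
    end: float    # seconds


def _between(ev: ContactEvent, x: str, y: str) -> bool:
    """True if the event connects entities x and y, in either order."""
    return {ev.a, ev.b} == {x, y}


def mixing_succeeded(events: Iterable[ContactEvent], t_start: float,
                     t_end: float, gripper: str = "gripper",
                     tool: str = "whisk", mixture: str = "mixture",
                     min_contact: float = 2.0, min_repeats: int = 3) -> bool:
    """Evaluate the success conditions over the interval [t_start, t_end]."""
    events = list(events)
    # Condition 1: a single gripper-tool contact spans the whole action,
    # which means the tool was held throughout and never dropped.
    held = any(_between(e, gripper, tool)
               and e.start <= t_start and e.end >= t_end
               for e in events)
    # Condition 2: tool-mixture contact lasts long enough in total,
    # or recurs often enough, within the action interval.
    touches = [e for e in events
               if _between(e, tool, mixture)
               and e.end > t_start and e.start < t_end]
    total = sum(min(e.end, t_end) - max(e.start, t_start) for e in touches)
    return held and (total >= min_contact or len(touches) >= min_repeats)
```

A check of this kind could be run over many recorded episodes to label each parameter set as likely successful or not, without the simulator having to model the mixture itself.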
This idea holds much potential for automating NEEM evaluation and selecting the best action parameters based on prior experiences. By obtaining a large number of NEEMs in which different parameters were used during task execution, it is possible to determine the potentially best parameters for an action and also estimate the robot’s success during task execution. Another benefit of this approach is that Bullet World NEEMs require fewer resources to generate than VR NEEMs. For the latter, more hardware is needed in the form of VR headsets, controllers, base stations, a decent GPU which can run Unreal Engine, and a human user who performs the actions in VR in real time. For Bullet World NEEM generation, a computer with Ubuntu 20.04 and the software stack consisting of CRAM, KnowRob, MongoDB and ROS is needed. An onboard GPU is enough; the experiments can run in a loop overnight to generate the data.

7. Future Outlook

Looking ahead, our future research will continue to advance the integration of cognitive robotics into everyday environments with several strategic initiatives.

We plan to investigate the effects of different container shapes on robotic mixing capabilities. By understanding how various shapes influence the mixing process, we can refine our algorithms to handle a wider range of household tasks, enhancing the robot’s versatility and effectiveness.

We will also deepen our use of Narrative-Enabled Episodic Memories (NEEMs) to improve the robot’s learning and decision-making processes. By expanding the NEEM dataset and enhancing the analysis techniques, the robot can learn from past actions and adjust its strategies more effectively. This ongoing learning will be critical for developing robots that adapt to new tasks with minimal human intervention.

Building on our theoretical and simulated advancements, we will conduct extensive real-world testing to validate and refine our models.
Testing in actual home environments will help identify practical challenges and user needs, ensuring that our robots can operate effectively in their intended settings. As robots become able to perform complex tasks independently, we will explore collaborative scenarios where robots work alongside humans or other robots. This research will focus on optimizing task division, improving interaction protocols, and ensuring safety in shared spaces. By pursuing these directions, we aim to push the boundaries of what cognitive robots can achieve, making them more aware, adaptable, and useful across various settings.

Acknowledgments

The research reported in this paper has been partially supported by the German Federal Ministry of Education and Research, Project-ID 16DHBKI047 “IntEL4CoRo - Integrated Learning Environment for Cognitive Robotics”, University of Bremen, as well as by the German Research Foundation (DFG) as part of Collaborative Research Center (Sonderforschungsbereich) 1320 “EASE - Everyday Activity Science and Engineering”, University of Bremen (http://www.ease-crc.org/). The research was conducted in subproject R04 “Cognition-enabled execution of everyday actions”.

References

[1] M. M. Hedblom, M. Pomarlan, R. Porzel, R. Malaka, M. Beetz, Dynamic action selection using image schema-based reasoning for robots, 2021.
[2] G. Kazhoyan, S. Stelter, F. K. Kenfack, S. Koralewski, M. Beetz, The robot household marathon experiment, IEEE International Conference on Robotics and Automation (ICRA) (2021). URL: https://arxiv.org/abs/2011.09792. Accepted for publication.
[3] H. Li, D. Xu, An overview of fluids mixing in T-shaped mixers, Theoretical and Applied Mechanics Letters 13 (2023). URL: https://www.sciencedirect.com/science/article/pii/S2095034923000375. doi:10.1016/j.taml.2023.100466.
[4] M. Kümpel, V. Hassouna, J.-P. Töberg, P. Cimiano, M.
Beetz, Cut, chop, slice or dice: Parameterising general action plans using knowledge graphs, in: International Conference on Intelligent Robots and Systems (IROS 2024), 2024. Submitted.
[5] K. Dhanabalachandran, V. Hassouna, M. M. Hedblom, M. Kümpel, N. Leusmann, M. Beetz, Cutting Events: Towards Autonomous Plan Adaption by Robotic Agents through Image-Schematic Event Segmentation, in: Proceedings of the 11th on Knowledge Capture Conference, K-CAP ’21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 25–32. doi:10.1145/3460210.3493585.
[6] M. Beetz, G. Kazhoyan, D. Vernon, The CRAM cognitive architecture for robot manipulation in everyday activities, 2023. URL: https://arxiv.org/pdf/2304.14119.pdf. arXiv:2304.14119.
[7] J. Winkler, M. Tenorth, A. K. Bozcuoglu, M. Beetz, CRAMm – memories for robots performing everyday manipulation activities, Advances in Cognitive Systems 3 (2014) 47–66. URL: https://www.semanticscholar.org/paper/CRAMm-Memories-for-Robots-Performing-Everyday-Winkler-Tenorth/9c8c2448a033da67f6bea2a4d89fe5c77cfb3b00.
[8] M. Beetz, D. Beßler, A. Haidu, M. Pomarlan, A. K. Bozcuoglu, G. Bartels, KnowRob 2.0 – a 2nd generation knowledge processing framework for cognition-enabled robotic agents, in: International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 2018. URL: https://ai.uni-bremen.de/papers/beetz18knowrob.pdf.
[9] S. Warren, P. Artemiadis, On the control of human-robot bi-manual manipulation, volume 78, Springer Science and Business Media LLC, 2015, pp. 21–32. URL: https://doi.org/10.1007/s10846-014-0055-4.
[10] M. A. Bollini, Following recipes with a cooking robot, 2012. URL: https://dspace.mit.edu/handle/1721.1/74451.
[11] K. Ikeda, H. Masuda, N. Shirasugi, et al., CFD analysis of effective human motion for whipping heavy cream by hand, volume 75, AIDIC, 2019, pp. 121–126. URL: https://doi.org/10.3303/CET1975021. doi:10.3303/CET1975021.
[12] G. Kazhoyan, A. Hawkin, S. Koralewski, A. Haidu, M.
Beetz, Learning motion parameterizations of mobile pick and place actions from observing humans in virtual environments, in: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 9736–9743. doi:10.1109/IROS45743.2020.9341458.
[13] G. Kazhoyan, M. Beetz, Executing Underspecified Actions in Real World Based on Online Projection, in: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE Computer Society, Macau, China, 2019, pp. 5156–5163. doi:10.1109/IROS40897.2019.8967867.
[14] M. Rohrbach, et al., A database for fine grained activity detection of cooking activities, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. URL: https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/human-activity-recognition/mpii-cooking-activities-dataset.
[15] Wikihow, WikiHow. URL: https://www.wikihow.com.
[16] M. Kümpel, J.-P. Töberg, V. Hassouna, M. Beetz, P. Cimiano, Towards a knowledge engineering methodology for flexible robot manipulation in everyday tasks, in: Workshop on Actionable Knowledge Representation and Reasoning for Robots (AKR³) at Extended Semantic Web Conference (ESWC 2024), 2024. In press.
[17] How Things Fly, Kepler’s laws of orbital motion, no date. URL: https://howthingsfly.si.edu/flight-dynamics/kepler’s-laws-orbital-motion, homepage for How Things Fly. [Viewed 21 March 2024].
[18] J. R. Flanagan, M. C. Bowman, R. S. Johansson, Control strategies in object manipulation tasks, Current Opinion in Neurobiology 16 (2006) 650–659. doi:10.1016/j.conb.2006.10.005.
[19] S. Koralewski, G. Kazhoyan, M. Beetz, Self-specialization of general robot plans based on experience, IEEE Robotics and Automation Letters 4 (2019) 3766–3773. doi:10.1109/LRA.2019.2928771.
[20] D. Beßler, R. Porzel, M. Pomarlan, A. Vyas, S. Höffner, M. Beetz, R. Malaka, J.
Bateman, Foundations of the socio-physical model of activities (SOMA) for autonomous robotic agents (2020). URL: https://doi.org/10.48550/arXiv.2011.11972. arXiv:2011.11972, submitted on 24 Nov 2020.
[21] A. Haidu, M. Beetz, Automated models of human everyday activity based on game and virtual reality technology, in: 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 2606–2612. doi:10.1109/ICRA.2019.8793859.
[22] A. Haidu, M. Beetz, Action recognition and interpretation from virtual demonstrations, in: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 2833–2838. doi:10.1109/IROS.2016.7759439.
[23] M. Tenorth, M. Beetz, KNOWROB - Knowledge Processing for Autonomous Personal Robots, in: 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, St. Louis, MO, USA, 2009, pp. 4261–4266. doi:10.1109/IROS.2009.5354602.

A. Online Resources

All referenced online sources are available via:
• MongoDB: https://www.mongodb.com/
• SOMA - Socio-physical Model of Activities: https://ease-crc.github.io/soma/
• OpenEase: http://www.open-ease.org/
• NEEM Handbook: https://ease-crc.github.io/soma/owl/1.1.0/NEEM-Handbook.pdf
• Recorded videos: https://sunava.github.io/cram-robot-actions/mixing.html
• Recorded NEEMs: https://data.open-ease.org/QA?neem_id=65fc2047ac7bd8f0875c795a
• Unreal Engine: https://www.unrealengine.com/
• ROS TF library: http://wiki.ros.org/tf