=Paper=
{{Paper
|id=Vol-2594/short5
|storemode=property
|title=An Intrinsically Motivated Planning Architecture for Curiosity-driven Robots
|pdfUrl=https://ceur-ws.org/Vol-2594/short5.pdf
|volume=Vol-2594
|authors=Angelo Oddi,Riccardo Rasconi,Vieri Giuliano Santucci,Gabriele Sartor,Emilio Cartoni,Francesco Mannella,Gianluca Baldassarre
|dblpUrl=https://dblp.org/rec/conf/aiia/OddiRSSCBB19
}}
==An Intrinsically Motivated Planning Architecture for Curiosity-driven Robots==
Angelo Oddi, Riccardo Rasconi, Vieri Giuliano Santucci, Gabriele Sartor, Emilio Cartoni, Francesco Mannella, and Gianluca Baldassarre

ISTC-CNR, Via San Martino della Battaglia, 44 - 00185 Rome, Italy

{angelo.oddi, riccardo.rasconi, vieri.santucci, gabriele.sartor, emilio.cartoni, francesco.mannella, gianluca.baldassarre}@istc.cnr.it

===1 Background and Objectives===
This paper presents a summary of the IMPACT (Intrinsically Motivated Planning Architecture for Curiosity-driven roboTs) project funded by the European Space Agency (ESA). The project aimed at investigating the possibility of employing Artificial Intelligence (AI) techniques to increase both the cognitive and operational autonomy of artificial agents in general, and of robotic platforms targeted at the space domain in particular. The idea is based on the creation of a virtuous loop in which the agent increases its learned capabilities through direct interaction with the real environment, and then exploits the autonomously acquired knowledge to execute activities of increasing complexity: this process is cumulative and virtually open-ended [3], as the information and abilities acquired up to a certain time are employed to further increase the agent’s knowledge of the application domain, as well as the skills to adequately operate in it. This self-induced tendency towards autonomously learning new skills, based on intrinsic motivations (IM) [16, 5], will enable the artificial agent to face situations and solve problems not foreseeable when the agent is designed and implemented, especially because of the limited knowledge of the environment the agent will operate in.

The IMPACT software framework [11] aimed at extending the well-known three-layered robot control architecture [1, 4, 6], commonly accepted in general robotics to support the Sense-Plan-Act (SPA) autonomous deliberation and execution paradigm.
Indeed, the IMPACT system implements a Discover-Plan-Act (DPA) cycle, which directly extends the SPA cycle with a more general open-ended learning step (Discover) that acquires new knowledge from the external environment. In particular, within the three-layer architecture we integrated the following new functionalities: (1) autonomous learning of new skills based on self-generated goals driven by intrinsic motivations (intrinsic goals) [15]; (2) automatic translation of the newly acquired skills [12], from a low-level sub-symbolic representation to a high-level symbolic representation (e.g., expressed in the Planning Domain Definition Language, PDDL [10]); (3) autonomous enrichment of the planning domain by adding knowledge on new states and operators expressed in high-level symbols. The next section describes the IMPACT software architecture, whereas Section 3 introduces two test scenarios using the Gazebo Robot Simulator [7]<sup>1</sup>: a planetary rover and a robotic arm.

This research has been supported by the European Space Agency (ESA) under contract No. 4000124068/18/NL/CRS, project IMPACT - Intrinsically Motivated Planning Architecture for Curiosity-driven roboTs. The view expressed in this paper can in no way be taken to reflect the official opinion of the European Space Agency.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===2 IMPACT Architecture===
Fig. 1. IMPACT high-level software architecture.

Figure 1 represents the IMPACT high-level software architecture. As introduced above, it is based on the three-layer robot control architecture [1, 4, 6] commonly accepted in general robotics to support the Sense-Plan-Act (SPA) autonomous deliberation and execution paradigm.
In this architecture, the decisional layer implements the high-level planning, plan execution, and re-planning capabilities; the executive layer controls and coordinates the execution of the functions distributed in the software, according to the requirements of the high-level tasks to be executed; lastly, the functional layer implements all the basic, built-in robot sensing, perception, and actuation capabilities. In Figure 1 the new blocks are highlighted (in light grey) within the three layers to represent the proposed extensions for including learning capabilities. In the following subsections we describe the individual blocks in detail.

<sup>1</sup> http://gazebosim.org/

The Decisional Layer. Within the classical framework of the three-layer architecture, the decisional layer traditionally contains a task planner that generates a sequence of operations to reach a certain goal provided by the users. The main challenge of the IMPACT project is integrating these planning capabilities with autonomous intrinsically motivated learning (IM-learning) features, in order to increase the number of known skills and extend the related high-level model in the robot’s knowledge base. We think that this extended vision of the three-layer architecture can be seen as one of the possible ways of addressing the long-term autonomy (LTA) problem for robotic systems [9]. In general, LTA can be seen as: (i) the ability of a robotic system to perform reliable operations for long periods of time under changing and unpredictable environmental conditions; (ii) the capability of increasing its knowledge about the working environment as time passes. Within this framework of LTA, the integration of IM-learning capabilities is one of the possible “enablers” for long-term robot autonomy [9].
For this reason, the decisional layer of the IMPACT architecture has a long-term autonomy manager (LTA-M), see Figure 1, which provides a set of strategies to coordinate the achievement of both extrinsic and intrinsic goals, i.e., the mission goals coming from outside vs. the internal goals generated by curiosity-driven behaviours [16, 5]. Starting from the system proposed in [2], we extended the design of the ROSPlan system with the integration of IM-learning capabilities as follows. ROSPlan includes two main ROS<sup>2</sup> [14] nodes: (1) the Knowledge Base (KB) and (2) the Planning System (PS). The KB is a collection of interfaces, and is intended to collate the up-to-date model of the environment. The Planning System acts as a wrapper for the internal planner such that it: (1) builds the initial state automatically, as a PDDL problem instance, from the knowledge stored in the Knowledge Base; (2) passes the PDDL problem instance to the planner; (3) dispatches each action, deciding when to reformulate and re-plan. ROSPlan substantially implements the Sense-Plan-Act (SPA) autonomous deliberation and execution paradigm introduced above, whereas we implement a Discover-Plan-Act (DPA) cycle, which represents a significant extension of the SPA cycle by means of a more general open-ended learning step (Discover) such that: (i) it refers (as in ROSPlan) to the process of populating the Knowledge Base from sensor data; (ii) it also represents the process of autonomous enrichment of the planning domain by adding knowledge on new states and operators expressed in high-level symbols.

The Functional Layer. The functional layer (also called skill layer in [1]) has access to the system’s sensors and actuators and provides reactive behaviour that is robust even under environmental disturbances, for example with the help of closed-loop control. In IMPACT the GRAIL [15] system works as the functional layer. In particular, as shown in Figure 1, GRAIL has a twofold role.
On the one hand, within the functional layer, the block “GRAIL-C” provides the set of stable controllers (i.e., the experts, see [15]) available in IMPACT for action execution. On the other hand, the block “GRAIL-IM-Learning” provides the autonomous learning of new skills based on self-generated goals driven by intrinsic motivations (intrinsic goals).

The Executive Layer. The executive layer mediates between the decisional and functional layers, i.e., it activates or deactivates the reactive operations according to the deliberator’s specification. In general, it represents the interface between the decisional and the functional levels. It controls and coordinates the execution of the functions distributed over the various functional level modules (i.e., the experts, see [15]) according to the task requirements. The above role, that is, the process of mapping symbolic and abstract plans to continuous actions in the real working domain, is well recognised in the literature [1, 4, 6]. However, we have also inserted at the same layer the subsystem (called “KoPro+”, see [8]) for the automatic translation of the newly acquired skills, from a low-level sub-symbolic representation (generated by “GRAIL-IM-Learning”) to a high-level symbolic representation. In some sense, the “KoPro+” module has a dual role with respect to the Executor; in fact, the learning process can be seen as the opposite of the execution process, as it maps low-level sub-symbolic representations into abstract symbolic representations.

<sup>2</sup> Robot Operating System (ROS) - https://www.ros.org/

===3 Operational Evaluation===
The validation and verification of the IMPACT framework was carried out in the form of two demonstration test cases.
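
Both test cases rely on turning a newly learned skill into a symbolic operator that the planner can use. The sketch below is only an illustration of that idea and is not the KoPro+ procedure: it assumes that each execution of a skill has already been observed as a pair of symbolic states (sets of true propositions), whereas the actual system starts from sub-symbolic data. The function names `abstract_operator` and `to_pddl`, the skill name and all predicate names are hypothetical.

```python
def abstract_operator(transitions):
    """Derive (pre, add, delete) proposition sets from (before, after) samples.

    Simplified set-based abstraction: assumes observations are already symbolic.
    """
    if not transitions:
        raise ValueError("at least one observed transition is required")
    pairs = [(frozenset(b), frozenset(a)) for b, a in transitions]
    # Precondition: propositions that were true before every execution.
    pre = frozenset.intersection(*(b for b, _ in pairs))
    # Add effects: propositions that became true in every execution.
    add = frozenset.intersection(*(a - b for b, a in pairs))
    # Delete effects: propositions that became false in every execution.
    delete = frozenset.intersection(*(b - a for b, a in pairs))
    return pre, add, delete


def to_pddl(name, pre, add, delete):
    """Render a parameter-free PDDL action from proposition sets."""
    def conj(atoms):
        return "(and " + " ".join(atoms) + ")"

    pre_atoms = [f"({p})" for p in sorted(pre)]
    eff_atoms = [f"({a})" for a in sorted(add)]
    eff_atoms += [f"(not ({d}))" for d in sorted(delete)]
    return (
        f"(:action {name}\n"
        f"  :parameters ()\n"
        f"  :precondition {conj(pre_atoms)}\n"
        f"  :effect {conj(eff_atoms)})"
    )


# Hypothetical observations of a skill that re-orients the antenna by driving.
observed = [
    ({"antenna-misaligned", "on-slope"}, {"antenna-aligned", "on-slope"}),
    ({"antenna-misaligned"}, {"antenna-aligned"}),
]
pre, add, delete = abstract_operator(observed)
action_text = to_pddl("orient-by-driving", pre, add, delete)
print(action_text)
```

Taking intersections keeps only what held in every observed execution, which is a deliberately conservative choice for a toy example; a real abstraction procedure has to cope with noisy, continuous observations.
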
The Rover scenario demonstrates how the IMPACT system can discover new ways to reach an already known effect by applying the procedure (called KoPro+, see [8]) for the automatic translation of the newly acquired skills from the sub-symbolic level to the PDDL (symbolic) level. This scenario proposes a situation where the orientation mechanism of a planetary rover antenna has been damaged and the rover can no longer use it to point the antenna and establish stable communication. In this case, our technology can be used to demonstrate how the rover is capable of enriching its planning domain with the necessary knowledge to orient the antenna using only its locomotion capabilities, for example by moving its entire body in order to reach the correct attitude to gain and maintain communication, possibly exploiting terrain slopes and/or small rocks.

The Robot Arm scenario demonstrates the ability of the IMPACT system to acquire new ways to interact with the environment and integrate them in its planning domain. In this scenario, a robot equipped with a gripper actuator attached to a manoeuvrable arm tries to grasp a “vase shaped” rock whose size exceeds the maximum opening span of the gripper. The robot is thus not able to pick up the rock with its basic grasping skill; however, upon failure, the IMPACT system automatically triggers the learning of a new skill, and the robot is eventually able to pick up the “vase shaped” rock by grasping it from its edge.

===4 Conclusion and Future Work===
We propose an extended version of the well-known three-layer architecture used in system robotics, extending the SPA cycle with a more general open-ended learning step (Discover) that acquires new knowledge from the external environment.
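
The extended cycle can be illustrated with a minimal sketch, written under strong simplifying assumptions: operators are STRIPS-like sets of propositions, the planner is a plain breadth-first search, the failure is detected at planning time rather than during execution, and the Discover step is a hand-written placeholder standing in for intrinsically motivated learning and symbolic abstraction. None of the names below (`Operator`, `plan`, `discover`, `discover_plan_act`, the predicates) come from the IMPACT code base.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """STRIPS-like operator (hypothetical stand-in for a PDDL action)."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset = frozenset()


def plan(state, goal, operators):
    """Breadth-first forward search; returns operator names or None."""
    start, goal = frozenset(state), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        current, path = frontier.popleft()
        if goal <= current:
            return path
        for op in operators:
            if op.pre <= current:
                nxt = (current - op.delete) | op.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [op.name]))
    return None


def discover(goal):
    """Placeholder Discover step: returns a pre-written operator, if any."""
    if "holding-rock" in goal:
        return Operator("grasp-from-edge",
                        frozenset({"arm-at-rock"}),
                        frozenset({"holding-rock"}))
    return None


def discover_plan_act(state, goal, operators):
    """Plan; on failure, enrich the domain via Discover and re-plan."""
    operators = list(operators)
    log = []
    solution = plan(state, goal, operators)
    if solution is None:
        log.append("plan failed: triggering Discover")
        new_op = discover(goal)
        if new_op is not None:
            operators.append(new_op)
            log.append(f"domain enriched with {new_op.name}")
            solution = plan(state, goal, operators)
    return solution, log


# The basic grasp needs the rock to fit the gripper, which never holds here.
BASIC_OPERATORS = [
    Operator("move-arm-to-rock", frozenset(), frozenset({"arm-at-rock"})),
    Operator("grasp",
             frozenset({"arm-at-rock", "rock-fits-gripper"}),
             frozenset({"holding-rock"})),
]
solution, log = discover_plan_act(set(), {"holding-rock"}, BASIC_OPERATORS)
print(solution)
print(log)
```

With the basic operators alone no plan exists; after the placeholder Discover step adds the new operator, the same planner finds a two-step plan, which mirrors the behaviour described for the Robot Arm scenario at a purely symbolic level.
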
Two different functionalities have been added to the classical three-layer architecture: (i) we have connected a goal-discovering and skill-learning robotic architecture (GRAIL, see [15]) to the symbolic abstraction procedure proposed in [8], creating a processing pipeline from the low-level direct interaction of the agent with the environment to the corresponding symbolic representation of the same environment; (ii) we have integrated a long-term autonomy manager (LTA-M) in the IMPACT architecture, which provides a set of strategies to coordinate the achievement of both extrinsic and intrinsic goals. Possible directions of future work are: (i) the integration of symbolic planning and open-ended learning to increase the ability of an agent to autonomously acquire new skills (bootstrap learning [3]); (ii) the use of a different simulation engine, for example the 3DROV environment [13] developed by ESA.

===References===
1. Bonasso, R.P., Firby, R.J., Gat, E., Kortenkamp, D., Miller, D.P., Slack, M.G.: Experiences with an architecture for intelligent, reactive agents. Journal of Experimental & Theoretical Artificial Intelligence 9(2-3), 237–256 (1997). https://doi.org/10.1080/095281397147103
2. Cashmore, M., Fox, M., Long, D., Magazzeni, D., Ridder, B., Carrera, A., Palomeras, N., Hurtos, N., Carreras, M.: ROSPlan: Planning in the robot operating system. In: ICAPS 2015, the 25th International Conference on Automated Planning and Scheduling (2015), https://www.aaai.org/ocs/index.php/ICAPS/ICAPS15/paper/view/10619
3. Doncieux, S., Filliat, D., Diaz-Rodriguez, N., Hospedales, T., Duro, R., Coninx, A., Roijers, D.M., Girard, B., Perrin, N., Sigaud, O.: Open-ended learning: A conceptual framework based on representational redescription. Frontiers in Neurorobotics 12, 59 (2018). https://doi.org/10.3389/fnbot.2018.00059, https://www.frontiersin.org/article/10.3389/fnbot.2018.00059
4. Gat, E.: Three-layer architectures. In: Kortenkamp, D., Bonasso, R.P., Murphy, R. (eds.) Artificial Intelligence and Mobile Robots, pp. 195–210. MIT Press, Cambridge, MA, USA (1998), http://dl.acm.org/citation.cfm?id=292092.292130
5. Hester, T., Stone, P.: Intrinsically motivated model learning for developing curious robots. Artificial Intelligence 247, 170–186 (2017). https://doi.org/10.1016/j.artint.2015.05.002, http://www.sciencedirect.com/science/article/pii/S0004370215000764, special issue on AI and Robotics
6. Ingrand, F., Lacroix, S., Lemai-Chenevier, S., Py, F.: Decisional autonomy of planetary rovers. Journal of Field Robotics 24(7), 559–580 (2007). https://doi.org/10.1002/rob.20206, https://onlinelibrary.wiley.com/doi/abs/10.1002/rob.20206
7. Koenig, N., Howard, A.: Design and use paradigms for Gazebo, an open-source multi-robot simulator. In: 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566). vol. 3, pp. 2149–2154 (Sept 2004). https://doi.org/10.1109/IROS.2004.1389727
8. Konidaris, G., Kaelbling, L.P., Lozano-Perez, T.: From skills to symbols: Learning symbolic representations for abstract high-level planning. Journal of Artificial Intelligence Research 61, 215–289 (2018), http://lis.csail.mit.edu/pubs/konidaris-jair18.pdf
9. Kunze, L., Hawes, N., Duckett, T., Hanheide, M., Krajník, T.: Artificial intelligence for long-term robot autonomy: A survey. IEEE Robotics and Automation Letters 3(4), 4023–4030 (Oct 2018). https://doi.org/10.1109/LRA.2018.2860628
10. McDermott, D., Ghallab, M., Howe, A., Knoblock, C., Ram, A., Veloso, M., Weld, D., Wilkins, D.: PDDL - The Planning Domain Definition Language. Tech. rep., CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control (1998), http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.212
11. Oddi, A., Baldassarre, G., Rasconi, R., Santucci, V.G., Sartor, G., Cartoni, E.: Sviluppo di architetture software innovative per l’autonomia in ambito spaziale. In: Workshop AI for Space, Ital-IA 2019, Convegno Nazionale CINI sull’Intelligenza Artificiale, Roma, 18 Marzo 2019 (03 2019)
12. Oddi, A., Rasconi, R., Cartoni, E., Sartor, G., Baldassarre, G., Santucci, V.G.: Learning high-level planning symbols from intrinsically motivated experience (2019), arXiv:1907.08313
13. Poulakis, P., Joudrier, L., Wailliez, S., Kapellos, K.: 3DROV: A planetary rover system design, simulation and verification tool. In: International Symposium on Artificial Intelligence, Robotics and Automation in Space, i-SAIRAS 2008, Hollywood, USA, February 26-29, 2008 (02 2008)
14. Quigley, M., Conley, K., Gerkey, B.P., Faust, J., Foote, T., Leibs, J., Wheeler, R., Ng, A.Y.: ROS: an open-source robot operating system. In: ICRA Workshop on Open Source Software (2009)
15. Santucci, V.G., Baldassarre, G., Mirolli, M.: GRAIL: A Goal-Discovering Robotic Architecture for Intrinsically-Motivated Learning. IEEE Transactions on Cognitive and Developmental Systems 8(3), 214–231 (Sept 2016). https://doi.org/10.1109/TCDS.2016.2538961
16. White, R.W.: Motivation reconsidered: The concept of competence. Psychological Review 66(5), 297–333 (1959). https://doi.org/10.1037/h0040934