Workshop on Multimodal Semantics for Robotic Systems (MuSRobS), IEEE/RSJ International Conference on Intelligent Robots and Systems 2015

Connecting natural language to task demonstrations and low-level control of industrial robots

Maj Stenmark, Jacek Malec
Dept. of Computer Science, Lund University, Sweden

Abstract— Industrial robotics is a complex domain, not easily amenable to formalization using semantic technologies. It involves such disparate aspects of the real world as geometry, dynamics, constraint satisfaction, planning and scheduling, real-time control, robot-robot and human-robot communication and, finally, the intentions of the robot user. Representing such different kinds of knowledge is a challenge, and research on combining these topics is only in its infancy. This paper describes our attempts to combine descriptions of robot tasks using natural language with their realizations on robot hardware involving force sensing, ultimately leading to the potential of learning new robot skills employing force-based assembly. We believe it is a novel approach, opening possibilities of semantic anchoring for learning from demonstration.

I. INTRODUCTION

Recent developments in robotics, artificial intelligence and cognitive science have led to bold predictions about the soon-to-come robotization of all aspects of human life. Robots will help the elderly, perform mundane jobs no one wants, drive our cars, fill our refrigerators when needed, tirelessly rehabilitate patients in need of physical exercise, fight our wars, become our sex partners, etc., etc. Some even draw the conclusion that robots will take over Earth and turn humans into obsolete pets.

However, when observing the development of the robotics field, we realize that this perspective is rather far, far away. Service robots are clumsy and unskilled, no one trusts a robotized car, and production still relies on simple manipulators programmed in a classical manner by skilled engineers. Any attempt to instruct a robot to perform a concrete manufacturing task consists of person-weeks of work by skilled system integrator engineers, who take into account the geometrical layout of the workcell and all the objects involved, including their geometry, physical properties and, last but not least, the purpose of the task. It is the implicit knowledge that needs to be transferred into robot code which makes this task so complex.

The use of semantic technologies has been advocated for at least a decade. Unfortunately, industrial robotics is a complex domain, not easily amenable to formalization. It involves such disparate aspects of the real world as geometry, dynamics (including forceful interaction with the work objects), constraints, planning, scheduling, optimization, real-time control, robot-robot and human-robot communication and, finally, the intentions of the robot user. Representing such different kinds of knowledge is a challenge, and research on combining these topics is only in its infancy.

In particular, we have devoted recent years to understanding and describing robotic assembly, including force-based operations (snap, drill, press, etc.), using a machine-readable formalism expressing the semantics of possible robot actions. Without it there will be no possibility of creating meaningful reasoning leading from the task specification (what needs to be manufactured) to task synthesis (how this can be achieved using the available robot skills) and robust execution by the synthesized code for a particular architecture, not to mention swift error handling in case of unexpected problems, portability of a robot skill from one robot to another, and learning of new skills.

In previous research, we have focused our attention on two areas: interaction between the user and the robotic system, preferably on the user's conditions, e.g., using natural language [1], [2], and representation of force-controlled assembly operations, particularly problematic due to the inherent mix of continuous and discrete aspects [3]. Besides being able to talk with the robot about a force-controlled assembly operation, we would like it to be learnt automatically from a demonstration and represented semantically in a manner enabling portability among different robots.

So far, these kinds of systems have been developed only in research laboratories. Our own research is done in the context of several EU projects, in particular ROSETTA, PRACE and SMErobotics, aiming at developing intelligent interactive systems suitable for inexperienced users, such as SMEs. Before they reach the factory floor, though, they need to be filled with sufficient production knowledge to become useful. Knowledge acquisition is a bottleneck in developing practical systems, as it can only happen while the system is used, but the system won't be useful before it is done: a classical chicken-and-egg problem. Therefore the only viable solution is a learning system, capable of sharing its experiences by storing them in a (possibly cloud-based) knowledge base [4] and using the experiences of other robots by importing and adapting their skills. However, such a solution requires a common understanding of the contents of this knowledge base, thus a commonly agreed-upon semantics.

The work on standardization of the robotics domain is already quite well advanced. There exist ontologies for specific domains, like service robotics or surgical robotics, and a core ontology for robotics and automation (CORA) recently standardized by IEEE [5]. However, they introduce concepts in symbolic form without properly connecting them to all their denotations, e.g., robot programs instantiating the skills named in these ontologies.
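The gap just described, symbolic concepts without concrete denotations, can be made tangible with a minimal sketch. The code below is purely illustrative (all names, fields and the tiny state machine are our hypothetical examples, not part of CORA or any standard): a symbolic skill concept only becomes useful once it is bound to denotations in several modalities, such as a natural-language description and a transition system.

```python
from dataclasses import dataclass, field

@dataclass
class SkillConcept:
    """A symbolic skill name bound to denotations in several modalities.

    All field names are illustrative, not taken from any published ontology."""
    name: str
    nl_description: str = ""                          # natural-language modality
    transitions: dict = field(default_factory=dict)   # state-machine modality
    robot_code: str = ""                              # executable modality

# A tiny "ontology": a concept alone carries no denotations ...
ontology = {"SnapFit": SkillConcept(name="SnapFit")}

# ... until it is anchored in concrete representations.
ontology["SnapFit"].nl_description = "Press the switch into the box until it snaps."
ontology["SnapFit"].transitions = {
    ("approach", "contact_force_z"): "search_y",
    ("search_y", "contact_force_y"): "search_x",
    ("search_x", "snap_detected"): "done",
}

def grounded(concept: SkillConcept) -> bool:
    """A concept counts as grounded if at least one non-symbolic denotation exists."""
    return bool(concept.nl_description or concept.transitions or concept.robot_code)

print(grounded(ontology["SnapFit"]))
```

A reasoner operating over such a structure can switch between the modalities of one concept, which is the role the multiple representations play in the system described below.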
Our work addresses this problem by providing concrete denotations belonging to several modalities. As we have mentioned before, we describe robot actions using natural language, using assembly graphs, using transition systems, using the iTaSC formalism, and using the actual robot code. These multiple modalities co-exist in one system, letting the reasoner switch between representations when such a need arises.

Learning from demonstration leads to new problems in semantic anchoring of robot actions, as there is no obvious, apparent meaning in robot movements. Semantics may be either guessed, derived by inductive reasoning, or attributed post factum by humans via some form of annotation. In particular, force-based assembly is problematic, as quite often the difference between success and failure depends on a particular profile of the force signal. So far, this issue has been approached using sensor fusion techniques, without direct support from semantics. Our work attempts to remedy this situation, introducing natural language into the picture and letting assembly be not only detected via sensor readings, but also simultaneously told about.

II. RELATED WORK

In the domain of service robotics, there are some interesting frameworks for representing household tasks and environments. KnowRob [6] is a knowledge processing system that combines declarative and procedural knowledge from multiple sources, e.g., the RoboEarth [7] database and web sites. A similar project is RoboHow [8], which developed the knowledge-based reasoning service OpenEASE [9] and attempts to bridge the gap from symbolic planning to constraint-based control [10]. Ontologies for kit-building applications for industrial robots have been developed by Balakirsky et al. [11], and Carbonera et al. [12] developed an ontology for positions. We have already mentioned the standardization work of the IEEE Working Group ORA [5].

We are interested in integrating low-level statistical task representations taken from demonstrations. Such tasks can be represented by a trajectory or a force profile. The trajectories can be extracted from the demonstration by first applying segmentation algorithms and then parameterizing each segment as a trajectory. Niekum et al. [13], [14] use Beta Process Autoregressive Hidden Markov Models from Fox et al. [15] to automatically segment demonstrations and dynamic movement primitives (DMPs) [16] to represent the trajectories. Since the statistical properties of semantically different sub-tasks can be similar, they use predecessor states to refine the classification and determine the transitions in a finite state machine. Other learning methods are, for example, reinforcement learning, used by Metzen et al. [17] to learn skill templates, and Iterative Learning Control, used by Nemec et al. [18] to follow demonstrated force profiles.

One way to annotate objects and actions is to describe them using natural language. Matuszek et al. [19], Kollar et al. [20] and Landsiedel et al. [21] use natural language to describe routes, and Walter et al. [22] use language descriptions to semantically annotate maps. She et al. [23] study the dialogue system, while Cakmak [24] evaluates methods for teaching operators how to interact with a robot using kinesthetic teaching and dialogue.

Please note that our understanding of the term multimodal semantics differs from the one quite commonly encountered in the literature, see e.g. [25], where the authors aim at finding the meaning of a particular text fragment using a statistical approach grounded both in text and image corpora. However, there is no attempt to use this semantics in the reverse direction, to generate new utterances (that our robot programs would correspond to).

III. CURRENT WORK

The focus of our current work is to semantically annotate task demonstrations to enable reuse and reasoning. This involves annotating log data with quantities, units, and task states. The logs can then be used to identify force/torque and position constraints and application-specific parameter values (positions, velocities, stiffness, etc.). The demonstrations are used to segment the task into different sub-skills and to extract parameters for each skill. One approach is to describe the trajectory of the sub-skills using DMPs and then parameterize the primitives and describe them with, for example, skill type, preconditions, and postconditions.

As an example, when demonstrating picking and placing of an object, the task can be segmented into different sub-skills. First the robot approaches the object, opens the gripper, moves into the pick position, closes the gripper, retracts from the surface, moves to the place position (perhaps using via positions as well), positions the object correctly, releases it and retracts. Each segment can be described using a trajectory (e.g., a DMP) in some reference frame together with a gripper state. Multiple demonstrations can be used for each sub-skill in order to detect, for example, the relevant reference frame and the allowed gripping poses. To enable reuse, we are working on annotating the segments with allowed initial positions and gripper state, a skill type and postconditions. This will allow the planner to add required actions before or after the skill and to add error-handling procedures to the task (e.g., if the robot drops the object when transporting it, the object should be localized and picked up again). The skill also has to be parameterized so that it can be initialized correctly, for example by specifying the controller, reference frames, and velocity values.

Another example is force-controlled assembly. Here the force data is not used for sensor fusion; it is used to control the motions of the robot, and it signals failure or success of the assembly. In a snap-fit assembly skill, where two plastic pieces, a switch and a box shown in Fig. 1, are "snapped" together, the force signature indicates whether the snap occurred or not. In previous work [1] such a task could be expressed using the force constraint directly in guarded motions. Using a graphical user interface, primitive actions and skills could be combined into a sequence. An example is shown in Fig. 2. In the sequence, the box is first picked and placed on a fixture using three search motions. The first motion moves the robot down until it feels contact forces in the z-direction; then, while pressing down, it searches in the y-direction until contact with the wall; finally, it searches in the x-direction while simultaneously pressing down and towards the wall. In the sequence, pickbox, movetofixt, pickswitch and retract are position-based motions running on the native robot controller. The snapFitSkill is a reused skill, which in turn contains multiple guarded searches. From the graphical representation, the skill specification can be exported to an XML format (see the excerpt in Fig. 3) and to a runnable format, see Fig. 4. The skills are semantically annotated with sensor and controller type, and the parameters are described with units.

Fig. 1. A part of a box to the left and a switch to the right.

Fig. 2. An example of the user interface.

Fig. 3. The XML representation of the three guarded search motions created in the GUI.

Fig. 4. Part of the executable state machine.

In another assembly, a rectangular metal plate (a shield can) is inserted on a printed circuit board (PCB). The PCB is attached in a fixture, which is attached to a force sensor. The assembly starts by tilting the shield can above the PCB (see Fig. 5) and moving down until the corner touches the board. Then, the robot attaches one corner of the plate to a corner on the PCB and rotates the plate into place. The rotation is first carried out around the xy-plane of the PCB until either the long or the short side of the rectangle touches the PCB; then the last side has to be rotated into place. That is, if the longer side of the rectangle is parallel with the x-axis and a rotation around an xy-vector from the initial tilted position will align it with the PCB, the execution will branch into a rotation around the x-axis until the short side is aligned with the PCB. Otherwise, the rotation will align the short side first, as seen in Fig. 6.

Fig. 6. The short side is aligned with the PCB.

To lower the threshold for the user, we want to use natural language dialogues to describe the demonstration and extend the task. Together with the parameterized demonstrations, this will allow the user to use high-level structures such as loops and if-then-else statements, which are easily described using language but tedious or difficult to describe using demonstrations only. In our current system, the user can instruct the robot using unstructured text or dictate the task using Google dictation tools. An example instruction is displayed in Fig. 7. All parameters have default values, which makes the high-level nominal task easy and fast to generate from text. The programming interface uses language-specific statistical tools to extract the semantics of the sentences, followed by a rule-based mapping to robot skills and world objects. At the moment, English and Swedish [26] are the supported programming languages.
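The guarded search motions described above can be sketched as a simple control loop: step along an axis until the measured contact force exceeds a threshold. The sketch below is our illustrative reconstruction, not the system's actual controller code; the force-sensor and motion interfaces (`read_force`, `move`) are hypothetical, and a 1-D simulated contact stands in for the real workcell.

```python
def guarded_move(step, read_force, move, axis, direction, threshold, max_steps=1000):
    """Step along `axis` until the contact force along that axis exceeds
    `threshold` (a guarded search), or give up after `max_steps` steps."""
    for _ in range(max_steps):
        if abs(read_force(axis)) >= threshold:
            return True          # contact detected: the guard triggered
        move(axis, direction * step)
    return False                 # no contact within max_steps: report failure

# Example with a simulated 1-D world: a rigid surface at z = -10.0 mm.
pos = {"x": 0.0, "y": 0.0, "z": 0.0}
WALL_Z = -10.0

def read_force(axis):
    # Simulated stiff contact: a force appears once the tool reaches the surface.
    if axis == "z" and pos["z"] <= WALL_Z:
        return 5.0  # N
    return 0.0

def move(axis, delta):
    pos[axis] += delta

# Search downwards (negative z) until a 2 N contact force is felt.
ok = guarded_move(step=0.5, read_force=read_force, move=move,
                  axis="z", direction=-1, threshold=2.0)
print(ok, pos["z"])
```

Chaining three such calls (z, then y, then x, with the pressing directions held by the force controller) mirrors the three search motions of the box-placing sequence; in the real system the loop runs inside the external force controller rather than at this level.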
The guarded searches shown in Fig. 4 run on an external force controller (ExtCtrl), which has to be turned on and off around the execution of the force-controlled skills.

Fig. 5. In the initial position, the shield can is tilted above the PCB.

Fig. 7. The user can instruct the robot using unstructured natural language commands.

IV. CONCLUSIONS

The immediate future work involves investigating how to teach pre- and postconditions for skills learned from demonstration, to enable online reasoning. These conditions need to be anchored in sensor readings. Inductive inference is one possibility; another is to use mixed-initiative dialogue with the user, asking for guidance or confirmation; yet another is to introduce an annotation tool to be used simultaneously with the learning procedure.

It is desirable to have natural language support at all levels of the system. At the moment, we only support task instruction, but we also want to be able to describe the world and connect the perceived objects and situations to (new) semantic symbols, e.g., saying "This is a nut" after teaching the camera system to recognize an object, or describing a pallet as "empty". At the moment, the robot is a passive participant in the dialogue, only reacting to commands from the human. When interacting with non-expert users, the robot should ask questions and come with suggestions on what to do.

The next step is to introduce the possibility of extending the robot's knowledge by adding new concepts to the semantic hierarchy. This is a more complex task than the previous one, as it involves inducing relations with existing concepts and properly placing the new symbol in the IsA hierarchy.

Yet another interesting problem is to reason about "synonyms" among robot programs, i.e., syntactically different structures or programs leading to the same effect. A simple example is a "localize and pick" task that may use different kinds of sensors to localize an object, while the goal (of picking the object from its current location) is achieved irrespective of which concrete sensor is used. How do we teach the system that two skills are equivalent in such (or some other) sense? What needs to be told? What kind of reasoning is performed?

Representing knowledge about industrial processes involving semantically-capable robots is a challenge leading to fascinating questions. We are quite sure we will have a lot to do in the years to come.

ACKNOWLEDGMENTS

The research leading to these results has received partial funding from the European Union's Seventh Framework Programme under grant agreement No. 287787 (project SMErobotics) and from the European Union's H2020 programme under grant agreement No. 644938 (project SARAFun).

REFERENCES

[1] M. Stenmark, "Instructing industrial robots using high-level task descriptions," Licentiate thesis, Lund University, Department of Computer Science, Mar. 2015.
[2] M. Stenmark and J. Malec, "Knowledge-based instruction of manipulation tasks for industrial robotics," Robotics and Computer-Integrated Manufacturing, vol. 33, pp. 56-67, 2015. [Online]. Available: http://lup.lub.lu.se/record/4679243/file/4679245.pdf
[3] J. Malec, K. Nilsson, and H. Bruyninckx, "Describing assembly tasks in declarative way," in Proc. IEEE ICRA 2013 Workshop on Semantics, Identification and Control of Robot-Human-Environment Interaction, Karlsruhe, Germany, May 2013, pp. 50-53.
[4] M. Stenmark, J. Malec, K. Nilsson, and A. Robertsson, "On distributed knowledge bases for robotized small-batch assembly," IEEE Transactions on Automation Science and Engineering, vol. 12, no. 2, pp. 519-528, 2015. [Online]. Available: http://dx.doi.org/10.1109/TASE.2015.2408264
[5] "IEEE standard ontologies for robotics and automation," IEEE Standard 1872-2015, 2015.
[6] M. Tenorth and M. Beetz, "KnowRob: A knowledge processing infrastructure for cognition-enabled robots," The International Journal of Robotics Research, vol. 32, no. 5, pp. 566-590, 2013.
[7] M. Tenorth, A. Perzylo, R. Lafrenz, and M. Beetz, "Representation and exchange of knowledge about actions, objects, and environments in the RoboEarth framework," IEEE Transactions on Automation Science and Engineering, vol. 10, no. 3, pp. 643-651, July 2013.
[8] M. Tenorth, G. Bartels, and M. Beetz, "Knowledge-based specification of robot motions," in Proceedings of the European Conference on Artificial Intelligence (ECAI), 2014.
[9] M. Beetz, M. Tenorth, and J. Winkler, "Open-EASE - a knowledge processing service for robots and robotics/AI researchers," in IEEE International Conference on Robotics and Automation (ICRA), Seattle, Washington, USA, 2015.
[10] E. Scioni, G. Borghesan, H. Bruyninckx, and M. Bonfe, "Bridging the gap between discrete symbolic planning and optimization-based robot control," in 2015 IEEE International Conference on Robotics and Automation, 2015.
[11] S. Balakirsky, Z. Kootbally, C. Schlenoff, T. Kramer, and S. Gupta, "An industrial robotic knowledge representation for kit building applications," in Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, Oct. 2012, pp. 1365-1370.
[12] J. Carbonera, S. Rama Fiorini, E. Prestes, V. Jorge, M. Abel, R. Madhavan, A. Locoro, P. Goncalves, T. Haidegger, M. Barreto, and C. Schlenoff, "Defining positioning in a core ontology for robotics," in Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on, Nov. 2013, pp. 1867-1872.
[13] S. Niekum, S. Chitta, A. Barto, B. Marthi, and S. Osentoski, "Incremental semantically grounded learning from demonstration," Berlin, Germany, 2013.
[14] S. Niekum, S. Osentoski, G. Konidaris, S. Chitta, B. Marthi, and A. G. Barto, "Learning grounded finite-state representations from unstructured demonstrations," The International Journal of Robotics Research, vol. 34, no. 2, pp. 131-157, 2015.
[15] E. B. Fox, M. I. Jordan, E. B. Sudderth, and A. S. Willsky, "Sharing features among dynamical systems with beta processes," in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, and A. Culotta, Eds. Curran Associates, Inc., 2009, pp. 549-557.
[16] A. J. Ijspeert, J. Nakanishi, and S. Schaal, "Learning attractor landscapes for learning motor primitives," in Advances in Neural Information Processing Systems 15 (NIPS 2002), 2002, pp. 1547-1554.
[17] J. Metzen, A. Fabisch, L. Senger, J. de Gea Fernández, and E. Kirchner, "Towards learning of generic skills for robotic manipulation," KI - Künstliche Intelligenz, vol. 28, no. 1, pp. 15-20, 2014.
[18] B. Nemec, F. Abu-Dakka, B. Ridge, A. Ude, J. Jorgensen, T. Savarimuthu, J. Jouffroy, H. Petersen, and N. Kruger, "Transfer of assembly operations to new workpiece poses by adaptation to the desired force profile," in Advanced Robotics (ICAR), 2013 16th International Conference on, Nov. 2013, pp. 1-7.
[19] C. Matuszek, E. Herbst, L. Zettlemoyer, and D. Fox, "Learning to parse natural language commands to a robot control system," in Experimental Robotics, ser. Springer Tracts in Advanced Robotics. Springer International Publishing, 2013, vol. 88, pp. 403-415.
[20] T. Kollar, S. Tellex, D. Roy, and N. Roy, "Grounding verbs of motion in natural language commands to robots," in Experimental Robotics, ser. Springer Tracts in Advanced Robotics. Springer Berlin Heidelberg, 2014, vol. 79, pp. 31-47.
[21] C. Landsiedel, R. de Nijs, K. Kühnlenz, D. Wollherr, and M. Buss, "Route description interpretation on automatically labeled robot maps," in Proceedings of the International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 2013, pp. 2251-2256.
[22] M. R. Walter, S. Hemachandra, B. Homberg, S. Tellex, and S. Teller, "Learning semantic maps from natural language descriptions," in Proceedings of the 2013 Robotics: Science and Systems IX Conference, Berlin, Germany, 2013.
[23] L. She, S. Yang, Y. Cheng, Y. Jia, J. Chai, and N. Xi, "Back to the blocks world: Learning new actions through situated human-robot dialogue," in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL). Philadelphia, PA, USA: Association for Computational Linguistics, 2014, pp. 89-97.
[24] M. Cakmak and L. Takayama, "Teaching people how to teach robots: The effect of instructional materials and dialog design," in International Conference on Human-Robot Interaction (HRI), Bielefeld, Germany, Mar. 2014.
[25] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," J. Artif. Int. Res., vol. 49, no. 1, pp. 1-47, Jan. 2014.
[26] M. Stenmark, "Bilingual robots: Extracting robot program statements from Swedish natural language instructions," in Proc. of The 13th Scandinavian Conf. on Artificial Intelligence, Halmstad, Sweden, 2015.