=Paper=
{{Paper
|id=Vol-1540/paper_05
|storemode=property
|title=Connecting Natural Language to Task Demonstrations and Low-level Control of Industrial Robots
|pdfUrl=https://ceur-ws.org/Vol-1540/paper_05.pdf
|volume=Vol-1540
|dblpUrl=https://dblp.org/rec/conf/iros/StenmarkM15
}}
==Connecting Natural Language to Task Demonstrations and Low-level Control of Industrial Robots==
Workshop on Multimodal Semantics for Robotic Systems (MuSRobS)
IEEE/RSJ International Conference on Intelligent Robots and Systems 2015
Maj Stenmark, Jacek Malec
Dept. of Computer Science, Lund University, Sweden
Abstract: Industrial robotics is a complex domain, not easily amenable to formalization using semantic technologies. It involves such disparate aspects of the real world as geometry, dynamics, constraint satisfaction, planning and scheduling, real-time control, robot-robot and human-robot communication and, finally, the intentions of the robot user. Representing such different kinds of knowledge is a challenge, and the research on combining those topics is only in its infancy. This paper describes our attempts to combine descriptions of robot tasks using natural language with their realizations on robot hardware involving force sensing, ultimately leading to the potential of learning new robot skills in force-based assembly. We believe it is a novel approach that opens possibilities of semantic anchoring for learning from demonstration.

I. INTRODUCTION

Recent developments in robotics, artificial intelligence and cognitive science lead to bold predictions about the soon-to-come robotization of all aspects of human life. Robots will help the elderly, perform mundane jobs no one wants, drive our cars, fill our refrigerators when needed, tirelessly rehabilitate patients in need of physical exercise, fight our wars, become our sex partners, etc. Some even draw the conclusion that robots will take over Earth and turn humans into obsolete pets.

However, when observing the development of the robotics field, we realize that this perspective is rather far away. Service robots are clumsy and unskilled, no one trusts a robotized car, and production still relies on simple manipulators programmed in a classical manner by skilled engineers. Any attempt to instruct a robot to perform a concrete manufacturing task costs person-weeks of work by skilled system-integrator engineers, who take into account the geometrical layout of the workcell and all the objects involved, including their geometry, physical properties and, last but not least, the purpose of the task. It is this implicit knowledge, which needs to be transferred into robot code, that makes the task so complex.

The use of semantic technologies has been advocated for at least a decade. Unfortunately, industrial robotics is a complex domain, not easily amenable to formalization. It involves such disparate aspects of the real world as geometry, dynamics (including forceful interaction with the work objects), constraints, planning, scheduling, optimization, real-time control, robot-robot and human-robot communication and, finally, the intentions of the robot user. Representing such different kinds of knowledge is a challenge, and the research on combining those topics is only in its infancy.

In particular, we have devoted the last years to understanding and describing robotic assembly, including force-based operations (snap, drill, press, etc.), using a machine-readable formalism expressing the semantics of possible robot actions. Without it there is no possibility to create meaningful reasoning leading from the task specification (what needs to be manufactured) to task synthesis (how it can be achieved using the available robot skills) and robust execution of the synthesized code on a particular architecture, not to mention swift error handling in case of unexpected problems, portability of a robot skill from one robot to another, and learning of new skills.

In previous research, we have focused our attention on two areas: interaction between the user and the robotic system, preferably on the user's conditions, e.g., using natural language [1], [2], and representation of force-controlled assembly operations, which are particularly problematic due to their inherent mix of continuous and discrete aspects [3]. Besides being able to talk with the robot about a force-controlled assembly operation, we would like such an operation to be learnt automatically from a demonstration and to be represented semantically in a manner enabling portability among different robots.

So far, these kinds of systems have been developed only in research laboratories. Our own research is done in the context of several EU projects, in particular ROSETTA, PRACE and SMErobotics, aiming at developing intelligent interactive systems suitable for inexperienced users, such as SMEs. Before they reach the factory floor, though, they need to be filled with sufficient production knowledge to become useful. Knowledge acquisition is a bottleneck in developing practical systems, as it can only happen while the system is used, but the system will not be useful before it is done: a classical chicken-and-egg problem. Therefore, the only viable solution is a learning system, capable of sharing its experiences by storing them in a (possibly cloud-based) knowledge base [4] and of using the experiences of other robots by importing and adapting their skills. However, such a solution requires a common understanding of the contents of this knowledge base, thus a commonly agreed-upon semantics.

The work on standardization of the robotics domain is already quite well advanced. There exist ontologies for specific domains, like service robotics or surgical robotics, and a core ontology for robotics and automation (CORA), recently standardized by IEEE [5]. However, these introduce concepts in symbolic form without properly connecting them to all their denotations, e.g., the robot programs instantiating the skills named in the ontologies. Our work addresses this problem by
providing concrete denotations belonging to several modalities. As we have mentioned before, we describe robot actions using natural language, using assembly graphs, using transition systems, using the iTaSC formalism, and using the actual robot code. These multiple modalities co-exist in one system, letting the reasoner switch between representations when the need arises.

Learning from demonstration leads to new problems in the semantic anchoring of robot actions, as there is no obvious, apparent meaning in robot movements. Semantics may be either guessed, derived by inductive reasoning, or attributed post factum by humans via some form of annotation. In particular, force-based assembly is problematic, as quite often the difference between success and failure depends on a particular profile of the force signal. So far, this issue has been approached using sensor fusion techniques, without direct support from semantics. Our work attempts to remedy this situation by introducing natural language into the picture, letting an assembly be not only detected via sensor readings but also simultaneously told about.

II. RELATED WORK

In the domain of service robotics, there are some interesting frameworks for the representation of household tasks and environments. KnowRob [6] is a knowledge processing system that combines declarative and procedural knowledge from multiple sources, e.g., the RoboEarth [7] database and web sites. A similar project is RoboHow [8], which developed a knowledge-based reasoning service, OpenEASE [9], and attempts at bridging the gap from symbolic planning to constraint-based control [10]. Ontologies for kit-building applications for industrial robots have been developed by Balakirsky et al. [11], and Carbonera et al. [12] developed an ontology for positions. We have already mentioned the standardization work of the IEEE Working Group ORA [5].

We are interested in integrating low-level statistical task representations taken from demonstrations. Such tasks can be represented by a trajectory or a force profile. The trajectories can be extracted from the demonstration by first applying segmentation algorithms and then parameterizing each segment as a trajectory. Niekum et al. [13], [14] use Beta Process Autoregressive Hidden Markov Models from Fox et al. [15] to automatically segment demonstrations, and dynamic movement primitives (DMPs) [16] to represent the trajectories. Since the statistical properties of semantically different sub-tasks can be similar, they use predecessor states to refine the classification and determine the transitions in a finite state machine. Other learning methods are, for example, reinforcement learning, used by Metzen et al. [17] to learn skill templates, and Iterative Learning Control, used by Nemec et al. [18] to follow demonstrated force profiles.

One way to annotate objects and actions is to describe them using natural language. Matuszek et al. [19], Kollar et al. [20] and Landsiedel et al. [21] use natural language to describe routes, and Walter et al. [22] use language descriptions to semantically annotate maps. She et al. [23] study a dialogue system, while Cakmak and Takayama [24] evaluate methods for teaching operators how to interact with a robot using kinesthetic teaching and dialogue.

Please note that our understanding of the term multimodal semantics differs from the one quite commonly encountered in the literature, see e.g. [25], where the authors aim at finding the meaning of a particular text fragment using a statistical approach grounded both in text and image corpora. However, there is no attempt there to use this semantics in the reverse direction, to generate new utterances (that our robot programs would correspond to).

III. CURRENT WORK

The focus of our current work is to semantically annotate task demonstrations to enable reuse and reasoning. This involves annotating log data with quantities, units, and task states. The logs can then be used to identify force/torque and position constraints as well as application-specific parameter values (positions, velocity, stiffness, etc.). The demonstrations are used to segment the task into different sub-skills and to extract parameters for each skill. One approach is to describe the trajectory of each sub-skill using DMPs and then to parameterize the primitives and describe them with, for example, a skill type, preconditions, and postconditions.
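To make the primitive representation concrete, the following is a minimal one-dimensional sketch of a discrete DMP in the style of Ijspeert et al. [16]. The class, its parameter defaults and the regression step are our illustrative choices, not the implementation used in the system described here.

```python
import numpy as np

# Minimal 1-D discrete DMP sketch (after Ijspeert et al. [16]); illustrative only.
class DMP:
    def __init__(self, n_basis=20, alpha_z=25.0, alpha_x=4.0):
        self.alpha_z, self.alpha_x = alpha_z, alpha_x
        self.beta_z = alpha_z / 4.0                              # critical damping
        self.c = np.exp(-alpha_x * np.linspace(0, 1, n_basis))   # basis centres
        self.h = 1.0 / np.gradient(self.c) ** 2                  # basis widths
        self.w = np.zeros(n_basis)

    def fit(self, y, dt):
        """Learn forcing-term weights from one demonstrated trajectory y(t)."""
        self.y0, self.g, self.tau = y[0], y[-1], dt * (len(y) - 1)
        yd = np.gradient(y, dt)
        ydd = np.gradient(yd, dt)
        x = np.exp(-self.alpha_x * np.linspace(0, 1, len(y)))    # canonical phase
        f_target = (self.tau ** 2 * ydd
                    - self.alpha_z * (self.beta_z * (self.g - y) - self.tau * yd))
        psi = np.exp(-self.h * (x[:, None] - self.c) ** 2)       # basis activations
        xi = x * (self.g - self.y0)                              # forcing-term scale
        # locally weighted regression, one weight per basis function
        self.w = ((psi * (xi * f_target)[:, None]).sum(0)
                  / ((psi * (xi ** 2)[:, None]).sum(0) + 1e-10))

    def rollout(self, dt):
        """Integrate the learned system and return the reproduced trajectory."""
        y, z, x, out = self.y0, 0.0, 1.0, []
        for _ in range(int(self.tau / dt)):
            psi = np.exp(-self.h * (x - self.c) ** 2)
            f = psi @ self.w / (psi.sum() + 1e-10) * x * (self.g - self.y0)
            z += dt / self.tau * (self.alpha_z * (self.beta_z * (self.g - y) - z) + f)
            y += dt / self.tau * z
            x += -dt / self.tau * self.alpha_x * x
            out.append(y)
        return np.array(out)

# Example: learn from a synthetic demonstration sampled at 100 Hz and replay it.
demo = np.sin(np.linspace(0, np.pi / 2, 200))
dmp = DMP()
dmp.fit(demo, dt=0.01)
reproduced = dmp.rollout(dt=0.01)
```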
As an example, when demonstrating the picking and placing of an object, the task can be segmented into different sub-skills. First the robot approaches the object, opens the gripper, moves into the pick position, closes the gripper, retracts from the surface, moves to the place position (perhaps using via positions as well), positions the object correctly, releases it and retracts. Each segment can be described using a trajectory (e.g., a DMP) in some reference frame together with a gripper state. Multiple demonstrations can be used for each sub-skill in order to detect, for example, the relevant reference frame and the allowed gripping poses. To enable reuse, we are working on annotating the segments with allowed initial start positions and gripper state, a skill type and postconditions. This will allow the planner to add required actions before or after the skill and to add error-handling procedures to the task (e.g., if the robot drops the object when transporting it, the object should be localized and picked up again). The skill also has to be parameterized so that it can be initialized correctly, specifying, for example, the controller, reference frames, and velocity values.
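Such segment annotations could be represented along the following lines; this is a minimal sketch in which all field names and values are hypothetical rather than the system's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical annotation record for one demonstrated segment; the field names
# are illustrative, not the schema used in the actual system.
@dataclass
class SkillSegment:
    skill_type: str                      # e.g. "approach", "grasp", "retract"
    reference_frame: str                 # frame the DMP trajectory is expressed in
    gripper_state: str                   # "open" | "closed"
    preconditions: list = field(default_factory=list)
    postconditions: list = field(default_factory=list)
    parameters: dict = field(default_factory=dict)   # controller, velocity, ...

# A fragment of an annotated pick task, as segmented from a demonstration.
pick = [
    SkillSegment("approach", "object_frame", "open",
                 postconditions=["at_pick_pose"]),
    SkillSegment("grasp", "object_frame", "closed",
                 preconditions=["at_pick_pose"],
                 postconditions=["holding(object)"],
                 parameters={"controller": "position", "velocity": 0.05}),
]
```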
Another example is force-controlled assembly. Here the force data is not used for sensor fusion; it is used to control the motions of the robot and to signal failure or success of the assembly. In a snap-fit assembly skill, where two plastic pieces, a switch and a box shown in Fig. 1, are "snapped" together, the force signature indicates whether the snap occurred or not. In previous work [1] such a task could be expressed using the force constraint directly in guarded motions. Using a graphical user interface, primitive actions and skills could be combined into a sequence; an example is shown in Fig. 2. In the sequence, the box is first picked and placed on a fixture using three search motions. The first motion moves the robot down until it feels contact forces in the z-direction; then, while pressing down, it searches in
the y-direction until contact with the wall; finally, it searches in the x-direction while simultaneously pressing down and towards the wall. In the sequence, pickbox, movetofixt, pickswitch and retract are position-based motions running on the native robot controller. The snapFitSkill is a reused skill, which in turn contains multiple guarded searches. From the graphical representation, the skill specification can be exported to an XML format (see the excerpt in Fig. 3) and to a runnable format, see Fig. 4. The skills are semantically annotated with sensor and controller type, and the parameters are described with units.

Fig. 1. A part of a box to the left and a switch to the right.

Fig. 2. An example of the user interface.

Fig. 3. The XML representation of the three guarded search motions created in the GUI.
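The guarded search motions referred to above can be pictured as follows; this is a minimal sketch assuming a velocity-controlled robot interface with a wrist force/torque sensor. The robot object and its methods (read_force, set_velocity, stop, wait) are hypothetical placeholders, not the actual controller API.

```python
import numpy as np

def guarded_search(robot, direction, speed=0.01, f_threshold=5.0, timeout=10.0):
    """Move along `direction` until the contact-force component along it
    exceeds `f_threshold` (newtons); return True on contact, False on timeout."""
    direction = np.asarray(direction, dtype=float)
    direction /= np.linalg.norm(direction)
    t, dt = 0.0, 0.008                         # ~125 Hz supervision loop
    while t < timeout:
        f = robot.read_force()                 # 3-vector in the tool frame (hypothetical)
        if abs(float(np.dot(f, direction))) > f_threshold:
            robot.stop()
            return True                        # contact detected: search succeeded
        robot.set_velocity(speed * direction)  # keep moving along the search direction
        robot.wait(dt)                         # hypothetical: sleep one control period
        t += dt
    robot.stop()
    return False                               # no contact within timeout: failure

# The snap-fit placement described above would chain three such searches:
# down in -z, then -y until the wall, then -x while pressing down and sideways.
```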
In another assembly, a rectangular metal plate (a shield can) is inserted onto a printed circuit board (PCB). The PCB is attached to a fixture, which is attached to a force sensor. The assembly starts by tilting the shield can above the PCB (see Fig. 5) and moving down until a corner touches the board. Then, the robot attaches one corner of the plate to a corner of the PCB and rotates the plate into place. The rotation is first carried out around the xy-plane of the PCB until either the long or the short side of the rectangle touches the PCB; then the last side has to be rotated into place. That is, if the longer side of the rectangle is parallel with the x-axis and the rotation around an xy-vector from the initial tilted position will align it with the PCB, the execution will branch into a rotation around the x-axis until the short side is aligned with the PCB. Otherwise, the rotation will align the short side first, as seen in Fig. 6.
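The branch just described can be summarized in control-flow form. This is only a schematic sketch: every function below is a hypothetical placeholder for the corresponding guarded or position-based motion, and only the branching structure is taken from the text above.

```python
def insert_shield_can(robot, can, pcb):
    tilt_above(robot, can, pcb)               # initial tilted pose (Fig. 5)
    guarded_search(robot, [0.0, 0.0, -1.0])   # move down until a corner touches
    attach_corner(robot, can, pcb)
    rotate_about_xy_vector(robot)             # rotate until one side touches the PCB
    if long_side_aligned_first(can, pcb):     # long side parallel to the x-axis
        rotate_about_axis(robot, "x")         # then align the short side (Fig. 6)
    else:
        rotate_about_axis(robot, "y")         # short side landed first; align long side
```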
To lower the threshold for the user, we want to use natural language dialogues to describe the demonstration and to extend the task. Together with the parameterized demonstrations, this will allow the user to use high-level structures such as loops and if-then-else statements, which are easily described using language but tedious or difficult to describe using demonstrations only. In our current system, the user can instruct the robot using unstructured text or dictate the task using Google dictation tools. An example instruction is displayed in Fig. 7. All parameters have default values, which makes the high-level nominal task easy and fast to generate from text. The programming interface uses language-specific statistical tools to extract the semantics of the sentences, and then a rule-based mapping to robot skills and world objects. At the moment, English and Swedish [26] are the supported programming languages.
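As a rough illustration of the second, rule-based stage, the sketch below maps a tokenized instruction to a skill invocation. The verb lexicon, skill names and object vocabulary are hypothetical, and the actual system extracts sentence semantics with statistical language tools rather than the keyword matching shown here.

```python
# Hypothetical lexicon and world model for illustration only.
VERB_TO_SKILL = {"pick": "PickSkill", "place": "PlaceSkill", "snap": "SnapFitSkill"}
WORLD_OBJECTS = {"box", "switch", "pcb", "fixture"}

def map_instruction(tokens):
    """Map a tokenized instruction such as ['pick', 'the', 'box'] to a
    (skill, parameters) pair; unspecified parameters keep their defaults."""
    skill = next((VERB_TO_SKILL[t] for t in tokens if t in VERB_TO_SKILL), None)
    target = next((t for t in tokens if t in WORLD_OBJECTS), None)
    if skill is None or target is None:
        raise ValueError("could not ground instruction: " + " ".join(tokens))
    return skill, {"target": target}

print(map_instruction("pick the box".split()))  # ('PickSkill', {'target': 'box'})
```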
Fig. 4. Part of the executable state machine. The guarded searches run on an external force controller (ExtCtrl), which has to be turned on and off before the force-controlled skills are executed.

Fig. 5. In the initial position, the shield can is tilted above the PCB.

Fig. 6. The short side is aligned with the PCB.

Fig. 7. The user can instruct the robot using unstructured natural language commands.

IV. CONCLUSIONS

The immediate future work involves investigating how to teach pre- and postconditions for skills learned from demonstration, to enable online reasoning. These conditions need to be anchored in sensor readings. Inductive inference is one possibility; another is to use mixed-initiative dialogue with the user, asking for guidance or confirmation; yet another is to introduce an annotation tool to be used simultaneously with the learning procedure.

It is desirable to have natural language support on all levels of the system. At the moment, we only support task instruction, but we also want to be able to describe the world and connect the perceived objects and situations to (new) semantic symbols, e.g., saying "This is a nut" after teaching the camera system to recognize an object, or describing a pallet as "empty". At the moment, the robot is a passive participant in the dialogue, only reacting to commands from the human. When interacting with non-expert users, the robot should ask questions and come up with suggestions on what to do.

The next step is to introduce the possibility of extending the robot's knowledge by adding new concepts to the semantic hierarchy. This is a more complex task than the previous one, as it involves inducing relations with existing concepts and properly placing the new symbol in the IsA hierarchy.

Yet another interesting problem is to reason about "synonyms" among robot programs, i.e., syntactically different structures or programs leading to the same effect. A simple example is a "localize and pick" task that may use different kinds of sensors to localize an object, while the goal (of picking the object from its current location) is achieved irrespective of which concrete sensor is used. How do we teach the system that two skills are equivalent in such (or some other) sense? What needs to be told? What kind of reasoning is performed?

Representing knowledge about industrial processes involving semantically-capable robots is a challenge leading to fascinating questions. We are quite sure we will have a lot to do in the years to come.

ACKNOWLEDGMENTS

The research leading to these results has received partial funding from the European Union's Seventh Framework Programme under grant agreement No. 287787 (project SMErobotics) and from the European Union's H2020 programme under grant agreement No. 644938 (project SARAFun).
REFERENCES

[1] M. Stenmark, "Instructing industrial robots using high-level task descriptions," Licentiate thesis, Department of Computer Science, Lund University, Mar. 2015.

[2] M. Stenmark and J. Malec, "Knowledge-based instruction of manipulation tasks for industrial robotics," Robotics and Computer-Integrated Manufacturing, vol. 33, pp. 56–67, 2015. [Online]. Available: http://lup.lub.lu.se/record/4679243/file/4679245.pdf

[3] J. Malec, K. Nilsson, and H. Bruyninckx, "Describing assembly tasks in declarative way," in Proc. IEEE ICRA 2013 Workshop on Semantics, Identification and Control of Robot-Human-Environment Interaction, Karlsruhe, Germany, May 2013, pp. 50–53.

[4] M. Stenmark, J. Malec, K. Nilsson, and A. Robertsson, "On distributed knowledge bases for robotized small-batch assembly," IEEE Transactions on Automation Science and Engineering, vol. 12, no. 2, pp. 519–528, 2015. [Online]. Available: http://dx.doi.org/10.1109/TASE.2015.2408264

[5] "IEEE standard ontologies for robotics and automation," IEEE Standard 1872-2015, 2015.

[6] M. Tenorth and M. Beetz, "KnowRob: A knowledge processing infrastructure for cognition-enabled robots," The International Journal of Robotics Research, vol. 32, no. 5, pp. 566–590, 2013.

[7] M. Tenorth, A. Perzylo, R. Lafrenz, and M. Beetz, "Representation and exchange of knowledge about actions, objects, and environments in the RoboEarth framework," IEEE Transactions on Automation Science and Engineering, vol. 10, no. 3, pp. 643–651, July 2013.

[8] M. Tenorth, G. Bartels, and M. Beetz, "Knowledge-based specification of robot motions," in Proceedings of the European Conference on Artificial Intelligence (ECAI), 2014.

[9] M. Beetz, M. Tenorth, and J. Winkler, "Open-EASE – a knowledge processing service for robots and robotics/AI researchers," in IEEE International Conference on Robotics and Automation (ICRA), Seattle, Washington, USA, 2015.

[10] E. Scioni, G. Borghesan, H. Bruyninckx, and M. Bonfè, "Bridging the gap between discrete symbolic planning and optimization-based robot control," in 2015 IEEE International Conference on Robotics and Automation, 2015.

[11] S. Balakirsky, Z. Kootbally, C. Schlenoff, T. Kramer, and S. Gupta, "An industrial robotic knowledge representation for kit building applications," in Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, Oct 2012, pp. 1365–1370.

[12] J. Carbonera, S. Rama Fiorini, E. Prestes, V. Jorge, M. Abel, R. Madhavan, A. Locoro, P. Goncalves, T. Haidegger, M. Barreto, and C. Schlenoff, "Defining positioning in a core ontology for robotics," in Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on, November 2013, pp. 1867–1872.

[13] S. Niekum, S. Chitta, A. Barto, B. Marthi, and S. Osentoski, "Incremental semantically grounded learning from demonstration," in Robotics: Science and Systems IX, Berlin, Germany, 2013.

[14] S. Niekum, S. Osentoski, G. Konidaris, S. Chitta, B. Marthi, and A. G. Barto, "Learning grounded finite-state representations from unstructured demonstrations," The International Journal of Robotics Research, vol. 34, no. 2, pp. 131–157, 2015.

[15] E. B. Fox, M. I. Jordan, E. B. Sudderth, and A. S. Willsky, "Sharing features among dynamical systems with beta processes," in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, and A. Culotta, Eds. Curran Associates, Inc., 2009, pp. 549–557.

[16] A. J. Ijspeert, J. Nakanishi, and S. Schaal, "Learning attractor landscapes for learning motor primitives," in Advances in Neural Information Processing Systems 15 (NIPS 2002), 2002, pp. 1547–1554.

[17] J. Metzen, A. Fabisch, L. Senger, J. de Gea Fernández, and E. Kirchner, "Towards learning of generic skills for robotic manipulation," KI – Künstliche Intelligenz, vol. 28, no. 1, pp. 15–20, 2014.

[18] B. Nemec, F. Abu-Dakka, B. Ridge, A. Ude, J. Jorgensen, T. Savarimuthu, J. Jouffroy, H. Petersen, and N. Krüger, "Transfer of assembly operations to new workpiece poses by adaptation to the desired force profile," in Advanced Robotics (ICAR), 2013 16th International Conference on, Nov 2013, pp. 1–7.

[19] C. Matuszek, E. Herbst, L. Zettlemoyer, and D. Fox, "Learning to parse natural language commands to a robot control system," in Experimental Robotics, ser. Springer Tracts in Advanced Robotics. Springer International Publishing, 2013, vol. 88, pp. 403–415.

[20] T. Kollar, S. Tellex, D. Roy, and N. Roy, "Grounding verbs of motion in natural language commands to robots," in Experimental Robotics, ser. Springer Tracts in Advanced Robotics. Springer Berlin Heidelberg, 2014, vol. 79, pp. 31–47.

[21] C. Landsiedel, R. de Nijs, K. Kühnlenz, D. Wollherr, and M. Buss, "Route description interpretation on automatically labeled robot maps," in Proceedings of the International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 2013, pp. 2251–2256.

[22] M. R. Walter, S. Hemachandra, B. Homberg, S. Tellex, and S. Teller, "Learning semantic maps from natural language descriptions," in Proceedings of the 2013 Robotics: Science and Systems IX Conference, Berlin, Germany, 2013.

[23] L. She, S. Yang, Y. Cheng, Y. Jia, J. Chai, and N. Xi, "Back to the blocks world: Learning new actions through situated human-robot dialogue," in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL). Philadelphia, PA, USA: Association for Computational Linguistics, 2014, pp. 89–97.

[24] M. Cakmak and L. Takayama, "Teaching people how to teach robots: The effect of instructional materials and dialog design," in International Conference on Human-Robot Interaction (HRI), Bielefeld, Germany, Mar. 2014.

[25] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, no. 1, pp. 1–47, Jan. 2014.

[26] M. Stenmark, "Bilingual robots: Extracting robot program statements from Swedish natural language instructions," in Proc. of the 13th Scandinavian Conference on Artificial Intelligence, Halmstad, Sweden, 2015.