=Paper=
{{Paper
|id=Vol-2325/paper-04
|storemode=property
|title=Knowledge Representation for Cognition- and Learning-enabled Robot Manipulation
|pdfUrl=https://ceur-ws.org/Vol-2325/paper-04.pdf
|volume=Vol-2325
|authors=Daniel Beßler,Sebastian Koralewski,Michael Beetz
|dblpUrl=https://dblp.org/rec/conf/kr/BesslerKB18
}}
==Knowledge Representation for Cognition- and Learning-enabled Robot Manipulation==
Daniel Beßler, Sebastian Koralewski, Michael Beetz∗
Institute for Artificial Intelligence
Am Fallturm 1, 28359 Bremen, Germany

Abstract
Knowledge representation and reasoning (KR&R) systems are widely employed for the representation of abstract knowledge. Action models are usually representations of state transitions: Actions can be performed if all pre-conditions are met, and it is expected that the designated effects will take place when the action is executed. However, embodied agents need additional knowledge about how their body should be moved to achieve their goals without causing unwanted side effects. The proposed action representation is based on force dynamic events that occur when an embodied agent interacts with its world. We show how patterns of force events can be used to define semantics of action verbs. Robots use our model to acquire episodic memories, which are stories of their performance coupled with sub-symbolic data, and they share their experience through the knowledge service openEASE.

Figure 1: PR2 closing a drawer in a kitchen. The action is decomposed into different phases with distinct motion and force event patterns. (The figure shows the phases Reach, Push, and Retract, delimited by contact+ and contact− events.)

∗ The research reported in this paper has been supported by the German Research Foundation DFG, as part of Collaborative Research Center (Sonderforschungsbereich) 1320 "EASE – Everyday Activity Science and Engineering", University of Bremen (http://www.ease-crc.org/).
Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: G. Steinbauer, A. Ferrein (eds.): Proceedings of the 11th International Workshop on Cognitive Robotics, Tempe, AZ, USA, 27-Oct-2018, published at http://ceur-ws.org

Introduction
The cognition system of humans allows us to accomplish manipulation tasks very competently. This is possible through the organization of actions in terms of motion phases, and through the prediction of effects that actions might cause in terms of force events that might occur.
In this work, we investigate an action model postulated in human psychology, and make use of it in an artificial system. The model was proposed by Flanagan et al. [6]. Actions are decomposed into motion phases with different subgoals. The subgoals are force dynamic events that also generate distinctive sensory feedback in the nervous system.
Intentions of others cannot be monitored directly. Monitoring force events, on the other hand, is at least less problematic, because events may be monitored in the physics engine of virtual worlds, or observed by some agent. This is, for example, that the hand gets into contact with the milk package before grasping it from the table, or that the package loses contact to the supporting surface when the agent performs a retracting motion after the milk has been grasped.
One of the main reasons for investigating action models from human psychology in robotics is that action models in AI, such as PDDL [7], usually do not have an appropriate level of abstraction for robots. In particular, action models in AI often abstract away from body motions and only concentrate on representing action pre- and post-conditions, and sequences. Intelligent embodied agents need to bridge the gap between these representations with missing information and the actual execution of an action in the physical world. Bridging this gap is non-trivial and a problem which is widely unsolved on the abstract level (i.e., by re-usable general knowledge). It is further expected that conditions and effects of actions are pre-defined – a hard-to-meet requirement given the diversity of effects actions may cause in the physical world.
The central question for successful embodied action execution is how agents should move their bodies to achieve certain effects while avoiding unwanted side-effects. This is, for example, how a robot should move its arm such that the pancake mix contained in the bottle it holds is poured on top of the pancake maker, and forms a pancake with 10cm diameter. In the area of AI there are only few approaches that address this problem despite the semantic nature of this reasoning problem.
One of the peculiarities of our KR&R system is that it runs inside the perception-action loop of a robotic agent. Symbols correspond to data structures of the robot control system, and as such they have a rather simple grounding. The representations in our system are inspired by the role that episodic memories play in the acquisition of generalized knowledge in the human memory system [18].
The proposed representation of episodic memories consists of two parts. One part stores experiences and events as symbolic data. Those events and experiences can be e.g. perceived objects, or performed actions, their duration, and possible failures. The second part stores sensor data from the robot in a database. We define this unstructured data as sub-symbolic data.
In the first section, we will describe the symbolic knowledge representation. An overview about the sub-symbolic data will be given afterwards. Then, we will show how those memories can be used to improve the robot's action models by getting insights about manipulation activities. This will be achieved by using a combination of query answering and visual analytics tools.
Our KR&R system is made available as part of the knowledge web service openEASE¹ [4]. The web service gives the KR community the opportunity to do research in the context of real robot experiments. Researchers in the field of KR-based robot control can further extend the knowledge base of the web service by providing additional episodic memories of their robots performing manipulation activities.
¹ http://www.open-ease.org/
We use the openEASE platform for storing and managing the episodic memories represented with our model. It also allows asking queries about them, such as how the robot was moving when an action was performed, and visualizing snapshots of the activity with visual annotations. Figure 1 shows such an example where the robot was closing a drawer in a kitchen environment. The action is properly segmented into the different motion phases, which is also visible in the figure. The vision is to collect a large data set of episodic memories, and utilize them for learning tasks to gain a better understanding about manipulation activities.

Related Work
There are several projects with efforts to provide symbolic knowledge about manipulation activities to robots. The most notable one is the IEEE-RAS working group ORA (Ontologies for Robotics and Automation) [13], which aims at defining standards for knowledge representation in robotics. Schlenoff [12] also presented a related approach for detecting intentions in cooperative human-robot environments based on states which are more easily recognizable by sensor systems than actions. In his work, intentions are also used for the prediction of the next action. For this work, we extend the KnowRob system [17], which, among others, defines concepts for actions and their effects [16]. KnowRob also has a notion of motion phases, but these are not defined using force dynamics.
Another related branch of research is task and motion planning. In this work, we present an action model that can be used to yield higher-level activities from observations of force events. Such force events can also be detected through haptic feedback, and be used to minimize uncertainty during manipulation activity planning [19]. The relation of our system to general planning systems is that planning domains can be represented using our model and that plan parameters can be inferred from knowledge represented in our system. Action models in traditional planning systems (such as PDDL) often only consider action pre-conditions and their effects, and do not incorporate more detailed information about motions and forces. More recently, systems emerged that enable robots to perform planning on both task and motion level by introducing an interface layer between task and motion planner [14, 5]. Our action model could be used by such systems to represent tasks, and to define action pre-conditions which are occurrences of force events. Another aspect is that our system can yield partial boundaries of motion phases given some observation.
Motion segmentation methods typically apply some form of clustering to build stochastic representations of primitive motions and motion sequences. These methods include self-similarity [8], k-means [10] or hierarchical [20] clustering. Primitive motions are often represented as Hidden Markov Models [9, 11, 15] and sequences as stochastic motion graphs [15]. This research has mainly focused on body motions, with some exceptions that also consider object movement [9, 11]. Contrary to our approach, the listed motion segmentation approaches either do not consider manipulated object movement or only consider its trajectory. Instead, we define motion boundaries according to interactions of objects with the physical world through force dynamics. These contact states seem particularly important for control strategies employed by humans [6].

Narrative of Episodic Memories
This section introduces an action model for robots inspired by the Flanagan model. The basis of it are force events that occur when an agent moves its body, and the different motion phases of actions. Our ontology is organized along these areas. It has 4 levels: Force events, situations, motion phases, and intentional activities. In addition, we use rules to declare identity constraints. In this section we provide a description of how this information is organized and represented.
In this work, we build upon the KnowRob ontology, and (manually) extend it with concepts of our action model such as ForceEvent and PouringMotion. We have chosen KnowRob because it provides the necessary infrastructure for interfacing with robot control systems, and to record episodic memories from task execution. It defines concepts such as Event and Situation, and also specific ones to describe e.g. robots and their parts.

Force Events
At the lowest level of our action representation there are events that physical objects cause in a (simulated) physical world. They are described independently from intentions. This is to allow detecting them fully automatically, without taking into account previous events and higher-level knowledge about task or embodiment.
PhysicalEvent ⊑ Event is the most general concept in this ontology. It implies that physical events occur at a particular time instant (derived from Event), and that at least one object is involved. Involved means that one of their physical properties is salient during the event. This is the case if the object involved is created or destroyed, touched or untouched, transformed into something else, etc.
The most essential events are the contact events (ContactEvent) that occur whenever an object moves in the world such that it touches (contact+ ⊑ involved) another object within a spatial region (contactRegion). The property contact+ is further decomposed into functional properties contact+1 and contact+2 denoting the two salient objects during the contact event (the two objects can be randomly assigned). The objects remain touched until they separate again, which is indicated by a LeavingContactEvent. The contact is either caused by an agent moving objects into contact, or through a physical process such as gravity, for example, pulling an object such that it falls onto the floor.
Creation (CreationEvent) and destruction (DestructionEvent) events are also distinctive subgoals of activities that we use for activity representation at a higher level of our ontology (e.g., cutting a bread creates a slice of bread).
The last category of physical events we consider in the scope of this work are fluid flow events (FluidFlowEvent). These are events in which some liquid or gaseous substance moves, for example, milk flowing from a package to a glass, or water flowing in a river. Such events may be intended as in "pouring milk in a glass", or unintended as in "spilling milk on the floor during navigating". The primary involved object is the liquid or gaseous substance, linked to the event via the functional property fluid ⊑ involved.

Force Situations
At the next level of our ontology there are situations during which force events occurred (ForceSituation ⊑ Situation). Force events occur at time instants, for example, in the moment the hand touches some object, and when it leaves contact again. We use such temporal patterns of force events to expand them to distinctive situations.
Sub-events are linked to situations via the inverse functional event object property. With inverse functional we imply that each event can only be the sub-event of a single situation. For detecting situations, we use two dedicated events: One indicating the start and the other indicating the end of the situation. These are represented using the functional properties starter ⊑ event for the event starting the situation, and stopper ⊑ event for the one stopping it.
Clearly, the starter event should occur before the stopper event. Otherwise, situations during which the object is not in contact could be classified as contact situations. We use predicates from Allen's interval algebra [1] and an identity constraint to assert this relation between starter and stopper event. As illustration, this constraint can be written as:

  ∀ instance_of(x, ForceSituation) : ∃ (stopper ∘ after ∘ starter⁻)(x, x)   (1)

Note that the fact that some event occurred after another one is inferred on demand by our reasoner and does not need to be asserted. The begin time of the situation is further defined as the time of occurrence of the starter, and the end time as the time of occurrence of the stopper.
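To make the constraint concrete, the following is a minimal sketch of how such a check could be phrased as a Prolog rule over asserted facts; the predicates starter/2, stopper/2 and occurs_at/2 are hypothetical stand-ins for the corresponding ontology properties, not the system's actual interface:

  % Sketch: a force situation is well formed if its stopper event
  % occurs strictly after its starter event (Allen's "after" relation).
  well_formed_force_situation(Situation) :-
      starter(Situation, StartEv),
      stopper(Situation, StopEv),
      occurs_at(StartEv, T0),
      occurs_at(StopEv, T1),
      T1 > T0.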
The starter event of contact situations is the contact event, and the stopper event is the leaving contact event. Both have exactly the same involved objects. We represent this type of information using identity constraint rules with a property chain starting from the starter event via the involved objects and the stopper event, and back to the starter event.
Fluid flow situations are a bit different because there are no distinct starter and stopper event types. At some time instant the first, and at a later time the last fluid flow event of a situation occurs. However, not every sequence of fluid flow events referring to the same fluid makes a situation. If the container is put aside for a while, for example, one would rather say that the situation ended then, and that a new situation starts when the container is used later on. This can be enforced by asserting that, during fluid flow situations, the container may only be salient for fluid flow events.

Motion Phases
Motions can be detected by monitoring the joint configuration of an agent. Movements are either reflexive or intentional. But at this level of our ontology, without knowing intentions of agents, we cannot distinguish between reflexive and intentional motions, and represent motions solely in terms of expected events and body parts used.
The different body parts are defined in the KnowRob ontology. Here, we define a general "body part moved" concept for each of these body parts. We define the functional relation partMoved to represent which body part moved during a motion, and restrict the range of this property to the corresponding body part type. For ArmMovements, for example, we assert: ∀partMoved.Arm and =1 partMoved.Arm.
Force events salient for a motion are denoted by the inverse functional event relation. Temporal ordering constraints are asserted by the temporal properties before, after, and during.
Here, we only investigate arm movements. Hand movements are also represented, but only at a coarse level using a boolean state: Opened or closed. We also ignore gaze motions in this work. However, it would be interesting to look into gaze contact events and to compare gaze patterns for different expert levels in future work.

Arm Movement
Arm movements are fundamental for object manipulation. The repertoire of different arm motions of humans is rich: reaching, lifting, throwing, cutting, pouring, etc. Some of these have distinct patterns of force dynamic events, such as cutting, that we use for representing them.
We use force events as delimiters of motion phases. Particularly important are contact situations between body parts and other objects. Motions during whose lifetime the contact between body part and object is continuously salient are called carrying motions (CarryingMotion). The body part in contact with the object must be part of the body part (denoted by partOf) which is moved during the motion. This is to allow, for example, that the contact occurs between hand and tool while the body part referred to by the motion is the arm (which in turn has a hand part).
Objects held by agents may also touch other objects or liquids during the motion, causing distinct force events during that interaction. We use this pattern of force events for the representation of tool motions. A cutting motion, for example, is a carrying motion, performed with a cutting tool, during which some object was cut into pieces. Cutting events may also be destruction events in case the object cut into pieces entirely disappeared. We further assert that the tool used in the cutting event (cutter) is also salient during the carrying situation.
Another challenging manipulation task is pouring. It can be performed in many ways, and on many different expert levels. The motion profiles of different expert levels are drastically different, but they all generate fluid flow events when particles are leaving the source container. We represent pouring motions as contact situations with a subgoal which is a fluid flow situation. First, we state that pouring motions are carrying situations where a container that contains some fluid is a salient object, and that at least one fluid flow situation is a subgoal of this situation. We further state that the fluid transported in fluid flow events of subgoals is exactly the fluid inside of (contains) the contacted container.
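As an illustrative sketch only (the paper expresses this as class descriptions in its ontology rather than as rules), the pouring pattern above can be read as the following rule; carrying_situation/1, salient_object/2, contains/2, subgoal/2, fluid_flow_situation/1 and fluid/2 are hypothetical predicate names mirroring the properties mentioned in the text:

  % Sketch: a carrying situation is a pouring motion if a salient container
  % holds some fluid and a fluid flow situation moving exactly that fluid
  % is one of its subgoals.
  pouring_motion(S) :-
      carrying_situation(S),
      salient_object(S, Container),
      contains(Container, Fluid),
      subgoal(S, FlowSit),
      fluid_flow_situation(FlowSit),
      fluid(FlowSit, Fluid).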
Activities
At the highest level of our ontology there are activities composed of motions with expected event patterns. At this level of the ontology, the intention of agents is implied by action concepts. The standard example quoted in the work of Flanagan et al. is a fetch-and-place activity. During fetch-and-place tasks, there is a contact situation between agent and fetched object, and also distinct events indicating that the carried object first leaves contact to a supporting surface, and later gets into contact with a supporting surface again. We state that fetch-and-place activities have a submotion which is a carrying motion, and that there are two additional force events linked to the action via the subevent relation. We further state that there is a subevent in which the carried object loses contact to a supporting surface.
At this level, we can distinguish between colliding, supporting, and intentionally touching. Unexpected contacts during an activity are classified as collisions. This makes it very easy to detect them. With expected we mean that the activity concept asserts their occurrence during the activity.
We use the same scheme to distinguish between pouring and spilling: Pouring actions have intended fluid flow subgoals, while spillage events are exactly the unintended fluid flow events occurring during an action. More concretely, pouring actions have a target location where the fluid should be poured into or onto. We classify all fluid flow events where the fluid is transported to somewhere other than the target location as spillage events.

Experience of Episodic Memories
Experience data captures low-level information about experienced activities, represented as time series data streams. Often, this data has no feasible lossless representation as facts in a knowledge base. To make this data accessible to knowledge-based reasoning, procedural hooks are defined in the ontology to compute relations from the experience data, and to embed this information in logic-based reasoning.
The data is stored in a NoSQL database using JSON documents. Each individual type of data is stored in a collection named according to the type of data stored in it. When imported, the knowledge system stores the data in a MongoDB² server, for which the knowledge system implements a client for querying the data during question answering.
² https://www.mongodb.com/

Pose Data
A robotic system typically has many mobile components arranged in a kinematic chain. Each component in a kinematic chain has an associated named coordinate frame such as world frame, base frame, gripper frame, head frame, etc. 6-DOF relative poses are assigned to frames. These are usually updated with about 10 Hz during movements, and expressed relative to the parent in the kinematic chain to avoid updates when only the parent frame moves. The transformation tree is rooted in the dedicated world frame node (also often called map frame). The data is used by our knowledge system to answer questions such as: "Where was the base relative to the object, 5 seconds ago?"
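A minimal sketch of how such a question could be resolved from parent-relative poses is given below; tf/4, identity_pose/1 and compose/3 are hypothetical helpers standing in for the MongoDB-backed transform lookup and 6-DOF pose composition, not the actual client interface:

  % Sketch: resolve the pose of Frame in the world frame at time T by
  % walking up the kinematic chain and composing parent-relative poses.
  pose_in_world(world, _T, Pose) :-
      identity_pose(Pose).
  pose_in_world(Frame, T, PoseInWorld) :-
      Frame \== world,
      tf(Frame, Parent, T, PoseInParent),   % pose of Frame relative to Parent at time T
      pose_in_world(Parent, T, ParentInWorld),
      compose(ParentInWorld, PoseInParent, PoseInWorld).

The pose of the base relative to an object at an earlier time would then follow by resolving both frames in the world frame for that timestamp and composing one pose with the inverse of the other.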
Reasoning with Episodic Memories
The knowledge represented in acquired experiences is very comprehensive. It not only contains narrations of activities but also raw experience data. Competent robot behavior needs both: Experience data encodes particularities of motions such as forces and velocities, and the narrative is required to make sense of the data at higher cognitive levels.
Here, we provide reasoning examples with our action representation. We first describe how activities can be obtained from force events, and also how an agent can make sense of action concepts. We finally outline some analytical reasoning tasks that can be performed on episodic memories.

Activity Parsing
In virtual worlds, force dynamic events can be monitored perfectly. These can be asserted to the knowledge base as they occur. Given the occurrence of force events, we can infer new knowledge using descriptions from higher levels of our ontology. In the first step, the events are expanded to situations. The situations are then refined to motions with distinct force event patterns. Finally, high-level activities are detected based on patterns of force events and motions.

Expanding Force Events
The expansion process exploits representations of situation concepts to identify events that determine the situation. Situations are determined by so-called starter and stopper events. The events are processed from earliest to latest. A situation symbol is created when a starter event was detected, and a triple that specifies the starter relation is asserted. The procedure stores a list of situations without stopper events. For each new event, this list is first iterated to test whether the event is a stopper event of the situation, and a triple that specifies the stopper relation is asserted if this is the case. Finally, it is also tested if new events are sub-events of one of the situations without stopper.
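A compact sketch of this loop is shown below, assuming the events arrive ordered by time; is_starter/2, is_stopper/3 and the sit/3 record are hypothetical simplifications of the triple-based bookkeeping described above, and the sub-event test is omitted for brevity:

  % Sketch: open a situation record when a starter event is seen and
  % complete the matching open record when its stopper event arrives.
  expand_events([], Acc, Acc).
  expand_events([Ev|Rest], Acc, Situations) :-
      (   select(sit(Type, Starter, none), Acc, Acc1),
          is_stopper(Ev, Type, Starter)        % Ev closes an open situation
      ->  Acc2 = [sit(Type, Starter, Ev)|Acc1]
      ;   is_starter(Ev, Type)                 % Ev opens a new situation
      ->  Acc2 = [sit(Type, Ev, none)|Acc]
      ;   Acc2 = Acc
      ),
      expand_events(Rest, Acc2, Situations).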
Classifying Motions
We assume that arm motions are only segmented by zero velocity segmentation in advance. We use force events as delimiters for coarse-grained segmentation. We think that this segmentation is sufficient because it captures the force events which are the essential subgoals of manipulation activities.
Here, we only consider arm motions. For each situation during which an arm motion occurred, we iterate through the different subclasses of ArmMotion which are also contact situations, and we test if classifying the situation with that type would yield a contradiction. The motion type is asserted if this is not the case. The motion classes are disjoint such that situations can only be classified as being instance of one of the motion classes.
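Sketched in Prolog-like form (the actual implementation relies on the reasoner's consistency check; subclass_of/2, inconsistent_with/2 and assert_type/2 are hypothetical names):

  % Sketch: try each ArmMotion subclass and keep the first type whose
  % assertion does not contradict the knowledge base; since the motion
  % classes are disjoint, at most one such type exists.
  classify_arm_motion(Situation, MotionType) :-
      subclass_of(MotionType, 'ArmMotion'),
      \+ inconsistent_with(Situation, MotionType),
      assert_type(Situation, MotionType),
      !.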
Parsing Activities
Motions and force events are then used as building blocks for activities. Activities can be parsed using rules that detect temporal patterns of events and motions that are distinctive for them. Force events and motions that are subgoals of activities are denoted by the subevent and submotion properties. Patterns with partial ordering constraints can be inferred from this model. The output of the parser is an ontology, describing instances of detected actions. Here, we provide one hand-written rule that is used to detect pick-and-place activities, shown in Algorithm 1.

Algorithm 1 Detect Pick-and-Place Activity
 1: procedure detect-pick-and-place
 2:   CarryingMotion(?s), contact+S(?s, ?obj),
 3:   LeavingContactEvent(?ev1),
 4:   loose-support-event(?s, ?ev1, ?obj),
 5:   ContactEvent(?ev2),
 6:   gain-support-event(?s, ?ev2, ?obj),
 7:   before(?ev1, ?ev2).
 8: procedure loose-support-event(?s, ?ev, ?obj)
 9:   contact-(?ev, ?obj), contact-(?ev, ?t),
10:   SupportingSurface(?t),
11:   stopper(?s, ?x), before(?ev, ?x).
12: procedure gain-support-event(?s, ?ev, ?obj)
13:   contact+(?ev, ?obj), contact+(?ev, ?t),
14:   SupportingSurface(?t),
15:   starter(?s, ?x), after(?ev, ?x).
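For readers more used to logic programming notation, Algorithm 1 can be transcribed roughly as the following Prolog-style rules; the predicate names mirror the algorithm and are not the actual reasoner interface:

  % Sketch: Prolog-style transcription of Algorithm 1.
  pick_and_place(S, Obj) :-
      carrying_motion(S), contact_plus_s(S, Obj),
      leaving_contact_event(Ev1), loose_support_event(S, Ev1, Obj),
      contact_event(Ev2), gain_support_event(S, Ev2, Obj),
      before(Ev1, Ev2).

  loose_support_event(S, Ev, Obj) :-
      contact_minus(Ev, Obj), contact_minus(Ev, Surface),
      supporting_surface(Surface),
      stopper(S, X), before(Ev, X).

  gain_support_event(S, Ev, Obj) :-
      contact_plus(Ev, Obj), contact_plus(Ev, Surface),
      supporting_surface(Surface),
      starter(S, X), after(Ev, X).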
Activity Interpretation
Our ultimate goal is to enhance the performance of robots by supplying them with knowledge about everyday activities, and in particular with high-level stories about what happened combined with experience data. In this section, we provide a description of how robots may use the information represented in episodic memories.
A typical query first asks for a particular semantic action that fulfills certain constraints such as being successful, being performed by a particular agent, etc. The inferred action symbol is bound to a variable which is used as index to sub-symbolic data in the experience part of episodic memories. This is done to access data slices corresponding to the semantic activity for which the symbol was inferred earlier. An example of such a query is shown in the following:

  entity(Act, [an, action, [type, putting_down]]),
  occurs(Act, [_, End]),
  holds(pose(pr2:'pr2_base_link', Pose), End).

This corresponds to the question "Where did the robot stand at the end of put-down actions?".
Based on our model, we can also ask questions about the goals of an action, for example, "What motion phases are the subgoals of an action?". For our introductory example of a robot closing a drawer (see Figure 1), the motion phases can be queried with a query such as:

  entity(Act, [an, action, [type, closing_a_drawer],
    [part_moved, [an, object, [base_link, HandBase]]]]),
  findall(M, entity(Act, [submotion, M]), Motions).

For a more detailed description of the question answering system used here, please consult the system paper written by Beetz et al. [2].

Analytics
Episodic memories are very comprehensive and additional tools for inspection are required. For illustration, we pick one simple pick-and-place task performed by a robot and show how our visual analytics tools are used to get insights about manipulation activities and reasoning processes. Our goal is to provide tools for gathering data for learning algorithms, and to learn about the requirements for robots performing everyday activities. Clustering methods may be used, for example, to group actions based on their parameterizations, and to identify e.g. what kind of actions require two arms to be performed successfully, or what kind of actions require additional tools. Different components of our analytics framework will be described below.

Action Hierarchy Visualization
Cognition-enabled plan frameworks, such as CRAM [3], generate action hierarchies instead of sequences of actions. This is because, in cognition-enabled plans, most actions are abstract and require reasoning which results in action hierarchies. For instance, a pick and place action requires a "pick" sub-action to be performed followed by a "place" sub-action. Action hierarchies are stored in our episodic memory as symbolic data. To get a better understanding of an experiment, openEASE contains a component to visualize the whole action hierarchy. This visualization gives an overview about what actions were executed by the robot, the relationship between those actions, and which tasks were successful and which not.
With our visual analytics framework we want to go beyond showing just hierarchies and statistics. Each visual component is linked to the knowledge base, which allows us to perform queries on the displayed data. To be specific, the nodes in the action hierarchy can be selected by the user, and the user can ask queries about them such as getting the error type of an unsuccessful task, the time duration, etc. In addition, trajectories during actions can be queried and visualized. Having the experience data linked to the narrative of an activity further allows correlating the success of an action with e.g. the goal pose relative to the base.

Figure 2: Co-occurrence matrix between actions and errors. The cell values indicate how many times the error occurred for each action type.

Visualization of Errors
For every episodic memory we can request a co-occurrence matrix between actions and errors which occurred during an activity. Figure 2 shows an error matrix for a pick and place activity. The rows and columns can be sorted by frequency to quickly get an overview of which actions failed the most or which error type occurs the most. Referring to Figure 2, the matrix shows that the action that failed most often was MovingToLocation, due to collision.
We are also using the error matrix to extract action preconditions which were not considered during plan design. Currently we are extracting the preconditions manually. In the future we are planning to automate this extraction so the robot can extend its action model by itself.
The matrix is also linked to the knowledge base, which allows us to query detailed information about the errors. For instance, for perception errors we can query which objects could not be perceived. Those queries can give us an overview e.g. for which objects the perception system might need to be improved.
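For example, a query in the style of the earlier examples could relate failed actions to their error types; action_failure/2 is a hypothetical stand-in for the actual failure property used in openEASE:

  % Hypothetical query: for each MovingToLocation action that failed,
  % retrieve the type of error that occurred.
  entity(Act, [an, action, [type, moving_to_location]]),
  action_failure(Act, ErrorType).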
Figure 3: Co-occurrence matrix between actions and reasoning tasks. The cell values indicate how many times a reasoning task was performed during each action type.

Visualization of Reasoning Tasks
Cognition-enabled plans require a significant amount of reasoning. We provide multiple visualization tools to get insights about reasoning processes. Figure 3 shows a co-occurrence matrix with the action types (rows) and the reasoning questions (columns) which are asked during a pick and place action. This matrix gives an overview of which reasoning tasks were performed the most and which tasks required the most reasoning. In our example, a significant amount of spatial and perception reasoning tasks were performed.
Our analytics framework serves additional statistics, such as depicted in Figure 4. The left pie chart shows the ratio between the frequency of reasoning tasks compared to actions. A high number of reasoning tasks indicates the robot performed a very abstract plan, since it required a lot of reasoning to be able to execute it. The right pie chart in Figure 4 depicts the overall time usage between reasoning and action execution. Note that even though the general amount of reasoning tasks is significantly higher than the number of actions, the action execution requires the most time. This insight gives us the opportunity to let the robot do more expensive reasoning in the future without extending the overall experiment runtime, because we could run the reasoning in parallel during the action execution.

Figure 4: The left chart shows the frequency of reasoning tasks (791) compared to the number of performed actions (124). The right chart shows how much time was spent during action execution (180.61 sec) and reasoning (3.68 sec).
Conclusion
In this paper, we have introduced an approach for representing episodic memories of embodied agents performing manipulation tasks. The action model is inspired by a model from human psychology. Its representations are based on force dynamic events which are used to define semantics of action verbs. We have shown that patterns of force events can be used to detect intentions, and what actions an embodied agent performed. The action model is coupled with experience data that stores control-level information. We believe that collections of episodic memories are key for understanding how experiential knowledge about manipulation tasks can be generalized.

References
[1] J. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832–843, 1983.
[2] M. Beetz, D. Beßler, A. Haidu, M. Pomarlan, A. K. Bozcuoglu, and G. Bartels. KnowRob 2.0 – a 2nd generation knowledge processing framework for cognition-enabled robotic agents. In International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 2018.
[3] M. Beetz, L. Mösenlechner, and M. Tenorth. CRAM – a cognitive robot abstract machine for everyday manipulation in human environments. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages 1012–1017. IEEE, 2010.
[4] M. Beetz, M. Tenorth, and J. Winkler. OpenEASE – a knowledge processing service for robots and robotics/AI researchers. In IEEE International Conference on Robotics and Automation (ICRA), Seattle, Washington, USA, 2015. Finalist for the Best Cognitive Robotics Paper Award.
[5] N. T. Dantam, Z. K. Kingston, S. Chaudhuri, and L. E. Kavraki. Incremental task and motion planning: A constraint-based approach. In Robotics: Science and Systems, 2016.
[6] J. R. Flanagan, M. C. Bowman, and R. S. Johansson. Control strategies in object manipulation tasks. Curr. Opin. Neurobiol., 16(6):650–659, Dec 2006.
[7] M. Ghallab, A. Howe, C. Knoblock, D. McDermott, A. Ram, M. Veloso, D. Weld, and D. Wilkins. PDDL – the planning domain definition language. AIPS-98 planning committee, 1998.
[8] B. Krüger, A. Vögele, T. Willig, A. Yao, R. Klein, and A. Weber. Efficient unsupervised temporal segmentation of motion data. IEEE Transactions on Multimedia, 19(4):797–812, 2017.
[9] V. Krüger, D. L. Herzog, S. Baby, A. Ude, and D. Kragic. Learning actions from observations. IEEE Robotics & Automation Magazine, 17(2):30–43, 2010.
[10] K. Kulkarni, E. Boyer, R. Horaud, and A. Kale. An unsupervised framework for action recognition using actemes. In R. Kimmel, R. Klette, and A. Sugimoto, editors, Computer Vision – ACCV 2010. Springer Berlin Heidelberg, 2011.
[11] Sanmohan, V. Krüger, and D. Kragic. Unsupervised learning of action primitives. In 2010 10th IEEE-RAS International Conference on Humanoid Robots, 2010.
[12] C. Schlenoff. Inferring intentions through state representations in cooperative human-robot environments (Déduction d'intentions au travers de la représentation d'états au sein des milieux coopératifs entre homme et robot). PhD thesis, University of Burgundy, Dijon, France, 2014.
[13] C. Schlenoff, E. Prestes, R. Madhavan, P. Goncalves, H. Li, S. Balakirsky, T. Kramer, and E. Miguelanez. An IEEE standard ontology for robotics and automation. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 1337–1342. IEEE, 2012.
[14] S. Srivastava, E. Fang, L. Riano, R. Chitnis, S. Russell, and P. Abbeel. Combined task and motion planning through an extensible planner-independent interface layer. In IEEE International Conference on Robotics and Automation (ICRA), 2014.
[15] W. Takano, H. Imagawa, and Y. Nakamura. Spatio-temporal structure of human motion primitives and its application to motion prediction. Robotics and Autonomous Systems, 75:288–296, 2016.
[16] M. Tenorth and M. Beetz. A unified representation for reasoning about robot actions, processes, and their effects on objects. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal, October 7–12, 2012.
[17] M. Tenorth and M. Beetz. KnowRob – A Knowledge Processing Infrastructure for Cognition-enabled Robots. Int. Journal of Robotics Research, 32(5):566–590, April 2013.
[18] E. Tulving. Episodic and semantic memory. Organization of Memory. London: Academic, 1972.
[19] N. A. Vien and M. Toussaint. Touch based POMDP manipulation via sequential submodular optimization. In 15th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2015, Seoul, South Korea, November 3–5, 2015, pages 407–413, 2015.
[20] F. Zhou, F. De la Torre, and J. K. Hodgins. Hierarchical aligned cluster analysis for temporal clustering of human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(3):582–596, 2013.