<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Which tool to use? Grounded reasoning in everyday environments with assistant robots</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lydia Fischer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephan Hasler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorg Deigmoller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Thomas Schnürer</string-name>
          <email>thomas.schnuerer@tu-illmenau.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Redert</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ulrike Pluntke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katrin Nagel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chris Senzel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joern Ploennigs</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Richter</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julian Eggert</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: G. Steinbauer, A. Ferrein (eds.): Proceedings of the 11th International Workshop on Cognitive Robotics</institution>
          ,
          <addr-line>Tempe, AZ, USA, 27-Oct-2018, published at</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IBM Research</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <fpage>3</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>We present a cooperative reasoning agent embodied in a mobile robot that can explore its environment with a camera. The robot can infer missing knowledge given an action and an object by the user, for example "I want to open a wine bottle.". Available tools are explored by the robot, and it finally recommends the most suitable one. The reasoning is on the one hand based on a static ontology that describes how to relate the actions "open" and "cut" with a fixed set of tools. On the other hand, unknown actions and tools are resolved by looking up synonyms or super/sub-class relations in Wikidata and WordNet. By this, the robot tries to map knowledge from linked data to its internal interpretable ontology. To retrace the reasoning process, the robot is able to explain its conclusions by text-to-speech. Finally, we show the performance of the system based on different settings for scanning the linked data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Cooperative Intelligence (CI) is an approach which
takes up ideas from artificial intelligence and machine
learning but puts the human in the center of all
considerations. One of the prerequisites of CI is that,
for seamless interaction, an artificial agent and a
human need to share concepts about the things that
happen in their environment [Rebhan et al., 2009a,
Rebhan et al., 2009b]. A viable approach for this is to
use semantic knowledge in terms of ontologies.
Problems here lie in the generalizability of the approach
and in the grounding [Harnad, 1990] of the semantic
concepts in the real world. Besides the pure
functionality of such a system, the capability of making the
internal processes transparent is important for user
acceptance and trust [Hancock et al., 2011]. To tackle
this problem, in a joint project between HRI-EU and
the IBM Watson IoT Center, we have studied a minimal
real-world set-up in which a robot interacts with a
user to reason about tools, objects, and actions. The
scenario was limited to a small set of actions, e. g.
cutting and opening, which are common everyday
activities. However, it was designed to be expandable to a
broader set of involved elements and actions. In this
paper, we present a system consisting of a robotics
platform provided with a scenario-specific root ontology,
dialog, reasoning, external knowledge search, object
detection, and symbol grounding capabilities. The
ontology provides seed knowledge about a small set of
concepts, and we show the system's capabilities of
reasoning about tool-related queries within the ontology and
additionally making use of external knowledge sources
to extend its reasoning scope in case of "unknown"
concepts. The analysed system is capable of:
interacting with the user via speech;
processing user requests within the designed scope
of the system using an internal static ontology and
reasoning;
providing a log file that contains the internal steps
of the reasoning as well as an explanation via
speech;
connecting knowledge from external sources with
the real world; and
operating in the real world.</p>
      <p>In the following, we present a proof-of-concept
framework that allows for future extensions. We have chosen
a limited scenario that illustrates the main idea of
dealing with external knowledge sources for a robot.</p>
    </sec>
    <sec id="sec-2">
      <title>The scenario: Tools, objects, and actions</title>
      <p>In our real-world scenario, we show a mobile robot
that acts intelligently and is able to reason by taking
into account the current situation and the knowledge
sources of the system. In the future, robots will assist
humans in many tasks in their household or at work.
We consider the tasks of cutting and opening objects,
which are typical everyday activities. These tasks are
easy to understand and already offer enough complexity
to show the challenges and solutions in reasoning.</p>
      <p>In the beginning of the demonstration, a user enters
the scene and sets a task like "I want to cut a wooden
block.". The system shall suggest a suited tool with
respect to the available tools in the current scene. The
robot IRA (Intelligent Reasoning Agent) is placed in a
fixed and static environment. In our demonstration, we
set up a room with a table that is placed in the middle
of the room (Figure 1). On the table there are tools
which are available for the task. We assume that the
tools do not change and are not moved while exploring.</p>
      <p>In case the user states a request which cannot be
answered using the predefined ontology, e. g. "I want
to cube an onion." (the action `cube' and the object
`onion' are unknown to the system), the system uses
external knowledge sources in order to match the
unknown terms with related terms known in the internal
ontology. Hence, with this mechanism our system is
able to work beyond the designed task scope.</p>
      <p>One run of the scenario contains five main steps:
1. User sets a task via speech interaction
2. IRA provides a first suggestion by reasoning in its
internal ontology (and external knowledge sources
if required)
3. IRA explores the table with the available tools
and recognizes them visually
4. IRA suggests the most suited tool that is available
on the table for the given task and points to it
5. IRA provides an explanation of its decision
The subsequent section provides related work,
followed by a description of the system architecture.
Afterwards, the functionality of the single system
components is explained. Then the results of the
quantitative tests are described. The conclusion
completes the paper.</p>
      <sec id="sec-2-1">
        <title>Related work</title>
        <p>Most work in robotics in the area of
reasoning focuses on the detailed execution of
manipulation tasks and robot planning. For
instance, [Hogg et al., 2017, Tenorth and Beetz, 2013]
and [Haidu and Beetz, 2016] focus on learning tasks
from observing human activities in kitchen scenarios.
In the area of robot planning, the reasoning usually
deals with uncertainties from real-world measurements
[Hanheide et al., 2017].</p>
        <p>Only little work exists on covering broader
everyday knowledge for robots, like usual
relations between objects and properties, to allow
a more natural and less targeted interaction.
Early work tried to build own crowd-sourced
databases [Gupta and Kochenderfer, 2004]. Later on,
more extensive databases emerged from the linked
data wave [Antoniou et al., 2012, Wood et al., 2014].
[Daoutis et al., 2009] used the common sense database
Cyc [Matuszek et al., 2006] for communicating with
the robot about objects and their properties that have
been seen. [Kaiser et al., 2014] crawl text documents
that fit the current context, build their own ontology,
and improve the content by linking against WordNet
[Stallman and Krempl, 2010].</p>
        <p>One major problem in making data interpretable
for robots is the type of representation. In
[Pustejovsky and Krishnaswamy, 2016], a semantic
knowledge representation is defined that describes
real-world objects from a functional point of view, which
might be beneficial for robotics.</p>
        <p>Still, the previous approaches lack in judging whether
an object observed by the robot is usable for a certain
task. This relates to the problem that linked data is
not directly machine interpretable, since the naming
of symbols in the ontology depends on the designer's
choice. Hence, our approach differs in that we map linked
data onto an internal, machine-interpretable ontology
that represents a known concept to the robot. The
advantage is that by doing this we are able to evaluate
whether external knowledge sources are usable for the robot.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>The system architecture</title>
      <p>(Figure 2: the front-end modules WAKE UP, END SESSION, SPEAK, ASK,
EXPLORE TABLE, and COMMUNICATE RESULT exchange commands and events with
the back-end via MQTT; the back-end comprises the conversation service
with keyword detection and natural language understanding, and the
reasoning over internal and external knowledge.)</p>
      <p>The system consists of two main parts (Figure 2):
the front-end (interaction platform) and the back-end
(dialog management and reasoning). We assume a
system with a predefined static ontology about objects and
tool types, as well as relations between them with
respect to the actions cut and open, e. g. a wooden block
can be cut by a wood saw. The platform of the system
is a MetraLabs SCITOS G5 robot [MetraLabs, 2018],
additionally equipped with a Kinect camera mounted
on a pan-tilt unit for moving the head. Laser
scanners allow the robot to localize itself in the room, and
a KINOVA JACO arm [Kinovarobotics, 2018] enables
the robot to point at tools and objects in the
environment.</p>
      <p>
        The front-end provides basic modules for the robot
like speaking or listening. Those modules are running
in ROS
        <xref ref-type="bibr" rid="ref22">(Robot Operating System [ROS, 2018])</xref>
        . The
back-end comprises mainly the dialog control and the
reasoning. Further, it acts as a state machine that
decides, for example, whether the robot needs to explore the
environment for available tools.
      </p>
      <sec id="sec-3-1">
        <title>Front-end</title>
        <p>
          The front-end provides modular components that
can be independently triggered by the back-end
through an MQTT
          <xref ref-type="bibr" rid="ref12">(Message Queue Telemetry
Transport, [ISO/IEC, 2016])</xref>
          bridge. The MQTT bridge
translates messages from the back-end to ROS messages
of the front-end and vice versa. As depicted in
Figure 2, the front-end modules are limited to SPEAK, ASK,
EXPLORE TABLE, and COMMUNICATE RESULT. WAKE UP
and END SESSION only send unidirectional messages
for starting and stopping the dialog. The modules are
explained in the following:
        </p>
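        <p>The command handling of such a bridge can be sketched as a simple dispatch table. This is a hypothetical illustration, not the actual interface: the command names follow Figure 2, but the handler signatures and return values are assumptions.</p>

```python
# Hypothetical sketch of the MQTT bridge dispatch: back-end commands are
# mapped onto front-end module handlers. Command names follow Figure 2;
# handler signatures and return values are assumptions for illustration.
def make_bridge(handlers):
    def dispatch(command, payload):
        handler = handlers.get(command)
        if handler is None:
            return ("ERROR", "unknown command: " + command)
        return handler(payload)
    return dispatch

bridge = make_bridge({
    "SPEAK": lambda text: ("DONE", text),  # returns once speech has finished
    "ASK": lambda question: ("ANSWER", "user reply as text"),
})

print(bridge("SPEAK", "Which tool do you need?"))
print(bridge("GRASP", ""))  # not a front-end module, rejected
```

        <p>Keeping the dispatch table explicit mirrors the paper's design choice of independently triggerable modules.</p>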
        <p>SPEAK receives a text message from the back-end,
converts it to speech, and returns once the robot has
finished speaking.</p>
        <p>ASK works similarly to SPEAK, as it is a combination of
SPEAK at first - for asking a question - with a subsequent
listening function. It finally returns the answer of the
user as text to the back-end.</p>
        <p>EXPLORE TABLE is the most complex module. If
triggered, it moves the robot around the table while keeping
the camera view on the table. The robot stops
approximately every 0.5 m on the trajectory and captures an
RGB and a depth image from the Kinect camera. For
each RGB image, the objectness score of the YOLO
detection framework [Redmon and Farhadi, 2017] is
used to predict bounding boxes of object candidates
(see Figure 3). For this, the YOLO model pre-trained
on the COCO dataset is used, which yields
an 81-dimensional class vector for each bounding box.
This vector is the feature input to a linear SVM
[Cortes and Vapnik, 1995] that is trained to predict
the final tool class together with a confidence. Using
the depth information, a 3D world position is
computed for each object detection. After all views have
been taken, nearby 3D detections are clustered using
DBSCAN [Ester et al., 1996]. Each cluster is assigned
a unique instance ID, a label, and a confidence.
The label indicates the tool class with the highest
accumulated confidence, and the confidence of the cluster
equals the ratio between the accumulated confidence
of the winning class and the sum of all detection
confidences (Figure 4). This integration step strongly
reduces the effects of noisy 3D estimations and wrong
class predictions from certain view-points. Finally, the
cluster labels with their confidences are sent to the
back-end. We expect that the single-view detection
performance can be strongly improved by directly
fine-tuning YOLO on our data. Having this, a more efficient
exploration strategy of the robot can be implemented,
e. g. like in [Andreopoulos et al., 2011], where
occlusions of objects on a table are taken into account. This
module relates to step three of a demonstration run
explained at the end of the Introduction.</p>
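        <p>The cluster labeling can be sketched as follows; the data layout (a cluster as a list of per-view (class, confidence) detections) is an assumption for illustration, not the actual implementation.</p>

```python
# Sketch of the cluster labeling: each 3D cluster collects per-view
# detections as (class, confidence) pairs; the label is the class with
# the highest accumulated confidence, and the cluster confidence is
# that sum divided by the sum of all detection confidences.
def label_cluster(detections):
    totals = {}
    for cls, conf in detections:
        totals[cls] = totals.get(cls, 0.0) + conf
    label = max(totals, key=totals.get)
    confidence = totals[label] / sum(totals.values())
    return label, confidence

# two views agree on "wood saw", one view mispredicts "metal saw"
print(label_cluster([("wood saw", 0.9), ("wood saw", 0.8), ("metal saw", 0.3)]))
```

        <p>This shows why the integration step suppresses wrong class predictions from single view-points: an occasional mispredicted view only lowers the winning label's confidence rather than changing it.</p>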
        <p>(Figure 4: clustered detections with instance IDs, labels,
confidences, and sizes, e. g. 0. Bread Knife, confidence 0.88, size 200.0;
1. Screwdriver, 0.94, 80.0; 2. Wood Saw, 0.88, 300.0;
3. Lighter, 0.91, 80.0; 4. Metal Saw, 0.94, 140.0.)</p>
        <p>COMMUNICATE RESULT receives from the back-end the
inferred information about which tool is the most suitable
one for the given query. This contains the instance
ID of the tool and the text for speaking. The text is
forwarded to the SPEAK module, and the instance ID is
used to point at the tool on the table with the robot
arm (step 4).</p>
      </sec>
      <sec id="sec-3-2">
        <title>Back-end</title>
        <p>The complete dialog between the user and IRA is
controlled by the back-end. In addition, keyword detection
based on natural language understanding techniques
extracts the action and the object from tasks stated by
the user.</p>
        <p>The control module of the back-end keeps track of
the current state of the dialog and triggers the needed
commands for the robot. It also provides the exchange
of information between the back-end components. The
main intelligence of the system is contained in the
reasoning module that is explained in more detail later.</p>
        <p>The architecture of the back-end (Figure 5) follows
the micro service architectural pattern. It has been
implemented on the IBM Cloud using multiple IBM
Watson services. Connectivity between the robot and
the back-end is established through the Watson
Internet of Things Platform using the MQTT protocol.
The back-end services can be organized in three layers,
which are briefly described below.</p>
        <p>Natural language understanding: The
components within this layer are responsible for driving the
dialog with the user. If an utterance is received by
the Dialog Controller, it decides which of potentially
several skills is best suited to handle it. In case none
fits well, the Default Skill creates a response that tells
the user that the system cannot handle the request. If
the utterance is related to the described scenario, the
Cut &amp; Open Skill will produce a response. In order to
do so, it first analyses the intent of the utterance and
extracts relevant information, like the action and the
object, using the Keyword Extraction component. In
case action and object are identified, the reasoning is
performed. The result is then translated into natural
language using the Cut &amp; Open Dialog component.
This component is also involved if the system must ask
for missing information or explains the result.</p>
        <p>Structured reasoning: This layer contains the
main reasoning component, which provides a
structured API for concept-based as well as instance-based
reasoning. It uses knowledge sources from the layer
below. Details of both algorithms are described in the
next section of the paper.</p>
        <p>Knowledge sources: The components of this layer
provide information for the reasoning component. Most
prominent is the Structured Knowledge Store
component, which contains structured domain knowledge
(ontology) about the use case scenario. In order to
enhance manually added knowledge, it applies reasoning
rules that generate additional knowledge. The
Unstructured Knowledge Store is considered as a fallback
that returns a paragraph of text as description in case
no specific tool can be recommended. The WordNet and
Wikidata components wrap the corresponding knowledge sources.</p>
        <p>WordNet [Stallman and Krempl, 2010] is a lexical
database of English containing nouns, verbs,
adjectives and adverbs. Words are grouped into sets
of cognitive synonyms (synsets), each expressing
a distinct concept. This knowledge is helpful for
resolving the meaning of unknown words.
Wikidata [Vrandecic and Krotzsch, 2014] is a free and open
knowledge database containing factual knowledge
extracted, for instance, from Wikipedia. Both sources were
selected because of their complementary content.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The reasoning:</title>
      <p>The reasoning components are the pre-defined static
ontology, the reasoning rules on top of it, and the
interfaces to Wikidata and WordNet. The static ontology
contains:
objects (~30), e. g. apple, glass, rope;
tools (~20), e. g. bread knife, corkscrew, wood saw;
concepts (~15), e. g. vegetable, vessel, plant;
relations (hasComponent, cutBy, cutByPref, openBy, openByPref);
an internal structure that encodes an isA relation between
objects, tools, and concepts, e. g. an apple isA fruit.
The objects and the tools are attached with the
properties stability S and their 3D shape. The 3D shape
is described by the dimensions width W, height H, and
length L. These qualitative properties are used for
reasoning. The stability S of a tool is specified on a
range from completely deformable to completely stiff.
Currently, the shape of a tool is not measured online
but is a static property stored in the ontology. Later on,
it is planned to use the visual input of IRA to measure
the length of the recognized tools. For reasoning within
cutting tasks, there are two main reasoning rules using
these properties:
(i) S<sub>object</sub> ≤ S<sub>tool</sub>; and (ii) W<sub>object</sub> ≤ 2 L<sub>tool</sub>.</p>
      <sec id="sec-4-2">
        <p>For a task within opening scenarios there is a special
reasoning rule in addition to the direct links. It says that
if a vessel v has a fastening f, and this fastening f can
be opened by a tool t, then the vessel v can also be opened
by tool t:
hasComponent(v, f) &amp; openBy(f, t) → openBy(v, t).</p>
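        <p>This rule can be sketched as forward inference over sets of relation tuples. The relation names follow the ontology; the wine-bottle facts are illustrative, not taken from the actual knowledge store.</p>

```python
# Sketch of the opening rule: hasComponent(v, f) together with
# openBy(f, t) implies openBy(v, t). Facts below are illustrative.
def apply_opening_rule(has_component, open_by):
    inferred = set(open_by)
    for (v, f) in has_component:
        for (f2, t) in open_by:
            if f == f2:
                inferred.add((v, t))
    return inferred

facts_component = {("wine bottle", "cork")}
facts_open_by = {("cork", "corkscrew")}
print(apply_opening_rule(facts_component, facts_open_by))
```

        <p>With these facts, the rule derives that a wine bottle can be opened by a corkscrew, which matches the introductory example "I want to open a wine bottle.".</p>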
        <p>The reasoning consists of two steps: concept-based
reasoning and instance-based reasoning. The
concept-based reasoning uses the internal ontology and the
external knowledge without reference to the tools on
the table. Hence the system provides a first
suggestion of a tool, as mentioned in the second step of the
scenario description. Figure 6 shows a visualisation
of the relations of a wooden block in the internal
ontology, which represents the main structure of objects
and tools. For the task "I want to cut a wooden block.",
the first suggestion would be a wood saw since it is
connected via a cutByPref relation. The suffix Pref
indicates that such a tool is always the preferred one
for the related object. If such a relation is missing, an
alternative tool that is connected with the object via the
non-preference relations cutBy, openBy is suggested.
In the example of the object wooden block (Figure 6),
all cutting tools (e. g. metal saw, pocket knife) are
connected with the wooden block and hence they are
alternative tools to cut it. Of course, this holds only for
tools that fulfill the constraints for cutting or opening
the objects, respectively. To make the internal
process of the reasoning transparent for the user, there
is a so-called reasoning log stating the single steps of the
reasoning process. It is easily accessible in real-time on
a web page. An example of a concept-based reasoning
log can be found in Figure 7.</p>
        <p>Checking ontology...</p>
        <p>The action `to cut' is known.</p>
        <p>The object `Wooden Block' is known.</p>
        <p>Best suited tool to cut a Wooden Block:
Wood Saw
Unknown if it is available, hence exploring the
table.</p>
        <p>After the first reasoning step, the robot explores
the available tools on the table. Subsequently, the
instance-based reasoning takes place. The recognized
tools, together with their recognition confidence, are
taken into account in addition to the internal and
external knowledge. The differences to the concept-based
reasoning are that only the available tools are
considered, including their confidence values, and that the
reasoning rules mentioned above are applied. With this
additional information, the system is able to suggest
an available tool in the current scene that fits the task
with high certainty. Figure 8 shows an example of
instance-based reasoning.</p>
        <p>On the table there are best suited tools to cut a
Wooden Block: Wood Saw (id 1)
The best suited tool with the highest recognition
confidence is the Wood Saw (id 1).</p>
        <p>The recognition confidence (0.95) is above a
defined threshold (0.80).</p>
        <p>Recommendation: Wood Saw (id 1)</p>
        <p>Now, let us assume that the given task consists of
known actions (cut and open) and objects that are
contained in the ontology; we refer to this kind of task
as a standard task. For standard tasks, the reasoning
process is as described above; external knowledge is not
required. As an extension, assume that the
action, the object, or both terms are unknown. In
this case, the internal knowledge is unable to provide
an adequate suggestion because of the lacking information.</p>
        <p>Using external knowledge sources to find a proper
match of the unknown terms to terms known in the
system is the first step before the reasoning steps of the
standard tasks are performed. We refer to this kind of
task as a non-standard task. Within this paper, we
define a match of two terms (unknown term, term known
in the ontology) as proper when the semantic meaning
(linguistic) fits. In other words, a proper match is a
path in the external knowledge source that connects
an unknown term with a term known in the internal
ontology. For example, proper matches of actions are
(cube, cut) and (uncork, open), and of objects
(meat, food) and (rose, plant). From
WordNet we only search for synonym and hypernym links,
whereas from Wikidata we search for material used,
part of, and subclass relations. The number of allowed
links indicating relations is denoted as search depth. A
proper path is a path of the structure: unknown term
(e. g. cube) - link type (e. g. is synonym) - known term
in the ontology (e. g. cut). Such a path can possibly
have more links and unknown terms in-between. The
important part is that the path connects the unknown
term via allowed links, and potentially other terms, with
a term known in the ontology. In order to choose the
most reliable path out of possibly many paths ending
in different known terms, we introduce a scoring in
three steps. First, each existing link l<sub>i</sub> is scored:
l<sub>i</sub> = 1 if the link exists, and l<sub>i</sub> = 0 if no link exists.</p>
        <p>Second, a path into the ontology has a weight p<sub>j</sub>,
which is based on the number n of links:
p<sub>j</sub> = ∏<sub>i=1</sub><sup>n</sup> (0.9 · l<sub>i</sub>).
The factor 0.9 penalizes long paths, since we assume that
the longer the path, the more the uncertainty increases.</p>
        <p>The maximum score p<sub>opt</sub> of all m path scores defines
the optimal path. It describes the "properness" of
a match between two terms (unknown term, known
term):
p<sub>opt</sub> = max(p<sub>1</sub>, ..., p<sub>m</sub>).</p>
        <p>Methods for link and path scoring, as well as the
selection of the best result, are subject to future research.
For example, the method for scoring a link might use
the number of links of the same type starting from the
same object. Further, the types of links, e. g. made of,
can be scored differently.</p>
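        <p>The scoring can be sketched as follows. Since every existing link scores 1, a path of n links weighs 0.9<sup>n</sup>, and the optimal match is the path with the maximum score; the candidate-path representation here is an assumption for illustration.</p>

```python
# Sketch of the path scoring: each existing link l_i scores 1, a path
# of n links has weight prod(0.9 * l_i) = 0.9 ** n, and the optimal
# match p_opt is the maximum score over all candidate paths.
def path_score(n_links, damping=0.9):
    score = 1.0
    for _ in range(n_links):
        score *= damping * 1.0  # l_i = 1 for every existing link
    return score

def best_match(candidate_paths):
    # candidate_paths: list of (known_term, number_of_links) pairs
    return max(candidate_paths, key=lambda path: path_score(path[1]))

# 'cube' reaches 'cut' via one synonym link, beating a two-link alternative
print(best_match([("cut", 1), ("slice", 2)]))  # prints ('cut', 1)
```

        <p>The multiplicative damping makes the preference for short paths explicit: one link scores 0.9, two links 0.81, three links 0.729, matching the intuition that uncertainty grows with path length.</p>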
        <p>Figure 9 shows an excerpt of a dialog that highlights
the reasoning of IRA when it is asked to "cube an onion".
This example contains an unknown action (cube in the
sense of cutting) and an unknown object (onion). If
IRA is asked to explain its reasoning, it describes the
relations to an external knowledge source based on the
mentioned scoring process.</p>
        <p>User: "I want to cube an onion"
IRA: "To cube is like to cut. Onion is a kind of
vegetable. The best tool would be a Fruit Knife.
Let me check the available tools."
IRA: "A Fruit Knife is available, I would
recommend to use it."
User: "Please explain."
IRA: "According to Wikidata, a vegetable is a
superclass of an onion. According to WordNet, to
cut is a hypernym of to cube. A Fruit Knife is
the preferred tool to cut a vegetable. This tool is
available on the table. Is there anything else you
want to do?"</p>
        <p>Performance analysis of the system:
In the first part, we evaluate whether the suggested tool
matches the task. For instance, for the unknown terms
"cube an onion", a suited tool would be a knife. This
would count as a successful suggestion. An example of
a failure would be the suggestion to use a wood saw for
"carving wood" (unknown term). The suggested tool
is reasonable, but of course it would be very difficult
to carve wood with a saw. In the second part of this
section, we evaluate the performance of our system in
finding proper matches for unknown terms to terms in
the internal ontology. This mechanism is the foundation
of the reasoning.</p>
        <p>We created a test set that consists of 50 test
examples in total. Each example is a triplet (object, action,
tool). The objects and actions are taken from 15
unknown actions, 44 unknown objects, and the known
objects and actions in the ontology. We consider only
known tools for the test examples. The test set
consists of 5 examples with only an unknown action, 32
examples with only an unknown object, and 13
examples where both terms are unknown. For a meaningful
evaluation, different settings have been applied in the
experiments. Within a setting, the search depth for the
available relations of the external knowledge sources
is defined. The search depth is varied from direct
links (length one) up to search paths of length six. In
Figures 10 and 11, the search depth is varied on the
x-axis and the legend denotes the used relations. The
term Wikidata refers to the used relations material
used, part of, and subclass. WordNet refers to the
relations synonym and hypernym.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation of tool suggestion:</title>
      <p>Figure 10 shows the evaluation of the defined test cases.
It can be seen that single relations like synonym and
hypernym contained in WordNet help in resolving only
few test cases, independently of the search depth. The
combination of both relations is beneficial (WordNet
graph), but the best performance occurs when
combining WordNet and Wikidata. As mentioned, the content
of both sources is complementary, and it is
consequential that their combination is beneficial. The associated
graph displays that a search depth of three provides the
best results, while higher depths lead to a decreasing
performance. In total, the system is able to recommend
suited tools for 35 examples while 15 failed. The main
reason is missing information, such that there is no
possibility to find a proper match for at least one unknown
term. Another reason for failing is a mismatch between
the suggested tool and the expected one in the test case
triplet. An example of this case is (wooden block,
carve (unknown), pocket knife). Here the system
suggests the wood saw, but of course carving wood with a
wood saw is difficult.</p>
      <p>Subsequently, we evaluate in more depth the
performance of resolving unknown actions and objects.</p>
    </sec>
    <sec id="sec-6">
      <title>Evaluation of proper object matches:</title>
      <p>To evaluate the performance regarding proper object
matches using Wikidata and WordNet, we analyzed the
true positive rates and false positive rates (Figure 11).
The true positive rate reports the number of proper
matches divided by the overall number of unknown
objects from our test set. The false positive rate
denotes the number of mismatches, that means matches
into the ontology which are semantically wrong, again
divided by the overall number of unknown objects. The
displayed positive rate shows the sum of both rates.
From Figure 11, we derive that a search depth of three
provides a good balance between true positives (67.3 %)
and false positives (19 %). This is in accordance with
observations from Figure 10 and holds especially for
using WordNet and Wikidata. As can be seen from
the positive rate, the system finds matches into the
ontology for 95.45 % of the unknown objects. During
our experiments it turned out that WordNet is more
suited than Wikidata for resolving lacking information
about unknown actions due to its lexical content. Using
the relations synonym or hypernym from WordNet, the
system finds proper matches for 75 % of the unknown
actions of the test set.</p>
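      <p>The reported rates follow from simple counting over the test set; the sketch below illustrates the definitions, with illustrative counts rather than the paper's raw data.</p>

```python
# Sketch of the evaluation metrics: each rate is a count divided by the
# number of unknown objects in the test set. The counts passed in below
# are illustrative, not the paper's raw data.
def match_rates(n_proper, n_mismatch, n_unknown_objects):
    tp_rate = n_proper / n_unknown_objects
    fp_rate = n_mismatch / n_unknown_objects
    positive_rate = tp_rate + fp_rate  # fraction with any match found
    return tp_rate, fp_rate, positive_rate

tp, fp, pos = match_rates(n_proper=30, n_mismatch=8, n_unknown_objects=44)
print(round(tp, 3), round(fp, 3), round(pos, 3))
```

      <p>Note that the positive rate can approach 1 even when many matches are semantically wrong, which is why the true and false positive rates are reported separately.</p>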
      <p>In summary, the most promising setting is the
combination of sources with complementary content (in our
case, WordNet and Wikidata). There are many more
sources available [Paulheim, 2017] that are possibly useful
for intelligent systems.</p>
      <sec id="sec-6-1">
        <title>Conclusions</title>
        <p>We proposed an intelligent system that is able to
provide an answer to a task stated by the user. The
answer is retrieved through reasoning that takes an
internal ontology, external knowledge sources, and the
explored resources in the current scene into account.
The reasoning suggests a tool suited
for a task such as "I want to cut a wooden block."
that contains an action and an object. For unknown
actions and objects, external knowledge sources are
used to map these terms onto known terms in the
ontology. For our defined test cases it turned out that
the combination of complementary external knowledge
sources is beneficial and a promising direction for future
research. The system shows on a web page a
reasoning log containing the internal steps of the reasoning
process and explains the found suggestion via
speech. This mechanism makes the system transparent
and more pleasant to engage with.</p>
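        <p>A minimal sketch of this reasoning flow, assuming a toy ontology, term mappings, and scene (all names below are illustrative, not the system's actual implementation):</p>

```python
# Hypothetical end-to-end tool suggestion: resolve the action and object
# terms, then pick a suited tool that was actually observed in the scene.

ONTOLOGY = {  # (action, material) -> tools suited for it
    ("cut", "wood"): ["wood saw", "knife"],
    ("cut", "paper"): ["scissors"],
}
TERM_MAP = {"slice": "cut"}            # stands in for the external lookups
OBJECT_MAP = {"wooden block": "wood"}  # object grounding, e.g. by material
SCENE = {"knife", "scissors"}          # resources explored in the scene

def suggest_tool(action, obj):
    action = TERM_MAP.get(action, action)
    material = OBJECT_MAP.get(obj, obj)
    # prefer a suited tool that is actually present in the current scene
    for tool in ONTOLOGY.get((action, material), []):
        if tool in SCENE:
            return tool
    return None

print(suggest_tool("slice", "wooden block"))  # -> 'knife'
```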
        <p>In future work, we would like to extend the scenario to
more actions and objects to allow a more natural and
comprehensive interaction with the robot.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>The authors would like to thank all the people who
supported this project at both companies. We thank
MetraLabs for the set-up and support of the robot, and
Kinova.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Andreopoulos et al.,
          <year>2011</year>
          ] Andreopoulos,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Hasler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Wersing</surname>
          </string-name>
          ,
          <string-name>
            <surname>H.</surname>
          </string-name>
          ,
          <string-name>
            <surname>Janßen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsotsos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and Korner, E. (
          <year>2011</year>
          ).
          <article-title>Active 3D Object Localization Using a Humanoid Robot</article-title>
          .
          <source>IEEE Transactions on Robotics</source>
          ,
          <volume>27</volume>
          (
          <issue>1</issue>
          ):
          <fpage>47</fpage>–<lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Antoniou et al.,
          <year>2012</year>
          ] Antoniou,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. v.</given-names>
            , and
            <surname>Hoekstra</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>A Semantic Web Primer</article-title>
          . The MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Cortes and Vapnik</source>
          , 1995] Cortes,
          <string-name>
            <given-names>C.</given-names>
            and
            <surname>Vapnik</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          (
          <year>1995</year>
          ).
          <article-title>Support-vector networks</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <fpage>273</fpage>–<lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Daoutis et al.,
          <year>2009</year>
          ] Daoutis,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Coradeschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            , and
            <surname>Loutfi</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Grounding Commonsense Knowledge in Intelligent Systems</article-title>
          .
          <source>Journal of Ambient Intelligence and Smart Environments</source>
          ,
          <volume>1</volume>
          (
          <issue>4</issue>
          ):
          <fpage>311</fpage>–<lpage>321</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Ester et al.,
          <year>1996</year>
          ] Ester,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Kriegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Sander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            , and
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <surname>X.</surname>
          </string-name>
          (
          <year>1996</year>
          ).
          <article-title>A density-based algorithm for discovering clusters in large spatial databases with noise</article-title>
          .
          <source>In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96)</source>
          , pages
          <fpage>226</fpage>–<lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Gupta and Kochenderfer</source>
          , 2004] Gupta,
          <string-name>
            <given-names>R.</given-names>
            and
            <surname>Kochenderfer</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. J.</surname>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Common Sense Data Acquisition for Indoor Mobile Robots</article-title>
          .
          <source>In Proceedings of the 19th National Conference on Artificial Intelligence</source>
          ,
          <source>AAAI'04</source>
          , pages
          <fpage>605</fpage>–<lpage>610</lpage>
          . AAAI Press.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Haidu and Beetz</source>
          , 2016] Haidu,
          <string-name>
            <given-names>A.</given-names>
            and
            <surname>Beetz</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Action recognition and interpretation from virtual demonstrations</article-title>
          .
          <source>In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</source>
          , pages
          <fpage>2833</fpage>–<lpage>2838</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Hancock et al.,
          <year>2011</year>
          ] Hancock,
          <string-name>
            <given-names>P. A.</given-names>
            ,
            <surname>Billings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            ,
            <surname>Schaefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. E.</given-names>
            ,
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y. C.</given-names>
            ,
            <surname>de Visser</surname>
          </string-name>
          , E., and
          <string-name>
            <surname>Parasuraman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>A meta-analysis of factors affecting trust in human-robot interaction</article-title>
          .
          <source>Human Factors</source>
          ,
          <volume>53</volume>
          (
          <issue>5</issue>
          ):
          <fpage>517</fpage>–<lpage>527</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Hanheide et al.,
          <year>2017</year>
          ] Hanheide,
          <string-name>
            <surname>M.</surname>
          </string-name>
          , Gobelbecker,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            ,
            <surname>Pronobis</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          , Sjoo,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Aydemir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Jensfelt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Gretton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Dearden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Janicek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Zender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Kruijff</surname>
          </string-name>
          , G.-J.,
          <string-name>
            <surname>Hawes</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wyatt</surname>
            ,
            <given-names>J. L.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Robot task planning and explanation in open and uncertain worlds</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>247</volume>
          :
          <fpage>119</fpage>–<lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>[Harnad</source>
          , 1990] Harnad,
          <string-name>
            <surname>S.</surname>
          </string-name>
          (
          <year>1990</year>
          ).
          <article-title>The symbol grounding problem</article-title>
          .
          <source>Physica D: Nonlinear Phenomena</source>
          ,
          <volume>42</volume>
          (
          <issue>1</issue>
          ):
          <fpage>335</fpage>–<lpage>346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Hogg et al.,
          <year>2017</year>
          ] Hogg,
          <string-name>
            <given-names>D. C.</given-names>
            ,
            <surname>Alomari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Duckworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Hawasly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Bore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            , and
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G.</surname>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Grounding of Human Environments and Activities for Autonomous Robots</article-title>
          .
          <source>In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)</source>
          , pages
          <fpage>1395</fpage>–<lpage>1402</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [ISO/IEC, 2016] ISO/IEC (
          <year>2016</year>
          ).
          <article-title>Information technology – Message Queuing Telemetry Transport (MQTT)</article-title>
          . Website. https://www.iso.org/standard/69466.html.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Kaiser et al.,
          <year>2014</year>
          ] Kaiser,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Petrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. P. A.</given-names>
            ,
            <surname>Asfour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            , and
            <surname>Steedman</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Extracting common sense knowledge from text for robot planning</article-title>
          .
          <source>In 2014 IEEE International Conference on Robotics and Automation</source>
          , ICRA, pages
          <fpage>3749</fpage>–<lpage>3756</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>[Kinovarobotics</source>
          , 2018]
          <string-name>
            <surname>Kinovarobotics</surname>
          </string-name>
          (
          <year>2018</year>
          ). Website. www.meetjaco.com/about/.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Matuszek et al.,
          <year>2006</year>
          ] Matuszek,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Cabral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Witbrock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            , and
            <surname>DeOliveira</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>An introduction to the syntax and content of Cyc</article-title>
          .
          <source>In Proceedings of the AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering</source>
          , pages
          <fpage>44</fpage>–<lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>[MetraLabs</source>
          , 2018] MetraLabs (
          <year>2018</year>
          ). MetraLabs. Website. www.metralabs.com/en/mobile-robot-scitos-g5/.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>[Paulheim</source>
          , 2017] Paulheim,
          <string-name>
            <surname>H.</surname>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Knowledge graph refinement: A survey of approaches and evaluation methods</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>489</fpage>–<lpage>508</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>[Pustejovsky and Krishnaswamy</source>
          , 2016] Pustejovsky,
          <string-name>
            <given-names>J.</given-names>
            and
            <surname>Krishnaswamy</surname>
          </string-name>
          ,
          <string-name>
            <surname>N.</surname>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>VoxML: A visualization modeling language</article-title>
          .
          <source>In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)</source>
          , Portorož, Slovenia, May 23–28,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Rebhan et al., 2009a]
          <string-name>
            <surname>Rebhan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Einecke</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Eggert</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2009a</year>
          ).
          <article-title>Consistent modeling of functional dependencies along with world knowledge</article-title>
          .
          <source>In Proceedings of the International Conference on Cognitive Information Systems Engineering</source>
          , pages
          <fpage>341</fpage>–<lpage>348</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Rebhan et al., 2009b]
          <string-name>
            <surname>Rebhan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richter</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Eggert</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2009b</year>
          ).
          <article-title>Demand-driven visual information acquisition</article-title>
          .
          <source>In Computer Vision Systems, 7th International Conference on Computer Vision Systems</source>
          , ICVS, pages
          <fpage>124</fpage>–<lpage>133</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>[Redmon and Farhadi</source>
          , 2017] Redmon,
          <string-name>
            <given-names>J.</given-names>
            and
            <surname>Farhadi</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>YOLO9000: Better, Faster, Stronger</article-title>
          .
          <source>In IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , pages
          <fpage>6517</fpage>–<lpage>6525</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <source>[ROS</source>
          ,
          <year>2018</year>
          ] ROS (
          <year>2018</year>
          ).
          <article-title>Robot Operating System</article-title>
          .
          <source>Website. www.ros.org.</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <source>[Stallman and Krempl</source>
          , 2010] Stallman,
          <string-name>
            <given-names>R. M.</given-names>
            and
            <surname>Krempl</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          (
          <year>2010</year>
          ). Princeton university `About WordNet'. Website. https://wordnet.princeton.edu/.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <source>[Tenorth and Beetz</source>
          , 2013] Tenorth,
          <string-name>
            <given-names>M.</given-names>
            and
            <surname>Beetz</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>KnowRob: A Knowledge Processing Infrastructure for Cognition-enabled Robots</article-title>
          .
          <source>International Journal of Robotics Research</source>
          ,
          <volume>32</volume>
          (
          <issue>5</issue>
          ):
          <fpage>566</fpage>–<lpage>590</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Vrandecic and Krotzsch, 2014] Vrandecic,
          <string-name>
            <surname>D.</surname>
          </string-name>
          and Krotzsch,
          <string-name>
            <surname>M.</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Wikidata: A free collaborative knowledgebase</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>57</volume>
          (
          <issue>10</issue>
          ):
          <fpage>78</fpage>–<lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Wood et al.,
          <year>2014</year>
          ] Wood,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Zaidman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Ruth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            , and
            <surname>Hausenblas</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <source>Linked Data: Structured Data on the Web</source>
          . Manning Publications, Shelter Island, first edition.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>