     Which tool to use? Grounded reasoning in everyday
             environments with assistant robots

        Lydia Fischer1, Stephan Hasler1, Jörg Deigmöller1, Thomas Schnürer2,
        Michael Redert3, Ulrike Pluntke3, Katrin Nagel3, Chris Senzel3,
        Joern Ploennigs4, Andreas Richter1, Julian Eggert1

             1 – Honda Research Institute EU, Offenbach, Germany, firstname.surname@honda-ri.de
             2 – Technical University of Ilmenau, Ilmenau, Germany, thomas.schnuerer@tu-ilmenau.de
             3 – IBM Watson IoT Center, Munich, Germany, firstname.surname@de.ibm.com
             4 – IBM Research, Ireland, firstname.surname@ie.ibm.com



Abstract

We present a cooperative reasoning agent embodied in a mobile robot that can explore its environment with a camera. Given an action and an object stated by the user, like for example "I want to open a wine bottle.", the robot can infer missing knowledge. The robot explores the available tools and finally recommends the most suitable one. The reasoning is based, on the one hand, on a static ontology that describes how to relate the actions "open" and "cut" to a fixed set of tools. On the other hand, unknown actions and tools are resolved by looking up synonyms or super-/sub-class relations in Wikidata and WordNet. By this, the robot tries to map knowledge from linked data to its internal interpretable ontology. To let the user retrace the reasoning process, the robot is able to explain its conclusions via text-to-speech. Finally, we show the performance of the system for different settings for scanning the linked data.

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: G. Steinbauer, A. Ferrein (eds.): Proceedings of the 11th International Workshop on Cognitive Robotics, Tempe, AZ, USA, 27-Oct-2018, published at http://ceur-ws.org

1 Introduction

Cooperative Intelligence (CI) is an approach which takes up ideas from artificial intelligence and machine learning but puts the human in the center of all considerations. One of the prerequisites of CI is that, for seamless interaction, an artificial agent and a human need to share concepts about the things that happen in their environment [Rebhan et al., 2009a, Rebhan et al., 2009b]. A viable approach for this is to use semantic knowledge in terms of ontologies. Problems here lie in the generalizability of the approach and in the grounding [Harnad, 1990] of the semantic concepts in the real world. Besides the pure functionality of such a system, the capability of making the internal processes transparent is important for user acceptance and trust [Hancock et al., 2011]. To tackle this problem, in a joint project between HRI-EU and the IBM Watson IoT Center, we have studied a minimal real-world set-up in which a robot interacts with a user to reason about tools, objects and actions. The scenario was limited to a small set of actions, e.g. cutting and opening, which are common everyday activities. However, it was designed to be expandable to a broader set of involved elements and actions. In this paper, we present a system consisting of a robotics platform provided with a scenario-specific root ontology, dialog, reasoning, external knowledge search, object detection, and symbol grounding capabilities. The ontology provides seed knowledge about a small set of concepts, and we show the system's capabilities of reasoning about tool-related queries within the ontology and additionally making use of external knowledge sources to extend its reasoning scope in case of "unknown" concepts. The analysed system is capable of

  • interacting with the user via speech,
  • processing user requests within the designed scope of the system using an internal static ontology and reasoning,
  • providing a log file that contains the internal steps of the reasoning as well as an explanation via speech,




  • connecting knowledge from external sources with the real world, and
  • operating in the real world.

In the following we present a proof-of-concept framework that allows for future extensions. We have chosen a limited scenario that shows the main idea of dealing with external knowledge sources for a robot.

The scenario: Tools, objects, and actions

In our real-world scenario, we show a mobile robot that acts intelligently and is able to reason by taking into account the current situation and the knowledge sources of the system. In the future, robots will assist humans in many tasks in their household or at work. We consider the tasks of cutting and opening objects, which are typical everyday activities. These tasks are easy to understand and already offer enough complexity to show the challenges and solutions in reasoning.

At the beginning of the demonstration, a user enters the scene and sets a task like "I want to cut a wooden block.". The system shall suggest a suited tool with respect to the tools available in the current scene. The robot IRA (Intelligent Reasoning Agent) is placed in a fixed and static environment. In our demonstration, we set up a room with a table placed in its middle (Figure 1). On the table there are tools which are available for the task. We assume that the tools do not change and are not moved while exploring.

Figure 1: Illustration of the scenario. The robot IRA and the user are in a room. The user can ask IRA how to manipulate an object. IRA explores the table, detects the tools and proposes a suitable tool via speech.

In case the user states a request which cannot be answered using the predefined ontology, e.g. "I want to cube an onion." (the action 'to cube' and the object 'onion' are unknown to the system), the system uses external knowledge sources to match the unknown terms with related terms known in the internal ontology. Hence, with this mechanism our system is able to work beyond the designed task scope.

One run of the scenario contains five main steps:
  1. User sets a task via speech interaction
  2. IRA provides a first suggestion by reasoning in its internal ontology (and external knowledge sources if required)
  3. IRA explores the table with the available tools and recognizes them visually
  4. IRA suggests the most suited tool that is available on the table for the given task and points to it
  5. IRA provides an explanation of its decision

The subsequent section presents related work, followed by a description of the system architecture. Afterwards, the functionality of the single system components is explained. Then the results of the quantitative tests are described. The conclusion completes the paper.

2 Related work

Most work on reasoning in robotics focuses on the detailed execution of manipulation tasks and on robot planning. For instance, [Hogg et al., 2017, Tenorth and Beetz, 2013] and [Haidu and Beetz, 2016] focus on learning tasks from observing human activities in kitchen scenarios. In the area of robot planning, the reasoning usually deals with uncertainties from real-world measurements [Hanheide et al., 2017].

Little work exists on covering broader everyday knowledge for robots, such as common relations between objects and properties, to allow a more natural and less targeted interaction. Early work tried to build crowd-sourced databases [Gupta and Kochenderfer, 2004]. Later on, more extensive databases emerged from the linked data wave [Antoniou et al., 2012, Wood et al., 2014]. [Daoutis et al., 2009] used the common-sense database Cyc [Matuszek et al., 2006] for communicating with the robot about objects and their properties that have been seen. [Kaiser et al., 2014] crawl text documents that fit the current context, build their own ontology and improve the content by linking against WordNet [Stallman and Krempl, 2010].

One major problem in making data interpretable for robots is the type of representation. In [Pustejovsky and Krishnaswamy, 2016], a semantic knowledge representation is defined that describes real-world objects from a functional point of view, which might be beneficial for robotics.

Still, the previous approaches fall short in judging whether an object observed by the robot is usable for a certain task. This relates to the problem that linked data is not directly machine-interpretable, since the naming of symbols in the ontology depends on the designer's choice. Hence, our approach differs in that we map linked data onto an internal, machine-interpretable ontology that represents a concept known to the robot. The advantage is that by doing this we are able to evaluate if external knowledge sources are usable for the robot.
3 The system architecture

The system consists of two main parts (Figure 2): the front-end (interaction platform) and the back-end (dialog management and reasoning). We assume a system with a predefined static ontology about object and tool types as well as relations between them with respect to the actions cut and open, e.g. a wooden block can be cut by a wood saw. The platform of the system is a MetraLabs SCITOS G5 robot [MetraLabs, 2018], additionally equipped with a Kinect camera mounted on a pan-tilt unit for moving the head. Laser scanners allow the robot to localize itself in the room, and a KINOVA JACO arm [Kinovarobotics, 2018] enables the robot to point at tools and objects in the environment.

Figure 2: The system overview. The back-end (conversation service, keyword detection, natural language understanding, reasoning over internal and external knowledge) controls the dialog and applies modules of the front-end (WAKE UP, END SESSION, SPEAK, ASK, EXPLORE TABLE, COMMUNICATE RESULT) to communicate with the user or to explore the environment. Commands/events from and to the front-end are sent via MQTT.

The front-end provides basic modules for the robot like speaking or listening. Those modules are running in ROS (Robot Operating System [ROS, 2018]). The back-end mainly comprises the dialog control and the reasoning. Further, it acts as a state machine that decides, for example, if the robot needs to explore the environment for available tools.

4 Front-end

The front-end provides modular components that can be independently triggered by the back-end through an MQTT (Message Queue Telemetry Transport, [ISO/IEC, 2016]) bridge. The MQTT bridge translates messages from the back-end to ROS messages of the front-end and vice versa. As depicted in Figure 2, the front-end modules are limited to SPEAK, ASK, EXPLORE TABLE, and COMMUNICATE RESULT. WAKE UP and END SESSION only send unidirectional messages for starting and stopping the dialog. The modules are explained in the following:

SPEAK receives a text message from the back-end, transfers it to speech and returns whether the robot has finished speaking.

ASK works similarly to SPEAK: it combines SPEAK - for asking a question - with a subsequent listening function. It finally returns the answer of the user as text to the back-end.

EXPLORE TABLE is the most complex module. If triggered, it moves the robot around the table while keeping the camera view on the table. The robot stops approximately every 0.5 m on the trajectory and captures an RGB and a depth image with the Kinect camera. For each RGB image the objectness score of the YOLO detection framework [Redmon and Farhadi, 2017] is used to predict bounding boxes of object candidates (see Figure 3). For this, the pre-trained YOLO model for the COCO dataset is used, which yields an 81-dimensional class vector for each bounding box. This vector is the feature input to a linear SVM [Cortes and Vapnik, 1995] that is trained to predict the final tool class together with a confidence.

Figure 3: Robot camera view with predicted bounding boxes.

Using the depth information, a 3D world position is computed for each object detection. After all views have been taken, nearby 3D detections are clustered using DBSCAN [Ester et al., 1996]. Each cluster is assigned a unique instance ID, a label, and a confidence. The label indicates the tool class with the highest accumulated confidence, and the confidence of the cluster equals the ratio between the accumulated confidence of the winning class and the sum of all detection confidences (Figure 4). This integration step strongly reduces the effects of noisy 3D estimations and wrong class predictions from certain viewpoints. Finally, the cluster labels with their confidences are sent to the back-end. We expect that the single-view detection performance can be strongly improved by directly fine-tuning YOLO on our data. Having this, a more efficient exploration strategy of the robot can be implemented, e.g. as in [Andreopoulos et al., 2011], where occlusions of objects on a table are taken into account. This module relates to step three of a demonstration run explained at the end of the Introduction.
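The integration step above - accumulating per-class confidence inside each spatial cluster and normalizing by the cluster total - can be sketched as follows. The grouping function is a simple distance-threshold stand-in for DBSCAN, and all labels, positions, and confidences are illustrative:

```python
from collections import defaultdict
from math import dist

def cluster_detections(detections, eps=0.15):
    """Greedy distance-based grouping of 3D detections from all
    view-points; a stand-in for the DBSCAN step. Each detection
    is a (label, confidence, (x, y, z)) tuple."""
    clusters = []
    for label, conf, pos in detections:
        for cluster in clusters:
            if any(dist(pos, p) <= eps for _, _, p in cluster):
                cluster.append((label, conf, pos))
                break
        else:
            clusters.append([(label, conf, pos)])
    return clusters

def integrate(cluster):
    """Accumulate confidence per class; the cluster label is the
    winning class and the cluster confidence is the ratio of its
    accumulated confidence to the sum over all classes."""
    acc = defaultdict(float)
    for label, conf, _ in cluster:
        acc[label] += conf
    winner = max(acc, key=acc.get)
    return winner, acc[winner] / sum(acc.values())

# One physical tool seen from three view-points; one view
# misclassifies the wood saw as a metal saw.
views = [
    ("wood saw", 0.9, (1.00, 0.50, 0.75)),
    ("wood saw", 0.8, (1.02, 0.48, 0.75)),
    ("metal saw", 0.3, (0.98, 0.52, 0.74)),
]
label, confidence = integrate(cluster_detections(views)[0])
print(label, round(confidence, 2))  # wood saw 0.85
```

The wrong single-view prediction is outvoted by the accumulated confidence of the correct class, which is exactly the effect the integration step relies on.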
Figure 4: Integrated object detection result. Each cluster is shown with its instance ID, label, confidence and size, e.g. 0. Bread Knife (confidence 0.88, size 200.0), 1. Screwdriver (0.94, 80.0), 2. Wood Saw (0.88, 300.0), 3. Lighter (0.91, 80.0), 4. Metal Saw (0.94, 140.0).

COMMUNICATE RESULT receives from the back-end the inferred information about which tool is the most suitable one for the given query. This contains the instance ID of the tool and the text for speaking. The text is forwarded to the SPEAK module and the instance ID is used to point at the tool on the table with the robot arm (step 4).

5 Back-end

The complete dialog between the user and IRA is controlled by the back-end. In addition, keyword detection based on natural language understanding techniques extracts the action and the object from tasks stated by the user.

The control module of the back-end keeps track of the current state of the dialog and triggers the needed commands for the robot. It also provides the exchange of information between the back-end components. The main intelligence of the system is contained in the reasoning module, which is explained in more detail later.

The architecture of the back-end (Figure 5) follows the microservice architectural pattern. It has been implemented on the IBM Cloud using multiple IBM Watson services. Connectivity between the robot and the back-end is established through the Watson Internet of Things Platform using the MQTT protocol. The back-end services can be organized in three layers, which are briefly described below.

Figure 5: Back-end architecture.

Natural language understanding: The components within this layer are responsible for driving the dialog with the user. If an utterance is received by the Dialog Controller, it decides which of potentially several skills is best suited to handle it. In case none fits well, the Default Skill creates a response that tells the user that the system cannot handle the request. If the utterance is related to the described scenario, the Cut & Open Skill will produce a response. In order to do so, it first analyses the intent of the utterance and extracts relevant information like the action and the object using the Keyword Extraction component. In case action and object are identified, the reasoning is performed. The result is then translated into natural language using the Cut & Open Dialog component. This component is also involved if the system must ask for missing information or explains the result.

Structured reasoning: This layer contains the main reasoning component, which provides a structured API for concept-based as well as instance-based reasoning. It uses knowledge sources from the layer below. Details of both algorithms are described in the next section of the paper.

Knowledge sources: The components of this layer provide information for the reasoning component. Most prominent is the Structured Knowledge Store component, which contains structured domain knowledge (ontology) about the use-case scenario. In order to enhance manually added knowledge, it applies reasoning rules that generate additional knowledge. The Unstructured Knowledge Store is considered as a fallback that returns a paragraph of text as description in case no specific tool can be recommended. WordNet and Wikidata wrap the corresponding knowledge sources.

WordNet [Stallman and Krempl, 2010] is a lexical database of English containing nouns, verbs, adjectives and adverbs. Words are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. This knowledge is helpful in resolving the meaning of unknown words. Wikidata [Vrandečić and Krötzsch, 2014] is an open-source knowledge database containing factual knowledge extracted, for instance, from Wikipedia. Both sources were selected because of their complementary content.
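How a look-up in such sources can resolve an unknown term is sketched below with a hand-made toy graph; real queries would go against WordNet and Wikidata, so the triples, the set of known terms, and the allowed link types are illustrative stand-ins:

```python
from collections import deque

# Toy stand-in for the external sources: (term, link type, term)
# triples. The entries are illustrative, not real query results.
LINKS = [
    ("cube", "hypernym", "cut"),
    ("uncork", "synonym", "open"),
    ("onion", "subclass of", "vegetable"),
    ("rose", "subclass of", "plant"),
]
KNOWN = {"cut", "open", "vegetable", "plant"}  # terms in the ontology
ALLOWED = {"synonym", "hypernym", "subclass of",
           "part of", "material used"}

def match(term, max_depth=2):
    """Breadth-first search for a path of allowed links that
    connects an unknown term to a term known in the internal
    ontology; max_depth limits the number of links followed."""
    queue = deque([(term, [])])
    while queue:
        current, path = queue.popleft()
        if current in KNOWN:
            return path
        if len(path) >= max_depth:
            continue
        for src, link, dst in LINKS:
            if src == current and link in ALLOWED:
                queue.append((dst, path + [(src, link, dst)]))
    return None  # no proper match found

print(match("cube"))   # [('cube', 'hypernym', 'cut')]
print(match("onion"))  # [('onion', 'subclass of', 'vegetable')]
```

A term that cannot be connected to the ontology within the depth limit yields no match, which is the case where the fallback of the Unstructured Knowledge Store applies.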
The reasoning:

The reasoning components are the pre-defined static ontology with reasoning rules on top of it, and the interfaces to Wikidata and WordNet. The static ontology contains

  • objects (≈ 30), e.g. apple, glass, rope
  • tools (≈ 20), e.g. bread knife, corkscrew, wood saw
  • concepts (≈ 15), e.g. vegetable, vessel, plant
  • relations (hasComponent, cutBy, cutByPref, openBy, openByPref)
  • an internal structure that encodes an isA relation between objects, tools, and concepts, e.g. an apple isA fruit

The objects and the tools carry the properties stability S and 3D shape. The 3D shape is described by the dimensions width W, height H and length L. These qualitative properties are used for reasoning. The stability S of a tool is specified in a range from completely deformable to completely stiff. Currently, the shape of a tool is not measured online but is a static property stored in the ontology. Later on it is planned to use the visual input of IRA to measure the length of the recognized tools. For reasoning within cutting tasks there are two main reasoning rules using these properties:

    (i) S_object ≤ S_tool, and (ii) W_object ≤ 2 · L_tool.

For a task within opening scenarios there is a special reasoning rule in addition to the direct links. It states that if a vessel v has a fastening f, and this fastening f can be opened by a tool t, then the vessel v can also be opened by the tool t:

    hasComponent(v, f) & openBy(f, t) → openBy(v, t).

The reasoning consists of two steps: concept-based reasoning and instance-based reasoning. The concept-based reasoning uses the internal ontology and the external knowledge without reference to the tools on the table. Hence the system provides a first suggestion of a tool, as mentioned in the second step of the scenario description. Figure 6 shows a visualisation of the relations of a wooden block in the internal ontology, which represents the main structure of objects and tools.

Figure 6: Instance of the relations attached to an object: wooden block –cutByPref→ wood saw, wooden block –cutBy→ cutting tools.

For the task "I want to cut a wooden block." the first suggestion would be a wood saw, since it is connected via a cutByPref relation. The suffix Pref indicates that this tool is always the preferred one for the related object. If such a relation is missing, an alternative tool that is connected with the object via the non-preference relations cutBy or openBy is suggested. In the example of the wooden block (Figure 6), all cutting tools (e.g. metal saw, pocket knife) are connected with the wooden block and hence they are alternative tools to cut it. Of course this holds only for tools that fulfil the constraints for cutting or opening the objects, respectively. To make the internal process of the reasoning transparent for the user, there is a so-called reasoning log stating the single steps of the reasoning process. It is easily accessible in real time on a web page. An example of a concept-based reasoning log can be found in Figure 7.

  • Checking ontology...
  • The action 'to cut' is known.
  • The object 'Wooden Block' is known.
  • Best suited tool to cut a Wooden Block: Wood Saw
  • Unknown if it is available, hence exploring the table.

Figure 7: Example of a concept-based reasoning log for the task "I want to cut a wooden block.".

After the first reasoning step, the robot explores the available tools on the table. Subsequently, the instance-based reasoning takes place. The recognized tools together with their recognition confidence are taken into account in addition to the internal and external knowledge. The differences to the concept-based reasoning are that only the available tools are considered, including their confidence values, and that the reasoning rules mentioned above are applied. With this additional information, the system is able to suggest an available tool in the current scene that fits the task with a high certainty. Figure 8 shows an example of instance-based reasoning.

  • On the table there are best suited tools to cut a Wooden Block: Wood Saw (id 1)
  • The best suited tool with the highest recognition confidence is the Wood Saw (id 1).
  • The recognition confidence (0.95) is above a defined threshold (0.80).
  • Recommendation: Wood Saw (id 1)

Figure 8: Example of an instance-based reasoning log for the task "I want to cut a wooden block.".

Now let us assume that the given task consists of known actions (cut and open) and objects that are contained in the ontology; we refer to this kind of task as a standard task. For standard tasks the reasoning process is as described above; external knowledge is not required. As an extension, assume that the action, the object, or both are unknown. In this case the internal knowledge is unable to provide an adequate suggestion because of lacking information.

Using external knowledge sources to find a proper match of the unknown terms to terms known in the system is then the first step before the reasoning steps of the standard tasks are performed. We refer to these tasks as non-standard tasks.
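The instance-based step and the rules above can be sketched as follows; the ontology entries, property values, and detection confidences are illustrative stand-ins, with only the 0.80 threshold taken from the log in Figure 8:

```python
# Toy ontology fragment; stability S in [0, 1], sizes in metres.
# All concrete values are illustrative.
OBJECTS = {"wooden block": {"S": 0.9, "W": 0.10}}
TOOLS = {
    "wood saw":     {"S": 1.0, "L": 0.40},
    "pocket knife": {"S": 1.0, "L": 0.08},
}
CUT_BY = {"wooden block": ["wood saw", "pocket knife"]}
CUT_BY_PREF = {"wooden block": "wood saw"}

def rules_ok(obj, tool):
    """Rule (i): S_object <= S_tool; rule (ii): W_object <= 2 * L_tool."""
    o, t = OBJECTS[obj], TOOLS[tool]
    return o["S"] <= t["S"] and o["W"] <= 2 * t["L"]

def recommend(obj, detected, threshold=0.80):
    """Instance-based step: keep only detected tools that are linked
    to the object and satisfy the rules, prefer the cutByPref tool,
    and require the recognition confidence to reach the threshold."""
    usable = {t: c for t, c in detected.items()
              if t in CUT_BY.get(obj, []) and rules_ok(obj, t)}
    if not usable:
        return None
    pref = CUT_BY_PREF.get(obj)
    tool = pref if pref in usable else max(usable, key=usable.get)
    return tool if usable[tool] >= threshold else None

detected = {"wood saw": 0.95, "lighter": 0.91}  # from EXPLORE TABLE
print(recommend("wooden block", detected))      # wood saw
```

If the preferred tool is not on the table, the sketch falls back to the best-recognized alternative connected via the non-preference relation, mirroring the cutByPref/cutBy behaviour described above.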
                                                             7
tasks as non-standard tasks. Within this paper, we define a match of two terms (an unknown term and a term known in the ontology) as proper when the semantic (linguistic) meaning fits. In other words, a proper match is a path in the external knowledge source that connects an unknown term with a term known in the internal ontology. For example, proper matches of actions are (cube, cut) and (uncork, open); proper matches of objects are (meat, food) and (rose, plant). From WordNet we only search for synonym and hypernym links, whereas from Wikidata we search for material used, part of, and subclass relations. The number of allowed links is denoted as the search depth. A proper path has the structure: unknown term (e. g. cube) – link type (e. g. is synonym) – known term in the ontology (e. g. cut). Such a path can possibly have more links and unknown terms in between. The important part is that the path connects the unknown term, via allowed links and potentially other terms, with a term known in the ontology. In order to choose the most reliable path out of possibly many paths ending in different known terms, we introduce a scoring in three steps. First, each existing link l_i is scored with 1:

    l_i = 1, if the link exists;  l_i = 0, if no link exists.

Second, a path into the ontology has a weight p_j, which is based on the number of links n:

    p_j = ∏_{i=1}^{n} (0.9 · l_i)

The factor 0.9 penalizes long paths, since we assume that the longer the path, the more the uncertainty increases.

The maximum score p_opt of all m path scores defines the optimal path. It describes the "properness" of a match between two terms (unknown term, known term):

    p_opt = max(p_1, ..., p_m)

Methods for link and path scoring, as well as for the selection of the best result, are subject to future research. For example, the method for scoring a link might use the number of links of the same type starting from the same object. Further, the types of links, e. g. made of, can be scored differently.

Figure 9 shows an excerpt of a dialog that highlights the reasoning of IRA when it is asked to "cube an onion". This example contains an unknown action (cube, in the sense of cutting) and an unknown object (onion). If IRA is asked to explain its reasoning, it describes the relations to an external knowledge source based on the mentioned scoring process.

    • User: "I want to cube an onion."
    • IRA: "To cube is like to cut. Onion is a kind of vegetable. The best tool would be a Fruit Knife. Let me check the available tools."
    • IRA: "A Fruit Knife is available, I would recommend to use it."
    • User: "Please explain."
    • IRA: "According to Wikidata, a vegetable is a superclass of an onion. According to WordNet, to cut is a hypernym of to cube. A Fruit Knife is the preferred tool to cut a vegetable. This tool is available on the table. Is there anything else you want to do?"

Figure 9: Part of the dialog if the action (cube) and the object (onion) are unknown.

6    Performance analysis of the system

In the first part, we evaluate whether the suggested tool matches the task. For instance, for the unknown terms "cube an onion", a suited tool would be a knife; this would count as a successful suggestion. An example of a failure would be the suggestion to use a wood saw for "carving wood" (unknown term): the suggested tool is reasonable, but of course it would be very difficult to carve wood with a saw. In the second part of this section, we evaluate the performance of our system in finding proper matches of unknown terms to terms in the internal ontology. This mechanism is the foundation of the reasoning.

We created a test set that consists of 50 test examples in total. Each example is a triplet (object, action, tool). The objects and actions are taken from 15 unknown actions, 44 unknown objects, and the known objects and actions in the ontology. We consider only known tools for the test examples. The test set consists of 5 examples with only an unknown action, 32 examples with only an unknown object, and 13 examples where both terms are unknown. For a meaningful evaluation, different settings have been applied in the experiments. A setting defines the search depth and the available relations of the external knowledge sources. The search depth is varied from direct links (length one) up to search paths of length six. In Figures 10 and 11, the search depth is varied on the x-axis and the legend denotes the used relations: the term Wikidata refers to the relations material used, part of, and subclass; WordNet refers to the relations synonym and hypernym.

Evaluation of tool suggestion:

Figure 10 shows the evaluation of the defined test cases. It can be seen that single relations like synonym and hypernym contained in WordNet help in resolving only few test cases, independently of the search depth.
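For concreteness, the path search and scoring described in Section 5 can be condensed into a short Python sketch. This is a minimal illustration under assumed names (the function `find_best_match` and a `get_links` stub standing in for the WordNet and Wikidata queries), not the implementation used in the reported experiments:

```python
def find_best_match(unknown_term, known_terms, get_links, max_depth=3):
    """Breadth-first search from an unknown term over allowed links.

    get_links(term) must yield (link_type, neighbour) pairs; it stands in
    for queries against WordNet (synonym, hypernym) or Wikidata
    (material used, part of, subclass). Every traversed link l_i scores 1
    and contributes a factor 0.9, so a path with n links has weight
    p_j = 0.9 ** n, and the best-scoring path into the ontology wins.
    """
    best_term, p_opt = None, 0.0
    frontier = [(unknown_term, 1.0)]
    visited = {unknown_term}
    for _depth in range(max_depth):
        next_frontier = []
        for term, weight in frontier:
            for _link_type, neighbour in get_links(term):
                if neighbour in visited:
                    continue        # BFS: the first visit is the shortest path
                visited.add(neighbour)
                p_j = weight * 0.9  # p_j = prod_i (0.9 * l_i)
                if neighbour in known_terms and p_j > p_opt:
                    best_term, p_opt = neighbour, p_j  # p_opt = max(p_1..p_m)
                next_frontier.append((neighbour, p_j))
        frontier = next_frontier
    return best_term, p_opt

# Hypothetical toy graph: "cube" reaches the known term "cut" both directly
# and via "dice"; the direct one-link path wins with p_opt = 0.9.
toy_links = {
    "cube": [("synonym", "dice"), ("hypernym", "cut")],
    "dice": [("synonym", "cut")],
}
best_term, p_opt = find_best_match("cube", {"cut"},
                                   lambda t: toy_links.get(t, []))
```

Since every link carries the same factor 0.9, a breadth-first traversal that visits each term only once finds each term first along a shortest, and hence best-weighted, path.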
[Plot: test cases passed [%] over search depth 1–6; curves for hypernym, synonym, WordNet, and Wikidata & WordNet.]

Figure 10: The evaluation of the test cases when accessing external knowledge sources. The performance is shown for different knowledge sources depending on the search depth.

The combination of both relations is beneficial (WordNet graph), but the best performance occurs when combining WordNet and Wikidata. As mentioned, the content of both sources is complementary, so it is consistent that their combination is beneficial. The graph shows that a search depth of three provides the best results, while higher depths lead to decreasing performance. In total, the system is able to recommend suited tools for 35 examples, while 15 failed. The main reason is missing information, such that no proper match can be found for at least one unknown term. Another reason for failing is a mismatch between the suggested tool and the expected one in the test-case triplet. An example for this case is (wooden block, carve (unknown), pocket knife): here the system suggests the wood saw, but of course carving wood with a wood saw is difficult.

Subsequently, we evaluate the performance of resolving unknown actions and objects in more depth.

Evaluation of proper object matches:

To evaluate the performance regarding proper object matches using Wikidata and WordNet, we analyzed the true positive and false positive rates (Figure 11). The true positive rate is the number of proper matches divided by the overall number of unknown objects in our test set. The false positive rate is the number of mismatches, that is, matches into the ontology that are semantically wrong, again divided by the overall number of unknown objects. The displayed positive rate is the sum of both rates. From Figure 11, we derive that a search depth of three provides a good balance between true positives (67.3 %) and false positives (19 %). This is in accordance with the observations from Figure 10 and holds especially for using WordNet and Wikidata. As can be seen from the positive rate, the system finds matches into the ontology for 95.45 % of the unknown objects. During our experiments it turned out that WordNet is more suited than Wikidata for resolving lacking information about unknown actions, due to its lexical content. Using the relations synonym or hypernym from WordNet, the system finds proper matches for 75 % of the unknown actions in the test set.

[Plot: rate over search depth 0–6; curves for the true positive rate, false positive rate, and positive rate.]

Figure 11: The true positive rate, false positive rate, and positive rate for the unknown objects in the test set using Wikidata and WordNet.

In summary, the most promising setting is the combination of sources with complementary content (in our case, WordNet and Wikidata). Many more sources are available [Paulheim, 2017] that are possibly useful for intelligent systems.

7    Conclusions

We proposed an intelligent system that is able to provide an answer to a task stated by the user. The answer is retrieved through reasoning that takes an internal ontology, external knowledge sources, and the explored resources in the current scene into account. The reasoning suggests a suited tool for a task such as "I want to cut a wooden block.", which contains an action and an object. For unknown actions and objects, external knowledge sources are used to map these terms onto known terms in the ontology. For our defined test cases, it turned out that the combination of complementary external knowledge sources is beneficial and a promising direction for future research. The system shows a reasoning log containing the internal steps of the reasoning process on a web page and explains the found suggestion via speech. This mechanism makes the system transparent and more pleasant to engage with.

In the future, we would like to extend the scenario to more actions and objects to allow a more natural and comprehensive interaction with the robot.

Acknowledgements

The authors would like to thank all the people who supported this project at both companies. We thank MetraLabs for the set-up and support of the robot.
References

[Andreopoulos et al., 2011] Andreopoulos, A., Hasler, S., Wersing, H., Janßen, H., Tsotsos, J., and Körner, E. (2011). Active 3D Object Localization Using a Humanoid Robot. IEEE Transactions on Robotics, 27(1):47–64.

[Antoniou et al., 2012] Antoniou, G., Groth, P., Harmelen, F. v., and Hoekstra, R. (2012). A Semantic Web Primer. The MIT Press.

[Cortes and Vapnik, 1995] Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273–297.

[Daoutis et al., 2009] Daoutis, M., Coradeshi, S., and Loutfi, A. (2009). Grounding Commonsense Knowledge in Intelligent Systems. Journal of Ambient Intelligence and Smart Environments, 1(4):311–321.

[Ester et al., 1996] Ester, M., Kriegel, H., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pages 226–231.

[Gupta and Kochenderfer, 2004] Gupta, R. and Kochenderfer, M. J. (2004). Common Sense Data Acquisition for Indoor Mobile Robots. In Proceedings of the 19th National Conference on Artificial Intelligence, AAAI'04, pages 605–610. AAAI Press.

[Haidu and Beetz, 2016] Haidu, A. and Beetz, M. (2016). Action recognition and interpretation from virtual demonstrations. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2833–2838.

[Hancock et al., 2011] Hancock, P. A., Billings, D. R., Schaefer, K. E., Chen, J. Y. C., de Visser, E., and Parasuraman, R. (2011). A meta-analysis of factors affecting trust in human-robot interaction. Human Factors, 53(5):517–527.

[Hanheide et al., 2017] Hanheide, M., Göbelbecker, M., Horn, G. S., Pronobis, A., Sjöö, K., Aydemir, A., Jensfelt, P., Gretton, C., Dearden, R., Janicek, M., Zender, H., Kruijff, G.-J., Hawes, N., and Wyatt, J. L. (2017). Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence, 247:119–150.

[Harnad, 1990] Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1):335–346.

[Hogg et al., 2017] Hogg, D. C., Alomari, M., Duckworth, P., Hawasly, M., Bore, N., and Cohn, A. G. (2017). Grounding of Human Environments and Activities for Autonomous Robots. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1395–1402.

[ISO/IEC, 2016] ISO/IEC (2016). Information technology – Message Queuing Telemetry Transport (MQTT). Website. https://www.iso.org/standard/69466.html.

[Kaiser et al., 2014] Kaiser, P., Lewis, M., Petrick, R. P. A., Asfour, T., and Steedman, M. (2014). Extracting common sense knowledge from text for robot planning. In 2014 IEEE International Conference on Robotics and Automation, ICRA, pages 3749–3756.

[Kinovarobotics, 2018] Kinovarobotics (2018). Kinova. Website. www.meetjaco.com/about/.

[Matuszek et al., 2006] Matuszek, C., Cabral, J., Witbrock, M., and Deoliveira, J. (2006). An introduction to the syntax and content of Cyc. In Proceedings of the AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, pages 44–49.

[MetraLabs, 2018] MetraLabs (2018). MetraLabs. Website. www.metralabs.com/en/mobile-robot-scitos-g5/.

[Paulheim, 2017] Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3):489–508.

[Pustejovsky and Krishnaswamy, 2016] Pustejovsky, J. and Krishnaswamy, N. (2016). VoxML: A visualization modeling language. In Proceedings of the Tenth International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, May 23–28, 2016.

[Rebhan et al., 2009a] Rebhan, S., Einecke, N., and Eggert, J. (2009a). Consistent modeling of functional dependencies along with world knowledge. In Proceedings of the International Conference on Cognitive Information Systems Engineering, pages 341–348.

[Rebhan et al., 2009b] Rebhan, S., Richter, A., and Eggert, J. (2009b). Demand-driven visual information acquisition. In Computer Vision Systems, 7th International Conference on Computer Vision Systems, ICVS, pages 124–133.

[Redmon and Farhadi, 2017] Redmon, J. and Farhadi, A. (2017). YOLO9000: Better, Faster, Stronger. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6517–6525.

[ROS, 2018] ROS (2018). Robot Operating System. Website. www.ros.org.

[Stallman and Krempl, 2010] Stallman, R. M. and Krempl, S. (2010). Princeton University 'About WordNet'. Website. https://wordnet.princeton.edu/.

[Tenorth and Beetz, 2013] Tenorth, M. and Beetz, M. (2013). KnowRob: A Knowledge Processing Infrastructure for Cognition-enabled Robots. International Journal of Robotics Research, 32(5):566–590.

[Vrandečić and Krötzsch, 2014] Vrandečić, D. and Krötzsch, M. (2014). Wikidata: A free collaborative knowledgebase. Communications of the ACM, 57(10):78–85.

[Wood et al., 2014] Wood, D., Zaidman, M., Ruth, L., and Hausenblas, M. (2014). Linked Data: Structured Data on the Web. Manning Publications, Shelter Island, first edition.