=Paper=
{{Paper
|id=Vol-2325/paper-03
|storemode=property
|title=Which Tool to Use? Grounded Reasoning in Everyday Environments with Assistant Robots
|pdfUrl=https://ceur-ws.org/Vol-2325/paper-03.pdf
|volume=Vol-2325
|authors=Lydia Fischer,Stephan Hasler,Joerg Deigmoeller,Thomas Schnuerer,Michael Redert,Ulrike Pluntke,Katrin Nagel,Chris Senzel,Joern Ploennigs,Andreas Richter,Julian Eggert
|dblpUrl=https://dblp.org/rec/conf/kr/FischerHDSRPNSP18
}}
==Which Tool to Use? Grounded Reasoning in Everyday Environments with Assistant Robots==
Which tool to use? Grounded reasoning in everyday
environments with assistant robots
Lydia Fischer1 Stephan Hasler1 Jörg Deigmöller1 Thomas Schnürer2
Michael Redert3 Ulrike Pluntke3 Katrin Nagel3 Chris Senzel3
Joern Ploennigs4 Andreas Richter1 Julian Eggert1
1 – Honda Research Institute EU, Offenbach, Germany, firstname.surname@honda-ri.de
2 – Technical University of Illmenau, Illmenau, Germany, thomas.schnuerer@tu-illmenau.de
3 – IBM Watson IoT Center, Munich, Germany, firstname.surname@de.ibm.com
4 – IBM Research, Ireland, firstname.surname@ie.ibm.com
Abstract
We present a cooperative reasoning agent embodied into a mobile robot enabled to explore its environment by a camera. The robot can infer missing knowledge given an action and an object by the user, like for example “I want to open a wine bottle.”. Available tools are explored by the robot and it finally recommends the most suitable one. The reasoning is on the one hand based on a static ontology that describes how to relate the actions “open” and “cut” with a fixed set of tools. On the other hand, unknown actions and tools are resolved by looking up synonyms or super/sub-class relations in Wikidata and WordNet. By this, the robot tries to map knowledge from linked data to its internal interpretable ontology. To retrace the reasoning process, the robot is able to explain its conclusions by text to speech. Finally, we show the performance of the system based on different settings for scanning the linked data.
Copyright © by the paper’s authors. Copying permitted for private and academic purposes. In: G. Steinbauer, A. Ferrein (eds.): Proceedings of the 11th International Workshop on Cognitive Robotics, Tempe, AZ, USA, 27-Oct-2018, published at http://ceur-ws.org
1 Introduction
Cooperative Intelligence (CI) is an approach which takes up ideas from artificial intelligence and machine learning but puts the human in the center of all considerations. One of the prerequisites of CI is that for seamless interaction, an artificial agent and a human need to share concepts about the things that happen in their environment [Rebhan et al., 2009a, Rebhan et al., 2009b]. A viable approach for this is to use semantic knowledge in terms of ontologies. Problems here lie in the generalizability of the approach and in the grounding [Harnad, 1990] of the semantic concepts in the real world. Besides the pure functionality of such a system, the capability of making the internal processes transparent is important for user acceptance and trust [Hancock et al., 2011]. To tackle this problem, in a joint project between HRI-EU and IBM Watson IoT Center, we have studied a minimal real-world set-up in which a robot interacts with a user to reason about tools, objects and actions. The scenario was limited to a small set of actions, e. g. cutting and opening, which are common everyday activities. However, it was designed to be expandable to a broader set of involved elements and actions. In this paper, we present a system consisting of a robotics platform provided with a scenario-specific root ontology, dialog, reasoning, external knowledge search, object detection, and symbol grounding capabilities. The ontology provides seed knowledge about a small set of concepts, and we show the system capabilities of reasoning about tool-related queries within the ontology and additionally making use of external knowledge sources to extend its reasoning scope in case of “unknown” concepts. The analysed system is capable of
• interacting with the user via speech,
• processing user requests within the designed scope of the system using an internal static ontology and reasoning,
• providing a log file that contains the internal steps of the reasoning as well as an explanation via speech,
• connecting knowledge from external sources with the real world, and
• operating in the real world.
In the following we present a proof of concept framework that allows for future extensions. We have chosen a limited scenario that should show the main idea of dealing with external knowledge sources for a robot.
The scenario: Tools, objects, and actions
In our real world scenario, we show a mobile robot that is acting intelligently and able to reason by taking into account the current situation and the knowledge sources of the system. In the future, robots will assist humans in many tasks in their household or at work. We consider the tasks of cutting and opening objects, which are typical everyday activities. These tasks are easy to understand and already offer enough complexity to show the challenges and solutions in reasoning.
In the beginning of the demonstration, a user enters the scene and sets a task like “I want to cut a wooden block.”. The system shall suggest a suited tool with respect to the available tools in the current scene. The robot IRA (Intelligent Reasoning Agent) is placed in a fixed and static environment. In our demonstration, we set up a room with a table that is placed in the middle of the room (Figure 1). On the table there are tools which are available for the task. We assume that the tools do not change and are not moved while exploring.
Figure 1: Illustration of the scenario. The robot IRA and the user are in a room. The user can ask IRA how to manipulate an object. IRA explores the table, detects the tools and proposes a suitable tool via speech.
In case the user states a request which cannot be answered using the predefined ontology, e. g. “I want to cube an onion.” (the action to ‘cube’ and the object ‘onion’ are unknown to the system), the system uses external knowledge sources in order to match the unknown terms with related terms known in the internal ontology. Hence, our system is able to work beyond the designed task scope with this mechanism.
One run of the scenario contains five main steps:
1. User sets a task via speech interaction
2. IRA provides a first suggestion by reasoning in its internal ontology (and external knowledge sources if required)
3. IRA explores the table with the available tools and recognizes them visually
4. IRA suggests the most suited tool that is available on the table for the given task and points to it
5. IRA provides an explanation of its decision
The subsequent section provides related work followed by a description of the system architecture. Afterwards the functionality of single system components is explained. Then the results of the quantitative tests are described. The conclusion completes the paper.
2 Related work
Most work in robotics in the area of reasoning focuses on detailed execution of manipulation tasks and robot planning. For instance, [Hogg et al., 2017, Tenorth and Beetz, 2013] and [Haidu and Beetz, 2016] focus on learning tasks from observing human activities in kitchen scenarios. In the area of robot planning, the reasoning usually deals with uncertainties from real world measurements [Hanheide et al., 2017].
Only little work exists on covering broader everyday knowledge for robots, like usual relations between objects and properties, to allow a more natural and less targeted interaction. Early work tried to build own crowd sourced databases [Gupta and Kochenderfer, 2004]. Later on, more extensive databases emerged from the linked data wave [Antoniou et al., 2012, Wood et al., 2014]. [Daoutis et al., 2009] used the common sense database Cyc [Matuszek et al., 2006] for communicating with the robot about objects and their properties that have been seen. [Kaiser et al., 2014] crawl text documents that fit to the current context, build their own ontology and improve the content by linking against WordNet [Stallman and Krempl, 2010].
One major problem in making data interpretable for robots is the type of representation. In [Pustejovsky and Krishnaswamy, 2016], a semantic knowledge representation is defined that describes real-world objects from a functional point of view, which might be beneficial for robotics.
Still, the previous approaches lack in judging if an object observed by the robot is usable for a certain task. This relates to the problem that linked data is not directly machine interpretable, since the naming of symbols in the ontology depends on the designer's choice. Hence, our approach differs as we map linked data onto an internal, machine-interpretable ontology that represents a known concept to the robot. The advantage is that by doing this we are able to evaluate
if external knowledge sources are usable for the robot.
3 The system architecture
Front-end: WAKE UP, END SESSION, SPEAK, ASK, EXPLORE TABLE, COMMUNICATE RESULT
Back-end: conversation service, keyword detection, natural language understanding, reasoning (internal knowledge, external knowledge)
Front-end and back-end exchange events and commands via MQTT.
Figure 2: The system overview. The back-end controls the dialog and applies modules of the front-end to communicate with the user or to explore the environment. Commands/events from and to the front-end are sent by MQTT.
The system consists of two main parts (Figure 2): the front-end (interaction platform) and the back-end (dialog management and reasoning). We assume a system with a predefined static ontology about objects and tool types as well as relations between them with respect to the actions cut and open, e. g. a wooden block can be cut by a wood saw. The platform of the system is a MetraLabs SCITOS G5 robot [MetraLabs, 2018], additionally equipped with a Kinect camera mounted on a pan-tilt-unit for moving the head. Laser scanners allow the robot to localize itself in the room and a KINOVA JACO arm [Kinovarobotics, 2018] enables the robot to point at tools and objects in the environment.
The front-end provides basic modules for the robot like speaking or listening. Those modules are running in ROS (Robot Operating System [ROS, 2018]). The back-end comprises mainly the dialog control and the reasoning. Further, it acts as a state machine that decides for example if the robot needs to explore the environment for available tools.
4 Front-end
The front-end provides modular components that can be independently triggered by the back-end through an MQTT (Message Queue Telemetry Transport, [ISO/IEC, 2016]) bridge. The MQTT bridge translates messages from the back-end to ROS messages of the front-end and vice versa. As depicted in Figure 2, the front-end modules are limited to SPEAK, ASK, EXPLORE TABLE, and COMMUNICATE RESULT. WAKE UP and END SESSION only send unidirectional messages for starting and stopping the dialog. The modules are explained in the following:
SPEAK receives a text message from the back-end, converts it to speech and returns when the robot has finished speaking.
ASK works similarly to SPEAK: it first uses SPEAK to ask a question and then runs a listening function. It finally returns the answer of the user as text to the back-end.
EXPLORE TABLE is the most complex module. If triggered, it moves the robot around the table while keeping the camera view on the table. The robot stops approximately every 0.5 m on the trajectory and captures an RGB and a depth image from the Kinect camera. For each RGB image the objectness score of the YOLO detection framework [Redmon and Farhadi, 2017] is used to predict bounding boxes of object candidates (see Figure 3). For this the pre-trained YOLO model for the COCO dataset is used, which yields an 81-dimensional class vector for each bounding box. This vector is the feature input to a linear SVM [Cortes and Vapnik, 1995] that is trained to predict the final tool class together with a confidence.
Figure 3: Robot camera view with predicted bounding boxes.
Using the depth information, a 3D world position is computed for each object detection. After all views have been taken, nearby 3D detections are clustered using DBSCAN [Ester et al., 1996]. Each cluster is assigned a unique instance ID, a label, and a confidence. The label indicates the tool class with the highest accumulated confidence, and the confidence of the cluster equals the ratio between the accumulated confidence of the winning class and the sum of all detection confidences (Figure 4). This integration step strongly reduces the effects of noisy 3D estimations and wrong class predictions from certain view-points. Finally the cluster labels with their confidences are sent to the back-end. We expect that the single view detection performance can be strongly improved by directly fine-tuning YOLO on our data. Having this, a more efficient exploration strategy of the robot can be implemented, e. g. like in [Andreopoulos et al., 2011] where occlusions of objects on a table are taken into account. This module relates to step three of a demonstration run explained at the end of the Introduction.
Figure 4: Integrated object detection result. Detected clusters with instance ID, label, confidence, and size:
0. Bread Knife: confidence = 0.88, size = 200.0
1. Screwdriver: confidence = 0.94, size = 80.0
2. Wood Saw: confidence = 0.88, size = 300.0
3. Lighter: confidence = 0.91, size = 80.0
4. Metal Saw: confidence = 0.94, size = 140.0
COMMUNICATE RESULT receives from the back-end the inferred information about which tool is the most suitable one for the given query. This contains the instance ID of the tool and the text for speaking. The text is forwarded to the SPEAK module and the instance ID is used to point at the tool on the table with the robot arm (step 4).
5 Back-end
The complete dialog between the user and IRA is controlled by the back-end. In addition, keyword detection based on natural language understanding techniques extracts the action and the object from tasks stated by the user.
The control module of the back-end keeps track of the current state of the dialog and triggers the needed commands for the robot. It also provides the exchange of information between the back-end components. The main intelligence of the system is contained in the reasoning module that is explained in more detail later.
The architecture of the back-end (Figure 5) follows the microservice architectural pattern. It has been implemented on the IBM Cloud using multiple IBM Watson services. Connectivity between the robot and the back-end is established through the Watson Internet of Things Platform using the MQTT protocol. The back-end services can be organized in three layers, which are briefly described below.
Figure 5: Back-end architecture
Natural language understanding: The components within this layer are responsible for driving the dialog with the user. If an utterance is received by the Dialog Controller, it decides which of potentially several skills is best suited to handle it. In case none fits well, the Default Skill creates a response that tells the user that the system cannot handle the request. If the utterance is related to the described scenario, the Cut & Open Skill will produce a response. In order to do so, it first analyses the intent of the utterance and extracts relevant information like the action and the object using the Keyword Extraction component. In case action and object are identified, the reasoning is performed. The result is then translated into natural language using the Cut & Open Dialog component. This component is also involved if the system must ask for missing information or explain the result.
Structured reasoning: This layer contains the main reasoning component, which provides a structured API for concept-based as well as instance-based reasoning. It uses knowledge sources from the layer below. Details of both algorithms are described in the next section of the paper.
Knowledge sources: The components of this layer provide information for the reasoning component. Most prominent is the Structured Knowledge Store component, which contains structured domain knowledge (ontology) about the use case scenario. In order to enhance manually added knowledge, it applies reasoning rules that generate additional knowledge. The Unstructured Knowledge Store is considered as a fallback that returns a paragraph of text as description in case no specific tool can be recommended. WordNet and Wikidata wrap the corresponding knowledge sources. WordNet [Stallman and Krempl, 2010] is a lexical database of English containing nouns, verbs, adjectives and adverbs. Words are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. This knowledge is helpful in resolving the meaning of unknown words. Wikidata [Vrandečić and Krötzsch, 2014] is an open source knowledge database containing factual knowledge extracted for instance from Wikipedia. Both sources are selected because of their complementary content.
The reasoning:
The reasoning components are the pre-defined static ontology with reasoning rules on top of it, and the interfaces to Wikidata and WordNet. The static ontology contains
• objects (≈ 30), e. g. apple, glass, rope
• tools (≈ 20), e. g. bread knife, corkscrew, wood saw
• concepts (≈ 15), e. g. vegetable, vessel, plant
• relations (hasComponent, cutBy, cutByPref, openBy, openByPref)
• the internal structure encodes an isA relation of objects, tools, and concepts, e. g. an apple isA fruit
The objects and the tools are attached with the properties: stability S and their 3D shape. The 3D shape is described in the dimensions width W, height H and length L. These qualitative properties are used for reasoning. The stability S of a tool is specified in a range from completely deformable to completely stiff. Currently, the shape of a tool is not measured online but is a static property stored in the ontology. Later on it is planned to use the visual input of IRA to measure the length of the recognized tools. For reasoning within cutting tasks there are two main reasoning rules using these properties:
$$ \text{(i)}\; S_{\mathrm{object}} \le S_{\mathrm{tool}}, \quad \text{and} \quad \text{(ii)}\; W_{\mathrm{object}} \le 2 \cdot L_{\mathrm{tool}} . $$
For a task within opening scenarios there is a special reasoning rule in addition to the direct links that says, if a vessel v has a fastening f, and this fastening f can be opened by a tool t, then vessel v can also be opened by tool t:
$$ \mathrm{hasComponent}(v, f) \;\wedge\; \mathrm{openBy}(f, t) \;\rightarrow\; \mathrm{openBy}(v, t) . $$
The reasoning consists of two steps: concept-based reasoning, and instance-based reasoning. The concept-based reasoning uses the internal ontology and the external knowledge without reference to the tools on the table. Hence the system provides a first suggestion of a tool, as mentioned in the second step of the scenario description. Figure 6 shows a visualisation of the relations of a wooden block in the internal ontology which represents the main structure of objects and tools.
Figure 6: Instance of the relations attached to an object (wooden block: cutByPref → wood saw, cutBy → cutting tools).
For the task “I want to cut a wooden block.” the first suggestion would be a wood saw since it is connected via a cutByPref relation. The suffix Pref indicates that a tool is always the preferred one given a related tool. If such a relation is missing, an alternative tool that is connected with the object via the non-preference relations cutBy, openBy is suggested. In the example of the object wooden block (Figure 6) all cutting tools (e. g. metal saw, pocket knife) are connected with the wooden block and hence they are alternative tools to cut it. Of course this holds only for tools that fulfill the constraints for cutting or opening the objects, respectively. To make the internal process of the reasoning transparent for the user there is a so-called reasoning log stating single steps of the reasoning process. It is easily accessible in real-time on a web page. An example for a concept-based reasoning log can be found in Figure 7.
• Checking ontology...
• The action ‘to cut’ is known.
• The object ‘Wooden Block’ is known.
• Best suited tool to cut a Wooden Block: Wood Saw
• Unknown if it is available, hence exploring the table.
Figure 7: Example of a concept-based reasoning log for the task “I want to cut a wooden block.”.
After the first reasoning step, the robot explores the available tools on the table. Subsequently, the instance-based reasoning takes place. The recognized tools together with their recognition confidence are taken into account in addition to the internal and external knowledge. The differences to the concept-based reasoning are that only the available tools are considered including their confidence values and that the reasoning rules mentioned above are applied. With this additional information, the system is able to suggest an available tool in the current scene that fits the task with a high certainty. Figure 8 shows an example of instance-based reasoning.
• On the table there are best suited tools to cut a Wooden Block: Wood Saw (id 1)
• The best suited tool with the highest recognition confidence is the Wood Saw (id 1).
• The recognition confidence (0.95) is above a defined threshold (0.80).
• Recommendation: Wood Saw (id 1)
Figure 8: Example of an instance-based reasoning log for the task “I want to cut a wooden block.”
Now, let us assume that the given task consists of known actions (cut and open) and objects that are contained in the ontology. We refer to this kind of tasks as standard tasks. For standard tasks the reasoning process is described above; external knowledge is not required. As an extension assume that the action, the object, or both terms are unknown. In this case the internal knowledge is unable to provide an adequate suggestion because of lacking information. Using external knowledge sources to find a proper match of the unknown terms to terms known in the system is the first step before the reasoning steps of the standard tasks are performed. We refer to this kind of tasks as non-standard tasks.
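The rules and the two reasoning steps of this section can be summarised in a short sketch. The ontology fragment, the numeric stability and size values, and all function names are hypothetical placeholders chosen for illustration. Only the two cutting rules, the opening rule, the preference of cutByPref over cutBy, and the 0.80 confidence threshold of Figure 8 are taken from the text. Falling back to the next ranked tool when a candidate is missing or below the threshold is an assumption.

```python
# Hypothetical ontology fragment: stability S in [0, 1], sizes in arbitrary units.
OBJECTS = {"wooden block": {"S": 0.8, "W": 100.0}}
TOOLS = {
    "wood saw": {"S": 0.9, "L": 300.0},
    "metal saw": {"S": 0.9, "L": 140.0},
    "bread knife": {"S": 0.6, "L": 200.0},
}
CUT_BY_PREF = {"wooden block": ["wood saw"]}
CUT_BY = {"wooden block": ["wood saw", "metal saw", "bread knife"]}
HAS_COMPONENT = {"wine bottle": ["cork"]}
OPEN_BY = {"cork": ["corkscrew"]}


def can_cut(obj, tool):
    """Cutting rules (i) S_object <= S_tool and (ii) W_object <= 2 * L_tool."""
    o, t = OBJECTS[obj], TOOLS[tool]
    return o["S"] <= t["S"] and o["W"] <= 2 * t["L"]


def open_by(vessel):
    """Opening rule: hasComponent(v, f) & openBy(f, t) -> openBy(v, t)."""
    tools = list(OPEN_BY.get(vessel, []))  # direct openBy links of the vessel
    for fastening in HAS_COMPONENT.get(vessel, []):
        tools += [t for t in OPEN_BY.get(fastening, []) if t not in tools]
    return tools


def concept_based_cut(obj):
    """Concept-based step: preferred tools first, then other cutBy tools."""
    preferred = CUT_BY_PREF.get(obj, [])
    return preferred + [t for t in CUT_BY.get(obj, []) if t not in preferred]


def instance_based_cut(obj, detections, threshold=0.80):
    """Instance-based step on detections (instance_id, tool, confidence).

    Returns the best-ranked detected tool that satisfies the cutting rules
    and whose recognition confidence is above the threshold, else None.
    """
    for tool in concept_based_cut(obj):
        if not can_cut(obj, tool):
            continue
        candidates = [d for d in detections if d[1] == tool]
        if candidates:
            best = max(candidates, key=lambda d: d[2])
            if best[2] > threshold:
                return best
    return None


# Hypothetical exploration result of the table.
table = [(0, "bread knife", 0.88), (1, "wood saw", 0.95), (4, "metal saw", 0.94)]
print(concept_based_cut("wooden block")[0])       # first suggestion: wood saw
print(instance_based_cut("wooden block", table))  # (1, 'wood saw', 0.95)
print(open_by("wine bottle"))                     # ['corkscrew']
```

In this toy ontology the bread knife is rejected by rule (i) because it is less stable than the wooden block, so it would never be recommended even if it were the only tool on the table.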
For non-standard tasks, we define a match of two terms (unknown term, term known in the ontology) as proper when the semantic (linguistic) meaning fits. In other words, a proper match is a path in the external knowledge source that connects an unknown term with a term known in the internal ontology. Examples of proper matches are (cube, cut) and (uncork, open) for actions, and (meat, food) and (rose, plant) for objects. From WordNet we only search for synonym and hypernym links, whereas from Wikidata we search for material used, part of, and subclass relations. The number of allowed links indicating relations is denoted as search depth. A proper path is a path of the structure: unknown term (e. g. cube) – link-type (e. g. is synonym) – known term in the ontology (e. g. cut). Such a path can possibly have more links and unknown terms in-between. The important part is that the path connects the unknown term via allowed links and potentially other terms with a term known in the ontology. In order to choose the most reliable path out of possibly many paths ending in different known terms, we introduce a scoring in three steps. First, each existing link l_i is scored with 1:
$$ l_i = \begin{cases} 1, & \text{if link exists} \\ 0, & \text{if no link exists} \end{cases} $$
Second, a path into the ontology has a weight p_j which is based on the number n of links:
$$ p_j = \prod_{i=1}^{n} (0.9 \cdot l_i) $$
The factor 0.9 penalizes long paths, since we assume that the longer the path, the more the uncertainty increases. Third, the maximum score p_opt of all m path scores defines the optimal path. It describes the “properness” of a match between two terms (unknown term, known term):
$$ p_{\mathrm{opt}} = \max(p_1, \ldots, p_m) $$
Methods for link and path scoring as well as the selection of the best result are subject to future research. For example, the method for scoring a link might use the number of links of the same type starting from the same object. Further, the types of links, e. g. made of, can be scored differently.
Figure 9 shows an excerpt of a dialog that highlights the reasoning of IRA if it is asked for “cube an onion”. This example contains an unknown action (cube in the sense of cutting) and an unknown object (onion). If IRA is asked to explain its reasoning it describes the relations to an external knowledge source based on the mentioned scoring process.
• User: “I want to cube an onion”
• IRA: “To cube is like to cut. Onion is a kind of vegetable. The best tool would be a Fruit Knife. Let me check the available tools.”
• IRA: “A Fruit Knife is available, I would recommend to use it.”
• User: “Please explain.”
• IRA: “According to Wikidata a vegetable is a superclass of an onion. According to WordNet to cut is a hypernym of to cube. A Fruit Knife is the preferred tool to cut a vegetable. This tool is available on the table. Is there anything else you want to do?”
Figure 9: Part of the dialog if action (cube) and object (onion) are unknown.
6 Performance analysis of the system
In the first part we evaluate if the suggested tool matches the task. For instance, for the unknown terms “cube an onion” a suited tool would be a knife. This would count as a successful suggestion. An example for a failure would be the suggestion to use a wood saw for “carving wood” (unknown term). The suggested tool is reasonable but of course it would be very difficult to carve wood with a saw. In the second part of this section we evaluate the performance of our system in finding proper matches for unknown terms to terms in the internal ontology. This mechanism is the basis of the reasoning.
We created a test set that consists in total of 50 test examples. Each example is a triplet (object, action, tool). The objects and actions are taken from 15 unknown actions, 44 unknown objects, and the known objects and actions in the ontology. We consider only known tools for the test examples. The test set consists of 5 examples with only an unknown action, 32 examples with only an unknown object, and 13 examples where both terms are unknown. For a meaningful evaluation, different settings have been applied in the experiments. Within a setting the search depth of the available relations of the external knowledge sources is defined. The search depth is varied from direct links (length one) up to search paths of length six. In Figures 10 and 11, the search depth is varied on the x-axis and the legend denotes the used relation. The term Wikidata refers to the used relations: material used, part of, and subclass. WordNet refers to the relations: synonym and hypernym.
Evaluation of tool suggestion:
Figure 10 shows the evaluation of the defined test cases. It can be seen that single relations like synonym and hypernym contained in WordNet help in resolving few test cases only, independently of the search depth.
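To make the interplay of search depth and path scoring concrete, the following sketch enumerates link paths from an unknown term up to a given search depth and scores each path with a factor of 0.9 per existing link, following the scoring defined in Section 5. The tiny link graph, the intermediate term in it, and the function names are hypothetical. A real lookup would query WordNet and Wikidata instead of a dictionary, and stopping a path at the first known term is an assumption.

```python
# Hypothetical link graph: term -> list of (link_type, neighbouring term).
LINKS = {
    "cube": [("hypernym", "cut")],
    "onion": [("subclass", "bulb vegetable")],
    "bulb vegetable": [("subclass", "vegetable")],
}
KNOWN_TERMS = {"cut", "open", "vegetable"}
ALLOWED = ("synonym", "hypernym", "material used", "part of", "subclass")


def find_paths(term, max_depth, allowed=ALLOWED):
    """Return all link paths from `term` to a known term with <= max_depth links."""
    paths = []
    stack = [(term, [])]
    while stack:
        current, path = stack.pop()
        if len(path) == max_depth:
            continue  # search depth exhausted on this branch
        for link_type, target in LINKS.get(current, []):
            visited = target == term or any(target == t for _, t in path)
            if link_type not in allowed or visited:
                continue
            new_path = path + [(link_type, target)]
            if target in KNOWN_TERMS:
                paths.append(new_path)  # path ends in the internal ontology
            else:
                stack.append((target, new_path))
    return paths


def path_score(path, factor=0.9):
    """p_j = prod_i (factor * l_i), with l_i = 1 for every existing link."""
    score = 1.0
    for _link in path:
        score *= factor * 1
    return score


def best_match(term, max_depth):
    """Return (known term, p_opt) for the highest-scoring path, or None."""
    paths = find_paths(term, max_depth)
    if not paths:
        return None
    best = max(paths, key=path_score)
    return best[-1][1], path_score(best)


print(best_match("cube", 1))   # direct link to the known action "cut"
print(best_match("onion", 1))  # no known term reachable with one link
print(best_match("onion", 2))  # reaches "vegetable" via two subclass links
```

With this toy graph the object "onion" is only resolved from search depth two onwards, and its score is lower than that of the direct match for "cube" because each additional link multiplies the score by 0.9.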
Figure 10: The evaluation of the test cases if accessing external knowledge sources. The performance is shown for different knowledge sources depending on search depth. (Plot: test cases passed [%] over search depth 1 to 6; curves: hypernym, synonym, WordNet, Wikidata & WordNet.)
The combination of both relations is beneficial (WordNet graph) but the best performance occurs when combining WordNet and Wikidata. As mentioned, the content of both sources is complementary and it is consequential that their combination is beneficial. The associated graph displays that a search depth of three provides the best results while higher depths lead to a decreasing performance. In total the system is able to recommend suited tools for 35 examples while 15 failed. The main reason is missing information such that there is no possibility to find a proper match for at least one unknown term. Another reason of failing is the mismatch of the suggested tool with the expected one in the test case triplet. An example for this case is (wooden block, carve (unknown), pocket knife). Here the system suggests the wood saw but of course carving wood with a wood saw is difficult.
Subsequently we evaluate more deeply the performance of resolving unknown actions and objects.
Evaluation of proper object matches:
To evaluate the performance regarding proper object matches using Wikidata and WordNet, we analyzed the true positive rates and false positive rates (Figure 11). The true positive rate reports the number of proper matches divided by the overall number of unknown objects from our test set. The false positive rate denotes the number of mismatches, that means matches into the ontology which are semantically wrong, again divided by the overall number of unknown objects. The displayed positive rate shows the sum of both rates. From Figure 11, we derive that a search depth of three provides a good balance between true positives (67.3 %) and false positives (19 %). This is in accordance with observations from Figure 10 and holds especially for using WordNet and Wikidata. As can be seen from the positive rate, the system finds matches into the ontology for 95.45 % of the unknown objects. During our experiments it turned out that WordNet is more suited than Wikidata for resolving lacking information of unknown actions due to its lexical content. Using the relations synonym or hypernym from WordNet, the system finds proper matches for 75 % of the unknown actions of the test set.
Figure 11: The true positive rate, false positive rate, and the positive rate for the unknown objects in the test set using Wikidata and WordNet. (Plot: rate from 0.0 to 1.0 over search depth 0 to 6; curves: true positive rate, false positive rate, positive rate.)
In summary, the most promising setting is the combination of sources with complementary content (in our case WordNet and Wikidata). There are many more sources available [Paulheim, 2017] possibly useful for intelligent systems.
7 Conclusions
We proposed an intelligent system that is able to provide an answer for a task stated by the user. The answer is retrieved through reasoning taking an internal ontology, external knowledge sources and the explored resources in the current scene into account. The reasoning takes place in order to suggest a suited tool for a task like “I want to cut a wooden block.” that contains an action and an object. For unknown actions and objects, external knowledge sources are used to map these terms onto known terms in the ontology. For our defined test cases it turned out that the combination of complementary external knowledge sources is beneficial and a promising direction for future research. The system shows on a web page a reasoning log containing the internal steps of the reasoning process and gives an explanation of the found suggestion via speech. This mechanism makes the system transparent and nicer to engage with.
In future, we would like to extend the scenario to more actions and objects to allow a more natural and comprehensive interaction with the robot.
Acknowledgements
The authors would like to thank all the people that supported this project at both companies. We thank MetraLabs for the set-up and support of the robot.
9
References
[Andreopoulos et al., 2011] Andreopoulos, A., Hasler, S., Wersing, H., Janßen, H., Tsotsos, J., and Körner, E. (2011). Active 3D Object Localization Using a Humanoid Robot. IEEE Transactions on Robotics, 27(1):47–64.
[Antoniou et al., 2012] Antoniou, G., Groth, P., Harmelen, F. v., and Hoekstra, R. (2012). A Semantic Web Primer. The MIT Press.
[Cortes and Vapnik, 1995] Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273–297.
[Daoutis et al., 2009] Daoutis, M., Coradeshi, S., and Loutfi, A. (2009). Grounding Commonsense Knowledge in Intelligent Systems. Journal of Ambient Intelligence and Smart Environments, 1(4):311–321.
[Ester et al., 1996] Ester, M., Kriegel, H., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pages 226–231.
[Gupta and Kochenderfer, 2004] Gupta, R. and Kochenderfer, M. J. (2004). Common Sense Data Acquisition for Indoor Mobile Robots. In Proceedings of the 19th National Conference on Artifical Intelligence, AAAI’04, pages 605–610. AAAI Press.
[Haidu and Beetz, 2016] Haidu, A. and Beetz, M. (2016). Action recognition and interpretation from virtual demonstrations. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2833–2838.
[Hancock et al., 2011] Hancock, P. A., Billings, D. R., Schaefer, K. E., Chen, J. Y. C., de Visser, E., and Parasuraman, R. (2011). A meta-analysis of factors affecting trust in human-robot interaction. Human Factors, 53(5):517–527.
[Hanheide et al., 2017] Hanheide, M., Göbelbecker, M., Horn, G. S., Pronobis, A., Sjöö, K., Aydemir, A., Jensfelt, P., Gretton, C., Dearden, R., Janicek, M., Zender, H., Kruijff, G.-J., Hawes, N., and Wyatt, J. L. (2017). Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence, 247:119–150.
[Harnad, 1990] Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1):335–346.
[Hogg et al., 2017] Hogg, D. C., Alomari, M., Duckworth, P., Hawasly, M., Bore, N., and Cohn, A. G. (2017). Grounding of Human Environments and Activities for Autonomous Robots. In Proceedings of International Joint Conference of Artificial Intelligence (IJCAI), pages 1395–1402.
[ISO/IEC, 2016] ISO/IEC (2016). Information technology – Message Queuing Telemetry Transport (MQTT). Website. https://www.iso.org/standard/69466.html.
[Kaiser et al., 2014] Kaiser, P., Lewis, M., Petrick, R. P. A., Asfour, T., and Steedman, M. (2014). Extracting common sense knowledge from text for robot planning. In 2014 IEEE International Conference on Robotics and Automation, ICRA, pages 3749–3756.
[Kinovarobotics, 2018] Kinovarobotics (2018). Kinova. Website. www.meetjaco.com/about/.
[Matuszek et al., 2006] Matuszek, C., Cabral, J., Witbrock, M., and Deoliveira, J. (2006). An introduction to the syntax and content of Cyc. In Proceedings of the AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, pages 44–49.
[MetraLabs, 2018] MetraLabs (2018). Metralabs. Website. www.metralabs.com/en/mobile-robot-scitos-g5/.
[Paulheim, 2017] Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3):489–508.
[Pustejovsky and Krishnaswamy, 2016] Pustejovsky, J. and Krishnaswamy, N. (2016). VoxML: A visualization modeling language. In Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, May 23-28, 2016.
[Rebhan et al., 2009a] Rebhan, S., Einecke, N., and Eggert, J. (2009a). Consistent modeling of functional dependencies along with world knowledge. In Proceedings of the International Conference on Cognitive Information Systems Engineering, pages 341–348.
[Rebhan et al., 2009b] Rebhan, S., Richter, A., and Eggert, J. (2009b). Demand-driven visual information acquisition. In Computer Vision Systems, 7th International Conference on Computer Vision Systems, ICVS, pages 124–133.
[Redmon and Farhadi, 2017] Redmon, J. and Farhadi, A. (2017). YOLO9000: Better, Faster, Stronger. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6517–6525.
[ROS, 2018] ROS (2018). Robot Operating System. Website. www.ros.org.
[Stallman and Krempl, 2010] Stallman, R. M. and Krempl, S. (2010). Princeton university ‘About WordNet’. Website. https://wordnet.princeton.edu/.
[Tenorth and Beetz, 2013] Tenorth, M. and Beetz, M. (2013). KnowRob: A Knowledge Processing Infrastructure for Cognition-enabled Robots. International Journal of Robotic Research, 32-5:566–590.
[Vrandečić and Krötzsch, 2014] Vrandečić, D. and Krötzsch, M. (2014). Wikidata: A free collaborative knowledgebase. Communications of ACM, 57(10):78–85.
[Wood et al., 2014] Wood, D., Zaidman, M., Ruth, L., and Hausenblas, M. (2014). Linked Data: Structured Data on the Web. Manning Publications, Shelter Island, first edition.