Transforming Web Knowledge into Actionable Knowledge Graphs for Robot Manipulation Tasks
Michael Beetz1, Philipp Cimiano2, Michaela Kümpel1, Enrico Motta3, Ilaria Tiddi4 and Jan-Philipp Töberg2
1 Institute for Artificial Intelligence, University of Bremen, Bremen, Germany
2 Cluster of Excellence Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
3 Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
4 Knowledge Representation and Reasoning Group, Vrije Universiteit Amsterdam, The Netherlands

Abstract
One of the visions in AI-based robotics is household robots that can autonomously handle a variety of meal preparation tasks. Based on this scenario, we present a best-practice tutorial on how to create actionable knowledge graphs that a robot can use to execute task variations of cutting actions. We implemented a solution for this task that integrates all necessary software components into the framework of the robot control process. In this tutorial, we focus on knowledge acquisition, knowledge representation and reasoning, and simulated robot action execution, bringing these components together in a learning environment that – in the extended version – introduces the whole control process of Cognitive Robotics. In particular, the tutorial details the concepts a knowledge graph should include for robot action execution, how web knowledge can be automatically acquired for the domain of cutting fruits, and how the created knowledge graph can be used to let robots execute tasks like slicing a cucumber or quartering an apple. The learning environment follows an immersive approach, using a physics-based simulation environment for visualization that helps to illustrate the concepts taught in the tutorial.
Tutorial resource: https://github.com/Food-Ninja/Tutorial_ESWC_HHAI

Keywords
Knowledge Representation, Cognitive Robotics, Web Knowledge, Actionable Knowledge, Knowledge Extraction

1. Introduction
We envision household robots that can be placed in any kitchen and then be given an arbitrary recipe from the Web, which they understand and parse into action plans that can be broken down into executable body motions performed with the objects available in the environment. For this, robots need to be able to perform meal preparation tasks with any tool, on any available object, and for any task variation. This tutorial builds on prior research that proposed a methodology for creating actionable knowledge graphs [1], in which knowledge graphs link object information to action and environment information and are thereby made actionable, as well as a knowledge engineering methodology more specifically aligned to creating ontologies for meal preparation tasks, which can be used to parameterise robot action plans in order to perform task variations of cutting actions [2]. There has been much research on the creation of knowledge graphs, resulting in many domain knowledge graphs that have proven useful for question answering. Usually, these knowledge graphs contain object information (e.g. about food objects, recipes, people, or books). To make such knowledge graphs actionable, the contained object knowledge needs to be linked to environment knowledge. If robots are to use these knowledge graphs for action execution, the graphs further need to include action knowledge.

ESWC 2024 Workshops and Tutorials Joint Proceedings, May 26-27, Heraklion, Greece
beetz@cs.uni-bremen.de (M. Beetz); cimiano@techfak.uni-bielefeld.de (P. Cimiano); michaela.kuempel@uni-bremen.de (M. Kümpel); enrico.motta@open.ac.uk (E. Motta); i.tiddi@vu.nl (I. Tiddi); jtoeberg@techfak.uni-bielefeld.de (J. Töberg)
0000-0002-7888-7444 (M. Beetz); 0000-0002-4771-441X (P. Cimiano); 0000-0002-0408-3953 (M. Kümpel); 0000-0003-0015-1952 (E. Motta); 0000-0001-7116-9338 (I. Tiddi); 0000-0003-0434-6781 (J. Töberg)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Figure 1: The Knowledge Engineering Methodology proposed in [2], used as the foundation for the tutorial.

This implies that actionable knowledge graphs do not aim at perfectly modeling object knowledge; instead, they focus on reusing existing knowledge sources and on modeling and linking environment and action knowledge in order to make the contained knowledge applicable in agent applications. This tutorial details the concepts necessary for creating an actionable knowledge graph for the example domain of Cutting Fruits and Vegetables, which robotic agents shall use to infer the correct body motions for quartering an apple or dicing a cucumber.

2. Structure of the Tutorial
The tutorial is centered around the knowledge engineering methodology introduced in [2] and its application to the exemplary task of Cutting Fruits & Vegetables. The methodology consists of five steps for creating actionable knowledge graphs that a robot can employ to handle manipulation tasks, as shown in Figure 1. In the following, we briefly summarise these steps:
1) Defining Motion Parameters: Definition of the domain- and action-dependent parameters influencing the execution of the target manipulation action. An example is the knife position for cutting tasks.
2) Collecting Knowledge Sources: Collection of different sources for three types of knowledge: action knowledge, object knowledge, and knowledge for linking action and object knowledge.
3a) Extraction of Action Groups & Affordances: Collect information about the manipulation action and its associated synonyms and hyponyms.
This information is used to organize different action verbs into groups based on similarities in their motion parameters. For each so-called action group, a representative is chosen and its affordances are created.
3b) Extraction of Object Knowledge & Dispositions: Collect information about objects participating in the manipulation action (e.g. tools, environments, targets). Then collect information and concrete values for the task-specific object properties that influence the action execution. This knowledge is represented through dispositions.
4) Relate Object to Action Knowledge: Relate the action affordances to the object dispositions in an ontology by re-using relations from the SOMA [3] ontology.
5) Link to Cognitive Architecture: Map concepts in the generalized manipulation plan to their representation in the ontology and use the architecture’s perception system to ground objects and their properties.
In this tutorial we present the whole methodology but focus on steps 1), 3) and 4), which cover the knowledge collection and extraction from (Semantic) Web resources.

2.1. Defining Motion Parameters
To create an actionable knowledge graph for the domain of cutting fruits and vegetables, we first have to identify the motion parameters that influence action execution. For this, one can first consult a lexical resource like WordNet [4] to find commonly used synonyms of cutting, such as slicing, dicing, or halving. We then investigate how different action verbs influence task execution, which results in the following motion parameters:
- number of repetitions: Cutting tasks vary in the number of repetitions to be executed. Sometimes, a cut is performed only once, while other tasks require cutting up the whole object.
- cutting position: Cutting tasks also vary in the applied cutting position. Halving requires a different position than slicing, for example.
- result object: Cutting tasks produce result objects that differ in number and shape.
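As a minimal sketch (with hypothetical names, not the tutorial's actual schema), such motion parameters can be grouped per action verb, so that an action plan could be parameterised by a simple lookup:

```python
# Hypothetical illustration of per-verb motion parameters; the field names
# and example values are assumptions, not the tutorial's actual data model.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CuttingParameters:
    repetitions: str                # "single" or "whole object"
    cutting_position: str           # e.g. "halving position", "slicing position"
    result_shape: str               # shape of the resulting pieces
    prior_actions: List[str] = field(default_factory=list)  # e.g. ["peeling"]
    depends_on: Optional[str] = None                        # e.g. quartering -> halving

# Example entries for three task variations of cutting:
PARAMS = {
    "halving":    CuttingParameters("single", "halving position", "halves"),
    "quartering": CuttingParameters("single", "halving position", "quarters",
                                    depends_on="halving"),
    "slicing":    CuttingParameters("whole object", "slicing position", "slices"),
}

def resolve_tasks(verb: str) -> List[str]:
    """Expand dependent tasks: quartering first requires halving."""
    chain = []
    while verb is not None:
        chain.append(verb)
        verb = PARAMS[verb].depends_on
    return list(reversed(chain))

print(resolve_tasks("quartering"))  # ['halving', 'quartering']
```

The `depends_on` field mirrors the dependent-task parameter: resolving a verb walks the dependency chain so that a robot would execute halving before quartering.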
- prior actions: Some objects require a prior action (such as peeling) to be executed.
- dependent tasks: Some tasks depend on prior tasks (i.e. quartering depends on halving).

2.2. Extraction of Relevant Action Knowledge from the Web
The relevant action knowledge we focus on consists of the different verbs that are associated with the manipulation action. This includes the main verb (e.g. cut) as well as all of its hyponyms and synonyms. Additionally, action knowledge covers the properties of the different verbs that distinguish their action execution and generally influence the manipulation action. In the tutorial we showcase the action knowledge extraction for the exemplary task of Cutting. We begin by extracting all synonyms and hyponyms from WordNet [4] and VerbNet [5], two expert-curated resources for lexical information and verb usage. For the verb cut, we extract 211 verbs from WordNet and 147 verbs from VerbNet. After pre-processing and duplicate removal, 181 verbs remain. These remaining verbs are then filtered based on their relevance for the domain using an instruction-focused corpus from WikiHow. We set a threshold of 100 occurrences in a specific part of an article across the whole corpus to warrant the inclusion of a verb in future steps. With this restriction, only 46 verbs remain. However, manual post-processing is still needed, since some important verbs are missing (e.g. halve or quarter) while others are very general and thus not relevant for cutting (e.g. make or pull).

Table 1
Comparison of different methods for extracting the anatomical parts of a given fruit, sorted by F1-score. In each column, the three best-performing methods are marked in bold.

Method              Acc.  Prec.  Rec.  Spec.  F1    Threshold
Recipe1M+ 2-Step    .863  .824   .636  .948   .718  Occ. in ≥ 1% of steps
ChatGPT             .775  .556   .909  .724   .690  -
GPT-4               .700  .476   .909  .621   .625  -
CN Numberbatch      .788  .609   .636  .845   .622  Cossim ≥ 0.20
Recipe1M+ Bigrams   .688  .463   .864  .621   .603  Occ. in any step
Recipe1M+ 2-Step    .738  .517   .682  .759   .588  Occ. in ≥ 0.5% of steps
Recipe1M+ Bigrams   .788  .667   .455  .914   .541  Occ. in ≥ 0.1% of steps
CN Numberbatch      .825  1.00   .364  1.00   .533  Cossim ≥ 0.30
GloVe               .550  .348   .727  .483   .471  Cossim ≥ 0.25
GloVe               .688  .435   .455  .776   .444  Cossim ≥ 0.40
NASARI              .750  .571   .364  .897   .444  Cossim ≥ 0.75
GloVe               .738  .533   .364  .879   .432  Cossim ≥ 0.50
NASARI              .500  .295   .591  .466   .394  Cossim ≥ 0.50

2.3. Extraction of Relevant Object Knowledge from the Web
For the object knowledge, we focus on information about the objects involved in the manipulation action, their properties, their usage and their specific purpose. In general we showcase a pipeline similar to the one explained in Section 2.2. We begin by extracting all relevant objects from domain-specific taxonomies. For our focus on fruits and vegetables, we query FoodOn [6] using SPARQL, resulting in 257 unique fruits and 31 unique vegetables. Since not all of these fruits and vegetables are equally relevant, and since enough information needs to exist to evaluate their task-specific properties, we again use instruction-focused corpora to filter them based on occurrence data. Besides WikiHow, we also look at the recipe corpus Recipe1M+ [7] and only include fruits and vegetables that occur in at least 1% of any part of these two corpora. This filtering step results in 15 remaining fruits and one remaining vegetable. Lastly, we present our ongoing efforts in automating the extraction of task-specific object property values. For this, we compare three different pre-trained embeddings (GloVe [8], NASARI [9] and ConceptNet Numberbatch [10]), two large language models (ChatGPT and GPT-4), as well as two techniques for extracting this information directly from Recipe1M+, on the task of extracting the existing anatomical parts of a given fruit. Our preliminary results and the corresponding thresholds can be examined in Table 1.

2.4.
Linking Action to Object Knowledge in the Ontology
For connecting and linking the action to the object knowledge, we rely on the concepts of disposition and affordance. In general, a disposition describes a property of an object that enables an agent to perform a certain task [11], as in “a knife can be used for cutting”, whereas an affordance describes what an object or the environment offers an agent [12], as in “an apple affords to be cut”. In recent works like SOMA [3], both concepts are set in relation by stating that dispositions allow objects to participate in events realizing affordances, which are more abstract descriptions of dispositions. This is achieved in the TBox by using the affordsTask, affordsTrigger and hasDisposition relations from SOMA. An example for the disposition of Peelability can be examined in Figure 2.

hasDisposition some (Peelability
    and (affordsTask some Peeling)
    and (affordsTrigger only (classifies only Hand)))

Figure 2: Example of connecting an affordance (“Peeling with a hand”) to a disposition (“Peelability”) using relations from the SOMA ontology [3].

3. Tutorial Material
For the tutorial, we made our implementation available as Jupyter Notebooks in a GitHub repository1. Participants are encouraged to download the notebooks and follow along, but since the notebooks are presented in depth during the talks, hands-on experience is optional.

Acknowledgments
The tutorial is organized by the SAIL Network in collaboration with the Joint Research Center on Cooperative and Cognition-enabled AI (CoAI JRC). The research towards this tutorial has been partially supported by the German Federal Ministry of Education and Research, Project-ID 16DHBKI047 “IntEL4CoRo - Integrated Learning Environment for Cognitive Robotics”, University of Bremen, as well as by the German Research Foundation (DFG) as part of CRC (SFB) 1320 “EASE - Everyday Activity Science and Engineering”, University of Bremen (http://www.ease-crc.org/).
The research was conducted in subproject R04 “Cognition-enabled execution of everyday actions”.

1 https://github.com/Food-Ninja/Tutorial_ESWC_HHAI

References
[1] M. Kümpel, Actionable knowledge graphs - how daily activity applications can benefit from embodied web knowledge, 2024. doi:10.26092/elib/2936.
[2] M. Kümpel, J.-P. Töberg, V. Hassouna, P. Cimiano, M. Beetz, Towards a Knowledge Engineering Methodology for Flexible Robot Manipulation in Everyday Tasks, in: International Workshop on Actionable Knowledge Representation and Reasoning for Robots (AKR3), Heraklion, Crete, Greece, 2024.
[3] D. Beßler, R. Porzel, M. Pomarlan, A. Vyas, S. Höffner, M. Beetz, R. Malaka, J. Bateman, Foundations of the Socio-physical Model of Activities (SOMA) for Autonomous Robotic Agents, in: Formal Ontology in Information Systems, volume 344 of Frontiers in Artificial Intelligence and Applications, IOS Press, Amsterdam, 2022, pp. 159–174. URL: https://ebooks.iospress.nl/doi/10.3233/FAIA210379. arXiv:2011.11972.
[4] G. A. Miller, WordNet: A Lexical Database for English, Communications of the ACM 38 (1995) 39–41. doi:10.1145/219717.219748.
[5] K. K. Schuler, VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon, Ph.D. thesis, University of Pennsylvania, 2005.
[6] D. M. Dooley, E. J. Griffiths, G. S. Gosal, P. L. Buttigieg, R. Hoehndorf, M. C. Lange, L. M. Schriml, F. S. L. Brinkman, W. W. L. Hsiao, FoodOn: A harmonized food ontology to increase global food traceability, quality control and data integration, npj Sci Food 2 (2018) 23. doi:10.1038/s41538-018-0032-6.
[7] J. Marín, A. Biswas, F. Ofli, N. Hynes, A. Salvador, Y. Aytar, I. Weber, A. Torralba, Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (2021) 187–203. doi:10.1109/TPAMI.2019.2927476.
[8] J. Pennington, R. Socher, C.
Manning, Glove: Global Vectors for Word Representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Doha, Qatar, 2014, pp. 1532–1543. doi:10.3115/v1/D14-1162.
[9] J. Camacho-Collados, M. T. Pilehvar, R. Navigli, NASARI: A Novel Approach to a Semantically-Aware Representation of Items, in: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, CO, 2015, pp. 567–577. URL: http://aclweb.org/anthology/N/N15/N15-1059.pdf.
[10] R. Speer, J. Chin, C. Havasi, ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, AAAI 31 (2017). doi:10.1609/aaai.v31i1.11164.
[11] M. T. Turvey, Ecological foundations of cognition: Invariants of perception and action, in: H. L. Pick, P. W. van den Broek, D. C. Knill (Eds.), Cognition: Conceptual and Methodological Issues, American Psychological Association, Washington, 1992, pp. 85–117. doi:10.1037/10564-004.
[12] M. H. Bornstein, J. J. Gibson, The Ecological Approach to Visual Perception, The Journal of Aesthetics and Art Criticism 39 (1980) 203. doi:10.2307/429816.