=Paper=
{{Paper
|id=Vol-2941/paper14
|storemode=property
|title=BenchEmbedd: A FAIR Benchmarking Tool for Knowledge Graph Embeddings
|pdfUrl=https://ceur-ws.org/Vol-2941/paper14.pdf
|volume=Vol-2941
|authors=Afshin Sadeghi,Xhulia Shahini,Martin Schmitz,Jens Lehmann
|dblpUrl=https://dblp.org/rec/conf/i-semantics/SadeghiSS021
}}
==BenchEmbedd: A FAIR Benchmarking Tool for Knowledge Graph Embeddings==
BenchEmbedd: A FAIR Benchmarking Tool for Knowledge Graph Embeddings

Afshin Sadeghi (1,2), Xhulia Shahini (1), Martin Schmitz (1), and Jens Lehmann (1,2)

(1) Department of Computer Science, University of Bonn, Germany
(2) Fraunhofer IAIS, Germany

sadeghi@cs.uni-bonn.com, shahinixhulja@gmail.com, schmitz.kessenich@gmail.com, jens.lehmann@cs.uni-bonn.de

The first two authors are co-first authors. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Knowledge graph embedding models have been studied comprehensively in recent years. However, these studies lack an evaluation system that compares their efficiency in a reproducible manner following the FAIR principles. In this study, we extend the general HOBBIT benchmarking platform to evaluate the efficiency of embedding models under such criteria. The demo benchmark, the source code of this study, and the installation and usage guide are openly available at https://github.com/mlwin-de/BenchEmbedd. In this paper, we explain the structure of this benchmarking tool and demonstrate the usage of the benchmarking system for knowledge graph embedding models.

Keywords: Knowledge graph embedding · Benchmarking · Link prediction.

1 Introduction

A knowledge graph is a heterogeneous multi-relational graph composed of knowledge about the world presented in a structured form, i.e., facts are represented by entities that are connected using relations. Knowledge graph embedding (KGE) models learn a mathematical approximation of knowledge graphs and produce representations for their entities and relations. These methods have been studied comprehensively in recent years [4, 3, 6] and are applied in many downstream Machine Learning and Natural Language Processing (NLP) tasks. A gap in current KGE studies is a standard independent evaluation environment that evaluates the efficiency of models in a fair setting (e.g., with the same vector sizes). Furthermore, these studies suffer from the lack of a systematic, reproducible evaluation. To address these issues, we extended HOBBIT [2], a holistic benchmarking platform for Big Linked Data, with a new set of benchmarks that evaluate the efficiency of knowledge graph embedding models under the aforementioned criteria. We released this benchmarking tool under the name BenchEmbedd. A demo benchmark, the source code, and the installation and usage guide of this project are openly available at https://github.com/mlwin-de/BenchEmbedd.

We chose HOBBIT as the base because it is developed under the FAIR principles [5], and we follow the same concepts in BenchEmbedd. Another advantage of the platform is that it generates dockerized benchmarks, i.e., once a system (image) is generated, it can be executed locally on a personal machine or a local cluster, or be deployed on computing services such as Amazon Web Services (AWS). The produced benchmarks are accessible, transferable, and easily reusable.

This setting promotes reliable scientific publications, because it allows researchers to repeat the evaluations of a study without concerns about standardized evaluation hardware. We ensure the reproducibility of the evaluations by generating benchmark systems, which are executable (Docker) images of the exact environment of an original evaluation made by a researcher. The method is easily extensible by making a new copy and adding more models to it.

Fig. 1. HOBBIT platform structure (diagram from [2]). BenchEmbedd extends it with evaluations and metrics for link prediction on knowledge graphs.

In the following section, we explain the structure of our benchmarking platform. We then explain the functionalities in Section 3 and the demonstration of BenchEmbedd in Section 4.

2 Structure

Figure 1 illustrates the components of the HOBBIT platform structure. To build a HOBBIT-based benchmark, we created the green and orange components in this figure. These parts consist of the Benchmark Components (in orange) and the Benchmark System (in green).

The Benchmark Components provide the tasks and data for the system. All benchmark components together work as an infrastructure for benchmarking a system on the task of link prediction. This part consists of the Evaluation Module, the Evaluation Storage, the Benchmark Controller, the Task Generator, and the Data Generator.

The Benchmark System contains a complete ready-to-run benchmarking workflow within a controlled dockerized (https://www.docker.com) running environment. A Benchmark System can contain configurations for running multiple tests on different models and different test datasets. To extend BenchEmbedd to other datasets, it is enough to duplicate and extend a new Benchmark System configuration of the benchmarking platform. Section 4 explains a demo system and the steps to make a new system.

3 Functionalities

In BenchEmbedd we perform a link prediction evaluation task. KGE models learn knowledge graphs in the form of triples (head, relation, tail), and the link prediction task tests how efficiently KGE models predict missing links (triples) in a knowledge graph. Figure 2 shows a knowledge graph with 4 entities, where the green relations are known. In this example, the link prediction task tests how well the missing triple ("Polito", "is a university in", "Italy") is estimated by a knowledge graph learning model. A KGE model is efficient if it generates a high score for the missing link, indicating the existence of this relation.

Fig. 2. An example of a knowledge graph with a missing link.

The current implementation computes the following metrics: HIT@1, HIT@3, HIT@10, and Mean Reciprocal Rank (MRR). It includes the test for the TransE [1] model, while the benchmark is open to be extended to other models. For the demo, we configured a benchmark to test over the WN18RR benchmarking dataset.
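For concreteness, the following minimal Python sketch (an illustration, not part of BenchEmbedd's Java code base) shows how a TransE-style scorer ranks all candidate tail entities for a test query (head, relation, ?). Following TransE [1], a triple (h, r, t) is scored by the negative distance ||h + r − t||, so a plausible missing link receives a high score. The function and variable names are illustrative.

```python
import numpy as np

def transe_scores(head_vec, rel_vec, entity_matrix, norm=1):
    """Score every entity as a candidate tail for (head, relation, ?).

    Following TransE, a triple (h, r, t) is plausible when h + r is close
    to t, so the score is the negative L1 (or L2) distance.
    """
    diffs = head_vec + rel_vec - entity_matrix  # shape: (n_entities, dim)
    return -np.linalg.norm(diffs, ord=norm, axis=1)

def rank_of_true_tail(head_vec, rel_vec, entity_matrix, true_tail_idx):
    """1-based rank of the correct tail among all candidate entities."""
    scores = transe_scores(head_vec, rel_vec, entity_matrix)
    # Higher score = more plausible; rank 1 means the true tail scored best.
    return int((scores > scores[true_tail_idx]).sum()) + 1
```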
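From the resulting 1-based ranks of the true entities over all test triples, the reported metrics follow their standard definitions: HIT@k is the fraction of test triples whose true entity is ranked within the top k, and MRR is the mean of the reciprocal ranks. A minimal sketch:

```python
import numpy as np

def link_prediction_metrics(ranks):
    """Compute HIT@1, HIT@3, HIT@10 and Mean Reciprocal Rank from 1-based ranks."""
    ranks = np.asarray(ranks, dtype=float)
    return {
        "HIT@1": float((ranks <= 1).mean()),
        "HIT@3": float((ranks <= 3).mean()),
        "HIT@10": float((ranks <= 10).mean()),
        "MRR": float((1.0 / ranks).mean()),
    }

# Example: ranks of the true tail entity for four test triples.
print(link_prediction_metrics([1, 4, 2, 12]))
# -> {'HIT@1': 0.25, 'HIT@3': 0.5, 'HIT@10': 0.75, 'MRR': 0.458...}
```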
4 Demonstration

The benchmark is a Java Maven project. After the setup of BenchEmbedd (the setup guide is at https://github.com/mlwin-de/BenchEmbedd#installation), to execute a sample benchmark system online one needs to follow these steps:

– Log in to the website https://master.project-hobbit.eu/.
– Select "Benchmarks".
– Select "MLwin Benchmark" in the drop-down list of "Benchmarks".
– Select a desired system to benchmark in the drop-down list "System".
– Press the "Submit" button.

At this stage, a pop-up window appears. There, the Experiment Status shows the progress of the running experiment, and clicking the link in the pop-up window shows the experiment results once the experiment is finished. Figure 3 illustrates an example of the result table after running the demo benchmark system.

Fig. 3. An example of demonstrated evaluation results.

Adding new models: To include more metrics and datasets in the context of the knowledge graph link prediction task, it is possible to make a new benchmark test environment with a new configuration. A new independent dockerized benchmark system (colored green in Figure 1 and entitled "Benchmarked System") is then created from this configuration. The steps to make a new benchmark environment by extending the current demo benchmark configuration are:

– Writing a Benchmark System file.
– Providing a set of pre-trained embedding vectors.
– Creating a system Docker image.
– Writing a system meta-data file.
– Creating a HOBBIT GitLab account to upload the files.

The steps to write a Benchmark System file are:

– Extend the TransEtest.java file for a new benchmark system file. It contains the method "test triple", which is the base for the link prediction tests.
– Provide trained embeddings with the names "entity2vec.txt" and "relation2vec.txt". Our sample system is trained using the TransE model, and the output files of its training process are converted from ".npy" to ".txt" files using our script at "src/kge output to data.py" (an illustrative sketch of this conversion follows below).

To test the system on the benchmark, we set up the Docker image that contains both the implemented system and the trained embedding vector files (the mvn commands are listed at https://github.com/mlwin-de/BenchEmbedd#benchmark-the-system-online).

Fig. 4. An example of a system meta-data file.

To declare the user name and the system name to HOBBIT, a new system is required to adapt the system meta-data file "system.ttl". Figure 4 shows an example of a system meta-data file whose label is adapted to "sample-system" and which includes the GitLab username (a hypothetical sketch of such a file follows below). To upload the benchmark system, a HOBBIT GitLab account is required, which can be created at git.project-hobbit.eu. Afterwards, the created system (Docker image) can be pushed to the HOBBIT GitLab.
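As a concrete illustration of the embedding conversion step mentioned above, the following minimal Python sketch turns a NumPy embedding matrix into a plain-text vector file. It is an illustrative stand-in for the repository's script at "src/kge output to data.py": the input file names and the tab-separated output format are assumptions, not the script's confirmed behavior.

```python
import numpy as np

def npy_to_txt(npy_path, txt_path):
    """Convert a NumPy embedding matrix (one row per entity or relation)
    into a plain-text file with one tab-separated vector per line."""
    matrix = np.load(npy_path)  # shape: (n_items, embedding_dim)
    np.savetxt(txt_path, matrix, fmt="%.6f", delimiter="\t")

# Hypothetical usage for a TransE training run (input names are assumed):
npy_to_txt("entity2vec.npy", "entity2vec.txt")
npy_to_txt("relation2vec.npy", "relation2vec.txt")
```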
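Since Figure 4 is an image, the following hypothetical "system.ttl" sketches what such a meta-data file can look like, modeled on the system integration examples in the HOBBIT documentation. The IRIs, labels, and image name are placeholders, and the exact vocabulary should be verified against the HOBBIT platform documentation.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix hobbit: <http://w3id.org/hobbit/vocab#> .

# Placeholder IRIs and names; adapt to your GitLab username and project.
<http://www.example.org/sample-system> a hobbit:SystemInstance ;
    rdfs:label   "sample-system" ;
    rdfs:comment "A sample BenchEmbedd system with pre-trained TransE embeddings" ;
    hobbit:imageName "git.project-hobbit.eu/your-username/sample-system" ;
    hobbit:implementsAPI <http://www.example.org/benchmark-api> .
```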
5 Acknowledgement

This study is partially supported by the MLwin project (Maschinelles Lernen mit Wissensgraphen, https://mlwin.de/, grant 01IS18050F of the Federal Ministry of Education and Research of Germany). The MLwin project aims to promote and study the application of machine learning methods on knowledge graphs.

References

1. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NeurIPS, pp. 1–9 (2013)
2. Röder, M., Kuchelev, D., Ngonga Ngomo, A.C.: HOBBIT: A platform for benchmarking big linked data. Data Science 3(1), 15–35 (2020)
3. Sadeghi, A., Graux, D., Yazdi, H.S., Lehmann, J.: MDE: Multiple distance embeddings for link prediction in knowledge graphs. In: ECAI (2020)
4. Wang, Q., Mao, Z., Wang, B., Guo, L.: Knowledge graph embedding: A survey of approaches and applications. TKDE (2017)
5. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., da Silva Santos, L.B., Bourne, P.E., et al.: The FAIR guiding principles for scientific data management and stewardship. Scientific Data (2016)
6. Zhang, S., Tay, Y., Yao, L., Liu, Q.: Quaternion knowledge graph embeddings. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) NeurIPS (2019)