<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BenchEmbedd: A FAIR Benchmarking Tool for Knowledge Graph Embeddings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Afshin Sadeghi</string-name>
          <email>sadeghi@cs.uni-bonn.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xhulia Shahini</string-name>
          <email>shahinixhulja@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Schmitz</string-name>
          <email>schmitz.kessenich@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jens Lehmann</string-name>
          <email>jens.lehmann@cs.uni-bonn.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Bonn</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer IAIS</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowledge graph embedding models have been studied comprehensively in recent years. However, these studies lack an evaluation system that compares the efficiency of the models in a reproducible manner that follows the FAIR principles. In this study, we extend the general HOBBIT benchmarking platform to evaluate the efficiency of embedding models according to these criteria. The demo benchmark, the source code of this study, and an installation and usage guide are openly available at https://github.com/mlwin-de/BenchEmbedd. In this paper, we explain the structure of this benchmarking tool and demonstrate how to use it to benchmark knowledge graph embedding models.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge graph embedding</kwd>
        <kwd>Benchmarking</kwd>
        <kwd>Link prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A knowledge graph is a heterogeneous multi-relational graph that represents
knowledge about the world in a structured form, i.e., facts are represented as
entities connected through relations. Knowledge graph embedding (KGE) models
learn a mathematical approximation of a knowledge graph and produce
representations of its entities and relations. These methods have been studied
comprehensively in recent years [
        <xref ref-type="bibr" rid="ref3 ref4 ref6">4, 3, 6</xref>
        ] and are applied in many downstream machine learning and natural language
processing (NLP) tasks. What current KGE studies lack is a standard, independent
evaluation environment that compares the efficiency of models in a fair setting
(e.g., with the same vector sizes). Furthermore, these studies suffer from the
lack of a systematic, reproducible evaluation. To address these issues, we
extended the HOBBIT platform [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a holistic benchmarking platform for Big Linked Data, with a new set of
benchmarks that evaluate the efficiency of knowledge graph embedding models
according to the aforementioned criteria. We released this benchmarking tool
under the name BenchEmbedd. A demo benchmark, the source code, and an
installation and usage guide are openly available at
https://github.com/mlwin-de/BenchEmbedd.
      </p>
      <p>
        We chose HOBBIT as the base because it is developed under the FAIR
principles [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and we follow the same principles in developing BenchEmbedd. Another
advantage of the platform is that it produces dockerized benchmarks, i.e., once
a system (image) is generated, it can be executed locally on a personal computer
or a local cluster, or deployed on cloud computing services such as Amazon Web
Services (AWS).
      </p>
      <p>The produced benchmarks are accessible, transferable, and easily reusable.
This setting promotes reliable scientific publications because it allows
researchers to repeat the evaluations of a study without having to worry about
standardized evaluation hardware. We ensure the reproducibility of evaluations by
generating benchmark systems, which are executable (Docker) images of the exact
environment of a researcher's original evaluation. The method is easily
extensible: one makes a new copy of a benchmark system and adds more models to
it. In the following section, we explain the structure of our benchmarking
platform. We then explain its functionalities in Section 3 and demonstrate
BenchEmbedd in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>Structure</title>
      <sec id="sec-2-1">
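        <p>Figure 1 shows the structure of the benchmarking platform (the diagram is
taken from [<xref ref-type="bibr" rid="ref2">2</xref>]). Its components include the Benchmarked
System, Evaluation Module, Evaluation Storage, Benchmark Controller, Task
Generator, and Data Generator.</p>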
        <p>The Benchmark System contains a complete, ready-to-run benchmarking
workflow within a controlled, dockerized (https://www.docker.com) running
environment. A Benchmark System can contain configurations for running multiple
tests on different models and different test datasets. To extend BenchEmbedd to
other datasets, it is enough to duplicate an existing Benchmark System
configuration of the benchmarking platform and extend it. Section 4 explains a
demo system and the steps to create a new system.</p>
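        <p>To make the roles of these components more concrete, the following Python sketch simulates the message flow of a HOBBIT-style link prediction benchmark within a single process. It is a simplified illustration under stated assumptions, not HOBBIT's API: in HOBBIT the components run as separate Docker containers that exchange messages over a message bus, whereas here plain function calls and queue.Queue objects stand in for them. All function and class names are hypothetical, and the evaluation module computes only a single HIT@1-style accuracy.</p>
        <preformat>
```python
from queue import Queue


class EvaluationStorage:
    """Keeps expected answers and system responses, keyed by task id."""

    def __init__(self):
        self.expected = {}
        self.received = {}


def data_generator(triples, to_task_generator):
    # Sends the benchmark data (here: test triples) to the task generator.
    # (In HOBBIT, data can also be sent to the benchmarked system; omitted.)
    for triple in triples:
        to_task_generator.put(triple)


def task_generator(from_data_generator, to_system, storage):
    # Turns each triple into a link prediction task (head, relation, ?),
    # sends the task to the system and the true tail to the storage.
    task_id = 0
    while not from_data_generator.empty():
        head, relation, tail = from_data_generator.get()
        to_system.put((task_id, head, relation))
        storage.expected[task_id] = tail
        task_id += 1


def benchmarked_system(predict_tail, from_task_generator, storage):
    # Answers every task and sends its response to the evaluation storage.
    while not from_task_generator.empty():
        task_id, head, relation = from_task_generator.get()
        storage.received[task_id] = predict_tail(head, relation)


def evaluation_module(storage):
    # Compares expected and received answers (HIT@1-style accuracy).
    if not storage.expected:
        return 0.0
    hits = sum(1 for task_id, tail in storage.expected.items()
               if storage.received.get(task_id) == tail)
    return hits / len(storage.expected)


def benchmark_controller(triples, predict_tail):
    # Runs the components in order and returns the resulting KPI.
    to_task_generator, to_system = Queue(), Queue()
    storage = EvaluationStorage()
    data_generator(triples, to_task_generator)
    task_generator(to_task_generator, to_system, storage)
    benchmarked_system(predict_tail, to_system, storage)
    return evaluation_module(storage)
```
        </preformat>
        <p>For example, calling benchmark_controller with two test triples and a toy system that answers only the first one correctly returns 0.5. Keeping the expected answers away from the benchmarked system, as the task generator does here, is what prevents a system from seeing the gold tails it is evaluated on.</p>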
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Functionalities</title>
      <p>
        BenchEmbedd performs a link prediction evaluation task. KGE models learn
knowledge graphs in the form of triples (head, relation, tail), and the link
prediction task tests how efficiently KGE models predict missing links (triples)
in a knowledge graph. Figure 2 shows a knowledge graph with four entities, in
which the green relations are known. In this example, the link prediction task
tests how well a knowledge graph embedding model estimates the missing triple
(“Polito”, “is a university in”, “Italy”). A KGE model is efficient if it
assigns a high score to the missing link, indicating that this relation exists.
The current implementation computes the following metrics: HIT@1, HIT@3,
HIT@10, and Mean Reciprocal Rank (MRR). HIT@k is the fraction of test triples
for which the correct entity is ranked among the top k candidates, and MRR is
the mean of the reciprocal ranks of the correct entities. The current
implementation includes the test for the TransE [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] model, while the benchmark is open to extension with other models. For
the demo, we configured a benchmark that tests over the WN18RR benchmarking
dataset.
      </p>
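      <p>The metrics above can be computed from the rank that a model assigns to the correct entity of each test triple. The following Python sketch assumes the ranks have already been obtained (1 = best); whether BenchEmbedd uses raw or filtered ranks is not stated here, and the function names are illustrative rather than taken from the BenchEmbedd code base.</p>
      <preformat>
```python
def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity is ranked within the top k."""
    return sum(1 for rank in ranks if k >= rank) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all test triples (ranks start at 1)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def link_prediction_metrics(ranks):
    """Collect the four metrics reported by the benchmark."""
    if not ranks:
        raise ValueError("at least one rank is required")
    return {
        "HIT@1": hits_at_k(ranks, 1),
        "HIT@3": hits_at_k(ranks, 3),
        "HIT@10": hits_at_k(ranks, 10),
        "MRR": mean_reciprocal_rank(ranks),
    }


# Example with four hypothetical test triples ranked 1, 2, 5 and 20.
print(link_prediction_metrics([1, 2, 5, 20]))
```
      </preformat>
      <p>For the ranks 1, 2, 5, and 20, this gives HIT@1 = 0.25, HIT@3 = 0.5, HIT@10 = 0.75, and MRR = (1 + 1/2 + 1/5 + 1/20) / 4 = 0.4375. MRR rewards every improvement in rank, whereas HIT@k only changes when the correct entity crosses the cut-off k.</p>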
    </sec>
    <sec id="sec-4">
      <title>Demonstration</title>
      <p>The benchmark is a Java Maven project. After setting up BenchEmbedd (the
setup guide is at https://github.com/mlwin-de/BenchEmbedd#installation), one
can execute a sample benchmark system online with the following steps:</p>
      <sec id="sec-4-1">
        <list list-type="bullet">
          <list-item><p>Log in to the website https://master.project-hobbit.eu/.</p></list-item>
          <list-item><p>Select “Benchmarks”.</p></list-item>
          <list-item><p>Select “MLwin Benchmark” in the “Benchmarks” drop-down list.</p></list-item>
          <list-item><p>Select the system to benchmark in the “System” drop-down list.</p></list-item>
          <list-item><p>Press the “Submit” button.</p></list-item>
        </list>
        <p>A pop-up window then appears. Its Experiment Status shows the progress of
the running experiment, and once the experiment has finished, clicking the link
in the pop-up window shows the experiment results. Figure 3 illustrates an
example of the result table after running the demo benchmark system.</p>
        <p>Adding new models: To evaluate new models, or to include more metrics and
datasets for the knowledge graph link prediction task, one can create a new
Benchmark test environment with a new configuration. A new, independent,
dockerized benchmark system (colored green and labeled “Benchmarked System” in
Figure 1) is then created from this configuration. The steps to create a new
Benchmark environment by extending the current demo Benchmark configuration
are:
          <list list-type="bullet">
            <list-item><p>Writing a Benchmark System file.</p></list-item>
            <list-item><p>Providing a set of pre-trained embedding vectors.</p></list-item>
            <list-item><p>Creating a system Docker image.</p></list-item>
            <list-item><p>Writing a system metadata file.</p></list-item>
            <list-item><p>Creating a HOBBIT GitLab account to upload the files.</p></list-item>
          </list>
        </p>
        <p>The steps to write a Benchmark System file are:
          <list list-type="bullet">
            <list-item><p>Extend the TransEtest.java file to create a new benchmark system file. This file contains the method “test triple”, which is the basis of the link prediction tests.</p></list-item>
            <list-item><p>Provide the trained embeddings in files named “entity2vec.txt” and “relation2vec.txt”.</p></list-item>
          </list>
        </p>
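        <p>To illustrate what a link prediction test of a single triple involves, the following Python sketch reads embedding vectors from plain text and computes the rank of the true tail under the TransE scoring function, the distance ||h + r - t||, where a lower distance means a more plausible triple. It is an illustration under assumptions, not the Java code in TransEtest.java: the file layout (one whitespace-separated vector per line, with the line number serving as the entity or relation id) is assumed, and parse_vectors, transe_distance, and rank_triple are hypothetical names; rank_triple only plays the role of the “test triple” method.</p>
        <preformat>
```python
import math


def parse_vectors(text):
    """Parse one vector per non-empty line of whitespace-separated floats.

    Assumption: the vector on line i belongs to the entity/relation with id i.
    """
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]


def transe_distance(h, r, t, norm=1):
    """TransE distance ||h + r - t|| with the L1 (norm=1) or L2 (norm=2) norm."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))


def rank_triple(head, relation, tail, entity_vecs, relation_vecs, norm=1):
    """Return the 1-based rank of the true tail among all candidate entities.

    Candidates are all entities (raw setting, no filtering of other known
    triples); ties with the true tail are not counted against it.
    """
    h, r = entity_vecs[head], relation_vecs[relation]
    true_distance = transe_distance(h, r, entity_vecs[tail], norm)
    better = sum(1 for t in entity_vecs
                 if true_distance > transe_distance(h, r, t, norm))
    return better + 1


# Toy example: three 2-dimensional entity vectors and one relation vector.
entities = parse_vectors("0 0\n1 0\n0 1\n")
relations = parse_vectors("1 0\n")
print(rank_triple(0, 0, 1, entities, relations))  # prints 1
print(rank_triple(0, 0, 2, entities, relations))  # prints 3
```
        </preformat>
        <p>Collecting such ranks over all test triples yields the HIT@k and MRR values listed in Section 3. Scoring every entity as a candidate tail makes each test linear in the number of entities, which is why link prediction evaluation on large graphs is usually the most expensive part of a benchmark run.</p>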
        <p>Our sample system is trained using the TransE model, and the output files
of the training process in this repository are converted from “.npy” to “.txt”
files using our script at “src/kge output to data.py”. To test the system on
the benchmark, we set up a Docker image that contains both the implemented
system and the trained embedding vector files; the corresponding mvn commands
are listed at https://github.com/mlwin-de/BenchEmbedd#benchmark-thesystem-online.</p>
        <p>To declare the user name and system name to HOBBIT, a new system needs to
adapt the system metadata file “system.ttl”. Figure 4 shows an example of a
system metadata file whose label is set to “sample-system” and which includes
the GitLab username. Uploading the benchmark system requires a HOBBIT GitLab
account, which can be created at git.project-hobbit.eu. Afterwards, the created
system (Docker image) can be pushed to the HOBBIT GitLab.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>This study is partially supported by the MLwin project (Maschinelles Lernen
mit Wissensgraphen, grant 01IS18050F of the Federal Ministry of Education and
Research of Germany; https://mlwin.de/). The MLwin project aims to promote and
study the application of machine learning methods to knowledge graphs.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usunier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Duran</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yakhnenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          .
          <source>In: NeurIPS</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>9</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Roder,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Kuchelev</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ngonga</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.C.</surname>
          </string-name>
          :
          <article-title>Hobbit: A platform for benchmarking big linked data</article-title>
          .
          <source>Data Science</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <volume>15</volume>
          {
          <fpage>35</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Sadeghi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graux</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yazdi</surname>
            ,
            <given-names>H.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>MDE: multiple distance embeddings for link prediction in knowledge graphs</article-title>
          .
          <source>In: ECAI</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          .
          <source>TKDE</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumontier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalbersberg</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Appleton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Axton</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomberg</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boiten</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>da Silva Santos</surname>
            ,
            <given-names>L.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bourne</surname>
            ,
            <given-names>P.E.</given-names>
          </string-name>
          , et al.:
          <article-title>The FAIR guiding principles for scientific data management and stewardship</article-title>
          .
          <source>Nature Scientific Data</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tay</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>Quaternion knowledge graph embeddings</article-title>
          . In:
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beygelzimer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>d'Alché-Buc</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>E.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (eds.)
          <source>NeurIPS</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>