<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Metadata to Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nora Abdelmageed</string-name>
          <email>nora.abdelmageed@uni-jena.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Birgitta König-Ries</string-name>
          <email>birgitta.koenig-ries@uni-jena.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Friedrich Schiller University Jena</institution>
          ,
          <addr-line>Jena</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Michael Stifel Center Jena</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Metadata describes data. It includes information about the who, when, where, how, and why of data collection. Ideally, it should be in a machine-understandable format such as RDF, which enables queries in structured query languages such as SPARQL and supports further data usage. In this paper, we investigate metadata as a source for generating Knowledge Graphs (KGs). We introduce a fully automatic approach that transforms raw metadata files into a KG. Our resources and code are publicly available.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>case; however, we expect our method to be domain-independent.</p>
    </sec>
    <sec id="sec-21">
      <title>Methodology</title>
      <p>[Figure 1: Pipeline overview. Phases: Data Acquisition, Ontology Development, Match &amp; Populate, Release. Steps: 1 Pre-Processing, 2 Reconcile &amp; Model, 3 Get Onto Embedding, 4 Pre-Processing, 5 Get Embedding, 6 Match, 7 Evaluate, 8 Validate &amp; Populate, 9 Publish. Artifacts: Seen Data, Unseen Data, Keys, Keys + Synonyms, Embedding Source, BMO, BMO E, Keys E, Matches, Scores, Ground Truth, BMKG.]</p>
      <p>Figure 1 shows the four phases of our pipeline. 1) Data Acquisition: We collected our metadata
files from various biodiversity data portals to develop the data model and evaluate our matching
technique. 2) Ontology Development: This phase is the data-driven process of crafting our data model,
the Biodiversity Metadata Ontology (BMO). We applied several cleaning steps to the collected data.
During this phase, we held several meetings with a biodiversity expert to validate and review
our conceptual model. In addition, we developed mean-based techniques to transform BMO
to the embedding space (BMO E). 3) Match &amp; Populate: This phase covers our unsupervised learning methods
for ontology matching and instance population. For matching, we used cosine similarity in
the embedding space between the ontological embeddings, BMO E, and the metadata embeddings,
Keys E. We used embeddings to capture the semantic meaning of words. For population, we
add a triple if and only if its value has the expected datatype. For example,
we accept the triple (author, phone, XXX) only if “XXX” is a phone number. We implemented these
validations using regular expressions. 4) Release: We published our resources and code
under the Creative Commons Attribution 4.0 International (CC BY 4.0) license and the Apache License 2.0,
respectively.</p>
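      <p>The matching and population steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not our released implementation: the three-dimensional word vectors, the ontology properties with their synonyms, the similarity threshold of 0.8, the regular expressions, and the helper names (mean_embedding, cosine, match_key, populate) are all hypothetical stand-ins chosen for the example.</p>
      <preformat>
```python
import math
import re

# Toy 3-dimensional word vectors (assumption): illustrative stand-ins for a
# pretrained embedding source, not data from the paper.
WORD_VECTORS = {
    "author": [0.9, 0.1, 0.0],
    "creator": [0.8, 0.2, 0.1],
    "phone": [0.1, 0.9, 0.0],
    "telephone": [0.2, 0.8, 0.1],
    "number": [0.0, 0.6, 0.4],
    "email": [0.1, 0.3, 0.9],
    "mail": [0.2, 0.2, 0.8],
}


def mean_embedding(words):
    """Average the vectors of all known words (a mean-based embedding)."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical ontology properties, each described by its key plus synonyms.
ONTOLOGY = {
    "author": ["author", "creator"],
    "phone": ["phone", "telephone"],
    "email": ["email", "mail"],
}
ONTOLOGY_EMBEDDINGS = {p: mean_embedding(words) for p, words in ONTOLOGY.items()}


def match_key(metadata_key, threshold=0.8):
    """Return the most similar ontology property, or None below the threshold."""
    key_vector = mean_embedding(re.findall(r"[a-z]+", metadata_key.lower()))
    if key_vector is None:
        return None
    best = max(
        ONTOLOGY_EMBEDDINGS,
        key=lambda p: cosine(key_vector, ONTOLOGY_EMBEDDINGS[p]),
    )
    if cosine(key_vector, ONTOLOGY_EMBEDDINGS[best]) >= threshold:
        return best
    return None


# Illustrative datatype patterns (assumption); production checks would be stricter.
DATATYPE_PATTERNS = {
    "phone": re.compile(r"\+?[0-9][0-9 ()/-]{5,}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def populate(subject, metadata_key, value):
    """Return a triple only if the key matches and the value has the expected datatype."""
    prop = match_key(metadata_key)
    if prop is None:
        return None
    pattern = DATATYPE_PATTERNS.get(prop)
    if pattern is not None and not pattern.fullmatch(value):
        return None
    return (subject, prop, value)
```
      </preformat>
      <p>In this sketch, populate("author-1", "Telephone Number", "XXX") is rejected because "XXX" does not look like a phone number, whereas a value made of digits and separators is accepted. Properties without a registered pattern are populated without a datatype check, which is a simplification made for the example.</p>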
      <p>Acknowledgments
The authors thank the Carl Zeiss Foundation for the financial support of the project “A Virtual
Werkstatt for Digitization in the Sciences (K3, P5)” within the scope of the program line
“Breakthroughs: Exploring Intelligent Systems” for “Digitization - explore the basics, use applications”.
In addition, we thank Cornelia Fürstenau, Sirko Schindler, Muhammad Abbady, and Jan Martin
Keil for the fruitful discussions.</p>
    </sec>
  </body>
  <back>
  </back>
</article>