<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Helmholtz Digitization Ontology: Representing Digital Assets in the Helmholtz Digital Ecosystem</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Said Fathalla</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gerrit Günther</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leon Steinmeier</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christine Lemster</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dorothee Kottmeier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lakxmi Sivapatham</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pier Luigi Buttigieg</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volker Hofmann</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Sandfeld</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung</institution>
          ,
          <addr-line>Bremerhaven</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Deutsches Zentrum für Luft- und Raumfahrt</institution>
          ,
          <addr-line>Cologne</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Forschungszentrum Jülich GmbH, Institute for Advanced Simulation - Materials Data Science and Informatics (IAS-9)</institution>
          ,
          <addr-line>Jülich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>GEOMAR Helmholtz-Zentrum für Ozeanforschung</institution>
          ,
          <addr-line>Kiel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Helmholtz-Zentrum Dresden-Rossendorf</institution>
          ,
          <addr-line>Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Helmholtz-Zentrum Berlin für Materialien und Energie</institution>
          ,
          <addr-line>Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Helmholtz Association is actively digitizing research outcomes to drive progress and innovation. The resulting volumes of digital data are diverse, both in their formats and in the semantic descriptions used for their interchange and storage. Therefore, a semantic frame of reference is required to facilitate interoperability throughout the Helmholtz digital ecosystem. This paper presents the Helmholtz Digitization Ontology (HDO), which is intended to serve that purpose. HDO is a mid-level ontology containing concepts that represent digital assets relevant to the Helmholtz digital ecosystem, as well as to data creation, management, and exchange. It is developed within the framework of the Helmholtz Metadata Collaboration (HMC) by contributors from various scientific backgrounds. HDO serves as a harmonized semantic framework and machine-actionable reference across all Helmholtz research fields.</p>
      </abstract>
      <kwd-group>
        <kwd>Metadata</kwd>
        <kwd>Metadata Management</kwd>
        <kwd>Data Management</kwd>
        <kwd>FAIR</kwd>
        <kwd>Harmonization</kwd>
        <kwd>OWL</kwd>
        <kwd>Bilingual ontology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The Helmholtz Association comprises 18 research centers operating across Germany,
focusing on a wide range of scientific topics and methods within six research fields. The
Helmholtz Metadata Collaboration (HMC) is an association-wide platform that
supports (meta)data harmonization and information engineering across all research fields, with the aim
of making data within Helmholtz adhere to the FAIR principles [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and establishing an interoperable
FAIR data space.
      </p>
      <p>
        Motivation and requirements. The heterogeneity of scientific contexts within Helmholtz
leads to ambiguity and conflicts regarding metadata semantics (e.g., in the metadata
schemas and tools developed, or in general communication between collaborators) when data is exchanged in
interdisciplinary collaborations. Owing to their ability to establish clear context and relationships
between concepts [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], ontologies are widely used to facilitate efficient and interoperable
data management and exploitation. As such, ontologies are important for making data
FAIR, specifically for achieving interoperability and machine actionability. HMC therefore
recognized the need for a semantic frame of reference for all stakeholders involved
in Helmholtz’s digitization efforts. Contributors from all Helmholtz research fields have been
involved in this process.
      </p>
      <p>
        Objectives. The main objectives of the Helmholtz Digitization Ontology (HDO) are: 1)
creating a standardized semantic framework whose terminology reduces semantic uncertainty
and ambiguity and thereby increases semantic interoperability between various Helmholtz
systems; 2) facilitating data integration across different Helmholtz systems, e.g., the institutional
Helmholtz Knowledge Graph [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or domain-specific use cases; 3) providing a basis for reasoning
over existing data so that new knowledge can be inferred, e.g., for predictive analyses; and
4) supporting harmonized knowledge management within Helmholtz to facilitate
decision-making and the preservation of research findings.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Concepts Definitions</title>
      <p>In HDO, we provide well-defined concepts with rigorous and unambiguous definitions.
Class definitions aim to: 1) outline the intrinsic characteristics of the term being defined,
2) avoid circularity, 3) be neither excessively broad (to avoid ambiguity) nor too narrow (to
allow further subclasses to be added where necessary), and 4) be easily understood
using common, unambiguous terminology. The last point is especially important given HDO’s
purpose as a mid-level ontology that provides a common understanding and reduces
miscommunication across different research fields.</p>
      <p>To create class definitions, we follow Aristotelian logic; specifically, definitions adhere to
the genus-differentia form (https://en.wikiversity.org/wiki/Dominant_group/Genus_differentia_definition). Genus-differentia definitions follow the form: “A (the class
label) is a B (the genus or superclass) which is C (the differentia)”.
For instance, consider the definition of “JSON file”, which is “A file which conforms
to JSON format”. Here, the genus is “file”, the superclass of JSON file, from
which all differentiae and properties are inherited. The remaining part is the differentia, which
comprises the features distinguishing the currently defined term from its genus and siblings.</p>
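      <p>The genus-differentia pattern described above can be sketched as a small Python data structure. This is a minimal illustration, not HDO tooling: the class name <monospace>TermDefinition</monospace>, the article-selection heuristic, and the naive substring test for circularity (criterion 2 in the list of definition aims) are all assumptions made for this example.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermDefinition:
    """Genus-differentia parts of a class definition (illustrative structure)."""
    label: str        # A: the class being defined
    genus: str        # B: the superclass
    differentia: str  # C: what distinguishes A from its genus and siblings

    def text(self) -> str:
        # Render the pattern "A genus which differentia" as a definition string.
        article = "An" if self.genus[0].lower() in "aeiou" else "A"
        return f"{article} {self.genus} which {self.differentia}"

    def is_circular(self) -> bool:
        # Naive check: the definition must not literally reuse the defined label.
        label = self.label.lower()
        return label in self.genus.lower() or label in self.differentia.lower()


json_file = TermDefinition(label="JSON file", genus="file",
                           differentia="conforms to JSON format")
print(json_file.text())         # A file which conforms to JSON format
print(json_file.is_circular())  # False
```
      </preformat>
      <p>A real circularity check would have to look beyond literal substrings (for example, at synonyms), so this sketch only catches the most obvious case.</p>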
    </sec>
    <sec id="sec-3">
      <title>3. Development Strategy</title>
      <p>The key aspects of HDO development are illustrated in Figure 1. The development is carried
out in three phases: initialization, implementation, and adoption and adaptation.</p>
      <p>1. Initialization phase: In the early stages of development, an internal GitLab repository
was created to gather a set of core terms and their definitions in per-term YAML files. We
followed a template with keys such as definition, synonyms, comments, and seeAlso.</p>
      <p>[Figure 1: Key aspects of HDO development. Recoverable panel labels: Metadata, Logic, Reused ontologies, Class annotations, Documentation, Development, Bilingual.]</p>
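      <p>The per-term template lends itself to simple automated checks before terms are merged. The sketch below assumes each YAML file has already been loaded into a Python dictionary (for example, with a YAML library); the function name <monospace>check_term</monospace> and the rule that a definition is required are hypothetical, and the key set only covers the keys named in the text.</p>
      <preformat>
```python
# Keys named in the text; the real HDO template may define more (assumption).
TEMPLATE_KEYS = {"definition", "synonyms", "comments", "seeAlso"}
# Assumption: a term record is only usable once it has a definition.
REQUIRED_KEYS = {"definition"}


def check_term(term, record):
    """Return human-readable problems in one per-term record (empty list if none).

    `record` is the mapping a YAML loader would return for a single term file.
    """
    problems = []
    keys = set(record)
    for key in sorted(REQUIRED_KEYS - keys):
        problems.append(f"{term}: missing required key '{key}'")
    for key in sorted(keys - TEMPLATE_KEYS):
        problems.append(f"{term}: key '{key}' is not part of the template")
    if not isinstance(record.get("synonyms", []), list):
        problems.append(f"{term}: 'synonyms' should be a list")
    return problems


# Hypothetical records: one valid, one with a misspelled key.
good = {"definition": "A file which conforms to JSON format",
        "synonyms": ["JSON document"]}
bad = {"definiton": "A file which conforms to JSON format"}
print(check_term("JSON file", good))
print(check_term("JSON file", bad))
```
      </preformat>
      <p>Running such a check in continuous integration would catch misspelled keys before they silently drop annotations during conversion.</p>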
      <p>
        2. Implementation phase: After the terms had been collected, the YAML files were converted and
merged into one OWL file, in which the template keys were mapped onto existing and
imported annotation properties (e.g., definition was mapped to iao:definition).
Further development was carried out on the OWL file in a public repository (https://codebase.helmholtz.cloud/hmc/hmc-public/hob/hdo) and managed
using the Ontology Development Kit [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. HDO is accessible via a persistent
identifier (https://purls.helmholtz-metadaten.de/hob/hdo.owl), and terms are dereferenced via their IRIs (e.g.,
HDO_00004001). PIDA is used to dereference a single ontology term IRI and either display the HTML
documentation (https://purls.helmholtz-metadaten.de/hob/HDO_00000000) in a web browser for human users or provide the OWL file to machines.
The class hierarchy is extended according to the following workflow: 1) a contributor creates
a GitLab issue proposing a term definition and properties; 2) the proposal is discussed collaboratively
in the issue thread; 3) upon general agreement, the class is implemented in
the development file (i.e., hdo-edit.owl) in a separate, sequentially generated branch; and
4) a merge request is created and, upon approval by at least three contributors, the branch
is merged into the main branch.
      </p>
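      <p>The key-to-property mapping used in the conversion step can be sketched as follows. Only the mapping of definition to iao:definition (IAO_0000115) is taken from the text; the mappings for comments, seeAlso, and synonyms, the HDO namespace, and the term ID HDO_99999999 are hypothetical. The function emits plain (subject, predicate, object) tuples rather than serialized OWL, and it does not distinguish literals from IRIs.</p>
      <preformat>
```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OBO = "http://purl.obolibrary.org/obo/"
OBO_IN_OWL = "http://www.geneontology.org/formats/oboInOwl#"
# Assumed namespace for HDO term IRIs (based on the documentation PURL pattern).
HDO = "https://purls.helmholtz-metadaten.de/hob/"

# Template key -> annotation property IRI. Only "definition" -> iao:definition
# is stated in the text; the other three mappings are assumptions.
KEY_TO_PROPERTY = {
    "definition": OBO + "IAO_0000115",
    "comments": RDFS + "comment",
    "seeAlso": RDFS + "seeAlso",
    "synonyms": OBO_IN_OWL + "hasExactSynonym",
}


def record_to_triples(term_id, label, record):
    """Turn one per-term record into (subject, predicate, object) tuples."""
    subject = HDO + term_id
    triples = [(subject, RDF_TYPE, OWL_CLASS), (subject, RDFS + "label", label)]
    for key, value in record.items():
        prop = KEY_TO_PROPERTY.get(key)
        if prop is None:
            raise KeyError(f"no annotation property mapped for key {key!r}")
        # Multi-valued keys (e.g. synonyms) become one triple per value.
        for item in value if isinstance(value, list) else [value]:
            triples.append((subject, prop, item))
    return triples


# Hypothetical term ID and synonyms, for illustration only.
record = {"definition": "A file which conforms to JSON format",
          "synonyms": ["JSON document", "JSON data file"]}
for triple in record_to_triples("HDO_99999999", "JSON file", record):
    print(triple)
```
      </preformat>
      <p>When writing the merged OWL file, an RDF library would typically take care of serializing such triples (for example, as RDF/XML) and of typing literals correctly.</p>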
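      <p>Serving HTML to browsers and OWL to machines from the same IRI is commonly achieved through HTTP content negotiation on the Accept header. The following sketch shows the idea only: PIDA's actual configuration is not described here, so the media-type table and the function <monospace>choose_representation</monospace> are assumptions, and quality-value handling is simplified (wildcards such as */* fall back to HTML).</p>
      <preformat>
```python
# Media types per representation (assumed; PIDA's real configuration may differ).
OFFERS = {
    "text/html": "html",
    "application/xhtml+xml": "html",
    "application/rdf+xml": "owl",
    "application/owl+xml": "owl",
    "text/turtle": "owl",
}


def choose_representation(accept_header):
    """Return 'html' or 'owl' for an HTTP Accept header value."""
    best, best_q = "html", 0.0  # browsers and unknown clients get HTML
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat malformed quality values as "not acceptable"
        # Keep the offered media type with the highest quality value.
        if media_type in OFFERS and q > best_q:
            best, best_q = OFFERS[media_type], q
    return best


print(choose_representation("text/html,application/xhtml+xml,*/*;q=0.8"))  # html
print(choose_representation("application/rdf+xml"))                      # owl
print(choose_representation("text/html;q=0.5, text/turtle"))              # owl
```
      </preformat>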
      <p>3. Adoption and adaptation: Once HDO is published, it will be applied in use cases across
the different Helmholtz research fields. This will test the ontology against use-case-specific
requirements and allow further adaptation based on iterative exchange.</p>
      <sec id="sec-3-1">
        <title>3.1. Reuse of existing terms</title>
        <p>We focused on reusing classes from existing, well-known semantic artifacts wherever possible
to ensure semantic interoperability. HDO is top-level aligned with the Basic Formal Ontology
(BFO, https://obofoundry.org/ontology/bfo) to ensure interoperability with other mid- and domain-level ontologies. Furthermore,
classes and properties from well-known Open Biological and Biomedical Ontologies (OBO)
were re-used, including the Information Artifact Ontology (IAO, https://github.com/information-artifact-ontology/IAO/) (iao:action specification,
iao:information content entity, iao:plan specification) and the Relation
Ontology (RO, https://obofoundry.org/ontology/ro.html) (ro:input of, ro:has characteristic, and ro:has input).</p>
        <p>[Figure 2: Overview of the HDO core classes. Recoverable labels: entity, occurrent, continuant, generically dependent continuant, specifically dependent continuant, information, structured data, data.]</p>
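        <p>Top-level alignment means that every HDO class should reach a BFO class through is-a links. A simple way to check this is an ancestor walk over the class hierarchy, sketched below. The parent table is illustrative and assumes single inheritance: the BFO and IAO links follow those ontologies, the hdo:data link follows from its equivalence axiom (Section 3.3), and hdo:example orphan is a hypothetical unaligned class.</p>
        <preformat>
```python
# Illustrative is-a links (child -> parent), single inheritance for simplicity.
PARENT = {
    "bfo:continuant": "bfo:entity",
    "bfo:occurrent": "bfo:entity",
    "bfo:generically dependent continuant": "bfo:continuant",
    "bfo:specifically dependent continuant": "bfo:continuant",
    "bfo:realizable entity": "bfo:specifically dependent continuant",
    "bfo:disposition": "bfo:realizable entity",
    "bfo:role": "bfo:realizable entity",
    "iao:information content entity": "bfo:generically dependent continuant",
    # Follows from the hdo:data equivalence axiom ('generically dependent continuant' and ...).
    "hdo:data": "bfo:generically dependent continuant",
}


def ancestors(cls):
    """Return the is-a ancestors of cls, nearest first."""
    chain = []
    parent = PARENT.get(cls)
    while parent is not None and parent not in chain:  # guard against cycles
        chain.append(parent)
        parent = PARENT.get(parent)
    return chain


def unaligned(classes):
    """Return the classes (other than the root) whose chain misses bfo:entity."""
    return [c for c in classes
            if c != "bfo:entity" and "bfo:entity" not in ancestors(c)]


print(ancestors("hdo:data"))
print(unaligned(["hdo:data", "bfo:disposition", "hdo:example orphan"]))
```
        </preformat>
        <p>With multiple inheritance the walk becomes a graph traversal, which is what an OWL reasoner or an ontology tool chain performs over the full hierarchy.</p>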
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Concepts overview</title>
        <p>An overview of the HDO core classes is shown in Figure 2. HDO establishes semantics for
the core concepts of digital infrastructure, digital information management, and processes
in data management and exchange. For example, HDO offers a rigorous semantic context
for the aspects of the FAIR principles. These aspects are modelled as dispositions (bfo:disposition)
that inhere in hdo:data (HDO_00000009), both according to a practical understanding (e.g.,
hdo:findability) and as specified by the FAIR principles (e.g., hdo:findability
according to the FAIR principles). Further, we created classes for different aspects
of metadata standardization under hdo:information and aligned them with IAO classes (e.g.,
iao:information content entity).</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Logic</title>
        <p>We asserted pairwise disjointness between mutually disjoint classes; e.g., hdo:digital
infrastructure is disjoint with hdo:hardware. We used bfo:role and bfo:realizes
to create a pattern that allows certain classes to be populated by inference. For example, the class
hdo:agent is defined as “An entity which realises an agent role”; membership in hdo:agent is therefore
inferred through hdo:agent role and bfo:realizes. A similar example is the class hdo:tool, which is defined as “A
continuant which realises a tool role”.</p>
        <p>To support these patterns, and reasoning about relevant concepts more generally,
several OWL axioms have been asserted in HDO. An EquivalentClasses axiom
states that several class expressions are equivalent to each other, i.e., that they have exactly the same instances. For instance, the class
hdo:data has the following equivalence axiom:</p>
        <preformat>'generically dependent continuant' and
    ('output of' some ('encoding process' and ('has input' some signifier)))</preformat>
        <p>A SubClassOf axiom states that every instance of a class is also an instance of a given class expression. For instance, the class hdo:structured vocabulary has the
following SubClassOf axiom:</p>
        <preformat>vocabulary and 'has quality' some structuredness</preformat>
      </sec>
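      <p>The effect of a pairwise disjointness axiom can be illustrated with a small consistency check over asserted types. This sketch is not an OWL reasoner: it only inspects directly asserted classes and ignores types inferred through subclass relations. The individuals are hypothetical; the single disjoint pair is the one given in Section 3.3.</p>
      <preformat>
```python
from itertools import combinations

# Pairwise disjointness axioms; the pair below is the example given in the text.
DISJOINT = {frozenset({"hdo:digital infrastructure", "hdo:hardware"})}


def disjointness_violations(types):
    """Return (individual, class_a, class_b) for each asserted disjoint pair.

    Only directly asserted types are checked; a reasoner would also use
    types inferred via subclass axioms.
    """
    violations = []
    for individual, classes in sorted(types.items()):
        for a, b in combinations(sorted(classes), 2):
            if frozenset({a, b}) in DISJOINT:
                violations.append((individual, a, b))
    return violations


# Hypothetical individuals; ex:server is (wrongly) typed with both classes.
types = {
    "ex:server": {"hdo:hardware", "hdo:digital infrastructure"},
    "ex:repository": {"hdo:digital infrastructure"},
}
print(disjointness_violations(types))
```
      </preformat>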
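      <p>The role-realization pattern of Section 3.3 can be read as a rule: whatever realizes an instance of an agent role is classified as an hdo:agent, and analogously for tools. The forward-chaining sketch below applies that rule to a list of triples. The class name hdo:tool role and the ex: individuals are assumptions; an OWL reasoner would derive the same classifications from the class axioms rather than from hand-written rules.</p>
      <preformat>
```python
# Role pattern: realising an agent (tool) role makes an entity an agent (tool).
# The name "hdo:tool role" is an assumption; the text only says "a tool role".
ROLE_TO_CLASS = {
    "hdo:agent role": "hdo:agent",
    "hdo:tool role": "hdo:tool",
}


def infer_role_types(triples):
    """Return new (entity, 'rdf:type', class) triples implied by the role pattern."""
    # Map each role individual to the role class it is typed with.
    role_class_of = {s: o for s, p, o in triples
                     if p == "rdf:type" and o in ROLE_TO_CLASS}
    inferred = set()
    for s, p, o in triples:
        if p == "bfo:realizes" and o in role_class_of:
            inferred.add((s, "rdf:type", ROLE_TO_CLASS[role_class_of[o]]))
    # Report only triples that were not already asserted.
    return inferred - set(triples)


# Hypothetical individuals (ex: prefix) to exercise the rule.
facts = [
    ("ex:alice_role", "rdf:type", "hdo:agent role"),
    ("ex:alice", "bfo:realizes", "ex:alice_role"),
    ("ex:script_role", "rdf:type", "hdo:tool role"),
    ("ex:script", "bfo:realizes", "ex:script_role"),
]
print(sorted(infer_role_types(facts)))
```
      </preformat>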
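      <p>Because hdo:data is defined with an EquivalentClasses axiom, its class expression gives both necessary and sufficient conditions, so any individual that satisfies it can be classified as hdo:data. The function below checks that expression over asserted facts. By contrast, the SubClassOf axiom on hdo:structured vocabulary only licenses the inference from structured vocabulary to its class expression, not back. The data structures and ex: individuals are hypothetical, and the check is closed-world, unlike OWL's open-world semantics.</p>
      <preformat>
```python
def is_hdo_data(x, types, edges):
    """Check individual x against the hdo:data class expression

        'generically dependent continuant' and
        ('output of' some ('encoding process' and ('has input' some signifier)))

    types: individual -> set of asserted class names
    edges: (individual, property) -> set of related individuals
    Closed-world check over asserted facts only (OWL itself is open-world).
    """
    if "generically dependent continuant" not in types.get(x, set()):
        return False
    for process in edges.get((x, "output of"), set()):
        if "encoding process" not in types.get(process, set()):
            continue
        inputs = edges.get((process, "has input"), set())
        if any("signifier" in types.get(i, set()) for i in inputs):
            return True
    return False


# Hypothetical individuals: a dataset produced by encoding a signifier.
types = {
    "ex:dataset": {"generically dependent continuant"},
    "ex:digitization": {"encoding process"},
    "ex:signal": {"signifier"},
}
edges = {
    ("ex:dataset", "output of"): {"ex:digitization"},
    ("ex:digitization", "has input"): {"ex:signal"},
}
print(is_hdo_data("ex:dataset", types, edges))       # True
print(is_hdo_data("ex:digitization", types, edges))  # False
```
      </preformat>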
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions and Future Work</title>
      <p>
        The Helmholtz Digitization Ontology was developed with the objective of providing harmonized
semantics as a reference framework for the Helmholtz digital ecosystem. The development
of HDO was open and transparent, and its full provenance is recorded. Further, we follow an
established ontology development framework (ODK) and align HDO with well-established
semantic frameworks in order to increase acceptance for domain-level reuse and
application. One future use case will be the semantic representation of FAIR digital
objects (FDOs), which will allow data integration between FDOs and the HMC Helmholtz Knowledge Graph
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Such implementations will extend HDO’s core semantics and facilitate the representation,
interoperability, and analysis of scientific metadata within the Helmholtz digital ecosystem.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Appleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>da Silva Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.,
          <article-title>The FAIR guiding principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fathalla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vahdati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <article-title>SEO: A scientific events data model</article-title>
          , in:
          <source>The Semantic Web - ISWC 2019</source>
          , Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bröder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Preuß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>D'Mello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fathalla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sandfeld</surname>
          </string-name>
          ,
          <article-title>The Helmholtz knowledge graph: driving the transition towards a FAIR data ecosystem in the Helmholtz Association</article-title>
          , in: European Semantic Web Conference, Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Matentzoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Goutte-Gattat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Z. K.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Balhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Carbon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Caron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. D.</given-names>
            <surname>Duncan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Flack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Haendel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. L.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. R.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Hoyt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Jackson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larralde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>McMurry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Overton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pilgrim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stefancsik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Robb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Toro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Vasilevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Walls</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Mungall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Osumi-Sutherland</surname>
          </string-name>
          ,
          <article-title>Ontology Development Kit: a toolkit for building, maintaining and standardizing biomedical ontologies</article-title>
          ,
          <source>Database</source>
          <volume>2022</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>