<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Semantic Representation of the NFDI4Health Metadata Schema Utilizing the CEDAR Workbench</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matthias Löbe</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aliaksandra Shutsko</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carsten O. Schmidt</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Darms</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sophie A. I. Klopfenstein</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carina N. Vorisek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaoming Hu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Golebiewski</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juliane Fluck</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité</institution>
          ,
          <addr-line>Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Heidelberg Institute for Theoretical Studies (HITS)</institution>
          ,
          <addr-line>Heidelberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Community Medicine, University Medicine Greifswald</institution>
          ,
          <addr-line>Greifswald</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Institute for Medical Informatics, Statistics and Epidemiology (IMISE), University of Leipzig</institution>
          ,
          <addr-line>Leipzig</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>ZB MED - Information Centre for Life Sciences</institution>
          ,
          <addr-line>Cologne</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Rich metadata is required for a comprehensive description of research assets. We developed a metadata schema for clinical, epidemiological and public health research studies based on existing generic and domain-specific metadata vocabularies. It forms the basis for various search and data management services provided by the German National Research Data Infrastructure for Personal Health Data (NFDI4Health). Interoperability remains a challenge, as various health research standards must be supported in the medium term. At the same time, embedding our infrastructure in national and international resources requires the use of overarching syntactic and semantic standards and vocabularies. In this paper, we present a prototypical implementation of the schema in the CEDAR Workbench. CEDAR not only provides a graphical web interface for collaboration and form-based data entry for testing purposes, but also supports the use of standard vocabularies, the annotation of concepts with medical terminologies, and serialization as JSON-LD, a JSON-based RDF format.</p>
      </abstract>
      <kwd-group>
        <kwd>Metadata Schema</kwd>
        <kwd>Clinical Trial Registries</kwd>
        <kwd>Dublin Core</kwd>
        <kwd>DataCite</kwd>
        <kwd>DCAT</kwd>
        <kwd>RDF</kwd>
        <kwd>CEDAR</kwd>
        <kwd>BioPortal</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Methods</title>
      <p>
        NFDI4Health is an initiative to foster data sharing in the clinical and epidemiological research
community in Germany. To improve the findability and reusability of structured health data from clinical
trials, epidemiological studies, disease registries, administrative health databases and public health
surveillance, a metadata schema (N4H MDS) was developed that unifies these different types of research
studies, particularly in response to the COVID-19 pandemic [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The schema was developed in Microsoft Excel so that the community could be brought together
quickly, without requiring knowledge of specialized software and without anticipating any particular
technical implementation. Based on the experience gained, the schema will be developed further and
extended to other study types, such as nutritional studies. However, the growing complexity of the MDS,
the large number of experts involved and the diversity of health research domains are making this work
increasingly difficult.
      </p>
      <p>
        The CEDAR Workbench [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a web-based tool for the collaborative authoring of metadata
schemas. It supports the creation of individual, versioned metadata elements and the reuse of
high-quality templates that map to metadata standards such as Dublin Core or W3C DCAT. Metadata
elements can be grouped and presented as web forms. Completed forms are stored persistently, which
makes CEDAR suitable for prototypical testing before a schema is implemented in self-developed software.
      </p>
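      <p>To illustrate how a form field mapped to a standard vocabulary becomes machine-interpretable RDF, the following minimal sketch expands a simplified metadata instance into N-Triples. The instance is only loosely modeled on the JSON-LD that CEDAR produces (an @context mapping field names to property IRIs, @value for literals, @id for ontology terms); the field names, the example.org IRIs and the helper to_ntriples are hypothetical, and the code handles only a tiny subset of JSON-LD rather than acting as a conforming processor.</p>
      <code language="python"><![CDATA[
```python
import json

# Hypothetical, simplified instance loosely modeled on CEDAR's JSON-LD output:
# the @context maps local field names to standard vocabulary properties.
INSTANCE = json.loads("""
{
  "@context": {
    "title": "http://purl.org/dc/terms/title",
    "keyword": "http://www.w3.org/ns/dcat#keyword",
    "condition": "https://example.org/n4h/condition"
  },
  "@id": "https://example.org/studies/42",
  "title": {"@value": "Example COVID-19 cohort study"},
  "keyword": [{"@value": "COVID-19"}, {"@value": "cohort"}],
  "condition": {"@id": "https://example.org/ontology/term/123"}
}
""")


def to_ntriples(instance):
    """Expand a flat instance into N-Triples lines.

    Handles only a tiny JSON-LD subset: term-to-IRI mappings in @context,
    {"@value": ...} string literals and {"@id": ...} IRI references.
    """
    context = instance["@context"]
    subject = "<" + instance["@id"] + ">"
    lines = []
    for field, value in instance.items():
        if field.startswith("@"):
            continue  # skip keywords such as @context and @id
        predicate = "<" + context[field] + ">"
        for item in value if isinstance(value, list) else [value]:
            if "@id" in item:
                obj = "<" + item["@id"] + ">"  # e.g. an ontology term IRI
            else:
                obj = json.dumps(item["@value"])  # quoted, escaped literal
            lines.append(f"{subject} {predicate} {obj} .")
    return lines


for triple in to_ntriples(INSTANCE):
    print(triple)
```
]]></code>
      <p>In practice, a full JSON-LD processor (for example, the pyld library or the JSON-LD parser in rdflib) would be the more robust choice; the hand-written expansion only makes the mapping from form fields to vocabulary properties visible.</p>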
    </sec>
    <sec id="sec-2">
      <title>2. Results</title>
      <p>All metadata elements of the N4H MDS version 0.8 (equivalent to version 1.0 after final consensus)
were implemented in CEDAR. We divided the implementation into 109 reusable simple data elements and
14 more complex structures composed of simple data elements<sup>2</sup>. This allowed all elements and
structures of the schema to be mapped. As in many technical implementations, certain idiosyncrasies of
the software, such as unusual types of form fields, had to be accommodated. Although the MDS follows
established vocabularies such as Dublin Core, DataCite, and the specifications of international study
registries, machine interpretation of such embeddings requires explicit references to the corresponding
vocabulary terms, which the MDS currently does not provide. The annotation of data elements with
ontologies from BioPortal was therefore only investigated prototypically, but appears very promising.
It was not possible to fully represent the complex conditions under which certain data elements become
mandatory, must satisfy certain formats, or should not be filled in at all. However, this would also not
be possible with other tools such as REDCap without programming effort.</p>
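      <p>The programming effort mentioned above could, for example, take the form of declarative rules evaluated outside the form tool. The following sketch is hypothetical throughout: the field names (study_type, phase, start_date), the Rule class and the validate function are illustrative and are not taken from the N4H MDS or from CEDAR. It shows the three kinds of conditional constraint described in the text: a field becoming mandatory, a field that must not be filled in, and a format requirement.</p>
      <code language="python"><![CDATA[
```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]


@dataclass
class Rule:
    """One conditional constraint on a single (hypothetical) field."""
    field: str
    applies: Callable[[Record], bool]  # condition evaluated on the whole record
    kind: str  # "required", "forbidden" or "format"
    pattern: str = ""  # regular expression, used only when kind == "format"


RULES = [
    # The trial phase is mandatory for interventional studies ...
    Rule("phase", lambda r: r.get("study_type") == "interventional", "required"),
    # ... and must stay empty for all other study types.
    Rule("phase", lambda r: r.get("study_type") != "interventional", "forbidden"),
    # A start date, if given, must follow the YYYY-MM-DD pattern
    # (shape only; calendar validity is not checked).
    Rule("start_date", lambda r: bool(r.get("start_date")), "format",
         r"\d{4}-\d{2}-\d{2}"),
]


def validate(record: Record, rules: List[Rule] = RULES) -> List[str]:
    """Return human-readable rule violations; an empty list means valid."""
    errors = []
    for rule in rules:
        if not rule.applies(record):
            continue
        value = record.get(rule.field)
        if rule.kind == "required" and not value:
            errors.append(f"{rule.field} is required")
        elif rule.kind == "forbidden" and value:
            errors.append(f"{rule.field} must be empty")
        elif rule.kind == "format" and not re.fullmatch(rule.pattern, value or ""):
            errors.append(f"{rule.field} has an invalid format")
    return errors


print(validate({"study_type": "interventional", "start_date": "2021-03"}))
# ['phase is required', 'start_date has an invalid format']
print(validate({"study_type": "observational", "phase": "II"}))
# ['phase must be empty']
```
]]></code>
      <p>Keeping such rules as data rather than as form logic means that, in principle, the same rule set could be applied to exported form instances or to records entered with other tools.</p>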
    </sec>
    <sec id="sec-3">
      <title>3. Discussion and Outlook</title>
      <p>
        The collaborative development of metadata vocabularies with domain experts has suffered for many
years from limited support by intuitive tools. Experts typically focus on content and are often not
willing to learn new software tools at the same time. As a result, Microsoft Excel is still a quasi-standard,
although its limitations in tracking changes, enforcing naming conventions and technical constraints,
and supporting implementation in software or APIs are well known. CEDAR, too, can resolve these
conflicts only partially. Further work should investigate the usability of the RDF serialization. A plugin
mechanism providing syntactic compatibility with the RDF representation of the HL7 FHIR standard would
be desirable, as FHIR is expected to play a major role in health research in the future [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Acknowledgements</title>
      <p>This work was supported by DFG grants no. 442326535 and WI 1605/10-2.</p>
    </sec>
    <sec id="sec-5">
      <title>5. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Golebiewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Löbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.O.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lehne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shutsko</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Darms</surname>
          </string-name>
          ,
          <article-title>NFDI4Health Task Force COVID-19 Metadata Schema</article-title>
          , FAIRDOMHub,
          <year>2021</year>
          . doi:10.15490/FAIRDOMHUB.1.DATAFILE.3972.1
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.O.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Darms</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shutsko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Löbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nagrani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Seifert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lindstädt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Golebiewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Koleva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.R.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Sax</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lieser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Junker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Klopfenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zeleke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Waltemath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Pigeot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Fluck</surname>
          </string-name>
          ,
          <article-title>Facilitating Study and Item Level Browsing for Clinical and Epidemiological COVID-19 Studies</article-title>
          .
          <source>Studies in Health Technology and Informatics</source>
          <volume>281</volume>
          (
          <year>2021</year>
          ),
          <fpage>794</fpage>
          -
          <lpage>798</lpage>
          . doi:10.3233/SHTI210284
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.S.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.J.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martínez-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.L.</given-names>
            <surname>Egyedi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Willrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Graybeal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Musen</surname>
          </string-name>
          ,
          <article-title>The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata that Describe Scientific Experiments</article-title>
          .
          <source>Semant Web ISWC</source>
          <volume>10588</volume>
          (
          <year>2017</year>
          ),
          <fpage>103</fpage>
          -
          <lpage>110</lpage>
          . doi:10.1007/978-3-319-68204-4_10
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.A.I.</given-names>
            <surname>Klopfenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.N.</given-names>
            <surname>Vorisek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shutsko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lehne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Löbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.O.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Thun</surname>
          </string-name>
          ,
          <article-title>Fast Healthcare Interoperability Resources (FHIR) in a FAIR Metadata Registry for COVID-19 Research</article-title>
          .
          <source>Studies in Health Technology and Informatics</source>
          <volume>287</volume>
          (
          <year>2021</year>
          ),
          <fpage>73</fpage>
          -
          <lpage>77</lpage>
          . doi:10.3233/SHTI210817
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          2 Open view link: https://cedar.metadatacenter.org/instances/create/https://repo.metadatacenter.org/templates/6293220a-9f68-419e-9577-d055cea8ae93?folderId=https:%2F%2Frepo.metadatacenter.org%2Ffolders%2F2d0ab99f-01cb-4e74-bbe2-c5329fb77950
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>