<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ilya Mazein</string-name>
          <email>ilya.mazein@uni-greifswald.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sarah Braun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tom Gebhardt</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ron Henkel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lea Michaelis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dagmar Waltemath</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Judith A.H. Wodke</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Core Unit Data Integration Center, University Medicine Greifswald</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Medical Informatics Department, Institute of Community Medicine, University Medicine Greifswald</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <fpage>26</fpage>
      <lpage>29</lpage>
      <abstract>
        <p>The MeDaX project aims to develop and implement concepts and tools for bioMedical Data eXploration using graph technologies. Here, we present v0.2 of our prototype, representing FHIR formatted clinical data in a graph format. We build on the pre-existing CyFHIR tool for generic conversion, optimise the resulting graph structure to lessen complexity, and incorporate the BioCypher framework, integrating the clinical data with ontology information. This makes the data more accessible and convenient for querying, information retrieval and analysis.</p>
      </abstract>
      <kwd-group>
        <kwd>Clinical data</kwd>
        <kwd>graph database</kwd>
        <kwd>knowledge graph</kwd>
        <kwd>FHIR</kwd>
        <kwd>Neo4j</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Accessibility of clinical data and its subsequent analysis are widely discussed among researchers and clinicians. The Medical Informatics Initiative (MII) Germany aims to digitally provide reusable clinical data for research purposes (https://www.medizininformatik-initiative.de/en/start). The MeDaX project focuses on developing a resource for biomedical data exploration [<xref ref-type="bibr" rid="ref1">1</xref>]. Here, we present a prototype that represents patient data in a graph format, which facilitates querying and analysis of complex heterogeneous data [<xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Data formats</title>
      <p>As part of the MII, healthcare data from university hospitals are collected, harmonised, semantically enriched, and then made available for research purposes by Data Integration Centers. To overcome the diversity of source formats and individual data processing pipelines, the data are provided in the FHIR (Fast Healthcare Interoperability Resources) format. Hence, we base our prototype on synthetic data in the same format. For our graph we use Neo4j (https://github.com/neo4j/neo4j), one of the most popular and continuously maintained graph database systems currently available. We opt for a labelled property graph representation rather than the RDF format, as it is less complex for storing large quantities of patient data and provides a more intuitive visualisation for clinicians.</p>
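      <p>The difference between the two representations can be illustrated with a minimal Python sketch. The node layout, the function name node_to_triples, the base IRI http://example.org/ and the synthetic patient values are all hypothetical and chosen for illustration only; they are not taken from the prototype. In a labelled property graph a single node carries its label and all of its properties, whereas an RDF-style encoding of the same content needs one triple per statement.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Labelled property graph view: one node carries its label and all properties.
# All values are synthetic and purely illustrative.
patient_node = {
    "labels": ["Patient"],
    "props": {"id": "p1", "gender": "female", "birthDate": "1980-05-17"},
}


def node_to_triples(node, base="http://example.org/"):
    """Express the same node as RDF-style (subject, predicate, object) triples.

    `base` is a placeholder IRI prefix (an assumption of this sketch).
    """
    subject = base + node["props"]["id"]
    triples = [(subject, "rdf:type", base + label) for label in node["labels"]]
    for key, value in node["props"].items():
        if key != "id":  # the id is already encoded in the subject IRI
            triples.append((subject, base + key, value))
    return triples


triples = node_to_triples(patient_node)
for triple in triples:
    print(triple)
```
]]></preformat>
      <p>In this toy case one property graph node corresponds to several triples, one for the type and one per remaining property.</p>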
    </sec>
    <sec id="sec-3">
      <title>3. Challenges and solutions</title>
      <p>For generic conversion from FHIR to Neo4j we applied the existing CyFHIR project, “a native Neo4j plugin that acts as the bridge between FHIR and Neo4j” (https://github.com/Optum/CyFHIR). It parses the tree-like structure of a FHIR resource JSON file and creates a corresponding Neo4j graph structure, regardless of the type of FHIR resource used as input.</p>
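      <p>The idea of a generic tree-to-graph conversion can be sketched in a few lines of Python. This is a simplified illustration and not CyFHIR's actual algorithm: the function fhir_to_graph, the in-memory node and edge layout, and the synthetic Patient resource are assumptions made for this example. Every JSON object becomes a node that holds its scalar fields as properties, and every nested object becomes a child node connected by an edge named after the JSON key.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json
from itertools import count


def fhir_to_graph(resource):
    """Walk a FHIR-like JSON tree and return (nodes, edges).

    nodes: {node_id: {"label": str, "props": dict}}
    edges: [(parent_id, json_key, child_id), ...]
    Hypothetical helper for illustration; not part of CyFHIR.
    """
    ids = count()
    nodes, edges = {}, []

    def visit(obj, label):
        node_id = next(ids)
        props = {}
        nodes[node_id] = {"label": label, "props": props}
        for key, value in obj.items():
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, dict):
                    # Nested object: new node plus an edge named after the key.
                    edges.append((node_id, key, visit(child, key)))
                elif isinstance(value, list):
                    # List of scalars: keep as a list-valued property.
                    props.setdefault(key, []).append(child)
                else:
                    props[key] = child
        return node_id

    visit(resource, resource.get("resourceType", "Resource"))
    return nodes, edges


# Synthetic example resource (not real patient data).
patient = json.loads("""
{
  "resourceType": "Patient",
  "id": "example-1",
  "gender": "female",
  "name": [{"family": "Muster", "given": ["Erika"]}]
}
""")

nodes, edges = fhir_to_graph(patient)
print(len(nodes), "nodes,", len(edges), "edges")
```
]]></preformat>
      <p>Because the traversal depends only on the JSON structure, the same function handles any resource type, which is also why deeply nested resources produce many nodes and relationships.</p>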
      <p>Because CyFHIR builds a graph according to the raw internal structure of a FHIR resource, the resulting number of nodes and relationships can reach an order of magnitude of 4 (roughly 10<sup>4</sup>) or higher. While most of the graph is useful, we found some parts to be redundant. We optimised the graph structure, making it less complex and less storage-intensive while keeping relevant information intact.</p>
      <p>1) References between FHIR resources represent high-level connections between FHIR entities, for example between a Patient and their Diagnosis. By simplifying the structure of these connections we decreased the size of the test patient graph by ∼25%.</p>
      <p>2) In its raw graph representation, a FHIR resource property can consist of multiple nodes and edges that collectively describe the same feature. Condensing these property structures has so far resulted in up to ∼40% fewer elements in the graph. However, the optimal ratio of complexity to searchability remains to be determined.</p>
      <p>
        3) To further organise entities after conversion and remove redundancies within the graph,
we apply the BioCypher framework [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], integrating clinical data with ontological information.
An overarching ontology will provide a unifying framework for incorporating various clinical
data formats in addition to FHIR.
      </p>
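      <p>Steps 1) and 2) can be illustrated on a toy in-memory graph. The following Python sketch is not the MeDaX implementation: the functions simplify_references and condense_properties, the "resource" flag on nodes, the lookup table index and the example data are hypothetical and only show the general idea. Step 1 replaces an intermediate reference node with a direct edge between two resources; step 2 folds a childless property node into its parent as prefixed properties.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def simplify_references(nodes, edges, index):
    """Step 1: link resources directly instead of via a reference node.

    `index` maps a FHIR reference string (e.g. "Patient/p1") to the id of
    the node that represents the referenced resource.
    """
    parents = {src for src, _, _ in edges}
    new_nodes, new_edges = dict(nodes), []
    for src, rel, dst in edges:
        ref = nodes[dst]["props"].get("reference")
        if ref in index and dst not in parents:
            new_edges.append((src, rel, index[ref]))  # direct edge
            new_nodes.pop(dst, None)                  # drop intermediate node
        else:
            new_edges.append((src, rel, dst))
    return new_nodes, new_edges


def condense_properties(nodes, edges):
    """Step 2: fold childless property nodes into their parent node.

    Only folds when the (parent, key) pair is unique, so repeated
    elements (e.g. several names) are left untouched.
    """
    parents = {src for src, _, _ in edges}
    fanout = Counter((src, rel) for src, rel, _ in edges)
    new_nodes = {i: {**n, "props": dict(n["props"])} for i, n in nodes.items()}
    new_edges = []
    for src, rel, dst in edges:
        leaf = dst not in parents and not nodes[dst]["resource"]
        if leaf and fanout[(src, rel)] == 1:
            for key, value in nodes[dst]["props"].items():
                new_nodes[src]["props"][f"{rel}.{key}"] = value
            new_nodes.pop(dst, None)
        else:
            new_edges.append((src, rel, dst))
    return new_nodes, new_edges


# Synthetic toy graph: a Condition that references a Patient.
nodes = {
    0: {"label": "Patient", "resource": True, "props": {"id": "p1"}},
    1: {"label": "Condition", "resource": True, "props": {"id": "c1"}},
    2: {"label": "subject", "resource": False, "props": {"reference": "Patient/p1"}},
    3: {"label": "code", "resource": False, "props": {"text": "Hypertension"}},
}
edges = [(1, "subject", 2), (1, "code", 3)]
index = {"Patient/p1": 0}

nodes1, edges1 = simplify_references(nodes, edges, index)
nodes2, edges2 = condense_properties(nodes1, edges1)
print(len(nodes), len(edges), "->", len(nodes2), len(edges2))
```
]]></preformat>
      <p>The reduction obtained on this toy graph says nothing about the reductions reported above; it only shows the mechanism. The restriction to unique keys in step 2 reflects the trade-off between compactness and searchability: folding more aggressively saves elements but makes individual values harder to query.</p>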
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>Our graph-based solution and planned clinical user interface will help to provide unified access for researchers and direct interaction with enriched data for clinicians. Intuitive visualisation alongside structured graph database queries allows users to organise information and answer treatment- and research-related questions more efficiently.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. A.H.</given-names>
            <surname>Wodke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Michaelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Henkel</surname>
          </string-name>
          ,
          <article-title>The MeDaX knowledge graph prototype</article-title>
          ,
          <source>Studies in Health Technology and Informatics</source>
          (
          <year>2023</year>
          ). doi:10.3233/SHTI230089.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lysenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Roznovăţ</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saqi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mazein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Rawlings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Aufray</surname>
          </string-name>
          ,
          <article-title>Representing and querying disease networks using graph databases</article-title>
          ,
          <source>BioData Mining</source>
          (
          <year>2016</year>
          ).
          doi:10.1186/s13040-016-0102-8.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.-H.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-K.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Use of graph database for the integration of heterogeneous biological data</article-title>
          ,
          <source>Genomics &amp; Informatics</source>
          (
          <year>2017</year>
          ). doi:10.5808/GI.2017.15.1.19.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lobentanzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Aloy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Baumbach</surname>
          </string-name>
          , et al.,
          <article-title>Democratizing knowledge representation with BioCypher</article-title>
          ,
          <source>Nature Biotechnology</source>
          (
          <year>2023</year>
          ). doi:10.1038/s41587-023-01848-y.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>