<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The CEKG: A Tool for Constructing Event Graphs in the Care Pathways of Multi-Morbid Patients</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Milad Naeimaei Aali</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felix Mannhardt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pieter Jelle Toussaint</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Norwegian University of Science and Technology</institution>
          ,
          <addr-line>Trondheim</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>One of the challenges in healthcare processes, especially those related to multi-morbid patients who suffer from multiple disorders simultaneously, is that patients' disorders are not connected to process events and that events' activities are not linked to globally accepted terminology. Addressing this challenge introduces a new entity to the clinical process and makes the process interpretable and analyzable across different healthcare systems. This paper introduces a tool named CEKG that uses event logs, diagnosis data, ICD-10, SNOMED-CT, and mapping functions to address these challenges by automatically constructing event graphs for multi-morbid patients' care pathways.</p>
      </abstract>
      <kwd-group>
        <kwd>Healthcare</kwd>
        <kwd>Process mining</kwd>
        <kwd>Event knowledge graph</kwd>
        <kwd>Multi-morbid patients</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Source code repository</title>
    </sec>
    <sec id="sec-2">
      <title>Screencast video and test dataset</title>
    </sec>
    <sec id="sec-3">
      <title>1. Introduction</title>
      <p>
        all. Patients with multi-morbidity in particular need such care. This patient group, who have
multiple chronic conditions at the same time [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], is expanding due to socio-economic deprivation
and an aging population [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and they require coordinated care from various specialties. They
also demand significantly more resources due to the complexity of their conditions. Therefore,
enhancing healthcare services for such patients can be a cornerstone of achieving truly effective
healthcare for all. One approach to enhancing healthcare services for these multi-morbid
patients is to improve the clinical process they are subject to.
      </p>
      <p>
        A clinical process or care pathway outlines the events involved in diagnosing, treating,
managing, and following up with patients [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It can be considered a type of business process
and, consequently, techniques like process mining may be used to improve a multi-morbid
patient’s clinical process. Still, there are many challenges when applying process mining
methods to the care pathways of multi-morbid patients, which often span several caregivers in
multiple organizations and involve the simultaneous treatment of multiple conditions. Among
these challenges are connecting emerging entities to events and linking relevant terminology
to them. Addressing both of these challenges may significantly enhance the delivery of care
paths [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        • Connecting clinical entities. Emerging clinical entities are clinical attributes that are
not connected directly to events but can potentially be used as new entities. For example,
the disorders of a multi-morbid patient are connected to the patient entity but not attached to
events. Multi-morbid patients have many different (sometimes emerging) disorders, which
we see as entities connected to the patient. By connecting events to the relevant disorders,
thereby obtaining multi-entity event data, we can better query the event data of a patient
to find relevant insights [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
• Linking terminology. Standardized clinical coding and nomenclature systems provide
useful terminology that is often not linked to the clinical process’s activities and
entities. For example, sources like Systematized Nomenclature of Medicine Clinical Terms
(SNOMED CT) [7], International Classification of Diseases Clinical Modification (ICD
CM) [8], and diagnosis-related groups (DRG) store terminology for event activities and
entities in a standardized way, but they are not linked directly to those activities and entities. By
aligning terminology with activities and events, we can standardize clinical processes,
enabling global interoperability for patient diagnoses and event activities. This may also
allow for various levels of abstraction and standardized categorization, ensuring a more
organized and segmented process [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In this paper we introduce a tool for the Clinical Event Knowledge Graph (CEKG) framework
presented in our previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which addresses the two mentioned challenges. We
developed this tool to support constructing CEKGs for enhancing the process analysis of multi-morbid
patient care pathways. The tool takes as input low-dimensional clinical event logs,
diagnostic data (indicating each patient’s disorders), ICD codes, and SNOMED CT terminology.
It allows the different inputs to be mapped through constrained node mappings, which are functions
derived from various sources, including empirical data, domain expertise, professional insights,
and documentation.
      </p>
      <p>In Section 2, we describe the innovations of this tool, and in Section 3 we show its application in
a case study.</p>
      <sec id="sec-3-1">
        <title>2. The Overview of the Tool</title>
        <p>
          Since the tool needs to support terminologies such as SNOMED-CT and ICD-10 as part of its
inputs, a graph database was chosen for storage because it supports a linked-data structure for
these terminologies. Furthermore, the need for path-based traversal of data makes the graph
database an ideal choice. Additionally, the tool requires the storage of entity attributes and other
semantic patient data, further reinforcing the suitability of a graph database for these functions.
The CEKG was proposed using the Labelled Property Graph (LPG) model. However, a challenge in
creating a CEKG using Neo4j is integrating data from different sources (SNOMED, hospital
information systems, etc.), which requires writing several complex Cypher Query Language (CQL)
queries manually. Swevels et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] introduced an open-source Python library for exploring graph-based,
object-centric process discovery, but this approach requires the deployment of clinical data and
terminology.
        </p>
        <p>Regarding usability, process mining tools are frequently complex to operate,
particularly for healthcare professionals managing patients with
multi-morbid conditions. To address these difficulties, the CEKG tool incorporates a user interface
designed so that users can quickly learn its functionality from the initial steps.
Furthermore, the tool generates outputs in the LPG format and employs the Graphviz library
for visualization purposes.</p>
        <p>For the implementation of the tool, as illustrated in Fig. 1, Python with Django and Django
Channels was used as the backend framework along with several libraries such as Pandas, Neo4j,
and Graphviz. For frontend development, vanilla JavaScript, HTML, and CSS were used.</p>
        <p>The CEKG tool offers several features for discovering various types of care pathways that
integrate both connecting entities and linking terminology:</p>
        <p>C1 Independent graphs for each patient without consolidating patient activities.</p>
        <p>C2 Combined graphs for patients without consolidating patient activities.</p>
        <p>C3 Consolidated patient activities to identify repeated activities for patients with the same
multi-morbidity or for specific disorders.</p>
        <p>C4 Consolidated patient activities to determine how frequently activities related to the
treatment of each disorder are repeated for a group of patients with the same
multi-morbidity.</p>
        <p>C5 Consolidated Patient Care Pathways to identify the most frequently repeated activities in
the treatment of a group of patients with the same multi-morbidity.</p>
        <p>C6 Care Pathways that indicate which disorders are treated, untreated, or newly discovered
in each admission.</p>
        <p>Additionally, we can determine whether to include properties of activities in the graph. For
example, the graph can either indicate only that a specific clinical test, such as the ABG test, was
conducted, or also include the test results (e.g., the values of Oxygen, Hemoglobin, ...).
Furthermore, we can segment the graph by the domain or scope of activities.</p>
        <p>The tool is designed to handle large datasets, with the primary dataset extracted from the
entire MIMIC-IV database. However, it can also be used with alternative datasets. To facilitate
testing and provide a template for users to create their own datasets, a test dataset was prepared
by extracting a portion of the MIMIC-IV data. The activity titles and patient IDs were modified
for de-identification. This test dataset includes essential ICD codes and SNOMED-CT IDs,
although the tool is capable of processing the full range of ICD codes and SNOMED-CT ID
databases. For convenience, all data in the test dataset was consolidated into a single spreadsheet.
However, for practical use, separate CSV files can be imported into the tool.</p>
        <p>The clinical event knowledge graph is built in several steps.</p>
        <p>At each step, Neo4j queries are automatically generated based on the input dataset to create
graphs or establish relationships between two graphs. These queries adapt if the dataset is
changed. Some queries are designed to clear the database, remove or create constraints, create
or modify nodes, or establish relationships between nodes. Since the tool sends the queries to
the Neo4j Aura Database, it is possible to view the final clinical event knowledge graph within
Neo4j Aura. However, the tool also facilitates the creation of the clinical event knowledge graph
offline in a local Neo4j instance by providing all the necessary queries to the user.</p>
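        <p>As an illustration of dataset-driven query generation, the following minimal Python sketch builds a parameterized Cypher statement from the column names of an event log. It is not the tool's actual implementation: the function names, the column names, the sample rows, and the Event, Entity, and CORR names are assumptions made for this example.</p>
        <preformat>
```python
def build_event_query(property_columns, entity_column):
    """Return a parameterized Cypher statement for one event-log row.

    The Event and Entity labels and the CORR relationship are assumed names.
    """
    for name in [*property_columns, entity_column]:
        # Column names end up inside the query text, so accept plain identifiers only.
        if not name.isidentifier():
            raise ValueError(f"unsupported column name: {name!r}")
    props = ", ".join(f"{col}: ${col}" for col in property_columns)
    return (
        f"MERGE (n:Entity {{id: ${entity_column}}}) "
        f"CREATE (e:Event {{{props}}}) "
        "CREATE (e)-[:CORR]->(n)"
    )


def build_statements(rows, entity_column):
    """Pair the generated query with each row's values as query parameters."""
    if not rows:
        return []
    property_columns = [col for col in rows[0] if col != entity_column]
    query = build_event_query(property_columns, entity_column)
    return [(query, dict(row)) for row in rows]


# Hypothetical two-row event log; real input would come from the imported CSV files.
rows = [
    {"patient_id": "P1", "activity": "Hospital admission", "timestamp": "2024-01-01T08:00:00"},
    {"patient_id": "P1", "activity": "Microbiology Procedure", "timestamp": "2024-01-01T09:30:00"},
]
statements = build_statements(rows, "patient_id")
print(statements[0][0])
```
        </preformat>
        <p>Because the property map is derived from the column names, the generated statement changes when the dataset's columns change. Each (query, parameters) pair could then be sent to a Neo4j instance, for example through the official Python driver, or saved to a file for offline execution.</p>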
      </sec>
      <sec id="sec-3-2">
        <title>3. Use cases overview</title>
        <p>
          In this section, we validate the CEKG tool with a case study based on the MIMIC-IV dataset [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
Two patients with multi-morbidity were considered, including only two entities: PATIENT and
ADMISSION. We used the tool to discover the care pathways denoted as C2 and C3, as examples
of the care pathways that we can identify. By using the tool, we not only discovered another
entity, Disorder, but also connected all activities and entities to ICD-10 and SNOMED-CT to
facilitate standardized analysis.
        </p>
        <p>The C2 care pathway discovered by the tool, shown in Fig. 2, combines the care
pathways of two multi-morbid patients and consists of three entities: PATIENT with red circles,
ADMISSION with blue circles, and Disorder with green circles. All activities in the process
are mapped to concepts from SNOMED-CT. Additionally, the domains of activities are shown
with different colors in the graph. With this type of care pathway, we can determine which
activities that happened for these patients are related to which disorders. For example, in the graph,
the activity "Analysis of Arterial blood gases and pH" relates to two disorders with SNOMED-CT IDs 1085006
and 94181007. Furthermore, we can categorize the activities using SNOMED-CT concepts.</p>
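        <p>The kind of question this pathway answers (which activities relate to which disorders) can be sketched in plain Python over multi-entity event records. The record layout and the function name are assumptions, and the data is a toy example: only the link between "Analysis of Arterial blood gases and pH" and the two SNOMED-CT IDs is taken from the text, while the second event is hypothetical.</p>
        <preformat>
```python
from collections import defaultdict

# Toy multi-entity events: each event lists the SNOMED-CT IDs of its linked disorders.
events = [
    {
        "patient": "P1",
        "activity": "Analysis of Arterial blood gases and pH",
        "disorders": ["1085006", "94181007"],
    },
    {
        # Hypothetical link, included only to make the example non-trivial.
        "patient": "P1",
        "activity": "Microbiology Procedure",
        "disorders": ["94181007"],
    },
]


def activities_by_disorder(events):
    """Group the distinct activity names under every disorder they are linked to."""
    index = defaultdict(set)
    for event in events:
        for disorder in event["disorders"]:
            index[disorder].add(event["activity"])
    # Sort the names so the result is deterministic.
    return {disorder: sorted(names) for disorder, names in index.items()}


result = activities_by_disorder(events)
for disorder, names in result.items():
    print(disorder, names)
```
        </preformat>
        <p>In the graph database the same lookup would be a traversal from Disorder nodes to their correlated events; the dictionary here only mirrors that idea in memory.</p>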
        <p>The C3 care pathway, shown in Fig. 3, identifies the most frequent activities in the treatment
of the two patients. For example, we can find out how many times the "Microbiology Procedure"
happened after the "Analysis of Arterial Blood Gases and pH" for these two patients. Using
SNOMED-CT concepts as activity labels facilitates the interpretation of the resulting care
pathways across all health organizations.</p>
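        <p>One way to read the frequency question above is as a count of directly-follows pairs across the patients' activity sequences. The sketch below assumes that reading; the function name and the two toy traces are hypothetical and do not reproduce the counts in Fig. 3.</p>
        <preformat>
```python
from collections import Counter

ABG = "Analysis of Arterial Blood Gases and pH"
MICRO = "Microbiology Procedure"

# Hypothetical time-ordered activity sequences for two patients.
traces = {
    "P1": [ABG, MICRO, ABG, MICRO],
    "P2": [ABG, MICRO],
}


def directly_follows_counts(traces):
    """Count how often one activity is immediately followed by another."""
    counts = Counter()
    for activities in traces.values():
        # zip pairs each activity with its successor within the same trace.
        counts.update(zip(activities, activities[1:]))
    return counts


counts = directly_follows_counts(traces)
print(counts[(ABG, MICRO)])  # 3
print(counts[(MICRO, ABG)])  # 1
```
        </preformat>
        <p>Pairs are counted within each patient's trace only, so the last activity of one patient is never paired with the first activity of another.</p>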
        <p>To sum up, the tool streamlines the creation of standardized care pathways by integrating
any event log with ICD codes and SNOMED CT concepts using a graph database. It also automates the
generation and execution of the necessary queries for building the graph database, ensuring
a seamless process. One area of future research could focus on identifying additional care
pathways from the clinical event knowledge graph.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Marengoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Angleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Melis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mangialasche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garmen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Meinow</surname>
          </string-name>
          , L. Fratiglioni,
          <article-title>Aging with multimorbidity: a systematic review of the literature</article-title>
          ,
          <source>Ageing research reviews</source>
          <volume>10</volume>
          (
          <year>2011</year>
          )
          <fpage>430</fpage>
          -
          <lpage>439</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Munoz-Gama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fernandez-Llatas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. A.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sepúlveda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Helm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Galvez-Yanjari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rojas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Martinez-Millana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Aloini</surname>
          </string-name>
          , et al.,
          <article-title>Process mining for healthcare: Characteristics and challenges</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>127</volume>
          (
          <year>2022</year>
          )
          <fpage>103994</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Naeimaei Aali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Toussaint</surname>
          </string-name>
          ,
          <article-title>Clinical event knowledge graphs: Enriching healthcare event data with entities and clinical concepts-research paper</article-title>
          , in: International Conference on Process Mining, Springer,
          <year>2023</year>
          , pp.
          <fpage>296</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Swevels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Klijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <article-title>Object-centric process mining (and more) using a graph-based approach with PromG</article-title>
          , in:
          <source>ICPM Doctoral Consortium/Demo</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , L. Bulgarelli,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gayles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shammout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Horng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Pollard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hao</surname>
          </string-name>
          , B. Moody,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gow</surname>
          </string-name>
          , et al.,
          <article-title>MIMIC-IV, a freely accessible electronic health record dataset</article-title>
          ,
          <source>Scientific Data</source>
          <volume>10</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>