<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis of patient pathways with contextual process mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fazle Rabbi</string-name>
          <email>fazle.rabbi@uib.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bahareh Fatemi</string-name>
          <email>Bahareh.Fatemi@uib.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wendy MacCaull</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, St Francis Xavier University</institution>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Information Science and Media Studies, University of Bergen</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>The volume and diversity of healthcare information available through the use of modern technology offer great potential for improving health services delivery. Giving healthcare practitioners and health system administrators the ability to investigate and analyse data from various perspectives can promote evidence-based decision making. Current analysis approaches often result in models where essential relations are difficult to depict or discern. Further, it is not easy to dynamically change the level of detail in order to accommodate user requirements and allow users to analyse data from various perspectives. We outline an approach involving filtering and abstraction of event logs using ontologies, dimensional modeling and cluster analysis so that healthcare professionals may access the information they need. Our approach enhances existing process mining techniques by incorporating data mining techniques for dividing patient records into closely related groups, and results in contextual process mining.</p>
      </abstract>
      <kwd-group>
        <kwd>healthcare information</kwd>
        <kwd>abstraction</kwd>
        <kwd>process mining</kwd>
        <kwd>community detection</kwd>
        <kwd>visualization</kwd>
        <kwd>ontology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Today's vast amount of healthcare information needs to be accessed easily, integrated intelligently and analysed appropriately to support better healthcare delivery. With intelligent integration and analysis, data can be used to gain deeper insights into patient health using historical healthcare information (including a patient's symptoms, appointments, exercise routines, lab results, vital signs, prescriptions, allergies, assessments, etc.) and can also be used to determine the effectiveness of treatment. Context-relevant clinical information can enable improved decision making. Healthcare data analysis across disciplines can facilitate the discovery of new knowledge, which can lead to streamlined workflows, greater efficiency and improved patient care.</p>
      <p>
        Systematic analysis of healthcare data can help to detect patterns so that healthcare providers can optimize their resource allocation and clinicians can optimize treatment plans for individuals, leading to better health outcomes. [...] individual practitioner and at the system level. Clinicians require information regarding various
treatments applied to different patient groups to choose the best treatment plan for the
individual patient. At the system level, there is an increasing demand to optimize systems through
the efficient use of the limited resources available. Both clinicians and administrators would
benefit from more intuitive tools that allow them to make fuller use of the data available to
them without the need for sophisticated technical knowledge. They are particularly interested
in finding common pathways for patients, ascertaining how a process model can be improved, and
determining to what extent existing systems follow clinical guidelines. However, it is not
easy to meet these demands, as healthcare processes are highly dynamic, complex, ad hoc, and
increasingly multidisciplinary [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Data mining techniques present the opportunity to analyse and learn from healthcare
information, from numerous viewpoints (i.e., contexts) such as with respect to patient populations
with specific diseases, ages, gender, incidences of co-morbidity, type of healthcare service
setting (e.g., clinic, hospital, nursing home), home location (urban or rural), procedures used,
etc. Clustering [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] and community detection [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] can provide useful information in order
to understand relationships between a patient’s symptoms and other information. These data
mining algorithms have the potential to identify similar groups of patients but do not include
temporal information and therefore lack the capacity to determine patient pathways. On the
other hand, process mining [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] techniques hold great potential to support health services by
identifying common flows of patients, but they are not equipped with clustering and community
detection methods. To adequately study the progression of diseases and the flow of patients
from a tight-knit group, we need an integrated approach combining data mining and process
mining. As in much of the literature on network science and social network analysis, we use the
terms clustering and community detection interchangeably. In the rest of the paper
we use the term “community structure” to refer to a network structure in which
the nodes are densely connected internally.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] we introduced a new approach, called model-based slicing, that utilizes
ontologies and dimensional models to access the data required for improved data analysis.
This approach exploits structured information available in the healthcare industry,
such as standard ontologies (e.g., ICD-10 (International Classification of Diseases, 10th revision) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], designed
to provide diagnostic codes for classifying diseases, and SNOMED-CT [9], which provides a
comprehensive terminology for clinical health) as well as hospital organizational and other
relevant hierarchies.
      </p>
      <p>In this paper we propose a data mining approach that integrates with and augments process
mining and that also utilizes healthcare ontologies. The proposed approach makes it possible to
analyze groups of patients with similar issues. We present a flexible approach for extracting healthcare
data through a data preprocessor designed to pipe the data, extracted according to a particular
user’s specifications, to existing data analysis tools. The experiments described in this paper
used data on patient admission, diagnosis, and procedural activities over a period of six months
taken from the Norwegian Patient Registry (NPR data).</p>
      <p>The rest of this paper is organized as follows. Section 2 outlines popular solutions available
for analysing healthcare data and discusses some of the problems that remain. Section 3 describes
how our data abstraction methods can be used to enhance output from community detection
technology. Section 4 discusses related work, and in Section 5 we conclude the paper and give
directions for our future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Existing solutions and some problems that remain</title>
      <p>In this section we outline popular techniques for analyzing healthcare data, both timed (i.e.,
associated with processes) and untimed, and point out several problems that remain.</p>
      <sec id="sec-2-1">
        <title>2.1. Process mining</title>
        <p>In many developed countries, most citizens use public healthcare services involving multiple
service providers providing diverse services supported by numerous software applications.
Various types of data analysis are required because of the various roles played by healthcare
professionals, e.g., while a clinician is interested in seeing the progression of diseases and studying
the effects of different treatment procedures, healthcare managers are interested in studying the
patient flow from one clinic to another and the number of patients waiting for services.</p>
        <p>In principle, process mining can be used to acquire and explore new knowledge about patient
condition, adherence to a treatment program, effectiveness of treatment, etc. The main focus
of process mining in healthcare is to provide evidence-based process analysis techniques for
effective process management [10]. It combines data mining and process analysis techniques
and is used to discover trends and patterns of process executions by analyzing the trace of
activities performed in a system. These traces are referred to as ‘event logs’. Existing
process mining techniques take event logs as input, which typically consist of event information
such as case id, event time, event name and attributes of events. Since healthcare information
is much more complex than information from most other domains, current process mining
techniques cannot adequately mine from the event logs the information required by the various
health professionals and are therefore not exploited to their full potential.</p>
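        <p>As a minimal sketch of what analyzing the trace of activities can look like, the following Python fragment counts directly-follows relations in a toy event log. The case ids, timestamps and event names are invented for illustration, and directly-follows counting is only one common building block of process discovery; it is not claimed to be the technique used by any particular tool discussed here.</p>
        <preformat>
```python
from collections import Counter, defaultdict
from datetime import datetime

# Toy event log (hypothetical data): (case id, event time, event name).
EVENT_LOG = [
    ("p1", "2019-01-03 09:00", "Check-in Emergency"),
    ("p1", "2019-01-03 11:30", "Diagnosis F321"),
    ("p1", "2019-01-05 10:00", "Check-in Psychiatry"),
    ("p2", "2019-02-01 08:15", "Check-in Emergency"),
    ("p2", "2019-02-01 09:45", "Diagnosis F321"),
]


def directly_follows(event_log):
    """Count how often event a is immediately followed by event b in a case."""
    traces = defaultdict(list)
    for case_id, timestamp, name in event_log:
        when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        traces[case_id].append((when, name))
    counts = Counter()
    for events in traces.values():
        # Order each case by time, then pair every event with its successor.
        names = [name for _, name in sorted(events)]
        counts.update(zip(names, names[1:]))
    return counts


dfg = directly_follows(EVENT_LOG)
```
        </preformat>
        <p>Each key of the resulting counter is a pair of consecutive event names, so frequent pairs indicate common steps in a pathway.</p>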
        <p>In a healthcare setting, patients may need to visit various health facilities for various
health-related issues and may also need to use certain online treatment programs. Hence, for effective
process mining, event logs representing activities of various systems must be considered. Event
logs from a variety of systems using various data definitions and formats must therefore be
harmonized before they can be analysed by any process mining algorithm. Another issue with
applying process mining to healthcare data is related to the diversity of information. Identifying
common pathways for patient flow in healthcare systems is complicated by the large variety
of patient conditions and diagnoses. In healthcare, the data preparation task is critical,
as healthcare data are very sensitive and the semantics of the data must therefore be preserved.
Getting the right setup for data preparation is important to get the best understanding from
the data as efficiently as possible [11]. The current situation in healthcare information systems
makes this a complicated problem indeed.</p>
        <p>
          Imagine a situation where we have event logs representing patient information with respect
to patient visits to various clinics (or service points in the hospital), and patient diagnoses.
Table 1 shows a portion of such an event log. The first column of the event log shows the case
id; the second column shows when the event occurred; the third column shows event names
(such as check-in at a department or clinic, or a clinical finding); and the fourth column shows
the resource for the information. Due to the complexity of the process flow and the diversity of
information, event logs need to be prepared in different ways in order to support a wider variety of queries. In the following,
we provide an example to illustrate this need. Considering the event logs from Table 1, suppose
we wish to investigate the flow through the various hospital departments of those patients who
have been diagnosed with a mental disorder, i.e., who have an ICD-10 code in the range
F00-F99.9. We wish to focus only on those patients who are very similar or, in other words,
belong to a community structure. The event logs in Table 1 consist of information at a
very low level, i.e., visits to clinics and specific diagnosis information. We need to relate clinics
to their department hierarchy and specific diagnosis information to the corresponding diagnosis group in the
ICD-10 ontology. Relating clinics to departments and diagnoses to diagnosis groups allows us to
choose the desired abstraction layer. Neither choosing the abstraction layer that defines the context
nor integrating community detection algorithms is supported by existing process mining
tools. In [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] we argued that the current practice of process mining needs to be advanced by
means of a rich information model which supports various perspectives needed to analyze event
logs.
        </p>
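        <p>The example above can be sketched in a few lines of Python. This is an illustrative sketch only: the clinic names, the clinic-to-department table and the toy event log are hypothetical, and only the ICD-10 blocks of the F chapter are listed. The function keeps the patients with at least one F-chapter diagnosis and lifts their events one level up in each hierarchy.</p>
        <preformat>
```python
# Hypothetical lookup table: a real one would come from the hospital
# organization hierarchy.
CLINIC_TO_DEPARTMENT = {
    "Anxiety clinic": "Psychiatry",
    "Mood clinic": "Psychiatry",
    "Diabetes clinic": "Internal medicine",
}
# ICD-10 blocks of the F chapter as (first, last) three-character categories.
F_BLOCKS = [("F00", "F09"), ("F10", "F19"), ("F20", "F29"), ("F30", "F39"),
            ("F40", "F48"), ("F50", "F59"), ("F60", "F69"), ("F70", "F79"),
            ("F80", "F89"), ("F90", "F98"), ("F99", "F99")]


def diagnosis_group(code):
    """Map a leaf code such as 'F321' to its block, e.g. 'F30_F39'."""
    category = code[:3]
    for first, last in F_BLOCKS:
        if last >= category >= first:
            return first if first == last else f"{first}_{last}"
    return None  # not a mental disorder code


def abstract_log(event_log):
    """Keep patients with an F-chapter diagnosis and lift their events."""
    selected = {case for case, _, kind, value in event_log
                if kind == "diagnosis" and diagnosis_group(value)}
    lifted = []
    for case, time, kind, value in event_log:
        if case not in selected:
            continue
        if kind == "visit":
            value = CLINIC_TO_DEPARTMENT.get(value, value)
        else:
            value = diagnosis_group(value) or value
        lifted.append((case, time, kind, value))
    return lifted


# Toy event log (hypothetical): (case id, date, event kind, value).
EVENT_LOG = [
    ("p1", "2019-01-03", "visit", "Anxiety clinic"),
    ("p1", "2019-01-03", "diagnosis", "F401"),
    ("p1", "2019-02-10", "visit", "Diabetes clinic"),
    ("p1", "2019-02-10", "diagnosis", "E119"),
    ("p2", "2019-01-07", "visit", "Diabetes clinic"),
    ("p2", "2019-01-07", "diagnosis", "E119"),
]
abstracted = abstract_log(EVENT_LOG)
```
        </preformat>
        <p>In this toy log only patient p1 is retained, and the visit to the anxiety clinic is reported at department level while the diagnosis F401 is reported as the group F40_F48.</p>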
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Contextualization using data mining</title>
        <p>
          Healthcare data are received from various healthcare service providers and personal health
applications such as patient monitoring applications and sensor devices. It is becoming
difficult for health caregivers to analyze all these data from various sources and extract
information in a meaningful way. Visualization of healthcare information can greatly help
care providers to gain insights from patients’ data and make important decisions [14]. Data
mining techniques such as community detection are used to identify useful information in
healthcare data, and the outcome of such data mining technologies can help healthcare
providers to understand the effectiveness of treatment processes and to streamline
healthcare processes and treatment protocols [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Community structure in patient records
based on patient condition and symptoms helps to efficiently detect
previously unknown, valuable information. Data mining approaches can also be used to help medical
researchers in making efficient healthcare policies, constructing drug recommendation systems,
and developing health profiles of individuals [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Community detection algorithms are easy to
apply to healthcare data as they are based on unsupervised learning techniques, i.e., the data
do not need to be labeled.
        </p>
        <p>One of the challenges in revolutionizing current healthcare systems is to adapt the treatment
procedure to the various needs of a diverse group of patients. This requires a better
understanding of each patient’s condition, which includes patient demographics, symptoms,
preferences, interests, etc. In general, healthcare systems consist of various processes inside hospital
organizations and several other processes outside the boundary of the healthcare organization that
support online treatments. The central motive of personalized medicine is the premise that an
individual has a significant role in disease vulnerability and the selection of treatments [15].
Such personalized and adaptive treatment systems behave differently for different users. The
decision on how these systems should behave for any particular user is based on a user model,
which is a detailed representation of an individual user’s information, such as user preferences,
interests, behavior, background, knowledge, individual traits, and others. As these attributes are
dynamic, it is essential to create, maintain, and update the user model.</p>
        <p>
          There exist several community detection algorithms that partition a dataset based on the
association of its individuals. Patient demographic information along with other information
can be effectively used to group patients using clustering or community detection algorithms
such as the k-means clustering algorithm [
          <xref ref-type="bibr" rid="ref2">16, 2</xref>
          ] and community detection algorithms [
          <xref ref-type="bibr" rid="ref4 ref5">5, 4</xref>
          ]. [17]
constructs a lower-dimensional feature matrix of the weighted network using a deep sparse
autoencoder and then applies the k-means algorithm to obtain the community detection
results. [18] proposes an embedding-based method that tackles the problem of overlapping
communities, using a Generative Adversarial Net (GAN) for optimization. [19] proposes a
Markov Random Field (MRF) method formalizing modularity as the energy function. However,
such algorithms are not equipped with healthcare ontologies to provide perspectives at
various abstraction levels. There are situations where domain experts would like to investigate
patient problems in a specific area in greater detail while getting only an
overview of other areas at an abstract level. For example, a psychologist would be interested
in seeing the patients with specific mental problems and their co-morbidity with ailments such as
diabetes, cardiovascular disease, cancer, infectious diseases, etc. In the next section we propose
a solution that allows specifying contexts for mining patient information at a variety of
abstraction levels, leading to improved process mining.
        </p>
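        <p>The psychologist example can be illustrated with a small sketch of mixed-level abstraction: codes in one chapter of interest keep their full detail, while codes from every other chapter are collapsed to the chapter level. The chapter table below is a deliberately incomplete subset of ICD-10 with abbreviated labels, and the function names are hypothetical.</p>
        <preformat>
```python
# A few ICD-10 chapters as (first, last, label); deliberately incomplete,
# with abbreviated labels chosen for this sketch.
CHAPTERS = [
    ("A00", "B99", "Infectious diseases"),
    ("C00", "D48", "Neoplasms"),
    ("E00", "E90", "Endocrine and metabolic diseases"),
    ("F00", "F99", "Mental and behavioural disorders"),
    ("I00", "I99", "Circulatory system diseases"),
]


def chapter_of(code):
    """Return the chapter label for an ICD-10 code, or None if not listed."""
    category = code[:3]
    for first, last, label in CHAPTERS:
        if last >= category >= first:
            return label
    return None


def mixed_level(code, detailed_chapter="Mental and behavioural disorders"):
    """Keep full detail inside one chapter and collapse every other chapter."""
    chapter = chapter_of(code)
    if chapter is None or chapter == detailed_chapter:
        return code
    return chapter


codes = ["F401", "F321", "E119", "I10", "C509", "A090"]
view = [mixed_level(c) for c in codes]
```
        </preformat>
        <p>Here the two F codes stay at leaf level, whereas the diabetes, cardiovascular, cancer and infection codes are each reported by chapter only.</p>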
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Contextual process mining</title>
      <p>
        To exploit the vast amount of information available in healthcare information systems, we need
to develop ways to support contextual data analysis. One may need both to filter information
to select certain subpopulations of patients, or certain clinical contexts, and to group
information to allow patterns to emerge from a more abstract view. The main challenge in
performing data analysis using existing process mining and clustering tools and techniques is
the lack of support for the needed filtering and abstraction, which would give the individual
practitioner or administrator the ability to extract and explore data from whatever perspective
they deem appropriate. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Rabbi et al. presented a model-based slicing technique for process
mining which utilizes dimensional modeling and ontological representations of healthcare
information. The slicing techniques allow domain experts to analyze pathways for patients
from various contexts, e.g., patients diagnosed with cancer, or patients admitted to the women’s
clinic. The approach was based on a filtering mechanism that allowed filtering over the
abstraction levels of healthcare ontologies, but it was limited to selecting patient
groups of similar kinds. For example, in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] it is possible to filter patients who have been
diagnosed with any ‘mental disorder’, but that may include patients who have been through a
variety of other issues. Patient p1 may be quite different from patient p2 even though they have some
commonalities. In this paper, we propose to utilize a community detection algorithm over
patient records to identify patient groups with common issues. In order to support a variety
of abstraction levels, we introduce a pre-processing step in which we manipulate the input to
existing community detection algorithms. The input to the community detection is manipulated
based on a dimensional model which includes healthcare ontologies.
      </p>
      <p>The NetworkX Python library allows us to construct directed and undirected graphs
with many nodes and edges. The library allows us to export the graphs in various formats such
as GraphML, which can be imported into graph visualization tools such as Gephi. Figure 2
shows the clustering output for a sample of NPR data. For this experiment we filtered the patients
by their association with diagnoses. We constructed a graph whose nodes are patient
identifiers (shown as integers) and patient diagnoses (ICD-10 codes), and whose edges
represent the relation between a patient and their diagnosis. In Figure 2 we used the low-level
ICD-10 codes, i.e., leaf nodes of the ICD-10 hierarchy.</p>
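      <p>A minimal sketch of this graph construction with NetworkX is given below. The patient identifiers and diagnosis pairs are hypothetical and are not taken from the NPR data; the node attribute name and the output file name are likewise arbitrary choices made for the sketch.</p>
      <preformat>
```python
import os
import tempfile

import networkx as nx

# Hypothetical (patient id, ICD-10 leaf code) pairs; not taken from the NPR data.
RECORDS = [(10, "F401"), (10, "F321"), (11, "F401"), (12, "F900"), (13, "F900")]

graph = nx.Graph()
for patient, code in RECORDS:
    graph.add_node(patient, kind="patient")
    graph.add_node(code, kind="diagnosis")
    graph.add_edge(patient, code)  # the patient has received this diagnosis

# GraphML is one of the formats Gephi can open; the file name is arbitrary.
path = os.path.join(tempfile.gettempdir(), "patient_diagnosis.graphml")
nx.write_graphml(graph, path)
```
      </preformat>
      <p>An undirected graph is used here because the patient-diagnosis relation has no direction in this sketch.</p>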
      <p>The figure shows that some diagnoses were made for many individuals (the large purple cluster
associated with F900) and some diagnoses were made for only a few individuals (the small grey
cluster associated with F422 in the lower left-hand region of Figure 2). It also shows that there
are some isolated clusters (such as the grey F422 cluster) as well as
some clusters which seem to share individuals, i.e., we see some individuals
who lie between two or more diagnoses (individual 10, in green, appears to belong to both
the F401 and F321 diagnoses), indicating that individual 10 had both diagnoses.</p>
      <p>
        To produce Figure 2 we used a community detection algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] which is available in the
Gephi tool. The colors of the nodes indicate which cluster they belong to. Gephi has
been designed to support the visualization of large networks and is equipped with
many dynamic features such as re-arrangement of nodes. To produce this figure we used the
‘ForceAtlas’ layout algorithm with scaling 1.5 and gravity 3.0. A lower gravity value and a higher
scaling would separate the nodes further, but the layout would consume more space.
      </p>
      <p>Although this visualization gives us an idea of the overall clustering associated with diagnoses,
and indicates that some patients are associated with more than one diagnosis, the fine-grained
detail prevents us from quickly obtaining meaningful information on patients
having more than one diagnosis.</p>
      <p>We constructed another graph, see Figure 3, where we used the group codes of the ICD-10
diagnosis codes (i.e., diagnoses in the range F00_F09 are grouped, etc.). Here different colours
denote different clusters, and the size of the node representing a group code is determined by
the degree of the node (i.e., the number of patients with that group code). This abstraction yields
a better visualization: we can more easily pick out the patients who have been diagnosed with
multiple mental issues. From this figure we can see that many of the patients were diagnosed
with F40_F48 (Anxiety, dissociative, stress-related, somatoform and other nonpsychotic mental
disorders), F30_F39 (Mood [affective] disorders), F20_F29 (Schizophrenia, schizotypal and
delusional disorders), F90_F98 (Behavioural and emotional disorders with onset usually occurring
in childhood and adolescence) and F10_F19 (Mental and behavioural
disorders due to psychoactive substance use). The overlap is indicated by the patient nodes
in the center of the graph, which are associated with multiple diagnosis nodes.</p>
      <p>Now that we have seen how ontologies can be incorporated in community detection, we
show how the event logs of the selected group of patients can be analyzed with process mining tools.
An analyst may be interested in investigating the progression of diseases or the admission flow
of patients in different departments. In our approach, appropriate events are prepared based on
the selection of event types, e.g., diagnosis, admission, etc. We enrich the model-based slicing
approach presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] with a two-step approach: in the first step we contextualize the
dataset with patients from a specific community, and in the next step we provide the contextual
information to a process mining tool. Figure 4 shows the approach proposed in the paper. The
raw data include patient diagnosis and admission-related information. We include a variety of
hierarchical search dimensions (e.g., diseases, admissions) for the data analysis tasks. While
applying community detection over patient records, we have the opportunity to specify which
search dimension we would like to explore. In this example, we have specified a particular
diagnosis code range (F00-F99). The output of the community detection algorithm is shown
on the bottom left side of the figure. The communities can be further analyzed by means of a
process mining technique. While applying the process mining technique, we have the opportunity
to specify which dimension we wish to explore. This selection is taken into account when preparing
the event log for the process mining technique. The output of the process mining technique is
shown on the right side of the figure. In this example, we have shown flows for two different
dimensions: the progression of diseases of a particular group of patients (top right side of the
figure) and patient admission flow in various departments (bottom right side of the figure). The
proposed approach is supported by a prototype implementation which extends the approach
presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>[Figure 4: Overview of the proposed approach. Raw data and a search dimension (ICD-10 ranges such as A00-B99, D50-D89, F00-F99 and Z00-Z99) feed community detection based on a chosen abstraction level, so that patients with similar issues are found in the same community. The user also specifies the dimension for process mining, which is based on selected events from a community of patients. One flow shows the progression of diseases for a selected group of patients; the other shows the movement of patients to different departments in the hospital.]</p>
    </sec>
    <sec id="sec-related-work">
      <title>4. Related work</title>
      <p>
        van der Aalst presented four different analysis perspectives for process mining in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] which
include the control-flow perspective, the organizational perspective, the case perspective, and the time
perspective. These perspectives are useful to understand the ordering of activities, the roles of resources,
the attributes related to a particular case, and the frequency and timing of events. Although
these perspectives can be used to derive useful insight by analyzing event logs from different
points of view, they lack an abstraction mechanism that would allow health professionals both to
mine relevant information from highly discipline-specific data sources and to process event
data from often highly individualistic patient pathways in order to discover common pathways.
      </p>
      <p>Bistarelli et al. presented a prototype tool called PrOnto in [20] which can discover business
processes from an event log and classify them w.r.t. a business ontology. The tool takes an event
log file as input and produces a UML-based activity diagram in XML format. The aim of the
approach is to raise the level of abstraction in process mining by utilizing business ontologies.
They proposed an ontology representing the hierarchy of resources. In their approach, resources
are the actors of the activities. Since the resources are given an ontological hierarchy, it is
possible to define which level of abstraction will be used for process mining. They proposed
to use integers to define the level of abstraction. Defining a high level of abstraction
would merge several activities performed by all the actors that belong to the high-level
classification of resources. Our work differs from the approach presented in [20] in
that we propose to use dimensional models and ontologies to classify event logs. Our approach
is more flexible, since it is possible to be more specific in one portion of the process model
while being more generic in another portion of the process model. The idea of combining
dimensional modeling with ontologies is novel in this paper. We have shown that the mining
process includes some preprocessing steps. In these preprocessing steps the user specifies
the context (using dimensional modeling and ontologies) to define a patient group and also
specifies the level of abstraction (using dimensional modeling and ontologies) that will be used
to visualize the process mining output.</p>
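      <p>The two preprocessing steps can be sketched as a small pipeline: first keep the events of one patient community, then keep the events of the dimension chosen for process mining. Everything in the sketch is illustrative: the events, the community and the column names are hypothetical, and the CSV layout is only one plausible exchange format for a process mining tool.</p>
      <preformat>
```python
import csv
import io

# Hypothetical events: (patient id, date, event type, value).
EVENTS = [
    (10, "2019-01-03", "admission", "Emergency"),
    (10, "2019-01-03", "diagnosis", "F40_F48"),
    (10, "2019-03-12", "admission", "Psychiatry"),
    (10, "2019-03-12", "diagnosis", "F30_F39"),
    (12, "2019-02-01", "admission", "Paediatrics"),
    (12, "2019-02-01", "diagnosis", "F90_F98"),
]


def contextual_log(events, community, dimension):
    """Step 1: keep one community. Step 2: keep one event dimension."""
    rows = [(patient, date, value)
            for patient, date, kind, value in events
            if patient in community and kind == dimension]
    return sorted(rows)


def to_csv(rows):
    """Serialize the log with generic (assumed) column names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["case_id", "timestamp", "activity"])
    writer.writerows(rows)
    return buffer.getvalue()


community = {10, 11, 14}  # e.g. one community found by community detection
disease_flow = contextual_log(EVENTS, community, "diagnosis")
admission_flow = contextual_log(EVENTS, community, "admission")
```
      </preformat>
      <p>The same community thus yields two different event logs, one per dimension, which corresponds to the two flows (disease progression and departmental movement) described for the proposed approach.</p>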
      <p>In [21], the authors discussed the application of process mining in healthcare and provided
an overview of questions frequently asked by medical professionals in process mining projects.
The questions reflect the medical professionals’ interest in learning common pathways
of different patient groups, in determining their compliance with internal and external clinical
guidelines, and in gathering information about the throughput times for treating patients.
The authors pointed out the need for accumulating data from different data sources and
claimed this to be a major challenge in healthcare. In the conclusion of the paper the authors
suggested that ontologies can be used for defining an appropriate scope and for identifying the
cases from different data sources. They urged the exploitation of ontology-based process mining
approaches in the healthcare domain.</p>
      <p>In [22], the authors discussed the necessity of relating elements in event logs to their
semantic concepts. By linking event logs with the concepts from an ontology they presented a
process mining approach that performs concept-based analysis. The use of semantics
makes it possible to automatically reason about or infer relationships between related concepts.
They distinguished between the application of process mining at two different levels: the instance
level and the conceptual level. They illustrated their ideas with an example process model for
repairing telephones in a company. That process model included three different ontologies: a
Task ontology, a Role ontology and a Performer ontology. The idea of using an ontology for
process mining presented in [22] is very similar to our approach. The idea of filtering based
on ontological concepts and the idea of grouping nodes by a high-level ontological concept are
similar. However, in our approach we emphasize the benefits of ontology-based process
mining for the healthcare domain. While in [22] the authors implemented their technique in
ProM, our approach is more general, as it offers a pre-processing step in which we filter events for
patients from a tightly-knit group using a data mining technique and import the filtered events
into the Fluxicon Disco process mining tool.</p>
      <p>
        A K-means clustering algorithm was presented in [23], where the authors enhanced the
traditional K-means algorithm by means of a semantic model. The approach is similar to our
community detection analysis, where we utilize an abstraction layer which gives semantics to our
model. In our approach we focused on the usability of incorporating a flexible abstraction layer
for healthcare data analysis. Rosvall et al. in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] provided a list of approaches for community
detection and briefly presented the potential of utilizing network abstraction for understanding
an air traffic system. Again, they did not present any example of abstraction which can be
applied to healthcare data to enhance clustering and its visualization.
      </p>
      <p>Machine learning techniques have been employed in a variety of healthcare studies, such as
diagnostic code assignment [24] and patient representation [25]. However, the potential of
these machine learning algorithms, and of their integration with process mining techniques for
analyzing healthcare information, has yet to be exploited. Healthcare data, due to its complex
nature, can be modeled as a heterogeneous network. Representation learning methods [26]
can be used to analyze the community structure of healthcare information represented as
heterogeneous networks. According to [27], representation learning methods on heterogeneous
networks can be divided into three categories: path-based, semantic-unit-based and other
methods. Just as node2vec [28], a homogeneous node embedding method, preserves
random-walk probabilities in the feature space, a method for heterogeneous networks might try to
preserve metapath probabilities, a metapath being a more sophisticated variation of a path. MetaPath2vec
[29] and MetaGraph2vec [30] fall into the path-based category; they use path-based random walks
and a heterogeneous skip-gram model to learn node representation vectors. The basic principle
of graph representation algorithms is to preserve the relationships and structural properties of a
network in a low-dimensional vector space, which can then be used to extract
information about the similarity of nodes in the network. Semantic-unit-based algorithms
such as [31] define particular semantic units as a means of capturing semantic information
in the embedding space. Du et al. [32] presented an algorithm for network representation
learning based on a graph partitioning strategy, in which a heterogeneous network is partitioned
into homogeneous and bipartite subnetworks and the projective relations hidden in the bipartite
subnetworks are extracted by learning projective embedding vectors. Although these
machine learning based approaches are relevant for studying healthcare information, they need
to be adapted to support the analysis of healthcare information, which involves large healthcare
ontologies and temporal aspects.</p>
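      <p>To make the idea of path-based random walks concrete, the following minimal sketch generates a walk whose node types follow a given metapath. It is an illustration under stated assumptions, not the MetaPath2vec implementation: it covers only the walk generation step (the skip-gram training on the walks is omitted), the function name metapath_walk is hypothetical, and the toy patient and diagnosis network is invented for the example.</p>
      <preformat>
```python
import random


def metapath_walk(graph, node_type, start, metapath, walk_length, rng):
    """Random walk from `start` whose node types follow `metapath` cyclically.

    `metapath` lists node types and must begin and end with the same type,
    e.g. ["patient", "diagnosis", "patient"], so that it can be repeated.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first metapath type")
    cycle = metapath[:-1]  # drop the repeated last type
    walk = [start]
    for position in range(1, walk_length):
        wanted = cycle[position % len(cycle)]
        # Only neighbours of the type prescribed by the metapath are eligible.
        candidates = [n for n in graph[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk


# Toy heterogeneous network: patients linked to the diagnoses they received.
graph = {
    "p1": ["d1"],
    "p2": ["d1", "d2"],
    "p3": ["d2"],
    "d1": ["p1", "p2"],
    "d2": ["p2", "p3"],
}
node_type = {
    "p1": "patient", "p2": "patient", "p3": "patient",
    "d1": "diagnosis", "d2": "diagnosis",
}

rng = random.Random(0)  # fixed seed so the walk is reproducible
walk = metapath_walk(graph, node_type, "p1",
                     ["patient", "diagnosis", "patient"], 5, rng)
print(walk)
```
      </preformat>
      <p>Walks of this kind place patients who share diagnoses close together in the generated sequences, which is the co-occurrence signal that a skip-gram model would subsequently turn into embedding vectors.</p>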
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion and Future Work</title>
      <p>This paper proposes a model-based approach for determining patient pathways based on
contextual process mining. We incorporate visualization techniques for filtering and analyzing
patient records. The novel contribution of this paper is setting the context for process mining
using community detection together with dimensional analysis enriched with healthcare ontologies. The
approach presented here uses a flexible abstraction layer that can be tailored to meet the
needs of the user. We envision a healthcare information system that supports an easy-to-use
tool, based on a graphical query language, that permits a variety of healthcare professionals to
investigate data from their own specialized perspectives. In the future we plan to develop
methods and tools incorporating state-of-the-art artificial intelligence techniques and to apply our
approach to a large healthcare dataset.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>This research was supported by the Research Council of Norway with funding to INTROMAT
(INtroducing personalized TReatment Of Mental health problems using Adaptive Technology),
project number 259293.</p>
      <p>[8] World Health Organization, ICD-10: international statistical classification of diseases and related
health problems, 10th revision, 2nd ed., World Health Organization, Geneva, 2004.</p>
      <p>[9] O. Bodenreider, R. Cornet, D. J. Vreeman, Recent developments in clinical terminologies:
SNOMED CT, LOINC, and RxNorm, Yearbook of Medical Informatics 27 (2018) 129–139.</p>
      <p>[10] A. Partington, M. Wynn, S. Suriadi, C. Ouyang, J. Karnon, Process mining for clinical
processes: A comparative analysis of four Australian hospitals, ACM Trans. Manage. Inf.
Syst. 5 (2015) 19:1–19:18.</p>
      <p>[11] D. Pyle, Data Preparation for Data Mining, 1st ed., Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 1999.</p>
      <p>[12] C. Günther, A. Rozinat, Disco: discover your processes, in: N. Lohmann, S. Moser (Eds.),
Proceedings of the Demonstration Track of the 10th International Conference on Business
Process Management (BPM 2012), CEUR Workshop Proceedings, CEUR-WS.org, 2012, pp.
40–44. Conference date: 04-09-2012.</p>
      <p>[13] B. F. van Dongen, A. K. A. de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, W. M. P.
van der Aalst, The ProM framework: A new era in process mining tool support, in:
G. Ciardo, P. Darondeau (Eds.), Applications and Theory of Petri Nets 2005, Springer Berlin
Heidelberg, Berlin, Heidelberg, 2005, pp. 444–454.</p>
      <p>[14] F. Rabbi, J. D. Wake, T. Nordgreen, Reusable data visualization patterns for clinical practice,
in: Ö. Babur, J. Denil, B. Vogel-Heuser (Eds.), Systems Modelling and Management - First
International Conference, ICSMM 2020, Bergen, Norway, June 25-26, 2020, Proceedings,
volume 1262 of Communications in Computer and Information Science, Springer, 2020, pp. 55–72.
URL: https://doi.org/10.1007/978-3-030-58167-1_5. doi:10.1007/978-3-030-58167-1_5.</p>
      <p>[15] U. Ozomaro, C. Wahlestedt, C. B. Nemeroff, Personalized medicine in psychiatry: problems
and promises, BMC Medicine 11 (2013) 132.</p>
      <p>[16] J. MacQueen, Some methods for classification and analysis of multivariate observations, in:
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability,
Volume 1: Statistics, University of California Press, Berkeley, Calif., 1967, pp. 281–297.</p>
      <p>[17] S. Li, L. Jiang, X. Wu, W. Han, D. Zhao, Z. Wang, A weighted network community detection
algorithm based on deep learning, Applied Mathematics and Computation 401 (2021)
126012.</p>
      <p>[18] Y. Jia, Q. Zhang, W. Zhang, X. Wang, CommunityGAN: Community detection with
generative adversarial nets, in: The World Wide Web Conference, 2019, pp. 784–794.</p>
      <p>[19] D. He, W. Song, D. Jin, Z. Feng, Y. Huang, An end-to-end community detection model:
Integrating LDA into Markov random field via factor graph, in: Proceedings of the
Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, International
Joint Conferences on Artificial Intelligence Organization, 2019, pp. 5730–5736.</p>
      <p>[20] S. Bistarelli, T. D. Noia, M. Mongiello, F. Nocera, Pronto: an ontology driven business
process mining tool, Procedia Computer Science 112 (2017) 306–315.</p>
      <p>[21] R. S. Mans, W. M. P. van der Aalst, R. J. B. Vanwersch, A. J. Moleman, Process mining
in healthcare: Data challenges when answering frequently posed questions, in: R. Lenz,
S. Miksch, M. Peleg, M. Reichert, D. Riaño, A. ten Teije (Eds.), Process Support and
Knowledge Representation in Health Care, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013,
pp. 140–153.</p>
      <p>[22] A. K. A. de Medeiros, W. M. P. van der Aalst, Process mining towards semantics, in:
Advances in Web Semantics I, volume 4891 of Lecture Notes in Computer Science, Springer,
2009, pp. 35–80.</p>
      <p>[23] Z. Liu, J. Bao, F. Ding, An improved k-means clustering algorithm based on semantic
model, in: Proceedings of the International Conference on Information Technology and
Electrical Engineering 2018, ICITEE ’18, Association for Computing Machinery, New York,
NY, USA, 2018.</p>
      <p>[24] V. Mayya, S. K. S., G. S. Krishnan, T. Gangavarapu, Multi-channel, convolutional attention
based neural model for automated diagnostic coding of unstructured patient discharge
summaries, Future Generation Computer Systems 118 (2021) 374–391.</p>
      <p>[25] X. Zhang, B. Qian, Y. Li, Y. Liu, X. Chen, C. Guan, C. Li, Learning robust patient
representations from multi-modal electronic health records: a supervised deep learning approach,
in: Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), SIAM,
2021, pp. 585–593.</p>
      <p>[26] W. L. Hamilton, R. Ying, J. Leskovec, Representation learning on graphs: Methods and
applications, 2017. arXiv:1709.05584. Published in the IEEE Data Engineering
Bulletin, September 2017; version with minor corrections.</p>
      <p>[27] Y. Xie, B. Yu, S. Lv, C. Zhang, G. Wang, M. Gong, A survey on heterogeneous network
representation learning, Pattern Recognition 116 (2021) 107936.</p>
      <p>[28] A. Grover, J. Leskovec, Node2vec: Scalable feature learning for networks, in: Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, KDD ’16, ACM, New York, NY, USA, 2016, pp. 855–864.</p>
      <p>[29] Y. Dong, N. V. Chawla, A. Swami, Metapath2vec: Scalable representation learning for
heterogeneous networks, in: Proceedings of the 23rd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, KDD ’17, Association for Computing
Machinery, New York, NY, USA, 2017, pp. 135–144.</p>
      <p>[30] D. Zhang, J. Yin, X. Zhu, C. Zhang, Metagraph2vec: Complex semantic path augmented
heterogeneous network embedding, in: D. Phung, V. S. Tseng, G. I. Webb, B. Ho, M. Ganji,
L. Rashidi (Eds.), Advances in Knowledge Discovery and Data Mining, Springer
International Publishing, Cham, 2018, pp. 196–208.</p>
      <p>[31] G. Fu, B. Yuan, Q. Duan, X. Yao, Representation learning for heterogeneous information
networks via embedding events, in: International Conference on Neural Information
Processing, Springer, 2019, pp. 327–339.</p>
      <p>[32] W. Du, S. Yu, M. Yang, Q. Qu, J. Zhu, GPSP: Graph partition and space projection based
approach for heterogeneous network embedding, in: WWW (Companion Volume), 2018,
pp. 59–60.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rojas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Munoz-Gama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sepúlveda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Capurro</surname>
          </string-name>
          ,
          <article-title>Process mining in healthcare: A literature review</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>61</volume>
          (
          <year>2016</year>
          )
          <fpage>224</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Haraty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dimishkieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Masud</surname>
          </string-name>
          ,
          <article-title>An enhanced k-means clustering algorithm for pattern discovery in healthcare data</article-title>
          ,
          <source>Int. J. Distributed Sens. Networks</source>
          <volume>11</volume>
          (
          <year>2015</year>
          )
          <fpage>615740:1</fpage>
          -
          <lpage>615740:11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kianifard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Obi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arcona</surname>
          </string-name>
          ,
          <article-title>Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis</article-title>
          ,
          <source>BMC Nephrology</source>
          <volume>17</volume>
          (
          <year>2016</year>
          )
          <elocation-id>25</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1186/s12882-016-0238-2</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rosvall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Delvenne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Schaub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lambiotte</surname>
          </string-name>
          ,
          <article-title>Different approaches to community detection</article-title>
          ,
          <source>CoRR</source>
          abs/1712.06468 (
          <year>2017</year>
          ). arXiv:1712.06468.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V. D.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Guillaume</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lambiotte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lefebvre</surname>
          </string-name>
          ,
          <article-title>Fast unfolding of communities in large networks</article-title>
          ,
          <source>Journal of Statistical Mechanics: Theory and Experiment</source>
          <volume>2008</volume>
          (
          <year>2008</year>
          )
          <elocation-id>P10008</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , Process Mining: Discovery, Conformance and Enhancement of Business Processes, 1st ed., Springer Publishing Company, Incorporated,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rabbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lamo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>MacCaull</surname>
          </string-name>
          ,
          <article-title>A model based slicing technique for process mining healthcare information</article-title>
          , in:
          <string-name>
            <given-names>Ö.</given-names>
            <surname>Babur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Denil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vogel-Heuser</surname>
          </string-name>
          (Eds.),
          <source>Systems Modelling and Management - First International Conference, ICSMM</source>
          <year>2020</year>
          , Bergen, Norway, June 25-26,
          <year>2020</year>
          , Proceedings, volume
          <volume>1262</volume>
          of Communications in Computer and Information Science, Springer,
          <year>2020</year>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <collab>World Health Organization</collab>
          , ICD-10:
          <article-title>international statistical classification of diseases and related</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>