<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Analysis of longitudinal data of patients with dementia through unsupervised techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Patrizia Ribino</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudia Di Napoli</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Paragliola</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Serino</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesca Gasparini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Chicco</string-name>
          <email>davidechicco@davidechicco.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Health Policy Management and Evaluation, University of Toronto</institution>
          ,
          <addr-line>Toronto, Ontario</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Istituto di Calcolo e Reti ad Alte prestazioni, Consiglio Nazionale delle Ricerche (CNR)</institution>
          ,
          <addr-line>Naples</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Istituto di Calcolo e Reti ad Alte prestazioni, Consiglio Nazionale delle Ricerche (CNR)</institution>
          ,
          <addr-line>Palermo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>NeuroMI, Milan Center for Neuroscience, Università di Milano-Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Dementia is a set of mental diseases affecting millions of people worldwide. As with other mental health issues, it is often difficult to forecast the trend of the disease for patients suffering from it. In this context, data of patients with mental health issues are usually collected through questionnaires and psychological and cognitive tests over several time points. Such longitudinal data can help identify disease trajectories and allow medical doctors to plan specific treatments. In this study, we analyze an open, unrestricted dataset of electronic health records (EHRs) of patients suffering from dementia, called OASIS-2, through several unsupervised machine learning methods (k-means, Hierarchical Clustering, Gaussian Mixture Model, and Spectral Clustering). This dataset contains demographic data and psychological test data collected over five independent visits, with 142 patients at the first visit and ten features. Our goal is to identify patients' clusters that stay stable over the first four visits (we discarded the data of the fifth visit because of its small size) and to characterize these clusters by studying their variables. We also measure the performance of the clustering methods through conventional metrics for internal and external validation. Our preliminary results show that unsupervised techniques can identify significant clusters of patients with mental health issues in this dataset and that Hierarchical Clustering outperforms the other algorithms to this end.</p>
      </abstract>
      <kwd-group>
        <kwd>dementia</kwd>
        <kwd>mental health</kwd>
        <kwd>clustering</kwd>
        <kwd>unsupervised machine learning</kwd>
        <kwd>electronic health records</kwd>
        <kwd>older adults</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Dementia is the generic name of a set of health issues regarding patients' mental states,
including Alzheimer's disease, Parkinson's disease, dementia with Lewy bodies and others.
Unlike patients with other diseases, such as breast cancer, patients with dementia cannot undergo a
surgical operation, and therefore it is difficult to treat and even to recognize patients suffering
from this disease.</p>
      <p>
        Dementia can be diagnosed through computed tomography (CT) or magnetic
resonance imaging (MRI) scans, but these techniques require expensive machines
that are sometimes unavailable in hospitals. For this reason, dementia is often
diagnosed through cognitive tests: patients are asked to answer questions and complete tests in
written, oral, or computerized form, and their answers are recorded to generate a cognitive test
score. The most common cognitive tests are the Clinical Dementia Rating (CDR) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and the
Mini Mental State Examination (MMSE) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Results of these cognitive tests taken only once do not say much about the mental condition
of the patient. To obtain a reliable diagnosis, it is therefore necessary to ask the patient to undergo
these cognitive exams multiple times, at different time points (for example, once every 90 days,
one year, or two years). This way, the cognitive decline and the patient’s diagnosis can be
understood more clearly.</p>
      <p>Longitudinal data collected through these cognitive tests can be included in electronic health
records (EHRs), which in turn can be used for scientific analyses. Unsupervised machine
learning approaches can help identify clusters of patients having a similar trend over time.</p>
      <p>In this study, we analyzed a public dataset derived from EHRs called OASIS-2, which contains
data on patients suffering from dementia collected at 5 visits. We applied a set of clustering
techniques (k-means, Hierarchical Clustering, Gaussian Mixture Models, and Spectral Clustering)
to identify groups of patients following the same trends over time. We then investigated
the features of these clusters and identified some significant clusters where patients share
common traits. Our results show that unsupervised methods can be effective in identifying
meaningful groups made of patients suffering from dementia and of healthy individuals, and
can have a strong impact on medical practice: once our results are confirmed and validated
on another dataset, physicians will be able to use our findings to associate a new patient with
one of our clusters, after collecting their cognitive tests’ results. Medical doctors will be able to
employ this information to design a better treatment.</p>
      <p>
        Predicting dementia trends is useful for medical and economic reasons: patients suffering
from dementia need constant assistance through home care facilities or hospitals, which
can become expensive. Understanding the semantics of relevant patients’ clusters is especially
valuable in an ageing society. We also designed this study
to investigate which clinical factors can better characterize each cluster, towards the framework
of the “minimal electronic health record” [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ].
      </p>
      <p>
        Literature review. The analysis of electronic health records (EHRs) represents an unprecedented
source of information for health-related applications, ranging from epidemiological monitoring
of population diseases to treatment improvement and clinical research [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These new data
sources may offer new insights into the underlying heterogeneity of dementia, of which
Alzheimer’s disease is one of the main causes. Nevertheless, building accurate analytic models from EHR data is
challenging due to data quality, data and label availability, and heterogeneity of data types.
      </p>
      <p>
        Traditional health analytics modelling relies on expert-defined phenotyping and ad-hoc
feature engineering, leading to models with limited generalizability across different datasets.
Machine learning shifted the modelling paradigm from expert-driven feature engineering to
data-driven feature construction [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Several works propose supervised learning methods to
discover interconnections between diseases, predict the health status of patients, and prevent
diseases. Considering the lack of standardized instruments for detecting dementia, it is crucial
to develop methods to predict in advance the personalized risk of dementia to prevent it.
      </p>
      <p>
        Deep Feedforward Networks, namely two Multi-Layer Perceptron models (MLP1 and MLP2),
and a Convolutional Bidirectional Long Short-Term Memory network were compared in predicting the
risk of dementia using the Alzheimer’s Disease Neuroimaging Initiative data [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The proposed
models identify different patterns for the Dementia, Mild Cognitive Impairment, and Cognitive
Normal classes, but substantial preprocessing of the available data was required.
      </p>
      <p>In [9], the authors analyse the Alzheimer’s Disease Neuroimaging Initiative (ADNI)
dataset with supervised machine learning techniques to predict future disease states, considering
only data from non-invasive measurements derived from blood tests.</p>
      <p>In [10], logistic regression (LR), Least Absolute Shrinkage and Selection Operator (LASSO),
random forest (RF), and eXtreme Gradient Boosting (XGBoost) algorithms were used to identify
probable AD and related dementia subphenotypes using routinely collected data from EHRs.</p>
      <p>Other studies propose unsupervised learning methods to identify new patterns in unlabelled
data that can be used to predict the possible evolution of a disease. One of the most
important unsupervised learning techniques is clustering, which can help discover patterns
and structures in labelled and unlabelled datasets, allowing dementia patients to be divided
into subtypes based on key features recorded in the EHR. Moreover, clustering algorithms can
find patterns that are difficult to detect even for specialized medical doctors. The authors of [11] propose the
Poisson Dirichlet Model (PDM), an unsupervised generative probabilistic model based on the
Latent Dirichlet Allocation (LDA) to discover latent disease clusters and to stratify patients
into subgroups with similar characteristics and risk factors. The proposed method identifies
latent comorbidities that provide additional information on the risk factors of developing the
disease other than those correlated to age and sex. Different clustering algorithms (k-means,
kernel k-means, affinity propagation, and latent class analysis) were employed in [12] to identify
subtypes of Alzheimer’s disease from EHRs. Different clusters were found with each clustering
method, and one particular cluster emerged from three of the four adopted clustering methods,
suggesting the plausible presence of a specific disease subtype.</p>
      <p>A multi-layer clustering algorithm was proposed in [13] to construct clusters of late MCI
subjects. Clusters of slowly and rapidly declining subjects within the category of late MCI
were identified, showing pathological differences that suggest the need to subclassify late MCI
subjects further.</p>
      <p>A two-stage clustering analysis [14] was applied to individuals with or without dementia
to identify subsets on which profiling is carried out according to features reported in the
Alzheimer’s features dataset. The analysis revealed distinct patterns in the characteristics of
dementia patients within the different clusters regarding their sex, socio-economic status, age,
education level, and cognitive status.</p>
      <p>Most studies differ in the dimensions used by the clustering algorithms, the dataset used,
and the variables and groups included in the datasets [15]. In addition, these studies based on
EHRs obtained with cognitive tests and brain scans do not consider the progression of dementia.
Hence, a longitudinal dimension to clustering is necessary to improve the identification of risk
factors for the future prognosis of the disease.</p>
      <p>We organize the rest of the article as follows. After this Introduction, we describe the analyzed
dataset in section 2 and the unsupervised machine learning methods we use in section 3.
Afterwards, we report our results in subsection 3.3, and discuss them by outlining some
conclusions in section 4.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Dataset</title>
      <p>
        The dataset used in this paper was derived from the Open Access Series of Imaging Studies
(OASIS 2) with longitudinal MRI data [16, 17], and publicly released online [18, 19]. The people in
the OASIS 2 study were chosen from a group of individuals who had taken part in magnetic
resonance imaging (MRI) longitudinal studies at the Washington University Alzheimer’s Disease
Research Center (ADRC). The selection required at least
two separate visits at which both clinical and MRI data were gathered. The project collected data
from MRI scans of 150 people between the ages of 60 and 96 years over a follow-up period.
Together with these magnetic resonance images, the curators of the dataset collected data on
the psychological tests taken by the patients and on their socio-economic status. The mean
follow-up time for the cohort is 2.91 (±0.01) years. The sample size decreases markedly as
the number of visits increases. For this study, the single response variable was the Clinical Dementia
Rating (CDR), developed to measure dementia severity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A CDR value of 0 indicates no
dementia, and CDRs of 0.5, 1, 2, and 3 represent very mild, mild, moderate, and severe dementia,
respectively. The clinical diagnosis of AD was determined as CDR ≥ 0.5 [16].
      </p>
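      <p>The CDR scale described above can be written as a small lookup table. The following Python sketch is illustrative only: the names CDR_SEVERITY, cdr_severity, and is_impaired are hypothetical and are not part of the OASIS 2 release or of the scripts used in this study.</p>

```python
# Illustrative mapping of Clinical Dementia Rating (CDR) values to severity labels.
CDR_SEVERITY = {
    0.0: "no dementia",
    0.5: "very mild dementia",
    1.0: "mild dementia",
    2.0: "moderate dementia",
    3.0: "severe dementia",
}


def cdr_severity(cdr):
    """Return the severity label for a CDR value (raises KeyError if invalid)."""
    return CDR_SEVERITY[float(cdr)]


def is_impaired(cdr):
    """Apply the CDR >= 0.5 threshold described in the text."""
    return float(cdr) >= 0.5
```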
      <p>
        Moreover, the OASIS 2 dataset provides several sociodemographic variables for each patient:
Sex (Female, Male), Age within the range of [60, 96] years
at the first visit, Education within the range of [6, 23] years, and Socioeconomic Status (SES) (1 =
lower, 2 = lower middle, 3 = middle, 4 = upper middle, 5 = upper). Clinical predictor
variables were also available. These variables include the Mini-Mental State Exam (MMSE), the
Atlas scaling factor (ASF), the estimated total intracranial volume (eTIV), and the normalized
whole brain volume (nWBV).
      </p>
      <p>The MMSE is a 30-point questionnaire with 30 questions covering arithmetic, memory, and
orientation, used to examine the cognitive status of individuals. The Estimated Total Intracranial
Volume (eTIV) estimates intracranial brain volume. The normalized Whole Brain Volume
(nWBV) measures the volume of the whole brain. Finally, the ASF is a one-parameter scaling
factor that allows for comparison of the estimated total intracranial volume (eTIV) based on
differences in human anatomy. The explanation of the features of this dataset can be found in
Table 3 of [20].</p>
      <p>For the analysis conducted in this work, we considered the samples gathered during the first
four visits, and then reported the results for the first three, since they were the most meaningful.
Moreover, patients with missing values for any variable were excluded from the analysis,
which reduced the number of patients analysed in this work to 142. In particular,
the sample at the time of the first visit comprises 86 healthy subjects and 56 cognitively
impaired patients. In this cohort, there were 84 (59%) women and 58 (41%) men.
The subjects were input to the clustering without considering the subject-ID, MRI-ID, and visit
age</p>
    </sec>
    <sec id="sec-4">
      <title>3. Methods</title>
      <sec id="sec-4-1">
        <title>3.1. Our approach</title>
        <p>We downloaded the dataset from Mendeley Data [19] as a comma-separated values (CSV) file
of 27.8 kilobytes (kB). We split the dataset into 5 subsets, each corresponding to a single visit.
Since the fifth visit subset contains data from only 6 patients, we discarded it: we used a
minimum of 10 subjects per visit as a heuristic threshold for our analysis. We removed the
MR Delay variable because it has identical values for the first visit.</p>
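        <p>The per-visit split and the minimum-size threshold can be sketched as follows. This is a minimal standard-library illustration, not the script used in the study; the column names "Visit", "MR Delay", and "Subject ID" and the function split_by_visit are assumptions made for the example, and the records are toy data.</p>

```python
from collections import defaultdict

MIN_SUBJECTS_PER_VISIT = 10  # heuristic threshold stated in the text


def split_by_visit(rows, visit_key="Visit", drop_keys=("MR Delay",)):
    """Group records by visit, drop unwanted variables, discard small visits."""
    subsets = defaultdict(list)
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in drop_keys}
        subsets[row[visit_key]].append(kept)
    return {
        visit: records
        for visit, records in sorted(subsets.items())
        if len(records) >= MIN_SUBJECTS_PER_VISIT
    }


# Toy records (hypothetical): 12 subjects at visit 1, 3 subjects at visit 2.
toy_rows = [{"Subject ID": i, "Visit": 1, "MR Delay": 0, "Age": 70 + i} for i in range(12)]
toy_rows += [{"Subject ID": i, "Visit": 2, "MR Delay": 500, "Age": 72 + i} for i in range(3)]
visit_subsets = split_by_visit(toy_rows)
```

        <p>In this toy input, the second visit falls below the threshold and is discarded, in the same way as the fifth visit of the real dataset.</p>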
        <p>We also removed the diagnosis variables CDR and dementia group to utilize them as validation
targets. After clustering patients’ data without these two variables, we study how the clusters
are characterized with respect to these two factors. For each cluster, we study the average CDR
value and the distribution of the dementia group among the patients.</p>
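        <p>A minimal sketch of this validation step is given below, assuming that cluster labels have already been obtained from a model fitted without the CDR and dementia group variables. The function summarize_clusters, the group names, and the toy values are hypothetical and serve only to illustrate the per-cluster summary.</p>

```python
from collections import Counter, defaultdict


def summarize_clusters(labels, cdr_values, groups):
    """Per cluster: size, mean CDR, and counts of each dementia group."""
    cdr_by_cluster = defaultdict(list)
    groups_by_cluster = defaultdict(Counter)
    for label, cdr, group in zip(labels, cdr_values, groups):
        cdr_by_cluster[label].append(cdr)
        groups_by_cluster[label][group] += 1
    return {
        label: {
            "n": len(values),
            "mean_cdr": sum(values) / len(values),
            "groups": dict(groups_by_cluster[label]),
        }
        for label, values in cdr_by_cluster.items()
    }


# Toy example: the labels stand in for the output of a clustering model
# that never saw the CDR or group columns.
labels = [1, 1, 1, 2, 2, 2]
cdr = [0.5, 1.0, 0.0, 0.0, 0.0, 0.5]
group = ["Demented", "Demented", "Nondemented", "Nondemented", "Nondemented", "Converted"]
summary = summarize_clusters(labels, cdr, group)
```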
        <p>We then applied the k-means and Hierarchical Clustering methods to the first visit subset
to optimize the number of clusters; the Silhouette score indicated k = 3 in both cases.
We therefore decided to use k = 3 clusters for all the visits’ subsets and
algorithms to keep our analyses consistent.</p>
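        <p>Selecting the number of clusters through the Silhouette score can be illustrated with a small standard-library sketch. It assumes Euclidean distance and uses hand-written candidate labelings on toy points; in practice the labelings would come from fitting a clustering model for each candidate k, and scikit-learn exposes an equivalent metric as sklearn.metrics.silhouette_score. The code is an illustration, not the implementation used in this study.</p>

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs 2+ clusters)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy points forming three well-separated pairs (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (20.0, 0.0), (20.0, 1.0)]
candidates = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
best_k = max(candidates, key=lambda k: silhouette_score(points, candidates[k]))
```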
        <p>Our goal is to find clusters of patients that are stable both through the four visits and through
the clustering techniques and to analyze the characteristics and traits of these clusters based on
the dataset variables (Table 1).</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Clustering techniques</title>
        <p>For the aims of this paper, four types of clustering algorithms have been adopted: k-means [21],
Gaussian Mixture Models [22], Spectral Clustering [23], and Hierarchical Clustering [24]. We
selected the most popular clustering techniques for four different clustering approaches [25]:
partition-based (k-means), based on hierarchy (hierarchical clustering), based on algebraic graph
theory (spectral clustering), and based on Gaussian distribution (Gaussian mixture models with
posterior probability calculated through expectation-maximization).</p>
        <p>We implemented our scripts by using the Python and R open source programming languages,
employing the scikit-learn software library of Python and the NbClust, sClust, FCPS, and
table1 packages in R.</p>
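        <table-wrap>
          <table>
            <thead>
              <tr>
                <th>Clustering method</th>
                <th>Silhouette Score [−1, +1]</th>
                <th>Davies-Bouldin Score [0, +∞]</th>
                <th>Calinski-Harabasz Score [0, +∞]</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>k-means</td>
                <td>0.189937</td>
                <td>1.795906</td>
                <td>33.976650</td>
              </tr>
              <tr>
                <td>Spectral Clustering</td>
                <td>0.180429</td>
                <td>1.539607</td>
                <td>21.358736</td>
              </tr>
              <tr>
                <td>Gaussian mixture</td>
                <td>0.187361</td>
                <td>2.041673</td>
                <td>28.684796</td>
              </tr>
              <tr>
                <td>Hierarchical Clustering</td>
                <td>0.202786</td>
                <td>1.726597</td>
                <td>24.466958</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>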
        <p>Table 3 and Table 2 show a comparative analysis of the results obtained by the different
clustering algorithms employed from a purely analytical point of view. It is worth noting that
both tables refer specifically to patients at their first visit, as this visit includes a larger number
of patients than subsequent visits.</p>
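        <p>As an illustration of one of the internal validation metrics, the Davies-Bouldin score can be computed as the average, over clusters, of the worst ratio between within-cluster scatter and between-centroid distance. The sketch below is a plain-Python version under the assumption of Euclidean distance, run on toy points; scikit-learn provides the same metric as sklearn.metrics.davies_bouldin_score. It is not the code used to produce the reported values.</p>

```python
import math


def davies_bouldin_score(points, labels):
    """Davies-Bouldin index: lower values indicate better-separated clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in clusters.items()
    }
    # scatter: mean distance of each cluster's points to its centroid
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst_ratios) / len(worst_ratios)


# Toy points (hypothetical): two tight pairs far apart from each other.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_split = davies_bouldin_score(toy_points, [0, 0, 1, 1])
bad_split = davies_bouldin_score(toy_points, [0, 1, 0, 1])
```

        <p>Because lower values are better for this metric, the labeling that keeps each pair together obtains the smaller score in the toy example.</p>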
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Results</title>
        <p>The following section provides a comprehensive analysis of the clustering results achieved by
each adopted algorithm. The assessment of the adopted methods in clustering subjects relies
on applying the CDR and the dementia group as validation metrics.</p>
        <p>Gaussian Mixture Model Clustering. Figure S1 reports the statistical results concerning
the patients across the three separate clusters determined by applying the Gaussian Mixture
model for each visit throughout the longitudinal study. Looking at the clustering results
obtained on patients examined during the first visit, a cohort of 37
individuals has been assigned to Cluster 1. This group exhibits an average age of 79.38 years,
an average educational status of 12.38 years, and tends to possess a moderate socio-economic
status (3.27) on average. Specifically, the patients grouped within Cluster 1 exhibit the highest
average age, the most elevated socio-economic status, and the lowest education level compared
to patients belonging to Cluster 2 and Cluster 3. Moreover, Cluster 1 exhibits the lowest MMSE
score (24.7) among the three clusters, along with the lowest average value of nWBV.
Approximately 70% of the individuals in this cluster have been identified as
having mild or moderate cognitive decline, as evidenced by a CDR score greater than
zero, and they have been labelled as Demented (Figure S1). Notably, a higher prevalence of
dementia is observed among males than their female counterparts.</p>
        <p>In contrast, Cluster 3 consists of a cohort of 79 individuals, with a notable predominance of
females. This cluster has, on average, the lowest age (73.94), the lowest eTIV (1393.34),
the highest MMSE (28.86), and the highest nWBV (0.75). The subjects within
this cluster exhibit a higher education level than individuals in Cluster 1, notwithstanding a
marginally lower socio-economic status. Furthermore, a considerable
majority (74.7%) of patients in this cluster were identified as exhibiting normal
cognitive functioning at their first visit, as indicated by a Clinical Dementia Rating score
of 0. Most of these non-demented patients are female. However, in this cohort of healthy patients,
18.6% (n=11) of individuals experienced conversion throughout the study period, with a marked
predominance of female patients (Table S1).</p>
        <p>Finally, Cluster 2 comprises 26 individuals, mainly males. The average age of this cluster
aligns closely with that of Cluster 3 (i.e., 74.38 compared to 73.94). Cluster 2 presents the highest
education level and eTIV but the lowest socio-economic status. The level of MMSE is similar
to Cluster 3. The distribution of healthy and demented subjects is very similar. All patients
diagnosed with dementia and those who have undergone a conversion are exclusively male.</p>
        <p>Regarding the outcomes of the second visit, it is evident that Cluster 3 exhibits analogous
characteristics to those observed in Cluster 1 pertaining to the first visit. This outcome can
be ascribed to the observation that a substantial majority (97%) of patients from the first visit,
who were initially categorized as Cluster 1, have been assigned to Cluster 3 of the second visit.
Furthermore, there has been an increase in the average age of the patients since they attended
their second visit at least two years after their initial one.</p>
        <p>Similarly, Cluster 1 observed for the second visit exhibits comparable characteristics to Cluster
3 associated with the first visit. Cluster 1 comprises a significant majority of subjects, specifically
85%, who are grouped within Cluster 3 of the first visit. Additionally, it was observed that 75.8%
of the patients grouped under Cluster 2 of the second visit were also grouped under Cluster 2 of
the first visit. Similar grouping characteristics are evident for the third visit.</p>
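        <p>The correspondence between clusters of consecutive visits can be quantified by counting, for each first-visit cluster, the fraction of its patients that falls into each second-visit cluster. The following sketch assumes that each visit's result is available as a mapping from a patient identifier to a cluster label; the function cluster_transitions and the toy identifiers are hypothetical. Fractions are computed relative to the first-visit cluster, restricted to patients present at both visits.</p>

```python
from collections import Counter


def cluster_transitions(first_visit, second_visit):
    """Fraction of each first-visit cluster that lands in each second-visit cluster.

    Both arguments map a patient identifier to a cluster label; only patients
    present at both visits are counted.
    """
    shared = set(first_visit).intersection(second_visit)
    pair_counts = Counter((first_visit[p], second_visit[p]) for p in shared)
    origin_sizes = Counter(first_visit[p] for p in shared)
    return {
        pair: count / origin_sizes[pair[0]]
        for pair, count in pair_counts.items()
    }


# Toy example with hypothetical patient identifiers.
visit1 = {"p1": 1, "p2": 1, "p3": 1, "p4": 3, "p5": 3, "p6": 2}
visit2 = {"p1": 3, "p2": 3, "p3": 3, "p4": 1, "p5": 2}
transitions = cluster_transitions(visit1, visit2)
```

        <p>In the toy data, every patient of first-visit Cluster 1 moves to second-visit Cluster 3, which mirrors the kind of label swap described above: the cluster numbering is arbitrary at each visit, so matching clusters must be identified by their members rather than by their labels.</p>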
        <p>A mere fraction of patients enrolled in the longitudinal study were observed to have attended
the fourth scheduled visit. Due to the limited sample size, this grouping is deemed unfit to be
considered representative.</p>
        <p>Ultimately, the findings obtained using the Gaussian Mixture model suggest that the clusters
most representative of individuals with cognitive impairments
throughout the longitudinal study are Visit 1 Cluster 1 (V1C1), Visit 2 Cluster 3 (V2C3), and
Visit 3 Cluster 1 (V3C1). In contrast, the clusters denoted as V1C3, V2C1, and V3C3 have
been identified as the most indicative clusters within the population of healthy subjects.</p>
        <p>k-means Clustering. The statistical data about patients categorized into three distinct clusters
identified by the k-means model across multiple visits during the longitudinal study are
illustrated in Figure S2. Looking at the clustering results for the patients examined during
the first visit, Cluster 1 comprises a total of 53 patients, with 88.68%
women and only about 11% men. The average age of individuals is 71.96 years, with an education level
of 15.26 years and a low-middle socio-economic status (2.17) on average. Cluster 1 effectively
groups the noticeably younger individuals, with the highest MMSE (29.15)
and the highest nWBV (0.76). Furthermore, a
significant proportion of patients within this cluster, specifically 83%, were
diagnosed as being within normal parameters at their first visit. All individuals
who experienced conversion during the subsequent visits are identified as female based on the
Group label in Figure S2.</p>
        <p>Conversely, Cluster 3 comprises 47 individuals, where approximately 68% of the cluster’s
population are men, while women constitute the remaining 32%. Cluster 3 has the highest
average age (77.85), the highest socio-economic status (3.45), and the lowest education level
(12.13) compared to the other clusters of the same visit. Moreover, Cluster 3 shows the
lowest MMSE score (i.e., 26.13) among the three clusters and a lower average value of nWBV
than Cluster 1 but the same nWBV as Cluster 2. Looking at CDR, almost 64%
of this cluster’s patients have been diagnosed with mild or moderate cognitive decline (i.e.,
CDR &gt; 0) and have been labelled as Demented. The majority of patients who do not exhibit
symptoms of dementia are female.</p>
        <p>Finally, Cluster 2 encompasses 42 subjects, of which 88% are male and only 12% are female.
The average age of this cluster closely aligns with that of Cluster 3. On average,
individuals in this cluster possess the highest level of
education (16.69) and the highest eTIV value (1676.33). However, they exhibit the lowest
socio-economic status (1.83). The cognitive performance as measured by the MMSE is marginally
higher than in Cluster 3. The numbers of individuals with well-preserved
cognitive abilities and of those afflicted by dementia are nearly equal.</p>
        <p>The findings from the second visit reveal that Cluster 2 exhibits characteristics analogous
to those observed for Cluster 1 at the first visit: 92% of the patients assigned to Cluster 1
at the first visit are assigned to Cluster 2 at the second visit. Similarly, the statistics of Cluster 3
observed for the second visit are similar to the statistics of Cluster
3 of the first visit. Comparable grouping characteristics are also evident for the third visit.</p>
        <p>Hence, the clusters V1C3, V2C3, and V3C3 can potentially be identified as the most
representative cohorts exhibiting cognitive impairment. In contrast, for individuals in
good health, results suggest that the clusters V1C1, V2C2, and V3C1 are the most
representative.</p>
        <p>Hierarchical Clustering. The table in Figure S3 presents the statistical information about the
patients categorized into three distinct clusters based on the Hierarchical Clustering model, as
observed during each visit within the longitudinal study. An analysis of the clustering outcomes
of the individuals evaluated during their initial visit reveals that Cluster 1 encompasses a cohort
of 60 patients, of which 91.67% are female and about 8% are male. The average age
of individuals is 74.58 years, with an average education level of 15.4 years and a low-middle
socio-economic status (2.12). Cluster 1 groups individuals with the highest MMSE (29.12) as well
as the highest nWBV (0.76). Individuals in Cluster 1 exhibit a higher education
level than individuals within Cluster 3, while their socio-economic status is slightly
lower. Moreover, a substantial majority of individuals (86.7%)
within this cluster showed a healthy diagnosis during their initial visit, as evidenced by a CDR
score of 0. Most individuals diagnosed as non-demented within the cluster are female.
All converted individuals are female (group label in Figure S2).</p>
        <p>In contrast, Cluster 3 comprises 54 individuals, with women accounting for 54% and men
representing 46% of the group. The average age of this cluster is 76.89 years, with an education
level of 12.56 years. Additionally, the average socio-economic status of this cluster is
middle-class, with a value of 3.41. The patients in this cluster have the highest average
age and the lowest education level (12.56) compared to the other clusters. Moreover, Cluster
3 shows the lowest MMSE score (i.e., 26.04) and the lowest nWBV value (0.72) among the
three clusters. An analysis of the CDR reveals that 66.7% of the patients within this
cluster exhibit mild to moderate cognitive decline, as indicated by CDR &gt; 0. The
Group label in Figure S2 indicates these individuals as Demented.</p>
        <p>Lastly, Cluster 2 comprises a cohort of 28 male patients, whose
average age is comparable to that of Cluster 1. The members of this cluster exhibit the
highest average level of education and the highest eTIV, whereas their socio-economic
status (1.54) is relatively low. The MMSE level (27.5) is marginally higher than that of
Cluster 3 (26.3). The numbers of individuals classified as healthy and of those
diagnosed with dementia are strikingly similar.</p>
        <p>Turning to the outcomes of the second visit, Cluster 3 exhibits
characteristics analogous to those of Cluster 1 of the first visit. Likewise, Cluster 1 of the
second visit has statistical characteristics analogous to those of the first visit’s
Cluster 3. Comparable grouping characteristics are also observed for the third visit.</p>
        <p>Therefore, results suggest that V1C3, V2C1, and V3C1 may be regarded as the most indicative
clusters of subjects with cognitive impairment. Similarly, V1C1, V2C3, and V3C3 can be
regarded as the most representative clusters of individuals who exhibit sound cognitive function.</p>
        <p>Spectral Clustering. Table S1, Table S2, Table S3, and Table S4 present the statistical
characteristics of the patients assigned to the three distinct clusters identified by the Spectral Clustering
model for every visit conducted in the course of the longitudinal study.</p>
        <p>The clustering results for the patients who undertook the first visit show
that Cluster 1 includes a total of 83 female patients. The
average age of individuals in this cluster is 75.7 years, with an average education level of 14.5
years and a middle socio-economic status (2.53). Cluster 1 encompasses females with the highest
MMSE (28.2), as well as the highest nWBV (0.744). Moreover, 72.7% of patients in this cluster
were diagnosed as healthy at the time of the first visit (CDR=0). Among them, 12% converted
during the longitudinal study.</p>
        <p>On the other hand, Cluster 2 consists of a cohort of 30 individuals, primarily
males, with an average age of 75.9 years. Their average level of education is
12.6 years, alongside a moderate socio-economic status of 3.33. Notably, the patients in this
cluster have the highest average age and the lowest education level compared to the
other clusters. Moreover, Cluster 2 shows the lowest MMSE score (i.e., 26) and the lowest
nWBV value (0.724) among the three clusters. According to the Group label, 66.7% of the patients
belonging to this cluster have been diagnosed as Demented.</p>
        <p>Finally, Cluster 3 comprises a cohort of 29 male participants
with an average age comparable to that of Cluster 1. The members of this cluster
exhibit the highest average education level (17.3) along with the highest eTIV (1700) but
the lowest socio-economic status (1.52). The MMSE level (27.6) is marginally higher
than that of Cluster 2. Healthy and demented subjects are nearly equally
represented.</p>
        <p>Regarding the findings of the second visit, it emerges that Cluster 1 exhibits analogous
attributes compared to Cluster 2 of the first visit. Similarly, Cluster 3 in the second visit exhibits
comparable statistical patterns to those observed in Cluster 1 of the first visit. Similar grouping
characteristics are presented for the third visit.</p>
        <p>Thus, V1C2, V2C1, and V3C1 can be regarded as the most indicative clusters of cognitively
impaired individuals, while V1C1, V2C3, and V3C3 can be regarded as the most representative
clusters of healthy participants.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Discussion &amp; Conclusions</title>
      <p>In this work, we examined a publicly available dataset, namely OASIS-2,
which encompasses information about individuals diagnosed with dementia, collected over
the course of five distinct visits. The primary aim of our study was to utilize a set of different
clustering techniques (k-means, Hierarchical Clustering, Gaussian mixture models, and Spectral
Clustering) for discerning patient cohorts exhibiting similar patterns over time.</p>
      <p>To compare the effectiveness of the clustering methodologies,
we analysed the clusters derived from the distinct clustering algorithms
employed, with a specific focus on the first visit. In particular, by referring to the data presented
in Table 1, we can see that out of the total population of 142 patients, 86 individuals have no
cognitive impairment, and 56 patients have been diagnosed with mild or moderate cognitive
impairment. Clustering results show that the k-means algorithm successfully groups together
52.3% (45 out of 86) of healthy patients, while the hierarchical algorithm achieves a higher
percentage of 61.6% (53 out of 86).</p>
      <p>Notably, the number of healthy individuals at the initial visit is determined by summing
the number of non-demented individuals and converted patients. Converted patients refer to
individuals who were identified as healthy during the initial visit but later experienced a change
in their health status over the follow-up period.</p>
      <p>Moreover, the Gaussian model and the Spectral Clustering algorithm each group together
69.8% (60 out of 86) of healthy patients.
As concerns the cognitively impaired individuals, the k-means algorithm successfully groups
together 53.6% (30 out of 56), the hierarchical algorithm achieves 64.3% (36 out of 56), the
Gaussian model 46.4% (26 out of 56) and the Spectral Clustering 35.7% (20 out of 56). As we can
see, the Spectral Clustering and the Gaussian model perform better in clustering healthy patients.
On the other hand, Spectral Clustering is the worst in clustering patients with dementia, while
the hierarchical model is the best. We can note that the hierarchical model reaches comparable
efficacy in clustering healthy and demented patients.</p>
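      <p>The percentages reported for the first visit can be recomputed from the counts given in the text. The following is a minimal Python sketch of that arithmetic; the names TOTAL_HEALTHY, TOTAL_IMPAIRED, captured and capture_rates are illustrative assumptions and are not part of the original analysis code.</p>
      <preformat>
```python
# Counts for the first visit, as reported in the Discussion text.
TOTAL_HEALTHY = 86    # non-demented plus converted patients
TOTAL_IMPAIRED = 56   # patients with mild or moderate impairment

# algorithm: (healthy patients grouped in its predominantly healthy cluster,
#             impaired patients grouped in its predominantly impaired cluster)
captured = {
    "k-means": (45, 30),
    "hierarchical": (53, 36),
    "Gaussian mixture": (60, 26),
    "spectral": (60, 20),
}


def capture_rates(counts):
    """Return, per algorithm, the percentage of healthy and impaired patients captured."""
    rates = {}
    for name, (healthy, impaired) in counts.items():
        rates[name] = (
            round(100 * healthy / TOTAL_HEALTHY, 1),
            round(100 * impaired / TOTAL_IMPAIRED, 1),
        )
    return rates


rates = capture_rates(captured)
for name, (healthy_pct, impaired_pct) in rates.items():
    print(f"{name}: healthy {healthy_pct}%, impaired {impaired_pct}%")
```
      </preformat>
      <p>Keeping both percentages side by side makes the trade-off visible: a method can group most healthy patients together while still splitting the impaired ones across clusters.</p>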
      <p>Furthermore, we analysed the clusters from a different perspective, by looking at their cardinality. The
k-means algorithm identifies two distinct clusters: Cluster 1, in which
84.9% (45 out of 53) of individuals are diagnosed as healthy, and Cluster 3, in which
63.8% (30 out of 47) of individuals are diagnosed as cognitively impaired. The Gaussian model
reveals the presence of Cluster 3, in which 75.9% (60 out of 79) of individuals are diagnosed as
healthy, as well as Cluster 1, in which 70.3% (26 out of 37) of individuals are characterized as
cognitively impaired. The hierarchical model also yields two particular
clusters: Cluster 1, in which 88.3% (53 out of 60) of individuals are healthy,
and Cluster 3, in which 66.7% (36 out of 54) of individuals are cognitively impaired.</p>
      <p>The Spectral Clustering technique identifies two distinct clusters: Cluster 1, in which
approximately 72.3% (60 out of 83) of individuals are classified as healthy, and Cluster 2, in which
approximately 66.7% (20 out of 30) of individuals are diagnosed with cognitive impairment.</p>
      <p>We can also note in this case that the hierarchical model reaches better results in finding
more homogeneous clusters of healthy and demented patients.</p>
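      <p>The homogeneity comparison can be expressed as a per-cluster purity, that is, the share of a cluster's members belonging to its dominant diagnostic group. The sketch below recomputes it from the counts stated in the text. Averaging the two purities per algorithm is only an illustrative summary chosen here, not a measure used in the study, and all identifiers are hypothetical.</p>
      <preformat>
```python
# (members of the dominant diagnostic group, cluster size) for the first visit,
# taken from the counts reported in the Discussion text.
clusters = {
    ("k-means", "C1 healthy"): (45, 53),
    ("k-means", "C3 impaired"): (30, 47),
    ("Gaussian mixture", "C3 healthy"): (60, 79),
    ("Gaussian mixture", "C1 impaired"): (26, 37),
    ("hierarchical", "C1 healthy"): (53, 60),
    ("hierarchical", "C3 impaired"): (36, 54),
    ("spectral", "C1 healthy"): (60, 83),
    ("spectral", "C2 impaired"): (20, 30),
}


def purity(dominant, size):
    """Percentage of cluster members that belong to the dominant group."""
    return 100 * dominant / size


purities = {key: purity(*counts) for key, counts in clusters.items()}

# Illustrative summary (an assumption of this sketch): mean purity per algorithm.
grouped = {}
for (algorithm, _cluster), value in purities.items():
    grouped.setdefault(algorithm, []).append(value)
mean_purity = {name: sum(values) / len(values) for name, values in grouped.items()}
best = max(mean_purity, key=mean_purity.get)

for (algorithm, cluster), value in purities.items():
    print(f"{algorithm} {cluster}: {value:.1f}%")
print("highest mean purity:", best)
```
      </preformat>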
      <p>Finally, we conducted an analysis to investigate the features of these clusters and identified
some significant clusters where patients share common traits. In particular, we note that although
the cardinality of clusters differs among algorithms, the features of the clusters are very
similar. As we can see above, k-means’s V1C1, Gaussian mixture’s V1C3, Hierarchical’s V1C1
and Spectral Clustering’s V1C1 could be considered as the most representative clusters of
healthy subjects with respect to the four algorithms.</p>
      <p>In each of these clusters, individuals who demonstrate an MMSE score higher than 28 and
nWBV greater than 0.744, along with a moderate level of education (roughly 15 years), might
have a lower likelihood of being diagnosed with dementia, regardless of their socio-economic
status falling within the lower-middle level. Note: Here we refer to likelihood and risk in
sociological terms, not in statistical terms.</p>
      <p>Conversely, as we can see from k-means’s V1C3, Gaussian mixture’s V1C1, Hierarchical’s
V1C3, and Spectral Clustering’s V1C2, individuals characterized by an MMSE lower than 26 and
nWBV lower than 0.72 with lower-middle educational level (around 12 years) have a higher risk of
being diagnosed as demented although they show a middle socio-economic status (&gt; 3).</p>
      <p>It is also worth noting that, differently from other techniques, the clusters determined by the
Spectral Clustering are differentiated mostly based on sex.</p>
      <p>In summary, this study employed various clustering methodologies to categorize cohorts
of patients exhibiting comparable characteristics, utilizing data collected from a longitudinal
investigation. The clustering analysis results demonstrated the presence of discernible patterns
within the distinct clusters obtained for each employed technique. Each clustering algorithm that
has been adopted has successfully identified the same patterns, albeit with varying performance
levels. Finally, we derived initial insights from previous findings, which will serve as a foundation
for future investigations on more extensive datasets related to dementia.</p>
      <p>• First of all, although age is not a primary cause of dementia, the risk of developing
dementia increases with ageing.
• Female subjects with higher values of nWBV and higher values of MMSE present a lower
risk of having dementia diagnosis.
• Conversely, subjects (both male and female) with lower nWBV values and a lower MMSE
and education level are more likely to be diagnosed as demented.
• Lower value of SES, higher education level, and higher MMSE are not good predictors for
men. The demented and non-demented subjects cannot be effectively distinguished from
these features.
• SES values of cognitively impaired patients are distributed across a higher range value
than SES of normal patients.</p>
      <p>• Further investigation is required to understand the variations between sexes.</p>
      <p>Limitations and future developments. Regarding limitations and aspects to improve, we
have to report that we performed our clustering analysis on a single dataset, with a limited
number of subjects. We therefore cannot claim that our results are general for most of the
patients suffering from dementia: they are limited to the subjects of this dataset. Moreover, we
focused our analysis on the change of the feature values, and not on the moves of the patients
between clusters. In the future, we plan to extend the patient-oriented side of this study, by
considering how many patients move through the clusters for each method. We also plan to
apply our approach on a validation cohort dataset, even if we know that it is difficult to find
a suitable one with a similar set of features. We also foresee using longitudinal clustering
algorithms such as longitudinal k-means (kml [26]) and trajectory analysis (traj [27]), two-step
clustering, and dynamic clustering methods [28]. Moreover, we plan to collaborate with medical
doctors and collect their insights about the medical significance of the clusters. Finally, we plan
to use alternative clustering techniques such as DBSCAN or fuzzy clustering.</p>
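      <p>The planned patient-oriented analysis, counting how many patients move between clusters from one visit to the next, could be sketched as follows. This is a hypothetical illustration only: the patient identifiers, the cluster assignments and the relabelling map are invented toy values, not study data. Because cluster numbers are not aligned across visits (for example, Cluster 3 of the second visit may resemble Cluster 1 of the first), the sketch assumes that a correspondence between labels is supplied.</p>
      <preformat>
```python
from collections import Counter


def count_transitions(labels_a, labels_b, relabel_b):
    """Count (cluster at visit A, matched cluster at visit B) pairs.

    labels_a, labels_b: dicts mapping patient id to cluster label.
    relabel_b: dict translating visit-B labels into visit-A naming.
    Only patients present at both visits are counted.
    """
    shared = sorted(set(labels_a).intersection(labels_b))
    return Counter((labels_a[p], relabel_b[labels_b[p]]) for p in shared)


# Hypothetical toy assignments (patient id: cluster label); not study data.
visit1 = {"P01": 1, "P02": 1, "P03": 3, "P04": 2, "P05": 3}
visit2 = {"P01": 3, "P02": 1, "P03": 1, "P04": 2}  # P05 has no second visit

# Assumed correspondence: visit-2 Cluster 3 matches visit-1 Cluster 1, and so on.
relabel = {1: 3, 2: 2, 3: 1}

transitions = count_transitions(visit1, visit2, relabel)
moved = sum(n for (src, dst), n in transitions.items() if src != dst)

for (src, dst), n in sorted(transitions.items()):
    print(f"visit 1 cluster {src} to matched cluster {dst}: {n} patient(s)")
print("patients who changed cluster:", moved)
```
      </preformat>
      <p>The resulting counts form a transition table per method, which could then be compared across the four clustering algorithms.</p>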
    </sec>
    <sec id="sec-6">
      <title>Additional sections</title>
      <p>Ethics approval and consent to participate. The authorization for collecting the data from
patients and to release them publicly was obtained by the original dataset curators [18, 19].</p>
      <p>Data availability. The analyzed dataset is available openly under the CC BY NC 3.0 licence
on Mendeley Data at https://doi.org/10.17632/tsy6rbc5d4.1</p>
      <p>Acknowledgments. The authors thank Luca Cufaro (Università di Milano-Bicocca) for his help.</p>
      <p>Funding. This study was funded by the European Union – Next Generation EU
programme, in the context of The National Recovery and Resilience Plan, Investment Partenariato
Esteso PE8 “Conseguenze e sfide dell’invecchiamento”, Project Age-It (Ageing Well in an Ageing
Society). This work was also partially supported by Ministero dell’Università e della Ricerca of
Italy under the “Dipartimenti di Eccellenza 2023-2027” ReGAInS grant assigned to Dipartimento
di Informatica Sistemistica e Comunicazione at Università di Milano-Bicocca. The funders had
no role in study design, data collection and analysis, decision to publish, or preparation of the
manuscript.
</p>
      <p>[9] J. F. Beltrán, B. M. Wahba, N. Hose, D. Shasha, R. P. Kline, for the Alzheimer’s Disease
Neuroimaging Initiative, Inexpensive, non-invasive biomarkers predict Alzheimer transition
using machine learning analysis of the Alzheimer’s disease neuroimaging (ADNI) database,
PLOS One 15 (2020) 1–26. doi:10.1371/journal.pone.0235663.
[10] J. Xu, F. Wang, Z. Xu, P. Adekkanattu, P. Brandt, G. Jiang, R. C. Kiefer, Y. Luo, C. Mao,
J. A. Pacheco, L. V. Rasmussen, Y. Zhang, R. Isaacson, J. Pathak, Data-driven discovery of
probable Alzheimer’s disease and related dementia subphenotypes using electronic health
records, Learning Health Systems 4 (2020). doi:10.1002/lrh2.10246.
[11] Y. Wang, Y. Zhao, T. M. Therneau, E. J. Atkinson, A. P. Tafti, N. Zhang, S. Amin, A. H.
Limper, S. Khosla, H. Liu, Unsupervised machine learning for the discovery of latent disease
clusters and patient subgroups using electronic health records, Journal of Biomedical
Informatics 102 (2020) 103364. doi:10.1016/j.jbi.2019.103364.
[12] N. Alexander, D. Alexander, F. Barkhof, S. Denaxas, Identifying and evaluating clinical
subtypes of Alzheimer’s disease in care electronic health records using unsupervised
machine learning, BMC Medical Informatics and Decision Making 21 (2021) 1–13. doi:10.1186/s12911-021-01693-6.
[13] D. Gamberger, N. Lavrač, S. Srivatsa, R. E. Tanzi, P. M. Doraiswamy, Identification of
clusters of rapid and slow decliners among subjects at risk for Alzheimer’s disease, Scientific
Reports 7 (2017) 6763.
[14] G. Turcan, S. Peker, Profiling individuals with dementia using cluster analysis, in:
Proceedings of CISTI 2023 – the 18th Iberian Conference on Information Systems and Technologies,
2023, pp. 1–7. doi:10.23919/CISTI58278.2023.10211598.
[15] H. Alashwal, M. El Halaby, J. J. Crouse, A. Abdalla, A. A. Moustafa, The application of
unsupervised clustering methods to Alzheimer’s disease, Frontiers in Computational
Neuroscience 13 (2019). doi:10.3389/fncom.2019.00031.
[16] D. S. Marcus, A. F. Fotenos, J. G. Csernansky, J. C. Morris, R. L. Buckner, Open access series
of imaging studies: longitudinal MRI data in nondemented and demented older adults,
Journal of Cognitive Neuroscience 22 (2010) 2677–2684.
[17] OASIS, Open Access Series of Imaging Studies (OASIS), URL: https://www.oasis-brains.org/
(URL visited on 20th September 2023), 2023.
[18] G. Battineni, N. Chintalapudi, F. Amenta, Machine learning in medicine: performance
calculation of dementia prediction by support vector machines (SVM), Informatics in
Medicine Unlocked 16 (2019) 100200.
[19] G. Battineni, N. Chintalapudi, F. Amenta, Data for: Machine learning in medicine:
performance calculation of dementia prediction by support vector machines (SVM), Mendeley Data
(2019). doi:10.17632/tsy6rbc5d4.1.
[20] C. Kavitha, V. Mani, S. Srividhya, O. I. Khalaf, C. A. Tavera Romero, Early-stage Alzheimer’s
disease prediction using machine learning models, Frontiers in Public Health 10 (2022)
853294.
[21] A. Likas, N. Vlassis, J. J. Verbeek, The global k-means clustering algorithm, Pattern
Recognition 36 (2003) 451–461.
[22] D. A. Reynolds, Gaussian mixture models., Encyclopedia of Biometrics 741 (2009).
[23] U. Von Luxburg, A tutorial on spectral clustering, Statistics and Computing 17 (2007)
395–416.
[24] F. Nielsen, Hierarchical clustering, Introduction to HPC with MPI for Data Science (2016)
195–211.
[25] D. Xu, Y. Tian, A comprehensive survey of clustering algorithms, Annals of Data Science
2 (2015) 165–193.
[26] C. Genolini, X. Alacoque, M. Sentenac, C. Arnaud, kml and kml3d: R packages to cluster
longitudinal data, Journal of Statistical Software 65 (2015) 1–34.
[27] K. Leffondré, M. Abrahamowicz, A. Regeasse, G. A. Hawker, E. M. Badley, J. McCusker,
E. Belzile, Statistical measures were proposed for identifying longitudinal patterns of
change in quantitative health indicators, Journal of Clinical Epidemiology 57 (2004)
1049–1062.
[28] J. Diaz-Rozo, C. Bielza, P. Larrañaga, Machine-tool condition monitoring with Gaussian
mixture models-based dynamic probabilistic clustering, Engineering Applications of
Artificial Intelligence 89 (2020) 103434.</p>
    </sec>
    <sec id="sec-7">
      <title>S1. Supplementary information</title>
      <p>Figure S2: Clustering results of the k-means model, descriptive tables for each visit. The extracted table fragments list, for each cluster, eTIV (mean, standard deviation, range) and the Group label (Demented, NonDemented, Converted) split by sex (Female, Male); the cell values were not preserved.</p>
      <p>Table S2: Descriptive statistics of the results of Spectral Clustering applied to the visit #2 subset. The extracted fragments list the variables age and sex (woman, man); the cell values were not preserved.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Morris</surname>
          </string-name>
          ,
          <article-title>The Clinical Dementia Rating (CDR): current version and scoring rules</article-title>
          ,
          <source>Neurology</source>
          (
          <year>1993</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Cockrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Folstein</surname>
          </string-name>
          ,
          <article-title>Mini-mental state examination</article-title>
          ,
          <source>Principles and Practice of Geriatric Psychiatry</source>
          (
          <year>2002</year>
          )
          <fpage>140</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Desautels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Calvert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kerem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shimabukuro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Chettipally</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Barton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Wales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach</article-title>
          ,
          <source>JMIR Medical Informatics</source>
          <volume>4</volume>
          (
          <year>2016</year>
          )
          <article-title>e5909</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radhachandran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garikipati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Zelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pellegrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghandian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Calvert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Prediction of short-term mortality in acute heart failure patients using minimal electronic health record data</article-title>
          ,
          <source>BioData Mining</source>
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.-W.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A. A.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-T.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-W.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Poly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C. J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Predicting hepatocellular carcinoma with minimal features from electronic health records: development of a deep learning model</article-title>
          ,
          <source>JMIR Cancer</source>
          <volume>7</volume>
          (
          <year>2021</year>
          )
          <article-title>e19812</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>25</volume>
          (
          <year>2018</year>
          )
          <fpage>1419</fpage>
          -
          <lpage>1428</lpage>
          . doi:10.1093/jamia/ocy068.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ben Miled</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Khandker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chandrasekaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lipton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Boustani</surname>
          </string-name>
          ,
          <article-title>Predicting dementia with routine care EMR data</article-title>
          ,
          <source>Artificial Intelligence in Medicine</source>
          <volume>102</volume>
          (
          <year>2020</year>
          )
          <article-title>101771</article-title>
          . doi:10.1016/j.artmed.2019.101771.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Stamate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tsygancov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vorobev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Langham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Reeves</surname>
          </string-name>
          ,
          <article-title>Applying deep learning to predicting dementia and mild cognitive impairment</article-title>
          ,
          <source>in: Artificial Intelligence Applications and Innovations: Proceedings of AIAI 2020 - the 16th IFIP WG 12.5 International Conference, Neos Marmaras, Greece, June 5-7, 2020, Part II 16</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>308</fpage>
          -
          <lpage>319</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <article-title>Spectral Clustering applied to the 3rd visit subset cluster 1 cluster 2 cluster 3 (N=4) (N=19) (N=32)</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>