<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data-driven vs knowledge-driven inference of health outcomes in the ageing population: a case study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Ferrari</string-name>
          <email>162996@studenti.unimore.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Martoglia</string-name>
          <email>riccardo.martoglia@unimore.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Guaraldi</string-name>
          <email>giovanni.guaraldi@unimore.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jovana Milić</string-name>
          <email>jovana.milic@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Mandreoli</string-name>
          <email>federica.mandreoli@unimore.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Missier</string-name>
          <email>paolo.missier@ncl.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Newcastle University</institution>
          ,
          <addr-line>Newcastle upon Tyne</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Modena and Reggio Emilia</institution>
          ,
          <addr-line>Modena</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Preventive, Predictive, Personalised and Participative (P4) medicine has the potential to not only vastly improve people's quality of life, but also to significantly reduce healthcare costs and improve its efficiency. Our research focuses on age-related diseases and explores the opportunities offered by a data-driven approach to predict wellness states of ageing individuals, in contrast to the commonly adopted knowledge-driven approach that relies on easy-to-interpret metrics manually introduced by clinical experts. This is done by means of machine learning models applied to the My Smart Age with HIV (MySAwH) dataset, which is collected through a relatively new approach especially for older HIV patient cohorts. This includes Patient Related Outcomes values from mobile smartphone apps and activity traces from commercial-grade activity loggers. Our results show better predictive performance for the data-driven approach. We also show that a post hoc interpretation method applied to the predictive models can provide intelligible explanations that enable new forms of personalised and preventive medicine.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>
        Medical practice is evolving rapidly, away from the traditional but
inefficient detect-and-cure approach, and towards a Preventive,
Predictive, Personalised and Participative (P4) vision that focuses
on extending people’s wellness state, with particular focus on
ageing individuals [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. This vision is increasingly data-driven, and
is underpinned by many forms of “Big Health Data” including
periodic clinical assessments and electronic health records, but also
using new forms of self-assessment, such as mobile-based
questionnaires and personal wearable devices. With these premises,
P4 medicine has the potential to not only vastly improve people’s
quality of life, but also to significantly reduce healthcare costs
and improve its efficiency.
      </p>
      <p>
        Our research explores specific opportunities offered by
data-driven approaches to predictive care, in contrast to traditional,
knowledge-driven approaches that rely purely on clinical
expertise. Our focus is on age-related diseases, an emerging issue
for health care systems. The World Health Organisation
estimates that the number of people over 60 years of age will
reach 2 billion by 2050 [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Ageing is associated with increased
prevalence of co-morbidities that accumulate in the complex of
multi-morbidity in older people [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>1.1 Background</title>
      <p>
        We focus on two easy-to-interpret metrics that clinical
researchers have proposed to succinctly express the health status
of patients at a given point in time. The first is a measure of
frailty, designed to quantify the reduction of homeostatic reserves
available to an individual. High frailty is indicative of higher
risk of negative health outcomes, but it is also a potentially
reversible condition [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. In practice, frailty is measured using a
variety of specific Frailty Index metrics (FIs). These are
calculated using at least 30 directly assessed health variables, which
include signs or symptoms, biochemical parameters, various
comorbidities or socio-demographic data [
        <xref ref-type="bibr" rid="ref22 ref6">6, 22</xref>
        ]. The choice of the
specific variables, as well as their sources, may vary depending
on data availability, leading to different specifications for the FI.
Given their complexity and multidimensionality, FIs reflect the
biological age of an individual rather than their chronological age,
making them reliable prognostic tools that can be used in
different settings for clinical decision algorithms [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In particular, the
dataset used in this research concerns a cohort of HIV patients.
This is important, because long-lived HIV patients exhibit a form
of accentuated ageing [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], such that they can successfully be used
to study frailty, i.e., where the duration of the condition (number
of years since infection) is used as a proxy for chronological age.
      </p>
      <p>
        The second measure of health considered in this work directly
reflects the more positive notion of healthy ageing [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] that is
becoming prevalent especially in public health settings. In
contrast to frailty, which is designed to measure decay, the term
healthy ageing (HA) has been proposed to promote a positive
approach to ageing that relies on reserves and preserved
capacities in an individual, rather than accumulation of deficits. In
the World Health Organization Guidelines on Integrated Care
for Older People (ICOPE), HA is based on concrete measures of
Intrinsic Capacity (IC). These are defined as a composite of all
the physical and mental capacities of an individual, divided into
five domains: locomotion, cognition, psychological, vitality and
sensory capacity [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        As suggested by Belloni and Cesari [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], frailty and IC should
not be considered as two opposed constructs, but rather two
constructs that share a common biological background. The IC
should be considered as an evolution of the frailty concept, taking
into special consideration the functional reserve expressed by
the vitality domain, the need for a worldwide implementation
of prevention, the continuum of the ageing process, and the
opportunities offered by novel technologies [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Key to a successful operational definition of IC is the choice
of the variables used to characterise healthy ageing. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] argues
that integrated care is crucial to incorporate intrinsic capacity
assessment in new care models and encourages the adoption of
wearable devices and mobile apps for longitudinal data collection.
In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the authors proposed a self-generated health measure
called Intrinsic Capacity Index (ICI) which relies on physical
function data collected through fitness tracking wearable devices
and a set of electronic Patient-Related Outcomes (PRO) collected
through a dedicated smart-phone app (MySAwH App). In the
study, these variables have been collected longitudinally over a
period of 18 months for a cohort of HIV patients as part of the My
Smart Age with HIV (MySAwH) study. This is a prospective
multicenter international case-only study, designed to empower Older
People Living With HIV (OPLWH) to achieve healthy lifestyles
and improvement in quality of life.
      </p>
      <p>
        Similarly to FI, ICI is computed from a subset of the available
variables, by manually selecting a cutoff point for each variable
and simply counting, for a given patient, the variables with value
higher than the cutoff for that patient. We refer to this
approach, which represents common practice for assessing health
condition in geriatric medicine, as “knowledge-driven” (KD for
short), as it relies on easy-to-interpret metrics where the choice
of variables and of their cutoff points is defined manually by
clinical experts. The usefulness of the ICI is demonstrated in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
by showing experimentally that it displayed higher sensitivity
than FI to predict one central indicator of “wellness state” of
ageing individuals, the Quality of Life (QoL for short).
      </p>
    </sec>
    <sec id="sec-3">
      <title>1.2 Contributions</title>
      <p>In contrast to the dominant KD methods as described, in this
paper we explore a complementary data-driven approach (DD)
to predicting wellness states for long-term patients, using a
combination of clinical, self-monitoring, and PRO (self-reporting)
longitudinal observations that refer to the five IC domains. As
we will see, while this approach removes the need for the clinical
experts to directly define metrics such as the ICI, it also
empowers them by deploying machine learning techniques that make
the predictions easily interpretable.</p>
      <p>Specifically, we focus on three dependent variables to
characterise the “wellness state” of ageing individuals, namely, (i) Falls,
indicating whether or not a person has experienced any falls
within a given time period; (ii) SPPB, or Short Physical
Performance Battery, measuring movement of the lower limbs, and (iii)
Quality of Life (QoL). Using the MySAwH dataset for training,
we can directly contrast the DD and KD approaches, and we
show that we can achieve better predictive power without the
need to rely on manually formulated ICI metrics. Furthermore,
we also show that the models’ performance increases if we also
include a single Frailty Index value for each patient,
representing a “baseline” clinical assessment, in addition to the PRO and
activity variables.</p>
      <p>Finally, we combine well-established machine learning
models with a post-hoc interpretation method and we show that
satisfactory model performance can be achieved together with
intelligible explanations, which provide the additional benefit of
ranking the variables with respect to a prediction. Indeed, we
show that the relative importance of the variables differs for
each patient, indicating the important role played by intelligible
models towards personalising healthcare. This capability also
provides critical feedback to the clinicians, who can use this
information in combination with their expertise to moderate the
model’s predictions.</p>
    </sec>
    <sec id="sec-4">
      <title>2 RELATED WORK</title>
      <p>
        The idea of exploiting machine learning toward a general “health”
focus is certainly gaining popularity, also thanks to the
possibility of continuously monitoring well-being through wearable
activity trackers’ data. For instance, passive sensing techniques
have been recently exploited to assess both mental and physical
health [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] or to predict weight objectives for users of smart
connected devices [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Some researchers also successfully
exploited this kind of data in more traditional disease monitoring
scenarios, e.g., to track Multiple Sclerosis [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], depression [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]
and schizophrenia [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] patients’ health. In most of these studies
mobile phones are only exploited as a passive sensing device,
while in our case we combine the use of passive sensing
(wearable activity tracker) data with self-reported Ecological Momentary Assessment (EMA) data collected
through mobile phone apps.
      </p>
      <p>
        A critical aspect for the successful application of ML to
medicine is the recent increased emphasis on the need for
explanations of ML systems [
        <xref ref-type="bibr" rid="ref10 ref7">7, 10</xref>
        ]. This has led to another
important research trend, i.e. Interpretable Machine Learning in
healthcare [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Even if recent research has proposed new
models which exhibit high performance as well as interpretability,
e.g., GA2M [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and rule-based models [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], the utility of these
models in healthcare has not been convincingly demonstrated
yet, due to the rarity of their application [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The interpretation
method used in [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ], instead, is designed to work with existing
and well established (even if less interpretable) ML methods, such
as gradient boosting or deep learning, by extracting explanations
through models that are applied post-hoc using Shapley Values
[
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. This is one of the most advanced interpretation methods
available, allowing for both global (entire study population) and
local (instance level) explanations. In our study we exploit this
method together with the XGBoost [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] gradient boosting
algorithm, aiming at both interpretability and performance.
A similar technique has also been proposed, only at conceptual
level, in the Explainable AI framework discussed in [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], which
is however based on clinical-only data analysis.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3 THE HIV COHORT DATASET</title>
      <p>
        The experimental dataset used in this paper was obtained from
the My Smart Age with HIV (MySAwH) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] project. MySAwH
is a multi-centre prospective ongoing study aiming at
empowering OPLWH, i.e. 50+ years old, to develop healthy lifestyles.
The project involves 261 patients from three clinics: 128 from
Modena (Italy), 100 from Sydney (Australia) and 33 from Hong
Kong (China). One novelty of the approach is the combination
of clinical patient data, acquired during periodic scheduled
assessments in the clinic, with patient-oriented longitudinal data
about patients’ behavioural, physiological and environmental
health status, which is collected at higher frequency through
smartphones and wearable devices.
      </p>
      <p>
        Thus, the resulting dataset is highly heterogeneous as to the
data type, the geographical origin of patients and the acquisition
rate. It provides a comprehensive characterization of patients
from the broad determinants of health that impact ageing,
complemented by those more specific to OPLWH, namely:
• Activity tracking variables: step count, sleep hours and
calories, which are collected daily using a
commercial-grade wearable activity tracker;
• PRO variables: 56 categorical questions exploring
functional abilities and Quality of Life (QoL). These are
collected monthly using a dedicated smartphone app;
• Clinical variables: comprehensive geriatric assessment
and HIV-specific variables, which are collected by
healthcare workers during study visits at months 0, 9 and 18.
37 of these variables were used to measure the Frailty
Index (FI) as defined in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]: 27 from blood tests, 3 about body
composition, 7 HIV-related variables and patient-reported
outcomes.
      </p>
      <p>
        A preliminary analysis [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] introduced a new index for
expressing IC, named ICI, and compared the performance of FI and ICI
in predicting QoL, a quantitative measure of health as
assessed by the individual respondents that is widely used in
ageing populations. QoL was assessed using a standardised
EQ-5D-5L questionnaire, based on the EQ Visual Analogue scale (EQ
VAS) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Through the MySAwH dataset, FI and ICI were shown
to be effective tools that can be used in research and clinical
settings to describe disease and health status in OPLWH. Compared
with FI, the ICI score displayed higher sensitivity in predicting
QoL and self-perceived health in OPLWH.
      </p>
      <p>Outcomes. In this work, we focus on the task of predicting
significant healthy ageing indicators, of which QoL is one.
In addition to QoL, two other indicators were selected:
Falls, an adverse outcome included among the geriatric
syndromes, and the Short Physical Performance Battery (SPPB),
a group of measures of lower-limb movement ability
(Guralnik et al., 2000) that can aid in the monitoring of function
in older people. These three outcomes were chosen because they
broadly cover all 5 domains of IC. In summary, these are as follows
(Fig. 1 shows their distributions):</p>
      <p>• Quality of Life (QoL): assessed using the EQ-5D-5L
standard, with values between 0 and 1;
• Falls: a binary outcome that evaluates to True if a patient
has fallen at least once since the previous visit, and False
otherwise;
• SPPB Index: a discrete index that assumes integer values
between 0 and 12.</p>
      <p>Observational data and feature space. The observational
dataset used to predict these outcomes covers 18 months and is
broken down into two time windows, reflecting the clinical
assessment schedule (months 9 and 18), when the selected clinical
outcomes are assessed. For each time window, we draw two sets
of samples from the related observations, which we are going to
use as ground truth to train our models.</p>
      <p>The first consists of the patient-centric longitudinal data,
including the activity tracking and the PRO variables. To this end,
we further aggregated the resulting PRO time series and the
activity tracking time series at regular intervals of 1 month, resulting
in two sets of data points, one for each of the two windows. The
second sample set augments the first with the FI values
computed from the clinical variables measured during hospital visits
at the beginning of each window, namely at times 0 and 9. This
added value is a physician’s assessment that complements the
patient-centric data point and can be interpreted as the baseline
of the time series the data point comes from.</p>
      <p>
        Formally, for each outcome o ∈ {QoL, SPPB, Falls} we write
Sample<sub>o</sub> to denote the sample set containing the monthly samples
per patient. For each patient p, we denote a single sample in the
set Sample<sub>o</sub> at month m = i + (j − 1) · 9, corresponding to the
i-th observation, i ∈ [1, 8], in the j-th window, j ∈ [1, 2], by a pair
s<sub>m</sub><sup>p</sup> = (x<sub>i,j</sub><sup>p</sup>, y<sub>j</sub><sup>p</sup>) ∈ Sample<sub>o</sub>, where
• x<sub>i,j</sub><sup>p</sup> represents the feature vector for the i-th observation;
• the feature vector x<sub>i,j</sub><sup>p</sup> contains the values of the PRO and
activity tracking variables 𝒱 for patient p at month m: 56
PRO answers provided by the patient during the
corresponding month, plus 3 aggregated values computed
as the mean of the daily wearable device data (step count,
calories, number of sleep hours) collected during the same
month. Given a variable V<sub>k</sub>, x<sub>i,j</sub><sup>p</sup>[V<sub>k</sub>] denotes the V<sub>k</sub> value
in x<sub>i,j</sub><sup>p</sup>;
• y<sub>j</sub><sup>p</sup> is the value of the outcome o measured during the
hospital visit at the end of period j.
      </p>
      <p>The second sample set, denoted as Sample<sub>o</sub><sup>FI</sup>, is built by adding
to each feature vector x<sub>i,j</sub><sup>p</sup> ∈ Sample<sub>o</sub> the FI value computed
from the clinical variables measured during the hospital visit at
the beginning of period j, i.e. at month 0 when j = 1 and 9 when
j = 2.</p>
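      <p>As an illustration only, the indexing scheme and the layout of the feature vectors defined above can be sketched in Python. The function and parameter names (month_index, build_feature_vector) are hypothetical and are not taken from the MySAwH code base; only the formula m = i + (j − 1) · 9, the 56 PRO answers, the three monthly activity means and the optional FI value come from the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from statistics import mean


def month_index(i, j, window_len=9):
    """Month m = i + (j - 1) * 9 for the i-th observation of the j-th window."""
    if not (1 <= i <= 8 and j in (1, 2)):
        raise ValueError("expected i in 1..8 and j in 1..2")
    return i + (j - 1) * window_len


def build_feature_vector(pro_answers, daily_steps, daily_calories,
                         daily_sleep_hours, fi=None):
    """Concatenate the monthly PRO answers with the monthly means of the
    three activity variables. When `fi` is given, the Frailty Index measured
    at the start of the window is appended (the Sample_o^FI variant)."""
    features = list(pro_answers)
    features += [mean(daily_steps), mean(daily_calories), mean(daily_sleep_hours)]
    if fi is not None:
        features.append(fi)
    return features


# All months covered by the two windows (hypothetical usage).
months = [month_index(i, j) for j in (1, 2) for i in range(1, 9)]
```
]]></preformat>
      <p>Under these assumptions each patient contributes at most 16 monthly samples, 8 per window, and each feature vector holds 59 values, or 60 when the FI is included.</p>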
      <p>Quality Assurance. Lastly, we performed a quality assurance
step on the resulting sample sets. Within each time window,
observations of PRO variables are sometimes incomplete, resulting
in sequence gaps. The size of the gaps is 5 consecutive missing
observations on average, with a max of 17, and we found 108
gaps per patient on average, with a max of 284 gaps (regardless
of size). We performed imputation by interpolating missing data
points in the time series, aiming to balance
the size of the gaps against the performance of the predictive
model. Clearly, interpolating very large gaps produces spurious
data in the training set. We experimentally determined the maximum
size of gaps that could be safely interpolated (five missing steps),
by assessing the predictive performance of each of the models
resulting from training sets obtained from more or less “aggressive”
interpolation. By construction, the dataset contains at most 16 samples per
patient, for a total of 4,176 potential records (261 patients × 16 months),
considering each month for each patient. After adjusting for missing data,
the final training set contains 2,250 data points, with an average of 8 per patient.
      </p>
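      <p>The gap-limited imputation described above can be sketched as follows. This is a minimal example under stated assumptions: the text does not specify the interpolation scheme, so linear interpolation over numerically encoded answers is assumed here, and the name interpolate_short_gaps is hypothetical. Only the limit of five consecutive missing steps comes from the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def interpolate_short_gaps(series, max_gap=5):
    """Fill runs of None by linear interpolation, but only when the run is
    at most `max_gap` long and has an observed value on both sides.
    Longer gaps and gaps at either end of the series are left untouched."""
    values = list(series)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        start = i                      # first missing index of this gap
        while i < n and values[i] is None:
            i += 1
        end = i                        # first observed index after the gap
        gap = end - start
        if start > 0 and end < n and gap <= max_gap:
            left = values[start - 1]
            right = values[end]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                values[start + k] = left + step * (k + 1)
    return values
```
]]></preformat>
      <p>Samples that still contain missing values after this step would then be dropped from the training set; that filtering step is an assumption consistent with, but not spelled out in, the text.</p>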
    </sec>
    <sec id="sec-6">
      <title>4 THE LEARNING FRAMEWORK</title>
      <p>The objective of the data-driven approach is to predict each
outcome at the time of a visit using the samples referring to the time
window before that visit, as shown in Fig. 2.</p>
      <p>To this end, the data-driven learning framework we built is
depicted on the left side of Fig. 3. Each outcome o is predicted by
two learning models M<sub>o</sub> and M<sub>o</sub><sup>FI</sup>, one for each of the two sample
sets Sample<sub>o</sub> and Sample<sub>o</sub><sup>FI</sup>, respectively. We trained the two
models separately and assessed the performance using standard
K-fold cross-validation (CV) on 80% of the samples (χ<sub>train</sub>)
and a test phase on the remaining 20% of the samples (χ<sub>test</sub>) of the
corresponding sample set.</p>
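      <p>The evaluation protocol above (an 80/20 hold-out split followed by K-fold cross-validation on the training portion) can be sketched with the Python standard library alone. The number of folds, the shuffling and the random seed are not stated in the text, so k = 5 and seed = 0 are assumptions, as are the function names.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out `test_fraction` of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        validation = folds[f]
        train = [idx for g, fold in enumerate(folds) if g != f for idx in fold]
        yield train, validation


# 2,250 is the size of the final sample set reported in the text.
train_idx, test_idx = train_test_split_indices(2250)
splits = list(k_fold(train_idx, k=5))
```
]]></preformat>
      <p>In this sketch the split is made at the level of individual samples. Splitting by patient instead would prevent samples from the same patient appearing in both training and test sets; the text does not say which option was used.</p>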
      <p>The DD approach is compared to the KD approach that
represents the common practice in geriatric medicine. To this end, we
built the KD learning framework depicted on the right hand side
of Fig. 3. The approach aims at computing an ICI score for each
observation by manually selecting a subset V = {V<sub>1</sub>, …, V<sub>n</sub>} ⊆ 𝒱
of the set 𝒱 of PRO and activity tracking variables, specifying
functions s<sub>k</sub>(x) to map each value x for variable V<sub>k</sub> ∈ V to a score,
and finally combining the individual scores s<sub>k</sub>(x) into a unique
value. The variables are chosen to represent each of the five IC
domains, namely locomotion, cognition, psychological, vitality
and sensory capacity. For most of the variables V<sub>k</sub>, a binary score
is defined, i.e., s<sub>k</sub>(x) ∈ {0, 1}, based on a single threshold; for
instance, when V<sub>k</sub> = stress level (from 1 to 10) the score is mapped
to 1 if the value is lower than 3 and 0 otherwise. Other variables
are mapped to a score in the [0, 1] range, for instance the number
of steps per day.</p>
      <p>[Fig. 1, panels (a) QoL distribution and (b) SPPB distribution.]</p>
      <p>[Fig. 3 labels: Raw Data, Subsetting, Cutoffs, IC Index. Data-Driven (DD) approach: training in CV of the models MQoL_DD (regression), MSPPB_DD (regression) and MFalls_DD (classification), followed by Model Interpretation.]</p>
      <p>Then, given a feature vector x<sub>i,j</sub><sup>p</sup> for patient p, the
corresponding ICI value ICI(i, j, p) is computed as the sum of the
s<sub>k</sub>(x<sub>i,j</sub><sup>p</sup>[V<sub>k</sub>]) scores, normalised by the number of variables:
<disp-formula><tex-math>ICI(i, j, p) = \frac{\sum_{k=1}^{|V|} s_k\left(x_{i,j}^{p}[V_k]\right)}{n}</tex-math></disp-formula></p>
      <p>Such an index is subject to an inevitable bias: the imposition
of the physician’s interpretation on the choice of the variables
of the subset, as well as on the thresholds and the arithmetic
formula to be used.</p>
      <p>As in the data-driven case, we built a dataset consisting exclusively of the ICIs,
Sample<sub>o</sub><sup>ICI</sup>, and one consisting of the ICIs and the FI at the most
recent visit, Sample<sub>o</sub><sup>ICI,FI</sup>. This parallelism with
the data-driven approach allowed us to train
6 learning models on these datasets (M<sub>o</sub><sup>ICI</sup> and
M<sub>o</sub><sup>ICI,FI</sup> for each of the three outcomes o) and to compare the
predictive performance of the two approaches.</p>
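      <p>The ICI defined above, i.e. the sum of the per-variable scores normalised by the number of selected variables, can be illustrated with a short Python sketch. The stress-level rule (score 1 if the value is lower than 3, 0 otherwise) is taken from the text. Everything else is hypothetical: the variable names, the helper functions, and in particular the mapping of daily steps to [0, 1] as a ratio capped at an assumed target of 10,000 steps, since the text does not give the actual mapping.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def binary_below(threshold):
    """Score 1.0 when the value is strictly below the threshold, else 0.0."""
    return lambda x: 1.0 if x < threshold else 0.0


def capped_ratio(target):
    """Score in [0, 1]: the value as a fraction of `target`, capped at 1.0."""
    return lambda x: min(max(x / target, 0.0), 1.0)


# Hypothetical subset V of variables with their scoring functions s_k.
SCORING = {
    "stress_level": binary_below(3),      # rule stated in the text
    "daily_steps": capped_ratio(10000),   # assumed mapping, for illustration
}


def ici(feature_vector, scoring=SCORING):
    """ICI = sum of s_k(x[V_k]) over the selected variables, divided by n."""
    scores = [score(feature_vector[name]) for name, score in scoring.items()]
    return sum(scores) / len(scores)
```
]]></preformat>
      <p>For example, under these assumptions a patient reporting a stress level of 2 and averaging 5,000 steps per day obtains an ICI of 0.75. Every choice in SCORING is made by hand, which is precisely the source of bias discussed above.</p>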
    </sec>
    <sec id="sec-8">
      <title>5 EXPERIMENTAL RESULTS AND MODEL INTERPRETATION</title>
      <p>In this section we first discuss the performance of the predictive
models, and then describe in detail our approach and results
regarding model interpretation, which is shown as the output of the
DD approach on the left side of Fig. 3. Interpretability is a fundamental
requirement in medicine, where the ability to provide medical
doctors with an easy-to-understand interpretation of the model
predictions is essential. This not only conveys confidence in
the predictions, but also helps to make them actionable, i.e., in
the form of recommendations to patients. We present examples
of such interpretations, and their practical relevance, in Sec. 5.2.</p>
      <p>
        The Gradient Boosting algorithm [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] proved to offer better
predictive performance than other popular intelligible learning
frameworks such as GA2M [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], suggesting that separating model
performance from model interpretability would better suit our
needs. Thus, our results are based on a combination of
Gradient Boosting (the XGBoost implementation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) for performance
(Sec. 5.1), and Shapley Values [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] (using the SHAP
implementation [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) to generate reports on the relative importance of
each individual feature, both across the entire population and for
individual patients (Sec. 5.2).
      </p>
      <p>[Fig. 1, panel (c): Falls distribution (classes False and True).]</p>
      <p>[Fig. 3, Knowledge-Driven (KD) approach: training in CV of the models MQoL_KD (regression), MSPPB_KD (regression) and MFalls_KD (classification). Performance comparison: QoL, SPPB (MAE); Falls (Acc, Prec, Rec).]</p>
    </sec>
    <sec id="sec-9">
      <title>5.1 Predictive Performance</title>
      <p>In our evaluation we compare our DD approach, illustrated on
the left side of Fig. 3, with the KD approach based on a regression
on the IC index (right hand side in the Figure). Fig. 4 shows the
predictive performance of the models we tested, namely: the DD
models trained with and without using FI as a feature, earlier
referred to as M<sub>o</sub><sup>FI</sup> and M<sub>o</sub>, respectively; and the KD models,
where again the expert may or may not consider FI. Results are
presented using 1-MAPE (MAPE: Mean Absolute Percentage Error) for
the numerical outcomes QoL and SPPB on the left of the figure,
and accuracy, precision, recall and F1 for Falls, on the right.</p>
      <p>The results indicate a higher than 90% 1-MAPE for all cases in
QoL and SPPB, while classification accuracy for Falls is higher
than 84%. Further, the DD approach performs generally better
than KD, and both benefit from using FI, with performance
reaching 94.3%, 94.9% and 95% for QoL, SPPB, and Falls, respectively.</p>
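      <p>For reference, the metrics named above can be written out in a few lines of Python. This is a sketch rather than the evaluation code used in the study: the function names are hypothetical, and the handling of zero ground-truth values (for which MAPE is undefined) is an assumption, since the text does not say how such cases were treated.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def one_minus_mape(y_true, y_pred):
    """Return 1 - MAPE, where MAPE is the mean of |(true - pred) / true|."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 1.0 - sum(errors) / len(errors)


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for a binary outcome such as Falls."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# A classifier that never predicts a fall on an imbalanced toy sample.
imbalanced = classification_metrics([False] * 9 + [True], [False] * 10)
```
]]></preformat>
      <p>The toy sample at the end shows why accuracy alone can mislead on an imbalanced outcome: accuracy is 0.9 while recall is 0.</p>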
      <p>Notably, in one case the KD approach returns a very low
Recall when FI is not used. This can be explained by the strong
imbalance between the majority “False” class (no Falls) and the
small minority “True” class.</p>
      <p>The training sets used to generate these models combine
patients from all three clinics. To account for possible differences
in data collection protocols between the clinics, we also created
one separate model for each. The corresponding results are
presented in Table 1 and are consistent with those presented above.
Some anomalies appear in the Hong Kong models, and these are
probably due to the small size of the training set.</p>
      <p>Finally, Fig. 5 shows the MAE distribution grouped per clinical
center for QoL and SPPB. This helps to understand the
robustness of the models and to identify any non-homogeneity in the
data. In particular, Hong Kong exhibits a higher number of
outliers compared to Modena and Sydney, probably because of the
small number of cases (33, compared to 128 in Modena and 100
in Sydney), which are also more homogeneous. These results
suggest that developing separate models by stratifying across
clinics and data collection centres may be beneficial for future,
larger scale studies.</p>
    </sec>
    <sec id="sec-10">
      <title>5.2 Model Interpretation</title>
      <p>
        SHAP [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is a framework for interpreting predictions from
machine learning models. It is based on Shapley values, first
introduced in 1953 in the context of cooperative game theory [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
Briefly, the main goal of the framework is to rank the relative
influence of each feature on a predictive model, both locally,
that is, for a specific instance prediction, and globally, i.e., when
considering the model predictions for an entire population.
      </p>
      <p>In our medical setting, this means that for each patient, in
addition to the predicted outcome the clinician also receives a
list of features, ranked in order of their relative importance in
achieving the prediction. Importantly, these orders may differ
for any two patients. This means that, using SHAP, we enable
forms of personalised medicine whereby similar outcomes are
explained in terms of different behaviour, represented for
instance by different EMA features. An example appears in Fig. 6,
showing two different sets of positively contributing (green) and
negatively contributing (red) features for two patients with the
same SPPB index (note that, for SPPB, higher is better, as this
indicates the patient’s capacity for physical movement. In the case
of Falls, for instance, the opposite would be true). Clearly, this
added information may lead to different interventions for these
two patients.</p>
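      <p>The situation in Fig. 6, where two patients share a prediction but not its explanation, can be sketched as follows. This is a minimal illustration, assuming per-patient contributions are already available as a mapping from feature name to signed value; the feature names, the base value and all numbers are hypothetical and are not taken from the study.</p>
      <preformat>
```python
def rank_contributions(shap_row):
    """Rank one patient's features by |contribution| and split them by sign."""
    ranked = sorted(shap_row.items(), key=lambda kv: abs(kv[1]), reverse=True)
    positive = [name for name, value in ranked if value > 0]
    negative = [name for name, value in ranked if 0 > value]
    return positive, negative


# Hypothetical contributions; each prediction is the base value plus the sum
# of that patient's contributions, so both patients get the same prediction.
base_value = 9.0
patient_a = {"daily_steps": 1.5, "sleep_hours": -0.5, "mood_score": 0.0}
patient_b = {"daily_steps": -0.5, "sleep_hours": 0.5, "mood_score": 1.0}

prediction_a = base_value + sum(patient_a.values())
prediction_b = base_value + sum(patient_b.values())

pos_a, neg_a = rank_contributions(patient_a)
pos_b, neg_b = rank_contributions(patient_b)
```
      </preformat>
      <p>Here both predictions coincide, yet the feature that helps one patient most is the one that penalises the other, which is the kind of difference that could motivate different interventions.</p>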
      <p>At the same time, SHAP provides global explanations, which
characterise the contribution of each feature as a function of its
range of values. For instance, Fig. 7 shows how the SV, indicating
the overall contribution of one of these features (a PRO
question), goes from negative to positive depending on the patients’
responses to this question, with a clear threshold at responses ⩾ 3.</p>
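      <p>One simple way to read such a threshold off the data is to average the contributions per response value and look for the point from which the average stays positive. The sketch below illustrates this under stated assumptions: the 1 to 5 response scale, the function name <monospace>sign_change_threshold</monospace> and all values are hypothetical, chosen only to mirror the shape of Fig. 7, and this is not necessarily how the threshold was derived in the study.</p>
      <preformat>
```python
from collections import defaultdict
from statistics import mean


def sign_change_threshold(responses, shap_values):
    """Smallest response value from which the mean contribution stays positive."""
    by_response = defaultdict(list)
    for response, sv in zip(responses, shap_values):
        by_response[response].append(sv)
    means = {r: mean(svs) for r, svs in by_response.items()}

    threshold = None
    # Walk from the highest response downwards while the mean stays positive.
    for r in sorted(means, reverse=True):
        if means[r] > 0:
            threshold = r
        else:
            break
    return threshold


# Hypothetical responses to one question and the matching contributions.
responses = [1, 1, 2, 2, 3, 3, 4, 5]
contributions = [-0.8, -0.6, -0.3, -0.1, 0.2, 0.4, 0.5, 0.7]

threshold = sign_change_threshold(responses, contributions)
```
      </preformat>
      <p>The function returns <monospace>None</monospace> when even the highest response has a non-positive mean contribution, that is, when no such threshold exists.</p>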
      <p>We note that this capability essentially mimics the KD
approach in that it identifies thresholds for the variables. While
these are similar to the manually selected cutoffs, in our DD
approach they are identified automatically from the data, in a
principled way. In the future, this explanation capability may
underpin epidemiological studies where the precise characterisation
of a population of individuals enables new forms of preventive
medicine.</p>
    </sec>
    <sec id="sec-11">
      <title>6 CONCLUSIONS AND FUTURE WORK</title>
      <p>In this paper we have proposed a novel, data-driven approach
towards the definition of Intrinsic Capacity, aimed at quantifying
and predicting the wellness state of older people living with HIV.
Using a cohort from a multi-centre prospective study as training
set, we have shown that a machine learning model that predicts
three specific wellness metrics (Falls, SPPB, and Quality of Life)
performs as well as or better than a manually defined Intrinsic
Capacity Index. At the same time, the model is interpretable, making
it an ideal complement to expert-based assessment of wellness.
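</p>
      <p>[Table fragment: column pairs "w/o FI" and "w/ FI", repeated three times; outcome and metric labels not recovered. Unlabelled values: 86%, 84%. DD: 100%, 93%, 94%, 96%, 89%, 96%. KD: 0%, 33%, 0%, 53%, 38%, 69%. DD: 0%, 33%, 41%, 68%, 57%, 68%.]</p>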
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Ilaria</given-names>
            <surname>Bellantuono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rafael</given-names>
            <surname>DeCabo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dan</given-names>
            <surname>Ehninger</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>Find drugs that delay many diseases of old age</article-title>
          .
          <source>Nature</source>
          <volume>554</volume>
          (02
          <year>2018</year>
          ),
          <fpage>293</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Belloni</surname>
          </string-name>
          and
          <string-name>
            <given-names>Matteo</given-names>
            <surname>Cesari</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Frailty and Intrinsic Capacity: Two Distinct but Related Constructs</article-title>
          .
          <source>Frontiers in Medicine</source>
          <volume>6</volume>
          (
          <issue>06</issue>
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Brothers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Susan</given-names>
            <surname>Kirkland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Guaraldi</surname>
          </string-name>
          , et al.
          <year>2014</year>
          .
          <article-title>Frailty in People Aging With Human Immunodeficiency Virus (HIV) Infection</article-title>
          .
          <source>The Journal of Infectious Diseases</source>
          <volume>210</volume>
          (06
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Tianqi</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Guestrin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>XGBoost: A Scalable Tree Boosting System</article-title>
          .
          <source>In Proc. of ACM SIGKDD (KDD '16)</source>
          . ACM, New York, NY, USA,
          <fpage>785</fpage>
          -
          <lpage>794</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Nancy</given-names>
            <surname>Devlin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Richard</given-names>
            <surname>Brooks</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>EQ-5D and the EuroQol Group: Past, Present and Future</article-title>
          .
          <source>Applied Health Economics and Health Policy</source>
          <volume>15</volume>
          (02
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Iacopo</given-names>
            <surname>Franconi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Olga</given-names>
            <surname>Theou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lindsay</given-names>
            <surname>Wallace</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>Construct validation of a Frailty Index, an HIV Index and a Protective Index from a clinical HIV database</article-title>
          .
          <source>PLOS ONE</source>
          <volume>13</volume>
          (10
          <year>2018</year>
          ),
          <elocation-id>e0201394</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Leilani H.</given-names>
            <surname>Gilpin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Bau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ben Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>Explaining explanations: An overview of interpretability of machine learning</article-title>
          .
          <source>In Proc. DSAA</source>
          <year>2018</year>
          .
          <fpage>80</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Guaraldi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jovana</given-names>
            <surname>Milic</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>The Interplay Between Frailty and Intrinsic Capacity in Aging and HIV Infection</article-title>
          .
          <source>AIDS Research and Human Retroviruses</source>
          <volume>35</volume>
          (08
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Guaraldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mirko</given-names>
            <surname>Orsini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Agnese</given-names>
            <surname>Caselgrandi</surname>
          </string-name>
          , et al.
          <year>2019</year>
          .
          <article-title>Fitness tracking wearable devices and a dedicated smart phone app (MySAwH App) to predict quality of life in PLWH: a multi-centre prospective study</article-title>
          .
          <source>In 17th European AIDS Conference (EACS)</source>
          (
          <year>2019</year>
          -08-05).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Riccardo</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anna</given-names>
            <surname>Monreale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Salvatore</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>A survey of methods for explaining black box models</article-title>
          .
          <source>ACM Comput. Surveys</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Scott M</given-names>
            <surname>Lundberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>Su-In</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A Unified Approach to Interpreting Model Predictions</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , and R. Garnett (Eds.). Curran Associates, Inc.,
          <fpage>4765</fpage>
          -
          <lpage>4774</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
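          <string-name>
            <given-names>Scott M</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bala</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Monica S</given-names>
            <surname>Vavilala</surname>
          </string-name>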
          , et al.
          <year>2018</year>
          .
          <article-title>Explainable machine learning predictions to help anesthesiologists prevent hypoxemia during surgery</article-title>
          .
          <source>Nature Biomedical Engineering</source>
          <volume>2</volume>
          ,
          <issue>10</issue>
          (
          <year>2018</year>
          ),
          <fpage>749</fpage>
          -
          <lpage>760</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
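          <string-name>
            <given-names>Llew</given-names>
            <surname>Mason</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Baxter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter L</given-names>
            <surname>Bartlett</surname>
          </string-name>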
          , et al.
          <year>2000</year>
          .
          <article-title>Boosting algorithms as gradient descent</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
          <fpage>512</fpage>
          -
          <lpage>518</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
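          <string-name>
            <given-names>Ankur</given-names>
            <surname>Teredesai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Muhammad Aurangzeb</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Carly</given-names>
            <surname>Eckert</surname>
          </string-name>
          , et al.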
          <year>2018</year>
          .
          <article-title>Interpretable Machine Learning in Healthcare</article-title>
          .
          <source>IEEE Intelligent Informatics Bulletin</source>
          <volume>19</volume>
          ,
          <issue>1</issue>
          (
          <year>2018</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Harsha</given-names>
            <surname>Nori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Samuel</given-names>
            <surname>Jenkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Paul</given-names>
            <surname>Koch</surname>
          </string-name>
          , et al.
          <year>2019</year>
          .
          <article-title>InterpretML: A Unified Framework for Machine Learning Interpretability</article-title>
          . arXiv preprint arXiv:1909.09223
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] World Health Organization.
          <year>2015</year>
          .
          <article-title>World report on aging and health</article-title>
          . https://www.who.int/ageing/events/world-report-2015-launch/en/
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] World Health Organization.
          <year>2018</year>
          .
          <article-title>Ageing and health</article-title>
          . https://www.who.int/news-room/fact-sheets/detail/ageing-and-health
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Mirko</given-names>
            <surname>Orsini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Marco</given-names>
            <surname>Pacchioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Malagoli</surname>
          </string-name>
          , et al.
          <year>2017</year>
          .
          <article-title>My smart age with HIV: An innovative mobile and IoMT framework for patient's empowerment</article-title>
          .
          <source>In 2017 IEEE 3rd International Forum on Research and Technologies for Society and Industry (RTSI)</source>
          .
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
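          <string-name>
            <given-names>Nathan D</given-names>
            <surname>Price</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew T</given-names>
            <surname>Magis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>John C</given-names>
            <surname>Earls</surname>
          </string-name>
          , et al.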
          <year>2017</year>
          .
          <article-title>A wellness study of 108 individuals using personal, dense, dynamic data clouds</article-title>
          .
          <source>Nature Biotechnology</source>
          <volume>35</volume>
          (jul
          <year>2017</year>
          ),
          <fpage>747</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Mashfiqui</given-names>
            <surname>Rabbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shahid</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tanzeem</given-names>
            <surname>Choudhury</surname>
          </string-name>
          , et al.
          <year>2011</year>
          .
          <article-title>Passive and in-situ assessment of mental and physical well-being using mobile sensors</article-title>
          .
          <source>In Proc. of UbiComp'11</source>
          .
          <fpage>385</fpage>
          -
          <lpage>394</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Ritt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Karl</given-names>
            <surname>Gassmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Cornel</given-names>
            <surname>Sieber</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Significance of frailty for predicting adverse clinical outcomes in different patient groups with specific medical conditions</article-title>
          .
          <source>Zeitschrift für Gerontologie und Geriatrie</source>
          <volume>49</volume>
          (09
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Samuel</given-names>
            <surname>Searle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Arnold</given-names>
            <surname>Mitnitski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Evelyne</given-names>
            <surname>Gahbauer</surname>
          </string-name>
          , et al.
          <year>2008</year>
          .
          <article-title>A standard procedure for creating a frailty index</article-title>
          .
          <source>BMC Geriatrics</source>
          <volume>8</volume>
          (10
          <year>2008</year>
          ),
          <fpage>24</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Lloyd Stowell</given-names>
            <surname>Shapley</surname>
          </string-name>
          .
          <year>1953</year>
          .
          <article-title>A Value for n-Person Games</article-title>
          .
          <source>Contributions to the Theory of Games</source>
          , Vol.
          <volume>2</volume>
          . Princeton University Press, Chapter 17
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Catherine</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Craner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Matthieu</given-names>
            <surname>Vegreville</surname>
          </string-name>
          , et al.
          <year>2019</year>
          .
          <article-title>Tracking Fatigue and Health State in Multiple Sclerosis Patients Using Connnected Wellness Devices</article-title>
          .
          <source>Proc. of ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies</source>
          <volume>3</volume>
          ,
          <issue>3</issue>
          (sep
          <year>2019</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Berk</given-names>
            <surname>Ustun</surname>
          </string-name>
          and
          <string-name>
            <given-names>Cynthia</given-names>
            <surname>Rudin</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Supersparse Linear Integer Models for Optimized Medical Scoring Systems</article-title>
          .
          <source>Machine Learning</source>
          <volume>102</volume>
          (02
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
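          <string-name>
            <given-names>Petar</given-names>
            <surname>Veličković</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Laurynas</given-names>
            <surname>Karazija</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nicholas D.</given-names>
            <surname>Lane</surname>
          </string-name>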
          , et al.
          <year>2018</year>
          .
          <article-title>Cross-modal Recurrent Models for Weight Objective Prediction from Multimodal Timeseries Data</article-title>
          .
          <source>In Proc. of PervasiveHealth '18 (PervasiveHealth '18)</source>
          .
          <fpage>178</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Danding</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qian</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ashraf</given-names>
            <surname>Abdul</surname>
          </string-name>
          , et al.
          <year>2019</year>
          .
          <article-title>Designing theory-driven user-centric explainable AI</article-title>
          .
          <source>In Proc. of Conference on Human Factors in Computing Systems.</source>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Rui</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Emily A.</given-names>
            <surname>Scherer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Megan</given-names>
            <surname>Walsh</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>Predicting Symptom Trajectories of Schizophrenia Using Mobile Sensing</article-title>
          .
          <source>GetMobile: Mobile Computing and Communications</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Rui</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Weichen</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alex</given-names>
            <surname>DaSilva</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>Tracking Depression Dynamics in College Students Using Mobile Phone and Wearable Sensing</article-title>
          .
          <source>Proc. of ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies</source>
          <volume>2</volume>
          ,
          <issue>1</issue>
          (
          <year>2018</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>