Sentiment Analysis of medical questionnaires to
improve adherence to telemonitoring programs
(Discussion Paper)

Chiara Zucco¹, Clarissa Paglia², Sonia Graziano³, Sergio Bella² and Mario Cannataro¹
¹ Data Analytics Research Center, Department of Medical and Surgical Sciences, University Magna Græcia, Catanzaro, Italy
² Unit of Cystic Fibrosis, Bambino Gesù Children’s Hospital, Rome, Italy
³ Unit of Clinical Psychology, Bambino Gesù Children’s Hospital, Rome, Italy


Abstract
Although it is widely known that high adherence to telemedicine, and in particular to home
telemonitoring programs, leads to an improvement in the patient’s quality of life, a reduction in
hospitalizations, and the containment of healthcare costs, poor adherence is a widespread problem,
especially in chronic diseases. Guided by the intuition that the sentiment, i.e., the degree of
positiveness/negativeness, expressed by patients through their responses to ad-hoc designed
questionnaires may be related to their adherence and can be used to predict poor adherence levels,
this work describes an integrated software architecture for the online provision, collection, and
analysis of questionnaires or surveys and summarizes some preliminary results already presented in
our previous work [1].

Keywords
Text Mining, Sentiment Analysis, Web-based questionnaire, Telemedicine.


1. Introduction
All health services that provide medical care in patients’ daily living environment
are subsumed under the umbrella of telemedicine, which strongly relies on the support of
information and telecommunication technologies [2].
   Common goals of telemedicine programs range from increasing patient empowerment
to reducing healthcare costs [3].
   Telemedicine and, specifically, home telemonitoring systems have been shown to
be cost-effective [4] and to improve patients’ quality of life, for instance through a
significant reduction of both mortality and length of stay of patients in progressive
care units [5] and a significant improvement of glycemic control for patients with diabetes [6].
However, a telemonitoring program’s effectiveness is affected by the extent to which the patient
follows medical protocols in terms of frequency and/or dosage [7], namely the adherence level,
and, consequently, by the degree of drop-out. In fact, poor adherence is a widespread problem,
especially in chronic diseases. Therefore, several studies have been carried out to investigate
variables related to poor adherence [8],[9],[10].

SEBD 2021: The 29th Italian Symposium on Advanced Database Systems, September 5-9, 2021, Pizzo Calabro (VV), Italy
chiara.zucco@unicz.it (C. Zucco); clarissa.paglia@opbg.net (C. Paglia); sonia.graziano@gmail.com (S. Graziano);
sergio.bella@opbg.net (S. Bella); cannataro@unicz.it (M. Cannataro)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org

   Adherence levels may be measured in different ways. Here, adherence is intended as
the ratio of performed monitoring events to the ideal number of events suggested by the
telemonitoring protocol, while the degree of drop-out refers to the percentage of patients that
abandon the telemedicine or telehomecare program they were enrolled in, generally due to
poor adherence.
   The proposed architecture is intended as a contribution to home telemonitoring programs.
The basic idea is to integrate within a telehomecare system an
online survey instrument that, by collecting textual answers written by patients enrolled in a
telemonitoring program, can extract useful features capable of predicting drop-out due to
poor adherence at an early stage.
   The system encompasses a novel analysis approach that leverages lexicon-based sentiment
analysis techniques [11] and exploits the inferred polarity as a numerical feature to enhance
further statistical or machine learning analysis.
   To the best of our knowledge, no specific research has been published, nor has a system
architecture been proposed, that explicitly monitors changes in patients’ opinions across
time through the repeated administration of a questionnaire, using the polarity associated with
answers to open-ended questions as a numerical feature, in a telehomecare system.
   Additionally, the paper will present a case study application of the system architecture to
discuss whether a predictive relationship, in terms of Granger-causality test modeling, may be
assessed between patient adherence in a cystic fibrosis telehomecare program and their opinion
about the program they are enrolled in.
   The rest of the paper is organized as follows: Section 2 describes the methodology behind
the proposed approach and the case-study application. Section 3 provides insights related to
the collected data and presents and discusses the results of the Granger-causality hypothesis tests.
Finally, Section 4 concludes the paper and outlines future work.


2. Materials and methods
In this Section, some preliminary information related to the case study, a description of the
experimental protocol used, and the proposed analysis pipeline are presented.

2.1. Experimental protocol and Dataset description
The data analyzed in the present case study application were collected from the Cystic Fibrosis
Unit, Bambino Gesù Children’s Hospital, Rome, Italy. In this study, 169 online surveys sent by
38 cystic fibrosis patients (F/M = 20/18, age = 28.7 ± 9.91, age range = 14–49) recruited among
patients already enrolled in a telemedicine program (years of enrollment = 5.9 ± 3.9) were
collected and analyzed at five different survey epochs.
  The enrollment criteria included patients older than 12 years with cystic fibrosis who
access the Cystic Fibrosis Unit in ordinary, daytime, or outpatient hospitalization. All patients
who had undergone a transplant (liver/lung) were excluded from the study.
  The study was formally approved by the local Medical Research Ethics Committee.

2.2. Administration of questionnaire
From June 2019, the 38 enrolled patients were asked to complete, every three months, an online
questionnaire designed ad-hoc by the clinical team. In the following, each set of survey
submissions will be indicated as an epoch.
   The Telemedicine Drop-Out (TDO) questionnaire consists of 15 blocks of closed, mixed, and
open-ended items with yes/no constraints, and it has been administered through a self-hosted
web-based survey instrument built on top of LimeSurvey¹. The TDO survey was designed as
an online, structured version of the interview led by the medical team within the telemedicine
program, extended with a series of open-ended questions whose objective was to infer polarity
or, in perspective, to extract emotions from the corresponding answers [12].
   LimeSurvey, a highly customizable, free, and responsive online survey tool, is used to
administer the surveys to patients. It also provides various API functions through the LimeSurvey
RemoteControl 2 (LSRC2) interface. The survey structure and the participants are created through
the user interface provided by LimeSurvey. The collection of survey answers is automated using
the Python library Limepy, which provides a Python wrapper for the LSRC2 API, and the Python
library Schedule, which is used to update the responses automatically. The DBMS server is MySQL.
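   To make the automated collection step more concrete, the following is a minimal sketch of how
responses could be pulled from the LSRC2 JSON-RPC interface. It deliberately uses only the Python
standard library instead of Limepy, so it is not the implementation used in this work. The endpoint
URL and the helper names (`build_request`, `call`, `decode_export`, `fetch_responses`) are
assumptions introduced for illustration, and error handling (e.g. a failed login, or a survey with
no responses) is omitted.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint of a self-hosted LimeSurvey instance (assumption).
LSRC2_URL = "https://surveys.example.org/index.php/admin/remotecontrol"


def build_request(method, params, request_id=1):
    """Serialize one JSON-RPC call for the RemoteControl 2 endpoint."""
    payload = {"method": method, "params": params, "id": request_id}
    return json.dumps(payload).encode("utf-8")


def call(method, params, url=LSRC2_URL):
    """POST a JSON-RPC call and return the 'result' field of the reply."""
    request = urllib.request.Request(
        url,
        data=build_request(method, params),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))["result"]


def decode_export(result):
    """Decode the base64-encoded JSON document returned by export_responses."""
    return json.loads(base64.b64decode(result).decode("utf-8"))


def fetch_responses(survey_id, username, password, url=LSRC2_URL):
    """Open a session, export all responses as JSON, then close the session."""
    key = call("get_session_key", [username, password], url)
    try:
        result = call("export_responses", [key, survey_id, "json"], url)
        return decode_export(result)
    finally:
        call("release_session_key", [key], url)
```

   A scheduler such as the Schedule library could then invoke `fetch_responses` at regular
intervals and store the new answers in the database.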
   Under the telemonitoring protocol, adherent patients need to transmit the results of the
spirometry test at least twice a week. For each survey administration, i.e. survey epoch, the
patient’s adherence score (Adh-score) to the telemonitoring program was assessed as the total
number of spirometry transmissions sent during a three-month window, from the month before to the
month after the survey administration, divided by twice the total number of weeks in that window.
In more detail, suppose that a survey was carried out at month $t$; then

$$\text{Adh-score}_t = \frac{nS_{t-1} + nS_t + nS_{t+1}}{2\,(w_{t-1} + w_t + w_{t+1})}$$

where $nS_{t-1}$, $nS_t$, $nS_{t+1}$ refer to the number of spirometry transmissions sent at
months $t-1$, $t$, $t+1$ respectively, while $w_{t-1}$, $w_t$, $w_{t+1}$ refer to the number of
weeks in months $t-1$, $t$, $t+1$ respectively.
   Patients that strictly follow medical advice have an Adh-score ≥ 1. In the following, the
percentage form of the Adh-score, i.e. Adh-score (%) = Adh-score ∗ 100, will be considered.
Therefore, Adh-score (%) ≥ 0, and Adh-score (%) > 100 for patients with high rates of adherence.
The clinical team provided the number of transmissions per month.
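   The Adh-score definition above can be sketched in a few lines of Python. The function names and
the example values are hypothetical. How the number of weeks per month is counted is not specified
here, so the value of four weeks per month used in the example is only a placeholder.

```python
def adh_score(transmissions, weeks):
    """Adh-score for one survey epoch.

    transmissions: spirometry transmissions sent in months t-1, t, t+1
    weeks: number of weeks in months t-1, t, t+1
    """
    if len(transmissions) != 3 or len(weeks) != 3:
        raise ValueError("expected values for months t-1, t and t+1")
    expected = 2 * sum(weeks)  # protocol: two transmissions per week
    return sum(transmissions) / expected


def adh_score_percent(transmissions, weeks):
    """Adh-score (%) = Adh-score * 100."""
    return 100 * adh_score(transmissions, weeks)


# Hypothetical patient: 6, 8 and 4 transmissions over three 4-week months.
example = adh_score_percent([6, 8, 4], [4, 4, 4])
print(example)  # 18 transmissions out of 24 expected
```

   A patient sending exactly two transmissions per week obtains a score of 1 (100%), and values
above 100% correspond to patients transmitting more often than the protocol requires.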

2.3. System architecture
The system architecture encompasses three independent modules, connected in a cascade
fashion. In future work, the modules are supposed to be integrated under a single user
interface. Figure 1 shows the overall architecture of the system, which is organized in three
logical levels: i) data collection, ii) data integration, and iii) data analysis.
   The general pipeline for the analysis of textual data, i.e. answers to open-ended questions,
involves:
   Text preprocessing: includes standard NLP (Natural Language Processing) techniques, i.e.
tokenization, stop word removal, and lemmatization. The preprocessing step has been executed by
using SpaCy², a popular library for Natural Language Processing in Python, which also provides a
set of preprocessing algorithms for the Italian language.
   Feature extraction: each open-ended free-text answer has been assigned a polarity score in the
range [−1, 1] through the VADER [13] lexicon-based method adapted to the Italian language, and
this score is considered as a numerical feature.
   Statistical hypothesis testing: data have been sorted by respondent and survey submission date
and, for each open-ended question, the sequence of assigned polarity scores was modeled as a time
series, as was the sequence of Adh-scores. The augmented Dickey-Fuller test [14] was used to check
for stationarity, while the Granger-causality hypothesis test [15] was used to discuss the
existence of directed causal interactions between the polarity score associated with free-text
answers and adherence.
   Data visualization: to provide useful insights and summarize patient answers, different
visualization techniques have been used. In particular, preprocessed free-text answers have been
visualized through word clouds, while a graph shows the time-series of polarity scores at
different submission epochs.

    ¹ https://www.LimeSurvey.org/
    ² https://spacy.io/

Figure 1: The modules of the system architecture, implemented as three independent levels connected
to each other in cascade. The architecture is designed to be cyclical, as the system is used for each
scheduled administration of the survey.


   VADER (Valence Aware Dictionary for sEntiment Reasoning) [13] is a sentiment analysis engine
that combines a lexicon-based method with rule-based modeling consisting of five human-validated
rules.
   Starting from a classical lexicon-based approach, the VADER engine’s core step is the
identification of some general grammatical and syntactic heuristics to identify semantic shifters,
i.e. words that increase, decrease or change the polarity orientation of another word. In
particular, five heuristics for sentiment polarity shifters have been identified, i.e.
punctuation; capitalization; a list of degree modifiers, which encompasses nouns, adjectives, and
adverbs known to impact sentiment intensity by increasing or decreasing it; contrastive particles;
and negations.
   To extend VADER to the Italian language, Sentix [16], a lexicon that automatically extends
the SentiWordNet annotation to the Italian synsets provided in MultiWordNet [17], has been
considered.
   Among the five heuristics designed in VADER, only three needed to be adapted to the Italian
language, since capitalization and exclamation marks act as intensifiers in both languages. Words
belonging to the VADER set of negation words were translated into Italian, and the set was then
extended by retrieving MultiWordNet synset terms for each word, while the contrastive particle
“but” has been simply translated to Italian.
   Among the intensifier sets, VADER also considers a few idioms but, due to discrepancies
across languages, idioms were not considered in the Italian adaptation.
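   The interplay between a valence lexicon and polarity shifters can be illustrated with a toy
scorer. This is only a sketch of the general lexicon-plus-rules idea, not the adapted VADER engine
used in this work: the lexicon entries, the negation and degree-modifier sets, and the numeric
constants are made-up illustrative values (they are not taken from Sentix), and only two of the
five heuristics are covered. The final normalization x / sqrt(x² + α) follows the form used by the
VADER reference implementation to map the summed valence into (−1, 1).

```python
import math

# Toy Italian lexicon with made-up valence values (assumption, not Sentix).
LEXICON = {"utile": 1.9, "comodo": 1.5, "ottimo": 2.8,
           "inutile": -1.8, "scomodo": -1.5}
NEGATIONS = {"non", "mai"}               # illustrative negation words
BOOSTERS = {"molto": 0.3, "poco": -0.3}  # illustrative degree modifiers
NEGATION_FACTOR = -0.74                  # illustrative damping and sign flip
ALPHA = 15                               # normalization constant


def polarity(text):
    """Return a polarity score in (-1, 1) for a short free-text answer."""
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token)
        if valence is None:
            continue
        window = tokens[max(0, i - 2):i]  # up to two preceding tokens
        for previous in window:
            if previous in BOOSTERS:
                boost = BOOSTERS[previous]
                # Push the valence away from (or towards) zero.
                valence += boost if valence > 0 else -boost
        if any(previous in NEGATIONS for previous in window):
            valence *= NEGATION_FACTOR
        total += valence
    return total / math.sqrt(total * total + ALPHA)


for answer in ["utile", "molto utile", "non utile"]:
    print(answer, round(polarity(answer), 3))
```

   With these toy values, an intensifier raises the score of a positive word and a negation flips
its sign, which is the behaviour the shifter heuristics are meant to capture.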
   Granger-causality is a statistical hypothesis test to determine whether there is a directed
relationship between two time-series [15]. A time series X is said to Granger-cause Y if it can
be shown that there is a statistically significant improvement in predicting future values of Y by
using past values of X (i.e. lagged values of X) and Y, compared to predictions based only on
past values of Y.
   Past values of X are related to the current value of Y through a lag factor. Here,
the Granger-causality test was computed for all lags ranging from one
to four, where four is the number of considered submission epochs minus one.
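   As a reminder of what is being tested, the standard textbook formulation of the Granger test
for a lag order $p$ compares a restricted and an unrestricted autoregressive model. The
coefficients $a_i$, $b_j$, the error terms, the residual sums of squares $RSS_r$ and $RSS_u$, and
the number of usable observations $T$ are notation introduced here for illustration; the exact
variant computed in the case study may differ in details.

```latex
% Restricted model: Y is predicted from its own past only
Y_t = a_0 + \sum_{i=1}^{p} a_i \, Y_{t-i} + \varepsilon_t

% Unrestricted model: lagged values of X are added
Y_t = a_0 + \sum_{i=1}^{p} a_i \, Y_{t-i} + \sum_{j=1}^{p} b_j \, X_{t-j} + \eta_t

% Null hypothesis: X does not Granger-cause Y
H_0 : \; b_1 = b_2 = \dots = b_p = 0

% F statistic comparing the two fits; under H_0 it follows F(p, T - 2p - 1)
F = \frac{(RSS_r - RSS_u) / p}{RSS_u / (T - 2p - 1)}
```

   Rejecting $H_0$ at the chosen significance level is what is meant by "X Granger-causes Y" at
lag $p$.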
   Here, the considered alternative hypothesis is that the polarity-score time-series associated
with each considered open-ended question Granger-causes the time-series of adherence. The
level of significance was set at 5%, i.e. the null hypothesis is rejected for p-value < 0.05. The
Granger-causality test assumes that the investigated time-series are stationary. Therefore, the
augmented Dickey-Fuller test was used to check the stationarity condition [14].


3. Results and discussion
A comprehensive analysis of the responses to the TDO survey is beyond the scope of this study.
Instead, only answers to two open-ended questions collected from the TDO survey will be
discussed, namely: i) Q1 - “What do you think about telemedicine?”, and ii) Q2 - “Since you joined
the telemonitoring program, what has improved the quality of your life?”. A polarity score
ranging in [−1, 1] has been inferred by adapting the VADER framework to the Italian language
and considered as a numerical feature for each set of answers.

3.1. Testing Granger-causality
Three time-series have been considered, i.e. the polarity scores related to Q1 and Q2,
respectively, and the time-series of Adh-scores. The augmented Dickey-Fuller test showed that the
stationarity condition holds for all three considered time-series (p-value = 4.6124e-18,
p-value = 3.2185e-07, p-value = 0.0035, respectively).
   Two Granger-causality tests were performed to check whether Q1 Granger-causes Adh-score
and whether Q2 Granger-causes Adh-score. Moreover, since all three series are considered
contemporaneously, we also need to check whether Adh-score Granger-causes Q1 and whether
Adh-score Granger-causes Q2. Three different test statistics, i.e. F-test, chi2, and
likelihood-ratio, were considered, with the number of lags varying from one to four. Table 1 and
Table 2 show the results in terms of p-values. It can be seen that both Q1 and Q2 Granger-cause
Adh-score for lag = 1. On the other hand, Adh-score does not appear to Granger-cause Q1 or Q2.
   Therefore, the results suggest the existence of a predictive relationship from the polarity
score series associated with Q1 and from the polarity score series associated with Q2 to the
Adh-score.

Table 1
Q1 and Adh-score: p-values of the Granger-causality test performed with three different statistics
and four different lags.
                      Q1 ⟶ Adh-score                             Adh-score ⟶ Q1

      Lag    F-test   Chi2 test   Likelihood-ratio      F-test   Chi2 test   Likelihood-ratio

        1   0.0368     0.0339         0.0350            0.4076    0.4027          0.4031
        2   0.0998     0.0908         0.0936            0.6667    0.6585          0.6592
        3   0.1063     0.0917         0.0963            0.6628    0.6479          0.6496
        4   0.2032     0.1760         0.1833            0.8200    0.8061          0.8064

Table 2
Q2 and Adh-score: p-values of the Granger-causality test performed with three different statistics
and four different lags.
                      Q2 ⟶ Adh-score                             Adh-score ⟶ Q2

      Lag    F-test   Chi2 test   Likelihood-ratio      F-test   Chi2 test   Likelihood-ratio

        1   0.0020     0.0016         0.0028            0.9970    0.9969          0.9969
        2   0.0174     0.0141         0.0155            0.7455    0.7390          0.7394
        3   0.0547     0.0446         0.0482            0.8235    0.8148          0.9154
        4   0.0865     0.0685         0.0744            0.7566    0.7387          0.7406


  The results are consistent with the hypothesis that the polarities extracted from patients’
opinion on telemedicine may help predict their average adherence one epoch after the survey
administration.
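   The decision rule at the 5% significance level can be applied mechanically to the reported
p-values. The sketch below copies the "polarity ⟶ Adh-score" columns of Table 1 and Table 2 and
lists the lags at which all three statistics fall below the threshold. Requiring agreement of all
three statistics is a choice made here for illustration, not a criterion stated in the analysis,
and the names `P_VALUES` and `significant_lags` are hypothetical.

```python
ALPHA = 0.05  # significance level stated in the text

# p-values for the direction "polarity -> Adh-score", copied from Tables 1 and 2
# as (F-test, chi2 test, likelihood-ratio), indexed by lag.
P_VALUES = {
    "Q1": {1: (0.0368, 0.0339, 0.0350), 2: (0.0998, 0.0908, 0.0936),
           3: (0.1063, 0.0917, 0.0963), 4: (0.2032, 0.1760, 0.1833)},
    "Q2": {1: (0.0020, 0.0016, 0.0028), 2: (0.0174, 0.0141, 0.0155),
           3: (0.0547, 0.0446, 0.0482), 4: (0.0865, 0.0685, 0.0744)},
}


def significant_lags(p_by_lag, alpha=ALPHA):
    """Lags at which all three statistics reject the null hypothesis."""
    return [lag for lag, p_values in sorted(p_by_lag.items())
            if all(p < alpha for p in p_values)]


for question, p_by_lag in P_VALUES.items():
    print(question, significant_lags(p_by_lag))
```

   With the tabulated values, this yields lag 1 for Q1 and lags 1 and 2 for Q2; at lag 3 the Q2
statistics disagree, since the F-test p-value (0.0547) is above the threshold.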


4. Conclusions
In the present paper, a system architecture for the extraction of emotional states from textual
content, designed to support the monitoring of patients with chronic disease, has been presented.
The main goal of the proposed system is to implement a methodology to capture the
underlying opinions that chronic patients have about the program they are enrolled in and to
investigate whether these features may help in the early prediction of patient drop-out from
the telemedicine program.
   The proposed system is designed in an end-to-end fashion to provide support through the
whole process, including the implementation of the questionnaire, the survey administration
at scheduled intervals, as well as the analysis. Specific contributions are: i) the design of a
self-hosted web-based survey instrument built on top of LimeSurvey for the management of
online inquiries over time; ii) an analysis pipeline that exploits sentiment analysis techniques
to infer a sentiment polarity score for each open-ended answer and uses it as a numerical
feature (to the best of our knowledge, this is the first time that this kind of approach has been
proposed); iii) the validation of both the survey instrument and the analysis pipeline, which have
been applied to collect and analyze 169 TDO survey responses sent by 38 patients enrolled in
a home telemonitoring program provided by the Cystic Fibrosis Unit at the Bambino Gesù
Children’s Hospital in Rome, Italy.
   Limitations of the present analysis lie in the small amount of data collected
to date, which does not allow the investigation of changes for a single patient through
time.
   Nevertheless, the promising results encourage us to further investigate the potential of
the proposed architecture and the analysis pipeline with the aim of developing, as future work, a
predictive system for the early detection of poorly-adherent patients that may also alert doctors
to contact patients and eventually update/personalize their telemedicine program (e.g. in terms
of timing, technological equipment, psychological counseling, etc.).


References
 [1] Zucco, C.; Paglia, C.; Graziano, S.; Bella, S.; Cannataro, M. Sentiment Analysis and
     Text Mining of Questionnaires to Support Telemonitoring Programs. Information, 11(12),
     550 (2020).
 [2] Ryu, S. Telemedicine: Opportunities and Developments in Member States: Report on the
     Second Global Survey on eHealth 2009 (Global Observatory for eHealth Series, Volume
     2). Healthc. Inform. Res., 18, 153 (2012).
 [3] Nielsen, M.K.; Johannessen, H. Patient empowerment and involvement in telemedicine.
     J. Nurs. Educ. Pract., 9, 54–58 (2019).
 [4] Delgoshaei, B.; Mobinizadeh, M.; Mojdekar, R.; Afzal, E.; Arabloo, J.; Mohamadi, E.
     Telemedicine: A systematic review of economic evaluations. Med J. Islam. Repub. Iran
     (MJIRI), 31, 754–761 (2017).
 [5] Armaignac, D.L.; Saxena, A.; Rubens, M.; Valle, C.A.; Williams, L.M.S.; Veledar, E.; Gidel,
     L.T. Impact of Telemedicine on Mortality, Length of Stay, and Cost Among Patients
     in Progressive Care Units: Experience From a Large Healthcare System. Critical care
     medicine, 46, 728 (2018).
 [6] Polisena, J.; Tran, K.; Cimon, K.; Hutton, B.; McGill, S.; Palmer, K. Home telehealth for
     diabetes management: a systematic review and meta-analysis. Diabetes, Obesity and
     Metabolism, 11, 913–930 (2009).
 [7] Osterberg, L.; Blaschke, T. Medication Adherence. New England Journal of Medicine,
     353, 487–497 (2005).
 [8] Gorst, S.L.; Armitage, C.J.; Brownsell, S.; Hawley, M.S. Home telehealth uptake and
     continued use among heart failure and chronic obstructive pulmonary disease patients: a
     systematic review. Annals of Behavioral Medicine, 48, 323–336 (2014).
 [9] Mathes, T.; Jaschinski, T.; Pieper, D. Adherence influencing factors – a systematic review
     of systematic reviews. Archives of Public Health, 72(1), 1–9 (2014).
[10] Tagliente, I.; Solvoll, T.; Murgia, F.; Bella, S. Telemonitoring in cystic fibrosis: A 4-year
     assessment and simulation for the next 6 years. Interact. J. Med. Res., 5, (2016).
[11] Zucco, C.; Calabrese, B.; Agapito, G.; Guzzi, P.H.; Cannataro, M. Sentiment analysis
     for mining texts and social networks data: Methods and tools. Wiley Interdisciplinary
     Reviews: Data Mining and Knowledge Discovery, p. e1333 (2019).
[12] Zucco, C.; Bella, S.; Paglia, C.; Tabarini, P.; Cannataro, M. Predicting abandonment
     in telehomecare programs using Sentiment Analysis: a system proposal. 2018 IEEE
     International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 1734–1739
     (2018).
[13] Hutto, C.J.; Gilbert, E. VADER: A parsimonious rule-based model for sentiment analysis of
     social media text. In Proceedings of the Eighth International AAAI Conference on Weblogs and
     Social Media, Ann Arbor, MI, USA, 1–4 June 2014 (2014).
[14] Dickey, D.A.; Fuller, W.A. Distribution of the estimators for autoregressive time series
     with a unit root. J. Am. Stat. Assoc., 74, 427–431 (1979).
[15] Granger, C.W. Testing for causality: A personal viewpoint. J. Econ. Dyn. Control, 2,
     329–352 (1980).
[16] Basile, V.; Nissim, M. Sentiment analysis on Italian tweets. In Proceedings of the 4th
     Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis,
     Atlanta, Georgia, 14 June 2013; Association for Computational Linguistics: Atlanta,
     Georgia; 100–107 (2013).
[17] Pianta, E.; Bentivogli, L.; Girardi, C. MultiWordNet: Developing an Aligned Multilingual
     Database; In Proceedings of the First International Conference on Global WordNet,
     Mysore, India, 293–302 (2002).