=Paper= {{Paper |id=Vol-2148/2.prefaceAndCommittee |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-2148/2.prefaceAndCommittee.pdf |volume=Vol-2148 }} ==None== https://ceur-ws.org/Vol-2148/2.prefaceAndCommittee.pdf
                                            Preface
                                 The 3rd International Workshop on
                          Knowledge Discovery in Healthcare Data (KDH)




Introduction
The Knowledge Discovery in Healthcare Data (KDH) workshop series was
established in 2016 to bring together AI and clinical researchers, fostering
collaborative discussions and presenting AI research efforts to solve
pressing problems in health care. This is the workshop’s third year; it was
held with IJCAI/ECAI in Stockholm, Sweden, and focused on learning health
care systems. For the first time, this workshop featured a challenge: The
Machine Learning Blood Glucose Level Prediction Challenge.
   The notion of the learning health care system has been put forward to
denote the translation of routinely collected data into knowledge that
drives the continual improvement of medical care. This notion has been
described in many forms, but each follows a similar cycle of assembling,
analyzing and interpreting data from multiple sources (clinical records,
guidelines, patient-provided data including wearables, omic data, etc.),
followed by feeding the acquired knowledge back into clinical practice. This
framework aims to provide personalized recommendations and decision support
tools to aid both patients and care providers, to improve outcomes and
personalize care.
   This framework also extends the range of actions possible in response to
patient monitoring data, for example, alerting patients or automatically
adjusting insulin doses when blood glucose levels are predicted to go out of
range. Blood glucose level prediction is a challenging task for AI
researchers with the potential to improve the health and well-being of
people with diabetes. In the Machine Learning Blood Glucose Level Prediction
(BGLP) Challenge, researchers came together to compare the efficacy of
different machine learning prediction approaches on a standard set of real
patient data.
   The workshop received 22 submissions that were peer-reviewed by at least
two reviewers each. After the review phase, 9 technical papers and 7 BGLP
Challenge papers were accepted for presentation at the workshop. The current
trend of applying deep learning is visible among the accepted papers as
well: three papers use deep learning methods on health care data, while the
other methods used include case-based reasoning, natural language processing
and time series analysis. Another trend seen in the presentations was the
need for open data sets that can drive the field forward and allow
researchers to build on each other’s work. This topic was addressed by the
invited talk as well as by the included BGLP Challenge.

Invited Speaker: Jesse D. Raffa, MIT Critical Data, Cambridge, MA

   Bio: Dr. Jesse Raffa, PhD, is a Research Scientist at the Laboratory of
Computational Physiology at MIT. His background is in data science,
biostatistics and epidemiology, having completed a PhD in biostatistics from
the University of Waterloo, Canada, and more recently, a postdoc at the
University of Washington, USA. His methodological interests include modeling
complex longitudinal data and reproducible research. He has collaborated
with colleagues in a diverse set of fields including virology, addiction,
psychiatry, the social sciences, genetics and critical care. He was the
recipient of the New Investigator of the Year for Clinical Science by the
Canadian Association of HIV/AIDS Research in 2004, and winner of a
Distinguished Student Paper award by the Eastern North American section of
the International Biometric Society in 2013. Dr. Raffa has organized several
critical care datathons since joining the Laboratory of Computational
Physiology, has chaired a session at the Joint Statistical Meetings, was a
core-instructor for the MITx massive open online course, Global Health
Informatics to Improve Quality of Care, and created the residential MIT
course: HST.953, Collaborative Data Science in Medicine.
   Title: The Global Open Source Severity of Illness Scale (GOSSIS):
Opportunities and Challenges
   Abstract: This talk provides an overview of MIT Critical Data’s effort to
develop a global open severity of illness scale for critical care patients
in collaboration with international partners. There is an increasing need
for an openly available severity of illness scale which is well documented
and easy to deploy. Many current offerings exist, but they are often
proprietary, expensive, or developed at a single center or in a single
geographic region. Thus far, we have collaborators who have contributed data
from North and South America, Asia and Oceania. We will discuss the current
technical approach for handling this heterogeneous set of data, where
differences in data collection practices and patient case mix can severely
affect the ability to predict patient outcomes. The ability of models
trained in one setting to perform well when applied in other settings will
also be explored, with the ultimate aim to foster international
collaboration in critical care research.


Accepted Papers
The following technical papers presenting original research works were
accepted.
   A number of papers address problems with activity monitoring using
sensors and wearables. Holmes et al. present an approach for analysing
patient domestic activity as part of recovery monitoring after surgery. The
work uses a combination of ambient and wearable sensors to measure variables
such as location, intensity of movement and types of physical activity. The
authors present results from a real scenario in an out-of-lab environment to
back their work.
   Diaz et al. present a methodology for analysing changes in physical
activity in school children following specific intervention programs. The
work describes how different activity levels are recognised from raw
accelerometer data and how length and frequency are computed for different
activity levels. Clustering is then used to determine physical activity
changes in the trial group, in comparison with a control group.
   Vonstad et al. present a machine learning model to classify the quality
of movements measured using a camera tracking system. The authors
specifically focus on classifying weight-shifting movements, common in
stroke rehabilitation patients, and evaluate different classification
algorithms (Random Forests, K-nearest Neighbour and Support Vector
Machines).
   Massie et al. present preliminary work to exploit sensor data to develop
a fall prediction system for the residents of FitHomes, a Scottish Smart
Home initiative led by Albyn Housing Society Ltd, UK.
   Finally, Agrawal et al. describe data gathering and analysis for a
virtual trainer to help nursing and care professionals move patients safely,
using a Kinect camera and pressure sensors on shoes to collect data for
annotating as correct or incorrect (with error label).
   The proceedings also include a number of papers implementing Machine
Learning approaches for a number of prediction and classification tasks,
including Natural Language Processing. To begin with, Biagi et al. propose
the use of Compositional Data (CoDa) analysis to classify daily blood
glucose patterns in people with type 1 diabetes (T1DM) based on features
including the proportions of the day spent in different glycemic regions as
well as glucose concentration patterns on different days for the patient.
This work can provide valuable insight to care providers if patterns found
can be linked with daily activities, with interesting preliminary results
presented in the paper. Gupta et al. present an application using
pre-trained TimeNets to learn features from time series patient health data
to predict mortality in the ICU. Finally, Adduru et al. present a novel
method to create a paraphrase corpus from webpages discussing medical
topics.
   The BGLP Challenge papers describe blood glucose (BG) level prediction
approaches and the corresponding results on the OhioT1DM dataset. The
participants used a wide variety of machine learning approaches, ranging
from simple autoregressive models and ridge regression, to more
sophisticated deep learning models such as recurrent neural networks (RNN)
with long short-term memory (LSTM) units, dilated RNNs, or temporal
convolution networks (TCN).
   Martinsson et al. use an LSTM-based approach in two instantiations: one
that predicts just the BG level, trained with a mean squared error (MSE)
loss, and one that predicts both the mean and variance of BG levels under a
Gaussian distribution, trained with a negative log-likelihood loss. Both
models use only a 30-minute history of blood glucose behavior and achieve
comparable root mean squared error (RMSE) results. Analysis of patient-level
results shows that a predicted higher variance correlates with a higher
RMSE.
   Chen et al. use a 3-layered Dilated RNN (DRNN), which enables a vanilla
RNN to learn temporal dependencies at different resolutions, thus capturing
the difference in temporal effects on blood glucose due to previous BG
levels, carbohydrate intake, and insulin. When training data is small due to
large regions of missing data, the subject-specific DRNN models are
pre-trained on 10% of data from the other subjects. The trained DRNN models
process a history of 30 minutes and are instantiated with vanilla RNN cells,
which are shown to obtain better results compared to the more complex LSTM
cells and gated recurrent units (GRU).
   Zhu et al. use a WaveNet approach containing blocks of 5-layered Dilated
CNN (DCNN). Like Chen et al. (this volume), they predict blood glucose using
a history of BG levels, carbohydrate intake, insulin, and normalized time.
To account for large blocks of missing data, the training set for each
subject is augmented with 10% of data from the other subjects. Experimental
results show that using carbohydrate intake, insulin, and normalized time
improves the RMSE, compared with using BG levels alone as input.
   Midroni et al. explore gradient-boosted decision trees (XGBoost), Random
Forests (RF), and RNNs, using a diverse set of features and their
combinations. The best test results are obtained using XGBoost. Subsequent
feature ablation experiments with XGBoost show that optimal test RMSE is
obtained using only blood glucose and self-reported information such as
meals, finger-stick glucose, stress, illness, exercise and work.
   Bertachi et al. use feed-forward neural networks (NNs) on two tasks: BG
prediction and hypoglycemia prediction. For BG prediction, physiological
model equations estimate the insulin on board, the glucose absorption rate,
and the activity on board. Together with two BG-derived features, these are
used as input to a set of independently trained NNs, one for each of 6
predefined BG ranges.
   Contreras et al. use a grammatical evolution approach to generate models
of BG dynamics, based on the output of the same physiological equations used
by Bertachi et al. (this volume). They also introduce a sinusoidal term to
account for the circadian variations of the patients’ physiology. Besides
the standard RMSE, two additional metrics are used to define the fitness
function: the glucose-specific RMSE and the Clarke error grid. Corresponding
results are then reported for 30, 60, and 90 minute predictions.
   Xie and Wang compare the classic autoregression with exogenous inputs
(ARX) with a set of popular ML algorithms, including XGBoost, support vector
regression (SVR), as well as deep learning models such as LSTMs and temporal
convolution networks (TCNs). All models use the total insulin delivery rate,
meal sizes, and the heart rate. Two multi-step
prediction strategies are defined and evaluated: a recursive method that
predicts one step ahead multiple times, and a direct method that predicts
multiple steps ahead. Of all the models tested, the simple ARX model is
shown to obtain the best RMSE on test data.
   The final session of the workshop was a community discussion with a panel
of the workshop organizers chaired by Razvan Bunescu. During this session
the panelists discussed the challenges faced when creating open data sets in
the health sciences and encouraged the community to open their source code.
There was a clear agreement that sharing all types of sources is beneficial
and should be encouraged.
   We very much appreciate the support of the workshop chair, Kevin
Leyton-Brown, as well as this year’s conference chair Jeffrey S. Rosenschein
and program chair Jérôme Lang. Further, we would like to thank Fredrik
Heintz for the local arrangements and Vesna Sabljakovic-Fritz for her
administrative support.
   We sincerely hope that the participants enjoyed this year’s workshop
program and that this collection of papers will inspire and encourage more
AI-related research for and within healthcare in the future.

                             Kerstin Bach, Razvan Bunescu,
                                  Oladimeji Farri, Aili Guo,
                                Sadid Hasan, Zina Ibrahim,
                                Cindy Marling, Jesse Raffa,
                              Jonathan Rubin, Honghan Wu

                                       Stockholm, July 2018

Program Committee
 • Imon Banerjee, Stanford University
 • Isabelle Bichindaritz, State University of New York at Oswego
 • Ali Cinar, Illinois Institute of Technology
 • Kevin Bretonnel Cohen, University of Colorado School of Medicine
 • José Manuel Colmenar, Universidad Rey Juan Carlos
 • Bryan Conroy, Philips Research North America
 • Sergio Consoli, Philips Research, Data Science Department
 • Alexandra Constantin, Bigfoot Biomedical
 • Vivek V Datla, Philips Research North America
 • Spiros Denaxas, University College London
 • Franck Dernoncourt, Massachusetts Institute of Technology
 • Andrea Facchinetti, University of Padova
 • Michele Filannino, MIT
 • Pau Herrero, Imperial College London
 • Ignacio Hidalgo, Universidad Complutense de Madrid
 • Yuan Ling, Philips Research North America
 • Bo Liu, Auburn University
 • Stewart Massie, Robert Gordon University
 • Claudia Moro, PUCPR
 • Tristan Naumann, Massachusetts Institute of Technology
 • Anna Rumshisky, University of Massachusetts, Lowell
 • Sadiq Sani, Robert Gordon University Aberdeen
 • Alexander Schliep, Gothenburg University
 • Rushdi Shams, OneClass - Notesolution Inc.
 • Giovanni Sparacino, University of Padova
 • Shawn Stapleton, Philips Research North America
 • Ozlem Uzuner, George Mason University
 • Josep Vehi, University of Girona



