=Paper=
{{Paper
|id=Vol-2148/2.prefaceAndCommittee
|storemode=property
|title=None
|pdfUrl=https://ceur-ws.org/Vol-2148/2.prefaceAndCommittee.pdf
|volume=Vol-2148
}}
==None==
Preface: The 3rd International Workshop on Knowledge Discovery in Healthcare Data (KDH)

===Introduction===
The Knowledge Discovery in Healthcare Data (KDH) workshop series was established in 2016 to bring together AI and clinical researchers, fostering collaborative discussions and presenting AI research efforts to solve pressing problems in health care. This is the workshop's third year; it was held along with IJCAI/ECAI in Stockholm, Sweden, and focused on learning health care systems. For the first time, this workshop featured a challenge: the Machine Learning Blood Glucose Level Prediction Challenge.

The notion of the learning health care system has been put forward to denote the translation of routinely collected data into knowledge that drives the continual improvement of medical care. This notion has been described in many forms, but each follows a similar cycle of assembling, analyzing and interpreting data from multiple sources (clinical records, guidelines, patient-provided data including wearables, omic data, etc.), followed by feeding the acquired knowledge back into clinical practice. This framework aims to provide personalized recommendations and decision support tools to aid both patients and care providers, to improve outcomes and personalize care.

This framework also extends the range of actions possible in response to patient monitoring data, for example, alerting patients or automatically adjusting insulin doses when blood glucose levels are predicted to go out of range. Blood glucose level prediction is a challenging task for AI researchers with the potential to improve the health and well-being of people with diabetes. In the Machine Learning Blood Glucose Level Prediction (BGLP) Challenge, researchers came together to compare the efficacy of different machine learning prediction approaches on a standard set of real patient data.

The workshop received 22 submissions that were peer-reviewed by at least two reviewers each. After the review phase, 9 technical papers and 7 BGLP Challenge papers were accepted for presentation at the workshop. Among the accepted papers, the current trend of applying deep learning can be seen here as well: three papers use deep learning methods on health care data, while other methods used are case-based reasoning, natural language processing and time series analysis. Another trend seen in the presentations was the need for open data sets that can drive the field forward and let researchers build on each other's work. This topic was addressed by the invited talk as well as by the included BGLP Challenge.

===Invited Speaker: Jesse D. Raffa, MIT Critical Data, Cambridge, MA===
Bio: Dr. Jesse Raffa, PhD, is a Research Scientist at the Laboratory of Computational Physiology at MIT. His background is in data science, biostatistics and epidemiology, having completed a PhD in biostatistics from the University of Waterloo, Canada, and more recently, a postdoc at the University of Washington, USA. His methodological interests include modeling complex longitudinal data and reproducible research. He has collaborated with colleagues in a diverse set of fields including virology, addiction, psychiatry, the social sciences, genetics and critical care. He was the recipient of the New Investigator of the Year for Clinical Science by the Canadian Association of HIV/AIDS Research in 2004, and winner of a Distinguished Student Paper award by the Eastern North American section of the International Biometrics Society in 2013. Dr. Raffa has organized several critical care datathons since joining the Laboratory of Computational Physiology, has chaired a session at the Joint Statistical Meetings, was a core-instructor for the MITx massive open online course, Global Health Informatics to Improve Quality of Care, and created the residential MIT course HST.953, Collaborative Data Science in Medicine.

Title: The Global Open Source Severity of Illness Scale (GOSSIS): Opportunities and Challenges

Abstract: This talk provides an overview of MIT Critical Data's effort to develop a global open severity of illness scale for critical care patients in collaboration with international partners. There is an increasing need for an openly available severity of illness scale which is well documented and easy to deploy. Many current offerings exist, but they are often proprietary, expensive or developed at a single center or geographic region. Thus far, we have collaborators who have contributed data from North and South America, Asia and Oceania. We will discuss the current technical approach for handling this heterogeneous set of data, where differences in data collection practices and patient case mix can severely affect the ability to predict patient outcomes. The ability of models trained in one setting and applied in other settings will also be explored, with the ultimate aim to foster international collaboration in critical care research.

===Accepted Papers===
The following technical papers presenting original research works were accepted.

A number of papers address problems with activity monitoring using sensors and wearables. Holmes et al. present an approach for analysing patient domestic activity as part of recovery monitoring after surgery. The work uses a combination of ambient and wearable sensors to measure variables such as location, intensity of movement and types of physical activity. The authors present results from a real scenario in an out-of-lab environment to back their work.

Diaz et al. present a methodology for analysing changes in physical activity in school children following specific intervention programs. The work describes how different activity levels are recognised from raw accelerometer data and how length and frequency are computed for different activity levels. Clustering is then used to determine physical activity changes in the trial group, in comparison with a control group.

Vonstad et al. present a machine learning model to classify the quality of movements measured using a camera tracking system. The authors specifically focus on classifying weight-shifting movements, common in stroke rehabilitation patients, and evaluate different classification algorithms (Random Forests, K-nearest Neighbour and Support Vector Machines).

Massie et al. present preliminary work to exploit sensor data to develop a fall prediction system for the residents of FitHomes, a Scottish Smart Home initiative led by Albyn Housing Society Ltd, UK.

Finally, Agrawal et al. describe data gathering and analysis for a virtual trainer to help nursing and care professionals move patients safely, using a Kinect camera and pressure sensors on shoes to collect data for annotating as correct or incorrect (with error label).

The proceedings also include a number of papers implementing Machine Learning approaches for a number of prediction and classification tasks, including Natural Language Processing. To begin with, Biagi et al. propose the use of Compositional Data (CoDa) analysis to classify daily blood glucose patterns in people with type 1 diabetes (T1DM) based on features including the proportions of the day spent in different glycemic regions as well as glucose concentration patterns on different days for the patient. This work can provide valuable insight to care providers if patterns found can be linked with daily activities, with interesting preliminary results presented in the paper. Gupta et al. present an application using pre-trained TimeNets to learn features from time series patient health data to predict mortality in the ICU. Finally, Adduru et al. present a novel method to create a paraphrase corpus from webpages discussing medical topics.

The BGLP Challenge papers describe blood glucose (BG) level prediction approaches and the corresponding results on the OhioT1DM dataset. The participants used a wide variety of machine learning approaches, ranging from simple autoregressive models and ridge regression, to more sophisticated deep learning models such as recurrent neural networks (RNN) with long short-term memory (LSTM) units, dilated RNNs, or temporal convolution networks (TCN).

Martinsson et al. use an LSTM-based approach in two instantiations: one that predicts just the BG level, trained with a mean squared error (MSE) loss, and one that predicts both the mean and variance of BG levels under a Gaussian distribution, trained with a negative log-likelihood loss. Both models use only a 30-minute history of blood glucose behavior and achieve comparable RMSE results. Analysis of patient-level results shows that a predicted higher variance correlates with a higher RMSE.

Chen et al. use a 3-layered Dilated RNN (DRNN), which enables a vanilla RNN to learn temporal dependencies at different resolutions, thus capturing the difference in temporal effects on blood glucose due to previous BG levels, carbohydrate intake, and insulin. When training data is small due to large regions of missing data, the subject-specific DRNN models are pre-trained on 10% of data from the other subjects. The trained DRNN models process a history of 30 minutes and are instantiated with vanilla RNN cells, which are shown to obtain better results compared to the more complex LSTM cells and gated recurrent units (GRU).

Zhu et al. use a WaveNet approach containing blocks of 5-layered Dilated CNN (DCNN). Like Chen et al. (this volume), they predict blood glucose using a history of BG levels, carbohydrate intake, insulin, and normalized time. To account for large blocks of missing data, the training set for each subject is augmented with 10% of data from the other subjects. Experimental results show that using carbohydrate intake, insulin, and normalized time improves the RMSE, compared with using BG levels alone as input.

Midroni et al. explored gradient-boosted decision trees (XGBoost), Random Forests (RF), and RNNs, using a diverse set of features and their combinations. The best test results are obtained using XGBoost. Subsequent feature ablation experiments with XGBoost show that optimal test RMSE is obtained using only blood glucose and self-reported information such as meals, finger-stick glucose, stress, illness, exercise and work.

Bertachi et al. use feed-forward neural networks (NNs) on two tasks: BG prediction and hypoglycemia prediction. For BG prediction, physiological model equations estimate the insulin on board, the glucose absorption rate, and the activity on board. Together with two BG-derived features, these are used as input to a set of independently trained NNs, one for each of 6 predefined BG ranges.

Contreras et al. use a grammatical evolution approach to generate models of BG dynamics, based on the output of the same physiological equations used by Bertachi et al. (this volume). They also introduce a sinusoidal term to account for the circadian variations of the patients' physiology. Besides the standard RMSE, two additional metrics are used to define the fitness function: the glucose-specific RMSE and the Clarke error grid. Corresponding results are then reported for 30, 60, and 90 minute predictions.

Xie and Wang compare the classic autoregression with exogenous inputs (ARX) with a set of popular ML algorithms, including XGBoost, support vector regression (SVR), as well as deep learning models such as LSTMs and temporal convolution networks (TCNs). All models use the total insulin delivery rate, meal sizes, and the heart rate. Two multi-step prediction strategies are defined and evaluated: a recursive method that predicts one step ahead multiple times, and a direct method that predicts multiple steps ahead. Of all the models tested, the simple ARX model is shown to obtain the best RMSE on test data.

The final session of the workshop was a community discussion with a panel of the workshop organizers chaired by Razvan Bunescu. During this session the panelists discussed the challenges faced when creating open data sets in the health sciences as well as encouraging the community to open their source code. There was a clear agreement that sharing all types of sources is beneficial and should be encouraged.

We very much appreciate the support of the workshop chair, Kevin Leyton-Brown, as well as this year's conference chair Jeffrey S. Rosenschein and program chair Jérôme Lang. Further, we would like to thank Fredrik Heintz for the local arrangements and Vesna Sabljakovic-Fritz for her administrative support.

We sincerely hope that the participants enjoyed this year's workshop program and that this collection of papers will inspire and encourage more AI-related research for and within healthcare in the future.

Kerstin Bach, Razvan Bunescu, Oladimeji Farri, Aili Guo, Sadid Hasan, Zina Ibrahim, Cindy Marling, Jesse Raffa, Jonathan Rubin, Honghan Wu

Stockholm, July 2018

===Program Committee===
* Imon Banerjee, Stanford University
* Isabelle Bichindaritz, State University of New York at Oswego
* Ali Cinar, Illinois Institute of Technology
* Kevin Bretonnel Cohen, University of Colorado School of Medicine
* José Manuel Colmenar, Universidad Rey Juan Carlos
* Bryan Conroy, Philips Research North America
* Sergio Consoli, Philips Research, Data Science Department
* Alexandra Constantin, Bigfoot Biomedical
* Vivek V Datla, Philips Research North America
* Spiros Denaxas, University College London
* Franck Dernoncourt, Massachusetts Institute of Technology
* Andrea Facchinetti, University of Padova
* Michele Filannino, MIT
* Pau Herrero, Imperial College London
* Ignacio Hidalgo, Universidad Complutense de Madrid
* Yuan Ling, Philips Research North America
* Bo Liu, Auburn University
* Stewart Massie, Robert Gordon University
* Claudia Moro, PUCPR
* Tristan Naumann, Massachusetts Institute of Technology
* Anna Rumshisky, University of Massachusetts, Lowell
* Sadiq Sani, Robert Gordon University Aberdeen
* Alexander Schliep, Gothenburg University
* Rushdi Shams, OneClass - Notesolution Inc.
* Giovanni Sparacino, University of Padova
* Shawn Stapleton, Philips Research North America
* Ozlem Uzuner, George Mason University
* Josep Vehi, University of Girona