Personalised Glucose Prediction via Deep Multitask Networks

John Daniels, Pau Herrero and Pantelis Georgiou1

1 Imperial College London, United Kingdom, email: jsd111@imperial.ac.uk, p.herrero-vinias@imperial.ac.uk, pantelis@imperial.ac.uk

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Glucose control is an essential requirement in primary therapy for diabetes management. Digital approaches to maintaining tight glycaemic control, such as clinical decision support systems and artificial pancreas systems, rely on continuous glucose monitoring devices and self-reported data, and their performance is usually improved through glucose forecasting. In this work, we develop a multitask approach using convolutional recurrent neural networks (MTCRNN) to provide short-term forecasts on the OhioT1DM dataset, which comprises 12 participants. We obtain the following results. 30 min: 19.79±0.06 mg/dL (RMSE), 13.62±0.05 mg/dL (MAE); 60 min: 33.73±0.24 mg/dL (RMSE), 24.54±0.15 mg/dL (MAE). Multitask learning facilitates an approach that allows learning from the data of all available subjects, thereby overcoming the common challenge of insufficient individual datasets, while still learning an appropriate individual model for each participant.

1 INTRODUCTION

In recent years, the proliferation of biosensors and wearable devices has facilitated the ability to perform continuous monitoring of physiological signals. In diabetes management, this has come with the increasing use of continuous glucose monitoring (CGM) devices for helping with glucose control. The current literature on the clinical impact of CGM devices shows that continuously monitoring blood glucose concentration levels has benefit in maintaining tight glycaemic control [5, 2]. As a next step, glucose prediction offers an opportunity to further improve glucose control by enabling actions that avert adverse glycaemic events, such as the suspension of insulin delivery in closed-loop systems to avert hypoglycaemia.

Work in this area has typically involved collecting data covering physiological variables such as glucose concentration levels and heart rate, together with self-reported data covering exercise, sleep, stress, illness, insulin, and meals. However, public datasets covering ambulatory monitoring of the T1DM population are not widely available.

Deep learning [6] facilitates learning the optimal features and has been shown to perform better than other methods involving hand-crafted features that have been employed in recent times for predicting glucose concentration levels. However, these models typically require relatively large amounts of data to converge on an appropriate model.

In this work, we employ a multitask learning [1] approach in order to improve the performance of glucose forecasting in a neural network, where each individual is viewed as a task and shared layers enable learning from other individuals.

2 RELATED WORK

Glucose prediction has been a long-standing area of focus in the diabetes community, and many approaches have been proposed to provide near-term forecasts of glucose concentration levels. Early work in this area focused on physiological models and traditional machine learning methods for predicting glucose concentration levels [12, 3]. Recent work, as seen in the 2018 Blood Glucose Predictive Challenge, has moved towards deep learning methods with more impressive results [11, 9, 14, 8]. These have used convolutional architectures, recurrent architectures, or a combination of both to model the task of glucose prediction.

3 DATASET AND DATA PREPROCESSING

In this section, we detail the transformations that are performed on the data prior to training and testing the model for each T1DM participant.

3.1 OhioT1DM Dataset 2020

The OhioT1DM dataset 2020 [10] comprises 12 unique participants and covers eight weeks of daily living. The participants are given IDs as the data is anonymised. The data comprises physiological data gathered using a continuous glucose monitor (blood glucose concentration levels) and a wristband device (heart rate, skin conductance, skin temperature), activity data (acceleration, step count), and self-reported data (meal intake, insulin, exercise, work, sleep, and stressors).

3.2 Dealing with Missing Values

A non-trivial aspect of the datasets used for developing glucose prediction models is missingness. This is evident in the OhioT1DM dataset, with missingness present in both the physiological variables and the self-reported data [4].

Linear Interpolation: The blood glucose values that are missing in this dataset are typically missing at random. This can be attributed to issues around replacing glucose sensors and/or transmitters, or to faulty communication. As a result, we employ linear interpolation in the training set to impute missing blood glucose concentration levels over gaps of up to one hour. Samples in which more than an hour of CGM data is missing are discarded from the training set. This is illustrated with an example sequence in (C) of Fig. 1.

On the other hand, for features which comprise self-reported data, the assumption is made that any missing values represent an absence of said feature.

Figure 1. A visualisation of the imputation methods employed in this work. In (A) the input sequence has at least 30 minutes of recent values missing (e.g. linear extrapolation). (B) shows the imputation scheme during testing when more than 30 minutes of recent values are missing (zero-order hold). Finally, (C) shows the imputation scheme when the missing values of the input sequence are located between real values (linear interpolation).
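As a concrete illustration, the training-set rule above (linear interpolation of interior CGM gaps of up to one hour, i.e. 12 samples at the usual 5-minute CGM resolution, with longer gaps left missing so the affected windows can be discarded) can be sketched as follows. This is a minimal sketch and not the authors' code; the function name and the NaN encoding of missing samples are illustrative assumptions:

```python
import numpy as np

def impute_training_gaps(cgm, max_gap=12):
    """Linearly interpolate interior CGM gaps of at most `max_gap` samples.

    At a 5-minute CGM sampling rate, max_gap=12 corresponds to a one-hour
    limit; longer (or leading/trailing) gaps stay NaN so the affected
    windows can be dropped from the training set.
    """
    out = np.asarray(cgm, dtype=float).copy()
    n = len(out)
    i = 0
    while i < n:
        if np.isnan(out[i]):
            j = i
            while j < n and np.isnan(out[j]):
                j += 1  # find the end of this run of missing values
            # Interpolate only interior gaps that are short enough.
            if i > 0 and j < n and (j - i) <= max_gap:
                out[i:j] = np.interp(np.arange(i, j),
                                     [i - 1, j], [out[i - 1], out[j]])
            i = j
        else:
            i += 1
    return out
```

Self-reported features (insulin, meal intake, exercise) would instead be zero-filled, e.g. `np.nan_to_num(meals)`, reflecting the assumption that a missing entry means the event did not occur.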
Therefore, all missing values in insulin, meal intake and reported exercise are imputed with zero.

The missingness in the self-reported features of the testing set is tackled in the same way as in the training set. However, this is not the case for blood glucose concentration levels, as interpolating when a current value at a given timestep is missing would lead to an inaccurate evaluation of model performance.

Extrapolation: In order to accurately evaluate the performance of the model, we cannot always rely on interpolation at test time, as this may require, in a real-time setting, an unknown future value to perform the interpolation. Consequently, we need to rely on other methods of extrapolation to impute the missing glucose concentration levels. In scenario (A), for gaps of less than 30 minutes, we impute missing values with predicted values from the trained model. For missing recent values spanning longer than 30 minutes, as in (B), we pad the remaining values with the last computed value. In cases where a gap larger than 30 minutes is evident in the historical data and a current value is present at the given timestep, linear interpolation is employed instead to provide a more accurate imputation.

3.3 Standardisation

To enable training the proposed model effectively, we transform the relevant input features (blood glucose concentration, insulin bolus, meal (carbohydrate) intake, and reported exercise). The blood glucose concentration levels are scaled down by a factor of 120. Similarly, the insulin bolus values are scaled by 100 and the meal intake values by 200, so that all features lie in a similar range. The exercise values are transformed from the recorded exercise intensity, on a range from 1-10, to a simple binary representation of the presence or absence of exercise.

4 METHODS

In this section, we detail the machine learning technique used to learn personalised models with the entire dataset. We describe the approach taken to develop the deep multitask network for personalisation, and we provide a summary of the hyperparameters used in training as well as the setup of the input for personalised multitask learning.

4.1 Multitask Learning

Multitask learning is an approach in machine learning that can be broadly described as a method of learning multiple tasks simultaneously with the aim of improving generalisation [1].

Multitask learning for personalisation has been used mainly in affective computing [13], with early work in diabetes management focusing on using multitask learning to develop prediction models for clustered groups of Type 1, Type 2, and non-diabetic participants [7], rather than leveraging similarities within groups, such as gender, for personalised glucose predictions.

As seen in Figure 2, the output from the shared layers is fed into the individual (task)-specific fully connected layers of each user. In a multitask setting of this kind, a multiplicative gating approach is used to ensure that the individual-specific layers of a particular user train only on that user's input. In that sense, at each iteration a batch consisting of data from a particular individual is used to train the shared layers and the layers specific to that individual.

4.2 CRNN Model

The deep learning model trained in the multitask learning setting is the convolutional recurrent neural network (CRNN) proposed by Li et al. [8] to perform short-term glucose prediction. This forms the basis of the single-task (STL) model. The convolutional recurrent model consists initially of 3 temporal convolutional layers that perform a 1-D convolution with a Gaussian kernel over the input sequence to extract features with various rates of appearance, with a max pooling layer after each convolution operation. The input is a 4-dimensional sequence covering a 2-hour window of historical data.

The convolutional layers perform feature extraction and feed into a recurrent long short-term memory (LSTM) layer that is better able to model the temporal nature of the task. The output from the shared layers feeds into the fully connected layers of each user, which then provide the change in glucose value over the prediction horizon. This change is added to the current glucose value to provide the forecast glucose concentration level.

Figure 2. A detailed look at the formulation of convolutional recurrent networks in a multitask setting. In this setting, each user is represented as a task. The initial layers (convolutional and recurrent layers) are shared between all users, the next two (dense) layers are shared based on gender, and the last (dense) layer is specific to each user.

4.3 Loss Function

The loss function used for converging to an appropriate model for glucose forecasting is the mean absolute error, expressed as:

    \mathcal{L}(y, \hat{y}) = \frac{1}{N_{batch}} \sum_{k=1}^{N_{batch}} |y_k - \hat{y}_k|    (1)

where ŷ denotes the predicted results given the historical data, y denotes the reference change in glucose concentration over the relevant prediction horizon, and N_batch refers to the batch size.

The repository for the code accompanying the paper can be found at: https://github.com/jsmdaniels/ecai-bglp-challenge

4.4 Hyperparameters

The following table provides the details of the hyperparameters used for the model architecture at each layer.

Table 1. A table detailing the size and dimensions of layers in the multitask CRNN model (MTCRNN).

    Layer | Description         | Output Dimensions | No. of Parameters
    Shared Convolutional Layers (Batch × Steps × Channels)
    (1)   | 1×4 conv            | 128(1) × 24 × 8   | 104
          | max pooling, size 2 | 128(1) × 12 × 8   | −
    (2)   | 1×4 conv            | 128(1) × 12 × 16  | 528
          | max pooling, size 2 | 128(1) × 6 × 16   | −
    (3)   | 1×4 conv            | 128(1) × 6 × 32   | 2080
          | max pooling         | 128(1) × 3 × 32   | −
    Shared Recurrent Layer (Batch × Cells)
    (4)   | lstm                | 128(1) × 64       | 24832
    Sub-cluster Dense Layers (Batch × Units)
    (5)   | dense               | 128(1) × 256      | 16640
    (6)   | dense               | 128(1) × 32       | 8224
    Individual-Specific Dense Layers (Batch × Units)
    (7)   | dense               | 128(1) × 1        | 33

The optimiser used for this work is Adam. The learning rate is 0.0053. The model is trained for 200 epochs. This value was obtained through grid search optimisation. The model is developed on Keras 2.2.2 with a Tensorflow 1.5 backend, and training is performed on an NVIDIA GTX 1050 GPU.

5 RESULTS

5.1 Evaluation Metrics

The model is tested on data from six participant IDs: 540, 544, 552, 567, 584, 596.

The evaluation of the model is based on two metrics: the root mean square error (RMSE) and the mean absolute error (MAE). The extrapolated points are not considered in calculating these metrics. The formulations of these metrics are provided below:

    \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (y_k - \hat{y}_k)^2},    (2)

    \mathrm{MAE} = \frac{1}{N} \sum_{k=1}^{N} |y_k - \hat{y}_k|.    (3)

where ŷ denotes the predicted results given the historical data, y denotes the reference glucose measurement, and N refers to the data size.

In order to undertake a comprehensive evaluation of the model performance, the following assessment criteria are used:

• Performance evaluation over 30-minute and 60-minute prediction horizons (PH): the RMSE and MAE for each participant are analysed over the same number of values for both prediction horizons.
• Comparison of training settings: the performance of the multitask learning (MTL) approach is evaluated against that of a single-task learning (STL) approach which uses only patient-specific data.
• Multiple runs for each participant ID: the multitask CRNN (MTCRNN) model uses randomly initialised weights at the start of training. Given the variable nature of this training procedure, the results reported are the average of 5 model runs.

The unit for the results reported below is mg/dL. The best performance is in bold.
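Equations (2) and (3) translate directly into code. The following NumPy sketch shows the computation (variable names are illustrative, and the paper's exclusion of extrapolated points would be applied to the inputs beforehand):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, as in Eq. (2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error, as in Eq. (3)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))
```

In the paper's protocol, these metrics would be computed per participant at each prediction horizon and then averaged over the 5 model runs.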
Table 2. A table showing the prediction performance for 30 minutes: the RMSE and MAE results of the six participants over 5 runs (CRNN).

             MTL                       STL
    ID       RMSE         MAE          RMSE         MAE
    540      21.19±0.07   15.17±0.06   22.45±0.39   16.21±0.34
    544      16.82±0.09   11.72±0.06   18.63±1.59   12.57±0.23
    552      16.30±0.12   11.92±0.03   17.11±0.24   12.68±0.49
    567      24.12±0.17   15.55±0.03   24.73±0.45   16.01±0.71
    584      23.66±0.20   15.77±0.08   24.30±0.48   16.20±0.23
    596      16.63±0.15   11.59±0.09   16.78±0.20   12.00±1.77
    Average  19.79±0.06   13.62±0.05   20.67±0.32   14.28±0.19

Table 3. A table showing the prediction performance for 60 minutes: the RMSE and MAE results of the six participants over 5 runs (CRNN).

             MTL                       STL
    ID       RMSE         MAE          RMSE         MAE
    540      38.29±0.29   28.60±0.17   41.06±0.24   30.33±0.69
    544      28.97±0.24   20.77±0.20   29.60±0.37   20.52±0.17
    552      29.35±0.27   22.07±0.13   30.32±0.10   22.53±0.13
    567      40.19±0.79   28.77±0.13   40.09±0.64   27.71±0.13
    584      37.82±0.78   26.88±0.37   37.22±0.34   26.64±0.41
    596      27.74±0.11   20.12±0.14   28.13±0.48   20.30±0.41
    Average  33.73±0.24   24.54±0.08   34.40±0.14   24.67±0.14

Figure 3. A graph showing the predictive performance of the model on participant ID 596 at a 30-minute prediction horizon.

Figure 4. A graph showing the predictive performance of the model on participant ID 596 at a 60-minute prediction horizon.

6 DISCUSSION

As seen in Table 3, the results provide a comprehensive evaluation of the model's predictive performance. Evidently, the model performance at PH = 30 minutes is better than at PH = 60 minutes, given that prediction at 60 minutes is a more complex task than prediction at 30 minutes. Figures 3 and 4 exhibit the differences in performance over the same window for participant 596. The increased lag and reduced predictive performance can also be attributed to the higher chance of external events (insulin, meals, exercise) that influence the blood glucose trajectory occurring over the longer prediction horizon.

The best predictive performances were achieved by the model for IDs 544, 552, and 596, whereas IDs 540, 567, and 584 exhibited worse performance over both the 30- and 60-minute prediction horizons. An investigation of the glycaemic variability of the training sets, using the coefficient of variation (CV) [2], shows that the former set of participants is stable (CV ≤ 36%) whereas the latter group is labile (CV > 36%).

The multitask learning approach definitively performs better than the single-task approach over a 30-minute prediction horizon. However, the performance improvement of the MTL approach over a 60-minute prediction horizon is not consistent across every participant and metric.

One potential issue with multitask learning is negative transfer. This can be described as a scenario in which one or more of the tasks (individuals), or the batches sampled during training, are not strongly correlated, degrading the learning in the shared layers and subsequently the performance at test time.

7 CONCLUSION

In this work, we have presented a multitask convolutional recurrent neural network that is capable of performing short-term personalised predictions: 19.79±0.06 mg/dL (RMSE) and 13.62±0.05 mg/dL (MAE) at 30 minutes, as well as 33.73±0.24 mg/dL (RMSE) and 24.54±0.15 mg/dL (MAE) at 60 minutes. We work towards leveraging population data while still learning a personalised model. In the future, we hope to address further challenges, such as negative transfer during learning, that could improve the accuracy of individual models. This approach would enable more accurate models to be deployed in the face of limited personal data.

ACKNOWLEDGEMENTS

This work is supported by the ARISES project (EP/P00993X/1), funded by the Engineering and Physical Sciences Research Council.

REFERENCES

[1] Rich Caruana, 'Multitask Learning', Machine Learning, 28(1), 41–75, (July 1997).
[2] Antonio Ceriello, Louis Monnier, and David Owens, 'Glycaemic variability in diabetes: clinical and therapeutic implications', The Lancet Diabetes & Endocrinology, (August 2018).
[3] E. I. Georga, V. C.
Protopappas, D. Ardigò, M. Marina, I. Zavaroni, D. Polyzos, and D. I. Fotiadis, 'Multivariate Prediction of Subcutaneous Glucose Concentration in Type 1 Diabetes Patients Based on Support Vector Regression', IEEE Journal of Biomedical and Health Informatics, 17(1), 71–81, (January 2013).
[4] Marzyeh Ghassemi, Tristan Naumann, Peter Schulam, Andrew L. Beam, and Rajesh Ranganath, 'Opportunities in Machine Learning for Healthcare', arXiv:1806.00388 [cs, stat], (June 2018).
[5] Giacomo Cappon, Giada Acciaroli, Martina Vettoretti, Andrea Facchinetti, and Giovanni Sparacino, 'Wearable Continuous Glucose Monitoring Sensors: A Revolution in Diabetes Treatment', Electronics, 6(3), 65, (September 2017).
[6] Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016. http://www.deeplearningbook.org.
[7] Weixi Gu, Zimu Zhou, Yuxun Zhou, Miao He, Han Zou, and Lin Zhang, 'Predicting Blood Glucose Dynamics with Multi-time-series Deep Learning', in Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems - SenSys '17, pp. 1–2, Delft, Netherlands, (2017). ACM Press.
[8] K. Li, J. Daniels, C. Liu, P. Herrero-Vinas, and P. Georgiou, 'Convolutional Recurrent Neural Networks for Glucose Prediction', IEEE Journal of Biomedical and Health Informatics, 1–1, (2019).
[9] Kezhi Li, Chengyuan Liu, Taiyu Zhu, Pau Herrero, and Pantelis Georgiou, 'GluNet: A Deep Learning Framework for Accurate Glucose Forecasting', IEEE Journal of Biomedical and Health Informatics, 24(2), 414–423, (February 2020).
[10] Cindy Marling and Razvan Bunescu, 'The OhioT1DM Dataset for Blood Glucose Level Prediction', in The 5th International Workshop on Knowledge Discovery in Healthcare Data, (2020). CEUR proceedings in press. Available at http://smarthealth.cs.ohio.edu/bglp/OhioT1DM-dataset-paper.pdf.
[11] John Martinsson, Alexander Schliep, Björn Eliasson, and Olof Mogren, 'Blood Glucose Prediction with Variance Estimation Using Recurrent Neural Networks', Journal of Healthcare Informatics Research, 4(1), 1–18, (March 2020).
[12] C. Pérez-Gandía, A. Facchinetti, G. Sparacino, C. Cobelli, E. J. Gómez, M. Rigla, A. de Leiva, and M. E. Hernando, 'Artificial Neural Network Algorithm for Online Glucose Prediction from Continuous Glucose Monitoring', Diabetes Technology & Therapeutics, 12(1), 81–88, (January 2010).
[13] Sara Ann Taylor, Natasha Jaques, Ehimwenma Nosakhare, Akane Sano, and Rosalind Picard, 'Personalized Multitask Learning for Predicting Tomorrow's Mood, Stress, and Health', IEEE Transactions on Affective Computing, 1–1, (2017).
[14] Taiyu Zhu, Kezhi Li, Jianwei Chen, Pau Herrero, and Pantelis Georgiou, 'Dilated Recurrent Neural Networks for Glucose Forecasting in Type 1 Diabetes', Journal of Healthcare Informatics Research, (April 2020).