Personalised Glucose Prediction via Deep Multitask Networks

John Daniels, Pau Herrero and Pantelis Georgiou1

1 Imperial College London, United Kingdom, email: jsd111@imperial.ac.uk, p.herrero-vinias@imperial.ac.uk, pantelis@imperial.ac.uk

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Glucose control is an essential requirement in primary therapy for diabetes management. Digital approaches to maintaining tight glycaemic control, such as clinical decision support systems and artificial pancreas systems, rely on continuous glucose monitoring devices and self-reported data, and their performance is usually improved through glucose forecasting. In this work, we develop a multitask approach using convolutional recurrent neural networks (MTCRNN) to provide short-term forecasts on the OhioT1DM dataset, which comprises 12 participants. We obtain the following results. 30 min: 19.79±0.06 mg/dL (RMSE), 13.62±0.05 mg/dL (MAE); 60 min: 33.73±0.24 mg/dL (RMSE), 24.54±0.15 mg/dL (MAE). Multitask learning facilitates an approach that allows learning from the data of all available subjects, thereby overcoming the common challenge of insufficient individual datasets, while still learning an appropriate individual model for each participant.

1 INTRODUCTION

In recent years, the proliferation of biosensors and wearable devices has facilitated the ability to perform continuous monitoring of physiological signals. In diabetes management, this has come with the increasing use of continuous glucose monitoring (CGM) devices for helping with glucose control. The current literature on the clinical impact of CGM devices shows that continuously monitoring blood glucose concentration levels has benefit in maintaining tight glycaemic control [5, 2]. As a next step, glucose prediction offers an opportunity to further improve glucose control by enabling actions that avert adverse glycaemic events, such as the suspension of insulin delivery in closed-loop systems to avert hypoglycaemia.

Work in this area has typically involved collecting data covering physiological variables such as glucose concentration levels and heart rate, together with self-reported data covering exercise, sleep, stress, illness, insulin, and meals. However, public datasets covering ambulatory monitoring of the T1DM population are not widely available.

Deep learning [6] facilitates learning the optimal features and has been shown to perform better than other methods involving hand-crafted features that have been employed in recent times for predicting glucose concentration levels. However, these models typically require relatively large amounts of data to converge on an appropriate model.

In this work, we employ a multitask learning [1] approach in order to improve the performance of glucose forecasting in a neural network, where each individual is viewed as a task and shared layers enable learning from other individuals.

2 RELATED WORK

Glucose prediction has been a long-standing area of focus in the diabetes community, and many approaches have been proposed to provide near-term forecasts of glucose concentration levels. Early work in this area focused on physiological models and traditional machine learning methods for predicting glucose concentration levels [12, 3]. Recent work, as seen in the 2018 Blood Glucose Predictive Challenge, has moved towards deep learning methods with more impressive results [11, 9, 14, 8]. These have used convolutional architectures, recurrent architectures, or a combination of both to model the task of glucose prediction.

3 DATASET AND DATA PREPROCESSING

In this section, we detail the transformations that are performed on the data prior to training and testing the model for each T1DM participant.

3.1 OhioT1DM Dataset 2020

The OhioT1DM dataset 2020 [10] comprises 12 unique participants and covers eight weeks of daily living. The participants are given IDs as the data is anonymised. The data comprises physiological data gathered using a continuous glucose monitor (blood glucose concentration levels) and a wristband device (heart rate, skin conductance, skin temperature), activity data (acceleration, step count), and self-reported data (meal intake, insulin, exercise, work, sleep, and stressors).

3.2 Dealing with Missing Values

A non-trivial aspect of the datasets used for developing glucose prediction models is missingness. This is evident in the OhioT1DM dataset, with missingness present in both the physiological variables and the self-reported data [4].

Linear Interpolation: The blood glucose values that are missing in this dataset are typically missing at random. This can be attributed to issues around replacing glucose sensors and/or transmitters, or to faulty communication. As a result, we employ linear interpolation in the training set to impute missing blood glucose concentration levels over gaps of up to one hour. Samples in which more than an hour of CGM data is missing are discarded from the training set. This is illustrated with an example sequence in (C) of Fig. 1.

On the other hand, for features which comprise self-reported data, the assumption is made that any missing values represent an absence of said feature.

Figure 1. A visualisation of the imputation methods employed in this work. In (A) the input sequence has at least 30 minutes of recent values missing (e.g. linear extrapolation). (B) shows the imputation scheme during testing when more than 30 minutes of recent values are missing (zero-order hold). Finally, (C) shows the imputation scheme when the missing values of the input sequence are located between real values (linear interpolation).
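As a concrete illustration, the training-set rule above (linear interpolation of interior CGM gaps of up to one hour, i.e. 12 samples at the usual 5-minute CGM resolution, with longer gaps left missing so the affected windows can be discarded) can be sketched as follows. This is a minimal sketch and not the authors' code; the function name and the NaN encoding of missing samples are illustrative assumptions:

```python
import numpy as np

def impute_training_gaps(cgm, max_gap=12):
    """Linearly interpolate interior CGM gaps of at most `max_gap` samples.

    At a 5-minute CGM sampling rate, max_gap=12 corresponds to a one-hour
    limit; longer (or leading/trailing) gaps stay NaN so the affected
    windows can be dropped from the training set.
    """
    out = np.asarray(cgm, dtype=float).copy()
    n = len(out)
    i = 0
    while i < n:
        if np.isnan(out[i]):
            j = i
            while j < n and np.isnan(out[j]):
                j += 1  # find the end of this run of missing values
            # Interpolate only interior gaps that are short enough.
            if i > 0 and j < n and (j - i) <= max_gap:
                out[i:j] = np.interp(np.arange(i, j),
                                     [i - 1, j], [out[i - 1], out[j]])
            i = j
        else:
            i += 1
    return out
```

Self-reported features (insulin, meal intake, exercise) would instead be zero-filled, e.g. `np.nan_to_num(meals)`, reflecting the assumption that a missing entry means the event did not occur.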
Therefore, all missing values in insulin, meal intake and reported exercise are imputed with zero.

The missingness in the self-reported features of the testing set is tackled in the same way as in the training set. However, this is not the case for blood glucose concentration levels, as interpolating when a current value at a given timestep is missing would lead to an inaccurate evaluation of model performance.

Extrapolation: In order to accurately evaluate the performance of the model, we cannot always rely on interpolation at test time, as this may require, in a real-time setting, an unknown future value to perform the interpolation. Consequently, we need to rely on other methods of extrapolation to impute the missing glucose concentration levels. In scenario (A), for gaps of less than 30 minutes, we impute missing values with predicted values from the trained model. For missing recent values spanning longer than 30 minutes, as in (B), we pad the remaining values with the last computed value. In cases where a gap larger than 30 minutes is evident in the historical data and a current value is present at the given timestep, linear interpolation is employed instead to provide a more accurate imputation.

3.3 Standardisation

To enable training the proposed model effectively, we transform the relevant input features (blood glucose concentration, insulin bolus, meal (carbohydrate) intake, and reported exercise). The blood glucose concentration levels are scaled down by a factor of 120. Similarly, the insulin bolus values are scaled by 100 and the meal intake values by 200, so that all features lie in a similar range. The exercise values are transformed from the recorded exercise intensity, on a range from 1-10, to a simple binary representation of the presence or absence of exercise.

4 METHODS

In this section, we detail the machine learning technique used to learn personalised models with the entire dataset. We describe the approach taken to develop the deep multitask network for personalisation, and we provide a summary of the hyperparameters used in training as well as the setup of the input for personalised multitask learning.

4.1 Multitask Learning

Multitask learning is an approach in machine learning that can be broadly described as a method of learning multiple tasks simultaneously with the aim of improving generalisation [1].

Multitask learning for personalisation has been used mainly in affective computing [13], with early work in diabetes management focusing on using multitask learning to develop prediction models for clustered groups of Type 1, Type 2, and non-diabetic participants [7], rather than leveraging similarities within groups, such as gender, for personalised glucose predictions.

As seen in Figure 2, the output from the shared layers is fed into the individual (task)-specific fully connected layers of each user. In a multitask setting of this kind, a multiplicative gating approach is used to ensure that the individual-specific layers of a particular user train only on that user's input. In that sense, at each iteration a batch consisting of data from a particular individual is used to train the shared layers and the layers specific to that individual.

4.2 CRNN Model

The deep learning model trained in the multitask learning setting is the convolutional recurrent neural network (CRNN) proposed by Li et al. [8] to perform short-term glucose prediction. This forms the basis of the single-task (STL) model. The convolutional recurrent model consists initially of 3 temporal convolutional layers that perform a 1-D convolution with a Gaussian kernel over the input sequence to extract features with various rates of appearance, with a max pooling layer after each convolution operation. The input is a 4-dimensional sequence covering a 2-hour window of historical data.

The convolutional layers perform feature extraction and feed into a recurrent long short-term memory (LSTM) layer that is better able to model the temporal nature of the task. The output from the shared layers feeds into the fully connected layers of each user, which then provide the change in glucose value over the prediction horizon. This change is added to the current glucose value to provide the forecast glucose concentration level.

Figure 2. A detailed look at the formulation of convolutional recurrent networks in a multitask setting. In this setting, each user is represented as a task. The initial layers (convolutional and recurrent layers) are shared between all users, the next two (dense) layers are shared based on gender, and the last (dense) layer is specific to each user.

4.3 Loss Function

The loss function used for converging to an appropriate model for glucose forecasting is the mean absolute error, expressed as:

    \mathcal{L}(y, \hat{y}) = \frac{1}{N_{batch}} \sum_{k=1}^{N_{batch}} |y_k - \hat{y}_k|    (1)

where ŷ denotes the predicted results given the historical data, y denotes the reference change in glucose concentration over the relevant prediction horizon, and N_batch refers to the batch size.

The repository for the code accompanying the paper can be found at: https://github.com/jsmdaniels/ecai-bglp-challenge

4.4 Hyperparameters

The following table provides the details of the hyperparameters used for the model architecture at each layer.

Table 1. A table detailing the size and dimensions of layers in the multitask CRNN model (MTCRNN).

    Layer | Description         | Output Dimensions | No. of Parameters
    Shared Convolutional Layers (Batch × Steps × Channels)
    (1)   | 1×4 conv            | 128(1) × 24 × 8   | 104
          | max pooling, size 2 | 128(1) × 12 × 8   | −
    (2)   | 1×4 conv            | 128(1) × 12 × 16  | 528
          | max pooling, size 2 | 128(1) × 6 × 16   | −
    (3)   | 1×4 conv            | 128(1) × 6 × 32   | 2080
          | max pooling         | 128(1) × 3 × 32   | −
    Shared Recurrent Layer (Batch × Cells)
    (4)   | lstm                | 128(1) × 64       | 24832
    Sub-cluster Dense Layers (Batch × Units)
    (5)   | dense               | 128(1) × 256      | 16640
    (6)   | dense               | 128(1) × 32       | 8224
    Individual-Specific Dense Layers (Batch × Units)
    (7)   | dense               | 128(1) × 1        | 33

The optimiser used for this work is Adam. The learning rate is 0.0053. The model is trained for 200 epochs. This value was obtained through grid search optimisation. The model is developed on Keras 2.2.2 with a Tensorflow 1.5 backend, and training is performed on an NVIDIA GTX 1050 GPU.

5 RESULTS

5.1 Evaluation Metrics

The model is tested on data from six participant IDs: 540, 544, 552, 567, 584, 596.

The evaluation of the model is based on two metrics: the root mean square error (RMSE) and the mean absolute error (MAE). The extrapolated points are not considered in calculating these metrics. The formulations of these metrics are provided below:

    \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (y_k - \hat{y}_k)^2},    (2)

    \mathrm{MAE} = \frac{1}{N} \sum_{k=1}^{N} |y_k - \hat{y}_k|.    (3)

where ŷ denotes the predicted results given the historical data, y denotes the reference glucose measurement, and N refers to the data size.

In order to undertake a comprehensive evaluation of the model performance, the following assessment criteria are used:

• Performance evaluation over 30-minute and 60-minute prediction horizons (PH): the RMSE and MAE for each participant are analysed over the same number of values for both prediction horizons.
• Comparison of training settings: the performance of the multitask learning (MTL) approach is evaluated against that of a single-task learning (STL) approach which uses only patient-specific data.
• Multiple runs for each participant ID: the multitask CRNN (MTCRNN) model uses randomly initialised weights at the start of training. Given the variable nature of this training procedure, the results reported are the average of 5 model runs.

The unit for the results reported below is mg/dL. The best performance is in bold.
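Equations (2) and (3) translate directly into code. The following NumPy sketch shows the computation (variable names are illustrative, and the paper's exclusion of extrapolated points would be applied to the inputs beforehand):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, as in Eq. (2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error, as in Eq. (3)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))
```

In the paper's protocol, these metrics would be computed per participant at each prediction horizon and then averaged over the 5 model runs.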
Table 2. A table showing the prediction performance for 30 minutes: the RMSE and MAE results of the six participants over 5 runs (CRNN).

             MTL                       STL
    ID       RMSE         MAE          RMSE         MAE
    540      21.19±0.07   15.17±0.06   22.45±0.39   16.21±0.34
    544      16.82±0.09   11.72±0.06   18.63±1.59   12.57±0.23
    552      16.30±0.12   11.92±0.03   17.11±0.24   12.68±0.49
    567      24.12±0.17   15.55±0.03   24.73±0.45   16.01±0.71
    584      23.66±0.20   15.77±0.08   24.30±0.48   16.20±0.23
    596      16.63±0.15   11.59±0.09   16.78±0.20   12.00±1.77
    Average  19.79±0.06   13.62±0.05   20.67±0.32   14.28±0.19

Table 3. A table showing the prediction performance for 60 minutes: the RMSE and MAE results of the six participants over 5 runs (CRNN).

             MTL                       STL
    ID       RMSE         MAE          RMSE         MAE
    540      38.29±0.29   28.60±0.17   41.06±0.24   30.33±0.69
    544      28.97±0.24   20.77±0.20   29.60±0.37   20.52±0.17
    552      29.35±0.27   22.07±0.13   30.32±0.10   22.53±0.13
    567      40.19±0.79   28.77±0.13   40.09±0.64   27.71±0.13
    584      37.82±0.78   26.88±0.37   37.22±0.34   26.64±0.41
    596      27.74±0.11   20.12±0.14   28.13±0.48   20.30±0.41
    Average  33.73±0.24   24.54±0.08   34.40±0.14   24.67±0.14

Figure 3. A graph showing the predictive performance of the model on participant ID 596 at a 30-minute prediction horizon.

Figure 4. A graph showing the predictive performance of the model on participant ID 596 at a 60-minute prediction horizon.

6 DISCUSSION

As seen in Table 3, the results provide a comprehensive evaluation of the model's predictive performance. Evidently, the model performance at PH = 30 minutes is better than at PH = 60 minutes, given that prediction at 60 minutes is a more complex task than prediction at 30 minutes. Figures 3 and 4 exhibit the differences in performance over the same window for participant 596. The increased lag and reduced predictive performance can also be attributed to the higher chance of external events (insulin, meals, exercise) that influence the blood glucose trajectory occurring over the longer prediction horizon.

The best predictive performances were achieved by the model for IDs 544, 552, and 596, whereas IDs 540, 567, and 584 exhibited worse performance over both the 30- and 60-minute prediction horizons. An investigation of the glycaemic variability of the training sets, using the coefficient of variation (CV) [2], shows that the former set of participants is stable (CV ≤ 36%) whereas the latter group is labile (CV > 36%).

The multitask learning approach definitively performs better than the single-task approach over a 30-minute prediction horizon. However, the performance improvement of the MTL approach over a 60-minute prediction horizon is not consistent across every participant and metric.

One potential issue with multitask learning is negative transfer. This can be described as a scenario in which one or more of the tasks (individuals), or the batches sampled during training, are not strongly correlated, degrading the learning in the shared layers and subsequently the performance at test time.

7 CONCLUSION

In this work, we have presented a multitask convolutional recurrent neural network that is capable of performing short-term personalised predictions: 19.79±0.06 mg/dL (RMSE) and 13.62±0.05 mg/dL (MAE) at 30 minutes, as well as 33.73±0.24 mg/dL (RMSE) and 24.54±0.15 mg/dL (MAE) at 60 minutes. We work towards leveraging population data while still learning a personalised model. In the future, we hope to address further challenges, such as negative transfer during learning, that could improve the accuracy of individual models. This approach would enable more accurate models to be deployed in the face of limited personal data.

ACKNOWLEDGEMENTS

This work is supported by the ARISES project (EP/P00993X/1), funded by the Engineering and Physical Sciences Research Council.

REFERENCES

[1] Rich Caruana, 'Multitask Learning', Machine Learning, 28(1), 41–75, (July 1997).
[2] Antonio Ceriello, Louis Monnier, and David Owens, 'Glycaemic variability in diabetes: clinical and therapeutic implications', The Lancet Diabetes & Endocrinology, (August 2018).
[3] E. I. Georga, V. C.
Protopappas, D. Ardigò, M. Marina, I. Zavaroni, D. Polyzos, and D. I. Fotiadis, 'Multivariate Prediction of Subcutaneous Glucose Concentration in Type 1 Diabetes Patients Based on Support Vector Regression', IEEE Journal of Biomedical and Health Informatics, 17(1), 71–81, (January 2013).
[4] Marzyeh Ghassemi, Tristan Naumann, Peter Schulam, Andrew L. Beam, and Rajesh Ranganath, 'Opportunities in Machine Learning for Healthcare', arXiv:1806.00388 [cs, stat], (June 2018).
[5] Giacomo Cappon, Giada Acciaroli, Martina Vettoretti, Andrea Facchinetti, and Giovanni Sparacino, 'Wearable Continuous Glucose Monitoring Sensors: A Revolution in Diabetes Treatment', Electronics, 6(3), 65, (September 2017).
[6] Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016. http://www.deeplearningbook.org.
[7] Weixi Gu, Zimu Zhou, Yuxun Zhou, Miao He, Han Zou, and Lin Zhang, 'Predicting Blood Glucose Dynamics with Multi-time-series Deep Learning', in Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems - SenSys '17, pp. 1–2, Delft, Netherlands, (2017). ACM Press.
[8] K. Li, J. Daniels, C. Liu, P. Herrero-Vinas, and P. Georgiou, 'Convolutional Recurrent Neural Networks for Glucose Prediction', IEEE Journal of Biomedical and Health Informatics, 1–1, (2019).
[9] Kezhi Li, Chengyuan Liu, Taiyu Zhu, Pau Herrero, and Pantelis Georgiou, 'GluNet: A Deep Learning Framework for Accurate Glucose Forecasting', IEEE Journal of Biomedical and Health Informatics, 24(2), 414–423, (February 2020).
[10] Cindy Marling and Razvan Bunescu, 'The OhioT1DM Dataset for Blood Glucose Level Prediction', in The 5th International Workshop on Knowledge Discovery in Healthcare Data, (2020). CEUR proceedings in press. Available at http://smarthealth.cs.ohio.edu/bglp/OhioT1DM-dataset-paper.pdf.
[11] John Martinsson, Alexander Schliep, Björn Eliasson, and Olof Mogren, 'Blood Glucose Prediction with Variance Estimation Using Recurrent Neural Networks', Journal of Healthcare Informatics Research, 4(1), 1–18, (March 2020).
[12] C. Pérez-Gandía, A. Facchinetti, G. Sparacino, C. Cobelli, E. J. Gómez, M. Rigla, A. de Leiva, and M. E. Hernando, 'Artificial Neural Network Algorithm for Online Glucose Prediction from Continuous Glucose Monitoring', Diabetes Technology & Therapeutics, 12(1), 81–88, (January 2010).
[13] Sara Ann Taylor, Natasha Jaques, Ehimwenma Nosakhare, Akane Sano, and Rosalind Picard, 'Personalized Multitask Learning for Predicting Tomorrow's Mood, Stress, and Health', IEEE Transactions on Affective Computing, 1–1, (2017).
[14] Taiyu Zhu, Kezhi Li, Jianwei Chen, Pau Herrero, and Pantelis Georgiou, 'Dilated Recurrent Neural Networks for Glucose Forecasting in Type 1 Diabetes', Journal of Healthcare Informatics Research, (April 2020).