Blood Glucose Prediction for Type 1 Diabetes Using Generative Adversarial Networks

Taiyu Zhu1, Xi Yao2, Kezhi Li3, Pau Herrero4 and Pantelis Georgiou5

1 Imperial College London, UK, email: taiyu.zhu17@imperial.ac.uk
2 Imperial College London, UK, email: x.yao19@imperial.ac.uk
3 University College London, UK, email: ken.li@ucl.ac.uk
4 Imperial College London, UK, email: pherrero@imperial.ac.uk
5 Imperial College London, UK, email: pantelis@imperial.ac.uk

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Abstract. Maintaining blood glucose in a target range is essential for people living with Type 1 diabetes in order to avoid excessive periods in hypoglycemia and hyperglycemia, which can result in severe complications. Accurate blood glucose prediction can reduce this risk and enhance early interventions to improve diabetes management. However, due to the complex nature of glucose metabolism and the various lifestyle-related factors which can disrupt it, diabetes management remains challenging. In this work we propose a novel deep learning model to predict future BG levels based on historical continuous glucose monitoring measurements, meal ingestion, and insulin delivery. We adopt a modified architecture of the generative adversarial network that comprises a generator and a discriminator. The generator computes the BG predictions with a recurrent neural network with gated recurrent units, and the auxiliary discriminator employs a one-dimensional convolutional neural network to distinguish between the predicted and real BG values. The two modules are trained in an adversarial process with a combination of losses. The experiments were conducted on the OhioT1DM dataset, which contains the data of six T1D contributors over 40 days. The proposed algorithm achieves an average root mean square error (RMSE) of 18.34 ± 0.17 mg/dL with a mean absolute error (MAE) of 13.37 ± 0.18 mg/dL for the 30-minute prediction horizon (PH), and an average RMSE of 32.21 ± 0.46 mg/dL with an MAE of 24.20 ± 0.42 mg/dL for the 60-minute PH. The results are also assessed for clinical relevance using the Clarke error grid, which confirms the promising performance of the proposed model.

1 INTRODUCTION

Diabetes is a chronic metabolic disorder that affects more than 400 million people worldwide, with an increasing global prevalence [27]. Due to an absence of insulin production from the pancreatic β cells, people living with Type 1 diabetes (T1D) require long-term self-management through exogenous insulin delivery to maintain blood glucose (BG) levels in a normal range. In this regard, accurate glucose prediction has great potential to improve diabetes management, enabling proactive actions to reduce the occurrence of adverse glycemic events, including hypoglycemia and hyperglycemia.

In recent years, empowered by the advances in wearable devices and data-driven techniques, different BG prediction algorithms have been proposed and validated in clinical practice [29]. Among these, continuous glucose monitoring (CGM) is an essential technology that measures BG levels and provides readings in real-time. CGM has produced a vast amount of BG data with its increasing use in the diabetes population. Taking advantage of this, deep learning algorithms for BG prediction have recently achieved success and outperformed several conventional machine learning approaches in terms of accuracy [1, 16, 17, 23, 28]. Generally, the major challenge of BG prediction lies in accounting for the intra- and inter-person variability that leads to different glucose responses under different conditions [25]. Furthermore, many external events and factors can influence glucose dynamics, such as meal ingestion, physical exercise, psychological stress, and illness. Deep learning is powerful at extracting hidden representations from large-scale raw data [15], making it suitable for accounting for the complexity of glucose dynamics in diabetes.

In this work, we propose a novel deep learning model for BG prediction using a modified generative adversarial network (GAN). As a recent breakthrough in the field of deep learning, GANs have shown promising performance on various tasks, such as generating realistic images [13], synthesizing electronic health records [4] and predicting financial time series [31]. Normally, a GAN framework is composed of two deep neural network (DNN) models, the generator and the discriminator, which are trained simultaneously through an adversarial process [10]. The proposed generator captures feature maps of the multivariate physiological waveform data and generates predictive BG samples, while the discriminator is designed to distinguish the real data from the generated ones. To model the temporal dynamics of BG data, we adopt a recurrent neural network (RNN) in the generator and a one-dimensional convolutional neural network (CNN) in the discriminator, with dilation factors in each DNN layer to expand receptive fields, which have been verified as adequate network structures for BG prediction in our previous works [5, 17, 33].

2 METHODS

2.1 Dataset and Pre-processing

The data that we used to develop the model is the OhioT1DM dataset, provided by the Blood Glucose Level Prediction (BGLP) Challenge [20, 21]. It was produced by collecting BG-relevant data from 12 people with T1D over an eight-week period. The first half of the cohort, released for the 2018 BGLP challenge, was used for model pre-training, and we focus on the performance of the remaining six individuals, numbered 540, 544, 552, 567, 584, and 596. The dataset contains BG levels collected by CGM readings every five minutes, insulin delivery from insulin pumps, self-reported events (such as meals, work, sleep, psychological stress, and physical exercise) via a smartphone app, and physical activity from a sensor band.
Figure 1: The system architecture of the proposed GAN framework to predict BG levels. [Raw data are pre-processed into windows $X_{(t+1-L):t}$ and fed to the GRU-based generator, which outputs the prediction $\hat{G}_{t+w}$; the real sequence $G_{t+1:t+w}$ and the synthetic sequence ending in $\hat{G}_{t+w}$ are classified by the 1-D CNN discriminator, trained with the adversarial and supervised losses.]

However, there are unavoidable differences between the collected data and the actual physiological states. For example, the CGM sensor measures the interstitial fluid glucose level and then estimates BG levels by applying signal processing techniques, such as filtering and calibration algorithms. The meal and insulin entries are discrete values manually input by users, rather than continuous series of carbohydrates and insulin on board.

It should be noted that the dataset contains many missing gaps and outliers affecting BG levels, in both the training and testing sets, mainly due to CGM signal loss, sensor noise (e.g., compression artifacts), or usage reasons such as sensor replacement and calibration. To compensate for some of the missing data, we apply linear interpolation to fill the missing sequences in the training sets, while we only extrapolate missing values in the testing sets, to ensure that future information is not involved as partial inputs in the prediction. We then align the processed BG samples and the other features, e.g. exogenous events, with the same resolution as the CGM measurements, and normalize them to form an $N$-step time series $X_N = [x_1, \dots, x_N] \in \mathbb{R}^{N \times d}$, where $x$ is a $d$-dimensional vector of the multivariate data at each timestep.
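For illustration, this pre-processing step can be sketched in Python as follows. This is a minimal sketch, not the paper's exact pipeline: it assumes the per-contributor records have already been loaded into a pandas DataFrame indexed by timestamp, and the column names (glucose, meal_carbs, bolus) and the min-max normalization are our own placeholder choices.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, training: bool) -> pd.DataFrame:
    """Align features on the 5-minute CGM grid and handle missing values."""
    # Resample onto the CGM resolution; meals and boluses are discrete
    # events, so absent entries simply mean "no event" (sum -> 0).
    df = df.resample("5min").agg(
        {"glucose": "mean", "meal_carbs": "sum", "bolus": "sum"}
    )
    if training:
        # Training sets: fill CGM gaps by linear interpolation.
        df["glucose"] = df["glucose"].interpolate(method="linear")
    else:
        # Testing sets: only extrapolate forward, so that no future
        # information leaks into the model inputs.
        df["glucose"] = df["glucose"].ffill()
    # Normalize each feature column (min-max as a placeholder choice).
    return (df - df.min()) / (df.max() - df.min() + 1e-8)
```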
                                                                                                and exploding, making it difficult to capture long-term dependencies.
2.2 Problem Formulation

Considering a target prediction horizon (PH) (e.g. 30 or 60 minutes), the goal of the predictor is to estimate the future BG level $G_{t+w}$ of an individual given past and current physiological states, where $w$ is the number of timesteps determined by the PH and the CGM resolution (e.g. 5 minutes). Hence, the objective of the predictor is consistent with that of GANs: to learn a DNN approximator $\hat{p}$ of the distribution of glucose dynamics $p$ measured in the human body, which can be expressed in the form of the Kullback-Leibler divergence [30]:

    \min_{\hat{p}} D\big( p(G_{t+w} \mid X_{1:t}) \,\|\, \hat{p}(G_{t+w} \mid X_{1:t}) \big)    (1)

where $D$ is a measure of the distance between the two distributions. Thus, we need to select highly related data features to represent the physiological state. Referring to previous work and hyper-parameter tuning [16, 17, 22, 23], we use $X \triangleq [G, M, I]$ as the physiological time series, where $G$ is the pre-processed CGM measurement (mg/dL), $M$ denotes the carbohydrate amount of meal ingestion (g), and $I$ is the bolus insulin delivery (U). In order to reduce bias in the supervised learning, we set the change of BG level over the PH as the training target of the generator: $\Delta G_t = G_{t+w} - G_t$. Then the predictive BG level $\hat{G}_{t+w}$ from the generator is defined as follows:

    \hat{G}_{t+w} = f_G(X_{t+1-L:t}) + G_t    (2)

where $f_G$ denotes the function computed by the generator. Instead of using the whole series, we divide $X$ into small contiguous sequences of length $L$ with a sliding window, and then feed them into the deep generative model in the form of mini-batches, aiming to improve the stability and generalization of the model [12]. According to the feature selection in [22] and the model validation, we empirically set $L = 18$, which means that each input contains 1.5 hours of historical data.
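The windowing and target construction described above can be sketched as follows, under the paper's setting $L = 18$; the array layout and function name are our own choices, and $w = 6$ corresponds to a 30-minute PH at the 5-minute CGM resolution.

```python
import numpy as np

L, w = 18, 6  # 1.5 h of history; 6 steps of 5 min = 30-minute PH

def make_windows(X: np.ndarray, G: np.ndarray):
    """Slice the series into length-L inputs and delta-BG targets.

    X: (N, d) multivariate series [G, M, I]; G: (N,) CGM channel.
    Returns inputs of shape (num_windows, L, d) and the targets
    dG_t = G_{t+w} - G_t used to train the generator.
    """
    inputs, targets = [], []
    for t in range(L - 1, len(X) - w):
        inputs.append(X[t - L + 1 : t + 1])  # window X_{t+1-L : t}
        targets.append(G[t + w] - G[t])      # training target dG_t
    return np.stack(inputs), np.array(targets)
```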
2.3 System Architecture

RNN-based algorithms have performed well in BG level prediction in previous studies [1, 23, 28]. Thus, we instantiate a three-layer RNN with 32 hidden units to build the generator, which can be seen as a typical setup for time-series GANs [9, 24, 30]. In general, the vanilla RNN architecture suffers from vanishing and exploding gradients, making it difficult to capture long-term dependencies. Gated RNN units were proposed to meet this challenge using element-wise gating functions [7], including long short-term memory (LSTM) units [11] and gated recurrent units (GRUs) [6]. Compared to the vanilla RNN, the gated units are able to control the flow of information inside the unit through a set of gates, which allows an easier backward propagation process. Compared to the LSTM, the GRU was proposed more recently and removes the separate memory cell; this structure uses fewer parameters and computes the output more efficiently [32]. During hyper-parameter tuning, GRU-based models also achieved the best predictive outcomes, so we adopt GRU cells in the RNN.

As depicted in Figure 1, the multi-dimensional input is fed into an RNN with GRU cells, given a state length of $L$. The data is processed by a set of hidden neurons to calculate the last cell state $C_t$. A fully connected (FC) layer with weights $W_{FC}$ and a bias $b_{FC}$ is used to model the final scalar output: $\Delta \hat{G}_t = W_{FC} C_t + b_{FC}$. Finally, after adding the current BG level to the predicted glucose change, we obtain the output $\hat{G}_{t+w}$.
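A PyTorch sketch of such a generator is given below. The layer sizes follow the text (three recurrent layers with 32 hidden units); the dilated skip connections described later in this section are omitted here for brevity, so a plain nn.GRU stack stands in for the actual dilated RNN.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Three-layer GRU mapping a length-L window to the predicted BG."""

    def __init__(self, n_features: int = 3, hidden: int = 32):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, num_layers=3, batch_first=True)
        self.fc = nn.Linear(hidden, 1)  # delta_G_t = W_FC C_t + b_FC

    def forward(self, x: torch.Tensor, g_t: torch.Tensor) -> torch.Tensor:
        # x: (batch, L, n_features); g_t: (batch, 1) current BG level.
        out, _ = self.rnn(x)
        delta_g = self.fc(out[:, -1])  # last state C_t -> scalar change
        return g_t + delta_g           # G_hat_{t+w} = f_G(X) + G_t (Eq. 2)
```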
In general, the prediction performance degrades as the PH increases, due to the complicated physiological conditions of people with T1D and the uncertainties of exogenous events between $t$ and $t+w$. For instance, if there was a meal intake with a large amount of carbohydrate 20-30 minutes before $t+w$, the BG level would rise fast and make the target $\Delta G_t$ suddenly increase. These cases occur frequently in the daytime with a large PH, which could prevent a supervised learning model from achieving the global optimum. This motivated us to make use of the information between $t$ and $t+w$ during the training process to investigate the contiguous glucose change. Therefore, we append the predicted BG level to the end of the series $G_{t+1:t+w-1}$ to form a synthetic sequence $\hat{y}$, and use $G_{t+1:t+w}$ as the corresponding real sequence $y$. We then introduce a CNN-based discriminator to extract features and distinguish the real from the synthetic sequences, benefiting from the good classification ability of CNNs [15]. Three one-dimensional (1-D) causal CNN layers are employed with rectified linear unit (ReLU) activations and 32 hidden units to compute the final binary output. The discriminator is expected to classify the real and synthetic sequences as 1 and 0, respectively, while the generator is pitted against the discriminator and aims to estimate a BG value that is close to the real BG distribution over the PH. Thus, the loss of the discriminator is computed by cross-entropy. Consequently, this adversarial training involves two loss functions, $L_G$ and $L_D$, for the generator and the discriminator respectively, which are given by

    L_G = \lambda_1 L_{SL} + \frac{\lambda_2}{m} \sum_{i=1}^{m} \log\big(1 - f_D(\hat{y}^{(i)})\big),    (3)

    L_D = \frac{1}{m} \sum_{i=1}^{m} \big[ -\log f_D(y^{(i)}) - \log\big(1 - f_D(\hat{y}^{(i)})\big) \big],    (4)

where $f_D$ represents the computation of the discriminator; $L_{SL}$ is the mean square error loss of the supervised learning, $L_{SL} = \frac{1}{m} \sum_{i=1}^{m} \big(G_{t+w}^{(i)} - \hat{G}_{t+w}^{(i)}\big)^2$; $\lambda_1$ and $\lambda_2$ adjust the ratio between the supervised loss and the adversarial loss [31]; and $m$ stands for the mini-batch size. In practice, we employ two separate Adam optimizers [14] to minimize $L_G$ and $L_D$, with a batch size of 512 and a learning rate of 0.0001.
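One adversarial update under these two losses might look as follows. This is a sketch only: it assumes the Generator and Discriminator modules sketched in this section, placeholder values for $\lambda_1$ and $\lambda_2$ (not reported in the text), and the batch size and learning rate given above.

```python
import torch
import torch.nn.functional as F

lambda1, lambda2 = 1.0, 0.1  # loss weights: placeholder values
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def train_step(x, g_t, y_real):
    # y_real: real sequence y = G_{t+1 : t+w}; its last entry is G_{t+w}.
    g_hat = generator(x, g_t)                            # predicted G_{t+w}
    y_fake = torch.cat([y_real[:, :-1], g_hat], dim=1)   # synthetic y_hat

    # Discriminator loss (Eq. 4): cross-entropy, real -> 1, synthetic -> 0.
    d_real, d_fake = discriminator(y_real), discriminator(y_fake.detach())
    loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator loss (Eq. 3): supervised MSE plus the adversarial term.
    loss_sl = F.mse_loss(g_hat, y_real[:, -1:])
    loss_adv = torch.log(1 - discriminator(y_fake) + 1e-8).mean()
    loss_g = lambda1 * loss_sl + lambda2 * loss_adv
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_g.item(), loss_d.item()
```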
Moreover, we introduce dilation to both the RNN and the CNN layers [3, 26], which has shown promising performance for BG level prediction in previous work [5, 17, 32, 33]. By skipping a certain number of connections between neurons, the receptive field of the DNN layers can be increased exponentially, which helps to capture long-term temporal dependencies in the BG series. In particular, the dilation of layer $l$ is set to $r_l = 2^{l-1}$, increasing from the bottom layer to the top layer. The computation of the DNN layers is defined as follows:

    h_t^{(l)} = f_N\big( h_{t-r_l}^{(*)}, \mathrm{in}_t^{(l-1)} \big)    (5)

where $h_t^{(l)}$ and $\mathrm{in}_t^{(l-1)}$ are the output and input of layer $l$ at timestep $t$, and $f_N$ denotes the computation in the hidden neurons, referring to the convolution and cell operations in the CNN and RNN layers, respectively. As a feed-forward neural network, the CNN hidden units fetch all their inputs from the layer at the lower level ($* = l - 1$), whereas the RNNs skip cell states by $r_l - 1$ timesteps to perform the recursive operation ($* = l$).
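The dilated 1-D causal CNN discriminator can be sketched as below; the three convolutional layers with 32 channels and dilation $2^{l-1}$ follow the text, while the kernel size, the causal left-padding, and the sigmoid output head are our own implementation choices.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """1-D causal CNN scoring a BG sequence as real (1) or synthetic (0)."""

    def __init__(self, hidden: int = 32, kernel: int = 2):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(1 if l == 0 else hidden, hidden, kernel,
                      dilation=2 ** l)  # dilations 1, 2, 4 = 2^(l-1)
            for l in range(3)
        ])
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        h = seq.unsqueeze(1)  # (batch, 1, w)
        for conv in self.convs:
            pad = conv.dilation[0] * (conv.kernel_size[0] - 1)
            h = nn.functional.pad(h, (pad, 0))  # left-pad: causal conv
            h = torch.relu(conv(h))
        return self.head(h[:, :, -1])  # last timestep -> probability
```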
2.4 Training and Validation

The training and testing sets are provided separately by the BGLP challenge, and contain data for around 40 and 10 days, respectively. To tune the hyper-parameters by grid search, we validated the models over the same range of hyper-parameter values as in our previous work [32]. We considered several validation methods, such as simple splitting, k-fold cross-validation, and blocked cross-validation [2]. Due to the temporal dependencies and the limited size of the training set, we use the last 20% of the training set to validate the models and guarantee that future information is not involved in the current prediction. The early-stopping technique is applied to avoid over-fitting: we stop the training process when the validation loss keeps increasing. In particular, we set the maximum number of epochs to 3000 with a stopping patience of 50. Data sufficiency and the occurrence of over-fitting are further investigated by means of the learning curves.
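The early-stopping rule can be sketched as follows; the epoch budget and patience follow the text, while the callbacks are placeholders for the GAN updates and the evaluation on the held-out 20% of the training set.

```python
def fit(train_one_epoch, validate, max_epochs=3000, patience=50):
    """Stop training once the validation loss stops improving."""
    best_val, wait, best_state = float("inf"), 0, None
    for epoch in range(max_epochs):
        train_one_epoch()           # one pass of the adversarial updates
        val_loss = validate()       # loss on the last 20% of training data
        if val_loss < best_val:
            best_val, wait = val_loss, 0
            best_state = "save a checkpoint here"
        else:
            wait += 1
            if wait >= patience:    # 50 epochs without improvement
                break
    return best_state
```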
2.5 Metrics

A set of metrics is applied to evaluate the performance of the GAN model, including the root mean square error (RMSE) (mg/dL) and the mean absolute error (MAE) (mg/dL), which are defined as:

    \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (G_k - \hat{G}_k)^2}, \quad \mathrm{MAE} = \frac{1}{N} \sum_{k=1}^{N} \lvert G_k - \hat{G}_k \rvert.    (6)
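In code, the two metrics are a direct transcription of Eq. (6):

```python
import numpy as np

def rmse(g: np.ndarray, g_hat: np.ndarray) -> float:
    return float(np.sqrt(np.mean((g - g_hat) ** 2)))

def mae(g: np.ndarray, g_hat: np.ndarray) -> float:
    return float(np.mean(np.abs(g - g_hat)))
```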
In addition to the RMSE and MAE metrics, we also use the Clarke error grid (CEG) [8], a semi-quantitative tool from the clinical perspective. As shown in Figure 2, there are five labeled zones that intuitively reveal the medical consequences of the prediction results. In general, the data points (BG pairs) in zones A and B are regarded as acceptable for medical treatment, while the rest (C, D and E) are considered undesirable.
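As an illustration of how the grid is scored, the zone-A fraction (predictions within 20% of the reference, or reference and prediction both in the hypoglycemic range below 70 mg/dL) can be computed as below; the boundaries of the remaining zones are more intricate and are given in [8].

```python
import numpy as np

def zone_a_fraction(ref: np.ndarray, pred: np.ndarray) -> float:
    """Fraction of BG pairs (mg/dL) falling in Clarke error grid zone A."""
    within_20pct = np.abs(pred - ref) <= 0.2 * ref
    both_hypo = (ref < 70) & (pred < 70)
    return float(np.mean(within_20pct | both_hypo))
```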
3 RESULTS

After tuning the hyper-parameters, we tested the model on the testing sets. Table 1 shows the RMSE and MAE results for PHs of 30 minutes and 60 minutes. Considering the randomness of the initial weights in DNNs, we conducted 10 simulations and report results as Mean ± SD, where SD is the standard deviation. The average (AVG) RMSE and MAE over all 6 contributors are 18.34 ± 0.17 and 13.37 ± 0.18 mg/dL, respectively, for the 30-minute PH, and 32.21 ± 0.46 and 24.20 ± 0.42 mg/dL for the 60-minute PH. The best RMSE and MAE results obtained in the experiments are also presented in the last row and are slightly smaller than the average results. It is noted that the standard deviation over multiple simulations is small, which indicates the stability of the model.

Table 1: Prediction performance of the GAN model evaluated on 6 data contributors.

ID    Number (#)   30-minute PH                   60-minute PH
                   RMSE           MAE             RMSE           MAE
540   2884         20.14 ± 0.21   15.22 ± 0.17    38.54 ± 0.46   29.37 ± 0.21
544   2704         16.28 ± 0.11   11.62 ± 0.15    27.64 ± 0.43   20.09 ± 0.38
552   2352         16.08 ± 0.20   12.03 ± 0.22    29.03 ± 0.35   22.47 ± 0.34
567   2377         20.00 ± 0.14   14.17 ± 0.22    35.65 ± 0.41   26.68 ± 0.53
584   2653         20.91 ± 0.08   15.11 ± 0.11    34.31 ± 0.53   25.55 ± 0.52
596   2731         16.63 ± 0.25   12.12 ± 0.23    28.10 ± 0.57   21.06 ± 0.57
AVG                18.34          13.37           32.21          24.20
SD                  0.17           0.18            0.46           0.42
Best               18.21          13.21           31.64          23.70
Figure 2: The Clarke error grid plots for contributor 544. (a) 30-minute PH; (b) 60-minute PH. [Each panel plots the prediction concentration (mg/dL) against the reference concentration (mg/dL), partitioned into zones A-E.]

Table 2: The percentage distribution in the Clarke error grid (%).

ID       540     544     552     567     584     596
30-minute PH
CEG A    86.15   93.91   89.41   89.01   86.75   91.03
CEG B    12.18    5.76    8.80   10.06   12.26    7.57
CEG C     0       0       0       0       0       0
CEG D     1.67    0.33    1.79    0.93    0.98    1.40
CEG E     0       0       0       0       0       0
60-minute PH
CEG A    60.22   79.38   68.01   60.81   69.46   76.60
CEG B    33.37   19.20   28.91   30.80   28.34   20.78
CEG C     0.14    0       0       0.25    0.18    0
CEG D     6.27    1.38    3.08    8.14    2.01    2.62
CEG E     0       0       0       0       0       0

Figure 3: The comparison between the model predictions and the ground truth of CGM measurements during the first 24-hour period in the testing set of contributor 544. There are three missing BG values between 8:00 and 8:15. (a) 30-minute PH; (b) 60-minute PH. [Each panel plots the glucose level (mg/dL) over time for the measurements and the predictions.]
4 DISCUSSION
As shown in Table 2, the majority of the CEG points are located in zones A and B. Points in zone A deviate from the reference by no more than 20%, and points in zone B would still lead to appropriate treatment suggestions despite the prediction error. This indicates the high clinical accuracy of the proposed model. The percentage in zone D is small for the 30-minute PH and increases for the 60-minute PH. Points in zone D mean that the predictive model missed hypoglycemia or hyperglycemia events, which could lead to poor treatment. In Figure 2b, most of the error points are concentrated in the bottom-right corner of the left region of zone D. This reveals that the model outputs higher predictions when BG levels enter the hypoglycemia region, which is undesirable in a clinical setting. Figure 3 shows the corresponding BG curves for contributor 544, where the findings from the CEG analysis can be validated and time lags between the predictions and the measurements can be observed. Overestimation is observed in several regions with low BG levels or a sharp decrease. Aligning the error regions with the timesteps, we find that some of the mis-estimation occurs during nocturnal hypoglycemia. Similar findings are identified in the CEG analysis and BG curves of the other contributors. Therefore, future work will include training and switching between different models for different glucose regions, evaluated by more advanced error grid analysis.

During the experiments, we explored Tikhonov regularization to filter out the outliers in the training sets, as described in [1]. However, it tended to degrade the validation performance while largely reducing the training loss. We then used the 2018 OhioT1DM dataset [21] and in silico datasets from the UVA/Padova T1D simulator [19] for model pre-training. The simulator produced data for an average virtual adult subject with the scenarios defined in [32] over 360 simulated days. The population model was trained for 5 epochs and then fine-tuned with subject-specific data, but the average validation RMSE slightly increased, by around 0.5 mg/dL, compared with the models without pre-training. As shown in Table 1, there are two groups: one including contributors 544, 552, and 596 with better RMSE and MAE performance, and the other including contributors 540, 567 and 584. We introduced the data from the former group to pre-train a population model for the latter group, but the RMSE remained almost unchanged. Thus, one explanation of the pre-training performance is the large inter-person variability. For example, in the testing set, contributor 552 has a gap of 1415 missing data points (~5 days), and contributor 567 did not record meal ingestion, for which we reduced the dimension of the input data. To this end, multiple pre-processing methods are needed to mitigate such missing or incorrect inputs, such as the detection of unannounced meals. In addition, as future work, we will consider incorporating personalized physiological and behavioral models [18], such as insulin and carbohydrate on board, to better explain the observed variability.

Compared with the RNN prediction model in our previous work [32], the GAN model achieved better validation performance and smaller RMSE for most of the data contributors in the training process, especially for the 60-minute PH. During the testing phase, the GAN model can output the predictions without using the discriminator. Hence, the complexity of the proposed model is similar to that of conventional RNN models, and it can easily be implemented in smartphone applications [16, 17] to provide real-time predictions and control an insulin pump via Bluetooth connectivity. The code corresponding to this work is available at: https://bitbucket.org/deep-learning-healthcare/glugan.
5 CONCLUSION

In this work, a novel deep learning model using a modified GAN architecture is designed to predict BG levels for people with T1D. We developed personalized models and conducted multiple evaluations for each data contributor in the OhioT1DM dataset. The proposed model achieves promising prediction performance for the 30-minute and 60-minute PH in terms of average RMSE and MAE. The CEG analysis further indicates good clinical accuracy, but there are opportunities for enhancement; in particular, the model sometimes falls short in capturing a small number of hypoglycemia events. Nevertheless, the model is able to capture most of the individual glucose dynamics and has clear potential to be adopted in actual clinical applications.

ACKNOWLEDGEMENTS

The work is supported by EPSRC EP/P00993X/1 and the President's PhD Scholarship at Imperial College London.

REFERENCES

[1] Alessandro Aliberti, Irene Pupillo, Stefano Terna, Enrico Macii, Santa Di Cataldo, Edoardo Patti, and Andrea Acquaviva, 'A multi-patient data-driven approach to blood glucose prediction', IEEE Access, 7, 69311-69325, (2019).
[2] Christoph Bergmeir and José M Benítez, 'On the use of cross-validation for time series predictor evaluation', Information Sciences, 191, 192-213, (2012).
[3] Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark A Hasegawa-Johnson, and Thomas S Huang, 'Dilated recurrent neural networks', in Advances in Neural Information Processing Systems, pp. 77-87, (2017).
[4] Zhengping Che, Yu Cheng, Shuangfei Zhai, Zhaonan Sun, and Yan Liu, 'Boosting deep learning risk prediction with generative adversarial networks for electronic health records', in 2017 IEEE International Conference on Data Mining (ICDM), pp. 787-792. IEEE, (2017).
[5] Jianwei Chen, Kezhi Li, Pau Herrero, Taiyu Zhu, and Pantelis Georgiou, 'Dilated recurrent neural network for short-time prediction of glucose concentration', in The 3rd KDH workshop, IJCAI-ECAI 2018, pp. 69-73, (2018).
[6] Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio, 'On the properties of neural machine translation: Encoder-decoder approaches', arXiv preprint arXiv:1409.1259, (2014).
[7] Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio, 'Empirical evaluation of gated recurrent neural networks on sequence modeling', in NIPS 2014 Workshop on Deep Learning, December 2014, (2014).
[8] William L Clarke, Daniel Cox, Linda A Gonder-Frederick, William Carter, and Stephen L Pohl, 'Evaluating clinical accuracy of systems for self-monitoring of blood glucose', Diabetes Care, 10(5), 622-628, (1987).
[9] Cristóbal Esteban, Stephanie L Hyland, and Gunnar Rätsch, 'Real-valued (medical) time series generation with recurrent conditional GANs', arXiv preprint arXiv:1706.02633, (2017).
[10] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, 'Generative adversarial nets', in Advances in Neural Information Processing Systems, pp. 2672-2680, (2014).
[11] Sepp Hochreiter and Jürgen Schmidhuber, 'Long short-term memory', Neural Computation, 9(8), 1735-1780, (1997).
[12] Elad Hoffer, Itay Hubara, and Daniel Soudry, 'Train longer, generalize better: closing the generalization gap in large batch training of neural networks', in Advances in Neural Information Processing Systems, pp. 1731-1741, (2017).
[13] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros, 'Image-to-image translation with conditional adversarial networks', in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125-1134, (2017).
[14] Diederik P Kingma and Jimmy Ba, 'Adam: A method for stochastic optimization', International Conference on Learning Representations 2015, 1-15, (2015).
[15] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, 'Deep learning', Nature, 521(7553), 436-444, (2015).
[16] Kezhi Li, John Daniels, Chengyuan Liu, Pau Herrero-Vinas, and Pantelis Georgiou, 'Convolutional recurrent neural networks for glucose prediction', IEEE Journal of Biomedical and Health Informatics, (2019).
[17] Kezhi Li, Chengyuan Liu, Taiyu Zhu, Pau Herrero, and Pantelis Georgiou, 'GluNet: A deep learning framework for accurate glucose forecasting', IEEE Journal of Biomedical and Health Informatics, (2019).
[18] Chengyuan Liu, Josep Vehí, Parizad Avari, Monika Reddy, Nick Oliver, Pantelis Georgiou, and Pau Herrero, 'Long-term glucose forecasting using a physiological model and deconvolution of the continuous glucose monitoring signal', Sensors, 19(19), 4338, (2019).
[19] Chiara Dalla Man, Francesco Micheletto, Dayu Lv, Marc Breton, Boris Kovatchev, and Claudio Cobelli, 'The UVA/Padova type 1 diabetes simulator: new features', Journal of Diabetes Science and Technology, 8(1), 26-34, (2014).
[20] C. Marling and R. Bunescu, 'The OhioT1DM dataset for blood glucose level prediction: Update 2020', in The 5th KDH workshop, ECAI 2020, (2020). CEUR proceedings in press, available at http://smarthealth.cs.ohio.edu/bglp/OhioT1DM-dataset-paper.pdf.
[21] Cindy Marling and Razvan C. Bunescu, 'The OhioT1DM dataset for blood glucose level prediction', in The 3rd KDH workshop, IJCAI-ECAI 2018, pp. 60-63, (2018).
[22] Cooper Midroni, Peter J Leimbigler, Gaurav Baruah, Maheedhar Kolla, Alfred J Whitehead, and Yan Fossat, 'Predicting glycemia in type 1 diabetes patients: experiments with XGBoost', in The 3rd KDH workshop, IJCAI-ECAI 2018, pp. 79-84, (2018).
[23] Sadegh Mirshekarian, Hui Shen, Razvan Bunescu, and Cindy Marling, 'LSTMs and neural attention models for blood glucose prediction: Comparative experiments on real and synthetic data', in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 706-712. IEEE, (2019).
[24] Olof Mogren, 'C-RNN-GAN: Continuous recurrent neural networks with adversarial training', arXiv preprint arXiv:1611.09904, (2016).
[25] Silvia Oviedo, Josep Vehí, Remei Calm, and Joaquim Armengol, 'A review of personalized blood glucose prediction strategies for T1DM patients', International Journal for Numerical Methods in Biomedical Engineering, 33(6), e2833, (2017).
[26] Tom Le Paine, Pooya Khorrami, Shiyu Chang, Yang Zhang, Prajit Ramachandran, Mark A Hasegawa-Johnson, and Thomas S Huang, 'Fast WaveNet generation algorithm', arXiv preprint arXiv:1611.09482, (2016).
[27] Pouya Saeedi, Inga Petersohn, Paraskevi Salpea, Belma Malanda, Suvi Karuranga, Nigel Unwin, Stephen Colagiuri, Leonor Guariguata, Ayesha A Motala, Katherine Ogurtsova, et al., 'Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: Results from the International Diabetes Federation Diabetes Atlas', Diabetes Research and Clinical Practice, 157, 107843, (2019).
[28] Qingnan Sun, Marko V Jankovic, Lia Bally, and Stavroula G Mougiakakou, 'Predicting blood glucose with an LSTM and Bi-LSTM based deep neural network', in 2018 14th Symposium on Neural Networks and Applications (NEUREL), pp. 1-5. IEEE, (2018).
[29] Ashenafi Zebene Woldaregay, Eirik Årsand, Ståle Walderhaug, David Albers, Lena Mamykina, Taxiarchis Botsis, and Gunnar Hartvigsen, 'Data-driven modeling and prediction of blood glucose dynamics: Machine learning applications in type 1 diabetes', Artificial Intelligence in Medicine, (2019).
[30] Jinsung Yoon, Daniel Jarrett, and Mihaela van der Schaar, 'Time-series generative adversarial networks', in Advances in Neural Information Processing Systems, pp. 5509-5519, (2019).
[31] Xingyu Zhou, Zhisong Pan, Guyu Hu, Siqi Tang, and Cheng Zhao, 'Stock market prediction on high-frequency data using generative adversarial nets', Mathematical Problems in Engineering, 2018, (2018).
[32] Taiyu Zhu, Kezhi Li, Jianwei Chen, Pau Herrero, and Pantelis Georgiou, 'Dilated recurrent neural networks for glucose forecasting in type 1 diabetes', Journal of Healthcare Informatics Research, 1-17, (2020).
[33] Taiyu Zhu, Kezhi Li, Pau Herrero, Jianwei Chen, and Pantelis Georgiou, 'A deep learning algorithm for personalized blood glucose prediction', in The 3rd KDH workshop, IJCAI-ECAI 2018, pp. 64-78, (2018).