Blood Glucose Level Prediction as Time-Series Modeling using Sequence-to-Sequence Neural Networks

Ananth Bhimireddy [1], Priyanshu Sinha [2], Bolu Oluwalade [3], Judy Wawira Gichoya [4] and Saptarshi Purkayastha [5]

[1] Indiana University Purdue University Indianapolis, USA, email: anbhimi@iupui.edu
[2] Mentor Graphics India Pvt. Ltd., India, email: priyanshusinha@outlook.com
[3] Indiana University Purdue University Indianapolis, USA, email: boluwala@iupui.edu
[4] Emory University School of Medicine, USA, email: judywawira@emory.edu
[5] Indiana University Purdue University Indianapolis, USA, email: saptpurk@iupui.edu

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. The management of blood glucose levels is critical in the care of Type 1 diabetes subjects. In extremes, high or low levels of blood glucose are fatal. To avoid such adverse events, wearable technologies that continuously monitor blood glucose and administer insulin have been developed and adopted. This technology allows subjects to easily track their blood glucose levels, enabling early intervention and preventing the need for hospital visits. The data collected from these sensors is an excellent candidate for the application of machine learning algorithms to learn patterns and predict future values of blood glucose levels. In this study, we developed artificial neural network algorithms based on the OhioT1DM training dataset, which contains data on 12 subjects. The dataset contains features such as subject identifiers, continuous glucose monitoring data obtained at 5-minute intervals, insulin infusion rate, etc. We developed individual models, including LSTM, BiLSTM, Convolutional LSTMs, TCN, and sequence-to-sequence models. We also developed transfer learning models based on the most important features of the data, as identified by a gradient boosting algorithm. These models were evaluated on the OhioT1DM test dataset, which contains data from 6 unique subjects. The model with the lowest RMSE values at the 30- and 60-minute horizons was selected as the best performing model. Our results show that the sequence-to-sequence BiLSTM performed better than the other models. This work demonstrates the potential of artificial neural network algorithms in the management of Type 1 diabetes.

Keywords. Blood glucose prediction, Time-series model, Wearable devices, Transfer learning

1 INTRODUCTION

Diabetes Mellitus is a chronic disease characterized by high blood glucose levels. According to the 2020 CDC National Diabetes Statistics Report, about 34.2 million people, or 10.2 percent of Americans, have diabetes [3]. Diabetes is classified as Type 1, Type 2, and Gestational Diabetes. Insulin is a hormone produced in the pancreas that helps in blood glucose absorption into cells. In Type 1 diabetes (T1DM), the pancreas produces little or no insulin. In contrast, in Type 2 diabetes (T2DM), the pancreas produces a small amount of insulin, or the body is resistant to the effect of insulin. This absence of or resistance to insulin leads to an increase in blood glucose levels, known as hyperglycemia. Symptoms of hyperglycemia include excessive thirst, excessive urination, sweating, etc. Diabetic ketoacidosis is a serious complication of uncontrolled hyperglycemia that can lead to death. On the other hand, elevated insulin levels in the body can cause low levels of blood glucose, a condition known as hypoglycemia. Dizziness, weakness, coma, or eventually death can occur in uncontrolled hypoglycemia. Insulin and glucose control are critical to the management of diabetes, and hence titration of the administered insulin doses is critical in the management of diabetes patients.

Glucose levels vary according to the patient's diet and activities throughout the day. Sensors have been developed to estimate blood glucose levels at various time intervals. These sensors are useful in diabetes management because they provide longitudinal data about subjects' blood glucose and show distinctive patterns throughout the day. These sensors are frequently coupled with insulin pumps that deliver short-acting insulin continuously (basal rate) and a specific insulin quantity after a meal for appropriate glycemic control. Although the sensors and insulin pumps have helped to improve patient care, patients are typically unaware of an impending adverse event of severe hyperglycemia or hypoglycemia. These adverse effects commonly occur when patients are asleep. There is an opportunity for the development of accurate prediction models that use previously collected sensor data to estimate future values of blood glucose levels and prevent the occurrence of adverse events.

In this study, we utilize the OhioT1DM dataset, which contains blood glucose values of twelve T1DM subjects collected at intervals over a total time span of eight weeks [13]. These individuals had an insulin pump with continuous glucose monitoring (CGM), wore a physical activity band, and self-reported life events using a smartphone application. CGM blood glucose data were obtained at 5-minute intervals [12]. We developed multiple models for predicting glucose values at 30 and 60 minutes in the future, using the CGM values and mean Root Mean Square Error (RMSE) as an evaluation metric. The code for this study can be found at https://github.com/iupui-soic/bglp2

2 RELATED WORK

LSTM and RNN models have been used for forecasting in [10], which has since been improved upon. That paper explores short-term load forecasting for individual electric customers and proposes an LSTM-based framework to tackle the issue. Machine learning models such as XGBoost have been used to predict glycemia in Type 1 diabetic patients in [14]; that paper experiments primarily with the XGBoost algorithm to predict blood glucose levels at a 30-minute horizon in the OhioT1DM dataset. Features from pre-trained TimeNets have been used for clinical predictions in [7]. That paper pre-trains a network on a supervised or unsupervised task and then fine-tunes it via transfer learning for a related end-task, to make the most of limited labeled data. It points out that training deep learning models such as RNNs and LSTMs requires large labeled datasets and is computationally expensive.

Deep learning models like Recurrent Neural Networks (RNN) have been used on the OhioT1DM dataset to predict future blood glucose values [16], including in the BGLP Challenge at KDH@IJCAI-ECAI 2018 (http://ceur-ws.org/Vol-2148/). In some cases, these data-driven models use only the CGM values; in others, they use physiological data such as the insulin concentration, the amount of carbohydrate in meals, and physical activities. Chen et al. created a data-driven 3-layer dilated recurrent neural network model with a mean RMSE of 18.9, ranging between 15.2995 and 22.7104 [4]. They concluded that missing data and continuous fluctuations in the data influenced the model's performance. Their model bettered the Convolutional Neural Network (CNN) that gave an average RMSE of 21.726 for six subjects [21]. Bertachi et al. predicted blood glucose levels using Artificial Neural Networks with the inclusion of physiological data [1]. Their results were not significantly dissimilar to those obtained by the data-driven models. All the previous studies demonstrate that a lower RMSE is obtained at the 30-minute prediction horizon when compared to the 60-minute horizon. We postulate that a hybrid approach combining both the data-driven and physiological models could improve on the performance of the individual models, and we incorporate this in our approach.

3 METHODS

3.1 Dataset Description

A detailed description of the dataset has been previously published in the OhioT1DM dataset paper [12]. We used the data provided on 12 subjects for training, and 6 subjects for testing. Furthermore, the parameters basal and temp basal were merged into a single parameter.

We converted both the training and testing datasets from XML to CSV, preserving the time intervals. We did not use interpolation on the datasets, as the rules of the competition prohibit interpolation. We tried using forward and backward filling to fill the null values in the datasets, but these created additional time intervals, which becomes a problem in the testing datasets. Therefore, no re-sampling technique was used in this paper, preserving the time intervals of the samples. Data pre-processing was performed only on the 6 subjects (test and train datasets) whose results are to be predicted. Subject 548 has the highest number of training records (12,150) and Subject 552 has the lowest (9,080). The number of features varied from subject to subject, which causes unevenness when training time-series models, so we added features to subjects as required to ease the process of training and predicting. Missing data is handled in all columns by imputing the null values with zeros (0). We did not alter the glucose value column, as required by the competition.
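To make the pre-processing concrete, the following is a minimal sketch of the imputation and parameter-merging steps described above. The file path and column names are hypothetical stand-ins for the OhioT1DM fields, and the merge rule (a reported temporary basal rate overriding the regular one) is our assumption rather than a detail given in the text.

```python
import pandas as pd

# Hypothetical per-subject CSV produced by the XML-to-CSV conversion.
df = pd.read_csv("subject_552_train.csv")

# Merge basal and temp_basal into a single parameter: where a temporary
# basal rate is reported, assume it overrides the regular basal rate.
df["basal_rate"] = df["temp_basal"].fillna(df["basal"])
df = df.drop(columns=["basal", "temp_basal"])

# Zero-impute nulls in every column except the CGM glucose values,
# which are left untouched as required by the competition rules.
feature_cols = [c for c in df.columns if c != "glucose_level"]
df[feature_cols] = df[feature_cols].fillna(0)
```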
3.2 Description of ML Models

We used the following deep learning models to predict the blood glucose levels of each subject. The data pre-processing and model development are summarized in Figure 1.

Figure 1. Process Architecture

3.2.1 Long Short-Term Memory (LSTM) Networks

LSTMs were originally introduced by Hochreiter and Schmidhuber [8] and later refined and popularized [17] [20]. LSTMs are a special kind of RNN, capable of learning long-term dependencies. This quality helps LSTMs memorize the useful parts of a sequence so that the model learns parameters more efficiently, making them useful for time-series models.

We trained two models using LSTMs, one with 5-minute interval data and another with 30-minute interval data. In each model, we used all the available features at time t to predict the glucose value at time t+1. Before fitting, we scaled the dataset using MinMaxScaler from scikit-learn [15].

The LSTM model was built using the Keras [6] platform. We used 128 LSTM units, followed by a dense layer (150 units), a dropout layer (0.20), a dense layer (100 units), a dropout layer (0.15), a dense layer (50 units), a dense layer (20 units), and a final layer with one unit (for prediction). We used ReLU as the activation function with the Adam optimizer. The loss was calculated as mean squared error (MSE) and later converted into Root Mean Squared Error (RMSE). The model was trained for 200 epochs with a batch size of 32. The results of the model for each subject are provided in the results table.
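A minimal Keras sketch of this architecture is shown below. The placeholder arrays, the feature count, and the single-timestep reshape of the input are our assumptions; the layer sizes, activation, optimizer, loss, epochs, and batch size follow the description above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# Placeholder data standing in for one subject's feature matrix and targets.
X_raw = np.random.rand(1000, 6)   # (samples, features) -- hypothetical
y = np.random.rand(1000)          # glucose value at t+1 -- hypothetical

# Scale features to [0, 1] before fitting, as described above.
X = MinMaxScaler().fit_transform(X_raw)
X = X.reshape((-1, 1, X.shape[1]))  # (samples, timesteps=1, features)

model = Sequential([
    LSTM(128, input_shape=(1, X.shape[2])),
    Dense(150, activation="relu"),
    Dropout(0.20),
    Dense(100, activation="relu"),
    Dropout(0.15),
    Dense(50, activation="relu"),
    Dense(20, activation="relu"),
    Dense(1),  # predicted glucose value at t+1
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=200, batch_size=32, verbose=0)

# RMSE is recovered from the MSE loss for reporting.
rmse = np.sqrt(model.evaluate(X, y, verbose=0))
```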
3.2.2 Bi-directional Long Short-Term Memory (BiLSTM) Networks

As our input data is static and the entire sequence is available at once, we implemented a BiLSTM model to observe how processing the sequence from both directions affected the accuracy. The architecture of the BiLSTM model is similar to the LSTM model and is thus also useful for time-series prediction.

The data processing and model parameters for the BiLSTM and LSTM models were the same, with one exception in the model's first layer, where the scaled data was fed into a Bidirectional LSTM with 128 units. The model was trained for 200 epochs with a batch size of 32. The results of the model for each subject are provided in the results table.
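Only the first layer changes relative to the LSTM sketch above; a sketch of that change follows, with the feature count again a placeholder.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense, Dropout

n_features = 6  # placeholder, matching the scaled input above

model = Sequential([
    # The only change from the LSTM model: wrap the first layer in a
    # Bidirectional wrapper so the sequence is processed in both directions.
    Bidirectional(LSTM(128), input_shape=(1, n_features)),
    Dense(150, activation="relu"),
    Dropout(0.20),
    Dense(100, activation="relu"),
    Dropout(0.15),
    Dense(50, activation="relu"),
    Dense(20, activation="relu"),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```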
3.2.3 Temporal Convolutional Networks (TCN)

TCNs were originally introduced by Lea et al. [11] in 2016. TCNs are extremely useful in capturing high-level temporal relationships in sequential data, and the TCN architecture allows capturing long-range spatio-temporal relationships. TCNs help capture the blood glucose levels of subjects who usually have routine lifestyles, as they can capture hierarchical relationships at low, intermediate, and high time scales.

The data processing steps are similar to the LSTM model. The TCN model was built using the Keras platform, but the model is relatively shallower than the LSTM and BiLSTM models. The scaled data was fed into a TCN layer and then connected to a dense layer with one unit for the output. The model used the Adam optimizer and MSE for calculating the loss, which was later converted to RMSE. The model was trained for 10 epochs, and the obtained results are provided in the results table.

3.2.4 Convolutional LSTM

Convolutional LSTMs (ConvLSTM) were introduced by Xingjian Shi et al. [18] in 2015. ConvLSTMs are created by extending the fully connected LSTM to have a convolutional structure in both the input-to-state and state-to-state transitions. ConvLSTM networks capture spatio-temporal correlations better and usually outperform fully connected LSTM networks.

The scaled data was reshaped and fed into a convolution layer with 32 filters of kernel size 1, followed by an LSTM layer with 128 units, a dense layer with 150 units, a dropout layer with a 0.2 dropout rate, a dense layer with 100 units, a dropout layer with a 0.15 rate, a dense layer with 50 units, a dense layer with 16 units, and finally a dense layer with 1 unit for prediction. The ReLU activation function was used in all layers. The model used the Adam optimizer, and the MSE loss was further converted to RMSE for model comparison. The model was trained for 200 epochs with a batch size of 32. The obtained results are provided in the results table.
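The sketch below realizes the convolution-plus-LSTM stack listed above as a Conv1D layer feeding an LSTM, which is one way to read that layer list in Keras (as opposed to Keras's ConvLSTM2D layer); the timestep and feature counts are placeholders.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, LSTM, Dense, Dropout

n_timesteps, n_features = 1, 6  # placeholders for the reshaped scaled input

model = Sequential([
    # Convolution over the input window: 32 filters of kernel size 1.
    Conv1D(filters=32, kernel_size=1, activation="relu",
           input_shape=(n_timesteps, n_features)),
    LSTM(128),
    Dense(150, activation="relu"),
    Dropout(0.20),
    Dense(100, activation="relu"),
    Dropout(0.15),
    Dense(50, activation="relu"),
    Dense(16, activation="relu"),
    Dense(1),  # predicted glucose value
])
model.compile(optimizer="adam", loss="mse")
```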
3.2.5 Description of Sequence-to-Sequence Models

Sequence-to-sequence models were first introduced at Google in 2014 [19]. These models map an input sequence to an output sequence, where the lengths of the input and output may differ. Sequence-to-sequence models consist of three parts:

• Encoder: The encoder consists of a stack of several recurrent (LSTM) units, where each unit takes a single element of the input sequence, extracts information from it, and propagates it to the next unit.

• Encoded Vector: This is the intermediate step and the final hidden state of the encoder. It also acts as the first hidden state of the decoder. This vector encapsulates the information from all input elements and provides it to the decoder to make predictions.

• Decoder: This consists of a stack of recurrent units, where each unit predicts an output at time step t. Each unit accepts the previous hidden state as input and produces an output as well as its own hidden state.

The main advantage of this architecture is that it can map sequences of different lengths to each other. We applied three variants of sequence-to-sequence models, namely sequence-to-sequence LSTM, sequence-to-sequence Bi-LSTM, and sequence-to-sequence CNN-LSTM.

For training the sequence-to-sequence models, we split the data into windows of 60 minutes. This approach is intuitive and helpful, as the blood glucose values can be predicted one hour ahead. It is also helpful while modelling, as the model can be used to predict blood glucose values at specific time points (say, after 10 minutes) or a whole sequence of blood glucose values.

To evaluate these sequence-to-sequence models, we used walk-forward validation: the model predicts the next one hour, and then the actual data for that hour is given to make the prediction for the following hour. See Table 1 for an illustration.

Table 1. Description of sequence-to-sequence input and prediction values

Input                               Prediction
1st 60 minutes data                 2nd 60 minutes data
[1st + 2nd] 60 minutes data         3rd 60 minutes data
[1st + 2nd + 3rd] 60 minutes data   4th 60 minutes data

For our training, we kept our input size (the number of prior observations required to make the next predictions) at 30 minutes of data to predict the next 60 minutes of data. Each sequence-to-sequence model used in our work is described below; a code sketch of the shared encoder-decoder pattern follows the list.

1. Sequence-to-Sequence LSTM: In this model, we used 200 LSTM cells for the encoder and decoder. This layer was followed by 2 dense layers containing 150 and 1 units, wrapped in a TimeDistributed layer. The model was trained for 80 epochs with a batch size of 40. We used the Adam optimizer [9] with a learning rate of 0.01 and MSE as the loss function.

2. Sequence-to-Sequence Bi-LSTM: In this model, we used 100 Bi-LSTM cells (LSTM cells wrapped in a Bidirectional wrapper) for the encoder and decoder. This layer was followed by 2 dense layers containing 150 and 1 units, wrapped in a TimeDistributed layer. The model was trained for 80 epochs with a batch size of 40. As in the sequence-to-sequence LSTM, the Adam optimizer was used with a 0.01 learning rate and mean squared error as the loss function.

3. Sequence-to-Sequence CNN-LSTM: In this model, we used two 1D convolutional layers with 128 and 64 filters, respectively. The convolutional layers were followed by a 1D max-pooling layer and a flatten layer for the encoder. 200 LSTM cells were used for the decoder. This layer was followed by 2 dense layers containing 100 and 1 units, wrapped in a TimeDistributed layer. The model was trained for 80 epochs with a batch size of 40. As in the above models, the Adam optimizer was used with a 0.01 learning rate and mean squared error as the loss function.
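Below is a minimal sketch of the sequence-to-sequence LSTM variant, assuming 5-minute CGM samples so that the 30-minute input window is 6 steps and the 60-minute output window is 12 steps. The RepeatVector bridge is one common Keras realization of the encoder-decoder pattern described above, not necessarily the exact implementation used here; the CGM series is a placeholder.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, RepeatVector, TimeDistributed
from tensorflow.keras.optimizers import Adam

n_in, n_out = 6, 12  # 30 min of 5-min CGM samples in, 60 min out

def make_windows(cgm, n_in=6, n_out=12):
    """Slide over a CGM series to build (input, target) sequence pairs."""
    X, y = [], []
    for i in range(len(cgm) - n_in - n_out + 1):
        X.append(cgm[i:i + n_in])
        y.append(cgm[i + n_in:i + n_in + n_out])
    return np.array(X)[..., None], np.array(y)[..., None]

X, y = make_windows(np.random.rand(500))  # placeholder CGM series

model = Sequential([
    LSTM(200, input_shape=(n_in, 1)),   # encoder: summarizes the input window
    RepeatVector(n_out),                # encoded vector, repeated per output step
    LSTM(200, return_sequences=True),   # decoder: one state per output step
    TimeDistributed(Dense(150, activation="relu")),
    TimeDistributed(Dense(1)),          # one glucose prediction per 5-min step
])
model.compile(optimizer=Adam(learning_rate=0.01), loss="mse")
model.fit(X, y, epochs=80, batch_size=40, verbose=0)
```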
3.2.6 Transfer Learning

For transfer learning, we first found the most relevant and common features among all 12 subjects. The importance of the features was determined using a gradient boosting algorithm. We set the cumulative frequency threshold to 0.99 for feature selection, and the 5 most important common features are: 'finger stick value', 'basal rate value', 'galvanic skin response value', 'skin temperature value', and 'bolus dose value'.

Using only the above-identified important features, we trained our model on a randomly selected subject (567) and subsequently fine-tuned each sequence-to-sequence model on each subject. The final model was used for prediction on the test data. The configuration of each model was the same as that of the corresponding sequence-to-sequence model described above.
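The following sketch illustrates this selection step. The paper does not state which gradient boosting implementation was used, so scikit-learn's GradientBoostingRegressor and the placeholder data are our assumptions; the 0.99 cumulative threshold follows the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X_train = np.random.rand(500, 8)                    # placeholder feature matrix
y_train = np.random.rand(500)                       # placeholder glucose targets
feature_names = [f"feature_{i}" for i in range(8)]  # placeholder names

# Rank features by gradient-boosting importance and keep the smallest set
# whose cumulative importance reaches the 0.99 threshold.
gbr = GradientBoostingRegressor().fit(X_train, y_train)
order = np.argsort(gbr.feature_importances_)[::-1]
cumulative = np.cumsum(gbr.feature_importances_[order])
k = int(np.searchsorted(cumulative, 0.99)) + 1
selected = [feature_names[i] for i in order[:k]]
print(selected)
```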
4 RESULTS

Figures 2 and 3 show the comparison between the actual and the predicted values obtained from our best performing model, the sequence-to-sequence BiLSTM. From the figures, it is clear that our model closely predicts the values of the test data, following similar peaks and troughs.

Figure 2. Actual and predicted values of subject 552 at the 30-minute horizon

Figure 3. Actual and predicted values of subject 584 at the 30-minute horizon

The RMSE values of the 30-minute horizon predictions of the models are presented in Tables 3, 4, and 5. From the tables, the sequence-to-sequence (Seq2Seq) models performed better than all the other models, with an average RMSE of 23.5 for the Seq2Seq LSTM, 21.8 for the Seq2Seq BiLSTM, and 23.0 for the Seq2Seq CNN-LSTM. Table 2 reports the RMSE and MAE values for the sequence-to-sequence BiLSTM model, which performed better than the individual and transfer learning models.

Table 2. RMSE and MAE results of the sequence-to-sequence BiLSTM model

          RMSE                MAE
Subject   30 Mins   60 Mins   30 Mins   60 Mins
584       29.7      42.6      18.1      30.0
567       20.7      35.1      14.4      24.9
596       18.6      28.3      12.7      19.3
552       18.2      30.0      13.3      22.2
544       19.8      32.9      13.7      23.1
540       24.3      41.4      17.8      31.0
Mean      21.8      35.0      15.0      25.0
SD        4.0       5.4       2.1       4.2

Table 3. RMSE values for individual models at the 30-minute horizon

Subject   LSTM    BiLSTM   TCN     ConvLSTM
584       27.97   29.06    26.55   26.57
567       25.65   27.04    25.71   26.04
596       19.47   20.30    18.95   21.08
552       20.62   20.30    17.14   17.73
544       21.34   22.06    80.67   20.94
540       30.21   31.72    25.94   27.35
Mean      25.0    24.4     34.7    23.2
SD        3.99    4.98     21.13   3.43

Table 4. RMSE values for sequence-to-sequence models at the 30-minute horizon

Subject   Seq2Seq LSTM   Seq2Seq BiLSTM   Seq2Seq CNN-LSTM
567       29.2           20.7             29.0
540       25.0           24.3             23.1
544       18.5           19.8             19.2
596       18.6           18.6             19.0
584       30.7           29.7             31.0
552       19.2           18.1             17.2
Mean      23.5           21.8             23.0
SD        5.0            4.0              5.2

Table 5. RMSE values for sequence-to-sequence transfer learning models at the 30-minute horizon

Subject   Seq2Seq LSTM   Seq2Seq BiLSTM   Seq2Seq CNN-LSTM
567       33.5           30.8             32.6
540       39.7           32.7             44.2
544       23.1           31.5             21.8
596       19.2           18.0             17.9
584       36.8           37.3             38.1
552       21.8           14.3             17.7
Mean      29.0           27.4             28.7
SD        7.9            8.3              10.2

From Table 2, the RMSE varies from 18.2 for subject 552 to 29.7 for subject 584 at 30 minutes, and from 28.3 for subject 596 to 42.6 for subject 584 at 60 minutes. The MAE values lie between 12.7 and 18.1 at 30 minutes and between 19.3 and 31.0 at 60 minutes.

Subject 584 in Figure 3 shows more fluctuations, with higher peaks and lower troughs, than subject 552 in Figure 2. The effect of these variations is reflected in our results, as subject 584 has the highest RMSE values while subject 552 has the lowest. It is evident that the level of variation in the individual subjects contributes significantly to the differences in their RMSE values in our model.

5 CONCLUSION AND FUTURE SCOPE

In this paper, we present the results of applying deep learning models to predict blood glucose values. Potential benefits, such as the prevention of adverse events associated with extreme glucose values, serve as a source of motivation for these efforts. Overall, the sequence-to-sequence models, especially the Bi-LSTM, have the best performance, as these models are best at mapping sequences irrespective of their lengths. Our performance is affected by fluctuations in glucose values and by missing data, as described in previous experiments. Given the overall success of transfer learning, we also evaluated the potential of single-model prediction via a transfer learning approach. The transfer learning approach was inferior to the sequence-to-sequence models.

Compared to the previous papers for the BGLP Challenge, we observed that two papers, [2] and [5], have better results than ours. However, [5] used interpolation as part of their data processing, which is against the rules of the competition, and [2] did not mention details about data processing. Our future work will be to improve the transfer learning model as more common features among all subjects become available, so that we can create a generic model for predicting blood glucose levels. However, the development of a generic model can be challenging because of confounding factors such as variations in sensor types, lifestyles, physiology, and genetics. It is therefore pertinent that these factors be considered in future endeavors.

REFERENCES

[1] Arthur Bertachi, Lyvia Biagi, Iván Contreras, Ningsu Luo, and Josep Vehí, 'Prediction of blood glucose levels and nocturnal hypoglycemia using physiological models and artificial neural networks', in KHD@IJCAI, pp. 85–90, (2018).
[2] Arthur Bertachi, Lyvia Biagi, Iván Contreras, Ningsu Luo, and Josep Vehí, 'Prediction of blood glucose levels and nocturnal hypoglycemia using physiological models and artificial neural networks', in KHD@IJCAI, (2018).
[3] CDC, National Diabetes Statistics Report 2020. Estimates of diabetes and its burden in the United States, 2020.
[4] Jianwei Chen, Kezhi Li, Pau Herrero, Taiyu Zhu, and Pantelis Georgiou, 'Dilated recurrent neural network for short-time prediction of glucose concentration', in KHD@IJCAI, pp. 69–73, (2018).
[5] Jianwei Chen, Kezhi Li, Pau Herrero, Taiyu Zhu, and Pantelis Georgiou, 'Dilated recurrent neural network for short-time prediction of glucose concentration', in KHD@IJCAI, (2018).
[6] François Chollet et al., Keras. https://keras.io, 2015.
[7] Priyanka Gupta, Pankaj Malhotra, Lovekesh Vig, and Gautam Shroff, 'Using features from pre-trained TimeNet for clinical predictions', (07 2018).
[8] Sepp Hochreiter and Jürgen Schmidhuber, 'Long short-term memory', Neural Comput., 9(8), 1735–1780, (November 1997).
[9] Diederik P. Kingma and Jimmy Ba, 'Adam: A method for stochastic optimization', 2014.
[10] W. Kong, Z. Y. Dong, Y. Jia, D. J. Hill, Y. Xu, and Y. Zhang, 'Short-term residential load forecasting based on LSTM recurrent neural network', IEEE Transactions on Smart Grid, 10(1), 841–851, (Jan 2019).
[11] Colin Lea, René Vidal, Austin Reiter, and Gregory D. Hager, 'Temporal convolutional networks: A unified approach to action segmentation', CoRR, abs/1608.08242, (2016).
[12] Cindy Marling and Razvan Bunescu, 'The OhioT1DM dataset for blood glucose level prediction: Update 2020'.
[13] Cindy Marling and Razvan C. Bunescu, 'The OhioT1DM dataset for blood glucose level prediction', in KHD@IJCAI, pp. 60–63, (2018).
[14] Cooper Midroni, Peter Leimbigler, Gaurav Baruah, Maheedhar Kolla, Alfred Whitehead, and Yan Fossat, 'Predicting glycemia in type 1 diabetes patients: Experiments with XGBoost', (07 2018).
[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, 'Scikit-learn: Machine learning in Python', Journal of Machine Learning Research, 12, 2825–2830, (2011).
[16] M. Sangeetha and M. Senthil Kumaran, 'Deep learning-based data imputation on time-variant data using recurrent neural network', Soft Computing, 1–12, (2020).
[17] Alex Sherstinsky, 'Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network', CoRR, abs/1808.03314, (2018).
[18] Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo, 'Convolutional LSTM network: A machine learning approach for precipitation nowcasting', CoRR, abs/1506.04214, (2015).
[19] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, 'Sequence to sequence learning with neural networks', in Advances in Neural Information Processing Systems, pp. 3104–3112, (2014).
[20] Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, and Bo Xu, 'Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling', in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 3485–3495, Osaka, Japan, (December 2016). The COLING 2016 Organizing Committee.
[21] Taiyu Zhu, Kezhi Li, Pau Herrero, Jianwei Chen, and Pantelis Georgiou, 'A deep learning algorithm for personalized blood glucose prediction', in KHD@IJCAI, pp. 64–78, (2018).