<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Blood Glucose Level Prediction as Time-Series Modeling using Sequence-to-Sequence Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ananth Bhimireddy</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Priyanshu Sinha</string-name>
          <email>inha@outlook.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bolu Oluwalade</string-name>
          <email>boluwala@iupui.edu</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saptarshi Purkayastha</string-name>
          <email>purk@iupui.edu</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Judy Wawira Gichoya</string-name>
          <email>wawira@emory.edu</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>The management of blood glucose levels is critical in the care of subjects with Type 1 diabetes. At the extremes, high or low blood glucose levels can be fatal. To avoid such adverse events, wearable technologies that continuously monitor blood glucose and administer insulin have been developed and adopted. This technology allows subjects to easily track their blood glucose levels and intervene early, preventing the need for hospital visits. The data collected from these sensors is an excellent candidate for machine learning algorithms that learn patterns and predict future blood glucose levels. In this study, we developed artificial neural network algorithms based on the OhioT1DM training dataset, which contains data on 12 subjects. The dataset contains features such as subject identifiers, continuous glucose monitoring data obtained at 5-minute intervals, insulin infusion rate, etc. We developed individual models, including LSTM, BiLSTM, Convolutional LSTM, TCN, and sequence-to-sequence models. We also developed transfer learning models based on the most important features of the data, as identified by a gradient boosting algorithm. These models were evaluated on the OhioT1DM test dataset, which contains data on 6 unique subjects. The model with the lowest RMSE values at the 30- and 60-minute horizons was selected as the best-performing model. Our results show that the sequence-to-sequence BiLSTM performed better than the other models. This work demonstrates the potential of artificial neural network algorithms in the management of Type 1 diabetes.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Diabetes Mellitus is a chronic disease characterized by high blood
glucose levels. According to the 2020 CDC National Diabetes
Statistics Report, about 34.2 million people or 10.5 percent of Americans
have diabetes [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Diabetes is classified as Type 1, Type 2, and
Gestational Diabetes. Insulin is a hormone produced in the pancreas
that helps cells absorb blood glucose. In Type 1 diabetes
(T1DM), the pancreas produces little or no insulin. In contrast, in
Type 2 diabetes (T2DM), the pancreas produces a small amount of
insulin, or the body is resistant to the effect of insulin. This absence
of or resistance to insulin leads to an increase in blood glucose levels,
known as hyperglycemia. Symptoms of hyperglycemia include
excessive thirst, excessive urination, sweating, etc. Diabetic
ketoacidosis is a serious complication of uncontrolled hyperglycemia that can
lead to death. On the other hand, elevated insulin levels in the body
can cause low levels of blood glucose, a condition known as
hypoglycemia. Uncontrolled hypoglycemia can cause dizziness, weakness,
coma, and eventually death. Insulin and glucose control are
critical to the management of diabetes, and hence careful titration of the
administered insulin doses is essential for patients with diabetes.
      </p>
      <p>Glucose levels vary according to the patient’s diet and activities
throughout the day. Sensors have been developed to estimate blood
glucose levels at various time intervals. These sensors are useful in
diabetes management because they provide longitudinal data about
subjects’ blood glucose and show distinctive patterns throughout
the day. They are frequently coupled with
insulin pumps that deliver short-acting insulin continuously (basal rate)
and a specific quantity of insulin after a meal for appropriate glycemic
control. Although the sensors and insulin pumps have helped to
improve patient care, patients are typically unaware of an impending
adverse event of severe hyperglycemia or hypoglycemia. These
adverse events commonly occur when patients are asleep. There is an
opportunity to develop accurate prediction models that use
previously collected sensor data to estimate future
blood glucose levels and so prevent adverse events.</p>
      <p>
        In this study, we utilize the OhioT1DM dataset, which contains
blood glucose values of twelve T1DM subjects collected at
intervals over a total time span of eight weeks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. These individuals
wore an insulin pump with continuous glucose monitoring (CGM) and
a physical activity band, and self-reported life events
using a smartphone application. CGM blood glucose data were
obtained at 5-minute intervals [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We developed multiple models for
predicting glucose values 30 and 60 minutes into the future
from the CGM values, and used the mean Root Mean Square Error (RMSE)
as the evaluation metric. The code for this study can be found at
https://github.com/iupui-soic/bglp2
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Models like LSTM and RNN have been used for forecasting in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and these approaches have since been refined. That paper explores short-term
load forecasting for individual electricity customers and proposes an
LSTM-based framework to address it. Machine learning
models such as XGBoost have been used to predict glycemia in
Type 1 diabetes patients [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. That paper experiments primarily with
the XGBoost algorithm to predict blood glucose levels at a
30-minute horizon in the OhioT1DM dataset. Features from pre-trained
TimeNets have been used for clinical predictions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. That
work pre-trains a network on supervised or unsupervised
tasks across datasets, and then fine-tunes it via transfer learning for a
related end-task to leverage the available labeled data in making
predictions. It also points out that training deep learning models
such as RNNs and LSTMs requires large amounts of labeled data and is
computationally expensive.
      </p>
      <p>
        Deep learning models such as Recurrent Neural Networks (RNNs) have
been used on the OhioT1DM dataset to predict future blood glucose
values [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], including in the BGLP Challenge at KDH@IJCAI-ECAI
2018 (http://ceur-ws.org/Vol-2148/). In some cases,
these data-driven models use only the CGM values; in others, they also use
physiological data such as the insulin concentration, amount of carbohydrate in
meals and physical activities. Chen et al. created a data-driven 3-layer
dilated recurrent neural network model with a mean RMSE of 18.9,
ranging from 15.2995 to 22.7104 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. They concluded that
missing data and continuous fluctuations in the data influenced
the model’s performance. Their model outperformed a Convolutional
Neural Network (CNN) that gave an average RMSE of 21.726 for
six subjects [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Bertachi et al. predicted blood glucose levels using
Artificial Neural Networks with the inclusion of physiological data
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Their results were similar to those obtained
with the data-driven models. All the previous studies demonstrate that
a lower RMSE is obtained at the 30-minute prediction horizon than at
the 60-minute horizon. We postulate that a hybrid approach
combining the data-driven and physiological models could
improve on the performance of the individual models, and we
incorporate this in our approach.
      </p>
    </sec>
    <sec id="sec-3">
      <title>METHODS</title>
    </sec>
    <sec id="sec-4">
      <title>Dataset Description</title>
      <p>
        A detailed description of the dataset has been previously published in
the OhioT1DM dataset paper [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We used the data provided on 12
subjects for training and on 6 subjects for testing. Furthermore, the parameters
basal and temp basal were merged into a single parameter.
      </p>
      <p>We converted both the training and testing datasets from XML to
CSV, preserving the time intervals. We did not interpolate
the datasets, as the rules of the competition prohibit interpolation.
We tried forward and backward filling to fill the null values
in the datasets, but these created additional time intervals, which
is a problem in the testing datasets. Therefore, no re-sampling technique
was used in this paper, in order to preserve the time intervals of the samples.
The data pre-processing was performed on only the 6 patients (test and
train datasets) whose results are to be predicted. Subject-548 has the
highest number of training records (12150) and Subject-552 has the
lowest number of records (9080). The number of features varied from subject to
subject, which causes unevenness when training time series models.
We therefore added features to subjects as required to ease the process
of training and predicting. Missing data was handled in all columns by
imputing the null values with zeros (0). We did not alter the glucose
value column, as required by the competition.</p>
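      <p>The column alignment and zero-filling described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the column names (glucose_level, basal, bolus_dose) and the helper align_and_fill are hypothetical, and the sketch assumes each subject's CSV has been loaded into a pandas DataFrame.</p>
      <preformat>
```python
import numpy as np
import pandas as pd

GLUCOSE_COL = "glucose_level"  # hypothetical column name, not from the dataset


def align_and_fill(frame, all_features, glucose_col=GLUCOSE_COL):
    """Give every subject the same columns and zero-fill nulls,
    leaving the glucose column untouched."""
    out = frame.copy()
    # Add any feature this subject lacks so all subjects share one schema.
    for name in all_features:
        if name not in out.columns:
            out[name] = 0.0
    # Zero-fill every column except the glucose target.
    fill_cols = [c for c in out.columns if c != glucose_col]
    out[fill_cols] = out[fill_cols].fillna(0.0)
    # Keep a stable column order across subjects.
    ordered = [glucose_col] + [c for c in all_features if c != glucose_col]
    return out[ordered]


subject = pd.DataFrame({
    "glucose_level": [110.0, np.nan, 125.0],
    "basal": [0.8, np.nan, 0.9],
})
features = ["glucose_level", "basal", "bolus_dose"]
clean = align_and_fill(subject, features)
print(clean)
```
      </preformat>
      <p>In this sketch the missing glucose reading stays null while the missing basal value and the absent bolus_dose column become zeros, so no rows are added or removed and the original time intervals are preserved.</p>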
    </sec>
    <sec id="sec-5">
      <title>Description of ML models</title>
      <p>We used the following deep learning models to predict the blood
glucose levels of each subject. The data pre-processing and model
development are summarized in figure 1.</p>
      <sec id="sec-5-1">
        <title>Long Short-Term Memory (LSTM) Networks</title>
        <p>
          LSTMs were originally introduced by Hochreiter and Schmidhuber
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and later refined and popularized [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. LSTMs are a
special kind of RNN, capable of learning long-term dependencies. This
quality helps LSTMs retain the useful parts of a sequence, so
the model learns parameters more efficiently, making it useful for
time series models.
        </p>
        <p>
          We trained two models using LSTMs, one with 5-min interval data
and another with 30-min interval data. In each model, we used all the
available features at time t to predict the glucose value at time t+1.
Before fitting the dataset, we scaled the dataset using MinMaxScaler
from scikit-learn [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
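        <p>The one-step framing described above (all features at time t, glucose at time t+1) can be sketched with scikit-learn's MinMaxScaler and NumPy. This is an illustrative sketch under assumptions: the helper make_one_step_pairs, the toy values, and the choice of column 0 as the glucose column are ours, not taken from the study's code.</p>
        <preformat>
```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def make_one_step_pairs(values, target_col=0):
    """Pair all features at time t with the target column at time t+1."""
    X = values[:-1, :]
    y = values[1:, target_col]
    # Keras recurrent layers expect input shaped (samples, timesteps, features).
    return X.reshape(X.shape[0], 1, X.shape[1]), y


# Toy data: column 0 is glucose, column 1 is a second feature.
raw = np.array([[100.0, 0.8], [110.0, 0.9], [120.0, 1.0], [130.0, 0.7]])
scaler = MinMaxScaler()              # rescales each column to the range [0, 1]
scaled = scaler.fit_transform(raw)
X, y = make_one_step_pairs(scaled)
print(X.shape, y.shape)  # (3, 1, 2) (3,)
```
        </preformat>
        <p>For the 30-minute variant, one plausible reading is that every sixth 5-minute record is kept before building the pairs; the text does not specify the exact procedure.</p>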
        <p>
          The LSTM model was built using the Keras [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] platform. We used
128 LSTM units, followed by a dense layer (150 units), a dropout layer
(0.20), a dense layer (100 units), a dropout layer (0.15), a dense layer (50
units), a dense layer (20 units) and a final layer with one unit (for
prediction). We used ReLU as the activation function with the Adam
optimizer. The loss was calculated as mean squared error (MSE) and
later converted into Root Mean Squared Error (RMSE). The model
was trained for 200 epochs with a batch size of 32. The results of the
model for each subject are provided in the results table.
        </p>
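        <p>The layer stack described above might be written in Keras roughly as follows. This is a sketch, not the authors' implementation: the function name build_lstm is hypothetical, and applying ReLU to the LSTM layer and to every hidden dense layer is our assumption, since the text does not say which layers use it.</p>
        <preformat>
```python
from tensorflow import keras
from tensorflow.keras import layers


def build_lstm(n_features, timesteps=1):
    """LSTM(128) followed by the dense and dropout stack from the text."""
    model = keras.Sequential([
        keras.Input(shape=(timesteps, n_features)),
        layers.LSTM(128, activation="relu"),   # ReLU placement is an assumption
        layers.Dense(150, activation="relu"),
        layers.Dropout(0.20),
        layers.Dense(100, activation="relu"),
        layers.Dropout(0.15),
        layers.Dense(50, activation="relu"),
        layers.Dense(20, activation="relu"),
        layers.Dense(1),                       # single glucose prediction
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_lstm(n_features=5)
model.summary()
# Training as described in the text (X_train and y_train are placeholders):
# model.fit(X_train, y_train, epochs=200, batch_size=32)
```
        </preformat>
        <p>Because the loss is MSE on the scaled target, the predictions would need to be mapped back to the original glucose units before the RMSE is reported.</p>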
      </sec>
      <sec id="sec-5-3">
        <title>Bi-directional Long Short-Term Memory (BiLSTM) Networks</title>
        <p>As our input data is static, and the entire sequence is available
at the same time, we implemented a BiLSTM model to observe how
processing the sequence in both directions affected the accuracy.
The architecture of the BiLSTM model is similar to that of the LSTM model
and is thus useful for time series prediction.</p>
        <p>The data processing and model parameters for the BiLSTM and
LSTM models were similar, with the exception of the model’s first
layer, where the scaled data was fed into a Bidirectional
LSTM with 128 units. The model was trained for 200 epochs with
a batch size of 32. The results of the model for each subject are
provided in the results table.</p>
      </sec>
      <sec id="sec-5-4">
        <title>Temporal Convolutional Networks (TCN)</title>
        <p>
          TCNs were originally introduced by Lea et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] in 2016.
TCNs are useful for capturing high-level temporal
relationships in sequential data. The TCN architecture allows
capturing long-range spatio-temporal relationships. TCNs help
capture the blood glucose levels of subjects who usually have routine
lifestyles, as TCNs can capture hierarchical relationships at low,
intermediate, and high time scales.
        </p>
        <p>The data processing steps are similar to those of the LSTM model. The
TCN model was built using the Keras platform, but the
model is shallower than the LSTM and BiLSTM
models. The scaled data was fed into a TCN layer, which was then
connected to a dense layer with one unit for the output. The model used
the Adam optimizer and MSE for calculating loss, which was later
converted to RMSE. The model was trained for 10 epochs and the
obtained results are provided in the results table.</p>
      </sec>
      <sec id="sec-5-5">
        <title>Convolutional LSTM</title>
        <p>
          Convolutional LSTMs (ConvLSTM) were introduced by Xingjian
Shi et al. [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] in 2015. ConvLSTMs are created by extending the fully
connected LSTM to have convolutional structure in both the
input-to-state and state-to-state transitions. ConvLSTM networks capture
spatio-temporal correlations better and usually outperform fully
connected LSTM networks.
        </p>
        <p>The scaled data was reshaped and fed to a convolution layer
with 32 filters of kernel size 1, followed by an LSTM layer with 128
units, a Dense layer with 150 units, a Dropout layer with a 0.2 dropout
rate, a Dense layer with 100 units, a Dropout layer with a 0.15 rate, a Dense
layer with 50 units, a Dense layer with 16 units and finally a Dense
layer with 1 unit for prediction. The ReLU activation function was used
in all layers. The model used the Adam optimizer, and the MSE loss was
further converted to RMSE for model comparison. The model was
trained for 200 epochs with a batch size of 32. The obtained results
are provided in the results table.</p>
      </sec>
      <sec id="sec-5-6">
        <title>Description of Sequence-to-Sequence Models</title>
        <p>
          Sequence to sequence models were first introduced by Google in
2014 [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. These models map a fixed-length input to a fixed-length
output, where the lengths of the input and output may differ. Sequence-to-sequence
models consist of three parts:
        </p>
        <p>Encoder: The encoder consists of a stack of several recurrent (LSTM)
units, where each unit takes a single element of the input sequence,
extracts information from it and propagates it to the next unit.</p>
        <p>Encoded Vector: This is the intermediate step and the final
hidden state of the encoder. It also acts as the first hidden state for
the decoder to make predictions. This vector encapsulates the
information from all input samples and provides this information to
the decoder.</p>
        <p>Decoder: The decoder consists of a stack of recurrent units, where each unit
predicts the output at time step t. Each unit accepts a hidden state as
input and produces an output as well as its own hidden state.</p>
        <p>The main advantage of this architecture is that it can map
sequences of different lengths to each other. We applied 3 variants
of sequence-to-sequence models, viz., sequence-to-sequence LSTM,
sequence-to-sequence Bi-LSTM, and sequence-to-sequence
CNN-LSTM.</p>
        <p>To train the sequence-to-sequence models, we split the data into
windows of 60 minutes. This approach is intuitive and helpful, as
blood glucose values can be predicted 1 hour ahead. It is also helpful in
modelling, as the model can be used to predict blood glucose values at
a specific time (for example, after 10 minutes) or a whole sequence
of blood glucose values.</p>
        <p>To evaluate these sequence-to-sequence models, we used
walk-forward validation. Here the model predicts the next hour, and then
the actual data for that hour is provided so that the model can predict the following
hour. See table 1 for an illustration.</p>
        <p>For our training, we set the input size (the number of prior
observations required to make the next predictions) to 30 minutes of data, used to predict
the next 60 minutes of data. Each sequence-to-sequence model used in our
work is described below:</p>
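        <p>The windowing and walk-forward evaluation described above can be sketched in NumPy. This is a minimal sketch under assumptions: with 5-minute sampling, 30 minutes of input corresponds to 6 steps and 60 minutes of output to 12 steps; the stride-1 sliding windows, the helper names, and the persistence placeholder model are ours and are not taken from the study's code.</p>
        <preformat>
```python
import numpy as np


def make_windows(series, n_in=6, n_out=12):
    """Slide over a 1-D series: n_in past steps map to n_out future steps."""
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(y)


def walk_forward(predict, history, test, n_in=6, n_out=12):
    """Forecast one hour, then reveal that hour's true values and repeat."""
    history = list(history)
    preds = []
    for start in range(0, len(test) - n_out + 1, n_out):
        window = np.array(history[-n_in:])
        preds.append(predict(window, n_out))
        # Append the observed hour so the next forecast uses real data.
        history.extend(test[start:start + n_out])
    return np.array(preds)


def persistence(window, n_out):
    """Placeholder model: repeat the last observed value."""
    return np.full(n_out, window[-1])


series = np.arange(48, dtype=float)   # toy stand-in for a CGM trace
X, y = make_windows(series[:24])
preds = walk_forward(persistence, series[:24], series[24:])
print(X.shape, y.shape, preds.shape)  # (7, 6) (7, 12) (2, 12)
```
        </preformat>
        <p>Any trained model can replace the persistence placeholder as long as it accepts the last n_in observations and returns n_out predictions.</p>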
        <sec id="sec-5-6-1">
          <title>1. Sequence-to-Sequence LSTM</title>
          <p>
            In this model, we used 200 LSTM cells for the encoder and
decoder. This layer was followed by 2 dense layers containing
150 and 1 units wrapped in a TimeDistributed layer. The model
was trained for 80 epochs with a batch size of 40. We used the Adam
optimizer [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] with a learning rate of 0.01 and MSE as the loss function.
          </p>
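          <p>One common way to express such an encoder-decoder in Keras is sketched below. It is an approximation under stated assumptions, not the authors' code: the RepeatVector layer passes the encoded vector to the decoder as a repeated input rather than as its initial hidden state, the ReLU activation on the 150-unit layer is our choice, and the 6-step input and 12-step output assume 5-minute sampling.</p>
          <preformat>
```python
from tensorflow import keras
from tensorflow.keras import layers


def build_seq2seq_lstm(n_in=6, n_out=12, n_features=1):
    """Encoder LSTM, repeated encoded vector, decoder LSTM, per-step dense head."""
    model = keras.Sequential([
        keras.Input(shape=(n_in, n_features)),
        layers.LSTM(200),                          # encoder: returns the encoded vector
        layers.RepeatVector(n_out),                # one copy per output time step
        layers.LSTM(200, return_sequences=True),   # decoder: one state per step
        layers.TimeDistributed(layers.Dense(150, activation="relu")),
        layers.TimeDistributed(layers.Dense(1)),   # one glucose value per step
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.01),
        loss="mse",
    )
    return model


model = build_seq2seq_lstm()
model.summary()
# Training as described in the text (X_train and y_train are placeholders):
# model.fit(X_train, y_train, epochs=80, batch_size=40)
```
          </preformat>
          <p>Under the same assumptions, wrapping the two LSTM layers in layers.Bidirectional with 100 units each would give a sketch of the Bi-LSTM variant.</p>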
        </sec>
        <sec id="sec-5-6-2">
          <title>2. Sequence-to-Sequence Bi-LSTM</title>
          <p>In this model, we used 100 Bi-LSTM cells (LSTM cells wrapped
in a Bidirectional wrapper) for the encoder and decoder. This layer
was followed by 2 dense layers containing 150 and 1 units
wrapped in a TimeDistributed layer. The model was trained for
80 epochs with a batch size of 40. As with the sequence-to-sequence
LSTM, the Adam optimizer was used with a learning rate of 0.01 and
“mean squared error” as the loss function.</p>
        </sec>
        <sec id="sec-5-6-3">
          <title>3. Sequence-to-Sequence CNN-LSTM</title>
          <p>In this model, we used two 1D convolutional layers with filter sizes of 128 and 64, respectively.
The convolutional layers were followed by a MaxPooling 1D layer
and a flatten layer for the encoder. 200 LSTM cells were used for
the decoder. This layer was followed by 2 dense layers
containing 100 and 1 units wrapped in a TimeDistributed layer. The
model was trained for 80 epochs with a batch size of 40. As with the
above models, the Adam optimizer was used with a learning rate of 0.01
and “mean squared error” as the loss function.</p>
          <p>[Table: RMSE values for sequence-to-sequence models at 30 minutes. Columns: Seq-2-Seq LSTM, Seq-2-Seq BiLSTM, Seq-2-Seq CNN-LSTM.]</p>
        </sec>
      </sec>
      <sec id="sec-5-7">
        <title>Transfer Learning</title>
        <p>For transfer learning, we first found the most relevant features
common to all 12 subjects. The importance of features was determined
using a Gradient Boosting algorithm. We set the cumulative
frequency to 0.99 for feature selection, and the 5 most important
common features are as follows: 'finger stick value', 'basal rate
value', 'galvanic skin response value', 'skin temperature value', and
'bolus dose value'.</p>
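        <p>A feature-selection step of this kind can be sketched with scikit-learn. The sketch rests on assumptions: we read the 0.99 cut-off as a threshold on cumulative feature importance, use GradientBoostingRegressor with default settings, and take the intersection of the per-subject selections as the common features. The helper names and the synthetic data are hypothetical.</p>
        <preformat>
```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def select_by_cumulative_importance(X, y, names, threshold=0.99, seed=0):
    """Keep the top-ranked features whose importances sum to the threshold."""
    model = GradientBoostingRegressor(random_state=seed).fit(X, y)
    importance = model.feature_importances_
    order = np.argsort(importance)[::-1]          # most important first
    cumulative = np.cumsum(importance[order])
    # Smallest prefix whose cumulative importance reaches the threshold.
    n_keep = int(np.searchsorted(cumulative, threshold)) + 1
    n_keep = min(n_keep, len(names))
    return [names[i] for i in order[:n_keep]]


def common_features(per_subject):
    """Features that were selected for every subject."""
    shared = set(per_subject[0])
    for chosen in per_subject[1:]:
        shared = shared.intersection(chosen)
    return sorted(shared)


# Synthetic example: only f0 and f1 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1]
names = ["f0", "f1", "f2", "f3"]
selected = select_by_cumulative_importance(X, y, names)
shared = common_features([["a", "b"], ["b", "c"]])
print(selected, shared)
```
        </preformat>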
        <p>Using only the important features identified above, we trained our
model on a randomly selected subject (567) and subsequently
fine-tuned each sequence-to-sequence model on each subject. The final
model was used for prediction on the test data. The configuration
of each model was similar to that of the corresponding sequence-to-sequence model
described above.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>RESULTS</title>
      <p>The RMSE values of the 30-minute horizon predictions of the models
are presented in tables 3, 4 and 5. From the tables, the
sequence-to-sequence (Seq2Seq) models performed better than all the other
models, with an average RMSE of 23.5 for Seq-2-Seq LSTM, 21.8
for Seq-2-Seq BiLSTM, and 23.0 for Seq-2-Seq CNN-LSTM.
Table 2 describes the RMSE and MAE values for the
sequence-to-sequence BiLSTM model, which performed better than the individual
and transfer learning models.</p>
      <p>From Table 2, the RMSE varies from 18.2 in subject
552 to 29.7 in subject 584 at 30 minutes, and from 28.3 in
subject 596 to 42.6 in subject 584 at 60 minutes. The MAE values lie
between 12.7 and 18.1 at 30 minutes and between 19.3 and 31.0 at 60
minutes.</p>
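      <p>For reference, the two metrics reported here can be computed at a given horizon as in the following sketch. It assumes forecasts are stored as arrays of shape (windows, steps) at 5-minute resolution, so the 30- and 60-minute horizons correspond to step indices 5 and 11; the function names and toy values are illustrative only.</p>
      <preformat>
```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


def horizon_scores(y_true, y_pred, step_minutes=5, horizons=(30, 60)):
    """Return {horizon: (RMSE, MAE)} for forecasts shaped (windows, steps)."""
    scores = {}
    for minutes in horizons:
        idx = minutes // step_minutes - 1   # 30 min -> index 5, 60 min -> index 11
        scores[minutes] = (
            rmse(y_true[:, idx], y_pred[:, idx]),
            mae(y_true[:, idx], y_pred[:, idx]),
        )
    return scores


y_true = np.zeros((2, 12))
y_pred = np.zeros((2, 12))
y_pred[:, 5] = [3.0, -3.0]    # errors at the 30-minute step
y_pred[:, 11] = [0.0, 4.0]    # errors at the 60-minute step
scores = horizon_scores(y_true, y_pred)
print(scores)
```
      </preformat>
      <p>If the model was trained on scaled data, the predictions would first need to be mapped back to the original glucose units for these numbers to be comparable across studies.</p>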
      <p>Subject 584 in figure 3 shows more fluctuations, with a higher peak
and lower troughs, than subject 552 in figure 2. The effect of these
variations is reflected in our results, as subject 584 has the highest
RMSE values while subject 552 has the lowest. It is
evident that the level of variation in each subject contributes
significantly to the differences in RMSE values between
subjects in our model.</p>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION AND FUTURE SCOPE</title>
      <p>In this paper, we present the results of applying deep learning
models to predict blood glucose values. Potential benefits
such as the prevention of adverse events associated with extreme
glucose values serve as a source of motivation for these efforts.
Overall, sequence-to-sequence models, especially Bi-LSTM, have the best
performance, as these models are best at mapping sequences
irrespective of their lengths. Our performance is affected by fluctuations in
glucose values and also by missing data, as described in previous
experiments. Given the overall success of transfer learning, we also
evaluated the potential of single-model prediction via a transfer
learning approach. The transfer learning approach was inferior to the
sequence-to-sequence models.</p>
      <p>
        Compared with previous papers from the BGLP Challenge, we
observed that two papers [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] report better results than
ours. However, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] used interpolation as part of data
processing, which is against the rules of the competition, and [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] did not
mention details about data processing. Our future work will be to
improve the transfer learning model as more
common features among all subjects become available, so that we can create a generic
model for predicting blood glucose levels. However, the development
of a generic model can be challenging because of confounding
factors such as variations in sensor types, lifestyles, physiology and
genetics. It is therefore pertinent that these factors are considered in
future endeavors.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Arthur</given-names>
            <surname>Bertachi</surname>
          </string-name>
          , Lyvia Biagi, Iván Contreras, Ningsu Luo, and Josep Vehí, '
          <article-title>Prediction of blood glucose levels and nocturnal hypoglycemia using physiological models and artificial neural networks</article-title>
          ',
          <source>in KHD@IJCAI</source>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>90</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Arthur</given-names>
            <surname>Bertachi</surname>
          </string-name>
          , Lyvia Biagi, Iván Contreras, Ningsu Luo, and Josep Vehí, '
          <article-title>Prediction of blood glucose levels and nocturnal hypoglycemia using physiological models and artificial neural networks'</article-title>
          ,
          <source>in KHD@IJCAI</source>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] CDC,
          <source>National Diabetes Statistics Report 2020: Estimates of diabetes and its burden in the United States</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Jianwei</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kezhi</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pau</given-names>
            <surname>Herrero</surname>
          </string-name>
          , Taiyu Zhu, and Pantelis Georgiou, '
          <article-title>Dilated recurrent neural network for short-time prediction of glucose concentration</article-title>
          ',
          <source>in KHD@IJCAI</source>
          , pp.
          <fpage>69</fpage>
          -
          <lpage>73</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Jianwei</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kezhi</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pau</given-names>
            <surname>Herrero</surname>
          </string-name>
          , Taiyu Zhu, and Pantelis Georgiou, '
          <article-title>Dilated recurrent neural network for short-time prediction of glucose concentration'</article-title>
          ,
          <source>in KHD@IJCAI</source>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] François Chollet et al. Keras. https://keras.io,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Priyanka</given-names>
            <surname>Gupta</surname>
          </string-name>
          , Pankaj Malhotra, Lovekesh Vig, and Gautam Shroff, '
          <article-title>Using features from pre-trained TimeNet for clinical predictions</article-title>
          ', (07
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and Jürgen Schmidhuber, '
          <article-title>Long short-term memory</article-title>
          ',
          <source>Neural Comput.</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          , (
          <year>November 1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Diederik P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , '
          <article-title>Short-term residential load forecasting based on LSTM recurrent neural network'</article-title>
          ,
          <source>IEEE Transactions on Smart Grid</source>
          ,
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <fpage>841</fpage>
          -
          <lpage>851</lpage>
          , (
          <year>Jan 2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Colin</given-names>
            <surname>Lea</surname>
          </string-name>
          , René Vidal, Austin Reiter, and
          <string-name>
            <given-names>Gregory D.</given-names>
            <surname>Hager</surname>
          </string-name>
          , '
          <article-title>Temporal convolutional networks: A unified approach to action segmentation'</article-title>
          , CoRR, abs/1608.08242, (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Cindy</given-names>
            <surname>Marling</surname>
          </string-name>
          and Razvan Bunescu, '
          <article-title>The OhioT1DM dataset for blood glucose level prediction: Update 2020</article-title>
          ', (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Cindy</given-names>
            <surname>Marling</surname>
          </string-name>
          and Razvan C. Bunescu, '
          <article-title>The OhioT1DM dataset for blood glucose level prediction</article-title>
          ',
          <source>in KHD@IJCAI</source>
          , pp.
          <fpage>60</fpage>
          -
          <lpage>63</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Cooper</given-names>
            <surname>Midroni</surname>
          </string-name>
          , Peter Leimbigler, Gaurav Baruah, Maheedhar Kolla, Alfred Whitehead, and Yan Fossat, '
          <article-title>Predicting glycemia in type 1 diabetes patients: Experiments with XGBoost</article-title>
          ', (07
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and E. Duchesnay, '
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          ',
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          , (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M</given-names>
            <surname>Sangeetha</surname>
          </string-name>
          and M Senthil Kumaran, '
          <article-title>Deep learning-based data imputation on time-variant data using recurrent neural network</article-title>
          ',
          <source>Soft Computing</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          , (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Sherstinsky</surname>
          </string-name>
          , '
          <article-title>Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network</article-title>
          ', CoRR, abs/1808.03314, (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Xingjian</given-names>
            <surname>Shi</surname>
          </string-name>
          , Zhourong Chen, Hao Wang,
          <string-name>
            <given-names>Dit-Yan</given-names>
            <surname>Yeung</surname>
          </string-name>
          , Wai-Kin Wong, and Wang-chun Woo, '
          <article-title>Convolutional LSTM network: A machine learning approach for precipitation nowcasting</article-title>
          ', CoRR, abs/1506.04214, (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , Oriol Vinyals, and Quoc V Le, '
          <article-title>Sequence to sequence learning with neural networks</article-title>
          ', in
          <source>Advances in Neural Information Processing Systems</source>
          , pp.
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          , (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Peng</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Zhenyu Qi, Suncong Zheng, Jiaming Xu,
          <string-name>
            <given-names>Hongyun</given-names>
            <surname>Bao</surname>
          </string-name>
          , and Bo Xu, '
          <article-title>Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling</article-title>
          ', in
          <source>Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers</source>
          , pp.
          <fpage>3485</fpage>
          -
          <lpage>3495</lpage>
          , Osaka, Japan, (
          <month>December</month>
          <year>2016</year>
          ).
          <publisher-name>The COLING 2016 Organizing Committee</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Taiyu</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kezhi</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pau</given-names>
            <surname>Herrero</surname>
          </string-name>
          , Jianwei Chen, and Pantelis Georgiou, '
          <article-title>A deep learning algorithm for personalized blood glucose prediction</article-title>
          ', in
          <source>KDH@IJCAI</source>
          , pp.
          <fpage>64</fpage>
          -
          <lpage>78</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>