<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic blood glucose prediction with confidence using recurrent neural networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>John Martinsson</string-name>
          <email>john.martinsson@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Schliep</string-name>
          <email>alexander@schlieplab.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Björn Eliasson</string-name>
          <email>bjorn.eliasson@gu.se</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Meijner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simon Persson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olof Mogren</string-name>
          <email>olof@mogren.one</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chalmers University of Technology</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Gothenburg University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Sahlgrenska University Hospital</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Low-cost sensors continuously measuring blood glucose levels at intervals of a few minutes and mobile platforms combined with machine-learning (ML) solutions enable personalized precision health and disease management. ML solutions must be adapted to different sensor technologies, analysis tasks and individuals. This raises the issue of scale for creating such adapted ML solutions. We present an approach for predicting blood glucose levels for diabetics up to one hour into the future. The approach is based on recurrent neural networks trained in an end-to-end fashion, requiring nothing but the glucose level history for the patient. The model outputs the prediction along with an estimate of its certainty, helping users to interpret the predicted levels. The approach needs no feature engineering or data pre-processing, and is computationally inexpensive.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Our future will be recorded and quantified in unprecedented
temporal resolution and with a rapidly increasing variety of
variables describing activities we engage in as well as
physiologically and medically relevant phenomena. One
example is the increasingly wide adoption of continuous blood
glucose monitoring systems (CGM) which has given type-1
diabetics (T1D) a valuable tool for closely monitoring and
reacting to their current blood glucose levels and trends.
Blood glucose levels adhere to complex dynamics that
depend on many different variables (such as carbohydrate
intake, recent insulin injections, physical activity, stress
levels, the presence of an infection in the body, sleeping
patterns, hormonal patterns, etc.) [Bremer and Gough, 1999;
Cryer et al., 2003]. This makes predicting short-term
blood glucose changes (up to a few hours) a challenging task,
and makes developing machine learning (ML) methods an
obvious route to improving patient care. Variations in sensor
technologies must be reflected in the ML method. However,
acquiring domain expertise, understanding sensors, and
handcrafting features is expensive and not easy to scale up.
Sometimes natural, obviously important and well-studied variables
(e.g. caloric intake for diabetics) might be too inconvenient
to measure for end-users. On the other hand, deep
learning approaches are a step towards automated machine
learning, as features, classifiers and predictors are simultaneously
learned. Thus they present a possibly more scalable
solution to the myriad of machine learning problems in precision
health management resulting from technology changes alone.</p>
      <p>The hypotheses underlying our approach are:</p>
      <p>It is feasible to predict glucose levels from glucose levels
alone.</p>
      <p>Appropriate models can be trained by non-experts
without feature engineering or complicated training
procedures.</p>
      <p>Models can quantify uncertainty in their predictions to
alert users to the need for extra caution or additional
input.</p>
      <p>Physiologically motivated loss functions improve the
quality of predictions.</p>
      <p>We trained and evaluated our method on the Ohio T1DM
Dataset for Blood Glucose Level Prediction; see [Marling and
Bunescu, 2018] for details.</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>A recurrent neural network (RNN) is an artificial
neural network that can model a sequence of arbitrary length
by sharing weights across the positions in the sequence;
unrolled over the sequence, it becomes a feed-forward network.
In the basic RNN variant, the transition function is a linear
transformation of the hidden state and the input, followed by
a pointwise nonlinearity:</p>
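      <p><disp-formula><tex-math>h_t = \tanh(W x_t + U h_{t-1} + b),</tex-math></disp-formula>
where <inline-formula><tex-math>W</tex-math></inline-formula> and <inline-formula><tex-math>U</tex-math></inline-formula> are weight matrices, <inline-formula><tex-math>b</tex-math></inline-formula> is a bias vector, and
tanh is the selected nonlinearity. <inline-formula><tex-math>W</tex-math></inline-formula>, <inline-formula><tex-math>U</tex-math></inline-formula>, and <inline-formula><tex-math>b</tex-math></inline-formula> are
typically trained using some variant of stochastic gradient descent
(SGD).</p>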
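      <p>The transition function above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (matvec, rnn_step, rnn_forward), the zero initial state and the toy weights are ours and are not taken from the released software.</p>
      <preformat><![CDATA[
```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def rnn_step(x_t, h_prev, W, U, b):
    """One basic RNN transition: h_t = tanh(W x_t + U h_{t-1} + b)."""
    Wx = matvec(W, x_t)
    Uh = matvec(U, h_prev)
    return [math.tanh(wx + uh + b_k) for wx, uh, b_k in zip(Wx, Uh, b)]

def rnn_forward(xs, W, U, b):
    """Apply the same W, U, b at every position, starting from a zero state (assumed)."""
    h = [0.0] * len(b)
    for x_t in xs:
        h = rnn_step(x_t, h, W, U, b)
    return h

# Toy example: two hidden units, scalar input, illustrative weights.
W = [[0.5], [-0.3]]
U = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
h_final = rnn_forward([[1.2], [1.3], [1.1]], W, U, b)
```
]]></preformat>
      <p>Because the same weights are reused at every step, the function accepts input sequences of any length.</p>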
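      <p>Basic RNNs struggle to learn long-range dependencies and
suffer from the vanishing gradient problem. This makes them
difficult to train [Hochreiter, 1998; Bengio et al., 1994], and
has motivated the development of the Long Short-Term
Memory (LSTM) [Hochreiter and Schmidhuber, 1997], which to
some extent addresses these shortcomings. An LSTM is an RNN
where the cell at each step <inline-formula><tex-math>t</tex-math></inline-formula> contains an internal memory
vector <inline-formula><tex-math>c_t</tex-math></inline-formula>, and three gates controlling which parts of the
internal memory will be kept (the forget gate <inline-formula><tex-math>f_t</tex-math></inline-formula>), which parts of
the input will be stored in the internal memory (the
input gate <inline-formula><tex-math>i_t</tex-math></inline-formula>), and what will be included in the output
(the output gate <inline-formula><tex-math>o_t</tex-math></inline-formula>). In essence, this means that the
following expressions are evaluated at each step in the sequence to
compute the new internal memory <inline-formula><tex-math>c_t</tex-math></inline-formula> and the cell output <inline-formula><tex-math>h_t</tex-math></inline-formula>.
Here <inline-formula><tex-math>\odot</tex-math></inline-formula> represents element-wise multiplication and <inline-formula><tex-math>\sigma</tex-math></inline-formula> the logistic sigmoid function.
<disp-formula><tex-math>\begin{gathered}
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
u_t = \tanh(W_u x_t + U_u h_{t-1} + b_u), \\
c_t = i_t \odot u_t + f_t \odot c_{t-1}, \\
h_t = o_t \odot \tanh(c_t).
\end{gathered}</tex-math></disp-formula></p>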
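      <p>A single LSTM step can be written out directly from the gate definitions. The sketch below is illustrative only: the list-based helpers, the parameter dictionary keyed by gate name, and the all-zero toy weights are assumptions made for readability, and a real implementation would use a framework layer instead.</p>
      <preformat><![CDATA[
```python
import math

def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def pre_activation(weights, x, h):
    """Compute W x + U h + b for one gate; weights is a (W, U, b) triple."""
    W, U, b = weights
    return [
        sum(w * x_j for w, x_j in zip(W_row, x))
        + sum(u * h_j for u, h_j in zip(U_row, h))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps "i", "f", "o", "u" to (W, U, b) triples."""
    i_t = [sigmoid(z) for z in pre_activation(params["i"], x_t, h_prev)]
    f_t = [sigmoid(z) for z in pre_activation(params["f"], x_t, h_prev)]
    o_t = [sigmoid(z) for z in pre_activation(params["o"], x_t, h_prev)]
    u_t = [math.tanh(z) for z in pre_activation(params["u"], x_t, h_prev)]
    # New memory: gated candidate plus gated previous memory.
    c_t = [i * u + f * c for i, u, f, c in zip(i_t, u_t, f_t, c_prev)]
    # Cell output: output gate applied to the squashed memory.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Toy check: one hidden unit, scalar input, all weights zero (illustrative).
zero = ([[0.0]], [[0.0]], [0.0])
params = {"i": zero, "f": zero, "o": zero, "u": zero}
h_t, c_t = lstm_step([1.0], [0.0], [2.0], params)
```
]]></preformat>
      <p>With all weights at zero every gate evaluates to 0.5 and the candidate to 0, so the new memory is half of the previous memory; this makes the role of the forget gate easy to verify by hand.</p>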
      <p>We model the blood glucose levels using a recurrent
neural network (see Fig. 1), working on the sequence of input
data provided by the CGM sensor system. The network
consists of Long short-term memory (LSTM) cells [Hochreiter
and Schmidhuber, 1997]. The whole model takes as input
a sequence of blood glucose measurements from the CGM
system and outputs one prediction regarding the blood
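glucose level after time <inline-formula><tex-math>T</tex-math></inline-formula> (we present experimental evaluation
for <inline-formula><tex-math>T \in \{30, 60\}</tex-math></inline-formula> minutes). An RNN is designed to take a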
vector of inputs at each timestep, but in the case of feeding
the network with blood glucose measurements only, the input
vectors are one-dimensional (effectively scalar valued).</p>
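      <p>The output vector from the final LSTM cell (see <inline-formula><tex-math>h_t</tex-math></inline-formula> in
Fig. 1) in the sequence is fed through a fully connected output
layer having two outputs with a linear activation function,
<disp-formula><tex-math>[\mu, \sigma^2] = W_l h_t + b_l.</tex-math></disp-formula>
The output is modeled as a univariate Gaussian
distribution [Graves, 2013], using one value for the mean, <inline-formula><tex-math>\mu</tex-math></inline-formula>, and
one value for the variance, <inline-formula><tex-math>\sigma^2</tex-math></inline-formula>. This gives us an estimate of
the confidence in the model's predictions.</p>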
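      <p>The negative log-likelihood (NLL) loss function is based
on the Gaussian probability density function,
<disp-formula><tex-math>\mathcal{L} = \frac{1}{k} \sum_{i=0}^{k} -\log \mathcal{N}(y_i \mid \mu_i, \sigma_i^2),</tex-math></disp-formula>
where <inline-formula><tex-math>y_i</tex-math></inline-formula> is the target value from the data, and <inline-formula><tex-math>\mu_i</tex-math></inline-formula>, <inline-formula><tex-math>\sigma_i^2</tex-math></inline-formula> are the
network's output given the input sequence <inline-formula><tex-math>x_i</tex-math></inline-formula>. This way of
modeling the prediction facilitates basing decisions on the
predictions.</p>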
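      <p>To make the effect of the NLL loss concrete, the sketch below evaluates the Gaussian negative log-density for a few hand-picked values. It is a minimal illustration and not the loss code from the released software: the function names and the numbers are ours, the loss is averaged over the examples it is given, and the variance is assumed to be strictly positive, which the text does not spell out for a linear output layer.</p>
      <preformat><![CDATA[
```python
import math

def gaussian_nll(y, mu, var):
    """Negative log-density of y under N(mu, var); var is assumed positive."""
    return 0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)

def mean_gaussian_nll(targets, means, variances):
    """Average the per-example NLL over a batch."""
    losses = [gaussian_nll(y, m, v) for y, m, v in zip(targets, means, variances)]
    return sum(losses) / len(losses)

# Illustrative values on the scaled glucose axis (mg/dl times 0.01).
targets = [1.20, 0.95]
# Accurate means with a small variance.
confident = mean_gaussian_nll(targets, [1.20, 0.95], [0.01, 0.01])
# Inaccurate means with the same small variance.
overconfident = mean_gaussian_nll(targets, [1.50, 0.60], [0.01, 0.01])
# The same inaccurate means, but with a larger predicted variance.
hedged = mean_gaussian_nll(targets, [1.50, 0.60], [0.10, 0.10])
```
]]></preformat>
      <p>In this example the loss is lowest for accurate, confident predictions and highest for inaccurate predictions reported with a small variance, which is the behavior that encourages the network to widen its variance where it tends to be wrong.</p>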
      <sec id="sec-2-1">
        <title>Physiological loss function</title>
        <p>We also trained the model
with a glucose-specific loss function [Favero et al., 2012],
which is a metric that combines the mean squared error with
a penalty term for predictions that would lead to clinically
dangerous treatments.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Experimental setup</title>
        <p>The only preprocessing done on the glucose values is
scaling by 0.01, as in [Mirshekarian et al., 2017], to get the glucose
values into a range fit for training.</p>
        <p>Hyperparameter selection was performed on patients
559 and 591 in the Ohio T1DM Dataset for Blood
Glucose Level Prediction [Marling and Bunescu, 2018]. For each patient, we trained
on the first 60% of the training data, used the
next 20% of the data for early stopping, and selected the
hyperparameters by the performance on the last 20% of the data.
We trained five models, with different
random initializations, for each of a set of configurations combining
30, 120 and 240 minutes of history with an
LSTM state size of 8, 32, 96 and 128. Each model was
allowed a maximum of 200 epochs, with early stopping at a
patience of 8. The configuration that generalized best for
the two patients used 30 minutes of glucose level
history and 128 LSTM states. This can be seen in Fig. 2; note
the blue line. Using 30 minutes of history in combination
with few LSTM states results in a high RMSE score for both
patients, but 30 minutes of history in combination with 128
LSTM states works well for both patients. The problem of
selecting the proper model and the amount of glucose level
history that the model should use to make the prediction
warrants further research and should
be addressed in future work.</p>
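        <p>The selection protocol above amounts to a chronological three-way split and a small grid of configurations. A minimal sketch follows; the function name, the integer-percentage arguments and the placeholder series are assumptions for illustration and do not come from the released scripts.</p>
        <preformat><![CDATA[
```python
from itertools import product

def chronological_split(values, train_pct=60, stop_pct=20):
    """Split a time-ordered series into train / early-stopping / selection parts."""
    n = len(values)
    first_cut = n * train_pct // 100
    second_cut = n * (train_pct + stop_pct) // 100
    return values[:first_cut], values[first_cut:second_cut], values[second_cut:]

# Grid described in the text: history length (minutes) times LSTM state size.
history_minutes = [30, 120, 240]
state_sizes = [8, 32, 96, 128]
configurations = list(product(history_minutes, state_sizes))

# Placeholder series standing in for one patient's training data.
series = list(range(100))
train, stop, select = chronological_split(series)
```
]]></preformat>
        <p>Splitting by position rather than at random keeps the early-stopping and selection portions strictly later in time than the data used for fitting.</p>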
        <p>Final models: The final models were trained using 30
minutes of glucose level history for predictions 30 and 60 minutes
into the future, respectively. The setup for the final training
was to train on the first 80% of the glucose level training data
for each patient, and validate on the last 20%. The final
models were given a low learning rate of <inline-formula><tex-math>10^{-5}</tex-math></inline-formula>, a maximum
of 10,000 epochs, and an early stopping patience of 256
to allow them more time to converge. These final models
were then the only models run on the supplied test data. Note
that there are values in the test data for which no
predictions have been made.</p>
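        <p>The role of the patience parameter can be illustrated with a small stand-alone function. The rule sketched here (stop once the validation loss has failed to improve for a number of consecutive epochs equal to the patience) is an assumption about the usual behavior of early stopping callbacks; the text does not specify the exact rule, and the function name and loss values are hypothetical.</p>
        <preformat><![CDATA[
```python
def best_epoch_with_patience(val_losses, patience):
    """Return (best_epoch, stop_epoch) under a patience-based stopping rule.

    Training stops at the first epoch that lies `patience` epochs after the
    best one without any improvement in between (assumed rule).
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Ran out of epochs before the patience was exhausted.
    return best_epoch, len(val_losses) - 1

# Hypothetical validation losses for eight epochs.
losses = [1.0, 0.8, 0.9, 0.85, 0.7, 0.75, 0.72, 0.71]
short_patience = best_epoch_with_patience(losses, patience=2)
long_patience = best_epoch_with_patience(losses, patience=3)
```
]]></preformat>
        <p>In this toy run the shorter patience stops before the later, better minimum is reached, which is the motivation for a larger patience when a low learning rate makes progress slow.</p>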
        <p>Missing data: The number of missing predictions depends
on the number of gaps in the data, i.e., the number of
pairs of consecutive measurements in the glucose level data
where the time-step is not exactly five minutes. We do not
interpolate to fill the missing values since it is unclear how
much bias this would introduce, and instead only use data for
which it is possible to create the <inline-formula><tex-math>(x, y)</tex-math></inline-formula> pairs of glucose
history and regression targets at the given horizon. The greatest
number of gaps in the test data is 11 for patient 559. Using 30
minutes of history (6 time-steps) and predicting 30 minutes
into the future (6 time-steps) results in <inline-formula><tex-math>12 \cdot 11 = 132</tex-math></inline-formula> values
which have no predictions, since we need at least 12
consecutive measurements to create an <inline-formula><tex-math>(x, y)</tex-math></inline-formula> pair. The test portion of
the dataset contains 2514, 2570, 2745, 2590, 2791 and 2760
test points, which gives us an upper bound of roughly 5% of
missing predictions for each patient. See the discussion of
missing data for further explanation.</p>
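        <p>The construction of (x, y) pairs from gap-free runs can be sketched as a sliding window that is discarded whenever it spans a gap. This is an illustration under assumptions: timestamps are taken to be minutes since an arbitrary origin, and the function name and toy series are ours rather than the released code.</p>
        <preformat><![CDATA[
```python
def make_pairs(timestamps, values, history_steps=6, horizon_steps=6, step_minutes=5):
    """Build (x, y) pairs from measurements spaced exactly step_minutes apart.

    x holds history_steps consecutive values and y is the value horizon_steps
    after the last history value; windows that span a gap are skipped.
    """
    window = history_steps + horizon_steps
    pairs = []
    for start in range(len(values) - window + 1):
        ts = timestamps[start:start + window]
        contiguous = all(later - earlier == step_minutes
                         for earlier, later in zip(ts, ts[1:]))
        if contiguous:
            x = values[start:start + history_steps]
            y = values[start + window - 1]
            pairs.append((x, y))
    return pairs

# Toy series: two runs of 15 measurements separated by a 30-minute gap.
timestamps = [5 * i for i in range(15)] + [100 + 5 * i for i in range(15)]
values = [float(i) for i in range(30)]
pairs = make_pairs(timestamps, values)

# Upper bound quoted in the text for the smallest test set (2514 points).
upper_bound_fraction = 12 * 11 / 2514
```
]]></preformat>
        <p>With 30 minutes of history and a 30-minute horizon each window covers 12 measurements, so every gap removes the windows that would otherwise straddle it.</p>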
        <p>Computational requirements: In our experimental setup,
training of the model could be performed on a commodity
laptop. The model is small enough to be stored and run on
mobile devices (e.g. mobile phones, blood glucose
monitoring devices, etc.). Training could initially be performed offline,
and incremental training would then be light enough to
run either on the devices or offline.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>The results presented in Table 1 are the root mean squared
error (RMSE) for the model when trained with the mean
squared error (MSE) loss function and the negative
log-likelihood (NLL) loss function. The results indicate that
the model performs comparably when trained with NLL and
MSE, but with the added benefit of estimating the variance of
the prediction.</p>
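      <p>For reference, the RMSE can be computed as in the sketch below. It assumes that predictions made on the scaled axis are converted back to mg/dl before the error is computed; the text does not state this explicitly, and the helper names and example values are hypothetical.</p>
      <preformat><![CDATA[
```python
import math

def rmse(targets, predictions):
    """Root mean squared error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))

def to_mg_dl(scaled_values, scale=0.01):
    """Undo the 0.01 scaling applied before training (assumed inverse step)."""
    return [v / scale for v in scaled_values]

# Hypothetical scaled targets and predictions.
scaled_targets = [1.00, 1.20]
scaled_predictions = [1.10, 1.10]
error = rmse(to_mg_dl(scaled_targets), to_mg_dl(scaled_predictions))
```
]]></preformat>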
      <p>The glucose level of patient 591 is harder to predict than the
glucose level of patient 570, which can be seen in Table 1,
where the RMSE for patient 570 is 16.3 and the RMSE for
patient 591 is 24.6. Fig. 3 indicates that the model is able to
learn this by assigning a higher variance to the predictions
for patient 591 than for patient 570. The standard deviation
is illustrated by the pink shaded region in the figure. This is
further illustrated in the Clarke error grid plots in Fig. 4, where
we can see that for patient 570 most of the predictions are in
region A, which is considered clinically safe, but
for patient 591 more predictions fall in the B
region, which is still considered non-critical, and also in the
more critical D region. That is, the variance of the error in the
predictions is higher for patient 591 than for patient 570. In
particular, the model has a hard time predicting hypoglycemic
events.</p>
      <p>[Figure: two panels with y-axis "Glucose level", ticks from 0 to 400.]</p>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>As the competition will provide the benchmarking, we focus
on particular insights we gained during the development
of the method.</p>
      <p>Minimalistic ML: Compared to results in the literature for
other datasets, our system based on recurrent neural networks
can predict blood glucose levels in type-1 diabetes for
horizons of up to 60 minutes into the future using only blood
glucose levels as input. Generally, the minor improvement
over a naive baseline algorithm demonstrates that the
prediction problem is a rather difficult one, partly due to large intra-
and inter-patient variation. Nevertheless, our results suggest
that a substantially reduced human effort in designing and training
machine learning methods for precision health management
can be feasible, avoiding labor-intensive prior work by experts
hand-crafting features based on extensive domain knowledge.</p>
      <p>Quantifying uncertainty: Our model also outputs an
estimate of the variance of the prediction, thus measuring
uncertainty in prediction. This is a useful aspect for a system
which will be used by continuous glucose monitoring users
for making decisions about administration of insulin and/or
caloric intake. We expect that large-scale collection of
data from many users will further improve results. The results
in Fig. 3 show the two ends of the spectrum in this uncertainty
quantification.</p>
      <p>One principal problem is that disambiguating between
intra-patient variation and sensor errors is unlikely to be
feasible. An interesting research question concerns methods
which can detect sensor degradation over time or identify
defects by comparing sensors for the same patient in long-term
physiological; it is unclear if the often smoothed data
supplied by sensors is sufficient for that.</p>
      <p>Physiological loss function: To our surprise, we did not see
improvements when using a physiologically motivated loss
function [Favero et al., 2012] (results not shown), essentially
a smoothed version of the Clarke error grid [Clarke et al.,
1987]. Of course, our findings are not proof that such loss
functions cannot improve results. Possibly a larger-scale
investigation, exploring in particular a larger area of the
parameter space and different training regimes, might provide
further insights. Penalizing errors for hypo- or hyper-glycemic
states should lead to better real-world performance, as we
observed comparatively larger deviations in minima and
maxima. One explanation for that is the relative class imbalance,
as extrema are rare. This could be countered with data
augmentation techniques.</p>
      <p>[Fig. 4: Clarke error grids for patient 570 and patient 591; x-axis "Reference Concentration (mg/dl)", ticks from 50 to 400.]</p>
      <p>Model selection: The large inter-patient variation also
suggests that selecting one model for all patients might yield
suboptimal results, see Fig. 1. Consequently, precision health
apps should not only adapt parameters to individuals, but also
entertain increasing or decreasing model complexity. While
this is clearly undesirable from a regulatory point of view
(e.g., how to show efficacy in a trial), the differences we
observed seem to suggest that adaptation of complexity
improves quality of care.</p>
      <p>Missing data: There are gaps in the training data with
missing values. Most of the gaps are less than 10 hours, but some
of the gaps are more than 24 hours. Missing
data points account for roughly 23 out of 263 days, or 9% of
the data. The gaps could be filled using interpolation, but it is
not immediately clear how this would affect either the
training of the models or their evaluation, since this
would introduce artificial values. Filling a gap of 24 hours
using interpolation would not result in realistic data. Instead, we
have chosen not to fill the gaps with artificial values and to limit
our models to being trained and evaluated only on real data. This
has its own limitations, since we cannot predict the initial
values after a gap, but the advantage is that model training and
evaluation are not biased by the introduction of artificial
values.</p>
      <p>Conclusion: The field is certainly in desperate need of
larger data sets and standards for evaluation. Crowdsourcing
from patient associations would be one
possibility, but differences in sensor types and sensor revisions,
lifestyles, and genetic makeup are all obvious confounding
factors. Understanding sensor errors by measuring glucose levels
in vivo, for example in diabetes animal models, with several
sensors simultaneously would be very insightful, and would likely
improve prediction quality. Another question concerns
preprocessing in the sensors, which might be another
confounding factor in the prediction. While protection of proprietary
intellectual property is necessary, there have been examples,
e.g. DNA microarray technology, where only a completely
open analysis process, from the initial steps usually performed
with the vendor's software tools to the final result, helped to
realize the full potential of the technology.</p>
    </sec>
    <sec id="sec-5">
      <title>Software</title>
      <p>The software, including all scripts to reproduce the
computational experiments, is released under an open-source
license and available from https://github.com/johnmartinsson/blood-glucose-prediction.
We have used Google's TensorFlow framework, in particular
the Keras API of TensorFlow, which allows for rapid
prototyping of deep learning models, to implement our model and
loss functions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bengio et al.,
          <year>1994</year>
          ]
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          , Patrice Simard, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Frasconi</surname>
          </string-name>
          .
          <article-title>Learning long-term dependencies with gradient descent is difficult</article-title>
          .
          <source>IEEE Transactions on Neural Networks</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>157</fpage>
          -
          <lpage>166</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Bremer and Gough, 1999]
          <string-name>
            <given-names>Troy</given-names>
            <surname>Bremer</surname>
          </string-name>
          and David A Gough.
          <article-title>Is blood glucose predictable from previous values? a solicitation for data</article-title>
          .
          <source>Diabetes</source>
          ,
          <volume>48</volume>
          (
          <issue>3</issue>
          ):
          <fpage>445</fpage>
          -
          <lpage>451</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Clarke et al.,
          <year>1987</year>
          ] William L Clarke, Daniel Cox, Linda A Gonder-Frederick, William Carter, and Stephen L Pohl.
          <article-title>Evaluating clinical accuracy of systems for self-monitoring of blood glucose</article-title>
          .
          <source>Diabetes care</source>
          ,
          <volume>10</volume>
          (
          <issue>5</issue>
          ):
          <fpage>622</fpage>
          -
          <lpage>628</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Cryer et al.,
          <year>2003</year>
          ] Philip E Cryer, Stephen N Davis, and Harry Shamoon.
          <article-title>Hypoglycemia in diabetes</article-title>
          .
          <source>Diabetes care</source>
          ,
          <volume>26</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1902</fpage>
          -
          <lpage>1912</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Favero et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Simone</given-names>
            <surname>Del Favero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Facchinetti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Cobelli</surname>
          </string-name>
          .
          <article-title>A Glucose-Specific Metric to Assess Predictors and Identify Models</article-title>
          .
          <volume>59</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1281</fpage>
          -
          <lpage>1290</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Graves, 2013]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Graves</surname>
          </string-name>
          .
          <article-title>Generating sequences with recurrent neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1308.0850</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and Jürgen Schmidhuber.
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Hochreiter, 1998]
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          .
          <article-title>The vanishing gradient problem during learning recurrent neural nets and problem solutions</article-title>
          .
          <source>International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems</source>
          ,
          <volume>6</volume>
          (
          <issue>02</issue>
          ):
          <fpage>107</fpage>
          -
          <lpage>116</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Marling and Bunescu, 2018]
          <string-name>
            <given-names>Cindy</given-names>
            <surname>Marling</surname>
          </string-name>
          and
          <string-name>
            <given-names>Razvan</given-names>
            <surname>Bunescu</surname>
          </string-name>
          .
          <article-title>The OhioT1DM dataset for blood glucose level prediction</article-title>
          .
          <source>Glucose Prediction News</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Mirshekarian et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Sadegh</given-names>
            <surname>Mirshekarian</surname>
          </string-name>
          , Razvan Bunescu, Cindy Marling, and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Schwartz</surname>
          </string-name>
          .
          <article-title>Using LSTMs to learn physiological models of blood glucose behavior</article-title>
          .
          <source>Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society</source>
          , EMBS, pages
          <fpage>2887</fpage>
          -
          <lpage>2891</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>