Neural Multi-class Classification Approach to Blood
     Glucose Level Forecasting with Prediction Uncertainty
                         Visualisation
                                                          Michael Mayo1 and Tomas Koutny2


Abstract. A machine learning-based method for blood glucose                              sification problem. This is achieved by dividing the range of pos-
level prediction thirty and sixty minutes in advance based on highly                     sible BGL values into 100 bins equally spaced in the risk domain
multiclass classification (as opposed to the more traditional regres-                    [9]. Each bin is mapped to a class, and therefore a given forecast is
sion approach) is proposed. An advantage of this approach is the                         generated by predicting the probability of each class and computing
possibility of modelling and visualising the uncertainty of a predic-                    the expected value across all of the classes. An advantage of this ap-
tion across the entire range of blood glucose levels without paramet-                    proach is that the probabilities associated with the forecast can be
ric assumptions such as normality. To demonstrate the approach, a                        visualised across the BGL range. This could be useful for patients,
long-short term memory-based neural network classifier is used in                        since it enables the patient to take the reliability of the forecast into
conjunction with a blood glucose-specific data preprocessing tech-                       consideration when making a decision. Additionally, the probability
nique (risk domain transform) to train a set of models and generate                      distribution can be used to estimate the chance of significant events
predictions for the 2018 and 2020 Blood Glucose Level Prediction                         such as hypoglycemic episode. Although a similar idea was explored
Competition datasets. Numeric accuracy results are reported along                        in the context of regression recently [11], the underlying assumption
with examples of the uncertainty visualisation possible using this                       there was that the uncertainty distribution was Gaussian, whereas in
technique.                                                                               the classification approach presented here, no assumptions need be
                                                                                         made about the distribution.
1     INTRODUCTION AND BACKGROUND
Maintaining blood glucose level (BGL) in the normoglycemic range
is a significant challenge for patients with type 1 diabetes (T1D).
Traditionally, patient BGL self-management is achieved using finger
stick blood samples, testing strips and glucose meters (see [13] for
an overview), combined with bolus insulin dosing to approximate
proper insulin delivery in the body of non-diabetic person. However,
with the recent development of continuous glucose monitors (CGMs)
and semi- and fully closed-loop artificial pancreas (AP) systems [14],
much finer grained control of patient BGL is now possible. Addition-
ally, significantly greater volumes of BGL data is also available when
these devices are used. AP technology has been show to improve pa-
tient outcomes [5].
                                                                                         Figure 1. Risk function. The vertical green line at 112.5 mg/dl is the point
   In this paper, the problem of forecasting BGL thirty and sixty min-                       of least risk while vertical red lines represent (from left to right) the
utes in advance is considered using the 2020 BGL Prediction Chal-                         thresholds [6] for level 2 and level 1 hypoglycemia, normoglycemia, and
lenge [3] as a testbed. Although several past systems have consid-                                       level 1 and level 2 hyperglycemia respectively.
ered machine learning techniques for BGL forecasting (see [16] for a
comprehensive survey), most approaches take a regression approach
to solving the problem. In other words, each “forecast” is a numeric                        In order to transform the problem of BGL forecasting into a clas-
point prediction (such as BGL at some point in the future), and over-                    sification problem, a method of breaking the BGL range into sensi-
all system accuracy is a measurement of the error between the fore-                      ble classes is required. This is not trivial because the range of BGL
cast and the actual future BGL. Accuracy metrics may be statistical                      values is continuous, and the sizes of clinically-relevant subranges
(e.g. mean absolute error) or clinical (e.g. Clarke error grid analysis                  varies non-linearly. For example, a small change in the hypoglycemic
[12]). Regardless, the focus is usually on point predictions.                            part of the BGL range may be highly significant clinically but an
   Here, an alternative approach is taken: instead of treating BGL                       equivalent change in the hyperglycemic part of the range may be
forecasting as a regression problem, it is instead viewed as a clas-                     considered insignificant.
1 School of Computing and Mathematical Sciences, University of Waikato,
                                                                                            One option is to use the five ranges proposed by Danne et al. [6] as
    New Zealand, email: michael.mayo@waikato.ac.nz                                       the classes. These ranges are: levels I and II hypoglycemic, normo-
2 NTIS – New Technologies for the Information Society, University of West                glycemia, and levels I and II hyperglycemia. In this case, the size and
    Bohemia, Czech Republic, email: txkoutny@kiv.zcu.cz                                  split points of each range are defined. However, this would amount


    Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
to only five classes, and predictions in such a case may lack accuracy        therefore numeric vectors, e.g. (0, 0 . . . 0.25, 0.6, 0.3 . . . 0, 0). Note
within a class. Another option is to arbitrarily divide the BGL range         that whichever type of model is used, the values should be positive
into a much larger number of bins (e.g. 100) which both increases the         and sum to unity so that they can be interpreted as normal probabili-
number of classes considerably (making the machine learning more              ties.
challenging) but simultaneously increases the granularity of the pre-            To evaluate this idea, 100-class classification experiments using
dictions so that better probability distributions can be produced. In         a neural network as a predictive model were performed. Figure 2
this paper, the latter approach is taken, however this in turn leads to       depicts the particular neural network architecture used here.
the necessity to decide how the bins should be defined/split across
the range of BGL values.                                                                                  LSTM(units=12)
   Because of the inherent non-linearity of the BGL range, an ap-
proach called the risk domain transformation, first proposed by Ko-
vatchev et al. [9], is utilised. The idea is to define a non-linear trans-                                     Flatten
formation function (and by implication, its inverse) that shifts a CGM
sensor reading from the blood glucose domain to a new “risk” do-
main that is better suited for subsequent analysis. This transforma-                                   Batch Normalisation
tion function is illustrated by Figure 1. Also shown by the figure are
the breakpoints for the five ranges defined by Danne et al. [6].                                   Dense(units=50, act=RELU)
   As can be observed in the figure, risk domain values typically
spans a range from approximately -2 to just over 2, and most nor-
moglycemic readings lie more or less in the range range [−0.9, 0.9].                                   Batch Normalisation
A risk value of 0.0 corresponds to the BGL of 112.5 mg/dl, which is
considered the point of least risk. An advantage of the risk domain                              Dense(units=100, act=Softmax)
is that the hypo- and hyperglycemic ranges now have equal size and
significance, which reduces the chance of bias in statistical analysis
(e.g. due to larger absolute error sizes in the hyperglycemic range).           Figure 2. Architecture of the neural network used in the experiments.

              r(xt ) = yt = 1.509 log(xt )1.084 − 5.381
                                                             
                                                                       (1)
                                                                                 The inputs to the neural network are twelve risk value readings,
                                                           1
                                                                              representing the past sixty minutes of BGL variation as sensed by
                                                             
                                             yt
                                                        1.084
              −1
          r        (yt ) = xt = exp              + 5.381               (2)    the CGM (assuming that readings occur every five minutes). Since
                                           1.509
                                                                              this is a time series, a long-short term memory (LTSM) layer with
   The exact definitions of the risk domain transformation and its            twelve units is used for initial input processing. Next, the LSTM out-
inverse are given in Equations 1 and 2 where xt is a CGM reading at           put is flattened and passed through two dense fully connected layers
time t and yt is its corresponding risk value.                                for further processing. Both of these layers apply batch normalisa-
                                                                              tion first, which ensures faster convergence times and stability during
                                                                              training. Finally, the last dense layer predicts the class and applies a
2    METHOD                                                                   softmax activation function to ensure that the output is a probability
For dividing the BGL range into classes, the following exact pro-             vector.
cedure is used. The risk domain range [−2, 2] is considered and                  To train each instance (one per patient) of this neural network, the
100 bin midpoints are placed on it. The bin midpoints are denoted             ADAM optimiser [7] was used with a learning rate of 0.0001, 10k
y1∗ , y3∗ , y3∗ . . . y100
                       ∗
                           with y1∗ = −2 and y100
                                              ∗
                                                  = 2. The remaining bin      epochs, batch size of 32, and validation data set to a random 15% sub-
midpoints are equally spaced along the risk range between y1∗ and             set of training data. The loss function utilised was categorical cross-
  ∗                                                                           entropy, which is commonly used for multiclass classification prob-
y100   . This ensures that for the smaller hypoglycemic range, the bin
sizes will be scaled properly in size and that there will be a propor-        lems. Early stopping during training was permitted if no improve-
tionate number of bins (and therefore classes) in each subrange of            ment in loss was observed for 100 epochs. All other settings were
the BGL scale.                                                                identical to those used in keras v. 2.3.1 [4] with tensorflow v.
    Next, in order to a assign a new CGM reading x (in mg/dl) to a bin,       2.1.0 [2] as a base neural network system. The neural network param-
its corresponding risk value y = r(x) is computed using Equation 1.           eters and architecture decisions were made as a result of single-run
The reading is then assigned the bin with the closest midpoint. This          experiments using data from the first patient in the 2018 competition
means that the split points between bins do not need to be calculated         dataset [10].
explicitly, and if a reading is outside the risk range (either x < −2 or         To generate data for training the neural network, a strict approach
x > 2) then it will be assigned to one of the bins at the ends of range.      was taken towards missing data and all examples with gaps or time
However, this tends not happen often since the significant majority           discrepancies (e.g. readings not exactly five minutes apart) exist were
of readings in the competition datasets lie on the [−2, 2] range.             excluded. Therefore, to generate one example for the thirty minute
    Finally, once its class is determined, the reading x is transformed       (t+30) forecasting problem, it is required that nine consecutive read-
into a one-hot encoded vector (0, 0 . . . 0, 1, 0 . . . 0, 0) of length 100   ings (from the start of the example up to and including the prediction
where the single 1 in the vector corresponds to the bin that x is as-         target) exist, and for the sixty minute problem (t + 60), twelve con-
signed to.                                                                    secutive readings were required. No missing value imputation was
    To summarise the process, the predictors X for the model com-             performed. As a result, the number of test examples varies slightly
prise a time series of risk-transformed CGM readings, and the target          depending on the forecasting horizon: i.e. the number of 2020 dataset
Y is a one-hot encoded class vector of dimension 100. Predictions are         test examples is 2,743, 2,579, 2,177, 2,185, 2,393 and 2,624 for the
30-minute horizon and 2,689, 2,531, 2,111, 2,113, 2,297 and 2,582                     Patient ID     MAE t + 30      std    MAE t + 60       std
for the 60 minute horizon respectively.
                                                                                      559                14.7        0.2         26.2        0.1
   With the neural network architecture and training data construc-                   570                12.2        0.4         21.0        0.4
tion approach described, the final aspect of methodology to be de-                    588                14.0        0.1         23.4        0.2
scribed is the way that numeric point predictions were generated for                  563                13.6        0.1         22.5        0.2
the competition purposes (which require point predictions). A sim-                    575                15.0        0.1         25.9        0.2
plistic approach is allow only bin midpoints (i.e. y1∗ . . . y100
                                                              ∗
                                                                  ) to be             591                16.0        0.1         26.3        0.1
predictions, and select the bin/class with the highest probability. Ini-              Avg.               14.3                    24.2
tial tests showed that this technique had low accuracy. Instead, a more               540                16.8        0.1         30.8        0.1
sophisticated approach is to calculate the expected BGL value, as de-                 544                12.9        0.2         23.3        0.2
scribed by the following equation:                                                    552                12.5        0.1         23.0        0.1
                                                                                      567                14.9        0.1         27.9        0.2
                                                                                      584                16.7        0.1         28.0        0.2
             x̂t+n = f (mn , yt−55 , yt−50 . . . yt−5 , yt )                          596                12.6        0.1         21.7        0.0
                       100
                       X                                                (3)           Avg.               14.4                    25.8
                   =         p(yi∗ |mn , yt−55 . . . yt )r−1 (yi∗ )
                       i=1                                                       Table 1.    Mean absolute error (MAE, mg/dl) results by patient and
                                                                                              prediction horizon, averaged over five runs.
where n ∈ {30, 60}, mn is the neural network for the current patient
with forecasting horizon n, yt is the risk-transformed BGL level at
time t (yt = r(xt )), y1∗ . . . y100
                                 ∗
                                     are risk bin midpoints, f () represents        Patient ID     RMSE t + 30       std    RMSE t + 60        std
the application of mn to the observed CGM values, and x̂t+n is the                  559                 21.6         0.2         35.5          0.2
expected value or prediction at time t + n. Since the probabilities                 570                 17.2         0.5         28.3          0.3
across the 100 bins sum to 1, the resulting point prediction will be                588                 19.1         0.1         31.8          0.2
scaled correctly. Initial experiments showed that this expected value               563                 20.6         0.3         31.2          0.4
                                                                                    575                 23.8         0.4         36.4          0.3
approach produced accurate estimates. Source code is available [1].                 591                 21.8         0.2         33.7          0.1
                                                                                    Avg.                20.7                     32.8
3    RESULTS                                                                        540                 23.0         0.2         40.6         0.14
Two rounds of experiments were performed. In the first round, neu-                  544                 17.4         0.2         30.5         0.17
                                                                                    552                 16.9         0.0         30.2         0.10
ral networks models were independently trained and tested for each                  567                 20.9         0.2         36.9         0.22
of the twelve patients from both the 2018 and 2020 competition                      584                 23.0         0.1         36.6         0.16
datasets. There was no sharing of information between models. In                    596                 17.8         0.1         29.5         0.03
the second round, models for the 2020 patients only were trained,                   Avg.                19.8                     34.0
and the training data for each patient included all of the training data
from the 2018 competition in addition to the specific 2020 patient’s           Table 2.    Root mean squared error (RMSE, mg/dl) results by patient and
training data. An individual patient’s own training data was therefore                        prediction horizon, averaged over five runs.
a small subset of his/her full training dataset. To account for this im-
balance in the second round, samples from the target patient were
                                                                                      Patient ID     MAE t + 30      std    MAE t + 60       std
re-weighted by a factor of six compared to the sample weights from
the other patients.                                                                   540                16.4        0.1         29.8        0.1
                                                                                      544                12.3        0.1         22.9        0.2
   Results for the first round of experiments are given in Tables 1                   552                12.2        0.1         22.2        0.2
and 2. The first table is mean absolute error (MAE) results, and the                  567                14.5        0.2         27.4        0.4
second table gives root mean squared error (RMSE) results. Units                      584                16.1        0.2         26.4        0.1
are mg/dl. Using both metrics, predictive performance is comparable                   596                12.2        0.3         21.3        0.1
between the two datasets, with patient 540 from the 2020 dataset                      Avg.               13.9                    25.0
being the most “difficult” patient to predict. Conversely, the patient
with lowest forecast error is patient 570 from the 2018 dataset.                 Table 3. Mean absolute error (MAE, mg/dl) results by patient and
                                                                                 prediction horizon, averaged over five runs for the 2020 dataset only.
   Results for the second round of experiments are given in Tables                          Training data includes the entire 2018 dataset.
3 and 4. It can be observed that the additional training data leads
to a very slight improvement in accuracy. Performing a paired t-test
across the six 2020 patients reveals that the improvement in MAE is                  Patient ID     RMSE t + 30      std    RMSE t + 60        std
significant even albeit small (average improvement for thirty minute                 540                22.4         0.1         39.5          0.1
forecasting is 0.45 with significance p = 0.000056, and for sixty                    544                17.0         0.1         30.1          0.2
minute forecasting it is 0.82 with significance p = 0.008096).                       552                16.5         0.1         29.3          0.1
   More interesting are the prediction plots that can be generated                   567                20.8         0.2         36.9          0.4
                                                                                     584                22.4         0.2         35.9          0.2
when making a forecast using the classification approach. Figures
                                                                                     596                17.2         0.3         29.0          0.2
3-6 depict some probability densities produced by the model when
making four different predictions. For each figure, the point predic-                Avg.               19.4                     33.4
tion generated using the expected value computation (Equation 3) is            Table 4. Root mean squared error (RMSE, mg/dl) results by patient and
shown as red line.                                                               prediction horizon, averaged over five runs for the 2020 dataset only.
   The tidiest example is Figure 3, which depicts a single-peaked dis-                      Training data includes the entire 2018 dataset.
tribution with the forecast coinciding with the peak of the distribu-
tion. While this class of forecast is common, it is not the only type of
distribution that is output from the model.
   Figure 4 shows a two-peaked distribution with an expected value
between the peaks. While the expected value is a useful point predic-
tion, the dual peaks are also useful information since it can be clearly
observed that the two most likely outcomes are normoglycemia vs. a
state most likely in level I hyperglycemica, although the model is not
certain.


                                                                                  Figure 6. An example of a prediction with considerable uncertainty over
                                                                                             the hypoglycemic and normoglycemic ranges.


                                                                                 the true risk to the patient at the current time. Again, this is clearly
                                                                                 noticeable.
                                                                                    Finally, Figure 6 depicts a prediction with considerable noise in the
                                                                                 distribution. While the forecast is normoglycemic, there is significant
                                                                                 mass in the hypoglycemic range. Therefore it could be concluded that
 Figure 3. Probability distribution over BGLs for one prediction. In this        although the point prediction is reasonably, there is still significant
    example, the expected BGL coincides with the distribution peak.              risk of hypoglycemia.
                                                                                    Further analysis was performed with respect to the variance of the
                                                                                 prediction distributions. It was found that for most patients, the dis-
                                                                                 tribution of variances is skewed to the left indicating that on average
                                                                                 most predictions are more certain (more like Figures 3 and 4) than
                                                                                 uncertain. However, more analysis needs to be done on this point.

                                                                                 4   CONCLUSION
                                                                                 This paper describes a system for forecasting BGL at thirty and sixty
                                                                                 minutes in advance. This main distinctiveness of this approach is the
                                                                                 adoption of a highly multi-class classification-based technique and
                                                                                 use of a domain-specific transform for normalising BGL values (op-
                                                                                 posed to more traditional min/max scaling or standardisation). The
                                                                                 ability to visualise non-parametric probability distributions accom-
                                                                                 panying predictions as a meaningful context is a clear advantage.
Figure 4. In this probability distribution over BGL levels, two peaks exist         To test the proposed method with real patients, we will use a
     in the distribution and the expected BGL lies between the peaks.
                                                                                 system known as SmartCGMS [8]. SmartCGMS is a continuous
                                                                                 glucose monitoring and controlling software framework. It pro-
                                                                                 vides infrastructure to connect and develop “building blocks” for an
                                                                                 insulin-pump software stack. Principally, the pump developer con-
                                                                                 nects CGM-sensor blocks to computing blocks, which predict BGL
                                                                                 and subsequently schedule insulin boluses or adjust the insulin basal
                                                                                 rate. Next, another block transforms the results of these computa-
                                                                                 tions into insulin-pump control commands. With SmartCGMS, we
                                                                                 can close the loop in-silico[15] first, before conducting an in-vivo
                                                                                 experiment to ensure maximum safety.
                                                                                    Our specific approach will be to transform the best trained
                                                                                 keras/tensorflow-based neural network into a hard-coded and con-
                                                                                 stant feed-forward neural network in C++. This will enable efficient
                                                                                 deployment and computation on low-power devices such as insulin-
                                                                                 pump controllers, while we can still train the original neural network
Figure 5. In this example, the distribution is skewed to to the right with the   using high-performance computers. As a result, a flow in which a
          expected BGL quite a distance to the left of the peak.                 neural network is continuously learned from patient BGL measure-
                                                                                 ments, providing personalised BGL predictions, can be established.

  Figure 5 shows a skewed distribution a significant mass of the
                                                                                 Acknowledgment
probability distribution is at the upper end of the range, but the ex-
pected value is closer to the middle of the range (albeit still in the hy-       This publication was partially supported by the project LO1506 of
perglycemic range). In this case, the expected value underestimates              the Czech Ministry of Education, Youth and Sports.
REFERENCES
 [1] https://www.cms.waikato.ac.nz/%7Emmayo/kdhcompsource.py
 [2] Martı́n Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng
     Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean,
     Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Ge-
     offrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz
     Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat
     Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster,
     Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul
     Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol
     Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and
     Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on het-
     erogeneous systems, 2015. Software available from tensorflow.org.
 [3] Marling C. and Bunescu R.                      The OhioT1DM dataset
     for      blood     glucose      level    prediction:      Update      2020.
     http://smarthealth.cs.ohio.edu/bglp/OhioT1DM-dataset-paper.pdf,
     2020.
 [4] François Chollet et al. Keras. https://keras.io, 2015.
 [5] Xia Dai, Zu-Chun Luo, Lu Zhai, Wen-Piao Zhao, and Feng Huang, ‘Ar-
     tificial pancreas as an effective and safe alternative in patients with type
     1 diabetes mellitus: A systematic review and meta-analysis’, Diabetes
     therapy : research, treatment and education of diabetes and related dis-
     orders, 9(3), 1269–1277, (06 2018).
 [6] Thomas Danne, Revital Nimri, Tadej Battelino, Richard M. Bergen-
     stal, Kelly L. Close, J. Hans DeVries, Satish Garg, Lutz Heinemann, Irl
     Hirsch, Stephanie A. Amiel, Roy Beck, Emanuele Bosi, Bruce Buck-
     ingham, Claudio Cobelli, Eyal Dassau, Francis J. Doyle, Simon Heller,
     Roman Hovorka, Weiping Jia, Tim Jones, Olga Kordonouri, Boris Ko-
     vatchev, Aaron Kowalski, Lori Laffel, David Maahs, Helen R. Mur-
     phy, Kirsten Nørgaard, Christopher G. Parkin, Eric Renard, Banshi Sa-
     boo, Mauro Scharf, William V. Tamborlane, Stuart A. Weinzimer, and
     Moshe Phillip, ‘International consensus on use of continuous glucose
     monitoring’, Diabetes Care, 40(12), 1631–1640, (2017).
 [7] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic
     optimization, 2014.
 [8] Tomas Koutny and Martin Ubl, ‘Parallel software architecture for the
     next generation of glucose monitoring’, Procedia Computer Science,
     141, 279–286, (2018).
 [9] Boris P. Kovatchev, Martin Straume, Daniel J. Cox, and Leon S. Farhy,
     ‘Risk analysis of blood glucose data: A quantitative approach to opti-
     mizing the control of insulin dependent diabetes’, Journal of Theoreti-
     cal Medicine, 3(1), (2000).
[10] Cindy Marling and Razvan Bunescu, ‘The OhioT1DM dataset for
     blood glucose level prediction’, in Proceedings of the 3rd International
     Workshop on Knowledge Discovery in Healthcare Data, 60–63, CEUR
     Workshop Proceedings (CEUR-WS.org), (2018).
[11] John Martinsson, Alexander Schliep, Bjorn Eliasson, Christian Meijner,
     Simon Persson, and Olof Mogren, ‘Automatic blood glucose prediction
     with confidence using recurrent neural networks’, in Proceedings of
     the 3rd International Workshop on Knowledge Discovery in Healthcare
     Data, 64–68, CEUR Workshop Proceedings (CEUR-WS.org), (2018).
[12] Himel Mondal and Shaikat Mondal, ‘Clarke error grid analysis on
     graph paper and microsoft excel’, Journal of Diabetes Science and
     Technology, 14(2), 499–499, (2020). PMID: 31777281.
[13] Leann Olansky and Laurence Kennedy, ‘Finger-stick glucose monitor-
     ing’, Diabetes Care, 33(4), 948–949, (2010).
[14] Dawei Shi, Sunil Deshpande, Eyal Dassau, and Francis J. Doyle,
     ‘Chapter 1 - feedback control algorithms for automated glucose man-
     agement in t1dm: the state of the art’, in The Artificial Pancreas, eds.,
     Ricardo S. Sánchez-Peña and Daniel R. Cherñavvsky, 1 – 27, Academic
     Press, (2019).
[15] Martin Ubl and Tomas Koutny, ‘SmartCGMS as an environment for
     an insulin-pump development with FDA-accepted in-silico pre-clinical
     trials’, Procedia Computer Science, 160, 322–329, (2019).
[16] Ashenafi Zebene Woldaregay, Eirik Årsand, Ståle Walderhaug, David
     Albers, Lena Mamykina, Taxiarchis Botsis, and Gunnar Hartvigsen,
     ‘Data-driven modeling and prediction of blood glucose dynamics: Ma-
     chine learning applications in type 1 diabetes’, Artificial Intelligence in
     Medicine, 98, 109 – 134, (2019).