=Paper= {{Paper |id=Vol-2675/paper26 |storemode=property |title=Multi-lag Stacking for Blood Glucose Level Prediction |pdfUrl=https://ceur-ws.org/Vol-2675/paper26.pdf |volume=Vol-2675 |authors=Heydar Khadem,Hoda Nemat,Jackie Elliott,Mohammed Benaissa |dblpUrl=https://dblp.org/rec/conf/ecai/KhademNEB20 }} ==Multi-lag Stacking for Blood Glucose Level Prediction== https://ceur-ws.org/Vol-2675/paper26.pdf
      Multi-lag Stacking for Blood Glucose Level Prediction
                    Heydar Khadem1 and Hoda Nemat1 and Jackie Elliott2 and Mohammed Benaissa1


1 Department of Electronic and Electrical Engineering, University of Sheffield, UK, email addresses: h.khadem@sheffield.ac.uk, hoda.nemat@sheffield.ac.uk, m.benaissa@sheffield.ac.uk
2 Department of Oncology and Metabolism, University of Sheffield, UK, email address: j.elliott@sheffield.ac.uk

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. This work investigates blood glucose level prediction for type 1 diabetes in two horizons of 30 and 60 minutes. Initially, three conventional regression tools—partial least squares regression (PLSR), multilayer perceptron, and long short-term memory—are deployed to create predictive models. They are trained once on 30 minutes and once on 60 minutes of historical data, resulting in six basic models for each prediction horizon. Collections of these models are then set as base-learners to develop three stacking systems: two uni-lag and one multi-lag. One of the uni-lag systems uses the three basic models trained on 30 minutes of lag data; the other uses those trained on 60 minutes. The multi-lag system, on the other hand, leverages the basic models trained on both lags. All three stacking systems deploy a PLSR as meta-learner. The results obtained show that: i) the stacking systems outperform the basic models; ii) among the stacking systems, the multi-lag one shows the best predictive performance, with a root mean square error of 19.01 mg/dl and 33.37 mg/dl for the prediction horizons of 30 and 60 minutes, respectively.

1 INTRODUCTION

Diabetes mellitus is a metabolic disorder and a significant cause of morbidity and mortality worldwide [1]. As yet, there is no cure for diabetes, and management of the corresponding life-impeding conditions is recommended as the most successful way to control the disease [6]. In fact, the occurrence of the associated complications can be delayed or even prevented by effective management of the disease [11].
   Among the different types of diabetes, the importance of self-management for type 1 diabetes mellitus (T1DM) is accentuated [8, 19]. The key factor in T1DM management is to keep the blood glucose level (BGL) within the normal range [2]. BGL predictive models could contribute to achieving this goal. They can help avert adverse glycaemic events by forecasting them and giving patients the chance to take corrective actions ahead of time [2].
   The importance of developing BGL predictive models for T1DM management has spurred research in this field [16, 22]. According to the knowledge they require, predictive models can be classified as physiological, data-driven, and hybrid models [21]. Data-driven models interpret trends in sequences of data to estimate future BGLs. Machine learning approaches are broadly adopted in this area [21].
   Mirshekarian et al. [17] developed a model to predict blood glucose in 30-minute and 60-minute horizons using a recursive neural network (RNN) with long short-term memory (LSTM) units. The model explored BGL, insulin, food, and activity information as inputs. For the same prediction horizons, Bertachi et al. [4] and Georga et al. [9], in separate studies, proposed predictive models. Bertachi et al. applied an artificial neural network considering glucose, insulin, carbohydrate, and physical activity as inputs for their system. The BGL profile, insulin, carbohydrate intake, and physical activity were the inputs of a support vector regression (SVR) in the model developed by Georga et al. Investigating continuous glucose monitoring (CGM) data with recursive and direct deep learning approaches, Xie et al. [22] recommended a model for BGL prediction. Martinsson et al. [15] proposed an automatic forecast model for a prediction horizon of up to 60 minutes using an RNN. The model used only the information from past BGLs as input. Bunescu et al. [7] created descriptive features to train an SVR using a physiological model of blood glucose dynamics. Carbohydrate intake, insulin administration, and the current and past BGLs were the inputs of their model. Despite the extensive research devoted to the development of predictive models, the performance of the proposed models remains a challenge [3].
   In this work, we contributed to the improvement of BGL prediction for T1DM by applying a multi-lag stacking methodology. Initially, three conventional regression tools—partial least squares regression, multilayer perceptron, and long short-term memory—were applied to forecast BGLs in horizons of 30 and 60 minutes. Each tool was trained twice: once on a lag of 30 minutes and once on a lag of 60 minutes of CGM data. Therefore, six basic models were created for each prediction horizon. For each horizon, three stacking systems were then developed in which predictions from a selection of the basic models were used as features to train a new regression. The first two stacking systems followed a uni-lag approach: they used predictions from the three basic models trained on a history of 30 minutes and 60 minutes, respectively. The third system was multi-lag and used predictions from all six basic models. The stacking systems resulted in appreciable improvements in predictive accuracy compared to the basic predictive models, and the third (multi-lag) stacking system showed better predictive performance than the other two systems.
   This is the first paper, to our knowledge, that has combined models with different time-lags to generate a multi-lag BGL prediction system.

2 DATASET

The Ohio T1DM dataset comprises several features collected from 12 individuals with type 1 diabetes over 8 weeks [14, 13]. The last ten days' worth of data for each contributor was considered as the test set. Data for a cohort of six subjects was released in 2018 for the first BGL prediction challenge [14]; data for another six subjects was released in 2020 for the second challenge [13].
   In this work, the 2020 data was investigated for developing and evaluating predictive models. Among the collected features were
CGM data recorded every 5 minutes, which was the only feature explored in this work. A brief description of the CGM data in the Ohio T1DM dataset released for the 2020 BGL prediction challenge is displayed in Table 1.

Table 1. Number of training and test examples for each participant in the Ohio T1DM dataset released in 2020 [13].

Patient ID | Number of Training Examples | Number of Test Examples
   540     |           11947             |          2896
   544     |           10623             |          2716
   552     |            9080             |          2364
   567     |           10858             |          2389
   584     |           12150             |          2665
   596     |           10877             |          2743

3 METHODS

As mentioned earlier, this work proposes methodologies to predict BGL in horizons of 30 and 60 minutes. The details of the pursued methodologies are presented in this section.

3.1 Pre-processing

The first pre-processing task was taking care of missing data. Missing data in the training set was imputed by applying a simple linear interpolation. For the test set, by contrast, a linear extrapolation was employed. This was to ensure the model is not contaminated by observing future data in its pre-processing stage.
   The next pre-processing step was transforming the time series forecasting problem into a supervised learning task. To this end, a rolling window consisting of lag data and future data was used as the explanatory and dependent variables, respectively. To give an illustration, for forecasting the BGL 30 minutes ahead using a history of 60 minutes, we used a window of length 18. Given the 5-minute interval between data points, the first 12 data points in the window were the explanatory variables, and the remaining 6 were the dependent variables.

3.2 Prediction methods

First, six basic predictive models were created by means of three conventional regression tools. Subsequently, employing stacking, three more advanced predictive systems were developed in which a collection of the basic models were used as base-learners and a partial least squares regression as meta-learner. All proposed models/systems were personalised to individuals.

3.2.1 Basic models

Initially, for each prediction horizon of 30 and 60 minutes, the following three conventional regression tools were employed to generate six basic predictive models—two models by each tool. For this purpose, each tool was trained once on a history of 30 minutes and once on a history of 60 minutes.

• Partial least squares regression (PLSR)
  PLSR, as a basic linear regression, holds substantial popularity in different applications due to its easy-to-apply nature and minimal computation time requirement. In a previous work, we applied PLSR to glucose quantification, which provided promising results [12].
  In this work, PLSR was used as one of the regression tools. For the number of components, different values ranging from 1 to the length of the input variable were tried. Each time, the predicted residual sum of squares (PRESS) was calculated as follows. The number of components (A) resulting in the minimum value for PRESS/(N − A − 1) was then selected [20].

      PRESS = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2    (1)

  where N is the size of the evaluation set, y_i is the reference value, and \hat{y}_i is the predicted value.

• Multilayer perceptron (MLP)
  An MLP [18] with an architecture of one hidden layer including 100 nodes and an output layer was implemented. ReLU was used as the activation function for the hidden layer, Adam as the optimiser, and mean absolute error as the loss function. The learning rate was 0.01, and the training process was based on 100 epochs.

• Long short-term memory (LSTM)
  We used a vanilla LSTM [10] composed of a single hidden LSTM layer with 200 nodes, a fully connected layer with 100 nodes, and an output layer. ReLU was the activation function for both hidden layers, mean squared error was the loss function, and Adam was the optimiser. The model was trained for 100 epochs with a learning rate of 0.01.

3.2.2 Stacking systems

Ensemble learning is a machine learning technique that combines decisions from several models to create a new model. Stacking (Figure 1) is an ensemble approach that uses predictions from multiple base-learners (first-level models) as features to train a meta-learner (second-level model). This meta-learner then makes the final predictions on the test set [23].

Figure 1. A stacking system uses predictions from multiple base-learners as features to train a meta-learner [5].

   In this paper, for each prediction horizon of 30 and 60 minutes, three stacking systems, comprising two uni-lag and one multi-lag, were developed.

• System 1
  The three basic models trained on a history of 30 minutes were the base-learners of this uni-lag system, and a PLSR was its meta-learner.
• System 2
  This system was also uni-lag. It was similar to System 1, except that it used the three basic models trained on a history of 60 minutes, in place of 30 minutes, as base-learners.

• System 3
  In this multi-lag system, all six basic models were considered as the base-learners, and again a PLSR was the meta-learner. The idea behind adopting a multi-lag approach was to help capture a broader frequency range of BGL dynamics.

3.3 Evaluation

The test set was held out, and the training set was used to create the predictive models/systems. The developed models/systems were then utilised to predict the test data. The set of evaluation points starts 60 minutes after the beginning of the test set; otherwise, the first evaluation points would be similar to the training data, which could affect the reliability of the results. Hence, the number of evaluated points for each patient is 12 less than the number of test examples mentioned in Table 1. The root mean square error (RMSE) and mean absolute error (MAE) were calculated as follows and then used as evaluation metrics.

      RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 }    (2)

      MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|    (3)

where N, y_i, and \hat{y}_i carry the same definitions as in (1).

4 RESULTS AND DISCUSSION

This section presents the evaluation results for both the basic models and the stacking systems. Models/systems whose performance depends on random initialisation were run five times, and the corresponding results are reported as mean and standard deviation. Extrapolated points were excluded when calculating the evaluation metrics. All models were built to predict future BGLs up to the end of the intended prediction horizon, but only the evaluation results for the horizon of interest are reported.

4.1 Prediction horizon of 30 minutes

4.1.1 Basic models

The RMSE and MAE results of the basic predictive models for the prediction horizon of 30 minutes are displayed in Table 2.

Table 2. Evaluation results of the basic predictive models for a 30-minute prediction horizon.

Patient ID | Basic Model | History (min) | RMSE (mg/dl)  | MAE (mg/dl)
   540     |    PLSR     |      30       | 22.11         | 16.58
   540     |    PLSR     |      60       | 22.07         | 16.56
   540     |    MLP      |      30       | 21.98 ± 0.48  | 16.52 ± 0.33
   540     |    MLP      |      60       | 22.52 ± 0.78  | 16.76 ± 0.62
   540     |    LSTM     |      30       | 21.65 ± 0.28  | 16.06 ± 0.12
   540     |    LSTM     |      60       | 21.58 ± 0.67  | 16.20 ± 0.61
   544     |    PLSR     |      30       | 18.08         | 13.34
   544     |    PLSR     |      60       | 18.09         | 13.33
   544     |    MLP      |      30       | 18.22 ± 0.18  | 13.38 ± 0.37
   544     |    MLP      |      60       | 18.25 ± 0.28  | 13.21 ± 0.35
   544     |    LSTM     |      30       | 17.63 ± 0.15  | 12.63 ± 0.10
   544     |    LSTM     |      60       | 18.42 ± 0.60  | 13.36 ± 0.44
   552     |    PLSR     |      30       | 16.76         | 12.76
   552     |    PLSR     |      60       | 16.79         | 12.78
   552     |    MLP      |      30       | 17.08 ± 0.36  | 12.91 ± 0.40
   552     |    MLP      |      60       | 17.03 ± 0.34  | 12.77 ± 0.17
   552     |    LSTM     |      30       | 16.49 ± 0.10  | 12.29 ± 0.24
   552     |    LSTM     |      60       | 17.06 ± 0.70  | 12.88 ± 0.51
   567     |    PLSR     |      30       | 20.98         | 15.12
   567     |    PLSR     |      60       | 21.00         | 15.07
   567     |    MLP      |      30       | 21.24 ± 0.70  | 15.42 ± 0.76
   567     |    MLP      |      60       | 21.10 ± 0.46  | 15.13 ± 0.58
   567     |    LSTM     |      30       | 20.66 ± 0.16  | 14.79 ± 0.25
   567     |    LSTM     |      60       | 20.77 ± 0.36  | 14.72 ± 0.40
   584     |    PLSR     |      30       | 22.00         | 16.15
   584     |    PLSR     |      60       | 21.97         | 16.12
   584     |    MLP      |      30       | 21.67 ± 0.18  | 15.63 ± 0.16
   584     |    MLP      |      60       | 22.43 ± 0.48  | 16.35 ± 0.61
   584     |    LSTM     |      30       | 22.23 ± 0.70  | 16.33 ± 0.67
   584     |    LSTM     |      60       | 22.04 ± 0.22  | 16.11 ± 0.28
   596     |    PLSR     |      30       | 17.79         | 12.77
   596     |    PLSR     |      60       | 17.62         | 12.67
   596     |    MLP      |      30       | 17.74 ± 0.04  | 12.55 ± 0.05
   596     |    MLP      |      60       | 18.44 ± 0.26  | 13.49 ± 0.42
   596     |    LSTM     |      30       | 17.76 ± 0.67  | 12.74 ± 0.55
   596     |    LSTM     |      60       | 17.71 ± 0.28  | 12.50 ± 0.33
 Average   |    PLSR     |      30       | 19.62         | 14.45
 Average   |    PLSR     |      60       | 19.59         | 14.42
 Average   |    MLP      |      30       | 19.65 ± 0.32  | 14.40 ± 0.35
 Average   |    MLP      |      60       | 19.96 ± 0.43  | 14.62 ± 0.46
 Average   |    LSTM     |      30       | 19.40 ± 0.34  | 14.14 ± 0.32
 Average   |    LSTM     |      60       | 19.60 ± 0.47  | 14.30 ± 0.43

   Based on the average RMSE and MAE over all patients, the LSTM trained on a history of 30 minutes showed the best performance among the basic models. The PLSR with a 60-minute lag was the second-best model. All models had satisfactory standard deviations.
   LSTM yielded the best overall predictive accuracy among the three regression tools. However, the results of the other two tools were also comparable to those of LSTM. It is worth remarking that PLSR, as a linear regression tool, was able to generate results comparable to those of LSTM and even better than those of MLP.
   Among all patients, patient 552 had the best overall evaluation results. The worst results, on the other hand, belonged to patients 584 and 540.
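To make the methodology of Section 3 concrete before turning to the stacking results, the sketch below reproduces its structure on synthetic CGM-like data. It is an illustration only, not the authors' code: ordinary least squares stands in for the PLSR, MLP, and LSTM learners, the meta-learner is fitted on in-sample base predictions, and the synthetic series, split point, and all function names are our own assumptions.

```python
import numpy as np

def make_windows(series, lag, horizon, start):
    """For each time j, use series[j-lag:j] as input and the value
    `horizon` steps after the last observation as the target."""
    idx = np.arange(start, len(series) - horizon)
    X = np.stack([series[j - lag:j] for j in idx])
    y = series[idx - 1 + horizon]
    return X, y

def fit_ols(X, y):
    # Least-squares linear model with a bias term (stand-in for the
    # PLSR/MLP/LSTM base-learners and the PLSR meta-learner).
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_ols(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

# Synthetic 5-minute CGM-like series: a daily sinusoid plus noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
cgm = 140 + 40 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 5, t.size)

horizon = 6            # 30 minutes ahead at a 5-minute sampling interval
lags = (6, 12)         # 30-minute and 60-minute histories
start = max(lags)      # shared start index keeps targets aligned across lags
split = 1400           # windows before this index are training data

train_feats, test_feats = [], []
for lag in lags:
    X, y = make_windows(cgm, lag, horizon, start)
    w = fit_ols(X[:split], y[:split])          # train one base-learner per lag
    train_feats.append(predict_ols(w, X[:split]))
    test_feats.append(predict_ols(w, X[split:]))
y_train, y_test = y[:split], y[split:]

# Multi-lag stacking: the meta-learner takes the base predictions as features.
meta_w = fit_ols(np.column_stack(train_feats), y_train)
final = predict_ols(meta_w, np.column_stack(test_feats))

rmse = np.sqrt(np.mean((y_test - final) ** 2))   # equation (2)
mae = np.mean(np.abs(y_test - final))            # equation (3)
print(f"stacked RMSE {rmse:.2f} mg/dl, MAE {mae:.2f} mg/dl")
```

A uni-lag variant of this sketch would simply keep a single entry in `lags`; the paper's System 3 corresponds to stacking over both.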
4.1.2 Stacking systems

Table 3 shows the evaluation results of the stacking systems for a prediction horizon of 30 minutes. For all patients, the performance of the stacking systems surpassed that of the basic models. System 3 provided the best predictions overall based on the average RMSE and MAE values. This system resulted in the best predictive accuracy for all patients except patients 544 and 584. All systems possessed small standard deviation values. The best result among all patients belonged to patient 552. The worst results, on the other hand, were those of patients 584, 540, and 567.

Table 3. Evaluation results of the stacking systems for a 30-minute prediction horizon.

Patient ID | Stacking System | RMSE (mg/dl)  | MAE (mg/dl)
   540     |    System 1     | 21.13 ± 0.08  | 15.72 ± 0.10
   540     |    System 2     | 21.11 ± 0.18  | 15.69 ± 0.14
   540     |    System 3     | 20.93 ± 0.11  | 15.52 ± 0.13
   544     |    System 1     | 17.47 ± 0.05  | 12.50 ± 0.05
   544     |    System 2     | 17.92 ± 0.10  | 12.93 ± 0.08
   544     |    System 3     | 17.52 ± 0.05  | 12.50 ± 0.07
   552     |    System 1     | 16.29 ± 0.06  | 12.13 ± 0.06
   552     |    System 2     | 16.43 ± 0.12  | 12.33 ± 0.16
   552     |    System 3     | 16.21 ± 0.09  | 12.08 ± 0.08
   567     |    System 1     | 20.43 ± 0.07  | 14.47 ± 0.06
   567     |    System 2     | 20.51 ± 0.14  | 14.51 ± 0.16
   567     |    System 3     | 20.43 ± 0.06  | 14.41 ± 0.06
   584     |    System 1     | 21.61 ± 0.06  | 15.68 ± 0.04
   584     |    System 2     | 21.83 ± 0.14  | 15.86 ± 0.08
   584     |    System 3     | 21.75 ± 0.08  | 15.76 ± 0.07
   596     |    System 1     | 17.26 ± 0.03  | 12.19 ± 0.03
   596     |    System 2     | 17.47 ± 0.15  | 12.25 ± 0.11
   596     |    System 3     | 17.22 ± 0.10  | 12.09 ± 0.04
 Average   |    System 1     | 19.03 ± 0.06  | 13.78 ± 0.06
 Average   |    System 2     | 19.21 ± 0.14  | 13.93 ± 0.12
 Average   |    System 3     | 19.01 ± 0.08  | 13.73 ± 0.07

4.2 Prediction horizon of 60 minutes

4.2.1 Basic models

Table 4 lists the RMSE and MAE of the basic models for the 60-minute prediction horizon. Among all models, LSTM trained on a lag of 30 min-

Table 4. Evaluation results of the basic predictive models for a 60-minute prediction horizon.

Patient ID | Basic Model | History (min) | RMSE (mg/dl)  | MAE (mg/dl)
   540     |    PLSR     |      30       | 41.03         | 31.68
   540     |    PLSR     |      60       | 41.03         | 31.70
   540     |    MLP      |      30       | 40.20 ± 0.38  | 30.90 ± 0.21
   540     |    MLP      |      60       | 41.94 ± 2.18  | 32.14 ± 1.53
   540     |    LSTM     |      30       | 40.36 ± 0.91  | 30.80 ± 0.64
   540     |    LSTM     |      60       | 39.65 ± 1.16  | 30.28 ± 0.84
   544     |    PLSR     |      30       | 31.80         | 24.71
   544     |    PLSR     |      60       | 31.83         | 24.71
   544     |    MLP      |      30       | 31.58 ± 0.53  | 24.19 ± 0.99
   544     |    MLP      |      60       | 32.15 ± 0.63  | 24.13 ± 0.83
   544     |    LSTM     |      30       | 30.61 ± 0.19  | 22.97 ± 0.26
   544     |    LSTM     |      60       | 31.79 ± 0.31  | 24.57 ± 0.73
   552     |    PLSR     |      30       | 30.23         | 23.67
   552     |    PLSR     |      60       | 30.24         | 23.68
   552     |    MLP      |      30       | 30.14 ± 0.09  | 23.27 ± 0.24
   552     |    MLP      |      60       | 30.59 ± 1.01  | 23.65 ± 0.63
   552     |    LSTM     |      30       | 29.84 ± 0.25  | 22.52 ± 0.29
   552     |    LSTM     |      60       | 31.36 ± 1.43  | 23.72 ± 1.77
   567     |    PLSR     |      30       | 37.47         | 28.28
   567     |    PLSR     |      60       | 37.53         | 28.24
   567     |    MLP      |      30       | 36.81 ± 0.28  | 27.52 ± 0.50
   567     |    MLP      |      60       | 37.73 ± 1.28  | 28.57 ± 1.35
   567     |    LSTM     |      30       | 36.56 ± 0.17  | 27.58 ± 0.28
   567     |    LSTM     |      60       | 37.17 ± 0.58  | 27.90 ± 0.72
   584     |    PLSR     |      30       | 36.71         | 27.65
   584     |    PLSR     |      60       | 36.84         | 27.75
   584     |    MLP      |      30       | 36.32 ± 0.59  | 26.95 ± 0.66
   584     |    MLP      |      60       | 37.35 ± 0.82  | 27.82 ± 0.92
   584     |    LSTM     |      30       | 37.14 ± 0.98  | 28.03 ± 1.14
   584     |    LSTM     |      60       | 37.03 ± 0.99  | 27.42 ± 0.54
   596     |    PLSR     |      30       | 29.63         | 22.05
   596     |    PLSR     |      60       | 29.48         | 21.97
   596     |    MLP      |      30       | 29.68 ± 0.27  | 21.87 ± 0.31
   596     |    MLP      |      60       | 29.97 ± 0.39  | 22.08 ± 0.39
   596     |    LSTM     |      30       | 28.98 ± 0.29  | 21.14 ± 0.19
utes showed the best performance. MLP trained on 300 minutes was                                           60      29.71 ± 0.72    22.09 ± 0.80
the second high-performance model. The value of standard deviation                                         30         34.48           26.43
                                                                                              PLSR
for all models were satisfactory. Among the implemented regression                                         60         34.55           26.34
tools, LSTM resulted in the highest overall prediction accuracy. PLSR             Average                  30      34.12 ± 0.36    25.78 ± 0.49
                                                                                               MLP
                                                                                                           60      34.95 ± 1.05    26.40 ± 0.94
produced acceptable results in this case too. Data for patients 596 and
552 showed the highest overall predictability. In, contrast, patients                                      30      33.92 ± 0.47    25.51 ± 0.47
                                                                                              LSTM
                                                                                                           60      34.45 ± 0.86    26.00 ± 0.90
540, 567, and 584 had the lowest predictable data.
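The accuracy figures throughout this section are RMSE and MAE in mg/dl. As a reference, both metrics can be computed with a few lines of NumPy; the glucose values below are made up for illustration only:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the inputs (here mg/dl)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the inputs (here mg/dl)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Illustrative (invented) CGM readings and predictions, mg/dl:
actual = [110, 130, 150, 145, 160]
predicted = [115, 125, 155, 150, 150]
print(rmse(actual, predicted))  # ≈ 6.32
print(mae(actual, predicted))   # 6.0
```

RMSE penalises large excursions more heavily than MAE, which is why the two columns in the tables can rank models slightly differently.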


4.2.2    Stacking systems

Evaluation results of the stacking systems for a prediction horizon
of 60 minutes are displayed in Table 5. System 3 produced the best
overall predictions based on the average RMSE and MAE values. The
best result among all patients belonged to patient 596. All systems
had low standard deviation values.
    Table 5.   Evaluation results of the stacking systems for a 60-minute
                             prediction horizon.

          Patient       Stacking         RMSE               MAE
           ID           System           (mg/dl)           (mg/dl)
                       System 1       39.47 ± 0.17      30.10 ± 0.17
               540     System 2       39.14 ± 0.28      29.76 ± 0.20
                       System 3       39.00 ± 0.20      29.65 ± 0.12
                       System 1       30.47 ± 0.10      22.92 ± 0.13
               544     System 2       31.12 ± 0.12      23.72 ± 0.14
                       System 3       30.54 ± 0.09      22.95 ± 0.17
                       System 1       29.39 ± 0.15      22.39 ± 0.13
               552     System 2       29.38 ± 0.20      22.46 ± 0.20
                       System 3       29.10 ± 0.13      22.10 ± 0.14
                       System 1       36.11 ± 0.11      27.08 ± 0.15
               567     System 2       36.54 ± 0.14      27.36 ± 0.14
                       System 3       36.31 ± 0.14      27.09 ± 0.08
                       System 1       36.15 ± 0.16      27.04 ± 0.18
               584     System 2       36.68 ± 0.19      27.43 ± 0.19
                       System 3       36.52 ± 0.10      27.30 ± 0.14
                       System 1       28.74 ± 0.16      20.84 ± 0.12
               596     System 2       29.06 ± 0.21      21.13 ± 0.27
                       System 3       28.75 ± 0.10      20.78 ± 0.05
                       System 1       33.39 ± 0.14      25.06 ± 0.15
           Average     System 2       33.65 ± 0.19      25.31 ± 0.19
                       System 3       33.37 ± 0.13      24.98 ± 0.12


5    CONCLUSION

BGL prediction was improved using stacking-based ensemble learning.
Initially, a time series problem was translated into a supervised
learning task. Three conventional regression tools were trained on two
different history lengths of 30 and 60 minutes, resulting in six basic
predictive models. Predictions from the basic models trained with a
history of 30 minutes were fed as features to a regression model to
build a combined learner, which was then used to make final predictions
on the test set. The same scenario was repeated using the basic models
trained on 60-minute lag observations. In both cases, the combined
learner made more accurate predictions on the test set than the
individual basic models. The overall performance improved further when
the predictions from all basic models, trained on both histories of
30 and 60 minutes, were used as features to train a new learner.


6    SOFTWARE AND CODE

For data analysis, we used Python 3.6, TensorFlow 1.15.0, and
Keras 2.2.5, together with the pandas, NumPy, and scikit-learn
packages. The code is available at: https://gitlab.com/
Heydar-Khadem/multi-lag-stacking.git


REFERENCES

 [1] Florencia Aguiree, Alex Brown, Nam Ho Cho, Gisela Dahlquist, Sheree
     Dodd, Trisha Dunning, Michael Hirst, Christopher Hwang, Dianna
     Magliano, Chris Patterson, et al., ‘IDF diabetes atlas’, (2013).
 [2] Ramzi Ajjan, David Slattery, and Eugene Wright, ‘Continuous glucose
     monitoring: A brief review for primary care practitioners’, Advances in
     Therapy, 36(3), 579–596, (2019).
 [3] Muhammad Asad and Usman Qamar, ‘A review of continuous blood
     glucose monitoring and prediction of blood glucose level for diabetes
     type 1 patient in different prediction horizons (PH) using artificial neural
     network (ANN)’, in Proceedings of SAI Intelligent Systems Conference,
     pp. 684–695. Springer, (2019).
 [4] Arthur Bertachi, Lyvia Biagi, Iván Contreras, Ningsu Luo, and Josep
     Vehí, ‘Prediction of blood glucose levels and nocturnal hypoglycemia
     using physiological models and artificial neural networks’, in
     KHD@IJCAI, pp. 85–90, (2018).
 [5] Julio Borges, The Power of Ensembles in Deep Learning, 2019.
     https://towardsdatascience.com/the-power-of-ensembles-in-deep-
     learning-a8900ff42be9.
 [6] Danielle Bruen, Colm Delaney, Larisa Florea, and Dermot Diamond,
     ‘Glucose sensing for diabetes monitoring: recent developments’,
     Sensors, 17(8), 1866, (2017).
 [7] Razvan Bunescu, Nigel Struble, Cindy Marling, Jay Shubrook, and
     Frank Schwartz, ‘Blood glucose level prediction using physiological
     models and support vector regression’, in 2013 12th International Con-
     ference on Machine Learning and Applications, volume 1, pp. 135–140.
     IEEE, (2013).
 [8] Mol Ecol, ‘HHS Public Access’, 25(5), 1032–1057, (2017).
 [9] Eleni I Georga, Vasilios C Protopappas, Diego Ardigò, Demosthenes
     Polyzos, and Dimitrios I Fotiadis, ‘A glucose model based on sup-
     port vector regression for the prediction of hypoglycemic events un-
     der free-living conditions’, Diabetes Technology & Therapeutics, 15(8),
     634–643, (2013).
[10] Sepp Hochreiter and Jürgen Schmidhuber, ‘Long short-term memory’,
     Neural Computation, 9(8), 1735–1780, (1997).
[11] George S Jeha, Lefkothea P Karaviti, Barbara Anderson, E O’Brian
     Smith, Susan Donaldson, Toniean S McGirk, and Morey W Haymond,
     ‘Continuous glucose monitoring and the reality of metabolic control in
     preschool children with type 1 diabetes’, Diabetes Care, 27(12), 2881–
     2886, (2004).
[12] Heydar Khadem, Mohammad R Eissa, Hoda Nemat, Osamah Alrezj,
     and Mohammed Benaissa, ‘Classification before regression for im-
     proving the accuracy of glucose quantification using absorption spec-
     troscopy’, Talanta, 211, 120740, (2020).
[13] Cindy Marling and Razvan Bunescu, ‘The OhioT1DM dataset for blood
     glucose level prediction: Update 2020’.
[14] Cindy Marling and Razvan C Bunescu, ‘The OhioT1DM Dataset for
     Blood Glucose Level Prediction’, in 3rd International Workshop on
     Knowledge Discovery in Healthcare Data, pp. 60–63, (2018).
[15] John Martinsson, Alexander Schliep, Björn Eliasson, Christian Meijner,
     Simon Persson, and Olof Mogren, ‘Automatic blood glucose prediction
     with confidence using recurrent neural networks’, in 3rd International
     Workshop on Knowledge Discovery in Healthcare Data, KDH@IJCAI-
     ECAI 2018, 13 July 2018, pp. 64–68, (2018).
[16] Cooper Midroni, Peter J. Leimbigler, Gaurav Baruah, Maheedhar
     Kolla, Alfred J. Whitehead, and Yan Fossat, ‘Predicting glycemia in
     type 1 diabetes patients: Experiments with XGBoost’, CEUR Workshop
     Proceedings, 2148, 79–84, (2018).
[17] Sadegh Mirshekarian, Razvan Bunescu, Cindy Marling, and Frank
     Schwartz, ‘Using LSTMs to learn physiological models of blood glucose
     behavior’, in 2017 39th Annual International Conference of the IEEE
     Engineering in Medicine and Biology Society (EMBC), pp. 2887–2891.
     IEEE, (2017).
[18] Fionn Murtagh, ‘Multilayer perceptrons for classification and regres-
     sion’, Neurocomputing, 2(5-6), 183–197, (1991).
[19] Shauna S Roberts, ‘Type 1 diabetes’, Diabetes Forecast, 55, 19, (2002).
[20] Svante Wold, Michael Sjöström, and Lennart Eriksson, ‘PLS-regression:
     a basic tool of chemometrics’, Chemometrics and Intelligent Laboratory
     Systems, 58(2), 109–130, (2001).
[21] Ashenafi Zebene Woldaregay, Eirik Årsand, Taxiarchis Botsis, David
     Albers, Lena Mamykina, and Gunnar Hartvigsen, ‘Data-driven blood
     glucose pattern classification and anomalies detection: machine-
     learning applications in type 1 diabetes’, Journal of Medical Internet
     Research, 21(5), e11030, (2019).
[22] Jinyu Xie and Qian Wang, ‘Benchmark machine learning approaches
     with classical time series approaches on the blood glucose level predic-
     tion challenge’, in KHD@IJCAI, pp. 97–102, (2018).
[23] Zhi-Hua Zhou, Ensemble Methods: Foundations and Algorithms, CRC
     Press, 2012.