=Paper=
{{Paper
|id=Vol-2675/paper26
|storemode=property
|title=Multi-lag Stacking for Blood Glucose Level Prediction
|pdfUrl=https://ceur-ws.org/Vol-2675/paper26.pdf
|volume=Vol-2675
|authors=Heydar Khadem,Hoda Nemat,Jackie Elliott,Mohammed Benaissa
|dblpUrl=https://dblp.org/rec/conf/ecai/KhademNEB20
}}
==Multi-lag Stacking for Blood Glucose Level Prediction==
Heydar Khadem1, Hoda Nemat1, Jackie Elliott2, and Mohammed Benaissa1
1 Department of Electronic and Electrical Engineering, University of Sheffield, UK, email addresses: h.khadem@sheffield.ac.uk, hoda.nemat@sheffield.ac.uk, m.benaissa@sheffield.ac.uk
2 Department of Oncology and Metabolism, University of Sheffield, UK, email address: j.elliott@sheffield.ac.uk
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. This work investigates blood glucose level prediction for type 1 diabetes in two horizons of 30 and 60 minutes. Initially, three conventional regression tools—partial least squares regression (PLSR), multilayer perceptron, and long short-term memory—are deployed to create predictive models. They are trained once on 30 minutes and once on 60 minutes of historical data, resulting in six basic models for each prediction horizon. Collections of these models are then set as base-learners to develop three stacking systems: two uni-lag and one multi-lag. One of the uni-lag systems uses the three basic models trained on 30 minutes of lag data; the other uses those trained on 60 minutes. The multi-lag system, on the other hand, leverages the basic models trained on both lags. All three stacking systems deploy a PLSR as meta-learner. The results obtained show that: i) the stacking systems outperform the basic models; ii) among the stacking systems, the multi-lag one shows the best predictive performance, with a root mean square error of 19.01 mg/dl and 33.37 mg/dl for the prediction horizons of 30 and 60 minutes, respectively.

1 INTRODUCTION

Diabetes mellitus is a metabolic disorder and a significant cause of morbidity and mortality worldwide [1]. As yet, no cure has been developed for diabetes, and management of the corresponding life-impeding conditions is recommended as the most successful way to control the disease [6]. In fact, the occurrence of the associated complications can be delayed or even prevented by effective management of the disease [11].

Among the different types of diabetes, the importance of self-management for type 1 diabetes mellitus (T1DM) is accentuated [8, 19]. The key factor in T1DM management is to control the blood glucose level (BGL) within the normal range [2]. BGL predictive models could contribute to achieving this goal. They can help avert adverse glycaemic events by forecasting them and giving patients the chance to take corrective actions ahead of time [2].

The importance of developing BGL predictive models for T1DM management has spurred research in this field [16, 22]. According to the knowledge they require, predictive models can be classified as physiological, data-driven, or hybrid models [21]. Data-driven models interpret trends in sequences of data to estimate future BGLs. Machine learning approaches are broadly adopted in this area [21].

Mirshekarian et al. [17] developed a model to predict blood glucose in 30-minute and 60-minute horizons using a recurrent neural network (RNN) with long short-term memory (LSTM) units. The model explored BGL, insulin, food, and activity information as inputs. For the same prediction horizons, Bertachi et al. [4] and Georga et al. [9], in separate studies, proposed predictive models. Bertachi et al. applied an artificial neural network considering glucose, insulin, carbohydrate, and physical activity as inputs for their system. The BGL profile, insulin, carbohydrate intake, and physical activity were inputs for a support vector regression (SVR) in the model developed by Georga et al. Investigating continuous glucose monitoring (CGM) data with recursive and direct deep learning approaches, Xie et al. [22] recommended a model for BGL prediction. Martinsson et al. [15] proposed an automatic forecast model for a prediction horizon of up to 60 minutes using an RNN. The model used only the information from past BGLs as input. Bunescu et al. [7] created descriptive features to train an SVR using a physiological model of blood glucose dynamics. Carbohydrate intake, insulin administration, and the current and past BGLs were inputs of their model. Despite extensive research devoted to the development of predictive models, the performance of the proposed models remains a challenge [3].

In this work, we contribute to the improvement of BGL prediction for T1DM by applying a multi-lag stacking methodology. Initially, three conventional regression tools—partial least squares regression, multilayer perceptron, and long short-term memory—were applied to forecast BGLs in horizons of 30 and 60 minutes. Each tool was trained twice: once on a lag of 30 minutes and once on a lag of 60 minutes of CGM data. Therefore, six basic models were created for each prediction horizon. For each horizon, three stacking systems were then developed, in which predictions from a selection of the basic models were used as features to train a new regression. The first two stacking systems followed a uni-lag approach: they used predictions from the three basic models trained on a history of 30 minutes and 60 minutes, respectively. The third system was multi-lag and used predictions from all six basic models. The stacking systems resulted in appreciable improvements in predictive accuracy compared to the basic predictive models, and the third (multi-lag) stacking system showed better predictive performance than the other systems.

This is the first paper, to our knowledge, that combines models with different time lags to generate a multi-lag BGL prediction system.

2 DATASET

The Ohio T1DM dataset comprises several features collected from 12 individuals with type 1 diabetes over 8 weeks [14, 13]. The last ten days' worth of data for each contributor was considered as the test set. Data for a cohort of six subjects was released in 2018 for the first BGL prediction challenge [14]; data for another six subjects was released in 2020 for the second challenge [13].

In this work, the 2020 data was investigated for developing and evaluating predictive models. Among the collected features were
CGM data every 5 minutes, which was the only feature explored in this work. A brief description of the CGM data in the Ohio T1DM dataset released for the 2020 BGL prediction challenge is displayed in Table 1.

Table 1. Number of training and test examples for each participant in the Ohio T1DM dataset released in 2020 [13].

Patient ID | Training Examples | Test Examples
540 | 11947 | 2896
544 | 10623 | 2716
552 | 9080 | 2364
567 | 10858 | 2389
584 | 12150 | 2665
596 | 10877 | 2743

3 METHODS

As mentioned earlier, this work proposes methodologies to predict BGL in horizons of 30 and 60 minutes. The details of the pursued methodologies are presented in this section.

3.1 Pre-processing

The first pre-processing task was taking care of missing data. Missing data in the training set was imputed by applying simple linear interpolation. For the test set, in contrast, linear extrapolation was employed. This was to ensure the model is not contaminated by observing future data in its pre-processing stage.

The next pre-processing step was transforming the time series forecasting problem into a supervised learning task. To this end, a rolling window consisting of lag and future data was used as the explanatory and dependent variables, respectively. As an illustration, for forecasting the BGL 30 minutes ahead using a history of 60 minutes, we used a window of length 18. Given the 5-minute interval between data points, the first 12 data points in the window were the explanatory variables, and the rest were the dependent variables.

3.2 Prediction methods

First, six basic predictive models were created by means of three conventional regression tools. Subsequently, employing stacking learning, three more advanced predictive systems were developed, in which a collection of the basic models were considered as base-learners and a partial least squares regression as the meta-learner. All proposed models/systems were personalised to individuals.

3.2.1 Basic models

Initially, for each prediction horizon of 30 and 60 minutes, the following three conventional regression tools were employed to generate six basic predictive models—two models by each tool. For this purpose, these tools were trained once on a history of 30 minutes and once on a history of 60 minutes.

• Partial least squares regression (PLSR)
PLSR, as a basic linear regression, holds substantial popularity in different applications due to its easy-to-apply nature and minimal computation time requirement. In a previous work, we applied PLSR for glucose quantification, which provided promising results [12].
In this work, PLSR was used as one of the regression tools. For the number of components, different values ranging from 1 to the length of the input variable were tried. Each time, the predicted residual sum of squares (PRESS) was calculated as follows. The number of components (A) resulting in the minimum value of PRESS/(N − A − 1) was then selected [20].

PRESS = Σ_{i=1}^{N} (y_i − ŷ_i)²    (1)

where N is the size of the evaluation set, y_i is the reference value, and ŷ_i is the predicted value.

• Multilayer perceptron (MLP)
An MLP [18] with an architecture of one hidden layer of 100 nodes and an output layer was implemented. ReLU was used as the activation function for the hidden layer, Adam as the optimiser, and mean absolute error as the loss function. The learning rate was 0.01, and the training process was based on 100 epochs.

• Long short-term memory (LSTM)
We used a vanilla LSTM [10] composed of a single hidden LSTM layer with 200 nodes, a fully connected layer with 100 nodes, and an output layer. ReLU was the activation function for both hidden layers, mean squared error was the loss function, and Adam was the optimiser. The model was trained for 100 epochs with a learning rate of 0.01.

3.2.2 Stacking systems

Ensemble learning is a machine learning technique that combines decisions from several models to create a new model. Stacking (Figure 1) is an ensemble approach that uses predictions from multiple base-learners (first-level models) as features to train a meta-learner (second-level model). This meta-learner then makes the final predictions on the test set [23].

Figure 1. A stacking system uses predictions from multiple base-learners as features to train a meta-learner [5].

In this paper, for each prediction horizon of 30 and 60 minutes, three stacking systems, comprising two uni-lag and one multi-lag, were developed.

• System 1
The three basic models trained on a history of 30 minutes were the base-learners of this uni-lag system, and a PLSR was its meta-learner.
• System 2
This system was also uni-lag. It was similar to System 1, except that it used the three basic models trained on a history of 60 minutes, in place of 30 minutes, as base-learners.

• System 3
In this multi-lag system, all six basic models were considered as the base-learners, and again a PLSR was the meta-learner. The idea behind the multi-lag approach was to help capture a broader frequency range of BGL dynamics.

3.3 Evaluation

The test set was held out, and the training set was used to create the predictive models/systems. The developed models/systems were then utilised to predict the test data. The set of evaluation points starts 60 minutes after the beginning of the test set; the first evaluation points would otherwise be similar to the training data, which could affect the reliability of the results. Hence, the number of evaluated points for each patient is 12 less than the number of test examples given in Table 1. Root mean square error (RMSE) and mean absolute error (MAE), calculated as follows, were used as the evaluation metrics.

RMSE = sqrt( Σ_{i=1}^{N} (y_i − ŷ_i)² / N )    (2)

MAE = ( Σ_{i=1}^{N} |y_i − ŷ_i| ) / N    (3)

where N, y_i, and ŷ_i carry the same definitions as in (1).

4 RESULTS AND DISCUSSION

This section presents the evaluation results for both the basic models and the stacking systems. Models/systems whose performance depends on random initialisation were run five times, and the corresponding results are reported as mean and standard deviation. Extrapolated points were excluded when calculating the evaluation metrics. All models were built to predict future BGLs up to the end of the intended prediction horizon, but only the evaluation results for the horizon of interest are reported.

4.1 Prediction horizon of 30 minutes

4.1.1 Basic models

The RMSE and MAE results of the basic predictive models for the prediction horizon of 30 minutes are displayed in Table 2.

Table 2. Evaluation results of the basic predictive models for a 30-minute prediction horizon.

Patient ID | Basic Model | History (min) | RMSE (mg/dl) | MAE (mg/dl)
540 | PLSR | 30 | 22.11 | 16.58
540 | PLSR | 60 | 22.07 | 16.56
540 | MLP | 30 | 21.98 ± 0.48 | 16.52 ± 0.33
540 | MLP | 60 | 22.52 ± 0.78 | 16.76 ± 0.62
540 | LSTM | 30 | 21.65 ± 0.28 | 16.06 ± 0.12
540 | LSTM | 60 | 21.58 ± 0.67 | 16.20 ± 0.61
544 | PLSR | 30 | 18.08 | 13.34
544 | PLSR | 60 | 18.09 | 13.33
544 | MLP | 30 | 18.22 ± 0.18 | 13.38 ± 0.37
544 | MLP | 60 | 18.25 ± 0.28 | 13.21 ± 0.35
544 | LSTM | 30 | 17.63 ± 0.15 | 12.63 ± 0.10
544 | LSTM | 60 | 18.42 ± 0.60 | 13.36 ± 0.44
552 | PLSR | 30 | 16.76 | 12.76
552 | PLSR | 60 | 16.79 | 12.78
552 | MLP | 30 | 17.08 ± 0.36 | 12.91 ± 0.40
552 | MLP | 60 | 17.03 ± 0.34 | 12.77 ± 0.17
552 | LSTM | 30 | 16.49 ± 0.10 | 12.29 ± 0.24
552 | LSTM | 60 | 17.06 ± 0.70 | 12.88 ± 0.51
567 | PLSR | 30 | 20.98 | 15.12
567 | PLSR | 60 | 21.00 | 15.07
567 | MLP | 30 | 21.24 ± 0.70 | 15.42 ± 0.76
567 | MLP | 60 | 21.10 ± 0.46 | 15.13 ± 0.58
567 | LSTM | 30 | 20.66 ± 0.16 | 14.79 ± 0.25
567 | LSTM | 60 | 20.77 ± 0.36 | 14.72 ± 0.40
584 | PLSR | 30 | 22.00 | 16.15
584 | PLSR | 60 | 21.97 | 16.12
584 | MLP | 30 | 21.67 ± 0.18 | 15.63 ± 0.16
584 | MLP | 60 | 22.43 ± 0.48 | 16.35 ± 0.61
584 | LSTM | 30 | 22.23 ± 0.70 | 16.33 ± 0.67
584 | LSTM | 60 | 22.04 ± 0.22 | 16.11 ± 0.28
596 | PLSR | 30 | 17.79 | 12.77
596 | PLSR | 60 | 17.62 | 12.67
596 | MLP | 30 | 17.74 ± 0.04 | 12.55 ± 0.05
596 | MLP | 60 | 18.44 ± 0.26 | 13.49 ± 0.42
596 | LSTM | 30 | 17.76 ± 0.67 | 12.74 ± 0.55
596 | LSTM | 60 | 17.71 ± 0.28 | 12.50 ± 0.33
Average | PLSR | 30 | 19.62 | 14.45
Average | PLSR | 60 | 19.59 | 14.42
Average | MLP | 30 | 19.65 ± 0.32 | 14.40 ± 0.35
Average | MLP | 60 | 19.96 ± 0.43 | 14.62 ± 0.46
Average | LSTM | 30 | 19.40 ± 0.34 | 14.14 ± 0.32
Average | LSTM | 60 | 19.60 ± 0.47 | 14.30 ± 0.43

Based on the average RMSE and MAE over all patients, the LSTM trained on a history of 30 minutes showed the best performance among the basic models; the PLSR with a 60-minute lag was the second-best model. All models had satisfactory standard deviations.

LSTM yielded the best overall predictive accuracy among the three regression tools. However, the results of the other two tools were comparable to those of LSTM. It is worth remarking that PLSR, as a linear regression tool, was able to generate results comparable to those of LSTM and even better than those of MLP.

Among all patients, patient 552 had the best overall evaluation results. The worst results, on the other hand, belonged to patients 584 and 540.
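The RMSE and MAE metrics in equations (2) and (3) are straightforward to reproduce. The following sketch is our own NumPy illustration; the function names and toy values are ours, not taken from the authors' released code:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, as in equation (2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error, as in equation (3)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy reference and predicted BGL values in mg/dl (illustrative numbers only).
y = [120.0, 135.0, 150.0]
y_hat = [118.0, 140.0, 145.0]
print(round(rmse(y, y_hat), 2))  # sqrt((4 + 25 + 25) / 3) ≈ 4.24
print(round(mae(y, y_hat), 2))   # (2 + 5 + 5) / 3 = 4.0
```

In the paper's evaluation, extrapolated points are excluded before these metrics are computed.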
4.1.2 Stacking systems

Table 3 shows the evaluation results of the stacking systems for a prediction horizon of 30 minutes. For all patients, the performance of the stacking systems surpassed that of the basic models. System 3 produced the best predictions overall based on average RMSE and MAE values, and resulted in the best predictive accuracy for all patients except patients 544 and 584. All systems possessed small standard deviation values. The best result among all patients belonged to patient 552; the worst results, on the other hand, were those of patients 584, 540, and 567.

Table 3. Evaluation results of the stacking systems for a 30-minute prediction horizon.

Patient ID | Stacking System | RMSE (mg/dl) | MAE (mg/dl)
540 | System 1 | 21.13 ± 0.08 | 15.72 ± 0.10
540 | System 2 | 21.11 ± 0.18 | 15.69 ± 0.14
540 | System 3 | 20.93 ± 0.11 | 15.52 ± 0.13
544 | System 1 | 17.47 ± 0.05 | 12.50 ± 0.05
544 | System 2 | 17.92 ± 0.10 | 12.93 ± 0.08
544 | System 3 | 17.52 ± 0.05 | 12.50 ± 0.07
552 | System 1 | 16.29 ± 0.06 | 12.13 ± 0.06
552 | System 2 | 16.43 ± 0.12 | 12.33 ± 0.16
552 | System 3 | 16.21 ± 0.09 | 12.08 ± 0.08
567 | System 1 | 20.43 ± 0.07 | 14.47 ± 0.06
567 | System 2 | 20.51 ± 0.14 | 14.51 ± 0.16
567 | System 3 | 20.43 ± 0.06 | 14.41 ± 0.06
584 | System 1 | 21.61 ± 0.06 | 15.68 ± 0.04
584 | System 2 | 21.83 ± 0.14 | 15.86 ± 0.08
584 | System 3 | 21.75 ± 0.08 | 15.76 ± 0.07
596 | System 1 | 17.26 ± 0.03 | 12.19 ± 0.03
596 | System 2 | 17.47 ± 0.15 | 12.25 ± 0.11
596 | System 3 | 17.22 ± 0.10 | 12.09 ± 0.04
Average | System 1 | 19.03 ± 0.06 | 13.78 ± 0.06
Average | System 2 | 19.21 ± 0.14 | 13.93 ± 0.12
Average | System 3 | 19.01 ± 0.08 | 13.73 ± 0.07

4.2 Prediction horizon of 60 minutes

4.2.1 Basic models

Table 4 lists the RMSE and MAE of the basic models for the 60-minute prediction horizon. Among all models, the LSTM trained on a lag of 30 minutes showed the best performance; the MLP trained on 30 minutes was the second best-performing model. The standard deviation values of all models were satisfactory. Among the implemented regression tools, LSTM resulted in the highest overall prediction accuracy, and PLSR produced acceptable results in this case too. Data for patients 596 and 552 showed the highest overall predictability. In contrast, patients 540, 567, and 584 had the least predictable data.

Table 4. Evaluation results of the basic predictive models for a 60-minute prediction horizon.

Patient ID | Basic Model | History (min) | RMSE (mg/dl) | MAE (mg/dl)
540 | PLSR | 30 | 41.03 | 31.68
540 | PLSR | 60 | 41.03 | 31.70
540 | MLP | 30 | 40.20 ± 0.38 | 30.90 ± 0.21
540 | MLP | 60 | 41.94 ± 2.18 | 32.14 ± 1.53
540 | LSTM | 30 | 40.36 ± 0.91 | 30.80 ± 0.64
540 | LSTM | 60 | 39.65 ± 1.16 | 30.28 ± 0.84
544 | PLSR | 30 | 31.80 | 24.71
544 | PLSR | 60 | 31.83 | 24.71
544 | MLP | 30 | 31.58 ± 0.53 | 24.19 ± 0.99
544 | MLP | 60 | 32.15 ± 0.63 | 24.13 ± 0.83
544 | LSTM | 30 | 30.61 ± 0.19 | 22.97 ± 0.26
544 | LSTM | 60 | 31.79 ± 0.31 | 24.57 ± 0.73
552 | PLSR | 30 | 30.23 | 23.67
552 | PLSR | 60 | 30.24 | 23.68
552 | MLP | 30 | 30.14 ± 0.09 | 23.27 ± 0.24
552 | MLP | 60 | 30.59 ± 1.01 | 23.65 ± 0.63
552 | LSTM | 30 | 29.84 ± 0.25 | 22.52 ± 0.29
552 | LSTM | 60 | 31.36 ± 1.43 | 23.72 ± 1.77
567 | PLSR | 30 | 37.47 | 28.28
567 | PLSR | 60 | 37.53 | 28.24
567 | MLP | 30 | 36.81 ± 0.28 | 27.52 ± 0.50
567 | MLP | 60 | 37.73 ± 1.28 | 28.57 ± 1.35
567 | LSTM | 30 | 36.56 ± 0.17 | 27.58 ± 0.28
567 | LSTM | 60 | 37.17 ± 0.58 | 27.90 ± 0.72
584 | PLSR | 30 | 36.71 | 27.65
584 | PLSR | 60 | 36.84 | 27.75
584 | MLP | 30 | 36.32 ± 0.59 | 26.95 ± 0.66
584 | MLP | 60 | 37.35 ± 0.82 | 27.82 ± 0.92
584 | LSTM | 30 | 37.14 ± 0.98 | 28.03 ± 1.14
584 | LSTM | 60 | 37.03 ± 0.99 | 27.42 ± 0.54
596 | PLSR | 30 | 29.63 | 22.05
596 | PLSR | 60 | 29.48 | 21.97
596 | MLP | 30 | 29.68 ± 0.27 | 21.87 ± 0.31
596 | MLP | 60 | 29.97 ± 0.39 | 22.08 ± 0.39
596 | LSTM | 30 | 28.98 ± 0.29 | 21.14 ± 0.19
596 | LSTM | 60 | 29.71 ± 0.72 | 22.09 ± 0.80
Average | PLSR | 30 | 34.48 | 26.43
Average | PLSR | 60 | 34.55 | 26.34
Average | MLP | 30 | 34.12 ± 0.36 | 25.78 ± 0.49
Average | MLP | 60 | 34.95 ± 1.05 | 26.40 ± 0.94
Average | LSTM | 30 | 33.92 ± 0.47 | 25.51 ± 0.47
Average | LSTM | 60 | 34.45 ± 0.86 | 26.00 ± 0.90

4.2.2 Stacking systems

The evaluation results of the stacking systems for a prediction horizon of 60 minutes are displayed in Table 5. System 3 produced the best overall predictions based on average RMSE and MAE values. The best result among all patients belonged to patient 596. All systems had low standard deviation values.
Table 5. Evaluation results of the stacking systems for a 60-minute prediction horizon.

Patient ID | Stacking System | RMSE (mg/dl) | MAE (mg/dl)
540 | System 1 | 39.47 ± 0.17 | 30.10 ± 0.17
540 | System 2 | 39.14 ± 0.28 | 29.76 ± 0.20
540 | System 3 | 39.00 ± 0.20 | 29.65 ± 0.12
544 | System 1 | 30.47 ± 0.10 | 22.92 ± 0.13
544 | System 2 | 31.12 ± 0.12 | 23.72 ± 0.14
544 | System 3 | 30.54 ± 0.09 | 22.95 ± 0.17
552 | System 1 | 29.39 ± 0.15 | 22.39 ± 0.13
552 | System 2 | 29.38 ± 0.20 | 22.46 ± 0.20
552 | System 3 | 29.10 ± 0.13 | 22.10 ± 0.14
567 | System 1 | 36.11 ± 0.11 | 27.08 ± 0.15
567 | System 2 | 36.54 ± 0.14 | 27.36 ± 0.14
567 | System 3 | 36.31 ± 0.14 | 27.09 ± 0.08
584 | System 1 | 36.15 ± 0.16 | 27.04 ± 0.18
584 | System 2 | 36.68 ± 0.19 | 27.43 ± 0.19
584 | System 3 | 36.52 ± 0.10 | 27.30 ± 0.14
596 | System 1 | 28.74 ± 0.16 | 20.84 ± 0.12
596 | System 2 | 29.06 ± 0.21 | 21.13 ± 0.27
596 | System 3 | 28.75 ± 0.10 | 20.78 ± 0.05
Average | System 1 | 33.39 ± 0.14 | 25.06 ± 0.15
Average | System 2 | 33.65 ± 0.19 | 25.31 ± 0.19
Average | System 3 | 33.37 ± 0.13 | 24.98 ± 0.12

5 CONCLUSION

BGL prediction was improved using stacking-learning concepts. Initially, a time series problem was translated into a supervised learning task. Three conventional regression tools were trained on different history lengths of 30 and 60 minutes, resulting in six basic predictive models. Predictions from the basic models trained with a history of 30 minutes were fed as features to a regression to build a combined learner, which was then used to make final predictions on the test set. The same scenario was repeated using the basic models trained on 60-minute lag observations. In both cases, the combined learner was able to make more accurate predictions on the test set. The overall performance further improved when predictions from all basic models—trained on both histories of 30 and 60 minutes—were considered as features to train a new learner.

6 SOFTWARE AND CODE

For data analysis we used Python 3.6, TensorFlow 1.15.0, and Keras 2.2.5, together with the Pandas, NumPy, and Sklearn packages. The code is available at: https://gitlab.com/Heydar-Khadem/multi-lag-stacking.git

REFERENCES

[1] Florencia Aguiree, Alex Brown, Nam Ho Cho, Gisela Dahlquist, Sheree Dodd, Trisha Dunning, Michael Hirst, Christopher Hwang, Dianna Magliano, Chris Patterson, et al., 'IDF diabetes atlas', (2013).
[2] Ramzi Ajjan, David Slattery, and Eugene Wright, 'Continuous glucose monitoring: A brief review for primary care practitioners', Advances in Therapy, 36(3), 579–596, (2019).
[3] Muhammad Asad and Usman Qamar, 'A review of continuous blood glucose monitoring and prediction of blood glucose level for diabetes type 1 patient in different prediction horizons (PH) using artificial neural network (ANN)', in Proceedings of SAI Intelligent Systems Conference, pp. 684–695. Springer, (2019).
[4] Arthur Bertachi, Lyvia Biagi, Iván Contreras, Ningsu Luo, and Josep Vehí, 'Prediction of blood glucose levels and nocturnal hypoglycemia using physiological models and artificial neural networks', in KHD@IJCAI, pp. 85–90, (2018).
[5] Julio Borges, The Power of Ensembles in Deep Learning, 2019. https://towardsdatascience.com/the-power-of-ensembles-in-deep-learning-a8900ff42be9.
[6] Danielle Bruen, Colm Delaney, Larisa Florea, and Dermot Diamond, 'Glucose sensing for diabetes monitoring: recent developments', Sensors, 17(8), 1866, (2017).
[7] Razvan Bunescu, Nigel Struble, Cindy Marling, Jay Shubrook, and Frank Schwartz, 'Blood glucose level prediction using physiological models and support vector regression', in 2013 12th International Conference on Machine Learning and Applications, volume 1, pp. 135–140. IEEE, (2013).
[8] Mol Ecol, 'HHS Public Access', 25(5), 1032–1057, (2017).
[9] Eleni I Georga, Vasilios C Protopappas, Diego Ardigò, Demosthenes Polyzos, and Dimitrios I Fotiadis, 'A glucose model based on support vector regression for the prediction of hypoglycemic events under free-living conditions', Diabetes Technology & Therapeutics, 15(8), 634–643, (2013).
[10] Sepp Hochreiter and Jürgen Schmidhuber, 'Long short-term memory', Neural Computation, 9(8), 1735–1780, (1997).
[11] George S Jeha, Lefkothea P Karaviti, Barbara Anderson, EO'Brian Smith, Susan Donaldson, Toniean S McGirk, and Morey W Haymond, 'Continuous glucose monitoring and the reality of metabolic control in preschool children with type 1 diabetes', Diabetes Care, 27(12), 2881–2886, (2004).
[12] Heydar Khadem, Mohammad R Eissa, Hoda Nemat, Osamah Alrezj, and Mohammed Benaissa, 'Classification before regression for improving the accuracy of glucose quantification using absorption spectroscopy', Talanta, 211, 120740, (2020).
[13] Cindy Marling and Razvan Bunescu, 'The OhioT1DM dataset for blood glucose level prediction: Update 2020'.
[14] Cindy Marling and Razvan C Bunescu, 'The OhioT1DM dataset for blood glucose level prediction', in 3rd International Workshop on Knowledge Discovery in Healthcare Data, pp. 60–63, (2018).
[15] John Martinsson, Alexander Schliep, Björn Eliasson, Christian Meijner, Simon Persson, and Olof Mogren, 'Automatic blood glucose prediction with confidence using recurrent neural networks', in 3rd International Workshop on Knowledge Discovery in Healthcare Data, KDH@IJCAI-ECAI 2018, 13 July 2018, pp. 64–68, (2018).
[16] Cooper Midroni, Peter J. Leimbigler, Gaurav Baruah, Maheedhar Kolla, Alfred J. Whitehead, and Yan Fossat, 'Predicting glycemia in type 1 diabetes patients: Experiments with XGBoost', CEUR Workshop Proceedings, 2148, 79–84, (2018).
[17] Sadegh Mirshekarian, Razvan Bunescu, Cindy Marling, and Frank Schwartz, 'Using LSTMs to learn physiological models of blood glucose behavior', in 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2887–2891. IEEE, (2017).
[18] Fionn Murtagh, 'Multilayer perceptrons for classification and regression', Neurocomputing, 2(5-6), 183–197, (1991).
[19] Shauna S Roberts, 'Type 1 diabetes', Diabetes Forecast, 55, 19, (2002).
[20] Svante Wold, Michael Sjöström, and Lennart Eriksson, 'PLS-regression: a basic tool of chemometrics', Chemometrics and Intelligent Laboratory Systems, 58(2), 109–130, (2001).
[21] Ashenafi Zebene Woldaregay, Eirik Årsand, Taxiarchis Botsis, David Albers, Lena Mamykina, and Gunnar Hartvigsen, 'Data-driven blood glucose pattern classification and anomalies detection: machine-learning applications in type 1 diabetes', Journal of Medical Internet Research, 21(5), e11030, (2019).
[22] Jinyu Xie and Qian Wang, 'Benchmark machine learning approaches with classical time series approaches on the blood glucose level prediction challenge', in KHD@IJCAI, pp. 97–102, (2018).
[23] Zhi-Hua Zhou, Ensemble Methods: Foundations and Algorithms, CRC Press, 2012.