<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multi-lag Stacking for Blood Glucose Level Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Heydar Khadem</string-name>
          <email>h.khadem@shef</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammed Benaissa</string-name>
          <email>m.benaissa@shef</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electronic and Electrical Engineering</institution>
          ,
          <addr-line>University of Sheffield</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Oncology and Metabolism, University of Sheffield</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This work investigates blood glucose level prediction for type 1 diabetes over two horizons of 30 and 60 minutes. Initially, three conventional regression tools (partial least squares regression (PLSR), multilayer perceptron, and long short-term memory) are deployed to create predictive models. They are trained once on 30 minutes and once on 60 minutes of historical data, resulting in six basic models for each prediction horizon. Collections of these models are then used as base-learners to develop three stacking systems: two uni-lag and one multi-lag. One of the uni-lag systems uses the three basic models trained on 30 minutes of lag data; the other uses those trained on 60 minutes. The multi-lag system, on the other hand, leverages the basic models trained on both lags. All three stacking systems deploy a PLSR as meta-learner. The results obtained show that: i) the stacking systems outperform the basic models; ii) among the stacking systems, the multi-lag shows the best predictive performance, with a root mean square error of 19.01 mg/dl and 33.37 mg/dl for the prediction horizons of 30 and 60 minutes, respectively.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Diabetes mellitus is a metabolic disorder and a significant cause of morbidity and mortality worldwide [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. No cure for diabetes has yet been developed, and management of the corresponding life-impeding conditions is recommended as the most effective way to control the disease [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In fact, the occurrence of the associated complications can be delayed or even prevented by effective management of the disease [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        Among different types of diabetes, the importance of self-management for type 1 diabetes mellitus (T1DM) is accentuated
[
        <xref ref-type="bibr" rid="ref19 ref8">8, 19</xref>
        ]. The key factor in T1DM management is to control the blood
glucose level (BGL) within the normal range [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. BGL predictive
models could contribute to achieving this goal. They can help avert
adverse glycaemic events by forecasting them and giving patients the
chance to take corrective actions ahead of time [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The importance of the development of BGL predictive models in
T1DM management has spurred research into this field [
        <xref ref-type="bibr" rid="ref16 ref22">16, 22</xref>
        ].
According to their knowledge requirements, predictive models can be classified as physiological, data-driven, and hybrid models [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
Data-driven models interpret trends in sequences of data to make estimations of future BGLs. Machine learning approaches are broadly
adopted in this area [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>
        Mirshekarian et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] developed a model to predict blood
glucose in 30-minute and 60-minute horizons using a recurrent neural network (RNN) with long short-term memory (LSTM) units. The
model explored BGL, insulin, food, and activity information as
inputs. For the same prediction horizons, Bertachi et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and Georga
et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], in separate studies, proposed predictive models. Bertachi
et al. applied an artificial neural network considering glucose,
insulin, carbohydrate and physical activity as inputs for their system.
BGL profile, insulin, carbohydrate intake and physical activity were
inputs for a support vector regression (SVR) in the model developed
by Georga et al. Investigating continuous glucose monitoring (CGM)
data by recursive and direct deep learning approaches, Xie et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]
recommended a model for BGL prediction. Martinsson et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
proposed an automatic forecast model for a prediction horizon of up
to 60 minutes using an RNN. The model used only the information from
past BGLs as input. Bunescu et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] created descriptive features to
train an SVR using a physiological model of blood glucose dynamics.
Carbohydrate intake, insulin administration, and the current and past
BGLs were inputs of their model. Despite extensive research devoted
to the development of predictive models, the performance of the
proposed models remains a challenge [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>In this work, we contributed to the improvement of BGL
prediction for T1DM by applying a multi-lag stacking methodology.
Initially, three conventional regression tools (partial least squares regression, multilayer perceptron, and long short-term memory) were applied to forecast BGLs in horizons of 30 and 60 minutes. Each tool was trained twice: once on a lag of 30 minutes and once on a lag of 60
minutes of CGM data. Therefore, six basic models were created for
each prediction horizon. For each horizon, three stacking systems
were then developed where predictions from a selection of the basic
models were used as features to train a new regression. The first two
stacking systems followed a uni-lag approach. They used predictions
from the three base models trained on a history of 30 minutes and 60
minutes, respectively. The third system was multi-lag and used
predictions from all six base models. The stacking systems resulted in
appreciable improvements in predictive accuracy as compared to the
basic predictive models. The third stacking system showed better predictive performance than the other two systems.</p>
      <p>To our knowledge, this is the first paper to combine models with different time lags to generate a multi-lag BGL prediction system.</p>
    </sec>
    <sec id="sec-2">
      <title>DATASET</title>
      <p>
        The Ohio T1DM dataset comprises several features collected from
12 individuals with type 1 diabetes over 8 weeks [
        <xref ref-type="bibr" rid="ref13 ref14">14, 13</xref>
        ]. The last
ten days’ worth of data for each contributor was considered as the
test set. Data for a cohort of six subjects was released in 2018 for the
first BGL prediction challenge [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]; data for another six subjects was
released in 2020 for the second challenge [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>In this work, the 2020 data was investigated for developing and evaluating predictive models. Among the collected features was CGM data recorded every 5 minutes, which was the only feature explored in this work. A brief description of the CGM data in the Ohio T1DM dataset released for the 2020 BGL prediction challenge is displayed in Table 1.</p>
      <p>The first pre-processing task was handling missing data. Missing data in the training set was imputed by simple linear interpolation. For the test set, by contrast, linear extrapolation was employed. This was to ensure the model is not contaminated by observing future data in its pre-processing stage.</p>
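As a sketch of this imputation step (assuming the CGM readings sit in a pandas Series on a regular 5-minute grid; the function and variable names are illustrative, not the paper's code):

```python
import numpy as np
import pandas as pd

def impute_train(cgm: pd.Series) -> pd.Series:
    """Fill gaps in the training series by linear interpolation
    (uses readings on both sides of a gap)."""
    return cgm.interpolate(method="linear")

def impute_test(cgm: pd.Series) -> pd.Series:
    """Fill gaps in the test series by linear extrapolation from the
    two most recent observed readings, so no future data is used."""
    values = cgm.to_numpy(dtype=float).copy()
    for i in range(len(values)):
        if np.isnan(values[i]):
            past = np.flatnonzero(~np.isnan(values[:i]))
            if len(past) >= 2:            # extend the most recent trend
                i1, i2 = past[-2], past[-1]
                slope = (values[i2] - values[i1]) / (i2 - i1)
                values[i] = values[i2] + slope * (i - i2)
            elif len(past) == 1:          # no trend yet: carry forward
                values[i] = values[past[-1]]
    return pd.Series(values, index=cgm.index)
```

The asymmetry is the point: interpolation may peek at the reading after a gap, which is acceptable for training but would leak future information at test time.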
      <p>The next pre-processing step was transforming the time series forecasting problem into a supervised learning task. To this end, a rolling window comprising lag data and future data was used as the explanatory and dependent variables, respectively. For example, to forecast the BGL 30 minutes ahead using a history of 60 minutes, a window of length 18 was used. Given the 5-minute interval between data points, the first 12 data points in the window were explanatory variables, and the remaining 6 were dependent variables.</p>
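The windowing above can be sketched as follows (illustrative code; `lag` and `horizon` are counts of 5-minute steps, e.g. 12 and 6 for a 60-minute history and a 30-minute horizon):

```python
import numpy as np

def make_supervised(series: np.ndarray, lag: int, horizon: int):
    """Slide a window of length lag + horizon over the series.
    The first `lag` points of each window are the explanatory
    variables; the remaining `horizon` points are the dependent
    variables, the last of which is the horizon of interest."""
    n = len(series) - lag - horizon + 1
    X = np.stack([series[i:i + lag] for i in range(n)])
    y = np.stack([series[i + lag:i + lag + horizon] for i in range(n)])
    return X, y

# 60-minute history (12 points) predicting 30 minutes ahead (6 points):
X, y = make_supervised(np.arange(100.0, 200.0), lag=12, horizon=6)
```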
    </sec>
    <sec id="sec-3">
      <title>Prediction methods</title>
      <p>First, six basic predictive models were created by means of three conventional regression tools. Subsequently, employing stacking, three more advanced predictive systems were developed in which a collection of the basic models served as base-learners and a partial least squares regression as meta-learner. All proposed models/systems were personalised to individuals.</p>
      <sec id="sec-3-1">
        <title>Basic models</title>
        <p>Initially, for each prediction horizon of 30 and 60 minutes, the following three conventional regression tools were employed to generate six basic predictive models, two per tool. For this purpose, each tool was trained once on a history of 30 minutes and once on a history of 60 minutes.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Partial least squares regression (PLSR)</title>
        <p>
          PLSR, as a basic linear regression technique, is popular in different applications due to its ease of use and minimal computation time. In a previous work, we applied PLSR for glucose quantification, which provided promising results
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          In this work, PLSR was used as one of the regression tools. For the number of components, values ranging from 1 to the length of the input variable were tried. Each time, the predicted residual sum of squares (PRESS) was calculated as follows. The number of components (A) resulting in the minimum value of PRESS/(N - A - 1) was then selected [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>PRESS = &#x2211;<sub>i=1</sub><sup>N</sup> (y<sub>i</sub> &#x2212; &#x0177;<sub>i</sub>)<sup>2</sup>&#x2003;(1)</p>
        <p>where N is the size of the evaluation set, y<sub>i</sub> is the reference value, and &#x0177;<sub>i</sub> is the predicted value.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Multilayer perceptron (MLP)</title>
        <p>
          An MLP [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] with an architecture of one hidden layer including
100 nodes and an output layer was implemented. ReLU was used
as the activation function for the hidden layer, Adam as the optimiser, and mean absolute error as the loss function. The learning rate was 0.01, and the training process was based on 100 epochs.
        </p>
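As a sketch, scikit-learn's MLPRegressor approximates this configuration (an approximation only: it optimises squared error rather than the mean absolute error loss stated above, and the synthetic data is purely illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# One hidden layer of 100 ReLU nodes, Adam optimiser,
# learning rate 0.01, 100 training epochs.
mlp = MLPRegressor(
    hidden_layer_sizes=(100,),
    activation="relu",
    solver="adam",
    learning_rate_init=0.01,
    max_iter=100,
)

# Illustrative fit: 12 lag points per sample, one target each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = X.sum(axis=1)
mlp.fit(X, y)
```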
      </sec>
      <sec id="sec-3-4">
        <title>Long short-term memory (LSTM)</title>
        <p>
          We used a vanilla LSTM [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] composed of a single hidden LSTM
layer with 200 nodes, a fully connected layer with 100 nodes, and
an output layer. ReLU was the activation function for both hidden
layers, mean squared error was the loss function, and Adam was
the optimizer. The model was trained for 100 epochs with a learning rate of 0.01.
        </p>
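The described architecture translates roughly to the following Keras sketch (the input shape of 12 lag points and the 6-step output, one per 5-minute interval of a 30-minute horizon, are illustrative assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Single LSTM hidden layer (200 nodes), a 100-node fully connected
# layer, and an output layer; ReLU activations, MSE loss, Adam.
model = models.Sequential([
    layers.Input(shape=(12, 1)),   # 12 lag points, 1 feature (CGM)
    layers.LSTM(200, activation="relu"),
    layers.Dense(100, activation="relu"),
    layers.Dense(6),               # one output per step of the horizon
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss="mse",
)
# Training would then be: model.fit(X_train, y_train, epochs=100)
```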
      </sec>
      <sec id="sec-3-5">
        <title>Stacking systems</title>
        <p>
          Ensemble learning is a machine learning technique that combines
decisions from several models to create a new model. Stacking
(Figure 1) is an ensemble approach that uses predictions from multiple
base-learners (first level models) as features to train a meta-learner
(second level model). This meta-learner then makes the final
predictions on the test set [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
        </p>
        <p>In this paper, for each prediction horizon of 30 and 60 minutes, three stacking systems, comprising two uni-lag and one multi-lag, were developed.</p>
      </sec>
      <sec id="sec-3-6">
        <title>System 1</title>
        <p>The three basic models trained on a history of 30 minutes were
the base-learners of this uni-lag system and a PLSR was its
meta-learner.</p>
      </sec>
      <sec id="sec-3-7">
        <title>System 2</title>
        <p>This system was also uni-lag. It was similar to system 1, except it
used the three basic models trained on a history of 60 minutes in
place of 30 minutes as base-learners.</p>
      </sec>
      <sec id="sec-3-8">
        <title>System 3</title>
        <p>In this multi-lag system, all six basic models were considered as the base-learners, and again a PLSR was the meta-learner. The idea behind the multi-lag approach was to help capture a broader frequency range of BGL dynamics.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>The test set was held out, and the training set was used to create the predictive models/systems. The developed models/systems were then utilised to predict the test data. The set of evaluation points starts 60 minutes after the beginning of the test set; otherwise, the first evaluation points would overlap with the training data, which could affect the reliability of the results. Hence, the number of evaluated points for each patient is 12 less than the number of test examples mentioned in Table 1. Root mean square error (RMSE) and mean absolute error (MAE) were calculated as follows and used as evaluation metrics.</p>
      <p>RMSE = &#x221A;(&#x2211;<sub>i=1</sub><sup>N</sup> (y<sub>i</sub> &#x2212; &#x0177;<sub>i</sub>)<sup>2</sup> / N)&#x2003;(2)</p>
      <p>MAE = &#x2211;<sub>i=1</sub><sup>N</sup> |y<sub>i</sub> &#x2212; &#x0177;<sub>i</sub>| / N&#x2003;(3)</p>
      <p>where N, y<sub>i</sub>, and &#x0177;<sub>i</sub> carry the same definitions as in (1).</p>
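Equations (2) and (3) correspond directly to:

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error, eq. (2)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error, eq. (3)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))
```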
    </sec>
    <sec id="sec-5">
      <title>RESULTS AND DISCUSSION</title>
      <p>This section presents the evaluation results for both the basic models and the stacking systems. Models/systems whose performance depended on random initialisation were run five times, and the corresponding results are reported as mean and standard deviation. Extrapolated points were excluded when calculating the evaluation metrics. All models were built to predict future BGLs up to the end of the intended prediction horizon, but only the evaluation results for the horizon of interest are reported.</p>
    </sec>
    <sec id="sec-6">
      <title>Prediction horizon of 30 minutes</title>
      <sec id="sec-6-1">
        <title>Basic models</title>
        <p>The results of the RMSE and MAE of the basic predictive models for
the prediction horizon of 30 minutes are displayed in Table 2.</p>
        <p>Based on the average of RMSE and MAE for all patients, LSTM
trained on a history of 30 minutes showed the best performance
among the basic models. PLSR with a 60-minute lag was the second-best model. All models had satisfactory standard deviations.</p>
        <p>LSTM yielded the best overall predictive accuracy among the three regression tools, though the results of the other two were comparable. It is worth remarking that PLSR, as a linear regression tool, was able to generate results comparable to those of LSTM and even better than those of MLP.</p>
        <p>Among all patients, patient 552 had the best overall evaluation
results. The worst results, on the other hand, belonged to patients
584 and 540.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Prediction horizon of 60 minutes</title>
      <sec id="sec-7-1">
        <title>Stacking systems</title>
        <p>Evaluation results of the stacking systems for a prediction horizon
of 60 minutes are displayed in Table 5. System 3 proposed the best
overall predictions based on average RMSE and MAE values. The
best result among all patients belonged to patient 596. All systems
had low values of standard deviation.
</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>CONCLUSION</title>
      <p>BGL prediction improved using stacking learning concepts. Initially,
a time series problem was translated into a supervised learning task.
Three conventional regression tools were each trained on two history lengths, 30 and 60 minutes, resulting in six basic
predictive models. Predictions from the basic models trained with a history
of 30 minutes were fed as features to a regression to build a
combined learner. The learner was then used to make final predictions on
the test set. The same scenario was repeated using the basic models
trained on 60-minute lag observations. In both cases, the combined
learner was able to make more accurate predictions on the test set.
The overall performance further improved when predictions from all
basic models—trained on both histories of 30 and 60 minutes—were
considered as features to train a new learner.
For data analysis, we used Python 3.6, TensorFlow 1.15.0, and Keras 2.2.5, along with the Pandas, NumPy, and scikit-learn packages. The code is available at: https://gitlab.com/Heydar-Khadem/multi-lag-stacking.git</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Florencia</given-names>
            <surname>Aguiree</surname>
          </string-name>
          , Alex Brown, Nam Ho Cho, Gisela Dahlquist, Sheree Dodd, Trisha Dunning, Michael Hirst, Christopher Hwang, Dianna Magliano,
          <string-name>
            <given-names>Chris</given-names>
            <surname>Patterson</surname>
          </string-name>
          , et al.,
          <source>'Idf diabetes atlas'</source>
          , (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ramzi</given-names>
            <surname>Ajjan</surname>
          </string-name>
          , David Slattery, and Eugene Wright, '
          <article-title>Continuous glucose monitoring: A brief review for primary care practitioners'</article-title>
          , Advances in therapy,
          <volume>36</volume>
          (
          <issue>3</issue>
          ),
          <fpage>579</fpage>
          -
          <lpage>596</lpage>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Muhammad</given-names>
            <surname>Asad</surname>
          </string-name>
          and Usman Qamar, '
          <article-title>A review of continuous blood glucose monitoring and prediction of blood glucose level for diabetes type 1 patient in different prediction horizons (ph) using artificial neural network (ann)'</article-title>
          ,
          <source>in Proceedings of SAI Intelligent Systems Conference</source>
          , pp.
          <fpage>684</fpage>
          -
          <lpage>695</lpage>
          . Springer, (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Arthur</given-names>
            <surname>Bertachi</surname>
          </string-name>
          , Lyvia Biagi, Iván Contreras, Ningsu Luo, and Josep Vehí, '
          <article-title>Prediction of blood glucose levels and nocturnal hypoglycemia using physiological models and artificial neural networks</article-title>
          .',
          <string-name>
            <surname>in</surname>
            <given-names>KHD</given-names>
          </string-name>
          @ IJCAI, pp.
          <fpage>85</fpage>
          -
          <lpage>90</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Julio</given-names>
            <surname>Borges</surname>
          </string-name>
          ,
          <source>The Power of Ensembles in Deep Learning</source>
          ,
          <year>2019</year>
          . https://towardsdatascience.com/
          <article-title>the-power-of-ensembles-in-deeplearning-a8900ff42be9.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Danielle</given-names>
            <surname>Bruen</surname>
          </string-name>
          , Colm Delaney, Larisa Florea, and Dermot Diamond, '
          <article-title>Glucose sensing for diabetes monitoring: recent developments'</article-title>
          ,
          <source>Sensors</source>
          ,
          <volume>17</volume>
          (
          <issue>8</issue>
          ),
          <year>1866</year>
          , (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Razvan</given-names>
            <surname>Bunescu</surname>
          </string-name>
          , Nigel Struble, Cindy Marling, Jay Shubrook, and Frank Schwartz, '
          <article-title>Blood glucose level prediction using physiological models and support vector regression'</article-title>
          ,
          <source>in 2013 12th International Conference on Machine Learning and Applications</source>
          , volume
          <volume>1</volume>
          , pp.
          <fpage>135</fpage>
          -
          <lpage>140</lpage>
          . IEEE, (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Mol</given-names>
            <surname>Ecol</surname>
          </string-name>
          , 'HHS Public Access',
          <volume>25</volume>
          (
          <issue>5</issue>
          ),
          <fpage>1032</fpage>
          -
          <lpage>1057</lpage>
          , (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Eleni</surname>
            <given-names>I Georga</given-names>
          </string-name>
          , Vasilios C Protopappas, Diego Ardigò, Demosthenes Polyzos, and
          <string-name>
            <surname>Dimitrios</surname>
            <given-names>I Fotiadis,</given-names>
          </string-name>
          '
          <article-title>A glucose model based on support vector regression for the prediction of hypoglycemic events under free-living conditions'</article-title>
          ,
          <source>Diabetes technology &amp; therapeutics, 15(8)</source>
          ,
          <fpage>634</fpage>
          -
          <lpage>643</lpage>
          , (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <article-title>Jürgen Schmidhuber, 'Long short-term memory'</article-title>
          ,
          <source>Neural computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          , (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>George</surname>
            <given-names>S Jeha</given-names>
          </string-name>
          , Lefkothea P Karaviti,
          <string-name>
            <surname>Barbara Anderson</surname>
          </string-name>
          ,
          <string-name>
            <surname>EO'Brian Smith</surname>
            ,
            <given-names>Susan</given-names>
          </string-name>
          <string-name>
            <surname>Donaldson</surname>
          </string-name>
          ,
          <string-name>
            <surname>Toniean S McGirk</surname>
          </string-name>
          , and Morey W Haymond, '
          <article-title>Continuous glucose monitoring and the reality of metabolic control in preschool children with type 1 diabetes'</article-title>
          ,
          <source>Diabetes Care</source>
          ,
          <volume>27</volume>
          (
          <issue>12</issue>
          ),
          <fpage>2881</fpage>
          -
          <lpage>2886</lpage>
          , (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Heydar</surname>
            <given-names>Khadem</given-names>
          </string-name>
          , Mohammad R Eissa, Hoda Nemat, Osamah Alrezj, and Mohammed Benaissa, '
          <article-title>Classification before regression for improving the accuracy of glucose quantification using absorption spectroscopy'</article-title>
          ,
          <source>Talanta</source>
          ,
          <volume>211</volume>
          ,
          <fpage>120740</fpage>
          , (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Cindy</given-names>
            <surname>Marling</surname>
          </string-name>
          and Razvan Bunescu, '
          <article-title>The ohiot1dm dataset for blood glucose level prediction:</article-title>
          <source>Update</source>
          <year>2020</year>
          '.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Cindy</given-names>
            <surname>Marling and Razvan C Bunescu</surname>
          </string-name>
          , '
          <article-title>The OhioT1DM Dataset For Blood Glucose Level Prediction</article-title>
          .',
          <source>in 3rd International Workshop on Knowledge Discovery in Healthcare Data</source>
          , pp.
          <fpage>60</fpage>
          -
          <lpage>63</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>John</surname>
            <given-names>Martinsson</given-names>
          </string-name>
          , Alexander Schliep, Björn Eliasson, Christian Meijner, Simon Persson, and Olof Mogren, '
          <article-title>Automatic blood glucose prediction with confidence using recurrent neural networks'</article-title>
          ,
          <source>in 3rd International Workshop on Knowledge Discovery in Healthcare Data, KDH@ IJCAIECAI</source>
          <year>2018</year>
          ,
          <issue>13</issue>
          <year>July 2018</year>
          , pp.
          <fpage>64</fpage>
          -
          <lpage>68</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Cooper</surname>
            <given-names>Midroni</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peter J. Leimbigler</surname>
            , Gaurav Baruah, Maheedhar Kolla,
            <given-names>Alfred J.</given-names>
          </string-name>
          <string-name>
            <surname>Whitehead</surname>
          </string-name>
          , and Yan Fossat, '
          <article-title>Predicting glycemia in type 1 diabetes patients: Experiments with XGBoost'</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          ,
          <volume>2148</volume>
          ,
          <fpage>79</fpage>
          -
          <lpage>84</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Sadegh</surname>
            <given-names>Mirshekarian</given-names>
          </string-name>
          , Razvan Bunescu, Cindy Marling, and Frank Schwartz, '
          <article-title>Using lstms to learn physiological models of blood glucose behavior'</article-title>
          ,
          <source>in 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)</source>
          , pp.
          <fpage>2887</fpage>
          -
          <lpage>2891</lpage>
          . IEEE, (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Fionn</surname>
            <given-names>Murtagh</given-names>
          </string-name>
          , '
          <article-title>Multilayer perceptrons for classification and regression'</article-title>
          ,
          <source>Neurocomputing</source>
          ,
          <volume>2</volume>
          (
          <issue>5-6</issue>
          ),
          <fpage>183</fpage>
          -
          <lpage>197</lpage>
          , (
          <year>1991</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Shauna</surname>
            <given-names>S Roberts</given-names>
          </string-name>
          , '
          <article-title>Type 1 diabetes'</article-title>
          ,
          <source>Diabetes Forecast</source>
          ,
          <volume>55</volume>
          ,
          <fpage>19</fpage>
          , (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Svante</surname>
            <given-names>Wold</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Sjöström</surname>
          </string-name>
          , and Lennart Eriksson, '
          <article-title>Pls-regression: a basic tool of chemometrics', Chemometrics and intelligent laboratory systems</article-title>
          ,
          <volume>58</volume>
          (
          <issue>2</issue>
          ),
          <fpage>109</fpage>
          -
          <lpage>130</lpage>
          , (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Ashenafi</given-names>
            <surname>Zebene</surname>
          </string-name>
          <string-name>
            <surname>Woldaregay</surname>
          </string-name>
          , Eirik Årsand, Taxiarchis Botsis, David Albers,
          <string-name>
            <given-names>Lena</given-names>
            <surname>Mamykina</surname>
          </string-name>
          , and Gunnar Hartvigsen, '
          <article-title>Data-driven blood glucose pattern classification and anomalies detection: machinelearning applications in type 1 diabetes'</article-title>
          ,
          <source>Journal of medical Internet research</source>
          ,
          <volume>21</volume>
          (
          <issue>5</issue>
          ),
          <year>e11030</year>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Jinyu</given-names>
            <surname>Xie</surname>
          </string-name>
          and
          <string-name>
            <given-names>Qian</given-names>
            <surname>Wang</surname>
          </string-name>
          , '
          <article-title>Benchmark machine learning approaches with classical time series approaches on the blood glucose level prediction challenge</article-title>
          .',
          <string-name>
            <surname>in</surname>
            <given-names>KHD</given-names>
          </string-name>
          @ IJCAI, pp.
          <fpage>97</fpage>
          -
          <lpage>102</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Zhi-Hua</surname>
            <given-names>Zhou</given-names>
          </string-name>
          ,
          <article-title>Ensemble methods: foundations and algorithms</article-title>
          , CRC press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>