Comparison of Forecasting Algorithms for Type 1 Diabetic Glucose Prediction on 30 and 60-Minute Prediction Horizons

Richard McShinsky and Brandon Marshall
Brigham Young University, USA
email: richard.mcshinsky@byu.net, brandon.marshall@byu.net

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Control of blood glucose (BG) levels is essential for diabetes management, especially for long-term health improvement. Predicting both hypoglycemic events (BG < 70 mg/dl) and hyperglycemic events (BG > 180 mg/dl) is essential in helping diabetics control their long-term health. In this paper we forecast future blood glucose levels and analyze how effectively different models detect hypoglycemic and hyperglycemic events. We do so by comparing Auto-Regressive Integrated Moving-Average, Vector Auto-Regression, Kalman Filter, Unscented Kalman Filter, Ordinary Least Squares, Support Vector Machines, Random Forests, Gradient Boosted Trees, XGBoosted Trees, Adaptive Neuro-Fuzzy Inference System (ANFIS), and Multi-Layer Perceptron in terms of Root Mean Squared Error, Mean Absolute Error, Coefficient of Determination, Matthews Correlation Coefficient, and the Clarke Error Grid.

1 Introduction

Blood glucose prediction has been an ongoing challenge within the medical field due to the near-unpredictable variability of the many underlying factors influencing an individual's glucose levels. There has recently been a strong drive to create an artificial pancreas using artificial intelligence, which requires predicting future blood glucose levels as well as accurately predicting the onset of both hypoglycemic (BG < 70 mg/dl) and hyperglycemic (BG > 180 mg/dl) events [11].

Most predictive models for blood glucose build on a physiological profile that includes a person's insulin, meal absorption, and past blood glucose levels [13]. Machine learning methods that have been applied to this profile to predict future blood glucose levels include Auto-Regressive Integrated Moving-Average (ARIMA, see [3], [4], [13], and [15]), Support Vector Machines and Kernel Regression (SVM, see [3], [12], [13], and [15]), Random Forests (RF, see [8], [12], [13], and [15]), Gradient Boosted Trees (see [8] and [15]), and Artificial Neural Networks (see [1], [2], [4], [6], and [15]).

Comparing papers on the results, accuracy, and effectiveness of these models is nearly impossible because different data sets are used between them. This paper seeks to offer a comparison of as many models as possible on a single data set.

In this paper, we compare the effectiveness of several models, namely ARIMA, Vector Auto-Regression Moving-Average with Exogenous Regressor (VAR), Ordinary Least Squares (OLS), K-Nearest Neighbors (KNN), SVM, RF, Gradient Boosting, XGBoosting, Adaptive Neuro-Fuzzy Inference System (ANFIS), and Multi-Layer Perceptron. Additionally, we use both the Kalman Filter and the Unscented Kalman Filter (UKF) to predict future blood glucose values. The Unscented Kalman Filter was chosen over the Extended Kalman Filter due to its ability to use state-space models to predict nonlinear functions. In comparing the effectiveness of these models we use RMSE, MAE, the Matthews Correlation Coefficient (a commonly used metric for checking hypoglycemic and hyperglycemic events that roughly measures the quality of binary classifications) [4], and the Clarke Error Grid.

2 Data

2.1 OhioT1DM

The data used for this comparison was the OhioT1DM data set, obtained as part of the second Blood Glucose Level Prediction Challenge [5]. The data set contains eight weeks' worth of data for 12 people with type 1 diabetes. All contributors were on insulin pump therapy with continuous glucose monitoring (CGM). All pumps were one of two brands, all life-event data was reported via a custom smartphone app, and all physiological data was provided by a fitness band. The features provided in the data set are: Date, Glucose Level, Finger Stick, Basal (Insulin), Temporary Basal (Insulin), Bolus (Insulin), Meal (Carbohydrate Estimate), Sleep, Work, Stressors, Hypoglycemic Event, Illness, Exercise, Basis Heart Rate, Basis GSR, Basis Skin Temperature, Basis Air Temperature, Basis Steps, Basis Sleep, and Acceleration [5].

The train and test splits were given as part of the second Blood Glucose Level Prediction Challenge (see [5] for more details).
2.2 Preprocessing

The glucose readings arrive in roughly 5-minute increments, while the fitness-band readings arrive every minute and the readings reported by the patient arrive at arbitrary times not aligned with the glucose readings. To combine them into one data frame for predicting glucose, the most important predictor, glucose level, was made the main index. All other values were merged to the closest glucose value within the previous 4 minutes; values outside this tolerance were dropped from the data frame.

Most of the dropped values were due to missing data. There are many gaps where the meter was not recording glucose values: the time between taking the meter off and putting it back on, the hour or more it takes to set the meter up, or a day when the user simply did not wear it. Leaving these gaps often produced large jumps in the training and testing data, and these discontinuities would be a problem when training the models. We could not fill them with interpolation methods, since we cannot look at the future while predicting these values. Instead, we extrapolated values for these gaps using a widening moving average. For the first extrapolated missing value we used the mean of the previous 2 values; for the second, the mean of the previous 4 values; for the tenth, the mean of the previous 20 values, including the ten we had just extrapolated. This continued in five-minute increments until we reached the next actual value in the data frame. The last extrapolated value was then dropped, and the data frame continued as normal until a gap of more than 6 minutes between values was detected, at which point the rolling average again extrapolated the missing values.

The rolling average eventually converges to the average of all the data, but preserves the character of the recent data. For example, if the person had high blood glucose levels that day, the filled data stays high but drifts toward the person's mean when several days are filled across a large gap. This is reasonable because after a few hours, guessing where the person's data will resume is nearly random, and since the actual glucose values are approximately normally distributed, it is better to guess toward the mean of the glucose levels. At the same time, the discontinuities are reduced by maintaining the local rolling mean. In practice, many of the extrapolations ended very close to where the data resumes after the discontinuity.
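The widening-average gap filling described above can be made concrete with a short sketch. This is a minimal illustration, not the exact code from our notebook; the function name `fill_gaps` and the assumption of a `DatetimeIndex` on the series are ours.

```python
import numpy as np
import pandas as pd

def fill_gaps(glucose: pd.Series, step: str = "5min") -> pd.Series:
    """Extrapolate CGM gaps with a widening moving average.

    `glucose` must have a DatetimeIndex. For the k-th consecutive missing
    5-minute step, the fill is the mean of the previous 2*k values
    (previously filled values included), so short gaps track recent data
    while long gaps relax toward the local mean.
    """
    s = glucose.resample(step).mean()     # put readings on a regular grid
    values = s.to_numpy()
    k = 0                                 # position within the current gap
    for i in range(len(values)):
        if np.isnan(values[i]):
            k += 1
            window = values[max(0, i - 2 * k):i]
            if window.size:               # leave any leading NaNs untouched
                values[i] = np.nanmean(window)
        else:
            k = 0
    return pd.Series(values, index=s.index)
```

Applied per patient, this reproduces the behavior above: the first missing step averages the previous 2 values, the tenth averages the previous 20 (including already-filled values), and long gaps drift toward the local mean.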
3 Methods

We compare many methods used for classical and regressive time series analysis. Even though some methods are known not to perform well with blood glucose levels for this type of problem, they give a baseline against which to compare each successive method. In addition to the classical models, we used models described in other papers on glucose prediction, both for comparison and for potentially better parameter choices. Further, we chose some methods, such as VAR and ANFIS, in order to compare methods not seen in the research we found. The following subsections explain why specific methods, parameters, and architectures were chosen.

3.1 Classical Methods

3.1.1 ARIMA

Even though ARIMA itself is a linear combination of a trend component, a seasonal component, and a residual component, we chose this model because of its classical use within time series analysis. Additionally, ARIMA allows us to choose the orders p and q of the AR and MA parts of the model. These hyperparameters were chosen using the order-selection routine in statsmodels (arma_order_select_ic), from which we found that p = 2 and q = 2 gave the lowest error. The data is nearly stationary to start, so a differencing order of 0 was used (larger orders resulted in worse error). The only data features used were the previous p blood glucose levels and the q corresponding error terms.
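A minimal sketch of this order selection and fit follows, using the statsmodels API; the variable name `train_glucose` (a single patient's training series) and the search bounds are our illustrative choices.

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import arma_order_select_ic

# Search AR/MA orders by information criterion; the search described
# above settled on p = 2, q = 2.
sel = arma_order_select_ic(train_glucose, max_ar=4, max_ma=4, ic="bic")
p, q = sel.bic_min_order

# d = 0: the series is treated as (nearly) stationary, so no differencing.
result = ARIMA(train_glucose, order=(p, 0, q)).fit()
forecast = result.forecast(steps=6)   # six 5-minute steps = 30-minute horizon
```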
3.1.2 VAR

VAR is a vectored version of an AR model. This allows more types of inputs to influence the prediction, rather than just the previous p blood glucose values. VAR used the same parameters as the ARIMA model described above.

3.1.3 Unscented Kalman Filter (UKF)

While the standard Kalman Filter works well for linear systems, blood glucose levels are nonlinear in nature. The Kalman Filter can be thought of as propagating a Gaussian Random Variable (GRV) through a linear system [14]. In the nonlinear case, the Extended Kalman Filter (EKF) produces only approximations to $x_k$, $y_k$, and $K_k$ (the state, observation, and gain of the system) [14]; in other words, the EKF propagates a GRV through a first-order linearization of the nonlinear system [14].

The Unscented Kalman Filter also propagates a GRV, but does so through a minimal set of carefully chosen sample points [14]. The unscented transformation is applied to these sample points, which are then propagated through the nonlinear system. Doing so yields approximations that are accurate to the third order of a Taylor series expansion [14].

To summarize, the Unscented Kalman Filter selects carefully chosen points, applies the unscented transformation to them, and then performs the time update and measurement update as is standard in the Kalman Filter [14].
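For concreteness, the sketch below sets up a UKF with the filterpy library. The local-linear-trend state model (glucose plus trend), the noise magnitudes, and the stream name `cgm_readings` are all illustrative assumptions on our part, not the state-space model used in the paper.

```python
import numpy as np
from filterpy.kalman import MerweScaledSigmaPoints, UnscentedKalmanFilter

dt = 5.0  # minutes between CGM readings

def fx(x, dt):
    # State transition: state is [glucose, trend]; glucose drifts by its trend.
    return np.array([x[0] + dt * x[1], x[1]])

def hx(x):
    # Measurement function: only the glucose level is observed.
    return np.array([x[0]])

points = MerweScaledSigmaPoints(n=2, alpha=0.1, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=2, dim_z=1, dt=dt, fx=fx, hx=hx, points=points)
ukf.x = np.array([150.0, 0.0])   # initial glucose (mg/dl) and trend
ukf.P *= 10.0                    # inflate initial state uncertainty
ukf.R = np.array([[25.0]])       # CGM measurement noise (illustrative)
ukf.Q = np.eye(2) * 0.1          # process noise (illustrative)

for z in cgm_readings:           # `cgm_readings`: stream of glucose values
    ukf.predict()
    ukf.update(np.array([z]))
```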
3.2 Regression and Ensemble Methods

Since the OhioT1DM data set is a time series, standard regression methods are not immediately applicable for forecasting. However, we can turn the task into a regression problem by redefining how the data is presented. Instead of each row representing a single time step of the nineteen features, we redefine each row to contain the last six time steps of data (the last 30 minutes of known information). Each row of the reformatted data set thus contains the last six known time steps, with the labels being the future blood glucose values we wish to predict at that time step. Each label consists of the next six or twelve blood glucose values following the current time step, for the 30-minute and 60-minute prediction horizons respectively. In summary, each time step is reformatted into a 6x19 feature space with each label having 6 or 12 values (see the sketch below). With the data reformatted, the following algorithms can be run.
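The sliding-window reformatting can be expressed in a few lines; this is a minimal sketch, with `make_windows` as a hypothetical helper name and arrays assumed to be already aligned on the 5-minute glucose grid.

```python
import numpy as np

def make_windows(features: np.ndarray, glucose: np.ndarray,
                 n_lags: int = 6, horizon: int = 6):
    """Turn an aligned time series into a supervised regression problem.

    features: (T, 19) array of 5-minute rows; glucose: (T,) glucose levels.
    Each sample's input is the flattened last `n_lags` rows (30 minutes);
    its label is the next `horizon` glucose values
    (horizon=6 -> 30 min, horizon=12 -> 60 min).
    """
    X, y = [], []
    for t in range(n_lags, len(features) - horizon + 1):
        X.append(features[t - n_lags:t].ravel())   # 6x19 window -> 114 values
        y.append(glucose[t:t + horizon])
    return np.asarray(X), np.asarray(y)
```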
3.2.1 Ordinary Least Squares

While the data is nonlinear in nature, within a sufficiently small subset of the data (that is, over a sufficiently small time interval) the data may be quasi-linear. As with ODEs, where one can often linearize a nonlinear system locally, we attempt to fit affine functions over a sufficiently small time domain. Ordinary Least Squares (OLS) does exactly this: it fits an affine function (with a constant and an error term) to the data set. In addition to regular OLS, we also run OLS with regularization terms, namely Lasso (L1 regularization), Ridge (L2 regularization), and Elastic Net (L1 and L2 regularization), all with α = 1 for the regularization terms. Lasso regularization gives us the added advantage of feature reduction, allowing us to analyze which lags are most important in determining future blood glucose levels.

3.2.2 Support Vector Machines

We believe Support Vector Machine (SVM) regression may be useful because the kernel can be changed, allowing us to alter our definition of distance with regard to the data. SVM regression fits a hyperplane to the data with an ε-margin; the points that fall on or outside this ε-margin are the support vectors that define the hyperplane used in the regression. Notions of distance to this hyperplane are defined using a kernel. We use an RBF kernel (with a scaling γ value) and a polynomial kernel (with a scaling γ value, a constant term of 0, and a degree of 3) in our regressions. Each SVM had an ε-margin of 0.1. The results for the two SVMs are reported under RBF and Poly respectively.

3.2.3 K-Nearest Neighbors

Since previous patterns in the lags of blood glucose (and other features) may be similar to the current pattern in the lags of features, we believe KNN regression may also be a useful method. KNN uses a voting scheme to form the regression: using a chosen distance metric, it finds the K closest neighbors of the given data point and returns the average of their labels. We use five neighbors and Euclidean distance. The results for this algorithm are reported under KNN.

3.2.4 Random Forest Regression

Random Forest Regression is an ensemble method that combines weak decision-tree regressors to form a strong group regressor, allowing us to create a regressor that branches on the features. It is included here due to its use in other papers on blood glucose prediction (see [8], [12], [13], and [15]). To keep the run time reasonable, a max depth of four was imposed on each forest.

3.2.5 Gradient Boosting

Gradient Boosting is another ensemble method that combines weak decision-tree regressors into a strong group regressor, but it instead optimizes the gradient of the loss function for each successive regressor. As this can perform well with the correct hyperparameters, we include it to see if it can outperform any of the aforementioned algorithms. In addition to regular Gradient Boosted Trees, we also use an optimized version of the algorithm known as Extreme Gradient Boosted Trees (XGB). For Gradient Boosting, a least-squares loss function, a learning rate of 0.1, and 100 estimators were used. For XGB, a grid search was performed to find the optimal hyperparameters. The results for these algorithms are reported under Grad and XGB respectively.

3.3 Neural Networks

Much work has already been done implementing neural networks in many different forms, including CNN, CRNN, DCNN, LSTM, Jump Neural Networks, and Echo State Networks (see [1], [2], [3], [4], [6], [8], and [15]). Much of this work came from the Blood Glucose Level Prediction Challenge (BGLP) in 2018 using the OhioT1DM data set.

3.3.1 ANFIS

ANFIS is a neural network that incorporates fuzzy logic principles. Fuzzy logic is about partial truths: where most neural networks make hard true/false selections, fuzzy logic models uncertainty. Examples include what one considers warm/cold, fast/medium/slow, or high/low. Rather than picking one category outright, a draw from a distribution gives a weighted, random character to the choices. ANFIS is designed to approximate nonlinear functions such as glucose values, and was chosen due to the extremely accurate predictions on chaotic systems reported in [9].

3.3.2 Multi-Layer Perceptron (MLP)

The Multi-Layer Perceptron (MLP) is a fully-connected, feed-forward neural network. It can often capture higher-order terms without those terms having to be engineered by hand, which reduces feature engineering of the data. Our MLP consists of three hidden layers, each with 100 nodes and ReLU activation functions. The output layer for the regression is simply the output of the last affine function. Results are reported under MLP.
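The sketch below shows one way the regression suite of Sections 3.2-3.3 could be assembled with scikit-learn, using the hyperparameters stated above. `X_train`/`y_train` come from the windowing sketch in Section 3.2, and the wrapper choices (e.g. MultiOutputRegressor around the single-output SVR and gradient boosting estimators) are our assumptions, not necessarily the exact setup in our notebook.

```python
from sklearn.multioutput import MultiOutputRegressor
from sklearn.linear_model import Lasso
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

# One entry per method; SVR and GradientBoosting predict a single target,
# so they are wrapped to emit all 6 (or 12) future glucose values.
models = {
    "Lasso": Lasso(alpha=1.0),                    # multi-output natively
    "RBF":   MultiOutputRegressor(SVR(kernel="rbf", epsilon=0.1)),
    "Poly":  MultiOutputRegressor(
                 SVR(kernel="poly", degree=3, coef0=0.0, epsilon=0.1)),
    "KNN":   KNeighborsRegressor(n_neighbors=5),  # Euclidean by default
    "RF":    RandomForestRegressor(max_depth=4),
    "Grad":  MultiOutputRegressor(
                 GradientBoostingRegressor(loss="squared_error",
                                           learning_rate=0.1,
                                           n_estimators=100)),
    "MLP":   MLPRegressor(hidden_layer_sizes=(100, 100, 100),
                          activation="relu"),
}

for name, model in models.items():
    model.fit(X_train, y_train)       # X, y from the windowing step above
    preds = model.predict(X_test)
```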
4 Metrics

The following metrics were used when evaluating the efficiency and accuracy of the algorithms.

4.1 Root Mean Square Error

The root mean square error is defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2},$$
where $\hat{y}_i$ is the predicted value and $y_i$ is the actual value. RMSE has an easily defined gradient and is easy to interpret: taking the square root of the squared errors returns the error to the original function space, so the RMSE value is in the same units as our label. This is the first metric used in evaluating the accuracy of the regression models.

4.2 Mean Absolute Error

The mean absolute error is defined as
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|.$$
This error function is easy to define, is fairly robust to outliers, and is in the same units as our label. However, its gradient is not always easy to define (and may not exist). This is the second metric used in evaluating the accuracy of the regression models.

4.3 Coefficient of Determination

The coefficient of determination is defined as
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\epsilon_i^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2},$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, $\epsilon_i = y_i - \hat{y}_i$ is the $i$th residual, and $\bar{y}$ is the sample mean. The coefficient of determination measures how much of the variance is explained by the model. Values near 1 indicate nearly all variance is explained by the model, while values near 0 indicate the variance may be caused by other factors. Negative values are possible, and in this paper indicate poor performance by the model.

4.4 Matthews Correlation Coefficient

The Matthews Correlation Coefficient (MCC) is defined as
$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}},$$
where TP, FP, FN, and TN are the true positive, false positive, false negative, and true negative counts respectively [4]. This metric gives a general idea of how well an algorithm predicts glycemic events. Values near 1 show the predictions correlate with the actual glycemic events; values near 0 indicate the algorithm does no better than random guessing; values near -1 indicate negative correlation (the predictions correlate with the opposite of the glycemic event). This metric is commonly used in articles on blood glucose prediction (see [4] for one such example), and so is used here.

4.5 Clarke Error Grid

The Clarke Error Grid plots the actual blood glucose values against the predicted blood glucose values and indicates the clinical consequences of acting on a given prediction. The grid is split into five zones, A through E. Predictions in Zones A and B are generally considered safe and would not result in any negative effects on the patient. Predictions in Zone C would result in unnecessary treatment. Predictions in Zone D indicate a potentially dangerous failure to detect a glycemic event. Predictions in Zone E would confuse treatment of hypoglycemia for hyperglycemia and vice versa (see [1]); points in Zone E are considered extremely dangerous, as treatment based on these results could lead to the patient's death. For this paper, in addition to MCC we use the percentage of points within each zone to evaluate the accuracy of a model's predictions.
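These metrics take only a few lines of Python. The sketch below assumes flattened prediction/label arrays; the event thresholds follow the 70/180 mg/dl definitions above, and `matthews_corrcoef` from scikit-learn is our tooling choice rather than something prescribed by the paper.

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true))

def r2(y_true, y_pred):
    resid = np.sum((y_true - y_pred) ** 2)
    total = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - resid / total

def event_mcc(y_true, y_pred, threshold=70.0, below=True):
    """MCC for glycemic events. below=True flags hypoglycemia (< 70 mg/dl);
    threshold=180.0 with below=False flags hyperglycemia (> 180 mg/dl)."""
    if below:
        t, p = y_true < threshold, y_pred < threshold
    else:
        t, p = y_true > threshold, y_pred > threshold
    return matthews_corrcoef(t.ravel(), p.ravel())
```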
5 Results

Tables 1 and 2 report the metric scores averaged over the 6 test patients. The metrics (RMSE, MAE, the MCCs for hypoglycemic and hyperglycemic events, and R2) are described above; the method abbreviations are defined in the Methods section.

Table 1. Metric Averages for 30-minute Prediction Horizon

Method   RMSE   MAE    MCC (hypo)  MCC (hyper)  R2
OLS      20.53  14.14  0.34        0.79         0.86
Lasso    20.58  14.22  0.32        0.79         0.85
Ridge    20.52  14.13  0.35        0.79         0.86
Elastic  20.56  14.20  0.31        0.79         0.86
RBF      24.89  16.96  0.14        0.74         0.79
Poly     31.73  22.51  -0.00       0.70         0.66
KNN      24.57  17.07  0.30        0.73         0.79
RF       23.00  16.27  0.16        0.76         0.82
Grad     21.37  14.87  0.17        0.78         0.84
XGB      24.62  17.29  0.34        0.74         0.79
Kalman   24.08  24.08  0.40        0.74         0.78
UKF      29.88  20.65  0.30        0.67         0.69
ARIMA    23.73  16.68  0.12        0.75         0.81
VAR      25.25  17.05  0.36        0.74         0.79
ANFIS    24.56  16.52  0.26        0.76         0.80
MLP      20.85  14.30  0.30        0.78         0.85

Table 2. Metric Averages for 60-minute Prediction Horizon

Method   RMSE   MAE    MCC (hypo)  MCC (hyper)  R2
OLS      33.42  24.65  0.02        0.61         0.62
Lasso    33.41  24.67  0.02        0.61         0.62
Ridge    33.41  24.65  0.02        0.61         0.62
Elastic  33.40  24.67  0.02        0.61         0.62
RBF      36.76  26.53  -0.00       0.55         0.54
Poly     39.16  29.31  -0.00       0.53         0.48
KNN      38.11  28.01  0.15        0.53         0.50
RF       35.20  26.08  0.09        0.58         0.58
Grad     33.98  24.96  0.08        0.58         0.61
XGB      39.78  26.97  0.15        0.53         0.46
Kalman   22.77  15.28  0.41        0.75         0.81
UKF      29.78  20.65  0.30        0.66         0.69
ARIMA    36.39  26.93  0.01        0.56         0.54
VAR      35.06  19.56  0.16        0.70         0.54
ANFIS    36.87  26.53  0.12        0.59         0.56
MLP      35.59  25.81  0.06        0.59         0.57

6 Analysis

To analyze the accuracy of these predictions, we first examine the RMSE and MAE for both the 30-minute and 60-minute prediction horizons (Tables 1 and 2). As a general guideline, we first identify which model we believe performs best across the patients, and then discuss general trends we noticed while analyzing the data.

6.1 30-Minute Prediction

In terms of the metrics defined above, OLS, Lasso, Ridge, and Elastic Net regression perform nearly identically. Since the differences between them are minimal, we consider Lasso the best model for the 30-minute blood glucose predictions: Lasso regression offers a natural form of feature selection, which allows us to analyze which lags are most important for predicting future blood glucose levels. A further analysis of feature relevancy can be found in Section 6.4.

Even though we have identified Lasso regression as the best-performing algorithm among those tested for the 30-minute prediction horizon, this means little if this "best" algorithm still yields subpar results. We therefore analyze Lasso regression in terms of both MCC and the Clarke Error Grid to determine whether its results are sufficiently adequate for blood glucose prediction. To see general trends, we examine the actual and predicted values across time for patients 540 and 584.

Consider the Clarke Error Grids for patients 540 and 584 on the 30-minute prediction horizon (Figures 2 and 3). The closer the points fall to the bottom-left to top-right diagonal, the better the predictions. Visual inspection of these plots raises no immediate concerns; most values fall within Zones A, B, and C. The zone percentages (Table 3) show that Lasso places 96% of predictions in Zones A-B for patient 540 and about 99% for patient 584. The concern is that the remainder of the predictions fall within Zones D-E, indicating predictions that could result in dangerous care if acted on. Considering the high accuracy for each patient, though, we consider these results sufficiently accurate for the 30-minute prediction horizon.

Table 3. Clarke Error Grid percentages (share of Lasso predictions per zone)

                 30 min          60 min
Zone         540      584     540      584
Zones A-B    0.96     0.99    0.935    0.97
Zone C       0.00     0.00    0.001    0.01
Zones D-E    0.04     0.01    0.064    0.02

The MCC for Lasso regression on the 30-minute horizon tends to be about twice as high for hyperglycemic events as for hypoglycemic events. Given that the data has many more values in the hyperglycemic range than in the hypoglycemic range, this reflects the class imbalance more than the algorithm; all the algorithms show this trend. The bias is also visible in the predictions themselves: valleys in the predictions do not reach as low as the valleys in the actual data (see Figure 1). Because of this, the algorithms are less likely to predict hypoglycemic events than hyperglycemic events, a result of the higher number of high blood glucose values in the data.

6.2 60-Minute Prediction

Looking at the RMSE and MAE for the 60-minute prediction horizon, we find the surprising result that the Kalman Filter (not the Unscented Kalman Filter) performs best of all the algorithms. Several explanations are possible. One is that the Kalman Filter dampens its predictions: most of the other algorithms keep predicting upward over the hour if the trend was rising beforehand, while the Kalman Filter mainly shifts the prediction horizon over, so the difference between the last known glucose value and the prediction an hour later is minimal. Since it keeps its results in the typical range of glucose values, it may avoid the poor scores caused by unusually strong spikes in predicted values. Its scores may be the best, yet it may still be a very poor predictor an hour out.

Considering these problems with the Kalman Filter, we analyze the "second" best algorithm. Since the general trends discussed for the 30-minute prediction horizon still hold for the 60-minute horizon (disregarding the Kalman Filter), we conclude that Lasso regression is the next best algorithm. However, comparing the 30-minute and 60-minute horizons raises several concerns about using Lasso regression at 60 minutes.

We noted earlier that Lasso regression tends to underfit with regard to hypoglycemic events. This problem is only exacerbated when the prediction horizon is extended to 60 minutes (see Table 2): the hypoglycemic MCC drops to near 0, indicating that Lasso does no better than random guessing at whether a hypoglycemic event is occurring. This is far from ideal for any diabetic patient. Additionally, at the 60-minute horizon the share of safe predictions degrades by about 2-3% (see Table 3). While 94-97% is still fairly good, given that this reduction translates into 2-3% more dangerous predictions, and that Lasso regression cannot predict hypoglycemic events better than random guessing, we do not consider these predictions sufficiently accurate for the 60-minute prediction horizon. Our recommendation is therefore to use the 30-minute prediction horizon.

6.3 Overall Trends

The biggest trend we notice is that the models tend to underfit with regard to hypoglycemic events; that is, the predicted values do not reach as low as the actual blood glucose values do. This shows in the hypoglycemic MCC for the 30-minute prediction horizon (see Table 1), which averages about 0.3, indicating a general but not strong correlation in predicting hypoglycemic events. Given that the average blood glucose levels on the test data were 159.42, 158.51, 134.92, 143.41, 172.71, and 148.23 mg/dl for patients 540, 544, 552, 567, 584, and 596 respectively, the most likely reason the hypoglycemic MCC is so low is class imbalance within the glucose levels. Since most glucose levels are generally high for these patients, the models overfit toward higher glucose levels and consequently struggle to predict hypoglycemic events. A potential solution is to upsample by "jittering" the smaller class, adding small random perturbations to the existing minority-class examples in order to create more data; see [7] and [10] for examples. A sketch of this idea follows.
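The sketch below illustrates jitter-based upsampling on the windowed data; the function name, the duplication factor, and the noise scale are illustrative choices of ours, not values from [7] or [10].

```python
import numpy as np

def jitter_upsample(X, y, hypo_threshold=70.0, factor=5, scale=2.0, rng=None):
    """Upsample windows whose label contains a hypoglycemic value by
    duplicating them with small Gaussian perturbations ("jittering")."""
    rng = np.random.default_rng(rng)
    hypo = (y < hypo_threshold).any(axis=1)   # windows touching a hypo event
    X_h, y_h = X[hypo], y[hypo]
    X_new = [X, *(X_h + rng.normal(0, scale, X_h.shape) for _ in range(factor))]
    y_new = [y, *(y_h + rng.normal(0, scale, y_h.shape) for _ in range(factor))]
    return np.vstack(X_new), np.vstack(y_new)
```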
6.4 Feature Relevancy

As stated earlier, one important benefit of Lasso regression is its ability to identify the features important to glucose prediction. As seen in Table 4, glucose level, bolus, meal, and exercise are significant in predicting glucose levels (finger sticks are potentially significant, but may be linearly dependent on glucose level). The Weights column is the sum of the six patients' weight scores. The problem with these weights is the huge variability in the number of recorded data points per feature. To normalize for this, we created an Adjusted Weight: each person's weight is divided by that person's number of recorded values, the six results are summed, and the sum is multiplied by 1000 so the values are about the same magnitude as the original weights. The lack of data for exercise is evident here: only 3 of the 6 people had values for exercise, and one of them had only 4 values. That person's Adjusted Weight contribution was about 32, while the other two contributed about 1.5 and 2. More data points in these sparse categories would reduce the variance and more clearly identify which features are important.

Table 4. Lasso Significant Values Totals

Feature           Number Recorded   Weights   Adjusted Weights
glucose level     77563             15.4654   1.2062
basis gsr         39542             0.2272    0.0356
skin temperature  39540             0.2418    0.0295
acceleration      39542             0         0
finger stick      1669              0.54      2.4504
basal             428               0         0
temp basal        208               0         0
bolus             1994              9.4944    23.4776
meal              957               3.5682    31.6974
stressors         2                 0         0
exercise          65                0.2312    36.2337

7 Conclusion

We found that Lasso regression performed best of the algorithms used for both the 30-minute and the 60-minute prediction horizons. While the results were adequate for the 30-minute horizon, they quickly degraded at 60 minutes. In general, the regression algorithms perform fairly well at predicting hyperglycemic events but struggle to predict hypoglycemic events. In our opinion, further research should be done on improving the prediction horizon for blood glucose prediction; specifically, the effect of the volume of data on the prediction horizon should be investigated. If an artificial pancreas is to become a reality, stable prediction horizons beyond 30 minutes are needed.

Furthermore, analyzing the coefficients of the Lasso model shows that glucose level, bolus, meal, and exercise are the most relevant features for forecasting blood glucose levels. However, sparsity in certain features reduces their measured relevancy; future research should therefore handle sparse features in a more robust way.

8 Additional Material

For those wishing to compare against or reproduce the work in this paper, the related code can be found at https://github.com/marshallb95/BloodGlucosePrediction/blob/master/Master.ipynb.
REFERENCES

[1] A. Aliberti, I. Pupillo, S. Terna, E. Macii, S. Di Cataldo, E. Patti, and A. Acquaviva, 'A multi-patient data-driven approach to blood glucose prediction', IEEE Access, 7, 69311-69325, (2019).
[2] J. Chen, K. Li, P. Herrero, T. Zhu, and P. Georgiou. Dilated recurrent neural network for short-time prediction of glucose concentration. Paper presented at the Third International Workshop on Knowledge Discovery in Healthcare Data at the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence, 2018.
[3] S. Fiorini, C. Martini, D. Malpassi, R. Cordera, D. Maggi, A. Verri, and A. Barla. Data-driven strategies for robust forecast of continuous glucose monitoring time-series. Paper presented at the 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2017.
[4] K. Li, J. Daniels, C. Liu, P. Herrero, and P. Georgiou, 'Convolutional recurrent neural networks for glucose prediction', IEEE Journal of Biomedical and Health Informatics, 24, 603-613, (2019).
[5] C. Marling and R. Bunescu. The OhioT1DM dataset for blood glucose level prediction: Update 2020. In The 5th International Workshop on Knowledge Discovery in Healthcare Data, Santiago de Compostela, Spain, June 2020.
[6] J. Martinsson, A. Schliep, B. Eliasson, C. Meijner, S. Persson, and O. Mogren. Automatic blood glucose prediction with confidence using recurrent neural networks. Paper presented at the Third International Workshop on Knowledge Discovery in Healthcare Data at the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence, 2018.
[7] M. Mayo, L. Chepulis, and R. Paul, 'Glycemic-aware metrics and oversampling techniques for predicting blood glucose levels using machine learning', PLoS ONE, 14, e0225613, (2019).
[8] C. Midroni, P. J. Leimbigler, G. Baruah, M. Kolla, A. J. Whitehead, and Y. Fossat. Predicting glycemia in type 1 diabetes patients: Experiments with XGBoost. Paper presented at the Third International Workshop on Knowledge Discovery in Healthcare Data at the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence, 2018.
[9] A. Miranian and M. Abdollahzade, 'Developing a local least-squares support vector machines-based neuro-fuzzy model for nonlinear and chaotic time series prediction', IEEE Transactions on Neural Networks and Learning Systems, 24, 207-218, (2013).
[10] N. Nnamoko and I. Korkontzelos, 'Efficient treatment of outliers and class imbalance for diabetes prediction', Artificial Intelligence in Medicine, 104, 101805, (2020).
[11] S. M. Pappada, M. H. Owais, B. D. Cameron, J. C. Jaume, A. Mavarez-Martinez, R. S. Tripathi, and T. J. Papadimos, 'An artificial neural network-based predictive model to support optimization of inpatient glycemic control', Diabetes Technology & Therapeutics, 22, 1-12, (2020).
[12] I. Rodriguez-Rodriguez, J. V. Rodriguez, I. Chatzigiannakis, and M. A. Zamora, 'On the possibility of predicting glycaemia "on the fly" with constrained IoT devices in type 1 diabetes mellitus patients', Sensors, 19, 4482-4496, (2019).
[13] I. Rodríguez-Rodríguez, I. Chatzigiannakis, J. V. Rodríguez, M. Maranghi, M. Gentili, and M. A. Zamora, 'Utility of big data in predicting short-term blood glucose levels in type 1 diabetes mellitus through machine learning techniques', Sensors, 19, 4538-4557, (2019).
[14] E. A. Wan and R. van der Merwe. The unscented Kalman filter for nonlinear estimation. In Adaptive Systems for Signal Processing, Communications, and Control Symposium, 2000.
[15] J. Xie and Q. Wang. Benchmark machine learning approaches with classical time series approaches on the blood glucose level prediction challenge. Paper presented at the Third International Workshop on Knowledge Discovery in Healthcare Data at the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence, 2018.
Figure 1. Patient 540 prediction results for 30 min PH with Lasso regression (actual vs. predicted blood glucose, mg/dl, over time steps).

Figure 2. Patient 540 Clarke Error Grid for 30 min PH with Lasso regression (prediction vs. reference concentration, mg/dl, zones A-E).

Figure 3. Patient 584 Clarke Error Grid for 30 min PH with Lasso regression (prediction vs. reference concentration, mg/dl, zones A-E).