Predicting Glycemia in Type 1 Diabetes Patients: Experiments with XGBoost

Cooper Midroni, Peter J. Leimbigler, Gaurav Baruah, Maheedhar Kolla, Alfred J. Whitehead, Yan Fossat
Klick Inc., 175 Bloor Street East, Toronto, Ontario, Canada
{cmidroni,pleimbigler,gbaruah,mkolla,awhitehead,yfossat}@klick.com

Abstract

Type 1 diabetes patients must self-administer insulin through injections or insulin-pump therapy, requiring careful lifestyle management around meals and physical activity. Accurate blood glucose prediction could increase patient quality of life, and foreknowledge of hypoglycemia or hyperglycemia could mitigate risks and save lives. For the 2018 BGLP Challenge, we experiment primarily with XGBoost to predict blood glucose levels at a 30-minute horizon in the OhioT1DM dataset. Our experiments show that XGBoost can be a competitive predictor of blood glucose levels, as compared to prior research, and that feature signals from different sources contribute in varying capacities to the predictive ability of XGBoost.

1 Introduction

Diabetes affects over 400 million people worldwide [World Health Organization, 2016], with nearly 5% of diabetics suffering from type 1 diabetes (T1D) [American Diabetes Association, 2018]. Patients with T1D are incapable of producing insulin, a hormone generated by the pancreas, which acts as the primary regulator of blood glucose metabolism. This dysfunction can lead to both hypoglycemia (low blood sugar) and hyperglycemia (high blood sugar), resulting in a significant patient burden to regulate carbohydrate consumption and supplemental insulin delivery. Hyperglycemia can lead to medical complications such as vision loss and kidney failure, and increases the risk of heart disease and stroke. Hypoglycemia can lead to loss of consciousness and even death.

An increasing number of T1D patients are adopting insulin pump therapy, wherein a wearable device releases insulin subcutaneously to mimic pancreatic response. Current insulin pumps require patient input on carbohydrate intake and approval of each recommended insulin dose. Driven by the outstanding need for closed-loop insulin therapy, the notion of an artificial pancreas has gained traction in diabetes-related research [Graf et al., 2017; Juvenile Diabetes Research Foundation, 2018].

The dysregulation of blood glucose in T1D patients is further complicated by daily variations in the magnitude and timing of meals, physical activity, and insulin self-administration. This, along with the altered pharmacokinetics of subcutaneous insulin, adds further layers of complexity to the task of predicting blood glucose. As such, the Blood Glucose Level Prediction Challenge represents an important step toward the realization of an artificial pancreas. Herein lies the objective of restoring homeostasis through accurately dosed insulin, and of creating a model which captures the complexities of the disease. This challenge was particularly motivating for us, not simply from the perspective of predictive modeling, but also for the potential applications in providing tangible benefits to T1D patients.

To predict glucose at a 30-minute time horizon, we processed the raw features of the OhioT1DM dataset [Marling and Bunescu, 2018] to create 3 different feature sets, and experimented with gradient-boosted trees [Chen and Guestrin, 2016a] (XGBoost), random forests [Breiman, 2001], and recurrent neural network variants. We find that:

• XGBoost performs on par with prior models [Bunescu et al., 2013; Mirshekarian et al., 2017] for this task.
• Many of the provided features do not contribute to improved predictive performance. In essence, when using XGBoost, past glucose is the most important predictor of future glucose.
• Using ε-insensitive loss for training LSTMs improves predictive performance compared to mean squared error.

2 Data

The OhioT1DM dataset comprises 19 features collected from 6 patients with T1D [Marling and Bunescu, 2018]. Patients wore Medtronic 530G insulin pumps, Medtronic Enlite continuous glucose monitors (CGM), and Basis Peak fitness wristbands. Eight weeks of data were provided per patient, of which the final 10 days were provided separately as a test set.

2.1 Data Analysis

Due to the heterogeneity of the data, we grouped features according to their frequency:

• one-off data: intermittent measurements with no fixed sampling frequency or duration (e.g., finger stick glucose, insulin bolus time and dose, sleep times and quality, work intensity, exercise intensity and duration, meal type and carbohydrate content, hypoglycemic events, illnesses, stressors)
• quasi-continuous data: signals in continuous effect, and signals aggregated at 5-minute intervals (e.g., CGM glucose level, basal and temporary basal rates of insulin infusion, heart rate, steps taken, galvanic skin response (GSR), skin and air temperature)

[Figure 1: Selected features (glucose level, heart rate, skin temperature) for two patients (563 and 591), illustrating differences in mean values and waveforms. Lines and shaded regions indicate mean ±1 SD. Orange dots show individual data samples.]

Analysis of quasi-continuous data showed unique temporal patterns in each patient (Figure 1). These bio-signals displayed properties characteristic of each individual, such as repeated waveforms and unique signal means. These qualities reflect the hour-of-day periodicities and homeostatic norms that vary across patients. Accordingly, we were motivated to build a distinct model for each patient, as opposed to a general model trained on the data of several patients.

2.2 Feature Engineering

Expanded Feature Set
We found that, due to the variable sampling frequency of one-off features and missing values in the data, the feature vector at any given timestamp was not guaranteed to contain values for all fields. Converting the data into a feature matrix resulted in rows with missing values, hindering analysis. We thus chose to resample our data to 5-minute intervals, reflecting the 5-minute aggregation frequency of quasi-continuous variables.

Within each 5-minute resampling window, we aggregated each feature with its maximum, mean, sum, or last valid value, depending on its nature. For instance, we took the sum of steps, whereas we computed the mean of heart rate. Some features are only in effect for specified durations. For instance, basal insulin infusion rates were overridden with their respective temporary basal overrides, if any, and each square-wave insulin bolus dose was spread evenly across its specified time interval. Missing values for glucose were imputed via linear interpolation.

Based on our observations (Sec. 2.1) of time-dependent patterns in the data, e.g., the dawn phenomenon, we included one-hot encoded features for hour-of-day and day-of-week. For each one-off feature, a binary indicator feature [Che et al., 2018] was used to denote missing values.

Condensed Feature Set
We developed a condensed feature set based on pairwise correlations between features. This feature set included Basis Peak band and Medtronic CGM sensor data, selected on the strength of their correlation with glucose. Several derivations of the glucose signal were added to the feature set, including five- and ten-minute time lags of glucose to provide the model with information about glucose history. A binary indicator feature marked when the present glucose level was in the upper or lower 20% of the patient's glucose distribution. Other features present in this set include the last bolus dose, the mean basal rate over the past 5 minutes, and an indicator of whether the patient was asleep.

Dimensionality-Reduced Features (PCA-reduced)
We used Principal Component Analysis (PCA) to transform our expanded feature set and remove features with minimal variance. Such features are less likely to be predictive in a model. On average, the first 55 principal components accounted for 99% of the variance in each patient's dataset.

3 Machine Learning Models

For all models, we used the final week of each patient's training data for validation. In training personalized models, hyperparameters and model structure (e.g., learning rates, number of LSTM nodes) were kept consistent across patients. All models in this study were run on all three of the engineered feature sets.

We investigated the following models:

• Tree ensembles, using the Random Forest Regressor implementation of Scikit-Learn [Pedregosa et al., 2011].
• Regression-based gradient-boosted decision trees, using XGBoost [Chen and Guestrin, 2016b].
• Recurrent neural network variants, using Keras [Chollet, 2015].

Keras was used to create several RNN models: multi-layer LSTMs, GRUs, a Bidirectional GRU, and LSTMs with dropout. These models were implemented to evaluate their performance on the data, and were tested on varying durations of look-back windows (5 minutes to 1 day).
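As a concrete illustration of the resampling and feature construction described in Sec. 2.2, the following pandas sketch resamples a toy event log to 5-minute bins, aggregates each signal by its nature, adds a missing-value indicator with linear interpolation for glucose, and appends time encodings and glucose lags. All column names and values are illustrative; this is not the OhioT1DM schema or the exact pipeline used in the paper.

```python
import numpy as np
import pandas as pd

# Toy event log with a timestamp index (names and values illustrative).
idx = pd.to_datetime(["2018-01-01 00:01", "2018-01-01 00:03",
                      "2018-01-01 00:12", "2018-01-01 00:14"])
raw = pd.DataFrame({"glucose": [110.0, np.nan, np.nan, 125.0],
                    "heart_rate": [62.0, 64.0, 70.0, 68.0],
                    "steps": [10, 5, 20, 15]}, index=idx)

# Resample to 5-minute bins, aggregating each feature by its nature:
# last valid glucose, mean heart rate, summed steps.
feat = raw.resample("5min").agg(
    {"glucose": "last", "heart_rate": "mean", "steps": "sum"})

# Binary missing-value indicator for glucose, then linear interpolation
# (as used for the training set).
feat["glucose_missing"] = feat["glucose"].isna().astype(int)
feat["glucose"] = feat["glucose"].interpolate()

# One-hot encodings for hour-of-day and day-of-week.
hour_oh = pd.get_dummies(feat.index.hour, prefix="hour")
hour_oh.index = feat.index
dow_oh = pd.get_dummies(feat.index.dayofweek, prefix="dow")
dow_oh.index = feat.index
feat = feat.join(hour_oh).join(dow_oh)

# Short glucose lags (one and two 5-minute steps back).
feat["glucose_lag1"] = feat["glucose"].shift(1)
feat["glucose_lag2"] = feat["glucose"].shift(2)
```

On this toy input, the empty 00:05 bin yields a missing glucose value, which is flagged by the indicator and then filled by interpolation.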
These were implemented to evaluate their perfor- For each one-off feature, a binary indicator feature [Che et mance on the data, and were tested on varying durations of al., 2018] was used to denote missing values. look-back windows (5 minutes–1 day). Table 1: Submitted system performances. Table 2: Results of XGBoost on ablated feature set combinations of Self Reported (S), Medtronic Insulin pump (P), Basis Peak band Model: RF XGBoost XGBoost LSTM LSTM (B), and Continuous Glucose Monitor (G) features (Sec. 2.2). Featureset: condensed expanded PCA- expanded expanded reduced -insens. MAE loss Features val RMSE test RMSE 559 37.236 19.810 21.816 23.949 23.76 S 53.549 54.510 563 28.095 18.415 19.375 25.121 24.901 B 55.829 55.390 570 24.625 18.140 19.614 19.562 20.497 G 22.692 19.597 575 28.688 24.172 24.242 25.923 27.796 P 56.85 54.711 588 23.894 19.240 22.141 19.012 20.303 591 28.976 22.487 23.391 27.333 30.256 S+B 53.184 53.607 AVG 28.586 20.377 21.763 23.483 24.586 S+G 22.099 19.322 S+P 54.087 54.371 B+G 22.764 19.842 3.1 System Performance Results P+B 56.779 54.128 We found that no one feature set (either expanded, condensed P+G 22.388 19.470 or PCA-reduced) produced consistently better glucose pre- S+B+P 52.481 53.238 dictions, and different models performed better on different S+B+G 22.504 19.484 feature sets. Table 1 lists our submitted systems and feature S+P+G 22.125 19.418 sets. B+P+G 22.616 19.577 XGBoost was the best-performing model on both the ex- S+B+P+G 22.45 19.573 panded and PCA-reduced feature sets, achieving a mean RMSE across all patients of 20.377. These results are at par with previously published models based on Support Vector last meal; and lagged features, up to 2 hours. 8-fold cross- Regression [Bunescu et al., 2013]. validation without shuffling was used on the full training set to optimize the number of boosting rounds. 
Experiments with LSTM Loss Functions Our LSTM models were simple and did not perform as well 4.2 Feature Ablation Experiments as recent LSTM models for blood-glucose prediction [Mir- Given the diversity of features sourced from biological mea- shekarian et al., 2017]. We submitted results for an LSTM surements, self-reported events, and non-invasive physiolog- model that was composed of: layers LSTM(64 nodes), ical signals, we conducted an ablation study to determine LSTM(64 nodes), Fully-connected(32 nodes); the relative performance of models on subsets of feature a dropout rate of 0.2; an Adadelta optimizer; and a look- groups. This experiment was performed with our best XG- back of 5 minutes. Boost model. From our expanded feature set, we created the We observed that Mean Absolute Error (MAE) improved following feature subsets: the performance of trained LSTMs over using Mean Squared Error (MSE) as the loss function. The models trained with • Self-reported features (S): meals, finger-stick glucose, MSE showed a degradation which was particularly severe for illness, stress, exercise, and work, together with missing- glucose values near hypoglycemic and hyperglycemic lev- value indicator columns, and one-hot encodings for meal els [Medtronic, 2010]. MAE, in contrast to MSE, does not type (41 features) penalize large errors as heavily as MSE, which likely helped • Basis Peak band features (B): heart rate, GSR, skin and improve performance over outlying cases. air temperature, steps, and sleep (6 features) The generally accepted error rate for finger-stick blood glu- cose measurements is 15 mg/dL [Food and Drug Administra- • Pump features (P): basal and temporary basal infusion tion, 2016]. Thus, for predictions within a 15 mg/dL window rates, bolus doses, together with missing-value indicator of the ground truth, the the loss for such values can be con- and one-hot encoding columns (10 features) sidered less impactful. 
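The ε-insensitive loss is not a stock Keras loss and would be supplied as a custom loss function. Our exact implementation is not reproduced here; the sketch below captures the arithmetic in numpy (a Keras version would express the same formula with backend tensor operations), with errors inside the ε band contributing zero.

```python
import numpy as np

EPS = 5.0  # mg/dL tolerance, the epsilon used in our experiments

def epsilon_insensitive_loss(y_true, y_pred, eps=EPS):
    """Mean epsilon-insensitive loss: absolute errors within eps mg/dL
    contribute zero; larger errors are penalized linearly beyond eps."""
    err = np.abs(np.asarray(y_true, dtype=float) -
                 np.asarray(y_pred, dtype=float))
    return float(np.mean(np.maximum(0.0, err - eps)))

# Errors of 3 and 10 mg/dL: only the second exceeds the 5 mg/dL band,
# contributing 10 - 5 = 5, so the mean loss is 2.5.
loss = epsilon_insensitive_loss([100.0, 150.0], [103.0, 160.0])
```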
We therefore investigated the use of • CGM glucose feature (G): blood glucose level recorded an -insensitive loss function for training our LSTM, with  via CGM sensor (1 feature) set as 5 mg/dL, for a more stringent boundary than 15 mg/dL. • Time features: one-hot encodings for hour-of-day and We compared the three loss functions: and found that train- day-of-week (31 features) ing LSTMs with MAE loss improved results (RMSE 24.586) over MSE loss (RMSE 30.097) with -insensitive loss per- The XGBoost model was trained on a feature vector contain- forming the best with an RMSE of 23.483. ing the current feature value as well as the previous 12 values, lagged at 5 minutes apart. In total, we investigated 15 combi- 4 Follow-up Experiments and Results nations of the S, P, B, and G features (Table 2). Time features were included in all combinations. 4.1 Post Challenge Submission Table 2 shows that: We tuned the hyperparameters of our best model, XGBoost, and added the following features to the expanded featureset: • Prediction suffers greatly when glucose (G) is ablated. first difference of CGM glucose; time since last bolus, meal, • The best model does not include insulin (P) or band (B) hypo event, and hypo correction; size of last bolus; carbs in features. Table 3: Average feature ranking for XGBoost for top 25 features points (black over gray) included in the test set, whereas cor- (with average rank of feature in brackets). lagN indicates a feature rected RMSE is computed with the interpolated points ex- value (N × 5) minutes in the past. cluded from the test set (no predictions when glucose values Feature importance Feature importance are missing). Under the linear interpolation scheme, mean without band features with band features RMSE improves from an unfiltered value of 18.540 to 16.214 ( 1.0) glucose level ( 1.0) glucose level mg/dL (Table 4). 
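The ablation experiments feed XGBoost the current value of each feature plus its previous 12 values at 5-minute spacing. A sketch of building such a lagged design matrix with pandas follows; the helper name, column names, and toy values are illustrative, not our exact code.

```python
import pandas as pd

def add_lags(df, columns, n_lags=12):
    """Append lagN copies of each column, where lagN holds the value
    from N rows (N x 5 minutes) in the past."""
    out = df.copy()
    for col in columns:
        for n in range(1, n_lags + 1):
            out[f"{col}_lag{n}"] = df[col].shift(n)
    return out

# Toy 5-minute-resampled glucose series (values illustrative).
frame = pd.DataFrame({"glucose": [100.0, 105.0, 110.0, 120.0]})
lagged = add_lags(frame, ["glucose"], n_lags=2)
```

Early rows, whose lags reach before the start of the series, hold NaN and would be dropped or imputed before training.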
We used XGBoost's feature importance rank to observe which features contributed the most decision splits within the trees of the model; more splits imply a higher importance in decision making. Table 3 shows the mean rank of each feature as determined by XGBoost's importance score. We observe that, on average, XGBoost's decisions are most influenced by (i) current glucose, (ii) glucose one hour ago, and (iii–xii) the other glucose values within the past hour. This holds irrespective of the inclusion of Basis Peak band features.

Table 3: Average feature ranking for XGBoost for the top 25 features (average rank in brackets). lagN indicates a feature value (N × 5) minutes in the past.

Without band features         With band features
( 1.0) glucose level          ( 1.0) glucose level
( 2.1) glucose level lag12    ( 2.1) glucose level lag12
( 6.0) glucose level lag6     ( 7.3) glucose level lag5
( 6.7) glucose level lag5     ( 8.5) glucose level lag6
( 7.7) glucose level lag4     (10.4) glucose level lag11
( 8.8) glucose level lag3     (13.0) glucose level lag10
(10.6) glucose level lag1     (13.5) glucose level lag4
(10.8) glucose level lag11    (14.8) glucose level lag7
(12.2) glucose level lag7     (15.0) glucose level lag8
(13.2) glucose level lag9     (15.8) glucose level lag1
(13.8) glucose level lag10    (17.5) glucose level lag2
(14.8) glucose level lag2     (17.8) glucose level lag9
(14.8) glucose level lag8     (18.1) glucose level lag3
(37.2) meal carbs lag3        (27.1) basis air temp
(40.2) finger stick lag5      (27.4) basis skin temp lag12
(43.7) finger stick lag7      (33.2) basis gsr lag12
(44.3) meal carbs lag2        (34.9) basis gsr
(49.1) finger stick lag8      (37.1) basis heart rate
(50.8) meal carbs             (38.9) basis heart rate lag12
(57.2) basal                  (48.2) basis heart rate lag4
(59.4) meal carbs lag4        (51.3) basis heart rate lag2
(60.6) finger stick lag6      (59.3) basis heart rate lag3
(64.2) meal carbs lag1        (62.1) basis heart rate lag1
(66.4) finger stick lag3      (62.8) meal carbs lag2
(67.8) finger stick lag4      (63.8) meal carbs lag4

4.3 Data Imputation Revisited

Missing values were observed in the CGM measurements of every patient. Initially, these data gaps were filled via linear interpolation in both the training and test sets. However, such imputation across gaps is only valid in a training and batch-prediction setting. In online prediction, new feature vectors stream into the model in real time, so missing values should only be imputed using past data.

By the same reasoning, any time intervals with missing glucose values should be excluded from the test-RMSE metric. We therefore corrected our implementation of test RMSE to include only data periods with available glucose. In Figure 2 (left), unfiltered RMSE is computed with the interpolated points (black over gray) included in the test set, whereas corrected RMSE is computed with the interpolated points excluded from the test set (no predictions when glucose values are missing). Under the linear interpolation scheme, mean RMSE improves from an unfiltered value of 18.540 to a corrected value of 16.110 mg/dL (Table 4). Note that other features may have values even when glucose is absent (Sec. 2.2).

We then ran our best model on three test-set imputation schemes, the latter two of which are compatible with online prediction: linear interpolation (linear), persisting the last valid value (ffill), and leaving gaps unchanged (none).

Figure 2 and Table 4 compare the RMSEs of these imputation schemes. The unfiltered-RMSE column in Table 4 gives model performance on interpolated test glucose values, without using the corrected RMSE function. The corrected-RMSE columns list performance for the three imputation schemes, using the corrected RMSE function. Interpolation with corrected RMSE performs best, but only the forward-filled and non-imputed schemes can be implemented in an online context. Figure 2 (left) illustrates incorrect interpolation of glucose values. Figure 2 (center) and (right) show the ffill and none schemes, both of which are compatible with online prediction.

As an interesting exercise, missing test-set glucose values were left unchanged, and XGBoost was allowed to make predictions on the remaining features in the absence of glucose (Figure 2, right panel, orange over gray). Predictions were more variable in the absence of the glucose signal, but appear to recover within two hours of the end of the data gap.
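The corrected (NaN-masked) RMSE and the two online-safe imputation schemes of Sec. 4.3 can be sketched as follows; the helper name and toy series are illustrative, not our exact implementation.

```python
import numpy as np
import pandas as pd

def corrected_rmse(y_true, y_pred):
    """RMSE over only the time steps where ground-truth glucose is
    present; imputed or missing targets are masked out (NaN-masked)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = ~np.isnan(y_true)
    return float(np.sqrt(np.mean((y_true[mask] - y_pred[mask]) ** 2)))

# Toy CGM series with a two-sample gap (values illustrative).
glucose = pd.Series([100.0, np.nan, np.nan, 130.0])

linear = glucose.interpolate()  # uses future values: batch-only
ffill = glucose.ffill()         # uses only past values: online-safe
```

Because the mask removes missing targets, predictions made during a CGM gap (however wild) never contribute to the corrected score.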
This remains true irrespective of inclusion of Basis tion for missing data imputation [Tresp and Briegel, 1998], Peak band features. as well as physiologically modeling glucose response sig- 4.3 Data Imputation Revisited nals as features [Bunescu et al., 2013; Zecchin et al., 2012; Contreras et al., 2017]. In each of the patient’s data, missing values were observed in For this work, we applied conventional feature-engineering CGM measurements. Initially, these data gaps were filled via methods. In the future, we would like to explore the inclusion linear interpolation for both the training and test sets. How- of features based on physiological models of bio-signals for ever, such an imputation across gaps is only valid in a train- prediction. As an extension to our ablation and feature im- ing and batch prediction setting. In online prediction, new portance study, we would also like to explore non-glucose feature vectors stream into the model in real time. Thus, to signals in-depth for glucose level prediction. reflect realistic online prediction, missing values should only be imputed using past data. To this point, any time intervals with missing glucose val- 6 Conclusion ues should be excluded from the test-RMSE metric. As Our main finding is that XGBoost remains competitive with such, we corrected our implementation of test-RMSE to in- previously reported ARIMA models [Bunescu et al., 2013], clude only data periods with available glucose. In Figure which supports glucose-derived features as the strongest pre- 2 (left), unfiltered RMSE is computed with the interpolated dictors of future glucose levels. Table 4: Post-submission XGBoost scores on test-set glucose with unfiltered and NaN-masked (corrected) RMSE, for three interpolation schemes. 
Interpolation: number linear number linear ffill no interpolation of test unfiltered of test corrected corrected corrected Patient-id points RMSE points RMSE RMSE RMSE 559 2985 16.598 2514 17.107 17.391 17.230 563 2884 18.509 2570 16.018 16.033 16.026 570 2972 15.401 2745 14.315 14.709 14.493 575 2758 21.217 2590 17.556 17.611 17.749 588 2875 18.964 2791 16.500 16.500 16.519 591 2949 20.552 2760 15.162 15.140 15.269 AVG 18.540 16.110 16.231 16.214 Patient 559 Patient 559 Patient 559 250 250 250 CGM glucose (mg/dL) CGM glucose (mg/dL) CGM glucose (mg/dL) 200 200 200 150 150 150 100 100 100 actual actual actual 50 predicted 50 predicted 50 predicted 12:00 15:00 18:00 21:00 00:00 03:00 06:00 09:00 12:00 12:00 15:00 18:00 21:00 00:00 03:00 06:00 09:00 12:00 12:00 15:00 18:00 21:00 00:00 03:00 06:00 09:00 12:00 21-Jan 21-Jan 21-Jan Figure 2: A representative data period in the test set of Patient 559, showing CGM glucose (blue), predicted glucose (orange), interpolated glucose (black), and spans of missing glucose (grey shaded regions). Three imputation schemes are compared: linear interpolation (left), forward-filling (center), and no imputation (right). We observed that in LSTMs, -insensitive loss proved a [American Diabetes Association, 2018] American Diabetes Asso- more effective loss function than MAE, as inspired by the ciation. Type 1 diabetes. http://www.diabetes.org/ notion of incorporating an error tolerance corresponding to diabetes-basics/type-1/, 2018. finger-stick measurement error. Interestingly, XGBoost mod- [Breiman, 2001] Leo Breiman. Random forests. Machine Learn- els outperformed LSTMs in our study. ing, 45(1):5–32, Oct 2001. The collaboration of life sciences with the practice of data [Bunescu et al., 2013] Razvan Bunescu, Nigel Struble, Cindy Mar- science offers the possibility of developing truly individual- ling, Jay Shubrook, and Frank Schwartz. Blood glucose level ized proactive medicine. 
By personalizing such predictive prediction using physiological models and support vector regres- models, we endeavour to further explore key signals—digital sion. In ICMLA, 2013, volume 1, pages 135–140. IEEE, 2013. biomarkers, digital surrogate measurements—which reflect [Che et al., 2018] Zhengping Che, Sanjay Purushotham, the strength of this interdisciplinary collaboration, and our Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent ability to transform the future of healthcare. neural networks for multivariate time series with missing values. Scientific reports, 8(1):6085, 2018. Acknowledgments [Chen and Guestrin, 2016a] Tianqi Chen and Carlos Guestrin. Xg- boost: A scalable tree boosting system. CoRR, abs/1603.02754, We would like to thank Michael Li and the rest of the Klick- 2016. Labs team at Klick Inc. for their feedback, support, and en- [Chen and Guestrin, 2016b] Tianqi Chen and Carlos Guestrin. Xg- couragements. boost: A scalable tree boosting system. In KDD, KDD ’16, pages 785–794, New York, NY, USA, 2016. ACM. [Chollet, 2015] François et al Chollet. Keras. https://keras. References io, 2015. [Allam et al., 2011] Fayrouz Allam, Zaki Nossai, Hesham Gomma, [Contreras et al., 2017] Iván Contreras, Silvia Oviedo, Martina Ibrahim Ibrahim, and Mona Abdelsalam. A recurrent neural net- Vettoretti, Roberto Visentin, and Josep Vehı́. Personalized blood work approach for predicting glucose concentration in type-1 di- glucose prediction: A hybrid approach using grammatical evo- abetic patients. In Engineering Applications of Neural Networks, lution and physiological models. PloS one, 12(11):e0187754, pages 254–259. Springer, 2011. 2017. [Food and Drug Administration, 2016] U.S. Food and Drug Administration. Self-monitoring blood glucose test sys- tems for over-the-counter use. https://www.fda.gov/ downloads/ucm380327.pdf, 2016. [Graf et al., 2017] Anneke Graf, Sybil A. McAuley, Catriona Sims, Johanna Ulloa, Alicia J. Jenkins, Gayane Voskanyan, and David N. 
[Graf et al., 2017] Anneke Graf, Sybil A. McAuley, Catriona Sims, Johanna Ulloa, Alicia J. Jenkins, Gayane Voskanyan, and David N. O'Neal. Moving toward a unified platform for insulin delivery and sensing of inputs relevant to an artificial pancreas. Journal of Diabetes Science and Technology, 11(2):308–314, 2017. PMID: 28264192.

[Hidalgo et al., 2017] J. Ignacio Hidalgo, J. Manuel Colmenar, Gabriel Kronberger, Stephan M. Winkler, Oscar Garnica, and Juan Lanchares. Data based prediction of blood glucose concentrations using evolutionary methods. Journal of Medical Systems, 41(9):142, 2017.

[Juvenile Diabetes Research Foundation, 2018] Juvenile Diabetes Research Foundation. Artificial pancreas. http://www.jdrf.org/research/artificial-pancreas/, 2018.

[Marling and Bunescu, 2018] C. Marling and R. Bunescu. The OhioT1DM dataset for blood glucose level prediction. In The 3rd International Workshop on Knowledge Discovery in Healthcare Data, Stockholm, Sweden, July 2018. CEUR proceedings in press, available at http://smarthealth.cs.ohio.edu/bglp/OhioT1DM-dataset-paper.pdf.

[Medtronic, 2010] Medtronic. The basics of insulin pump therapy. https://www.medtronicdiabetes.com/sites/default/files/library/download-library/workbooks/BasicsofInsulinPumpTherapy.pdf, 2010.

[Mirshekarian et al., 2017] Sadegh Mirshekarian, Razvan Bunescu, Cindy Marling, and Frank Schwartz. Using LSTMs to learn physiological models of blood glucose behavior. In EMBC, 2017, pages 2887–2891. IEEE, 2017.

[Pappada et al., 2008] Scott M. Pappada, Brent D. Cameron, and Paul M. Rosman. Development of a neural network for prediction of glucose concentration in type 1 diabetes patients. Journal of Diabetes Science and Technology, 2(5):792–801, 2008.

[Pedregosa et al., 2011] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[Plis et al., 2014] Kevin Plis, Razvan C. Bunescu, Cindy Marling, Jay Shubrook, and Frank Schwartz. A machine learning approach to predicting blood glucose levels for diabetes management. In AAAI Workshop: Modern Artificial Intelligence for Health Analytics, number 31, pages 35–39, 2014.

[Tresp and Briegel, 1998] Volker Tresp and Thomas Briegel. A solution for missing data in recurrent neural networks with an application to blood glucose prediction. In NIPS, pages 971–977, 1998.

[World Health Organization, 2016] World Health Organization. Global Report on Diabetes. World Health Organization, 2016.

[Zecchin et al., 2012] C. Zecchin, A. Facchinetti, G. Sparacino, G. De Nicolao, and C. Cobelli. Neural network incorporating meal information improves accuracy of short-time prediction of glucose concentration. IEEE Transactions on Biomedical Engineering, 59(6):1550–1560, June 2012.