Predicting Glycemia in Type 1 Diabetes Patients: Experiments with XGBoost

Cooper Midroni, Peter J. Leimbigler, Gaurav Baruah, Maheedhar Kolla, Alfred J. Whitehead, Yan Fossat
Klick Inc., 175 Bloor Street East, Toronto, Ontario, Canada
{cmidroni,pleimbigler,gbaruah,mkolla,awhitehead,yfossat}@klick.com

Abstract

Type 1 diabetes patients must self-administer insulin through injections or insulin-pump therapy, requiring careful lifestyle management around meals and physical activity. Accurate blood glucose prediction could increase patient quality of life, and foreknowledge of hypoglycemia or hyperglycemia could mitigate risks and save lives. For the 2018 BGLP Challenge, we experiment primarily with XGBoost to predict blood glucose levels at a 30-minute horizon in the OhioT1DM dataset. Our experiments show that XGBoost can be a competitive predictor of blood glucose levels, as compared to prior research, and that feature signals from different sources contribute in varying capacities to the predictive ability of XGBoost.

1 Introduction

Diabetes affects over 400 million people worldwide [World Health Organization, 2016], with nearly 5% of diabetics suffering from type 1 diabetes (T1D) [American Diabetes Association, 2018]. Patients with T1D are incapable of producing insulin, a hormone generated by the pancreas, which acts as the primary regulator of blood glucose metabolism. This dysfunction can lead to both hypoglycemia (low blood sugar) and hyperglycemia (high blood sugar), resulting in a significant patient burden to regulate carbohydrate consumption and supplemental insulin delivery. Hyperglycemia can lead to medical complications such as vision loss and kidney failure, and increases the risk of heart disease and stroke. Hypoglycemia can lead to loss of consciousness and even death.

An increasing number of T1D patients are adopting insulin pump therapy, wherein a wearable device releases insulin subcutaneously to mimic pancreatic response. Current insulin pumps require patient input on carbohydrate intake and approval of each recommended insulin dose. Driven by the outstanding need for closed-loop insulin therapy, the notion of an artificial pancreas has gained traction in diabetes-related research [Graf et al., 2017; Juvenile Diabetes Research Foundation, 2018].

The dysregulation of blood glucose in T1D patients is further complicated by daily variations in the magnitude and timing of meals, physical activity, and insulin self-administration. This, along with the altered pharmacokinetics of subcutaneous insulin, adds further layers of complexity to the task of predicting blood glucose. As such, the Blood Glucose Level Prediction Challenge represents an important step toward the realization of an artificial pancreas. Herein lies the objective of restoring homeostasis through accurately dosed insulin, and of creating a model which captures the complexities of the disease. This challenge was particularly motivating for us, not simply from the perspective of predictive modeling, but also for the potential applications in providing tangible benefits to T1D patients.

To predict glucose at a 30-minute time horizon, we processed the raw features of the OhioT1DM dataset [Marling and Bunescu, 2018] to create 3 different feature sets, and experimented with gradient-boosted trees [Chen and Guestrin, 2016a] (XGBoost), random forests [Breiman, 2001], and recurrent neural network variants. We find that:

• XGBoost performs on par with prior models [Bunescu et al., 2013; Mirshekarian et al., 2017] for this task.
• Many of the provided features do not contribute to improved predictive performance. In essence, when using XGBoost, past glucose is the most important predictor of future glucose.
• Using ε-insensitive loss for training LSTMs improves predictive performance compared to mean squared error.

2 Data

The OhioT1DM dataset comprises 19 features collected from 6 patients with T1D [Marling and Bunescu, 2018]. Patients wore Medtronic 530G insulin pumps, Medtronic Enlite continuous glucose monitors (CGM), and Basis Peak fitness wristbands. Eight weeks of data were provided per patient, of which the final 10 days were provided separately as a test set.

2.1 Data Analysis

Due to the heterogeneity of the data, we grouped features according to their frequency:

• one-off data: intermittent measurements with no fixed sampling frequency or duration (e.g., finger stick glucose, insulin bolus time and dose, sleep times and quality, work intensity, exercise intensity and duration, meal type and carbohydrate content, hypoglycemic events, illnesses, stressors)
• quasi-continuous data: signals in continuous effect, and signals aggregated at 5-minute intervals (e.g., CGM glucose level, basal and temporary basal rates of insulin infusion, heart rate, steps taken, galvanic skin response (GSR), skin and air temperature)

[Figure 1: Selected features (glucose level, heart rate, skin temperature) for two patients (563 and 591), illustrating differences in mean values and waveforms. Lines and shaded regions indicate mean ±1 SD. Orange dots show individual data samples.]

Analysis of quasi-continuous data showed unique temporal patterns in each patient (Figure 1). These bio-signals displayed properties characteristic of each individual, such as repeated waveforms and unique signal means. These qualities reflect the hour-of-day periodicities and homeostatic norms that vary across patients. Accordingly, we were motivated to build a distinct model for each patient, as opposed to a general model trained on the data of several patients.

2.2 Feature Engineering

Expanded Feature Set
We found that, due to the variable sampling frequency of one-off features and missing values in the data, the feature vector at any given timestamp was not guaranteed to contain values for all fields. Converting the data into a feature matrix resulted in rows with missing values, hindering analysis. We thus chose to resample our data to 5-minute intervals, reflecting the 5-minute aggregation frequency of quasi-continuous variables.

Within each 5-minute resampling window, we aggregated each feature with its maximum, mean, sum, or last valid value, depending on its nature. For instance, we took the sum of steps, whereas we computed the mean of heart rate. Some features are only in effect for specified durations. For instance, basal insulin infusion rates were overridden with their respective temporary basal overrides, if any, and each square-wave insulin bolus dose was spread evenly across its specified time interval. Missing values for glucose were imputed via linear interpolation.

Based on our observations (Sec. 2.1) of time-dependent patterns in the data, e.g., the dawn phenomenon, we included one-hot encoded features for hour-of-day and day-of-week. For each one-off feature, a binary indicator feature [Che et al., 2018] was used to denote missing values.

Condensed Feature Set
We developed a condensed feature set based on pairwise correlations between features. This feature set included Basis Peak band and Medtronic CGM sensor data, selected on the strength of their correlation with glucose. Several derivations of the glucose signal were added to the feature set, including five- and ten-minute time lags of glucose to provide the model with information about glucose history. A binary indicator feature marked when the present glucose level was in the upper or lower 20% of the patient's glucose distribution. Other features present in this set include the last bolus dose, the mean basal rate over the past 5 minutes, and an indicator of whether the patient was asleep.

Dimensionality-Reduced Features (PCA-reduced)
We used Principal Component Analysis (PCA) to transform our expanded feature set and remove features with minimal variance. Such features are less likely to be predictive in a model. On average, the first 55 principal components accounted for 99% of the variance in each patient's dataset.

3 Machine Learning Models

For all models, we used the final week of each patient's training data for validation. In training personalized models, hyperparameters and model structure (e.g., learning rates, number of LSTM nodes) were kept consistent across patients. All models in this study were run on all three of the engineered feature sets.

We investigated the following models:

• Tree ensembles, using the Random Forest Regressor implementation of Scikit-Learn [Pedregosa et al., 2011].
• Regression-based gradient-boosted decision trees, using XGBoost [Chen and Guestrin, 2016b].
• Recurrent neural network variants, using Keras [Chollet, 2015].

Keras was used to create several RNN models: multi-layer LSTMs, GRUs, a Bidirectional GRU, and LSTMs with dropout. These models were implemented to evaluate their performance on the data, and were tested on varying durations of look-back windows (5 minutes to 1 day).
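As a concrete illustration of the resampling and feature construction described in Sec. 2.2, the following pandas sketch resamples a toy event log to 5-minute bins, aggregates each signal by its nature, adds a missing-value indicator with linear interpolation for glucose, and appends time encodings and glucose lags. All column names and values are illustrative; this is not the OhioT1DM schema or the exact pipeline used in the paper.

```python
import numpy as np
import pandas as pd

# Toy event log with a timestamp index (names and values illustrative).
idx = pd.to_datetime(["2018-01-01 00:01", "2018-01-01 00:03",
                      "2018-01-01 00:12", "2018-01-01 00:14"])
raw = pd.DataFrame({"glucose": [110.0, np.nan, np.nan, 125.0],
                    "heart_rate": [62.0, 64.0, 70.0, 68.0],
                    "steps": [10, 5, 20, 15]}, index=idx)

# Resample to 5-minute bins, aggregating each feature by its nature:
# last valid glucose, mean heart rate, summed steps.
feat = raw.resample("5min").agg(
    {"glucose": "last", "heart_rate": "mean", "steps": "sum"})

# Binary missing-value indicator for glucose, then linear interpolation
# (as used for the training set).
feat["glucose_missing"] = feat["glucose"].isna().astype(int)
feat["glucose"] = feat["glucose"].interpolate()

# One-hot encodings for hour-of-day and day-of-week.
hour_oh = pd.get_dummies(feat.index.hour, prefix="hour")
hour_oh.index = feat.index
dow_oh = pd.get_dummies(feat.index.dayofweek, prefix="dow")
dow_oh.index = feat.index
feat = feat.join(hour_oh).join(dow_oh)

# Short glucose lags (one and two 5-minute steps back).
feat["glucose_lag1"] = feat["glucose"].shift(1)
feat["glucose_lag2"] = feat["glucose"].shift(2)
```

On this toy input, the empty 00:05 bin yields a missing glucose value, which is flagged by the indicator and then filled by interpolation.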
These were implemented to evaluate their perfor- For each one-off feature, a binary indicator feature [Che et mance on the data, and were tested on varying durations of al., 2018] was used to denote missing values. look-back windows (5 minutes–1 day). Table 1: Submitted system performances. Table 2: Results of XGBoost on ablated feature set combinations of Self Reported (S), Medtronic Insulin pump (P), Basis Peak band Model: RF XGBoost XGBoost LSTM LSTM (B), and Continuous Glucose Monitor (G) features (Sec. 2.2). Featureset: condensed expanded PCA- expanded expanded reduced -insens. MAE loss Features val RMSE test RMSE 559 37.236 19.810 21.816 23.949 23.76 S 53.549 54.510 563 28.095 18.415 19.375 25.121 24.901 B 55.829 55.390 570 24.625 18.140 19.614 19.562 20.497 G 22.692 19.597 575 28.688 24.172 24.242 25.923 27.796 P 56.85 54.711 588 23.894 19.240 22.141 19.012 20.303 591 28.976 22.487 23.391 27.333 30.256 S+B 53.184 53.607 AVG 28.586 20.377 21.763 23.483 24.586 S+G 22.099 19.322 S+P 54.087 54.371 B+G 22.764 19.842 3.1 System Performance Results P+B 56.779 54.128 We found that no one feature set (either expanded, condensed P+G 22.388 19.470 or PCA-reduced) produced consistently better glucose pre- S+B+P 52.481 53.238 dictions, and different models performed better on different S+B+G 22.504 19.484 feature sets. Table 1 lists our submitted systems and feature S+P+G 22.125 19.418 sets. B+P+G 22.616 19.577 XGBoost was the best-performing model on both the ex- S+B+P+G 22.45 19.573 panded and PCA-reduced feature sets, achieving a mean RMSE across all patients of 20.377. These results are at par with previously published models based on Support Vector last meal; and lagged features, up to 2 hours. 8-fold cross- Regression [Bunescu et al., 2013]. validation without shuffling was used on the full training set to optimize the number of boosting rounds. 
Experiments with LSTM Loss Functions Our LSTM models were simple and did not perform as well 4.2 Feature Ablation Experiments as recent LSTM models for blood-glucose prediction [Mir- Given the diversity of features sourced from biological mea- shekarian et al., 2017]. We submitted results for an LSTM surements, self-reported events, and non-invasive physiolog- model that was composed of: layers LSTM(64 nodes), ical signals, we conducted an ablation study to determine LSTM(64 nodes), Fully-connected(32 nodes); the relative performance of models on subsets of feature a dropout rate of 0.2; an Adadelta optimizer; and a look- groups. This experiment was performed with our best XG- back of 5 minutes. Boost model. From our expanded feature set, we created the We observed that Mean Absolute Error (MAE) improved following feature subsets: the performance of trained LSTMs over using Mean Squared Error (MSE) as the loss function. The models trained with • Self-reported features (S): meals, finger-stick glucose, MSE showed a degradation which was particularly severe for illness, stress, exercise, and work, together with missing- glucose values near hypoglycemic and hyperglycemic lev- value indicator columns, and one-hot encodings for meal els [Medtronic, 2010]. MAE, in contrast to MSE, does not type (41 features) penalize large errors as heavily as MSE, which likely helped • Basis Peak band features (B): heart rate, GSR, skin and improve performance over outlying cases. air temperature, steps, and sleep (6 features) The generally accepted error rate for finger-stick blood glu- cose measurements is 15 mg/dL [Food and Drug Administra- • Pump features (P): basal and temporary basal infusion tion, 2016]. Thus, for predictions within a 15 mg/dL window rates, bolus doses, together with missing-value indicator of the ground truth, the the loss for such values can be con- and one-hot encoding columns (10 features) sidered less impactful. 
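The ε-insensitive loss is not a stock Keras loss and would be supplied as a custom loss function. Our exact implementation is not reproduced here; the sketch below captures the arithmetic in numpy (a Keras version would express the same formula with backend tensor operations), with errors inside the ε band contributing zero.

```python
import numpy as np

EPS = 5.0  # mg/dL tolerance, the epsilon used in our experiments

def epsilon_insensitive_loss(y_true, y_pred, eps=EPS):
    """Mean epsilon-insensitive loss: absolute errors within eps mg/dL
    contribute zero; larger errors are penalized linearly beyond eps."""
    err = np.abs(np.asarray(y_true, dtype=float) -
                 np.asarray(y_pred, dtype=float))
    return float(np.mean(np.maximum(0.0, err - eps)))

# Errors of 3 and 10 mg/dL: only the second exceeds the 5 mg/dL band,
# contributing 10 - 5 = 5, so the mean loss is 2.5.
loss = epsilon_insensitive_loss([100.0, 150.0], [103.0, 160.0])
```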
We therefore investigated the use of • CGM glucose feature (G): blood glucose level recorded an -insensitive loss function for training our LSTM, with  via CGM sensor (1 feature) set as 5 mg/dL, for a more stringent boundary than 15 mg/dL. • Time features: one-hot encodings for hour-of-day and We compared the three loss functions: and found that train- day-of-week (31 features) ing LSTMs with MAE loss improved results (RMSE 24.586) over MSE loss (RMSE 30.097) with -insensitive loss per- The XGBoost model was trained on a feature vector contain- forming the best with an RMSE of 23.483. ing the current feature value as well as the previous 12 values, lagged at 5 minutes apart. In total, we investigated 15 combi- 4 Follow-up Experiments and Results nations of the S, P, B, and G features (Table 2). Time features were included in all combinations. 4.1 Post Challenge Submission Table 2 shows that: We tuned the hyperparameters of our best model, XGBoost, and added the following features to the expanded featureset: • Prediction suffers greatly when glucose (G) is ablated. first difference of CGM glucose; time since last bolus, meal, • The best model does not include insulin (P) or band (B) hypo event, and hypo correction; size of last bolus; carbs in features. Table 3: Average feature ranking for XGBoost for top 25 features points (black over gray) included in the test set, whereas cor- (with average rank of feature in brackets). lagN indicates a feature rected RMSE is computed with the interpolated points ex- value (N × 5) minutes in the past. cluded from the test set (no predictions when glucose values Feature importance Feature importance are missing). Under the linear interpolation scheme, mean without band features with band features RMSE improves from an unfiltered value of 18.540 to 16.214 ( 1.0) glucose level ( 1.0) glucose level mg/dL (Table 4). 
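The ablation experiments feed XGBoost the current value of each feature plus its previous 12 values at 5-minute spacing. A sketch of building such a lagged design matrix with pandas follows; the helper name, column names, and toy values are illustrative, not our exact code.

```python
import pandas as pd

def add_lags(df, columns, n_lags=12):
    """Append lagN copies of each column, where lagN holds the value
    from N rows (N x 5 minutes) in the past."""
    out = df.copy()
    for col in columns:
        for n in range(1, n_lags + 1):
            out[f"{col}_lag{n}"] = df[col].shift(n)
    return out

# Toy 5-minute-resampled glucose series (values illustrative).
frame = pd.DataFrame({"glucose": [100.0, 105.0, 110.0, 120.0]})
lagged = add_lags(frame, ["glucose"], n_lags=2)
```

Early rows, whose lags reach before the start of the series, hold NaN and would be dropped or imputed before training.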
We used XGBoost's feature importance rank to observe which features contributed the most decision splits within the trees of the model; more splits imply a higher importance in decision making. Table 3 shows the mean rank of each feature as determined by XGBoost's importance score. We observe that, on average, XGBoost's decisions are most influenced by (i) current glucose, (ii) glucose one hour ago, and (iii–xii) the other glucose values within the past hour. This holds irrespective of the inclusion of Basis Peak band features.

Table 3: Average feature ranking for XGBoost for the top 25 features (average rank in brackets). lagN indicates a feature value (N × 5) minutes in the past.

Without band features         With band features
( 1.0) glucose level          ( 1.0) glucose level
( 2.1) glucose level lag12    ( 2.1) glucose level lag12
( 6.0) glucose level lag6     ( 7.3) glucose level lag5
( 6.7) glucose level lag5     ( 8.5) glucose level lag6
( 7.7) glucose level lag4     (10.4) glucose level lag11
( 8.8) glucose level lag3     (13.0) glucose level lag10
(10.6) glucose level lag1     (13.5) glucose level lag4
(10.8) glucose level lag11    (14.8) glucose level lag7
(12.2) glucose level lag7     (15.0) glucose level lag8
(13.2) glucose level lag9     (15.8) glucose level lag1
(13.8) glucose level lag10    (17.5) glucose level lag2
(14.8) glucose level lag2     (17.8) glucose level lag9
(14.8) glucose level lag8     (18.1) glucose level lag3
(37.2) meal carbs lag3        (27.1) basis air temp
(40.2) finger stick lag5      (27.4) basis skin temp lag12
(43.7) finger stick lag7      (33.2) basis gsr lag12
(44.3) meal carbs lag2        (34.9) basis gsr
(49.1) finger stick lag8      (37.1) basis heart rate
(50.8) meal carbs             (38.9) basis heart rate lag12
(57.2) basal                  (48.2) basis heart rate lag4
(59.4) meal carbs lag4        (51.3) basis heart rate lag2
(60.6) finger stick lag6      (59.3) basis heart rate lag3
(64.2) meal carbs lag1        (62.1) basis heart rate lag1
(66.4) finger stick lag3      (62.8) meal carbs lag2
(67.8) finger stick lag4      (63.8) meal carbs lag4

4.3 Data Imputation Revisited

Missing values were observed in the CGM measurements of every patient. Initially, these data gaps were filled via linear interpolation in both the training and test sets. However, such imputation across gaps is only valid in a training and batch-prediction setting. In online prediction, new feature vectors stream into the model in real time, so missing values should only be imputed using past data.

By the same reasoning, any time intervals with missing glucose values should be excluded from the test-RMSE metric. We therefore corrected our implementation of test RMSE to include only data periods with available glucose. In Figure 2 (left), unfiltered RMSE is computed with the interpolated points (black over gray) included in the test set, whereas corrected RMSE is computed with the interpolated points excluded from the test set (no predictions when glucose values are missing). Under the linear interpolation scheme, mean RMSE improves from an unfiltered value of 18.540 to a corrected value of 16.110 mg/dL (Table 4). Note that other features may have values even when glucose is absent (Sec. 2.2).

We then ran our best model on three test-set imputation schemes, the latter two of which are compatible with online prediction: linear interpolation (linear), persisting the last valid value (ffill), and leaving gaps unchanged (none).

Figure 2 and Table 4 compare the RMSEs of these imputation schemes. The unfiltered-RMSE column in Table 4 gives model performance on interpolated test glucose values, without using the corrected RMSE function. The corrected-RMSE columns list performance for the three imputation schemes, using the corrected RMSE function. Interpolation with corrected RMSE performs best, but only the forward-filled and non-imputed schemes can be implemented in an online context. Figure 2 (left) illustrates incorrect interpolation of glucose values. Figure 2 (center) and (right) show the ffill and none schemes, both of which are compatible with online prediction.

As an interesting exercise, missing test-set glucose values were left unchanged, and XGBoost was allowed to make predictions on the remaining features in the absence of glucose (Figure 2, right panel, orange over gray). Predictions were more variable in the absence of the glucose signal, but appear to recover within two hours of the end of the data gap.
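The corrected (NaN-masked) RMSE and the two online-safe imputation schemes of Sec. 4.3 can be sketched as follows; the helper name and toy series are illustrative, not our exact implementation.

```python
import numpy as np
import pandas as pd

def corrected_rmse(y_true, y_pred):
    """RMSE over only the time steps where ground-truth glucose is
    present; imputed or missing targets are masked out (NaN-masked)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = ~np.isnan(y_true)
    return float(np.sqrt(np.mean((y_true[mask] - y_pred[mask]) ** 2)))

# Toy CGM series with a two-sample gap (values illustrative).
glucose = pd.Series([100.0, np.nan, np.nan, 130.0])

linear = glucose.interpolate()  # uses future values: batch-only
ffill = glucose.ffill()         # uses only past values: online-safe
```

Because the mask removes missing targets, predictions made during a CGM gap (however wild) never contribute to the corrected score.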
This remains true irrespective of inclusion of Basis tion for missing data imputation [Tresp and Briegel, 1998], Peak band features. as well as physiologically modeling glucose response sig- 4.3 Data Imputation Revisited nals as features [Bunescu et al., 2013; Zecchin et al., 2012; Contreras et al., 2017]. In each of the patient’s data, missing values were observed in For this work, we applied conventional feature-engineering CGM measurements. Initially, these data gaps were filled via methods. In the future, we would like to explore the inclusion linear interpolation for both the training and test sets. How- of features based on physiological models of bio-signals for ever, such an imputation across gaps is only valid in a train- prediction. As an extension to our ablation and feature im- ing and batch prediction setting. In online prediction, new portance study, we would also like to explore non-glucose feature vectors stream into the model in real time. Thus, to signals in-depth for glucose level prediction. reflect realistic online prediction, missing values should only be imputed using past data. To this point, any time intervals with missing glucose val- 6 Conclusion ues should be excluded from the test-RMSE metric. As Our main finding is that XGBoost remains competitive with such, we corrected our implementation of test-RMSE to in- previously reported ARIMA models [Bunescu et al., 2013], clude only data periods with available glucose. In Figure which supports glucose-derived features as the strongest pre- 2 (left), unfiltered RMSE is computed with the interpolated dictors of future glucose levels. Table 4: Post-submission XGBoost scores on test-set glucose with unfiltered and NaN-masked (corrected) RMSE, for three interpolation schemes. 
Interpolation: number linear number linear ffill no interpolation of test unfiltered of test corrected corrected corrected Patient-id points RMSE points RMSE RMSE RMSE 559 2985 16.598 2514 17.107 17.391 17.230 563 2884 18.509 2570 16.018 16.033 16.026 570 2972 15.401 2745 14.315 14.709 14.493 575 2758 21.217 2590 17.556 17.611 17.749 588 2875 18.964 2791 16.500 16.500 16.519 591 2949 20.552 2760 15.162 15.140 15.269 AVG 18.540 16.110 16.231 16.214 Patient 559 Patient 559 Patient 559 250 250 250 CGM glucose (mg/dL) CGM glucose (mg/dL) CGM glucose (mg/dL) 200 200 200 150 150 150 100 100 100 actual actual actual 50 predicted 50 predicted 50 predicted 12:00 15:00 18:00 21:00 00:00 03:00 06:00 09:00 12:00 12:00 15:00 18:00 21:00 00:00 03:00 06:00 09:00 12:00 12:00 15:00 18:00 21:00 00:00 03:00 06:00 09:00 12:00 21-Jan 21-Jan 21-Jan Figure 2: A representative data period in the test set of Patient 559, showing CGM glucose (blue), predicted glucose (orange), interpolated glucose (black), and spans of missing glucose (grey shaded regions). Three imputation schemes are compared: linear interpolation (left), forward-filling (center), and no imputation (right). We observed that in LSTMs, -insensitive loss proved a [American Diabetes Association, 2018] American Diabetes Asso- more effective loss function than MAE, as inspired by the ciation. Type 1 diabetes. http://www.diabetes.org/ notion of incorporating an error tolerance corresponding to diabetes-basics/type-1/, 2018. finger-stick measurement error. Interestingly, XGBoost mod- [Breiman, 2001] Leo Breiman. Random forests. Machine Learn- els outperformed LSTMs in our study. ing, 45(1):5–32, Oct 2001. The collaboration of life sciences with the practice of data [Bunescu et al., 2013] Razvan Bunescu, Nigel Struble, Cindy Mar- science offers the possibility of developing truly individual- ling, Jay Shubrook, and Frank Schwartz. Blood glucose level ized proactive medicine. 
By personalizing such predictive prediction using physiological models and support vector regres- models, we endeavour to further explore key signals—digital sion. In ICMLA, 2013, volume 1, pages 135–140. IEEE, 2013. biomarkers, digital surrogate measurements—which reflect [Che et al., 2018] Zhengping Che, Sanjay Purushotham, the strength of this interdisciplinary collaboration, and our Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent ability to transform the future of healthcare. neural networks for multivariate time series with missing values. Scientific reports, 8(1):6085, 2018. Acknowledgments [Chen and Guestrin, 2016a] Tianqi Chen and Carlos Guestrin. Xg- boost: A scalable tree boosting system. CoRR, abs/1603.02754, We would like to thank Michael Li and the rest of the Klick- 2016. Labs team at Klick Inc. for their feedback, support, and en- [Chen and Guestrin, 2016b] Tianqi Chen and Carlos Guestrin. Xg- couragements. boost: A scalable tree boosting system. In KDD, KDD ’16, pages 785–794, New York, NY, USA, 2016. ACM. [Chollet, 2015] François et al Chollet. Keras. https://keras. References io, 2015. [Allam et al., 2011] Fayrouz Allam, Zaki Nossai, Hesham Gomma, [Contreras et al., 2017] Iván Contreras, Silvia Oviedo, Martina Ibrahim Ibrahim, and Mona Abdelsalam. A recurrent neural net- Vettoretti, Roberto Visentin, and Josep Vehı́. Personalized blood work approach for predicting glucose concentration in type-1 di- glucose prediction: A hybrid approach using grammatical evo- abetic patients. In Engineering Applications of Neural Networks, lution and physiological models. PloS one, 12(11):e0187754, pages 254–259. Springer, 2011. 2017. [Food and Drug Administration, 2016] U.S. Food and Drug Administration. Self-monitoring blood glucose test sys- tems for over-the-counter use. https://www.fda.gov/ downloads/ucm380327.pdf, 2016. [Graf et al., 2017] Anneke Graf, Sybil A. McAuley, Catriona Sims, Johanna Ulloa, Alicia J. Jenkins, Gayane Voskanyan, and David N. 
[Graf et al., 2017] Anneke Graf, Sybil A. McAuley, Catriona Sims, Johanna Ulloa, Alicia J. Jenkins, Gayane Voskanyan, and David N. O'Neal. Moving toward a unified platform for insulin delivery and sensing of inputs relevant to an artificial pancreas. Journal of Diabetes Science and Technology, 11(2):308–314, 2017. PMID: 28264192.

[Hidalgo et al., 2017] J. Ignacio Hidalgo, J. Manuel Colmenar, Gabriel Kronberger, Stephan M. Winkler, Oscar Garnica, and Juan Lanchares. Data based prediction of blood glucose concentrations using evolutionary methods. Journal of Medical Systems, 41(9):142, 2017.

[Juvenile Diabetes Research Foundation, 2018] Juvenile Diabetes Research Foundation. Artificial pancreas. http://www.jdrf.org/research/artificial-pancreas/, 2018.

[Marling and Bunescu, 2018] C. Marling and R. Bunescu. The OhioT1DM dataset for blood glucose level prediction. In The 3rd International Workshop on Knowledge Discovery in Healthcare Data, Stockholm, Sweden, July 2018. CEUR proceedings in press, available at http://smarthealth.cs.ohio.edu/bglp/OhioT1DM-dataset-paper.pdf.

[Medtronic, 2010] Medtronic. The basics of insulin pump therapy. https://www.medtronicdiabetes.com/sites/default/files/library/download-library/workbooks/BasicsofInsulinPumpTherapy.pdf, 2010.

[Mirshekarian et al., 2017] Sadegh Mirshekarian, Razvan Bunescu, Cindy Marling, and Frank Schwartz. Using LSTMs to learn physiological models of blood glucose behavior. In EMBC, 2017, pages 2887–2891. IEEE, 2017.

[Pappada et al., 2008] Scott M. Pappada, Brent D. Cameron, and Paul M. Rosman. Development of a neural network for prediction of glucose concentration in type 1 diabetes patients. Journal of Diabetes Science and Technology, 2(5):792–801, 2008.

[Pedregosa et al., 2011] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[Plis et al., 2014] Kevin Plis, Razvan C. Bunescu, Cindy Marling, Jay Shubrook, and Frank Schwartz. A machine learning approach to predicting blood glucose levels for diabetes management. In AAAI Workshop: Modern Artificial Intelligence for Health Analytics, number 31, pages 35–39, 2014.

[Tresp and Briegel, 1998] Volker Tresp and Thomas Briegel. A solution for missing data in recurrent neural networks with an application to blood glucose prediction. In NIPS, pages 971–977, 1998.

[World Health Organization, 2016] World Health Organization. Global Report on Diabetes. World Health Organization, 2016.

[Zecchin et al., 2012] C. Zecchin, A. Facchinetti, G. Sparacino, G. De Nicolao, and C. Cobelli. Neural network incorporating meal information improves accuracy of short-time prediction of glucose concentration. IEEE Transactions on Biomedical Engineering, 59(6):1550–1560, June 2012.