<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The OhioT1DM Dataset for Blood Glucose Level Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cindy Marling</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Razvan Bunescu</string-name>
          <email>bunescug@ohio.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Electrical Engineering and Computer Science, Ohio University</institution>
          ,
          <addr-line>Athens, Ohio 45701</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper documents the OhioT1DM Dataset, which was developed to promote and facilitate research in blood glucose level prediction. It contains eight weeks' worth of continuous glucose monitoring, insulin, physiological sensor, and self-reported life-event data for each of six people with type 1 diabetes. An associated graphical software tool allows researchers to visualize the integrated data. The paper details the contents and format of the dataset and tells interested researchers how to obtain it.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Blood glucose level (BGL) prediction is a challenging task for
AI researchers, with the potential to improve the health and
wellbeing of people with diabetes. Knowing in advance when
blood glucose is approaching unsafe levels provides time to
proactively avoid hypo- and hyper-glycemia and their
concomitant complications. The drive to perfect an artificial
pancreas [Juvenile Diabetes Research Foundation (JDRF), 2018]
has increased the interest in using machine learning (ML)
approaches to improve prediction accuracy. Work in this area
has been hindered, however, by a lack of real patient data;
some researchers have only been able to work on simulated
patient data.</p>
      <p>To promote and facilitate research in blood glucose level
prediction, we have curated the OhioT1DM Dataset and made
it publicly available for research purposes. To the best of our
knowledge, this is the first publicly available dataset to
include continuous glucose monitoring, insulin, physiological
sensor, and self-reported life-event data for people with type
1 diabetes.</p>
      <p>The OhioT1DM Dataset contains eight weeks’ worth of
data for each of six people with type 1 diabetes. All data
contributors were between 40 and 60 years of age at the time of
the data collection. Two were male, and four were female. All
were on insulin pump therapy with continuous glucose
monitoring (CGM). They wore Medtronic 530G insulin pumps and
used Medtronic Enlite CGM sensors throughout the 8-week
data collection period. They reported life-event data via a
custom smartphone app and provided physiological data from a
Basis Peak fitness band.</p>
      <p>The dataset includes: a CGM blood glucose level every 5
minutes; blood glucose levels from periodic self-monitoring
of blood glucose (finger sticks); insulin doses, both bolus and
basal; self-reported meal times with carbohydrate estimates;
self-reported times of exercise, sleep, work, stress, and
illness; and 5-minute aggregations of heart rate, galvanic skin
response (GSR), skin temperature, air temperature, and step
count.</p>
      <p>The rest of this paper provides background information,
details the data format, describes the OhioT1DM Viewer
visualization software, and tells how to obtain the OhioT1DM
Dataset and Viewer for research purposes.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>We have been working on intelligent systems for diabetes
management for over a decade [Schwartz et al., 2008;
Marling et al., 2009; Marling et al., 2012; Bunescu et al., 2013;
Plis et al., 2014; Marling et al., 2016; Mirshekarian et al.,
2017]. As part of our work, we have run five clinical
research studies involving subjects with type 1 diabetes on
insulin pump therapy. Over 50 anonymous subjects have
provided blood glucose, insulin, and life-event data so that we
could develop software intended to help people with diabetes
and their professional health care providers.</p>
      <p>Throughout the years, we have received numerous requests
to share the data with other researchers. Our most recent
study was designed so that de-identified data could be shared
with the research community. All data contributors to the
OhioT1DM dataset signed informed consent documents
allowing us to share their de-identified data with outside
researchers. This agreement clearly delineated what types of
data could be shared and with whom. The data in the dataset
was fully de-identified according to the Safe Harbor method,
a standard specified by the Health Insurance Portability and
Accountability Act (HIPAA) Privacy Rule [Office for Civil
Rights, 2012]. To protect the data and ensure that it is used
only for research purposes, a Data Use Agreement (DUA)
must be executed before a researcher can obtain the data.</p>
    </sec>
    <sec id="sec-3">
      <title>OhioT1DM Data Format</title>
      <p>In the OhioT1DM Dataset, the data contributors are referred
to by ID numbers 559, 563, 570, 575, 588, and 591. Numbers
563 and 570 were male, while numbers 559, 575, 588, and 591
were female. For each data contributor, there is one XML file
for training and development data and a separate XML file
for testing data. This results in a total of 12 XML files, two
for each of the six contributors. Table 1 shows the number of
training and test examples for each contributor.</p>
      <p>Each XML file contains the following data fields:</p>
      <list list-type="order">
        <list-item>
          <label>1.</label>
          <p>&lt;patient&gt; The patient ID number and insulin type.
Weight is set to 99 as a placeholder, as actual patient
weights are unavailable.</p>
        </list-item>
        <list-item>
          <label>2.</label>
          <p>&lt;glucose_level&gt; Continuous glucose monitoring
(CGM) data, recorded every 5 minutes.</p>
        </list-item>
        <list-item>
          <label>3.</label>
          <p>&lt;finger_stick&gt; Blood glucose values obtained through
self-monitoring by the patient.</p>
        </list-item>
        <list-item>
          <label>4.</label>
          <p>&lt;basal&gt; The rate at which basal insulin is continuously
infused. The basal rate begins at the specified timestamp
ts, and it continues until another basal rate is set.</p>
        </list-item>
        <list-item>
          <label>5.</label>
          <p>&lt;temp_basal&gt; A temporary basal insulin rate that
supersedes the patient’s normal basal rate. A value of 0
indicates that the basal insulin flow has been
suspended. At the end of a temp_basal, the basal rate
goes back to the normal basal rate, &lt;basal&gt;.</p>
        </list-item>
        <list-item>
          <label>6.</label>
          <p>&lt;bolus&gt; Insulin delivered to the patient, typically
before a meal or when the patient is hyperglycemic. The
most common type of bolus, normal, delivers all insulin
at once. Other bolus types can stretch out the insulin
dose over the period between ts_begin and ts_end.</p>
        </list-item>
        <list-item>
          <label>7.</label>
          <p>&lt;meal&gt; The self-reported time and type of a meal, plus
the patient’s carbohydrate estimate for the meal.</p>
        </list-item>
        <list-item>
          <label>8.</label>
          <p>&lt;sleep&gt; The times of self-reported sleep, plus the
patient’s subjective assessment of sleep quality: 1 for Poor;
2 for Fair; 3 for Good.</p>
        </list-item>
        <list-item>
          <label>9.</label>
          <p>&lt;work&gt; Self-reported times of going to and from work.
Intensity is the patient’s subjective assessment of
physical exertion, on a scale of 1 to 10, with 10 being most
physically active.</p>
        </list-item>
        <list-item>
          <label>10.</label>
          <p>&lt;stressors&gt; Time of self-reported stress.</p>
        </list-item>
        <list-item>
          <label>11.</label>
          <p>&lt;hypo_event&gt; Time of self-reported hypoglycemic
episode. Symptoms are not available, although there is
a slot for them in the XML file.</p>
        </list-item>
        <list-item>
          <label>12.</label>
          <p>&lt;illness&gt; Time of self-reported illness.</p>
        </list-item>
        <list-item>
          <label>13.</label>
          <p>&lt;exercise&gt; Time and duration, in minutes, of
self-reported exercise. Intensity is the patient’s subjective
assessment of physical exertion, on a scale of 1 to 10,
with 10 being most physically active.</p>
        </list-item>
        <list-item>
          <label>14.</label>
          <p>&lt;basis_heart_rate&gt; Heart rate, aggregated every 5
minutes.</p>
        </list-item>
        <list-item>
          <label>15.</label>
          <p>&lt;basis_gsr&gt; Galvanic skin response, also known as
skin conductance, aggregated every 5 minutes.</p>
        </list-item>
        <list-item>
          <label>16.</label>
          <p>&lt;basis_skin_temperature&gt; Skin temperature, in
degrees Fahrenheit, aggregated every 5 minutes.</p>
        </list-item>
        <list-item>
          <label>17.</label>
          <p>&lt;basis_air_temperature&gt; Air temperature, in degrees
Fahrenheit, aggregated every 5 minutes.</p>
        </list-item>
        <list-item>
          <label>18.</label>
          <p>&lt;basis_steps&gt; Step count, aggregated every 5 minutes.</p>
        </list-item>
        <list-item>
          <label>19.</label>
          <p>&lt;basis_sleep&gt; Times when the Basis band reported that
the subject was asleep, along with its estimate of sleep
quality.</p>
        </list-item>
      </list>
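      <p>As a rough illustration of how these fields might be read, the
following Python sketch parses a small made-up file with the standard
library and resolves the basal rate in effect at a given time. The nesting
of &lt;event&gt; elements, the value attribute, the insulin_type attribute
name, the sample values, and the timestamp format are all assumptions made
for this example and should be checked against the actual files. The ts,
ts_begin, and ts_end attributes follow the field descriptions above.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical miniature file; the real layout and values may differ.
SAMPLE = """
<patient id="559" weight="99" insulin_type="Example">
  <glucose_level>
    <event ts="07-12-2021 01:20:00" value="98"/>
    <event ts="07-12-2021 01:15:00" value="101"/>
  </glucose_level>
  <basal>
    <event ts="07-12-2021 00:00:00" value="0.9"/>
  </basal>
  <temp_basal>
    <event ts_begin="07-12-2021 01:18:00" ts_end="07-12-2021 02:00:00" value="0"/>
  </temp_basal>
</patient>
"""

TS_FORMAT = "%d-%m-%Y %H:%M:%S"  # assumed timestamp format


def parse_ts(text):
    """Convert a timestamp string into a datetime, using the assumed format."""
    return datetime.strptime(text, TS_FORMAT)


def glucose_series(root):
    """Return (timestamp, value) pairs from <glucose_level>, in time order."""
    pairs = [
        (parse_ts(e.get("ts")), float(e.get("value")))
        for e in root.findall("glucose_level/event")
    ]
    return sorted(pairs)


def basal_rate_at(root, when):
    """Effective basal rate at `when`; an active temp basal overrides <basal>."""
    for e in root.findall("temp_basal/event"):
        if parse_ts(e.get("ts_begin")) <= when < parse_ts(e.get("ts_end")):
            return float(e.get("value"))
    # Otherwise use the most recent <basal> rate set at or before `when`.
    current = None
    for e in sorted(root.findall("basal/event"), key=lambda ev: parse_ts(ev.get("ts"))):
        if parse_ts(e.get("ts")) <= when:
            current = float(e.get("value"))
    return current


root = ET.fromstring(SAMPLE.strip())
readings = glucose_series(root)
```
]]></preformat>
      <p>With the sample above, basal_rate_at returns 0.9 before the temporary
basal starts and 0 while it is active, mirroring the rule that a temporary
basal supersedes the normal rate until it ends.</p>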
      <p>Note that, in de-identifying the dataset, all dates were shifted
by the same random amount of time into the future. The days
of the week and the times of day were maintained in the new
timeframes. However, the months were shifted, so that it is
not possible to consider the effects of seasonality or of
holidays.</p>
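      <p>One way a date shift can keep days of the week and times of day
intact is to move every timestamp by the same whole number of weeks. The
Python sketch below illustrates that property only. The offset and the
example timestamp are made up, and this is not necessarily the procedure
used to de-identify the dataset.</p>
      <preformat><![CDATA[
```python
from datetime import datetime, timedelta


def shift_by_weeks(timestamp, weeks):
    """Shift a timestamp by a whole number of weeks."""
    return timestamp + timedelta(weeks=weeks)


original = datetime(2018, 3, 5, 7, 45)   # made-up example timestamp
shifted = shift_by_weeks(original, 170)  # 170 weeks is an arbitrary illustrative offset

# A multiple of 7 days leaves the weekday and the clock time unchanged,
# while the calendar date (and usually the month) moves.
same_weekday = shifted.weekday() == original.weekday()
same_time = shifted.time() == original.time()
```
]]></preformat>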
    </sec>
    <sec id="sec-4">
      <title>The OhioT1DM Viewer</title>
      <p>The OhioT1DM Viewer is a visualization tool that opens an
XML file from the OhioT1DM Dataset and graphically
displays the integrated data. It aids in developing intuition about
the data and also in debugging. For example, if a system
makes a poor blood glucose level prediction at a particular
point in time, viewing the data at that time might illuminate
a cause: the subject might have forgotten to
report a meal or might have been feeling ill or stressed.</p>
      <p>Figure 1 shows a screenshot from the OhioT1DM Viewer.
The data is displayed one day at a time, from midnight to
midnight. Controls allow the user to move from day to day
and to toggle any type of data off or on for targeted viewing.</p>
      <p>The bottom pane shows blood glucose, insulin, and
self-reported life-event data. CGM data is displayed as a mostly
blue curve, with green points indicating hypoglycemia.
Finger sticks are displayed as red dots. Boluses are displayed
along the horizontal axis as orange and yellow circles. The
basal rate is indicated as a black line. Temporary basal rates
appear as red lines. Self-reported sleep is indicated by blue
regions. Life-event icons appear at the top of the pane as dots,
squares, and triangles. The data in the bottom pane is
clickable, so that additional information about any data point can
be displayed. For example, clicking on a meal (a square blue
icon) displays the timestamp, type of meal, and carbohydrate
estimate.</p>
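      <p>The day-at-a-time view and the highlighting of hypoglycemic CGM
points can be approximated outside the Viewer. The Python sketch below
groups readings by calendar day and picks out those under a cutoff. The
70 mg/dL threshold, the units, and the sample readings are assumptions for
illustration; the paper does not state which cutoff the Viewer uses.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict
from datetime import datetime

HYPO_THRESHOLD = 70.0  # assumed cutoff in mg/dL; not stated in the paper


def readings_by_day(readings):
    """Group (timestamp, value) pairs by calendar date, midnight to midnight."""
    days = defaultdict(list)
    for ts, value in readings:
        days[ts.date()].append((ts, value))
    return dict(days)


def hypo_points(day_readings, threshold=HYPO_THRESHOLD):
    """Return the readings that fall below the assumed hypoglycemia cutoff."""
    return [(ts, value) for ts, value in day_readings if value < threshold]


# Made-up readings that straddle midnight.
sample = [
    (datetime(2021, 12, 7, 23, 55), 82.0),
    (datetime(2021, 12, 8, 0, 0), 68.0),
    (datetime(2021, 12, 8, 0, 5), 74.0),
]
by_day = readings_by_day(sample)
second_day = by_day[datetime(2021, 12, 8).date()]
low = hypo_points(second_day)
```
]]></preformat>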
      <p>The top pane displays data from the Basis Peak fitness
band. Blue regions in the top pane are times when the fitness
band reported that the subject was asleep. The step count
is indicated by vertical blue lines. The curves show heart rate
(red), galvanic skin response (green), skin temperature (gold),
and air temperature (cyan).</p>
    </sec>
    <sec id="sec-4-obtaining">
      <title>Obtaining the Dataset and Viewer</title>
      <p>The OhioT1DM Dataset and Viewer are initially
available to participants in the Blood Glucose Level
Prediction (BGLP) Challenge of the Third International
Workshop on Knowledge Discovery in Healthcare Data at
IJCAI-ECAI 2018, in Stockholm, Sweden. After the BGLP
Challenge, these resources become publicly available to
other researchers. To protect the data and ensure that it
is used only for research purposes, a Data Use
Agreement (DUA) is required. A DUA is a binding document
signed by legal signatories of Ohio University and the
researcher’s home institution. As of this writing, researchers
can request a DUA at
https://sites.google.com/view/kdhd2018/bglp-challenge. Once a DUA is executed, the Dataset
and Viewer will be directly released to the researcher.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>The OhioT1DM Dataset was developed to promote and
facilitate research in blood glucose level prediction. Accurate
blood glucose level predictions could positively impact the
health and well-being of people with diabetes. In addition to
their role in the artificial pancreas project, such predictions
could also enable other beneficial applications, such as
decision support for avoiding impending problems, “what if”
analysis to project the effects of different lifestyle choices,
and enhanced blood glucose profiles to aid in individualizing
diabetes care. It is our hope that sharing this Dataset will help
to advance the state of the art in blood glucose level
prediction.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by grant 1R21EB022356 from the
National Institutes of Health (NIH). The OhioT1DM Viewer
was implemented by Robin Kelby, based on earlier
visualization software built by Hannah Quillin and Charlie
Murphy. The authors gratefully acknowledge the contributions of
Emeritus Professor of Endocrinology Frank Schwartz, MD, a
pioneer in building intelligent systems for diabetes
management. We would also like to thank our physician
collaborators, Aili Guo, MD, and Amber Healy, DO, our research
nurses, Cammie Starner and Lynn Petrik, and our past and
present graduate and undergraduate research assistants. We
are especially grateful to the anonymous individuals with type
1 diabetes who shared their data, enabling the creation of this
dataset.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bunescu et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bunescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Struble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Marling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shubrook</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          .
          <article-title>Blood glucose level prediction using physiological models and support vector regression</article-title>
          .
          <source>In Proceedings of the Twelfth International Conference on Machine Learning and Applications (ICMLA)</source>
          , pages
          <fpage>135</fpage>
          -
          <lpage>140</lpage>
          . IEEE Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Juvenile Diabetes Research Foundation (JDRF),
          <year>2018</year>
          ]
          <collab>Juvenile Diabetes Research Foundation (JDRF)</collab>
          .
          <source>Artificial Pancreas</source>
          ,
          <year>2018</year>
          . Available at http://www.jdrf.org/research/artificial-pancreas/, accessed June,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Marling et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>C.</given-names>
            <surname>Marling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shubrook</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          .
          <article-title>Toward case-based reasoning for diabetes management: A preliminary clinical study and decision support system prototype</article-title>
          .
          <source>Computational Intelligence</source>
          ,
          <volume>25</volume>
          (
          <issue>3</issue>
          ):
          <fpage>165</fpage>
          -
          <lpage>179</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Marling et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>C.</given-names>
            <surname>Marling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bunescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shubrook</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          .
          <article-title>Emerging applications for intelligent diabetes management</article-title>
          .
          <source>AI Magazine</source>
          ,
          <volume>33</volume>
          (
          <issue>2</issue>
          ):
          <fpage>67</fpage>
          -
          <lpage>78</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Marling et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>C.</given-names>
            <surname>Marling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bunescu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          .
          <article-title>Machine learning experiments with noninvasive sensors for hypoglycemia detection</article-title>
          .
          <source>In IJCAI 2016 Workshop on Knowledge Discovery in Healthcare Data</source>
          , New York, NY,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Mirshekarian et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mirshekarian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bunescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Marling</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          .
          <article-title>Using LSTMs to Learn Physiological Models of Blood Glucose Behavior</article-title>
          .
          <source>In Proceedings of the 39th International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC'17)</source>
          , pages
          <fpage>2887</fpage>
          -
          <lpage>2891</lpage>
          , Jeju Island, Korea,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Office for Civil Rights,
          <year>2012</year>
          ]
          <collab>Office for Civil Rights</collab>
          .
          <article-title>Guidance regarding methods for de-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) privacy rule</article-title>
          ,
          <year>2012</year>
          . Available at https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/understanding/coveredentities/De-identification/hhs_deid_guidance.pdf, accessed June,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Plis et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>K.</given-names>
            <surname>Plis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bunescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Marling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shubrook</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          .
          <article-title>A machine learning approach to predicting blood glucose levels for diabetes management</article-title>
          .
          <source>In Modern Artificial Intelligence for Health Analytics: Papers Presented at the Twenty-Eighth AAAI Conference on Artificial Intelligence</source>
          , pages
          <fpage>35</fpage>
          -
          <lpage>39</lpage>
          . AAAI Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Schwartz et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Shubrook</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Marling</surname>
          </string-name>
          .
          <article-title>Use of case-based reasoning to enhance intensive management of patients on insulin pump therapy</article-title>
          .
          <source>Journal of Diabetes Science and Technology</source>
          ,
          <volume>2</volume>
          (
          <issue>4</issue>
          ):
          <fpage>603</fpage>
          -
          <lpage>611</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>