<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Work in Progress: A Protocol for the Collection, Analysis, and Interpretation of Log Data from eHealth Technology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Floor Sieverink</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saskia Kelders</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saskia Akkersdijk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mannes Poel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liseth Siemons</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lisette van Gemert-Pijnen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for eHealth and Wellbeing Research, Department of Psychology, Health and Technology, University of Twente</institution>
          ,
          <addr-line>Enschede</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Human Media Interaction, University of Twente</institution>
          ,
          <addr-line>Enschede</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <fpage>56</fpage>
      <lpage>60</lpage>
      <abstract>
        <p>Randomized controlled trials that evaluate the effectiveness of eHealth technologies provide little insight into why a particular outcome occurred. Log data analysis is a promising methodology for explaining the observed effects of eHealth technologies and for improving those effects. In this paper, we describe our experiences so far with the collection, analysis, and interpretation of log data from eHealth technology. The paper serves as a first step towards the development of a log data protocol to support eHealth research; the protocol will be extended and validated for different types of research questions and eHealth applications in the future.</p>
      </abstract>
      <kwd-group>
        <kwd>eHealth</kwd>
        <kwd>evaluation</kwd>
        <kwd>log data analysis</kwd>
        <kwd>black box</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Although persuasive eHealth technologies aim to support people in changing their
behaviour or attitudes, one of the main problems is that the adoption and subsequent use
of such technologies remain low [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ]. Moreover, most eHealth research is
dominated by a classic conception of medical research in which randomized controlled trials
(RCTs) are the gold standard for measuring outcomes. This type of research
focuses on the effectiveness of the technology, without revealing how and why the
technology has contributed to this effectiveness. By conducting RCTs only, we do not
really know why a particular outcome occurred and how the technology supports the
users in healthier living, improving wellbeing, or conducting daily tasks [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. We
call this the ‘Black Box Phenomenon’ [
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ]. To open this Black Box, it is necessary
to look for methodologies that go beyond the classic conception of effect evaluations
where pretests and posttests are the standard.
      </p>
      <p>
        The analysis of log data (anonymous records of real-time actions performed by
each user) is a promising methodology for explaining the observed effects of a technology
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This data has the potential to provide insight into the navigation process (which
functionalities are used and in what order) and to look beyond the mere amount of use.
After all, longer exposure to an eHealth technology might indicate that the
system fits the needs of its users, but it can also indicate unfocused and
non-strategic use or an inefficient system [
        <xref ref-type="bibr" rid="ref4 ref6">4, 6</xref>
        ].
      </p>
      <p>
        In our research, we therefore use log data to examine, among other things, how users
navigate through an application (which functionalities of the application are used and in
what order), at which points users drop out, and when the technology is used [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6-8</xref>
        ].
This information provides input for improving the content, the layout, and the
underlying system of the application and, in turn, for increasing the persuasiveness of the
application and its long-term usage.
      </p>
      <p>In this paper, we describe our work in progress regarding the handling, analysis,
and interpretation of log data from eHealth technologies as a starting point for the
development of a log data protocol for eHealth research.</p>
    </sec>
    <sec id="sec-2">
      <title>The collection of log data</title>
      <p>Depending on the research questions (information requests), there are different
ways to collect log data. Google Analytics, for example, can provide information
about the quantity of use by all users as a group. However, to gain more in-depth
knowledge of the usage patterns of individual users, log data collected from
the client side (actions of users are logged) or the server side (requests to the server are
logged) of the technology provides richer information.</p>
      <p>It is recommended to think about the research questions and the type of
information that is needed before the application is built, since it often costs less time (and
money) to build the logging functionality into a new application than into
an existing one. Also, information that is not logged from the beginning can
often not be recovered.</p>
      <p>When log data is not collected from the beginning, it might be difficult to interpret
the results, since valuable information is missing about how usage patterns developed
over time. Depending on the research question, a solution might be to only follow the
users that are logged from the first visit.</p>
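      <p>As a minimal sketch of this solution, assume that the registration date of every user and the date on which logging was switched on are both known. The names <monospace>registered_on</monospace> and <monospace>logging_started</monospace>, as well as all dates, are hypothetical and only serve as an illustration. The users whose complete history is available could then be selected as follows:</p>
      <preformat><![CDATA[
```python
from datetime import date


def users_logged_from_first_visit(registered_on, logging_started):
    """Return the users who registered on or after the day logging began."""
    return sorted(
        user
        for user, registered in registered_on.items()
        if registered >= logging_started
    )


# Hypothetical registration dates; logging is assumed to start on 1 March 2014.
registered_on = {
    "John": date(2014, 1, 15),
    "Mary": date(2014, 3, 1),
    "George": date(2014, 4, 20),
}
complete_users = users_logged_from_first_visit(registered_on, date(2014, 3, 1))
```
]]></preformat>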
      <p>However, (eHealth) technologies are often updated after the first release. It is thus
important to check after every system update whether the use of new and changed
functionalities is logged, in order to get a complete picture of the (changes in) usage
patterns.</p>
      <p>In Figure 1, an example of a fictional log data file is given. In this data, every
record (row) describes one action of a user and contains an (anonymous) user identity, a
timestamp, and an identification of the action. To meet the information request, the data files
should contain the needed information and the data should be available for analysis
under the applicable privacy regulations. Furthermore, the data should be representative of
future use and be of sufficient quality because, as always, “garbage in,
garbage out”.</p>
      <p>[Figure 1. Example of a fictional log data file. Only the User column is recoverable: John, Mary, John, Mary, Mary, George, George, George, George.]</p>
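      <p>The record structure described above (user identity, timestamp, action) can be sketched in code. The example below is only an illustration under stated assumptions: the semicolon-separated layout, the timestamp format, and the action names are invented, since the paper does not prescribe a file format.</p>
      <preformat><![CDATA[
```python
import csv
import io
from datetime import datetime
from typing import NamedTuple


class LogRecord(NamedTuple):
    user: str            # anonymous user identity
    timestamp: datetime  # moment at which the action was performed
    action: str          # identification of the action


def read_log(text):
    """Parse semicolon-separated log lines into LogRecord tuples, sorted by time."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    records = [
        LogRecord(user, datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), action)
        for user, stamp, action in reader
    ]
    return sorted(records, key=lambda record: record.timestamp)


# Fictional example rows; the timestamps and action names are invented.
example = """John;2014-03-01 09:00:12;login
Mary;2014-03-01 09:05:40;login
John;2014-03-01 09:01:03;view_education
"""
records = read_log(example)
```
]]></preformat>
      <p>Sorting by timestamp at read time keeps the order of actions reliable for every later analysis step, even when the rows arrive out of order.</p>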
    </sec>
    <sec id="sec-3">
      <title>Mathematical translation of log data</title>
      <sec id="sec-3-1">
        <title>Data preparation</title>
        <p>The log data that is needed for the analysis commonly consists of tens of
thousands of records. To handle large amounts of (unstructured) log data, we are currently
developing a tool to generate data reports (e.g. number of logins, number of activities,
order of the activities) that are ready to use for further analysis.</p>
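        <p>The kind of report mentioned above could look roughly as follows. This is a hedged sketch and not the tool under development: it assumes that every record is a (user, timestamp, action) tuple, that the records are already sorted by time, and that logins are recorded under a hypothetical action name <monospace>login</monospace>.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def usage_report(records, login_action="login"):
    """Summarise (user, timestamp, action) records per user.

    The records are assumed to be sorted by timestamp already.
    """
    report = defaultdict(lambda: {"logins": 0, "activities": 0, "order": []})
    for user, _timestamp, action in records:
        entry = report[user]
        if action == login_action:
            entry["logins"] += 1
        else:
            entry["activities"] += 1
            entry["order"].append(action)
    return dict(report)


# Fictional records with invented timestamps and action names.
records = [
    ("John", "2014-03-01T09:00", "login"),
    ("John", "2014-03-01T09:01", "view_education"),
    ("Mary", "2014-03-01T09:05", "login"),
    ("John", "2014-03-01T09:07", "add_goal"),
    ("Mary", "2014-03-02T10:00", "login"),
    ("Mary", "2014-03-02T10:02", "add_weight"),
]
report = usage_report(records)
```
]]></preformat>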
      </sec>
      <sec id="sec-3-2">
        <title>Data analysis</title>
        <p>
          Once the log datasets are prepared, the files are ready for analysis. Up to now, our
analyses have mostly consisted of counting the occurrence of different usage patterns [
          <xref ref-type="bibr" rid="ref6 ref7 ref8">6-8</xref>
          ].
However, more information can be obtained by applying machine learning
algorithms, for example by transforming the data into a format that is readable by the
Weka (Waikato Environment for Knowledge Analysis) tool [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and using this tool for
visualizations of the data and predictive modelling. The following methods for
analysis can be used:
        </p>
        <list list-type="bullet">
          <list-item>
            <p>Supervised learning: predicting adherence and effects from early use patterns, which enables early action for groups at risk [<xref ref-type="bibr" rid="ref10">10</xref>].</p>
          </list-item>
          <list-item>
            <p>Unsupervised learning: what usage profiles appear from the log data and can we match those to a certain group of participants? [<xref ref-type="bibr" rid="ref10">10</xref>]</p>
          </list-item>
          <list-item>
            <p>Markov modelling: what is the dominant path through the system? [<xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>]</p>
          </list-item>
          <list-item>
            <p>Market-basket analysis: what features are often used together? [<xref ref-type="bibr" rid="ref13">13</xref>]</p>
          </list-item>
        </list>
        <p>The results will provide scientific input as well as practical input for system
improvements.</p>
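        <p>To illustrate the Markov modelling question (what is the dominant path through the system?), the sketch below estimates first-order transition probabilities from action sequences and then greedily follows the most probable transition. It is a simplified illustration, not the analysis used in our studies: the session data, the action names, and the artificial <monospace>START</monospace> and <monospace>END</monospace> states are assumptions, and a greedy walk does not necessarily yield the most probable complete path.</p>
        <preformat><![CDATA[
```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities from action sequences."""
    counts = defaultdict(Counter)
    for actions in sessions:
        path = ["START", *actions, "END"]
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def dominant_path(probabilities, max_steps=20):
    """Greedily follow the most probable transition from START until END."""
    path, state = [], "START"
    for _ in range(max_steps):  # the cap guards against cycles
        state = max(probabilities[state], key=probabilities[state].get)
        if state == "END":
            break
        path.append(state)
    return path


# Fictional sessions; each list is the ordered actions of one visit.
sessions = [
    ["login", "add_goal", "add_weight"],
    ["login", "add_goal", "view_education"],
    ["login", "add_goal", "add_weight"],
    ["login", "view_education"],
]
probabilities = transition_probabilities(sessions)
path = dominant_path(probabilities)
```
]]></preformat>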
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Scientific and practical translation of log data</title>
      <sec id="sec-4-1">
        <title>Scientific translation</title>
        <p>
          As stated in the introduction, most eHealth research is dominated by a classic
conception of medical research in which randomized controlled trials (RCTs) are the gold
standard for measuring outcomes. However, this type of research focuses on the
effectiveness of the technology, without revealing how and why the technology has
contributed to this effectiveness (the ‘Black Box Phenomenon’) [
          <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
          ]. The results of
the log data analysis provide input for opening this black box.
        </p>
        <p>
          However, a log data analysis does not give information about why certain results
occur. For example, from previous research we know that almost all users of a
Personal Health Record for patients with Type 2 Diabetes Mellitus drop out when they
use the education service as a first step in their first session [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Additional research is
needed to find out why: for example, are users overwhelmed by the large amount of
information, or did the information not meet their expectations? Although a log
data analysis in itself does not always provide a complete picture, it does provide
specific targets for additional research, for example via interviews,
questionnaires, or usability tests.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Practical translation</title>
        <p>Besides its scientific value, log data analysis can be of added value for designers
and healthcare providers as well. For example, information about the dominant path
through the system can be used as input for adapting the system design to the users
and improving the fit between the two, in order to make the technology more
persuasive. Also, the results of a market-basket analysis can be used to give
suggestions to the users regarding their follow-up actions on the system (e.g. “You have
added a goal, other users have added their current weight as well. Click here to add
your weight”). This can also be incorporated into the system to give real-time feedback
to the users.</p>
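        <p>A suggestion of this kind could be derived from simple association rules. The sketch below is an assumption-laden illustration rather than an implemented feature: it treats the set of features used in one session as a basket, and computes the support and confidence of rules between pairs of features. The feature names, the example baskets, and the support threshold are invented.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules 'a -> b' over feature pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        if support >= min_support:
            rules[(a, b)] = (support, together / item_counts[a])
            rules[(b, a)] = (support, together / item_counts[b])
    return rules


# Each basket holds the features used in one fictional session.
baskets = [
    ["add_goal", "add_weight"],
    ["add_goal", "add_weight", "view_education"],
    ["add_goal"],
    ["add_weight", "add_goal"],
]
rules = pair_rules(baskets)
```
]]></preformat>
        <p>In this fictional data, the rule from adding a goal to adding a weight has a confidence of 0.75, which is the kind of evidence that could back the example message quoted above.</p>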
        <p>Because log data analysis via (un)supervised learning can identify
users who are likely to drop out of the intervention, healthcare providers
have the opportunity to intervene and stimulate users to continue using the
system. Healthcare providers can also use log data analysis to see how the
response time to messages affects the adherence of users to the system, which makes log
data analysis of added value in composing protocols for (blended) care via eHealth
technologies.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>The analysis of log data can be of great value for scientists, designers,
caregivers, and policy makers for opening the black box of eHealth technology.
However, from the collection of log data to the translation of the results into valuable
information, several steps need to be taken, each with its own considerations. This paper
serves as a first step towards the development of a log data protocol for data
collection, analysis, and interpretation to support eHealth research. The protocol will be
extended and validated for different types of research questions and eHealth applications in the
future.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Nijland</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>van Gemert-Pijnen</surname>, <given-names>J.E.</given-names></string-name>,
          <string-name><surname>Kelders</surname>, <given-names>S.M.</given-names></string-name>,
          <string-name><surname>Brandenburg</surname>, <given-names>B.J.</given-names></string-name>,
          <string-name><surname>Seydel</surname>, <given-names>E.R.</given-names></string-name>:
          <article-title>Factors influencing the use of a Web-based application for supporting the self-care of patients with type 2 diabetes: a longitudinal study</article-title>.
          <source>Journal of Medical Internet Research</source>
          <volume>13</volume>(<issue>3</issue>)
          (<year>2011</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Van Gemert-Pijnen</surname>, <given-names>J.E.W.C.</given-names></string-name>,
          <string-name><surname>Peters</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Ossebaard</surname>, <given-names>H.C.</given-names></string-name>:
          <article-title>Improving eHealth</article-title>.
          Eleven International Publishing, The Hague
          (<year>2013</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Black</surname>
            ,
            <given-names>A.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Car</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pagliari</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anandan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cresswell</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bokun</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McKinstry</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Procter</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majeed</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sheikh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Impact of eHealth on the Quality and Safety of Health Care: A Systematic Overview</article-title>
          .
          <source>PLoS Med</source>
          <volume>8</volume>
          (
          <issue>1</issue>
          ),
          <fpage>e1000387</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Han</surname>, <given-names>J.Y.</given-names></string-name>:
          <article-title>Transaction logfile analysis in health communication research: Challenges and opportunities</article-title>
          .
          <source>Patient Education and Counseling</source>
          <volume>82</volume>
          (
          <issue>3</issue>
          ),
          <fpage>307</fpage>
          -
          <lpage>312</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Resnicow</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strecher</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Couper</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chua</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Little</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nair</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polk</surname>
            ,
            <given-names>T.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atienza</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Methodologic and design issues in patient-centered e-health research</article-title>
          .
          <source>American Journal of Preventive Medicine</source>
          <volume>38</volume>
          (
          <issue>1</issue>
          ),
          <fpage>98</fpage>
          -
          <lpage>102</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Kelders</surname>, <given-names>S.M.</given-names></string-name>,
          <string-name><surname>Bohlmeijer</surname>, <given-names>E.T.</given-names></string-name>,
          <string-name><surname>Van Gemert-Pijnen</surname>, <given-names>J.E.W.C.</given-names></string-name>:
          <article-title>Participants, usage, and use patterns of a web-based intervention for the prevention of depression within a randomized controlled trial</article-title>.
          <source>Journal of Medical Internet Research</source>
          <volume>15</volume>(<issue>8</issue>),
          <fpage>e172</fpage>
          (<year>2013</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Van Gemert-Pijnen</surname>, <given-names>J.E.W.C.</given-names></string-name>,
          <string-name><surname>Kelders</surname>, <given-names>S.M.</given-names></string-name>,
          <string-name><surname>Bohlmeijer</surname>, <given-names>E.T.</given-names></string-name>:
          <article-title>Understanding the Usage of Content in a Mental Health Intervention for Depression: An Analysis of Log Data</article-title>.
          <source>Journal of Medical Internet Research</source>
          <volume>16</volume>(<issue>1</issue>),
          <fpage>e27</fpage>
          (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Sieverink</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelders</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name><surname>Braakman-Jansen</surname>, <given-names>L.M.</given-names></string-name>,
          <string-name><surname>van Gemert-Pijnen</surname>, <given-names>J.E.</given-names></string-name>
          :
          <article-title>The Added Value of Log File Analyses of the Use of a Personal Health Record for Patients With Type 2 Diabetes Mellitus: Preliminary Results</article-title>
          .
          <source>Journal of Diabetes Science and Technology</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ),
          <fpage>247</fpage>
          -
          <lpage>255</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfahringer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reutemann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          :
          <article-title>The WEKA data mining software: an update</article-title>
          .
          <source>ACM SIGKDD Explor. Newsl</source>
          .
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kamber</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Data mining: concepts and techniques</article-title>
          .
          <source>Elsevier</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><surname>Seneta</surname>, <given-names>E.</given-names></string-name>:
          <article-title>Markov and the Birth of Chain Dependence Theory</article-title>
          .
          <source>International Statistical Review/Revue Internationale de Statistique</source>
          <volume>64</volume>
          (
          <issue>3</issue>
          ),
          <fpage>255</fpage>
          -
          <lpage>263</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Borges</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levene</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Evaluating Variable-Length Markov Chain Models for Analysis of User Web Navigation Sessions</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>19</volume>
          (
          <issue>4</issue>
          ),
          <fpage>441</fpage>
          -
          <lpage>452</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Anand</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patrick</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hughes</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bell</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          :
          <article-title>A Data Mining methodology for cross-sales</article-title>
          .
          <source>Knowledge-Based Systems</source>
          ,
          <volume>10</volume>
          (
          <issue>7</issue>
          ),
          <fpage>449</fpage>
          -
          <lpage>461</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>