<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Teaching HCI Methods: Replicating a Study of Collaborative Search</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Author Keywords Collaborative search</institution>
          ,
          <addr-line>Synergy, Replication</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Max L. Wilson Mixed Reality Lab University of Nottingham</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the challenges experienced when replicating a user study that evaluated synergy in a collaborative search system. The original paper saw significant differences in collaborative performance, depending on the mode of collaboration. We were unable to replicate the findings, but experienced several challenges that created ambiguity and differences in the methods, which may have prevented us from doing so. These challenges and experiences, and their effect on our ability to replicate the findings, are described in detail.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>Copyright is held by the author/owner(s).</p>
      <p>This paper was submitted to RepliCHI 2013, a CHI'13 workshop</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        Hands-on experience of replicating an experiment is often considered a good method of teaching [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For this reason, a cohort of six MSc students was asked to replicate a user study, in order to learn the methodological and analytical skills required to do so. Further, we hoped to confirm the findings for the benefit of the wider community. Based upon the interests of the staff and students involved, we chose to replicate a user study of the synergetic effect experienced by users searching in collaboration, originally carried out by Shah and Gonzalez-Ibanez [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], herein referred to as the original researchers.
      </p>
      <sec id="sec-2-1">
        <title>Original Task Description</title>
        <p>A leading newspaper has hired your team to create a comprehensive report on the causes, effects, and consequences of the recent Gulf oil spill. As a part of your contract, you are required to collect all the relevant information from any available online sources that you can find.</p>
        <p>
          The original researchers studied their own collaborative search software (Coagmento), which had been evaluated previously [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], to examine synergy between collaborators in different group orientations. These orientations, as the primary independent variable, were: co-located (same computer), co-located (different computers), and remotely located (different computers); individual searchers, automatically paired post hoc, were used as a baseline. The paper further contributed to the issue of evaluating synergy in collaborative search by presenting new applicable measures. This focus on measures provided additional learning benefit to the MSc students involved. The MSc students were given an entire semester to coordinate and run the study, and each had to write about the results and the experience for their primary assessment. Support from the original researchers had been previously arranged by the staff.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Challenges Faced and Decisions Made</title>
      <p>Significant challenges were faced throughout the replication attempt, from setting up the study, through running it, to analysing the results. These are described in turn below.</p>
      <sec id="sec-3-1">
        <title>Setup Challenges</title>
        <p>There were three major challenges in the setup phase:
software procurement, data capture, and task design.</p>
        <p>
          Software Procurement - Initially we assumed that procuring the software would be very easy, as Coagmento can be downloaded from its website. After installing the software, however, we noticed several differences between its user interface and the system described in the original paper [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The original researchers told us their study was based on an earlier version of the software. At first, we decided to accept the difference in functionality and to report it as a limitation later if needed. The original researchers, however, agreed to try to roll back their functionality and provide us with a version that matched the evaluated one. This was very generous of the original researchers, and is not always an option for those wishing to replicate studies.
        </p>
        <p>Data Capture - After investigating which data must be
captured for the study, we discovered that the original
researchers captured the data at the server level. Again,
we were faced with two options: video record the desktop
and manually log the necessary data afterwards, or
request access to the data from the original researchers.
The original researchers were again generous and agreed
to provide us with the logs.</p>
        <p>Task Design - One significant challenge we faced was task design. The study was based upon an open-ended exploratory recall task, focused on American political parties. Our third decision was whether we should keep the American political task focus, or choose a more temporally relevant (the political topic had become dated) and culturally relevant task for the British university. Several alternatives were proposed before making the decision, and in the end a temporally and culturally relevant task was chosen that focused on the 2012 Olympics (see original and revised task descriptions in the margins). This decision was made because task relevance and inherent motivation are considered key factors in creating good work tasks for user studies [
          <xref ref-type="bibr" rid="ref1 ref7">7, 1</xref>
          ].</p>
        <sec id="sec-3-1-1">
          <title>Revised Task Description</title>
          <p>A leading newspaper has hired your team to create a comprehensive report on the causes, effects, and consequences of the Olympic Games. As a part of your contract, you are required to collect all the relevant information from any available online sources that you can find.</p>
          <p>To prepare this report, search and visit any website that you want and look for specific aspects as given in the guideline below. As you find useful information, highlight and save relevant snippets.</p>
          <p>Make sure you also rate a snippet to help you in ranking them based on their quality and usefulness.</p>
          <p>Later, you can use these snippets to compile your report, no longer than 200 lines, as instructed.</p>
          <p>Your report on this topic should address the following issues: impact on the economy of host countries (people and animals), long-term implications for the host country, conditions and voting policy to become the hosting nation, and the next host country and their preparations to host the games.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Running the Study</title>
        <p>There were three major challenges in the process of running the study: the experience of the research team, the financial support for incentives, and time limitations.</p>
        <p>Research Team - As this replication was being used to teach new MSc students the process of running a study, the first and most obvious challenge is that the study was run by inexperienced researchers. This challenge was further compounded by the necessity to teach many students at once. In this case, the original study was performed by one experienced PhD student, but the replication was carried out by six novice MSc students. Each MSc student required experience of designing study materials (like questionnaires), handling participants, and analysing the results. This meant there was likely to be high variance at each of the stages. To reduce this variance, one final protocol was selected from those submitted by the students. However, there were few constraints, apart from a default script, on how, where, and when the researchers carried out the study with their participants.</p>
        <p>
          Financial Support for Incentives - As part of a taught module, rather than a funded research project, the students had to design alternative incentive methods. In the end, they chose a prize draw for a single prize (provided by the staff), but of a value much lower than a $10 voucher for each participant. There is some related work (e.g. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]) on the effects of different incentive structures, but the effect in this case was not clear.
        </p>
        <p>Time Limitations - Also driven by the constraints of a taught module, the students had a limited amount of time to perform the study. Consequently, the students had to decide, also in view of the financial limitations, how many participants to include in the study. The students managed 40 participants in the timeframe, rather than the 70 involved in the original research.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Analysing the Results</title>
        <p>There were two major challenges in the analysis phase:
data processing and data analysis.</p>
        <p>Data Processing - The main challenge experienced in the analysis phase was the pre-processing of log data for analysis. The original researchers, for example, removed search engine result pages from their analysis of diverse website coverage, but the exact set of URLs considered to be search engine results pages was implicit rather than explicit. In fact, any form of log processing and filtering in such a study is a possible source of variance, unless the exact rules are accessible to the replicating team. One challenging example is whether to include both a user's typo and their subsequent correction when analysing log data. In our own experiment, we created filters to achieve the same goals as reported in the paper, but we could not guarantee that exactly the same data would be filtered as in the original research, given the same log; these elements of research methods are extremely difficult to report comprehensively in research publications.</p>
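        <p>As an illustration only, the kind of explicit, shareable filtering rules discussed above can be written down as a short script. This is a hedged sketch under our own assumptions: the log format (a list of records with a "url" and an optional "query" field), the set of search-engine hosts, and the typo-collapsing heuristic are all hypothetical, not the original researchers' actual rules.</p>
        <preformat>
```python
# Hedged sketch of explicit log-filtering rules. The log schema, SERP host
# list, and correction heuristic are illustrative assumptions only.
from urllib.parse import urlparse

SERP_HOSTS = {"www.google.com", "www.bing.com", "search.yahoo.com"}  # assumed set

def is_serp(url):
    """Treat a URL as a search-engine results page if its host is a known
    engine and its path looks like a results endpoint."""
    parts = urlparse(url)
    return parts.netloc in SERP_HOSTS and parts.path in ("/search", "/results", "/")

def _is_correction(prev, curr):
    """Crude heuristic for a typo fix: one query contains the other, or the
    lengths are close and the first three characters match."""
    return prev != curr and (prev in curr or curr in prev or
                             abs(len(prev) - len(curr)) <= 2 and prev[:3] == curr[:3])

def filter_log(entries):
    """Drop SERP visits, and collapse a query immediately followed by a
    close correction into the corrected query only."""
    pages = [e for e in entries if not is_serp(e["url"])]
    collapsed = []
    for q in (e["query"] for e in entries if e.get("query")):
        if collapsed and _is_correction(collapsed[-1], q):
            collapsed[-1] = q  # keep only the corrected query
        else:
            collapsed.append(q)
    return pages, collapsed
```
        </preformat>
        <p>Publishing even a rough script like this alongside a study would let a replicating team filter the same log identically, rather than reconstructing the rules from prose.</p>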
        <p>
          Data Analysis - For many methods, there are many variations on how to apply them. In the case of this study, it was ambiguous how the data from the NASA Task Load Index (TLX) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] was analysed. Many studies remove physical effort from the scale, as using a computer does not lend itself to variation in the physical effort questions. In this case, it was unclear exactly how the NASA TLX was applied, including whether pair-wise comparisons were made.
        </p>
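        <p>To make the ambiguity concrete, the two common scoring variants in question can be sketched in a few lines. This is a hedged sketch: the subscale names and weighting scheme follow the standard NASA-TLX instrument, but the code is our own illustration and not the original researchers' actual analysis.</p>
        <preformat>
```python
# Hedged sketch of two common NASA-TLX scoring variants: the unweighted
# "Raw TLX" mean, and the weighted score using pairwise comparisons.
SCALES = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

def raw_tlx(ratings, drop_physical=False):
    """Unweighted 'Raw TLX': the mean of the 0-100 subscale ratings,
    optionally dropping Physical Demand as some studies do."""
    used = [s for s in SCALES if not (drop_physical and s == "physical")]
    return sum(ratings[s] for s in used) / len(used)

def weighted_tlx(ratings, pair_wins):
    """Weighted TLX: each rating is weighted by the number of the 15
    pairwise comparisons that subscale won; weights must sum to 15."""
    assert sum(pair_wins.values()) == 15, "expected 15 pairwise comparisons"
    return sum(ratings[s] * pair_wins.get(s, 0) for s in SCALES) / 15
```
        </preformat>
        <p>The same ratings can yield noticeably different workload scores under the two variants, which is why reporting which variant was used matters for replication.</p>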
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Study Outcome and Discussion</title>
      <p>The outcome of our replication attempt was that we could not replicate any of the original findings, as we hope to report in detail in a future publication. In summary, we saw no difference in any of the measures, where the original researchers found a number of differences. However, there are many possible reasons for these differences, and we begin with the limitations of our replication attempt.</p>
      <sec id="sec-4-1">
        <title>Limitations of our Replication</title>
        <p>Although we were somewhat privileged to have the support of the original authors, we also had several limitations in our attempt:</p>
        <list list-type="bullet">
          <list-item><p>Researchers - our study was performed by six novice researchers, who each took part in running the study, with different individual abilities.</p></list-item>
          <list-item><p>Participants - we had fewer participants (40 instead of 70), but from a similar academic population.</p></list-item>
          <list-item><p>Participant Motivation - as part of a teaching module, participants were volunteers found by the MSc students, and were not motivated in the same way as in the original study.</p></list-item>
          <list-item><p>Software - although the original researchers provided rolled-back software for the study, the process of rolling back introduced bugs that sometimes made the software unresponsive.</p></list-item>
        </list>
      </sec>
      <sec id="sec-4-2">
        <title>Possible Causes of Di erent Findings</title>
        <p>There are many reasons, including those listed above, that may have affected our results and prevented us from obtaining the same findings. In reflection, it is hard to estimate which element had the biggest impact on our attempt to replicate the study. First, the performance of the software, after being rolled back, was not ideal, and this alone may have obstructed the synergetic effect seen by the original researchers. Second, the study was performed by several novice researchers, who may simply not have performed the study effectively. Third, the differences in the number of participants and the lack of voucher-based motivation could have limited the performance of participants. Fourth, task design has been seen to have a large effect on task outcome, and so perhaps our culturally and temporally relevant task was not suitable. Finally, the processing of data for the analysis could simply have been different. Having different or more comprehensive filtering rules may have led to significant differences in the measures.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Implications for RepliCHI</title>
        <p>We chose to report this HCI replication, despite it being focused on a user study not published at an HCI venue, because of the sheer number of issues it highlighted for a community that wants to better support replication. Our specific example leaves many open questions that we may wish to investigate:</p>
        <list list-type="bullet">
          <list-item><p>What should we do when presented with different software versions from the original study?</p></list-item>
          <list-item><p>Should we use original tasks? Or is it acceptable to replace them for increased temporal/cultural relevance?</p></list-item>
          <list-item><p>Where data processing is involved, how should we best support others who wish to replicate our studies?</p></list-item>
          <list-item><p>If we want to recommend replication as a form of teaching, what are the consequences of using groups of novice researchers?</p></list-item>
          <list-item><p>If we can't overcome these challenges, is there any value in replicating the studies?</p></list-item>
        </list>
        <p>Overall, the students experienced many challenges in trying to replicate the study, but learned a lot about study design and paper writing by doing so. For these educational reasons, the replication attempt provided a lot of value to the students. In terms of confirming the original study, we were unable to confirm the results, but were of course also unable to disprove them. This is perhaps a final challenge and discussion point for replication in HCI: we need to decide what we take away from studies that cannot replicate findings, and what value we gain from understanding them. From this experience report, we hope that researchers may learn about several decisions they are likely to have to make when performing replications, and perhaps make more informed choices when the time comes.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>We'd like to thank the original authors, Chirag Shah and Roberto Gonzalez-Ibanez, for their support in providing software and advice for the replication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><surname>Borlund</surname>, <given-names>P.</given-names></string-name>
          <article-title>The concept of relevance in IR</article-title>.
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>54</volume>, <issue>10</issue> (<year>2003</year>), <fpage>913</fpage>-<lpage>925</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><surname>Frank</surname>, <given-names>M. C.</given-names></string-name>, and
          <string-name><surname>Saxe</surname>, <given-names>R.</given-names></string-name>
          <article-title>Teaching replication</article-title>.
          <source>Perspectives on Psychological Science</source>
          <volume>7</volume>, <issue>6</issue> (<year>2012</year>), <fpage>600</fpage>-<lpage>604</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><surname>Hart</surname>, <given-names>S.</given-names></string-name>
          <article-title>NASA-Task Load Index (NASA-TLX); 20 years later</article-title>.
          <source>In Proceedings of the Human Factors and Ergonomics Society Annual Meeting</source>, vol.
          <volume>50</volume>, <publisher-name>SAGE Publications</publisher-name> (<year>2006</year>), <fpage>904</fpage>-<lpage>908</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><surname>Musthag</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Raij</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Ganesan</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Kumar</surname>, <given-names>S.</given-names></string-name>, and
          <string-name><surname>Shiffman</surname>, <given-names>S.</given-names></string-name>
          <article-title>Exploring micro-incentive strategies for participant compensation in high-burden studies</article-title>.
          <source>In Proceedings of the 13th International Conference on Ubiquitous Computing, ACM</source> (<year>2011</year>), <fpage>435</fpage>-<lpage>444</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><surname>Shah</surname>, <given-names>C.</given-names></string-name>, and
          <string-name><surname>Gonzalez-Ibanez</surname>, <given-names>R.</given-names></string-name>
          <article-title>Evaluating the synergic effect of collaboration in information seeking</article-title>.
          <source>In SIGIR '11: Proceedings of the 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>, July 24-28 (<year>2011</year>).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><surname>Shah</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Marchionini</surname>, <given-names>G.</given-names></string-name>, and
          <string-name><surname>Kelly</surname>, <given-names>D.</given-names></string-name>
          <article-title>Learning design principles for a collaborative information seeking system</article-title>.
          <source>In Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems, ACM</source> (<year>2009</year>), <fpage>3419</fpage>-<lpage>3424</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><surname>Wildemuth</surname>, <given-names>B.</given-names></string-name>, and
          <string-name><surname>Freund</surname>, <given-names>L.</given-names></string-name>
          <article-title>Search tasks and their role in studies of search behaviors</article-title>.
          <source>In Third Annual Workshop on Human Computer Interaction and Information Retrieval</source>, Washington DC (<year>2009</year>).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>