=Paper= {{Paper |id=None |storemode=property |title=Teaching HCI Methods: Replicating a Study of Collaborative Search |pdfUrl=https://ceur-ws.org/Vol-976/tpaper5.pdf |volume=Vol-976 |dblpUrl=https://dblp.org/rec/conf/chi/Wilson13 }} ==Teaching HCI Methods: Replicating a Study of Collaborative Search== https://ceur-ws.org/Vol-976/tpaper5.pdf
                        Teaching HCI Methods: Replicating
                        a Study of Collaborative Search

Max L. Wilson
Mixed Reality Lab
University of Nottingham, UK
max.wilson@nottingham.ac.uk

Abstract
This paper describes the challenges experienced when replicating a user study that evaluated synergy in a collaborative search system. The original paper reported significant differences in collaborative performance, depending on the mode of collaboration. We were unable to replicate the findings, and we experienced several challenges that created ambiguity and differences in the methods, which may have prevented us from doing so. These challenges and experiences, and their effect on our ability to replicate the findings, are described in detail.

Author Keywords
Collaborative search, Synergy, Replication

ACM Classification Keywords
H.5.3 [Group and Organization Interfaces]: Collaborative computing; H.3.3 [Information Search and Retrieval]: Search Process; H.3.7 [Digital Libraries]: User Issues.

Copyright is held by the author/owner(s). This paper was submitted to RepliCHI 2013, a CHI’13 workshop.

Introduction
Hands-on experience of replicating an experiment is often considered a good method of teaching [2]. For this reason, a cohort of 6 MSc students was asked to replicate a user study, in order to learn the methodological and analytical skills required to do so. Further, we hoped to confirm the findings for the benefit of the wider community. Based upon the interests of the staff and students involved, we chose to replicate a user study of the synergetic effect experienced by users searching in collaboration, originally carried out by Shah and González-Ibáñez [5], herein referred to as the original researchers.

The original researchers studied their own collaborative search software (Coagmento¹), which had been evaluated previously [6], to examine synergy between collaborators in different group orientations. These orientations, as the primary independent variable, were co-located (same computer), co-located (different computers), and remotely located (different computers); individual searchers, automatically paired post hoc, were used as a baseline. The paper further contributed to the issue of evaluating synergy in collaborative search by presenting new applicable measures. This focus on measures provided an additional learning benefit to the MSc students involved.

The MSc students were given an entire semester to coordinate and run the study, and each had to write about the results and the experience for their primary assessment. Support from the original researchers had been arranged in advance by the staff.

Original Task Description
A leading newspaper has hired your team to create a comprehensive report on the causes, effects, and consequences of the recent gulf oil spill. As a part of your contract, you are required to collect all the relevant information from any available online sources that you can find. To prepare this report, search and visit any website that you want and look for specific aspects as given in the guideline below. As you find useful information, highlight and save relevant snippets. Make sure you also rate a snippet to help you in ranking them based on their quality and usefulness. Later, you can use these snippets to compile your report, no longer than 200 lines, as instructed. Your report on this topic should address the following issues: description of how the oil spill took place, reactions by BP as well as various government and other agencies, impact on economy and life (people and animals) in the gulf, attempts to fix the leaking well and to clean the waters, long-term implications and lessons learned.

Challenges Faced and Decisions Made
Significant challenges were faced throughout the replication attempt: in setting up the study, running the study, and analysing the results. These are described in turn below.

Setup Challenges
There were three major challenges in the setup phase: software procurement, data capture, and task design.

• Software Procurement - Initially it was assumed that procuring the software would be very easy, as Coagmento can be downloaded from its website. After installing the software, however, we noticed several differences between its user interface and the system described in the original paper [5]. The original researchers told us that their study was based on an earlier version of the software. At first, we decided to accept the difference in functionality and to report it as a limitation later if needed. The original researchers, however, agreed to try to roll back the functionality and provide us with a version that matched the evaluated version. This was very generous of the original researchers, and is not always an option for those wishing to replicate studies.

• Data Capture - After investigating which data must be captured for the study, we discovered that the original researchers captured the data at the server level. Again, we were faced with two options: video record the desktop and manually log the necessary data afterwards, or request access to the data from the original researchers. The original researchers were again generous and agreed to provide us with the logs.

• Task Design - One significant challenge we faced was task design. The study was based upon an open-ended exploratory recall task about American political parties. Our third decision was whether we should keep the American political task focus, or choose a task that was more temporally relevant (since the political topic had become dated) and more culturally relevant for a British university. Several alternatives were proposed before making the decision, and in the end a temporally and culturally relevant task was chosen that focused on the 2012 Olympics (see the original and revised task descriptions).
This decision was made because task relevance and inherent motivation are considered key factors in creating good work tasks for user studies [7, 1].

¹ http://www.coagmento.org/

Revised Task Description
A leading newspaper has hired your team to create a comprehensive report on the causes, effects and consequences of the Olympic Games. As a part of your contract, you are required to collect all the relevant information from any available online sources that you can find. To prepare this report, search and visit any website that you want and look for specific aspects as given in the guideline below. As you find useful information, highlight and save relevant snippets. Make sure you also rate a snippet to help you in ranking them based on their quality and usefulness. Later, you can use these snippets to compile your report, no longer than 200 lines, as instructed. Your report on this topic should address the following issues: Impact on economy of host countries (people and animals), long-term implications on the host country, conditions and voting policy to become hosting nation and the next host country and their preparations to host the games.

Running the Study
There were three major challenges in the process of running the study: the experience of the research team, the financial support for incentives, and time limitations.

• Research Team - As this replication was being used to teach new MSc students about the process of running a study, the first and most obvious challenge was that the study was run by inexperienced researchers. This challenge was further compounded by the need to teach many students at once. In this case, the original study was performed by one experienced PhD student, but the replication was carried out by 6 novice MSc students. Each MSc student needed to gain experience in designing study materials (such as questionnaires), handling participants, and analysing the results. This means that there was likely to be high variance in each of the stages. To reduce variance, one final protocol was selected from the protocols submitted by the students. However, apart from a default script, there were few constraints on how, where, and when the researchers carried out the study with their participants.

• Financial Support for Incentives - As part of a taught module, rather than a funded research project, the students had to design alternative incentive methods. In the end, they chose a prize draw for a single prize (provided by the staff), with a total value much lower than that of providing a £10 voucher for each participant. There is some related work (e.g. [4]) on the effects of different incentive structures, but the effect in this case was not clear.

• Time Limitations - Also driven by the constraints of a taught module, the students had a limited amount of time to perform the study. Consequently, the students had to decide, also in light of the financial limitations, how many participants to include in the study. The students managed to run 40 participants in the timeframe, rather than the 70 involved in the original research.

Analysing the Results
There were two major challenges in the analysis phase: data processing and data analysis.

• Data Processing - The main challenge experienced in the analysis phase concerned the pre-processing of log data for analysis. The original researchers, for example, removed search engine result pages from their analysis of diverse website coverage, but the exact set of URLs considered to be search engine result pages was implicit rather than explicit. In fact, any form of log processing and filtering in such a study is a possible source of variance, unless the exact rules are accessible to the replicating team. One challenging example is whether to include both a user’s typo and their subsequent correction when analysing log data. In our own experiment, we created filters to achieve the same goals as reported in the paper, but we could not guarantee that, given the same log, exactly the same data would be filtered as in the original research; these elements of research methods are extremely difficult to report comprehensively in research publications.

• Data Analysis - Many methods can be applied in several different ways. In the case of this study, it was ambiguous how the data from the NASA Task Load Index (TLX) [3] was analysed. Many studies remove physical effort from the scale, as using a computer does not lend itself to variation in the physical effort questions. In this case, it was unclear exactly how the NASA TLX was applied, including whether pair-wise comparisons were made.

Study Outcome and Discussion
The outcome of our replication attempt was that we could not replicate any of the original findings, which we hope to report in detail in a future publication. In summary, we saw no differences between conditions on any of the measures, whereas the original researchers found a number of differences. However, there are many possible reasons for this discrepancy; we begin with the limitations of our replication attempt.

Limitations of our Replication
Although we were somewhat privileged to have the support of the original authors, our attempt also had several limitations:

   • Researchers - our study was performed by 6 novice researchers with different individual abilities, each of whom took part in running the study.
   • Participants - we had fewer participants (40 instead of 70), but from a similar academic population.
   • Participant Motivation - as part of a teaching module, participants were volunteers recruited by the MSc students, and were not motivated in the same way as in the original study.
   • Software - although the original researchers provided rolled-back software for the study, the process of rolling back introduced bugs that sometimes made the software unresponsive.

Possible Causes of Different Findings
There are many reasons, including those listed above, that may have affected the outcome of our study and prevented us from obtaining the same findings. On reflection, it is hard to estimate which element is likely to have had the biggest impact on our attempt to replicate the study. First, the performance of the software, after being rolled back, was not ideal, and this alone may have obstructed the synergetic effect seen by the original researchers. Second, the study was performed by several novice researchers, who may simply not have performed the study effectively. Third, the smaller number of participants may have reduced our statistical power, and the lack of voucher-based motivation could have limited the performance of participants. Fourth, task design has been seen to have a large effect on task outcome, and so perhaps our culturally and temporally relevant task was not suitable. Finally, the processing of data for the analysis could simply have been different: different or more comprehensive filtering rules may have led to significant differences in the measures.

Implications for RepliCHI
We chose to report this HCI replication, despite it focusing on a user study not published at an HCI venue, because of the sheer number of issues that it highlighted for a community that wants to better support replication. Our specific example leaves many open questions that we may wish to investigate:

   • What should we do when presented with software versions that differ from the one used in the original study?
   • Should we use the original tasks? Or is it acceptable to replace them for increased temporal/cultural relevance?
   • Where data processing is involved, how should we best support others who wish to replicate our studies?
   • If we want to recommend replication as a form of teaching, what are the consequences of using groups of novice researchers?
   • If we can’t overcome these challenges, is there any value in replicating the studies?

Overall, the students experienced many challenges in trying to replicate the study, but learned a lot about study design and paper writing by doing so. For these educational reasons, the replication attempt provided a lot of value to the students. In terms of confirming the original study, we were unable to confirm the results, but of course we were also unable to disprove them. This is perhaps a final challenge and discussion point for replication in HCI: we need to decide what we take away from studies that cannot replicate findings, and what value we gain from understanding them. From this experience report, we hope that researchers may learn about several decisions that they are likely to have to make when performing replications, and perhaps make more informed choices when the time comes.

Acknowledgements
We’d like to thank the original authors, Chirag Shah and Roberto González-Ibáñez, for their support: providing software and advice for the replication.

References
[1] Borlund, P. The concept of relevance in IR. Journal of the American Society for Information Science and Technology 54, 10 (2003), 913–925.
[2] Frank, M. C., and Saxe, R. Teaching replication. Perspectives on Psychological Science 7, 6 (2012), 600–604.
[3] Hart, S. NASA-Task Load Index (NASA-TLX); 20 years later. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 50, SAGE Publications (2006), 904–908.
[4] Musthag, M., Raij, A., Ganesan, D., Kumar, S., and Shiffman, S. Exploring micro-incentive strategies for participant compensation in high-burden studies. In Proceedings of the 13th International Conference on Ubiquitous Computing, ACM (2011), 435–444.
[5] Shah, C., and González-Ibáñez, R. Evaluating the synergic effect of collaboration in information seeking. In SIGIR ’11: Proceedings of the 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 24, vol. 28 (2011), 24–28.
[6] Shah, C., Marchionini, G., and Kelly, D. Learning design principles for a collaborative information seeking system. In Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems, ACM (2009), 3419–3424.
[7] Wildemuth, B., and Freund, L. Search tasks and their role in studies of search behaviors. In Third Annual Workshop on Human Computer Interaction and Information Retrieval, Washington DC (2009).