Avoiding "It's JUST a Replication"

Bonnie E. John
IBM T. J. Watson Research Center
1101 Kitchawan Rd
Yorktown Heights, NY 10598 USA
bejohn@us.ibm.com

Presented at RepliCHI2013. Copyright © 2013 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

Abstract
This position paper explores my experiences getting replication studies accepted at the CHI conference over the past 30 years. These experiences lead to my hypothesis that CHI reviewers and program committee members at all levels need education and technology support to understand and appropriately consider replication studies for publication at CHI. I propose a draconian "zeroth iteration" on a design for extensions to the Precision Conference System to spur discussion about how we can design our values into our processes.

Author Keywords
Experimental design, replication.

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

General Terms
Human Factors

Introduction
Replication has been at the heart of science for as long as the scientific method has existed; sometimes it feels as though I have been fighting for the value of replication at CHI almost as long. As an engineer by training and inclination, I view replication as even more important for the practice of UI design, because practitioners can (and should) only trust results from science when those results have been replicated at several different research groups (i.e., direct replication) and the boundaries of applicability have been thoroughly explored through replicate+extend studies.

I cannot count the number of times I have heard "Reject; it's JUST another Fitts's Law study" or "Reject; it's JUST another GOMS study" at program committee meetings in our field. When present, I have sometimes been able to rescue these contributions to our field's science base. I can only imagine how many such papers were rejected when I, or like-minded researchers, were not present, and how many potentially-contributing authors have been discouraged by such "JUST a replication" reviews. This position paper is a proposal for how to avoid "It's JUST a replication" in the absence of dogmatic senior researchers like me.

Hypotheses about the problem
It is my experience that some sorts of replication are more acceptable to reviewers and program committees than others. The most acceptable seem to be those that replicate only a method; e.g., Baskin and John [1] used the same method of achieving extremely skilled task execution performance as did Card, Moran and Newell [2]. Using the same method to study performance on a GUI CAD system [1] and a command-line text editor [2] was not criticized by reviewers, seemingly because the tasks were sufficiently different. My hypothesis is that method replication is not a problem in HCI research publication, so much so that it might not even be recognized as a type of replication.

However, I know of replicate-and-extend papers falling (or being pushed) into the JUST-a-replication barrel when they vary any one of the myriad other variables in a study.

Extending the participants to a new user group
For example, a study I cannot name for confidentiality purposes was rejected when it replicated an educational treatment using participants who were different from the previously published work: they were at a lesser-known school, they were in a different major and therefore could be assumed to be less motivated to do well on the topic, and they were given less direct access to expert support in doing the experiment. The fact that these participants performed as well as the majors at a top-of-the-line school studying under the inventor of the educational treatment is a replication worth printing, because it gives hope that the educational treatment will scale beyond the reach of its inventor.

Similarly, a paper that was rescued from JUST-a-replication, but which I will not name to maintain confidentiality, described a well-known HCI method being used by practitioners far outside the HCI field, who had picked up the technique from the HCI literature and made profitable use of it, verified with empirical data. That any of our methods can be of use to people without our help is a result worth publishing, because it also shows that the beneficial impact of our field can extend beyond the reach of our limited number of researchers.

Extending the measures in the study to cover new questions
Again, in a rejected paper I cannot reveal, a replication was done that included additional survey data exploring why some behavior was observed in both the original and replication studies. The survey instrument was new, the data was new, and, to me, the insight it revealed was new, but this was rejected as JUST-a-replication. Thus, there seems to be a disagreement in our community about how much extension constitutes a publishable extension. In my opinion, the replication itself was valuable and the extension was icing on the cake, but that was not the opinion of the reviewers. Differences of opinion about what does and does not constitute a publishable contribution are not uncommon, and in fact should be encouraged, but the reviews did not even acknowledge that there was any extension at all, causing me to hypothesize that the definition of replicate+extend is not well assimilated into our review community. The replication "surface structure" is enough to push a paper into the JUST-a-replication barrel.

Direct replication to increase statistical power so that new questions can be answered
Tired of not being able to give details of the papers I have discussed above, I offer my own rejected CHI paper to make a point about direct replication [4]. We had done a study with only six participants per condition; the effect was so strong that it attained statistical significance on some coarse measures, and the study was published at the IEEE's International Conference on Software Engineering [3]. The coarse measures did not help us understand why the participants performed better on some conditions than others, and did not distinguish between two conditions that had important implications for the practical use of the technique we were investigating. Therefore, we did a direct replication of the previous study, justified combining the data, and were able to tease out several new insights given the increased power of the combined study. We thought the results were a significant contribution beyond the initial study, and in fact, these results are the only ones that excite software engineering audiences when I talk about them (SEs are the target "users" of these research results).

Whether you agree that the results are exciting enough to publish is immaterial to the reviews we received – "Reject; it's JUST a replication," without comment on the new analyses and results. This leads me to the hypothesis that new analyses are not sufficiently valued or understood by our reviewing community to warrant comment.

An interesting point about the interaction of replication and anonymous reviewing was brought out by this paper as well. This was in the era of CHI's strict rules about anonymization, so we wrote about ourselves in the third person, as instructed. A reviewer seemed to think that using "Golden et al.'s" materials was somehow cheating or lazy, and criticized us for not creating our own materials. Again, this leads to the hypothesis that our reviewing community is in need of education about the process of a good replication (i.e., NOT making your own materials), and it highlights a potential confound between anonymity and replication. Might the paper have been less harshly reviewed if the reader had known that we did the original study, i.e., we did do the hard work of creating the materials and were not cheating or lazy?

A proposed approach to a solution
As explained above, my experiences lead me to the hypothesis that if our community is to embrace replication and publish good replications, reviewers need to be educated about what makes a good replication and its value to the field. It is not sufficient to instruct Associate Chairs (ACs) and Sub-committee Chairs (SCs), as was done at the Program Committee meeting for CHI 2013, because reviewer scores push replications down in the rankings, and we cannot depend on human memory in the heat of PC debates to raise such papers to the level of discussion.

Therefore, I propose that we build our values into our submission and reviewing software (the Precision Conference System, PCS), to be a "job aid" to authors, reviewers, ACs, and SCs, delivering education at the time it is needed. Below I present "iteration 0" of a design for these extensions to PCS.

Job aid for authors
Present a required radio button for authors at submission time. Include an information button next to the question that leads to information about what a replication study is and what the criteria are for reviewing a replication study.

It is possible that we would want to ask for the type of replication (direct replication, replicate+extend, or conceptual replication), but that may be introducing too much complexity in the first iteration.

Job aid for reviewers
If the author has declared the paper to be a replication study, then the review form shown to reviewers changes to include specific required fields that apply to replication studies. Include an information button next to every field so the reviewer can get information about acceptable replication processes and the general value of replication at the time of filling out the review. Depending on how much we believe our target users need the education, we may consider presenting this information in a modal dialog box when the field is first clicked by a reviewer, with a button that dismisses the dialog box and a checkbox "do not show me this again" appearing after a reasonable amount of time needed to read the text in the box.

Reviewers should be able to identify themselves to PCS as being skilled in assessing replications and interested in doing so.
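To make the conditional review form concrete, here is a minimal sketch. Everything in it is hypothetical: PCS's real data model and API are not public in this form, and names such as `REPLICATION_FIELDS` and `review_form` are my own invention for illustration only.

```python
# Hypothetical sketch of a PCS-like review form that adds required
# replication fields when the author has declared a replication study.
# All names are invented for illustration; the real PCS differs.

BASE_FIELDS = ["summary", "contribution", "rating"]

# Extra required fields, each paired with the "information button" text
# that educates the reviewer at the moment of filling out the review.
REPLICATION_FIELDS = {
    "replication_fidelity":
        "Did the authors reuse the original materials? Reusing them is "
        "good replication practice, not laziness.",
    "extension_assessment":
        "What does this study add beyond the original (new participants, "
        "measures, analyses, or statistical power)?",
    "value_of_replication":
        "Direct replications that confirm or bound earlier results are "
        "themselves a contribution.",
}

def review_form(is_replication: bool) -> list:
    """Return the required fields for a submission's review form."""
    fields = list(BASE_FIELDS)
    if is_replication:
        fields += REPLICATION_FIELDS  # dict iteration yields field names
    return fields

def info_text(field: str) -> str:
    """Text behind the information button next to a replication field."""
    return REPLICATION_FIELDS.get(field, "")
```

The point of the sketch is that the extra fields (and their educational text) appear only for declared replications, so non-replication reviews are unchanged.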
Job aid for Associate Chairs (ACs)
If the author has declared the paper to be a replication, this is indicated to the AC at paper-assignment time, so the AC is aware that reviewers skilled in experimental design and analysis should be recruited. Such reviewers may be self-identified in PCS, as above. We may also consider allowing ACs and SCs to identify especially skilled replication reviewers in PCS, much as we currently acknowledge excellent reviews.

At review time, the AC's meta-review form also changes to include required fields that specifically address issues with replication, with information buttons.

PCS could also automatically mark the paper "to be discussed at the PC meeting". Depending on how aggressive the CHI conference wants to be that year about considering replication papers, this status may or may not be changeable by the AC.
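The assignment-time flagging and the automatic "to be discussed" status might look like the following sketch. Again, every name here (`Paper`, `on_assignment`, the `replication_reviewer` flag) is hypothetical, standing in for whatever PCS actually stores.

```python
# Hypothetical sketch of assignment-time support for replication papers:
# flag the paper for the AC, prefer self-identified replication
# reviewers, and auto-mark the paper for discussion at the PC meeting.
from dataclasses import dataclass, field

@dataclass
class Paper:
    title: str
    is_replication: bool
    status: str = "normal"
    notes: list = field(default_factory=list)

def on_submission(paper: Paper) -> Paper:
    """Auto-mark declared replications for discussion at the PC meeting."""
    if paper.is_replication:
        paper.status = "to be discussed"
    return paper

def on_assignment(paper: Paper, reviewers: list) -> list:
    """Called when an AC assigns reviewers; returns the preferred pool."""
    if not paper.is_replication:
        return reviewers
    paper.notes.append("Recruit reviewers skilled in experimental design.")
    # Prefer reviewers who self-identified as skilled in assessing
    # replications; fall back to the full pool if none did.
    skilled = [r for r in reviewers if r.get("replication_reviewer")]
    return skilled or reviewers
```

The design choice worth noting is the fallback: the system nudges toward skilled replication reviewers without ever leaving a paper unassignable.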
Job aid for Subcommittee Chairs (SCs)
If the author has declared the paper to be a replication, this is indicated to the SC at the time that papers are assigned to ACs, so the SC can assign an AC skilled in assessing replication. When recruiting ACs for a subcommittee likely to get replication submissions, the SCs might be asked to identify one or two ACs who are skilled in assessing replications, which will get the SCs thinking about this necessary skill when they can do something about it, instead of when replication studies arrive.

At the PC meeting, the SC's view should highlight the papers that were identified by their authors as being replication studies, so the SC can query the ACs about them during the meeting. Even if PCS allows the AC to change the status of a paper to "do not discuss", it would contribute to the education of all ACs if a sentence or two were said at the PC meeting about why that replication paper was not being discussed.

Conclusion
The zeroth iteration on changes to PCS proposed above is purposely draconian, to start discussion of how our conference reviewing technology can support our value system surrounding replication studies. I believe the need is there; let's put our UI design skills and our SIG's money where our values are.

Acknowledgements
This research was supported in part by IBM. The views and conclusions in this paper are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of IBM.

References
[1] Baskin, J. D. and John, B. E. (1998) Comparison of GOMS analysis methods. Proceedings Companion of CHI 1998 (Los Angeles, CA, April 18-23, 1998). ACM, New York, pp. 261-262.
[2] Card, S. K., Moran, T. P., and Newell, A. (1983) The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, NJ.
[3] Golden, E., John, B. E., and Bass, L. (2005) The value of a usability-supporting architectural pattern in software architecture design: A controlled experiment. Proceedings of the 27th International Conference on Software Engineering (St. Louis, MO, May 2005).
[4] Golden, E., John, B. E., and Bass, L. (2007) Helping software developers achieve usability. Unpublished replication study.