=Paper=
{{Paper
|id=None
|storemode=property
|title=Avoiding "It's JUST a Replication"
|pdfUrl=https://ceur-ws.org/Vol-976/ppaper1.pdf
|volume=Vol-976
|dblpUrl=https://dblp.org/rec/conf/chi/John13
}}
==Avoiding "It's JUST a Replication"==
Bonnie E. John
IBM T. J. Watson Research Center
1101 Kitchawan Rd
Yorktown Heights, NY 10598 USA
bejohn@us.ibm.com

Abstract
This position paper explores my experiences getting replication studies accepted at the CHI conference over the past 30 years. These experiences lead to my hypothesis that CHI reviewers and program committee members at all levels need education and technology support to understand and appropriately consider replication studies for publication at CHI. I propose a draconian “zeroth iteration” on a design for extensions to the Precision Conference System to spur discussion about how we can design our values into our processes.
Author Keywords
Experimental design, replication.
ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.
General Terms
Human Factors
Presented at RepliCHI2013. Copyright © 2013 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

Introduction
Replication has been at the heart of science for as long as the scientific method has existed; sometimes it feels as though I have been fighting for the value of replication at CHI almost as long. As an engineer by training and inclination, replication is of even more importance for the practice of UI design, in my view, because practitioners can (and should) only trust
results from science when the results have been replicated at several different research groups (i.e., direct replication) and the boundaries of applicability have been thoroughly explored through replicate+extend studies. I cannot count the number of times I have heard “Reject; it’s JUST another Fitts’s Law study” or “Reject; it’s JUST another GOMS study” at program committee meetings in our field. When present, I have sometimes been able to rescue these contributions to our field’s science base. I can only imagine how many such papers were rejected when I, or like-minded researchers, were not present and how many potentially-contributing authors have been discouraged by such “JUST a replication” reviews. This position paper is a proposal of how to avoid “It’s JUST a replication” in the absence of dogmatic senior researchers like me.

Hypotheses about the problem
It is my experience that some sorts of replication are more acceptable to reviewers and program committees than others. The most acceptable seem to be those that replicate only a method, e.g., Baskin and John [1] used the same method of achieving extremely skilled task execution performance as did Card, Moran and Newell [2]. Using the same method to study performance on a GUI CAD system [1] and a command-line text editor [2] was not criticized by reviewers, seemingly because the tasks were sufficiently different. My hypothesis is that method replication is not a problem in HCI research publication, so much so that it might not even be recognized as a type of replication.

However, I know of replicate and extend papers falling (or being pushed) into the JUST-a-replication barrel when they vary any one of the myriad other variables in a study.

Extending the participants to a new user group
For example, a study I cannot name for confidentiality purposes was rejected when it replicated an educational treatment using participants who were different from the previously published work: they were at a lesser-known school, they were in a different major and therefore could be assumed to be less motivated to do well on the topic, and were given less direct access to expert support in doing the experimental tasks. The fact that these participants performed as well as the majors at a top-of-the-line school studying under the inventor of the educational treatment is a replication worth printing because it gives hope that the educational treatment will scale beyond the reach of its inventor.

Similarly, a paper that was rescued from JUST-a-replication, but which I will not name to maintain confidentiality, described a well-known HCI method being used by practitioners far outside the HCI field, who had picked up the technique from the HCI literature and made profitable use of it, verified with empirical data. That any of our methods can be of use to people without our help is a result worth publishing because it also shows that the beneficial impact of our field can extend beyond the reach of our limited number of researchers.

Extending the measures in the study to cover new questions
Again, in a rejected paper I cannot reveal, a replication was done that included additional survey data that explored why some behavior was observed in both the
original and replication studies. The survey instrument was new, the data were new, and, to me, the insight it revealed was new, but this was rejected as JUST-a-replication. Thus, there seems to be a disagreement in our community about how much extension constitutes a publishable extension. In my opinion, the replication itself was valuable and the extension was icing on the cake, but that was not the opinion of the reviewers. Differences of opinion about what does and does not constitute a publishable contribution are not uncommon, and in fact should be encouraged, but the reviews did not even acknowledge that there was any extension at all, causing me to hypothesize that the definition of replicate+extend is not well assimilated into our review community.

Direct replication to increase statistical power so that new questions can be answered
Tired of not being able to give details of the papers I have discussed above, I offer my own rejected CHI paper to make a point about direct replication [4]. We had done a study with only six participants per condition, and the effect was so strong that it attained statistical significance on some coarse measures and was published at the IEEE’s International Conference on Software Engineering [3]. The coarse measures did not help us understand why the participants performed better in some conditions than others, and did not distinguish between two conditions that had important implications for the practical use of the technique we were investigating. Therefore, we did a direct replication of the previous study, justified combining the data, and were able to tease out several new insights given the increased power of the combined study. We thought the results were a significant contribution beyond the initial study, and in fact, these results are the only ones that excite software engineering audiences when I talk about them (SEs are the target “users” of these research results).

Whether you agree that the results are exciting enough to publish is immaterial to the reviews we received – “Reject; it’s JUST a replication” without comment on the new analyses and results. This leads me to the hypothesis that new analyses are not sufficiently valued or understood by our reviewing community to warrant comment. The replication “surface structure” is enough to push a paper into the JUST-a-replication barrel.

An interesting point about the interaction of replication and anonymous reviewing was brought out by this paper as well. This was in the era of CHI’s strict rules about anonymization, so we wrote about ourselves in the third person, as instructed. A reviewer seemed to think that using “Golden et al.’s” materials was somehow cheating or lazy, and criticized us for not creating our own materials. Again, this leads to the hypothesis that our reviewing community is in need of education about the process of a good replication (i.e., NOT making your own materials) and highlights a potential confound between anonymity and replication. Might the paper have been less harshly reviewed if the reader had known that we did the original study, i.e., we did do the hard work of creating the materials and were not cheating or lazy?

A proposed approach to a solution
As explained above, my experiences lead me to the hypothesis that if our community is to embrace replication and publish good ones, reviewers need to be educated about what makes a good replication and its value to the field.
It is not sufficient to instruct Associate Chairs (ACs) and Sub-committee Chairs (SCs), as was done at the Program Committee meeting for CHI2013, because reviewer scores push replications down in the rankings and we cannot depend on human memory in the heat of PC debates to raise such papers to the level of discussion.

Therefore, I propose that we build our values into submission and reviewing software (the Precision Conference System, PCS), to be a “job aid” to authors, reviewers, ACs and SCs, delivering education at the time it is needed. Below I present “iteration 0” of a design for these extensions to PCS.

Job aid for authors
Present a required radio button for authors at submission time. Include an information button next to the question that leads to information about what a replication study is and what the criteria for reviewing are for a replication study.

It is possible that we would want to ask for the type of replication (direct replication, replicate+extend, or conceptual replication), but that may be introducing too much complexity in the first iteration.

Job aid for reviewers
If the author has declared the paper to be a replication study, then the review form shown to reviewers changes to include specific required fields that apply to replication studies. Include an information button next to every field so the reviewer can get information about acceptable replication processes and the general value of replication at the time of filling out the review. Depending on how much we believe our target users need the education, we may consider presenting this information in a modal dialog box when the field is first clicked by a reviewer, with a button that dismisses the dialog box and a checkbox “do not show me this again” appearing after a reasonable amount of time needed to read the text in the box.

Reviewers should be able to identify themselves to PCS as being skilled in assessing replications and interested in doing so.

Job aid for Associate Chairs (ACs)
If the author has declared the paper to be a replication, this is indicated to the AC at paper-assignment time, so the AC is aware that reviewers skilled in experimental design and analysis should be recruited. Such reviewers may be self-identified in PCS, as above. We may also consider allowing ACs and SCs to identify especially skilled replication reviewers in PCS, like we currently acknowledge excellent reviews.

At review time, the AC’s meta-review form also changes to include required fields that specifically address issues with replication, with information buttons.

PCS could also automatically mark this paper “to be discussed at the PC meeting”. Depending on how aggressive the CHI conference wants to be that year for
considering replication papers, this status may or may not be changed by the AC.

Job aid for Subcommittee Chairs (SCs)
If the author has declared the paper to be a replication, this is indicated to the SC at the time that papers are assigned to ACs, so the SC can assign an AC skilled in assessing replication. When recruiting ACs for a subcommittee likely to get replication submissions, the SCs might be asked to identify one or two ACs who are skilled in assessing replications, which will get the SCs thinking about this necessary skill when they can do something about it instead of when replication studies arrive.

At the PC meeting, the SC’s view should highlight the papers that were identified by their authors as being replication studies, so the SC can query the AC about them during the meeting. Even if PCS allows the AC to change the status of the paper to “do not discuss”, it would contribute to the education of all ACs if a sentence or two were said at the PC meeting about why this replication paper was not being discussed.

Conclusion
The zeroth iteration on changes to PCS proposed above is purposely draconian, to start discussion of how our conference reviewing technology can support our value system surrounding replication studies. I believe the need is there; let’s put our UI design skills and our SIG’s money where our values are.

Acknowledgements
This research was supported in part by IBM. The views and conclusions in this paper are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of IBM.

References
[1] Baskin, J. D. & John, B. E. (1998) Comparison of GOMS analysis methods. Proceedings Companion of CHI 1998 (Los Angeles, CA, April 18-23, 1998). ACM, New York, pp. 261-262.
[2] Card, S. K., Moran, T. P., & Newell, A. (1983) The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, NJ.
[3] Golden, E., John, B. E., & Bass, L. (2005) The value of a usability-supporting architectural pattern in software architecture design: A controlled experiment. Proceedings of the 27th International Conference on Software Engineering (May 2005, St. Louis, MO).
[4] Golden, E., John, B. E., & Bass, L. (2007) Helping software developers achieve usability. Unpublished replication study.