<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Avoiding “It’s JUST a Replication”</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bonnie E. John</string-name>
          <email>bejohn@us.ibm.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IBM T. J. Watson Research Center</institution>
          <addr-line>1101 Kitchawan Rd, Yorktown Heights, NY 10598</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This position paper explores my experiences getting replication studies accepted at the CHI conference over the past 30 years. These experiences lead to my hypothesis that CHI reviewers and program committee members at all levels need education and technology support to understand and appropriately consider replication studies for publication at CHI. I propose a draconian “zeroth iteration” on a design for extensions to the Precision Conference System to spur discussion about how we can design our values into our processes.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Presented at RepliCHI2013. Copyright © 2013 for the individual
papers by the papers’ authors. Copying permitted only for private and
academic purposes. This volume is published and copyrighted by its
editors.</p>
    </sec>
    <sec id="sec-2">
      <title>ACM Classification Keywords</title>
      <p>H5.m. Information interfaces and presentation (e.g.,
HCI): Miscellaneous.</p>
    </sec>
    <sec id="sec-3">
      <title>General Terms</title>
      <p>Human Factors</p>
    </sec>
    <sec id="sec-4">
      <title>Introduction</title>
      <p>Replication has been at the heart of science for as long
as the scientific method has existed; sometimes it feels
as though I have been fighting for the value of
replication at CHI almost as long. As an engineer by
training and inclination, I believe replication is even
more important for the practice of UI design, because
practitioners can (and should) trust results from
science only when those results have been replicated
by several different research groups (i.e., direct
replication) and their boundaries of applicability have
been thoroughly explored through replicate+extend
studies. I cannot count the number of times I have
heard “Reject; it’s JUST another Fitts’s Law study” or
“Reject; it’s JUST another GOMS study” at program
committee meetings in our field. When I have been
present, I have sometimes been able to rescue these
contributions to our field’s science base. I can only
imagine how many such papers were rejected when I,
or like-minded researchers, were not present, and how
many potential contributors have been discouraged by
such “JUST a replication” reviews. This position paper
proposes a way to avoid “It’s JUST a replication” in the
absence of dogmatic senior researchers like me.</p>
    </sec>
    <sec id="sec-5">
      <title>Hypotheses about the problem</title>
      <p>It is my experience that some sorts of replication are
more acceptable to reviewers and program committees
than others. The most acceptable seem to be those
that replicate only a method, e.g., Baskin and John
[<xref ref-type="bibr" rid="ref1">1</xref>] used the same method of achieving extremely
skilled task execution performance as did Card, Moran
and Newell [<xref ref-type="bibr" rid="ref2">2</xref>]. Using the same method to study
performance on a GUI CAD system [<xref ref-type="bibr" rid="ref1">1</xref>] and a
command-line text editor [<xref ref-type="bibr" rid="ref2">2</xref>] was not criticized by
reviewers, seemingly because the tasks were
sufficiently different. My hypothesis is that method
replication is not a problem in HCI research publication,
so much so that it might not even be recognized as a
type of replication.</p>
      <p>However, I know of replicate+extend papers falling
(or being pushed) into the JUST-a-replication barrel
when they vary any one of the myriad other variables
in a study.</p>
      <sec id="sec-5-1">
        <title>Extending the participants to a new user group</title>
        <p>For example, a study I cannot name for confidentiality
purposes was rejected when it replicated an educational
treatment using participants who differed from those in
the previously published work: they attended a
lesser-known school, they were in a different major and
therefore could be assumed to be less motivated to do
well on the topic, and they were given less direct access
to expert support while doing the experimental work.
That these participants performed as well as the
majors at a top-of-the-line school studying under the
inventor of the educational treatment makes this a
replication worth publishing, because it gives hope that
the educational treatment will scale beyond the reach of
its inventor.</p>
        <p>Similarly, a paper that was rescued from
JUST-a-replication, but which I will not name to maintain
confidentiality, described practitioners far outside the
HCI field who had picked up a well-known HCI method
from the HCI literature and made profitable use of it, as
verified with empirical data. That any of our methods
can be of use to people without our help is a result
worth publishing, because it also shows that the
beneficial impact of our field can extend beyond the
reach of our limited number of researchers.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Extending the measures in the study to cover new questions</title>
        <p>Again, a rejected paper that I cannot reveal reported a
replication that included additional survey data
exploring why some behavior was observed in both the
original and replication studies. The survey instrument
was new, the data were new, and, to me, the insight
they revealed was new, but the paper was rejected as
JUST-a-replication. Thus, there seems to be
disagreement in our community about how much
extension constitutes a publishable extension. In my
opinion, the replication itself was valuable and the
extension was icing on the cake, but the reviewers did
not share that opinion. Differences of opinion about
what does and does not constitute a publishable
contribution are not uncommon, and in fact should be
encouraged, but these reviews did not even
acknowledge that there was any extension at all. This
leads me to hypothesize that the definition of
replicate+extend is not well assimilated into our review
community.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Direct replication to increase statistical power so that new questions can be answered</title>
        <p>Tired of not being able to give details of the papers I
have discussed above, I offer my own rejected CHI
paper to make a point about direct replication [<xref ref-type="bibr" rid="ref4">4</xref>].
We had done a study with only six participants per
condition, and the effect was so strong that it attained
statistical significance on some coarse measures; that
study was published at the IEEE’s International
Conference on Software Engineering [<xref ref-type="bibr" rid="ref3">3</xref>]. The coarse
measures did not help us understand why the
participants performed better in some conditions than
in others, and they did not distinguish between two
conditions that had important implications for the
practical use of the technique we were investigating.
Therefore, we did a direct replication of the previous
study, justified combining the two data sets, and were
able to tease out several new insights given the
increased power of the combined study. We thought
the results were a significant contribution beyond the
initial study, and in fact, these results are the only ones
that excite software engineering audiences when I talk
about them (software engineers are the target “users”
of these research results).</p>
        <p>Whether you agree that the results are exciting enough
to publish is immaterial to the reviews we received:
“Reject; it’s JUST a replication,” without comment on
the new analyses and results. This leads me to the
hypothesis that new analyses are not sufficiently valued
or understood by our reviewing community to warrant
comment. The replication “surface structure” is enough
to push a paper into the JUST-a-replication barrel.</p>
        <p>This paper also brought out an interesting point about
the interaction of replication and anonymous reviewing.
This was in the era of CHI’s strict rules about
anonymization, so we wrote about ourselves in the third
person, as instructed. A reviewer seemed to think that
using “Golden et al.’s” materials was somehow cheating
or lazy and criticized us for not creating our own
materials. Again, this leads to the hypothesis that our
reviewing community needs education about the
process of a good replication (i.e., NOT making your
own materials), and it highlights a potential confound
between anonymity and replication. Might the paper
have been reviewed less harshly if the reviewers had
known that we did the original study, i.e., that we did
do the hard work of creating the materials and were
not cheating or lazy?</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>A proposed approach to a solution</title>
      <p>As explained above, my experiences lead me to the
hypothesis that if our community is to embrace
replication and publish good replications, reviewers
need to be educated about what makes a good
replication and about its value to the field.</p>
      <p>It is not sufficient to instruct Associate Chairs (ACs) and
Subcommittee Chairs (SCs), as was done at the
Program Committee (PC) meeting for CHI 2013, because
reviewer scores push replications down in the rankings,
and we cannot depend on human memory in the heat
of PC debates to raise such papers to the level of
discussion.</p>
      <p>Therefore, I propose that we build our values into our
submission and reviewing software, the Precision
Conference System (PCS), as a “job aid” for authors,
reviewers, ACs, and SCs that delivers education at the
time it is needed. Below I present “iteration 0” of a
design for these extensions to PCS.</p>
      <sec id="sec-6-1">
        <title>Job aid for authors</title>
        <p>At submission time, present authors with a required
radio-button question asking whether the paper is a
replication study. Include an information button next to
the question that leads to information about what a
replication study is and what the criteria for reviewing a
replication study are.</p>
        <p>It is possible that we would want to ask for the type of
replication (direct replication, replicate+extend, or
conceptual replication), but that may be introducing too
much complexity in the first iteration.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Job aid for reviewers</title>
        <p>If the author has declared the paper to be a replication
study, then the review form shown to reviewers
changes to include specific required fields that apply to
replication studies. Include an information button next
to every field so the reviewer can get information about
acceptable replication processes and the general value
of replication at the time of filling out the review.
Depending on how much we believe our target users
need the education, we may consider presenting this
information in a modal dialog box the first time a
reviewer clicks on a field, with a button that dismisses
the dialog box and a “do not show me this again”
checkbox, appearing only after a reasonable amount of
time needed to read the text in the box.</p>
        <p>Reviewers should be able to identify themselves to PCS
as being skilled in assessing replications and interested
in doing so.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Job aid for Associate Chairs (ACs)</title>
        <p>If the author has declared the paper to be a replication,
this is indicated to the AC at paper-assignment time, so
the AC is aware that reviewers skilled in experimental
design and analysis should be recruited. Such
reviewers may be self-identified in PCS, as above. We
may also consider allowing ACs and SCs to identify
especially skilled replication reviewers in PCS, just as we
currently acknowledge excellent reviews.</p>
        <p>At review time, the AC’s meta-review form also
changes to include required fields that specifically
address issues with replication, with information
buttons.</p>
        <p>PCS could also automatically mark such papers “to be
discussed at the PC meeting”. Depending on how
aggressively the CHI conference wants to consider
replication papers that year, the AC may or may not be
allowed to change this status.</p>
      </sec>
      <sec id="sec-6-4">
        <title>Job aid for Subcommittee Chairs (SCs)</title>
        <p>If the author has declared the paper to be a replication,
this is indicated to the SC at the time that papers are
assigned to ACs, so the SC can assign an AC skilled in
assessing replication. When recruiting ACs for a
subcommittee likely to get replication submissions, the
SCs might be asked to identify one or two ACs who are
skilled in assessing replications. This would get the SCs
thinking about this necessary skill while they can still do
something about it, rather than only once replication
studies arrive.</p>
        <p>At the PC meeting, the SC’s view should highlight the
papers that were identified by their authors as being
replication studies, so the SC can query the AC about
them during the meeting. Even if PCS allows the AC to
change the status of the paper to “do not discuss,” it
would contribute to the education of all ACs if a
sentence or two were said at the PC meeting about why
this replication paper was not being discussed.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>The changes to PCS proposed above, a zeroth iteration,
are purposely draconian, to start discussion of how our
conference reviewing technology can support our value
system surrounding replication studies. I believe the
need is there; let’s put our UI design skills and our
SIG’s money where our values are.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This research was supported in part by IBM. The
views and conclusions in this paper are those of the
author and should not be interpreted as representing
the official policies, either expressed or implied, of IBM.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Baskin</surname>,
            <given-names>J. D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>John</surname>,
            <given-names>B. E.</given-names>
          </string-name>
          (<year>1998</year>)
          <article-title>Comparison of GOMS analysis methods</article-title>.
          <source>Proceedings Companion of CHI 1998</source>
          (Los Angeles, CA, April 18-23, 1998). ACM, New York, pp.
          <fpage>261</fpage>-<lpage>262</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Card</surname>,
            <given-names>S. K.</given-names>
          </string-name>,
          <string-name>
            <surname>Moran</surname>,
            <given-names>T. P.</given-names>
          </string-name>, and
          <string-name>
            <surname>Newell</surname>,
            <given-names>A.</given-names>
          </string-name>
          (<year>1983</year>)
          <source>The Psychology of Human-Computer Interaction</source>.
          Lawrence Erlbaum Associates, Hillsdale, NJ.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Golden</surname>,
            <given-names>E.</given-names>
          </string-name>,
          <string-name>
            <surname>John</surname>,
            <given-names>B. E.</given-names>
          </string-name>, and
          <string-name>
            <surname>Bass</surname>,
            <given-names>L.</given-names>
          </string-name>
          (<year>2005</year>)
          <article-title>The value of a usability-supporting architectural pattern in software architecture design: A controlled experiment</article-title>.
          <source>Proceedings of the 27th International Conference on Software Engineering</source>
          (St. Louis, MO, May 2005).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Golden</surname>,
            <given-names>E.</given-names>
          </string-name>,
          <string-name>
            <surname>John</surname>,
            <given-names>B. E.</given-names>
          </string-name>, and
          <string-name>
            <surname>Bass</surname>,
            <given-names>L.</given-names>
          </string-name>
          (<year>2007</year>)
          <article-title>Helping software developers achieve usability</article-title>.
          <source>Unpublished replication study</source>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>