Avoiding "It's JUST a Replication"

Bonnie E. John
IBM T. J. Watson Research Center
1101 Kitchawan Rd
Yorktown Heights, NY 10598 USA
bejohn@us.ibm.com

Presented at RepliCHI2013. Copyright © 2013 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

Abstract
This position paper explores my experiences getting replication studies accepted at the CHI conference over the past 30 years. These experiences lead to my hypothesis that CHI reviewers and program committee members at all levels need education and technology support to understand and appropriately consider replication studies for publication at CHI. I propose a draconian "zeroth iteration" on a design for extensions to the Precision Conference System to spur discussion about how we can design our values into our processes.

Author Keywords
Experimental design, replication.

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

General Terms
Human Factors

Introduction
Replication has been at the heart of science for as long as the scientific method has existed; sometimes it feels as though I have been fighting for the value of replication at CHI almost as long. As an engineer by training and inclination, I view replication as even more important for the practice of UI design, because practitioners can (and should) only trust results from science when those results have been replicated at several different research groups (i.e., direct replication) and the boundaries of applicability have been thoroughly explored through replicate+extend studies.

I cannot count the number of times I have heard "Reject; it's JUST another Fitts's Law study" or "Reject; it's JUST another GOMS study" at program committee meetings in our field. When present, I have sometimes been able to rescue these contributions to our field's science base. I can only imagine how many such papers were rejected when I, or like-minded researchers, were not present, and how many potentially-contributing authors have been discouraged by such "JUST a replication" reviews. This position paper is a proposal for how to avoid "It's JUST a replication" in the absence of dogmatic senior researchers like me.

Hypotheses about the problem
It is my experience that some sorts of replication are more acceptable to reviewers and program committees than others. The most acceptable seem to be those that replicate only a method; e.g., Baskin and John [1] used the same method of achieving extremely skilled task execution performance as did Card, Moran and Newell [2]. Using the same method to study performance on a GUI CAD system [1] and a command-line text editor [2] was not criticized by reviewers, seemingly because the tasks were sufficiently different. My hypothesis is that method replication is not a problem in HCI research publication, so much so that it might not even be recognized as a type of replication.

However, I know of replicate-and-extend papers falling (or being pushed) into the JUST-a-replication barrel when they vary any one of the myriad other variables in a study.

Extending the participants to a new user group
For example, a study I cannot name for confidentiality purposes was rejected when it replicated an educational treatment using participants who were different from the previously published work: they were at a lesser-known school, they were in a different major and therefore could be assumed to be less motivated to do well on the topic, and they were given less direct access to expert support in doing the experiment. The fact that these participants performed as well as the majors at a top-of-the-line school studying under the inventor of the educational treatment is a replication worth printing, because it gives hope that the educational treatment will scale beyond the reach of its inventor.

Similarly, a paper that was rescued from JUST-a-replication, but which I will not name to maintain confidentiality, described a well-known HCI method being used by practitioners far outside the HCI field, who had picked up the technique from the HCI literature and made profitable use of it, verified with empirical data. That any of our methods can be of use to people without our help is a result worth publishing, because it also shows that the beneficial impact of our field can extend beyond the reach of our limited number of researchers.

Extending the measures in the study to cover new questions
Again, in a rejected paper I cannot reveal, a replication was done that included additional survey data exploring why some behavior was observed in both the original and replication studies. The survey instrument was new, the data was new, and, to me, the insight it revealed was new, but this was rejected as JUST-a-replication. Thus, there seems to be a disagreement in our community about how much extension constitutes a publishable extension. In my opinion, the replication itself was valuable and the extension was icing on the cake, but that was not the opinion of the reviewers. Differences of opinion about what does and does not constitute a publishable contribution are not uncommon, and in fact should be encouraged, but the reviews did not even acknowledge that there was any extension at all, causing me to hypothesize that the definition of replicate+extend is not well assimilated into our review community. The replication "surface structure" is enough to push a paper into the JUST-a-replication barrel.

Direct replication to increase statistical power so that new questions can be answered
Tired of not being able to give details of the papers I have discussed above, I offer my own rejected CHI paper to make a point about direct replication [4]. We had done a study with only six participants per condition; the effect was so strong that it attained statistical significance on some coarse measures, and the study was published at the IEEE's International Conference on Software Engineering [3]. The coarse measures did not help us understand why the participants performed better on some conditions than others, and did not distinguish between two conditions that had important implications for the practical use of the technique we were investigating. Therefore, we did a direct replication of the previous study, justified combining the data, and were able to tease out several new insights given the increased power of the combined study. We thought the results were a significant contribution beyond the initial study, and in fact, these results are the only ones that excite software engineering audiences when I talk about them (SEs are the target "users" of these research results).

Whether you agree that the results are exciting enough to publish is immaterial to the reviews we received – "Reject; it's JUST a replication," without comment on the new analyses and results. This leads me to the hypothesis that new analyses are not sufficiently valued or understood by our reviewing community to warrant comment.

An interesting point about the interaction of replication and anonymous reviewing was brought out by this paper as well. This was in the era of CHI's strict rules about anonymization, so we wrote about ourselves in the third person, as instructed. A reviewer seemed to think that using "Golden et al.'s" materials was somehow cheating or lazy, and criticized us for not creating our own materials. Again, this leads to the hypothesis that our reviewing community is in need of education about the process of a good replication (i.e., NOT making your own materials), and it highlights a potential confound between anonymity and replication. Might the paper have been less harshly reviewed if the reader had known that we did the original study, i.e., we did do the hard work of creating the materials and were not cheating or lazy?

A proposed approach to a solution
As explained above, my experiences lead me to the hypothesis that if our community is to embrace replication and publish good replications, reviewers need to be educated about what makes a good replication and its value to the field. It is not sufficient to instruct Associate Chairs (ACs) and Sub-committee Chairs (SCs), as was done at the Program Committee meeting for CHI 2013, because reviewer scores push replications down in the rankings, and we cannot depend on human memory in the heat of PC debates to raise such papers to the level of discussion.

Therefore, I propose that we build our values into our submission and reviewing software (the Precision Conference System, PCS), to be a "job aid" to authors, reviewers, ACs, and SCs, delivering education at the time it is needed. Below I present "iteration 0" of a design for these extensions to PCS.

Job aid for authors
Present a required radio button for authors at submission time. Include an information button next to the question that leads to information about what a replication study is and what the criteria are for reviewing a replication study.

It is possible that we would want to ask for the type of replication (direct replication, replicate+extend, or conceptual replication), but that may be introducing too much complexity in the first iteration.

Job aid for reviewers
If the author has declared the paper to be a replication study, then the review form shown to reviewers changes to include specific required fields that apply to replication studies. Include an information button next to every field so the reviewer can get information about acceptable replication processes and the general value of replication at the time of filling out the review. Depending on how much we believe our target users need the education, we may consider presenting this information in a modal dialog box when the field is first clicked by a reviewer, with a button that dismisses the dialog box and a checkbox "do not show me this again" appearing after a reasonable amount of time needed to read the text in the box.

Reviewers should be able to identify themselves to PCS as being skilled in assessing replications and interested in doing so.
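To make the conditional review form concrete, here is a minimal sketch. Everything in it is hypothetical: PCS's real data model and API are not public in this form, and names such as `REPLICATION_FIELDS` and `review_form` are my own invention for illustration only.

```python
# Hypothetical sketch of a PCS-like review form that adds required
# replication fields when the author has declared a replication study.
# All names are invented for illustration; the real PCS differs.

BASE_FIELDS = ["summary", "contribution", "rating"]

# Extra required fields, each paired with the "information button" text
# that educates the reviewer at the moment of filling out the review.
REPLICATION_FIELDS = {
    "replication_fidelity":
        "Did the authors reuse the original materials? Reusing them is "
        "good replication practice, not laziness.",
    "extension_assessment":
        "What does this study add beyond the original (new participants, "
        "measures, analyses, or statistical power)?",
    "value_of_replication":
        "Direct replications that confirm or bound earlier results are "
        "themselves a contribution.",
}

def review_form(is_replication: bool) -> list:
    """Return the required fields for a submission's review form."""
    fields = list(BASE_FIELDS)
    if is_replication:
        fields += REPLICATION_FIELDS  # dict iteration yields field names
    return fields

def info_text(field: str) -> str:
    """Text behind the information button next to a replication field."""
    return REPLICATION_FIELDS.get(field, "")
```

The point of the sketch is that the extra fields (and their educational text) appear only for declared replications, so non-replication reviews are unchanged.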
Job aid for Associate Chairs (ACs)
If the author has declared the paper to be a replication, this is indicated to the AC at paper-assignment time, so the AC is aware that reviewers skilled in experimental design and analysis should be recruited. Such reviewers may be self-identified in PCS, as above. We may also consider allowing ACs and SCs to identify especially skilled replication reviewers in PCS, much as we currently acknowledge excellent reviews.

At review time, the AC's meta-review form also changes to include required fields that specifically address issues with replication, with information buttons.

PCS could also automatically mark the paper "to be discussed at the PC meeting". Depending on how aggressive the CHI conference wants to be that year about considering replication papers, this status may or may not be changeable by the AC.
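The assignment-time flagging and the automatic "to be discussed" status might look like the following sketch. Again, every name here (`Paper`, `on_assignment`, the `replication_reviewer` flag) is hypothetical, standing in for whatever PCS actually stores.

```python
# Hypothetical sketch of assignment-time support for replication papers:
# flag the paper for the AC, prefer self-identified replication
# reviewers, and auto-mark the paper for discussion at the PC meeting.
from dataclasses import dataclass, field

@dataclass
class Paper:
    title: str
    is_replication: bool
    status: str = "normal"
    notes: list = field(default_factory=list)

def on_submission(paper: Paper) -> Paper:
    """Auto-mark declared replications for discussion at the PC meeting."""
    if paper.is_replication:
        paper.status = "to be discussed"
    return paper

def on_assignment(paper: Paper, reviewers: list) -> list:
    """Called when an AC assigns reviewers; returns the preferred pool."""
    if not paper.is_replication:
        return reviewers
    paper.notes.append("Recruit reviewers skilled in experimental design.")
    # Prefer reviewers who self-identified as skilled in assessing
    # replications; fall back to the full pool if none did.
    skilled = [r for r in reviewers if r.get("replication_reviewer")]
    return skilled or reviewers
```

The design choice worth noting is the fallback: the system nudges toward skilled replication reviewers without ever leaving a paper unassignable.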
Job aid for Subcommittee Chairs (SCs)
If the author has declared the paper to be a replication, this is indicated to the SC at the time that papers are assigned to ACs, so the SC can assign an AC skilled in assessing replication. When recruiting ACs for a subcommittee likely to get replication submissions, the SCs might be asked to identify one or two ACs who are skilled in assessing replications, which will get the SCs thinking about this necessary skill when they can do something about it, instead of when replication studies arrive.

At the PC meeting, the SC's view should highlight the papers that were identified by their authors as being replication studies, so the SC can query the ACs about them during the meeting. Even if PCS allows the AC to change the status of a paper to "do not discuss", it would contribute to the education of all ACs if a sentence or two were said at the PC meeting about why that replication paper was not being discussed.

Conclusion
The zeroth iteration on changes to PCS proposed above is purposely draconian, to start discussion of how our conference reviewing technology can support our value system surrounding replication studies. I believe the need is there; let's put our UI design skills and our SIG's money where our values are.

Acknowledgements
This research was supported in part by IBM. The views and conclusions in this paper are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of IBM.

References
[1] Baskin, J. D. and John, B. E. (1998) Comparison of GOMS analysis methods. Proceedings Companion of CHI 1998 (Los Angeles, CA, April 18-23, 1998). ACM, New York, pp. 261-262.
[2] Card, S. K., Moran, T. P., and Newell, A. (1983) The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, NJ.
[3] Golden, E., John, B. E., and Bass, L. (2005) The value of a usability-supporting architectural pattern in software architecture design: A controlled experiment. Proceedings of the 27th International Conference on Software Engineering (St. Louis, MO, May 2005).
[4] Golden, E., John, B. E., and Bass, L. (2007) Helping software developers achieve usability. Unpublished replication study.