=Paper= {{Paper |id=None |storemode=property |title=RepliPRI: Challenges in Replicating Studies of Online Privacy |pdfUrl=https://ceur-ws.org/Vol-976/ppaper3.pdf |volume=Vol-976 |dblpUrl=https://dblp.org/rec/conf/chi/Patil13 }} ==RepliPRI: Challenges in Replicating Studies of Online Privacy== https://ceur-ws.org/Vol-976/ppaper3.pdf
RepliPRI: Challenges in Replicating Studies of Online Privacy

Sameer Patil
Helsinki Institute for Information Technology HIIT
Aalto University
Aalto 00076, Finland
sameer.patil@hiit.fi

Abstract
Replication of prior results has recently attracted attention and
interest from the CHI community. This paper focuses on the challenges
and issues faced in carrying out meaningful and valid replications of
HCI studies. I attribute these challenges to two main underlying
factors: (i) a domain of inquiry that simultaneously covers people,
social systems, and technology; and (ii) deficiencies in result
reporting and data archiving. Using examples from investigations of
online privacy, I outline how these challenges manifest themselves in
HCI studies. Longitudinal approaches, international collaboration, and
sharing of study instruments could help address these challenges.

Author Keywords
Replication, Privacy, Cultural differences

ACM Classification Keywords
H.1.2 [User/Machine Systems]: Human factors.

General Terms
Human Factors, Security

Presented at RepliCHI2013. Copyright © 2013 for the individual papers
by the papers’ authors. Copying permitted only for private and academic
purposes. This volume is published and copyrighted by its editors.

Introduction
Replication of prior results has recently attracted attention and
interest from the CHI community. The
resulting discussions tackle replication from two important
perspectives: the higher-level epistemological debate on the place and
merits of replication in the scientific (publishing) enterprise, and
the lower-level practical considerations involved in replicating
previous studies from the literature. Growing interest in RepliCHI
suggests increasing recognition of the value of replicating prior
studies. I hope and anticipate that this trend will foster continued
community discussion on how to justify, appreciate, and reward
replication as a valuable scientific pursuit. In this paper, I focus
on the latter perspective, viz., the challenges and issues faced in
carrying out meaningful and valid replications of HCI studies.

I attribute these challenges to two main factors:

  1. Domain of inquiry: A large proportion of HCI studies tackle
     research problems where results typically exhibit the simultaneous
     and interacting influence of individuals, social systems, and
     technology. These three factors change at drastically different
     rates and by different magnitudes. For instance, technology used
     in a study may become obsolete within months or a couple of years,
     while the physical and cognitive capabilities of adults change at
     much slower rates (and the magnitude of the change is often
     comparatively small and predictable). These differences in the
     evolution trajectories of humans, cultures, and technology make it
     difficult to replicate studies at a later time and to identify the
     causes behind differences in results, if any.

  2. Insufficient and/or incomplete reporting: Typically, the only
     resource available for replicating a study is the publication
     describing its results. Unfortunately, due to page limits and
     other editorial reasons, publications often omit some of the
     information about methods and/or data that is necessary for
     carrying out the study the way it was originally conducted. For
     instance, instead of including the entire questionnaire
     instrument, the publication may include only those questionnaire
     items that led to statistically significant results. Similarly,
     results may be presented in the aggregate or as percentages,
     making it difficult to replicate analyses that require details of
     individual data points.

In the following section, I outline how I have found these challenges
to manifest themselves in investigations of user preferences and
practices regarding online privacy. I conclude with some thoughts on
addressing the challenges.

Replicating Studies of Online Privacy
When thinking about and carrying out replications of research related
to privacy, I have encountered several practical challenges.

Privacy is a nuanced and complex issue affected by individual
characteristics, the context of operation, and the technology under
consideration. For instance, individuals have been classified into
different groups based on their inherent level of privacy concern [7],
and privacy concerns have been shown to exhibit cultural variation
[3]. People’s mental models and understanding of the underlying
technology also affect their preferences and practices regarding
privacy [4]. This implies that even when considering the same
technology, a replication conducted at a later time ought
to take into account the impact of learning effects on privacy issues.
Replications may also encounter the selection-maturation threat to
validity owing to major external events that occur after the original
study, such as news coverage of privacy breaches. Such events affect
the population’s overall understanding and awareness of privacy
issues, thereby potentially affecting the results of replications of
studies that were originally conducted prior to such events.

The majority of attention in replication has been devoted to
replication at a different (later) time. In the case of privacy,
however, it is equally important to consider replication across
different cultures. For example, we administered a questionnaire
simultaneously in the US and India, enabling us to draw interesting
and surprising observations from comparisons across cultures [5]. Our
results confirmed earlier findings regarding low levels of consumer
privacy concerns in India. Surprisingly, by examining interpersonal
privacy separately from consumer privacy, we found that interpersonal
privacy concerns in India were not only higher than consumer privacy
concerns but also higher than interpersonal privacy concerns in the
US. Our study considered culture at the broad level of national
cultures. However, it should be noted that for replication purposes
“culture” could be construed to connote any large group with shared
characteristics and/or values, such as students, engineers, mothers,
liberals, etc. Moreover, if replication across cultures is conducted
at a time later than the original study, then learning effects and
maturation threats need to be taken into account (as discussed above).

In theory, replication with a different cultural sample is a simple
case of re-running the study with subjects drawn from a different
culture, with translation of instruments and study materials, if
necessary. In practice, however, cultural differences pose several
hurdles. For instance, the same word or term may be interpreted
differently, leading to the same question being answered differently.
We found, for example, that the term “cubicle” was understood
differently in the US and India owing to differences in office layouts
and density. This difference was one of the factors crucial for
understanding the differences in results between the US and India [5].
In other studies, I discovered that the demographic question about
ethnicity, which is commonly asked in the US (and is even mandated for
NSF-sponsored studies), was considered potentially offensive and
confusing in Europe.

Differences in lifestyle and beliefs can also affect whether questions
and tasks from one study can yield valid results, or even make sense,
when replicated in a different cultural context. For instance, some
privacy studies have asked Western respondents about premarital sex,
sexual practices, extramarital affairs, and number of sexual partners
(e.g., [1]). Such questions are unlikely to produce meaningful results
in cultures where such practices are uncommon and/or forbidden.
Resolving this issue can be complicated when such culturally specific
questions form part of standard scales: using the scale without
modification will not yield meaningful results, while dropping and/or
modifying items in the scale risks compromising the validity of
comparisons across studies.

Finally, it is also necessary to consider whether results across
cultures are affected by differences in sampling techniques and sample
characteristics. For instance, although our comparison of the US and
India was limited to software professionals, the mean and median ages
of the Indian
participants were lower than those of the US participants.

We found that understanding privacy-related cultural nuance often
requires insights derived from qualitative methods (such as
interviews, focus groups, and field visits) and/or insider knowledge
of the culture and its practices [6]. Currently, the CHI community is
focused mostly on replicating studies that employ quantitative
methods, such as experiments, questionnaires, or usability
evaluations. Complementing quantitative replications with qualitative
insights has the potential to broaden the scope of these replication
endeavors. Toward this end, it may also be fruitful to examine whether
and how qualitative studies could be effectively replicated.

Discussion and Conclusion
The previous section used examples from investigations of online
privacy attitudes and behaviors to illustrate some of the challenges
and issues in replicating HCI studies. Online privacy cuts across the
individual, the social, and the technical, in much the same way as
many studies in HCI do. Therefore, I believe that many, if not all, of
these concerns are also likely to arise in HCI investigations of other
topics.

The RepliCHI workshop is an important milestone toward developing a
comprehensive compilation and understanding of the various challenges
involved in the replication of HCI studies. Moving forward, it is
necessary to apply this knowledge and insight to construct best
practices to follow and pitfalls to avoid. Toward this end, I offer
suggestions that address the two important considerations outlined in
the Introduction, viz., (i) a domain of inquiry that simultaneously
covers individuals, social systems, and technology; and (ii) result
reporting and data archiving.

The second of these, in particular, could be easily addressed by
requiring the inclusion of full instruments and study protocols as
appendices.¹ Similarly, authors of accepted papers could be asked, or
even required, to upload the raw data after taking the steps necessary
to protect participant anonymity. In this regard, ACM, IEEE, NSF, and
other prominent HCI funding and sponsoring organizations could follow
the lead of the NIH, which mandates raw data availability. In a similar
vein, an approach inspired by open source could encourage authors to
release the source code of the systems and scripts used for conducting
studies and carrying out analyses. An open question regarding data and
code sharing is how to deal with commercialization and intellectual
property issues (especially when corporate entities are involved in
conducting the study).²

¹ This also provides the additional benefit of addressing one of the
most common comments raised in peer reviews: lack of methodological
detail.
² The use of data in studies conducted by corporations was a hotly
debated topic at the WWW 2012 conference [2].

One approach to addressing the challenges arising from the
intersection of people and technology is to encourage longitudinal
investigations carried out at regular intervals over several years.
Depending on the details and logistics of the study, a longitudinal
investigation could involve the same participants or different
participants recruited with the same sampling method and sample
characteristics. The former approach can help examine the impact of
changes in individual characteristics, evolution in lifestyles, and
effects of learning. The latter approach can help illuminate the
impact of changes in
technology. For replications across cultures, however, it is perhaps
best to target simultaneous study deployment. Fostering international
collaborations and/or leveraging international students to gain
cultural knowledge and access could help in this regard.

Requiring a replication component in Bachelor’s and Master’s theses
could provide a starting point for repeating studies from the
literature, while simultaneously serving a valuable pedagogical
purpose by training the next generation of researchers. Further,
conferences and journals could explicitly solicit replications of
specific studies, and special conference sessions or journal sections
could be devoted solely to replication studies. Discussions and
follow-up activities from the RepliCHI workshop could lead the way
toward legitimizing and promoting replication as a valuable scientific
pursuit within HCI.

Acknowledgments
I thank Mihir Mahajan and John McCurley for editorial comments.

References
[1] Grossklags, J., and Acquisti, A. When 25 cents is too much: An
    experiment on willingness-to-sell and willingness-to-protect
    personal information. In Workshop on the Economics of Information
    Security (WEIS) (2007).
[2] Markoff, J. Troves of personal data, forbidden to researchers.
    http://www.nytimes.com/2012/05/22/science/big-data-troves-stay-forbidden-to-social-scientists.html,
    May 2012.
[3] Milberg, S., Burke, S., Smith, H., and Kallman, E. Values, personal
    information privacy, and regulatory approaches. Communications of
    the ACM 38, 12 (1995), 65–74.
[4] Patil, S., and Kobsa, A. Uncovering privacy attitudes and practices
    in Instant Messaging. In Proceedings of the 2005 International ACM
    SIGGROUP Conference on Supporting Group Work, GROUP ’05, ACM (New
    York, NY, USA, 2005), 109–112.
[5] Patil, S., Kobsa, A., John, A., and Seligmann, D. Comparing privacy
    attitudes of knowledge workers in the U.S. and India. In
    Proceedings of the 3rd International Conference on Intercultural
    Collaboration, ICIC ’10, ACM (New York, NY, USA, 2010), 141–150.
[6] Patil, S., Kobsa, A., John, A., and Seligmann, D. Methodological
    reflections on a field study of a globally distributed software
    project. Information and Software Technology 53, 9 (2011), 969–980.
[7] Taylor, H. Most people are “privacy pragmatists” who, while
    concerned about privacy, will sometimes trade it off for other
    benefits. The Harris Poll 17 (2003), 19.