Re-testing the Perception of Social Annotations in Web Search

Jennifer Fernquist
Google
1600 Amphitheatre Pkwy
Mountain View, CA 94043 USA
jenf@google.com

Ed H. Chi
Google
1600 Amphitheatre Pkwy
Mountain View, CA 94043 USA
edchi@google.com

Presented at RepliCHI2013. Copyright © 2013 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

Abstract
We evaluated the perception of social annotations designed according to the guidelines recommended by Muralidharan, Gyongyi, and Chi (2012). Their initial study found that participants noticed the annotations only 11% of the time when the annotations were shown below the search result snippet. Our refined study revealed that the proposed design, with the annotation above the snippet, increased noticeability to 60%. Replication studies are often iterative versions of old studies, and this one was no exception: the new study refined the protocol for measuring 'notice' events and modified the tasks to keep them relevant to recent news.

Author Keywords
Annotation; social search; eyetracking; user study.

ACM Classification Keywords
H.5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous

Introduction
The abundance of information on the web suggests the importance of creating an environment in which users have the appropriate signals to decide which search results are the most useful to them. As more of the web involves social interactions, those interactions produce a wealth of signals for finding the most interesting and relevant information. Much research has been done on modifying search ranking based on social signals for web pages [1][2][3][5][6], but how should we present the social signals themselves with web search results? The most recent work we have found is the CHI 2012 paper on social annotations by Muralidharan et al. [4].

Previous Research
Muralidharan et al. [4] studied the perception of social annotations appearing below search results, as in Figure 1. Consistent with prior papers, we use the term "social signals" to refer to any social information that is used to affect ranking, recommendation, or presentation to the user. We use the term "social annotations" to refer to the presentation of social signals as an explanation of why a search or recommendation result is presented. Thus, a social signal only becomes an annotation when it is presented to the user.

Figure 1: Example of older designs of social annotations. Image is from [4].

Study Protocol
Their first study had two parts: (1) In the first part, participants conducted 18-20 search tasks, randomly ordered. Half were designed so that one or two social annotations would appear in the top four or five results. The search results pages were presented as static mocks that were generated before the study, customized for each participant. (2) The second part consisted of a retrospective think-aloud (RTA) in which the researchers walked the participant through each task using the eyetrace data post hoc. During the interview, the researchers checked noticeability by noting whether participants spontaneously mentioned seeing the social annotations or, when explicitly asked, said they had seen them. During the RTA the researchers also obtained qualitative feedback about the social annotations.

The second study compared the perception of multiple designs of social annotations, varying profile image size (small, large), snippet length (1, 2, or 4 lines), and annotation position (above or below the snippet). For this study the same mocks were used for each participant, customized only with familiar names and faces of people in the annotations. In the second study, noticeability of the annotations was measured by counting the number of fixations.
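To make the second study's design space and its fixation-count measure concrete, the sketch below enumerates the 2 x 3 x 2 factorial conditions and tallies the fixations that land inside an annotation's bounding box. This is a minimal illustration under our own assumptions, not the original analysis code: the Fixation and Box records, their field names, and the 100 ms duration threshold are hypothetical stand-ins for whatever the eyetracker export actually provides.

    from dataclasses import dataclass
    from itertools import product

    # The 2 x 3 x 2 factorial design described above: 12 annotation designs.
    IMAGE_SIZES = ("small", "large")
    SNIPPET_LINES = (1, 2, 4)
    POSITIONS = ("above", "below")
    CONDITIONS = list(product(IMAGE_SIZES, SNIPPET_LINES, POSITIONS))

    @dataclass
    class Fixation:
        x: float           # screen coordinates from the eyetracker export
        y: float
        duration_ms: float

    @dataclass
    class Box:             # bounding box of one social annotation on screen
        left: float
        top: float
        right: float
        bottom: float

        def contains(self, f: Fixation) -> bool:
            return self.left <= f.x <= self.right and self.top <= f.y <= self.bottom

    def count_fixations(fixations, annotation_box, min_duration_ms=100.0):
        # Tally fixations landing inside the annotation's bounding box;
        # filtering out very short fixations is our assumption, not the paper's.
        return sum(1 for f in fixations
                   if f.duration_ms >= min_duration_ms and annotation_box.contains(f))

Comparing per-condition tallies produced this way is what allows the design factors (image size, snippet length, position) to be ranked by how strongly they attract gaze.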
Findings
In the first study, they found that only 5 of the 45 (11%) visible social annotations were noticed. In the second study, they found that there were fewer fixations on annotations when the snippet was longer, the image was smaller, and the annotation was below the snippet. They concluded that the optimal design for a social annotation is one with a large picture placed above a short snippet.

Our Method and Replication
We aimed to test the proposed annotation design guidelines from study 2 with live user data, using the method from study 1, to see whether people notice the annotations more. Specifically, we wanted to test with live data that is relevant to participants (drawn from their connections), as opposed to the static images used previously. An example of a social annotation with the new design is shown in Figure 2.

Figure 2: Example result with the new annotation design proposed by prior work. The annotation is above the snippet, has a large image, and the snippet is less than 4 lines long.

Study Protocol
Experimental sessions consisted of 3 parts, the first two using essentially the same protocol as experiment 1 of the previous work, with some improvements.

PART 1: SEARCH TASKS
We planned 16-20 custom search tasks for each subject, at least eight of which were "social search" tasks designed to organically pull up social annotations. The 8 non-social search tasks were the same as those used in the prior work.

To ensure that personal results appeared for as many queries as required, we designed 2-4 additional social search tasks for each participant that were intended to bring up personal results. This way, if one social search task did not bring up personal results, we gave them the additional tasks to help ensure that they saw 8 tasks with personal results.

PART 2: RETROSPECTIVE THINK-ALOUD
After the search tasks, we immediately conducted a retrospective review of the eye-tracking traces for search tasks in which subjects exhibited behaviors of interest to the experimenter. Reviewing the eye-tracking videos prompted think-aloud question answering about participants' process on the entire task, on particular interesting pages, and on particular interesting results. Unlike Experiment 1 in Muralidharan et al. [4], we examined the eyetrace data directly by hand to determine noticeability, rather than relying on verbal feedback during the RTA.

PART 3: THINK-ALOUD TASKS
Finally, participants performed two or three additional search queries that we had determined ahead of time should bring up relevant personal results. Here we gathered qualitative feedback on social annotations.

Results
In total, we collected eye-trace data for 153 tasks from nine subjects. The eye-trace data for each task was analyzed by hand by an experimenter to determine: which positions contained personal search results; whether the search result was within the field of view in the browser; and, importantly, whether the subject fixated on the result and/or the social annotation. This funnel analysis approach differs from the previous work's approach of asking participants whether they noticed the annotations.
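The funnel itself is simple enough to show as a sketch. Assuming each hand-coded task record carries three booleans (the field names below are ours, not the study's), the analysis reduces to successive filters:

    from dataclasses import dataclass

    @dataclass
    class TaskRecord:
        # One hand-coded record per task, as described above.
        annotation_shown: bool    # a result with a social annotation appeared
        annotation_in_view: bool  # that result was inside the browser viewport
        annotation_fixated: bool  # the subject fixated on the annotation

    def funnel(records):
        shown = [r for r in records if r.annotation_shown]
        in_view = [r for r in shown if r.annotation_in_view]
        fixated = [r for r in in_view if r.annotation_fixated]
        print(f"shown: {len(shown)}, in view: {len(in_view)}, fixated: {len(fixated)}")
        if shown:
            # e.g. 35 fixated out of 58 shown gives the 60% reported below
            print(f"noticeability: {len(fixated) / len(shown):.0%}")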
We discovered that participants fixated on annotations in 35 of the 58 tasks in which they appeared (60%). This is a dramatic improvement over the 11% perception rate of Muralidharan et al. [4]. We attribute this difference primarily to the new annotation design.

Replication Discussion
Access to Previous Experimental Data. We were able to repeat the exact same tasks performed in the previous work, but only because we share a co-author who had access to the data. Anyone else attempting to replicate the study would not have been able to do so as effectively.

Temporal Challenges. Even though the search tasks were identical, because our study was conducted several months later, some of the task questions were no longer topically relevant. For example, one task asked "What is the website for the Google image labeling game?"; at the time of our study, the website was no longer active. Similarly, the search task "Find some information about the Nevada law legalizing self-driving cars" brought up news articles from the previous summer, when Muralidharan et al. [4] conducted their research, because the law was no longer recent news.

This raises a big issue for research replication: changing environments, such as time or place. In our case, the tasks lost their relevancy over time. Researchers could mitigate this by rewriting tasks so that they are more relevant but still in the same vein as the originals. For example, we could have written a different task that was more topical but would still be categorized as news. It must be decided which would cause the least discrepancy for a replication: keeping the identical but less relevant task, or rewriting a relevant task that differs from the original.

Iteration and Refinement. The primary difference in our protocol, measuring perception with fixation data rather than verbal confirmation, offered an improvement over the previous work.

Even with those challenges, we feel that our replication efforts were successful. We conducted an almost identical study to confirm the proposed improved design for social annotations and found a large increase in perception.

References
[1] Bao, S., Xue, G., Wu, X., Yu, Y., Fei, B., and Su, Z. Optimizing web search using social annotations. In Proc. WWW 2007, 501–510.
[2] Carmel, D., Zwerdling, N., Guy, I., Ofek-Koifman, S., Har'el, N., Ronen, I., Uziel, E., Yogev, S., and Chernov, S. Personalized social search based on the user's social network. In Proc. CIKM 2009, 1227–1236.
[3] Heymann, P., Koutrika, G., and Garcia-Molina, H. Can social bookmarking improve web search? In Proc. WSDM 2008, 195–206.
[4] Muralidharan, A., Gyongyi, Z., and Chi, E. H. Social annotations in web search. In Proc. CHI 2012, 1085–1094.
[5] Yanbe, Y., Jatowt, A., Nakamura, S., and Tanaka, K. Can social bookmarking enhance search in the web? In Proc. JCDL 2007, 107–116.
[6] Zanardi, V., and Capra, L. Social ranking: uncovering relevant content using tag-based recommender systems. In Proc. RecSys 2008, 51–58.