Re-testing the Perception of Social Annotations in Web Search

Jennifer Fernquist
Google
1600 Amphitheatre Pkwy
Mountain View, CA 94043 USA
jenf@google.com

Ed H. Chi
Google
1600 Amphitheatre Pkwy
Mountain View, CA 94043 USA
edchi@google.com

Presented at RepliCHI2013. Copyright © 2013 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

Abstract
We evaluated the perception of social annotations designed according to the guidelines recommended by Muralidharan, Gyongyi, and Chi (2012). Their initial study found that participants noticed the annotations only 11% of the time when the annotations were shown below the search result snippet. Our refined study revealed that the proposed design, with the annotation above the snippet, increased noticeability to 60%. Replication studies are often iterative versions of old studies, and this one was no exception: the new study refined the protocol for measuring 'notice' events and modified the tasks to keep them relevant to recent news.

Author Keywords
Annotation; social search; eyetracking; user study.

ACM Classification Keywords
H.5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous

Introduction
The abundance of information on the web suggests the importance of creating an environment in which users have the appropriate signals to decide which search results are the most useful to them. As more of the web involves social interactions, those interactions produce a wealth of signals for finding the most interesting and relevant information. Much research has been done on modifying search ranking based on social signals for web pages [1][2][3][5][6], but how should we present the social signals themselves with web search results? The most recent work we have found is the CHI 2012 paper on social annotations by Muralidharan et al. [4].

Previous Research
Muralidharan et al. [4] studied the perception of social annotations appearing below search results, as in Figure 1. Consistent with prior papers, we use the term "social signals" to refer to any social information that is used to affect ranking, recommendation, or presentation to the user. We use the term "social annotations" to refer to the presentation of social signals as an explanation of why a search or recommendation result is presented. Thus, a social signal only becomes an annotation when it is presented to the user.

Figure 1: Example of older designs of social annotations. Image is from [4].

Study Protocol
Their first study had two parts: (1) In the first part, participants conducted 18-20 search tasks, randomly ordered. Half were designed so that one or two social annotations would appear in the top four or five results. The search results pages were presented as static mocks that were generated before the study, customized for each participant. (2) The second part consisted of a retrospective think-aloud (RTA) in which the researchers walked the participant through each task using the eyetrace data post hoc. During the interview, the researchers checked noticeability by noting whether participants spontaneously mentioned seeing the social annotations or, when explicitly asked, said they had seen them. During the RTA the researchers also obtained qualitative feedback about the social annotations.

The second study compared the perception of multiple designs of social annotations, varying profile image size (small, large), snippet length (1, 2, or 4 lines), and annotation position (above or below the snippet). For this study the same mocks were used for each participant, customized only with familiar names and faces of people in the annotations. In the second study, noticeability of the annotations was measured by counting the number of fixations.
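To make the second study's design space and its fixation-count measure concrete, the sketch below enumerates the 2 x 3 x 2 factorial conditions and tallies the fixations that land inside an annotation's bounding box. This is a minimal illustration under our own assumptions, not the original analysis code: the Fixation and Box records, their field names, and the 100 ms duration threshold are hypothetical stand-ins for whatever the eyetracker export actually provides.

    from dataclasses import dataclass
    from itertools import product

    # The 2 x 3 x 2 factorial design described above: 12 annotation designs.
    IMAGE_SIZES = ("small", "large")
    SNIPPET_LINES = (1, 2, 4)
    POSITIONS = ("above", "below")
    CONDITIONS = list(product(IMAGE_SIZES, SNIPPET_LINES, POSITIONS))

    @dataclass
    class Fixation:
        x: float           # screen coordinates from the eyetracker export
        y: float
        duration_ms: float

    @dataclass
    class Box:             # bounding box of one social annotation on screen
        left: float
        top: float
        right: float
        bottom: float

        def contains(self, f: Fixation) -> bool:
            return self.left <= f.x <= self.right and self.top <= f.y <= self.bottom

    def count_fixations(fixations, annotation_box, min_duration_ms=100.0):
        # Tally fixations landing inside the annotation's bounding box;
        # filtering out very short fixations is our assumption, not the paper's.
        return sum(1 for f in fixations
                   if f.duration_ms >= min_duration_ms and annotation_box.contains(f))

Comparing per-condition tallies produced this way is what allows the design factors (image size, snippet length, position) to be ranked by how strongly they attract gaze.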
Findings
In the first study, they found that only 5 of the 45 (11%) visible social annotations were noticed. In the second study, they found that there were fewer fixations on annotations when the snippet was longer, the image was smaller, and the annotation was below the snippet. They concluded that the optimal design for a social annotation is one with a large picture placed above a short snippet.

Our Method and Replication
We aimed to test the proposed annotation design guidelines from study 2 with live user data, using the method from study 1, to see whether people notice the annotations more. Specifically, we wanted to test with live data that is relevant to participants (drawn from their connections), as opposed to the static images used previously. An example of a social annotation with the new design is shown in Figure 2.

Figure 2: Example result with the new annotation design proposed by prior work. The annotation is above the snippet, has a large image, and the snippet is less than 4 lines long.

Study Protocol
Experimental sessions consisted of 3 parts, the first two using essentially the same protocol as experiment 1 of the previous work, with some improvements.

PART 1: SEARCH TASKS
We planned 16-20 custom search tasks for each subject, at least eight of which were "social search" tasks designed to organically pull up social annotations. The 8 non-social search tasks were the same as those used in the prior work.

To ensure that personal results appeared for as many queries as required, we designed 2-4 additional social search tasks for each participant that were intended to bring up personal results. This way, if one social search task did not bring up personal results, we gave them the additional tasks to help ensure that they saw 8 tasks with personal results.

PART 2: RETROSPECTIVE THINK-ALOUD
After the search tasks, we immediately conducted a retrospective review of the eye-tracking traces for search tasks in which subjects exhibited behaviors of interest to the experimenter. Reviewing the eye-tracking videos prompted think-aloud question answering about participants' process on the entire task, on particular interesting pages, and on particular interesting results. Unlike Experiment 1 in Muralidharan et al. [4], we examined the eyetrace data directly by hand to determine noticeability, rather than relying on verbal feedback during the RTA.

PART 3: THINK-ALOUD TASKS
Finally, participants performed two or three additional search queries that we had determined ahead of time should bring up relevant personal results. Here we gathered qualitative feedback on social annotations.

Results
In total, we collected eye-trace data for 153 tasks from nine subjects. The eye-trace data for each task was analyzed by hand by an experimenter to determine: which positions contained personal search results; whether the search result was within the field of view in the browser; and, importantly, whether the subject fixated on the result and/or the social annotation. This funnel analysis approach differs from the previous work's approach of asking participants whether they noticed the annotations.
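The funnel itself is simple enough to show as a sketch. Assuming each hand-coded task record carries three booleans (the field names below are ours, not the study's), the analysis reduces to successive filters:

    from dataclasses import dataclass

    @dataclass
    class TaskRecord:
        # One hand-coded record per task, as described above.
        annotation_shown: bool    # a result with a social annotation appeared
        annotation_in_view: bool  # that result was inside the browser viewport
        annotation_fixated: bool  # the subject fixated on the annotation

    def funnel(records):
        shown = [r for r in records if r.annotation_shown]
        in_view = [r for r in shown if r.annotation_in_view]
        fixated = [r for r in in_view if r.annotation_fixated]
        print(f"shown: {len(shown)}, in view: {len(in_view)}, fixated: {len(fixated)}")
        if shown:
            # e.g. 35 fixated out of 58 shown gives the 60% reported below
            print(f"noticeability: {len(fixated) / len(shown):.0%}")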
We discovered that participants fixated on annotations in 35 of the 58 tasks in which they appeared (60%). This is a dramatic improvement over the 11% perception rate of Muralidharan et al. [4]. We attribute this difference primarily to the new annotation design.

Replication Discussion
Access to Previous Experimental Data. We were able to repeat the exact same tasks performed in the previous work, but only because we share a co-author who had access to the data. Anyone else attempting to replicate the study would not have been able to do so as effectively.

Temporal Challenges. Even though the search tasks were identical, because our study was conducted several months later, some of the task questions were no longer topically relevant. For example, one task asked "What is the website for the Google image labeling game?"; at the time of our study, the website was no longer active. Similarly, the search task "Find some information about the Nevada law legalizing self-driving cars" brought up news articles from the previous summer, when Muralidharan et al. [4] conducted their research, because the law was no longer recent news.

This raises a big issue for research replication: changing environments, such as time or place. In our case, the tasks lost their relevancy over time. Researchers could mitigate this by rewriting tasks so that they are more relevant but still in the same vein as the originals. For example, we could have written a different task that was more topical but would still be categorized as news. It must be decided which would cause the least discrepancy for a replication: keeping the identical but less relevant task, or rewriting a relevant task that differs from the original.

Iteration and Refinement. The primary difference in our protocol, measuring perception with fixation data rather than verbal confirmation, offered an improvement over the previous work.

Even with those challenges, we feel that our replication efforts were successful. We conducted an almost identical study to confirm the proposed improved design for social annotations and found a large increase in perception.

References
[1] Bao, S., Xue, G., Wu, X., Yu, Y., Fei, B., and Su, Z. Optimizing web search using social annotations. In Proc. WWW 2007, 501–510.
[2] Carmel, D., Zwerdling, N., Guy, I., Ofek-Koifman, S., Har'el, N., Ronen, I., Uziel, E., Yogev, S., and Chernov, S. Personalized social search based on the user's social network. In Proc. CIKM 2009, 1227–1236.
[3] Heymann, P., Koutrika, G., and Garcia-Molina, H. Can social bookmarking improve web search? In Proc. WSDM 2008, 195–206.
[4] Muralidharan, A., Gyongyi, Z., and Chi, E. H. Social annotations in web search. In Proc. CHI 2012, 1085–1094.
[5] Yanbe, Y., Jatowt, A., Nakamura, S., and Tanaka, K. Can social bookmarking enhance search in the web? In Proc. JCDL 2007, 107–116.
[6] Zanardi, V., and Capra, L. Social ranking: uncovering relevant content using tag-based recommender systems. In Proc. RecSys 2008, 51–58.