=Paper=
{{Paper
|id=None
|storemode=property
|title=None
|pdfUrl=https://ceur-ws.org/Vol-1033/EuroHCIR2013-Proceedings.pdf
|volume=Vol-1033
}}
==Proceedings of the 3rd European Workshop on Human-Computer Interaction and Information Retrieval==
EuroHCIR 2013, 1st August 2013, Dublin, Ireland

Proceedings of the 3rd European Workshop on Human-Computer Interaction and Information Retrieval

A workshop at ACM SIGIR 2013

Preface

EuroHCIR 2013 was organised with the specific goal of better engaging the IR community, who have been underrepresented at previous EuroHCIR conferences. Thus we proposed to have the workshop at the ACM SIGIR conference in Dublin. Research, industry, and position papers were invited, and although very few industry submissions were received, we received a number of research and position papers focusing on the intersection of IR and HCI evaluations, several focusing on adapting the TREC paradigm. Many interesting system and demonstrator papers were also accepted.

Organised by

Max L. Wilson, Mixed Reality Lab, University of Nottingham, UK (max.wilson@nottingham.ac.uk)
Birger Larsen, The Royal School of Library and Information Science, Denmark (blar@iva.dk)
Preben Hansen, Dept. of Computer & Systems Sciences, Stockholm University, Sweden (preben@dsv.su.se)
Tony Russell-Rose, UXLabs, UK (tgr@uxlabs.co.uk)
Kristian Norling, Norling & Co, Sweden (kristian.norling@gmail.com)

Research Papers
Page 3 - Fading Away: Dilution and User Behaviour (Orally Presented)
  Paul Thomas, Falk Scholer, Alistair Moffat
Page 7 - Exploratory Search Missions for TREC Topics (Orally Presented)
  Martin Potthast, Matthias Hagen, Michael Völske, Benno Stein
Page 11 - Interactive Exploration of Geographic Regions with Web-based Keyword Distributions
  Chandan Kumar, Dirk Ahlers, Wilko Heuten, Susanne Boll
Page 15 - Inferring Music Selections for Casual Music Interaction (Orally Presented)
  Daniel Boland, Ross McLachlan, Roderick Murray-Smith
Page 19 - Search or browse? Casual information access to a cultural heritage collection
  Robert Villa, Paul Clough, Mark Hall, Sophie Rutter
Page 23 - Studying Extended Session Histories
  Chaoyu Ye, Martin Porcheron, Max L. Wilson
Page 27 - Comparative Study of Search Engine Result Visualisation: Ranked Lists Versus Graphs
  Casper Petersen, Christina Lioma, Jakob Grue Simonsen

Position Papers

Page 31 - Evolving Search User Interfaces (Orally Presented)
  Tatiana Gossen, Marcus Nitsche, Andreas Nürnberger
Page 35 - A Pluggable Work-bench for Creating Interactive IR Interfaces (Orally Presented)
  Mark M. Hall, Spyros Katsaris, Elaine Toms
Page 39 - A Proposal for User-Focused Evaluation and Prediction of Information Seeking Process (Orally Presented)
  Chirag Shah
Page 43 - Directly Evaluating the Cognitive Impact of Search User Interfaces: a Two-Pronged Approach with fNIRS
  Horia A. Maior, Matthew Pike, Max L. Wilson, Sarah Sharples
Page 47 - Dynamics in Search User Interfaces
  Marcus Nitsche, Florian Uhde, Stefan Haun and Andreas Nürnberger

Demo Descriptions
Page 51 - SearchPanel: A browser extension for managing search activity
  Simon Tretter, Gene Golovchinsky, Pernilla Qvarfordt
Page 55 - A System for Perspective-Aware Search
  M. Atif Qureshi, Arjumand Younus, Colm O’Riordan, Gabriella Pasi, Nasir Touheed

Fading Away: Dilution and User Behaviour

Paul Thomas, CSIRO ICT Centre, paul.thomas@csiro.au
Falk Scholer, School of Computer Science and Information Technology, RMIT University, falk.scholer@rmit.edu.au
Alistair Moffat, Department of Computing and Information Systems, The University of Melbourne, ammoffat@unimelb.edu.au

Presented at EuroHCIR2013. Copyright © 2013 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

[Figure 1: Normalised total click positions across participants and tasks, for full queries (left) and diluted queries (right).]

ABSTRACT
When faced with a poor set of document summaries on the first page of returned search results, a user may respond in various ways: by proceeding on to the next page of results; by entering another query; by switching to another service; or by abandoning their search. We analyse this aspect of searcher behaviour using a commercial search system, comparing a deliberately degraded system to the original one. Our results demonstrate that searchers naturally avoid selecting poor results as answers given the degraded system; however, the depth of the ranking that they view, their query reformulation rate, and the amount of time required to complete search tasks, are all remarkably unchanged.

Categories and Subject Descriptors
H.3.4 [Information Storage and Retrieval]: Systems and software—performance evaluation.

General Terms
Experimentation, measurement.

Keywords
Retrieval experiment, evaluation, system measurement.

1. INTRODUCTION
While carrying out a search, users have a number of tactics available to them. Intuitively, it seems likely that these tactics or behaviours will vary based on the quality of the results that are returned by the retrieval system. For example, other things being equal, a user who cannot find any relevant items on the first page of search results might be more inclined to reformulate their query (by entering another query into the search interface) than a user who has found a large number of relevant items. Possible tactics when using an apparently ineffective system include:

1. Looking further in the results list, visiting pages beyond the first, hoping that the results improve;
2. Submitting another query, hoping for better results;
3. Switching to a different search engine and entering the same query, hoping that it provides better results;
4. Trying to find the information through other techniques, for example by browsing.

We investigate the first two possibilities, reporting on differences in user behaviour when a standard retrieval system is compared to an adjusted system in which results are diluted by inserting non-relevant answers. Our results indicate that searchers remained attentive to the task in the degraded system, and adapted their behaviour to avoid clicking on non-relevant snippets. However, all other aspects of their behaviour were remarkably consistent, including the amount of time spent on tasks; the number of query reformulations undertaken; and their perceptions of search difficulty.

2. METHODS
We designed a user experiment to explore ways in which behaviour changes with retrieval quality. A total of n = 34 participants, comprising staff and students from the Australian National University, carried out six search tasks of differing complexity, covering the remember, analyse and understand tasks of Wu et al. [7] but modified for our context. On commencing a task, users were shown a result page for an initial “starter” query that was constant across users. They were then free to explore the results list, including being able to open documents, to view further results pages, and to enter follow-up queries. Once any document was opened for viewing, participants were asked to indicate whether or not it was relevant to their search task, before returning to the search results listing. The search interface prevented tabbed browsing, and while a document was being viewed it replaced the results page. Participants were not given an explicit time limit for any task, but were told they could move on when they felt ready.

The search results displayed to participants were sourced from the Yahoo! API, and presented in the usual way as an ordered list consisting of query-biased summaries, with ten results per page. No branding from the underlying search service was shown. Without telling our participants, we simulated search systems of two different effectiveness levels by showing results in one of two modes: full, where the ranking obtained from the search service was displayed in its original form; and diluted, where the original results were interleaved with answers from a related but incorrect query [5]. Dilution was operationalised by leveraging the capacity-enhancing (and obfuscatory) power of “management-speak”: the original stakeholder information need was actioned going forward by enhancing it through the win-win inclusion of a jargon competency chosen randomly from a list of outside-the-box strategies, thereby disempowering the results paradigm.
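Stripped of the management-speak, the dilution mode amounts to interleaving decoy results (drawn from a related but incorrect query) into fixed positions of each result page. The following is a minimal sketch of our own, with function and variable names that are not from the paper; the results section reports that the inserted documents occupied the odd ranks 1, 3, 5, 7 and 9:

```python
def dilute(original, decoys, page_size=10):
    """Build one diluted result page: odd (1-based) ranks are filled from
    the related-but-incorrect decoy query, even ranks from the original
    ranking, preserving each list's internal order."""
    o, d = iter(original), iter(decoys)
    return [next(d) if rank % 2 == 1 else next(o)
            for rank in range(1, page_size + 1)]
```

For a ten-result page this consumes the top five results of each ranking, e.g. ranks 1, 3, 5, 7 and 9 from the query “eurovision best practice” and ranks 2, 4, 6, 8 and 10 from “eurovision”.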
For example, if the task was to “find the Eurovision Song Contest home page”, a user’s initial full query might be “eurovision”; whereas in the diluted system half of the results displayed might instead be derived from the query “eurovision best practice”. There were a small number of queries issued for which it was not possible to generate five such results; these 22 out of 5930 page interactions are excluded from the analysis below.

Most interactions with the search system were logged while participants carried out the six search tasks, including: submitted search queries; clicks on snippets in order to open documents for viewing; assessments of document usefulness; and the point of gaze on the screen, captured using an eye tracker. Task order was balanced across the participants and topics so as to minimise the risk of bias; similarly, whether the full or diluted approach was applied for each participant-task combination was pre-determined as part of the experimental design.

3. RESULTS
User behaviour, and the differences caused by the full and diluted query treatments, can be measured in a range of ways.

User click behaviour: The normalised click frequency at each rank position in the answer pages is shown in Figure 1. In the diluted retrieval system the “incorrect but plausible” documents were inserted in positions 1, 3, 5, 7 and 9. The pattern of click behaviour demonstrates that our experimental manipulation was successful: for the full search results, the click distribution follows the expected pattern of users clicking more frequently on items that are higher in the ranked list [1], whereas users of the diluted system were less likely to click answer items in the odd positions. Note that position bias – the propensity for searchers to select items that occur higher in a ranking, possibly because they “trust” the underlying search system [3] – exists in both systems. In particular, all of the odd-numbered rank positions in the diluted system are equally “bad”, but participants still favoured items higher in the ranking.

A second check to confirm that our system dilution had an impact on search effectiveness is to consider the rates at which users saved documents that they viewed (that is, the likelihood that a document was found to be relevant after it was clicked). The mean rate is 0.733 for the full system, compared to 0.597 for the diluted system, a statistically significant difference (t-test, p < 0.05).

While Figure 1 establishes that our user study participants responded differently in terms of rank-specific click behaviour, the high-level aggregated click behaviour across all participants and search tasks was not distinctive: in total (all tasks, and all users) there were 323 clicks for the full system, and 322 for the diluted system. Unsurprisingly this difference is not statistically significant (χ² test, p = 0.97). The number of items that were determined as being useful was also similar in the two conditions: 201 for full, and 214 for diluted (χ² test, p = 0.52). Our participants needed to read a remarkably similar number of documents, and a remarkably similar number of useful documents, to satisfy the (assigned) needs regardless of the search system. Given this difference in click rates, it is reasonable to expect other changes in behaviour, and we consider this below.

Depth of result page viewing: When presented with a search results page, the user chooses which snippets require further evaluation. In line with commercial search engines, our experimental participants were presented with ten answers per page, with the option of accessing subsequent results pages. Faced with a relatively poor quality results list, a plausible strategy for a user who is looking for an answer document is to look further down the results page. Table 1 shows the frequency with which results pages were viewed (that is, the user visited a results page and looked at one or more items on the screen, as recorded using eye-tracking), summed across users and queries. When using the full system, participants moved on to the second page of results for 15 out of 207 issued queries (with a corresponding mean page depth of 1.07), while in the diluted system the second results page was visited for 22 out of the total of 212 queries that were issued (a mean page depth of 1.10). The difference in depth was not significant (χ² test, p = 0.34). No participants viewed results beyond the second page with either system.

            1st results page    2nd results page
  full            207                  15
  diluted         212                  22

Table 1: Total page views, summed across users and topics, for the full and diluted retrieval systems.

[Figure 2: Deepest rank position viewed, averaged across topics and participants, for full queries (left) and diluted queries (right).]

Figures 2 and 3 provide a more detailed view of gaze behaviour, showing the deepest rank position that searchers examined while carrying out a query, and the last rank position that was viewed before finishing the query. The distributions of the lowest rank positions viewed are similar between the full and diluted systems: both show peaks at rank positions 7 (the last item above the fold) and 10 (the last item in each page of search results). The distributions of the last position viewed before finishing a query (which arises when either enough relevant items have been found, or the user types a fresh query) are also broadly similar. However, for the diluted system, rank position 1 has a larger proportion of the probability mass.
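The 2×2 χ² comparisons used in this section can be reproduced from the reported counts alone. The sketch below is our own (the paper does not describe its test implementation) and assumes Yates’ continuity correction, the usual convention for 2×2 tables; applied to the second-page visit counts it gives a p-value close to the reported 0.34:

```python
import math

def chi2_2x2(a, b, c, d):
    """Chi-squared test of independence for the 2x2 table [[a, b], [c, d]],
    with Yates' continuity correction (1 degree of freedom)."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for obs, r, k in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        expected = rows[r] * cols[k] / n
        diff = max(abs(obs - expected) - 0.5, 0.0)  # Yates' correction
        chi2 += diff * diff / expected
    p = math.erfc(math.sqrt(chi2 / 2.0))  # survival function of chi2, 1 d.f.
    return chi2, p

# Second-page visits (Table 1): full 15 of 207 queries, diluted 22 of 212
chi2, p = chi2_2x2(207 - 15, 15, 212 - 22, 22)
```

The same helper applies to the click-count and useful-document comparisons above, using the corresponding 2×2 counts.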
0.30 0.30 0.25 0.25 0.20 0.20 Proportion Proportion 0.15 0.15 0.10 0.10 0.05 0.05 0.00 0.00 1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 Rank Rank Figure 3: Final rank position viewed, averaged across topics and participants, for full queries (left) and diluted queries (right). mass. A possible reason is that searchers mentally compare answers ● as they view items in the results list, and most users scan at least the 14 top few items. The diluted system is likely to have a non-relevant 12 ● document in position one, and so reviewing that snippet may serve as a final confirmation, before the user commits to a click on a 10 ● Number of queries deeper-ranked snippet from the underlying full results. ● ● 8 ● ● Query reformulation: A second way in which a user might respond ● 6 to search systems of differing quality is to change the rate at which ● they stop looking through the current set of search results, and 4 instead enter a new query. ● ● 2 The number of queries used by participants when carrying out their search tasks is shown in Figure 4. Overall the number was Full Diluted low for both systems, with a median of 1 and 2 queries (0 and 1 reformulations) for the full and diluted results, respectively. This difference was not statistically significant (Wilcoxon signed-rank Figure 4: Number of queries per task, for full and diluted queries. test, p = 0.46). Ability to identify relevant answers: When a retrieval system serves unhelpful answers, it might be that the ability of the searcher to Time spent on tasks: While depth of viewing and query re-form- identify useful answers is similarly affected. However, based on our ulation do not show significant differences in searcher behaviour, it experiments, the mean rate at which clicked items were saved as could still be the case that using an inferior system makes querying being relevant was 0.787 for the full system and 0.747 for the diluted slower. 
Differences in system quality might alter the time spent system, showing no significant difference (t-test, p = 0.25). Thus by users when viewing and processing result pages. However, the the ability of users to identify relevant answers, once documents average gaze duration when viewing snippets, measured as the sum have been selected for viewing via their snippets, did not differ of fixation durations that occurred in the screen area defined by each between the experimental treatments. search result summary, was 0.586 second for full queries and 0.589 seconds for diluted queries. This difference was not statistically significant (t-test, p = 0.89). Differences could also occur at a higher level of system interac- tion. The mean time that participants spent working on each search unspecified “boss”, with an incentive to find the most good and task, including viewing search result pages, viewing selected doc- fewest bad sources possible [4]; participants were not constrained uments, and making relevance decisions, was 2.70 minutes for the in the amount of time that they could spend on a task. In contrast, full treatment, and 2.54 minutes for the diluted one. This difference our subjects were instructed that they would complete a sequence of was not statistically significant (t-test, p = 0.62). . . . web search tasks and were advised to spend what feels to be an Finally, we consider the interaction between time and query re- appropriate amount of time on each task, until you have collected a formulations. When using the full system, participants entered an set of answer pages that in your opinion allow the information need average of 1.50 queries per minute while completing each task. to be appropriately met. The overall expectations were therefore For the diluted system, the rate was 1.52 queries per minute. The different: in the Smith and Kantor study, participants were given the difference was not significant (t-test, p = 0.95). 
goal of maximising relevance by finding as many good answers as Overall, these results indicate that the quality of the search system possible; in our study, participants were “satisficing”, having been did not affect the rate at which participants were able to process requested to decide for themselves when an appropriate number of information on search results pages, or how much time they spent answers had been found. working on tasks before feeling that they had achieved their goals. Alternatively, it may be that our diluted system, while certainly The only significant difference between the two treatments was the poorer in overall quality (in the sense that non-relevant answers were click distribution, and the rate at which clicked documents were introduced into the ranking), was not poor enough to induce different judged to be useful. behaviour. Smith and Kantor used results typically from the 300th position in Google’s results: even today, these are unreliable for Searcher assessment of task difficulty: After carrying out each the simplest of our topics, and in 2008 will almost certainly have search task, experimental participants were asked to answer two produced a poor result set. Importantly, our diluted system always questions: “How difficult was it to find useful information on this included a few high-ranked results. topic?”, and “How satisfied were you with the overall quality of your Either way, our results raise an important question about how the search experience?”. The 5-point response scale for these questions effectiveness of search systems should be analysed. While some was anchored with the labels “Not at all” (assigned a value of 1) and fine-grained aspects of user clicking behaviour differed between “Extremely” (assigned a value of 5). the full and diluted treatments, the majority of behaviours did not. 
Searchers found the tasks relatively easy to complete: the median This outcome is in line with previous results that found little rela- response rate for the search difficulty question was 2 for both the di- tionship between user behaviour and system quality as measured luted and full systems; this difference was not significant (Wilcoxon by common IR evaluation metrics such as MAP [6]. The ques- test, p = 0.73). Satisfaction levels were also highly consistent be- tion then becomes one of whether even a significant improvement tween the two systems, with a median response level of 4 for both in effectiveness, as measured by some metric, actually results in systems (Wilcoxon test, p = 0.91). Overall, there were no system- improved task performance. In future work, we therefore plan to atic differences in participants’ perceptions of search difficulty or systematically investigate different levels of answer-page dilution, the overall experience resulting from the two different treatments. to establish guidelines for the extent of practical differences that need to be present in search systems for measurable disparities in 4. DISCUSSION AND CONCLUSIONS user behaviour to manifest. We also plan to explore the issue of the impact that specific variations in task instructions have on searcher It seems “obvious” that user behaviour will be influenced by the behaviour through a controlled user study in a work task-based quality of results that returned by a search service. Seeing many framework [2]. poor results near the start of an answer list may influence the user’s decision about whether to continue viewing subsequent answer pages, to enter a new query, or to abandon the search altogether. References Previous work has supported this view. For example, in a study of [1] E. Agichtein, E. Brill, S. Dumais, and R. Ragno. Learning user inter- 36 users completing 12 search tasks with different search systems, action models for predicting web search result preferences. 
In Proc. Smith and Kantor [4] found that users adapted their behaviour: SIGIR, pages 3–10, Seattle, WA, 2006. when given a consistently degraded search system, they entered [2] P. Borlund. Experimental components for the evaluation of interactive more queries per minute than users of a standard system; similarly, information retrieval systems. Journal of Documentation, 56(1):71–90, a higher detection rate (the ability to identify relevant answers) was 2000. observed for users of degraded systems. [3] T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay. Accurately However our study, in which 34 subjects carried out search tasks interpreting clickthrough data as implicit feedback. In Proc. SIGIR, using an evenly balanced combination of full and diluted search pages 154–161, Salvador, Brazil, 2005. systems, contrasts strongly with that intuition and previous findings. Overall, searchers took around the same amount of time to complete [4] C. Smith and P. Kantor. User adaptation: good results from poor systems. their tasks in both experimental treatments; were able to save a In Proc. SIGIR, pages 147–154, Singapore, 2008. similar number of documents as being relevant; exhibited consistent [5] P. Thomas, T. Jones, and D. Hawking. What deliberately degrading viewing behaviour when looking at the search results lists returned search quality tells us about discount functions. In Proc. SIGIR, pages by the treatments; and did not perceive significant differences in the 1107–1108, Beijing, China, 2011. difficulty of carrying out tasks with both systems. The key difference [6] A. Turpin and F. Scholer. User performance versus precision measures in participant behaviour was their click rate at particular ranks: in for simple web search tasks. In Proc. SIGIR, pages 11–18, Seattle, WA, essence, they successfully avoided poor answers, as demonstrated 2006. by the shift in the click probability mass, shown in Figure 1. 
A possible explanation for the divergence in observed user be- [7] W.-C. Wu, D. Kelly, A. Edwards, and J. Arguello. Grannies, tanning beds, tattoos and NASCAR: Evaluation of search tasks with varying haviour between the two studies may be the context in which the levels of cognitive complexity. In Proc. 4th Information Interaction in searches were carried out. Participants in the Smith and Kantor Context Symp., pages 254–257, Nijmegen, The Netherlands, 2012. study were instructed to “find good information sources” for an Exploratory Search Missions for TREC Topics Martin Potthast Matthias Hagen Michael Völske Benno Stein Bauhaus-Universität Weimar 99421 Weimar, Germany. @uni-weimar.de ABSTRACT crowdsourcing by employing writers whose task was to write long We report on the construction of a new query log corpus that consists essays on given TREC topics, using a ClueWeb09 search engine for of 150 exploratory search missions, each of which corresponds to research. Hence, our corpus forms a strong connection to existing one of the topics used at the TREC Web Tracks 2009–2011. In- evaluation resources that are used frequently in information retrieval. volved in the construction was a group of 12 professional writers, Further, it captures the way how average users perform exploratory hired at the crowdsourcing platform oDesk, who were given the task search today, using state-of-the-art search interfaces. The new cor- to write essays of 5000 words length about these topics, thereby pus is intended to serve as a point of reference for modeling users inducing genuine information needs. The writers used a ClueWeb09 and tasks as well as for comparison with new retrieval models and search engine for their research to ensure reproducibility. Thousands interfaces. Key figures of the corpus are shown in Table 2. of queries, clicks, and relevance judgments were recorded. 
This After a brief review of related work, Section 2 details the corpus paper overviews the research that preceded our endeavors, details construction and Section 3 gives first quantitative and qualitative the corpus construction, gives quantitative and qualitative analyses analyses, concluding with insights into writers’ search behavior. of the data obtained, and provides original insights into the query- 1.1 Related Work ing behavior of writers. With our work we contribute a missing To date, the most comprehensive overview of research on ex- building block in a relevant evaluation setting in order to allow for ploratory search systems is that of White and Roth [19]. More better answers to questions such as: “What is the performance of recent contributions not covered in this body of work include the today’s search engines on exploratory search?” and “How can it be approaches proposed by Morris et al. [13], Bozzon et al. [2], Car- improved?” The corpus will be made publicly available. tright et al. [4], and Bron et al. [3]. Exploratory search is studied also Categories and Subject Descriptors: H.3.3 [Information Search within contextual IR and interactive IR, as well as across disciplines, and Retrieval]: Query formulation including human computer interaction, information visualization, Keywords: Query Log, Exploratory Search, Search Missions and knowledge management. Regarding the evaluation of exploratory search systems, White 1. INTRODUCTION and Roth [19] conclude that “traditional measures of IR perfor- mance based on retrieval accuracy may be inappropriate for the Humans frequently conduct task-based information search, i.e., evaluation of these systems” and that “exploratory search evalua- they interact with search appliances in order to conduct the research tion [...] must include a mixture of naturalistic longitudinal studies” deemed necessary to solve knowledge-intensive tasks. Examples while “[...] 
simulations developed based on interaction logs may include long-lasting interactions which may involve many search serve as a compromise between existing IR evaluation paradigms sessions spread out across several days. Modern web search en- and [...] exploratory search evaluation.” The necessity of user stud- gines, however, are optimized for the diametrically opposed task, ies makes evaluations cumbersome and, above all, expensive. By namely to answer short-term, atomic information needs. Never- providing part of the solution (a decent corpus) for free, we want theless, research has picked up this challenge: in recent years, a to overcome the outlined difficulties. Our corpus compiles a solid number of new solutions for exploratory search have been proposed database of exploratory search behavior, which researchers may use and evaluated. However, most of them involve an overhauling of for comparison purposes as well as for bootstrapping simulations. the entire search experience. We argue that exploratory search tasks Regarding standardized resources to evaluate exploratory search, are already being tackled, after all, and that this fact has not been hardly any have been published up to now. White et al. [18] dedi- sufficiently investigated. Reasons for this shortcoming can be found cated a workshop to evaluating exploratory search systems in which in the lack of publicly available data to be studied. Ideally, for any requirements, methodologies, as well as some tools have been pro- given task that fits the aforementioned description, one would have posed. Yet, later on, White and Roth [19] found out that still no a large set of search interaction logs from a diversity of humans “methodological rigor” has been reached—a situation which has not solving it. Obtaining such data, even for a single task, has not been changed much until today. The departure from traditional evalua- done at scale until now. 
Even search companies, which have access tion methodologies (such as the Cranfield paradigm) and resources to substantial amounts of raw query log data, face difficulties in (especially those employed at TREC) has lead researchers to devise discerning individual exploratory tasks from their logs. ad-hoc evaluations which are mostly incomparable across papers In this paper, we contribute by introducing the first large corpus of and which cannot be reproduced easily. long, exploratory search missions. The corpus was constructed via A potential source of data for the purpose of assessing current Presented at EuroHCIR2013. Copyright c 2013 for the individual papers exploratory search behavior is to detect exploratory search tasks by the papers’ authors. Copying permitted only for private and academic within raw search engine logs, such as the 2006 AOL query log [14]. purposes. This volume is published and copyrighted by its editors.. However, most session detection algorithms deal with short term Used TREC Topics. tasks only and the few algorithms that aim to detect longer search Since the topics from the TREC Web Tracks 2009–2011 were missions still have problems when detecting interesting semantic not amenable for our purpose as is, we rephrased them so that they connections of intertwined search tasks [10, 12, 8]. In this regard, ask for writing an essay instead of searching for facts. Consider for our corpus may be considered the first of its kind. example topic 001 from the TREC Web Track 2009: To justify our choice of an exploratory task, namely that of writing Query. obama family tree an essay about a given TREC topic, we refer to Kules and Capra [11], Description. Find information on President Barack who manually identified exploratory tasks from raw query logs on Obama’s family history, including genealogy, national a small scale, most of which turned out to involve writing on a origins, places and dates of birth, etc. given subject. Egusa et al. 
[6] describe a user study in which they Sub-topic 1. Find the TIME magazine photo essay asked participants to do research for a writing task, however, without “Barack Obama’s Family Tree.” actually writing something. This study is perhaps closest to ours, although the underlying data has not been published. The most Sub-topic 2. Where did Barack Obama’s parents and notable distinction is that we asked our writers to actually write, grandparents come from? thereby creating a much more realistic and demanding state of mind Sub-topic 3. Find biographical information on Barack since their essays had to be delivered on time. Obama’s mother. This topic is rephrased as follows: 2. CORPUS CONSTRUCTION Obama’s family. Write about President Barack Oba- As discussed in the related work, essay writing is considered a ma’s family history, including genealogy, national ori- valid approach to study exploratory search. Two data sets form the gins, places and dates of birth, etc. Where did Barack basis for constructing a respective corpus, namely (1) a set of topics Obama’s parents and grandparents come from? Also to write about and (2) a set of web pages to research about a given include a brief biography of Obama’s mother. topic. With regard to the former, we resort to topics used at TREC, In the example, Sub-topic 1 is considered too specific for our specifically to those from the Web Tracks 2009–2011. With regard purposes while the other sub-topics are retained. TREC Web track to the latter, we employ the ClueWeb09 (and not the “real web in the topics divide into faceted and ambiguous topics. While topics of the wild”). The ClueWeb09 consists of more than one billion documents first kind can be directly rephrased into essay topics, from topics of from ten languages; it comprises a representative cross-section of the the second kind one of the available interpretations is chosen. 
The ClueWeb09 is a widely accepted resource among researchers, and it is used to evaluate the retrieval performance of search engines within several TREC tracks. The connection to TREC will strengthen the compatibility with existing evaluation methodology and allow for unforeseen synergies. Based on the above decisions, our corpus construction steps can be summarized as follows:

1. Rephrasing the 150 topics used at the TREC Web Tracks 2009–2011 so that they invite people to write an essay.
2. Indexing the English portion of the ClueWeb09 (about 0.5 billion documents) using the BM25F retrieval model plus additional features.
3. Developing a search interface that allows for answering queries within milliseconds and that is designed along the lines of commercial search interfaces.
4. Developing a browsing interface for the ClueWeb09, which serves ClueWeb09 pages on demand and which rewrites links on delivered pages so that they point to their corresponding ClueWeb09 pages on our servers.
5. Recruiting 12 professional writers at the crowdsourcing platform oDesk, from a wide range of hourly rates for diversity.
6. Instructing the writers to write essays of at least 5000 words in length (corresponding to an average student's homework assignment) about an open topic among the initial 150, using our search engine and browsing only ClueWeb09 pages.
7. Logging all writers' interactions with the search engine and the ClueWeb09 on a per-topic basis at our site.
8. Double-checking all of the 150 essays for quality.

After the deployment of the search engine and successfully completed usability tests (see Steps 2–4 and 7 above), the actual corpus construction took nine months, from April 2012 through December 2012. The post-processing of the data took another four months, so that this corpus is among the first, late-breaking results from our efforts. However, the outlined experimental setup can obviously serve different lines of research. The remainder of this section presents elements of our setup in greater detail.

A Search Engine for Controlled Experiments.
To give the oDesk writers a familiar search experience while maintaining reproducibility at the same time, we developed a tailored search engine called ChatNoir [15]. Besides ours, the only other public search engine for the ClueWeb09 is hosted at Carnegie Mellon and based on Indri; unfortunately, it is far from our efficiency requirements. Our search engine returns results within a few hundred milliseconds, its interface follows industry standards, and it features an API that allows for user tracking.

ChatNoir is based on the BM25F retrieval model [17], uses the anchor text list provided by Hiemstra and Hauff [9], the PageRanks provided by Carnegie Mellon University (http://boston.lti.cs.cmu.edu/clueWeb09/wiki/tiki-index.php?page=PageRank), and the spam rank list provided by Cormack et al. [5]. ChatNoir also comes with a proximity feature with variable-width buckets, as described by Elsayed et al. [7]. Our choice of retrieval model and ranking features is intended to provide a reasonable baseline performance. It is neither nearly as mature as those of commercial search engines, nor does it compete with the best-performing models proposed at TREC. Yet BM25F is among the most widely accepted models in the information retrieval community, which underlines our goal of reproducibility.

In addition to its retrieval model, ChatNoir implements two search facets: text readability scoring and long text search. The former facet, similar to that provided by Google, scores the readability of a text found on a web page via the well-known Flesch-Kincaid grade level formula: it estimates the number of years of education required in order to understand a given text. This number is mapped onto the three categories "simple", "intermediate", and "expert". The long text search facet omits search results which do not contain at least one continuous paragraph of text that exceeds 300 words. The two facets can be combined with each other. They are meant to support writers who want to reuse text from retrieved search results.
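The readability facet maps a Flesch-Kincaid grade level onto three categories. The grade-level formula itself is standard; the sketch below is illustrative only — the crude syllable counter and the category cut-offs are assumptions, not ChatNoir's actual implementation:

```python
import re

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: estimated years of education needed to understand a text."""
    sentences = max(1, len(re.findall(r'[.!?]+', text)))
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # crude heuristic: count groups of consecutive vowels
        return max(1, len(re.findall(r'[aeiouy]+', word.lower())))

    n_words = max(1, len(words))
    n_syllables = sum(syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (n_syllables / n_words) - 15.59

def readability_category(grade, simple_max=5.0, expert_min=9.0):
    """Map a grade level onto "simple"/"intermediate"/"expert" (thresholds assumed)."""
    if grade <= simple_max:
        return "simple"
    if grade < expert_min:
        return "intermediate"
    return "expert"
```

A result page would then be labeled, e.g., `readability_category(flesch_kincaid_grade(page_text))`, and filtered against the facet the user selected.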
Especially interesting for this type of writer are result documents containing longer text passages and documents of a specific reading level, such that reusing text from the results still yields an essay with homogeneous readability.

When clicking on a search result, ChatNoir does not link into the real web but redirects into the ClueWeb09. Though the ClueWeb09 provides the original URLs from which the web pages have been obtained, many of these pages may have gone or been updated since. We hence set up an interface that serves web pages from the ClueWeb09 on demand: when accessing a web page, it is pre-processed before being shipped, removing all kinds of automatic referrers and replacing all links to the real web with links to their counterparts inside the ClueWeb09. This way, the ClueWeb09 can be browsed as if surfing the real web, and it becomes possible to track a user's movements. The ClueWeb09 is stored in the HDFS of our 40-node Hadoop cluster, and web pages are fetched with latencies of about 200 ms. ChatNoir's inverted index has been optimized to guarantee fast response times, and it is deployed on the same cluster.

Hired Writers.
Our ideal writer has experience in writing, is capable of writing about a diversity of topics, can complete a text in a timely manner, possesses decent English writing skills, and is well-versed in using the aforementioned technologies. This wish list led us to favor (semi-)professional writers over, for instance, volunteer students recruited at our university. To hire writers, we made use of the crowdsourcing platform oDesk (http://www.odesk.com). Crowdsourcing has quickly become one of the cornerstones for constructing evaluation corpora, which is especially true for paid crowdsourcing. Compared to Amazon's Mechanical Turk [1], which is used more frequently than oDesk, there are virtually no workers at oDesk submitting fake results, due to advanced rating features for workers and employers.

Table 1 gives an overview of the demographics of the writers we hired, based on a questionnaire and their resumes at oDesk. Most of them come from an English-speaking country, and almost all of them speak more than one language, which suggests a reasonably good education. Two thirds of the writers are female, and all of them have years of writing experience. Hourly wages were negotiated individually and range from 3 to 34 US dollars (dependent on skill and country of residence), with an average of about 12 US dollars. In total, we spent 20,468 US dollars to pay the writers.

Table 1: Demographics of the twelve writers employed.
    Age:                 minimum 24, median 37, maximum 65
    Gender:              Female 67%, Male 33%
    Native language(s):  English 67%, Filipino 25%, Hindi 17%
    Academic degree:     Postgraduate 41%, Undergraduate 25%, None 17%, n/a 17%
    Country of origin:   UK 25%, Philippines 25%, USA 17%, India 17%, Australia 8%, South Africa 8%
    Second language(s):  English 33%, French 17%, Afrikaans, Dutch, German, Spanish, Swedish 8% each, None 8%
    Years of writing:    minimum 2, median 8, standard dev. 6, maximum 20
    Search engines used: Google 92%, Bing 33%, Yahoo 25%, Others 8%
    Search frequency:    Daily 83%, Weekly 8%, n/a 8%

3. CORPUS ANALYSIS

This section presents the results of a preliminary corpus analysis that gives an overview of the data and sheds some light onto the search behavior of writers doing research.

Corpus Statistics.
Table 2 shows key figures of the query logs collected, including the absolute numbers of queries, relevance judgments, working days, and working hours, as well as relations among them. On average, each writer wrote 12.5 essays; two wrote only one, and one very prolific writer managed more than 30 essays.

Table 2: Key figures of our exploratory search mission corpus.
    Characteristic        Σ        min    avg    max    stdev
    Writers               12
    Topics                150
    Topics / Writer                1      12.5   33     9.3
    Queries               13 651
    Queries / Topic                4      91.0   616    83.1
    Clicks                16 739
    Clicks / Topic                 12     111.6  443    80.3
    Clicks / Query                 0      0.8    76     2.2
    Sessions              931
    Sessions / Topic               1      12.3   149    18.9
    Days                  201
    Days / Topic                   1      4.9    17     2.7
    Hours                 2 068
    Hours / Writer                 3      129.3  679    167.3
    Hours / Topic                  3      7.5    10     2.5
    Irrelevant            5 962
    Irrelevant / Topic             1      39.8   182    28.7
    Irrelevant / Query             0      0.5    60     1.4
    Relevant              251
    Relevant / Topic               0      1.7    7      1.5
    Relevant / Query               0      0.0    4      0.2
    Key                   1 937
    Key / Topic                    1      12.9   46     7.5
    Key / Query                    0      0.2    22     0.7

From the 13,651 submitted queries, each topic got an average of 91. Note that queries were often submitted twice, requesting more than ten results or using different facets. Typically, about 1.7 results are clicked for consecutive instances of the same query. For comparison, the average number of clicks per query in the aforementioned AOL query log is 2.0. In this regard, the behavior of our writers on individual queries does not seem to differ much from that of the average AOL user in 2006. Most of the clicks we recorded are search result clicks, whereas 2457 of them are browsing clicks on web page links. Among the browsing clicks, 11.3% are clicks on links that point to the same web page (i.e., anchor links using a URL's hash part). The longest click trail observed spanned 51 unique web pages, but most click trails are very short. This is surprising, since we expected a larger proportion of browsing clicks, but it also shows that our writers relied heavily on the search engine. If this behavior generalizes, the need for more advanced support of exploratory search tasks from search engines becomes obvious.

The queries of each writer can be divided into a total of 931 sessions, with an average of 12.3 sessions per topic. Here, a session is defined as a sequence of queries recorded on a given topic which is not divided by a break longer than 30 minutes. Despite other claims in the literature (e.g., in [10]), we argue that, in our case, sessions can be reliably identified by means of a timeout because of our a priori knowledge about which query belongs to which topic (i.e., task). Typically, finishing an essay took 4.9 days, which fits well the definition of exploratory search tasks as being long-lasting.

In their essays, writers referred to web pages they found during their search, citing specific passages and topic-related information used in their texts. This forms an interesting relevance signal which allows us to separate irrelevant from relevant web pages. Slightly different to the terminology of TREC, we consider web pages referred to in an essay as key documents for their respective topic, whereas web pages that are on a click trail leading to a key document are considered relevant.

Figure 1 (a 6×25 grid of curves, rows A–F, columns 1–25): Spectrum of writer search behavior. Each grid cell corresponds to one of the 150 topics and shows a curve of the percentage of submitted queries (y-axis) at times between the first query until the essay was finished (x-axis). The numbers denote the amount of queries submitted. The cells are sorted by area under the curve, from the smallest area in cell A1 to the largest area in cell F25.
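The per-topic, 30-minute-timeout session definition is straightforward to implement; a minimal sketch, in which plain timestamp lists stand in for the corpus' actual log schema:

```python
from datetime import datetime, timedelta

def split_into_sessions(query_times, timeout=timedelta(minutes=30)):
    """Group one writer's queries on one topic into sessions.

    query_times: chronologically sorted datetimes of the queries.
    A new session starts whenever the gap to the previous query
    exceeds the timeout (30 minutes, per the session definition above).
    """
    sessions = []
    for t in query_times:
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)  # continue the current session
        else:
            sessions.append([t])    # gap too long (or first query): new session
    return sessions
```

Because the logs are already partitioned by topic, the timeout never has to disambiguate between intertwined tasks — which is exactly why the paper argues the timeout is reliable here.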
The fact that there are only a few click trails of this kind explains the unusually high number of key documents compared to that of relevant ones. The remaining web pages, which were accessed but discarded by our writers, may be considered irrelevant. The writers' search interactions are made freely available as the Webis-Query-Log-12 (http://www.webis.de/research/corpora). Note that the writing interactions are the focus of our accompanying ACL paper [16] and are contained in the Webis text reuse corpus 2012 (Webis-TRC-12).

Exploring Exploratory Search Missions.
To get an inkling of the wealth of data in our corpus, and of how it may influence the design of exploratory search systems, we analyze the writers' search behavior during essay writing. Figure 1 shows for each of the 150 topics a curve of the percentage of queries at any given time between a writer's first query and an essay's completion. We have normalized the time axis and excluded working breaks of more than five minutes. The curves are organized so as to highlight the spectrum of different search behaviors we have observed: in row A, 70–90% of the queries are submitted toward the end of the writing task, whereas in row F almost all queries are submitted at the beginning. In between, however, sets of queries are often submitted in short "bursts," followed by extended periods of writing, which can be inferred from the plateaus in the curves (e.g., cell C12). Only in some cases (e.g., cell C10) can a linear increase of queries over time be observed for a non-trivial amount of queries, which indicates continuous switching between searching and writing.

From these observations, it can be inferred that query frequency alone is not a good indicator of task completion or of the current stage of a task; rather, different algorithms are required for different mission types. Moreover, exploratory search systems have to deal with a broad subset of the spectrum and be able to make the most of few queries, or be prepared that writers interact with them only a few times. Our ongoing research on this aspect focuses on predicting the type of search mission, since we found it does not simply depend on the writer or on a topic's difficulty as perceived by the writer.

4. SUMMARY

We introduce the first corpus of search missions for the exploratory task of writing. The corpus is of representative scale, comprising 150 different writing tasks and thousands of queries, clicks, and relevance judgments. A preliminary corpus analysis shows the wide variety of search behavior to expect from a writer conducting research online. We expect further insights from a forthcoming in-depth analysis, whereas the results mentioned already demonstrate the utility of our publicly available corpus.

5. REFERENCES
[1] J. Barr and L. F. Cabrera. AI gets a brain. Queue, 4(4):24–29, 2006.
[2] A. Bozzon, M. Brambilla, S. Ceri, and P. Fraternali. Liquid query: multi-domain exploratory search on the web. Proc. of WWW 2010.
[3] M. Bron, J. van Gorp, F. Nack, M. de Rijke, A. Vishneuski, and S. de Leeuw. A subjunctive exploratory search interface to support media studies researchers. Proc. of SIGIR 2012.
[4] M.-A. Cartright, R. White, and E. Horvitz. Intentions and attention in exploratory health search. Proc. of SIGIR 2011.
[5] G. Cormack, M. Smucker, and C. Clarke. Efficient and effective spam filtering and re-ranking for large web datasets. Information Retrieval, 14(5):441–465, 2011.
[6] Y. Egusa, H. Saito, M. Takaku, H. Terai, M. Miwa, and N. Kando. Using a concept map to evaluate exploratory search. Proc. of IIiX 2010.
[7] T. Elsayed, J. Lin, and D. Metzler. When close enough is good enough: approximate positional indexes for efficient ranked retrieval. Proc. of CIKM 2011.
[8] M. Hagen, J. Gommoll, A. Beyer, and B. Stein. From search session detection to search mission detection. Proc. of SIGIR 2012.
[9] D. Hiemstra and C. Hauff. MIREX: MapReduce information retrieval experiments. Tech. Rep. TR-CTIT-10-15, University of Twente, 2010.
[10] R. Jones and K. L. Klinkner. Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs. Proc. of CIKM 2008.
[11] B. Kules and R. Capra. Creating exploratory tasks for a faceted search interface. Proc. of HCIR 2008.
[12] C. Lucchese, S. Orlando, R. Perego, F. Silvestri, and G. Tolomei. Identifying task-based sessions in search engine query logs. Proc. of WSDM 2011.
[13] D. Morris, M. Ringel Morris, and G. Venolia. SearchBar: a search-centric web history for task resumption and information re-finding. Proc. of CHI 2008.
[14] G. Pass, A. Chowdhury, and C. Torgeson. A picture of search. Proc. of Infoscale 2006.
[15] M. Potthast, M. Hagen, B. Stein, J. Graßegger, M. Michel, M. Tippmann, and C. Welsch. ChatNoir: a search engine for the ClueWeb09 corpus. Proc. of SIGIR 2012.
[16] M. Potthast, M. Hagen, M. Völske, and B. Stein. Crowdsourcing interaction logs to understand text reuse from the web. Proc. of ACL 2013.
[17] S. Robertson, H. Zaragoza, and M. Taylor. Simple BM25 extension to multiple weighted fields. Proc. of CIKM 2004.
[18] R. White, G. Muresan, and G. Marchionini, editors. Proc. of the SIGIR workshop EESS 2006.
[19] R. White and R. Roth. Exploratory search: beyond the query-response paradigm. Morgan & Claypool, 2009.


Interactive Exploration of Geographic Regions with Web-based Keyword Distributions

Chandan Kumar, University of Oldenburg, Oldenburg, Germany (chandan.kumar@uni-oldenburg.de)
Dirk Ahlers, NTNU – Norwegian University of Science and Technology, Trondheim, Norway (dirk.ahlers@idi.ntnu.no)
Wilko Heuten, OFFIS – Institute for Information Technology, Oldenburg, Germany (wilko.heuten@offis.de)
Susanne Boll, University of Oldenburg, Oldenburg, Germany (susanne.boll@uni-oldenburg.de)

ABSTRACT
The most common and visible use of geographic information retrieval (GIR) today is the search for specific points of interest that serve an information need for places to visit. However, in some planning and decision-making processes, the interest lies not in specific places, but rather in the makeup of a certain region. This may be for tourist purposes, to find a new place to live during relocation planning, or to learn more about a city in general. Geospatial Web pages contain rich spatial information about the geo-located facilities that could characterize the atmosphere, composition, and spatial distribution of geographic regions. But current Web-based GIR interfaces only support the sequential search of geo-located facilities and services individually, and give end users little support for the abstracted viewing, analysis, and comparison of urban areas. In this work we propose a system that abstracts from the places and instead generates the makeup of a region based on keywords we extract from the Web pages of the region. We can then use this textual fingerprint to identify and compare other suitable regions which exhibit a similar fingerprint. The developed interface allows the user to get a grid overview, but also to drill in and compare selected regions, as well as adapt the list of ranked keywords.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.5.2 [Information Interfaces and Presentation]: User Interfaces

Keywords
Geographic information retrieval, Spatial Web, Geographic regions, Keyword distributions, Visualization, Interaction

1. INTRODUCTION

Geospatial search has become a widely accepted search mode offered by many commercial search engines. Their interfaces can easily be used to answer relatively simple requests such as "restaurant in Berlin" on a point-based map interface, which additionally gives extended information about entities [1]. A correspondingly strong research interest has developed in the field of geographic information retrieval, e.g., [2, 17, 15]. However, there are many tasks in which the retrieval of individual pinpointed entities such as facilities, services, businesses, or infrastructure cannot satisfy users' more complex spatial information needs.

To support more complex tasks we propose a new retrieval method based on entities. For example, sometimes the distribution of results on a map can already inform certain views about areas; e.g., a search for "bar" may show a clustering of results that can be used for "eyeballing" a region of nightlife even without sophisticated geospatial analysis. However, as users become more used to local search, more complex search types and supporting analysis are desired that enable a combined view onto the underlying data [10]. Exploration of geographic regions and their characterization was found to be one of the key desires of local search users in our requirement study [11]. A person who is moving to a new area or city would like to find neighborhoods or regions with a makeup similar to their current home. It might not even be the concrete entities, but rather the atmosphere, composition, and spatial distribution making up the "feeling" of a neighborhood that best capture the intention of a user. To assess this similarity of regions we propose a spatial fingerprint (query-by-spatial-example) that acts as an abstracted view onto the same point-based data. We also aim to provide new visual tools for the exploration of geographic regions.
While the necessary multi-dimensional geospatial data is already available, there is no suitable interface to query it, let alone to deal with the multi-criteria complexity. In this paper we describe a visual-interactive GIR system to support the retrieval of relevant geospatial regions and enable users to explore and interact with geospatial data. We propose a new query-by-spatial-example interaction method in which a user-selected region's characteristic is fingerprinted to present similar regions. Users can interactively refine their query to use those characteristics of a region that are most important to them. For a more detailed overview, we use the full text of georeferenced Web pages for queries and analysis. This work goes beyond conventional GIR interfaces as it allows users to interact with aggregated spatial information via spatial queries instead of only textual querying, which is especially important to define regions of interest. We discuss the necessary input, visualization, comparison, refinement, and ranking methods in the remainder of this paper.

2. USING THE GEOSPATIAL WEB TO CHARACTERIZE GEOGRAPHIC REGIONS

The distribution of geo-entities is used to illustrate the characteristics and dynamics of a geographic region. A geo-entity is a real-life entity at a physical location, e.g., a restaurant, theatre, pub, museum, business, or school. To open these entities up for aggregate and multi-criteria region characterization, they need a certain depth of information associated with them. Position information or the name of a place alone is obviously insufficient, so a categorical or textual description is needed. For initial studies [11, 9] we used OpenStreetMap (OSM, http://www.openstreetmap.org/), which uses a tagging system for categories. To better characterize the geo-entities we now use their associated Web pages. The reason for this is the massive increase in the amount of usable data: the Web pages of entities contain a lot more than just basic information and can therefore be used to uncover much more detailed information. This method can also include additional sources such as events happening in the region or user-generated content on third-party pages [2]. We later describe how we identify the most meaningful keywords from the pages for this task.

To actually make the connection from a location to Web pages, we assume that the presence of location references on a page is a strong indication that the page is associated with the entity at that location. We use our geoparser to extract location references and thereby assess the geographical scopes of a page. The geoparser is trained on location references in the form of addresses within the page content. This is a suitable approach for the urban areas we are addressing in this work, because we need a geospatial granularity at the sub-neighborhood level. Knowledge-based identification and verification of the addresses is done against a gazetteer extended with street names, which we fetched from OSM for the major cities of Germany. To retrieve actual pages, we crawled the Web with a geospatially focused crawler [3] based on the geoparser and built a rich geo-index for various cities of Germany, where each city contains several thousand geotagged Web pages with their full textual content.

3. INTERFACE FOR EXPLORATION OF GEOGRAPHIC REGIONS OF INTEREST

We have implemented two main interaction modes in the Web interface, as shown in Figure 1. A user intends to compare multiple geographic regions of Frankfurt (target regions, right in the dual-map view) with respect to a certain relevant region in Berlin (query region, left). The current reference region of interest is specified via a visual query. The user can then either select regions by placing markers onto the map, or alternatively use a grid overview (right side of Figure 1). In both cases, the system computes the relevance of the target regions with respect to the characteristics of the query region.

[Figure 1: Geographic querying and ranking of geographic regions, with user-selected target regions and alternative grid view]
[Figure 2: Keyword-based visual comparison of geographic regions]

3.1 Query-by-spatial-example

Most GIR interfaces use a conventional textual query as the input method to describe the user's information need, or use the currently selected map viewport. We wanted to give users the ability to arbitrarily define their own spatial region of interest. The free definition of the query region is important, as users may not always want a neighborhood that is easily describable by a textual query. We therefore enable querying by spatial example, where users can define the query region by drawing on the map. Figure 1 shows an example of a user-selected region of interest via a polygon query (by mouse clicks and drag) in the city of Berlin.
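A user-drawn polygon ultimately selects the geotagged pages that fall inside it. The paper does not specify which containment test the system uses, so the standard ray-casting point-in-polygon algorithm below is an illustrative assumption, as are the `(page_id, lon, lat)` triples standing in for the geo-index:

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices of the drawn region."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # does the horizontal ray from (lon, lat) cross edge (i, i+1)?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def pages_in_region(pages, polygon):
    """pages: iterable of (page_id, lon, lat) triples from the geo-index."""
    return [pid for pid, lon, lat in pages if point_in_polygon(lon, lat, polygon)]
```

The selected pages form the compound document of the query region used for ranking in Section 4.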
3.2 Visualization of suitable geographic regions

Users can select several location preferences in the target region that they would like to explore by positioning markers on the map interface. The system defines the targets with a circle around each user-selected location with the same diameter as the reference region polygon. The target regions are then ranked with respect to their similarity with the reference region. Their relevance is shown by the percentage similarity and a heatmap-based relevance visualization. We used a color scheme of different green tones which differed in their transparency: light colors represented low relevance, while dark colors were used to indicate high relevance. The color scheme selection was aided by ColorBrewer (http://colorbrewer2.org).

As an example, Figure 1 shows four user-selected locations on the city map of Frankfurt; the circle regions around these four markers have the same diameter as the query region in Berlin. The target region in the centre of the city is most relevant, with a similarity of 88%, and consequently has the darkest green tone. If a user has not yet formed any preference, we offer an aggregate overview of geo-entities. We partition the map area using a grid raster [14], as we do not intend to restrict user exploration to only selected areas. There could be situations when users look beyond the specific target regions and would like to have an overview of the whole city with respect to a query region. The right side of Figure 1 shows the aggregated ranked view of the grid-based visualization. Each grid cell represents the overall relevance with respect to the query region. The visualization gives a good overview and assessment of relevant regions, which the user can then explore further. Users can select the grid size, which otherwise defaults to the size of the query region. The grid layout is fixed to the city boundaries, as we intend to give an overview of the whole city. In the future we would like to make it more dynamic so that users can shift the grid layout, since a slight variation in grid cell boundaries could alter the relevance results.

3.3 Exploration and interaction with geographic regions via keyword distributions

Interaction models should give end users the opportunity to explore the characteristics of selected regions and to adapt them further to their requirements. We initially show the most relevant keywords of the respective region using a word cloud. The word cloud provides more detailed information on the keyword distribution when the mouse hovers over it. The font size and order of the keywords signify their relevance. Figure 2 shows the comparison of the query region with the most relevant target region via both their keyword distributions. In this case, the distributions of both regions are very similar, leading to the high relevance score for the target region.

Since the keyword characteristics of a query region are derived from georeferenced Web pages, there are situations where a user might not be satisfied with the spatial description and wants to influence the keywords. In the example of Figure 3, a user decides that pubs are more important than restaurants, and that fast food is not an aspect of his lifestyle and should be replaced by education facilities near his new home. In such scenarios users need to interact with and adapt the generated keyword distributions of query regions. We make the word cloud interactive and editable. Users can drag keywords to alter their position and thus their significance. They can also edit, delete, or replace keywords in the word cloud to change the criteria. After modifying the keyword distribution, users can revisualize the target regions to update their ranking. Figure 3 shows this user interaction with the word cloud, including the revisualization of the updated ranking of target regions, which is visibly different from the previous ranking of Figure 2.

[Figure 3: User interaction with the keyword distribution and revisualization]

4. TEXT-BASED CHARACTERIZATION AND RANKING OF GEOGRAPHIC REGIONS

We adapt common IR methods for ranking and similarity measures. In relevance-based language models, the similarity of a document to a query is the probability that a given document would generate the query [12]. To be able to do the same with geographic regions, we add a transitional step: regions are considered as compound documents built from the Web pages of the entities inside them. We can then define the similarity of the document clusters of regions based on the probability that the target region can generate the query region. The Kullback-Leibler divergence is used for comparison [4].

For a geospatial document d, we estimate P(w|d), which is a unigram language model, with the maximum likelihood estimator, simply given by relative counts: P(w|d) = tf(w,d) / |d|, where tf(w,d) is the frequency of word w in the document d and |d| is the length of the document d. A geographic region contains several geospatial documents inside its footprint area. We define a geographic region based on a document cluster D which contains the documents {d1, d2, ..., dk}; the distribution of a particular word w in the geographic region is estimated with its combined probability in the collection:

    P(w|D) = (1/k) · Σ_{i=1..k} P(w|d_i)

The word cloud represents the most prominent keywords of the region with respect to their ranked probability distribution P(w|D). The comparison of regions is done with respect to their probability distributions using the KL-divergence. A target region x is compared to the query region as follows:

    Relevance(Region_x) = Σ_w P(w|D_q) · log( P(w|D_q) / P(w|D_x) )

The computation of this formula involves a sum over all the words that have a non-zero probability according to P(w|D_q). Each region Region_x gets a relevance score according to the comparison of its distribution with that of the query region Region_q. All target regions (user-selected regions or grid-based divisions) are ranked with respect to their relevance score for visualization.
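The two estimators and the ranking formula can be sketched directly. One caveat: the paper does not say how zero probabilities P(w|D_x) = 0 in the target region are handled inside the logarithm, so the small epsilon floor below is an assumption:

```python
import math
from collections import Counter

def doc_model(tokens):
    """Unigram MLE: P(w|d) = tf(w,d) / |d|."""
    counts = Counter(tokens)
    n = max(1, len(tokens))
    return {w: c / n for w, c in counts.items()}

def region_model(docs):
    """P(w|D) = (1/k) * sum_i P(w|d_i) over the k pages inside a region."""
    models = [doc_model(d) for d in docs]
    vocab = set().union(*(m.keys() for m in models))
    return {w: sum(m.get(w, 0.0) for m in models) / len(models) for w in vocab}

def kl_relevance(query_region, target_region, eps=1e-9):
    """KL divergence of the target model from the query model; lower = more similar.

    The sum runs only over words with non-zero probability in the query
    region, as the paper describes; eps avoids log(p/0) for words absent
    from the target region (an assumption, see above).
    """
    return sum(p * math.log(p / max(target_region.get(w, 0.0), eps))
               for w, p in query_region.items() if p > 0)
```

Ranking all target regions then amounts to sorting them by `kl_relevance(query_model, target_model)` in ascending order.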
However, search for multiple categories or other complex tasks is usually not supported. Some non-conventional spatial querying methods have been proposed, e.g., query-by-sketch on a map [6]. Other work uses the density of arbitrary user-supplied keywords to build a query region [8]. Tag clouds have been adapted to maps, exploiting georeferenced tags [16]. Locally characteristic keywords can be extracted for map visualization and to show their spatial extent [19]. None of these approaches makes a larger word cloud available, but only the main terms. Other geovisualization approaches [5, 7] address multi-criteria analysis, but are usually targeted at specific domains and experts. The Inspect system was tailored to geospatial analysts, to visually filter and explore multidimensional data [13]. A multi-criteria evaluation for home buyers was proposed in [18]. The scenario of spatial decision making is similar to ours, but it focused on experts and spatial computation issues rather than interface and visualization aspects.

Our system interface differs in the granularity of information need and representation, i.e., we focus on the ranking of regions, but base it on high-granularity geo-entities that have a very exact location. This ensures that the spatial query does not produce overlap with neighboring regions, and makes the multi-criteria analysis more exact to execute at arbitrary region sizes.

6. CONCLUSIONS AND FUTURE WORK
Most current local search interfaces do not offer adequate support for the exploration and comparison of geographic areas and regions. End users need visual and interactive assistance from GIR systems for an abstracted overview and analysis of geospatial data. We proposed interactive interfaces for the characterization and assessment of relevant geographic regions that enable end users to query, analyze and interact with the rich geospatial data available on the Web in user-selected geographic regions. The relevance of regions is based on the similarity of keyword distributions.

The observation of results shows satisfactory performance in uncovering realistic and meaningful keywords defining the regions. We observed that the characterization and comparison of geographic regions show good results with respect to geo-located facilities and infrastructure of German cities, e.g., clearly distinct characteristics for university, industrial, or party districts. In the future we plan a more formal qualitative and quantitative evaluation of these interfaces, to examine the acceptance of these visualizations with regard to user-centered aspects such as exploration ability, information overload, and cognitive demand. We would also like to explore more advanced interaction methods to enhance the usability of the proposed visualizations.

Additionally, we envision more powerful region similarity measures such as landscape and topological similarity, similarity via social media, and an integration of additional data sources.

Acknowledgments
The authors are grateful to the DFG SPP 1335 'Scalable Visual Analytics' priority program, which funds the project UrbanExplorer. The 2nd author acknowledges funding from the ERCIM "Alain Bensoussan" Fellowship Programme.

7. REFERENCES
[1] D. Ahlers. Local Web Search Examined. In Web Search Engine Research. Emerald, 2012.
[2] D. Ahlers and S. Boll. Location-based Web search. In The Geospatial Web. Springer, 2007.
[3] D. Ahlers and S. Boll. Adaptive Geospatially Focused Crawling. In CIKM '09, 2009.
[4] T. M. Cover and J. A. Thomas. Elements of Information Theory, 1991.
[5] J. Dykes, A. M. MacEachren, and M.-J. Kraak. Exploring Geovisualization. Elsevier, 2005.
[6] M. J. Egenhofer. Query processing in spatial-query-by-sketch. J. Vis. Lang. Comput., 8, 1997.
[7] R. Greene et al. GIS-based multiple-criteria decision analysis. Geography Compass, 5(6), 2011.
[8] A. Henrich and V. Lüdecke. Measuring Similarity of Geographic Regions for Geographic Information Retrieval. In ECIR '09, 2009.
[9] C. Kumar, W. Heuten, and S. Boll. Visual interfaces to support spatial decision making in geographic information retrieval. In CD-ARES 2013, to appear.
[10] C. Kumar, W. Heuten, and S. Boll. Geovisualization for end user decision support: Easy and effective exploration of urban areas. In GeoViz Hamburg 2013: Interactive Maps That Help People Think, 2013.
[11] C. Kumar, B. Poppinga, D. Haeuser, W. Heuten, and S. Boll. Geovisual interfaces to find suitable urban regions for citizens: A user-centered requirement study. In UbiComp '13 Adjunct, 2013, to appear.
[12] V. Lavrenko and W. B. Croft. Relevance-based language models. In SIGIR '01. ACM, 2001.
[13] S.-J. Lee et al. Inspect: a dynamic visual query system for geospatial information exploration. In SPIE, 2003.
[14] A. M. MacEachren and D. DiBiase. Animated maps of aggregate data: Conceptual and practical problems. CaGIS, 18(4), 1991.
[15] A. Markowetz et al. Design and Implementation of a Geographic Search Engine. In WebDB 2005, 2005.
[16] D.-Q. Nguyen and H. Schumann. Taggram: Exploring geo-data on maps through a tag cloud-based visualization. In IV '10, 2010.
[17] R. S. Purves et al. The design and implementation of SPIRIT: a spatially aware search engine for information retrieval on the internet. IJGIS, 21(7), 2007.
[18] C. Rinner and A. Heppleston. The spatial dimensions of multi-criteria evaluation – case study of a home buyer's spatial decision support system. In Geographic Information Science, 2006.
[19] B. Thomee and A. Rae. Uncovering locally characterizing regions within geotagged data. In WWW '13, 2013.
Inferring Music Selections for Casual Music Interaction

Daniel Boland, Ross McLachlan, Roderick Murray-Smith
University of Glasgow, United Kingdom
daniel@dcs.gla.ac.uk, r.mclachlan.1@research.gla.ac.uk, rod@dcs.gla.ac.uk

ABSTRACT
We present two novel music interaction systems developed for casual exploratory search. In casual search scenarios, users have an ill-defined information need and it is not clear how to determine relevance. We apply Bayesian inference using evidence of listening intent in these cases, allowing a belief over a music collection to be inferred. The first system using this approach allows users to retrieve music by subjectively tapping a song's rhythm. The second system enables users to browse their music collection using a radio-like interaction that spans from casual mood-setting through to explicit music selection. These systems embrace the uncertainty of the information need to infer the user's intended music selection in casual music interactions.

Categories and Subject Descriptors
H.5.2 [Information interfaces]: User Interfaces

General Terms
Design, Human Factors, Theory

Presented at EuroHCIR2013. Copyright © 2013 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

1. INTRODUCTION
When interacting with a music system, listeners are faced with selecting songs from increasingly large music collections. With services like Spotify, these libraries can include many songs the user has never heard of. This retrieval is often a hedonic activity and may not serve a particular information need. Users do not always have a song in mind and are often just interested in setting a mood or finding something 'good enough' [9]. This type of casual search has recently been identified as not being well supported within the IR literature [14]. In particular, the concept of relevance becomes nebulous where the information need is not well defined. By inferring a belief over a music collection using the likelihood of a user's input, we implement interactions which incorporate this uncertainty. These interactions can account for subjectivity and span from casual, serendipitous listening through to highly engaged music selection.

Music listeners are not always fully engaged with the selection of music, as evidenced by the success of the shuffle playback feature. Large libraries of music such as Spotify are available, but users often just want background music, not a specific song out of millions. In these casual search scenarios, users often satisfice, i.e. search for something which is 'good enough' [11]. As this information need is poorly defined, so too is relevance, placing these interactions outside of typical Information Retrieval approaches.

2. UNCERTAIN MUSIC SELECTION
By asking 'What would this user do?', we can develop a likelihood model of user input within an interaction. With Bayes' theorem, this allows for an uncertain belief over a music space to be inferred. Users can provide evidence of their listening intent as part of a casual music interaction, without needing to be fully engaged in the music retrieval. This is an explicitly user-centered approach, focusing on how a user will interact with the system. Both of the systems discussed here have been iteratively developed by comparing real user behaviour against that predicted by the user input models. We present two novel music retrieval systems which explore two challenges with this approach: i) how to correctly interpret evidence which may be subjective, and ii) how to allow users to set their current level of engagement:

i) 'Query by Tapping' is a music retrieval technique where users tap the rhythm of a song in order to retrieve it [1]. As part of a user-centred development process, we identified that rhythmic queries are often subjective, and so developed a model of rhythmic input which captures some of this subjective behaviour. This allows the system to be trained to the user's tapping style, giving significant improvements over previous efforts at rhythmic music retrieval.

ii) FineTuner is a prototype of a radio-like music interface that enables users to retrieve music at a level of engagement suited to their current information need. Users navigate their music collection using a dial, with the system using prior knowledge of the user to inform the music selection. A pressure sensor enables users to assert varying levels of control over the system – with no pressure, users can casually tune in to sections of their music collection to hear recommended music with common characteristics. As pressure is applied, the user is able to make increasingly specific selections from the collection. The inferred music selection is conditioned upon the asserted control, allowing for the seamless transition from casual mood-setting to engaged music interaction.

Figure 1: Users construct queries by sampling from preferred instruments. User 1 prefers Vocals and Guitar whereas User 2 prefers Drums and Bass.

3. MODELLING SUBJECTIVITY
In this section we describe our efforts to model the subjectivity of rhythmic queries, yielding a query-by-tapping system for casual music retrieval which can be trained to users to account for their subjective querying style. After training the system, a user can tap a rhythm to re-order their music collection by rhythmic similarity to their query. The top 20 highly ranked results are listed on-screen as a music playlist, from which the user can also then select a specific song. Query by tapping provides an example of a casual music interaction which suffers from subjective queries. In mobile music-listening contexts, it can often be inconvenient for users to remove their mobile device from their pocket or bag and engage with it to select music. Tapping music as a querying technique is depicted in Figure 2. Tapping a rhythm is already a common act, and rhythm is a universal aspect of music [13]. In an exploratory design session where users were asked to provide rhythmic queries, it became apparent that users differed in querying style. We describe this subjective behaviour and our approach to modelling it in previous work [1]. One of the key aspects of the model is that users have preferences for which instruments they tap to, as depicted in Figure 1.

In order to assign a belief to the songs in the music collection given a rhythmic query, we compare the query to those predicted by the user input model. This comparison is done using the edit distance from string comparison methods, scaling the mismatch penalty to the time differences between the rhythmic sequences [5].

3.1 Query By Tapping
'Query by Tapping' has received some consideration in the Music Information Retrieval community. The term was introduced in [7], which demonstrated that rhythm alone can be used to retrieve musical works, with their system yielding a top-10 ranking for the desired result 51% of the time. Their work is limited, however, in considering only monophonic rhythms, i.e. the rhythm from only one instrument, as opposed to polyphonic rhythms comprising multiple instruments. Their music corpus consists of MIDI representations of tunes such as "You are my sunshine", which is hardly analogous to real-world retrieval of popular music. Rhythmic interaction has been recognised in HCI [8, 15], with [4] introducing rhythmic queries as a replacement for hot-keys. In [2], tempo is used as a rhythmic input for exploring a music collection – indicating that users enjoyed such a method of interaction. The consideration of human factors is also an emerging trend in Music Information Retrieval [12]. Our work draws upon both these themes, being the first QBT system to adapt to users. A number of key techniques for QBT are introduced in [5], which describes rhythm as a sequence of time intervals between notes – termed inter-onset intervals (IOIs). They identify the need for such intervals to be defined relative to each other, to avoid the user having to exactly recreate the music's tempo.

In previous implementations of QBT, each IOI is defined relative to the preceding one [5]. This sequential dependency compounds user errors in reproducing a rhythm, as an erroneous IOI value will also distort the following one.

Figure 2: Users are able to select music by simply tapping a rhythm or tempo on the device, enabling a casual eyes-free music interaction.

Figure 3: Percentage of queries yielding a highly ranked result (in the top 20, i.e. 6.7%) plotted against query length in seconds.

Figure 4: As the user asserts control, the distribution of predicted input for a given song becomes narrower. This adds weight to the input, meaning a belief is inferred over fewer songs and the view zooms in.

The approach to rhythmic interaction in [4], however, used k-means clustering to classify taps and IOIs into three classes based on duration.
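The two ingredients just described, duration-class clustering of IOIs as in [4] and an edit distance whose mismatch penalty scales with the time difference between intervals as in [5], can be sketched as follows. This is a minimal illustration under our own simplifying assumptions (a tiny one-dimensional k-means, absolute-time IOIs), not the authors' code.

```python
def iois(tap_times):
    """Inter-onset intervals: time gaps between successive taps."""
    return [b - a for a, b in zip(tap_times, tap_times[1:])]

def classify(intervals, n_classes=3, iters=20):
    """Cluster IOIs into duration classes (short/medium/long) with a
    small 1-D k-means, as in the clustering approach of [4]."""
    lo, hi = min(intervals), max(intervals)
    centres = [lo + (hi - lo) * i / (n_classes - 1) for i in range(n_classes)]
    for _ in range(iters):
        groups = [[] for _ in centres]
        for x in intervals:
            i = min(range(len(centres)), key=lambda j: abs(x - centres[j]))
            groups[i].append(x)
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return [min(range(len(centres)), key=lambda j: abs(x - centres[j]))
            for x in intervals]

def rhythm_distance(q, s):
    """Edit distance between two IOI sequences, with the substitution
    penalty scaled by the time difference, in the spirit of [5]."""
    m, n = len(q), len(s)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + q[i - 1]
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + s[j - 1]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + q[i - 1],                    # deletion
                          d[i][j - 1] + s[j - 1],                    # insertion
                          d[i - 1][j - 1] + abs(q[i - 1] - s[j - 1]))  # substitution
    return d[m][n]
```

A collection can then be re-ordered by ascending `rhythm_distance` between the query's IOIs and each song's predicted IOIs; the class labels from `classify` allow coarser, tempo-tolerant matching.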
The clustering-based approach avoids the sequential error, but loses a great deal of detail in the rhythmic query, and so we explore a hybrid approach.

3.2 Evaluation
The most important metric for the system to be usable was whether a rhythmic input produced an on-screen (top 20) result. We asked eight participants to provide queries for songs selected from a corpus of 300 songs for which we had complete note onset data. Participants listened to the songs first to ensure familiarity, and were asked to provide training queries for each song. These training queries were used to train the generative model using leave-one-out cross-validation. We use a state-of-the-art onset detection algorithm (based on measuring spectral flux [10]) as a baseline which does not account for subjectivity. Performance typically improves with query length, as seen in Figure 3. Higher rankings are achieved for all query lengths when using the generative model. Interestingly, queries over 10 seconds lead to a rapid fall-off in performance, possibly due to errors accumulating beyond the initial query the user had in mind, or due to users becoming bored.

4. MODELLING ENGAGEMENT
We consider casual search interactions as spanning a range of levels of engagement. How much a user is willing to engage with a system and provide evidence of their listening intent will undoubtedly vary with listening context. An interaction which is fixedly casual would be as problematic as one which requires a user's full attention, with users unable to take control when they wish to. An example of this would be old analogue radios – whilst they offer a simple music interaction, users have limited control over what they hear. Previous work by Hopmann et al. sought to bring the benefits of interaction with vintage analog radio to modern digital music collections [6]; however, their work also required explicit selection (a fixed level of engagement).

We explore how the inference of listening intent can be conditioned upon the user's level of engagement, with the music interaction spanning from casual mood-setting through to specific song selection. While it would be desirable to bring the simplicity of radio-like interaction to modern music collections, mapping a modern music collection to a dial such as in Figure 5 would require prolonged scrolling. An alternative would be to instead support scrolling through an overview of the music space; however, this removes granularity of control from the user, leaving them unable to select specific items. We developed a radio-like system called FineTuner that allows users to navigate their music, which is arranged along a mood axis. Users can 'tune in' to a mood to hear recommended songs based on their listening history. FineTuner allows the user to assert control over the music recommendation by applying pressure to a sensor. This enables users to seamlessly transition from a casual style of interaction, akin to a radio, to controlling styles such as specifying a particular sub-area of interest in a music space, or even selecting individual songs. FineTuner provides a single interaction which supports casual search through to fully engaged retrieval.

Figure 5: Users share control over an intelligent radio system, using a knob and pressure sensor.

4.1 Varying Engagement
Our system enables both casual and engaged forms of interaction, giving users varying degrees of control over the selection of music. In casual interactions where users apply less pressure, the system can become more autonomous – making inferences from prior evidence about what the user intended. This handover of control was termed the 'H-metaphor' by Flemisch et al., where it was likened to riding a horse – as the rider asserts less control, the horse behaves more autonomously [3]. By allowing users to make selections from the general to the specific, the system supports both specific selections and satisficing. Users can make broad and uncertain general selections to casually describe what they want to listen to. However, they can also assert more control over the system and force it to play a specific song. Control is asserted by applying force to a pressure sensor.

As the user begins an interaction, they have not applied pressure and therefore are not asserting control over the system. The inferred selection is thus broad, covering an entire region of their collection, and is biased towards popular tracks (fig. 4a). The music in the inferred selection is visualised by randomly sampling tracks from it and drawing beams from the dial position to the album art. The user may press in the knob to accept the selection, and the sampled track is played. At low levels of assertion it is likely that most tracks played would be highly popular tracks. This behaviour is a design assumption; users may want the system to use other prior evidence. When the user applies pressure, the system interprets this as an assertion of control. The inferred selection is smaller and the spread of beams becomes narrower; the album art visualisation zooms in to show the smaller selection (fig. 4b). This selection is a combination of evidence from the dial position with prior evidence, i.e. their last.fm music history. When users fully assert control (max. pressure), they navigate the collection album by album (fig. 4c) and can make exact selections. By varying the pressure, users seamlessly move through this continuous range of control.

The smooth change in engagement is achieved using a simple model of user input. We assume that in an engaged interaction, users will point precisely at the song of interest (as in fig. 4c). For more casual selection, we assume that users will point in the general area (mood) of the music they want, modelled using a normal distribution (as in fig. 4b). As less pressure is applied, the distribution is widened, leading to less precise selection and a greater role for a prior belief over the music collection, such as listening history.

5. SUMMARY
The scenarios explored here involve casual music retrieval, where users have an ill-defined information need and browse for hedonic purposes or to satisfice a music selection. In these cases, considering what input a user would provide for target songs and inferring selections is an intuitive approach which avoids the issue of defining relevance. We show two music interactions which support the uncertain selection of music, inferred from casual user input such as tapping a rhythm or turning a radio dial.

We have shown that modelling user input for inferring music selection can address issues of subjectivity by taking a user-centered approach to model development. The model can be iterated by comparing its predictions against actual user behaviour. Accounting for this subjectivity can yield significant improvements in retrieval performance, as well as creating a more personalised search experience. A key feature of the second system, FineTuner, is its ability to span seamlessly from casual search scenarios, such as satisficing, through to more explicit selections of music. By conditioning the inference upon the user's level of engagement, we are able to interpret the same input space (in this case the dial) according to the current context.

Our approach to casual music interaction empowers the user to enjoy their music while expending as much or as little effort in the retrieval as they wish, providing queries in their own subjective style. Instead of focusing solely on optimising the retrieval process, we consider it equally important to design retrieval systems which suit how the user currently wants to interact. By considering how users might provide casual evidence of their listening intent, we achieve music interactions as simple as tapping a beat or tuning a radio.

6. ACKNOWLEDGMENTS
We are grateful for support from Bang & Olufsen and the Danish Council for Strategic Research.

7. REFERENCES
[1] Boland, D., and Murray-Smith, R. Finding My Beat: Personalised Rhythmic Filtering for Mobile Music Interaction. In MobileHCI 2013 (2013).
[2] Crossan, A., and Murray-Smith, R. Rhythmic Interaction for Song Filtering on a Mobile Device. Haptics and Audio Interaction Design (2006), 45–55.
[3] Flemisch, O., Adams, A., Conway, S. R., Goodrich, K. H., Palmer, M. T., and Schutte, P. C. The H-Metaphor as a Guideline for Vehicle Automation and Interaction. NASA/TM-2003-212672, 2003.
[4] Ghomi, E., Faure, G., Huot, S., and Chapuis, O. Using rhythmic patterns as an input method. In Proc. CHI (2012), 1253–1262.
[5] Hanna, P. Query by tapping system based on alignment algorithm. In Proc. ICASSP (2009), 1881–1884.
[6] Hopmann, M., Vexo, F., Gutierrez, M., and Thalmann, D. Vintage Radio Interface: Analog Control for Digital Collections. In CHI 2012: Case Study (2012).
[7] Jang, J., Lee, H., and Yeh, C.-H. Query by Tapping: A New Paradigm for Content-based Music Retrieval from Acoustic Input. In Proc. PCM (2001).
[8] Lantz, V., and Murray-Smith, R. Rhythmic interaction with a mobile device. In Proc. NordiCHI, ACM (2004), 97–100.
[9] Laplante, A., and Downie, J. S. Everyday life music information-seeking behaviour of young adults, 2006.
[10] Masri, P. Computer modelling of sound for transformation and synthesis of musical signals. PhD thesis, University of Bristol, 1996.
[11] Scheibehenne, B., Greifeneder, R., and Todd, P. M. What Moderates the Too-Much-Choice Effect? Journal of Psychology & Marketing 26(3) (2009), 229–253.
[12] Stober, S., and Nürnberger, A. Towards user-adaptive structuring and organization of music collections. In Adaptive Multimedia Retrieval: Identifying, Summarizing, and Recommending Image and Music (2010), 53–65.
[13] Trehub, S. E. Human processing predispositions and musical universals. In The Origins of Music, N. L. Wallin, B. Merker, and S. Brown, Eds. MIT Press, 2000, ch. 23, 427–448.
[14] Wilson, M. L., and Elsweiler, D. Casual-leisure Searching: the Exploratory Search scenarios that break our current models. In HCIR 2010 (2010).
[15] Wobbrock, J. O. Tapsongs: tapping rhythm-based passwords on a single binary sensor. In Proc. UIST (2009), 93–96.
Accounting for this subjectivity can yield passwords on a single binary sensor. In Proc. UIST (2009), significant improvements in retrieval performance as well as 93–96. Search or browse? Casual information access to a cultural heritage collection Robert Villa, Paul Clough, Mark Hall, Sophie Rutter Information School University of Sheffield Sheffield, UK S1 4DP {r.villa, p.d.clough, m.mhall, sarutter1} @sheffield.ac.uk ABSTRACT The work reported here is based on initial results from the Public access to cultural heritage collections is a challenging and Interactive CHiC (Cultural Heritage in CLEF) track of CLEF1 as ongoing research issue, not least due to the range of different run at Sheffield University. The interactive CHiC track is based reasons a user may want to access materials. For example, for a on the CHiC Europeana data set as used in 2011 and 2012 [1]. An virtual museum website users may vary from professionals or early prototype of an evaluation framework was used [2] which experts, to interested members of the public visiting on a whim. In allowed the interactive experiment to be semi-automated. In this this paper, we are interested in the latter user: a user who visits a work, our focus is on how users explored the collection and in cultural heritage website without a clear goal or information need particular how search and browse were used in this exploration. in mind. In the user study reported here, carried out within the We consider three research questions: context of the interactive task at CLEF (interactive CHiC), 20 RQ1. How do participants initiate their exploration? participants explored a subset of Europeana with no explicit task provided using a custom-built interface that offered both search RQ2. Do participants use browse or search in their exploration and browse functionalities. Results suggest that browsing is used of the collection? considerably more by the majority of users when compared to text RQ3. 
How do participants decide to search or browse, when search (all participants used the category browser before carrying given no explicit task? out a text search). This highlights the need for cultural heritage search interfaces to provide browsing functionality in addition to With RQ1 we are particularly interested whether users start their conventional text search if they wish to support casual search exploration by browsing categories, or by search. RQ2 then tasks. considers how users access the collection over their whole session. For RQ3 we will present some initial qualitative data from our lab-based interactive study, where the aim is to identify General Terms reasons for the use of either the search or browse functions. Design, Experimentation, Human Factors. 2. PREVIOUS WORK Keywords A general review of museum informatics is provided in [3], Cultural heritage, virtual museums, information access. although the more specific area of museum visitor studies, investigating why and how individuals visit museums, has a long 1. INTRODUCTION history [4]. More recent work has focused on visitors to digital Providing public access to cultural heritage is an ongoing and museums [5-7]. In [6] the information seeking behavior of challenging area of research. Previous work suggests that visitors cultural heritage experts was studied through interviews, finding to online cultural heritage collections (e.g. virtual museum that complex information gathering was required for the majority visitors) are not necessarily motivated by an explicit task, and that of search tasks. In contrast [7] studied virtual museum visitors, interacting with cultural heritage collections is exploratory in inspired by the work of [8] and [9] which suggest that museum nature [8, 9]. Recent work in the area of ‘casual search’ [10] has visitors are exploratory in their information seeking. 
[7] found that search occurred far more often than browse behavior for three of the four tasks used in the study, the exception being an open and broad task where browsing occurred to a greater degree.

Museum visitors can, in some respects, be considered as examples of "casual leisure" searchers, as outlined in [10], where examples were found of "need-less" browsing (based on a diary study and an analysis of Tweets, both outside the domain of cultural heritage). This work also investigated situations where users are driven by the pleasure of the search process itself, rather than an explicit information need. Darby and Clough [11] investigated the information seeking behavior of genealogists, with an emphasis on the behavior of amateurs and hobbyists rather than professionals. In [12] a review of three digital library projects is carried out from the point of view of Ingwersen and Järvelin's Information Seeking and Retrieval framework [13]. Similar to [10], it points out that information behavior by end users may be the "end in itself".

The study reported here uses a conventional lab-based protocol. However, unlike in previous work such as [7], the participants were not given an explicit task: the underlying aim being to model a situation closer to that investigated in [10], where there is no explicit information need.

The focus for this paper is how individuals explore a cultural heritage collection when given no task. The results may be used both to contrast with studies which have used explicit tasks, and to motivate changes to cultural heritage systems to better support a diverse range of user tasks.

Presented at EuroHCIR2013. Copyright © 2013 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

1 http://www.promise-noe.eu/unlocking-culture

3. INTERACTIVE CHiC
A screenshot of the CHiC interactive system is shown in Figure 1. The interface is split into five main areas, clockwise from left to right: a category browser, search box, item display, bookbag, and search results. The search box operates in the conventional manner, allowing free text queries, with search results displayed as a grid below. When a result is clicked, it is displayed in the "item display" on the right. This information will typically include a small thumbnail, a textual description, and the item's associated metadata. Metadata is clickable: e.g. if an item is listed as being owned by the British Library, clicking on the field will search for British Library objects. At the bottom of the item display is a "more like this" area, which displays the images of up to eight similar objects, which can be viewed three at a time.

Figure 1: Screenshot of the Interactive CHiC interface

On the left of the interface is the "category browser", which allows the user to browse the Europeana collection through a hierarchy of categories. This hierarchy is automatically generated, and is based on the work of [14]. The technique combines the Wikipedia category hierarchy with topics derived from Wikipedia articles, into which items are mapped. When a category is clicked, the main results are updated to list the category contents. Small right arrows beside each non-leaf category allow the viewing of sub-categories. The user can therefore search and browse the collection in three main ways: using a text query, selecting a category, or selecting item metadata or "more like this".

On the bottom right of the interface is the bookbag, into which items can be placed. Book-bagged items are kept listed on the display, and can be removed and redisplayed as required.

The underlying search system is based on Apache Solr (http://lucene.apache.org/solr/), which provides the text search, spelling checker, and the "more like this" suggestions (determined using Solr's standard more-like-this functionality). The data set used was the same as that used in interactive CHiC, a dump of the Europeana data set (http://www.europeana.eu/).

4. EXPERIMENTAL SETUP
The search and browse interface was embedded into an IR evaluation system, which automatically administered pre- and post-questionnaires and displayed the experimental system. All data reported here is from an in-lab study. This allowed a follow-up interview to be carried out, during which each participant reviewed his or her search session. To enable this reviewing, Morae screen recording software was used to record the user's activity, and during the interview an audio recording was made of the user's comments.

An important aspect of the interactive CHiC experimental design was that no explicit task was provided to users. Instead, the instructions asked the user to explore freely as they wished, until they were bored. Users were informed after they had been active for 10 minutes, and could then continue for a further 5 minutes if they wished, at which point they would be asked to stop (these timings were carried out by hand, and were approximate). Once this was finished, the user's search session was replayed to them, and an interview conducted to investigate the user's search process. Participants were paid 10 pounds for taking part.

In total 20 participants were recruited for the study, 11 male and 9 female. Eight participants were in the 18-25 year age band, nine in the 26-35 band, and the other three in the 36-45 band. The majority were students (13), with 5 employed, one unemployed, and one "other". Thirteen had completed a higher education degree, while six were currently studying for an undergraduate degree.

5. RESULTS
5.1 Initiation of exploration
RQ1 asks how users initiate their exploration of the collection. To investigate this, we first looked at how users started their session, and in particular their searching. For example, did they select a category or enter a query?

Over the whole data set, four different actions were used by participants to initiate their session (Table 1, column 2). For the majority of users, the first action was to select one of the categories (15 out of the 20 users). It should be noted that the interface, on startup, showed a set of default results to all users. For three users, the first action was to display one of these default results; another user clicked "next page" to view the next page of default results; and the final user's first action was to bookmark one of the default result items.

We also investigated the logs to find each user's first search or browse action, which could be one of: category select, text query, or metadata/"more like this" select. As shown in Table 1 (column 3), for all users this was a category select. In addition to counting the first actions, we also investigated how long each user spent before either clicking the interface or starting a new search/browse using the three previously listed methods. These results are shown in Table 2, along with the overall length of each session.

Table 1: Number of users whose first action / first search or browse action was the action in column one.

  Action                    #Users first action   #Users first search/browse action
  Category select           15                    20
  Display item              3                     -
  Next search result page   1                     -
  Add to bookbag            1                     -

Table 2: Time to first action, time to first search/browse action, and overall session time (all times in seconds)

                        Min    1st Qu.   Median   Mean    3rd Qu.   Max
  First action          7.00   19.00     25.00    30.50   38.75     90.00
  First search/browse   7.00   22.75     38.00    57.50   81.75     204.0
  Total time            129    631.8     783.5    787.8   918.0     1544

There was considerable variance in the length of time users spent on the task. The median time taken by users was 783.5 seconds (just over 13 minutes), with an interquartile range of 286.2 seconds (approximately 4 minutes 45 seconds). The minimum time was 129 seconds, and the maximum 1544 seconds (over 25 minutes). Most users spent some time at the start of their session before either clicking on an interface element (median time 25 seconds) or initiating a search (median 38 seconds).
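The timing summaries in Table 2 are standard five-number summaries plus the mean. As a minimal sketch (our illustration, not the authors' analysis code), and using a hypothetical list of times in seconds rather than the study's logs, they can be computed with Python's standard library:

```python
# Sketch only: Table 2-style summary statistics for a list of session times.
# The input data below is hypothetical, not the data from the study.
import statistics

def summarise(times):
    """Return min, quartiles, median, mean, and max for times in seconds."""
    q1, median, q3 = statistics.quantiles(times, n=4)  # the three quartile cut points
    return {
        "min": min(times),
        "1st_qu": q1,
        "median": median,
        "mean": statistics.mean(times),
        "3rd_qu": q3,
        "max": max(times),
    }

times_to_first_action = [7, 19, 25, 31, 39, 90]  # hypothetical values, in seconds
print(summarise(times_to_first_action))
```

Note that `statistics.quantiles` defaults to the "exclusive" method, so quartile values may differ slightly from those produced by other statistics packages.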
5.2 Search vs. browse
RQ2 asks whether participants use search or browse. Figure 2 presents query and category counts across all users (i.e. counts of how often either text queries were executed or categories selected). Item selects and the "more like this" functionality are not included here, due to the relative rarity of these events (across the whole data set this functionality was used only 15 times, by 7 different users).

Figure 2: Comparison of query and category select counts

A non-parametric Wilcoxon rank-sum test indicated that there was a significant difference between queries executed and categories selected (W = 50.5, p ≤ 0.001). As can be seen from the boxplots, categories were selected far more often than queries were entered, the median number of queries executed being 2, compared to a median of 11 for category selects. All but three users selected more categories than they executed queries, and 8 users did not enter a text query at all.

A similar situation exists when the time spent querying vs. browsing categories is estimated (Figure 3). Such times were estimated by starting a timer when a query or category was selected, and taking all activity between this point and the next query or category select as the user either "querying" or "browsing categories". As might be expected, the trend is similar to that of Figure 2, with users spending more time browsing categories than executing queries. All but five participants spent more time browsing using the categories than they spent querying.

Figure 3: Estimated time querying vs. browsing by category

5.3 "How did you start?"
In addition to the quantitative data above, two questions were asked of users in the post-session interview: "How did you start?" and "Why did you choose to start with a [category/search query]?" The latter question was altered depending on how the user initiated their exploration. While some users started by examining the results, all users chose the category browser over the search box to initiate searches.

The responses to the first question ("How did you start?") mentioned the category browser explicitly in 8 of the 12 answers. In most of these cases this was linked to exploring the interface. For example, participant P3 stated:

  "I was drawn to the middle then decided to look around at the interface. I decided to look at categories first, picked politics"

Similarly, participant P10 stated:

  "I just looked round to see what I could use to explore things. The category browser looked like the most likely candidates because it had descriptions of stuff."

As well as being influenced by the interface, responses from some users suggest that prior interests also played a part. For example:

  "I just look at the layout of the website and then found that I had a category browser so I went to what I study actually, and I study languages and I try to find something interesting." [P8]

  "There is no particular task and so I started from browse to see which information is more interesting to me." [P1]

The design of the interface, with a relatively small search box, appears also to have had an effect on the choices of at least two of the users, as indicated by responses to the second question. Participants P2 and P4 stated:

  "Because I only saw that [category]. I didn't see the search until a bit later on." [P2]

  "I didn't really see this one at first [the search box] it was a bit obscure." [P4]

For many users, however, the fact that the category browser allowed easy exploration appeared to be the key, with some users making connections to physical museums. For example:

  "If I was going to a museum I would look at the categories [museum sections] that are of most interest to me: arts, old stuff and so this is why I was looking for Mona Lisa." [P5]

The lack of an explicit task was mentioned by some, and search was explicitly commented on by two users; e.g., P7 stated "When I wanted to find something specific I went to the search box."

6. DISCUSSION
RQ1 asks how participants initiate their exploration of the collection. From Table 1 it can be seen that all 20 participants started their exploration using the category browser, rather than a text search. Indeed, the first action for the majority of users (75%) was to select a category. Qualitative data from Section 5.3 backs this up, with 8 out of the 12 participants for whom text transcripts are available explicitly mentioning the category browser as a way of starting their exploration. Looking at Table 2, it can be seen that there is typically a short delay until participants started their browsing (median 38 seconds, interquartile range of 59). This delay is consistent with participants' comments, which suggested that many first spent some time orienting themselves to the interface before starting (e.g. P10 from Section 5.3).

Moving to RQ2 and RQ3, which asked whether participants have a preference for browse or search and why, it is clear from Figure 2 and Figure 3 that there is a general preference for browsing: e.g. from Figure 3, the median estimated time spent browsing using the categories was 524 seconds (IQR 399), compared to 77 seconds (IQR 394) for text queries. Looking at the participant comments, the lack of any explicit task would appear to have played a part in this preference (e.g. the P1 and P5 quotes from Section 5.3). In addition, the design of the interface, with a relatively small text search box at the top, appeared to also play a part, with some users pointing out that they did not see the search box until later in their session (e.g. P2 and P4).

7. CONCLUSIONS AND FUTURE WORK
The preliminary results reported here suggest that providing browse functionality for cultural heritage collections is important for users arriving without a specific information need, as may be typical in casual search. For the majority of users, this preference for category browsing held for the session as a whole, with all but 5 users spending more time browsing than keyword searching. Initial analysis of the qualitative interview data backs up the quantitative interface results, with the majority of the currently analysed user transcripts explicitly mentioning the category browser. The results presented here are preliminary. Future work will expand on the analysis presented here, covering both the qualitative and the quantitative results. However, these initial results provide evidence of the importance of providing browse functionality for cultural heritage collections, and Europeana in particular.

Acknowledgements: This work was supported by the EU projects PROMISE (no. 258191) and PATHS (no. 270082).

8. REFERENCES
[1] Gäde, M., Ferro, N., and Lestari Paramita, M. 2011. CHiC 2011 – Cultural Heritage in CLEF: From Use Cases to Evaluation in Practice for Multilingual Information Access to Cultural Heritage. In Petras, V., Forner, P., and Clough, P., editors, CLEF 2011 Labs and Workshops, Italy.
[2] Hall, M. and Toms, E. 2013. Building a Common Framework for IIR Evaluation. In Information Access Evaluation meets Multilinguality, Multimodality, and Visualization, 4th International Conference of the CLEF Initiative.
[3] Marty, P. F., Rayward, W. B. and Twidale, M. B. 2003. Museum informatics. Ann. Rev. Info. Sci. Tech., 37, 259–294.
[4] Booth, B. 1998. Understanding the Information Needs of Visitors to Museums. Museum Management and Curatorship, 17(2).
[5] White, L., Gilliland-Swetland, A., and Chandler, R. 2004. We're Building It, Will They Use It? The MOAC II Evaluation Project. In Museums and the Web (MW2004). http://www.museumsandtheweb.com/mw2004/papers/g-swetland/g-swetland.html
[6] Amin, A., van Ossenbruggen, J., Hardman, L. and van Nispen, A. 2008. Understanding cultural heritage experts' information seeking needs. In Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '08). ACM, New York, NY, USA, 39-47.
[7] Skov, M. and Ingwersen, P. 2008. Exploring information seeking behaviour in a digital museum context. In Proceedings of the Second International Symposium on Information Interaction in Context (IIiX '08). ACM, New York, NY, USA, 110-115.
[8] Black, G. 2005. The Engaging Museum. London: Routledge.
[9] Treinen, H. 1993. What does the visitor want from a museum? Mass media aspects of museology. In S. Bicknell and G. Farmelo (Eds.), Museum Visitor Studies in the 90s. London: Science Museum, 86-93.
[10] Wilson, M. L. and Elsweiler, D. 2010. Casual-leisure Searching: the Exploratory Search scenarios that break our current models. In 4th International Workshop on Human-Computer Interaction and Information Retrieval, Aug 22 2010, New Brunswick, NJ, 28-31.
[11] Darby, P. and Clough, P. 2013. Investigating the information-seeking behaviour of genealogists and family historians. Journal of Information Science, 39, 73-84.
[12] Butterworth, R. and Davis Perkins, V. 2006. Using the information seeking and retrieval framework to analyse non-professional information use. In Proceedings of the 1st International Conference on Information Interaction in Context (IIiX). ACM, New York, NY, USA, 162-168.
[13] Ingwersen, P. and Järvelin, K. 2005. The Turn: Integration of Information Seeking and Retrieval in Context. Dordrecht, The Netherlands: Springer.
[14] Fernando, S., Hall, M. M., Agirre, E., Soroa, A., Clough, P. and Stevenson, M. 2012. Comparing taxonomies for organising collections of documents. In Proceedings of COLING 2012: Technical Papers, 879-894.
Studying Extended Session Histories

Chaoyu Ye, Martin Porcheron, Max L. Wilson
Mixed Reality Lab, University of Nottingham, UK
psxcy1@nottingham.ac.uk, me@mporcheron.com, max.wilson@nottingham.ac.uk

ABSTRACT
While there is an increasing amount of interest in evaluating and supporting longer "search sessions", the majority of research has focused on analysing large volumes of logs and dividing sessions according to obvious gaps between entries. Although such approaches have produced interesting insights into some different types of longer sessions, this paper describes the early results of an investigation into sessions as experienced by the searcher. During interviews, participants reviewed their own search histories, presented their views of "sessions", and discussed their actual sessions. We present preliminary findings around a) how users understand sessions, b) how these sessions are characterised, and c) how sessions relate to each other temporally.

Categories and Subject Descriptors
H5.2 [Information interfaces and presentation]: User Interfaces – Graphical user interfaces.

Keywords
HCIR, Interactive, Information Retrieval, Sessions

1. INTRODUCTION
Information Retrieval (IR) specialists are becoming increasingly concerned with users who continue to search beyond a few queries or a few minutes (the recent NII Shonan event and the forthcoming Dagstuhl seminar are both, for example, focused on this topic). Although Information Retrieval, and even Interactive IR, evaluations are well known, research is recognising situations where people continue to search after finding seemingly useful results [13]. Some might be in a larger session involving several related subtopics, while others may continue to search for entertaining videos until they struggle to find "good" results [3, 1]. Consequently, researchers are interested in how to evaluate, measure, and ultimately better support searchers who continue to search for extended sessions.

Most research into extended search sessions, described in detail below, has focused on analysing search engine logs [1, 4, 8] by dividing the logs using obvious periods of inactivity and either qualitatively [1] or quantitatively [4, 8] characterising them. Some research has investigated human web behaviour and user goals qualitatively through interviews; our research, however, has focused on using such methods to better understand real extended search sessions. This paper begins by summarising the literature on sessions, and then describes our research methods and preliminary findings about extended search sessions.

2. UNDERSTANDING "SESSIONS"
Although investigations into web sessions date back around 20 years (e.g. [2]), the concept of a session still lacks a clear definition. A number of researchers have generated diverse definitions of a session using different delimiters, such as cutoff time, query context, or even the status of the browser windows (e.g. [7]). In 1995, Catledge and Pitkow used a "timeout", the time between two adjacent activities, to divide users' web activities into sessions, and found that a 25.5 minute timeout was best [2]. Their research, however, was focused on general web activity rather than search sessions, but their 25.5 minute timeout has been used by many others. He and Göker later aimed to find the optimal interval that would divide large sessions whilst not affecting smaller sessions [4]. Their analysis found that optimal timeout values vary between 10 and 15 minutes.

In 2006, Spink et al. [11] defined a session as the entire series of queries submitted by a user during one interaction with a search engine, where one session may consist of a single topic or multiple topics. Their approach focused on topic changes rather than temporal breaks, yet it is perhaps unclear how they determined "one interaction" with a search engine.

A clear definition has also been cited as an important challenge in other research. While focusing on "revisitation" behaviour, Jhaveri and Räihä [6] and Tauscher and Greenberg [12] found it challenging to differentiate between in-session revisitation and post-session revisitation, for which a clear detection of session boundaries would be useful.

When focusing on searching rather than web sessions, some use the concept of a "query session". Nettleton et al. defined a query session as at least one query made to a search engine, together with the results which were clicked on, and other user behaviours as well [8]. They also evaluated "session quality" based on the number of clicks, hold time, and ranking of selected documents, and used these measures to help determine the differences between sessions.

To summarise the different approaches used to define sessions, Jansen et al. provided a summary of the three most representative strategies [5], as shown in Table 1. As IP addresses and cookies were utilised to identify a user, the most frequent strategies involve temporal cutoffs and topic change.

Table 1: Session Dividing Strategies; Jansen et al. [5]

  Approach   Session Constraints
  1          IP, cookie
  2          IP, cookie, and temporal cutoff
  3          IP, cookie, and content change

The methods summarised in Table 1 are primarily focused on temporal and topical boundaries, but other research has shown clear challenges to these strategies. Mackay and Watters, in 2008, examined tasks that frequently occur as multi-session tasks, where something thematically consistent occurs over multiple sessions [7]. Moreover, research into the web, browsers, and browser tabs has found that some users often keep web pages spread out over time, especially in information gathering tasks, e.g. [10]. These situations indicate that the logged web behaviour may differ significantly from the actual behaviours and intentions of the searchers. This research focuses on the searcher's experience of web sessions, such that others may continue to develop strategies for more accurately dividing large scale logs into sessions.
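The timeout-based strategies above reduce to a simple segmentation rule: close a session whenever the gap between two adjacent log entries exceeds a cutoff, e.g. Catledge and Pitkow's 25.5 minutes [2]. A minimal sketch in Python — our illustration, not code from any of the cited studies, using hypothetical timestamps:

```python
# Sketch only: timeout-based session segmentation of one user's activity log.
def split_sessions(timestamps, timeout_secs=25.5 * 60):
    """timestamps: sorted times (seconds) of logged activities -> list of sessions."""
    sessions = []
    current = []
    for t in timestamps:
        if current and t - current[-1] > timeout_secs:
            sessions.append(current)  # gap exceeds the cutoff: close the session
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions

# Hypothetical log: two activities 30 minutes apart fall into separate sessions.
log = [0, 60, 120, 120 + 30 * 60]
print(len(split_sessions(log)))  # 2
```

Changing `timeout_secs` to the 10-15 minute range of He and Göker [4] only alters where the boundaries fall; the topical and multi-session challenges discussed above are exactly what such a rule cannot capture.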
Moreover, research into web, browser, In addition, the reasons for leading to non-success and dif- and browser-tabs, has found that some users often keep web ficulty can be investigated via the card sorting of difficulty, pages spread out over time, especially in the information and the di↵erence of user’s web behaviour in di↵erent envi- gathering tasks, e.g. [10]. These situations indicate that ronments can also be examined by the sorting of location. the logged web behaviour may di↵er significantly from the The entire interview was audio recorded, and physical copies actual behaviours and intentions of the searchers. This re- of the card sorts were kept for analysis. search focuses on the searcher’s experience of web sessions, This paper describes our preliminary analysis of the first such that others may continue to develop strategies for more phase of the study, which involved 11 interviews. Phase two, accurately dividing large scale logs into sessions. which is still under way, involves a slightly refined methodol- ogy to capture more information about topics that emerged from the initial analysis described below. A more compre- 3. EXPERIMENT DESIGN hensive analysis of both phases will be published later. To understand and characterise real extended search ses- sions, we employed similar interview methods to Sellen et 4. PRELIMINARY FINDINGS al. [10]. Participants were engaged in a 90-120 minute inter- Based on our preliminary investigation, some potentially view about their own search behaviour. To ground the inter- interesting results relating to perceived duration, time of views in real data, participants focused on printouts of their day, and use of queries were found. We considered each of own web history, and we used the card sorting technique [9] these below according to two aspects: activity goal and ac- to probe their mental models of sessions. The procedure was tivity context. 
For activity goal, we used Sellen et al’s [10] approved by the school ethics board and pilot tested. 6 categories: ‘finding’, ‘information gathering’, ‘browsing’, Participants began by providing their web history and ‘transaction’, ‘communication’, and ‘housekeeping’. This they were advised to edit their history in advance should approach did not include any email, so this was added as a they wish to keep some logged activities private2 . These logs 7th category. For activity context, we applied Elseweiler et were gathered by importing their search histories to Firefox al’s [3] comparison between work and non-work (leisure) ac- (if not already there), and creating an XML export using tivities, involving: ‘work’, ‘serious-leisure’, ‘project-leisure’, “History Export 0.4”3 . This log was then structured and and ‘casual-leisure’. At this early stage in the project, the preliminarily processed using a) automatic methods to find primary author performed the classification individually based search URLs, and b) manual investigation to find possible on corresponding examples given in the referenced work. sessions to discuss in the interview. After providing demo- graphic information, participants spent around 20 minutes 4.1 Defining Sessions examining the structured printout of their history, using a There were 216 sessions in total and 19.6 sessions per pen to mark sessions. These sessions, unless duplicates of person have been studied thus far, as shown as Table 2. prior sessions, were written onto separate cards for later sort- Amongst these, 94 were longer than 5 minutes, 99 featured ing until around 20 cards were produced. Each card had search and only 9 sessions were unsuccessful. a number, a title, activity purpose, included history items from the history list and also whether it has been completed Table 2: All Session Information successfully or not; an example is shown in Figure 1. 
Parti- Session Long Ses- Unsuccess Search Ses- Query The remainder of the interview involved first open, and cipant No. Sion No. Session No. Sion No. No. then closed card sorting. Open card sorting allowed the 1 18 9 1 13 45 2 30 14 0 11 34 participants to classify and group the sessions according to 3 20 12 1 12 101 4 20 8 1 9 22 their own ideas, whilst closed card sorting allowed us to 5 16 10 0 6 17 make sure the following dimensions were considered: pur- 6 7 26 30 6 5 0 1 16 0 27 0 pose, for whom, with whom, location, duration, difficulty, 8 17 7 1 12 74 9 10 6 0 6 18 importance, frequency, and priority. This exercise was to 10 10 8 4 4 57 11 19 9 0 10 23 help explore the session feature in a more detailed way. For Total 216 94 9 99 418 example, studying frequency helps to find out the most fre- Avg. 19.6 8.5 0.8 9 38 quent sessions and elicit the pattern of user’s web activity. All participants mentioned that activities with the same 2 Although this means we have likely missed common search purpose and subject should be grouped into one session, as sessions, like the lengthy adult sessions observed by Bailey shown in Table 3. In addition, 8 of the 11 suggested that et al [1], it was considered an important ethical provision. 
similar tasks happened in di↵erent time periods should be 3 addons.mozilla.org/en-us/firefox/addon/history-export/ classified as a single session, rather than them being tem- Table 3: Session Delimiters Summary Table 5: Duration Categories Parti- Type of Differ time-> Group Detail Topic Emotion cipant Source Differ Session Sessions defined as Long Defined Long 1 + + - - by Participant 2 + - + - Session whose actual duration 3 + - - - Long is >= 5 mins 4 + - - - Session defined as Long and its 5 + - - - Actual Long actual duration is >= 5 mins 6 + - + - Session defined as Long but its 7 + - - + Over-estimated actual duration is less than 5 mins 8 + - + - Session defined as Short 9 + - - - Defined Short by participant 10 + - - - Session whose actual duration 11 + - - - Short is less than 5 mins Session defined as short and its Actual Short actual duration is less than 5 mins Session defined as Short but its Under-estimated actual duration is >= 5 mins porally connected. Some participants said that they always kept the browser windows open when doing long-term tasks. Table 6: Duration, by Acitivity Goal Finally, 1 participant advised that they care about the emo- tion involved within these web activities, even when they Defined Long Over-esti Defined Short Under-esti Finding 24 17 (70.8%) 36 3 (8.3%) were doing the same task, such as “buying a pair shoes”. In Info-gathering 35 15 (42.9%) 7 4 (57.1%) Browsing 28 17 (60.7%) 5 0 particular, this participant indicated that one topically con- Transaction 4 2 (50.0%) 5 2 (40.0%) sistent session should be divided between two disappoint- Communication Housekeeping 9 0 3 (33.3%) 0 5 1 0 1 (100.0%) ingly unproductive and excitingly productive phases. Email 7 6 (85.7%) 7 0 Firstly, considering activity goals given in Table 6, the number of ‘information-gathering’ sessions defined as long was 5 times as that of those ‘defined short’, as was the same with ‘browsing’. 
On the contrary, the number of ‘finding’ sessions defined as short was 1.5 times the number defined as long. Overall, nearly 70% of ‘finding’, 42% of ‘information- gathering’, 60.7% of ‘browsing’, 50% of ‘transaction’, and (a) Acitivty Goal (b) Activity Context 85.5% of ‘email’ sessions defined as long were overestimated by users. Moreover, under-estimation occurred with ‘find- Figure 2: Session Categories ing’, ‘information-gathering’, and ‘housekeeping’ although Finally, besides the pre-defined dimensions, participants over-estimation was more frequent with ‘finding’, ‘browsing’, also came up with some unique sorting dimensions as shown ‘communication’, and ‘email’ sessions. in Table 4, and these may benefit in exploring the session’s delimiters and features in new perspectives. Table 7: Duration, by Activity Context Defined Long Over-est. Defined Short Under-est. Table 4: Unique Dimensions Work 38 22 (57.9%) 31 2 (6.5%) Serious-Leisure 8 2 (25%) 1 0 Project-Leisure 22 15 (68.2%) 23 5 (21.7%) Unique Dimensions Casual-Leisure 39 21 (53.8%) 11 3 (27.2%) Google it or Go to Website directly Content contributor National Certain topic or not University related or not Based on old knowledge or brand new Table 7 above shows that the number of ‘casual-leisure’ Amusement Preference sessions defined as long was as 3 times as that those ‘defined Result Satisfaction Eyes Ears Needed Security short’ and that 57.9% of ‘work’, 68.2% of ‘project-leisure’, and 53.8% of ‘casual-leisure’ sessions defined as long were over-estimated by users with lower levels of under-estimation 4.2 Duration occurring. This encouraged a further study on the feature As duration is one of the targeted dimensions, all par- of each kind of web activity to determine the main cause for ticipants were asked for their own definition of what con- an incorrectly perceived length. stitutes a “long session”. 
45% of participants defined the session where the duration is more than 5 minutes, whereas 4.3 Time of Day 27% went with over 30 minutes, 18% more than 1 hour, and Figure 3 shows that most the ‘information-gathering’, ‘find- 1 participant chose over 2 hours. ing’ and ‘housekeeping’ sessions seem to occur between 10:00 Because participants first defined what they considered and 16:00 whilst more ‘browsing’, ‘email’, and ‘communica- to be a long session, and then later sorted their sessions tion’ activities were done between 22:00 and 0:00, which into length categories, we investigated the di↵erence be- was labelled “before bed time”. Additionally, there is a tween sessions that met their definition of long, and ones peak around 14:00, in which more ‘finding’ and ‘information- they remembered as being long during the card sorts. Par- gathering’ happened rather than other kinds of sessions. Fi- ticipants frequently grouped ‘defined short’ sessions as long nally, at 23:00, general ‘browsing’ is most prevalent. and vice-versa. Consequently, we investigated both ‘overes- Figure 4 shows that most of the ‘serious-leisure’ sessions timated’ and ‘under-estimated’ sessions in addition to ‘de- occurred between 18:00 and 22:00. Most of the ‘work’ ac- fined long’, ‘long’, ‘actual long’, ‘defined short‘, ‘short’, and tivities happened between 11:00 and 18:00, which seems to ‘actual short’ as given in Table 5. fit in within a typical working day. In the time ‘before bed’, minutes, but many had inaccurate recollections of the length of sessions. Long sessions were typically a mix of casual and serious leisure that often involved information gathering and browsing behaviour, while the majority of work related ses- sions were typically short. We also noticed that some of these activities may also be related to certain times of the day. 
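The time-of-day comparisons behind Figures 3 and 4 amount to counting sessions per start hour and category. A minimal sketch (our illustration, with hypothetical session records rather than the study's data):

```python
# Sketch only: count sessions per (start hour, category), as tabulated
# for the time-of-day figures. The session records below are hypothetical.
from collections import Counter
from datetime import datetime

def sessions_per_hour(sessions):
    """sessions: (start_datetime, category) pairs -> Counter keyed by (hour, category)."""
    return Counter((start.hour, category) for start, category in sessions)

sessions = [
    (datetime(2013, 5, 1, 14, 5), "finding"),
    (datetime(2013, 5, 1, 14, 40), "information-gathering"),
    (datetime(2013, 5, 1, 23, 10), "browsing"),
]
counts = sessions_per_hour(sessions)
print(counts[(14, "finding")])  # 1
```

Summing such counts per hour across categories gives the per-hour totals plotted in the figures.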
All of the findings will be further explored after phase two of the study, but early insights suggest that real ex- tended search sessions could be more accurately modelled based on additional factors such as: time of day, activity goal, activity context, and number of queries. Figure 3: Time of Day, by Activity Goal 6. REFERENCES the most frequent activity is ‘casual-leisure’. [1] P. Bailey, L. Chen, S. Grosenick, L. Jiang, Y. Li, P. Reinholdtsen, C. Salada, H. Wang, and S. Wong. User task understanding: a web search engine perspective. In NII Shonan Meeting on Whole-Session Evaluation of Interactive Information Retrieval Systems, Kanagawa, Japan, October 2012. [2] L. D. Catledge and J. E. Pitkow. Characterizing browsing strategies in the World-Wide web. Computer Figure 4: Time of Day, by Activity Context Networks and ISDN Systems, 27(6):1065–1073, 1995. Combined with the two comparisons above, there seems to [3] D. Elsweiler, M. L. Wilson, and B. K. Lunn. be some overlap between ‘information-gathering’, ‘finding’, Understanding casual-leisure information behaviour. ‘housekeeping’ and ‘work’. There was also some overlap be- In A. Spink and J. Heinstrom, editors, Library and tween ‘browsing’ and ‘casual-leisure’. Furthermore, these Information Science, pages 211–241. Emerald Group tend to suggest that there may be some patterns for user’s Publishing Limited, 2011. web activity in their daily life. [4] D. He and A. Göker. Detecting session boundaries 4.4 Search Queries from Web user logs. Methodology, pages 57–66, 2000. In Figure 5 below, sessions with more search queries tend [5] B. J. Jansen, A. Spink, C. Blakely, and S. Koshman. to be classified as ‘defined long’, ‘long’, and ‘actual long’ Defining a session on Web search engines. JASIST, than those with fewer queries. An interesting observation is 58(6):862–871, 2007. that what the user defined as a long session features a rela- [6] N. Jhaveri and K.-J. Räihä. 
The advantages of a tively low average number of search queries compared with cross-session web workspace. In CHI2005 Ext. ‘long’ and ‘actual long’ sessions. Equally, sessions defined as Abstracts, page 1949. ACM Press, 2005. ‘short’ by the user actually feature relatively more queries [7] B. Mackay and C. Watters. Exploring Multi-session compared to ‘short’ and ‘actual short’. This may indicate Web Tasks. Time, pages 1187–1196, 2008. that the user did not consider the number of queries per- [8] D. Nettleton, L. Calderon-benavides, and formed when defining the duration of sessions and failed to R. Baeza-yates. Baezayates, analysis of web search realise the e↵ect of this behaviour. engine query sessions. In Proc. WebKDD 2006, 2006. [9] G. Rugg and P. McGeorge. The sorting techniques: a tutorial paper on card sorts, picture sorts and item sorts. Expert Systems, 14(2):80–93, 1997. [10] A. J. Sellen, R. Murphy, and K. L. Shaw. How knowledge workers use the web. In Proc. CHI2002, pages 227–234. ACM Press. [11] A. Spink, M. Park, B. J. Jansen, and J. Pedersen. Multitasking during Web search sessions. IP&M, 42(1):264–275, 2006. [12] L. Tauscher and S. Greenberg. How people revisit web pages: empirical findings and implications for the Figure 5: Average Number of Search Queries design of history systems. IJHCS, 47(1):97–137, 1997. [13] E. G. Toms, R. Villa, and L. McCay-Peet. How is a search system used in work task completion? Journal 5. CONCLUSIONS of Information Science, 39(1):15–25, 2013. Although this paper only describes a preliminary analysis of over 200 sessions from 11 participants, we have begun to see some potentially interesting early findings. Initially, par- ticipants varied greatly in their opinions about their own ses- sions, with some matching topical divisions, some temporal divisions, and some a combination of the two. 
Comparative Study of Search Engine Result Visualisation: Ranked Lists Versus Graphs

Casper Petersen, Christina Lioma, Jakob Grue Simonsen
Dept. of Computer Science, University of Copenhagen
cazz@diku.dk, c.lioma@diku.dk, simonsen@diku.dk

Presented at EuroHCIR2013. Copyright © 2013 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. SIGIR 2013, Dublin, Ireland.

ABSTRACT

Typically, search engine results (SERs) are presented in a ranked list of decreasing estimated relevance to user queries. While familiar to users, ranked lists do not show inherent connections between SERs, e.g. whether SERs are hyperlinked or authored by the same source. Such potentially useful connections between SERs can be displayed as graphs. We present a preliminary comparative study of ranked lists vs graph visualisations of SERs. Experiments with TREC web search data and a small user study of 10 participants show that ranked lists result in more precise and also faster search sessions than graph visualisations.

Categories and Subject Descriptors

H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

Keywords

Search Engine Result Visualization, Ranked List, Graph

1. INTRODUCTION

Typically, search engine results (SERs) are presented in a ranked list of decreasing estimated relevance to user queries. Drawbacks of ranked lists include showing only a limited view of the information space, and not showing how similar the retrieved documents are and/or how the retrieved documents relate to each other [4, 6]. Such potentially useful information could be displayed to users in the form of SER graphs; these could present at a glance an overview of clusters or isolated documents among the SERs, features not typically integrated into ranked lists. For instance, directed/undirected and weighted/unweighted graphs could be used to display the direction, causality and strength of various relations among SERs. Various graph properties (see [7]), such as the average path length, clustering coefficient or degree, could also be displayed, reflecting potentially useful or interesting features about how the retrieved data is connected.

We present a user study comparing ranked list vs graph-based SER visualisation interfaces. We use a web crawl of ca. 50 million documents in English with associated hyperlink information, and 10 participants. We find that ranked lists result in overall more accurate and faster searches than graph displays, but that the latter result in slightly higher recall. We also find overall higher inter-rater agreement about SER relevance when using ranked lists instead of graphs.

2. MOTIVATION

While traditional IR systems successfully support known-item search [5], what should users do if they want to locate something from a domain where they have a general interest but no specific knowledge [8]? Such exploratory searching comprises a mixture of serendipity, learning, and investigation, and is not supported by contemporary IR systems [5], prompting users to "develop coping strategies which involve [...] the submission of multiple queries and the interactive exploration of the retrieved document space, selectively following links and passively obtaining cues about where their next steps lie" [9]. A step towards exploratory search, which motivates this work, is to make explicit the hyperlinked structure of the ordered list used by e.g. Google and Yahoo. To our knowledge, no investigation of such a representation exists, but it is comparable to Google's Knowledge Graph, whose aim is to guide users to other relevant information from an initial selection.

3. PREVIOUS WORK

Earlier work on graph-based SER displays includes Beale et al.'s (1997) visualisation of sequences of queries and their respective SERs, as well as the work of Shneiderman & Aris (2006) on modelling semantic search aspects as networks (both overviewed in [10]). Treharne et al. (2009) present a critique of ranked list displays side by side with a range of other types of visualisation, including not only graphs, but also cartesian, categorical, spring and set-based displays [6]. This comparison is analytical rather than empirical. Closest to ours is the work of Donaldson et al. (2008), who experimentally compare ranked lists to graph-based displays [2]. In their work, graphs model social web information, such as user tags and ratings, in order to facilitate contextualising social media for exploratory web search. They find that users seem to prefer a hybrid interface that combines ranked lists with graph displays. Finally, the hyperlinked graph representation discussed in this paper allows users to investigate the result space, thereby discovering related and potentially relevant information that might otherwise be bypassed. To our knowledge, such a representation and its comparison to a traditional ranked list do not exist, but the idea underpinning the graph representation is comparable with Google's Knowledge Graph, as the aim is to guide users to other relevant information from an initial selection.
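The hyperlink-graph idea motivated above can be made concrete with a short sketch. The following is our own illustration, not the authors' code; `topk_docids` and `outlinks` are hypothetical inputs (per-document outlink lists such as those available in a web crawl):

```python
def build_ser_graph(topk_docids, outlinks):
    """Directed graph over the top-k results: an edge (u, v) exists when
    document u hyperlinks to document v and both are in the top k.
    `outlinks` maps docid -> iterable of linked docids (hypothetical input).
    """
    nodes = set(topk_docids)
    # Keep only hyperlinks whose target is itself a top-k result.
    edges = [(u, v) for u in topk_docids
             for v in outlinks.get(u, ()) if v in nodes and v != u]
    # Out-degree per vertex; in the display this could drive vertex size.
    out_degree = {u: 0 for u in nodes}
    for u, _v in edges:
        out_degree[u] += 1
    return edges, out_degree
```

A force-directed layout library could then render the returned edge list, with edge direction pointing towards the outlinked document.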
4. INTERFACE DESIGN

This section presents the two different SER visualisations used in our study. Our goal is to study the effect of displaying exactly the same information to the user in two different ways, using ranked list and graph visualisations, respectively.

Figure 1: Ranked list (A) and graph (B) representation of the top-k documents from a query.

4.1 Ranked List (RL) Display

We use a standard ranked list SER display, where documents are presented in decreasing order of their estimated relevance to the user query. The list initially displays only the top-k retrieved document ids (docids) with their associated rank (see Figure 1 (A)). When clicked upon, each document expands to two mini windows, overlaid to the left and right of the list:

- The left window shows a document snippet containing the query terms. The snippet provides a brief summary of the document contents that relate to the query, in order to aid the user in assessing document relevance prior to viewing the whole document [4]. We describe exactly what the snippet shows and how it is extracted in Section 4.3.
- The right window shows a graph of the top-k ranked SERs (see Section 4.2). The position of the clicked document in the graph is clearly indicated, so users can quickly overview its connections, if any, to other top-k retrieved documents.

Previously visited documents in the list are colour-marked.

4.2 Graph (GR) Display

We display a SER graph G = (V, E) as a directed graph whose vertices v ∈ V correspond to the top-k retrieved documents, and whose edges e ∈ E correspond to links (hyperlinks, in our case of web documents) between two vertices. Each vertex is shown as a shaded circle that displays the rank of its associated document in the middle, see Figure 1 (B). The size of each vertex is scaled according to its out-degree, so that a larger vertex indicates more outlinks to the other top-k documents. Edge direction points towards the outlinked document. Previously visited documents are colour-marked.

When clicked upon, each vertex expands to two mini windows, overlaid to the left and right of the graph:

- The left window shows the same document snippet as in the RL display.
- The right window shows the ranked list of the top-k SERs. The position of the clicked document in the list is clearly marked.

We display the SER graph in a standard force-directed layout [1]. Our graph layout does not allow for other types of interaction with the graph apart from clicking on it. We reason that for the simple web search tasks we consider, layouts allowing further interaction may be confusing or time-consuming, and that they may be better suited to other search tasks, involving for instance decision making, navigation and exploration of large information spaces.

4.3 Document Snippets

Both the RL and GR interfaces include short query-based summaries of the top-k SERs (snippets). We construct them as follows: we extract from each document a window of ±25 terms surrounding the query terms on either side. Let a query consist of 3 terms q1, q2, q3. We extract snippets for all ordered but not necessarily contiguous sequences of query terms: (q1, q2, q3), (q1, q2), (q1, q3), (q2, q3), (q1), (q2), (q3). This way, we match all snippets containing query terms in the order they appear in the query (not as a bag of words), but we also allow other terms to occur in between query terms, for instance common modifiers.

Several snippets can be extracted per document, but only the snippet with the highest TF-IDF score is displayed to the user. The TF-IDF score of each window is calculated as a normalised sum of the TF-IDF weights of its terms:

  Ss(D) = (1/|w|) · Σ_{t ∈ w} tf(t, D) × log( |C| / |{D ∈ C : t ∈ D}| )

where |w| is the number of terms in the extracted window, t ∈ w is a term in the window, tf(t, D) is the term frequency of t in the document D from which the snippet is extracted, C is the collection of documents, and Ss(D) is the snippet score for document D. Finally, as research has shown that query term highlighting can be a useful feature for search interfaces [4], we highlight all occurrences of query terms in the snippet.
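The snippet scoring just described can be sketched in a few lines. This is an illustrative re-implementation of the normalised TF-IDF formula, not the authors' code; `doc_freq` (document frequencies) and the pre-extracted candidate `windows` are hypothetical inputs:

```python
import math
from collections import Counter

def snippet_score(window, doc_terms, doc_freq, num_docs):
    """Normalised TF-IDF score of one candidate snippet window,
    following Ss(D) in Section 4.3 (illustrative sketch)."""
    tf = Counter(doc_terms)  # term frequencies in the source document
    total = 0.0
    for t in window:
        df = doc_freq.get(t, 0)
        if df:  # terms unseen in the collection contribute nothing
            total += tf[t] * math.log(num_docs / df)
    return total / len(window)

def best_snippet(windows, doc_terms, doc_freq, num_docs):
    """Pick the highest-scoring window; only this snippet is displayed."""
    return max(windows,
               key=lambda w: snippet_score(w, doc_terms, doc_freq, num_docs))
```

As in the paper, a window dominated by common terms (low IDF) loses to one containing rarer query terms.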
5. EVALUATION

We recruited 2 participants for a pilot study to calibrate the user interfaces; the results from the pilot study were not subsequently used. For the main study, we recruited 10 new participants (9 males, 1 female; average age: 33.05; all with a background in Computer Science) using convenience sampling. Each participant was introduced to the two interfaces. Their task was to find and mark as many relevant documents as possible per query using either interface. For each new query, the SERs could be shown in either interface. Each experiment lasted 30 minutes.

Participants did not submit their own queries. The queries were taken from the TREC Web tracks of 2009-2012 (200 queries in total). This choice allowed us to provide very fast response times to participants (< two seconds, depending on disk speed), because search results and their associated graphs were pre-computed and cached. Alternatively, running new queries and plotting their SER graphs on the fly would result in notably slower response times that would risk dissatisfying participants. However, a drawback of using TREC queries is that participants did not necessarily have enough context to fully understand the underlying information needs and correctly assess document relevance. To counter this, we allowed participants to skip queries they were not comfortable with. To avoid bias, skipping a query was allowed after the query terms were displayed, but before the SERs were displayed.

We retrieved documents from the TREC ClueWeb09 cat. B dataset (ca. 50 million documents crawled from the web in 2009), using Indri, version 5.2. The experiments were carried out on a 14 inch monitor with a resolution of 1400 x 1050 pixels. We logged which SERs participants marked relevant, as well as the participants' click order and time spent per SER.

5.1 Findings

In total, the 10 participants processed 162 queries (89 queries with the RL interface and 73 with the GR interface), with mean µ = 16.2 and standard deviation σ = 7.8. Four queries (two from each interface) were bypassed (2.5% of all processed queries).

Table 1 shows retrieval effectiveness per interface, aggregated over all queries for the top k = 20 SERs.

Table 1: Mean Average Precision (MAP), Mean Reciprocal Rank (MRR) & Recall of the top 20 results.

            Ranked List   Graph
  MAP@20    0.4195        0.3211
  MRR       0.4698        0.3948
  RECALL@20 0.0067        0.0069

The ranked list is associated with higher, hence better, scores than the graph display for MAP and MRR. MAP is +30.6% better with ranked lists than with graph displays, meaning that overall a higher number of relevant SERs is found by the participants at higher ranks in the ranked list as opposed to the graph display. This finding is in agreement with the MRR scores, which indicate that the first SER to be assessed relevant is likely to occur around rank position 2.13 (1/2.13 = 0.469 ≈ 0.4698) with ranked lists, but around rank position 2.55 (1/2.55 = 0.392 ≈ 0.3948) with graph displays. Conversely, recall is slightly higher with graph displays. In general, higher recall in this case would indicate that participants are more likely to find a slightly larger number of relevant documents when seeing them as a graph of their hyperlinks. However, the difference in recall between ranked lists and graphs is very small and can hardly be seen as a reliable indication.

5.1.1 Click-order

On average, participants clicked on 9.46 entries per query in the ranked list (842 clicks for 89 queries), but only on 6.7 entries per query in the graph display (490 clicks for 73 queries). The lower number of clicks in the latter case could be due to the extra time it might have taken participants to understand or navigate the graph. This lower number of clicks also agrees with the lower MAP scores presented above (if fewer entries were clicked, fewer SERs were assessed, hence fewer relevant documents were found in the top ranks).

Figures 2a and 2b plot the order of clicks for the ranked list and graph interfaces respectively on the x-axis, against the frequency of clicks on the y-axis. We see that in the ranked list, the first click of the participant is more often on a relevant document, but in the graph display, the first click is more often on a non-relevant document (as already indicated by the MRR scores shown above). We also see that for the graph display, the majority of participant clicks before the 5th click correspond to non-relevant documents. Even though the MRR scores of the graph display indicate that the first relevant document occurs around rank position 2.5, we see that participants on average click four other documents before clicking the relevant document at rank position 2.5. This indicates that in the graph display, participants click documents not necessarily according to their rank position (indicated in the centre of each vertex), but rather according to their graph layout or connectivity.

Figure 2: Click-order and participant relevance assessments for the (a) ranked list interface and (b) graph interface.

5.1.2 Time spent

Table 2 shows statistics about the time participants spent on each interface.

Table 2: Time (seconds) spent on each interface.

  Interface     Min     Max      µ       σ
  Ranked List   1.391   25.476   8.228   4.371
  Graph         3.322   20.963   9.705   3.699

Overall, participants spent less time on the ranked list than on the graph display. This observation, combined with the retrieval effectiveness measures shown in Table 1, indicates that participants conducted overall slightly more precise and faster searches using ranked lists than using graph displays. The time use also suggests that participants are used to standard ranked list interfaces, a type of conditioning not easy to control experimentally.

5.1.3 Inter-participant agreement

To investigate how consistent participants were in their assessments, we report the inter-rater agreement using Krippendorff's α [3]. Table 3 reports the agreement between participants, and Table 4 reports the agreement between participants and the TREC preannotated relevance assessments, per interface. In both cases, only queries annotated more than once by different participants are included (19 queries for the ranked list and 11 for the graph SER).

The average inter-rater agreements between participants vary considerably. For the graph interface, α = 0.04471, which suggests a lack of agreement between raters. On a query basis, some queries (queries 169 and 44) show a comparatively much higher agreement, whereas others (e.g. queries 104 and 184) show a comparatively higher level of disagreement. For the ranked list, inter-rater agreement is higher (α = 0.19813). On a per-query basis, quite remarkably, query 92 had perfect agreement between raters, while queries 175 and 129 also exhibited a moderate to high level of agreement. However, most queries show only a low to moderate level of agreement or disagreement.

Overall, the lack of agreement may indicate the participants' confusion in assessing the relevance of SERs to pre-typed queries.
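For reference, nominal-data Krippendorff's α as reported here can be computed with the standard coincidence-matrix formulation. This is our own minimal sketch, not the authors' code; each unit is one query's list of relevance labels, and units with fewer than two ratings are skipped, matching the paper's filtering:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Nominal-data Krippendorff's alpha (illustrative sketch).

    `units` is a list of rating lists, one per rated item; items with
    fewer than two ratings are ignored.
    """
    coincidence = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        # Each ordered within-unit pair contributes 1/(m-1).
        for a, b in permutations(range(m), 2):
            coincidence[(ratings[a], ratings[b])] += 1.0 / (m - 1)
    n_c = Counter()
    for (c, _k), w in coincidence.items():
        n_c[c] += w
    n = sum(n_c.values())
    observed = sum(w for (c, k), w in coincidence.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        return 1.0  # no category variation at all
    # alpha = 1 - D_o / D_e, with both sides reduced to this ratio.
    return 1.0 - (n - 1) * observed / expected
```

Negative values indicate systematic disagreement beyond chance, which is how the sub-zero entries in the per-query tables should be read.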
This may be aggravated by problems in rendering the HTML snippets into text. Some HTML documents were ill-formed, hence their snippets sometimes included HTML tags or other not always coherent text.

Inter-rater agreements between our participants and the TREC preannotated relevance assessments show an almost complete lack of agreement. For both interfaces there is a weak level of disagreement on average (α = −0.0750 and α = −0.0721 for the graph and ranked list, respectively). On a per-query basis there are only two queries (169 & 110) exhibiting a moderate level of agreement. For most remaining queries our participants' assessments disagree with the TREC assessments.

Table 3: Inter-rater agreement (α) for queries assessed by >1 participant. Query is the TREC id of each query.

  Graph                      Ranked List
  Query  Raters  α           Query  Raters  α
  101    4       0.28696     110    3       0.41000
  104    2      -0.21875     119    2       0.00000
  132    2      -0.16071     120    2       0.49351
  169    2       0.48000     129    2       0.86022
  180    2      -0.10031     132    3      -0.08949
  184    2      -0.25806     133    2       0.30108
  3      2       0.00000     155    2      -0.02632
  38     2      -0.07519     175    2       0.49351
  44     2       0.49351     180    2      -0.37879
  58     2       0.00000     51     2       0.00000
  -      -       -           53     2       0.02151
  -      -       -           74     2      -0.14706
  -      -       -           80     2       0.14420
  -      -       -           81     3      -0.12919
  -      -       -           92     2       1.00000
  -      -       -           95     2       0.15584
  -      -       -           96     2       0.15584
  -      -       -           97     2       0.30179
  Average α: 0.04471         Average α: 0.19813

Table 4: Inter-rater agreement (α) between participants and TREC assessments for queries assessed by >1 participant.

  Graph                      Ranked List
  Query  Raters  α           Query  Raters  α
  101    4       0.09559     110    3       0.38654
  104    2      -0.17861     119    2      -0.22370
  132    2       0.06561     120    2       0.03146
  169    2       0.33625     129    2       0.05600
  180    2      -0.08949     132    3       0.01689
  184    2      -0.08949     133    2       0.04398
  3      2      -0.37209     155    2      -0.21067
  38     2      -0.05006     165    2      -0.25532
  44     2      -0.05861     175    2      -0.07886
  54     2      -0.25532     180    2      -0.17861
  58     2      -0.22917     51     2      -0.05006
  -      -       -           53     2      -0.24694
  -      -       -           74     2      -0.06033
  -      -       -           80     2      -0.24694
  -      -       -           81     3      -0.13634
  -      -       -           92     2      -0.21181
  -      -       -           95     2       0.04582
  -      -       -           96     2      -0.12919
  -      -       -           97     2       0.07813
  Average α: -0.0750         Average α: -0.0721

6. CONCLUSIONS

In a small user study, we compared ranked list versus graph-based search engine result (SER) visualisation. Our motivation was to conduct a preliminary experimental comparison of the two for the domain of web search, where document hyperlinks were used to display SERs as graphs. We found that overall more accurate and faster searches were done using ranked lists, and that inter-user agreement was overall higher with ranked lists than with graph displays.

Limitations of this study include: (1) using fixed TREC queries, instead of allowing users to submit their own queries on the fly; (2) technical HTML-to-text rendering problems, resulting in sometimes incoherent document snippets; (3) using only 10 users, exclusively from Computer Science, which makes for an overall small and rather biased user sample; (4) not using the wider context of the search session in the analysis (e.g. user task, behaviour, satisfaction). Future work includes addressing the above limitations and also testing whether and to what extent these results apply when scaling up to wall-sized displays with significantly larger screen real estate.

7. REFERENCES

[1] G. D. Battista, P. Eades, R. Tamassia, and I. G. Tollis. Graph drawing: algorithms for the visualization of graphs. Prentice Hall PTR, 1998.
[2] J. J. Donaldson, M. Conover, B. Markines, H. Roinestad, and F. Menczer. Visualizing social links in exploratory search. In HT '08, pages 213-218, New York, NY, USA, 2008. ACM.
[3] A. F. Hayes and K. Krippendorff. Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1(1):77-89, 2007.
[4] M. Hearst. Search user interfaces. Cambridge University Press, 2009.
[5] G. Marchionini. Exploratory search: from finding to understanding. Communications of the ACM, 49(4):41-46, 2006.
[6] K. Treharne and D. M. W. Powers. Search engine result visualisation: Challenges and opportunities. In Information Visualisation, pages 633-638, 2009.
[7] S. Wasserman and K. Faust. Social network analysis: methods and applications. Structural analysis in the social sciences. Cambridge University Press, 1994.
[8] R. W. White, B. Kules, S. M. Drucker, and M. Schraefel. Supporting exploratory search. Communications of the ACM, 49(4):36-39, 2006.
[9] R. W. White, G. Muresan, and G. Marchionini. Workshop on evaluating exploratory search systems. SIGIR Forum, 40(2):52-60, 2006.
[10] M. L. Wilson, B. Kules, B. Shneiderman, et al. From keyword search to exploration: Designing future search interfaces for the web. Foundations and Trends in Web Science, 2(1):1-97, 2010.

Evolving Search User Interfaces

Tatiana Gossen, Marcus Nitsche, Andreas Nürnberger
Data & Knowledge Engineering Group, Faculty of Computer Science
Otto von Guericke University Magdeburg, Germany
http://www.dke.ovgu.de/

ABSTRACT

When designing search user interfaces (SUIs), there is a need to target specific user groups. The cognitive abilities, fine motor skills, emotional maturity and knowledge of a sixty-year-old man, a fourteen-year-old teenager and a seven-year-old child differ strongly. These abilities influence the decisions made in the user interface (UI) design process of SUIs. Therefore, SUIs are usually designed and optimized for a certain user group. However, especially for young and elderly users, the design requirements change rapidly due to fast changes in users' abilities, so that a flexible modification of the SUI is needed.
In this positional paper we introduce the sonalisation of a SUI in a limited way: Users can choose a colour concept of an evolving search user interface (ESUI). It adapts the scheme or change the settings of the browser to influence some pa- UI dynamically based on the derived capabilities of the user inter- rameters like font size. Some search engines also detect the type acting with it. We elaborate on user characteristics that change over of device the user is currently using – e.g. a desktop computer or a time and discuss how each of them can influence the SUI design us- mobile phone – and present an adequate UI. ing an example of a girl growing from six to fourteen. We discuss Current research concentrates on designing SUIs for specific user the ways to detect current user characteristics. We also support our groups, e.g. for children [4, 6, 10] or elderly people [1, 2]. These idea of an ESUI with a user study and present its first results. SUIs are optimized and adapted to general user group character- istics. However, especially young and elderly users undergo fast changes in cognitive, fine motor and other abilities. Thus, design Keywords requirements change rapidly as well and a flexible modification of Search User Interface, Human Computer Interaction, Adaptivity, the SUI is needed. Therefore, we suggest to provide users with Context Support, Information Retrieval. an evolving search user interface (ESUI) that adapts to individual user’s characteristics and allows for changes not only in properties Categories and Subject Descriptors (e.g., colour) of UI elements but also influences the UI elements themselves and their positioning. Some UI elements are continu- H.5.2 [Information Interfaces and Presentation]: User Interfaces. ously adaptable (e.g. font size, button size, space required for UI elements), whereas others are only discretely adaptable (e.g. type General Terms of results visualization). 
Not only SUI properties, but also the com- Design, Human Factors. plexity of search results is continuously adaptable and can be used as a personalisation mechanism for users of all age groups. 1. INTRODUCTION Search user interfaces [8] are an integral part of our lives. Most 2. ESUI VISION common known SUIs come in the form of web search engines with In this section we share our vision of an ESUI. In general, we an audience of hundreds of millions of people1 all over the world. suggest to use a mapping function and adapt the SUI using it, in- 1 stead of building a SUI for a specific user group. Using a generic Google, for example, has over 170 million unique visi- model of an adaptive system, as discussed in [14], we depict the tors per month, only in the U.S. http://www.nielsen. com/us/en/newswire/2013/january-2013--top-u-s\ model of an ESUI as following (see Fig. 1). We have a set of user characteristics (or skills) on one side. In the ideal case, the sys- tem detects the skills automatically, e.g. based on user’s interaction with the information retrieval system (user’s queries, selected re- sults, etc.). On the other side, there is a set of options to adapt the SUI, e.g. using different UI elements for querying or visualisation of results. In between, an adaptation component contains a set of logic rules to map the user’ skills to the specific UI elements of the Presented at EuroHCIR2013. Copyright c 2013 for the individual papers ESUI. by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. --entertainment-sites-and-we-brands.html support and a resulting feeling of success [5]. Therefore, they re- quire support to increase their confidence. In general, reading and writing skills of adults are better than those of children. Knowl- edge is gathered during life. Thus, elderly people posses a larger knowledge base than adults, and adults have usually more knowl- edge than children. 
Figure 1: Model of an ESUI.

2.1 Mapping Function
The function between the user skill space and the options to adapt the UI elements of the SUI has to be found. We suggest using knowledge about human development, e.g. from the medical, cognitive and psychosocial science fields, to specify the user skill space. The results of user studies about users' search behaviour and SUI design preferences can provide recommendations for UI elements. As far as the research provides information about the studied age group, we can use the age group as a connector between the skill space and the UI elements. Note that we use age groups in the sense of a more abstract category defining a set of specific capabilities acquired while growing up. A lot of research has already been done and can be used, e.g. [2, 4, 7]. In addition, if the set of adaptable UI elements is defined, we can evaluate the mapping function by letting users from different age groups put the UI elements of a SUI together (similar to end user programming).

2.2 Evolving Skills
In order to allow a SUI to evolve together with a user, we first have to determine those characteristics that vary from user to user and change during his life (or due to some circumstances like diseases). For example, a discussion of the skills of young users is given in [7]. We suggest considering cognitive skills, information processing rates, fine motor skills, different kinds of perception, knowledge base, emotional state, and reading and writing skills.
In the following, a brief summary of current research results in human development science is given. Human cognitive development occurs in a sequential order in which later knowledge, abilities and skills build upon the previously acquired ones [12]. Cognitive abilities of users in those stages differ; for example, before the last (formal operational) stage they are unable to think logically and to understand abstract concepts. Again, not only age but also some diseases or accelerated cognitive development cause cognitive abilities, i.e. skills to gain, use and retain knowledge, to differ from user to user. Information processing capabilities change during life. Children's information processing is slower than that of adults [11]. Therefore, children have a limited cognitive recall. It is widely agreed that elderly people have a decline in intellectual skills which affects the aggregation of new information [15]. Fine motor skills are influenced by information processing rates [9]. Therefore, young children's performance in pointing movements, e.g. using a mouse, is lower than that of adults. Perception of color can also change while aging: color discrimination is more difficult for elderly people. Elderly people also have problems with hearing [3]. Children are immature in the emotional domain and, especially at the age of six to twelve, require additional emotional support. We believe that the discussed characteristics can affect the design of SUIs. However, further research should be done in this direction.

2.3 Detection of User Abilities
An ESUI can provide a specific SUI for a specific user given the knowledge of his specific abilities. A simple case is an adaptable SUI, where a user manually adjusts the search user interface to his personal needs and tasks. An adaptable SUI may also provide several standard settings for a specific user selection to explore the options (e.g. young user, adult user, elderly user). More interesting and challenging is the case of an adaptive SUI, where a system automatically detects the abilities of a user and provides him with an appropriate SUI. Concepts for the automatic detection of a user's abilities have been studied in the past. We can use the age of a registered and logged-in user. However, the age provides only an approximation of a user's capabilities. For an individual user, an appropriate mapping to the age group has to be found, e.g. using psychological tests presented in the form of games. Those games can be used to derive the quality of a user's fine motor skills as well. Furthermore, we can use the user history from log files, specifically issued queries (their topic and specific spelling errors) and accessed documents. However, research is required to determine how to adapt a SUI in a way users would accept the changes.

3. DESIGN IDEAS
When designing an ESUI, we first have to define the components of a SUI that should be adapted. We consider three main components. The first component is the input, i.e. UI elements which allow a user to transform his information need into a machine-understandable format. This component is traditionally represented by an input field and a search button. Other variants are a menu with different categories, or voice input. The second component is the output of an information retrieval (IR) system. The output consists of UI elements that provide an overview of the retrieved search results. There can be different kinds of output, e.g. a vertical list of snippets (Fig. 2a), tiles (Fig. 2c) or coverflow (Fig. 2b). The third is a management component. Management covers UI elements that support users in information processing and retaining. Examples of management UI elements are bookmark management components or other history mechanisms like breadcrumbs. Historically, management UI elements are not part of an SUI, but recent research [6] shows that users are highly motivated to use elements of management. Besides these main components, there also exist general properties of UI elements that might affect all three categories, e.g. font size or color. We propose to adapt these three main components of a SUI and its general UI properties to the user's skills.

3.1 Use Cases
In order to demonstrate the proposed ESUI, we consider a young girl called Jenny who is growing older. We show how the input and output of a SUI can be adapted to changes in Jenny's abilities.
Use Case 1: Jenny is six years old. She has started to learn reading, but she has difficulties with writing. Jenny's active vocabulary is limited to 5,000 words. She cannot yet think in abstract categories and is not able to process much information. Due to her limited writing abilities, Jenny is not able to use an input field and write a query. She is learning to read; therefore, she can use a menu with different categories which are supported by images. In order to search for any information, Jenny can draw her query (Fig. 3a). Jenny's fine motor skills are not fully developed yet: she has difficulties using interactions like scrolling. She also cannot process much information at once. Therefore, the coverflow result visualisation (Fig. 2b) fits her abilities best. Coverflow allows her to concentrate on one item at a time; thus, her cognitive load is reduced. Jenny can interact with it using simple point-and-click interactions. An integrated text-to-speech reader supports Jenny by reading the results to her.
Use Case 2: Jenny is nine years old. Jenny can read and write short stories with just a few spelling errors. Jenny has some difficulties with typing using a keyboard: she "hunts and pecks" on the keyboard for the correct keys. This increases the number of spelling errors and also slows down the process. Jenny is frustrated because the system does not understand her well. Thus, a standard keyword input field does not fit Jenny's abilities well. Jenny still cannot think in abstract categories and process a lot of information, but her language skills have improved and her vocabulary size has increased. Therefore, she can use voice input to search for information. A menu with different categories in addition to voice input can inspire Jenny to search for some new information. However, these categories should match her cognitive abilities (Fig. 3b). Jenny can already manage different interaction techniques and is able to process more information than the six-year-old Jenny. Therefore, a list of snippets (Fig. 2a) is an adequate output visualization. It does not require as much cognitive recall as tiles, but allows more result items to be processed at a time than coverflow does.
Use Case 3: Jenny is 14 years old. Jenny's writing skills are further developed, with use of correct grammar, punctuation and spelling. She is learning to think logically about abstract concepts. Her vocabulary size is about 20,000 words. She chats a lot with her friends, which results in fast typing skills using a keyboard. Therefore, Jenny is able to use keyword-oriented search input supported by spelling correction and suggestion mechanisms. A SUI can still support Jenny in finding the "right" keywords, for example using a query cloud² (Fig. 3c). Jenny can already manage different interaction techniques and is able to process more information than the nine-year-old Jenny. Therefore, coverflow and a vertical list visualisation would probably restrain her performance, whereas tiles (Fig. 2c) allow Jenny a better overview of the results.

² Similar to the quinturakids.com search engine, accessed on 02.05.2013.

Figure 2: Different kinds of output of an information retrieval system: a) a vertical list of snippets offers a fast overview of several results at once; b) a coverflow view of results offers an attractive animation while browsing, uses a familiar book metaphor, and clearly separates the central element from the rest; c) tiles of search results offer a fast overview of several results at once and a user makes only small jumps while reading within results; however, the ordering of results is not as clear as in a list.

Figure 3: Different kinds of input of an information retrieval system: a) an ESUI enables six-year-old Jenny to draw her query; b) an ESUI supports nine-year-old Jenny with voice input and several pre-defined categories; c) an ESUI enables fourteen-year-old Jenny to use keyword-based input supported by an adaptive query cloud.

4. USER STUDY
In order to demonstrate the idea of an ESUI, we conducted a user study to compare users' preferences in the visualization of different UI elements of a SUI. Specifically, our hypothesis was that users from different age groups would prefer different UI elements and different general UI properties. We built a SUI that can be personalized, i.e. users can choose the input and output and tune general UI properties. In this paper we present our first results, i.e. users' preferences in results visualization. Our SUI allows users to choose between a vertical list of snippets, tiles (Fig. 4b) and coverflow (Fig. 4a). In our experiment we demonstrated these three output types. The subjects interacted with the search system to get a better feeling for it and were encouraged to solve a simple search task using their preferred SUI setup. 44 subjects participated in the study: 27 children and 17 adults. The children were between eight and ten years old (8.9 on average), 19 girls and 8 boys from the third (18 subjects) and fourth (9 subjects) grade. The adults were between 22 and 53 years old (29.2 on average), five women and 12 men. Nine of them were students in computer science and four worked in the IT sector. The results for the output are presented in Fig. 5. The majority of the children preferred the coverflow results visualization, whereas the adults had a weak tendency towards tiles. These results can be explained by the fact that, on average, children cannot process as much information as adults. Thus, it is easier for children to use coverflow. Coverflow offers an animation while browsing that is attractive for children. Many adults told us that they prefer tiles because many results can be compared at once, so tiles offer a good overview of the results.

Figure 4: Different kinds of result visualization: a) ESUI with coverflow result visualization; b) ESUI with tiles result visualization.

Figure 5: Study results: which type of visualization children and adults prefer.

5. CONCLUSION
In this position paper we introduced the concept of an evolving search user interface that adapts itself to the abilities of a particular user. Instead of building a SUI for a specific user group, we use a mapping function between user skills and the UI elements of a search system in order to adapt it dynamically, allowing the user to perform his search process in a more efficient way. We considered different abilities of a user, e.g. his cognitive skills, knowledge, and reading and writing skills, that change during life. Furthermore, we proposed to adapt the three main components of a SUI, i.e. input, output and management, and its general UI properties to the user's skills. A key component of an ESUI is the mapping function between the user skill space and the UI elements of a SUI, which has to be found. We elaborated on ways to learn this function. In order for an ESUI to be adaptive, ways to detect user abilities are required. We pointed in several directions how the detection can be done.

6. ACKNOWLEDGEMENTS
The work presented here was partly supported by the German Ministry of Education and Science (BMBF) within the ViERforES II project, contract no. 01IM10002B.

7. REFERENCES
[1] A. Aula. User study on older adults' use of the web and search engines. Universal Access in the Information Society, 4(1):67–81, 2005.
[2] A. Aula and M. Käki. Less is more in web search interfaces for older adults. First Monday, 10(7-4), 2005.
[3] J. E. Birren and K. W. Schaie. Handbook of the Psychology of Aging, volume 2. Gulf Professional Publishing, 2001.
[4] C. Eickhoff, L. Azzopardi, D. Hiemstra, F. de Jong, A. de Vries, D. Dowie, S. Duarte, R. Glassey, K. Gyllstrom, F. Kruisinga, et al. EmSe: Initial evaluation of a child-friendly medical search system. In IIiX Symposium, 2012.
[5] E. Erikson. Childhood and Society. W. W. Norton & Company, 1963.
[6] T. Gossen, M. Nitsche, and A. Nürnberger. Knowledge Journey: A web search interface for young users. In Proc. of the Sixth Symposium on HCIR, 2012.
[7] T. Gossen and A. Nürnberger. Specifics of information retrieval for young users: A survey. Information Processing & Management, 49(4):739–756, 2013.
[8] M. Hearst. Search User Interfaces. Cambridge University Press, 2009.
[9] J. Hourcade, B. Bederson, A. Druin, and F. Guimbretière. Differences in pointing task performance between preschool children and adults using mice. ACM Transactions on Computer-Human Interaction, 11(4):357–386, 2004.
[10] M. Jansen, W. Bos, P. van der Vet, T. Huibers, and D. Hiemstra. TeddIR: tangible information retrieval for children. In Proc. of the 9th Int. Conf. on Interaction Design and Children, pages 282–285. ACM, 2010.
[11] R. Kail. Developmental change in speed of processing during childhood and adolescence. Psychological Bulletin, 109(3):490, 1991.
[12] J. Ormrod and K. Davis. Human Learning. Merrill, 1999.
[13] B. Steichen, H. Ashman, and V. Wade. A comparative survey of personalised information retrieval and adaptive hypermedia techniques. Information Processing & Management, 2012.
[14] S. Stober and A. Nürnberger. Adaptive music retrieval – a state of the art. Multimedia Tools and Applications, pages 1–28, 2012.
[15] I. Stuart-Hamilton. Intellectual Changes in Late Life. John Wiley & Sons, 1996.

A Pluggable Work-bench for Creating Interactive IR Interfaces
Mark M.
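To make the mapping-function idea from Sections 2.1 and 3 concrete, the toy lookup below hard-codes one possible mapping from abstract age groups to the three SUI components (input, output, management) plus a general UI property. The paper deliberately leaves the actual function open, so every name, grouping, and component label here is our own illustrative assumption, not the authors' implementation:

```python
# Illustrative sketch only: the paper does not define a concrete data model,
# so the groupings and component labels below are assumptions for exposition.

def select_sui_components(age_group):
    """Map an abstract age group (Section 2.1) to the three SUI
    components of Section 3: input, output, management, plus a
    general UI property (font size)."""
    mapping = {
        # Young users: limited reading/writing, low information-processing rate.
        "young": {"input": "image_menu+drawing", "output": "coverflow",
                  "management": None, "font_size": "large"},
        # Adolescent users: keyboard skills and abstract thinking develop.
        "adolescent": {"input": "keyword+query_cloud", "output": "tiles",
                       "management": "bookmarks", "font_size": "medium"},
        # Adult users: full skill set assumed.
        "adult": {"input": "keyword+suggestions", "output": "tiles",
                  "management": "bookmarks+history", "font_size": "medium"},
    }
    return mapping[age_group]

print(select_sui_components("young")["output"])  # coverflow
```

A learned mapping function would replace this static table, e.g. by inferring the skill profile from the detection signals discussed in Section 2.3.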
Hall, Spyros Katsaris, Elaine Toms
Sheffield University, S1 4DP, Sheffield, UK
m.mhall@sheffield.ac.uk, evolve.sheffieldis@gmail.com, e.toms@sheffield.ac.uk

ABSTRACT
Information Retrieval (IR) has benefited from standard evaluation practices and re-usable software components that enable comparability between systems and experiments. However, Interactive IR (IIR) has had only very limited benefit from these developments, in part because experiments are still built using bespoke components and interfaces. In this paper we propose a flexible workbench for constructing IIR interfaces that will standardise aspects of the IIR experiment process to improve the comparability and reproducibility of IIR experiments.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces

Keywords
evaluation, framework, standardisation

1. MOTIVATION
Information Retrieval (IR) has benefited from standard evaluation practices and re-usable software components. The Cranfield-style evaluation methodology enabled evaluation programmes such as TREC, INEX, or CLEF. At the same time, the provision of re-usable software components such as Lucene¹, Terrier², Heritrix³, or Nutch⁴ has enabled IR researchers to focus on the development of those components directly related to their research. However, Interactive IR (IIR) has had only very limited benefit from these developments.
Typically, IIR research is still conducted using a single system in a laboratory setting in which a researcher observed and interacted with a participant [5], usually using a bespoke IIR interface. Developing and running such experiments is a time-consuming, resource-exhaustive and labour-intensive process [6]. As a result of this bespoke approach, the comparability of IIR experiments and their results suffers. Where studies of the same activities show divergent results, it is difficult to determine whether the differences are due to the specific aspect of IIR under investigation, or simply due to different participant samples or small differences in how the non-investigated user-interface (UI) components were implemented. The bespoke nature also makes it harder to replicate studies, as publications frequently do not contain sufficient detail to exactly replicate the experiment.
In [3] we have proposed a flexible, standardised IIR evaluation framework that aims to address the issues created by variations in the experimental processes and by how context information is acquired from the participants. However, the framework makes no provisions towards providing standardised IIR components that would improve the comparability of the experiment itself, the ease of setting up the experiment, and the ease of reproducibility.
A number of attempts at developing a configurable, re-usable IIR evaluation system have been made in the past. In 2004, Toms, Freund and Li designed and implemented the WiIRE (Web-based Interactive Information Retrieval) system [6], which devised an experimental workflow process that took the participant through a variety of questionnaires and the search interface. Used in the TREC 11 Interactive Track, it was built using Microsoft Office desktop technologies, severely limiting its capabilities. The system was re-created for the web and successfully used in INEX 2007 [7], but lacked flexibility in setup and data extraction. More recently, SCAMP (Search ConfigurAtor for experiMenting with PuppyIR) [4] was developed to assess IR systems, but does not include the range of IIR research designs that are typically used. A heavy-weight solution is PIIRExS⁵ [1], which supports the researcher through the whole process from setting up the experiment to analysis, providing greater support but also a steeper learning curve. These approaches highlight the difficulty of balancing the two main constraints that limit a system's wide-spread use:
• sufficient flexibility to support the wide range of IIR interfaces and experiments;
• sufficiently simple to implement that it does not increase the resource commitment required to set up the experiment.

¹ https://lucene.apache.org/
² http://terrier.org/
³ https://webarchive.jira.com/wiki/display/Heritrix/Heritrix
⁴ http://nutch.apache.org/
⁵ http://sourceforge.net/projects/piirexs

Presented at EuroHCIR2013. Copyright 2013 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

2. DESIGN
To achieve the goal of developing a system that fulfils these requirements, we propose a system design that is based around a very lean core into which the researcher can plug the IIR components they wish to include in their experiment. We have implemented this design in our web-based evaluation framework (fig. 1), which complements the larger IIR experiment support system presented in [3]. To achieve maximum flexibility, the system was designed using a message-passing architecture that consists of the following four components:
• Web Frontend: handles the interface between the participant's browser and the evaluation workbench and is implemented using a combination of client-side and server-side functionality.
• Message Bus: handles the inter-component communication and forms the core of the system. It is responsible for passing messages from the Web Frontend to the IIR components configured to be listening for those messages and also for passing messages directly between the components.
• Session: handles loading and saving the components' current state for a specific participant, hiding the complexities of web-application state from the individual components.
• Logging: provides a standardised logging interface that allows the components to easily attach logging information to the UI event generated by the participant.

Figure 1: The evaluation workbench consists of the four core modules, into which the IIR components used in the experiment are plugged.

When the researcher sets up the workbench for their experiment, they can freely configure which components to use, how to lay them out, and which components to connect to which other components. Based on this configuration the Web Frontend generates the initial user interface that is shown to the participants. Then, when the participant interacts with a UI element (fig. 2), the resulting UI event is handled by the Web Frontend, which generates a message based on the UI event. This message is passed to the Message Bus, which uses the configuration provided by the researcher to determine which components to deliver the message to. The components that are listening for that message update their own Session state based on the message and then mark themselves as changed. After message processing has been completed for all components, the Web Frontend updates the UI for each of the changed components.

Figure 2: The workbench's main workflow starts with the generation of the initial UI and then waits for the participant to generate a UI event. The event is processed, the affected component's state and UI are updated, and the workbench goes back to waiting for the next UI event. A powerful aspect of the workflow is that components can generate their own messages when they receive a message.

An example of the configuration used to set up an experiment is shown in figure 3 (from the experiment in figure 4), specifying the configuration of the "search results" component:

[SearchResults]
handler = application.components.SearchResults
name = search_results
layout = grid-9 vgrid-expand
connect = search_box:query

Figure 3: Configuration for a Standard Results List component, showing how the component's layout (9 grid-cells wide and vertically expanding) and connections to other components (to the "search box" component via the query message) are specified.

It specifies that the component should be displayed 9 grid-cells wide (the application layout uses a 12-by-12 cell grid) and should expand vertically to use as much space as is available. The component is configured to be connected to the "search box" component via the "query" message. It is this ability to freely plug components together that, we believe, makes the framework sufficiently flexible to support the wide range of IIR experiments, while remaining simple to set up and use.

3. STANDARD COMPONENTS
The core system provides only the framework into which the IIR components can be plugged. This allows the researcher to build any custom IIR UI they wish to test, while at the same time being able to take advantage of the standardised session and log handling functionality. As IIR UIs frequently include required elements that are not the focus of the study the researcher wishes to undertake, an optional set of default components for core IR UI elements is provided to reduce set-up time. This has the additional advantage that, as their behaviour is consistent across experiments, the comparability of experiments using the framework is improved.

3.1 Search Box
The Search Box component ([8], p. 49; "Formulate Query Interface", [2], p. 76) provides a standard search box. When the participant enters text and clicks on the "Search" button, it generates a query message, which is usually connected to a Standard Results List.

3.2 Standard Results List
The Standard Results List component ([8], p. 50; "Examine Results Interface", [2], p. 77) provides a default 10-item listing of search results. The Standard Results List includes support for displaying snippets ([8], p. 51) and what Wilson calls "Usable Information" ([8], p. 51) for each result document. Unlike the other standard components, which can be used out of the box, the Standard Results List has to be extended by the researcher in order to be able to access the search engine used to power the UI.

3.3 Pagination
The Pagination component ([8], p. 70) displays a configurable number of pages around the current search-results page. In response to user interaction it sends a start message with the rank of the first document to paginate to.

3.4 Category Browsing
The Category Browsing component ([8], p. 54) provides a hierarchical category structure that the participant can use to explore a collection. Clicking on a category sends a query message with the category's identifier.

3.5 Saved Documents
The Saved Documents component provides an area where the participant can save things that they have found interesting, to support them in their current task. Documents are added through a save_document message. The Saved Documents component supports an optional tagging feature enabling the participant to tag the document with values specified by the researcher. This can be used to let the participant specify why they have chosen that document or how much it helps them in their current task.

3.6 Task
The Task component provides a static display of the task information to show to the user. Two versions of this component are provided: one that displays a static text set in the configuration, and one that can fetch a task description from the database, based on a parameter passed to it.

4. APPLICATION
The evaluation work-bench has so far been used to build two IIR experiments, very different in their nature, clearly demonstrating the work-bench's flexibility.
The first experiment (fig. 4) re-uses the standard Task, Search Box, Pagination, and Saved Documents components, and extends the Standard Results List to work with the specific search backend. This set-up re-creates what is essentially a relatively standard search UI configuration, which is being used to investigate query session behaviour.
The second experiment (fig. 5) demonstrates a much richer interface, with more modifications to the components and an experiment-specific component. It re-uses the Task and Category Browsing components; extends the default Search Box, Pagination, Standard Results List, and Saved Documents components; and adds a new Item View component. The message-passing nature of the system made it possible to quickly integrate the new component, so that when the participant clicks on a meta-data facet in the Item View, a query message is sent to the Standard Results List to find items with the same bit of meta-data. The interface was used to investigate un-directed exploration behaviour in a large digital cultural heritage collection.

5. WHERE TO GO NEXT?
The stated aim of this paper was to present a novel, pluggable, extensible, and configurable IIR interface work-bench that supports our wider aim of improving IIR experiment comparability. The work-bench is sufficiently flexible to support the wide range of web-based IIR experiments that are undertaken, while being sufficiently simple and light-weight to encourage wide-spread use.
To enable this wide-spread use, the system has been released under an open-source license⁶. We are also moving to engage with the wider research community to determine to what degree the work-bench satisfies their needs for an evaluation system and what needs to be done to achieve the wide-spread use needed to improve IIR experiment comparability.

⁶ https://bitbucket.org/mhall/pyire

6. ACKNOWLEDGEMENTS
The research leading to these results was supported by the Network of Excellence co-funded by the 7th Framework Program of the European Commission, grant agreement no. 258191.

7. REFERENCES
[1] R. Bierig, M. Cole, J. Gwizdka, N. J. Belkin, J. Liu, C. Liu, J. Zhang, and X. Zhang. An experiment and analysis system framework for the evaluation of contextual relationships. In CIRSE 2010, page 5, 2010.
[2] C. Chua. A user interface guide for web search systems. In Proceedings of the 24th Australian Computer-Human Interaction Conference, OzCHI '12, pages 76–84, New York, NY, USA, 2012. ACM.
[3] M. M. Hall and E. G. Toms. Building a common framework for IIR evaluation. In Information Access Evaluation meets Multilinguality, Multimodality, and Visualization: 4th International Conference of the CLEF Initiative, CLEF 2013, 2013.
[4] G. Renaud and L. Azzopardi. SCAMP: a tool for conducting interactive information retrieval experiments. In Proceedings of the 4th Information Interaction in Context Symposium, pages 286–289. ACM, 2012.
[5] J. Tague-Sutcliffe. The pragmatics of information retrieval experimentation, revisited. Information Processing & Management, 28(4):467–490, 1992.
[6] E. G. Toms, L. Freund, and C. Li. WiIRE: the web interactive information retrieval experimentation system prototype. Information Processing & Management, 40(4):655–675, 2004.
[7] E. G. Toms, H. O'Brien, T. Mackenzie, C. Jordan, L. Freund, S. Toze, E. Dawe, and A. Macnutt. Task effects on interactive search: The query factor. In Focused Access to XML Documents, pages 359–372. Springer, 2008.
[8] M. L. Wilson. Search User Interface Design, volume 20. Morgan & Claypool Publishers, 2011.
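A configuration in the style of the [SearchResults] example in figure 3 could be parsed into message-routing rules roughly as follows. The actual pyire implementation is not shown in the paper, so this is only an assumption-laden sketch of the idea, not the real code:

```python
# Sketch: turn an INI-style component configuration (as in figure 3) into a
# routing table for the Message Bus. The routing-table shape and the meaning
# assigned to "connect = source:message" are our assumptions.
import configparser

CONFIG = """
[SearchResults]
handler = application.components.SearchResults
name = search_results
layout = grid-9 vgrid-expand
connect = search_box:query
"""

def build_routing(config_text):
    """Return {(source_component, message_name): [listener_names]}."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    routing = {}
    for section in parser.sections():
        name = parser[section]["name"]
        # Each "connect" entry means: this component listens for
        # <message> events originating from <source_component>.
        for entry in parser[section].get("connect", "").split():
            source, message = entry.split(":")
            routing.setdefault((source, message), []).append(name)
    return routing

routes = build_routing(CONFIG)
print(routes)  # {('search_box', 'query'): ['search_results']}
```

On each UI event, a bus built around such a table would look up the (source, message) pair and deliver the message to every listed listener, which then updates its session state and marks itself as changed.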
Figure 4: Screenshot showing an experiment with a very basic configuration consisting of the Task, Search Box, Pagination, Standard Results List, and Saved Documents components. This is being used to investigate query behaviour for tasks that require query reformulations.

Figure 5: Screenshot showing an experiment that makes heavy use of the customisation options offered by the workbench. This configuration was used to investigate un-directed exploration in a digital cultural heritage collection.

A Proposal for User-Focused Evaluation and Prediction of Information Seeking Process
Chirag Shah
School of Communication & Information (SC&I), Rutgers University
4 Huntington St, New Brunswick, NJ 08901, USA
chirags@rutgers.edu

ABSTRACT
One of the ways IR systems help searchers is by predicting or assuming what could be useful for their information needs based on analyzing information objects (documents, queries) and finding other related objects that may be relevant. Such approaches often ignore the underlying search process of information seeking, thus forgoing opportunities for making process-based recommendations. To overcome this limitation, we are proposing a new approach that analyzes a searcher's current processes to forecast his likelihood of achieving a certain level of success in the future. Specifically, we propose a machine-learning based method to dynamically evaluate and predict search performance several time-steps ahead at each given time point of the search process during an exploratory search task. Our prediction method uses a collection of features extracted solely from the search process, such as dwell time, query entropy and relevance judgment, in order to evaluate whether it will lead to low or high performance in the future. Experiments that simulate the effects of switching search paths show a significant number of subpar search processes improving after the recommended switch. In effect, the work reported here provides a new framework for evaluating search processes and predicting search performance. Importantly, this approach is based on user processes and independent of any IR system, allowing for wider applicability that ranges from searching to recommendations.

Categories and Subject Descriptors
H.3 [INFORMATION STORAGE AND RETRIEVAL] H.3.3 Information Search and Retrieval: Search process; H.3.4 Systems and Software: Performance evaluation (efficiency and effectiveness)

General Terms
Measurement, Performance, Experimentation

Keywords
Exploratory search, Evaluation, Performance prediction

1 INTRODUCTION
IR evaluations are often concerned with explaining factors relating to user or system performance after the search and retrieval are conducted [20]. Most recommender systems, however, operate with the objective of suggesting objects that could be useful to a user based on his/her or others' past actions [2][19]. We commenced our investigation by broadly asking how we could take valuable lessons from both IR evaluations and recommender systems to not only evaluate an ongoing search process, but also predict how well it will unfold and suggest a better path to the searcher if it is likely to underperform. The motivation behind this investigation was based on the following assumptions and realizations grounded in the literature.
1. The underlying rational processes involved in information search are reflected in the actions users take while searching. These actions include entering search queries, skimming the results, as well as selecting and collecting useful information [8][14][15].
2. A searcher's performance is a function of these actions performed during a search episode [7][22].
With these assumptions, we propose to quantify a search process using various user actions, and to use this quantification for user performance (henceforth, 'search performance' or 'performance') prediction as well as search process recommendations.

2 BACKGROUND
Past research on predictive models that relates to the approach we describe in this paper can be grouped into two main categories: (1) behavioral studies and (2) IR approaches. In both cases, however, the focus has been on end products instead of on the process required to produce them.
As far as the behavioral studies go, research has been conducted to explore user models that help anticipate specific aspects of the search process. One goal in this context has been the determination of whether a search process will be completed in a single session or in multiple sessions. For example, Agichtein et al. [3] investigated different patterns that can be identified in tasks that require multiple sessions. As a result, the authors devised an algorithm capable of predicting whether users will continue or abandon the task. Similar work is described in Diriye et al. [6], which focuses on predicting and understanding why and when users abandon Web searches. To address this problem, the authors studied features such as queries and interactions with result pages. Based on this approach, the authors were able to determine reasons for search abandonment such as accidental causes (e.g. the Web browser crashing), satisfaction levels, and query suggestions, among others.
There have also been attempts to understand users' past behaviors in order to predict future ones in similar conditions. For example, Adar et al. [1] visually explored behavioral aspects using large-scale datasets containing queries and other information objects produced by users. The authors were able to identify different behavioral patterns that seem to appear consistently in different datasets. While not directly related to performance prediction, this work focused on attributes of the search process instead of on final products derived from it.
Research like that described above often relies on historic data from large populations and on the use of trend and seasonal components, which are used to model the long-term direction and periodicity patterns of time series [17]. For example, some have explored seasonal aspects of Web search (e.g. weekly, monthly, or annual behaviors) that provide useful information to predict and suggest queries [5].
From an IR perspective, Radinski et al. [18] explored models to predict users' behaviors in a population in order to improve results from IR systems. The authors also developed a learning algorithm capable of selecting an appropriate predictive model depending on the situation and time. As described by the authors, applications of this approach could range from click predictions to query-URL predictions. In contrast to this approach, the method presented in this paper considers both population trends and an individual user's behavior.
In a similar track, several works have been conducted on query performance prediction, focusing on developing techniques that help IR systems anticipate whether or not a query will be effective in providing results that satisfy users' needs [4][10][11]. For example, Gao et al. [10] found that features derived from search results and interaction features offer better prediction results than a prediction baseline defined in terms of query features. Results from this study have direct implications for individual users by aiding the auto-evaluation process of IR systems.
In information search, users may be unaware of their individual performance when solving an information search task. For instance, Shah & Marchionini [23] showed how a lack of awareness about different objects involved in searching (queries, visited pages, bookmarks) could result in a mistaken perception of search performance during an exploratory search task. Even if an IR system is highly effective, users may run into multiple query formulations and the evaluation of several pages before finding what they need. This process, which can be related to search strategies, implies effort and time that is usually underestimated by the users themselves. In this sense, instead of predicting end products (i.e., overall performance), we predict several steps ahead with the aim of aiding users' search process awareness and performance trends.
Unlike previous works in IR, we are not proposing to use time series analyses or seasonal components of historic data. Instead, we investigate predictive models based on machine learning (ML) techniques, namely SVM, logistic regression, and Naïve Bayes, which are trained over a set of features such as time, number of queries, and page dwell time. In contrast to most IR evaluations, our method focuses on user processes. Also, unlike most recommender systems, our approach could output alternative strategies instead of similar/relevant products to help the searcher. In essence, the work reported here takes several lessons from traditional IR evaluations, recommender system designs, and weather/stock forecasting to come up with a new approach for evaluating and predicting search performance.
In the next section we provide a detailed description of our method, feature selection, and the measures we used in order to create ML-based predictive models.

3 METHOD
In order to analyze the search processes followed by different users/teams, we assume that the underlying dynamics of the search processes are expressed by a collection of activities that take place from the beginning to the end of the search processes. The first part of our method is a feature extraction step in which we extract a wide array of features relating to webpages, queries and snippets saved from the search processes for each unit of time t. This step is performed in order to evaluate how well we could use those features to capture the underlying dynamics, which would lead to recognizing whether a search process is going to lead to high or low performance at the future time steps t+n (n=1,2,…,N), where N is the furthest time step.
The decision to include or exclude a feature was based on the literature (e.g., [7]) as well as our past experience [22] with representing and evaluating search objects and processes. Each feature is extracted for each user or team, u, up to time t from the search processes, and they are explained in detail as follows.
• Total coverage (u,t): The total number of distinct webpages visited by a user (u) up to time t. This feature captures the webpage-based activity performed by a user and provides a measure of how much distinct information has been found by the user up to this time.
• Useful coverage (u,t): The total number of distinct webpages in which a user spent at least 30 seconds, up to time t.
This measure evaluates out of the total pages he/she the approach we introduce in this paper is oriented toward has visited how many of them were useful in finding predictions at different times in order to increase the level of relevant information leading to satisfaction with their awareness of users about their own search process. Similar to context in completing the exploratory task [9][22][25]. weather forecast, this information could help users to be aware • Number of queries (u,t): Total number of unique queries of possible trends based on past and current behavior. executed by a user up to time t. This feature implicitly For a more recent discussion on IR evaluations and their relates to how much effort and cognitive thinking a user has shortcomings, see [12]. To the best of our knowledge, search put in to this task. process performance prediction at different times from a user • Number of saved snippets (u,t): Total number of snippets perspective has not been explored. Similar approaches can be saved by user u up to time t. This measures the amount of found in weather and stock market studies. For example, using information that the user thought that might be relevant in machine learning approaches such as Support Vector Machine the future to complete the task and needed to be (SVM), some models have been implemented to predict the remembered. In other words, this feature is an indication of trends of two different daily stock price indices using NASDAQ explicit relevance judgments made by the user. and Korean Stock prices [13][16]. In a similar fashion, our • Length of Query (u,q,t): Length of each query(q) executed approach is oriented to forecast users’ search performance N- by a user u based on the character count of the query up to time t. This feature captures how the user imposed the queries and how long they were at different times of the above mentioned criteria and threshold and used as the output search process. 
class labels to be used in the n-step ahead prediction model. If a • Number of tokens in each query (u,q,t): This is the count of class label at n-step ahead was correctly predicted based on the tokens/words in each query(q) executed by user u up to features extracted up to time t from the classification model it time t. This query based measure takes into account how was considered as correctly classified and if not as misclassified. specific a user was in defining the query. By inspecting the 4 EXPERIMENTS datasets, we realized that queries with a less number of In order to evaluate whether users who are predicted to perform tokens tend to get general results. On the other hand, at low performance in the future based on the current search composed queries with multiple terms are related to more process, could benefit from this analysis to improve their search specific searchers. We also observed that typically the users process, we conducted some simple simulation analysis. started with general queries with few words at the beginning of the search process but then went into more We considered the individual user search processes as a detailed queries to find more specific information later. For collection of search paths, where each search path is defined as all these reasons, we found it to be useful to capture the the search process from the time a user issued a query up to the number of token used in a query. time user issued another quite different query. This was found • Query entropy (u,q,t): This measures the information out using generalized Levenshtein (edit) distance, which is a content in a given query (q), by finding the expected value commonly used distance metric for measuring the distance of information contained in a query. We used the widely between two character sequences. 
If the Levenshtein (edit) recognized notion of Shannon entropy [24] in Information distance between two subsequent queries were greater than 2 Theory to calculate the information content of a query. We (assuming less than 2 was when there were changes in the calculated the number of unique characters appearing in queries due to simple spelling mistakes or refining of the query), each of the queries, which represent the observed counts of we considered the search process from the former query to the the random variable. This was used as the input to Shannon next query as a single search path. entropy calculation and we used to the maximum- Following this method, we found the first search path of each likelihood method to calculate the entropy. Query entropy user and based on the features extracted up to the end of the first feature has been used in the past to predict goodness of a search path, and based on the classification model learnt from query for making query expansion decision [21]. that corresponding n-step ahead prediction we predicted whether The method used to assess the search performance of a user is the user is going to have low/high performance at the end of the described below. We define a measure called Efficiency (u,t), for session. If the user was going to have low performance, then out each user u up to time t in order to predict whether a given of the users who predicted to have high performance, we looked search process is going to yield in high/low performance in the at which high performing user has the lowest Levenshtein (edit) future We first define Effectiveness of user u up to time t as the distance between the queries issued by low performing user ratio of useful coverage and total coverage (both defined within the first search path and considered it as a pair of users, earlier). A similar measure was used in [7] and [22]. whom we are going to use in the simulation. 
Then, for each low performing user and high performing user that was matched, we Useful coverage(u,t) switched the search process of low performing user at the end of Effectiveness(u,t) = Total coverage(u,t) the first search path with the high performing user’s search path (1) up to t=T minutes, where T is the total number of minutes for a We then calculated Efficiency as defined in Equation 2. session. Then we evaluated by switching the search process Effectiveness(u, t) early during the overall process whether it would benefit each Efficiency(u,t) = low performing user to improve their performance. We found NumberofQueries(u,t) (2) that we were able to move most of the underperforming search In other words, Efficiency is defined as the Effectiveness processes to higher performance by early detection and obtained per query, or how effective a query is in terms of switching, while keeping the higher performing processes achieving a certain level of useful coverage. unharmed. The performance for each user u at each time t was classified in These simulations provide verification that by realizing early to the two classes; high performance and low performance based during the search process whether a user is going to perform on the following criteria: well or not, one could recommend better search processes/strategies for that user which would lead to uplifting Class = { low high ;if ;else Efficiency(u, t) ≥ Efficiency(u, t) (3) the search performance of a previously destined to low performing user. 5 CONCLUSION Using various user studies data available to us, we constructed When it comes to prediction, information retrieval and filtering feature matrices which consist of all aforementioned features for systems are primarily focused on objects while assessing what each minute of time t for all the users in each dataset, and and if something could help the users. 
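The measures above can be sketched in code. The following Python fragment is an illustrative reconstruction, not the authors' implementation: the function and variable names are our own, and the class-label threshold is assumed to be the group-average Efficiency per Equation 3. It segments a query stream into search paths using Levenshtein distance with the paper's threshold of 2, computes the Query entropy feature, and evaluates Effectiveness, Efficiency, and high/low labels per Equations 1-3.

```python
from collections import Counter
from math import log2

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (ca != cb)) # substitution
        prev = cur
    return prev[-1]

def segment_search_paths(queries, threshold=2):
    """Start a new search path when the edit distance between consecutive
    queries exceeds the threshold (minor edits such as spelling fixes or
    refinements stay within the same path)."""
    paths, current = [], []
    for q in queries:
        if current and levenshtein(current[-1], q) > threshold:
            paths.append(current)
            current = []
        current.append(q)
    if current:
        paths.append(current)
    return paths

def query_entropy(q: str) -> float:
    """Maximum-likelihood Shannon entropy over a query's character counts,
    as in the Query entropy feature."""
    if not q:
        return 0.0
    n = len(q)
    return -sum((c / n) * log2(c / n) for c in Counter(q).values())

def efficiency(useful_coverage, total_coverage, num_queries):
    """Equations 1 and 2: Effectiveness per query."""
    if not total_coverage or not num_queries:
        return 0.0
    effectiveness = useful_coverage / total_coverage  # Eq. 1
    return effectiveness / num_queries                # Eq. 2

def class_labels(efficiencies):
    """Equation 3 (assuming a group-mean threshold): label each user
    high/low against the average Efficiency at this time step."""
    mean = sum(efficiencies) / len(efficiencies)
    return ["high" if e >= mean else "low" for e in efficiencies]
```

In a full pipeline, these per-minute values would be stacked into the feature vector fed to the SVM, logistic regression, or Naïve Bayes classifiers described above.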
5 CONCLUSION
When it comes to prediction, information retrieval and filtering systems are primarily focused on objects when assessing whether and how something could help users. These approaches are often system-dependent, even though the process of information seeking is usually user-specific. Personalization and recommendation are frequently exercised as methods to address user-specific IR and filtering, but they are still limited to comparing and recommending objects, without focusing on the underlying IR processes carried out by the searchers. We presented a new approach to address these shortcomings. We began by asking whether we could model a user's search process based on the actions he/she performs during an exploratory search task and forecast how well that process will do in the future. This was based on the realization that an information seeker's search goal/task can be mapped out as a series of actions, and that the sequence of actions or choices the searcher makes, and especially the search path he/she takes, affects how well he/she will do. Thus, in contrast to approaches that measure the goodness of search products (e.g., documents, queries) as a way to evaluate overall search effectiveness, we measured the likelihood of an existing search process producing good results.

Here we presented simulations to demonstrate what could happen if one can make process-based predictions, but one could also develop an actual recommender system using the proposed method. Another potential application of such a prediction-based method would be to use it in IR systems to make users aware of what their future performance will be, based on their current and past search process. The system could identify, at an early stage of the process, that a user will have low performance if he continues in this manner, and what could be done to provide suggestions to improve overall performance.

Given that the proposed technique is independent of any specific kind of system, and solely focused on user-based processes, it will presumably be easy to apply to a variety of IR systems and situations, irrespective of retrieval, ranking, or recommendation algorithms. Finally, while we have used datasets borrowed from previous user studies, one could easily apply the proposed method to Web logs, TREC data, and other forms of datasets with various user actions recorded over time.

6 ACKNOWLEDGEMENTS
The work reported here is supported by the Institute of Museum and Library Services (IMLS) Cyber Synergy project as well as IMLS grant # RE-04-12-0105-12. The author is also grateful to his PhD students Chathra Hendahewa and Roberto Gonzalez-Ibanez for their valuable contributions to this work.

7 REFERENCES
[1] Adar, E., Weld, D. S., Bershad, B. N., & Gribble, S. D. (2007). Why we search: visualizing and predicting user behavior. In Proceedings of the World Wide Web (WWW) Conference 2007.
[2] Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749.
[3] Agichtein, E., White, R. W., Dumais, S. T., & Bennett, P. N. (2012). Search interrupted: Understanding and predicting search task continuation. In Proceedings of SIGIR 2012.
[4] Cronen-Townsend, S., Zhou, Y., & Croft, B. (2002). Predicting query performance. In Proceedings of SIGIR 2002.
[5] Dignum, S., Kruschwitz, U., Fasli, M., Yunhyong, K., Dawei, S., Beresi, U. C., & De Roeck, A. (2010). Incorporating seasonality into search suggestions derived from intranet query logs. In Proceedings of IEEE/WIC/ACM WI-IAT 2010, vol. 1, pp. 425–430.
[6] Diriye, A., White, R. W., Buscher, G., & Dumais, S. T. (2012). Leaving so soon? Understanding and predicting web search abandonment rationales. In Proceedings of CIKM 2012.
[7] González-Ibáñez, R., Shah, C., & White, R. W. (2012). Pseudo-collaboration as a method to perform selective algorithmic mediation in collaborative IR systems. In Proceedings of the 75th Annual Meeting of the Association for Information Science and Technology (ASIS&T). Baltimore, MD, USA.
[8] Gwizdka, J. (2008). Cognitive load on web search tasks. Workshop on Cognition and the Web: Information Processing, Comprehension, and Learning. Granada, Spain. Available from http://eprints.rclis.org/14162/1/GwizdkaJ_WCW2008_short_paper_finalp.pdf
[9] Fox, S., Karnawat, K., Mydland, M., Dumais, S., & White, T. (2005). Evaluating implicit measures to improve web search. ACM TOIS, 23(2), 147–168.
[10] Gao, Q., White, R., Dumais, S. T., Wang, S., & Anderson, B. (2010). Predicting query performance using query, result and interaction features. In Proceedings of RIAO 2010.
[11] He, B., & Ounis, I. (2006). Query performance prediction. Information Systems, 31(7), 585–594.
[12] Järvelin, K. (2012). IR research: systems, interaction, evaluation and theories. ACM SIGIR Forum, 45(2), 17.
[13] Kyoung-jae, K. (2003). Financial time series forecasting using support vector machines. Neurocomputing, 55(1–2), 307–319.
[14] Liu, C., Gwizdka, J., Liu, J., Xu, T., & Belkin, N. J. (2010). Analysis and evaluation of query reformulations in different task types. Proceedings of the American Society for Information Science and Technology, 47(17). Available from http://dl.acm.org/citation.cfm?id=1920331.1920356
[15] Liu, J., Gwizdka, J., Liu, C., & Belkin, N. J. (2010). Predicting task difficulty for different task types. Proceedings of the American Society for Information Science and Technology, 47(16). Available from http://dl.acm.org/citation.cfm?id=1920331.1920355
[16] Ming-Chi, L. (2009). Using support vector machine with a hybrid feature selection method to the stock trend prediction. Expert Systems with Applications, 36(8), 10896–10904.
[17] Ord, J., Hyndman, R., Koehler, A., & Snyder, R. (2008). Forecasting with Exponential Smoothing: The State Space Approach. Springer.
[18] Radinski, K., Svore, K., Dumais, S. T., Teevan, J., Horvitz, E., & Bocharov, A. (2012). Modeling and predicting behavioral dynamics on the Web. In Proceedings of WWW 2012.
[19] Resnick, P., & Varian, H. R. (1997). Recommender systems. Communications of the ACM, 40(3), 56–58.
[20] Saracevic, T. (1995). Evaluation of evaluation in information retrieval. In Proceedings of SIGIR 1995 (pp. 138–146).
[21] Shah, C., & Croft, W. B. (2004). Evaluating high accuracy retrieval techniques. In Proceedings of SIGIR 2004 (pp. 2–9). Sheffield, UK.
[22] Shah, C., & Gonzalez-Ibanez, R. (2011). Evaluating the synergic effect of collaboration in information seeking. In Proceedings of SIGIR 2011 (pp. 913–922). Beijing, China.
[23] Shah, C., & Marchionini, G. (2010). Awareness in collaborative information seeking. Journal of the American Society for Information Science and Technology (JASIST), 61(10), 1970–1986.
[24] Shannon, C. E., & Weaver, W. (1963). The Mathematical Theory of Communication. Urbana, IL: University of Illinois Press.
[25] White, R. W., & Huang, J. (2010). Assessing the scenic route: Measuring the value of search trails in web logs. In Proceedings of SIGIR 2010. Geneva, Switzerland.

Directly Evaluating the Cognitive Impact of Search User Interfaces: a Two-Pronged Approach with fNIRS

Horia A. Maior¹,², Matthew Pike¹, Max L. Wilson¹, Sarah Sharples³
¹Mixed Reality Lab, ²Horizon DTC, ³Human Factors – School of Engineering
University of Nottingham, UK
{psxhama,psxmp8,max.wilson,sarah.sharples}@nottingham.ac.uk

ABSTRACT
Recent research has pointed towards further understanding of the cognitive processes involved in interactive information retrieval, with most papers using secondary measures of cognition to do so. Our own research is focused on using direct measures of cognitive workload, using brain sensing techniques with fNIRS. Amongst the various brain sensing technologies, fNIRS is most conducive to ecologically valid user studies, as it is less affected by body movement and can be worn while using a computer at a desk. This paper describes our two-pronged approach, focusing on a) moving fNIRS research beyond simple psychological tests towards actual interactive IR tasks, and b) evaluating real search user interfaces.
Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: Evaluation/methodology, theory and methods

Keywords
Functional near-infrared spectroscopy (fNIRS), Brain-computer interface (BCI), Human cognition, Information processing system, Multiple resource model, Limited resource model

1. INTRODUCTION
The cognitive aspects of Information Retrieval (IR) have repeatedly received focus over time, from Ingwersen's Cognitive Model [11] to recent analyses of cognitive workload during search tasks [2, 10]. The recurring interest is in what users think about at different task stages, and how much mental workload is involved. The benefit of knowing more about the searcher's cognitive state would come from providing better support for their needs, with Wilson et al. suggesting that better designed Search User Interfaces (SUIs) could reduce unnecessary workload on the user [23].

Although some prior work (e.g. [2]) has used indirect techniques to analyse workload during search tasks, the decreasing cost of brain sensing hardware has meant that more recent research is using more objective techniques. Pike et al. [17] and Gwizdka et al. [10] used EEG technology, while Moshfeghi et al. used fMRI to measure workload when making relevance judgements [15]. Each of these technologies has known limitations for studying actual interactive IR behaviour, with EEG being highly affected by even tiny body movement, and fMRI requiring users to lie in a tunnel void of any metal objects. Recent Human-Computer Interaction research has listed the benefits of fNIRS brain sensing techniques, which are less affected by body movement and can be more easily used in ecologically valid study conditions.

Functional Near Infrared Spectroscopy (fNIRS) is an emerging neuroimaging technique that is non-invasive, portable, inexpensive, and suitable for periods of extended monitoring. fNIRS measures the hemodynamic response: the delivery of blood to active neuronal tissues. fNIRS is designed to be placed directly upon a participant's scalp, typically targeting the prefrontal cortex. This paper describes our two-pronged approach to using fNIRS to study the cognitive workload created by SUIs, focused on a) task analysis and b) SUI analysis.

2. RELATED WORK
Understanding the cognitive aspects of interactive searching (as well as interaction in general) has been a long-standing goal for researchers in the field of Interactive IR. In the 1970s, Bates suggested that searchers employ both search tactics and idea tactics [7]. In an attempt to explain an individual's path during IR, Bates' "Berrypicking" model [8] argued that search will vary as the user recognises information and has new ideas and questions.

In the main cognitive evolution of information seeking research, Ingwersen proposed a cognitive model of IR [11], in which the searcher's understanding of the document collection, system, and task determines which path a search will take. The model again put the user's cognition at the central point of interest. More recently, Joho [12] argued that the cognitive effects typically observed in Psychology could provide a potential building block of theoretical development for evaluating interactive IR. Back et al. [2], for example, examined the cognitive demands on users during the relevance judgement phase, suggesting that the amount of workload involved was the reason searchers rarely provided relevance judgements in previous work. Using a secondary measure, the Stroop task, Gwizdka [10] mapped varying levels of workload at multiple stages of search.

More recently, researchers have focused on objectively measuring interactive IR phases. In line with Back et al.'s work, Moshfeghi et al. measured workload during relevance assessments by asking people to make judgements while lying in an fMRI machine [15]. As making relevance judgements can be performed without directly interacting with a computer, this made the use of an fMRI machine more realistic. Using more commercialised tools, Anderson [1] used an EEG sensor to compare visualization techniques in terms of the burden they place on a viewer's cognitive resources. Similarly, Pike et al. [17] developed a prototype tool named CUES that was capable of collecting a variety of data, including EEG, whilst a user interacted with a website. Pike et al. used this to monitor aspects such as frustration and concentration, but their work demonstrated the variability of EEG data across the several minutes involved in an interactive IR task.

Using fNIRS, as introduced above, Peck [16] performed a similar study of different visualisation techniques, while a system called Brainput [18] was able to identify and correlate brain activity patterns among users during multitasking studies, and intervene when it sensed workload exceeding a certain level. Our work intends to build upon these HCI studies, to study interactive IR tasks and SUIs in more ecologically valid user study situations.

3. RESEARCH PATHS
Pike et al. [17] highlighted the challenges of using brain sensing technologies to evaluate IIR tasks: tasks have different stages, behaviour quickly diverges after the first interaction (and thus is hard to compare), and brain measurements vary dramatically over time. In order to address these challenges, we have initiated two clear research paths, both utilising fNIRS technology: 1) evaluating the cognitive aspects of interactive IR tasks, and 2) methods to evaluate the design of SUIs.

The aim of the first path is to move beyond using fNIRS to measure workload in simplistic psychology memory tasks (like Peck et al. [16]), towards being able to break down real search tasks into primary components. This implies three considerations:

• Collected data would be meaningless if it is not related to existing knowledge. Therefore, to interpret sensed fNIRS data we use proposed theories and models.

• It is known that fNIRS can sense cognition information [19, 16] related to so-called working memory (if placed on the forehead). Assuming this is correct, we are using models of working memory.

• The proposed models will help us interpret the data sensed with fNIRS and gain a better understanding of the cognitive impact of various complex tasks (such as IR).

Such a technique would allow researchers to analyse data by stage, and find effective points of comparison during several minutes of continuous measurements. The second path is focused on identifying which aspects of working memory are affected by different features of SUIs, such that researchers can objectively evaluate the effect of different SUI design decisions. A combination of both paths works towards being able to proactively evaluate how SUIs support searchers.

4. PATH 1: WORKLOAD MODELS
To understand the cognitive aspects of IIR, it is essential to learn about users' capabilities and limitations in terms of their cognition: how people perceive, think, remember, and process information. This path of research focuses on existing models from Cognitive Psychology and Human Factors, models that conceptualize and highlight aspects that typically describe or influence elements of human cognition.

One important part of cognition during interactive searching involves human memory systems. There are two different types of memory [21]: working memory (sometimes called short-term memory) and long-term memory. Wickens describes working memory as the temporary holding of information that is "active", while long-term memory involves the unlimited, passive storage of information that is not currently in working memory.

Working memory. Working memory, proposed by Baddeley and Hitch (1974) [6], refers to a specific system in the brain which "provides temporary storage and manipulation of information..." [3]. Working memory [6, 4, 5] processes information in two forms, verbal and spatial, and has four main components (Figure 1):

• A central executive managing attention, acting as a supervisory system and controlling the information from and to its "slave systems".

• A visuo-spatial sketchpad holding information in an analogue spatial form (e.g. colours, shapes, maps, etc.), specialised in learning by means of visuospatial imagery.

• A phonological loop holding verbal information in an acoustical form (e.g. numbers, words, etc.), specialised in learning and remembering information using repetition.

• An episodic buffer dedicated to linking verbal and spatial information in chronological order. It is also assumed to have links to long-term memory.

Figure 1: Baddeley's Working Memory Model

Information processing system. As humans, we are exposed to large amounts of information via our sensory systems. One of our strengths is in selecting information from our environment, perceiving it, processing it, and creating a response. We can therefore use this understanding of brain activity to identify which elements of an interactive IR environment need to be considered when measuring brain activity, and how we can reduce, rather than increase, a user's mental workload via interface and system design.

Wickens' Information Processing Model [21] aims to illustrate how elements of the human information processing system, such as attention, perception, memory, decision making, and response selection, interconnect. We are interested in observing how and when these elements interconnect during IR. He describes three different 'stages' (see the STAGES dimension in Figure 2) at which information is transformed: a perception stage, a processing or cognition stage, and a response stage, the first two being processes involved in cognition. The first stage involves perceiving information gathered by our senses, providing meaning and interpretation of what is being sensed. The second stage represents the step where we manipulate and "think about" the perceived information. This part of the information processing system takes place in working memory and consists of a wide variety of mental activities. In relation to IR, it is interesting to observe how elements of cognition, such as rehearsal of information, planning the search strategy, and deciding on the search keywords, interconnect.

Multiple Resource Model. One model of mental workload that has been widely accepted in Human Factors is Wickens' Multiple Resource Model [20] (Figure 2). The elements of this model overlap with the needs and considerations of evaluating complex tasks (such as IR). He describes the aspects of human cognition and multiple resource theory in four dimensions:

• The STAGES dimension refers to the three main stages of the information processing system (Wickens, 2004 [21]).

• The MODALITIES dimension indicates that auditory and visual perception have different sources.

• The CODES dimension refers to the types of memory encodings, which can be spatial or verbal.

• The VISUAL PROCESSING dimension refers to a nested dimension within visual resources, distinguishing between focal vision (reading text) and ambient vision (orientation and movement).

Figure 2: The 4-D multiple resource model [20]

Our aim is to understand how these elements link together and compose more complex components/tasks. Additionally, we want to consider how complex tasks (such as a search task) can be divided into primary components according to the models described. This will help identify possible problems in SUI design, as well as indicating possible solutions (implications suggested by Wickens [21]):

• Minimize the working memory load of the SUI system and consider working memory limits in instructions;

• Provide more visual echoes (cues) of different types during IR (verbal vs spatial);

• Exploit chunking (Miller, 1956 [14]) in various ways: physical size, meaningful size, superiority of letters over numbers, etc.;

• Minimize confusability;

• Avoid unnecessary zeros in codes to be remembered;

• Encourage regular use of information to increase frequency and redundancy;

• Encourage verbalization or reproduction of information that needs to be reproduced in the future;

• Carefully design information to be remembered.

Resources vs Demands. One other model of interest is the limited resource model [22], describing the relationship between the demands of a task, the resources allocated to the task, and the impact on performance.

Figure 3: Resources available vs task demands, and the impact on performance [22]

The graph in Figure 3 represents the limited resource model. The X-axis represents the resources demanded by the primary task; moving to the right along the axis, the resources demanded by the primary task increase. The axis on the left indicates the resources being used, as well as the maximum available resources (if we think of working memory as limited in capacity). The axis on the right indicates the performance of the primary task (the dotted line on the graph). The key element of this model is the concept of a limited set of resources which, if exceeded, has a negative impact on performance. However, the model does not distinguish between resource modalities; we therefore propose to use both the limited and multiple resource models to inform our work.

5. PATH 2: SUI EVALUATION
Relating quantitative data from brain sensing devices to feedback about SUI designs is one of our ultimate goals in conducting this research. SUIs are inherently information rich and thus affect both visual (results page layout) and verbal (text-based results) memory. Detecting a change in either verbal or spatial working memory would help determine whether a workload difference was caused by the SUI design (spatial) or by the amount of information the design provides (verbal). Our first, in-progress study has stimulated each memory type in different tasks: verbal memory was tested by performing an n-back [13] number memory task, whereas spatial memory was tested using an n-back visual block matrix task. Other studies have also looked at each type of memory and confirmed fNIRS' ability to detect changes in hemodynamic responses accordingly [9].

In addition to developing an understanding of the extent to which we can monitor different types of memory, our initial study also sought to measure the effect of artefacts on the fNIRS data. Controlling the environment and human-derived sources of noise is a potentially difficult factor to manage without affecting the ecological validity of a study. Solovey et al. [19] showed that fNIRS is relatively resilient to motion-derived artefacts when compared to EEG [17], for example, but it still requires some consideration by researchers conducting studies. In our own experience, we found that asking participants to remain as still as possible was fairly successful. We are additionally looking at possible methods for correcting motion-derived artefacts using an external gyroscope connected to the participant.

[3] A. Baddeley. Working memory. Science, 255(5044):556–559, 1992.
[4] A. Baddeley. The episodic buffer: a new component of working memory? Trends in Cognitive Sciences, 4(11):417–423, 2000.
[5] A. D. Baddeley. Is working memory still working? European Psychologist, 7(2):85–97, 2002.
[6] A. D. Baddeley and G. Hitch. Working memory.
The Designing tasks for experiments that measure cognitive ef- psychology of learning and motivation, 8:47–89, 1974. fect via a brain sensor require careful consideration in order [7] M. J. Bates. Idea tactics. JASIST, 30(5):280–289, to ensure that results can be attributed to a cause. Thank- 1979. fully this problem space has been well explored in the field [8] M. J. Bates. The design of browsing and berrypicking of Psychology and we are able to adapt the approaches de- techniques for the online search interface. Online scribed in the literature to suit our task type requirements. Information Review, 13(5):407–424, 1989. A primary example of this adaptation is demonstrated by Peck et al [16], where 2 data visualisations techniques were [9] X. Cui, S. Bray, D. M. Bryant, G. H. Glover, and compared using a methodology based loosely on the n-back A. L. Reiss. A quantitative comparison of NIRS and task - a widely used psychology task that is designed to in- fMRI across multiple cognitive tasks. Neuroimage, crease load on working memory. 54(4):2808–2821, 2011. Additionally, we are interested in exploring standard search [10] J. Gwizdka. Distribution of cognitive load in web studies (without following a psychological study layout) and search. JASIST, 61(11):2167–2187, 2010. seeing whether interesting states can be detected. Solovey [11] P. Ingwersen. Cognitive perspectives of information et al [18] performed a similar function by utilising a ma- retrieval interaction: elements of a cognitive IR chine learning algorithm that had classified “states of inter- theory. Journal of documentation, 52(1):3–50, 1996. est” prior to performing a task. [12] H. Joho. Cognitive e↵ects in information seeking and Using a similar approach, we could evaluate a SUI to de- retrieval. In Proc. CIRSE2009, 2009. termine whether a particular change in layout has a positive [13] W. K. Kirchner. Age di↵erences in short-term or negative impact on visual memory. 
Alternatively, to test retention of rapidly changing information. Journal of the relevance of a results page (which would be dependant experimental psychology, 55(4):352, 1958. on the textual results), we could analyse the e↵ects on verbal [14] G. Miller. The magical number seven, plus or minus memory between 2 varied results pages, we could then re- two: Some limits on our capacity for processing flect these changes to the Wickens Multiple Resource Model information. The psychological review, 63:81–97, 1956. [20]. We are also working towards enabling the interpreta- [15] Y. Moshfeghi, L. R. Pinto, F. E. Pollick, and J. M. tion of data within the context of complex multimodal tasks Jose. Understanding Relevance: An fMRI Study. In to further extending our knowledge of the processes involved Proc. ECIR2013, pages 14–25. Springer, 2013. during IR and how they interact and e↵ect one another. [16] E. M. Peck, B. F. Yuksel, A. Ottley, R. J. Jacob, and R. Chang. Using fNIRS Brain Sensing to Evaluate 6. SUMMARY Information Visualization Interfaces. In Proc. This paper has aimed to summarise our two-pronged ap- CHI2013. ACM, 2013. proach towards actually evaluating the design of search user [17] M. Pike, M. L. Wilson, A. Divoli, and A. Medelyan. interfaces, in realistic ecologically valid study conditions, us- CUES: Cognitive Usability Evaluation System. In ing fNIRS technology. The approach first involves braking EuroHCIR2012, pages 51–54, 2012. down interactive IR tasks into how they e↵ect the di↵er- [18] E. Solovey, P. Schermerhorn, M. Scheutz, A. Sassaroli, ent elements of working memory, and second understanding S. Fantini, and R. Jacob. Brainput: enhancing how SUIs are processed by di↵erent parts of working mem- interactive systems with streaming fNIRs brain input. ory. Our two paths of research will build towards a stage In Proc. CHI2012, pages 2193–2202. ACM, 2012. where we can combine them and objectively evaluate cogni- [19] E. T. Solovey, A. Girouard, K. 
Chauncey, L. M. tive workload involved in interactive IR. We believe that this Hirshfield, A. Sassaroli, F. Zheng, S. Fantini, and R. J. research will provide a novel new direction that SUI’s and Jacob. Using fNIRS brain sensing in realistic HCI indeed HCI in a broader sense can benefit from. The asso- settings: experiments and guidelines. In Proc. ciation of physical recordings in ecological valid settings, to UIST2009, pages 157–166. ACM, 2009. an existing theoretical model, provides a new measure from [20] C. D. Wickens. Multiple resources and mental which future SUI development and evaluation could benefit. workload. The Journal of the Human Factors and Ergonomics Society, 50(3):449–455, 2008. 7. REFERENCES [21] C. D. Wickens, S. E. Gordon, and Y. Liu. An introduction to human factors engineering. Pearson [1] E. W. Anderson, K. Potter, L. Matzen, J. Shepherd, Prentice Hall Upper Saddle River, 2004. G. Preston, and C. Silva. A user study of visualization [22] J. R. Wilson and E. N. Corlett. Evaluation of human e↵ectiveness using EEG and cognitive load. Computer work. CRC Press, 2005. Graphics Forum, 30(3):791–800, 2011. [23] M. L. Wilson. Evaluating the cognitive impact of [2] J. Back and C. Oppenheim. A model of cognitive load search user interface design decisions. EuroHCIR for IR: implications for user relevance feedback 2011, pages 27–30, 2011. interaction. Information Research, 6(2):6–2, 2001. Dynamics in Search User Interfaces Marcus Nitsche, Florian Uhde, Stefan Haun and Andreas Nürnberger Otto von Guericke University, Magdeburg, Germany {marcus.nitsche, stefan.haun, andreas.nuernberger}@ovgu.de, florian.uhde@st.ovgu.de ABSTRACT knowledge available online. Therefore, a proficient tool to anal- Searching the WWW has become an important task in today’s in- yse the structure of the web and to provide guidance to specific formation society. Nevertheless, users will mostly find static search sources of information is needed. 
This task is accomplished by user interfaces (SUIs) with results being only calculated and shown modern search engines like Google2 , Bing3 , Yahoo4 and other lo- after the user triggers a button. This procedure is against the idea cal or topic centred search engines. By the increase of computa- of flow and dynamic development of a natural search process. The tional power in smart phones and wider access to online resources main difficulty of good SUI design is to solve the conflict between the demand for these search tools has risen and the quality of the good usability and presentation of relevant information. Serving a search terms has changed. Instead of single-query-searches, users UI for every task and every user group is especially hard because tend to request complex answers5 , trying to learn about topics in of varying requirements. Dynamic search user interface elements deep. While the need for information and the expectations of users allow the user to manage desired information fluently. They offer increased, matching the broader knowledge base contained in the the possibility to add individual meta information, like tags, to the Internet in the last few years. About 300 Mio. websites were added search process and enrich it thereby. in 20116 . Search engines mainly remain the same. This leads to the fact that a “significant design challenge for web search engine de- Keywords velopers is to develop functionality that accommodates the wide va- Search User Interface, User Experience, Exploratory Search. riety of skills and information needs of a diverse user population” [1]. Therefore, this paper proposes the concept of using dynamic elements in SUIs, that focus on fluent work flow characteristics, a Categories and Subject Descriptors high grade of interactivity and an adequate answer-time-behaviour. H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval.; H.5.2 [Information Interfaces and Presentation]: User Interfaces. 2. 
INFORMATION GATHERING Looking at users’ habits in search, they no longer perform sim- ple lookup searches. There is an increasing need to answer com- General Terms plex information needs. Therefore, we mainly consider informa- Design, Human Factors, Management. tion gathering processes, searches where users are not familiar with the domain. Users need to refine search queries, branch out into 1. MOTIVATION other queries to gain additional understanding and collect results to Since the launch of the WWW, users accumulated a vast amount of merge them into a single topic. This kind of search process is called information. With broadband technologies becoming a part of ev- exploratory search and is contrary to a known-item search task as eryday life1 the WWW offers a great opportunity in terms of learn- stated in [2]. Exploratory search processes “depend on selection, ing and education. University courses, for instance, are available navigation, and trial-and-error tactics, which in turn facilitate in- online and nearly every topic is handled somewhere in the great creasing expectations to use the Web as a source for learning and amount of blogs, Q&A pages, fora, web pages or databases. Yet exploratory discovery” [3]. Search tasks are fragmented, consist- there is no map, no guide leading through this vast amount of in- ing of single queries and search requests. The search requests may formation. Users need to search for information, to locate the bits yield additional data or parts of the final information which in the fitting to their specific information need, indexing the amount of end form the information requested by the user. While perform- 1 ing such a complex search task, a pattern called berry picking [4] http://www.internetworldstats.com/images/ can be observed. 
While reading through a source of data, looking world2012pr.gif, 02.05.2013 for qualified information the user discovers new traces leading to other sources, which have to be handled one after the next. By re- 2 http://www.google.com, 02.05.2013 3 http://www.bing.com, 02.05.2013 4 http://www.yahoo.com, 02.05.2013 5 see the 2009 HitWise study for more details: http: //image.exct.net/lib/fefc1774726706/d/1/ Presented at EuroHCIR2013. Copyright c 2013 for the individual papers SearchEngines_Jan09.pdf, 10.07.2013 6 by the papers’ authors. Copying permitted only for private and academic http://royal.pingdom.com/2012/01/17/ purposes. This volume is published and copyrighted by its editors. internet-2011-in-numbers/, 02.05.2013 fining the search and gaining deeper information the user satisfies synonyms. By adding and linking those parts the user constructs the initial need for it. These different traces span a map in the end, a boolean query which will be submitted to the Google search en- representing the whole search and its processing. When someone gine. Boolify was built for children and elderly. Tests in a third is learning about something this map is refined and expanded. The grade technology class showed that children without any knowl- learner may track back to a certain node and deepen the understand- edge of boolean queries were able to construct complex queries ing about it by adding new queries, and therefore new branches. Or just by pulling them together piece by piece9 . A similar approach he may discard a whole part of the map because it turned out that was implemented at SortFix10 . This tool offers the user the “abil- the contained information was not relevant to him. When the user ity to drag and drop search terms in between several buckets” [6] is satisfied with the gained information this map is encapsulated to in- and exclude them in the query. With a Standy Bucket users and represents the whole development of this complex information. 
are “able to keep track of all [their] inspirations and alternative According to this concept the result is not a single object. It is a set search words off to the side, ready to be dragged and dropped into of sources, representing the learning process for a specific user. your search box if needed.” [6] Another possible use of dynamic interface elements is the weighting of search terms based on their Looking at the current process of information gathering in the In- font size as used at SearchCloud.net11 . The ranked keywords are ternet there are only two places. The Internet itself, containing the shown in a Tag Cloud like manner and additionally the site shows, pool of existing information, in an unstructured form and a mental based on the ranking, “the calculated relevance score for each [re- model about the information (space) that is constructed. This sys- sult]” [6]. Not only the query building process can be enchanted tem may work perfect when dealing with short, exact search queries by dynamic elements, also the presentation of the result can benefit like postal code New York City, but when it comes to complex in- from it. Dynamic side loading can provide the user a lens like view formation needs, where the user needs to access a lot of information to parts of the result where keywords occur. Microsoft’s WaveLens and generate more detailed search queries while looming through “[...] fetches a longer sample for the page containing your key- pages this system reaches it boundaries. The user might retrieve words, without you having to download it.” [8] Microsoft Research only partial facts. For example, if the user needs explanation of a shows that in a study using WaveLens, presenting the participants term used in its initial query. 
The user is now in need of another with a normal interface and two versions of WaveLens’ UI (instant place, where he can store information, reorder it and put it into the zoom and dynamic zoom), “participants were not only slower with context of other information pieces. the normal view than the other two, but they were more than twice as likely to give up” [9]. Another way of result presentation was shown at SearchMe12 : “Fragmentation into multiple sites, domains 3. STATE OF THE ART and identities becomes a huge distraction. User don’t know which Looking at Google, the most used search engine today [5], the user site to visit for which purpose, and the lack of consistent, intuitive interface of a modern search engine is mostly static. Google’s fea- inter-site search and navigation makes it hard to find content [..]” tures include some dynamic elements like real time search. For [6]. All these dynamic features can be used as a mask over tradi- example “[..] Google Suggest which interactively displays sugges- tional SUIs to extend them. By hiding the dynamic part, dynamic tions in a drop-down list as the searcher types in each character of elements can be added to an existing search engine and let the user his/her query. The suggestions are based on similar queries submit- make a choice which part should be shown and used. The proposed ted by other users.” [1] Dynamic previews of results will be offered concept is similar to Byström & Hansen’s approach in [19]. when clicking on the double arrow beside a result. But the core of the interface has not changed a lot since its launch in 19977 . While Issues. Comparing the state of the art with the process of infor- adopting fast to new information sources like Facebook and Twitter, mation gathering some issues appear, which may be resolved or at Google discarded the adoption of new HCI methods in favour of a least damped by using of dynamic elements. While collecting in- clean, slim interface. 
With increasing touch support on the devices, formation pieces for solving complex questions the user discovers a richer user interface can be designed to provide the user with new sources, containing more information. These sources may not immediate feedback and allows haptic interaction with the search form a linear search process every time. Sometimes there will be a process. Some mobile clients take advance of the additional in- split and the user needs to decide which trace to follow first. This formation available, like the iOS search client, which switches to issue is also noted in [10]. Today’s search engines offer only little voice queries when the phone is lifted to the head, but there is no support for this. The user needs to save web pages to favourites or full extension of Google’s search services. While Google is an ad- organize them himself for later reading. Searching different terms equate tool for short queries and queries calling for a direct answer, one by one allows users to follow new pages like traces through features for deep research on complex topics are missing. the Internet. By connecting these traces and setting them into re- lation the user can retrieve the whole information needed to cover One way to integrate dynamic elements into existing SUI infras- his query. Most modern search engines discard this feature, it is tructure is to build an overlay. Thereby, dynamic UI utilize existing, again something the user needs to do by himself. This leads to well known search engines and provide a benefit by enriching them. another more general problem, the enclosing of search queries. This approach is shown in the Boolify8 search engine, which pro- Google for example handles every search term as a new opera- vides a dynamic drag and drop interface on top of Google’s search tion. Data is stored, but contains only general information about engine. 
This engine is relatively new and was build to promote the the user, queries are not related to each other and therefore miss- understanding of boolean queries. Users build a query by drag- ging jigsaw like parts onto a search surface. These parts contain 9 http://ed-tech-axis.blogspot.de/2009/03/ words (general or exact) and linkers like AND and OR. Additional boolified.htm, 02.05.2013 10 parts have been added to provide search on a specific page or for SortFix.com, offline since 11/2011, Firefox plugin: https://addons.mozilla.org/en-us/firefox/ 7 http://www.google.com/about/company/ addon/sortfix-Extension, 02.05.2013 11 history/, 02.05.2013 http://searchcloud.net/, 02.05.2013 8 12 http://www.boolify.org/, 02.05.2013 http://www.searchme.com, offline since 2009 holds an array of parameters, which is used to evaluate every item. Possible criteria are Accuracy, Clarity, Currency and Source Nov- elty. These and more criteria are mentioned and explained in [14]. When a user reorders items to fit his preferences the search engine may use the information provided by this ranking to weight the ex- isting parameters to yield better results in the future. The engine will be able to present results ranked according to the user’s prefer- ence. This can be done for all users and also search process wide, as some search tasks require documents and papers while others may Figure 1: Data flow while refining during search. focus on web pages or media. This addition to classical user inter- faces can make great use of the up-trend for touch based devices, in 2012 89% of mobile phones and smart-books support touch [15]. ing its broader context. But when learning about a complex topic Designing the SUI responsive to touch and gesture is maybe one refining the search query is more important to the user. 
In the iter- of the most natural solutions for human computer interaction and ation of search processes, to narrow down the mass of information adds an amount of possible actions based on gestures. and to tap new sources, the searcher needs to rewrite and modify the query, to link it to other related search tasks. Building a con- Workbench. The workbench targets the issue of loosing informa- nection between parts of information and evaluating it against each tion while switching between different searches. It adds a third other is a core principle of learning. This leaves the user targeting place to the proposed search process, located outside of the search a broader, intense search, in the need to build a custom solution to scope but still related to it. The user may drop queries here to keep extract knowledge and manage it. This is strictly against the guide- them throughout the whole search process. When entering a query, line for online interfaces which suggests to “[..] not require users indicators show how relevant items on the bench are. This allows to remember information from place to place on a Web site” [11] as the user to classify new results in terms of integrity towards already this is a distraction from the main process of searching and destroys selected snippets. The workbench acts as a buffer between search the interaction flow triggered by the search process. queries, adding a broader context to every entry. Like a frame, it contains information exclusively attached to the current search pro- 4. COMPOSING A DYNAMIC SUI cess, leading to the possibility of customization and user centred search environments. When the user switches between queries he The proposed approach shows a design based on today’s search en- can immediately determine how well the new results fit into already gines, enriched with dynamic UI elements to provide a plus for the selected items. This allows identifying false positive as well as ex- user. 
The design includes principles to form web based learning ap- ploratory search [16] results. Users may just enter queries that lead plications [12] to focus on the completion of complex search tasks. to a peripheral topic and check the indicators whether the result is By adding dynamic elements internal states can be visualized for relevant to his initial information. the user to give a better overview about the current position in the search process. Furthermore it will allow the serialization of search Tag Cloud. The tag cloud is another feature to guide the user in the processes and to step in at every point of the process later on. As search process. As shown in [17] a tag cloud supported retrieval stated in Beyond Box Search “different interfaces (or at least dif- system can increase the find rate of adjacent data nodes by nearly ferent forms of interaction) should be available to match different 15%. When adding an item to the workbench its most relevant tags search goals” and “[t]he interface should facilitate the selection of are extracted and visualized in the tag cloud. It is able to show how appropriate context for the search” [13]. Both of this quality mea- often a tag occurs and how different tags are related to each other. surements should be regarded when conceptualizing a SUI. The When entering a new search query the tag cloud displays the rele- first point will be covered by a modular UI, the user may move, vant tags and reorders the cloud to revolve around the current tags. hide and scale elements to fit his current need. The second point By combining distance and size of the entered tag with their direct is strongly bounded to the use of dynamic items in the UI design. neighbours the user can directly spot how homogeneous its current By giving immediate feedback to the user it is easier to classify query is in terms of the whole process. The tag cloud can also use the current results. 
The context of the whole search process will the existing tags to show the user other closely related tags and sug- be persistent over multiple search queries and provide a method of gest query refinement based on tag proximity. Colours can indicate accumulation parts of the search process into a single object. the state a tag is currently in. A possible color scheme for western culture can be based on the three colors used in traffic lights. The Four features are proposed and explained in this paper, showing a concept of three-coloured traffic lights also work for color-blind use-case for dynamic search interfaces and giving a suggestion how people, since they do have a given position. Therefore, we also use this can be accomplished. Together these features build up a mid second coding paradigm: form. A green triangle is proposed for instance to accumulate into a bigger context for a search process. tags resulting from the current query, which are contained in the This clipboard (Fig. 1) reshapes the search process and provide the overall tag cloud spanned by the workbench. An orange circle in- place to store information between search queries. Instead of trying dicates a warning for tags, either in the current query result or the to accumulate knowledge and information directly the user is able bench, which are not related to the rest of the cloud. A red square is to construct a solution of the search query in this buffer and save it avoided for the reason that uncontained tags may not be bad, they as a complete collection of the information retrieval process. can lead to a new direction or add a reasonable value to the whole search process. The tags are scaled depending on their frequency. Reordering. Giving users the opportunity to reorder and therefore When the user selects any item from the bench or the search re- to rate a search result is an important step towards dynamics in sult the corresponding tags are centred. The other tags are located SUIs. 
Every result is handled as a single item and can be picked based on their coherence with the selected tags; closer means the by the user and dropped in another place. The other items reorder tag is in a direct relation to the selected item. A user can quickly fluently, giving user feedback while the user moves on. The SUI Starting as overlays and additional feature of existing search en- gines may develop and emerge into independent solutions. Acknowledgement Part of the work is funded by the German Ministry of Education and Science (BMBF) within the ViERforES II project (01IM10002B). Figure 2: Search map, representing the search process. 6. REFERENCES [1] Sandvig, J. C., Deepinder B.: User Perceptions of Search check the integrity of his search process by looking at the tag cloud. Enhancements in Web Search. In: J. of Comp. Inform. Syst. A slim, packed cloud means the results are all related to each other, 52, no. 2, 2011. an open, wide cloud indicates a broad result field, covering many aspects. False positives may be filtered out, when enough items ex- [2] White, R. W., Marchionini, G.: A Study of Real-Time Query ist, as they stick out the rest of the cloud. Expansion Effectiveness. In: SIGIR Forum 39, 2006. [3] Marchionini, G.: Exploratory Search: From Finding to Search Map Support. The search map (Fig. 2) acts as a representa- Understanding. In: Comm. of the ACM 49, 4.2006. tion of the whole search process, by storing every query and follow- [4] Bates, Marcia J.: The design of browsing and berrypicking ing up querying and visualize it in a chronological order. The user techniques for the online search interface. Univers. of Calif. may select single nodes in the map to get into the state of search at L.A., 1989. process at this moment and refine it. The map provides a kind of top [5] Purcell, K., Brenner, J., Rainie, L.: Search Engine Use 2012. view to the path of the search and shows where the user branched In: Pew Internet & American Life Project, 2013. 
The map allows the user to cut off nodes and whole branches if they are not needed any more to fulfil the information need. As it contains every action and some data of the current search process, the search map can be serialized and stored to retrieve the search process later on. With this map at hand, a user can save whole search tasks just like he saves favourite web pages. He can step back into the process at any time and reconstruct the whole learning process, or correct parts of the search which have proven to be incorrect. This kind of storytelling helps to visualize the given data, "[...] lead to findings, which prompt actions [...] [and] can indicate the need to forage for new data." [18] The search map [7] features two ways of expanding. The user may follow a result to expand it vertically: the result is added as a new node and resides in the map until it is processed further. When the user selects an existing node, he steps back to the vertical position of this node and can now branch out horizontally. This deals with an issue of berrypicking [4], where new sources have to be processed one by one. While not abolishing this, the search map provides a visual representation to simulate parallelism. The map also allows scoping of the analysis by creating a horizontal or vertical bound. Only tags and items inside this bound will be considered; the rest is greyed out. This allows the user to dig deep into a certain topic (small vertical bounds) or to create a better understanding of a certain term and add more results to a certain query (horizontal bounds). This can help the user to concentrate on smaller pieces of a big search process and to narrow down problems one by one.

5. CONCLUSION
This paper has shown certain design flaws of today's search engines and has proposed dynamic design principles to counter them. The application of the envisioned elements can extend a search engine towards software capable of supporting complex research tasks. With the current up-trend of online learning, this unlocks a new way of using them. The surplus resides not only in the dynamic and vivid interface; it prepares a whole new tier of online search solutions. The process of learning can be preserved and shared with others. One can come back at any time, jump right into the saved search process, and reconstruct the development of certain knowledge. With this tool chain at hand, learning becomes a social and integrative part of the WWW. The next step in deploying dynamic elements into search user interfaces is prototyping them: design snippets need to be tested for usability and acceptance in the real world.

Acknowledgement. Part of the work is funded by the German Ministry of Education and Science (BMBF) within the ViERforES II project (01IM10002B).

6. REFERENCES
[1] Sandvig, J. C., Deepinder, B.: User Perceptions of Search Enhancements in Web Search. In: J. of Comp. Inform. Syst. 52, no. 2, 2011.
[2] White, R. W., Marchionini, G.: A Study of Real-Time Query Expansion Effectiveness. In: SIGIR Forum 39, 2006.
[3] Marchionini, G.: Exploratory Search: From Finding to Understanding. In: Comm. of the ACM 49, 4, 2006.
[4] Bates, Marcia J.: The design of browsing and berrypicking techniques for the online search interface. Univers. of Calif. L.A., 1989.
[5] Purcell, K., Brenner, J., Rainie, L.: Search Engine Use 2012. In: Pew Internet & American Life Project, 2013.
[6] Bates, M. E.: Make Mine Interactive. Vol. 31, Issue 10, p. 63, 12/2008.
[7] Heer, J., Viégas, F. B., Wattenberg, M.: Voyagers and voyeurs: Supporting asynchronous collaborative visualization. In: Commun. of the ACM 52, No. 1, pp. 87–97, ACM, New York, NY, USA, 01/2009.
[8] MS Research: Cutting Edge. New Scientist 181, no. 2434, 2004.
[9] Paek, T., Dumais, S., Logan, R.: WaveLens: A new view onto Internet search results. In: Proc. of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04), pp. 727–734, 2004.
[10] Morville, P., Callender, J.: Search Patterns: Design for Discovery. O'Reilly, 2010.
[11] U.S. Department of Health and Human Services: Research-Based Web Design and Usability Guidelines. Washington, D.C.: GPO, n.d.
[12] Jayasimman, L., Nisha Jebaseeli, A., Prakashraj, E. G., Charles, J.: Dynamic User Interface Based on Cognitive Approach in Web Based Learning. In: Int. J. of CS Iss. (IJCSI), 2011.
[13] Buck, S., Nicholas, J.: Beyond the search box. Reference & User Services 51(3), pp. 235–245, 2012.
[14] Beresi, U. C., Kim, Y., Song, D., Ruthven, I.: Why did you pick that? Visualising relevance criteria in exploratory search. In: Int. J. on Dig. Lib. 11(2), pp. 59–74, 2010.
[15] Lee, D.: The State of the Touch-Screen Panel Market in 2011. Walker Mobile, LLC, SID Information Display Magazine, 3/2011.
[16] White, R. W., Kules, B., Drucker, S. M., schraefel, m. c.: Supporting Exploratory Search. In: Comm. of the ACM 49, 4, 2006.
[17] Trattner, C.: QUERYCLOUD: Automatically linking related documents via search query (Tags) Clouds. In: Proc. of the IADIS Int. Conf. on WWW/Internet, 2010.
[18] Mackinlay, J. D.: Technical Perspective: Finding and Telling Stories with Data. In: Comm. of the ACM 52, 2009.
[19] Byström, K., Hansen, P.: Conceptual framework for tasks in information studies: Book Reviews. In: J. Am. Soc. Inf. Sci. Technol., Vol. 56, 10, pp. 1050–1061, John Wiley & Sons, Inc., New York, NY, USA, 2005.

SearchPanel: A browser extension for managing search activity

Simon Tretter, University of Amsterdam, Amsterdam, The Netherlands, s.tretter@gmail.com
Gene Golovchinsky, FX Palo Alto Laboratory, Inc., 3174 Porter Drive, Palo Alto, CA, gene@fxpal.com
Pernilla Qvarfordt, FX Palo Alto Laboratory, Inc., 3174 Porter Drive, Palo Alto, CA, pernilla@fxpal.com

ABSTRACT
People often use more than one query when searching for information; they also revisit search results to re-find information. These tasks are not well-supported by search interfaces and web browsers. We designed and built a Chrome browser extension that helps people manage their ongoing information seeking. The extension combines document and process metadata into an interactive representation of the retrieved documents that can be used for sense-making, for navigation, and for re-finding documents.
Process metadata capture how many times a document was retrieved, whether it was viewed before, etc.; this kind of information can help searchers to remember, understand, and plan their search processes. The browser plugin enhances the searcher's ability to use process metadata to understand their search results and to plan subsequent activity by displaying surrogates for the current set of retrieved documents.
We represent prior retrieval state, whether a document was opened, and whether it was bookmarked in an integrated overview that appears at the side of the browser window. We also make it possible for searchers to examine multiple documents without returning to the search results or using multiple tabs. The remainder of this paper is organized as follows: we review the relevant related work, describe the browser extension, and conclude with a discussion of the design space.

1. INTRODUCTION
Broder et al. [3] proposed a taxonomy of web search that included transactional and navigational searches in addition to the more traditional (from an IR perspective) informational searches. To this taxonomy we might add re-finding [17][5], the task of locating a previously-found document. From a theoretical perspective, it is not clear whether re-finding is a different kind of search activity or an orthogonal dimension. Regardless, while major web search engines offer simple and efficient interfaces for navigational and transactional searches, relatively little support is available for more complex informational search or re-finding.
These seemingly neglected activities are not unimportant, however: Teevan et al. [17] reported that 39% of queries are re-finding queries; furthermore, 20-30% of searches represent open-ended informational needs [13]. Relatedly, Qvarfordt et al. [11] found query overlap rates of 50-60% in exploratory search, and suggested that awareness of this overlap may be useful in supporting more efficient searching behavior. Thus we decided to explore ways in which searchers' interactions with search engines could be enhanced to support these more complex information-seeking tasks.
We created a web browser extension that enriches common web search engine interfaces and addresses important deficits with respect to open-ended (exploratory) search and re-finding. Our extension helps users find the right document or documents by visualizing metadata of the retrieved pages.
Following Golovchinsky et al. [7] we distinguish document metadata from process metadata. Document metadata – dates of publication, titles, hosting web sites, etc. – are basic characteristics of documents that are independent of the means by which these documents were retrieved. Process metadata, on the other hand, characterize aspects of documents in relation to the searcher's activity.

2. RELATED WORK
There are two broad categories of related work: the management of search history and the representation of search results. Re-finding has received increasing attention recently. While the browser implements some history mechanisms, these are typically not well-suited to users' needs [15]. Elsweiler and Ruthven [5] described different patterns of re-finding; Teevan [16] proposed a mechanism for merging previously-found and newly-retrieved documents. More explicit management of search history has also been investigated in the literature; see [7] for a succinct summary.
Information overload due to large numbers of results is a common problem in information seeking [2]. This problem can be addressed in a variety of ways. MetaSpider [4] uses a 2D map to display and classify retrieved documents. Grokker [8] uses nested circular and rectangular shapes to present results and also shows them in a hierarchically grouped way. Sparkler [12] uses a star plot for the result presentation, where every star represents a document.
One potential issue with the systems above is that the overall organization of the interface itself may induce usability problems. Complex interfaces allow more individual settings to be specified by a user, but simple interfaces allow a broader spectrum of users to use them. This tradeoff is not trivial to handle, and, as we see nowadays, most Web search interfaces tend to be quite simple.
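The document/process metadata distinction used throughout the paper can be made concrete with a small sketch; the field names below are illustrative assumptions, not SearchPanel's actual data model:

```python
# Hypothetical sketch of a result surrogate combining document metadata
# (properties of the document itself) with process metadata (properties of
# the document in relation to the searcher's activity).
from dataclasses import dataclass, field

@dataclass
class DocumentMetadata:
    # Independent of how the document was retrieved.
    title: str
    url: str
    site: str
    published: str = ""

@dataclass
class ProcessMetadata:
    # Tied to the searcher's activity in this and earlier sessions.
    times_retrieved: int = 0
    viewed: bool = False
    bookmarked: bool = False

@dataclass
class ResultSurrogate:
    doc: DocumentMetadata
    proc: ProcessMetadata = field(default_factory=ProcessMetadata)

s = ResultSurrogate(DocumentMetadata("Exploratory search",
                                     "http://example.org", "example.org"))
s.proc.times_retrieved += 1   # updated as the searcher interacts
s.proc.viewed = True
```

A surrogate like this is what an integrated overview could render for each retrieved document.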
Pro- Supporting the searcher’s decision making process can be cess metadata, on the other hand, characterize aspects of crucial for e↵ective search performance for complex infor- mation needs. This support can take the form of enhanced Presented at EuroHCIR2013. Copyright c 2013 for the individual papers by the papers’ authors. Copying permitted only for private and academic surrogates for documents. One type of information often purposes. This volume is published and copyrighted by its editors. used for this purpose is document metadata (author, date, images of the document, etc.). Even et al. [6] has shown Table 1: Design space: Activities and supporting features that the decision making process can be highly improved by related to document and process metadata. ”Doc” refers to adding process metadata (in our case information that is re- document metadata and ”Proc” to process metadata. lated to the search process) to the user interface. Research has shown that presenting simple tasks in a slightly di↵er- Activity Feature Doc Proc ent way may help the user to understand how the search Search perform search yes no is performing and what can be done to gain better results switch engine no yes [18]. One common example of incorporating process meta- results list yes no data in web browsers is the practice of changing the color of visit status no yes a traversed link anchor. visualize no. of visits no yes Spoerri [14] showed that users can benefit from di↵erent Navigation access results - - or additional visualizations of web search results. However, mark current result path no yes none of the techniques above have been integrated by major identify results: preview yes no search engines into their main interfaces. In some cases, ex- snippet tension developers have enhanced the user experience of web identify results: favicon yes no search. 
Examples include: SearchPreview[9] that fetches Organization bookmarking no yes screen shots of the result pages and shows them directly organize bookmarks no yes next to the each search result. Bettersearch[1] is a Firefox extension that performs a similar task, but also enriches the When searchers find useful web pages, they may wish to result page with more features and links. For example, this save those documents for future access. More specialized extention allows users to open a result in a new tab, or adds search engines sometimes support this capability directly, links to a search result to quickly show the web page on the but it is most often supported only by the browser’s book- ”Wayback Machine”1 . WebSearch Pro [10] is also a Firefox marking capability. extension that adds the ability to look up a text by high- We can consider these search and sense-making activities lighting it on a page. Another feature is drag&drop zones in light of the kinds of information required to satisfy them. to search for things directly from any website. In particular, Table 1 shows when document and process metadata might be pertinent for the di↵erent categories of search activities. A representation of the number of visits 3. BROWSER EXTENSION to a retrieved result (process metadata) could be used by a To compensate for the deficiencies of SERPs we created a searcher to decide how to interact with that result. In a re- browser extension called SearchPanel. This extension com- finding sub task, for example, searchers might want to ignore bines document and process metadata in a visual represen- newly-found documents or pages that were not opened. tation of search results to help people manage their infor- The purpose of the search panel is to complement the mation seeking. We chose the browser extension approach SERP and to be available when exploring search results; we rather than creating a proxy for several reasons. 
While both offer the potential of parsing and augmenting SERP and document pages, a browser extension has some advantages. It scales better with respect to storing user history data. It ensures a higher level of data privacy, since data that might potentially reveal user interests (e.g., query keywords, selected URLs, etc.) can be logged as hashed values. Finally, it has access to bookmarks and local browsing history.

3.1 Design space
When performing search tasks, searchers may need different kinds of information to support their information seeking. We represent the design space as consisting of three categories of activities: search activity, navigation activity, and organization activity.
Historically, web UI support for the search process, or search activity, has been focused on query formulation and understanding the current query. Web browsers offer limited support for comparing the current result set with earlier activity by marking the visited status of documents.
When engaged with a search task, users need to shift their attention between the SERP and the retrieved pages. In some cases, the searcher does not find the desired information in a retrieved document, but rather in links to other documents containing relevant information. This navigation activity can be an important part of the information seeking process.

3.2 Implementation
SearchPanel displays automatically on the right side of the browser window when it is enabled (Figure 1). The right side of the content page has been chosen because this location is frequently free of document content. In cases of overlap, its vertical position can be adjusted manually to accommodate page content that may be occluded.
SearchPanel displays immediately after a search has been performed on a supported web search engine (currently Google, Google Scholar, Yahoo, Bing and Microsoft Academic Search). SearchPanel remains visible even if the searcher follows links from retrieved documents. In addition, searchers can return directly to the original query, or re-run it on a different search engine.
A short tutorial page is displayed at installation, and can also be reached through the option menu. This page also allows logging (see 3.2.4) to be disabled, and can be used to delete the recorded history.

1 The Wayback Machine is a service that provides access to archived and historical versions of web sites.

Figure 1: SearchPanel control annotated to show important aspects. 1: search engine selector; 2: bar representing a newly-found page; 3: favicon representing the site from which the page was retrieved; 4: bar representing a page that has been visited; 5: highlighted bar based on cursor position; 6: bookmark indicator; 7: currently-selected page.
Figure 2: Highlighting of snippet on the SERP when mousing over SearchPanel.
Figure 3: Snippets of other pages are shown on a document page when mousing over SearchPanel.

3.2.1 Document metadata
SearchPanel displays several kinds of document metadata. Documents are represented by bars arranged in order corresponding to the retrieved list; clicking on a bar is equivalent to clicking on a link on the SERP.
Almost all websites have icons (favicons) to help re-identify the web page quickly; these icons are shown to the right of the bar (see Figure 1, item 3). A tooltip with the title of the document is added to each bar as well. We considered identifying other metadata such as document MIME type, but that would incur the overhead of a separate HTTP request for each document; at least initially, we chose not to pursue this strategy.

3.2.2 Process metadata
Process metadata is also incorporated into SearchPanel. First, the icon of the search engine that ran the search is highlighted in the top bar (item 1). Other icons represent available comparable search engines; clicking on one of these icons re-runs the query with the selected search engine. Search engines are grouped into two categories (web search and academic research) and only the relevant ones are shown. The current selection (highlighted with a black border) links back to the search result page if the user navigates to one of the retrieved documents.
Each bar can have one of three different colors, depending on the link history. If a link has never been retrieved before, the state of the link is "new" and the color will be teal. Results that have been retrieved by prior queries but have not been clicked on are colored blue. Visited links are colored violet. The local browser history is examined to retrieve the link status; this allows us to incorporate page views that occurred before SearchPanel was installed.
Each bar's length reflects the frequency of retrieval of the corresponding page: the more frequently a page has been retrieved, the shorter the bar gets (item 3). The retrieval history is stored locally in the browser for privacy reasons and can be deleted through SearchPanel's option page.
In SearchPanel, the bookmarking function serves two purposes (item 6 in Figure 1). First, searchers can click on the star to bookmark the corresponding page. Second, previously bookmarked documents in the SERP will show a yellow star next to them. This allows a web page to be re-found more quickly, as the user does not need to navigate to a document to know if they have previously bookmarked it.

3.2.3 Navigational support
The selection indicator (see item 7 in Figure 1) indicates the currently-selected result page. If a link on a result page is clicked, the page indicator will stay on the last retrieved document page to indicate that navigation started with it. Hovering over a result highlights the associated bar (item 5), and also highlights the corresponding snippet in the SERP (Figure 2); the SERP is scrolled as necessary to bring the highlighted snippet into view. Conversely, when the mouse is over a snippet on the SERP, the related bar jiggles left-right to reinforce the connection between the two.
When the user navigates off the SERP to a search result, SearchPanel remains active. Clicking on bars navigates among the retrieved documents, bypassing the intermediate step of reloading the search results. When the mouse is over a bar in SearchPanel, the SERP snippet of that result will be shown. This can be seen in Figure 3, where a preview of the Wolfram Alpha snippet is shown. If the snippet is not available, a tooltip with the document title is shown instead. Both of these features should make it easier and more efficient to navigate the search results without necessarily creating a large number of tabs in the process.

3.2.4 Logging
The extension was created to study people's information seeking behaviors. The goal of the project is to understand how people use the web when looking for information, in order to improve their search experience. Therefore logging of user activity was necessary. To encapsulate it from the basic functionality, it was designed as a plugin that can be connected to or disconnected from SearchPanel. It collects information related to the use of SearchPanel for the purposes of statistical analysis of patterns of behavior.
To maximize searchers' privacy, no personally-identifying information is saved. Queries and found URLs are recorded as MD5-hashed values only.
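The hashed logging scheme described here might look roughly like the following sketch (function and field names are assumptions, not SearchPanel's actual code): recurring queries can be matched by comparing hashes, while the logged values themselves do not reveal their content.

```python
# Sketch of privacy-preserving event logging: queries and URLs are stored
# only as MD5 hashes, so recurring items can still be matched.
import hashlib
import time

def md5_hex(text):
    """Return the MD5 hash of a string as a 32-character hex digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def log_click(query, url, source):
    # 'source' records where the click happened, e.g. "SearchPanel" or "SERP".
    return {
        "time": time.time(),
        "source": source,
        "query_hash": md5_hex(query),
        "url_hash": md5_hex(url),
    }

a = log_click("exploratory search", "http://example.org/a", "SERP")
b = log_click("exploratory search", "http://example.org/b", "SearchPanel")
same_query = a["query_hash"] == b["query_hash"]   # recurring query detected
```

The same comparison works for re-retrieved URLs, which is all the analysis needs.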
This allows us to identify recurring queries and documents without being able to read the content of the query or to observe which pages people view. Specifically, the following information is recorded:
• The IP address and the time the event was logged
• When a search result was clicked and where this happened (SearchPanel or SERP)
• Hash strings that represent the queries and found web pages
• Time spent with the mouse on different interface parts (SearchPanel vs. SERP)
• Various actions related to the extension (adding bookmarks by clicking the star, moving it, etc.)

4. NEXT STEPS
After an in-house pilot deployment, SearchPanel has been made available through the Google Chrome store. The goal of the deployment is to understand whether the extension helps people with their search tasks, and to assess the relative utility of document vs. process metadata. We also expect to collect a dataset that characterizes people's browsing and searching behaviors in terms of patterns of retrieval and re-retrieval, search result navigation, etc.

5. CONCLUSIONS
Web search engines are used for many different kinds of search tasks. While navigational and transactional uses of search engines are well-supported by current interfaces and algorithms, searchers are left to their own devices for more open-ended information seeking and re-finding. We created a Google Chrome browser extension to help people manage their search activity. We explored the design space of document and process metadata related to the wide range of activities searchers may engage in during information seeking. The extension keeps track of retrieval, page visits, and bookmarking, and integrates traces of these activities with document metadata to give people a more complete impression of their search activity. An upcoming deployment will explore the effect that this extension has on how people interact with search results.

6. REFERENCES
[1] ABAKUS. Bettersearch, a Firefox addon for enhancing search engines. http://mybettersearch.com/, 2010. [Online; accessed 06/06/2013].
[2] Baeza-Yates, R., Ribeiro-Neto, B., et al. Modern information retrieval, vol. 463. ACM Press, New York, 1999.
[3] Broder, A. A taxonomy of web search. SIGIR Forum 36, 2 (Sept. 2002), 3–10.
[4] Chen, H., Fan, H., Chau, M., and Zeng, D. MetaSpider: Meta-searching and categorization on the web. Journal of the American Society for Information Science and Technology 52, 13 (2001), 1134–1147.
[5] Elsweiler, D., and Ruthven, I. Towards task-based personal information management evaluations. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (New York, NY, USA, 2007), SIGIR '07, ACM, pp. 23–30.
[6] Even, A., Shankaranarayanan, G., and Watts, S. Enhancing decision making with process metadata: Theoretical framework, research tool, and exploratory examination. In System Sciences, 2006. HICSS '06. Proceedings of the 39th Annual Hawaii International Conference on (2006), vol. 8, IEEE, pp. 209a–209a.
[7] Golovchinsky, G., Diriye, A., and Dunnigan, T. The future is in the past: designing for exploratory search. In Proceedings of the 4th Information Interaction in Context Symposium (New York, NY, USA, 2012), IIIX '12, ACM, pp. 52–61.
[8] Hong-li, Q. A novel visual search engine: Grokker. Journal of Library and Information Sciences in Agriculture 8 (2008), 047.
[9] KG, P. U. & C. SearchPreview, the browser extension previously known as GooglePreview. http://searchpreview.de/, 2013. [Online; accessed 06/06/2013].
[10] Martijn. Web Search Pro, search the web the way you like... http://websearchpro.captaincaveman.nl, 2012. [Online; accessed 06/06/2013].
[11] Qvarfordt, P., Golovchinsky, G., Dunnigan, T., and Agapie, E. Looking ahead: Query preview in exploratory search. In Proceedings of the 36th international ACM SIGIR conference on Research and development in Information Retrieval (New York, NY, USA, 2013), SIGIR '13, ACM.
[12] Roberts, J., Boukhelifa, N., and Rodgers, P. Multiform glyph based web search result visualization. In Information Visualisation, 2002. Proceedings. Sixth International Conference on (2002), IEEE, pp. 549–554.
[13] Rose, D. E., and Levinson, D. Understanding user goals in web search. In Proceedings of the 13th international conference on World Wide Web (2004), ACM, pp. 13–19.
[14] Spoerri, A. How visual query tools can support users searching the internet. In Information Visualisation, 2004. IV 2004. Proceedings. Eighth International Conference on (2004), IEEE, pp. 329–334.
[15] Tauscher, L., and Greenberg, S. How people revisit web pages: empirical findings and implications for the design of history systems. Int. J. Hum.-Comput. Stud. 47, 1 (July 1997), 97–137.
[16] Teevan, J. The Re:Search engine: simultaneous support for finding and re-finding. In Proceedings of the 20th annual ACM symposium on User interface software and technology (New York, NY, USA, 2007), UIST '07, ACM, pp. 23–32.
[17] Teevan, J., Adar, E., Jones, R., and Potts, M. A. S. Information re-retrieval: repeat queries in Yahoo's logs. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (New York, NY, USA, 2007), SIGIR '07, ACM, pp. 151–158.
[18] Wang, T. D., Deshpande, A., and Shneiderman, B. A temporal pattern search algorithm for personal history event visualization. Knowledge and Data Engineering, IEEE Transactions on 24, 5 (2012), 799–812.

A System for Perspective-Aware Search

M. Atif Qureshi*†!, Arjumand Younus*†!, Colm O'Riordan*, Gabriella Pasi!, Nasir Touheed†
* Computational Intelligence Research Group, Information Technology, National University of Ireland, Galway, Ireland
! Information Retrieval Lab, Informatics, Systems and Communication, University of Milan Bicocca, Milan, Italy
† Web Science Research Group, Faculty of Computer Science, Institute of Business Administration, Karachi, Pakistan
muhammad.qureshi,arjumand.younus@nuigalway.ie, colm.oriordan@nuigalway.ie, pasi@disco.unimib.it, ntouheed@iba.edu.pk

ABSTRACT
Traditional search engines fail to capture the notion of "perspective" in their search results and at times present results skewed towards a particular topic. In most of these cases even query reformulation fails to retrieve the desired search results, and the underlying reason for such failure is often bias within the document collection itself (e.g., news articles). A perspective-aware search interface enabling users to look into search results for some "perspective" terms may be of great use for certain information needs. In this paper we describe such a system.

Categories and Subject Descriptors
H.1.2 [User/Machine Systems]: Human factors; H.3.3 [Information Search and Retrieval]: Search process

General Terms
Human Factors, Performance

Keywords
Perspective, Wikipedia, Bias

1. INTRODUCTION AND RELATED WORK
It is often the case that when using a search engine for information seeking, users have an underlying intent [1]. Traditional search interfaces fail to capture the user intent for certain topics and at times return results that may be skewed towards a certain perspective. Here, perspective as defined by the Oxford Dictionary refers to a "point of view"1 within the search results that may or may not be what the user is looking for. We explain further through the following motivating examples:
• Consider the case of a user who wishes to find out more about a certain event (say, a bomb attack in a certain region). The search results returned contain a majority of news reports blaming Islam, relating it with terrorism in most of the cases. This prompts the user to explicitly evaluate how much Islam is related to terrorism in the returned search results.
• Consider the case of a user who wishes to find out about the roles and rights of women in Islam, but the search engine returns articles that contain a high amount of terms highlighting oppression against women instead of women's rights and roles. In this case the user is prompted to check the correlation between women and oppression within the search results that have been returned.
Note that the perspective given by most search results (Islam in motivating example (1) and oppression in motivating example (2)) may or may not be aligned with the user's query intent. In case the search results are not aligned with his/her query intent, he/she may be interested in observing the amount of perspective tendencies in various news reports.
This paper proposes the concept of a "perspective-aware" search interface that enables the user to explicitly analyse search results for information from a particular perspective with respect to an issued query. To the best of our knowledge, previous research within Human-Computer Interaction and Information Retrieval has not captured the notion of "perspective" within the information retrieval process. Early research related to Interactive Information Retrieval by Belkin [2] and Ingwersen [6] suggests the integration of cognitive aspects within the information retrieval process; in line with this suggestion we argue for incorporating the essential cognitive element of "perspectives"2 within the search engine interface.
Recently the information retrieval community has turned its attention to diversification of search results, which aims to tackle the issue of query ambiguity on the user side [8]. However, even when formulating a non-ambiguous query, users may have an intent that influences the perspective from which the query terms can be interpreted in a text.
The search results returned contain a ma- which the query terms can be interpreted in a text; in case of jority of news reports blaming Islam relating it with 2 1 According to Wikipedia the definition of perspective states This may also be seen as topic drifts within a document. the following: “Perspective in theory of cognition is the Presented at EuroHCIR2013. Copyright ! c 2013 for the individual pa- choice of a context or a reference (or the result of this choice) pers by the papers’ authors. Copying permitted only for pri- from which to sense, categorize, measure or codify experi- vate and academic purposes. This volume is published and copy- ence, cohesively forming a coherent belief, typically for com- righted by its editors.. paring with another.” Figure 1: Entry Point of Perspective-Aware Search Interface the entry point of the interface which resembles the standard type-keywords-in-entry-form interface with the augmenta- tion of an additional input text box for entry of perspective terms. The underlying perspective detection algorithm makes use of the encyclopedic structure in Wikipedia; more specifi- cally the knowledge encoded in Wikipedia’s graph structure is utilized for the discovery of various perspectives in docu- ments returned by the search engine. Wikipedia is organized into categories in a taxonomy-like3 structure (see Figure 2). Each Wikipedia category can have an arbitrary number of subcategories as well as being mentioned inside an arbitrary number of supercategories (e.g., category C4 in Figure 1 is a subcategory of C2 and C3 , and a supercategory of C5 , C6 and C7 .) Furthermore, in Wikipedia each article can belong to an arbitrary number of categories, where each category is Figure 2: Wikipedia Category Graph Structure along a kind of semantic tag for that article [11]. 
As an example, with Wikipedia Articles in Figure 2, article A1 belongs to categories C1 and C10 , article A2 belongs to categories C3 and C4 , while article A3 belongs to categories C4 and C7 . It can be seen that the perspective mismatch between the user intent and the doc- articles and the Wikipedia Category Graph are interlinked uments returned in first positions by a search engine, users and our system makes use of these interlinks for the detec- may find the retrieved results annoying or subjective to a tion of a certain perspective within a document retrieved by non-agreed perspective [7]. One may argue that a query re- the search engine. formulation technique could be employed to tackle this prob- lem [5]; e.g. considering the motivating example (2), the user could issue a reformulated query such as “roles and rights of 2.1 Underlying Algorithm women in islam”. However, for some topics query reformu- The underlying perspective detection algorithm within our lation may fail to retrieve the desired search results, and the system requires the perspective term/phrase to match the underlying reason for such failure is often the bias within the title of a Wikipedia article. This may seem to impose a cog- document collection itself (e.g., news articles) [10]. Under nitive load on the user at search time. However, this is not such a scenario it would be interesting to provide a search the case: as shown in Figure 3 the entered text automati- interface that would enable the users to look into the search cally turns green when a certain user-specified perspective results for some “perspective” terms and we describe such a term matches the title of a Wikipedia article, and symmet- system in this paper. rically the entered text automatically turns red in case of a mismatch. Once the perspective term is entered correctly the system 2. 
Figure 3: Automatic Text Color Changing to Test Match of Perspective Term with Wikipedia Article Title

Once the perspective term is entered correctly, the system fetches the Wikipedia article corresponding to the perspective term, referred to as the Seed Perspective Article (PAseed), along with the categories to which it belongs; we use PC0⁴ to refer to these categories. After fetching the Wikipedia categories in PC0, the system retrieves sub-categories of PC0 until depth 2, i.e., PC1 and PC2⁵; collectively, these categories related to PAseed are referred to as PC (where PC is the union of PC0, PC1 and PC2). Next, the set of all articles within the Wikipedia category set PC is retrieved, and we refer to this set as the Expanded Perspective Article Set (PAexpanded). The system then retrieves all categories associated with the set PAexpanded, which we refer to as WC; note that PC is a subset of WC. Finally, the intersection between PC and WC is retrieved, which is a set of categories representative of the domain of the perspective term originally input by the user; we refer to this set of representative categories as RC.

⁴ These are basically perspective categories at depth zero.
⁵ These are basically perspective categories at depth one and two.
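The set construction described above can be sketched as follows, assuming the category graph is available as plain mappings. The function name and the dict-based graph model are illustrative stand-ins for the authors' pre-indexed Wikipedia API; the sample data mirrors the article/category example of Figure 2.

```python
# Sketch of the category-set construction (PC, PA_expanded, WC, RC).
# The dict-based graph access is a hypothetical stand-in for the paper's
# custom pre-indexed Wikipedia API.

def build_perspective_sets(seed_article, article_cats, subcats):
    """article_cats: article -> set of categories it belongs to;
    subcats: category -> set of its direct sub-categories."""
    pc0 = set(article_cats[seed_article])               # categories of PA_seed
    pc1 = {s for c in pc0 for s in subcats.get(c, ())}  # sub-categories, depth 1
    pc2 = {s for c in pc1 for s in subcats.get(c, ())}  # sub-categories, depth 2
    pc = pc0 | pc1 | pc2                                # PC = PC0 u PC1 u PC2
    # PA_expanded: all articles belonging to any category in PC
    pa_expanded = {a for a, cats in article_cats.items() if cats & pc}
    # WC: all categories associated with the articles in PA_expanded
    wc = {c for a in pa_expanded for c in article_cats[a]}
    rc = pc & wc                                        # RC: representative categories
    return pc, pa_expanded, wc, rc

# Usage, on a tiny graph mirroring Figure 2 (A2 as the seed article):
article_cats = {"A1": {"C1", "C10"}, "A2": {"C3", "C4"}, "A3": {"C4", "C7"}}
subcats = {"C2": {"C4"}, "C3": {"C4"}, "C4": {"C5", "C6", "C7"}}
pc, pa_expanded, wc, rc = build_perspective_sets("A2", article_cats, subcats)
```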
After building the Wikipedia category sets defined above⁶, i.e., PC, RC and WC, we match variable-length n-grams within a document against the articles in the set PAexpanded, and we check the cardinality of RC and WC. The cardinality scores, along with the n-gram frequencies, are used to compute a perspective score for each document.

⁶ The set-building phase is performed through a custom Wikipedia API that has pre-indexed the Wikipedia data and is hence computationally fast. For details see http://www3.it.nuigalway.ie/cirg/prj/WikiMadeEasy.html
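The paper names the ingredients of the score (n-gram matches against PAexpanded article titles, and the cardinalities of RC and WC) but not the exact formula. The sketch below shows one plausible combination; the |RC|/|WC| ratio, the four-level thresholds, and all function names are assumptions made purely for illustration.

```python
# Illustrative perspective scoring. The combination of n-gram frequency
# with the |RC|/|WC| cardinality ratio, and the level thresholds, are
# assumptions; the paper does not specify the exact formula.

def ngrams(tokens, max_n=4):
    """Yield all 1..max_n-grams of a token list as space-joined strings."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def perspective_score(doc_text, pa_expanded_titles, rc, wc):
    """Count document n-grams matching PA_expanded article titles and
    weight the count by the |RC|/|WC| cardinality ratio."""
    tokens = doc_text.lower().split()
    titles = {t.lower() for t in pa_expanded_titles}
    matches = sum(1 for g in ngrams(tokens) if g in titles)
    ratio = len(rc) / len(wc) if wc else 0.0
    return matches * ratio

def adherence_level(score):
    """Map a raw score to the paper's four levels; thresholds are illustrative."""
    if score == 0:
        return "Neutral"
    if score < 2:
        return "Low"
    if score < 5:
        return "Medium"
    return "High"

# Usage: two title matches and a ratio of 1.0 give a score of 2.0 ("Medium").
titles = {"The War on Terrorism", "Osama bin Laden", "Ayman al-Zawahiri"}
rc = wc = {"C3", "C4", "C7"}
score = perspective_score("the war on terrorism and osama bin laden",
                          titles, rc, wc)
```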
2.2 Search Results Presentation

The perspective scores computed in Section 2.1 are displayed within the search results; based on the perspective score a document receives, we define four levels of perspective adherence: a) High, b) Medium, c) Low, and d) Neutral. Moreover, for documents with high, medium and low scores we also report the top-scoring perspective terms that were extracted using the Wikipedia graph structure, as explained previously. A sample search corresponding to the search query "india pakistan relations" and the perspective term "terrorism" is shown in Figure 4. As evident from the top search result, there is a high perspective of terrorism within the returned document, and the perspective terms that our algorithm fetches are as follows: a) the war on terrorism, b) ayman al zawahiri, and c) osama bin laden.

Figure 4: Search Results within Perspective-Aware Search

3. DISCUSSION

There have been many efforts in information retrieval research to present to users information regarding the relationship between the query and the answer set, and between the query and the document collection. Capturing this information during the retrieval process provides the user with much valuable information (e.g., whether a term is overly specific, or whether a term is ambiguous). Various attempts have been made to tackle this problem, ranging from the definition of snippets, to approaches that cluster search results (Clusty.com), to the presentation of diversified search results in the first positions of the ranked list offered to the users. Recently there has been a resurgence of interest in defining visualization techniques for search results that offer an effective and more informative alternative to the usual and scarcely informative ranked lists. Pioneering visualization systems are represented by TileBars [4] and InfoCrystal [9]; these attempts have aimed to provide the user with more information than that provided by the traditional ranked list.

This additional information can help the user in their search task (e.g., allowing them to navigate the collection more easily, or providing evidence to allow the user to reformulate their query more efficiently). Our proposed system, although related in that we also attempt to give the user an insight into the answer set and its relation to the query, differs in a fundamental manner. Our system, we posit, allows the user to gain insight into the answer set and its relation to the query, but moreover allows the user to gain an insight into a perspective inherent in the answer set. Our system uses an external and collectively created knowledge resource (which is less likely to be biased in a given direction) to obtain extra terms to represent the perspective of interest to the user. This knowledge (the perspective term and related terms) does not modify the query (as would an additional query term), but is instead used to highlight the presence of a perspective in the answer set.

In this paper we have proposed a novel approach for capturing the relationship between a user's query and the returned answer set. We do not rely on evidence in the document collection or the query stream, but instead extract terms from an external source of evidence to help users quickly see the presence of a particular perspective in the document collection and answer set.

4. FUTURE WORK

Having built the system and undertaken preliminary user evaluations⁷, we aim to undertake a complete and systematic review of the approach. This will comprise a number of separate user evaluation tasks. The initial experiments will involve comparing our search approach with and without the perspective-aware component over a number of tasks, to see if the additional context and information provided by our perspective-aware system aids users in a range of information-seeking tasks. Our second planned experiment will focus on people seeking information from newspaper articles, a domain wherein a degree of bias often exists. We wish to explore the users' experience with regard to any perceived bias in the considered corpora.

⁷ The preliminary user evaluations are not reported in this paper.

5. REFERENCES

[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, WSDM '09, pages 5–14, 2009.
[2] N. Belkin. Cognitive models and information transfer. Social Science Information Studies, 4(2–3):111–129, 1984.
[3] M. A. Hearst. 'Natural' search user interfaces. Commun. ACM, 54(11):60–67, Nov. 2011.
[4] M. A. Hearst and J. O. Pedersen. Visualizing information retrieval results: a demonstration of the TileBar interface. In Conference Companion on Human Factors in Computing Systems, pages 394–395, 1996.
[5] J. Huang and E. N. Efthimiadis. Analyzing and evaluating query reformulation strategies in web search logs. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM '09, pages 77–86, 2009.
[6] P. Ingwersen. Cognitive perspectives of information retrieval interaction: Elements of a cognitive IR theory. Journal of Documentation, 52(1):3–50, 1996.
[7] B. J. Jansen, D. L. Booth, and A. Spink. Determining the informational, navigational, and transactional intent of web queries. Inf. Process. Manage., 44(3):1251–1266, May 2008.
[8] R. L. Santos, C. Macdonald, and I. Ounis. Intent-aware search result diversification. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '11, pages 595–604, 2011.
[9] A. Spoerri. InfoCrystal: A visual tool for information retrieval & management. In Proceedings of the Second International Conference on Information and Knowledge Management, pages 11–20, 1993.
[10] A. Younus, M. A. Qureshi, S. K. Kingrani, M. Saeed, N. Touheed, C. O'Riordan, and G. Pasi. Investigating bias in traditional media through social media. In Proceedings of the 21st International Conference Companion on World Wide Web, WWW '12 Companion, pages 643–644, 2012.
[11] T. Zesch and I. Gurevych. Analysis of the Wikipedia category graph for NLP applications. In Proceedings of the TextGraphs-2 Workshop (NAACL-HLT), 2007.