=Paper=
{{Paper
|id=None
|storemode=property
|title=Searching Wikipedia: Learning the Why, the How, and the Role Played by Emotion
|pdfUrl=https://ceur-ws.org/Vol-836/paper5.pdf
|volume=Vol-836
}}
==Searching Wikipedia: Learning the Why, the How, and the Role Played by Emotion==
Hanna Knäusl
Department of Information Science
University of Regensburg
93040 Regensburg
hanna.knaeusl@sprachlit.uni-regensburg.de
ABSTRACT

Searching Wikipedia has been the focus of study for an increasing number of information retrieval publications. In recent years different IR tasks have used Wikipedia as a basis for evaluating algorithms and interfaces for various types of search tasks, including Question Answering, Exploratory Search, Entity Search and Structured Document Retrieval. Despite being associated with these well-defined task types, little is known about why people actually search Wikipedia, what they try to find, how and why they try to find it, or the criteria they use to define success. We argue that the way Wikipedia content is generated influences the way it is used, including search behaviour. We are particularly interested in learning about affective aspects of search, which have been suggested to be an important motivating factor in Wikipedia search behaviour, particularly in leisure scenarios. In this position paper we motivate the investigation of Wikipedia search behaviour in the wild and present our ideas on the best way to study this behaviour.

Presented at the Searching4Fun workshop at ECIR 2012. Copyright January 2012 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

1. INTRODUCTION AND MOTIVATION

Wikipedia (http://www.wikipedia.org) is a free online encyclopedia which, due to its open-source design and community-based editing policy, has become one of the largest reference works of all time. The large volume of information, the breadth of topics covered and the open-access nature of the collection have made Wikipedia a natural target of study within the Information Retrieval research community. Wikipedia is now used as the document collection for several retrieval evaluation efforts at CLEF [4] and INEX [3], and has formed the basis of evaluations in several IR domains, including:

• Question answering, e.g. [4], which attempts to provide answers to questions such as “How fast can a cheetah run?”, sometimes supplementing answers with additional relevant snippets that might be helpful to the user.

• Entity search, e.g. [2], which assumes the user has an information need that could be satisfied by a list of entities sharing certain properties. A query might, for example, indicate the type of entities to be retrieved (e.g., “castle”) and distinctive features (e.g., “German”, “medieval”).

• Structured retrieval, e.g. [3], which aims to retrieve relevant parts of documents in a collection in response to a given information need.

• Exploratory search, e.g. [5], whereby the user has a poorly defined information need, has little knowledge of the topic of interest, or is unfamiliar with the search space.

Each of these examples is associated with well-defined tasks or situations. However, it is unclear how well these tasks reflect real-life Wikipedia search behaviour. Are these the most appropriate tasks to be investigating? Are we evaluating these tasks appropriately? Are there more pressing aspects that we, as a research community, should be investigating?

As a starting point for answering these questions, in the following section we briefly review research that sheds light on Wikipedia search behaviour in naturalistic situations.

2. SEARCHING WIKIPEDIA

The main source of knowledge about Wikipedia search behaviour comes from transaction log analyses. Sakai and Nogami [6], for example, logged user interaction with a Wikipedia search interface designed to encourage exploration and the development of information needs. They discovered that information needs tend to progress and develop in small steps, usually within the same query type. For example, users tended to browse pages from person to person or from place to place. The implicit structure of Wikipedia most likely encourages this behaviour.

Fissaha Adafre and de Rijke [1] also used log analyses to learn about Wikipedia searches, distinguishing between “directed” and “undirected” searches by analysing the phrasing of queries. They found that a large percentage of searches were undirected and exploratory in nature.

Log-based investigations such as these have the advantage of collecting large quantities of data from naturalistic situations. However, they are limited in that they say nothing about the intention of the user, his experience, or the outcome of the search. For example, Wilson and Elsweiler [7] assert that many searches will not be motivated by information needs per se, but purely by the user
having an interest in a topic. In their work, they found example search tasks that were motivated by the desire to achieve a particular mood, emotional or physical state, or by the presence or needs of someone else in the social context. In such cases, the support the user would need from the system, and the criteria that should be used to evaluate system performance, would be very different from those currently featured in information retrieval research.

We believe that the way Wikipedia is constructed (i.e., collaboratively by a subset of its users), together with its large collection size, broad topic range, linked structure and the prominence of multimedia content, means that Wikipedia will be used for leisure-time tasks. People are motivated to create and edit Wikipedia pages because these pages mirror their interests. Such leisure-time use may not always be a positive experience. For example, Wilson and Elsweiler [7] describe one study participant who reported frustration that he had again wasted a lot of time aimlessly browsing eBay. This negative outcome, realised through a negative emotion, would not be considered in any current IR methodology.

In the following section we outline what we believe to be a more suitable study design for learning about Wikipedia search tasks. We would like to use the workshop as a platform for discussion to improve on this design.

3. LEARNING ABOUT BEHAVIOUR WITH A LOG / DIARY HYBRID

We need to design a study that helps us learn about the user’s motivation for searching, his behaviour in response to this motivation, his satisfaction with the experience, and his emotional response to the experience.

To investigate these aspects we propose combining the log-based approaches that scholars have used previously with user diaries. Diary studies offer the ability to capture factual data in a natural setting, without the distracting influence of an observer. They also offer the chance to question the user regarding his motivation to search, as well as the search process and the feelings and emotions experienced during it.

Diary studies also have limitations. These include difficulties in maintaining participant dedication throughout the study period and in getting participants to remember that situations of interest should be recorded. These negative aspects can, however, be offset through careful study design. For example, since Wikipedia is digital and accessed within a web browser, it makes sense to use a digital diary that can also be filled out in a web-browser session, perhaps as a pop-up. We plan to build an extension to the Firefox web browser that detects when a Wikipedia page is accessed and, if a certain time threshold has elapsed since the last diary entry, asks the user to record details about his information need and the motivating situation surrounding the search. The extension will also record interactions with Wikipedia (e.g. pages viewed, search queries submitted), allowing analyses similar to those published previously to be complemented by the diary study data.

To limit the irritation that filling out such a form would cause, and to minimise distraction from the search process, we plan to ask only two short questions at that point. The user will be asked to give a brief description of what they are looking for and why. This will be enough information to remind them of the situation later, when we ask more detailed questions regarding the experience, the success of the task, how the feelings came about, and the factors that influenced these. This data will be elicited through a mixture of fixed and free-form questions.

We plan to triangulate the data collected from the various aspects of our study to create a rich understanding of user needs and behaviour. For example, we plan to look at the content of visited pages (the topic, the kind of media used, etc.) and examine how this relates to how participants describe their experiences. We want to see what affects user behaviour: for example, whether the link structure, the way information is presented, or certain content influences behaviour or the emotions experienced. The different sources of data we will collect will help us to learn about these complicated behavioural aspects.

4. CONCLUSIONS

So what will we learn from the study, and why is it important? The most important point is to find out what makes users happy: what do they need, how do they behave to meet these needs, and which emotional aspects are involved when Wikipedia is searched? An understanding of these issues will inform us about the kind of functionality a Wikipedia search tool should offer. Do users want to browse to related topics? Do they like a wide range of potentially interesting information, or just quick look-ups of pieces of information as and when they are needed? The proposed study would offer the chance to answer these questions by providing naturalistic data, as well as additional comments from the participants themselves.

5. REFERENCES

[1] S. F. Adafre and M. de Rijke. Exploratory search in Wikipedia. In Proceedings of the SIGIR 2006 Workshop on Evaluating Exploratory Search Systems, 2006.

[2] G. Demartini, C. Firan, T. Iofciu, R. Krestel, and W. Nejdl. Why finding entities in Wikipedia is difficult, sometimes. Information Retrieval, 13:534–567, 2010. doi:10.1007/s10791-010-9135-7.

[3] INEX. Initiative for the Evaluation of XML Retrieval, 2006. URL: http://inex.is.informatik.uni-duisburg.de/2006/.

[4] V. Jijkoun and M. de Rijke. Overview of WiQA 2006. In A. Nardi, C. Peters, and J. Vicedo, editors, Working Notes CLEF 2006, September 2006.

[5] B. Kules and R. Capra. Designing exploratory search tasks for user studies of information seeking support systems. In Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’09, pages 419–420, New York, NY, USA, 2009. ACM.

[6] T. Sakai and K. Nogami. Serendipitous search via Wikipedia: a query log analysis. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’09, pages 780–781, New York, NY, USA, 2009. ACM.

[7] M. L. Wilson and D. Elsweiler. Casual-leisure searching: the exploratory search scenarios that break our current models. In 4th International Workshop on Human-Computer Interaction and Information Retrieval, New Brunswick, NJ, August 2010.