<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Searching Wikipedia: learning the why, the how, and the role played by emotion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hanna Knäusl</string-name>
          <email>hanna.knaeusl@sprachlit.uni-regensburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Science, University of Regensburg, 93040 Regensburg</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Searching Wikipedia has been the focus of study for an increasing number of information retrieval publications. In recent years different IR tasks have used Wikipedia as a basis for evaluating algorithms and interfaces for various types of search tasks, including Question Answering, Exploratory Search, Entity Search and Structured Document retrieval. Although Wikipedia search is associated with these well-defined task types, little is known about why people actually search Wikipedia, what they try to find, how and why they try to find it or the criteria they use to define success. We argue that the way Wikipedia content is generated influences the way it is used, including search behaviour. We are particularly interested in learning about affective aspects of search, which have been suggested to be an important motivating factor in Wikipedia search behaviour, particularly in leisure scenarios. In this position paper we motivate the investigation of Wikipedia search behaviour in the wild and present our ideas on the best way to study this behaviour.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION AND MOTIVATION</title>
      <p>Presented at Searching4Fun workshop at ECIR2012. Copyright January
2012 for the individual papers by the papers’ authors. Copying
permitted only for private and academic purposes. This volume is published and
copyrighted by its editors.</p>
      <p>
        Wikipedia (http://www.wikipedia.org) is a free online encyclopedia which, due to its
open source design and community-based editing policy, has
become one of the largest reference works of all time. The
large volume of information, the breadth of topics covered
and the open-access nature of the collection have made Wikipedia
a natural target of study within the Information Retrieval
research community. Wikipedia is now used as the document
collection for several retrieval evaluation efforts at CLEF [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
and INEX [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and has formed the basis of evaluations in
several IR domains including:
      </p>
      <p>
        Question answering, e.g. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which attempts to
provide answers to questions such as "How fast can a
cheetah run?", sometimes supplementing answers with
additional relevant snippets that might be helpful to
the user.
      </p>
      <p>
        Entity search, e.g. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which assumes the user has
an information need that could be satisfied by a
list of entities that satisfy some properties. A query
might, for example, indicate the type of entities to be
retrieved (e.g., "castle") and distinctive features (e.g.,
"German", "medieval").
      </p>
      <p>
        Structured retrieval, e.g. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which aims to retrieve
relevant parts of documents in a collection in response
to a given information need.
      </p>
      <p>
        Exploratory search, e.g. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], whereby the user has a
poorly defined information need, has little knowledge of
the topic of interest or is unfamiliar with the search
space.
      </p>
      <p>Each of these examples is associated with well-defined
tasks or situations. However, it is unclear how reflective
these tasks are of real-life Wikipedia search behaviour. Are
these the most appropriate tasks to be investigating? Are
we evaluating these tasks appropriately? Are there more
pressing aspects that we, as a research community, should
be investigating?</p>
      <p>As a starting point for answering these questions, in the
following section we briefly review research that sheds light on
Wikipedia search behaviour in naturalistic situations.</p>
    </sec>
    <sec id="sec-2">
      <title>SEARCHING WIKIPEDIA</title>
      <p>
        The main source of knowledge about Wikipedia search
behaviour is transaction log analysis. Sakai and
Nogami [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], for example, logged user interaction with a Wikipedia
search interface designed to encourage exploration and the
development of information needs. They discovered that
information needs tend to progress and develop in small steps,
usually within a single query type. For example, users tended to
browse pages from person to person or from place to place.
The implicit structure of Wikipedia most likely
encourages this behaviour.
      </p>
      <p>
        Fissaha and de Rijke [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] also used log analyses to learn
about Wikipedia searches, distinguishing between "directed"
and "undirected" searches by analysing the phrasing of queries.
They also discovered that a large percentage of searches
were undirected and exploratory in nature.
      </p>
      <p>
        Log-based investigations such as these have the advantage
of collecting large quantities of data from naturalistic
situations. However, they are limited in that they say nothing
about the intention of the user, his experience, or the
outcome of the search. For example, the work of Wilson and
Elsweiler [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] asserts that many searches will not be
motivated by information needs per se, but purely by the user
having an interest in a topic. In their work, they found
example search tasks that were motivated by the desire to
achieve a particular mood, emotional or physical state, or
by the presence or need of someone else in the social
context. In such cases, the support the user would need from
the system and the criteria that should be used to evaluate
system performance would be very different to those
currently featured in information retrieval research.
      </p>
      <p>We believe that the way Wikipedia is constructed, i.e.,
collaboratively by a subset of its users, together with the large collection
size, broad topic range, linked structure and
prominence of multimedia content, will mean that
Wikipedia will be used for leisure-time tasks. People are
motivated to create and edit Wikipedia pages because these mirror their
interests. The outcome of such leisure-time use may not always be positive.</p>
      <p>
        For example, Wilson and Elsweiler [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] describe one study
participant reporting frustration that he had again wasted
a lot of time aimlessly browsing eBay. This negative
outcome, realised through a negative emotion, would not be
considered in any current IR methodology.
      </p>
      <p>In the following section we outline our thoughts on what
we believe to be a more suitable study design for learning about
Wikipedia search tasks. We would like to use the workshop
as a platform for discussion to improve on this design.</p>
    </sec>
    <sec id="sec-2a">
      <title>LEARNING ABOUT BEHAVIOUR WITH A LOG / DIARY HYBRID</title>
      <p>We need to design a study that helps us learn about
the user's motivation for searching, his behaviour in response
to this motivation, his satisfaction with the experience, as
well as his emotional response to the experience.</p>
      <p>To investigate these aspects we propose combining the
log-based approaches scholars have used previously with user
diaries. Diary studies offer the ability to capture factual
data in a natural setting, without the distracting influence
of an observer. They also offer the chance to question the
user regarding his motivation to search, as well as the search
process and the feelings and emotions experienced during the
search process.</p>
      <p>Diary studies also have limitations. These include
difficulties in maintaining participant dedication levels throughout
the period of study and getting the participants to remember
that situations of interest should be recorded. These
negative aspects can be offset, however, through careful study
design. For example, since Wikipedia is digital and accessed
within a web browser, it makes sense to use a digital diary
that can also be filled out in a web-browser session, perhaps
as a pop-up. We plan to build an extension to the Firefox
web browser that detects when a Wikipedia page is accessed.
If a certain time threshold has elapsed since the last
diary entry, the user will be asked to record details about
his information need and the motivating situation surrounding
the search. The extension will also record interactions with
Wikipedia (e.g. pages viewed, search queries submitted etc.),
allowing analyses similar to those published previously to be
complemented by the diary study data.</p>
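      <p>The prompting rule described above can be sketched as follows. This is only an illustrative sketch in Python, not the planned Firefox extension itself. The class name DiaryPrompter, the method names, the 30-minute default threshold and the hostname check are all assumptions introduced for illustration, since no concrete values or interfaces have been fixed yet.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class DiaryPrompter:
    """Hypothetical helper: decides when a page visit should trigger a diary prompt."""

    # Assumed threshold (30 minutes); the study design leaves this value open.
    min_gap_seconds: float = 1800.0
    # Time of the last completed diary entry, or None if there is none yet.
    last_entry_time: Optional[float] = None
    # Interaction log of Wikipedia page visits (time stamp and URL).
    interaction_log: List[dict] = field(default_factory=list)

    @staticmethod
    def is_wikipedia(url: str) -> bool:
        """Return True if the URL points to wikipedia.org or one of its subdomains."""
        host = urlparse(url).hostname or ""
        return host == "wikipedia.org" or host.endswith(".wikipedia.org")

    def on_page_visit(self, url: str, now: float) -> bool:
        """Log a Wikipedia visit and return True if a prompt should be shown."""
        if not self.is_wikipedia(url):
            return False
        self.interaction_log.append({"time": now, "url": url})
        if self.last_entry_time is None:
            # No diary entry so far, so the first Wikipedia visit prompts.
            return True
        return now - self.last_entry_time >= self.min_gap_seconds

    def record_entry(self, now: float) -> None:
        """Remember when the participant last filled out a diary entry."""
        self.last_entry_time = now
```
      </preformat>
      <p>In such a design the caller would show the short diary form whenever on_page_visit returns True and would call record_entry once the participant submits it, while every Wikipedia visit is logged regardless of whether a prompt is shown.</p>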
      <p>To limit the irritation that filling out such a form would
cause and to minimise distraction from the search process, we
plan to ask only two short questions at that time point. The
user will be asked to give a brief description of what they
are looking for and why. This will be enough information
to remind them of the situation at a later time point, when
we ask more detailed questions regarding the experience, the
success of the task, the feelings that arose and the factors
that influenced these. This data will be elicited through a
mixture of fixed and free-form questions.</p>
      <p>We plan to triangulate the data collected from the
various aspects of our study to create a rich understanding of
user needs and behaviour. For example, we plan to look
at the content of visited pages (the topic, the kind of
media used etc.) and to see how this relates to how
participants describe their experiences. We want to see what
affects user behaviour, e.g. whether the link structure, the
way information is presented, or certain content influences
behaviour or the emotions experienced. The different sources of
data we will collect will help us to learn about these
complicated behavioural aspects.</p>
    </sec>
    <sec id="sec-3">
      <title>CONCLUSIONS</title>
      <p>So what will we learn from the study and why is it
important? The most important point is to find out what makes
the users happy: what do they need, how do they behave
to meet these needs, and what emotional aspects are involved
when Wikipedia is searched? An understanding of these
issues will inform us about the kind of functionality a Wikipedia
search tool should offer. Do users want to browse to related
topics? Do they like a wide range of potentially interesting
information, or do they just quickly look up pieces of information as and
when they are needed? The proposed study would offer the
chance to answer these questions by providing naturalistic
data, as well as additional comments from the participants
of interest.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Adafre</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>de Rijke</surname>
          </string-name>
          .
          <article-title>Exploratory search in wikipedia</article-title>
          .
          <source>In Proceedings SIGIR 2006 workshop on Evaluating Exploratory Search Systems</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Firan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Iofciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krestel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          .
          <article-title>Why finding entities in wikipedia is difficult, sometimes</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <volume>13</volume>
          :
          <fpage>534</fpage>
          -
          <lpage>567</lpage>
          ,
          <year>2010</year>
          .
          <pub-id pub-id-type="doi">10.1007/s10791-010-9135-7</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>INEX</surname>
          </string-name>
          .
          <article-title>Initiative for the evaluation of XML retrieval</article-title>
          ,
          <year>2006</year>
          . url: http://inex.is.informatik.uni-duisburg.de/2006/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Jijkoun</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>de Rijke</surname>
          </string-name>
          .
          <article-title>Overview of WiQA 2006</article-title>
          . In A. Nardi, C. Peters, and J. Vicedo, editors,
          <source>Working Notes CLEF 2006</source>
          ,
          <month>September</month>
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kules</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Capra</surname>
          </string-name>
          .
          <article-title>Designing exploratory search tasks for user studies of information seeking support systems</article-title>
          .
          <source>In Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, JCDL '09</source>
          , pages
          <fpage>419</fpage>
          -
          <lpage>420</lpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sakai</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Nogami</surname>
          </string-name>
          .
          <article-title>Serendipitous search via wikipedia: a query log analysis</article-title>
          .
          <source>In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <source>SIGIR '09</source>
          , pages
          <fpage>780</fpage>
          -
          <lpage>781</lpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Wilson</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Elsweiler</surname>
          </string-name>
          .
          <article-title>Casual-leisure searching: the exploratory search scenarios that break our current models</article-title>
          .
          <source>In 4th International Workshop on Human-Computer Interaction and Information Retrieval</source>
          ,
          <year>Aug 2010</year>
          . New Brunswick, NJ.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>