<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A System for Perspective-Aware Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>M. Atif Qureshi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arjumand Younus</string-name>
          <email>arjumand.younus@nuigalway.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Colm O'Riordan</string-name>
          <email>colm.oriordan@nuigalway.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriella Pasi</string-name>
          <email>pasi@disco.unimib.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nasir Touheed</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computational Intelligence Research Group, Information Technology, National University of Ireland</institution>
          ,
          <addr-line>Galway</addr-line>
          ,
          <country country="IE">Ireland</country>
          ;
          <institution>Information Retrieval Lab, Informatics, Systems and Communication, University of Milan Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Traditional search engines fail to capture the notion of "perspective" in their search results and at times present results skewed towards a particular topic. In many such cases even query reformulation fails to retrieve the desired search results, and the underlying reason for such failure is often the bias within the document collection itself (e.g., news articles). A perspective-aware search interface that enables users to examine search results for user-specified "perspective" terms may be of great use for certain information needs. In this paper we describe such a system.</p>
      </abstract>
      <kwd-group>
        <kwd>Perspective</kwd>
        <kwd>Wikipedia</kwd>
        <kwd>Bias</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION AND RELATED WORK</title>
      <p>
        It is often the case that users who turn to a search engine to seek information have an underlying intent [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Traditional search interfaces fail to capture this intent for certain topics and at times return results that are skewed towards a certain perspective. Here, perspective, as defined by the Oxford Dictionary, refers to a "point of view"<sup>1</sup> within the search results, which may or may not be what the user is looking for. We explain this further through the following motivating examples:
      </p>
      <p>(1) Consider the case of a user who wishes to find out more about a certain event (say, a bomb attack in a certain region). The majority of the returned search results are news reports that blame Islam, in most cases relating it to terrorism. This prompts the user to explicitly evaluate how strongly Islam is related to terrorism in the returned search results.</p>
      <p>(2) Consider the case of a user who wishes to find out about the roles and rights of women in Islam, but the search engine returns articles that contain a large number of terms highlighting the oppression of women instead of women's rights and roles. In this case the user is prompted to check the correlation between women and oppression within the returned search results.</p>
      <p>Note that the perspective conveyed by most of the search results (Islam in motivating example (1) and oppression in motivating example (2)) may or may not be aligned with the user's query intent. When the search results are not aligned with their query intent, users may be interested in observing the extent to which various news reports exhibit that perspective.</p>
      <p><sup>1</sup> This may also be seen as topic drift within a document.</p>
      <p>Presented at EuroHCIR2013. Copyright © 2013 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.</p>
      <p>
        This paper proposes the concept of a "perspective-aware" search interface that enables the user to explicitly analyse search results for information from a particular perspective with respect to an issued query. To the best of our knowledge, previous research within Human-Computer Interaction and Information Retrieval has failed to capture the notion of "perspective" within the information retrieval process. Early research on Interactive Information Retrieval by Belkin [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and Ingwersen [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] suggests integrating cognitive aspects into the information retrieval process; in line with this suggestion, we argue for incorporating the essential cognitive element of "perspectives"<sup>2</sup> into the search engine interface.
      </p>
      <p>
        Recently the information retrieval community has turned its attention to the diversification of search results, which aims to tackle query ambiguity on the user side [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
However, even when formulating a non-ambiguous query, users may have an intent that influences the perspective from which the query terms are interpreted in a text. In case of a perspective mismatch between the user intent and the documents returned in the first positions by a search engine, users may find the retrieved results annoying or biased towards a perspective they do not share [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. One may argue that a query reformulation technique could be employed to tackle this problem [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]; e.g., considering motivating example (2), the user could issue a reformulated query such as "roles and rights of women in islam". However, for some topics query reformulation may fail to retrieve the desired search results, and the underlying reason for such failure is often the bias within the document collection itself (e.g., news articles) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In such a scenario it would be useful to provide a search interface that enables users to examine the search results for some "perspective" terms; we describe such a system in this paper.
      </p>
      <p><sup>2</sup> According to Wikipedia, perspective is defined as follows: "Perspective in theory of cognition is the choice of a context or a reference (or the result of this choice) from which to sense, categorize, measure or codify experience, cohesively forming a coherent belief, typically for comparing with another."</p>
    </sec>
    <sec id="sec-2">
      <title>2. PERSPECTIVE-AWARE SEARCH INTERFACE AND IMPLEMENTATION DETAILS</title>
      <p>
        This section presents the essential details of the proposed perspective-aware search interface along with the underlying implementation details. We keep the interface as simple as possible, since research suggests that users are reluctant to switch away from a simple search form [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Figure 1 shows the entry point of the interface, which resembles the standard type-keywords-in-entry-form interface, augmented with an additional input text box for entering perspective terms.
      </p>
      <p>
        The underlying perspective detection algorithm makes use of the encyclopedic structure of Wikipedia; more specifically, the knowledge encoded in Wikipedia's graph structure is used to discover the various perspectives in documents returned by the search engine. Wikipedia is organized into categories in a taxonomy-like<sup>3</sup> structure (see Figure 2). Each Wikipedia category can have an arbitrary number of subcategories and can itself belong to an arbitrary number of supercategories (e.g., category C4 in Figure 2 is a subcategory of C2 and C3, and a supercategory of C5, C6 and C7). Furthermore, each Wikipedia article can belong to an arbitrary number of categories, where each category acts as a kind of semantic tag for that article [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. As an example, in Figure 2, article A1 belongs to categories C1 and C10, article A2 belongs to categories C3 and C4, while article A3 belongs to categories C4 and C7. The articles and the Wikipedia Category Graph are thus interlinked, and our system uses these interlinks to detect a given perspective within a document retrieved by the search engine.
      </p>
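      <p>As an illustration only, the relationships stated above for Figure 2 can be encoded as two adjacency maps. The Python sketch below includes just the edges mentioned in the text (the figure itself may contain more), and the names subcategories, article_categories, supercategories and category_members are hypothetical. Inverting the maps shows that C4 has two parents, which is why the category structure is a graph rather than a strict tree.</p>
      <preformat>
```python
from collections import defaultdict

# Category -> direct subcategories, using only the edges stated in the text.
subcategories = {
    "C2": {"C4"},
    "C3": {"C4"},
    "C4": {"C5", "C6", "C7"},
}

# Article -> categories that act as its semantic tags.
article_categories = {
    "A1": {"C1", "C10"},
    "A2": {"C3", "C4"},
    "A3": {"C4", "C7"},
}


def supercategories(graph):
    """Invert category -> children into category -> direct parents."""
    parents = defaultdict(set)
    for parent, children in graph.items():
        for child in children:
            parents[child].add(parent)
    return dict(parents)


def category_members(article_map):
    """Invert article -> categories into category -> member articles."""
    members = defaultdict(set)
    for article, categories in article_map.items():
        for category in categories:
            members[category].add(article)
    return dict(members)


parents = supercategories(subcategories)
members = category_members(article_categories)
print(sorted(parents["C4"]))  # ['C2', 'C3']
print(sorted(members["C4"]))  # ['A2', 'A3']
```
      </preformat>
      <p>The inverse map from categories to member articles is the direction the detection step needs: starting from a set of categories, it yields the articles whose titles can later be matched in a document.</p>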
    </sec>
    <sec id="sec-4">
      <title>2.1 Underlying Algorithm</title>
      <p>The underlying perspective detection algorithm within our system requires the perspective term or phrase to match the title of a Wikipedia article. This may seem to impose a cognitive load on the user at search time. However, this is not the case: as shown in Figure 3, the entered text automatically turns green when the user-specified perspective term matches the title of a Wikipedia article and, symmetrically, turns red in case of a mismatch.</p>
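      <p>The green/red feedback amounts to a membership test against an index of article titles. The sketch below is a hypothetical approximation: its normalization (underscores and repeated whitespace become single spaces, the first character is upper-cased, the rest stays case-sensitive) mimics MediaWiki's default title handling, but the text does not say how the system itself normalizes input, and the small title set stands in for the system's pre-indexed Wikipedia data.</p>
      <preformat>
```python
import re


def normalize_title(text: str) -> str:
    """Approximate MediaWiki-style title normalization (an assumption):
    underscores and whitespace runs become one space, and only the first
    character is upper-cased; the rest of the title stays case-sensitive."""
    title = re.sub(r"[\s_]+", " ", text).strip()
    return title[:1].upper() + title[1:]


def perspective_feedback(term: str, known_titles: set) -> str:
    """Colour for the perspective box: green on a title match, red otherwise."""
    return "green" if normalize_title(term) in known_titles else "red"


# Hypothetical stand-in for the pre-indexed set of Wikipedia article titles.
titles = {"Terrorism", "Osama bin Laden"}

print(perspective_feedback("terrorism", titles))           # green
print(perspective_feedback("  osama_bin Laden ", titles))  # green
print(perspective_feedback("terorism", titles))            # red
```
      </preformat>
      <p>Because only the first character is case-insensitive under this assumption, typing "Osama Bin Laden" would turn the box red; whether the actual system is stricter or more lenient is not stated.</p>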
      <p>Once the perspective term has been entered correctly, the system fetches the Wikipedia article corresponding to the perspective term, referred to as the Seed Perspective Article (PA<sub>seed</sub>), along with the categories to which it belongs; we use PC<sub>0</sub><sup>4</sup> to refer to these categories. After fetching the Wikipedia categories in PC<sub>0</sub>, the system retrieves the sub-categories of PC<sub>0</sub> down to depth 2, i.e., PC<sub>1</sub> and PC<sub>2</sub><sup>5</sup>; collectively, these categories related to PA<sub>seed</sub> are referred to as PC (the union of PC<sub>0</sub>, PC<sub>1</sub> and PC<sub>2</sub>). Next, the set of all articles within the Wikipedia category set PC is retrieved; we refer to this set as the Expanded Perspective Article Set (PA<sub>expanded</sub>). The system then retrieves all categories associated with the articles in PA<sub>expanded</sub>, which we refer to as WC; note that PC is a subset of WC. Finally, the intersection between PC and WC is computed; this is a set of categories representative of the domain of the perspective term originally input by the user, and we refer to this set of representative categories as RC.</p>
      <p>After building the Wikipedia category sets defined above<sup>6</sup>, i.e., PC, RC and WC, we match variable-length n-grams within a document against the articles in PA<sub>expanded</sub> and check the cardinalities of RC and WC. These cardinality scores, along with the n-gram frequencies, are used to compute a perspective score for each document.</p>
      <p><sup>3</sup> We say taxonomy-like because the structure is not strictly hierarchical, due to the presence of cycles in the Wikipedia category graph.</p>
      <p><sup>4</sup> These are the perspective categories at depth zero.</p>
      <p><sup>5</sup> These are the perspective categories at depth one and two.</p>
      <p><sup>6</sup> The set-building phase is performed through a custom Wikipedia API that has pre-indexed Wikipedia data and is therefore computationally fast. For details, see <ext-link ext-link-type="uri" xlink:href="http://www3.it.nuigalway.ie/cirg/prj/WikiMadeEasy.html">http://www3.it.nuigalway.ie/cirg/prj/WikiMadeEasy.html</ext-link>.</p>
    </sec>
    <sec id="sec-4-2">
      <title>2.2 Search Results Presentation</title>
      <p>The perspective scores computed in Section 2.1 are displayed within the search results, and based on the perspective score a document receives, we define four levels of perspective adherence: a) High, b) Medium, c) Low, and d) Neutral. Moreover, for documents with high, medium and low scores we also report the top-scoring perspective terms that were extracted using the Wikipedia graph structure as explained previously. A sample search corresponding to the search query "india pakistan relations" and the perspective term "terrorism" is shown in Figure 4. As is evident from the top search result, the returned document has a high perspective of terrorism, and the perspective terms that our algorithm fetches are: a) the war on terrorism, b) ayman al zawahiri, and c) osama bin laden.</p>
      <p>There have been many efforts in information retrieval research to present users with information about the relationship between the query and the answer set, and between the query and the document collection. Capturing this information during the retrieval process provides the user with valuable information (e.g., whether a term is overly specific or ambiguous). Various attempts have been made to tackle this problem, ranging from the definition of snippets, to approaches that cluster search results (Clusty.com), to the presentation of diversified search results in the first positions of the ranked list offered to users.</p>
      <p>Recently there has been a resurgence of interest in defining visualization techniques for search results that offer an effective and more informative alternative to the usual, scarcely informative ranked lists. Pioneering visualization systems are represented by TileBar [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and InfoCrystal [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]; these attempts aimed to provide the user with more information than that provided by the traditional ranked list. This additional information can help users in their search task (e.g., by allowing them to navigate the collection more easily or by providing evidence that allows them to reformulate their query more efficiently).</p>
      <p>Our proposed system, although related in that we also attempt to give the user an insight into the answer set and its relation to the query, differs in a fundamental manner. Our system, we posit, not only allows the user to gain insight into the answer set and its relation to the query, but also allows the user to gain insight into a perspective inherent in the answer set. Our system uses an external and collectively created knowledge resource (which is less likely to be biased in a given direction) to obtain extra terms that represent the perspective of interest to the user. This knowledge (the perspective term and related terms) does not modify the query (as an additional query term would), but is instead used to highlight the presence of a perspective in the answer set.</p>
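      <p>The extra perspective terms come from the Wikipedia category sets defined in Section 2.1. The following Python sketch builds those sets over a toy in-memory graph; it is not the system's implementation, which relies on a custom pre-indexed Wikipedia API. The function names, the breadth-first traversal, and the choice to skip categories already visited (so that cycles in the category graph cannot cause repeated expansion) are assumptions made for illustration, and the data is made up.</p>
      <preformat>
```python
def expand_categories(pc0, subcats, max_depth=2):
    """Return {depth: categories}: PC0 at depth 0, then PC1, PC2, ...
    Categories seen at a shallower depth are skipped, so cycles in the
    category graph cannot cause repeated expansion."""
    levels = {0: set(pc0)}
    seen = set(pc0)
    frontier = set(pc0)
    for depth in range(1, max_depth + 1):
        next_level = set()
        for category in frontier:
            for child in subcats.get(category, ()):
                if child not in seen:
                    seen.add(child)
                    next_level.add(child)
        levels[depth] = next_level
        frontier = next_level
    return levels


def build_perspective_sets(seed_article, article_cats, subcats):
    """Build PC, PA_expanded, WC and RC following the Section 2.1 description."""
    pc0 = article_cats[seed_article]  # categories of PA_seed
    # PC is the union of PC0, PC1 and PC2.
    pc = set().union(*expand_categories(pc0, subcats).values())
    # PA_expanded: every article tagged with at least one category in PC.
    pa_expanded = {a for a, cats in article_cats.items() if not cats.isdisjoint(pc)}
    # WC: all categories of the articles in PA_expanded.
    wc = set().union(*(article_cats[a] for a in pa_expanded))
    rc = pc.intersection(wc)
    return pc, pa_expanded, wc, rc


# Made-up toy data: C1 -> C2 -> C1 is a cycle; C4 sits at depth 3.
subcats = {"C1": {"C2"}, "C2": {"C1", "C3"}, "C3": {"C4"}}
article_cats = {
    "Seed": {"C1"},
    "A1": {"C2", "C9"},
    "A2": {"C3"},
    "A3": {"C4"},  # only reachable at depth 3, so excluded
    "A4": {"C9"},  # shares no category with PC
}

pc, pa_expanded, wc, rc = build_perspective_sets("Seed", article_cats, subcats)
print(sorted(pc))           # ['C1', 'C2', 'C3']
print(sorted(pa_expanded))  # ['A1', 'A2', 'Seed']
print(sorted(wc))           # ['C1', 'C2', 'C3', 'C9']
print(sorted(rc))           # ['C1', 'C2', 'C3']
```
      </preformat>
      <p>With this toy data RC coincides with PC, since every PC category contains at least one article and therefore also appears in WC, which is consistent with the observation in Section 2.1 that PC is a subset of WC.</p>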
      <p>In this paper we have proposed a novel approach for capturing the relationship between a user's query and the returned answer set. We do not rely on evidence in the document collection or the query stream, but instead extract terms from an external source of evidence to help users quickly see the presence of a particular perspective in the document collection and answer set.</p>
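      <p>The text states only that the cardinalities of RC and WC are combined with n-gram frequencies into a per-document perspective score, which is then mapped to four adherence levels (High, Medium, Low and Neutral). The exact formula and thresholds are not given, so the Python sketch below is purely illustrative: each matched perspective term (a lower-cased title from PA_expanded, matched as an n-gram of up to three tokens) contributes its frequency times the fraction of its categories that fall in RC, the sum is normalized by document length, and the level thresholds are made-up values.</p>
      <preformat>
```python
import re
from collections import Counter


def ngrams(tokens, max_n=3):
    """Yield every contiguous n-gram of 1..max_n tokens as a string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def perspective_score(text, term_categories, rc, max_n=3):
    """Hypothetical score: frequency of each matched term weighted by the
    share of its categories in RC, normalized by document length.
    Overlapping matches are counted independently (a simplification)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(g for g in ngrams(tokens, max_n) if g in term_categories)
    contributions = {}
    for term, freq in counts.items():
        cats = term_categories[term]
        share = len(cats.intersection(rc)) / len(cats) if cats else 0.0
        contributions[term] = freq * share
    score = sum(contributions.values()) / max(len(tokens), 1)
    return score, contributions


def adherence_level(score, high=0.05, medium=0.02):
    """Map a score to the four levels; both thresholds are made-up defaults."""
    if score >= high:
        return "High"
    if score >= medium:
        return "Medium"
    if score > 0:
        return "Low"
    return "Neutral"


# Toy inputs: lower-cased PA_expanded titles mapped to made-up categories.
term_categories = {"war on terror": {"C1"}, "osama bin laden": {"C2", "C9"}}
rc = {"C1", "C2", "C3"}
doc = "Talks resumed despite the war on terror; Osama bin Laden was mentioned once."

score, contributions = perspective_score(doc, term_categories, rc)
top_terms = sorted(contributions, key=contributions.get, reverse=True)
print(adherence_level(score), top_terms)  # High ['war on terror', 'osama bin laden']
```
      </preformat>
      <p>Here "war on terror" contributes 1.0 (its only category is in RC) and "osama bin laden" contributes 0.5 (one of its two categories is in RC), giving 1.5 over 13 tokens. The ranked contributions double as the list of top-scoring perspective terms reported next to each result.</p>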
    </sec>
    <sec id="sec-5">
      <title>FUTURE WORK</title>
      <p>Having built the system and undertaken preliminary user evaluations<sup>7</sup>, we aim to undertake a complete and systematic review of the approach. This will comprise a number of separate user evaluation tasks. The initial experiments will compare our search approach with and without the perspective-aware component over a number of tasks, to see whether the additional context and information provided by our perspective-aware system aids users in a range of information-seeking tasks. Our second planned experiment will focus on people seeking information from newspaper articles, a domain in which a degree of bias often exists. We wish to explore the users' experience with regard to any perceived bias in the considered corpora.</p>
      <p><sup>7</sup> The preliminary user evaluations are not reported in this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gollapudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halverson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ieong</surname>
          </string-name>
          .
          <article-title>Diversifying search results</article-title>
          .
          <source>In Proceedings of the Second ACM International Conference on Web Search and Data Mining, WSDM '09</source>
          , pages
          <fpage>5</fpage>
          –
          <lpage>14</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Belkin</surname>
          </string-name>
          .
          <article-title>Cognitive models and information transfer</article-title>
          .
          <source>Social Science Information Studies</source>
          ,
          <volume>4</volume>
          (
          <issue>2–3</issue>
          ):
          <fpage>111</fpage>
          –
          <lpage>129</lpage>
          ,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hearst</surname>
          </string-name>
          .
          <article-title>'Natural' search user interfaces</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>54</volume>
          (
          <issue>11</issue>
          ):
          <fpage>60</fpage>
          –
          <lpage>67</lpage>
          , Nov.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hearst</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. O.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          .
          <article-title>Visualizing information retrieval results: a demonstration of the TileBar interface</article-title>
          .
          <source>In Conference Companion on Human Factors in Computing Systems</source>
          , pages
          <fpage>394</fpage>
          –
          <lpage>395</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          and
          <string-name>
            <given-names>E. N.</given-names>
            <surname>Efthimiadis</surname>
          </string-name>
          .
          <article-title>Analyzing and evaluating query reformulation strategies in web search logs</article-title>
          .
          <source>In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM '09</source>
          , pages
          <fpage>77</fpage>
          –
          <lpage>86</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ingwersen</surname>
          </string-name>
          .
          <article-title>Cognitive perspectives of information retrieval interaction: Elements of a cognitive IR theory</article-title>
          .
          <source>Journal of Documentation</source>
          ,
          <volume>52</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          –
          <lpage>50</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Booth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Spink</surname>
          </string-name>
          .
          <article-title>Determining the informational, navigational, and transactional intent of web queries</article-title>
          .
          <source>Inf. Process. Manage.</source>
          ,
          <volume>44</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1251</fpage>
          –
          <lpage>1266</lpage>
          , May
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>Intent-aware search result diversification</article-title>
          .
          <source>In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, SIGIR '11</source>
          , pages
          <fpage>595</fpage>
          –
          <lpage>604</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Spoerri</surname>
          </string-name>
          .
          <article-title>InfoCrystal: A visual tool for information retrieval &amp; management</article-title>
          .
          <source>In Proceedings of the second international conference on Information and knowledge management</source>
          , pages
          <fpage>11</fpage>
          –
          <lpage>20</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Younus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Qureshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Kingrani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Touheed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>O'Riordan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Pasi</surname>
          </string-name>
          .
          <article-title>Investigating bias in traditional media through social media</article-title>
          .
          <source>In Proceedings of the 21st international conference companion on World Wide Web, WWW '12 Companion</source>
          , pages
          <fpage>643</fpage>
          –
          <lpage>644</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zesch</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          .
          <article-title>Analysis of the Wikipedia Category Graph for NLP Applications</article-title>
          .
          <source>In Proceedings of the TextGraphs-2 Workshop (NAACL-HLT)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>