=Paper= {{Paper |id=None |storemode=property |title=A System for Perspective-Aware Search |pdfUrl=https://ceur-ws.org/Vol-1033/paper9.pdf |volume=Vol-1033 |dblpUrl=https://dblp.org/rec/conf/eurohcir/QureshiYOPT13 }} ==A System for Perspective-Aware Search== https://ceur-ws.org/Vol-1033/paper9.pdf
                      A System for Perspective-Aware Search

M. Atif Qureshi*†, Arjumand Younus*†, Colm O’Riordan*, Gabriella Pasi‡, Nasir Touheed†

* Computational Intelligence Research Group, Information Technology, National University of Ireland, Galway, Ireland
‡ Information Retrieval Lab, Informatics, Systems and Communication, University of Milan Bicocca, Milan, Italy
† Web Science Research Group, Faculty of Computer Science, Institute of Business Administration, Karachi, Pakistan

{muhammad.qureshi, arjumand.younus}@nuigalway.ie, colm.oriordan@nuigalway.ie, pasi@disco.unimib.it, ntouheed@iba.edu.pk


ABSTRACT
Traditional search engines fail to capture the notion of “perspective” in their search results and at times present results skewed towards a particular topic. In most of these cases even query reformulation fails to retrieve the desired search results, and the underlying reason for such failure is often the bias within the document collection itself (e.g., news articles). A perspective-aware search interface that enables users to look into search results for certain “perspective” terms may be of great use for some information needs. In this paper we describe such a system.

Categories and Subject Descriptors
H.1.2 [User/Machine Systems]: Human factors; H.3.3 [Information Search and Retrieval]: Search process

General Terms
Human Factors, Performance

Keywords
Perspective, Wikipedia, Bias

Presented at EuroHCIR2013. Copyright © 2013 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

1. INTRODUCTION AND RELATED WORK
  It is often the case that users who turn to a search engine for information seeking have an underlying intent [1]. Traditional search interfaces fail to capture the user intent for certain topics and at times return results that may be skewed towards a certain perspective. Here, perspective, as defined by the Oxford Dictionary, refers to a “point of view”¹ within the search results that may or may not be what the user is looking for. We explain further through the following motivating examples:

  • Consider the case of a user who wishes to find out more about a certain event (say, a bomb attack in a certain region). The returned search results are dominated by news reports that blame Islam, in most cases relating it to terrorism. This prompts the user to explicitly evaluate how much Islam is related to terrorism in the returned search results.
  • Consider the case of a user who wishes to find out about the roles and rights of women in Islam, but the search engine returns articles that contain a large number of terms highlighting oppression against women rather than women’s rights and roles. In this case the user is prompted to check the correlation between women and oppression within the search results that have been returned.

¹ This may also be seen as topic drift within a document.

  Note that the perspective given by most search results (Islam in our motivating example (1) and oppression in our motivating example (2)) may or may not be aligned with the user’s query intent. When the search results are not aligned with his/her query intent, the user may be interested in observing the extent of such perspective tendencies across the various news reports.
  This paper proposes the concept of a “perspective-aware” search interface that enables the user to explicitly analyse search results for information from a particular perspective with respect to an issued query. To the best of our knowledge, previous research within Human-Computer Interaction and Information Retrieval has not captured the notion of “perspective” within the information retrieval process. Early research on Interactive Information Retrieval by Belkin [2] and Ingwersen [6] suggests the integration of cognitive aspects within the information retrieval process: in line with this suggestion, we argue for incorporating the essential cognitive element of “perspectives”² within the search engine interface.

² According to Wikipedia, perspective is defined as follows: “Perspective in theory of cognition is the choice of a context or a reference (or the result of this choice) from which to sense, categorize, measure or codify experience, cohesively forming a coherent belief, typically for comparing with another.”

  Recently the information retrieval community has turned its attention to the diversification of search results, which aims to tackle the issue of query ambiguity on the user side [8]. However, even when formulating a non-ambiguous query, users may have an intent that influences the perspective from which the query terms are interpreted in a text; in case of
perspective mismatch between the user intent and the documents returned in the top positions by a search engine, users may find the retrieved results annoying or skewed towards a perspective they do not agree with [7]. One may argue that a query reformulation technique could be employed to tackle this problem [5]; e.g., considering motivating example (2), the user could issue a reformulated query such as “roles and rights of women in islam”. However, for some topics query reformulation may fail to retrieve the desired search results, and the underlying reason for such failure is often the bias within the document collection itself (e.g., news articles) [10]. In such a scenario it would be useful to provide a search interface that enables users to look into the search results for certain “perspective” terms, and we describe such a system in this paper.

2. PERSPECTIVE-AWARE SEARCH INTERFACE AND IMPLEMENTATION DETAILS
  This section presents the essential details of the proposed perspective-aware search interface along with the underlying implementation details. We keep the interface as simple as possible, on account of research suggesting users’ reluctance to switch from a simple search form [3]. Figure 1 shows the entry point of the interface, which resembles the standard type-keywords-in-entry-form interface, augmented with an additional input text box for entering perspective terms.

Figure 1: Entry Point of Perspective-Aware Search Interface

  The underlying perspective detection algorithm makes use of the encyclopedic structure of Wikipedia; more specifically, the knowledge encoded in Wikipedia’s graph structure is used to discover the various perspectives in documents returned by the search engine. Wikipedia is organized into categories in a taxonomy-like³ structure (see Figure 2). Each Wikipedia category can have an arbitrary number of subcategories as well as being listed under an arbitrary number of supercategories (e.g., category C4 in Figure 2 is a subcategory of C2 and C3, and a supercategory of C5, C6 and C7). Furthermore, in Wikipedia each article can belong to an arbitrary number of categories, where each category is a kind of semantic tag for that article [11]. As an example, in Figure 2, article A1 belongs to categories C1 and C10, article A2 belongs to categories C3 and C4, while article A3 belongs to categories C4 and C7. It can be seen that the articles and the Wikipedia Category Graph are interlinked, and our system makes use of these interlinks to detect a certain perspective within a document retrieved by the search engine.

³ We say taxonomy-like because it is not strictly hierarchical due to the presence of cycles in the Wikipedia category graph.

Figure 2: Wikipedia Category Graph Structure along with Wikipedia Articles

2.1 Underlying Algorithm
  The underlying perspective detection algorithm within our system requires the perspective term/phrase to match the title of a Wikipedia article. This may seem to impose a cognitive load on the user at search time. However, this is not the case: as shown in Figure 3, the entered text automatically turns green when a user-specified perspective term matches the title of a Wikipedia article and, symmetrically, automatically turns red in case of a mismatch.

Figure 3: Automatic Text Color Changing to Test Match of Perspective Term with Wikipedia Article Title

  Once the perspective term is entered correctly, the system fetches the Wikipedia article corresponding to the perspective term, referred to as the Seed Perspective Article (PA_seed), along with the categories to which it belongs, and we use
PC_0 to refer to these categories.⁴ After fetching the Wikipedia categories in PC_0, the system retrieves the subcategories of PC_0 down to depth 2,⁵ i.e., PC_1 and PC_2; collectively, these categories related to PA_seed are referred to as PC, where PC = PC_0 ∪ PC_1 ∪ PC_2. Next, the set of all articles within the Wikipedia category set PC is retrieved; we refer to this set as the Expanded Perspective Article Set (PA_expanded). The system then retrieves all categories associated with the articles in PA_expanded, which we refer to as WC; note that PC is a subset of WC. Finally, the intersection of PC and WC is retrieved, which is a set of categories representative of the domain of the perspective term originally input by the user; we refer to this set of representative categories as RC.
  After building the Wikipedia category sets as defined above,⁶ i.e., PC, RC and WC, we match variable-length n-grams within a document against the articles in the set PA_expanded, and we compute the cardinalities of RC and WC. These cardinality scores, along with the n-gram frequencies, are used to compute a perspective score for each document.

⁴ These are the perspective categories at depth zero.
⁵ These are the perspective categories at depths one and two.
⁶ The set-building phase is performed through a custom Wikipedia API that has pre-indexed Wikipedia data and is hence computationally fast. For details, see http://www3.it.nuigalway.ie/cirg/prj/WikiMadeEasy.html

2.2 Search Results Presentation
  The perspective scores computed in Section 2.1 are displayed within the search results. Based on the perspective score a document receives, we define four levels of perspective adherence: a) High, b) Medium, c) Low, and d) Neutral. Moreover, for documents with high, medium and low scores, we also report the top-scoring perspective terms that were extracted using the Wikipedia graph structure as explained previously. A sample search for the query “india pakistan relations” with the perspective term “terrorism” is shown in Figure 4. As evident from the top search result, the returned document exhibits a high degree of the terrorism perspective, and the perspective terms that our algorithm fetches are as follows: a) the war on terrorism, b) ayman al zawahiri, and c) osama bin laden.

Figure 4: Search Results within Perspective-Aware Search

3. DISCUSSION
  There have been many efforts in information retrieval research to present users with information regarding the relationship between the query and the answer set, and between the query and the document collection. Capturing this information during the retrieval process provides the user with much valuable information (e.g., whether a term is overly specific, or whether a term is ambiguous). Various attempts have been made to tackle this problem, ranging from the definition of snippets, to approaches that cluster search results (Clusty.com), to the presentation of diversified search results in the first positions of the ranked list offered to the users. Recently there has been a resurgence of interest in visualization techniques for search results that offer an effective and more informative alternative to the usual, scarcely informative ranked lists. Pioneering visualization systems include TileBar [4] and InfoCrystal [9]; these attempts aimed to provide the user with more information than the traditional ranked list does.
  This additional information can help users in their search task (e.g., allowing them to navigate the collection more easily, or providing evidence that allows them to reformulate their query more efficiently).
  Our proposed system, although related in that we also attempt to give the user an insight into the answer set and its relation to the query, differs in a fundamental manner. Our system, we posit, allows the user to gain insight into the answer set and its relation to the query, but moreover allows the user to gain an insight into a perspective inherent in the answer set. Our system uses an external and collectively created knowledge resource (which is less likely to be biased
in a given direction) to obtain extra terms that represent the perspective of interest to the user. This knowledge (the perspective term and related terms) does not modify the query (as an additional query term would), but is instead used to highlight the presence of a perspective in the answer set.
  In this paper we have proposed a novel approach for capturing the relationship between a user’s query and the returned answer set. We do not rely on evidence in the document collection or the query stream, but instead extract terms from an external source of evidence to help users quickly see the presence of a particular perspective in the document collection and answer set.

4. FUTURE WORK
  Having built the system and undertaken preliminary user evaluations,⁷ we aim to undertake a complete and systematic review of the approach. This will comprise a number of separate user evaluation tasks. The initial experiments will compare our search approach with and without the perspective-aware component over a number of tasks, to see whether the additional context and information provided by our perspective-aware system aids users in a range of information-seeking tasks. Our second planned experiment will focus on people seeking information from newspaper articles, a domain in which a degree of bias often exists. We wish to explore the users’ experience with regard to any perceived bias in the considered corpora.

⁷ The preliminary user evaluations have not been shared in this paper.

5. REFERENCES
 [1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, WSDM ’09, pages 5–14, 2009.
 [2] N. Belkin. Cognitive models and information transfer. Social Science Information Studies, 4(2–3):111–129, 1984.
 [3] M. A. Hearst. ‘Natural’ search user interfaces. Commun. ACM, 54(11):60–67, Nov. 2011.
 [4] M. A. Hearst and J. O. Pedersen. Visualizing information retrieval results: a demonstration of the TileBar interface. In Conference Companion on Human Factors in Computing Systems, pages 394–395, 1996.
 [5] J. Huang and E. N. Efthimiadis. Analyzing and evaluating query reformulation strategies in web search logs. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09, pages 77–86, 2009.
 [6] P. Ingwersen. Cognitive perspectives of information retrieval interaction: Elements of a cognitive IR theory. Journal of Documentation, 52(1):3–50, 1996.
 [7] B. J. Jansen, D. L. Booth, and A. Spink. Determining the informational, navigational, and transactional intent of web queries. Inf. Process. Manage., 44(3):1251–1266, May 2008.
 [8] R. L. Santos, C. Macdonald, and I. Ounis. Intent-aware search result diversification. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’11, pages 595–604, 2011.
 [9] A. Spoerri. InfoCrystal: A visual tool for information retrieval & management. In Proceedings of the Second International Conference on Information and Knowledge Management, pages 11–20, 1993.
 [10] A. Younus, M. A. Qureshi, S. K. Kingrani, M. Saeed, N. Touheed, C. O’Riordan, and G. Pasi. Investigating bias in traditional media through social media. In Proceedings of the 21st International Conference Companion on World Wide Web, WWW ’12 Companion, pages 643–644, 2012.
 [11] T. Zesch and I. Gurevych. Analysis of the Wikipedia Category Graph for NLP Applications. In Proceedings of the TextGraphs-2 Workshop (NAACL-HLT), 2007.