=Paper=
{{Paper
|id=None
|storemode=property
|title=A System for Perspective-Aware Search
|pdfUrl=https://ceur-ws.org/Vol-1033/paper9.pdf
|volume=Vol-1033
|dblpUrl=https://dblp.org/rec/conf/eurohcir/QureshiYOPT13
}}
==A System for Perspective-Aware Search==
M. Atif Qureshi*†, Arjumand Younus*†, Colm O’Riordan*, Gabriella Pasi, Nasir Touheed†

* Computational Intelligence Research Group, Information Technology, National University of Ireland, Galway, Ireland

Information Retrieval Lab, Informatics, Systems and Communication, University of Milan Bicocca, Milan, Italy

† Web Science Research Group, Faculty of Computer Science, Institute of Business Administration, Karachi, Pakistan

muhammad.qureshi,arjumand.younus@nuigalway.ie, colm.oriordan@nuigalway.ie, pasi@disco.unimib.it, ntouheed@iba.edu.pk

===ABSTRACT===
Traditional search engines fail to capture the notion of “perspective” in their search results and at times present results skewed towards a particular topic. In most of these cases even query reformulation fails to retrieve the desired search results, and the underlying reason for such failure is often the bias within the document collection itself (e.g., news articles). A perspective-aware search interface enabling users to look into search results for some “perspective” terms may be of great use for certain information needs. In this paper we describe such a system.

===Categories and Subject Descriptors===
H.1.2 [User/Machine Systems]: Human factors; H.3.3 [Information Search and Retrieval]: Search process

===General Terms===
Human Factors, Performance

===Keywords===
Perspective, Wikipedia, Bias

Presented at EuroHCIR2013. Copyright © 2013 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

===1. INTRODUCTION AND RELATED WORK===
Users who turn to a search engine for information seeking often have an underlying intent [1]. Traditional search interfaces fail to capture the user intent for certain topics and at times return results that may be skewed towards a certain perspective. Here, perspective as defined by the Oxford Dictionary refers to a “point of view”¹ within the search results that may or may not be what the user is looking for. We explain further through the following motivating examples:

• Consider the case of a user who wishes to find out more about a certain event (say, a bomb attack in a certain region). The majority of the search results returned are news reports that, in most cases, blame Islam and relate it to terrorism. This prompts the user to explicitly evaluate how much Islam is related to terrorism in the returned search results.

• Consider the case of a user who wishes to find out about the roles and rights of women in Islam, but the search engine returns articles that contain a high number of terms highlighting oppression against women instead of women’s rights and roles. In this case the user is prompted to check the correlation between women and oppression within the search results that have been returned.

Note that the perspective given by most search results (Islam in our motivating example (1) and oppression in our motivating example (2)) may or may not be aligned with the user’s query intent. When the search results are not aligned with the query intent, the user may be interested in observing the amount of perspective tendencies in various news reports.

This paper proposes the concept of a “perspective-aware” search interface that enables the user to explicitly analyse search results for information from a particular perspective with respect to an issued query. To the best of our knowledge, previous research within Human-Computer Interaction and Information Retrieval has failed to capture the notion of “perspective” within the information retrieval process. Early research related to Interactive Information Retrieval by Belkin [2] and Ingwersen [6] suggests the integration of cognitive aspects within the information retrieval process: in line with this suggestion we argue for incorporating the essential cognitive element of “perspectives”² within the search engine interface.

Recently the information retrieval community has turned its attention to diversification of search results, which aims to tackle the issue of query ambiguity on the user side [8]. However, even when formulating a non-ambiguous query, users may have an intent that influences the perspective from which the query terms can be interpreted in a text; when there is a perspective mismatch between the user intent and the documents returned in the first positions by a search engine, users may find the retrieved results annoying or subject to a perspective they do not agree with [7]. One may argue that a query reformulation technique could be employed to tackle this problem [5]; e.g., considering the motivating example (2), the user could issue a reformulated query such as “roles and rights of women in islam”. However, for some topics query reformulation may fail to retrieve the desired search results, and the underlying reason for such failure is often the bias within the document collection itself (e.g., news articles) [10]. In such a scenario it would be useful to provide a search interface that enables users to look into the search results for some “perspective” terms, and we describe such a system in this paper.

¹ According to Wikipedia the definition of perspective states the following: “Perspective in theory of cognition is the choice of a context or a reference (or the result of this choice) from which to sense, categorize, measure or codify experience, cohesively forming a coherent belief, typically for comparing with another.”

² This may also be seen as topic drifts within a document.

===2. PERSPECTIVE-AWARE SEARCH INTERFACE AND IMPLEMENTATION DETAILS===
This section presents the essential details of the proposed perspective-aware search interface along with the underlying implementation details. We keep the interface as simple as possible on account of research suggesting users’ reluctance to switch from a simple search form [3]. Figure 1 shows the entry point of the interface, which resembles the standard type-keywords-in-entry-form interface augmented with an additional input text box for entry of perspective terms.

Figure 1: Entry Point of Perspective-Aware Search Interface

The underlying perspective detection algorithm makes use of the encyclopedic structure in Wikipedia; more specifically, the knowledge encoded in Wikipedia’s graph structure is utilized for the discovery of various perspectives in documents returned by the search engine. Wikipedia is organized into categories in a taxonomy-like³ structure (see Figure 2). Each Wikipedia category can have an arbitrary number of subcategories and can itself belong to an arbitrary number of supercategories (e.g., category C4 in Figure 2 is a subcategory of C2 and C3, and a supercategory of C5, C6 and C7). Furthermore, in Wikipedia each article can belong to an arbitrary number of categories, where each category is a kind of semantic tag for that article [11]. As an example, in Figure 2, article A1 belongs to categories C1 and C10, article A2 belongs to categories C3 and C4, while article A3 belongs to categories C4 and C7. It can be seen that the articles and the Wikipedia Category Graph are interlinked, and our system makes use of these interlinks for the detection of a certain perspective within a document retrieved by the search engine.

Figure 2: Wikipedia Category Graph Structure along with Wikipedia Articles

³ We say taxonomy-like because it is not strictly hierarchical due to the presence of cycles in the Wikipedia category graph.

====2.1 Underlying Algorithm====
The underlying perspective detection algorithm within our system requires the perspective term/phrase to match the title of a Wikipedia article. This may seem to impose a cognitive load on the user at search time. However, this is not the case: as shown in Figure 3, the entered text automatically turns green when a user-specified perspective term matches the title of a Wikipedia article, and symmetrically the entered text automatically turns red in case of a mismatch.

Figure 3: Automatic Text Color Changing to Test Match of Perspective Term with Wikipedia Article Title

Once the perspective term is entered correctly, the system fetches the Wikipedia article corresponding to the perspective term, referred to as the Seed Perspective Article (PA_seed), along with the categories to which it belongs; we use PC_0⁴ to refer to these categories. After fetching the Wikipedia categories in PC_0, the system retrieves the sub-categories of PC_0 down to depth 2, i.e., PC_1 and PC_2⁵, and collectively these categories related to PA_seed are referred to as PC (where PC is the union of PC_0, PC_1 and PC_2). Next, the set of all articles within the Wikipedia category set PC is retrieved, and we refer to this set as the Expanded Perspective Article Set (PA_expanded). The system then retrieves all categories associated with the set PA_expanded, which we refer to as WC; note that PC is a subset of WC. Finally, the intersection between PC and WC is retrieved, which is a set of categories representative of the domain of the perspective term originally input by the user; we refer to this set of representative categories as RC.

After building the Wikipedia category sets as defined above⁶, i.e., PC, RC and WC, we match variable-length n-grams within a document with articles in the set PA_expanded, and we check the cardinality of RC and WC. The cardinality scores along with n-gram frequencies are used to compute a perspective score for each document.

⁴ These are basically perspective categories at depth zero.

⁵ These are basically perspective categories at depth one and two.

⁶ The set building phase is performed through a custom Wikipedia API that has pre-indexed Wikipedia data and hence, it is computationally fast. For details http://www3.it.nuigalway.ie/cirg/prj/WikiMadeEasy.html

====2.2 Search Results Presentation====
The perspective scores computed in Section 2.1 are displayed within the search results, and based on the perspective score a document receives, we define four levels of perspective adherence as follows: a) High, b) Medium, c) Low, and d) Neutral. Moreover, for documents with high, medium and low scores we also report the top-scoring perspective terms that were extracted using the Wikipedia graph structure as explained previously. A sample search corresponding to the search query “india pakistan relations” and the perspective term “terrorism” is shown in Figure 4. As evident from the top search result, there is a high perspective of terrorism within the returned document, and the perspective terms that our algorithm fetches are as follows: a) the war on terrorism, b) ayman al zawahiri, and c) osama bin laden.

Figure 4: Search Results within Perspective-Aware Search

===3. DISCUSSION===
There have been many efforts in information retrieval research to present to users information regarding the relationship between the query and the answer set, and between the query and the document collection. Capturing this information during the retrieval process provides the user with much valuable information (e.g., whether a term is overly specific, or whether a term is ambiguous). Various attempts have been made to tackle this problem, ranging from the definition of snippets, to approaches that cluster search results (Clusty.com), to the presentation of diversified search results in the first positions of the ranked list offered to the users. Recently there has been a resurgence of interest in defining visualization techniques for search results that offer an effective and more informative alternative to the usual and scarcely informative ranked lists. Pioneering visualization systems are represented by TileBar [4] and InfoCrystal [9], and these attempts have aimed to provide the user with more information than that provided by the traditional ranked list.

This additional information can help users in their search task (e.g., allowing them to navigate the collection more easily or providing evidence that allows them to reformulate their query more efficiently).

Our proposed system, although related in that we also attempt to give the user an insight into the answer set and its relation to the query, differs in a fundamental manner. Our system, we posit, allows the user to gain insight into the answer set and its relation to the query, and moreover allows the user to gain an insight into a perspective inherent in the answer set. Our system uses an external and collectively created knowledge resource (which is less likely to be biased in a given direction) to obtain extra terms to represent the perspective of interest to the user. This knowledge (perspective term and related terms) does not modify the query (as an additional query term would), but is instead used to highlight the presence of a perspective in the answer set.

In this paper we have proposed a novel approach for capturing the relationship between a user’s query and the returned answer set. We do not rely on evidence in the document collection or the query stream, but instead extract terms from an external source of evidence to help users quickly see the presence of a particular perspective in the document collection and answer set.

===4. FUTURE WORK===
Having built the system and undertaken preliminary user evaluations⁷, we aim to undertake a complete and systematic review of the approach. This will comprise a number of separate user evaluation tasks. The initial experiments will compare our search approach with and without the perspective-aware component over a number of tasks, to see if the additional context and information provided by our perspective-aware system aids users in a range of information-seeking tasks. Our second set of planned experiments will focus on persons seeking information from newspaper articles, a domain wherein a degree of bias often exists. We wish to explore the users’ experience with regard to any perceived bias in the considered corpora.

⁷ The preliminary user evaluations have not been shared in this paper.

===5. REFERENCES===
[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, WSDM ’09, pages 5–14, 2009.

[2] N. Belkin. Cognitive models and information transfer. Social Science Information Studies, 4(2–3):111–129, 1984.

[3] M. A. Hearst. ’Natural’ search user interfaces. Commun. ACM, 54(11):60–67, Nov. 2011.

[4] M. A. Hearst and J. O. Pedersen. Visualizing information retrieval results: a demonstration of the tilebar interface. In Conference Companion on Human Factors in Computing Systems, pages 394–395, 1996.

[5] J. Huang and E. N. Efthimiadis. Analyzing and evaluating query reformulation strategies in web search logs. In Proceedings of the 18th ACM conference on Information and knowledge management, CIKM ’09, pages 77–86, 2009.

[6] P. Ingwersen. Cognitive perspectives of information retrieval interaction: Elements of a cognitive IR theory. Journal of Documentation, 52(1):3–50, 1996.

[7] B. J. Jansen, D. L. Booth, and A. Spink. Determining the informational, navigational, and transactional intent of web queries. Inf. Process. Manage., 44(3):1251–1266, May 2008.

[8] R. L. Santos, C. Macdonald, and I. Ounis. Intent-aware search result diversification. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, SIGIR ’11, pages 595–604, 2011.

[9] A. Spoerri. Infocrystal: A visual tool for information retrieval & management. In Proceedings of the second international conference on Information and knowledge management, pages 11–20, 1993.

[10] A. Younus, M. A. Qureshi, S. K. Kingrani, M. Saeed, N. Touheed, C. O’Riordan, and P. Gabriella. Investigating bias in traditional media through social media. In Proceedings of the 21st international conference companion on World Wide Web, WWW ’12 Companion, pages 643–644, 2012.

[11] T. Zesch and I. Gurevych. Analysis of the Wikipedia Category Graph for NLP Applications. In Proceedings of the TextGraphs-2 Workshop (NAACL-HLT), 2007.