Searching Wikipedia: learning the why, the how, and the role played by emotion

Hanna Knäusl
Department of Information Science
University of Regensburg
93040 Regensburg
hanna.knaeusl@sprachlit.uni-regensburg.de

Presented at Searching4Fun workshop at ECIR2012. Copyright January 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

ABSTRACT

Searching Wikipedia has been the focus of study for an increasing number of information retrieval publications. In recent years different IR tasks have used Wikipedia as a basis for evaluating algorithms and interfaces for various types of search tasks, including Question Answering, Exploratory Search, Entity Search and Structured Document Retrieval. Despite being associated with these well-defined task types, little is known about why people actually search Wikipedia, what they try to find, how and why they try to find it, or the criteria they use to define success. We argue that the way Wikipedia content is generated influences the way it is used, including search behaviour. We are particularly interested in learning about affective aspects of search, which have been suggested to be an important motivating factor in Wikipedia search behaviour, particularly in leisure scenarios. In this position paper we motivate the investigation of Wikipedia search behaviour in the wild and present our ideas on the best way to study this behaviour.

1. INTRODUCTION AND MOTIVATION

Wikipedia¹ is a free online encyclopedia, which due to its open source design and community-based editing policy has become one of the largest reference works of all time. The large volume of information, the breadth of topics covered and the open-access nature of the collection have made Wikipedia a natural target of study within the Information Retrieval research community. Wikipedia is now used as the document collection for several retrieval evaluation efforts at CLEF [4] and INEX [3] and has formed the basis of evaluations in several IR domains including:

• Question answering, e.g. [4], which attempts to provide answers to questions such as "How fast can a cheetah run?", sometimes supplementing answers with additional relevant snippets that might be helpful to the user.

• Entity search, e.g. [2], which assumes the user has an information need that could be satisfied with a list of entities that have certain properties. A query might, for example, indicate the type of entities to be retrieved (e.g., "castle") and distinctive features (e.g., "German", "medieval").

• Structured retrieval, e.g. [3], which aims to retrieve relevant parts of documents in a collection in response to a given information need.

• Exploratory search, e.g. [5], whereby the user has a poorly defined information need, has little knowledge of the topic of interest, or is unfamiliar with the search space.

Each of these examples is associated with well-defined tasks or situations. However, it is unclear how reflective these tasks are of real-life Wikipedia search behaviour. Are these the most appropriate tasks to be investigating? Are we evaluating these tasks appropriately? Are there more pressing aspects that we, as a research community, should be investigating?

As a starting point to answering these questions, in the following section we briefly review research that informs on Wikipedia search behaviour in naturalistic situations.

¹ http://www.wikipedia.org

2. SEARCHING WIKIPEDIA

The main source of knowledge about Wikipedia search behaviour is transaction log analysis. Sakai and Nogami [6], for example, logged user interaction with a Wikipedia search interface designed to encourage exploration and the development of information needs. They discovered that information needs tend to progress and develop in small steps, usually within a query type. For example, users tended to browse pages from person to person or from place to place. The implicit structure of Wikipedia most likely encourages this behaviour.

Fissaha Adafre and de Rijke [1] also used log analyses to learn about Wikipedia searches, distinguishing between "directed" and "undirected" searches by analysing the phrasing of queries. They also discovered that a large percentage of searches were undirected and exploratory in nature.

Log-based investigations such as these have the advantage of collecting large quantities of data from naturalistic situations. However, they are limited in that they say nothing about the intention of the user, his experience, or the outcome of the search. For example, the work of Wilson and Elsweiler [7] asserts that many searches will not be motivated by information needs per se, but purely by the user having an interest in a topic. In their work, they found example search tasks that were motivated by the desire to achieve a particular mood, emotional or physical state, or by the presence or need of someone else in the social context. In such cases, the support the user would need from the system and the criteria that should be used to evaluate system performance would be very different to those currently featured in information retrieval research.

We believe that the way Wikipedia is constructed, i.e., collaboratively by a subset of the users, together with the large collection size and broad topic range, the linked structure, and the prominence of multimedia content, will mean that Wikipedia will be used for leisure-time tasks. People are motivated to create and edit Wikipedia pages as it mirrors their interests. The outcome of such use may not always be positive. For example, Wilson and Elsweiler [7] describe one study participant reporting frustration that he had again wasted a lot of time aimlessly browsing eBay. This negative outcome, realised through a negative emotion, would not be considered in any current IR methodology.

In the following section we outline our thoughts on what we believe to be a more suitable study design to learn about Wikipedia search tasks. We would like to use the workshop as a platform for discussion to improve on this design.

3. LEARNING ABOUT BEHAVIOUR WITH A LOG / DIARY HYBRID

We need to design a study that helps us learn about the user's motivation for searching, his behaviour in response to this motivation, his satisfaction with the experience, and his emotional response to the experience.

To investigate these aspects we propose combining the log-based approaches scholars have used previously with user diaries. Diary studies offer the ability to capture factual data in a natural setting, without the distracting influence of an observer. They also offer the chance to question the user regarding his motivation to search, as well as the search process and the feelings and emotions experienced during it.

Diary studies also have limitations. These include difficulties in maintaining participant dedication levels throughout the period of study and in getting the participants to remember that situations of interest should be recorded. These negative aspects can be offset, however, through careful study design. For example, since Wikipedia is digital and accessed within a web browser, it makes sense to use a digital diary that can also be filled out in a web-browser session, perhaps as a pop-up. We plan to build an extension to the Firefox web browser that detects when a Wikipedia page is accessed; if a certain time threshold has elapsed since the last diary entry, the user will be asked to record details about his information need and the motivating situation surrounding the search. The extension will also record interactions with Wikipedia (e.g. pages viewed and search queries submitted), allowing analyses similar to those published previously to be complemented by the diary study data.

To limit the irritation that filling out such a form would cause and to minimise distraction from the search process, we plan to ask only two short questions at that time point. The user will be asked to give a brief description of what they are looking for and why. This will be enough information to remind them of the situation at a later time point, when we ask more detailed questions regarding the experience, the success of the task, how feelings were realised and the factors that influenced these. This data will be elicited through a mixture of fixed and free-form questions.

We plan to triangulate the data collected from the various aspects of our study to create a rich understanding of user needs and behaviour. For example, we plan to look at the content of visited pages, such as the topic and the kind of media used, and to see how this relates to how participants describe their experiences. We want to see what affects user behaviour, e.g. whether the link structure, the way information is presented, or certain content influences behaviour or the emotions experienced. The different sources of data we will collect will help us to learn about these complicated behavioural aspects.

4. CONCLUSIONS

So what will we learn from the study and why is it important? The most important point is to find out what makes the users happy: what they need, how they behave to satisfy these needs, and what emotional aspects are involved when Wikipedia is searched. An understanding of these issues will inform us on the kind of functionality a Wikipedia search tool should offer. Do users want to browse to related topics? Do they like a wide range of possibly interesting information, or do they just want to quickly look up pieces of information as and when they are needed? The proposed study would offer the chance to answer these questions by providing naturalistic data, as well as additional comments of interest from the participants.

5. REFERENCES

[1] S. F. Adafre and M. de Rijke. Exploratory search in wikipedia. In Proceedings SIGIR 2006 workshop on Evaluating Exploratory Search Systems, 2006.

[2] G. Demartini, C. Firan, T. Iofciu, R. Krestel, and W. Nejdl. Why finding entities in wikipedia is difficult, sometimes. Information Retrieval, 13:534–567, 2010. doi:10.1007/s10791-010-9135-7.

[3] INEX. Initiative for the evaluation of xml retrieval, 2006. url: http://inex.is.informatik.uni-duisburg.de/2006/.

[4] V. Jijkoun and M. de Rijke. Overview of WiQA 2006. In A. Nardi, C. Peters, and J. Vicedo, editors, Working Notes CLEF 2006, September 2006.

[5] B. Kules and R. Capra. Designing exploratory search tasks for user studies of information seeking support systems. In Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, JCDL '09, pages 419–420, New York, NY, USA, 2009. ACM.

[6] T. Sakai and K. Nogami. Serendipitous search via wikipedia: a query log analysis. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '09, pages 780–781, New York, NY, USA, 2009. ACM.

[7] M. L. Wilson and D. Elsweiler. Casual-leisure searching: the exploratory search scenarios that break our current models. In 4th International Workshop on Human-Computer Interaction and Information Retrieval, Aug 2010. New Brunswick, NJ.