ECIR 2012
Barcelona, Spain
April 1st, 2012

Searching4Fun
Workshop of the 34th European Conference on Information Retrieval

Organised by: David Elsweiler, Max L. Wilson and Morgan Harvey

Copyright ©2012 remains with the author/owner(s). Proceedings of the ECIR 2012 Workshop on Searching4Fun. Held in Barcelona, Spain. April 1, 2012.

Preface

These proceedings contain the papers presented at the ECIR 2012 Searching4Fun Workshop, which took place on 1st April, 2012 in Barcelona, Spain.

People spend more and more time online, not just to find information, but with the goal of enjoying themselves and passing time. Research has begun to show that during casual-leisure search, people's intentions, their motivations, their criteria for success, and their querying behaviour all differ from typical web search, whilst potentially representing a significant portion of search queries.

This workshop will investigate searching for fun, or casual-leisure search, and aims to understand this increasingly important type of searching, bring together relevant IR sub-communities (e.g. recommender systems, result diversity, multimedia retrieval) and related disciplines, discuss new and early research, and create a vision for future work in this area.

There are many open questions relating to searching for fun, and the papers presented at the workshop deal with issues such as:
- Understanding information needs and search behaviour in particular casual-leisure situations.
- How existing systems are used in casual-leisure searching scenarios.
- Use of recommender systems for entertaining content (books, movies, videos, music, websites).
- Interfaces for exploratory search for casual-leisure situations.
- Evaluation (methods, metrics) of casual-leisure searching situations.
- The role of emotion in casual-leisure search.

We would like to thank ECIR for hosting the workshop. Thanks also go to the programme committee and paper authors, without whom there would be no workshop.
April 2012
David Elsweiler
Max L. Wilson
Morgan Harvey

Organisation

Program Chairs
David Elsweiler (University of Regensburg, Germany)
Max L. Wilson (University of Nottingham, England)
Morgan Harvey (University of Erlangen, Germany)

Program Committee
Pertti Vakkari, Tampere, Finland
Elaine Toms, Sheffield, UK
Ryen White, Microsoft Research, USA
Leif Azzopardi, Glasgow, UK
Bernd Ludwig, University of Regensburg, Germany
Ian Ruthven, Strathclyde, UK
Daniel Tunkelang, LinkedIn, USA
Pablo Castells, Madrid, Spain
Richard Schaller, Erlangen, Germany
Stefan Mandl, Augsburg, Germany
Amund Tveit, Atbrox, Norway
Michael Hurst, Loughborough University, UK

Table of Contents

Preface
Organisation
Table of Contents

Keynote Lecture
Finding without Seeking, Retrieving without Searching
Elaine Toms (University of Sheffield)

Presentations

Session 1: Mobile Search
Rethinking mobile search: towards casual, shared, social mobile search experiences
Sofia Reis (Telefonica), Karen Church (Telefonica) and Nuria Oliver (Telefonica)
Out and About on Museums Night: Investigating Mobile Search Behaviour for Leisure Events
Richard Schaller (Erlangen-Nuremberg), Morgan Harvey (Erlangen-Nuremberg) and David Elsweiler (Regensburg)
The Information Needs of Mobile Searchers: A Framework
Tyler Tate (TwigKit) and Tony Russell-Rose (UXLabs)

Session 2: Emotion
Role of Emotion in Information Retrieval for Entertainment
Yashar Moshfeghi (Glasgow) and Joemon M. Jose (Glasgow)
Searching Wikipedia: learning the why, the how, and the role played by emotion
Hanna Knäusl (Regensburg)
Rushed or Relaxed? -- How the Situation on the Road Influences the Driver's Preferences for Music Tracks
Linas Baltrunas (Telefonica), Bernd Ludwig (FAU-EN) and Francesco Ricci (Bozen-Bolzano)

Session 3: Browsing for Reading
Serendipitous Browsing: Stumbling the Wikipedia
Claudia Hauff (Delft) and Geert-Jan Houben (Delft)
A Diary Study of Information Needs Produced in Casual-Leisure Reading Situations
Max L. Wilson (Nottingham), Basmah Alhodaithi (Swansea) and Michael Hurst (Loughborough)
In Search of a Good Novel: Examining Results Matter
Suvi Oksaenen (Tampere) and Pertti Vakkari (Tampere)

Keynote Lecture – Elaine Toms

Finding without Seeking, Retrieving without Searching

In information retrieval we tend to focus on the process from specific information need to desired solution that follows a lockstep path from start to finish. Yet a rich part of our information world is in the unfocused, accidental encounter with information that leads to novel findings, and enriched experiences that may be more about the journey than the destination. This is very true of how we approach information spaces in our leisure activities and how we use our unplanned time in digital worlds. This talk will focus on the accidental encountering of people with information, how systems support (or not) the orienteering and foraging that people tend to do, and how information retrieval might provide more optimal solutions.

Rethinking mobile search: towards casual, shared, social mobile search experiences

Sofia Reis
CITI, Universidade Nova de Lisboa
2829-516 Caparica – Portugal
se.reis@campus.fct.unl.pt

Karen Church
Telefonica Research
Plaza de Ernest Lluch i Martín, 5
08019 Barcelona – Spain
karen@tid.es

Nuria Oliver
Telefonica Research
Plaza de Ernest Lluch i Martín, 5
08019 Barcelona – Spain
nuriao@tid.es

Presented at Searching4Fun workshop at ECIR2012. Copyright © 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

ABSTRACT
The mobile search space has witnessed phenomenal growth in recent years. As a result there has been a growing body of research aimed at understanding why and how mobile users search the Web via their handsets and how their mobile search experiences could be improved. However, much of this work has focused on addressing the many challenges of the mobile space. In this short position paper we argue the need for more casual, shared, social mobile search experiences. We outline a number of open and challenging research questions related to shared, social mobile search. Finally, we present our ideas through a proof-of-concept mobile paper prototype designed to support casual mobile search and information sharing with co-located groups of friends.

Categories and Subject Descriptors
H.5.2 [Information Systems]: Information Interfaces and Presentation – User Interfaces. H.3.3 [Information Systems]: Information Storage and Retrieval – Information Search and Retrieval.

General Terms
Design, Human Factors.

Keywords
Mobile search, mobile internet, mobile web, social search, social context, casual search, shared search, collaborative search

1. INTRODUCTION
Mobile phones, once deemed simple communications devices, have now evolved into sophisticated computing devices, offering users the ability to access a wealth of online information, anytime and anywhere.

As mobile Internet usage has increased, there has been a growing body of research aimed at understanding why and how mobile users search and browse the Web via their mobile handsets as well as how their mobile search and browsing experiences could be improved [2, 4–9, 13, 17]. However, much of this work has focused on addressing the challenges of the mobile space and enabling mobile users to find the information they need as quickly and effectively as possible.

While past research has shed key insights into mobile Web behaviours and led to a number of great advances in mobile Web services, recently there has been a shift in the mobile world, which we believe will force the community to re-think the mobile Web and mobile search space. In the past mobile meant on-the-move, portable, personal and dynamic. However, recent research has highlighted that (1) more and more users are accessing the mobile Web in non-mobile settings like at home or at work [2, 13], (2) mobile users are often motivated not by an exact need or urgency, but rather curiosity, boredom and even social avoidance [2, 17] and (3) mobile web access, and mobile search in particular, is often a social act, carried out among groups of people, rather than while the end-user is alone [2, 5, 18]. Given these findings, we believe it’s time to devote some effort to enabling mobile users to search the Web in a more casual, social setting.

In this short position paper we motivate and argue the role of shared, social search experiences in the mobile space. We highlight what we think are important and fruitful areas of research related to this new direction in mobile search. Finally, to illustrate our ideas we present examples of a proof-of-concept mobile paper prototype, which is designed to support casual search and information sharing with co-located groups of friends via their mobile handsets.

2. BACKGROUND & MOTIVATION
The gaining momentum of mobile Web and mobile search usage has also resulted in a growing body of interesting research related to understanding mobile users, mobile information needs [3, 16] and mobile Web behaviours [2, 4–6, 9, 13, 17]. In this section we highlight key takeaway messages extracted from this past work that we believe motivate a rethinking of the mobile search experience we provide to users.

2.1 Mobile does not always mean on-the-move
Recent findings suggest that mobile users often access online content in non-mobile settings. For example, a one week diary study of mobile Web access carried out by Nylander et al. [13] shows that mobile Internet access occurs mostly at home (31%). A more recent study by Church & Oliver shows that > 70% of mobile Web accesses are recorded when users are in familiar, stationary settings like at home and at work [2]. Cui & Roto [5] discovered a similar trend emerging in a series of studies they carried out between 2004 and 2007. That is, mobile Web access is becoming a more stationary activity. These findings point to the changing pace of the mobile Web. Location-dependency isn’t the only factor to consider when designing mobile services. With more and more mobile users connecting to online content while engaging in their everyday lives, we need to focus on how we can build innovative services that integrate seamlessly into their world.

2.2 Social interactions are key
Mobile phones have always been deemed intimate, personal communications devices. They tend to be owned by one individual and do not tend to be shared. Despite this trait, recent studies show that there is a social, shared aspect to consider in mobile environments. For example, two studies of mobile information needs have highlighted that conversations have a significant impact on the types of information needs that arise while mobile and how users choose to address those needs [3, 16]. The same is true for mobile Internet behaviours. For example, Church & Oliver have shown that in > 65% of cases, mobile search was conducted in the presence of other people [2]. Likewise, a recent study of local mobile search has shown that in 63% of cases, mobile searches took place within a social context and were discussed with someone else in the group [18].

While research on the social context of mobile search and tools to facilitate collaboration in mobile search has been limited to date [10, 11], the same is not true for general Web search [1, 12, 14, 15, 20]. Going forward we believe there will be a need to support social, collaborative online experiences in mobile environments.

2.3 Curiosity & boredom are important motivators
Although research has shown that mobile Web access is motivated mainly by awareness [17], curiosity and diversion also account for a significant proportion of mobile Internet motivations [2]. These motivations relate to the user's desire to kill time, to alleviate boredom and to find out something about an unfamiliar topic (normally encountered by chance).

Searching the Internet has traditionally been viewed as driven by a specific information need in which search is considered successful if the information the user is looking for is found in a minimal amount of time. However, in casual search scenarios finding the right answer to a given query and finding that answer as quickly as possible may not be the main goals [19]. In fact, in casual search settings, the search may be considered successful even if the information the user is looking for is not found. In casual search scenarios people may browse the Web to pass time while they are idle, e.g. waiting for the bus. The information need may be vague or even nonexistent. Therefore, the measure of success of a casual search process is typically based on the level of user enjoyment during the search activity and/or on how long the user has been entertained for. Given that recent research in the mobile search space highlights that more and more users access content to kill time, to eliminate boredom and to satisfy their curiosity, we believe there is more opportunity to support casual search scenarios in mobile settings.

3. UNDERSTANDING THE SOCIAL CONTEXT OF MOBILE SEARCH
In this section we briefly outline results of a survey we conducted to understand more about social mobile search behavior. Survey participants were asked to recall their most recent social mobile search experience, i.e. a search conducted in a co-located group, to address a shared information need, and answer a series of questions. The questions we asked included: what they searched for, their information need, their motivation, who they were with, their relationship(s) to the people present, where they were located, what they were doing before and after the search activity, if and how they shared the search results, and if the search had any effect on their future plans.

193 participants were recruited from internal and external mailing lists, online social networks and discussion forums. All participants had to own an Internet-enabled mobile phone and had to perform mobile web searches at least a few times per month. Participants ranged in age from 18 to 61 (average: 31, SD: 6.9). Responses were provided by 134 men (69.4%) and 59 women (30.6%) and users came from a diverse range of backgrounds, e.g. IT, engineering, sales, telecommunications, education and customer service. The majority of our participants were residents of Spain (68%) and respondents primarily used Android (40.4%) handsets to perform their searches. Finally we found that the majority of participants (87%) stated that they used mobile search in social settings at least once per week, with 54.9% of participants using it at least once a day.

Three key findings from this survey that are relevant to this position paper are as follows: (1) curiosity and alleviating boredom were the primary motivation in social mobile search (almost 50% of responses), (2) the most popular information need related to trivia and pop culture (almost 40%) and (3) mobile users tend to share results by simply speaking aloud or sometimes showing their mobile phone screen. Rarely will users hand over their phone or share the results through electronic means.

When we analyzed user comments about what would improve their social mobile search experiences, many users pointed to more facilities for sharing the search results easily with their peers. Here are some examples of end-user comments: “Being able to share information through WhatsApp or applications like that”, “Shortcuts to send the information”, “sharing results should be a lot easier”, “sharing the screen between all participants”, “Some kind of co-browsing perhaps? Phone results mesh together”.

These findings combined with insights from past research show that searching and sharing search experiences, in a casual manner, among groups of friends represents a potentially fruitful area of future research that has been largely ignored to date. In the following section we outline what we think are important and open research questions within this new direction of mobile search.

4. DISCUSSION AND OPEN RESEARCH QUESTIONS
In this section we outline a set of open research questions to frame the challenges and opportunities of developing applications to facilitate casual, shared, social mobile search:
- What types of mobile interfaces and interactions would support or enrich the “sharing experience” during social mobile search?
- How can we enrich shared search experiences in relaxed social scenarios?
- Can we make shared mobile search experiences more entertaining for end-users?
- Will users share more search experiences if the sharing process were simple, quick and easy?
- Does the type of content have any impact on the sharing experience? That is, will users share differently if the content is dynamic (e.g. a mobile map) versus static (a simple web-page), or if the content is textual versus visual?
- Do users have preferences in terms of how they share contents? Do users prefer to share entire pages, snippets of pages or a “print screen” type view of the page in question?
- Would users enjoy and like the ability to re-visit shared mobile search experiences? How could shared search experiences be presented to users?
- Does time, group size or the relationships within the group impact on the sharing experience?
- Do users need to share remotely, i.e. beyond co-located groups? How might this physical distance impact on the experience?
- What are the technological challenges in building services to support casual, shared, social mobile search?

We are currently working on an early stage prototype designed to facilitate shared social mobile search in casual settings. By designing, building and evaluating this prototype, we hope we will be able to answer some of the research questions outlined previously. In the following section we present our initial ideas to support casual search and information sharing with co-located groups of friends via their mobile phones.

5. TOWARDS SHARED MOBILE SEARCH
To illustrate our ideas we present details of an early stage mobile prototype, the design challenges we face and our plans for future evaluations of this novel mobile search service. The prototype is designed to enhance social mobile search by facilitating (1) easy group identification in co-located settings, (2) options to share a variety of search elements among groups and (3) the ability to view and reminisce about past social mobile search experiences.

The software architecture we’re working on consists of two components: (1) an Android application that allows users to search and share their experiences; (2) a server that synchronizes and stores all search behaviour in a database. The server will also handle group identification and coordinate a notification facility, which will inform members of the co-located group about new “shares”. In addition, the server will log all the interactions between the user and the Android application for off-line analysis of user behaviour.

As a first step we worked on a number of iterations of a paper prototype. The prototype focuses on three main components, each with its own design challenges:

5.1 Easy Creation of a Sharing Session
Information sharing on mobile phones is currently a complicated process and results of our survey reveal that this is the main reason that people do not share results with one another at present. Existing mobile browsers tend to require the user to click several times in order to finally share a web page. This sharing is normally supported via email, SMS/MMS or social media like Facebook or Twitter. Each time a user wants to share another page, the same long sequence of clicks has to be repeated all over again. Other approaches to content sharing on mobile phones rely on Bluetooth, which is well known to be a cumbersome communication mechanism for end users. The goal of our application is to make the process of mobile Web information sharing as simple as a single click.

The first step to achieve this goal is to detect which phones are associated with the shared search experience/session. At present we’re focusing our efforts on using (1) GPS to identify all people within a given location who have the application installed and (2) a simple way for users identified in step 1 to confirm or verify they are a member of a specific group. Given that it’s likely that the use case for such an application is indoors, GPS will not provide the fine level of location granularity we require. This is the motivation for employing a second step in the group identification process. For step 2, we’re investigating a number of alternative approaches to confirm association with a specific group. We’d like this process to be fun and playful, therefore we’re playing with the use of accelerometers, gestures, images and video. For example, one option is to ask all users within the group and at a given location to shake their phones within a given time period to join a group. This fun, interactive action will involve using the accelerometer within the phone.

5.2 Easy Content Sharing
Our goal is to enable mobile users to share all Web search related content with the members of their group. Figure 2 illustrates a simple paper prototype with our main thoughts on how to approach this task. Given it’s likely that users will want to share a range of content types, we want to give users the ability to share (1) a single search result or the entire page of search results by pressing an appropriate “share” button (Figure 2 (a)), (2) an entire Web page or image result (Figure 2 (b)), and (3) interactive maps and addresses (Figure 2 (c)). Each time a piece of content is shared, that content is shown as a thumbnail in a bar at the bottom of the screen (Figure 2). Pressing a thumbnail opens the respective content again. The thumbnail bar is scrollable horizontally.

Figure 2. Sharing different contents.

5.3 Visiting past sessions
Finally, our prototype will enable users to access their past shared social search sessions. While our survey did not reveal a large proportion of users expressing a need for revisiting past sessions, this need was expressed by a few users and it’s a feature we’d like to implement and explore to see if it is in fact deemed useful by end users. A past shared social search session is any session for which the user instigated a “share” or was the recipient of a “share”. We are currently playing with different forms of presenting past shared search experiences to the end user. The first method is by time. Figure 3 illustrates two potential approaches to grouping shared experiences by time. We could show a small thumbnail for each past share, the name of the shared content and the name of the person who shared it (Figure 3 (a)) or a larger set of thumbnails to support a more visual UI (Figure 3 (b)). Another means of showing past shared search sessions is by group, that is, allow users to view all shared searches carried out with or among a certain group of people or with an individual. Finally, we could show past shared search sessions by location, that is, allow users to view all shared searches carried out at a specific place. It’s likely that the choice of interface will depend on a range of factors including personal preferences.

Figure 3. Visualizing past shares.

To date, we have developed a number of iterations of a paper-based prototype and carried out design reviews with 6 users in-house to gain feedback and insights on the interface, the interaction and the core functionality. We are currently working on implementing an Android application; however, we still have a number of technological challenges to overcome. Our plan is to deploy and evaluate the application in-the-wild, among groups of friends, to learn more about shared, social mobile search behaviours in the real world.

6. CONCLUSIONS
In this position paper we motivate the need to support casual, shared, social search experiences in the mobile space through a review of past work and an outline of key findings from a recent survey of social mobile search. We highlight a set of open research questions that we think will be important for the community going forward. Finally we illustrated our initial ideas by presenting examples of a work-in-progress mobile prototype, which is designed to support casual search and information sharing with co-located groups of friends.

7. ACKNOWLEDGMENTS
This work is funded as part of a Marie Curie Intra European Fellowship for Career Development (IEF) award held by Karen Church. Sofia Reis is currently an intern in Telefonica Research. As such this work was partly funded by Telefonica Research and by FCT/MCTES, through grant SFRH/BD/61085/2009. Note that the survey portion of the work was conducted with Antony Cousin of University of Nottingham while he was an intern at Telefonica Research in Autumn 2011.

8. REFERENCES
[1] Amershi, S. and Morris, M.R., CoSearch: a system for co-located collaborative web search. In Proceedings of CHI ’08, ACM (2008), 1647–1656.
[2] Church, K. and Oliver, N., Understanding mobile web and mobile search use in today’s dynamic mobile landscape. In Proceedings of MobileHCI ’11, ACM (2011), 67–76.
[3] Church, K. and Smyth, B., Understanding the intent behind mobile information needs. In Proceedings of IUI ’09, ACM (2009), 247–256.
[4] Church, K., Smyth, B., Bradley, K. and Cotter, P., A large scale study of European mobile search behaviour. In Proceedings of MobileHCI ’08, ACM (2008), 13–22.
[5] Cui, Y. and Roto, V., How people use the web on mobile devices. In Proceedings of WWW ’08, ACM (2008), 905–914.
[6] Kamvar, M. and Baluja, S., A large scale study of wireless search behavior. In Proceedings of CHI ’06, ACM (2006), 701–709.
[7] Kamvar, M. and Baluja, S., Query suggestions for mobile search. In Proceedings of CHI ’08, ACM (2008), 1013–1016.
[8] Karlson, A.K., Robertson, G.G., Robbins, D.C., Czerwinski, M.P. and Smith, G.R., FaThumb: a facet-based interface for mobile search. In Proceedings of CHI ’06, ACM (2006), 711–720.
[9] Kim, H., Kim, J. and Lee, Y., An Empirical Study of Use Contexts in the Mobile Internet, Focusing on the Usability of Information Architecture. Information Systems Frontiers. 7, 2 (2005), 175–186.
[10] Komaki, D., Oku, A., Arase, Y., Hara, T., Uemukai, T., Hattori, G. and Nishio, S., Content comparison functions for mobile co-located collaborative web search. Journal of Ambient Intelligence and Humanized Computing. (2011), 1–10.
[11] Kotani, D., Nakamura, S. and Tanaka, K., Supporting sharing of browsing information and search results in mobile collaborative searches. In Proceedings of WISE ’11, Springer-Verlag (2011), 298–305.
[12] Morris, M.R., Lombardo, J. and Wigdor, D., WeSearch: supporting collaborative search and sensemaking on a tabletop display. In Proceedings of CSCW ’10, ACM (2010), 401–410.
[13] Nylander, S., Lundquist, T. and Brännström, A., At home and with computer access. In Proceedings of CHI ’09, ACM (2009), 1639–1642.
[14] Paul, S.A. and Morris, M.R., CoSense: enhancing sensemaking for collaborative web search. In Proceedings of CHI ’09, ACM (2009), 1771–1780.
[15] Perez, J.R., Whiting, S. and Jose, J.M., CoFox: A visual collaborative browser. In Proceedings of CIR ’11, ACM (2011), 29–32.
[16] Sohn, T., Li, K.A., Griswold, W.G. and Hollan, J.D., A diary study of mobile information needs. In Proceedings of CHI ’08, ACM (2008), 433–442.
[17] Taylor, C.A., Anicello, O., Somohano, S., Samuels, N., Whitaker, L. and Ramey, J.A., A framework for understanding mobile internet motivations and behaviors. In Proceedings of CHI ’08 Extended Abstracts, ACM (2008), 2679–2684.
[18] Teevan, J., Karlson, A., Amini, S., Brush, A.J.B. and Krumm, J., Understanding the importance of location, time, and people in mobile local search behavior. In Proceedings of MobileHCI ’11, ACM (2011), 77–80.
[19] Wilson, M.L. and Elsweiler, D., Casual-leisure Searching: the Exploratory Search scenarios that break our current models. HCIR ’10: 4th International Workshop on Human-Computer Interaction and Information Retrieval (2010).
[20] Wiltse, H. and Nichols, J., PlayByPlay: collaborative web browsing for desktop and mobile devices. In Proceedings of CHI ’09, ACM (2009), 1781–1

Out and About on Museums Night: Investigating Mobile Search Behaviour for Leisure Events

Richard Schaller
Computer Science (i8)
Uni of Erlangen-Nuremberg
richard.schaller@cs.fau.de

Morgan Harvey
Computer Science (i8)
Uni of Erlangen-Nuremberg
morgan.harvey@cs.fau.de

David Elsweiler
I:IMSK
University of Regensburg
david@elsweiler.co.uk

ABSTRACT
When search behaviour is studied in information retrieval it is nearly always studied with respect to work tasks. Recent research, however, has indicated that search tasks people perform in leisure situations can be quite different. In leisure contexts needs tend to be more hedonistic in nature and often don’t require specific information to be found. Instead, information is sought that can lead to a specific emotional or physical response from the user, such as feelings of being stimulated or entertained. In this paper we investigate how people behave to meet such needs in one particular leisure context. We analyse search log data collected from a large-scale (n=391), naturalistic study of behavior with a mobile search tool designed to help people find events of interest to them at the Long Night of Museums, Munich. We examine the queries submitted, establish performance metrics and investigate how spoken queries differ from those typed via the keyboard on a mobile device. The findings provide insight into how users behave in one specific casual-leisure context and lead to several open questions for future research.

1. INTRODUCTION AND MOTIVATION
Search behaviour has traditionally been studied in the context of people completing work tasks. Despite its name, a work task need not be work-related. It is simply a sequence of activities a person has to perform in order to accomplish a goal [8]. A work task has a recognisable beginning and end, it may consist of a series of sub-tasks, and results in a meaningful product [3]. Correspondingly, the models we have of information seeking behaviour tend to assume that people look for information in response to a lack of understanding or the recognition of a gap in knowledge [2] preventing the completion of the task at hand.

Based on two investigative studies, one examining information needs in the context of television viewing and the other analysing broader information behaviour reported on Twitter, Elsweiler and colleagues [7] proposed a model for what they refer to as casual-leisure search, which deviates from standard work-based models. According to their model, in casual-leisure situations users seek information not in response to a knowledge gap, but with the aim of being entertained or passing time. Such needs tend to be directly related to mood, physical state or the surrounding social context. A further defining characteristic of such needs is that the informational content found by users is often less important than the feelings induced by the found content and/or the search process itself.

Beyond these two studies, very little literature explicitly focuses on information seeking behaviour in casual-leisure situations. Exceptions include studies of finding fiction [12] and non-goal oriented newspaper reading [14]. To our knowledge no other naturalistic studies of information behaviour in casual-leisure contexts exist. We believe that transactional studies, such as those that have provided a rich understanding of web search behaviour [9], would be particularly beneficial, as they would provide concrete insight into how people behave to resolve such needs. If the model proposed by Elsweiler et al. is correct and people do not care what information content is about, but rather are concerned primarily with the emotional or physical response to such content, then what do queries in casual-leisure situations look like? What do people try to describe with queries and how much effort do they expend in doing this? Are queries long and descriptive and are users willing to look through lots of results to find something suitable?

In this paper we describe a study designed to answer these kinds of questions. We report analyses of interaction logs for a search system supporting one specific leisure situation: the Long Night of Munich Museums, 2011. While we do not claim that the logs are representative of all casual-leisure search behaviour, they do provide an insight into how users behave in one specific casual-leisure context and a situation where the user has a high-level, hedonistic goal. Our findings represent a good starting point from which to investigate search behaviour more generally in casual-leisure situations.

2. DISTRIBUTED EVENTS
A distributed event is a collection of single events occurring at approximately the same time and conforming to the same general theme. One such event is the Long Night of Munich Museums (Lange Nacht der Münchner Museen), an annual cultural event organised in the city of Munich, Germany¹. In addition to a diverse range of small and large museums, other cultural venues, such as the Hofbräuhaus and the botanical garden, open their doors during one evening in October. Many venues organise special activities and exhibitions not otherwise available.

¹ The event is organised by Münchner Kultur GmbH (http://www.muenchner.de/museumsnacht/)

Visitors to the Long Night include both locals and tourists and represent a broad range of age groups and social backgrounds. In 2011 an estimated 20,000 people visited a total of 176 events at 91 distinct locations, including exhibitions, galleries and interactive events. Events take place all over the city, mostly in the city centre, but some, such as the Museum of the MTU Aero Engines and the Potato Museum, are located in suburbs. Special bus tours are set up to transport visitors between events.

From interviews (n=25) we conducted with people attending the evening we know that on average each visitor attends 4 events, meaning that approximately 80,000 visits took place in 2011. The standard way to discover events on offer is to use the booklet that is distributed for free by the organisers and contains descriptions of all events in the order they lie along the bus tours. This booklet is necessarily large (110 A6 pages) and can be difficult to navigate.

Only a few of our interviewees reported having specific events they would like to visit. Instead, most described having the same kinds of high-level, hedonistic needs as reported in the literature [6, 15], i.e. “to have a pleasant evening”, “to enjoy time with friends”, “to extend or diversify their general knowledge” etc. We will report on the interview results in detail in a future publication, but the findings seem to substantiate Elsweiler et al.’s model.

Here we want to establish how visitors to the Long Night of Museums query a search system to address these kinds of needs. We also want to know how successful they are, and identify noteworthy behaviours, problems and any potential solutions. The long-term goals of our work are to learn about behaviour in order to understand how to build better search tools and to augment existing theoretical models of casual-leisure search.

[...] we extended Lucene to perform a search based on topics. In a first step the event descriptions and titles were tokenised and stemmed. To match topically similar words we then map every token to one or more topic groups (these groups are taken from [4]). This way terms such as “dinner” and “food” are mapped to the same groups, thus event descriptions containing one of these words could be found by the other. To speed up interaction with the system, queries were submitted after each typed character (search-as-you-type). The presented result list contains the name and nearest bus stop for each of the retrieved events.
We present the results of initial analyses that lead to more detailed future research questions. 3. SYSTEM An Android app was developed to help visitors of the Long Night find events of interest to them personally. Once they have found and indicated the events they would most like Figure 1: The search screen with a query (left) and to visit, the system can create a time plan for the evening, the map screen with the planned route (right) taking into account constraints such as start and end times of events, time to travel between events and public trans- port routes and schedules. If the user chooses more events 4. METHOD than would fit into the available time2 , then the system tries We examined user search behaviour by recording user in- to maximise the number of scheduled events by leaving out teractions with our app at the 2011 Long Night. The app those that require long travel time. It is also possible for was available for download from the Android Market and the user to manually customise the plans by adding, remov- advertised on the official Long Night of Museums web page. ing and re-ordering events to be visited. Based on the cre- In total the application was downloaded approximately 500 ated plan, the application can lead the user between chosen times and 391 users allowed us to record their interaction events using a map display and textual instructions. Figure data. We recorded all interactions with the application in- 1 provides some screenshots of the app3 . cluding submitted queries, result click-throughs, all interac- The user has four ways to find events he would like to tions with browsing and recommendation interfaces, tours visit, namely he can: Browse events by bus route; browse generated, modifications to tours, as well as all ratings sub- events by event type (e.g. exhibitions, guided tours, interac- mitted for events. 
Users interacted on average for 45.26 tive event, etc.); submit free-text queries, which search over minutes5 with the system (median 19.31). 80.1% of users the names and descriptions of the events; receive recom- interacted for more than 5; 38.4% for more than 30. mendations based on a pre-defined profile and collaborative A short questionnaire provided us with demographic in- filtering algorithm built into the app. formation. 51% of the app users were first-time visitors to In this paper, in line with the research aims as outlined the Long Night of Museums, 22% were second-time visitors above, we focus on the way the search features were used. and 27% had attended more than twice previously. 4% of The search functionality was implemented in Lucene4 and users were 17 years of age or younger, 39% were between documents were represented by titles and descriptions from 18 and 29, 30% 30-39, 18% 40-49, 8% 50-59 and 1% above the Long Night booklet. Based on interviews conducted, 60 years old. These demographics are very similar to those we expected visitors to search for topics or for other high reported by event organisers for previous Long Nights [1] level needs not accessible for a full text search. Therefore suggesting that our sample of users should reflect well the 2 visitors as a whole. Comparing both age distributions with most events are open between 7pm and 2am Fisher’s exact test reveals a p-value of 0.29; thus it is highly 3 a video demo of the application can be found on YouTube (http://www.youtube.com/watch?v=woVjpivxtMc) 5 discounting times where no user interaction was recorded 4 Lucene version 3.1. (http://lucene.apache.org) for more than 15 seconds unlikely that the counts are drawn from di↵erent underlying smaller and much more specific than the web. Another ex- distributions. 
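To make the topic-based matching of Section 3 concrete, the following is a minimal Python sketch, not the app’s Lucene extension. The `TOPIC_GROUPS` table, the two event texts and all function names are hypothetical stand-ins: the real system takes its groups from [4] and also stems tokens, which is omitted here. The sketch only illustrates how mapping both event descriptions and queries to shared topic groups lets a query for “food” retrieve an event that mentions “dinner”.

```python
import re
from collections import defaultdict

# Hypothetical topic groups; the paper uses the groups from [4] instead.
TOPIC_GROUPS = {
    "dinner": {"food"},
    "food": {"food"},
    "painting": {"art"},
    "sculpture": {"art"},
}


def tokenise(text):
    """Lower-case the text and split it into word tokens (no stemming)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def topics_of(tokens):
    """Collect every topic group that any of the tokens belongs to."""
    groups = set()
    for token in tokens:
        groups |= TOPIC_GROUPS.get(token, set())
    return groups


def build_index(events):
    """Map each topic group to the ids of events whose text touches it."""
    index = defaultdict(set)
    for event_id, text in events.items():
        for group in topics_of(tokenise(text)):
            index[group].add(event_id)
    return index


def search(index, query):
    """Return ids of events sharing at least one topic group with the query."""
    hits = set()
    for group in topics_of(tokenise(query)):
        hits |= index.get(group, set())
    return sorted(hits)


# Made-up event descriptions, for illustration only.
EVENTS = {
    "e1": "Bavarian dinner in the museum cafe",
    "e2": "Guided tour of modern sculpture",
}
INDEX = build_index(EVENTS)
```

With these toy data, `search(INDEX, "food")` finds the event whose description contains “dinner”, and a term that belongs to no group returns an empty list. A production system would additionally fall back to ordinary full-text matching for such terms.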
Since queries were submitted after every typed character, it was necessary to pre-process the recorded queries to establish those that the users actually intended to submit. For example, if the user wanted to search for “food”, the system logged “f”, “fo”, “foo”, as well as “food”. Furthermore, should the user wish to submit a new query, then he must first remove the old search terms from the search box, resulting again in all prefixes but this time in decreasing length.

Automatically extracting the intended query proved difficult due to spelling errors and automatic correction. We therefore manually judged queries to be intended or not. Three assessors separately annotated all of the approx. 10,000 queries logged as being either intended or not-intended. A high inter-assessor agreement was found (Fleiss’ kappa = 0.872; 86.2% of queries which were labelled by at least one assessor were also labelled by at least one other assessor). This process resulted in a final list of 801 search queries, which is used in the following analyses.

5. QUERY CHARACTERISTICS

Overall the search queries were short, having a mean length of 1.21 terms (σ = 0.52) and 8.9 characters (σ = 5.31). These values are much shorter than those reported for similar mobile-like devices for web search. Kamvar and Baluja [10] report lengths of 2.3 terms for older mobile phones and new research suggests even longer queries (2.9 terms and 18.25 characters) for modern phones similar to those used in our study [11].

It was very apparent while analysing the queries that many represented searches for named entities, in particular the names of specific museums. Again three human assessors were asked to assign queries into categories: specific event name, not a specific event name or indeterminate. The third category was necessary as some queries were short and it was not possible to definitively claim that the term referred to a specific event. For example “deutsches” is likely to be a reference to the “deutsches Museum” but it is not possible to say for certain. For 87.3% of all queries at least two of the assessors were able to agree on one of the three categories (Fleiss’ kappa of 0.43).

59.4% of the agreed-on queries were marked as clearly named entities and 34.6% as possibly named entities. Only 6.0% were labelled as non-named-entity searches. These remaining searches were often queries for non-museum locations, e.g. 18.2% of these are names of bus stops.

Notably absent from the logs were queries describing topical content of events e.g. “art history”, “engineering”, “modern art”, etc. There were also no queries referring to properties of events e.g. “interactive”, “talks”, “discussions” and no evidence of high-level, hedonistic qualities an event might bring about e.g. “fun”, “exciting”, “entertainment”, etc.

In line with previous query analysis papers, we analysed the diversity of submitted queries. The cleaned query set contained 417 unique queries. As expected the distribution looks rather Zipf-like with the top 2 queries being “deutsches” and “deutsches Museum”. The top 50 unique queries amount to 43.1% of all queries, the top 10 amount to 16.6% and the most common search term was used in 2.5% of all searches. The entropy of the unique search terms is 2.44 bits. The queries submitted were therefore far less diverse than web search queries on desktop or mobile devices. This can be partially explained by the fact that our collection is much smaller and much more specific than the web. Another explanation for the more homogenous queries is the fact that most queries are event names, which are usually only one or two words long. This reduces the possibilities for searching for these names when compared with the possibilities to express interest, constraints or needs in general.

In summary, our main observation is that the queries submitted to the search system did not reflect the information needs described in the pre-study interviews. It seems as if the users did not use the search engine to discover new events, but rather used the feature to filter to events they already knew existed. Reflecting this, our queries have similar properties to those reported for known-item searches in web, email and desktop search, which have also been shown to be very short and contain a high percentage of named entities [5, 13].

6. QUERY PERFORMANCE

We wanted to understand how successful queries were. With this in mind we defined three success metrics based on the user’s interaction with search results. The first refers to whether the user selected a returned result to read a detailed description of the event. This metric is our equivalent to click-through data. 58.4% of all searches resulted in a click-through with an average of 0.73 clicks per query (σ = 0.93) and 5.95 results on average (σ = 9.10). We didn’t consider good abandonment since the result list contains no information beyond name and nearest bus stop.

Two further, more explicit, definitions of success were if the user marked a returned event as a candidate for tour inclusion (38.0% of all searches) or the user added the event to a preexisting tour (15.6% of all searches). These searches were performed at different stages of application use. Reflecting this we derived a general success metric: in 59.7% of all searches at least one of these three actions was performed. Of the remaining 40.3% unsuccessful queries, 59.8% used a search term which resulted in an empty result list, in most cases a misspelled or only partially written named entity. The huge number of spelling errors underlines the need for fuzzy search methods in this application context.

As the queries that were submitted were very short, we wanted to investigate if the length of the query had any impact on the success of the search. Searches defined as successful were on average longer with a mean of 1.26 terms (σ = 0.57) compared to unsuccessful searches with a mean of 1.13 terms (σ = 0.42); a highly significant difference (p ≪ 0.01). Likewise the number of characters per query was significantly (p ≪ 0.01) longer, with the successful searches having on average 9.90 characters (σ = 5.42) and the unsuccessful searches having just 7.47 characters (σ = 4.80). We implemented a search-as-you-type system which searches for whole words; however, the evidence suggests that users used the system as a means to filter to events they already knew about. Therefore, while the search term is being entered, the result list stays empty until the complete word has been typed. This might have led users to conclude that their queries would be unsuccessful and to abandon the search early. This would be one explanation for the shorter query length in unsuccessful searches.

7. TYPED VS SPOKEN QUERIES

An additional feature our app offers is the possibility to submit spoken queries. Rather than typing search terms in using the keyboard, the user speaks the query into the phone. The system uses Google Speech Recognition to identify the query terms and the user selects the queries based on a list. This is familiar to Android users as it is a standard feature for web search on Android phones. We wanted to establish how this feature was used, if queries submitted in this way differed from typed queries and whether there was a notable difference in performance between spoken and typed queries.
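The diversity figures reported in Section 5 (number of unique queries, share of the log covered by the top-k queries, entropy) can be computed along the following lines. This is a sketch on a made-up four-query log; the function name `diversity_stats`, the lower-casing normalisation and the use of base-2 Shannon entropy, H = -sum(p_i * log2(p_i)), are assumptions, since the paper does not state how its values were derived.

```python
import math
from collections import Counter


def diversity_stats(queries, k=10):
    """Summarise how diverse a list of submitted queries is.

    Returns the number of unique queries, the Shannon entropy (in bits)
    of the query frequency distribution, and the fraction of all
    submissions accounted for by the k most frequent queries.
    """
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    entropy_bits = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
    top_k_share = sum(c for _, c in counts.most_common(k)) / total
    return {
        "unique": len(counts),
        "entropy_bits": entropy_bits,
        "top_k_share": top_k_share,
    }


# Made-up log, for illustration only (not data from the study).
TOY_LOG = ["deutsches", "Deutsches", "deutsches museum", "bmw"]
STATS = diversity_stats(TOY_LOG, k=1)
```

On this toy log the three unique queries occur with probabilities 0.5, 0.25 and 0.25, so the entropy is 1.5 bits and the single most frequent query covers half of all submissions. A lower entropy or a higher top-k share indicates a more homogeneous query log.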
In total 22 app users submitted 68 spoken queries, which equates to 8.5% of all search queries. Of these, 6 users used it more than three times. When comparing the length of the search queries we discovered that voice searches tend to be considerably longer than typed searches: 1.8 (σ = 0.65) vs. 1.2 (σ = 0.46) terms and 14.9 (σ = 8.1) vs. 8.4 (σ = 4.6) characters. Both comparisons⁶ are significant (p ≪ 0.01). It seems it is easier to create long queries with the voice interface than by typing. The success rate is also significantly higher: 75% success for speech queries compared to 58.3% success for typed queries (p-value⁷: 0.01).

It could be that the complicated input method when typing, combined with the expectation of a filtering system, might have tempted people to give up early, whereas spoken queries are always full words. This would explain the ratio of empty result lists, where 11.8% of the voice searches have an empty result list compared to 25.2% of non-voice searches; a difference which is significant (p-value⁷: 0.013). In summary, there is evidence to suggest that voice search can be an effective tool for entering search queries on a mobile device in leisure situations. There are, however, issues such as background noise and user self-consciousness that may explain why only a limited set of users used this functionality.

⁶ Wilcoxon sign rank test
⁷ Two-tailed test of population proportion

8. DISCUSSION AND CONCLUSIONS

In this paper we analysed the query behaviour of users in a specific casual-leisure situation: a mobile application to assist users at a distributed event. It was apparent when analysing the queries that there was a mismatch between the queries people submitted to the search system and what we anticipated based on the needs reported in the interviews. The overwhelming majority of queries were partial or complete event names, where the user was trying to filter to a specific event. There were very few queries relating to topics that the user may be interested in e.g. “art”, “history”, etc. Furthermore there were no references to descriptors of events that people noted they wanted in interviews e.g. “interactive”, “talks”, “discussions”. Likewise there was no evidence of the high-level, hedonistic qualities an event might bring about e.g. “fun”, “entertainment”, etc.

This poses the question: why are people using the search system in this way? Are people conditioned to do so, i.e. do they have a preconceived notion about how search engines work and only use the system in ways that reflect this? Or is it because the app has other features, such as browsing by tour or genre, that might be better suited for tasks other than known-item search? To answer these questions we are currently analysing the log data for the other features of the system. A comparison with other casual-leisure search would also complement our understanding of this issue. Are there similar trends for search on YouTube, Wikipedia or the web?

Our analysis of query performance showed that a high number of spelling mistakes were made. We wonder if this is caused by environmental factors, e.g. typing on a bumpy bus, or by a high number of named entities, the spelling of which people are not familiar with. Further research would be needed to differentiate between the two; however, a fuzzy search feature would certainly help people who struggle with the query input. A grep-style search would further reduce this problem since users would only need to enter a few characters as opposed to whole terms. In the comparison of spoken vs. typed queries we have seen that, although voice input was not used much, it provides a more successful way of querying the system.

We also believe that voice queries deserve further research. The reason behind the decision for typing or speaking a query is difficult to analyse based on the logged data. Perhaps users are shy of speaking to their smartphone in public. Further studies would be necessary to gain a proper insight into this behaviour. The information obtained from this early study points to a number of potential avenues for further research. One plan we have is to look at different usage patterns with the system and see how they correlate with the outcomes of the evening e.g. number of events visited, the ratings of visited events, the geographical coverage of the user etc. This would provide insight into how the features of our system support casual-leisure needs.

Acknowledgments
This work was supported by the Embedded Systems Initiative (http://www.esi-anwendungszentrum.de).

9. REFERENCES

[1] Die Lange Nacht der Musik Besucherbefragung 2010. Münchner Kultur GmbH, 2010.
[2] N. J. Belkin, R. N. Oddy, and H. M. Brooks. ASK for information retrieval: Part I. Background and theory. Journal of Documentation, 38(2):61–71, 1982.
[3] K. Byström. Task complexity, information types and information sources. Examination of relationships. PhD thesis, University of Tampere, Dep. of Inf. Studies, 1999.
[4] F. Dornseiff. Der deutsche Wortschatz nach Sachgruppen. DeGruyter, Berlin, New York, 2004.
[5] S. Dumais, E. Cutrell, J. Cadiz, G. Jancke, R. Sarin, and D. C. Robbins. Stuff I’ve seen: a system for personal information retrieval and re-use. SIGIR ’03, pages 72–79, NY, 2003. ACM.
[6] D. Elsweiler, S. Mandl, and B. Kirkegaard Lunn. Understanding casual-leisure information needs: a diary study in the context of television viewing. IIiX ’10, pages 25–34, NY, 2010. ACM.
[7] D. Elsweiler, M. L. Wilson, and B. Kirkegaard Lunn. New Directions in Information Behaviour, chapter Understanding Casual-leisure Information Behaviour. Emerald Pub., 2011.
[8] P. Hansen. User interface design for IR interaction. A task-oriented approach. In CoLIS 3, pages 191–205, 1999.
[9] B. J. Jansen and A. Spink. How are we searching the world wide web?: a comparison of nine search engine transaction logs. IPM, 42(1):248–263, 2006.
[10] M. Kamvar and S. Baluja. A large scale study of wireless search behavior: Google mobile search. In CHI 2006, 2006.
[11] M. Kamvar, M. Kellar, R. Patel, and Y. Xu. Computers and iphones and mobile phones, oh my!: a logs-based comparison of search users on different devices. WWW ’09, pages 801–810, NY, 2009. ACM.
[12] C. S. Ross. Finding without seeking: The information encounter in the context of reading for pleasure. IPM, 35(6):783–799, 1999.
[13] J. Teevan, E. Adar, R. Jones, and M. A. S. Potts. Information re-retrieval: repeat queries in Yahoo’s logs. SIGIR ’07, pages 151–158, NY, 2007. ACM.
[14] E. Toms. Understanding and facilitating the browsing of electronic text. J. of Human-Comp. Studies, 52:423–452, 2000.
[15] M. L. Wilson and D. Elsweiler. Casual-leisure Searching: the Exploratory Search scenarios that break our current models. HCIR ’10, Aug 2010. New Brunswick, NJ.

The Information Needs of Mobile Searchers: A Framework

Tyler Tate, TwigKit, Cambridge, UK, tyler@twigkit.com
Tony Russell-Rose, UXLabs, London, UK, tgr@uxlabs.co.uk

ABSTRACT

The growing use of Internet-connected mobile devices demands that we reconsider search user interface design in light of the context and information needs specific to mobile users. In this paper the authors present a framework of mobile information needs, juxtaposing search motives (casual, lookup, learn, and investigate) with search types (informational, geographic, personal information management, and transactional).

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Search process; H.3.5 [Online Information Services]: Web-based services

General Terms
Design, Human Factors, Theory.

Keywords
Search, information retrieval, information needs, user experience, HCI, mobile, design principles.

1. INTRODUCTION

We live in a post-desktop era. In the UK alone, 45% of Internet users used a mobile phone to connect to the Internet in 2011 [7], and Morgan Stanley predicts that by 2014 there will be more mobile Internet users than desktop Internet users globally [6]. Not only are more people connecting with mobile devices, but they’re also consuming more and more data. Mobile data usage more than doubled every year between 2008 and 2011, and is predicted to grow from 0.6 exabytes per month in 2011 to 6.3 EB/month in 2015 [3]. The numbers are impressive, but all it really takes is a quick glance at the people around us to recognize that mobile Internet is pervasive.

Yet the practice of designing search experiences for mobile users is still in its infancy. The challenge is much more sophisticated than simply reworking existing user interfaces to fit on the smaller screens of mobile devices, which would be to ignore the vast situational differences between desktop and mobile search. Mobile search user interfaces must be based on an understanding of the contextual factors specific to the mobile user.

Chief among those contextual factors are the information needs that give rise to mobile search activities in the first place. In this paper we propose a framework for describing the diverse range of information needs observed in mobile users. Of particular relevance to the Search 4 Fun! workshop is our inclusion of the casual category alongside traditional classifications of information needs.

2. TWO DIMENSIONS OF INFORMATION NEEDS

Mobile information needs can be assessed by two criteria: search motive and search type.

2.1 Search Motive

The search motive describes the sophistication of the information need, along with the degree of higher-level thinking it involves and the time commitment required to satisfy it (see Figure 1). The lookup, learn, and investigate elements of motive shown below are derived from Gary Marchionini’s work on exploratory search [5], while the casual element has been more recently studied by Max Wilson and David Elsweiler [9]:

- Casual. Undirected/semi-directed activities with a hedonistic rather than task-driven purpose.
- Lookup. “Known item” searching.
- Learn. Iterative information gathering that requires moderate interpretation and judgment.
- Investigate. Long-term research and planning that demands significant high-level thinking.

While lookup, learn, and investigate are informational in nature, casual activities are more experientially and hedonistically motivated, “frequently associated with very under-defined or absent information needs” [9]. Though it may be possible to describe some casual activities in terms of other motives (e.g. casual information needs that share qualities of lookup or investigation), we believe that differentiating casual from the other three motives provides both clarity and legitimization.

Figure 1: Path’s notification screen, Wikibot’s search results, product reviews on CNET, and Mendeley’s personalized library of academic papers represent the casual, lookup, learn, and investigate motives, respectively.

2.2 Search Type

The search type, on the other hand, is concerned with the genre of information being sought (see Figure 2). Broder is often cited for recognizing the informational and transactional nature of many needs [1], while the geographic and personal information management goals identified by Church and Smyth are especially significant for mobile users [2]:

- Informational. Information about a topic.
- Geographic. Points of interest or directions between locations.
- Personal Information Management. Private information not publicly available.
- Transactional. Action-oriented rather than informational goals.

Figure 2: Google Search, Yelp, Greplin, and Groupon demonstrate the informational, geographic, personal information management, and transactional types, respectively.

3. A MATRIX OF MOBILE INFORMATION NEEDS

While the dimensions of motive and type provide a framework, they don’t tell us about the information needs themselves. Fortunately, Sohn et al. [8] and Church and Smyth [2] have each conducted diary studies in which smartphone-equipped adults spread across the globe were instructed to record every information need that arose over a period of weeks. In addition, Cui and Roto [4] have performed a contextual inquiry study of mobile Web usage. This research enables us to construct a matrix of mobile information needs based on the motive and type dimensions (see Table 1).

The majority of the information needs in the matrix were explicitly identified in the diary studies, though we added a few of our own in order to fully populate the framework. Below are examples of each information need, with quotation marks denoting statements recorded in the original diary studies.

Table 1: A matrix of mobile information needs

                                 | Casual                  | Lookup            | Learn                    | Investigate
Informational                    | Window Shopping         | Trivia            | Information Gathering    | Research
Geographic                       | Friend Check-ins        | Directions        | Local Points of Interest | Travel Planning
Personal Information Management  | Checking Notifications  | Checking Calendar | Situation Analysis       | Lifestyle Planning
Transactional                    | Acting on Notifications | Price Comparison  | Online Shopping          | Product Monitoring

3.1 Informational

- Window Shopping. I don’t know what I want. Show me stuff.
- Trivia. “What did Bob Marley die of, and when?”
- Information Gathering. “How to tie correct knots in rope?”
- Research. What is Keynesian economics and is it sustainable?

3.2 Geographic

- Friend Check-ins. “Where are Sam and Trevor?”
- Directions. “Directions to Sammy’s Pizza”
- Local Points of Interest. “Where is the nearest library or bookstore?”
- Travel Planning. Flights, accommodations, and sights for my trip to Italy.

3.3 Personal Information Management

- Checking Notifications. “Email update for work”
- Checking Calendar. “Is there an open date on my family calendar?”
- Situation Analysis. “What is my insurance coverage for CAT scans?”
- Lifestyle Planning. What should my New Year’s resolutions be this year?

3.4 Transactional

- Act on Notifications. Mark as read, delete, respond to, etc.
- Price Comparison. “How much does the Pantech phone cost on AT&T.com?”
- Online Shopping. I want to buy a watch as a gift. But which one?
- Product Monitoring. I know the make and model of used car I want. Alert me when new ones are listed.

4. DISCUSSION

This framework of mobile information needs originated out of an attempt to synthesize top-down HCIR concepts with bottom-up empirical data. We hope that future investigations of mobile behavior will use this framework as a conceptual point of reference when both constructing their studies and analyzing the results, which would undoubtedly bring about iterative improvement to the framework.

While the specific information needs that we have identified are unique to the mobile context, the dimensions of search motive and search type are themselves generic. We envision future studies applying this same framework to desktop information needs, as well as comparing and contrasting desktop vs. mobile information needs.

5. CONCLUSION

In this paper we have proposed a framework of mobile information needs in order to inform the design of mobile search user interfaces.

6. REFERENCES

[1] Broder, A. 2002. A taxonomy of web search. SIGIR Forum, Fall 2002, Vol. 36, No. 2.
[2] Church, K. and Smyth, B. 2009. Understanding the intent behind mobile information needs. IUI’09, February 8-11, 2009, Sanibel Island, Florida, USA. Copyright 2009 ACM 978-1-60558-331-0/09/02.
[3] Cisco. 2011. Cisco visual networking index: global mobile data traffic forecast update, 2010–2015.
[4] Cui, Y. and Roto, V. 2008. How people use the web on mobile devices. WWW 2008, April 21–25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04.
[5] Marchionini, G. 2006. Exploratory search: from finding to understanding. In Commun. ACM 49 (2006), no. 4, 41–46.
[6] Morgan Stanley: Meeker, M., Devitt, S., Wu, L. 2010. Internet trends.
[7] Office for National Statistics 2011. Internet access - households and individuals, 2011.
[8] Sohn, T., Li, K., Griswold, W., Hollan, J. 2008. A diary study of mobile information needs. CHI 2008, April 5–10, 2008, Florence, Italy. Copyright 2008 ACM 978-1-60558-011-1/08/04.
[9] Wilson, M.L. and Elsweiler, D. 2010. Casual-leisure searching: the exploratory search scenarios that break our current models. In Proc. HCIR’10, New Brunswick, NJ, USA, 28-31. 2010.

Role of Emotion in Information Retrieval for Entertainment (Position Paper)

Yashar Moshfeghi, School of Computing Science, University of Glasgow, Glasgow, UK, yashar@dcs.gla.ac.uk
Joemon M. Jose, School of Computing Science, University of Glasgow, Glasgow, UK, Joemon.Jose@glasgow.ac.uk

ABSTRACT

The main objective of Information Retrieval (IR) systems is to satisfy searchers’ needs. A great deal of research has been conducted in the past to attempt to achieve a better insight into searchers’ needs and the factors that can potentially influence the success of an Information Retrieval and Seeking (IR&S) process. One of the factors which has been considered is searchers’ emotion. It has been shown in previous research that emotion plays an important role in the success of an IR&S process which has the purpose of satisfying an information need. However, these previous studies do not give a sufficiently prominent position to emotion in IR, since they limit the role of emotion to a secondary factor, by assuming that a lack of knowledge (the need for information) is the primary factor (the motivation of the search). In this paper, we propose to treat emotion as the principal factor in an entertainment-based IR&S process, and therefore one that ought to be considered by the retrieval algorithms.

Categories and Subject Descriptors: H.3.3 Information Storage and Retrieval - Information Search and Retrieval - Information Filtering
General Terms: Theory
Keywords: Entertainment, Search, Information Retrieval, Information Science, Emotion

1. INTRODUCTION

The idea that IR systems help searchers to overcome their information need (IN) has been a leitmotif since the early days of IR: the main task is to locate documents containing information relevant to such needs. Within this view, a searcher is considered as an agent that interacts with an IR system with the intention of seeking information [3]. The information can be defined as facts, propositions, and concepts, as well as evaluative judgements such as opinion [6].

In this paper, we argue that the standard and dominant view does not sufficiently consider all the possible aspects of searchers’ needs. Information Science (IS) researchers have argued about the existence of needs other than IN, and discussed their roles in the cognitive aspects of human beings and in IR&S behaviour. Examples include Wilson’s interrelation between physiological, affective and information needs in IR&S behaviour [6] and Kuhlthau’s uncertainty principle [3]; these studies have investigated the role of affective and cognitive experience of a searcher in an information seeking process model.

Although these views better capture the searchers’ mind compared to the traditional view, their accounting for the role of emotion is limited to its relation with cognition in the process of satisfying an IN in an IR&S behaviour, e.g., Kuhlthau’s [3] model. Therefore, emotion plays a marginal role in these views in their modelling of needs. For example, in an IR&S scenario, where searchers’ task is to find documents that are topically relevant to a given query (e.g., Iraq War), the emotion that they experience during the completion of this task influences their performance and satisfaction. Other examples are those of Arapakis et al. [1] and Lopatovska [4] that investigated the use of facial expressions and peripheral physiological signals as implicit indicators of topical relevance.

Others, e.g., Wilson [6], consider a more autonomous role for affect and define affective need as an independent need which can motivate an IR&S behaviour. For example, gathering information to satisfy affective needs, such as the need for security, for achievement, or for dominance [6]. However, there is no operationalisation of this affective need suitable for use in real IR systems.

In general, the current landscape of the role of emotion in IR&S behaviour is incomplete. Moshfeghi [5] argued that people use computers for individual as well as social purposes, such as entertainment, dating, getting to know people, finding ‘friends’, gaming, etc., which strongly indicates that users try to satisfy needs other than information ones. The study conducted by Elsweiler et al. [2] also supported this claim. The current views of emotion in IR/IS do not sufficiently explain these types of activities accurately, even though it is clear that users search for emotionally-rich documents from the Internet to satisfy these needs.

The pervasiveness of emotionally-rich content on the web, such as movies, music, images, news, blogs, customer reviews, Facebook comments and Twitter, highlights the demand for such contents, and, indirectly, their role in satisfying searchers’ needs. Therefore, it is important to understand the IR&S behaviour backed up by an entertainment aspect.

this point of view, not only is emotion a factor that exists
The position of this paper is that emotion is a pri- throughout an IR&S process which aims to meet an IN, but mary motivation (either directly or indirectly) behind an also it can be considered as a need: the need to change entertainment-based IR&S behaviour. negative feelings caused by uncertainty during the initiation The rest of the paper is organised as follows: Section 2 phase (e.g. feelings of doubt, anxiety and frustration) to discusses Kuhlthau’s [3] model, followed by our approach in feelings of satisfaction and comfort. Section 3 and discussion and conclusion in Section 4. When the emotion need of the searcher is to diminish the negative feelings associated with a lack of knowledge (i.e., 2. EMOTION IN IR/IS an IN), the emotion need would be satisfied if the IN associ- There are many theories and models that attempt to ex- ated with it is resolved. However, in an entertainment-based plain the information seeking behaviour. Kuhlthau’s infor- IR&S process, the emotion need of the searcher is not asso- mation seeking process model is one of the first and most ciated with a particular IN, and is an autonomous need by popular models to investigate the a↵ective along with cog- itself. An example of such needs are the scenarios where the nitive and physical aspects of a searcher in an informa- searchers are stressed and look at some clips that could help tion seeking process. She proposes that people’s feelings, to relieve their stress, e.g., when searchers are seeking for thoughts and actions interact within their information seek- funny clips in YouTube. Of course, one way of finding these ing process. Kuhlthau’s information seeking process model clips is by looking at the popular (most viewed/highly rec- describes the searchers’ common patterns of seeking mean- ommended) videos. In such scenario there is no particular ing from information, to extend their knowledge state on a information need to be resolved, but only an emotion need. 
complex problem or topic which has a discrete beginning and From the above, we can now argue that emotion in an ending [3]. The fundamental principle behind Kuhlthau’s entertainment-based IR&S process acts as a primary factor, information seeking process is the uncertainty principle [3]. i.e. as an autonomous and important need. This refers to the existence of a cognitive state which causes feelings of anxiety and lack of confidence. Feelings of doubt, 4. CONCLUSIONS anxiety and frustration are in association with vague and In this paper, we explained the role of emotion in entertain- unclear thoughts. The model shows that during a typical ment-based IR&S behaviour. We explained that in the nor- information seeking process, the thoughts of a searcher be- mative view of IR/IS, the focus is on the satisfaction of come clear and consequently their confidence increases and searchers’ IN. Although the role of emotion is acknowledged their feeling of doubt, anxiety and frustration decrease. as a factor influencing the whole IR&S behaviour, its role Although this model is an important step towards under- was limited to the study of its influence on the process of standing the role of emotion in IR/IS, it does not encom- satisfying an IN. However, emotion can be a source of mo- pass many important aspects of emotion in IR. Kuhlthau tivation on its own for a searcher to engage in an IR&S considers emotion/a↵ect as a factor influencing the informa- process. Such scenarios have not been considered in the tion seeking process, rather than a need in itself. Moreover, IR/IS community, and this motivated the definition of the Kuhlthau’s model is limited by making uncertainty central, emotion need concept. We argued that there are emotion i.e., as driving the seeking process while we argue that pos- needs that can motivate searchers to engage in IR&S be- itive or negative emotion states, high or low arousal level, haviour which strictly speaking does not have an IN. 
The such as stress or boredom respectively, could also motivate pervasiveness of the use of IR applications for the purpose users to engage in an information seeking behaviour. There- of entertainment and the existence of emotionally-rich data fore, a key limitation lies in the fact that the a↵ective side on the web provides evidence that some information seeking of searchers is interpreted as only being a secondary moti- behaviour can be categorised under other strategies than in- vational source for information need. In this paper, we con- formation need that can lead to better satisfaction of the sider emotion as a separate need. This is explored further searchers’ needs. Given all these evidences, the conclusion in next section. of this paper is that emotion act as a primary factor behind entertainment-based IR&S behaviours. Finally, there is not 3. APPROACH much research about entertainment-based IR&S processes. This is due to the limitations associated with it, such as lack The goal of this section is to argue that emotion should of datasets, evaluation methodology, metrics and procedure. be considered as the primary factor in entertainment-based An attempt to solve such limitations is a possible direction IR&S behaviour: emotion can be considered as an individ- for future work. ual need which can motivate searchers to engage in an IR&S process. The secondary factor of emotion refers to the fact 5. REFERENCES that emotion (in relation to cognition) influences every as- [1] I. Arapakis, Y. Moshfeghi, H. Joho, R. Ren, D. Hannah, and J. M. Jose. Enriching user profiling with a↵ective features for pect of the searchers’ IR&S behaviour, and can thus influ- the improvement of a multimodal recommender system. In ence the success or failure of an IR&S process. First, we will CIVR, 2009. elaborate on emotion as a secondary factor in IR&S process. [2] D. Elsweiler, S. Mandl, and B. Kirkegaard Lunn. 
Understanding casual-leisure information needs: a diary study in the context of As discussed in Section 2, the secondary nature of emotion television viewing. In IIiX ’10, pages 25–34, 2010. in IR&S scenarios has been investigated for a long time [3]. [3] C. C. Kuhlthau. A principle of uncertainty for information The results of such investigations show that (i) participants seeking. Journal of Documentation, 49(4):339–355, 1993. experience a burst of negative feelings due to uncertainty [4] I. Lopatovska. Emotional correlates of information retrieval behaviors. In WACI’11, pages 1 –7, april 2011. associated with vague thoughts, leading them to recognise [5] Y. Mosheghi. Role of Emotino in Information Retrieval. PhD that they have an information need; and that (ii) there is a thesis, University of Glasgow, 2012. positive correlation between a successful information seeking [6] T. A. Wilson. On user studies and information needs. Journal of process and a decrease in these negative feelings [3]. From Documentation, 37(1):3–15, 1993. Searching Wikipedia: learning the why, the how, and the role played by emotion Hanna Knäusl Department of Information Science University of Regensburg 93040 Regensburg hanna.knaeusl@sprachlit.uni-regensburg.de ABSTRACT • Entity search, e.g. [2], which assumes the user has Searching Wikipedia has been the focus of study for an in- an information need that could be solved by with a creasing number of information retrieval publications. In list of entities that satisfy some properties. A query recent years different IR tasks have used Wikipedia as a ba- might, for example, indicate the type of entities to be sis for evaluating algorithms and interfaces for various types retrieved (e.g., “castle”) and distinctive features (e.g., of search tasks, including Question Answering, Exploratory “German”, “medieval”). Search, Entity Search and Structured Document retrieval. • Structured retrieval e.g. 
[3], which aims to retrieve Despite being associated with these well-defined task types, relevant parts of documents in a collection in response little is known about why people actually search wikipedia, to given information need. what they try to find, how and why they try to find it or the criteria they use to define success. We argue that the • Exploratory search e.g. [5], whereby the user has a way wikipedia content is generated influences the way it is poorly defined information need, little knowledge of used, including search behaviour. We are particularly in- the topic of interest or is unfamiliar with the search terested in learning about affective aspects of search, which space. have been suggested to be an important motivating factor Each of these examples are associated with well-defined in wikipedia search behaviour, particularly in leisure scenar- tasks or situations. However, it is unclear how reflective ios. In this position paper we motivate the investigation of these tasks are of real-life wikipedia search behaviour. Are wikipedia search behaviour in the wild and present our ideas these the most appropriate tasks to be investigating? Are on the best way to study this behaviour. we evaluating these tasks appropriately? Are there more pressing aspects that we, as a research community, should 1. INTRODUCTION AND MOTIVATION be investigating? As a starting point to answering these questions, in the Wikipedia1 is a free online encyclopedia, which due to its following section, we briefly review research that informs on open source design and community-based editing policy has wikipedia search behaviour in naturalistic situations. become one of the largest reference works of all time. The large volume of information, the breadth of topics covered and open-access nature of the collection has made Wikipedia 2. 
SEARCHING WIKIPEDIA a natural target of study within the Information Retrieval The main source of knowledge of wikipedia search be- research community. Wikipedia is now used as the document haviour comes from transaction log analyses. Sakai and collection for several retrieval evaluation efforts at CLEF [4] Nogami [6], for example, logged user interaction with a wikipedia and INEX [3] and has formed the basis of evaluations in search interface, designed to encourage exploration and de- several IR domains including: velopment of information needs. They discovered that infor- mation needs tend to progress and develop in small steps, • Question answering, e.g. [4], which attempts to pro- usually within query type. For example, users tended to vide answers to questions such as “How fast can a browse pages from person to person or from place to place Cheetah run?”, sometimes supplementing answers with etc. The implicit structure of wikipedia most likely encour- additional relevant snippets that might be helpful to ages this behavior the user. Fissaha and de Rijke [1] also used log analyses to learn 1 http://www.wikipedia.org about wikipedia searches, distinguishing between “directed” and “undirected” searches by analysing the phrasing of queries. They [also] discovered that a large percentage of searches were undirected and exploratory in nature. Log-based investigations such as these have the advantage of collecting large quantities of data from naturalistic situ- ations. However, they are limited in that they say nothing about the intention of the user, his experience, or the out- Presented at Searching4Fun workshop at ECIR2012. Copyright January come of the search. For example, the work of Wilson and 2012 for the individual papers by the papers’ authors. Copying permit- ted only for private and academic purposes. This volume is published and Elsweiler [7] asserts that many searches will not be moti- copyrighted by its editors. 
vated by information needs per se, but purely by the user having an interest in a topic. In their work, they found we ask more detailed questions regarding the experience, example search tasks that were motivated by the desire to success of the task, how the feelings realized and the factors achieving a particular mood, emotional or physical state or that influenced these. This data will be elicited through a by the presence or need of someone else in the social con- mixture of fixed and free-form questions. text. In such cases, the support the user would need from We plan to triangulate the data collected from the vari- the system and the criteria that should be used to evaluate ous aspects of our study to create a rich understanding of system performance would be very different to those cur- user needs and behaviour. For example, we plan to look rently featured in information retrieval research. at the content of visited pages; the topic and the kind of We believe that the way wikipedia is constructed, i.e., media used etc. and look to see how this relates to how par- collaboratively by a subset of the users, the large collection ticipants describe their experiences. We want to see, what size and broad topic range, linked structure, as well as mul- affects user behaviour, e.g. does the link structure or the timedia prominence of multimedia content will mean that way information is presented, certain content influence be- wikipedia will be used for leisure-time tasks. People are mo- haviour or emotions experienced. The different sources of tivated to create / edit wikipedia pages as it mirrors their data we will collect will help us to learn about these com- interests. This may not always be positive. plicated behavioural aspects. For example, Wilson and Elsweiler [7] describe one study participant reporting frustration that he has again wasted 4. CONCLUSIONS a lot of time aimlessly browsing ebay. 
This negative out- So what will we learn from the study and why is it impor- come - realised through a negative emotion - would not be tant? The most important point is to find out what makes considered in any current IR methodology. the users happy; what do they need, how do they behave In the following section we outline our thoughts on what to achieve these needs and emotional aspects are involved we believe to be a more suitable study design to learn about when Wikipedia is searched? An understanding of these is- wikipedia search tasks. We would like to use the workshop sues will inform us on the kind of functionality a wikipedia as a platform for discussion to improve on this design. search tool should offer. Do users want to browse to related topics? Do they like a wide range of possible interesting in- 3. LEARNING ABOUT BEHAVIOUR WITH formation or just quirky look up pieces of information as and when they are needed? The proposed study would offer the A LOG / DIARY HYBRID chance to answer these questions by providing naturalistic We need to design a study that helps us learn about the data, as well as additional comments from the participants the user’s motivation for searching, his behaviour in response of interest. to this motivation, his satisfaction with the experience as well as his emotional response to the experience. 5. REFERENCES To investigate these aspects we propose combining the log based approaches scholars have used previously with user [1] S. F. Adafre and M. de Rijke. Exploratory search in diaries. Diary Studies offer the ability to capture factual wikipedia. In Proceedings SIGIR 2006 workshop on data, in a natural setting, without the distracting influence Evaluating Exploratory Search Systems, 2006. of an observer. They also offer the chance to question the [2] G. Demartini, C. Firan, T. Iofciu, R. Krestel, and user regarding his motivation to search, as well as the search W. Nejdl. 
Why finding entities in wikipedia is difficult, process and feelings and emotions experienced during the sometimes. Information Retrieval, 13:534–567, 2010. search process. 10.1007/s10791-010-9135-7. Diary studies also have limitations. These include difficul- [3] INEX. Initiative for the evaluation of xml retrieval, ties in maintaining participant dedication levels throughout 2006. url: http://inex.is.informatik.uni - the period of study and getting the participants to remember duisburg.de/2006/. that situations of interest should be recorded. These neg- [4] V. Jijkoun and M. de Rijke. Overview of WiQA 2006. ative aspects can be offset, however, through careful study In A. Nardi, C. Peters, and J. Vicedo, editors, Working design. For example, since Wikipedia is digital and accessed Notes CLEF 2006, September 2006. within a web browser, it makes sense to use a digital diary [5] B. Kules and R. Capra. Designing exploratory search that can also be filled out in a web-browser session, perhaps tasks for user studies of information seeking support as a pop up. We plan to build an extension to the Firefox systems. In Proceedings of the 9th ACM/IEEE-CS joint web-browser that detects when a wikipedia page is accessed conference on Digital libraries, JCDL ’09, pages and if a certain time threshold has elapsed since the last 419–420, New York, NY, USA, 2009. ACM. diary entry, the user will be asked to record details about [6] T. Sakai and K. Nogami. Serendipitous search via his information need and the motivating situation surround wikipedia: a query log analysis. In Proceedings of the the search. The extension will also record interactions with 32nd international ACM SIGIR conference on Research wikipedia (e.g. pages viewed, search queries submitted etc.), and development in information retrieval, SIGIR ’09, allowing analyses similar to those published previously to be pages 780–781, New York, NY, USA, 2009. ACM. complemented by the diary study data. [7] M. L. Wilson and D. 
Elsweiler. Casual-leisure To limit the irritation that filling out such a form would searching: the exploratory search scenarios that break cause and to minimise distraction to the search process we our current models. In 4th International Workshop on plan only to ask two short questions at that time point. The Human-Computer Interaction and Information user will be asked to give a brief description of what they Retrieval, Aug 2010. New Brunswick, NJ. are looking for and why. This will be enough information to remind them of the situation at a later time point when Rushed or Relaxed? – How the Situation on the Road Influences the Driver’s Preferences for Music Tracks Linas Baltrunas Bernd Ludwig Francesco Ricci Telefonica Research, University of Regensburg, Free University of Bolzano, Plaza de E. Lluchi Martin 5, Universitätsstraße 31, Piazza Domenicani 3, Barcelona, Spain Regensburg, Germany Bolzano, Italy Linas@tid.es bernd.ludwig@ur.de fricci@unibz.it ABSTRACT For a recommender system, there is a major implication In context-aware recommender systems, the dependency of from this observation. If we can assess such an influence the user’s ratings on factors that describe important aspects for individual users we are able to better personalize recom- of the recommendation context is used to provide more rel- mendations. Beyond this, it may even be possible to group evant recommendations. users influenced in a similar way by certain contextual condi- Individual users may be influenced di↵erently by the same tions. This knowledge could lead to an improved prediction set of contextual factors. By understanding this kind of de- of ratings for items not previously rated by the user. pendency between the user’s ratings (evaluations) and con- With this in mind, it seems worth understanding the in- text, it is possible to identify user profiles and use them fluence of context on user ratings. 
In previous work [2], we to predict precisely the user ratings for items to be rec- reported on a collection of ratings data for music tracks while ommended. In this paper, we present our methodology to users experienced di↵erent stereotypical situations while driv- identify user profiles in a corpus of ratings for music tracks. ing a car. In this report, we focus on the analysis of this data These ratings were collected in a user study, which simu- with respect to the aims discussed above. Whether or not a lated typical situations that occur while driving a car. We particular aspect of context is important for predicting user present the findings derived from the data, and argue that ratings, is dependent on the user to whom the recommen- it is feasible to distinguish di↵erent typologies of users from dations are targeted. Our data suggest that di↵erent users the ratings they give to music tracks in specific contexts. have di↵erent perceptions of their surroundings and that these perceptions may influence musical preferences. Our data reveal that people assign di↵erent ratings to the same Categories and Subject Descriptors music track in di↵erent contexts and in many cases these H.3.3 [Information Storage and Retrieval]: Information di↵erences are statistically significant. Search and Retrieval—Information Filtering Our paper is structured as follows: In the next section we briefly present our data. Next, we introduce the mathemat- Keywords ical tools we use to analyze the influence of context on user ratings. In sections to follow, we present evidence that con- Recommender Systems, Context-based Reasoning, Collabo- text can provoke a change the music genres preferences of rative Filtering the user. In the final section, we discuss whether or not the influence of the context on ratings can even be observed for 1. 
INTRODUCTION individual users, and conclude the paper with a discussion Recommender systems predict user ratings for items on of the results and outline our plans for future work. the basis of previous ratings for similar items or similar users [5]. As users may rate the same item di↵erently depend- 2. DATA CORPUS AND CONTEXT MODEL ing on the situation in which they will experience or use the item, context-aware recommender systems [4, 6, 3, 1] As described in [2], we collected two independent data have become a popular research focus. The main idea is samples. In these experiments, driving situations were simu- to model context as a set of variables (contextual factors) lated with descriptions on a website. In the first experiment, each of which can take one of a finite set of discrete val- we intended to capture the influence of context on the ac- ues (contextual value). The user ratings are stochastically tive and conscious decision of a user to listen a tracks of a dependent on the contextual values. certain genre if at the same time he was exposed to a certain contextual factor. For this purpose, users were asked to fo- cus on one context factor at a time and rate the influence of this context factor on their decision to listen to a track of a randomly proposed genre on a three-level scale (POSITIVE, NEGATIVE, or NONE). In this way, the decision making process in this experiment was modeled as an active modification of the user’s attitude towards a genre. Over a period of three Presented at Searching4Fun workshop at ECIR2012. Copyright c 2012 for weeks, we acquired 2436 ratings from 59 users (Users were the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted recruited via email-lists and social networks). This study by its editors. 
was considered a pilot, and in order to avoid the sparse data Context Factor M IY (X, Y ) then defined as: sleepiness 0.169766732 XX P (x, y) traffic conditions 0.034971332 M I(X, Y ) = P (x, y) · log P (x) · P (y) weather 0.027759496 y2Y x2X driving style 0.025347564 M I can be normalized to the interval [ 1; 1] by computing road type 0.022788139 its value relative to the entropy of Y : natural phenomena 0.015574021 mood 0.013993043 M I(X, Y ) M IY (X, Y ) = P landscape 0.010431354 y2Y P (y) · log P (y) For X we have 2436 ratings (see Section 2 above). For each Figure 1: Mutual Information between Influence of of the context factors, we collected 95 ratings. Figure 1 Context on Ratings and Context Factors gives a numeric overview of the average ratings in the second data set and the impact of the single context factors on the average rating. The results indicate that users are influenced heavily by problem a small number of tracks for each genre were pro- variable driving conditions such as their own physical con- posed. 95 ratings were collected per contextual factor. dition (sleepiness) and external factors such as traffic and For our model of context, we relied on cognitive task anal- weather. Personal factors, such as their mood, and factor yses of car driving and considered three di↵erent kinds of a not directly related to the car driving task, such as the land- driver’s perceptions and actions as potentially relevant: scape in which users are traveling, are of minor impact. Context Factor Possible Values In the next step of our analysis, we wanted to understand driving style relaxed driving, sport driving whether the influence of context depends on the user pref- road type city, highway, serpentine erence for a music track. We hypothesized that if the user landscape coast line, country side, more strongly likes or dislike a track then his rating can be mountains/hills, urban significantly influenced by contextual factors. 
sleepiness            awake, sleepy
traffic conditions    free road, many cars, traffic jam
mood                  active, happy, lazy, sad
weather               cloudy, snowing, sunny, rainy
natural phenomena     day time, morning, night, afternoon

Situations where more than one passenger was present were beyond the scope of our research.

For the second sample, we collected tracks with ratings on a five star scale. The sample consists of 955 ratings ignoring any context factor and 2865 ratings taking one contextual condition into account. The ratings were given by 66 different users (including many who had participated in the first study). 69 to 167 ratings were collected per contextual factor depending on the assumed relevance for the experiment (see Figure 1 and the discussion in Sect. 3).

3. RELEVANCE OF CONTEXT FACTORS

When analyzing the dependency between contextual factors and ratings we could not make any modeling assumptions regarding the nature of the dependency. The same holds for inter-factor dependencies. Therefore, parametric models for the dependency such as linear regression are not appropriate. Instead, we had to find a non-parametric model. In information theory, the concept of mutual information of two random variables is known exactly for this purpose: it provides means to quantify the mutual dependence of two random variables.

In our case, we can apply mutual information to quantitatively assess the difference in the average ratings for music ignoring any influence of context compared to the average rating taking single contextual factors into account. More formally, we define a random variable X for the event that users assign one of the ratings 1, 2, 3, 4, or 5 to a genre (in the first sample) or to a track (in the second sample).

Secondly, we define another random variable Y for the event that one of the context factors holds in the current situation. Mutual information (MI) between X and Y is

In order to analyze this hypothesis we grouped the data into 5 partitions, one for each of the 5 possible ratings a user could assign to a track. That is, partition 1 (“the tracks disliked without considering context”) contains all tracks rated with 1 (while different context factors were activated), and partition 5 (“the highly preferred tracks”) contains the tracks rated with 5 in any context. Again, the influence of the context factors can be computed by measuring the mutual information and therefore the dependence between the random variable “a track is rated r without considering context” (r ∈ {1, 2, 3, 4, 5}) and the random variable “context factor c is active while a track is rated r”. Figure 2 shows the results of this experiment. A first look at the numbers gives the impression that the mutual information is generally higher than in the experiment documented in Figure 1. To test this in a statistically sound way, we compared the mutual information values for each partition to those shown in Figure 1 using a t-test. The results are given in the last row. With the exception of partition 3, which groups the tracks that users rated neutrally, the difference is statistically significant for each partition (the dot stands for α = 0.5, ** for α = 0.01, *** for α = 0.001). These findings suggest that when users have strong positive or negative opinions for certain tracks, the conditions they experience while driving a car can influence their ratings for these tracks more strongly.

Context factor          Partition 1    Partition 2    Partition 3    Partition 4    Partition 5
driving style           0.145373959    0.048822968    0.18469473     0.035874718    0.028085475
landscape               0.039462852    0.025682432    0.05470132     0.042950347    0.038938108
mood                    0.017266963    0.029724906    0.052830753    0.046422692    0.093026607
natural phenomena       0.022655695    0.053228548    0.084777547    0.024086852    0.082907254
road type               0.062203817    0.027293531    0.040344565    0.073388508    0.143056622
sleepiness              0.136737517    0.17566705     0.053153867    0.396715694    0.31060986
traffic conditions      0.036059416    0.121036344    0.124320839    0.032237073    0.139863842
weather                 0.089973183    0.064745768    0.03265592     0.019943082    0.053972648
Level of significance   .              **                            .              **

Figure 2: Mutual Information between Influence of Context on Ratings (POSITIVE, NEGATIVE, or NONE) and Context Factors Given a Certain Rating (key: ’.’: α = 0.5; ’**’: α = 0.01)

We also analyzed the influence of context on the preferences for certain music genres. For this purpose, we analyzed the data coming from the first study (see above). We formalized the user responses (POSITIVE, NEGATIVE, or NONE) as a random variable I. Given this variable, the genre G, and the activated context factor C, we can estimate the probability distribution P(I|G, C) from the first data set and compare it to the distribution P(I|G) which does not take any context into account. For our purposes, it is again interesting to compute the mutual information for the above random variables (C|G) and (I|G). The following table presents the top-3 results for all combinations of genres and context factors:

Genre       Context factor        Mutual information
Blues       driving style         0.324193188
            road type             0.216609802
            sleepiness            0.144555483
Classics    driving style         0.77439747
            sleepiness            0.209061123
            weather               0.090901095
Country     sleepiness            0.469360938
            driving style         0.363527911
            weather               0.185619311
Disco       mood                  0.177643232
            weather               0.17086365
            sleepiness            0.147782999
Hip Hop     traffic conditions    0.192705142
            mood                  0.151120854
            sleepiness            0.105843345
Jazz        sleepiness            0.168519565
            road type             0.127974728
            weather               0.106333439
Metal       driving style         0.462220717
            weather               0.264904662
            sleepiness            0.196577939
Pop         sleepiness            0.418648658
            driving style         0.344360938
            road type             0.268688459
Reggae      sleepiness            0.549730059
            driving style         0.382254696
            traffic conditions    0.321430505
Rock        traffic conditions    0.238140493
            sleepiness            0.224814184
            driving style         0.132856064

From these results, we can learn two lessons. First, within a given genre, the mutual information is very high only for some factors. Evidently, these have a strong influence on the user ratings. This outcome was not obvious before the experiment as the user preferences could have been stronger than the influence of the driving situation. However, some of these factors influence the ratings for (almost) all genres. We may conclude that they are strongly related to the cognitive and emotional state of a driver and therefore constitute important features for recommending music in a car.

Second, as the influence of context is evident, we may conclude that even users with strong preferences for certain tracks may change their opinion if they experience their driving situation intensively enough.

4. INDIVIDUAL USER TYPES

We now investigate the influence of context on individual users. We analyze the user ratings of the four users who gave most of the ratings in our second data collection phase (see above). We show that different contextual factors can influence different users in different ways. In the following tables, Mean with context (MCY) is the average rating of a user for all items rated under the assumption that the given contextual factor holds. Mean without context (MCN) is the average (of all users) rating for the same items without considering context. Differences in these averages are compared using a t-test in order to assess whether a contextual factor actually influences the user’s ratings in a significant way. We indicate the statistical significance of the difference between MCY and MCN with the p-value of the t-test.

We note that a recommender system can exploit the results of our data analysis when building a prediction model that integrates the average rating of many users for an item, a personalized component for a particular user, and a component for the context (see [2] for details).

User 1: Preferences above Average.

As can be seen in column MCN in Table 3b, this user, on average, rated the tracks in the data base higher than the others. The comparison with MCN of all users (see Table 3a) suggests that for this user many of the tracks were perceived very positively in driving situations demanding the driver’s attention. In fact, driving on a highway, on a serpentine or mountain road leads to an increase of the average rating (compared to MCN for all users). On the other hand, situations that can be perceived as negative (e.g. traffic jam) provoke a decrease of the user ratings. This observation similarly holds for some other factors: lots of cars, a situation quite similar to traffic jam, or driving in morning time. Interestingly, sport driving, which stands for a consciously sportive style of driving, has a negative influence on the average ratings of this user. Hence we hypothesize that the user is affected negatively by the tracks (mainly pop music) in situations that are likely to produce stress.

(a) MCN of all Users versus MCY for User 1
Factor             MCN         MCY         Tendency    α
highway            2.498429    3.521739    ↑           ***
traffic jam        2.498429    1.647059    ↓           **
city               2.498429    3.800000    ↑           **
serpentine         2.498429    3.529412    ↑           **
sport driving      2.498429    1.705882    ↓           **
lots of cars       2.498429    1.894737    ↓           **
coast line         2.498429    3.500000    ↑           *
mountains/hills    2.498429    3.307692    ↑           .
active             2.498429    1.866667    ↓           .
country side       2.498429    3.272727    ↑           .

(b) MCN versus MCY of User 1
Factor             MCN         MCY         Tendency    α
traffic jam        3.077586    1.647059    ↓           ***
lots of cars       3.077586    1.894737    ↓           ***
sport driving      3.077586    1.705882    ↓           ***
active             3.077586    1.866667    ↓           **
morning            3.077586    2.000000    ↓           **
city               3.077586    3.800000    ↑           *

Figure 3: Profile of User 1. Only those factors with statistical significance are shown.

User 2: Preferences around Average with Positive Tendency towards Tracks.

In this example the user has a personal average rating similar to the other users. This phenomenon is not an effect of any context. The sign of the significant differences between MCN and MCY in Table 4a indicates that this user likes the tracks in the corpus when he feels awake. Being sad, he would never like to listen to the tracks. In general, for this user the traffic situation (differently from user 1) seems to play a minor role. Many significant differences in his ratings can be found comparing his MCY with his non-contextualized ratings (own MCN) as well as with the rating of all the users (MCN), for personal factors such as the mood and the perception of the surrounding landscape.

(a) MCN of all Users versus MCY of User 2
Factor             MCN         MCY         Tendency    α
happy              2.498429    1.444444    ↓           **
serpentine         2.498429    1.709677    ↓           **
urban              2.498429    1.760000    ↓           *
awake              2.498429    3.642857    ↑           *
country side       2.498429    1.807692    ↓           *
sad                2.498429    1.846154    ↓           *
afternoon          2.498429    2.000000    ↓           .
relaxed driving    2.498429    2.025641    ↓           .

(b) MCN versus MCY of User 2
Factor             MCN         MCY         Tendency    α
happy              2.432692    1.444444    ↓           **
serpentine         2.432692    1.709677    ↓           *
awake              2.432692    3.642857    ↑           *
urban              2.432692    1.760000    ↓           *
country side       2.432692    1.807692    ↓           .
sad                2.432692    1.846154    ↓           .

Figure 4: Profile of User 2. Only those factors with statistical significance are shown.

User 3: Preferences slightly below or on Average with Negative Tendency towards the Tracks.

In this user profile, the factors provoking significant differences between MCN and MCY (see Table 5a) are mostly personal ones or factors that indirectly influence personal attitudes or the cognitive load of the driver (i.e. road type). As many of the tracks used for our data collection were pop songs, and on average the user assigns low ratings, we can conclude that he has a strong dislike for this kind of music. This impression is strengthened by the observation that negative emotions (such as sad) lead to even worse ratings for tracks than on average for this user.

(a) MCN of all Users versus MCY of User 3
Factor             MCN         MCY         Tendency    α
sad                2.498429    1.333333    ↓           **
day time           2.498429    1.666667    ↓           **
active             2.498429    1.769231    ↓           *
serpentine         2.498429    1.714286    ↓           *
coast line         2.498429    2.000000    ↓           .

(b) MCN versus MCY of User 3
Factor             MCN         MCY         Tendency    α
sad                2.329787    1.333333    ↓           **
day time           2.329787    1.666667    ↓           *
active             2.329787    1.769231    ↓           .

Figure 5: Profile of User 3. Only those factors with statistical significance are shown.

User 4: Preferences below Average.

In this user profile, there are several highly significant differences between the MCN of all users and MCY (see Table 6a). In every case, the tendency is negative, indicating that there are almost no situations in which tracks from the data set should be recommended to such a user. Probably this user does not like the tracks in the corpus, or he does not even like to listen to music at all while driving. The significance level of the difference between the personal MCN and MCY (see Table 6b) is here slightly smaller than in the previous comparison. Moreover, there is one personal factor (awake) under which the user rated significantly higher. But, as there are many factors with almost identical ratings to the already low non-contextualized ratings, in most situations the items should not be recommended to this user. From this observation, we can assume that as this user dislikes tracks very strongly, it is hard to find context factors that may change his attitude.

(a) MCN of all Users versus MCY of User 4
Factor             MCN         MCY         Tendency    α
day time           2.498429    1.166667    ↓           ***
afternoon          2.498429    1.666667    ↓           **
highway            2.498429    1.700000    ↓           *
urban              2.498429    1.769231    ↓           *
morning            2.498429    1.714286    ↓           .
mountains/hills    2.498429    1.714286    ↓           .
country side       2.498429    1.700000    ↓           .

(b) MCN versus MCY of User 4
Factor             MCN         MCY         Tendency    α
day time           2.175676    1.166667    ↓           ***
awake              2.175676    3.222222    ↑           .
afternoon          2.175676    1.666667    ↓           .

Figure 6: Profile of User 4. Only those factors with statistical significance are shown.

5. CONCLUSIONS AND FUTURE WORK

We have presented a non-parametric approach to assess the impact of a set of contextual factors on the user ratings. Our findings from the analysis of two data collections suggest that the perceptions and experiences during the execution of a task influence user preferences even for non-crucial items such as music tracks to be played in a car.

5.1 Influence of Context

We found empirical evidence that the driving situation indeed influences the driver’s preferences for music. The influence of context may even be strong enough to modify the preference of a user for his favorite tracks.

The findings also suggest that the cognitive load of the driver, his emotional, mental, and physical state, and current traffic conditions influence his preferences.

These findings are surely affected by the set of tracks used in the study. We used this set as the reported experiments were developed within an industrial project, and the tracks were provided by the media platform of the industrial partner. It is an interesting task to collect data for another set of tracks (a wider set of types of tracks, or one with a different specialization) and repeat the analysis.

5.2 Critical Discussion of the Study Design

It is important to note the constraints and conditions of our study design. First of all, in the web survey, we created fictive situations that the subject should imagine. Hence, the test persons may have overestimated the relevance of the contextual factors on their music preferences. Hence, a different study where users are actually facing certain contextual conditions is in order. But before performing that evaluation, our study clearly indicates that users perceive context as important and influential, and different users, with different music preferences, have completely different perceptions. To assess this result quantitatively, the web survey and the described methods represent a simple way to collect and analyze data. In fact, we exploited our results in the implementation of a real music recommender system and player [2]. Besides, it is also important to note that during our study users rated the music tracks just after listening to them. This is not always the case in many recommender systems (e.g. MovieLens or Netflix), where often the ratings are provided long after the user experienced the items.

5.3 Consequences for Future Work

Currently, we are preparing a new study with an improved experimental setup: we are merging our prototype with another application that allows onboard data to be logged in a car. We will equip cars of test persons with this tool and collect data in real driving situations. The logged data will allow us to detect the values of certain contextual factors from onboard information about the car and its navigation system. Furthermore, we will be able to combine this data with feedback from the users (e.g., which of the recommended tracks are played or skipped). From such a new collection of data, gained in a naturalistic setting, we will validate the findings of our simulation study.

6. REFERENCES

[1] G. Adomavicius and A. Tuzhilin. Context-aware recommender systems. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages 217–250. Springer, 2011.
[2] L. Baltrunas, M. Kaminskas, B. Ludwig, O. Moling, F. Ricci, A. Aydin, K.-H. Luke, and R. Schwaiger. Incarmusic: Context-aware music recommendations in a car. In (to appear) Proceedings of the 12th International Conference on Electronic Commerce and Web Technologies, 2011.
[3] L. Baltrunas, M. Kaminskas, F. Ricci, L. Rokach, B. Shapira, and K.-H. Luke. Best usage context prediction for music tracks. In 2nd Workshop on Context-Aware Recommender Systems, 2010.
[4] A. Chen. Context-aware collaborative filtering system: Predicting the user’s preference in the ubiquitous computing environment. In T. Strang and C. Linnhoff-Popien, editors, Location- and Context-Awareness, volume 3479 of Lecture Notes in Computer Science, pages 244–253. Springer Berlin / Heidelberg, 2005.
[5] Y. Koren and R. Bell. Advances in collaborative filtering. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook. Springer, 2011.
[6] G.-E. Yap, A.-H. Tan, and H.-H. Pang. Discovering causal dependencies in mobile context-aware recommenders. In MDM 06: Proceedings of the 7th International Conference on Mobile Data Management, page 4, Washington, DC, USA, 2006. IEEE Computer Society.

Serendipitous Browsing: Stumbling through Wikipedia

Claudia Hauff and Geert-Jan Houben
Web Information Systems
Delft University of Technology
Delft, the Netherlands
{c.hauff,g.j.p.m.houben}@tudelft.nl

ABSTRACT

While in the early years of the Web, searching for information and keeping in touch used to be the two main reasons for ’going online’, today we turn to the Web in many different situations, including when we look for entertainment to pass the time or relax. A popular tool to facilitate the users’ desire for entertainment is StumbleUpon, which allows users to “stumble” through the Web one (semi-random) page at a time. Interestingly to us, many StumbleUpon users appreciate being served Wikipedia articles, which are informative pieces of text that educate the reader about a particular concept. The leisure activity of stumbling can thus also incorporate a learning experience. Since life-long learning is an important characteristic of knowledge economies, it is crucial to understand the interplay between these two, at first sight, opposing forces. We hypothesize that a greater understanding of what makes certain Wikipedia articles more attractive to the serendipitously browsing user than others will enable us to develop adaptations that expose a greater amount of Wikipedia articles to the leisure seeking user.

Categories and Subject Descriptors: H.3.3 Information Storage and Retrieval: Information Search and Retrieval
General Terms: Human Factors, Experimentation
Keywords: free-choice learning, educational leisure, serendipitous browsing

1. INTRODUCTION

In the early years of the Web, searching for information and keeping in touch used to be the two main reasons for ’going online’. Today, we rely on the Web in increasingly diverse situations including shopping, consultations and learning. While these examples are all directed towards a particular goal the user has, we also turn to the Web at times when we simply want to be entertained to pass the time or relax. The possibilities for entertaining yourself on the Web are manifold: one can play games, listen to music, watch movies or simply browse through the Web in the hope of finding entertaining pages. Due to the sheer size of the Web though, random browsing is not effective for discovering pages that may be interesting to the individual user. For this reason, a number of services have become popular that recommend web pages to users based on their interests. One popular tool to facilitate the users’ desire for entertainment by serendipitous browsing is StumbleUpon¹ (SU), which allows users to “stumble” through the Web one (semi-random) page at a time. Interestingly to us, many SU users appreciate being shown Wikipedia² articles, which are informative pieces of text that educate the reader about a particular concept. The leisure activity of stumbling thus can also incorporate a learning experience, which might contribute to the development of novel ideas and lead to creative insights. Since life-long learning is an important characteristic of knowledge economies, it is crucial to understand the interplay between these two seemingly opposing forces (entertainment vs. learning). We hypothesize that a greater understanding of what makes certain Wikipedia articles more attractive to the serendipitously browsing user than others will enable us to develop adaptations that expose a greater amount of Wikipedia articles to the leisure seeking user.

In this position paper we make an argument for the importance of this task. We draw from a number of insights gained in museum studies [11] where the question of how learning can be facilitated in leisure settings (the museum visit) has been investigated for many years. While we do not consider the SU pages to be similar to museum objects, we do find a number of parallels.

A first experiment on the stumbled Wikipedia pages revealed that, just as in museums not all objects are equally attractive to visitors, not all articles are interesting to the average StumbleUpon user. In fact, only a very small number of Wikipedia articles gather a large number of views by SU users; most articles are rarely viewed. While we have no answer yet to the question of how to automatically classify articles according to their attractiveness to the serendipitously browsing user, we have developed a number of hypotheses which are outlined in Section 3.2.

If we assume for a moment that we are indeed able to develop such an approach, a number of application scenarios can be envisioned:

• A qualitative study of the features that play a role in triggering the interest of users who do not have an information need will enable Wikipedia contributors to write their articles in a way that is more accessible to such users.
• Wikipedia is available in many different languages and such a prediction method would allow us to bootstrap a recommender like StumbleUpon in different languages by adding an initial set of interesting, high quality pages before the critical mass of users is reached.
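The comparison of MCY and MCN described in Section 4 of the in-car music study can be sketched as a two-sample t-test on a user's ratings with and without an active context factor. The excerpt does not say which variant of the t-test was used, so the sketch below assumes Welch's unequal-variance test and only computes the statistic and its approximate degrees of freedom; the helper name and the ratings are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na           # squared standard error of mean a
    vb = variance(sample_b) / nb           # squared standard error of mean b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical ratings, not taken from the study.
ratings_without_context = [3, 2, 3, 2, 3, 2, 3, 2]   # basis for MCN
ratings_with_context = [4, 5, 4, 5, 4, 5, 4, 5]      # basis for MCY

mcn = mean(ratings_without_context)
mcy = mean(ratings_with_context)
t_stat, dof = welch_t(ratings_with_context, ratings_without_context)
```

A positive statistic corresponds to an upward tendency (MCY above MCN); the p-value would then be read from a t distribution with the returned degrees of freedom.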
• Outliers (articles with many ’Likes’ but a low probability of being attractive) can be manually investigated to reduce spam. Or conversely, undiscovered articles are obtained and can be injected into the index.
• The passages that trigger the surprise or the attractiveness of an article can be identified and highlighted to the browsing user. This may help to keep those serendipitously browsing users engaged that initially only quickly scan the article.
• E-learning applications can also benefit, as articles which are interesting to the casual reader can be found this way.

The rest of the paper is organized as follows: related work is presented in Section 2, followed by a preliminary analysis of stumbled Wikipedia pages (Section 3) and the conclusions (Section 4).

¹ http://www.stumbleupon.com/
² http://www.wikipedia.org/

Presented at Searching4Fun workshop at ECIR2012. Copyright © 2012 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

2. RELATED WORK

For this work, we draw inspirations from two areas. On the one hand we consider research into so-called educational leisure settings and free-choice learning which is a multi-disciplinary field that includes aspects from sociology, psychology and education. On the other hand, our work is also strongly related to serendipity.

Educational leisure settings can be found in a wide range of institutions including museums [12], national parks, zoos, science centers [5], etc. As the name suggests, these institutions serve two purposes: to educate the public as well as to provide an entertaining experience to the visitors. Educational leisure settings can be characterized by a number of commonalities with respect to the visitors and their learning experience [9, 10, 11]: (i) the visitors gain direct experience, (ii) they decide what and whether at all to learn, (iii) the learning process is guided by their interests, (iv) learning is influenced by the visitors’ social interactions and (v) the visitors are a highly diverse group, with different educational backgrounds and prior knowledge. Since learning in this setting is voluntary, the visitors’ motivation plays an important role: why did they come?

Serendipity, the act of encountering information nuggets unexpectedly, has mostly been investigated in the context of education [3] and work-related discoveries after serendipitous moments. One of the works outside of this realm is [6] where tools were developed to help people reminisce in their own digital collections. In goal-directed Web search the potential for serendipitous encounters has also been recently investigated [2], while [1] offers an insightful discussion of serendipity and how it is used, exploited and induced in computer science.

Finally we note that different aspects of Wikipedia articles have also been investigated in the past, though not from a perspective of serendipitously browsing users. For instance, in [7] it was found that the writing style distinguishes so-called featured articles in Wikipedia³ from unfeatured articles. Classifying Wikipedia articles according to their quality, as defined by Wikipedia contributors, was also investigated in [13], where network motifs and graph patterns in the editor-article graph were exploited.

³ Featured Wikipedia articles are of particularly high quality and chosen by Wikipedia editors.

3. STUMBLEUPON

The usage of StumbleUpon is depicted in Figure 1. A user “stumbles” pages with a simple click of the ’Stumble!’ button in his browser toolbar. In response, the user is presented with a random page from the Web, biased according to his user profile or his friends’ ’Likes’. The simplicity of the system protects the user from information overload [8, 4]: a user has only two choices when faced with a stumbled page, either to start reading or to continue stumbling. Users can also contribute pages to the SU index: whenever a SU user discovers a web page that is not yet in the index and that he likes, he can add it by means of the ’Like’ button. Finally, for each page in the SU index, there is a SU page which contains meta-data, including the number of users who viewed/liked the page, the category the user who discovered the page placed it in and the comments users left about the page.

Figure 1: A StumbleUpon user can contribute Web pages he likes to the index and he can “stumble” pages that are in the SU index according to his interests. One page at a time is shown; the user can provide feedback in terms of like and dislike. (Diagram components: page submission, user rating, user discovery, user browsing, ’Stumble!’, page recommender engine, page index, user profiles, web page with meta-data, page infos available for each entry.)

3.1 Wikipedia Articles in StumbleUpon

In all experiments we report here, we utilize the English Wikipedia dump enwiki-20111007 from October 2011. In a pre-processing step, we selected all Wikipedia articles that are neither redirects to other articles, nor new articles or explicit disambiguation pages and have a length of at least 500 characters (to remove stubs). In total, 3,552,059 articles remained.

In order to determine the popularity of Wikipedia articles in StumbleUpon, we randomly selected half of these Wikipedia articles and queried the StumbleUpon API for their number of views by SU users. Since SU is a recommendation engine, we can safely assume that the highly viewed pages are also highly popular and liked. We note that the number of ’Likes’ a page has received is not accessible through the StumbleUpon API. The information is accessible though at the SU meta-data page, which we manually checked for the results reported in Table 1.

Among the evaluated 1,776,029 articles, we found 267,958 (15.13%) of them to be contained in the SU index. In our initial investigation, we also considered French and German Wikipedia which are two of the largest non-English Wikipedia repositories. However, we only found a very limited number of their articles in the SU index (in both cases less than 1%) and thus did not consider them further. Thus, an application scenario as proposed in the introduction (to bootstrap a recommender for a new language) is highly desirable.

Let us now focus on those articles that were submitted by Stumblers to the index. Figure 2 shows a scatter plot of the number of views versus the number of Wikipedia articles in the index. As can be expected, most articles have very few views (the median number of views is 10) while a small number of articles have gathered more than half a million views.

Figure 2: Log-log scatter plot of the number of views versus the number of articles in the SU index. (x-axis: Number of views, 1 to 10,000,000; y-axis: Number of pages, 1 to 100,000.)

To give an impression of the type of articles that have gathered few or many views, Table 1 contains the ten most viewed Wikipedia articles in our data set as well as ten random examples of articles that were viewed one hundred times. We chose these two settings as they represent two extremes: on the one hand, articles that were viewed and also liked by a large number of people and on the other hand articles that were shown a number of times but less well received by the SU users.

Most viewed articles                 #Views    #Likes    SU Category         Date
List of unusual deaths               3.99M     0.423M    Bizarre/Oddities    12/2004
Flying Spaghetti Monster             1.39M     0.121M    Satire              08/2005
Wrap rage                            0.86M     0.040M    Bizarre/Oddities    01/2008
Shigeru Miyamoto                     0.75M     0.019M    Video Games         10/2003
Benjaman Kyle                        0.74M     0.051M    Bizarre/Oddities    12/2008
One red paperclip                    0.72M     0.070M    Bizarre/Oddities    09/2006
List of colors                       0.70M     0.066M    Arts                01/2005
Do not stand at my grave and weep    0.64M     0.132M    Poetry              10/2007
Fuel cell                            0.56M     0.009M    Science             06/2005
Raymond Robinson (Green Man)         0.54M     0.036M    Bizarre/Oddities    05/2008

Example articles viewed 100 times    SU Category
Biblioscape                          Software
Edge of chaos                        Chaos/Complexity
Gottfried Wilhelm Leibniz Prize      Biology
Mario Buda                           Crime
Proto-Indo-European language         Linguistics
Cisco Adler                          Alternative Rock
Biofeedback                          Psychology
Ovipositor                           Sexual Health
Concealer                            Beauty
Winklepickers                        Fashion

Table 1: A list of Wikipedia articles that are contained in the SU index. For the most viewed articles, shown are also the number of views and likes in million, the category in StumbleUpon the page was assigned to by the user who discovered the page and the date (month/year) at which the page was discovered.

It should also be noted that the SU category Bizarre & Oddities, which dominates the list of the ten most viewed articles, is not as prevalent when considering a larger set of articles. In fact, the top 100 viewed articles in our data set belong to 59 different SU categories: Bizarre & Oddities occurs 12 times, followed by the Writing category (5 times) and a number of categories with three occurrences, including Arts, Science and Linguistics. Only one of the top 100 articles was a so-called featured article (indicating that previous work on featured article prediction, e.g. [7], might not be applicable here), while seven were semi-protected articles due to previous vandalism activities. Notable is also the fact that 12 out of the 100 articles are of the form List of X where X = {algorithms, legendary creatures, band name etymologies} to name three examples.

While for a human reader it is usually not difficult to quickly judge whether an article is potentially interesting to him or not, it is a challenge to derive a method that automatically classifies articles accordingly. What exactly makes one article more interesting to the general public than another? In order to get a first understanding of what users think about the most viewed articles and possibly also why they like them, we analysed the comments that were posted on the SU info page for each of the ten most viewed Wikipedia articles. This analysis is very cursory because, compared to the number of views, very few users actually comment on an article, as commenting distracts from the ’stumbling’ experience. For example, the article Wrap rage with 0.86 million views and forty-thousand likes has 41 comments. In total, we analysed 479 comments and identified four broad categories:

(A) Comments expressing surprise
• “There’s a name for this?”
• “I’d never heard of this before (go StumbleUpon!). Very cool.”

(B) Comments expressing admiration, sadness, sorrow, etc.
• “That’s so sad”
• “No one should go through life afraid to take a walk.”
• “don’t know what to say actually..”

(C) Comments about the usefulness of the knowledge
• “Simple, but helpful for designers.”
• “An exceptional list of colours and their code, invaluable to graphic designers, webmasters etc.”

(D) Comments expressing negative sentiments towards the article
• “Fake.”
• “Why stumble everyday wikipedia articles?”

3.2 Working Hypotheses

Based on the preliminary qualitative insights gained, we developed three intuitions that we believe will enable us to predict to what extent a Wikipedia article is likely to be beneficial to the average SU user.

Intuition A. Articles that contain unexpected nuggets of information can be identified by considering how semantically related the article is to the other articles it contains links to. For instance, the List of unusual deaths Wikipedia article has, among others, outgoing links to the following diverse articles: Common fig, Malvasia (wine), Eddystone Lighthouse, Hawaii, and Chimney. We hypothesize that finding such seemingly unrelated articles can be used as a measure of the likelihood of the article being of interest.

Intuition B. Articles that evoke emotional feelings can be discovered through a form of sentiment analysis. Although Wikipedia articles are written in a neutral style, some topics are bound to evoke emotions and those emotional topics can be identified.

Intuition C. Articles that contain useful knowledge may be identified indirectly, when considering their Talk pages, the amount of discussions that are ongoing and the style of the discussions. Articles about practically useful information are not likely to be emotionally charged, unlike discussions for instance about politicians, religious topics, etc.

We emphasize that these are hypotheses that need to be verified in future work.

4. CONCLUSIONS

In this position paper we have proposed to investigate what makes certain Wikipedia articles interesting to users who are browsing the Web without a goal in order to pass the time or relax. Since such articles are educational to some degree, the leisure activity of browsing (stumbling) can thus also incorporate a learning experience. Since life-long learning is an important characteristic of knowledge economies, it is crucial to understand the interplay between these two forces. We argue that a greater understanding of which features are indicative of an article’s attractiveness to the average user (stumbler) will enable us to develop adaptations that expose a greater amount of Wikipedia articles to the leisure seeking user.

5. REFERENCES

[1] P. André, m. schraefel, J. Teevan, and S. T. Dumais. Discovery is never by chance: designing for (un)serendipity. In C&C ’09, pages 305–314, 2009.
[2] P. André, J. Teevan, and S. T. Dumais. From x-rays to silly putty via uranus: serendipity and its role in web search. In CHI ’09, pages 2033–2036, 2009.
[3] L. Björneborn. Design dimensions enabling divergent behaviour across physical, digital, and social library interfaces. In Persuasive Technology, volume 6137, pages 143–149. 2010.
[4] D. Bollen, B. P. Knijnenburg, M. C. Willemsen, and M. Graus. Understanding choice overload in recommender systems. In RecSys ’10, pages 63–70, 2010.
[5] J. H. Falk and M. Storksdieck. Science learning in a leisure setting. Journal of Research in Science Teaching, 47(2), 2010.
[6] J. Helmes, K. O’Hara, N. Vilar, and A. Taylor. Meerkat and tuba: Design alternatives for randomness, surprise and serendipity in reminiscing. In Human-Computer Interaction - INTERACT 2011, volume 6947, pages 376–391. 2011.
[7] N. Lipka and B. Stein. Identifying featured articles in wikipedia: writing style matters. In WWW ’10, pages 1147–1148, 2010.
[8] A. Oulasvirta, J. P. Hukkinen, and B. Schwartz. When more is less: the paradox of choice in search engine use. In SIGIR ’09, pages 516–523, 2009.
[9] J. Packer. Learning for fun: The unique contribution of educational leisure experiences. Curator: The Museum Journal, 49(3):329–344, 2006.
[10] J. Packer. Beyond learning: Exploring visitors’ perceptions of the value and benefits of museum experiences. Curator: The Museum Journal, 51(1):33–54, 2008.
[11] J. Packer and R. Ballantyne. Motivational factors and the visitor experience: A comparison of three sites. Curator: The Museum Journal, 45(3):183–198, 2002.
[12] J. M. Packer. Motivational factors and the experience of learning in educational leisure settings. PhD thesis, Queensland University of Technology, 2004.
[13] G. Wu, M. Harrigan, and P. Cunningham. Characterizing wikipedia pages using edit network motif profiles. In SMUC ’11, pages 45–52, 2011.

A Diary Study of Information Needs Produced in Casual-Leisure Reading Situations

Max L. Wilson, Future Interaction Technology Lab, Swansea University, UK, m.l.wilson@swansea.ac.uk
Basmah Alhodaithi, Future Interaction Technology Lab, Swansea University, UK, basmah.alhodaithi@gmail.com
Michael Hurst, Department of Information Science, Loughborough University, UK, m.a.hurst@lboro.ac.uk

ABSTRACT

Both information seeking and leisurely activities are commonplace in people’s daily lives, but very little is known about searching behaviours outside of the work context. To study such leisurely information needs and subsequent searching, a diary study was performed, focusing on the context of casual-leisure reading. The week-long diary study with 24 participants was performed by a team of six graduate students. Reading was often both an act of casual searching, as well as a motivator for subsequent searching episodes, and around half were hedonistically or emotionally motivated. Casual searching often began with topical or personal interests, but did not always involve information needs. The findings confirm prior literature on casual search, while providing new insights into these less-critical and experience-driven episodes of searching, for fun.

2. RELATED WORK

The study of searching behaviour has long been embedded in the history of library and information science, where searching is presumed to be a goal-oriented research activity. This is highlighted by the common definition that Information Seeking is focused on the resolution of an information need [12] or knowledge gap [1]. Further, the common approach to describing tasks for empirical research is named a ‘Work Task’ [2]. Despite implying work-oriented scenarios, Work Tasks are described as including non-work personal tasks too, but these tasks are still typically goal and need-driven scenarios. Examples include studies of everyday-life information seeking [18] and information encountering [6], which relate to non-work contexts, but can still be quite serious.
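The pre-processing and sampling steps of Section 3.1 in the StumbleUpon study (drop redirects, disambiguation pages and stubs shorter than 500 characters, then draw a random half for querying) can be sketched as below. This is only an illustration under stated assumptions: the `Article` record and its field names are hypothetical, the "new articles" criterion is left out because the excerpt does not define it, and no StumbleUpon API call is shown.

```python
import random
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical article record; field names are assumptions."""
    title: str
    text: str
    is_redirect: bool = False
    is_disambiguation: bool = False


def keep(article, min_chars=500):
    """Apply the stated filters: no redirects, no disambiguation pages, no stubs."""
    return (not article.is_redirect
            and not article.is_disambiguation
            and len(article.text) >= min_chars)


def sample_half(articles, seed=0):
    """Draw a random half (rounded down) of the articles that pass the filter."""
    kept = [a for a in articles if keep(a)]
    return random.Random(seed).sample(kept, len(kept) // 2)


# Hypothetical toy collection, not taken from the Wikipedia dump.
articles = [
    Article("A", "x" * 600),
    Article("B", "x" * 600),
    Article("C", "x" * 600),
    Article("D", "x" * 600),
    Article("Stub", "x" * 100),
    Article("Redirect", "x" * 600, is_redirect=True),
    Article("Disambig", "x" * 600, is_disambiguation=True),
]
queried = sample_half(articles)
```

Fixing the random seed keeps the sampled half reproducible, which matters when the view counts for the sample are collected over several runs.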
To understand non-work leisure time better, Stebbins introduced a taxonomy containing three levels: serious-leisure, project-leisure, General Terms and casual-leisure [22]. Serious leisure typically covers activities Experimentation, Human Factors, Theory. relating to committed hobbies, or volunteering outside of work [9]. Project-leisure relates to extended but temporal efforts like Keywords buying a car, planning a holiday, or researching family histories Casual-leisure, Reading, Information Seeking [3]. These goal- and need-driven leisure scenarios could be easily captured in Work Tasks. The third level, casual-leisure, relates to activities often involved in play and relaxation, such as watching 1. INTRODUCTION television [4] or searching online [23], and much more. Based on Although there has been decades of research into Information their prior work, Elsweiler et al proposed a model of casual- Seeking and Information Retrieval, very little has focused on the leisure information behaviour [5] that highlighted some key casual searching experiences of people outside of work. Research differences between casual scenarios and Work Tasks. First, these by Harris and Dewdney in 1994 indicated that 95% of 3,100 scenarios were often driven by hedonistic needs, rather than surveyed information seeking studies had focused on work-driven information needs. Consequently, searching often began with tasks [8]. Yet Pew Research found that searching simply for fun, ephemeral or absent information needs. Further, success in and often for no particular reason, is one of the most popular meeting their hedonistic needs, did not necessarily involve online pastimes and counts for a significant portion of internet successfully finding information and results. Hedonistic needs traffic [17]. 
Elsweiler et al suggest that casual, leisurely searching include factors such as affect, novelty, social relationships, and situations differ significantly to work or project driven tasks in enjoyment [10], where O’Brien, for example, studied their that they produce search experiences that often begin without a importance in online shopping experiences [14]. given information need. Further, their investigations indicated that actually finding relevant information is typically less important Many have also studied reading as a casual or pleasurable activity. than having fun [5]. Such scenarios involve passing time and Early work by Pjetersen converted observed book-finding relaxing, can be driven by the need to recover from a bad day, or behaviour into a naturalistic library-style search interface [16], to have fun with other people. Casual searching includes scenarios helping people to browse in different modes. In 1980, Spiller such as window shopping, browsing eBay, and delving into found that 46% of library loans (n=500) were based upon Wikipedia. To further investigate such casual-leisure searching browsing and 54% on known authors [21]. During a much smaller experiences in more detail, this paper describes a diary study of (n=12) qualitative study in 2011, however, Ooi and Liew saw searching for fun, performed in the context of casual reading. participants often only using the library to retrieve books that they had already selected in everyday life [15]. Further, along with the introduction of e-readers and tablet devices, the nature of reading Presented at Searching4Fun workshop at ECIR2012. Copyright © 2012 in casual episodes is changing. Research continues to highlight for the individual papers by the papers' authors. Copying permitted only that increasing numbers of people perform their reading online or for private and academic purposes. This volume is published and through digital mediums [11, 20]. copyrighted by its editors. 3. 
DIARY STUDY
The main goal of this study was to investigate the information seeking behaviours performed in the context of casual-leisure reading. Prior work by Ross found that people who read for pleasure often encounter new information, without having an existing related information need [19]. Here, six researchers, as part of their post-graduate studies, coordinated a diary study of casual-leisure information behaviour. The methodology used was similar to the diary study performed by Elsweiler et al. [4], which studied information needs produced while watching television. In total 24 participants took part in the diary study for one week. Participants were recruited by the six researchers using snowball-sampling; participants were primarily young adults in their 20s.

Participants were given a small portable physical diary, so that it could be used in both digital and physical contexts; an example is shown in Figure 1. Participants were asked to fill out one entry page per information need or searching episode that was initiated during a period of reading undertaken for self-motivated pleasurable reasons. To support continued participation, the participants were managed by one of the six researchers. Each participant had regular contact with their researcher, including but not limited to: an initial interview, an informal interim discussion, and a final debriefing interview.

Figure 1: An example diary; a bound set of A5 card.

The diary consisted of a mix of open and closed questions. After logging the time and date, participants were asked to indicate the type of material they were reading and their environment, such as home, work, library, coffee shop, etc. Participants were then asked to describe a) what they wanted to search for, and b) why they wanted to search. Participants were then asked to identify how they then performed the search, if at all.

3.1 Analysis
Although some summative information was collected about the nature of the reading scenario, a Grounded Theory analysis [7] was performed to systematically extract key elements from the information needs and information seeking described in the open text fields. The six researchers individually transcribed their diaries and initially coded them for key points. As a group, and in collaboration with the supervising author, these codes were discussed, analysed, and configured into affinity diagrams, using post-it notes and a whiteboard. These codes, and the relationships captured in the affinity diagrams, were discussed, referring back to example diary entries, until they stabilized and all researchers were in agreement. Entries that challenged the evolving definitions and affinity diagrams were frequently considered during this process. The six researchers then returned to their diary entries to re-examine them in the context of the final codes.

4. RESULTS
Over the course of the week, most participants recorded around 1 or 2 diary entries per day, producing around 120 usable entries in total. To provide an overview, approximately 20% of reading was performed with physical paper objects (books, newspapers, and magazines), with the remainder being split between e-readers and mobile devices (around 30%) and laptops and PCs (50%). Reading content included: News (around 45%), email (20%), magazines (15%), and fiction (10%). In terms of physical surroundings, around 40% of entries were produced in work contexts, with the remainder produced in home environments. Figure 2 shows the model developed from the analysis, which is described further below.

1. Reading Motivations
   a. Hedonistic or Emotional
   b. General knowledge interests
      i. Interest driven
      ii. Carer
      iii. In-the-know
      iv. Decision
2. Searching Motivations
   a. Information need
   b. Personal scoping
   c. General topical
   d. Decision-making
3. Search focus
   a. Factual information
   b. Background information
   c. Object related information
4. Source of Information
   a. Paper sources
   b. Social networks
   c. Expert sites
   d. Generic sites

Figure 2: The developed coding scheme.

4.1 Reading motivations
Reading material can be considered a source of information itself. Consequently, our study observed reading as being both the act of casual searching, and as a source motivating separate casual search episodes. This section focuses on the former, where casual reading is itself sometimes an act of casual search.

Around 50% of casual reading episodes were driven by hedonistic or emotional needs, and around 50% were driven by the participants' general knowledge interests. Examples of hedonistic or emotional motivations included "to pass time", "to help cope with things", and "to relax after my day". Although following knowledge interests could also be seen as a pleasurable pastime, the knowledge-driven entries also occasionally broached the concepts of 'project leisure', such as reading about possible holiday destinations, and 'serious leisure', such as reading around a hobby domain. The majority of the knowledge-driven situations described by participants, however, were casual episodes relating to a project-leisure interest, rather than active periods of research or work. One participant, for example, was reading about a neighbourhood area as they were soon to be "moving into a new house".

While the hedonistic and emotional scenarios were fairly uniform in motivation, we further classified the casual knowledge-driven reading scenarios into four types: Interest driven, Carer, In-the-know, and Decision-oriented. Interest driven were those casual bouts of reading relating to a hobby or temporary interest. Examples included "information about buying a car abroad" and "information on fixing my PC". One participant, who was a "new fan of J.K. Rowling's novel series", was "reading about the latest Harry potter sequel", which was due to be delivered.

Carers were those that were reading information that has personal or emotional relevance. Carers often read news, for example, about zones with natural disasters, or places and events relating to their childhood, or to distant friends. One participant cited choosing to read "more information on tsunamis", while another had a personal interest in the unrest in Bahrain.

In-the-know readers were those that casually monitored general knowledge information sources, including news, to be aware of current events and new technology. Example diary entries included a participant who "read about the 2011 budget meeting in today's paper" in order to get "updates on current budget meetings". Another participant said "I wanted to know what was happening while I was asleep". In-the-know readers often recorded more frequent, shorter reading sessions, rather than extended periods like those with hedonistic or emotional motivations.

Finally, decision makers were those that read up on interest areas related to things like casual purchases, such as new movie releases or new cameras. In another example, a participant wrote that they were reading "reviews of the movie 'Inception'", because they were "planning for a movie at the weekend".

4.2 Motivations for Searching for fun
The casual reading recorded in our diary study often created separate episodes of casual searching. These episodes were driven by encountering information that created an Anomalous State of Knowledge (ASK) [1], but did not always relate to a direct information need. Some ASKs also led to additional smaller bouts of casual-interest reading, rather than searching. The four identified key motivations for additional searching or reading were: information need, personal scoping, general topical, and decision-making.

Information need examples included those that identified a clear piece of information they would like to know in order to continue reading. These specific information needs often consisted of dictionary definitions, such as one participant who was looking for "the meaning of the word 'oakum'" because they did not know what it meant.

Personal scoping motivations related to participants who encountered information that was somehow related to their history or personal life. The participant interested in Bahrain also provides a good example here. Personal scoping examples also often led to searching behaviour within one's own information, such as email or media collections, or within social networks. Typically, personal scoping was aimed at establishing, or remembering, the connection they had with the information they had just encountered.

General topical searching was motivated by discovering something of novel interest, and often initiated casual learning without a specific information need. One participant, another example of a Carer, wanted to "know more about children with dementia" after they "read [an] article in [the] newspaper about a 9yr girl with this disease".

Finally, decision-makers were those searching when motivated by the need to make a new decision. Often relating to a topical interest, such decision-making motivations included deciding if an activity was something they would want to do, or to learn more about in future casual reading. One participant said that they wanted to "check the weather for the weekend" in order to make some plans.

4.3 Focus of information sought
The information that people sought in these casual scenarios could be largely broken into three types: factual information, background/overview information, and object related information. Factual information, of course, related to specific information needs, and was often represented by factual content, such as dates, prices, locations, etc. One participant was searching for "yesterday's lottery results". Background and overview information was typically sought in general topical situations and interest-driven reading, such as "wales football information". Finally, object related information pertained to places, people, and events, with one participant suggesting they were "searching for more about Mississippi". Such information was often sought by caring readers, or personal-scoping searchers.

4.4 Sources of information
The diary study also asked participants to describe how they sought information during episodes of casual searching, motivated by their casual reading. Perhaps correlating with the large percentage of our participants who read using digital devices, much of the information was sought online. Figure 3 highlights that some participants sought their information using additional physical paper resources, often including those who performed additional topical interest reading. Of those that used the internet to search, many consulted their social network, especially those establishing personal scope with the information. The remainder typically referred to news sources and Wikipedia articles, or generally searched the web for related pages. Several participants described themselves as searching for websites with authority on a topic, such as one participant who went to the UK government website for "…census information. To find out the deadlines".

Figure 3: Methods used for casual searching.

5. DISCUSSION
This research has continued the recent interest in investigating casual searching behaviour that people undertake for fun. We aimed to further investigate the findings of researchers like Elsweiler et al. [5], and the model of casual-leisure searching behaviour they produced. In line with their model, our study found that around half of the casual reading episodes were motivated by hedonistic or emotional needs, rather than information needs. For those that engaged in searching behaviour, some did aim to find specific information, either facts or
One participant said that they information connecting what they had found to their own lives, while others began additional reading or topical browsing without a given information need. This finding, however, highlights that [3] Butterworth, R., Information seeking and retrieval as a although Elsweiler et al’s model separated information and leisure activity. In DL-CUBA'06, 29-32. 2006 hedonistically driven motivations, these episodes are often [4] Elsweiler, D., Mandl, S. and Lunn, B.K., Understanding intertwined and highly connected. Further, our work contributed casual-leisure information needs: a diary study in the additional insights into variables created by person- and situation- context of television viewing. In IIiX'10, 25-34. 2010 types, both of which have an affect on the interplay between [5] Elsweiler, D., Wilson, M.L. and Kirkegaard Lunn, B. informational and emotional motivations. While these findings are Understanding Casual-leisure Information Behaviour. in novel, future work should focus on fully understanding these Spink, A. ed. Future Directions in Information conditions; some notions, for example, are closely related to Behaviour, Springer, 2011 (to appear). elements of McQuails Mass Communication Theory [13]. [6] Erdelez, S., Information encountering: a conceptual Unfortunately, the design of the study meant that we did not framework for accidental information discovery. In capture information about whether people succeeded in finding ISIC'97, 412-421. 1997 information. Future work could help to validate these latter phases [7] Glaser, B.G. and Strauss, A.L., The discovery of grounded of Elsweiler et al‘s model, by focusing on the success, failure, and theory. The British Journal of Sociology, 20(2), 1967. importance of casual searches. [8] Harris, R.M. and Dewdney, P. Barriers to information: How formal help systems fail battered women. Greenwood 5.1 Limitations Press Westport, CT, 1994. 
Although the study covered 24 participants over the space of a [9] Hartel, J., The serious leisure frontier in library and week, and gathered over 120 casual searching episodes, there are information science: hobby domains. Knowledge some potential limitations in the methodology that should be organization, 30(3-4), 228-238. 2003. acknowledged. First and foremost, the study was performed by [10] Hassenzahl, M., The effect of perceived hedonic quality on five masters and one PhD student, each in the first few months of product appealingness. International Journal of Human- their postgraduate study. Consequently, this was their first field Computer Interaction, 13(4), 481-499. 2001. study and they were learning the techniques by performing them; [11] Liu, Z., Reading behavior in the digital environment: their individual skills varied. Further, each researcher produced Changes in reading behavior over the past ten years. their own paper diaries, which also introduced some slight Journal of Documentation, 61(6), 700-712. 2005. variations in content. Despite the fact that execution of the study [12] Marchionini, G. Information Seeking in Electronic may have been less rigorous than many diary studies, the results Environments. Cambridge University Press, 1995. did reveal several findings that both confirmed elements of other [13] McQuail, D. McQuail's mass communication theory. Sage research and revealed new insights into casual-leisure searching. Publications Ltd, 2000. [14] O'Brien, H.L., The influence of hedonic and utilitarian motivations on user engagement: The case of online 6. CONCLUSIONS shopping experiences. Interacting with Computers, This paper has described a diary study that investigated searching 22(5), 344-352. 2010. for fun, in the context of casual reading. Research has shown that [15] Ooi, K. and Liew, C.L., Selecting fiction as part of everyday such activities make up a significant portion of internet traffic, life information seeking. 
Journal of Documentation, while remaining largely under-studied. Our findings provided 67(5), 748-772. 2011. further evidence for previously proposed models of casual [16] Pejtersen, A.M. The Book House: Modelling User'Needs and searching, including the significance of hedonistic and emotional, Search Strategies a Basis for System Design. Ris√∏ rather than information-driven, motivations. Further, we have National Laboratory, 1989. shown that many of these activities relate to areas of interest and [17] Purcell, K. Search and email still top the list of most popular personal scope, rather than being specifically related to an online activities Pew Internet & American Life information need. Finally, much of the casual leisure searching Project.2011. was for decision-making, but in regards to pleasurable hedonistic [18] Savolainen, R., Everyday Life Information Seeking: activities and purchases. Combined with previous research in this approaching information seeking in the context of. area, our findings contribute to the developing understanding of Library & Information Science Research, 17(3), 259- these less-critical, experience-driven, often-hedonistic episodes of 294. 1995. searching, for fun. [19] Sheldrick Ross, C., Finding without seeking: The information encounter in the context of reading for 7. ACKNOWLEDGMENTS pleasure. Information Processing & Management, Thanks both to the participants, and the remaining researchers 35(6), 783-799. 1999. who helped to run the study: Tashi Rapten Bhutia, Mohammed [20] Smith, R. and Young, N.J., Giving Pleasure Its Due: Taheri, Daniel Williams, and Tim Crawford. Also thank you to Collection Promotion and Readers' Advisory in the reviewers for their valuable comments. Academic Libraries. The Journal of Academic Librarianship, 34(6), 520-526. 2008. 8. REFERENCES [21] Spiller, D., The provision of fiction for public libraries. [1] Belkin, N.J., Oddy, R.N. 
and Brooks, H.M., Ask for Journal of Librarianship and Information Science, information retrieval: parts I and II. Journal of 12(4), 238. 1980. Documentation, 38(2/3), 61-71, 145-164. 1982. [22] Stebbins, R.A., Leisure and Its Relationship to Library and: [2] Borlund, P., Experimental components for the evaluation of Information Science: Bridging the Gap. Library trends, interactive information retrieval systems. Journal of 57(4), 618-631. 2009. Documentation, 56(1), 71-90. 2000. [23] Wilson, M.L. and Elsweiler, D., Casual-leisure Searching: the Exploratory Search scenarios that break our current models. In HCIR'10, 28-31. 2010 In Search of a Good Novel Examining Results Matter Suvi Oksanen Pertti Vakkari School of Information Sciences School of Information Sciences University of Tampere University of Tampere 33014 University of Tampere, Finland 33014 University of Tampere, Finland Suvi.Oksanen@uta.fi Pertti.Vakkari@uta.fi ABSTRACT catalogue are used to access interesting novels to read. We studied how an enriched public library catalogue is used to access novels. 58 users searched for interesting novels to read in a 2. RELATED RESEARCH simulated situation where they had only a vague idea of what they Next we introduce studies on how readers access fiction in would like to read. Data consist of search logs, pre and post search libraries and on evaluation of fiction search systems. The questionnaires and observations. Results show, that investing literature in this field is scarce [1]. In [8], Pejtersen summarizes effort on examining results improves search success, i.e. finding her seminal works in fiction retrieval. As far as we know, there interesting novels, whereas effort in querying has no bearing on it. have been no published studies on fiction searching in commercial In designing systems for fiction retrieval, enriching result sites like Amazon. The discussion in [3] hints also to that. presentation with detailed book information would benefit users. 
Goodall [5] differentiates two stages in the book search process in the library. Readers identify first attributes in the books, which Categories and Subject Descriptors trigger their interest, and after that focus on attributes, which H.3.7. [Digital Libraries]: User Issues generate the decision to borrow the book. In the filtering stage, external attributes of the book like its cover or title are perceived General Terms as important, whereas in the selection stage, internal attributes of Human Factors the book like text on the back of the cover or passages of the text in the book are considered as useful. Ross [11] has made a roughly similar distinction based on interviewing 194 committed Keywords readers. She distinguished between the clues in the book and Fiction Retrieval, Novels, Readers, Public Libraries, Search elements in the book as indicators of an interesting book. Tactics, Search Effort, Search Success Pejtersen [8] has defined three major tactics for accessing fiction, which match to our research goals. Analytical search strategy is 1. INTRODUCTION used when readers wish to find novels about some topic like the Reading novels is a popular leisure time interest. Fiction was read Second World War. Search by analogy is generated when readers at least once a year by 50 % of Americans in 2008 [10] and by 80 want something similar to novel X, e.g. a novel they had % of Finns in 2010 [13]. Public libraries are major channels of previously read. Browsing strategy is applied in situations when getting access to novels [9]. Studies on the outcomes of public readers have only a vague idea of what they would like to read. libraries show that the major benefit derived from their use is the They are simply browsing for finding a good novel. pleasure of reading fiction [6, 15]. 
Despite this fact, there has not been much interest in studying and developing systems for fiction Based on observing user-librarian negotiations for finding fiction, retrieval since the 1980s [1]. The effort in developing search Pejtersen [8] has designed a fiction search system called the Book systems has been focused on retrieving non-fiction [2, 4]. House. It consisted of facets representing various attributes of novels as perceived by library users. These facets were access Traditionally library catalogs have supported accessing novels if points to novels. The evaluation showed that the system was the reader knows the name of the author or the title of the novel. It useful and pleasurable to use [8] All the available system is know that about half of the fiction borrowed is found by functionalities were used and the fiction classification system browsing, half by known item search [14]. This indicates a need fully accepted. The users found it useful in finding novels. to develop systems supporting other fiction search tactics than known item search. There are signs of enriching public library catalogs to include features supporting fiction retrieval like 3. RESEARCH DESIGN extended book descriptions or indexing [1, 12]. However, the The aim of this study is to analyze how an online catalog in a utility of these tools for accessing novels is not studied. Our aim public library is used for finding novels to read. We focused on a is to analyze how tools provided by an enriched public library situation when the readers have only a vague idea of what they would like to read. This corresponds to the browsing strategy in Pejtersen [8]. In addition to known item search, browsing is the Presented at Searching4Fun workshop at ECIR2012, Barcelona, Spain. second major strategy for accessing fiction [8, 14]. Conceptually, Copyright © 2012 for the individual papers by the papers' authors. 
browsing includes also similarity search and category search, Copying permitted only for private and academic purposes. This volume because in these search modes the reader does not know exactly is published and copyrighted by its editors what she wants. Browsing may lead to similarity search and moves was very scattered. The four most common moves were category search. Therefore, we chose browsing as the search book clicks (20.4 %), result list (20.2 %), free text search (8.2 %) mode in our study. The specific research questions are: and category limitation (6.5 %). The proportion of all other 25 moves varied between 4.8 % and 0.2 %. Therefore, for the • What kind of search moves were used for accessing economy of analysis we collapsed similar move categories like novels? field search (by publication date, library, language, category, • Was there an association between moves and search material) or limiting result list (by keyword, language, etc). We success? also recorded the time used for the search. PIKI library system serves several municipalities in Tampere The indicator of the success of search was an interesting novel region in Finland. It includes a database containing metadata found. The searchers rated the novel in a three-point scale from about the books in the networked libraries, and an interface to one to three (least to most interesting). If the searcher could not interact with that information and search books. The metadata for find an interesting novel, the scoring was zero. fiction contains typical bibliographic information added with keywords from the fiction thesaurus “Kaunokki” [12] and tags 4. RESULTS assigned by users. The metadata includes also images of book When starting a search, readers could select either a quick search, covers, recommendations by users and librarians, and availability an advanced search or a recommendation page as their point of information. The object of a default search is the whole database. departure. 
Quick search consists of a search box with a drop down Search results are ranked by relevance, but they can be ordered menu suggesting a keyword with information about its type like also alphabetically by author or title, and by publication year. author when keying in search terms. In an advanced search it is Search results can be limited by category, i.e. fiction vs. non- possible to formulate a query by selecting several fields to search. fiction, by the type of material like book, video, etc., by keyword, Recommendation pages include various lists of books and by language, or by library. Clicking the book title on the result list recommendations with links. reveals the metadata of the book with availability information. Advanced search was the most popular search mode (72.4 %) In addition to author, title, free term or keyword search, users may followed by quick search (19 %) and recommendations (17.5 %) start from recommendation pages. They include various lists of (table 1). Readers made on average 7.9 moves when attempting books and recommendations by users and librarians. Users can to find a good novel. Of these moves on average 3 were advanced also search for similar books based on keywords. searches, 0.4 quick searches and 0.5 recommendation moves. For the study 58 participants were recruited in May 2011 from Users retrieved on average 1.6 result lists, and limited these result three public libraries of various sizes in PIKI area. Of the study lists 0.6 times. On the result lists they clicked 1.6 books, but read subjects, 26 were recruited in a big main library, 22 in a medium only 0.2 book descriptions containing more than bibliographic sized main library and 10 in a small branch library. 36 were data. The average interest score of the book accepted was 2.4. females and 22 males. Their age varied between 14 and 70 years, The average search time was 215 seconds. the average age being 34 years. They were relatively highly Table 1. 
Basic statistics of the main study variables (n=58) educated, 39 % had a university degree, and 23 % had a high school education, and the rest had a lower education. They read Variable Mean Stddev Min Max % on average 24 novels per year ranging from 0 to 120 novels. using The search task was as follows: You are in a library in a situation Quick search 0.4 1.1 0 6 19.0 when you do not have a clear idea of what you would like to read. Advanced search 3.0 2.9 0 12 72.4 Please use the PIKI catalog to search for a novel of interest to you, which you would like to read. Do not search for a particular Result list 1.6 1.4 0 6 86.2 author or novel, although you may use this as a point of departure for your search. Thus, we simulated a typical browsing situation Result list limit. 0.6 1.3 0 6 27.6 [5, 11] when readers have only a vague idea of what they would Book clicks 1.6 1.3 0 7 95.1 like to read [8]. The search was ended when an interesting novel was found, or when the searcher gave up the search task as Book description 0.2 0.6 0 3 10.3 unsuccessful. Recommendation 0.5 1.3 0 7 17.5 The search screen was recoded. The researcher observed the search sessions and made notes. The searchers filled in a pre- All moves 7.9 4.3 2 21 100 search questionnaire eliciting demographic information, Book scores 2.4 0.9 0 3 100 information about reading orientation, the use of the library and search tactics for books in the library. After the search they filled Search time 215 118 76 593 100 in a post-search questionnaire including a pattern of questions for assessing various features of PIKI interface, ranking of the novel As table 2 indicates, the most popular search tactic was field found and open questions concerning the criteria of selecting the search (63.8 %) followed by free term search (44.8 %). Known novel and the difficulty of the search task. item search and keyword search were equally popular. 
Search moves were observed from the recordings of search An average search was relatively short consisting of about eight screen. 29 move types were identified. A move is an identified use moves and lasting about 3.5 minutes. A typical search consisted of a system feature like a keyword search, an author search, of advanced searches including mostly field searches or searches inspecting result list, limiting it, or exploring book metadata. The with terms from controlled or free text vocabulary. Searchers number of the moves varied from 2 to 21. The distribution of seldom limited the result list, but immediately assessed novels by examining bibliographic book information. They explored very Table 3. Correlations between the average time per move, seldom more detailed book descriptions for assessing novels’ search effort and the interest grade of a novel (n=58) value. The searches can be considered as successful. Only five Variables Book Time/ Results/ Results, searchers out of 55 could not find a novel, which they considered scores moves moves book/mo as interesting. Evaluation scores in three cases were missing. Thus, 50 searchers had a successful result, i.e. a novel rated at Time/moves -.45** least with value one. Of the searchers only one rated the novel with value one, nineteen with value two, and the rest thirty with Results/mo .34* -.19 value three. Thus, about 55 % of the searchers retrieved a novel Results, .31* -.24 .70*** with the highest interest rank. book/moves Table 2. 
Basic statistics of the search tactics variables (n=58) Q&A -.27* .16 -.03 -.54*** Search Mean Stddev Min Max % searches/mo Variable using Legend: *= p<.05; **=p<.01; ***=p<.001 Known item 0.6 1.1 0 5 32.8 The previous correlation analyses suggest that the following Free term 0.8 1.3 0 6 44.8 variables were significantly associated with search success: the average time per move, result lists per move, results and book Keyword 05 1.0 0 5 32.8 information per move, quick and advanced searches per move. We use these variables for predicting search success, i.e. the Field search 1.4 1.4 0 5 63.8 rating of the novel found. Because the two variables measuring the proportion of result list exploration of all moves were We were curious to know whether the search process variables conceptually correlated, we removed the variable measuring only were associated to the success of search measured by the interest visits in result lists, and kept that one which included also rate of the novel found. We analyzed the association between exploring book information. The latter one reflects more validly search moves and search success by calculating Pearson the effort put in exploring the search results. correlation coefficients. The results indicated that none of the search process variables in tables 1 and 2 excluding the result list The model building aims at analyzing the direct and intermediated was significantly associated with the perceived value of the novel. effects of each independent variable to dependent variable. The model indicates the relative effect of each variable to other The number of result lists visited correlated significantly with the success (r=.28; p=.04). Thus, it seems that search success was not variables, i.e. it indicates the effects other variables controlled [7]. associated with the search moves or their combinations used Path analysis was used for testing the model. In the path analysis standard regression coefficient are used [7]. 
The model (figure 1) excluding the number of visits in the result list. was significant (F=7.14; p=.000) indicating a good fit with the Success was neither associated with search effort measured as data. The multiple correlation (R) of the model was .548, and time used in searching (r=-.14; p=.31) or the total number of adjusted R squared .258. Thus, the model explains about 26 % of moves (r=.23; p=.10). However, we observed that effort invested the variance in the scores of the novels. in exploring the search results and in querying were significantly associated with the search success. Correlation between the time invested on an average move and the interest rating of a novel found was -.45 (p=.001) (table 3). Thus, quick shifts from move to move predict finding an interesting novel. The correlations show also that the greater proportion of the moves devoted to looking at the result list (r=.34; p=.013) or examining novels in detail found on the result list (r=.31; p=.022), the more likely searchers found an interesting novel. Deviating from this finding, the proportion of quick and advanced searches of all moves was negatively associated with the ratings of the novels selected (r=-.27; p=.045). Thus, the greater the proportion of quick or advanced search moves of all moves, the less interesting novels were found. Legend: * = p<.05; ** = p<.01; ***=p<.001 (n=58) Figure 1. A path model for predicting the scores of the novel In all, these findings hint, that search formulation variables, i.e. querying, were not associated with finding an interesting novel to The path analysis indicates that time used per move has a read, and their great proportion of all moves contributed to an significant direct effect on the scores of the novel found (beta=- unsuccessful search result. The proportion of moves devoted to .36). 
Also the proportion of search result exploration of all moves exploring result lists and book information, however, helped has a significant effect on novel scores (beta=.30), whereas the searchers to find interesting novels. Thus, the more swiftly the proportion of quick and advanced searches of all moves has no searchers proceeded from move to move, but the more effort they effect on the interest rating of the novel (beta=-.04). The average invested in exploring results list and book information, and the time per move has a significant effect neither on the proportion of less effort in search formulation moves, the more interesting results exploration (beta=.-.16) nor on quick and advanced novels they found. The findings imply, that search formulations searches of all searches (beta=.16). Interestingly, the proportion are less important than examination of search results as conditions of quick and advanced searches has a very large significant effect for finding an interesting novel to read. on the variation in the proportion of result exploration (beta=-.52). In all, the model indicates, that the less time the searcher used per experimental studies on evaluating new tools for supporting move, the more interesting novels were found. The average time fiction retrieval are needed. used per move did not have a significant influence on the proportion of moves devoted either to search formulation or the 6. REFERENCES results exploration. Although these beta coefficients were not [1] Adkins, D. & Bossaller, J. E. 2007. Fiction access points significant, their directions hint, that the less time used per move, across computer-mediated book information sources: a the more effort was invested in examining the result lists and comparison of online bookstores, reader advisory databases, books information, and the less effort in search formulations. In and public library catalogs. 
Library & Information Science addition, the more effort put in querying, the less effort allocated Research 29(3), 354-368. in examining results. Thus, it seems that there was a bifurcation of search strategies emphasizing either querying or result list [2] Case, D. 2007. Looking for information. A survey of research examination. These two strategies had very different effects on on information needs, seeking and behavior. 2nd ed. finding an interesting novel. Investing effort on examining the Academic Press. result list and book information has a significant positive effect on [3] Buchanan, G. & McKay, D. 2011. In the bookshop: finding an interesting novel, whereas emphasis on search examining popular search strategies. In Proceedings of the formulations has no bearing on finding an interesting novel. 11th Joint Conference on Digital Libraries. New York: ACM Press, 269-278. 5. DISCUSSION AND CONCLUSIONS [4] Elsweiler, D., Wilson, M.L. & Lunn, B.K. 2011. As far we know, this is the first study since Pejtersen [8] to Understanding casual leisure information behavior. In: analyze the search tactics used by readers for accessing fiction in Spink, A & Heinström, J.. (Eds). Future directions in enriched public library catalogs. We observed how readers information behaviour. Emerald, 211-241. searched for an interesting novel in a situation where they had only a vague idea of what they would like to read [8]. We found [5] Goodall, D. 1989. Browsing in the public libraries. LISU out that the use of various moves for searching novels was Occasional paper No 1. Loughborough: Library and scattered. The most common moves were advanced search, Information Statistics Unit. browsing result list and examining book information. The use of [6] Lance, K.C., Steffen, N.O., Logan, R., Rodney, M.J., & various moves was not associated with the success of the search, Kaller, S. 2001. Counting on results: new tools for outcome- with finding an interesting novel. 
However, it turned out that the based evaluation of public libraries. Aurora, CO: less time used per move, and the greater the proportion of moves Bibliographical Center for Research. Retrieved in December for examining the result list and book information, the more 20, 2009 from http://www. lrs. org/documents/cor/CoRFin. interesting the novel found. The proportion of search formulation Pdf. moves was not associated to the search success. The model build [7] Pedhazur E. 1982. Multiple regression in behavioral hints that readers used two alternative strategies with differing research. 2 nd ed. New York: Holt, Rinehart & Winston. success for accessing good novels. The strategy emphasizing search formulations was not associated with finding an interesting [8] Pejtersen, A.M. 1989. The Book House. Modeling user’s novel, whereas the more effort invested in examining results in needs and search strategies as a basis for systems design. the search, the more interesting novel was found. Roskilde: Risö National Laboratory. Effort invested in exploring search results instead of querying is [9] Perceptions of Libraries, 2010: Context and Community. an essential factor for finding interesting novels in a situation (2010). OCLC. Retrieved August 10, 2011 from when readers do not have a clear idea of what they wish to read. http://www.oclc.org/reports/2010perceptions.htm. Although readers have only a vague idea of the object of interest, [10] Reading on the Rise (2008). National Endowment for the they know genres, authors and titles, and have attributes of good Arts. Retrieved November 27, 2011 from novels in their mind [11]. They use this information when www.nea.gov/research/Readingonrise.pdf selecting books to read. It is likely that what is considered as an [11] Ross, C.S. 2001. Making choices: What readers say about interesting novel varies a lot in the sense that the substitutability choosing books to read for pleasure. The Acquisition of novels is great in this situation. 
Several alternatives may do, not Librarian 13(25): 5-21. only one. Therefore, effort put on exploring the result list is more productive than querying in the search for good novels to read. [12] Saarti, J. & Hypén, K. 2010. From thesaurus to ontology: the development of the Kaunokki Finnish fiction thesaurus. The Our results suggest that in designing systems for fiction retrieval, Indexer 28(2): 50-58. it is important to enrich result list presentation. Readers need more clues about where to infer that the novel could be of interest, [13] Serola, S. & Vakkari, P. 2011. The public library in the and also more options to be informed about the content of the activities of people. In Finnish. Publications of the Ministry novel [5, 11]. The latter include e.g. recommendations by fellow of Culture and Education 2011/21. MCE: Helsinki. readers and librarians, texts on the back of the books and links to [14] Spiller, D. 1980. The provision of fiction for public libraries. critics of the novels and to author information like in some Journal of Librarianship 12(4): 238-266. electronic bookshops. [15] Vakkari P. & Serola, S. 2012. Perceived outcomes of public It can be supposed that the more readers know about literature, the libraries. Library & Information Science Research 34(1): 37- more effectively they can identify interesting fiction [11]. In the 44. doi:10.1016/j.lisr.2011.07.005 studies to come, we analyze whether readers’ literary competence is connected to fiction search process and output. Also
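
As a rough illustration of how the standardized coefficients in a path model relate to pairwise correlations, the sketch below computes the two betas of a regression with two predictors from the three correlations involved. It is not the authors' analysis: it assumes the rounded correlations printed in Table 3 as inputs, the function and constant names are hypothetical, and it covers only the part of the model where the proportion of result exploration is predicted from time per move and the proportion of quick and advanced searches.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for y regressed on two predictors,
    computed from the three pairwise Pearson correlations."""
    denom = 1.0 - r_12 ** 2
    if denom <= 0.0:
        raise ValueError("predictors are perfectly collinear")
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Rounded correlations as printed in Table 3 (assumed inputs, not raw data).
R_EXPLORE_TIME = -0.24   # result exploration share vs. time per move
R_EXPLORE_QUERY = -0.54  # result exploration share vs. quick/advanced search share
R_TIME_QUERY = 0.16      # time per move vs. quick/advanced search share

beta_time, beta_query = standardized_betas(
    R_EXPLORE_TIME, R_EXPLORE_QUERY, R_TIME_QUERY
)
print(f"time per move -> result exploration: {beta_time:.3f}")
print(f"querying share -> result exploration: {beta_query:.3f}")
```

Under these assumptions the two values come out close to the coefficients reported for the path model (beta=-.16 and beta=-.52); small differences are expected because the inputs are rounded to two decimals.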