Overview of the INEX 2014 Interactive Social Book Search Track

Mark Hall¹, Hugo Huurdeman², Marijn Koolen², Mette Skov³, and David Walsh¹

¹ Department of Computing, Edge Hill University, Ormskirk L39 4QP, United Kingdom
{mark.hall,david.walsh}@edgehill.ac.uk
² University of Amsterdam, Netherlands
{h.c.huurdeman,marijn.koolen}@uva.nl
³ Department of Communication and Psychology, Aalborg University, Denmark
skov@hum.aau.dk

Abstract. Users looking for books online are confronted with both professional metadata and user-generated content. The goal of the Interactive Social Book Search Track was to investigate how users use these two sources of information when looking for books in a leisure context. To this end, participants recruited by four teams performed two different tasks using one of two book-search interfaces. Additionally, one of the two interfaces was used to investigate whether user performance can be improved by providing a user interface that supports multiple search stages.

1 Introduction

The goal of the Interactive Social Book Search (SBS) task is to investigate how book searchers use professional metadata and user-generated content at different stages of the search process. The purpose of this task is to gauge user interaction and user experience in social book search by observing user activity with a large collection of rich book descriptions under controlled and simulated conditions, while allowing as much "real-life" experience as possible to enter into the experimentation. The output is a rich data set that includes user profiles, selected individual differences (such as a motivation to explore), a log of user interactivity, and a structured set of questions about the experience.

The Interactive Social Book Search (ISBS) Task is a merger of the INEX Social Book Search (SBS; Koolen et al., 2012b, 2013a, 2013b) track and the Interactive task of CHiC (Petras et al., 2013; Toms and Hall, 2013b). The SBS Track started in 2011 and has focused on system-oriented evaluation of book search systems that use both professional metadata and user-generated content. Out of three years of SBS evaluation arose a need to understand how users interact with these different types of book descriptions and how systems could support users in expressing and adapting their information needs during the search process.

The CHiC Interactive task focused on the interaction of users browsing and searching the Europeana collection. One of its questions was what types of metadata searchers use to determine relevance and interest. However, the collection, use case and task were deemed not interesting and useful enough to users. The SBS track contributes a new document collection, use case and search tasks. The first year of the ISBS therefore focuses on switching to the SBS collection and use case, with as few other changes as possible.

The goal of the interactive book search task is to investigate how searchers interact with book search systems that offer different types of book metadata. The addition of opinionated descriptions and user-supplied tags allows users to search for and select books with new criteria. User reviews may reveal information about plot, themes, characters, writing style, text density, comprehensiveness and other aspects that are not described by professional metadata. In particular, the focus is on complex goal-oriented tasks as well as non-goal-oriented tasks. For traditional tasks such as known-item search, there are effective search systems based on access points via formal metadata (i.e., book title, author name, publisher, year, etc.).
But even here, user reviews and tags may prove to play an important role. The long-term goal of the task is to investigate user behaviour through a range of user tasks and interfaces, and to identify the role of different types of metadata at different stages of the book search process.

Research Questions

For the Interactive task, the main research question is:

RQ: How do searchers use professional metadata and user-generated content in book search?

This can be broken down into a few more specific questions:

RQ1 How should the UI combine professional and user-generated information?
RQ2 How should the UI adapt itself as the user progresses through their search task?

In this paper, we report on the setup and the results of the ISBS track. First, Section 2 lists the participating teams. The experimental setup of the task is discussed in detail in Section 3 and the results in Section 4. We close in Section 5 with a summary and plans for 2015.

2 Participating Teams

In this section we provide information on the participating teams. Table 1 shows which institutes participated in this Track and the number of users that took part in their experiments.

Table 1. Overview of the participating teams and number of users per team

Institute    # users
Aalborg       7
Amsterdam     7
Edge Hill    10
Humboldt     17
Total        41

3 Experimental Setup

3.1 Social Book Search

The goal of the interactive Social Book Search (iSBS) track is to investigate how searchers make use of and appreciate professional metadata and user-generated content for book search on the Web, and to develop interfaces that support searchers through the various stages of their search task. The user has a specific information need against a background of personal tastes, interests and previously seen books. Through social media, book descriptions are extended far beyond what is traditionally stored in professional catalogues. Not only are books described in the users' own vocabulary, but they are also reviewed and discussed online, and added to online personal catalogues of individual readers. This additional information is subjective and personal, and opens up opportunities to aid users in searching for books in ways that go beyond the traditional editorial-metadata-based search scenarios, such as known-item and subject search. For example, readers use many more aspects of books to help them decide which book to read next (?), such as how engaging, fun, educational or well-written a book is. In addition, readers leave a trail of rich information about themselves in the form of online profiles, which contain personal catalogues of the books they have read or want to read, personally assigned tags and ratings for those books, and social network connections to other readers. This results in a search task that may require a different model than pure search (?) or pure recommendation.

The iSBS track investigates book requests and suggestions from the LibraryThing (LT) discussion forums as a way to model book search in a social environment. The discussions in these forums show that readers frequently turn to others to get recommendations and tap into the collective knowledge of a group of readers interested in the same topic. The track builds on the INEX Amazon/LibraryThing (A/LT) collection (?), which contains 1.5 million book descriptions from Amazon, enriched with content from LT. This collection contains both professional metadata and user-generated content.⁴

The records contain title information as well as a Dewey Decimal Classification (DDC) code (for 61% of the books) and category and subject information supplied by Amazon. We note that for a sample of Amazon records the subject descriptors are noisy, with a number of inappropriately assigned descriptors that seem unrelated to the books. Each book is identified by an ISBN. Since different editions of the same work have different ISBNs, there can be multiple records for a single intellectual work. Each book record is an XML file with fields like isbn, title, author, publisher, dimensions, numberofpages and publicationdate. Curated metadata comes in the form of a Dewey Decimal Classification in the dewey field, Amazon subject headings in the subject field, and Amazon category labels in the browseNode fields. The social metadata from Amazon and LT is stored in the tag, rating, and review fields.

⁴ This collection is a subset of a larger collection of 2.8 million descriptions. The subset contains all book descriptions that have a cover image.
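To make the record structure concrete, the following is a minimal sketch of how such a record could be read with Python's standard library. The flat XML layout, the attribute names, and the sample values are assumptions for illustration only; the actual A/LT schema may nest or name these fields differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical A/LT record; the real schema may differ in nesting and naming.
record = """
<book>
  <isbn>0140287787</isbn>
  <title>Surely You're Joking, Mr. Feynman!</title>
  <author>Richard P. Feynman</author>
  <dewey>530</dewey>
  <subject>Physics</subject>
  <tag count="42">physics</tag>
  <tag count="17">humor</tag>
  <review rating="5">A wonderfully engaging read for laypeople.</review>
  <review rating="4">Funny and surprisingly educational.</review>
</book>
"""

root = ET.fromstring(record)
print(root.findtext("title"))                    # curated metadata
print([t.text for t in root.findall("tag")])     # user-generated tags
ratings = [int(r.get("rating")) for r in root.findall("review")]
print(sum(ratings) / len(ratings))               # aggregate review rating
```

A script along these lines would be the natural first step for turning the 1.5 million XML records into documents for an IR index.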
3.2 User Tasks

Two tasks were created to investigate the impact of different task types on the participants' interactions with the interfaces and with the professional and user-generated book metadata.

The goal-oriented task was developed as a "simulated leisure task" (Borlund and Ingwersen, 1997), with the topic derived from the LibraryThing collection. The LibraryThing collection contains discussion fora in which users ask other users for advice on which books to read for a given topic, question, or area of interest. From this list of discussion topics, a discussion on "layman books for physics and mathematics" was selected because the book collection contains a significant number of books on the topic, the topic is neutral, and it provides guidance while being sufficiently flexible for participants to interpret as needed. The following instruction text was provided to participants:

  Imagine you are looking for some interesting physics and mathematics books for a layperson. You have heard about the Feynman books but you have never really read anything in this area. You would also like to find an "interesting facts" sort of book on mathematics.

The non-goal task was developed based on the open-ended task used in the iCHiC task at CLEF 2013 (Toms and Hall, 2013b). The aim of this task is to investigate how users interact with the system when they have no pre-defined goal, in a more exploratory search context. It also allows the participants to bring their own goals or sub-tasks to the experiment, in line with the "simulated work task" ideas (Borlund and Ingwersen, 1997). The following instruction text was provided to participants:

  Imagine you are waiting to meet a friend in a coffee shop or pub or the airport or your office. While waiting, you come across this website and explore it looking for any book that you find interesting, or engaging or relevant...

3.3 Questionnaires

The experiment was conducted using the SPIRE system⁵ (Hall and Toms, 2013), which uses a set of question pages to acquire the following information:

– Consent – all participants had to confirm that they understood the tasks they would be asked to undertake and the types of data collected in the experiment. Participants also specified who had recruited them;
– Demographics – the following factors were acquired in order to characterise the participants: gender, age, achieved education level, current education level, and employment status;
– Culture – to quantify language and cultural influences, the following factors were collected: country of birth, country of residence, mother tongue, primary language spoken at home, and languages used to search the web;
– Post-Task – in the post-task questions, participants were asked to judge how useful each of the interface components and metadata parts they had used in the task were, using 5-point Likert-like scales;
– Engagement – after participants had completed both tasks, they were asked to complete O'Brien and Toms's (2009) engagement scale.

Figure 1 shows the path participants took through the 8 steps of the experiment. The SPIRE system automatically rotated the task order to avoid ordering biases; a sketch of one such rotation scheme follows below.

Fig. 1. The path participants took through the experiment. Each participant completed the Pre-Task, Task, and Post-Task steps twice (once for each of the tasks). The SPIRE system automatically balanced the task order. No data was acquired in the Introduction and Pre-Task steps.

⁵ Based on the Experiment Support System – https://bitbucket.org/mhall/experiment-support-system
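The paper does not specify the exact balancing scheme SPIRE uses; a minimal sketch of a simple alternating rotation (a hypothetical scheme, for illustration only) would be:

```python
TASKS = ["goal-oriented", "non-goal"]

def task_order(participant_number: int) -> list:
    """Alternate the task order across participants to counter ordering biases."""
    if participant_number % 2 == 0:
        return list(TASKS)
    return list(reversed(TASKS))

# Even-numbered participants start with the goal-oriented task,
# odd-numbered participants with the non-goal task.
print(task_order(6))  # ['goal-oriented', 'non-goal']
print(task_order(7))  # ['non-goal', 'goal-oriented']
```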
3.4 System and Interfaces

The two tested interfaces (baseline and multi-stage) were both built using the PyIRE⁶ workbench, which provides the required functionality for creating interactive IR interfaces and for logging all interactions between the participants and the system. This includes any queries they enter, the books shown for those queries, pagination, facets selected, books viewed in detail, metadata facets viewed, books added to the book-bag, and books removed from the book-bag. All log data is automatically timestamped and linked to the participant and task.

Both interfaces used a shared IR backend implemented using ElasticSearch⁷, which provided free-text search, faceted search, and access to each book's complete metadata; a sketch of such a backend query follows below.

⁶ Python interactive Information Retrieval Evaluation workbench – https://bitbucket.org/mhall/pyire
⁷ ElasticSearch – http://www.elasticsearch.org/
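As a sketch of what a combined free-text and faceted query against such a backend might look like, the following uses ElasticSearch's REST API and standard query DSL. The index name, field names and mapping are assumptions for illustration; the track's actual configuration is not given in the paper.

```python
import json
import requests  # ElasticSearch is queried over its REST API

# Hypothetical index and field names; "subject" is assumed to be indexed
# as an exact-value (keyword) field so it can be aggregated over.
query = {
    "query": {"match": {"title": "physics for laypeople"}},  # free-text search
    "aggs": {  # faceted search: count the matching books per Amazon subject
        "subjects": {"terms": {"field": "subject", "size": 10}}
    },
    "size": 10,
}
resp = requests.post("http://localhost:9200/books/_search",
                     data=json.dumps(query),
                     headers={"Content-Type": "application/json"}).json()

for hit in resp["hits"]["hits"]:
    print(hit["_source"]["title"])                 # result list entries
for bucket in resp["aggregations"]["subjects"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])      # facet values and counts
```

The same query structure serves both interfaces: the hits populate the result lists, while the aggregation buckets populate the facet and filter controls.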
Fig. 2. Baseline interface – results view.

The baseline interface shown in Figure 2 represents a standard web-search interface, with the left column containing the task instructions, book-bag, and search history, and the main area showing the results. The book-bag resembles a shopping-cart system and contained the books participants collected to complete the task. Items added to the book-bag could be viewed again and also removed from the book-bag if the participants felt that they did not need them after all. The history panel at the bottom recorded all searches undertaken, to enable easy re-running of queries. The main panel on the right contained the search box and a paginated list of search results. The result listings contained a thumbnail image, title, author(s), and aggregate review ratings, if there were any reviews for the book.

Fig. 3. Baseline interface – item view.

The item view shown in Figure 3 was displayed once a book was selected from the search results. The item view contained two distinct sets of information. The left-hand side was dedicated to professional metadata, including publisher, price, ISBN, publication date, number of pages and a detailed description, followed by a grid of similar books if any had been linked to the main book. The right-hand side contained user-generated content such as tags and customer reviews, where each review consisted of a description and a 1–5 star rating.

The multi-stage interface aims to support users by taking the different stages of the search process into account. The idea behind the multi-stage interface design is supported by two theoretical components. Firstly, several information search process models look at stages in the search process. A well-known example is Kuhlthau (1991), who discovered "common patterns in users' experience" during task performance. She developed a model consisting of six stages, which describe users' evolving thoughts, feelings and actions in the context of complex tasks. Vakkari (2001) later summarized Kuhlthau's stages into three categories (pre-focus, focus formulation, and post-focus), and points to the types of information searched for in the different stages. Building on Vakkari (2001), the proposed multi-stage search interface for ISBS includes three stages: explore, focus, and refine. Secondly, when designing a new search interface for social book search it was also relevant to look more specifically at the process of choosing a book to read. A model of decision stages in book selection (Reuter, 2007) identifies the following decision stages: browsing a category, selecting, judging, sampling, and sustained reading. This work supports the need for a user interface that takes the different search and decision stages into account. However, the stages in Reuter (2007) relate closely to a specific full-text digital library, and therefore the model was not directly applicable to the present collection.

Fig. 4. Multistage interface – explore view.

The initial explore stage shown in Figure 4 aimed to support the initial exploration of the data-set and contains a feature set very similar to the baseline, including task instructions, search box, search results, book-bag, and search history. The two main differences to the baseline interface were the navigation bar, which allows the participants to switch between the stages, and the dense, multi-column search results. The search results were shown in two columns, with each book showing only a title and review rating. Each column showed a different faceted filter of the same query, to allow a wider overview of results to be explored. The "filter options" allowed the participant to select which filter to apply to each of the columns. Additionally, the book-bag was extended to include a note field for each book, but this feature was not used heavily.

Fig. 5. Multistage interface – item popup view.

Figure 5 shows the popup interface that is displayed when a book is selected in the explore or refine stage's search results. All of the book metadata is available via a tabbed structure, including the professional metadata in the form of the description and publication data, as well as the user-generated content of reviews and tags.

Fig. 6. Multistage interface – focus view.

The focus stage shown in Figure 6 supports in-depth searching and provides detailed search results that directly include the full metadata that the other stages show via a popup (Figure 5). A category filter in the left column provided a means to reduce and refine the search results. In the explore stage, the participant could choose to focus on one of the two result columns, which would show the focus view with that column's filter pre-selected.

Fig. 7. Multistage interface – refine view.

The refine stage shown in Figure 7 supports the refining of the final list of books the participants want to choose. It thus focuses on the books the user has already added to their book-bag, and this stage cannot be entered until at least one book has been added to the book-bag. The search feature is much less prominent in this interface and is confined to the left column, with minimal result details shown. The query, filters, and results are the same as in the focus stage. Directly underneath the results is the similar-books panel, which shows all books that are defined as similar to any of the books in the book-bag, in order to support the participant in augmenting their list of collected books. When a book in the book-bag has similar books, a button is shown above the book title which highlights that book's similar books in yellow and raises them to the top of the similar-books list. The book-bag fills the rest of the interface and works as it does on the other pages, but provides more detail per book.
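The stage-navigation rules just described can be summarised in a small guard function. The following is a sketch of the described behaviour, not code from the actual interface, and all names are hypothetical:

```python
STAGES = ("explore", "focus", "refine")

def can_enter(stage, bookbag):
    """Stage-entry guard: refine is locked until the book-bag is non-empty."""
    if stage not in STAGES:
        return False
    if stage == "refine":
        return len(bookbag) > 0
    return True

def focus_filters(column_filter=None):
    """Focusing on an explore column pre-selects that column's facet filter."""
    return {"category": column_filter} if column_filter else {}

print(can_enter("refine", bookbag=[]))    # False: no books collected yet
print(can_enter("focus", bookbag=[]))     # True
print(focus_filters("popular science"))   # {'category': 'popular science'}
```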
3.5 Participants

A total of 41 participants were recruited (see Table 1), 27 female and 14 male. 16 were between 18 and 25 years old, 21 between 26 and 35, 3 between 36 and 45, and 1 between 46 and 55. 9 were in employment and 32 were students. Participants came from 8 different countries: Germany, Denmark, the UK, the Netherlands, Colombia, Brazil, Romania, and Suriname. Participants' mother tongues included German, Dutch, English, Danish, Romanian, Farsi, and Portuguese. The majority of participants executed the tasks in a lab (29); only 12 conducted the experiment remotely. 22 participants used the novel multi-stage interface, while 19 used the baseline interface.

3.6 Procedure

Participants were invited by the individual teams, either using e-mail (Aalborg, Amsterdam) or by recruiting students from a lecture or lab (Edge Hill, Humboldt). Where participants were invited by e-mail, the e-mail contained a link to the online experiment, which would open in the participant's browser. Where participants were recruited in a lecture or lab, the experiment URL was distributed using e-learning platforms. All browsers and operating systems had been tested and worked, with the exception of Safari and Chrome on OS X, where there was a conflict caused by the security certificate, which was outside of our control.

After participants had completed the experiment as outlined above (Section 3.3), they were provided with additional information on the tasks they had completed and with contact information, should they wish to learn more about the experiment. Where participants completed the experiment in a lab, teams were able to conduct their own post-experiment process, which mostly focused on gathering additional feedback on the system from the participants.

4 Results

Based on the participant responses and log data we have aggregated summary statistics for a number of basic performance metrics.

Session length was measured automatically using JavaScript and stored with the participants' responses. Table 2 shows medians and inter-quartile ranges for all interface and task combinations. While the results seem to indicate that participants spent longer in the baseline interface and also longer on the goal-oriented task, the differences are not statistically significant (Wilcoxon signed-rank test). Interestingly, for the non-goal task, the median times are roughly similar to the session lengths in the iCHiC experiment that the task was taken from (Toms and Hall, 2013b). This might indicate that this is the approximate time that participants can be expected to spend on any kind of open-ended leisure task.

Table 2. Session lengths for the two interfaces and tasks. Times are in minutes:seconds and are reported as median (inter-quartile range).

Interface      Goal-oriented    Non-goal
Baseline       6:25 (3:42)      3:42 (3:45)
Multi-Stage    3:35 (4:24)      2:40 (6:21)
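All metrics in this section are reported as median (inter-quartile range) and compared with the rank-based tests named in the text. The following sketch shows how such an analysis can be run with numpy and scipy; the session lengths are made up for illustration and are not the study's data:

```python
import numpy as np
from scipy import stats

# Made-up session lengths in seconds for two groups of participants
# (baseline vs. multi-stage interface); illustrative only.
baseline = np.array([385, 200, 410, 150, 530, 260, 330])
multistage = np.array([215, 140, 380, 95, 270, 160, 305])

def report(x):
    """Median and inter-quartile range, as reported in Tables 2-5."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return f"{med:.0f} ({q3 - q1:.0f})"

print(report(baseline), report(multistage))

# Between-interface comparisons use a rank-sum test on independent samples...
print(stats.mannwhitneyu(baseline, multistage, alternative="two-sided"))
# ...while within-participant comparisons (e.g. the same participant on both
# tasks) would use the Wilcoxon signed-rank test on paired measurements.
print(stats.wilcoxon(baseline, multistage))
```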
Number of queries was extracted from the log data. In both interfaces it was possible to issue queries by typing keywords into the search box or by clicking on a metadata field to search for other books with that metadata field value. Both types of query have been aggregated, and Table 3 shows the number of queries for each interface and task. The results are in line with the session-length results, with participants executing slightly more queries in the goal-oriented task (Wilcoxon rank-sum test, p < 0.05). However, the interface did not have a significant impact on the number of queries executed.

Table 3. Number of queries executed. Numbers are reported as median (inter-quartile range).

Interface      Goal-oriented    Non-goal
Baseline       4 (5.5)          2 (4.5)
Multi-Stage    3 (2.75)         2 (3)

Number of books viewed was extracted from the log data. Table 4 shows the results. Participants viewed fewer books in the non-goal task (Wilcoxon rank-sum test, p < 0.05), which was to be expected considering that they also executed fewer queries and spent less time on the task. As with the number of queries, the number of books viewed was not significantly influenced by the interface participants used.

Table 4. Number of books viewed. Numbers are reported as median (inter-quartile range).

Interface      Goal-oriented    Non-goal
Baseline       4 (5.5)          2 (4.5)
Multi-Stage    3 (2.75)         2 (3)

Number of books collected was extracted from the log data. Participants collected those books that they felt were of use to them. The numbers reported in Table 5 are based on the number of books participants had in their book-bag when they completed the session, not the total number of books collected over the course of the session, as participants could always remove books from their book-bag during the session.

Table 5. Number of books collected. Numbers are reported as median (inter-quartile range).

Interface      Goal-oriented    Non-goal
Baseline       3 (3)            1 (2)
Multi-Stage    3.5 (3)          2 (3)

Unlike the other metrics, where the interface had no significant influence, in the non-goal task participants collected significantly more books using the multi-stage interface than with the baseline interface. Considering that there are no significant interface effects for the non-goal task in any of the other metrics, and that there is no significant difference in the goal-oriented task, this strongly suggests that the multi-stage interface provides a benefit for open-ended leisure tasks, while working just as well as the baseline interface for more focused tasks.

5 Conclusions and Plans

This was the first year of the Interactive Social Book Search Track. Because of time constraints, the data-gathering period was short and only a small number of users participated in the study. However, their data provides valuable lessons for the future. Plans for next year are to improve the interfaces based on user feedback and to run a full data-gathering phase to obtain user data from a much larger group of participants.
Bibliography

T. Beckers, N. Fuhr, N. Pharo, R. Nordlie, and K. N. Fachry. Overview and results of the INEX 2009 Interactive Track. In M. Lalmas, J. M. Jose, A. Rauber, F. Sebastiani, and I. Frommholz, editors, ECDL, volume 6273 of Lecture Notes in Computer Science, pages 409–412. Springer, 2010. ISBN 978-3-642-15463-8.

P. Borlund and P. Ingwersen. The development of a method for the evaluation of interactive information retrieval systems. Journal of Documentation, 53(3):225–250, 1997.

M. M. Hall and E. Toms. Building a common framework for IIR evaluation. In CLEF 2013 – Information Access Evaluation. Multilinguality, Multimodality, and Visualization, pages 17–28, 2013. doi: 10.1007/978-3-642-40802-1_3.

M. M. Hall, R. Villa, S. A. Rutter, D. Bell, P. Clough, and E. G. Toms. Sheffield submission to the CHiC interactive task: Exploring digital cultural heritage. In CLEF 2013 Evaluation Labs and Workshop, 2013.

M. Koolen, J. Kamps, and G. Kazai. Social Book Search: The impact of professional and user-generated content on book suggestions. In Proceedings of the International Conference on Information and Knowledge Management (CIKM 2012). ACM, 2012a.

M. Koolen, G. Kazai, J. Kamps, A. Doucet, and M. Landoni. Overview of the INEX 2011 Books and Social Search Track. In S. Geva, J. Kamps, and R. Schenkel, editors, Focused Retrieval of Content and Structure: 10th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX 2011), volume 7424 of LNCS. Springer, 2012b.

M. Koolen, G. Kazai, J. Kamps, M. Preminger, A. Doucet, and M. Landoni. Overview of the INEX 2012 Social Book Search Track. In S. Geva, J. Kamps, and R. Schenkel, editors, Focused Access to Content, Structure and Context: 11th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX'12), LNCS. Springer, 2013a.

M. Koolen, G. Kazai, M. Preminger, and A. Doucet. Overview of the INEX 2013 Social Book Search Track. In CLEF 2013 Evaluation Labs and Workshop, Online Working Notes, 2013b.

C. C. Kuhlthau. Inside the search process: Information seeking from the user's perspective. Journal of the American Society for Information Science, 42(5):361–371, 1991. ISSN 1097-4571. doi: 10.1002/(SICI)1097-4571(199106)42:5<361::AID-ASI6>3.0.CO;2-#. URL http://dx.doi.org/10.1002/(SICI)1097-4571(199106)42:5<361::AID-ASI6>3.0.CO;2-#.

H. L. O'Brien and E. G. Toms. The development and evaluation of a survey to measure user engagement. Journal of the American Society for Information Science and Technology, 61(1):50–69, 2009.

V. Petras, T. Bogers, E. Toms, M. Hall, J. Savoy, P. Malak, A. Pawłowski, N. Ferro, and I. Masiero. Cultural Heritage in CLEF (CHiC) 2013. In P. Forner, H. Müller, R. Paredes, P. Rosso, and B. Stein, editors, Information Access Evaluation. Multilinguality, Multimodality, and Visualization, volume 8138 of Lecture Notes in Computer Science, pages 192–211. Springer Berlin Heidelberg, 2013. ISBN 978-3-642-40801-4. doi: 10.1007/978-3-642-40802-1_23. URL http://dx.doi.org/10.1007/978-3-642-40802-1_23.

K. Reuter. Assessing aesthetic relevance: Children's book selection in a digital library. JASIST, 58(12):1745–1763, 2007.

M. Skov and P. Ingwersen. Exploring information seeking behaviour in a digital museum context. In Proceedings of the Second International Symposium on Information Interaction in Context, pages 110–115. ACM, 2008.

E. Toms and M. M. Hall. The CHiC interactive task (CHiCi) at CLEF2013. http://www.clef-initiative.eu/documents/71612/1713e643-27c3-4d76-9a6f-926cdb1db0f4, 2013a.
E. G. Toms and M. M. Hall. The CHiC Interactive Task (CHiCi) at CLEF2013. In CLEF 2013 Evaluation Labs and Workshop, Online Working Notes, 2013b.

P. Vakkari. A theory of the task-based information retrieval process: A summary and generalisation of a longitudinal study. Journal of Documentation, 57(1):44–60, 2001.