<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Individual Differences and Task Behaviour</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mark M Hall</string-name>
          <email>Mark.Hall@edgehill.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marijn Koolen</string-name>
          <email>marijn.koolen@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Edge Hill University</institution>
          ,
          <addr-line>St Helens Road, Ormskirk, L39 4QP</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Amsterdam</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Interactive Social Book Search track at CLEF has run the same experiment, task, and interface for two years. This provides an opportunity to study the individual differences between two separately recruited participant cohorts, rather than between sub-sets of a single cohort. Overall the results show no significant differences in how the participants used the three main stages of the interface for the two tasks. However, at the detail level there are some quite significant changes in exactly how participants use the available functionality.</p>
      </abstract>
      <kwd-group>
        <kwd>user study</kwd>
        <kwd>interactive information retrieval</kwd>
        <kwd>individual differences</kwd>
        <kwd>log analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The aim of the Interactive Social Book Search (iSBS) track at CLEF is to
investigate user behaviour when faced with a collection of approximately 1.5 million
books that combines both professional and user-generated content. It is now in
its third year and while significant changes were made from year one to year two,
the experiment, its data, interface, and tasks were kept stable between years two
and three. The only major change between the two years was the participants.
It is thus possible to investigate the impact of two participant cohorts without
having to sub-set a single experiment based on some criteria. From this we can
investigate the stability of any conclusions, both at the macro and micro levels.</p>
      <p>
        The iSBS experiment [
        <xref ref-type="bibr" rid="ref2">2, 7</xref>
        ] consists of two tasks, a non-goal and a
goal-oriented task. In the non-goal task participants were instructed to simply explore
the collection until they were bored, adding any books they felt were interesting to
their book-bag. In the goal-oriented task participants were instructed to find five
books that they would like to have if they were alone on a desert island for one
month.
      </p>
      <p>
        In years one and two participants were allocated to either a baseline, faceted
search interface or a novel multi-stage interface. In year three only the
multi-stage interface was used. The multi-stage interface is designed to mimic the
gradual narrowing of the information journey [6, 8] and is split into three stages.
The initial stage (Explore, fig. 1a) provides a pure browsing interface for
exploring the collection. The left-hand side shows a tree structure automatically
generated from the books' Amazon browse-node tags using an adapted version
of the algorithm in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. When the user selects a tag, its child tags are shown in
the tree, allowing the user to dig down into the tree, and on the right the books
tagged with that tag are listed in a dense list. The user can view each book's
details by clicking on the book's title.
      </p>
      <p>Fig. 1: (a) the Explore stage, (b) the Search stage, (c) the Book-bag stage.</p>
      <p>The second stage (Search, fig. 1b) provides a standard faceted search interface
[4], with more detail shown for each book. The third stage (Book-bag, fig. 1c)
lets the user interact with those books that they have added to their book-bag
in the Explore and Search stages. Additionally, for each book the participant
has access to similar books, with the similarity based on one of: authors, title,
topics, or user-generated tags.</p>
      <p>
        The interface, data-set, and tasks were all kept exactly the same between year
two (2015) and year three (2016). There was a minor change in the experiment
structure. In 2015 participants undertook both the non-goal and goal-oriented
tasks (ordering balanced), while in 2016 participants were randomly assigned to
one of the two tasks. This change was made to reduce the time requirements
for participants. Previous analysis [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] indicates that in 2015 task order had no
significant impact on use patterns, ensuring comparability with the 2016 data.
Additionally, in 2016 participants could opt-in to do an additional focused task.
As no comparable data is available for 2015, the additional task data is not taken
into account in this analysis. The remainder of the paper compares the
2015 and 2016 results to investigate the impact of participant cohort differences
on the observed results.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Explore, Query, and the Book-bag</title>
      <p>Before looking at detailed differences in the participants' interactions with the
system, it is necessary to first determine whether there are any significant
high-level differences between how the participants used the system. In particular,
whether they used the Explore, Search, and Book-bag stages differently.</p>
      <p>The experiments in both years captured very rich log data that allows for
the full replication of each participant's interactions with the interface. Based
on the time-stamps at which participants switched between the three stages,
the amount of time they spent in each stage was calculated. As there is a large
amount of variation in how long participants spent on the system, the times were
then normalised by the total session length.</p>
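<p>This normalisation step can be sketched as follows. The log format shown is hypothetical, not the actual iSBS log schema; only the idea (sum per-stage durations from switch timestamps, divide by session length) follows the text.</p>

```python
from collections import defaultdict

def normalised_stage_times(events):
    """Fraction of a session spent in each interface stage.

    `events` is a hypothetical list of (timestamp_seconds, stage) tuples
    ordered by time; each entry marks a switch into `stage`, and the last
    entry marks the end of the session.
    """
    totals = defaultdict(float)
    for (t0, stage), (t1, _) in zip(events, events[1:]):
        totals[stage] += t1 - t0  # time in `stage` until the next switch
    session_length = events[-1][0] - events[0][0]
    # normalise by total session length so sessions of different
    # durations become comparable
    return {stage: t / session_length for stage, t in totals.items()}

# example session: 120s Explore, 180s Search, 60s Book-bag
times = normalised_stage_times(
    [(0, "explore"), (120, "search"), (300, "bookbag"), (360, "end")])
```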
      <p>Figure 2 shows the normalised time distributions for both 2015 and 2016 in
the non-goal task. Mann-Whitney U tests showed no significant differences in
the amount of time spent in each of the stages.</p>
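<p>For illustration, the Mann-Whitney U statistic underlying this test can be computed in a few lines of pure Python (the authors presumably used an off-the-shelf implementation such as SciPy [5]; this sketch omits the p-value computation):</p>

```python
from itertools import chain

def mann_whitney_u(a, b):
    """Two-sample Mann-Whitney U statistic (the smaller of the two Us)."""
    pooled = sorted(chain(a, b))
    # assign each distinct value its average 1-based rank (handles ties)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    r_a = sum(ranks[x] for x in a)            # rank sum of sample a
    u_a = r_a - len(a) * (len(a) + 1) / 2     # U statistic for sample a
    return min(u_a, len(a) * len(b) - u_a)    # conventional smaller U
```

A small U (relative to n1*n2/2) indicates that one sample's values tend to sit below the other's.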
      <p>The interface provided a lot of functionality within each of the three stages. In
the initial analysis the focus is on how frequently participants interacted
with the available functions. The following actions were extracted from
the log data:
– browse – the participant clicked on one of the topics in the tree in the Explore
stage;
– query – the participant issued a query, either by typing a query into the
search box or by clicking on an item's meta-data to search for that piece of
meta-data;
– facet – the participant added or removed a facet in the Search stage;
– paginate – the participant paginated through the result list in either the
Explore or Search stages;
– item – the participant viewed an item;
– bookbag – the participant interacted with the Book-bag, adding or removing
a book, or adding notes to a book;
– similar – the participant used the similar items functionality in the Book-bag.</p>
      <p>For each user a count of how often they used each of the actions was
determined. The count vectors were then clustered using hierarchical, average-linkage
clustering [5]. Clusters were determined using a distance threshold of 0.2. The
resulting clusters were manually analysed and classified based on which actions
participants in a cluster used. The cluster membership counts were normalised
by the total number of participants in each year (2015: 95 for both tasks; 2016: 52
non-goal, 53 goal-oriented) to enable comparisons.</p>
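<p>The clustering step can be sketched in pure Python as follows. This is an illustrative re-implementation, not the authors' code: it assumes Euclidean distance and per-user normalisation of the count vectors, neither of which is specified in the text, and uses a distance cut-off in place of a dendrogram cut.</p>

```python
def average_linkage_clusters(vectors, threshold):
    """Agglomerative average-linkage clustering with a distance cut-off.

    `vectors` are per-user action-count vectors; returns a list of
    clusters, each a list of user indices.
    """
    def dist(u, v):  # Euclidean distance between normalised vectors
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    # assumption: normalise counts to proportions so that users with
    # long sessions do not dominate the distances
    norm = [[x / (sum(v) or 1) for x in v] for v in vectors]
    clusters = [[i] for i in range(len(norm))]

    def avg_dist(ca, cb):  # average linkage: mean pairwise distance
        return sum(dist(norm[a], norm[b])
                   for a in ca for b in cb) / (len(ca) * len(cb))

    while len(clusters) > 1:
        pairs = [(clusters[i], clusters[j])
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        ca, cb = min(pairs, key=lambda p: avg_dist(*p))
        if avg_dist(ca, cb) > threshold:
            break  # closest pair already exceeds the cut-off
        clusters.remove(ca)
        clusters.remove(cb)
        clusters.append(ca + cb)
    return clusters
```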
      <p>In neither year nor task did the bookbag and similar actions distinguish any
of the clusters, as they were used relatively consistently across all clusters. The
primary distinction between participants in both years was whether they
primarily used the browse functionality in the Explore stage, the query functionality
in the Search stage, or a mixed approach (tab. 1 &amp; 2). These
main distinctions were then augmented by distinctions based on which other
actions participants used. If an action is not included in the cluster name, then it was not
distinctively used by participants in that cluster.</p>
      <p>3.1 Non-goal Task</p>
      <p>The cluster data in Table 1 mirrors the general time data from Figure 2,
indicating that the majority of time is spent in the Explore stage (clusters #1-4), with
varying levels of activity in the Search stage (clusters #5 and #6). The results
also show that for this type of unstructured task the use of just the query action
is not a common strategy (cluster #7: two users in 2015, none in 2016).</p>
      <p>Within those broad strokes, there are however quite significant differences
between the two years. First, while in 2015 11% of participants only browsed
and selected their books without looking at any item details (cluster #1), in 2016
no participant used this strategy. This fits into the larger picture, where from
2015 to 2016 there is a clear move from purely browsing-based strategies (clusters
#1-4) to mixed strategies that incorporate a significant search element (clusters
#5 and #6).</p>
      <p>In both years there are two participants who essentially did not interact with
the system in any detail (cluster #8), but simply selected one browse topic,
scrolled through the results, and selected one or more books from that list into
their book-bag. Further work is needed to investigate whether these users were
generally disengaged from the task or whether they were struggling with the
open-ended nature of the non-goal task.</p>
      <p>3.2 Goal-oriented Task</p>
      <p>As with the non-goal task, the system use in the goal-oriented task (tab. 2)
mirrors the time spent (fig. 3). Interestingly, the goal-oriented task shows larger
clusters and more overlap between the years than the non-goal task.</p>
      <p>The most striking difference between the two years is that in 2016 13% of
participants used a pure browsing strategy to complete the goal-oriented task
(cluster #1). This change comes at the expense of the search-only strategy
(cluster #5). There are also more participants who used a mixed strategy, but
did not look at many items in detail (cluster #2). Nevertheless, the differences
between the two years are smaller than in the non-goal task, most likely because the
focused nature of the task provides a guiding structure that reduces individual
differences.</p>
      <p>Focusing on the use of browsing in the experiment, the second analysis looked
at how participants interacted with the tree structure in the Explore stage. To
facilitate the comparison of browsing patterns, the following browsing actions
were extracted from the log data:
– start – the participant has not previously selected a topic and selects a
top-level topic. This includes the scenario where the participant switches to the
Search stage and then back;
– depth – the participant selects a child topic of the currently selected topic;
– breadth – the participant selects a sibling of the currently selected topic or
a sibling of one of the current topic's ancestors;
– backtrack – the participant selects one of the ancestor topics of the current
topic;
– restart – the participant selects a top-level topic that is not related to the
current topic.</p>
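<p>These browsing actions can be derived from consecutive topic selections. The sketch below assumes a hypothetical encoding of topics as root-to-node paths and simplifies the edge cases (every top-level selection after the first is treated as restart, and any remaining in-tree selection as breadth, since in a tree interface only children, siblings, and ancestors are clickable):</p>

```python
def classify_selection(current_path, new_path):
    """Classify a topic selection relative to the current topic.

    Topics are identified by their path from the root of the browse
    tree, e.g. ("Fiction", "Crime") -- a hypothetical encoding.
    """
    if current_path is None:
        return "start"      # no topic previously selected
    if new_path[:-1] == current_path:
        return "depth"      # child of the current topic
    if new_path == current_path[:len(new_path)]:
        return "backtrack"  # ancestor of the current topic
    if len(new_path) == 1:
        return "restart"    # top-level topic unrelated to the current one
    return "breadth"        # sibling of the current topic or of an ancestor
```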
      <p>Consecutive uses of the depth, breadth, and backtrack actions
were merged into a single action before analysis. This ensures that differences
in the heights and breadths of individual sub-trees of the hierarchical topic
structure do not influence the results. The resulting browse patterns were counted to
identify all patterns that make up more than 5% of the total number of patterns
(tab. 3 &amp; 4).</p>
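<p>The merging and counting described above can be sketched as follows. The session encoding is hypothetical, and the rule for where one pattern ends and the next begins (at each start/restart) is an assumption based on the pattern list in tab. 3 &amp; 4.</p>

```python
from collections import Counter

def browse_patterns(sessions):
    """Collapse runs and count browse patterns across sessions.

    `sessions` is a list of per-user action sequences using the labels
    start / depth / breadth / backtrack / restart.
    """
    patterns = Counter()
    for actions in sessions:
        merged = []
        for a in actions:
            # merge consecutive depth/breadth/backtrack uses into one,
            # so sub-tree height and width do not influence the counts
            if merged and a == merged[-1] and a in ("depth", "breadth", "backtrack"):
                continue
            merged.append(a)
        pattern = []
        for a in merged:
            if a in ("start", "restart") and pattern:
                patterns[tuple(pattern)] += 1  # a new pattern begins here
                pattern = []
            pattern.append(a)
        if pattern:
            patterns[tuple(pattern)] += 1
    total = sum(patterns.values())
    # keep only patterns that make up more than 5% of all patterns
    return {p: n / total for p, n in patterns.items() if n / total > 0.05}
```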
      <p>A central result for both years and tasks is that the use of backtrack is
limited to complex browse patterns that do not occur frequently. The maximum
frequency for any browse pattern involving backtrack is 2%. However, overall
backtrack occurs in approximately 7%-12% of patterns.</p>
      <p>4.1 Non-goal Task</p>
      <p>In the non-goal task, as with the action use in section 3, there are significant
differences between the two years. The main difference is that while in 2015 the
most frequently used patterns were drilling down from the top with an optional
breadth-search at the bottom of the tree (patterns #3, #5, #7), in 2016 the use
of looking at siblings at the bottom has reduced significantly (pattern #5), with
a matching increase in straight drill-down behaviour (patterns #3 and #7). It
seems that the 2016 participants saw significantly less value in looking around
the leaves of a branch, preferring instead to go back to the top and delve into a
different branch.</p>
      <p>4.2 Goal-oriented Task</p>
      <p>The main difference is the big decrease in the frequency of just the start action
(pattern #1). This mirrors the general increase in use of the browsing
functionality (see fig. 2, cluster 1) in the goal-oriented task. This indicates that the
2016 participants saw significantly more benefit in exploring the collection using
the Explore stage than in 2015. However, these participants only explored one
branch in the tree before moving on to the Search stage, as evidenced by
the lack of change in the restart patterns (#6-9). Further analysis is needed to
investigate what the participants are learning from this single exploration.</p>
      <p>The second interesting difference is that, as in the non-goal task, there is a
marked reduction in the number of uses of the depth-then-breadth patterns (#5
and #9). Why participants did not browse around the leaves of the branches
requires further study.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>Browse patterns (tab. 3 &amp; 4): 1 start; 2 start → breadth; 3 start → depth; 4 start → breadth → depth; 5 start → depth → breadth; 6 restart → depth; 7 restart → depth → breadth; 8 restart → breadth; 9 restart → breadth → depth.</p>
      <p>Two years of iSBS data generated from the same data-set, tasks, and interface
provide an interesting window into the potential variation between experiment
participant cohorts. The overall comparison of the two years of data indicates
that the relative uses of the three main stages of the multi-stage interface are
stable across the participant cohorts. Thus changes in detailed use patterns are
likely due to individual differences, rather than due to external factors.</p>
      <p>Both the analysis of the action-use clusters in the system and the browse
patterns reveal some quite large changes in detailed behaviour between the two
years. In particular there is a general trend towards using both the Explore and
Search stages for both tasks. The 2016 data shows a decrease in the amount of
pure-browsing approaches to the non-goal task together with an increase in the
amount of browsing in the goal-oriented task.</p>
      <p>This indicates that for general trends, such as the overall use of
the different stages, the number of participants in the two experiments (2015:
95, 2016: 105) is sufficient to produce stable and reliable results. However, for
more detailed analyses the variation suggests that the results have to be read
with a certain amount of caution, and potentially larger participant numbers are
required to produce stable results.</p>
      <p>Future work will need to investigate the data in more detail to determine
whether it is possible to find explanations for the observed differences in the
behaviour of the two cohorts. In particular the potential impact of cultural,
language, and background issues will need to be investigated.</p>
      <p>4. Hearst, M.A.: Search User Interfaces. Cambridge University Press (2009)</p>
      <p>5. Jones, E., Oliphant, T., Peterson, P.: SciPy: Open source scientific tools for Python (2001-)</p>
      <p>6. Kuhlthau, C.C.: Inside the search process: Information seeking from the user's perspective. JASIS 42(5), 361-371 (1991)</p>
      <p>7. T.b.a.: Overview of the SBS 2016 Interactive Track. In: CLEF2016 Working Notes. CEUR Workshop Proceedings (2016)</p>
      <p>8. Vakkari, P.: A theory of the task-based information retrieval process: a summary and generalisation of a longitudinal study. Journal of Documentation 57(1), 44-60 (2001)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Campbell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Walsh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Edge Hill Computing @ Interactive Social Book Search 2015</article-title>
          . In: Cappellato, L., Ferro, N., Jones, G.J.F., San Juan, E. (eds.)
          <source>CLEF2015 Working Notes. CEUR Workshop Proceedings</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Gade, M.,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huurdeman</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kamps</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koolen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skov</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toms</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Walsh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Overview of the SBS 2015 Interactive Track</article-title>
          . In: Cappellato,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.J.F.</surname>
          </string-name>
          , and San Juan, E. (eds.)
          <source>CLEF2015 Working Notes. CEUR Workshop Proceedings</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernando</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clough</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soroa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stevenson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Evaluating hierarchical organisation structures for exploring digital libraries</article-title>
          .
          <source>Information Retrieval</source>
          <volume>17</volume>
          (
          <issue>4</issue>
          ),
          <fpage>351</fpage>
          -
          <lpage>379</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>