
A Pluggable Work-bench for Creating Interactive IR Interfaces

Mark M. Hall
m.mhall@sheffield.ac.uk
Sheffield University, S1 4DP, Sheffield, UK

Spyros Katsaris
evolve.sheffieldis@gmail.com
Sheffield University, S1 4DP, Sheffield, UK

Elaine Toms
e.toms@sheffield.ac.uk
Sheffield University, S1 4DP, Sheffield, UK

Information Retrieval (IR) has benefited from standard evaluation practices and re-usable software components that enable comparability between systems and experiments. However, Interactive IR (IIR) has had only very limited benefit from these developments, in part because experiments are still built using bespoke components and interfaces. In this paper we propose a flexible workbench for constructing IIR interfaces that will standardise aspects of the IIR experiment process to improve the comparability and reproducibility of IIR experiments.

evaluation framework standardisation

1. MOTIVATION

Information Retrieval (IR) has benefited from standard evaluation practices and re-usable software components. The Cranfield-style evaluation methodology enabled evaluation programmes such as TREC, INEX, or CLEF. At the same time, the provision of re-usable software components such as Lucene¹, Terrier², Heritrix³, or Nutch⁴ has enabled IR researchers to focus on the development of those components directly related to their research. However, Interactive IR (IIR) has had only very limited benefit from these developments.

Typically IIR research is still conducted using a single system in a laboratory setting in which a researcher observes and interacts with a participant [5], usually using a bespoke IIR interface. Developing and running such experiments is a time-consuming, resource-exhaustive and labour-intensive process [6]. As a result of this bespoke approach, the comparability of IIR experiments and their results suffers. Where studies of the same activities show divergent results, it is difficult to determine whether the differences are due to the specific aspect of IIR under investigation, or simply due to different participant samples or small differences in how the non-investigated user-interface (UI) components were implemented. The bespoke nature also makes it harder to replicate studies, as publications frequently do not contain sufficient detail to exactly replicate the experiment.

¹ https://lucene.apache.org/
² http://terrier.org/
³ https://webarchive.jira.com/wiki/display/Heritrix/Heritrix
⁴ http://nutch.apache.org/

Presented at EuroHCIR2013. Copyright 2013 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

In [3] we have proposed a flexible, standardised IIR evaluation framework that aims to address the issues created by variations in the experimental processes and by how context information is acquired from the participants. However, the framework makes no provisions towards providing standardised IIR components that would improve the comparability of the experiment itself, the ease of setting up the experiment, and the ease of reproducibility.

A number of attempts at developing a configurable, re-usable IIR evaluation system have been made in the past. In 2004, Toms, Freund and Li designed and implemented the WiIRE (Web-based Interactive Information Retrieval) system [6], which devised an experimental workflow process that took the participant through a variety of questionnaires and the search interface. Used in the TREC 11 Interactive Track, it was built using Microsoft Office desktop technologies, severely limiting its capabilities. The system was re-created for the web and successfully used in INEX2007 [7], but lacked flexibility in setup and data extraction. More recently, SCAMP (Search ConfigurAtor for experiMenting with PuppyIR) [4] was developed to assess IR systems, but does not include the range of IIR research designs that are typically done. A heavy-weight solution is PIIRExS⁵ [1], which supports the researcher through the whole process from setting up the experiment to analysis, providing greater support but also a steeper learning curve. These approaches highlight the difficulty of balancing the two main constraints that limit a system's wide-spread use: sufficient flexibility to support the wide range of IIR interfaces and experiments, and sufficient simplicity of implementation that it does not increase the resource commitment required to set up the experiment.

⁵ http://sourceforge.net/projects/piirexs

2. DESIGN

To achieve the goal of developing a system that fulfils these requirements, we propose a system design that is based around a very lean core into which the researcher can plug the IIR components they wish to include in their experiment. We have implemented this design in our web-based evaluation framework (fig. 1), which complements the larger IIR experiment support system presented in [3]. To achieve maximum flexibility, the system was designed using a message-passing architecture that consists of the following four components:

Web Frontend handles the interface between the participant's browser and the evaluation workbench and is implemented using a combination of client-side and server-side functionality.

Message Bus handles the inter-component communication and forms the core of the system. It is responsible for passing messages from the Web Frontend to the IIR components configured to be listening for those messages and also for passing messages directly between the components.

Session handles loading and saving the components' current state for a specific participant, hiding the complexities of web-application state from the individual components.

Logging provides a standardised logging interface that allows the components to easily attach logging information to the UI event generated by the participant.

When the researcher sets up the workbench for their experiment, they can freely configure which components to use, how to lay them out, and which components to connect to which other components. Based on this configuration the Web Frontend generates the initial user-interface that is shown to the participants. Then, when the participant interacts with a UI element (fig. 2), the resulting UI event is handled by the Web Frontend, which generates a message based on the UI event. This message is passed to the Message Bus, which uses the configuration provided by the researcher to determine which components to deliver the message to. The components that are listening for that message update their own Session state based on the message and then mark themselves as changed. After message processing has been completed for all components, the Web Frontend then updates the UI for each of the changed components.

An example of the configuration used to set up the experiment is shown in figure 3 (from the experiment in figure 4), specifying the configuration of the "search results" component. It specifies that the component should be displayed 9 grid-cells wide (the application layout uses a 12-by-12 cell grid layout) and should expand vertically to use as much space as is available. The component is configured to be connected to the "search box" component via the "query" message. It is this ability to freely plug components together that, we believe, makes the framework sufficiently flexible to support the wide range of IIR experiments, while remaining simple to set up and use.

    [SearchResults]
    handler = application.components.SearchResults
    name = search_results
    layout = grid-9 vgrid-expand
    connect = search_box:query

Figure 3: Configuration of the "search results" component.

3. STANDARD COMPONENTS

The core system provides only the framework into which the IIR components can be plugged. This allows the researcher to build any custom IIR UI they wish to test, while at the same time being able to take advantage of the standardised session and log handling functionality. As IIR UIs frequently include required elements that are not the focus of the study the researcher wishes to undertake, an optional set of default components for core IR UI elements is provided to reduce set-up time. This has the additional advantage that, as their behaviour is consistent across experiments, the comparability of experiments using the framework is improved.

3.1 Search Box

The Search Box component ([8], p. 49; "Formulate Query Interface" [2], p. 76) provides a standard search box. When the participant enters text and clicks on the "Search" button, it generates a query message, which is usually connected to a Standard Results List.

3.2 Standard Results List

The Standard Results List component ([8], p. 50; "Examine Results Interface" [2], p. 77) provides a default 10-item listing of search results. The Standard Results List includes support for displaying snippets ([8], p. 51) and what Wilson calls "Usable Information" ([8], p. 51) for each result document. Unlike the other standard components, which can be used out-of-the-box, the Standard Results List has to be extended by the researcher in order to be able to access the search engine used to power the UI.
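The kind of extension this requires can be illustrated with a minimal Python sketch. It is not the work-bench's actual API: the class names, the `search` hook, the `on_query` method, and the in-memory backend are all hypothetical and stand in for whatever search engine the researcher uses.

```python
from abc import ABC, abstractmethod


class ResultsListBase(ABC):
    """Hypothetical base class: paging is shared, the backend is not."""

    page_size = 10  # the default 10-item listing described in the text

    def __init__(self):
        self.results = []

    @abstractmethod
    def search(self, query, start, limit):
        """Return at most `limit` result dicts, beginning at rank `start`."""

    def on_query(self, query, start=0):
        # React to a "query" message by asking the backend for one page.
        self.results = self.search(query, start, self.page_size)
        return self.results


class InMemoryResultsList(ResultsListBase):
    """Researcher-provided extension; a toy backend over a list of dicts."""

    def __init__(self, documents):
        super().__init__()
        self.documents = documents

    def search(self, query, start, limit):
        needle = query.lower()
        hits = [d for d in self.documents if needle in d["title"].lower()]
        return hits[start:start + limit]
```

Only `search` differs between experiments in this sketch, which is one way the shared listing behaviour could stay consistent across studies.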

3.3 Pagination

The Pagination component ([8], p. 70) displays a configurable number of pages around the current search-results page. In response to user interaction it sends a start message with the rank of the first document to paginate to.
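The arithmetic behind such a component can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes 1-based page numbers, 0-based document ranks, and a page size of 10; the actual component may count differently.

```python
def page_window(current_page, total_pages, around=2):
    """Page numbers to display: `around` pages on either side of the current one."""
    first = max(1, current_page - around)
    last = min(total_pages, current_page + around)
    return list(range(first, last + 1))


def start_message(page, page_size=10):
    """Build the (message, payload) pair sent when the participant picks a page."""
    # Assumption: ranks are 0-based, so page 1 starts at rank 0.
    return ("start", (page - 1) * page_size)
```

The window is clipped at both ends, so fewer pages are shown near the first and last results page.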

3.4 Category Browsing

The Category Browsing component ([8], p. 54) provides a hierarchical category structure that the participant can use to explore a collection. Clicking on a category sends a query message with the category's identifier.

3.5 Saved Documents

The Saved Documents component provides an area where the participant can save things that they have found interesting, to support them in their current task. Documents are added through a save_document message. The Saved Documents component supports an optional tagging feature enabling the participant to tag the document with values specified by the researcher. This can be used to let the participant specify why they have chosen that document or how much it helps them in their current task.
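A minimal Python sketch of this behaviour follows. Only the save_document message name comes from the text; the class, its methods, and the tag values are hypothetical, and real state would be kept by the Session component, not in the object itself.

```python
class SavedDocumentsSketch:
    """Hypothetical stand-in for the Saved Documents component."""

    def __init__(self, allowed_tags=()):
        # Tag values are fixed by the researcher when configuring the experiment.
        self.allowed_tags = tuple(allowed_tags)
        self.documents = {}  # document id -> set of tags chosen by the participant

    def receive(self, message, payload):
        # Only the save_document message is handled; anything else is ignored.
        if message == "save_document":
            self.documents.setdefault(payload, set())

    def tag(self, doc_id, tag):
        if doc_id not in self.documents:
            raise KeyError(doc_id)
        if tag not in self.allowed_tags:
            raise ValueError("tag not offered by the researcher: %r" % tag)
        self.documents[doc_id].add(tag)
```

Restricting tags to researcher-specified values keeps the recorded reasons comparable across participants.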

3.6 Task

The Task component provides a static display of the task information to show to the user. Two versions of this component are provided, one that displays a static text set in the configuration, and one that can fetch a task description from the database, based on a parameter passed to it.
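The two variants could look roughly like the following Python sketch. The class names, the `render` method, and the `tasks` table with its columns are assumptions made for illustration; SQLite is used only because it ships with Python, not because the work-bench is known to use it.

```python
import sqlite3


class StaticTask:
    """Displays a fixed text that would be set in the configuration."""

    def __init__(self, text):
        self.text = text

    def render(self, task_id=None):
        return self.text


class DatabaseTask:
    """Looks the task description up by the parameter passed to it."""

    def __init__(self, connection):
        self.connection = connection

    def render(self, task_id):
        # Hypothetical schema: tasks(id INTEGER PRIMARY KEY, description TEXT).
        row = self.connection.execute(
            "SELECT description FROM tasks WHERE id = ?", (task_id,)
        ).fetchone()
        return row[0] if row else ""
```

Both variants expose the same `render` call here, so the rest of the interface would not need to know which one is configured.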

4. APPLICATION

The evaluation work-bench has so far been used to build two IIR experiments, very different in their nature, clearly demonstrating the work-bench's flexibility.

The first experiment (fig. 4) re-uses the standard Task, Search Box, Pagination, and Saved Documents components, and extends the Standard Results List to work with the specific search backend. This set-up re-creates what is essentially a relatively standard search UI configuration, that is being used to investigate query session behaviour.
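How a configuration entry of the kind shown in figure 3 might be read when assembling such a set-up can be sketched with Python's standard `configparser` module. The section text is the one from figure 3; the function name, the returned dictionary layout, and the assumption that several connections would be separated by whitespace are illustrative only and not taken from the work-bench.

```python
import configparser

FIGURE_3_CONFIG = """
[SearchResults]
handler = application.components.SearchResults
name = search_results
layout = grid-9 vgrid-expand
connect = search_box:query
"""


def load_components(text):
    """Parse component sections into plain dictionaries keyed by component name."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    components = {}
    for section in parser.sections():
        options = parser[section]
        connections = []
        # Assumption: "connect" holds whitespace-separated source:message pairs.
        for item in options.get("connect", fallback="").split():
            source, message = item.split(":", 1)
            connections.append((source, message))
        components[options["name"]] = {
            "handler": options["handler"],
            "layout": options.get("layout", fallback="").split(),
            "connections": connections,
        }
    return components
```

Each `(source, message)` pair states which component's message the configured component listens for, which is the information a message bus would need to route messages.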

The second experiment (fig. 5) demonstrates a much richer interface, with more modifications to the components and an experiment-specific component. It re-uses the Task and Category Browsing components, extends the default Search Box, Pagination, Standard Results List, and Saved Documents components, and adds a new Item View component. The message-passing nature of the system made it possible to quickly integrate the new component, so that when the participant clicks on a meta-data facet in the Item View, a query message is sent to the Standard Results List to find items with the same bit of meta-data. The interface was used to investigate un-directed exploration behaviour in a large digital cultural heritage collection.
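The reason a new component is cheap to integrate under message passing can be shown with a small Python sketch. All class, method, and component names are hypothetical, and the payload format is invented; the sketch only mirrors the behaviour described in the text, where listeners update their state, mark themselves as changed, and are then refreshed.

```python
class ComponentSketch:
    """Hypothetical component: records the last payload per message."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.changed = False

    def receive(self, message, payload):
        self.state[message] = payload
        self.changed = True  # lets the frontend know the UI needs refreshing


class MessageBusSketch:
    """Routes (source, message) pairs to the components configured to listen."""

    def __init__(self):
        self._listeners = {}

    def connect(self, listener, source, message):
        self._listeners.setdefault((source, message), []).append(listener)

    def send(self, source, message, payload):
        delivered = []
        for listener in self._listeners.get((source, message), []):
            listener.receive(message, payload)
            delivered.append(listener.name)
        return delivered


bus = MessageBusSketch()
results = ComponentSketch("search_results")
saved = ComponentSketch("saved_documents")
bus.connect(results, "search_box", "query")
# Plugging in the new component is one extra connection; nothing else changes.
bus.connect(results, "item_view", "query")
bus.send("item_view", "query", "creator:unknown")
changed = [c.name for c in (results, saved) if c.changed]
```

In this sketch the results list does not need to know that the Item View exists; it only reacts to the query message it has been connected to.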

5. WHERE TO GO NEXT?

The stated aim of this paper was to present a novel, pluggable, extensible, and configurable IIR interface work-bench that supports our wider aim of improving IIR experiment comparability. The work-bench is sufficiently flexible to support the wide range of web-based IIR experiments that are undertaken, while being sufficiently simple and light-weight to encourage its wide-spread use.

To enable this wide-spread use, the system has been released under an open-source license⁶. We are also moving to engage with the wider research community to determine to what degree the work-bench satisfies their needs for an evaluation system and what needs to be done to achieve the wide-spread use needed to improve IIR experiment comparability.

6. ACKNOWLEDGEMENTS

The research leading to these results was supported by the Network of Excellence co-funded by the 7th Framework Program of the European Commission, grant agreement no. 258191.

⁶ https://bitbucket.org/mhall/pyire

[1] Bierig, Cole, Gwizdka, N. J. Belkin, J. Liu, C. Liu, Zhang, and X. Zhang. An experiment and analysis system framework for the evaluation of contextual relationships. In CIRSE 2010, page 5, 2010.

[2] Chua. A user interface guide for web search systems. In Proceedings of the 24th Australian Computer-Human Interaction Conference, OzCHI '12, pages 76–84, New York, NY, USA, 2012. ACM.

[3] M. M. Hall and E. G. Toms. Building a common framework for IIR evaluation. In Information Access Evaluation meets Multilinguality, Multimodality, and Visualization. 4th International Conference of the CLEF Initiative - CLEF 2013, 2013.

[4] Renaud and Azzopardi. Scamp: a tool for conducting interactive information retrieval experiments. In Proceedings of the 4th Information Interaction in Context Symposium, pages 286–289. ACM, 2012.

[5] Tague-Sutcliffe. The pragmatics of information retrieval experimentation, revisited. Information Processing & Management, 28(4):467–490, 1992.

[6] E. G. Toms, Freund, and Li. Wiire: the web interactive information retrieval experimentation system prototype. Information Processing & Management, 40(4):655–675, 2004.

[7] E. G. Toms, H. O'Brien, Mackenzie, Jordan, Freund, Toze, E. Dawe, and Macnutt. Task effects on interactive search: The query factor. In Focused access to XML documents, pages 359–372. Springer, 2008.

[8] M. L. Wilson. Search User Interface Design, volume 20. Morgan & Claypool Publishers, 2011.