A User-Centered Experiment and Logging Framework for Interactive Information Retrieval ∗ †

Ralf Bierig, SC&I Rutgers University, 4 Huntington St., New Brunswick, NJ 08901, USA, bierig@rci.rutgers.edu
Jacek Gwizdka, SC&I Rutgers University, 4 Huntington St., New Brunswick, NJ 08901, USA, jgwizdka@scils.rutgers.edu
Michael Cole, SC&I Rutgers University, 4 Huntington St., New Brunswick, NJ 08901, USA, mcole@scils.rutgers.edu

∗ Copyright is held by the author/owner(s). SIGIR'09, July 19-23, 2009, Boston, USA.
† This work is supported, in part, by the Institute of Museum and Library Services (IMLS grant LG-06-07-0105-07).

ABSTRACT
This paper describes an experiment system framework that enables researchers to design and conduct task-based experiments for Interactive Information Retrieval (IIR). The primary focus is on multidimensional logging to obtain rich behavioral data from participants. We summarize initial experiences and highlight the benefits of multidimensional data logging within the system framework.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous

Keywords
User logging, Interactive Information Retrieval, Evaluation

1. INTRODUCTION
Over the last two decades, Interactive Information Retrieval (IIR) has established a new direction within the tradition of IR. Evaluation in traditional IR is often performed in laboratory settings where controlled collections and queries are evaluated against static information needs. IIR places the user at the center of a more naturalistic search environment. Belkin and colleagues [3, 2] suggested the concept of an information seeking episode composed of a sequence of a person's interactions with information objects, determined by a specific goal, conditioned by an initial task, the general context and the more specific situation in which the episode takes place, and the application of a particular information seeking strategy.

This poses new challenges for the evaluation of information retrieval systems. An enriched set of possible user behaviors needs to be addressed and included as part of the evaluation process. Systems need to address information about the entire interactive process with which users accomplish a task. This problem has so far only been initially explored [4].

This paper describes an experiment system framework that enables researchers to design and conduct task-based IIR experiments. The paper is focused on the logging features of the system designed to obtain rich behavioral data from participants. The following section describes the overall architecture of the system. Section 3 provides more details about its specific logging features. Section 4 summarizes initial experiences with multidimensional data logging within the system framework based on initial data analysis from three user studies. Future work is proposed in section 5.

2. THE POODLE IIR EXPERIMENT SYSTEM FRAMEWORK
The PooDLE IIR Experiment System Framework is part of an ongoing research project. The goal of PooDLE¹ is to investigate ways to improve information seeking in digital libraries; the analysis concentrates on an array of interacting factors involved in such online search activities. The overall aim of the framework is to reduce the complexity of designing and conducting IIR experiments using multidimensional logging of users' interactive search behavior. Such experiments usually require a complex arrangement of system components (e.g. GUI, user management and persistent data storage) including logging facilities that monitor implicit user behavior. Our framework enables researchers to focus on the design of the experiment, including questionnaire and task design and the selection of appropriate logging tools. This can help to reduce the overall time and effort that is needed to design and conduct experiments that support the needs for IIR.
As shown in figure 1, the experiment system framework consists of two sides: a server that operates in an Apache webserver environment and a client that resides on the machine where the experiment is conducted. We distinguish the following components:

• Login and Authentication manages participants, allows them to authenticate with the system, and enables the system to direct individuals to particular experiment setups; multiple experiments may exist and users can be registered for multiple or multi-part experiments at any time.

• The Graphical UI allows participants to authenticate with the framework and activate their experiment. Each experiment consists of a number of rotated tasks that are provided with a generic menu that presents the predefined task order to the user. After every completed task, the UI guides the participant back to the menu that now highlights the completed tasks. This allows participants to navigate between tasks and gain feedback that helps them to track their progress. In addition, the interface presents participants with additional information, instructions and warnings when progressing through the tasks of an experiment.

• The Experimenter controls and coordinates the core components of the system. These are:

– An Extensible Task Framework that provides a range of standard tasks for IIR experiments that are part of the framework (e.g. questionnaires for acquiring background information and general feedback from participants, search tasks with a bookmarking feature and an evaluation procedure, and cognitive tasks to obtain information about individual differences between participants). Tasks are easily added to this basic collection and can be reused as part of the framework in different experiments.

– The Task Progress and Control Management provides participants with (rotated) task sequences, monitors their state within the experiment, and allows them to continue interrupted experiments at a later point in time.

– The Interaction Logger allows tasks to register and trigger logging messages at strategic points within the task. The system automatically logs the beginning and end of each task at task boundaries.

– Remote Logging Application Invocation calls logging applications that reside on the client. This allows for rich client-side logging of low level user behavior obtained from specific hardware (e.g. mouse movements or eye-tracking information).

• The Database interface manages all access to one or more databases that store users' interaction logs as well as the basic experiment design for other system components (e.g. participants, tasks and experiment blocks in the form of task rotations for individual users).

Figure 1: System components of the PooDLE IIR Experiment System Framework. Logging features highlighted in grey.

¹ http://www.scils.rutgers.edu/imls/poodle/index.html

3. USER INTERACTION LOGGING
This section focuses on the logging features of the Experiment System Framework as highlighted in grey in figure 1. The logging features and the arrangement of logging tools within the framework have been informed by the following requirements:

• Hybridity: All logging functionality is divided between a more general server architecture and a more specific client; this integrates server-based as well as client-based logging features into a hybrid system framework. Whereas the server logs user interactions uniformly across experiments, client logging is targeted to the capabilities of the particular client machine used for the experiment. Researchers can select from a range of logging tools or integrate their own tools to record user behavior. This enables the system to use low level input devices, normally inaccessible by the server, to be controlled by logging tools residing on the client.

• Flexibility: Client logging tools can be combined through a loosely coupled XML-based configuration that is provided at task granularity. The system framework uses these task configurations to start logging tools on the client when the participant enters a task and stops them when the participant completes a task. This gives researchers the flexibility to compose logging tools as part of the experiment design and attach them to the configuration of the task. Such configurations can later be reused as design templates, which promotes uniformity across experiments and ensures important types of user interaction data are being logged.

• Scalability: Experiments can be configured to use a number of different client machines as part of the data collection. A researcher can, for example, trigger another client computer to record video from a second web camera or simultaneously activate several clients for experiments that involve more than one participant. Redundant instances of the same logging tools can be instantiated to produce multiple data streams to overcome potential measurement errors and instabilities on a data stream due to load or general failure of hardware and software.

The client is configured to work with the following selection of open-source and commercial logging tools that record different behavioral aspects of participants:

• RUIConsole is an adapted command line version of the RUI tools developed at Pennsylvania State University [5]. RUI logs low level mouse movements, mouse clicks, and keystrokes. Our extension additionally provides full control over its logging features through a command line interface to allow for more efficient automated use within our experiment framework.

• UsaProxy is a JavaScript-based HTTP proxy developed at the University of Munich [1] that logs interactive user behavior unobtrusively through injected JavaScript. It monitors page loads as well as resize and focus events. It identifies mouse hover events over page elements, mouse movements, mouse clicks, keystrokes, and scrolling. Our version of UsaProxy is slightly modified as we do not log mouse movements with this tool. UsaProxy can run directly on the client, but can also be activated on a separate computer to balance load.

• The URL Tracker is a command line tool that extracts and logs the user's current web location directly from the Internet Explorer (IE) address bar and makes it available to the system framework. This allows any task to determine participants' current position on the web and to monitor their browsing history within a task.

• Tobii Eyetracker: We use the Tobii T60 eyetracking hardware which is packaged with Tobii Studio², a commercial eyetracking recording and analysis software. The software records eye movements, eye fixations, as well as webpage access, mouse events and keystrokes.

• Morae is a commercial software package for usability testing and user experience developed and distributed by TechSmith³. It records participants' webcam and computer screen as video, captures audio, and logs screen text, mouse clicks and keystrokes occurring within Internet Explorer.

² http://www.tobii.com
³ http://www.techsmith.com

This extensible list of logging tools is loosely coupled to the Interaction Logger and Remote Logging Application Invocation components of the framework through task configurations for individual tasks. The task configuration describes which logging tools are used during a task, and the software framework activates them as soon as participants enter a task and deactivates them as soon as they complete a task.

The researcher can create a selection of relevant tools for each task of a particular IIR experiment from the available logging tools supported by the system framework. First, the researcher selects all user behaviors of interest. Second, the observable data types that provide evidence for the existence and the structure of these user behaviors are identified. Finally, these data types are linked with relevant logging tools.

In the next section we summarize experiences from three distinct experiments that were designed and performed with our experiment system framework. We do not describe these experiments in this paper. Instead, we focus on key points and issues that should be addressed when collecting multidimensional logging data from hybrid logging tools.

4. EXPERIENCES FROM MULTIDIMENSIONAL DATA LOGGING
Data logging with an array of hybrid tools, as described in the previous section, has a number of benefits and challenges. This section summarizes our initial experiences from conducting three IIR user experiments with the system framework and some initial processing and integration of its data logs.

• Accuracy and Reliability: Using data streams from multiple logging tools limits the risk that measurement errors enter data analysis. This is especially relevant to IIR due to its need to conduct experiments in naturalistic settings where people perform tasks in conditions that are not fully controlled and therefore less predictable. Such settings allow participants to solve tasks with great degrees of freedom. As a result, user actions in such settings tend to be highly variable. Measurement errors or missing data, for example caused by varying system performance and network latencies, have a larger impact because the entire interaction is studied. Multiple data streams from different sources improve the overall accuracy of recorded sessions and increase the reliability of detecting features in individual logs. Furthermore, the use of multiple data logs limits the chance that artifacts created by individual logging tools and their assumptions will affect downstream analysis.

• Disambiguation: The use of multiple data logs makes it possible to contextualize each log with the logs produced by other tools and to disambiguate uncertainties in the interpretation of logging event sequences. We found that the most common cases are timestamp disambiguation and reconciling the differing levels of event detail.

– Timestamp disambiguation: The timestamp granularity of recorded events usually varies between logging tools. For example, Tobii Studio records eye tracking data with a constant frequency determined by the eye tracking hardware (e.g. 60 samples per second, roughly one every 17 ms, for the T60 model) whereas UsaProxy records events only every full second and RUIConsole records events dynamically only when they occur. The combination of logging data from different tools helps to better determine the real timing of events by providing different viewpoints for the same sequence of actions a user has performed. Low granularity timestamps might collapse a number of user events to a single point of time and, based on that, change the natural order in which these events are recorded. Alternative secondary logging data can help to detect such event sequences and to disambiguate and correct them.

– Detail of event structure: Every logging tool imposes a number of assumptions on the data produced by a user: which events to log, which events to differentiate and how to label them. Two logging tools recording the same events can therefore produce different event structures with varying detail. For example, RUIConsole differentiates a mouse click into a press and a release event whereas Tobii Studio considers a mouse click as a single event. Different logging tools recording the same user actions produce events with a structure of different detail that can be used to contextualise conflicting recordings of user actions.

• Scalability: Concurrent use of logging tools may create performance issues on the client machine, especially with tools that produce large amounts of data. In particular, the combined use of Morae and Tobii Studio can be demanding when using high quality web camera and screen capture recording. Limited hardware resources may have a direct effect on the recording accuracy of other logging tools. More importantly, however, an overloaded client may have an effect on participants and their ability to accomplish tasks realistically. This can be avoided by choosing a sufficiently equipped client machine and a fast network. As mentioned in section 3, the software framework supports the distribution of logging tools over several machines, while these tools are activated centrally by the server architecture, which can help to better balance the load.

• Stability: Concurrent use of multiple logging applications can destabilize the client computer. Individual applications can affect each other, especially when logging from the same resources (e.g. from the same instance of Internet Explorer). Currently, our system framework does not monitor running logging tools and there is no mechanism to recover tools that hang or break during a task. This is a feature we will incorporate into a future version of the system framework.

5. FUTURE WORK
Future work on the experiment system framework will focus on further improvement of logging tool integration and monitoring. We are currently developing a graphical user interface for researchers to more easily design IIR experiments with the system and monitor progress of running experiments and the accuracy of its data logs. An extension to the experiment system framework presented in this paper is a data analysis system that allows us to fully integrate, analyse and develop models from the recorded data. In particular, we are interested in creating higher level constructs from integrated low-level logging data that can be used to personalise interactive search for users. The experiment system framework will be released as open source to the wider research community.

6. REFERENCES
[1] R. Atterer, M. Wnuk, and A. Schmidt. Knowing the User's Every Move - User Activity Tracking for Website Usability Evaluation and Implicit Interaction. In 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, 2006.
[2] N. Belkin. Intelligent Information Retrieval: Whose Intelligence? In Fifth International Symposium for Information Science (ISI), pages 25–31, Konstanz, Germany, 1996. Universitätsverlag Konstanz.
[3] N. Belkin, C. Cool, A. Stein, and U. Thiel. Cases, Scripts, and Information-Seeking Strategies: On the Design of Interactive Information Retrieval Systems. Expert Systems with Applications, 9(3):379–395, 1995.
[4] A. Edmonds, K. Hawkey, M. Kellar, and D. Turnbull. Workshop on logging traces of web activity: The mechanics of data collection. In 15th International World Wide Web Conference (WWW 2006), Edinburgh, Scotland, 2006.
[5] U. Kukreja, W. E. Stevenson, and F. E. Ritter. RUI: Recording User Input from interfaces under Windows and Mac OS X. Behavior Research Methods, 38(4):656–659, 2006.
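
As an illustration of the timestamp disambiguation discussed in section 4, the following is a minimal sketch and not the implementation used in the framework. It assumes a coarse log whose timestamps are truncated to full seconds (as described for UsaProxy) and a finer-grained secondary log with millisecond timestamps (as an event-driven tool such as RUIConsole might provide). The tuple layout, the event labels, the function name refine_timestamps and the matching rule (same event kind within the same second) are all hypothetical choices made for this example.

```python
from collections import defaultdict, deque


def refine_timestamps(coarse_events, fine_events):
    """Re-order coarse events using a finer-grained secondary log.

    coarse_events: list of (second, kind) tuples in logged order
                   (hypothetical layout, one-second granularity).
    fine_events:   list of (millis, kind) tuples from a second tool
                   (hypothetical layout, millisecond granularity).
    Returns a list of (millis, kind) tuples sorted by refined time.
    """
    # Queue the fine timestamps per (second, kind) bucket, oldest first.
    buckets = defaultdict(deque)
    for millis, kind in sorted(fine_events):
        buckets[(millis // 1000, kind)].append(millis)

    refined = []
    for second, kind in coarse_events:
        queue = buckets[(second, kind)]
        # Without a matching fine event, fall back to the start of the second.
        millis = queue.popleft() if queue else second * 1000
        refined.append((millis, kind))

    # sorted() is stable, so events that remain tied keep their logged order.
    return sorted(refined, key=lambda event: event[0])


# Two events collapsed into second 12 are logged in the wrong order.
coarse = [(12, "keypress"), (12, "click"), (13, "scroll")]
fine = [(12850, "keypress"), (12140, "click")]
print(refine_timestamps(coarse, fine))
```

In this example the click and the keypress share the same one-second timestamp in the coarse log, and the secondary log restores their actual order. A real integration would also need to handle clock offsets between tools and differing event labels, which this sketch leaves out.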