The Multi-Stage Experience: the Simulated Work Task Approach
            to Studying Information Seeking Stages
Hugo C. Huurdeman, University of Amsterdam, h.c.huurdeman@uva.nl
Jaap Kamps, University of Amsterdam, kamps@uva.nl
Max L. Wilson, University of Nottingham, Max.Wilson@nottingham.ac.uk

ABSTRACT

This experience paper sheds more light on a simulated work task approach to studying information seeking stages. This explicit multistage approach was first utilized in Huurdeman, Wilson, and Kamps [14] to investigate the utility of search user interface (SUI) features at different macro-level stages of complex tasks. We focus on the paper's terminology, research design, methodology and use of previous resources. Finally, based on our experience, we reflect on the potential for re-using our multistage approach and on general barriers to re-use in an Interactive Information Retrieval research context.

[Figure 1: SUI feature categories perceived most useful by stage (from [14]) – a bar chart showing, for Stage 1, 2 and 3, the percentage of participants that found input/informational, control, and personalisable features most useful.]

KEYWORDS

experience paper, information seeking, search stages

Workshop on Barriers to Interactive IR Resources Re-use at the ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2019), 14 March 2019, Glasgow, UK. Copyright © 2019 for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION

In the Interactive Information Retrieval (IIR) community, there is a varied range of terminology, approaches and methods. Bogers et al. [2] assert that it is not straightforward to re-use aspects and materials from previous user studies in IIR research. They list various barriers to reproducibility and re-use, which include the "fragmentary nature" of the organization of resources, the lack of awareness of their existence, insufficient documentation, the research publication cycle, and the inherent effort required for making resources available.

This experience paper sheds more light on the simulated work task approach to studying information seeking stages, which we implemented in Huurdeman, Wilson, and Kamps [14]. To uncover various aspects related to re-use and reproducibility, we specifically focus on the paper's terminology, the experience of designing our user study, the adaptation of previous work and the opportunities for the re-use of our approach.

First, we summarize the original paper in Section 2. Then, we discuss the used terminology (Section 3), followed by the methodology and research design (Section 4). Section 5 discusses in which ways previous work was adapted for use in our paper. Next, we discuss the potential re-use of our approach (Section 6). Section 7 concludes this experience paper with a short reflection.

2 SUMMARY OF MULTI-STAGE STUDY

Research into information seeking behavior has shown substantial changes in user behavior during complex tasks involving learning and construction. Models of information seeking, including Kuhlthau [20]'s Information Search Process model and Vakkari [28]'s adaptation, describe fundamentally different macro-level stages. Current search systems usually do not provide support for these stages, but provide a static set of features predominantly focused on supporting micro-level search interactions. Huurdeman et al. [14] delved deeper into this paradox, and described an experimental user study employing (cognitively complex) multistage simulated work tasks, studying interaction patterns with interface and content during different search stages. In this study, a custom search system named SearchAssist was used, and tasks were designed to take users through pre-focus, focus, and post-focus task stages to gather active, passive, and subjective measures of when SUI features provide most value and support.

To our knowledge, this mixed methods study was the first to use an explicit multistage simulated task design using Vakkari [28]'s pre-focus, focus formulation and post-focus stages. The independent variable was task stage; the dependent variables were active utility (via clicks and queries), passive utility (via mouse and eye tracking fixation counts) and perceived utility (via questionnaires and interviews) of search user interface features.

First, we looked at active behaviour, the behaviour which can be directly and indirectly determined from logged interaction, such as clicks and submitted queries. Our main finding was that some features, such as informational features (providing information about results), are used frequently throughout, while input and control features (for refinement of results) are used less frequently after the first stage. Second, we looked at passive behaviour, i.e. behaviour not typically caught in interaction logs, such as eye fixations and mouse movements. Our main finding was the difference with the active results: evidently, users often look at actively used features, but other features that are less actively used (such as the recent queries feature) are used more in a passive way, suggesting a different type of support offered by these features. Third, we were interested in the subjective opinions of users about the usefulness of features; this data also formed a reference point for interpreting other observed data from the previous research questions.
The paper concluded that the perceived usefulness of features differs radically per search stage, as summarised in Figure 1. First, the most familiar input and informational features (the search box and results list) were perceived as very relevant overall, but declined after the initial stage. Second, a set of assistive control features (search filters, tags and query suggestions), less commonly included in SUIs, were also perceived as most useful in the beginning, but less useful in consecutive stages. Third, personalisable features (query history and a feature to save results) were considered less useful in the beginning, but their usefulness significantly increased over time, even surpassing the value of common SUI features. Hence, the results of our paper suggest that the macro-level process has a large influence on the usefulness of SUI features.

[Figure 2: Simplified protocol for the SearchAssist study – pre-questionnaire; topic assignment; introduction to the system; training task; then, three times: task followed by post-task questionnaire; finally, a post-experiment questionnaire and a debriefing interview.]

3 TERMINOLOGY

As a first step in analyzing Huurdeman et al. [14], we focus on the terminology, why it was used and how it was developed.

Information behavior, seeking and searching

The paper used commonly accepted definitions in the areas of Library and Information Science (LIS) and (Interactive) Information Retrieval (IIR) to refer to information seeking and searching, concepts which were of key importance to the paper. It was framed using Wilson [33]'s definition of information behavior: "the totality of human behavior in relation to sources and channels of information, including both active and passive information seeking, and information use." The paper's main focus was on information seeking and searching, subsets of information behavior in Wilson's nested model of research areas [33]. We used Ingwersen and Järvelin [15, p.21]'s definition of information seeking: "human information behavior dealing with searching or seeking information by means of information sources and (interactive) information retrieval systems." Information searching, in turn, was defined as a subfield of information seeking in Wilson's nested model, and specifically focuses on the interaction between information user and information system [33].

Following Huurdeman and Kamps [13] and Wilson [33], we also distinguished between the macro-level described by information seeking models and the micro-level of specific system and interface features, and looked at ways to bridge the gap between these levels.

Work tasks, search tasks and their complexity

In the paper, we made the distinction between work tasks and search tasks, and also based this on previous literature in the domain of LIS and IIR. We used Ingwersen and Järvelin [15, p.20]'s definition of work task: a "job-related task or non-job associated daily-life task or interest to be fulfilled by cognitive actor(s)". These tasks may be real-life tasks, or in our case, assigned simulated work tasks, for which we used Borlund [3]'s definition and guidance. Work tasks may lead to one or more search tasks, and we used Ingwersen and Järvelin [15, p.20]'s definition: "the task to be carried out by a cognitive seeking actor(s) as a means to obtain information associated with fulfilling a work task".

An important distinction made in the paper is between simple work tasks, which can for instance be solved with a single search query, and complex work tasks. We utilized the definition of Byström and Järvelin [5]: tasks which require "understanding, sense-making, and problem formulation". Complex tasks go beyond simple lookup tasks, and might involve learning and construction, as well as different stages.

Information seeking stages

As a framework we used temporally-based information seeking models, as defined by Beheshti et al. [1]. In particular, we were interested in stages occurring in information seeking, and utilized previous literature related to tasks involving learning and construction. Kuhlthau [20] has described a succession of stages during which feelings, thoughts and actions evolve: Initiation, Topic Selection, Exploration, Focus Formulation, Collection and Presentation. We chose this model as it was highly cited and one of the most empirically tested information seeking models [1]. The model has been further refined and tested in an information retrieval context by Vakkari [28]. He grouped the stages into three wider stages: Pre-focus (Initiation, Topic selection, Exploration), Focus formulation (Focus formulation) and Post-focus (Collection, Presentation). For the design of our study, we chose to use Vakkari's model, since the grouped stages were more feasible to incorporate in our study than the fine-grained stages defined by Kuhlthau.

Search user interfaces

In our paper, our interest was in the utility of potential SUI features. Hence, we needed a way to describe the search user interface and to distinguish the different types of features. As we did previously in Huurdeman and Kamps [13], we made use of a taxonomy proposed by Wilson [31]. This taxonomy distinguishes input features (helping users to express needs), control features (allowing users to restrict or modify input), informational features (providing results or information about them) and personalizable features (which are tailored to a user's experience). We chose this taxonomy because it was focused on Search User Interfaces. Its terminology helped us in framing the study, designing the user interface and discussing the study's outcomes.
Study setup

The study setup was described using common terminology in previous literature (such as [17, 25]), and via terminology from Borlund [3]. We intended to describe as much of the study's setup as possible within the given 10-page space. This included information on the task design and participants, the full task descriptions, the data and the interface. Finally, we briefly described a validation of topic differences and invoked stages. The latter was important to validate the new multistage simulated task approach used in the paper (see Section 4 for more details). An important element was defining the study's protocol, the importance of which has also been underlined by Borlund. Figure 2 depicts a simplified example of the study's protocol.

[Figure 3: Study design – a simulated work task (writing an essay) divided into three subtasks of ~15 minutes each: prepare a list of 3 topics; choose a topic and formulate a specific question; find and select additional pages to cite. The subtasks map onto Kuhlthau's stages (initiation, topic selection, exploration; focus formulation; collecting, presenting) and onto Vakkari's pre-focus, focus formulation and post-focus stages.]

4 METHODOLOGY

Next, we outline the methodology used in the CHIIR 2016 paper [14], and the decisions made in the process of preparing it.

Methodology, methods and research techniques

For describing aspects related to our methodology here, we use part of the division made by Pickard [25]: research methodology (the theoretical perspective of the research), research method (strategy) and data collection instruments (research techniques).

In terms of research methodology, the paper used a mixed methodology, thus combining qualitative and quantitative methodologies. We decided to use mixed methods to be able to capture the inherently multi-layered ('macro-level') aspects of information seeking and the micro-level behavioral patterns.

With respect to research method, we used experimental research, via a lab-based user study. We took this approach (as opposed to e.g. a naturalistic setting) to be able to combine a wide variety of data collection instruments.

The data collection instruments directly used in our analysis, and documented in the paper, were chosen based on our research questions, and on examples from previous literature. These were the following:

• Questionnaires (pre-experiment, post-task, post-experiment)
• Interview (post-experiment)
• Transaction logging (clicks, mouse moves, entered text)
• Eye tracking (fixations, saccades)

Furthermore, we made use of other data collection instruments, which were not directly used in our analysis:

• Observations (the investigator observed the participants' behavior and could view their screen contents on a tablet from a distance)
• Screen recordings (a time-stamped screenshot was made every 250 milliseconds)

The rationale underlying the use of the latter instruments is that they were used as a reference during the analysis process (observation notes), and as a backup in case the transaction logging instruments would fail (screen recordings).

Further specifics regarding the configuration of data collection instruments, and the re-use of previous approaches, can be found in Section 5.

Research Design

At the moment of writing the paper, Kuhlthau's and Vakkari's models had been studied in longitudinal settings (e.g. [18, 19, 29, 30]), for instance during students' processes of writing a term paper. This means the process was monitored at multiple moments along a broader timeframe (for instance using surveys or by monitoring a search session). At the moment of writing, no longitudinal studies of search user interfaces or their specific features using the model of Kuhlthau or Vakkari existed. Some studies had investigated temporal use of SUI features, but used temporal segmentations of singular search sessions to deduce phases in a session (for instance [7, 13, 24]).

On the one hand, longitudinal settings may not have full possibilities for close monitoring and controlling experimental settings, while on the other hand viewing information seeking stages as temporal search segments might not include the same level of learning as longitudinal studies. Therefore, our aim before the study was to find a middle ground, combining aspects of both approaches. As an instantiation of this aim, we set out to study multiple subtasks, representing different stages, within a single simulated work task (see Figure 3).

In our user study, we used a (cognitively complex) multistage simulated work task – the commonly used essay-writing task – which would also be sufficiently familiar to the undergraduate students participating in the study. We studied interaction patterns with interface and content during different search stages, represented by the subtasks. The independent variable was task stage; the dependent variables were active, passive and perceived utility of search user interface features. More specifically, we looked at active behaviour, "the behaviour which can be directly and indirectly determined from logged interaction", passive behaviour, "behaviour not typically caught in interaction logs, such as eye fixations and mouse movements," and perceived usefulness, "the subjective opinions of users about the usefulness of features" [14]. These variables were discussed among the paper authors in advance, and were meant to extend the small-scale data analysis in the paper's predecessor [13].

This approach can be seen as more realistic than the singular search approach, while potentially allowing for more experimental control than a longitudinal setting.
A challenging aspect, however, was to formulate simulated work task situations which were representative of Vakkari's stages, while also providing possibilities for learning about a topic. This formulation took place during several months preceding the actual study, and involved the paper authors as well as further information seeking experts. We discuss the re-use of previous materials within the research design in Section 5.

Borlund [3] underlines the importance of counterbalancing tasks. In this case, we focused on work tasks involving learning. Therefore, the three tasks in the study had to be performed in sequence; the stage order could not be counterbalanced without losing the cumulative learning and understanding gathered in each subsequent stage. We reckoned that this was a worthwhile tradeoff, since the tasks involved learning, and thus were dependent on each other (e.g., a participant needed to explore topics before making a reasoned decision about which topic to choose).
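
For re-use purposes, the essence of this design (cf. Figure 3) is compact enough to be captured in a declarative configuration. The sketch below is purely illustrative; the field names are ours and do not come from the released task materials:

```python
# Stage mapping of the three ~15-minute subtasks (cf. Figure 3).
# Field names are illustrative, not taken from the released materials.
SUBTASKS = [
    {"order": 1, "goal": "prepare a list of 3 possible essay topics",
     "kuhlthau": ["initiation", "topic selection", "exploration"],
     "vakkari": "pre-focus", "minutes": 15},
    {"order": 2, "goal": "choose one topic and formulate a specific question",
     "kuhlthau": ["focus formulation"],
     "vakkari": "focus formulation", "minutes": 15},
    {"order": 3, "goal": "find and select additional pages to cite",
     "kuhlthau": ["collecting", "presenting"],
     "vakkari": "post-focus", "minutes": 15},
]
```
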
Task and Stage Validation

We also validated the multistage approach, in terms of both tasks and stages, within the process. We examined the validity of our task descriptions in terms of invoking the correct stages.

In post-stage questionnaires, users selected the activities they had conducted from a randomized list1 derived from Kuhlthau's model [20]. From the results of this validation, we concluded that even though changes between stages are sometimes gradual, our experiment correctly invoked the main activities in each stage (for instance, 'exploring' in the first subtask, 'focusing' in the second/third subtask, and 'collecting' in the third subtask). The fact that the first task was seen as explorative was also reflected in the type of information sought, reported in the questionnaire as evolving from 'general' (in the questionnaire after stage 1) to 'specific' (after stages 2 and 3).

Further parts of the stage validation were consistent with these results, but could not be included in the CHIIR paper due to a lack of space. However, they were included in an extended version in Huurdeman [12]. This included an assessment of the feelings of participants during the experiment, to monitor the concordance with the stages described in Vakkari [28]. To this end, we used a word list from previous user studies by Kuhlthau [20] and Todd [27]. Participants had to choose from a list of ten words (in random order) which could represent their state of mind near the end of each task phase2. Some fluctuations in reported feelings could be detected, showing some evidence for Kuhlthau's findings on gradually reduced uncertainty and rising optimism.

Participant recruitment

We aimed to recruit undergraduate students in the Computer Science department of the University of Nottingham (UK campus), since this is where the lab study took place, and since we could customize the tasks to be relevant to this particular audience. We used a multifaceted approach to cast a wide net: we announced the study via posters in the School of Computer Science, via the institution's Facebook page, via an email list, and via the website callforparticipants.com. Since direct payment was not possible, participants received a 10 GBP Amazon voucher. This amount was based on the available budget and previous studies in the department.3 To add additional incentive for our tasks involving learning, we awarded 25 GBP for the best task outcome.

5 RE-USE OF PREVIOUS RESOURCES

This section describes the resources from previous literature which we re-used in our study, as well as resources related to the system, data and user interface.

Research design

As mentioned in Section 4, no multistage simulated work task design existed at the time which used Vakkari's stages, but when designing the tasks, we did incorporate elements from previous work, albeit often in adapted form. First of all, Kuhlthau's book [20] and Vakkari's work (e.g. [28]) were an inspiration. Further resources had been found in our previous literature survey on information seeking stages [13], and additional examples were sought during the preparation of this paper, both via online literature search systems and via the RepAST repository of assigned search tasks4. Another source of information for the subtasks was existing research process and information literacy models, including Kumar [22, p.51-53] and various online resources5. These models include the idea of formulating broad topics, selecting a specific topic and formulating questions related to the topic.

For the textual contents of the tasks, we used elements of work tasks from Kules and Shneiderman [21] and Liu and Belkin [23]. For instance, to inspire participants we indicated that ideas for topics should "cover many aspects of the topic" and that "unusual or provocative ideas are good", part of the task description in Kules and Shneiderman [21]. Furthermore, we took inspiration from Liu and Belkin [23], which used an "approach using different subtasks accomplished in different search sessions at different times." In our case, however, subtasks were performed within different search sessions in a single user study – with small breaks in between, in which participants switched from focusing on the screen to filling out a paper-based questionnaire.

Questionnaire design and scales

For the design of our pre-experiment, post-task and post-experiment questionnaires, we combined different sources. A number of questions were based on those used in the related user studies described in Diriye [8]. Some questions were used directly (e.g. referring to task and topic understanding and interest, and to the used interface), and other questions were added or reformulated based on our research questions and particular multi-stage setup. Furthermore, we directly used Kuhlthau [20] and Todd [27] for various validation questions within the process, as described in Section 4.
As a basis for scales and possible answers, we used the approaches in the previously mentioned literature, including open questions and 7-point Likert scales – for the latter setup, we used guidance from Pickard [25].

Difficulties in this phase were related to finding, adapting and formulating questions suitable for the multistage search setting in our project, as well as to the used models from Kuhlthau and Vakkari. Moreover, many previous papers did not document their questionnaire contents. An alternative would have been to use standard questionnaires instead, though the inspected examples were deemed not specific enough at the time of our study, and the duration to fill out some of the longer standardized questionnaires would limit the possibilities within the originally planned 60 minute timeframe of the study.

Due to space constraints and a lack of time to provide further documentation, the questionnaires were not included in the CHIIR paper itself, but were only described – thus, they could not directly be re-used at the time6.

1 Specifically: exploring, focusing, formulating, collecting, gathering, becoming informed, choosing, and getting an overview.
2 In particular: confident, disappointed, relieved, frustrated, sure, confused, optimistic, uncertain, satisfied, doubtful.
3 As the findings of a replication study by Wilson [32] indicate, remuneration might have an effect on the motivation of participants. In the CHIIR study, we tried to optimize motivation by ensuring that the participants selected a topic of their liking (using elicitation questions in a pre-questionnaire), and by asking if participants wanted to be eligible for a prize for the best topic.
4 https://ils.unc.edu/searchtasks/
5 e.g., http://ischoolapps.sjsu.edu/static/courses/250.loertscher/modelstrip.html
6 At this point, though, they can be accessed via https://github.com/timelessfuture/searchassist/tree/master/chiir-study-materials

[Figure 4: Screenshot of SearchAssist. Left column (1, 2, 3): control features. Middle (4): input and informational features. Right column (5, 6): personalisable features. (7): task bar.]

Experimental System

The study used custom-built components for the SearchAssist system (depicted in Figure 4). Components were created using Javascript and PHP, and the libraries JQuery and JQuery-UI were utilized. For logging system events, we used a custom MySQL component and Log4Javascript. This way, user actions were both logged to a database and stored in raw text files (for redundancy). Furthermore, we exported browser history using the "Export History" Chrome browser extension. For mouse logging, we used a Javascript based method available online7.
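
To make the logging scheme concrete: every interaction event was written twice, to a database and to a raw text file, so that a failure of one sink would not lose data. The original components were written in Javascript and PHP; the following is a minimal illustrative sketch of the same dual-write idea in Python (the function, file names and event fields are ours, not taken from the released code):

```python
import json
import sqlite3
import time
from pathlib import Path

DB = sqlite3.connect("searchassist_log.db")
DB.execute("""CREATE TABLE IF NOT EXISTS events
              (ts REAL, participant TEXT, stage INTEGER,
               event TEXT, target TEXT, detail TEXT)""")
RAW = Path("searchassist_log.txt")

FIELDS = ("ts", "participant", "stage", "event", "target", "detail")

def log_event(participant, stage, event, target, detail=""):
    """Write one interaction event to the database AND to a raw
    text file, so a failure of either sink cannot lose the data."""
    row = (time.time(), participant, stage, event, target, detail)
    DB.execute("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", row)
    DB.commit()
    # Redundant plain-text copy: one JSON object per line.
    with RAW.open("a") as f:
        f.write(json.dumps(dict(zip(FIELDS, row))) + "\n")

# e.g. log_event("p01", 1, "click", "query_suggestion", "solar energy")
```
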
We decided to create the system from scratch, although we re-used previous frameworks and components when possible. Existing systems for IIR experiments, such as PyIRE8, were considered, but not used for several reasons. Since we made use of paper-based questionnaires, and since we used one interface for the three stages, we did not need a system for handling the experimental flow. Also, we had limited time to set up the needed system, adapt it to our needs (e.g. for using eye tracking and including the SUI features we wished to evaluate), and obtain the data that would be necessary to populate a search system representing a general-purpose web search engine, as discussed next.

Data

For the data underlying the search system, different options were considered: for instance, creating a search engine using the ClueWeb9 or Amazon / LibraryThing dataset10. However, in the end we decided to use the Bing Web Search API for the search results. This was chosen because it would a) ease the creation of a suitable search interface, b) provide realistic and recent search results to participants, and c) allow participants to open the full web resources listed in the search results. To avoid different users seeing different results within the timeframe of the study (approximately two weeks), we cached search results for each query locally, meaning a second user would see the same results if the same query was entered11. For spelling corrections to queries we utilized the Bing Spelling Suggestions API. For documentation on the use of these APIs, we used Microsoft's "Bing Search API Quick Start and Code Samples" document.
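
The caching logic itself is simple: results are keyed by the (normalized) query, and the search API is only called on a cache miss. A minimal sketch, assuming a JSON-serializable result list and an arbitrary fetch function (the actual system implemented this in PHP; the names here are illustrative):

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("query_cache")
CACHE_DIR.mkdir(exist_ok=True)

def cached_search(query, fetch_results):
    """Return cached results for a query if any participant issued it
    before; otherwise call the search API once and store the response,
    so all participants see identical result lists for a query."""
    key = hashlib.sha1(query.strip().lower().encode("utf-8")).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    results = fetch_results(query)  # e.g. a call to the search API
    cache_file.write_text(json.dumps(results))
    return results
```
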
6 At this point, though, they can be accessed via https://github.com/timelessfuture/              teraction design experts) at the department. Includes possibilities
searchassist/tree/master/chiir-study-materials                                                    for adding categories and drag ’n drop reordering of saved items /
7 Available at: https://stackoverflow.com/questions/7790725/javascript-track-mouse-               categories, as well as deletion of items / categories.
position/34348306
8 https://pyiire.readthedocs.io/en/latest/                                                11 These cached results were later securely stored in conjunction with the experimental
9 http://lemurproject.org/clueweb12/
                                                                                          data, for future reference and analysis.
10 As in e.g. the INEX/CLEF Interactive Social Book Search Track [9]: http://inex.mmci.   12 Now offline, archived version at: http://web.archive.org/web/20141102025545/http:
uni-saarland.de/data/documentcollection.html                                              //www.dmoz.org/docs/en/rdf.html
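
As an illustration of the category filter matching in item (1), a minimal sketch is shown below; it assumes the DMOZ dump has already been converted offline into a hostname-to-category mapping, and all names are ours:

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumed to be built offline from the converted DMOZ RDF dump,
# e.g. {"en.wikipedia.org": "Reference", "bbc.co.uk": "News"}.
HOST_TO_CATEGORY = {}

def category_filters(results):
    """Group search results by the top-level DMOZ category of their
    hostname; the keys become clickable filter entries in the SUI."""
    filters = defaultdict(list)
    for result in results:
        host = urlparse(result["url"]).netloc.lower()
        category = HOST_TO_CATEGORY.get(host)
        if category:
            filters[category].append(result)
    return filters
```
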
A link to the source code of the used experimental system13 was included in the final CHIIR 2016 paper. This consisted of the search interface, task configurations and all used back-end components (including custom usage logging), along with brief documentation. Although tailored to our experiment, the different elements of this system could be re-used for future studies, keeping in mind the crucial aspect of maintenance (further discussed in Section 7)14.

7 Available at: https://stackoverflow.com/questions/7790725/javascript-track-mouse-position/34348306
8 https://pyiire.readthedocs.io/en/latest/
9 http://lemurproject.org/clueweb12/
10 As in e.g. the INEX/CLEF Interactive Social Book Search Track [9]: http://inex.mmci.uni-saarland.de/data/documentcollection.html
11 These cached results were later securely stored in conjunction with the experimental data, for future reference and analysis.
12 Now offline, archived version at: http://web.archive.org/web/20141102025545/http://www.dmoz.org/docs/en/rdf.html
Eye tracking and eye tracking analysis

With respect to eye tracking, we made use of the approach employed by Jiang et al. [16]. This pragmatic approach involved showing up to 8 results at a time in the search interface, instead of the more regular 10. This allowed for easier analysis of fixations on certain parts of the search user interface (since there was no scrolling within the search screen itself). The use of a relatively large screen with sufficient screen resolution allowed for the display of all features.

For our paper, we looked at common eye tracking metrics (see e.g. Poole and Ball [26]), and chose to analyze fixation counts and fixation durations. To distinguish fixations, we used Buscher et al. [4]'s strategy, which defined a fixation as a sequence of eye tracking measures within a 25 pixel radius, spanning a timeframe of at least 80ms. Since both fixation count and duration measures had similar results, we focused on reporting only fixation counts in the paper, due to stringent space limits for the CHIIR paper15.

For transparency and flexibility, we decided to use an open-source Python framework to perform the eye tracking and the subsequent analysis. For the eye tracking, we utilized the PyGaze framework [6], as well as the PyTribe toolbox – a wrapper for the used EyeTribe eye tracker16. Using the PyGaze software, it was then possible to generate heatmaps and other eye tracking visualizations, but also to analyze fixations using our own metrics.
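
As an illustration of Buscher et al. [4]'s criterion, a dispersion-based fixation detector can be sketched in a few lines. This is a simplified sketch (our actual analysis used PyGaze), assuming gaze samples as (timestamp in milliseconds, x, y) tuples ordered by time:

```python
import math

def detect_fixations(samples, radius=25.0, min_duration=80.0):
    """Group consecutive gaze samples into fixations: a fixation is a
    run of samples staying within `radius` pixels of the run's centroid
    and lasting at least `min_duration` ms. Returns a list of
    (start_ms, end_ms, x, y) tuples; fixation count = len(result)."""
    fixations, run = [], []
    for t, x, y in samples:
        if run:
            cx = sum(p[1] for p in run) / len(run)
            cy = sum(p[2] for p in run) / len(run)
            if math.hypot(x - cx, y - cy) > radius:
                # Gaze left the 25 px region: close the current run.
                if run[-1][0] - run[0][0] >= min_duration:
                    fixations.append((run[0][0], run[-1][0], cx, cy))
                run = []
        run.append((t, x, y))
    if run and run[-1][0] - run[0][0] >= min_duration:
        cx = sum(p[1] for p in run) / len(run)
        cy = sum(p[2] for p in run) / len(run)
        fixations.append((run[0][0], run[-1][0], cx, cy))
    return fixations
```
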
6 POTENTIAL FOR RE-USE OF OUR APPROACH

The simulated work task approach to studying information seeking stages, as applied in our paper, has re-use potential. This is reflected by the fact that the approach has been re-used in two papers so far [10, 11].

First, Hoeber et al. [11] explicitly state that they drew inspiration from Huurdeman et al. [14] for the organization of their study, in which they evaluated an interactive search interface entitled "Lensing Wikipedia". The paper utilizes a research design similar to ours, also using Vakkari [28]'s stages and Wilson [31]'s taxonomy of interface features. The essay-writing task and its descriptions are adapted to the domain described in the paper (a history course), but are otherwise similar to those in our paper. Instead of three research ideas, users selected three persons in the pre-focus stage, followed by the selection of one person and further investigation of this person (focus formulation) and the collection of materials (post-focus). Their research questions had different focal points, looking at feature use (active utility in our paper), knowledge gain (captured in our study, but not used in our paper), perceived usefulness (included in our study) and overall perceived usefulness and satisfaction (captured in our study, but not reported due to space limitations). Hoeber et al. [11]'s research outcomes in terms of feature use across stages confirm our findings.

Second, Gaikwad and Hoeber [10] used Vakkari [28]'s model as "a design guide, and as a mechanism for controlling the laboratory-based evaluation." This paper uses a multistage task design, but focuses on interactive image retrieval. Therefore, participants explored (pre-focus), selected (focus formulation) and organized images (post-focus), and this is reflected in the task descriptions, which focus on holiday plans, food blogging and self-selected tasks.

7 DISCUSSION AND CONCLUSION

This experience paper has reflected on the various aspects related to creating a simulated work task approach to studying information seeking stages. This approach was first applied in the context of Huurdeman, Wilson, and Kamps [14]. Most terminology from this paper originated from previous literature, and was adapted for use in our paper. Our methodology extended previous approaches in combining a variety of data collection instruments, as well as in taking a new approach to designing multistage studies. We also discussed the re-use of previous resources, including encountered difficulties. Finally, the subsequent use of the multistage approach [10, 11] has shown that re-use of research design and tasks is a feasible prospect.

With respect to the barriers to the re-use of materials, we can observe the issue of insufficient space to document all aspects of our study – for instance, further documentation on decisions within the process. Moreover, there is the typical lack of time within the research process and publication cycle, which meant that we could release the source code for the used tools in time for publication, but not the analysis scripts or other resources. A restrictive consent form also meant that no actual data from the study (e.g. interaction data) could be released, even anonymously. On a broader scale, we encountered the tension between flexibility in terms of research questions and the possibility to re-use standardized systems and approaches, leading us to create a custom system. There is also the issue of maintenance: just four years after our study, the components of the system have changed (e.g. Bing API configurations). Related is the issue of persistence: various URLs of used resources are now only available in the Internet Archive17.

We would fully support the creation of more standardized approaches to documentation and more centralized places to deposit the wide variety of resources related to a user study, as discussed in Bogers et al. [2]. In this light, it is also very positive to observe that conferences such as CHIIR now allow additional space for references and appendices, making it possible to extend publications with pivotal documentation about the process.

13 Available from: https://github.com/timelessfuture/searchassist
14 At this point, in 2019, re-use would imply system adaptations to reflect for instance changed search API details and updated hostname lists for the category filters.
15 For a planned journal extension of the paper, we intend to include both fixation metrics.
16 PyGaze is available from: https://github.com/esdalmaijer/PyGaze, and PyTribe from: https://github.com/esdalmaijer/PyTribe
17 We took a proactive approach, however, and archived for instance all webpages opened by participants at the time, using wget.
REFERENCES

[1] Jamshid Beheshti, Charles Cole, Dhary Abuhimed, and Isabelle Lamoureux. 2014. Tracking middle school students' information behavior via Kuhlthau's ISP Model: Temporality. J Am Soc Inf Sci Tec (2014). https://doi.org/10.1002/asi.23230
[2] Toine Bogers, Maria Gäde, Mark Michael Hall, Luanne Freund, Marijn Koolen, Vivien Petras, and Mette Skov. 2018. Report on the Workshop on Barriers to Interactive IR Resources Re-use (BIIRRR 2018). ACM SIGIR Forum 52, 1 (2018), 10. https://doi.org/10.1145/3274784.3274795
[3] Pia Borlund. 2003. The IIR evaluation model: a framework for evaluation of interactive information retrieval systems. Inf Res 8, 3 (2003). http://www.informationr.net/ir/8-3/paper152.html
[4] Georg Buscher, Andreas Dengel, and Ludger van Elst. 2008. Eye Movements As Implicit Relevance Feedback. In CHI '08 Extended Abstracts on Human Factors in Computing Systems (CHI EA '08). ACM, New York, NY, USA, 2991–2996. https://doi.org/10.1145/1358628.1358796
[5] K. Byström and Kalervo Järvelin. 1995. Task Complexity Affects Information Seeking and Use. Inform Process Manag 31, 2 (1995), 191–213. https://doi.org/10.1016/0306-4573(95)80035-R
[6] Edwin S. Dalmaijer, Sebastiaan Mathôt, and Stefan Van der Stigchel. 2013. PyGaze: An open-source, cross-platform toolbox for minimal-effort programming of eyetracking experiments. Behav Res Meth 46, 4 (2013), 913–921. https://doi.org/10.3758/s13428-013-0422-2
[7] Abdigani Diriye, Ann Blandford, and Anastasios Tombros. 2010. When is System Support Effective?. In Proceedings of the Third Symposium on Information Interaction in Context (IIiX '10). ACM, 55–64. https://doi.org/10.1145/1840784.1840794
[8] A. M. Diriye. 2012. Search interfaces for known-item and exploratory search tasks. Doctoral. UCL (University College London). http://discovery.ucl.ac.uk/1343928/
[9] Maria Gäde, Mark Hall, Hugo Huurdeman, Jaap Kamps, Marijn Koolen, Mette Skov, Elaine Toms, and David Walsh. 2016. Overview of the INEX 2016 Interactive Social Book Search Track. In Working Notes of CLEF 2016 - Conference and Labs of the Evaluation forum (CEUR Workshop Proceedings), Vol. 1609. http://ceur-ws.org/Vol-1609/16091024.pdf
[10] Manali Gaikwad and Orland Hoeber. 2019. An Interactive Image Retrieval Approach to Searching for Images on Social Media. In Proceedings of the 2019 ACM Conference on Human Information Interaction and Retrieval (CHIIR '19). https://doi.org/10.1145/3295750.3298930
[11] Orland Hoeber, Anoop Sarkar, Andrei Vacariu, Max Whitney, Manali Gaikwad, and Gursimran Kaur. 2017. Evaluating the Value of Lensing Wikipedia During the Information Seeking Process. In Proceedings of the 2017 Conference on Human Information Interaction and Retrieval (CHIIR '17). ACM Press, Oslo, Norway, 77–86. https://doi.org/10.1145/3020165.3020178
[12] Hugo C. Huurdeman. 2018. Supporting the complex dynamics of the information seeking process. Ph.D. Dissertation. University of Amsterdam. http://hdl.handle.net/11245.1/1e3bf31a-0833-4ead-a00c-4cb1399d0216
[13] Hugo C. Huurdeman and Jaap Kamps. 2014. From Multistage Information-seeking Models to Multistage Search Systems. In Proceedings of the 5th Information Interaction in Context Symposium (IIiX '14). ACM, New York, NY, USA, 145–154. https://doi.org/10.1145/2637002.2637020
[14] Hugo C. Huurdeman, Max L. Wilson, and Jaap Kamps. 2016. Active and Passive Utility of Search Interface Features in Different Information Seeking Task Stages. In Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval (CHIIR '16). ACM, New York, NY, USA, 3–12. https://doi.org/10.1145/2854946.2854957
[15] Peter Ingwersen and Kalervo Järvelin. 2005. The Turn - Integration of Information Seeking and Retrieval in Context. Springer.
[16] Jiepu Jiang, Daqing He, and James Allan. 2014. Searching, Browsing, and Clicking in a Search Session: Changes in User Behavior by Task and over Time. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '14). ACM, New York, NY, USA, 607–616. https://doi.org/10.1145/2600428.2609633
[17] Diane Kelly. 2009. Methods for Evaluating Interactive Information Retrieval Systems with Users. Foundations and Trends in Information Retrieval 3, 1–2 (2009), 1–224. https://doi.org/10.1561/1500000012
[18] Carol Collier Kuhlthau. 1988. Longitudinal case studies of the information search process of users in libraries. Libr Inform Sci Res 10 (1988), 257–304.
[19] Carol Collier Kuhlthau. 1988. Perceptions of the information search process in libraries: a study of changes from high school through college. Inform Process Manag 24 (1988), 419–427. https://doi.org/10.1016/0306-4573(88)90045-3
[20] Carol Collier Kuhlthau. 2004. Seeking meaning: a process approach to library and information services. Libraries Unlimited.
[21] Bill Kules and Ben Shneiderman. 2008. Users can change their web search tactics: Design guidelines for categorized overviews. Inform Process Manag 44, 2 (2008), 463–484. https://doi.org/10.1016/j.ipm.2007.07.014
[22] Ranjit Kumar (Ed.). 2010. Research Methodology. SAGE.
[23] Jingjing Liu and Nicholas J. Belkin. 2015. Personalizing information retrieval for multi-session tasks. J Am Soc Inf Sci Tec 66, 1 (Jan. 2015), 58–81. https://doi.org/10.1002/asi.23160
[24] Xi Niu and Diane Kelly. 2014. The use of query suggestions during information search. Inform Process Manag 50 (2014), 218–234. https://doi.org/10.1016/j.ipm.2013.09.002
[25] Alison Pickard. 2007. Research methods in information. Facet Publishing.
[26] Alex Poole and Linden J. Ball. 2005. Eye Tracking in Human-Computer Interaction and Usability Research: Current Status and Future Prospects. In C. Ghaoui (Ed.), Encyclopedia of Human-Computer Interaction. Pennsylvania: Idea Group, Inc.
[27] Ross J. Todd. 2006. From information to knowledge: charting and measuring changes in students' knowledge of a curriculum topic. Inf Res 11, 4 (2006). http://www.informationr.net/ir/11-4/paper264.html
[28] Pertti Vakkari. 2001. A theory of the task-based information retrieval process: a summary and generalisation of a longitudinal study. J Doc 57, 1 (Feb. 2001), 44–60. https://doi.org/10.1108/EUM0000000007075
[29] Pertti Vakkari and Nanna Hakala. 2000. Changes in relevance criteria and problem stages in task performance. J Doc 56 (2000), 540–562. https://doi.org/10.1108/EUM0000000007127
[30] Pertti Vakkari, Mikko Pennanen, and Sami Serola. 2003. Changes of search terms and tactics while writing a research proposal: A longitudinal case study. Inform Process Manag 39 (2003), 445–463. https://doi.org/10.1016/S0306-4573(02)00031-6
[31] Max L. Wilson. 2011. Search User Interface Design. Synthesis Lectures on Information Concepts, Retrieval, and Services 3, 3 (Nov. 2011), 1–143. https://doi.org/10.2200/S00371ED1V01Y201111ICR020
[32] Max L. Wilson. 2013. Teaching HCI Methods: Replicating a Study of Collaborative Search. In Proceedings of the CHI2013 Workshop on the Replication of HCI Research (RepliCHI 2013) (CEUR Workshop Proceedings), Vol. 976. 39–43. http://ceur-ws.org/Vol-976/tpaper5.pdf
[33] Tom D. Wilson. 1999. Models in information behaviour research. J Doc 55 (1999), 249–270. https://doi.org/10.1108/EUM0000000007145