<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CHIIR</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Experiences with the 2013-2016 CLEF Interactive Information Retrieval Tracks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vivien Petras</string-name>
          <email>vivien.petras@ibi.hu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Gäde</string-name>
          <email>maria.gaede@ibi.hu-berlin.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marijn Koolen</string-name>
          <email>marijn.koolen@di.huc.knaw.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Toine Bogers</string-name>
          <email>toine@hum.aau.dk</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Berlin School of Library and Information Science, Humboldt-Universität zu Berlin</institution>
          ,
          <addr-line>Berlin</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Berlin School of Library and Information Science, Humboldt-Universität zu Berlin</institution>
          ,
          <addr-line>Berlin</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Humanities Cluster, Royal Netherlands Academy of Arts and Sciences</institution>
          ,
          <addr-line>Amsterdam</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Science and Information Studies, Department of Communication &amp; Psychology, Aalborg University Copenhagen</institution>
          ,
          <addr-line>Copenhagen</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>14</volume>
      <fpage>2011</fpage>
      <lpage>2012</lpage>
      <abstract>
        <p>This paper describes our experiences with the interactive IR tracks organized at CLEF from 2013-2016 and aggregates the lessons learned with each consecutive instance of the lab. We end with a summary of practical insights and lessons for future collaborative interactive IR evaluation exercises and for potential re-use scenarios.</p>
      </abstract>
      <kwd-group>
        <kwd>interactive information retrieval</kwd>
        <kwd>evaluation</kwd>
        <kwd>CHiC</kwd>
        <kwd>SBS</kwd>
        <kwd>CLEF</kwd>
        <kwd>book search</kwd>
        <kwd>information seeking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        After the INEX (Initiative for Evaluation of XML Retrieval)
Interactive Track ended in 2010 [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], there was a gap in interactive
information retrieval (IIR) experimentation at the large-scale
evaluation initiatives. The interactive track at the Cultural Heritage at
CLEF (Conference and Labs of the Evaluation Forum) lab (iCHiC)
revived this in 2013 and merged with the INEX Social Book Search
track to form the Social Book Search (SBS) lab at CLEF, running an
interactive track in 2014-2016.
      </p>
      <p>This paper provides a chronological overview of the development
and history of these two IIR initiatives and their outcomes. We
focus on the lessons learned for future collaborative IIR evaluation
exercises and for potential re-use scenarios. We start by chronicling
the timeline of the different interactive labs that were organized in
Sections 2-6. We then highlight the most important lessons learned
for the configuration of IIR evaluation experiments. We conclude
by discussing consequent activities and insights for the re-use of
IIR resources.
</p>
    </sec>
    <sec id="sec-2">
      <title>Setup</title>
      <p>
        The EU-funded PROMISE project (Participative Research
labOratory for Multimedia and Multilingual Information Systems
Evaluation) ran from 2010-2013 with the goal of providing a virtual and
open laboratory for research and experimentation with complex
multimodal and multilingual information systems [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In order to
evaluate its concepts and prototypes, three use cases were defined to
guide real-world requirements analysis and contextual testing:
‘Unlocking Cultural Heritage’ (information access to cultural heritage
material), ‘Searching for Innovation’ (patent search) and ‘Visual
Clinical Decision Support’ (radiology image retrieval).
      </p>
      <p>
        For the ‘Unlocking Cultural Heritage’ (CH) use case, a workshop
at the 2011 CLEF conference was organized in order to review
existing information access use cases in the CH domain and then
develop retrieval scenarios that could be used for evaluating CH
information access systems [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In addition to qualitative usability
tests of user interfaces, transaction log analyses and Cranfield-style
text retrieval evaluation, other forms of user studies were also
considered as viable evaluation approaches. The study and analysis
of different interaction patterns with CH materials was the main
interest of the workshop’s participants.
      </p>
      <p>
        At the 2012 CLEF conference, a pilot evaluation exercise was
organized for the CH domain, progressing work from the workshop
format to an evaluation lab [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. It was based on a real-life collection
of CH material: the complete index of the Europeana digital library,
which encompassed ca. 23 million metadata records in 30 different
languages at that time. The information needs were based on 50
queries (harvested from Europeana logfiles), translated into English,
French and German. The tasks in this pilot exercise comprised both
a conventional system-oriented scenario (i.e., ad-hoc retrieval) as
well as more specialized retrieval scenarios for the CH domain–
the semantic enrichment and variability tasks. The evaluation
followed the Cranfield paradigm by pooling the retrieval results
and assessing their relevance using human assessors.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Lessons learned</title>
      <p>Although the 2011 CHiC workshop had already emphasized that
a focus on user interaction patterns was an important evaluation
aspect for the CH domain, this first CHiC lab in 2012 had no
interactive tasks. Instead, it utilized a document collection based on
Europeana and used queries harvested from Europeana logs to
construct information needs. The vision was to extend the ad-hoc style
retrieval evaluation with interactive and other evaluation
scenarios (particularly result presentations and alternative methods for
relevance assessments) in the next phases.</p>
      <p>The Europeana document collection, albeit a real-world
collection, turned out to be very challenging. While an effort was made to
normalize the provided metadata by wrapping it in a special XML
format and removing certain metadata fields, the content in the
metadata had very different descriptive qualities, depending on the
original content provider. Both the data sparseness and
multilinguality of the content posed serious challenges for the participants.
Image data, such as thumbnails of graphical material in Europeana,
could not be provided due to copyright reasons.</p>
      <p>Some of the provided topics were not suitable for relevance
assessment, because information needs could not always be
unambiguously inferred from the provided queries. The topics mostly
contained short queries of 1-3 words and only half of them had
short descriptions added, which did not help much when the topic
was vague. For the CH use case, IIR studies focusing on
interaction patterns were needed, so an additional interactive task was
proposed for the next round.
</p>
    </sec>
    <sec id="sec-4">
      <title>INTERACTIVE CHIC TRACK @ CLEF 2013</title>
    </sec>
    <sec id="sec-5">
      <title>Setup</title>
      <p>
        The Interactive Track (http://www.promise-noe.eu/chic-2013/tasks/interactive-task) at the CHiC 2013 lab at CLEF (iCHiC) aimed
at building a bridge for IIR and behavior researchers to work in a
TREC-style evaluation environment. The idea was to develop a data
collection of IIR evaluation data, which could be re-used and built
upon. This task intentionally used a subset of the document
collection used in the other CHiC ad-hoc retrieval experimental tasks to
allow for later triangulation of results. Based on approximately 1
million metadata records from the English Europeana collection
and representing a broad range of CH objects, a simple search
interface was envisioned that would allow for browse and search
interactions with the metadata records for the IIR experiments [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
One non-goal oriented task (based on Borlund’s simulated work
tasks [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]), which simulated “casual” use of the system (“spend 20
minutes on the system and explore”) was provided to all experiment
participants.
      </p>
      <p>
        The same experimental infrastructure, which hosted the
web-based interfaces and documents and logged the interactions [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], was
provided to all participating research groups. All groups had to
recruit at least 30 participants: at least 10 of them had to be observed
in the lab, while at least 20 could use the system remotely. Apart
from the logged interactions on the systems, participants also filled
out pre- and post-task questionnaires, assessed their experience on
the User Engagement Scale [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and evaluated the usefulness of
found objects (relevance assessment) and the interface (usability).
      </p>
    </sec>
    <sec id="sec-5-lessons">
      <title>Lessons learned</title>
      <p>The iCHiC track ended up collecting data on 208 experiment
participants and their interactions from four participating research
groups. As a pilot experiment for collaborative data gathering, this
first interactive task was successful overall.</p>
      <p>The most important lesson learned from iCHiC and the reason
why it was merged with the INEX Social Book Search lab (see
Section 4) was that the provided metadata records were not “rich”
enough in content to provide an interesting case study for casual
browsing and search. The sparseness of the document collection
had already been a problem for the ad-hoc retrieval tests, and real
users did not like them any better. The actual purpose of iCHiC—to
study users’ interactions with the content—was hampered by the
lack of interesting content.</p>
      <p>The experimental set-up and questionnaire instruments
represented a significant effort for the participants to complete. However,
the collected data was deemed necessary for further analysis.</p>
      <p>An original plan for the set-up of this task was to provide the
metadata collection, simulated work tasks, and the experimental
setup (questionnaires, logging protocol) to the participating
research groups and have them provide their own infrastructure for
data gathering. After discussions, the organizers concluded that
having different groups each build their own infrastructure would add
too much variability and also pose a large barrier to entry, especially
for groups that did not have software or GUI design specialists.</p>
      <p>The data gathering at the University of Sheffield’s servers had
the additional advantage of having a central place where all the
data was stored. This also posed a problem in later years, however,
when researchers affiliated with the University of Sheffield moved
to different institutions and neither the preservation and
maintenance of the infrastructure and data nor its legal ownership had been
established.</p>
      <p>
        Four teams participated in the track, but not all of them were able
to recruit the 30 required participants. The uneven contribution
led to some discussion about the fairness of all groups then being
able to use the same data in later analyses. Initial discussions on
who would get to analyze the data with which research questions
in which priority (important for later publications) were never
successfully resolved as the organizers moved on to new tasks.
Some of the organizers published follow-up analyses of the data
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], while other participating research groups did not.
      </p>
      <p>The participating groups all adhered to the research ethics
requirements set forth by the University of Sheffield, which hosted the
platform and the collected data. Different ethical requirements (e.g.,
based on national law) were not considered. The experimental
participants were asked to consent to their responses being shared not
just with the organizers, but with the wider research community,
which allows for re-use of the data. However, processes for enabling
the data sharing at a later time were not considered.</p>
      <p>
        The proposal for the interactive task had planned for a two-year
period, where the data gathering (user interaction logging) and
preliminary data analysis would happen in the first year. In year
two, an aggregated data set of all logged interactions was to be
released to the research community in order to inform an improved
system design for data gathering, which would start again in year
two. While the organizers provided an initial analysis of the data
[
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], a planned follow-up analysis of the data did not take place.
      </p>
    </sec>
    <sec id="sec-6">
      <title>FIRST INEX iSBS TRACK @ CLEF 2014</title>
      <p>
        Social Book Search (SBS; http://marijnkoolen.com/Social-Book-Search/) started as a system-centered evaluation
campaign at INEX in 2011 [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], focusing on retrieval and ranking
of book metadata and associated user-generated metadata, such as
user reviews, ratings and tags from Amazon and LibraryThing [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
The main research question behind the track was how to exploit
the different types of curated and user-generated metadata for
realistic and complex book search requests as expressed on the
book discussion forums of LibraryThing. After its third year, the
organizers discussed changes to the SBS lab, specifically the nature
of book search tasks and how they are evaluated. At the same time,
the iCHiC organizers were looking for a different collection from
the Europeana cultural heritage objects, because they struggled to
come up with a meaningful task that engaged users, as the cultural
heritage metadata descriptions got little interest from participating
users. Initial discussions between the SBS and iCHiC organizers
suggested books and associated social media data might be a more
natural domain for participating users. By tying an interactive track
to a system-centered track around the same collection and tasks,
lessons learned in one track could feed into the other. Thus the
interactive SBS (iSBS) track was launched.
      </p>
      <p>
        Another important initiative was to study the different stages of
the search process and how they could be supported by different
interfaces [? ]. We considered models of the information search
process [
        <xref ref-type="bibr" rid="ref10 ref22 ref33">10, 22, 33</xref>
        ] in combination with models of how readers select
books to read [
        <xref ref-type="bibr" rid="ref15 ref28 ref29 ref30 ref31">15, 28–31</xref>
        ]. The book selection models distinguish
between book internal features (e.g., subject, treatment, characters,
ending) and external features (e.g., author, title, cover, genre) [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ],
but all are based on interaction in physical libraries and book shops,
so they had to be adapted to online environments, where users
have no access to the full text but do have access to additional data
in the form of user-generated content. Thus, selection is based only
on external features.
      </p>
      <p>
        This led to a three-stage model of browsing, searching and
selection, each with separate interfaces that carry over user choices
when switching between interfaces, based on Goodall [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. These
stages correspond to the three stages in Vakkari’s model of
prefocus, focus and post-focus [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. There was a lengthy discussion on
what functionalities to include in each stage and how to label the
different interfaces, to ensure that they made sense to users while
retaining a close connection to the three search stages and the selection
stages from the literature. It took many iterations of UI choices to
adapt the system to the data that was available and deemed most
useful to the searcher based on book search studies [
        <xref ref-type="bibr" rid="ref15 ref28 ref30">15, 28, 30</xref>
        ]. Such
extensive tailoring of the search UI to the data collection naturally
makes reuse of UI components problematic.
      </p>
      <p>
        We were interested in the difference between goal-oriented and
non-goal-oriented tasks, also to compare the non-goal-oriented task
in the book domain to the same non-goal-oriented task in CH as used in
iCHiC [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. In choosing a simulated work task, we considered tasks
that could be connected to specific stages in the search process,
similar to Pharo and Nordlie [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>Setup</title>
      <p>
        The 2014 iSBS Track did not run as a full evaluation campaign,
because most of the year was used to prepare and set up the
multi-stage search system, tasks and protocol [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. However, each of these
components improved on the iCHiC set-up: a more interesting
collection, more focus on the user interfaces and more varied tasks.
The track organizers recruited a small number of participants (41)
but decided to open up the experiment to other groups only in the
second year. The multi-stage system was compared against a baseline
system that had mostly the same features but all in a single view.
The experiment included a training task, a goal-oriented task and a
non-goal-oriented task. Pre- and post-experiment questionnaires
asked for demographic and cultural information, and the overall
experience and engagement with the interface. Post-task
questionnaires asked about the usefulness of different interface features.
Most of the questions were constructed specifically for this domain
and system, but the engagement questions were reused from the
iCHiC Track. The underlying experimental system of the iCHiC
experiments was also reused, but had to be modified somewhat to
fit the iSBS Track.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Lessons learned</title>
      <p>Although the long preparation phase left little time for gathering
data, it resulted in a consensus among the large group of organizers
about the set of generic research questions that the experimental
setup and search systems should be able to address.</p>
      <p>The setup did not lead to enough complex interactions to identify
stage transitions in the search process and to test the value of
multi-stage interfaces. We considered multiple causes: (1) the tasks
were relatively simple and did not require complex interactions;
(2) the instructions and training task were not sufficient to get
users familiar with such an interface; and (3) the interface was not
self-explanatory enough for users to interact with meaningfully.
The questionnaire data suggested the tasks could be completed
with little effort. We subsequently discussed whether we should
use more complex yet still realistic book search tasks.</p>
      <p>
        There was a conflict between the goal of studying social book
search with realistic tasks and the goal of studying the value of
interfaces for diferent stages in the search process. The models
of Kuhlthau [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and Vakkari [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] are based on researchers and
students searching information to write a report or essay and are
perhaps less relevant to casual leisure search for books. Or perhaps
the users lack a felt need with the simulated tasks, but would display
more complex interactions if they really were searching for one or
more books to buy.
      </p>
    </sec>
    <sec id="sec-9">
      <title>SECOND iSBS TRACK @ CLEF 2015</title>
    </sec>
    <sec id="sec-10">
      <title>Changes from previous edition</title>
      <p>
        The second year of the iSBS track was open to other research groups
and had a longer data gathering period with many more participants
(192 in total) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Most of the setup was kept the same to allow
comparison with the results of the previous year. However, the
goal-oriented task was redesigned to have five different sub-tasks,
to make users interact more and for longer periods of time.
      </p>
    </sec>
    <sec id="sec-11">
      <title>Lessons learned</title>
      <p>We found that the exclusively English metadata in the book
collection was a hurdle for several non-native English-speaking users.
As some participating groups contributed many more users than
other groups, with more non-native English speakers, the balance
was very different from the year before, which makes comparison
of cohorts difficult.</p>
      <p>Users also spent a lot of time on the goal-oriented task with
sub-tasks, causing some of them to abandon the experiment after
the first of the two tasks. In their feedback, others indicated that
the overall experiment took too long. This could mean that the
gathered data is biased towards more persistent participants.
</p>
    </sec>
    <sec id="sec-12">
      <title>THIRD iSBS TRACK @ CLEF 2016</title>
    </sec>
    <sec id="sec-13">
      <title>Changes from previous edition</title>
      <p>
        In the third edition of the iSBS track we made more significant
changes to the experimental setup. Some modifications were made
to the experiment structure to avoid participants abandoning the
experiment. The main change was that users only had one
mandatory task, but could continue with other tasks as long as they were
willing to continue. We added eight tasks based on book search
requests from the LibraryThing discussion forums to provide as
realistic tasks as possible [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Another big change was that we
focused only on the multi-stage interface to have fewer variables in
the gathered data. Finally, a third change was that each
participating institution had its own instance of the experiment to ensure
participant allocation was balanced for each institution, not only
for the overall experiment. This was mainly because some
institutions had specific cohorts, which they could not analyse across the
variables when balancing was only done overall.
      </p>
    </sec>
    <sec id="sec-14">
      <title>Lessons learned</title>
      <p>A comparison of the 2015 and 2016 cohorts showed very few
differences in terms of time spent on goal-oriented and non-goal tasks
(the 2015 cohort showed no ordering effect between doing
goal-oriented first and doing non-goal-oriented first), giving a strong
indication that the experiment structure and tasks are producing
reliable results. This also suggests that the two cohorts could be
combined to reduce the impact of individual differences. One of
the hardest struggles in IIR evaluation campaigns is getting a large
and diverse enough set of users. Running such campaigns for long
periods requires continuity. The same experimental systems need
to remain available with at most small changes.</p>
      <p>The additional tasks based on requests from the LibraryThing
discussion forums resulted in different search behaviour from the
simulated goal-oriented and non-goal-oriented tasks, but also showed
large differences between the LibraryThing tasks themselves, with
more subjective, fiction-oriented tasks leading to less interaction
than concrete, non-fiction-oriented tasks. This suggests that IIR
findings may be very sensitive to the specifics of the simulated work
tasks used. It may also signal that in order to study information
search for reading for one’s own enjoyment, it is important that
users have ‘skin in the game’ and feel a personal connection to
leisure-focused work tasks.</p>
      <p>A problem encountered since running the 2016 iSBS Track is that
organizers move between institutions, which causes problems for
maintaining experimental systems, websites and repositories when
they lose institutional access to the servers where the infrastructure
is hosted. This in turn endangers the continuous availability of
research data and experiments. A natural solution to this recurring
problem could be an independent or inter-institutional platform
and repository for these systems and materials.</p>
      <p>One important lesson learned from the iCHiC and iSBS tracks is the
importance of a suitable document collection that is realistic in both
size and content variety. The document collection used for iCHiC
was based on metadata from Europeana. Even though it represented
a broad range of different topics, the individual items in the dataset
were often sparse in their information content. In the iSBS tracks,
the document collection based on Amazon and LibraryThing data
offered richer information that is more suitable for an interesting
task for users, but over the course of the different iSBS editions the
collection grew increasingly out-of-date. We found this negatively
affected search behavior as well as user engagement, especially
during the open search task. Users were looking for recent book
titles and got frustrated that they could only find books that were
at least six years old.</p>
      <p>While re-use of IIR resources is important for replicability and
reproducibility, oftentimes older document collections are simply
not interesting anymore for participants—something system-based
evaluation suffers from to a lesser degree. How to obtain
realistic, engaging, and up-to-date document collections, while at the
same time maintaining comparability across evaluation iterations,
remains an open question.</p>
      <p>Using a live document collection from a production system would
not allow for the same number of interactions to be studied and
poses difficulties for logging. It is not a simple alternative. Arguably,
what matters is not the stability of the set of documents that are
searchable, but the extent to which that set is up-to-date. Book
search interactions gathered in 2014 can be compared with those
gathered in 2019 if in both cases users could search books published
in the last five years, despite there being no overlap between the two
collections, as long as the type and amount of information about
books remains the same. To improve re-usability, it may be more
valuable to investigate and describe relevant aspects of document
collections, so that IIR studies with different document collections
can be compared based on their overlapping relevance aspects, e.g.,
recency, structure, type, and amount of information.</p>
      <p>Unfortunately, realistic document collections tend to exhibit a
larger degree of variety and complexity. This may make them more
engaging and interesting to participants, but it also increases the
complexity of the analysis of their behavior. One could argue that
to achieve a more detailed and thorough analysis, perhaps simpler
document collections would be more suitable, thereby setting up a
trade-off between complexity at the experimental and the analysis
stages.</p>
    </sec>
    <sec id="sec-15">
      <title>Information Needs</title>
      <p>
        In order to have meaningful impact, IIR studies should be
representative of the real-life variety in domains, system designs, and user
types and needs. One way in which iCHiC and iSBS attempted to
do this was by using a varied and realistic set of simulated work
tasks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and cover stories that include extra context about the
background task to support the search behavior of participants.
How best to generate such realistic information needs is an open
question. One potentially fruitful approach in the 2016 iSBS track
involved taking real-world examples of complex information needs
from the LibraryThing forums and using them as optional
additional work tasks. These tasks were judged as being rich in variety
and detail by our participants, so this could be an interesting avenue
for future work. However, as the difference between fiction and
non-fiction tasks showed, personal interest does play an important
role in user engagement, so using real-world requests as simulated
work tasks is not a catch-all solution.
      </p>
      <p>Despite the proven usefulness of simulated work tasks, they are
still not the same as a user’s own information needs. We
therefore also included work tasks in iCHiC and iSBS that focused on
the participants’ own information needs. Non-restrictive tasks, in
which users can search whatever and however they want for as
long or short as they want, offer more realistic aspects of
information behavior, but they make comparison more difficult. Differences
between users can be due to them having wildly different ‘tasks’ in
mind. Although we experimented with different types of tasks, we
feel that we have only scratched the surface here. True information
needs can be multilingual and multicultural, making assessment
even more challenging.</p>
      <p>
        In addition, by focusing only on single information needs, we
believe that we are ignoring valuable aspects of the entire information
seeking process, both individual and collaborative [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Information
search is only one aspect of information behavior and is commonly
combined with exploration, browsing, or interaction with a
recommender system. Moreover, information behavior often takes place
across and between different devices (desktop vs. smartphone),
information systems (e.g. Amazon, LibraryThing, Google but also
social media channels like Facebook and Twitter [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) and
modalities (digital vs. paper). On the other hand, a large number of varied
information needs and task contexts leads to a wide distribution
of experimental data points, which, if not enough users can be
persuaded to participate, may result in analyses that lack statistical
significance.
      </p>
    </sec>
    <sec id="sec-16">
      <title>Study Participants</title>
      <p>Ideally, an IIR evaluation campaign recruits participants that are a
realistic representation of the general target population to avoid
the introduction of biases [8, p. 241]. However, in most IIR tracks—
including our own—researchers have often relied on recruiting
students from participating universities or research groups as
participants. Due to the short-term preparations and research cycles,
this is often the only way to include enough participants in an IIR
experiment. However, students are only one of several user groups
that need to be taken into account when dealing with complex
search tasks. It needs to be assured that users are selected based on
the specific system, feature or task to be tested as ignoring these
relationships and dependencies is likely to lead to invalid results.
Longer preparation time or access to user databases with
potential participants could help overcome such biases in participant
recruitment.</p>
      <p>One of our findings in iSBS was that the cultural background
makes a significant difference. This is something that is rarely
reported in studies, but that appears to be an important aspect to
include. This also challenges the assumption that, by providing the
same infrastructure and tasks but using different user group
distributions over the years or across national boundaries, measured
user interactions can be aggregated across these groups. There
were some analyses that clustered users based on certain aspects,
but the question remains which users can be viewed in
aggregation. Since academic IIR studies often rely on students,
studies could explicitly describe criteria of representativeness for the
target user group and add questionnaire items that capture the
user characteristics needed to map participants to these criteria of
representativeness.
</p>
    </sec>
    <sec id="sec-17">
      <title>Search User Interface</title>
      <p>As our experience with the iCHiC and iSBS tracks has taught us,
the search user interface is perhaps the most important aspect to
get right in the IIR system used in the experiments. The ubiquity and
popularity of modern-day search engines means that any search
user interface has certain minimum expectations to meet in terms of
layout and/or functionality. Not meeting these expectations risks
distracting users and has a deleterious effect on their search
behavior. It would be beneficial if the IIR system offered the
flexibility of choosing different search interfaces to study the effects of
the GUI on information seeking behavior. This was used to great
effect in the iSBS tracks to examine how different interfaces can
support the different search stages.</p>
      <p>
        This flexibility came at a price, however, as the software
components needed for the infrastructure became increasingly complex.
Both iCHiC and iSBS used a customized infrastructure developed
by one of the organizers, which made this possible [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
Maintaining customized software for future experiments is a hard problem.
Making infrastructure publicly available with appropriate
documentation is one way to alleviate this.
      </p>
      <p>Another difficulty is that the design of interfaces can be
informed by different theoretical models of information interaction.
In setting up the iSBS track and designing the multistage interface,
we discussed the appropriateness of numerous information
seeking/search models as well as book selection models and strategies,
how they are related to each other and how they correspond to or
are supported by aspects of the interface. A further complication
is that our choices were also steered by the research questions we
wanted to address. These issues add another set of variables to take
into account when considering comparison and reuse, and should
be described in studies.
</p>
    </sec>
    <sec id="sec-18">
      <title>Experimental Setup</title>
      <p>IIR research usually includes several complex components that can
affect the quality and success of each experiment. While the
importance of some elements, such as task development, has been
extensively discussed, other aspects remain less considered. Only a
few studies report on or discuss the measures used to analyze or
interpret results from IIR experiments. So far, IIR measures are highly
contextual, varying from experiment to experiment. Measures used
span from interaction data, such as session duration or clicks,
to qualitative data derived from questionnaires or interviews. Often,
several of these data points are combined or correlated.</p>
      <p>A collaborative IIR study requires that participating research
groups pool their gathered data and aggregating this data generates
substantial overhead. If institutions gather their own data,
aggregation may involve harmonizing inconsistencies. In the iCHiC and
iSBS tracks, a single system was used to gather all experimental
data, but this system had to be developed and adapted with each
iteration. Comprehensive documentation and accurate
descriptions of the data-gathering tools are crucial for the evaluation and
re-use of these aspects in future studies.</p>
      <p>
        Different research groups and individuals often want to study
slightly different aspects of the problem domain or setup, requiring
different questions in the questionnaire, different tasks or users, or
different search system components. With every change, new users
need to be recruited, and comparison to previously collected data
becomes harder. The long preparatory discussions among the iSBS
organizers regarding research questions, theoretical frameworks
and research designs suggest that it is possible, to some extent, to
incorporate a broad set of research questions in the overall research
design to allow a range of studies with the same setup. But often
research questions change or new questions are prompted during and
following the experiments, calling for an iterative development of
the research design. We are not aware of any guidelines on how to
best update designs to allow some backwards comparability. While
there is large variability in research questions and research designs,
the group would have benefited from re-using other researchers’
research design components, as was done with the User
Engagement Scale [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] in both iCHiC and iSBS. Apart from documenting
the broad aspects of the experimental set-up in the track overview
papers, a thorough documentation and subsequent publication of
questionnaire items, scales and other measures would not only help
other researchers in not having to re-invent standard items (e.g.,
demographic questions), but also support the standardization of IIR
research.
      </p>
    </sec>
    <sec id="sec-19">
      <title>Data Storage, Infrastructure Maintenance &amp; Intellectual Property Rights</title>
      <p>From 2011 until 2016, the various interactive tracks generated a
wealth of data, but also went through numerous organizational
changes, both in terms of the individuals involved and the
institutions that provided infrastructure. iSBS started as part of INEX
with some data stored on servers dedicated to INEX activities, other
data stored on servers maintained by one of the organizers’
institutions and the search indexes on another set of servers of another
organizing institution.</p>
      <p>Recurring questions are (1) what happens if organizers leave
and own crucial pieces of the data or infrastructure, and (2) what
happens when organizers move between institutions, thereby losing
access to data or infrastructure? For research data management
purposes, it is important that organizers of IIR studies make explicit
who is responsible for which part of the data and systems, who owns
the data or infrastructure, and what happens when organizers move
to other institutions or leave the project, or when new organizers
join.</p>
      <p>Although this was always intended, the organizers of iCHiC and iSBS
found hardly any re-use of the gathered data for IIR studies or
triangulation studies with the related ad-hoc retrieval experiments in
CHiC or SBS. One reason may have been the insufficient availability
of the research data along with proper rights clearance.</p>
      <p>There are generic platforms for storing and sharing scientific
data, such as the Open Science Framework (https://osf.io/) and several
Dataverse (https://dataverse.org/) instances. These options solve some of the institutional issues, but
they lack the flexibility to run experimental systems or to add
domain-specific search and access features to datasets that make
a repository like RepAST useful to the IIR community. Publicly
available repositories for software and software infrastructures also
exist (e.g., GitHub), but present similar problems to the research
data repositories.</p>
      <p>Next to problems of storage and access of IIR research data,
there are issues of copyright, privacy and ethics. The questionnaire
informs users which institutions are involved, but how should
organizers deal with new researchers and institutions joining? One
option is for organizers to agree on ethical guidelines for data
gathering, informed consent and data representation. For further data
re-use, it is crucial that users also give their informed consent for
additional analyses of their data. To create a trustworthy
environment, IIR researchers must provide concrete statements on who
will use the data and for what future purposes. These statements should be
available alongside the research data as part of an archived
and documented research design (see Section 7.5).</p>
    </sec>
    <sec id="sec-21">
      <title>Coordinating Collaborative Research</title>
      <p>IIR research is a highly interdisciplinary field bridging areas of
information seeking, interactive and system-centered (ranking,
evaluation) IR and user interface design. Accordingly, researchers from
different disciplines need to collaborate on complex questions and
experimental setups. Entering the field of IIR research remains a
challenge due to inconsistent or incompatible practices. Even among those
who work on IIR problems, hardly any collaboration on systems, tasks, data,
participants or research questions can be observed. This may be
due to time and resource constraints caused by traditional
one-year research cycles as well as unawareness of other projects.</p>
      <p>When interest in an interactive track for the SBS Lab was assessed
during a joint iCHiC and SBS discussion session at CLEF 2013,
everyone who expressed interest was involved in the initial
discussions on setting up the track, so that the track could be shaped
around the broad set of aspects they wanted to investigate.
This community input is valuable both in
attracting groups to actively participate and in creating a setup
with potential for long-term community support and interest. A
challenge of this desired community input and the larger number of
organizers is the additional overhead required for decision processes.
Once again, good documentation and communication is vital as are
well-understood guidelines or practices about the consequences of
researchers joining or leaving the initiative. Collaborative research
also entails a joint understanding of how research results will be
presented (e.g. rules of authorship and priority). This is especially
important in large collaborations.</p>
      <p>Collaborative research, by its very nature, tries to study aspects
which require a large-scale infrastructure, a large number of users
or other aspects that need a strong community input. This will
necessarily prolong the design and implementation phases of any
study, which is a detriment in a fast-paced scholarly context such as
IIR research, especially within large evaluation campaigns or
research conferences, which run on annual cycles. This type of
work would be best supported by a multi-year project or by moving
to a slower research output model.
</p>
    </sec>
    <sec id="sec-22">
      <title>OUTCOMES: WHERE TO GO FROM HERE?</title>
      <p>
        Based on previous experiences from the CLEF/INEX Interactive
Social Book Search tracks, the two Supporting Complex Search
Tasks (SCST) community workshops (2015 and 2017) [
        <xref ref-type="bibr" rid="ref12 ref2">2, 12</xref>
        ] were
organized to discuss IIR challenges and future directions in the area
of complex search scenarios, since cooperation between the different
tracks was rarely seen. The invited researchers from various fields
concluded that collaborative IIR campaigns have great potential,
but lack standardization and sustainability. Since previous efforts
such as the Systematic Review of Assigned Search Tasks (RepAST)
[
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] have only been partly noticed or used, it remains an open
question how to secure the persistence of IIR research designs and
results.
      </p>
      <p>
        The 2018 workshop on Barriers to IIR Resources Re-use (BIIRRR)
switched the focus to the analysis and preparation of requirements
for effective re-use of IIR resources or experiments [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The
development of quality standards for the curation and re-use of research
designs has been identified as one of the main tasks in this
initiative, along with the appropriate documentation and publication of
research data and the requisite software. Research designs were
named as a priority, because they appear to have the highest
potential for standardization and re-use in other IIR studies. This requires
a proper analysis of previously used research design elements as
well as motivation for or against potential re-use of these elements.
      </p>
      <p>One idea is to develop a platform that would allow researchers
from interdisciplinary fields to search for IIR research designs once
they have been identified as re-usable and are stored and
documented. Building such a repository requires analyzing and
implementing user requirements for both accessing and contributing
research designs, developing and agreeing on a standardized
data infrastructure, and a maintenance plan coordinated by a
stable team of researchers.</p>
      <p>Apart from a proper documentation and archiving strategy, this
retrospective also pointed towards pre-study aspects, which are
instrumental for re-using experimental research data and designs.
This includes the establishment of guidelines for cross-national and
cross-institutional data collection, informed consent and data
distribution. As stated several times in this paper, the reusability
of research designs and other IIR study components strongly
depends on the community’s willingness to develop and maintain
proper documentation, curation and publication guidelines. While
this may not be as rewarding as creating new research data by
implementing more IIR studies (and we need more of these as well),
it is crucial for the community to standardize in order to move
forward as a research discipline.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Beckers</surname>
          </string-name>
          , Norbert Fuhr, Nils Pharo, Ragnar Nordlie, and Khairun Nisa Fachry.
          <year>2010</year>
          .
          <article-title>Overview and Results of the INEX 2009 Interactive Track</article-title>
          .
          <source>In ECDL (Lecture Notes in Computer Science)</source>
          , Mounia Lalmas, Joemon M. Jose, Andreas Rauber,
          <source>Fabrizio Sebastiani, and Ingo Frommholz (Eds.)</source>
          , Vol.
          <volume>6273</volume>
          . Springer,
          <fpage>409</fpage>
          -
          <lpage>412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Nicholas</given-names>
            <surname>Belkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Toine</given-names>
            <surname>Bogers</surname>
          </string-name>
          , Jaap Kamps, Diane Kelly, Marijn Koolen, and
          <string-name>
            <given-names>Emine</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          .
          <year>2017</year>
          . Second Workshop on Supporting Complex Search Tasks.
          <source>In Proc CHIIR 2017</source>
          . ACM, New York, NY,
          <fpage>433</fpage>
          -
          <lpage>435</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Toine</given-names>
            <surname>Bogers</surname>
          </string-name>
          , Maria Gäde, Mark Hall, Luanne Freund, Marijn Koolen, Vivien Petras, and
          <string-name>
            <given-names>Mette</given-names>
            <surname>Skov</surname>
          </string-name>
          .
          <source>2018. Report on the Workshop on Barriers to Interactive IR Resources Re-use (BIIRRR</source>
          <year>2018</year>
          ).
          <source>SIGIR Forum 52</source>
          ,
          <issue>1</issue>
          (Aug.
          <year>2018</year>
          ),
          <fpage>119</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Pia</given-names>
            <surname>Borlund</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>The IIR Evaluation Model: A Framework for Evaluation of Interactive Information Retrieval Systems</article-title>
          .
          <source>Information Research 8</source>
          ,
          <issue>3</issue>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Pia</given-names>
            <surname>Borlund</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Interactive Information Retrieval: An Evaluation Perspective</article-title>
          .
          <source>In CHIIR '16: Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval</source>
          . ACM, New York, NY, USA,
          <fpage>151</fpage>
          -
          <lpage>151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Pia</given-names>
            <surname>Borlund</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Ingwersen</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>The Development of a Method for the Evaluation of Interactive Information Retrieval Systems</article-title>
          .
          <source>Journal of Documentation 53</source>
          ,
          <issue>3</issue>
          (
          <year>1997</year>
          ),
          <fpage>225</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Braschler</surname>
          </string-name>
          , Khalid Choukri, Nicola Ferro, Allan Hanbury, Jussi Karlgren, Henning Müller, Vivien Petras, Emanuele Pianta, Maarten de Rijke, and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Santucci</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>A PROMISE for Experimental Evaluation</article-title>
          .
          <source>In Multilingual and Multimodal Information Access Evaluation</source>
          , Maristella Agosti, Nicola Ferro, Carol Peters, Maarten de Rijke, and Alan Smeaton (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg,
          <fpage>140</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Donald O.</given-names>
            <surname>Case</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lisa M.</given-names>
            <surname>Given</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Looking for Information: A Survey of Research on Information Seeking, Needs, and Behavior</article-title>
          (4th ed.). Emerald Group Publishing, Bingley, UK.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Otis</given-names>
            <surname>Chandler</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>How Consumers Discover Books Online</article-title>
          .
          <source>In Tools of Change for Publishing. O'Reilly.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>David</given-names>
            <surname>Ellis</surname>
          </string-name>
          .
          <year>1989</year>
          .
          <article-title>A behavioural model for information retrieval system design</article-title>
          .
          <source>Journal of information science 15</source>
          ,
          <fpage>4</fpage>
          -
          <lpage>5</lpage>
          (
          <year>1989</year>
          ),
          <fpage>237</fpage>
          -
          <lpage>247</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Maria</given-names>
            <surname>Gäde</surname>
          </string-name>
          , Nicola Ferro, and Monica Lestari Paramita.
          <year>2011</year>
          .
          <article-title>CHiC 2011 - Cultural Heritage in CLEF: From Use Cases to Evaluation in Practice for Multilingual Information Access to Cultural Heritage</article-title>
          . In CLEF Notebook Papers/Labs/Workshop.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Maria</given-names>
            <surname>Gäde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mark M.</given-names>
            <surname>Hall</surname>
          </string-name>
          , Hugo Huurdeman, Jaap Kamps, Marijn Koolen, Mette Skov, Elaine Toms, and
          <string-name>
            <given-names>David</given-names>
            <surname>Walsh</surname>
          </string-name>
          .
          <source>2015. Report on the First Workshop on Supporting Complex Search Tasks. SIGIR Forum 49</source>
          ,
          <issue>1</issue>
          (
          <year>June 2015</year>
          ),
          <fpage>50</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Maria</given-names>
            <surname>Gäde</surname>
          </string-name>
          , Mark Michael Hall, Hugo C. Huurdeman, Jaap Kamps, Marijn Koolen, Mette Skov, Toine Bogers, and
          <string-name>
            <given-names>David</given-names>
            <surname>Walsh</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the SBS 2016 Interactive Track</article-title>
          .
          <source>In Working Notes of the CLEF 2016 Conference (CEUR Workshop Proceedings)</source>
          , Krisztian Balog, Linda Cappellato,
          <source>Nicola Ferro, and Craig Macdonald (Eds.)</source>
          , Vol.
          <volume>1609</volume>
          . CEUR-WS.org,
          <volume>1024</volume>
          -
          <fpage>1038</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Maria</given-names>
            <surname>Gäde</surname>
          </string-name>
          , Mark Michael Hall, Hugo C. Huurdeman, Jaap Kamps, Marijn Koolen, Mette Skov, Elaine Toms, and
          <string-name>
            <given-names>David</given-names>
            <surname>Walsh</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Overview of the SBS 2015 Interactive Track</article-title>
          .
          <source>In Working Notes of the CLEF 2015 Conference (CEUR Workshop Proceedings)</source>
          , Linda Cappellato, Nicola Ferro,
          <string-name>
            <given-names>Gareth J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          , and Eric SanJuan (Eds.), Vol.
          <volume>1391</volume>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Deborah</given-names>
            <surname>Goodall</surname>
          </string-name>
          .
          <year>1989</year>
          .
          <article-title>Browsing in public libraries</article-title>
          .
          <source>Library and Information Statistics Unit LISU.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Mark</given-names>
            <surname>Hall</surname>
          </string-name>
          , Robert Villa, Sophie Rutter, Daniel Bell, Paul Clough, and
          <string-name>
            <given-names>Elaine</given-names>
            <surname>Toms</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Sheffield Submission to the CHiC Interactive Task: Exploring Digital Cultural Heritage</article-title>
          .
          <source>CLEF Working Notes.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Mark Michael</given-names>
            <surname>Hall</surname>
          </string-name>
          , Hugo C. Huurdeman, Marijn Koolen, Mette Skov, and
          <string-name>
            <given-names>David</given-names>
            <surname>Walsh</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Overview of the INEX 2014 Interactive Social Book Search Track</article-title>
          .
          <source>In Working Notes of the CLEF 2014 Conference (CEUR Workshop Proceedings)</source>
          , Linda Cappellato, Nicola Ferro,
          <source>Martin Halvey, and Wessel Kraaij (Eds.)</source>
          , Vol.
          <volume>1180</volume>
          . CEUR-WS.org,
          <volume>480</volume>
          -
          <fpage>493</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Mark M.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Spyros</given-names>
            <surname>Katsaris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Elaine</given-names>
            <surname>Toms</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A Pluggable Interactive IR Evaluation Work-bench</article-title>
          .
          <source>In European Workshop on Human-Computer Interaction and Information Retrieval</source>
          .
          <fpage>35</fpage>
          -
          <lpage>38</lpage>
          . http://ceur-ws.org/Vol-1033/paper4.pdf
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Mark Michael</given-names>
            <surname>Hall</surname>
          </string-name>
          and
          <string-name>
            <given-names>Elaine</given-names>
            <surname>Toms</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Building a Common Framework for IIR Evaluation</article-title>
          .
          <source>In Information Access Evaluation</source>
          . Multilinguality, Multimodality, and Visualization
          , Pamela Forner, Henning Müller, Roberto Paredes, Paolo Rosso, and Benno Stein (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg,
          <fpage>17</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Preben</given-names>
            <surname>Hansen</surname>
          </string-name>
          , Chirag Shah, and
          <string-name>
            <given-names>Claus-Peter</given-names>
            <surname>Klas</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Collaborative Information Seeking</article-title>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Koolen</surname>
          </string-name>
          , Gabriella Kazai, Jaap Kamps, Antoine Doucet, and
          <string-name>
            <given-names>Monica</given-names>
            <surname>Landoni</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Overview of the INEX 2011 Books and Social Search Track. In Focused Retrieval of Content and Structure: 10th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX</article-title>
          <year>2011</year>
          )
          <article-title>(LNCS), Shlomo Geva</article-title>
          ,
          <source>Jaap Kamps, and Ralf Schenkel (Eds.)</source>
          , Vol.
          <volume>7424</volume>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Carol C.</given-names>
            <surname>Kuhlthau</surname>
          </string-name>
          .
          <year>1991</year>
          .
          <article-title>Inside the search process: Information seeking from the user's perspective</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          <volume>42</volume>
          ,
          <issue>5</issue>
          (
          <year>1991</year>
          ),
          <fpage>361</fpage>
          -
          <lpage>371</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Ragnar</given-names>
            <surname>Nordlie</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nils</given-names>
            <surname>Pharo</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Seven Years of INEX Interactive Retrieval Experiments - Lessons and Challenges</article-title>
          .
          <source>In Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics</source>
          , Tiziana Catarci, Pamela Forner, Djoerd Hiemstra, Anselmo Peñas, and Giuseppe Santucci (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg,
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Heather L.</given-names>
            <surname>O'Brien</surname>
          </string-name>
          and
          <string-name>
            <given-names>Elaine G.</given-names>
            <surname>Toms</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>The development and evaluation of a survey to measure user engagement</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>61</volume>
          ,
          <issue>1</issue>
          (
          <year>2010</year>
          ),
          <fpage>50</fpage>
          -
          <lpage>69</lpage>
          . DOI:http://dx.doi.org/10.1002/asi.21229
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Vivien</surname>
            <given-names>Petras</given-names>
          </string-name>
          , Toine Bogers, Elaine Toms, Mark Hall, Jacques Savoy, Piotr Malak, Adam Pawłowski, Nicola Ferro, and
          <string-name>
            <given-names>Ivano</given-names>
            <surname>Masiero</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Cultural Heritage in CLEF (CHiC) 2013</article-title>
          . In Information Access Evaluation. Multilinguality, Multimodality, and
          Visualization
          , Pamela Forner, Henning Müller, Roberto Paredes, Paolo Rosso, and Benno Stein (Eds.). Springer Berlin Heidelberg,
          <fpage>192</fpage>
          -
          <lpage>211</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Vivien</surname>
            <given-names>Petras</given-names>
          </string-name>
          , Nicola Ferro, Maria Gäde, Antoine Isaac, Michael Kleineberg, Ivano Masiero, Mattia Nicchio, and
          <string-name>
            <given-names>Juliane</given-names>
            <surname>Stiller</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Cultural Heritage in CLEF (CHiC) Overview 2012</article-title>
          .
          <source>In CLEF 2012 Labs and Workshops.</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Nils</given-names>
            <surname>Pharo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ragnar</given-names>
            <surname>Nordlie</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Examining the effect of task stage and topic knowledge on searcher interaction with a digital bookstore</article-title>
          .
          <source>In Proceedings of the 4th Information Interaction in Context Symposium. ACM</source>
          ,
          <fpage>4</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Kara</given-names>
            <surname>Reuter</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Assessing aesthetic relevance: Children's book selection in a digital library</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>58</volume>
          ,
          <issue>12</issue>
          (
          <year>2007</year>
          ),
          <fpage>1745</fpage>
          -
          <lpage>1763</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Catherine Sheldrick</given-names>
            <surname>Ross</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Finding without seeking: the information encounter in the context of reading for pleasure</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>35</volume>
          ,
          <issue>6</issue>
          (
          <year>1999</year>
          ),
          <fpage>783</fpage>
          -
          <lpage>799</lpage>
          . DOI:http://dx.doi.org/10.1016/S0306-4573(99)00026-6
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Catherine Sheldrick</given-names>
            <surname>Ross</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Making choices: What readers say about choosing books to read for pleasure</article-title>
          .
          <source>The Acquisitions Librarian</source>
          <volume>13</volume>
          ,
          <issue>25</issue>
          (
          <year>2000</year>
          ),
          <fpage>5</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Katariina</given-names>
            <surname>Saarinen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Pertti</given-names>
            <surname>Vakkari</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A sign of a good book: readers' methods of accessing fiction in the public library</article-title>
          .
          <source>Journal of Documentation</source>
          <volume>69</volume>
          ,
          <issue>5</issue>
          (
          <year>2013</year>
          ),
          <fpage>736</fpage>
          -
          <lpage>754</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Elaine</given-names>
            <surname>Toms</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Hall</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>The CHiC interactive task (CHiCi) at CLEF 2013</article-title>
          . CLEF Working Notes.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Pertti</given-names>
            <surname>Vakkari</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>A theory of the task-based information retrieval process: a summary and generalisation of a longitudinal study</article-title>
          .
          <source>Journal of Documentation</source>
          <volume>57</volume>
          ,
          <issue>1</issue>
          (
          <year>2001</year>
          ),
          <fpage>44</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Barbara M.</given-names>
            <surname>Wildemuth</surname>
          </string-name>
          and
          <string-name>
            <given-names>Luanne</given-names>
            <surname>Freund</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Assigning Search Tasks Designed to Elicit Exploratory Search Behaviors</article-title>
          .
          <source>In Proceedings of the Symposium on Human-Computer Interaction and Information Retrieval (HCIR '12)</source>
          . ACM, New York, NY, USA, Article
          <volume>4</volume>
          , 10 pages. DOI:http://dx.doi.org/10.1145/2391224.2391228
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>