<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>FDIA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Evaluating the Success of Search Sessions in Interactive Information Retrieval</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Milan Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>17</volume>
      <fpage>17</fpage>
      <lpage>18</lpage>
      <abstract>
        <p>Interactive Information Retrieval (IIR) studies include both system evaluations and users' information search behaviors, and the interaction of users with systems and information. The development and testing of appropriate measures and methodologies for evaluating IIR is central to information science. To better understand users' needs and support their interactions with information, IIR systems need to be able to understand the goals underlying users' search behaviors. This work is conceived to address some aspects of this problem. In particular, it considers how people evaluate the success of a complete search session and of the various search intentions within a search session, with respect to the task which motivated the search. In this paper a pilot study is described.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>Interactive Information Retrieval</kwd>
        <kwd>Work Task</kwd>
        <kwd>Evaluation</kwd>
        <kwd>Search Session</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The main goal of an Information Retrieval System (IRS) is to return to users
the most relevant documents in response to their queries, thus respecting the
so-called "one query, one response" paradigm. However, people usually engage in
longer and more complex information-seeking episodes. When people
try to address a new type of problem, they need to engage in many activities
other than just clicking on a search result retrieved by the system. In IIR, the
crucial point is to develop systems that allow users to easily access the
information they need, while also providing solutions to a series of problems that
may arise during a search session. According to Cole et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the evaluation of a
system should focus on how users are able to achieve their goals, how the system
helps users to identify and engage in appropriate interactions, and the
relationship between the results of these interactions and the progress towards the goals.
To understand and develop suitable measures for the evaluation of IIR
systems, it is necessary to know how people evaluate the system's support for
achieving the goals of an intention during a search session and, more generally, how
they evaluate the success in achieving the goals of the entire search session.
To do this, it is necessary to understand what these intentions are and
what the nature of the work tasks is, since it has been shown that a task's topic
has an essential influence on user behavior during a search session [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This
paper presents the methodology and a pilot study of a project undertaken during my
master's thesis.<sup>1</sup>
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>
        Several studies have shown that capturing a user's information need is one of
the most critical aspects of IR. Although it is difficult to create an all-encompassing
definition of an information need, most information needs can be characterized
in terms of tasks and topics: a task represents the goal or purpose of the search,
that is, what a user wants to accomplish by searching (e.g., a user wants to plan
a trip); a topic represents the subject area that is the focus of the task (e.g., the
user might plan a trip to Africa). Research has also shown that information needs
are dynamic and evolve during the search process: while searching for information, people learn
more about their needs, and their behaviors change accordingly. Li
&amp; Belkin [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] define tasks as "activities people attempt to accomplish in order
to keep their work or life moving on". More generally, a work task is defined as
an activity people complete in order to achieve a work goal, e.g., writing a
report or planning a vacation. A work task is a
motivation for information search, and includes both a) information-seeking tasks and
b) information-search tasks. Information search refers to searching
only through an information system, whereas information seeking
also covers seeking information from other sources, such
as people or printed documents. One important development in IIR evaluation
and experimentation has been the simulated work task, which describes the
situation leading to the information need. The nature of the task that leads a person
to interact with an IRS when searching for information has been
shown to influence the behavior of users during search sessions.
      </p>
      <p>
        In recent years, the characteristics of search tasks have been studied, such
as how different search tasks can be classified, what influences them,
and how they differ according to their attributes. A concrete example is a study
conducted by Wildemuth, Freund, and Toms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] in 2014, in which two attributes
of search tasks are studied and implemented: task complexity and task
difficulty. That work provides a "detailed revision of existing practice in developing
search tasks to test, observe or control" these two attributes because, as the authors
say, "it is not clear if these attributes are mutually exclusive or share some
dimensions, as current definitions have tended to blur the distinction" [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>A New Paradigm of User Study for Evaluating IIR</title>
      <p>This project for the evaluation of IIR systems aims at investigating the following
issues: a) given a search session in response to a motivating task, how would
we evaluate the system support for that search session? b) given an intention
associated with a query segment, how would we evaluate the system support for
that intention? c) can we discover measures for evaluating the contribution of
each query segment to the success of the search session as a whole?
        <fn id="fn1">
          <label>1</label>
          <p>The project was undertaken under the supervision of Prof. Nicholas Belkin at
Rutgers University and Prof. Gabriella Pasi at the University of Milano-Bicocca.</p>
        </fn>
      </p>
      <p>To address these issues, the main practical goal of the work is to develop a
framework for the evaluation of IIR systems. To do this, the following research
questions have to be answered:
1) RQ1: How do people judge the success of a search session?
2) RQ2: How useful was each intention/query segment in accomplishing the
goal of the search session?</p>
      <p>RQ1 concerns how satisfied users are with carrying out
the search task, or how successful they consider their search session to have been.
Specifically, it investigates the kinds of measures that users adopt when
they evaluate the whole search session: what are appropriate measures for
evaluating the system support of the search session? Do different types of
motivating tasks require different evaluation measures?</p>
      <p>RQ2 concerns the usefulness of each intention and
each query segment of a search session in
accomplishing the goal of the search task. Furthermore, it aims at
understanding which measures are appropriate for evaluating the contribution of each
intention/query segment to accomplishing the goal of the search session.</p>
    </sec>
    <sec id="sec-4">
      <title>Research methodology</title>
      <p>In the pilot study, users were required to follow a specific procedure,
whose steps are summarized in Table 1.</p>
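      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Step</th><th>Procedure</th><th>Time</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>Read and sign the consent form</td><td>3 min</td></tr>
            <tr><td>2</td><td>Initial questionnaire</td><td>2 min</td></tr>
            <tr><td>3</td><td>Shown the tutorial about the system</td><td>10 min</td></tr>
            <tr><td>4</td><td>Shown the task and the topic of the search</td><td>3 min</td></tr>
            <tr><td>5</td><td>Second questionnaire</td><td>2 min</td></tr>
            <tr><td>6</td><td>Search, all behaviors are logged</td><td>20 min</td></tr>
            <tr><td>7</td><td>Replay the search, by query segment &amp; annotation of query segments</td><td>40 min</td></tr>
            <tr><td>8</td><td>Search session evaluation and comparison</td><td>12 min</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>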
As shown in Table 1, prior to conducting their searches, subjects were asked
to read and sign a consent form that informed them about the
experiment. Then, searchers completed a brief questionnaire about their
demographic characteristics and their usual searching behaviors. Next, searchers
were given a video tutorial designed to guide them interactively
through the workings of the experimental system. In the next step, users
were shown the task and the topic of the search. Before starting their search,
subjects were asked to familiarize themselves with the topic and the motivating task
and to anticipate how difficult they expected the assignment to be. While
searching, they could save or unsave pages they
considered useful or not useful for accomplishing the task. The search ended when the
allotted time expired or when users felt that the task
was accomplished. After the search was completed, participants were required to
fill in a questionnaire focused on understanding their intentions in each query
segment and the success of each. At the end of the entire searching
experience, subjects participated in a structured post-search interview
designed to elicit confidence, attitudes, strategies, and behaviors directly related
to the success or failure of their search session.</p>
      <sec id="sec-4-1">
        <title>Study Motivating Tasks</title>
        <p>
The task type classification framework proposed by Li &amp; Belkin [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] was used to
construct two motivating tasks for this study. The specific intention in task
construction was to design motivating tasks that differed systematically on several
of the task facets that have been shown to affect search behavior. In
particular, two task types were chosen because they have been shown, in previous work, to
lead to significant differences in search behaviors, including the frequency of search
intentions. We hypothesize that the understanding of success differs between the two tasks.
        </p>
        <p>The motivating tasks used in the study are based on the following Task
Scenario: You are about to plan a vacation with your partner to improve the
personal relationship between you and him/her. You want to do this after the
end of Spring semester, on 18-26 May, when you'll both be free and
can book for a week somewhere, including travel time. Participants carried out
either Task 1 or Task 2, summarized in Table 2.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>Undergraduate students were recruited from Rutgers University to participate
in the study. The age of participants ranged between 18 and 21, and on
average the participants had been conducting online searches for
11 years. All participants rated themselves as experienced searchers in using
search engines (e.g., Google, Bing). Some of them indicated that they were also
experts in searching through social media (e.g., Facebook, Twitter, YouTube) or
within community-based forums
(e.g., Quora, Stack Overflow). However, only one participant rated himself as
an experienced searcher in using other search tools, such as a library database.
Overall, participants were experienced with online information
searching, because they usually search for information online for their everyday
needs (e.g., homework, studies).</p>
      <p>In the first part of the study, most of the participants
were successful with their intentions: in most cases, users managed
to complete the intentions of their query segments, so these intentions were
marked as successful. During the search, however, there were cases in which
participants failed to conclude some intentions positively; these were
labeled as non-successful intentions. In summary, 77% of the intentions
chosen during the search sessions were marked as successful and 15% as
non-successful. The remaining 8% of the intentions were reported neither as successful
nor as non-successful. The reasons
users gave for reformulating their queries can be grouped as
follows: a) the user entered the new query because s/he had been able to find the
best-rated resort according to TripAdvisor, b) the user was still trying to
find information from each of the websites, c) the user wanted to find another
review of the resort besides TripAdvisor, d) the participant found the top resort
in Vietnam and was looking for more information about the resort, e) the user
was trying to load the website for another resort but it would not load, so s/he
moved on to the next resort, which also would not load, f) the user wanted to
obtain details about the best luxury resorts in Malaysia.</p>
      <p>The most important part of this project was to understand
what users mean by the success of a task: what it means to achieve the goal
of the task and conclude the search session positively. For this reason, all users
were asked to provide their own personal definition of "successful".
To the question "What do you mean by successful?", we received several
answers. These ranged from the simplest, in which the user says that s/he
was able to find three websites/resorts in three countries, to more elaborate
ones, in which the user explains that s/he found what s/he was looking for to
the best of his/her ability, or that s/he did not find package pricing but rather the
nightly pricing for each of the resorts and their amenities.</p>
    </sec>
    <sec id="sec-6">
      <title>Discussion of the Results</title>
      <p>The most important outcomes of the study are: a) even with this small sample,
participants made use of almost all the available intentions, which seem to have
been sufficient to describe what the participants wanted to accomplish; b) the
reasons for judging an intention successful or unsuccessful depend
on the intention considered, indicating that different intentions would require different
measures for evaluating the system support. What such measures would be could
not be determined, given the small number of participants, but with more data
it seems possible to infer categories of different measures; c) the reasons
for the success of the search session have to do with the accomplishment of the
task, which means that any possible measure for evaluating the search session
as a whole should be directly related to the type of motivating task. Since there
are two task types in the study, with more data it should be possible to identify,
based on both the reasons given and the reasons for changes of search strategy,
some general evaluation measures for the different tasks; d) the descriptions of
plans or search strategies and the reasons for changing them can clearly be sources for
identifying criteria, and possible measures, for the evaluation of the search session
as a whole.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions and Future Developments</title>
      <p>In the field of Interactive Information Retrieval (IIR), the main goal of this
work was to understand the reasons why people change their queries, what they
consider successful and why, and, more precisely, how people
evaluate the success of a search episode. The limited data obtained in this pilot
study and described in this paper indicate that this is a promising direction
towards defining standard methods and metrics for the evaluation of IIR
systems. To validate our hypothesis and results more completely,
it will be necessary to wait for the conclusion of the project and for the
collection of data from all the participants expected for this project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Li</surname>, <given-names>Yuelin</given-names></string-name> and
          <string-name><surname>Belkin</surname>, <given-names>Nicholas J.</given-names></string-name>:
          <article-title>A faceted approach to conceptualizing tasks in information seeking</article-title>.
          <source>Information Processing &amp; Management</source>, vol.
          <volume>44</volume>, pp.
          <fpage>1822</fpage>–<lpage>1837</lpage>
          (<year>2008</year>).
        </mixed-citation>
      </ref>
      <ref id="ref2">
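        <mixed-citation>
          2.
          <string-name><surname>Cole</surname>, <given-names>Michael</given-names></string-name> and
          <string-name><surname>Liu</surname>, <given-names>Jingjing</given-names></string-name> and
          <string-name><surname>Belkin</surname>, <given-names>Nicholas</given-names></string-name> and
          <string-name><surname>Bierig</surname>, <given-names>R.</given-names></string-name> and
          <string-name><surname>Gwizdka</surname>, <given-names>J.</given-names></string-name> and
          <string-name><surname>Liu</surname>, <given-names>C.</given-names></string-name> and
          <string-name><surname>Zhang</surname>, <given-names>J.</given-names></string-name> and
          <string-name><surname>Zhang</surname>, <given-names>X.</given-names></string-name>:
          <article-title>Usefulness as the criterion for evaluation of interactive information retrieval</article-title>.
          <source>Proc. HCIR</source>, vol.
          <volume>44</volume>, pp.
          <fpage>1</fpage>–<lpage>4</lpage>
          (<year>2009</year>).
        </mixed-citation>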
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Lin</surname>, <given-names>Shinjeng</given-names></string-name> and
          <string-name><surname>Xie</surname>, <given-names>Iris</given-names></string-name>:
          <article-title>Behavioral changes in transmuting multisession successive searches over the web</article-title>.
          <source>Journal of the American Society for Information Science and Technology</source>, vol.
          <volume>64</volume>, pp.
          <fpage>1259</fpage>–<lpage>1283</lpage>
          (<year>2013</year>).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Wildemuth</surname>, <given-names>Barbara</given-names></string-name> and
          <string-name><surname>Freund</surname>, <given-names>Luanne</given-names></string-name> and
          <string-name><surname>Toms</surname>, <given-names>Elaine G.</given-names></string-name>:
          <article-title>Untangling search task complexity and difficulty in the context of interactive information retrieval studies</article-title>.
          <source>Journal of Documentation</source>, vol.
          <volume>44</volume>, pp.
          <fpage>1118</fpage>–<lpage>1140</lpage>
          (<year>2014</year>).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Hienert</surname>, <given-names>Daniel</given-names></string-name> and
          <string-name><surname>Mitsui</surname>, <given-names>Matthew</given-names></string-name> and
          <string-name><surname>Mayr</surname>, <given-names>Philipp</given-names></string-name> and
          <string-name><surname>Shah</surname>, <given-names>Chirag</given-names></string-name> and
          <string-name><surname>Belkin</surname>, <given-names>Nicholas J.</given-names></string-name>:
          <article-title>The role of the task topic in web search of different task types</article-title>.
          <source>Proceedings of the 2018 Conference on Human Information Interaction &amp; Retrieval</source>, pp.
          <fpage>72</fpage>–<lpage>81</lpage>.
          <publisher-name>ACM</publisher-name>
          (<year>2018</year>).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>