<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>From Task-based Evaluation to Feature-based Evaluation in Personal Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sargol Sadeghi</string-name>
          <email>seyedeh.sadeghi@rmit.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Sanderson</string-name>
          <email>mark.sanderson@rmit.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Falk Scholer</string-name>
          <email>falk.scholer@rmit.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science &amp; Information Technology, RMIT University</institution>
          ,
          <addr-line>Melbourne</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Task-based evaluation has been suggested as a solution for comparing search systems in the personal context. However, as personal search tasks are broad, dependent on users, and have different levels of specificity [3], focusing on the building blocks (or characteristics) of these tasks could provide a more reliable and maintainable alternative for evaluation. Moreover, the characteristics can be used to determine to what extent evaluation results are generalizable and comparable across different users and tasks. In this position paper, a characteristic reference model for personal search tasks will be introduced. Based on this model, different search systems can be compared not only in relation to task types, but also in terms of the characteristics that are most influential in search tasks, increasing the level of detail at which comparisons can be made.</p>
      </abstract>
      <kwd-group>
        <kwd>Personal Search</kwd>
        <kwd>Task-based Evaluation</kwd>
        <kwd>Task</kwd>
        <kwd>Search</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.3.3 [Information Storage and Retrieval]: Information Search
and Retrieval</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>
        Providing search solutions to retrieve information that has been
seen previously is the main focus in the personal search context
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. To compare the effectiveness of search systems in the
personal context, identifying common search tasks is of key
importance. For example, Kelly and Teevan [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed building
a shared collection of common tasks instead of studying tasks in
separate research groups. Common tasks for evaluation purposes
have also been suggested in other disciplines such as HCI (Human
Computer Interaction). For instance, Whittaker et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
introduced reference tasks with the goal of comparing interaction
techniques.
      </p>
      <p>
        However, it is challenging to identify common search tasks,
particularly in the personal context, due to the variety of search
needs among different users. Controlling the variety of tasks
under a set of task types was proposed as an approach for
evaluating personal search systems by Elsweiler and Ruthven
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In that study, three task types were identified based on a
search characteristic in order to control the evaluation experiments,
and a task-based evaluation was conducted in which search systems
were compared in relation to the search tasks. However, because
task-based evaluation focuses on specific task scenarios, it has the
disadvantage that the acquired results cannot be generalized [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
even though solving the problems of task-based evaluation and
developing new types of evaluation has been highlighted as a
priority [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
To overcome this problem, we propose to incorporate the
underlying characteristics of tasks. These characteristics, being
more general in nature, can support the identification of
commonalities across different tasks in terms of their components.
For this purpose, we introduce a characteristic reference model in
the next section.
      </p>
      <p>Presented at EuroHCIR2012. Copyright © 2012 for the individual
papers by the papers' authors. Copying permitted only for private and
academic purposes. This volume is published and copyrighted by its
editors.</p>
    </sec>
    <sec id="sec-3">
      <title>2. CHARACTERISTIC REFERENCE MODEL</title>
      <p>With the focus on search characteristics to compare personal
search systems, first we must acquire knowledge about the range
of characteristics that can affect the retrieval process. Based on
these characteristics, we can then identify similar tasks, which
have common search characteristics. This notion of explicit
similarity supports a fair comparison of search systems in relation
to the user tasks.</p>
      <p>However, it is also possible to define implicit similarity between
tasks. Here, tasks do not necessarily share the same set of
characteristics, but their characteristics have been demonstrated to
have the same effect on the retrieval process. Consider the
following simple example of the implicit similarity concept.
From pilot user studies that we have conducted with the aim of
identifying different types of personal search tasks, the user’s
level of knowledge in relation to the target information and task
has been observed as a search characteristic influential in retrieval
results. Based on this characteristic, we proposed a hierarchy of
personal task types for level of knowledge, as shown in Figure 1.</p>
      <p>[Figure 1: Hierarchy of personal task types by the user’s level of
knowledge; node labels: Personal Tasks, Known, Unknown, Remembered,
Not-remembered, Seen, Unseen.]</p>
      <p>In the proposed task hierarchy, for example, the user’s state of
knowledge might be that the target information is unknown, where
the user does not know whether the required information item
exists. Another possibility is that the user is searching for an
information item that they know exists and have seen before, but
is currently not-remembered.</p>
      <p>In our observations of users, there are situations where user search
behavior for not-remembered tasks is the same as for unknown
tasks. For example, one of these situations is when the last access
time to the information is prior to last month; here, the user does
not know how to get to the information.</p>
      <p>
        In the literature, the time of last access to required information has
been called the task temperature. For this search characteristic,
three values of hot (accessed within the last week), warm
(accessed within the last month), and cold (accessed prior to the
last month) have been suggested [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Based on this observation
and from the gathered characteristics and values, it is possible to
derive a simple rule as an example of implicit task similarity,
illustrated in Figure 2.
      </p>
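      <p>As a concrete reading of these threshold values, the temperature characteristic can be sketched as a small function. This is an illustrative encoding only, not an implementation from the cited work; the function name and the use of days as the unit are assumptions.</p>

```python
# Illustrative sketch: map the time since last access to the task
# "temperature" values of hot, warm, and cold suggested in [1].
# The name task_temperature and the day-based unit are assumptions.
def task_temperature(days_since_last_access):
    if days_since_last_access > 30:
        return "cold"   # accessed prior to the last month
    if days_since_last_access > 7:
        return "warm"   # accessed within the last month
    return "hot"        # accessed within the last week
```

      <p>Under this sketch, for example, an item last accessed 60 days ago would be classified as cold.</p>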
      <p>[Figure 2: An example rule for implicit task similarity.
If: Task A is of type not-remembered and has cold temperature, and
Task B is of type unknown.
Then: Task A similar to Task B.]</p>
      <p>
From Figure 2, it can be seen that if there are two task scenarios
identified under two different types (e.g. unknown and
not-remembered), in some situations (e.g. cold temperature) they
could have a similar effect on the retrieval process. In other
words, it is possible that tasks which are in fact highly similar can
occur under different task types. Such relationships have not been
considered in task-based evaluations, where the focus is on
specific task scenarios.</p>
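      <p>The rule of Figure 2 can be phrased as executable pseudocode. The sketch below is a hedged encoding of that example, assuming tasks are represented as dictionaries of Characteristic: Value settings; the keys "type" and "temperature" are illustrative names, not the paper’s.</p>

```python
# Hedged sketch of the example rule in Figure 2: a not-remembered task
# that has gone "cold" behaves like an unknown task, so the two task
# scenarios count as implicitly similar. Key names are assumptions.
def implicitly_similar(task_a, task_b):
    a_cold_not_remembered = (task_a.get("type") == "not-remembered"
                             and task_a.get("temperature") == "cold")
    b_unknown = task_b.get("type") == "unknown"
    return a_cold_not_remembered and b_unknown
```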
      <p>
        The previous scenario is a simple example; more realistically, it is
likely that many different characteristics affect search tasks, in
terms of: user, search need, search strategy, search context,
information, and the collection of information. Deriving
comprehensive rules for task similarities requires extensive user
studies in both qualitative and quantitative aspects. We intend to
extrapolate a set of rules composed of Characteristic: Value
settings, as a reference model for identifying similar tasks.
In building this reference model, we need to further explore:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>the key characteristics that are influential in a search task</p>
        </list-item>
        <list-item>
          <p>interdependencies between characteristics</p>
        </list-item>
        <list-item>
          <p>the importance of characteristics in affecting retrieval results</p>
        </list-item>
      </list>
      <p>
Such a model will incorporate the characteristics proposed when
studying tasks in different search applications (such as the goal of
the user, task complexity, and topic familiarity [
        <xref ref-type="bibr" rid="ref2 ref4 ref6">2, 4, 6</xref>
        ], in both
work task and search task aspects), as these are potentially
applicable in the personal context. Characteristic settings will be
derived by observing real task scenarios and mapping how search
characteristics affect search tasks. In this mapping, we consider
the interactions of characteristics.
      </p>
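      <p>One way to picture the intended reference model is as a rule set over Characteristic: Value settings, checked alongside explicit similarity. The following sketch is an assumption about how such rules might be represented, not a specification from this paper; all names are illustrative.</p>

```python
# Hedged sketch: a characteristic reference model as a list of rules,
# each pairing two Characteristic: Value patterns whose tasks are
# implicitly similar. Rule content and key names are assumptions.
SIMILARITY_RULES = [
    # (pattern for one task, pattern for the other task)
    ({"type": "not-remembered", "temperature": "cold"}, {"type": "unknown"}),
]

def matches(task, pattern):
    # True when every Characteristic: Value setting in the pattern
    # holds for the task.
    return all(task.get(k) == v for k, v in pattern.items())

def similar(task_a, task_b):
    # Explicit similarity: identical characteristic settings.
    if task_a == task_b:
        return True
    # Implicit similarity: some rule pairs the two tasks (either way round).
    return any((matches(task_a, pa) and matches(task_b, pb))
               or (matches(task_a, pb) and matches(task_b, pa))
               for pa, pb in SIMILARITY_RULES)
```

      <p>In such a scheme, tasks recorded in existing studies could be checked against the rule set to select comparable evaluation tasks.</p>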
      <p>Based on this characteristic reference model, similar tasks can be
either created from scratch, or selected from the recorded tasks in
current studies where characteristic details are available. Search
systems can then be compared in relation to explicitly or
implicitly similar tasks. The advantage of using this model is not
limited to enriching the comparability of personal search
systems and the generalizability of comparison results; it can
also lead to a complementary evaluation approach, where
assessing the effect of one characteristic on the performance of
search systems is important.</p>
    </sec>
    <sec id="sec-5">
      <title>3. CONCLUSION</title>
      <p>In this paper, we proposed a characteristic reference model for
evaluating personal search systems. As there are a variety of tasks
in the personal context, this model is based on identifying
building blocks, and how they affect search tasks. This approach
will enable better control and comparability across different users
and tasks, rather than focusing on specific instances of tasks as is
currently done in task-based evaluation. Focusing on these
characteristics not only facilitates detailed, task-based comparison
of search systems, but also supports evaluating how individual
characteristics affect the effectiveness of search systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Elsweiler</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Ruthven</surname>
          </string-name>
          .
          <article-title>Towards task-based personal information management evaluations</article-title>
          .
          <source>In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>30</lpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ingwersen</surname>
          </string-name>
          .
          <article-title>Selected variables for IR interaction in context: Introduction to IRiX SIGIR 2005 workshop</article-title>
          .
          <source>In Proceedings of the ACM SIGIR 2005 Workshop on Information Retrieval in Context (IRiX)</source>
          , pages
          <fpage>6</fpage>
          -
          <lpage>9</lpage>
          . Citeseer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kelly</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Teevan</surname>
          </string-name>
          .
          <article-title>Understanding what works: Evaluating PIM tools</article-title>
          .
          <source>Personal Information Management, chapter 11, page 190</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Soergel</surname>
          </string-name>
          .
          <article-title>Selecting and measuring task characteristics as independent variables</article-title>
          .
          <source>Proceedings of the American Society for Information Science and Technology</source>
          ,
          <volume>42</volume>
          (
          <issue>1</issue>
          )
          :n/a,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kraaij</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Post</surname>
          </string-name>
          .
          <article-title>Task based evaluation of exploratory search systems</article-title>
          .
          <source>In Proc. of SIGIR 2006 Workshop, Evaluation Exploratory Search Systems</source>
          , Seattle, USA, pages
          <fpage>24</fpage>
          -
          <lpage>27</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Belkin</surname>
          </string-name>
          .
          <article-title>An exploration of the relationships between work task and interactive information search behavior</article-title>
          .
          <source>JASIST</source>
          ,
          <volume>61</volume>
          (
          <issue>9</issue>
          ):
          <fpage>1771</fpage>
          -
          <lpage>1789</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Whittaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Terveen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Nardi</surname>
          </string-name>
          .
          <article-title>Let's stop pushing the envelope and start addressing it: a reference task agenda for hci</article-title>
          .
          <source>Human-Computer Interaction</source>
          ,
          <volume>15</volume>
          (
          <issue>2</issue>
          ):
          <fpage>75</fpage>
          -
          <lpage>106</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Elsweiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Toucedo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          .
          <article-title>Seeding simulated queries with user-study data for personal search evaluation</article-title>
          .
          <source>In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, SIGIR '11</source>
          , pages
          <fpage>25</fpage>
          -
          <lpage>34</lpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          .
          <article-title>IR research: systems, interaction, evaluation and theories</article-title>
          .
          <source>In ACM SIGIR Forum</source>
          , volume
          <volume>45</volume>
          , pages
          <fpage>17</fpage>
          -
          <lpage>31</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>