<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Information Research</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>User Evaluation of Multidimensional Relevance Assessment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Célia da Costa Pereira</string-name>
          <email>pereira@dti.unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Dragoni</string-name>
          <email>dragoni@dti.unimi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriella Pasi</string-name>
          <email>pasi@disco.unimib.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Milano</institution>
          ,
          <addr-line>Dipartimento di Tecnologie dell'Informazione, Via Bramante 65, I-26013, Crema (CR)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi di Milano</institution>
          ,
          <addr-line>Dipartimento di Tecnologie dell'Informazione, Via Bramante 65, I-26013, Crema (CR)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università degli Studi di Milano Bicocca, Dipartimento di Informatica, Sistemistica e Comunicazione</institution>
          ,
          <addr-line>Viale Sarca, 336, I-20126, Milano (MI)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <volume>8</volume>
      <issue>3</issue>
      <fpage>27</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>In this paper a user evaluation is proposed to assess the effectiveness of systems based on multidimensional relevance assessment. First of all, we introduce our approach to multidimensional modeling and aggregation, and the criteria used for the experiments. Then, we describe how the user evaluation has been performed, and finally, we discuss the results obtained.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        In the first traditional approaches to Information Retrieval
(IR), relevance was modeled as “topicality”, and its numeric
assessment was based on the matching function of the adopted IR model
(Boolean, vector space, probabilistic, or fuzzy model). However,
relevance is, by its very nature, the result of several components or
dimensions. Cooper [2] can be considered one of the first
researchers to have had intuitions about the multidimensional
nature of the concept of relevance. He defined relevance as
topical relevance with utility. Mizzaro, who has written an
interesting article on the history of relevance [8], proposed a
relevance model in which relevance is represented as a
four-dimensional relationship between an information resource
(surrogate, document, and information) and a representation of the
user's problem (query, request, real information need, and perceived
information need). A further judgment is made according to the topic,
task, or context, at a particular point in time. The dimensions
pointed out by Mizzaro are in line with the five manifestations of
relevance suggested by Saracevic [10]: system or algorithmic
relevance, topical or subject relevance, cognitive relevance or
pertinence, situational relevance or utility, and motivational or
affective relevance. However, the concept of dimension used in this
paper, which is similar to that used by Xu and Chen in [<xref ref-type="bibr" rid="ref4">12</xref>],
is somewhat different from that used by Mizzaro and Saracevic. They
defined several kinds of relevance and called them dimensions of
relevance, while we define relevance as a concept of concepts, i.e.,
as a point in an n-dimensional space composed of n criteria. The
document score is then the result of a particular combination of
those n space components, as explained in [3, 4].
      </p>
      <p>One of the problems raised by considering relevance as a
multidimensional property of documents is how to aggregate
the related relevance scores. In [3, 4] an approach for
prioritized aggregation of multidimensional relevance has been
proposed. The proposed aggregation scheme is user-dependent:
a user can be differently interested in each dimension.
The computation of the overall relevance score to be
associated with each retrieved document is then based on the
aggregation of the scores representing the satisfaction of the
considered dimensions. A problem raised by this new
approach is how to evaluate its effectiveness, since there is
no test collection suited to evaluating such a model. In this
paper, we first recall the models for aggregating the evaluations
of multiple dimensions for relevance assessment presented in
[3] and [4]. We then focus on observing how document rankings
are modified after applying the two operators to
different typologies of users (i.e., different orderings of the dimensions).</p>
      <p>The paper is organized as follows. Section 2 recalls the aggregation models used in the paper. Section 3 presents the performed user evaluation and, finally, Section 4 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. PRIORITIZED MULTICRITERIA AGGREGATION</title>
      <p>In this section, after a brief background on the representation of a multicriteria decision making problem, two prioritized approaches for aggregating distinct relevance assessments are briefly presented.</p>
    </sec>
    <sec id="sec-4">
      <title>2.1 Problem Representation</title>
      <p>The presented multicriteria decision making approaches
have the following components:</p>
      <list list-type="bullet">
        <list-item><p>the set C of the n considered criteria, C = {C<sub>1</sub>, …, C<sub>n</sub>}, with C<sub>i</sub> being the function evaluating the ith criterion;</p></list-item>
        <list-item><p>the collection of documents D;</p></list-item>
        <list-item><p>an aggregation function F to calculate, for each document d ∈ D, a score F(C(d)) = RSV(d) on the basis of the evaluation scores of the considered criteria. Here F(C(d)) is shorthand for F(C<sub>1</sub>(d), …, C<sub>n</sub>(d)).</p></list-item>
      </list>
      <p>C<sub>j</sub>(d) represents the satisfaction score of document d
with respect to criterion j. The weight associated with
each criterion C<sub>i</sub> ∈ C, with i ≠ 1, is document- and
user-dependent. It depends on the preference order of C<sub>i</sub> for the
user, and also on both the weight associated with criterion
C<sub>i−1</sub> and the satisfaction degree of the document with
respect to C<sub>i−1</sub>.<sup>2</sup> Formally, if we consider document d, each
criterion C<sub>i</sub> has an importance λ<sub>i</sub> ∈ [0, 1].</p>
      <p>Notice that different users can have different preference orders over the criteria and, therefore, it is possible to obtain different importance weights for the same document for different users.</p>
      <p>We suppose that C<sub>i</sub> ≻ C<sub>j</sub> if i &lt; j. This is just a
representational convention which means that the most preferred
criteria have lower indexes.</p>
      <p>We suppose that:</p>
      <list list-type="bullet">
        <list-item><p>for each document d, the weight of the most important criterion C<sub>1</sub> is set to 1, i.e., by definition we have ∀d, λ<sub>1</sub> = 1;</p></list-item>
        <list-item><p>the weights of the other criteria C<sub>i</sub>, i ∈ [2, n], are calculated as follows:
          <disp-formula><label>(1)</label><tex-math>\lambda_i = \lambda_{i-1} \cdot C_{i-1}(d),</tex-math></disp-formula>
          where C<sub>i−1</sub>(d) is the degree of satisfaction of criterion C<sub>i−1</sub> by document d, and λ<sub>i−1</sub> is the importance weight of criterion C<sub>i−1</sub>.</p></list-item>
      </list>
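      <p>The weight recurrence of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name <monospace>priority_weights</monospace> and the example scores are hypothetical, and the satisfaction degrees are assumed to be already sorted from the most to the least preferred criterion.</p>
      <preformat><![CDATA[
```python
from typing import List


def priority_weights(scores: List[float]) -> List[float]:
    """Importance weights lambda_1, ..., lambda_n for one document.

    `scores` holds the satisfaction degrees C_1(d), ..., C_n(d) in [0, 1],
    ordered from the most preferred criterion to the least preferred one.
    """
    weights: List[float] = []
    for index in range(len(scores)):
        if index == 0:
            # The most important criterion always has weight 1.
            weights.append(1.0)
        else:
            # lambda_i = lambda_{i-1} * C_{i-1}(d)
            weights.append(weights[index - 1] * scores[index - 1])
    return weights


# Hypothetical satisfaction degrees of one document for four criteria.
example_scores = [0.5, 0.5, 1.0, 0.25]
print(priority_weights(example_scores))  # [1.0, 0.5, 0.25, 0.25]
```
]]></preformat>
      <p>In this sketch a poorly satisfied high-priority criterion shrinks the weight of every criterion that follows it, which is the behavior the recurrence is meant to capture.</p>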
    </sec>
    <sec id="sec-5">
      <title>2.2 The Prioritized Scoring Model</title>
      <p>This operator allows us to calculate the overall score value
from several criteria, where the weight of each criterion
depends both on the weights and on the satisfaction degrees
of the more important criteria: the higher the satisfaction
degree of a more important criterion, the more the
satisfaction degree of a less important criterion influences the
overall score.</p>
      <p>The operator F<sub>s</sub> is defined as follows: F<sub>s</sub> : [0, 1]<sup>n</sup> → [0, n], and
it is such that, for any document d,
        <disp-formula><label>(2)</label><tex-math>F_s(C_1(d), \ldots, C_n(d)) = \sum_{i=1}^{n} \lambda_i \cdot C_i(d).</tex-math></disp-formula>
      </p>
      <p>The RSV<sub>s</sub> of the alternative d is then given by:
        <disp-formula><label>(3)</label><tex-math>RSV_s(d) = F_s(C_1(d), \ldots, C_n(d)).</tex-math></disp-formula>
      </p>
      <p>Formalizations and properties of this operator are presented in [3].</p>
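      <p>As a hedged illustration of the scoring operator, the following Python sketch computes the weighted sum of the satisfaction degrees C<sub>i</sub>(d), with λ<sub>1</sub> = 1 and each later weight equal to the previous weight times the previous satisfaction degree. The function name <monospace>prioritized_scoring</monospace> and the numeric scores are assumptions made for the example; the input is assumed to be ordered from the most to the least preferred criterion.</p>
      <preformat><![CDATA[
```python
from typing import List


def prioritized_scoring(scores: List[float]) -> float:
    """F_s: sum over i of lambda_i * C_i(d).

    lambda_1 = 1 and lambda_i = lambda_{i-1} * C_{i-1}(d).
    `scores` must be ordered from the most to the least preferred criterion.
    """
    total = 0.0
    weight = 1.0  # lambda_1
    for score in scores:
        total += weight * score
        weight *= score  # weight of the next, less preferred criterion
    return total


# Hypothetical satisfaction degrees, most preferred criterion first.
document_scores = [0.5, 0.5, 1.0, 0.25]
print(prioritized_scoring(document_scores))          # 1.0625
print(sum(document_scores) / len(document_scores))   # 0.5625, the plain average
# The same degrees under the reversed preference order give another score.
print(prioritized_scoring(document_scores[::-1]))    # 0.6875
```
]]></preformat>
      <p>The last two lines show why the result is user-dependent: the average ignores the preference order, whereas the prioritized score changes when the same degrees are assigned to criteria in a different order.</p>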
    </sec>
    <sec id="sec-6">
      <title>2.3 The Prioritized “min” Operator</title>
      <p>In this section a prioritized “min” (or “and”) operator is
recalled [4]. This operator computes the overall
satisfaction degree for a user whose overall satisfaction
is strongly dependent on the degree of the least satisfied
criterion. The peculiarity of such an operator, which also
distinguishes it from the traditional “min” operator, is that
the extent to which the least satisfied criterion is considered
depends on its importance for the user. If it is not important
at all, its satisfaction degree should not be considered, while
if it is the most important criterion for the user, only its
satisfaction degree is considered. This way, if we consider a
document d for which the least satisfied criterion C<sub>k</sub> is also
the least important one, the overall satisfaction degree will
be greater than C<sub>k</sub>(d); it will not be C<sub>k</sub>(d), as would be the
case with the traditional “min” operator: the less
important the criterion, the lower its chances of representing the
overall satisfaction degree.</p>
      <p>The aggregation operator F<sub>m</sub> is defined as follows. F<sub>m</sub> : [0, 1]<sup>n</sup> → [0, 1] is such that, for every document d,
        <disp-formula><label>(4)</label><tex-math>F_m(C_1(d), \ldots, C_n(d)) = \min_{i=1,\ldots,n} \left( \{C_i(d)\}^{\lambda_i} \right).</tex-math></disp-formula>
      </p>
      <p>Formalizations and properties of this operator are presented in [4].</p>
      <p><sup>2</sup> If more than one criterion has the same priority
order, the average weight and the average satisfaction degree
are considered.</p>
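      <p>The prioritized “min” operator can be sketched in the same style. The code below is a minimal illustration, assuming the satisfaction degrees are ordered from the most to the least preferred criterion and that the weights follow the recurrence λ<sub>1</sub> = 1, λ<sub>i</sub> = λ<sub>i−1</sub> · C<sub>i−1</sub>(d); the function name <monospace>prioritized_min</monospace> and the example values are hypothetical.</p>
      <preformat><![CDATA[
```python
from typing import List


def prioritized_min(scores: List[float]) -> float:
    """F_m: minimum over i of C_i(d) raised to the power lambda_i.

    lambda_1 = 1 and lambda_i = lambda_{i-1} * C_{i-1}(d).
    `scores` must be ordered from the most to the least preferred criterion.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    weight = 1.0  # lambda_1
    result = 1.0  # neutral element of min on [0, 1]
    for score in scores:
        result = min(result, score ** weight)
        weight *= score  # weight of the next, less preferred criterion
    return result


# Hypothetical case: the least satisfied criterion (0.25) is also the
# least important one, so it is discounted: 0.25 ** 0.5 is about 0.5.
print(prioritized_min([1.0, 0.5, 0.25]))
# When the least satisfied criterion is the most important one,
# the result coincides with the traditional min.
print(prioritized_min([0.25, 1.0, 1.0]))
```
]]></preformat>
      <p>The first call returns a value larger than the plain minimum of the three degrees, which matches the behavior described in the text for a least satisfied criterion that is also the least important.</p>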
    </sec>
    <sec id="sec-7">
      <title>3. USER EVALUATION OF THE PRIORITIZED AGGREGATION OPERATORS</title>
      <p>In [3, 4] the proposed approach for prioritized aggregation
of the considered relevance dimensions has been applied to
personalized IR without loss of generality. The considered
personalized approach relies on four relevance dimensions:
aboutness, coverage, appropriateness, and reliability.
Aboutness is computed as the similarity between the
document vector and the query vector. The scores of the
coverage and appropriateness criteria are computed based on
the similarity between the document vector and a vector of terms
representing the user profile. Reliability
represents the degree to which the user trusts the source from which
the document comes.</p>
    </sec>
    <sec id="sec-9">
      <title>3.1 Preliminary Assumptions</title>
      <p>The prioritized aggregation approach is based on the
user's indication (either explicit or implicit) of the
importance order of the relevance dimensions. In [3, 4] different user
behaviors have been described. A user who
formulates a query with the idea of locating documents which
are about the query and which also cover all his interests,
and who at the same time does not care whether the
document also focuses on additional topics, can be
called a “coverage seeker”. If, on the contrary, the user's intent
is to privilege documents which perfectly fit his interests, the
user is called an “appropriateness seeker”.</p>
      <p>A user who formulates a query which
has no intersection with his interests, or a user who does not
have a defined list of interests (“interest neutral”), will not
give any importance to the coverage and appropriateness
criteria. Users of this kind are just looking for a satisfactory
answer to their current concern, as expressed by their query.</p>
      <p>Finally, users who are cautious about the trustworthiness of the origin of the retrieved documents (“cautious”) will give more importance to the reliability criterion than to the others.</p>
      <p>For example, coverage seeker users can be defined as follows:</p>
      <p>CARAp: coverage ≻ aboutness ≻ reliability ≻ appropriateness.</p>
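      <p>An ordering code such as CARAp can be turned into an explicit list of criteria, most preferred first. The sketch below is illustrative only: the mapping of the abbreviations A, C, Ap, and R to the four dimensions is read off the CARAp example in the text, while the names <monospace>CRITERIA</monospace> and <monospace>parse_ordering</monospace> are hypothetical.</p>
      <preformat><![CDATA[
```python
from typing import List

# Abbreviations of the four relevance dimensions, as used in ordering codes.
CRITERIA = {
    "A": "aboutness",
    "C": "coverage",
    "Ap": "appropriateness",
    "R": "reliability",
}


def parse_ordering(code: str) -> List[str]:
    """Split a code such as "CARAp" into criteria names, most preferred first."""
    names: List[str] = []
    position = 0
    while position != len(code):
        # Try the two-letter token first so that "Ap" is not read as "A".
        token = code[position:position + 2]
        if token not in CRITERIA:
            token = code[position]
        if token not in CRITERIA:
            raise ValueError(f"unknown criterion in ordering: {code!r}")
        names.append(CRITERIA[token])
        position += len(token)
    return names


print(parse_ordering("CARAp"))
# ['coverage', 'aboutness', 'reliability', 'appropriateness']
```
]]></preformat>
      <p>The resulting list can be used to sort a document's satisfaction degrees before a prioritized operator is applied, so that the same document is scored differently for different user typologies.</p>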
    </sec>
    <sec id="sec-10">
      <title>3.2 Experiments</title>
      <p>In this section, the impact of the proposed prioritized
aggregation operators in the personalized IR setting is
evaluated. In Section 3.2.1 we present the settings used to
perform the experiments, while in Section 3.2.2 we discuss the
obtained results.</p>
      <p><bold>3.2.1</bold></p>
      <p>The traditional way to evaluate an information retrieval
system is based on a test collection composed of a
document collection, a set of queries, and a set of relevance
judgments which classify each document as being relevant or
not for each query. Precision and recall are then computed
to evaluate the effectiveness of the system. Unfortunately,
there is no test collection suited to evaluating a system
based on approaches like the one proposed in this paper. It
is important to notice that, in the case of a user-independent
aggregation of the multiple numeric relevance assessments,
a traditional system evaluation could be applied. In fact, if
for example the single assessment scores are aggregated by
a mean operator, the system produces the same result
for the same query and the same document, independently of
the user. When applying the prioritized
aggregation that we have proposed, the same document evaluated
with respect to the same query can produce distinct
assessment scores depending on the adopted prioritization scheme,
which is user-dependent.</p>
      <p>The evaluation approach proposed in this paper is based
on an analysis of how document rankings are modified
according to the prioritized aggregations associated with the
user typologies that we have identified in Section 3.1.</p>
      <p>The relevance criteria and their aggregation discussed in
the previous sections have been implemented on top of the
well-known Apache Lucene open-source API.<sup>3</sup> The Reuters
RCV1 Collection (over 800,000 documents) has been used.</p>
      <p>The method that we have used to generate both queries
and user profiles has been inspired by the approach
presented by Sanderson in [9]. In that work the author presents
a method to perform simple IR evaluations by using the
Reuters collection, which has neither queries nor relevance
judgments, but has one or more subject codes associated
with each document.</p>
      <p>He splits the collection into two parts, a query set “Q” and
a test set “T”, and documents are randomly assigned to one
of the two subsets. Then, all subject codes are grouped in a
set “S”. For each subject code s<sub>x</sub>, all documents tagged with
the subject code s<sub>x</sub> are extracted from the set “Q”. From
these documents, the pairs (word, weight) are generated to
create a query. Then, the query is performed on the set “T”.
The precision/recall curves are calculated by considering as
relevant the documents that contain the subject code s<sub>x</sub>.</p>
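      <p>The split-and-query procedure attributed to Sanderson can be sketched as follows. This is a simplified illustration, not the procedure of [9] in detail: documents are modeled as (tokens, subject codes) pairs, the 50/50 assignment probability and the use of relative term frequencies as query weights are assumptions of the sketch, and all function names are hypothetical.</p>
      <preformat><![CDATA[
```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Document = Tuple[List[str], List[str]]  # (tokens, subject codes)


def split_collection(
    documents: List[Document], seed: int = 0
) -> Tuple[List[Document], List[Document]]:
    """Randomly assign each document to the query set Q or the test set T."""
    rng = random.Random(seed)
    query_set: List[Document] = []
    test_set: List[Document] = []
    for document in documents:
        target = query_set if rng.random() >= 0.5 else test_set
        target.append(document)
    return query_set, test_set


def build_queries(
    query_set: List[Document], top_k: int = 3
) -> Dict[str, List[Tuple[str, float]]]:
    """For each subject code, build a query of (word, weight) pairs from Q.

    Weights are relative term frequencies (an assumption of this sketch).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens, codes in query_set:
        for code in codes:
            counts[code].update(tokens)
    queries: Dict[str, List[Tuple[str, float]]] = {}
    for code, counter in counts.items():
        total = sum(counter.values())
        queries[code] = [
            (word, count / total) for word, count in counter.most_common(top_k)
        ]
    return queries


def relevant_documents(test_set: List[Document], code: str) -> List[Document]:
    """Documents of T that carry the subject code are treated as relevant."""
    return [document for document in test_set if code in document[1]]


# Tiny hypothetical query set with two subject codes.
toy_query_set = [
    (["shuttle", "launch", "shuttle"], ["space"]),
    (["shuttle", "mir"], ["space"]),
    (["gene", "drug"], ["biotech"]),
]
print(build_queries(toy_query_set, top_k=1))
```
]]></preformat>
      <p>Precision and recall for a subject code would then be computed by running its query against T and comparing the result list with the output of <monospace>relevant_documents</monospace>.</p>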
      <p>We have been inspired by Sanderson's approach to build
both the queries and the user profiles. The queries have
been created as described above. The user profiles
have been created in the following way. The set “Q”
has been split into different subsets based on the subject code
of each document (e.g., “sport”, “science”, “economy”, etc.).
Each subset of “Q” represents the set of documents known
by the users interested in that particular topic. For
example, the subset that contains all documents tagged with the
subject code “sport” represents the set of documents known
by the users interested in sports.</p>
      <p>We have indexed each subset of “Q” and, for each created
index, we have calculated the TF-IDF of each term. Then,
we have computed a normalized ranking of these terms and
we have extracted the most significant ones. The TF-IDF of
each term represents the interest degree of that term in the
profile, that is, how well the term represents the user's interests.</p>
      <p><sup>3</sup> See URL http://lucene.apache.org/.</p>
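      <p>The profile construction can be illustrated with a short Python sketch. The text does not state which TF-IDF variant or normalization was used, so the formula below (raw term frequency times a smoothed logarithmic IDF, divided by the largest weight so that the top term gets degree 1) is an assumption, as are the function name <monospace>build_profile</monospace> and the toy documents.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter
from typing import Dict, List


def build_profile(documents: List[List[str]], top_k: int = 15) -> Dict[str, float]:
    """Return the top_k terms of a topic subset with interest degrees in [0, 1].

    Each document is a list of tokens. Weights are TF-IDF values computed
    inside the subset and divided by the maximum weight.
    """
    term_frequency: Counter = Counter()
    document_frequency: Counter = Counter()
    for tokens in documents:
        term_frequency.update(tokens)
        document_frequency.update(set(tokens))
    n_docs = len(documents)
    weights = {
        term: tf * (1.0 + math.log(n_docs / document_frequency[term]))
        for term, tf in term_frequency.items()
    }
    if not weights:
        return {}
    top_weight = max(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {term: weight / top_weight for term, weight in ranked[:top_k]}


# Hypothetical subset of Q for one subject code.
toy_subset = [
    ["gene", "disease", "gene"],
    ["disease", "drug"],
    ["gene", "clone"],
]
profile = build_profile(toy_subset, top_k=3)
print(profile)
```
]]></preformat>
      <p>The returned dictionary plays the role of the profile vector: it can be compared with a document vector in the same way as a query.</p>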
      <p>An example of a user profile is illustrated in Table 1. For
example, the users associated with the “BIOTECH” profile
have, with respect to the term “disease”, an interest degree
of 0.419. Each profile is viewed as a long-term information
need; therefore, it is treated in the same way as documents
or queries.</p>
      <p>To study the behavior of the system, we have carried out
a user evaluation as proposed in [1] [<xref ref-type="bibr" rid="ref1">5</xref>] [<xref ref-type="bibr" rid="ref2">6</xref>].
The user evaluation described in this paper has been inspired by the one suggested in [7], which simply consists of a procedure in which a set of at least 6 users performs a set of at least 6 queries.</p>
      <p>In these experiments we have considered eight users with
eight different profiles, each one associated with a subset of
“Q” (Table 2).</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th colspan="2">BIOTECH</th></tr>
            <tr><th>Term</th><th>Interest degree</th></tr>
          </thead>
          <tbody>
            <tr><td>scientist</td><td>1.000</td></tr>
            <tr><td>researcher</td><td>0.563</td></tr>
            <tr><td>disease</td><td>0.419</td></tr>
            <tr><td>cancer</td><td>0.410</td></tr>
            <tr><td>human</td><td>0.406</td></tr>
            <tr><td>gene</td><td>0.402</td></tr>
            <tr><td>study</td><td>0.386</td></tr>
            <tr><td>clone</td><td>0.281</td></tr>
            <tr><td>animal</td><td>0.279</td></tr>
            <tr><td>planet</td><td>0.267</td></tr>
            <tr><td>patient</td><td>0.260</td></tr>
            <tr><td>brain</td><td>0.259</td></tr>
            <tr><td>people</td><td>0.254</td></tr>
            <tr><td>experiment</td><td>0.249</td></tr>
            <tr><td>drug</td><td>0.247</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The aims of these experiments are to verify that: (i) when
a user performs queries in line with his interests, applying
a prioritized aggregation operator makes the system produce
an improved ranking with respect to the one produced by
simply averaging the scores; and (ii) when a user performs
queries that are not in line with his interests, applying a
prioritized aggregation operator does not decrease the quality
of the produced ranking with respect to the situation in which
the prioritized aggregation operators are not applied.</p>
      <p>Two kinds of queries have been considered: those which
are in line with the interests contained in the user's profile,
Q<sub>i</sub>, and those which are not in line with the interests
contained in the user's profile, Q<sub>n</sub>. Table 2 illustrates the set Q<sub>i</sub>
and shows the associations between the user profiles and
the performed queries. In these preliminary experiments
only one query has been generated for each user. For
instance, for User 1, the set Q<sub>i</sub> is composed only of the query
Q1, while the set Q<sub>n</sub> is composed of all the other queries,
from Q2 to Q8. For User 2, the set Q<sub>i</sub> is composed only of the query
Q2, while the set Q<sub>n</sub> is composed of the query Q1 and the
queries from Q3 to Q8, and so on for the other users.</p>
      <table-wrap>
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th>User</th><th>Profile Name</th><th>Query</th></tr>
          </thead>
          <tbody>
            <tr><td>User 1</td><td>SPACE</td><td>Q1: “space shuttle missions”</td></tr>
            <tr><td>User 2</td><td>BIOTECH</td><td>Q2: “drug disease”</td></tr>
            <tr><td>User 3</td><td>HITECH</td><td>Q3: “information technology”</td></tr>
            <tr><td>User 4</td><td>CRIMINOLOGY</td><td>Q4: “police arrest sentence fraud”</td></tr>
            <tr><td>User 5</td><td>DEFENSE</td><td>Q5: “russia military navy troops”</td></tr>
            <tr><td>User 6</td><td>DISASTER</td><td>Q6: “flood earthquake hurricane”</td></tr>
            <tr><td>User 7</td><td>FASHION</td><td>Q7: “collection italian versace”</td></tr>
            <tr><td>User 8</td><td>SPORT</td><td>Q8: “premiership league season score”</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>When a user submits a query, the matching between the
query vector and each document vector is made first
(aboutness); then, for each document, the coverage and the
appropriateness criteria are evaluated by comparing the document
vector with the user's profile vector. Finally, the value of
the reliability criterion, which corresponds to the degree to
which the user trusts the source from which the document
comes, is taken into account. These are the values to be
aggregated: aboutness, coverage, appropriateness, and
reliability.</p>
      <p>The evaluation of the produced ranking is made by the eight real users who used the system. Each user analyzed the top 10 documents returned by the system and assessed, for each document, whether it is relevant or not.</p>
      <p><bold>3.2.2 Discussion of the Results</bold></p>
      <p>In this section we present the obtained results. For space
reasons some rankings have not been included; however, the
complete archive of the rankings produced in these experiments is
available online.<sup>4</sup> For convenience, only the top 10 ranked
documents are reported in each table. The rationale
behind this decision is the fact that the majority of search
result click activity (89.8%) happens on the first page of
search results [<xref ref-type="bibr" rid="ref3">11</xref>], that is, generally, users only consider the
first 10 (20) documents. The baseline ranking for the
“Scoring” operator is obtained by applying the average operator
to calculate the document assessment. Such a ranking corresponds
to the average assessment of the documents considering the
four criteria, without considering priorities among the
criteria. The baseline ranking for the “Min” operator
is instead obtained by applying the standard min operator. Table 3
illustrates an example of a ranking produced by the average
operator after performing a query in Q<sub>i</sub>, while Table 4 illustrates
an example of a ranking produced by the standard min operator
after performing a query in Q<sub>i</sub>. The entries marked with an
asterisk before the title have been considered relevant with
respect to both the performed query and the user profile.</p>
      <p>We can notice that there are more non-relevant documents
in the top 10 list resulting from the application of the average
operator than in the list resulting from the application of the
standard min operator. This is due to the compensatory
nature of the average operator.</p>
      <p>We illustrate the behavior of the system by taking into
account different kinds of aggregations applied to User 1,
the user associated with the “SPACE” profile. In particular,
we present in Tables 5 to 10 the results obtained by
applying both the Prioritized “Scoring” Operator and the
Prioritized “Min” Operator, with the aggregations ACApR,
CApAR, and ApCAR.</p>
      <p>We can notice that the proposed document rankings are
improved with respect to the baseline rankings, for both
operators and for the considered aggregations, in the sense that
the number of relevant documents in the top 10 is greater
than the number of relevant documents in the baseline
ranking: non-relevant documents are pushed down in the ranking.</p>
      <p>We can also notice that, while the document in the 9th
position of the top 10 documents in Table 3 is deemed
sufficiently topical for the user with profile “SPACE”, the same
document is not even included in the top 10 list of any
table corresponding to the prioritized “Scoring” operator. This
is due to the fact that, even though the document satisfies
the query because it contains information about space
missions, its content is instead related to space exploration.
In contrast, for example, the document in the first position in the
scoring baseline ranking is also proposed in almost all the top
10 lists (scoring and min), including the min baseline
ranking. An exception is Table 6, where that document does
not appear. The reason is that this document comes from a
source with a very low degree of reliability.</p>
      <p><sup>4</sup> http://www.dti.unimi.it/dragoni/files/MultirelevanceUserEvaluation.rar</p>
      <p>Different considerations apply when the user's
query is not in line with his profile (i.e., the user's query is
in the set Q<sub>n</sub>). We discuss two different
scenarios. In the first one, the user associated with the “BIOTECH”
profile executes the query associated with the “FASHION”
profile, while in the second scenario, the user associated with the
“CRIMINOLOGY” profile executes the query associated with
the “SPACE” profile. We have noticed that, for the scoring
operator, the results for all aggregations are in general
similar to the baseline. The previous considerations are not valid
for the prioritized min operator. This is due to its definition.
Indeed, if just one criterion is weakly satisfied, the overall
assessment is very low. Now, if users make queries not in line
with their profile, criteria like coverage and
appropriateness are weakly satisfied, and therefore the overall value is low.
With the prioritized min operator, instead, the
result also depends on the importance degree of the least
satisfied criterion. We can conclude that the (prioritized)
min operator should not be used for users who make
queries that are not in line with their profile.</p>
    </sec>
    <sec id="sec-11">
      <title>4. CONCLUSION AND FUTURE WORK</title>
      <p>In this paper, a user evaluation for aggregating multiple
criteria has been presented and discussed.</p>
      <p>The experimental results have been obtained through a
case study on personalized Information Retrieval with
multicriteria relevance. These results show that: (i) the proposed
operators improve the ranking of the documents
which are related to the user's interests, when the user
formulates an interest-related query; (ii) for the “scoring”
operator, when a user has no interests or formulates a query
which is not related to his interests, the ranking of the
documents is similar to the ranking obtained by using the
average operator; and (iii) for the “min” operator, when the
user formulates a non-interest-related query, this operator is
not suitable.</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>R.</th><th>Document Title</th><th>Score</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>*Shuttle Atlantis blasts off on schedule.</td><td>0.626</td></tr>
            <tr><td>2</td><td>Countdown starts for Sunday shuttle launch.</td><td>0.575</td></tr>
            <tr><td>3</td><td>*Shuttle finally takes Lucid off space station Mir.</td><td>0.573</td></tr>
            <tr><td>4</td><td>U.S. spacewoman breaks another record.</td><td>0.573</td></tr>
            <tr><td>5</td><td>*Shuttle Discovery heads for Florida.</td><td>0.572</td></tr>
            <tr><td>6</td><td>*Shuttle Atlantis heads for Mir despite problem.</td><td>0.568</td></tr>
            <tr><td>7</td><td>Scientists delighted with U.S. shuttle flight.</td><td>0.567</td></tr>
            <tr><td>8</td><td>*U.S. shuttle launched on mission to Mir.</td><td>0.563</td></tr>
            <tr><td>9</td><td>Boeing-Lockheed group signs $7 billion shuttle pact.</td><td>0.562</td></tr>
            <tr><td>10</td><td>*U.S. shuttle leaves space station Mir.</td><td>0.561</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-12">
      <title>5. REFERENCES</title>
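      <p>[1] P. Borlund. The IIR evaluation model: a framework for
evaluation of interactive information retrieval systems.</p>
      <p>[…] effectiveness. Journal of the American Society for […]</p>
      <p>[…] Multidimensional relevance: A new aggregation […]</p>
      <p>[…] prioritized “and” aggregation operator for
multidimensional relevance assessment. In AI*IA
2009, to appear, 2009.</p>
      <p>[…] Taylor Graham, 1992.</p>
      <p>[…] Information Seeking and Retrieval in Context Series.</p>
      <p>[…] retrieval interaction: Extension and applications.
Journal of American Society for Information Science,
34:313–327, 1997.</p>
      <table-wrap>
        <caption><p>Results for “SPACE” profile by applying the Prioritized Scoring Operator and ACApR aggregation.</p></caption>
        <table>
          <thead>
            <tr><th>R.</th><th>Document Title</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>*Shuttle Discovery takes off on schedule.</td></tr>
            <tr><td>2</td><td>*Shuttle Atlantis blasts off on schedule.</td></tr>
            <tr><td>3</td><td>*U.S. space shuttle heads home.</td></tr>
            <tr><td>4</td><td>*Shuttle Discovery heads for Florida.</td></tr>
            <tr><td>5</td><td>*U.S. shuttle crew set up space laboratory.</td></tr>
            <tr><td>6</td><td>*Columbia shuttle mission extended one day.</td></tr>
            <tr><td>7</td><td>*Shuttle Atlantis heads for Mir despite problem.</td></tr>
            <tr><td>8</td><td>*Shuttle Discovery lands in Florida.</td></tr>
            <tr><td>9</td><td>*U.S. space shuttle crew set for Thursday landing.</td></tr>
            <tr><td>10</td><td>*U.S. shuttle will not flush Mir's water.</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <caption><p>Results for “SPACE” profile by applying the Prioritized Min Operator and ACApR aggregation.</p></caption>
        <table>
          <thead>
            <tr><th>R.</th><th>Document Title</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>*Shuttle Atlantis to return home on Wednesday.</td></tr>
            <tr><td>2</td><td>*With spacewalk off, shuttle astronauts relax.</td></tr>
            <tr><td>3</td><td>*U.S. space shuttle heads for rendezvous with Mir.</td></tr>
            <tr><td>4</td><td>*U.S. shuttle crew prepares to retrieve satellite.</td></tr>
            <tr><td>5</td><td>*Shuttle-deployed telescope ready for action.</td></tr>
            <tr><td>6</td><td>*Space shuttle deploys U.S.-German satellite.</td></tr>
            <tr><td>7</td><td>*Shuttle crew prepares for nighttime landing.</td></tr>
            <tr><td>8</td><td>*Hubble service crew prepares to return home.</td></tr>
            <tr><td>9</td><td>*Satellites line up behind shuttle Columbia.</td></tr>
            <tr><td>10</td><td>RUSSIA: Sticken Mir crew stands down, says worst over.</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <caption><p>Results for “SPACE” profile by applying the Prioritized Scoring Operator and CApAR aggregation.</p></caption>
        <table>
          <thead>
            <tr><th>R.</th><th>Document Title</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>*Russians aim to fix Mir before US Shuttle arrives.</td></tr>
            <tr><td>2</td><td>*Russians hope to fix Mir before Shuttle arrives.</td></tr>
            <tr><td>3</td><td>*With spacewalk off, shuttle astronauts relax.</td></tr>
            <tr><td>4</td><td>Countdown continues for U.S. spacewoman's return.</td></tr>
            <tr><td>5</td><td>*Shuttle Columbia blasts off to mission.</td></tr>
            <tr><td>6</td><td>*Shuttle Atlantis blasts off on schedule.</td></tr>
            <tr><td>7</td><td>*Navigational problem crops up on shuttle mission.</td></tr>
            <tr><td>8</td><td>*U.S. shuttle launched on mission to Mir.</td></tr>
            <tr><td>9</td><td>Sticken Mir crew stands down, says worst over.</td></tr>
            <tr><td>10</td><td>*Astronaut Lucid tones up for ride home.</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <caption><p>Results for “SPACE” profile by applying the Prioritized Min Operator and CApAR aggregation.</p></caption>
        <table>
          <thead>
            <tr><th>R.</th><th>Document Title</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>*Part of planned space station arrives in Florida.</td></tr>
            <tr><td>2</td><td>*French astronaut to join Russian space mission.</td></tr>
            <tr><td>3</td><td>*Russia, hurt by Mars failure, sends probe to space.</td></tr>
            <tr><td>4</td><td>*Astronauts board shuttle for U.S. launch.</td></tr>
            <tr><td>5</td><td>*Shuttle Columbia blasts off to mission.</td></tr>
            <tr><td>6</td><td>*Shuttle Atlantis blasts off on schedule.</td></tr>
            <tr><td>7</td><td>*Shuttle Discovery lands in Florida.</td></tr>
            <tr><td>8</td><td>*U.S. space shuttle crew set for Thursday landing.</td></tr>
            <tr><td>9</td><td>*U.S. shuttle leaves space station Mir.</td></tr>
            <tr><td>10</td><td>Lack of funds threaten Russia's space programme.</td></tr>
          </tbody>
        </table>
      </table-wrap>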
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ingwersen</surname>
          </string-name>
          . Information Retrieval Interaction.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ingwersen</surname>
          </string-name>
          .
          <article-title>Cognitive perspectives of information retrieval interaction: elements of a cognitive ir theory</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Spink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Blakely</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Koshman</surname>
          </string-name>
          .
          <article-title>A study of results overlap and uniqueness among major web search engines</article-title>
          . <source>Inf. Process. Manage.</source>,
          <volume>42</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1379</fpage>
          –
          <lpage>1391</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y. C.</given-names>
            <surname>Xu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Relevance judgment: What do information users consider beyond topicality?</article-title>
          <source>J. Am. Soc. Inf. Sci. Technol.</source>,
          <volume>57</volume>
          (
          <issue>7</issue>
          ):
          <fpage>961</fpage>
          –
          <lpage>973</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>