<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A study on evaluation on opinion retrieval systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giambattista Amati</string-name>
          <email>gba@fub.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Amodeo</string-name>
          <email>gamodeo@fub.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerio Capozio</string-name>
          <email>valeriocapozio@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlo Gaibisso</string-name>
          <email>carlo.gaibisso@iasi.cnr.it</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgio Gambosi</string-name>
          <email>gambosi@mat.uniroma2.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science, University of L'Aquila</institution>
          ,
          <addr-line>L'Aquila</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Mathematics, University of Rome "Tor Vergata"</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Fondazione Ugo Bordoni</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" - CNR</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>We study the evaluation of opinion retrieval systems. Opinion retrieval is a relatively new research area; nevertheless, classical evaluation measures adopted for ad hoc retrieval, such as MAP and precision at 10, have been used to assess the quality of rankings. In this paper we investigate the effectiveness of these standard evaluation measures for topical opinion retrieval. To do this, we split the opinion dimension from the relevance one and use opinion classifiers, with varying accuracy, to analyse how opinion retrieval performance changes when the outcomes of the opinion classifiers are perturbed. Classifiers can be used in two modalities: either to re-rank or to filter out directly the documents obtained through a first relevance retrieval. In this paper we formally outline both approaches, while for now focusing on the filtering process. The proposed approach aims to establish the correlation between the accuracy of the classifiers and the performance of topical opinion retrieval. In this way it will be possible to assess the effectiveness of the opinion component by comparing the effectiveness of the relevance baseline with that of topical opinion retrieval.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.3.0 [Information Storage and Retrieval]: General;
H.3.1 [Information Storage and Retrieval]: Content
Analysis and Indexing; H.3.3 [Information Storage and
Retrieval]: Information Search and Retrieval</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>Sentiment analysis aims at classifying documents
according to the opinions, sentiments, or, more generally,
subjective features contained in text. The study and
evaluation of efficient solutions to detect sentiments in text is a
popular research area, and different techniques have been
applied, coming from natural language processing,
computational linguistics, machine learning, information retrieval
and text mining.</p>
      <p>
        The application of sentiment analysis to Information
Retrieval goes back to the novelty track of TREC 2003 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Topical opinion retrieval is also known as opinion retrieval
or opinion finding [
        <xref ref-type="bibr" rid="ref11 ref4 ref9">4, 9, 11</xref>
        ]. In [5, 3, 2, ?] dictionary-based
methodologies for topical opinion retrieval are proposed. An
application of opinion finding to blogs was introduced in the
Blog Track of TREC 2006 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, there is not yet
a comprehensive study of the evaluation of topical opinion
systems, and in particular of the interaction and correlation
between relevance and sentiment assessments.
      </p>
      <p>
        At first glance, the evaluation of opinion retrieval systems
seems not to deserve any further investigation or extra
effort with respect to the evaluation of conventional retrieval
systems. Traditional evaluation measures, such as the Mean
Average Precision (MAP) or the precision at 10 [
        <xref ref-type="bibr" rid="ref10 ref11 ref6 ref8">8, 6, 10,
11</xref>
        ], can still be used to evaluate rankings of opinionated
documents that are also assessed to be relevant to a given
topic. However, if we take a deeper look at the performance
of topical opinion systems, we are struck by the diversity in
the observed values of performance. For example, the best
run for topic relevance in the Blog Track of TREC 2008 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
achieves a MAP value of 0.4954, which drops to 0.4052
for the MAP of opinion in the opinion finding task.
Performance degradation is expected, because any variable
additional to relevance, here the opinion one, must
deteriorate the system performance. However, we do not
yet have a way to set apart the effectiveness of the opinion
detection component and evaluate how effective it is, or to
determine whether and to what extent the relevance and
opinion detection components are influenced by each other.
It seems evident that an evaluation methodology, or at least
some benchmarks, are needed to make it possible to assess
how effective the opinion component is. To exemplify: how
good is an opinion MAP of 0.4052
when we start from an initial relevance MAP of 0.4954?
Indeed, opinion MAP in TREC [
        <xref ref-type="bibr" rid="ref10 ref6 ref8">8, 6,
10</xref>
        ] seems to be highly dependent on the relevance MAP of
the first-pass retrieval [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>The general issue is thus the following: can we assume
that absolute values of MAP can be used as they are to
compare different tasks, in our case the topical opinion and
the ad hoc relevance tasks? And thus, can evaluation measures
be used without any MAP normalization to compare
or to assess the state of the art of different techniques for
opinion finding?</p>
      <p>To this aim, we introduce a completely novel
methodological framework which:
provides a bound for the best achievable opinion MAP,
for a given relevance document ranking;
predicts the performance of topical opinion retrieval
given the performance of topic retrieval and
opinion detection;
vice versa, indicates whether a given opinion detection
technique gives a significant or marginal contribution
to the state of the art;
investigates the robustness of evaluation measures for
opinion retrieval effectiveness;
and indicates which re-ranking or filtering strategy is best
suited to improve topical retrieval by opinion
classifiers.</p>
      <p>This paper is organized as follows. The proposed
evaluation method is presented in sections 2 and 4; section 3
introduces the collection used for tests. Results are presented
in section 5, and conclusions follow in section 6.</p>
    </sec>
    <sec id="sec-3">
      <title>2. EVALUATION APPROACH</title>
      <p>
        An opinion retrieval system is based on a topic retrieval
and an opinion detection subsystem [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]: different kinds of
"information" are retrieved and weighted in order to
generate a final ranking of documents that reflects their relevance
with respect to both topic and opinion content. To analyse the
effectiveness of the whole system, we should be able to quantify
not only the performance of the final result, but also the
contribution of each subsystem. The evaluation metric usually
adopted in the literature for the final ranking is the MAP. But MAP
(of relevance and opinion) for the final ranking is not
sufficient to fully assess the performance of the whole system:
the contribution of each component, taken separately, needs
to be identified.
      </p>
      <p>The input to the proposed topical opinion evaluation
process is the relevance baseline, i.e. the ranking of documents
generated by the topic retrieval system, here considered as
a black box. The effectiveness of the topic retrieval
component is measured by the MAP of opinion and relevance of
this baseline.</p>
      <p>The evaluation of the effectiveness of the opinion
detection component relies on artificially defined classifiers of
opinion. The artificial classifier C<sub>O</sub><sup>k</sup> classifies documents as
opinionated, O, or not opinionated, <overline>O</overline>, with accuracy k,
0 ≤ k ≤ 1. The classification process is independent of
the topic relevance of documents. To achieve accuracy k, C<sub>O</sub><sup>k</sup>
correctly classifies each document with probability k.</p>
      <p>Therefore the expected number of misclassified documents is (1 - k) · n,
where n is the number of classified documents. Assuming
independence between opinion and relevance, the
misclassified documents will be distributed randomly between
relevant and not relevant documents.</p>
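      <p>The behaviour of such an artificial classifier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name simulate_classifier, the dictionary representation of the true labels and the toy data are hypothetical and do not come from the experimental setup.</p>
      <preformat>
```python
import random


def simulate_classifier(true_labels, k, rng):
    """Simulate an opinion classifier that is right with probability k.

    true_labels maps a document id to True (opinionated) or False.
    """
    predicted = {}
    for doc_id, is_opinionated in true_labels.items():
        # k > rng.random() holds with probability k, since random() is in [0, 1).
        is_correct = k > rng.random()
        predicted[doc_id] = is_opinionated if is_correct else not is_opinionated
    return predicted


# Toy ground truth (hypothetical data): every third document is opinionated.
truth = {f"d{i}": i % 3 == 0 for i in range(1000)}
predicted = simulate_classifier(truth, k=0.8, rng=random.Random(0))
errors = sum(predicted[d] != truth[d] for d in truth)
print(errors)  # expected to be near (1 - 0.8) * 1000 = 200
```
      </preformat>
      <p>Because each document is flipped independently of its relevance, the errors fall on relevant and not relevant documents in proportion to their numbers.</p>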
      <p>The outcomes of these artificial classifiers are then used to
modify the baseline. This can be done following two different
approaches:
a filtering process, in which documents of the baseline that are
deemed not opinionated by the classifier are
removed from the ranking;
a re-ranking process, in which documents of the baseline
that are considered opinionated by the classifier
receive a "reward" in their rank.</p>
      <p>
        The filtering process uses the classifier in its classical
meaning. This process is particularly suitable to analyse the
effectiveness of the technique itself for opinion detection, as a
classification task [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and its effects on topical opinion
performance. Opinion filtering also gives some interesting clues
on the optimal performance achievable by an
opinion retrieval technique based on filtering, and on whether a
filtering strategy is in general superior or not to even very
simple re-ranking strategies.
      </p>
      <p>In the re-ranking process a "reward" function for the
documents has to be defined. In this case we introduce a bias
in assigning correct rewards, and we may thus observe how the
effectiveness of a re-ranking algorithm varies as the opinion
detection performance changes.</p>
      <p>By "comparing" the results of an opinion retrieval system
with those of the filtering process, or of the re-ranking process, at several
levels of accuracy, we can obtain relevant clues about:
the overall contribution introduced by the opinion
system only and its robustness;
the effectiveness of the opinion detection component.
In the following we formally describe both approaches
and focus on the experimentation concerning the filtering
process only.</p>
    </sec>
    <sec id="sec-4">
      <title>3. EXPERIMENTATION ENVIRONMENT</title>
      <p>
        We used the BLOG06 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] collection and the data sets of the
Blog Track of TREC 2006, 2007 and 2008 [
        <xref ref-type="bibr" rid="ref10 ref6 ref8">8, 6, 10</xref>
        ] for our
experimentation. Since 2006, TREC has had an evaluation
track on blogs whose main task is opinion retrieval, that
is the task of selecting the opinionated blog posts relevant
to a given topic [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The BLOG06 collection is 148 GB in size and
contains spam as well as possibly non-blog and non-English
pages.
      </p>
      <p>The data set consists of 150 topics and a list, the Qrels,
in which the relevance and opinion content of documents
are assessed with respect to each topic. An item in the
list identifies a topic t, a document d and a judgement of
relevance/opinion assigned as follows:
0 if d is not relevant with respect to t;
1 if d is relevant to t, but does not contain comments
on t;
2 if d is relevant to t and contains positive comments
on t;
3 if d is relevant to t and contains neutral comments
on t;
4 if d is relevant to t and contains negative comments
on t.</p>
      <p>Note that not relevant documents are not classified
according to their opinion content.</p>
      <p>In the following, [x] denotes the set of documents labelled
x, with x = 0, 1, 2, 3, 4; unlabelled documents belong to
[0] by default.</p>
      <p>TREC organizers also provide the five best baselines,
produced by some participants, denoted by BL1, BL2, ..., BL5.</p>
    </sec>
    <sec id="sec-5">
      <title>4. EVALUATION FRAMEWORK</title>
      <p>The behaviour of the artificial classifier C<sub>O</sub><sup>k</sup> is defined through
the Qrels. C<sub>O</sub><sup>k</sup> predicts the right opinion orientation of each
document in the collection by looking it up in the Qrels. The
accuracy k is simulated by introducing a bias in the
classification. Documents not appearing in the Qrels, or assessed there as not
relevant, are classified according to the probability
distribution of opinionated and not opinionated
documents among the relevant ones. Taking into account
both relevance and opinion in the test collection, we obtain
the contingency Table 1. As shown in Table 1, the Qrels do
not provide the opinion classes for not relevant documents.
The missing data slightly complicate the
construction of our classifiers. To overcome the problem, we
assume that</p>
      <p>
        <disp-formula><label>(1)</label><tex-math>\Pr(O \mid R) = \Pr(O \mid \overline{R})</tex-math></disp-formula>
        Equation 1 asserts that there is no sufficient reason to
assume a different distribution of opinion among relevant and
not relevant documents. The a priori probability Pr(O)
of opinionated documents is still unknown. However,
equation 1 implies that O and R are independent, thus
      </p>
      <p>
        <disp-formula><label>(2)</label><tex-math>\Pr(O \mid R) = \Pr(O)</tex-math></disp-formula>
        From equations 1 and 2 it follows that
      </p>
      <p>
        <disp-formula><label>(3)</label><tex-math>\Pr(\overline{O} \mid R) = \Pr(\overline{O} \mid \overline{R}) = \Pr(\overline{O}) = 1 - \Pr(O)</tex-math></disp-formula>
      </p>
      <p>Equations 2 and 3 are equivalent to assuming that the set
[2] ∪ [3] ∪ [4], as defined in Table 1, is a sample of the set
of opinionated documents. Thus, without loss of generality,
we can define Pr(O) using only the documents classified as
relevant by the Qrels as follows:</p>
      <p>
        <disp-formula><label>(4)</label><tex-math>\Pr(O) = \frac{\left| [2] \cup [3] \cup [4] \right|}{\left| [1] \cup [2] \cup [3] \cup [4] \right|}</tex-math></disp-formula>
        and consequently
      </p>
      <p>
        <disp-formula><label>(5)</label><tex-math>\Pr(\overline{O}) = 1 - \Pr(O) = \frac{\left| [1] \right|}{\left| [1] \cup [2] \cup [3] \cup [4] \right|}</tex-math></disp-formula>
      </p>
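      <p>As a minimal sketch, the prior of equation 4 can be estimated by counting Qrels labels. The function name opinion_prior and the toy label list are assumptions made for illustration; they are not TREC data.</p>
      <preformat>
```python
from collections import Counter


def opinion_prior(qrels_labels):
    """Estimate Pr(O) as the share of labels 2, 3, 4 among labels 1, 2, 3, 4."""
    counts = Counter(qrels_labels)
    opinionated = counts[2] + counts[3] + counts[4]
    relevant = counts[1] + opinionated
    if relevant == 0:
        raise ValueError("no relevant documents in the judgements")
    return opinionated / relevant


# Toy judgements for one topic (hypothetical, not taken from the Qrels).
labels = [0, 0, 1, 2, 3, 4, 4, 1, 0, 2]
p_o = opinion_prior(labels)
print(p_o, 1 - p_o)  # Pr(O) and Pr(not O)
```
      </preformat>
      <p>Documents labelled 0 are ignored, in line with the assumption that the relevant documents are a sample of the whole collection as far as opinion is concerned.</p>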
      <p>In the following we study whether and how the set of
relevant and not relevant documents classified as opinionated
affects the topical opinion ranking.</p>
      <p>For both approaches, filtering and
re-ranking, a misclassification may have conflicting effects
on the effectiveness of the final ranking. If we filter
documents by opinion with a classifier, for example, the
not relevant documents that are misclassified and removed may bring a
positive contribution to the precision measures, because all
opinionated and relevant documents that were below them
will have a higher rank after their removal. With the
re-ranking approach we have a similar situation, but this
precision boosting phenomenon is attenuated by the fact
that re-ranking is not based on a decision as drastic as
a removal, and the repositioning of a document does not
propagate to all documents that are below it in the original
ranking.</p>
      <p>[Table 1: contingency table of relevance (R, not R) and opinion (O, not O) in the test collection.]</p>
      <p>Together with C<sub>O</sub><sup>k</sup>, we introduce a random classifier C<sub>O</sub><sup>RC</sup>
that classifies documents according to the a priori
distribution of opinionated documents in the collection. It
represents a good approximation of the random behaviour of a
classifier. More precisely, this classifier assesses a document
as opinionated with probability Pr(O) and as not
opinionated with probability Pr(<overline>O</overline>) = 1 - Pr(O).</p>
      <p>As already stated, in the filtering approach documents
classified as not opinionated are removed from the baseline.
Note that while relevant documents contribute to and improve
the evaluation measure, if correctly classified, the not
relevant ones do not contribute directly to this measure.</p>
      <p>In conclusion, if a not relevant document is classified as
opinionated without actually being opinionated, this
misclassification will not affect the evaluation measure.
Conversely, the removal of not relevant documents, regardless of
their real opinion orientation, always positively affects the
ranking, even if they are misclassified.</p>
      <p>For relevant documents, instead, misclassification
always negatively affects the ranking.</p>
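      <p>A small worked example may make these two effects concrete. The sketch below computes average precision on a hypothetical six-document baseline; the document ids, the helper average_precision and the choice of removed documents are illustrative assumptions.</p>
      <preformat>
```python
def average_precision(ranking, targets):
    """Average precision of a ranking with respect to a set of target documents."""
    if not targets:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in targets:
            hits += 1
            precision_sum += hits / rank
    # Normalise by all targets, so removed targets still count as misses.
    return precision_sum / len(targets)


# Hypothetical baseline: n1..n3 are not relevant, a/b/c are relevant and opinionated.
baseline = ["n1", "a", "n2", "b", "n3", "c"]
targets = {"a", "b", "c"}

without_n1 = [d for d in baseline if d != "n1"]  # a not relevant document is removed
without_b = [d for d in baseline if d != "b"]    # a relevant opinionated one is removed

print(average_precision(baseline, targets))
print(average_precision(without_n1, targets))
print(average_precision(without_b, targets))
```
      </preformat>
      <p>In this toy case the average precision rises from 0.5 to about 0.76 when the not relevant document is removed, and falls to 0.3 when the relevant opinionated document is removed.</p>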
      <p>With this approach we can observe how hard it is to
improve on the baseline, i.e. we can identify how effective
the opinion detection technique must be to improve the starting
topic retrieval.</p>
      <p>
        Re-ranking techniques are essentially fusion models [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
that combine a relevance score s<sub>R</sub>(d) and an opinion score
s<sub>O</sub>(d) (or two ranks derived from these scores) for a
document d. The new score s<sub>OR</sub>(d) is a function of the two
non-negative scores, s<sub>R</sub>(d) and s<sub>O</sub>(d):
      </p>
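      <p>
        <disp-formula><label>(6)</label><tex-math>s_{OR}(d) = f(s_R(d), s_O(d))</tex-math></disp-formula>
        Given a classifier C<sub>O</sub><sup>k</sup>, we define a new score s<sub>COR</sub>(d), based
on the outcomes of C<sub>O</sub><sup>k</sup>, according to which the baseline is
re-ranked. s<sub>COR</sub>(d) is defined as follows:
        <disp-formula><label>(7)</label><tex-math><![CDATA[s_{COR}(d) = \begin{cases} f(s_R(d), s_O(d)) & \text{if } d \in_{C_O^k} O \\ f(s_R(d), 0) & \text{if } d \notin_{C_O^k} O \end{cases}]]></tex-math></disp-formula>
        where the symbol ∈ subscripted with C<sub>O</sub><sup>k</sup> denotes the classifier outcome, that is, the
document is assigned to the given class. Note that when k = 100%,
and assuming that f(·, ·) is a non-decreasing function of
s<sub>O</sub>(·), i.e. f(s<sub>R</sub>(d), x) ≥ f(s<sub>R</sub>(d), x') for all x ≥ x', the
opinion MAP of any ranking based on s<sub>OR</sub>(·) does not exceed
that based on s<sub>COR</sub>(·).</p>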
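      <p>The re-ranking score can be sketched as follows. The fusion function used here, a linear combination with weight 0.5, is a hypothetical choice made only to have a concrete non-decreasing f; the source does not prescribe any particular f, and the names fused_score and rerank are likewise assumptions.</p>
      <preformat>
```python
def fused_score(s_r, s_o):
    """Hypothetical fusion function f, non-decreasing in the opinion score."""
    return s_r + 0.5 * s_o


def rerank(relevance_scores, opinion_scores, predicted_opinionated, f=fused_score):
    """Re-rank documents; the opinion score counts only if the classifier says O."""
    rescored = []
    for doc, s_r in relevance_scores.items():
        if predicted_opinionated.get(doc, False):
            s_o = opinion_scores.get(doc, 0.0)
        else:
            s_o = 0.0  # documents not classified as opinionated get f(s_R(d), 0)
        rescored.append((f(s_r, s_o), doc))
    # The sort is stable, so ties keep the baseline order.
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in rescored]


# Toy scores (hypothetical); dictionaries are listed in baseline order.
relevance = {"d1": 3.0, "d2": 2.5, "d3": 2.0}
opinion = {"d1": 0.2, "d2": 2.0, "d3": 4.0}
classified = {"d1": False, "d2": True, "d3": True}
print(rerank(relevance, opinion, classified))
```
      </preformat>
      <p>With these toy values the two documents classified as opinionated overtake d1, whose opinion score is ignored because the classifier does not assign it to O.</p>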
      <p>All the above considerations can be further extended to
the case in which s<sub>OR</sub>(d) is based on the ranks of d
instead of on its scores (of relevance and opinion).</p>
    </sec>
    <sec id="sec-6">
      <title>5. EXPERIMENTATION RESULTS</title>
      <p>In this paper we report the experimentation results for the
filtering approach. The filtering process has been repeated
20 times for each baseline and for each accuracy k = 0.5, 0.6,
0.7, 0.8, 0.9, 1. Mean values of the MAPs are reported.</p>
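      <p>The repeated filtering protocol can be sketched as a small simulation. This is not the code used for the experiments: the single-topic setting, the synthetic ranking, the fixed seed and the function names are all assumptions, and the value printed for each k is an average precision on toy data rather than a MAP over TREC topics.</p>
      <preformat>
```python
import random
from statistics import mean


def average_precision(ranking, targets):
    """Average precision of a ranking with respect to a set of target documents."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in targets:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(targets) if targets else 0.0


def filter_once(ranking, opinionated, k, rng):
    """Keep the documents that a simulated classifier of accuracy k labels as O."""
    kept = []
    for doc in ranking:
        truth = doc in opinionated
        predicted = truth if k > rng.random() else not truth
        if predicted:
            kept.append(doc)
    return kept


def mean_filtered_ap(ranking, opinionated, targets, k, runs=20, seed=0):
    """Mean average precision over repeated random filterings of one baseline."""
    rng = random.Random(seed)
    return mean(
        average_precision(filter_once(ranking, opinionated, k, rng), targets)
        for _ in range(runs)
    )


# Hypothetical data: 50 ranked documents, every fifth one relevant and opinionated.
ranking = [f"d{i}" for i in range(50)]
targets = {f"d{i}" for i in range(0, 50, 5)}
opinionated = targets | {f"d{i}" for i in range(1, 50, 5)}

for k in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(k, round(mean_filtered_ap(ranking, opinionated, targets, k), 4))
```
      </preformat>
      <p>Averaging over several runs smooths out the randomness of the simulated misclassifications, which is the reason for repeating the filtering at each accuracy level.</p>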
      <p>Table 2 reports, in decreasing order, the relevance MAPs
(MAP<sub>R</sub>) and the opinion MAPs (MAP<sub>O</sub>) for each baseline.</p>
      <p>[Table 2: relevance MAP (MAP<sub>R</sub>) and opinion MAP (MAP<sub>O</sub>) of the baselines, listed as BL4, BL5, BL3, BL1, BL2.]</p>
      <p>In Figure 1, MAP values are reported for each baseline as
the accuracy of the classifiers changes. The dotted lines
represent the baseline opinion MAPs and the dot-dashed
lines represent the baseline relevance MAPs. The MAP
values of the random classifier are also reported, as dashed lines
in the graphs.</p>
      <p>Analysing the MAP trend, we can make the following
observations:
1. the baseline MAP<sub>R</sub> is an upper bound for the MAP<sub>O</sub>
obtained with a filtering approach;
2. the random classifier always deteriorates the
performance of the baseline MAP<sub>O</sub>;
3. the minimal accuracy needed to improve the baseline
MAP<sub>O</sub> by filtering is very high, at least 80%;
4. there is a linear correlation between the MAP<sub>O</sub>
achievable by a classifier with accuracy k and the accuracy
itself.</p>
      <p>The first three remarks say that a filtering strategy is very
dangerous for MAP<sub>O</sub> performance; that is, removing documents
greatly affects the performance of topical opinion
retrieval.</p>
      <p>From the above considerations, we may conclude that the
opinion retrieval task is not easy and that obtaining good
results with a filtering approach requires very high accuracy.
The experimentation also allows us to identify a plausible
range for the MAP achievable by an opinion retrieval system:
the classifier with accuracy 100% and the random classifier
obtain performances that can be considered as thresholds
for the best and the worst opinion detection system. It is
also evident that the higher the baseline MAP is, the higher the
accuracy of the classifier must be to introduce some benefit with
a filtering approach with respect to relevance-only retrieval.</p>
    </sec>
    <sec id="sec-7">
      <title>6. CONCLUSIONS AND FUTURE WORKS</title>
      <p>
        The opinion retrieval problem seems to be a relatively
hard task: the combination of two variables such as topic
relevance and opinion requires a deep analysis of their
correlation. From the results of the TREC competitions [
        <xref ref-type="bibr" rid="ref10 ref6 ref8 ref9">8, 6, 10,
9</xref>
        ], a lack of exhaustive evaluation measures emerges:
MAP, Precision at 10 and R-Precision alone are not sufficient
to give a complete analysis of the systems' performance.
      </p>
      <p>
        Up to now we have studied only the filtering of documents
by opinions. This strategy, however, requires a very high
classification accuracy. We will complete the study
with the re-ranking approach, starting from the approach used
in [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>Our approach is able to provide an indicative accuracy
of the opinion component of a topical opinion retrieval
system. It also allows us to propose an evaluation
framework able to evaluate the effectiveness of opinion retrieval
systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Amati</surname>
          </string-name>
          , E. Ambrosi,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gaibisso</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Gambosi</surname>
          </string-name>
          .
          <article-title>FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog Track</article-title>
          .
          <source>In Proc. of the 16th Text Retrieval Conference (TREC)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Amati</surname>
          </string-name>
          , G. Amodeo,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gaibisso</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Gambosi</surname>
          </string-name>
          .
          <article-title>A uniform theoretic approach to opinion and information retrieval</article-title>
          , in Intelligent Information Access, G. Armano, M. de Gemmis, G. Semeraro, and E. Vargiu (eds.)
          <source>Studies in Computational Intelligence</source>
          . Springer, to appear.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Skomorowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Vechtomova</surname>
          </string-name>
          .
          <article-title>Ad hoc retrieval of documents with topical opinion</article-title>
          . In G. Amati,
          <string-name>
            <given-names>C.</given-names>
            <surname>Carpineto</surname>
          </string-name>
          , and G. Romano, editors,
          <source>ECIR</source>
          , volume
          <volume>4425</volume>
          of Lecture Notes in Computer Science, pages
          <volume>405</volume>
          -
          <fpage>417</fpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Eguchi</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Lavrenko</surname>
          </string-name>
          .
          <article-title>Sentiment retrieval using generative models</article-title>
          .
          <source>In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <volume>345</volume>
          -
          <fpage>354</fpage>
          ,
          <string-name>
            <surname>Sydney</surname>
          </string-name>
          , Australia,
          <year>July 2006</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mishne</surname>
          </string-name>
          .
          <article-title>Multiple ranking strategies for opinion retrieval in blogs</article-title>
          .
          <source>In The Fifteenth Text REtrieval Conference (TREC 2006) Proceedings</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Ounis</surname>
          </string-name>
          ,
          <string-name>
            <surname>and I. Soboroff.</surname>
          </string-name>
          <article-title>Overview of the trec-2007 blog track</article-title>
          .
          <source>In Proc. of the 16th Text Retrieval Conference (TREC)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Craig</given-names>
            <surname>Macdonald</surname>
          </string-name>
          and
          <string-name>
            <given-names>Iadh</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>The trec blogs06 collection : Creating and analysing a blog test collection</article-title>
          .
          <source>Technical report</source>
          , University of Glasgow Scotland, UK,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          , M. de Rijke, C. Macdonald,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Mishne</surname>
          </string-name>
          ,
          <string-name>
            <surname>and I. Soboroff.</surname>
          </string-name>
          <article-title>Overview of the trec-2006 blog track</article-title>
          .
          <source>In TREC 2006 Working Notes</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and I.</given-names>
            <surname>Soboroff</surname>
          </string-name>
          .
          <article-title>On the trec blog track</article-title>
          .
          <source>In Proc. of the 2nd International Conference on Weblogs and Social Media (ICWSM)</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <surname>and I. Soboroff.</surname>
          </string-name>
          <article-title>Overview of the trec-2008 blog track</article-title>
          .
          <source>In Proc. of the 17th Text Retrieval Conference (TREC)</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Pang</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Opinion mining and sentiment analysis</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          -2):1-
          <fpage>135</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Vaithyanathan</surname>
          </string-name>
          .
          <article-title>Thumbs up? Sentiment classification using machine learning techniques</article-title>
          .
          <source>In Proc. of the ACL-02 conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <volume>79</volume>
          -
          <fpage>86</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Soboroff</surname>
          </string-name>
          and
          <string-name>
            <given-names>Donna</given-names>
            <surname>Harman</surname>
          </string-name>
          .
          <article-title>Overview of the trec 2003 novelty track</article-title>
          .
          <source>In TREC</source>
          , pages
          <volume>38</volume>
          -
          <fpage>53</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>