<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UNED Online Reputation Monitoring Team at RepLab 2013</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Damiano Spina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorge Carrillo-de-Albornoz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tamara Martín</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrique Amigó</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julio Gonzalo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fernando Giner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UNED NLP &amp; IR Group</institution>
          ,
          <addr-line>Juan del Rosal, 16, 28040 Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the UNED Online Reputation Monitoring Team's participation at RepLab 2013 [3]. Several approaches were tested: first, an instance-based learning approach that uses Heterogeneity Based Ranking to combine seven different similarity measures was applied to all the subtasks. The filtering subtask was also tackled by automatically discovering filter keywords: those whose presence in a tweet reliably confirms (positive keywords) or discards (negative keywords) that the tweet refers to the company [16]. Different approaches were submitted for the topic detection subtask: agglomerative clustering over wikified tweets, co-occurrence term clustering [10] and an LDA-based model that uses temporal information. Finally, the polarity subtask was tackled by following the approach presented in [14] to generate domain-specific semantic graphs in order to automatically expand the general-purpose lexicon SentiSense [9]. We then use the domain-specific sub-lexicons to classify tweets according to their reputational polarity, following the emotional concept-based system for sentiment analysis presented in [8]. We corroborated that using entity-level training data improves the filtering step. Additionally, the proposed approaches to detect topics obtained the highest scores in the official evaluation, showing that they are promising directions to address the problem. In the reputational polarity task, our results suggest that a deeper analysis should be done in order to correctly identify the main differences between the reputational polarity task and traditional sentiment analysis tasks. A final remark is that the overall performance of a monitoring system in RepLab 2013 highly depends on the performance of the initial filtering step.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper describes the UNED Online Reputation Monitoring Team's
participation in RepLab 2013, which focuses on organizing and classifying tweet
streams associated with an entity of interest, in order to ease the analysts'
labor of monitoring the reputation of an entity on Twitter. We have
participated in four of the five subtasks proposed in the evaluation campaign:
filtering unrelated tweets, classifying tweets according to their reputational polarity,
detecting the topics discussed in tweets, and the full monitoring task.</p>
      <p>The RepLab 2013 collection consists of 61 entities from four different
domains: automotive, banking, university and music. Each of the entities has an
associated set of around 750 manually annotated tweets for training and 1,500
tweets for testing. Tweets are written in English and Spanish, following the same
(unbalanced) language distribution as in Twitter. Crawling was performed
during the period from 1st June 2012 to 31st December 2012 using the entity's
canonical name as query (e.g. "BMW"), and the training/test datasets correspond
to different time ranges. The corpus also comprises additional background tweets
for each entity (up to 50,000, with large variability across entities).</p>
      <p>
        We have applied a range of different approaches to each of the tasks, plus a
horizontal approach for all of them. The filtering task has been tackled with
keyword-recognition techniques learned for each entity. Polarity has been
addressed with semantic graphs for domain-specific affective lexicon adaptation.
In the case of topic detection, a revisited version of the three algorithms we
presented at RepLab 2012 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] has been tested again in this edition. The horizontal
approach consists of an extended instance-based learning method over the
training corpus in which similarity is measured over multiple tweet extensions
(author tweets, external links, etc.).
      </p>
      <p>The paper is organized as follows. We describe the approaches and the results
for the RepLab 2013 subtasks: filtering in Section 2, polarity in Section 3 and
topic detection in Section 4. Then, we analyze the results for the full monitoring
task in Section 5. We conclude in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Filtering Subtask</title>
      <p>The filtering task in RepLab 2013 aims to disambiguate tweets in order to
discard those that do not refer to the company (i.e., related vs. unrelated). Here
we describe the two different approaches we have tested to tackle this problem:
instance-based learning over Heterogeneity Based Ranking, and filter keywords.</p>
      <sec id="sec-2-1">
        <title>Instance-based Learning over Heterogeneity Based Ranking</title>
        <p>
          The instance-based learning approach that we have tested is similar to the official
RepLab 2013 baseline, where each tweet inherits the (manually annotated) tags
from the most similar tweet in the training corpus of the same entity. The
difference is that, instead of using Jaccard distance to compute tweet similarity as
the baseline does, we have employed and combined multiple similarity measures.
Our measures expand the tweets with different sources, and then apply cosine
similarity. We have used the following sources to expand tweets: (i) the hashtags
in the tweet, (ii) the content of the URLs linked in the tweet, (iii) the rest
of the tweets published by the same author and (iv) several parts of the Wikipedia
entries associated with words in the tweet (e.g. title, Wikipedia category, etc.).
As one word can be associated with multiple Wikipedia entries, we have used
the commonness probability [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] for disambiguation (described in Section 4.2).
        </p>
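The expand-then-compare scheme above can be sketched as follows. This is a minimal illustration with hypothetical toy tweets: in the actual system the expansion text comes from hashtags, linked-URL content, the author's other tweets and Wikipedia entries, here represented simply as extra strings.

```python
from collections import Counter
from math import sqrt

def expand(tweet_text, sources):
    """Bag-of-words of a tweet plus its expansion sources
    (hashtags, linked-URL content, author timeline, Wikipedia text)."""
    bag = Counter(tweet_text.lower().split())
    for source_text in sources:
        bag.update(source_text.lower().split())
    return bag

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

t1 = expand("bmw new m3 review", ["#bmw #m3", "the new bmw m3 road test"])
t2 = expand("bmw m3 road test", [])
print(round(cosine(t1, t2), 3))  # → 0.707
```

Each choice of expansion source yields one similarity measure over the same pair of tweets; the next subsection describes how these measures are combined.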
        <p>
          In order to combine all similarity measures, we have employed an
unsupervised method called Heterogeneity Based Ranking (HBR) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Basically, this
method considers the heterogeneity (a notion of diversity or dissimilarity) of the
set of measures that corroborate that two texts are more similar to each other
than another pair of texts. It consists of the following steps: first, for each tweet we
consider the most similar tweets in the training corpus according to the Jaccard
distance. Then, for each measure, we re-rank all these distances, obtaining 100
similarity instances for each measure. Next, we apply the unsupervised ranking
fusion algorithm HBR to combine all the rankings. Finally, we take the tag from
the winner tweet. When we have no information to compute a given measure
(e.g. a tweet with no external links), we assign the average similarity between
the tweets in the corpus that do contain this information.
        </p>
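The fusion step can be sketched with a simplified average-rank (Borda-style) scheme. Note this is not the actual HBR algorithm, which is defined in [5] and additionally weights measures by their heterogeneity; the sketch only shows the shape of the rank-fusion step, with hypothetical tweet ids.

```python
def fuse_rankings(rankings):
    """Combine several per-measure rankings of candidate training tweets into
    one consensus ranking by accumulating rank positions (lower = better).
    Simplified stand-in for the HBR fusion described in the text."""
    scores = {}
    for ranking in rankings:
        for pos, tweet_id in enumerate(ranking):
            scores[tweet_id] = scores.get(tweet_id, 0) + pos
    return sorted(scores, key=lambda t: scores[t])

# Three similarity measures rank the same four candidate tweets differently:
r1 = ["t3", "t1", "t2", "t4"]
r2 = ["t1", "t3", "t4", "t2"]
r3 = ["t3", "t2", "t1", "t4"]
print(fuse_rankings([r1, r2, r3])[0])  # → t3 (the winner, whose tag is inherited)
```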
      </sec>
      <sec id="sec-2-2">
        <title>Filter Keywords Strategy</title>
        <p>
          The filter keywords strategy has proved to be a competitive approach to company
name disambiguation in Twitter [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. A positive/negative filter keyword is an
expression whose presence in a tweet indicates a high probability that the tweet
is related/unrelated to the company.
        </p>
        <p>
          Here we explain the automatic classification approach, similar to [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], that we
have tested on the RepLab 2013 filtering subtask. At a glance, it consists of two
steps: first, filter keywords are discovered using machine learning algorithms
(keyword discovery); second, tweets containing positive/negative filter keywords
are used to feed a model that classifies the uncovered tweets (tweet classification).
Keyword Discovery. Given the tweet stream of an entity and its
representative pages (i.e., the homepage and the English and Spanish Wikipedia pages),
each term is represented by features that take into account the company's
website, Wikipedia, the Open Directory Project (ODP) and the RepLab 2013 collection
itself1.
        </p>
        <p>
          Three families of features are defined:
{ Collection-based features: These features are defined to capture differences
between keywords and skip terms. In this approach, we added two additional
specificity-based features, pseudo-document TF.IDF and KLD [
          <xref ref-type="bibr" rid="ref10 ref15">15,10</xref>
          ], to the
existing set of collection-based features described in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <sec id="sec-2-2-1">
          <title>1 See [16] for details about how the features are computed.</title>
          <p>
            { Web-based features: These features should discriminate between positive
and negative filter keywords. They are: term frequency on the
representative pages (homepage and the English and Spanish Wikipedia pages),
as well as term occurrence in relevant search results in Wikipedia and in
ODP.
{ Features expanded by co-occurrence: We applied the co-occurrence
expansion described in [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] to all the features enumerated above.
Then, terms in the training set are labeled as positive/negative/skip by
considering the precision of the tweets covered by the term:
- If 85% of the tweets containing the term are RELATED, the term is
considered a positive filter keyword;
- if 85% of the tweets are UNRELATED, then the term is labeled as a negative
filter keyword;
- otherwise, the term is labeled as a skip term.
          </p>
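The precision-based labeling rule above is straightforward; a minimal sketch, assuming the 85% threshold is applied as "at least 85%" in both directions:

```python
def label_term(tweet_labels, threshold=0.85):
    """Label a candidate term from the gold labels of the tweets containing it:
    positive / negative filter keyword, or skip term (85% rule from the text)."""
    related = sum(1 for label in tweet_labels if label == "RELATED")
    precision = related / len(tweet_labels)
    if precision >= threshold:
        return "positive"
    if precision <= 1 - threshold:
        return "negative"
    return "skip"

print(label_term(["RELATED"] * 9 + ["UNRELATED"]))   # → positive (90% related)
print(label_term(["UNRELATED"] * 9 + ["RELATED"]))   # → negative (90% unrelated)
print(label_term(["RELATED", "UNRELATED"]))          # → skip (ambiguous)
```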
          <p>Finally, the labeled instances described above are used to feed a
positive-negative-skip classifier. We combine two classifiers: positive versus others and negative
versus others, using the confidence thresholds learned by the classifiers (i.e.,
those used by default to decide the final label of each instance). Terms that
are simultaneously under/over both thresholds are tagged as skip terms.
Tweet Classification. After classifying the terms, tweets containing at least
one of the terms labeled as positive (negative) keywords are straightforwardly
classified as related (unrelated), respectively. As the classified keywords are unlikely
to cover all the tweets, we use a standard bootstrapping method to annotate
the uncovered tweets. Tweets are represented as bags of words (produced after
tokenization, lowercasing and stop word removal) and term occurrence is used as
the weighting function; finally, a supervised machine learning algorithm is used to
classify the tweets.</p>
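The bootstrapping step can be sketched as follows, with hypothetical toy tweets as the keyword-covered seed; scikit-learn's Naive Bayes stands in here for the RapidMiner implementation used in the actual experiments:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Seed tweets already labeled by filter keywords (hypothetical examples):
seed_tweets = ["new bmw m3 engine review", "bmw dealership test drive",
               "bmw meaning in texting slang", "what does bmw stand for joke"]
seed_labels = ["related", "related", "unrelated", "unrelated"]

# Binary term occurrence over a lowercased bag of words, as in the text.
vectorizer = CountVectorizer(binary=True, lowercase=True)
X = vectorizer.fit_transform(seed_tweets)
clf = MultinomialNB().fit(X, seed_labels)

# Classify the tweets not covered by any filter keyword:
uncovered = ["bmw engine test", "bmw slang joke"]
print(clf.predict(vectorizer.transform(uncovered)))
```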
          <p>The tweet classification process has been carried out at the entity level; that
is, for each entity we use the tweets retrieved by the keywords as seed, only using
the training set of the entity, in order to automatically classify the remaining
tweets of the entity.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Submitted Runs</title>
        <p>
          Table 1 provides an overview of the filtering runs, showing the type of training
data used in each of them. Run UNED_ORM_filtering_1 is the instance-based
learning over HBR, which works at the entity level as described in Section 2.1. Runs
UNED_ORM_filtering_3, UNED_ORM_filtering_4 and UNED_ORM_filtering_5
follow the filter keyword approach described in Section 2.2, considering
different training data to build the model used in the keyword classification step:
UNED_ORM_filtering_3 joins all the terms from the different entities in the
training dataset to build a single model, while UNED_ORM_filtering_4 and
UNED_ORM_filtering_5 build a specific model at the entity level. The
difference between them is that, while UNED_ORM_filtering_5 only considers
entity-specific data, UNED_ORM_filtering_4 uses exactly the complementary data. The
latter run simulates the semi-supervised scenario in which the system does not use
any previously annotated data about the target entity (as in previous
evaluation campaigns [
          <xref ref-type="bibr" rid="ref2 ref4">2,4</xref>
          ]). Since in this edition of RepLab annotated data about
the target entity is available, using this data instead of filter keywords to feed
the tweet classification step will give us an idea of the performance that we can
reach. For each entity, the run UNED_ORM_filtering_2 uses the tweets from the
training data to learn the model that will directly classify all the tweets in the
test data.
Table 1 (approach per run): instance-based learning + HBR: UNED_ORM_filtering_1;
filter keywords: UNED_ORM_filtering_5, UNED_ORM_filtering_3, UNED_ORM_filtering_4;
filter keywords (tweet classification only): UNED_ORM_filtering_2.
        </p>
        <p>
          In all the lter keywords' runs (from 2 to 5), tweets are tokenized using
a Twitter-speci c tokenizer [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], frequent words are removed using both
English and Spanish stop word lists, and terms occurring less than 5 times in the
collection are discarded. The machine learning algorithm used in all the
experiments was Nave Bayes, using the implementation provided by the Rapidminer
toolkit [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
2.4
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>Results</title>
        <p>Table 2 reports the scores obtained for the evaluation metrics used in the filtering
subtask: Accuracy, Reliability (R), Sensitivity (S) and F1(R, S). For each of the
runs, the position in the official F1(R, S) RepLab ranking is also shown.</p>
        <p>Even if the results obtained in terms of accuracy are competitive, the scores
according to the Reliability and Sensitivity measures evidence the need for a better
understanding of the problem. Comparing our results with the baseline,
only the run UNED_ORM_filtering_2 outperforms it in terms of accuracy and
F1(R, S), which shows that there is still room for improvement.</p>
        <p>As expected, runs that use previously annotated data from the entity are
significantly better than the semi-supervised approaches. A deeper analysis of the
machine learning steps is needed to understand the actual limitations of the filter
keywords approach.
Table 2 (accuracy per run): UNED_ORM_filtering_2: 0.8587; baseline: 0.8714;
UNED_ORM_filtering_1: 0.8733; UNED_ORM_filtering_5: 0.8423;
UNED_ORM_filtering_4: 0.5020; UNED_ORM_filtering_3: 0.5026.</p>
        <p>Polarity Subtask
Most of the approaches that aim to detect polarity in texts make use of affective
lexicons in order to identify opinions, sentiments or emotions. The main
drawback of affective lexicon development is the manual effort needed to generate
good-quality resources. Besides, these resources are usually designed for a
general purpose, without taking into account the peculiarities of a specific domain. The
RepLab 2013 dataset is a perfect evaluation framework to study and analyze
automatic methods for the domain adaptation of affective lexicons.</p>
        <p>Sentiment Analysis (SA) and Reputational Polarity (RP) are close but different
tasks. While the first is mainly focused on identifying subjective content in
product/service reviews, the second aims to measure whether a text has positive
or negative implications for a company's reputation. This definition includes
the well-known, but less studied in SA, polar facts. Statements such as
"Report: HSBC allowed money laundering that likely funded terror, drugs, ..." are
clearly negative when analyzing the reputation of the HSBC company, even though
no opinion is expressed. The effect of polar facts can be reasonably captured using
a domain-specific affective lexicon. In the example, a system using a lexicon that
attaches a negative polarity or emotion to the words money laundering, terror or
drugs will correctly classify this fact.</p>
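The intuition can be made concrete with a tiny sketch. The lexicon entries below are hypothetical placeholders, not actual SentiSense content; the point is only that a domain-adapted lexicon lets a purely factual statement receive a reputational polarity:

```python
# Hypothetical domain-specific polarity lexicon (illustrative, not SentiSense):
lexicon = {"money laundering": "negative", "terror": "negative",
           "drugs": "negative", "award": "positive"}

def polar_fact_polarity(text):
    """Classify a factual statement by majority polarity of its lexicon hits."""
    text = text.lower()
    hits = [pol for term, pol in lexicon.items() if term in text]
    if not hits:
        return "neutral"
    return max(set(hits), key=hits.count)

print(polar_fact_polarity(
    "Report: HSBC allowed money laundering that likely funded terror, drugs"))
# → negative, even though no opinion is expressed
```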
        <p>
          Under this premise, our aim in this approach is to evaluate whether the use of
semantic graphs and word sense disambiguation [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] helps to generate
domain-specific affective lexicons. Following this idea, we generate domain-specific
semantic graphs in order to expand the existing lexicon SentiSense [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. To classify
tweets with reputational polarity we have used the emotional concept-based
system for sentiment analysis presented in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and adapted it to work with English
and Spanish texts simultaneously [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
Our hypothesis is that, using a semantic graph where the nodes represent
concepts and the edges represent semantic relations between concepts, the emotional
meaning of previously labeled concepts can be spread to other, unlabeled
concepts that are strongly related in meaning. Even if this technique can be
applied to a general-purpose dataset, a domain-specific dataset
should produce better results due to the similarity in vocabulary of the different
documents. To this end, we followed the approach presented in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], whose
main points are the use of a word sense disambiguation algorithm to properly
obtain the correct sense of each word, and the use of different measures of text
similarity and WordNet relations between concepts to generate a semantic graph
of a document. This graph is then used to identify the different topics that are
dealt with in the document and to extract the most representative sentences of
the document to generate an automatic summary. Instead of manually
labeling the initial seed of concepts with emotional meaning, we use the SentiSense
affective lexicon as the seed.
        </p>
        <p>
          The method proposed for the domain adaptation of affective lexicons consists of
three steps:
{ Concept Identification. In order to determine the appropriate sense of
each word in WordNet we use the UKB algorithm [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], which is available both
for English and Spanish. Accordingly, in this step each tweet in the
dataset is represented as WordNet concepts. In this step all the hypernyms
of such concepts are also retrieved in order to enrich the graph generation. Note
that only nouns, verbs, adjectives and adverbs are used to generate the graph.
        </p>
        <p>
          Also, a list of stop words has been used to remove non-relevant words.
{ Graph generation. We have analyzed di erent relations between concepts
as studied in [
          <xref ref-type="bibr" rid="ref14 ref7">14,7</xref>
          ] in order to determine which are suitable to propagate
the emotional meaning to closely related concepts. As the dataset consists
of tweets and the number of words per tweet is 10 words average, using only
semantic relations from WordNet produces very unconnected graph. For this
reason, we have also included a co-occurrence relation that links concepts
which co-occur between 3 to 10 times in the corpus. Our experiments with
the training sets reveal that the co-occurrence relation and the WordNet
relations of hypernymy, antonymy and derived from are the most
appropriate for spreading the emotional meaning between concepts. We use similar
weights as the proposed in [
          <xref ref-type="bibr" rid="ref14 ref7">14,7</xref>
          ] to ponder the di erent relations in the
graph. To this end, we weight with 1:0 each pair of concepts related by the
WordNet pointer derived from and with 1:0 for the antonymy one.
Following the idea that a hypernym is a generalization of a concept, the weight
assigned to these relations follow the equation 1. Finally, we weight with 1:0
each co-occurrence of two concepts.
        </p>
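The co-occurrence edges described above can be sketched as follows, assuming tweets have already been mapped to lists of WordNet concepts (the concept names here are hypothetical placeholders):

```python
from itertools import combinations
from collections import Counter

def cooccurrence_edges(tweets_as_concepts, lo=3, hi=10):
    """Add a co-occurrence edge (weight 1.0) between each pair of concepts
    that co-occur between `lo` and `hi` times in the corpus, as in the text."""
    counts = Counter()
    for concepts in tweets_as_concepts:
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return {pair: 1.0 for pair, n in counts.items() if lo <= n <= hi}

corpus = [["bank", "loan"], ["bank", "loan"], ["bank", "loan"],
          ["bank", "crisis"]]
print(cooccurrence_edges(corpus))  # → {('bank', 'loan'): 1.0}
```

The pair ("bank", "crisis") co-occurs only once, below the lower bound, so no edge is added for it.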
        <p>weight(Ci, Ej) = 1 / (depth(hyperi) + 1)    (1)
{ Propagating emotions to new concepts. Finally, in this step the
semantic graph is used to extend the emotional categories to new concepts not
labeled in SentiSense. To this end, a concept Ci not previously labeled in
SentiSense is labeled as follows:</p>
        <p>For all the incoming links that represent relations of co-occurrence,
hypernymy and derived-from, and that connect Ci with concepts of the same
emotional category, the weights of all links are added. For the antonymy
relations, the emotional categories are replaced by their antonyms, as
given in the SentiSense lexicon. As a result, for the concept Ci we get
a vector of 14 positions, each representing a SentiSense emotion, where
the vector positions represent the weight of each emotion in the concept.
It is important to note that these weights are domain-specific, since they are
derived from the domain semantic graph.</p>
        <p>The concept is finally labeled with multiple emotional categories, whose
weights are normalized at the concept level, so that the sum of the
weights of all the emotional categories associated with the concept is 1.0.</p>
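The propagation-and-normalization rule can be sketched as follows. The relation names, antonym table and emotion labels are illustrative placeholders; a real run would use the SentiSense categories and the graph edges described above:

```python
def propagate(incoming, antonym_of):
    """Label an unlabeled concept from its incoming links.
    `incoming` is a list of (relation, weight, emotion) triples; for antonymy
    links the emotion is swapped for its SentiSense antonym. The resulting
    emotion weights are normalized to sum to 1.0, as in the text."""
    totals = {}
    for relation, weight, emotion in incoming:
        if relation == "antonymy":
            emotion = antonym_of[emotion]
        totals[emotion] = totals.get(emotion, 0.0) + weight
    norm = sum(totals.values())
    return {e: w / norm for e, w in totals.items()}

antonyms = {"joy": "sadness", "sadness": "joy"}
links = [("co-occurrence", 1.0, "joy"),
         ("hypernymy", 0.5, "joy"),
         ("antonymy", 1.0, "sadness"),   # contributes to "joy" after the swap
         ("co-occurrence", 0.5, "fear")]
print(propagate(links, antonyms))
```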
        <p>Figure 1 shows an example of an expanded concept.</p>
        <p>We have tested different approaches to generating the domain-specific
semantic graphs using the RepLab 2013 training set. Our first approach
used all the entities of the same domain to generate the graph and to adapt the
SentiSense affective lexicon. We evaluated different configurations: using just
the related tweets of the training set, using all tweets of the test set, and using both
the related tweets of the training set and all tweets of the test set. We have also
tested generating graphs at the entity level, that is to say, generating a graph
for each entity.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Emotional concept-based system for Sentiment Analysis</title>
        <p>
          The resulting domain-specific affective lexicons are used to identify emotions in
the tweets of the RepLab 2013 dataset using the emotional concept-based system
for sentiment analysis presented in [
          <xref ref-type="bibr" rid="ref7 ref8">8,7</xref>
          ] and described here for clarification:
{ Pre-processing: POS Tagging and Concept Identification. The
objective of the first step is to translate each text to its conceptual representation
in order to work at the concept level in the next steps and avoid word
ambiguity. To this aim, the input text is split into sentences and the tokens are
tagged with their POS. With this information, the system next maps each
token to its appropriate WordNet concept using the UKB algorithm [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
{ Emotion Identification. Once the concepts are identified, the next step
maps each WordNet synset to its corresponding emotional category in the
SentiSense affective lexicon, if any. In this step, the different domain-specific
lexicons generated using the semantic graph approach are used.
{ Post-processing: Negation and Intensifiers. In this step, the system
has to detect and resolve the effect of negations and intensifiers on the
emotions discovered in the previous step. This process is important, since these
linguistic modifiers can change the polarity and intensity of the emotional
meaning of the text. To this end, our system first identifies the presence
of modifiers using a list of common negation and intensification tokens. In
such a list, each intensifier is assigned a value that represents its weight or
strength. The scope of each modifier is determined using the syntax tree of
the sentence in which the modifier arises. We assume as scope all
descendant leaf nodes of the common ancestor between the modifier and the word
immediately after it, and to the right of the modifier. However, this process
may introduce errors in special cases, such as subordinate sentences or those
containing punctuation marks. In order to avoid this, our method includes
a set of rules to delimit the scope in such cases. These rules are based on
specific tokens that usually mark the beginning of a different clause (e.g.,
because, until, why, which, etc.). Since some of these delimiters are
ambiguous, their POS is used to disambiguate them. Once the modifiers and their
scope are identified, the system resolves their effect on the emotions that
they affect in the text. The effect of negation is addressed by substituting
the emotions assigned to the concepts with their antonyms. In the case of
intensifiers, the concepts that fall within the scope of an intensifier are tagged
with the corresponding percentage weight in order to increase or diminish
the intensity of the emotions assigned to the concepts.
{ Classification. In the last step, all the information generated in the previous
steps is used to translate each text into a Vector of Emotional Intensities
(VEI), which will be the input to a machine learning algorithm. The VEI
is a vector of 14 positions, each of them representing one of the emotional
categories of the SentiSense affective lexicon. The values of the vector are
generated as follows:
        </p>
        <p>For each concept, Ci, labeled with an emotional category, Ej, the weight
of the concept for that emotional category, weight(Ci, Ej), is set to 1.0.
If no emotional category was found for the concept, and it was assigned
the category of its first labeled hypernym, hyperi, then the weight of the
concept is computed as:
weight(Ci, Ej) = 1 / (depth(hyperi) + 1)    (2)
If the concept is affected by a negation and the antonym emotional
category, Eantonj, was used to label the concept, then the weight of the
concept is multiplied by a factor of 0.6. This value has been empirically
determined in previous studies. It is worth mentioning that the experiments
have shown that performance decreases sharply for values below 0.5, while
it drops only gradually for values above 0.6.</p>
        <p>If the concept is affected by an intensifier, then the weight of the
concept is increased/decreased by the intensifier percentage, as shown in
Equation 3.
weight(Ci, Ej) = weight(Ci, Ej) × (100 + intensifier_percentage) / 100    (3)
Finally, the position in the VEI of the emotional category assigned to
the concept is incremented by the weight previously calculated.</p>
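The per-concept weighting rules above (direct label, hypernym fallback of Equation 2, the 0.6 negation factor, and the intensifier percentage of Equation 3) can be sketched in one function:

```python
def concept_weight(depth_of_hypernym=None, negated=False, intensifier_pct=0):
    """Weight contributed by one concept to its emotion's VEI position:
    1.0 for a directly labeled concept, 1/(depth+1) when the label comes
    from the first labeled hypernym, times 0.6 under negation (empirically
    determined factor from the text), times the intensifier percentage."""
    w = 1.0 if depth_of_hypernym is None else 1.0 / (depth_of_hypernym + 1)
    if negated:
        w *= 0.6
    w *= (100 + intensifier_pct) / 100
    return w

print(concept_weight())                          # → 1.0 (directly labeled)
print(concept_weight(depth_of_hypernym=1))       # → 0.5 (labeled via hypernym)
print(concept_weight(negated=True, intensifier_pct=50))
```

The VEI position of the concept's emotional category is then incremented by the returned weight.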
      </sec>
      <sec id="sec-2-6">
        <title>Results</title>
        <p>The polarity detection task in RepLab 2013 consists of classifying tweets as
positive, negative or neutral. It is important to notice that the polarity is oriented
to reputation. That is, an emotionally negative tweet is not necessarily negative
from the reputational point of view (e.g. "I'm sad. Michael Jackson is dead"). Note
that tweets that are unrelated according to the human assessors are not considered
in the evaluation.</p>
        <p>
          For the individual evaluation of the reputational polarity task, performance
is evaluated in terms of accuracy (% of correctly annotated cases) over related
tweets. Reliability (R) and Sensitivity (S) are also included for comparison
purposes [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Overall scores (accuracy, R, S, F1(R, S)) are computed as the average
of individual scores per entity, assigning the same weight to all entities.
        </p>
        <p>
          As the system for reputational polarity is a supervised method, we have
tested different machine learning algorithms in order to determine the best
classifier for the task. In our experiments, the logistic regression model (Logistic) as
implemented in Weka with default parameters obtained the best results. As
previously mentioned, we tested different approaches for generating the
domain-specific semantic graph; however, we could only submit the runs using the graph
generated from the related tweets of the training set at the domain level. So,
the runs submitted are:
{ UNED ORM polarity 1: This run uses the approach described in Section 2.1,
which is similar to the baseline but uses different text similarity measures
and the Heterogeneity Based Ranking approach to combine them.
{ UNED ORM polarity 2: This run uses the system presented in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] trained at
the entity level.
{ UNED ORM polarity 3: This run uses the system presented in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] trained at
the entity level, but using a balanced training set instead of the whole training
set.
{ UNED ORM polarity 4: This run uses the system presented in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] trained at
the entity level, but using the domain-specific SentiSense lexicon generated
with the graph built from all the related tweets of the training set of the same
domain.
{ UNED ORM polarity 5: This run uses the same system as UNED ORM polarity 4,
but using a balanced training set instead of the whole training set.
        </p>
        <p>
          As shown in Table 3, the best performing configuration (UNED_ORM_polarity_
2) is the one trained at the entity level that uses the original
SentiSense lexicon (without expanding it using semantic graphs). However, the
difference with the UNED_ORM_polarity_4 system is not significant. Both
approaches perform better than the baseline in terms of accuracy, while their
performance drops when evaluated with the Reliability and Sensitivity measures. Even
though a similar approach has been tested with promising results in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], the
results we obtained on the RepLab 2013 dataset suggest that a deeper analysis should
be done in order to correctly understand the task. Our intuition is that
analyzing reputational polarity in Twitter slightly differs from analyzing reviews of
films or news. First, the text in tweets contains multiple errors and misspellings,
so the error introduced in the linguistic processing (POS analysis, parsing, word
sense disambiguation, etc.) is bigger than in well-structured texts such as news
or reviews. Second, we found that most of the vocabulary used carries positive
emotional meaning, so this positivity is propagated to other concepts by this
method, yielding vectors of emotions with mostly positive emotions, which
translates into over-learning of the positive class. Finally, as in the filtering
subtask, combining multiple similarity measures for instance-based learning slightly
outperforms the baseline that only considers Jaccard similarity over the terms.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Topic Detection Subtask</title>
      <p>
This subtask consists of grouping entity-related tweets according to topics. The
organization does not provide any set of predefined topics. Given that the
training corpus corresponds to a previous time range, new topics can appear in
the test set. It is assumed that each tweet belongs to one topic. In the following
sections we describe the different approaches proposed for the topic detection
subtask: LDA-based clustering, wikified tweet clustering and term clustering.
Finally, we show the results obtained by the submitted runs.
We use an LDA-based model to obtain the topics of a collection of
tweets. It is an unsupervised machine learning technique that uncovers
information about latent topics across the corpora. The model is inspired by the
TwitterLDA model [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and the TOT model [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. TwitterLDA is an author-topic
model based on the assumptions that there is a set of K topics in
Twitter, each represented by a word distribution, and that each user's topic interests
are modeled by a distribution over the topics. In this model a single topic is assigned
to an entire tweet. In the TOT model, each topic is associated with a
continuous distribution over timestamps, and for each generated document the mixture
distribution over topics is influenced by both word co-occurrences and the
document's timestamp.
      </p>
<p>For the monitoring scenario there are two important characteristics present
in TwitterLDA and TOT: tweets are about one single topic and each
topic has a time distribution. We use these features in an LDA-based model with
the following assumptions: (i) there is a set of K topics in the collection of
tweets of an entity, each represented by a word distribution and a continuous
distribution over time; and (ii) when a tweet is written, the author first chooses
a topic based on a topic distribution for the entity.</p>
<p>Then a bag of words is chosen one by one based on the topic. However, not
all words in a tweet are closely related to the topic of that tweet; some are
background words commonly used in tweets on different topics. Therefore, for
each word in a tweet, the author first decides whether it is a background word
or a topic word, and then chooses the word from the respective word distribution
of the entity.</p>
<p>The process is described as follows:
1. Draw θd ∼ Dir(α), φB ∼ Dir(β)
2. For each topic z = 1, ..., K:
(a) draw φz ∼ Dir(β)
3. For each tweet d = 1, ..., D:
(a) draw a topic zd ∼ Multi(θd)
(b) draw a timestamp td ∼ Beta(ψzd)
(c) for each word i = 1, ..., Nd:
i. draw ydi ∼ Bernoulli(π)
ii. if ydi = 0: draw wdi ∼ Multi(φB); if ydi = 1: draw wdi ∼ Multi(φzd)</p>
      <p>where φz denotes the word distribution for topic z, ψz is the time distribution
for topic z, φB is the word distribution for background words, and π governs
the Bernoulli choice between background words and topic words. After applying
the model, a topic is represented as a vector of probabilities over the space of
words and a single topic is assigned to each entire tweet. We employ Gibbs
sampling to perform approximate inference. The graphical model is shown in
Figure 2.</p>
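The generative story above can be illustrated with a small simulation. This is a sketch only: the uniform word choices, the Beta(2, 5) timestamp parameters and π = 0.2 are illustrative stand-ins for the learned distributions φ, ψ and π.

```python
import random

def generate_tweets(K, D, N, topic_words, background_words, pi=0.2, seed=0):
    """Simulate the generative story: each tweet gets a single topic and a
    timestamp; each word is either a background word or a topic word."""
    rng = random.Random(seed)
    tweets = []
    for _ in range(D):
        z = rng.randrange(K)          # (3a) topic z_d ~ Multi(theta_d)
        t = rng.betavariate(2, 5)     # (3b) timestamp t_d ~ Beta(psi_{z_d})
        words = []
        for _ in range(N):
            y = 0 if rng.random() < pi else 1   # (c-i) y_di ~ Bernoulli(pi)
            if y == 0:
                # background word: w_di ~ Multi(phi_B)
                words.append(rng.choice(background_words))
            else:
                # topic word: w_di ~ Multi(phi_{z_d})
                words.append(rng.choice(topic_words[z]))
        tweets.append({"topic": z, "time": t, "words": words})
    return tweets
```

Inference (Gibbs sampling) inverts this process to recover the topic assignments from observed tweets.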
<p>The system also relies on transfer learning by contextualizing the target
tweets with a large set of unlabeled "background" tweets that help improve
the clustering. We include background tweets together with target tweets in the
LDA model, and we set the total number of clusters. In practice, this means that
the system can adapt to find the right number of clusters for the target data,
overcoming one of the limitations of LDA-based approaches (the need to
establish the number of clusters a priori).</p>
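The post-processing applied after the joint target-plus-background clustering (keep only clusters containing at least one target tweet, then drop the background tweets) can be sketched as follows; names such as `extract_target_clusters` are ours, not from the actual system.

```python
def extract_target_clusters(assignments, target_ids):
    """Keep only clusters containing at least one target tweet, then remove
    the non-target (background) tweets from those clusters.

    assignments: dict tweet_id -> cluster_id (output of the joint LDA run)
    target_ids: set of tweet ids belonging to the target (labeled) set
    """
    clusters = {}
    for tweet_id, cluster_id in assignments.items():
        clusters.setdefault(cluster_id, set()).add(tweet_id)
    return {
        cid: members & target_ids      # drop background tweets
        for cid, members in clusters.items()
        if members & target_ids        # keep clusters touching the target set
    }
```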
      <p>
We train the LDA-based model simultaneously over all the tweets in the
target set and the background, and fix the target number of clusters (as required
by LDA) for the whole set. Note that the LDA model labels each tweet with only
one topic, so we obtain a non-overlapping clustering. Once we have obtained
the clustering, we extract all clusters that contain at least one target tweet, and
remove non-target tweets from those clusters.
This approach relies on the hypothesis that tweets sharing concepts or entities
defined in a knowledge base (such as Wikipedia) are more likely to be talking
about the same topic than tweets with few or no concepts in common.
Tweet Wikification. We use an entity linking approach to gather Wikipedia
entries that are semantically related to a tweet. To this end, the
COMMONNESS probability [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], based on the intra-Wikipedia hyperlinks, is used to select
the most probable entity for each of the longest n-grams that were linked to
Wikipedia articles. The approach computes the probability of a concept/entity
c being the target of a link with anchor text q in Wikipedia as:
      </p>
      <p>Commonness(c, q) = |Lq,c| / Σc′ |Lq,c′|
where Lq,c denotes the set of all links with anchor text q and target c.</p>
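A minimal sketch of the COMMONNESS estimate, assuming anchor-to-target link counts have already been extracted from a Wikipedia dump (the `link_counts` structure is illustrative):

```python
def commonness(link_counts, anchor, candidate):
    """COMMONNESS: probability that anchor text q links to concept c,
    estimated from intra-Wikipedia hyperlink counts.

    link_counts: dict anchor -> {concept: |L_{q,c}|, number of links with
                 that anchor pointing to that concept}
    """
    counts = link_counts.get(anchor, {})
    total = sum(counts.values())          # sum over all candidates c'
    return counts.get(candidate, 0) / total if total else 0.0
```

For example, if "apple" anchors 60 links to Apple_Inc. and 40 to Apple (the fruit), Commonness(Apple_Inc., "apple") = 0.6.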
<p>We used both the Spanish and English Wikipedia dumps. Spanish Wikipedia
articles are then mapped to the corresponding English Wikipedia article by
following the inter-lingual links, using the Wikimedia API2.</p>
<p>Tweet Clustering. After tweets are wikified, the Jaccard similarity between
the sets of entities linked to the tweets is used to group them together: given
two tweets d1 and d2 represented by the sets of Wikipedia entities C1 and C2
respectively, if Jaccard(C1, C2) &gt; th, then d1 and d2 are grouped into the
same cluster.</p>
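A sketch of this grouping rule. The text above does not state how pairwise merges are combined into clusters; this sketch assumes transitive merging (connected components over above-threshold pairs):

```python
def jaccard(a, b):
    """Jaccard similarity between two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_by_jaccard(entity_sets, th=0.2):
    """Group tweets whose linked-entity sets have Jaccard similarity > th.

    entity_sets: dict tweet_id -> set of Wikipedia entities.
    Above-threshold pairs are merged transitively via union-find.
    """
    parent = {t: t for t in entity_sets}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    ids = list(entity_sets)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if jaccard(entity_sets[a], entity_sets[b]) > th:
                parent[find(a)] = find(b)

    clusters = {}
    for t in ids:
        clusters.setdefault(find(t), []).append(t)
    return list(clusters.values())
```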
      <sec id="sec-3-1">
        <title>Term Clustering</title>
<p>Let us assume that each topic related to an entity can be represented with a
set of keywords that allow the expert to understand what the topic is about.
Considering this, we define a two-step algorithm that tries to (i) identify the
terminology of each topic (by clustering the terms) and (ii) assign tweets to
the identified clusters.</p>
<p>We use Hierarchical Agglomerative Clustering (HAC) to build the term
clustering. As similarity function, we use the confidence score returned by a classifier
that, given a pair of co-occurring terms, guesses whether both terms belong to
the terminology of the same topic or not.
2 http://www.mediawiki.org/wiki/API:Properties</p>
      </sec>
      <sec id="sec-3-2">
<title>Co-occurrence pair representation. We used different families of features</title>
        <p>
to represent each of the co-occurring pairs:
– Term features: features that describe each of the terms of the co-occurrence
pair: term occurrence, normalized frequency, pseudo-document
TF.IDF and KL-divergence [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. These features were computed in two ways:
(i) considering only tweets in the labeled corpus, and (ii) considering tweets
in both the labeled and background corpora. Features based on the
metadata of the tweets where each term occurs are: Shannon's entropy of the
mentioned users, URLs, hashtags and authors of the tweets where the term occurs.
– Content-based pair term features: features that consider both terms of
the co-occurrence pair, such as the Levenshtein distance between terms, the
normalized frequency of co-occurrences, and the Jaccard similarity between the
occurrences of each of the terms.
– Meta-data-based pair term features: Jaccard similarity and Shannon's
entropy of the mentioned users, URLs, hashtags and authors of the tweets where
both terms co-occur.
– Time-aware features: features based on the creation dates of the
tweets where the terms co-occur. The features computed are median, minimum,
maximum, mean, standard deviation, Shannon's entropy and Jaccard
similarity, considering four different time granularities:
milliseconds, minutes, hours and days.
        </p>
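As an illustration of one of the metadata features, Shannon's entropy over the authors (or equally hashtags, URLs or mentioned users) of the tweets where a term occurs can be computed as:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of a list of metadata values, e.g. the
    authors of the tweets where a given term occurs. Low entropy means the
    term is concentrated in few authors; high entropy means it is spread."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```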
<p>In our classification model each instance corresponds to a pair of co-occurring
terms ⟨t, t′⟩ in the entity stream of tweets. In order to learn the model, we
extract training instances from the RepLab 2013 training dataset, considering
the following labeling function:
label(⟨t, t′⟩) = clean if maxj Precision(Ct∩t′, Lj) &gt; 0.9; noisy otherwise
where Ct∩t′ is the set of tweets where terms t and t′ co-occur, L is the set
of topics in the gold standard, and</p>
<p>Precision(Ci, Lj) = |Ci ∩ Lj| / |Ci|</p>
<p>After this process, term pairs for which more than 90% of the tweets belong
to the same cluster in the gold standard (i.e., purity &gt; 0.9) are considered
clean pairs; term pairs with lower purity are labeled as noisy pairs.</p>
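The labeling function follows directly from the definitions above (topic and tweet identifiers are illustrative):

```python
def precision(cluster, gold_topic):
    """Precision(C_i, L_j) = |C_i ∩ L_j| / |C_i|"""
    return len(cluster & gold_topic) / len(cluster) if cluster else 0.0

def label_pair(cooccur_tweets, gold_topics, purity=0.9):
    """Label a co-occurring term pair 'clean' if the tweets where both terms
    appear concentrate (purity > 0.9) in a single gold-standard topic.

    cooccur_tweets: set of tweet ids where t and t' co-occur (C_{t∩t'})
    gold_topics: list of sets of tweet ids, one per gold topic (L)
    """
    if not cooccur_tweets:
        return "noisy"
    best = max(precision(cooccur_tweets, topic) for topic in gold_topics)
    return "clean" if best > purity else "noisy"
```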
<p>Hierarchical Agglomerative Term Clustering. Using all the co-occurrence
pair instances in the training set, we build a single binary classifier that is
applied to all the entities in the test set. The confidence that a co-occurrence
pair belongs to the clean class is then used to build a similarity matrix between
terms. Hierarchical Agglomerative Clustering is then applied to cluster the
terms, using the previously built similarity matrix. After building the
agglomerative clustering, a cut-off threshold based on the number of possible merges is
used to return the final term clustering solution.</p>
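A sketch of mean-linkage HAC with a merge-fraction cut-off, assuming the pairwise similarity (the classifier confidence for the clean class) is given as a function; this is a quadratic-per-merge illustration, not the actual implementation:

```python
def hac_merge_fraction(items, sim, merge_fraction=0.3):
    """Mean-linkage agglomerative clustering over a pairwise similarity
    function, stopping after merge_fraction of the (n - 1) possible merges.
    A cut-off of 0.30 applies 30% of all possible merges."""
    clusters = [[x] for x in items]
    max_merges = int(merge_fraction * (len(items) - 1))
    for _ in range(max_merges):
        best, pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # mean linkage: average similarity over all cross pairs
                s = sum(sim(a, b) for a in clusters[i] for b in clusters[j])
                s /= len(clusters[i]) * len(clusters[j])
                if best is None or s > best:
                    best, pair = s, (i, j)
        i, j = pair
        clusters[i] += clusters.pop(j)   # merge the most similar pair
    return clusters
```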
<p>Tweet clustering. The second step of this algorithm consists of assigning
tweets to the identified term clusters. Each tweet is assigned to the cluster with
the highest Jaccard similarity.</p>
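The assignment step can be sketched as an argmax over Jaccard similarities between the tweet's terms and each cluster's terminology:

```python
def assign_tweet(tweet_terms, term_clusters):
    """Return the index of the term cluster with the highest Jaccard
    similarity to the tweet's set of terms."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    return max(range(len(term_clusters)),
               key=lambda i: jaccard(tweet_terms, set(term_clusters[i])))
```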
      </sec>
      <sec id="sec-3-3">
        <title>Submitted Runs</title>
<p>A total of seven runs have been submitted for the topic detection subtask.
Table 4 summarizes the approaches and the parameters used for each of the runs.
UNED_ORM_topic_det_1 consists of applying instance-based learning over HBR,
analogously to Section 2.1. UNED_ORM_topic_det_2 is the
wikified tweet clustering approach, using the threshold th = 0.2 for the Jaccard
similarity. This threshold has been empirically optimized using the
training dataset3. UNED_ORM_topic_det_3, UNED_ORM_topic_det_4 and UNED_ORM_
topic_det_5 use the term clustering approach with different machine
learning algorithms to combine the features (Naive Bayes and Logistic Regression)
and different thresholds applied to the HAC. A threshold of 0.30 indicates that
the final clustering considered is the one produced when 30% of all possible
merges have been applied. In all the term clustering runs, the merges are carried out
by the mean linkage criterion. Again, only terms that survive stopword removal
and occur more than five times are considered. Finally, UNED_ORM_topic_det_6
and UNED_ORM_topic_det_7 use the LDA-based clustering approach, where the
transfer learning step considers background tweets from the same entity (up to
a maximum of 10,000 tweets) or 10,000 tweets from another randomly selected
entity, respectively.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Results</title>
        <p>
Table 5 shows the results obtained by the submitted runs in the topic detection
subtask, compared to the baselines. The table reports scores for
Reliability (R), Sensitivity (S) and their F-measure, F1(R, S). It also
reports the position of the runs in the ranking provided by the organizers. As a
clustering task, Reliability (R) corresponds to BCubed Precision and Sensitivity
(S) corresponds to BCubed Recall [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
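For reference, BCubed Precision and Recall for a one-cluster-per-tweet assignment can be computed as follows (a quadratic sketch over tweet-to-cluster mappings):

```python
def bcubed_precision_recall(system, gold):
    """BCubed Precision (Reliability) and Recall (Sensitivity) for a
    clustering, given tweet -> cluster id mappings for system and gold.

    For each item: precision is the fraction of its system cluster sharing
    its gold cluster; recall is the fraction of its gold cluster recovered
    in its system cluster. Both are averaged over all items."""
    items = list(system)
    p_sum = r_sum = 0.0
    for e in items:
        sys_cluster = [x for x in items if system[x] == system[e]]
        gold_cluster = [x for x in items if gold[x] == gold[e]]
        correct = [x for x in sys_cluster if gold[x] == gold[e]]
        p_sum += len(correct) / len(sys_cluster)
        r_sum += len(correct) / len(gold_cluster)
    n = len(items)
    return p_sum / n, r_sum / n
```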
<p>In general, all the approaches are at the top of the ranking, performing
significantly better than the baselines. The wikified tweet clustering gets the highest
score on all the metrics. The term clustering approach seems to perform similarly
when changing the HAC cut-off threshold or the machine learning algorithm. On
the other hand, the LDA-based clustering approach performs better when the
background collection used for transfer learning contains tweets from different</p>
        <sec id="sec-3-4-1">
<title>3 Note that this is the only supervision used in this system.</title>
<p>entities/test cases (this run is guaranteed a background collection of 10,000
tweets). Finally, as in the other subtasks, combining multiple similarity
measures for instance-based learning slightly outperforms the baseline that only
considers Jaccard similarity over the terms.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Full Monitoring Task</title>
<p>The full monitoring task combines the three subtasks that are typically addressed
in a monitoring process: first, the tweets that are not related to the entity of
interest are filtered out (filtering); then, the related tweets are clustered by
topics (topic detection); and finally, those topics are ranked by priority (priority).
Next we describe the submitted runs for the full monitoring task, as well as the
obtained results.
For the full monitoring task, we submitted 8 different runs in total. The runs
combine different approaches for each of the three subtasks, treating the
subsystems as a pipeline (e.g., topic detection is carried out on the tweets labeled
as related after the filtering step). Table 6 shows the different combinations
submitted. For instance, the run UNED_ORM_full_task_1 consists of applying
instance-based learning over HBR for the three subtasks. Run UNED_ORM_full_
task_2 uses the straightforward tweet classification filtering system, the wikified
tweet clustering technique to detect topics over the related tweets, and finally
the baseline provided by the organizers for the priority subtask (instance-based
learning over Jaccard distance).
UNED_ORM_full_task_1: UNED_ORM_filt_1 + UNED_ORM_topic_det_1 + UNED_ORM_priority_1
UNED_ORM_full_task_2: UNED_ORM_filt_2 + UNED_ORM_topic_det_2 + baseline
UNED_ORM_full_task_3: UNED_ORM_filt_3 + UNED_ORM_topic_det_2 + baseline
UNED_ORM_full_task_4: UNED_ORM_filt_2 + UNED_ORM_topic_det_3 + baseline
UNED_ORM_full_task_5: UNED_ORM_filt_3 + UNED_ORM_topic_det_3 + baseline
UNED_ORM_full_task_6: baseline + UNED_ORM_topic_det_6 + baseline
UNED_ORM_full_task_7: baseline + UNED_ORM_topic_det_7 + baseline
UNED_ORM_full_task_8: UNED_ORM_filt_3 + UNED_ORM_topic_det_2 + UNED_ORM_priority_1
We have described and discussed the systems submitted by the UNED ORM
group to the RepLab 2013 evaluation campaign. We have participated in three
of the subtasks (filtering, topic detection and polarity), as well as in the full
monitoring task.</p>
<p>The filtering subtask turned out to be crucial for the overall performance of
a monitoring system. An in-depth analysis of the results is needed to understand
the limitations of our filter keyword approach, which was initially developed for
semi-supervised scenarios, not for a fully supervised scenario as in RepLab 2013.</p>
<p>In the reputational polarity task, an approach to generate semantic graphs
for domain-specific affective lexicon adaptation has been tested. Even if a similar
approach had previously achieved promising performance in a traditional
sentiment analysis task using news and reviews as input data, our results
on the RepLab 2013 dataset seem to be less competitive and deserve further analysis. In
particular, more analysis of the differences between reputational polarity and
traditional sentiment analysis is needed.</p>
<p>Three of our topic detection approaches (LDA-based clustering, wikified
tweet clustering and term clustering) perform competitively with respect to
other RepLab submissions. Still, the room for improvement is large, and it is
not easy to assess how much of the problem can be solved automatically.</p>
      <p>Finally, in terms of instance-based learning, the results suggest that extending
tweets with associated contents (tags, external links, Wikipedia entries) does not
provide useful signals to improve performance, at least in our current setting.</p>
      <p>Future work will focus on the analysis of the results and the optimization of
the di erent subsystems for the monitoring task.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soroa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Personalizing pagerank for word sense disambiguation</article-title>
          .
          <source>In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics</source>
          . pp.
          <volume>33</volume>
–
          <fpage>41</fpage>
          .
Association for Computational Linguistics (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artiles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spina</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corujo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>WePS-3 Evaluation Campaign: Overview of the Online Reputation Management Task</article-title>
          .
          <source>In: CLEF 2010 Labs and Workshops Notebook Papers</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carrillo-de-Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chugur</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corujo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martín</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meij</surname>
            , E., de Rijke,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spina</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : Overview of RepLab 2013:
          <article-title>Evaluating Online Reputation Management Systems</article-title>
          .
          <source>In: CLEF 2013 Working Notes (Sep</source>
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corujo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meij</surname>
          </string-name>
          , E., de Rijke, M.: Overview of RepLab 2012:
          <article-title>Evaluating Online Reputation Management Systems</article-title>
          .
          <source>In: CLEF 2012 Labs and Workshop Notebook</source>
          Papers (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gimenez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verdejo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
<article-title>UNED: Improving text similarity measures without human assessments</article-title>
          .
          <source>In: *SEM 2012: The First Joint Conference on Lexical and Computational Semantics</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verdejo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A General Evaluation Measure for Document Organization Tasks</article-title>
          .
          <source>In: Proceedings of SIGIR 2013</source>
(Jul. 2013)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Carrillo-de-Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chugur</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Using an emotion-based model and sentiment analysis techniques to classify polarity for reputation</article-title>
          .
          <source>In: CLEF 2012 Labs and Workshop Notebook</source>
          Papers (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Carrillo-de-Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gervas</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
<article-title>A hybrid approach to emotional sentence polarity and intensity classification</article-title>
          .
          <source>In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL'10)</source>
          . pp.
          <volume>153</volume>
–
          <issue>161</issue>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Carrillo-de-Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gervas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
: SentiSense:
<article-title>An easily scalable concept-based affective lexicon for sentiment analysis</article-title>
          .
<source>In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Martín-Wanton</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spina</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
          </string-name>
, J.: UNED at RepLab 2012:
          <article-title>Monitoring task</article-title>
          . In: CLEF (Online Working Notes/Labs/Workshop) (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Meij</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weerkamp</surname>
          </string-name>
          , W., de Rijke, M.:
          <article-title>Adding semantics to microblog posts</article-title>
          .
<source>In: Proceedings of the fifth ACM international conference on Web search and data mining</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mierswa</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wurst</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klinkenberg</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scholz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Euler</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>YALE: Rapid prototyping for complex data mining tasks</article-title>
          .
          <source>In: SIGKDD'06: Proceedings of the 12th International Conference on Knowledge Discovery and Data Mining</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>O'Connor</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krieger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahn</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : Tweetmotif:
          <article-title>Exploratory search and topic summarization for twitter</article-title>
          .
          <source>Proceedings of ICWSM</source>
          pp.
          <volume>2</volume>
–
          <issue>3</issue>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diaz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Using semantic graphs and word sense disambiguation techniques to improve text summarization</article-title>
          .
          <source>Procesamiento de Lenguaje Natural</source>
          <volume>47</volume>
          ,
          <issue>97</issue>
–
          <fpage>105</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Spina</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meij</surname>
            , E., de Rijke,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oghina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bui</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breuss</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Identifying entity aspects in microblog posts</article-title>
          .
          <source>In: SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Spina</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Discovering filter keywords for company name disambiguation in Twitter</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>40</volume>
          (
          <issue>12</issue>
          ),
          <fpage>4986</fpage>
          –
          <lpage>5003</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Topics over time: a non-Markov continuous-time model of topical trends</article-title>
          .
          <source>In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <fpage>424</fpage>
          –
          <lpage>433</lpage>
          . KDD '06,
          <publisher-name>ACM</publisher-name>
          , New York, NY, USA (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>W.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>E.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Comparing Twitter and traditional media using topic models</article-title>
          .
          <source>In: Proceedings of ECIR'11</source>
          . pp.
          <fpage>338</fpage>
          –
          <lpage>349</lpage>
          . Springer-Verlag, Berlin, Heidelberg (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>