<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>WePS-3 Evaluation Campaign: Overview of the Online Reputation Management Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Enrique Amigo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Javier Artiles</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julio Gonzalo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Damiano Spina</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bing Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adolfo Corujo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Illinois at Chicago</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Llorente &amp; Cuenca</institution>
          ,
          <addr-line>Communication Consultants Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>NLP Group of UNED University</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper summarizes the definition, resources, evaluation methodology and metrics, participation and comparative results for the second task of the WePS-3 evaluation campaign. The so-called Online Reputation Management task consists of filtering Twitter posts containing a given company name depending on whether the post is actually related to the company or not. Five research groups submitted results for the task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        People share opinions about products, people and organizations by means of
web sites such as blogs, social networks and product comparison sites [
        <xref ref-type="bibr" rid="ref6">8, 6</xref>
        ].
Online reputation management (ORM) consists of monitoring media, detecting
relevant contents, analyzing what people say about an entity and, if necessary,
interacting with customers. Negative comments in online media can seriously affect
the reputation of a company, and therefore online reputation management is an
increasingly important area of corporate communication.
      </p>
      <p>Perhaps the most important bottleneck for reputation management experts is
the ambiguity of entity names. For instance, a popular brand requires monitoring
hundreds of relevant blog posts and tweets per day; when the entity name is
ambiguous, filtering out spurious name matches is essential to keep the task
manageable.</p>
      <p>The WePS-3 ORM task consists of automatically filtering out tweets that do not
refer to a certain company. In particular, we focus on the Twitter social network
because (a) it is a critical source for real time reputation management and (b)
ambiguity resolution is particularly challenging: tweets are minimal
and little context is available for resolving name ambiguity.</p>
      <p>
        This task is a natural extension of WePS evaluation campaigns, which have
been previously focused on person name ambiguity in Web Search results; with
the ORM task, WePS-3 extends its scope to cover another relevant type of named
entity. Our task is related to the TREC Blog Track [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which focused on blog
posts. However, in that case, systems dealt with information needs expressed by
queries, rather than focusing on a name disambiguation problem.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Task definition</title>
      <sec id="sec-2-1">
        <title>Twitter</title>
        <p>
          Twitter is a relatively new social networking site [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] referred to as a microblogging
service. Its particularity is that posts do not exceed 140 characters and there are
no privacy conditions. Therefore, Twitter reflects opinions in real time and it is
very sensitive to burstiness phenomena.
        </p>
        <p>Tweets are particularly challenging for disambiguation tasks given that the
ambiguity must be sorted out using a very small textual context.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Ambiguity</title>
        <p>The idea of ambiguity is actually quite fuzzy. For instance, suppose that we
are interested in a certain car brand. If the brand name is a common word, then
occurrences that refer to the common word sense are not related to the brand.
But let us suppose that the brand sponsors a football team. We could think
that the referred organization is actually the football team, and not the brand.
However, experts could be interested in monitoring these occurrences, given that the
brand has spent money to be mentioned in this way. In addition, experts might be
interested in generic mentions of the brand, but not in specific products
(which might be handled separately). In short, ambiguity is closely related
to the concept of relevance, which is inherently fuzzy.</p>
        <p>
          For evaluation purposes, one option consists of defining the relevance criteria
for each entity, just as in other competitive tasks such as TREC [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. However,
interpreting the relevance criteria can be difficult even for humans, and systems
will probably not be able to tackle this issue.
        </p>
        <p>In this competition we opt for a lax interpretation of relevance, considering
ambiguity at a lexical level: the sense of the name must be derived from the
company, even if the sentence does not explicitly talk about the company. Table 1
illustrates this idea for the Apple company.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Input and output data</title>
        <p>The first decision when defining system input and output is whether systems
should be able to use a training set for each of the companies included in the
test set. There are two possible scenarios: an ORM company that provides
individualized services to a limited number of clients, or an online system that
accepts any company name as input. In the first scenario, the system will
probably be trained for each of the clients. In the second scenario, this is not viable, as
the system must immediately react to any imaginable company name. We have
decided to focus on the second scenario, which is the more
challenging one. Therefore, the sets of organization names in the training and test corpora
are different.</p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Examples of tweet disambiguation for the company Apple</p>
          </caption>
          <table>
            <thead>
              <tr><th>Tweet</th><th>Related</th></tr>
            </thead>
            <tbody>
              <tr><td>...you can install 3rd-party apps that haven't been approved by Apple..</td><td>TRUE</td></tr>
              <tr><td>...RUMOR: Apple Tablet to Have Webcam, 3G...</td><td>TRUE</td></tr>
              <tr><td>...featuring me on vocals: http://itunes.apple.com/us/album/...</td><td>TRUE</td></tr>
              <tr><td>...Snack Attack: Warm Apple Toast...</td><td>FALSE</td></tr>
              <tr><td>...okay maybe i shouldn't have made that apple crumble...</td><td>FALSE</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>For each organization in the dataset, systems are provided with the company
name and its homepage URL. This web page contains textual information that
allows systems to model the vocabulary associated with the company. The input
information per tweet consists of a tuple containing: the tweet identifier, the
entity name, the query used to retrieve the tweet, the author identifier and the
tweet content.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Data set</title>
      <sec id="sec-3-1">
        <title>Trial Corpus</title>
        <p>The trial corpus consists of 100 tweets per organization name. 24 companies
were selected; 18 from English speaking countries and 6 from Spanish speaking
countries. Most of these entities were extracted from a Twitter Brand Index that
appears in the blog "Fluent Simplicity"<sup>4</sup>. Table 2 enumerates these entities and
the category associated in the brand index.</p>
        <p>The first observation was that identifying companies for our purposes was not
a trivial task. The first reason is that many companies are not usually mentioned
in Twitter. Tweets tend to focus on certain topics. For instance, some frequent
topics are entertainment technologies, movies, travel, politics, etc. Therefore,
most companies do not have enough presence in Twitter to be included in our
test bed. In addition, many company names are either too ambiguous or not
ambiguous at all. For instance, "British Airways" is not ambiguous. However,
in order to ensure a high recall, we should use the query term "British" (e.g. "I
fly with British"). But in this case, 100 tweets would not be enough to obtain
true samples. Notice that this does not imply that our systems would not be
useful to monitor British Airways. The key issue is that we need reasonably
ambiguous company names in order to make the annotation task feasible. In
short, company selection is very costly, given that it requires retrieving and
manually checking tweets to analyze their ambiguity.</p>
        <sec id="sec-3-1-1">
          <p><sup>4</sup> http://blog.fluentsimplicity.com/twitter-brand-index/</p>
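          <table-wrap>
            <label>Table 2</label>
            <caption>
              <p>Entities selected for the trial corpus</p>
            </caption>
            <table>
              <thead>
                <tr><th>Entity name</th><th>Query</th><th>Language</th><th>Category</th></tr>
              </thead>
              <tbody>
                <tr><td>Best Buy</td><td>best buy</td><td>English</td><td>Online-shopping</td></tr>
                <tr><td>Leap frog</td><td>leapfrog</td><td>English</td><td>toys</td></tr>
                <tr><td>Overstock</td><td>overstock</td><td>English</td><td>Online-shopping</td></tr>
                <tr><td>Palm</td><td>palm</td><td>English</td><td>Mobile products</td></tr>
                <tr><td>Lennar</td><td>lennar</td><td>English</td><td>home builder</td></tr>
                <tr><td>Opera</td><td>opera</td><td>English</td><td>Software</td></tr>
                <tr><td>Research in motion</td><td>rim</td><td>English</td><td>Mobile products</td></tr>
                <tr><td>TAM airlines</td><td>tam</td><td>English</td><td>Airline</td></tr>
                <tr><td>Warner Bros</td><td>warner</td><td>English</td><td>Films</td></tr>
                <tr><td>Southwest Airlines</td><td>southwest</td><td>English</td><td>Airline</td></tr>
                <tr><td>Dunkin Donuts</td><td>dunkin</td><td>English</td><td>Food</td></tr>
                <tr><td>Delta Airlines</td><td>delta</td><td>English</td><td>Airline</td></tr>
                <tr><td>CME group</td><td>cme</td><td>English</td><td>Financial group</td></tr>
                <tr><td>Borders bookstore</td><td>borders</td><td>English</td><td>bookstore</td></tr>
                <tr><td>Ford Motor</td><td>ford</td><td>English</td><td>Motor</td></tr>
                <tr><td>Sprint</td><td>sprint</td><td>English</td><td>Mobile products</td></tr>
                <tr><td>GAP</td><td>gap</td><td>English</td><td>Clothing store</td></tr>
                <tr><td>El hormiguero</td><td>hormiguero</td><td>Spanish</td><td>TV program</td></tr>
                <tr><td>Renfe Cercanías</td><td>cercanias</td><td>Spanish</td><td>commuter train service</td></tr>
                <tr><td>El País</td><td>pais</td><td>Spanish</td><td>Newspaper</td></tr>
                <tr><td>El Pozo</td><td>pozo</td><td>Spanish</td><td>Food</td></tr>
                <tr><td>Real madrid</td><td>madrid</td><td>Spanish</td><td>Soccer team</td></tr>
                <tr><td>Cuatro</td><td>cuatro</td><td>Spanish</td><td>TV channel</td></tr>
              </tbody>
            </table>
          </table-wrap>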
          <p>For each company, the first 100 tweets retrieved by the corresponding query
have been annotated directly by the task organizers. During the annotation, we
observed that the best approach consisted of detecting key terms associated with
the company. In some cases these key terms were related to a certain event that
happened just before the retrieval process (such as, for instance, a new product
launched by Palm).</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Training and test corpus</title>
        <p>The initial purpose was to define a methodology for the company selection.
The Fluent Simplicity index was not sufficient to pick them. The next attempt consisted of
automatically filtering the companies included in DBpedia<sup>5</sup>, which is a
knowledge base that extracts structured information from Wikipedia. The automatic
filter consisted of detecting company names that match common names. This
should ensure the ambiguity of names. However, these companies were
less present in Twitter than companies from the Twitter Brand Index. In addition, again,
some company names were either too ambiguous or not ambiguous at all.
Finally, the list was expanded with a few entities that are not exactly companies,
such as sports teams or music bands, which are very common in Twitter.</p>
        <sec id="sec-3-2-1">
          <p><sup>5</sup> http://dbpedia.org/About</p>
          <p>Although the original plan was to annotate around 500 entities, the training
and test corpus finally contains 100 company names. We have discarded Spanish
companies given that, for now, Twitter is still far less popular in Spanish speaking
countries than in English speaking ones. Table 5 shows the entities selected for the training and test
corpora.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Assessments</title>
      <sec id="sec-4-1">
        <title>Mechanical Turk</title>
        <p>
          The training and test corpora have been annotated by means of Mechanical
Turk services. The advantages of using this service for annotation have been
reported in previous work [
          <xref ref-type="bibr" rid="ref3 ref5">5, 3</xref>
          ]. Figure 1 shows an example of our forms for
Mechanical Turk. Each HIT contains five tweets from the same company name
to be annotated. It also includes a brief description of the company, and the
annotator can access the company web page. In order to ensure that tweets have
been annotated, there is no default value for the annotation. The annotation
options for each tweet were "related", "non related" or "undecidable". Each HIT
has been redundantly annotated by five Mechanical Turk workers. The form
includes the following instructions to annotators:
        </p>
        <p>The next table contains tweets that apparently mention a company. The task
consists of determining whether each tweet mentions the company (button
"related"), does not mention the company (button "non related") or there is not
enough information to decide it (button "undecidable"). This page provides the
company name and its URLs. For each tweet the table includes the tweet author
and content. Notice that most tweets contain links that can help you make this
decision. Find below some examples for the Apple company.</p>
        <p>902 annotators participated in the annotation of 43730 tweets. Given that
not all company names had the same presence in Twitter and some tweets have
been discarded, the number of annotated tweets per entity varies between
334 and 465.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Agreement analysis</title>
        <p>We have applied the following criteria to decide the final annotation (related or
non related) for each tweet:</p>
        <list list-type="bullet">
          <list-item>
            <p>If four or five annotators take the same decision, this decision is taken as
the ground truth. This set represents 58% of tweets.</p>
          </list-item>
          <list-item>
            <p>If three or more annotators agree and there is no more than one disagreeing
annotator, we also consider their decision to be the ground truth. We consider
that two annotators disagree when one says "related" and the other says
"non related". This set represents 21% of cases.</p>
          </list-item>
          <list-item>
            <p>The most controversial case is when three annotators are contradicted by two
annotators. This happens in 14% of cases. We manually analyzed 100 samples and
found that the three votes corresponded with the ground truth in around
80% of cases. At the risk of introducing a bit of noise in the corpus, we have
considered the majority vote as the ground truth. In any case, system
evaluation results did not change substantially when considering these cases.</p>
          </list-item>
          <list-item>
            <p>In 0.1% of cases, there were fewer than two related and non related votes, with
the remaining votes being undecidable. We have directly discarded these cases.</p>
          </list-item>
          <list-item>
            <p>In 7% of cases there were two related votes, two non related votes and one
undecidable vote. These cases have been meta-evaluated manually by the task
organizers.</p>
          </list-item>
        </list>
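        <p>The decision rules above can be sketched in a few lines of Python. This is a minimal illustration, not the organizers' actual procedure: the function name final_annotation, the label strings and the "manual" and "discard" return values are assumptions introduced here, and vote patterns that the criteria do not mention (for instance two related votes and three undecidable ones) are routed to manual review as an assumption.</p>
        <preformat>
```python
from collections import Counter


def final_annotation(votes):
    """Map the five worker votes for one tweet to a final decision.

    Returns "related", "non related", "manual" (organizers decide)
    or "discard". All names here are illustrative assumptions.
    """
    counts = Counter(votes)
    related = counts["related"]
    non_related = counts["non related"]
    majority = max(related, non_related)

    if majority >= 3:
        # Covers 4-5 agreeing votes, 3 votes with at most one dissenter,
        # and the controversial 3-against-2 case (majority vote wins).
        return "related" if related > non_related else "non related"
    if majority == 2:
        # 2 related, 2 non related, 1 undecidable goes to manual review;
        # other two-vote patterns are sent there too (assumption).
        return "manual"
    # Fewer than two votes for either class: undecidable votes dominate.
    return "discard"


print(final_annotation(["related"] * 3 + ["non related"] * 2))  # related
```
        </preformat>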
      </sec>
      <sec id="sec-4-3">
        <title>Entity ambiguity</title>
        <p>Figure 3 shows the distribution of ambiguity across company names, that is,
the ratio of related tweets for each entity. The company names have been sorted
according to their ratio. As the figure shows, although we have tried to select
names with medium ambiguity, there is a great variability of ambiguity in the
corpus and there is a substantial number of companies with low occurrence in
tweets. This has important implications for the definition of the evaluation metrics. It is
desirable to check to what extent systems are able to detect the ratio of related
tweets for each single company name.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation metrics</title>
      <p>Basically, this task can be considered a classification task. The most natural
way of evaluating it is the accuracy measure, that is, in how many cases the
system output matches the annotation. However, this metric does not consider
the distribution of related and non related tweets within the correct outputs.
That is, for a highly ambiguous company name, even if only a few related tweets
appear in the corpus, the decisions taken in these cases are crucial. This issue
is relevant in this corpus given that most company names have a very high
or very low ambiguity. We consider this aspect by also computing the precision and
recall over both the related and non related classes. In addition, the F measure
of precision and recall is computed for each company name and class.</p>
      <p>Another important aspect is how to consider the cases in which the system
does not return any result. In the case of accuracy, these count as failures. In terms of
precision and recall measures, these cases decrease the recall for the
corresponding class (related or non related).</p>
      <p>Finally, we are interested in knowing to what extent the systems are able
to predict the ratio of related tweets given a query. This is important because this
ratio is enough to estimate the entity popularity in Twitter. In theory, estimating
this ratio does not strictly require knowing which tweets are related and which are
not. We define the Related Ratio Deviation as the absolute difference between
the real ratio and the ratio given by the system.</p>
      <p>Considering the six categories for the sample set T: true positive (TP), false
positive (FP), true negative (TN), false negative (FN), empty outputs for
positive inputs (EP) and empty outputs for negative inputs (EN), our measures are
defined as:</p>
      <disp-formula>
        <tex-math><![CDATA[
\begin{aligned}
\text{Accuracy} &= \frac{TN + TP}{T} \\[4pt]
\text{Precision over the related class} &= \frac{TP}{TP + FP} \\[4pt]
\text{Recall over the related class} &= \frac{TP}{TP + FN + EP} \\[4pt]
\text{Precision over the non related class} &= \frac{TN}{TN + FN} \\[4pt]
\text{Recall over the non related class} &= \frac{TN}{TN + FP + EN} \\[4pt]
\text{Related Ratio Deviation} &= \left| \frac{TP + FP}{T} - \frac{TP + FN + EP}{T} \right|
\end{aligned}
]]></tex-math>
      </disp-formula>
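      <p>As a rough illustration, the measures can be computed from the six counts as in the following Python sketch. The function names orm_metrics and f_measure, the dictionary keys and the example counts are assumptions introduced here; the real ratio is taken to be (TP + FN + EP) / T, i.e. all tweets annotated as related, and scoring an undefined ratio (zero denominator) as 0.0 is also an assumption, since the text does not say how such cases are handled.</p>
      <preformat>
```python
def orm_metrics(tp, fp, tn, fn, ep, en):
    """Compute the task measures for one company name from the six counts."""
    total = tp + fp + tn + fn + ep + en

    def ratio(num, den):
        # Assumption: an undefined ratio (zero denominator) is scored as 0.0.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tn + tp, total),
        "precision_related": ratio(tp, tp + fp),
        "recall_related": ratio(tp, tp + fn + ep),
        "precision_non_related": ratio(tn, tn + fn),
        "recall_non_related": ratio(tn, tn + fp + en),
        "related_ratio_deviation": abs(
            ratio(tp + fp, total) - ratio(tp + fn + ep, total)
        ),
    }


def f_measure(precision, recall):
    # Harmonic mean giving equal weight to precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical company with 20 related and 80 non related tweets,
# scored for a system that tags every tweet as related.
baseline_r = orm_metrics(tp=20, fp=80, tn=0, fn=0, ep=0, en=0)
print(baseline_r["accuracy"], baseline_r["recall_related"])  # 0.2 1.0
```
      </preformat>
      <p>In this made-up case the all-related output reaches full recall on the related class but deviates from the real related ratio by 0.8, which is the kind of behaviour the Related Ratio Deviation is meant to expose.</p>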
      <p>The accuracy metric assigns a relative weight to the related and non related
classes depending on the distribution of both classes in each company name.
That is, the more the tweets are related to the company, the more this class is
considered in the evaluation process. However, this weighting criterion is
arbitrary. In addition, the combination of precision and recall measures by means of
the F measure over each class assumes that both precision and recall have the
same weight. The final ranking could change if we employed a different metric
weighting criterion.</p>
      <p>
        For this reason, for each system pair we check to what extent the improvement
is robust across potential metric weighting schemes by applying the UIR measure
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This measure was also employed in the WePS-2 campaign [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Let T<sub>∀m.a&gt;b</sub> be the
number of company names for which System a improves System b on all four
metrics, and let T be the total number of company names (test cases). UIR is
defined as follows:
      </p>
      <disp-formula>
        <tex-math><![CDATA[
\mathrm{UIR}(a, b) = \frac{T_{\forall m.\, a > b} - T_{\forall m.\, b > a}}{T}
]]></tex-math>
      </disp-formula>
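      <p>UIR(a, b) increases with the number of company names for which System a improves
System b on all metrics, and decreases with the number for which System b improves
System a on all metrics; company names with contradictory results between metrics
count toward neither term. We have combined the four precision and recall metrics with UIR.</p>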
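      <p>The UIR computation can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name uir and the per-company score tuples are hypothetical, the values are made up, and the example uses two metrics per company name for brevity where the task combines four.</p>
      <preformat>
```python
def uir(metrics_a, metrics_b):
    """Unanimous Improvement Ratio of system a over system b.

    metrics_a and metrics_b hold one tuple of metric values per company
    name (test case), in the same order for both systems.
    """
    def wins(xs, ys):
        # Count test cases where xs beats ys on every metric.
        return sum(
            all(x > y for x, y in zip(case_x, case_y))
            for case_x, case_y in zip(xs, ys)
        )

    return (wins(metrics_a, metrics_b) - wins(metrics_b, metrics_a)) / len(metrics_a)


# Four hypothetical company names, two metrics each (made-up values).
system_a = [(0.9, 0.8), (0.7, 0.6), (0.5, 0.9), (0.2, 0.3)]
system_b = [(0.8, 0.7), (0.6, 0.5), (0.6, 0.8), (0.4, 0.5)]
print(uir(system_a, system_b))  # 0.25
```
      </preformat>
      <p>Here System a wins on every metric for two company names, System b for one, and the remaining company name gives contradictory results, so it adds to neither count.</p>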
    </sec>
    <sec id="sec-6">
      <title>Participation and evaluation results</title>
      <p>Sixteen runs participated in the task. Table 3 shows the evaluation results sorted
by accuracy. Two baseline systems have been added to the ranking, consisting of
tagging all tweets as related (BaselineR) or non related (BaselineNR). The first
observation is that the ranking discriminates participating groups.</p>
      <p>The top system is LSIR-EPFL. The main particularity of this system is the
use of additional resources for classification, which include WordNet, meta-data
from the web page, Google results, and user feedback (just some words). Their
experiments showed that even excluding the user feedback, they obtained high
accuracy. According to their experiment description, using the same approach
but considering just the company web page, the evaluation results would drop
to the middle of the ranking.</p>
      <p>The system SINAI (located in the middle of the ranking) also employs
additional resources, but they basically consist of named entities extracted from
the tweets while LSIR-EPFL employs all the tweet content. A deeper analysis
showed that there is a great variability of evaluation results for this system across
company names. For some company names, the system improves the top ranked
system, while for other names, it achieves very low results. This variability is
not related with the ratio of related tweets for the company name. Therefore
it is not due to classification thresholds. In short, the SINAI evaluation results
suggest that considering the named entities appearing in tweets is appropriate
for certain company names.</p>
      <p>The second best system is ITC-UT, which uses an initial classification step
to predict the ambiguity of the company name according to several pieces of evidence.
The classification step consists of a set of rules based on part-of-speech tagging
and named entity recognition. Given that the system variants do not differ from
each other substantially, it is difficult to know which aspect led the system to get
ahead of other systems. However, this result shows that it is possible to obtain an
acceptable accuracy just considering linguistic aspects of the company mention.</p>
      <p>Table 3. Evaluation results sorted by accuracy. Runs in rank order: LSIR.EPFL 1, ITC-UT 1, ITC-UT 2, ITC-UT 3, ITC-UT 4, SINAI 1, SINAI 4, BASELINENR, SINAI 2, UVA 1, SINAI 5, KALMAR R. 4, SINAI 3, KALMAR R. 2, KALMAR R. 5, BASELINER, KALMAR R. 1, KALMAR R. 3.</p>
      <p>The system UVA makes a relevant contribution to the task results. This
system does not employ any resource related with the company, such as the
web page or Google results. Although the accuracy results are not very high,
the Related Ratio Deviation is as low as the systems located at the top of the
ranking. This result suggests that a general classifier can be employed to predict
the presence of any company in Twitter.</p>
      <p>Finally, the Kalmar system employs a bootstrapping method starting from
the vocabulary of the web page. The global accuracy results are not very high,
but a deeper analysis shows that this approach improves the best system in
terms of F measure over the related class when just a few tweets are relevant in
the collection. In general, systems tend to achieve low F measure over the related
class when the related class is not frequent. This does not happen in the case of
Kalmar system. In other words, Kalmar results suggest that bootstrapping is
appropriate for company names with high ambiguity.</p>
      <p>Table 4 shows the UIR results. The third column represents the set of
systems that are improved by the corresponding system regardless of the
metric weighting scheme. As the table shows, the top system, in addition to
achieving higher accuracy, robustly improves most of the other systems. Of course,
although a baseline system (all tweets are non related) appears in the middle of
the ranking, it does not robustly improve any other system: it is just an effect
of the metric combination used to rank systems.</p>
      <p>Table 4. UIR results. Runs in rank order: LSIR.EPFL 1, ITC-UT 1, ITC-UT 2, ITC-UT 3, ITC-UT 4, SINAI 1, SINAI 4, BASELINENR, SINAI 2, UVA 1, SINAI 5, KALMAR R. 4, SINAI 3, KALMAR R. 2, KALMAR R. 5, BASELINER, KALMAR R. 1, KALMAR R. 3.</p>
      <p>Accuracy Improved systems
0.83 KALMAR R. 1 KALMAR R. 5 ITC-UT 2 KALMAR R. 2
KALMAR R. 3 ITC-UT 4 KALMAR R. 4 UVA 1 BASELINER
SINAI 4 UVA 1
SINAI 4, UVA 1
KALMAR R. 2, KALMAR R. 3, UVA 1,
SINAI 4, UVA 1
SINAI 4, SINAI 2, UVA 1, BASELINENR
KALMAR R. 1, KALMAR R. 2, KALMAR R. 3</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>This competition is the first attempt to define a shared task to solve the problem
of company name disambiguation in social networks (Twitter in our case). Our
conclusion is that the task is feasible to evaluate, given that we have obtained
an acceptable agreement between Mechanical Turk annotators. A corpus with
around 20,000 annotated tweets is now available for future benchmarking.</p>
      <p>The evaluation results have shed some light on how to solve the task: (i)
considering additional sources such as Google results or WordNet seems to be useful;
(ii) linguistic aspects of the company mention are also very indicative; (iii) it is
possible to define a general approach to approximately estimate the presence
of a company name in Twitter; (iv) finally, bootstrapping methods seem to be
useful, especially for highly ambiguous company names.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by the Spanish Ministry of Science and
Innovation within the project QEAVis-Catiex (TIN2007-67581-C02-01).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Artiles</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          .
          <article-title>Combining Evaluation Metrics with a Unanimous Improvement Ratio and its Application to the Web People Search Clustering Task</article-title>
          . In
          <source>Proceedings of the 2nd Web People Search Evaluation Workshop (WePS 2009)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J.</given-names>
            <surname>Artiles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sekine</surname>
          </string-name>
          .
          <article-title>WePS 2 Evaluation Campaign: overview of the Web People Search Clustering Task</article-title>
          . In
          <source>Proceedings of the 2nd Web People Search Evaluation Workshop (WePS 2009)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Michael</given-names>
            <surname>Bloodgood</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          .
          <article-title>Using Mechanical Turk to build machine translation evaluation sets</article-title>
          . In
          <source>Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk</source>
          , pages
          <fpage>208</fpage>
          –
          <lpage>211</lpage>
          , Los Angeles, June
          <year>2010</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Balachander</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          , Phillipa Gill, and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Arlitt</surname>
          </string-name>
          .
          <article-title>A few chirps about Twitter</article-title>
          . In
          <source>WOSP '08: Proceedings of the first workshop on Online social networks</source>
          , pages
          <fpage>19</fpage>
          –
          <lpage>24</lpage>
          , New York, NY, USA,
          <year>2008</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Audrey</given-names>
            <surname>Le</surname>
          </string-name>
          , Jerome Ajot,
          <string-name>
            <given-names>Mark</given-names>
            <surname>Przybocki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stephanie</given-names>
            <surname>Strassel</surname>
          </string-name>
          .
          <article-title>Document image collection using Amazon's Mechanical Turk</article-title>
          . In
          <source>Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk</source>
          , pages
          <fpage>45</fpage>
          –
          <lpage>52</lpage>
          , Los Angeles, June
          <year>2010</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Chrysanthos</given-names>
            <surname>Dellarocas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Neveen Farag</given-names>
            <surname>Awad</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiaoquan (Michael)</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Exploring the value of online reviews to organizations: Implications for revenue forecasting and planning</article-title>
          .
          <source>Management Science</source>
          , pages
          <fpage>1407</fpage>
          –
          <lpage>1424</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Iadh</given-names>
            <surname>Ounis</surname>
          </string-name>
          , Craig Macdonald, and
          <string-name>
            <given-names>Ian</given-names>
            <surname>Soboroff</surname>
          </string-name>
          .
          <article-title>On the TREC Blog Track</article-title>
          . In
          <source>Proceedings of International Conference on Weblogs and Social Media (ICWSM 2008)</source>
          , Seattle,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>