<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Social Media Monitoring in Real Life with Blogmeter Platform</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Bolioli</string-name>
          <email>abolioli@celi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Salamino</string-name>
          <email>salamino@celi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Veronica Porzionato</string-name>
          <email>veronica.porzionato@blogmeter.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CELI srl</institution>
          ,
          <addr-line>Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Me-Source srl</institution>
          ,
          <addr-line>Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A social media monitoring platform used by business clients has to face interesting and sometimes unexpected issues arising from the processing of real texts, in particular in the task of sentiment analysis of word-of-mouth communication. In this paper we describe some of the solutions adopted by BlogMeter, a proprietary listening platform that helps agencies and brands discover what is said online about brands, people, topics and companies. We present some real-life case studies and some of the linguistic resources used in the semantic annotation pipeline, and we suggest some topics for future investigation.</p>
      </abstract>
      <kwd-group>
        <kwd>sentiment analysis</kwd>
        <kwd>opinion mining</kwd>
        <kwd>mood</kwd>
        <kwd>social media monitoring</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A social media monitoring platform used by business clients has to face
interesting and sometimes unexpected issues arising from the processing of real
texts, in particular in the task of sentiment analysis of word-of-mouth
communication. Sentiment Analysis (SA) is both a topic in natural language
processing that has been investigated for several years and a tool for social
media monitoring that is used in business services. Two classical and essential
references on this topic are [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]; a recent survey that
explores the latest trends is [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. While the first attempts on English texts date
back to the late 1990s, SA on Italian texts is a more recent task (probably the
first scientific publication is [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]).
      </p>
      <p>In this paper we use "sentiment analysis" as a broad term that includes
the narrower terms "opinion mining" and "mood". By "opinion mining" we mean the
identification of a belief, estimation or judgement expressed about an object
or target (in simple words, a comment upon something). By "mood" we mean the
mood state or emotional state communicated in a portion of text.</p>
      <p>We will describe some of the solutions adopted by BlogMeter, a proprietary
listening platform, and we will present some real-life case studies and some of
the linguistic resources used in the semantic annotation pipeline.</p>
      <p>The paper is organized as follows. Section 2 briefly describes the Blogmeter
platform. Section 3 presents the annotation pipeline. In Section 4 we touch upon
two interesting topics in SA, and in Section 5 we present case studies coming
from real-life applications of sentiment analysis.</p>
    </sec>
    <sec id="sec-2">
      <title>Blogmeter Platform</title>
      <p>
        BlogMeter (www.blogmeter.eu) is a social media monitoring service operating since 2009 and used by
private and public companies to collect consumer and market insights
from social media and the conversations taking place through them ([
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]). The
monitoring process includes three main phases:
– Listening: purpose-developed data acquisition systems detect and collect
potentially interesting data from the web.
– Understanding: a "semantic engine" structures and classifies the
conversations according to the defined drivers (topics and entities mentioned
in the texts).
– Analysis: through the analysis platform the user can browse the conversations in
a structured way, aggregate the drivers in one or more dashboards, discover
unforeseen trends in the concept clouds, and drill down into the data to read the
messages in their original context.
      </p>
      <p>In this paper we focus on the Understanding phase, which includes automatic
classification and SA. In detail, it consists of:
– creation of a domain-based taxonomy (i.e. an ontology of brands, products,
people and topics);
– identification and automatic classification of relevant documents (according
to the taxonomy);
– sentiment evaluation and opinion mining (automatic or supervised).</p>
      <p>The monitored sources are typically user-generated media, such as blogs,
forums, social networks, newsgroups, content-sharing sites, question-and-answer
(Q&amp;A) sites, and product and service reviews, active in many
countries and in different languages. Overall, the sources include more than
500,000 blogs (approximately 70,000 of them active, i.e. with a post in the last
three months) and 700 gathering places (forums, newsgroups, Q&amp;A sites, content-sharing
platforms, social networks). This count treats Facebook as a
single source, although it is in fact the largest collector of conversations (the system
monitors the public status updates and the content published by over 4,000 official
Italian pages). Web services such as Instagram, Google+, Tumblr and
Twitter, and sector-specific services such as Foursquare or TripAdvisor, are also monitored. On average,
the system analyzes the following number of "documents" every day:
– 3.7 million posts retrieved from web sources;
– over 2 million interactions from 1,000 Twitter business profiles and 4,000
Facebook business pages.</p>
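      <p>The Understanding steps described in this section (a domain taxonomy, then the identification of relevant documents according to it) can be illustrated with a toy keyword taxonomy. This is a minimal sketch under stated assumptions, not Blogmeter's implementation: the brand "AcmeTel", the driver names, the <monospace>TAXONOMY</monospace> dictionary and the <monospace>classify</monospace>/<monospace>is_relevant</monospace> functions are all hypothetical, and plain keyword matching stands in for the semantic annotation pipeline described in Section 3.</p>
      <preformat preformat-type="code">
```python
import re

# Hypothetical domain taxonomy for a telecom client: driver -> surface forms.
TAXONOMY = {
    "brand/AcmeTel": ["acmetel", "acme tel"],
    "topic/customer_care": ["assistenza", "call center", "customer care"],
    "topic/pricing": ["tariffa", "prezzo", "costo"],
}


def classify(text: str) -> list:
    """Return the taxonomy drivers whose surface forms occur in the text."""
    lowered = text.lower()
    drivers = []
    for driver, forms in TAXONOMY.items():
        # Whole-word matching, so "costo" does not fire on "costoso".
        if any(re.search(r"\b" + re.escape(form) + r"\b", lowered) for form in forms):
            drivers.append(driver)
    return drivers


def is_relevant(text: str) -> bool:
    """Keep a document for sentiment evaluation only if it mentions some driver."""
    return bool(classify(text))


print(classify("Il call center di AcmeTel non risponde mai"))
# → ['brand/AcmeTel', 'topic/customer_care']
```
      </preformat>
      <p>Documents for which <monospace>is_relevant</monospace> returns False would simply be discarded before the sentiment step.</p>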
    </sec>
    <sec id="sec-3">
      <title>Semantic Annotation Pipeline</title>
      <p>
        Documents extracted from the web in the form of unstructured information
are passed to the semantic annotation pipeline, which analyzes and
classifies them according to the domain-based taxonomies defined for the client. The
annotation pipeline uses the UIMA framework (the Unstructured Information
Management Architecture, originally developed by IBM and now maintained by the Apache
Software Foundation [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). UIMA annotators enrich the documents with
linguistic information, recognition of entities and concepts, and identification of
relations between concepts, entities and attitudes expressed in the text (opinions,
mood states and emotions). Some linguistic resources and annotators are
common to different application domains, while others are domain-dependent. We
will not describe the pipeline modules in detail here; instead we focus on the
main linguistic resource used in the SA module, i.e. a concept-level sentiment
lexicon for Italian. The sentiment lexicon is used by the semantic annotator,
which recognizes opinions and expressions of mood and emotions and
associates them with the opinion targets (when performing opinion mining). This
component operates both at the sentence level (in order to handle linguistic
phenomena such as negation and quantification) and at the document level, in order
to identify relations between elements that occur in different sentences.
      </p>
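      <p>UIMA itself is a Java framework in which annotators add typed annotations to a shared analysis structure (the CAS). The sketch below only imitates the idea of chained annotators in plain Python and does not use the UIMA API: a sentence annotator, a token annotator and a toy sentiment annotator run in sequence, and the latter flips polarity when a negation word appears shortly before a lexicon term within the same sentence. The lexicon, the negation list and the three-token window are assumptions for illustration; document-level relations are not covered.</p>
      <preformat preformat-type="code">
```python
import re

# Hypothetical toy resources; not the actual Blogmeter lexicon.
LEXICON = {"ottimo": 1, "buono": 1, "pessimo": -1, "lento": -1}
NEGATIONS = {"non", "mai"}
WINDOW = 3  # number of preceding tokens inspected for a negation


def split_sentences(text):
    """Sentence annotator: naive split on sentence-final punctuation."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Token annotator: lowercase word tokens (Unicode-aware, so 'è' is kept)."""
    return re.findall(r"\w+", sentence.lower())


def sentiment_annotator(tokens):
    """Polarity annotations; a negation within WINDOW preceding tokens flips polarity."""
    found = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            polarity = LEXICON[tok]
            if NEGATIONS.intersection(tokens[max(0, i - WINDOW):i]):
                polarity = -polarity
            found.append((tok, polarity))
    return found


def run_pipeline(text):
    """Run the annotators in sequence; negation never crosses a sentence boundary."""
    return [sentiment_annotator(tokenize(s)) for s in split_sentences(text)]


print(run_pipeline("Il servizio non è ottimo. Il corriere è stato lento!"))
# → [[('ottimo', -1)], [('lento', -1)]]
```
      </preformat>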
      <sec id="sec-3-1">
        <title>A Sentiment Lexicon for Italian</title>
        <p>
          In this section we describe the "sentiment lexicon" used by the semantic
annotator, i.e. the repository containing terms, concepts and patterns used in the SA
annotation. Researchers have been building sentiment lexica for many years,
particularly for English, and a review of recent results can be found,
for example, in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>
          Our sentiment lexicon for Italian contains about 10,000 entries (6,200
single words and 3,400 multi-word expressions); each entry carries sentiment
information, i.e. polarity, emotions and application domain. It has been created
and updated over the past three years while performing social media monitoring
and SA in different application domains. Recently, an Italian lexicon for
sentiment analysis (Sentix) has been developed by [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], as the result of aligning
several resources. One aspect worth mentioning is that the valence of
many words can change across contexts and domains. The single word
"accuratezza" ("accuracy"), for example, has a default positive valence (it expresses a
positive attitude), and so does "affare d'oro" ("to do a roaring trade"). By
contrast, "andare a casa" ("going home") has no polarity in a neutral
context, but in a domain such as sentiment about the Sanremo Festival
it means being eliminated from the singing competition. Similarly,
"truccato" ("wearing make-up" or "rigged") would not have negative
polarity if the domain were a fashion show in Milan, whereas in the field of online
games or betting the perspective changes.
        </p>
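        <p>The context dependence described above suggests lexicon entries that carry a default polarity plus per-domain overrides. This is a minimal sketch assuming a hypothetical entry layout (<monospace>default</monospace>, <monospace>domains</monospace>) and made-up domain labels ("sanremo", "fashion", "betting"); the polarities follow the examples in the text, while the neutral default for "truccato" is a choice made for illustration. The emotion information stored in the real lexicon is omitted.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """Hypothetical entry layout: default polarity plus per-domain overrides."""
    lemma: str
    default: int  # -1 negative, 0 neutral, +1 positive
    domains: dict = field(default_factory=dict)

    def polarity(self, domain=None):
        # A domain-specific value, when present, wins over the default.
        return self.domains.get(domain, self.default)


LEXICON = {
    entry.lemma: entry
    for entry in [
        LexiconEntry("accuratezza", 1),
        LexiconEntry("affare d'oro", 1),
        # Neutral in general; negative when it means elimination at Sanremo.
        LexiconEntry("andare a casa", 0, {"sanremo": -1}),
        # Neutral at a fashion show; negative for online games or betting.
        LexiconEntry("truccato", 0, {"fashion": 0, "betting": -1}),
    ]
}


def polarity(lemma, domain=None):
    """Polarity of a lexicon entry in a domain; unknown lemmas are neutral."""
    entry = LEXICON.get(lemma)
    return 0 if entry is None else entry.polarity(domain)


print(polarity("andare a casa"), polarity("andare a casa", "sanremo"))
# → 0 -1
```
        </preformat>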
        <p>The semantic annotator is a pattern-matching component that uses the
sentiment lexicon, operates on the previous linguistic annotations and creates
the corresponding sentiment concepts. The annotator can therefore recognize
multi-word expressions that do not explicitly convey polarity or emotions but
are related to concepts that do.</p>
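        <p>Recognizing multi-word expressions over previously annotated lemmas can be illustrated with a greedy left-to-right longest-match lookup. This is an assumed technique chosen for illustration, not a description of Blogmeter's pattern engine; the lemma sequences, the concept labels (<monospace>ELIMINATED</monospace>, <monospace>SUCCESS</monospace>, <monospace>ACCURACY</monospace>) and <monospace>MAX_LEN</monospace> are hypothetical. Note that "andare a casa" is mapped to a concept although none of its words is polar on its own.</p>
        <preformat preformat-type="code">
```python
# Hypothetical pattern table: lemma sequence -> sentiment concept.
PATTERNS = {
    ("andare", "a", "casa"): "ELIMINATED",  # negative in a talent-show domain
    ("affare", "d'oro"): "SUCCESS",
    ("accuratezza",): "ACCURACY",
}
MAX_LEN = max(len(p) for p in PATTERNS)


def match_concepts(lemmas):
    """Greedy longest match; returns (start, end, concept) triples over lemma indices."""
    found, i = [], 0
    while len(lemmas) > i:
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_LEN, len(lemmas) - i), 0, -1):
            concept = PATTERNS.get(tuple(lemmas[i:i + n]))
            if concept:
                found.append((i, i + n, concept))
                i += n
                break
        else:
            i += 1  # no pattern starts here; move one lemma forward
    return found


print(match_concepts(["il", "cantante", "andare", "a", "casa", "stasera"]))
# → [(2, 5, 'ELIMINATED')]
```
        </preformat>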
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Hot Topics in Social Media Monitoring</title>
      <p>Social media and users' opinions and mood states are increasingly linked. Social
networks were born as a means of interaction and as places for sharing content;
today the desire to share emotions and opinions quickly and with
as many people as possible is widespread. An example of a heavily discussed domain on the
web is Social TV, as people love expressing their opinions about TV hosts and
participants.</p>
      <sec id="sec-4-1">
        <title>Irony Detection</title>
        <p>"Tre tweet a tuo favore su diecimila? Troiano sei un ottimista!"
("Three tweets out of ten thousand in your favor?
Troiano you are an optimist!")
"E come ogni giovedì il solito interrogativo: Troiano, perché?"
("As every Thursday the same question: Troiano, why?")
"Troiano migliorato? Non ho più parole."
("Troiano improved? I have no more words.")</p>
        <p>In the same application domain we found that a high percentage of ironic
tweets were questions: since this TV program is not a cultural show in which
questions and answers are the fundamental part, the ironic nature of questions
co-occurring with a named entity (NE) could be taken for granted.</p>
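        <p>The observation above can be turned into a simple filter that selects candidate ironic tweets for manual review: a tweet is flagged when it asks a question and mentions a monitored named entity. This is a hypothetical heuristic for this one domain, not an irony detector; the entity set and the question test (a "?" anywhere in the text) are assumptions.</p>
        <preformat preformat-type="code">
```python
import re

# Hypothetical set of monitored named entities for the TV-show domain.
ENTITIES = {"troiano"}


def irony_candidate(tweet: str) -> bool:
    """Flag a possible ironic comment: the tweet asks a question and names an entity."""
    tokens = set(re.findall(r"\w+", tweet.lower()))
    return "?" in tweet and bool(ENTITIES.intersection(tokens))


tweets = [
    "E come ogni giovedì il solito interrogativo: Troiano, perché?",
    "Troiano stasera è bravissimo",
    "Chi vince stasera?",
]
print([irony_candidate(t) for t in tweets])
```
        </preformat>
        <p>Such a rule trades precision for recall; its output is meant to feed the in-depth study of the domain rather than replace it.</p>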
        <p>
          Before developing algorithms for the automatic
recognition of irony, we chose to study specific
domains in depth, looking for recurring elements in order to understand how
those domains work. In future work we will test the validity and
representativeness of examples like those reported above.
        </p>
        <p>
          Interest in emotion detection for social media monitoring grew in 2011 after
the publication of [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], where the authors argued that the analysis of
mood in Twitter posts could be used to predict stock market movements up to
6 days in advance. In detail, they identified "calmness" as the predictive mood
dimension, within a set of six different mood dimensions (happiness, kindness,
alertness, sureness, vitality and calmness).
        </p>
        <p>
          The definition of a set of basic (or primary) emotions is a debated topic, and
the study of emotions and of their expression in texts has
a long tradition in philosophy and psychology (see for example [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]). In NLP
tasks, Ekman's six basic emotions (anger, disgust, fear, joy, sadness, surprise)
have often been used (e.g. in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]). In the Blogmeter platform we adopt Ekman's
list of emotions plus "love", which is a primary emotion in Parrott's classification.
        </p>
        <p>
          We are also investigating what kind of relationship
exists between emotions and irony ([
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]).
        </p>
        <p>A reference corpus of Italian texts manually annotated with emotions would be a
useful step toward testing the accuracy of the automatic system.</p>
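        <p>Once such a manually annotated corpus exists, testing the automatic system amounts to comparing its labels with the gold labels. The sketch below assumes one emotion label per text and computes accuracy plus per-label precision and recall; the function name and the sample labels are invented for illustration.</p>
        <preformat preformat-type="code">
```python
from collections import Counter


def evaluate(gold, predicted):
    """Accuracy plus per-label (precision, recall) for single-label annotations."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    pairs = list(zip(gold, predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    correct = Counter(g for g, p in pairs if g == p)
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        precision = correct[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = correct[label] / gold_counts[label] if gold_counts[label] else 0.0
        scores[label] = (precision, recall)
    return accuracy, scores


# Invented example: labels from human annotators (gold) and from the system.
gold = ["joy", "anger", "joy", "sadness"]
pred = ["joy", "joy", "joy", "sadness"]
acc, per_label = evaluate(gold, pred)
print(acc, per_label["joy"])
```
        </preformat>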
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Case studies and Examples</title>
      <p>In this section we present some case studies and charts that visualize mood and
opinion trends in different contexts.</p>
      <sec id="sec-5-1">
        <title>Mood Analysis</title>
        <p>As seen before, one dimension of mood analysis is the main polarity expressed
in a text. Blogmeter mood analysis has been used:
– as a gauge of the common feelings expressed through a particular social
network and/or in a given time span. Figure 1, for example, shows the
mood expressed on Italian Twitter in the period January–April 2013;
– as a marker of the general mood during specific events. This kind of indicator
was very useful for tracking live TV shows, thanks to its capacity to
highlight positive and negative peaks on the social network in relation to
a show's progression.</p>
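        <p>A mood gauge of this kind can be sketched as a net-polarity score per time bucket (positive minus negative messages, divided by all polar messages), with peaks flagged when a bucket deviates from the average by more than k standard deviations. The scoring formula, the threshold k = 1.5 and the sample data are assumptions for illustration, not Blogmeter's published metric.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict
from statistics import mean, pstdev


def mood_series(messages):
    """messages: (bucket, polarity) pairs, polarity in {-1, 0, 1}.

    Net mood per bucket = (positive - negative) / polar messages, in [-1, 1];
    buckets containing only neutral messages score 0.0.
    """
    pos, neg, buckets = defaultdict(int), defaultdict(int), set()
    for bucket, polarity in messages:
        buckets.add(bucket)
        if polarity == 1:
            pos[bucket] += 1
        elif polarity == -1:
            neg[bucket] += 1
    series = {}
    for b in sorted(buckets):
        polar = pos[b] + neg[b]
        series[b] = (pos[b] - neg[b]) / polar if polar else 0.0
    return series


def peaks(series, k=1.5):
    """Buckets whose mood differs from the mean by more than k population std devs."""
    values = list(series.values())
    mu, sigma = mean(values), pstdev(values)
    return [b for b, v in series.items() if sigma and abs(v - mu) > k * sigma]


# Invented minute-by-minute polarities during a live show.
msgs = [(0, 1), (0, -1), (1, 1), (1, -1), (2, 0), (3, 1), (3, -1),
        (4, -1), (4, -1), (4, -1), (5, 1), (5, -1)]
print(peaks(mood_series(msgs)))  # → [4]
```
        </preformat>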
      </sec>
      <sec id="sec-5-2">
        <title>Opinion Mining</title>
        <p>When the goal becomes more specific and the need is to link a specific subject
(i.e. a target such as a brand, a role model or a personality) with the related
opinions expressed throughout a post, sentiment analysis makes it possible to automatically
examine thousands of messages in depth. Blogmeter's opinion mining has been
applied to different industries, such as politics, banking and telecommunications, where the
buzz has quite high volumes and, at the same time, very polarized opinions.</p>
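        <p>As a first approximation of the target-opinion linking described above, each opinion word in a sentence can be attached to the nearest target mention. This sketch uses token distance as the linking rule, which is an assumption for illustration rather than Blogmeter's method; the targets "BancaX" and "BancaY" and the small opinion list are hypothetical.</p>
        <preformat preformat-type="code">
```python
import re

# Hypothetical target and opinion lists for a banking domain.
TARGETS = {"bancax", "bancay"}
OPINIONS = {"affidabile": 1, "caro": -1, "lenta": -1}


def link_opinions(sentence):
    """Attach each opinion word to the closest target mention by token distance.

    Ties go to the earlier mention, because min() keeps the first minimum.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    target_positions = [i for i, tok in enumerate(tokens) if tok in TARGETS]
    if not target_positions:
        return []
    links = []
    for i, tok in enumerate(tokens):
        if tok in OPINIONS:
            nearest = min(target_positions, key=lambda j: abs(j - i))
            links.append((tokens[nearest], tok, OPINIONS[tok]))
    return links


print(link_opinions("BancaX è affidabile, ma BancaY è lenta"))
```
        </preformat>
        <p>Distance alone fails on long or syntactically complex sentences; dependency information from the annotation pipeline would be the natural refinement.</p>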
        <p>Another advanced application is near real-time semantic alerting. When
sentiment analysis uncovers critical messages, they are automatically labelled as
negative and sent by email to those who can promptly intervene. This is
important, for instance, in the transportation industry, where users need frequently
updated information and feedback.</p>
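        <p>The alerting step can be sketched as a filter that selects negative messages about a monitored target and batches them per recipient. This is a minimal sketch under assumptions: the <monospace>Message</monospace> fields, the <monospace>ROUTES</monospace> table and its example.com addresses are hypothetical, and email delivery is left to a caller-supplied <monospace>send</monospace> callback.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    msg_id: str
    text: str
    target: str
    polarity: int  # -1, 0 or 1, as produced by the sentiment annotator


# Hypothetical routing table: which team is alerted for which target.
ROUTES = {"trains": "ops-team@example.com", "app": "mobile-team@example.com"}


def dispatch_alerts(messages, send: Callable[[str, list], None]) -> dict:
    """Batch negative messages per recipient and pass each batch to `send`."""
    batches = {}
    for msg in messages:
        recipient = ROUTES.get(msg.target)
        if msg.polarity == -1 and recipient:
            batches.setdefault(recipient, []).append(msg.msg_id)
    for recipient, ids in batches.items():
        send(recipient, ids)
    return batches


outbox = []  # stands in for an email gateway in this example
msgs = [
    Message("m1", "Treno in ritardo di un'ora", "trains", -1),
    Message("m2", "Ottimo servizio", "trains", 1),
    Message("m3", "L'app si blocca", "app", -1),
]
dispatch_alerts(msgs, lambda to, ids: outbox.append((to, ids)))
print(outbox)
```
        </preformat>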
      </sec>
      <sec id="sec-5-3">
        <title>Emotions and Attitudes</title>
        <p>Recently there has been a growing interest in emotion analysis. This kind of
investigation can be very useful when the two poles, negative and positive mood,
give results that are too simplistic to explain more complex feelings. For
instance, this analysis has been used to explore the emotions behind social, political
and natural events such as the 2012 earthquake in Italy (Fig. 3).</p>
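        <p>Emotion analysis replaces the single polarity axis with a distribution over emotion labels. This sketch uses the label set adopted in the platform (Ekman's six basic emotions plus "love") and assumes that each message has already been tagged with zero or more emotions by the annotator; the function name and sample data are invented.</p>
        <preformat preformat-type="code">
```python
from collections import Counter

# Ekman's six basic emotions plus "love", as adopted in the platform.
EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise", "love")


def emotion_distribution(tagged_messages):
    """Share of each emotion over all emotion tags; each message is a list of labels."""
    counts = Counter(
        label for labels in tagged_messages for label in labels if label in EMOTIONS
    )
    total = sum(counts.values())
    return {e: counts[e] / total if total else 0.0 for e in EMOTIONS}


sample = [["fear"], ["fear", "sadness"], ["love"], []]
dist = emotion_distribution(sample)
print(dist)
```
        </preformat>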
        <p>Semantic analysis can also detect additional meanings
that are not covered by the mood, opinion and emotion dimensions and that express other
kinds of attitude. In one of these applications, Blogmeter focused on explicit
voting intentions expressed on the social web, searching for declarations like "I'll
vote X" or "I'll choose Y" (and not just "I like Z"). During the final days of the
last Italian political campaign, the analysis revealed the striking rise of Beppe
Grillo's party.</p>
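        <p>Declarations of voting intention can be separated from generic preferences with explicit first-person patterns. The regular expression below is a hypothetical English-language example that mirrors the declarations quoted above; a real system for Italian posts would need its own patterns and would rely on the linguistic annotations of the pipeline.</p>
        <preformat preformat-type="code">
```python
import re

# Hypothetical English pattern mirroring the examples in the text: a first-person
# future declaration ("I'll vote", "I will choose") followed by the chosen option.
INTENTION = re.compile(r"\bI(?:['’]ll| will) (?:vote(?: for)?|choose) (\w+)")


def voting_intention(text):
    """Return the declared choice, or None when the text only states a preference."""
    match = INTENTION.search(text)
    return match.group(1) if match else None


for post in ["I'll vote X", "I will choose Y tomorrow", "I like Z"]:
    print(post, "->", voting_intention(post))
```
        </preformat>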
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>We presented Blogmeter, a social media listening service that provides interesting
insights about common feelings expressed in social media, opinions about specific
subjects and declared attitudes towards real actions or events.</p>
      <p>We showed how, to achieve those results, it is important to exploit the
potential of a well-structured linguistic annotation pipeline, but also of a
domain-specific concept-level sentiment lexicon (also called a "contextualized sentiment
lexicon" in the literature).</p>
      <p>We also presented some case studies as examples of mood, opinion and
emotion recognition in real-life use cases. We leave the issue of automatic recognition
of irony for further investigation. We hope to have the opportunity to compare
the accuracy of the Blogmeter system with other systems using an official Italian corpus
for sentiment analysis (such as Senti-TUT).</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We would like to thank Sacha Monotti Graziadei, Vittorio Di Tomaso and
Vincenzo Cosenza for always stimulating and leading new research; Eugenia Burchi
and Meghan White for their fundamental supervision of the paper; and Matteo Casu
for the essential help in its preparation. Last but not least, we thank all our Blogmeter
colleagues for their daily contributions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Sentiment Analysis on Italian tweets</article-title>
          .
          <source>Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</source>
          , pages
          <fpage>100</fpage>
          -
          <lpage>107</lpage>
          , Atlanta, Georgia (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Sentiment Analysis and Opinion Mining</article-title>
          . Morgan &amp; Claypool Publishers (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bollen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Twitter mood predicts the stock market</article-title>
          .
          <source>Journal of Computational Science</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ) (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bolioli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Developing corpora for sentiment analysis and opinion mining: the case of irony and Senti-TUT</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          , vol.
          <volume>28</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>63</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>New Avenues in Opinion Mining and Sentiment Analysis</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          , vol.
          <volume>28</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>21</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          et al.:
          <article-title>Knowledge-Based Approaches to Concept-Level Sentiment Analysis</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          , vol.
          <volume>28</volume>
          , no.
          <issue>2</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hung</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>H.-K.</given-names>
          </string-name>
          :
          <article-title>Using Objective Words in SentiWordNet to Improve Word-of-Mouth Sentiment Classification</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          , vol.
          <volume>28</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>54</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Cosenza</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Social Media ROI</article-title>
          .
          <publisher-name>Apogeo</publisher-name>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dini</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Mazzini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Opinion classification through information extraction</article-title>
          .
          <source>Proceedings of the Conference on Data Mining Methods and Databases for Engineering, Finance and Other Fields</source>
          , pp.
          <fpage>299</fpage>
          -
          <lpage>310</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Galati</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Prospettive sulle emozioni e teorie del soggetto</article-title>
          .
          <source>Bollati Boringhieri</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pancaldi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>L'azienda centrata sull'ascolto del cliente</article-title>
          .
          <source>FrancoAngeli</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Opinion mining and sentiment analysis</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
          <volume>2</volume>
          (
          <issue>1-2</issue>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>135</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Strapparava</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Valitutti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>WordNet-Affect: an Affective Extension of WordNet</article-title>
          ,
          <source>in Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC)</source>
          , pp.
          <fpage>1083</fpage>
          -
          <lpage>1086</lpage>
          , Lisbon (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
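          14. UIMA Specifications,
          <ext-link ext-link-type="uri" xlink:href="http://uima.apache.org/uima-specification.html">http://uima.apache.org/uima-specification.html</ext-link>
          .
          <source>The Apache Software Foundation</source>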
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>