<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Role of social media in propagating controversies: the case of cultural microblog feeds</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Adrian Chifu</string-name>
          <email>adrian.chifu@lsis.org</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fidelia Ibekwe-SanJuan</string-name>
          <email>fidelia.ibekwe-sanjuan@univ-amu.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nathanala Andrianasolo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aix Marseille Univ, IRSIC</institution>
          ,
          <addr-line>Marseille</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The aim of this research is to investigate how social media mediate social controversies in the public arena. To that end, we will use the CLEF MC2 corpus of microblogs [1], which captured long-term political and cultural controversies, to follow the birth and development of controversies over time and to pinpoint the increasing role that social media play in their propagation, regulation and resolution.</p>
      </abstract>
      <kwd-group>
        <kwd>Focus IR</kwd>
        <kwd>opinion mining</kwd>
        <kwd>information visualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Social media such as Facebook and Twitter have become the dominant platforms
for information and communication in the web and big data era, to the extent
that they have displaced the traditional media as news outlets. They have
become the most used channels for disseminating content and negotiating social
status. Content concerning individuals, celebrities and political figures is
now publicised via social media. In recent years, important controversies born
outside the digital sphere were quickly propagated on social media, usually via
Twitter and Facebook, where they acquired a life of their own before being
publicised elsewhere. In this research, we aim to study the increasing role social
media play in publicising, mediating, regulating and resolving social
controversies (as opposed to scientific ones). To this end, we have chosen three social controversies
situated at three levels:</p>
      <p>i) a local controversy surrounding the exploitation of the Bois Blanc quarry
on the French island of La Réunion (henceforth the "Bois Blanc" or "BB"
controversy);</p>
      <p>ii) an international controversy involving Sir Tim Hunt, the 2001 Nobel laureate in Physiology
or Medicine, whose comments during a luncheon with women
scientists in Korea in 2015 were judged sexist, triggering a controversy via Twitter
which led to his social downfall (loss of reputation, prestige and all his honorary
appointments) (henceforth the "Tim Hunt" or "TH" controversy);</p>
      <p>iii) a national controversy surrounding Christiane Taubira, the embattled
ex-minister of justice in the Hollande government, during her visit to the Cannes
Festival in 2015 (henceforth the "Christiane Taubira" or "CT" controversy).</p>
      <p>Owing to restrictions imposed by social media data platforms, we could not
gather all the social media data linked to the first two controversies, as they
had already "passed" by the time we decided to embark on this study. They will
therefore be studied qualitatively using the traces we were able to gather online from
some Twitter accounts, newspapers, websites and blogs. This qualitative study
will serve as a methodology design phase to identify further research questions
for such studies. For instance, it will be interesting to identify who the main
actors were in these controversies; the platforms used to launch and propagate
them; the role of the traditional media (newspapers, TV, radio) in
publicising them; the thematic content around which they
crystallised; a timeline of how they were propagated; and the role of social
media in bringing them to the attention of the larger public and the media. This
prior qualitative study on small corpora will enable us to better formalise our
methodology of analysis and to identify which parts of it can be automated and scaled
up to work on a larger corpus. The Taubira controversy, for which a larger corpus
of tweets was collected in the framework of the CLEF 2017 Microblog
Cultural Contextualization Track (the MC2@CLEF2017 lab has released a
collection of 70 000 000 microblogs dealing with cultural events, gathered over 18 months),
will serve as a testbed for automating our methodology, once it has been tested
on the two previous controversies (the BB and TH controversies respectively).</p>
    </sec>
    <sec id="sec-2">
      <title>Example of controversy in the MC2 corpus</title>
      <p>The two keywords "taubira" and "cannes" reveal a large controversy. For
instance, a plain search on these two keywords retrieves results that
hint at controversial opinions. Given the huge amount of information on social
media, one should be able to identify and quantify controversies automatically.</p>
    </sec>
    <sec id="sec-3">
      <title>Query related controversy indicator</title>
      <p>We propose several measures to evaluate whether query results are affected by a
controversy on Twitter. The automatic steps are as follows: aspect
identification, sentiment polarity identification and temporal distribution analysis. We
give more details in the following paragraphs.</p>
      <p>A controversy occurs around entities; however, it is not completely
represented by them. For instance, around the entities "Taubira" and "Cannes",
the controversy can bear on various aspects, such as "appearance" or
"presence". Thus, a first step of aspect identification is required, since the topic
of controversy is not exhaustively characterised by the entities involved.</p>
      <p>
        The words around the identified aspects serve as features for the analysis:
the words from a context window around the target aspect are used to
model the sentiments expressed on that particular aspect. The context window
size is empirically set to 10 words around the target aspect (5 before and 5 after),
as in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
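      <p>As a minimal sketch of the context window described above, the following Python snippet collects the words surrounding each occurrence of a target aspect. The function name context_window, the whitespace tokenisation and the example tweet are illustrative assumptions; they are not taken from the pipeline used in this study.</p>
      <preformat>
```python
def context_window(tokens, aspect, before=5, after=5):
    """Return the words around each occurrence of `aspect`.

    The defaults (5 words before, 5 after) follow the window size
    stated in the text; everything else here is a hypothetical helper.
    """
    windows = []
    for i, tok in enumerate(tokens):
        if tok == aspect:
            left = tokens[max(0, i - before):i]    # up to `before` words on the left
            right = tokens[i + 1:i + 1 + after]    # up to `after` words on the right
            windows.append(left + right)
    return windows


# Hypothetical tweet, lowercased and split on whitespace for simplicity.
tweet = "the minister made a surprise appearance on the red carpet at cannes tonight"
tokens = tweet.lower().split()
windows = context_window(tokens, "appearance")
print(windows)
```
</preformat>
      <p>When the aspect occurs near the start or the end of a tweet, the window is simply truncated, so it may hold fewer than 10 words.</p>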
      <p>
        Based on the context around the aspects, automatic sentiment analysis can
be carried out in order to identify the sentiment polarity. The polarity score is a
real value in the interval [-1, 1], with -1 being very negative, 0 neutral and 1 very
positive. The sentiment analysis module is inspired by the research
of Pang et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and is based on a Naive Bayes classifier trained on a set of
50,000 movie reviews with annotated sentiment polarities, from IMDb. For a
group of tweets, the dispersion of the polarity scores will indicate the intensity of the controversy.
      </p>
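      <p>The text does not specify which dispersion statistic is used. The sketch below assumes the population standard deviation of the polarity scores, which stays within [0, 1] when the scores lie in [-1, 1]. The function name controversy_intensity and the example scores are hypothetical.</p>
      <preformat>
```python
from statistics import pstdev


def controversy_intensity(polarities):
    """Dispersion of polarity scores for a group of tweets.

    Assumption: dispersion is measured as the population standard
    deviation; the source only says "dispersion".
    """
    if not polarities:
        raise ValueError("at least one polarity score is required")
    for p in polarities:
        if abs(p) > 1.0:
            raise ValueError("polarity scores must lie in [-1, 1]")
    return pstdev(polarities)


# Hypothetical polarity scores, for illustration only.
consensus = [0.6, 0.7, 0.8, 0.7]       # tweets that broadly agree
polarised = [-1.0, 1.0, -1.0, 1.0]     # tweets split into two opposed camps

print(controversy_intensity(consensus))
print(controversy_intensity(polarised))
```
</preformat>
      <p>Under this assumption, a group of tweets that agree yields a value close to 0, while a group split evenly between the two extremes yields the largest possible value.</p>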
      <p>The topics of controversy generate reactions that are distributed in time. We
can capture the temporal distributions of tweets. In this manner, we can identify
when a topic is "fresh", "hot" or recurrent ("comeback"). The temporal
features also make it possible to cluster tweets by period, in order to form
"discussions" of a sort.</p>
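      <p>One simple way to form such temporal groups, given here only as a sketch, is to sort the tweets by timestamp and to start a new group whenever the gap since the previous tweet exceeds a threshold. The function name group_into_discussions, the 6-hour threshold and the timestamps are assumptions made for illustration; the text does not prescribe a clustering method.</p>
      <preformat>
```python
from datetime import datetime, timedelta


def group_into_discussions(timestamps, max_gap=timedelta(hours=6)):
    """Split timestamps into groups separated by more than `max_gap`.

    Both the gap-based rule and the default threshold are assumptions.
    """
    groups = []
    for ts in sorted(timestamps):
        # Open a new group for the first tweet or after a long silence.
        if not groups or ts - groups[-1][-1] > max_gap:
            groups.append([ts])
        else:
            groups[-1].append(ts)
    return groups


# Hypothetical tweet timestamps, for illustration only.
timestamps = [
    datetime(2015, 5, 20, 10, 0),
    datetime(2015, 5, 23, 9, 15),
    datetime(2015, 5, 20, 12, 0),
    datetime(2015, 5, 20, 10, 30),
    datetime(2015, 5, 23, 9, 0),
]
discussions = group_into_discussions(timestamps)
for group in discussions:
    print(group[0].isoformat(), "->", group[-1].isoformat(), len(group), "tweets")
```
</preformat>
      <p>With such groups, a topic that reappears in a new group after a long silence could be treated as a "comeback", while the size of a group over a short period gives a rough indication of how "hot" the topic is.</p>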
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ermakova</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>SanJuan</surname>
          </string-name>
          , E.:
          <article-title>Cultural micro-blog contextualization 2016 workshop overview: data and pilot tasks</article-title>
          . In Balog,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Macdonald</surname>
          </string-name>
          , C., eds.: Working Notes of CLEF 2016 -
          <article-title>Conference and Labs of the Evaluation forum</article-title>
          , Evora, Portugal,
          <fpage>5</fpage>
          -
          <lpage>8</lpage>
          September,
          <year>2016</year>
          . Volume 1609 of CEUR Workshop Proceedings, CEUR-WS.org (
          <year>2016</year>
          )
          <volume>1197</volume>
          -
          <fpage>1200</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Badache</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fournier</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chifu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Harnessing Ratings and Aspect-Sentiment to Estimate Contradiction Intensity in Temporal-Related Reviews (to appear)</article-title>
          .
          <source>In: 21st International Conference on Knowledge Based and Intelligent Information and Engineering Systems</source>
          , KES2017. (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vaithyanathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Thumbs up?: sentiment classification using machine learning techniques</article-title>
          .
          <source>In: EMNLP</source>
          . (
          <year>2002</year>
          )
          <volume>79</volume>
          -
          <fpage>86</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>