
Role of social media in propagating controversies: the case of cultural microblog feeds

Adrian Chifu

adrian.chifu@lsis.org

Fidelia Ibekwe-SanJuan

fidelia.ibekwe-sanjuan@univ-amu.fr

Nathanala Andrianasolo

Aix Marseille Univ, IRSIC, Marseille, France

The aim of this research is to investigate how social media mediate social controversies in the public arena. To this end, we will use the CLEF MC2 corpus of microblogs [1], which captured long-term political and cultural controversies, in order to follow the birth and development of controversies over time and to pinpoint the increasing role that social media play in their propagation, regulation and resolution.

Keywords: Focus IR, opinion mining, information visualization

Introduction

Social media such as Facebook and Twitter have become the dominant platforms for information and communication in the web and big data era, to the extent that they have displaced the traditional media as news outlets. They have become the most used channels for disseminating content and negotiating social status. Content about individuals, celebrities and political figures is now publicised via social media. In recent years, important controversies born outside the digital sphere were quickly propagated on social media, usually via Twitter and Facebook, where they acquired a life of their own before being publicised elsewhere. In this research, we aim to study the increasing role social media play in publicising, mediating, regulating and resolving social controversies (not scientific ones). To this end, we have chosen three social controversies situated at three levels:

i) a local controversy surrounding the exploitation of the Bois Blanc quarry on the French island of La Reunion (henceforth the "Bois Blanc" or "BB" controversy);

ii) an international controversy involving Sir Tim Hunt, 2001 Nobel laureate in Physiology or Medicine, whose comments during a luncheon with women scientists in Korea in 2015 were judged sexist, triggering a controversy via Twitter which led to his social downfall (loss of reputation, prestige and all his honorary appointments) (henceforth the "Tim Hunt" or "TH" controversy);

iii) a national controversy surrounding Christiane Taubira, the embattled ex-minister of justice in the Hollande government, during her visit to the Cannes Festival in 2015 (henceforth the "Christiane Taubira" or "CT" controversy).

Owing to restrictions imposed by social media data platforms, we could not gather all the data linked to the first two controversies on social media, as they had "passed" by the time we decided to embark on this study. They will therefore be studied qualitatively using the traces we were able to gather online from some Twitter accounts, newspapers, websites and blogs. This qualitative study will serve as a methodology design phase to identify further research questions for such studies. For instance, it will be interesting to identify who the main actors were in these controversies; the platforms used to launch and propagate the controversies; the role of the traditional media (newspapers, TV, radio) in publicising the controversies; the thematic content around which the controversies crystallised; a timeline of how they were propagated; and the role of social media in bringing them to the attention of the larger public and the media. This prior qualitative study on small corpora will enable us to better formalise our methodology of analysis and to identify the parts of it that can be automated and scaled up to work on a larger corpus. The Taubira controversy, for which a larger corpus of tweets was collected in the framework of the CLEF 2017 Microblog Cultural Contextualization Track (the MC2@CLEF2017 lab has released a collection of 70 000 000 microblogs over 18 months dealing with cultural events), will serve as a testbed for the automation of our methodology, first tested on the two previous controversies (the BB and TH controversies respectively).

Example of controversy in the MC2 corpus

The two keywords "taubira" and "cannes" reveal a large controversy. For instance, a plain search on these two keywords retrieves results that hint at controversial opinions. Given the huge amount of information on social media, one should be able to identify and quantify controversies automatically.

Query-related controversy indicator

We propose several measures to evaluate whether query results are impacted by a controversy on Twitter. We summarize the automatic steps as follows: aspect identification, sentiment polarity identification and temporal distribution. We give more details in the next paragraphs.

A controversy occurs around entities; however, it is not completely represented by them. For instance, around the entities "Taubira" and "Cannes", the controversy can bear on various aspects, such as "appearance" or "presence". Thus, a first step of aspect identification is required, since the topic of a controversy is not exhaustively characterised by the entities involved.

The words around the identified aspects will be features for the analysis. Thus, the words from a context window around the target aspect are used to model the sentiments expressed on that particular aspect. The context window size is empirically set to 10 words around the target aspect (5 before and 5 after), as in [2].
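The window extraction described above can be sketched as follows. This is only a minimal illustration, assuming whitespace tokenisation and exact matching of a single-word aspect; the function name `context_window` and the example sentence are hypothetical and are not taken from the MC2 corpus.

```python
def context_window(tokens, aspect, before=5, after=5):
    """Return one list of context words per occurrence of `aspect`.

    Assumption: `tokens` is an already tokenised, lower-cased tweet and
    `aspect` is a single token. The aspect itself is excluded.
    """
    windows = []
    for i, tok in enumerate(tokens):
        if tok == aspect:
            left = tokens[max(0, i - before):i]    # up to `before` words
            right = tokens[i + 1:i + 1 + after]    # up to `after` words
            windows.append(left + right)
    return windows


# Invented example sentence (not real data).
example = "critics say her presence on the red carpet was a political statement"
example_tokens = example.lower().split()
example_windows = context_window(example_tokens, "presence")
print(example_windows)
```

Near the start or end of a tweet the window is simply truncated, so fewer than 10 context words may be returned.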

Based on the context around the aspects, automatic sentiment analysis can be carried out in order to identify the sentiment polarity. The polarity score is a real value in the interval [-1, 1], with -1 being very negative, 0 neutral and 1 very positive. The sentiment analysis module is inspired by the research of Pang et al. [3] and is based on a Naive Bayes classifier trained on a set of 50,000 movie reviews from IMDb with annotated sentiment polarities. For a group of tweets, the dispersion of the polarity scores will indicate the intensity of the controversy.
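As a rough illustration of the dispersion idea, the sketch below uses the population standard deviation of the polarity scores as the dispersion measure. This choice is an assumption made for the example, since the text does not fix a particular statistic; the function name `controversy_intensity` and the score lists are likewise hypothetical.

```python
from statistics import pstdev


def controversy_intensity(polarities):
    """Dispersion of polarity scores for a group of tweets.

    Assumption: dispersion is measured by the population standard
    deviation. With scores in [-1, 1] the result lies in [0, 1]:
    0 when all tweets agree, 1 when they split evenly between -1 and 1.
    """
    for p in polarities:
        if not -1.0 <= p <= 1.0:
            raise ValueError(f"polarity out of range [-1, 1]: {p}")
    if len(polarities) < 2:
        return 0.0  # no dispersion can be observed with fewer than 2 scores
    return pstdev(polarities)


# Invented scores (not real data): a consensual group and a polarised one.
consensual = [0.6, 0.7, 0.65, 0.7]
polarised = [-1.0, 1.0, -1.0, 1.0]
print(controversy_intensity(consensual), controversy_intensity(polarised))
```

Under this assumption, a higher value means that opinions on the aspect are more divided, whatever their average polarity.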

The topics of controversy generate reactions that are distributed in time. We can capture the temporal distribution of tweets. In this manner, we can identify when a topic is "fresh", "hot" or recurrent ("comeback"). The temporal features also allow tweets to be clustered by period, in order to form "discussions" of a sort.
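One simple way to form such "discussions" is to split the tweet timeline wherever the gap between two consecutive tweets exceeds a threshold. The sketch below assumes this gap-based rule and a 6-hour threshold; both are illustrative choices rather than the method fixed by this work, and the function name `group_by_period` and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def group_by_period(timestamps, max_gap=timedelta(hours=6)):
    """Group tweet timestamps into chronological "discussions".

    Assumption: a new discussion starts whenever a tweet is posted more
    than `max_gap` after the previous one.
    """
    groups = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][-1] <= max_gap:
            groups[-1].append(ts)   # continues the current discussion
        else:
            groups.append([ts])     # opens a new discussion
    return groups


# Invented timestamps (not real data): a burst, then a later "comeback".
example_times = [
    datetime(2015, 5, 20, 21, 0),
    datetime(2015, 5, 20, 21, 40),
    datetime(2015, 5, 23, 9, 15),
]
discussions = group_by_period(example_times)
print([len(d) for d in discussions])
```

The size of each group and the time elapsed between groups could then serve as simple signals for whether a topic is "hot" or a "comeback".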

1. Ermakova, L., Goeuriot, L., Mothe, J., Mulhem, P., Nie, J., SanJuan, E.: Cultural micro-blog contextualization 2016 workshop overview: data and pilot tasks. In Balog, K., Cappellato, L., Ferro, N., Macdonald, C., eds.: Working Notes of CLEF 2016 - Conference and Labs of the Evaluation forum, Evora, Portugal, 5-8 September, 2016. Volume 1609 of CEUR Workshop Proceedings, CEUR-WS.org (2016) 1197-1200

2. Badache, I., Fournier, S., Chifu, A.: Harnessing Ratings and Aspect-Sentiment to Estimate Contradiction Intensity in Temporal-Related Reviews (to appear). In: 21st International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES2017. (2017)

3. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning techniques. In: EMNLP. (2002) 79-86