Big Data Analytics for Opinion Mining and Patterns Detection of the Tunisian Election

Zeineb Dhouioui, Hanen Bouali, Jalel Akaichi
Bestmod Laboratory, ISG Tunis, University of Tunis
zeineb.dhouioui@hotmail.fr, hanene.bouali@gmail.com, j.akaichi@gmail.com

Abstract

Big Data refers to an enormous volume of structured and unstructured data that cannot be handled with traditional databases. The emergence of big data is due to the huge quantities of information now available. Currently, researchers tend to study big data, particularly data produced by social networks. Social networks have become a vital tool for tracking and extracting public opinion in developed countries, offering benefits to the democratic process of elections. In this paper, we aim to identify the political preferences and tendencies of the Tunisian population using classification and opinion mining techniques. To prove the usefulness of the proposed method, we analyze Tunisian electoral data sets obtained from the official site of the Independent Higher Instance for Election. This analysis demonstrates close correspondence between election results and extracted opinion.

1 Introduction

Big data is a term used extensively for very large and heterogeneous collections of data. Such a huge amount of data makes analysis with traditional data processing techniques difficult and sometimes impossible (Anjaria and Guddeti, 2014). The challenges include pattern recognition, analysis and prediction. Big data offers a chance to understand collected data in order to predict data patterns. With millions of people using social networks to express their opinions, a tremendous volume of data is generated. Unfortunately, the election data published by the ISIE presents only a summary of the vote counts; detailed data is not accessible. By combining social network data and ISIE data, voter behavior can be characterized (Sudhahar et al., 2015).

With the evolution of Web 2.0 and the large number of users, big data analysis becomes very useful in pattern detection. In this paper, we focus on Tunisian voters in the legislative and presidential elections. Indeed, despite the mutual relation between socio-economic characteristics and voting in Tunisian elections, to the best of our knowledge no researchers have addressed this task. Moreover, elections in countries on the path to democracy are a source of an enormous quantity of data.

The exponential rise in the popularity of social networks influences real-world politics. These platforms were exploited during the Arab revolutions as a mobilization tool and as a showcase for social movements. Moreover, social networks are used to study voters' preferences in order to forecast electoral results. We cannot ignore the role of social networks, especially Facebook, in our election: using opinion mining, we can analyze and predict the voting intention of Facebook users in Tunisia for the 2014 presidential election. The huge amounts of public opinion data require accurate tools to extract useful information.

In this study, we exploit tables and official statistical data from the ISIE in order to identify possible correlations between the socio-economic features of voters and election results. Furthermore, to characterize voting behavior well, we study psychological, sociological and economic variables; in doing so, we discovered a high correlation. The exploited data consists of the results of the Tunisian elections of 2014 and 2011, together with the socio-economic variables mentioned above and data from social networks, especially Facebook and Twitter. In particular, we process Facebook activity data from the months preceding the legislative and presidential elections. An important part of our study is to determine positive and negative opinions, which can then be interpreted to define the political position of voters.

People who share several common characteristics tend to have common voting behavior. We are interested in establishing voting behavior according to the voter's profile.

The remainder of this paper is organized as follows. In Section 2, we discuss existing works on the socio-economic variables affecting election results and on voters' sentiment analysis. In Section 3, we present the data and methodologies used in this paper, and in Section 4 the opinion mining patterns of voters in the Tunisian election. Finally, Section 5 gives an overview of our work and of the opportunities it opens up.

2 Related Works

Before presenting the use case of this paper, we survey existing works on opinion mining and voter patterns. Two aspects are covered:

• Patterns affected by socio-economic variables
• Voters' sentiment analysis

2.1 Socio-Economic Variables

Kreuzer and Pettai (2003) develop an analytical framework that incorporates politician-driven inter-party mobility and voter-induced electoral change. The proposed framework was applied to Estonia, Latvia and Lithuania. The authors identify the following strategies and patterns:

• Organizational Affiliation Strategies:
    – Staying Put
    – Party Switching
    – Fusion
    – Fission
    – Starting Up
• Electoral Affiliation Strategies:
    – The frequency of voters' affiliation changes
    – Voters' preferences among different party origins
• Patterns of Party Systems Transformation:
    – Alignment patterns
    – Re-alignment patterns
    – De-alignment patterns
• Organizational Affiliation Patterns of Politicians
• Electoral Affiliation Patterns of Voters:
    – Parties having won at least one seat
    – Parties emerging from break away legislative
    – Parties whose successor or predecessor (through fusion or fission) won at least one seat

Other patterns were detected by Akarca and Tansel (2007), who study voters' behavior in Turkey. The authors found that Turkish voters take economic performance into account and exhibit a tendency to vote against the parties holding power. In addition, the party preferences of Turkish voters depend on their socio-economic characteristics. Moreover, voters distinguish between major and minor parties and hold parties accountable for economic growth. The growth rate more than a year before an election does not affect its outcome.

Later, a comparative analysis of voter turnout in regional elections was carried out by Henderson and McEwen (2010). Their paper conducts a cross-sectional examination of voter participation in regional elections in nine states between 2003 and 2006. The authors found that variations in the strength of political autonomy and in the strength of attachment to the region among the electorate have a strong and positive impact on the level of turnout in regional elections. The hypothesis that voter turnout in regional elections is higher in regions with a high degree of regional distinctiveness was confirmed.

Other works examine the impact of weather on voter turnout (Persson et al., 2014). Studying the Swedish election system, the authors could not find a robust and statistically significant negative effect of Election Day rain on turnout, unlike in the US, where one inch of rain reduces turnout by about one percent.

2.2 Sentiment Analysis

We first define sentiment analysis, also called opinion mining, which is considered a new form of text analysis (Choy et al., 2011). Sentiment analysis is a technique for analyzing people's opinions, sentiments, appraisals and emotions. There have been several works in this area. Choy et al. (2011) discuss how sentiment analysis of Twitter data can predict presidential election results in Singapore. In (Ceron et al., 2015), the presented methodology consists of applying a supervised method to analyze the voting intention of Twitter users in the United States during the 2012 presidential election and during the two rounds of the 2012 centre-left primaries in Italy. Jungherr et al. (2012) check whether Twitter affects the exchange of information about political affairs. They also evaluate whether Twitter messages may reflect political sentiment, and they analyze Twitter activity in order to predict the popularity of parties. In (Conover et al., 2011), the main idea was to apply latent semantic analysis to tweets in order to discover hidden structure. Later, Lei et al in [10] propose a model able to predict public opinion about the Republican presidential elections. Finally, Ceron et al. (2014) assert that sentiment analysis gives accurate predictions and can become useful in polls.

Moreover, Mohammad et al. (2014) automatically annotate a set of 2012 US presidential election tweets for a number of attributes pertaining to sentiment, emotion, purpose and style, using crowdsourcing. Their analysis shows how public sentiment is shaped, tracks public sentiment and polarization with respect to candidates and issues, and helps to understand the impact of tweets from various entities. Other research, such as (Dridi, 2015), presents a system for finding, from Twitter data, the events that raised the interest of users within a given time period and the important dates for each event. To select the set of events, three main criteria are used: frequency, variation and Tf-Idf.

3 Data and Methodology

To reach our objective, we chose the results of the 2014 parliamentary and presidential elections in Tunisia. The election results were fairly contested and were not surrounded by any extraordinary events. Our data sources are the web site of the Independent Higher Instance for Election (ISIE) 2014 and the National Institute of Statistic (INS). The two sources report data from 24 governorates and 264 delegations for the legislative election and for the first and second rounds of the presidential election. We also collected Twitter and Facebook information from the start of the legislative election in October 2011 and from the electoral campaign, from August 2014 to the end of the second round of the presidential election. Political tweets and Facebook comments include those coming from followers and candidates.

To analyze the data collected from the ISIE and the INS, we apply WEKA algorithms. A previous study (Bouali and Akaichi, 2014) showed the efficiency and efficacy of the Support Vector Machine (SVM). The first step is to load an Excel file into WEKA. Second, we select SVM as the technique. Then, for the training set, we choose cross-validation as the evaluation method and select the classification attribute.

Later on, to extract the sentiment of the collected data, we use OpinionFinder. We first preprocess the data. Second, we identify sentiment expressions in the text document. Then, we detect two sets of expressions: those written in formal languages (English, Arabic and French) and those in informal languages. The first set is handled using the SentiStrength tool and the second one manually. The significance of emoticons is also detected using SentiStrength. In this section we propose a general method; this framework is divided into 3 steps:

• Step 1: Data Collection
Statuses are collected according to the most frequently used keywords. Statuses containing insults are removed, and statuses written in different languages are handled manually.
People discuss and express their opinions by commenting, posting or sharing statuses (or tweets). We are interested only in statuses, tweets and comments containing manually identified keywords related to the election, parties and candidates. Publications may contain emoticons and informal language, which we analyze manually. After finding the appropriate statuses and comments referring to our subject, the collected data consists of 274 statuses and 586 comments.

• Step 2: Status Classification
This step counts the number of positive, negative and neutral messages. Using the OpinionFinder tool, we are able to automatically extract subjective sentences and sentiment expressions. Based on SentiStrength, which estimates the strength of positive and negative sentiment in texts and emoticons in various languages, we classify words into categories.
To design the voter profile, our system is based on identifying the voter's sentiment. Consequently, we can conclude whether a given voter is for or against a party or candidate. Additionally, we can interpret the voter's preferences according to the expressed opinion.

The aim of this work is to determine whether social network data and the data gathered from the official ISIE site can predict election results. Figure 1 illustrates the proposed methodology.

Figure 1: The System Architecture

4 Empirical Results

4.1 Voter Behavior

The objective of this paper is to extract relevant and interesting knowledge from a large amount of data using supervised learning techniques. When applying our chosen methodology, our main finding is that voters distinguish between major and minor parties in a governing coalition and hold only the major incumbent party accountable for economic growth.

There is a tendency for parties to lose or win a portion of their votes between the first and second rounds, due to the political mergers that eventually happen in the second round. This refers to voter turnout patterns, which are:

• Staying put: a voter remains affiliated with his or her party in the legislative election and in both rounds of the presidential election.
• Party switching: different from fusion (a party's voters re-affiliate in the form of a merger with another party).
• Starting up: unaffiliated voters change their opinion and become affiliated with a party.

We found that the Islamist party represented by Mohamed Moncef Marzouki (MMM) received its votes from people living in areas suffering from a lack of economic growth, such as Sidi Bouzid and Kebili. NIDAA TOUNESS, represented by Beji Caied Sebssi (BCE), received its votes from those living in regions experiencing good economic conditions and economic growth, like Tunis and Nabeul (Figure 2).

Figure 2: Votes by Governorate

The recent election shows a decrease in NAHDHA (political party) partisans, as opposed to an increase in NIDAA TOUNESS (political party) partisans. This is due to several reasons:

• A tendency to vote against the parties holding power: voters who voted for BCE in order to reduce the power of MMM
• The government's economic performance
• A decrease in quality of life
• An increase in poverty rates

We observed that citizens of low socio-economic status do not exhibit political efficacy and commonly "participate less in the voting process". In contrast, citizens of high socio-economic status participate in and sponsor the political process. Thus, a voter's income and occupation are crucial in the voting process. We note that a high education level and a low level of illiteracy are behind significant civic activity (Figure 3). We also found that women voted more for BCE.

Figure 3: Education Level

4.2 Voter Sentiment

The second round of the 2014 Tunisian presidential election was contested between two main candidates, and the competition was severe and intense. Most sentiments and opinions were discovered and predicted from online discussions, statuses and comments. Collecting and analyzing sentiment is essential for electoral campaigns. Indeed, the enormous quantity of data related to public sentiment and opinion is considered big data, which played a crucial role in the Obama campaign.

With the accessibility of social networking sites, internet users can easily express their opinions about various topics. Thus, we can exploit and mine the shared opinions. In this work, we collect data from Facebook and focus on the content of statuses containing a reference to a political party or a politician (Mehndiratta et al., 2014).

In the following table, we present an example of positive, negative and neutral tweets.

Opinion Nature | Tweet Example
Positive | Financial Times: L'élection de Caid Sebssi est une chance pour la Tunisie (The election of Caid Sebssi is a chance for Tunisia)
Negative | Des tunisiens craignent un retour en arrière après élection de BCE! (Some Tunisians fear a step backwards after the election of BCE!)
Neutral | BCE: Rencontre avec les président des clubs sportifs (BCE: Meeting with the presidents of sports clubs)

Table 1: Example of Tweet Classification.

Figures 4 and 5 give an overview of the number of Facebook users who like the candidates' official pages and a curve showing the evolution of the number of tweets between August 15 and September 14, 2014; we recorded 2,004 tweets containing the hashtag TnElec. During the last few weeks, this hashtag was sometimes used more than 150 times a day. In Figure 6, the x axis represents the number of negative and positive statuses written about both presidential candidates. According to the collected data, the number of negative statuses for MMM is larger than for BCE (Figure 6), which reflects the reality, i.e. the election results.

Figure 4: Number of Facebook users

Figure 5: Tweets number evolution

Figure 6: Comparative sentiment variation for BCE, MMM (Status)

5 Conclusion

Social networks have become a potential data source and a popular medium during the election process. We used sentiment analysis techniques to detect correlations and voters' behaviors. Our statistical analysis of the 2014 Tunisian election results and of the social, economic and political conditions surrounding them leads us to conclude that the party choice of Tunisian voters is affected by economic, demographic and socio-economic characteristics. We are aware that some interpretations require additional data. The interpretations presented in this work can serve as hypotheses for future elections. Overall, despite the limits of social network analysis, we can assert that our results are encouraging.

References

Ali T Akarca and Aysit Tansel. 2007. Social and economic determinants of Turkish voter choice in the 1995 parliamentary election. Electoral Studies, 26(3):633–647.

Malhar Anjaria and Ram Mohana Reddy Guddeti. 2014. Influence factor based opinion mining of Twitter data using supervised learning. In 2014 Sixth International Conference on Communication Systems and Networks (COMSNETS).

Hanen Bouali and Jalel Akaichi. 2014. Comparative study of different classification techniques: Heart disease use case. In Machine Learning and Applications (ICMLA), 2014 13th International Conference on, pages 482–486. IEEE.

Andrea Ceron, Luigi Curini, Stefano M Iacus, and Giuseppe Porro. 2014. Every tweet counts? How sentiment analysis of social media can improve our knowledge of citizens' political preferences with an application to Italy and France. New Media & Society, 16(2):340–358.

Andrea Ceron, Luigi Curini, and Stefano M Iacus. 2015. Using sentiment analysis to monitor electoral campaigns: method matters, evidence from the United States and Italy. Social Science Computer Review, 33(1):3–20.

Murphy Choy, Michelle LF Cheong, Ma Nang Laik, and Koo Ping Shung. 2011. A sentiment analysis of Singapore presidential election 2011 using Twitter data with census correction. arXiv preprint arXiv:1108.5520.

Michael D Conover, Bruno Gonçalves, Jacob Ratkiewicz, Alessandro Flammini, and Filippo Menczer. 2011. Predicting the political alignment of Twitter users. In Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), 2011 IEEE Third International Conference on, pages 192–199. IEEE.

Houssem Eddine Dridi. 2015. Détection d'évènements à partir de Twitter.

Ailsa Henderson and Nicola McEwen. 2010. A comparative analysis of voter turnout in regional elections. Electoral Studies, 29(3):405–416.

Andreas Jungherr, Pascal Jürgens, and Harald Schoen. 2012. Why the Pirate Party won the German election of 2009 or the trouble with predictions: A response to Tumasjan, A., Sprenger, T. O., Sander, P. G., & Welpe, I. M. "Predicting elections with Twitter: What 140 characters reveal about political sentiment". Social Science Computer Review, 30(2):229–234.

Marcus Kreuzer and Vello Pettai. 2003. Patterns of political instability: Affiliation patterns of politicians and voters in post-communist Estonia, Latvia, and Lithuania. Studies in Comparative International Development, 38(2):76–98.

Pulkit Mehndiratta, Shelly Sachdeva, Pankaj Sachdeva, and Yatin Sehgal. 2014. Elections again, Twitter may help!!! A large scale study for predicting election results using Twitter. In Big Data Analytics, pages 133–144. Springer.

Saif M Mohammad, Xiaodan Zhu, Svetlana Kiritchenko, and Joel Martin. 2014. Sentiment, emotion, purpose, and style in electoral tweets. Information Processing & Management.

Mikael Persson, Anders Sundell, and Richard Öhrvall. 2014. Does election day weather affect voter turnout? Evidence from Swedish elections. Electoral Studies, 33:335–342.

Saatviga Sudhahar, Giuseppe A Veltri, and Nello Cristianini. 2015. Automated analysis of the US presidential elections using big data and network analysis. Big Data & Society, 2(1):2053951715572916.
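As a supplementary illustration, Step 2 of the methodology in Section 3 (counting positive, negative and neutral messages) can be sketched in a few lines of Python. This is not the SentiStrength or OpinionFinder pipeline used in the study: the two word lists, the function names `classify_status` and `count_opinions`, and the sample statuses are all hypothetical, and the sketch assumes that each status has already been attributed to a candidate.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only);
# the study itself relies on SentiStrength and OpinionFinder.
POSITIVE_WORDS = {"chance", "hope", "support", "good"}
NEGATIVE_WORDS = {"fear", "bad", "against", "crisis"}


def classify_status(text):
    """Label one status as 'positive', 'negative' or 'neutral' by lexicon hits."""
    # Lowercase each token and strip surrounding punctuation.
    tokens = [tok.strip(".,!?:;\"'()").lower() for tok in text.split()]
    positive_hits = sum(tok in POSITIVE_WORDS for tok in tokens)
    negative_hits = sum(tok in NEGATIVE_WORDS for tok in tokens)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def count_opinions(statuses):
    """Count positive, negative and neutral statuses per candidate.

    `statuses` is an iterable of (candidate, text) pairs.
    """
    counts = {}
    for candidate, text in statuses:
        counts.setdefault(candidate, Counter())[classify_status(text)] += 1
    return counts


if __name__ == "__main__":
    # Invented example statuses, not taken from the collected data set.
    sample = [
        ("BCE", "A real chance for the country"),
        ("BCE", "Meeting with sports club presidents"),
        ("MMM", "People fear a new crisis!"),
    ]
    for candidate, counter in count_opinions(sample).items():
        print(candidate, dict(counter))
```

Keeping one counter per candidate mirrors the comparison of negative and positive statuses between BCE and MMM reported in Section 4.2; a real run would replace the toy lexicon with the scores produced by the sentiment tools.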