=Paper= {{Paper |id=Vol-1743/paper19 |storemode=property |title=Big Data Analytics for Opinion Mining and Patterns Detection of the Tunisian Election |pdfUrl=https://ceur-ws.org/Vol-1743/paper19.pdf |volume=Vol-1743 |authors=Zeineb Dhouioui,Hanen Bouali,Jalel Akaichi |dblpUrl=https://dblp.org/rec/conf/simbig/DhouiouiBA16 }} ==Big Data Analytics for Opinion Mining and Patterns Detection of the Tunisian Election== https://ceur-ws.org/Vol-1743/paper19.pdf
     Big Data Analytics for Opinion Mining and Patterns Detection of the
                              Tunisian Election
      Zeineb Dhouioui           Hanen Bouali                                    Jalel Akaichi
     Bestmod Laboratory       Bestmod Laboratory                             Bestmod Laboratory
          ISG Tunis               ISG Tunis                                       ISG Tunis
      University of Tunis     University of Tunis                            University of Tunis
zeineb.dhouioui@hotmail.fr   hanene.bouali@gmail.com                    j.akaichi@gmail.com

Abstract

Big Data refers to an enormous volume of structured and unstructured data that cannot
be handled with traditional databases. The emergence of big data is due to the huge
quantities of information now being produced. Currently, researchers tend to study big
data, particularly data produced by social networks. Nowadays, social networks have
become a vital tool for tracking and extracting public opinion in developed countries,
offering benefits to the democratic process of elections. In this paper, we aim to
identify the political preferences and tendencies of the Tunisian population using
classification and opinion mining techniques. To prove the usefulness of the proposed
method, we analyze electoral data sets in Tunisia obtained from the official site of
the Independent Higher Instance for Election (ISIE). This analysis demonstrates a close
correspondence between election results and extracted opinion.

1  Introduction

Big data is a widely used term for very large and heterogeneous collections of data.
Such volumes make analysis with traditional data processing techniques difficult and
sometimes impossible (Anjaria and Guddeti, 2014). The challenges include pattern
recognition, analysis and prediction. Big data offers a chance to understand collected
data in order to predict data patterns. With millions of people using social networks
to express their opinions, a tremendous volume of data is generated. Unfortunately, the
election data published by the ISIE presents only a summary of the vote counts;
detailed data is not accessible. By combining social network data and ISIE data, voter
behavior can be characterized (Sudhahar et al., 2015).

With the evolution of Web 2.0 and the large number of users, big data analysis becomes
very useful in pattern detection. In particular, this paper deals with Tunisian voters
in the legislative and presidential elections. Indeed, despite the mutual relation
between socio-economic characteristics and voters in Tunisian elections, to the best of
our knowledge no researchers have addressed this task. Moreover, elections in countries
on the path to democracy are a source of enormous quantities of data.

The exponential rise in the popularity of social networks influences real-world
politics. These platforms were exploited in the Arab revolutions as a mobilization tool
and to publicize social movements. Moreover, social networks are used to study voters'
preferences in order to forecast electoral results. We cannot ignore the role of social
networks, especially Facebook, in our election: using opinion mining, we can analyze
and predict the voting intention of Facebook users in Tunisia for the 2014 presidential
election. The huge amounts of public opinion data need accurate tools to extract useful
information.

In this study, we exploit tables and some official statistical data from the ISIE in
order to identify possible correlations between socio-economic features of voters and
election results. Furthermore, to characterize voting behavior well, we study
psychological, sociological and economic variables; doing so, we discovered a high
correlation. The exploited data consists of the results of the Tunisian elections of
2014 and 2011 together with, as mentioned above, some socio-economic variables and some
data from social networks, especially Facebook and Twitter. In fact, we treat Facebook
activity data from the months preceding the legislative and presidential elections. An
important part of our study is to determine positive and negative opinions. These
opinions can then be interpreted to define the political position of voters.

People who share several common characteristics tend to have common voting behavior.
We are interested in establishing voting behavior according to the voter's profile.

The remainder of this paper is organized as follows. In part 2, we discuss existing
works on socio-economic variables affecting election results and on voters' sentiment
analysis. In part 3, we present the data and methodologies used in this paper, in order
to present opinion mining patterns of voters in the Tunisian election in part 4.
Finally, an overview of our work and the opportunities it opens up are provided in
part 5.

2  Related Works

Before presenting the use case of this paper, we survey existing works handling opinion
mining and voter patterns. Two aspects are covered:

  • Patterns affected by socio-economic variables

  • Voters' sentiment analysis

2.1  Socio-Economic Variables

The paper in (Kreuzer and Pettai, 2003) develops an analytical framework which
incorporates politician-driven inter-party mobility and voter-induced electoral
changes. The proposed framework was applied in Estonia, Latvia and Lithuania. The
authors develop different strategies and patterns, which are:

  • Organizational Affiliation Strategies:
       – Staying Put
       – Party Switching
       – Fusion
       – Fission
       – Starting Up

  • Electoral Affiliation Strategies:
       – The frequency of voters' affiliation changes
       – Voters' preferences among different party origins

  • Patterns of Party Systems Transformation:
       – Alignment patterns
       – Re-alignment patterns
       – De-alignment patterns

  • Organizational Affiliation Patterns of Politicians
  • Electoral Affiliation Patterns of Voters:
       – Parties having won at least one seat
       – Parties emerging from break away legislative
       – Parties whose successor or predecessor (through fusion or fission) won at
         least one seat

Other patterns were detected in (Akarca and Tansel, 2007), which studies voter behavior
in Turkey. The authors found that Turkish voters take economic performance into account
and exhibit a tendency to vote against parties holding power. In addition, the party
preferences of Turkish voters depend on their socio-economic characteristics. Moreover,
voters distinguish between major and minor parties and hold parties accountable for
economic growth. The growth rate more than a year before an election does not affect
its outcome. Later, a comparative analysis of voter turnout in regional elections was
made in (Henderson and McEwen, 2010). This paper conducts a cross-sectional examination
of voter participation in regional elections in nine states between 2003 and 2006. The
authors found that variations in the strength of political autonomy and in the strength
of attachment to the region among the electorate have a strong and positive impact on
the level of turnout in regional elections. The hypothesis that voter turnout in
regional elections is higher in regions with a high degree of regional distinctiveness
was confirmed. Other works examined the impact of weather on voter turnout (Persson et
al., 2014). The case of the Swedish election system was studied; the authors could not
find a robust and statistically significant negative effect of election-day rain on
turnout, unlike in the US, where one inch of rain reduces turnout by about one percent.

2.2  Sentiment Analysis

We should first define sentiment analysis, also called opinion mining; it is considered
a new form of text analysis (Choy et al., 2011). Sentiment analysis is a technique for
analyzing people's opinions, sentiments, appraisals and emotions. There have been
several works in this area. In (Choy et al., 2011), the authors discussed how sentiment
analysis can predict presidential election results in Singapore using Twitter data. In
(Ceron et al., 2015), the presented methodology consists of applying a supervised method
to analyze the voting intention of Twitter users in the United States during the 2012
presidential election and for the two rounds of the 2012 centre-left primaries in Italy.
The authors of (Jungherr et al., 2012) examine whether Twitter affects the exchange of
information about political affairs. They also present an evaluation of Twitter messages
which may reflect political sentiment. Moreover, they analyze Twitter activity in order
to predict the popularity of parties. In (Conover et al., 2011), the main idea was to
apply latent semantic analysis to tweets and thus discover hidden structure. Later, Lei
et al. in [10] propose a model able to predict public opinion about Republican
presidential elections. Finally, in (Ceron et al., 2014) the authors assert that
sentiment analysis gives accurate predictions and can become useful in polls. Moreover,
the authors of (Mohammad et al., 2014) automatically annotate a set of 2012 US
presidential election tweets for a number of attributes pertaining to sentiment,
emotion, purpose and style by crowdsourcing. They show through an analysis how public
sentiment is shaped, tracking public sentiment and polarization with respect to
candidates and issues, and understanding the impact of tweets from various entities.
Other research
such as (Dridi, 2015) presents a system for finding, from Twitter data, events that
raised the interest of users within a given time period and the important dates for each
event. In order to select the set of events, three main criteria were used: frequency,
variation and Tf-Idf.

3  Data and Methodology

To reach our objective, we have chosen the results of the 2014 parliamentary and
presidential elections in Tunisia. The election results were fairly contested and were
not surrounded by any extraordinary events. The sources of our data are the web site of
the Independent Higher Instance for Election (ISIE) 2014 and the National Institute of
Statistics (INS). The two sources report data from 24 governorates and 264 delegations
for the legislative election and for the first and second rounds of the presidential
election. We also collected Twitter and Facebook information from the start of the
legislative election in October 2011 and from the electoral campaign from August 2014 to
the end of the second round of the presidential election. Political tweets and Facebook
comments include those coming from followers and candidates. To analyze the data
collected from ISIE and INS, we apply WEKA algorithms. A study in (Bouali and Akaichi,
2014) demonstrated the efficiency and efficacy of the Support Vector Machine (SVM). The
first step is to load an Excel file into Weka. Second, we select SVM as the technique.
Then, for the training set, we choose cross-validation as the evaluation method and
select the class attribute. Later on, to extract the sentiment of the collected data, we
use OpinionFinder. We first preprocess the data. Second, we identify sentiment
expressions in the text document. Then, we detect two sets of expressions: those written
in formal languages (English, Arabic and French) and those in informal languages. The
first set is handled using the SentiStrength tool and the second one manually. The
significance of emoticons is also detected using SentiStrength. In this section we
propose a general method; this framework is divided into 3 steps:

  • Step 1: Data Collection. Statuses are collected according to the most frequently
    used keywords. Statuses containing insults are removed, and statuses written in
    different languages are handled manually.
    People discuss and express their opinions by commenting, posting or sharing statuses
    (or tweets). We are interested only in statuses, tweets and comments containing
    manually identified keywords related to the election, parties and candidates.
    People's publications may contain emoticons and informal language, so we analyze
    those manually. After finding the statuses and comments relevant to our subject, the
    collected data consists of 274 statuses and 586 comments.

  • Step 2: Status Classification. We count the number of positive, negative and neutral
    messages. Using the OpinionFinder tool, we are able to automatically extract
    subjective sentences and sentiment expressions. Based on SentiStrength, which
    estimates the strength of positive and negative sentiment in texts and emoticons in
    various languages, we classify words into categories.
    To design the voter profile, our system is based on identifying the voter's
    sentiment. Consequently, we can conclude whether a given voter is for or against a
    party or candidate. Additionally, we can interpret the voter's preferences according
    to his or her opinion.
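The counting in Step 2 can be illustrated with a minimal Python sketch. It is not the
SentiStrength or OpinionFinder pipeline used in this study: the two word sets are a
hypothetical toy lexicon, and `classify` and `count_opinions` are illustrative names.
The three sample statuses are the examples of Table 1, with accents restored.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only);
# it is NOT the lexicon shipped with SentiStrength or OpinionFinder.
POSITIVE = {"chance", "espoir", "bravo"}
NEGATIVE = {"craignent", "peur", "honte"}

# Example statuses taken from Table 1 of the paper.
STATUSES = [
    "Financial Times: L'élection de Caid Sebssi est une chance pour la Tunisie",
    "Des tunisiens craignent un retour en arrière après élection de BCE!",
    "BCE: Rencontre avec les président des clubs sportifs",
]


def classify(status):
    """Label one status as positive, negative or neutral by lexicon hits."""
    # Lower-case each token and strip surrounding punctuation.
    words = [w.strip(".,!?:;\"'").lower() for w in status.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def count_opinions(statuses):
    """Count how many statuses fall into each opinion class."""
    return Counter(classify(s) for s in statuses)


if __name__ == "__main__":
    print(count_opinions(STATUSES))
```

A real lexicon assigns graded strengths rather than set membership, so this sketch only
shows the shape of the counting step, not the scoring quality of the actual tools.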
The aim of this work is to determine whether social network data and the data gathered
from the official ISIE site can predict election results. Figure 1 illustrates the
proposed methodology.
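The supervised step of Section 3 (load a table of attributes, pick a classifier,
evaluate it with cross-validation) can be sketched in plain Python as follows. This is
only an illustration of k-fold cross-validation, not the WEKA workflow: a
nearest-centroid rule stands in for the SVM, and the rows, labels and function names
are made up for the example.

```python
# Toy data (hypothetical values, not taken from ISIE or INS):
# each row is [unemployment rate %, illiteracy rate %] for one delegation,
# each label is the class attribute to predict.
ROWS = [[25.0, 30.0], [10.0, 12.0], [27.0, 33.0],
        [9.0, 10.0], [24.0, 29.0], [11.0, 13.0]]
LABELS = ["A", "B", "A", "B", "A", "B"]


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def fit_centroids(rows, labels):
    """Placeholder learner: mean feature vector per class (not an SVM)."""
    sums, counts = {}, {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def cross_validate(rows, labels, k=3):
    """Mean accuracy over k folds; assumes k does not exceed the row count."""
    accuracies = []
    for train, test in k_fold_indices(len(rows), k):
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    print(cross_validate(ROWS, LABELS, k=3))
```

Contiguous folds are used to keep the sketch deterministic; practical tools usually
shuffle or stratify the rows before splitting.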




Figure 1: The System Architecture

4  Empirical Results

4.1  Voter Behavior

The objective of this paper is to extract relevant and interesting knowledge from a
large amount of data using supervised learning techniques. When applying our chosen
methodology, our main finding is that voters distinguish between major and minor parties
in a governing coalition and hold only the major incumbent party accountable for
economic growth.

There is a tendency for parties to lose or win a portion of their votes between the
first and second rounds due to political mergers which eventually happen in the second
round. This refers to voter turnout patterns, which are:

  • Staying put: a voter remains affiliated with his or her party in the legislative
    election and in both rounds of the presidential election.

  • Party switching: different from fusion (a party's voters re-affiliate in the form of
    a merger with another party).

  • Starting up: unaffiliated voters change their opinion to become affiliated to a
    party. We have found that the Islamist party represented by Mohamed Moncef Marzouki
    (MMM) receives its votes from people living in areas suffering from a lack of
    economic growth, such as Sidi Bouzid and Kebili. NIDAA TOUNESS, represented by Beji
    Caied Sebssi (BCE), received its votes from those living in regions experiencing
    good economic conditions and economic growth, like Tunis and Nabeul (Figure 2).

Figure 2: Votes by Governorate

The recent election shows a decrease in NAHDHA (political party) partisans and an
increase in NIDAA TOUNESS (political party) partisans. This is due to several reasons:

  • A tendency to vote against the parties holding power: voters who vote for BCE to
    reduce the power of MMM.

  • The government's economic performance.

  • A decrease in quality of life.

  • An increase in poverty rates.

We observed that citizens of lower socio-economic status do not exhibit political
efficacy
and commonly "participate less in the voting process". In contrast, citizens with higher
socio-economic status participate in and sponsor the political process. Thus, a voter's
income and occupation are crucial in the voting process. We note that a high education
level and a low level of illiteracy are behind significant civic activity (Figure 3). We
also found that women voted more for BCE.

Figure 3: Education Level

4.2  Voter Sentiment

The second round of the 2014 Tunisian presidential election was limited to two main
candidates, between whom the competition was severe and intense. Most sentiment and
opinion have been predicted and discovered from online discussions, statuses and
comments. Collecting and analyzing sentiment is essential for electoral campaigns.
Indeed, the enormous quantity of data related to public sentiment and opinion is
considered big data, which played a crucial role in the Obama campaign.

With the accessibility of social networking sites, internet users can easily express
their opinions about various topics. Thus, we can exploit and mine the shared opinions.
In this work, we collect data from Facebook; we focus on the content of statuses
containing a reference to a political party or a politician (Mehndiratta et al., 2014).

In the following table, we present an example of positive, negative and neutral tweets.

 Opinion Nature   Tweet Example
 Positive         Financial Times: L'élection de Caid Sebssi est une chance pour la
                  Tunisie [The election of Caid Sebssi is an opportunity for Tunisia]
 Negative         Des tunisiens craignent un retour en arrière après élection de BCE!
                  [Tunisians fear a step backwards after the election of BCE!]
 Neutral          BCE: Rencontre avec les président des clubs sportifs
                  [BCE: Meeting with the presidents of sports clubs]

           Table 1: Example of Tweet Classification.

Figure 4: Number of Facebook users

Figures 4 and 5 give an overview of the number of Facebook users who like the
candidates' official pages and a curve showing the evolution of the number of tweets
between August 15 and September 14, 2014. We recorded 2,004 tweets containing the
hashtag TnElec. This hashtag was sometimes used more than 150 times a day during the
last few weeks. The x axis represents the number of negative and positive statuses
written for both presidential candidates. The number of negative statuses for MMM is
larger than for BCE according to the collected data (Figure 6), which reflects the
reality, i.e.
the election results.

Figure 5: Tweets number evolution

Figure 6: Comparative sentiment variation for BCE, MMM (Status)

5  Conclusion

Social networks have become a potential source of data and a popular practice during
the election process. We used sentiment analysis techniques to detect correlations and
voter behaviors. Our statistical analysis of the 2014 Tunisian election results and of
the social, economic and political conditions surrounding it leads us to conclude that
the party choice of Tunisian voters is affected by economic, demographic and
socio-economic characteristics. We are aware that some interpretations require
additional data. The interpretations presented in this work can serve as hypotheses for
future elections. Overall, despite the limits of social network analysis, we can assert
that our results are encouraging.

References

Ali T Akarca and Aysit Tansel. 2007. Social and economic determinants of Turkish voter
  choice in the 1995 parliamentary election. Electoral Studies, 26(3):633–647.

Malhar Anjaria and Ram Mohana Reddy Guddeti. 2014. Influence factor based opinion mining
  of Twitter data using supervised learning. In 2014 Sixth International Conference on
  Communication Systems and Networks (COMSNETS).

Hanen Bouali and Jalel Akaichi. 2014. Comparative study of different classification
  techniques: Heart disease use case. In Machine Learning and Applications (ICMLA), 2014
  13th International Conference on, pages 482–486. IEEE.

Andrea Ceron, Luigi Curini, Stefano M Iacus, and Giuseppe Porro. 2014. Every tweet
  counts? How sentiment analysis of social media can improve our knowledge of citizens'
  political preferences with an application to Italy and France. New Media & Society,
  16(2):340–358.

Andrea Ceron, Luigi Curini, and Stefano M Iacus. 2015. Using sentiment analysis to
  monitor electoral campaigns: Method matters - evidence from the United States and
  Italy. Social Science Computer Review, 33(1):3–20.

Murphy Choy, Michelle LF Cheong, Ma Nang Laik, and Koo Ping Shung. 2011. A sentiment
  analysis of Singapore presidential election 2011 using Twitter data with census
  correction. arXiv preprint arXiv:1108.5520.

Michael D Conover, Bruno Gonçalves, Jacob Ratkiewicz, Alessandro Flammini, and Filippo
  Menczer. 2011. Predicting the political alignment of Twitter users. In Privacy,
  Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on
  Social Computing (SocialCom), 2011 IEEE Third International Conference on, pages
  192–199. IEEE.

Houssem Eddine Dridi. 2015. Détection d'évènements à partir de Twitter.

Ailsa Henderson and Nicola McEwen. 2010. A comparative analysis of voter turnout in
  regional elections. Electoral Studies, 29(3):405–416.
Andreas Jungherr, Pascal Jürgens, and Harald Schoen. 2012. Why the Pirate Party won the
  German election of 2009 or the trouble with predictions: A response to Tumasjan, A.,
  Sprenger, T. O., Sander, P. G., & Welpe, I. M. "Predicting elections with Twitter:
  What 140 characters reveal about political sentiment". Social Science Computer Review,
  30(2):229–234.

Marcus Kreuzer and Vello Pettai. 2003. Patterns of political instability: Affiliation
  patterns of politicians and voters in post-communist Estonia, Latvia, and Lithuania.
  Studies in Comparative International Development, 38(2):76–98.

Pulkit Mehndiratta, Shelly Sachdeva, Pankaj Sachdeva, and Yatin Sehgal. 2014. Elections
  again, Twitter may help!!! A large scale study for predicting election results using
  Twitter. In Big Data Analytics, pages 133–144. Springer.

Saif M Mohammad, Xiaodan Zhu, Svetlana Kiritchenko, and Joel Martin. 2014. Sentiment,
  emotion, purpose, and style in electoral tweets. Information Processing & Management.

Mikael Persson, Anders Sundell, and Richard Öhrvall. 2014. Does election day weather
  affect voter turnout? Evidence from Swedish elections. Electoral Studies, 33:335–342.

Saatviga Sudhahar, Giuseppe A Veltri, and Nello Cristianini. 2015. Automated analysis of
  the US presidential elections using big data and network analysis. Big Data & Society,
  2(1):2053951715572916.