=Paper= {{Paper |id=Vol-1743/paper3 |storemode=property |title=Opinion Analysis Applied to Politics: A Case Study based on Twitter |pdfUrl=https://ceur-ws.org/Vol-1743/paper3.pdf |volume=Vol-1743 |authors=Gilberto Nunes,Denivaldo Lopes,Zair Abdelouahab |dblpUrl=https://dblp.org/rec/conf/simbig/NunesLA16 }} ==Opinion Analysis Applied to Politics: A Case Study based on Twitter== https://ceur-ws.org/Vol-1743/paper3.pdf
Opinion Analysis Applied to Politics:
A Case Study based on Twitter

Gilberto Nunes
Federal Institute of Education, Science and Technology of Piauí - IFPI /
Picos, Piauí, Brazil
gilberto.nunes@ifpi.edu.br

Denivaldo Lopes and Zair Abdelouahab
Federal University of Maranhão - UFMA /
São Luís, Maranhão, Brazil
denivaldo.lopes@ufma.br, zair@dee.ufma.br

Abstract

Nowadays, social networks such as Facebook and Twitter are openly available over the Internet to everyone around the world. These websites provide functionality at no cost, such as the creation and editing of communities and social networks, support for a large variety of multimedia content (e.g. audio and video), and support for interactive communication (e.g. chats and posts). Twitter's users post comments about a range of subjects, such as products, famous people and politics. The dissemination of information in these social networks deserves attention because of their global coverage. An important feature of Twitter is its support for georeferenced posts, which makes it possible to locate where a post was made. In this paper, we propose an approach to Sentiment Analysis, or Opinion Mining. Our approach is based on Web Mining, Opinion Mining, Geographic Information Systems (GIS) and Machine Learning in order to recover relevant information from tweets. The information recovered by our approach is essential to support the verification of population trends, e.g. in the political domain. We propose a prototype that analyzes population trends, in particular in Brazil's political context and the ongoing impeachment process.

Keywords: Web Mining; Opinion Mining; Machine Learning; Opinion Analysis; Twitter; Geographic Information System.

1   Introduction

Prasetyo and Hauff (2015), Jungherr (2013) and Lampos (2012) propose approaches to estimate voting intention based on information recovered from Twitter.

In this paper, we propose another approach, based on opinion analysis applied to politics, to collect information from Twitter and determine public opinion about the current impeachment process in Brazil, to which the president elected in October 2014 is being submitted.

During an impeachment process, as in an electoral process, opinion surveys are conducted, such as those presented by Rothschild¹. He says that such opinion surveys are generally based on data obtained from printed forms filled in by the population. Our approach instead uses opinion analysis of messages obtained from Twitter to determine the Brazilian population's opinion about the impeachment of Brazil's president. According to Currie (1998), impeachment is a process that can result in the removal of a person from public office after this person has violated the Constitution of her country.

¹ Forecasting Elections: Voter Intentions versus Expectations, Brookings Institution. Link for ebook: http://www.brookings.edu/research/papers/2012/11/01-voter-expectations-wolfers.

In this paper, we show an approach based on knowledge discovery in textual sources, data from social networks (e.g. Twitter), Web Mining, Opinion Mining, Geographic Information Systems and Machine Learning. By applying our proposed approach, opinion trends about the impeachment can be identified in the Brazilian population.

This paper is organized as follows. Section 3 presents some fundamental concepts for this research work. Section 4 presents our approach for performing opinion analysis on data obtained from Twitter, with Web Mining support, about the development of the impeachment process in Brazil. Section 5 presents some results about the development of the impeachment process in Brazil. Section 6 shows a case study on the impeachment process. Section 7 presents some conclusions and future directions.

2   Related Works

Most related works use sentiment analysis and opinion mining to evaluate voting intentions, taking into consideration only the content of the posts. For illustration, we describe the following approaches: Prasetyo and Hauff (Dwi Prasetyo and Hauff, 2015), Jungherr (Jungherr, 2013) and Lampos (Lampos, 2012).

Prasetyo and Hauff (Dwi Prasetyo and Hauff, 2015) propose the use of Twitter-based election forecasting, sentiment analysis and machine learning techniques to determine voting intention for Indonesia's 2014 presidential election.

Jungherr (Jungherr, 2013) uses four metrics to determine voting intention for Germany's 2009 federal election: the total number of hashtags mentioning a given political party; the dynamics between positive and negative mentions of a given political party; the total number of hashtags mentioning one of the candidates; and the total number of users who used hashtags mentioning a given party or candidate.

Lampos (Lampos, 2012) studies techniques and patterns for extracting positive or negative sentiment from tweets, which build on each other, through a supervised approach for turning sentiment into voting intention percentages for the United Kingdom's 2010 general election.

Unlike the approaches mentioned above, our work uses georeferenced data in addition to the textual content. Thus we can easily perform a spatial analysis, as shown in the proposed case study (see Section 6). In the next section, we describe the technologies used in our case study.

3   Overview

In this section, we present the subjects Web Mining, Opinion Mining, Geographic Information System (GIS), Machine Learning and Twitter.

3.1 Web Mining

Web Mining is the process of extracting data or information from web sources, as described by Zhang (2011). According to the author (Zhang, 2011), Web Mining aims to find useful knowledge on the Web, building on data mining, text mining and multimedia mining to combine traditional data mining techniques with the Web. This type of mining can be subdivided into Web Content Mining, Web Usage Mining and Web Structure Mining. Web Content Mining refers to the extraction of Web page content; the text contained in those pages is a good example of content to be extracted. Web Usage Mining is the automated recognition of user usage patterns on a Web site. Web Structure Mining is based on the interconnections between data or information in documents or sites on the Web. Figure 1 illustrates the subdivision.

Figure 1: Basic Taxonomy for Web Mining. Source of the image: (Zhang, 2011).

3.2 Opinion Mining

Opinion mining can be defined as a computational technique that deals with opinion in textual sources (Pang and Lee, 2008). It aims to extract information based on the sentiment (e.g. positive, negative or neutral) expressed by one or more writers in their texts (Pang and Lee, 2008). Opinion mining analyzes a large volume of textual documents covering a range of subjects, such as entertainment, politics, education and marketing. Social networks like Twitter allow their users to express and share opinions and points of view. Thus, social networks can be seen as a large volume of textual documents in digital format.

3.3 Geographic Information System (GIS)

According to Nuhcan (2014), a Geographic Information System (GIS) can be understood as a computational information system like any other; what distinguishes it is a database that stores georeferenced data, i.e. the database includes latitude/longitude information linked to the data. Initially, GIS applications were restricted to desktop computers, but nowadays they are present on the Web
(map servers) and on smartphones (map applications).

3.4 Machine Learning

Machine Learning is a subarea of Artificial Intelligence whose focus is to develop computational methods in order to provide intelligent behavior to computers (Arel et al., 2010). Examples of Machine Learning algorithms are Support Vector Machine (SVM) (1998), Random Forest (2001) and Naive Bayes (2006).

3.5 Twitter

Twitter is a social networking service that enables users to send and receive messages called tweets, which have a maximum size of 140 characters per post. Twitter holds a large amount of content, such as profiles, general information, tweets, emotions, hashtags and others (Tiara et al., 2015). This social network provides basically two APIs² to support the recovery of data: the Search API and the Streaming API. In our approach, we use the Twitter API to recover the tweets and the georeferenced location where they were posted.

² Documentation Twitter Developers - Link for documentation: https://dev.twitter.com/overview/documentation.

3.6 Metrics Evaluation

Once the Twitter data have been collected and processed, a mechanism is needed to determine the validity of the classification applied (Sokolova and Lapalme, 2009). Table 1 introduces the confusion matrix that is used to assist the calculation of the evaluation.

Table 1: Confusion Matrix.

                             Predicted
                    positive               negative
Real   positive     TP (True Positive)     FN (False Negative)
       negative     FP (False Positive)    TN (True Negative)

Source of data: (Sokolova and Lapalme, 2009).

The metrics used in this article are derived from the Confusion Matrix: Accuracy, Sensitivity (or Recall), Specificity, F1-Score and Precision (Sokolova and Lapalme, 2009).

4   Proposed Approach for Opinion Mining Applied to Politics

Our proposed approach for opinion mining applied to politics is based on Knowledge Discovery in Databases (KDD) (Fayyad et al., 1996). To reach the objectives of this article, we developed an approach that consists of five steps. This approach has been implemented in a Software Prototype³ that assists its execution. The prototype is composed of two modules, one for the recovery of data (which works with the Search API) and another for the analysis of data (which works with the WEKA API). The first step is the acquisition of data (tweets). The second step is the preprocessing of the data, to remove noisy structures. The third step is the feature extraction from the data using TF-IDF (Robertson, 2004). The fourth step is Text Mining, by applying the Machine Learning algorithms presented previously. The fifth and last step is the evaluation of the results obtained, through the analysis of the Confusion Matrix (Sokolova and Lapalme, 2009). At the end, we obtain the positive and negative opinions. Figure 2 introduces the proposed approach, which is based on KDD (Fayyad et al., 1996).

Figure 2: Proposed Approach for Opinion Mining (Based on KDD (Fayyad et al., 1996)).

It is worth mentioning the importance of the WEKA⁴ tool and its API during the execution of the steps in this approach, with the exception of the data acquisition step.

³ Software Prototype - It is the result of applying a software process, as defined in (Sommerville, 2006).
⁴ Machine Learning Group at the University of Waikato - Version 3.7.12 and documentation, link: http://www.cs.waikato.ac.nz/ml/weka/documentation.html.

4.1 Acquisition of data

The data (tweets) were recovered using the Twitter Search API. Recovering the tweets requires Web Mining, specifically Web Content Mining, which recovers the texts contained in the posts of Twitter users. A total of 1,218 georeferenced tweets were collected, based on posts related to the impeachment of President Dilma during March 2016 that contained the following hashtags:

  #FicaDilma, #SouMaisDilma, #NaoVaiTerGolpe, #FicaPT, #FicaLula, #NaoAoGolpe, #ForaDilma, #ForaPT, #ForaPTralhas, #ForaLula, #ForaDilmaLulaPT, #DilmaNao, #VaiTerImpeachment, #NaiVaiTerGolpeVaiTerImpeachment.

The georeferencing of the tweets corresponds to the 26 Brazilian state capitals and the Brazilian Federal District. This step is performed by the recovery module of the prototype during the search.

4.2 Preprocessing

Before feature extraction from the tweets, it is important to remove unwanted structures, such as hyperlinks, irrelevant words, special characters and other references. After removing those, stemming and normalization must be applied to the tweets. It is important to emphasize that the preprocessing occurs on copies of the collected tweets (corpus (Khairnar and Kinikar, 2013)). This measure seeks to keep the original tweets intact to avoid any inconsistencies. Like the previous step, this step is performed by the recovery module of the prototype.

4.3 Feature Extraction

After the preprocessing stage, the tweets were submitted to the feature extraction process using the TF-IDF (Robertson, 2004) method. Once TF-IDF (Robertson, 2004) has been applied, the tweets are represented by a matrix of numeric values (bag-of-words model), following the mathematical definition of the TF-IDF (Robertson, 2004) model. Text Mining often uses a representation model known as the "bag-of-words model" as its feature set; the model used in this paper was created with the help of the WEKA⁴ tool and its StringToWordVector filter. In this model, each document is represented as a word vector. Thus, all documents together are represented as a giant document/term matrix. In this paper, TF-IDF (Robertson, 2004) was used as the cell value to dampen the importance of terms that appear in many documents. This step is performed by the analysis module of the prototype.

4.4 Text Mining

Once the matrix of numeric values has been generated, these values are used as inputs to the classification algorithms presented previously. These algorithms seek interpretable patterns within the matrix of values to classify each tweet as positive or negative with respect to Dilma's impeachment. This step is also performed by the analysis module.

4.5 Evaluation

Lastly, the classification of the data is evaluated using the confusion matrix and its metrics. This yields information, which will
support the acquisition of knowledge at the end of the KDD process (Fayyad et al., 1996).

5   Results

The Results section of this research is divided into three subsections. Subsection 5.1 describes the database that contains the samples used for training and testing. Subsection 5.2 covers the training and test models. Subsection 5.3 shows the results for the classification of tweets.

5.1 Data Base

In this research, the database has 500 positive and 500 negative samples of tweets related to the impeachment of President Dilma, totaling 1,000 samples. It is worth emphasizing that the samples were divided only into positives and negatives because, as seen during the experiments, neutral samples were not representative. The samples were collected automatically through the Search API, but the labeling was performed manually. During manual labeling, we aimed to select samples that were representative for the classification process, that is, as varied as possible. Recall that the tweets used in this subsection are different from those used in the Case Study section. They are georeferenced to the state capitals and the Federal District of Brazil and come from a period different from March 2016. In this way, we seek to avoid potential problems in the classification of the tweets.

5.2 Training and Test Models

The training and test models were generated with the help of the WEKA⁴ tool. Through it, we used the implementations of the classification algorithms (SVM, Naive Bayes and Random Forest) needed to create the models. The following training/test scenarios were generated:

    • 80% of the samples for training and 20% for testing;

    • 60% of the samples for training and 40% for testing;

    • 40% of the samples for training and 60% for testing;

    • 20% of the samples for training and 80% for testing.

The algorithm that produced the best model was used in the case study of this paper. The results for the proposed scenarios and the best models for each algorithm can be seen in subsection 5.3, next.

5.3 Training and Cross-Validation results

Table 2: Results obtained with the application of metrics for each of the proportions using the classification algorithms.

                                             Metrics Evaluation
Algorithms              Training/Test     AC        SE        ES
SVM¹ (Linear Kernel)    80%-20%           96.7%     98.1%     95.7%
                        60%-40%           96.9%     97.1%     96.7%
                        40%-60%           96.5%     98.0%     95.5%
                        20%-80%           95.5%     97.4%     94.2%
Random Forest²          80%-20%           95.1%     94.5%     95.5%
                        60%-40%           94.5%     93.0%     95.6%
                        40%-60%           94.9%     91.6%     97.6%
                        20%-80%           95.8%     96.4%     97.7%
Naive Bayes³            80%-20%           77.5%     76.2%     78.4%
                        60%-40%           84.8%     84.4%     85.1%
                        40%-60%           85.0%     83.8%     85.8%
                        20%-80%           89.3%     94.0%     86.7%

Legend: AC: Accuracy; SE: Sensitivity; ES: Specificity.

¹ Parameters WEKA - type kernel: linear; SVM type: C-SVC (classification); gamma: 0.5 and other parameters with default values;
² Parameters WEKA - Number of trees: 10; and other parameters with default values;
³ Parameters WEKA - All parameters with default values.

According to Table 2, the highest accuracy was found for the proportion of 60% - 40%, using the SVM, with a hit rate of 96.9%. Naive Bayes recorded the lowest accuracy among the three algorithms: even its best hit rate, 89.3% for the proportion of 20% - 80%, is below those of the SVM and the Random Forest.

According to the Sensitivity results in Table 2, we can conclude that the SVM has the highest rate of true positives, with a Sensitivity rate of 98.1% for the proportion of 80% - 20%.

Analyzing the Specificity data in Table 2, one can infer that the Random Forest presents the best result for true negatives, with a Specificity rate of 97.7%, which Table 2 lists for the proportion of 20% - 80%.
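The metrics discussed above follow directly from the four counts of the confusion matrix in Table 1. The following is a minimal sketch of that derivation in plain Python, not the WEKA-based implementation used by the prototype; the function name `confusion_metrics` and the example counts are hypothetical and serve only as an illustration.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Derive evaluation metrics from confusion-matrix counts.

    The arguments follow Table 1: TP and FN are real positives,
    FP and TN are real negatives. Assumes every denominator is non-zero.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total       # share of all samples classified correctly
    sensitivity = tp / (tp + fn)       # recall: share of real positives found
    specificity = tn / (tn + fp)       # share of real negatives found
    precision = tp / (tp + fp)         # share of predicted positives that are real
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for a 200-sample test set (not taken from the paper).
m = confusion_metrics(tp=95, fn=5, fp=10, tn=90)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With balanced classes, as in this hypothetical example, accuracy is the mean of sensitivity and specificity, which gives a quick consistency check when reading result tables.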



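The hold-out proportions of subsection 5.2 and the 10-fold cross-validation reported in Table 3 differ only in how the labeled samples are partitioned. The sketch below illustrates both partitions in plain Python rather than through the WEKA API that the prototype uses. The helper names `holdout_split` and `k_fold_splits`, the fixed seed, and the stratification of the hold-out split by class are assumptions made for illustration.

```python
import random


def holdout_split(labels, train_fraction, seed=0):
    """Stratified hold-out: each class keeps the same train/test proportion."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once.

    This simple version shuffles once and is not stratified by class.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# 500 positive and 500 negative samples, as described in subsection 5.1.
labels = ["pos"] * 500 + ["neg"] * 500

# Training/test sizes for the four proportions of subsection 5.2.
sizes = {f: tuple(len(part) for part in holdout_split(labels, f))
         for f in (0.8, 0.6, 0.4, 0.2)}

# Ten folds, each used once as the test set.
folds = list(k_fold_splits(len(labels), k=10))
```

In the hold-out case a single model is evaluated per proportion, whereas in cross-validation the counts from all folds are pooled before the metrics are computed.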
Table 3: Results obtained with the application of metrics for each of the proportions using the classification algorithms for Cross-validation.

                                                   Metrics Evaluation
Algorithms              Quantities of folds     PR        RE        F1
SVM¹ (Linear Kernel)    10                      98.5%     97.8%     98.4%
Random Forest²          10                      98.2%     97.5%     98.1%
Naive Bayes³            10                      76.5%     82.9%     76.4%

Legend: PR: Precision; RE: Recall; F1: F1-Score.

¹ Parameters WEKA - type kernel: linear; SVM type: C-SVC (classification); gamma: 0.5 and other parameters with default values;
² Parameters WEKA - Number of trees: 10; and other parameters with default values;
³ Parameters WEKA - All parameters with default values.

According to the results for Precision, Recall and F1-Score in Table 3, we can conclude that the SVM has the highest rates for the metrics used in cross-validation with 10 folds, with a Precision rate of 98.5%, a Recall rate of 97.8% and an F1-Score rate of 98.4%.

6   Case study

In this case study, a total of 1,218 georeferenced tweets were analyzed; tweets that were not georeferenced were discarded. These posts refer to the period of March 2016 and are linked to the impeachment process of the president of the country. This period was selected because two large demonstrations were scheduled for that month. The first demonstration, favorable⁵ to the impeachment, occurred on the 13th, and the second, contrary⁶ to it, on the 31st.

Figure 3 presents the results for the tweets collected and analyzed using the proposed approach, in the form of a map of the regions of Brazil. The map was plotted with GeoServer (Web Map), as shown in (Huang and Xu, 2011), based on shapefiles⁷ of the five regions of Brazil. The occurrences are posts containing the hashtags cited previously in subsection 4.1.

Analyzing Figure 3, one can see that in the maps of the Midwest, Southeast and South regions most records are in favor of the impeachment. In the map of the Northeast region, slightly more than half of the records are opposed to the impeachment. The map of the North region is the only one of the five regions presenting the same results for both opinions.

It is important to note that other surveys related to the impeachment process have been carried out in Brazil since 2015, when the first signs of the process appeared. One of those surveys, presented in Veja⁸ magazine, is very similar to the one presented in this study. In it, the magazine reports the results of a survey of social networks by the company Torabit⁹, in which 49.3% of posts on social networks are favorable to the impeachment and only 31.7% are contrary. Considering the results of that report and of the proposed approach, it can be seen that the present work presents valid trends in relation to the impeachment process. It is noteworthy that the proposed work reports trends by region, which the work done by Torabit⁹ does not.

To standardize the presentation of the data in the map plotted by the proposed approach (see Figure 3), we used a chart to make it easier to understand, as shown in Figure 4. Figure 4 shows, in a simplified way, the percentages by region for each of the opinions, whether favorable or contrary to the impeachment.

It can be said that the proposed work presents information by region, which can be verified through traditional opinion surveys. This happens because those surveys use past data, whereas the proposed work can use past or current data. In the proposed case study, data for the month of March 2016 was used. Collecting and analyzing data daily can identify possible trends and allows strategic actions to be targeted in general, such as actions carried out by movements favorable or contrary to the impeachment.

It is important to note that the case study could
     5
                                                                                          be carried out in relation to other periods for the
      Check the location and time of the demonstra-
tions of March 13 — Congress in focus - Link for                                          Twitter posts. In this new study you can be dis-
news: http://congressoemfoco.uol.com.br/noticias/confira-o-                               pensed the phases by training and testing, since the
horario-e-o-local-das-manifestacoes-de-13-de-marco/.
    6                                                                                         8
      Manifestations against the coup are sched-                                                49% of mentions on social networks are
uled for this Thursday (31/03) - Link for news:                                           pro-impeachment,           study     shows     —      Radar
http://www.pragmatismopolitico.com.br/2016/03/manifestac                                  Online       —      VEJA.com       -    Link     for    news:
oes-contra-o-golpe-estao-agendadas-para-esta-quinta-feira-                                http://veja.abril.com.br/blog/radar-on-line/sem-categoria/49-
3103.html.                                                                                das-mencoes-em-redes-sociais-sao-pro-impeachment-
    7                                                                                     mostra-estudo/.
      Shapefiles - It is a well-known format
                                                                                              9
for storing geospatial resources in files,              site:                                   Page Home - Torabit - Link for site:
http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf                                http://www.torabit.com.br/.




                                                                                     40
Figure 3: Approach to implementing proposed policy opinion analysis: map with trends impeachment
for regions of Brazil.


                                                            the data of social networks, such as the Twitter
                                                            (available through its API), can be used for public
                                                            opinion research purposes that go beyond a simple
                                                            mechanism for broadcast content. Remember that
                                                            these networks provide a range of opportunities to
                                                            detect where and when a topic of interest is being
                                                            discussed. Monitoring on a particular topic and lo-
                                                            cation, allows researchers to compare it with other
                                                            collected data using different means. As it was
                                                            shown in Case Study proposed.
                                                               Possibly, the results can be improved through
Figure 4: Graphic trends for the impeachment for
                                                            the use of other methods for feature extraction or
regions of Brazil, based on the proposed approach.
                                                            combination of these, such as: Latent Semantic In-
                                                            dexing Principal Component Analysis and others.
models were obtained in the previous study and              These improvements can come with implementa-
the same could be reused for other periods. With            tions of these methods in future work.
the application of this new study results should
verify possible trends for the process of impeach-
ment for the selected period.                               References
                                                            I. Arel, D.C. Rose, and T.P. Karnowski. 2010. Deep
7   Conclusion                                                 machine learning - a new frontier in artificial intel-
                                                               ligence research [research frontier]. Computational
It is concluded the proposed approach achieved                 Intelligence Magazine, IEEE, 5(4):13–18, Nov.
the goal of providing a solution based on opinion           Leo Breiman. 2001. Random forests. Mach. Learn.,
mining to identify policy trends according to pub-            45(1):5–32, October.
lic opinion. The result obtained with the proposed
work to collect and process data from the Twitter           David P. Currie. 1998. The first impeachment:
                                                              The constitution’s framers and the case of senator
is valid and resembles with the other work.                   william blount. American Journal of Legal History,
   Probably, the results of this study conclude that          42(4):427–429.




                                                       41
Nugroho Dwi Prasetyo and Claudia Hauff. 2015. Twitter-based election prediction
  in the developing world. In Proceedings of the 26th ACM Conference on
  Hypertext & Social Media, HT ’15, pages 149–158, New York, NY, USA. ACM.

Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. 1996. From data
  mining to knowledge discovery in databases. AI Magazine, 17:37–54.

Eibe Frank and Remco R. Bouckaert. 2006. Naive bayes for text classification
  with unbalanced classes. In Proceedings of the 10th European Conference on
  Principle and Practice of Knowledge Discovery in Databases, PKDD’06, pages
  503–510, Berlin, Heidelberg. Springer-Verlag.

Z. Huang and Z. Xu. 2011. A method of using geoserver to publish economy
  geographical information. In Control, Automation and Systems Engineering
  (CASE), 2011 International Conference on, pages 1–4, July.

Thorsten Joachims. 1998. Text categorization with support vector machines:
  Learning with many relevant features. In Proceedings of the 10th European
  Conference on Machine Learning, ECML ’98, pages 137–142, London, UK.
  Springer-Verlag.

Andreas Jungherr. 2013. Tweets and votes, a special relationship: The 2009
  federal election in Germany. In Proceedings of the 2nd Workshop on Politics,
  Elections and Data, PLEAD ’13, pages 5–14, New York, NY, USA. ACM.

Jayashri Khairnar and Mayura Kinikar. 2013. Machine learning algorithms for
  opinion mining and sentiment classification. International Journal of
  Scientific and Research Publications, 3(6):1–6.

Vasileios Lampos. 2012. On voting intentions inference from Twitter content: a
  case study on UK 2010 General Election. arXiv preprint arXiv:1204.0423.

Mahmut Onur Karslıoğlu Nuhcan Akçit, Emrah Tomur. 2014. Geographical
  information systems participating into the pervasive computing. In
  GEOProcessing 2014, The Sixth International Conference on Advanced Geographic
  Information Systems, Applications, and Services, pages 129–137. ThinkMind,
  March.

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Found.
  Trends Inf. Retr., 2(1-2):1–135, January.

Stephen Robertson. 2004. Understanding inverse document frequency: On
  theoretical arguments for idf. Journal of Documentation, 60(5):503–520, July.

Marina Sokolova and Guy Lapalme. 2009. A systematic analysis of performance
  measures for classification tasks. Information Processing & Management,
  45(4):427–437.

Ian Sommerville. 2006. Software Engineering: (Update) (8th Edition)
  (International Computer Science). Addison-Wesley Longman Publishing Co.,
  Inc., Boston, MA, USA.

Tiara, M.K. Sabariah, and V. Effendy. 2015. Sentiment analysis on twitter using
  the combination of lexicon-based and support vector machine for assessing the
  performance of a television program. pages 386–390, May.

H. Zhang. 2011. The research of web mining in e-commerce. In Management and
  Service Science (MASS), 2011 International Conference on, pages 1–4, Aug.
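
As an illustration of the regional summary discussed in Section 6 (Figures 3 and 4), the following minimal Python sketch shows one way the percentage of favorable and contrary opinions per region could be computed from tweets that have already been classified and georeferenced. The function name, the two label strings, the region names and the toy input are assumptions made for illustration only; they are not taken from the authors' implementation, and the sample values are not data from the case study.

```python
from collections import Counter, defaultdict

# Hypothetical label names; the sketch assumes exactly two opinion classes.
FAVORABLE = "favorable"
CONTRARY = "contrary"


def opinion_share_by_region(classified_tweets):
    """Return {region: {label: percentage}} for (region, label) pairs.

    Each input item is assumed to be one tweet that was already assigned
    to a region and labeled by a classifier.
    """
    counts = defaultdict(Counter)
    for region, label in classified_tweets:
        counts[region][label] += 1

    shares = {}
    for region, label_counts in counts.items():
        total = sum(label_counts.values())
        # A label with no tweets in a region gets 0.0 (Counter returns 0).
        shares[region] = {
            label: 100.0 * label_counts[label] / total
            for label in (FAVORABLE, CONTRARY)
        }
    return shares


# Made-up toy input, not data from the case study.
sample = [
    ("Northeast", FAVORABLE),
    ("Northeast", CONTRARY),
    ("Northeast", CONTRARY),
    ("Southeast", FAVORABLE),
    ("Southeast", FAVORABLE),
    ("Southeast", CONTRARY),
    ("Southeast", FAVORABLE),
]

shares = opinion_share_by_region(sample)
for region in sorted(shares):
    print(region, {k: round(v, 1) for k, v in shares[region].items()})
```

Computing the shares per region, rather than over all tweets at once, is what would let a chart such as Figure 4 compare regions with very different tweet volumes.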