=Paper= {{Paper |id=None |storemode=property |title=Relating Political Party Mentions on Twitter with Polls and Election Results |pdfUrl=https://ceur-ws.org/Vol-986/paper_9.pdf |volume=Vol-986 |dblpUrl=https://dblp.org/rec/conf/dir/SandersB13 }} ==Relating Political Party Mentions on Twitter with Polls and Election Results== https://ceur-ws.org/Vol-986/paper_9.pdf
 Relating Political Party Mentions on Twitter with Polls and
                       Election Results
                             Eric Sanders                                              Antal van den Bosch
          CLS/CLST, Radboud University Nijmegen                                  CLS, Radboud University Nijmegen
                     Erasmusplein 1                                                      Erasmusplein 1
                   6525 HT Nijmegen                                                     6525 HT Nijmegen
                    +31 24 3616087                                                       +31 24 3611647
                     e.sanders@let.ru.nl                                            a.vandenbosch@let.ru.nl


ABSTRACT
In each of the last ten days preceding the parliamentary elections of 2012 in the
Netherlands at least one election poll was published. Throughout the same period close
to 170 thousand Dutch microtext messages with references to political parties were
posted on Twitter, the microblogging platform. In this study we investigate whether
these tweets can serve as an addition to, or even an alternative for, the traditional
polls as predictors of the election outcomes. We show that counts of mentions of
political party names are strongly correlated with the polls and the election results.
While polls remain more accurate as a predictor of the outcome (a mean absolute error of
1.1% and a correlation of about 0.98 with the actual percentage of votes cast for all
parties), the Twitter statistics show a mean absolute error of 1.9% when aggregated over
a number of days, and display a high correlation with elections and polls (in both
cases, r≈0.95). We conclude that tweet mention counts form a good complementary basis
for predicting election results.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms
Human Factors, Languages.

Keywords
Twitter, Elections, Polls.

1. INTRODUCTION
With a current average of about half a billion messages posted daily, Twitter hosts a
massive amount of accessible messages, which in turn harbor vast amounts of information.
Tweets are often related to personal affairs, but may also refer to popular events. One
of the interesting uses of the information in tweets is to try to determine people's
opinions about certain matters. Politics is an attractive subject for mining opinions
from tweets. In terms of events, political elections typically evoke the posting of
tweets containing political views.

A conventional way of assessing average opinions about politics during election periods
is polling. The standard polling method is to ask a small but representative part of the
population which party or person they plan to vote for. On Twitter people give this
information without being prompted. It would be an interesting addition to (or even
alternative to) polls if we could extract this information from tweets. The most
challenging part of it is to gather a balanced representation from the tweets of the
people participating in the elections. In essence this is impossible; while the legal
voting age in the Netherlands is 18, many users on Twitter have not reached that age,
and demographic information regarding individual users is not available in any
trustworthy way on Twitter. The sheer magnitude of data available on Twitter may
compensate for this partly unrepresentative information.

In this paper a comparison between the predictive potential of tweets and polls with
respect to the outcome of the Dutch parliament elections of 12 September 2012 is
presented. The number of times a political party is mentioned in a Dutch tweet is
compared to the polls and the election results without normalization. This was done for
all eleven parties that won at least one seat in the parliament. The next sections
discuss related work, describe the data, explain the experiment, discuss the results,
and draw conclusions.

2. RELATED WORK
Work that has focused on predicting election outcomes through social media mining offers
a mixed bag of results. Tumasjan et al. [1] show that for the six biggest parties in the
election of the German parliament held on 27 September 2009, the percentage of tweets in
which a party is mentioned between 13 August and 19 September 2009 highly correlates
with the election result of that party. Their particular selection of parties and the
period over which counts were gathered is questioned in a responding paper by Jungherr
et al. [2]. They claim that the choices made by Tumasjan et al. give overly optimistic
results on badly grounded heuristics.

O'Connor et al. [3] compare the sentiment ratio of tweets containing 'obama' with
presidential job approval polls in 2009 and presidential election polls in 2008. The
ratio correlates well with the former but not with the latter. Marchetti-Bowick and
Chambers [4] build on the work of O'Connor et al. and use distant supervision for both
topic identification and sentiment analysis. The comparison of the results with Obama's
job approval poll gives better correlation than earlier work.

Tjong Kim Sang and Bos [5] compare tweet mentions and election results for the Dutch
senate elections of 2011. Beyond raw counts of tweets they test and compare the
predictive power of four alternative counting methods, but they do not find large
improvements with these methods.

The novelty of the work described in this paper is that it is based on a relatively
large number of consecutive polls on each of the ten days before the elections.

Gayo-Avello [6] pinpoints a couple of problems with predicting elections based on tweets
and gives some suggestions. Apart from those addressed in this paper, he indicates that
only good results are published and that analyzing afterwards is not predicting.

3. DATA
The Twitter data used in the experiments is taken from a substantial archive of Dutch
tweets collected within the TwiNL project (ifarm.nl/erikt/twinl). The FAQ of the related
search website twiqs.nl states that an estimated 40% of all Dutch tweets have been
collected since December 16, 2010. The present study makes use of all tweets gathered
from September 2 to September 12, 2012, for which between 2.0 and 2.4 million tweets per
day have been archived.

The poll data is taken from the website Alle Politieke Peilingen (www.allepeilingen.com)
that has saved the poll results from 2000 onwards of the six most cited polling
institutes in the Netherlands. These are: peil.nl, TNS NIPO, de politieke barometer,
buzzpeil.nl, de Stemming and NOS Peilingwijzer. All these polls try to predict the
result of the elections (if the elections were held on the day of the poll).

4. EXPERIMENT
For the eleven parties that won one or more seats in parliament we counted how often the
party name was mentioned in a tweet in the ten days before the elections and on election
day, 12 September 2012. This was done with a basic pattern match. First it was
investigated by which names parties are mentioned in the tweets. Most parties are almost
exclusively mentioned by their abbreviation and rarely by their full name. Most full
names are therefore ignored. For instance, the acronym of the VVD occurs over a thousand
times more often than its full name, 'Volkspartij voor Vrijheid en Democratie'. However,
two parties are often mentioned by their full name: GroenLinks and ChristenUnie. Their
respective abbreviations can also have other meanings: GL being a typical English
shorthand for 'good luck' and CU for 'see you', but a manual inspection revealed that
these abbreviations are rarely used in these meanings.

We needed to generate several specific pattern-matching expressions. Three parties have
'van de' ('of the') or 'voor de' ('for the') in their full name which can be expressed
in many ways, e.g. 'vd', 'v.d.', 'v/d', 'van de', 'v d', which are all represented in
the search pattern that was used. Matching is case-insensitive, so 'SGP', 'sgp', 'Sgp'
etc. are all recognised. No effort was made to find misspelled party names. The party
names can be preceded by '@' (Twitter account names) or '#' (Twitter hashtags) and
preceded or followed by punctuation.

Table 1 lists the resulting regular expressions for the parties.

During the period of ten days before the election, for each day and each party, the
percentage of tweets in which a party is mentioned is compared to the result of the
average of all polls that came out that day. This was done to investigate how much the
percentage of party mentions in tweets resembles the polls. Subsequently, the results of
the averaged polls on the day before the election and the election results are compared
to each other and to the tweet mentions of (1) election day, (2) the day before election
day, (3) an aggregate of all tweets during the 10-day period before the elections, and
(4) an aggregate over a 5-day period before the elections.

Table 1. The regular expressions that were used to detect the party names in the tweets

Party     Regular Expression Pattern
VVD       "vvd"
PVDA      "pvda","partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+arbeid"
SP        "sp"
PVV       "pvv","partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+vrijheid"
CDA       "cda"
D66       "d\'?66"
GL        "gl","groen.?links"
CU        "cu","christen.?unie"
SGP       "sgp"
PVDD      "pvdd","partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+dieren"
50PLUS    "50[^\d]?(\+|plus)"

An average of 0.7% of all daily tweets posted throughout the last ten days before the
election mentions at least one political party. Table 2 shows that these nearly 170
thousand tweets are not uniformly divided over the eleven days; about one third of all
tweets are posted on election day, and more tweets are posted closer to election day.

5. RESULTS
First, comparisons are shown in three figures (Figures 1, 2, and 3) between daily
percentages of Twitter mentions and daily poll results of selections of two or three
parties during the ten days before the elections.

The daily percentage of Twitter mentions for a particular party is computed as follows:

    Perc = 100 * #mentions_p / ∑_p #mentions_p

where #mentions_p is the number of mentions of a particular party, and ∑_p #mentions_p
is the total number of mentions of all eleven parties. The counts thus represent
mentions, not tweets: if in a tweet two parties are mentioned, the tweet is counted
twice.

The percentages of poll results are computed from the predicted number of parliament
seats, which is the statistic by which they are reported and stored. As there are 150
seats in the Dutch parliament, each seat stands for 0.67%. The percentage used here is
the mean percentage of the predicted number of seats of all polling institutes that
released a prediction that day. For some days there is a poll of only one institute. The
predictions of the polling institutes differ slightly. The largest difference between
two predictions from poll estimates for the same party on the same day is 4.7%. On the
day before the elections, 11 September, all polling institutes published results.
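
The counting and normalisation steps described in Sections 4 and 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the patterns are copied from Table 1, but the boundary rule (no word character directly before or after a match), the function names, and the example tweets are all introduced here for illustration only.

```python
import re
from collections import Counter

# Alternatives per party, copied from Table 1.
PARTY_PATTERNS = {
    "VVD": [r"vvd"],
    "PVDA": [r"pvda", r"partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+arbeid"],
    "SP": [r"sp"],
    "PVV": [r"pvv", r"partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+vrijheid"],
    "CDA": [r"cda"],
    "D66": [r"d\'?66"],
    "GL": [r"gl", r"groen.?links"],
    "CU": [r"cu", r"christen.?unie"],
    "SGP": [r"sgp"],
    "PVDD": [r"pvdd", r"partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+dieren"],
    "50PLUS": [r"50[^\d]?(\+|plus)"],
}


def compile_party(patterns):
    """Combine a party's alternatives into one case-insensitive regex.

    Assumption: a match may not touch a word character on either side,
    so '@vvd', '#sp' and 'SP?' match while 'spelen' does not.
    """
    joined = "|".join(f"(?:{p})" for p in patterns)
    return re.compile(rf"(?<!\w)(?:{joined})(?!\w)", re.IGNORECASE)


PARTY_REGEXES = {party: compile_party(p) for party, p in PARTY_PATTERNS.items()}


def count_mentions(tweets):
    """Count mentions per party; a tweet naming two parties counts for both."""
    counts = Counter()
    for text in tweets:
        for party, regex in PARTY_REGEXES.items():
            if regex.search(text):
                counts[party] += 1
    return counts


def mention_percentages(counts):
    """Perc = 100 * #mentions_p / sum over all parties of #mentions_p."""
    total = sum(counts.values())
    return {party: 100 * n / total for party, n in counts.items()}


def seats_to_percentage(seats, total_seats=150):
    """Convert predicted seats to a percentage (one seat is about 0.67%)."""
    return 100 * seats / total_seats


# Invented example tweets, not taken from the TwiNL archive.
tweets = [
    "Ik stem #VVD!",
    "PvdA of SP? Lastig.",
    "Partij v/d Arbeid doet het goed",
    "@groenlinks en D'66 in debat",
    "spelen in de zon",
]
counts = count_mentions(tweets)
percentages = mention_percentages(counts)
print(dict(counts))
print({party: round(p, 1) for party, p in percentages.items()})
print(round(seats_to_percentage(1), 2))  # 0.67
```

Keeping the Table 1 patterns untouched and adding the boundary check in a single wrapper makes it easy to swap in a different boundary rule, since the paper does not spell out how short abbreviations such as "sp" were delimited.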

Table 2. Number of tweets with at least one political party mentioned in the 10 days before elections and on election day (0 days)

Days before election   0        1        2        3       4        5        6       7       8        9       10
#tweets                56,580   24,004   17,224   8,498   12,011   10,178   9,373   9,062   10,317   8,048   4,700
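
As a quick check of the figures quoted in the text, the Table 2 counts can be summed to confirm the "nearly 170 thousand" total and the "about one third" share for election day. The snippet below is only an arithmetic sketch; the variable names are chosen here and do not come from the paper.

```python
# Tweets mentioning at least one party, keyed by days before the election
# (values copied from Table 2; day 0 is election day).
tweets_per_day = {
    0: 56580, 1: 24004, 2: 17224, 3: 8498, 4: 12011, 5: 10178,
    6: 9373, 7: 9062, 8: 10317, 9: 8048, 10: 4700,
}

total_tweets = sum(tweets_per_day.values())
election_day_share = 100 * tweets_per_day[0] / total_tweets

print(total_tweets)                  # 169995
print(round(election_day_share, 1))  # 33.3
```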
5.1 Twitter vs Polls Correlation

Figure 1. Twitter mentions and poll results for VVD, CDA and CU

Figure 1 displays the results for VVD, the party that won the elections, CDA, a middle
party, and ChristenUnie (CU), a small party. The figure exemplifies the fairly strong
correlation of the percentages of Twitter mentions and poll results during the whole
period.

5.2 Twitter vs Polls Outliers

Figure 2. Twitter mentions and poll results for PVV and GL

This trend is typical for all but one party, GroenLinks (GL), as shown in Figure 2. For
comparison, the GroenLinks estimates are compared against the predictions for the PVV.
The figure displays an unexpected difference between the Twitter mentions and poll
results for GroenLinks. This party is well known for its above-average use of and
presence on social media in their campaign [7].

As an aside, the figure also shows a relatively high peak in the Twitter mentions of the
PVV five days before the elections. This may be explained by the news that day that the
PVV had falsely declared money from the European Union, while their campaign was
outspokenly anti-Europe.

5.3 Twitter vs Polls Trend

Figure 3. Twitter mentions and poll results for PVDA and SP

Figure 3 shows how both in the Twitter mentions and in the poll results the PvdA, one of
the socialist parties and runner-up in the election results, was gaining in the last ten
days before the elections while the SP, another socialist party, was losing voters. The
SP started out popular, but in the debates the PvdA leader was doing well, while the SP
leader's debating was considered disappointing.

5.4 Twitter vs Polls vs Election
Table 3 shows for all parties the difference between the election results on 12
September, the mean result of all polls on the day before the elections, and the
relative percentage of tweets in which the party was mentioned (1) on election day, (2)
on the day before, (3) during all ten days and (4) during five days before the
elections. The fourth-to-last and second-to-last rows list the mean absolute error (MAE)
relative to the election results (2nd column) and to the polls of the pre-election day
(3rd column). The third-to-last and final rows show the correlation with the election
and poll results, together with its 95% confidence interval.

The MAE of the polls with the election results is smaller than the MAE of the tweet
mentions with the election results in all cases, meaning that polls are a better
predictor of the election results than raw counts of party names in tweets. The table
also shows that tweet mentions of a time span of several days (five or ten) before the
elections are closer to the election results than the tweet mentions on one specific day
(election day or the day before). Tweet mentions gathered during five days before the
elections are closer to the election results than all tweet mentions from ten days
before the election results. Finally, the correlation coefficient and the confidence
interval show the same trend as the MAE, and are very high in all cases: 0.93 or higher.

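
The summary rows of Table 3 can be recomputed from its columns. The sketch below does this for the polls and for the five-day tweet aggregate, using values copied from Table 3. The paper does not state how the 95% confidence intervals were obtained; the Fisher z-transformation used here is an assumption, and all function names are hypothetical.

```python
import math

# Column values copied from Table 3, in the order
# VVD, PVDA, PVV, SP, CDA, D66, CU, GL, SGP, PVDD, 50PLUS.
election = [26.8, 25.1, 10.2, 9.8, 8.6, 8.1, 3.2, 2.4, 2.1, 2.0, 1.9]
polls_11_sep = [23.7, 23.4, 11.6, 13.9, 8.3, 7.9, 3.7, 2.7, 1.7, 1.8, 1.7]
tweets_7_11_sep = [20.6, 22.2, 11.4, 10.3, 8.6, 8.5, 2.7, 8.8, 2.8, 3.2, 1.1]


def mean_absolute_error(predicted, actual):
    """Mean of the absolute per-party differences, in percentage points."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% interval for r via the Fisher z-transformation (assumed method)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


mae_polls = mean_absolute_error(polls_11_sep, election)
mae_tweets = mean_absolute_error(tweets_7_11_sep, election)
r_polls = pearson_r(polls_11_sep, election)
ci_polls = fisher_interval(r_polls, len(election))

print(round(mae_polls, 1))   # 1.1
print(round(mae_tweets, 1))  # 1.9
print(round(r_polls, 2), tuple(round(b, 2) for b in ci_polls))
```

Because the inputs are the rounded percentages printed in the table, the recomputed correlation and interval can differ slightly in the last digit from the values the authors report.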
Table 3. Comparison between election results, polls and tweets from different time slots in %

Party            Election     Polls        Tweet         Tweet         Tweet         Tweet
                 12 Sep       11 Sep       12 Sep        11 Sep        2-11 Sep      7-11 Sep
VVD              26.8         23.7         24.6          18.9          20.7          20.6
PVDA             25.1         23.4         18.5          21.7          20.2          22.2
PVV              10.2         11.6         13.6          11.5          10.7          11.4
SP                9.8         13.9          8.7           9.7          12.0          10.3
CDA               8.6          8.3          6.0           7.5           8.6           8.6
D66               8.1          7.9          9.8           9.7           9.0           8.5
CU                3.2          3.7          2.6           2.9           3.0           2.7
GL                2.4          2.7          7.0           8.9           8.6           8.8
SGP               2.1          1.7          3.2           4.4           2.9           2.8
PVDD              2.0          1.8          3.6           3.5           3.2           3.2
50PLUS            1.9          1.7          2.4           1.3           1.1           1.1

MAE elections                  1.1          2.4           2.4           2.2           1.9
Corr elections                 0.98         0.95          0.94          0.95          0.96
                               (0.93-1.0)   (0.82-0.99)   (0.78-0.98)   (0.83-0.99)   (0.84-0.99)
MAE poll          1.1                       2.4           2.3           2.0           1.7
Corr poll         0.98                      0.93          0.94          0.96          0.96
                 (0.93-1.0)                 (0.76-0.98)   (0.78-0.98)   (0.87-0.99)   (0.83-0.99)

6. DISCUSSION
The results of our comparative study on the 2012 Dutch parliament elections provide
case-based evidence that tweets are a good basis for predicting election results. Purely
on the basis of raw counts of party name mentions (with flexible pattern matching
rules), without further domain knowledge, a strong correlation with the poll results can
be observed (around 0.95). In a number of cases the difference between the Twitter
mentions and the polls is larger than 5%, but the difference between the various polls
is also almost 5% in a few cases. Although the polls more accurately predict the
election outcome, the correlation between tweet-based estimates and the outcome is
observed to be as high as 0.96, with a mean absolute error of only 1.9% (the polls
attain 1.1%), provided that the tweet counts are aggregated over a number of days.

As Gayo-Avello rightly points out in his paper [6], our kind of approach lacks
information that could improve the prediction of election outcomes or poll results based
on Twitter. First, who is tweeting? If the Twitter account is from a party member or
official the tweet could be filtered out as it may be used to steer social media
opinions or even statistics. However, it is hard to ascertain whether a Twitter account
is from a party member. Automatic profiling based on machine learning and text
classification may help in this respect. Second, is the tweet polar or neutral? A
Twitter user who will vote for a party is likely to compose positive tweets about that
party. Automatic sentiment analysis (perhaps trained on political opinions to capture
domain-specific sentiment markers) might be used to reweight counts.

Negation and hedging may be a third factor that could partially be determined
automatically and improve estimates. A tweet such as 'I will not vote for partyX' could
then be left out of the count for partyX. This is a very challenging task, though.
Morante and Daelemans [8] provide pointers on how this may be addressed.

Fourth, can we account for factors that cause an increase in the number of tweets of a
certain party? The detection of other events involving entities that also play a role in
the focus event (such as the PVV scandal mentioned in the discussion of Figure 2) may be
used to discount tweets about this event.

Finally, we observed that estimates based on counts aggregated over several days better
approximated the election results than the counts on a specific day; five days seem to
represent a reasonable aggregation window. A further study could be carried out to see
whether an optimal time window can be found for events similar to the single case
studied here.

We do not share Gayo-Avello's conclusion that elections cannot be predicted with
Twitter, but acknowledge that further research has to be carried out before we say Yes
we can! (predict elections with Twitter).

7. ACKNOWLEDGMENTS
We thank Ruut Brandsma from www.allepeilingen.com for providing the data from the
polling institutes.

8. REFERENCES
[1] Tumasjan, A., Sprenger, T., Sander, P., Welpe, I. (2010). Predicting Elections with
    Twitter: What 140 Characters Reveal about Political Sentiment. Proceedings of the
    fourth International AAAI Conference on Weblogs and Social Media, Washington D.C.,
    USA, 2010
[2] Jungherr, A., Jürgens, P., Schoen, H. (2012). Why the Pirate Party Won the German
    Election of 2009 or The Trouble With Predictions: A Response to Tumasjan, A.,
    Sprenger, T.O., Sander, P.G., Welpe, I.M. "Predicting Elections with Twitter: What
    140 Characters Reveal about Political Sentiment", Social Science Computer Review
[3] O'Connor, B., Balasubramanyan, E., Routledge, B., Smith, N. (2010). From Tweets to
    Polls: Linking Text Sentiment to Public Opinion Time Series. Proceedings of the
    fourth International AAAI Conference on Weblogs and Social Media, Washington D.C.,
    USA, 2010
[4] Marchetti-Bowick, M., Chambers, N. (2012). Learning for Microblogs with Distant
    Supervision: Political Forecasting with Twitter. Proceedings of the 13th Conference
    of the European Chapter of the Association for Computational Linguistics, Avignon,
    France, 2012
[5] Tjong Kim Sang, E. and Bos, J. (2012). Predicting the 2011 Dutch Senate Election
    Results with Twitter, Proceedings of SASN 2012, the EACL 2012 Workshop on Semantic
    Analysis in Social Networks, Avignon, France, 2012
[6] Gayo-Avello, D. (2012). No, You Cannot Predict Elections with Twitter, IEEE Internet
    Computing, vol. 16, no. 6, pp. 91-94, Nov.-Dec. 2012
[7] http://www.sax.nu/Portals/615/docs/Rapport_Saxion_Social_Media_Politiek_2012.pdf
[8] Morante, R., and Daelemans, W. (2009). A metalearning approach to processing the
    scope of negation. Proceedings of the Thirteenth Conference on Computational Natural
    Language Learning, pp. 21-29, 2009