=Paper=
{{Paper
|id=None
|storemode=property
|title=Relating Political Party Mentions on Twitter with Polls and Election Results
|pdfUrl=https://ceur-ws.org/Vol-986/paper_9.pdf
|volume=Vol-986
|dblpUrl=https://dblp.org/rec/conf/dir/SandersB13
}}
==Relating Political Party Mentions on Twitter with Polls and Election Results==
Eric Sanders, CLS/CLST, Radboud University Nijmegen, Erasmusplein 1, 6525 HT Nijmegen, +31 24 3616087, e.sanders@let.ru.nl
Antal van den Bosch, CLS, Radboud University Nijmegen, Erasmusplein 1, 6525 HT Nijmegen, +31 24 3611647, a.vandenbosch@let.ru.nl

ABSTRACT
In each of the last ten days preceding the 2012 parliamentary elections in the Netherlands, at least one election poll was published. Throughout the same period, close to 170 thousand Dutch microtext messages with references to political parties were posted on Twitter, the microblogging platform. In this study we investigate whether these tweets can serve as an addition to, or even an alternative to, the traditional polls as predictors of the election outcome. We show that counts of mentions of political party names are strongly correlated with the polls and the election results. Polls remain the more accurate predictor of the outcome, with a mean absolute error of 1.1% and a correlation of about 0.98 with the actual percentage of votes cast for all parties. The Twitter statistics show a mean absolute error of 1.9% when aggregated over a number of days, and display a high correlation with both the election results and the polls (in both cases, r≈0.95). We conclude that tweet mention counts form a good complementary basis for predicting election results.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms
Human Factors, Languages.

Keywords
Twitter, Elections, Polls.

1. INTRODUCTION
With a current average of about half a billion messages posted daily, Twitter hosts a massive amount of accessible messages, which in turn harbor vast amounts of information. Tweets are often related to personal affairs, but may also refer to popular events. One of the interesting uses of the information in tweets is to try to determine people's opinions about certain matters. Politics is an attractive subject for mining opinions from tweets. In terms of events, political elections typically evoke the posting of tweets containing political views.

A conventional way of assessing average opinions about politics during election periods is polling. The standard polling method is to ask a small but representative part of the population which party or person they are planning to vote for. On Twitter, people give this information without being prompted. If we could extract this information from tweets, it would be an interesting addition to (or even an alternative to) polls. The most challenging part is to obtain, from the tweets, a balanced representation of the people participating in the elections. In essence this is impossible. The legal voting age in the Netherlands is 18, and many Twitter users have not reached that age. Yet no trustworthy demographic information about individual users is available on Twitter to identify them. The sheer magnitude of data available on Twitter may partly compensate for this unrepresentativeness.

In this paper we compare the predictive potential of tweets and polls with respect to the outcome of the Dutch parliamentary elections of 12 September 2012. We compare the number of times a political party is mentioned in Dutch tweets to the polls and the election results, without further normalization. This was done for all eleven parties that won at least one seat in parliament. The next sections discuss related work, describe the data, explain the experiment, discuss the results, and draw conclusions.

2. RELATED WORK
Work that has focused on predicting election outcomes through social media mining offers a mixed bag of results. Tumasjan et al. [1] consider the six biggest parties in the German parliamentary election held on 27 September 2009. They show that the percentage of tweets mentioning a party between 13 August and 19 September 2009 correlates highly with that party's election result. Jungherr et al. [2] question, in a response paper, both this particular selection of parties and the period over which counts were gathered. They argue that the choices made by Tumasjan et al. rest on poorly grounded heuristics and produce overly optimistic results.

O'Connor et al. [3] compare the sentiment ratio of tweets containing 'obama' with presidential job approval polls in 2009 and presidential election polls in 2008. The ratio correlates well with the former but not with the latter. Marchetti-Bowick and Chambers [4] build on the work of O'Connor et al. and use distant supervision for both topic identification and sentiment analysis. Compared with Obama's job approval poll, their results show a better correlation than the earlier work.

Tjong Kim Sang and Bos [5] compare tweet mentions and the election results for the Dutch senate elections of 2011. Beyond raw counts of tweets, they test and compare the predictive power of four alternative counting methods, but they do not find large improvements with these methods.

The novelty of the work described in this paper is that it is based on a relatively large number of consecutive polls on each of the ten days before the elections.

Gayo-Avello [6] pinpoints a number of problems with predicting elections from tweets and offers some suggestions. Apart from the problems addressed in this paper, he points out two more. Only good results tend to be published, and analyzing afterwards is not predicting.

3. DATA
The Twitter data used in the experiments is taken from a substantial archive of Dutch tweets collected within the TwiNL project (ifarm.nl/erikt/twinl). The FAQ of the related search website twiqs.nl states that an estimated 40% of all Dutch tweets have been collected since December 16, 2010. The present study uses all tweets gathered from September 2 to September 12, 2012, a period for which between 2.0 and 2.4 million tweets per day have been archived.

On average, 0.7% of all daily tweets posted during the last ten days before the election mention at least one political party. Table 2 shows that these nearly 170 thousand tweets are not uniformly divided over the eleven days. About one third of them are posted on election day, and their volume generally increases as election day approaches.

The poll data is taken from the website Alle Politieke Peilingen (www.allepeilingen.com). This site has stored the poll results of the six most cited polling institutes in the Netherlands from 2000 onwards. These are: peil.nl, TNS NIPO, de politieke barometer, buzzpeil.nl, de Stemming and NOS Peilingwijzer. All these polls try to predict the result of the elections as if they were held on the day of the poll.

4. EXPERIMENT
For the eleven parties that won one or more seats in parliament, we counted how often each party name was mentioned in tweets. The counts cover the ten days before the elections and election day itself, 12 September 2012. This was done with a basic pattern match. We first investigated by which names the parties are mentioned in the tweets. Most parties are almost exclusively mentioned by their abbreviation and rarely by their full name, so most full names are ignored. For instance, the acronym VVD occurs over a thousand times more often than the full name 'Volkspartij voor Vrijheid en Democratie'.

Table 1. The regular expressions that were used to detect the party names in the tweets
Party    Regular expression pattern(s)
VVD      "vvd"
PVDA     "pvda", "partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+arbeid"
SP       "sp"
PVV      "pvv", "partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+vrijheid"
CDA      "cda"
D66      "d\'?66"
GL       "gl", "groen.?links"
CU       "cu", "christen.?unie"
SGP      "sgp"
PVDD     "pvdd", "partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+dieren"
50PLUS   "50[^\d]?(\+|plus)"
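The counting step can be illustrated with a minimal Python sketch. It is a hypothetical reconstruction, not the authors' implementation. It uses the Table 1 patterns, written with straight quotes and with `d\'?66` as `d'?66`, and matches them case-insensitively as Section 4 describes. Two further rules are assumptions of this sketch. First, a match may not be glued to another letter or digit; this is one way to accept '@', '#' and punctuation around a name while rejecting 'sp' inside 'sport'. Second, a party is counted at most once per tweet, while a tweet naming two parties counts for both. The helper names `count_mentions` and `mention_percentages` are hypothetical. The latter follows the percentage formula given in Section 5.

```python
import re
from collections import Counter

# Party patterns transcribed from Table 1 (straight quotes; the table's
# "d\'?66" is written here as d'?66). Applied case-insensitively.
PARTY_PATTERNS = {
    "VVD": [r"vvd"],
    "PVDA": [r"pvda", r"partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+arbeid"],
    "SP": [r"sp"],
    "PVV": [r"pvv", r"partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+vrijheid"],
    "CDA": [r"cda"],
    "D66": [r"d'?66"],
    "GL": [r"gl", r"groen.?links"],
    "CU": [r"cu", r"christen.?unie"],
    "SGP": [r"sgp"],
    "PVDD": [r"pvdd", r"partij\s+v(oor\s+|an\s+|.)?d(e|.)?\s+dieren"],
    "50PLUS": [r"50[^\d]?(\+|plus)"],
}

# ASSUMPTION: a name may not be glued to a letter or digit on either side,
# so '@PvdA', '#VVD' and 'SP,' match while 'sport' does not.
COMPILED = {
    party: re.compile(
        r"(?<![0-9a-z])(?:" + "|".join(alts) + r")(?![0-9a-z])", re.IGNORECASE
    )
    for party, alts in PARTY_PATTERNS.items()
}


def count_mentions(tweets):
    """Count, per party, the tweets that mention it (hypothetical helper).

    ASSUMPTION: a party counts at most once per tweet; a tweet naming two
    parties contributes to both parties' counts.
    """
    counts = Counter()
    for tweet in tweets:
        for party, pattern in COMPILED.items():
            if pattern.search(tweet):
                counts[party] += 1
    return counts


def mention_percentages(counts):
    """Perc_p = 100 * mentions_p / (sum of mentions over all parties)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {party: 100 * n / total for party, n in counts.items()}


tweets = [
    "Ik stem #VVD",
    "Debat @PvdA vs SP vanavond",
    "Partij van de Arbeid (PvdA) wint",
    "Spel en sport",
]
counts = count_mentions(tweets)
print(dict(counts))                 # {'VVD': 1, 'PVDA': 2, 'SP': 1}
print(mention_percentages(counts))  # {'VVD': 25.0, 'PVDA': 50.0, 'SP': 25.0}
```

In the example, the third tweet names the PvdA twice but adds only one to its count. The last tweet shows the boundary rule rejecting 'sp' inside ordinary words.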
However, two parties are often mentioned by their full name: GroenLinks and ChristenUnie. Their abbreviations can also have other meanings: GL is a typical English shorthand for 'good luck' and CU for 'see you'. A manual inspection, however, revealed that these abbreviations are rarely used in these meanings.

We needed to write several specific pattern-matching expressions. Three parties have 'van de' ('of the') or 'voor de' ('for the') in their full name. These words can be written in many ways, e.g. 'vd', 'v.d.', 'v/d', 'van de' and 'v d', all of which are covered by the search pattern that was used. Matching is case-insensitive, so 'SGP', 'sgp', 'Sgp', etc. are all recognised. No effort was made to find misspelled party names. The party names can be preceded by '@' (Twitter account names) or '#' (Twitter hashtags), and preceded or followed by punctuation. Table 1 lists the resulting regular expressions for the parties.

For each of the ten days before the election and for each party, we compared the percentage of tweets mentioning the party with the average of all polls that came out that day. This was done to investigate how closely the percentage of party mentions in tweets resembles the polls. Next, we compared the averaged poll results of the day before the election and the election results to each other. We also compared both to the tweet mentions of (1) election day, (2) the day before election day, (3) an aggregate of all tweets during the 10-day period before the elections, and (4) an aggregate over the 5-day period before the elections.

5. RESULTS
We first compare the daily percentages of Twitter mentions with the daily poll results during the ten days before the elections. Each of Figures 1, 2 and 3 shows a selection of two or three parties.

The daily percentage of Twitter mentions for a particular party p is computed as follows:

<math>\mathrm{Perc}_p = 100 \cdot \frac{\#\mathrm{mentions}_p}{\sum_{q} \#\mathrm{mentions}_q}</math>

where #mentions_p is the number of mentions of party p, and the sum in the denominator is the total number of mentions of all eleven parties. The counts thus represent mentions, not tweets: if two parties are mentioned in a tweet, the tweet is counted twice.

The percentages of poll results are computed from the predicted number of parliament seats, which is the statistic by which the polls are reported and stored. As there are 150 seats in the Dutch parliament, each seat stands for 0.67%. For each day, we use the mean percentage over all polling institutes that released a prediction that day; on some days only one institute published a poll. The predictions of the polling institutes differ slightly. The largest difference between two poll estimates for the same party on the same day is 4.7%. On the day before the elections, 11 September, all polling institutes published results.

Table 2. Number of tweets with at least one political party mentioned in the 10 days before the elections and on election day (0 days)
Days before election   0        1        2        3        4        5        6        7        8        9        10
Number of tweets       56,580   24,004   17,224   8,498    12,011   10,178   9,373    9,062    10,317   8,048    4,700

5.1 Twitter vs Polls Correlation
Figure 1. Twitter mentions and poll results for VVD, CDA and CU
Figure 1 displays the results for three parties: the VVD, which won the elections; the CDA, a mid-sized party; and ChristenUnie (CU), a small party. The figure exemplifies the fairly strong correlation between the percentages of Twitter mentions and the poll results during the whole period.

5.2 Twitter vs Polls Outliers
Figure 2. Twitter mentions and poll results for PVV and GL
This correlation holds for all parties but one, GroenLinks (GL). Figure 2 plots the GroenLinks estimates next to those for the PVV for comparison. The figure displays an unexpected difference between the Twitter mentions and the poll results for GroenLinks. This party is well known for its above-average use of and presence on social media in its campaign [7].

As an aside, the figure also shows a relatively high peak in the Twitter mentions of the PVV five days before the elections. This may be explained by the news that day that the PVV had falsely declared money from the European Union, while its campaign was outspokenly anti-Europe.

5.3 Twitter vs Polls Trend
Figure 3. Twitter mentions and poll results for PVDA and SP
Figure 3 shows the PvdA gaining in the last ten days before the elections, in both the Twitter mentions and the poll results. The PvdA is one of the socialist parties and became runner-up in the elections. Meanwhile the SP, another socialist party, was losing voters. The SP started out popular, but the PvdA leader did well in the debates, while the SP leader's debating was considered disappointing.

5.4 Twitter vs Polls vs Election
Table 3 lists, for all parties, the following percentages:
the election results on 12 September;
the mean result of all polls on the day before the elections;
the relative percentage of tweets in which the party was mentioned on (1) election day, (2) the day before, (3) all ten days and (4) the five days before the elections.
The fourth-to-last and second-to-last rows give the mean absolute error (MAE) of each column with respect to the election results (2nd column) and the polls of the pre-election day (3rd column), respectively. The third-to-last and last rows give the corresponding correlations, with 95% confidence intervals in parentheses.

In all cases, the MAE of the polls with respect to the election results is smaller than that of the tweet mentions. The polls are therefore a better predictor of the election results than raw counts of party names in tweets. The table also compares tweet mentions over several days (five or ten) before the elections with those of a single day (election day or the day before). The multi-day aggregates are closer to the election results. Tweet mentions gathered during the five days before the elections are closer to the election results than those gathered during all ten days. Finally, the correlation coefficients and their confidence intervals show the same trend as the MAE, and the coefficients are very high in all cases (0.93 or higher).

Table 3. Comparison between election results, polls and tweets from different time slots (in %)
Party           Election 12 Sep   Polls 11 Sep      Tweets 12 Sep     Tweets 11 Sep     Tweets 2-11 Sep   Tweets 7-11 Sep
VVD             26.8              23.7              24.6              18.9              20.7              20.6
PVDA            25.1              23.4              18.5              21.7              20.2              22.2
PVV             10.2              11.6              13.6              11.5              10.7              11.4
SP              9.8               13.9              8.7               9.7               12.0              10.3
CDA             8.6               8.3               6.0               7.5               8.6               8.6
D66             8.1               7.9               9.8               9.7               9.0               8.5
CU              3.2               3.7               2.6               2.9               3.0               2.7
GL              2.4               2.7               7.0               8.9               8.6               8.8
SGP             2.1               1.7               3.2               4.4               2.9               2.8
PVDD            2.0               1.8               3.6               3.5               3.2               3.2
50PLUS          1.9               1.7               2.4               1.3               1.1               1.1
MAE elections   -                 1.1               2.4               2.4               2.2               1.9
Corr elections  -                 0.98 (0.93-1.0)   0.95 (0.82-0.99)  0.94 (0.78-0.98)  0.95 (0.83-0.99)  0.96 (0.84-0.99)
MAE poll        1.1               -                 2.4               2.3               2.0               1.7
Corr poll       0.98 (0.93-1.0)   -                 0.93 (0.76-0.98)  0.94 (0.78-0.98)  0.96 (0.87-0.99)  0.96 (0.83-0.99)

6. DISCUSSION
Our comparative study of the 2012 Dutch parliamentary elections provides case-based evidence that tweets are a good basis for predicting election results. Raw counts of party name mentions, obtained with flexible pattern-matching rules and no further domain knowledge, correlate strongly with the poll results (around 0.95). In a number of cases the difference between the Twitter mentions and the polls exceeds 5%. However, the difference between the various polls also reaches almost 5% in a few cases. The polls predict the election outcome more accurately. Still, the correlation between the tweet-based estimates and the outcome reaches 0.96, with a mean absolute error of only 1.9% against 1.1% for the polls. This holds provided that the tweet counts are aggregated over a number of days.

As Gayo-Avello rightly points out [6], our kind of approach lacks information that could improve Twitter-based predictions of election outcomes or poll results.

First, who is tweeting? Tweets from accounts of party members or officials could be filtered out, as they may be used to steer social media opinions or even statistics. However, it is hard to ascertain whether a Twitter account belongs to a party member. Automatic profiling based on machine learning and text classification may help in this respect.

Second, is the tweet polar or neutral? A Twitter user who will vote for a party is likely to compose positive tweets about that party. Automatic sentiment analysis could be used to reweight counts, perhaps trained on political opinions to capture domain-specific sentiment markers.

Negation and hedging may be a third factor that could partially be determined automatically and improve estimates. A tweet such as 'I will not vote for partyX' could then be left out of the count for partyX. This is a very challenging task, though; Morante and Daelemans [8] provide pointers on how it may be addressed.

Fourth, can we account for factors that cause an increase in the number of tweets about a certain party? Other events may involve entities that also play a role in the focus event, such as the PVV scandal discussed with Figure 2. Detecting such events may allow tweets about them to be discounted.

Finally, we observed that counts aggregated over several days approximated the election results better than counts from a single day; five days seems to be a reasonable aggregation window. A further study could investigate whether an optimal time window exists for events similar to the single case studied here.

We do not share Gayo-Avello's conclusion that elections cannot be predicted with Twitter, but acknowledge that further research is needed before we can say Yes we can! (predict elections with Twitter).

7. ACKNOWLEDGMENTS
We thank Ruut Brandsma from www.allepeilingen.com for providing the data from the polling institutes.

8. REFERENCES
[1] Tumasjan, A., Sprenger, T., Sander, P., Welpe, I. (2010). Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment. Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, Washington D.C., USA, 2010.
[2] Jungherr, A., Jürgens, P., Schoen, H. (2012). Why the Pirate Party Won the German Election of 2009 or The Trouble With Predictions: A Response to Tumasjan, A., Sprenger, T.O., Sander, P.G., Welpe, I.M. "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment". Social Science Computer Review.
[3] O'Connor, B., Balasubramanyan, R., Routledge, B., Smith, N. (2010). From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, Washington D.C., USA, 2010.
[4] Marchetti-Bowick, M., Chambers, N. (2012). Learning for Microblogs with Distant Supervision: Political Forecasting with Twitter. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, 2012.
[5] Tjong Kim Sang, E., Bos, J. (2012). Predicting the 2011 Dutch Senate Election Results with Twitter. Proceedings of SASN 2012, the EACL 2012 Workshop on Semantic Analysis in Social Networks, Avignon, France, 2012.
[6] Gayo-Avello, D. (2012). No, You Cannot Predict Elections with Twitter. IEEE Internet Computing, vol. 16, no. 6, pp. 91-94, Nov.-Dec. 2012.
[7] http://www.sax.nu/Portals/615/docs/Rapport_Saxion_Social_Media_Politiek_2012.pdf
[8] Morante, R., Daelemans, W. (2009). A metalearning approach to processing the scope of negation. Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pp. 21-29, 2009.
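The evaluation measures of Section 5.4 can be made concrete with a short Python sketch over two columns of Table 3. The paper does not state how its 95% confidence intervals were obtained. This sketch assumes the common Fisher z-transformation with n = 11 parties, which appears consistent with the intervals reported for the polls and for the 7-11 September tweets. The function names `mae`, `pearson` and `fisher_ci` are hypothetical helpers, not part of the authors' code.

```python
import math

# Table 3 values in %, party order: VVD, PVDA, PVV, SP, CDA, D66, CU, GL,
# SGP, PVDD, 50PLUS.
ELECTION_12SEP = [26.8, 25.1, 10.2, 9.8, 8.6, 8.1, 3.2, 2.4, 2.1, 2.0, 1.9]
POLLS_11SEP = [23.7, 23.4, 11.6, 13.9, 8.3, 7.9, 3.7, 2.7, 1.7, 1.8, 1.7]
TWEETS_7_11SEP = [20.6, 22.2, 11.4, 10.3, 8.6, 8.5, 2.7, 8.8, 2.8, 3.2, 1.1]


def mae(estimates, actual):
    """Mean absolute error, in percentage points."""
    return sum(abs(e - a) for e, a in zip(estimates, actual)) / len(actual)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.96):
    """ASSUMPTION: 95% CI via the Fisher z-transformation (method not stated in the paper)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


for label, estimate in [("polls 11 Sep", POLLS_11SEP), ("tweets 7-11 Sep", TWEETS_7_11SEP)]:
    r = pearson(estimate, ELECTION_12SEP)
    lo, hi = fisher_ci(r, len(ELECTION_12SEP))
    print(f"{label}: MAE={mae(estimate, ELECTION_12SEP):.1f} r={r:.2f} 95% CI=({lo:.2f}-{hi:.2f})")
```

For these two columns the sketch gives MAE 1.1 and 1.9, r of 0.98 and 0.96, and intervals of about 0.93-1.00 and 0.84-0.99. These match the "MAE elections" and "Corr elections" entries of Table 3.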