=Paper=
{{Paper
|id=Vol-2481/paper21
|storemode=property
|title=Is “manovra” Really “del popolo”? Linguistic Insights into Twitter Reactions to the Annual Italian Budget Law
|pdfUrl=https://ceur-ws.org/Vol-2481/paper21.pdf
|volume=Vol-2481
|authors=Claudia Roberta Combei
|dblpUrl=https://dblp.org/rec/conf/clic-it/Combei19
}}
==Is “manovra” Really “del popolo”? Linguistic Insights into Twitter Reactions to the Annual Italian Budget Law==
Is “manovra” really “del popolo”? Linguistic Insights into Twitter Re- actions to the Annual Italian Budget Law1 Claudia Roberta Combei University of Bologna claudia.combei2@unibo.it The impact of a social media post may be huge, Abstract and unlike other prior forms of communication, it can easily cross borders in just a few seconds. In English. Relying on linguistic cues ob- fact, social media make things happen faster than tained by means of structural topic model- ever before. For instance, Facebook and Twitter ling as well as descriptive lexical anal- were crucial in allowing the Arab uprisings or the yses, this study contributes to the general Romanian anti-corruption protests to happen understanding of the Twitter users’ re- more efficiently and on a larger scale. sponse to the annual Italian budget law ap- proved at the end of December 2018. 2 Tweets and politics Some topics contained in the dataset of Besides their essential role in information dissem- tweets are procedural or generic, but be- ination, networking, and people mobilization, so- sides those, it often emerges that Twitter cial media are also important indicators and pre- users expressed their concern with respect dictors of their users’ opinions, sentiments and at- to the provisions of this law. Supportive titudes. In fact, various studies have explored peo- attitudes seem to be less frequent. This pa- ple’s reactions towards social, economic, and po- per also advocates that findings from in- litical issues, by analysing social media posts (e.g. ductive studies on Twitter data should be Burnap et al., 2014; Gaspar et al., 2016; Nesi et interpreted with caution, since the nature al., 2018), especially tweets, since they are easily of tweets might not be adequate for draw- retrievable by means of APIs. ing far-reaching generalisations. With over 6,000 tweets posted every second, corresponding to roughly 350,000 per minute, 500 1 Introduction million per day, and around 200 billion per year, In the last decade, Internet has revolutionized hu- Twitter has become one of the main tools of com- man communication and interaction. And among munication worldwide (Internet Live Stats, 2019). all forms of digitally-mediated communication, The number of tweets written daily seems to be social media stand out as one of the most effec- correlated to things happening in the real world, tive. As Boulianne (2017) points out, the effects and, as a matter of fact, it was shown that im- of social media depend on their nature of use (e.g. portant events generate high number of tweets (cf. source of information; one-to-one/one-to- Hughes and Palen, 2009), something that is gen- many/many-to-many communication; networking erally reflected also on the Twitter “trends”. and relationship-building; expression of opinions; Based on Hootsuite’s (2019) report, each month, etc.). in Italy there are almost 2.5 million active users2 Nowadays, potentially everyone with a com- of Twitter, a datum that confirms the popularity of puter or a mobile device having access to the in- this network among various layers of Italian audi- ternet can write and share contents which may be ence. viewed and debated immediately by other people. 1 Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 2 Users that write or share at least one tweet every month are defined “active”. This means that Twitter may represent an easily By means of structural topic modelling (cf. exploitable opportunity for politicians in their at- Roberts et al., 2014) and descriptive analyses (i.e. tempt to reinforce communication with potential terminology extraction of multi-keywords and voters in what might be defined as a permanent word sketches), we are interested in grasping the digitally-mediated electoral campaign. Addition- Twitter users’ attitudes towards the budget law in ally, it has been suggested that Twitter could be a significant moment for the first populist Govern- used to model and predict public opinion and be- ment in the eurozone, namely the coalition formed haviour regarding political events, such as elec- by Lega and M5S. toral campaigns (e.g. Coletto et al., 2015; This topic is worth studying since the two par- Kalampokis et al., 2017). In fact, Ott (2017: 59) ties displayed differences in economic, fiscal, in- claims that Twitter may be the ideal tool for the frastructural, and social policies both in the elec- afore-mentioned purposes since, it “privileges toral campaign for the 2018 general elections as discourse that is simple, impulsive, and uncivil.” well as during the first months of government. For While indeed tweets have been widely used to instance, Lega supported the flat taxation on in- analyse public opinion and political discussions in comes, while M5S the citizen's basic income all its forms, several methodological considera- (“reddito di cittadinanza” in Italian). However, tions are dutiful. First of all, Twitter users do not these measures, although slightly modified, as represent an optimal sample for public opinion or well as the amendment to the 2011 pensions re- voting population, especially due to their higher form (“quota 100” in Italian) were included in the than average level of education and political so- coalition agreement and subsequently in the draft phistication, as well as a generally younger age for the annual budget law. The bill also contained (cf. Gayo-Avello, 2013; Barberá et al., 2015). As various other economic and fiscal provisions (e.g. a matter of fact, we believe it is more accurate to taxes on digital services; new VAT rates; reducing define Twitter users as a potential share of elec- military expenses and the Italian contribution to torate. Secondly, the language of tweets is charac- United Nations; new labour measures; environ- terised by succinctness and sometimes informal- mental incentives; etc.)3. ity, colloquialism, irony, and susceptibility to ru- We believe that the textual material contained mour, all of which are aspects that render the re- in tweets may be promising in providing hints on sults of large-scale analyses hard to interpret and how Twitter users – a fraction of the Italian voters generalise. – reacted to the provisions of the budget law. Lin- guistic insights into tweets might be able to guide 3 Aims and motivations us in understanding whether the so-called “manovra del popolo” was perceived by Twitter Acknowledging all the limitations mentioned user as representing indeed the people’s interest. above, this inductive exploratory study aims to contribute to the growing body of literature exam- 4 Data ining Twitter and its increasingly prominent role in online communication by studying its applica- Although in the Western world there are three tion in the context of political discourse. In partic- mainstream social media networks (i.e. Facebook, ular, the linguistic approach presented here is Instagram, and Twitter), in this paper we analyse providing insights into tweets regarding the dis- Twitter posts, primarily as a consequence of data cussion and the approval of the annual Italian availability. Indeed, unlike other tools for social budget law (in Italian “legge finanziaria” and/or media, Twitter APIs for R (R Core Team, 2018) “legge di bilancio”). This law was also often la- allow scholars to collect large quantities of tweets belled as “the manoeuvre” (in Italian “la and their related metadata in a rather effortless manovra”) and “the people’s manoeuvre” (in Ital- way. ian “la manovra del popolo”) by its proponents – Using the rtweet package (Kearney, 2019) for in particular Movimento 5 Stelle (abbreviated R and Twitter’s developer account, we collected a M5S) –, mainly due to some of its populist provi- dataset of 167,259 Twitter posts, for a total of 6.5 sions (e.g. the citizen's basic income and pension). million tokens, consisting in tweets and retweets 3 The full text of the annual Italian budget law (Legge 12-2018 - Suppl. Ordinario n. 62) and it is available 30 dicembre 2018, n. 145 – Bilancio di previsione dello online at this webpage: https://www.gazzettauffi- Stato per l'anno finanziario 2019 e bilancio plurien- ciale.it/atto/stampa/serie_generale/originario (ac- nale per il triennio 2019-2021) was published on the cessed on the 1st of June 2019). Official Gazette of the Italian Republic (GU n.302 31- related to the Italian budget law. Moreover, we ex- 4.1 Pre-processing tracted 88 metadata describing the tweet (i.e. char- Since the tweets and their metadata would have acter length, device used, number of retweets, been used for lexical analyses and structural topic etc.) and the user (i.e. username, location, gender, modelling5, we performed several pre-processing etc.). In order to capture the most important steps: defining a “stop words” list for Italian con- phases of the Twitter discussion about the annual sisting of roughly 1,000 lexically empty or unin- budget law and considering the one-week rate formative words (i.e. prepositions, conjunctions, limit for tweets extraction imposed by the Stand- auxiliary verbs, etc.); uniformizing, normalising ard Search API4, the data were collected weekly and cleaning the texts with various corpus pro- from the 27th of November 2018 through the 8th of cessing functions available on the R packages January 2019, for a total of 43 consecutive days. quanteda (Benoit et al., 2018), tm (Feinerer, The hashtags used as keywords in the queries rep- Hornik, and Meyer, 2008), and qdapRegex resented all the names given to the budget bill by (Rinker, 2017). Hashtags at the beginning and in- Italian political actors, the press, and the public side the tweet sentences were kept and decom- opinion: “#leggedibilancio”, “#leggefinanziaria”, posed into words (i.e. from “#trasportipubblici” “#manovra”, “#manovradibilancio”, “#manov- to “trasporti pubblici”), while those after the final raeconomica”, “#manovradelpopolo”, and point were removed, since most of the times they “#manovrafinanziaria”. This guaranteed a large represented one of the keywords used for extract- coverage of Twitter users and tweet typologies. ing tweets. Numbers, punctuation, sequences Some of the afore-mentioned hashtags (e.g. made up of a single character, and excessive white “#manovra”, “#manovradelpopolo”) were also spaces were removed as well. In order to further trending at the end of December. use temporal metadata as a covariate for the topi- To avoid duplicates, we discarded all retweets cal prevalence, the “created_at” metadatum was and all posts that contained quotes of other tweets. divided it into date and hour. The removal process was obtained by filtering the dataset, thus selecting only tweets whose values 5 Analyses and results for “is_retweet” and “is_quote” corresponded to “FALSE”. Duplicates other than retweets and As a result of the ever-growing interest and avail- quotes were removed with R’s base functions du- ability of text data – often unstructured –, various plicated – which identified duplicated tweets – statistical and machine-assisted approaches for and unique – which extracted unique tweets. Since the analysis of textual material have been pro- the aim of this study is to uncover the reactions of posed. In this paper we are employing the Struc- the Italian voters active on Twitter, we removed tural Topic Model (STM) – a generative model of the tweets written by political actors. To do so, we word counts – (cf. Roberts et al., 2014) in R to defined a list containing the Twitter usernames of discover topics from tweets on the annual Italian the members of the Italian Parliament, as well as budget bill and to estimate their relationship to those of the official national and local party pro- temporal metadata. files; this list was used to automatically filter and Similarly to Latent Dirichlet Allocation (cf. remove tweets published by the unwanted pro- Blei, Ng, and Jordan, 2003) and Correlated Topic files. We decided to keep tweets from news agen- Model (cf. Blei and Lafferty, 2007), in the STM cies, online newspapers, and television channels, approach, a topic represents a mixture over words since they could represent vectors of information where each word has a probability of pertaining to exchange regarding the topic analysed in this a topic, whilst a document is a mixture over top- study. The final dataset contained 20,891 tweets. ics, therefore a specific document can consist of Tokens 701,986 various topics. The sum of the topic proportions across topics for a specific document as well as Words 414,803 the sum of word probabilities for a given topic Types 75,485 both qual to 1. The main innovation of STM is the Lemmas 31,947 possibility to model topical prevalence and topical Table 1: Dataset statistics. content6 as a function of metadata. Here we are 4 5 A description of the Standard Search API for Twitter Considering the scope of this paper and the analyses is available at this webpage: https://developer.twit- proposed, emoticons and emojis were left out. 6 ter.com/en/docs/tweets/search/api-reference/get- The topical prevalence shows the frequency with search-tweets.html (accessed on the 1st of June 2019). which a specific topic is discussed, while the topical using the date covariate to explain topical preva- Government and the oppositions on the provisions lence over time. of the law. After having calculated the estimated effects of 5.1 Topics the temporal covariate on topical prevalence, a After having employed the STM’s searchK func- plot displaying this variation was created. Figure tion to perform several tests, such as held-out like- 2 in Appendix shows how the afore-mentioned lihood and residual analysis, the ideal number of topics varied over the 43 days considered. Topics topics seemed to be between 10 and 14. Addition- are ordered as a function of their expected propor- ally, STM gave the possibility to set the type of tions. initialization, so here the spectral one was chosen, Firstly, there emerged that the variation was not since previous studies had proven its stability and particularly strong, except for some topics. For in- consistence (cf. Roberts, Stewart, and Tingley, stance, Topic 9 had a peak at the end of Decem- 2016). All results presented in this paragraph are ber/the beginning of January, suggesting that based on a K of 10. The date of the tweet was used Twitter users might have written tweets of con- as a prevalence covariate; as a word profile we cern soon after the approval of the annual Italian opted for the highest probability. We did not use budget law. On the other hand, Topic 6, which the stemming function on STM since it did not contained mostly tweets of support towards the perform well on Italian. measures of the budget bill seemed to be prevalent Figure 1 in Appendix shows the topics related primarily at the end of November and in mid-De- to the annual Italian budget law as they emerged cember. The procedural topic was generally prev- from the analysis of tweets. Each topic was further alent at the end of December, a timeframe corre- classified into one category (i.e. EU & Confi- sponding to the vote and approval of the law. The dence, Main Measures, Criticism & Concern, two topics summarising the negotiations with the Government vs. Opposition, Procedures – Ge- EU, the confidence, and the possible infringement neric, Support). This classification was based on procedure were pervasive during the entire period the correlations obtained from a hierarchical clus- considered, with some peaks in early- and mid- tering representation performed with the plot December. Topic 4 that regarded the disagree- function of the stmCorrViz package (Coppola et ment between the Government and the opposition al., 2016), on the review of the most characteris- was constant over time, and so were the topics de- ing words, and on the examination of the most ex- lineating the main measures of the law. emplar documents, namely the tweets that had the 5.2 Descriptive lexical analyses highest proportion of words associated with the topic. We were also interested in performing descriptive Although we do not claim to model public lexical analyses on tweets. First of all, with the opinion from tweets, interestingly, the topics terminology extraction tool on Sketch Engine managed to echo various issues regarding the (Kilgarriff et al., 2014) we obtained multi-key- budget law. Judging by the expected topic propor- words – able to convey more insights than single tions, one could order the most prevalent topics as words on the issues examined – that appear more follows: Topics 9, 8, and 3 (sum of topic propor- frequently in our dataset than in the reference cor- tions: 0.29) reflect disapproval and doubts to- pus (i.e. Italian Web 2016 – itTenTen16, cf. Jaku- wards the provisions of the budget law; Topics 1 bíček et al., 2013, for TenTen corpora). If we ex- and 7 (sum of topic proportions: 0.22) describe the clude the hashtags used as keywords for tweets difficult negotiation with the European Union extraction, these are the 30 most representative (EU) and the threat of an infringement procedure; syntagmas in our dataset: Topics 10 and 2 (sum of topic proportions: 0.19) Translation into Syntagma depict the main measures contained in the budget English bill; Topic 6 (topic proportion: 0.13) illustrates the reddito di cittadi- the citizen’s basic support to the budget bill and to the Government; nanza income Topic 5 (topic proportion: 0.11) refers to the pro- procedura di infra- infringement pro- cedures regarding the discussion, the vote, and the zione cedure approval of the budget law; and Topic 4 (topic clausole di salva- proportion: 0.06) reveals the conflict between the safeguard clauses guardia content represents the words used to discuss about that topic (cf. Roberts et al., 2014: 1068). voto di fiducia confidence vote Generally, three different scenarios are distin- blocco assunzioni hiring freeze guishable. professioni sanita- health professions First of all, there were several neutral verbs, rie senza titolo without a degree nouns, and modifiers associated to the budget law, flat tax flat tax most of which regarding its procedural aspects. commissione bilan- The most frequent (i.e. frequency ≥ 10.81 per mil- budget committee lion) are listed below: cio gilet azzurri blue vests Translation into Word/Syntagma taglio pensioni pension cuts English scatoletta di tonno tuna can scrivere write previous govern- cambiare change governi precedenti modificare modify ments pensioni minime minimum pensions discutere discuss scatola chiusa black box approvare approve nuove tasse new taxes contenere contain promesse elettorali campaign promises prevedere consist fasce deboli vulnerable citizens varare launch deficit strutturale structural deficit votare vote technical arrange- passare pass accordo tecnico riscrivere rewrite ment braccio di ferro trial of strength promulgare promulgate appalti senza gara no-bid contracts gialloverde yellow-green assurdità totale total nonsense economica economic terrorismo media- finanziaria financial media terrorism populista populist tico auto inquinanti polluting cars discussione discussion più tasse more taxes commissione commission sovereignist gov- bilancio budget governo sovranista Table 3: Neutral associations. ernment manovra contro il manoeuvre against Next, some positive evaluations of the budget popolo the people law emerged. The most frequent (i.e. frequency ≥ false promesse false promises 10.81 per million) are listed below: IVA sui tartufi VAT for truffles Translation into Word/Syntagma popolo italiano Italian people English Table 2: The most representative syntagmas in favorire (l’innova- favour (innova- the dataset. zione) tion) It is clear that various multi-word expressions grande big referred to procedural aspects, such as those re- buona good flecting the vote and the approval of the budget bella beautiful law (e.g. “confidence vote”), while others were significativa significant used to list its measures, especially fiscal and eco- del popolo of the people nomic policies (e.g. “the citizen’s basic income”, del cambiamento of the change “flat tax”, etc.). Nevertheless, various syntagmas per i cittadini for the citizens seemed to express doubts with respect to the pro- per la crescita for the growth visions of this law. In fact, often, the words chosen Table 4: Positive associations. by many Twitter users to express their criticism Nonetheless, several word associations seemed were rather strong (e.g. “total nonsense”, “black to suggest negative reactions to the budget law. box”, “sovereignist government”, etc.). The most frequent (i.e. frequency ≥ 10.81 per mil- These concerns and rather negative reactions to lion) are shown below: the budget bill were reflected also in the word Translation into sketches (i.e. visual representations of colloca- Word/Syntagma English tions and word combinations obtained on Sketch recessiva recessive Engine) for the words “manovra” and “legge”. piena di errori full of errors dannosa dangerous for drawing steady generalizations, even if the cattiva bad prospects they offer for content and discourse iniqua unfair analysis are indeed significant. Further research scellerata wicked on this topic might include the investigation of sbagliata wrong Twitter user’s reactions by means of sentiment snaturata wretched analysis. taroccata false vuota empty References assurda absurd Pablo Barberá, John T. Jost, Jonathan Nagler, Joshua folle deranged A. Tucker, and Richard Bonneau. 2015. Tweeting truffa fraud from Left to Right. Psychological Science, contro il popolo against the people 26(10):1531–1542. del popolino of the masses Kenneth Benoit, Kohei Watanabe, Haiyan Wang, Paul del cappio of the noose Nulty, Adam Obeng, Stefan Müller, and Akitaka da lacrime tearful Matsuo. 2018. quanteda: An R package for the scontro dispute quantitative analysis of textual data. Journal of Open Source Software, 3(30):774. protesta protest vergogna shame David M. Blei and John D. Lafferty. 2007. A correlated bocciatura failure topic model of Science. The Annals of Applied Sta- tistics, 1(1):17–35. della povertà of the poverty dell’assistenzi- David M. Blei, Andrew Y. Ng, and Michael I. Jordan. of welfarism 2003. Latent Dirichlet Allocation. Journal of Ma- alismo buio dark chine Learning Research, 3(3):993–1022. diminuire diminish Shelley Boulianne. 2017. Revolution in the making? tagliare cut Social media effects across the globe. Information, Table 5: Criticism associations. Communication & Society, 22(1):39–54. Finally, using the tm’s findAssocs function, we Pete Burnap, Matthew L. Williams, Luke Sloan, Omer calculated the associations of the lemma Rana, William Housley, Adam Edwards, Vincent “manovra” in the term-document matrix; some of Knight, Rob Procter, and Alex Voss. 2014. Tweet- the afore-mentioned criticism words (e.g. “ab- ing the terror: modelling the social media reaction to surd”, “recessive”, “bad”) had a correlation higher the Woolwich terrorist attack. Social Network Anal- than 0.03, suggesting a rather frequent co-occur- ysis and Mining, 4(206):1-14. rence. Mauro Coletto, Claudio Lucchese, Salvatore Orlando, and Raffaele Perego. 2015. Italian Information Re- 6 Conclusions trieval Workshop - IIR 2015. In Proceedings of the 6th Italian Information Retrieval Workshop, Ca- This paper explored the Twitter users’ reactions to gliari. CEUR Workshop Proceedings. the annual Italian budget bill. STM outputs and descriptive lexical analyses showed that tweets Antonio Coppola, Margaret E. Roberts, Brandon M. Stewart, and Dustin Tingley. 2016. stmCorrViz: A concerned various aspects associated to the object Tool for Structural Topic Model Visualizations. R of this study. Apart from talking about procedural package version 1.3. Retrieved from https://cran.r- and generic issues, users expressed their doubts project.org/web/packages/stmCorrViz/index.html/ and disapproval with respect to the measures of (accessed on the 1st of June 2019). the budget law. Generally, tweets supporting this Ingo Feinerer, Kurt Hornik, and David Meyer. 2008. law were less frequent. The findings of this study, Text Mining Infrastructure in R. Journal of Statisti- although preliminary, might be seen as indicators cal Software, 25(5):1-54. of what subsequently turned out to be a failure for the first Conte government. Still, as reiterated Rui Gaspar, Cláudia Pedro, Panos Panagiotopoulos, throughout the paper, the results might not reflect and Beate Seibt. 2016. Beyond positive or negative: Qualitative sentiment analysis of social media reac- the real attitudes of the Italian voting population, tions to unexpected stressful events. Computers in since Twitter users tend to be younger and to have Human Behavior, 56:179–191. an above the average level of education and polit- ical sophistication (cf. Barberá et al., 2015). Daniel Gayo-Avello. 2013. A Meta-Analysis of State- Moreover, tweets, by nature, might not be suitable of-the-Art Electoral Prediction from Twitter Data. Social Science Computer Review, 31(6):649– Margaret E. Roberts, Brandon M. Stewart, Dustin Tin- 679. gley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David Hootsuite Media Inc. 2019. Digital in 2019. Retrieved G. Rand. 2014. Structural Topic Models for Open- from https://hootsuite.com/it/risorse/digital-in- Ended Survey Responses. American Journal of Po- 2019-italy (accessed on the 1st of June 2019). litical Science, 58(4):1064–1082. Internet Live Stats. 2019. Twitter Usage Statistics. Re- Margaret E. Roberts, Brandon M. Stewart, and Dustin trieved from https://www.inter- Tingley. 2016. Navigating the Local Modes of Big netlivestats.com/twitter-statistics/ (accessed on the Data: The Case of Topic Models. In R. Michael Al- 1st of June 2019). varez (editor), Computational Social Science: Dis- Miloš Jakubíček, Adam Kilgarriff, Vojtěch Kovář, covery and Prediction (Analytical Methods for So- Pavel Rychly, and Vít Suchomel. 2013. The TenTen cial Research), 51–97. Cambridge University Corpus Family. In 7th International Corpus Lin- Press., Cambridge. guistics Conference CL 2013. Lancaster, 125-127. Evangelos Kalampokis, Areti Karamanou, Efthimios Tambouris, and Konstantinos Tarabanis. 2017. On Predicting Election Results using Twitter and Linked Open Data: The Case of the UK 2010 Elec- tion. Journal of Universal Computer Science, 23(3):280–303. Adam Kilgarriff, Vít Baisa, Jan Bušta, Miloš Jaku- bíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. 2014. The Sketch Engine: ten years on. Lexicography, 1(1):7–36. Michael W. Kearney. 2019. rtweet: Collecting Twitter Data. R package version 0.6.9 Retrieved from https://cran.r-project.org/package=rtweet (ac- cessed on the 1st of June 2019). Amanda Lee Hughes and Leysia Palen. 2009. Twitter adoption and use in mass convergence and emer- gency events. International Journal of Emergency Management, 6(3/4):248–260. Paolo Nesi, Gianni Pantaleo, Irene Paoli, and Imad Zaza. 2018. Assessing the reTweet proneness of tweets: predictive models for retweeting. Multime- dia Tools and Applications, 77(20):26371–26396. Brian L. Ott. 2017. The age of Twitter: Donald J. Trump and the politics of debasement. Critical Studies in Media Communication, 34(1):59–68. R Core Team. 2018. R: A language and environment for statistical computing. R Foundation for Statisti- cal Computing, Vienna, Austria. Retrieved from http://www.R-project.org/ (accessed on the 1st of June 2019). Tyler W. Rinker. 2017. qdapRegex: Regular Expres- sion Removal, Extraction, and Replacement Tools. R package version 0.7.2. University at Buffalo. Buf- falo, New York. Retrieved from http://github.com/trinker/qdapRegex/ (accessed on the 1st of June 2019). Appendix Figure 1: Topics and word probabilities. Figure 2: Variation of topic proportions over time.