=Paper= {{Paper |id=Vol-2481/paper71 |storemode=property |title=Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain |pdfUrl=https://ceur-ws.org/Vol-2481/paper71.pdf |volume=Vol-2481 |authors=Sara Tonelli,Rachele Sprugnoli,Giovanni Moretti |dblpUrl=https://dblp.org/rec/conf/clic-it/TonelliSM19 }} ==Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain== https://ceur-ws.org/Vol-2481/paper71.pdf

Prendo la Parola in Questo Consesso Mondiale:
A Multi-Genre 20th Century Corpus in the Political Domain
Sara Tonelli†, Rachele Sprugnoli‡, Giovanni Moretti†‡
† Fondazione Bruno Kessler, Trento, Italy
‡ Università Cattolica, Milano, Italy
{satonelli,moretti}@fbk.eu
rachele.sprugnoli@unicatt.it

Abstract

English. In this paper we present a multi-genre corpus spanning 50 years of European history. It contains a comprehensive collection of Alcide De Gasperi’s public documents, 2,762 in total, written or transcribed between 1901 and 1954. The corpus comprises different types of texts, including newspaper articles, propaganda documents, official letters and parliamentary speeches. The corpus is freely available and includes several annotation layers, i.e. key-concepts, lemmas, PoS tags, person names and geo-referenced places, representing a high-quality ‘silver’ annotation. We believe that this resource can foster research in historical corpus analysis, stylometry and computational social science, among others.¹

1 Introduction

In recent years, political scientists and history scholars have started to exploit the availability of digital material to enrich their research, taking advantage of freely accessible online archives and easy-to-use tools for text processing and data extraction. Active communities have been created around topics such as the study of Parliamentary corpora (see the ParlaCLARIN² and ParlaFormat workshops³), the analysis of political manifestos⁴ and of Presidential speeches.⁵ Despite the importance of this research field, copyright and availability in machine-readable format still represent major issues, especially in those countries where no or only limited public initiatives have been undertaken to support the distribution of this kind of document. For example, while in the US the Federal Digital System grants access to public Presidential documents through APIs and bulk-data repositories, in Italy an effort along this line has started only recently with the support of the Archive of the President of the Republic⁶, but has not delivered substantial results so far.

This work represents a first attempt to deal with this lack of data, since we present and make available a large corpus of Italian public documents in the political domain. In particular, we release a comprehensive collection of Alcide De Gasperi’s public documents issued between 1901 and 1954, which had been previously published in four volumes by Il Mulino (De Gasperi, 2006; De Gasperi, 2008a; De Gasperi, 2008b; De Gasperi, 2009) but were not machine-readable. Our repository contains all documents in three formats: txt, XML and tab-separated. Raw text files contain only the body of the documents, and may be straightforwardly used to extract embeddings or topics. XML files include metadata that cover not only the title, the date and the place of publication, but also key-concepts automatically extracted from each text and genre labels manually assigned by domain experts. Furthermore, the release includes silver annotation for lemma, part of speech, person names and place names with associated coordinates in a CoNLL-like format. All files and the corresponding descriptions can be downloaded at https://dh.fbk.eu/technologies/corpus-de-gasperi (with CC BY-NC-SA license). The corpus can also be navigated using the ALCIDE platform (Moretti et al., 2016) at this link: http://alcidedigitale.fbk.eu/.

¹ Copyright ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
² https://www.clarin.eu/ParlaCLARIN
³ https://www.clarin.eu/event/2019/parlaformat-workshop
⁴ https://manifesto-project.wzb.eu/
⁵ https://www.presidency.ucsb.edu/documents
⁶ https://archivio.quirinale.it/aspr/
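The CoNLL-like release mentioned in the Introduction can be read with a few lines of code. The sketch below is only an illustration: the paper does not specify the column layout, so the column order (`token`, `lemma`, `pos`, `entity`, `lat`, `lon`), the use of `_` for missing values, blank lines as sentence separators, and all sample rows (including tag names and coordinates) are assumptions that should be checked against the description distributed with the corpus.

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO, Tuple

# Assumed column order; the released files may differ, so check their description first.
COLUMNS = ["token", "lemma", "pos", "entity", "lat", "lon"]

Token = Dict[str, Optional[str]]


def read_conll_like(handle: TextIO, columns: List[str] = COLUMNS) -> Iterator[List[Token]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Token] = []
    for raw_line in handle:
        line = raw_line.rstrip("\n")
        if not line.strip():
            # A blank line is assumed to close the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        token: Token = {}
        for index, name in enumerate(columns):
            value = fields[index] if index < len(fields) else None
            # "_" and empty strings are treated as missing values (an assumption).
            token[name] = None if value in (None, "", "_") else value
        sentence.append(token)
    if sentence:
        yield sentence


def geo_referenced_places(sentences: List[List[Token]]) -> List[Tuple[str, float, float]]:
    """Collect (token, latitude, longitude) for GPE tokens that carry coordinates.

    Multi-token place names are not merged here; each token is reported on its own.
    """
    places = []
    for sentence in sentences:
        for token in sentence:
            entity = token["entity"] or ""
            if entity.endswith("GPE") and token["lat"] and token["lon"]:
                places.append((token["token"], float(token["lat"]), float(token["lon"])))
    return places


# Illustrative rows only: tags and coordinates are made up for the example.
SAMPLE = (
    "Alcide\tAlcide\tSPN\tB-PER\t_\t_\n"
    "visitò\tvisitare\tV\tO\t_\t_\n"
    "Trento\tTrento\tSPN\tB-GPE\t46.0664\t11.1258\n"
    "\n"
    "Poi\tpoi\tB\tO\t_\t_\n"
    "partì\tpartire\tV\tO\t_\t_\n"
)

sentences = list(read_conll_like(io.StringIO(SAMPLE)))
print(len(sentences))
print(geo_referenced_places(sentences))
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper, after adjusting `COLUMNS` to the layout documented in the release.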
2 Related Work

The political domain has been studied in computational linguistics from various perspectives. Annotated corpora have been created to analyse rhetoric and metaphors in political communication (Cardie and Wilkerson, 2008; Ahrens et al., 2018), study the impact of speeches on the audience (Guerini et al., 2013; Thomas et al., 2006) and understand the relationship between ideology and linguistic complexity (Schoonvelde et al., 2019). Resources have also been developed to train and test automatic systems for several types of NLP tasks, such as persuasiveness prediction (Strapparava et al., 2010), sentiment and emotion analysis (Young and Soroka, 2012; Rheault et al., 2016), text classification (Yu et al., 2008), topic-based agreement detection (Menini et al., 2017) and recognition of ideological positions (Hirst et al., 2010).

Many research activities have recently dealt with the digitisation and release of corpora containing historical political texts. For example, the corpus of speeches given in the British Parliament from 1803 to 2005 (i.e. the Hansard Corpus) has been automatically tagged using the Historical Thesaurus Semantic Tagger (Piao et al., 2014; Wattam et al., 2014) and then a part of it has been semantically enriched with information about speakers and topics (Nanni et al., 2019). In addition, the Canadian Parliamentary Debates (1901-present) have been standardised, enriched and distributed within the “Digging into Linked Parliamentary Data” project (Beelen et al., 2017). The period from 1947 to 2017 is instead covered by a dataset of Dutch and Danish party congress speeches (Schumacher et al., 2019).

As for Italian, to the best of our knowledge, the only available comprehensive study of the language of Italian politicians is the one by Bolasco (2015). He analyses the parliamentary proceedings of the Italian Chamber of Deputies in the period 1953-2008 using the TalTac2 software⁷, thus providing a lexical and statistical analysis. Another project related to our work is “Voci della Grande Guerra”, whose online platform allows users to explore a corpus of documents related to the First World War, including samples of parliamentary proceedings and political speeches (Lenci et al., 2016). Similarly to what we present in this paper, such documents have been automatically annotated and then partially revised by hand (De Felice et al., 2018). Compared with these last two works, our corpus is broader, has a multi-layered semantic analysis, and is completely available for download in different formats, and is thus open to further analysis by the research community.

⁷ http://www.taltac.it/

3 Corpus Description

Our corpus contains the complete collection of public documents by Alcide De Gasperi, the first Prime Minister of the Italian Republic and one of the founding fathers of the European Union. It includes 2,762 documents published between 1901 and 1954, for a total of around 3,000,000 tokens. The corpus is released as raw text, as XML with a minimal set of metadata and associated key-concepts, and in a CoNLL-like format, with additional information that has been fully or semi-automatically annotated (see Section 4). Texts, date and place of publication were automatically extracted from the PDF files used to issue the volumes edited by Il Mulino. Each document of the collection was classified manually by a group of history scholars on the basis of a two-layered hierarchy that takes into consideration whether the text was originally released in an oral or written form, and its specific genre. It is important to note that different text genres correspond to different roles covered by De Gasperi during his life: e.g. daily press when he worked as a journalist for newspapers in Trentino, speeches in institutional venues when he was a Member of the Italian Parliament.

History scholars also identified four time spans to which each document can be assigned, which characterise different periods in De Gasperi’s life. These correspond to the four volumes of the printed edition and are used to split the corpus into different periods based on the date of publication:

Vol. I: De Gasperi was a journalist and a students’ leader. He was active mainly in Trento and in the Austrian Parliament (1901 – 1918).

Vol. II: De Gasperi founded Partito Popolare, became Parliament member in Rome and then left the Italian political life for several years after opposing the Fascist regime, working at the Vatican library and as a publicist (1919 – 1942).

Vol. III: De Gasperi founded the Christian-Democratic Party, became Prime Minister
and was Italian delegate at the World War II peace conference (1943 – 1948).

Vol. IV: After Christian Democracy led by De Gasperi won the first general elections of the Italian Republic, he launched a plan of reforms to reconstruct Italy including social housing, labor policy and unemployment insurance (1949 – 1954).

Document Type        Genre                     Number
Written documents    Monographs / Prefaces          4
                     Daily press                  963
                     Magazines                    228
                     Official documents           433
Speeches             Electoral / propaganda       473
                     Party conferences            188
                     Institutional venues         419
Not specified        Not specified                 54

Table 1: Genre labels with corresponding statistics.

4 Annotated Information

The annotations included in the release are:

• Lemma and PoS: the corpus has been lemmatised and PoS-tagged using the TextPro suite (Pianta et al., 2008). The lemmatization module is a rule-based system, whereas the part-of-speech annotation is statistical and has been trained on the EVALITA 2007 dataset (Tamburini, 2007) following the EAGLES tagset (Monachini, 1996).

• Person and place names: named entities have been tagged using the NER module included in TextPro and trained on the I-CAB corpus (Magnini et al., 2006). Geopolitical entities (GPEs) have also been geo-referenced using Nominatim⁸ (Clemens, 2015). The number of person and place names per volume is provided in Table 2.

After running the automatic modules, the output was uploaded to the ALCIDE platform (Moretti et al., 2016) and, through its navigation interface, we identified annotations that were systematically wrongly tagged, and fixed them manually. An evaluation of the automatic annotation is reported in Section 5.

In addition to the annotations previously mentioned, each document is assigned a set of key-concepts, that is, a weighted list of n-grams representing the most important concepts of a text, automatically extracted using KD (Moretti et al., 2015).

⁸ https://nominatim.openstreetmap.org/

5 Annotation Evaluation

We evaluated the quality of the automatic annotation produced by TextPro modules on a subset of our corpus. Since these modules were developed to perform best on contemporary texts, and typically trained on news, it is important to assess to what extent they can be reliably used on Italian documents of the 20th century in the political domain. To this end we manually annotated a gold standard made of documents written by De Gasperi between 1906 and 1911, for a total of 8,872 tokens. We chose texts belonging to the first period of De Gasperi’s life because they are the oldest in the corpus and therefore the most linguistically different from the texts used for training the modules. Results of the evaluation are compared with the ones obtained by TextPro on contemporary texts.

5.1 Lemmatization

Table 3 shows the TextPro accuracy obtained on our gold standard compared with the one reported in Palmero Aprosio and Moretti (2018) and calculated on the Universal Dependencies (UD) test set for Italian (Bosco et al., 2013). The drop of 0.07 points in accuracy is mainly due to some repeated anomalies of the module in the lemmatization of definite and indefinite articles (which are lemmatized using the labels “det” and “indet”, instead of the singular masculine forms “il” and “uno”) and to the non-recognition of truncated words, such as “far”, “bel”, “andar”, “vuol”, not common in contemporary texts. Other sources of errors are the presence of obsolete terms, e.g. “libello”, “soziale”, “donde”, and the use of the preterite (passato remoto, e.g. “andò”, “apparve”), a grammatical tense not very frequent in contemporary news. Most of the previously mentioned anomalies have been fixed through a set of rules applied after data processing: after this correction, accuracy has risen to 0.97.

5.2 PoS Tagging

The presence of obsolete words, truncated forms and preterite verbs leads to errors also in the PoS tagger of TextPro. However, for this module the impact is less evident than for lemmatization: as
reported in Table 4, on De Gasperi’s documents the performance drop is only 0.01 points of accuracy with respect to the results obtained on the UD test set. Table 5 gives details on the number and distribution of errors per grammatical category. The categories registering the highest number of mistaken tags are nouns, proper nouns, verbs and adjectives. Most mistakes concerning nouns are due to words capitalised to show formal respect towards the highest representatives of the State or of the Church (e.g. “Vescovo”) and to German common nouns, which all have an initial capital letter.

Volume    Category  Occurrences  Three most frequent entities
Vol. I    PER       4,126        Gesù Cristo, Augusto Avancini, Karl Lueger
Vol. I    GPE       6,168        Trento, Alto Adige, Trentino
Vol. II   PER       2,890        Gesù Cristo, Mussolini, Leone XIII
Vol. II   GPE       2,956        Italia, Roma, Germania
Vol. III  PER       3,018        Palmiro Togliatti, Pietro Nenni, Marshall
Vol. III  GPE       4,324        Italia, Trieste, Russia
Vol. IV   PER       5,701        Pietro Nenni, Palmiro Togliatti, Tito
Vol. IV   GPE       6,308        Italia, Europa, Trieste

Table 2: Occurrences of PER and GPE per volume, with three top-frequent entities for each category.

         UD Test Set    De Gasperi Corpus
         Accuracy       Accuracy
Lemma    0.96           0.89

Table 3: Comparison of lemmatization performance on the Italian UD test set and on our gold standard.

         UD Test Set    De Gasperi Corpus
         Accuracy       Accuracy
PoS      0.96           0.95

Table 4: Comparison of PoS tagging performance on the Italian UD test set and on our gold standard.

Grammatical Category        #errors   %errors
Adjectives                       62     15.54
Adverbs                          24      6.02
Conjunctions                      6      1.50
Demonstrative Adjectives          8      2.01
Prepositions                     10      2.51
Pronouns                         12      3.01
Relative Pronouns                 1      0.25
Articles                         11      2.76
Nouns                            94     23.56
Proper Nouns                     91     22.81
Verbs                            73     18.30
Acronyms                          6      1.50
Foreign Terms                     1      0.25

Table 5: PoS-tagging errors per category.

5.3 Persons and GPEs

In Table 6 the performance of automatic recognition of persons (PER) and geo-political entities (GPE) in De Gasperi’s documents is compared with the scores TextPro obtained in the EVALITA 2007 campaign (Speranza, 2007), when trained and tested on a newswire corpus. The tool shows a drop in performance on our gold standard only in the recognition of persons’ names (-0.16 F1 points), whereas place names seem to be more stable (+0.01 F1 points). In both categories precision has decreased, while recall has dropped less (PER) or even increased (GPE): to improve precision, we manually checked the named entities detected by the automatic module in the whole corpus, removing the wrong ones. We also verified the latitude and the longitude retrieved with Nominatim for all the GPEs, assigning new correct coordinates to about 6% of them. Errors were mainly related to places that no longer exist or that have changed names after the death of De Gasperi (e.g. “Prussia”, “Congo Belga”) and to little villages in the Trentino area (e.g. “Oltresarca”, “Termon”).

        EVALITA 2007 test set    De Gasperi corpus
        P      R      F1         P      R      F1
PER     0.92   0.93   0.92       0.70   0.82   0.76
GPE     0.85   0.86   0.85       0.82   0.90   0.86

Table 6: Comparison of NER performance on news and on our gold standard.

6 Use Cases

The corpus has been used to perform a number of pilot studies, which have confirmed the potential of this kind of resource and could represent a starting point for further developments (Sprugnoli et al., 2016). Three of these studies are described in this Section.

A first analysis has been carried out with the goal of studying De Gasperi’s rhetorical strategy through his use of verb tenses, considered as an important marker of temporality (Sprugnoli et al., 2018). This study is based on the paradigm proposed by Chilton (2004), who includes time among the three axes of political discourse together with space and modality.

We run the morphological analyzer included in the TINT NLP Suite to recognise the tenses of all
verbs of the corpus. We then merge them into present, past and future tense and compare the distribution of the three classes across the four volumes. We observe that there is an evident difference between the use of verb tenses before and after 1943. Indeed, in the first two volumes past tenses are more frequently used, with a highly statistically significant difference with respect to volumes III and IV (p < 0.001 using the Wilcoxon signed-rank test). On the other hand, after 1943 De Gasperi uses more present and future tenses, again with high statistical significance. This can be explained by the fact that the last volumes contain many press reports describing the programmatic commitment of Christian Democracy as well as letters and telegrams sent by De Gasperi as Minister of Foreign Affairs, where the development of prospective collaborations is proposed. The last volume also discusses the reforms to be adopted for the reconstruction of the newly born Italian Republic and those about the forthcoming creation of a European Community. In general, after 1943 we observe a shift of focus from past events to the contemporary and future dimension.

A second analysis related to temporality deals with cited persons, which were linked to a DBpedia entry using the Wiki Machine (Palmero Aprosio and Giuliano, 2016). Through this link, each person is associated with a dbo:birthDate and a dbo:deathDate and then with a Past or Present label, using the document date as a reference. Persons are considered part of the past if the referent was dead before the document publication time. Using the classification algorithm described in Palmero Aprosio et al. (2017) we further assign a semantic category to each mention. A comparative analysis shows that contemporary persons are generally cited more often than past ones, but also that the category of persons mentioned in the documents changes significantly across the volumes: while in Volume I cited persons include politicians but also religious figures and artists, this range of figures narrows over time, with almost exclusively political figures mentioned in Volume IV. As an example, we report in Fig. 1 and Fig. 2 the top-cited persons in Vol. I and IV respectively: while in the early documents Beethoven, Dante and Nietzsche are highly cited, persons mentioned in the late documents include exclusively politicians and religious figures, all from the present time or recent past. With reference to the previously cited dimensions in Chilton (2004), this shift should be seen in the light of De Gasperi’s effort after 1943 to justify past and present policy, using mentioned persons to build a national ideology.

A third analysis focused on how temporal information is expressed in De Gasperi’s documents (Speranza and Sprugnoli, 2018). To explore this aspect we manually annotated ten newspaper articles, published in 1914 and related to the outbreak of the Great War, following the It-TimeML guidelines (Caselli et al., 2011). This resource has been used in the EVENTI task organized within EVALITA 2014 (Caselli et al., 2014) and is freely available online. The average number of annotated events and temporal relations in the documents written by De Gasperi is higher than in contemporary newspaper articles annotated following the same guidelines, whereas the density of temporal expressions is comparable. Other differences concern the type of events, temporal expressions and temporal relations present in the historical texts. For example, De Gasperi frequently uses events expressing personal opinions about the topics covered in the articles. The high presence of speculations influences the temporal structure of the texts: in many cases events are not ordered chronologically but presented as simultaneous with respect to the time of writing. Moreover, temporal expressions are mainly non-specific or fuzzy: a characteristic that is less evident in other corpora of contemporary texts, and that may be related to the more speculative nature of political texts.

7 Conclusions

In this paper we present the release of the corpus of Alcide De Gasperi’s public writings, including 2,762 documents and around 3 million tokens. We make available raw texts, XML files with a small set of metadata and key-concepts, and CoNLL-like files with lemma, PoS, PER and GPE annotation together with the coordinates of place names. Based on an evaluation performed on all four annotation layers, we show that their quality is good, although annotation was performed automatically and only partially revised.

This is the first freely available corpus of this kind, and we hope that it can be used to foster research in political science, corpus linguistics and history, as well as to develop and test NLP systems using data that are different from widely used contemporary news.
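As a quick consistency check on the named-entity evaluation summarised above, the F1 values of Table 6 can be recomputed from the reported precision and recall. The sketch below assumes that F1 is the usual harmonic mean of precision and recall; the paper does not state the formula explicitly, and the function and variable names are illustrative. Because the published precision and recall are themselves rounded to two decimals, agreement after rounding is expected here but is not guaranteed in general.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 6.
TABLE_6 = {
    ("EVALITA 2007 test set", "PER"): (0.92, 0.93, 0.92),
    ("EVALITA 2007 test set", "GPE"): (0.85, 0.86, 0.85),
    ("De Gasperi corpus", "PER"): (0.70, 0.82, 0.76),
    ("De Gasperi corpus", "GPE"): (0.82, 0.90, 0.86),
}


def recompute(table):
    """Return the F1 recomputed from P and R, rounded to two decimals."""
    return {
        key: round(f1_score(precision, recall), 2)
        for key, (precision, recall, _reported) in table.items()
    }


recomputed = recompute(TABLE_6)
for key, (_precision, _recall, reported) in TABLE_6.items():
    print(key, "reported:", reported, "recomputed:", recomputed[key])
```

The same function can be applied to true-positive, false-positive and false-negative counts once precision and recall have been derived from them.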
Figure 1: Past and present persons mentioned in Vol. 1.

Figure 2: Past and present persons mentioned in Vol. 4.

Acknowledgments

We thank the colleagues from the Italian-German Historical Institute at Fondazione Bruno Kessler for their help in annotating De Gasperi’s corpus, and Edizioni Il Mulino, for giving access to the corpus and allowing its release. The project has been partially supported by Fondazione Cassa di Risparmio di Trento e Rovereto and Fondazione Cassa di Risparmio delle Province Lombarde.

References

Kathleen Ahrens, Huiheng Zeng, and Shun-han Rebekah Wong. 2018. Using a Corpus of English and Chinese Political Speeches for Metaphor Analysis. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018).

Alessio Palmero Aprosio and Giovanni Moretti. 2018. Tint 2.0: an All-inclusive Suite for NLP in Italian. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Torino, Italy, December 10-12, 2018.

Kaspar Beelen, Timothy Alberdingk Thijm, Christopher Cochrane, Kees Halvemaan, Graeme Hirst, Michael Kimmins, Sander Lijbrink, Maarten Marx, Nona Naderi, Ludovic Rheault, et al. 2017. Digitization of the Canadian parliamentary debates. Canadian Journal of Political Science/Revue canadienne de science politique, 50(3):849–864.

Sergio Bolasco. 2015. Sulla costruzione di un corpus per l’analisi automatica del linguaggio parlamentare dei leader, chapter 5. Camera dei Deputati.

Cristina Bosco, Simonetta Montemagni, and Maria Simi. 2013. Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank. In 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 61–69. The Association for Computational Linguistics.

Claire Cardie and John Wilkerson. 2008. Text Annotation for Political Science Research. Journal of Information Technology & Politics, 5(1):1–6.

Tommaso Caselli, Valentina Bartalesi Lenzi, Rachele Sprugnoli, Emanuele Pianta, and Irina Prodanof. 2011. Annotating events, temporal expressions and relations in Italian: the It-TimeML experience for the Ita-TimeBank. In Proceedings of the 5th Linguistic Annotation Workshop, pages 143–151. Association for Computational Linguistics.

Tommaso Caselli, Rachele Sprugnoli, Manuela Speranza, and Monica Monachini. 2014. EVENTI: EValuation of Events and Temporal INformation at Evalita 2014. In Proceedings of the Fourth International Workshop EVALITA 2014, pages 27–34.

Paul Chilton. 2004. Analysing political discourse: Theory and practice. Routledge.

Konstantin Clemens. 2015. Geocoding with OpenStreetMap data. GEOProcessing 2015, page 10.

Irene De Felice, Felice Dell’Orletta, Giulia Venturi, Alessandro Lenci, and Simonetta Montemagni. 2018. Italian in the Trenches: Linguistic Annotation and Analysis of Texts of the Great War. In Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), pages 160–164. Accademia University Press.

Alcide De Gasperi. 2006. Alcide De Gasperi nel Trentino asburgico. In Scritti e discorsi politici di Alcide De Gasperi, volume 1. Il Mulino.

Alcide De Gasperi. 2008a. Alcide De Gasperi dal Partito popolare italiano all’esilio interno 1919-1942. In Scritti e discorsi politici di Alcide De Gasperi, volume 2. Il Mulino.

Alcide De Gasperi. 2008b. Alcide De Gasperi e la fondazione della Democrazia cristiana, 1943-1948. In Scritti e discorsi politici di Alcide De Gasperi, volume 3. Il Mulino.

Alcide De Gasperi. 2009. Alcide De Gasperi e la stabilizzazione della Repubblica 1948-1954. In Scritti e discorsi politici di Alcide De Gasperi, volume 4. Il Mulino.

Marco Guerini, Danilo Giampiccolo, Giovanni Moretti, Rachele Sprugnoli, and Carlo Strapparava. 2013. The new release of CORPS: A corpus of political speeches annotated with audience reactions. In Multimodal Communication in Political Speech. Shaping Minds and Social Action, pages 86–98. Springer.

Graeme Hirst, Yaroslav Riabinin, and Jory Graham. 2010. Party status as a confound in the automatic classification of political speech by ideology. In Proceedings of the 10th International Conference on Statistical Analysis of Textual Data (JADT 2010), pages 731–742.

Alessandro Lenci, Nicola Labanca, Claudio Marazzini, and Simonetta Montemagni. 2016. Voci della Grande Guerra: An Annotated Corpus of Italian Texts on World War I. Italian Journal of Computational Linguistics, pages 101–108.

Bernardo Magnini, Emanuele Pianta, Christian Girardi, Matteo Negri, Lorenza Romano, Manuela Speranza, Valentina Bartalesi Lenzi, and Rachele Sprugnoli. 2006. I-CAB: the Italian Content Annotation Bank. In LREC, pages 963–968.

Stefano Menini, Federico Nanni, Simone Paolo Ponzetto, and Sara Tonelli. 2017. Topic-based agreement and disagreement in US electoral manifestos. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2938–2944.

Monica Monachini. 1996. ELM-it: EAGLES specifications for Italian morphosyntax lexicon specification and classification guidelines. Technical report, Centre National de la Recherche Scientifique Paris, France.

Giovanni Moretti, Rachele Sprugnoli, and Sara Tonelli. 2015. Digging in the Dirt: Extracting Keyphrases from Texts with KD. In Proceedings of the Second Italian Conference on Computational Linguistics (CLiC-it 2015).

Giovanni Moretti, Rachele Sprugnoli, Stefano Menini, and Sara Tonelli. 2016. ALCIDE: Extracting and visualising content from large document collections to support Humanities studies. Knowledge-Based Systems, 111:100–112.

Federico Nanni, Stefano Menini, Sara Tonelli, and Simone Paolo Ponzetto. 2019. Semantifying the UK Hansard (1918-2018). In Proceedings of JCDL19.

Alessio Palmero Aprosio and Claudio Giuliano. 2016. The Wiki Machine: an open source software for entity linking and enrichment. ArXiv e-prints, September.

Alessio Palmero Aprosio, Sara Tonelli, Stefano Menini, and Giovanni Moretti. 2017. Using Semantic Linking to Understand Persons’ Networks Extracted from Text. Front. Digital Humanities, 2017.

Emanuele Pianta, Christian Girardi, and Roberto Zanoli. 2008. The TextPro Tool Suite. In Proceedings of Language Resources and Evaluation Conference, pages 2603–2607, Marrakech, Morocco.

Scott Piao, Fraser Dallachy, Alistair Baron, Paul Rayson, and Marc Alexander. 2014. Developing the Historical Thesaurus Semantic Tagger. In The Digital Humanities Congress 2014.

Ludovic Rheault, Kaspar Beelen, Christopher Cochrane, and Graeme Hirst. 2016. Measuring emotion in parliamentary debates with automated textual analysis. PloS one, 11(12):e0168843.

Martijn Schoonvelde, Anna Brosius, Gijs Schumacher, and Bert N Bakker. 2019. Liberals lecture, conservatives communicate: Analyzing complexity and ideology in 381,609 political speeches. PloS one, 14(2):e0208450.

Gijs Schumacher, Daniel Hansen, Mariken ACG van der Velden, and Sander Kunst. 2019. A new dataset of Dutch and Danish party congress speeches. Research & Politics, 6(2):2053168019838352.

Manuela Speranza and Rachele Sprugnoli. 2018. Annotation of Temporal Information on Historical Texts: a Small Corpus for a Big Challenge. Formal Representation and the Digital Humanities, page 203.

Manuela Speranza. 2007. EVALITA 2007: The Named Entity Recognition Task. In Proceedings of the EVALITA 2007 Workshop on Evaluation of NLP Tools for Italian, pages 66–68, Rome, Italy.

Rachele Sprugnoli, Giovanni Moretti, Sara Tonelli, and Stefano Menini. 2016. Fifty years of European history through the lens of computational linguistics: the De Gasperi project. IJCoL - Italian Journal of Computational Linguistics, 2(2):89–100.

Rachele Sprugnoli, Giovanni Moretti, and Sara Tonelli. 2018. Temporal Dimension in Alcide De Gasperi: Past, Present and Future in Historical Political Discourse. In AIUCD 2018 - Book of Abstracts, pages 77–80.

Carlo Strapparava, Marco Guerini, and Oliviero Stock. 2010. Predicting Persuasiveness in Political Discourses. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), pages 1342–1345.

Fabio Tamburini. 2007. Evalita 2007: The Part-of-Speech Tagging Task. Intelligenza artificiale, 4(2):57–73.
Matt Thomas, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 327–335. Association for Computational Linguistics.

Stephen Wattam, Paul Rayson, Marc Alexander, and Jean Anderson. 2014. Experiences with Parallelisation of an Existing NLP Pipeline: Tagging Hansard. In LREC, pages 4093–4096.

Lori Young and Stuart Soroka. 2012. Affective news: The automated coding of sentiment in political texts. Political Communication, 29(2):205–231.

Bei Yu, Stefan Kaufmann, and Daniel Diermeier. 2008. Classifying party affiliation from political speech. Journal of Information Technology & Politics, 5(1):33–48.