=Paper= {{Paper |id=Vol-2481/paper71 |storemode=property |title=Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain |pdfUrl=https://ceur-ws.org/Vol-2481/paper71.pdf |volume=Vol-2481 |authors=Sara Tonelli,Rachele Sprugnoli,Giovanni Moretti |dblpUrl=https://dblp.org/rec/conf/clic-it/TonelliSM19 }} ==Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain== https://ceur-ws.org/Vol-2481/paper71.pdf
                   Prendo la Parola in Questo Consesso Mondiale:
             A Multi-Genre 20th Century Corpus in the Political Domain
                         Sara Tonelli†, Rachele Sprugnoli‡, Giovanni Moretti†‡
                              † Fondazione Bruno Kessler, Trento, Italy
                               ‡ Università Cattolica, Milano, Italy
                                  {satonelli,moretti}@fbk.eu
                               rachele.sprugnoli@unicatt.it

Abstract

English. In this paper we present a multi-genre corpus spanning 50 years of European history. It contains a comprehensive collection of Alcide De Gasperi’s public documents, 2,762 in total, written or transcribed between 1901 and 1954. The corpus comprises different types of texts, including newspaper articles, propaganda documents, official letters and parliamentary speeches. The corpus is freely available and includes several annotation layers, i.e. key-concepts, lemmas, PoS tags, person names and geo-referenced places, representing a high-quality ‘silver’ annotation. We believe that this resource can foster research in historical corpus analysis, stylometry and computational social science, among others.¹

1 Introduction

In recent years, political scientists and history scholars have started to exploit the availability of digital material to enrich their research, taking advantage of freely accessible online archives and easy-to-use tools for text processing and data extraction. Active communities have been created around topics such as the study of Parliamentary corpora (see the ParlaCLARIN² and ParlaFormat workshops³), the analysis of political manifestos⁴ and of Presidential speeches.⁵ Despite the importance of this research field, copyright and availability in machine-readable format still represent major issues, especially in those countries where no or only limited public initiatives have been undertaken to support the distribution of this kind of document. For example, while in the US the Federal Digital System grants access to public Presidential documents through APIs and bulk-data repositories, in Italy an effort along this line has started only recently with the support of the Archive of the President of the Republic⁶, but has not delivered substantial results so far.

This work represents a first attempt to deal with this lack of data, since we present and make available a large corpus of Italian public documents in the political domain. In particular, we release a comprehensive collection of Alcide De Gasperi’s public documents issued between 1901 and 1954, which had been previously published in four volumes by Il Mulino (De Gasperi, 2006; De Gasperi, 2008a; De Gasperi, 2008b; De Gasperi, 2009) but were not machine-readable. Our repository contains all documents in three formats: txt, XML and tab-separated. Raw text files contain only the body of the documents, and may be straightforwardly used to extract embeddings or topics. XML files include metadata that cover not only the title, the date and the place of publication, but also key-concepts automatically extracted from each text and genre labels manually assigned by domain experts. Furthermore, the release includes silver annotation for lemma, part of speech, person names and place names with associated coordinates in a CoNLL-like format. All files and the corresponding descriptions can be downloaded at https://dh.fbk.eu/technologies/corpus-de-gasperi (with CC BY-NC-SA license). The corpus can also be navigated using the ALCIDE platform (Moretti et al., 2016) at this link: http://alcidedigitale.fbk.eu/.

    ¹ Copyright ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    ² https://www.clarin.eu/ParlaCLARIN
    ³ https://www.clarin.eu/event/2019/parlaformat-workshop
    ⁴ https://manifesto-project.wzb.eu/
    ⁵ https://www.presidency.ucsb.edu/documents
    ⁶ https://archivio.quirinale.it/aspr/
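As an illustration of how the tab-separated release might be consumed, the following is a minimal sketch of a reader for a CoNLL-like file. The column order (token, lemma, PoS, entity label, latitude, longitude), the use of blank lines between sentences, and the tag values in the sample row are assumptions made for the example; the actual layout is given in the descriptions distributed with the corpus.

```python
import csv
import io
from typing import Iterator, NamedTuple, Optional


class Token(NamedTuple):
    form: str
    lemma: str
    pos: str
    entity: str               # assumed labels, e.g. "PER", "GPE" or "O"
    lat: Optional[float]      # only filled for geo-referenced places
    lon: Optional[float]


def read_conll_like(handle) -> Iterator[Token]:
    """Yield one Token per non-empty tab-separated row (assumed column order)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or not row[0].strip():
            continue  # blank lines are assumed to separate sentences
        # Pad short rows so that missing trailing columns become empty strings.
        form, lemma, pos, entity, lat, lon = (row + [""] * 6)[:6]
        yield Token(form, lemma, pos, entity,
                    float(lat) if lat else None,
                    float(lon) if lon else None)


# Invented two-token sample; tags and coordinates are illustrative only.
SAMPLE = "Trento\tTrento\tSPN\tGPE\t46.07\t11.12\nè\tessere\tVI\tO\t\t\n"
tokens = list(read_conll_like(io.StringIO(SAMPLE)))
places = [(t.form, t.lat, t.lon) for t in tokens if t.entity == "GPE"]
print(places)
```

Reading rows with `csv.QUOTE_NONE` keeps quotation marks in the running text from being interpreted as field delimiters, which matters for prose such as newspaper articles and speeches.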
2 Related Work

The political domain has been studied in computational linguistics from various perspectives. Annotated corpora have been created to analyse rhetoric and metaphors in political communication (Cardie and Wilkerson, 2008; Ahrens et al., 2018), study the impact of speeches on the audience (Guerini et al., 2013; Thomas et al., 2006) and understand the relationship between ideology and linguistic complexity (Schoonvelde et al., 2019). Resources have also been developed to train and test automatic systems for several types of NLP tasks, such as persuasiveness prediction (Strapparava et al., 2010), sentiment and emotion analysis (Young and Soroka, 2012; Rheault et al., 2016), text classification (Yu et al., 2008), topic-based agreement detection (Menini et al., 2017) and recognition of ideological positions (Hirst et al., 2010).

Many research activities have recently dealt with the digitisation and release of corpora containing historical political texts. For example, the corpus of speeches given in the British Parliament from 1803 to 2005 (i.e. the Hansard Corpus) has been automatically tagged using the Historical Thesaurus Semantic Tagger (Piao et al., 2014; Wattam et al., 2014) and then a part of it has been semantically enriched with information about speakers and topics (Nanni et al., 2019). In addition, the Canadian Parliamentary Debates (1901-present) have been standardised, enriched and distributed within the “Digging into Linked Parliamentary Data” project (Beelen et al., 2017). The period from 1947 to 2017 is instead covered by a dataset of Dutch and Danish party congress speeches (Schumacher et al., 2019).

As for Italian, to the best of our knowledge, the only available comprehensive study of the language of Italian politicians is the one by Bolasco (2015). He analyses the parliamentary proceedings of the Italian Chamber of Deputies in the period 1953-2008 using the TalTac2 software⁷, thus providing a lexical and statistical analysis. Another project related to our work is “Voci della Grande Guerra”, whose online platform allows users to explore a corpus of documents related to the First World War including samples of parliamentary proceedings and political speeches (Lenci et al., 2016). Similarly to what we present in this paper, such documents have been automatically annotated and then partially revised by hand (De Felice et al., 2018). Compared with these last two works, our corpus is broader, having a multi-layered semantic analysis, and completely available for download in different formats, thus open to further analysis by the research community.

    ⁷ http://www.taltac.it/

3 Corpus Description

Our corpus contains the complete collection of public documents by Alcide De Gasperi, the first Prime Minister of the Italian Republic and one of the founding fathers of the European Union. It includes 2,762 documents published between 1901 and 1954, for a total of around 3,000,000 tokens. The corpus is released as raw text, as XML with a minimal set of meta-data and associated key-concepts, and in a CoNLL-like format, with additional information that has been fully or semi-automatically annotated (see Section 4). Texts, date and place of publication were automatically generated starting from the PDF files used to issue the volumes edited by Il Mulino. Each document of the collection was classified manually by a group of history scholars on the basis of a two-layered hierarchy that takes into consideration whether the text was originally released in an oral or written form, and its specific genre. It is important to note that different text genres correspond to different roles covered by De Gasperi during his life: e.g. daily press when he worked as a journalist for newspapers in Trentino, speeches in institutional venues when he was a Member of the Italian Parliament.

History scholars also identified four time spans to which each document can be assigned, which characterise different periods in De Gasperi’s life. These correspond to the four volumes of the printed edition and are used to split the corpus into different periods based on the date of publication:

Vol. I: De Gasperi was a journalist and a students’ leader. He was active mainly in Trento and in the Austrian Parliament (1901 – 1918).

Vol. II: De Gasperi founded Partito Popolare, became Parliament member in Rome and then left the Italian political life for several years after opposing the Fascist regime, working at the Vatican library and as a publicist (1919 – 1942).

Vol. III: De Gasperi founded the Christian-Democratic Party, became Prime Minister
and was Italian delegate at the World War II peace conference (1943 – 1948).

Vol. IV: After Christian Democracy led by De Gasperi won the first general elections of the Italian Republic, he launched a plan of reforms to reconstruct Italy including social housing, labor policy and unemployment insurance (1949 – 1954).

       Document             Type                      Number
       Written documents    Monographs / Prefaces          4
                            Daily press                  963
                            Magazines                    228
                            Official documents           433
       Speeches             Electoral / propaganda       473
                            Party conferences            188
                            Institutional venues         419
       Not specified        Not specified                 54

   Table 1: Genre labels with corresponding statistics.

4 Annotated Information

The annotations included in the release are:

     • Lemma and PoS: the corpus has been lemmatised and PoS-tagged using the TextPro suite (Pianta et al., 2008). The module for the lemmatization is a rule-based system, whereas the part-of-speech annotation is statistical and has been trained on the EVALITA 2007 dataset (Tamburini, 2007) following the EAGLES tagset (Monachini, 1996).

     • Person and place names: named entities have been tagged using the NER module included in TextPro and trained on the I-CAB corpus (Magnini et al., 2006). Geopolitical entities (GPEs) have also been geo-referenced using Nominatim⁸ (Clemens, 2015). The number of person and place names per volume is provided in Table 2.

After running the automatic modules, the output was uploaded to the ALCIDE platform (Moretti et al., 2016) and, through its navigation interface, we identified annotations that were systematically mis-tagged, and fixed them manually. An evaluation of the automatic annotation is reported in Section 5.

In addition to the annotations previously mentioned, each document is assigned a set of key-concepts, that is a weighted list of n-grams representing the most important concepts of a text, automatically extracted using KD (Moretti et al., 2015).

    ⁸ https://nominatim.openstreetmap.org/

5 Annotation Evaluation

We evaluated the quality of the automatic annotation produced by TextPro modules on a subset of our corpus. Indeed, since these modules were developed to perform best on contemporary texts, and typically trained on news, it is important to assess to what extent they can be reliably used on Italian documents of the 20th century in the political domain. To this end we manually annotated a gold standard made of documents written by De Gasperi between 1906 and 1911 for a total of 8,872 tokens. We chose texts belonging to the first period of De Gasperi’s life because they are the oldest in the corpus and therefore the most linguistically different from the texts used for training the modules. Results of the evaluation are compared with the ones obtained by TextPro on contemporary texts.

5.1 Lemmatization

Table 3 shows the accuracy TextPro obtained on our gold standard compared with the one reported in Aprosio and Moretti (2018) and calculated on the Universal Dependencies (UD) test set for Italian (Bosco et al., 2013). The drop of 0.07 points in accuracy is mainly due to some repeated anomalies of the module in the lemmatization of definite and indefinite articles (which are lemmatized using the labels “det” and “indet”, instead of singular masculine forms “il” and “uno”) and to the non-recognition of truncated words, such as “far”, “bel”, “andar”, “vuol”, not common in contemporary texts. Other sources of errors are the presence of obsolete terms, e.g. “libello”, “soziale”, “donde”, and the use of preterite (passato remoto, e.g. “andò”, “apparve”), a grammatical tense not very frequent in contemporary news. Most of the previously mentioned anomalies have been fixed through a set of rules applied after data processing: after this correction, accuracy has risen to 0.97.

5.2 PoS Tagging

The presence of obsolete words, truncated forms and preterite verbs leads to errors also in the PoS tagger of TextPro. However, for this module the impact is less evident than for lemmatization: as
              VOL I                           VOL II                   VOL III                       VOL IV
 PER                GPE               PER          GPE        PER                 GPE       PER                 GPE
 4,126              6,168             2,890        2,956      3,018               4,324     5,701               6,308
 Gesù Cristo       Trento            Gesù Cristo Italia     Palmiro Togliatti   Italia    Pietro Nenni        Italia
 Augusto Avancini Alto Adige          Mussolini    Roma       Pietro Nenni        Trieste   Palmiro Togliatti   Europa
 Karl Lueger        Trentino          Leone XIII   Germania   Marshall            Russia    Tito                Trieste

 Table 2: Occurrences of PER and GPE per volume, with three top-frequent entities for each category.

reported in Table 4, on De Gasperi’s documents the performance drop is only 0.01 points accuracy with respect to the results obtained on the UD test set. Table 5 gives details on the number and distribution of errors per grammatical category. Categories registering the highest number of mistaken tags are nouns, proper nouns, verbs and adjectives. Most mistakes concerning nouns are due to words capitalised to show formal respect towards the highest representatives of the State or of the Church (e.g. “Vescovo”) and German common nouns that all have the initial capital letter.

                UD Test Set       De Gasperi Corpus
                Accuracy          Accuracy
       Lemma    0.96              0.89

Table 3: Comparison of lemmatization performance on the Italian UD test set and on our gold standard.

                UD Test Set       De Gasperi Corpus
                Accuracy          Accuracy
       PoS      0.96              0.95

Table 4: Comparison of PoS tagging performance on the Italian UD test set and on our gold standard.

      Grammatical Category         #errors   %errors
      Adjectives                   62        15.54
      Adverbs                      24        6.02
      Conjunctions                 6         1.50
      Demonstrative Adjectives     8         2.01
      Prepositions                 10        2.51
      Pronouns                     12        3.01
      Relative Pronouns            1         0.25
      Articles                     11        2.76
      Nouns                        94        23.56
      Proper Nouns                 91        22.81
      Verbs                        73        18.30
      Acronyms                     6         1.50
      Foreign Terms                1         0.25

      Table 5: PoS-tagging errors per category.

5.3 Persons and GPEs

In Table 6 the performance of automatic recognition of persons (PER) and geo-political entities (GPE) in De Gasperi’s documents is compared with the scores TextPro obtained in the EVALITA 2007 campaign (Speranza, 2007), when trained and tested on a newswire corpus. The tool shows a drop in performance on our gold standard only in the recognition of persons’ names (-0.16 F1 points), whereas place names seem to be more stable (+0.01 F1 points). In both categories, precision has decreased more than recall: to improve it, we manually checked the named entities detected by the automatic module in the whole corpus, removing the wrong ones. We also verified the latitude and the longitude retrieved with Nominatim for all the GPEs, assigning new correct coordinates to about 6% of them. Errors were mainly related to places that no longer exist or that have changed names after the death of De Gasperi (e.g. “Prussia”, “Congo Belga”) and to little villages in the Trentino area (e.g. “Oltresarca”, “Termon”).

              EVALITA 2007 test set     De Gasperi corpus
              P     R     F1            P     R     F1
        PER   0.92  0.93  0.92          0.70  0.82  0.76
        GPE   0.85  0.86  0.85          0.82  0.90  0.86

Table 6: Comparison of NER performance on news and on our gold standard.

6 Use Cases

The corpus has been used to perform a number of pilot studies, which have confirmed the potential of this kind of resource and could represent a starting point for further developments (Sprugnoli et al., 2016). Three of these studies are described in this Section.

A first analysis has been carried out with the goal of studying De Gasperi’s rhetoric strategy through his use of verb tenses, considered as an important marker of temporality (Sprugnoli et al., 2018). This study is based on the paradigm proposed by Chilton (2004), who includes time among the three axes of the political discourse together with space and modality.

We run the morphological analyzer included in TINT NLP Suite to recognise the tenses of all
verbs of the corpus. We then merge them into present, past and future tense and compare the distribution of the three classes across the four volumes. We observe that there is an evident difference between the use of verb tenses before and after 1943. Indeed, in the first two volumes past tenses are more frequently used, with a highly statistically significant difference with respect to volumes III and IV (p < 0.001 using Wilcoxon signed-rank test). On the other hand, after 1943 De Gasperi uses more present and future tense, again with high statistical significance. This can be explained by the fact that the last volumes contain many press reports describing the programmatic commitment of Christian Democracy as well as letters and telegrams sent by De Gasperi as Minister of Foreign Affairs, where the development of prospective collaborations is proposed. The last volume also discusses the reforms to be adopted for the reconstruction of the newly born Italian Republic and those about the forthcoming creation of a European Community. In general, after 1943 we observe a shift of focus from past events to the contemporary and future dimension.

A second analysis related to temporality deals with cited persons, which were linked to a DBpedia entry using the Wiki Machine (Palmero Aprosio and Giuliano, 2016). Through this link, each person is associated with a dbo:birthDate and dbo:deathDate and then with a Past or Present label, again using the document date as a reference. Persons are considered part of the past if the referent was dead before the document publication time. Using the classification algorithm described in Palmero Aprosio et al. (2017) we further assign a semantic category to each mention. A comparative analysis shows that contemporary persons are generally more cited than past ones, but also that the category of persons mentioned in the document changes significantly across the volumes: while in Volume I cited persons include politicians but also religious figures and artists, this range of figures decreases over time, with almost exclusively political figures mentioned in Volume IV. As an example, we report in Fig. 1 and Fig. 2 the top-cited persons in Vol. I and IV respectively: while in the early documents Beethoven, Dante and Nietzsche are highly cited, persons mentioned in the late documents include exclusively politicians and religious figures, all from present time or recent past. With reference to the previously cited dimensions in Chilton (2004), this shift should be seen in the light of De Gasperi’s effort after 1943 to justify past and present policy, using mentioned persons to build a national ideology.

A third analysis focused on how temporal information is expressed in De Gasperi’s documents (Speranza and Sprugnoli, 2018). To explore this aspect we manually annotated ten newspaper articles, published in 1914 and related to the outbreak of the Great War, following the It-TimeML guidelines (Caselli et al., 2011). This resource has been used in the EVENTI task organized within EVALITA 2014 (Caselli et al., 2014) and is freely available online. The average number of annotated events and temporal relations in the documents written by De Gasperi is higher than in contemporary newspaper articles annotated following the same guidelines, whereas the density of temporal expressions is comparable. Other differences concern the type of events, temporal expressions and temporal relations present in the historical texts. For example, De Gasperi frequently uses events expressing personal opinions about the topics covered in the articles. The high presence of speculations influences the temporal structure of the texts: in many cases events are not ordered chronologically but presented as simultaneous with respect to the time of writing. Moreover, temporal expressions are mainly non-specific or fuzzy: a characteristic that is less evident in other corpora of contemporary texts, and that may be related to the more speculative nature of political texts.

7 Conclusions

In this paper we present the release of the corpus of Alcide De Gasperi’s public writings, including 2,762 documents and around 3 million tokens. We make available raw texts, XML files having a small set of metadata and key-concepts and CoNLL-like files with lemma, PoS, PER, GPE annotation together with the coordinates of place names. Based on an evaluation performed on all four annotation layers, we show that their quality is good, although annotation was performed automatically and only partially revised.

This is the first freely available corpus of this kind, and we hope that it can be used to foster research in political science, corpus linguistics and history, as well as to develop and test NLP systems using data that are different from widely used contemporary news.
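As a small consistency check on the NER evaluation summarised above, F1 can be recomputed from the precision and recall values reported in Table 6. The sketch below uses the standard harmonic-mean definition of F1; the dictionary layout and the names `f1` and `TABLE_6` are only illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as printed in Table 6.
TABLE_6 = {
    ("EVALITA 2007 test set", "PER"): (0.92, 0.93, 0.92),
    ("EVALITA 2007 test set", "GPE"): (0.85, 0.86, 0.85),
    ("De Gasperi corpus", "PER"): (0.70, 0.82, 0.76),
    ("De Gasperi corpus", "GPE"): (0.82, 0.90, 0.86),
}

for (dataset, category), (p, r, reported) in TABLE_6.items():
    # Recomputed and reported values are printed side by side.
    print(dataset, category, round(f1(p, r), 2), reported)
```

Rounded to two decimals, the recomputed values agree with the F1 column of the table for both datasets and both entity categories.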
  Figure 1: Past and present persons mentioned in Vol. 1.
  Figure 2: Past and present persons mentioned in Vol. 4.
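The Past/Present distinction shown in the two figures follows the rule stated in Section 6: a person counts as past if the referent was dead before the document publication date. A minimal sketch of that rule is given below; the function name, the handling of a missing death date (treated here as Present) and the document dates in the usage lines are assumptions made for illustration, not part of the released pipeline.

```python
from datetime import date
from typing import Optional


def temporal_label(death_date: Optional[date], document_date: date) -> str:
    """Return "Past" if the person died before the document date, else "Present".

    A missing death date (e.g. no dbo:deathDate available) is treated as
    "Present"; this fallback is an assumption of the sketch.
    """
    if death_date is not None and death_date < document_date:
        return "Past"
    return "Present"


# Beethoven died on 26 March 1827; the 1906 document date is hypothetical.
print(temporal_label(date(1827, 3, 26), date(1906, 1, 1)))
# Palmiro Togliatti died on 21 August 1964, after any document in the corpus.
print(temporal_label(date(1964, 8, 21), date(1950, 1, 1)))
```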

Acknowledgments

We thank the colleagues from the Italian-German Historical Institute at Fondazione Bruno Kessler for their help in annotating De Gasperi’s corpus, and Edizioni Il Mulino, for giving access to the corpus and allowing its release. The project has been partially supported by Fondazione Cassa di Risparmio di Trento e Rovereto and Fondazione Cassa di Risparmio delle Province Lombarde.

References

Kathleen Ahrens, Huiheng Zeng, and Shun-han Rebekah Wong. 2018. Using a Corpus of English and Chinese Political Speeches for Metaphor Analysis. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018).

Alessio Palmero Aprosio and Giovanni Moretti. 2018. Tint 2.0: an All-inclusive Suite for NLP in Italian. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Torino, Italy, December 10-12, 2018.

Kaspar Beelen, Timothy Alberdingk Thijm, Christopher Cochrane, Kees Halvemaan, Graeme Hirst, Michael Kimmins, Sander Lijbrink, Maarten Marx, Nona Naderi, Ludovic Rheault, et al. 2017. Digitization of the Canadian parliamentary debates. Canadian Journal of Political Science/Revue canadienne de science politique, 50(3):849–864.

Sergio Bolasco. 2015. Sulla costruzione di un corpus per l’analisi automatica del linguaggio parlamentare dei leader, chapter 5. Camera dei Deputati.

Cristina Bosco, Simonetta Montemagni, and Maria Simi. 2013. Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank. In 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 61–69. The Association for Computational Linguistics.

Claire Cardie and John Wilkerson. 2008. Text Annotation for Political Science Research. Journal of Information Technology & Politics, 5(1):1–6.

Tommaso Caselli, Valentina Bartalesi Lenzi, Rachele Sprugnoli, Emanuele Pianta, and Irina Prodanof. 2011. Annotating events, temporal expressions and relations in Italian: the It-TimeML experience for the Ita-TimeBank. In Proceedings of the 5th Linguistic Annotation Workshop, pages 143–151. Association for Computational Linguistics.

Tommaso Caselli, Rachele Sprugnoli, Manuela Speranza, and Monica Monachini. 2014. EVENTI EValuation of Events and Temporal INformation at Evalita 2014. In Proceedings of the Fourth International Workshop EVALITA 2014, pages 27–34.

Paul Chilton. 2004. Analysing political discourse: Theory and practice. Routledge.

Konstantin Clemens. 2015. Geocoding with openstreetmap data. GEOProcessing 2015, page 10.

Irene De Felice, Felice Dell’Orletta, Giulia Venturi, Alessandro Lenci, and Simonetta Montemagni. 2018. Italian in the Trenches: Linguistic Annotation and Analysis of Texts of the Great War. In Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), pages 160–164. Accademia University Press.

Alcide De Gasperi. 2006. Alcide De Gasperi nel Trentino asburgico. In Scritti e discorsi politici di Alcide De Gasperi, volume 1. Il Mulino.

Alcide De Gasperi. 2008a. Alcide De Gasperi dal Partito popolare italiano all’esilio interno 1919-1942. In Scritti e discorsi politici di Alcide De Gasperi, volume 2. Il Mulino.

Alcide De Gasperi. 2008b. Alcide De Gasperi e la fondazione della Democrazia cristiana, 1943-1948. In Scritti e discorsi politici di Alcide De Gasperi, volume 3. Il Mulino.

Alcide De Gasperi. 2009. Alcide De Gasperi e la stabilizzazione della Repubblica 1948-1954. In Scritti
  e discorsi politici di Alcide De Gasperi, volume 4. Il   Alessio Palmero Aprosio, Sara Tonelli, Stefano
  Mulino.                                                    Menini, and Giovanni Moretti. 2017. Using Seman-
                                                             tic Linking to Understand Persons’ Networks Ex-
Marco Guerini, Danilo Giampiccolo, Giovanni                  tracted from Text. Front. Digital Humanities, 2017.
 Moretti, Rachele Sprugnoli, and Carlo Strapparava.
 2013. The new release of CORPS: A corpus of po-           Emanuele Pianta, Christian Girardi, and Roberto
 litical speeches annotated with audience reactions.         Zanoli. 2008. The TextPro Tool Suite. In Proceed-
 In Multimodal Communication in Political Speech.            ings of Language Resources and Evaluation Confer-
 Shaping Minds and Social Action, pages 86–98.               ence, pages 2603–2607, Marrakech, Morocco.
 Springer.
                                                           Scott Piao, Fraser Dallachy, Alistair Baron, Paul
Graeme Hirst, Yaroslav Riabinin, and Jory Graham.            Rayson, and Marc Alexander. 2014. Developing
  2010. Party status as a confound in the automatic          the Historical Thesaurus Semantic Tagger. In The
  classification of political speech by ideology. In         Digital Humanities Congress 2014.
  Proceedings of the 10th International Conference on
  Statistical Analysis of Textual Data (JADT 2010),        Ludovic Rheault, Kaspar Beelen, Christopher
  pages 731–742.                                             Cochrane, and Graeme Hirst. 2016. Measuring
                                                             emotion in parliamentary debates with automated
Alessandro Lenci, Nicola Labanca, Claudio Marazz-            textual analysis. PloS one, 11(12):e0168843.
  ini, and Simonetta Montemagni. 2016. Voci della
  Grande Guerra An Annotated Corpus of Italian             Martijn Schoonvelde, Anna Brosius, Gijs Schumacher,
  Texts on World War I. Italian Journal of Compu-           and Bert N Bakker. 2019. Liberals lecture, con-
  tational Linguistics, pages 101–108.                      servatives communicate: Analyzing complexity and
                                                            ideology in 381,609 political speeches. PloS one,
Bernardo Magnini, Emanuele Pianta, Christian Girardi,       14(2):e0208450.
  Matteo Negri, Lorenza Romano, Manuela Speranza,
  Valentina Bartalesi Lenzi, and Rachele Sprugnoli.        Gijs Schumacher, Daniel Hansen, Mariken ACG
  2006. I-CAB: the Italian Content Annotation Bank.          van der Velden, and Sander Kunst.     2019.
  In LREC, pages 963–968.                                    A new dataset of Dutch and Danish party
                                                             congress speeches.     Research & Politics,
                                                             6(2):2053168019838352.
Stefano Menini, Federico Nanni, Simone Paolo
   Ponzetto, and Sara Tonelli. 2017. Topic-based
                                                           Manuela Speranza and Rachele Sprugnoli. 2018.
   agreement and disagreement in US electoral mani-
                                                            Annotation of Temporal Information on Historical
   festos. In Proceedings of the 2017 Conference on
                                                            Texts: a Small Corpus for a Big Challenge. Formal
   Empirical Methods in Natural Language Process-
                                                            Representation and the Digital Humanities, page
   ing, pages 2938–2944.
                                                            203.
Monica Monachini. 1996. ELM-it: EAGLES speci-              Manuela Speranza. 2007. EVALITA 2007: The
 fications for Italian morphosyntax lexicon specifica-      Named Entity Recognition Task. In Proceedings of
 tion and classification guidelines. Technical report,      the EVALITA 2007 Workshop on Evaluation of NLP
 Centre National de la Recherche Scientifique Paris,        Tools for Italian, pages 66–68, Rome, Italy.
 France.
                                                           Rachele Sprugnoli, Giovanni Moretti, Sara Tonelli, and
Giovanni Moretti, Rachele Sprugnoli, and Sara Tonelli.       Stefano Menini. 2016. Fifty years of european his-
  2015. Digging in the Dirt: Extracting Keyphrases           tory through the lens of computational linguistics:
  from Texts with KD. In Proceedings of the Sec-             the de gasperi project. IJCol-Italian journal of com-
  ond Italian Conference on Computational Linguis-           putational linguistics, 2(2):89–100.
  tics (CLiC-it 2015).
                                                           Rachele Sprugnoli, Giovanni Moretti, and Sara Tonelli.
Giovanni Moretti, Rachele Sprugnoli, Stefano Menini,         2018. Temporal Dimension in Alcide De Gasperi:
  and Sara Tonelli. 2016. ALCIDE: Extracting and             Past, Presentand Future in Historical Political Dis-
  visualising content from large document collections        course. In AIUCD 2018 - Book of Abstracts, pages
  to support Humanities studies. Knowledge-Based             77–80.
  Systems, 111:100–112.
                                                           Carlo Strapparava, Marco Guerini, and Oliviero Stock.
Federico Nanni, Stefano Menini, Sara Tonelli, and Si-        2010. Predicting Persuasiveness in Political Dis-
  mone Paolo Ponzetto. 2019. Semantifying the UK             courses. In Proceedings of the Seventh International
  Hansard (1918-2018). In Proceedings of JCDL19.             Conference on Language Resources and Evaluation
                                                             (LREC’10), pages 1342–1345.
Alessio Palmero Aprosio and Claudio Giuliano. 2016.
  The Wiki Machine: an open source software for en-        Fabio Tamburini. 2007. Evalita 2007: The Part-
  tity linking and enrichment. ArXiv e-prints, Septem-       of-Speech Tagging Task. Intelligenza artificiale,
  ber.                                                       4(2):57–73.
Matt Thomas, Bo Pang, and Lillian Lee. 2006. Get out the
  vote: Determining support or opposition from
  Congressional floor-debate transcripts. In Proceedings
  of the 2006 Conference on Empirical Methods in Natural
  Language Processing, pages 327–335. Association for
  Computational Linguistics.
Stephen Wattam, Paul Rayson, Marc Alexander, and Jean
  Anderson. 2014. Experiences with Parallelisation of an
  Existing NLP Pipeline: Tagging Hansard. In LREC,
  pages 4093–4096.
Lori Young and Stuart Soroka. 2012. Affective news: The
  automated coding of sentiment in political texts.
  Political Communication, 29(2):205–231.
Bei Yu, Stefan Kaufmann, and Daniel Diermeier. 2008.
  Classifying party affiliation from political speech.
  Journal of Information Technology & Politics,
  5(1):33–48.