ConteCorpus: An Analysis of People Response to Institutional Communications During the Pandemic

Viviana Ventura
viviana.ventura01@universitadipavia.it
Department of Humanities, University of Pavia, Pavia, Italy

Elisabetta Jezek
jezek@unipv.it
Department of Humanities, University of Pavia, Pavia, Italy


The study of institutional communication related to the pandemic, and of the population's response to it, is of great relevance today. During 2020, the Italian spokesperson for communication regarding the pandemic was the former Prime Minister Giuseppe Conte. We retrieved 4,860,395 comments from his official Facebook page and built the ConteCorpus, a new Italian resource annotated in CoNLL-U format. A first aim of the research was to evaluate the performance of the model used to annotate the corpus. Models trained on social media texts usually generalize poorly. Nevertheless, the results of the evaluation were good, especially on parsing metrics, and showed that a parser trained on Twitter data can be successfully applied to Facebook data. A second aim of the research was to provide an overall view of the content of such a large corpus; for this purpose, we performed topic modeling by training an LDA model. The model generated 5 topics that cover different aspects of the pandemic emergency, from economic to political issues. Through topic modeling we investigated which topics are prevalent on particular days.

Introduction

During 2020, Prime Minister Giuseppe Conte played a major role in institutional communication, particularly in communication regarding the policies undertaken to manage the health emergency. We assumed that his social media profiles would contain interesting material on how the population responded to institutional communications about the pandemic. Therefore, we created the ConteCorpus,¹ retrieving more than 4 million comments posted on his Facebook page² from January 2020 to December 2020, and we annotated it in CoNLL-U format.³

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

A first aim of the research was to evaluate the performance of the model used to annotate the dataset. Models trained on social media texts usually generalize poorly, even to text retrieved from the same social medium. We therefore wanted to test, on Facebook texts, the performance of a model trained on Twitter texts. To evaluate the model, we created a gold standard by extracting 1,000 sentences from the ConteCorpus and manually revising them.

A second aim of the research was to provide an overall view of this large corpus. For this purpose we performed topic modeling, training an LDA model on a 10% sample of the ConteCorpus. The LDA model generated 5 topics related to different aspects of the pandemic emergency. The model was then used to see which topics were the most relevant before and after the announcement of the first and the second period of restrictions adopted to fight the pandemic in Italy.

The paper is structured as follows: we first review the relevant literature for our research (section 2), then we describe the data collection and the creation of the corpus (section 3). In section 4, we describe the evaluation we performed of the model we used to annotate the corpus in CoNLL-U format, and in section 5 we report the results of the topic modeling experiment. In section 6 we provide some concluding observations.

State of the Art

Since the beginning of the health emergency, there has been a proliferation of computational analyses that exploit data extracted from social media. These data are considered relevant because they allow us to generalize about human social and linguistic behavior, especially with regard to the pandemic. Among the tasks conducted on social media data in this period, sentiment analysis, emotion profiling and topic modeling are the most common (Gagliardi et al., 2020; Tamburini, 2020; Vitale et al., 2020; Stella et al., 2020a; Stella et al., 2020b; Stella et al., 2021; De Santis et al., 2020; Sciandra, 2020; Trevisan et al., 2021; Gozzi et al., 2020; Kruspe et al., 2020; Hussain et al., 2021; Chakraborty et al., 2020; Nemes and Kiss, 2020; Jelodar et al., 2021; Lamsal, 2020; Duong et al., 2021; Gupta et al., 2021; Sullivan et al., 2021; Su et al., 2020; Garcia and Berton, 2021; Ahmed et al., 2020).

In particular, topic modeling aims to find hidden semantic structures within texts and to model them as concepts. The unsupervised clustering technique LDA (Latent Dirichlet Allocation), developed by Blei et al. (2003), has been used extensively in analyses of social media data during the pandemic (Dashtian and Murthy, 2021; Feng and Zhou, 2020; Ordun et al., 2020; Wang et al., 2020; Kabir and Mandria, 2020; Amara et al., 2020; Abd-Alrazaq et al., 2020; Naseem et al., 2021; Low et al., 2020; Andreadis et al., 2021). LDA is a statistical model that represents each document in a corpus as a probability distribution over latent topics and each topic as a probability distribution over words. Each topic has a probability of generating each word, where the words are all the words observed in the corpus. Thus, the terms in the set of documents are used to discover hidden topics in a large corpus.
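To make the two distributions concrete, the following minimal sketch computes the probability that an LDA-style model assigns to a word in a document as a mixture over topics, p(word | document) = Σ_k p(word | topic k) · p(topic k | document). The topic names, the three-word vocabulary and all probabilities are invented for illustration and are not taken from the ConteCorpus model.

```python
# Toy illustration of the two distributions used by LDA.
# All names and numbers below are invented for the example.

# p(topic | document): one distribution over topics for a single document
doc_topics = {"economics": 0.7, "pandemic": 0.3}

# p(word | topic): one distribution over the vocabulary for each topic
topic_words = {
    "economics": {"loan": 0.5, "money": 0.4, "virus": 0.1},
    "pandemic": {"loan": 0.1, "money": 0.1, "virus": 0.8},
}


def word_probability(word, doc_topics, topic_words):
    """p(word | document) = sum over topics of p(word | topic) * p(topic | document)."""
    return sum(
        topic_words[topic].get(word, 0.0) * weight
        for topic, weight in doc_topics.items()
    )


p_virus = word_probability("virus", doc_topics, topic_words)
vocabulary = ["loan", "money", "virus"]
total = sum(word_probability(w, doc_topics, topic_words) for w in vocabulary)
```

Because both levels are proper probability distributions, the word probabilities of a document sum to one over the vocabulary, which is what `total` checks here.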

As is well known, the language of the web deviates from the standard language in ways that challenge the use of NLP tools. Several classifications have been proposed to label the nature of web and social media language. In general, the labels aim to define a variety of language that is diaphasically low and at an indefinite point on the diamesic axis, e.g., "netspeak" (Crystal, 2001). Web and social media language is characterized by little planning in text structure and a greater propensity for parataxis, absence of revision and punctuation, abrupt interruption of sentences, and an imitation of the continuous flow of speech (Fiorentino, 2013). Although some persistent traits of web and social media language can be described, it does not constitute a single variety of language from a sociolinguistic perspective (Fiorentino, 2013). This poses a double challenge for the use of NLP tools. First, the tools are calibrated on resources in the standard language variety. Second, even if we created models better suited to web and social media language, they would not generalize to every language variety on the web (Sanguinetti et al., 2018).

ConteCorpus Construction

Data Collection

We downloaded 4,860,395 comments and 534 posts published during 2020 on Giuseppe Conte's official Facebook profile. We queried the ID of every 2020 post on Giuseppe Conte's official page to retrieve the text, object ID and creation time of its comments. The calls to the Facebook Graph API⁴ were made month by month in the same fashion. Nevertheless, as Table 1 shows, a larger amount of data was retrieved for the months of March, April and October. These are the months in which the Italian government took its most restrictive measures to fight the pandemic.
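The retrieval step can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say which client, API version or access token type was used, so the version string `v10.0`, the helper names and the output field names are hypothetical. The sketch relies only on the Graph API's `/{object-id}/comments` edge, the `id`, `message` and `created_time` comment fields, and the `paging.next` link returned with each page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GRAPH_URL = "https://graph.facebook.com"


def comments_url(post_id, access_token, api_version="v10.0"):
    """Build the first request URL for the comments of one post.
    The API version is an assumption, not stated in the paper."""
    query = urlencode(
        {"fields": "id,message,created_time", "access_token": access_token}
    )
    return f"{GRAPH_URL}/{api_version}/{post_id}/comments?{query}"


def fetch_json(url):
    """Perform one HTTP GET and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_comments(post_id, access_token, fetch=fetch_json):
    """Yield every comment of a post, following the 'next' paging links."""
    url = comments_url(post_id, access_token)
    while url:
        page = fetch(url)
        for comment in page.get("data", []):
            yield {
                "post_id": post_id,
                "object_id": comment["id"],
                "text": comment.get("message", ""),
                "created_time": comment["created_time"],
            }
        # The last page has no "next" link, which ends the loop.
        url = page.get("paging", {}).get("next")
```

Passing the `fetch` function as a parameter keeps the paging logic separate from the network call, so it can be exercised with canned pages before any real request is made.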

Processing with the Neural Pipeline Stanza

After the data collection, we processed the data with the neural pipeline Stanza⁵ to enrich the texts with annotations.

Stanza is an open-source Python NLP toolkit, which "features a language-agnostic fully neural pipeline for text analysis, including tokenization, multiword token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition" (Qi et al., 2020). The toolkit supports more than 77 human languages and uses the Universal Dependencies formalism.⁶ Knowing the difficulties of annotating non-standard texts such as those derived from social media, we chose this pipeline because the evaluation of its models found that Stanza's language-agnostic neural architecture "adapts well to text of different genres […] achieving state-of-the-art or competitive performance at each step of the pipeline" (Qi et al., 2020). Moreover, each model that can be downloaded from Stanza has been trained on a single language and on a dataset of a specific text genre. We chose the model trained on PoSTWITA-UD,⁷ an Italian Twitter treebank in Universal Dependencies (Sanguinetti et al., 2018). Although the language of social media is very peculiar and changes from one social medium to another and from group to group (Fiorentino, 2013), we expected the Stanza model trained on this dataset to generalize to our data, since it is in-domain. Moreover, Sanguinetti et al. (2018) added customized tags to the UD scheme to deal with some phenomena peculiar to social media: "discourse:emo" for emojis and emoticons, and "parataxis:hashtag" for hashtags. They tagged links found in sentences with the relation "dep" (unspecified relation) and used the "upos" (universal part-of-speech) tag "SYM" (symbol) for hashtags and emojis. Additionally, they manually inserted the lemma of non-standard word forms not recognized by the lemmatizer (Sanguinetti et al., 2018). We processed the data in 12 batches, each corresponding to one month of data.

We used every processor of the pipeline except the Named Entity Recognition module (TokenizeProcessor, POSProcessor, LemmaProcessor, DepparseProcessor). We customized the pipeline so that it does not split sentences,⁸ forcing the TokenizeProcessor to treat each comment as a single sentence. Furthermore, we added two metadata fields to each sentence: the ID of the post from which the comment was retrieved and the creation time of the comment. The aim is to make it easier to retrieve comments from the corpus by creation time or post ID when one needs to analyze a particular period of time or a particular post.
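A configuration along these lines could look like the sketch below. The option values reflect Stanza's documented `package`, `processors` and `tokenize_no_ssplit` settings, but the exact call used by the authors is not given in the paper, so this is an assumption; in particular, the `mwt` processor is listed here because Italian models include multi-word token expansion, although the paper does not mention it. The metadata keys `post_id` and `created_time` are hypothetical names for the two fields described above, written as ordinary CoNLL-U comment lines.

```python
# Options for stanza.Pipeline (assumed, not quoted from the paper).
# "postwita" selects the Italian model trained on PoSTWITA-UD, and
# tokenize_no_ssplit=True disables sentence splitting, so that text is
# only split into sentences at blank lines ("\n\n").
STANZA_OPTIONS = {
    "lang": "it",
    "package": "postwita",
    "processors": "tokenize,mwt,pos,lemma,depparse",  # no NER processor
    "tokenize_no_ssplit": True,
}


def build_pipeline():
    """Create the pipeline; needs the stanza package and the downloaded model."""
    import stanza  # imported lazily so the rest of the sketch runs without it

    return stanza.Pipeline(**STANZA_OPTIONS)


def with_metadata(conllu_sentence, post_id, created_time):
    """Prefix one CoNLL-U sentence block with two metadata comment lines."""
    header = [f"# post_id = {post_id}", f"# created_time = {created_time}"]
    body = conllu_sentence.strip("\n")
    # A blank line terminates every sentence block in CoNLL-U.
    return "\n".join(header + [body]) + "\n\n"


example = with_metadata(
    "1\tGrazie\tgrazie\tINTJ\t_\t_\t0\troot\t_\t_\n",
    post_id="123",
    created_time="2020-03-09T20:00:00+0000",
)
```

Storing the two fields as `# key = value` comment lines keeps the files valid CoNLL-U while still letting a reader filter sentences by post or by date with a simple line scan.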

End-to-End Evaluation

Construction of the Gold Standard

We built a gold standard with a dual purpose: to evaluate the performance of the model on this new collection of social media texts, and to create a standard that can be used for future training and testing. We randomly selected about 83 sentences from each file of the automatically annotated corpus (each file contains one month of comments) and manually revised the 1,000 sentences collected. The manual revision followed the principle that whatever a human can understand should be treated as correct.
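The sampling step can be sketched as below. Since 12 files of 83 sentences give 996 sentences, the sketch spreads the 1,000 sentences as evenly as possible, giving one extra sentence to the first four files. How the remainder was actually handled is not stated in the paper, so this allocation, the function names and the fixed seed are assumptions.

```python
import random


def allocate(total, n_groups):
    """Split `total` as evenly as possible over `n_groups`.
    Group sizes differ by at most one; earlier groups get the extras."""
    base, extra = divmod(total, n_groups)
    return [base + 1 if i < extra else base for i in range(n_groups)]


def sample_gold(monthly_sentences, total=1000, seed=0):
    """Draw a random sample without replacement from each monthly file.
    Returns one list of sampled sentences per month."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sizes = allocate(total, len(monthly_sentences))
    return [
        rng.sample(sentences, size)
        for sentences, size in zip(monthly_sentences, sizes)
    ]


sizes = allocate(1000, 12)
```

Sampling the same number of sentences from every month, rather than in proportion to file size, keeps low-volume months such as January represented in the gold standard.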

Evaluation with CoNLL 2018 UD Shared Task Official Evaluation Script

To perform the evaluation, we used the official evaluation script of the CoNLL 2018 UD shared task.⁹ Table 2 shows the scores obtained by the Stanza model on the test set of the ConteCorpus. UPOS gives the score on the universal part-of-speech tagging metric, XPOS on the language-specific part-of-speech tagging metric, and UFeats on the morphological feature tagging metric. The last five rows show the scores on five different parsing metrics.

The most challenging part of manually revising the 1,000 automatically annotated sentences was correcting tokenization errors: many words that the tokenizer should have split were joined together. This type of error is often found when punctuation is used with a non-standard function. For example, in "oneste…volevo" ("honest…I wanted to"), an adjective, a punctuation mark and a verb are conflated into a single token. In the manual revision, tokens like this were split into three separate tokens and the missing tags were added. The presence of such conflated words may have lowered the score on the metric that evaluates segmentation and, consequently, the other scores. The evaluation of the parser starts by aligning system nodes with gold nodes; their respective parent nodes are also considered, and if the system parent is not aligned with the gold parent, or if the relation label differs, the word is not counted as correctly attached. Although segmentation errors seem frequent in the corpus, they did not excessively lower the scores on the metrics reported in Tables 2 and 3.

Another frequent error concerns the lemma assigned to abbreviations that are not present in PoSTWITA-UD. Common abbreviations are handled correctly, for example "cmq" for "comunque" ("however"). The abbreviations handled incorrectly are those that occur only a few times, such as "ql", which stands for "quelli" ("those").

An unexpectedly good result was achieved on the parsing metrics. This could be due to the "preference of UD scheme in assigning headedness to content words" (Sanguinetti et al., 2018): the tendency of social media language to drop function words therefore does not affect the performance of the parser. Another explanation lies in the very similar frequency distributions of parts of speech and syntactic relations in the training set and the gold standard, as shown in Figures 1 and 2.
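The attachment criterion described above, and the way a conflated token propagates to other words, can be illustrated with a simplified sketch. It is not the official evaluation script: words are aligned here simply by identical character spans, and the dependency labels in the example are invented. A word counts towards UAS when its parent matches the gold parent, and towards LAS when the relation label matches as well; both are reported as F-measures over system and gold words.

```python
def attachment_scores(gold, system):
    """gold/system map a word's (start, end) character span to a pair
    (head_span, deprel); the head_span of the root is None."""
    aligned = [span for span in system if span in gold]
    uas_hits = sum(1 for s in aligned if system[s][0] == gold[s][0])
    las_hits = sum(1 for s in aligned if system[s] == gold[s])

    def f1(correct):
        precision = correct / len(system) if system else 0.0
        recall = correct / len(gold) if gold else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    return {"UAS": f1(uas_hits), "LAS": f1(las_hits)}


# Raw text: "oneste…volevo dire" (labels below are illustrative only).
gold = {
    (0, 6): ((7, 13), "parataxis"),  # oneste
    (6, 7): ((0, 6), "punct"),       # …
    (7, 13): (None, "root"),         # volevo
    (14, 18): ((7, 13), "xcomp"),    # dire
}
# The system conflates "oneste…volevo" into one token spanning (0, 13).
conflated = {
    (0, 13): (None, "root"),
    (14, 18): ((0, 13), "xcomp"),
}
# A system with correct tokens and heads but one wrong relation label.
one_wrong_label = dict(gold)
one_wrong_label[(14, 18)] = ((7, 13), "ccomp")

scores_conflated = attachment_scores(gold, conflated)
scores_label = attachment_scores(gold, one_wrong_label)
```

In the conflated case even "dire", which is tokenized correctly, is not counted as attached, because its system parent does not align with any gold word; this is how segmentation errors can lower the parsing scores.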

Overall, the model trained on PoSTWITA-UD performed well on the test set of the ConteCorpus, because the PoSTWITA-UD tagset has been adapted to some recurrent features of social media language. Our evaluation showed that a model trained on texts from one social medium can adapt well to texts from another, provided that attention is paid to the neural architecture of the model and to the annotation format used.

Topic Modeling

To provide an overall view of the content of this large corpus, we performed topic modeling, training and testing an LDA model on the ConteCorpus.

Methodology

To perform topic modeling, we sampled 10% of the sentences in our dataset and trained an LDA model, treating each sentence as a document. We pre-processed the lemmas by removing stopwords, using the Italian stopword list from the NLTK (Natural Language Toolkit) library (https://www.nltk.org/) and manually adding missing stopwords. We filtered out tokens that appear in fewer than 15 documents and tokens with fewer than three letters; additionally, we kept only the 100,000 most frequent words. We transformed the documents into vectors by creating a bag-of-words representation of each document. Then, we computed term frequency-inverse document frequency (TF-IDF) weights on the whole corpus to assign higher weights to the most important words. The Gensim LDA model (https://radimrehurek.com/gensim/models/ldamodel.html) was applied first to the bag-of-words corpus and then to the TF-IDF corpus to extract latent topics; the bag-of-words corpus gave better results. We determined the optimal number of topics using the Coherence Value metric.¹² The underlying idea is that a good model generates topics with a high Coherence Value score. We ran different LDA experiments varying the number of topics and selected the model with the highest average topic Coherence Value score. Our final model generated 5 topics and has an average topic Coherence Value score of 0.5. Table 4 lists the ten most representative terms associated with each detected topic.
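The filtering and vectorization steps can be sketched in plain Python as follows. This is a simplified stand-in, not the Gensim calls used for the experiments: the function names are hypothetical, the ranking for the 100,000-word cut-off is assumed to be by document frequency, and the TF-IDF variant shown (raw count times log2(N/df), without normalization) is only one of several common formulations. The toy documents use a threshold of 2 documents instead of 15 so that the effect is visible.

```python
import math
from collections import Counter


def build_vocabulary(docs, stopwords, min_docs=15, min_len=3, keep_n=100_000):
    """Keep lemmas that are not stopwords, have at least `min_len` letters
    and occur in at least `min_docs` documents; then keep the `keep_n`
    lemmas with the highest document frequency."""
    doc_freq = Counter(lemma for doc in docs for lemma in set(doc))
    kept = [
        (lemma, df)
        for lemma, df in doc_freq.items()
        if df >= min_docs and len(lemma) >= min_len and lemma not in stopwords
    ]
    kept.sort(key=lambda item: (-item[1], item[0]))  # frequent first, then A-Z
    return {lemma: idx for idx, (lemma, _) in enumerate(kept[:keep_n])}


def bag_of_words(doc, vocab):
    """Represent a document as sorted (term_id, count) pairs."""
    counts = Counter(vocab[lemma] for lemma in doc if lemma in vocab)
    return sorted(counts.items())


def tfidf(bows):
    """Reweight each count by log2(N / document frequency of the term)."""
    n_docs = len(bows)
    df = Counter(term_id for bow in bows for term_id, _ in bow)
    return [
        [(t, count * math.log2(n_docs / df[t])) for t, count in bow]
        for bow in bows
    ]


# Each document is the list of lemmas of one comment (toy data).
docs = [
    ["prestito", "pagare", "il"],
    ["prestito", "casa"],
    ["casa", "uscire", "il"],
]
vocab = build_vocabulary(docs, stopwords={"il"}, min_docs=2)
bows = [bag_of_words(doc, vocab) for doc in docs]
weighted = tfidf(bows)
```

Either `bows` or `weighted` can then be handed to a topic model, mirroring the comparison between the bag-of-words and TF-IDF corpora reported above.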

Results

As expected, all the topics extracted from the corpus relate to concerns about the emergency. The focus is on its economic aspect. The ten most frequent words in the Economics topic (Table 4 and Figure 3) are economic terms: "loan", "company", "to pay", "money", etc. In all the other topics, at least one of the ten most frequent words comes from the economic sphere. Among the ten most frequent words of each topic, only two concern the pandemic, both in the Pandemic topic: "virus" and "pandemic". It is no coincidence that the most frequent word in this topic is "to go out". The need to face the emergency through the intervention of the institutions is evident. This is shown especially by the Prime Minister and Politics topics (Table 4). The most frequent words of the Prime Minister topic are related to the Prime Minister; words like "bravo", "thank you" and "dear" may indicate a positive judgement of him. The Politics topic contains words from the institutional sphere, such as "country", "government", "people" and "bank". The Home topic relates to the private sphere, with words like "to hope", "home", "to wait" and "to lose", although there is no shortage of words from the economic sphere.

In Figure 3, the distance between the centres of the circles indicates the similarity between the topics. Only the Economics and Prime Minister topics overlap, which indicates that these two topics are more similar to each other than to the other topics. Moreover, the area of each circle represents the importance of the topic relative to the corpus: Economics is the most important topic in the corpus.

Finally, we tested our model on unseen documents: the comments published between 15 February and 30 March 2020, before and after the announcement of the first period of restrictions to combat the pandemic, and between 1 October and 14 November 2020, before and after the announcement of the second period of restrictions. Figures 4, 5 and 6 show trends in topics over time. Each line represents a topic and the x-axis shows the time progression. On 23 February, the first restrictive policies were announced for some Italian cities: Figure 5 shows a peak in the Pandemic topic on that day. Figure 4 shows how the prevalence of the five topics changes on 8-12 March 2020, with a peak on 9 March in the Prime Minister topic: on that day he announced the first period of national restrictions to combat the pandemic. Overall, the prevalent topics on those days are Economics and Pandemic. On 13 October, after a summer without major restrictions and with a new exponential increase in the contagion curve, the Italian Parliament passed a decree limiting gatherings. On that day there is a new peak in the Pandemic topic (Figure 6). In the days that followed, the prevailing topic is Economics: on 28 October, the "ristoro" decree was approved to support commercial activities financially. A peak in the Economics topic occurred on 18 March: in those days, discussions were taking place on whether to ask the European Union for financial aid to overcome the pandemic. The prevailing topics are therefore usually related to current events.

Table 4. Topics generated from the LDA model and the ten most frequent terms. Topics: 1 Economics, 2 Prime Minister, 3 Politics, 4 Pandemic, 5 Home.

Concluding Observations

As mentioned before, models trained on social media data generalize poorly. This stems from the fact that, from a sociolinguistic perspective, the language of social media does not constitute a single variety. We therefore expected the results on the various evaluation metrics to be worse than those obtained on the PoSTWITA-UD test set. Surprisingly, on some metrics the results on the ConteCorpus test set were better than those on the PoSTWITA-UD test set. To offer an overall view of the content of the ConteCorpus, we performed topic modeling. The topics generated by the LDA model cover various aspects of the pandemic emergency, with a preponderance of political and economic issues. Unexpectedly, the topics identified do not show concern about the risk of contagion or the possibility of catching the disease.

Figure 1. Frequency distribution of syntactic relation tags in the training set and the gold standard.

Figure 2. Frequency distribution of part-of-speech tags in the training set and the gold standard.

Figure 3. Intertopic distance map and top-30 most relevant terms for Topic 1. For a better view visit: https://sites.google.com/view/ldavisualizationcontecorpus/home-page.

Figure 4. Prevalence of topics during the days 8-12 March 2020.

Figure 5. Prevalence of topics during the days 15 February-30 March 2020.

Figure 6. Prevalence of topics during the days 1 October-15 November 2020.

Table 1. Number of posts and comments retrieved for each month.

Month        Posts    Comments
January         48     115,971
February        59     154,266
March           48     681,221
April           45     775,972
May             26     361,179
June            44     335,772
July            61     449,913
August          24     190,777
September       43     260,237
October         75     666,126
November        33     441,822
December        28     427,139
Total          534   4,860,395

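Table 3 compares the scores obtained on the official PoSTWITA-UD test set with those obtained on the ConteCorpus test set.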

Table 2. Performance of Stanza's UD pre-trained model tested on the test set of ConteCorpus.

Table 3. Performance of Stanza's UD pre-trained model tested on the official test set of PoSTWITA-UD and on the test set of ConteCorpus. The scores shown are calculated using the F-measure.

¹ https://github.com/Viviana-dev/Conte_Corpus
² https://www.facebook.com/GiuseppeConte64/
³ https://universaldependencies.org/format.html
⁴ https://developers.facebook.com/docs/graph-api?locale=it_IT
⁵ https://stanfordnlp.github.io/stanza/
⁶ Universal Dependencies (UD) is a "framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages" (https://universaldependencies.org/).
⁷ https://universaldependencies.org/treebanks/it_postwita/index.html
⁸ Sentence segmentation and tokenization are jointly performed by the TokenizeProcessor (Qi et al., 2020).
¹² The Coherence Value metric was developed by Röder et al. (2015). It evaluates a single topic by measuring the degree of semantic similarity between high scoring words in the topic.

References

A. Abd-Alrazaq, D. Alhuwail, M. Househ, M. Hamdi, and Z. Shah. 2020. Top concerns of tweeters during the COVID-19 pandemic: infoveillance study. Journal of medical Internet research, 22(4), e19016.
M. E. Ahmed, M. R. I. Rabin, and F. N. Chowdhury. 2020. COVID-19: Social media sentiment analysis on reopening. arXiv preprint arXiv:2006.00804.
A. Amara, M. A. H. Taieb, and M. B. Aouicha. 2021. Multilingual topic modeling for tracking COVID-19 trends based on Facebook data analysis. Applied Intelligence, 51(5).
S. Andreadis, G. Antzoulatos, T. Mavropoulos, P. Giannakeris, G. Tzionis, N. Pantelidis, ... and I. Kompatsiaris. 2021. A social media analytics platform visualising the spread of COVID-19 in Italy via exploitation of automatically geotagged tweets. Online Social Networks and Media, 23, 100134.
D. M. Blei, A. Y. Ng, and M. I. Jordan. 2003. Latent dirichlet allocation. The Journal of machine Learning research, 3.
K. Chakraborty, S. Bhatia, S. Bhattacharyya, J. Platos, R. Bag, and A. E. Hassanien. 2020. Sentiment Analysis of COVID-19 tweets by Deep Learning Classifiers-A study to show how popularity is affecting accuracy in social media. Applied Soft Computing, 97, 106754.
D. Crystal. 2001. Language and the Internet. Cambridge University Press.
H. Dashtian and D. Murthy. 2021. Cml-covid: A large-scale covid-19 twitter dataset with latent topics, sentiment and location information. arXiv preprint arXiv:2101.12202.
E. De Santis, A. Martino, and A. Rizzi. 2020. An Infoveillance System for Detecting and Tracking Relevant Topics from Italian Tweets During the COVID-19 Event. IEEE Access, 8.
T. Dozat and C. D. Manning. 2017. Deep biaffine attention for neural dependency parsing. Proceedings of the 2017 International Conference on Learning Representations (ICLR).
V. Duong, J. Luo, P. Pham, T. Yang, and Y. Wang. 2020. The ivory tower lost: How college students respond differently than the general public to the covid-19 pandemic. Advances in Social Networks Analysis and Mining (ASONAM).
Y. Feng and W. Zhou. 2020. Is working from home the new norm? an observational study based on a large geo-tagged covid-19 twitter dataset. arXiv preprint arXiv:2006.08581.
G. Fiorentino. 2013. "Wild language" goes Web: new writers and old problems in the elaboration of the written code. In E. Miola (ed.), Languages Go Web. Standard and non-standard languages on the Internet. Alessandria, Edizioni dell'Orso.
G. Gagliardi, L. Gregori, and A. Suozzi. 2021. L'impatto emotivo della comunicazione istituzionale durante la pandemia di Covid-19: uno studio di Twitter Sentiment Analysis. Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020. CEUR Workshop Proceedings 2769. Bologna, Italy.
K. Garcia and L. Berton. 2021. Topic detection and sentiment analysis in Twitter content related to COVID-19 from Brazil and the USA. Applied Soft Computing, 101, 107057.
N. Gozzi, M. Tizzani, M. Starnini, F. Ciulla, D. Paolotti, A. Panisson, and N. Perra. 2020. Collective Response to Media Coverage of the COVID-19 Pandemic on Reddit and Wikipedia: Mixed-Methods Analysis. Journal of medical Internet research, 22(10), e21597.
V. Gupta, N. Jain, P. Katariya, A. Kumar, S. Mohan, A. Ahmadian, and M. Ferrara. 2021. An emotion care model using multimodal textual analysis on COVID-19. Chaos, Solitons & Fractals, 144, 110708.
A. Hussain, A. Tahir, Z. Hussain, Z. Sheikh, M. Gogate, and K. Dashtipour. 2021. Artificial Intelligence-Enabled Analysis of Public Attitudes on Facebook and Twitter Toward COVID-19 Vaccines in the United Kingdom and the United States: Observational Study. Journal of medical Internet research, 23(4), e26627.
H. Jelodar, Y. Wang, R. Orji, and S. Huang. 2020. Deep sentiment classification and topic discovery on novel coronavirus or covid-19 online discussions: Nlp using lstm recurrent neural network approach. IEEE Journal of Biomedical and Health Informatics, 24(10).
A. Kruspe, M. Häberle, I. Kuhn, and X. X. Zhu. 2020. Cross-language sentiment analysis of European Twitter messages during the COVID-19 pandemic. arXiv preprint arXiv:2008.12172.
R. Lamsal. 2020. Design and analysis of a large-scale COVID-19 tweets dataset. Applied Intelligence.
S. Lomborg and A. Bechmann. 2014. Using APIs for data collection on social media. The Information Society, 30(4).
D. M. Low, L. Rumker, T. Talkar, J. Torous, G. Cecchi, and S. S. Ghosh. 2020. Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study. Journal of medical Internet research, 22(10), e22635.
U. Naseem, I. Razzak, M. Khushi, P. W. Eklund, and J. Kim. 2021. COVIDSenti: a large-scale benchmark Twitter data set for COVID-19 sentiment analysis. IEEE transactions on computational social systems.
L. Nemes and A. Kiss. 2021. Social media sentiment analysis based on COVID-19. Journal of Information and Telecommunication, 5(1).
C. Ordun, S. Purushotham, and E. Raff. 2020. Exploratory analysis of covid-19 tweets using topic modeling, umap, and digraphs. arXiv preprint arXiv:2005.03082.
P. Qi, Y. Zhang, Y. Zhang, J. Bolton, and C. D. Manning. 2020. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. Association for Computational Linguistics (ACL) System Demonstrations.
M. Röder, A. Both, and A. Hinneburg. 2015. Exploring the space of topic coherence measures. Proceedings of the eighth ACM international conference on Web search and data mining.
M. Sanguinetti, C. Bosco, A. Lavelli, A. Mazzei, O. Antonelli, and F. Tamburini. 2018. PoSTWITA-UD: an Italian Twitter Treebank in universal dependencies. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). May 2018.
A. Sciandra. 2020. COVID-19 Outbreak through Tweeters' Words: Monitoring Italian Social Media Communication about COVID-19 with Text Mining and Word Embeddings. IEEE Symposium on Computers and Communications (ISCC). IEEE.
M. Stella, V. Restocchi, and S. De Deyne. 2020. #lockdown: Network-enhanced emotional profiling in the time of Covid-19. Big Data and Cognitive Computing, 4(2), 14.
M. Stella. 2020. Cognitive network science reconstructs how experts, news outlets and social media perceived the COVID-19 pandemic. Systems, 8(4), 38.
M. Stella, M. S. Vitevitch, and F. Botta. 2021. Cognitive networks identify the content of English and Italian popular posts about COVID-19 vaccines: Anticipation, logistics, conspiracy and loss of trust. arXiv preprint arXiv:2103.15909.
Y. Su, J. Xue, X. Liu, P. Wu, J. Chen, and C. Chen. 2020. Examining the impact of COVID-19 lockdown in Wuhan and Lombardy: a psycholinguistic analysis on Weibo and Twitter. International journal of environmental research and public health, 17(12), 4552.
K. J. Sullivan, M. Burden, A. Keniston, J. M. Banda, and L. E. Hunter. 2020. Characterization of Anonymous Physician Perspectives on COVID-19 Using Social Media Data. Pac Symp Biocomput.
F. Tamburini. 2020. EmoItaly.
M. Trevisan, L. Vassio, and D. Giordano. 2021. Debate on online social networks at the time of COVID-19: An Italian case study. Online Social Networks and Media, 23, 100136.
P. Vitale, S. Pelosi, and M. Falco. 2020. #andràtuttobene: Images, Texts, Emojis and Geodata in a Sentiment Analysis Pipeline. Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020. CEUR Workshop Proceedings 2769. Bologna, Italy.
J. Wang, Y. Zhou, W. Zhang, R. Evans, and C. Zhu. 2020. Concerns Expressed by Chinese Social Media Users During the COVID-19 Pandemic: Content Analysis of Sina Weibo Microblogging Data. Journal of medical Internet research, 22(11), e22152.