<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GeoLingIt at EVALITA 2023: Overview of the Geolocation of Linguistic Variation in Italy Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alan Ramponi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Camilla Casula</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler (FBK), Digital Humanities Unit - Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Trento, Department of Information Engineering and Computer Science - Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>GeoLingIt is the first shared task on geolocation of linguistic variation in Italy from social media posts comprising content in language varieties other than standard Italian (i.e., regional Italian, and languages and dialects of Italy). The task is articulated into two subtasks of increasing complexity for which only textual content is allowed: i) coarse-grained geolocation, aiming at predicting the region in which the variety expressed in the post is spoken, and ii) fine-grained geolocation, aiming at predicting its exact coordinates. Both subtasks can be either at the country level (standard track) or restricted to a linguistic area of choice (special track). GeoLingIt has attracted wide interest at the Evalita 2023 evaluation campaign with 37 registrations and 35 submitted runs. In this paper, we present the task and data, the evaluation criteria, the participants' results, an analysis of their approaches, and the main insights from the shared task.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural language processing</kwd>
        <kwd>computational sociolinguistics</kwd>
        <kwd>linguistic variation</kwd>
        <kwd>linguistic diversity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Italy is characterized by an astonishing linguistic diversity that makes it a unique landscape in Europe [1]. Besides standard Italian, a large number of local languages, their dialects, and regional varieties of standard Italian (i.e., regional Italian) are spoken across the country [2]. While Italian is employed in all formal settings in its standard form, in informal situations it is natural to observe Italian speakers use (even unwittingly) regional forms of Italian (e.g., guaglione, toso, and caruso for “young man”, typically in the Campania, Lombardy-Veneto, and Sicily areas, respectively), or to code-switch between their local language varieties and the national language.</p>
<p>Local languages and their dialects evolved from Vulgar Latin like Italian, and they mostly have no established orthography insofar as they are primarily used in spoken settings. On the other hand, regional forms of Italian derive from a geographical differentiation of Italian due to influences by the former [3], are largely used in both oral and written informal contexts, and typically follow Italian spelling conventions. When it comes to user-generated texts on social media, which are informal and feature linguistic patterns from spoken language [4, 5], we observe that not only is regional Italian naturally present, but local language varieties of Italy are also employed, albeit to various degrees. This can be attributed to their rediscovery as “additional expressive resources” [6], especially by the youngest generations. User-generated texts comprising language varieties other than standard Italian open opportunities for the study of linguistic variation in Italy, and can ultimately help in enriching and complementing linguistic atlases.</p>
<p>In this paper, we present GeoLingIt, the first shared task on geolocation of linguistic variation in Italy from social media posts from Twitter containing content other than standard Italian. GeoLingIt has been organized as part of the Evalita 2023 evaluation campaign [7], and relies on DiatopIt [8], a corpus of geolocated tweets exhibiting regional Italian use, code-switching between Italian and local language varieties, or fully written in the latter. Compared to previous geolocation shared tasks at international venues [9, 10, 11], GeoLingIt is focused on Italy and tailored to variation across language varieties, and it thus minimizes the effect of spurious, highly localized lexical items (e.g., mentions of events, places, or tourist attractions) on the prediction of linguistic areas. In the following, we present details on GeoLingIt, the results obtained by participant teams, and the main insights from the shared task.</p>
<p>EVALITA 2023: 8th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Sep 7–8, Parma, IT. * Corresponding author: alramponi@fbk.eu (A. Ramponi); ccasula@fbk.eu (C. Casula). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073).</p>
</sec>
<sec id="sec-task">
<title>2. Task description</title>
<p>The GeoLingIt shared task deals with the geolocation of linguistic variation in Italy from Twitter posts comprising content in language varieties other than standard Italian (i.e., regional Italian, and languages and dialects of Italy). It aims to advance the study of linguistic variation in Italy, provide means to complement qualitative-driven linguistic atlases, and sensitize the community on the rich linguistic landscape of the country.</p>
<sec id="sec-task-1">
<title>2.1. Standard and special tracks</title>
<p>GeoLingIt is organized into two tracks. In the standard track, the focus of the task is at the country level (i.e., comprising all language varieties of Italy), whereas in the special track, the task is restricted to a linguistic area chosen by participants¹ (e.g., the Gallo-Italic area, including language varieties spoken in the Piedmont, Lombardy, Liguria, and Emilia-Romagna regions) to favor the emergence of microvariation insights. For both tracks, two subtasks of increasing complexity are possible: coarse-grained geolocation (Section 2.2) and fine-grained geolocation (Section 2.3).</p>
<p>GeoLingIt is based on DiatopIt [8], a corpus of social media posts from Twitter specifically focused on language variation in Italy. All tweets have associated geolocation information and region labels, and have been sampled to contain either regional Italian usage or content in local language varieties of Italy. A multi-stage data collection process has been followed based on data-driven out-of-vocabulary tokens (from posts over a period of 2 years) which have been curated manually. Under-represented areas in the resulting posts have then been augmented by employing the lexical artifacts package [13]. The corpus consists of 15,039 posts from a 2-year time frame (from 2020-07-01 to 2022-06-30) to minimize period-related biases. For more details, we refer the reader to Ramponi and Casula (2023) [8].</p>
</sec>
      <sec id="sec-1-1">
<p>¹ Participants have been provided with the renowned linguistic map by Pellegrini (1977) [12] to encourage linguistically-grounded proposals, and requests have been approved based on the motivation and relevance of the area from a linguistics perspective.</p>
<p>² These are: Abruzzo, Aosta Valley, Apulia, Basilicata, Calabria, Campania, Emilia-Romagna, Friuli-Venezia Giulia, Lazio, Liguria, Lombardy, Marche, Molise, Piedmont, Sardinia, Sicily, Tuscany, Trentino-Alto Adige, Umbria, and Veneto.</p>
<p>³ Regions in the development set: Apulia, Calabria, Campania, Emilia-Romagna, Friuli-Venezia Giulia, Lazio, Liguria, Lombardy, Piedmont, Sardinia, Sicily, Tuscany, and Veneto.</p>
<p>⁴ Regions in the test set: the regions in the development set plus Abruzzo, Marche, Trentino-Alto Adige, and Umbria.</p>
        <sec id="sec-1-1-1">
          <title>2.2. Subtask A: Coarse-grained geolocation</title>
<p>Given the text of a tweet exhibiting regional Italian features or (partially or fully) written in local languages and dialects of Italy, predict the administrative region in which the variety expressed in the post is spoken. This is a classification task, i.e., one among n regions of Italy has to be predicted. In the case of the standard track, this matches all regions of Italy² (n = 20), whereas in the special track, it corresponds to the subset of m regions of the linguistic area under consideration (n = m). This subtask is applicable for the special track if m ≥ 2 regions are represented in the chosen area.</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>2.3. Subtask B: Fine-grained geolocation</title>
<p>Given the text of a tweet exhibiting regional Italian features or (partially or fully) written in local languages and dialects of Italy, predict the location, in terms of longitude and latitude coordinates, in which the variety expressed in the post is spoken. This is a double regression task, i.e., a pair of real-valued numbers has to be predicted. The difference between the standard and special tracks is here the extent of the area being considered. This subtask overcomes the simplification of coarse-grained geolocation (Section 2.2), aiming to uncover fine-grained linguistic variation. Indeed, language varieties of Italy lie on a continuum and often cross administrative region borders.</p>
</sec>
<sec id="sec-data">
<title>3. Data</title>
<p>Data splits — During the development stage, participant teams are provided with the original training and development splits of DiatopIt. These splits consist of 13,669 and 552 examples, respectively. While the training set comprises content from all over the country, the development set contains data from 13 out of 20 regions.³ Teams are allowed to use alternative splits and even augment the dataset at will, with the only constraint of not using external Twitter data, since some tweets may be part of the test set. The (unlabeled) test set is then released during the evaluation window to allow teams to submit their predictions, and comprises 818 examples from the same regions as the development set plus examples from 1 ≤ k ≤ 7 additional regions unknown to participants during both the development and evaluation stages. At the end of the evaluation window, the k = 4 additional regions in the test set have been communicated to participants.⁴ Splits match the original data partitions of DiatopIt; we thus refer the reader to Ramponi and Casula (2023) [8] for details on statistics and distribution.</p>
<p>Data format — The corpus splits are in the form of tsv files, i.e., a tab-separated format, with one example per line and the first line as a header. Each example has id and text columns. For the coarse-grained geolocation subtask, data files additionally include a region column, whereas data files for the fine-grained geolocation subtask include latitude and longitude columns. As a result, the instances in both subtasks are the same, and differ according to the label column(s). The content of such columns is described below:</p>
<p>• id: a unique identifier, different from the original tweet identifier to preserve users’ anonymity;</p>
<p>• text: the text of the tweet, with anonymized user mentions, email addresses, URLs, and location strings deriving from cross-platform posting;</p>
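<p>A split in the format above can be loaded with standard tooling; the following is an illustrative sketch (the file names in the usage comment are hypothetical, not those of the official release):</p>

```python
import csv

def read_split(path, label_cols):
    """Read a GeoLingIt-style tsv split: header line, then one example per line."""
    examples = []
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            example = {"id": row["id"], "text": row["text"]}
            # Subtask A files carry a `region` column; subtask B files
            # carry `latitude` and `longitude` columns instead.
            for col in label_cols:
                example[col] = row[col]
            examples.append(example)
    return examples

# Hypothetical usage (illustrative file names):
# train_a = read_split("train_a.tsv", label_cols=["region"])
# train_b = read_split("train_b.tsv", label_cols=["latitude", "longitude"])
```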
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>4. Evaluation</title>
        <p>During the evaluation phase, participant teams are allowed to submit up to 3 runs (i.e., predictions on the unlabeled test set) for each track and subtask. In all setups, only textual content can be used. We here present the metrics used for assessing the performance of runs (Section 4.1) and the baselines we provide (Section 4.2). We use the same baselines for both tracks: for subtask A, a most frequent baseline and a logistic regression baseline; for subtask B, a centroid baseline and a k-nearest neighbors baseline.</p>
        <sec id="sec-1-2-1">
          <title>4.1. Metrics</title>
          <p>Due to the different nature of coarse-grained geolocation and fine-grained geolocation, we employ different evaluation metrics for the two subtasks. Subtask-specific metrics are the same for both the standard and special tracks.</p>
          <p>Subtask A — The submitted runs are evaluated using macro-averaged precision, recall, and F1 score on the n regions of Italy under consideration. For the standard track, this matches all the administrative regions in the test set (n = 17, cf. Section 3, “Data splits”), whereas for the special track, it corresponds to the m regions in the chosen linguistic area that are also represented in the test set (n = m, cf. Section 2.2). Runs are ranked by macro F1 score and presented in separate rankings (i.e., one for the standard track, and one for each chosen subset of administrative regions in the special track).</p>
        </sec>
      </sec>
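<p>For concreteness, the subtask A metric can be sketched in plain Python as below. The subtask B metric text is not fully recoverable here; since footnote 5 references the haversine package, we assume fine-grained runs are scored by the haversine distance (in km) between predicted and gold coordinates, and include such a distance function as well. Both functions are illustrative sketches, not the official scorer.</p>

```python
from collections import defaultdict
from math import radians, sin, cos, asin, sqrt

def macro_f1(gold, pred):
    """Macro-averaged F1 over the regions present in the gold labels."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1
            fp[p] += 1
    f1s = []
    for region in set(gold):
        prec = tp[region] / (tp[region] + fp[region]) if tp[region] + fp[region] else 0.0
        rec = tp[region] / (tp[region] + fn[region]) if tp[region] + fn[region] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))
```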
      <sec id="sec-1-3">
        <title>Most frequent</title>
        <p>A baseline that always guesses the most frequent administrative region in the training set (i.e., Lazio) for all test set instances.</p>
      </sec>
      <sec id="sec-1-4">
        <title>Logistic regression</title>
        <p>A machine learning classifier with default scikit-learn (v1.2.2)⁶ hyperparameters that employs a count vectorizer with unigrams for feature extraction and operates on original text casing.</p>
        <p>Centroid — A baseline that computes the center point (in terms of latitude and longitude) of the training set and predicts it for all test instances.</p>
        <p>k-nearest neighbors (kNN) — A machine learning regressor with default scikit-learn hyperparameters that employs a count vectorizer with unigrams for feature extraction and operates on original text casing.</p>
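<p>The two non-learned baselines are simple enough to reproduce directly from their descriptions; a minimal sketch follows (the learned baselines additionally require scikit-learn’s count vectorizer pipeline, as described above; we take the “center point” to be the coordinate-wise mean, which is an assumption):</p>

```python
from collections import Counter

def most_frequent_baseline(train_regions, n_test):
    """Subtask A: always predict the most frequent training region (i.e., Lazio)."""
    majority = Counter(train_regions).most_common(1)[0][0]
    return [majority] * n_test

def centroid_baseline(train_coords, n_test):
    """Subtask B: predict the mean (latitude, longitude) of the training set.

    The exact definition of the center point used by the organizers may differ.
    """
    lats = [lat for lat, _ in train_coords]
    lons = [lon for _, lon in train_coords]
    centroid = (sum(lats) / len(lats), sum(lons) / len(lons))
    return [centroid] * n_test
```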
      </sec>
    </sec>
    <sec id="sec-2">
      <title>5. Participants and results</title>
      <p>A total of 35 runs have been submitted to the GeoLingIt shared task: 26 runs (6 teams) for the standard track and 9 runs (2 teams) for the special track. Specifically, for the standard track we received 14 runs (5 teams) for subtask A and 12 runs (5 teams) for subtask B, whereas for the special track 6 runs (2 teams) have been submitted for subtask A (i.e., the Tuscany-Lazio area and the Gallo-Italic area) and 3 runs (1 team) have been tailored to subtask B (i.e., the Gallo-Italic area). Overall, GeoLingIt has been one of the most participated shared tasks at Evalita 2023 [7] and attracted the interest of heterogeneously composed teams of up to 7 individuals, from master students to senior academic researchers.</p>
      <p>⁵ haversine package: https://github.com/mapado/haversine. ⁶ scikit-learn library: https://scikit-learn.org.</p>
      <sec id="sec-2-3">
        <sec id="sec-2-3-1">
          <title>5.1. Overview of participant teams</title>
          <p>In the following, we provide a summary of the
approaches employed by participant teams. We refer the
reader to their description papers for additional details.7
galliz [17] The team proposed a hybrid approach for
subtask A, and participated in both the standard track
and special track. Specifically, they combined the
predictions given by i) an English pre-trained BERT classifier,
previously fine-tuned on augmented GeoLingIt training
data, and ii) a dictionary-based algorithm derived from
external lexical sources. They then tested diferent
hyperparameter setups. As regards data augmentation, the
team fine-tuned an Italian word embedding model on
the training set, and leveraged word vector similarities
to create new training examples by substituting a single
word per post with a close word in the embedding space.</p>
<p>Salogni [18] — The team tested different transformer-based models pre-trained on Italian texts, with a set of hyperparameter settings (e.g., hidden layers, activation functions). They submitted a single run for the standard track, subtask B, based on an UmBERTo language model.</p>
<p>ba tti [14] — The team participated in both subtasks for the standard track. For subtask A, they experimented with multi-task learning, an ensemble of transformer-based and logistic regression models, and contrastive pre-training of a BERT-based Italian model on augmented subtask data. Augmentation uses a vocabulary built from online sources to create examples by randomly substituting words with lexical items from varieties spoken in the same or different regions. For subtask B, they leveraged data from both subtasks in a multi-task setting using either a BERT-based Italian model or the model that underwent continuous pre-training in subtask A, also testing a rectification module to adjust predictions outside land to the closest point within Italy’s boundaries.</p>
<p>SCG — The team participated in both tracks and experimented with logistic regression and support vector machines for subtask A, and linear regression and kNN regression for subtask B.⁸ They did not submit a report and we are thus unable to discuss their approach further.</p>
</sec>
<sec id="sec-2-3-2">
<title>5.2. Results</title>
<p>In this section, we summarize the results of participant teams in both subtasks A and B for the standard track (Section 5.2.1) and the special track (Section 5.2.2).</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>DANTE [15]</title>
        <p>The team focused on further pre-training BERT-based Italian language models and participated in both subtasks A and B for the standard track. Specifically, they experimented with two multi-task pre-training setups, namely task-specific learning and joint learning, with dialect and token classification objectives, using texts collected from external sources. Fine-tuning is then done in a single-task setup on the relevant subtask data. In both subtasks, they also proposed ensembles of their best-performing models.</p>
        <p>extremITA [16] — The team proposed two one-for-all models, designed to tackle all the challenges at Evalita 2023. The first model is based on the IT5 encoder-decoder architecture, whereas the second one is an instruction-tuned model built upon LLaMA. For fine-tuning, they used data from all Evalita 2023 challenges and encoded the tasks as prompts. The team submitted a run per model for both subtasks of the standard track.</p>
      </sec>
      <sec id="sec-2-5">
<title>5.2.1. Standard track</title>
<p>We present the results divided by subtask below.</p>
<p>Subtask A: Coarse-grained geolocation — In Table 2, we report the results on the test set for all runs submitted by teams participating in subtask A, ranked by macro F1. All runs by the DANTE team obtained the best results in the subtask, with improvements ranging from 5.52 to 10.10 macro F1 points compared to the best run by the team that ranked second (galliz). The best-performing system by DANTE (run 3) is an ensemble of transformer-based classifiers originally pre-trained on Italian texts, which have been further pre-trained in a multi-task fashion on external data from Dialettando⁹ and Wikipedia editions for local language varieties of Italy with region-centric objectives. The best submission by galliz (run 1) is an equally-weighted ensemble of a dictionary-based algorithm (based on Dialettando and GeoLingIt) and an English BERT model fine-tuned on augmented subtask A data, whereas the best run for ba tti (run 2) relies on a transformer-based classifier, pre-trained on Italian texts, that has been further pre-trained in a contrastive learning fashion with subtask A data, preemptively augmented with a word substitution approach based on a vocabulary derived from Dialettando and Wikipedia content. While all teams outperformed the most frequent baseline, all runs by the extremITA and SCG teams achieved worse results than the logistic regression baseline.</p>
<p>From a closer look, we observe that F1 scores obtained by participants’ runs greatly differ across regions (Figure 1). Campania, Lazio, Sardinia, Sicily, and Veneto are the easiest to classify. As expected, Abruzzo, Marche, Trentino-Alto Adige, and Umbria are instead among the regions with the lowest scores on average. This is mainly because posts from those regions have been excluded on purpose from the development set, and only few tweets are available in the training set, making traditional learning and tuning challenging. As a result, most instances from those regions are typically classified as neighboring regions in which similar varieties are spoken (e.g., posts comprising content in Trentino as spoken in the province of Trento – whose linguistic features exhibit traits of continuity between Lombard and Venetian [12] – are classified as Lombardy and Veneto, respectively).</p>
<p>Moreover, Friuli-Venezia Giulia and Apulia exhibit low scores on average across runs despite being represented in all data splits. The reason behind this has to be researched in linguistics rather than computation. Besides Friulian, Slovene, and German varieties, in Friuli-Venezia Giulia varieties of Venetian are also spoken (e.g., the Triestino variety) [12], and thus posts comprising the latter are easily misclassified with the region in which Venetian is predominantly used (i.e., Veneto). On the other hand, Salentino varieties as spoken in the southern part of Apulia are part of the extreme southern varieties group [12], which also includes Sicilian, and thus cause a large fraction of posts from Apulia to be misclassified as Sicily [8]. Besides the limitations of subtask A, this highlights that NLP should eventually go beyond “raw modeling” and start considering again linguistics as its foundation.</p>
<p>Subtask B: Fine-grained geolocation — Test set results for all submitted runs in subtask B are reported in Table 3. All teams except SCG outperformed both baselines. The ba tti team obtained the best results with two out of three submissions (i.e., runs 3 and 1). Their best run relies on multi-task learning on subtask A and B data, and uses geography-informed postprocessing to ensure that predictions fall inside the country borders. DANTE’s runs adopted methods similar to those employed in subtask A with separate layers for regression, ranking third with a model ensemble (run 3). Salogni’s run is based on UmBERTo fine-tuning, whereas the best run by extremITA is based on IT5 trained to generate region labels. By looking at predictions by models that outperformed both baselines, we observe that, on average, errors range from 0.89 km to 668.11 km, with a median of 58.77 km. Errors are typically due to lexical items that are highly represented in other locations, e.g., posts with “ghe mel” (en: “of course”, Parmigiano variety) fall in the Treviso area (Veneto) instead of the Parma area (Emilia-Romagna).</p>
<p>⁷ We do not include the specific model versions and hyperparameter choices of participants’ systems due to space constraints. ⁸ We thank the SCG team for providing us with this information. ⁹ “Dialettando” website: https://www.dialettando.com</p>
</sec>
<sec id="sec-2-6">
<title>5.2.2. Special track</title>
<p>We present the results divided by subtask below.</p>
<p>Subtask A: Coarse-grained geolocation — Official results on the test set for the areas chosen by participant teams in subtask A (i.e., the Tuscany-Lazio area and the Gallo-Italic area) are summarized in Table 4. As regards the Tuscany-Lazio area, the best run by the galliz team (run 3) achieved an improvement over the logistic regression baseline of 11.67 points in macro F1 score. They employed a solution similar to the one for the standard track, additionally leveraging lexicons relevant to the linguistic area under consideration (i.e., lemmas from the Vocabolario del Fiorentino Contemporaneo¹⁰ and a word list for the Romanesco dialect¹¹) and giving more weight to the BERT-based model. This confirms the usefulness of region-specific linguistic materials in the task. For the Gallo-Italic area, all runs by the SCG team fall between the two baselines we provided, but we are unfortunately unable to provide insights on their results.</p>
<p>Subtask B: Fine-grained geolocation — In Table 5, we report the results for the area chosen by participants in subtask B. As for subtask A, we however do not have enough information to further discuss SCG’s results.</p>
<p>¹⁰ “Vocabolario del Fiorentino Contemporaneo” website: https://www.vocabolariofiorentino.it ¹¹ Romanesco word list from “The Roman Post” website: https://www.theromanpost.com/2016/06/dizionario-dialetto-romanesco</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>6. Analysis and discussion</title>
<p>In this section, we analyze the approaches adopted by teams along several dimensions, providing a discussion and the insights derived from the shared task.</p>
      <sec id="sec-3-1">
        <title>External resources</title>
        <p>Some participants used external resources to integrate the available data for the task. Three teams (i.e., DANTE, galliz, and ba tti) used data from a website containing a series of stories, poems, idioms, recipes, and articles in different language varieties spoken across Italy (i.e., Dialettando). In addition, DANTE also leveraged Wikipedia articles written in some of the language varieties present in our data. Both DANTE and ba tti used additional data from the Italian Wikipedia. For the special track, galliz also used lemmas from both a vocabulary of contemporary Florentine and a webpage for the Romanesco dialect (cf. Section 5.2.2). While galliz and ba tti used external data to create vocabularies, DANTE used it for pre-training their models. All of the teams who used external resources outperformed both baselines in both subtasks, signaling that the use of external resources may indeed be pivotal in tackling the GeoLingIt task.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Models</title>
        <p>Apart from SCG, all participant teams used transformer-based language models for their runs. Salogni adopted an Italian RoBERTa-based model. DANTE and ba tti used versions of BERT pre-trained on Italian data, with the former using a much larger pre-training corpus than the latter, which might have impacted the DANTE runs ranking first in subtask A. In contrast, galliz employed an English pre-trained BERT model, which still outperformed the logistic regression baseline in subtask A for both tracks. This might indicate that subword tokenization in these models is suboptimal for the language varieties in DiatopIt, which naturally exhibits many non-Italian tokens with varied written forms, resulting in potentially small differences between Italian and English pre-trained models. Lastly, extremITA used a T5-based model pre-trained on Italian data and a LLaMA-based instruction-tuned model. Their results showed that recent large language models fine-tuned on disparate tasks are still far from tackling tasks such as GeoLingIt.</p>
        <p>Data augmentation — ba tti and galliz employed data augmentation techniques to artificially increase the amount of training data. galliz used external data to fine-tune an Italian word embeddings model, and then exploited it to swap randomly selected tokens with semantically close ones. The ba tti team, on the other hand, constructed a vocabulary using external resources and then used it to randomly substitute tokens with other tokens from the vocabulary. Both teams outperformed our baselines, showing that the augmentation and diversification of training data can be useful for the task.</p>
        <p>7. Conclusions — This paper provided an overview of GeoLingIt, the first shared task focused on the geolocation of linguistic variation in Italy. The task attracted wide interest from the community, registering 37 expressions of interest and 35 official runs. After presenting participants’ results and the adopted approaches, we outlined the main insights from the shared task. Besides natural language processing, we hope that GeoLingIt sensitized the community on the linguistic diversity of the country.</p>
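<p>The vocabulary-based substitution strategy used for data augmentation in this section can be sketched as follows; the vocabulary contents, substitution rate, and function shape are illustrative assumptions rather than the teams’ exact settings:</p>

```python
import random

def augment_by_substitution(text, vocabulary, rng, rate=0.1):
    """Create a synthetic post by randomly replacing tokens with items from an
    external vocabulary (e.g., built from sources such as Dialettando or
    Wikipedia editions in local language varieties)."""
    out = []
    for token in text.split():
        if rng.random() < rate:
            out.append(rng.choice(vocabulary))  # swap in an external lexical item
        else:
            out.append(token)  # keep the original token
    return " ".join(out)

# Example: rng = random.Random(0); each call yields a new synthetic variant.
```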
<p>Multi-task learning — Both ba tti and DANTE used multi-task learning in their submissions. While ba tti employed it during fine-tuning to exploit subtask A information to tackle subtask B and vice versa, DANTE used multi-task learning during a further stage of pre-training of a BERT-based model pre-trained on Italian data, which was then separately fine-tuned on subtasks A and B. Their pre-training setup consists of four tasks, including region-informed objectives, such as the prediction of the provenance region of posts and tokens. The approach followed by DANTE appears to lead to better performance in subtask A, whereas jointly training on both subtasks as done by ba tti seems to help in modeling fine-grained geolocation. Future work may shed light on how those approaches can help each other.</p>
<p>References</p>
<p>[2] A. Ramponi, NLP for language varieties of Italy: Challenges and the path forward, arXiv preprint arXiv:2209.09757 (2022). URL: https://arxiv.org/abs/2209.09757.</p>
<p>[3] F. Avolio, Lingue e dialetti d’Italia, Le Bussole, Carocci, Roma, Italy, 2009.</p>
<p>[4] J. Eisenstein, What to do about bad language on the internet, in: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Atlanta, Georgia, 2013, pp. 359–369.</p>
<p>[5] R. van der Goot, A. Ramponi, A. Zubiaga, B. Plank, B. Muller, I. San Vicente Roncal, N. Ljubešić, Ö. Çetinoğlu, R. Mahendra, T. Çolakoğlu, T. Baldwin, T. Caselli, W. Sidorenko, MultiLexNorm: A shared task on multilingual lexical normalization, in: Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), Association for Computational Linguistics, Online, 2021, pp. 493–509. URL: https://aclanthology.org/2021.wnut-1.55. doi:10.18653/v1/2021.wnut-1.55.</p>
<p>[6] G. Berruto, Quale dialetto per l’Italia del duemila? Aspetti dell’italianizzazione e risorgenze dialettali in Piemonte (e altrove), in: Lingua e dialetto nell’Italia del Duemila, Congedo, 2006, pp. 101–127.</p>
<p>[7] M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi, Evalita 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[8] A. Ramponi, C. Casula, DiatopIt: A corpus of social media posts for the study of diatopic language variation in Italy, in: Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), Association for Computational Linguistics, Dubrovnik, Croatia, 2023, pp. 187–199. URL: https://aclanthology.org/2023.vardial-1.19.</p>
<p>[9] B. Han, A. Rahimi, L. Derczynski, T. Baldwin, Twitter geolocation prediction shared task of the 2016 workshop on noisy user-generated text, in: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), The COLING 2016 Organizing Committee, Osaka, Japan, 2016, pp. 213–217.</p>
<p>[10] M. Gaman, D. Hovy, R. T. Ionescu, H. Jauhiainen, T. Jauhiainen, K. Lindén, N. Ljubešić, N. Partanen, C. Purschke, Y. Scherrer, M. Zampieri, A report on the VarDial evaluation campaign 2020, in: Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, International Committee on Computational Linguistics (ICCL), Barcelona, Spain (Online), 2020, pp. 1–14.</p>
<p>[11] B. R. Chakravarthi, G. Mihaela, R. T. Ionescu, H. Jauhiainen, T. Jauhiainen, K. Lindén, N. Ljubešić, N. Partanen, R. Priyadharshini, C. Purschke, E. Rajagopal, Y. Scherrer, M. Zampieri, Findings of the VarDial evaluation campaign 2021, in: Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, Association for Computational Linguistics, Kiyv, Ukraine, 2021, pp. 1–11.</p>
<p>[12] G. B. Pellegrini, Carta dei dialetti d’Italia, Profilo dei Dialetti Italiani, Pacini, Pisa, Italy, 1977.</p>
<p>[13] A. Ramponi, S. Tonelli, Features or spurious artifacts? Data-centric baselines for fair and robust hate speech detection, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 3027–3040. URL: https://aclanthology.org/2022.naacl-main.221. doi:10.18653/v1/2022.naacl-main.221.</p>
<p>[14] A. Koudounas, F. Giobergia, I. Benedetto, S. Monaco, L. Cagliero, D. Apiletti, E. Baralis, ba tti at GeoLingIt: Beyond boundaries, enhancing geolocation prediction and dialect classification on social media in Italy, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[15] G. Gallipoli, M. La Quatra, D. Rege Cambrin, S. Greco, L. Cagliero, DANTE at GeoLingIt: Dialect-aware multi-granularity pre-training for locating tweets within Italy, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[16] C. D. Hromei, D. Croce, V. Basile, R. Basili, extremITA at EVALITA 2023: Multi-task sustainable scaling to large language models at its extreme, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[17] T. Labruna, S. Gallo, Galliz at GeoLingIt: Enhancing BERT with vocabulary knowledge for predicting the region of language varieties of Italy, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[18] I. Salogni, Salogni at GeoLingIt: Geolocalization by fine-tuning BERT, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>