<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GeoLingIt at EVALITA 2023: Overview of the Geolocation of Linguistic Variation in Italy Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alan Ramponi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Camilla Casula</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler (FBK), Digital Humanities Unit - Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Trento, Department of Information Engineering and Computer Science - Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>GeoLingIt is the first shared task on geolocation of linguistic variation in Italy from social media posts comprising content in language varieties other than standard Italian (i.e., regional Italian, and languages and dialects of Italy). The task is articulated into two subtasks of increasing complexity for which only textual content is allowed: i) coarse-grained geolocation, aiming at predicting the region in which the variety expressed in the post is spoken, and ii) fine-grained geolocation, aiming at predicting its exact coordinates. Both subtasks can be either at the country level (standard track) or restricted to a linguistic area of choice (special track). GeoLingIt has attracted wide interest at the Evalita 2023 evaluation campaign with 37 registrations and 35 submitted runs. In this paper, we present the task and data, the evaluation criteria, the participants' results, an analysis of their approaches, and the main insights from the shared task.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural language processing</kwd>
        <kwd>computational sociolinguistics</kwd>
        <kwd>linguistic variation</kwd>
        <kwd>linguistic diversity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Italy is characterized by an astonishing linguistic diversity that makes it a unique landscape in Europe [1]. Besides standard Italian, a large number of local languages, their dialects, and regional varieties of standard Italian (i.e., regional Italian) are spoken across the country [2]. While Italian is employed in all formal settings in its standard form, in informal situations it is natural to observe Italian speakers use (even unwittingly) regional forms of Italian (e.g., guaglione, toso, and caruso for “young man”, typically in the Campania, Lombardy-Veneto, and Sicily areas, respectively), or to code-switch between their local language varieties and the national language.</p>
<p>Local languages and their dialects evolved from Vulgar Latin like Italian, and they mostly have no established orthography insofar as they are primarily used in spoken settings. On the other hand, regional forms of Italian derive from a geographical differentiation of Italian due to influences by the former [3], are largely used in both oral and written informal contexts, and typically follow Italian spelling conventions. When it comes to user-generated texts on social media, which are informal and feature linguistic patterns from spoken language [4, 5], we observe that not only is regional Italian naturally present, but local language varieties of Italy are also employed, albeit to various degrees. This can be attributed to their rediscovery as “additional expressive resources” [6], especially by the youngest generations. User-generated texts comprising language varieties other than standard Italian open opportunities for the study of linguistic variation in Italy, and can ultimately help in enriching and complementing linguistic atlases.</p>
<p>In this paper, we present GeoLingIt, the first shared task on geolocation of linguistic variation in Italy from social media posts from Twitter containing content other than standard Italian. GeoLingIt has been organized as part of the Evalita 2023 evaluation campaign [7], and relies on DiatopIt [8], a corpus of geolocated tweets exhibiting regional Italian use, code-switching between Italian and local language varieties, or fully written in the latter. Compared to previous geolocation shared tasks at international venues [9, 10, 11], GeoLingIt is focused on Italy and tailored to variation across language varieties, and it thus minimizes the effect of spurious, highly localized lexical items (e.g., mentions of events, places, or tourist attractions) on the prediction of linguistic areas. In the following, we present details on GeoLingIt, the results obtained by participant teams, and the main insights from the shared task.</p>
<p>EVALITA 2023: 8th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Sep 7–8, Parma, IT. * Corresponding author: alramponi@fbk.eu (A. Ramponi); ccasula@fbk.eu (C. Casula). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073).</p>
</sec>
<sec id="sec-task">
<title>2. Task description</title>
<p>The GeoLingIt shared task deals with the geolocation of linguistic variation in Italy from Twitter posts comprising content in language varieties other than standard Italian (i.e., regional Italian, and languages and dialects of Italy). It aims to advance the study of linguistic variation in Italy, provide means to complement qualitative-driven linguistic atlases, and sensitize the community on the rich linguistic landscape of the country.</p>
<sec id="sec-task-1">
<title>2.1. Standard and special tracks</title>
<p>GeoLingIt is organized into two tracks. In the standard track, the focus of the task is at the country level (i.e., comprising all language varieties of Italy), whereas in the special track, the task is restricted to a linguistic area chosen by participants¹ (e.g., the Gallo-Italic area, including language varieties spoken in the Piedmont, Lombardy, Liguria, and Emilia-Romagna regions) to favor the emergence of microvariation insights. For both tracks, two subtasks of increasing complexity are possible: coarse-grained geolocation (Section 2.2) and fine-grained geolocation (Section 2.3).</p>
<p>GeoLingIt is based on DiatopIt [8], a corpus of social media posts from Twitter specifically focused on language variation in Italy. All tweets have associated geolocation information and region labels, and have been sampled to contain either regional Italian usage or content in local language varieties of Italy. A multi-stage data collection process has been followed based on data-driven out-of-vocabulary tokens (from posts over a period of 2 years) which have been curated manually. Under-represented areas in the resulting posts have then been augmented by employing the lexical artifacts package [13]. The corpus consists of 15,039 posts from a 2-year time frame (from 2020-07-01 to 2022-06-30) to minimize period-related biases. For more details, we refer the reader to Ramponi and Casula (2023) [8].</p>
</sec>
      <sec id="sec-1-1">
<p>¹ Participants have been provided with the renowned linguistic map by Pellegrini (1977) [12] to encourage linguistically-grounded proposals, and requests have been approved based on the motivation and relevance of the area from a linguistics perspective.</p>
<p>² These are: Abruzzo, Aosta Valley, Apulia, Basilicata, Calabria, Campania, Emilia-Romagna, Friuli-Venezia Giulia, Lazio, Liguria, Lombardy, Marche, Molise, Piedmont, Sardinia, Sicily, Tuscany, Trentino-Alto Adige, Umbria, and Veneto.</p>
<p>³ Regions in the development set: Apulia, Calabria, Campania, Emilia-Romagna, Friuli-Venezia Giulia, Lazio, Liguria, Lombardy, Piedmont, Sardinia, Sicily, Tuscany, and Veneto.</p>
<p>⁴ Regions in the test set: the regions in the development set plus Abruzzo, Marche, Trentino-Alto Adige, and Umbria.</p>
        <sec id="sec-1-1-1">
          <title>2.2. Subtask A: Coarse-grained geolocation</title>
<p>Given the text of a tweet exhibiting regional Italian features or (partially or fully) written in local languages and dialects of Italy, predict the administrative region in which the variety expressed in the post is spoken. This is a classification task, i.e., one among n regions of Italy has to be predicted. In the case of the standard track, this matches all regions of Italy² (n = 20), whereas in the special track, it corresponds to the subset of m regions of the linguistic area under consideration (n = m). This subtask is applicable for the special track if m ≥ 2 regions are represented in the chosen area.</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>2.3. Subtask B: Fine-grained geolocation</title>
<p>Given the text of a tweet exhibiting regional Italian features or (partially or fully) written in local languages and dialects of Italy, predict the location, in terms of longitude and latitude coordinates, in which the variety expressed in the post is spoken. This is a double regression task, i.e., a pair of real-valued numbers has to be predicted. The difference between the standard and special tracks is here the extent of the area being considered. This subtask overcomes the simplification of coarse-grained geolocation (Section 2.2), aiming to uncover fine-grained linguistic variation. Indeed, language varieties of Italy lie on a continuum and often cross administrative region borders.</p>
</sec>
<sec id="sec-data">
<title>3. Data</title>
<p>Data splits — During the development stage, participant teams are provided with the original training and development splits of DiatopIt. These splits consist of 13,669 and 552 examples, respectively. While the training set comprises content from all over the country, the development set contains data from 13 out of 20 regions.³ Teams are allowed to use alternative splits and even augment the dataset at will, with the only constraint of not using external Twitter data, since some tweets may be part of the test set. The (unlabeled) test set is then released during the evaluation window to allow teams to submit their predictions, and comprises 818 examples from the same regions as the development set plus examples from 1 ≤ k ≤ 7 additional regions unknown to participants during both the development and evaluation stages. At the end of the evaluation window, the k = 4 additional regions in the test set have been communicated to participants.⁴ Splits match the original data partitions of DiatopIt; we thus refer the reader to Ramponi and Casula (2023) [8] for details on statistics and distribution.</p>
<p>Data format — The corpus splits are in the form of tsv files, i.e., a tab-separated format, with one example per line and the first line as a header. Each example has id and text columns. For the coarse-grained geolocation subtask, data files additionally include a region column, whereas data files for the fine-grained geolocation subtask include latitude and longitude columns. As a result, the instances in both subtasks are the same, and differ according to the label column(s). The content of such columns is described below:</p>
<p>• id: a unique identifier, different from the original tweet identifier to preserve users’ anonymity;</p>
<p>• text: the text of the tweet, with anonymized user mentions, email addresses, URLs, and location strings deriving from cross-platform posting;</p>
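<p>A split in the format above can be loaded with standard tooling; the following is an illustrative sketch (the file names in the usage comment are hypothetical, not those of the official release):</p>

```python
import csv

def read_split(path, label_cols):
    """Read a GeoLingIt-style tsv split: header line, then one example per line."""
    examples = []
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            example = {"id": row["id"], "text": row["text"]}
            # Subtask A files carry a `region` column; subtask B files
            # carry `latitude` and `longitude` columns instead.
            for col in label_cols:
                example[col] = row[col]
            examples.append(example)
    return examples

# Hypothetical usage (illustrative file names):
# train_a = read_split("train_a.tsv", label_cols=["region"])
# train_b = read_split("train_b.tsv", label_cols=["latitude", "longitude"])
```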
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>4. Evaluation</title>
        <p>During the evaluation phase, participant teams are allowed to submit up to 3 runs (i.e., predictions on the unlabeled test set) for each track and subtask. In all setups, only textual content can be used. We here present the metrics used for assessing the performance of runs (Section 4.1) and the baselines we provide (Section 4.2). We use the same baselines for both tracks: for subtask A, a most frequent baseline and a logistic regression baseline; for subtask B, a centroid baseline and a k-nearest neighbors baseline.</p>
        <sec id="sec-1-2-1">
          <title>4.1. Metrics</title>
          <p>Due to the different nature of coarse-grained geolocation and fine-grained geolocation, we employ different evaluation metrics for the two subtasks. Subtask-specific metrics are the same for both the standard and special tracks.</p>
          <p>Subtask A — The submitted runs are evaluated using macro-averaged precision, recall, and F1 score on the n regions of Italy under consideration. For the standard track, this matches all the administrative regions in the test set (n = 17, cf. Section 3, “Data splits”), whereas for the special track, it corresponds to the m regions in the chosen linguistic area that are also represented in the test set (n = m, cf. Section 2.2). Runs are ranked by macro F1 score and presented in separate rankings (i.e., one for the standard track, and one for each chosen subset of administrative regions in the special track).</p>
        </sec>
      </sec>
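<p>For concreteness, the subtask A metric can be sketched in plain Python as below. The subtask B metric text is not fully recoverable here; since footnote 5 references the haversine package, we assume fine-grained runs are scored by the haversine distance (in km) between predicted and gold coordinates, and include such a distance function as well. Both functions are illustrative sketches, not the official scorer.</p>

```python
from collections import defaultdict
from math import radians, sin, cos, asin, sqrt

def macro_f1(gold, pred):
    """Macro-averaged F1 over the regions present in the gold labels."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1
            fp[p] += 1
    f1s = []
    for region in set(gold):
        prec = tp[region] / (tp[region] + fp[region]) if tp[region] + fp[region] else 0.0
        rec = tp[region] / (tp[region] + fn[region]) if tp[region] + fn[region] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))
```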
      <sec id="sec-1-3">
        <title>Most frequent</title>
        <p>A baseline that always guesses the most frequent administrative region in the training set (i.e., Lazio) for all test set instances.</p>
      </sec>
      <sec id="sec-1-4">
        <title>Logistic regression</title>
        <p>A machine learning classifier with default scikit-learn (v1.2.2)⁶ hyperparameters that employs a count vectorizer with unigrams for feature extraction and operates on original text casing.</p>
        <p>Centroid — A baseline that computes the center point (in terms of latitude and longitude) of the training set and predicts it for all test instances.</p>
        <p>k-nearest neighbors (kNN) — A machine learning regressor with default scikit-learn hyperparameters that employs a count vectorizer with unigrams for feature extraction and operates on original text casing.</p>
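<p>The two non-learned baselines are simple enough to reproduce directly from their descriptions; a minimal sketch follows (the learned baselines additionally require scikit-learn’s count vectorizer pipeline, as described above; we take the “center point” to be the coordinate-wise mean, which is an assumption):</p>

```python
from collections import Counter

def most_frequent_baseline(train_regions, n_test):
    """Subtask A: always predict the most frequent training region (i.e., Lazio)."""
    majority = Counter(train_regions).most_common(1)[0][0]
    return [majority] * n_test

def centroid_baseline(train_coords, n_test):
    """Subtask B: predict the mean (latitude, longitude) of the training set.

    The exact definition of the center point used by the organizers may differ.
    """
    lats = [lat for lat, _ in train_coords]
    lons = [lon for _, lon in train_coords]
    centroid = (sum(lats) / len(lats), sum(lons) / len(lons))
    return [centroid] * n_test
```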
      </sec>
    </sec>
    <sec id="sec-2">
      <title>5. Participants and results</title>
      <p>A total of 35 runs have been submitted to the GeoLingIt shared task: 26 runs (6 teams) for the standard track and 9 runs (2 teams) for the special track. Specifically, for the standard track we received 14 runs (5 teams) for subtask A and 12 runs (5 teams) for subtask B, whereas for the special track 6 runs (2 teams) have been submitted for subtask A (i.e., the Tuscany-Lazio area and the Gallo-Italic area) and 3 runs (1 team) have been tailored to subtask B (i.e., the Gallo-Italic area). Overall, GeoLingIt has been one of the most participated shared tasks at Evalita 2023 [7] and attracted the interest of heterogeneously composed teams of up to 7 individuals, from master students to senior academic researchers.</p>
      <p>⁵ haversine package: https://github.com/mapado/haversine. ⁶ scikit-learn library: https://scikit-learn.org.</p>
      <sec id="sec-2-3">
        <sec id="sec-2-3-1">
          <title>5.1. Overview of participant teams</title>
          <p>In the following, we provide a summary of the
approaches employed by participant teams. We refer the
reader to their description papers for additional details.7
galliz [17] The team proposed a hybrid approach for
subtask A, and participated in both the standard track
and special track. Specifically, they combined the
predictions given by i) an English pre-trained BERT classifier,
previously fine-tuned on augmented GeoLingIt training
data, and ii) a dictionary-based algorithm derived from
external lexical sources. They then tested diferent
hyperparameter setups. As regards data augmentation, the
team fine-tuned an Italian word embedding model on
the training set, and leveraged word vector similarities
to create new training examples by substituting a single
word per post with a close word in the embedding space.</p>
<p>Salogni [18] — The team tested different transformer-based models pre-trained on Italian texts, with a set of hyperparameter settings (e.g., hidden layers, activation functions). They submitted a single run for the standard track, subtask B, based on an UmBERTo language model.</p>
<p>ba tti [14] — The team participated in both subtasks for the standard track. For subtask A, they experimented with multi-task learning, an ensemble of transformer-based and logistic regression models, and contrastive pre-training of a BERT-based Italian model on augmented subtask data. Augmentation uses a vocabulary built from online sources to create examples by randomly substituting words with lexical items from varieties spoken in the same or different regions. For subtask B, they leveraged data from both subtasks in a multi-task setting using either a BERT-based Italian model or the model that underwent continuous pre-training in subtask A, also testing a rectification module to adjust predictions outside land to the closest point within Italy’s boundaries.</p>
<p>SCG — The team participated in both tracks and experimented with logistic regression and support vector machines for subtask A, and linear regression and kNN regression for subtask B.⁸ They did not submit a report and we are thus unable to discuss their approach further.</p>
</sec>
<sec id="sec-2-3-2">
<title>5.2. Results</title>
<p>In this section, we summarize the results of participant teams in both subtasks A and B for the standard track (Section 5.2.1) and the special track (Section 5.2.2).</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>DANTE [15]</title>
        <p>The team focused on further pre-training BERT-based Italian language models and participated in both subtasks A and B for the standard track. Specifically, they experimented with two multi-task pre-training setups, namely task-specific learning and joint learning, with dialect and token classification objectives, using texts collected from external sources. Fine-tuning is then done in a single-task setup on the relevant subtask data. In both subtasks, they also proposed ensembles of their best-performing models.</p>
        <p>extremITA [16] — The team proposed two one-for-all models, designed to tackle all the challenges at Evalita 2023. The first model is based on the IT5 encoder-decoder architecture, whereas the second one is an instruction-tuned model built upon LLaMA. For fine-tuning, they used data from all Evalita 2023 challenges and encoded the tasks as prompts. The team submitted a run per model for both subtasks of the standard track.</p>
      </sec>
      <sec id="sec-2-5">
<title>5.2.1. Standard track</title>
<p>We present the results divided by subtask below.</p>
<p>Subtask A: Coarse-grained geolocation — In Table 2, we report the results on the test set for all runs submitted by teams participating in subtask A, ranked by macro F1. All runs by the DANTE team obtained the best results in the subtask, with improvements ranging from 5.52 to 10.10 macro F1 points compared to the best run by the team that ranked second (galliz). The best-performing system by DANTE (run 3) is an ensemble of transformer-based classifiers originally pre-trained on Italian texts, which have been further pre-trained in a multi-task fashion on external data from Dialettando⁹ and Wikipedia editions for local language varieties of Italy with region-centric objectives. The best submission by galliz (run 1) is an equally-weighted ensemble of a dictionary-based algorithm (based on Dialettando and GeoLingIt) and an English BERT model fine-tuned on augmented subtask A data, whereas the best run for ba tti (run 2) relies on a transformer-based classifier, pre-trained on Italian texts, that has been further pre-trained in a contrastive learning fashion with subtask A data, preemptively augmented with a word substitution approach based on a vocabulary derived from Dialettando and Wikipedia content. While all teams outperformed the most frequent baseline, all runs by the extremITA and SCG teams achieved worse results than the logistic regression baseline.</p>
<p>From a closer look, we observe that F1 scores obtained by participants’ runs greatly differ across regions (Figure 1). Campania, Lazio, Sardinia, Sicily, and Veneto are the easiest to classify. As expected, Abruzzo, Marche, Trentino-Alto Adige, and Umbria are instead among the regions with the lowest scores on average. This is mainly because posts from those regions have been excluded on purpose from the development set, and only few tweets are available in the training set, making traditional learning and tuning challenging. As a result, most instances from those regions are typically classified as neighboring regions in which similar varieties are spoken (e.g., posts comprising content in Trentino as spoken in the province of Trento – whose linguistic features exhibit traits of continuity between Lombard and Venetian [12] – are classified as Lombardy and Veneto, respectively).</p>
<p>Moreover, Friuli-Venezia Giulia and Apulia exhibit low scores on average across runs despite being represented in all data splits. The reason behind this has to be researched in linguistics rather than computation. Besides Friulian, Slovene, and German varieties, in Friuli-Venezia Giulia varieties of Venetian are also spoken (e.g., the Triestino variety) [12], and thus posts comprising the latter are easily misclassified with the region in which Venetian is predominantly used (i.e., Veneto). On the other hand, Salentino varieties as spoken in the southern part of Apulia are part of the extreme southern varieties group [12], which also includes Sicilian, and thus cause a large fraction of posts from Apulia to be misclassified as Sicily [8]. Besides the limitations of subtask A, this highlights that NLP should eventually go beyond “raw modeling” and start considering again linguistics as its foundation.</p>
<p>Subtask B: Fine-grained geolocation — Test set results for all submitted runs in subtask B are reported in Table 3. All teams except SCG outperformed both baselines. The ba tti team obtained the best results with two out of three submissions (i.e., runs 3 and 1). Their best run relies on multi-task learning on subtask A and B data, and uses geography-informed postprocessing to ensure that predictions fall inside the country borders. DANTE’s runs adopted methods similar to those employed in subtask A with separate layers for regression, ranking third with a model ensemble (run 3). Salogni’s run is based on UmBERTo fine-tuning, whereas the best run by extremITA is based on IT5 trained to generate region labels. By looking at predictions by models that outperformed both baselines, we observe that, on average, errors range from 0.89 km to 668.11 km, with a median of 58.77 km. Errors are typically due to lexical items that are highly represented in other locations, e.g., posts with “ghe mel” (en: “of course”, Parmigiano variety) fall in the Treviso area (Veneto) instead of the Parma area (Emilia-Romagna).</p>
<p>⁷ We do not include the specific model versions and hyperparameter choices of participants’ systems due to space constraints. ⁸ We thank the SCG team for providing us with this information. ⁹ “Dialettando” website: https://www.dialettando.com</p>
</sec>
<sec id="sec-2-6">
<title>5.2.2. Special track</title>
<p>We present the results divided by subtask below.</p>
<p>Subtask A: Coarse-grained geolocation — Official results on the test set for the areas chosen by participant teams in subtask A (i.e., the Tuscany-Lazio area and the Gallo-Italic area) are summarized in Table 4. As regards the Tuscany-Lazio area, the best run by the galliz team (run 3) achieved an improvement over the logistic regression baseline of 11.67 points in macro F1 score. They employed a solution similar to the one for the standard track, additionally leveraging lexicons relevant to the linguistic area under consideration (i.e., lemmas from the Vocabolario del Fiorentino Contemporaneo¹⁰ and a word list for the Romanesco dialect¹¹) and giving more weight to the BERT-based model. This confirms the usefulness of region-specific linguistic materials in the task. For the Gallo-Italic area, all runs by the SCG team fall between the two baselines we provided, but we are unfortunately unable to provide insights on their results.</p>
<p>Subtask B: Fine-grained geolocation — In Table 5, we report the results for the area chosen by participants in subtask B. As for subtask A, we however do not have enough information to further discuss SCG’s results.</p>
<p>¹⁰ “Vocabolario del Fiorentino Contemporaneo” website: https://www.vocabolariofiorentino.it ¹¹ Romanesco word list from “The Roman Post” website: https://www.theromanpost.com/2016/06/dizionario-dialetto-romanesco</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>6. Analysis and discussion</title>
<p>In this section, we analyze the approaches adopted by teams along several dimensions, providing a discussion and the insights derived from the shared task.</p>
      <sec id="sec-3-1">
        <title>External resources</title>
        <p>Some participants used external resources to integrate the available data for the task. Three teams (i.e., DANTE, galliz, and ba tti) used data from a website containing a series of stories, poems, idioms, recipes, and articles in different language varieties spoken across Italy (i.e., Dialettando). In addition, DANTE also leveraged Wikipedia articles written in some of the language varieties present in our data. Both DANTE and ba tti used additional data from the Italian Wikipedia. For the special track, galliz also used lemmas from both a vocabulary of contemporary Florentine and a webpage for the Romanesco dialect (cf. Section 5.2.2). While galliz and ba tti used external data to create vocabularies, DANTE used it for pre-training their models. All of the teams who used external resources outperformed both baselines in both subtasks, signaling that the use of external resources may indeed be pivotal in tackling the GeoLingIt task.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Models</title>
        <p>Apart from SCG, all participant teams used transformer-based language models for their runs. Salogni adopted an Italian RoBERTa-based model. DANTE and ba tti used versions of BERT pre-trained on Italian data, with the former using a much larger pre-training corpus than the latter, which might have impacted the DANTE runs ranking first in subtask A. In contrast, galliz employed an English pre-trained BERT model, which still outperformed the logistic regression baseline in subtask A for both tracks. This might indicate that subword tokenization in these models is suboptimal for the language varieties in DiatopIt, which naturally exhibits many non-Italian tokens with varied written forms, resulting in potentially small differences between Italian and English pre-trained models. Lastly, extremITA used a T5-based model pre-trained on Italian data and a LLaMA-based instruction-tuned model. Their results showed that recent large language models fine-tuned on disparate tasks are still far from tackling tasks such as GeoLingIt.</p>
        <p>Data augmentation — ba tti and galliz employed data augmentation techniques to artificially increase the amount of training data. galliz used external data to fine-tune an Italian word embeddings model, and then exploited it to swap randomly selected tokens with semantically close ones. The ba tti team, on the other hand, constructed a vocabulary using external resources and then used it to randomly substitute tokens with other tokens from the vocabulary. Both teams outperformed our baselines, showing that the augmentation and diversification of training data can be useful for the task.</p>
        <p>7. Conclusions — This paper provided an overview of GeoLingIt, the first shared task focused on the geolocation of linguistic variation in Italy. The task attracted wide interest from the community, registering 37 expressions of interest and 35 official runs. After presenting participants’ results and the adopted approaches, we outlined the main insights from the shared task. Besides natural language processing, we hope that GeoLingIt sensitized the community on the linguistic diversity of the country.</p>
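<p>The vocabulary-based substitution strategy used for data augmentation in this section can be sketched as follows; the vocabulary contents, substitution rate, and function shape are illustrative assumptions rather than the teams’ exact settings:</p>

```python
import random

def augment_by_substitution(text, vocabulary, rng, rate=0.1):
    """Create a synthetic post by randomly replacing tokens with items from an
    external vocabulary (e.g., built from sources such as Dialettando or
    Wikipedia editions in local language varieties)."""
    out = []
    for token in text.split():
        if rng.random() < rate:
            out.append(rng.choice(vocabulary))  # swap in an external lexical item
        else:
            out.append(token)  # keep the original token
    return " ".join(out)

# Example: rng = random.Random(0); each call yields a new synthetic variant.
```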
<p>Multi-task learning — Both ba tti and DANTE used multi-task learning in their submissions. While ba tti employed it during fine-tuning to exploit subtask A information to tackle subtask B and vice versa, DANTE used multi-task learning during a further stage of pre-training of a BERT-based model pre-trained on Italian data, which was then separately fine-tuned on subtasks A and B. Their pre-training setup consists of four tasks, including region-informed objectives, such as the prediction of the provenance region of posts and tokens. The approach followed by DANTE appears to lead to better performance in subtask A, whereas jointly training on both subtasks as done by ba tti seems to help in modeling fine-grained geolocation. Future work may shed light on how those approaches can help each other.</p>
<p>References</p>
<p>[2] A. Ramponi, NLP for language varieties of Italy: Challenges and the path forward, arXiv preprint arXiv:2209.09757 (2022). URL: https://arxiv.org/abs/2209.09757.</p>
<p>[3] F. Avolio, Lingue e dialetti d’Italia, Le Bussole, Carocci, Roma, Italy, 2009.</p>
<p>[4] J. Eisenstein, What to do about bad language on the internet, in: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Atlanta, Georgia, 2013, pp. 359–369.</p>
<p>[5] R. van der Goot, A. Ramponi, A. Zubiaga, B. Plank, B. Muller, I. San Vicente Roncal, N. Ljubešić, Ö. Çetinoğlu, R. Mahendra, T. Çolakoğlu, T. Baldwin, T. Caselli, W. Sidorenko, MultiLexNorm: A shared task on multilingual lexical normalization, in: Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), Association for Computational Linguistics, Online, 2021, pp. 493–509. URL: https://aclanthology.org/2021.wnut-1.55. doi:10.18653/v1/2021.wnut-1.55.</p>
<p>[6] G. Berruto, Quale dialetto per l’Italia del duemila? Aspetti dell’italianizzazione e risorgenze dialettali in Piemonte (e altrove), in: Lingua e dialetto nell’Italia del Duemila, Congedo, 2006, pp. 101–127.</p>
<p>[7] M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi, Evalita 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[8] A. Ramponi, C. Casula, DiatopIt: A corpus of social media posts for the study of diatopic language variation in Italy, in: Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), Association for Computational Linguistics, Dubrovnik, Croatia, 2023, pp. 187–199. URL: https://aclanthology.org/2023.vardial-1.19.</p>
<p>[9] B. Han, A. Rahimi, L. Derczynski, T. Baldwin, Twitter geolocation prediction shared task of the 2016 workshop on noisy user-generated text, in: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), The COLING 2016 Organizing Committee, Osaka, Japan, 2016, pp. 213–217.</p>
<p>[10] M. Gaman, D. Hovy, R. T. Ionescu, H. Jauhiainen, T. Jauhiainen, K. Lindén, N. Ljubešić, N. Partanen, C. Purschke, Y. Scherrer, M. Zampieri, A report on the VarDial evaluation campaign 2020, in: Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, International Committee on Computational Linguistics (ICCL), Barcelona, Spain (Online), 2020, pp. 1–14.</p>
<p>[11] B. R. Chakravarthi, G. Mihaela, R. T. Ionescu, H. Jauhiainen, T. Jauhiainen, K. Lindén, N. Ljubešić, N. Partanen, R. Priyadharshini, C. Purschke, E. Rajagopal, Y. Scherrer, M. Zampieri, Findings of the VarDial evaluation campaign 2021, in: Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, Association for Computational Linguistics, Kiyv, Ukraine, 2021, pp. 1–11.</p>
<p>[12] G. B. Pellegrini, Carta dei dialetti d’Italia, Profilo dei Dialetti Italiani, Pacini, Pisa, Italy, 1977.</p>
<p>[13] A. Ramponi, S. Tonelli, Features or spurious artifacts? Data-centric baselines for fair and robust hate speech detection, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 3027–3040. URL: https://aclanthology.org/2022.naacl-main.221. doi:10.18653/v1/2022.naacl-main.221.</p>
<p>[14] A. Koudounas, F. Giobergia, I. Benedetto, S. Monaco, L. Cagliero, D. Apiletti, E. Baralis, ba tti at GeoLingIt: Beyond boundaries, enhancing geolocation prediction and dialect classification on social media in Italy, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[15] G. Gallipoli, M. La Quatra, D. Rege Cambrin, S. Greco, L. Cagliero, DANTE at GeoLingIt: Dialect-aware multi-granularity pre-training for locating tweets within Italy, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[16] C. D. Hromei, D. Croce, V. Basile, R. Basili, extremITA at EVALITA 2023: Multi-task sustainable scaling to large language models at its extreme, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[17] T. Labruna, S. Gallo, Galliz at GeoLingIt: Enhancing BERT with vocabulary knowledge for predicting the region of language varieties of Italy, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[18] I. Salogni, Salogni at GeoLingIt: Geolocalization by fine-tuning BERT, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>