<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Salogni at GeoLingIt: Geolocalization by Fine-tuning BERT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ilaria Salogni</string-name>
          <email>i.salogni@studenti.unipi.it</email>
          <aff>Università di Pisa, Italy</aff>
        </contrib>
      </contrib-group>
      <abstract>
        <p>The recent growing interest in low-resource languages has been significantly bolstered by transformer-based models. By fine-tuning three such models, two based on BERT and the other on RoBERTa, I aim to geolocate sequences exhibiting non-standard language varieties relying solely on linguistic content. I find that, given that the information contained in the embeddings is all we need to carry out this complex task, a model architecture with fewer task-specific layers leads to better results. Furthermore, models pre-trained on miscellaneous corpora generalize better than those trained exclusively on tweets. The work also shows that the greater availability of resources for a given regional variety positively affects the capacity of the model.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Task</title>
        <p>
          The goal of this project is to predict the location, in terms of longitude and latitude coordinates (fine-grained geolocation), of tweets exhibiting non-standard language, based solely on linguistic content. This is a (double) regression task. In contrast to previous geolocation shared tasks on other areas ([<xref ref-type="bibr" rid="ref6">6</xref>]; [<xref ref-type="bibr" rid="ref7">7</xref>]; [<xref ref-type="bibr" rid="ref8">8</xref>]), GeoLingIt is focused on Italy.
        </p>
        <p>
          Recognizing varieties and forming an opinion about where the speaker comes from is something so ingrained in our experience as speakers that it seems innate, and even a little magical. The question that drives this work is: can Large Language Models (LLMs) do what we do and, if so, how well can they do it? Do they do it in a way that is operationally similar to ours? The Italian scenario is a good testing ground since, despite its limited geographical extent, it is one of the most linguistically diverse areas in Europe. In their work, Ramponi and Casula say that current transformer-based models are rather limited for modeling language variation over space in highly multilingual areas such as Italy [<xref ref-type="bibr" rid="ref1">1</xref>]. I do not agree completely, not only because of the encouraging results of the application of LLMs to an ever-growing number of tasks, but also because what we can explain about how they work does not highlight anything that may prevent good performance. Furthermore, the work of Lutsai and Lampert [<xref ref-type="bibr" rid="ref2">2</xref>] reaches the astonishing result of a median error of 30 km at the worldwide level, and of fewer than 15 km on US-level datasets, for models trained and evaluated on text features of tweets' content and metadata context using a BERT model [<xref ref-type="bibr" rid="ref3">3</xref>]. The fact that the Twitter language identifier assigns the label designed for the standard Italian language also to content partially or fully written in language varieties of Italy, as observed again by Ramponi and Casula [<xref ref-type="bibr" rid="ref1">1</xref>], may suggest that LLMs already have in their pre-training dataset the knowledge that they need to carry out a geolocalization task.
        </p>
        <p>
          This document describes the model I submitted to the EVALITA 2023 evaluation campaign [<xref ref-type="bibr" rid="ref4">4</xref>] for the GeoLingIt task [<xref ref-type="bibr" rid="ref5">5</xref>].
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Dataset</title>
        <p>
          The GeoLingIt task data comprise 15K geotagged tweets that exhibit non-standard Italian language use (the content may be fully written in local language varieties or exhibit code-switching with standard Italian) and that have been collected in the DiatopIt corpus [<xref ref-type="bibr" rid="ref1">1</xref>]. The data are annotated with latitude and longitude. After removing emojis and tags, all the labeled data provided by the organizers were merged and then split into train-eval-test sets. Several cross-validations were performed with 3-fold or 2-fold splits, using the train-eval sets. Target and output coordinate data were normalized using min-max scaling, as this understandably improved the quality of model prediction.
        </p>
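        <p>
          As a rough illustration of the normalization step, the sketch below min-max scales (latitude, longitude) pairs to [0, 1] and maps model outputs back to degrees. The example coordinates and the use of scikit-learn are illustrative assumptions, not the exact code behind the submission.
        </p>
        <preformat>
# Hedged sketch: min-max normalization of the regression targets, as in
# Section 1.2. The example coordinates are illustrative, not from DiatopIt.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

coords = np.array([[45.07, 7.69],    # (lat, lon), e.g. Turin
                   [40.85, 14.27],   # e.g. Naples
                   [38.12, 13.36]])  # e.g. Palermo

scaler = MinMaxScaler()                 # maps each column to [0, 1]
targets = scaler.fit_transform(coords)  # normalized training targets

# Model outputs live in the same [0, 1] space; map them back to degrees
# before computing distances in km.
pred_degrees = scaler.inverse_transform(targets)
        </preformat>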
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. System description</title>
      <p>
        Knowing that the representations learned by transformer-based models achieve strong performance across many tasks and datasets ([<xref ref-type="bibr" rid="ref9">9</xref>], inter alia), I first decided to fine-tune three different monolingual BERT-based [<xref ref-type="bibr" rid="ref3">3</xref>] or RoBERTa-based [<xref ref-type="bibr" rid="ref10">10</xref>] models pre-trained on Italian texts. After picking the best-performing model, I cross-validated it on a diverse set of hyperparameter configurations (e.g., number and size of hidden layers, activation functions) to pick the best task-specific architecture. All the runs were performed on Colab using high-RAM Nvidia A100 GPUs.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Comparing different models</title>
        <p>
          To assess how different pre-trained monolingual models perform on the given dataset, I fine-tuned the RoBERTa-based umberto-commoncrawl-cased-v1 [<xref ref-type="bibr" rid="ref11">11</xref>] model and the BERT-based models bert-base-italian-cased [<xref ref-type="bibr" rid="ref12">12</xref>] and AlBERTo-it [<xref ref-type="bibr" rid="ref13">13</xref>], adding to the pooling layer of each model a single linear layer with two output neurons and no activation function. I tested this "minimal" task-specific architecture with 3 batch sizes (50, 100, 150) for 10 epochs, dividing the train-dev set into 3 folds.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Adding a single hidden layer</title>
        <p>
          To explore the potential benefits of introducing additional complexity into the model, I designed a new task-specific architecture by adding a single hidden layer right after the pooling layer, testing different sizes (5 neurons and then 300 neurons) followed by an activation function (Identity, Sigmoid, or ReLU) and, finally, a two-neuron output layer. To reduce the computational cost, only umberto-commoncrawl-cased-v1 was tested with this and the next architectures; for the same reason, batch size 50 was maintained. A configurable sketch of these deeper heads is shown below.
        </p>
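        <p>
          The deeper heads of this and the next section can be expressed as one configurable builder; the following is a hypothetical reconstruction rather than the submitted code (torch.nn.Identity stands for the Identity activation).
        </p>
        <preformat>
# Hedged sketch of the hidden-layer heads (Sections 2.2-2.3).
import torch

def build_head(in_size, hidden_sizes, activation):
    """e.g. build_head(768, (300,), torch.nn.ReLU)
       or   build_head(768, (5, 5, 10), torch.nn.Sigmoid)."""
    layers, size = [], in_size
    for h in hidden_sizes:
        layers += [torch.nn.Linear(size, h), activation()]
        size = h
    layers.append(torch.nn.Linear(size, 2))  # two-neuron output layer
    return torch.nn.Sequential(*layers)

single_hidden = build_head(768, (300,), torch.nn.Identity)
three_hidden  = build_head(768, (5, 5, 10), torch.nn.ReLU)
        </preformat>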
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Adding more hidden layers</title>
        <p>
          Still with the rationale of knowing whether adding further complexity would enhance the model's learning capacity, I tested a task-specific setting with 3 hidden layers, with neuron counts in the combinations (5, 5, 10), (10, 5, 5) and (300,
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>
        The umberto-commoncrawl-cased-v1 model with the "minimal" task-specific architecture yielded the best Mean Absolute Error (MAE) in 3-fold cross-validation using the provided labelled data, and achieved an average distance of 128.19 km on the blind test set provided by the challenge organizers. Although AlBERTo-it and bert-base-italian-cased were outperformed, the results they achieved are not too distant, as shown in Table 1.
      </p>
      <p>
        The second-best MAE results were achieved using the task-specific architecture with a single 300-neuron hidden layer, followed by the architecture with a single 5-neuron hidden layer, as shown in Table 2. This can be explained by observing that adding a small hidden layer after the pooling layer leads to an initial drastic reduction in the size of the model output. The worst results, on the other hand, were all obtained with the 3-hidden-layer architecture and ReLU as activation function. The accuracy dropped possibly because of excessive feature compression: when several hidden layers are stacked, this reduction is followed by a further reduction of the size of the input vector, and the linear activation function was of no use in this case. Therefore, further complicating the architecture requires an additional regularization effort, which the results achieved with only one hidden layer or even
      </p>
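      <p>
        The "average distance in km" reported above is naturally computed as a mean great-circle distance; the sketch below uses the standard haversine formula and is not necessarily the organizers' evaluation script.
      </p>
      <preformat>
# Hedged sketch: mean haversine distance in km between gold and
# predicted (lat, lon) pairs, both given in degrees.
import numpy as np

def mean_km_distance(gold, pred, radius_km=6371.0):
    lat1, lon1 = np.radians(gold).T
    lat2, lon2 = np.radians(pred).T
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return float(np.mean(2 * radius_km * np.arcsin(np.sqrt(a))))
      </preformat>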
      <p>
        There are no specific areas where the inputs have a larger error. In contrast, inputs from the Piedmont-Lombardy-Veneto and Lazio-Campania areas have a lower error than the others. In fact, two or three marked clusters can be observed in the scatterplots of the outputs (Figure 2), depending on the model configuration: the most persistent is between Lazio and Campania, then comes a cluster that follows the Alpine arc and, less frequently, a cluster over Sicily. Excluding that this can be attributed to an imbalance in our fine-tuning dataset, this result must come from the representation of the embeddings of each model. Ramponi and Casula [<xref ref-type="bibr" rid="ref1">1</xref>] argue that the pre-training material used by those models may include content in language varieties of Italy, and they attribute this to the over-prediction of Italian by current language identifiers, observing that content both partially and fully written in language varieties of Italy is typically classified as standard Italian by the Twitter language identifier. I can further hypothesize that the varieties from the areas with the smallest error are also quantitatively more present in the pre-training dataset of each model, as these are also the varieties from the most densely populated areas in Italy.
      </p>
      <fig id="fig-2">
        <caption>
          <p>Figure 2: Scatterplots of the target (in black) and output (in red) coordinates for each configuration.</p>
        </caption>
      </fig>
      <p>
        However, it is very complex to connect these observations to one or more linguistic facts concerning the Italian regional varieties. The question then is how did
      </p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusions</title>
      <p>
        The behavior shown by our models (the need for regularization in the presence of numerous layers, better results with a single, bigger hidden layer) is what we would expect from a simple neural network. However, it is astonishing that such a simple architecture manages to obtain non-disastrous results in a complex NLP task. The success of this regression task is undoubtedly attributable to the high-level representations of the input data, together with BERT's ability to understand the linguistic context. Therefore, less is more: a simple setup, using even just two output neurons, seems to work better than a more complex one for BERT models fine-tuned on this task. Furthermore, in this work the models pre-trained on a miscellaneous corpus provided embeddings that performed better on tweets than those pre-trained on a corpus of the same genre.<fn id="fn1"><p>The OSCAR [<xref ref-type="bibr" rid="ref15">15</xref>] subcorpus also has some subsets in other language varieties of Italy (such as Piedmontese), but the official umberto-commoncrawl-cased-v1 model card says that it was pre-trained only on the Italian subcorpus, deduplicated.</p></fn> In conclusion, it is difficult to say how close we came to the goal, if the goal was to adequately map the diatopic variation of contemporary Italian, trying to automatically extract regional and dialectal patterns.
      </p>
      <p>
        Even if in this work we were unable to further probe the linguistic information used to carry out our task, the relevant studies converge in holding that BERT's structure is nonetheless linguistically grounded, although perhaps in a way that is more nuanced than can be explained by layers alone [<xref ref-type="bibr" rid="ref17">17</xref>].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. Ramponi, C. Casula, DiatopIt: A corpus of social media posts for the study of diatopic language variation in Italy, in: Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), Association for Computational Linguistics, Dubrovnik, Croatia, 2023, pp. 187-199. URL: https://aclanthology.org/2023.vardial-1.19.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] K. Lutsai, C. H. Lampert, Geolocation predicting of tweets using BERT-based models, 2023. arXiv:2303.07865.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171-4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi, EVALITA 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] A. Ramponi, C. Casula, GeoLingIt at EVALITA 2023: Overview of the geolocation of linguistic variation in Italy task, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] B. Han, A. Rahimi, L. Derczynski, T. Baldwin, Twitter geolocation prediction shared task of the 2016 workshop on noisy user-generated text, in: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), The COLING 2016 Organizing Committee, Osaka, Japan, 2016, pp. 213-217. URL: https://aclanthology.org/W16-3928.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] M. Gaman, D. Hovy, R. T. Ionescu, H. Jauhiainen, T. Jauhiainen, K. Lindén, N. Ljubešić, N. Partanen, C. Purschke, Y. Scherrer, M. Zampieri, A report on the VarDial evaluation campaign 2020, in: Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, International Committee on Computational Linguistics (ICCL), Barcelona, Spain (Online), 2020, pp. 1-14. URL: https://aclanthology.org/2020.vardial-1.1.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] B. R. Chakravarthi, M. Gaman, R. T. Ionescu, H. Jauhiainen, T. Jauhiainen, K. Lindén, N. Ljubešić, N. Partanen, R. Priyadharshini, C. Purschke, E. Rajagopal, Y. Scherrer, M. Zampieri, Findings of the VarDial evaluation campaign 2021, in: Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, Association for Computational Linguistics, Kiyv, Ukraine, 2021, pp. 1-11. URL: https://aclanthology.org/2021.vardial-1.1.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, S. Bowman, GLUE: A multi-task benchmark and analysis platform for natural language understanding, in: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 353-355. URL: https://aclanthology.org/W18-5446. doi:10.18653/v1/W18-5446.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. arXiv:1907.11692.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] L. Parisi, S. Francia, P. Magnani, UmBERTo: an Italian language model trained with whole word masking, https://github.com/musixmatchresearch/umberto, 2020.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] S. Schweter, Italian BERT and ELECTRA models, 2020. URL: https://doi.org/10.5281/zenodo.4263142. doi:10.5281/zenodo.4263142.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] M. Polignano, P. Basile, M. de Gemmis, G. Semeraro, V. Basile, AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets, in: Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), volume 2481, CEUR, 2019. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85074851349&amp;partnerID=40&amp;md5=7abed946e06f76b3825ae5e294fac14.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] G. Wiedemann, S. Remus, A. Chawla, C. Biemann, Does BERT make any sense? Interpretable word sense disambiguation with contextualized embeddings, CoRR abs/1909.10430 (2019). URL: http://arxiv.org/abs/1909.10430. arXiv:1909.10430.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] J. Abadji, P. Ortiz Suarez, L. Romary, B. Sagot, Towards a Cleaner Document-Oriented Multilingual Crawled Corpus, arXiv e-prints (2022). arXiv:2201.06642.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] J. Tiedemann, S. Thottingal, OPUS-MT - Building open translation services for the world, in: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, European Association for Machine Translation, Lisboa, Portugal, 2020, pp. 479-480. URL: https://aclanthology.org/2020.eamt-1.61.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] J. Niu, W. Lu, G. Penn, Does BERT rediscover a classical NLP pipeline?, in: Proceedings of the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics, Gyeongju, Republic of Korea, 2022, pp. 3143-3153. URL: https://aclanthology.org/2022.coling-1.278.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>