<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Computational Humanities Research Conference</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>DaCy: A Unified Framework for Danish NLP</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kenneth Enevoldsen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lasse Hansen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kristofer L.</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nielbo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Humanities Computing, Aarhus University</institution>
          ,
          <country country="DK">Denmark</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Clinical Medicine, Aarhus University</institution>
          ,
          <country country="DK">Denmark</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Interacting Minds Centre, Aarhus University</institution>
          ,
          <country country="DK">Denmark</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <fpage>7</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Danish natural language processing (NLP) has in recent years obtained considerable improvements with the addition of multiple new datasets and models. However, at present, there is no coherent framework for applying state-of-the-art models for Danish. We present DaCy: a unified framework for Danish NLP built on and integrated with SpaCy. DaCy uses efficient multitask models which obtain state-of-the-art performance on named entity recognition, part-of-speech tagging, and dependency parsing. DaCy contains tools for easy integration of existing models such as for polarity, emotion, or subjectivity detection. In addition, we conduct a series of tests for biases and robustness of Danish NLP pipelines through data augmentation. DaCy large compares favorably and is especially robust to long input lengths and spelling variations and errors. All models except DaCy large display significant biases related to ethnicity while only Polyglot shows a significant gender bias. We argue that for languages with limited benchmark sets, data augmentation can be particularly useful for obtaining more realistic and fine-grained performance estimates. We provide a series of augmenters as a first step towards a more thorough evaluation of language models for low and medium resource languages and encourage further development.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Low-resource NLP</kwd>
        <kwd>Data Augmentation</kwd>
        <kwd>Danish NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-0">
        <title>1.1. DaCy</title>
        <p>
With this motivation we present DaCy: an efficient end-to-end framework for Danish NLP
with state-of-the-art performance on POS, NER and dependency parsing. DaCy fills the
gap in Danish NLP by providing a consistent interface that is easily extendable and able to
integrate other models. DaCy is built on SpaCy v.3 which comes with a range of advantages:
the framework is optimized, user-friendly, and well-documented. DaCy includes three
finetuned language models: DaCy small, based on a Danish Electra (14M parameters) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]; DaCy
medium, based on the Danish BERT (110M parameters) [
        <xref ref-type="bibr" rid="ref21">22</xref>
        ]; and DaCy large, based on the
multilingual XLM-RoBERTa (550M parameters) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. All models have been fine-tuned to do
POS tagging, NER, and dependency parsing in a single forward pass, which increases the
efficiency of the model and allows for larger models at the same computational cost.
      </p>
      <p>
        Besides models fine-tuned for DaCy, the package includes convenient wrappers to add other
models to the pipeline. For instance, Danish models for detecting polarity, emotion, and
subjectivity classification can be added in a single line of code, and any HuggingFace Transformers
[
        <xref ref-type="bibr" rid="ref33">34</xref>
        ] model trained for sentence classification can be conveniently wrapped and included in the
pipeline using utility functions. With this functionality, DaCy aims at being a unified
framework for Danish NLP. All functionality is well-documented and covered by tutorials.<fn id="fn1"><label>1</label><p>See: https://centre-for-humanities-computing.github.io/DaCy/</p></fn>
      </p>
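      <p>
        As a hedged sketch of the underlying mechanism (not DaCy's own wrapper code), a HuggingFace sequence-classification model can be exposed as a SpaCy pipeline component roughly as follows; the factory name and the model identifier are hypothetical placeholders.
      </p>
      <preformat><![CDATA[
# A sketch of wrapping a HuggingFace sequence-classification model as a
# SpaCy component; not DaCy's exact wrapper. The factory name and model
# identifier are hypothetical placeholders.
from spacy.language import Language
from transformers import pipeline

@Language.factory("hf_sentence_clf", default_config={"model_name": ""})
def create_hf_sentence_clf(nlp, name, model_name: str):
    clf = pipeline("text-classification", model=model_name)

    def component(doc):
        result = clf(doc.text)[0]  # top label for the whole document
        doc.cats[result["label"]] = result["score"]
        return doc

    return component

# Usage (model name is a placeholder):
# nlp.add_pipe("hf_sentence_clf", config={"model_name": "danish-sentiment-model"})
]]></preformat>
      <p>
        The component writes its scores to doc.cats, which is where SpaCy stores document-level categories.
      </p>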
      <sec id="sec-1-1">
        <title>1.2. Robustness &amp; Evaluation</title>
        <p>
          Fine-tuned language models are commonly evaluated by testing performance on a gold-standard
benchmark dataset. The most commonly used benchmark for Danish is the DaNE dataset
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], which consists of the Danish Dependency Treebank [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], additionally tagged for NER.
For languages with few benchmark datasets, such as Danish, the performance stability and
generalizability cannot be reliably estimated [
          <xref ref-type="bibr" rid="ref26">27</xref>
          ]. For instance, the text included in DaNE
was collected in the years 1983–1992 from both written and spoken domains [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Given the
change of language over time and the addition of new textual domains such as social media,
this dataset is unlikely to be representative of the contemporary domains of application. For
instance, models might not be sufficiently exposed to e.g. abbreviated names, spelling errors,
or non-standard casing to correctly and robustly classify them. In this sense, the performance
obtained on DaNE is unlikely to hold for real-world use cases.
        </p>
        <p>
          To provide an additional layer of validation, we propose evaluating models on augmented
gold-standard data. Data augmentation entails generating new data by slightly modifying
existing data points [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Data augmentation techniques such as rotation and cropping are
widely used in computer vision to reduce overfitting [
          <xref ref-type="bibr" rid="ref28">29</xref>
          ], and are becoming increasingly
common in NLP [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The complex syntactic and semantic structure of text complicates the task
of finding useful augmentations, but simple manipulations such as synonym replacement and
random character swaps and deletions have been found to be particularly useful for supervised
learning in low-resource settings [
          <xref ref-type="bibr" rid="ref32">33</xref>
          ].
        </p>
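        <p>
          As a concrete illustration of such manipulations, a minimal, library-agnostic sketch of random character swaps and deletions might look as follows; the probability parameter is illustrative.
        </p>
        <preformat><![CDATA[
import random

def random_char_noise(text, p=0.05, rng=random):
    """Swap a character with its right neighbour or delete it, each with
    probability p; a toy version of the character-level noise above."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        r = rng.random()
        if r < p:
            # swap with the following character
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        elif r < 2 * p:
            # delete the character
            del chars[i]
        else:
            i += 1
    return "".join(chars)
]]></preformat>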
        <p>
          Although data augmentation is most commonly used for increasing the amount of training
data, it can just as well be used for evaluation purposes [
          <xref ref-type="bibr" rid="ref26">27</xref>
          ]. By augmenting a gold-standard
dataset, we can evaluate model performance when exposed to data that more closely mimics
real-life settings by adding spelling errors, more diverse names, or other manipulations. In
section 2.2, we introduce a series of augmentations and evaluate the performance of Danish
NLP pipelines on them.
        </p>
        <p>The contributions of this paper are three-fold. 1) We introduce new state-of-the-art models for
Danish dependency parsing, NER, and POS tagging. 2) We introduce the DaCy Python library as a
unified framework for state-of-the-art NLP in Danish. 3) We evaluate Danish NLP pipelines
using data augmentation and provide directions for future model development.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <sec id="sec-2-1">
        <title>2.1. Training</title>
        <p>
          To train the candidate models for DaCy, all publicly available Transformer-based language
models for Danish were fine-tuned on the DaNE corpus [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] using SpaCy 3.0.3 [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The
models include two Danish ELECTRAs [
          <xref ref-type="bibr" rid="ref14 ref31 ref8">8, 14, 32</xref>
          ], the Danish ConvBERT [
          <xref ref-type="bibr" rid="ref17 ref31">17, 32</xref>
          ], the Danish
BERT [
          <xref ref-type="bibr" rid="ref11 ref21">11, 22</xref>
          ], and the multilingual XLM-Roberta Large [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. All models were trained with
an input length of 10 sentences until convergence using similar hyperparameters on a Quadro
RTX 8000 GPU. Adam was used as optimizer with hyperparameters β1 = 0.9 and β2 = 0.999.
Further, L2 normalization with α = 0.01 and gradient clipping with c = 1.0 was employed. For
increased efficiency, all models were trained with a multi-task objective [
          <xref ref-type="bibr" rid="ref27 ref6">6, 28</xref>
          ] on NER, POS,
and dependency parsing. This allows the training of larger models at the same computational
cost, but it is unlikely that multi-task training at this scale improves performance [
          <xref ref-type="bibr" rid="ref1 ref24">25, 1</xref>
          ].<fn id="fn2"><label>2</label><p>For a full list of models and training configurations, see the config files on GitHub: https://github.com/centre-for-humanities-computing/DaCy/tree/main/training</p></fn>
        </p>
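        <p>
          For reference, these optimizer settings correspond roughly to the following call in Thinc (SpaCy's machine-learning library); the learning rate shown is a placeholder, as the actual schedules are defined in the config files linked above.
        </p>
        <preformat><![CDATA[
# A sketch of the optimizer described above, using Thinc; the learning
# rate is a placeholder, the other values are the stated hyperparameters.
from thinc.api import Adam

optimizer = Adam(
    learn_rate=5e-5,  # placeholder; actual schedule is in the config files
    beta1=0.9,
    beta2=0.999,
    L2=0.01,          # L2 regularization (alpha in the text)
    grad_clip=1.0,    # gradient clipping (c in the text)
)
]]></preformat>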
        <p>
          Table 1 shows the performance of all fine-tuned models evaluated on DaNE’s test set. The
three best performing models in each size category, XLM-Roberta, DaBERT, and Ælæctra
Cased, are included in DaCy as the large, medium, and small models, respectively. In line with previous
findings [
          <xref ref-type="bibr" rid="ref23 ref24 ref5">25, 5, 24</xref>
          ], larger models tend to perform better, with XLM-Roberta obtaining the best
performance across the board.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Evaluation</title>
        <p>
          To evaluate the robustness of DaCy and other Danish NLP pipelines, we assessed their
performance on multiple augmented versions of the DaNE test set. All Danish models are trained on
the DaNE corpus which consists of a mix of textual data of both spoken and written origin from
the years 1983–1992 [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], with the exception of Polyglot which is trained on entities extracted
from Wikipedia [
          <xref ref-type="bibr" rid="ref25">26</xref>
          ]. As a consequence, the training data is rarely representative of the domain
in which the models will be applied. For example, social media, contemporary news media,
and historical texts have domain specific characteristics such as non-standard casing, a higher
degree of typos, use of hashtags, and historic spelling such as upper-cased nouns [
          <xref ref-type="bibr" rid="ref12 ref3 ref30">31, 3, 12</xref>
          ].
While it is infeasible to test the models on all possible domains, some of these characteristics
can be modelled using data augmentation, which can provide practitioners with an estimate of
the potential shortcomings of the model. Further, data augmentation can be used to estimate
biases against protected groups such as gender and ethnicity.
        </p>
        <p>The augmenters presented here are not meant to be exhaustive, but rather a first step towards
more thorough validation of new language models. We argue that the bar for inclusion of a
new model should be set higher than a slight increase in benchmark performance. Language
models are used in a variety of contexts which current benchmark tasks, especially for low
resource languages, do not capture. Our aim with these experiments is to provide an extra layer
of insight into the performance of language models that more closely mimics naturalistic use
cases, and to encourage the development of further augmenters. Augmentation not only provides
insights into when model performance breaks down and whether certain models are more suited
for specific use-cases than others, but can also be used to identify specific areas to improve
upon.</p>
        <p>The augmenters developed for this paper are designed in accordance with the SpaCy
framework, and are thus not necessarily tied to DaCy or Danish in particular and can be used both
during model validation and training. Comprehensive tutorials are provided on the DaCy
Github repository.</p>
        <p>
          We tested small, medium, and large SpaCy [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and DaCy models, Stanza [
          <xref ref-type="bibr" rid="ref22">23</xref>
          ], Polyglot
[
          <xref ref-type="bibr" rid="ref25">26</xref>
          ], NERDA [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], Flair<fn id="fn3"><label>3</label><p>As supplied by DaNLP.</p></fn> [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], and DaNLP’s BERT [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] on the DaNE test set augmented with
the following augmenters (a simplified sketch of the first two follows the list):
1. Keystroke augmentation: substitute 2%, 5%, or 15% of characters with a neighbouring
character on a Danish QWERTY keyboard.
2. ÆØÅ augmentation: substitute æ/Æ with ae/Ae, ø/Ø with oe/Oe, and å/Å with aa/Aa
to simulate some historic text variations in Danish.
3. Lower-case augmentation: convert all text to lower-case.
4. Spacing augmentation: randomly remove 5% of all whitespace.
5. Name augmentations:
a) Substitute all names (PER entities) with randomly sampled Danish names,
respecting first and last names.
b) Substitute all names with randomly sampled names of Muslim origin used in
Denmark [
          <xref ref-type="bibr" rid="ref20a">21</xref>
          ], respecting first and last names.
c) Substitute all names with sampled Danish male names, respecting first and last
names.
d) Substitute all names with sampled Danish female names, respecting first and last
names.
e) Abbreviate all first names to the first character including a full stop.
        </p>
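        <p>
          As referenced above, a simplified string-level sketch of the keystroke and ÆØÅ augmenters is shown below. The DaCy implementations operate on SpaCy training Examples so that gold annotations stay aligned, and the neighbour map here is a small illustrative subset rather than a full Danish QWERTY layout.
        </p>
        <preformat><![CDATA[
import random

# Illustrative subset of a Danish QWERTY neighbour map; the real
# augmenter covers the full keyboard layout.
NEIGHBOURS = {"a": "qsz", "e": "wrd", "n": "bhm", "r": "etf", "s": "adw"}

AEOEAA = {"æ": "ae", "ø": "oe", "å": "aa", "Æ": "Ae", "Ø": "Oe", "Å": "Aa"}

def keystroke_augment(text, p=0.05, rng=random):
    """Replace each character with a keyboard neighbour with probability p."""
    return "".join(
        rng.choice(NEIGHBOURS[c.lower()])
        if c.lower() in NEIGHBOURS and rng.random() < p
        else c
        for c in text
    )

def aeoeaa_augment(text):
    """Rewrite æ/ø/å as their historic digraph spellings."""
    return "".join(AEOEAA.get(c, c) for c in text)
]]></preformat>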
        <p>The stochastic augmentations, i.e. name and keystroke augmentations, were repeated 20
times.</p>
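        <p>
          A sketch of this protocol, assuming a hypothetical helper score_pipeline that returns a single metric for a list of augmented texts:
        </p>
        <preformat><![CDATA[
import random
import statistics

def repeated_evaluation(texts, augment, score_pipeline, n_repeats=20):
    """Apply a stochastic augmenter n_repeats times with different seeds
    and summarize the resulting scores; score_pipeline is hypothetical."""
    scores = []
    for seed in range(n_repeats):
        rng = random.Random(seed)
        augmented = [augment(t, rng=rng) for t in texts]
        scores.append(score_pipeline(augmented))
    return statistics.mean(scores), statistics.stdev(scores)
]]></preformat>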
        <p>Previous evaluations of Danish NLP tools have used the gold-standard tokens instead of
using a tokenization module. While this allows for easier comparison of the specific modules, it
inflates the performance metrics of the models and is unlikely to reflect the metric of interest,
namely, the performance during application.<fn id="fn4"><label>4</label><p>In our experiments, several of the Danish models performed worse using their own tokenizer.</p></fn> All models were tested using both their own
tokenizer (if they have one) and the SpaCy tokenizer for Danish. The performance reported in
section 3 uses the best performing tokenization module for each pipeline. For all models except
Stanza and Polyglot this was found to be the SpaCy tokenizer.</p>
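        <p>
          For pipelines exposed through SpaCy, scoring with predicted rather than gold tokenization can be sketched with SpaCy's Example and Scorer, which handle the alignment between the two tokenizations; nlp and gold_docs are assumed given.
        </p>
        <preformat><![CDATA[
from spacy.scorer import Scorer
from spacy.training import Example

def score_with_own_tokenizer(nlp, gold_docs):
    """Re-tokenize from raw text so the pipeline's own tokenization is
    scored against the gold annotations."""
    examples = [Example(nlp(doc.text), doc) for doc in gold_docs]
    return Scorer().score(examples)
]]></preformat>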
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>
        This paper has introduced the DaCy models and presented a thorough evaluation of Danish
NLP models on a battery of augmentations. DaCy models achieve state-of-the-art
performance on Danish NER, POS, and dependency parsing, and are robust to augmentations such
as keystroke errors, name changes, and lowercasing. The results from training DaCy
underline three well-known trends in deep learning and NLP: 1) larger models tend to perform
better, 2) higher quality pre-training data leads to better models, as illustrated by the
superior performance of Ælæctra compared to DaELECTRA, and 3) multilingual models perform
competitively with monolingual models [
        <xref ref-type="bibr" rid="ref24 ref34 ref5">25, 35, 5</xref>
        ].
      </p>
      <p>Our experiments with multiple augmenters revealed different patterns of strengths and
weaknesses across Danish NLP models. In general, larger models tend to be more robust to data
augmentations. Several models are highly sensitive to casing, which limits their usefulness in
certain domains. Evaluating models on augmented data provides a more holistic and realistic
estimate of the expected performance, and can reveal in which use cases one model might be
more useful than another. For example, it might be better to use DaCy medium on social
media as opposed to DaCy large as its performance is not affected by casing.</p>
      <p>The purpose of the data augmentation experiments was to evaluate the robustness of Danish
models and to open a discussion on how to present new models going forward. As more models
are developed for low and medium resource languages, properly evaluating them becomes vital
for securing robustness, transparency, and effectiveness despite limited benchmark sets. We
do not posit data augmentation as the only solution, but demonstrate that it can effectively
reveal performance differences on important factors such as casing, spelling errors, and biases
related to protected groups. As researchers, we bear the responsibility for releasing adequately
tested and robust models into the world. With the increasing ease of deployment, users must
be made aware of the level of performance they can realistically expect to achieve on their
problem, and when to choose one model over another. Social media researchers should know
that certain models are sensitive to casing, historians should know that some models handle
old text variations such as ae, oe, aa poorly, and lawyers should be aware that models might
not be able to identify abbreviated names as effectively. In this regard, transparency and
openness as to when and how models fail are crucial measures to report. Such evaluation
requires the development of infrastructure and tools, but is fast and easy to conduct once in
place. For instance, it only takes 8 minutes to test DaCy large on all augmented datasets
including bootstrapping. As part of the DaCy library, we provide several augmenters and
utility functions for evaluation that integrate with SpaCy, and encourage new NLP models
to use and expand upon them. For the continued development of low and medium resource
NLP in a direction that is beneficial for practitioners, it is vital to conduct more thorough
evaluation of new models. We suggest these augmenters not as an evaluation standard, but
as preliminary guiding principles for future development of NLP models for low and medium
resource languages in particular.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We would like to especially thank Martin C. Jespersen for early conversations on biases and
covert weaknesses in Danish language models.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. Aghajanyan, A. Gupta, A. Shrivastava, X. Chen, L. Zettlemoyer, and S. Gupta. “Muppet: Massive Multi-task Representations with Pre-Finetuning”. In: arXiv:2101.11038 [cs] (2021). url: http://arxiv.org/abs/2101.11038.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] A. Akbik, T. Bergmann, D. Blythe, K. Rasul, S. Schweter, and R. Vollgraf. “FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP”. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Minneapolis, Minnesota: Association for Computational Linguistics, 2019, pp. 54-59. doi: 10.18653/v1/N19-4010. url: https://aclanthology.org/N19-4010.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] T. Baldwin. “Social media: friend or foe of natural language processing?” In: Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation. 2012, pp. 58-59.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] A. Brogaard Pauli, M. Barrett, O. Lacroix, and R. Hvingelby. “DaNLP: An open-source toolkit for Danish Natural Language Processing”. In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021). 2021.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. “Language Models are Few-Shot Learners”. In: arXiv:2005.14165 [cs] (2020). url: http://arxiv.org/abs/2005.14165.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] R. Caruana. “Multitask learning”. In: Machine Learning 28.1 (1997), pp. 41-75.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] J. Chen, D. Tam, C. Raffel, M. Bansal, and D. Yang. “An Empirical Survey of Data Augmentation for Limited Data Learning in NLP”. In: arXiv:2106.07499 [cs] (2021). url: http://arxiv.org/abs/2106.07499.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning. “ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators”. In: arXiv:2003.10555 [cs] (2020). url: http://arxiv.org/abs/2003.10555.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov. “Unsupervised Cross-lingual Representation Learning at Scale”. In: arXiv:1911.02116 [cs] (2020). url: http://arxiv.org/abs/1911.02116.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] L. Derczynski, M. R. Ciosici, R. Baglini, M. H. Christiansen, J. A. Dalsgaard, R. Fusaroli, P. J. Henrichsen, R. Hvingelby, A. Kirkedal, A. S. Kjeldsen, C. Ladefoged, F. Å. Nielsen, J. Madsen, M. L. Petersen, J. H. Rystrøm, and D. Varab. “The Danish Gigaword Corpus”. In: Proceedings of the 23rd Nordic Conference on Computational Linguistics. 2021.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. In: arXiv:1810.04805 [cs] (2019). url: http://arxiv.org/abs/1810.04805.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] A. Farzindar and D. Inkpen. “Natural language processing for social media”. In: Synthesis Lectures on Human Language Technologies 8.2 (2015), pp. 1-166.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] S. Y. Feng, V. Gangal, J. Wei, S. Chandar, S. Vosoughi, T. Mitamura, and E. Hovy. “A Survey of Data Augmentation Approaches for NLP”. In: arXiv:2105.03075 [cs] (2021). url: http://arxiv.org/abs/2105.03075.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] M. Højmark-Bertelsen. “Ælæctra - A Step Towards More Efficient Danish Natural Language Processing”. 2021. url: https://github.com/MalteHB/-l-ctra/.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd. spaCy: Industrial-strength Natural Language Processing in Python. 2020. doi: 10.5281/zenodo.1212303. url: https://doi.org/10.5281/zenodo.1212303.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] R. Hvingelby, A. B. Pauli, M. Barrett, C. Rosted, L. M. Lidegaard, and A. Søgaard. “DaNE: A named entity resource for Danish”. In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). 2020, pp. 4597-4604.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Z. Jiang, W. Yu, D. Zhou, Y. Chen, J. Feng, and S. Yan. “ConvBERT: Improving BERT with Span-based Dynamic Convolution”. In: arXiv:2008.02496 [cs] (2021). url: http://arxiv.org/abs/2008.02496.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] A. Johannsen, H. M. Alonso, and B. Plank. “Universal Dependencies for Danish”. In: Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (2015).</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] L. Kjeldgaard. “NERDA”. GitHub, 2020. url: https://github.com/ebanalyse/NERDA.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] C. D. Manning. “Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?” In: Computational Linguistics and Intelligent Text Processing. Ed. by A. F. Gelbukh. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2011, pp. 171-189. doi: 10.1007/978-3-642-19400-9_14.</mixed-citation>
      </ref>
      <ref id="ref20a">
        <mixed-citation>[21] E. V. Meldgaard. Muslimske fornavne i Danmark. 2005. url: https://nors.ku.dk/publikationer/webpublikationer/muslimske_fornavne/.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[22] J. D. Møllerhøj. Danish BERT model: BotXO has trained the most advanced BERT model. BotXO. 2019. url: https://www.botxo.ai/blog/danish-bert-model/.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[23] P. Qi, Y. Zhang, Y. Zhang, J. Bolton, and C. D. Manning. “Stanza: A Python Natural Language Processing Toolkit for Many Human Languages”. In: arXiv:2003.07082 [cs] (2020). url: http://arxiv.org/abs/2003.07082.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[24] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. “Language models are unsupervised multitask learners”. In: OpenAI blog 1.8 (2019), p. 9.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[25] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu. “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”. In: arXiv:1910.10683 [cs, stat] (2020). url: http://arxiv.org/abs/1910.10683.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[26] R. Al-Rfou', B. Perozzi, and S. Skiena. “Polyglot: Distributed Word Representations for Multilingual NLP”. In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning. Sofia, Bulgaria: Association for Computational Linguistics, 2013, pp. 183-192. url: https://aclanthology.org/W13-3520.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[27] M. T. Ribeiro, T. Wu, C. Guestrin, and S. Singh. “Beyond Accuracy: Behavioral Testing of NLP Models with CheckList”. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, 2020, pp. 4902-4912. doi: 10.18653/v1/2020.acl-main.442. url: https://aclanthology.org/2020.acl-main.442.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[28] S. Ruder. “An Overview of Multi-Task Learning in Deep Neural Networks”. In: arXiv:1706.05098 [cs, stat] (2017). url: http://arxiv.org/abs/1706.05098.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[29] C. Shorten and T. M. Khoshgoftaar. “A survey on Image Data Augmentation for Deep Learning”. In: Journal of Big Data 6.1 (2019), p. 60. doi: 10.1186/s40537-019-0197-0. url: https://doi.org/10.1186/s40537-019-0197-0.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[30] sprogteknologi.dk. Sprogteknologi.dk. 2021. url: https://sprogteknologi.dk/.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[31] N. Tahmasebi. “A Study on Word2Vec on a Historical Swedish Newspaper Corpus”. In: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference, DHN 2018, Helsinki, Finland, March 7-9, 2018. 2018, pp. 25-37. url: http://ceur-ws.org/Vol-2084/paper2.pdf.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[32] P. T. Tamini-Sarnikowski. “Danish transformers”. GitHub, 2020. url: https://github.com/sarnikowski.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[33] J. Wei and K. Zou. “EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks”. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics, 2019, pp. 6382-6388. doi: 10.18653/v1/D19-1670. url: https://www.aclweb.org/anthology/D19-1670.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[34] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush. “HuggingFace's Transformers: State-of-the-art Natural Language Processing”. In: arXiv:1910.03771 [cs] (2020). url: http://arxiv.org/abs/1910.03771.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[35] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, and C. Raffel. “mT5: A massively multilingual pre-trained text-to-text transformer”. In: arXiv:2010.11934 [cs] (2021). url: http://arxiv.org/abs/2010.11934.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>
  </back>
</article>