<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>KLUMSy@KIPoS: Experiments on Part-of-Speech Tagging of Spoken Italian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Proisl</string-name>
          <email>thomas.proisl@fau.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriella Lapesa</string-name>
          <email>gabriella.lapesa@ims.uni-stuttgart.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computational Corpus Linguistics Group, Friedrich-Alexander-Universität Erlangen-Nürnberg</institution>
          <addr-line>Bismarckstr. 6, 91054 Erlangen</addr-line>
          ;
          <institution>Institute for Natural Language Processing, Universität Stuttgart</institution>
          <addr-line>Pfaffenwaldring 5 b, 70569 Stuttgart</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe experiments on part-of-speech tagging of spoken Italian that we conducted in the context of the EVALITA 2020 KIPoS shared task (Bosco et al., 2020). Our submission to the shared task is based on SoMeWeTa (Proisl, 2018), a tagger which supports domain adaptation and is designed to flexibly incorporate external resources. We document our approach and discuss our results in the shared task along with a statistical analysis of the factors which impact performance the most. Additionally, we report on a set of additional experiments involving the combination of neural language models with unsupervised HMMs, and compare its performance to that of our system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Part-of-speech taggers trained on standard
newspaper texts usually perform relatively poorly on
spoken language or on written communication that
is “conceptually oral”, e. g. tweets or chat
messages. The challenges of spoken language
include non-standard lexis, e. g. the use of
colloquial and dialectal forms, and non-standard
syntax, e. g. false starts, repetitions, incomplete
sentences and the use of fillers. To make things worse,
the amount of training data available for spoken
language – or non-standard varieties in general –
is usually several orders of magnitude smaller than
for the usual newspaper corpora. One strategy for
coping with this is to incorporate additional
resources, e. g. lexica or distributional information
obtained from large amounts of unannotated text.
Another strategy is to do domain adaptation, i. e. to
leverage existing written standard corpora to
pretrain an out-of-domain tagger model and to then
adapt that model to the target domain using a small
amount of in-domain data.</p>
      <p>
        We experiment with these ideas in the
context of the EVALITA 2020 shared task on
part-of-speech tagging of spoken Italian
        <xref ref-type="bibr" rid="ref1 ref3">(Bosco et al.,
2020; Basile et al., 2020)</xref>
        . The data of the shared
task have been drawn from the KIParla corpus
        <xref ref-type="bibr" rid="ref11">(Mauri et al., 2019)</xref>
        and consist of the manually
annotated training and test datasets and a silver
dataset that has been automatically tagged by the
task organizers using a UDPipe1 model trained
on all Italian treebanks in the Universal
Dependencies (UD) project.2 While the silver dataset
is annotated with the standard UD tagset (as are
the corpora on which the tagger has been trained),
the training and test sets use an extended version
where tags can optionally be assigned one of two
subcategories, .DIA for dialectal forms and .LIN
for foreign words.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Additional resources</title>
      <sec id="sec-2-1">
        <title>2.1 Corpora</title>
        <p>
          We use a collection of plain text corpora to
compute Brown clusters
          <xref ref-type="bibr" rid="ref4">(Brown et al., 1992)</xref>
          that the
tagger can use as additional resource.
        </p>
        <p>
          Ideally, we would use large amounts of
transcribed speech for the present task. Since there
is no such dataset, we try to use corpora that come
close. The closest to authentic speech is scripted
speech, therefore we use the Italian movie
subtitles from the OpenSubtitles corpus
          <xref ref-type="bibr" rid="ref8">(Lison and
Tiedemann, 2016)</xref>
          .3 Computer-mediated
communication, e. g. in social media, sometimes
exhibits features that are typical of spoken
language use. Therefore, we also use a
collection of roughly 11.7 million Italian tweets and
ca. 2.7 million Reddit posts (submissions and
comments) from the years 2011–2018. We
extracted the Reddit posts from Jason
Baumgartner’s collection of Reddit submissions and
comments4 using the processing pipeline by
Blombach et al. (2020). Additionally, we include
all Italian corpora from the Universal
Dependencies project and, to further increase the amount of
data, a number of web corpora: the PAISÀ
corpus of Italian texts from the web
          <xref ref-type="bibr" rid="ref10">(Lyding et al.,
2014)</xref>
          ,5 the text of the Italian Wikimedia dumps,6
i. e. Wiki(pedia|books|news|versity|voyage), as
extracted by Wikipedia Extractor,7 and the Italian
subset of OSCAR, a huge multilingual Common
Crawl corpus
          <xref ref-type="bibr" rid="ref13">(Ortiz Suárez et al., 2019)</xref>
          .8
1 http://ufal.mff.cuni.cz/udpipe/1
2 https://universaldependencies.org/
3 http://opus.nlpl.eu/OpenSubtitles-v2018.php
        </p>
        <p>We tokenize and sentence split all corpora
using UDPipe trained on the union of all Italian UD
corpora. We also remove all duplicate sentences.
The sizes of the resulting corpora are given in
Table 1. As final preprocessing steps, we lowercase
all words and normalize numbers, user mentions,
email addresses and URLs. Finally, we use the
implementation by Liang (2005)9 to compute 1,000
Brown clusters with a minimum frequency of 5.
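</p>
        <p>The final normalization step can be sketched as follows. This is an illustrative approximation: the placeholder strings and regular expressions are our assumptions, not the exact ones used in the experiments.</p>
        <preformat>
```python
import re

# Hedged sketch: lowercase all words and normalize numbers, user
# mentions, e-mail addresses and URLs before computing Brown clusters.
# The placeholder strings (%URL% etc.) are illustrative assumptions.
def normalize(token):
    if re.fullmatch(r"https?://\S+", token):
        return "%URL%"
    if re.fullmatch(r"@\w+", token):
        return "%MENTION%"
    if re.fullmatch(r"[\w.+-]+@[\w-]+\.[\w.-]+", token):
        return "%EMAIL%"
    if re.fullmatch(r"\d+(?:[.,]\d+)*", token):
        return "%NUMBER%"
    return token.lower()
```
        </preformat>
        <p>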
        </p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Sizes of the corpora in tokens, before and after removing duplicate sentences.</p>
          </caption>
          <table>
            <thead>
              <tr><th>corpus</th><th>tokens</th><th>deduplicated</th></tr>
            </thead>
            <tbody>
              <tr><td>oscar</td><td>–</td><td>13,787,307,218</td></tr>
              <tr><td>opensubtitles</td><td>795,250,711</td><td>378,348,061</td></tr>
              <tr><td>paisa</td><td>282,631,297</td><td>258,679,965</td></tr>
              <tr><td>reddit</td><td>112,735,958</td><td>105,274,620</td></tr>
              <tr><td>tweets</td><td>152,496,728</td><td>148,031,020</td></tr>
              <tr><td>ud</td><td>672,929</td><td>615,057</td></tr>
              <tr><td>wiki</td><td>578,425,024</td><td>560,863,691</td></tr>
              <tr><td>wikibooks</td><td>12,106,499</td><td>11,825,870</td></tr>
              <tr><td>wikinews</td><td>2,744,317</td><td>2,583,135</td></tr>
              <tr><td>wikiversity</td><td>5,766,859</td><td>5,365,924</td></tr>
              <tr><td>wikivoyage</td><td>3,911,881</td><td>3,825,872</td></tr>
              <tr><td>complete</td><td/><td/></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
We incorporate linguistic knowledge in the form
of Morph-it!
          <xref ref-type="bibr" rid="ref20">(Zanchetta and Baroni, 2005)</xref>
          ,10 a
morphological lexicon for Italian that contains
morphological analyses of roughly 505,000 word
forms that correspond to about 35,000 lemmata.
In its analyses, Morph-it! distinguishes between
derivational features and inflectional features. In
total, there are 664 unique feature combinations.
We simplify the analyses by stripping away all
inflectional features and some of the derivational
features, i. e. gender (for articles, nouns and
pronouns) and person and number (for pronouns).
This results in 39 coarse-grained categories that
correspond to major word classes, with some finer
distinctions for determiners and pronouns.
4 https://files.pushshift.io/reddit/
5 http://www.corpusitaliano.it/
6 https://dumps.wikimedia.org/
7 http://medialab.di.unipi.it/wiki/Wikipedia_Extractor
8 https://oscar-corpus.com/
9 https://github.com/percyliang/brown-cluster/
10 https://docs.sslmit.unibo.it/doku.php?id=resources:morph-it
        </p>
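        <p>The mapping from Morph-it! analyses to coarse categories can be sketched as follows. This is a simplified illustration: the line format (word form, lemma, analysis, tab-separated) follows the Morph-it! distribution, but the exact stripping rules for gender, person and number are our approximation of the procedure described above.</p>
        <preformat>
```python
# Hedged sketch: build a coarse-grained lexicon from Morph-it!-style
# entries by stripping inflectional features (everything after ":")
# and, as an approximation, gender markers on the remaining analysis.
def coarse_tag(analysis):
    # Drop inflectional features.
    derivational = analysis.split(":", 1)[0]
    # Drop gender markers, keeping finer distinctions (e.g. PRO-PERS).
    parts = [p for p in derivational.split("-") if p not in ("M", "F")]
    return "-".join(parts)

def build_lexicon(lines):
    lexicon = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            form, lemma, analysis = parts
            lexicon.setdefault(form, set()).add(coarse_tag(analysis))
    return lexicon
```
        </preformat>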
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 System description</title>
      <p>
        For our submission to the shared task we use
SoMeWeTa
        <xref ref-type="bibr" rid="ref15 ref5">(Proisl, 2018)</xref>
        , a tagger that is based
on the averaged structured perceptron, supports
domain adaptation and can incorporate external
resources such as Brown clusters and lexica.11 Its
ability to make use of existing linguistic resources
allows the tagger to achieve competitive results
even with relatively small amounts of in-domain
training data, which is particularly useful for
non-standard varieties or under-resourced languages
        <xref ref-type="bibr" rid="ref14 ref15 ref16 ref5">(Kabashi and Proisl, 2018; Proisl et al., 2019)</xref>
        .
      </p>
      <p>We participate in all three subtasks: the main
subtask where we use all the available silver and
training data, subtask A where we only use the
data from the formal register, and subtask B where
we only use the informal data. The training
scheme is the same for all three subtasks. First,
we train preliminary models on the silver data
provided by task organizers. Keep in mind that the
silver dataset has been automatically tagged.
Therefore, it is annotated with the standard version of
the UD tagset and not with the extended one that is
used in the shared task; in addition, there will be a
certain amount of tagging errors in the data.
Nevertheless, the dataset provides the tagger with
(imperfect) domain-specific background knowledge.
In the next step, we adapt the silver models to the
union of the Italian UD treebanks, i. e. to
high-quality but out-of-domain data. In the final step,
we adapt the models to spoken Italian using the
manually annotated training data. In every step we
train for 12 iterations using a search beam size of
10 and provide the tagger with the Brown clusters
and the Morph-it!-based lexicon (Section 2).
11 https://github.com/tsproisl/SoMeWeTa</p>
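      <p>The three-step training scheme can be sketched with SoMeWeTa's command-line interface roughly as follows; the flag names follow the SoMeWeTa documentation, but the file names are placeholders and the exact invocations are our assumption rather than the ones used for the submission (check somewe-tagger --help).</p>
      <preformat>
```shell
# Hedged sketch of the three-step training scheme (file names are
# placeholders). Step 1: preliminary model on the silver data.
somewe-tagger --train silver.model --brown clusters.txt --lexicon morphit.lex \
              --iterations 12 --beam-size 10 kipos_silver.txt
# Step 2: adapt the silver model to the union of the Italian UD treebanks.
somewe-tagger --train ud.model --prior silver.model --brown clusters.txt \
              --lexicon morphit.lex --iterations 12 --beam-size 10 italian_ud.txt
# Step 3: adapt to spoken Italian with the manually annotated training data.
somewe-tagger --train kipos.model --prior ud.model --brown clusters.txt \
              --lexicon morphit.lex --iterations 12 --beam-size 10 kipos_train.txt
```
      </preformat>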
    </sec>
    <sec id="sec-4">
      <title>4 Evaluation</title>
      <sec id="sec-4-1">
        <title>4.1 Data preparation and evaluation results</title>
        <p>The silver data, training data and the data from the
UD treebanks follow UD tokenization guidelines,
i. e. contractions such as parlarmi (parlar+mi) ‘to
talk+to me’ or della (di+la) ‘of+the’ are split into
their constituents for annotation. This is not the
case for the test data where contractions have to
be assigned a joint tag, e. g. VERB_PRON or
ADP_A. Therefore, we run the test data through
the UDPipe tokenizer from Section 2.1, tag the
resulting tokens and merge the tags for all tokens
that have been split.</p>
        <p>Table 2 shows the results on
the two test sets.12 On the main task, SoMeWeTa
performs reasonably well, only 1–1.4 points worse
than the fine-tuned UmBERTo model by
Tamburini (2020). On subtasks A and B, it even
outperforms that system by a considerable margin.
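</p>
        <p>The tag-merging step described in this subsection can be sketched as follows; the ADP+DET special case follows the shared task's tagset, while the general join-with-underscore rule is our simplifying assumption.</p>
        <preformat>
```python
# Hedged sketch: tokens split out of a contraction by the UD tokenizer
# are tagged individually; their tags are then joined into the single
# tag required for the unsplit token in the test data. Contractions of
# prepositions and determiners must be tagged ADP_A, not ADP_DET.
def merge_tags(tags):
    joined = "_".join(tags)
    if joined == "ADP_DET":
        return "ADP_A"
    return joined
```
        </preformat>
        <p>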
[Table 2: tagging accuracies on the formal and informal test sets for the main task and subtasks A and B, comparing our corrected system, tagging on gold tokens, and Tamburini (2020).]
To get a better insight into the impact of the
different experimental variables involved in this study,
we carried out feature ablation experiments which
targeted the different components of our system,
namely the different combinations of training and
test data (formal vs. informal) and the different
additional resources described in section 2 (use
of Brown clusters, Morph-it!, silver data, and UD
corpora). We then carried out a linear regression
analysis with tagging accuracy as the dependent
variable and the different experimental
parameters as independent variables (predictors).
12 Unfortunately, when preparing our submission, we did
not notice that contractions of prepositions (ADP) and
determiners (DET) have to be tagged as ADP_A. As a
consequence, we mis-tagged all these contractions as
ADP_DET. For reference, the evaluation results of our
faulty submission on the formal/informal test sets are:
main 87.56/88.24, subA 87.37/87.58, subB 87.81/88.11.
We
follow the methodology outlined in Lapesa and
Evert (2014) and quantify the impact of a specific
predictor (e. g. the use of Brown clusters) as the
amount of variance in the dependent variable
(tagging accuracy) it accounts for. We considered the
following experimental parameters as predictors.
• setup: Training/test setup; this predictor
encodes the combination of training/test data
and has the following values: all_formal (i. e.
trained on the full set, tested on formal),
all_informal, formal_formal, formal_informal,
informal_formal, informal_informal
• silver: Use of silver data during training (yes,
no)
• ud: Use of UD corpora during training (yes, no)
• morph: Use of Morph-it! (yes, no)
• brown: Use of Brown clusters (yes, no)
We tested all the possible configurations, i. e.
all the combinations of the parameters described
above, and, to account for random effects during
training, ran each configuration 10 times. This
resulted in 960 experimental runs, each
corresponding to a single datapoint in our regression analysis.
Given that it is reasonable to assume that specific
parameter values will influence the performance
of other parameters (e. g., use of Morph-it! could
boost performance but only if larger corpora are
employed), we also test all the 2-way interactions.
As a sanity check, we also introduce the number
of an experimental run as a predictor (1 to 10, as
a categorical variable), obviously in the hope of
finding no effect for it. Summing up, our
regression equation looks as follows:
accuracy ~ (setup + silver + ud +
morph + brown + run)^2 13
Unsurprisingly, our model achieves an excellent fit
to the data, quantified by an adjusted R-squared
of 95.2%. Table 3 lists all significant predictors
and interactions, along with their explained
variance. Explained variance quantifies the portion of
the total R-squared that a specific parameter (or
interaction) is responsible for and can be
straightforwardly interpreted as the impact that the
manipulation of a specific parameter has on the
accuracy of our tagger. Reassuringly, we found no
effect of experimental run. All other predictors, and
all the corresponding interactions, turned out to be
highly significant (with one minor exception).
13 We ran the regression analysis in R; the equation
follows the R syntax, in which “^2” denotes all
pairwise interactions of the predictors between
parentheses.
The
biggest role is played by the setup variable, which
alone accounts for 42.06% of the variance. Using UD
corpora in the training also has a strong impact, with a strong
interaction involving the use of silver data (6.00%
R-squared). Further strong interactions are found
between brown and morph, and brown and UD –
probably suggesting that introducing a 3-way
interaction would be appropriate here. Given the
increased complexity, however, this extension is left
for future work.</p>
        <p>Now that we have established which parameters
or interactions have the strongest impact on model
performance, it is time to ask which parameter
values ensure the best performance. In our case,
given that the system can be assembled
incrementally (adding external resources and training data
to a basic configuration), asking what the best
parameter values are amounts to determining if, for
example, the addition of Brown clusters improves
performance or is detrimental. Note that the
significance of the brown predictor in the regression
analysis already tells us that the predictor affects
performance, ruling out the possibility that it has
no impact at all. To visualize the effects in the
linear model, we follow Lapesa and Evert (2014)
and employ effect displays which show the partial
effect of one or two parameters by marginalizing
over all other parameters. Unlike coefficient
estimates, they allow an intuitive interpretation of the
effect sizes of categorical variables irrespective of
the dummy coding scheme used.</p>
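        <p>The marginalization behind an effect display can be illustrated as follows; predict stands in for the fitted regression model, and the parameter grid is a toy example, not our actual data.</p>
        <preformat>
```python
from itertools import product
from statistics import mean

# Hedged sketch of an effect display: the partial effect of one
# parameter is the predicted accuracy averaged (marginalized) over all
# value combinations of the remaining parameters.
def partial_effect(predict, target, levels):
    other = sorted(n for n in levels if n != target)
    effects = {}
    for value in levels[target]:
        preds = []
        for combo in product(*(levels[n] for n in other)):
            config = dict(zip(other, combo))
            config[target] = value
            preds.append(predict(config))
        effects[value] = mean(preds)
    return effects
```
        </preformat>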
        <p>[Figure 1: partial effect display of setup and silver: predicted tagging accuracy for each training/test setup, with and without silver data.]</p>
        <p>Let us start with the strongest predictor, setup,
in its strongest interaction, the one with silver.
Figure 1 displays the predicted accuracies
resulting from the different parameter combinations of
the two predictors. Note that, given the excellent
fit of the regression model, we can assume
predicted accuracy to be a reliable estimate of actual
accuracy. Also, note that while we are
visualizing the predicted accuracy of a 2-way interaction,
we are actually displaying the effect of the
individual terms (setup and silver) and of the
interaction (setup:silver) jointly. We observe that,
unsurprisingly, independently of the use of silver data,
training on the whole dataset ensures the best
performance on both the formal and informal test sets.
The use of silver data (pink line) improves
performance, but to different degrees across the
training/test setups. Interestingly, using the silver data
makes the performance gap between the models
trained on the whole dataset and those trained on
just the informal dataset negligible. Surprisingly,
we observe that the best performance is predicted
for the formal test set when the informal training set is
used. Further experiments on the
complementarity of the two subtasks are needed to clarify
this apparent contradiction.</p>
        <p>Figure 2 displays the interaction between the
use of UD corpora and the integration of Morph-it!
in SoMeWeTa. Note that the performance gaps are
smaller here than in the previous interaction: this
is no surprise, given the smaller explanatory power
(explained variance) of the parameters and
interactions involved. Morph-it! produces substantial
improvements, but again, to a lesser extent if UD
corpora are employed: this could either be due
to a lower coverage of Morph-it! on the UD
corpora, or to the boost in model robustness produced
by the introduction of a larger training set. The
steep slope of the blue line with respect to the pink one
suggests that the presence of a morphological lexicon
like Morph-it! can compensate for the lack of training
data. Let us conclude with the third strongest
interaction, the one between the use of Brown
clusters and the use of Morph-it!, not shown here for
space constraints. It is strikingly similar to the
one in Figure 2: Morph-it! improves performance
overall, and the steeper improvement in absence of
the Brown clusters suggests that the quality of the
information encoded in Morph-it! can compensate
for the lack of external resources.</p>
        <p>In sum, our analysis supports the starting
assumption that in a low-resource setting like the
one of KIPoS, integrating additional, focussed
resources always supports performance.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Additional experiments: RoBERTa with unsupervised HMM</title>
      <p>
        Fine-tuned neural language models have been
extremely successful in all areas of natural language
processing (NLP). Not only can language
models trained on huge amounts of plain text be
fine-tuned for a wide range of NLP tasks, they have also been shown
to learn certain linguistic abstractions
        <xref ref-type="bibr" rid="ref18">(Tenney et
al., 2019)</xref>
        . At least that seems to be the case for
English. Languages that are typologically
different from English are both more difficult to model
with current architectures
        <xref ref-type="bibr" rid="ref12">(Mielke et al., 2019)</xref>
        and seem to be more challenging when it comes
to learning linguistic abstractions
        <xref ref-type="bibr" rid="ref16">(Ravfogel et al.,
2018)</xref>
        . In the experiment described in this
section, we extend a state-of-the-art language model
architecture to explicitly model part-of-speech
information. To this end, we combine a RoBERTa
language model
        <xref ref-type="bibr" rid="ref9">(Liu et al., 2019)</xref>
        with an
unsupervised neural hidden Markov model (HMM) for
part-of-speech induction.
      </p>
      <p>The architecture of the unsupervised HMM
follows the LSTM-based variant described by Tran
et al. (2016). We directly use the negative
logarithm of the observation likelihood determined by
the backward algorithm as additional loss for the
language model. The embeddings of the best tag
sequence (determined using the Viterbi algorithm)
are added to the word embeddings before feeding
them into the language model. Due to time and
resource constraints, we opt for a small to
medium-sized model14 with a total of 45.5 million
trainable parameters and train it on 1.9 billion tokens
of text (the corpora described in Section 2.1
excluding OSCAR). The model variant with the
unsupervised HMM totals 48.7 million trainable
parameters. We pre-train and fine-tune both models
with the same set of parameters.15</p>
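      <p>The additional loss can be illustrated with a toy HMM. The sketch below computes the negative log observation likelihood with the forward recursion, which yields the same quantity as the backward pass used above; the toy parameters are purely illustrative.</p>
      <preformat>
```python
from math import log

# Hedged toy sketch: negative log-likelihood of an observation sequence
# under an HMM, computed with the forward recursion (no rescaling, so
# only suitable for short toy inputs). In the actual model, transitions
# and emissions are parametrized neurally (Tran et al., 2016) and this
# quantity is added to the language-model loss.
def hmm_nll(pi, trans, emit, obs):
    states = range(len(pi))
    alpha = [pi[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [
            sum(alpha[s] * trans[s][t] for s in states) * emit[t][o]
            for t in states
        ]
    return -log(sum(alpha))
```
      </preformat>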
      <p>The results are summarized in Table 4. Due
to the small model size and relatively little
training data, the performance of both models is
below SoMeWeTa’s. (Keep in mind that
state-of-the-art language models for Italian like UmBERTo or
GilBERTo16 are based on the same RoBERTa
architecture but feature roughly three times as many
parameters and have been trained on an order of
magnitude more data.) However, the experiment is
successful insofar as explicitly modelling
part-of-speech information using an unsupervised HMM
gives modest gains on both test sets. On the union
of the two test sets, this corresponds to a
statistically significant improvement from 89.84 to 90.42
(McNemar mid-p test: p = 0.0133).</p>
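      <p>The significance test can be sketched as follows; b and c are the discordant counts (tokens on which exactly one of the two models is correct), and the code is a textbook mid-p binomial version of McNemar's test, not the exact implementation used for the paper.</p>
      <preformat>
```python
from math import comb

# Hedged sketch of the McNemar mid-p test: under the null hypothesis,
# the number of discordant "wins" of one model follows Binomial(b+c, 0.5).
# The mid-p variant subtracts the point probability of the observed
# count from the exact two-sided p-value.
def mcnemar_midp(b, c):
    n = b + c
    k = min(b, c)
    point = comb(n, k) / 2 ** n
    cdf = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * cdf)
    return p_exact - point
```
      </preformat>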
      <table-wrap id="tab4">
        <label>Table 4</label>
        <caption>
          <p>Tagging accuracies of the two RoBERTa models on the formal and informal test sets.</p>
        </caption>
        <table>
          <thead>
            <tr><th>model</th><th>formal</th><th>informal</th></tr>
          </thead>
          <tbody>
            <tr><td>RoBERTa</td><td>91.28</td><td>88.46</td></tr>
            <tr><td>RoBERTa+HMM</td><td>91.84</td><td>89.05</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>14 We use the RoBERTa implementation from the transformers
library (https://github.com/huggingface/transformers) with 6
hidden layers, 8 attention heads, a hidden size of 512 and an
intermediate size of 2048.
15 Pretraining for 100,000 steps with a batch size of 500, a peak
learning rate of 5·10<sup>−4</sup>, 6,000 warm-up steps and dropout
set to 0.1. Fine-tuning to the KIPoS task using the entire
training data for 4 epochs with a batch size of 32 and a
learning rate of 3·10<sup>−4</sup>.
16 https://github.com/musixmatchresearch/umberto,
https://github.com/idb-ita/GilBERTo</p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>This paper started out with the assumption that
in low-resource scenarios like the KIPoS shared
task the integration of additional resources such as
lexica (in our case, Morph-it!) and distributional
information from larger corpora (in our case, the
Brown clusters) can compensate for the lack of
large amounts of training data. Moreover, our
strategy also built on the assumption that in a
low-resource scenario domain adaptation would be a
winning strategy, as it would enable us to exploit
larger training sets for written language (out of
domain), and then fine-tune the tagger on the spoken
language (in domain). The results of our
experiments, and the insights gathered from the
statistical analysis of our results indicate that both
assumptions hold true, as far as our
contribution to the KIPoS shared task is concerned. In
subtasks A and B, where only half the amount of
training data was available, this strategy even
outperformed a fine-tuned state-of-the-art neural
language model. Further work is needed to assess the
complementarity of the error profiles of different
configurations, also taking into account the
neural architectures evaluated in Section 5.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <surname>Lucia</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Evalita 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for italian</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Blombach</surname>
          </string-name>
          , Natalie Dykes, Philipp Heinrich, Besim Kabashi, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Proisl</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>A corpus of German Reddit exchanges (GeRedE)</article-title>
          .
          <source>In Proc. of LREC</source>
          , pages
          <fpage>6310</fpage>
          -
          <lpage>6316</lpage>
          , Marseille. ELRA.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Silvia Ballarè, Massimo Cerruti, Eugenio Goria, and
          <string-name>
            <given-names>Caterina</given-names>
            <surname>Mauri</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>KIPoS@EVALITA2020: Overview of the task on KIParla part of speech tagging</article-title>
          .
          <source>In Proc. of EVALITA. CEUR.org.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Peter F.</given-names>
            <surname>Brown</surname>
          </string-name>
          , Vincent J. Della Pietra, Peter V. de Souza, Jennifer C. Lai, and
          <string-name>
            <given-names>Robert L.</given-names>
            <surname>Mercer</surname>
          </string-name>
          .
          <year>1992</year>
          .
          <article-title>Class-based n-gram models of natural language</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>18</volume>
          (
          <issue>4</issue>
          ):
          <fpage>467</fpage>
          -
          <lpage>479</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Besim</given-names>
            <surname>Kabashi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Proisl</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Albanian part-of-speech tagging: Gold standard and evaluation</article-title>
          .
          <source>In Proc. of LREC</source>
          , pages
          <fpage>2593</fpage>
          -
          <lpage>2599</lpage>
          , Miyazaki. ELRA.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Gabriella</given-names>
            <surname>Lapesa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Evert</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>A large scale evaluation of distributional semantic models: Parameters, interactions and model selection</article-title>
          .
          <source>TACL</source>
          ,
          <volume>2</volume>
          :
          <fpage>531</fpage>
          -
          <lpage>546</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Percy</given-names>
            <surname>Liang</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Semi-supervised learning for natural language</article-title>
          .
          <source>Master's thesis</source>
          , Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Lison</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jörg</given-names>
            <surname>Tiedemann</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Opensubtitles2016: Extracting large parallel corpora from movie and TV subtitles</article-title>
          .
          <source>In Proc. of LREC</source>
          , pages
          <fpage>923</fpage>
          -
          <lpage>929</lpage>
          , Portorož. ELRA.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Yinhan</given-names>
            <surname>Liu</surname>
          </string-name>
          , Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen,
          <string-name>
            <surname>Omer Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mike</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Verena</given-names>
            <surname>Lyding</surname>
          </string-name>
          , Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell'Orletta, Henrik Dittmann, Alessandro Lenci, and
          <string-name>
            <given-names>Vito</given-names>
            <surname>Pirrelli</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>The PAISÀ corpus of Italian web texts</article-title>
          .
          <source>In Proc. of WaC-9</source>
          , pages
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          , Gothenburg. ACL.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Caterina</given-names>
            <surname>Mauri</surname>
          </string-name>
          , Silvia Ballarè, Eugenio Goria, Massimo Cerruti, and
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Suriano</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>KIParla corpus: A new resource for spoken Italian</article-title>
          .
          <source>In Proc. of</source>
          CLiC-it, Bari.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Sabrina J.</given-names>
            <surname>Mielke</surname>
          </string-name>
          , Ryan Cotterell, Kyle Gorman, Brian Roark, and
          <string-name>
            <given-names>Jason</given-names>
            <surname>Eisner</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>What kind of language is hard to language-model?</article-title>
          <source>In Proc. of ACL</source>
          , pages
          <fpage>4975</fpage>
          -
          <lpage>4989</lpage>
          , Florence. ACL.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Pedro Javier</given-names>
            <surname>Ortiz Suárez</surname>
          </string-name>
          , Benoît Sagot, and
          <string-name>
            <given-names>Laurent</given-names>
            <surname>Romary</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures</article-title>
          .
          <source>In Proc. of CMLC-7</source>
          , pages
          <fpage>9</fpage>
          -
          <lpage>16</lpage>
          , Mannheim. Leibniz-Institut für Deutsche Sprache.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Proisl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Uhrig</surname>
          </string-name>
          , Philipp Heinrich, Andreas Blombach, Sefora Mammerella, Natalie Dykes, and
          <string-name>
            <given-names>Besim</given-names>
            <surname>Kabashi</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>The_Illiterati: Part-of-speech tagging for Magahi and Bhojpuri without even knowing the alphabet</article-title>
          .
          <source>In Proc. of NSURL</source>
          , Trento.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Proisl</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>SoMeWeTa: A part-of-speech tagger for German social media and web texts</article-title>
          .
          <source>In Proc. of LREC</source>
          , pages
          <fpage>665</fpage>
          -
          <lpage>670</lpage>
          , Miyazaki. ELRA.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Shauli</given-names>
            <surname>Ravfogel</surname>
          </string-name>
          , Yoav Goldberg, and
          <string-name>
            <given-names>Francis</given-names>
            <surname>Tyers</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Can LSTM learn to capture agreement? The case of Basque</article-title>
          .
          <source>In Proc. of BlackboxNLP</source>
          , pages
          <fpage>98</fpage>
          -
          <lpage>107</lpage>
          , Brussels, November. ACL.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Tamburini</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>UniBO@KIPoS: Fine-tuning the Italian “BERTology” for the EVALITA 2020 KIPOS task</article-title>
          .
          <source>In Proc. of EVALITA</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Ian</given-names>
            <surname>Tenney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dipanjan</given-names>
            <surname>Das</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ellie</given-names>
            <surname>Pavlick</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT rediscovers the classical NLP pipeline</article-title>
          .
          <source>In Proc. of ACL</source>
          , pages
          <fpage>4593</fpage>
          -
          <lpage>4601</lpage>
          , Florence. ACL.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Ke M.</given-names>
            <surname>Tran</surname>
          </string-name>
          , Yonatan Bisk, Ashish Vaswani, Daniel Marcu, and
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Knight</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Unsupervised neural hidden Markov models</article-title>
          .
          <source>In Proc. of the Workshop on Structured Prediction for NLP</source>
          , pages
          <fpage>63</fpage>
          -
          <lpage>71</lpage>
          , Austin, TX. ACL.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Eros</given-names>
            <surname>Zanchetta</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Morph-it! A free corpus-based morphological resource for the Italian language</article-title>
          .
          <source>In Proc. of Corpus Linguistics</source>
          , Birmingham.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>