<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Introducing a Gold Standard Corpus from Young Multilinguals for the Evaluation of Automatic UD-PoS Taggers for Italian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Veronica Juliana Schmalz</string-name>
          <email>veronicajuliana.schmalz@kuleuven.be</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jennifer-Carmen Frey</string-name>
          <email>jennifercarmen.frey@eurac.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Egon W. Stemle</string-name>
          <email>egon.stemle@eurac.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Informatics, Masaryk University</institution>
          ,
          <addr-line>Brno</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <addr-line>Bozen</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Applied Linguistics, Eurac Research</institution>
          ,
          <addr-line>Bozen</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>KU Leuven, imec research group itec</institution>
          ,
          <addr-line>Kortrijk</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Part-of-speech (PoS) tagging constitutes a common task in Natural Language Processing (NLP) given its widespread applicability. However, with the advance of new information technologies and language variation, the contents and methods for PoS-tagging have changed. The majority of existing Italian data for this task originate from standard texts, where language use is far from the multifaceted informal situations of real life. Automatic PoS-tagging models trained with such data do not perform reliably on non-standard language, like social media content or language learners' texts. Our aim is to provide additional training and evaluation data from language learners tagged in Universal Dependencies (UD), as well as to test current automatic PoS-tagging systems and evaluate their performance on such data. We use Italian texts from a multilingual corpus of young language learners, LEONIDE, to create a tagged gold standard for evaluating UD PoS-tagging performance on non-standard language. With the 3.7 version of Stanza, a Python NLP package, we apply the available automatic PoS-taggers, namely ISDT, ParTUT, POSTWITA, TWITTIRÒ and VIT, trained with diversified data, to our dataset. Our results show that the above taggers, trained on non-standard data or multilingual treebanks, can</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        Part-of-Speech (PoS) tagging relates to the
assignment of tags or labels to the words,
punctuation marks and symbols of a text. It
constitutes a basic task in NLP, with applications
ranging from machine translation to speech
recognition and beyond. PoS-tags usually
correspond to the morphosyntactic word classes
of a given language, i.e. nouns, verbs,
conjunctions, etc. Since each language has
specific linguistic characteristics that distinguish
it from others, tagsets are usually
language-dependent. The first automatic tool for the
assignment of PoS-tags in the Italian language
was the TreeTagger built at the University of
Stuttgart
        <xref ref-type="bibr" rid="ref15">(Schmid, 1994)</xref>
        , designed to perform
lemmatization and PoS-tagging simultaneously.
Another milestone in the history of Italian
PoS-tagging is the so-called Baroni's TreeTagger
tagset, released in 2003. It was initially the
most widely adopted tagset, containing no fewer than 50
labels, half of them exclusively dedicated to verbs
        <xref ref-type="bibr" rid="ref1">(Baroni
et al., 2004)</xref>
        . Along with the latter, Tanl (Attardi
and Simi, 2009) constitutes another
relevant and comprehensive tagset for Italian. It
comprises numerous tags and includes
morphological word features. Three subcategories
with different numbers of elements can be found
in it, namely 14 coarse-grained tags, 37
fine-grained tags and 336 morphed tags.
      </p>
      <p>
        Originally, automatic tagging methods were
mainly employed with standard texts, such as
essays, literature, and newspaper articles
        <xref ref-type="bibr" rid="ref1">(Delmonte
et al., 2007; Baroni et al., 2004)</xref>
        . However,
with the advent of new communication systems
and the expansion of language studies to more
informal and common areas, attention started to
shift to non-standard texts. In this regard, in
several of the EVALITA periodic evaluation
campaigns for Italian NLP and speech tools,
PoS-tagging non-standard language has been a topic of
interest
        <xref ref-type="bibr" rid="ref14 ref16 ref3 ref4">(cf. Tamburini, 2007; Attardi and Simi,
2009; Bosco et al., 2016, Bosco et al., 2020)</xref>
        .
These tasks proved that PoS-tagging still
represents an unsolved issue when it comes to less
widely used language from different domains.
Therefore, more studies and investigations are
needed on specific language varieties.
      </p>
      <p>
        Learner corpora exhibit a number of
characteristics that differentiate them from other
corpora. In particular, code-switching and
code-mixing phenomena are frequent in
them, as well as orthographical,
syntactic and/or grammatical errors
        <xref ref-type="bibr" rid="ref8">(Di Novo et
al., 2019)</xref>
        . In more detail, our data exhibited some
peculiarities, for example the co-presence of
variants for concepts (“Franco viene a casa e vede
che fuocare/brenn”) or new words combining
different languages and morphologies (“Se sarò
un giocatore famoso richerò money”). Given
these distinctive aspects, analysing them in the
context of PoS-tagging can offer interesting
insights from the point of view of both the
conception of these systems and their linguistic
implications.
      </p>
      <p>The rest of the paper is organized as follows.
Section 2 provides relevant details concerning the
Universal Dependencies (UD), as well as
available Italian treebanks and taggers1. A brief
overview of the differences in tagging
standard and non-standard texts is presented in
Section 3. Section 4 describes the methods and
metrics commonly used for the evaluation of
automatic taggers. We outline the tools and
methodologies used for our experiments in
Section 5 and the gold standard in Section 6. Next,
in Section 7, we report the obtained results and in
the subsequent section, namely 8, we discuss our
findings, consider possible future works and draw
our final conclusions.
1 In this paper we use this term to refer to the Stanza
models trained with the different available Italian
Treebanks.</p>
    </sec>
    <sec id="sec-3">
      <title>Universal Dependencies and Italian Treebanks</title>
      <p>
        Over the years, alongside the different taggers and
treebanks of individual languages, a new
language-independent framework for PoS annotation has
emerged: the Universal Dependencies (UD). UD
is a cross-linguistic project with the aim of
building common annotation frameworks for
several world languages. Underlying the
Universal Dependencies annotation scheme are
universal Stanford dependencies
        <xref ref-type="bibr" rid="ref6">(Marneffe et al.,
2008)</xref>
        , Google universal PoS-tags
        <xref ref-type="bibr" rid="ref11">(Petrov et al.,
2011)</xref>
        and the Interset interlingua for
morphosyntactic tagsets
        <xref ref-type="bibr" rid="ref10">(cf. McDonald et al.,
2013)</xref>
        . In particular, for the Italian language, the
UD includes seven different treebanks. These are
VIT, or the Venice Italian Treebank
        <xref ref-type="bibr" rid="ref7">(Delmonte et
al., 2007)</xref>
        , ISDT, Italian Stanford Dependency
Treebank
        <xref ref-type="bibr" rid="ref13 ref2">(Bosco et al., 2014)</xref>
        , ParTUT, or the
Parallel Text Universal Treebank
        <xref ref-type="bibr" rid="ref13">(Sanguinetti et
al. 2014)</xref>
        , PoSTwita
        <xref ref-type="bibr" rid="ref3">(Bosco et al., 2016)</xref>
        ,
TWITTIRÒ
        <xref ref-type="bibr" rid="ref5">(Cignarella et al., 2018)</xref>
        , Valico-UD
        <xref ref-type="bibr" rid="ref8">(Di Novo et al., 2019)</xref>
        and PUD, or the Parallel
Universal Dependencies Treebank (Zeman et al.,
2018). The UD universal Italian tagset counts a
total of 17 different labels
        <xref ref-type="bibr" rid="ref17">(Universal
Dependencies, 2021)</xref>
        .
      </p>
    </sec>
    <sec id="sec-5">
      <title>PoS-Tagging Standard vs. Non-standard Language</title>
      <p>
        Among the various available treebanks and
taggers for Italian, most have been created using
exclusively standard data, such as newspaper
articles, non-fictional texts, talks and Wikipedia
pages for training the models (as in the case of
VIT, ISDT, ParTUT and PUD). However,
recently, more attention has been devoted to the
creation of linguistic resources for non-standard
language: as the quantity and dissemination of this
type of content increase exponentially, so does
the need for suitable tools for its analysis and
exploitation. In this respect, PoSTwita
        <xref ref-type="bibr" rid="ref3">(Bosco et
al., 2016)</xref>
        and TWITTIRÒ
        <xref ref-type="bibr" rid="ref5">(Cignarella et al.,
2018)</xref>
        resorted to additional non-standard Italian
linguistic data from Twitter, while Valico-UD
        <xref ref-type="bibr" rid="ref8">(Di
Novo et al., 2019)</xref>
        used texts from Italian learners
for the creation of their treebanks. Some of the
main reasons why the use of standard language
data outweighs that of non-standard data are the
difficulties concerning the automatic processing
and annotation of such texts. This applies
especially given the considerable amount
of variation they contain, not only in the language
itself, but also in the usage domains and among
the individual language users
        <xref ref-type="bibr" rid="ref12 ref14">(cf. Plank, 2016 and
Sanguinetti et al. 2020)</xref>
        . As a matter of fact, some
distinctive features of non-standard texts are the
broad variation in the structure and punctuation of
utterances, namely in the syntax, but also at
lexical level due to the use of abbreviations,
domain-specific symbols or incorrect derivational
forms, as well as code-switching for learners’
language. The latter are likely to lead to issues
regarding both automatic language processing,
such as tokenization and lemmatization, and
PoS-tagging, especially in the case of unsuitable or
incomplete standard treebanks. For these reasons,
the creation of resources from non-standard texts,
such as those of social media users or language learners, is
crucial.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Evaluation of Automatic PoS-Taggers</title>
      <p>
        When it comes to evaluating the performance of a
PoS-tagger, generally an annotated gold standard
reference corpus is used. The latter requires a
distribution of the particular linguistic phenomena
that is representative of the PoS-tagger’s target
application. Additionally, since a PoS-tagger
combines several functions, like tokenization,
word/sentence segmentation, and PoS-tag
disambiguation, one of these parts must first be
chosen as the test object. After selecting the aspect
under analysis, it is necessary to choose which
metrics to use to compare the results. The metrics
commonly adopted for the evaluation of the tags
assigned to a linguistic corpus are accuracy,
precision, recall, F1-scores and Cohen’s K
        <xref ref-type="bibr" rid="ref6">(cf.
Artstein and Poesio, 2008)</xref>
        . These metrics vary not
only in terms of the aspects they measure but also
according to the type of data that constitute the
corpus and its size.
      </p>
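As a minimal illustration of two of the metrics named above, accuracy and Cohen's K can be sketched in a few lines of Python. The tag sequences here are hypothetical toy values, not data from the corpus:

```python
from collections import Counter

def accuracy(gold, pred):
    # Proportion of tokens whose predicted tag matches the gold tag.
    hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return hits / len(gold)

def cohens_kappa(gold, pred):
    # Agreement corrected for chance: (p_o - p_e) / (1 - p_e),
    # where p_e is the agreement expected from the two tag
    # distributions alone.
    n = len(gold)
    p_o = accuracy(gold, pred)
    gold_freq, pred_freq = Counter(gold), Counter(pred)
    p_e = sum(gold_freq[t] * pred_freq[t] for t in gold_freq) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical tag sequences for illustration:
gold = ["NOUN", "VERB", "NOUN", "PART"]
pred = ["NOUN", "VERB", "NOUN", "PRON"]
print(accuracy(gold, pred))      # 0.75
print(cohens_kappa(gold, pred))  # about 0.636
```

Kappa is lower than raw accuracy here because part of the observed agreement is attributable to the frequent NOUN and VERB tags alone.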
      <p>
        Although various UD taggers for
Italian exist, little is known about how they
perform on non-standard data. Some evaluations
have been done on user-generated texts in social
media
        <xref ref-type="bibr" rid="ref3 ref5">(Bosco et al. 2016; Cignarella et al. 2018)</xref>
        and recently also on spoken language
        <xref ref-type="bibr" rid="ref14 ref4">(Bosco et al.
2020)</xref>
and on texts by adult learners of Italian with English,
French, German and Spanish as first languages
        <xref ref-type="bibr" rid="ref8">(Di Novo et al. 2019)</xref>
        . However, this is still a
nascent process, and the number of studies and
analysed varieties are limited. Therefore, a closer
examination and evaluation of an automatic
tagger on an additional non-standard resource
from a different domain promises to enhance our
knowledge about PoS-tagging.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Methodology</title>
      <p>
        In this study, we evaluate automatic PoS-tagging
on the LEONIDE corpus
        <xref ref-type="bibr" rid="ref9">(Glaznieks et al., 2020)</xref>
        to investigate how existing tagging models trained
with the already available Italian treebanks
perform with data from young language learners.
      </p>
      <p>Given the inaccessibility of an evaluation
sample for UD PoS-tagging on Italian learner
language, we built our own pre-tokenized gold
standard sample (see Section 6). Once we had our
gold standard, we created a processing pipeline to
test available tagging models for Italian on our
data. For this, we used Stanza, a Python natural
language analysis package designed using the UD
formalism, as it offered easy access to a number
of pre-trained models for PoS-tagging UD in
Italian. The following models have been used in
our evaluation: ISDT, ParTUT, POSTWITA,
TWITTIRÒ and VIT. In order to evaluate only the
PoS-tag disambiguation step of the PoS-taggers,
regardless from other steps such as tokenization,
we tagged the pre-tokenized texts using Stanza
but deactivated the tokenizer
(tokenize_pretokenized=True) and selected a
different model as parameter each time. With the
results obtained from each model, we resorted to
sklearn.metrics.classification_report and
sklearn.metrics.cohen_kappa_score to evaluate
the total number of tags assigned to the more than
7,000 gold standard tokens according to accuracy
and Cohen's K. In this way, the use of the exact
same tokens and comparison metrics
allowed a fair and meaningful comparison.</p>
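The pipeline described above can be sketched as follows. The `tag_with_model` helper is a hypothetical wrapper around the Stanza pipeline (using `tokenize_pretokenized=True` and a per-treebank `package` as in the paper); the gold and predicted tag lists below are toy values for illustration, not results from the study:

```python
from sklearn.metrics import classification_report, cohen_kappa_score

def tag_with_model(package, sentences):
    # Hypothetical sketch of the tagging step: 'sentences' is a list of
    # lists of tokens; the tokenizer is bypassed so only the PoS-tag
    # disambiguation step is evaluated. Requires the Stanza models for
    # the given package (e.g. "isdt", "partut") to be downloaded.
    import stanza
    nlp = stanza.Pipeline("it", processors="tokenize,pos",
                          package=package, tokenize_pretokenized=True)
    doc = nlp(sentences)
    return [word.upos for sent in doc.sentences for word in sent.words]

# Toy gold and predicted UPOS tag lists for illustration:
gold = ["NOUN", "VERB", "NOUN", "PART", "X"]
pred = ["NOUN", "VERB", "NOUN", "PRON", "PROPN"]

print(cohen_kappa_score(gold, pred))
print(classification_report(gold, pred, zero_division=0))
```

Flattening the per-sentence tags into a single list keeps the comparison aligned token by token, which is what makes the per-tag report and Cohen's K directly comparable across models.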
      <p>
        We closely focused on the accuracy and
Cohen’s K values
        <xref ref-type="bibr" rid="ref6">(cf. Artstein and Poesio, 2008)</xref>
        because the first allowed us to check the overall
performance of the tagger as well as the results on
each tag’s class, and the second to evaluate the
similarity between the gold-standard and the
automatically assigned tags.
      </p>
      <p>As the available models had been trained on
different data, both in quantity and type compared
to each other but also compared to our corpus, it
was particularly interesting to consider how they
would deal with the young language learner data
at hand, but also which type of errors they would
make. We thus investigate common
misclassifications for taggers and human
annotators, discussing possible improvements and
considerations to bear in mind when using these
automatic PoS-tagging systems. For the latter, we
used confusion matrices, so that we could check
the types of errors made, and which were the most
correctly assigned tags out of the total.
</p>
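The confusion matrices used for this error analysis can be sketched with a simple counter over (gold, predicted) tag pairs. This is a minimal illustration with hypothetical tags, not the matrices reported in the figures:

```python
from collections import Counter, defaultdict

def confusion_matrix(gold, pred):
    # Count (gold tag, predicted tag) pairs; diagonal cells are correct
    # assignments, off-diagonal cells reveal systematic confusions.
    matrix = defaultdict(Counter)
    for g, p in zip(gold, pred):
        matrix[g][p] += 1
    return matrix

# Hypothetical tag sequences for illustration:
gold = ["X", "X", "PART", "NOUN", "VERB"]
pred = ["PROPN", "X", "PRON", "NOUN", "VERB"]
m = confusion_matrix(gold, pred)
print(m["X"]["PROPN"])   # how often gold X was tagged PROPN
print(m["PART"]["PRON"]) # how often gold PART was tagged PRON
```

Reading across a row shows how a single gold tag was distributed over the predicted tags, which is the view used to identify the most frequent misclassifications.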
    </sec>
    <sec id="sec-9">
      <title>Gold Standard</title>
      <p>
        For the creation of our gold standard, we used a
subset of the Longitudinal lEarner cOrpus iN
Italian, Deutsch, English (LEONIDE)
        <xref ref-type="bibr" rid="ref9">(Glaznieks
et al., 2020)</xref>
        , a collection of 2,512 texts from 163
trilingual pupils attending lower secondary school
(scuola media) in South Tyrol. The corpus
contains texts in three languages, namely English,
German, and Italian, and in two text genres,
meaning narrative in the form of a
picture-inspired story and argumentative in the form of a
simple opinion text. Over the span of three years,
the pupils were asked to write one text for each of
the three languages and each of the text genres per
year. The portion of Italian data in the corpus
amounts to 844 texts counting 93,378 tokens. For
our gold standard2, we randomly selected a
sample of 10% of the total available Italian texts,
i.e. 84 texts with 7,665 tokens. We pre-tokenized
and pre-tagged the texts in the sample using
Stanza with the combined PoS-tagging model3 in
order to present our annotators with vertical files
with one token per line and a PoS-tag to be
eventually corrected. Once this step was
completed, two language experts, native speakers
of Italian, independently annotated the texts,
correcting and adjusting the automatically
pre-tagged version using the guidelines and
documentation for the UD PoS tags and making
use of the whole UD tagset. Their inter-annotator
agreement in the independent tagging was
very high, achieving a Cohen's Kappa of
0.98. In order to investigate a possible effect given
by the use of a pre-tagged corpus version by the
annotators, we also tested tagging the texts from
scratch, meaning without any pre-assigned labels
in the tokenized texts. For this purpose, we
selected a random sample of ten texts extracted
from the original corpus. Once again, to compare
the two tagged versions we calculated the Cohen's
K value, which resulted in 0.95. Hence, we can
conclude that the pre-tagged version had no
particular effect on the annotators and did not
significantly affect their annotation.
2 Available at http://hdl.handle.net/20.500.12124/34
3 This indicates the Stanza model which originates from
a combination of the existing taggers given by the
treebanks for the Italian language:
https://stanfordnlp.github.io/stanza/combined_models.
4 https://universaldependencies.org/it/
      </p>
      <p>Despite the generally good agreement between
the annotators, some difficulties emerged. These
mainly concerned cases of German
codeswitching, particles, clitic pronouns and auxiliary
verbs (see Discussion), and occasionally
orthographical or overgeneralization errors (ex.
Da grande facherò [X/VERB] il calciatore). For
the gold standard these issues were unanimously
resolved in accordance with the Italian UD
guidelines4.
</p>
    </sec>
    <sec id="sec-10">
      <title>Results</title>
      <p>Table 1 displays the obtained results in terms of
tagging models’ accuracy and Cohen’s K, this
time comparing the gold standard and the taggers’
assigned tags, along with the accuracy scores
reported in Stanza for the CoNLL 2018 Shared
Task5 on UD v2.5 Treebanks evaluation.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>Tagger</th>
              <th>Training data (in tokens)</th>
              <th>Accuracy (Stanza)</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>Combined</td><td>Pre-trained</td><td/></tr>
            <tr><td>TWITTIRÒ</td><td>28,387 (ironic tweets)</td><td/></tr>
            <tr><td>ParTUT</td><td>Multilingual parallel treebank</td><td/></tr>
            <tr><td>PoSTWITA</td><td>119,238 (tweets)</td><td/></tr>
            <tr><td>ISDT</td><td>278,429 (articles, newspapers, legal texts, Wikipedia)</td><td/></tr>
            <tr><td>VIT</td><td>272,000 (news, bureaucracy, finance, science, literature texts)</td><td/></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The highest accuracy on our gold standard for
learner data has been achieved by the combination
of models chosen by Stanza by default.
5 https://universaldependencies.org/conll18/evaluation.html
We would have expected better results from ISDT,
considering the high accuracy values on the
standard data used to train it, and PoSTWITA for
non-standard texts. However, with
respect to our gold standard, the best models in terms of
accuracy and Cohen's K are TWITTIRÒ
and ParTUT. The latter performed well despite
the fact that their tagsets did not contain all the
tags used in our gold standard. In fact, both
TWITTIRÒ and ParTUT, as well as PoSTWITA,
did not include the PART tag (contrary to the
other treebanks such as VIT and ISDT), and thus
did not assign it to particles. However, our human
annotators referred to this tag to mark the common
use of pronominal, reflexive and adverbial
particles, such as 'mi' and ‘si’ in the corpus (ex.
Più lingue ci sono; Si deve studiare molto).
Furthermore, the ParTUT treebank also lacked the
tag for interjections, INTJ, as opposed to other
treebanks that did make use of this category.
Nevertheless, the training data for TWITTIRÒ
was the treebank provided in Stanza that was
closest to our data in type. It was created using
data from social networks, therefore far from the
scientific, nonfictional, or journalistic canon. On
the other hand, ParTUT had been designed using
standard texts but in Italian, English, and French
in parallel.
</p>
    </sec>
    <sec id="sec-11">
      <title>Discussion</title>
      <p>The results show that the performance of the
models was significantly influenced by the
particular type of data in our gold standard corpus,
which presented incorrect orthographical or
morphological tokens, but also contained
numerous foreign words and atypically positioned
parts of speech within the sentence.</p>
      <p>In fact, when inspecting the tags incorrectly
assigned by the different models with confusion
matrices (see Figure 1, 2 and 3 below), we noticed
that:</p>
      <p>● The foreign or misspelled words, which
according to the UD rules had to be
assigned the X tag, proved to be those with
the highest number of errors. In fact, they
were often confused with proper nouns,
PROPNs, especially in the case of
code-switching with the German language,
where nouns are spelled with initial capital
letters (ex. Dopo la scuola media voglio
fare la Hotelfachschule [PROPN-X]). This
was particularly evident with the ParTUT
model that did not assign the X tag at all
(see Figure 3);</p>
      <p>● The second most incorrectly tagged words
were particles, PART, which are not
included in the tagsets of all models
although they could have been assigned to
pronominal, reflexive and adverbial
particles (see Section 7). Instead, these
words were usually assigned the PRON tag
for pronouns (ex. Si [PRON-PART] deve
parlare questa lingua);</p>
      <p>● The third most inaccurate group of tagged
words was that of interjections, INTJ,
which were also not included in all
treebanks. These were often confused with
particles or foreign words, PART or X (ex.
Ehm [X-INTJ] ciao! fece Alessandra), as
is visible from Figure 2 in the case of the
TWITTIRÒ model.</p>
      <p>On the other hand, regarding discrepancies
between the tags assigned by the human
annotators, we found that:</p>
      <p>● The groups on which there was most
disagreement between the two annotators
concerned particles, PART, and auxiliary
verbs, AUX6 (ex. Le strategie che
funzionano peggio sono [AUX-VERB]
studiare con il computer). Concerning the
first, the models did not always include
the PART tag in their employed tagsets.
Auxiliary verbs, additionally, were also at
times abnormally positioned within the
sentence and were often automatically
annotated incorrectly.</p>
      <p>● Foreign words were often not annotated
according to the X tag, probably because
the annotators also had knowledge of the
German language and therefore tended to
assign the corresponding tag in the other
language (ex. Faccio la
Landesberufschule [NOUN-X]).</p>
      <p>
        We can therefore argue that there were errors
common to both automatic models and
annotators, although the reasons for the errors
were evidently different.
6 This might be due to the fact that annotators could be
influenced by the presence next to each token of a tag
automatically assigned by Stanza, which had performed
the tokenization of the texts.
Although all taggers managed to execute the task
of automatically PoS-tagging pre-tokenized
Italian non-standard language with an accuracy of
at least 75% (with the combined model offered by
Stanza showing the best performance with 95%
accuracy and 0.94 Cohen’s K), there were
differences in the performance shown by the
individual models. The best performing two
individual models were TWITTIRÒ (86%) and
ParTUT (84%), while ISDT and PoSTWITA, which
performed better in other evaluation tasks
        <xref ref-type="bibr" rid="ref13 ref2 ref5">(Bosco
et al. 2014, Cignarella et al. 2018)</xref>
        had a lower
accuracy on our data. These results suggest
that in order to automatically tag
non-standard texts from language learners, the
use of high-performance systems in the generic
task is not sufficient, but the characteristics of the
actual texts must also be taken into account.
      </p>
      <p>Improvements could be made in the future
regarding the adaptation of the models to the
particular type of data used here. They could be,
indeed, re-trained if a complete
treebank with Italian non-standard data becomes
available. In addition, further attempts could be
made to adapt or add the missing tags to the
tagsets of all models so as not to have results
biased by the lack of matching tags. Finally, as far
as the annotators are concerned, they could be
provided with the automatically pre-tokenized
texts from the models, but in order to avoid
pre-assigned tags influencing their annotation
process, it would be preferable to omit these.
Thus, human annotators would only get the
taggers’ tokenized text versions, so that the same
tokens would be available for everyone, while the
assignment of PoS would be completely up to
them.
Artstein, Ron and Poesio, Massimo (2008).
Inter-Coder Agreement for Computational Linguistics.
Computational Linguistics, 34(4), 555-596.</p>
      <p>Attardi, Giuseppe and Simi, Maria (2009). Overview
of the EVALITA 2009 Part-of-Speech tagging task.
Poster and Workshop proceedings of the 11th
Conference of the Italian Association for Artificial
Intelligence, 12 December 2009, Reggio Emilia.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Baroni</surname>
          </string-name>
          , Marco, Bernardini, Silvia, Comastri, Federica, Piccioni, Lorenzo, Volpi, Alessandra, Aston, Guy and Mazzoleni,
          <string-name>
            <surname>Marco</surname>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Introducing the La Repubblica Corpus: A Large, Annotated, TEI(XML)-Compliant Corpus of Newspaper Italian</article-title>
          .
          <source>Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04)</source>
          . Lisbon: ELDA,
          <fpage>1771</fpage>
          -
          <lpage>1774</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Bosco</surname>
          </string-name>
          , Cristina, Dell'Orletta, Felice, Montemagni, Simonetta, Sanguinetti, Marco and Simi,
          <string-name>
            <surname>Maria</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>The Evalita 2014 Dependency Parsing task</article-title>
          .
          <source>Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 &amp; and of the Fourth International Workshop EVALITA</source>
          <year>2014</year>
          ,
          <volume>9</volume>
          -
          <issue>11</issue>
          <year>December 2014</year>
          , Pisa,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Bosco</surname>
          </string-name>
          , Cristina, Tamburini, Fabio, Bolioli, Andrea and Mazzei,
          <string-name>
            <surname>Alessandro</surname>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Overview of the EVALITA 2016 Part Of Speech on TWitter for ITAlian task</article-title>
          . In: Tamburini,
          <string-name>
            <surname>F</surname>
          </string-name>
          . (Ed.),
          <article-title>EVALITA Evaluation of NLP and Speech Tools for Italian</article-title>
          .
          <source>Proceedings of the Final Workshop</source>
          , Torino: Accademia University Press,
          <fpage>78</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Bosco</surname>
          </string-name>
          , Cristina, Ballaré, Silvia, Cerruti, Massimo, Goria, Eugenio, &amp;
          <string-name>
            <surname>Caterina</surname>
          </string-name>
          ,
          <string-name>
            <surname>Mauri</surname>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>KIPoS@ EVALITA2020: overview of the task on Kiparla part of speech tagging</article-title>
          .
          <source>In EVALITA 2020 Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</source>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ). CEUR.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Cignarella</surname>
          </string-name>
          , Alessandra Teresa, Bosco, Cristina, Patti, Viviana, &amp;
          <string-name>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <surname>Mirko</surname>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Application and analysis of a multi-layered scheme for irony on the Italian Twitter Corpus TWITTIRÒ</article-title>
          . In: Calzolari, Nicoletta, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Hasida, Koiti, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Hélène, Moreno, Asuncion, Odijk, Jan, Piperidis, Stelios and Tokunaga, Takenobu (Eds.),
          <source>Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC</source>
          <year>2018</year>
          ),
          <source>Miyazaki: European Language Resources Association</source>
          ,
          <fpage>4204</fpage>
          -
          <lpage>4211</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>de Marneffe</surname>
            ,
            <given-names>Marie-Catherine</given-names>
          </string-name>
          and
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>Christopher D.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Stanford typed dependencies manual</article-title>
          .
          <source>Technical report</source>
          , Stanford University,
          <fpage>338</fpage>
          -
          <lpage>345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Delmonte</surname>
          </string-name>
          , Rodolfo, Bristot, Antonella and Tonelli,
          <string-name>
            <surname>Sara</surname>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>VIT - Venice Italian Treebank: Syntactic and Quantitative Features</article-title>
          . In: De Smedt, Koenraad, Hajic, Jan and Kübler, Sandra (Eds.),
          <source>Proceedings Sixth International Workshop on Treebanks and Linguistic Theories</source>
          ,
          Bergen: Northern European Association for Language Technology (NEALT) Proceedings Series Vol.
          <volume>1</volume>
          ,
          <fpage>43</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Di Nuovo</surname>
          </string-name>
          , Elisa, Bosco, Cristina, Mazzei, Alessandro and Sanguinetti,
          <string-name>
            <surname>Manuela</surname>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Towards an Italian Learner Treebank in Universal Dependencies</article-title>
          . In: Bernardi, Raffaella, Navigli, Roberto and Semeraro, Giovanni (Eds.),
          <source>Proceedings of the 6th Italian Conference on Computational Linguistics</source>
          , CliC-it
          <year>2019</year>
          (Vol.
          <volume>2481</volume>
          ), Bari, Italy,
          <source>CEUR WS: 1-6.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Glaznieks</surname>
          </string-name>
          , Aivars, Frey, Jennifer-Carmen, Stopfner, Maria, Zanasi, Lorenzo and Nicolas,
          <string-name>
            <surname>Lionel</surname>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>LEONIDE: A longitudinal trilingual corpus of young learners of Italian, German and English</article-title>
          .
          <source>International Journal of Learner Corpus Research (IJLCR).</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>McDonald</surname>
          </string-name>
          , Ryan, Nivre, Joakim, Quirmbach-Brundage, Yvonne, Goldberg, Yoav, Das, Dipanjan, Ganchev, Kuzman, Hall, Keith, Petrov, Slav, Zhang, Hao, Täckström, Oscar, Bedini, Claudia, Bertomeu Castelló, Nuria and Lee,
          <string-name>
            <surname>Jungmee</surname>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Universal Dependency Annotation for Multilingual Parsing</article-title>
          . In: Schütze, Hinrich, Fung, Pascale, Poesio, Massimo (Eds.),
          <source>Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics</source>
          , Sofia: Association for Computational Linguistics,
          <fpage>92</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Petrov</surname>
          </string-name>
          , Slav, Das, Dipanjan and
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>Ryan</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>A universal part-of-speech tagset</article-title>
          . In: Calzolari, Nicoletta, Choukri, Khalid, Declerck, Thierry, Uğur Doğan, Mehmet, Maegaard, Bente, Mariani, Joseph, Moreno, Asuncion, Odijk, Jan and Piperidis, Stelios (Eds.),
          <source>Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)</source>
          ,
          <source>Istanbul: European Language Resources Association (ELRA)</source>
          ,
          <fpage>2089</fpage>
          -
          <lpage>2096</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Plank</surname>
          </string-name>
          ,
          <string-name>
            <surname>Barbara</surname>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>What to do about non-standard (or non-canonical) language in NLP</article-title>
          . In: Misra Sharma, Dipti, Sangal, Rajeev and Kumar Singh, Anil (Eds.),
          <source>Proceedings of the 13th International Conference on Natural Language Processing</source>
          , Varanasi: NLP Association of India.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Sanguinetti</surname>
          </string-name>
          , Manuela and Bosco,
          <string-name>
            <surname>Cristina</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Converting the parallel treebank ParTUT in Universal Stanford Dependencies</article-title>
          . In: Basili, Roberto, Lenci, Alessandro and Magnini, Bernardo (Eds.),
          <source>Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 and of the Fourth International Workshop EVALITA 2014</source>
          , 9-11 December 2014, Pisa. Pisa: Pisa University Press,
          <fpage>316</fpage>
          -
          <lpage>321</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Sanguinetti</surname>
          </string-name>
          , Manuela, Cassidy, Lauren, Bosco, Cristina, Çetinoğlu, Özlem, Cignarella, Alessandra Teresa, Lynn, Teresa, Rehbein, Ines, Ruppenhofer, Josef, Seddah, Djamé &amp;
          <string-name>
            <surname>Zeldes</surname>
            ,
            <given-names>Amir</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations</article-title>
          . arXiv preprint arXiv:2011.02063.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Schmid</surname>
          </string-name>
          ,
          <string-name>
            <surname>Helmut</surname>
          </string-name>
          (
          <year>1994</year>
          ).
          <article-title>Probabilistic Part-of-Speech Tagging Using Decision Trees</article-title>
          .
          <source>Proceedings of International Conference on New Methods in Language Processing</source>
          , Manchester, UK.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Tamburini</surname>
          </string-name>
          ,
          <string-name>
            <surname>Fabio</surname>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>EVALITA 2007: The Part-of-Speech Tagging Task</article-title>
          .
          <source>Contributi scientifici, Associazione Italiana per l'Intelligenza Artificiale</source>
          , Anno IV, Giugno 2007.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <collab>Universal Dependencies</collab>
          (
          <year>February 2021</year>
          ).
          <article-title>UD for Italian</article-title>
          . https://universaldependencies.org/it/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>