=Paper= {{Paper |id=Vol-2765/154 |storemode=property |title=KIPoS @ EVALITA2020: Overview of the Task on KIParla Part of Speech Tagging |pdfUrl=https://ceur-ws.org/Vol-2765/paper154.pdf |volume=Vol-2765 |authors=Cristina Bosco,Silvia Ballarè,Massimo Cerruti,Eugenio Goria,Caterina Mauri |dblpUrl=https://dblp.org/rec/conf/evalita/BoscoBCGM20 }}
KIPoS @ EVALITA2020:
Overview of the Task on KIParla Part of Speech Tagging

Cristina Bosco⋆, Silvia Ballarè†, Massimo Cerruti⊕, Eugenio Goria⊕, Caterina Mauri‡
⋆ Dipartimento di Informatica, Università degli Studi di Torino
† Dipartimento di Filologia Classica e Italianistica, Università degli Studi di Bologna
⊕ Dipartimento di Studi Umanistici, Università degli Studi di Torino
‡ Dipartimento di Lingue, Letterature e Culture Moderne, Università degli Studi di Bologna
{cristina.bosco,massimosimone.cerruti,eugenio.goria}@unito.it,
{silvia.ballare,caterina.mauri}@unibo.it
Abstract

English. The paper describes KIPoS, the first task on Part of Speech tagging of spoken language held at the Evalita evaluation campaign. Benefiting from the availability of a resource of transcribed spoken Italian (i.e. the KIParla corpus), which has been newly annotated and released for KIPoS, the task includes three evaluation exercises focused on formal versus informal spoken texts. The datasets and the results achieved by participants are presented, and the insights gained from the experience are discussed.

Italiano. L'articolo descrive KIPoS, il primo task sul Part of Speech tagging di lingua parlata tenutosi nella campagna di valutazione Evalita. Usufruendo di una risorsa che raccoglie trascrizioni di lingua italiana (il corpus KIParla), annotate appositamente per KIPoS, il task è stato focalizzato intorno a tre valutazioni con lo scopo di confrontare i risultati raggiunti sul parlato formale con quelli ottenuti sul parlato informale. Il corpus di dati ed i risultati raggiunti dai partecipanti sono presentati insieme alla discussione di quanto emerso dall'esperienza di questo task.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Motivation

Even though in the last decades we have witnessed an increase in the resources available for the study of spoken Italian, a great imbalance can still be observed between spoken and written corpora, from different angles. Written corpora are generally larger, provide a wealth of information about the texts they include, and can count on a vast array of computational tools for morphological analysis and syntactic parsing. Conversely, spoken corpora of Italian are generally smaller, often give a minimum of information concerning the speakers and the context in which the interaction takes place and, finally, provide at most basic PoS-tagging and lemmatization tools. This, of course, poses considerable limitations on the searches that may be performed on these resources, eventually leading to a possible written-language bias due to the different availability and richness of information of written vs. spoken corpora (Linell, 2005).

As a consequence of this imbalance, corpus-based sociolinguistic analyses of spoken Italian, which need a comprehensive set of metadata, have rarely been put to the test on publicly available speech corpora. In fact, most sociolinguistic studies have been conducted on ad hoc-collected datasets, see inter al. (Alfonzetti, 2002; Mereu, 2019).

The KIParla corpus (Mauri et al., 2019) (approximately 661k tokens), which is available at the website www.kiparla.it, has been designed to overcome some shortcomings of previous resources. KIParla is a corpus of spoken Italian which encompasses various types of interactions between speakers of different origins and socioeconomic backgrounds. It consists of speech data collected in Bologna and Turin between 2016 and 2019, and contains two independent modules, i.e. KIP (cf. sec. 3) and ParlaTO. Among other things, KIParla provides a wide range of metadata, including situational characteristics (such as the symmetrical vs. asymmetrical relationship between the participants) and socio-demographic information for each speaker (such as age and level of education). Nevertheless, the lack of PoS-tagging and lemmatization currently places severe limits on its application.

In order to enrich the range of investigations that can be applied to the KIParla corpus, we proposed the KIPoS task. Following the experience of the Evalita 2016 PoSTWITA task on PoS tagging of Italian social media texts (Bosco et al., 2016) and the subsequent development of an Italian treebank for social media (Sanguinetti et al., 2017; Sanguinetti et al., 2018), where the issues related to a particularly challenging written text genre were addressed, KIPoS offers the opportunity of addressing the theoretical and methodological challenges related to PoS tagging of Italian spontaneous speech texts. Carrying out this task means processing a type of data that is known to be problematic for computational treatment, that is unplanned spoken language (as opposed to experimental speech data). PoS tagging of this corpus entails dealing with both a wide range of spontaneous speech phenomena and a great amount of sociolinguistic variation.

The most challenging aspects to be addressed in the unconstrained speech of KIParla are:

• To identify mode-specific phenomena, such as repetitions, reformulations, fillers, incomplete syntactic structures, etc.

• To trace a relevant set of non-standard alternatives back to the same linguistic phenomenon (e.g. the presence of socio-geographically marked forms like annà or andà, corresponding to standard Italian andare "to go"), either assigning them to the correct part of speech or working out an ad hoc solution.

• To deal with different types of interaction and registers (casual conversations, interviews, office hours, etc.) with a variable number of participants (1 to 5), each transcribed on a separate line and corresponding to an autonomous text string.

PoS-tagging of data from the KIParla corpus is intended to improve the current practices in use for tagging and parsing spoken Italian. Furthermore, this result is also significant for the purposes of (socio)linguistic research, in that the availability of annotated spoken corpora enables researchers to validate previous assumptions based on smaller or less informative datasets, but also to collect knowledge to be meaningfully used in the development of automatic conversation systems and chatbots.

2 Definition of the task

Given the innovative features of KIParla, we proposed KIPoS as a task for EVALITA 2020 (Basile et al., 2020) to address the issues involved in adapting a PoS tagger to the specific features of oral text, in order to systematically represent those features and to provide the means to access their specificities. We therefore provided data for training (i.e. Development Set, henceforth DEVSET) and testing (Test Set, henceforth TESTSET) systems, organized in two ensembles which respectively represent formal (DEVSET–formal and TESTSET–formal) and informal texts (DEVSET–informal and TESTSET–informal). This allowed us to consider one main task and two subtasks, described as follows:

• Main task - general: training on all given data (both DEVSET–formal and DEVSET–informal) and testing on all test set data (both TESTSET–formal and TESTSET–informal)

• Subtask A - crossFormal: training on data from DEVSET–formal only, and testing separately on data from formal texts (TESTSET–formal) and from informal texts (TESTSET–informal)

• Subtask B - crossInformal: training on data from DEVSET–informal only, and testing separately on data from formal texts (TESTSET–formal) and from informal texts (TESTSET–informal).

While all tasks are oriented to investigate how challenging it can be to PoS-tag spontaneous speech data, the cross ones are especially useful for validating the hypothesis that some differences occur between the tagging of formal conversations and that of informal conversations. As we will see in sections 5 and 6, this hypothesis is partially confirmed by the results. Some examples useful to illustrate the difference between the registers are provided in the next section.

3 Datasets

All the data provided for the KIPoS task are extracted from the KIP module (see Section 1),
Dataset    Register   Speakers   Turns   Tokens
DEVSET     Formal        5       1,998   13,864
           Informal     11       3,804   19,259
TESTSET    Formal        2         459    3,642
           Informal      2         582    3,532

Table 1: The sizes of the datasets.

which includes various communicative situations occurring in the academic context. As explained in detail in (Mauri et al., 2019), the recordings involve five different types of interactions, each of which is assigned for the aims of KIPoS either to the section of formal texts or to the section of informal texts (mainly on the basis of the relationship between the participants, i.e. asymmetrical vs. symmetrical).

The KIP corpus structure can thus be outlined as follows:

• Formal dataset:
    – lessons
    – office hours
    – oral examinations

• Informal dataset:
    – semi-structured interviews
    – casual conversations.

Below are examples of formal (1) and informal (2) texts.

(1)¹
BO088: una volta che carlo magno conquisto' l'italia fu permesso ad anselmo di tornare eh a mantova
BO088: nel settecentosettantaquattro
BO088: ehme cosi' po pote' riprendere la sua attivita' prima eh di creazione della biblioteca
BO088: perche' secondo appunto l'uso eh delle biblioteche eh
BO088: medioev medievali diciamo prima eh vi era
BO088: mh la insomma la raccolta di libri dall'esterno

(2)²
BO003: povero cristo sono andata a beccare questo
BO002: ma poi scusa il piu' carino di tutti lo cornifichi
BO003: si' si' si' esa poi secondo me lui e' il piu' carino di tutti
BO003: cioe' tra per i miei gusti tra il gruppo
BO002: no eh
BO002: carino sia
BO002: di viso ma anche
BO003: poi e' anche il piu' si' si' si' e' cornificatissimo non cornificato

Both excerpts feature spontaneous speech phenomena, such as fillers, repetitions and reformulations. However, example (1) shows several characteristics of formal styles, either cross-linguistically shared (e.g. clausal subordination, passive construction, abstract and specific terms) or language-specific (e.g. existential construction with vi as pre-copular proform), while example (2) displays various features which are typical of informal styles, such as simple sentence structure and pragmatically-marked word orders (e.g. il più carino di tutti lo cornifichi), multi-functional words (e.g. carino), colloquialisms (e.g. povero cristo, beccare, cornifichi, cornificato), elatives (e.g. cornificatissimo), deictics (e.g. questo, lui) and discourse markers (e.g. cioè, scusa).

All speakers were informed of the aims of the project, agreed to the recording and signed a consent form.

The set of data exploited for KIPoS consists of around 200K tokens, corresponding to approximately one third of the whole KIParla corpus, with an equal proportion of informal and formal speech data.

For the purposes of KIPoS, UDPipe trained on all the treebanks available for Italian within the Universal Dependencies repository³ has been applied to this 200K-token portion of the KIParla corpus. Among these data, approximately 30K tokens have been submitted to a careful manual check and correction⁴ and released as the training sets of the KIPoS task (i.e. DEVSET–formal and

¹ KIP Corpus, BOC1001, oral examination.
² KIP Corpus, BOA3001, casual conversation.
³ https://universaldependencies.org/it/index.html
⁴ We thank three students for their precious help: Filippo Mulinacci, Martina Pittalis and Roberto Russo of the Department of Modern Languages, Literatures and Cultures of the University of Bologna.
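The assignment of KIP interaction types to the formal and informal sections outlined in Section 3 can be expressed as a small lookup. The following is an illustrative sketch, not part of the official KIPoS distribution; the string keys are our own labels, taken from the lists above:

```python
# Mapping of KIP interaction types to the two KIPoS registers,
# following the formal/informal dataset structure described in Section 3.
REGISTER_BY_INTERACTION = {
    "lessons": "formal",
    "office hours": "formal",
    "oral examinations": "formal",
    "semi-structured interviews": "informal",
    "casual conversations": "informal",
}

def register_of(interaction_type: str) -> str:
    """Return the KIPoS section ('formal' or 'informal') of an interaction type."""
    return REGISTER_BY_INTERACTION[interaction_type]
```

For instance, `register_of("oral examinations")` yields `"formal"`, matching the assignment of example (1) above.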
Team     Affiliation
UniBO    FICLIT – University of Bologna
UniBA    University of Bari "Aldo Moro"
KLUMSy   Friedrich-Alexander-Universität Erlangen-Nürnberg & Universität Stuttgart

Table 2: The teams that participated in KIPoS and their affiliations.


DEVSET–informal). From the remaining automatically annotated data, we extracted the formal TESTSET and the informal TESTSET, which we also manually checked and validated. Finally, we released the remaining data as a silver standard (i.e. SILVERSET). These data have also been made available, together with the other data⁵, to be used for training participants' systems.

3.1 Annotation

As far as the annotation is concerned, for the purpose of the task the original orthographic transcriptions were provided in a tab-delimited .txt format. Three main identifiers are used in this format, respectively indicating the conversation (alphanumeric), the speaker's ID (alphanumeric) and the position of the turn (numeric) within the context of the conversation. For instance, the example below includes the first three turns of the conversation "BOD2018"⁶, in which three different speakers are involved ("1_MP_BO118", "2_MP_BO118" and "3_AM_BO140"):

# conversation = BOD2018
# speaker = 1_MP_BO118
# turn = 1
# text = dovresti parlarmi della tua casa
1    dovresti    AUX
2-3  parlarmi    VERB_PRON
2    parlar      VERB
3    mi          PRON
4-5  della       ADP_A
4    di          ADP
5    la          DET
6    tua         DET
7    casa        NOUN

# conversation = BOD2018
# speaker = 2_MP_BO118
# turn = 2
# text = attuale
1    attuale     ADJ

# conversation = BOD2018
# speaker = 3_AM_BO140
# turn = 3
# text = mh sì
1    mh          PARA
2    sì          INTJ

The format and the labels for tagging the part of speech of the KIPoS data are compliant with those provided in the Universal Dependencies Italian treebanks. Data were indeed released in a CoNLL-U-like format which, however, only includes its first three columns, separated by tabs as usual. For a detailed list and description of the tagset used in the KIPoS datasets, see the Appendix at the end of this paper.

3.2 Tokenization Issues

As for words comprising multiple tokens, in the data released for the development and training of participant systems (DEVSET–formal and DEVSET–informal) we annotated both their compound form and its split. See for instance lines 2-3, 2 and 3 in the first turn of the example above: a verb with a clitic suffix occurs and is annotated as a compound on line 2-3, while its components, i.e. the verb and the clitic, are separately annotated on lines 2 and 3 respectively.

In contrast, for the purpose of the evaluation, the format applied to the test set (TESTSET–formal and TESTSET–informal) only includes one word per line, regardless of the fact that a word may be composed of more than one token. This makes the format of the test set slightly different from that used in the development data, but more compliant with the evaluation scripts and procedures. An example of this format follows, consisting of the first turn of the example above:

# conversation = BOD2018
# speaker = 1_MP_BO118
# turn = 1
# text = dovresti parlarmi della tua casa
1    dovresti    AUX
2    parlarmi    VERB_PRON
3    della       ADP_A
4    tua         DET
5    casa        NOUN

⁵ All the data annotated for KIPoS are available at https://github.com/boscoc/kipos2020, together with the licence and the annotation guidelines.
⁶ The alphanumeric code used to name the KIP conversations provides information about the city in which the data has been collected (BO = Bologna, TO = Turin) and the kind of interaction (A1 = office hours, A3 = free conversation, C1 = exams, D1 = lessons, D2 = interviews). For example, BOD2018 is a semi-structured interview recorded in Bologna.
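As an illustration, the three-column format described in Section 3.1 can be read with a few lines of code. The sketch below is not the official KIPoS tooling; it assumes tab-separated token lines, '#'-prefixed metadata lines, and blank lines between turns, as in the examples above:

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    conversation: str = ""
    speaker: str = ""
    turn: int = 0
    text: str = ""
    tags: list = field(default_factory=list)  # (index, word, pos) triples

def parse_kipos(lines):
    """Parse the tab-delimited, CoNLL-U-like KIPoS format.

    Metadata lines start with '# '; token lines carry index, word and PoS tag.
    In the development-set format, range indices such as '2-3' mark compounds
    whose components follow on separate lines. Blank lines separate turns.
    """
    turns, current = [], None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():           # blank line: turn boundary
            current = None
            continue
        if current is None:            # first line of a new turn
            current = Turn()
            turns.append(current)
        if line.startswith("# "):      # metadata: key = value
            key, _, value = line[2:].partition(" = ")
            if key == "conversation":
                current.conversation = value
            elif key == "speaker":
                current.speaker = value
            elif key == "turn":
                current.turn = int(value)
            elif key == "text":
                current.text = value
        else:                          # token line: index, word, PoS tag
            index, word, pos = line.split("\t")
            current.tags.append((index, word, pos))
    return turns
```

A compound such as "parlarmi" simply surfaces as a tag triple with a range index ("2-3"), leaving it to the caller to decide whether to use the compound or its components.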
Task     DEVSET                TESTSET    Team      Score

         Baseline (from PoSTWITA)                  0.9319

Main     formal and informal   formal     UniBO    0.934880
                                          KLUMSy   0.875629
                                          UniBA    0.815819
                               informal   UniBO    0.911316
                                          KLUMSy   0.882368
                                          UniBA    0.793684
Task A   formal                formal     KLUMSy   0.873672
                                          UniBA    0.787311
                               informal   KLUMSy   0.875789
                                          UniBA    0.757895
Task B   informal              formal     KLUMSy   0.878144
                                          UniBA    0.771101
                               informal   KLUMSy   0.881053
                                          UniBA    0.775000

Table 3: The official scores achieved by participants for the three subtasks (Main, Task A and Task B), training systems on both or one of the datasets provided for development (DEVSET–formal and DEVSET–informal) and testing on TESTSET–formal and TESTSET–informal (best scores for each subtask in bold face).
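The scores reported in Table 3 are accuracies as defined in Section 4, i.e. correct tag assignments over total tokens, compared token by token. A minimal sketch of this computation (not the official evaluation script) could read:

```python
def pos_accuracy(gold_tags, predicted_tags):
    """Token-by-token PoS tagging accuracy.

    Assumes the two sequences are aligned, i.e. exactly one predicted tag
    per gold token, as required by the KIPoS evaluation, where only a
    single tag is allowed for each token.
    """
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("system output must contain one tag per gold token")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)
```

For example, a run that tags two out of three tokens correctly scores 2/3 ≈ 0.667.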


In the test-set example above, the verb with clitic suffix "parlarmi" ("to speak to me") is annotated as a compound on a single line, i.e. line 2.

4 Evaluation measures

For the KIPoS task a single measure has been used for the evaluation of participants' runs, i.e. accuracy, which is defined as the number of correct Part-of-Speech tag assignments divided by the total number of tokens in the gold TESTSET. The evaluation metric is based on a token-by-token comparison, and only a single tag is allowed for each token. The evaluation is performed in a black-box approach, where only the systems' output is evaluated.

5 Participation and Results

As shown in Table 3, where the results of the main task and the two subtasks are presented at a glance, three teams submitted their runs for KIPoS (see Table 2 for their affiliations). Nevertheless, one team participated in the main task only, while the other two provided results for Tasks A and B too.

The three teams applied different approaches. The UniBA team used a combination of two taggers implementing two different approaches, namely a stochastic Hidden Markov Model and a rule-based one. UniBO applied a fine-tuning approach to Part of Speech tagging based on a pre-trained BERT-derived neural language model (UmBERTo) and an adapted fine-tuning script. KLUMSy used a tagger based on the averaged structured perceptron, which supports domain adaptation and can incorporate external resources to deal with the limited availability of in-domain data.

The overall highest accuracy has been achieved in the main task by the UniBO team on the TESTSET–formal. The availability of a larger training corpus for the main task, which includes both the DEVSET–formal and the DEVSET–informal, together with the results calculated on both portions of the TESTSET, allowed, as expected, the achievement of the KIPoS overall best score. This is also confirmed by the fact that all teams provided their best runs in it, for both the formal and the informal register. Even if the official submission of UniBO did not include runs for Tasks A and B, the results provided in its report (Tamburini, 2020) show that this team too ranked worse in Tasks A and B than in the main one. More precisely, for Task A it achieved 0.8647 accuracy on TESTSET–formal and 0.8316 on TESTSET–informal, while in Task B it achieved 0.8974 on TESTSET–formal and 0.8952 on TESTSET–informal.

As far as the other teams are concerned, UniBA
provided in its report (Izzi and Ferilli, 2020) also the results achieved using a version of the TESTSET in which a few errors detected after the official evaluation had been fixed. This allowed a small improvement in their scores (e.g. in the main task, +0.0078 for the formal and +0.0056 for the informal register).

The KLUMSy team provided the best runs for both registers in Tasks A and B, but because of a misunderstanding of the guidelines about the annotation of contractions in the TESTSET (which is slightly different with respect to the DEVSET), a certain amount of mis-tagged tokens occurred in its runs. After these were fixed, the scores of this team also improved (with an increase varying from 0.0187 to 0.0456) with respect to the official ones reported in Table 3, as described in the report of this team (Proisl and Lapesa, 2020).

Considering that PoS tagging is a mostly solved task, it is not surprising that the participants' scores are quite high and close for all the tracks. The largest difference observed between the best and the worst score is indeed 0.126, for Task B on TESTSET–formal.

Given the peculiarity of the oral texts on which KIPoS is focused, a comparison of our results with state-of-the-art PoS tagger results for the written standard language does not seem especially meaningful. A more interesting comparison can instead be developed with respect to the scores achieved within the PoSTWITA task (Bosco et al., 2016) on written texts extracted from social media. This genre is indeed often considered in between written and oral, sharing some features with the former and some with the latter. Using the best PoSTWITA task accuracy score (0.9319) as our baseline (see Table 3), we can observe that the best scores achieved in KIPoS are in line with this result. This confirms the hypothesis that oral text can be considered almost as hard to tag morphologically as social media text.

As far as the distinction between formal and informal conversation drawn in the KIPoS datasets is concerned, a general trend of better scores on formal data can be observed, but some meaningful differences among participant systems occur. For all subtasks UniBO scored best on formal text, while KLUMSy did the same on informal data. UniBA instead achieved its best scores on TESTSET–formal, with the exception of Task B, where its score for the informal test set is a little bit (0.0038) higher than that for the formal one. Focusing on the cross subtasks A and B, we can moreover notice that systems were not equally influenced by the type of data exploited for training: UniBO provided its best scores against TESTSET–formal also when trained on DEVSET–informal (Task B), while KLUMSy provided its best scores against TESTSET–informal also when trained on DEVSET–formal (Task A). UniBA seems instead slightly more influenced by the features of the data used in training.

6 Discussion and Conclusion

The results described in this report can only be considered preliminary. First of all, KIPoS is the first edition of a task about PoS tagging of spontaneous speech for Italian, and there are no other results for this kind of task on the same language to compare with. Second, the corpus used for KIPoS has been newly released for the purpose of the task and never used before. Participants provided some useful feedback about errors occurring in the DEVSET and TESTSET, but some further checking should be applied to improve the quality of the data. Finally, only three participants submitted their runs (and only two provided official runs for the cross-genre tasks). Even if PoS tagging is among the tasks considered mostly solved in the literature, only a larger participation may allow a meaningful comparison among different approaches and results.

Nevertheless, the KIPoS task produced the valuable result of making available a novel resource for the study of spoken Italian and for the advancement of NLP in this area. It can be of great relevance for the investigation of both spontaneous speech phenomena and sociolinguistic variation, but also e.g. in the development of chatbots and voice recognition systems. In particular, the insights gained within the context of this Evalita evaluation campaign for PoS tagging can pave the way for further investigation of actual speech data. They provide a solid foundation for our future research, also in the direction of more detailed morphological analysis and syntactic parsing, especially within the framework of Universal Dependencies, where we would like to release the KIPoS dataset in the near future.
7   Acknowledgments

The construction of part of the corpus was made possible by the funding of the Fondazione CRT under the Erogazioni ordinarie 2018 program. The KIParla corpus has been made possible thanks to the SIR Project 'LEAdHOC' (n. RBSI14IIG0), funded by MIUR. We would also like to thank the students from our BA and MA courses at the Universities of Bologna and Torino, who participated in collecting and transcribing the data.

References

Giovanna Alfonzetti. 2002. La relativa non-standard. Italiano popolare o italiano parlato? Centro di Studi Filologici e Linguistici Siciliani, Palermo.

Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020. EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Cristina Bosco, Fabio Tamburini, Andrea Bolioli, and Alessandro Mazzei. 2016. Overview of the EVALITA 2016 Part Of Speech on TWitter for ITAlian task. In Proceedings of Evalita 2016.

Cristina Bosco, Silvia Ballarè, Massimo Cerruti, Eugenio Goria, and Caterina Mauri. 2020. KIPoS @ EVALITA2020: Overview of the Task on KIParla Part of Speech tagging. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Giovanni Luca Izzi and Stefano Ferilli. 2020. UniBA@KIPoS: A Hybrid Approach for Part-of-Speech Tagging. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online. CEUR.org.

Per Linell. 2005. The written language bias in linguistics: its nature, origins and transformations. Routledge, London – New York.

Caterina Mauri, Silvia Ballarè, Eugenio Goria, Massimo Cerruti, and Francesco Suriano. 2019. KIParla Corpus: A New Resource for Spoken Italian. In Raffaella Bernardi, Roberto Navigli, and Giovanni Semeraro, editors, Proceedings of the 6th Italian Conference on Computational Linguistics (CLiC-it 2019), Online. CEUR.org.

Daniela Mereu. 2019. Il sardo parlato a Cagliari. Franco Angeli, Milano.

Thomas Proisl and Gabriella Lapesa. 2020. KLUMSy@KIPoS: Experiments on Part-of-Speech Tagging of Spoken Italian. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online. CEUR.org.

Manuela Sanguinetti, Cristina Bosco, Alessandro Mazzei, Alberto Lavelli, and Fabio Tamburini. 2017. Annotating Italian social media texts in Universal Dependencies. In Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), pages 229–239.

Manuela Sanguinetti, Cristina Bosco, Alberto Lavelli, Alessandro Mazzei, and Fabio Tamburini. 2018. PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies. In Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), pages 1768–1775.

Fabio Tamburini. 2020. UniBO@KIPoS: Fine-tuning the Italian "BERTology" for PoS-tagging Spoken Data. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online. CEUR.org.
                              APPENDIX: The KIPoS tagset
Tag         Value(s)                                            Examples
ADJ         • Qualifying, numeral, possessive adjectives        una bella casa
            • Interrogative adjectives                          quanti anni hai?
            • Adjectives used as pro-forms                      -ci vediamo domani? -esatto
ADP         • Prepositions                                      di, a, da, senza te, tranne, ...
            • Postpositions                                       vent’anni fa
ADP A       • Articled prepositions                             dalla, nella, sulla, ...
ADV         • Adverbs                                           lo metto qui
            • Interrogative adverbs                             non ricordo come si chiama
AUX         • Auxiliaries                                       essere, avere
            • Modals                                            potere, volere, dovere
            • Periphrastic auxiliaries                          sta mangiando, viene visto, ...
CCONJ       • Coordinating conjunctions                         e, ma, o, però, anzi, quindi, dunque, ...
            • Discourse markers with predominantly connective function
DET         • Articles                                          ho visto un film
            • Demonstratives                                    la senti questa voce?
            • Numerals                                          ho giocato tre numeri al lotto
            • Possessives                                       non nominare mia sorella
            • Quantifiers                                       alcuni studenti sono assenti
DIA         • Italo-Romance dialects                            c’erano due fiulin
INTJ        • Interjections                                     sì, no, ecco, ...
LIN         • Languages other than Italian                      vi saluto guys
NEG         • Sentence negation                                 non
NOUN        • Nouns of any type except proper nouns             ho visto un re
NUM         • Numbers (but not numeral adjectives)              - quanti sono? -tre
PARA        • Paraverbal communication                          eh, mh, oh, bla bla, . . .
PRON        • Personal and reflexive pronouns                   io, me, tu, te, sé, ...
            • Interrogative pronouns                            chi?, cosa?, quale?, che?
            • Relative pronouns                                 il quale, dove, cui
PROPN       • Proper nouns                                      Gigi
SCONJ       • Subordinating conjunctions                        dove, quando, perché
                                                                ho detto che. . .
                                                                se vuoi
VERB        • Verbs                                             aveva vent’anni
                                                                era molto stanco
VERB PRON   • Verb + clitic pronoun cluster                     mangiarlo, donarglielo, . . .
X           • Other (e.g. truncated words)                      fior-
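The tagset above can be handled programmatically as a fixed inventory. The following sketch lists the twenty KIPoS tags as a Python set and checks a tagged sentence against it; the underscore spellings ADP_A and VERB_PRON are an assumption for machine-readable tag names (the table prints them with a space), and the helper function is illustrative, not part of any official KIPoS tooling:

```python
# The 20 KIPoS tags from the appendix table.
# NOTE: ADP_A and VERB_PRON are assumed machine-readable spellings of
# the tags printed as "ADP A" and "VERB PRON" in the table.
KIPOS_TAGS = {
    "ADJ", "ADP", "ADP_A", "ADV", "AUX", "CCONJ", "DET", "DIA",
    "INTJ", "LIN", "NEG", "NOUN", "NUM", "PARA", "PRON", "PROPN",
    "SCONJ", "VERB", "VERB_PRON", "X",
}

def invalid_tags(tagged_tokens):
    """Return (token, tag) pairs whose tag is not in the KIPoS tagset."""
    return [(tok, tag) for tok, tag in tagged_tokens if tag not in KIPOS_TAGS]

# Example sentence from the DET row of the table: "ho visto un film".
sentence = [("ho", "AUX"), ("visto", "VERB"), ("un", "DET"), ("film", "NOUN")]
print(invalid_tags(sentence))  # [] — all tags are valid
```

Such a check is a simple way to catch typos in tag labels when preparing or converting annotated data.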