<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Pisa, Italy
* Corresponding author.
† This paper is the result of the collaboration between the three
authors. For academic purposes, all the authors are responsible of
Sections</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Join Together? Combining Data to Parse Italian Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Claudia Corbetta</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Moretti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Passarotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università Cattolica del Sacro Cuore</institution>
          ,
          <addr-line>largo A. Gemelli 1, 20123 Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli studi di Bergamo</institution>
          ,
          <addr-line>via Salvecchio 19, 24129 Bergamo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università di Pavia</institution>
          ,
          <addr-line>corso Strada Nuova 65, 27100 Pavia</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <issue>3</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>In this paper, we create and evaluate non-combined and combined models using Old and Contemporary Italian data to determine whether increasing the size of the training data with a combined model could improve parsing accuracy to facilitate manual annotation. We find that, despite the increased size of the training data, in-domain parsing performs better. Additionally, we discover that models trained on Old Italian data perform better on Contemporary Italian data than the reverse. We attempt to explain this result in terms of syntactic complexity, finding that Old Italian text exhibits higher sentence length and non-projectivity rate.</p>
      </abstract>
      <kwd-group>
        <kwd>Parsing</kwd>
        <kwd>Universal Dependencies</kwd>
        <kwd>Combined Model</kwd>
        <kwd>Old Italian</kwd>
        <kwd>Contemporary Italian</kwd>
        <kwd>Non-Projectivity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>High-quality textual data (semi-)manually enhanced with different layers of metalinguistic annotation are extremely valuable resources for conducting linguistic analysis. As for the syntactic layer of annotation, the de facto standard for dependency-based annotation is Universal Dependencies (UD),<sup>1</sup> an initiative that provides machine-readable annotations for a wide variety of languages, including historical languages [<xref ref-type="bibr" rid="ref1">1</xref>]. At the current state of the art,<sup>2</sup> Contemporary Italian is well represented in UD, whereas Old Italian is represented by only one annotated text (a portion of the Divine Comedy by Dante Alighieri). The creation of additional annotated Old Italian data is therefore advisable.</p>
      <p>Since a fully manual annotation process is time-consuming and requires significant effort, we aim to expedite it by using a parser that pre-parses the data, leaving the human annotator with only a manual revision task.</p>
      <p>To address this, given the scarcity of Old Italian data, we create a combined parser using both Contemporary and Old Italian data. The objective is to determine whether a combined model with an expanded training dataset performs better than non-combined models (see [<xref ref-type="bibr" rid="ref2">2</xref>] for Spanish and [<xref ref-type="bibr" rid="ref3">3</xref>] for Stanza combined models).</p>
      <p>The paper is organised as follows: Section 2 provides a brief description of the Italian language, the syntactic resources and the Italian data available; Section 3 details the data used for the experiments, presents the non-combined and combined models, and evaluates their performance; Section 4 analyzes the syntactic complexity of each test set (Old and Contemporary Italian) to account for the differences in accuracy; finally, Section 5 provides the conclusion.</p>
      <sec id="sec-1-1">
        <title>2. Talking about Italian</title>
        <p>Italian is a Romance language derived from Latin, and its development is closely connected with the political, cultural and economic system of Italy during the Late Middle Ages [<xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6, 7</xref>]. Even though the evolution and history of the Italian language "can be properly understood only within the wider context of the evolution of the Italian dialects" [5, p. 3], the dialect spoken in Florence (Tuscany) in the thirteenth century, known as Florentine, played a pivotal role in establishing the foundation of the Italian language. The pre-eminence of Florentine over other Italian dialects was established by the importance and prestige of Florentine literature. Its widespread success contributed to the codification of Florentine as the lingua volgare in the sixteenth century, distinguishing it as the spoken Italian language in contrast to Latin, which was still used for written cultural discourse [8].</p>
        <p>Even though Florentine (and, more generally, the Tuscan dialects) is considered conservative in its linguistic evolution [5, p. 5], it is now widely recognized by most scholars as distinct from Contemporary Italian [9, p. 8]. Among the differences between Contemporary Italian and Florentine (henceforth referred to as Old Italian),<sup>3</sup> several syntactic distinctions have been noted [<xref ref-type="bibr" rid="ref7">10, 11</xref>]. These include, among others, the position and order of clitics, the use of the marker sì ’that’ as a thematic marker, and differences in the use of compound tenses [11, pp. 425-444].</p>
        <sec>
          <title>2.1. Syntactic resources</title>
          <p>High-quality (semi-)manually annotated treebanks, i.e. corpora with annotations on various linguistic levels,<sup>4</sup> are indispensable tools for in-depth analysis of the syntax (and morphology) of languages. Treebanks not only facilitate faster, easier, and more precise querying of syntactic structures, but also aid in tracking the evolution of syntactic patterns in languages through time [13].</p>
          <p>Among the dependency treebanks, UD is a pivotal initiative providing cross-linguistically consistent treebanks for many languages [14]. As of the current version (2.14), UD includes 283 treebanks covering 161 languages, encompassing historical languages such as Latin (e.g. Index Thomisticus Treebank, ITTB [15]), Old French (PROFITEROLE [16]) and Ancient Greek (e.g. PROIEL [17]), among others.</p>
          <p>In Subsection 2.2, we describe the UD treebanks for Italian.</p>
        </sec>
        <sec id="sec-1-1-1">
          <title>2.2. Italian data</title>
          <p>Regarding Italian, UD includes 9 Contemporary Italian treebanks, spanning various genres, as reported in Table 1.</p>
          <p>Concerning Old Italian, the only treebank present in UD is Italian-Old [18], encompassing the Divine Comedy, a poetic text written by Dante Alighieri (1265–1321). Currently, Italian-Old contains the first two Cantiche of the poem, namely Inferno and Purgatorio, amounting to 80 694 tokens, 82 644 syntactic words<sup>5</sup> and 2 402 sentences.<sup>6</sup></p>
          <p>The divergence in annotated data available for Contemporary Italian (around 875K syntactic words) versus Old Italian (82K syntactic words) is considerable. Considering that i) treebanks are essential for expanding the sample of comparable data and that ii) the manual annotation of data is an extremely time-consuming effort, the development of automatic parsers is crucial to expedite and assist the annotation process.</p>
          <p>The shortage of gold-annotated data for Old Italian, compared to the large amount of data available for Contemporary Italian, led us to recognize the potential of testing combined models, i.e., models with a training set composed of both Old and Contemporary Italian data.</p>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>3. Combining Old Italian with Contemporary Italian data</title>
      </sec>
      <sec id="sec-1-3">
        <p><sup>3</sup> We adhere to the definition of Salvi and Renzi [9], who use the term Old Italian to refer to the language spoken in Florence during the 13th and 14th centuries.</p>
        <p><sup>4</sup> Treebanks usually provide information on sentence tokenization, word lemmatization, and both morphological and syntactic details. Syntactic analysis is mandatory in a treebank, and can be encoded in either dependency syntax or constituency syntax [12].</p>
      </sec>
      <sec id="sec-1-4">
        <p>Considering the aforementioned divergence in data, we create and evaluate a combined Contemporary-Old Italian model to understand whether joining datasets from different periods could improve parsing accuracy.</p>
        <p>We train models using Stanza [19], a neural pipeline for natural language processing, with different training sets. Specifically, we train models based on Contemporary Italian data (henceforth CI), Old Italian data (henceforth OI), and a combination of Contemporary and Old Italian data (henceforth Combi).</p>
        <p>In Subsection 3.1 we detail the selection and partitioning of the data. Subsection 3.2 outlines the creation of the models and presents the resulting scores. Finally, Subsection 3.3 discusses the combined Contemporary-Old Italian model.</p>
        <sec id="sec-1-4-1">
          <title>3.1. Selection and partitions of data</title>
        </sec>
      </sec>
      <sec id="sec-1-5">
        <p>To build the model based on OI data, we use the only Old Italian treebank available, Italian-Old.</p>
        <p>Among all the Contemporary Italian UD treebanks, we select two: ISDT (Italian Stanford Dependency Treebank) and VIT (Venice Italian Treebank). We select ISDT [20] because it is the Italian treebank with the highest UD star ranking. This ranking, designed by the UD organizers, quantifies various qualities of the corpora, such as their usability and the variety of genres they encompass. Moreover, since Italian-Old is based on the poetry genre, to minimize a potential genre gap (the influence of genre on parsing has been addressed in [21]), we also select VIT [22], which includes literary texts, albeit with a limited number of words.<sup>7</sup> We point out that, to date, no CI treebank contains poetry (see Table 1).</p>
      </sec>
      <sec id="sec-1-6">
        <p><sup>5</sup> We use the terms "syntactic words" and "tokens" following the UD definition (see https://universaldependencies.org/u/overview/tokenization.html).</p>
        <p><sup>6</sup> The numbers refer to UD version 2.14, see https://universaldependencies.org/treebanks/it_old/index.html.</p>
      </sec>
      <sec id="sec-1-7">
        <p>To avoid the CI data overwhelming the OI data due to their size disparity, we partition the CI data. The VIT treebank, consisting of 259 625 tokens, 280 153 syntactic words, and 10 087 sentences, allows us to partition the data into three parts, with each part closely matching the size of the Italian-Old dataset. Specifically, we divide the VIT dataset into three partitions of 34%, 33% and 33%, respectively named VIT1, VIT2 and VIT3. We then divide each partition (VIT1, VIT2 and VIT3) into train, test, and dev sets with a split of 70%, 15%, and 15%, the same split used in the Italian-Old dataset. Unlike the VIT treebank, ISDT is not directly partitionable, as it counts 278 461 tokens, 298 375 syntactic words, and 14 167 sentences. Therefore, we shuffled the data and extracted a total of 82 500 tokens (the same size as the OI data), which were then partitioned into train, dev, and test sets with a ratio of 70%, 15%, and 15%, respectively.</p>
        <p>We report in Table 2 the partition of each dataset into train/dev/test.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3.2. Creation of models and scores</title>
      <sec id="sec-2-1-1">
        <p>For each partition (OI, VIT1, VIT2, VIT3 and ISDT), we train a model with Stanza (five models in total), using the training and dev sets, and we evaluate it on the respective test set. Among the CI-VIT models, we retain only the one that performs best, namely VIT1.</p>
        <p>We then use the model built on OI data to parse the CI test sets, and vice versa.</p>
        <p>In Table 3 and Table 4, we report both the Labeled Attachment Score (LAS) and the Unlabeled Attachment Score (UAS)<sup>8</sup> for the OI and VIT1 models, and for the OI and ISDT models, respectively.</p>
        <table-wrap>
          <label>Table 3</label>
          <caption><p>Evaluation metrics with VIT1 and OI models (where "-&gt;" stands for "on").</p></caption>
          <table>
            <thead>
              <tr><th></th><th>VIT1 -&gt; VIT1</th><th>OI -&gt; OI</th><th>VIT1 -&gt; OI</th></tr>
            </thead>
            <tbody>
              <tr><td>LAS</td><td>71.60</td><td>75.86</td><td>42.83</td></tr>
              <tr><td>UAS</td><td>77.70</td><td>82.24</td><td>56.13</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap>
          <label>Table 4</label>
          <caption><p>Evaluation metrics with ISDT and OI models.</p></caption>
          <table>
            <thead>
              <tr><th></th><th>ISDT -&gt; ISDT</th><th>OI -&gt; OI</th><th>ISDT -&gt; OI</th></tr>
            </thead>
            <tbody>
              <tr><td>LAS</td><td>88.55</td><td>75.86</td><td>51.62</td></tr>
              <tr><td>UAS</td><td>91.41</td><td>82.24</td><td>63.03</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>For both the VIT1 and ISDT scenarios, results show that using a model trained on in-domain data, namely data that pertain to the same textual domain as the test set (VIT1 on VIT1, OI on OI, and ISDT on ISDT), yields higher performance than using out-of-domain data (ISDT on OI, VIT1 on OI, and OI on VIT1 and ISDT). These results align with the literature on in-domain testing [25].</p>
        <p>Analyzing the scores of out-of-domain parsing (ISDT on OI, VIT1 on OI, and OI on ISDT and VIT1), we notice that the model trained on OI data performs better on CI data in both scenarios, whereas CI models yield lower scores when applied to OI text. The differences in scores are approximately 20 points in favour of the OI model, specifically 25.7 (LAS) and 19.4 (UAS) compared to VIT1, and 23.21 (LAS) and 17.9 (UAS) compared to ISDT.</p>
        <p>We attempt to explain the better performance of the OI model in Section 4.</p>
      </sec>
      <sec id="sec-2-1">
        <title>3.3. Joining model</title>
        <sec id="sec-2-1-2">
          <p>To challenge the results obtained in Subsection 3.2, we build combined models with Stanza by merging OI data with CI data. Specifically, we create two models: CombiVIT and CombiISDT. For each combined model, the test, dev, and train sets are created by merging the corresponding test, dev, and train sets of the CI data (VIT1 for CombiVIT, ISDT for CombiISDT) with those of the OI data.</p>
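          <p>The merging step just described can be sketched as a plain concatenation of CoNLL-U files, split by split. This is only an illustration under stated assumptions: the helper names (read_sentences, merge_conllu) and the file names in the usage comment are hypothetical, and the sketch does not reproduce the scripts actually used to build CombiVIT and CombiISDT.</p>
          <preformat><![CDATA[
```python
from pathlib import Path


def read_sentences(path):
    """Return the sentence blocks of a CoNLL-U file (blocks are separated by blank lines)."""
    text = Path(path).read_text(encoding="utf-8")
    blocks = [block.strip("\n") for block in text.split("\n\n")]
    return [block for block in blocks if block.strip()]


def merge_conllu(paths, out_path):
    """Concatenate several CoNLL-U files and return the number of sentences written."""
    sentences = []
    for path in paths:
        sentences.extend(read_sentences(path))
    # Every sentence block, including the last one, is followed by a blank line.
    merged = "".join(block + "\n\n" for block in sentences)
    Path(out_path).write_text(merged, encoding="utf-8")
    return len(sentences)


# Hypothetical usage, one call per split (train, dev, test):
# merge_conllu(["oi-train.conllu", "vit1-train.conllu"], "combivit-train.conllu")
```
]]></preformat>
          <p>Merging split by split, rather than merging whole treebanks and re-splitting, keeps every sentence in the same role (train, dev or test) that it has in the non-combined experiments.</p>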
          <p>In Table 5, we report the UAS and LAS scores obtained.</p>
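          <p>As a reminder of what the two scores measure, the following minimal sketch computes UAS and LAS for aligned gold and predicted analyses. It is a simplified illustration, not the evaluation script used for the reported figures; the function name attachment_scores and the three-word example are assumptions made for the sketch.</p>
          <preformat><![CDATA[
```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) in percent for two aligned lists of (head, deprel) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of words")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct_heads = 0
    correct_labeled = 0
    for (gold_head, gold_rel), (pred_head, pred_rel) in zip(gold, predicted):
        if gold_head == pred_head:
            # UAS only requires the head to be right.
            correct_heads += 1
            if gold_rel == pred_rel:
                # LAS additionally requires the dependency label to be right.
                correct_labeled += 1
    total = len(gold)
    return 100.0 * correct_heads / total, 100.0 * correct_labeled / total


# Toy example: all heads are correct, one label is wrong.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
predicted = [(2, "nsubj"), (0, "root"), (2, "obl")]
uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")  # UAS = 100.00, LAS = 66.67
```
]]></preformat>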
          <p>We notice that in both scenarios the combined models perform better on CI data than on OI data: the gap amounts to 13.74 (LAS) and 10.1 (UAS) points for CI-VIT data, and to 12.58 (LAS) and 8.87 (UAS) points for CI-ISDT data.</p>
          <p><sup>7</sup> The VIT treebank contains 10 000 words of the literary genre [22, 23]. Refer also to the README file for further details (see Appendix A).</p>
          <p><sup>8</sup> Refer to [24] for an insight into the aforementioned metrics.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. An insight into OI and CI data</title>
      <p>To analyze the complexity of tree structures in each test set (CI-ISDT, CI-VIT, and OI), we calculate:</p>
      <list list-type="bullet">
        <list-item><p>type-token ratio (TTR): the number of types divided by the number of tokens (excluding punctuation);</p></list-item>
        <list-item><p>tree depth (Depth): the longest path from the root of an oriented acyclic graph (i.e., the syntactic tree) to a leaf;</p></list-item>
        <list-item><p>lexical density (Lex. Den.): the number of content words, i.e. words that possess semantic content and contribute to the meaning of the sentence,<sup>9</sup> divided by the total number of syntactic words (excluding punctuation marks);</p></list-item>
        <list-item><p>sentence length (Length): the number of syntactic words (excluding punctuation marks) in each sentence.</p></list-item>
      </list>
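      <p>The four measures listed above can be sketched as follows. This is a minimal illustration, not the code used for the reported values: the sentence representation (a list of dictionaries with id, form, upos and head), the choice to count types on lowercased word forms, the choice to measure depth in edges, and the two toy sentences are all assumptions of the sketch. The set of content-word tags follows footnote 9.</p>
      <preformat><![CDATA[
```python
CONTENT_UPOS = {"NOUN", "VERB", "ADJ", "ADV", "PROPN"}


def words(sentence):
    """Drop punctuation; each word is a dict with 'id', 'form', 'upos' and 'head'."""
    return [w for w in sentence if w["upos"] != "PUNCT"]


def type_token_ratio(sentences):
    """Types divided by tokens, counted on lowercased forms (an assumption)."""
    forms = [w["form"].lower() for s in sentences for w in words(s)]
    return len(set(forms)) / len(forms)


def lexical_density(sentences):
    """Content words divided by all syntactic words, punctuation excluded."""
    all_words = [w for s in sentences for w in words(s)]
    content = [w for w in all_words if w["upos"] in CONTENT_UPOS]
    return len(content) / len(all_words)


def sentence_length(sentence):
    """Number of syntactic words, punctuation excluded."""
    return len(words(sentence))


def tree_depth(sentence):
    """Number of edges on the longest path from the root to a leaf."""
    heads = {w["id"]: w["head"] for w in sentence}

    def distance_to_root(node):
        steps = 0
        while heads[node] != 0:
            node = heads[node]
            steps += 1
        return steps

    return max(distance_to_root(node) for node in heads)


def toy(noun):
    """Build a hypothetical four-token sentence: 'Il <noun> dorme .'"""
    return [
        {"id": 1, "form": "Il", "upos": "DET", "head": 2},
        {"id": 2, "form": noun, "upos": "NOUN", "head": 3},
        {"id": 3, "form": "dorme", "upos": "VERB", "head": 0},
        {"id": 4, "form": ".", "upos": "PUNCT", "head": 3},
    ]


SENTENCES = [toy("gatto"), toy("cane")]
print(sentence_length(SENTENCES[0]))           # 3
print(tree_depth(SENTENCES[0]))                # 2
print(f"{type_token_ratio(SENTENCES):.2f}")    # 0.67
print(f"{lexical_density(SENTENCES):.2f}")     # 0.67
```
]]></preformat>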
      <sec id="sec-3-1">
        <p><sup>9</sup> We select as content words all words belonging to the following Universal parts of speech [26]: NOUN ’noun’, VERB ’verb’, ADJ ’adjective’, ADV ’adverb’, and PROPN ’proper noun’.</p>
      </sec>
      <sec id="sec-3-2">
        <p>For most of the measures described, the OI test set does not differ significantly from the CI test sets. The only measure in which the OI test set differs from the CI test sets is sentence length (Avg. Length): OI presents a higher average sentence length, surpassing the CI-ISDT average by 13 points and the CI-VIT average by 3.5.</p>
        <p>Therefore, among the parameters evaluated, only sentence length could help explain the better performance of the OI model on CI data.</p>
        <p>In Subsection 4.1, we evaluate another parameter related to the complexity of tree structure, namely non-projectivity (i.e., the number of structures where a head and its dependents form a discontinuous constituent). It has been shown [27] that sentence length is interconnected with non-projectivity: non-projective sentences tend to be longer than projective ones. By calculating non-projectivity, we aim to determine whether sentence length (which is higher in the OI test set) and non-projectivity might indicate more complex structures in OI texts, thereby contributing to the better performance of the OI model on CI data.</p>
        <sec id="sec-3-2-1">
          <title>4.1. Non-projectivity</title>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <p>Non-projectivity arises when sentences exhibit non-local dependencies. While constituency approaches may handle similar structures using empty categories and coindexation [28], dependency-based approaches result in discontinuous dependencies that lead to non-projectivity.</p>
        <p>We illustrate non-projectivity with an example involving a non-local oblique (obl) relation: the node fóri ’holes’ is a dependent of the node piena ’full’. This relation is non-projective because it crosses the node pietra ’rock’, which depends on the root (root) of the sentence, vidi ’saw’, with an object (obj) dependency relation.</p>
        <p>Inferno, xix, vv. 13–14: <italic>Io vidi per le coste (...) / piena la pietra livida di fóri</italic> ‘Along the sides (...), / I saw that livid rock was perforated’</p>
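        <p>The example from Inferno xix can be checked mechanically. In the sketch below, an edge is counted as non-projective when some word lying between the head and the dependent is not dominated by the head; this is one common definition and is assumed here, not taken from the paper. The head indices are a reconstruction: only vidi as root, pietra depending on vidi and fóri depending on piena are stated in the text, while the remaining attachments are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
def is_non_projective(head, dep, heads):
    """True if some word strictly between head and dep is not dominated by head."""
    left, right = min(head, dep), max(head, dep)
    for node in range(left + 1, right):
        ancestor = node
        # Climb towards the root until we meet the head or fall off the tree.
        while ancestor != 0 and ancestor != head:
            ancestor = heads[ancestor]
        if ancestor != head:
            return True
    return False


def non_projective_edges(heads):
    """Return the non-projective (head, dependent) pairs; heads maps id -> head id, 0 = root."""
    return [(h, d) for d, h in heads.items() if h != 0 and is_non_projective(h, d, heads)]


FORMS = ["Io", "vidi", "per", "le", "coste", "piena", "la", "pietra", "livida", "di", "fóri"]
# Word ids start at 1. Heads partly assumed (see the lead-in paragraph).
HEADS = {1: 2, 2: 0, 3: 5, 4: 5, 5: 2, 6: 2, 7: 8, 8: 2, 9: 8, 10: 11, 11: 6}

edges = non_projective_edges(HEADS)
for head, dep in edges:
    print(FORMS[head - 1], "->", FORMS[dep - 1])  # piena -> fóri

total_edges = sum(1 for h in HEADS.values() if h != 0)
print(f"{100 * len(edges) / total_edges:.1f}%")  # 10.0%
```
]]></preformat>
        <p>Whether the edge to the root is included in the total number of edges changes the ratio slightly; the sketch excludes it, which is again an assumption.</p>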
        <p>[Dependency tree of the example: Io vidi per le coste piena la pietra livida di fóri; arc labels: root, nsubj, obl:lmod, case, det, advcl:pred, obj, det, amod, obl, case.]</p>
        <p>The non-projectivity of syntactic dependency trees presents a challenge for parsing in natural language processing [29], with non-projective structures proving more difficult to parse. Concerning our task, we investigate the number of non-projective structures in each test set to determine whether the better performance of the OI model on CI data may be associated with a higher prevalence of non-projective structures, which would suggest that having more non-projective structures in the training set is beneficial.</p>
        <p>We calculate non-projectivity for the OI, CI-VIT, and CI-ISDT test sets. In Table 7 we report, for each test set, the total number of edges, the number of non-projective edges, and the non-projectivity ratio expressed as a percentage.</p>
        <table-wrap>
          <label>Table 7</label>
          <caption><p>Non-projectivity of OI, CI-VIT, and CI-ISDT test sets.</p></caption>
          <table>
            <thead>
              <tr><th></th><th>OI</th><th>CI-VIT</th><th>CI-ISDT</th></tr>
            </thead>
            <tbody>
              <tr><td>Total edges</td><td>12 307</td><td>11 473</td><td>12 402</td></tr>
              <tr><td>Non-projective edges</td><td>176</td><td>24</td><td>7</td></tr>
              <tr><td>Non-projectivity ratio in %</td><td>1.43%</td><td>0.21%</td><td>0.06%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>As shown in Table 7, OI shows a higher rate of non-projectivity than the CI texts. In particular, the non-projectivity ratio in OI is about 7 times higher than in CI-VIT and about 24 times higher than in CI-ISDT. The high rate of non-projective structures in OI could be related to the genre of the text, i.e., poetry, which reflects a more creative use of language and frequently employs inversions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <sec id="sec-4-1">
        <p>In this paper, we create and evaluate non-combined and combined models of Old Italian and Contemporary Italian data.<sup>10</sup> In light of the scarcity of manually annotated Old Italian data compared to the richness of Contemporary Italian data, the aim of this work is to determine whether combining data to train a combined model could lead to better parsing accuracy, thereby facilitating the work of human annotators.</p>
        <p>We observe that combining Contemporary Italian and Old Italian data, even though it increases the size of the training data, does not lead to better LAS and UAS scores. This confirms, in line with other studies [<xref ref-type="bibr" rid="ref10 ref3 ref8 ref9">30, 31, 21, 32, 3</xref>], that having an in-domain training set is preferable.</p>
        <p>Additionally, we notice that the model trained on OI data performs better on Contemporary Italian texts than the reverse (i.e. models trained on Contemporary data applied to OI texts). To explain these results, we investigate the syntactic complexity of each test set (OI, CI-ISDT, and CI-VIT). Specifically, we evaluate sentence length, tree depth, lexical density and type-token ratio. We notice that the test sets differ only in sentence length. We then proceed to calculate another parameter of syntactic complexity, namely non-projectivity.</p>
        <p>We discover that OI texts present a higher number of non-projective structures. We hypothesize that the high level of non-projectivity could be connected to the genre of the OI text, namely poetry. Thus far, the lack of UD treebanks for OI prose texts and for CI poetry texts has prevented us from investigating whether the high degree of non-projectivity observed in the OI test set (based on the Italian-Old treebank) is characteristic of the poetry genre or specific to OI. This question is left for further studies.</p>
        <p>Finally, we are currently working to increase the amount of manually annotated OI data, expanding both the range of authors and the genres of the texts considered. This will allow us to evaluate the model’s performance both within and outside its domain (in terms of authorship and text typology), as well as to assess its potential applicability to other OI texts.<sup>11</sup></p>
        <p><sup>10</sup> Models are available for public use at https://github.com/CIRCSE/Old_Italian_Model.</p>
        <p>[7] G. Rohlfs, Grammatica storica della lingua italiana e dei suoi dialetti, Torino: Einaudi, 1968.</p>
        <p>[8] M. Vitale, La questione della lingua, Palermo: Palumbo, 1978.</p>
        <p>[9] G. Salvi, L. Renzi (Eds.), Grammatica dell’italiano antico, il Mulino, Bologna, Italy, 2010. URL: https://www.mulino.it/isbn/9788815134585.</p>
        <p>[10] M. Dardano, Sintassi dell’italiano antico. La prosa del Duecento e del Trecento, volume 1, Carocci, 2012.</p>
        <p>[11] M. Dardano, G. Frenguelli, SintAnt. La sintassi dell’italiano antico, Roma, Aracne, 2004.</p>
        <p>[12] A. Abeillé, Treebanks: Building and using parsed corpora, volume 20, Springer Science &amp; Business Media, 2003.</p>
        <p>[13] A. Taylor, Treebanks in historical syntax, Annual Review of Linguistics 6 (2020) 195–212.</p>
        <p>[14] J. Nivre, M.-C. de Marneffe, F. Ginter, J. Hajič, C. D. Manning, S. Pyysalo, S. Schuster, F. Tyers, D. Zeman, Universal Dependencies v2: An evergrowing multilingual treebank collection, in: N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2020, pp. 4034–4043. URL: https://aclanthology.org/2020.lrec-1.497.</p>
        <p>[15] M. Passarotti, The project of the index thomisticus treebank, Digital classical philology. Ancient Greek and Latin in the digital revolution 10 (2019) 299–320. URL: https://doi.org/10.1515/9783110599572-017.</p>
        <p>[16] S. Prévost, L. Grobol, M. Dehouck, A. Lavrentiev, S. Heiden, Profiterole: un corpus morpho-syntaxique et syntaxique de français médiéval, Corpus (2023).</p>
        <p>[17] D. T. Haug, M. Jøhndal, Creating a parallel treebank of the old indo-european bible translations, in: Proceedings of the second workshop on language technology for cultural heritage data (LaTeCH 2008), 2008, pp. 27–34.</p>
        <p>[18] C. Corbetta, M. Passarotti, F. M. Cecchini, G. Moretti, Highway to Hell. Towards a Universal Dependencies Treebank for Dante Alighieri’s Comedy., in: CLiC-it, 2023.</p>
        <p>[19] P. Qi, Y. Zhang, Y. Zhang, J. Bolton, C. D. Manning, Stanza: A python natural language processing toolkit for many human languages, arXiv preprint arXiv:2003.07082 (2020).</p>
        <p>[20] C. Bosco, F. Dell’Orletta, S. Montemagni, M. Sanguinetti, M. Simi, The evalita 2014 dependency parsing task, in: Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 &amp; and of the Fourth International Workshop EVALITA 2014: 9-11 December 2014, Pisa, Pisa University Press, 2014, pp. 1–8.</p>
        <p>[21] F. Mambrini, M. C. Passarotti, Will a parser overtake achilles? first experiments on parsing the ancient greek dependency treebank, in: Proceedings of the Eleventh International Workshop on Treebanks and Linguistic Theories (TLT11). 30 November–1 December 2012, Lisbon, Portugal, Edições Colibri, 2012, pp. 133–144.</p>
        <p>[22] L. Alfieri, F. Tamburini, (almost) automatic conversion of the venice italian treebank into the merged italian dependency treebank format., in: CEUR WORKSHOP PROCEEDINGS, volume 1749, Accademia University Press, 2016, pp. 19–23.</p>
        <p>[23] R. Delmonte, A. Bristot, S. Tonelli, Vit-venice italian treebank: Syntactic and quantitative features., in: Sixth International Workshop on Treebanks and Linguistic Theories, volume 1, Northern European Association for Language Technol, 2007, pp. 43–54.</p>
        <p>[24] S. Buchholz, E. Marsi, CoNLL-X Shared Task on Multilingual Dependency Parsing, in: L. Màrquez, D. Klein (Eds.), Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), Association for Computational Linguistics (ACL), New York City, NJ, USA, 2006, pp. 149–164. URL: https://aclanthology.org/W06-2920.</p>
        <p>[25] M. Khan, M. Dickinson, S. Kübler, Towards domain adaptation for parsing web data, in: Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, 2013, pp. 357–364.</p>
        <p>[26] S. Petrov, D. Das, R. McDonald, A universal part-of-speech tagset, arXiv preprint arXiv:1104.2086 (2011).</p>
        <p>[27] J. Macutek, R. Cech, J. Milicka, Length of non-projective sentences: A pilot study using a Czech UD treebank, in: X. Chen, R. Ferrer-i Cancho (Eds.), Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019), Association for Computational Linguistics, Paris, France, 2019, pp. 110–117. URL: https://aclanthology.org/W19-7913. doi:10.18653/v1/W19-7913.</p>
        <p>[28] J. Nivre, Constraints on non-projective dependency parsing, in: D. McCarthy, S. Wintner (Eds.), 11th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Trento, Italy, 2006, pp. 73–80. URL: https://aclanthology.org/E06-1010.</p>
        <p>[29] J. Nivre, J. Nilsson, Pseudo-projective dependency parsing, in: K. Knight, H. T. Ng, K. Oflazer (Eds.), Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), Association for Computational Linguistics, Ann Arbor, Michigan, 2005, pp. 99–106. URL: https://aclanthology.org/P05-1013. doi:10.3115/1219840.1219853.</p>
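        <p>A. Online Resources</p>
        <list list-type="bullet">
          <list-item><p>Italian-Old,</p></list-item>
          <list-item><p>Italian-ISDT,</p></list-item>
          <list-item><p>Italian-VIT,</p></list-item>
          <list-item><p>ITTB,</p></list-item>
          <list-item><p>PROFITEROLE,</p></list-item>
          <list-item><p>PROIEL,</p></list-item>
          <list-item><p>Stanza,</p></list-item>
          <list-item><p>Old-Italian-Model.</p></list-item>
        </list>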
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>de Marneffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nivre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zeman</surname>
          </string-name>
          ,
          <article-title>Universal Dependencies</article-title>
          ,
          <source>Computational Linguistics</source>
          <volume>47</volume>
          (
          <year>2021</year>
          )
          <fpage>255</fpage>
          -
          <lpage>308</lpage>
          . URL: https://aclanthology.org/2021.cl-2.11. doi:10.1162/coli_a_00402.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Sánchez-León</surname>
          </string-name>
          ,
          <article-title>Combining different parsers and datasets for capitel ud parsing</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2020</year>
          ),
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zeldes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <article-title>Are ud treebanks getting more consistent? a report card for english ud</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2302.00636. arXiv:
          <volume>2302</volume>
          .
          <fpage>00636</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Migliorini</surname>
          </string-name>
          ,
          <article-title>Storia della lingua italiana</article-title>
          ,
          <source>Bompiani</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Maiden</surname>
          </string-name>
          ,
          <source>A Linguistic History of Italian</source>
          , Routledge,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vàrvaro</surname>
          </string-name>
          ,
          <article-title>La parola nel tempo. Lingua, società e storia</article-title>
          , Il Mulino, Bologna,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          11 For an overview of Old Italian resources, refer to [18].
          1219840.1219853.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dickinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kuebler</surname>
          </string-name>
          ,
          <article-title>Does size matter? Text and grammar revision for parsing social media data</article-title>
          , in: C. Danescu-Niculescu-Mizil, A. Farzindar, M. Gamon, D. Inkpen, M. Nagarajan (Eds.),
          <source>Proceedings of the Workshop on Language Analysis in Social Media</source>
          , Association for Computational Linguistics, Atlanta, Georgia,
          <year>2013</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . URL: https://aclanthology.org/W13-1101.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dickinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kübler</surname>
          </string-name>
          ,
          <article-title>Towards domain adaptation for parsing web data</article-title>
          , in: R. Mitkov, G. Angelova, K. Bontcheva (Eds.),
          <source>Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013</source>
          , INCOMA Ltd. Shoumen, BULGARIA, Hissar, Bulgaria,
          <year>2013</year>
          , pp.
          <fpage>357</fpage>
          -
          <lpage>364</lpage>
          . URL: https://aclanthology.org/R13-1046.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>C.</given-names>
            <surname>Corbetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Passarotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Moretti</surname>
          </string-name>
          ,
          <article-title>The Rise and Fall of Dependency Parsing in Dante Alighieri's Divine Comedy</article-title>
          , in: R. Sprugnoli, M. Passarotti (Eds.),
          <source>Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024</source>
          , ELRA and ICCL, Torino, Italia,
          <year>2024</year>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>56</lpage>
          . URL: https://aclanthology.org/2024.lt4hala-1.7.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>