<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Atypical or Underrepresented? A Pilot Study on Small Treebanks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Akshay Aggarwal</string-name>
          <aff>Twilio, Prague, Czechia</aff>
          <email>aaggarwal@twilio.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chiara Alzetta</string-name>
          <aff>ItaliaNLP Lab, Pisa, Italy</aff>
          <email>chiara.alzetta@ilc.cnr.it</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>We illustrate an approach for multilingual treebank exploration by introducing a novel adaptation to small treebanks of a methodology for identifying cross-lingual quantitative trends in the distribution of dependency relations. By relying on the principles of cross-validation, we reduce the amount of data required to execute the method, paving the way to expanding its use to low-resource languages. We validated the approach on 8 small treebanks, each containing fewer than 100,000 tokens and representing typologically different languages. We also show preliminary but promising evidence on the use of the proposed methodology for treebank expansion.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Linguistically-annotated language resources like
treebanks are fundamental both for developing reliable
models and tools that address
Natural Language Processing (NLP) tasks and for acquiring
linguistic evidence from corpora. Concerning the
latter, researchers frequently rely on multilingual
or parallel resources in contrastive studies to
quantify the similarities and differences between
languages
        <xref ref-type="bibr" rid="ref16">(Jiang and Liu, 2018)</xref>
        . Over the past few
years, the Universal Dependencies (UD)
initiative1 (Zeman et al., 2021) has further encouraged
such studies. UD defines a universal inventory
of categories and guidelines to facilitate
consistent annotation of similar constructions across
languages
        <xref ref-type="bibr" rid="ref10 ref18">(Nivre, 2015; de Marneffe et al., 2021)</xref>
        ,
and, at present, the project includes about 200
treebanks representing over 100 languages.
      </p>
      <p>Copyright © 2021 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
      <p>1: https://universaldependencies.org</p>
      <p>The consistent annotation of linguistic phenomena under
a shared representation and across different
languages makes UD treebanks exceptionally well
suited for quantitative comparison of languages
(see, for example, Croft et al. (2017), Berdicevskis
et al. (2018), Vylomova et al. (2020) and, among
our works, Alzetta et al. (2019a) and Alzetta et al.
(2020a)).</p>
      <p>
        Despite their great relevance for linguistic
investigations, large treebanks are available for only
a tiny fraction of the world’s languages
        <xref ref-type="bibr" rid="ref24">(Vania et
al., 2019)</xref>
        . Even within the UD project, around
60% of the treebanks can be considered small,
i.e. containing fewer than 100,000 tokens.
Treebank size, in fact, is generally identified as the
bottleneck for obtaining high-quality
representative models of language use to be employed in
downstream NLP applications. In general terms,
larger datasets allow for better generalisations of
language constructions, leading to better
performance of systems trained on such data
        <xref ref-type="bibr" rid="ref29">(Zeman
et al., 2018)</xref>
        . In fact, ad-hoc strategies are
generally needed when dealing with low-resource
languages
        <xref ref-type="bibr" rid="ref15">(Hedderich et al., 2021)</xref>
        .
      </p>
      <p>
        This paper illustrates a novel workflow
specifically designed to adapt an existing methodology
for treebank exploration to small treebanks. The
base method, extensively described by Alzetta et
al. (2020b), relies on an unsupervised algorithm
called LISCA (LInguistically–driven Selection of
Correct Arcs)
        <xref ref-type="bibr" rid="ref11">(Dell’Orletta et al., 2013)</xref>
        . LISCA
has been successfully employed in past works
for performing quantitative cross-lingual analyses
        <xref ref-type="bibr" rid="ref3 ref4 ref5">(Alzetta et al., 2019a; Alzetta et al., 2019b; Alzetta
et al., 2020a)</xref>
        and error detection on UD treebanks
        <xref ref-type="bibr" rid="ref2">(Alzetta et al., 2017)</xref>
        . The algorithm works in
two main steps. First, it acquires evidence about
language use from the distributions of phenomena
in annotated sentences. The algorithm then uses
such evidence to distinguish typical from atypical
constructions in an unseen set of sentences. The
typicality of a construction is determined with
respect to the examples observed in a corpus used
as a reference, and is encoded with a score. This
score, in fact, reflects the probability of observing
a dependency occurring in a given context (both
sentence-level and corpus-level) on the basis of the
constructions sharing common properties reported
in the reference corpus. Hence, from our point of
view, typicality and frequency are tightly related
concepts, as non-standard constructions are also
usually less frequent in natural language use.
      </p>
      <p>As such, the LISCA methodology relies on
large sets of automatically parsed sentences to
collect the statistics about phenomena distributions:
even if the data contains parsing errors2, the
corpus size guarantees the collected statistics reflect
the actual language use. However, such an
approach can be employed only for analysing
languages for which large amounts of data are
available, or at least for which the parser outputs are
generally considered reliable. To overcome such a
limit, Aggarwal (2020) suggested that if the
statistics are acquired from gold annotations (such as
treebanks), the algorithm could acquire them
from less data since these resources are
assumed to be error-free.</p>
      <p>We implemented this proposal by adapting the
original LISCA workflow as detailed in Section 2.
Our variation to the original methodology is
inspired by the k-fold approach commonly used for
performing systems’ cross-validation: according
to this approach, a dataset is split into sub-sets
of equal size, iteratively used for training and/or
evaluating a system. We employ a similar strategy
for evaluating the typicality of the dependency
relations in each treebank split, acquiring the
statistics from the sentences contained in the other splits
rather than from an external reference corpus. This
small but substantial change in the method
workflow allows us to apply the LISCA algorithm to
small treebanks, which is particularly relevant in
the case of analyses performed on low-resource
languages.</p>
      <p>
        We tested the methodology in a case study,
reported in Section 3, involving 8 languages
represented using UD treebanks. Our goal is to
test if our method can support linguistic
investigations for exploring and quantifying
similarities and differences between typologically
different languages. (Footnote 2: An assumption when producing automatically parsed
data is that most of the errors made by a parser are
consistent. As we showed in
        <xref ref-type="bibr" rid="ref2">(Alzetta et al., 2017)</xref>
        , the LISCA-based
method allows spotting these error types in annotations.)
To this aim, we first validate the
adaptation to the original LISCA approach
proposed here in Section 3.1. Then, we exemplify
how the obtained results can be employed for
linguistic investigations in Section 3.2. To improve
the cross–linguistic comparability of the
analysis, we relied on Parallel UD (PUD) treebanks: a
collection of parallel treebanks developed for the
CoNLL–2017 Shared Task on multilingual
parsing
        <xref ref-type="bibr" rid="ref28">(Zeman et al., 2017)</xref>
        and linguistically
annotated under the UD representation. Being parallel,
PUDs are particularly well suited for carrying out
multilingual studies since they all contain the same 1,000
sentences, manually translated from English into
the other languages, representing a perfect testbed
for our approach.
      </p>
      <p>Before concluding the paper in Section 5, we
report the results of preliminary investigations to
explore whether our approach could also be
employed for automatically identifying
underrepresented phenomena in treebanks. Søgaard (2020)
and Anderson et al. (2021) argue that some
treebanks cover only a restricted sample of the
structures commonly used in a language, leaving out
less common phenomena. This leakiness might
affect the performance of NLP systems even more
than the system architecture. Thus, treebanks
should be expanded not only to improve their
representativeness but also to obtain more truthful
estimates of the performance of systems trained on them.
Section 4 investigates if our methodology can
contribute to this issue by exploring its application in
automatic treebank expansion.</p>
      <p>The contributions of the paper can be listed
as: (i) a novel approach specifically designed for
carrying out multilingual investigations on small
treebanks; (ii) a case study involving eight
typologically different languages to test the
methodology; and (iii) a novel formula, introduced in
Section 3.2, to measure the distance between
dependents and their syntactic head, which improves the
cross-lingual comparability of treebanks with
respect to this property.</p>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <p>
        The method presented in this paper relies on a
methodology for treebank exploration based on
the unsupervised algorithm LISCA
        <xref ref-type="bibr" rid="ref11">(Dell’Orletta
et al., 2013)</xref>
        , which we adapted to extend its use
to small treebanks, namely those containing fewer than
100,000 tokens.
      </p>
      <p>
        As mentioned earlier, LISCA can be employed
to quantify the typicality of each dependency
relation (hereafter deprel)3 of a linguistically
annotated corpus with respect to a large set of
examples taken as reference
        <xref ref-type="bibr" rid="ref5 ref6">(Alzetta et al., 2020b)</xref>
        . To
achieve this goal, the algorithm first collects
statistics about linguistically motivated properties of
deprels extracted from a corpus of automatically
parsed sentences (called reference corpus) to
create a statistical model (SM). Then, the algorithm
calculates a typicality score for each deprel
appearing in a test corpus relying on the SM while
also considering its linguistic context to assess the
relevance of the dependency label used for
marking the dependency in the given context. When
interpreting the assigned LISCA score, a deprel
marked as highly typical was likely also
frequently observed in similar contexts in the
reference corpus. In contrast, an atypical deprel
is characterised by properties that
make it distant from the other instances
of dependencies marked with the same label in the
reference corpus.
      </p>
      <p>In essence, LISCA computes the score for a
given deprel taking into account local properties
(e.g., dependency length and direction) of each
deprel in the test corpus as well as the linguistic
context where it is located (e.g., distance from the root,
the leaves, and the number of siblings), comparing them
both against the properties and contexts of all
dependencies annotated with the same dependency
label in the reference corpus. For this reason, the
reference corpus has generally corresponded to a
large corpus of around 40M tokens: the corpus
size allows accounting for a more comprehensive
set of examples of linguistic constructions while
also compensating for possible parser errors.
Workflow. For this study, we implemented the
adaptation of the LISCA workflow proposed by
Aggarwal (2020). Inspired by the k-fold
validation approach, we modified the original approach
as follows:
1) Split a treebank into k portions of equal size
(k = 4 for this work), each containing the same
number of sentences;
2) Use LISCA to acquire the statistics (encoded
in the SM) about the distribution of linguistic
phenomena from a reference corpus obtained by
merging k − 1 portions of the previously split
treebank;
3) Use the obtained SM to compute the
typicality score of the deprels appearing in the remaining
treebank portion (i.e., the one not included in the
reference corpus);
4) Repeat steps 2 and 3 until all k portions are
analysed;
5) Merge the analysed portions and order the
deprels by decreasing LISCA score to obtain a unique
ranking of all the deprels in the treebank.
(Footnote 3: Given a deprel A −nsubj→ B, we refer
to A → B as the dependency, with nsubj as the
dependency label.)</p>
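      <p>The five steps above can be sketched as follows (an illustrative sketch, not the authors' implementation; build_sm and score_deprels are hypothetical placeholders for the two LISCA steps of statistical-model creation and typicality scoring):</p>

```python
def kfold_lisca_rankings(sentences, build_sm, score_deprels, k=4):
    """Score every deprel of a small treebank using statistics
    acquired from the remaining k-1 folds (steps 1-5 above)."""
    # Step 1: split the treebank into k portions of equal size.
    folds = [sentences[i::k] for i in range(k)]
    scored = []
    for i, test_fold in enumerate(folds):
        # Step 2: build the statistical model (SM) from the other k-1 folds.
        reference = [s for j, fold in enumerate(folds) if j != i for s in fold]
        sm = build_sm(reference)
        # Step 3: score the deprels of the held-out fold against the SM.
        scored.extend(score_deprels(test_fold, sm))
    # Steps 4-5: all folds processed; merge into one ranking,
    # most typical (highest score) first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```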
      <p>The ordered ranking of deprels can be explored
to investigate which linguistic constructions,
represented by means of the deprels, were marked
as typical or atypical, characterised by higher and
lower scores, respectively.</p>
      <sec id="sec-2-1">
        <title>Data and Languages</title>
        <p>
          We tested our method on a selection of Parallel
UD (PUD) treebanks
          <xref ref-type="bibr" rid="ref28">(Zeman et al., 2017)</xref>
          , each
containing 1,000 sentences. In order to
encompass different language families and genera4, we
carried out the case study on the following eight
languages: Arabic (AR; Afro-Asiatic, Semitic),
Czech (CZ; Indo-European, Slavic), English (EN;
Indo-European, Germanic), Hindi (HI;
Indo-European, Indic), Finnish (FI; Uralic, Finnic),
Indonesian (ID; Austronesian, Malayo-Sumbawan),
Italian (IT; Indo-European, Romance) and Thai
(TH; Tai-Kadai, Kam-Tai).
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>Validating the Approach</title>
        <p>
          We report the results of an analysis to verify
whether the adapted and original LISCA-based
methods return comparable results. To this aim,
we compared the LISCA ranking of PUD deprels
obtained using the original algorithm workflow,
which employs a large reference corpus to build
the language SM, and the novel workflow defined
above, which acquires the statistics from the
treebank itself. We carried out this analysis for
Italian and English PUD treebanks. We manually
verified in previous studies that the original
approach applied to those languages allows
capturing elements of linguistic and parsing complexity,
distinguishing between typical and atypical
constructions along the produced ranking of
deprels
          <xref ref-type="bibr" rid="ref3 ref6">(Alzetta et al., 2019a; Alzetta et al., 2020b)</xref>
          .
(Footnote 4: The language family and genus, reported between
parentheses as (ISO language code, family, genus), are acquired
from the World Atlas of Language Structures (WALS,
available online https://wals.info/languoid)
          <xref ref-type="bibr" rid="ref13">(Dryer
and Haspelmath, 2013)</xref>
          .)
        </p>
        <p>We compared the deprel rankings obtained
using the two methodology workflows in terms of
Spearman correlation, which returns a rank
correlation coefficient indicating a statistical
dependence between the rankings of two observed
variables. The analysis showed a strong and
significant correlation between the rankings produced
relying on the two workflows in both languages.
Specifically, we obtained a Spearman correlation
coefficient of 0.95 (p &lt; 0.05) for both Italian and
English.</p>
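        <p>For reference, the rank correlation used here can be computed with any statistics package (e.g., scipy.stats.spearmanr); a minimal tie-free implementation of Spearman's formula is:</p>

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation for two equally long lists
    without ties: rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(xs)
    assert n == len(ys) and n > 1

    def ranks(values):
        # rank 1 for the smallest value, rank n for the largest
        order = sorted(range(n), key=lambda i: values[i])
        result = [0] * n
        for rank, i in enumerate(order, start=1):
            result[i] = rank
        return result

    rank_x, rank_y = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```

        <p>Identical rankings give a coefficient of 1.0 and fully reversed rankings give -1.0, so the 0.95 reported above indicates near-identical deprel rankings.</p>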
        <p>
          Such high correlations confirm that gold
corpora, although small, can be used to acquire
relevant statistics about language use. Manually
revised data might be limited in size. However,
their annotations are also generally correct in the
case of rare phenomena, which a parser could
wrongly annotate due to their low frequency in
the data. While large reference corpora
compensate for the possibly wrong parses assigned to rare
constructions with their size, small reference
corpora compensate with consistency and
correctness. Hence, using gold data
to build the SM allows reducing the number
of examples needed to acquire language statistics.
notice a difference between the two rankings only
when focusing on the bottom part, where we find
deprels with the lowest scores. While the
original method produces only a tiny number of deprels
with LISCA score equal to 0, which we usually
excluded from the analyses, we observe many more
of them in the ranking produced with our
workflow adaptation. A LISCA score of zero is assigned
to those dependencies never observed in the
reference corpus; thus, their typicality is extremely low.
It is not surprising that smaller reference corpora
produce a higher number of these cases, given
their limited coverage. However, the high
correlation coefficient reported above suggests that such
deprels are still interesting from a linguistic
perspective. They correspond to rare constructions
in the language, obtaining a score slightly higher
than zero in the case of a larger reference corpus
but are still placed in the lower positions of the
ranking.</p>
        <p>This subsection exemplifies how the ranking of
deprels obtained with our adapted approach can be
employed in linguistic analyses to identify
similarities and differences between languages. For
this case study, we focused on a specific property
of deprels, namely the length of the dependency
link. The length of a deprel, measured as the linear
distance in terms of intervening tokens between
a word and its syntactic head, is a property
frequently explored in linguistically annotated
corpora. It is highly related to processing complexity
in all languages
          <xref ref-type="bibr" rid="ref12 ref14 ref23 ref27">(Demberg and Keller, 2008;
Temperley, 2007; Futrell et al., 2015; Yu et al., 2019)</xref>
          .
For example, McDonald and Nivre (2011)
observed that parsers tend to make more mistakes on
longer sentences and longer dependencies. Such
complexity makes this property particularly
interesting from a multilingual perspective, especially
when dealing with parallel corpora, as in our case
study.
        </p>
        <p>We inspected the ranking of deprels to monitor
the LISCA score associated with deprels of
different lengths and their distribution along the
ranking of each language. To facilitate the
exploration and comparison of the rankings, we split each
ranking into three portions of equal size, referred to
as top, middle and bottom, where top contains
deprels obtaining the highest scores (more typical).
In contrast, the bottom contains the deprels with
the lowest scores (atypical).</p>
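        <p>Operationally, the three portions can be obtained by splitting the score-ordered list of deprels, e.g.:</p>

```python
def split_into_bins(ranked_deprels, n_bins=3):
    """Split a ranking (highest LISCA score first) into n_bins
    portions of (near-)equal size: top, middle and bottom."""
    n = len(ranked_deprels)
    size = n // n_bins
    bins = [ranked_deprels[i * size:(i + 1) * size] for i in range(n_bins - 1)]
    bins.append(ranked_deprels[(n_bins - 1) * size:])  # last bin absorbs the remainder
    return bins
```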
        <p>In order to allow a proper multilingual
comparison of the distribution of deprel lengths along with
the rankings, we defined a novel measure called
Adjusted Link Length (LLadjusted, cf. Figure 1).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Adjusted Link Length</title>
      <p>
        The measure, inspired by Brevity Penalty used in
BLEU score
        <xref ref-type="bibr" rid="ref19">(Papineni et al., 2002)</xref>
        , is designed
to compute the length of deprels involving content
words as dependents while simultaneously
improving cross-language comparability: the length of
a deprel is measured taking into account the
overall length of the sentence where it occurs and
the average sentence length in the treebank. This
way, instead of comparing absolute length values,
we can observe the tendency of languages towards
producing longer or shorter deprels.
      </p>
      <p>In LLadjusted, we operationally compute
the length of deprels as a function of a)
the average sentence length in the treebank
(T rbAvgSentLen), b) the length of the sentence
where the deprel appears (SentLength), and c)
the distance, in tokens, between the dependent and
its syntactic head (LLraw). The formula’s values
of 0.5 and 1.25 were determined empirically to
account for unusually short and long sentences,
respectively, in the treebank. Thus, the
resulting value associated with each deprel denotes it
as ‘long’, ‘medium’ or ‘short’ with respect to the
average deprel length computed in the treebank.
Note that, although our analysis focuses on
content words, function words are still accounted for
when computing the LISCA score as they might
be part of the context of content words.</p>
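      <p>The exact formula of LLadjusted is given in Figure 1 (not reproduced here). As a purely illustrative sketch, the ingredients a) to c) can be computed from a CoNLL-U treebank as follows; the brevity-penalty-style rescaling and the content-word POS set are our assumptions, not the paper's definition:</p>

```python
from statistics import mean

CONTENT_UPOS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV", "NUM", "PRON"}  # assumed set

def parse_conllu(text):
    """Read a minimal CoNLL-U string into sentences of
    (token_id, head_id, deprel, upos) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # comment line
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword tokens and empty nodes
        current.append((int(cols[0]), int(cols[6]), cols[7], cols[3]))
    if current:
        sentences.append(current)
    return sentences

def adjusted_link_lengths(sentences, short=0.5, long_=1.25):
    """For each content-word deprel, return (deprel, LLraw, adjusted
    length). The rescaling below is one plausible, illustrative reading."""
    trb_avg_sent_len = mean(len(s) for s in sentences)   # ingredient a)
    results = []
    for sentence in sentences:
        sent_length = len(sentence)                       # ingredient b)
        # Illustrative: rescale links only in unusually short or long sentences.
        if sent_length < short * trb_avg_sent_len or sent_length > long_ * trb_avg_sent_len:
            factor = trb_avg_sent_len / sent_length
        else:
            factor = 1.0
        for token_id, head_id, deprel, upos in sentence:
            if head_id == 0 or upos not in CONTENT_UPOS:
                continue  # skip the root and function words
            ll_raw = abs(token_id - head_id)              # ingredient c)
            results.append((deprel, ll_raw, ll_raw * factor))
    return results
```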
      <p>Figure 2 displays the distribution of deprels
of different lengths (computed using LLadjusted)
along the portions of the treebank ranking of
each language. The distributions show that longer
deprels are given a lower plausibility score by
LISCA in all languages. Interestingly, the length
distributions are pretty similar across different
languages except for Hindi. Such a difference
could be due to the typical word order of
constituents of the considered languages. Hindi,
in fact, is the only language of our set where
the order of the main constituents is of the type
S(ubject)O(bject)V(erb), and the dominant word
order of a language has been shown to influence
dependency length across major dependency
types by Yadav et al. (2020). (Footnote 5: All
the other languages are S(ubject)V(erb)O(bject)
languages.)</p>
      <p>It should be noted that such difference between
languages could also be observed computing the
length of dependency relations straightforwardly
on PUD treebanks: the average linear link length
computed on Hindi PUD is 6.54; for Thai PUD,
the language showing the shortest relations, it is 2.67;
the remaining languages show a value
ranging between 3.1 and 3.5. However, our
methodology allows us to combine multiple properties
simultaneously into a score, thus isolating in
different portions of the rankings the deprels that show
an atypical value for a given property but could
still be considered quite typical for the language
based on their context. As proof, observe that long
and medium deprels in Hindi tend to appear earlier
in the ranking than in other languages: 19.73% of
deprels located in the middle bin are covered by
medium and long deprels, suggesting that longer
deprels are more common in Hindi. On the
contrary, only 7% of deprels of the middle bin are
long in Thai, pointing to their atypicality in the
language.</p>
      <p>
        The above results show the methodology’s
effectiveness for exploring tendencies and
peculiarities of languages in multilingual studies. However,
small samples like PUD treebanks are usually not
suited for analysing infrequent phenomena
        <xref ref-type="bibr" rid="ref22">(Taherdoost, 2016)</xref>
        . Hence, one might wonder if we are
actually capturing the atypicality of linguistic
constructions, or whether we are instead biased by phenomena
underrepresented in the treebank. In the
following Section, we will explore whether low LISCA
scores might be associated with infrequent
linguistic phenomena due to under-representation in the
data used to build the SM.
      </p>
      <p>
        Our analyses started from the premise that PUD
treebanks are error-free. Therefore, we can look at
the rankings as containing correctly annotated
examples of language use. However, the approach
employed in this study does not exclude the
scenario that a deprel might obtain a low LISCA score
because of a lack of similar constructions in the
treebank. We explored this idea both at deprel and
sentence level, as described below.
      </p>
      <p>
        Concerning the deprel–level analysis, we tested
the accuracy of a parser for deprels in the three
portions of the LISCA rankings. To this aim, we
parsed each PUD treebank using UDPipe
        <xref ref-type="bibr" rid="ref21">(Straka
et al., 2016)</xref>
        , relying on the k-fold approach used
to train LISCA: we split each PUD into 4
portions of 250 sentences each, trained UDPipe on
3 of the portions and parsed the remaining
portion. Then, we checked if deprels were parsed
accurately. Again, we excluded function words from
this analysis to improve cross-language
comparability and avoid biased results as function words
are usually more accurately parsed than content
words. Based on the obtained results, we observed
that wrongly parsed deprels mainly concentrate in
the bottom bin for all languages. This suggests
that there might be a relationship between
low LISCA scores and underrepresented
phenomena.
      </p>
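      <p>The per-bin check described above can be sketched as follows; the data layout (a gold label per token, grouped by ranking bin, plus a mapping to the parser's predictions) is our illustrative assumption, not the paper's format:</p>

```python
def accuracy_per_bin(gold_bins, predicted):
    """For each ranking bin (top, middle, bottom), compute the share of
    deprels whose predicted label matches the gold annotation.
    gold_bins: list of bins, each a list of (token_key, gold_label);
    predicted: dict mapping token_key to the parser's label."""
    accuracies = []
    for bin_items in gold_bins:
        correct = sum(1 for key, gold in bin_items if predicted.get(key) == gold)
        accuracies.append(correct / len(bin_items))
    return accuracies
```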
      <p>For the sentence-level analysis, we computed
a sentence-level LISCA score for each sentence in all PUD
treebanks as the arithmetic mean of the scores of the
individual deprels belonging to the sentence. In this analysis,
we explored whether sentences with low average
LISCA scores are also more difficult to parse than
those with higher average LISCA scores.</p>
      <p>We proposed a novel workflow to adapt an
existing approach for treebank exploration to small
treebanks and low-resource languages. Results
of our analyses showed the effectiveness of the
methodology in multiple scenarios. First, the
adapted method allows obtaining reliable results
on par with the original method workflow when
performing linguistic explorations of the
treebanks. Secondly, the results also show the
potential of the method for automatically
identifying underrepresented constructions in treebanks.
The latter result paves the way for the automatic
identification of cases required to expand the
treebanks, which we plan to further investigate in
future work.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We would like to sincerely thank the anonymous
reviewers for their helpful comments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Akshay</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Consistency of Linguistic Annotation</article-title>
          .
          <source>Master's thesis</source>
          ,
          <source>Univerzita Karlova (ÚFAL)</source>
          , Prague, Czechia, September. Thesis Supervisor Zeman, Daniel.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Alzetta</surname>
          </string-name>
          , Felice Dell'Orletta,
          <string-name>
            <given-names>Simonetta</given-names>
            <surname>Montemagni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Venturi</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Dangerous Relations in Dependency Treebanks</article-title>
          .
          <source>In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories</source>
          , pages
          <fpage>201</fpage>
          -
          <lpage>210</lpage>
          , Prague, Czech Republic.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Alzetta</surname>
          </string-name>
          , Felice Dell'Orletta,
          <string-name>
            <given-names>Simonetta</given-names>
            <surname>Montemagni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Venturi</surname>
          </string-name>
          . 2019a.
          <article-title>Inferring quantitative typological trends from multilingual treebanks. A case study</article-title>
          .
          <source>Lingue e linguaggio</source>
          ,
          <volume>18</volume>
          (
          <issue>2</issue>
          ):
          <fpage>209</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Alzetta</surname>
          </string-name>
          , Felice Dell'Orletta,
          <string-name>
            <given-names>Simonetta</given-names>
            <surname>Montemagni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Venturi</surname>
          </string-name>
          . 2019b.
          <article-title>Inferring quantitative typological trends from multilingual treebanks. A case study</article-title>
          .
          <source>Lingue e linguaggio</source>
          ,
          <volume>18</volume>
          (
          <issue>2</issue>
          ):
          <fpage>209</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Alzetta</surname>
          </string-name>
          , Felice Dell'Orletta, Simonetta Montemagni, Petya Osenova, Kiril Simov, and
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Venturi</surname>
          </string-name>
          . 2020a.
          <article-title>Quantitative Linguistic Investigations across Universal Dependencies treebanks</article-title>
          .
          <source>In Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it)</source>
          , Bologna (online), Italy, March.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Alzetta</surname>
          </string-name>
          , Felice Dell'Orletta,
          <string-name>
            <given-names>Simonetta</given-names>
            <surname>Montemagni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Venturi</surname>
          </string-name>
          . 2020b.
          <article-title>Linguistically-driven Selection of Difficult-to-Parse Dependency Structures</article-title>
          .
          <source>IJCoL. Italian Journal of Computational Linguistics</source>
          ,
          <volume>6</volume>
          (
          <issue>6</issue>
          -2):
          <fpage>37</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Mark</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anders</given-names>
            <surname>Søgaard</surname>
          </string-name>
          , and Carlos Gómez-Rodríguez.
          <year>2021</year>
          .
          <article-title>Replicating and Extending "Because Their Treebanks Leak": Graph Isomorphism, Covariants, and Parser Performance</article-title>
          .
          <source>In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)</source>
          , pages
          <fpage>1090</fpage>
          -
          <lpage>1098</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Aleksandrs</given-names>
            <surname>Berdicevskis</surname>
          </string-name>
          , Çağrı Çöltekin, Katharina Ehret, Kilu von Prince, Daniel Ross, Bill Thompson, Chunxiao Yan, Vera Demberg, Gary Lupyan,
          <string-name>
            <given-names>Taraka</given-names>
            <surname>Rama</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>Using Universal Dependencies in cross-linguistic complexity research</article-title>
          .
          <source>In Proceedings of the Second Workshop on Universal Dependencies (UDW</source>
          <year>2018</year>
          ), pages
          <fpage>8</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>William</given-names>
            <surname>Croft</surname>
          </string-name>
          , Dawn Nordquist, Katherine Looney, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Regan</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Linguistic Typology meets Universal Dependencies</article-title>
          .
          <source>In Proceedings of the 15th International Workshop on Treebanks and Linguistic Theories (TLT15)</source>
          ,
          <source>CEUR Workshop Proceedings</source>
          , pages
          <fpage>63</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Marie-Catherine</given-names>
            <surname>de Marneffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Joakim</given-names>
            <surname>Nivre</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Zeman</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Universal Dependencies</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>47</volume>
          (
          <issue>2</issue>
          ):
          <fpage>255</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Felice</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Venturi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Simonetta</given-names>
            <surname>Montemagni</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Linguistically-driven Selection of Correct Arcs for Dependency Parsing</article-title>
          .
          <source>Computación y Sistemas</source>
          ,
          <volume>17</volume>
          (
          <issue>2</issue>
          ):
          <fpage>125</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Vera</given-names>
            <surname>Demberg</surname>
          </string-name>
          and Frank Keller.
          <year>2008</year>
          .
          <article-title>Data from eyetracking corpora as evidence for theories of syntactic processing complexity</article-title>
          .
          <source>Cognition</source>
          ,
          <volume>109</volume>
          (
          <issue>2</issue>
          ):
          <fpage>193</fpage>
          -
          <lpage>210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Matthew S.</given-names>
            <surname>Dryer</surname>
          </string-name>
          and Martin Haspelmath, editors.
          <year>2013</year>
          .
          <source>WALS Online</source>
          . Max Planck Institute for Evolutionary Anthropology, Leipzig.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Richard</given-names>
            <surname>Futrell</surname>
          </string-name>
          , Kyle Mahowald, and
          <string-name>
            <given-names>Edward</given-names>
            <surname>Gibson</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Large-scale evidence of dependency length minimization in 37 languages</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>112</volume>
          (
          <issue>33</issue>
          ):
          <fpage>10336</fpage>
          -
          <lpage>10341</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Michael A.</given-names>
            <surname>Hedderich</surname>
          </string-name>
          , Lukas Lange, Heike Adel, Jannik Strötgen, and
          <string-name>
            <given-names>Dietrich</given-names>
            <surname>Klakow</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios</article-title>
          .
          <source>In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , pages
          <fpage>2545</fpage>
          -
          <lpage>2568</lpage>
          , Online, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Jingyang</given-names>
            <surname>Jiang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Haitao</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Quantitative Analysis of Dependency Structures</article-title>
          , volume
          <volume>72</volume>
          . Walter de Gruyter GmbH &amp; Co KG.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Ryan</given-names>
            <surname>McDonald</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joakim</given-names>
            <surname>Nivre</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Analyzing and integrating dependency parsers</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>37</volume>
          (
          <issue>1</issue>
          ):
          <fpage>197</fpage>
          -
          <lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Joakim</given-names>
            <surname>Nivre</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Towards a universal grammar for natural language processing</article-title>
          .
          <source>In International conference on intelligent text processing and computational linguistics</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>16</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Kishore</given-names>
            <surname>Papineni</surname>
          </string-name>
          , Salim Roukos, Todd Ward, and
          <string-name>
            <given-names>Wei-Jing</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Bleu: a method for automatic evaluation of machine translation</article-title>
          .
          <source>In Proceedings of the 40th annual meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Anders</given-names>
            <surname>Søgaard</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Some Languages Seem Easier to Parse Because Their Treebanks Leak</article-title>
          .
          <source>In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>2765</fpage>
          -
          <lpage>2770</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Milan</given-names>
            <surname>Straka</surname>
          </string-name>
          , Jan Hajic, and
          <string-name>
            <given-names>Jana</given-names>
            <surname>Straková</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>UDPipe: trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing</article-title>
          .
          <source>In Proceedings of the tenth international conference on language resources and evaluation (LREC</source>
          <year>2016</year>
          ), pages
          <fpage>4290</fpage>
          -
          <lpage>4297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Hamed</given-names>
            <surname>Taherdoost</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Sampling methods in research methodology; how to choose a sampling technique for research</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>David</given-names>
            <surname>Temperley</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Minimization of dependency length in written English</article-title>
          .
          <source>Cognition</source>
          ,
          <volume>105</volume>
          (
          <issue>2</issue>
          ):
          <fpage>300</fpage>
          -
          <lpage>333</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Clara</given-names>
            <surname>Vania</surname>
          </string-name>
          , Yova Kementchedjhieva, Anders Søgaard, and
          <string-name>
            <given-names>Adam</given-names>
            <surname>Lopez</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages</article-title>
          .
          <source>In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          , pages
          <fpage>1105</fpage>
          -
          <lpage>1116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Ekaterina</given-names>
            <surname>Vylomova</surname>
          </string-name>
          , Edoardo M Ponti,
          <string-name>
            <given-names>Eitan</given-names>
            <surname>Grossman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Arya D.</given-names>
            <surname>McCarthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yevgeni</given-names>
            <surname>Berzak</surname>
          </string-name>
          , Haim Dubossarsky, Ivan Vulić,
          <string-name>
            <given-names>Roi</given-names>
            <surname>Reichart</surname>
          </string-name>
          , Anna Korhonen, and
          <string-name>
            <given-names>Ryan</given-names>
            <surname>Cotterell</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <source>Proceedings of the Second Workshop on Computational Research in Linguistic Typology</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Himanshu</given-names>
            <surname>Yadav</surname>
          </string-name>
          , Ashwini Vaidya, Vishakha Shukla, and
          <string-name>
            <given-names>Samar</given-names>
            <surname>Husain</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Word Order Typology Interacts With Linguistic Complexity: A Cross-Linguistic Corpus Study</article-title>
          .
          <source>Cognitive science</source>
          ,
          <volume>44</volume>
          (
          <issue>4</issue>
          ):
          <fpage>e12822</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Xiang</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Agnieszka</given-names>
            <surname>Falenska</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jonas</given-names>
            <surname>Kuhn</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Dependency length minimization vs. word order constraints: an empirical study on 55 treebanks</article-title>
          .
          <source>In Proceedings of the First Workshop on Quantitative Syntax (Quasy</source>
          , SyntaxFest
          <year>2019</year>
          ), pages
          <fpage>89</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Zeman</surname>
          </string-name>
          , Martin Popel, Milan Straka, Jan Hajic, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinkova,
          Jan Hajic jr.,
          <string-name>
            <given-names>Jaroslava</given-names>
            <surname>Hlavacova</surname>
          </string-name>
          , Václava Kettnerová, Zdenka Uresova, Jenna Kanerva, Stina Ojala, Anna Missilä,
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung,
          <string-name>
            <given-names>Marie-Catherine</given-names>
            <surname>de Marneffe</surname>
          </string-name>
          , Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva,
          <string-name>
            <given-names>Kira</given-names>
            <surname>Droganova</surname>
          </string-name>
          , Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonca, Tatiana Lando, Rattima Nitisaroj, and
          <string-name>
            <given-names>Josie</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies</article-title>
          .
          <source>In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          , Vancouver, Canada, August. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Zeman</surname>
          </string-name>
          , Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, and
          <string-name>
            <given-names>Slav</given-names>
            <surname>Petrov</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>CoNLL 2018 shared task: Multilingual Parsing from Raw Text to Universal Dependencies</article-title>
          .
          <source>In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          , Brussels, Belgium, October. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>