<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Searching for a Measure of Word Order Freedom</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vladislav Kuboň</string-name>
          <email>vk@ufal.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markéta Lopatková</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomáš Hercig</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia</institution>
          ,
          <addr-line>Univerzitní 8, 306 14 Plzeň</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>NTIS-New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia</institution>
          ,
          <addr-line>Technická 8, 306 14 Plzeň</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1649</volume>
      <fpage>11</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>This paper compares various means of measuring word order freedom, applied to data from syntactically annotated corpora for 23 languages. The corpora are part of the HamleDT project; the word order statistics are relative frequencies of all word order combinations of subject, predicate and object, both in main and subordinated clauses. The measures include Euclidean distance, max-min distance, entropy and cosine similarity. The differences among the measures are discussed.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The question of the different features of natural languages has
been engrossing theoretical linguists for hundreds of years.
They have been studying various language characteristics
and classifying natural languages according to their
properties, giving rise to language typology, see esp. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], or [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], to mention also the Czech tradition. These
investigations led to a system of four basic language types,
namely isolating, agglutinative, inflectional and
polysynthetic languages.
      </p>
      <p>
        Theoretical linguists have introduced an extensive list
of relevant language features, a summary can be found,
e.g., in the World Atlas of Language Structures (WALS)
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We will focus on one particular phenomenon, the word order
of natural languages. While the classification of languages
cannot be based upon a single phenomenon, word order
characteristics seem to belong among the important
features both for theoretical research and for practical natural
language applications.
      </p>
      <p>
        Languages are typically classified according to the
degree of word order freedom into (more or less) fixed word
order and free word order languages. The former type is
often exemplified by English, where a word order
position encodes a syntactic function (e.g., the first noun in
an indicative sentence, prototypically having the function
of subject, is followed by a predicative verb and a noun
with the object function); this property typically
correlates with under-developed inflection. The latter type can be
exemplified by Czech, where a syntactic function is
encoded by morphological case marking [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and the word order
expresses the information structure.
      </p>
      <p>From the practical point of view, the freedom of word
order correlates to a great extent with the parsing difficulty
of a particular natural language (a language with more
fixed word order is typically easier to parse than a
language containing, e.g., non-projective constructions). On
top of that, modern unsupervised methods of natural
language processing might also profit from investigations of
the kind we present in this paper. If researchers
had exact information about the properties of a
language which they want to process using unsupervised
methods, this knowledge might help them to choose an
adequate processing method and/or to properly set its
parameters.</p>
      <p>The examination of natural language typology has
traditionally been based upon a systematic observation of
linguistic material. However, linguistic research is in a
completely different position now: linguistic observations can
be based on large amounts of language data stored in
corpora which have been growing not only in size but also in
complexity of annotation during the last decade.</p>
      <p>
        Moreover, several attempts to propose a unified
annotation scheme – let us mention at least Stanford
Dependencies and Stanford Universal Dependencies [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ],1
Google Universal Tags [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Universal Dependencies [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],2
– make it possible to use existing corpora for different
languages.
      </p>
      <p>
        In this paper we exploit the annotation developed in
the framework of the HamleDT project (Harmonized
Multi-Language Dependency Treebank [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).3
      </p>
      <p>
        We have already presented a study where we focused
on the word order properties of the HamleDT treebanks and the
ranking of languages – we used a simple max-min distance
based on the distribution of sentences among all variants of
the word order. Here we re-calculate the results of the
experiments described in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] using standard measures like
Euclidean distance, entropy, and cosine similarity.
      </p>
      <p>In the remaining sections of the paper we first
introduce the data and tools used for the experiment;
section 3 describes the setup of the experiment, section
4 presents the results, and the final section discusses the
conclusions and possible directions for future work.
1http://nlp.stanford.edu/software/stanford-dependencies.shtml
2http://universaldependencies.org/
3https://ufal.mff.cuni.cz/hamledt
</p>
      <p>
        HamleDT (Harmonized Multi-Language Dependency
Treebank, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ])4 is a compilation of existing dependency
treebanks (or dependency conversions of other treebanks),
transformed so that they all conform to the same
annotation style. These treebanks, as well as the searching tools, are
available through LINDAT/CLARIN,5 a repository for linguistic data and
resources.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Corpora</title>
      <p>HamleDT integrates corpora for several tens of languages.
Wherever license agreements permit, the
corpora are transformed into a common data and annotation
format, which enables a user – after a very short period
of getting acquainted with each particular treebank – to
comfortably search and analyze the data of a particular
language.</p>
      <p>
        The HamleDT family of treebanks is based on the
dependency framework and technology developed for the
Prague Dependency Treebank (PDT),6 i.e., a large
syntactically annotated corpus for the Czech language [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Here
we focus on the so-called analytical layer, describing the
surface sentence structure (relevant for studying word order
properties). Unfortunately, due to various technical and
licensing restrictions, it was not possible to use all the
treebanks contained in HamleDT. Thus our effort focuses on
23 treebanks with available annotation on this syntactic
layer, which still represent a wide variety of languages
having various word-order properties.
      </p>
      <p>As an example, Figure 1 shows three dependency
representations for an English sentence in the HamleDT
format.7 Tables 1 and 2 provide an overview of the languages
and the size of the corpora examined in our experiment.
</p>
    </sec>
    <sec id="sec-3">
      <title>Querying Tool</title>
      <p>
        Using a common annotation framework
for multiple treebanks has another very useful consequence
– instead of developing tailor-made searching tools, we can
apply a common tool to all the treebanks we are analyzing. In
the case of HamleDT, we can use the PML-TQ [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] search
tool,8 originally developed for processing the data from
PDT.
      </p>
      <p>Having the treebanks in the common data format and
annotation scheme, the PML-TQ framework makes it
possible to analyze the data in a uniform way. A typical user
4https://ufal.mff.cuni.cz/hamledt
5https://lindat.mff.cuni.cz/
6http://ufal.mff.cuni.cz/pdt3.0
7Data of each treebank in HamleDT are distributed in three
annotation schemes – (a) the transformation of the treebank to the praguian
style (used in PDT; leftmost in Figure 1), (b) the original annotation
format of the given treebank (or its dependency transformation in case of
non-dependency treebanks; in the middle of Figure 1), and (c) the
transformation of the treebank to the Universal Dependencies style (rightmost
in the figure).</p>
      <p>8https://lindat.mff.cuni.cz/services/pmltq/
interested in monolingual data can use PML-TQ in an
interactive way. Such an approach would, of course, not work
for our set of 23 treebanks; therefore, we have used the
command-line interface which PML-TQ also provides. This
interface makes it possible to create scripts that process a
specified set of treebanks automatically.</p>
      <p>Let us now give an example of a PML-TQ query used
in our analysis. It counts sentences having an SVO word
order in the main clause.</p>
      <p>a-node $p :=
  [ depth() = "1", id ~ "prague",
    afun = "Pred", tag ~ "^V",
    1x a-node [ afun = "Sb" ],
    1x a-node [ afun = "Obj" ],
    a-node [ afun = "Sb", ord &lt; $p.ord ],
    a-node [ afun = "Obj", ord &gt; $p.ord ] ];
&gt;&gt; give count()</p>
      <p>The query searches data annotated in the praguian style
(id ~ "prague") for sentences containing verbs (tag ~
"^V") with the analytical function of a predicate (afun
= "Pred") at the depth of one level below the
technical root of the tree (depth() = "1"); i.e., this query
focuses on the word order in main clauses, excluding
coordinated predicates and disregarding also subordinate
clauses. There must be exactly one subject and one
object directly depending on the predicate (for the subject:
1x a-node [afun = "Sb"]), the subject must precede
the verb (afun = "Sb", ord &lt; $p.ord), and the object
must follow it (afun = "Obj", ord &gt; $p.ord). The
result of the query is the count of such sentences (&gt;&gt;
give count()). The visualization of the PML-TQ query
can be found in Figure 2.</p>
      <sec id="sec-3-1">
        <title>The Experiment</title>
        <p>In order to avoid possible bias caused by a combination
of too many language phenomena in complicated
sentences, we have decided to exclude from our experiment all
sentences containing coordinated predicates, subjects or
objects. The phenomenon of coordination is to some
extent “orthogonal” to that of word order (especially in
dependency-based approaches to language description);
thus the results might have been negatively influenced if
coordination of verbs or coordination of their direct
dependents were allowed.</p>
        <p>In this experiment, we have focused on “full”
structures, i.e., sentences with a core syntactic structure
consisting of subject, predicate and object. We have created
several queries aiming at a thorough investigation of the
phenomenon of the mutual position of these syntactic units.</p>
        <p>
The results presented in Tables 1 and 2 may serve as a basis
for an estimation of the degree of word order freedom of
individual languages. The typical mutual position of subject,
predicate and object constitutes one of the basic
typological characteristics of a natural language. The problem
of measuring the degree of word order freedom cannot,
of course, be reduced only to this phenomenon; the freedom
of word order of other sentence elements should
probably be taken into account as well. Our decision to base
the estimation on just these three constituents has several
reasons. First of all, these constituents are present in the
vast majority of sentences; they constitute a certain
backbone of every sentence. Second, they are also relatively
easily identifiable in all treebanks, regardless of the
original annotation schemes. Although the HamleDT treebanks
provide uniform annotation, the transformation of less
frequent language phenomena from various languages may
yield results which are not as uniform as we would like
them to be. Last but not least, the three main constituents
are located at the top of the dependency tree, so they do not
require overly complex queries which might bring additional
bias into the experiment.</p>
        <p>The number we are looking for would describe how far
the distribution of individual variants of word order is from
the ideal, absolutely free order of the main constituents.
It is obvious that the languages with the highest degree
of word order freedom would demonstrate the most uniform
distribution of sentences among all variants of the word
order described in our tables, i.e., the frequency of all
variants of the order of subject, verb and object would be equal to
16.66% (let us denote this “ideal vector” as Y)9. The
difference between the actual distribution vector of each
particular language from our table and this ideal vector then
expresses the difference in word order freedom.</p>
        <p>There are several measures which we can use for these
9The equal frequency of all variants actually means that there are
probably no grammatical rules which would prefer any order of
constituents over the others.</p>
        <sec id="sec-3-1-1">
          <title>Number of Number of SVO sentences matches (%) OVS (%)</title>
          <p>calculations.10 Let us start with the simplest one, the
maxmin measure (marked as M1 in the subsequent text):
M1 = max xi − min xi</p>
          <p>i∈1,..n i∈1,..n</p>
          <p>This measure has the value 0 for the ideal vector. The
higher its value, the more fixed the word order
of that particular language seems to be. The main advantage of this
measure is its ability to reduce the n-dimensional vectors to
two values only (leaving aside the four other values),
thus enabling a simple graphical representation. The same
property also constitutes the greatest disadvantage of this
measure, i.e., its insensitivity to subtle differences in the
distribution of values among the four variants which are
actually left aside.</p>
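          <p>To make the measure concrete, the max-min computation can be sketched in a few lines of Python (our illustration, not part of the paper; the frequency vector of the hypothetical SVO-dominant language is invented):</p>

```python
# Max-min measure M1 over the six relative frequencies of the
# orders of subject, verb and object (SVO, SOV, VSO, VOS, OSV, OVS).
def max_min(freqs):
    # M1 = max(x_i) - min(x_i); equals 0 for the ideal uniform vector
    return max(freqs) - min(freqs)

# The "ideal vector" Y: all six variants equally frequent.
Y = [1.0 / 6.0] * 6

# Hypothetical strongly fixed-order language: almost everything is SVO.
fixed = [0.90, 0.04, 0.02, 0.02, 0.01, 0.01]

print(max_min(Y))                # 0.0
print(round(max_min(fixed), 2))  # 0.89
```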
          <p>The second measure is the standard Euclidean distance
between two vectors (marked as M2 in the subsequent
text):</p>
          <p>M2 = ‖X − Y‖ = √( ∑_{i=1}^{n} (x_i − y_i)² )</p>
          <p>In this formula, the symbol X represents the
distribution of word order variants of a given language and Y is
the “ideal vector” with equal distribution of frequencies.
The Euclidean distance is more precise than M1 because it
reflects all six variants of the word order.</p>
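          <p>The same illustrative setup as above (an invented SVO-dominant frequency vector, not taken from the tables) makes the Euclidean distance easy to sketch:</p>

```python
import math

# Euclidean distance M2 = ||X - Y|| between a language's frequency
# vector X and the ideal uniform vector Y.
def euclidean(X, Y):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)))

Y = [1.0 / 6.0] * 6                           # ideal uniform vector
fixed = [0.90, 0.04, 0.02, 0.02, 0.01, 0.01]  # hypothetical SVO-dominant language

print(euclidean(Y, Y))                # 0.0
print(round(euclidean(fixed, Y), 3))  # 0.804: far from the ideal vector
```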
          <p>The third measure, very often used for measuring the
similarity of two vectors in information retrieval, is the
cosine similarity (marked as M3 in the subsequent text):</p>
          <p>M3 = ∑_{i=1}^{n} (x_i × y_i) / ( √(∑_{i=1}^{n} x_i²) × √(∑_{i=1}^{n} y_i²) )</p>
          <p>Actually, because both M2 and M3 represent a
distance between two vectors (although measured by
different means and providing numerically different values),
their results with regard to the estimation of word order
freedom are very similar, the main difference being
the direction of the numerical values of M2 and M3: while the
values of M2 decrease with growing word order
freedom, the values of M3 increase.</p>
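          <p>A minimal Python sketch of the cosine similarity under the same invented frequency vectors (ours, for illustration only):</p>

```python
import math

# Cosine similarity M3 between frequency vectors X and Y:
# dot product divided by the product of the vector norms.
def cosine(X, Y):
    dot = sum(x * y for x, y in zip(X, Y))
    norm_x = math.sqrt(sum(x * x for x in X))
    norm_y = math.sqrt(sum(y * y for y in Y))
    return dot / (norm_x * norm_y)

Y = [1.0 / 6.0] * 6                           # ideal uniform vector
fixed = [0.90, 0.04, 0.02, 0.02, 0.01, 0.01]  # hypothetical SVO-dominant language

print(round(cosine(Y, Y), 6))      # 1.0: identical direction
print(round(cosine(fixed, Y), 3))  # 0.453: grows toward 1.0 as word order gets freer
```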
          <p>Because M2 and M3 are in principle quite similar, let us
use one more measure which is also quite natural
and widely used, namely the entropy (marked as M4 in the
subsequent text):</p>
          <p>M4 = − ∑_{i=1}^{n} P(x_i) ln P(x_i)</p>
          <p>The values P(x_i) are the probabilities of the individual word
order variants. Because we do not know the exact
probabilities, we are going to use their relative frequencies from
Tables 1 and 2. The entropy is maximal for the uniform
distribution of relative frequencies (probabilities) and minimal for
an absolutely deterministic system which has only one
acceptable type of word order. In other words, the higher
the entropy for a particular language, the higher its
degree of word order freedom.
10Actually, the word measure should not be understood as a strictly
mathematical term. The cosine similarity is not a measure in the
mathematical sense; it does not have all the properties required by the mathematical
definition of the term measure.</p>
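          <p>The entropy computation can likewise be sketched in Python (our illustration; the frequency vector of the hypothetical fixed-order language is invented):</p>

```python
import math

# Entropy M4 = -sum p_i * ln(p_i), with relative frequencies used
# in place of the unknown true probabilities; zero-frequency
# variants contribute nothing and are skipped.
def entropy(freqs):
    return -sum(p * math.log(p) for p in freqs if p > 0)

Y = [1.0 / 6.0] * 6                           # ideal uniform vector
fixed = [0.90, 0.04, 0.02, 0.02, 0.01, 0.01]  # hypothetical SVO-dominant language

print(round(entropy(Y), 4))      # 1.7918 = ln(6), the maximum for six variants
print(round(entropy(fixed), 4))  # 0.4722: lower entropy, more fixed word order
```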
          <p>The results obtained for all four measures are presented in
Tables 3 and 4. In order to enable an easier comparison
of individual measures, we also present the rank
of all languages with regard to their degree of word order
freedom for each particular measure. The ranks then show
how similar the measures are. In both tables, the order of
languages corresponds to their rank according to the M1
measure applied to main clauses.</p>
          <p>Table 3 shows the rank of individual languages with
regard to the word order freedom calculated according to all
measures mentioned above. It was calculated on main
clauses with “full” structure, i.e., main clauses
containing both a subject and (exactly one) object. Although the
rank according to each individual measure differs (with the
exception of M2 and M3 which provide, not surprisingly,
an identical rank), the highest rank always
belongs to the two classical languages, Latin and Ancient
Greek, closely followed by three Slavic languages
(Slovak, Slovenian and Czech) and German. The languages
with the most fixed word order are, according to all
measures, English, Japanese, Estonian and Hindi.</p>
          <p>When comparing the two tables, we may notice some
substantial differences in the word order freedom rank for
main and subordinated clauses. We may identify two
distinctive groups of languages which exhibit a relatively large
rank shift. The languages with a substantially higher degree
of word order freedom in subordinated clauses are
Arabic, Catalan and Estonian. The languages with the exactly
opposite property are Bengali, German and Dutch. In the case of
Dutch we may recall the famous examples of phenomena
exceeding the expressive power of context-free languages,
namely subordinated clauses such as ...dat Jan Piet de
kinderen zag helpen zwemmen (...that Jan saw Piet help
the children swim), where Dutch syntax requires a very
strict order of words. Also in German, the word order in
subordinated clauses follows much stricter rules than in
main ones. In this respect, the results obtained through
our experiment correlate with the syntactic rules of the
language.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Final Remarks and Conclusion</title>
        <p>Although the results presented in this paper support to
a relatively great extent the intuitive comprehension of
the notion of word order freedom of “big” European
languages, there are at least two aspects of our experiment
which are, in our opinion, quite interesting. The
first one is the fact that our experiment is based solely
on data publicly available in syntactically annotated
corpora. Thanks to this fact the experiment does not require
knowledge of, or even familiarity with, all the
languages under investigation. On the other hand, some of
the corpora contained in the HamleDT set are too small to
constitute a reliable source of information about the
properties of a given language. However, this obstacle can be
easily overcome in the future with the growing size and
number of treebanks available under a common annotation
scheme.</p>
        <p>The second interesting aspect is the comparison of the
measures, which give in principle very similar results and thus
support the claim that the phenomenon of word order
freedom may be quantified by practically any reasonably
selected measure. In other words, it is not necessary to
develop any specialized measures just for this particular
purpose; it is enough to use well-known ones, such
as the Euclidean distance or entropy.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Grant support</title>
      <p>The work on this project was partially supported by
the LINDAT/CLARIN project of the Ministry of
Education, Youth and Sports of the Czech Republic (project
LM2015071).</p>
      <p>This work was also supported by the project LO1506 of
the Czech Ministry of Education, Youth and Sports and by
Grant No. SGS-2016-018 Data and Software Engineering
for Advanced Applications.</p>
      <p>This work has been using language resources and tools
developed and/or stored and/or distributed by the
LINDAT/CLARIN project of the Ministry of Education, Youth
and Sports of the Czech Republic (project LM2015071).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Saussure</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          : Course in General Linguistics. Open Court, La Salle,
          Illinois
          (
          <year>1983</year>
          )
          (prepared by C. Bally and A. Sechehaye, translated by R. Harris).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Sapir</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Language. An Introduction to the Study of Speech</article-title>
          . Harcourt, Brace and company, New York (
          <year>1921</year>
          )
          (http://www.gutenberg.org/files/12629/12629-h/12629-h.htm).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Skalička</surname>
          </string-name>
          , V.:
          <article-title>Vývoj jazyka</article-title>
          .
          <source>Soubor statí. Státní pedagogické nakladatelství</source>
          ,
          <source>Praha</source>
          (
          <year>1960</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Dryer</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haspelmath</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The World Atlas of Language Structures Online</article-title>
          . Max Planck Institute for Evolutionary Anthropology, Leipzig (
          <year>2005</year>
          -2013) Available online at http://wals.info, accessed on 2015-06-28.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Futrell</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahowald</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gibson</surname>
          </string-name>
          , E.:
          <article-title>Quantifying Word Order Freedom in Dependency Corpora</article-title>
          .
          <source>In: Proceedings of the International Conference on Dependency Linguistics (Depling</source>
          <year>2015</year>
          ), Uppsala, Sweden, Uppsala University (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>de Marneffe</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>MacCartney</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
          </string-name>
          , C.D.:
          <article-title>Generating typed dependency parses from phrase structure parses</article-title>
          .
          <source>In: Proceedings of LREC</source>
          <year>2006</year>
          .
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>de Marneffe</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>The Stanford typed dependencies representation</article-title>
          .
          <source>In: COLING Workshop on Cross-framework and Cross-domain Parser Evaluation</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>de Marneffe</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dozat</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silveira</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haverinen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ginter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nivre</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : Universal Stanford Dependencies:
          <article-title>A cross-linguistic typology</article-title>
          .
          <source>In: Proceedings of LREC</source>
          <year>2014</year>
          .
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nivre</surname>
          </string-name>
          , J.:
          <article-title>Characterizing the errors of datadriven dependency parsing models</article-title>
          .
          <source>In: Proceedings of EMNLP-CoNLL</source>
          <year>2007</year>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Nivre</surname>
            , J., de Marneffe,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ginter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajič</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pyysalo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silveira</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsarfaty</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Universal dependencies v1: A multilingual treebank collection</article-title>
          .
          <source>In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)</source>
          , Portorož, Slovenia, European Language Resources Association (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Zeman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dušek</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mareček</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramasamy</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Štěpánek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Žabokrtský</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajič</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>HamleDT: Harmonized multi-language dependency treebank</article-title>
          .
          <source>Language Resources and Evaluation</source>
          <volume>48</volume>
          (
          <year>2014</year>
          )
          <fpage>601</fpage>
          -
          <lpage>637</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Kuboň</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopatková</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mírovský</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Analysis of Word Order in Multiple Treebanks</article-title>
          .
          <source>In: Proceedings of CICLing 2016</source>
          . LNCS, Berlin / Heidelberg, Springer-Verlag (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Lopatková</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuboň</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Free or Fixed Word Order: What can Treebanks Reveal?</article-title>
          In Yaghob, J., ed.:
          <source>Information Technologies - Applications and Theory</source>
          , Prague, Charles University in Prague (
          <year>2015</year>
          )
          <fpage>23</fpage>
          -
          <lpage>29</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Kuboň</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopatková</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Word-order analysis based upon treebank data</article-title>
          . In
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galicia-Haro</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , eds.:
          <source>MICAI 2015: Advances in Artificial Intelligence and Soft Computing, Part I</source>
          . Volume
          <volume>9413</volume>
          ., Berlin / Heidelberg, Springer (
          <year>2015</year>
          )
          <fpage>47</fpage>
          -
          <lpage>58</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Bejček</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajičová</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajič</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jínová</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kettnerová</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolářová</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikulová</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mírovský</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nedoluzhko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panevová</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poláková</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ševčíková</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Štěpánek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zikánová</surname>
            ,
            <given-names>Š.</given-names>
          </string-name>
          :
          <source>Prague Dependency Treebank 3.0</source>
          . Charles University in Prague, MFF, ÚFAL, Prague (
          <year>2013</year>
          ) (http://ufal.mff.cuni.cz/pdt3.0/).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Pajas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Štěpánek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>System for Querying Syntactically Annotated Corpora</article-title>
          .
          <source>In: Proceedings of the ACL-IJCNLP 2009 Software Demonstrations</source>
          , Suntec, Singapore, Association for Computational Linguistics (
          <year>2009</year>
          )
          <fpage>33</fpage>
          -
          <lpage>36</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>