<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>FicTree: a Manually Annotated Treebank of Czech Fiction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tomáš Jelínek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University, Faculty of Arts</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>1885</volume>
      <fpage>181</fpage>
      <lpage>185</lpage>
      <abstract>
        <p>We present a manually annotated treebank of Czech fiction, intended to serve as an addendum to the Prague Dependency Treebank (PDT). With only 166,000 tokens, the treebank is not by itself a good basis for training NLP tools, but added to the PDT training data, it can help improve the annotation of fiction texts. We describe the composition of the corpus and the annotation process, including inter-annotator agreement. On the newly created data and the PDT data, we performed a number of experiments with parsers (TurboParser, Parsito, MSTParser and MaltParser). We observe that extending the PDT training data with a part of the new treebank does improve the results of parsing literary texts. We also investigate cases where the parsers agree on an annotation different from the manual one.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The Czech National Corpus (CNC) has decided to enrich the annotation of some of its large synchronic corpora with syntactic annotation, using the formalism of the Prague Dependency Treebank (PDT) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The parsers used for syntactic annotation must be trained on manually annotated data, and currently only the PDT data are available. To achieve reliable parsing, the training data must be as close as possible to the target texts; in the PDT, however, the texts are exclusively journalistic, while one third of the texts in the CNC's representative corpora of synchronic written Czech belongs to the fiction genre. Fiction differs considerably from journalistic texts in many ways, for example in a significantly lower proportion of nouns relative to verbs: in the journalistic genre, 33.8% of tokens are nouns and 16.0% are verbs, whereas in fiction the ratio of nouns to verbs is almost equal, with 24.3% of tokens being nouns and 21.2% verbs (based on statistics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] from the
SYN2005 corpus [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]).
      </p>
      <p>Therefore, a new manually annotated treebank of fiction texts was created, annotated according to the PDT a-layer guidelines. The new treebank amounts to only about 11% of the PDT data, due to the difficulty of manual syntactic annotation, but even so, this new resource does improve the parsing of fiction texts. In this article we present the new treebank, named FicTree (Treebank of Czech fiction), its composition, and the annotation process. We describe the first experiments with parsers based on the FicTree and PDT data. In the FicTree data parsed by four parsers, we investigate cases where all parsers agree on a syntactic annotation of a token which differs from the manual annotation.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Composition of the Treebank</title>
      <p>
        The manually annotated treebank FicTree is composed of eight texts and longer text fragments from the fiction genre, published in Czech between 1991 and 2007, with a total of 166,437 tokens in 12,860 sentences. It is annotated according to the PDT a-layer annotation guidelines [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. For comparison, the PDT data annotated on the analytical layer comprise 1,503,739 tokens in 87,913 sentences. Seven of the eight texts which compose the FicTree treebank were included in the CNC corpus SYN2010 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] (the eighth one was originally intended to be included in the SYN2010 corpus too, but was removed in the balancing process). The size of the eight texts ranges from 4,000 to 32,000 tokens, with an average of 20,800 tokens. Most of the texts were originally written in Czech (80%); the remaining 20% are translations (from German and Slovak). Most of the texts belong to the fiction genre without any subgenre (according to the classification of the CNC); one large text (18.2% of all tokens) belongs to the subclass of memoirs, and 5.9% of tokens come from texts for children and youth.
      </p>
      <p>The language data included in the PDT and in FicTree differ in many characteristics, in a way similar to the differences between the journalism and fiction genres as a whole described above. Sentences in FicTree are significantly shorter, with an average of 12.9 tokens per sentence compared to an average of 17.1 tokens per sentence in the PDT. The part-of-speech ratios also differ significantly, as shown in Table 1.</p>
      <p>It is evident from the table that there is a significantly lower proportion of nouns, adjectives and numerals in FicTree, and a higher proportion of verbs, pronouns and adverbs, which corresponds to the assumption that fiction prefers verbal expressions, whereas journalism tends to use more nominal expressions.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Annotation Procedure</title>
      <p>
        The FicTree treebank was syntactically annotated according to the formalism of the analytical layer of the Prague Dependency Treebank. The texts were lemmatized and morphologically annotated using a hybrid system combining rule-based disambiguation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and the stochastic tagger Featurama (see http://sourceforge.net/projects/featurama). The texts were then parsed twice, using two parsers: MSTParser [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and MaltParser [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] (the parsing
took place several years ago when better parsers such as
TurboParser [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] were not yet available), both trained on the PDT a-layer training data. The difference between the algorithms of the two parsers ensured that errors were distributed differently in the two versions of the texts, so it can be assumed that the errors remaining after the subsequent manual corrections will not be identical. According to Berzak et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], some deviations are nevertheless likely to be common to both parsers and will also manifest themselves in the final (manual) annotation, but this distortion of the data could not be avoided.
      </p>
      <sec id="sec-3-1">
        <title>3.1 Manual Correction of Parsing Results</title>
        <p>The automatically annotated data were then distributed to three annotators, who checked and corrected the data sentence by sentence using the TrEd software for manual treebank editing. The two versions of the parsed text (one parsed by MSTParser, the other by MaltParser) were always assigned to two different annotators, and we ensured that the combinations of parsers and annotators were varied. The data were divided into 163 text parts of approx. 1,000 tokens, and every combination of parser and annotator occurred in at least 10 text parts (the proportions of text corrected by the individual annotators were 26%, 35% and 39%).</p>
        <p>The task of the manual annotators was to correct the syntactic structure and syntactic labels, but they could also suggest corrections of segmentation, tokenization, morphological annotation and lemmatization.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Adjudication</title>
        <p>The two corrected versions of the syntactic annotation of each text were merged, and the resulting doubly annotated texts were examined by an experienced annotator (adjudicator), who decided which of the proposed annotations to accept. The adjudicator was not limited to the two manually corrected versions; she was allowed to choose another solution consistent with the PDT annotation manual and data. Some changes in tokenization and segmentation were also performed (159 cases, mainly sentence splits or merges). The adjudication took approximately five years of work due to the difficulty of the task, the effort to maximize the consistency of the annotation of the same phenomena across the treebank (and their accordance with the PDT data), and other work with a higher priority.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Accuracy of the Parsing and of the Manual</title>
      </sec>
      <sec id="sec-3-4">
        <title>Corrections</title>
        <p>In the following two tables, we present the accuracy of each annotation step and the inter-annotator agreement. Table 2 shows to what extent the automatically parsed and the manually corrected versions of the text agree with the final syntactic annotation, first for the texts annotated with MSTParser, then for those annotated with MaltParser. Two measures of agreement with the final annotation are shown: UAS (unlabeled attachment score, i.e. the proportion of tokens with a correct head) and LAS (labeled attachment score, i.e. the proportion of tokens with a correct head and dependency label).</p>
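        <p>To make these two measures concrete, the following minimal sketch computes UAS and LAS for two analyses of the same sentence. It is an illustration only: the (head, label) data layout and the function name are our assumptions, not part of the annotation tooling used for FicTree.</p>
        <preformat>
# Minimal sketch: UAS and LAS for two analyses of the same tokens.
# Each analysis is a list of (head, label) pairs, one pair per token
# (an assumed layout, chosen for illustration).

def attachment_scores(gold, pred):
    """Return (UAS, LAS) as fractions of tokens."""
    assert len(gold) == len(pred)
    n = len(gold)
    uas = sum(g_head == p_head
              for (g_head, _), (p_head, _) in zip(gold, pred))
    las = sum(g == p for g, p in zip(gold, pred))
    return uas / n, las / n

# Example: three tokens, all heads correct, one label wrong.
gold = [(2, "Atr"), (0, "Pred"), (2, "Obj")]
pred = [(2, "Atr"), (0, "Pred"), (2, "Adv")]
print(attachment_scores(gold, pred))  # (1.0, 0.666...)
        </preformat>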
        <p>It is clear from the table that, due to the relatively low quality of the input parsing, the annotators had to carry out a large number of manual interventions in the correction process: the dependencies or labels were modified for 15–20% of tokens. The manually corrected versions differ much less from the final annotation; the disagreement concerns approx. 5% of the tokens.</p>
        <p>Table 3 presents the agreement between the two
automatically parsed versions and the inter-annotator
agreement (the agreement between the two manually corrected
versions). As in the previous table, we use the measures
UAS and LAS.</p>
        <p>The table shows that the agreement between the automatically annotated versions is very similar to the agreement between the final annotation and the worse of the two parsing results. After the manual corrections, the agreement between the two versions of the texts increased considerably, but the remaining disagreement between them is approximately twice the difference between each of the manually corrected versions and the final syntactic annotation. This shows that the final annotation drew alternately on the solutions from both versions: if each corrected version disagrees with the final annotation on about 5% of tokens and these disagreements are largely disjoint, the two corrected versions disagree with each other on up to about 10% of tokens.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Parsing Experiments</title>
      <p>
        We conducted a series of experiments on the PDT and FicTree data. All data were automatically lemmatized and morphologically tagged using the MorphoDiTa tagger [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We used four parsers: two parsers of an older generation, which had been used for the automatic annotation of the FicTree data (before the manual corrections, with a different morphological annotation and with other settings providing better parsing accuracy): MSTParser [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and MaltParser [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]; and
two newer parsers: TurboParser [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and Parsito [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We use three measures: UAS (unlabeled attachment score), LAS (labeled attachment score) and SENT (labeled attachment score for whole sentences, i.e. the proportion of sentences in which all tokens have correct heads and syntactic labels).
      </p>
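      <p>As an illustration, the SENT measure can be sketched analogously to the UAS/LAS sketch above; the per-sentence lists of (head, label) pairs and the function name are again assumptions for illustration.</p>
      <preformat>
# Sketch of SENT: the fraction of sentences in which every token has
# both the correct head and the correct syntactic label.

def sent_score(gold_sents, pred_sents):
    correct = sum(g == p for g, p in zip(gold_sents, pred_sents))
    return correct / len(gold_sents)
      </preformat>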
      <sec id="sec-4-1">
        <title>4.1 Training on the PDT Data</title>
        <p>The first experiment was to compare the parsing of the PDT test data (journalism) and the whole FicTree data (fiction) using parsers trained on the PDT training data (journalism). The results of the experiment are shown in Table 4; its two columns compare the results on the PDT etest and on the whole FicTree data.</p>
        <p>The UAS and LAS scores of all parsers are approximately 2% worse for FicTree than for the PDT, probably due to the genre differences between the FicTree and PDT data. In the case of SENT, the FicTree scores are comparable to or better than those on the PDT etest, probably because the sentences in FicTree are significantly shorter, so a higher percentage of sentences is parsed completely correctly.</p>
        <p>In the second experiment, we split the FicTree data into training data (90%) and test data (10%) and combined the FicTree training data with the PDT training data. This experiment was repeated three times with different splits of the FicTree data in order to achieve a more reliable result (10% of FicTree is only about 16,000 tokens). In this way, 30% of FicTree has effectively been used as test data, the parsers being trained each time on the PDT training data plus 90% of FicTree. It would have been better to use the whole FicTree data in a 10-fold cross-validation experiment (always adding 90% of the data to the PDT training data and testing on the remaining 10%), but we lacked the time and computational resources to do so. Table 5 compares the results of parsers trained on the PDT training data alone and on the merged data (train+ in the table), using the PDT etest data and the FicTree test data. For each of the measures (UAS, LAS, SENT), the accuracy of the parser trained on the PDT training data is shown in one column, and the accuracy of the parser trained on the combined training data (PDT and FicTree, train+) in the following column; the average over the three runs is reported. For the PDT etest data, a slight improvement is consistent for all parsers and measures except for the SENT measure for the MSTParser.</p>
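        <p>One way to realize the three 90/10 splits described above is sketched below; the document-level lists and the train_and_eval helper are assumptions for illustration, not the scripts actually used.</p>
        <preformat>
# Sketch of the second experiment: three disjoint 10% slices of
# FicTree serve as test data; each run trains on the PDT training
# data plus the remaining 90% of FicTree ("train+" in Table 5).

def three_split_experiment(pdt_train, fictree, train_and_eval):
    tenth = len(fictree) // 10
    scores = []
    for run in range(3):
        lo, hi = run * tenth, (run + 1) * tenth
        test = fictree[lo:hi]
        train_plus = pdt_train + fictree[:lo] + fictree[hi:]
        scores.append(train_and_eval(train_plus, test))
    return sum(scores) / len(scores)  # average over the three runs
        </preformat>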
        <p>For the FicTree test data, we note a significant improvement in parsing; the increase in the measures is between 0.4% and 2.5%. It is therefore clear that for the syntactic annotation of fiction texts, extending the training data with the FicTree training data is definitely beneficial.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>The Agreement of Parsers versus the</title>
    </sec>
    <sec id="sec-6">
      <title>Manual Annotation</title>
      <p>We also attempted to use the parsing results to assess the quality of the manual annotation and adjudication of the FicTree treebank. The whole FicTree data was annotated by four parsers trained on the PDT training data. From the parsed data, we chose those cases where all four parsers agree on the dependency relation and/or syntactic function of a token while the manual syntactic annotation differs. In total, the parsers agreed with each other on 70.04% of tokens in the FicTree data (78.12% if we count only dependencies, without syntactic labels); for 5.17% of all tokens, this agreed annotation does not match the manual one (3.43% of tokens if syntactic labels are disregarded). Table 6 shows the 10 syntactic functions which occur most frequently in such cases of agreement between the four parsers and disagreement with the manual annotation. The first column shows the syntactic label from the manual annotation, the second column gives the proportion of disagreements among the tokens with this syntactic label, and the third column gives the absolute number of occurrences.</p>
      <p>The data in the table show that differences between the parsers and the manual annotation often occur with the Adv and Obj syntactic labels (adverbial and object), where the annotation produced by the parsers differs from the manual one due to the difficulty of the underlying linguistic phenomena. Frequent differences between parsing results and manual annotation are discussed in more detail below; we first give two examples of such differences and their supposed reasons.</p>
      <p>The first example, the sentence fragment pohledy plné bezměrné důvěry ‘regards full of unbounded trust’ displayed below, shows a typical wrong parsing result caused by incorrect morphological annotation; the parsers agree on an erroneous interpretation of the syntactic structure. After the tokens where dependencies or syntactic labels differ, we show the annotation (the numbers indicate relative differences: –1 means that the governing node is positioned 1 token to the left, +2 that it is 2 tokens to the right; syntactic labels are shown if they differ):</p>
      <p>Pohledy plné/–1/+2 bezměrné důvěry/Obj/–2/Atr/–3
‘Regards full of unbounded trust’</p>
      <p>Incorrect morphological tagging of the ambiguous form plné ‘full’ (which can formally agree in number, gender and case both with the preceding noun pohledy ‘regards’ and with the following noun důvěry ‘trust’) led the parsers to ignore the valency characteristics of the adjective plný ‘full’: they consider it an attribute of the following noun důvěry ‘trust’, which they in turn interpret as a nominal attribute of the preceding noun pohledy ‘regards’. The manual annotation is correct: the adjective plný ‘full’ depends on the preceding noun pohledy ‘regards’, and the following noun důvěry ‘trust’ is an object of the adjective. Similar differences in the assignment of the Adv and Obj syntactic labels and their dependency relations are common, and the manual annotation is correct in most cases (the parsers agree on an erroneous syntactic structure).</p>
      <p>In some cases, it is unclear whether the manual annotation or the parsing result is correct, as in the following sentence:</p>
      <p>Doktorka/+6/+1 vychutnávala chvíli efekt svých slov a pak pokračovala:
‘The doctor enjoyed for a while the effect of her words, and then went on:’</p>
      <p>In the manual annotation, the head of the subject Doktorka ‘doctor’ is the coordinating conjunction a ‘and’, which coordinates the two verbs representing the two clauses: vychutnávala ‘enjoyed’ and pokračovala ‘went on/continued’. The subject is treated as a sentence member modifying the whole coordination (i.e. both verbs). However, all parsers agree on a different head: the verb vychutnávala ‘enjoyed’, the one closest to the subject. In this interpretation, the second verb has a null subject (pro-drop). Both interpretations are possible in the PDT formalism; there is no strict rule indicating when the subject should modify the coordinated verbs and when it should depend on the closest verb only. In the PDT data, both solutions are used. (The more similar and simple the structures of the coordinated clauses are, the more likely it is that the subject is common to both.)</p>
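      <p>The relative-offset notation used in the examples above can be derived from 1-based absolute head indices, as in the following small sketch (the index layout is an assumption):</p>
      <preformat>
# Convert 1-based absolute head indices (0 = root) to the relative
# offsets used above: -1 = governor one token to the left, +2 = two
# tokens to the right; the root keeps 0.

def relative_heads(heads):
    return [h - (i + 1) if h != 0 else 0 for i, h in enumerate(heads)]

# "Pohledy plné bezměrné důvěry", manual annotation: plné depends on
# pohledy (-1), důvěry on plné (-2).
print(relative_heads([0, 1, 4, 2]))  # [0, -1, 1, -2]
      </preformat>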
      <sec id="sec-6-1">
        <title>Most Frequent Discrepancies between Parsing</title>
      </sec>
      <sec id="sec-6-2">
        <title>Results and Manual Annotation</title>
        <p>In cases where the manually assigned dependency differs from the one on which the parsers agree, the syntactic labels are usually the same. These labels are mostly auxiliary functions: AuxV (auxiliary verbs), AuxP (prepositions) and AuxC (conjunctions), or they are related to punctuation (AuxX, AuxK, AuxG). When the syntactic labels differ, the most frequent mismatches are Obj versus Adv, Sb versus Obj, and Adv versus Atr.</p>
        <p>The highest proportion of discrepancies between the manually and automatically assigned functions concerns the following functions: AuxO (46.5%), AuxR (21.9%), AuxY (15.9%), ExD (14.0%) and Atv (13.5%). AuxO and AuxR refer to two possible syntactic functions of the reflexive particles se/si ‘myself, yourself, herself…’, determined by context; their correct parsing would require an understanding of semantics and the use of a lexicon. The AuxY function covers particles and other auxiliary expressions; ExD covers several different phenomena in the PDT formalism and is difficult to parse automatically. None of these functions occurs frequently in the training data.</p>
      </sec>
      <sec id="sec-6-3">
        <title>5.3 Manual Analysis</title>
        <p>When we manually analyzed a sample of sentences in which the four parsers agree on a dependency or syntactic label different from the one chosen manually, we found that in 75% of the cases the manual annotation was certainly correct, about 20% of the occurrences could not be decided quickly due to the complexity of the construction, and in less than 5% of the occurrences the manual annotation was incorrect. It would certainly be useful to check all cases of such discrepancies carefully, which might reduce the error rate in the FicTree data by about 0.2–0.5%, but for now we lack the resources to do so.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6 Conclusion</title>
      <p>The new manually annotated treebank of Czech fiction, FicTree, will allow for better syntactic annotation of fiction texts when added to the PDT training data. Given that larger training data were shown to be beneficial for parsing journalistic texts as well, its use may be broader. We plan to publish the FicTree treebank in the LINDAT/CLARIN repository in the near future (after additional checks of selected phenomena), and we would later like to publish it in the Universal Dependencies format as well, using publicly available conversion and verification tools.</p>
      <sec id="sec-7-1">
        <title>Acknowledgement</title>
        <p>This paper, the creation of the data, and the experiments on which the paper is based have been supported by the Ministry of Education of the Czech Republic through the project Czech National Corpus, no. LM2015044.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bartonˇ</surname>
          </string-name>
          , V. Cvrcˇek, F. Cˇermák, T. Jelínek, V. Petkevicˇ: “Statistiky cˇeštiny /Statistics of Czech”. NLN, Prague,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Berzak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barbu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korhonen</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          <article-title>Katz: “Bias and Agreement in Syntactic Annotations”</article-title>
          , in Computing Research Repository,
          <volume>1605</volume>
          .04481,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cˇermák</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
            Doležalová-Spoustová,
            <given-names>J. Hlavácˇová</given-names>
          </string-name>
          , M. Hnátková,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jelínek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kocek</surname>
          </string-name>
          , M. Koprˇivová, M. Krˇen,
          <string-name>
            <given-names>R.</given-names>
            <surname>Novotná</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Petkevicˇ</surname>
          </string-name>
          , V. Schmiedtová,
          <string-name>
            <given-names>H.</given-names>
            <surname>Skoumalová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Velíšek</surname>
          </string-name>
          <article-title>: “SYN2005: a balanced corpus of written Czech”</article-title>
          .
          <source>Institute of the Czech National Corpus</source>
          , Prague,
          <year>2005</year>
          . Available on-line: http://www.korpus.cz.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>J. Hajicˇ</surname>
          </string-name>
          : “Complex Corpus Annotation: The Prague Dependency Treebank,” in Šimková M. (ed.):
          <article-title>Insight into the Slovak and Czech Corpus Linguistics</article-title>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>73</lpage>
          . Veda, Bratislava, Slovakia,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. Hajicˇ</given-names>
            , J.
            <surname>Panevová</surname>
          </string-name>
          , E. Buránˇová,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Urešová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bémová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Štepánek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pajas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kárník</surname>
          </string-name>
          <article-title>: “A manual for analytic layer tagging of the prague dependency treebank</article-title>
          .
          <source>” ÚFAL Internal Report</source>
          , Prague,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Jelínek</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          <article-title>Petkevicˇ: “Systém jazykového zna cˇkování soucˇasné psané</article-title>
          cˇeštiny,” in Cˇermák F. (ed.):
          <source>Korpusová lingvistika Praha</source>
          <year>2011</year>
          , vol.
          <volume>3</volume>
          : Gramatika a znacˇkování korpus˚u, pp.
          <fpage>154</fpage>
          -
          <lpage>170</lpage>
          . NLN, Prague,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Krˇen</surname>
          </string-name>
          , T. Bartonˇ, V. Cvrcˇek, M. Hnátková,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jelínek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kocek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Novotná</surname>
          </string-name>
          , V. Petkevicˇ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Procházka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Schmiedtová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Skoumalová</surname>
          </string-name>
          <article-title>: “SYN2010: a balanced corpus of written Czech”</article-title>
          .
          <source>Institute of the Czech National Corpus</source>
          , Prague,
          <year>2010</year>
          . Available on-line: http://www.korpus.cz.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.F.T.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.B.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Smith</surname>
          </string-name>
          <article-title>: “Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers</article-title>
          ,”
          <source>in Proceedings of ACL</source>
          <year>2013</year>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>McDonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ribarov</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Hajicˇ: “Nonprojective Dependency Parsing using Spanning Tree Algorithms</article-title>
          ,” in
          <source>Proceedings of EMNLP</source>
          <year>2005</year>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Nivre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hall</surname>
          </string-name>
          , J. Nilsson: “
          <article-title>MaltParser: A Data-Driven Parser-Generator for Dependency Parsing</article-title>
          ,”
          <source>in Proceedings of LREC</source>
          <year>2006</year>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Straka</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Hajicˇ</surname>
          </string-name>
          , J. Straková,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Hajicˇ jr.: “Parsing Universal Dependency Treebanks using Neural Networks</article-title>
          and
          <string-name>
            <surname>Search-Based</surname>
            <given-names>Oracle</given-names>
          </string-name>
          ,”
          <source>in Proceedings of TLT</source>
          <year>2015</year>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Straková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Straka</surname>
          </string-name>
          , J. Hajicˇ: “
          <article-title>Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition,”</article-title>
          <source>in Proceedings of ACL</source>
          <year>2014</year>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>