<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Old School vs. New School: Comparing Transition-Based Parsers with and without Neural Network Enhancement</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Miryam de Lhoneux</string-name>
          <email>miryam.de_lhoneux@lingfil.uu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Stymne</string-name>
          <email>sara.stymne@lingfil.uu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joakim Nivre</string-name>
          <email>joakim.nivre@lingfil.uu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Linguistics and Philology, Uppsala University</institution>
        </aff>
      </contrib-group>
      <fpage>99</fpage>
      <lpage>110</lpage>
      <abstract>
        <p>In this paper, we attempt a comparison between "new school" transition-based parsers that use neural networks and their classical "old school" counterparts. We carry out experiments on treebanks from the Universal Dependencies project. To facilitate the comparison and analysis of results, we work only on a subset of those treebanks, which we select carefully in the hope of obtaining results that are representative of the whole set. We select two parsers that are hopefully representative of the two schools, MaltParser and UDPipe, and we look at the impact of training size on the two models. We hypothesize that neural network enhanced models have a steeper learning curve with increased training size. We observe, however, that, contrary to expectations, neural network enhanced models need only a small amount of training data to outperform the classical models, and that the learning curves of both models increase at a similar pace after that. We carry out an error analysis on the development sets parsed by the two systems and observe that overall MaltParser suffers more than UDPipe from longer dependencies. We observe that MaltParser is only marginally better than UDPipe on a restricted set of short dependencies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Treebanks have recently been released for a large number of languages in a
consistent annotation within the framework of the Universal Dependencies (UD) project
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In line with work on dependency parsing following the CoNLL shared task
from 2006 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and in contrast with most work on constituency parsing, which has
focused on one language and one domain (the English Penn Treebank), this project
may help reshape the field of syntactic parsing by using a wide variety of languages
and domains. In parallel with the development of this project, syntactic parsing
has seen a significant boost in accuracy in the last couple of years with methods
that make use of neural networks to learn dense vectors for words, POS tags and
dependency relations [
        <xref ref-type="bibr" rid="ref1 ref18 ref4">4, 18, 1</xref>
        ] or stacks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Motivated by the success of neural network models on the Penn Treebank (PTB) and the
Chinese treebank, Straka et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] trained Parsito, a neural network model, on UD
treebanks and obtained good results, improving over their ‘classical’ counterpart
MaltParser [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. There was, however, no systematic comparison between the
classical and the neural network approach, and it may be that one approach is more
suitable than the other in specific settings. Moreover, the results they reported for
MaltParser were obtained using default settings, whereas results can be much higher with
optimised settings.
      </p>
      <p>In this paper, we compare the performance of these two types of
parsers on UD treebanks. We first select a sample of the UD treebanks
to ease the comparison between parsing models in general.</p>
    </sec>
    <sec id="sec-2">
      <title>Treebank Sampling for Comparative Parser Evaluation</title>
      <p>Having a large and varied data set has the advantage that our parsing models will
generalize better. A disadvantage is that it is expensive to train models for all the
languages, especially with the new neural network models that need a search over a
large hyperparameter space in order to be optimised. As UD grows, it may become
increasingly prohibitive to train models for all the languages whenever we want to
evaluate how a parser compares with another parser or with a modified
version of itself.</p>
      <p>We therefore suggest that it might be wise to examine the behavior of parsing models in a
small-scale setting before training them on the large number of treebanks in UD. Parsing
models can first be evaluated on a small sample of UD treebanks. Subsequently,
depending on the observations on the small set, we can move to a medium sample
before finally testing on all the treebanks if the evidence points in a clear
direction. We have come up with a set of criteria to select the small sample, to which we
now turn.</p>
      <p>The objective was to have a sample as representative of the whole treebank set
as possible. To ensure typological variety, we divided UD languages into
coarse-grained and fine-grained language families. This led to a total of 15 different
fine-grained families and 8 coarse-grained ones. We made it a requirement not to
select two languages from the same fine-grained family and ensured some
variety in coarse-grained families. We made sure to have at least one isolating,
one morphologically rich and one inflecting language. We additionally ensured
variability in treebank sizes and domains. Since parsing non-projective trees is
notoriously harder than parsing projective trees, we also made sure to have at least
one treebank with a large number of non-projective trees. The quality of treebanks
was also considered in the selection; in particular, there are known issues with
inconsistency in the annotation. We selected languages that had as few of those
issues as possible. To ensure comparability, we finally made sure to select only treebanks
with morphological features (with one exception for Kazakh). This resulted in
a selection of 8 treebanks: Czech, Chinese, Finnish, English, Ancient_Greek-PROIEL,
Kazakh, Tamil and Hebrew. The selection is given in Table 1 together with the main
arguments for the inclusion of each.</p>
    </sec>
    <sec id="sec-3">
      <title>Comparing Parsing Accuracy</title>
      <p>
        As said in the introduction, a systematic comparison of classical models for
transition-based parsing with models helped by neural network training (henceforth
NN parsers) is lacking. With the selection of treebanks just presented, we now
compare those two model types. More specifically, we compare MaltParser with
Parsito. As was also said, in earlier comparisons with NN parsers, MaltParser results
were reported using default settings, but results can be much better with optimised
settings. For this reason, we optimised models with the help of MaltOptimizer [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
In an attempt to keep the models somewhat similar across languages, we chose
to use the same parsing algorithm for all the languages and hence only optimised
options specific to the data set, such as the root label (MaltOptimizer’s
phase 1), and the feature model (MaltOptimizer’s phase 3). We selected the
arc-standard swap system with a lazy oracle for its ability to deal with non-projectivity,
which was crucial at least for Ancient Greek. Additionally, it was one of the
algorithms most often suggested by MaltOptimizer for our selection of languages, together
with its projective version.
      </p>
      <p>For Parsito, we used the pretrained models that the authors made available.
They are trained on UD version 1.2, but we tested them on version 1.3 since that
is the version used for the other parsers. We additionally optimised models for the
languages for which there was no pretrained model available, as well as for Tamil,
because the results were too low on version 1.3 compared to version 1.2
(probably due to significant differences between the two versions). In order to optimise those
models, we first made sure that we could reproduce the results of the pretrained
models on one language (Hebrew): we experimented with the transition system
and oracle and used the random hyperparameter search provided by UDPipe to
tune the hyperparameters.</p>
      <p>
        We add the best reported results for SyntaxNet [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] because they are currently, on
average, the best reported results for UD. Parsito and SyntaxNet are both
transition-based parsers that use neural networks to learn vector representations of words,
POS tags and dependency relations in a similar way to Chen and Manning [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Parsito adds a search-based oracle and SyntaxNet adds beam search and global
normalization.
      </p>
      <p>
        Parsito has been integrated into the recent UDPipe [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] parsing pipeline that
performs tokenisation, morphological analysis and POS tagging. Beam search was
added to Parsito in the version used in UDPipe. We will refer to that parser as the
UDPipe parser in the remainder of this paper. UDPipe taggers and morphological
analysers were trained for all treebanks and both MaltParser and the UDPipe parser
were tested on the test sets that used those models to predict POS tags (universal
and language specific) as well as morphological features. Note that SyntaxNet
results are not directly comparable as they use their own POS tagger and
morphological analyser.
      </p>
      <p>Labeled Attachment Scores (LAS) are given in Table 2. As appears from the
table, UDPipe largely outperforms MaltParser. However, MaltParser performs
better than UDPipe on very small data sets. SyntaxNet is even better than UDPipe in
most cases but is also outperformed by MaltParser on very small treebanks.</p>
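      <p>As a reminder of what Table 2 measures, the following minimal sketch computes LAS as the percentage of tokens that receive both the correct head and the correct dependency label. The function name and the representation of a parse as a list of (head, label) pairs are assumptions made for illustration; this is not the evaluation script used for the experiments, and details such as the treatment of punctuation are left out.</p>
      <preformat><![CDATA[
```python
def labeled_attachment_score(gold, pred):
    """Return LAS in percent for two aligned lists of (head, label) pairs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted parses must cover the same tokens")
    if not gold:
        return 0.0
    # A token counts as correct only if head AND label both match.
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)


# Toy example: three tokens, the last one has the right head but the wrong label.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]
score = labeled_attachment_score(gold, pred)
```
]]></preformat>
      <p>In the toy example, two of the three tokens are fully correct, so the third token lowers LAS even though its head is right.</p>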
    </sec>
    <sec id="sec-4">
      <title>Impact of Training Size on Neural Network Parsing</title>
      <p>
        Looking at these results, we can hypothesize that the superior performance of the
NN parsers depends on having reasonably large training sets. We hypothesize that
neural network parsers improve more steeply with increased training size than
MaltParser does. We tested this hypothesis with a learning curve experiment. We
trained several parsers for our selection of languages, varying the size of the
training data. In order to prevent sequential effects that may come from the structure
of treebanks from impacting the results, we randomly shuffled the sentences of the
training set before splitting it into different sample sizes. We compared the effect
of doing so on MaltParser and UDPipe. We first split the training data sets
into one sample of 1K and samples of 50K. Additionally, to zoom in on what
happens with very small data sizes, we also ran the experiment with splits in steps of 1K from
1K to 15K. Because it would have been unreasonable to optimise models for each
split size, we used unoptimised versions of both MaltParser and UDPipe. For
MaltParser, we again used the arc-standard swap algorithm with a lazy oracle and an
extended feature model that uses morphological features, in the same way as was
done by Nivre [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The use of morphological features is important in order to have
a model that is comparable to UDPipe, but also because morphological features are
crucial for parsing morphologically rich languages like Finnish and Hebrew. For
UDPipe, we also used the swap transition system as well as a lazy oracle so that we
have a comparable system for both parsers. We used UDPipe’s default
hyperparameters, which the authors report to be the hyperparameters that worked best across their
experiments on UD treebanks.
      </p>
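      <p>The sampling step described above can be sketched as follows. This is only an illustration under stated assumptions: sample sizes are taken to be token counts, the samples are taken to be nested prefixes of a single shuffled training set, and the function name, the seed and the toy data are hypothetical rather than taken from the actual experimental setup.</p>
      <preformat><![CDATA[
```python
import random


def learning_curve_samples(sentences, sizes, seed=0):
    """Shuffle the sentences once, then cut one prefix per requested size.

    `sentences` is a list of sentences, each a list of tokens.
    `sizes` are target sizes in tokens (assumption); a sample stops growing
    as soon as it holds at least that many tokens.
    """
    shuffled = list(sentences)              # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # removes ordering effects of the treebank
    samples = {}
    for size in sorted(sizes):
        sample, n_tokens = [], 0
        for sent in shuffled:
            if n_tokens >= size:
                break
            sample.append(sent)
            n_tokens += len(sent)
        samples[size] = sample
    return samples


# Toy treebank: 100 sentences of 5 tokens each.
toy = [["w%d_%d" % (i, j) for j in range(5)] for i in range(100)]
samples = learning_curve_samples(toy, sizes=[10, 50])
```
]]></preformat>
      <p>Because every sample is a prefix of the same shuffled list, a larger sample always contains the smaller ones, which keeps the points of a learning curve comparable with each other.</p>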
      <p>As can be seen in Figures 1 and 2, the hypothesis that neural network parsers
improve more steeply than MaltParser with increased training size seems to hold
only to some extent. The learning curve for UDPipe is very steep for only the first
few thousand tokens but flattens out quite quickly after that and continues
improving at a similar pace to MaltParser. It is also interesting to note that MaltParser
does not even outperform UDPipe on all treebanks with a training size of 1K.</p>
    </sec>
    <sec id="sec-5">
      <title>Error Analysis</title>
      <sec id="sec-5-1">
        <title>Error Analysis Approaches</title>
        <p>Since training size does not seem to be the only factor explaining the variation in
results in the comparison of the parsers, we want to gain more insights into the
strengths and weaknesses of each parser.</p>
        <p>
          There are many ways of comparing different parsers which, as described in
Kirilin and Versley [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], can be placed on a coarse-grained to fine-grained scale where
the coarsest level just consists in comparing the attachment scores and the finest
level consists in manually looking at output parse trees of both systems. There are
many other possibilities in between these two levels. McDonald and Nivre [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]
look at the impact of different properties of trees on accuracy of a transition-based
and a graph-based parser. Goldberg and Elhadad [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] follow up on this and look
more in detail at the over- and underproductions of specific constructions of those
two systems, showing that it is possible to train a classifier that predicts which
system was used to parse some output data. Kummerfeld et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] also look at some
fine-grained phenomena such as PP and NP attachment and compare the behaviour
of many constituency parsers on those phenomena. Kirilin and Versley [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] also
look at fine-grained error patterns made by different parsers on UD treebanks.
        </p>
        <p>
          There has not been much work on characterizing the errors of neural network
parsers, at least the feedforward neural network ones. Nguyen et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] compared
the performance of two graph-based and two transition-based parsers, each pair of
which contains a neural network parser, on the Vietnamese treebank. The parsers
they used, however, use Recurrent Neural Network (RNN) models. Such a study
has not been done for feedforward neural network parsers, as far as we are aware.
Similarly, recent work has started investigating what neural network parsers learn
[
          <xref ref-type="bibr" rid="ref10 ref9">10, 9</xref>
          ], but these again use RNNs. For this reason, the approach by McDonald
and Nivre [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] seems particularly suited here to get an overview of the strengths
and weaknesses of feedforward neural network parsers compared to their classical
counterpart. It would be interesting to follow up on this study by looking at more
fine-grained phenomena.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>MaltParser vs UDPipe</title>
        <p>
          As just mentioned, McDonald and Nivre [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] conducted an extensive error
analysis on two parsers, MaltParser and MSTParser, coming from two different parsing
frameworks: the transition-based and the graph-based parsing framework
respectively. They analyse the effect of different properties of dependency trees on
accuracy, which they divide into two types: graph factors and linguistic factors. For
the first type, they look at the length of dependencies, the length of the sentence, and
a few other properties. For the second type, they break accuracy down by POS
tag and dependency relation. This allows them to observe a tradeoff between
the rich representation of MaltParser, which allows it to do well on frequent
dependencies, and the problem of error propagation, from which MaltParser suffers more
than MSTParser. They argue that these results can be explained by the properties
of the two parsers and the tradeoff between rich representation combined with local
greedy inference and less rich representation combined with global exact inference.
We attempted to carry out a similar study to compare the two parsers that we are
investigating in this paper.
        </p>
        <p>
          McDonald and Nivre [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] worked on the CoNLL shared-task data [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], that is,
data from 13 different languages. They concatenated the test sets parsed by the two
systems they compared and conducted their analysis on that concatenated test set.
They were able to do that because the test sets all had a very similar size. In this
study, we are working on the development sets. We could also have concatenated
those development sets, but their sizes vary by orders of magnitude (500 tokens to
150K tokens), so we unfortunately cannot do this. We could theoretically create
a balanced test set using only as many tokens as there are in the smallest
development set. The two smallest development sets are, however, very small (500 and
1200 tokens), which means that doing this would make the data set very small
and sparse. In order to reach a compromise between a perfectly balanced small data
set and a very unbalanced big data set, we concatenated a portion of each treebank
equal in size to the smallest development set excluding Tamil and
Kazakh (the Finnish one, which has 9K tokens). We added the full development
sets of Tamil and Kazakh, but we keep in mind the fact that they have a small impact
on the results.
        </p>
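        <p>The compromise described above can be sketched as follows. The text does not say how the portion of each treebank was chosen, so taking the first sentences of each development set up to the token cap is an assumption, as are the function name and the toy data.</p>
        <preformat><![CDATA[
```python
def balanced_concat(dev_sets, keep_whole):
    """Concatenate development sets with a common token cap.

    `dev_sets` maps a treebank name to a list of sentences (lists of tokens).
    Treebanks listed in `keep_whole` are added in full; every other treebank
    contributes at most as many tokens as the smallest treebank outside
    `keep_whole`. Taking a prefix of each set is an assumption of this sketch.
    """
    sizes = {name: sum(len(s) for s in sents) for name, sents in dev_sets.items()}
    cap = min(size for name, size in sizes.items() if name not in keep_whole)
    combined = []
    for name, sents in dev_sets.items():
        budget = sizes[name] if name in keep_whole else cap
        n_tokens = 0
        for sent in sents:
            if n_tokens + len(sent) > budget:
                break  # never cut a sentence in the middle
            combined.append((name, sent))
            n_tokens += len(sent)
    return combined


# Toy data: two regular treebanks and one very small one that is kept whole.
dev_sets = {
    "big": [["t"] * 5 for _ in range(4)],   # 20 tokens
    "mid": [["t"] * 5 for _ in range(2)],   # 10 tokens, sets the cap
    "tiny": [["t"] * 3],                    # 3 tokens, added in full
}
combined = balanced_concat(dev_sets, keep_whole={"tiny"})
```
]]></preformat>
        <p>With the treebanks used here, the cap would be set by the Finnish development set, and Tamil and Kazakh would play the role of the treebanks that are kept whole.</p>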
        <sec id="sec-5-2-1">
          <title>Graph Factors</title>
          <p>
            Our overall results are not as clear-cut as they were in McDonald and Nivre [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]
but we observe somewhat similar tendencies: the advantages of MSTParser over
MaltParser seem to carry over to UDPipe to some extent. For example, as can be
seen in Figure 3, MaltParser seems to suffer from dependency length more than
UDPipe does. Note that we use F1, the harmonic mean of precision and recall, here
instead of LAS. This is because we cannot assume a one-to-one correspondence
between the predicted and gold dependencies of a given length.
          </p>
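          <p>To make the need for F1 concrete, the following sketch computes labeled precision, recall and F1 per dependency length. It is an illustration under assumptions, not the script behind Figure 3: a parse is represented as a list of (head, label) pairs with 1-based token positions, head 0 stands for the artificial root, root arcs are put in a bucket of length 0, and all names are hypothetical.</p>
          <preformat><![CDATA[
```python
from collections import Counter


def arc_length(dep, head):
    # Assumption: arcs from the artificial root (head 0) go into bucket 0.
    return 0 if head == 0 else abs(head - dep)


def f1_by_length(gold, pred):
    """Return {length: F1} for two aligned lists of (head, label) pairs."""
    gold_n, pred_n, correct = Counter(), Counter(), Counter()
    pairs = zip(gold, pred)
    for dep, ((g_head, g_lab), (p_head, p_lab)) in enumerate(pairs, start=1):
        g_len = arc_length(dep, g_head)
        p_len = arc_length(dep, p_head)
        gold_n[g_len] += 1   # denominator of recall for the gold length
        pred_n[p_len] += 1   # denominator of precision for the predicted length
        if (g_head, g_lab) == (p_head, p_lab):
            correct[g_len] += 1  # same head, so gold and predicted length agree
    scores = {}
    for length in sorted(set(gold_n) | set(pred_n)):
        p = correct[length] / pred_n[length] if pred_n[length] else 0.0
        r = correct[length] / gold_n[length] if gold_n[length] else 0.0
        scores[length] = 2 * p * r / (p + r) if p + r else 0.0
    return scores


# Toy sentence: token 4 is attached to token 2 instead of token 3.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "nmod")]
scores = f1_by_length(gold, pred)
```
]]></preformat>
          <p>In the toy sentence, the wrong attachment removes one arc from the gold length-1 bucket and adds one to the predicted length-2 bucket, so the two buckets no longer contain the same tokens and a single accuracy figure per length is not defined.</p>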
          <p>The picture is not so clear for sentence length, as can be seen from Figure 4:
although accuracy for MaltParser seems to decrease more than for
UDPipe between 1-10 and 20-30 token sentences, they both perform similarly on
sentences between 40 and 50 tokens, and UDPipe is then again better on sentences
longer than 50 tokens. Note, however, that the maximum sentence lengths for Kazakh
and Tamil are 27 and 49 respectively, which might provide some explanation of
why MaltParser outperforms UDPipe for those.</p>
          <p>
            It is interesting to point out that Nguyen et al. [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] reported a similar tendency
between the results of RNN graph-based and transition-based parsers and their
classical counterparts, where the RNN ones suffer less from sentence and
dependency length than the classical ones.
          </p>
        </sec>
        <sec id="sec-5-2-2">
          <title>Linguistic Factors</title>
          <p>
            McDonald and Nivre [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] created a taxonomy of POS tags and dependency
relations so as to group similar ones together and have consistent labels across
treebanks. Luckily, in our case, we already have consistent dependency labels and
POS tags. In Figure 5 we give accuracies (in terms of F1 again for the same reason
as before) for the 15 most frequent dependency relations and in Figure 6 we give
accuracies for the POS tags. The picture seems somewhat consistent with what we
have observed so far: UDPipe is better than MaltParser on dependencies that may
be distant such as conj and advcl. MaltParser is better, although not by far, for
dependencies that are expected to be short such as nummod.
          </p>
          <p>The accuracies for POS tags show a similar picture to that for dependency relations.
UDPipe outperforms MaltParser on most POS tags except the ones that are
expected to be close to their head, such as NUM.</p>
          <p>Overall then, UDPipe outperforms MaltParser on most dependencies and tags,
which suggests that, in general, its representation is better. MaltParser outperforms
UDPipe only by a small margin on a limited number of phenomena, which all seem
to involve short dependencies.</p>
          <p>
            It is important to note, however, that a major factor at play here is the
beam search used by UDPipe. It has been shown that beam search helps with long
dependencies [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]. It would be interesting to isolate the beam search factor from
the neural network classifier one by comparing UDPipe with and without it.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and Future Work</title>
      <p>In this paper, we presented a comparison of the performance of neural network
parsers with their classical counterparts. We saw that on small treebanks, MaltParser
outperforms UDPipe. We investigated the effect of training size on both types of
parsers and observed that UDPipe needs only a small amount of training data to
outperform MaltParser. We observed that UDPipe has a steep learning curve with very
small data sizes, which then flattens out to a rate of improvement similar to that of
MaltParser, which in most cases stays only a few percentage points below as training
size increases. We carried out an error analysis and observed that MaltParser suffers
more from increased dependency length than UDPipe does. MaltParser only does
marginally better on a restricted set of short dependencies. Doing more fine-grained
error analysis could lead to a more in-depth understanding of the strengths and
weaknesses of the two models, and it would be interesting to investigate the effect
of beam search on neural network models.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Andor</surname>
          </string-name>
          , Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, and Michael Collins.
          <year>2016</year>
          .
          <article-title>Globally normalized transition-based neural networks</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12</source>
          ,
          <year>2016</year>
          , Berlin, Germany, Volume
          <volume>1</volume>
          : Long Papers.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Miguel</given-names>
            <surname>Ballesteros</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joakim</given-names>
            <surname>Nivre</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>MaltOptimizer: Fast and effective parser optimization</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>22</volume>
          (
          <issue>2</issue>
          ):
          <fpage>187</fpage>
          -
          <lpage>213</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Sabine</given-names>
            <surname>Buchholz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Erwin</given-names>
            <surname>Marsi</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>CoNLL-X shared task on multilingual dependency parsing</article-title>
          .
          <source>In Proceedings of the Tenth Conference on Computational Natural Language Learning. Association for Computational Linguistics</source>
          , pages
          <fpage>149</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Danqi</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>A fast and accurate dependency parser using neural networks</article-title>
          .
          <source>In Empirical Methods in Natural Language Processing (EMNLP).</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Chris</given-names>
            <surname>Dyer</surname>
          </string-name>
          , Miguel Ballesteros, Wang Ling, Austin Matthews, and
          <string-name>
            <given-names>Noah A.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Transition-based dependency parsing with stack long short-term memory</article-title>
          .
          <source>In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</source>
          (Volume
          <volume>1</volume>
          : Long Papers).
          <source>Association for Computational Linguistics</source>
          , Beijing, China, pages
          <fpage>334</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Yoav</given-names>
            <surname>Goldberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Elhadad</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Inspecting the structural biases of dependency parsing algorithms</article-title>
          .
          <source>In Proceedings of the Fourteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics</source>
          , pages
          <fpage>234</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Angelika</given-names>
            <surname>Kirilin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yannick</given-names>
            <surname>Versley</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>What is hard in Universal Dependency Parsing?</article-title>
          <source>In 6th Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL</source>
          <year>2015</year>
          ). pages
          <fpage>31</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jonathan K</given-names>
            <surname>Kummerfeld</surname>
          </string-name>
          , David Hall, James R Curran, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Klein</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Parser showdown at the Wall Street corral: An empirical investigation of error types in parser output</article-title>
          .
          <source>In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics</source>
          , pages
          <fpage>1048</fpage>
          -
          <lpage>1059</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Adhiguna</given-names>
            <surname>Kuncoro</surname>
          </string-name>
          , Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, and Noah A Smith.
          <year>2016</year>
          .
          <article-title>What do recurrent neural network grammars learn about syntax?</article-title>
          <source>arXiv preprint arXiv:1611.05774</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Tal</given-names>
            <surname>Linzen</surname>
          </string-name>
          , Emmanuel Dupoux, and
          <string-name>
            <given-names>Yoav</given-names>
            <surname>Goldberg</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Assessing the ability of LSTMs to learn syntax-sensitive dependencies</article-title>
          .
          <source>arXiv preprint arXiv:1611.01368</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Ryan</given-names>
            <surname>McDonald</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joakim</given-names>
            <surname>Nivre</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Characterizing the errors of data-driven dependency parsing models</article-title>
          .
          <source>In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)</source>
          . pages
          <fpage>122</fpage>
          -
          <lpage>131</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Dat Quoc</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mark</given-names>
            <surname>Dras</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Johnson</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>An empirical study for Vietnamese dependency parsing</article-title>
          .
          <source>In Proceedings of the Australasian Language Technology Association Workshop 2016</source>
          . Melbourne, Australia, pages
          <fpage>143</fpage>
          -
          <lpage>149</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Joakim</given-names>
            <surname>Nivre</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Universal dependency evaluation</article-title>
          .
          <source>Unpublished paper</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Joakim</given-names>
            <surname>Nivre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Marie-Catherine</given-names>
            <surname>de Marneffe</surname>
          </string-name>
          , Filip Ginter, Yoav Goldberg, Jan Hajič,
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ryan</given-names>
            <surname>McDonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Slav</given-names>
            <surname>Petrov</surname>
          </string-name>
          , Sampo Pyysalo,
          <string-name>
            <given-names>Natalia</given-names>
            <surname>Silveira</surname>
          </string-name>
          , et al.
          <year>2016</year>
          .
          <article-title>Universal dependencies v1: A multilingual treebank collection</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Joakim</given-names>
            <surname>Nivre</surname>
          </string-name>
          , Johan Hall, Jens Nilsson, Atanas Chanev, Gülşen Eryiğit, Sandra Kübler, Svetoslav Marinov, and
          <string-name>
            <given-names>Erwin</given-names>
            <surname>Marsi</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>MaltParser: A language-independent system for data-driven dependency parsing</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>13</volume>
          (
          <issue>2</issue>
          ):
          <fpage>95</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Milan</given-names>
            <surname>Straka</surname>
          </string-name>
          ,
          Jan Hajič, and
          <string-name>
            <given-names>Jana</given-names>
            <surname>Straková</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>UDPipe: trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing</article-title>
          .
          <source>In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)</source>
          .
          <source>European Language Resources Association (ELRA)</source>
          , Paris, France.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Milan</given-names>
            <surname>Straka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jan</given-names>
            <surname>Hajič</surname>
          </string-name>
          , Jana Straková, and Jan Hajič jr
          .
          <year>2015</year>
          .
          <article-title>Parsing universal dependency treebanks using neural networks and search-based oracle</article-title>
          .
          <source>In Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT 14)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>David</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chris</given-names>
            <surname>Alberti</surname>
          </string-name>
          , Michael Collins, and
          <string-name>
            <given-names>Slav</given-names>
            <surname>Petrov</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Structured training for neural network transition-based parsing</article-title>
          .
          <source>In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</source>
          (Volume
          <volume>1</volume>
          : Long Papers).
          <source>Association for Computational Linguistics</source>
          , Beijing, China, pages
          <fpage>323</fpage>
          -
          <lpage>333</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Yue</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joakim</given-names>
            <surname>Nivre</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Analyzing the effect of global learning and beam-search on transition-based dependency parsing</article-title>
          .
          <source>In COLING (Posters)</source>
          . pages
          <fpage>1391</fpage>
          -
          <lpage>1400</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>