            Parsing Italian texts together is better than parsing them alone!

                      Oronzo Antonelli                         Fabio Tamburini
               DISI, University of Bologna, Italy      FICLIT, University of Bologna, Italy
              antonelli.oronzo@gmail.it                fabio.tamburini@unibo.it




                            Abstract

        English. In this paper we present a work aimed at testing the most advanced, state-of-the-art syntactic parsers based on deep neural networks (DNN) on Italian. We made a set of experiments using the Universal Dependencies benchmarks and propose a new solution based on ensemble systems, obtaining very good performances.

        Italiano (translated). In this contribution we present some experiments aimed at verifying the performance of the most advanced syntactic parsers on Italian, using the treebanks available within the Universal Dependencies framework. We also propose a new system based on ensemble parsing which has shown excellent performance.

1       Introduction

Syntactic parsing of morphologically rich languages like Italian often poses a number of hard challenges. Various works applied different kinds of freely available parsers to Italian, training them on different resources and comparing their results with different methods (Lavelli, 2014; Alicante et al., 2015; Lavelli, 2016), in order to gather a clear picture of syntactic parsing performances for the Italian language. In this direction it seems relevant to cite the periodic EVALITA1 campaigns for the evaluation of constituency and dependency parsers devoted to the syntactic analysis of Italian (Bosco and Mazzei, 2011; Bosco et al., 2014).

   Other studies regarding the syntactic parsing of Italian tried to enhance parsing performances by building some kind of ensemble system (Lavelli, 2013; Mazzei, 2015).

   By looking at the cited papers we can observe that they evaluated the state-of-the-art parsers before the "neural net revolution", thus not including the latest improvements proposed by new research studies.

   The goal of this paper is twofold: first, we would like to test the effectiveness of parsers based on the newly-proposed technologies, mainly deep neural networks, on Italian and, second, we would like to propose an ensemble system able to further improve the neural parsers' performances when parsing Italian texts.

    1
        http://www.evalita.it
2   The Neural Parsers

We considered nine state-of-the-art parsers, representing a wide range of contemporary approaches to dependency parsing, whose architectures are based on neural network models (see Table 1). We set up each parser using the data from the Italian Universal Dependencies (Nivre et al., 2016) treebanks, UD Italian 2.1 (general texts) and UD Italian PoSTWITA 2.2 (tweets). For all parsers we used the default settings for training, following the recommendations of the developers.

   In Chen and Manning (2014) dense features are used to learn representations of words, tags and labels with a neural network classifier, in order to take parsing decisions within a transition-based greedy model. To address some of its limitations, Andor et al. (2016) augmented this parser model with beam search and a conditional random field loss objective. The work of Ballesteros et al. (2015) extends the parser defined in Dyer et al. (2015), introducing character-level representations of words built with bidirectional LSTMs to improve the performance of the stack-LSTM model, which learns representations of the parser state. In Kiperwasser and Goldberg (2016) the bidirectional-LSTM output vector of each word is concatenated with the vector of every possible head, and the result is used as input to a multi-layer perceptron (MLP) that scores each resulting edge. Cheng et al. (2016) propose a bidirectional attention model which uses two additional unidirectional RNNs, called the left-right and right-left query components. Building on the Kiperwasser and Goldberg (2016) and Cheng et al. (2016) models, Dozat and Manning (2017) use a biaffine attention mechanism instead of the traditional MLP-based attention. The model proposed in Nguyen et al. (2017) trains a neural network that learns POS tagging and graph-based dependency parsing jointly, using a bidirectional LSTM for POS tagging and the Kiperwasser and Goldberg (2016) approach for dependency parsing. Shi et al. (2017a,b) described a parser that combines three parsing paradigms using a dynamic programming approach.
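To make the arc-scoring step concrete, the following minimal sketch computes biaffine arc scores in the spirit of Dozat and Manning (2017); the vectors are random placeholders standing in for the BiLSTM/MLP outputs, so this is an illustration of the scoring form, not the authors' implementation.

import numpy as np

# A sketch of biaffine arc scoring in the spirit of Dozat and Manning
# (2017). H_head and H_dep would normally come from two MLPs applied to
# the BiLSTM output; here they are random placeholders.
rng = np.random.default_rng(0)
n, d = 5, 8                          # 5 tokens, 8-dimensional vectors
H_head = rng.normal(size=(n, d))     # token i seen as a candidate head
H_dep = rng.normal(size=(n, d))      # token j seen as a dependent
U = rng.normal(size=(d, d))          # biaffine weight matrix
u = rng.normal(size=d)               # linear term: head-propensity bias

# S[i, j] = H_head[i] @ U @ H_dep[j] + H_head[i] @ u
S = H_head @ U @ H_dep.T + (H_head @ u)[:, None]
print(S.argmax(axis=0))              # greedy head choice for each dependent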
  Parser Ref. - Abbreviation                  Method            Parsing
  (Chen and Manning, 2014) - CM14             Tb: a-s           Greedy
  (Ballesteros et al., 2015) - BA15           Tb: a-s           Be-se
  (Kiperwasser and Goldberg, 2016) - KG16:T   Tb: a-h           Greedy
  (Kiperwasser and Goldberg, 2016) - KG16:G   Gb: a-f           Eisner
  (Andor et al., 2016) - AN16                 Tb: a-s           Beam-S
  (Cheng et al., 2016) - CH16                 Gb: a-f           cle
  (Dozat and Manning, 2017) - DM17            Gb: a-f           cle
  (Shi et al., 2017a,b) - SH17                Tb: a-h/a-eager   Greedy
                                              Gb: a-f           Eisner
  (Nguyen et al., 2017) - NG17                Gb: a-f           Eisner

Table 1: All the neural parsers considered in this study with their fundamental features, as well as the abbreviations used throughout the paper. In this table "Tb/Gb" means "Transition/Graph-based", "Beam-S" means "Beam-search" and "a-s/h/f" means "arc-standard/hybrid/factored".
   We trained, validated and tested the nine considered parsers, as well as all the proposed extensions, by considering three different setups:

  • setup0: only the UD Italian 2.1 dataset;

  • setup1: only the UD Italian PoSTWITA 2.2 dataset;

  • setup2: the UD Italian 2.1 dataset joined with the UD Italian PoSTWITA 2.2 dataset (train and validation sets), keeping the test set of PoSTWITA 2.2 (see the sketch below).
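Building the setup2 training data amounts to concatenating the two CoNLL-U training sets; a minimal sketch follows (the file names are illustrative assumptions, not the actual paths used in the experiments).

# A sketch of building the setup2 training set; file names are
# illustrative assumptions.
paths = ["it_isdt-ud-train.conllu", "it_postwita-ud-train.conllu"]
with open("setup2-train.conllu", "w", encoding="utf-8") as out:
    for path in paths:
        with open(path, encoding="utf-8") as f:
            # CoNLL-U separates sentence blocks with a blank line
            out.write(f.read().rstrip("\n") + "\n\n")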
   After the influential paper by Reimers and Gurevych (2017) it is clear to the community that reporting a single score for each DNN training session could be heavily affected by the system initialisation point; we should instead report the mean and standard deviation of various runs with the same settings, in order to get a more accurate picture of the real systems' performances and make more reliable comparisons between them.

   Table 2 shows the parsers' performances on the test set for the three setups described above, executing the training/validation/test cycle 5 times. In every setup the DM17 parser exhibits the best performances, notably very high for general Italian. As expected, the performances for setup1 were much lower than those for setup0, due to the intrinsic difficulty of parsing tweets and to the scarcity of annotated tweets for training. Joining the two datasets in setup2 allowed a relevant gain in parsing tweets, even though we added out-of-domain data. For these reasons, in all the following experiments we abandoned setup1, because it seemed more relevant to use the joined data (setup2) and compare it to setup0.
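For reference, the aggregation behind each mean/standard-deviation entry of Table 2 can be sketched as follows (the run scores below are invented placeholders, not results from the paper).

from statistics import mean, stdev

# Mean/standard deviation of a metric over the 5 runs, as reported in
# Table 2. The scores are invented placeholders.
las_runs = [91.84, 91.62, 91.71, 91.95, 91.58]
print(f"LAS: {mean(las_runs):.2f}/{stdev(las_runs):.2f}")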
                       setup0
              Valid. Ita               Test Ita
          UAS        LAS         UAS        LAS
 CM14   88.20/0.18 85.46/0.14  89.33/0.17 86.85/0.22
 BA15   91.15/0.11 88.55/0.23  91.57/0.38 89.15/0.33
 KG16:T 91.17/0.29 88.42/0.24  91.21/0.33 88.72/0.24
 KG16:G 91.85/0.27 89.23/0.31  92.04/0.18 89.65/0.10
 AN16   85.52/0.34 77.67/0.30  87.70/0.31 79.48/0.24
 CH16   92.42/0.00 89.60/0.00  92.82/0.00 90.26/0.00
 DM17   93.37/0.27 91.37/0.24  93.72/0.14 91.84/0.18
 SH17   89.67/0.24 85.05/0.24  89.89/0.29 84.55/0.30
 NG17   90.37/0.12 87.19/0.21  90.67/0.15 87.58/0.11

                       setup1
             Valid. PoSTW             Test PoSTW
          UAS        LAS         UAS        LAS
 CM14   81.03/0.17 75.24/0.30  81.50/0.28 76.07/0.17
 BA15   83.44/0.20 77.70/0.25  84.06/0.38 78.64/0.44
 KG16:T 77.38/0.14 68.81/0.25  77.41/0.43 69.13/0.43
 KG16:G 78.81/0.23 70.14/0.33  78.78/0.44 70.52/0.51
 AN16   77.74/0.25 66.63/0.16  77.78/0.33 67.21/0.30
 CH16   84.78/0.00 78.51/0.00  86.12/0.00 79.89/0.00
 DM17   85.01/0.16 78.80/0.09  86.26/0.16 80.40/0.19
 SH17   80.52/0.18 73.71/0.14  81.11/0.29 74.53/0.26
 NG17   82.02/0.11 75.20/0.24  82.74/0.39 76.22/0.41

                       setup2
           Valid. Ita+PoSTW           Test PoSTW
          UAS        LAS         UAS        LAS
 CM14   85.52/0.13 81.51/0.05  82.62/0.24 77.45/0.23
 BA15   87.85/0.13 83.80/0.12  85.15/0.29 80.12/0.27
 KG16:T 83.89/0.23 77.77/0.26  80.47/0.36 72.92/0.46
 KG16:G 84.70/0.14 78.41/0.14  81.41/0.37 73.49/0.19
 AN16   82.95/0.33 73.46/0.37  79.81/0.27 69.19/0.19
 CH16   89.16/0.00 84.56/0.00  86.85/0.00 80.93/0.00
 DM17   89.72/0.10 85.85/0.13  87.22/0.24 81.65/0.21
 SH17   85.85/0.36 80.00/0.39  83.12/0.50 76.38/0.38
 NG17   86.81/0.04 82.13/0.09  84.09/0.07 78.02/0.11

Table 2: Mean/standard deviation of UAS/LAS for each parser and for the different setups, obtained by repeating the experiments 5 times. All the results are statistically significant (p < 0.05) and the best values are shown in boldface.
3    An Ensemble of Neural Parsers

The DEPENDABLE tool of Choi et al. (2015) reports ensemble upper-bound performances assuming that, given the parsers' outputs, the best tree can be identified by an oracle "MACRO" (MA), or that the best arc can be identified by another oracle "MICRO" (mi). Table 3 shows that, by applying these oracles, we have plenty of room for improving the performances by building some kind of ensemble system able to cleverly choose the correct information from the different parsers' outputs and combine it, improving the final solution. This observation motivates our proposal.

             Validation              Test
           UAS      LAS        UAS      LAS
                       setup0
 mi      98.30%   97.82%     98.08%   97.72%
 MA      96.62%   95.10%     96.31%   94.82%
                       setup2
 mi      97.08%   96.02%     96.32%   94.73%
 MA      94.62%   91.29%     93.27%   88.50%

Table 3: Results obtained by building an ensemble system based on the oracles mi and MA and considering all parsers.
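The two oracles of Table 3 can be made precise with a small sketch; this is our reading of the MICRO/MACRO upper bounds in their unlabeled version, not the DEPENDABLE implementation.

def micro_oracle(gold, preds):
    """MICRO upper bound (unlabeled): an arc counts as correct if at
    least one ensemble member predicted the gold head for that token.
    gold: list of gold heads; preds: one head list per parser."""
    hits = sum(1 for i, g in enumerate(gold)
               if any(p[i] == g for p in preds))
    return hits / len(gold)

def macro_oracle(sentences):
    """MACRO upper bound (unlabeled): for every sentence, keep the whole
    tree of the single parser matching the most gold heads.
    sentences: list of (gold, preds) pairs as in micro_oracle."""
    correct = total = 0
    for gold, preds in sentences:
        correct += max(sum(p[i] == g for i, g in enumerate(gold))
                       for p in preds)
        total += len(gold)
    return correct / total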
   To combine the parser outputs we used the following ensemble schemas:

   • Voting: each parser contributes by assigning a vote to every dependency edge, as described in Zeman and Žabokrtský (2005). With the majority approach the resulting dependency tree could be ill-formed; in that case, with the switching approach, the tree is replaced by the output of the first parser (see the sketch after this list).

   • Reparsing: as described in Sagae and Lavie (2006) together with Hall et al. (2007), an MST algorithm is used to reparse a graph where each word in the sentence is a node. The MST algorithms used are Chu-Liu/Edmonds (cle) and Eisner, as reported in McDonald et al. (2005). Three weighting strategies for Chu-Liu/Edmonds are used: equally weighted (w2); weighted according to the total labeled accuracy on the validation set (w3); weighted according to the labeled accuracy per coarse-grained PoS tag on the validation set (w4).

   • Distilling: in Kuncoro et al. (2016) the authors train a distillation parser using a loss objective with a cost that incorporates ensemble uncertainty estimates for each possible attachment.
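A minimal sketch of the voting schema in its unlabeled version, with the majority vote and the switching fallback; this is our reading of the scheme, not the original implementation.

from collections import Counter

def vote_heads(preds):
    """Majority voting on edges (Zeman and Žabokrtský, 2005): for every
    token each parser votes for one head and the most voted head wins
    (ties are broken arbitrarily in this sketch).
    preds: one list of 1-based heads per parser (0 = root)."""
    n = len(preds[0])
    return [Counter(p[i] for p in preds).most_common(1)[0][0]
            for i in range(n)]

def is_tree(heads):
    """A well-formed dependency tree has exactly one root and no cycles."""
    if sum(1 for h in heads if h == 0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:         # cycle detected
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def switching(preds):
    """Keep the voted tree when well-formed, otherwise fall back to the
    output of the first parser."""
    voted = vote_heads(preds)
    return voted if is_tree(voted) else list(preds[0])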
4    Results

Tables 4, 7 and 9 show the performances of the ensembles built on the best results on the validation set obtained in the 5 training/test cycles, considering both setup0 and setup2. Table 6 reports the number of malformed trees for the majority strategy.

   Tables 5 and 8 report the number of cases in which the ensemble output differs from the baseline, for both labeled (L) and unlabeled (U) outputs. On average, the percentage of different unlabeled outputs varies from 2% to 15% with respect to the baseline. For the best result (DM17+ALL) the difference on setup0 and setup2 is about 4%.
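All the tables report UAS and LAS; for reference, a minimal scorer over (head, label) pairs could look as follows (a sketch, not the evaluation script actually used in the experiments).

def uas_las(gold, pred):
    """UAS: fraction of tokens with the correct head; LAS: fraction of
    tokens with the correct head AND the correct dependency label.
    gold, pred: lists of (head, label) pairs, one per token."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas, las

# toy example with invented annotations
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl")]
print(uas_las(gold, pred))           # (1.0, 0.666...)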
                       setup0
                          Validation       Test
 Voters/Strategy          UAS    LAS     UAS    LAS
 DM17+CH16+BA15/maj.    94.20% 92.27%  93.77% 92.13%
 DM17+CH16+BA15/swi.    94.11% 92.16%  93.79% 92.14%
 AN16+CM14+SH17/maj.    90.43% 87.96%  91.03% 88.47%
 AN16+CM14+SH17/swi.    89.44% 86.77%  90.17% 87.43%
 DM17+CM14+SH17/maj.    93.84% 92.03%  93.82% 92.27%
 DM17+CM14+SH17/swi.    93.76% 91.94%  93.82% 92.25%
 AN16+ALL/maj.          94.37% 92.65%  93.83% 92.27%
 AN16+ALL/swi.          93.99% 92.15%  93.43% 91.73%
 DM17+ALL/maj.          94.42% 92.67%  93.94% 92.41%
 DM17+ALL/swi.          94.38% 92.60%  93.91% 92.37%
 DM17 (baseline)        93.74% 91.66%  93.75% 92.03%

                       setup2
                          Validation       Test
 Voters/Strategy          UAS    LAS     UAS    LAS
 DM17+CH16+BA15/maj.    90.57% 87.16%  88.21% 83.64%
 DM17+CH16+BA15/swi.    90.51% 87.10%  88.13% 83.51%
 AN16+CM14+SH17/maj.    86.90% 83.60%  84.09% 79.78%
 AN16+CM14+SH17/swi.    86.01% 82.50%  82.58% 77.94%
 DM17+CM14+SH17/maj.    90.35% 87.21%  88.07% 83.64%
 DM17+CM14+SH17/swi.    90.27% 87.11%  87.99% 83.52%
 AN16+ALL/maj.          90.30% 87.26%  88.36% 84.13%
 AN16+ALL/swi.          89.70% 86.45%  87.46% 83.06%
 DM17+ALL/maj.          90.64% 87.60%  88.51% 84.42%
 DM17+ALL/swi.          90.65% 87.62%  88.50% 84.20%
 DM17 (baseline)        89.82% 85.96%  87.59% 81.95%

Table 4: Results of ensembles using switching and majority approaches on the best models in setup0 and setup2. The baseline is defined by the best results of Dozat and Manning (2017).

                       setup0
                          Validation       Test
                           /11.908        /10.417
 Voters/Strategy            U      L      U      L
 DM17+CH16+BA15/maj.      208     61    188     46
 DM17+CH16+BA15/swi.      192     52    175     39
 AN16+CM14+SH17/maj.    1.006    424    783    336
 AN16+CM14+SH17/swi.    1.130    489    870    371
 DM17+CM14+SH17/maj.      170     37    139     15
 DM17+CM14+SH17/swi.      157     33    129     13
 AN16+ALL/maj.            382    126    328    105
 AN16+ALL/swi.            460    164    386    133
 DM17+ALL/maj.            356    117    282     81
 DM17+ALL/swi.            312     97    255     72

                       setup2
                          Validation       Test
                           /24.243        /12.668
 Voters/Strategy            U      L      U      L
 DM17+CH16+BA15/maj.      597    219    470    213
 DM17+CH16+BA15/swi.      521    185    394    172
 AN16+CM14+SH17/maj.    2.757  1.329  1.805    941
 AN16+CM14+SH17/swi.    2.976  1.429  1.986  1.033
 DM17+CM14+SH17/maj.      490    140    337     93
 DM17+CM14+SH17/swi.      453    121    300     73
 AN16+ALL/maj.          1.377    624    897    440
 AN16+ALL/swi.          1.610    741  1.063    534
 DM17+ALL/maj.          1.156    502    784    378
 DM17+ALL/swi.            920    374    614    280

Table 5: Numbers of cases in which the output differs between the ensemble systems, using switching and majority, and the baseline Dozat and Manning (2017).

                       setup0          setup2
 Voters             Valid.  Test    Valid.  Test
                    /564    /482    /1235   /674
 DM17+CH16+BA15       9       7      31      31
 AN16+CM14+SH17      45      25      88      77
 DM17+CM14+SH17       6       6      19      23
 AN16+ALL            18      17      73      63
 DM17+ALL            17      11      75      57

Table 6: Number of malformed trees obtained by using the majority strategy for both setups.

   The results of the voting approach reported in Table 4 show that the majority strategy is slightly better than the switching strategy, although it must be taken into account that the former can produce ill-formed dependency trees. The percentage of ill-formed trees on the validation/test sets varies from a minimum of 2% to a maximum of 8%. For this reason the majority strategy should be used only when it is followed by a manual correction phase. The switching strategy performs well if the first voter is one of the best parsers: in fact the combinations AN16+ALL and AN16+CM14+SH17 perform worse than the counterparts using the best parser (DM17) as the first voter. Overall, the highest performance is achieved using all parsers together with DM17 as the first voter. For setup0 the increases are +0.19% in UAS and +0.38% in LAS, while in setup2 they are +0.92% in UAS and +2.47% in LAS with respect to the best single parser (again DM17).
   The results of the reparsing approach reported in Table 7 show that the Chu-Liu/Edmonds algorithm is slightly better than the Eisner algorithm. In this case the choice of strategy must take into account whether we want to allow non-projectivity or not. The percentage of non-projective dependency trees on the validation/test sets for Chu-Liu/Edmonds varies from a minimum of 7% to a maximum of 12%, compared with the 4% average for the Italian corpora. Overall, the highest performances are achieved using the Chu-Liu/Edmonds algorithm. For setup0 the increases are +0.25% in UAS and +0.45% in LAS, while in setup2 they are +0.77% in UAS and +2.30% in LAS with respect to the best single parser (DM17).
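The non-projectivity counts discussed above can be obtained with a simple crossing-arcs test; a minimal sketch:

def is_projective(heads):
    """True if no two dependency arcs cross. heads[i] is the 1-based
    head of token i+1, with 0 denoting the artificial root."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    return not any(l1 < l2 < r1 < r2
                   for l1, r1 in arcs for l2, r2 in arcs)

# token 2 attaches to token 4 across the arc 1 -> 3: non-projective
print(is_projective([0, 4, 1, 1]))   # False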
                       setup0
                            Validation       Test
 Voters/Strategy            UAS    LAS     UAS    LAS
 DM17+CH16+BA15/cle-w2    93.82% 91.85%  93.54% 91.83%
 DM17+CH16+BA15/cle-w3    93.89% 91.82%  93.78% 92.06%
 DM17+CH16+BA15/cle-w4    94.20% 92.28%  93.72% 92.04%
 DM17+CH16+BA15/eisner    94.05% 92.05%  93.46% 91.78%
 ALL/cle-w2               94.31% 92.53%  93.85% 92.23%
 ALL/cle-w3               94.16% 92.41%  94.00% 92.48%
 ALL/cle-w4               94.29% 92.58%  93.95% 92.38%
 ALL/eisner               94.31% 92.53%  93.95% 92.35%
 DM17 (baseline)          93.74% 91.66%  93.75% 92.03%

                       setup2
                            Validation       Test
 Voters/Strategy            UAS    LAS     UAS    LAS
 DM17+CH16+BA15/cle-w2    90.33% 86.95%  87.69% 83.31%
 DM17+CH16+BA15/cle-w3    89.82% 85.96%  87.59% 81.95%
 DM17+CH16+BA15/cle-w4    90.41% 86.99%  87.94% 83.32%
 DM17+CH16+BA15/eisner    90.50% 87.05%  88.04% 83.51%
 ALL/cle-w2               90.52% 87.53%  88.36% 84.25%
 ALL/cle-w3               89.90% 86.75%  87.79% 83.54%
 ALL/cle-w4               90.42% 87.46%  88.19% 84.11%
 ALL/eisner               90.45% 87.41%  88.31% 84.08%
 DM17 (baseline)          89.82% 85.96%  87.59% 81.95%

Table 7: Results of ensembles using reparsing approaches on the best models in setup0 and setup2. The baseline is again defined by the best results of DM17.

                       setup0
                            Validation      Test
                             /11.908       /10.417
 Voters/Strategy            UAS    LAS    UAS    LAS
 DM17+CH16+BA15/cle-w2      360    129    307     90
 DM17+CH16+BA15/cle-w3       96      0     89      1
 DM17+CH16+BA15/cle-w4      267     76    247     52
 DM17+CH16+BA15/eisner      375    130    327    103
 ALL/cle-w2                 400    131    333    103
 ALL/cle-w3                 351    108    299     79
 ALL/cle-w4                 383    126    307     87
 ALL/eisner                 411    133    333    106

                       setup2
                            Validation      Test
                             /24.243       /12.668
 Voters/Strategy            UAS    LAS    UAS    LAS
 DM17+CH16+BA15/cle-w2    1.056    496    800    424
 DM17+CH16+BA15/cle-w3        0      0      0      0
 DM17+CH16+BA15/cle-w4      603    264    491    236
 DM17+CH16+BA15/eisner    1.047    443    789    376
 ALL/cle-w2               1.347    599    882    417
 ALL/cle-w3               1.261    537    804    363
 ALL/cle-w4               1.274    576    822    389
 ALL/eisner               1.367    607    916    436

Table 8: Numbers of cases in which the output differs between the ensemble systems, using reparsing approaches, and the baseline Dozat and Manning (2017).
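The w2/w3/w4 strategies of Table 7 differ only in how each parser's vote is weighted before decoding; a minimal sketch of the weight accumulation follows (the decoding step over the resulting matrix, Chu-Liu/Edmonds or Eisner, is omitted).

import numpy as np

def ensemble_arc_weights(preds, weights):
    """Accumulate weighted votes into an arc-weight matrix: W[h, d] is
    the total weight given to head h for the dependent at 1-based
    position d. A maximum-spanning-tree decoder (Chu-Liu/Edmonds or
    Eisner; McDonald et al., 2005) is then run on W. weights holds one
    number per parser: all 1.0 for w2, the parser's validation LAS for
    w3; w4 would instead look the weight up per token from per-PoS
    accuracies (not shown). preds: one list of 1-based heads per parser
    (0 = root)."""
    n = len(preds[0])
    W = np.zeros((n + 1, n + 1))     # rows: heads 0..n; columns: dependents 1..n
    for heads, w in zip(preds, weights):
        for dep, head in enumerate(heads, start=1):
            W[head, dep] += w
    return W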
   The results of the distilling strategy reported in Table 9, unlike the previous proposals, show worse outcomes, which score below the baseline.

 Setup         UAS                  LAS
 setup0   92.50% (–1.25%)      89.93% (–2.10%)
 setup2   86.73% (–0.86%)      81.39% (–0.56%)

Table 9: Results of the distilling approach on the best models in setup0 and setup2. In brackets are reported the differences between the distilled models and the best results of DM17, as baseline.
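The ensemble-uncertainty cost used by the distillation loss can be sketched as follows; this is our reading of the idea in Kuncoro et al. (2016), not their implementation.

from collections import Counter

def attachment_costs(preds):
    """Sketch of ensemble-uncertainty attachment costs in the spirit of
    Kuncoro et al. (2016): attaching dependent d to head h costs the
    proportion of ensemble members that did not predict that attachment.
    preds: one list of 1-based heads per parser (0 = root)."""
    m = len(preds)
    costs = {}
    for d in range(len(preds[0])):
        votes = Counter(p[d] for p in preds)
        for head in votes:
            costs[(head, d + 1)] = 1.0 - votes[head] / m
    return costs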
5   Discussion and Conclusions

We have studied the performances of some neural dependency parsers on the generic and social media domains. Using the predictions of each single parser, we combined the best outcomes in various ways to improve the performance. The ensemble models are more effective on corpora built using in-domain data (social media), giving an improvement of ∼ 1% in UAS and ∼ 2.5% in LAS.

   Thanks to the number of parser models adopted in the experiments, it has been possible to verify that the performances of the ensemble models increase as the number of parsers grows.

   The improvement in LAS is, in most cases, at least twice that in UAS. This could mean that ensemble models capture the type of the dependency relations with better precision than the head-dependent relations.

   All the proposed ensemble strategies, except for distilling, perform more or less in the same way; therefore the choice of which strategy to use depends, in part, on the properties that we want to obtain for the combined dependency tree.

   Our work is inspired by the work of Mazzei (2015). Differently from his work, we use a larger set of state-of-the-art parsers, all based on neural networks, in order to gain more diversity among the models used in the ensembles; furthermore, we experimented with the distilling strategy and the Eisner reparsing algorithm. Moreover, we built ensembles on larger datasets using both generic and social media texts.
Acknowledgements

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

References

Anita Alicante, Cristina Bosco, Anna Corazza, and Alberto Lavelli. 2015. Evaluating Italian parsing across syntactic formalisms and annotation schemes. In Roberto Basili, Cristina Bosco, Rodolfo Delmonte, Alessandro Moschitti, and Maria Simi, editors, Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project, Springer International Publishing, Cham, pages 135–159.

Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, and Michael Collins. 2016. Globally normalized transition-based neural networks. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). ACL, Berlin, Germany, pages 2442–2452.

Miguel Ballesteros, Chris Dyer, and Noah A. Smith. 2015. Improved transition-based parsing by modeling characters instead of words with LSTMs. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. ACL, Lisbon, Portugal, pages 349–359.

Cristina Bosco, Felice Dell'Orletta, Simonetta Montemagni, Manuela Sanguinetti, and Maria Simi. 2014. The EVALITA 2014 dependency parsing task. In Proceedings of the Fourth International Workshop EVALITA 2014. Pisa, Italy, pages 1–8.

Cristina Bosco and Alessandro Mazzei. 2011. The EVALITA 2011 parsing task. In Working Notes of EVALITA 2011. CELCT, Povo, Trento.

Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, Doha, Qatar, pages 740–750.

Hao Cheng, Hao Fang, Xiaodong He, Jianfeng Gao, and Li Deng. 2016. Bi-directional attention with agreement for dependency parsing. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. ACL, Austin, Texas, pages 2204–2214.

Jinho D. Choi, Joel Tetreault, and Amanda Stent. 2015. It depends: Dependency parser comparison using a web-based evaluation tool. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). ACL, Beijing, China, pages 387–396.

Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proceedings of the 2017 International Conference on Learning Representations.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). ACL, Beijing, China, pages 334–343.

Johan Hall, Jens Nilsson, Joakim Nivre, Gülsen Eryigit, Beáta Megyesi, Mattias Nilsson, and Markus Saers. 2007. Single malt or blended? A study in multilingual parser optimization. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007. ACL, Prague, Czech Republic, pages 933–939.

Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and accurate dependency parsing using bidirectional LSTM feature representations. Transactions of the Association for Computational Linguistics 4:313–327.

Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, and Noah A. Smith. 2016. Distilling an ensemble of greedy dependency parsers into one MST parser. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. ACL, Austin, Texas, pages 1744–1753.

Alberto Lavelli. 2013. An ensemble model for the EVALITA 2011 dependency parsing task. In Bernardo Magnini, Francesco Cutugno, Mauro Falcone, and Emanuele Pianta, editors, Evaluation of Natural Language and Speech Tools for Italian. Springer Berlin Heidelberg, Berlin, Heidelberg, pages 30–36.

Alberto Lavelli. 2014. Comparing state-of-the-art dependency parsers for the EVALITA 2014 dependency parsing task. In Proceedings of the Fourth International Workshop EVALITA 2014. Pisa, Italy, pages 15–20.

Alberto Lavelli. 2016. Comparing state-of-the-art dependency parsers on the Italian Stanford Dependency Treebank. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016). Napoli, Italy, pages 173–178.

Alessandro Mazzei. 2015. Simple voting algorithms for Italian parsing. In Roberto Basili, Cristina Bosco, Rodolfo Delmonte, Alessandro Moschitti, and Maria Simi, editors, Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project, Springer International Publishing, Cham, pages 161–171.

Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing. ACL, Vancouver, British Columbia, Canada, pages 523–530.

Dat Quoc Nguyen, Mark Dras, and Mark Johnson. 2017. A novel neural network model for joint POS tagging and graph-based dependency parsing. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. ACL, Vancouver, Canada, pages 134–142.
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).

Nils Reimers and Iryna Gurevych. 2017. Reporting score distributions makes a difference: Performance study of LSTM-networks for sequence tagging. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. ACL, Copenhagen, Denmark, pages 338–348.

Kenji Sagae and Alon Lavie. 2006. Parser combination by reparsing. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers. ACL, Stroudsburg, PA, USA, NAACL-Short '06, pages 129–132.

Tianze Shi, Liang Huang, and Lillian Lee. 2017a. Fast(er) exact decoding and global training for transition-based dependency parsing via a minimal feature set. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. ACL, Copenhagen, Denmark, pages 12–23.

Tianze Shi, Felix G. Wu, Xilun Chen, and Yao Cheng. 2017b. Combining global models for parsing universal dependencies. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. ACL, Vancouver, Canada, pages 31–39.

Daniel Zeman and Zdeněk Žabokrtský. 2005. Improving parsing accuracy by combining diverse dependency parsers. In Proceedings of the Ninth International Workshop on Parsing Technology. ACL, Vancouver, British Columbia, pages 171–178.