<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dependency Parsing Performance by Incorporating Additional Features for Agglutinative Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mücahit Altıntaş</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Cüneyd Tantuğ</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>34469</institution>
          ,
          <addr-line>Maslak, Istanbul</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Computer Science, Natural Language Processing and Social Robotic Lab, Istanbul Technical University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Faculty of Engineering, Bayburt University</institution>
          ,
          <addr-line>69002, Bayburt</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Language Processing</institution>
          ,
          <addr-line>ALTNLP</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent studies, the use of language models has increased noticeably and has yielded considerable gains. However, choosing a proper representation and taking complementary components into account remain open issues. In this research, we demonstrate the impact of sub-word level, sentence piece based word representation on dependency parsing performance for agglutinative languages. Furthermore, we propose using a sentence representation that holds the whole meaning of the sentence as an additional feature to improve dependency parsing. Our proposed enhancements are evaluated on nine agglutinative languages: Estonian, Finnish, Hungarian, Indonesian, Japanese, Kazakh, Korean, Turkish, and Uyghur. We found that sentence piece based token encoding contributes to parsing performance for the majority of the languages tested. Using the entire meaning of the sentence as a complementary feature enhances parsing performance for six of the nine languages.</p>
        <p>Keywords: agglutinative languages, dependency parsing, sentence piece, sentence representation</p>
        <p>Dependency parsing is one of the core components of natural language computation; it identifies the syntactic relationships among the words within a sentence. It is crucial for several downstream natural language processing (NLP) tasks. Zhou et al. [1] employed dependency parsing to obtain a semantic representation in order to enhance text-to-speech. Luo et al. [2] applied dependency parsing knowledge as supplementary information, which allows the question answering (QA) model to better match the semantic components of the question. Zhang et al. [3] used the encoder outputs of a dependency parser as the inputs of a Seq2Seq neural machine translation (NMT) model, training the dependency parsing and machine translation model parameters concurrently. Cai and Lapata [4] and Xia et al. [5] reported that syntax-aware representation improves semantic role labeling (SRL) performance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In agglutinative languages, a word may contain many morphemes, each of which supplies the word
with a grammatical function or endows it with new meaning. A word may therefore have numerous different
surface forms, which leads to out-of-vocabulary (OOV) and data sparsity problems. To alleviate these
problems, sub-word level representations have been proposed in the literature. Dos Santos
and Zadrozny [6] and Kim et al. [7] reported that character-level word representation improves
performance on word-level tasks. Yu et al. [8] proposed syllable-level word embeddings
for morphologically rich languages such as Korean. Bojanowski et al. [9] introduced an extension
of the continuous skip-gram model in which words are represented as the sum of their character
n-gram vectors. In addition, agglutinative languages convey grammatical information through
inflections, so they tend to have more flexible word order. This leads to discontinuous
constituents, which impose non-projectivity on dependency structures [10]. Fortunately, splitting
words into morphemes is simple, since each piece of grammatical information is carried by a single
morpheme and vice versa. Eryiğit and Oflazer [11] demonstrated that treating morphemes, rather than
word forms, as the primary units of syntactic structure improves parsing accuracy for
an agglutinative language, Turkish. Özateş et al. [12] made use of morpheme information
and hand-crafted rules to improve the word vector representation in dependency parsing.</p>
      <p>In this paper, we propose two enhancements to increase dependency parsing accuracy,
aimed at agglutinative languages in particular but not restricted to them.</p>
      <p>• We employ sub-word level sentence piece [13] based word representation to capture
morphemes more precisely and to attenuate the out-of-vocabulary (OOV) and data
sparsity problems. Sentence piece is a language-independent sub-word tokenizer
designed for neural network-based text processing.</p>
      <p>• As a complementary feature to token features, we use a sentence representation that holds
the whole meaning of the sentence. It is motivated by the observation that sentences with the same
meaning but different word orders have the same dependency tree structure.</p>
      <p>We investigate the impact of our proposed improvements on dependency parsing accuracy for
nine widely used agglutinative languages: Estonian, Finnish, Hungarian, Indonesian, Japanese,
Kazakh, Korean, Turkish, and Uyghur.</p>
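      <p>To make the OOV argument concrete, the following toy sketch segments an unseen Turkish word into known pieces. It is only an illustration under stated assumptions: the piece inventory is hand-made and hypothetical, and the greedy longest-match rule is a stand-in, not the algorithm that sentence piece uses to learn or apply its vocabulary.</p>
      <preformat><![CDATA[
```python
# Toy illustration only: greedy longest-match segmentation over a hand-made
# piece vocabulary. This is NOT how SentencePiece learns or applies its
# vocabulary; it only shows why sub-word pieces reduce OOV words.

def segment(word, pieces):
    """Split `word` into the longest known pieces, left to right.
    Characters not covered by any piece become single-character pieces."""
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            if word[i:j] in pieces:
                out.append(word[i:j])
                i = j
                break
        else:
            out.append(word[i])  # character fallback, so nothing is OOV
            i += 1
    return out

# Hypothetical piece inventory (a stem and a few Turkish suffixes).
PIECES = {"ev", "ler", "imiz", "de", "den", "kitap", "lar"}

print(segment("evlerimizde", PIECES))  # ['ev', 'ler', 'imiz', 'de']
print(segment("kitaplar", PIECES))     # ['kitap', 'lar']
```
]]></preformat>
      <p>Even if the full surface form never occurred in training, each of its pieces can still have a learned embedding.</p>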
    </sec>
    <sec id="sec-2">
      <title>2. Approach</title>
      <p>Our proposed model extends the parser described by Dozat and Manning [14].
The enhanced model comprises an LSTM-based encoder and biaffine classifiers. Sub-word level
representations, both character based and sentence piece [15] based, are obtained by applying an attention
mechanism over the hidden states of a single LSTM layer. Three bi-directional LSTM layers are
used to make the concatenation of token and sub-token embeddings context-aware. Pre-trained
word embeddings are added to the model after these bi-LSTM layers. A sentence representation,
obtained by concatenating the last hidden states of the bi-LSTM with the sentence vector that comes
from the pre-trained model, is also employed as an extra feature by broadcasting it to each word
in the sentence. Figure 1 illustrates our proposed neural dependency parser architecture.</p>
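      <p>Figure 1: The proposed neural dependency parser architecture (feature encoder, concatenation, BiLSTM layers, biaffine classifiers with MLP and linear layers, output).</p>
      <sec id="sec-2-4">
        <p>Formally, <inline-formula><tex-math>S</tex-math></inline-formula> is a sentence that includes <inline-formula><tex-math>n</tex-math></inline-formula> words and is represented as <inline-formula><tex-math>S = w_0, w_1, \ldots, w_n</tex-math></inline-formula>, where <inline-formula><tex-math>w_0</tex-math></inline-formula> is added synthetically as the ROOT token. Each word <inline-formula><tex-math>w_i</tex-math></inline-formula> can be represented
by a combination of its surface form (<inline-formula><tex-math>f_i</tex-math></inline-formula>), lemma (<inline-formula><tex-math>l_i</tex-math></inline-formula>), POS tag (<inline-formula><tex-math>p_i</tex-math></inline-formula>), morphological features (<inline-formula><tex-math>m_i</tex-math></inline-formula>),
characters (<inline-formula><tex-math>c_i</tex-math></inline-formula>), and sentence pieces (<inline-formula><tex-math>s_i</tex-math></inline-formula>), respectively, as given
below (Equ. 1).</p>
        <disp-formula><label>(1)</label><tex-math>w_i = f_i, l_i, p_i, m_i, c_i, s_i</tex-math></disp-formula>
        <p>Here, <inline-formula><tex-math>c_i</tex-math></inline-formula> and <inline-formula><tex-math>s_i</tex-math></inline-formula> are sub-word level features of the word, while <inline-formula><tex-math>f_i</tex-math></inline-formula>, <inline-formula><tex-math>l_i</tex-math></inline-formula>, <inline-formula><tex-math>p_i</tex-math></inline-formula>, and <inline-formula><tex-math>m_i</tex-math></inline-formula> are word-level
features.</p>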
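        <p>Encoder: The concatenation of the word-level (Equ. 3) and sub-word level (Equ. 4) embedding
vectors yields the vector <inline-formula><tex-math>x_i</tex-math></inline-formula> that is used as input to the bi-LSTM layers (Equ. 2).</p>
        <disp-formula><label>(2)</label><tex-math>x_i = e_i^{(word)} \oplus e_i^{(sub)}</tex-math></disp-formula>
        <disp-formula><label>(3)</label><tex-math>e_i^{(word)} = E(f_i) \oplus E(l_i) \oplus E(p_i) \oplus E(m_i)</tex-math></disp-formula>
        <disp-formula><label>(4)</label><tex-math>e_i^{(sub)} = \mathrm{Attn}(c_i) \oplus \mathrm{Attn}(s_i)</tex-math></disp-formula>
        <p>A sub-word representation is obtained by applying attention over the stacked hidden states of a
single-layer LSTM (Equ. 5),</p>
        <disp-formula><label>(5)</label><tex-math>\mathrm{Attn}(z_i) = H_i^{\top} a_i</tex-math></disp-formula>
        <disp-formula><label>(6)</label><tex-math>a_i = \mathrm{softmax}(H_i v)</tex-math></disp-formula>
        <disp-formula><label>(7)</label><tex-math>H_i = [h_0; h_1; \ldots; h_k]</tex-math></disp-formula>
        <disp-formula><label>(8)</label><tex-math>Z_i = \big(E(z_{i,0}), \ldots, E(z_{i,k})\big)</tex-math></disp-formula>
        <disp-formula><label>(9)</label><tex-math>(h_0, \ldots, h_k) = \mathrm{LSTM}(Z_i)</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>z_i = z_{i,1}, z_{i,2}, \ldots, z_{i,k}</tex-math></inline-formula> is the sequence of sub-word features of the word and <inline-formula><tex-math>k</tex-math></inline-formula> is the
number of sub-word features, which may be the sentence pieces or the characters of the word.</p>
        <p>A multi-layer Bi-LSTM (Equ. 10 and 11) is used to generate contextual word representations over the <inline-formula><tex-math>x_i</tex-math></inline-formula>.
An external contextualized word representation, which may be obtained from ELECTRA or BERT, is
concatenated with the right and left hidden states of the corresponding word in the last Bi-LSTM
layer (Equ. 12),</p>
        <disp-formula><label>(10)</label><tex-math>X = (x_0, \ldots, x_n)</tex-math></disp-formula>
        <disp-formula><label>(11)</label><tex-math>(\overleftarrow{h_i}, \overrightarrow{h_i}),\ (\overleftarrow{h_0}, \overrightarrow{h_n}) = \mathrm{BiLSTM}(X)</tex-math></disp-formula>
        <disp-formula><label>(12)</label><tex-math>r_i = \mathrm{PLM}(f_i) \oplus \overleftarrow{h_i} \oplus \overrightarrow{h_i}</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>\mathrm{PLM}(f_i)</tex-math></inline-formula> denotes the pre-trained model vector of the word surface form <inline-formula><tex-math>f_i</tex-math></inline-formula>.</p>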
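        <p>The attention step over the stacked LSTM hidden states of a word's sub-word units can be sketched as follows. This is a minimal illustration, not the authors' implementation: the learned scoring vector <inline-formula><tex-math>v</tex-math></inline-formula>, the dot-product scoring, and the toy numbers are assumptions.</p>
        <preformat><![CDATA[
```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, v):
    """Pool k hidden states (each a list of d floats) into one d-dim vector.

    v is a hypothetical learned scoring vector of size d; the weights are
    softmax(H v) and the result is the weighted sum of the hidden states.
    """
    scores = [sum(h_j * v_j for h_j, v_j in zip(h, v)) for h in hidden_states]
    weights = softmax(scores)
    d = len(v)
    pooled = [sum(w * h[j] for w, h in zip(weights, hidden_states))
              for j in range(d)]
    return pooled, weights

# Three sub-word hidden states of size 2 (made-up values).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(H, v=[0.0, 0.0])
# With an all-zero scoring vector, attention reduces to a plain average.
```
]]></preformat>
        <p>In the parser, one such pooled vector would be computed from the characters and another from the sentence pieces of each word before the two are concatenated.</p>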
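        <p>To represent a sentence, the sentence embedding vector of the pre-trained model is concatenated
with the final hidden states of the backward and forward directions of the last bi-LSTM layer,
respectively (Equ. 13),</p>
        <disp-formula><label>(13)</label><tex-math>r_S = \mathrm{PLM}(S) \oplus \overleftarrow{h_0} \oplus \overrightarrow{h_n}</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>\mathrm{PLM}(S)</tex-math></inline-formula> provides the sentence representation.</p>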
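        <p>A small sketch of how such a sentence vector could be built and then broadcast to every word is given below. The function names and the toy dimensions are hypothetical; plain Python lists stand in for tensors.</p>
        <preformat><![CDATA[
```python
def sentence_representation(plm_sentence_vec, backward_final, forward_final):
    """Concatenate the pre-trained sentence vector with the final backward
    and forward hidden states of the last bi-LSTM layer."""
    return plm_sentence_vec + backward_final + forward_final

def broadcast_to_words(word_reprs, sent_repr):
    """Append the same sentence vector to every word representation."""
    return [w + sent_repr for w in word_reprs]

# Made-up vectors: 2-dim sentence vector, 1-dim final state per direction.
sent = sentence_representation([0.1, 0.2], [0.3], [0.4])
words = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # ROOT plus two words
augmented = broadcast_to_words(words, sent)
```
]]></preformat>
        <p>When no pre-trained sentence vector is available, the first argument can be an empty list, so that only the bi-LSTM final states are used.</p>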
        <p>Classifier: Deep biaffine attention, as proposed by Dozat and Manning [14], is employed as
the classifier. Multi-layer perceptrons (MLPs) are used to obtain condensed representations of each
word as a head (Equ. 15) and as a dependent (Equ. 14). These representations are then fed
into a biaffine attention mechanism, which provides a score vector expressing the likelihood of
being the parent for each word in the sentence (Equ. 17).</p>
        <disp-formula><label>(14)</label><tex-math>h_i^{(arc\text{-}dep)} = \mathrm{MLP}^{(arc\text{-}dep)}(r_i)</tex-math></disp-formula>
        <disp-formula><label>(15)</label><tex-math>h_i^{(arc\text{-}head)} = \mathrm{MLP}^{(arc\text{-}head)}(r_i)</tex-math></disp-formula>
        <disp-formula><label>(16)</label><tex-math>H^{(arc\text{-}head)} = [h_0^{(arc\text{-}head)}; \ldots; h_n^{(arc\text{-}head)}]</tex-math></disp-formula>
        <disp-formula><label>(17)</label><tex-math>o_i^{(arc)} = \mathrm{Biaffine}^{(arc)}\big(H^{(arc\text{-}head)}, h_i^{(arc\text{-}dep)}\big)</tex-math></disp-formula>
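        <p>The arc scoring step can be sketched as below, in the style of the deep biaffine classifier of Dozat and Manning [14]. The bilinear-plus-bias form, the parameter names U and u, and the toy vectors are assumptions made for illustration; they are not taken from this paper.</p>
        <preformat><![CDATA[
```python
def biaffine_arc_scores(heads, deps, U, u):
    """Return scores[i][j]: how strongly word j is preferred as head of word i.

    heads, deps: per-word vectors (outputs of the head and dependent MLPs).
    U: d_head x d_dep weight matrix, u: d_head bias vector (both hypothetical).
    score(i, j) = head_j . (U dep_i) + head_j . u
    """
    scores = []
    for dep in deps:
        # U @ dep maps the dependent vector into the head space.
        u_dep = [sum(w * x for w, x in zip(row, dep)) for row in U]
        row_scores = []
        for head in heads:
            bilinear = sum(h * v for h, v in zip(head, u_dep))
            bias = sum(h * b for h, b in zip(head, u))
            row_scores.append(bilinear + bias)
        scores.append(row_scores)
    return scores

heads = [[1.0, 0.0], [0.0, 1.0]]
deps = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
u = [0.5, 0.0]
scores = biaffine_arc_scores(heads, deps, U, u)
print(scores)  # [[1.5, 0.0], [0.5, 1.0]]
```
]]></preformat>
        <p>The bias term lets some words score highly as heads regardless of the dependent, which is one motivation usually given for the biaffine form over a purely bilinear one.</p>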
        <p>Similarly, another biaffine classifier is employed to compute the dependency label
probabilities of the relevant word with each probable head (Equ. 21).</p>
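        <disp-formula><label>(18)</label><tex-math>h_i^{(rel\text{-}dep)} = \mathrm{MLP}^{(rel\text{-}dep)}(r_i)</tex-math></disp-formula>
        <disp-formula><label>(19)</label><tex-math>h_i^{(rel\text{-}head)} = \mathrm{MLP}^{(rel\text{-}head)}(r_i)</tex-math></disp-formula>
        <disp-formula><label>(20)</label><tex-math>H^{(rel\text{-}head)} = [h_0^{(rel\text{-}head)}; \ldots; h_n^{(rel\text{-}head)}]</tex-math></disp-formula>
        <disp-formula><label>(21)</label><tex-math>o_i^{(rel)} = \mathrm{Biaffine}^{(rel)}\big(H^{(rel\text{-}head)}, h_i^{(rel\text{-}dep)}\big)</tex-math></disp-formula>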
        <p>These two biaffine classifiers are trained jointly, minimizing the sum
of their cross-entropy losses. At test time, the Chu-Liu-Edmonds algorithm is used to extract the
maximum spanning tree from the resulting score matrices.</p>
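        <p>To show what tree decoding must guarantee, the sketch below finds the highest-scoring valid tree by exhaustive search. It is a deliberately simple stand-in for very short sentences, not the Chu-Liu-Edmonds algorithm itself, and it does not enforce a single root; the score matrix is made up.</p>
        <preformat><![CDATA[
```python
from itertools import product

def is_tree(heads):
    """heads[i] is the head of word i+1; words are 1..n and 0 is ROOT.
    Valid if every word reaches ROOT without revisiting a node."""
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:  # cycle (this also catches self-loops)
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def best_tree(scores):
    """scores[i][j]: score of head j (0..n) for word i+1.
    Brute force over all head assignments; only feasible for tiny n."""
    n = len(scores)
    best, best_heads = float("-inf"), None
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        total = sum(scores[i][h] for i, h in enumerate(heads))
        if total > best:
            best, best_heads = total, list(heads)
    return best_heads, best

# Picking the best head per word independently would choose 1 -> 2 and
# 2 -> 1, which is a cycle; tree decoding has to reject that.
scores = [[2, 0, 10],   # word 1: ROOT=2, self=0, word 2=10
          [1, 10, 0]]   # word 2: ROOT=1, word 1=10, self=0
print(best_tree(scores))  # ([0, 1], 12)
```
]]></preformat>
        <p>The exhaustive search grows exponentially with sentence length, which is why an efficient maximum spanning tree algorithm is used in practice.</p>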
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiment and Results</title>
      <p>Our proposed enhancements to dependency parsing have been evaluated on nine
agglutinative languages, namely Estonian, Finnish, Hungarian, Indonesian, Japanese, Kazakh, Korean,
Turkish, and Uyghur. Table 1 lists some details of the treebanks used. Indonesian belongs to the
Austronesian language family. The official splits of the treebanks have been used. All scores in
this research are obtained on the test set by the associated model, which was trained on the
training set of the related treebank. The Uyghur UDT treebank has no validation set; thus, we have
used its test set to check that the model was not overfitting during training.</p>
      <p>Table 1: Details of the utilized treebanks.</p>
      <p>If there is no pre-trained ELECTRA language model (LM) for the relevant language, a BERT LM has been used; if a BERT LM does
not exist, ELMo has been employed to obtain pre-trained word vectors; and if there is no ELMo,
word2vec pre-trained vectors have been used. Table 3b shows which pre-trained word
vectors have been used for each language. For words that consist of multiple
word pieces in BERT and ELECTRA, we used only the vector of the
first word piece of each word and disregarded the remainder. We have used Xavier uniform initialization [18] with
the same random seed for all our experiments.</p>
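      <p>Selecting the first word piece of each word can be sketched as follows. The input layout, one vector per piece plus a parallel list of word indices, is an assumption for illustration; real tokenizers expose this alignment in their own ways.</p>
      <preformat><![CDATA[
```python
def first_piece_vectors(piece_vectors, word_ids):
    """Keep the vector of the first word piece of every word.

    piece_vectors[k] is the vector of piece k and word_ids[k] is the index
    of the word that piece k belongs to (hypothetical input layout).
    """
    selected, seen = [], set()
    for vec, wid in zip(piece_vectors, word_ids):
        if wid not in seen:
            seen.add(wid)
            selected.append(vec)
    return selected

# Two words split into three and two pieces (made-up 1-dim vectors).
vectors = [[0.1], [0.2], [0.3], [0.4], [0.5]]
word_ids = [0, 0, 0, 1, 1]
print(first_piece_vectors(vectors, word_ids))  # [[0.1], [0.4]]
```
]]></preformat>
      <p>Note that the vectors of the later pieces, which in agglutinative languages often correspond to suffixes, are dropped by this choice.</p>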
      <p>To obtain sub-token based word representations, the first ten sentence pieces and
the first twenty characters of each word have been used, and the rest have been ignored. During
training, one more layer of the pre-trained model is fine-tuned in each iteration, starting from its
final layer. The AdamW optimizer [24] is employed with a linear warm-up schedule.</p>
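      <p>The truncation limits and the layer-by-layer unfreezing can be sketched as below. The limits of ten pieces and twenty characters come from the text; the function names, the 0-based iteration count, and the choice to unfreeze exactly one layer at iteration 0 are assumptions.</p>
      <preformat><![CDATA[
```python
MAX_PIECES, MAX_CHARS = 10, 20  # limits stated in the text

def truncate_subtokens(pieces, chars):
    """Keep only the first ten sentence pieces and first twenty characters."""
    return pieces[:MAX_PIECES], chars[:MAX_CHARS]

def trainable_layers(num_layers, iteration):
    """Indices of pre-trained layers that are fine-tuned at `iteration`.

    Assumption: iteration 0 unfreezes only the final layer, and each later
    iteration unfreezes one more layer below it until all are trainable.
    """
    k = min(iteration + 1, num_layers)
    return list(range(num_layers - k, num_layers))

print(trainable_layers(12, 0))  # [11]
print(trainable_layers(12, 2))  # [9, 10, 11]
```
]]></preformat>
      <p>Unfreezing from the top down keeps the lower, more general layers fixed for longer, which is a common way to limit forgetting on small treebanks.</p>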
      <p>As evaluation metrics, the word-based unlabeled attachment score (UAS) and labeled
attachment score (LAS) are used. The CoNLL 2018 UD Shared Task evaluation script<sup>1</sup> has been used to
calculate UAS and LAS.</p>
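      <p>For intuition, the two metrics can be computed as in the sketch below when gold and predicted tokens are already aligned one-to-one. This is a simplified illustration with a made-up sentence; the shared task script additionally handles alignment between system and gold tokenization, which is not reproduced here.</p>
      <preformat><![CDATA[
```python
def attachment_scores(gold, pred):
    """gold, pred: lists of (head, label) pairs, one per word.
    Returns (UAS, LAS) as percentages."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and aligned")
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head correct
    las = sum(g == p for g, p in zip(gold, pred))        # head and label correct
    return 100.0 * uas / n, 100.0 * las / n

gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (3, "punct")]
print(attachment_scores(gold, pred))  # (75.0, 50.0)
```
]]></preformat>
      <p>LAS can never exceed UAS, since a labeled arc only counts when its head is also correct.</p>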
      <p>To show the impact of our proposed enhancements, namely sentence piece based word
representation and complementary sentence representation, we report the UAS and LAS of
three models. Our benchmark model uses sentence piece based word representation but not
sentence representation. The other two models are a model without sentence piece
based word representation and a model with sentence representation. Table 4 shows the
performance of our models and of some previous models trained with gold annotations
on the same treebanks: UDify [25], UDPipe 2.0 [26], and UDPipe 2.0 with BERT and Flair
pre-trained word embeddings [26].</p>
      <p>The UDify [25] model aims to provide a single parsing model for 75 languages in the UD
dataset, leveraging the multilingual BERT model, which was trained on the 104 languages with the
largest Wikipedias. This parser demonstrates that languages with minimal labeled data
can be parsed by using data from other languages. The encoder output was obtained using an
attention mechanism over the layers of the pre-trained model.</p>
      <p><sup>1</sup> The evaluation script can be downloaded from http://universaldependencies.org/conll18/conll18_ud_eval.py</p>
      <p>UDPipe 2.0 [27] is an NLP tool that also includes a dependency parser. Except for a few
minor differences, its architecture is nearly identical to that of our base parser. It uses a
character-based word representation obtained by bi-directional GRUs (gated recurrent units)
as its only sub-word level representation. It employs three forms of embeddings to represent
each input word: pre-trained word embeddings, trained word embeddings, and character-based
word embeddings. Straka et al. [26] looked into the impact of using both BERT and Flair word
vectors in UDPipe 2.0.</p>
      <p>The results show that sentence piece based word representation has improved parsing for all
languages tested other than Estonian and Kazakh. Sentence representation has improved
parsing performance for Estonian, Hungarian, Japanese, Korean, and Turkish. In Indonesian,
sentence representation has boosted the LAS while slightly decreasing the UAS. In Finnish,
Kazakh, and Uyghur, sentence representation has had a slightly unfavorable effect on the UAS
and the LAS. We have achieved higher scores than previously reported in [25, 26] for Estonian,
Finnish, Hungarian, Korean, and Turkish.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Conclusion</title>
      <p>In this study, we propose employing sub-word level sentence piece based word representation
and a sentence representation that stores the entire meaning of the sentence in order to boost
dependency parsing performance. Although the proposed improvements are applicable to
all languages, we evaluate their influence on a subset of languages, namely nine agglutinative
languages. We aim to alleviate the challenges that agglutinative languages pose for dependency
parsing through characteristics such as rich morpho-syntax and flexible word order.</p>
      <p>With the exception of Estonian and Kazakh, sentence piece based token encoding improves
parsing performance by capturing morphemes in all languages tested. Despite being an
agglutinative language, Estonian borrows about a third of its vocabulary from Germanic languages. We
think that this is why sentence piece based word encoding does not increase parsing accuracy
in this language. The result obtained for Kazakh is attributed to a data shortage, because the
Kazakh training set has just 31 sentences. With so little training data, parsing accuracy
diminishes as the number of learned parameters grows with each additional feature. In Estonian,
Hungarian, Japanese, Korean, Turkish, and partially Indonesian, employing sentence
representation as an additional feature improves parsing accuracy, because the entire meaning of the
sentence contributes to extracting syntactic information. We construct our sentence representation
by concatenating the last hidden states of the backward and forward directions of the bi-LSTM, as well
as ELECTRA or BERT-based sentence vectors where they are available. However, because
there are no publicly accessible ELECTRA or BERT pre-trained LMs for the Kazakh and Uyghur
languages, the sentence representations of both of these languages rely only on the final hidden
states of the backward and forward directions of the bi-LSTM. Additionally, the training data of these
languages are too small to yield a well-learned sentence representation. As a result,
using sentence representation in these languages is ineffective in improving parsing accuracy.
For Finnish, we obtained an unexpected result. Finnish has a large vocabulary because it is a
morphologically rich language. Because of the size of the vocabulary, the pre-trained
LM tokenizers of this language mostly split tokens into word pieces that represent
morphemes rather than words. We used only the vector of the first word piece of each
word when fine-tuning the BERT or ELECTRA LM, ignoring the remainder. We suspect that the
sentence vectors lose syntactic information because some of the disregarded word pieces carry
syntactic information. This might be why the sentence representation is unable to increase
parsing performance for Finnish.</p>
      <p>In conclusion, sub-word units and morpho-syntactic features are critical to identifying
the syntactic function of the word for agglutinative languages. Sentence piece based word
representation contributes to capturing morphemes of the word and enhances parsing accuracy.
Furthermore, with a few exceptions, sentence representation that stores the whole meaning of
the sentence increases parsing performance for the majority of languages.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We would like to thank Wiseborn M. Danquah and my dear wife Şeyma Altıntaş for their
insightful remarks, as well as the anonymous reviewers who took the time and
effort to review this research.</p>
      <p>[14] T. Dozat, C. D. Manning, Deep biaffine attention for neural dependency parsing, arXiv preprint arXiv:1611.01734 (2016).</p>
      <p>[15] T. Kudo, J. Richardson, Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing, arXiv preprint arXiv:1808.06226 (2018).</p>
      <p>[16] K. Clark, M.-T. Luong, Q. V. Le, C. D. Manning, Electra: Pre-training text encoders as discriminators rather than generators, arXiv preprint arXiv:2003.10555 (2020).</p>
      <p>[17] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).</p>
      <p>[18] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the thirteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings, 2010, pp. 249–256.</p>
      <p>[19] A. Virtanen, J. Kanerva, R. Ilo, J. Luoma, J. Luotolahti, T. Salakoski, F. Ginter, S. Pyysalo, Multilingual is not enough: Bert for finnish, arXiv preprint arXiv:1912.07076 (2019).</p>
      <p>[20] D. M. Nemeskey, Natural Language Processing Methods for Language Modeling, Ph.D. thesis, Eötvös Loránd University, 2020.</p>
      <p>[21] F. Ginter, J. Hajič, J. Luotolahti, M. Straka, D. Zeman, CoNLL 2017 shared task - automatically annotated raw texts and word embeddings, 2017. URL: http://hdl.handle.net/11234/1-1989, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.</p>
      <p>[22] K. Kim, Pretrained language models for korean, https://github.com/kiyoungkim1/LMkor, 2020.</p>
      <p>[23] W. Che, Y. Liu, Y. Wang, B. Zheng, T. Liu, Towards better UD parsing: Deep contextualized word embeddings, ensemble, and treebank concatenation, in: Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 55–64. URL: http://www.aclweb.org/anthology/K18-2005.</p>
      <p>[24] I. Loshchilov, F. Hutter, Fixing weight decay regularization in adam, 2018. URL: https://openreview.net/forum?id=rk6qdGgCZ.</p>
      <p>[25] D. Kondratyuk, M. Straka, 75 languages, 1 model: Parsing universal dependencies universally, arXiv preprint arXiv:1904.02099 (2019).</p>
      <p>[26] M. Straka, J. Straková, J. Hajič, Evaluating contextualized embeddings on 54 languages in pos tagging, lemmatization and dependency parsing, arXiv preprint arXiv:1908.07448 (2019).</p>
      <p>[27] M. Straka, Udpipe 2.0 prototype at conll 2018 ud shared task, in: Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 2018, pp. 197–207.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Dependency parsing based semantic representation learning with graph neural network for enhancing expressiveness of text-to-speech</article-title>
          ,
          <source>arXiv preprint arXiv:2104.06835</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Knowledge base question answering via encoding of complex query graphs</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2185</fpage>
          -
          <lpage>2194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Syntax-enhanced neural machine translation with syntax-aware word representations</article-title>
          ,
          <source>arXiv preprint arXiv:1905.02878</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lapata</surname>
          </string-name>
          ,
          <article-title>Syntax-aware semantic role labeling without parsing</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>343</fpage>
          -
          <lpage>356</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
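          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,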
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <article-title>Syntax-aware neural semantic role labeling</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>33</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>7305</fpage>
          -
          <lpage>7313</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dos Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zadrozny</surname>
          </string-name>
          ,
          <article-title>Learning character-level representations for part-of-speech tagging</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1818</fpage>
          -
          <lpage>1826</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sontag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>Character-aware neural language models</article-title>
          ,
          <source>in: Thirtieth AAAI conference on artificial intelligence</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>2741</fpage>
          -
          <lpage>2749</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Syllable-level neural language model for agglutinative language</article-title>
          ,
          <source>arXiv preprint arXiv:1708.05515</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
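          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Enriching word vectors with subword information</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          (
          <year>2017</year>
          )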
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tsarfaty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Seddah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kübler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Versley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Candito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
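          <string-name>
            <given-names>I.</given-names>
            <surname>Rehbein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tounsi</surname>
          </string-name>
          ,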
          <article-title>Statistical parsing of morphologically rich languages (spmrl) what, how and whither</article-title>
          ,
          <source>in: Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Eryiğit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Oflazer</surname>
          </string-name>
          ,
          <article-title>Statistical dependency parsing of turkish</article-title>
          , Sabanci University Research Database (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Ş. B.</given-names>
            <surname>Özateş</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Özgür</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Güngör</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Öztürk</surname>
          </string-name>
          ,
          <article-title>A hybrid approach to dependency parsing: Combining rules and morphology with deep learning</article-title>
          ,
          <source>arXiv preprint arXiv:2002.10116</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kudo</surname>
          </string-name>
          ,
          <article-title>Subword regularization: Improving neural network translation models with multiple subword candidates</article-title>
          ,
          <source>arXiv preprint arXiv:1804.10959</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>