<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Hybrid Approach to Part-of-Speech Disambiguation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>E. P. Bruches</string-name>
          <email>bruches@bk.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D. A. Karpenko</string-name>
          <email>dmitry.karpenko@onpositive.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V. A. Krayvanova</string-name>
          <email>krayvanova@yandex.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Altai State Technical University</institution>
          ,
          <addr-line>Barnaul</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>OnPositive</institution>
          ,
          <addr-line>Novosibirsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper a hybrid approach to part-of-speech disambiguation for Russian is presented. It combines the rule-based and the artificial-neural-network approaches. The two methods are applied independently, their results are compared, and a decision about the correct part of speech for the word-form is made. The main advantage of the presented algorithm is its ability to suggest several parts of speech when it is difficult to select the right one with a fair degree of confidence. This reduces the probability of error during syntactic and semantic analysis. A Java implementation of the proposed algorithm is published under the EPL.</p>
      </abstract>
      <kwd-group>
        <kwd>NLP</kwd>
        <kwd>part-of-speech disambiguation</kwd>
        <kwd>artificial neural networks</kwd>
        <kwd>morphological analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>One of the barriers to using NLP for practical tasks is the lack of a free high-quality parser. The final aim of our work is to create such a parser. This paper describes our solution to one of the problems we faced, namely Part-of-Speech (POS) ambiguity and the need for good POS-tagging.</p>
      <p>
        Over the last few years, machine-learning-based algorithms have been proposed in addition to the traditional approaches to disambiguation (statistical [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and rule-based [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). Stanford Tagger 2.0 uses a maximum-entropy cyclic dependency network [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Santos et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] suggested using an MLP with neural character embeddings. The tendency to apply machine-learning algorithms to this task is also observed for the Russian language. Malanin [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] applies heterogeneous neural networks to POS-tagging in Russian. Antonova and Solov'ev [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] present the results of a Conditional Random Fields (CRF) approach.
      </p>
      <p>Approaches based on a formal description of the rules for determining parts of speech require much manual labour. Neural networks and other statistical approaches allow these rules to be extracted implicitly from annotated corpora, but the question of the sufficiency of samples arises. Hence, in this paper we suggest our own approach to POS disambiguation, which combines artificial neural networks and a rule-based approach.</p>
    </sec>
    <sec id="sec-2">
      <title>Algorithm</title>
      <p>The algorithm takes Russian sentences as input. We have developed a tagging module based on a morphological dictionary. The dictionary is formed from the OpenCorpora (http://opencorpora.org/) and Wiktionary (https://ru.wiktionary.org/) data. The tags the dictionary contains are part of speech, gender, number, case, etc.; the full list can be found on our web-site (http://176.9.34.20:8080/com.onpositive.text.webview/materials/tagsets.html). At the initial stage the tagging module assigns morphological features to all the word-forms.</p>
      <p>Let Tags_w be the set of tags a word-form w is assigned. The pair T_w = ⟨w, Tags_w⟩ is the token of the word-form w. A sequence of tokens is the input of the POS homonymy disambiguation module. Let SP_0 ⊆ Tags_w be the set of POS tags the word-form is assigned. The following situations can occur as a result of the disambiguation algorithm.</p>
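      <p>The token structure above can be sketched as a small value type. This is an illustrative model only; the class and method names are ours, not those of the published Java implementation.</p>
      <preformat><![CDATA[
```java
import java.util.Set;

// A token pairs a word-form with the set of morphological tags the tagging
// module assigned to it: T_w = <w, Tags_w>. (Illustrative sketch.)
public record Token(String wordForm, Set<String> tags) {

    // POS ambiguity: the word-form carries more than one part-of-speech tag
    // (|SP_0| > 1), where SP_0 is the POS subset of Tags_w.
    public boolean isPosAmbiguous(Set<String> posInventory) {
        return tags.stream().filter(posInventory::contains).count() > 1;
    }
}
```
]]></preformat>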
      <sec id="sec-2-1">
        <p>1. A single POS tag for the word-form was detected.</p>
        <p>2. The set of POS tags was reduced.</p>
        <p>3. The set of POS tags initially assigned was kept.</p>
        <p>We apply the neural-network approach and the rule-based approach independently and then compare the results to make the final disambiguation decision. Let SP_r be the set of parts of speech chosen by the rule-based approach and SP_n the set chosen by the neural networks. If SP_r ∩ SP_n is not empty, the word-form is assigned all the tags from SP_new = SP_r ∩ SP_n; otherwise it is assigned all the tags from SP_new = SP_r ∪ SP_n. For some words with POS homonymy one of the parts of speech appears in the vast majority of cases, while the other is extremely rare. We have defined the POS tag for such words beforehand and apply a filtration algorithm to process them. The list of words filtration is used for can be found on the project web-site (http://176.9.34.20:8080/com.onpositive.text.webview/materials/filters.html).</p>
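        <p>The decision step above can be sketched as follows. This is an illustrative helper, not the published implementation; only the intersection-else-union logic comes from the text.</p>
        <preformat><![CDATA[
```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the decision step: intersect the rule-based candidates SP_r
// with the neural-network candidates SP_n; if the intersection is empty,
// fall back to the union. (Class and method names are illustrative.)
public final class PosCombiner {

    public static Set<String> combine(Set<String> spRules, Set<String> spNets) {
        Set<String> intersection = new HashSet<>(spRules);
        intersection.retainAll(spNets);
        if (!intersection.isEmpty()) {
            return intersection;          // SP_new = SP_r ∩ SP_n
        }
        Set<String> union = new HashSet<>(spRules);
        union.addAll(spNets);
        return union;                     // SP_new = SP_r ∪ SP_n
    }
}
```
]]></preformat>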
        <p>Let’s consider each approach by itself.</p>
        <p>There are several possible approaches to solving the task of POS disambiguation with neural networks. In the simplest case, one network with outputs corresponding to all the recognized parts of speech can be trained. However, such an architecture makes it theoretically possible to get a part of speech that is absent in the dictionary for the given word-form. Another possible approach is to train an individual neural network for every set of POS tags that occurs in the training set. The size of the training subsets for such networks is limited, and thus sufficient recognition quality cannot be guaranteed. Therefore we use a third approach: for every ambiguous pair of POS tags ⟨sp1, sp2⟩ two neural networks are trained. Network N(sp1, sp2) estimates the correctness of tag sp1 in the presence of the alternative tag sp2, and network N(sp2, sp1) estimates the reverse. For example, in Они ехали двадцать минут ('they were driving for twenty minutes') the homonymous word минут ('minutes'/'will pass') can be a noun or a verb; the right choice is a noun. Each network N(sp1, sp2) is a classifier with a single neuron in the output layer, which estimates the likelihood that the input vector corresponds to the part of speech sp1. Training sets for the neural networks are made from a part of an annotated text corpus from OpenCorpora. The main corpus of OpenCorpora does not have full POS disambiguation; the cases which have POS ambiguity in the dictionary and were disambiguated in the corpus are used for training. We found the words which have POS ambiguity in the dictionary and trained each network on such cases. The obtained cases are classified by parts of speech and homonymy type. For uncommon pairs ⟨sp1, sp2⟩ we are not able to collect a sufficient training set, so these homonymy cases are not examined in this approach.</p>
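        <p>The pairwise scheme can be sketched as follows, with the two networks stubbed as score functions. The network keying, method names and threshold handling here are our assumptions for illustration, not the published code.</p>
        <preformat><![CDATA[
```java
import java.util.Map;
import java.util.HashSet;
import java.util.Set;
import java.util.function.ToDoubleFunction;

// Sketch of the pairwise scheme: for an ambiguous pair <sp1, sp2>, network
// N(sp1,sp2) scores tag sp1 against alternative sp2 and N(sp2,sp1) scores
// the reverse; a tag is kept when its network's output exceeds a threshold.
// Networks are stubbed as score functions over the input vector.
public final class PairwiseSelector {

    public static Set<String> select(String sp1, String sp2,
                                     Map<String, ToDoubleFunction<double[]>> networks,
                                     double[] input, double threshold) {
        Set<String> kept = new HashSet<>();
        if (networks.get(sp1 + "|" + sp2).applyAsDouble(input) > threshold) kept.add(sp1);
        if (networks.get(sp2 + "|" + sp1).applyAsDouble(input) > threshold) kept.add(sp2);
        return kept;
    }
}
```
]]></preformat>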
        <p>The neural network analyses not the word-form itself but only its morphological image Tags_n(w) ⊆ Tags(w), which includes the following key features: part of speech, case, gender, number, transitivity and aspect. The analysed context consists of three words: the homonymous word w_i, the word w_{i-1} to its left and the word w_{i+1} to its right, where i is the word's number in the sentence. At the sentence boundary the parameters of the previous or following word are replaced with zeros. The case where a phrase consists of one word is not analysed.</p>
        <p>The network input is a vector of binary values (0 or 1), which is calculated as follows. The tags Tags_n(w) of a word-form w are converted into a binary vector B of length M, where M is the number of tags in the corresponding set. Let x_t be the binary value of tag t in some grammatical category; then x_t = 1 if t ∈ Tags(w) and x_t = 0 otherwise. For example, if x_t corresponds to the Nominative case, then x_t = 1 when the tagging module has assigned the Nominative case to the current word-form; if the word-form has another case or does not have this category at all, then x_t = 0. If more than one tag corresponds to the current word-form, i.e. under homonymy, all the binary values x_t corresponding to all the possible tags are set to 1 in the binary vector B. Let us consider an example of forming the data for the POS grammatical category. For simplicity, take a reduced list of possible parts of speech: noun, adjective, verb, adverb. Then the word трактор ('tractor'), which is unambiguously tagged as a noun, is associated with the binary vector ⟨1, 0, 0, 0⟩, and the word белила ('whitewash'), which can be a noun or a verb depending on the context, is associated with the binary vector ⟨1, 0, 1, 0⟩. As a result, the group of binary vectors ⟨B_{i-1}, B_i, B_{i+1}⟩ corresponds to the current word and its context. The numbering of word-forms in a sentence begins with 0. Four bits E = ⟨e1, e2, e3, e4⟩, which encode the word-form's position relative to the sentence boundaries, are added at the end of the group ⟨B_{i-1}, B_i, B_{i+1}⟩: if i = 0 then e1 = 1; if i = 1 then e2 = 1; if i = sn − 2 then e3 = 1; if i = sn − 1 then e4 = 1, where sn is the number of word-forms in the sentence. The obtained binary vector is given as input to all the neural networks designed to resolve the homonymy case in question. For example, for a Noun–Verb ambiguity, the two networks N(noun, verb) and N(verb, noun), designed for the noun and the verb respectively, are used. If the result of a neural network is above the specified threshold value, then the verified POS tag is supposed to be correct. The algorithm described above runs within the bounds of a sentence for all the homonymous words, from left to right. Based on the training-rate criterion and recognition quality, we use multilayer homogeneous neural networks with a sigmoidal activation function and the resilient backpropagation algorithm for training. The input layer has 133 neurons, each of the two hidden layers has 532 neurons, and the output layer has a single neuron.</p>
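        <p>The input encoding described above can be sketched as follows, using a toy four-tag inventory instead of the full tag set. The class and method names are illustrative assumptions, not the published code.</p>
        <preformat><![CDATA[
```java
import java.util.List;
import java.util.Set;

// Sketch of the input encoding: each of the three context words is turned
// into a 0/1 vector B over a fixed tag inventory, and four position bits
// e1..e4 are appended. The inventory here is a toy subset, not the full set.
public final class ContextEncoder {

    // Encode one word's tags as a 0/1 vector over the inventory.
    static double[] encodeWord(Set<String> tags, List<String> inventory) {
        double[] b = new double[inventory.size()];
        for (int t = 0; t < inventory.size(); t++) {
            b[t] = tags.contains(inventory.get(t)) ? 1.0 : 0.0;
        }
        return b;
    }

    // Encode <B_{i-1}, B_i, B_{i+1}> plus position bits e1..e4.
    // Words beyond the sentence boundary are passed as empty sets (all zeros).
    public static double[] encode(Set<String> prev, Set<String> cur, Set<String> next,
                                  int i, int sentenceLength, List<String> inventory) {
        int m = inventory.size();
        double[] v = new double[3 * m + 4];
        System.arraycopy(encodeWord(prev, inventory), 0, v, 0, m);
        System.arraycopy(encodeWord(cur, inventory), 0, v, m, m);
        System.arraycopy(encodeWord(next, inventory), 0, v, 2 * m, m);
        if (i == 0) v[3 * m] = 1.0;                      // e1: first word
        if (i == 1) v[3 * m + 1] = 1.0;                  // e2: second word
        if (i == sentenceLength - 2) v[3 * m + 2] = 1.0; // e3: next to last
        if (i == sentenceLength - 1) v[3 * m + 3] = 1.0; // e4: last word
        return v;
    }
}
```
]]></preformat>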
        <p>Our rule-based approach to POS disambiguation consists of two stages: applying the rules and testing the results with syntactic relations. The algorithm considers the word-form context within the bounds of a sentence and applies the rules to each word-form w_i for which the tagging module assigned more than one part of speech, i.e. |SP_0| &gt; 1. The rules are created manually and represent pairs C(⟨T⟩) → sp, where C(⟨T⟩) is a predicate which takes the context of the analysed word-form w_i as its input parameters; ⟨T⟩ is a set of tokens; and sp is the part of speech assigned to the analysed word-form w_i if the predicate is true on this word-form's context. Each rule is aimed at the choice between two parts of speech. All the rules are divided into several groups depending on the homonymy type they deal with. They are of two types:
1. Rules that require the presence of certain grammatical features in the word-form context (for example, one of the rules to disambiguate the word-form вести ('news'/'to lead') requires the presence of a preposition to define this word-form as a noun);
2. Rules that require the absence of certain grammatical features in the word-form context (for example, one of the rules to disambiguate the word-form том ('volume'/'that') requires the absence of a preposition to define this word-form as a noun).</p>
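        <p>A rule C(⟨T⟩) → sp can be sketched as a predicate paired with a part of speech. This is an illustrative model under our own assumptions (tokens reduced to plain tag sets, context passed as a list); the published rule format may differ.</p>
        <preformat><![CDATA[
```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Sketch of a rule C(<T>) -> sp: a predicate over the word-form's context
// and the part of speech assigned when the predicate holds. Tokens are
// modelled as plain tag sets here; names are illustrative.
public final class DisambiguationRule {

    public final Predicate<List<Set<String>>> condition; // C over context tokens
    public final String pos;                             // sp assigned if C holds

    public DisambiguationRule(Predicate<List<Set<String>>> condition, String pos) {
        this.condition = condition;
        this.pos = pos;
    }

    // Type 1 (presence): a preceding preposition selects "noun"
    // (as for вести, 'news'/'to lead').
    public static DisambiguationRule prepositionBeforeNoun() {
        return new DisambiguationRule(
            ctx -> ctx.get(0).contains("preposition"), "noun");
    }

    // Type 2 (absence): no preceding preposition selects "noun"
    // (as for том, 'volume'/'that').
    public static DisambiguationRule noPrepositionBeforeNoun() {
        return new DisambiguationRule(
            ctx -> !ctx.get(0).contains("preposition"), "noun");
    }
}
```
]]></preformat>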
        <p>The obtained sets of rules can be found on the web-site (http://176.9.34.20:8080/com.onpositive.text.webview/materials/rules.html). Before the analysis, the tokens Tw_{i-1}, Tw_i, Tw_{i+1} are divided by parts of speech and all the possible sets of two and three tokens are formed. Let Tsp_i denote the POS tags from token Tw_i. According to the number of analysed tokens, the rules are divided into:
triplet: ⟨T⟩ = ⟨Tsp_{i-1}, Tsp_i, Tsp_{i+1}⟩;
paired prefix: ⟨T⟩ = ⟨Tsp_{i-1}, Tsp_i⟩;
paired postfix: ⟨T⟩ = ⟨Tsp_i, Tsp_{i+1}⟩;
paired neutral: ⟨T⟩ = ⟨Tsp_i, Tsp_j⟩, where Tsp_j = Tsp_{i+1} or Tsp_j = Tsp_{i-1}.
For example, if Tw_{i-1} has two possible parts of speech and Tw_i has three, then 2 * 3 = 6 possible combinations are produced for the prefix and neutral types of rules. All the possible combinations for ⟨Tsp_{i-1}, Tsp_i⟩ are given as input to all the prefix and neutral rules, and all the possible combinations for ⟨Tsp_i, Tsp_{i+1}⟩ are given as input to all the postfix and neutral rules. The POS tags from the initial set SP_0 of the word-form w_i which are included in phrases satisfying at least one of the rules are kept. At the beginning and at the end of a sentence only the rules with two arguments are used, as for sentences consisting of two words. We do not apply this approach to sentences consisting of a single word. After the word-form has been assigned the set of POS tags SP_r, the result is tested using syntactic rules as follows. There is a set of syntactic relations which can be formed by words with a certain POS tag. If a word with the assigned POS tag sp ∈ SP_r is able to form a syntactic relation in the sentence, then sp is supposed to be correct. If some of the POS tags are verified by the syntactic rules, the other POS tags are removed. If no POS tag is verified, the algorithm proposes all the POS tags assigned by the rules.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation and Results</title>
      <p>The quality of the proposed algorithm was evaluated on two disambiguated text corpora: OpenCorpora (the part not used for training) and RusCorpora (http://www.ruscorpora.ru/). To evaluate the quality for words with POS homonymy, we computed the precision of partial disambiguation (at least one of the assigned tags is correct) by formula (1):</p>
      <p>P = W_semiAmbig / W_Ambig,   (1)</p>
      <p>where W_semiAmbig is the number of words for which SP_new ∩ SP_etalon ≠ ∅, SP_new is the POS-tag set formed for the current ambiguous word-form as a result of running the algorithm, SP_etalon is the POS-tag set the word-form is assigned in the corpus, and W_Ambig is the number of words initially assigned more than one POS tag. Another characteristic of the implementation quality is the accuracy over all the words (including words without homonymy), computed by formula (2):</p>
      <p>Acc = W_correct / W_total,   (2)</p>
      <p>where W_correct is the number of words for which SP_new ∩ SP_etalon ≠ ∅ and W_total is the total number of words. We also calculated how much the size of the set of assigned POS tags decreased after processing. Corpora sizes and evaluation results are shown in table 1; the last column shows what percentage of extra POS tags was removed. The considerable divergence in results can be explained by the fact that these corpora use different tag sets and, moreover, different approaches to determining parts of speech.</p>
      <p>
        For comparison, the trigram model described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] has a precision of 97.18% on the part-of-speech tagging task. M-WANN-Tagger, presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], copes with the same task for Russian with a precision of 97.01%. The authors of [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] achieved a precision of 96.56% using heterogeneous neural networks.
      </p>
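      <p>The two measures can be sketched as follows. This is a hypothetical helper operating on per-word tag sets; only the definitions of P and Acc come from the text.</p>
      <preformat><![CDATA[
```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the evaluation measures: partial-disambiguation precision P over
// initially ambiguous words, and accuracy Acc over all words. A prediction
// counts as correct when SP_new ∩ SP_etalon is non-empty.
public final class Evaluation {

    static boolean overlaps(Set<String> predicted, Set<String> gold) {
        Set<String> common = new HashSet<>(predicted);
        common.retainAll(gold);
        return !common.isEmpty();
    }

    // P = W_semiAmbig / W_Ambig over words whose initial POS set had >1 tag.
    public static double precision(List<Set<String>> initial,
                                   List<Set<String>> predicted,
                                   List<Set<String>> gold) {
        int ambiguous = 0, correct = 0;
        for (int i = 0; i < predicted.size(); i++) {
            if (initial.get(i).size() > 1) {
                ambiguous++;
                if (overlaps(predicted.get(i), gold.get(i))) correct++;
            }
        }
        return (double) correct / ambiguous;
    }

    // Acc = W_correct / W_total over all words.
    public static double accuracy(List<Set<String>> predicted,
                                  List<Set<String>> gold) {
        int correct = 0;
        for (int i = 0; i < predicted.size(); i++) {
            if (overlaps(predicted.get(i), gold.get(i))) correct++;
        }
        return (double) correct / predicted.size();
    }
}
```
]]></preformat>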
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The proposed approach to part-of-speech disambiguation for Russian combines machine learning and an expert system. The results revealed that the algorithm copes well with notional parts of speech and is less effective with functional ones; improving the latter is one of the priorities of our future work. We are also going to extend the algorithm to multi-word expressions and unknown words and to take punctuation into account. The Java implementation of the proposed algorithm is available on our web-site.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Sokirko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toldova</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>The comparison of two methods of lexical and morphological disambiguation for Russian</article-title>
          . Internet-mathematics (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Brill</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A Simple Rule-Based Part of Speech Tagger</article-title>
          .
          <source>In: Proceedings of ANLC'92</source>
          . (
          <year>1992</year>
          )
          <fpage>152</fpage>
          <lpage>155</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Text Processing, 12th International Conference, Part I. Lecture Notes in Computer Science</source>
          <volume>6608</volume>
          . (
          <year>2011</year>
          )
          <fpage>171</fpage>
          <lpage>189</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zadrozny</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Learning character-level representations for part-of-speech tagging</article-title>
          .
          <source>In: Proceedings of the 31st International Conference on Machine Learning</source>
          , JMLR: W&amp;CP. (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Malanin</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Part-of-Speech tagging with heterogeneous neural networks and a priori information</article-title>
          .
          <source>Youth scientic-technical journal (12)</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Antonova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solov'ev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Conditional Random Fields in NLP-related Tasks for Russian</article-title>
          .
          <source>Information Technologies and Systems</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Carneiro</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>França</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lima</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Multilingual part-of-speech tagging with weightless neural networks</article-title>
          .
          <source>Neural Networks (16)</source>
          (
          <year>2015</year>
          )
          <fpage>11</fpage>
          <lpage>21</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>