<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reproducing Russian NER Baseline Quality without Additional Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valentin Malykh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexey Ozerin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Systems Analysis of Russian Academy of Sciences</institution>
          ,
          <addr-line>9, pr. 60-letiya Oktyabrya, Moscow, 117312</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Laboratory of Neural Systems and Deep Learning, Moscow Institute of Physics and Technology (State University)</institution>
          ,
          <addr-line>9, Institutskiy per., Dolgoprudny, Moscow Region, 141700</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>54</fpage>
      <lpage>59</lpage>
      <abstract>
<p>Baseline solutions for the named entity recognition task in the Russian language were published a few years ago. These solutions rely heavily on additional data, such as databases, and on different kinds of preprocessing. Here we demonstrate that it is possible to reproduce the quality of the existing database-based solution with a character-aware neural net trained on the corpus itself only.</p>
      </abstract>
      <kwd-group>
        <kwd>named entity recognition</kwd>
        <kwd>character awareness</kwd>
        <kwd>neural nets</kwd>
        <kwd>multitasking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        The first results for character-based named entity recognition in the English language
were presented in the early 2000s [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A close idea of character-based named entity
tagging was introduced in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for the Portuguese and Spanish languages, but, unlike that work, our
model does not use convolutions. For English-language text classification
(a task close to named entity recognition), a character-aware architecture
was described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]; it is also based on convolutions, so it differs in principle from
our model. Previous research for the Russian language had been based not on
characters but on words [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The state-of-the-art solution on the public corpus with named
entity markup [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is also word-level.
      </p>
      <p>
        One of the core ideas for our model comes from the character-aware neural
nets introduced recently in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Another idea, that of matching sequences to train an
artificial neural net to capture text structure, comes from [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Our
solution is based on multi-task learning, which was introduced for natural
language processing tasks in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Model</title>
      <p>
        The architecture of our recurrent neural network is inspired by [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The network
consists of long short-term memory units, which were initially proposed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
There are two main differences from the Yoon Kim setup [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The first one is that our
model predicts two things instead of one:
– the next character,
– a markup label for the current character.
      </p>
      <p>
        The second one is that we do not use convolutions, so we exploit only the character
concept inside our architecture, not the word concept. We assume that the model can
learn the concept of a word from the data, and we rely on this assumption when
measuring quality. Prediction errors and gradients are calculated, and the weights
are then updated by truncated back-propagation through time [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
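      <p>To make the two-headed setup concrete, below is a minimal sketch of such a model in PyTorch. This is an illustration only, not the authors' code: the class name, layer sizes, and number of layers are our assumptions.</p>
      <preformat>
import torch.nn as nn

class CharTagger(nn.Module):
    """Character-level LSTM with two softmax heads: one predicts the
    next character, the other a markup label for the current character.
    Sizes and depth are illustrative guesses, not the paper's settings."""

    def __init__(self, n_chars, n_labels, emb_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=2, batch_first=True)
        self.char_head = nn.Linear(hidden_dim, n_chars)    # next-character logits
        self.label_head = nn.Linear(hidden_dim, n_labels)  # markup-label logits

    def forward(self, chars):                # chars: (batch, time) character ids
        h, _ = self.lstm(self.embed(chars))  # hidden state h_t at every position
        return self.char_head(h), self.label_head(h)
      </preformat>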
      <sec id="sec-3-1">
        <title>Mathematical formulation</title>
        <p>Let h_t be the state of the last neural net layer before the softmax transformations (the hidden state). The probabilities are predicted by a standard softmax over the set of characters C and the set of markup labels M:</p>
        <disp-formula id="eq1">
          <tex-math>\Pr(c_{t+1} \mid c_{1:t}) = \frac{\exp(h_t \cdot p^1_j + q^1_j)}{\sum_{j' \in C} \exp(h_t \cdot p^1_{j'} + q^1_{j'})} \qquad (1)</tex-math>
        </disp-formula>
        <disp-formula id="eq2">
          <tex-math>\Pr(m_t \mid c_{1:t}) = \frac{\exp(h_t \cdot p^2_i + q^2_i)}{\sum_{i' \in M} \exp(h_t \cdot p^2_{i'} + q^2_{i'})} \qquad (2)</tex-math>
        </disp-formula>
        <p>Here p^1_j is the j-th column of the character output embedding matrix P^1 \in R^{k \times |C|} and q^1_j is a character bias term; p^2_i is the i-th column of the markup output embedding matrix P^2 \in R^{l \times |M|} and q^2_i is a markup bias term; k and l are the character and markup embedding vector lengths.</p>
        <p>The final negative log-likelihood (NLL) is computed over the test corpus of
length T:</p>
        <disp-formula id="eq3">
          <tex-math>NLL = -\sum_{t=1}^{T} \big( \log \Pr(c_{t+1} \mid c_{1:t}) + \log \Pr(m_t \mid c_{1:t}) \big) \qquad (3)</tex-math>
        </disp-formula>
        <p>A diagram of our model can be found in Figure 1.</p>
      </sec>
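      <p>As a sketch of how the objective in (3) can be computed for the two heads, the snippet below sums the two cross-entropy terms; it assumes the illustrative CharTagger model above, and the batching details are our assumptions rather than the authors' code.</p>
      <preformat>
import torch.nn.functional as F

def joint_nll(model, chars, labels):
    """chars: (batch, T+1) character ids; labels: (batch, T) markup ids.
    Returns the sum of the next-character and markup NLL terms of Eq. (3)."""
    char_logits, label_logits = model(chars[:, :-1])  # condition on c_{1:t}
    nll_char = F.cross_entropy(                       # -log Pr(c_{t+1} | c_{1:t})
        char_logits.reshape(-1, char_logits.size(-1)),
        chars[:, 1:].reshape(-1), reduction="sum")
    nll_label = F.cross_entropy(                      # -log Pr(m_t | c_{1:t})
        label_logits.reshape(-1, label_logits.size(-1)),
        labels.reshape(-1), reduction="sum")
    return nll_char + nll_label
      </preformat>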
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>
        The corpus parameters are presented in Table 1; more details on the corpus can be found
in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The corpus can be obtained from the authors of the original paper by sending a
request to gareev-rm@yandex.ru or to any other author of the original paper.
      </p>
      <p>
        Similarly to [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we perform 5-fold cross-validation with the precision (P), recall
(R), and F-measure (F) metrics. The results of the experiments are presented in
Table 2. Since we are working with characters, we cannot directly use the labelling
our system produces for characters, so we parse the produced markup for every
token (token boundaries are known to us from the corpus) and take the label of the
majority of the characters in the token as the token label.
      </p>
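      <p>A minimal sketch of this majority-vote conversion from character labels to token labels (token boundaries come from the corpus; the function name and data layout are illustrative assumptions):</p>
      <preformat>
from collections import Counter

def token_labels(char_labels, token_spans):
    """char_labels: one predicted label per character of the text.
    token_spans: (start, end) character offsets of each known token.
    Each token receives the label of the majority of its characters."""
    labels = []
    for start, end in token_spans:
        counts = Counter(char_labels[start:end])
        labels.append(counts.most_common(1)[0][0])
    return labels
      </preformat>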
      <fig id="fig1">
        <caption>
          <p>Fig. 1. Model diagram: character embeddings feed RNN layers followed by softmax and sampling; the example input in the figure is the word "Mandelbrot".</p>
        </caption>
      </fig>
    </sec>
    <sec id="sec-5">
      <title>Comparison</title>
      <p>The results of the comparison are presented in Tables 3, 4, and 5.</p>
      <p>On the person token class our system performed better than the CRF-based one
on all metrics, both in mean value and in standard deviation. On the organisation
class our system is better in recall and comparable in F-measure to the CRF
model. Overall, our system was on par with the knowledge-base approach
in F-measure and with the CRF model in recall.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We applied a character-aware RNN model with LSTM units to the problem of
named entity recognition in the Russian language. Even without any preprocessing
or supplementary data from an external knowledge base, the model was able to
learn a solution end-to-end from the corpus with markup. The results demonstrated
by our approach are on the level of the existing state of the art in the field.</p>
      <p>The main weakness of the proposed model is the differentiation between person and
organization tokens. This is due to the small size of the corpus. A possible
solution is pre-training on a large corpus such as Wikipedia, without any markup,
just to train the internal distributed representations of a language model. We presume
that such pre-training would allow the RNN to beat the CRF model.</p>
      <p>
        Another direction of our future work is the addition of attention, as it has been
demonstrated to improve performance on character-level sequence tasks [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smarr</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>Named entity recognition with character-level models</article-title>
          .
          <source>In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, Volume 4</source>
          , Association for Computational Linguistics (
          <year>2003</year>
          )
          <fpage>180</fpage>
          –
          <lpage>183</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>dos Santos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guimaraes</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Boosting named entity recognition with neural character embeddings</article-title>
          .
          <source>In: Proceedings of NEWS 2015 The Fifth Named Entities Workshop</source>
          . (
          <year>2015</year>
          )
          <fpage>25</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Character-level convolutional networks for text classification</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . (
          <year>2015</year>
          )
          <fpage>649</fpage>
          –
          <lpage>657</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Popov</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirilov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Creation of reusable components and language resources for named entity recognition in Russian</article-title>
          .
          <source>In: Conference on Language Resources and Evaluation</source>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gareev</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tkachenko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solovyev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simanovsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ivanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Introducing baselines for russian named entity recognition</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Text Processing</source>
          . Springer (
          <year>2013</year>
          )
          <fpage>329</fpage>
          –
          <lpage>342</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Alternative structures for character-level RNNs</article-title>
          .
          <source>arXiv preprint arXiv:1511.06303</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jernite</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sontag</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rush</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          :
          <article-title>Character-aware neural language models</article-title>
          .
          <source>arXiv preprint arXiv:1508.06615</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>Sequence to sequence learning with neural networks</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . (
          <year>2014</year>
          )
          <fpage>3104</fpage>
          –
          <lpage>3112</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Collobert</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A unified architecture for natural language processing: Deep neural networks with multitask learning</article-title>
          .
          <source>In: Proceedings of the 25th International Conference on Machine Learning</source>
          , ACM (
          <year>2008</year>
          )
          <fpage>160</fpage>
          –
          <lpage>167</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          <volume>9(8)</volume>
          (
          <year>1997</year>
          )
          <fpage>1735</fpage>
          –
          <lpage>1780</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Generating sequences with recurrent neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1308.0850</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Golub</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Character-level question answering with attention</article-title>
          .
          <source>arXiv preprint arXiv:1604.00727</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>