<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Russian Named Entities Recognition and Classification Using Distributed Word and Phrase Representations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roman Ivanitskiy</string-name>
          <email>litemn@yandex.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Shipilo</string-name>
          <email>alexandershipilo@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liubov Kovriguina</string-name>
          <email>lyukovriguina@corp.ifmo.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ITMO University</institution>
          ,
          <addr-line>Saint-Petersburg</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Saint-Petersburg State University</institution>
          ,
          <addr-line>Saint-Petersburg</addr-line>
          ,
          <country country="RU">Russia</country>
          ,
          <institution>ITMO University</institution>
          ,
          <addr-line>Saint-Petersburg</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>150</fpage>
      <lpage>156</lpage>
      <abstract>
        <p>The paper presents results on Russian named entity classification and on retrieval of equivalent named entities using word and phrase representations. It is shown that the context vector of a word or an expression is an efficient feature for predicting the type of a named entity. Distributed word representations are now claimed, on a reasonable basis, to be one of the most promising distributional semantics models. In the described experiment on retrieving similar named entities, the results go further than retrieving named entities of the same type or named entity individuals of the same class: it is shown that equivalent variants of a named entity can be extracted. This result contributes to the task of unsupervised clustering of entities and semantic relations and can be used for paraphrase search and automatic ontology population. The models were trained with word2vec on the Russian segment of parallel corpora used for statistical machine translation. Vector representations were constructed and evaluated for words, lexemes and noun phrases.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The model of distributed word and phrase representations introduced by Mikolov et al. in 2013 <xref ref-type="bibr" rid="ref7">(Mikolov et al., 2013)</xref> has proved its efficiency on a variety of languages and tasks in natural language processing and has received a number of extensions since its appearance. It provides a faster and more accurate implementation of models relying on the basic idea of distributional semantics known as “similar words occur in similar contexts”. Mikolov et al. have shown that “word representations computed using neural networks are very interesting because the learned vectors explicitly encode many linguistic regularities and patterns, and ... many of these patterns can be represented as linear translations” <xref ref-type="bibr" rid="ref7">(Mikolov et al., 2013)</xref>. This paper presents the results of applying word2vec (a group of models and software for unsupervised learning of word representations) to a traditional NLP task, named entity recognition (NER), for the Russian language. Results on NER classification can contribute to the pool of evaluation data and extend existing distributional semantic models for Russian, e.g., RusVectores (http://ling.go.mail.ru/dsm/en/).
      </p>
      <p>NER recognition and classification can be successfully performed using a large number of techniques and resources, especially Semantic Web technologies and knowledge bases such as DBpedia (http://wiki.dbpedia.org/), which provides semantic search over billions of entities. DBpedia Spotlight (http://spotlight.dbpedia.org/), a tool for automatically annotating mentions of DBpedia resources in text, can sidestep the problem of NER annotation for newswire corpora, nonfiction corpora, datasets of medical records, etc. However, some genres of human discourse produce texts that lack such resources and demand considerable annotation effort: spoken language gives plenty of examples of occasional abbreviations and unpredictable distortions of personal names, toponyms and organization names. Moreover, there has been recent activity on paraphrase search. This determined our interest in analyzing the response of a trained word2vec model given a named entity as a stimulus. Before applying word2vec to spoken corpora, we decided to test its ability to cluster named entities with the same label and to extract semantic equivalents for a given named entity on the Russian segment of parallel corpora used for machine translation.</p>
      <p>Two experiments are described in the paper. The first one trains an SVM classifier on the FactRuEval (http://github.com/dialogue-evaluation/factRuEval-2016) training dataset; the second experiment analyses lists of entities with the highest value of the cosine measure with the named entity-stimulus. Both experiments are done on 4 trained models: models 1 and 2 were trained on a 1 billion token corpus (word forms and lexemes respectively), and models 3 and 4 were trained on a 100 million token corpus (a subset of the larger one) which has been annotated with noun phrases to extend word representations to noun phrase representations.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        There exists a considerable number of studies on
NER on English texts evaluating various types
of algorithms, but Russian NER has been mostly
done using rule-based algorithms and pattern
matching, whereas recent studies focus on word embeddings as a feature for training NER classifiers
        <xref ref-type="bibr" rid="ref12">(Turian et al., 2010)</xref>
        , on news corpora
        <xref ref-type="bibr" rid="ref10">(Siencnik, 2015)</xref>
        ,
        <xref ref-type="bibr" rid="ref9">(Seok et al., 2016)</xref>
        , microblog posts
        <xref ref-type="bibr" rid="ref2">(Godin et al., 2014)</xref>
        ,
        <xref ref-type="bibr" rid="ref14 ref5">(Kisa and Karagoz, 2015)</xref>
        ,
CoNLL 2003 Shared Task Corpus and Wikipedia
articles.
      </p>
      <p>
        Segura-Bedmar et al.
        <xref ref-type="bibr" rid="ref8">(Segura-Bedmar et al.,
2015)</xref>
        describe a machine learning approach that uses word embedding features to recognize drug names in biomedical texts. They trained the word2vec tool on two different corpora, Wikipedia and MedLine, aiming to study the effectiveness of using word embeddings as features to improve the performance of a NER system. To evaluate the approach and compare it with previous work, they ran a series of experiments on the dataset of SemEval-2013 Task 9.1 (Drug Name Recognition).
Demir and Ozgur
        <xref ref-type="bibr" rid="ref1 ref13">(Demir and Ozgur, 2014)</xref>
        developed a fast unsupervised method for learning continuous vector representations of words and used these representations along with language-independent features to develop a NER system. They evaluated the system on the highly inflectional Turkish and Czech languages. The Turkish datasets contained 63.72M sentences corresponding to a total of 1.02B words and 1.36M hapax legomena. Publicly available data crawled from Czech news sites, provided by the ACL machine translation workshop, were used for the Czech language. This dataset contained 36.42M sentences corresponding to 635.99M words and 906K hapax legomena.
      </p>
      <p>
        A number of papers describe experiments that
go beyond word representations and ”construct
phrase embeddings by learning how to
compose word embeddings using features that
capture phrase structure and context”
        <xref ref-type="bibr" rid="ref14 ref5">(Yu and Dredze,
2015)</xref>
        ,
        <xref ref-type="bibr" rid="ref6">(Lopyrev, 2014)</xref>
        . However, the “phrase” notion in these works is quite vague and varies considerably. Yin and Schütze stress that “generalized phrases ... include conventional linguistic phrases as well as skip-bigrams. ... Socher et al. use the term ‘word sequence’. Mikolov et al. use the term ‘phrase’ for word sequences that are mostly frequent continuous collocations” <xref ref-type="bibr" rid="ref1 ref13">(Yin and Schütze, 2014)</xref>. For the purposes of the described experiment, accurate noun phrase extraction is crucial, because items of a noun phrase can be rare words while the whole phrase occurs in frequent contexts (on processing rare words in distributed word representation models see <xref ref-type="bibr" rid="ref3">(Guthrie et al., 2006)</xref>).
      </p>
    </sec>
    <sec id="sec-3">
      <title>Data Preparation</title>
      <p>Four datasets were built to train distributed word representations on the basis of the FactRuEval training dataset and the Russian parts of parallel corpora used to train statistical machine translation systems (http://www.statmt.org/). The list of all used corpora is given below:
• Russian subcorpus of Multilingual UN Parallel Text 2000–2009,
• Europarl,
• News,
• FactRuEval,
• Russian subcorpus of Yandex parallel corpus,
• Russian subcorpus of Czech-English-Russian parallel corpus.
The total size of these corpora is 1 billion tokens.</p>
      <p>The datasets will from now on be referred to as Dataset 1, Dataset 2, Dataset 3 and Dataset 4. They were used to train word2vec models with the same indices. Basic preprocessing included removal of xml/html tagging, timestamps and URLs.</p>
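      <p>The basic preprocessing described above can be sketched as follows; the regular expressions are illustrative assumptions, not the authors' exact cleanup rules (xml/html tag stripping is omitted here):

```python
import re

# Assumed patterns for the cleanup step: URLs and ISO-style timestamps are
# removed, and whitespace is normalized afterwards.
URL = re.compile(r"https?://\S+|www\.\S+")
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?")

def clean_line(line):
    """Remove URLs and timestamps from one corpus line."""
    line = URL.sub(" ", line)
    line = TIMESTAMP.sub(" ", line)
    return " ".join(line.split())  # collapse the leftover whitespace

print(clean_line("Meeting 2008-05-01 12:00 see http://www.statmt.org/ page"))
# prints: Meeting see page
```
</p>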
      <p>Dataset 1. This corpus is built of the word forms of the 1 billion token corpus and has had no linguistic preprocessing except tokenization. The training entity is the word form.</p>
      <p>Dataset 2. This is the lemmatized 1 billion token corpus. Tagging was performed using the Mystem morphological analyzer (https://tech.yandex.ru/mystem/), which supports homonymy resolution. The training entity is the lexeme.</p>
      <p>Dataset 3. This is a 100 million token subcorpus of the above corpus. The training entities are word forms and noun phrases.</p>
      <p>Dataset 4. This is the lemmatized 100 million token subcorpus of the above corpus. The training entities are lexemes and noun phrases (also represented by lexemes).</p>
      <p>
        Noun phrase extraction for corpora 3 and 4. For the given task, a noun phrase may include more than one named entity; therefore, to provide equal context probability, smaller noun phrases were extracted from the complex ones (e.g. the string “Government of Krasnoyarsk Krai” (label: organization) is represented by the whole noun phrase and by its smaller part, the noun phrase “Krasnoyarsk Krai” (label: location)). In these cases sentences are duplicated in the corpus for each embedded noun phrase. Noun phrases are extracted using the following procedure:
• input sentences are tokenized, tagged and parsed using the SemSin syntactic parser, which produces a labelled syntactic tree for the input sentence <xref ref-type="bibr" rid="ref4">(Kanevsky and Boyarsky, 2012)</xref>;
• the NP extraction algorithm finds all word sequences depending on every noun within the sentence and writes each such sequence as a candidate noun phrase;
• candidate noun phrases that contain no uppercase characters are filtered out.
      </p>
      <sec id="sec-3-1">
        <title>Evaluation Procedure</title>
        <p>System performance was evaluated using the above-mentioned manually tagged FactRuEval test dataset. It has 3 basic types of named entities: names of persons, organizations and locations. For the first experiment, a string containing a named entity was sent to the classifier, which produced its label. For datasets 1 and 2 the evaluation dataset was cut down to named entities represented by single word forms/lexemes; datasets 3 and 4 were evaluated on the whole test set (see results in Tables 2–5 of Section 6). For the second experiment, named entities from the FactRuEval training dataset were used as stimuli. For datasets 1 and 2 the stimuli list included only unigrams, and for datasets 3 and 4 the list was built of 20% unigrams and 80% noun phrases of length from 2 to 5. Each stimulus was fed to the trained word2vec model, which generated a response list of the 10 NE candidates having the highest cosine measures. Candidate NEs were manually tagged as true if the candidate was a named entity of the same class as the stimulus, and false otherwise. Evaluation results are presented in Table 6, Section 6.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Experiment Setup</title>
        <p>The overall architecture of the system can be seen in Fig. 1. Software used includes the open source word2vec toolkit (https://code.google.com/archive/p/word2vec), Java libraries for word2vec (http://deeplearning4j.org/), Weka (http://www.cs.waikato.ac.nz/ml/weka/) and the NLP software mentioned in Section 3.</p>
        <p>The workflow of both experiments comprises the following steps:
1. Data collection and cleansing;
2. Linguistic processing of the data (tokenization, sentence segmentation, tagging, parsing);
3. NP extraction;
4. Model training and evaluation on word forms (trained model 1);
5. Model training and evaluation on lexemes (trained model 2);
6. Model training and evaluation on noun phrases (trained models 3 and 4);
7. Building stimuli lists for each model;
8. Experiment 1 on NE classification.</p>
        <p>Experiment 1 detailed plan. An SVM classifier was trained on the FactRuEval training set. NE word2vec vectors were used as feature vectors (the dimension was set to 200). The FactRuEval test set was used to test the classifier, which is sent a NE unigram or a NE noun phrase and returns its label.</p>
        <p>Experiment 2 detailed plan. Unigrams and noun phrases from the stimuli lists were sent to the trained word2vec models. Each model returned a list of the 10 best candidates for each stimulus; for models 3 and 4 these lists included both words and phrases. The percentage of named entities having the same label as the stimulus was counted.</p>
        <p>
          Results and discussion. Experiment 1: NE label prediction evaluated on the FactRuEval training and test datasets. Figures 2–5 below show the output of the SVM classifier after dimensionality reduction using the t-SNE algorithm (https://lvdmaaten.github.io/tsne/) for all 4 trained models; in Figures 2–5, 0 corresponds to organizations, 1 to locations and 2 to names of persons. The distribution of NE labels conforms with the well-known fact that in many cases it is difficult or impossible to distinguish organizations and locations. Classification quality was evaluated with the f-score measure; results are given in Tables 2–5. The system shows competitive quality in comparison to other machine learning or rule-based algorithms developed for the Russian language, according to the report provided by the FactRuEval committee in 2016 <xref ref-type="bibr" rid="ref11">(Starostin et al., 2016)</xref>, see Table 1. In Table 1 minimum and maximum values for precision, recall and f-score are given; average values for the performance of the 13 NER systems that took part in the competition are given in round brackets. If we compare state-of-the-art performance with the performance of the described system (for model 4), based on the distributed word representations approach, we can see that the system shows average results for locations (0.86 f-score) and persons (0.89 f-score) and outperforms state-of-the-art systems in retrieving organizations (0.79 vs 0.68 f-score). NE unigrams are classified with very high f-scores (0.99, 0.96 and 0.97 for persons, locations and organizations respectively, according to model 2). It can be seen from Figure 3 that the points corresponding to the three NE types interfere less, showing better classification results. This is a common feature of models 2 and 4, which were both trained on datasets containing lemmas, whereas for models 1 and 3 (see Figures 2 and 4), trained on datasets with word forms, the areas corresponding to each NE type are very vague. Persons' names are classified with the highest f-score in all 4 models, which is quite predictable, because distinguishing between locations and organizations is sometimes a non-trivial task (e.g. it cannot always be made clear from the context whether a social institution (organization) or the building it occupies (location) is mentioned). Both for NE single words and NE phrases, the results show the importance of lemmatization before computing word embeddings for inflectional languages with rich morphology, like Russian, even when a large corpus is used.
        </p>
        <p>
          For the second experiment, quality was evaluated with the f-score measure; the percentage of true positives is given in Table 5. The overall quality is not high, but it is still possible to find and predict the class of unlabelled named entities whose vectors have a high cosine measure with the vector of a labelled NE. Trained model 1 produces high f-score values due to evaluation limitations: in models 1 and 2 only unigrams are considered. Consequent comparison of trained models 2–4 confirms that quality improves when noun phrases are predicted.
        </p>
        <p>Given a word or a phrase, word2vec is capable of retrieving linguistic units that are involved in some semantic relation with the given one: synonyms, items of the same paradigmatic class, associations. But what can be found in the semantic similarity space of a named entity? In this experiment it is assumed that among the words and phrases whose vectors have a high cosine measure with the vector of a named entity, equivalent names of that named entity can be found. This turned out to be true for 48% of organizations, 50% of locations and 57% of person names (according to model 4). In 30% of cases more than 3 equivalent names are found among the first 10 responses to the NE stimulus. Below some examples are provided; only English translations are given. The NE stimulus is the first item in the list, given in italics; the rest of the items are responses. Equivalents (which can be paraphrases or alternative names) are given in bold.</p>
        <p>• The Prosecutor General: ATTY GEN, ATTY GEN of Russia, RF ATTY GEN, Deputy Prosecutor General, RF Prosecutor General, RF Prosecutor, General Prosecutor Office, Prosecutor General of Russia, Prosecutor General of Ukraine, Prosecutor General of Moscow
• Latin America: Latin, South America, Countries of Latin America, Latin American countries, South-East Asia, Countries of South America, China, Country, Eastern Europe</p>
        <p>In most cases, the list of responses contains individuals of the same class as the stimulus: e.g. given the name of a region of Russia, it will return a list of other Russian regions. Among the NE candidates for city stimuli, wrongly lemmatized city names and toponym misspellings were found, which can also be used to eliminate lemmatization or spelling mistakes.</p>
        <p>Future work implies the development of a stable and comprehensive model of distributed noun phrase representations that will extend existing resources for the Russian language. Admissible results on NE prediction using word2vec response lists allow us to continue with experiments on NE recognition from noisy texts and spoken language. The ability of distributed word representations to capture paraphrases and lexical variants of named entities can be used in algorithms for paraphrase search and for clustering similar entities and events.</p>
        <p>Acknowledgements
This work was partially financially supported by
the Government of the Russian Federation, Grant
074-U01.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Hakan</given-names>
            <surname>Demir</surname>
          </string-name>
          and
          <string-name>
            <given-names>Arzucan</given-names>
            <surname>Ozgur</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Improving Named Entity Recognition for Morphologically Rich Languages Using Word Embeddings</article-title>
          .
          <source>In 13th International Conference on Machine Learning and Applications, ICMLA</source>
          <year>2014</year>
          , Detroit, MI, USA, December 3-
          <issue>6</issue>
          ,
          <year>2014</year>
          , pages
          <fpage>117</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Frederic</given-names>
            <surname>Godin</surname>
          </string-name>
          , Baptist Vandersmissen, Wesley De Neve, and Rik Van de Walle.
          <year>2014</year>
          .
          <article-title>ACL W-NUT NER shared task: Named Entity Recognition for Twitter Microposts using Distributed Word Representations</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>David</given-names>
            <surname>Guthrie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ben</given-names>
            <surname>Allison</surname>
          </string-name>
          , Wei Liu, Louise Guthrie, and
          <string-name>
            <given-names>Yorick</given-names>
            <surname>Wilks</surname>
          </string-name>
          .
          <year>2006</year>
          . A Closer Look at Skipgram Modelling.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Evgeniy</given-names>
            <surname>Kanevsky</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kirill</given-names>
            <surname>Boyarsky</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>The Semantic-and-Syntactic Parser SEMSIN</article-title>
          .
          <source>In International conference on computational linguistics Dialog-2012</source>
          , Bekasovo, Russia.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Kezban Dilek</given-names>
            <surname>Kisa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Pinar</given-names>
            <surname>Karagoz</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Named Entity Recognition from Scratch on Social Media</article-title>
          .
          <source>In Proceedings of the 6th International Workshop on Mining Ubiquitous and Social Environments (MUSE 2015), co-located with the 26th European Conference on Machine Learning / 19th European Conference on Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2015)</source>
          , Porto, Portugal, September 7, 2015, pages
          <fpage>2</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Lopyrev</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Learning Distributed Representations of Phrases.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Gregory S. Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          .
          <source>In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8</source>
          ,
          <year>2013</year>
          ,
          Lake Tahoe, Nevada, United States, pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Isabel</given-names>
            <surname>Segura-Bedmar</surname>
          </string-name>
          ,
          Víctor Suárez-Paniagua, and
          <string-name>
            <given-names>Paloma</given-names>
            <surname>Martınez</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Exploring Word Embedding for Drug Name Recognition</article-title>
          .
          <source>In 13th International Conference on Machine Learning and Applications, ICMLA</source>
          <year>2014</year>
          , Detroit, MI, USA, December 3-
          <issue>6</issue>
          ,
          <year>2014</year>
          , pages
          <fpage>117</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Miran</given-names>
            <surname>Seok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hye-Jeong</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chan-Young</given-names>
            <surname>Park</surname>
          </string-name>
          , Jong-Dae Kim, and Yu-seop Kim
          <year>2016</year>
          .
          <article-title>Named Entity Recognition using Word Embedding as a Feature</article-title>
          .
          <source>International Journal of Software Engineering and Its Applications</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Scharolta</given-names>
            <surname>Katharina Siencnik</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Adapting Word2vec to Named Entity Recognition</article-title>
          .
          <source>In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA</source>
          <year>2015</year>
          ), pages
          <fpage>239</fpage>
          -
          <lpage>243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Starostin</surname>
          </string-name>
          ,
          <string-name>
            <surname>Bocharov</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alexeeva</surname>
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bodrova</surname>
            <given-names>A.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>FactRuEval 2016: Evaluation of Named Entity Recognition and Fact Extraction Systems for Russian</article-title>
          .
          <source>In Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue</source>
          <year>2016</year>
          ”, Moscow, June 1-4,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Joseph</given-names>
            <surname>Turian</surname>
          </string-name>
          ,
          Lev-Arie Ratinov, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Word Representations: A Simple and General Method for Semi-Supervised Learning</article-title>
          .
          <source>In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>384</fpage>
          -
          <lpage>394</lpage>
          , Uppsala, Sweden, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Wenpeng</given-names>
            <surname>Yin</surname>
          </string-name>
          and Hinrich Schütze.
          <year>2014</year>
          .
          <article-title>An Exploration of Embeddings for Generalized Phrases</article-title>
          .
          <source>In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, Student Research Workshop</source>
          , June 22-27, 2014, Baltimore, MD, USA, pages
          <fpage>41</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Mo</given-names>
            <surname>Yu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Dredze</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Learning Composition Models for Phrase Embeddings</article-title>
          .
          <source>TACL</source>
          ,
          <volume>3</volume>
          :
          <fpage>227</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>