<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Named Entity Recognition from Scratch on Social Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kezban Dilek Onal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pinar Karagoz</string-name>
          <email>karagozg@ceng.metu.edu.tr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Middle East Technical University</institution>
          ,
          <addr-line>Ankara</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>With the extensive amount of textual data flowing through social media platforms, the interest in Information Extraction (IE) on such textual data has increased. Named Entity Recognition (NER) is one of the basic problems of IE. State-of-the-art solutions for NER face an adaptation problem on informal texts from social media platforms. In this study, we address this generalization problem with the NLP from Scratch idea, which has been shown to be successful for several NLP tasks on formal text. Experimental results show that word embeddings can be successfully used for NER on informal text.</p>
      </abstract>
      <kwd-group>
        <kwd>NER</kwd>
        <kwd>word embedding</kwd>
        <kwd>NLP From Scratch</kwd>
        <kwd>Social Media</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Recently, with the extensive amount of data flowing through social media
platforms, the interest in information extraction from informal text has increased.
Named entity recognition (NER), being one of the basic subtasks of Information
Extraction, aims to extract and classify entity names from text. Extracted
entities are utilized in applications involving the semantics of the content such as
the topic of the text or location of the mentioned event.</p>
      <p>
        The NER problem has been studied widely in the last decade and
state-of-the-art algorithms achieve performance close to human on formal texts for
English [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and several other languages including Turkish [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. However,
currently, existing NER algorithms face a generalization problem for textual data
from social media. The recognition performance of state-of-the-art algorithms
degrades dramatically on English tweets as reported in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and similarly in Turkish
tweets, forum text and speech data [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        The main reason for the decrease in recognition performance of NER
algorithms is the informal and noisy nature of social media text [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Social media text
comprises spelling errors, incorrect use of punctuation, grammar and
capitalization. NER algorithms fail to adapt to this new genre of text because algorithms
are designed for formal text and are based on features present in well-formed
text yet absent in social media. Commonly used features by NER algorithms are
existence in a gazetteer, letter cases of words, Part of Speech (POS) tags and
morphological substructures of words. Quality of these features degrade on social
media. For instance, a single misspelled character in an entity name causes the
'existence in gazetteer' feature to become invalid. Moreover, misspellings cannot
be tolerated by POS tagging and morphological analysis tools.
      </p>
      <p>
        Besides the peculiarities in text structure, the types of entities and context in
social media text differ from the newswire text commonly used for training NER
systems [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. For instance, the most common person names mentioned in newswire
text are politicians, businessmen and celebrities, whereas social media text
contains names of friends, artists, and fictional characters from movies [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Similarly,
names of local spots and cafes are common in social media text whereas larger
geographical units like cities, districts, countries are frequent in newswire text
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Currently, normalization and domain adaptation are the two basic methods
for adapting existing NLP systems to social media text [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Normalization is
integrated into NLP systems as a pre-processing step. Normalization converts
social media text by correcting misspellings and removing noise so that existing NLP
algorithms can perform better on the converted text. It improves recognition
scores [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], yet it is not able to address the issues of lack of context and
different entity types mentioned previously. The other method, domain adaptation,
is the update of existing systems for social media. In rule-based systems, resource
extension or definition of new rules is required. Adaptation of machine learning
models for NER requires both re-training of classifiers and re-design of features.
Experiments from [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] have shown that re-training the Stanford CRF model [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
on social media data without feature re-design yields lower recognition
performance. In addition, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] reports that the performance of a CRF based model for
Turkish on social media is increased when it is re-trained with capitalization
excluded from the feature set.
      </p>
      <p>Although the recognition accuracy is improved by both normalization and
domain adaptation, they require considerable additional effort that is still not
sufficient to approximate formal text performance. A system that can adapt to
new genres should be designed to make the most of the syntactic and semantic
context so that it can generalize to various uses of language.</p>
      <p>
        In this study, we investigate the recognition performance of the NLP from
Scratch method by Collobert et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] on social media text. NLP from Scratch is
a semi-supervised machine learning approach for NLP tasks. It includes an
unsupervised learning step for learning word embeddings from a large unannotated
text corpus. Word embeddings are representations of words in a vector space
that can encode semantic similarities. Word embeddings can be used as features
for representing words in classification problems. Previous work has shown that
word embeddings are very powerful word features for several NLP tasks since
they can encode the semantic similarity between words [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The semi-supervised NLP From Scratch approach for training NER classifiers
has gained attention in recent years [
        <xref ref-type="bibr" rid="ref16 ref34">34, 16</xref>
        ] since it enables exploitation of the huge
amount of unannotated text that is produced on social media platforms such as
blogs, forums and microblogs like Twitter. To the best of our knowledge, this is
the first work to study the adaptation ability of the NLP from Scratch approach
to social media through word embeddings.
      </p>
      <p>In order to measure the recognition performance of the approach, we
performed experiments on English and Turkish texts. We included several data sets
from different platforms. Experiment results suggest that state-of-the-art
systems can be outperformed for morphologically rich languages such as Turkish
by the NLP from Scratch approach, without normalization or any human effort for
extending gazetteers. For English, an average-ranking system can be obtained
without any normalization and gazetteer extension effort.</p>
      <p>The rest of the paper is composed of six sections. In Section 2, we discuss
related work on NER on social media and present background information on
the NLP from Scratch approach. In Section 3, we present how we applied the
NLP from Scratch approach for social media. We report our experiment results
in Section 4. We give a discussion on the results in Section 5 and conclude the
paper with an overview in Section 6.
</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <sec id="sec-2-1">
        <title>NER on Social Media for English</title>
        <p>
          State-of-the-art NLP tools for English can achieve very high accuracy on formal
texts yet their performance declines significantly on social media text. For
instance, Ritter et al. reported [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] that the F-measure score achieved by the
CoNLL-trained Stanford recognizer drops from 86% to 46% when tested on tweets.
Consequently, there are several recent studies [
          <xref ref-type="bibr" rid="ref14 ref20 ref22 ref23 ref30">23, 20, 30, 22, 14</xref>
          ] that focus on NER
on tweets.
        </p>
        <p>
          Normalization as a pre-processing step is widely studied [
          <xref ref-type="bibr" rid="ref15 ref19 ref21 ref36">21, 15, 36, 19</xref>
          ] to
improve the performance of not only NER but several other NLP tasks on social media.
Regarding domain adaptation studies, TwitIE [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is a version of the ANNIE
component of the GATE platform tailored for tweets. NERD-ML [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] is an
adaptation of NERD, a system which integrates the power of semantic web
entity extractors into NER process.
        </p>
        <p>
          Besides, there are solutions in the literature that adopt neither normalization
nor domain adaptation. Ritter et al. [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] proposed a two-step algorithm specific
to tweets which first exploits a CRF model to segment named entities and then
utilizes LabeledLDA to classify entities. Li et al. proposed TwiNER [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] which
extracts candidate entities using the external resources Wikipedia and the Web N-Gram
corpus and then performs a random walk to rank the candidates.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>NER on Social Media for Turkish</title>
        <p>
          There exists a considerable number of studies on NER on Turkish texts. Earlier
studies focused on formal texts whereas recent studies focus on informal
text, specifically on messages from the microblogging platform Twitter. For
formal text, an HMM-based statistical model by Tur et al. [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] and a rule-based
system by Kucuk et al. [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] are the earliest studies. A CRF-based model by Seker
et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and a semi-supervised word-embedding-based classifier by Demir et al.
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] are the state-of-the-art NER algorithms for Turkish. Both algorithms report
performance close to human, 92%, on formal text.
        </p>
        <p>
          Despite the high accuracy on formal text, a recent study reports that the CRF
model has a recognition accuracy of 5% CoNLL F1 measure on tweets without
any pre-processing or normalization [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. There is also a dramatic decrease
reported on forum and speech data sets. Previous studies that address NER on
informal Turkish texts [
          <xref ref-type="bibr" rid="ref17 ref4">4, 17</xref>
          ] present normalization methods to fix misspellings.
Dictionary extension is another proposed solution in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Normalization of
Turkish social media texts has been addressed by [
          <xref ref-type="bibr" rid="ref1 ref10 ref32">10, 32, 1</xref>
          ] in order to correct the
spelling errors. The results are improved to at most 48% [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] on tweets with
dictionary extension and normalization, which is still low compared to formal
text and requires human knowledge for resource and rule-base extension.
        </p>
        <p>
          The NLP From Scratch approach has been shown to be successful on Turkish
formal texts [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The performance of the CRF-based classifier can be approximated
by the NLP From Scratch approach without any human effort for compiling
gazetteers and defining rules [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Word Embeddings and NLP from Scratch</title>
        <p>
          The NLP from Scratch approach was introduced by Collobert et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. In [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ],
word embeddings have been shown to be successful features for several NLP tasks
such as POS Tagging, chunking, NER and semantic role labeling. Word
embeddings are distributed representations for words [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. A distributed representation
of a symbol is a continuous-valued vector which captures various characteristic
features of the symbol. The word embedding representation is an alternative
to the one-hot representation of words, which is a discrete, high-dimensional, sparse
vector. The idea behind word embeddings is to map all the words in a language
into a relatively low-dimensional space such that words that occur in similar
contexts have similar vectors.
        </p>
        <p>
          Although learning distributed word representations has been studied for a
long time, the concept of word embeddings became notable together with the
Neural Network Language Model (NNLM) by Bengio et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Bengio et al.'s approach
enables learning a language model and the word embeddings simultaneously. The
NNLM is trained to predict the next word given a sequence of words. The model
is a neural network placed on top of a linear lookup layer which maps each word
to its embedding vector. This embedding layer is treated as an ordinary layer of a
network, its weights are initialized randomly and updated with backpropagation
during training of the neural network. The final weights correspond to the word
embeddings. The neural network layers above the linear layer constitute the
language model for predicting the next word.
        </p>
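The lookup-layer mechanics described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy vocabulary, embedding dimension, and the gradient vector fed to the update are all invented for the example.

```python
import random

random.seed(1)
VOCAB = ["the", "cat", "sat"]  # toy vocabulary (illustrative)
DIM = 4                        # toy embedding dimension

# The linear lookup layer is a trainable table with one randomly
# initialized row per vocabulary word.
table = {w: [random.uniform(-0.5, 0.5) for _ in range(DIM)] for w in VOCAB}

def lookup(words):
    """Map a word sequence to its embedding vectors."""
    return [table[w] for w in words]

def sgd_update(word, grad, lr=0.1):
    """Backpropagation reaches the table like any other layer: the row of
    each word seen in the training example takes a gradient step."""
    table[word] = [v - lr * g for v, g in zip(table[word], grad)]

before = list(table["cat"])
sgd_update("cat", [1.0] * DIM)   # pretend a gradient arrived from the layers above
assert table["cat"] != before    # only the touched row changed
```

After training, the rows of this table are exactly the word embeddings; the layers above it form the language model.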
        <p>
          Following Bengio's study, several other neural network models for learning
word embeddings have been proposed. Neural language models such as the
NNLM [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], the hierarchical probabilistic neural network language model [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ],
the recurrent neural network model (RNNLM) [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] are complex models that
require extensive computational resources to be trained. The high computational
complexity of the models led to scalability concerns and the very recent word
embedding learning methods like Skip-Gram [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] and GloVe [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] lean on
simpler and scalable models. For example, Skip-Gram, released within the Word2Vec
framework, enables training on a corpus of a billion words on a personal computer
in an hour without compromising the quality of word embeddings measured on
question sets [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
        <p>
          The language models proposed for learning word embeddings adopt different
neural network structures and training criteria. The model proposed by Collobert
et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], is trained to predict the center word given the surrounding symmetric
context. The Skip-Gram model [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] utilized in this work aims to predict the
surrounding context at the maximum distance of C from the center word. In
addition, the model does not contain any non-linear layers and is considered as
a log-linear model. In order to create training samples for the Skip-Gram model,
R words from the future and history context are selected as the context where
R is a randomly selected number in the range [1,C].
        </p>
        <p>The most important property of the word embeddings is that the
similarity between the words can be measured by the vector similarity between their
embeddings. Word-embedding-based NLP classifiers can generalize much better
owing to this property. For example, if an NER classifier has seen the sentence
I visited jennifer with jennifer labeled as a person name during training, it can
infer that kate may be a person name in the sentence I visited kate, since the
word embeddings of jennifer and kate are similar.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>NER From Scratch on Social Media</title>
      <p>The NER from Scratch approach is composed of two steps:
1. A language model is trained on a large unannotated text corpus. Word
embeddings and a language model are learned jointly during training.
2. An NER classifier is trained on supervised NER data, where word embeddings
obtained from Step 1 are leveraged as features for representing words.</p>
      <p>
        This two-step approach has been shown to yield successful classifiers for several
NLP tasks including NER on English, in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Different algorithms and models
can be selected for the two steps. In this study, for learning word embeddings we
utilized the Skip-Gram model and the Negative Sampling algorithm [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] within
the Word2Vec framework. As the NER classifier model, we experimented with
the Window Approach Network (WAN) of SENNA, the NLP from Scratch
framework proposed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Realization of these steps requires a large corpus and supervised NER data.
We applied three pre-processing steps to both the unannotated corpus text and
the annotated NER data sets. First, the URLs in the text are replaced with a
unique token. Second, numbers are normalized by replacing each digit with the
character "D", as is typical in the literature [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. Finally, all the words are lowercased,
meaning that the capitalization feature is completely removed from the data.
      </p>
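A minimal sketch of these three pre-processing steps in Python. The exact regular expressions and the `<url>` placeholder token are our assumptions; the paper does not specify them. Note that lowercasing is applied before digit mapping so that the "D" characters survive.

```python
import re

def preprocess(text):
    """Apply the three pre-processing steps described above:
    URL -> unique token, each digit -> 'D', and lowercasing."""
    text = re.sub(r"(?i)https?://\S+", "<url>", text)  # 1. URLs -> unique token
    text = text.lower()                                # 3. drop capitalization
    text = re.sub(r"\d", "D", text)                    # 2. each digit -> 'D'
    return text

print(preprocess("Check http://example.com/x at 12:30 Tomorrow"))
# -> check <url> at DD:DD tomorrow
```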
      <sec id="sec-3-1">
        <title>Learning Word Embeddings</title>
        <p>
          The Skip-Gram [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] model is a log-linear language model designed to predict
the context of a word. The model takes a word as input and predicts words
within a certain range C before and after the input word. The training process
is performed on unsupervised raw textual data by creating training samples
from the sentences in the corpus. For each word w in the sentence, the context
to predict is the words that occur in a context window of size R centered around
w where R is a randomly selected number in the range [1,C].
        </p>
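The sampling scheme above can be sketched as follows; the example sentence and whitespace tokenization are illustrative, not from the paper.

```python
import random

def skipgram_pairs(sentence, C, rng=random):
    """For each position, draw a window radius R uniformly from [1, C] and
    emit (center, context) pairs for every word within R positions."""
    pairs = []
    for i, center in enumerate(sentence):
        R = rng.randint(1, C)                      # random radius in [1, C]
        lo, hi = max(0, i - R), min(len(sentence), i + R + 1)
        pairs.extend((center, sentence[j]) for j in range(lo, hi) if j != i)
    return pairs

toks = ["we", "visited", "ankara", "last", "summer"]
pairs = skipgram_pairs(toks, C=2, rng=random.Random(0))

# Every generated pair stays within the maximum distance C of its center word.
idx = {w: i for i, w in enumerate(toks)}
assert all(0 < abs(idx[c] - idx[x]) <= 2 for c, x in pairs)
```

Each (center, context) pair would then be a positive training example for the Skip-Gram model, with negative examples drawn by Negative Sampling.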
      <p>In this study, we obtained 50-dimensional word embeddings by training the
Skip-Gram model with negative sampling and a context range of 7, using the open
source implementation of Word2Vec.1</p>
      </sec>
      <sec id="sec-3-2">
        <title>Training NER model</title>
        <p>
          The NER classifier model is learned in the supervised learning step. As the NER
classifier model, we experimented with the Window Approach Network (WAN)
of SENNA, the NLP from Scratch framework proposed in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The WAN
network is trained to predict the NER label of the center word given a context
window. It is a typical neural network classifier with one hidden layer and a
softmax layer placed on top of the output layer. The input layer is a context
window of radius r. Word embeddings are used to represent words; therefore,
a context window is represented by the vector obtained by concatenating the
embedding vectors of the words within the window.
        </p>
        <p>
          The model is trained to map the context window vector to the NER label of
the center word of the window. We trained the WAN network to minimize the
Word-Level Log Likelihood (WLL) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] cost function that considers each word
independently. In other words, relationships between the tags of neighboring
words are not considered when determining the NER label.
        </p>
      <p>In the experiments of this study, we trained a WAN network with 350 input
units, corresponding to a context window of size 7 (seven 50-dimensional word
embeddings), and with 175 hidden layer units.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>We experimented on several English and Turkish data sets in order to measure
the generalization of the NLP from Scratch approach on social media. English
is an analytical language whereas Turkish is an agglutinative language. To be
precise, the grammatical structure of Turkish relies on morphemes for inflection. On</p>
      <sec id="sec-4-1">
        <title>1 https://code.google.com/p/word2vec/</title>
        <p>the contrary, the grammatical structure of English is based on word order and
auxiliary words. We believe that experimenting on these two languages from different
paradigms is important for observing the performance of the proposed approach.</p>
        <p>
          We measured the performance of our proposed approach with the CoNLL
metric. It is considered a strict metric since it accepts a labeling as true only
when both the boundary and the type of the named entity are detected correctly.
For instance, the entity patti smith is considered to be recognized correctly only if
the whole phrase is classified as a person entity. Details of the CoNLL metric can
be found in [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. We report Precision, Recall and F1 values based on the CoNLL
metric. We used the evaluation script from the CoNLL 2003 Shared Task on
Language-Independent Named Entity Recognition2 for computing the evaluation metrics.
        </p>
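The strict entity-level matching behind the CoNLL metric can be sketched as follows; representing entities as (start, end, type) token-span tuples is our choice, and the official evaluation script should be used for reported numbers.

```python
def conll_scores(gold, pred):
    """Strict entity-level scoring: a predicted entity counts as correct
    only if both its span (boundary) and its type match a gold entity.
    Entities are (start_token, end_token, type) tuples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

# "patti smith" spans tokens 0-1; the right span with the wrong type
# earns no credit under the strict metric.
gold = [(0, 1, "PER"), (5, 5, "LOC")]
pred = [(0, 1, "PER"), (5, 5, "ORG")]
p, r, f1 = conll_scores(gold, pred)
assert (p, r) == (0.5, 0.5)
```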
        <sec id="sec-4-1-1">
          <title>Experiments on Turkish Texts</title>
        </sec>
        <sec id="sec-4-1-2">
          <title>Data Sets</title>
          <p>
            For the unsupervised word embedding learning phase, we compiled a large
corpus by merging the Boun Web Corpus [
            <xref ref-type="bibr" rid="ref31">31</xref>
            ] and Vikipedi (Turkish Wikipedia)3.
Boun Web Corpus was compiled from three newspaper web pages and a
general sampling of Turkish web pages [
            <xref ref-type="bibr" rid="ref31">31</xref>
            ]. The merged corpus contains about 40
million sentences with 500 million tokens and a vocabulary of size 954K.
          </p>
          <p>
            For evaluation of the proposed approach for NER, we experimented with six
annotated NER datasets from the literature. Two of the data sets, namely
Formal Set 1 [
            <xref ref-type="bibr" rid="ref33">33</xref>
            ] and Formal Set 2 [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ], contain well-formed text compiled from
newspaper resources. We included two Twitter NER data sets, namely Twitter
Set 1 [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] and Twitter Set 2 [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] in experiments. Moreover, we experimented on
the Forum Data Set and Speech Data Set from [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. The number of entities per
type in the data sets are given in Table 1.
          </p>
          <table-wrap id="tbl1">
            <label>Table 1</label>
            <caption>
              <p>Number of entities per type in the data sets</p>
            </caption>
            <table>
              <thead>
                <tr><th>Data Set</th><th>PER</th><th>LOC</th><th>ORG</th></tr>
              </thead>
              <tbody>
                <tr><td>Formal Set 1</td><td>16356</td><td>10889</td><td>10198</td></tr>
                <tr><td>Formal Set 2</td><td>398</td><td>571</td><td>456</td></tr>
                <tr><td>Twitter Set 1</td><td>458</td><td>282</td><td>246</td></tr>
                <tr><td>Twitter Set 2</td><td>4261</td><td>240</td><td>445</td></tr>
                <tr><td>Forum Data Set</td><td>21</td><td>34</td><td>858</td></tr>
                <tr><td>Speech Data Set</td><td>85</td><td>112</td><td>72</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>Twitter Set 1 includes 2320 tweets that were collected on July 26, 2013,
between 12:00 and 13:00 GMT. Twitter Set 2 includes 5040 tweets.
Twitter Set 2 contains many person annotations embedded in hashtags and
mentions, whereas there are no such annotations in Twitter Set 1.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>2 http://www.cnts.ua.ac.be/conll2002/ner/bin/ 3 https://tr.wikipedia.org</title>
        <p>
          Forum Data Set [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is collected from a popular online hardware forum.4 It
includes very specific technology brand and device names tagged as organization
entities. The data contains many spelling errors and incorrect use of capitalization.
        </p>
        <p>
          Speech Data Set [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is reported to be obtained via a mobile assistant
application that converts the spoken utterance into written text by using Google
Speech Recognition Service. As noted in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], a group of people were asked to give
relevant orders to the mobile application. Orders recorded by the application are
included in the data set.
        </p>
        <sec id="sec-4-2-1">
          <title>Experiment Results</title>
          <p>In order to measure the adaptation performance of word embedding features,
we trained the WAN network on the well-formed Formal Set 1 and reported
results on the other data sets without re-training. We shuffled Formal Data Set
1 and divided it into three partitions for training, cross validation and test, with
80%, 10% and 10% of the data set, respectively.</p>
          <table-wrap id="tbl2">
            <label>Table 2</label>
            <caption>
              <p>F1 scores on Formal Set 1 [<xref ref-type="bibr" rid="ref33">33</xref>]</p>
            </caption>
            <table>
              <thead>
                <tr><th>Algorithm</th><th>PLO</th><th>PER</th><th>LOC</th><th>ORG</th></tr>
              </thead>
              <tbody>
                <tr><td>Seker et al. [<xref ref-type="bibr" rid="ref7">7</xref>]</td><td>91.94</td><td>92.94</td><td>x</td><td>x</td></tr>
                <tr><td>Demir et al. [<xref ref-type="bibr" rid="ref8">8</xref>]</td><td>91.85</td><td>94.69</td><td>x</td><td>x</td></tr>
                <tr><td>NER From Scratch</td><td>83.84</td><td>87.82</td><td>x</td><td>x</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>In Table 2 and Table 3, we present F1 scores of the NER From Scratch
approach in comparison with the highest scores reported in the literature on
the related data sets. F1 scores on the three types of entities, PER, LOC and
ORG, are given separately. In addition, the PLO column in the tables indicates
the average F1 score over the three types. The x signs in the tables indicate that
the F1 score is not reported for the entity type.</p>
          <p>The NER From Scratch approach outperforms the rule-based system by
Kucuk et al. on Formal Set 2 by a large margin. However, the proposed approach
fails to outperform the previous studies on Formal Set 1. Both of the previous
systems are machine learning classifiers. The CRF model of Seker et al. exploits
gazetteers and capitalization, which are powerful features for formal
text. As mentioned previously, the NER From Scratch approach relies only on
word embeddings. The other NER algorithm, by Demir et al., uses word
embeddings as features yet includes additional features such as affixes and word type.
4 http://www.donanimhaber.com</p>
          <p>Table 3 (columns: Algorithm, PLO, PER, LOC, ORG) reports the F1 scores on the informal data sets.</p>
          <p>
            The focus of our study is generalization to new platforms; therefore,
a decrease in formal text performance is acceptable since the NER From Scratch
approach ignores many of the clues in formal text. It is also crucial to note that
the CRF-based model is reported to achieve at most a 19% F1 score under the CoNLL
schema on tweets [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] with a normalization step. As a final note on the formal
text performance, within the scope of this study, we measured the performance
with a simple neural network classifier that considers only word context. The
results with word embeddings can be further improved by a classifier that can
incorporate sentence-level context and dependencies between NER tags, like
CRF models or the Sentence Approach Network (SAN) in [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ].
          </p>
          <p>
            In Table 3 we report the results on Turkish informal data sets. In Twitter
Set 1, the NER From Scratch approach outperforms previous studies [
            <xref ref-type="bibr" rid="ref10 ref17">17, 10</xref>
            ] by
Kucuk et al. and the NER Pipeline with Normalization [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]. In [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] the performance
of a multilingual rule-based NER system adapted to Turkish is reported. The
NER system in [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] is improved by normalization and dictionary extension in
[
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. The CoNLL F1 score of the ITU NLP Pipeline with Normalization that
includes the CRF-based model [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] is obtained from [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ].
          </p>
          <p>On Twitter Set 1, the NER From Scratch approach outperforms the rule-based
and CRF based solutions without any need for normalization, dictionary
construction or extension. Many of the misspellings in tweets are tolerated and
the misspelled words are recognized correctly due to their closeness to the
original word in the embedding space. Most of the misspellings in Turkish tweets
originate from replacement of the Turkish characters with diacritics (ç,ğ,ı,ö,ş,ü) with
non-accentuated characters (c,g,i,o,s,u). For instance, the words besiktas
(original: beşiktaş, a football team) and sahin (original: şahin, a male name) are
recognized correctly. In addition, we have observed that the large amount of unannotated
text can cover other types of misspellings such as single character replacement.
The misspelled name kilictaroglu (correct version: kılıçdaroğlu, surname of a
Turkish politician) could also be recognized correctly by the proposed approach.</p>
          <p>
            The proposed solution achieves a lower F1 score under the CoNLL schema than
that of the rule-based system of Kucuk et al. [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] on Twitter Data Set 2. NER
From Scratch outperforms the rule-based system on organization names and
location names, yet the low F1 score on the person entity type leads to a lower PLO
F1 score. The majority of the entities in Twitter Set 2 are person names embedded
in mentions; approximately 80% of the person names occur in mentions. The
proposed approach fails to recognize these person names since the training set
does not contain any annotations embedded in mentions. For instance, the name
patti smith occurs as @pattismith in a mention. Such concatenated forms also
make the space of possible name forms very large.
          </p>
          <p>
            The NER From Scratch approach outperforms the CRF-based model on
the Speech Data Set and the Forum Data Set. However, the improvement by the
proposed approach on the forum data is smaller than the improvement on the speech
data. The Forum Data Set is collected from a hardware forum [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] and includes very
specific technology brand and device names tagged as organization entities. The
training data set includes very formal organization names, such as Milli Eğitim
Bakanlığı (Ministry of National Education), that are quite different from technology
brand names. Despite the domain-specific tags, NER From Scratch performed the
best without any human effort for dictionary construction. In addition, the proposed
approach is able to recognize the brand names steelseries and tp-link, which were
seen neither in the corpus nor in the training set.
          </p>
        </sec>
        <sec id="sec-4-2-2">
          <title>Experiments on English Texts</title>
          <p>
            The recent survey on Twitter NER by Derczynski et al. [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] reports results on
the performance of existing systems on three different Twitter data sets. We
experimented on two of these data sets, which are publicly available. The first
data set is from the study of Ritter et al. [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ]. The second data
set is from the Concept Extraction Challenge in the Making Sense
of Microblog Posts (MSM) Workshop at WWW 2013. We refer to this data set
as the MSM2013 Data Set in the rest of the document. The Ritter data set contains
tokenized content; we tokenized the MSM2013 data set using
the Stanford CoreNLP tokenizer.
          </p>
          <p>In this set of experiments, we utilized Wikipedia as the corpus for learning
word embeddings. We performed two different sets of experiments, reported as
NER From Scratch (CoNLL2003) and NER From Scratch (CoNLL2003 + MSM
2013) in Table 5. In the NER From Scratch (CoNLL2003) experiments, we used
the original CoNLL 2003 data set for training; the CoNLL 2003 NER Shared Task
data set is the one commonly used for training supervised classifiers for English NER.
In the NER From Scratch (CoNLL2003 + MSM 2013) experiments, we extended the training
data set by merging the CoNLL 2003 and MSM 2013 training sets and measured
the performance of the classifier trained with this data set.</p>
          <p>Resources: Ritter data set: https://github.com/aritter/twitter_nlp/blob/master/data/annotated/ner.txt;
MSM2013 Challenge data: http://oak.dcs.shef.ac.uk/msm2013/ie_challenge/MSM2013-CEChallengeFinal.zip;
Stanford CoreNLP: http://nlp.stanford.edu/software/corenlp.shtml;
Wikipedia: http://www.wikipedia.org/</p>
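          <p>The training-set extension step can be sketched as a simple concatenation of two CoNLL-style annotated corpora. The format assumed here (one "token TAG" pair per line, blank line between sentences) mirrors the CoNLL layout; the real CoNLL 2003 and MSM 2013 releases differ in column layout and tag inventories, so a tag-harmonisation pass would be needed first. Function names and the toy snippets are illustrative.</p>

```python
# Illustrative sketch: merging two CoNLL-style annotated corpora into one
# training set, as done for the CoNLL2003 + MSM 2013 setting.

def parse_conll(text):
    """Parse CoNLL-style text into sentences of (token, tag) pairs."""
    sentences, sent = [], []
    for line in text.splitlines():
        line = line.strip()
        if line:
            parts = line.split()
            sent.append((parts[0], parts[-1]))  # first column: token, last: NER tag
        elif sent:
            sentences.append(sent)
            sent = []
    if sent:
        sentences.append(sent)
    return sentences

def merge_training_sets(*corpora):
    """Concatenate several parsed corpora into a single list of sentences."""
    merged = []
    for corpus in corpora:
        merged.extend(parse_conll(corpus))
    return merged

conll = "EU B-ORG\nrejects O\n\nGermany B-LOC\n"
msm = "patti B-PER\nsmith I-PER\n"
print(len(merge_training_sets(conll, msm)))  # 3 sentences
```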
          <p>
            Algorithm                                   ALL    PER    LOC    ORG   MISC
NER From Scratch (CoNLL2003)                51.46  66.28  42.52  23.95
NER From Scratch (CoNLL2003 + MSM 2013)     62.53  71.64  51.43  41.71
Bottom Score from [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] (Zemanta)                     28.42  45.71  46.59   6.62
Top Score from [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] (NERD-ML)                     77.18  86.74  64.08  50.36
          </p>
          <p>
            In Table 5, we present the CoNLL F1 scores of trained models in comparison
with the top and bottom scores in the rank on related data sets from Derczynski
et al.'s study [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. Note that we applied the exact category
mapping used in Derczynski et al.'s work to the Ritter data set so that the scores are
comparable. In addition, the reported MSM2013 results are F1 scores computed
on the test partition of the data set; test partitions are not included in training.
          </p>
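          <p>The CoNLL F1 score used throughout these tables is phrase-level: a predicted entity counts as correct only if both its span and its type match the gold annotation exactly. A minimal sketch, assuming IOB2 tags ("B-PER", "I-PER", "O"); the function names are ours, not from the official scorer.</p>

```python
# Sketch of phrase-level (CoNLL) F1 over IOB2 tag sequences.

def spans(tags):
    """Extract (start, end, type) entity spans from an IOB2 tag sequence."""
    out = []
    start = typ = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final span
        inside = tag.startswith("I-") and typ is not None and tag[2:] == typ
        if start is not None and not inside:
            out.append((start, i, typ))
            start = typ = None
        if tag.startswith("B-"):
            start, typ = i, tag[2:]
    return out

def conll_f1(gold_tags, pred_tags):
    """F1 where an entity is correct only on exact span and type match."""
    gold = set(spans(gold_tags))
    pred = set(spans(pred_tags))
    tp = len(gold & pred)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

          <p>For example, predicting ORG where the gold type is LOC yields no credit for that span, even if the boundaries are exact.</p>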
          <p>Results on MSM2013 show that the NER From Scratch classifier trained
with only the CoNLL 2003 data can achieve average performance. Including
tweets in the training data set improves the results by 20% on MSM 2013.
This is expected, since the training and test data sets include similar
tweets. A notable increase is also observed on the Ritter set with the inclusion of tweets.</p>
          <p>The di erence between the results obtained by the classi er trained with the
formal training set and the expanded training set can be attributed to two issues.
First of all, the corpus for embedding learning contains only Wikipedia text yet it
cannot cover any misspellings or informal abbreviations that are found in social
media. Secondly, there is a 10 year gap between the training data and the test
data sets in the rst setup. For instance, companies like Facebook that did not
exist in 2003 are tagged as organizations names in Twitter data sets. Although,
entities unseen in the training set have word embeddings close to similar known
entities, the training set should include samples with the same context.
5</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>Gazetteers are important resources for NER algorithms. The existence of a word
in a gazetteer is exploited by both rule-based and machine learning systems.
We believe that word embeddings can encode gazetteers when supported with
training data. Entity names are clustered in the embedding space, so whenever any
of the names is seen in the training data, the model can generalize to new
samples via vector similarities in the embedding space. In Table 6, the closest
5 words in the embedding space are given for the person name kazım and the city
name niğde. In addition, we included common misspellings of these names and
their neighbors in the table. The misspelled versions have similar neighbors.
This implies that a gazetteer that also covers misspelled words can be
obtained without any human effort from Word2Vec word embeddings. This is one
of the major reasons behind the success of the NER From Scratch
approach on social media NER: misspelled words and abbreviations occur in
similar contexts to the original word.</p>
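      <p>This gazetteer effect amounts to nearest-neighbour lookup by cosine similarity in the embedding space. A toy illustration follows; the 2-dimensional vectors are fabricated for the example (real Word2Vec vectors have hundreds of dimensions), and the helper names are ours.</p>

```python
# Toy illustration of the "gazetteer effect": words of the same entity type
# cluster, so nearest neighbours by cosine similarity act like a gazetteer.
import math

emb = {
    "kazim":  (0.95, 0.10),  # person names cluster along one direction
    "tahir":  (0.93, 0.12),
    "suphi":  (0.90, 0.15),
    "nigde":  (0.10, 0.95),  # city names cluster along another
    "amasya": (0.12, 0.93),
    "corum":  (0.15, 0.90),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def neighbours(word, k=2):
    """Return the k most similar words to `word` in the toy embedding table."""
    sims = [(cosine(emb[word], vec), other)
            for other, vec in emb.items() if other != word]
    return [other for _, other in sorted(sims, reverse=True)[:k]]

print(neighbours("kazim"))  # person names come first
print(neighbours("nigde"))  # city names come first
```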
      <p>kazım (PER)            kazim (misspelled)     niğde (LOC)        nigde (misspelled)
alpaslan (male name)   cuneyt (male name)     balıkesir (city)   diyarbakir (city)
esat (male name)       kursat (male name)     adıyaman (city)    izmir (city)
tahir (male name)      namik (male name)      kırşehir (city)    amasya (city)
suphi (male name)      yucel (male name)      corum (city)       asfa
nurettin (male name)   ertugrul (male name)   nevşehir (city)    balikesir (city)</p>
      <p>Besides the gazetteer effect, word embeddings can capture the similarity of
the surrounding context of a word. Similar to entity names, words that co-occur
with a celebrity name are also close in the embedding space. For instance, the
words album and single are mostly mentioned together with singer names, and these
two words are located very close to each other in the embedding space.</p>
      <p>Performance of the NLP From Scratch approach is affected by three major
factors:
- Coverage of the corpus utilized for obtaining word embeddings
- Coverage of the training data sets
- The ability of the classifier model to capture syntactic and semantic context.</p>
      <p>For the experiments on Turkish texts, the corpus utilized for learning
embeddings contained formal text from newspaper web sites as well as text from ordinary
web pages. Owing to this diversity of text sources, the NLP From Scratch
classifier trained on formal text is able to outperform previous studies on the Turkish
data sets. However, in the English experiments, the corpus contained only formal
text from Wikipedia. In this study, we improved the performance of the from
scratch approach on English by increasing the coverage of the training data set,
since obtaining a large enough Twitter corpus requires a long time. Investigating
the effect of corpus expansion and comparing it with the effect of training-set
expansion is left as future work.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this study, we investigated the application of the NLP from scratch idea to
the NER problem on social media. We obtained word embeddings by
unsupervised training on a large corpus and used them as features to train a neural
network classifier. We measured the adaptation ability of the model based on
word embeddings to new text genres. We experimented on both English and
Turkish data sets. The NER From Scratch approach is able to achieve
average performance on English Twitter data sets without any human effort for
defining gazetteers or extending dictionaries. We observed that the recognition
performance of the approach can be improved by extending the training set with
tweets. For Turkish, without any effort for gazetteer construction or rule
extension, state-of-the-art algorithms can be outperformed on the social media data
sets.</p>
      <p>Word embeddings are powerful word features, yet some issues in
social media require special techniques. Entities embedded in mentions and
hashtags cannot be discovered without a segmentation step prior to classification.
Sentence and phrase hashtags are very common in tweets, and they carry
valuable information.</p>
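      <p>The segmentation step suggested above can be sketched as greedy longest-match against a word list. The lexicon below is a toy, and the @pattismith example comes from the discussion of Twitter Set 2; a real system would need a large vocabulary and a probabilistic segmenter to resolve ambiguities.</p>

```python
# Sketch of segmenting mention/hashtag bodies into words before NER,
# via greedy longest-match against a (toy) lexicon.

LEXICON = {"patti", "smith", "follow", "friday"}

def segment(tag):
    """Split a hashtag/mention body into known words, longest match first."""
    text = tag.lstrip("#@").lower()
    words, i = [], 0
    while i != len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in LEXICON:
                words.append(text[i:j])
                i = j
                break
        else:
            return None  # no full segmentation found
    return words

print(segment("@pattismith"))    # ['patti', 'smith']
print(segment("#FollowFriday"))  # ['follow', 'friday']
```

      <p>After segmentation, the recovered tokens (patti, smith) can be fed to the classifier like ordinary words, so their embeddings become usable.</p>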
      <p>The classifier we utilized in this study is a simple model that exploits only
word-level context. As future work, we plan to experiment with more complex
models that can integrate sentence-level context and dependencies between NER
tags. Besides, we would like to investigate from scratch normalization methods
for social media.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>K.</given-names>
            <surname>Adal</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Eryigit</surname>
          </string-name>
          .
          <article-title>Vowel and diacritic restoration for social media texts</article-title>
          .
          <source>In 5th Workshop on Language Analysis for Social Media (LASM) at EACL</source>
          , Gothenburg, Sweden,
          <year>April 2014</year>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ducharme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vincent</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Janvin</surname>
          </string-name>
          .
          <article-title>A neural probabilistic language model</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          ,
          <volume>3</volume>
          :
          <fpage>1137</fpage>
          –
          <lpage>1155</lpage>
          , Mar.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Derczynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Funk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Greenwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Aswani</surname>
          </string-name>
          .
          <article-title>TwitIE: An open-source information extraction pipeline for microblog text</article-title>
          .
          <source>In Proceedings of the International Conference on Recent Advances in Natural Language Processing. Association for Computational Linguistics</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>G.</given-names>
            <surname>Celikkaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Torunoglu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Eryigit</surname>
          </string-name>
          .
          <article-title>Named entity recognition on real data: A preliminary investigation for Turkish</article-title>
          .
          <source>In Proceedings of the 7th International Conference on Application of Information and Communication Technologies, AICT2013</source>
          , Baku, Azerbaijan,
          <year>October 2013</year>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R.</given-names>
            <surname>Collobert</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          .
          <article-title>A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning</article-title>
          .
          <source>In Proceedings of the 25th International Conference on Machine Learning, ICML '08</source>
          , pages
          <fpage>160</fpage>
          –
          <lpage>167</lpage>
          , New York, NY, USA,
          <year>2008</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>R.</given-names>
            <surname>Collobert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karlen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Kuksa</surname>
          </string-name>
          .
          <article-title>Natural language processing (almost) from scratch</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2493</fpage>
          –
          <lpage>2537</lpage>
          , Nov.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Seker</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Eryigit</surname>
          </string-name>
          .
          <article-title>Initial explorations on using CRFs for Turkish named entity recognition</article-title>
          .
          <source>In Proceedings of the 24th International Conference on Computational Linguistics, COLING 2012</source>
          , Mumbai, India, 8–15
          <year>December 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>H.</given-names>
            <surname>Demir</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Ozgur</surname>
          </string-name>
          .
          <article-title>Improving named entity recognition for morphologically rich languages using word embeddings</article-title>
          .
          <source>In 13th International Conference on Machine Learning and Applications, ICMLA 2014</source>
          , Detroit, MI, USA, December 3–6, 2014, pages
          <fpage>117</fpage>
          –
          <lpage>122</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>L.</given-names>
            <surname>Derczynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          , G. Rizzo, M. van Erp,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gorrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Petrak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          .
          <article-title>Analysis of named entity recognition and linking for tweets</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>51</volume>
          (
          <issue>2</issue>
          ):
          <fpage>32</fpage>
          –
          <lpage>49</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>D.</given-names>
            <surname>Kucuk</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Steinberger</surname>
          </string-name>
          .
          <article-title>Experiments to Improve Named Entity Recognition on Turkish Tweets</article-title>
          .
          <source>In Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)@ EACL</source>
          , pages
          <fpage>71</fpage>
          –
          <lpage>78</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>J.</given-names>
            <surname>Eisenstein</surname>
          </string-name>
          .
          <article-title>What to do about bad language on the internet</article-title>
          .
          <source>In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , pages
          <fpage>359</fpage>
          –
          <lpage>369</lpage>
          , Atlanta, Georgia,
          <year>June 2013</year>
          . Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>G.</given-names>
            <surname>Eryigit</surname>
          </string-name>
          .
          <article-title>ITU Turkish NLP web service</article-title>
          .
          <source>In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL)</source>
          , Gothenburg, Sweden,
          <year>April 2014</year>
          .
          Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Finkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Grenager</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>Incorporating non-local information into information extraction systems by gibbs sampling</article-title>
          .
          <source>In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics</source>
          , pages
          <fpage>363</fpage>
          –
          <lpage>370</lpage>
          . Association for Computational Linguistics,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>J.</given-names>
            <surname>Gelernter</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Mushegian</surname>
          </string-name>
          .
          <article-title>Geo-parsing messages from microtext</article-title>
          .
          <source>Transactions in GIS</source>
          ,
          <volume>15</volume>
          (
          <issue>6</issue>
          ):
          <fpage>753</fpage>
          –
          <lpage>773</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>B.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cook</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          .
          <article-title>Automatically constructing a normalisation dictionary for microblogs</article-title>
          .
          <source>In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL '12</source>
          , pages
          <fpage>421</fpage>
          –
          <lpage>432</lpage>
          , Stroudsburg, PA, USA,
          <year>2012</year>
          . Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>M.</given-names>
            <surname>Konkol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brychcin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Konopik</surname>
          </string-name>
          .
          <article-title>Latent semantics in named entity recognition</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>42</volume>
          (
          <issue>7</issue>
          ):
          <fpage>3470</fpage>
          –
          <lpage>3479</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. D. Kucuk, G. Jacquet, and
          <string-name>
            <given-names>R.</given-names>
            <surname>Steinberger</surname>
          </string-name>
          .
          <article-title>Named entity recognition on Turkish tweets</article-title>
          . In N. Calzolari,
          <string-name>
            <given-names>K.</given-names>
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Declerck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Loftsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Maegaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Odijk</surname>
          </string-name>
          , and S. Piperidis, editors,
          <source>LREC</source>
          , pages
          <fpage>450</fpage>
          –
          <lpage>454</lpage>
          .
          European Language Resources Association (ELRA),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>D.</given-names>
            <surname>Kucuk</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Yazici</surname>
          </string-name>
          .
          <article-title>Named entity recognition experiments on Turkish texts</article-title>
          .
          <source>In Flexible Query Answering Systems</source>
          , pages
          <fpage>524</fpage>
          –
          <lpage>535</lpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Improving text normalization via unsupervised model and discriminative reranking</article-title>
          .
          <source>ACL</source>
          <year>2014</year>
          , page
          <fpage>86</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.-S.</given-names>
            <surname>Lee</surname>
          </string-name>
          . TwiNER:
          <article-title>Named entity recognition in targeted twitter stream</article-title>
          .
          <source>In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '12</source>
          , pages
          <fpage>721</fpage>
          –
          <lpage>730</lpage>
          , New York, NY, USA,
          <year>2012</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Weng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          .
          <article-title>A broad-coverage normalization system for social media language</article-title>
          .
          <source>In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, ACL '12</source>
          , pages
          <fpage>1035</fpage>
          –
          <lpage>1044</lpage>
          , Stroudsburg, PA, USA,
          <year>2012</year>
          . Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>Named entity recognition for tweets</article-title>
          .
          <source>ACM Trans. Intell. Syst. Technol.</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3:1</fpage>
          –
          <lpage>3:15</lpage>
          , Feb.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>Recognizing named entities in tweets</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT '11</source>
          , pages
          <fpage>359</fpage>
          -
          <lpage>367</lpage>
          , Stroudsburg, PA, USA,
          <year>2011</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>CoRR, abs/1301.3781</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karafiat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Burget</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cernocky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Khudanpur</surname>
          </string-name>
          .
          <article-title>Recurrent neural network based language model</article-title>
          .
          <source>In INTERSPEECH</source>
          , pages
          <fpage>1045</fpage>
          -
          <lpage>1048</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>F.</given-names>
            <surname>Morin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Hierarchical probabilistic neural network language model</article-title>
          .
          <source>In AISTATS'05</source>
          , pages
          <fpage>246</fpage>
          -
          <lpage>252</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>D.</given-names>
            <surname>Nadeau</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sekine</surname>
          </string-name>
          .
          <article-title>A survey of named entity recognition and classification</article-title>
          .
          <source>Lingvisticae Investigationes</source>
          ,
          <volume>30</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>26</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>Glove: Global vectors for word representation</article-title>
          .
          <source>In Proceedings of EMNLP</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <given-names>A.</given-names>
            <surname>Ritter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cherry</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Dolan</surname>
          </string-name>
          .
          <article-title>Unsupervised modeling of twitter conversations</article-title>
          .
          <source>In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT '10</source>
          , pages
          <fpage>172</fpage>
          -
          <lpage>180</lpage>
          , Stroudsburg, PA, USA,
          <year>2010</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <given-names>A.</given-names>
            <surname>Ritter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clark</surname>
          </string-name>
          , Mausam, and
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Named entity recognition in tweets: An experimental study</article-title>
          .
          <source>In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11</source>
          , pages
          <fpage>1524</fpage>
          -
          <lpage>1534</lpage>
          , Stroudsburg, PA, USA,
          <year>2011</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <given-names>H.</given-names>
            <surname>Sak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gungor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Saraclar</surname>
          </string-name>
          .
          <article-title>Turkish language resources: Morphological parser, morphological disambiguator and web corpus</article-title>
          .
          <source>In Proceedings of the 6th International Conference on Advances in Natural Language Processing</source>
          ,
          <source>GoTAL '08</source>
          , pages
          <fpage>417</fpage>
          -
          <lpage>427</lpage>
          , Berlin, Heidelberg,
          <year>2008</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <given-names>D.</given-names>
            <surname>Torunoglu</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Eryigit</surname>
          </string-name>
          .
          <article-title>A cascaded approach for social media text normalization of Turkish</article-title>
          .
          <source>In 5th Workshop on Language Analysis for Social Media (LASM) at EACL</source>
          , Gothenburg, Sweden,
          <year>April 2014</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33. G. Tur, D. Hakkani-Tur, and
          <string-name>
            <surname>K.</surname>
          </string-name>
          <article-title>O azer. A statistical information extraction system for turkish</article-title>
          .
          <source>Natural Language Engineering</source>
          ,
          <volume>9</volume>
          (
          <issue>02</issue>
          ):
          <fpage>181</fpage>
          -
          <lpage>210</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <given-names>J.</given-names>
            <surname>Turian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ratinov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Word representations: a simple and general method for semi-supervised learning</article-title>
          .
          <source>In Proceedings of the 48th annual meeting of the association for computational linguistics</source>
          , pages
          <fpage>384</fpage>
          -
          <lpage>394</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <given-names>M.</given-names>
            <surname>Van Erp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rizzo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          .
          <article-title>Learning with the web: Spotting named entities on the intersection of nerd and machine learning</article-title>
          .
          <source>In Proceedings of the 3rd Workshop on Making Sense of Microposts</source>
          , pages
          <fpage>27</fpage>
          -
          <lpage>30</lpage>
          .
          <publisher-name>Citeseer</publisher-name>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Eisenstein</surname>
          </string-name>
          .
          <article-title>A log-linear model for unsupervised text normalization</article-title>
          .
          <source>In EMNLP</source>
          , pages
          <fpage>61</fpage>
          -
          <lpage>72</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>