<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Microposts</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Combining Named Entity Recognition Methods for Concept Extraction in Microposts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter Krammer</string-name>
          <email>upsypkra@savba.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michal Laclavík</string-name>
          <email>laclavik.ui@savba.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Informatics, Slovak Academy of Sciences</institution>
          <addr-line>Dúbravská cesta 9 845 07 Bratislava</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ladislav Hluchý</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Marek Ciglan</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Štefan Dlugolinský</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>4</volume>
      <fpage>34</fpage>
      <lpage>41</lpage>
      <abstract>
        <p>NER in microposts is a key and challenging task in mining semantics from social media. Our evaluation of a number of popular NE recognizers over a micropost dataset has shown a significant drop-off in result quality. Current state-of-the-art NER methods perform much better on formal text than on microposts. However, the experiment provided us with an interesting observation: although individual NER tools did not perform very well on micropost data, we obtained recall over 90% when we merged the results of all examined tools. This means that if we were able to combine different NE recognizers in a meaningful way, we might be able to achieve NER in microposts of acceptable quality. In this paper, we propose a method for NER in microposts that is designed to combine annotations yielded by existing NER tools in order to produce more precise results than the input tools alone. We combine NE recognizers using ML techniques, namely decision tree and random forest using the C4.5 algorithm. The main advantage of the proposed method lies in the possibility of combining arbitrary NER methods and in its applicability to short, informal texts. The evaluation on a standard dataset shows that the proposed approach outperforms the underlying NER methods as well as a baseline recognizer, which is a simple combination of the best underlying recognizers for each target NE class. To the best of our knowledge, the proposed approach achieves the highest F1 score reported to date on the #MSM2013 dataset.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>I.2.7 [Natural language processing]: Language parsing and understanding, Text analysis.</p>
      <p>Keywords: named entity recognition, machine learning, microposts</p>
      <p>Copyright © 2014 held by author(s)/owner(s); copying permitted only for private and academic purposes. Published as part of the #Microposts2014 Workshop proceedings, available online as CEUR Vol-1141 (http://ceur-ws.org/Vol-1141). #Microposts2014, April 7th, 2014, Seoul, Korea.</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>
        Social media interaction has grown significantly in
recent years. People are able to interact through
the Internet from almost anywhere at any time. They can
share their experiences, thoughts and knowledge instantly,
and they do so on a massive scale. The easiest and probably
the most popular way of interacting on the Web is through
microposts – short text messages posted on the Web. There
are plenty of services offering such communication; well-known
examples of microposts include tweets, Facebook statuses,
comments, Google+ posts and Instagram photos. Micropost
analysis has great potential for uncovering hidden knowledge that can
be used in a wide range of domains such as emergency response,
public opinion assessment, business or political sentiment
analysis and many more. The most important task in
analyzing and making sense of microposts is Named
Entity Recognition (NER). NER in microposts is a
challenging problem because of the limited size of a single micropost,
the prevalence of term ambiguity, noisy content and
multilingualism [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These are the main reasons why existing NER
methods perform better on formal newswire text than on
microposts, and there is clearly room for new NER methods
designed for social media streams.
      </p>
      <p>In this paper, we first evaluate multiple popular and
widely used NER methods on micropost data. The
results show a significant decrease in result quality compared
to the quality reported for newswire texts. An interesting
observation from the experiment is that we can achieve recall over
90% on the micropost data when the results of all tools are unified.
This means that, in theory, we could achieve very high
quality annotations of named entities (NEs) in microposts just
by combining existing NER tools in a “smart” way. The
rest of the paper is dedicated to the research question of how
to combine the annotations of different NER tools in order to
achieve better recognition in microposts.</p>
      <p>
        We propose an approach for combining NER methods
represented by different NE recognizers in order to build
a new NE recognizer intended for use on microposts.
The method combines annotations produced
by different NER tools by exploiting machine learning (ML)
techniques. We use the term annotation to refer to a
substring of an input text that has been marked by a NER tool
as a reference to an entity of one of the target classes; i.e., LOC,
MISC, ORG and PER. The main challenge is the
transformation of text annotations produced by NER tools into a
form usable for training ML classification algorithms. Once
the NER annotations had been transformed into an appropriate
format, we evaluated a number of
popular ML classification techniques. The best performing technique
in our problem domain was the C4.5 algorithm [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], which
was used to train decision tree (DT) and random forest (RF)
models. The resulting classification model outperformed the
best of the underlying individual recognizers by more than 10%
in F1 score and a chosen baseline model by 3% in F1 score.
      </p>
      <p>The main contributions of the work are the following: (i) We
show that although existing NER tools designed for news
text do not perform well on microposts, by merging the results
of several different NER tools we can achieve high recall and
precision. (ii) We utilize ML classifiers to combine the
outputs of multiple NE recognizers. The principal challenge is
the transformation of text annotations yielded by NER tools
into feature vectors that can be used for training
classification algorithms. (iii) We provide an extensive evaluation
of popular classification models to assess their suitability for
the problem of combining the results of NER tools. For the best
performing ones, we have studied the influence of algorithm
parameters on the classification results.</p>
      <p>The paper is structured as follows. In Section 2, we briefly
summarize research related to NER. In Section 3, we
conduct an experiment in which a number of existing
popular NER tools are evaluated on micropost data. The results
show a dramatic drop in quality measures compared to the
numbers reported on news datasets. In Section 4, we define
a baseline NE recognizer, explain our approach to combining
NER tools and evaluate our NE recognition models. Finally,
Section 5 discusses open issues and Section 6 summarizes our
results and concludes the paper.</p>
    </sec>
    <sec id="sec-3">
      <title>2. RELATED WORK</title>
      <p>
        A large amount of NER research has been conducted
on formal text, such as newswire or biomedical text. The
performance of NE recognizers on this kind of text is
comparable to that of humans. For instance, in the MUC-7 NE task,
the best NE recognizer scored F1 = 93.39%, while the
annotators scored F1 = 97.60% and F1 = 96.95% [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Another example is the CoNLL-2003 shared task, where the best
NE recognizer scored F1 = 88.76% on the English test set [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
It was later outperformed by Ratinov and Roth [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ],
who achieved F1 = 90.8%. NE recognizers that have been
designed for these tasks and that achieve state-of-the-art
performance rely heavily on linguistic features
observable in formal text. However, many of these important
features are absent in microposts; e.g., capitalization. Therefore,
news-trained recognizers perform worse on them. The
performance drop-off is also caused by the nature of micropost
content – its length, informality, noise and multilingualism.
Many of the problems related to NER in microposts are
discussed by Bontcheva and Rout in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The idea of combining different methods for NER is not
new. It has been successfully applied to formal text by
Florian et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], who combine four diverse classification methods;
i.e., transformation-based learning, hidden Markov model,
robust risk minimization (RRM) and maximum entropy.
The classifiers are complemented by gazetteers together with
the output of two externally trained NE recognizers, and the
whole is used to extract text features. The RRM method is
used to select a well-performing combination of the
features. Todorovski and Džeroski [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] introduce meta
decision trees (MDT) for combining multiple classifiers. They
present a training algorithm based on C4.5 for
producing MDTs. Another application is by Si et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], who
combine several NER methods for bio-entity recognition in
biomedical texts. They experiment with combining NE
classifiers using three different approaches; i.e., majority vote,
unstructured exponential model and conditional random field.
Saha and Ekbal [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] also use seven diverse NER classifiers
to build a number of voting models depending on
identified text features, which are selected mostly without domain
knowledge.
      </p>
      <p>
        Regarding NER for tweets, a similar
approach is taken by Liu et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The authors combine a k-Nearest
Neighbors (k-NN) classifier with a linear Conditional
Random Fields (CRF) model under a semi-supervised learning
framework and show an increase in F1 with respect to a
baseline system, which is a modified version of their system without k-NN and
semi-supervised learning. Etter et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] deal with
multilingual NER for short informal text. They do not rely
on language dependent features such as dictionaries or POS
tagging, but use language independent features derived
from the character composition of a word and its context in a
message; i.e., words, character n-grams for words, ±k words
to the left, message length, word length and word position in
the message. They use an algorithm that combines a Support
Vector Machine (SVM) with a Hidden Markov Model (HMM)
to train a NER model on manually annotated data. The
experiments show that the language independent features
lead to an F1 score increase and that the model outperforms Ritter
et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Ritter et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] present a rebuilt NLP pipeline for
tweets; i.e., POS tagger, chunker and NE recognizer. The
NE recognizer leverages the redundancy inherent in tweets
using Labeled LDA [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to exploit Freebase<sup>1</sup> dictionaries as
a source of distant supervision. TwiNER, a novel
unsupervised NER system for targeted tweet streams, is proposed
by Li et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Similarly to Etter et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], TwiNER does
not rely on any linguistic features of the text. It aggregates
information garnered from the Web and Wikipedia. The
advantage of TwiNER is that it does not require a manually
annotated training set. On the other hand, TwiNER does
not categorize the type of the discovered NEs. The authors focus on
the problem of correctly locating and recognizing the presence
of NEs rather than on their classification. Habib and Keulen [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
authors of the winning solution of the #MSM2013 IE Challenge, also split
the NER problem into named entity extraction (NEE) and
named entity classification (NEC). The NEE task is
performed by taking the union of entities recognized by two models;
i.e., CRF and SVM. Both models are trained on manually
labeled tweet data. The CRF uses POS tags and
capitalization of the words as features. The SVM segments the tweet
using the approach of Li et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and enriches the segments using an
external knowledge base (KB). It uses the same features as
the SVM model and information from external KB.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. COMBINED NER METHODS</title>
      <p>We have used state-of-the-art NER methods represented
by various existing NE recognizers. These methods were
combined in our classification models discussed later in this
paper. Below, we briefly describe the NE recognizers used,
focusing on their NER methods.</p>
      <p>
        1) ANNIE (v7.1) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] relies on finite state algorithms,
gazetteers and the JAPE (Java Annotation Patterns Engine)
language. 2) Apache OpenNLP<sup>2</sup> (v1.5.2) is based on
maximum entropy models and the perceptron learning algorithm.
3) Illinois Named Entity Tagger (v1.0.4) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] uses a
regularized averaged perceptron with external knowledge
(unlabeled text, gazetteers built from Wikipedia and word class
models). We have used Illinois NET with the 4-label type set
and the default configuration. 4) Illinois Wikifier (v1.0)<sup>3</sup> [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
is based on a Ranking SVM and exploits the Wikipedia link
structure in disambiguation. 5) Open Calais operates
behind a shroud of mystery, since there is not much
information available about how its NE recognition works.
Official sources<sup>4</sup> say that it uses NLP, ML and other
methods as well as Linked Data. 6) Stanford Named Entity
Recognizer (v1.2.7) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is based on CRF sequence models.
We have used the English 4-class caseless CoNLL model<sup>5</sup>.
7) Wikipedia Miner<sup>6</sup> [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is a text annotation tool
capable of annotating Wikipedia topics in a given text. It
exploits the Wikipedia link graph and the Wikipedia category hierarchy
and relies on ML classifiers, which are used for measuring
the relatedness of concepts and terms, as well as for
disambiguation. We have applied this software to discover
Wikipedia topics, which were then tagged according to the
DBPedia Ontology<sup>7</sup>.
      </p>
      <p>[Figure 1: Outline of the NE recognizers. Recoverable labels: ANNIE, finite state algorithms, gazetteers, perceptron learning, Linked Data, machine learning, maximum entropy.]</p>
      <p><sup>1</sup> http://www.freebase.com</p>
      <p>Most of the NE recognizers are based on statistical
learning methods. Some of them also use gazetteers and other
external knowledge such as Wikipedia or Linked Data. An outline
of the NE recognizers is depicted in Figure 1.</p>
    </sec>
    <sec id="sec-5">
      <title>3.1 NE Recognizers Evaluation</title>
      <p>
        In this section, we evaluate the NER methods described in
Section 3 on a micropost data corpus. Our intent was to
see the performance of each individual NE recognizer. The
evaluation also focused on analyzing which NE
recognizer is more suitable for a particular named entity class and
whether the NE recognizers produce diverse results. The NE
recognizers were evaluated over the adapted #MSM2013 IE
Challenge training dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We took version 1.5 and
removed duplicates as well as microposts overlapping
with the test dataset. The cleaned training dataset
finally contained 2752 unique manually annotated
microposts with classification restricted to four entity types: PER,
LOC, ORG and MISC. We have also adapted a test dataset
from the #MSM2013 IE Challenge, on which we later
evaluated our classification models. The occurrence of NEs in
both datasets is displayed in Figure 2. Named entity types
were not equally distributed. The most frequent entity type
in both datasets was PER and the least frequent was MISC.
The datasets used in this paper are also available for download<sup>8</sup>
in GATE SerialDataStore format. The datasets include the results
of all the NE recognizers used as well as of our NER models
discussed later in the paper.
      </p>
      <p>[Figure 2: Occurrence of NEs in the training and test datasets. Recoverable labels: MISC - 94 (6.22%), LOC - 96 (6.35%).]</p>
      <p><sup>2</sup> http://opennlp.apache.org</p>
      <p><sup>3</sup> http://cogcomp.cs.illinois.edu/page/download_view/Wikifier</p>
      <p><sup>4</sup> http://www.opencalais.com/about</p>
      <p><sup>5</sup> english.conll.4class.caseless.distsim.crf.ser.gz</p>
      <p><sup>6</sup> http://wikipedia-miner.cms.waikato.ac.nz</p>
      <p><sup>7</sup> http://dbpedia.org/Ontology</p>
      <p>The evaluated NE recognizers were not specially configured,
tweaked or trained for microposts prior to the evaluation.
We wanted to see how they cope with a kind of text
different from the one they were trained on. The alignment with our
taxonomy was done by simple mapping. Evaluation results
are displayed in Table 1, ordered by Micro avg. F1 score.
We also provide a Macro summary, which averages the P, R and
F1 measures on a per-document basis, while the Micro
summary treats the whole dataset as one document. The
evaluation has also shown that the NE recognizers produced
diverse annotations. This could be seen in the increased
recall after the results were unified and duplicates
removed. Figure 3 illustrates the situation and the
recall that could theoretically be achieved by
combining the recognizers.</p>
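      <p>The difference between the Micro and Macro summaries can be illustrated with a minimal sketch. It assumes that each document is reduced to counts of true positives, false positives and false negatives; the function names and the convention of scoring an undefined ratio as 0.0 are assumptions of this sketch and are not taken from the evaluation tooling used in the paper.</p>
      <preformat>
```python
from typing import List, Tuple

# Per-document counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return precision, recall and F1; undefined ratios are reported as 0.0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def micro_average(docs: List[Counts]) -> Tuple[float, float, float]:
    """Pool the counts of all documents, i.e. treat the dataset as one document."""
    tp = sum(d[0] for d in docs)
    fp = sum(d[1] for d in docs)
    fn = sum(d[2] for d in docs)
    return prf(tp, fp, fn)


def macro_average(docs: List[Counts]) -> Tuple[float, float, float]:
    """Compute P, R and F1 per document and average the per-document scores."""
    scores = [prf(*d) for d in docs]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```
      </preformat>
      <p>With one perfectly annotated document and one poorly annotated document, the two summaries diverge, because the Macro summary weights both documents equally regardless of how many entities they contain.</p>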
      <p>
        More details about the evaluation can be found in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Some of the evaluation results there may differ slightly from those
displayed in Table 1, because in this work we accepted adjectival
and demonymic forms of countries as the MISC type;
e.g., Americans, English.
      </p>
    </sec>
    <sec id="sec-6">
      <title>4. COMBINING NE RECOGNIZERS</title>
      <p>The idea of how to combine NE recognizers was to use
ML techniques to build a classification model, which would
be trained on features describing the microposts’ text as well
as the annotations produced by the involved NE recognizers. We
have used the training dataset for building the model and
the test dataset for evaluating it and comparing it with other
NE recognizers (Section 3.1).</p>
      <p><sup>8</sup> http://ikt.ui.sav.sk/microposts/</p>
      <p>
        Based on the evaluation results in Section 3.1, we
have chosen seven out of eight NE recognizers based on
different methods. The discarded one was LingPipe, because
of its weak<sup>9</sup> performance on micropost data. The chosen NE
recognizers were then complemented by Miscinator, an NE
recognizer specially designed for the #MSM2013 IE
Challenge [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
      <p>As the overall recall of the underlying NE recognizers was
relatively high, we wanted to maximize precision without
sacrificing recall. We decided to involve ML techniques, but
it was necessary to transform this problem into a standard
ML task. In this case, it was suitable to transform the task
of NER into a task of classification. The intent was that
the ML process would produce a classification model capable
of classifying given annotations from the involved methods into
four target classes LOC, MISC, ORG, PER and one special
class NULL, indicating that the annotation did not belong
to any of the four target classes. A simple algorithm
would then be applied to merge the re-classified annotations into
the final results.</p>
    </sec>
    <sec id="sec-7">
      <title>4.1 Baseline NE Recognizer</title>
      <p>We have defined a baseline NE recognizer in such a way that
each target entity class was extracted by the best NE
recognizer according to the evaluation made over the training
dataset (Section 3.1); i.e., the LOC, MISC and ORG classes were
extracted by OpenCalais and the PER class was extracted by
Illinois NET. The performance of the baseline can be seen
in Table 2 together with the performance of the NE
recognizers considered for combining. The evaluation has been made
over the test dataset. We can see that the baseline NE
recognizer outperformed the underlying NE recognizers in
precision and F1 measure, which was expected. Our goal was
to exceed the performance of the baseline NE recognizer
with a model produced by an ML approach.</p>
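      <p>The baseline described above amounts to a per-class selection of annotations, which can be sketched as follows. The Annotation type, its field names and the recognizer name strings are illustrative assumptions; only the class-to-recognizer assignment is taken from the text.</p>
      <preformat>
```python
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    start: int   # character offset where the annotation begins
    end: int     # character offset one past the last character
    label: str   # LOC, MISC, ORG or PER
    source: str  # name of the recognizer that produced the annotation


# Best recognizer per class on the training dataset, as stated in the text.
BEST_SOURCE: Dict[str, str] = {
    "LOC": "OpenCalais",
    "MISC": "OpenCalais",
    "ORG": "OpenCalais",
    "PER": "Illinois NET",
}


def baseline(annotations: List[Annotation]) -> List[Annotation]:
    """Keep an annotation only if it comes from the recognizer chosen for its class."""
    return [a for a in annotations if BEST_SOURCE.get(a.label) == a.source]
```
      </preformat>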
    </sec>
    <sec id="sec-8">
      <title>4.2 Transforming NEs into Feature Vectors</title>
      <p>We took the approach of describing how particular
methods performed on different entity types compared to
the response of other methods and a manual annotation.
This description, used as a training vector, was the input for
training a classification model. A vector of input training
features was generated for each annotation found by the
underlying NER methods, restricted to the following types: LOC,
MISC, ORG, PER, NP – noun phrase, VP – verb phrase,
OTHER – different type. We call such an annotation a
reference annotation. The vector of each reference annotation
consisted of several sub-vectors (Figure 4).</p>
      <p><sup>9</sup> We have used the English News: MUC-6 model</p>
      <p>[Figure 4: Structure of the training vector: annotation vector, tweet vector, method 1 vector, method 2 vector, …, method N vector, correct answer, preprocessing vector.]</p>
      <p>The first sub-vector of the training vector was an
annotation vector (Figure 5). The annotation vector described the
reference annotation – whether it was upper or lower case,
used a capital first letter or capitalized all of its words, the
word count, and the type of the detected annotation.</p>
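      <p>The annotation vector can be sketched as a small feature extractor over the annotated substring. The feature names and the whitespace tokenization are assumptions of this sketch; the paper lists the kinds of features but not their exact encoding.</p>
      <preformat>
```python
from typing import Dict, Union


def annotation_vector(text: str, ann_type: str) -> Dict[str, Union[bool, int, str]]:
    """Describe the surface form of a reference annotation (hypothetical feature names)."""
    words = text.split()  # assumption: words are separated by whitespace
    return {
        "is_upper": text.isupper(),
        "is_lower": text.islower(),
        "first_letter_capital": text[:1].isupper(),
        "all_words_capitalized": bool(words) and all(w[:1].isupper() for w in words),
        "word_count": len(words),
        "type": ann_type,
    }
```
      </preformat>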
      <p>The second sub-vector described the micropost as a whole
(Figure 6). It contained features describing whether all
words longer than four characters were capitalized,
uppercase, or lowercase. We call this sub-vector the tweet vector.</p>
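      <p>A minimal sketch of the tweet vector follows. It assumes whitespace tokenization, treats a word as capitalized when its first character is uppercase, and sets all three features to False when the micropost contains no word longer than four characters; these choices and the feature names are assumptions, as the paper does not specify them.</p>
      <preformat>
```python
from typing import Dict


def tweet_vector(text: str, min_len: int = 5) -> Dict[str, bool]:
    """Case features over all words longer than four characters (hypothetical names)."""
    long_words = [w for w in text.split() if len(w) >= min_len]
    has_long = bool(long_words)
    return {
        "all_words_capitalized": has_long and all(w[0].isupper() for w in long_words),
        "all_words_uppercased": has_long and all(w.isupper() for w in long_words),
        "all_words_lowercased": has_long and all(w.islower() for w in long_words),
    }
```
      </preformat>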
      <p>[Figure 6: Tweet vector (all words capitalized*, all words uppercased*, all words lowercased*; * words longer than four characters) and preprocessing vector (ail, aiia, aiir).]</p>
      <p>The rest of the sub-vectors were computed according to
the overlap of the reference annotation with the annotations
produced by a particular NER method. Such a sub-vector (which we call
a method vector) was computed for each method
and contained four other vectors describing the overlap of the
method annotations with the reference annotation for each
target entity type (Figure 7). The annotation type attribute
was filled with the class of the method annotation that exactly
matched the position of the reference annotation and was one of
the target entity classes; otherwise it was left blank.</p>
      <p>[Figure 7: Method vector.]</p>
      <p>Each overlap vector of a particular method and NE class
(Figure 8) consisted of five components – ail: the average
intersection length of a reference annotation with the method
annotations of the same NE class, aiia: the average
intersection ratio of the method annotations of the same NE class
with the reference annotation, aiir: the average intersection
ratio of a reference annotation with the method annotations of the
same NE class, the average confidence (if the underlying method
returns such a value), and the variance of the average confidence.</p>
      <p>[Figure 8: NE overlap vector: ail, aiia, aiir, avg. confidence, confidence variance.]</p>
      <p>The ail component of the overlap vector was computed using
formula (1), where R was a fixed reference annotation and
M<sub>C</sub> was a set of n method annotations of class C intersecting
with the reference annotation R. The ail component was a
simple arithmetic mean of the intersection lengths.</p>
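      <p><disp-formula><label>(1)</label><tex-math>\mathrm{ail}(R, M_C) = \frac{1}{n} \sum_{i=1}^{n} |R \cap M_{C_i}|</tex-math></disp-formula></p>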
      <p>The aiia component was computed using formula (2),
which was also a simple arithmetic mean, but the
intersection lengths were normalized by the lengths of the particular method
annotations M<sub>C<sub>i</sub></sub> intersecting with the reference annotation
R. We wanted the value of the aiia component to describe how
much of the method annotations was covered by the reference
annotation.</p>
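      <p><disp-formula><label>(2)</label><tex-math>\mathrm{aiia}(R, M_C) = \frac{1}{n} \sum_{i=1}^{n} \frac{|R \cap M_{C_i}|}{|M_{C_i}|}</tex-math></disp-formula></p>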
      <p>Similarly, the aiir component was computed using formula
(3), but the intersection lengths were normalized by the length
of the reference annotation R. The value of the aiir component
was used to describe how much of the reference annotation was
covered by the method annotations.</p>
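      <p><disp-formula><label>(3)</label><tex-math>\mathrm{aiir}(R, M_C) = \frac{1}{n} \sum_{i=1}^{n} \frac{|R \cap M_{C_i}|}{|R|}</tex-math></disp-formula></p>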
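      <p>Formulas (1) to (3) can be sketched over character offsets as follows. Representing an annotation as a half-open interval and returning zeros when no method annotation intersects the reference are assumptions of this sketch. The offsets in the usage example are a reconstruction of the Figure 9 example over the text "Mr. Sydney Allard" and are chosen so that the intersection and annotation lengths match the values 6, 10 and 6 that appear in the figure.</p>
      <preformat>
```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character interval [start, end)


def intersection_length(a: Span, b: Span) -> int:
    """Number of characters shared by two spans (0 when they do not overlap)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def overlap_components(reference: Span, method_spans: List[Span]) -> Tuple[float, float, float]:
    """Return (ail, aiia, aiir) for one method and one NE class.

    Only method annotations that intersect the reference are averaged;
    if none intersects, all three components are 0.0.
    """
    ref_len = reference[1] - reference[0]
    hits = [(intersection_length(reference, m), m[1] - m[0]) for m in method_spans]
    hits = [(inter, length) for inter, length in hits if inter > 0]
    if not hits:
        return 0.0, 0.0, 0.0
    n = len(hits)
    ail = sum(inter for inter, _ in hits) / n
    aiia = sum(inter / length for inter, length in hits) / n
    aiir = sum(inter / ref_len for inter, _ in hits) / n
    return ail, aiia, aiir


# Reference "Sydney Allard" = [4, 17); method annotations
# "Mr. Sydney" = [0, 10) and "Allard" = [11, 17).
ail, aiia, aiir = overlap_components((4, 17), [(0, 10), (11, 17)])
print(round(ail, 2), round(aiia, 2), round(aiir, 2))
```
      </preformat>
      <p>Under these assumed offsets, the sketch yields ail = 6.00 and aiia = 0.80, the values shown in Figure 9, and aiir = 6/13.</p>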
      <p>A simple example of overlap vector computation is
depicted in Figure 9. The overlap vector is computed for
method 4 and the PER class with respect to the highlighted
reference annotation. In this example, the reference annotation
is M2.PER1, but it can be any method annotation or the
manual annotation. The rest of the method 4 overlap vectors
are zero-valued, since method 4 does not return annotations
of types LOC, MISC and ORG. Similarly, overlap vectors
with respect to the same reference annotation are
computed for methods 1, 2 and 3, so that all method
vectors of the training vector are available. In total,
eight training vectors are computed, because eight
annotations are taken as reference annotations, including the
manual annotation PER.</p>
      <p>[Figure 9: Example of overlap vector computation over the text "Mr. Sydney Allard". Annotations shown: manual PER, M1.LOC1, M1.PER1, M2.PER1, M3.PER1, M3.LOC1, M4.PER1 and M4.PER2. Computed values: ail(M2.PER1, M4<sub>PER</sub>) = 1/2 (6 + 6) = 6.00; aiia(M2.PER1, M4<sub>PER</sub>) = 1/2 (6/10 + 6/6) = 0.80.]</p>
      <p>The last two components of the training vector were the
correct answer (i.e., the correct annotation type taken from the
manual annotation) and a special preprocessing vector
(Figure 6). The preprocessing vector included three components:
ail, aiia and aiir, which described the intersection of the
reference annotation with the correct answer when the reference
annotation was correct. If the reference annotation was not correct, the values
of the preprocessing vector components were set to zero.</p>
      <p>The number of learning features depended on the number
of combined methods, since for each involved method a new
method vector was computed and included in the training
vector. Some features were more important than others,
and some were not important at all. The effect of specific
learning features is discussed later.</p>
    </sec>
    <sec id="sec-9">
      <title>4.3 Training Data Preprocessing</title>
      <p>Training data was generated automatically as a collection
of training vectors, which needed further processing prior to
applying ML algorithms. Duplicate training
vectors were removed in order to eliminate distortion in the training
and validation process and thus obtain a more balanced
classification model.</p>
      <p>Based on the preprocessing vector (Figure 6), we
removed training vectors in which the
annotation type attribute in the annotation vector was correct but
the aiir attribute in the preprocessing vector was not equal
to 1.0, i.e., the bounds of the reference annotation were not
equal to the bounds of the correct answer. In previous
versions, we tried to accept all the training vectors whose aiir
attribute was at least 0.95, i.e., the reference annotation
overlapped with the correct answer by at least 95%, but
this led to models with lower precision.</p>
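      <p>The two row-level filtering steps described above (removing vectors with a correct type but inexact bounds, and removing duplicates) can be sketched as follows. Representing a training vector as a dictionary and the attribute names annotation_type, correct_answer and pre_aiir are assumptions of this sketch.</p>
      <preformat>
```python
from typing import Dict, List

Row = Dict[str, object]  # one training vector; values are assumed to be hashable


def clean_training_vectors(rows: List[Row]) -> List[Row]:
    """Drop vectors with a correct type but inexact bounds, then drop duplicates."""
    kept: List[Row] = []
    seen = set()
    for row in rows:
        type_correct = row["annotation_type"] == row["correct_answer"]
        if type_correct and row["pre_aiir"] != 1.0:
            continue  # right class, but bounds differ from the correct answer
        key = tuple(sorted(row.items()))  # order-independent identity of the vector
        if key in seen:
            continue  # exact duplicate of an earlier vector
        seen.add(key)
        kept.append(row)
    return kept
```
      </preformat>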
      <p>We also removed several attributes that had zero
information gain and were not useful for the
classification, i.e., attributes with the same value for all the training
vectors. These were usually the average confidence and the variance
of the average confidence, because some NE
recognizers did not provide annotation confidence information;
hence both attributes were always zero, and so was
their information gain. For the same reason, we
also removed attributes that contained information in less
than 3% of records. The attributes of the preprocessing vector
were removed as well.</p>
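      <p>The attribute pruning can be sketched in the same style. The sketch interprets "contained information" as a value other than zero, an empty string or None, and it identifies preprocessing attributes by a hypothetical pre_ name prefix; both are assumptions, since the paper does not define them precisely.</p>
      <preformat>
```python
from typing import Dict, List

Row = Dict[str, object]


def prune_attributes(rows: List[Row], min_fill: float = 0.03,
                     drop_prefix: str = "pre_") -> List[Row]:
    """Remove preprocessing, constant and sparsely filled attributes."""
    if not rows:
        return []
    n = len(rows)
    keep = []
    for name in rows[0]:
        if name.startswith(drop_prefix):
            continue  # preprocessing vector attributes are not used for training
        values = [row[name] for row in rows]
        if len(set(values)) == 1:
            continue  # same value everywhere, so zero information gain
        filled = sum(1 for v in values if v not in (0, 0.0, "", None))
        if min_fill > filled / n:
            continue  # informative in less than 3% of records
        keep.append(name)
    return [{name: row[name] for name in keep} for row in rows]
```
      </preformat>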
      <p>The preprocessing phase significantly reduced the size
of the training data and therefore the memory requirements, and
it also sped up the training process. It started with a set
of ∼63,000 training vectors with ∼200 attributes and
finished with ∼31,000 unique records with ∼100 highly relevant
attributes.</p>
    </sec>
    <sec id="sec-10">
      <title>4.4 Model Training and Evaluation</title>
      <p>
        We tried several algorithms to train different
classification model candidates, which we compared according to
the F1 score. We also examined the AUROC and ACC
(accuracy) measures. All three measures were
obtained from 10-fold cross validation of the model candidates
over the training dataset. Cross validation served as a good
method for identifying suitable model candidates, because
it guarded against overfitting without the need for another
test dataset. The best performance was achieved by a DT
classification model built with the J48<sup>10</sup> algorithm (DTJ48),
followed by an RF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] model. The third was a classification model
based on REPTree (Reduced Error Pruned Tree) built with the
Bagging algorithm (Table 3). We focused on the
two best performing algorithms and built several
classification models while varying some of the input parameters of these
algorithms in order to improve precision and recall. These were the
Minimum Number of Instances per Leaf parameter (hereinafter
parameter “M”) for DTJ48 and the number of trees for RF. The
classification models were evaluated using a hold-out
validation method over the test dataset. Evaluation results
are displayed in Table 4. The best performing
models were those based on RF, which outperformed the models based on DT,
the baseline recognizer and all the underlying NE recognizers.
We can see that recall and precision grew with
the number of trees in the RF models and
converged towards 79% and 76%, respectively. This behavior is more
obvious in Figure 10, where F1 measures are depicted for
particular NE classes for a varied number of
trees. Dashed lines indicate the score of the baseline model.
      </p>
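      <p>The 10-fold cross-validation used for model selection can be sketched as below. This is a minimal illustration, not the tooling used in the experiments: the function names and the train_fn and score_fn callbacks are hypothetical placeholders for any learner and any measure such as F1.</p>
      <preformat>
```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists; each record is tested exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(records, labels, train_fn, score_fn, k=10):
    """Average score_fn over k folds.

    train_fn(records, labels) must return a callable model(record);
    score_fn(true_labels, predicted_labels) must return a number.
    Both are hypothetical hooks supplied by the caller.
    """
    scores = []
    for train, test in k_fold_indices(len(records), k):
        model = train_fn([records[i] for i in train], [labels[i] for i in train])
        predicted = [model(records[i]) for i in test]
        scores.append(score_fn([labels[i] for i in test], predicted))
    return sum(scores) / len(scores)
```
      </preformat>
      <p>Because every record is held out exactly once, the averaged score estimates performance on unseen data without setting aside a separate test set, which is why the hold-out test dataset could be reserved for the final evaluation.</p>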
      <p>Evaluation results of models built with the J48 algorithm
(an implementation of C4.5), while varying the M parameter, are displayed in
Figure 11. The F1 score for LOC approached
the baseline score, as it did for the RF
algorithm while varying the number of trees.
Analogous behavior can be seen in the Macro and Micro average
scores. In ORG and PER classification, the score was higher
than the baseline or at least the same, but we cannot say that
it grew with the parameter M. The same applies
to MISC, where the F1 score varied around the baseline. In
general, increasing the minimum number of instances per leaf
in the DT (parameter M) led to models with higher recall and
precision. Four classification models
slightly outperformed the baseline model, but not as much
as the RF models.</p>
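      <p>The per-class, Macro and Micro average F1 scores discussed above can be computed from true-positive, false-positive and false-negative counts, as in the following sketch. The function names are hypothetical and the counts in the usage note are made up purely for illustration; they are not results from this paper.</p>
      <preformat>
```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_micro_f1(counts):
    """counts maps an NE class to its (tp, fp, fn) tuple.

    Macro F1 averages the per-class F1 scores; Micro F1 pools the
    counts of all classes before computing a single F1.
    """
    per_class_f1 = {c: precision_recall_f1(*v)[2] for c, v in counts.items()}
    macro = sum(per_class_f1.values()) / len(per_class_f1)
    tp, fp, fn = (sum(v[i] for v in counts.values()) for i in range(3))
    micro = precision_recall_f1(tp, fp, fn)[2]
    return per_class_f1, macro, micro
```
      </preformat>
      <p>With illustrative counts such as {"PER": (90, 10, 10), "MISC": (1, 3, 3)}, the rare class pulls the Macro average down strongly while the Micro average stays close to the score of the frequent class.</p>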
      <p>
        The #MSM2013 21 3 model in Table 4 is our
submission to the #MSM2013 IE Challenge [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. This model was
one of our early models, which were based on the groundwork
of this paper. The model finished in second place in
the challenge, losing 1% in F1 to the winner, Habib et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Results of this model in the table may be slightly worse than
the official challenge results (http://oak.dcs.shef.ac.uk/msm2013/ie_challenge/results/challenge_results_summary.pdf), since we used stricter
evaluation criteria. We did not accept partially correct
consecutive annotations; i.e., PER/Christian PER/Bale was
incorrect, while PER/Christian Bale was correct.
      </p>
      <p>[Figures 10 and 11: F1 plotted against the number of trees (RF models) and against parameter M (DTJ48 models); surviving panel titles: LOC, ORG, PER, Micro.]</p>
      <p>For a better
comparison of the models, we present precision, recall and
F1 measures of the best-performing model (RF N400), the best
DT model (DTJ48 M13), the baseline recognizer and the three
best-performing NE recognizers in Figure 12. The gain in
precision of the RF N400 model with respect to the NE
recognizer with the highest precision, Stanford NER, was 18%.
However, the baseline recognizer had higher precision than
RF N400 by 4%. The DT-based model, DTJ48 M13, was
third in precision, followed by Stanford NER. The
highest recall among the combined NE recognizers
was achieved by Illinois NET, reaching 69%. The gain
in recall of the RF N400 model with respect to Illinois NET
was 10%. RF N400 reached the highest recall,
followed by DTJ48 M13 and Illinois NET. Stanford NER and
the baseline recognizer shared the fourth place.</p>
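      <p>The strict evaluation criterion described earlier, under which partially correct consecutive annotations receive no credit, can be sketched as an exact match on annotation boundaries and class. The representation of an annotation as a (start, end, class) tuple with character offsets is an assumption made for this illustration.</p>
      <preformat>
```python
def strict_match_counts(gold, predicted):
    """Count (tp, fp, fn) where only exact (start, end, class) matches count."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold.intersection(predicted))
    fp = len(predicted) - tp
    fn = len(gold) - tp
    return tp, fp, fn


# Example from the text: the micropost fragment "Christian Bale".
gold = [(0, 14, "PER")]
split_prediction = [(0, 9, "PER"), (10, 14, "PER")]  # PER/Christian PER/Bale
whole_prediction = [(0, 14, "PER")]                  # PER/Christian Bale
```
      </preformat>
      <p>Under this criterion the split prediction yields no true positive, two false positives and one false negative, whereas the whole prediction yields a single true positive.</p>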
      <p>The highest F1 score among the combined NE
recognizers was achieved by Illinois NET and
Stanford NER, which both reached 67%. The gain in F1 of
RF N400 with respect to them was 15%. The RF N400 model
with 400 trees also outperformed the second-best model, DTJ48 M13,
and the third-best, the baseline recognizer, whose gain was
10%. A comparison on an NE class basis is depicted in
Figure 13. We did not include the baseline recognizer in the
charts, since it is represented there by its NE recognizers
(see Section 4.1). Our RF N400 model was the best at
recognizing the two most frequent entity classes in the test dataset,
ORG and PER. It gained 7% and 5% with respect to
Illinois Wikifier and Illinois NET, respectively. The best at
recognizing LOC entities was Open Calais, to which the
RF N400 model lost 1%. The MISC entity type was the
domain of the DTJ48 M13 model, which gained 24% with
respect to the second-best, Open Calais.</p>
      <p>Closer analysis of the annotation results showed that
many results were correctly classified but did
not exactly match the position in the text; i.e., the results were
partially correct. Therefore, we applied post-processing
and trimmed non-alphabetical characters off the results. We
also removed definite articles from LOC and PER
results. Moreover, we removed titles from PER results;
e.g., Dr., Mr. or Sir. The evaluation of models with this
simple post-processing (PP) is displayed in Table 5. We
applied post-processing to the best versions of the RF and DT
models. The gain in F1 with respect to the models without
post-processing was 3%. Finally, we tried to build a model by
combining our best models, which were RF N400 PP for the
LOC, ORG and PER NE classes and DTJ48 M13 PP for the MISC
class. This model had better performance in MISC
recognition, but the overall improvement was not notable, because
the occurrence of MISC entities in the test dataset was very
low, thus it did not significantly affect the F1 score.</p>
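      <p>The simple post-processing described above might look like the following sketch. It is an illustration only: the function names are hypothetical, the title list contains just the three examples named in the text, trimming is assumed to apply to both ends of the annotated string, and a complete system would also have to adjust the annotation offsets.</p>
      <preformat>
```python
PERSON_TITLES = {"dr", "mr", "sir"}  # only the examples given in the text


def trim_non_alpha(text):
    """Strip non-alphabetical characters from both ends of the string."""
    start, end = 0, len(text)
    while start != end and not text[start].isalpha():
        start += 1
    while end != start and not text[end - 1].isalpha():
        end -= 1
    return text[start:end]


def post_process(text, ne_class):
    """Clean one annotated entity string according to its NE class."""
    tokens = trim_non_alpha(text).split()
    if ne_class in ("LOC", "PER"):
        # Remove a leading definite article.
        while tokens and tokens[0].lower() == "the":
            tokens = tokens[1:]
    if ne_class == "PER":
        # Remove leading titles such as "Dr." or "Sir".
        while tokens and tokens[0].lower().rstrip(".") in PERSON_TITLES:
            tokens = tokens[1:]
    return " ".join(tokens)  # note: inner whitespace is normalized
```
      </preformat>
      <p>For instance, post_process("Dr. John Smith,", "PER") returns "John Smith", while an ORG result such as "The Guardian" keeps its article because article removal is limited to LOC and PER.</p>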
    </sec>
    <sec id="sec-11">
      <title>5. DISCUSSION AND FUTURE WORK</title>
      <p>
        The structure of the best models (DTJ48 M13 and
RF N400) is based on DTs, whose rules each test
a single input attribute. This could be a weakness of
these models. One possible solution is to use
multivariate DTs, which support multiple attributes per node in
a tree and can also handle correlated attributes [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The
drawback of using multivariate DTs is the time needed to
build them; on the other hand, their time performance is
higher, because they do not test the same attribute multiple
times. We expect that such models could better utilize the
potential of the data and therefore could also be more accurate
than the RF or DT models.
      </p>
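      <p>The difference between a univariate and a multivariate node test can be illustrated with a toy example. This is a hedged sketch, not a tree-learning algorithm: the data points are invented, the weights of the multivariate test are picked by hand rather than learned, and all function names are hypothetical.</p>
      <preformat>
```python
def univariate_test(x, attribute, threshold):
    """Node test of an ordinary DT: compare one attribute to a threshold."""
    return x[attribute] > threshold


def multivariate_test(x, weights, threshold):
    """Node test of a multivariate DT: compare a weighted sum of attributes."""
    return sum(w * v for w, v in zip(weights, x)) > threshold


def accuracy(points, labels, test):
    hits = sum(1 for x, y in zip(points, labels) if test(x) == y)
    return hits / len(points)


def best_univariate_accuracy(points, labels):
    """Best accuracy any single one-attribute test (or its negation) reaches."""
    best = 0.0
    for attribute in range(len(points[0])):
        for threshold in sorted({x[attribute] for x in points}):
            for flip in (False, True):
                acc = accuracy(
                    points, labels,
                    lambda x: univariate_test(x, attribute, threshold) != flip,
                )
                best = max(best, acc)
    return best


# Toy data with two correlated attributes; the class is "first exceeds second".
points = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
labels = [a > b for a, b in points]

single_node_univariate = best_univariate_accuracy(points, labels)
single_node_multivariate = accuracy(
    points, labels, lambda x: multivariate_test(x, (1, -1), 0)
)
```
      </preformat>
      <p>On this toy data a single multivariate test separates the two classes completely, while no single univariate test does, so an ordinary DT would need several nodes that test the same attributes repeatedly.</p>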
    </sec>
    <sec id="sec-12">
      <title>CONCLUSIONS</title>
      <p>We have presented an approach to combining NE recognizers
based on diverse methods for the task of NER in microposts
and examined several ML techniques for combining the
text and annotation features produced by the recognizers.
The best-performing techniques were RF and DT based on the C4.5
algorithm. Combination models produced by these algorithms
achieved performance superior to that of the underlying
NE recognizers as well as the baseline recognizer, which was
built from the best-performing NE recognizers for each target
NE class. The best of our combination models was RF N400,
an RF model with 400 trees. Its gain in F1 with respect to
the best individual NE recognizer was 15% and with respect
to the baseline recognizer 4%. The performance of the RF and
DT models indicated that ML techniques lead to a more
favorable combination of the underlying NE recognizers than
the manual combination in the baseline NE recognizer. The
advantage of the ML models is that they can adapt to the actual
text according to its features and the annotations from the
underlying NE recognizers, as well as benefit from the given negative
examples.</p>
    </sec>
    <sec id="sec-13">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported by projects VEGA 2/0185/13,
VENIS FP7-284984 and CLAN APVV-0809-11.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. E. C.</given-names>
            <surname>Basave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Varga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rowe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stankovic</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.-S.</given-names>
            <surname>Dadzie</surname>
          </string-name>
          .
          <article-title>Making sense of microposts (#msm2013) concept extraction challenge</article-title>
          .
          <source>In Making Sense of Microposts (#MSM2013) Concept Extraction Challenge</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Rout</surname>
          </string-name>
          .
          <article-title>Making sense of social media streams through semantics: a survey</article-title>
          .
          <source>Semantic Web</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <article-title>Random forests</article-title>
          .
          <source>Mach. Learn.</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          , Oct.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Tablan</surname>
          </string-name>
          .
          <article-title>GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications</article-title>
          .
          <source>ACL'02. ACL</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dlugolinsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciglan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Laclavik</surname>
          </string-name>
          .
          <article-title>Evaluation of named entity recognition tools on microposts</article-title>
          .
          <source>INES 2013</source>
          . IEEE,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Etter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ferraro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cotterell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Buzek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Van Durme</surname>
          </string-name>
          .
          <article-title>Nerit: Named entity recognition for informal text</article-title>
          .
          <source>Technical Report 11</source>
          , HLTCE, Johns Hopkins University, July
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Finkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Grenager</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>Incorporating non-local information into information extraction systems by gibbs sampling</article-title>
          .
          <source>ACL '05</source>
          , pages
          <fpage>363</fpage>
          -
          <lpage>370</lpage>
          , Stroudsburg, PA, USA,
          <year>2005</year>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Florian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ittycheriah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jing</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Named entity recognition through classifier combination</article-title>
          .
          <source>CONLL '03</source>
          , pages
          <fpage>168</fpage>
          -
          <lpage>171</lpage>
          , Stroudsburg, PA, USA,
          <year>2003</year>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Habib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Keulen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>Concept extraction challenge: University of Twente at #msm2013</article-title>
          .
          <source>In Making Sense of Microposts (#MSM2013) Concept Extraction Challenge</source>
          , pages
          <fpage>17</fpage>
          -
          <lpage>20</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Korting</surname>
          </string-name>
          .
          <article-title>C4.5 algorithm and multivariate decision trees</article-title>
          . Image Processing Division, National Institute for Space Research (INPE), São José dos Campos, SP, Brazil
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.-S.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Twiner: Named entity recognition in targeted twitter stream</article-title>
          .
          <source>SIGIR '12</source>
          , pages
          <fpage>721</fpage>
          -
          <lpage>730</lpage>
          , New York, NY, USA,
          <year>2012</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>Recognizing named entities in tweets</article-title>
          .
          <source>HLT '11</source>
          , pages
          <fpage>359</fpage>
          -
          <lpage>367</lpage>
          , Stroudsburg, PA, USA,
          <year>2011</year>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Marsh</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Perzanowski</surname>
          </string-name>
          .
          <article-title>Muc-7 evaluation of ie technology: Overview of results</article-title>
          .
          <source>MUC-7</source>
          ,
          April
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Milne</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>An open-source toolkit for mining wikipedia</article-title>
          .
          <source>Artif. Intell.</source>
          ,
          <volume>194</volume>
          :
          <fpage>222</fpage>
          -
          <lpage>239</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Quinlan</surname>
          </string-name>
          .
          <source>C4.5: programs for machine learning</source>
          . Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ramage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nallapati</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>Labeled lda: A supervised topic model for credit attribution in multi-labeled corpora</article-title>
          .
          <source>EMNLP '09</source>
          , pages
          <fpage>248</fpage>
          -
          <lpage>256</lpage>
          , Stroudsburg, PA, USA,
          <year>2009</year>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ratinov</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <article-title>Design challenges and misconceptions in named entity recognition</article-title>
          .
          <source>CoNLL '09</source>
          , pages
          <fpage>147</fpage>
          -
          <lpage>155</lpage>
          . ACL,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ratinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Anderson</surname>
          </string-name>
          .
          <article-title>Local and global algorithms for disambiguation to wikipedia</article-title>
          .
          <source>HLT '11</source>
          , pages
          <fpage>1375</fpage>
          -
          <lpage>1384</lpage>
          . ACL,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ritter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clark</surname>
          </string-name>
          , Mausam, and
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Named entity recognition in tweets: An experimental study</article-title>
          .
          <source>EMNLP '11</source>
          , pages
          <fpage>1524</fpage>
          -
          <lpage>1534</lpage>
          , Stroudsburg, PA, USA,
          <year>2011</year>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Saha</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekbal</surname>
          </string-name>
          .
          <article-title>Combining multiple classifiers using vote based classifier ensemble technique for named entity recognition</article-title>
          .
          <source>Data Knowl. Eng.</source>
          ,
          <volume>85</volume>
          :
          <fpage>15</fpage>
          -
          <lpage>39</lpage>
          , May
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kanungo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Boosting performance of bio-entity recognition by combining results from multiple systems</article-title>
          .
          <source>BIOKDD '05</source>
          , pages
          <fpage>76</fpage>
          -
          <lpage>83</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Tjong Kim Sang</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>De Meulder</surname>
          </string-name>
          .
          <article-title>Introduction to the conll-2003 shared task: language-independent named entity recognition</article-title>
          .
          <source>CONLL '03</source>
          , pages
          <fpage>142</fpage>
          -
          <lpage>147</lpage>
          , Stroudsburg, PA, USA,
          <year>2003</year>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>L.</given-names>
            <surname>Todorovski</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Džeroski</surname>
          </string-name>
          .
          <article-title>Combining classifiers with meta decision trees</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>50</volume>
          (
          <issue>3</issue>
          ):
          <fpage>223</fpage>
          -
          <lpage>249</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Štefan</given-names>
            <surname>Dlugolinský</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Krammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciglan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Laclavík</surname>
          </string-name>
          .
          <article-title>MSM2013 IE Challenge: Annotowatch</article-title>
          .
          <source>In Making Sense of Microposts (#MSM2013) Concept Extraction Challenge</source>
          , pages
          <fpage>21</fpage>
          -
          <lpage>26</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>