=Paper=
{{Paper
|id=Vol-1749/paper_012
|storemode=property
|title=FBK-NLP at NEEL-IT: Active Learning for Domain Adaptation
|pdfUrl=https://ceur-ws.org/Vol-1749/paper_012.pdf
|volume=Vol-1749
|authors=Anne-Lyse Minard,Mohammed Refat Hamouda Qwaider,Bernardo Magnini
|dblpUrl=https://dblp.org/rec/conf/clic-it/MinardQM16
}}
==FBK-NLP at NEEL-IT: Active Learning for Domain Adaptation==
Anne-Lyse Minard (1,2), Mohammed R. H. Qwaider (1), Bernardo Magnini (1)
(1) Fondazione Bruno Kessler, Trento, Italy
(2) Dept. of Information Engineering, University of Brescia, Italy
{minard,qwaider,magnini}@fbk.eu
Abstract

English. In this paper we present the FBK-NLP system which participated in the NEEL-IT task at Evalita 2016. We concentrated our work on the domain adaptation of an existing Named Entity Recognition tool. In particular, we created a new annotated corpus for the NEEL-IT task using an Active Learning method. Our system obtained the best results for the task of Named Entity Recognition, with an F1 of 0.516.

Italiano. In this article we describe the FBK-NLP system with which we participated in the NEEL-IT task at Evalita 2016. We concentrated on adapting an entity recognition system to the domain of tweets. In particular, we created a new corpus using a methodology based on Active Learning. The system obtained the best results on the entity recognition subtask, with an F1 of 0.516.

1 Introduction

This paper describes the FBK-NLP system which participated in the NEEL-IT task at EVALITA 2016 (Basile et al., 2016). The NEEL-IT task focuses on Named Entity Linking in tweets in Italian. It consists of three steps: Named Entity Recognition and Classification (NER) in 7 classes (person, location, organization, product, event, thing and character); the linking of each entity to an entry of DBpedia; and the clustering of the entities. Our participation in the task was mainly motivated by our interest in experimenting with the application of Active Learning (AL) for domain adaptation, in particular in adapting a general purpose Named Entity Recognition system to a specific domain (tweets) by creating new annotated data.

The system follows 3 steps: entity recognition and classification, entity linking to DBpedia, and clustering. Entity recognition and classification is performed by the EntityPro module (Pianta and Zanoli, 2007), which is based on machine learning and uses an SVM algorithm. Entity linking is performed using the named entity disambiguation module developed within the NewsReader project for several languages including Italian. In addition we used the Alignments dataset (Nechaev et al., 2016), a resource which provides links between Twitter profiles and DBpedia. The clustering step is string-based, i.e. two entities are part of the same cluster if their strings are equal.

The paper is organized as follows. In Section 2 we present the domain adaptation of the Named Entity Recognition tool using Active Learning. In Section 3 we describe the system with which we participated in the task, and in Section 4 the results we obtained as well as some further experiments. Finally, we conclude the paper with a discussion in Section 5.

2 Domain Adaptation for NER

We have at our disposal a system for Named Entity Recognition and Classification, a module of the TextPro pipeline (Pianta et al., 2008) called EntityPro (Pianta and Zanoli, 2007), which covers 4 named entity categories in the news domain. It is trained on the publicly available Italian corpus I-CAB (Magnini et al., 2006). I-CAB is composed of news articles from the regional newspaper "L'Adige", is annotated with person, organization, location and geo-political entities, and was used for the Named Entity Recognition task at Evalita 2007 and 2009 (www.evalita.it). However, no annotated data are available for the task of NER in tweets for Italian.
As we were interested in applying Active Learning (AL) methods to the production of training data, we decided to manually annotate our own set of domain-specific training data using an AL method (the annotated data made available by the organizers of the task were used partly as test data and partly as a reference for the annotators, see Section 2.2). Active Learning is used in order to select the most informative examples to be annotated, instead of selecting random examples.

We exploited TextPro-AL (Magnini et al., 2016), a platform which integrates an NLP pipeline, i.e. TextPro (Pianta et al., 2008), with an Active Learning component and an annotation interface based on MT-EQuAL (Girardi et al., 2014). TextPro-AL enables a more efficient use of the annotators' time.

2.1 The TextPro-AL platform

[Figure 1: Architecture of the TextPro-AL platform]

The architecture of the TextPro-AL platform is represented in Figure 1. The AL cycle starts with an annotator providing supervision on a tweet automatically tagged by the system (step 1): the annotator is asked to revise the annotation in case the system made a wrong classification. At step 2a the annotated tweet is stored in a batch, where it is accumulated with other tweets for re-training, and, as a result, a new model (step 3) is produced. This model is then used to automatically annotate a set of unlabeled tweets (step 4) and to assign a confidence score to each annotated tweet (the confidence score is computed as the average of the margin estimated by the SVM classifier for each entity). At step 2b the manually annotated tweet is stored in the Global Memory of the system together with the information about the manual revision. At step 5 a single tweet is selected from the unlabeled dataset through a specific selection strategy (see Algorithm 1). The selected tweet is removed from the unlabeled set and is given to the annotator for revision.

The Global Memory contains the revisions made by the annotator for each tweet. In particular, we are interested in the entities wrongly annotated by the system, which are used to select new tweets to be annotated. Each entity (or error) saved in the memory is used up to 6 times to select new tweets. From the unlabeled dataset, the system selects the most informative instance (i.e. the one with the lowest confidence score) that contains one of the errors saved in the Global Memory (GM). The selection strategy is detailed in Algorithm 1. In a first step the system annotates the tweets of the unlabeled dataset. Then the tweets are sorted from the most informative to the least informative and browsed. The first tweet in the list that contains an error saved in the GM is selected to be revised by the annotator. If no tweet is selected through this process, the system picks one tweet randomly.

Algorithm 1: The selection strategy

    Data: NESet = {NE1 ... NEn}
    begin
        NESortedList ← getMostInformativeInstances(NESet);
        repeat
            instance, sample ← NESortedList.next();
            if inMemory(instance) and revised(instance) then
                return sample;
        until not NESortedList.hasNext();
        return getRandomSample(NESet);
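To make the selection strategy concrete, the following Python sketch mirrors Algorithm 1 under stated assumptions: annotate_with_confidence is a hypothetical stand-in for the retrained EntityPro model (returning the recognized entities and the average SVM margin used as confidence score), and the Global Memory is reduced to a set of previously mis-annotated entity strings. It is an illustration, not the actual TextPro-AL code.

    import random

    def select_tweet(unlabeled, global_memory, annotate_with_confidence):
        """Pick the next tweet for manual revision (sketch of Algorithm 1).

        unlabeled: list of tweets not yet annotated
        global_memory: set of entity strings the system previously got wrong
        annotate_with_confidence: hypothetical function mapping a tweet to
            (entities, confidence), where confidence is e.g. the average
            SVM margin over the entities of the tweet
        """
        # Annotate the whole unlabeled pool and sort it from the most
        # informative (lowest confidence) to the least informative tweet.
        scored = [(tweet,) + annotate_with_confidence(tweet) for tweet in unlabeled]
        scored.sort(key=lambda item: item[2])

        # Select the first tweet containing an error stored in the Global Memory.
        for tweet, entities, _ in scored:
            if any(entity in global_memory for entity in entities):
                unlabeled.remove(tweet)
                return tweet

        # Fallback of Algorithm 1: no match, pick a tweet at random.
        tweet = random.choice(unlabeled)
        unlabeled.remove(tweet)
        return tweet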
2.2 Available Data

As unlabeled dataset of tweets in the AL process we used around 8,000 tweets taken from the development set of Sentipolc 2014 (Basile et al., 2014; http://www.di.unito.it/~tutreeb/sentipolc-evalita14/tweet.html) and the Twita corpus (Basile and Nissim, 2013; http://valeriobasile.github.io/twita/about.html).
    class          AL tweets | NEEL-IT dev                  | news corpus
                             | test 70%   dev 30%   total   |
    # sent/tweets  2,654     | 700        300       1,000   | 458
    # tokens       49,819    | 13,283     5,707     18,990  | 8,304
    Person         1,628     | 225        90        315     | 293
    Location       343       | 89         43        132     | 115
    Organization   723       | 185        63        248     | 224
    Product        478       | 67         41        108     | -
    Event          133       | 12         3         15      | -
    Thing          15        | 15         4         19      | -
    Character      50        | 15         1         16      | -

Table 1: Statistics about the datasets used. Token counts for the tweets are computed after tokenization, i.e. hashtags and aliases can be split into more than one token and emoji are composed of several tokens (see Section 3.1).
The development data provided by the NEEL-IT organizers consists of 1,000 annotated tweets. We split it into two parts: 30% for development (used mainly as a reference for the annotators) and 70% for evaluation (referred to as test 70%).

We decided to retrain EntityPro using a smaller training set, to be able to change the behavior of the model more quickly. In particular we used a sub-part of the training data used by EntityPro, i.e. 6.25% of the training set of the NER task at Evalita 2007 (http://www.evalita.it/2007/tasks/ner), for a total of 8,304 tokens (referred to as news corpus in the remainder of the paper). In order to determine the portion to be used, we tested the performance of EntityPro on test 70% using different portions of the corpus (50%, 25%, 12.5% and 6.25%) as training data. The best results were obtained using 6.25% of the corpus (statistics about this corpus are given in Table 1).

2.3 Manual Annotation of Training Data with TextPro-AL

In our experimentation with TextPro-AL for domain adaptation we built the first model using the news corpus only. Evaluated on test 70%, it reached an F1 of 41.62, with a precision of 54.91 and a recall of 33.51. It has to be noted that with this model only 3 categories of entities can be recognized: person, location and organization. Then, every time 50 new tweets had been annotated, the system was retrained and evaluated on the test 70% corpus. The learning curves of the system are presented in Figure 2. In total we were able to manually annotate 2,654 tweets for a total of 3,370 entities (we will refer to this corpus as AL tweets), which allowed us to obtain an F1 of 53.22 on test 70%. Statistics about the corpus are presented in Table 1.

[Figure 2: Learning curves of the system (recall, precision and F1)]

3 Description of the system

3.1 Entity Recognition and Classification

The preprocessing of the tweets is done using the TextPro tool suite (Pianta et al., 2008; http://textpro.fbk.eu/), in particular the tokenizer, the PoS tagger and the lemmatizer. The rules used by the tokenizer have been lightly adapted for the processing of tweets, for example to be able to split Twitter profile names and hashtags into smaller units. The PoS tagger and the lemmatizer have been used as they are, without any adaptation. In order to avoid encoding problems we replaced all emoji by their textual codes (e.g. :confused face:) using the Python package emoji 0.3.9 (http://pypi.python.org/pypi/emoji/).
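As an illustration of these two adaptations, here is a minimal Python sketch (not the actual TextPro tokenizer rules): profile names and hashtags are split on capitalization boundaries, and emoji are replaced by their textual codes. The hashtag #MatteoRenzi is a hypothetical example; emoji.demojize() is the function exposed by current versions of the emoji package, and the API of version 0.3.9 used at the time may differ slightly.

    import re
    import emoji  # pip install emoji

    def split_handle(token):
        """Split a hashtag or @alias into smaller units on capitalization
        or digit boundaries, e.g. '#MatteoRenzi' -> ['#', 'Matteo', 'Renzi']."""
        body = token.lstrip('#@')
        prefix = token[:len(token) - len(body)]
        parts = re.findall(r'[A-Z][a-z]+|[a-z]+|[A-Z]+|\d+', body)
        return ([prefix] if prefix else []) + (parts or [body])

    def replace_emoji(text):
        """Replace every emoji by its textual code, e.g. ':confused_face:'."""
        return emoji.demojize(text)

    print(split_handle('#MatteoRenzi'))    # ['#', 'Matteo', 'Renzi']
    print(split_handle('@senatoremonti'))  # ['@', 'senatoremonti']: all
                                           # lowercase, so no split (see Section 5)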
The task of entity recognition and classification is performed using an adapted version of the EntityPro module (Pianta and Zanoli, 2007). EntityPro performs named entity recognition based on machine learning, using an SVM algorithm and the Yamcha tool (Kudo and Matsumoto, 2003). It exploits a rich set of linguistic features, as well as gazetteers. We added to the features an orthographic feature (capitalized word, digits, etc.) and character bigrams (the first two characters of the token and the last two). The classifier is used in a one-vs-rest multi-classification strategy.
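By way of illustration, the two added feature types can be sketched as follows (the full EntityPro feature set, with its linguistic features and gazetteers, is considerably richer):

    def orthographic_feature(token):
        """Coarse orthographic class of a token (capitalized word, digits, etc.)."""
        if token.isdigit():
            return 'DIGITS'
        if token.isupper():
            return 'ALLCAPS'
        if token[:1].isupper():
            return 'CAPITALIZED'
        if any(char.isdigit() for char in token):
            return 'HAS_DIGIT'
        return 'LOWERCASE'

    def char_bigrams(token):
        """The first two and the last two characters of a token."""
        return token[:2], token[-2:]

    print(orthographic_feature('Monti'), char_bigrams('Monti'))
    # CAPITALIZED ('Mo', 'ti')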
The format used for the annotation is the classic IOB2 format. Each token is labeled either as B- followed by the entity class (person, location, organization, product, event, thing or character) for the first token of an entity, as I- followed by the entity class for the tokens inside an entity, or as O if the token is not part of an entity.
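For example, a hypothetical fragment such as "Matteo Renzi a Trento" would be encoded as:

    Matteo   B-person
    Renzi    I-person
    a        O
    Trento   B-location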
3.2 Entity Linking

Entity Linking is performed using the Named Entity Disambiguation (NED) module developed within the NewsReader project (http://www.newsreader-project.eu/), supplemented with a resource for the linking of Twitter profiles. The NED module is a wrapper around DBpedia Spotlight developed within NewsReader and part of the ixa-pipeline (https://github.com/ixa-ehu/ixa-pipe-ned). Each entity recognized by the NER module is sent to DBpedia Spotlight, which returns the most probable URI if the entity exists in DBpedia.
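As a sketch of such a lookup, the snippet below queries the public Italian DBpedia Spotlight REST endpoint directly; the system described here goes through the NewsReader NED wrapper instead, and the endpoint URL and confidence threshold are assumptions for illustration.

    import requests  # pip install requests

    def spotlight_uri(mention, confidence=0.4,
                      endpoint='https://api.dbpedia-spotlight.org/it/annotate'):
        """Send an entity mention to DBpedia Spotlight and return the most
        probable URI, or None if the entity is not found in DBpedia."""
        response = requests.get(endpoint,
                                params={'text': mention, 'confidence': confidence},
                                headers={'Accept': 'application/json'})
        response.raise_for_status()
        resources = response.json().get('Resources', [])
        # Spotlight ranks candidate resources; the first is the most probable.
        return resources[0]['@URI'] if resources else None

    print(spotlight_uri('Matteo Renzi'))
    # e.g. http://it.dbpedia.org/resource/Matteo_Renzi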
Tweets often contain aliases, i.e. user profile names, which enable the author of the tweet to refer to other Twitter users, for example @edoardofasoli and @senatoremonti in the following tweet: "@edoardofasoli @senatoremonti Tutti e due" ("Both of them"). In order to identify the DBpedia links of the aliases in the tweets we used the Alignments dataset (Nechaev et al., 2016). The Alignments dataset is built from the 2015-10 edition of English DBpedia: it maps 920,625 DBpedia entries to their corresponding Twitter user profile(s), each with a confidence score. A procedure queries Twitter to get the Twitter profile id from the alias of a user, and then queries the Alignments dataset to get the corresponding DBpedia link if it exists.
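A sketch of this procedure, under stated assumptions: twitter_client stands for an authenticated Twitter API client exposing a hypothetical get_user_id(screen_name) method, the Alignments dataset is assumed to be loaded as a dictionary from Twitter profile id to (DBpedia URI, confidence) pairs, and the confidence threshold is an illustrative parameter, not one from the paper.

    def alias_to_dbpedia(alias, twitter_client, alignments, min_confidence=0.5):
        """Resolve a @alias to a DBpedia link through the Alignments dataset.

        alias: Twitter alias as found in the tweet, e.g. '@senatoremonti'
        twitter_client: hypothetical client with a get_user_id(screen_name) method
        alignments: dict mapping Twitter profile id -> (dbpedia_uri, confidence)
        min_confidence: assumed threshold on the alignment confidence score
        """
        profile_id = twitter_client.get_user_id(alias.lstrip('@'))
        if profile_id is None or profile_id not in alignments:
            return None
        uri, confidence = alignments[profile_id]
        # Keep only sufficiently confident alignments.
        return uri if confidence >= min_confidence else None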
3.3 Clustering

The clustering task aims at gathering the entities referring to the same instance and at assigning an identifier to them, either a DBpedia link or a corpus-based identifier. We performed this task by applying a basic string matching method, i.e. we consider that two entities are part of the same cluster if their strings are the same.
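This string matching clustering amounts to a few lines; in the sketch below the cluster identifier is the DBpedia link when one is available and a corpus-based counter otherwise.

    def cluster_entities(entities, links):
        """Assign a cluster identifier to each entity mention.

        entities: list of entity surface strings recognized in the corpus
        links: dict mapping an entity string to its DBpedia URI, if any
        Two mentions with the same surface string end up in the same cluster.
        """
        clusters = {}
        next_id = 0
        for entity in entities:
            if entity in clusters:
                continue  # identical string: same cluster as before
            if entity in links:
                clusters[entity] = links[entity]            # DBpedia link as id
            else:
                clusters[entity] = 'cluster_%d' % next_id   # corpus-based id
                next_id += 1
        return clusters

    print(cluster_entities(['Monti', 'Renzi', 'Monti'],
                           {'Renzi': 'http://it.dbpedia.org/resource/Matteo_Renzi'}))
    # {'Monti': 'cluster_0', 'Renzi': 'http://it.dbpedia.org/resource/Matteo_Renzi'}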
4 Results

We submitted 3 runs to the NEEL-IT task; they differ in the data included in the training dataset of EntityPro:

• Run 1: news corpus and AL tweets
• Run 2: news corpus, AL tweets and NEEL-IT devset
• Run 3: AL tweets and NEEL-IT devset

The official results are presented in the first part of Table 2. Our best performance is obtained with run 3, with a final score of 0.49.
    runs    training set                                tagging   linking   clustering   final score
    run 1   news corpus + AL tweets                     0.509     0.333     0.574        0.4822
    run 2   news corpus + AL tweets + NEEL-IT devset    0.508     0.346     0.583        0.4894
    run 3   AL tweets + NEEL-IT devset                  0.516     0.348     0.585        0.4932
    run 4*  AL tweets + NEEL-IT devset                  0.517     0.355     0.590        0.4976
    run 5*  news corpus                                 0.378     0.298     0.473        0.3920
    run 6*  NEEL-IT devset                              0.438     0.318     0.515        0.4328
    run 7*  news corpus + NEEL-IT devset                0.459     0.334     0.541        0.4543

Table 2: Results of the submitted runs (runs 1 to 3) and of some further experiments (runs 4 to 7). The official task metrics are "strong typed mention match", "strong link match" and "mention ceaf", and refer to "tagging", "linking" and "clustering" respectively.
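For reference, the final scores in Table 2 are consistent with the weighted combination 0.3 x tagging + 0.3 x linking + 0.4 x clustering; e.g. for run 3: 0.3 x 0.516 + 0.3 x 0.348 + 0.4 x 0.585 = 0.4932.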
After the evaluation period, we ran further experiments, which are marked with an asterisk in Table 2. Run 4 is a version of run 3 in which we removed the wrong links to the Italian DBpedia (URIs of type http://it.dbpedia.org/). For runs 5, 6 and 7, EntityPro is trained using the news corpus alone, the NEEL-IT devset alone, and both, respectively.

In Table 3, we present the performance of our systems in terms of precision, recall and F1 for the subtask of named entity recognition and classification. We observed that using the NEEL-IT devset the precision of our system increased, whereas using the news corpus the recall increased.

            precision   recall   F1
    run 1   0.571       0.459    0.509
    run 2   0.581       0.451    0.508
    run 3   0.598       0.454    0.516

Table 3: Results for the task of named entity recognition and classification.

5 Discussion

We have described our participation in the NEEL-IT task at Evalita 2016. Our work focused on the task of named entity recognition, for which we obtained the best results, with an F1 of 0.516. We were interested in the topic of domain adaptation, which involves two aspects: the type of the documents and the named entity classes of interest. Using EntityPro, an existing NER tool, and the TextPro-AL platform, we created a training dataset for NER in tweets covering the 7 classes identified in the task (we will soon make the new training set available from the website of the HLT-NLP group at FBK, http://hlt-nlp.fbk.eu/).

Our work concentrated on the use of Active Learning for the domain adaptation of a NER system. The Micro-NEEL team (Corcoglioniti et al., 2016), on the other hand, focused on the task of Entity Linking, using The Wiki Machine (Palmero Aprosio and Giuliano, 2016). We have combined our NER system with the Micro-NEEL system. For the tagging subtask we used the same configuration as run 4 (AL tweets + NEEL-IT devset). The results obtained with the combination of the two systems are 0.517 for tagging, 0.465 for linking and 0.586 for clustering. The final score is 0.5290, surpassing all the runs submitted to the task.

One of the main difficulties in identifying named entities in tweets is the splitting of hashtags and aliases (e.g. the identification of Monti in @senatoremonti). We adapted the TextPro tokenizer to split those sequences of characters into smaller units, but it works only if the different words are capitalized or separated by punctuation signs (e.g. _ or -). A more complex approach should be used, for example using a dictionary to improve the splitting.

The named entity categories covered in this task are seven: person, location, organization, product, event, thing and character. The first three categories are the classical ones and cover the highest number of named entities in several corpora; Table 1 gives evidence of the prominence of these three classes. With the AL method we used, we were able to annotate new tweets containing entities of the less represented classes, in particular product, event and character. However, the class thing is still not well represented in our corpus and the classes remain unbalanced. In the future we plan to add to the TextPro-AL platform the possibility for the annotators to monitor the Global Memory used in the AL process, in order to give precedence to examples containing entities of under-represented classes.
well represented classes. Taku Kudo and Yuji Matsumoto. 2003. Fast Methods
for Kernel-based Text Analysis. In Proceedings of
Acknowledgments the 41st Annual Meeting on Association for Com-
putational Linguistics - Volume 1, ACL ’03, pages
This work has been partially supported by the EU- 24–31, Stroudsburg, PA, USA.
CLIP (EUregio Cross LInguistic Project) project, Bernardo Magnini, Emanuele Pianta, Christian Girardi,
under a collaboration between FBK and Eure- Matteo Negri, Lorenza Romano, Manuela Speranza,
gio.12 Valentina Bartalesi Lenzi, and Rachele Sprugnoli.
2006. I-CAB: the Italian Content Annotation Bank.
In Proceedings of the 5th Conference on Language
Resources and Evaluation (LREC-2006).
References
Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107, Atlanta.

Valerio Basile, Andrea Bolioli, Malvina Nissim, Viviana Patti, and Paolo Rosso. 2014. Overview of the Evalita 2014 SENTIment POLarity Classification Task. In Proceedings of the 4th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'14), Pisa, Italy.

Pierpaolo Basile, Annalina Caputo, Anna Lisa Gentile, and Giuseppe Rizzo. 2016. Overview of the EVALITA 2016 Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) Task. In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors, Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). Associazione Italiana di Linguistica Computazionale (AILC).

Francesco Corcoglioniti, Alessio Palmero Aprosio, Yaroslav Nechaev, and Claudio Giuliano. 2016. MicroNeel: Combining NLP Tools to Perform Named Entity Detection and Linking on Microposts. In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors, Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). Associazione Italiana di Linguistica Computazionale (AILC).

Christian Girardi, Luisa Bentivogli, Mohammad Amin Farajian, and Marcello Federico. 2014. MT-EQuAL: a toolkit for human assessment of machine translation output. In COLING 2014, 25th International Conference on Computational Linguistics, Proceedings of the Conference System Demonstrations, August 23-29, 2014, Dublin, Ireland, pages 120–123.

Taku Kudo and Yuji Matsumoto. 2003. Fast Methods for Kernel-based Text Analysis. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL '03, pages 24–31, Stroudsburg, PA, USA.

Bernardo Magnini, Emanuele Pianta, Christian Girardi, Matteo Negri, Lorenza Romano, Manuela Speranza, Valentina Bartalesi Lenzi, and Rachele Sprugnoli. 2006. I-CAB: the Italian Content Annotation Bank. In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC-2006).

Bernardo Magnini, Anne-Lyse Minard, Mohammed R. H. Qwaider, and Manuela Speranza. 2016. TextPro-AL: An Active Learning Platform for Flexible and Efficient Production of Training Data for NLP Tasks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations.

Yaroslav Nechaev, Francesco Corcoglioniti, and Claudio Giuliano. 2016. Linking knowledge bases to social media profiles.

Alessio Palmero Aprosio and Claudio Giuliano. 2016. The Wiki Machine: an open source software for entity linking and enrichment. ArXiv e-prints.

Emanuele Pianta and Roberto Zanoli. 2007. EntityPro: Exploiting SVM for Italian named entity recognition. Intelligenza Artificiale, numero speciale su Strumenti per l'elaborazione del linguaggio naturale per l'italiano, EVALITA 2007, 4(2):69–70.

Emanuele Pianta, Christian Girardi, and Roberto Zanoli. 2008. The TextPro Tool Suite. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco.