=Paper=
{{Paper
|id=Vol-1749/paper_012
|storemode=property
|title=FBK-NLP at NEEL-IT: Active Learning for Domain Adaptation
|pdfUrl=https://ceur-ws.org/Vol-1749/paper_012.pdf
|volume=Vol-1749
|authors=Anne-Lyse Minard,Mohammed Refat Hamouda Qwaider,Bernardo Magnini
|dblpUrl=https://dblp.org/rec/conf/clic-it/MinardQM16
}}
==FBK-NLP at NEEL-IT: Active Learning for Domain Adaptation==
Anne-Lyse Minard (1,2), Mohammed R. H. Qwaider (1), Bernardo Magnini (1)
(1) Fondazione Bruno Kessler, Trento, Italy
(2) Dept. of Information Engineering, University of Brescia, Italy
{minard,qwaider,magnini}@fbk.eu
Abstract

English. In this paper we present the FBK-NLP system which participated in the NEEL-IT task at Evalita 2016. We concentrated our work on the domain adaptation of an existing Named Entity Recognition tool. In particular, we created a new annotated corpus for the NEEL-IT task using an Active Learning method. Our system obtained the best results for the task of Named Entity Recognition, with an F1 of 0.516.

Italiano. In this article we describe the FBK-NLP system with which we participated in the NEEL-IT task at Evalita 2016. We concentrated on adapting an entity recognition system to the domain of tweets. In particular, we created a new corpus using a methodology based on Active Learning. The system obtained the best results on the entity recognition subtask, with an F1 of 0.516.

1 Introduction

This paper describes the FBK-NLP system which participated in the NEEL-IT task at EVALITA 2016 (Basile et al., 2016). The NEEL-IT task focuses on Named Entity Linking in tweets in Italian. It consists of three steps: Named Entity Recognition and Classification (NER) in 7 classes (person, location, organization, product, event, thing and character); the linking of each entity to an entry of DBpedia; and the clustering of the entities. Our participation in the task was mainly motivated by our interest in experimenting with the application of Active Learning (AL) for domain adaptation, in particular in adapting a general purpose Named Entity Recognition system to a specific domain (tweets) by creating new annotated data.

The system follows 3 steps: entity recognition and classification, entity linking to DBpedia, and clustering. Entity recognition and classification is performed by the EntityPro module (Pianta and Zanoli, 2007), which is based on machine learning and uses an SVM algorithm. Entity linking is performed using the named entity disambiguation module developed within the NewsReader project for several languages including Italian. In addition we used the Alignments dataset (Nechaev et al., 2016), a resource which provides links between Twitter profiles and DBpedia. The clustering step is string-based, i.e. two entities are part of the same cluster if their strings are equal.

The paper is organized as follows. In Section 2 we present the domain adaptation of the Named Entity Recognition tool using Active Learning. In Section 3 we describe the system with which we participated in the task, and in Section 4 the results we obtained as well as some further experiments. Finally, we conclude the paper with a discussion in Section 5.

2 Domain Adaptation for NER

We have at our disposal a system for Named Entity Recognition and Classification, a module of the TextPro pipeline (Pianta et al., 2008) called EntityPro (Pianta and Zanoli, 2007), which covers 4 named entity categories in the news domain. It is trained on the publicly available Italian corpus I-CAB (Magnini et al., 2006). I-CAB is composed of news articles from the regional newspaper "L'Adige", is annotated with person, organization, location and geo-political entities, and was used for the Named Entity Recognition task at Evalita 2007 and 2009 (www.evalita.it). However, no annotated data are available for the task of NER in tweets for Italian.
As we were interested in applying Active Learning (AL) methods to the production of training data, we decided to manually annotate our own set of domain-specific training data using an AL method (the annotated data made available by the organizers of the task were used partly as test data and partly as a reference for the annotators, see Section 2.2). Active Learning is used in order to select the most informative examples to be annotated, instead of selecting random examples.

We exploited TextPro-AL (Magnini et al., 2016), a platform which integrates an NLP pipeline, i.e. TextPro (Pianta et al., 2008), with an Active Learning component and an annotation interface based on MT-EQuAL (Girardi et al., 2014). TextPro-AL enables a more efficient use of the annotators' time.

2.1 The TextPro-AL platform

[Figure 1: Architecture of the TextPro-AL platform]

The architecture of the TextPro-AL platform is represented in Figure 1. The AL cycle starts with an annotator providing supervision on a tweet automatically tagged by the system (step 1): the annotator is asked to revise the annotation in case the system made a wrong classification. At step 2a the annotated tweet is stored in a batch, where it is accumulated with other tweets for re-training, and, as a result, a new model (step 3) is produced. This model is then used to automatically annotate a set of unlabeled tweets (step 4) and to assign a confidence score to each annotated tweet (the confidence score is computed as the average of the margin estimated by the SVM classifier for each entity). At step 2b the manually annotated tweet is stored in the Global Memory of the system together with the information about the manual revision. At step 5 a single tweet is selected from the unlabeled dataset through a specific selection strategy (see Algorithm 1). The selected tweet is removed from the unlabeled set and is given to the annotator for revision.

The Global Memory contains the revisions made by the annotator for each tweet. In particular, we are interested in the entities wrongly annotated by the system, which are used to select new tweets to be annotated. Each entity (or error) saved in the memory is used up to 6 times to select new tweets. From the unlabeled dataset, the system selects the most informative instance (i.e. the one with the lowest confidence score) that contains one of the errors saved in the Global Memory (GM). The selection strategy is detailed in Algorithm 1. In a first step the system annotates the tweets of the unlabeled dataset. Then the tweets are sorted from the most informative to the least informative and browsed. The first tweet in the list that contains an error saved in the GM is selected to be revised by the annotator. If no tweet is selected through this process, the system picks one tweet randomly.

Algorithm 1: The selection strategy

    Data: NESet = {NE1 ... NEn}
    begin
        NESortedList ← getMostInformativeInstances(NESet);
        repeat
            instance, sample ← NESortedList.next();
            if inMemory(instance) and revised(instance) then
                return sample;
        until not NESortedList.hasNext();
        return getRandomSample(NESet);
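To make the selection strategy concrete, the following Python sketch mirrors Algorithm 1 under stated assumptions: annotate_with_confidence is a hypothetical stand-in for the retrained EntityPro model (returning the recognized entities and the average SVM margin used as confidence score), and the Global Memory is reduced to a set of previously mis-annotated entity strings. It is an illustration, not the actual TextPro-AL code.

    import random

    def select_tweet(unlabeled, global_memory, annotate_with_confidence):
        """Pick the next tweet for manual revision (sketch of Algorithm 1).

        unlabeled: list of tweets not yet annotated
        global_memory: set of entity strings the system previously got wrong
        annotate_with_confidence: hypothetical function mapping a tweet to
            (entities, confidence), where confidence is e.g. the average
            SVM margin over the entities of the tweet
        """
        # Annotate the whole unlabeled pool and sort it from the most
        # informative (lowest confidence) to the least informative tweet.
        scored = [(tweet,) + annotate_with_confidence(tweet) for tweet in unlabeled]
        scored.sort(key=lambda item: item[2])

        # Select the first tweet containing an error stored in the Global Memory.
        for tweet, entities, _ in scored:
            if any(entity in global_memory for entity in entities):
                unlabeled.remove(tweet)
                return tweet

        # Fallback of Algorithm 1: no match, pick a tweet at random.
        tweet = random.choice(unlabeled)
        unlabeled.remove(tweet)
        return tweet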
2.2 Available Data

As unlabeled dataset of tweets in the AL process we used around 8,000 tweets taken from the development set of Sentipolc 2014 (Basile et al., 2014; http://www.di.unito.it/~tutreeb/sentipolc-evalita14/tweet.html) and the Twita corpus (Basile and Nissim, 2013; http://valeriobasile.github.io/twita/about.html).
    class          AL tweets | NEEL-IT dev                  | news corpus
                             | test 70%   dev 30%   total   |
    # sent/tweets  2,654     | 700        300       1,000   | 458
    # tokens       49,819    | 13,283     5,707     18,990  | 8,304
    Person         1,628     | 225        90        315     | 293
    Location       343       | 89         43        132     | 115
    Organization   723       | 185        63        248     | 224
    Product        478       | 67         41        108     | -
    Event          133       | 12         3         15      | -
    Thing          15        | 15         4         19      | -
    Character      50        | 15         1         16      | -

Table 1: Statistics about the datasets used. Token counts for the tweets are computed after tokenization, i.e. hashtags and aliases can be split into more than one token and emoji are composed of several tokens (see Section 3.1).
The development data provided by the NEEL-IT organizers consists of 1,000 annotated tweets. We split it into two parts: 30% for development (used mainly as a reference for the annotators) and 70% for evaluation (referred to as test 70%).

We decided to retrain EntityPro using a smaller training set, to be able to change the behavior of the model more quickly. In particular we used a sub-part of the training data used by EntityPro, i.e. 6.25% of the training set of the NER task at Evalita 2007 (http://www.evalita.it/2007/tasks/ner), for a total of 8,304 tokens (referred to as news corpus in the remainder of the paper). In order to determine the portion to be used, we tested the performance of EntityPro on test 70% using different portions of the corpus (50%, 25%, 12.5% and 6.25%) as training data. The best results were obtained using 6.25% of the corpus (statistics about this corpus are given in Table 1).

2.3 Manual Annotation of Training Data with TextPro-AL

In our experimentation with TextPro-AL for domain adaptation we built the first model using the news corpus only. Evaluated on test 70%, it reached an F1 of 41.62, with a precision of 54.91 and a recall of 33.51. It has to be noted that with this model only 3 categories of entities can be recognized: person, location and organization. Then, every time 50 new tweets had been annotated, the system was retrained and evaluated on the test 70% corpus. The learning curves of the system are presented in Figure 2. In total we were able to manually annotate 2,654 tweets for a total of 3,370 entities (we will refer to this corpus as AL tweets), which allowed us to obtain an F1 of 53.22 on test 70%. Statistics about the corpus are presented in Table 1.

[Figure 2: Learning curves of the system (recall, precision and F1)]

3 Description of the system

3.1 Entity Recognition and Classification

The preprocessing of the tweets is done using the TextPro tool suite (Pianta et al., 2008; http://textpro.fbk.eu/), in particular the tokenizer, the PoS tagger and the lemmatizer. The rules used by the tokenizer have been lightly adapted for the processing of tweets, for example to be able to split Twitter profile names and hashtags into smaller units. The PoS tagger and the lemmatizer have been used as they are, without any adaptation. In order to avoid encoding problems we replaced all emoji by their textual codes (e.g. :confused face:) using the Python package emoji 0.3.9 (http://pypi.python.org/pypi/emoji/).
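As an illustration of these two adaptations, here is a minimal Python sketch (not the actual TextPro tokenizer rules): profile names and hashtags are split on capitalization boundaries, and emoji are replaced by their textual codes. The hashtag #MatteoRenzi is a hypothetical example; emoji.demojize() is the function exposed by current versions of the emoji package, and the API of version 0.3.9 used at the time may differ slightly.

    import re
    import emoji  # pip install emoji

    def split_handle(token):
        """Split a hashtag or @alias into smaller units on capitalization
        or digit boundaries, e.g. '#MatteoRenzi' -> ['#', 'Matteo', 'Renzi']."""
        body = token.lstrip('#@')
        prefix = token[:len(token) - len(body)]
        parts = re.findall(r'[A-Z][a-z]+|[a-z]+|[A-Z]+|\d+', body)
        return ([prefix] if prefix else []) + (parts or [body])

    def replace_emoji(text):
        """Replace every emoji by its textual code, e.g. ':confused_face:'."""
        return emoji.demojize(text)

    print(split_handle('#MatteoRenzi'))    # ['#', 'Matteo', 'Renzi']
    print(split_handle('@senatoremonti'))  # ['@', 'senatoremonti']: all
                                           # lowercase, so no split (see Section 5)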
The task of entity recognition and classification is performed using an adapted version of the EntityPro module (Pianta and Zanoli, 2007). EntityPro performs named entity recognition based on machine learning, using an SVM algorithm and the Yamcha tool (Kudo and Matsumoto, 2003). It exploits a rich set of linguistic features, as well as gazetteers. We added to the features an orthographic feature (capitalized word, digits, etc.) and character bigrams (the first two characters of the token and the last two). The classifier is used in a one-vs-rest multi-classification strategy.
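By way of illustration, the two added feature types can be sketched as follows (the full EntityPro feature set, with its linguistic features and gazetteers, is considerably richer):

    def orthographic_feature(token):
        """Coarse orthographic class of a token (capitalized word, digits, etc.)."""
        if token.isdigit():
            return 'DIGITS'
        if token.isupper():
            return 'ALLCAPS'
        if token[:1].isupper():
            return 'CAPITALIZED'
        if any(char.isdigit() for char in token):
            return 'HAS_DIGIT'
        return 'LOWERCASE'

    def char_bigrams(token):
        """The first two and the last two characters of a token."""
        return token[:2], token[-2:]

    print(orthographic_feature('Monti'), char_bigrams('Monti'))
    # CAPITALIZED ('Mo', 'ti')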
The format used for the annotation is the classic IOB2 format. Each token is labeled either as B- followed by the entity class (person, location, organization, product, event, thing or character) for the first token of an entity, as I- followed by the entity class for the tokens inside an entity, or as O if the token is not part of an entity.
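For example, a hypothetical fragment such as "Matteo Renzi a Trento" would be encoded as:

    Matteo   B-person
    Renzi    I-person
    a        O
    Trento   B-location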
3.2 Entity Linking

Entity Linking is performed using the Named Entity Disambiguation (NED) module developed within the NewsReader project (http://www.newsreader-project.eu/), supplemented with a resource for the linking of Twitter profiles. The NED module is a wrapper around DBpedia Spotlight developed within NewsReader and part of the ixa-pipeline (https://github.com/ixa-ehu/ixa-pipe-ned). Each entity recognized by the NER module is sent to DBpedia Spotlight, which returns the most probable URI if the entity exists in DBpedia.
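As a sketch of such a lookup, the snippet below queries the public Italian DBpedia Spotlight REST endpoint directly; the system described here goes through the NewsReader NED wrapper instead, and the endpoint URL and confidence threshold are assumptions for illustration.

    import requests  # pip install requests

    def spotlight_uri(mention, confidence=0.4,
                      endpoint='https://api.dbpedia-spotlight.org/it/annotate'):
        """Send an entity mention to DBpedia Spotlight and return the most
        probable URI, or None if the entity is not found in DBpedia."""
        response = requests.get(endpoint,
                                params={'text': mention, 'confidence': confidence},
                                headers={'Accept': 'application/json'})
        response.raise_for_status()
        resources = response.json().get('Resources', [])
        # Spotlight ranks candidate resources; the first is the most probable.
        return resources[0]['@URI'] if resources else None

    print(spotlight_uri('Matteo Renzi'))
    # e.g. http://it.dbpedia.org/resource/Matteo_Renzi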
Tweets often contain aliases, i.e. user profile names, which enable the author of the tweet to refer to other Twitter users, for example @edoardofasoli and @senatoremonti in the following tweet: "@edoardofasoli @senatoremonti Tutti e due" ("Both of them"). In order to identify the DBpedia links of the aliases in the tweets we used the Alignments dataset (Nechaev et al., 2016). The Alignments dataset is built from the 2015-10 edition of English DBpedia: it maps 920,625 DBpedia entries to their corresponding Twitter user profile(s), each with a confidence score. A procedure queries Twitter to get the Twitter profile id from the alias of a user, and then queries the Alignments dataset to get the corresponding DBpedia link if it exists.
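A sketch of this procedure, under stated assumptions: twitter_client stands for an authenticated Twitter API client exposing a hypothetical get_user_id(screen_name) method, the Alignments dataset is assumed to be loaded as a dictionary from Twitter profile id to (DBpedia URI, confidence) pairs, and the confidence threshold is an illustrative parameter, not one from the paper.

    def alias_to_dbpedia(alias, twitter_client, alignments, min_confidence=0.5):
        """Resolve a @alias to a DBpedia link through the Alignments dataset.

        alias: Twitter alias as found in the tweet, e.g. '@senatoremonti'
        twitter_client: hypothetical client with a get_user_id(screen_name) method
        alignments: dict mapping Twitter profile id -> (dbpedia_uri, confidence)
        min_confidence: assumed threshold on the alignment confidence score
        """
        profile_id = twitter_client.get_user_id(alias.lstrip('@'))
        if profile_id is None or profile_id not in alignments:
            return None
        uri, confidence = alignments[profile_id]
        # Keep only sufficiently confident alignments.
        return uri if confidence >= min_confidence else None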
3.3 Clustering

The clustering task aims at gathering the entities referring to the same instance and at assigning an identifier to them, either a DBpedia link or a corpus-based identifier. We performed this task by applying a basic string matching method, i.e. we consider that two entities are part of the same cluster if their strings are the same.
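This string matching clustering amounts to a few lines; in the sketch below the cluster identifier is the DBpedia link when one is available and a corpus-based counter otherwise.

    def cluster_entities(entities, links):
        """Assign a cluster identifier to each entity mention.

        entities: list of entity surface strings recognized in the corpus
        links: dict mapping an entity string to its DBpedia URI, if any
        Two mentions with the same surface string end up in the same cluster.
        """
        clusters = {}
        next_id = 0
        for entity in entities:
            if entity in clusters:
                continue  # identical string: same cluster as before
            if entity in links:
                clusters[entity] = links[entity]            # DBpedia link as id
            else:
                clusters[entity] = 'cluster_%d' % next_id   # corpus-based id
                next_id += 1
        return clusters

    print(cluster_entities(['Monti', 'Renzi', 'Monti'],
                           {'Renzi': 'http://it.dbpedia.org/resource/Matteo_Renzi'}))
    # {'Monti': 'cluster_0', 'Renzi': 'http://it.dbpedia.org/resource/Matteo_Renzi'}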
4 Results

We submitted 3 runs to the NEEL-IT task; they differ in the data included in the training dataset of EntityPro:

• Run 1: news corpus and AL tweets
• Run 2: news corpus, AL tweets and NEEL-IT devset
• Run 3: AL tweets and NEEL-IT devset

The official results are presented in the first part of Table 2. Our best performance is obtained with run 3, with a final score of 0.49.
    runs    training set                                tagging   linking   clustering   final score
    run 1   news corpus + AL tweets                     0.509     0.333     0.574        0.4822
    run 2   news corpus + AL tweets + NEEL-IT devset    0.508     0.346     0.583        0.4894
    run 3   AL tweets + NEEL-IT devset                  0.516     0.348     0.585        0.4932
    run 4*  AL tweets + NEEL-IT devset                  0.517     0.355     0.590        0.4976
    run 5*  news corpus                                 0.378     0.298     0.473        0.3920
    run 6*  NEEL-IT devset                              0.438     0.318     0.515        0.4328
    run 7*  news corpus + NEEL-IT devset                0.459     0.334     0.541        0.4543

Table 2: Results of the submitted runs (runs 1 to 3) and of some further experiments (runs 4 to 7). The official task metrics are "strong typed mention match", "strong link match" and "mention ceaf", and refer to "tagging", "linking" and "clustering" respectively.
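For reference, the final scores in Table 2 are consistent with the weighted combination 0.3 x tagging + 0.3 x linking + 0.4 x clustering; e.g. for run 3: 0.3 x 0.516 + 0.3 x 0.348 + 0.4 x 0.585 = 0.4932.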
After the evaluation period, we ran further experiments, which are marked with an asterisk in Table 2. Run 4 is a version of run 3 in which we removed the wrong links to the Italian DBpedia (URIs of type http://it.dbpedia.org/). For runs 5, 6 and 7, EntityPro is trained using the news corpus alone, the NEEL-IT devset alone, and both, respectively.

In Table 3, we present the performance of our systems in terms of precision, recall and F1 for the subtask of named entity recognition and classification. We observed that using the NEEL-IT devset the precision of our system increased, whereas using the news corpus the recall increased.

            precision   recall   F1
    run 1   0.571       0.459    0.509
    run 2   0.581       0.451    0.508
    run 3   0.598       0.454    0.516

Table 3: Results for the task of named entity recognition and classification.

5 Discussion

We have described our participation in the NEEL-IT task at Evalita 2016. Our work focused on the task of named entity recognition, for which we obtained the best results, with an F1 of 0.516. We were interested in the topic of domain adaptation, which involves two aspects: the type of the documents and the named entity classes of interest. Using EntityPro, an existing NER tool, and the TextPro-AL platform, we created a training dataset for NER in tweets covering the 7 classes identified in the task (we will soon make the new training set available from the website of the HLT-NLP group at FBK, http://hlt-nlp.fbk.eu/).

Our work concentrated on the use of Active Learning for the domain adaptation of a NER system. The Micro-NEEL team (Corcoglioniti et al., 2016), on the other hand, focused on the task of Entity Linking, using The Wiki Machine (Palmero Aprosio and Giuliano, 2016). We have combined our NER system with the Micro-NEEL system. For the tagging subtask we used the same configuration as run 4 (AL tweets + NEEL-IT devset). The results obtained with the combination of the two systems are 0.517 for tagging, 0.465 for linking and 0.586 for clustering. The final score is 0.5290, surpassing all the runs submitted to the task.

One of the main difficulties in identifying named entities in tweets is the splitting of hashtags and aliases (e.g. the identification of Monti in @senatoremonti). We adapted the TextPro tokenizer to split those sequences of characters into smaller units, but it works only if the different words are capitalized or separated by punctuation signs (e.g. _ or -). A more complex approach should be used, for example using a dictionary to improve the splitting.

The named entity categories covered in this task are seven: person, location, organization, product, event, thing and character. The first three categories are the classical ones and cover the highest number of named entities in several corpora; Table 1 gives evidence of the prominence of these three classes. With the AL method we used, we were able to annotate new tweets containing entities of the less represented classes, in particular product, event and character. However, the class thing is still not well represented in our corpus and the classes remain unbalanced. In the future we plan to add to the TextPro-AL platform the possibility for the annotators to monitor the Global Memory used in the AL process, in order to give precedence to examples containing entities of under-represented classes.
well represented classes. Taku Kudo and Yuji Matsumoto. 2003. Fast Methods
for Kernel-based Text Analysis. In Proceedings of
Acknowledgments the 41st Annual Meeting on Association for Com-
putational Linguistics - Volume 1, ACL ’03, pages
This work has been partially supported by the EU- 24–31, Stroudsburg, PA, USA.
CLIP (EUregio Cross LInguistic Project) project, Bernardo Magnini, Emanuele Pianta, Christian Girardi,
under a collaboration between FBK and Eure- Matteo Negri, Lorenza Romano, Manuela Speranza,
gio.12 Valentina Bartalesi Lenzi, and Rachele Sprugnoli.
2006. I-CAB: the Italian Content Annotation Bank.
In Proceedings of the 5th Conference on Language
Resources and Evaluation (LREC-2006).
References
Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107, Atlanta.

Valerio Basile, Andrea Bolioli, Malvina Nissim, Viviana Patti, and Paolo Rosso. 2014. Overview of the Evalita 2014 SENTIment POLarity Classification Task. In Proceedings of the 4th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'14), Pisa, Italy.

Pierpaolo Basile, Annalina Caputo, Anna Lisa Gentile, and Giuseppe Rizzo. 2016. Overview of the EVALITA 2016 Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) Task. In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors, Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). Associazione Italiana di Linguistica Computazionale (AILC).

Francesco Corcoglioniti, Alessio Palmero Aprosio, Yaroslav Nechaev, and Claudio Giuliano. 2016. MicroNeel: Combining NLP Tools to Perform Named Entity Detection and Linking on Microposts. In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors, Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). Associazione Italiana di Linguistica Computazionale (AILC).

Christian Girardi, Luisa Bentivogli, Mohammad Amin Farajian, and Marcello Federico. 2014. MT-EQuAL: a toolkit for human assessment of machine translation output. In COLING 2014, 25th International Conference on Computational Linguistics, Proceedings of the Conference System Demonstrations, August 23-29, 2014, Dublin, Ireland, pages 120–123.

Taku Kudo and Yuji Matsumoto. 2003. Fast Methods for Kernel-based Text Analysis. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL '03, pages 24–31, Stroudsburg, PA, USA.

Bernardo Magnini, Emanuele Pianta, Christian Girardi, Matteo Negri, Lorenza Romano, Manuela Speranza, Valentina Bartalesi Lenzi, and Rachele Sprugnoli. 2006. I-CAB: the Italian Content Annotation Bank. In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC-2006).

Bernardo Magnini, Anne-Lyse Minard, Mohammed R. H. Qwaider, and Manuela Speranza. 2016. TextPro-AL: An Active Learning Platform for Flexible and Efficient Production of Training Data for NLP Tasks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations.

Yaroslav Nechaev, Francesco Corcoglioniti, and Claudio Giuliano. 2016. Linking knowledge bases to social media profiles.

Alessio Palmero Aprosio and Claudio Giuliano. 2016. The Wiki Machine: an open source software for entity linking and enrichment. ArXiv e-prints.

Emanuele Pianta and Roberto Zanoli. 2007. EntityPro: Exploiting SVM for Italian named entity recognition. Intelligenza Artificiale, numero speciale su Strumenti per l'elaborazione del linguaggio naturale per l'italiano, EVALITA 2007, 4(2):69–70.

Emanuele Pianta, Christian Girardi, and Roberto Zanoli. 2008. The TextPro Tool Suite. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco.