=Paper=
{{Paper
|id=Vol-2696/paper_266
|storemode=property
|title=DeLFT and Entity-fishing: Tools for CLEF HIPE 2020 Shared Task
|pdfUrl=https://ceur-ws.org/Vol-2696/paper_266.pdf
|volume=Vol-2696
|authors=Tanti Kristanti,Laurent Romary
|dblpUrl=https://dblp.org/rec/conf/clef/KristantiR20
}}
==DeLFT and Entity-fishing: Tools for CLEF HIPE 2020 Shared Task==
Tanti Kristanti (ORCID: 0000-0003-3524-020X) and Laurent Romary (ORCID: 0000-0002-0756-0508)
Inria, Paris
{tanti.kristanti,laurent.romary}@inria.fr

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.

Abstract. This article presents an overview of the approaches and results of our participation in the CLEF HIPE 2020 NERC-COARSE-LIT and EL-ONLY tasks for English and French. For these two tasks, we use two systems: 1) DeLFT, a deep learning framework for text processing, and 2) entity-fishing, a generic named entity recognition and disambiguation service deployed in the technical framework of Inria.

Keywords: entity recognition · entity linking · machine learning · deep learning

1 Introduction

Named Entity Recognition (NER) refers to the task of identifying text spans that mention named entities and classifying them into predefined classes (e.g., person, location, organization), whereas Entity Linking (EL) refers to the task of detecting textual entity mentions and matching them to corresponding entries within knowledge bases (e.g., Wikipedia, Wikidata). Over the past few years, traditional machine learning approaches (i.e., rule-based, unsupervised, and feature-based supervised learning) have evolved into deep learning (DL) approaches, yielding state-of-the-art performances [7, 8, 21]. For our participation in the CLEF HIPE 2020 shared task, we use two different systems that implement non-neural machine learning (ML) and DL approaches.

In HIPE, a shared task dedicated to the evaluation of named entity (NE) processing on historical newspapers in French, German, and English [5], we participated in two tasks, NERC-COARSE-LIT and EL-ONLY, using two systems: DeLFT and entity-fishing. Deep Learning Framework for Text (DeLFT) is a framework for text processing which re-implements standard state-of-the-art DL architectures. Meanwhile, entity-fishing is a generic named entity recognition and disambiguation (NERD) system which applies supervised machine learning based on Random Forest and Gradient Tree Boosting exploiting various features.

Both DeLFT (https://github.com/kermitt2/delft) and entity-fishing (https://github.com/kermitt2/entity-fishing) are open-source systems under an Apache 2 license. The code, models, and resources publicly available through their GitHub repositories allow users and contributors, including us, to use the existing services and models and to contribute to system and model improvements and tests. With DeLFT, our goal is to re-build DL-based models for recognizing mentions in the English and French HIPE historical corpora belonging to the Person (pers), Location (loc), Organization (org), Product (prod), and Time (time) classes. Meanwhile, with the entity-fishing service, we disambiguate the provided mentions in the French and English HIPE data against Wikidata entries.

2 Tools

This section provides a general description of the two systems we use, DeLFT and entity-fishing. For a better understanding of the technical details, it is advisable to refer directly to their repositories and documentation.

2.1 DeLFT

Deep Learning Framework for Text (DeLFT) is an open-source framework for text processing, covering sequence labelling (e.g., NE tagging) and text classification problems. This Keras and TensorFlow framework re-implements standard state-of-the-art DL architectures for text processing.

DeLFT supports many DL architectures (e.g., Bidirectional LSTMs with Conditional Random Fields [8], Bidirectional LSTMs with Convolutional Neural Networks [2], Bidirectional Gated Recurrent Units [15]) and contextualized embeddings (e.g., ELMo, BERT). To use the desired pre-trained word embeddings, we need to provide them separately from their source; DeLFT then loads and manages these embeddings, compiling them into a local database at the very first access.
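As an illustration, the following is a minimal, hypothetical sketch of how a DeLFT pre-trained sequence labelling model can be applied to text. The Sequence class, the model identifier, and the tag() call follow the usage pattern documented in the DeLFT repository, but the exact names and signatures should be checked against the version in use; the example sentences are invented.

# Hypothetical sketch: tagging text with a DeLFT pre-trained NER model.
# Class names, model identifiers, and method signatures may differ between
# DeLFT versions; see https://github.com/kermitt2/delft for the actual API.
from delft.sequenceLabelling import Sequence

# Load a pre-trained English NER model (assumed identifier).
model = Sequence('ner-en-conll2003')
model.load()

# Tag a few sentences; the result lists the recognized mentions with their
# classes and character offsets.
texts = [
    "Albert Einstein was born in Ulm, in the Kingdom of Württemberg.",
    "The newspaper was printed in Paris in March 1901.",
]
results = model.tag(texts, "json")
print(results)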
2.2 entity-fishing

entity-fishing is a generic NERD system against Wikidata. Deployed as part of the French national infrastructure Huma-Num (https://www.huma-num.fr/), the service provides a standardized interface and an open, flexible architecture allowing easy deployment, including in digital humanities contexts. Initiated in the context of the EU FP7 CENDARI project from 2013 to 2016, entity-fishing aimed at setting up a digital research environment for historians of the medieval and WWI periods to access archival contents and acquire information about numerous assets or entities [6].

In general, entity-fishing has three phases: language identification, mention recognition, and entity resolution. First, language identification is necessary for selecting appropriate utilities for text processing (e.g., tokenization, sentence segmentation) and a specific Wikipedia from the knowledge base. Second, mention recognition is responsible for extracting entity mentions from the input. To support its generic nature, entity-fishing ships with a set of recognizers but also allows users to extend it with additional ones. It supports traditional mention extractors: named entity recognition, Wikipedia lookup, and acronym extraction. For NER, entity-fishing uses grobid-ner (https://github.com/kermitt2/grobid-ner), a library for processing texts, extracting named entities, and classifying these entities into 27 classes (e.g., person, location, media, organization, period) using a Conditional Random Field (CRF) statistical model. Meanwhile, Wikipedia lookup is complementary to the machine learning NER approach: it attempts to find all mentions that correspond to either a title or an anchor in Wikipedia using an N-gram-based matching approach. Acronyms are treated as mentions, their base form is used for disambiguation, and the resolved entity is then propagated in the text to each occurrence of the acronym. The result of the mention recognition step is an aggregated list of objects containing the raw values from the original text, their actual positions, and their NER classes (within the 27 classes).

Lastly, entity resolution is the process of matching entity mentions to their corresponding Wikidata entries. It has three stages: candidate generation, candidate ranking, and candidate selection. In the candidate generation phase, each mention is associated with a list of possible candidates for disambiguation. Then, in candidate ranking, each candidate is assigned a confidence score computed as a regression probability using various features.
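To make the candidate-ranking stage more concrete, below is a minimal, hypothetical sketch of scoring disambiguation candidates with a gradient tree boosting regressor, in the spirit of the Random Forest / Gradient Tree Boosting models mentioned above. The feature set and training data are invented for illustration and are much simpler than entity-fishing's actual features.

# Hypothetical sketch of candidate ranking for entity disambiguation.
# Features and data are invented; entity-fishing's real feature set is richer.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each row: [prior probability of the candidate given the mention surface,
#            context similarity score, string similarity mention vs. label]
X_train = np.array([
    [0.90, 0.75, 1.00],   # correct candidate
    [0.05, 0.20, 0.60],   # wrong candidate
    [0.70, 0.10, 0.90],   # wrong candidate (popular but off-topic)
    [0.30, 0.80, 1.00],   # correct candidate
])
y_train = np.array([1.0, 0.0, 0.0, 1.0])  # 1.0 = correct link, 0.0 = incorrect

ranker = GradientBoostingRegressor(n_estimators=100, max_depth=3)
ranker.fit(X_train, y_train)

# Rank the candidates of one mention: the highest-scoring entry wins.
candidates = {
    "candidate_A": [0.85, 0.70, 1.00],  # in practice, keyed by a Wikidata QID
    "candidate_B": [0.10, 0.15, 1.00],
}
scores = {c: ranker.predict([feats])[0] for c, feats in candidates.items()}
print(scores, "->", max(scores, key=scores.get))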
2.3 Auxiliary Resources

We use external datasets and embeddings in addition to those provided by the HIPE organizers.

Datasets. The HIPE corpus consists of training, dev, and test datasets for each task and language. However, since HIPE does not provide training data for English, we use a model pre-trained on CoNLL-2012 (based on OntoNotes 5.0) [17] and test it on the HIPE test data. Moreover, motivated by the promising French model results published by DeLFT, we use the annotated French TreeBank (FTB) corpus [19] together with the HIPE data to re-build and benchmark the French NER model. This French corpus of journalistic texts from the year 1990 is the annotated 2007 version of the FTB treebank, containing the span, the literal type (sometimes completed with a subtype), and the Aleda unique identifier of each mention. The annotation uses seven basic classes: Person, Location, Organization, Company, Product, POI (Point of Interest), and FictionChar (fictional character). The FTB corpus contains 11,636 manually annotated mentions, distributed over 3,761 location names, 3,357 company names, 2,381 organization names, 2,025 person names, 67 product names, 29 fictional character names, and 15 POIs.

Word Embeddings. We use various static word embeddings: Global Vectors for Word Representation (GloVe) [14], English fastText Common Crawl [1, 11], and French Wikipedia fastText (https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.fr.vec). We also use ELMo [16] contextualized word representations for English (https://s3-us-west-2.amazonaws.com/allennlp/models/ELMo/) and French (https://traces1.inria.fr/oscar/files/models/cc/fr.zip).
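The static embeddings above are distributed as plain-text .vec/.txt files with one token and its vector per line. Below is a minimal sketch of loading such a file into a lookup table; the file path is a placeholder, and DeLFT itself handles this loading and caches the vectors in a local database on first access.

# Minimal sketch: loading static word embeddings in GloVe / fastText .vec
# format, where each line is "token v1 v2 ... vd". The path is a placeholder;
# DeLFT performs this step automatically, so this is only an illustration.
import numpy as np

def load_embeddings(path, max_words=None):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            parts = line.rstrip().split(" ")
            if i == 0 and len(parts) == 2:
                continue  # skip the optional fastText header line "n_words dim"
            token, values = parts[0], parts[1:]
            vectors[token] = np.asarray(values, dtype=np.float32)
            if max_words and len(vectors) >= max_words:
                break
    return vectors

emb = load_embeddings("wiki.fr.vec", max_words=50000)
print(len(emb), "vectors of dimension", len(next(iter(emb.values()))))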
Table 1. Comparison of DeLFT NER models with various feature sets and other published systems (F1-scores).

Model | CoNLL-2003 [20] | OntoNotes 5.0 [17] | FTB [19]
DeLFT models [3]
CoNLL-2003-BiLSTM-CRF + GloVe | 91.35 | - | -
CoNLL-2003-BiLSTM-CRF + GloVe + ELMo | 92.71 | - | -
CoNLL-2003-BiLSTM-CRF + GloVe + ELMo + valid set | 93.09 | - | -
CoNLL-2003-BiLSTM-CNN-CRF + GloVe | 91.07 | - | -
CoNLL-2003-BiLSTM-CNN-CRF + GloVe + ELMo | 92.57 | - | -
CoNLL-2003-BiLSTM-CNN-CRF + GloVe + ELMo + valid set | 93.04 | - | -
CoNLL-2003-BiLSTM-CNN + GloVe | 89.47 | - | -
CoNLL-2003-BiLSTM-CNN + GloVe + ELMo | 92.00 | - | -
CoNLL-2003-BiLSTM-CNN + GloVe + ELMo + valid set | 92.16 | - | -
CoNLL-2003-BiGRU-CRF + GloVe | 90.72 | - | -
CoNLL-2003-BiGRU-CRF + GloVe + ELMo | 92.44 | - | -
CoNLL-2003-BiGRU-CRF + GloVe + ELMo + valid set | 92.71 | - | -
CoNLL-2003-BERT-base | 90.90 | - | -
CoNLL-2003-BERT-base + CRF | 91.20 | - | -
CoNLL-2012-BiLSTM-CRF + fastText | - | 87.01 | -
CoNLL-2012-BiLSTM-CRF + fastText + ELMo | - | 89.01 | -
FTB-BiLSTM-CRF + fastText | - | - | 87.45
FTB-BiLSTM-CRF + fastText + ELMo | - | - | 89.23
Neural architectures
Lample, et al. (2016) [8] | 90.94 | - | -
Ma and Hovy (2016) [10] | 91.21 | - | -
Chiu and Nichols (2016) [2] | 91.62 | 86.28 | -
Peters, et al. (2018) [16] | 92.22 | - | -
Devlin, et al. (2018) [4] | 92.80 | - | -
Non-neural architectures
Ratinov and Roth (2009) [18] | 90.80 | - | -
Passos, et al. (2014) [13] | 90.90 | 82.30 | -
Luo, et al. (2015) [9] | 89.90 | - | -
Luo, et al. (2015) + linking [9] | 91.20 | - | -

3 Benchmarking NER Models

Machine learning approaches have dominated NER, but the trend is towards neural network architectures that achieve state-of-the-art performances. For this reason, we compare several published NER systems as well as DeLFT pre-trained models on various corpora (i.e., CoNLL-2003, CoNLL-2012, FTB) and present them in Table 1.

The non-neural machine learning system of [18] achieves a 90.80 F1-score on CoNLL-2003, and [13] improves on this with 90.90 on CoNLL-2003 and 82.30 on OntoNotes 5.0. [9] exceeds these results with a joint NERD system that pushes the F1-score to 91.20, while their pure NER system reaches 89.90.

Meanwhile, for neural architectures, [8] reaches a 90.94 F1-score on CoNLL-2003, then [10] improves the results. [2] reports an F1-score of 91.62 on CoNLL-2003 and 86.28 on OntoNotes 5.0. The ELMo-enhanced bidirectional LSTM with a CRF layer (BiLSTM-CRF) of [16] achieves an averaged F1-score of 92.22 over five runs. [4] is competitive with state-of-the-art systems: its BERT-LARGE fine-tuning approach reaches 92.80 on CoNLL-2003.

DeLFT has re-implemented neural architectures for NER [3]. Table 1 presents the reported best F1-scores over ten runs for the English models trained on CoNLL-2003 and CoNLL-2012 and the French model trained on the FTB corpus. The model trained on CoNLL-2003 with the BiLSTM-CRF architecture and GloVe word embeddings, evaluated on the test set, achieves a 91.35 F1-score, and the result improves when GloVe is combined with ELMo. With the BERT architecture and a CRF layer for fine-tuning, the model achieves an average F1-score of 91.20. The best F1-score on CoNLL-2003 is 93.09 when using both the train and validation datasets with a BiLSTM-CRF architecture coupled with GloVe and ELMo embeddings. Meanwhile, the CoNLL-2012-based model with the BiLSTM-CRF architecture and fastText embeddings achieves an F1-score of 87.01; the addition of ELMo increases the score by 2 points to 89.01. The French model trained on the FTB corpus with the BiLSTM-CRF architecture and French Wikipedia fastText reaches an 87.45 F1-score; with French ELMo, the score improves to 89.23.

From these results, we learn that DL-based systems perform better than conventional machine learning systems, and that contextualized word embeddings within the BiLSTM-CRF architecture improve the scores. The results in the CoNLL-2003 column also show that ELMo-based models give better F1-scores than BERT-based models.

4 Work Phases

In general, the HIPE shared task contains two tasks:
1. Named Entity Recognition and Classification (NERC): the recognition and classification of entity mentions with predefined high-level (i.e., pers, org, prod, loc, time), finer-grained, or nested entity classes.
2. Named Entity Linking (NEL): the task of matching identified entity mentions to Wikidata entries, with or without prior knowledge of mention types and boundaries.

4.1 Named Entity Recognition and Classification (NERC)

Although the English model built with the CoNLL-2003 dataset is promising, it does not support Time (date) entities. Moreover, since HIPE does not provide training data for English, we use a CoNLL-2012 pre-trained model with the BiLSTM-CRF architecture and ELMo contextualized word embeddings. For the French model, we enrich the French HIPE dataset (i.e., the version 1.2 train and dev sets) with the annotated FTB data. For training the models, we follow the default hyper-parameters (https://github.com/kermitt2/delft/blob/master/delft/sequenceLabelling/config.py) used for the other pre-trained sequence labelling models in DeLFT, except for the batch size and maximum number of epochs, for which we follow the values indicated in https://github.com/kermitt2/delft/blob/master/nerTagger.py.

Challenges. Combining data from different sources poses challenges, particularly because of differing NE class definitions and annotation guidelines. CoNLL-2012 defines 18 classes, the FTB corpus comes with seven NE classes, and HIPE uses five. HIPE annotates absolute dates without months and hours, which conforms to the CoNLL-2012 DATE class. Furthermore, the HIPE Location (loc) entities correspond to the CoNLL-2012 FAC (buildings, airports, highways, bridges), GPE (countries, cities, states), and LOCATION (non-GPE locations, mountain ranges, bodies of water) entities. It is also challenging to find an equivalent for the HIPE PROD entities, which we understand as media products, since CoNLL-2012 classifies them under the ORG class.
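Below is a minimal, hypothetical sketch of how such a class harmonization could be expressed when merging the corpora. The mappings only illustrate the idea and simplify the actual alignment decisions discussed above (for instance, the handling of PROD and FictionChar).

# Hypothetical sketch: harmonizing NE classes when merging corpora with
# different annotation schemes (FTB, CoNLL-2012/OntoNotes, HIPE coarse classes).
# The mappings are illustrative only and simplify the real alignment decisions.

FTB_TO_HIPE = {
    "Person": "pers",
    "Location": "loc",
    "Organization": "org",
    "Company": "org",        # companies folded into the HIPE org class
    "Product": "prod",
    "POI": "loc",            # points of interest treated as locations
    "FictionChar": "pers",   # debatable; shown only as an example choice
}

ONTONOTES_TO_HIPE = {
    "PERSON": "pers",
    "ORG": "org",
    "FAC": "loc", "GPE": "loc", "LOC": "loc",
    "DATE": "time",
    # OntoNotes has no direct equivalent for HIPE media-product PROD entities.
}

def convert_label(label, mapping):
    """Map a source BIO label such as 'B-Company' to the HIPE scheme,
    dropping mentions whose class has no HIPE counterpart."""
    if label == "O":
        return "O"
    prefix, cls = label.split("-", 1)
    target = mapping.get(cls)
    return f"{prefix}-{target}" if target else "O"

print(convert_label("B-Company", FTB_TO_HIPE))     # -> B-org
print(convert_label("I-DATE", ONTONOTES_TO_HIPE))  # -> I-time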
Experiments. We benchmarked the French NER model trained on the HIPE data only (i.e., train and dev v1.2) against the model trained on the HIPE plus the additional FTB data. The HIPE-only model achieved an F1-score of 85.71 on the dev set, while the enriched model performs better, with an increase of almost 3 points to 88.46.

4.2 Named Entity Linking

For the NEL task, we use the entity extraction and disambiguation services provided by entity-fishing. There are several ways to access these services; the most straightforward is to use its RESTful web services (https://nerd.readthedocs.io/en/latest/restAPI.html). First, we collect the text from the HIPE data and include it as part of the JSON input query. The entity-fishing query processing service takes a structured JSON query as input and returns the same query enriched with a list of identified and, when possible, disambiguated entities; the response format is identical to the input query. The client must respect the JSON query format, which is as follows:

{
  "text": "The text to be processed.",
  "shortText": "term1 term2 ...",
  "termVector": [
    { "term": "term1", "score": 0.3 },
    { "term": "term2", "score": 0.1 }
  ],
  "language": { "lang": "en" },
  "entities": [],
  "mentions": ["ner", "wikipedia"],
  "nbest": 0,
  "sentence": false,
  "customisation": "generic",
  "processSentence": [],
  "structure": "grobid"
}
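As an illustration, below is a minimal sketch of posting such a query to a running entity-fishing instance with Python. The /service/disambiguate endpoint follows the REST documentation linked above, while the host, port, input text, and printed response fields are assumptions to be checked against the deployed service.

# Minimal sketch: sending a disambiguation query to an entity-fishing instance.
# Host, port, input text, and response field names are placeholders/assumptions.
import json
import requests

query = {
    "text": "Paris est la capitale de la France.",
    "language": {"lang": "fr"},
    "entities": [],
    "mentions": ["ner", "wikipedia"],
    "nbest": 0,
    "customisation": "generic",
}

resp = requests.post(
    "http://localhost:8090/service/disambiguate",  # deployment-specific URL
    files={"query": (None, json.dumps(query))},    # query sent as a form field
)
resp.raise_for_status()

# The response mirrors the input query, enriched with disambiguated entities,
# each carrying (among other fields) a Wikidata identifier and a confidence score.
for entity in resp.json().get("entities", []):
    print(entity.get("rawName"), "->", entity.get("wikidataId"))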
Table 2. NERC-COARSE-LIT and EL-ONLY results compared to the best system and the baseline; our results are the Inria-DeLFT rows.

Lang | Team | Evaluation | Precision | Recall | F1
EN | L3i | NE-COARSE-LIT-micro-strict | 0.623 | 0.641 | 0.632
EN | Inria-DeLFT | NE-COARSE-LIT-micro-strict | 0.461 | 0.606 | 0.524
EN | Baseline | NE-COARSE-LIT-micro-strict | 0.531 | 0.327 | 0.405
EN | L3i | NE-COARSE-LIT-micro-fuzzy | 0.794 | 0.817 | 0.806
EN | Inria-DeLFT | NE-COARSE-LIT-micro-fuzzy | 0.568 | 0.746 | 0.645
EN | Baseline | NE-COARSE-LIT-micro-fuzzy | 0.736 | 0.454 | 0.562
FR | L3i | NE-COARSE-LIT-micro-strict | 0.831 | 0.849 | 0.84
FR | Baseline | NE-COARSE-LIT-micro-strict | 0.693 | 0.606 | 0.646
FR | Inria-DeLFT | NE-COARSE-LIT-micro-strict | 0.605 | 0.675 | 0.638
FR | L3i | NE-COARSE-LIT-micro-fuzzy | 0.912 | 0.931 | 0.921
FR | Inria-DeLFT | NE-COARSE-LIT-micro-fuzzy | 0.755 | 0.842 | 0.796
FR | Baseline | NE-COARSE-LIT-micro-fuzzy | 0.825 | 0.721 | 0.769
EN | Inria-DeLFT | NEL-LIT-micro-fuzzy-@1 | 0.633 | 0.685 | 0.658
EN | L3i | NEL-LIT-micro-fuzzy-@1 | 0.593 | 0.593 | 0.593
EN | aidalight-baseline | NEL-LIT-micro-fuzzy-@1 | 0.506 | 0.506 | 0.506
FR | L3i | NEL-LIT-micro-fuzzy-@1 | 0.64 | 0.638 | 0.639
FR | Inria-DeLFT | NEL-LIT-micro-fuzzy-@1 | 0.585 | 0.65 | 0.616
FR | aidalight-baseline | NEL-LIT-micro-fuzzy-@1 | 0.502 | 0.495 | 0.498

5 Results

Table 2 lists the best system, our system, and the baseline results for the NE-COARSE-LIT and EL-ONLY tasks (https://github.com/impresso/CLEF-HIPE-2020/blob/master/evaluation-results/ranking_summary_final.md). Our NER system performs worse than the L3i system, but better than the provided baseline, which is a CRF sequence classifier. The exception is the French NE-COARSE-LIT-strict result, which is slightly below the baseline F1-score.

Our EL system, especially for English, performs better than the L3i system and the aidalight-baseline, which corresponds to [12]. Our French EL system is better than the L3i EL system in terms of recall but noticeably worse in terms of precision. Table 3 and Table 4 present our English and French NER and EL performance on the HIPE test data with detailed false positive and false negative counts. Strict NER, which is the more demanding setting, performs worse than fuzzy NER. Looking further at each class, we observe that a considerable number of PROD entities are misclassified, which contributes to numerous false negatives.

Table 3. NERC-COARSE-LIT results on the HIPE test data.

Lang | Evaluation | Label | P | R | F1 | TP | FP | FN
EN | NE-COARSE-LIT-micro-fuzzy | ALL | 0.568 | 0.746 | 0.645 | 335 | 255 | 114
EN | NE-COARSE-LIT-micro-strict | ALL | 0.461 | 0.606 | 0.524 | 272 | 318 | 177
FR | NE-COARSE-LIT-micro-fuzzy | ALL | 0.755 | 0.842 | 0.796 | 1347 | 438 | 253
FR | NE-COARSE-LIT-micro-strict | ALL | 0.605 | 0.675 | 0.638 | 1080 | 705 | 520

Table 4. EL-ONLY-LIT results.

Lang | Evaluation | Label | P | R | F1 | TP | FP | FN
EN | NEL-LIT-micro-fuzzy-@1 | ALL | 0.633 | 0.685 | 0.658 | 305 | 177 | 140
EN | NEL-LIT-micro-fuzzy-relaxed-@1 | ALL | 0.633 | 0.685 | 0.658 | 305 | 177 | 140
EN | NEL-LIT-micro-fuzzy-relaxed-@3 | ALL | 0.633 | 0.685 | 0.65 | 305 | 177 | 140
EN | NEL-LIT-micro-fuzzy-relaxed-@5 | ALL | 0.633 | 0.685 | 0.658 | 305 | 177 | 140
FR | NEL-LIT-micro-fuzzy-@1 | ALL | 0.585 | 0.65 | 0.616 | 1039 | 737 | 560
FR | NEL-LIT-micro-fuzzy-relaxed-@1 | ALL | 0.604 | 0.67 | 0.635 | 1072 | 704 | 527
FR | NEL-LIT-micro-fuzzy-relaxed-@3 | ALL | 0.604 | 0.67 | 0.635 | 1072 | 704 | 527
FR | NEL-LIT-micro-fuzzy-relaxed-@5 | ALL | 0.604 | 0.67 | 0.635 | 1072 | 704 | 527
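To clarify the strict versus fuzzy distinction used in the tables above, below is a minimal, simplified sketch of the two matching criteria for micro precision, recall, and F1. It is not the official HIPE scorer, and the example spans are invented.

# Simplified sketch of strict vs. fuzzy NER matching (not the official HIPE
# scorer). Strict matching requires identical boundaries and type; fuzzy
# matching accepts overlapping boundaries with the correct type.
def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def micro_prf(gold, pred, fuzzy=False):
    """gold/pred: lists of (start, end, type) spans over the whole corpus."""
    if fuzzy:
        match = lambda g, p: g[2] == p[2] and overlaps(g, p)
    else:
        match = lambda g, p: g == p
    tp = sum(any(match(g, p) for g in gold) for p in pred)
    fp = len(pred) - tp
    fn = len(gold) - sum(any(match(g, p) for p in pred) for g in gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [(0, 14, "pers"), (23, 28, "loc")]
pred = [(0, 14, "pers"), (20, 28, "loc")]   # second span has wrong boundaries
print(micro_prf(gold, pred, fuzzy=False))   # strict: only one true positive
print(micro_prf(gold, pred, fuzzy=True))    # fuzzy: both count as correct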
6 Conclusion

From our participation in the HIPE shared task, we highlight that both the quantity and the quality of the data are essential for the NERC and NEL tasks. Further, although DeLFT and entity-fishing achieve good F1-scores, their performance is quite sensitive to noisy data.

Acknowledgements. We thank the anonymous reviewers for their insightful comments.

References

1. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (2017)
2. Chiu, J.P., Nichols, E.: Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics 4, 357–370 (2016)
3. DeLFT. https://github.com/kermitt2/delft (2018–2020)
4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
5. Ehrmann, M., Romanello, M., Flückiger, A., Clematide, S.: Overview of CLEF HIPE 2020: Named Entity Recognition and Linking on Historical Newspapers. In: Arampatzis, A., Kanoulas, E., Tsikrika, T., Vrochidis, S., Joho, H., Lioma, C., Eickhoff, C., Névéol, A., Cappellato, L., Ferro, N. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020). Lecture Notes in Computer Science (LNCS), vol. 12260. Springer (2020)
6. Foppiano, L., Romary, L.: entity-fishing: a DARIAH entity recognition and disambiguation service. In: Digital Scholarship in the Humanities. Tokyo, Japan (Sep 2018), https://hal.inria.fr/hal-01812100
7. Habibi, M., Weber, L., Neves, M., Wiegandt, D.L., Leser, U.: Deep learning with word embeddings improves biomedical named entity recognition. Bioinformatics 33(14), i37–i48 (2017)
8. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360 (2016)
9. Luo, G., Huang, X., Lin, C.Y., Nie, Z.: Joint entity recognition and disambiguation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pp. 879–888 (2015)
10. Ma, X., Hovy, E.: End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. arXiv preprint arXiv:1603.01354 (2016)
11. Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., Joulin, A.: Advances in pre-training distributed word representations. arXiv preprint arXiv:1712.09405 (2017)
12. Nguyen, D.B., Hoffart, J., Theobald, M., Weikum, G.: AIDA-light: High-throughput named-entity disambiguation. LDOW 1184 (2014)
13. Passos, A., Kumar, V., McCallum, A.: Lexicon infused phrase embeddings for named entity resolution. arXiv preprint arXiv:1404.5367 (2014)
14. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543 (2014)
15. Peters, M.E., Ammar, W., Bhagavatula, C., Power, R.: Semi-supervised sequence tagging with bidirectional language models. arXiv preprint arXiv:1705.00108 (2017)
16. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018)
17. Pradhan, S., Moschitti, A., Xue, N., Uryupina, O., Zhang, Y.: CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In: Joint Conference on EMNLP and CoNLL - Shared Task. pp. 1–40 (2012)
18. Ratinov, L., Roth, D.: Design challenges and misconceptions in named entity recognition. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009). pp. 147–155 (2009)
19. Sagot, B., Richard, M., Stern, R.: Annotation référentielle du Corpus Arboré de Paris 7 en entités nommées. In: Antoniadis, G., Blanchon, H., Sérasset, G. (eds.) Traitement Automatique des Langues Naturelles (TALN). Actes de la conférence conjointe JEP-TALN-RECITAL 2012, vol. 2 - TALN. Grenoble, France (Jun 2012), https://hal.inria.fr/hal-00703108
20. Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. arXiv preprint cs/0306050 (2003)
21. Santos, C.N.d., Guimaraes, V.: Boosting named entity recognition with neural character embeddings. arXiv preprint arXiv:1505.05008 (2015)