<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Experiments from LIMSI at the French Named Entity Recognition Coarse-grained task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cyril Grouin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lavergne</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Université Paris-Saclay</institution>
          ,
          <addr-line>CNRS, LIMSI, 91405 Orsay</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper presents the participation of the LIMSI team in the HIPE 2020 Challenge on the Coarse-grained named entity recognition task for French. Our approach jointly predicts the literal and metonymy entities. For this, a CamemBERT base model and a CRF model were used. We submitted three systems: a joint model using only CamemBERT, a joint model extended with a CRF layer, and a CamemBERT model without the joint option. Experimental results show that the second system achieved the best results on the literal tags (F1=.814) while the third system performed best (F1=.667) on the metonymy tags. The second system allowed us to obtain our best results on both the dev and test datasets for the literal tags. Nevertheless, we observed a difference on the metonymy tags, where our first system obtained the best results on the dev dataset (F1=.663) while our third system performed best on the test dataset (F1=.667).</p>
      </abstract>
      <kwd-group>
        <kwd>Named Entity Recognition</kwd>
        <kwd>historical texts</kwd>
        <kwd>contextual word embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
<p>
        In 2011 and 2012, two corpora were produced and annotated with extended named entities. The 2011 Quaero corpus focused on broadcast news [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] while the 2012 Quaero corpus is composed of French press archives from December 1880 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These two corpora were used in NLP challenges that included both coarse-grained and fine-grained named entities, as well as several forms of entity nesting such as the metonymy phenomenon. The current HIPE 2020 Challenge builds on the annotation guidelines produced during the 2011 and 2012 Quaero NLP challenges [
        <xref ref-type="bibr" rid="ref6 ref7">7, 6</xref>
        ].
      </p>
<p>
        Specifically for the HIPE 2020 challenge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], one main issue concerns the digitization of texts from distinct periods (from 1798 to 2018 in the French data), with digitization errors such as the insertion and deletion of characters (e.g., “oppositipjn” instead of “opposition”), including the insertion and deletion of spaces, which produces tokenization issues (“limitrop he” vs. “limitrophe” or “rég iment” vs. “régiment”, producing two tokens instead of one). Digitization errors mainly occur on grammatical words.1 Nevertheless, one may also find such errors in named entities, which makes the NER task more difficult (e.g., the person name “Picqu¶art” instead of “Picquart” or the town name “Glascow” instead of “Glasgow”).
      </p>
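<p>The insertion and substitution errors described above can be made concrete by aligning an OCRed form with its correct form; a minimal sketch using Python's difflib (the tool choice is ours, not the paper's):</p>

```python
import difflib

# Align the correct form with the OCRed form "oppositipjn" from the example
# above; get_opcodes() exposes the insertions/substitutions OCR introduced.
ops = difflib.SequenceMatcher(a="opposition", b="oppositipjn").get_opcodes()
errors = [op for op in ops if op[0] != "equal"]
```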
<p>The CLEF HIPE 2020 challenge proposed several tasks (coarse-grained and fine-grained named entity recognition (NER), and entity linking) in three languages (English, French, German). We are interested in sub-task 1.1, NERC Coarse-grained for French, which concerns the recognition and classification of entity mentions according to coarse-grained types (Person, Location, Organisation and Product). For this task we distinguish two coarse types for each entity mention token, according to its literal and metonymic senses, named respectively literal and metonymy tags. These coarse types correspond to the NE-COARSE-LIT and NE-COARSE-METO columns in the data.</p>
<p>
        For this challenge we proposed three neural systems that benefit from contextual word embeddings using the CamemBERT [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] model. This model was extended to jointly predict literal and metonymy NE tags, in addition to the use of a CRF layer on top of the model to further improve predictions by taking advantage of neighboring labels.
      </p>
<p>The paper is organized along the following lines: Section 2 presents related work on the NER task. Section 3 describes the proposed NER system. The experimental setup and results are described in Section 4, just before the conclusion (Section 5).</p>
    </sec>
    <sec id="sec-2">
<title>2 Related work</title>
<p>
        The Named Entity Recognition (NER) task consists in identifying text spans that mention named entities (person names, companies, locations) and classifying them into predefined categories (Person, Location, Organisation and Product). The NER task is a key component of several Natural Language Processing (NLP) applications such as information retrieval [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], text understanding [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], and question answering [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
<p>
        For decades, the NER task has been widely studied and different approaches have been proposed. Traditional approaches fall into three categories [
        <xref ref-type="bibr" rid="ref15 ref28">28, 15</xref>
        ]: rule-based [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], unsupervised learning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and feature-based supervised learning approaches [
        <xref ref-type="bibr" rid="ref17 ref30">30, 17</xref>
        ]. Recent approaches are based on neural network architectures in which hidden features are discovered automatically. Generally the NER architecture can be regarded as a composition of an encoder (CNN, BiLSTM (bi-directional long short-term memory), RNN, transformer, etc.) and a decoder (BiLSTM, CRF, etc.) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The first neural NER model was proposed by Hammerton [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and is based on a unidirectional LSTM architecture. Collobert et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] proposed a CNN-CRF architecture enriched with character-level embeddings. Lample et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] proposed a BiLSTM-CRF architecture that benefits from both word and character-level embeddings. State-of-the-art NER systems leverage recent advances in deep learning, with approaches that benefit from contextual or language model embeddings such as BERT [
        <xref ref-type="bibr" rid="ref16 ref18 ref20">20, 16, 18</xref>
        ].
1 The most common errors are found in short grammatical words (error/correct form): Cn/Un, co/ce, cotte/cette, do/de, k/à, lai/lui, lo/le, on/en, quo/que, uno/une.
      </p>
    </sec>
    <sec id="sec-3">
<title>3 Proposed NER system</title>
<p>
        The proposed NER system is based on the CamemBERT [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] model, which we extended to jointly predict both NE tags: literal and metonymy. In the following subsections we briefly define the CamemBERT model and then present the proposed joint NER model.
      </p>
      <sec id="sec-3-1">
<title>3.1 CamemBERT</title>
        <p>
          The CamemBERT model is based on RoBERTa (Robustly Optimized BERT Pretraining
Approach) [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] which is based on BERT (Bidirectional Encoder Representations from
Transformers) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
<p>
          BERT’s model architecture is a multi-layer bidirectional Transformer [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] encoder, trained with masked language modeling and next sentence prediction objectives. RoBERTa improves the BERT pre-training procedure by dynamically changing the masking pattern applied to the training data, removing the next sentence prediction task, and training with larger batches and longer sequences, on more data, and for longer.
        </p>
<p>
          Similar to BERT and RoBERTa, CamemBERT is a multi-layer bidirectional Transformer. It uses the original architectures of BERT-BASE (12 layers, 768 hidden dimensions, 12 attention heads, 110M parameters) and BERT-LARGE (24 layers, 1024 hidden dimensions, 16 attention heads, 340M parameters). CamemBERT is similar to RoBERTa, using the improved pre-training procedure; however, it uses whole-word masking and SentencePiece tokenization [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] instead of WordPiece [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. For more details about CamemBERT we refer the reader to Martin et al. [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
<title>3.2 Joint NER</title>
<p>
          As mentioned before, the NER system has to jointly learn both the literal and metonymy NE tags. Inspired by previous work on Intent Classification and Slot Filling [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], we propose to extend CamemBERT for this purpose. Hence, the final hidden states of the tokens h_2, …, h_T (excluding the first special token &lt;s&gt;) are fed into two softmax layers to classify over literal and metonymy tags. Specifically, each input word is fed into a SentencePiece tokenizer and the hidden state of its first sub-token is used as input to the softmax classifiers. The literal and metonymy tags are predicted respectively as:
y_i^LIT = softmax(W^LIT h_i + b^LIT), i ∈ 1…N (1)
y_i^METO = softmax(W^METO h_i + b^METO), i ∈ 1…N (2)
where h_i is the hidden state of the first sub-token of the word w_i.
        </p>
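<p>A minimal sketch of the two classification heads of Equations (1) and (2), with the CamemBERT encoder replaced by random hidden states and all dimensions and tag counts illustrative:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

# Stand-in for CamemBERT: hidden state of the first sub-token of each word.
H = rng.normal(size=(20, 768))          # (n_words, hidden_dim)

# One (W, b) pair per head, as in Eq. (1) and (2); 9 tags is illustrative.
W_lit, b_lit = rng.normal(size=(768, 9)), np.zeros(9)
W_meto, b_meto = rng.normal(size=(768, 9)), np.zeros(9)

y_lit = softmax(H @ W_lit + b_lit)      # literal tag distribution per word
y_meto = softmax(H @ W_meto + b_meto)   # metonymy tag distribution per word
```

<p>Both heads read the same shared encoder states, which is what makes the prediction joint.</p>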
        <p>To jointly learn literal and metonymy tags, the learning objective is to maximize the
conditional probability defined as follows:</p>
<p>
p(y^LIT, y^METO | w) = ∏_{i=1}^{N} p(y_i^LIT | w) p(y_i^METO | w)
(3)
</p>
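<p>Maximizing the product in Equation (3) is equivalent to minimizing the sum of the two token-level cross-entropy losses, since the logarithm turns the product into a sum; a sketch with random probabilities and illustrative sizes:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def cross_entropy(probs, gold):
    # mean negative log-likelihood of the gold tags
    return -np.log(probs[np.arange(len(gold)), gold]).mean()

n_words, n_tags = 20, 9                       # illustrative sizes
p_lit = softmax(rng.normal(size=(n_words, n_tags)))
p_meto = softmax(rng.normal(size=(n_words, n_tags)))
gold_lit = rng.integers(0, n_tags, n_words)
gold_meto = rng.integers(0, n_tags, n_words)

# -log of Eq. (3) factorizes into a sum, so the joint training loss is
# simply the sum of the per-head cross-entropies.
loss = cross_entropy(p_lit, gold_lit) + cross_entropy(p_meto, gold_meto)
```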
<p>
          The model is fine-tuned end-to-end by minimizing the cross-entropy loss.
        </p>
        <p>
          For the NER task, label predictions depend on the surrounding words’ predictions. Thus, for a given input sentence, it is helpful to consider the correlations between neighboring labels and to jointly decode the best label chain. It has been shown that adding a conditional random field (CRF) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] layer on top of a BiLSTM (bi-directional long short-term memory) encoder improves many sequence labeling tasks, including NER [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. For that reason, we propose to add a CRF layer on top of the joint CamemBERT model to capture NE label dependencies.
        </p>
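<p>The decoding step that a CRF layer adds on top of per-token scores can be sketched as a Viterbi search over a label-transition matrix (the scores below are toy values, not trained parameters):</p>

```python
import numpy as np

def viterbi(emissions, transitions):
    """Jointly decode the best label chain: emissions[t, k] scores label k
    at position t, transitions[i, j] scores moving from label i to label j."""
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# A transition penalty for switching labels overrides a noisy middle token:
# greedily the middle token would take label 1, but the chain stays on 0.
emissions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
transitions = np.array([[0.0, -2.0], [-2.0, 0.0]])
best = viterbi(emissions, transitions)
```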
      </sec>
    </sec>
    <sec id="sec-4">
<title>4 Experiments and results</title>
      <sec id="sec-4-1">
<title>4.1 Data description</title>
<p>For the Coarse-grained NER task, we used the French datasets provided by the organizers. The corpus is divided into train, dev and test sets, composed respectively of 158, 43 and 43 documents from distinct periods of time. As the documents are too long to be processed by our model, we decided to split each document into several sentences of length ≥ l.</p>
<p>Two rules are considered during splitting: i) take the NE tags into account, i.e. we must reach the end of a tag before splitting; ii) take the end of the document into account: if the remaining part of the document is less than or equal to 2 × l, then we consider the remaining document as a single sentence. Both rules are applied to the train and dev datasets, while only the second rule was applied to the test dataset since annotations were masked.</p>
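<p>The two splitting rules can be sketched as follows, assuming BIO-style tags; the helper name and the exact remainder threshold are our reading, not the paper's code:</p>

```python
def split_document(tags, min_len):
    """Split a tagged document into sentences of at least min_len tokens.
    Rule i: never cut inside an entity (keep consuming I-* continuations).
    Rule ii: keep a short remainder (<= 2 * min_len) as a single sentence."""
    bounds, start, n = [], 0, len(tags)
    while start < n:
        if n - start <= 2 * min_len:                   # rule ii
            bounds.append((start, n))
            break
        end = start + min_len
        while end < n and tags[end].startswith("I-"):  # rule i
            end += 1
        bounds.append((start, end))
        start = end
    return bounds

# Toy document: the entity spanning positions 1-3 forces the first cut
# to move past position 3 instead of cutting at min_len.
tags = ["O", "B-loc", "I-loc", "I-loc", "O", "O", "O",
        "O", "O", "O", "O", "O", "O"]
bounds = split_document(tags, min_len=3)
```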
<p>
          After hyper-parameter tuning, we set the minimum sentence length to 50; the resulting sentence length varies from 50 to 149. Table 1 reports the number of sentences in each data set. We used the CamemBERT base model as provided by its authors,2 which is composed of 12 layers, 768 hidden dimensions and 12 attention heads. CamemBERT is pre-trained on the French part of the OSCAR [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] corpus, a pre-filtered and pre-classified version of Common Crawl composed of 138GB of raw text and 32.7B tokens after subword tokenization.
2 https://camembert-model.fr/
        </p>
<p>
          For fine-tuning, all hyper-parameters are tuned on the development (dev) set. The minimum sentence length is selected from {10, 20, 50}. The maximum sequence length is 256. The batch size is selected from {32, 64, 128}. The maximum number of epochs is 100. For optimization we used Adam [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] with an initial learning rate of 5e-5. The dropout probability is 0.1.
        </p>
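<p>The tuning procedure amounts to a small grid search on the dev set; a sketch where the values come from the text but the grid layout and key names are ours:</p>

```python
from itertools import product

# Hyper-parameters selected on dev form a 3 x 3 grid; the rest are fixed.
grid = {"min_sentence_length": [10, 20, 50], "batch_size": [32, 64, 128]}
fixed = {"max_seq_length": 256, "max_epochs": 100,
         "learning_rate": 5e-5, "dropout": 0.1, "optimizer": "Adam"}

# Enumerate every combination, merging in the fixed settings.
configs = [dict(zip(grid, combo), **fixed) for combo in product(*grid.values())]
```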
      </sec>
      <sec id="sec-4-2">
<title>4.3 Results</title>
<p>This section reports the results of the three best submitted systems, namely:
– sys1: fine-tuning the joint NER model without the CRF layer;
– sys2: fine-tuning the joint NER model with the CRF layer;
– sys3: fine-tuning the CamemBERT base model without the joint option; for this model the literal and metonymy tags are concatenated and considered as a single tag, so this system has more tags to predict than sys1 and sys2.</p>
<p>
          The results are evaluated at entity and document levels in terms of micro and macro Precision, Recall and F1-measure, considering two scenarios: exact (strict) and fuzzy (relaxed) boundary matching, for both literal and metonymy tags [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
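<p>Strict vs. fuzzy matching can be illustrated on single spans (a simplified sketch; the official HIPE scorer's exact rules may differ in details):</p>

```python
def strict_match(pred, gold):
    """Exact boundaries and entity type must both match."""
    return pred == gold

def fuzzy_match(pred, gold):
    """Relaxed: same entity type and overlapping token spans."""
    (ps, pe, pt), (gs, ge, gt) = pred, gold
    return pt == gt and ps < ge and gs < pe

# Same type, boundaries off by one token: counts only under fuzzy matching.
pred, gold = (4, 7, "loc"), (4, 6, "loc")
```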
        <p>The best systems are selected based on the results on the dev set, whose results in
terms of micro Precision, Recall and F1-measure for both tags are summarized in Table
2. Note that our system is often the second best on French.</p>
<p>As on the dev set, the three systems achieve comparable results for the literal tag. For metonymy, sys1 achieves better results than sys3 and sys2, yielding improvements of 8.82% and 2.63% in micro F1 in the strict and fuzzy scenarios respectively, while sys2, which includes a CRF layer on top of the joint NER model, improves the results at the document level in terms of macro F1 (0.68), by 2.25% and 1.34% with respect to sys3 and sys1 respectively, in both scenarios.</p>
        <p>Results on test data in terms of micro Precision, Recall and F1-measure for both tags
are summarized in Table 3.
We observe that for literal tag the proposed systems obtain comparable results.</p>
<p>The prediction of metonymic tags is not an easy task, since they represent only 0.27% of B-org and 0.05% of I-org tokens (in each of the train, dev and test sets), plus a few rare B-loc tags in train and dev and B-time tags in test. Considering the prediction of metonymy tags as a separate task (the case for sys1 and sys2) may therefore cause generalization problems. The results we obtained on the test data confirm this hypothesis. Indeed, sys3, trained on the concatenation of both tags, achieved the best results in terms of micro F1 w.r.t. sys1 and sys2, which is not the case on dev. Thus, the concatenation of both tags helps to predict the metonymy tag and to avoid its frequency problem: given the labels B/I-loc_B/I-org vs. B/I-loc_O, it is easier for the system to distinguish the LOC category with or without metonymy.</p>
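<p>The sys3 label concatenation can be sketched directly from the two tag columns (column names follow the data; the helper function is ours):</p>

```python
def concat_tags(ne_coarse_lit, ne_coarse_meto):
    """Build sys3's single label set by joining the NE-COARSE-LIT and
    NE-COARSE-METO columns token by token."""
    return [f"{lit}_{meto}" for lit, meto in zip(ne_coarse_lit, ne_coarse_meto)]

# A LOC with a metonymic ORG reading becomes one label, "B-loc_B-org",
# keeping the rare metonymy information attached to the frequent literal tag.
labels = concat_tags(["B-loc", "I-loc", "O"], ["B-org", "I-org", "O"])
```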
<p>Last, sys1 achieves the best results in terms of macro F1 (0.747) in comparison to sys2 (0.738) and sys3 (0.733).</p>
      </sec>
    </sec>
    <sec id="sec-5">
<title>5 Conclusions</title>
      <p>In this paper, we presented our participation in the CLEF HIPE 2020 Challenge on
the Coarse-grained named entity recognition task for French. The proposed approach
jointly predicts the literal and metonymic entities. For this, a CamemBERT base model
and a CRF model were used. We submitted three systems: a joint model using only
CamemBERT, a joint model extended with a CRF layer, and a CamemBERT model
without joint option.</p>
<p>On the test dataset, we achieved our best results on the literal tags using our second system (F1=.814), while on the metonymy tags our third system performed best (F1=.667). Our second system allowed us to obtain the best results on both the dev and test datasets for the literal tags. Nevertheless, we observed a difference on the metonymy tags, where our first system obtained the best results on the dev dataset (F1=.663) while our third system performed best on the test dataset (F1=.667); surprisingly, the differences between the first and third systems are about 5 points in strict F1-score.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhuo</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Bert for joint intent classification and slot filling</article-title>
          . arXiv preprint arXiv:
          <year>1902</year>
          .
          <volume>10909</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Collobert</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karlen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kavukcuoglu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuksa</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Natural language processing (almost) from scratch</article-title>
          .
          <source>Journal of machine learning research 12(Aug)</source>
          ,
          <fpage>2493</fpage>
          -
          <lpage>2537</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . Association for Computational Linguistics, Minneapolis,
          <source>Minnesota (Jun</source>
          <year>2019</year>
          ). https://doi.org/10.18653/v1/
          <fpage>N19</fpage>
          - 1423, https://www.aclweb.org/anthology/N19-1423
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ehrmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romanello</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flückiger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clematide</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <source>Overview of CLEF HIPE</source>
          <year>2020</year>
          :
          <article-title>Named Entity Recognition and Linking on Historical Newspapers</article-title>
          . In: Arampatzis,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Tsikrika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Vrochidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Joho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Lioma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Eickhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Névéol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          , N. (eds.)
          <article-title>Experimental IR Meets Multilinguality, Multimodality, and Interaction</article-title>
          .
          <source>Proceedings of the 11th International Conference of the CLEF Association (CLEF</source>
          <year>2020</year>
          ).
          <source>Lecture Notes in Computer Science (LNCS)</source>
          , vol.
          <volume>12260</volume>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cafarella</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Downey</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popescu</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shaked</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weld</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yates</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Unsupervised named-entity extraction from the web: An experimental study</article-title>
          .
          <source>Artificial intelligence</source>
          <volume>165</volume>
          (
          <issue>1</issue>
          ),
          <fpage>91</fpage>
          -
          <lpage>134</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Galibert</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosset</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quintard</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Extended named entities annotation on OCRed documents: From corpus constitution to evaluation campaign</article-title>
          .
          <source>In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)</source>
          .
          <source>European Language Resources Association (ELRA)</source>
          , Istanbul, Turkey (May
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosset</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fort</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galibert</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quintard</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Proposal for an extension of traditional named entities: from guidelines to evaluation, an overview</article-title>
          .
          <source>In: Proc of LAW</source>
          . Jeju-do, South Korea (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
<string-name>
            <surname>Xu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Named entity recognition in query</article-title>
          .
          <source>In: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval</source>
          . pp.
          <fpage>267</fpage>
          -
          <lpage>274</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hammerton</surname>
          </string-name>
          , J.:
          <article-title>Named entity recognition with long short-term memory</article-title>
          .
          <source>In: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4</source>
          . pp.
          <fpage>172</fpage>
          -
          <lpage>175</lpage>
          . Association for Computational Linguistics (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woodland</surname>
          </string-name>
          , P.C.
          <article-title>: A rule-based named entity recognition system for speech input</article-title>
          .
          <source>In: Sixth International Conference on Spoken Language Processing</source>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
          </string-name>
          , J.:
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          . In: Bengio,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>LeCun</surname>
          </string-name>
          , Y. (eds.) 3rd
          <source>International Conference on Learning Representations, ICLR</source>
          <year>2015</year>
          , San Diego, CA, USA, May 7-
          <issue>9</issue>
          ,
          <year>2015</year>
          , Conference Track Proceedings (
          <year>2015</year>
          ), http://arxiv.org/abs/1412.6980
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kudo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Richardson</surname>
  ,
  <given-names>J.</given-names>
</string-name>
:
          <article-title>SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing</article-title>
          .
          <source>In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          . pp.
          <fpage>66</fpage>
          -
          <lpage>71</lpage>
          . Association for Computational Linguistics, Brussels, Belgium (Nov
          <year>2018</year>
). https://doi.org/10.18653/v1/D18-2012, https://www.aclweb.org/anthology/D18-2012
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.C.</given-names>
          </string-name>
          :
<article-title>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
(
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramanian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawakami</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
<article-title>Neural Architectures for Named Entity Recognition</article-title>
          .
          <source>In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <fpage>260</fpage>
          -
          <lpage>270</lpage>
          . Association for Computational Linguistics, San Diego, California (Jun
          <year>2016</year>
). https://doi.org/10.18653/v1/N16-1030
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Sun</surname>
  ,
  <given-names>A.</given-names>
</string-name>
,
<string-name>
  <surname>Han</surname>
  ,
  <given-names>J.</given-names>
</string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A survey on deep learning for named entity recognition</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Meng</surname>
  ,
  <given-names>Y.</given-names>
</string-name>
,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Li</surname>
  ,
  <given-names>J.</given-names>
</string-name>
:
          <article-title>A Unified MRC Framework for Named Entity Recognition</article-title>
. arXiv preprint arXiv:1910.11476 (
<year>2019</year>
)
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Veeramachaneni</surname>
  ,
  <given-names>S.</given-names>
</string-name>
:
          <article-title>A simple semi-supervised algorithm for named entity recognition</article-title>
          .
          <source>In: Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing</source>
          . pp.
          <fpage>58</fpage>
          -
          <lpage>65</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition</article-title>
. arXiv preprint arXiv:2001.02524 (
<year>2020</year>
)
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
<article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach</article-title>
. arXiv preprint arXiv:1907.11692 (
<year>2019</year>
)
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Luoma</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pyysalo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Exploring Cross-sentence Contexts for Named Entity Recognition with BERT</article-title>
. arXiv preprint arXiv:2006.01563 (
<year>2020</year>
)
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Hovy</surname>
  ,
  <given-names>E.</given-names>
</string-name>
:
<article-title>End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF</article-title>
.
<source>In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          . pp.
          <fpage>1064</fpage>
          -
          <lpage>1074</lpage>
          . Association for Computational Linguistics, Berlin, Germany (Aug
          <year>2016</year>
). https://doi.org/10.18653/v1/P16-1101, https://www.aclweb.org/anthology/P16-1101
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Ortiz Suárez</surname>
  ,
  <given-names>P.J.</given-names>
</string-name>
,
<string-name>
  <surname>Dupont</surname>
  ,
  <given-names>Y.</given-names>
</string-name>
,
<string-name>
  <surname>Romary</surname>
  ,
  <given-names>L.</given-names>
</string-name>
,
<string-name>
  <surname>de la Clergerie</surname>
  ,
  <given-names>É.V.</given-names>
</string-name>
,
          <string-name>
            <surname>Seddah</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagot</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
<article-title>CamemBERT: a Tasty French Language Model</article-title>
          .
          <source>In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Mollá</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Zaanen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
<article-title>Named entity recognition for question answering</article-title>
(
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Moosavi</surname>
            ,
            <given-names>N.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strube</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Which Coreference Evaluation Metric Do You Trust? A Proposal for a Link-based Entity Aware Metric</article-title>
.
<source>In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          . pp.
          <fpage>632</fpage>
          -
          <lpage>642</lpage>
          . Association for Computational Linguistics, Berlin, Germany (Aug
          <year>2016</year>
). https://doi.org/10.18653/v1/P16-1060, https://www.aclweb.org/anthology/P16-1060
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakajima</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
<article-title>Japanese and Korean voice search</article-title>
          .
          <source>In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          . pp.
          <fpage>5149</fpage>
          -
          <lpage>5152</lpage>
          . IEEE (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Suárez</surname>
            ,
            <given-names>P.J.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagot</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
<article-title>Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures</article-title>
.
<source>Challenges in the Management of Large Corpora (CMLC-7)</source>
. p. 9 (
<year>2019</year>
)
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Kaiser</surname>
  ,
  <given-names>Ł.</given-names>
</string-name>
,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Yadav</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Bethard</surname>
  ,
  <given-names>S.</given-names>
</string-name>
:
          <article-title>A Survey on Recent Advances in Named Entity Recognition from Deep Learning models</article-title>
          .
          <source>In: COLING</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>ERNIE: Enhanced language representation with informative entities</article-title>
. arXiv preprint arXiv:1905.07129 (
<year>2019</year>
)
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Su</surname>
  ,
  <given-names>J.</given-names>
</string-name>
:
          <article-title>Named entity recognition using an HMM-based chunk tagger</article-title>
          .
<source>In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics</source>
          . pp.
          <fpage>473</fpage>
          -
          <lpage>480</lpage>
          . Association for Computational Linguistics (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>