<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SinNer@CLEF-HIPE2020: Sinful Adaptation of SotA models for Named Entity Recognition in Historical French and German Newspapers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedro Javier Ortiz Suárez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yoann Dupont</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gaël Lejeune</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tian Tian</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ALMAnaCH</institution>
          ,
          <addr-line>Inria, Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>STIH, Sorbonne Université</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Sorbonne Université</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this article we present the approaches developed by the Sorbonne-INRIA for NER (SinNer) team for the CLEF-HIPE 2020 challenge on Named Entity Processing in old newspapers. The challenge proposed various tasks for three languages; among them, we focused on coarse-grained Named Entity Recognition in French and German texts. The best system we proposed ranked third for these two languages; it uses FastText embeddings and ELMo language models (FrELMo and German ELMo). We combine several word representations in order to enhance the quality of the results for all NE types. We show that the reconstruction of sentence segments has an important impact on the results.</p>
      </abstract>
      <kwd-group>
        <kwd>Named Entity Recognition</kwd>
        <kwd>Historical Texts</kwd>
        <kwd>French</kwd>
        <kwd>ELMo</kwd>
        <kwd>CRFs</kwd>
        <kwd>Sentence Segmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Among the aspects for which Natural Language Processing (NLP) can be useful
to Digital Humanities (DH), Named Entity Recognition (NER) figures prominently.
This task interests researchers for numerous reasons, since its range of applications
is quite wide. We can cite genealogy or history, for which finding mentions
of persons and places in texts is very useful. Researchers in digital literature
have shown great interest in NER, since it can help, for instance, to trace
the paths of different characters in a book or a book series. There can be
cross-fertilization between NER and DH, since some researchers have shown that
particular properties of literature can help to build better NER systems
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Apart from literature, NER can also be used more generally to help refine
queries to assist browsing in newspaper collections [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Like other NLP tasks,
NER quality will suffer from different problems related to variations in the input
data: variation in languages (multilinguality), variation in the quality of the data
(OCR errors mainly) and specificity of the application domain (literature vs.
epidemic surveillance for instance). These difficulties can be connected with the
challenges for low-level NLP tasks highlighted by Dale et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In the CLEF-HIPE
shared task [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], variation in language and in text quality are the main
problems, even if the specificity of the application can be of great interest.
      </p>
      <p>NER in old documents represents an interesting challenge for NLP, since it
is usually necessary to process documents that show different kinds of variation
compared to the laboratory conditions under which NER systems are
trained: most NER systems are designed to process clean data.
Additionally, there is a multilingual issue, since NER systems have been designed
primarily for English, with assumptions on the availability of data on the one
hand and on the universal nature of some linguistic properties on the other.</p>
      <p>
        The fact that the texts processed in Digital Humanities are usually not
born-digital is very important since, even after OCR post-correction, it is very likely
that some noise remains in the text. Other difficulties arise as well in
these types of documents. Variation in language is one of them, since
contemporary English will clearly not be the most frequent language. It is interesting
for researchers to check how much diachronic variation influences NER
systems [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This makes it even more important to work on multilingual NER and
to build architectures that need less training data [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. More generally, NER
in ancient texts represents a great opportunity for NLP to compare two main
approaches to handling variation in texts: adapting the texts to an existing
architecture via modernization or normalization [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] or adapting the pipeline to non-standard
data (OCR noise, language variants, etc.) via domain adaptation or data
augmentation techniques [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>In Section 2 we present a brief state of the art for Named Entity Recognition,
with a focus on digitized documents. Sections 3 and 4 are respectively devoted to
the description of the dataset of the CLEF-HIPE 2020 shared task and the methods
we developed to extract named entities for French and German. The results of our
systems are described in Section 5, and in Section 6 we give some conclusions and
perspectives for this work.</p>
    </sec>
    <sec id="sec-1b">
      <title>Related Work on Named Entity Recognition</title>
      <p>
        Named Entity Recognition came into light as a prerequisite for designing robust
Information Extraction (IE) systems in the MUC conferences [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This task soon
began to be treated independently from IE since it can serve multiple purposes,
like Information retrieval or Media Monitoring for instance [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. As such, shared
tasks specifically dedicated to NER started to arise, such as the CoNLL 2003 shared
task [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. Two main paths were followed by the community: (i) since NER was
at first used for general purposes, domain extension started to gain interest [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]; (ii)
since the majority of NER systems were designed for English, the extension to
novel languages (including low-resource languages) became important [
      </p>
      <p>
        One can say that NER followed the general trends in NLP. The first
approaches were based on gazetteers and handcrafted rules; initially, NER was
considered a task solvable by a patient process involving careful syntactic analysis [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
Supervised learning approaches came into fashion with the increase of available
data and the rise of shared tasks on NER. Decision trees and Markov models
were soon outperformed by Conditional Random Fields (CRFs). Thanks to their
ability to model dependencies and to take advantage of the sequential nature of textual
data, CRFs helped to set new state-of-the-art results in the domain [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Since
supervised learning results were bound by the size of the training data, lighter
approaches were tested in the early 2000s, among them
weak supervision [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] and active learning [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ].
      </p>
      <p>
        For a time, most promising approaches involved additions to improve
CRFs: word embeddings [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], (bi-)LSTMs [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] or contextual embeddings [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
More recently, the improvements in contextual word embeddings made CRFs
disappear as standalone models from systems reaching state-of-the-art results; see
[
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] for a review of the subject and a very interesting discussion of the limits
attained by state-of-the-art systems, the “glass ceiling”.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Dataset for the CLEF-HIPE shared task</title>
      <p>The dataset of the CLEF-HIPE shared task contains newspaper articles from the
17th to the 20th century. The text is the output of OCR software, then tokenised and
annotated with labels corresponding to each sub-task. This peculiarity of
historical documents will be detailed later in this section. The corpora provided for
French and German both contained training data (train) and development data
(dev), whereas for English only development data was provided for the shared
task. For this reason, we chose to work on French and German only. Table 1
shows some statistics for this dataset. The train dataset is twice
as large for French as for German, whereas the development sets have roughly
the same size. As usual in NER, persons (Pers) and locations (Loc) are the most
frequent entity types.</p>
      <p>[Table 1: corpus statistics (tokens, documents, segments, and labeled named entities of types Pers, Loc, Org, Time and Prod) for the French and German train and dev sets; the values could not be recovered from the source.]</p>
      <p>Table 2 shows an excerpt of the train dataset (CoNLL format). For each
document, general information was provided. Among this information, the newspaper and date
might have been useful features for recognising entities, but we did not take
advantage of them. Each document is composed of segments, starting with "# segment
. . . ", corresponding to lines in the original documents. Each segment is tokenized
so as to correspond to the CoNLL format, with one token per line. These two
notions, segments and tokens, are very important since they do not always match
the type of unit usually processed in NLP pipelines. Segments seldom correspond
to sentences, so there is a need to concatenate the segments to get the raw
text and then segment it into sentences. This is very interesting since it gets us
close to real-world conditions rather than laboratory conditions, and we show in
Section 5.2 that this segment vs. sentence question has an important influence
on the results. Regarding tokens, the tokenization is obviously not perfect. We
can see that there are non-standard words and bad tokenization due to the OCR
output (in red in Table 2). If we concatenate the tokens we get the sequence "Su.
_sss allemands" instead of "Suisse allemande". These non-standard words make
the Named Entity Recognition task more complicated and, again, more realistic.</p>
      <sec id="sec-2-6">
        <title>CRFs and Contextualized Word Embeddings for NER</title>
        <sec id="sec-2-6-1">
          <title>CRF model (run3)</title>
          <p>
            SEM (Segmenteur-Étiqueteur Markovien, which translates to Markovian Tokenizer-Tagger; available at https://github.com/YoannDupont/SEM) [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] is a free NLP tool that relies on
linear-chain CRFs [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ] to perform tagging. SEM uses Wapiti [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] v1.5.0 (available at https://github.com/Jekub/Wapiti) as its
linear-chain CRF implementation. For this particular NER task, SEM uses the
following features:
– token, prefix/suffix of length 1 to 5, and a Boolean isDigit feature in a [-2, 2]
window;
– previous/next common noun in the sentence;
– 10 gazetteers (including NE lists and trigger words for NEs) applied with
some priority rules in a [-2, 2] window;
– a “fill-in-the-gaps” gazetteer feature where tokens not found in any gazetteer
are replaced by their POS, as described in [
            <xref ref-type="bibr" rid="ref27">27</xref>
            ]; this feature uses token
unigrams and token bigrams in a [-2, 2] window;
– tag unigrams and bigrams.
          </p>
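<p>To make the windowed features concrete, here is a minimal sketch (our own illustration, not SEM’s actual code; all names are ours) of how token, prefix/suffix and isDigit features over a [-2, 2] window can be built as feature dictionaries in the style of CRFsuite-like toolkits:</p>

```python
# Illustrative sketch of windowed CRF features: token, prefixes/suffixes
# of length 1 to 5, and an isDigit flag, collected over a [-2, 2] window.

def token_features(token):
    feats = {"token": token, "isDigit": token.isdigit()}
    for n in range(1, 6):
        feats[f"prefix{n}"] = token[:n]
        feats[f"suffix{n}"] = token[-n:]
    return feats

def window_features(tokens, i, window=2):
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if j in range(len(tokens)):  # inside the sequence
            for name, value in token_features(tokens[j]).items():
                feats[f"{name}[{offset:+d}]"] = value
        else:
            feats[f"pad[{offset:+d}]"] = True  # past the sequence boundary
    return feats

sent = ["M", ".", "Dupont", "habite", "Paris"]
feats = window_features(sent, 4)
```

<p>Each position thus contributes one feature dictionary; a linear-chain CRF toolkit then combines these observations with tag unigram and bigram features.</p>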
          <p>We trained a CLEF-HIPE-specific model by optimizing the L1 and L2 penalties
on the development set. The metric used to estimate convergence of the model
is the error on the development set (1 − accuracy). For French, our optimal L1
and L2 penalties were 0.5 and 0.0001 respectively (the default Wapiti parameters).
For German, our optimal L1 and L2 penalties were 1.0 and 0.0001 respectively.</p>
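<p>The selection procedure above can be sketched as a simple grid search minimizing the dev-set error; in the sketch below, train_and_eval is a stand-in stub with made-up error values, not our actual Wapiti runs:</p>

```python
# Sketch of penalty selection: pick the (L1, L2) pair minimizing the
# development-set error (1 - accuracy). train_and_eval is a stand-in stub;
# in the real setting it would train a CRF with those penalties and
# evaluate it on the dev set.

def train_and_eval(l1, l2):
    # Made-up dev error values, for illustration only.
    dev_errors = {(0.5, 0.0001): 0.12, (0.5, 0.001): 0.14, (1.0, 0.0001): 0.15}
    return dev_errors.get((l1, l2), 0.20)

def select_penalties(l1_grid, l2_grid):
    pairs = [(l1, l2) for l1 in l1_grid for l2 in l2_grid]
    return min(pairs, key=lambda pair: train_and_eval(*pair))

best_l1, best_l2 = select_penalties([0.5, 1.0], [0.0001, 0.001])
```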
          <p>One point of interest of SEM is that it has a built-in sentence tokenizer for French
using a rule-based approach. By default, CLEF-HIPE provides a newline
segmentation that is the output of the OCR. As a result, some NE mentions span
multiple segments, making it very hard to identify them correctly. It is to
be expected that models trained (and labelling) on sentences would yield
better performance than those trained (and labelling) on segments. SEM makes
it simple to switch between different sequence segmentations, which allowed us
to label sentences and output segments. SEM’s sentence segmentation engine
mainly uses local rules to determine whether a token is the last of a
sequence (e.g., is a dot preceded by a known title abbreviation?). It also uses
non-local rules to remember whether a token is between parentheses or French
quotes, so as not to segment automatically within them. Since we work at the token level,
we had to adapt some rules to fit the CLEF-HIPE tokenization. For example, SEM
decides at the tokenization stage whether a dot is a strong punctuation mark or part of
a larger token, as for abbreviations. This has the advantage of making sentence
segmentation easier. CLEF-HIPE tokenization systematically separates dots, so
we adapted some sentence segmentation rules; for example, we decided not to
consider a dot as a sentence terminator if the previous token was in a lexicon
of titles or functions. No specific handling of OCR errors was done. Another
point of interest is that SEM has an NE mention broadcasting process: mentions found
at least once in a document are used as a gazetteer to tag unlabeled mentions
within that document. When a new mention overlaps and is strictly longer than
an already found mention, the new mention replaces the previous one in the
document.</p>
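<p>The title-abbreviation rule described above can be sketched at the token level as follows (a toy approximation of the behaviour, not SEM’s implementation; the lexicon is illustrative):</p>

```python
# Toy token-level sentence segmentation: a dot ends a sentence unless the
# previous token belongs to a lexicon of title/function abbreviations.

TITLES = {"M", "Mme", "Dr", "Prof"}  # illustrative lexicon

def split_sentences(tokens):
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if tok == "." and (i == 0 or tokens[i - 1] not in TITLES):
            sentences.append(current)
            current = []
    if current:  # trailing material without a final dot
        sentences.append(current)
    return sentences

tokens = ["M", ".", "Dupont", "arrive", ".", "Il", "pleut", "."]
sents = split_sentences(tokens)
```

<p>Here the dot after "M" does not close the sentence, since "M" is in the title lexicon, while the dots after "arrive" and "pleut" do.</p>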
        </sec>
        <sec id="sec-2-6-2">
          <title>Contextualized word embeddings</title>
          <p>
            Embeddings from Language Models (ELMo) [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ] is a language model, i.e., a model that, given a sequence of N tokens (t_1, t_2, ..., t_N), computes the probability of the sequence by modeling the probability of token t_k given the history (t_1, ..., t_{k-1}):
          </p>
          <p>p(t_1, t_2, ..., t_N) = ∏_{k=1}^{N} p(t_k | t_1, t_2, ..., t_{k-1}).</p>
          <p>ELMo in particular uses a bidirectional language model (biLM) consisting of L LSTM layers; that is, it combines a forward and a backward language model, jointly maximizing the log likelihood of the two directions:</p>
          <p>∑_{k=1}^{N} ( log p(t_k | t_1, ..., t_{k-1}; Θ_x, →Θ_LSTM, Θ_s) + log p(t_k | t_{k+1}, ..., t_N; Θ_x, ←Θ_LSTM, Θ_s) ),</p>
          <p>where at each position k, each LSTM layer l outputs a context-dependent representation →h^{LM}_{k,l}, with l = 1, ..., L, for the forward LSTM, and a representation ←h^{LM}_{k,l} of t_k given (t_{k+1}, ..., t_N) for the backward LSTM.</p>
          <p>ELMo also computes a context-independent token representation x^{LM}_k, via token embeddings or via a CNN over characters. ELMo then ties the parameters of the token representation (Θ_x) and of the Softmax layer (Θ_s) across the forward and backward directions, while maintaining separate parameters for the LSTMs in each direction.</p>
          <p>An ELMo representation is a task-specific combination of the intermediate layer representations of the biLM; that is, for each token t_k, an L-layer biLM computes a set of 2L + 1 representations</p>
          <p>R_k = { x^{LM}_k, →h^{LM}_{k,l}, ←h^{LM}_{k,l} | l = 1, ..., L } = { h^{LM}_{k,l} | l = 0, ..., L },</p>
          <p>where h^{LM}_{k,0} is the token layer and h^{LM}_{k,l} = [→h^{LM}_{k,l}; ←h^{LM}_{k,l}] for each biLSTM layer.</p>
          <p>When included in a downstream model, as is the case in this paper, ELMo collapses all layers in R_k into a single vector ELMo_k = E(R_k; Θ_e), generally computing a task-specific weighting of all biLM layers:</p>
          <p>ELMo^{task}_k = E(R_k; Θ^{task}) = γ^{task} ∑_{l=0}^{L} s^{task}_l h^{LM}_{k,l},</p>
          <p>optionally applying layer normalization to each biLM layer before weighting.</p>
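<p>This layer combination can be sketched numerically as follows (a minimal illustration with arbitrary placeholder values, not the actual FrELMo or German ELMo weights): the scalars s_l are softmax-normalized over the L + 1 layers and the weighted sum is scaled by γ.</p>

```python
# Task-specific ELMo combination: softmax-normalized scalar weights s_l
# over the L+1 biLM layers, scaled by a global factor gamma.
import numpy as np

def elmo_scalar_mix(layers, s_logits, gamma):
    # layers: array of shape (L+1, seq_len, dim) stacking the h_{k,l}
    s = np.exp(s_logits - s_logits.max())
    s = s / s.sum()  # softmax over the layer axis
    return gamma * np.tensordot(s, layers, axes=1)  # weighted sum over l

L, seq_len, dim = 2, 4, 8
layers = np.random.default_rng(0).normal(size=(L + 1, seq_len, dim))
mix = elmo_scalar_mix(layers, s_logits=np.zeros(L + 1), gamma=1.0)
```

<p>With uniform logits, the mix reduces to the mean of the layers; during downstream training, the logits and γ are learned per task.</p>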
          <p>
            Following [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ], we use in this paper ELMo models where L = 2, i.e., the
ELMo architecture involves a character-level CNN layer followed by a 2-layer
biLSTM.
The LSTM-CRF is a model originally proposed by Lample et al. [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. It consists
of a Bi-LSTM encoder whose input combines character-level word embeddings
and pre-trained word embeddings, and a CRF decoder layer. For our
experiments, we follow the same approach as Ortiz Suárez et al. [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ] by using the
Bi-LSTM-CRF implementation of Straková et al. [
            <xref ref-type="bibr" rid="ref31">31</xref>
            ] which is open source and
readily available (at https://github.com/ufal/acl2019_nested_ner), and pre-appending contextualized word embeddings to the
model. For French we pre-append the FrELMo model [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ], which is the standard
ELMo [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ] implementation (available at https://github.com/allenai/bilm-tf) trained on the French OSCAR corpus (https://oscar-corpus.com) [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ] [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ].
For German we pre-append the German ELMo [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ], which is again the standard
ELMo implementation but trained on the German Wikipedia.
          </p>
          <p>
            Contrary to the approach of Ortiz Suárez et al. [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ], we do not use the
CamemBERT model [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ] for French or the German BERT [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]. Both of these
models are BERT-based and as such they are limited to a 512-token
contextualized window. Moreover, they both use SentencePiece [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] meaning that tokens
are actually subwords, which considerably increases the number of tokens per
sentence, specially for the longer ones, thus decreasing the contextual windows
of both CamemBERT and the German BERT. SentencePiece also introduces
the problem of a fixed-size vocabulary, which in the case of this shared task
might negatively impact the performance of said models, as they could
struggle handling OCR problems or just non-standard vocabulary. Since our main
goal was to reconstruct the sentences and use long contextualized sequences we
opted to use ELMo which can easily handle longer sequences with it’s standard
implementation and actually has a dynamic vocabulary thanks to the CNN
character embedding layer, thus it might be better equipped to handle non-standard
orthography and OCR problems.
          </p>
          <p>
            For the fixed word embeddings we used the Common Crawl-based FastText
embeddings [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] originally trained by Facebook as opposed to the embeddings
provided by the HIPE shared task, as we obtained better dev scores using the
original FastText embeddings for both French and German.
          </p>
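<p>The open-vocabulary argument can be illustrated with a toy character n-gram embedder (our own illustration of the general idea behind FastText-style subword vectors and character-level encoders, not either model’s actual code): an OCR-corrupted spelling shares most of its n-grams with the clean form, so their vectors stay close, whereas a fixed word vocabulary would map the corrupted form to a single unknown token.</p>

```python
# Toy character n-gram embedder: a token vector is the mean of hashed
# character trigram vectors, so OCR-corrupted spellings that share most
# trigrams with the clean form get similar vectors.
import numpy as np

_TABLE = np.random.default_rng(42).normal(size=(1000, 64))  # hashed gram vectors

def char_ngram_vector(token, n=3):
    padded = f"#{token}#"  # '#' marks word boundaries
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return np.mean([_TABLE[hash(g) % len(_TABLE)] for g in grams], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

clean, noised, other = "allemande", "al1emande", "journal"
sim_noise = cosine(char_ngram_vector(clean), char_ngram_vector(noised))
sim_other = cosine(char_ngram_vector(clean), char_ngram_vector(other))
```

<p>The corrupted "al1emande" remains much closer to "allemande" than an unrelated word, because most of their trigrams coincide.</p>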
          <p>
            We used the standard hyperparameters originally used by Straková et al.
[
            <xref ref-type="bibr" rid="ref31">31</xref>
            ] (see https://github.com/ufal/acl2019_nested_ner/blob/master/tagger.py#L484): namely a batch size of 8, a dropout of 0.5, a learning rate of 0.001, and 10
epochs. The difference between runs 1 and 2 is that run 1 uses the data as is,
while run 2 uses the reconstructed sentences.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results and Discussion</title>
      <sec id="sec-3-1">
        <title>Official shared task results</title>
        <p>The results of our 3 runs compared to the best run on the NERC-coarse
shared task for French and German are given in Table 3 (strict scenario). For both
tasks, we are the third best ranking team. We only did very minimal
adaptation of existing systems. We did not modify tokenization for any language. The
most notable change was to use a custom sentence segmentation instead of the given
segments for French, and to use some additional lexica as features for our CRF
model in German (for French, we only used existing SEM lexica). Other than
that, we only optimized hyper-parameters on the dev set. This clearly illustrates
the power of contextual embeddings and today’s neural network architectures.
This is encouraging in terms of usability of SotA models on real-world data.
</p>
        <p>[Table 3: precision (P) and recall (R) for the winner’s run, our runs 1-3, and the average and median, for French and German; the values could not be recovered from the source.]</p>
      </sec>
      <sec id="sec-3-2">
        <title>Influence of sentence segmentation</title>
        <p>In this section, we evaluate the influence of sequence segmentation on system
performance. This evaluation is done for French only, as we used SEM to provide
sentence segmentation and SEM could only provide a proper sentence
segmentation for that language. As can be seen in Table 4, sentence segmentation
improves results by 3.5 F1 points. This is due to the fact that some entities
were split across multiple segments in the original data. Using a custom sentence
segmentation allows entities to lie within a single sequence. This segmentation is
applied to both the training data and the evaluation data, so that our systems can
access a more proper context for named entities. The cost of using another
segmentation is relatively low, as SEM can process nearly 1 GB of raw text per
hour.</p>
        <p>A per-entity comparison is also available in Table 4. One can see that the
improvement from sentence segmentation is not very significant for locations (Loc).
This is due to two facts: (i) locations are usually short in number of tokens and
therefore less prone to being split across two segments, and (ii) there was less room
for improvement since they were the easiest entity type to detect (86.35%
F1-score). On the contrary, entities of type “product” (Prod), usually longer in
tokens, were very hard to predict, with only a 48.57% F1-measure, and benefited
the most from segmentation into sentences (+16 percentage points in F1-measure).</p>
        <p>[Table 4 rows: Loc, Org, Pers, Prod, Time; columns: Segments vs. Sentences; the values could not be recovered from the source.]</p>
        <p>In Table 5 we show the results that could have been obtained by training the
Bi-LSTM model on both the train and dev datasets. We used the same
hyperparameters as for our official runs. Besides the fact that it does not ensure the
robustness of the system, the added value seems quite disappointing (in particular,
it would not have given us a better ranking for any language). In
German the gain may be a bit more significant, probably due to the smaller size
of the training dataset.</p>
        <p>[Table 5: results with and without adding the dev set to the training data, for French and German; the values could not be recovered from the source.]</p>
      </sec>
      <sec id="sec-3-3">
        <title>Conclusion and Perspectives</title>
        <p>In this article we presented three methods developed for the Named Entity
Recognition task in French and German historical newspapers. The first method
relied on linear-chain CRFs, while the other two methods use a bidirectional
LSTM and a bidirectional language model (ELMo). The latter outperformed
the CRF model and achieved rank 3 on the NER task in both French and
German. We also showed that the type of sequences used has a significant influence
on the results. When we segment into sentences rather than using the segments of
the dataset as is, the results are systematically much better, with an exception
for locations, where the gain is marginal. This shows that sentence segmentation
remains a key component of efficient NLP architectures, in particular for models
taking advantage of context.</p>
        <p>As future work, it would be interesting to assess the importance of noise in
the data, for instance by comparing the results of NER on texts obtained via
different OCR tools. The influence of quality jumps in the data, which are
common in Digital Humanities, is an important aspect for evaluating the robustness
of a system in real-world conditions rather than laboratory conditions. We also
plan to provide an in-depth analysis of the impact of word embeddings and neural
architectures, as we only provided our best results in this paper.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Brooke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hammond</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Bootstrapped text-level named entity recognition for literature</article-title>
          .
          <source>In: Proc. of the 54th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>344</fpage>
          -
          <lpage>350</lpage>
          . Berlin, Germany (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Möller</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pietsch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soni</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeung</surname>
            ,
            <given-names>C.M.:</given-names>
          </string-name>
          <article-title>German bert</article-title>
          . https: //deepset.ai/german-bert (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dale</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Somers</surname>
            ,
            <given-names>H.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moisl</surname>
          </string-name>
          , H.:
          <article-title>Handbook of Natural Language Processing</article-title>
          . Marcel Dekker, Inc.,
          <source>USA</source>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dupont</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Exploration de traits pour la reconnaissance d'entités nommées du français par apprentissage automatique</article-title>
          .
          <source>In: 24e Conférence sur le Traitement Automatique des Langues Naturelles (TALN)</source>
          . p.
          <volume>42</volume>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ehrmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colavizza</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rochat</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaplan</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Diachronic evaluation of NER systems on old newspapers</article-title>
          .
          <source>Proc. of the 13th Conference on Natural Language Processing (KONVENS</source>
          <year>2016</year>
          ) pp.
          <fpage>97</fpage>
          -
          <lpage>107</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ehrmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romanello</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flückiger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clematide</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers</article-title>
          . In:
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.)
          <source>Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum. CEUR-WS</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A framework for named entity recognition in the open domain</article-title>
          .
          <source>In: Proc. of the Recent Advances in Natural Language Processing (RANLP)</source>
          . pp.
          <fpage>137</fpage>
          -
          <lpage>144</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Finkel</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grenager</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Incorporating non-local information into information extraction systems by Gibbs sampling</article-title>
          .
          <source>In: Proc. of the 43rd Annual Meeting on Association for Computational Linguistics</source>
          . p.
          <fpage>363</fpage>
          -
          <lpage>370</lpage>
          . USA (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ghannay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caubrière</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Estève</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Camelin</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simonnet</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laurent</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>End-to-end named entity and semantic concept extraction from speech</article-title>
          .
          <source>In: IEEE Spoken Language Technology Workshop</source>
          . Athens, Greece (Dec
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Learning word vectors for 157 languages</article-title>
          .
          <source>In: Proc. of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)</source>
          . European Language Resources Association (ELRA), Miyazaki, Japan (May
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Grishman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sundheim</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Design of the MUC-6 evaluation</article-title>
          .
          <source>In: Proc. of the 6th Conference on Message Understanding</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          . MUC6 '95, Association for Computational Linguistics, USA (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hobbs</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>The generic information extraction system</article-title>
          .
          <source>In: Proc. of the 5th Conference on Message Understanding</source>
          . pp.
          <fpage>87</fpage>
          -
          <lpage>91</lpage>
          . MUC5 '93, Association for Computational Linguistics, USA (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kudo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richardson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing</article-title>
          .
          <source>In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          . pp.
          <fpage>66</fpage>
          -
          <lpage>71</lpage>
          . Association for Computational Linguistics, Brussels, Belgium (Nov
          <year>2018</year>
          ). https://doi.org/10.18653/v1/D18-2012, https://www.aclweb.org/anthology/D18-2012
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.C.N.</given-names>
          </string-name>
          :
          <article-title>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
          .
          <source>In: Proc. of the Eighteenth International Conference on Machine Learning (ICML 2001)</source>
          , Williams College, Williamstown, MA, USA. pp.
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramanian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawakami</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Neural architectures for named entity recognition</article-title>
          .
          <source>In: Proc. of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <fpage>260</fpage>
          -
          <lpage>270</lpage>
          . San Diego, California (Jun
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Lavergne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappé</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yvon</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Practical very large scale CRFs</article-title>
          .
          <source>In: Proc. of the 48th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>504</fpage>
          -
          <lpage>513</lpage>
          . Association for Computational Linguistics (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Leaman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>TaggerOne: joint named entity recognition and normalization with semi-Markov Models</article-title>
          .
          <source>Bioinformatics</source>
          <volume>32</volume>
          (
          <issue>18</issue>
          ),
          <fpage>2839</fpage>
          -
          <lpage>2846</lpage>
          (06
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ortiz Suárez</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dupont</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de la Clergerie</surname>
            ,
            <given-names>É.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seddah</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagot</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>CamemBERT: a tasty French language model</article-title>
          .
          <source>In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>7203</fpage>
          -
          <lpage>7219</lpage>
          . Association for Computational Linguistics, Online (Jul
          <year>2020</year>
          ). https://doi.org/10.18653/v1/2020.acl-main.645, https://www.aclweb.org/anthology/2020.acl-main.645
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>May</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>German ELMo Model</article-title>
          (
          <year>2019</year>
          ), https://github.com/t-systems-on-site-services-gmbh/german-elmo-model
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Neudecker</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilms</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faber</surname>
            ,
            <given-names>W.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Veen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Large-scale refinement of digital historic newspapers with named entity recognition</article-title>
          .
          <source>In: Proc. of IFLA 2014</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Ortiz Suárez</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dupont</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagot</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Establishing a new state-of-the-art for French named entity recognition</article-title>
          .
          <source>In: Proc. of The 12th Language Resources and Evaluation Conference</source>
          . pp.
          <fpage>4631</fpage>
          -
          <lpage>4638</lpage>
          . European Language Resources Association, Marseille, France (May
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Ortiz Suárez</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagot</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A monolingual approach to contextualized word embeddings for mid-resource languages</article-title>
          .
          <source>In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>1703</fpage>
          -
          <lpage>1714</lpage>
          . Association for Computational Linguistics, Online (Jul
          <year>2020</year>
          ). https://doi.org/10.18653/v1/2020.acl-main.156, https://www.aclweb.org/anthology/2020.acl-main.156
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Ortiz Suárez</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagot</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures</article-title>
          . In:
          <string-name>
            <surname>Bański</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbaresi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biber</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breiteneder</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clematide</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kupietz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lüngen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iliadi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <source>(eds.) 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7)</source>
          . pp.
          <fpage>9</fpage>
          -
          <lpage>16</lpage>
          .
          <source>Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22nd July 2019</source>
          . Leibniz-Institut für Deutsche Sprache, Mannheim (
          <year>2019</year>
          ). https://doi.org/10.14618/ids-pub-9021, http://nbn-resolving.de/urn:nbn:de:bsz:mh39-90215
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Lexicon infused phrase embeddings for named entity resolution</article-title>
          .
          <source>In: Proc. of the Eighteenth Conference on Computational Natural Language Learning</source>
          . pp.
          <fpage>78</fpage>
          -
          <lpage>86</lpage>
          . Ann Arbor, Michigan (Jun
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In: Proc. of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics</source>
          . pp.
          <fpage>2227</fpage>
          -
          <lpage>2237</lpage>
          . New Orleans, USA (Jun
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Rahimi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Massively multilingual transfer for NER</article-title>
          .
          <source>In: Proc. of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>151</fpage>
          -
          <lpage>164</lpage>
          . Association for Computational Linguistics, Florence, Italy (Jul
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Raymond</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fayolle</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Reconnaissance robuste d'entités nommées sur de la parole transcrite automatiquement</article-title>
          .
          <source>In: TALN'10</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Rössler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Adapting an NER-system for German to the biomedical domain</article-title>
          .
          <source>In: Proc. of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications</source>
          . pp.
          <fpage>95</fpage>
          -
          <lpage>98</lpage>
          . Geneva, Switzerland
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          :
          <article-title>Multi-criteria-based active learning for named entity recognition</article-title>
          .
          <source>In: Proc. of the 42nd Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>589</fpage>
          -
          <lpage>596</lpage>
          . Barcelona, Spain
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Stanislawek</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wróblewska</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wójcicka</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ziembicki</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biecek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Named entity recognition - is there a glass ceiling?</article-title>
          <source>In: Proc. of the 23rd Conference on Computational Natural Language Learning</source>
          . pp.
          <fpage>624</fpage>
          -
          <lpage>633</lpage>
          . Hong Kong (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Straková</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Straka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Neural architectures for nested NER through linearization</article-title>
          .
          <source>In: Proc. of the 57th Conference of the Association for Computational Linguistics</source>
          , ACL, Florence, Italy. pp.
          <fpage>5326</fpage>
          -
          <lpage>5331</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Tjong Kim Sang</surname>
            ,
            <given-names>E.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Meulder</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition</article-title>
          .
          <source>In: Proc. of the Seventh Conference on Natural Language Learning</source>
          . pp.
          <fpage>142</fpage>
          -
          <lpage>147</lpage>
          . CONLL '03, USA (
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Yangarber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Counter-training in discovery of semantic patterns</article-title>
          .
          <source>In: Proc. of the 41st Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>343</fpage>
          -
          <lpage>350</lpage>
          . Association for Computational Linguistics, Sapporo, Japan (Jul
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Yangarber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grishman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Unsupervised learning of generalized names</article-title>
          .
          <source>In: Proc. of the International Conference on Computational Linguistics (ICCL)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>