=Paper=
{{Paper
|id=Vol-3180/paper-84
|storemode=property
|title=Knowledge-based Contexts for Historical Named Entity Recognition & Linking
|pdfUrl=https://ceur-ws.org/Vol-3180/paper-84.pdf
|volume=Vol-3180
|authors=Emanuela Boros,Carlos-Emiliano González-Gallardo,Edward Giamphy,Ahmed Hamdi,José G. Moreno,Antoine Doucet
|dblpUrl=https://dblp.org/rec/conf/clef/BorosGGH0D22
}}
==Knowledge-based Contexts for Historical Named Entity Recognition & Linking==
Emanuela Boros¹, Carlos-Emiliano González-Gallardo¹, Edward Giamphy¹,², Ahmed Hamdi¹, José G. Moreno¹,³ and Antoine Doucet¹

¹ University of La Rochelle, L3i, 17000 La Rochelle, France
² Preligens, 75009 Paris, France
³ University of Toulouse, IRIT, 31000 Toulouse, France

CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy

Abstract

This paper summarizes the participation of the L3i laboratory of the University of La Rochelle in the Identifying Historical People, Places, and other Entities (HIPE) evaluation campaign of CLEF 2022 in both tasks: named entity recognition and classification (NERC), coarse- and fine-grained, and entity linking (EL) in historical newspapers and classical commentaries. For both tasks, we developed models based on our previous models, which ranked first at CLEF-HIPE-2020. The NERC model is a Transformer-based architecture and the EL model is a BiLSTM-based architecture. For NERC, our main contribution is two-fold: (1) data-wise improvement – we propose a knowledge-based strategy to provide related context information to the NERC model; (2) model-wise improvement – we adapt the NERC model to the task of detecting coarse- and fine-grained entities in non-standard text via adapters, and we include the knowledge-based contexts as context jokers. Our approaches ranked first on 84.6% of the leaderboards we participated in for NERC and on 85.7% of them for EL.

Keywords

historical documents, fine-grained named entity recognition, named entity linking, knowledge bases, language models

1. Introduction

The identification of entities in historical documents, such as people and places, can be seen as a building block of historical knowledge that allows easier access and better information retrieval [1, 2, 3, 4]. Moreover, knowledge about historical events is gradually fading, especially among the younger generations. Preserving the historical memory of the information that can be extracted from historical documents, and bringing it to a larger audience not limited to researchers and experts in the humanities [5, 6, 7], could therefore lead to better and wider access to cultural heritage.

Although named entity recognition (NER) and entity linking (EL) systems have been developed to process modern data collections in general, NER and EL systems for processing historical documents are less common [8, 9]. Because these documents are not digitally born, they are scanned and processed with optical character recognition (OCR) tools to extract their textual content. However, the OCR process is not error-free and misrecognizes some of the content. This can be due to the level of degradation of the scanned document, the digitization process, and the quality of the OCR tool itself.
This causes digitization errors in the recognized text, such as misspelled locations or names. In this context, the first CLEF-HIPE-2020 edition [10, 11, 12] proposed the tasks of named entity recognition and classification (NERC), both fine- and coarse-grained, and entity linking (EL) in historical newspapers written in English, French, and German. The evaluation showed that neural-based systems with pre-trained language models or Transformer-based approaches [13, 14, 15] clearly prevailed in NERC [16], beating symbolic conditional random field (CRF) [17, 18], pattern-based, and BiLSTM [19] approaches by a large margin. For its second edition, the HIPE evaluation campaign¹ took advantage of the availability of several NE-annotated datasets produced by several European cultural heritage projects [20].

In this paper, we present our participation in the Identifying Historical People, Places, and other Entities (HIPE) evaluation campaign of CLEF 2022 in both tasks: NERC, fine-grained and coarse-grained, and EL in historical newspapers. For both tasks, we based our models on those that we proposed at CLEF-HIPE-2020 [13]. The NERC model was mainly based on the Transformer architecture [21] and the EL model was based on a BiLSTM architecture [22]. For NERC, our main contribution is two-fold: (1) we propose a knowledge-based system, in which we build a multilingual knowledge base resting on Wikipedia and Wikidata, to provide related context information to the NERC model (data-wise improvement); (2) we adapt the NERC model to the task of detecting coarse- and fine-grained entities in non-standard text by learning modular language- and task-specific representations via newly proposed adapters [23, 24], small bottleneck layers inserted between the weights of two auxiliary Transformer layers (model-wise improvement) [25]. Furthermore, to take advantage of the additional Wikipedia-based contexts, we include them in the model as mean-pooled representations that we refer to as context jokers. The official results of our participation show the effectiveness of our models on the CLEF-HIPE-2022 benchmark.

The paper is organized as follows: Section 2 introduces the task and the datasets. Section 3 presents our knowledge-retrieval modules. Sections 4 and 5 respectively present our NERC and EL systems and their corresponding performance. Conclusions are drawn in Section 6, where future work is also presented.

1 https://hipe-eval.github.io/HIPE-2022/

2. Datasets

The CLEF-HIPE-2022 competition proposed corpora composed of historical newspapers and classical commentaries covering circa 200 years. The historical newspaper data is composed of five datasets in English, Finnish, French, German, and Swedish, which originate from various projects and national libraries in Europe; of these, we experimented with the hipe-2020 dataset. hipe-2020 includes newspaper articles from Swiss, Luxembourgish, and American newspapers in French, German, and English (19C–20C) and contains 19,848 linked entities [10, 11, 12]. We also experimented with the classical commentaries data from the Ajax Multi-Commentary (ajmc) project, which is composed of digitized 19C commentaries published in French, German, and English [26], annotated with both universal named entities (person, location, organisation) and domain-specific named entities (bibliographic references to primary and secondary literature). Table 1 presents the statistics regarding the number and type of entities in these datasets, divided according to the training, development, and test sets.
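Both corpora are distributed in a CoNLL-like, tab-separated format. Purely as an illustration, the following is a minimal Python sketch of a reader for such files. It assumes one token per line, '#'-prefixed metadata lines, a header row starting with TOKEN, column names such as NE-COARSE-LIT matching the annotation layers used later in Section 4, and blank lines between sentences; the exact conventions should be checked against the official data release.

```python
from typing import Iterator

def read_hipe_tsv(path: str) -> Iterator[list[dict]]:
    """Yield sentences as lists of token dicts from a HIPE-style TSV file.

    Assumed layout (to be verified against the release): one token per line,
    tab-separated columns (TOKEN, NE-COARSE-LIT, NE-FINE-LIT, ...), comment
    lines starting with '#', and blank lines between sentences.
    """
    with open(path, encoding="utf-8") as f:
        header, sentence = None, []
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("#"):        # document/segment metadata
                continue
            if not line:                    # assumed sentence boundary
                if sentence:
                    yield sentence
                    sentence = []
                continue
            fields = line.split("\t")
            if header is None and fields[0] == "TOKEN":
                header = fields             # column header row
                continue
            if header is None:              # fall back to positional names
                header = [f"col{i}" for i in range(len(fields))]
            sentence.append(dict(zip(header, fields)))
        if sentence:
            yield sentence

# Usage (hypothetical filename):
# for sent in read_hipe_tsv("hipe2020-train-fr.tsv"):
#     for tok in sent:
#         print(tok["TOKEN"], tok.get("NE-COARSE-LIT", "O"))
```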
Table 1
Overview of the hipe-2020 and ajmc datasets. LOC = location, ORG = organization, PERS = person, PROD = product, TIME = time, WORK = human work, OBJECT = physical object, and SCOPE = specific part of work.

                         FR                    DE                    EN
            Type    train   dev  test    train   dev  test    train   dev  test
hipe-2020   LOC     3,089   774   854    1,740   588   595        –   384   181
            ORG       836   159   130      358   164   130        –   118    76
            PERS    2,525   679   502    1,166   372   311        –   402   156
            PROD      200    49    61      112    49    62        –    33    19
            TIME      276    68    53      118    69    49        –    29    17
ajmc        PERS      577   123   139      620   162   128      618   130    96
            WORK      378    99    80      321    70    74      467   116    95
            LOC        15     0     9       31    10     2       39     3     3
            OBJECT     10     0     0        6     4     2        3     0     0
            DATE        2     0     3        2     0     0       12     5     3
            SCOPE     639   169   129      758   157   176      684   162   151

3. Knowledge-based Contexts

One of the main challenges of NER applied to historical newspapers and classical commentaries concerns the digitization process of these heritage materials. The OCR output contains errors that produce noisy text and complications similar to those studied in [27]. Introducing external, grammatically correct contexts into NERC systems has been shown to have a positive impact on entity identification [28]. It consists in adding complementary and related sentences, paragraphs, or documents from external resources, such as Wikipedia or knowledge graphs (KG), to enrich the surroundings of an entity, which helps NERC systems detect the correct label. KGs structure information in a connected form by representing entities (e.g., people, places) as nodes and relationships between entities (e.g., being part of, being located in) as edges. Thus, we propose two main techniques for generating additional contexts:
• Wikipedia Knowledge Retrieval Module: We create a local instance of ElasticSearch², which provides dense vector field indexing and a k-nearest neighbor (kNN) search API. Given a query vector, this API obtains the k closest vectors and returns those documents as search hits.
• Knowledge Graph Embedding Retrieval Module: We produce English contexts by extending the indexing scheme to a knowledge graph embedding model over the Wikidata5m³ [29] dataset.

3.1. Wikipedia Knowledge Retrieval Module

We download the latest (02/04/2022) XML dumps⁴ of the French and German Wikipedia and transform them into plain text using the Wikipedia2Vec [30] utility⁵. We focus on French and German since, for English, we create another type of retrieval module which also contains Wikipedia paragraphs. Similar to Wang et al. [28], we define a document, inside our instance of ElasticSearch, as a triplet composed of a sentence, a title, and a paragraph. We create a dense vector index over the sentence embedding field, computed with a pre-trained multilingual Sentence-BERT model⁶ [31, 32]. During context retrieval, for a given sentence from the datasets described in Section 2, we compute its dense vector representation with the same multilingual Sentence-BERT pre-trained model and take it as a query to retrieve the top-k semantically similar documents, based on a k-nearest neighbors (kNN) cosine similarity search over the sentence embedding field (Figure 1).

2 We utilized ElasticSearch v8.1.
3 https://deepgraphlearning.github.io/project/wikidata5m
4 https://dumps.wikimedia.org/
5 https://wikipedia2vec.github.io/wikipedia2vec/
6 https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

Figure 1: Context retrieval for the Wikipedia Knowledge Retrieval Module.
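As a minimal sketch of this indexing and retrieval step, the snippet below uses the ElasticSearch v8 Python client and the Sentence-BERT model from footnote 6. The index and field names (wiki_fr, sentence_embedding, etc.) and the localhost endpoint are illustrative choices, not necessarily those of the actual system.

```python
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
encoder = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")  # 384-dim

# One document = (sentence, title, paragraph) plus a dense vector over the
# sentence, indexed for cosine kNN search (supported from ElasticSearch 8.0).
es.indices.create(index="wiki_fr", mappings={"properties": {
    "sentence": {"type": "text"},
    "title": {"type": "text"},
    "paragraph": {"type": "text"},
    "sentence_embedding": {"type": "dense_vector", "dims": 384,
                           "index": True, "similarity": "cosine"},
}})

def index_triplet(sentence: str, title: str, paragraph: str) -> None:
    """Store one Wikipedia triplet together with its sentence embedding."""
    es.index(index="wiki_fr", document={
        "sentence": sentence, "title": title, "paragraph": paragraph,
        "sentence_embedding": encoder.encode(sentence).tolist(),
    })

def retrieve_contexts(query_sentence: str, k: int = 5) -> list[dict]:
    """Return the top-k semantically similar (sentence, title, paragraph) docs."""
    hits = es.search(index="wiki_fr", knn={
        "field": "sentence_embedding",
        "query_vector": encoder.encode(query_sentence).tolist(),
        "k": k, "num_candidates": 10 * k,
    })["hits"]["hits"]
    return [h["_source"] for h in hits]
```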
3.2. Knowledge Graph Embedding Retrieval Module

Wikidata5m is a large-scale KG with aligned entity descriptions. It integrates around five million Wikidata⁷ entities, each described by the first paragraph of the corresponding Wikipedia page. We index the Wikidata5m dataset along with the dense vectors produced by the RotatE KG embedding model [33], pre-trained over the same dataset⁸. RotatE defines each relation between entities as a rotation from the source entity to the target entity in the dense vector space. In this case, we describe "an ElasticSearch document" as a triplet formed by an entity identifier, an entity description, and an entity embedding. We create a standard index on the entity identifier field and two dense vector indexes: the former on the entity embedding field, and the latter on the embeddings of the entity description field, obtained with the same Sentence-BERT model as in the previous module. We propose two different methods for context retrieval (Figure 2) to evaluate the influence of the KG embedding on the semantic similarity:
• KG Embedding Retrieval Module 1: It follows the same principle utilized in the Wikipedia Knowledge Retrieval Module. For a given sentence, the top-k semantically similar documents are retrieved over the entity description embedding field.
• KG Embedding Retrieval Module 2: It first retrieves the top-1 semantically similar document. Then, a second search over the entity dense vector index is performed to retrieve the top-k similar documents based on the KG embeddings of the entities.

7 https://www.wikidata.org/
8 https://graphvite.io/docs/latest/pretrained_model.html

Figure 2: Context retrieval for KG Embedding Retrieval Module 1 (left) and KG Embedding Retrieval Module 2 (right).
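A minimal sketch of the second, two-stage method follows, under the same assumptions as before: the index name (wikidata5m) and field names (description_embedding, entity_embedding, description) are illustrative, and the pre-trained RotatE vector is assumed to be stored alongside each entity document.

```python
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
encoder = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

INDEX = "wikidata5m"  # illustrative name for the Wikidata5m entity index

def retrieve_contexts_module2(query_sentence: str, k: int = 5) -> list[str]:
    """Two-stage retrieval: semantic top-1, then kNN in RotatE space."""
    # Stage 1: top-1 entity by semantic similarity of its description.
    top1 = es.search(index=INDEX, knn={
        "field": "description_embedding",
        "query_vector": encoder.encode(query_sentence).tolist(),
        "k": 1, "num_candidates": 100,
    })["hits"]["hits"][0]["_source"]

    # Stage 2: expand to the k entities closest in the KG embedding space,
    # using the retrieved entity's own RotatE vector as the query.
    neighbours = es.search(index=INDEX, knn={
        "field": "entity_embedding",
        "query_vector": top1["entity_embedding"],
        "k": k, "num_candidates": 10 * k,
    })["hits"]["hits"]
    return [h["_source"]["description"] for h in neighbours]
```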
4. Named Entity Recognition and Classification

In CLEF-HIPE-2022 [20], the named entity recognition and classification (NERC) task consists in the recognition and classification of entities, such as people and locations, within historical multilingual newspapers and classical commentaries. According to the organizers [10], it is composed of two sub-tasks with different levels of difficulty:
• Subtask 1.1 – NERC-Coarse: the identification and categorization of entity mentions according to high-level entity types (e.g., Person, Location).
• Subtask 1.2 – NERC-Fine: the recognition and classification of entity mentions according to finer-grained entity types, as well as nested entities up to one level of depth.

4.1. NERC Architecture

Figure 3: The NERC model architecture with additional contexts and examples of sentences from both datasets.

Our proposed architecture is presented in Figure 3. On the right, we detail our model, which consists of a Base Model with new adapter layers and the encoding of the additional contexts (context jokers). As an overview, after the contexts are generated for an initial sentence, we encode the tokens of the sentence with the Base Model, while the additional contexts are only passed through the BERT pre-trained model encoder. These representations are afterwards concatenated, followed by the CRF-based prediction layers. On the left, we present two example sentences from hipe-2020 and ajmc, demonstrating the different levels of the entity types.

Base Model. Our base model is based on the architecture proposed for CLEF-HIPE-2020 [13, 25], which consists of a hierarchical, multitask learning approach with a fine-tuned encoder based on BERT. The previous model included an encoder with two Transformer [21] layers on top of the BERT pre-trained model encoder. This year, we add adapter modules to these layers [34]. The adapters are added to each Transformer layer after the projection following multi-headed attention. Each adapter consists of a bottleneck which contains few parameters relative to the attention and feed-forward layers in the original model, and acts as a task adapter [24] for fine-grained NER. The attention modules in the Transformer layers adapt not only to the task, but also to the noisy input, which has been shown to increase the performance of NER in such special conditions [25]. Finally, the multitask prediction layer consists of separate CRF layers⁹.

Context Jokers. In order to include the additional contexts generated as explained in Section 3, we introduce the context jokers. Each additional context is passed through the BERT pre-trained model encoder¹⁰ and is afterwards mean-pooled along the sequence axis¹¹. We call this representation the context joker. The context jokers are then concatenated with the sequential representation of the initial tokens of the sentence, as seen in Figure 3, and they are discarded at the moment of prediction. We call them jokers because we see them as wild cards unobtrusively inserted into the representation of the current sentence to improve the recognition of fine-grained entities. However, we also consider that these jokers can affect the results in ways that are not immediately apparent and could even be harmful to the performance of a NERC system.

9 There is a CRF layer for each level of the entity types (NE-COARSE-LIT, NE-COARSE-METO, NE-FINE-LIT, NE-FINE-METO, NE-FINE-COMP, NE-NESTED), thus six layers. If a dataset does not have fine-grained entities (e.g., English in hipe-2020), we maintain the same number of layers, and the model will learn to predict no entity.
10 We do not utilize in this case the additional Transformer layers with adapters, since these were specifically proposed for noisy text and do not bring any increase in performance, as observed by Boroş et al. [25].
11 The maximum length of each context corresponds to the one handled by the language model. Thus, for example, for a BERT-base model, the maximum is 512 tokens.
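To make the joker mechanism concrete, the following is a minimal sketch with Hugging Face transformers. It shows only the mean-pooling and concatenation steps; the sentence side is deliberately simplified, since in the full model the sentence tokens additionally pass through the adapter-augmented Transformer layers before the CRF heads.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = AutoModel.from_pretrained("bert-base-multilingual-cased")

def context_joker(context: str) -> torch.Tensor:
    """Mean-pool the BERT token representations of one retrieved context."""
    enc = tokenizer(context, return_tensors="pt",
                    truncation=True, max_length=512)
    hidden = bert(**enc).last_hidden_state       # (1, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1)   # ignore padding positions
    return (hidden * mask).sum(1) / mask.sum(1)  # (1, 768)

def encode_with_jokers(sentence: str, contexts: list[str]) -> torch.Tensor:
    """Append joker vectors to the token sequence of the sentence.

    The jokers are concatenated along the sequence axis; at prediction
    time their positions are simply discarded, so only the original
    token positions receive CRF labels.
    """
    enc = tokenizer(sentence, return_tensors="pt")
    tokens = bert(**enc).last_hidden_state                   # (1, n, 768)
    jokers = torch.stack([context_joker(c) for c in contexts], dim=1)
    return torch.cat([tokens, jokers], dim=1)                # (1, n + k, 768)
```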
4.2. Experiments and Internal Results

CLEF-HIPE-2022 assesses both tasks, NERC and EL, in terms of precision (P), recall (R), and F-measure (F1) at the macro and micro levels [35, 10]¹². Two evaluation scenarios are considered: strict (exact boundary matching) and fuzzy boundary matching. For our internal NERC results, we report only strict matching (NERC-Coarse and NERC-Fine). Our experimental setup consists of a baseline model and three settings with different levels of knowledge-based contexts:
• no-context: Base Model with bert-base-multilingual-cased¹³;
• v0-language-specific: context jokers are generated with the Wikipedia Knowledge Retrieval Module;
• v1-en-wk5m: context jokers are generated with KG Embedding Retrieval Module 1;
• v2-en-wk5m: context jokers are generated with KG Embedding Retrieval Module 2.

12 We utilized the HIPE-scorer: https://github.com/hipe-eval/HIPE-scorer.
13 https://huggingface.co/bert-base-multilingual-cased

French. Our preliminary results for French, on the hipe-2020 and ajmc datasets, are shown in Table 2. They reveal that generating contexts with KG Embedding Retrieval Modules 1 & 2 brings considerable improvements for hipe-2020, even if our Base Model provides the highest precision for NERC-Coarse and the Wikipedia Knowledge Retrieval Module the highest recall for both granularities. Adding any type of context to ajmc seems to slightly affect the precision, while the contexts produced by KG Embedding Retrieval Module 2 have a positive impact on the NERC-Coarse recall.

Table 2
NERC results on French (Internal).

                                   hipe-2020              ajmc
                               P      R      F1       P      R      F1
no-context       NERC-Coarse  0.765  0.755  0.760    0.833  0.792  0.812
                 NERC-Fine    0.651  0.665  0.658    0.691  0.697  0.694
v0-language-     NERC-Coarse  0.758  0.768  0.763    0.830  0.800  0.815
specific         NERC-Fine    0.632  0.694  0.662    0.690  0.697  0.693
v1-en-wk5m       NERC-Coarse  0.762  0.767  0.765    0.830  0.803  0.816
                 NERC-Fine    0.643  0.690  0.666    0.625  0.633  0.629
v2-en-wk5m       NERC-Coarse  0.756  0.758  0.757    0.828  0.814  0.821
                 NERC-Fine    0.655  0.692  0.673    0.690  0.697  0.693

German. As for German, our preliminary results, presented in Table 3, show the largest improvements when applying contexts for both ajmc and hipe-2020, especially with KG Embedding Retrieval Modules 1 & 2. We assume that this is due to the considerably smaller training dataset compared to the other languages.

Table 3
NERC results on German (Internal).

                                   hipe-2020              ajmc
                               P      R      F1       P      R      F1
no-context       NERC-Coarse  0.754  0.730  0.742    0.910  0.877  0.893
                 NERC-Fine    0.598  0.657  0.626    0.895  0.872  0.883
v0-language-     NERC-Coarse  0.761  0.756  0.759    0.933  0.877  0.904
specific         NERC-Fine    0.644  0.684  0.664    0.912  0.869  0.890
v1-en-wk5m       NERC-Coarse  0.759  0.767  0.763    0.930  0.898  0.913
                 NERC-Fine    0.677  0.684  0.681    0.909  0.887  0.898
v2-en-wk5m       NERC-Coarse  0.760  0.774  0.767    0.935  0.906  0.920
                 NERC-Fine    0.654  0.701  0.676    0.906  0.887  0.897

English. Our preliminary results for English, shown in Table 4, indicate that generating contexts with KG Embedding Retrieval Modules 1 & 2 brings considerable improvements on ajmc for both granularities. Adding contexts to hipe-2020 has a double effect: it negatively impacts precision while improving recall. This is due to the lack of English training documents and the fact that the contexts were generated using the French and German hipe-2020 training datasets¹⁴.

Table 4
NERC results on English (Internal).

                                   hipe-2020              ajmc
                               P      R      F1       P      R      F1
no-context       NERC-Coarse  0.604  0.563  0.583    0.789  0.859  0.823
                 NERC-Fine      –      –      –      0.740  0.833  0.784
v1-en-wk5m       NERC-Coarse  0.565  0.601  0.583    0.828  0.871  0.849
                 NERC-Fine      –      –      –      0.755  0.839  0.795
v2-en-wk5m       NERC-Coarse  0.565  0.601  0.583    0.860  0.868  0.864
                 NERC-Fine      –      –      –      0.782  0.836  0.808

4.3. CLEF-HIPE-2022 Results

The official CLEF-HIPE-2022 competition was restricted to two submissions. We therefore selected our baseline (no-context) and our best context-generator model (v2-en-wk5m). In order to improve the performance of our models, we stacked, for each language, a language-specific language model. For English, we add bert-base-cased¹⁵, while for French and German, we add the French and German Europeana BERT models, pre-trained on the open-source Europeana digitized newspapers provided by The European Library and published by the MDZ Digital Library team (dbmdz)¹⁶.

14 These training sets were combined and used for training the model. Since the English hipe-2020 has only NERC-Coarse entities, we discarded the NERC-Fine and nested entities from the French and German hipe-2020 before training.
15 We utilized the English BERT model: https://huggingface.co/bert-base-cased.
16 We utilized bert-base-french-europeana-cased and bert-base-german-europeana-cased from https://huggingface.co/dbmdz/.

French, German, English. Our official results for French, German, and English are shown in Tables 5, 6, and 7, respectively.
Table 5
NERC results on French (CLEF-HIPE-2022).

                                   hipe-2020              ajmc
                               P      R      F1       P      R      F1
no-context       NERC-Coarse  0.786  0.831  0.808    0.780  0.817  0.798
                 NERC-Fine    0.679  0.767  0.720    0.623  0.669  0.645
v2-en-wk5m       NERC-Coarse  0.782  0.827  0.804    0.810  0.842  0.826
                 NERC-Fine    0.697  0.779  0.736    0.646  0.694  0.669

Table 6
NERC results on German (CLEF-HIPE-2022).

                                   hipe-2020              ajmc
                               P      R      F1       P      R      F1
no-context       NERC-Coarse  0.757  0.792  0.774    0.913  0.903  0.908
                 NERC-Fine    0.658  0.724  0.689    0.860  0.901  0.880
v2-en-wk5m       NERC-Coarse  0.780  0.787  0.784    0.946  0.921  0.934
                 NERC-Fine    0.657  0.710  0.682    0.915  0.898  0.906

Table 7
NERC results on English (CLEF-HIPE-2022).

                                   hipe-2020              ajmc
                               P      R      F1       P      R      F1
no-context       NERC-Coarse  0.604  0.619  0.612    0.831  0.851  0.841
                 NERC-Fine      –      –      –      0.745  0.822  0.781
v2-en-wk5m       NERC-Coarse  0.624  0.617  0.620    0.824  0.876  0.850
                 NERC-Fine      –      –      –      0.754  0.848  0.798

Adding contexts with KG Embedding Retrieval Module 2 reveals a general improvement for all languages on ajmc. The additional contexts for hipe-2020 behave differently. For French, our baseline model performed better for the coarse granularity with exact boundary matching. For German, contexts improved performance for the coarse granularity while slightly negatively affecting the fine granularity. Finally, for English, KG Embedding Retrieval Module 2 boosted the performance on the coarse-grained entities.

5. Entity Linking

In CLEF-HIPE-2022, the EL task consists in the disambiguation of named entities under two settings:
• EL-only: the ground truth regarding the entity mentions is provided; hence, the entity disambiguation runs use the gold entity mentions of NERC, and the only variable is the EL system;
• End-to-end EL: no prior knowledge of the named entities is given; therefore, EL has to be performed over the named entities predicted by the NERC models (no-context and v2-en-wk5m).

Our EL model is based on the same neural approach that we proposed for CLEF-HIPE-2020 [13]. It is combined with a filtering process that analyzes the historical mentions and disambiguates them using the Wikidata KB [36]. Combining information from Wikipedia, Wikidata, and DBpedia allows a thorough analysis of the characteristics of the entities and, as in CLEF-HIPE-2020, it helped our method to correctly disambiguate mentions in historical documents.
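The exact filtering pipeline is described in [36]; purely as an illustration of the kind of Wikidata candidate lookup such a process can build on, here is a hedged sketch using the public wbsearchentities endpoint of the Wikidata API. The helper name and the returned fields are our own choices, not the actual interface of our system.

```python
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def wikidata_candidates(mention: str, language: str = "fr",
                        limit: int = 10) -> list[dict]:
    """Retrieve candidate Wikidata entities for a (possibly noisy) mention."""
    response = requests.get(WIKIDATA_API, params={
        "action": "wbsearchentities",
        "search": mention,
        "language": language,
        "limit": limit,
        "format": "json",
    }, timeout=10)
    response.raise_for_status()
    # Each hit carries a QID, a label, and a short description that can be
    # matched against the mention's document context during disambiguation.
    return [{"qid": hit["id"],
             "label": hit.get("label", ""),
             "description": hit.get("description", "")}
            for hit in response.json().get("search", [])]

# Usage: wikidata_candidates("Bonaparte")[0]["qid"] -> a QID such as "Q517"
```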
Table 8
EL results (CLEF-HIPE-2022) for the hipe-2020 dataset.

                             relaxed                 strict
Language   Setting        P      R      F1       P      R      F1
French     EL-only       0.620  0.620  0.620    0.602  0.602  0.602
           no-context    0.563  0.594  0.578    0.546  0.576  0.560
           v2-en-wk5m    0.560  0.592  0.576    0.543  0.574  0.558
German     EL-only       0.497  0.497  0.497    0.481  0.481  0.481
           no-context    0.453  0.473  0.463    0.438  0.458  0.447
           v2-en-wk5m    0.462  0.466  0.464    0.446  0.451  0.449
English    EL-only       0.546  0.546  0.546    0.546  0.546  0.546
           no-context    0.471  0.465  0.468    0.471  0.465  0.468
           v2-en-wk5m    0.463  0.474  0.469    0.463  0.474  0.469

Table 8 presents our EL scores for CLEF-HIPE-2022 in terms of P, R, and F1 for the hipe-2020 dataset. It can be observed that adding contexts for German and English has a negative impact on the recall, which is consistent with our NERC results (cf. Tables 6 and 7). Results also show that applying additional contexts to French does not increase performance. The extended results and rankings of CLEF-HIPE-2022 are available at the official website of the evaluation campaign¹⁷.

17 https://hipe-eval.github.io/HIPE-2022/results

6. Conclusions

For the participation of our team (L3i) in CLEF-HIPE-2022, we proposed two neural-based methods for the tasks of NERC and EL. For NERC, we conclude that our joker-based approach generally performed well, owing to the additional KG-based contexts and to the model improvements regarding the treatment of such contexts. For EL, the model we proposed for CLEF-HIPE-2020 confirmed its good performance, with and without context. Finally, we consider that external knowledge has brought clear improvements to both of our approaches, and future work on this subject could further demonstrate the utility and importance of high-quality symbolic knowledge.

Acknowledgments

This work has been supported by the ANNA (2019-1R40226) and TERMITRAD (2020-2019-8510010) projects funded by the Nouvelle-Aquitaine Region, France. We would also like to thank Nicolas Sidère and Jean-Loup Guillaume for the insightful discussions.

References

[1] F. Boschetti, A. Cimino, F. Dell'Orletta, G. Lebani, L. Passaro, P. Picchi, G. Venturi, S. Montemagni, A. Lenci, Computational analysis of historical documents: An application to Italian war bulletins in World War I and II, in: Workshop on Language Resources and Technologies for Processing and Linking Historical Documents and Archives (LRT4HDA 2014), ELRA, 2014, pp. 70–75.
[2] M. Rovera, F. Nanni, S. P. Ponzetto, Providing advanced access to historical war memoirs through the identification of events, participants and roles (2019).
[3] A. Cybulska, P. Vossen, Historical event extraction from text, in: Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, 2011, pp. 39–43.
[4] E. Boschee, P. Natarajan, R. Weischedel, Automatic extraction of events from open source text for predictive forecasting, in: Handbook of Computational Approaches to Counterterrorism, Springer, 2013, pp. 51–67.
[5] S. Oberbichler, E. Boroş, A. Doucet, J. Marjanen, E. Pfanzelter, J. Rautiainen, H. Toivonen, M. Tolonen, Integrated interdisciplinary workflows for research on historical newspapers: Perspectives from humanities scholars, computer scientists, and librarians, Journal of the Association for Information Science and Technology 73 (2022) 225–239.
[6] S. Hechl, P.-C. Langlais, J. Marjanen, S. Oberbichler, E. Pfanzelter, Digital interfaces of historical newspapers: opportunities, restrictions and recommendations, Journal of Data Mining & Digital Humanities (2021).
[7] S. Oberbichler, E. Pfanzelter, Topic-specific corpus building: A step towards a representative newspaper corpus on the topic of return migration using text mining methods, Journal of Digital History 1 (2021) 74–98.
[8] M. Ehrmann, A. Hamdi, E. Linhares Pontes, M. Romanello, A. Doucet, A survey of named entity recognition and classification in historical documents, ACM Computing Surveys (2022, to appear). URL: https://arxiv.org/abs/2109.11406.
[9] E. Linhares Pontes, L. A. Cabrera-Diego, J. G. Moreno, E. Boros, A. Hamdi, N. Sidère, M. Coustaty, A. Doucet, Entity linking for historical documents: challenges and solutions, in: International Conference on Asian Digital Libraries, Springer, 2020, pp. 215–231.
[10] M. Ehrmann, M. Romanello, S. Bircher, S. Clematide, Introducing the CLEF 2020 HIPE shared task: Named entity recognition and linking on historical newspapers, in: J. M. Jose, E. Yilmaz, J. Magalhães, P. Castells, N. Ferro, M. J. Silva, F. Martins (Eds.), Advances in Information Retrieval, Springer International Publishing, Cham, 2020, pp. 524–532.
[11] M. Ehrmann, M. Romanello, A. Flückiger, S. Clematide, Overview of CLEF HIPE 2020: Named entity recognition and linking on historical newspapers, in: International Conference of the Cross-Language Evaluation Forum for European Languages, Springer, 2020, pp. 288–310.
[12] M. Ehrmann, M. Romanello, A. Flückiger, S. Clematide, Extended overview of CLEF HIPE 2020: Named entity processing on historical newspapers, in: CEUR Workshop Proceedings, volume 2696, CEUR-WS, 2020.
[13] E. Boros, E. L. Pontes, L. A. Cabrera-Diego, A. Hamdi, J. G. Moreno, N. Sidère, A. Doucet, Robust named entity recognition and linking on historical multilingual documents, in: Conference and Labs of the Evaluation Forum (CLEF 2020), volume 2696, CEUR-WS Working Notes, 2020, pp. 1–17.
[14] K. Labusch, C. Neudecker, Named entity disambiguation and linking historic newspaper OCR with BERT, in: CLEF (Working Notes), 2020.
[15] S. Schweter, L. März, Triple E - Effective ensembling of embeddings and language models for NER of historical German, in: CLEF (Working Notes), 2020.
[16] V. Provatorova, S. Vakulenko, E. Kanoulas, K. Dercksen, J. M. van Hulst, Named entity recognition and linking on historical newspapers: UvA.ILPS & REL at CLEF HIPE 2020 (2020).
[17] T. Kristanti, L. Romary, DeLFT and entity-fishing: Tools for CLEF HIPE 2020 shared task, in: CLEF 2020 - Conference and Labs of the Evaluation Forum, volume 2696, CEUR, 2020.
[18] C. B. El Vaigh, G. Le Noé-Bienvenu, G. Gravier, P. Sébillot, IRISA system for entity detection and linking at CLEF HIPE 2020, in: CEUR Workshop Proceedings, 2020.
[19] P. J. O. Suárez, Y. Dupont, G. Lejeune, T. Tian, SinNer@CLEF-HIPE2020: Sinful adaptation of SotA models for named entity recognition in French and German, in: CLEF (Working Notes), 2020.
[20] M. Ehrmann, M. Romanello, A. Doucet, S. Clematide, Introducing the HIPE 2022 shared task: Named entity recognition and linking in multilingual historical documents, in: European Conference on Information Retrieval, Springer, 2022, pp. 347–354.
[21] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).
[22] N. Kolitsas, O.-E. Ganea, T. Hofmann, End-to-end neural entity linking, in: Proceedings of the 22nd Conference on Computational Natural Language Learning, 2018, pp. 519–529.
[23] S.-A. Rebuffi, H. Bilen, A. Vedaldi, Learning multiple visual domains with residual adapters, Advances in Neural Information Processing Systems 30 (2017).
[24] J. Pfeiffer, I. Vulić, I. Gurevych, S. Ruder, MAD-X: An adapter-based framework for multi-task cross-lingual transfer, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 7654–7673. URL: https://aclanthology.org/2020.emnlp-main.617. doi:10.18653/v1/2020.emnlp-main.617.
[25] E. Boroş, A. Hamdi, E. L. Pontes, L.-A. Cabrera-Diego, J. G. Moreno, N. Sidère, A. Doucet, Alleviating digitization errors in named entity recognition for historical documents, in: Proceedings of the 24th Conference on Computational Natural Language Learning, 2020, pp. 431–441.
[26] M. Romanello, S. Najem-Meyer, B. Robertson, Optical character recognition of 19th century classical commentaries: the current state of affairs, in: The 6th International Workshop on Historical Document Imaging and Processing, 2021, pp. 1–6.
[27] S. Mayhew, T. Tsygankova, D. Roth, ner and pos when nothing is capitalized, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 6256–6261. URL: https://aclanthology.org/D19-1650. doi:10.18653/v1/D19-1650.
[28] X. Wang, Y. Shen, J. Cai, T. Wang, X. Wang, P. Xie, F. Huang, W. Lu, Y. Zhuang, K. Tu, et al., DAMO-NLP at SemEval-2022 Task 11: A knowledge-based system for multilingual named entity recognition, arXiv preprint arXiv:2203.00545 (2022).
[29] X. Wang, T. Gao, Z. Zhu, Z. Liu, J. Li, J. Tang, KEPLER: A unified model for knowledge embedding and pre-trained language representation, arXiv preprint arXiv:1911.06136 (2019).
[30] I. Yamada, A. Asai, J. Sakuma, H. Shindo, H. Takeda, Y. Takefuji, Y. Matsumoto, Wikipedia2Vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, 2020, pp. 23–30.
[31] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 3982–3992. URL: https://aclanthology.org/D19-1410. doi:10.18653/v1/D19-1410.
[32] N. Reimers, I. Gurevych, Making monolingual sentence embeddings multilingual using knowledge distillation, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 4512–4525.
[33] Z. Sun, Z.-H. Deng, J.-Y. Nie, J. Tang, RotatE: Knowledge graph embedding by relational rotation in complex space, arXiv preprint arXiv:1902.10197 (2019).
[34] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, S. Gelly, Parameter-efficient transfer learning for NLP, in: International Conference on Machine Learning, PMLR, 2019, pp. 2790–2799.
[35] J. Makhoul, F. Kubala, R. Schwartz, R. Weischedel, et al., Performance measures for information extraction, in: Proceedings of DARPA Broadcast News Workshop, Herndon, VA, 1999, pp. 249–252.
[36] E. Linhares Pontes, L. A. Cabrera-Diego, J. G. Moreno, E. Boros, A. Hamdi, A. Doucet, N. Sidère, M. Coustaty, MELHISSA: a multilingual entity linking architecture for historical press articles, International Journal on Digital Libraries 23 (2022) 133–160.