<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Transformer-based Subject Entity Detection in Wikipedia Listings</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nicolas</forename><surname>Heist</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Data and Web Science Group</orgName>
								<orgName type="institution">University of Mannheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Heiko</forename><surname>Paulheim</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Data and Web Science Group</orgName>
								<orgName type="institution">University of Mannheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Transformer-based Subject Entity Detection in Wikipedia Listings</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">2E4F7665E631D61D9077A0D69E2AC36F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T09:16+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Subject Entity Detection</term>
					<term>Named Entity Recognition</term>
					<term>Wikipedia Listings</term>
					<term>CaLiGraph</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In tasks like question answering or text summarisation, it is essential to have background knowledge about the relevant entities. The information about entities, and in particular about long-tail or emerging entities, in publicly available knowledge graphs like DBpedia or CaLiGraph is far from complete. In this paper, we present an approach that exploits the semi-structured nature of listings (like enumerations and tables) to identify the main entities of the listing items (i.e., of entries and rows). These entities, which we call subject entities, can be used to increase the coverage of knowledge graphs. Our approach uses a transformer network to identify subject entities on token-level and surpasses an existing approach in terms of performance while being bound by fewer limitations. Due to a flexible input format, it is applicable to any kind of listing and, unlike prior work, does not depend on entity boundaries as input. We demonstrate our approach by applying it to the complete Wikipedia corpus and extract 40 million mentions of subject entities with an estimated precision of 71% and recall of 77%. The results are incorporated in the most recent version of CaLiGraph.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><head n="1.1.">Motivation</head><p>Background knowledge provides an essential advantage in tasks like text summarisation or question answering. With ready-to-use entity linking tools like Falcon <ref type="bibr" target="#b1">[1]</ref>, entities in text can be identified and additional information can be drawn from background knowledge graphs (e.g. DBpedia <ref type="bibr" target="#b2">[2]</ref> or CaLiGraph 1 <ref type="bibr" target="#b3">[3]</ref>). Of course, this is only possible if the necessary information about the entity is included in the knowledge graph <ref type="bibr" target="#b4">[4]</ref>.</p><p>Hence, it is important to equip knowledge graphs with as much entity knowledge as possible. While this is easy for prominent entities that are mentioned frequently, retrieving information about long-tail and emerging entities that are mentioned only very infrequently is tedious <ref type="bibr" target="#b5">[5]</ref>. Still, approaches for automatic information extraction can be applied to increase</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>the coverage of knowledge graphs to a certain extent. One strand of research is concerned with open information extraction systems that try to extract facts from web text (e.g. <ref type="bibr" target="#b6">[6,</ref><ref type="bibr" target="#b7">7]</ref>). While they perform strongly on well-known entities, the extraction quality for long-tail entities is considerably worse <ref type="bibr" target="#b6">[6]</ref>.</p><p>The extraction of information from semi-structured data is in general less error-prone and has already proven to yield high-quality results: DBpedia itself, for example, is extracted primarily from Wikipedia infoboxes; other approaches use the category system of Wikipedia <ref type="bibr" target="#b8">[8,</ref><ref type="bibr" target="#b9">9,</ref><ref type="bibr" target="#b10">10]</ref>; many more approaches focus on tables (in Wikipedia or on the web) as a semi-structured data source to extract entities and relations (see <ref type="bibr" target="#b11">[11]</ref> for a comprehensive survey).</p><p>In this work, we generalize over structures like enumerations (Listings 1 and 2) and tables (Listing 3 in Figure <ref type="figure" target="#fig_0">1</ref>) by simply considering them as listings with listing items (i.e., enumeration entries or table rows). Further, we call the main entity that a listing item is about a subject entity (SE). In previous work, we defined SEs as all entities in a listing appearing as instances of a common concept <ref type="bibr" target="#b12">[12]</ref>. In the case of Figure <ref type="figure" target="#fig_0">1</ref>, the SEs are the mentioned albums (e.g. The Spaghetti Incident? or California Girl). Here, the common concept is made explicit through the section labels above the listings (Albums with ...), but it may as well be defined only implicitly through the respective SEs. 
As a listing item typically mentions only one SE together with some context (in this case, the publication year of the album), we assume that at most one SE exists per listing item.</p><p>In the English Wikipedia alone, we find almost five million listings in roughly two million articles. We estimate that about 80% of the listings are suitable for the extraction of SEs, bearing an immense potential for knowledge graph completion (for details, see Section 3.1). Upon extraction, they can easily be digested by downstream applications: due to the semi-structured nature of listings, the quality of extraction is higher than extraction from plain text, and SEs are typically extracted in groups of instances sharing a common concept (as given by the definition above). Especially the latter point makes the subsequent disambiguation step much easier, as the group of extracted instances provides context for every individual instance. Another example of the downstream use of SEs is a previous work of ours where we used groups of SEs to learn lexical patterns that entail axioms <ref type="bibr" target="#b12">[12]</ref>. For example, if a listing is in a section that starts with Albums with, we learn that the SEs are of the type Album.</p><p>The combination of these two ideas, i.e. extracting novel SEs and learning defining axioms for them, can bring a big benefit. In Figure <ref type="figure" target="#fig_0">1</ref>, instead of simply discovering California Girl as a new entity, we additionally assign the type Album. Taking this further, we can learn an axiom that all albums mentioned in the discography of Gilby Clarke are albums authored by him. This additional information can be used to refine the description of the extracted entity in the knowledge graph.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Problem Statement</head><p>Given an arbitrary listing, we want to identify the SEs among all entities mentioned in the listing. In the literature, only a few approaches deal with this problem. The most closely related one is a previous work of the authors that is concerned with the detection of SEs in Wikipedia list pages <ref type="bibr" target="#b3">[3]</ref>. <ref type="foot" target="#foot_0">2</ref> The approach uses a hand-crafted set of features to classify entities in tables or enumerations of list pages as SEs. However, the approach has several limitations:</p><p>• It is only applicable to list pages and not to listings in any other context, as the features are primarily designed for the list page context. • Dependencies between individual SEs of listing items are not taken into account, as the classification is done separately for every item. • The approach needs mention boundaries of entities as input for the classification. Consequently, it cannot identify any new entities but only categorize existing entities into subject and non-subject entities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3.">Contributions</head><p>To harness the information expressed through SEs in more general settings, we aim to overcome the previously mentioned limitations in this work. In particular, we make the following contributions:</p><p>• We present a Transformer-based approach for SE detection with a flexible input format that allows us to apply it to any kind of listing. Further, the model takes dependencies between listing items into account (Section 4.1). • During prediction, the approach detects SEs end-to-end without relying on mention boundaries of the entities in the input sequence (Section 4.2). • We introduce a novel mechanism for generating negative samples of listings (Section 4.3) and a fine-tuning mechanism on noisy listing labels (Section 4.4) leading to more accurate prediction results.</p><p>• In our evaluation, we show that the performance of our approach is superior to previous work (Section 5.3); further, we analyse its performance in a more general scenario -that is, arbitrary listings of Wikipedia pages (Section 5.4). • We run the extraction of SEs on the complete Wikipedia corpus and incorporate the results in a new version of CaLiGraph (Section 5.6).</p><p>The produced code is publicly available and part of the CaLiGraph extraction framework.<ref type="foot" target="#foot_1">3</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>With the presented approach we detect SEs end-to-end, directly from listing text. For a given listing, we identify mentions of named entities and decide at the same time whether they are SEs of a listing or not. In the following, we first review Named Entity Recognition (NER) and subsequently discuss approaches that detect SEs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Named Entity Recognition</head><p>NER is a subproblem of Entity Linking (EL): it only identifies mentions of named entities in text without actually disambiguating them <ref type="bibr" target="#b13">[13]</ref>. As opposed to general Entity Recognition, NER only deals with the identification of named entities and ignores the linking of concepts (also called Wikification) <ref type="bibr" target="#b14">[14]</ref>.</p><p>Early NER systems were based on hand-crafted rules and lexicons, followed by systems using feature engineering and machine learning <ref type="bibr" target="#b15">[15]</ref>. One of the first competitive NER systems that used neural networks was presented by <ref type="bibr">Collobert et al. in 2011 [16]</ref>. This eventually led to more sophisticated architectures based on word embeddings and LSTMs (e.g. from Lample et al. <ref type="bibr" target="#b17">[17]</ref>).</p><p>With the rise of transformer networks <ref type="bibr" target="#b18">[18]</ref> like BERT <ref type="bibr" target="#b19">[19]</ref> in 2018, these models also found direct application in NER (e.g. by Liang et al. <ref type="bibr" target="#b20">[20]</ref>), or as part of an end-to-end EL system like the one from Broscheit <ref type="bibr" target="#b21">[21]</ref>. The latter uses a simple but effective prediction scheme, where entities are predicted at token-level and multiple subsequent tokens with the same predicted entity are collapsed into the actual entity prediction. In our work, we use a similar token-level prediction scheme to detect SEs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Subject Entity Detection</head><p>Although SE detection has rarely been addressed explicitly in the literature, some approaches deal with related problems or subproblems of it. In table interpretation, an important task is the identification of the subject column, i.e. the column containing the entity with outgoing relations to all other columns. TAIPAN <ref type="bibr" target="#b22">[22]</ref> is an approach that aims to recover the semantics of tables and names subject column identification as the first major task towards relation extraction from tables. To identify subject columns, they choose the columns whose entities have the most outgoing edges to entities in other columns w.r.t. a background knowledge graph. While this is a viable approach for tables that are already annotated with entities, it is not broadly applicable to general listings, which may not have many known (or even annotated) entities.</p><p>Another related approach is from Zhao et al. <ref type="bibr" target="#b23">[23]</ref>, who address a problem they call key entity detection. Primarily, they do sentiment analysis on financial texts and use the detection of key entities, which they define to be subjects of events related to financial information, in order to attribute the positive or negative sentiment to a concrete entity. Similar to our proposed approach, they use a Transformer to detect key entities. However, they only use it to select the key entities from a predefined set of entities and ignore the NER part.</p><p>As mentioned in the introduction, the most closely related approach is the authors' prior work <ref type="bibr" target="#b3">[3]</ref>: using manually defined features and a binary XGBoost classifier, entities on list pages are classified as either subject entities or non-subject entities. 
For the page List of Japanese speculative fiction writers, <ref type="foot" target="#foot_2">4</ref> for example, all entities in the enumerations that are Japanese speculative fiction writers are classified as SEs.</p><p>More concretely, the approach uses page features (e.g. the number of sections or tables on the page), positional features (e.g. the indentation level of an entry in the enumeration), and linguistic features (e.g. whether the column header is synonymous with the list page title). Overall, SEs are extracted with a precision of 90% and a recall of 67%. The classifier is trained and evaluated on a set of list pages that are annotated through distant supervision, using DBpedia as background knowledge. This part is discussed in detail in Section 3.2, as the approach presented here relies on this training data generation strategy as well.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Preliminaries</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Listings in Wikipedia</head><p>Overall, the English Wikipedia has more than five million articles. Roughly two million of them contain at least one listing in the form of an enumeration or a table. Across these pages, we find 3.5 million enumerations and 1.4 million tables. <ref type="foot" target="#foot_3">5</ref> The roughly 90K list pages in Wikipedia contain the most structured and most easily exploitable form of listings. Here, listings are almost exclusively used to list a number of entities that share a common property (e.g. all Japanese speculative fiction writers).</p><p>Listings that appear on other Wikipedia pages are used for this purpose as well, but not exclusively, which makes the detection of SEs much more complex. From the inspection of a sample of Wikipedia listings, we estimate that approximately 85% of enumerations and 67% of tables are usable for our approach. Enumerations in particular are often used simply to structure content (e.g. to list the individual episodes in a biography). But even if listings are used to describe entities, they may be unusable for various reasons:</p><p>• Entity description without explicit mention (example in Figure <ref type="figure" target="#fig_1">2a</ref>) • Description of the properties of a single entity (example in Figure <ref type="figure" target="#fig_1">2b</ref>) • Listing items contain groups of entities (example in Figure <ref type="figure" target="#fig_1">2c</ref>) The first point in particular renders a large portion of tables unusable for our approach, as an entity is implicitly described through entities and literals mentioned in multiple table columns (e.g. a sports match is described through date, player, opponent, and result).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Distantly-Supervised Training Data Generation for List Pages</head><p>In our experiments, we will use the training data generation strategy that we introduced in previous work <ref type="bibr" target="#b3">[3]</ref>; to make this paper self-contained, we give an overview of this strategy here. The strategy is based on the observation that DBpedia classes, Wikipedia categories, and Wikipedia list pages can be transformed into an immense taxonomy through linguistic and statistical methods. For example, the taxonomy contains the hierarchy Person &gt; Writer &gt; Speculative fiction writer &gt; Japanese speculative fiction writer. The first two elements originate from DBpedia classes, the third from a category, and the last from a list page.</p><p>As a consequence, we can use this hierarchy to infer the DBpedia classes of SEs for many list pages. To label the list page List of Japanese speculative fiction writers, we assign every entity with the DBpedia class Writer a positive label and every entity with a class that is disjoint with Writer a negative label. Then we include in our training set all listing items that either have an entity with a positive label or only entities with negative labels. Other listing items are ignored, as they may contain SEs that we could not identify due to the incompleteness of DBpedia.</p><p>The knowledge graph CaLiGraph <ref type="bibr" target="#b9">[9,</ref><ref type="bibr" target="#b3">3]</ref> uses this extended taxonomy of DBpedia classes, categories, and list pages as a type hierarchy, and enriches the original DBpedia instances with additional, more fine-grained types. Furthermore, CaLiGraph contains a higher number of instances than DBpedia as it additionally contains the SEs extracted from list pages.</p></div>
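The labelling rule above (keep a listing item if it contains a positively labelled entity, or if all of its entities are labelled negative) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the function name, the toy disjointness axioms in DISJOINT_WITH, and the entity-to-class mapping are all assumptions made for the example.

```python
# Illustrative reconstruction of the distant-supervision labelling
# (Section 3.2). TARGET and DISJOINT_WITH are toy stand-ins for the
# DBpedia class of a list page and its disjointness axioms.
TARGET = "Writer"
DISJOINT_WITH = {"Writer": {"Place", "Organisation"}}

def label_listing_items(items, entity_types, target=TARGET):
    """items: entities per listing item; entity_types: entity -> class.
    Returns the kept items and their (entity, label) pairs."""
    kept, labels = [], []
    for item in items:
        item_labels = []
        for ent in item:
            cls = entity_types.get(ent)
            if cls == target:
                item_labels.append((ent, 1))      # positive label
            elif cls in DISJOINT_WITH[target]:
                item_labels.append((ent, 0))      # negative label
            else:
                item_labels.append((ent, None))   # class unknown
        has_positive = any(lab == 1 for _, lab in item_labels)
        all_negative = bool(item_labels) and all(lab == 0 for _, lab in item_labels)
        # keep items with a positive entity, or with only negative entities;
        # anything else might contain an unidentified SE and is discarded
        if has_positive or all_negative:
            kept.append(item)
            labels.append(item_labels)
    return kept, labels
```

Items with entities of unknown class are dropped rather than labelled, which reflects the paper's caution about the incompleteness of DBpedia.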
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Transformers for Token Classification</head><p>Pre-trained transformer networks <ref type="bibr" target="#b18">[18]</ref> like BERT <ref type="bibr" target="#b19">[19]</ref> or DistilBERT <ref type="bibr" target="#b24">[24]</ref> produced new state-of-the-art results for various NLP tasks including NER and question answering. To a large extent, their ubiquitous application is due to the fact that only a comparatively small amount of fine-tuning is necessary to fit them to various tasks. BERT, for instance, consists of 12 multi-head attention layers followed by a simple linear layer as classification head. To apply a transformer model to a token classification problem, it is often sufficient to fine-tune the final classification head.</p><p>The input for a transformer model can consist of plain text and needs to be tokenized before it can be processed. Every word in the input sequence is transformed into one or more tokens (if the word is not contained in the vocabulary, multiple word-piece tokens are used). Further, the input sequence has to contain special tokens that indicate, for example, the start and the end of the sequence. When using BERT for token classification, the input sequence has a fixed length of 512 tokens and has to start with a [CLS] token and end with a [SEP] token. Additional special tokens may be introduced to provide more context information to the model.</p></div>
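As a minimal illustration of the fixed-length input format described above, the following sketch wraps a token sequence in [CLS]/[SEP] and pads it to 512 positions with an attention mask. It is a toy: real word-piece tokenization, token ids, and the model itself are omitted, and the function name is invented for the example.

```python
# Toy illustration of the fixed-length BERT-style input (Section 3.3):
# [CLS] + tokens + [SEP], truncated and padded to 512 positions.
MAX_LEN = 512

def build_input(tokens, max_len=MAX_LEN):
    # reserve two positions for the special tokens, truncate the rest
    seq = ["[CLS]"] + list(tokens)[: max_len - 2] + ["[SEP]"]
    # mask real positions with 1, padding positions with 0
    attention_mask = [1] * len(seq) + [0] * (max_len - len(seq))
    seq = seq + ["[PAD]"] * (max_len - len(seq))
    return seq, attention_mask
```

In a real pipeline a tokenizer produces word pieces and integer ids, but the shape of the result (fixed length, special tokens, attention mask) is the same.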
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Subject Entity Detection with Transformers</head><p>To detect SEs in listings, we phrase the problem as a token classification problem where we, similar to the work of Broscheit <ref type="bibr" target="#b21">[21]</ref>, produce a label for every token of the input sequence. In a subsequent step, we aggregate the token labels to predictions of SE mentions. We use 13 different token labels, such as Person or Organisation, to identify SEs and additionally make a prediction of their types (refer to Table <ref type="table" target="#tab_3">5</ref> for the full list of labels). In Section 4.1 we explain how to create input sequences that preserve the context and the structure of a listing. In Section 4.2 we show our choice of labels for SE prediction, and in Section 4.3 we introduce a mechanism to generate negative samples of listings. Finally, Section 4.4 explains how to use noisy SE labels on page listings for further fine-tuning of our models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Token-level Subject Entity Detection</head><p>To pass a listing for SE detection to the transformer model, we use multiple special tokens in order to encode context information (page, section, potential table header) and structural information (entries, rows, columns) of the listing into the input sequence. Every sequence consists of the listing context, followed by the special token indicating the end of context [CXE], and one or more listing items:</p><formula xml:id="formula_0">[CLS] &lt;context&gt; [CXE] &lt;listing items&gt; [SEP]</formula><p>We use the special token [CXS] to separate context elements. Within listing items, table rows and columns are indicated with [ROW] and [COL], respectively. For enumerations, we use the tokens [E1] to [En] to indicate the start of an entry with the indentation level 1 to n.</p><p>Ignoring that some words may be split into multiple tokens, the input for the first listing item of Listing 1 in Figure <ref type="figure" target="#fig_0">1</ref> looks as follows:</p><formula xml:id="formula_1">[CLS] Gilby Clarke [CXS] Discography [CXS] Albums with Guns N' Roses [CXE] [E1] The Spaghetti Incident? (1993) [SEP]</formula><p>We want the model to take dependencies between listing entities into account. For example, if the SE in the first listing item is mentioned right at the beginning, it is very likely that this is the case for the remaining listing items as well. Instead of providing only one listing item per input sequence, we can provide as many as the input sequence length permits. Through the attention layers of the Transformer architecture, the model is able to take these dependencies within the input sequence into account. Hence, we put Listing 1 into one input sequence:</p><formula xml:id="formula_2">[CLS] Gilby Clarke [CXS] Discography [CXS] Albums with Guns N' Roses [CXE] [E1] The Spaghetti Incident? (1993) [E1] Greatest Hits (1999) [SEP]</formula><p>Likewise, we encode Listing 3 as one input sequence:</p><formula xml:id="formula_3">[CLS] Gilby Clarke [CXS] Discography [CXS] Solo albums [CXS] [ROW] Name [COL] Year [CXE] [ROW] Rubber [COL] 1998 [ROW] Swag [COL] 2001 [SEP]</formula><p>If the listing is too long to fit into one input sequence, we split the listing items into chunks and process them one after another. Each chunk is augmented with the same context information and a different set of listing items. Depending on the length of listing items, it is possible to fit 20 or more items into one input sequence. In our ablation study in Section 5.5 we show that this item chunking strategy has a strongly positive effect on the recall of the model. Beyond that, we immensely reduce the run time of the model for training and prediction: the number of processed input sequences is reduced by a factor roughly equivalent to the median number of items per listing. <ref type="foot" target="#foot_4">6</ref></p></div>
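The serialization and chunking scheme of this section can be sketched as follows. The special-token names follow the paper; the chunk size of 20 items, the fixed indentation level [E1], and the helper names are illustrative assumptions (the paper also places the table header in the context before [CXE], which this sketch omits for brevity).

```python
# Sketch of listing serialization and chunking (Section 4.1).
def _join_with(parts, sep_token):
    """Join context strings or table cells, inserting a separator token."""
    out = []
    for i, part in enumerate(parts):
        if i:
            out.append(sep_token)
        out += part if isinstance(part, list) else part.split()
    return out

def serialize_chunks(context, items, is_table=False, items_per_chunk=20):
    """context: list of strings (page, section, ...); items: token lists
    (enumeration entries) or lists of cell strings (table rows).
    Returns one token sequence per chunk, each repeating the context."""
    ctx = ["[CLS]"] + _join_with(context, "[CXS]") + ["[CXE]"]
    chunks = []
    for start in range(0, len(items), items_per_chunk):
        body = []
        for item in items[start : start + items_per_chunk]:
            if is_table:
                body += ["[ROW]"] + _join_with(item, "[COL]")
            else:
                body += ["[E1]"] + item  # indentation level 1 assumed
        chunks.append(ctx + body + ["[SEP]"])
    return chunks
```

Because the context is repeated per chunk rather than per item, the number of sequences shrinks by roughly the number of items that fit into a chunk, which matches the run-time argument made above.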
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Coarse-grained Entity Type Prediction</head><p>The most common notation to tag tokens in NER is the BIO notation (Begin, Inside, and Outside of an entity) together with an entity type (e.g. Person or Organisation). We decided not to use the BIO notation as, by definition, there is at most one SE per listing item. Rather than simplifying the task even further by dropping the entity type prediction in favor of a simple binary SE prediction, we decided to stick with the coarse-grained entity type prediction. This has the advantage that the entity types can be used as additional information in downstream tasks, most importantly in a subsequent entity disambiguation step. In addition, we show in our ablation study in Section 5.5 that the more difficult task of entity type prediction even slightly increases the precision of the model.</p><p>Context and special tokens are annotated with the IGNORE label to signal to the model that no prediction is needed for these tokens. SEs are annotated with the respective entity type; everything else is annotated with NONE. Again ignoring word-piece tokenization, the labels for Listing 1 of Figure <ref type="figure" target="#fig_0">1</ref> look as follows:</p><formula xml:id="formula_4">IGNORE IGNORE IGNORE IGNORE IGNORE IGNORE IGNORE IGNORE IGNORE IGNORE IGNORE IGNORE IGNORE WORK_OF_ART WORK_OF_ART WORK_OF_ART NONE IGNORE WORK_OF_ART WORK_OF_ART NONE IGNORE</formula></div>
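A possible way to collapse such token-level labels into SE mentions, in the spirit of the collapsing scheme adopted from Broscheit [21], is to merge consecutive tokens that share the same non-NONE, non-IGNORE label. This is an illustrative sketch under that assumption, not the authors' implementation.

```python
# Sketch: aggregate token-level type labels into SE mentions by merging
# runs of tokens with the same entity-type label (Section 4 / 4.2).
def extract_subject_entities(tokens, labels):
    mentions, current, cur_type = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab == cur_type and lab not in ("NONE", "IGNORE"):
            current.append(tok)  # continue the current mention
        else:
            if current:          # a mention just ended: emit it
                mentions.append((" ".join(current), cur_type))
            if lab not in ("NONE", "IGNORE"):
                current, cur_type = [tok], lab   # start a new mention
            else:
                current, cur_type = [], None
    if current:
        mentions.append((" ".join(current), cur_type))
    return mentions
```

Applied to the label sequence shown above, this yields one mention per listing item, each carrying its coarse-grained type.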
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Negative Sampling through Shuffled Listings</head><p>It is difficult to find negative examples of complete listings if the training data is generated heuristically and with distant supervision as described in Section 3.2. Positives can be found easily (i.e., there is an entity in the listing item that has the correct type), but the inverse does not always hold. If we do not find a positive, this may mean that the listing item does not contain one, but it is just as possible that the annotation is missing. From a logical standpoint, it is even unlikely that some items in a listing contain SEs while others do not.</p><p>To mitigate this problem, we equipped our approach with a sampling mechanism that randomly assembles negatives from the contexts and items of all positives in the training set. If the context and items are assembled randomly, the differences between the individual items (and between items and context) should be larger than in a real listing. The intention of this mechanism is that the model learns to identify the coherence between the SEs of listing items as well as between items and the context.</p><p>For enumeration listings, the mechanism is simple: we pick the context from one listing and a random number of items (between three and the maximum number of items per chunk) from other listings. For table listings, we have to take care that the number of columns of an assembled listing is consistent. Hence, the positives from the training set are divided into groups of the same column size, and listings are only assembled from within a single group. A negative example produced from four different listings could look as follows:</p><p>The mechanism has exactly one hyper-parameter: the proportion of negative listings to generate. We experiment with values between 0.0 (no negative samples at all) and 1.0 (as many negatives as positives).</p></div>
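The assembly step for enumeration listings can be sketched as below: one positive contributes its context, and the items are drawn from all other positives. The data layout and function name are assumptions for the example, and the grouping of tables by column count described above is omitted.

```python
import random

# Sketch of the negative-sampling mechanism of Section 4.3 for
# enumeration listings. 20 is assumed as the maximum items per chunk.
def sample_negative(listings, rng=None):
    """listings: dicts with 'context' and 'items'. Returns one shuffled
    negative: a context paired with 3..20 items from other listings."""
    rng = rng or random.Random(0)
    ctx_listing = rng.choice(listings)
    other_items = [item for l in listings if l is not ctx_listing
                   for item in l["items"]]
    n = rng.randint(3, min(20, len(other_items)))
    return {"context": ctx_listing["context"],
            "items": rng.sample(other_items, n)}
```

Because the items come from different listings, their mutual coherence (and their fit to the context) is lower than in a genuine listing, which is exactly the signal the model is meant to pick up.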
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Fine-Tuning on Noisy Page Labels</head><p>The training data generation strategy described in Section 3.2 lets us create labels for listings of list pages, which we use for the initial training of our models. To train a model that works well on the listings of arbitrary pages, additional training data from listings that are not on list pages may be beneficial (the differences between these listings have been described in Section 3.1).</p><p>We gather this data by first training a model on the heuristically labelled list pages. We then apply this model to the listings of all pages to obtain noisy SE labels. Finally, we filter these labels by discarding any listing for which multiple types of SEs have been predicted (e.g., if the first SE of a listing is labelled as PERSON and the second is labelled as WORK_OF_ART ).</p></div>
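The filtering step described above amounts to a one-type-per-listing consistency check; a minimal sketch (data layout and function name are assumptions for the example):

```python
# Sketch of the noisy-label filter of Section 4.4: keep only listings
# whose predicted SEs all share a single coarse-grained type.
def filter_noisy_listings(predicted):
    """predicted: dicts with 'subject_entities' as (mention, type) pairs.
    Listings with zero or mixed predicted types are discarded as noise."""
    kept = []
    for listing in predicted:
        types = {t for _, t in listing["subject_entities"]}
        if len(types) == 1:
            kept.append(listing)
    return kept
```

The rationale follows from the SE definition: all SEs of one listing are instances of a common concept, so mixed type predictions indicate an unreliable listing.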
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experiments</head><p>The goal of our experiments is to compare the performance of our approach against previous work on SE detection in list pages (Section 5.3) and evaluate its performance in the more general setting of Wikipedia page listings (Section 5.4). Further, we analyze some of our design choices in an ablation study (Section 5.5). Finally, we apply our best model to the complete Wikipedia corpus and report our extraction results (Section 5.6).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Metrics</head><p>For the evaluation of our SE detection models, we stick to the common metrics for NER introduced in SemEval-2013 <ref type="bibr" target="#b25">[25]</ref>. We report precision, recall, and F1-scores of the following scenarios:</p><p>• Partial: Prediction matches the boundary of the true entity at least partially.</p><p>• Exact: Prediction exactly matches the boundary of the true entity.</p><p>• Ent-Type: At least partial boundary match and entity type matches.</p><p>• Strict: Predicted boundary and type exactly match with the true entity.</p></div>
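The four scenarios can be made concrete for a single (prediction, gold) pair of typed spans. This is an illustrative reading of the SemEval-2013 definitions using half-open token offsets, not the official scorer.

```python
# Sketch of the four NER matching scenarios of Section 5.1 for one
# prediction/gold pair; spans are (start, end, type) with half-open offsets.
def match_scenarios(pred, gold):
    (ps, pe, ptype), (gs, ge, gtype) = pred, gold
    partial = ps < ge and gs < pe            # boundaries overlap at least partially
    exact = (ps, pe) == (gs, ge)             # boundaries identical
    ent_type = partial and ptype == gtype    # overlap and matching type
    strict = exact and ptype == gtype        # identical boundary and type
    return {"partial": partial, "exact": exact,
            "ent_type": ent_type, "strict": strict}
```

Precision, recall, and F1 for each scenario then follow by counting these matches over all predicted and gold mentions.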
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Datasets</head><p>In the experiments, we primarily focus on Wikipedia as a data corpus due to its encyclopedic structure and the convenient mapping of entities to DBpedia and CaLiGraph. From the main dataset 𝐷 which consists of all Wikipedia pages that contain listings, we create the subsets D-LP train and D-LP test (from list pages) as well as D-P train and D-P test (from any pages with listings). The statistics of the datasets are shown in Table <ref type="table" target="#tab_0">1</ref>. For the experiments, we use a dump of the English Wikipedia from October 2020 to be compatible with the latest release of CaLiGraph. The datasets D-LP train and D-LP test are created as explained in Section 3.2. For the experiments, we use a part of D-LP train for validation so that we have a distribution of 60% training, 20% validation, and 20% test set (similar to <ref type="bibr" target="#b3">[3]</ref>).</p><p>The datasets D-P train and D-P test consist of listings from arbitrary Wikipedia pages. Hence, no type information is available to infer the SE labels through distant supervision. For D-P train , we retrieved the labels as described in Section 4.4. For D-P test , we provided the type information by manually annotating the roughly 1K listings with coarse-grained entity types (e.g. Person or Organisation). We mapped these types to their DBpedia counterparts and used this information to infer the SE labels via distant supervision. This substantially reduced the annotation effort from labelling roughly 10K listing items with concrete SE labels to labelling 1K listings with coarse-grained types. This implies that this dataset is also, in part, heuristically created and the results have to be taken with a grain of salt.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Evaluation on Wikipedia List Pages</head><p>The evaluation results for the experiments on the dataset D-LP test are given in Table <ref type="table" target="#tab_1">2</ref>. We compare the approach of Heist and Paulheim <ref type="bibr" target="#b3">[3]</ref> with our model in the two configurations 𝑂𝑢𝑟𝑠 𝐿𝑃 <ref type="foot" target="#foot_5">7</ref> and 𝑂𝑢𝑟𝑠 𝑃 <ref type="foot" target="#foot_6">8</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.5.">Ablation Study</head><p>To verify some of the assumptions we made during the design of the SE detection approach, we perform an ablation study using the page listings dataset D-P test . First, we investigate how much the chunking of items into input sequences influences the performance of the model. The results in Table <ref type="table" target="#tab_2">4</ref> show that it has a slightly positive effect on precision (3% for enumerations, 4% for tables) and roughly doubles the recall. The results confirm our assumption that the model is able to improve its predictions by considering the dependencies between the listing items.</p><p>Furthermore, we investigate whether the additional prediction of entity types influences the performance (as opposed to a binary prediction of SEs). The results show a positive effect on precision and a slightly negative effect on recall. As the F1 measure increases slightly and the predicted types provide additional information for downstream tasks, we stick with type prediction instead of binary SE prediction.</p><p>Additionally, we see from Table <ref type="table" target="#tab_2">4</ref> that our negative sampling mechanism slightly increases the precision and recall of our final model. Consequently, the model seems to be able to learn whether there is some consistency between the listing items in the input sequence.</p><p>Finally, the fine-tuning on pages has a very strong effect on recall, which increases by 25%, while the precision of the model also increases by 5%. This result confirms that additional fine-tuning on noisy labels still yields a substantial benefit.</p></div>
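The chunking of listing items into shared input sequences can be sketched as a greedy packing of consecutive items under a token budget (an illustration of the idea only; the exact procedure and the budget of 512 tokens are our assumptions):

```python
def chunk_listing(items, token_counts, max_tokens=512):
    """Greedily pack consecutive listing items into input sequences so that
    the model can exploit dependencies between sibling items.

    items:        the listing items, in document order
    token_counts: number of tokens of each item after tokenization
    """
    chunks, current, used = [], [], 0
    for item, n in zip(items, token_counts):
        # start a new sequence once the next item would exceed the budget
        if current and used + n > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(item)
        used += n
    if current:
        chunks.append(current)
    return chunks
```

Grouping sibling items this way is what the "without item chunks" ablation removes, where each item is fed to the model in isolation.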
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.6.">Subject Entity Extraction over Wikipedia</head><p>Applying the model 𝑂𝑢𝑟𝑠 𝑓 𝑖𝑛𝑎𝑙 to the complete dataset of Wikipedia listings 𝐷 took 13 hours on a single NVIDIA RTX A6000 GPU with 48 GB of memory. We extracted a total of 40 million entity mentions from 2.7 million enumerations and 1 million tables on 1.7 million pages. Of the 40 million entity mentions, 19.5 million can be traced back to 3.8 million known entities (i.e., the predicted mention boundary overlapped with an existing link in Wikipedia, and hence, CaLiGraph), which means that each known entity has on average 5.1 mentions. If we use that same factor of 5.1 to estimate the number of entities behind the remaining 20.5 million entity mentions, they describe roughly 4 million additional unknown entities that could be added to the knowledge graph.</p><p>In Table <ref type="table" target="#tab_3">5</ref> we display the number of extracted entity mentions aggregated by entity type. Unsurprisingly, the most frequently extracted entities are of the types Person, Work of Art, Organisation, and Location. Apart from that, the mention type distribution roughly resembles the distribution of entities in DBpedia <ref type="bibr" target="#b27">[27]</ref>.</p></div>
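The estimate of 4 million unknown entities follows directly from the reported counts:

```python
# Back-of-the-envelope estimate from the reported extraction counts.
known_mentions = 19.5e6    # mentions traceable to existing entities
known_entities = 3.8e6     # distinct known entities behind those mentions
unknown_mentions = 20.5e6  # mentions without a matching Wikipedia link

mentions_per_entity = known_mentions / known_entities          # ~5.1
est_unknown_entities = unknown_mentions / mentions_per_entity  # ~4.0 million
```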
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this work, we have presented a Transformer-based SE detection approach that overcomes several limitations of prior work, making it applicable to more general settings while, at the same time, improving extraction performance. An evaluation on listings of Wikipedia pages shows that the performance in this more general setting is considerably worse than in the scenario of Wikipedia list pages. While the inferior results can partly be attributed to conceptual limitations of SE detection in arbitrary listings (cf. Section 3.1), further improvement is necessary before the results can be consumed by downstream applications without extensive post-filtering.</p><p>We are developing a post-filtering mechanism that takes the differences within an extracted group of SEs into account. For example, we can discard a group of extracted SEs if their predicted entity types show a high degree of diversity.</p><p>In the extraction framework of CaLiGraph, we will integrate a subsequent entity disambiguation step, which matches the identified SE mentions with existing entities or creates new entities in the knowledge graph. The main challenge will be to match SEs with existing entities and, at the same time, match SEs with one another (as the same entity may occur in multiple listings).</p><p>Complementary to the disambiguation step, we plan to further enhance CaLiGraph by using the defining axioms extracted from the listing context. The disambiguation step can be supported by the information extracted from the axioms, and similarly, the disambiguated entities can help to refine the axiom extraction.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Simplified view on the listings of the Wikipedia page of Gilby Clarke.</figDesc></figure>
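The type-diversity post-filter mentioned above could, for instance, keep a group of extracted SEs only if one predicted type clearly dominates (a sketch of the idea; the purity measure and threshold are our illustrative choices, not values from the paper):

```python
from collections import Counter

def keep_se_group(predicted_types, min_purity=0.8):
    """Discard a group of extracted SEs whose predicted entity types are
    too diverse: keep the group only if the most frequent type covers at
    least `min_purity` of its members.
    """
    if not predicted_types:
        return False
    most_common_count = Counter(predicted_types).most_common(1)[0][1]
    return most_common_count / len(predicted_types) >= min_purity
```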
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Examples of Wikipedia page listings with layout or content that is challenging for SE detection.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Statistics of the datasets used for the experiments. The complete corpus 𝐷 contains all Wikipedia pages that have listings. D-LP train and D-LP test are extracted from all Wikipedia list pages and are labelled through distant supervision; D-P train contains listings from arbitrary pages and contains noisy labels from a model trained on list pages while D-P test is annotated manually.</figDesc><table><row><cell>Dataset</cell><cell>#Pages</cell><cell cols="2">#Listings</cell><cell cols="4">Items per Listing (Avg.) Items per Listing (Med.)</cell></row><row><cell></cell><cell></cell><cell>Enums</cell><cell>Tables</cell><cell>Enums</cell><cell>Tables</cell><cell>Enums</cell><cell>Tables</cell></row><row><cell>𝐷</cell><cell cols="3">1,980,021 3,463,053 1,352,848</cell><cell>10.57</cell><cell>14.43</cell><cell>6</cell><cell>8</cell></row><row><cell>D-LP train</cell><cell>68,494</cell><cell>289,666</cell><cell>116,715</cell><cell>18.06</cell><cell>31.26</cell><cell>8</cell><cell>12</cell></row><row><cell>D-LP test</cell><cell>17,123</cell><cell>75,063</cell><cell>28,688</cell><cell>18.17</cell><cell>31.32</cell><cell>8</cell><cell>12</cell></row><row><cell>D-P train</cell><cell>546,667</cell><cell>663,455</cell><cell>306,399</cell><cell>18.72</cell><cell>24.53</cell><cell>12</cell><cell>13</cell></row><row><cell>D-P test</cell><cell>502</cell><cell>763</cell><cell>265</cell><cell>8.42</cell><cell>11.25</cell><cell>6</cell><cell>7</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Evaluation results for SE detection on Wikipedia list pages (evaluating on D-LP test ). Precision, recall and F1-score (in %) are given for the Exact scenario. 𝑂𝑢𝑟𝑠 𝐿𝑃 is the best configuration for D-LP test while 𝑂𝑢𝑟𝑠 𝑃 is the best configuration for D-P test using D-LP train as training data.</figDesc><table><row><cell>Approach</cell><cell></cell><cell>Enums</cell><cell>Tables</cell><cell>Overall</cell></row><row><cell></cell><cell>P</cell><cell>R F1 P</cell><cell>R F1 P</cell><cell>R F1</cell></row><row><cell cols="5">Heist and Paulheim [3] 91 82 86 90 55 68 90 67 77</cell></row><row><cell>𝑂𝑢𝑟𝑠 𝐿𝑃</cell><cell cols="4">93 94 94 89 87 88 92 91 92</cell></row><row><cell>𝑂𝑢𝑟𝑠 𝑃</cell><cell cols="4">92 93 93 88 86 87 91 90 91</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4</head><label>4</label><figDesc>Evaluation results for SE detection on Wikipedia page listings (evaluated on D-P test ) for variations of our best model configuration 𝑂𝑢𝑟𝑠 𝑓 𝑖𝑛𝑎𝑙 . Precision, recall and F1-score (in %) are given for the Exact scenario.</figDesc><table><row><cell>Approach</cell><cell></cell><cell>Enums</cell><cell>Tables</cell><cell>Overall</cell></row><row><cell></cell><cell>P</cell><cell>R F1 P</cell><cell>R F1 P</cell><cell>R F1</cell></row><row><cell>𝑂𝑢𝑟𝑠 𝑓 𝑖𝑛𝑎𝑙</cell><cell cols="4">73 76 75 67 81 73 71 77 74</cell></row><row><cell>.. without item chunks</cell><cell cols="4">70 35 47 63 40 49 68 37 48</cell></row><row><cell>.. without type prediction</cell><cell cols="4">69 78 73 54 84 66 64 79 71</cell></row><row><cell>.. without negative sampling</cell><cell cols="4">71 74 73 66 81 73 70 76 73</cell></row><row><cell cols="5">.. without fine-tuning on pages 65 48 55 67 64 66 66 52 58</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5</head><label>5</label><figDesc>Number of extracted mentions of subject entities for the whole Wikipedia dataset of listings 𝐷 aggregated by entity type.</figDesc><table><row><cell>Entity Type</cell><cell cols="3">#Mentions Entity Type #Mentions Entity Type #Mentions</cell></row><row><cell>PERSON</cell><cell>13,622,704 GPE</cell><cell>1,519,747 NORP</cell><cell>230,707</cell></row><row><cell>OTHER</cell><cell>9,398,003 PRODUCT</cell><cell>1,000,117 LANGUAGE</cell><cell>86,354</cell></row><row><cell>WORK_OF_ART</cell><cell>7,148,235 SPECIES</cell><cell>964,922 LAW</cell><cell>11,490</cell></row><row><cell>ORG</cell><cell>2,916,528 FAC</cell><cell>893,226</cell><cell></cell></row><row><cell>LOC</cell><cell>1,531,452 EVENT</cell><cell>370,440 Total</cell><cell>39,693,925</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">List pages are special Wikipedia pages that contain only listings describing entities of a certain topic.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://github.com/nheist/CaLiGraph</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://en.wikipedia.org/wiki/List_of_Japanese_speculative_fiction_writers</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">These numbers exclude very small listings with less than three items, which we do not consider.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">We deliberately use the median and not the average of items per listing as large listings will be split into multiple input sequences due to the size limitation.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">Configuration: Model roberta-base trained for 3 epochs with batch size 64, learning rate 5e-5, no warmup or weight decay, negative sample size 0.5</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_6">Configuration: Model roberta-base trained for 2 epochs with batch size 64, learning rate 5e-5, no warmup or weight decay, negative sample size 0.3</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_7">Configuration: Similar to 𝑂𝑢𝑟𝑠 𝑃 with an additional fine-tuning step of one epoch on D-𝑃 𝑡𝑟𝑎𝑖𝑛 .</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"> <ref type="bibr" target="#b26">[26]</ref><p>Both of our model configurations significantly outperform the existing approach of Heist and Paulheim <ref type="bibr" target="#b3">[3]</ref>, especially in terms of recall for both enumerations and tables, showing that our model can identify substantially more entities while maintaining a high level of precision. For enumerations, the precision increased slightly and the recall is more than ten percent higher. For tables, the precision remains almost constant while the recall increased by more than 30 percent.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">Evaluation on Wikipedia Page Listings</head><p>The evaluation results for the model 𝑂𝑢𝑟𝑠 𝑓 𝑖𝑛𝑎𝑙 9 on D-P test are given in Table <ref type="table">3</ref>. Comparing the Exact scenario with the results on Wikipedia list pages, it becomes clear that the performance on arbitrary listings is worse. The losses in performance for tables are slightly higher than those for enumerations. This aligns with the observation that a lower portion of tables is usable for our approach. For tables, we have the advantage that mention boundaries are often indicated through column separators, but this is not reflected in the results. In general, we notice that training the models for more than two to three epochs on D-LP train leads to overfitting on list page data and hence to reduced performance on D-P test .</p><p>Unfortunately, it is not possible to apply the approach of Heist and Paulheim <ref type="bibr" target="#b3">[3]</ref> to this dataset as it relies on several features that are specific to list pages. As an alternative, we implemented the pick-first-entity baseline, which has already proven to be a strong baseline in prior work <ref type="bibr" target="#b3">[3]</ref>. In this baseline, we simply label the first mentioned entity in an item as SE. In Table <ref type="table">3</ref> we see that this baseline has a very high recall (as most SEs are mentioned in the beginning) while its precision is far lower than that of 𝑂𝑢𝑟𝑠 𝑓 𝑖𝑛𝑎𝑙 . This shows that the model is able to sort out many false positives (tripling precision) while sacrificing only some correct SEs. In cases where coverage is not the only important criterion (as is usually the case in knowledge graph completion), our model should be preferred. The more important point, however, is that our model does not depend on mention boundaries as input (which might also account for some loss in performance).</p></div>			</div>
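The pick-first-entity baseline is simple enough to sketch directly (the data layout with explicit link flags is our assumption, not the paper's implementation):

```python
def pick_first_entity(listing_items):
    """Baseline from the evaluation: label the first linked entity mention
    of every listing item as its subject entity.

    Each item is a list of (text, is_entity_link) pairs; returns the
    predicted SE mention per item, or None if the item has no linked mention.
    """
    predictions = []
    for item in listing_items:
        first = next((text for text, is_link in item if is_link), None)
        predictions.append(first)
    return predictions
```

Because most subject entities are mentioned at the start of an item, this yields high recall but low precision, as discussed above.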
			<div type="references">

				<listBibl>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Falcon 2.0: An entity and relation linking tool over Wikidata</title>
		<author>
			<persName><forename type="first">A</forename><surname>Sakor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Patel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-E</forename><surname>Vidal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">29th ACM International Conference on Information &amp; Knowledge Management</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3141" to="3148" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Isele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jakob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jentzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kontokostas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Morsey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Van Kleef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic web</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="167" to="195" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Entity extraction from Wikipedia list pages</title>
		<author>
			<persName><forename type="first">N</forename><surname>Heist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="327" to="342" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Evaluating entity linking: An analysis of current benchmark datasets and a roadmap for doing a better job</title>
		<author>
			<persName><forename type="first">M</forename><surname>Van Erp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ilievski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Plu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rizzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Waitelonis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Tenth International Conference on Language Resources and Evaluation (LREC&apos;16)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="4373" to="4379" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">On emerging entity detection</title>
		<author>
			<persName><forename type="first">M</forename><surname>Färber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rettinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">El</forename><surname>Asmar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Knowledge Acquisition Workshop</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="223" to="238" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Extracting knowledge from web text with monte carlo tree search</title>
		<author>
			<persName><forename type="first">G</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Web Conference</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="2585" to="2591" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Supervised open information extraction</title>
		<author>
			<persName><forename type="first">G</forename><surname>Stanovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Michael</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Dagan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long Papers</title>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="885" to="895" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">YAGO: a core of semantic knowledge</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kasneci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The World Wide Web Conference</title>
				<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="697" to="706" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Uncovering the semantics of Wikipedia categories</title>
		<author>
			<persName><forename type="first">N</forename><surname>Heist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="219" to="236" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Learning defining features for categories</title>
		<author>
			<persName><forename type="first">B</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">25th International Joint Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="3924" to="3930" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Web table extraction, retrieval, and augmentation: A survey</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Balog</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Intelligent Systems and Technology (TIST)</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="1" to="35" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Information extraction from co-occurring similar entities</title>
		<author>
			<persName><forename type="first">N</forename><surname>Heist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Web Conference 2021</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="3999" to="4009" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Design challenges for entity linking</title>
		<author>
			<persName><forename type="first">X</forename><surname>Ling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions of the ACL</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="315" to="328" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Learning to link with Wikipedia</title>
		<author>
			<persName><forename type="first">D</forename><surname>Milne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">H</forename><surname>Witten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">17th ACM conference on Information and knowledge management</title>
				<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="509" to="518" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A survey of named entity recognition and classification</title>
		<author>
			<persName><forename type="first">D</forename><surname>Nadeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sekine</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Lingvisticae Investigationes</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="3" to="26" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Natural language processing (almost) from scratch</title>
		<author>
			<persName><forename type="first">R</forename><surname>Collobert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Karlen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kavukcuoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kuksa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of machine learning research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2493" to="2537" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Neural architectures for named entity recognition</title>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ballesteros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kawakami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Dyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="260" to="270" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">BOND: BERT-assisted open-domain named entity recognition with distant supervision</title>
		<author>
			<persName><forename type="first">C</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Er</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</title>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1054" to="1064" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Investigating entity knowledge in BERT with simple neural end-to-end entity linking</title>
		<author>
			<persName><forename type="first">S</forename><surname>Broscheit</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">23rd Conference on Computational Natural Language Learning (CoNLL)</title>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="677" to="685" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">TAIPAN: automatic property mapping for tabular data</title>
		<author>
			<persName><forename type="first">I</forename><surname>Ermilov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-C</forename><forename type="middle">N</forename><surname>Ngomo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Knowledge Acquisition Workshop</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="163" to="179" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">A BERT based sentiment analysis and key entity detection approach for online financial texts</title>
		<author>
			<persName><forename type="first">L</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD)</title>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1233" to="1238" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.01108</idno>
		<title level="m">DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Segura-Bedmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Martínez Fernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Herrero-Zazo</surname></persName>
		</author>
		<title level="m">SemEval-2013 task 9: Extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013)</title>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note>ACL</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-art natural language processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Knowledge graphs on the web – an overview</title>
		<author>
			<persName><forename type="first">N</forename><surname>Heist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hertling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ringler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Knowledge Graphs for eXplainable Artificial Intelligence: Foundations, Applications and Challenges</title>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3" to="22" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
