<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Deep Learning for Knowledge Graphs, October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Transformer-based Subject Entity Detection in Wikipedia Listings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicolas Heist</string-name>
          <email>nico@informatik.uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Heiko Paulheim</string-name>
          <email>heiko@informatik.uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Subject Entity Detection, Named Entity Recognition, Wikipedia Listings, CaLiGraph</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data and Web Science Group, University of Mannheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>2</volume>
      <fpage>3</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>In tasks like question answering or text summarisation, it is essential to have background knowledge about the relevant entities. The information about entities - and in particular, about long-tail or emerging entities - in publicly available knowledge graphs like DBpedia or CaLiGraph is far from complete. In this paper, we present an approach that exploits the semi-structured nature of listings (like enumerations and tables) to identify the main entities of the listing items (i.e., of entries and rows). These entities, which we call subject entities, can be used to increase the coverage of knowledge graphs. Our approach uses a transformer network to identify subject entities on token-level and surpasses an existing approach in terms of performance while being bound by fewer limitations. Due to a flexible input format, it is applicable to any kind of listing and is, unlike prior work, not dependent on entity boundaries as input. We demonstrate our approach by applying it to the complete Wikipedia corpus and extract 40 million mentions of subject entities with an estimated precision of 71% and recall of 77%. The results are incorporated in the most recent version of CaLiGraph.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Motivation</title>
        <p>
          Background knowledge provides an essential advantage in tasks like text summarisation or
question answering. With ready-to-use entity linking tools like Falcon [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], entities in text can
be identified and additional information can be drawn from background knowledge graphs (e.g.
DBpedia [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] or CaLiGraph1 [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]). Of course, this is only possible if the necessary information
about the entity is included in the knowledge graph [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>
          Hence, it is important to equip knowledge graphs with as much entity knowledge as possible.
While this is easily possible for prominent entities that are mentioned frequently, the retrieval
of information about long-tail and emerging entities that are mentioned only very infrequently
is tedious [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Still, approaches for automatic information extraction can be applied to increase
nEvelop-O
LGOBE
∗Corresponding author.
http://www.uni-mannheim.de/dws/people/researchers/phd-students/nicolas-heist/ (N. Heist);
CEUR
Workshop
Proceedings
 Gilby Clarke
 ---  -- ---- -- -
--- Discography
   -- ---- - --- -
--  Albums with Guns N' Roses
   - The Spaghetti Incident? (1993)
   - Greatest Hits (1999)
  Albums with Nancy Sinatra
   - California Girl
  Solo albums
     Name           Year   
 -     Rubber         1998     
--     Swag            2001      
 ...
        </p>
        <p>Page Title</p>
        <p>
          Section
Listing 1
Listing 2
Listing 3
the coverage of knowledge graphs to a certain extent. One strand of research is concerned with
open information extraction systems that try to extract facts from web text (e.g. [
          <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
          ]). While
they perform strongly on well-known entities, the extraction quality for long-tail entities is
considerably worse [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>
          The extraction of information from semi-structured data is in general less error-prone and
has already proven to yield high-quality results as, for example, DBpedia itself is extracted
primarily from Wikipedia infoboxes; other approaches use the category system of Wikipedia
[
          <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>
          ]; many more approaches focus on tables (in Wikipedia or the web) as a semi-structured
data source to extract entities and relations (see [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] for a comprehensive survey).
        </p>
        <p>
          In this work, we generalize over structures like enumerations (Listings 1 and 2) and tables
(Listing 3 in Figure 1) by simply considering them as listings with listing items (i.e., enumeration
entries or table rows). Further, we call the main entity, that a listing item is about, a subject
entity (SE). In previous work, we defined SEs as all entities in a listing appearing as instances to a
common concept [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. In case of Figure 1, the SEs are the mentioned albums (e.g. The Spaghetti
Incident? or California Girl). Here, the common concept is made explicit through the section
labels above the listings (Albums with..), but it may as well be the case that it is only implicitly
defined through the respective SEs. As a listing item typically mentions only one SE together
with some context (in this case, the publication year of the album), we assume that at most one
SE per listing item exists.
        </p>
        <p>
          In the English Wikipedia chapter alone, we find almost five million listings in roughly two
million articles. From our estimation, about 80% of the listings are suitable for the extraction of
SEs, bearing an immense potential for knowledge graph completion (for details, see Section 3.1).
Upon extraction, they can easily be digested by downstream applications: Due to the
semistructured nature of listings, the quality of extraction is higher than extraction from plain text,
and SEs are typically extracted in groups of instances sharing a common concept (as given
by the definition above). Especially the latter point makes subsequent disambiguation step
much easier, as the group of extracted instances provides context for every individual instance.
Another example of the downstream use of SEs is a work of ours where we used groups of SEs
to learn lexical patterns that entail axioms [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. For example, if a listing is in a section that
starts with Albums with, we learn that the SEs are of the type Album.
        </p>
        <p>The combination of these two ideas, i.e. of extracting novel SEs and learning defining axioms
for them, can bring a big benefit. In Figure 1, instead of simply discovering California Girl as a
new entity, we additionally assign the type Album. Thinking further, we can learn an axiom
that all albums mentioned in the discography of Gilby Clarke are albums that are authored by
him. The additional information can be used to refine the description of the extracted entity in
the knowledge graph.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Problem Statement</title>
        <p>
          Given an arbitrary listing, we want to identify the SEs among all entities mentioned in the listing.
In the literature, there are only very few approaches that deal with this problem. The most
related approach is a previous work of the authors that is concerned with the detection of SEs
in Wikipedia list pages [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].2 The approach uses a hand-crafted set of features to classify entities
in tables or enumerations of list pages as SEs. However, the approach has several limitations:
• It is only applicable to list pages and not to listings in any other context as the features
are primarily designed for the list page context.
• Dependencies between individual SEs of listing items are not taken into account as the
classification is done separately for every item.
• The approach needs mention boundaries of entities as input for the classification.
Consequently, it cannot identify any new entities but only categorize existing entities into
subject and non-subject entities.
        </p>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Contributions</title>
        <p>To harness the information expressed through SEs in more general settings, we aim to
overcome the previously mentioned limitations in this work. In particular, we make the following
contributions:
• We present a Transformer-based approach for SE detection with a flexible input format
that allows us to apply it to any kind of listing. Further, the model takes dependencies
between listing items into account (Section 4.1).
• During prediction, the approach detects SEs end-to-end without relying on mention
boundaries of the entities in the input sequence (Section 4.2).
• We introduce a novel mechanism for generating negative samples of listings (Section 4.3)
and a fine-tuning mechanism on noisy listing labels (Section 4.4) leading to more accurate
prediction results.
2List pages are special Wikipedia pages that contain only listings describing entities of a certain topic.
• In our evaluation, we show that the performance of our approach is superior to previous
work (Section 5.3); further, we analyse its performance in a more general scenario - that
is, arbitrary listings of Wikipedia pages (Section 5.4).
• We run the extraction of SEs on the complete Wikipedia corpus and incorporate the
results in a new version of CaLiGraph (Section 5.6).</p>
        <p>The produced code is publicly available and part of the CaLiGraph extraction framework.3</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>With the presented approach we detect SEs end-to-end, directly from listing text. For a given
listing, we identify mentions of named entities and decide at the same time whether they are
SEs of a listing or not. In the following, we first review Named Entity Recognition (NER) and
subsequently discuss approaches that detect SEs.</p>
      <sec id="sec-2-1">
        <title>2.1. Named Entity Recognition</title>
        <p>
          NER is a subproblem of Entity Linking (EL) which only tries to identify mentions of named
entities in the text without actually disambiguating them [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. As opposed to general Entity
Recognition, NER only deals with the identification of named entities and ignores the linking of
concepts (also called Wikification) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>
          Early NER systems were based on hand-crafted rules and lexicons, followed by systems using
feature-engineering and machine learning [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. One of the first competitive NER systems that
used neural networks has been presented by Collobert et al. in 2011 [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. This eventually lead
to more sophisticated architectures based on word embeddings and LSTMs (e.g. from Lample et
al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]).
        </p>
        <p>With the rise of transformer networks [18] like BERT [19] in 2018, they also found their
direct application in NER (e.g. by Liang et al. [20]), or as part of an end-to-end EL system like
the one from Broscheit [21]. The latter uses a simple but efective prediction scheme, where
entities are predicted at token-level and multiple subsequent tokens with the same predicted
entity are collapsed into the actual entity prediction. In our work, we use a similar token-level
prediction scheme to detect SEs.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Subject Entity Detection</title>
        <p>Although SE detection has not explicitly been addressed in the literature very frequently, there
are some approaches that deal with related problems or subproblems of it. In table interpretation,
an important task is the identification of the subject column, i.e. the column containing the
entity with outgoing relations to all other columns. TAIPAN [22] is an approach that aims to
recover the semantics of tables and names subject column identification as the first major task
towards relation extraction in tables. To identify subject columns, they choose the columns
having entities with the most outgoing edges to entities in other columns w.r.t. a background
knowledge graph. While this is a viable approach for tables that are already annotated with</p>
        <sec id="sec-2-2-1">
          <title>3https://github.com/nheist/CaLiGraph</title>
          <p>entities, it is not broadly applicable to general listings that may not have many known (or even
annotated) entities.</p>
          <p>Another related approach is from Zhao et al. [23] who deal with a problem which they call
key entity detection. Primarily, they do sentiment analysis in financial texts and use the detection
of key entities - which they define to be subjects of events related to financial information
- in order to attribute the positive or negative sentiment to a concrete entity. Similar to our
proposed approach, they use a Transformer to detect key entities. However, they only use it to
select the key entities from a predefined set of entities and ignore the NER part.</p>
          <p>
            As mentioned in the introduction, the most closely related approach is the authors’ prior
work [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]: using manually defined features and a binary XGBoost classifier, entities on list pages
are classified into either subject entities or non-subject entities. For the page List of Japanese
speculative fiction writers ,4 for example, all entities in the enumerations that are Japanese
speculative fiction writers are classified as SEs.
          </p>
          <p>More concretely, the approach uses page features (e.g. number of sections or tables on the
page), positional features (e.g. indentation level of entry in the enumeration), and linguistic
features (e.g. whether the column header is synonymous with the list page title). Overall, SEs are
extracted with a precision of 90% and a recall of 67%. The classifier is trained and evaluated with
a set of list pages that are annotated through distant supervision with DBpedia for background
knowledge. This part is discussed in detail in Section 3.2 as the approach presented here relies
on this training data generation strategy as well.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminaries</title>
      <sec id="sec-3-1">
        <title>3.1. Listings in Wikipedia</title>
        <p>Overall, the English Wikipedia has more than five million articles. Roughly two million of them
contain at least one listing in the form of an enumeration or a table. All over these pages, we
ifnd 3.5 million enumerations and 1.4 million tables. 5 The roughly 90K list pages in Wikipedia
contain the most structured and easily exploitable form of listings. Here, listings are almost
exclusively used to list a number of entities that have some common property (e.g. all Japanese
speculative fiction writers).</p>
        <p>Listings that appear on other Wikipedia pages are used for this purpose as well but not
exclusively, which makes the detection of SEs much more complex. From the inspection of a
sample of Wikipedia listings, we estimate that approximately 85% of enumerations and 67% of
tables are usable for our approach. Especially enumerations are often used to simply structure
content (e.g. to list the individual episodes in a biography). But even if listings are used to
describe entities, they may not be usable due to various reasons:
• Entity description without explicit mention (example in Figure 2a)
• Description of the properties of a single entity (example in Figure 2b)
• Listing items contain groups of entities (example in Figure 2c)</p>
        <sec id="sec-3-1-1">
          <title>4https://en.wikipedia.org/wiki/List_of_Japanese_speculative_fiction_writers 5These numbers exclude very small listings with less than three items, which we do not consider.</title>
          <p>(a) Listing containing no explicit mention of the entities (Source: https://en.wikipedia.org/wiki/Sunrisers_
Hyderabad_in_2018)
(b) Listing describing the properties of an entity (Source: https://en.wikipedia.org/wiki/Dynamic_HTML)
(c) Listing containing groups of entities (Source: https://en.wikipedia.org/wiki/Ibiza_(Vino_de_la_Tierra))</p>
          <p>Especially the first point renders a big portion of tables useless for our approach as an entity
is implicitly described through entities and literals mentioned in multiple table columns (e.g. a
sports match is described through date, player, opponent, and result).</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Distantly-Supervised Training Data Generation for List Pages</title>
        <p>
          In our experiments, we will use the training data generation strategy that we introduced in
previous work [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]; to make this paper self-contained, we will give an overview of this strategy
here. The strategy is based on the observation that DBpedia classes, Wikipedia categories,
and Wikipedia list pages can be transformed into an immense taxonomy through linguistic
and statistical methods. For example, the taxonomy contains the hierarchy Person &gt; Writer &gt;
Speculative fiction writer &gt; Japanese speculative fiction writer . The first two elements originate
from DBpedia classes, the third from a category, and the last from a list page.
        </p>
        <p>As a consequence, we can use this hierarchy to infer the DBpedia classes of SEs for many list
pages. To label the list page List of Japanese speculative fiction writers , we assign every entity
with the DBpedia class Writer a positive label and every entity with a class that is disjoint with
Writer a negative label. Then we include all listing items into our training set that either have
an entity with a positive label or only entities with negative labels. Other listing items are
ignored as we cannot be certain that they may contain SEs which we could not identify due to
the incompleteness of DBpedia.</p>
        <p>
          The knowledge graph CaLiGraph [
          <xref ref-type="bibr" rid="ref3 ref9">9, 3</xref>
          ] uses this extended taxonomy of DBpedia classes,
categories, and list pages as a type hierarchy, and enriches the original DBpedia instances with
additional, more fine-grained types. Furthermore, CaLiGraph contains a higher number of
instances than DBpedia as it additionally contains the extracted SEs from list pages.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Transformers for Token Classification</title>
        <p>Pre-trained transformer networks [18] like BERT [19] or DistilBERT [24] produced new
state-ofthe-art results for various NLP tasks including NER and question answering. To a large extent,
their ubiquitous application is due to the fact that only a comparably small amount of fine-tuning
is necessary to fit them to various tasks. BERT, for instance, consists of 12 multi-head attention
layers followed by a simple linear layer as classification head. To apply a transformer model to a
token classification problem, it is oftentimes suficient to fine-tune the final classification head.</p>
        <p>The input for a transformer model can consist of plain text and needs to be tokenized before
it can be processed. Every word in the input sequence is transformed into one or more tokens
(if the word is not contained in the vocabulary, multiple word-piece tokens are used). Further,
the input sequence has to contain special tokens that indicate, for example, the start and the
end of the sequence. Using BERT for token classification, the input sequence has a fixed length
of 512 tokens, has to start with a [CLS] token and end with a [SEP] token. Additional special
tokens may be introduced to provide more context information to the model.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Subject Entity Detection with Transformers</title>
      <p>To detect SEs in listings, we phrase the problem as a token classification problem where we,
similar to the work of Broscheit [21], produce a label for every token of the input sequence.
In a subsequent step, we aggregate the token labels to predictions of SE mentions. We use 13
diferent token labels, such as Person or Organisation, to identify SEs and additionally make a
prediction of their types (refer to Table 5 for the full list of labels). In Section 4.1 we explain how
to create input sequences that preserve the context and the structure of a listing. In Section 4.2
we show our choice of labels for SE prediction, and in Section 4.3 we introduce a mechanism to
generate negative samples of listings. Finally, Section 4.4 explains how to use noisy SE labels
on page listings for further fine-tuning of our models.</p>
      <sec id="sec-4-1">
        <title>4.1. Token-level Subject Entity Detection</title>
        <p>To pass a listing for SE detection to the transformer model, we use multiple special tokens
in order to encode context information (page, section, potential table header) and structural
information (entries, rows, columns) of the listing into the input sequence. Every sequence
consists of the listing context, followed by the special token indicating the end of context [CXE],
and one or more listing items:
[ C L S ] &lt; c o n t e x t &gt; [ C X E ] &lt; l i s t i n g i t e m s &gt; [ S E P ]</p>
        <p>We use the special token [CXS] to separate context elements. Within listing items, table rows
and columns are indicated with [ROW] and [COL], respectively. For enumerations, we use the
tokens [E1] to [En] to indicate the start of an entry with the indentation level 1 to n.</p>
        <p>Ignoring that some words may be split into multiple tokens, the input for the first listing
item of Listing 1 in Figure 1 looks as follows:
[ C L S ] G i l b y C l a r k e [ C X S ] D i s c o g r a p h y [ C X S ] A l b u m s w i t h G u n s N ’ R o s e s [ C X E ]
[ E 1 ] T h e S p a g h e t t i I n c i d e n t ? ( 1 9 9 3 ) [ S E P ]</p>
        <p>We want the model to take dependencies between listing entities into account. For example, if
the SE in the first listing item is mentioned right in the beginning, it is very likely that this is the
case for the remaining listing items as well. Instead of only providing one listing item per input
sequence, we can provide as many as the input sequence length permits. Through the attention
layers within the Transformer architecture, the model is able to take these dependencies within
the input sequence into account. Hence, we put Listing 1 into one input sequence:
[ C L S ] G i l b y C l a r k e [ C X S ] D i s c o g r a p h y [ C X S ] A l b u m s w i t h G u n s N ’ R o s e s [ C X E ]
[ E 1 ] T h e S p a g h e t t i I n c i d e n t ? ( 1 9 9 3 )
[ E 1 ] G r e a t e s t H i t s ( 1 9 9 9 ) [ S E P ]</p>
        <p>Likewise, we encode Listing 3 as one input sequence:
[ C L S ] G i l b y C l a r k e [ C X S ] D i s c o g r a p h y [ C X S ] S o l o a l b u m s [ C X S ]
[ R O W ] N a m e [ C O L ] Y e a r [ C X E ]
[ R O W ] R u b b e r [ C O L ] 1 9 9 8
[ R O W ] S w a g [ C O L ] 2 0 0 1 [ S E P ]</p>
        <p>If the listing is too long to fit into one input sequence, we split the listing items into chunks and
process them one after another. Each chunk is augmented with the same context information
and a diferent set of listing items. Depending on the length of listing items, it is possible to fit
20 or more items into one input sequence. In our ablation study in Section 5.5 we show that this
item chunking strategy has a strongly positive efect on the recall of the model. But apart from
that we immensely reduce the run time of the model for training and prediction. The number
of processed input sequences is reduced by a factor that is roughly equivalent to the median
number of items per listing.6
6We deliberately use the median and not the average of items per listing as large listings will be split into multiple
input sequences due to the size limitation.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Coarse-grained Entity Type Prediction</title>
        <p>The most common notation to tag tokens in NER is the BIO notation (Begin, Inside, and Outside
of an entity) together with an entity type (e.g. Person or Organisation). We decided not to use
the BIO notation as, per definition, there is at most one SE per listing item. Instead of making
the task even simpler and getting rid of the entity type prediction in favor of a simple binary SE
prediction task as well, we decided to stick with the coarse-grained entity type prediction. This
has the advantage that the entity types can be used as additional information in downstream
tasks - most importantly in a subsequent entity disambiguation step. In addition to that, we
show in our ablation study in Section 5.5 that the more dificult task of entity type prediction
even slightly increases the precision of the model.</p>
        <p>Context and special tokens are annotated with the IGNORE label to indicate the model that
we need no prediction for these tokens. SEs are annotated with the respective entity type,
everything else is annotated with NONE. Again ignoring word-piece tokenization, the labels for
Listing 1 of Figure 1 look as follows:</p>
        <p>I G N O R E I G N O R E I G N O R E I G N O R E I G N O R E I G N O R E I G N O R E I G N O R E I G N O R E I G N O R E I G N O R E I G N O R E
I G N O R E W O R K _ O F _ A R T W O R K _ O F _ A R T W O R K _ O F _ A R T N O N E</p>
        <p>I G N O R E W O R K _ O F _ A R T W O R K _ O F _ A R T N O N E I G N O R E</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Negative Sampling through Shufled Listings</title>
        <p>It is dificult to find negative examples of complete listings if the training data is generated
heuristically and with distant supervision as described in Section 3.2. Positives can be found
easily (i.e., there is an entity in the listing item that has the correct type), but the inverse does
not always hold. If we do not find a positive, this may mean that the listing item does not
contain one, but it is as well possible that the annotation is missing. From a logical standpoint,
it is even unlikely that some items in a listing contain SEs while others do not.</p>
        <p>To mitigate this problem, we equipped our approach with a sampling mechanism for negatives
that randomly assembles them from the contexts and items of all positives in the training set. If
the context and items are assembled randomly, the diferences between the individual items
(and the diference in the context) should be higher than in a real listing. The intention of this
mechanism is that the model learns to identify the coherence between SEs of listing items as
well as between items and the context.</p>
        <p>For enumeration listings, the mechanism is simple as we pick the context from one listing
and a random number of items (between three and the maximum number of items per chunk)
from other listings. For table listings, we have to take care that the number of columns of
an assembled listing is consistent. Hence, the positives from the training set are divided into
groups of the same column size and listings are only assembled from within a single group. A
negative example produced from four diferent listings could look as follows:
[ C L S ] G i l b y C l a r k e [ C X S ] D i s c o g r a p h y [ C X S ] A l b u m s w i t h G u n s N ’ R o s e s [ C X E ]
[ E 1 ] J a m e s S t e w a r t a s B i l l y J i m H a w k i n s
[ E 1 ] C u r z o n M i l l C o m p a n y , p a r t o f A s h t o n s y n d i c a t e .
[ E 1 ] B r e p h o l o x a V a n D u z e e , 1 9 0 4 [ S E P ]</p>
        <p>The mechanism has exactly one hyper-parameter which is the proportion of negative listings
to generate. We experiment with values between 0.0 (no negative samples at all) and 1.0 (as
many negatives as we have positives).</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Fine-Tuning on Noisy Page Labels</title>
        <p>The training data generation strategy described in Section 3.2 lets us create labels for listings of
list pages that we use for the initial training of our models. To train a model that works well
on listings of any pages, additional training data of listings that are not on list pages may be
beneficial (the diferences in listings have been described in Section 3.1).</p>
        <p>We gather this data by first training a model using the heuristically labelled list pages. We
apply the model to listings of all pages for noisy labels of SEs. We then filter them by discarding
any listings where multiple types of SEs have been predicted (e.g., if the first SE of a listing is
labelled as PERSON and the second is labelled as WORK_OF_ART ).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>The goal of our experiments is to compare the performance of our approach against previous
work on SE detection in list pages (Section 5.3) and evaluate its performance in the more general
setting of Wikipedia page listings (Section 5.4). Further, we analyze some of our design choices
in an ablation study (Section 5.5). Finally, we apply our best model to the complete Wikipedia
corpus and report our extraction results (Section 5.6).</p>
      <sec id="sec-5-1">
        <title>5.1. Metrics</title>
        <p>For the evaluation of our SE detection models, we stick to the common metrics for NER
introduced in SemEval-2013 [25]. We report precision, recall, and F1-scores of the following
scenarios:
• Partial: Prediction matches the boundary of the true entity at least partially.
• Exact: Prediction exactly matches the boundary of the true entity.
• Ent-Type: At least partial boundary match and entity type matches.</p>
        <p>• Strict: Predicted boundary and type exactly match with the true entity.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Datasets</title>
        <p>In the experiments, we primarily focus on Wikipedia as a data corpus due to its encyclopedic
structure and the convenient mapping of entities to DBpedia and CaLiGraph. From the main
dataset  which consists of all Wikipedia pages that contain listings, we create the subsets
D-LPtrain and D-LPtest (from list pages) as well as D-Ptrain and D-Ptest (from any pages with
listings). The statistics of the datasets are shown in Table 1. For the experiments, we use a
dump of the English Wikipedia from October 2020 to be compatible with the latest release of
CaLiGraph.
Statistics of the datasets used for the experiments. The complete corpus  contains all Wikipedia pages
that have listings. D-LPtrain and D-LPtest are extracted from all Wikipedia list pages and are labelled
through distant supervision; D-Ptrain contains listings from arbitrary pages and contains noisy labels
from a model trained on list pages while D-Ptest is annotated manually.</p>
        <p>Dataset
#Pages
#Listings</p>
        <p>Items per Listing (Avg.) Items per Listing (Med.)
Enums
Evaluation results for SE detection on Wikipedia list pages (evaluating on D-LPtest ). Precision, recall and
F1-score (in %) are given for the Exact scenario. 
 is the best configuration for D-LPtest while 

is the best configuration for D-Ptest using D-LPtrain as training data.</p>
        <p>
          Approach




Heist and Paulheim [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] 91
        </p>
        <p>P
93
92</p>
        <p>Enums</p>
        <p>R
82
94
93</p>
        <p>F1
86
94
93</p>
        <p>P
90
89
88</p>
        <p>Tables</p>
        <p>R
55
87
86</p>
        <p>F1
68
88
87</p>
        <p>P
90
92
91</p>
        <p>Overall</p>
        <p>R
67
91
90</p>
        <p>F1
77
92
91</p>
        <p>
          The datasets D-LPtrain and D-LPtest are created as explained in Section 3.2. For the
experiments, we use a part of D-LPtrain for validation so that we have a distribution of 60% training,
20% validation, and 20% test set (similar to [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]).
        </p>
        <p>The datasets D-Ptrain and D-Ptest consist of listings from arbitrary Wikipedia pages. Hence,
no type information is available to infer the SE labels through distant supervision. For D-Ptrain,
we retrieved the labels as described in Section 4.4. For D-Ptest , we provided the type information
by manually annotating the roughly 1K listings with coarse-grained entity types (e.g. Person or
Organisation). We mapped these types to their DBpedia counterparts and used this information
to infer the SE labels via distant supervision. This substantially reduced the annotation efort
from labelling roughly 10K listing items with concrete SE labels to labelling 1K listings with
coarse-grained types. This implies that this dataset is also, in part, heuristically created and the
results have to be taken with a grain of salt.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Evaluation on Wikipedia List Pages</title>
        <p>
          The evaluation results for experiments on the dataset D-LPtest are given in Table 2. We compare
the approach Heist and Paulheim [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] with our model in the two configurations  

7 and
7Configuration: Model roberta-base trained for 3 epochs with batch size 64, learning rate 5e-5, no warmup or weight
decay, negative sample size 0.5
        </p>
        <p>.
Precision, recall and F1-score (in %) for SE detection on Wikipedia page listings (evaluating on D-Ptest )
using our best model configuration 
 

.8 Both configurations are trained with training part of D-LPtrain and tuned using the
evaluation part. The former model configuration is the best one w.r.t. performance on
The latter model configuration is the best one w.r.t. performance on D-Ptest . To train our models,
D-LPtest .
we use the Huggingface transformer library [26].</p>
        <p>
          Both of our model configurations significantly outperform the existing approach Heist and
Paulheim [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], especially in terms of recall for both enumerations and tables, showing that our
model can identify substantially more entities while keeping a high level of precision. For
enumerations, the precision increased slightly and the recall is over ten percent higher. While
precision is kept almost constant for tables, the recall increased by more than 30 percent.
        </p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Evaluation on Wikipedia Page Listings</title>
        <p>The evaluation results for the model</p>
        <p>9 on D-Ptest is given in Table 3. Comparing the
Exact scenario with the results on Wikipedia list pages, it becomes clear that the performance
on arbitrary listings is worse. The losses in performance for tables are slightly higher than those
for enumerations. This aligns with the observation that a lower portion of tables is usable for
our approach. For tables, we have the advantage that mention boundaries are often indicated
through column separators but this does not reflect in the results. In general, we notice that
training the models for more than two to three epochs on D-LPtrain leads to overfitting on list
page data and hence reduced performance on D-Ptest .</p>
        <p>
          Unfortunately, it is not possible to apply the approach Heist and Paulheim [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] to this dataset
as it contains several features that are specific to list pages. As an alternative, we implemented
the pick-first-entity baseline which has already proven as a strong baseline in prior work [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
In this baseline, we simply label the first mentioned entity in an item as SE. In Table 3 we see
that this baseline has a very high recall (as most SEs are mentioned in the beginning) while
the precision is far lower than the one of  
 
. This shows that the model is able to sort
out many false positives (tripling precision) by sacrificing only some correct SEs. In cases
where coverage is not the only important criterion (as is usually the case in knowledge graph
completion), our model should be preferred. The more important point, however, is that our
8Configuration: Model roberta-base trained for 2 epochs with batch size 64, learning rate 5e-5, no warmup or weight
decay, negative sample size 0.3
9Configuration: Similar to
        </p>
        <p>with an additional fine-tuning step of one epoch on D-   .
  
.. without item chunks
.. without type prediction
.. without negative sampling
.. without fine-tuning on pages
model does not depend on mention boundaries as input (which might also account for some
loss in performance).</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Ablation Study</title>
        <p>To verify some assumptions that we made during the design of the SE detection approach, we
perform an ablation study using the page listings dataset D-Ptest . Firstly, we investigate how
much chunking of items in input sequences influences the performance of the model. The
results in Table 4 show that it has a slightly positive efect on precision (3% for enumerations,
4% for tables) and a roughly doubling efect on recall. The results confirm our assumption that
the model is able to improve its predictions by considering the dependencies between the listing
items.</p>
        <p>Further, we investigate whether the additional prediction of entity types has an influence on
the performance (as opposed to a binary prediction of SEs). The results show that there is a
positive efect on precision and a slightly negative efect on recall. As the F1 measure increases
slightly and as the predicted types provide additional information for downstream tasks, we
stick with type prediction instead of binary SE prediction.</p>
        <p>Additionally, we see from Table 4 that our negative sampling mechanism slightly increases
the precision and recall of our final model. Consequently, the model seems to be able to learn
whether there is some consistency between the listing items in the input sequence.</p>
        <p>Finally, the fine-tuning on pages has a very strong efect on recall as it comes with an increase
of 25% and the precision of the model is also increased by 5%. This result confirms that additional
ifne-tuning on noisy labels still yields a huge benefit.</p>
      </sec>
      <sec id="sec-5-6">
        <title>5.6. Subject Entity Extraction over Wikipedia</title>
        <p>Applying the model     to the complete dataset of Wikipedia listings  took 13 hours
on a single NVIDIA RTX A6000 GPU with 48GB of RAM. We extracted a total of 40 million
entity mentions from 2.7M enumerations and 1M tables on 1.7 million pages. Of the 40 million
entity mentions, 19.5 million can be traced back to 3.8 million known entities (i.e., the predicted
mention boundary overlapped with an existing link in Wikipedia, and hence, CaLiGraph), which
means that each known entity has on average 5.1 mentions. If we use that same factor of 5.1 to
estimate the number of entities for the remaining 20.5 million entity mentions, they describe 4
million additional unknown entities that could be added to the knowledge graph.</p>
        <p>In Table 5 we display the number of extracted entity mentions aggregated by entity type.
Unsurprisingly, the most frequently extracted entities are of the types Person, Work of Art,
Organisation, and Location. Apart from that, the mention type distribution roughly resembles
the distribution of entities in DBpedia [27].</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this work, we have presented a Transformer-based SE detection approach that overcomes
several limitations of prior work to make it applicable to more general settings and, at the same
time, improve extraction performance. An evaluation of listings of Wikipedia pages shows that
the performance for such a more general setting is considerably worse than for the scenario of
Wikipedia list pages. While the inferior results can partly be attributed to conceptual limitations
of SE detection in arbitrary listings (c.f. Section 3.1), further improvement is necessary so that
the results can be consumed by downstream applications without extensive post-filtering.</p>
      <p>We are developing a post-filtering mechanism that takes the diferences within an extracted
group of SEs into account. For example, we can discard a group of extracted SEs if their predicted
entity types show a high degree of diversion.</p>
      <p>In the extraction framework of CaLiGraph, we will integrate a subsequent entity
disambiguation step, which matches the identified SE mentions with existing entities or creates new entities
in the knowledge graph. The main challenge will be to match SEs with existing entities and at
the same time match SEs with one another (as the same entity may be occurring in multiple
listings).</p>
      <p>Complementary to the disambiguation step, we plan to further enhance CaLiGraph by using
the defining axioms extracted from the listing context. The disambiguation step can be supported
by the information extracted from the axioms, and similarly, the disambiguated entities can
help to refine the axiom extraction.
[18] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I.
Polosukhin, Attention is all you need, Advances in neural information processing systems 30
(2017).
[19] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
transformers for language understanding, in: 2019 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies,
Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.
[20] C. Liang, Y. Yu, H. Jiang, S. Er, R. Wang, T. Zhao, C. Zhang, BOND: BERT-assisted
open-domain named entity recognition with distant supervision, in: 26th ACM SIGKDD
International Conference on Knowledge Discovery &amp; Data Mining, 2020, pp. 1054–1064.
[21] S. Broscheit, Investigating entity knowledge in BERT with simple neural end-to-end entity
linking, in: 23rd Conference on Computational Natural Language Learning (CoNLL), 2019,
pp. 677–685.
[22] I. Ermilov, A.-C. N. Ngomo, TAIPAN: automatic property mapping for tabular data, in:</p>
      <p>European Knowledge Acquisition Workshop, Springer, 2016, pp. 163–179.
[23] L. Zhao, L. Li, X. Zheng, J. Zhang, A BERT based sentiment analysis and key entity
detection approach for online financial texts, in: 2021 IEEE 24th International Conference
on Computer Supported Cooperative Work in Design (CSCWD), IEEE, 2021, pp. 1233–1238.
[24] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller,
faster, cheaper and lighter, arXiv preprint arXiv:1910.01108 (2019).
[25] I. Segura-Bedmar, P. Martínez Fernández, M. Herrero Zazo, SemEval-2013 task 9: Extraction
of drug-drug interactions from biomedical texts (ddiextraction 2013), ACL, 2013.
[26] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
M. Funtowicz, et al., Transformers: State-of-the-art natural language processing, in: 2020
conference on empirical methods in natural language processing: system demonstrations,
2020, pp. 38–45.
[27] N. Heist, S. Hertling, D. Ringler, H. Paulheim, Knowledge graphs on the web–an overview,
Knowledge Graphs for eXplainable Artificial Intelligence: Foundations, Applications and
Challenges (2020) 3–22.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sakor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          , M.-E. Vidal,
          <article-title>Falcon 2.0: An entity and relation linking tool over Wikidata</article-title>
          ,
          <source>in: 29th ACM International Conference on Information &amp; Knowledge Management</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>3141</fpage>
          -
          <lpage>3148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Van Kleef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , et al.,
          <article-title>Dbpedia-a large-scale, multilingual knowledge base extracted from wikipedia</article-title>
          ,
          <source>Semantic web 6</source>
          (
          <year>2015</year>
          )
          <fpage>167</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Heist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Entity extraction from Wikipedia list pages</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>327</fpage>
          -
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Van Erp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ilievski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          , G. Rizzo,
          <string-name>
            <given-names>J.</given-names>
            <surname>Waitelonis</surname>
          </string-name>
          ,
          <article-title>Evaluating entity linking: An analysis of current benchmark datasets and a roadmap for doing a better job</article-title>
          ,
          <source>in: Tenth International Conference on Language Resources and Evaluation (LREC'16)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>4373</fpage>
          -
          <lpage>4379</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Färber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rettinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. El</given-names>
            <surname>Asmar</surname>
          </string-name>
          ,
          <article-title>On emerging entity detection</article-title>
          ,
          <source>in: European Knowledge Acquisition Workshop</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>238</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Extracting knowledge from web text with monte carlo tree search</article-title>
          ,
          <source>in: The Web Conference</source>
          <year>2020</year>
          ,
          <year>2020</year>
          , pp.
          <fpage>2585</fpage>
          -
          <lpage>2591</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Stanovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Michael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Dagan</surname>
          </string-name>
          ,
          <article-title>Supervised open information extraction, in: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</article-title>
          , Volume
          <volume>1</volume>
          (
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <year>2018</year>
          , pp.
          <fpage>885</fpage>
          -
          <lpage>895</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          , G. Kasneci,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Weikum, YAGO: a core of semantic knowledge</article-title>
          ,
          <source>in: The World Wide Web Conference</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>697</fpage>
          -
          <lpage>706</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Heist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Uncovering the semantics of Wikipedia categories</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2019</year>
          , pp.
          <fpage>219</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Learning defining features for categories</article-title>
          ,
          <source>in: 25th International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>3924</fpage>
          -
          <lpage>3930</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , K. Balog,
          <article-title>Web table extraction, retrieval, and augmentation: A survey</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology (TIST) 11</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Heist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Information extraction from co-occurring similar entities</article-title>
          ,
          <source>in: The Web Conference</source>
          <year>2021</year>
          ,
          <year>2021</year>
          , pp.
          <fpage>3999</fpage>
          -
          <lpage>4009</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <article-title>Design challenges for entity linking</article-title>
          ,
          <source>Transactions of the ACL 3</source>
          (
          <year>2015</year>
          )
          <fpage>315</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Milne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          ,
          <article-title>Learning to link with Wikipedia</article-title>
          ,
          <source>in: 17th ACM conference on Information and knowledge management</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>509</fpage>
          -
          <lpage>518</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nadeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sekine</surname>
          </string-name>
          ,
          <article-title>A survey of named entity recognition and classification</article-title>
          ,
          <source>Lingvisticae Investigationes</source>
          <volume>30</volume>
          (
          <year>2007</year>
          )
          <fpage>3</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Collobert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karlen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kuksa</surname>
          </string-name>
          ,
          <article-title>Natural language processing (almost) from scratch</article-title>
          ,
          <source>Journal of machine learning research 12</source>
          (
          <year>2011</year>
          )
          <fpage>2493</fpage>
          -
          <lpage>2537</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ballesteros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kawakami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyer</surname>
          </string-name>
          ,
          <article-title>Neural architectures for named entity recognition</article-title>
          ,
          <source>in: 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>260</fpage>
          -
          <lpage>270</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>