<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SectionLinks: Mapping Orphan Wikidata Entities onto Wikipedia Sections</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Natalia Ostapuk</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Djellel Difallah</string-name>
          <email>djellel@nyu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philippe Cudré-Mauroux</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>New York University</institution>
          ,
          <addr-line>New York</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Fribourg</institution>
          ,
          <addr-line>Fribourg</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Wikidata is a key resource for the provisioning of structured data on several Wikimedia projects, including Wikipedia. By design, all Wikipedia articles are linked to Wikidata entities; such mappings represent a substantial source of both semantic and structural information. However, only a small subgraph of Wikidata is mapped in that way: only about 10% of Wikidata entities have a sitelink to English Wikipedia, for example. In this paper, we describe a resource we have built and published to extend this subgraph and add more links between Wikidata and Wikipedia. We start from the assumption that a number of Wikidata entities can be mapped onto Wikipedia sections, in addition to Wikipedia articles. The resource we put forward contains tens of thousands of such mappings, hence considerably enriching the highly structured Wikidata graph with encyclopedic knowledge from Wikipedia.</p>
      </abstract>
      <kwd-group>
        <kwd>Wikidata</kwd>
        <kwd>Wikipedia</kwd>
        <kwd>Linked Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Knowledge Graphs (KGs) provide a rich, structured, and multilingual source
of information useful for a variety of applications that require
machine-readable data. KGs are leveraged in search engines, natural language
understanding, and virtual assistants, to name but a few examples. A KG is
usually represented as a graph of vertices denoting entities and connected with
directed edges depicting their relationships. KGs can be constructed
automatically using information extraction techniques, or semi-automatically, as
is the case with Wikidata (https://www.wikidata.org/), a KG built and maintained by a community of
volunteers. Wikidata has the advantage of being curated by humans and of
being tightly integrated with multiple Wikimedia projects (e.g., Wikipedia,
Wikimedia Commons, and Wiktionary). For example, every Wikipedia article</p>
      <sec id="sec-1-1">
        <p>across all languages has a corresponding and unique language-independent
Wikidata entity. This mapping between Wikipedia and Wikidata is beneficial
for both projects. On one hand, it facilitates information extraction and
standardization of Wikipedia articles across languages, which can benefit from
the standard structure and values of their Wikidata counterpart, e.g., for
populating infoboxes. On the other hand, Wikipedia articles are routinely
updated, which in turn keeps Wikidata fresh and useful for online
applications.</p>
        <p>However, the Wikipedia editorial guidelines require that an entity be
notable or worthy of notice to be added to the encyclopedia, which is not the
case for Wikidata. Hence, only a fraction of Wikidata entities has a
corresponding article in any language. We refer to the remaining entities, without
an article, as orphans. In the absence of a textual counterpart, orphans often
suffer from incompleteness and lack of maintenance.</p>
        <p>Our present work stems from the observation that a substantial number
of orphan entities are indeed available in Wikipedia, but not at the page
level; orphan entities can be described within existing Wikipedia articles in
the form of sections, subsections, and paragraphs of a more generic concept
or fact. Interestingly, even a short section describing an orphan Wikidata
entity can carry useful information that could enrich the entity with additional
facts and relationships. Such pieces of information are unfortunately buried
inside long articles without direct relevance to the main subject. Instead,
we propose to establish a fine-grained mapping between Wikidata orphan
entities and Wikipedia (sub)-sections.</p>
        <p>Our main contribution is a dataset of such mappings between Wikidata
and Wikipedia sections that we created using several algorithmic methods,
ranging from string matching to graph inference.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>To the best of our knowledge, we are the first to propose a resource
providing fine-grained mappings between Wikipedia and Wikidata; our
mappings come in addition to the existing links that Wikipedia provides to
Wikidata through section anchors (see Section 3).</p>
      <p>
        A similar effort of matching entities to Wikipedia articles was made by
Tonon et al. in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The paper addresses the problem of constructing a
knowledge graph of Web Entities and mapping it onto DBpedia, with Wikipedia
articles acting as DBpedia entries.
      </p>
      <p>
        Our effort is not directly related to link prediction [
        <xref ref-type="bibr" rid="ref10 ref12">10,12</xref>
        ], which typically
operates in a homogeneous domain (e.g., when trying to infer new links in
a given social network or knowledge graph), while we operate across two
heterogeneous domains (i.e., Wikidata and Wikipedia). It is however related
to Ad-hoc Object Retrieval techniques [
        <xref ref-type="bibr" rid="ref13 ref17">13,17</xref>
        ], which retrieve target entities
based on keyword or natural language queries, as well as to Entity Linking
[
        <xref ref-type="bibr" rid="ref1 ref11 ref16 ref4">16,4,1,11</xref>
        ], which attempts to link mentions in Web text to their referent
entities in a knowledge base.
      </p>
      <p>
        A special case of Entity Linking is Wikipedia Linking, which aims at
discovering links between Wikipedia documents. This task was broadly studied
within the Wiki track of the INEX conference (International Workshop of the Initiative for the Evaluation of XML Retrieval) [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">7,5,6</xref>
        ]. Participants were invited
to establish links between Wikipedia articles both at the page and text level
(i.e. detect an anchor point in the text of the source document and a best
entry point in the text of the target). The task of linking documents at the
text level is of particular interest to us as it is a general case of linking a
document to a section and closely relates to the main topic of this paper. A
number of interesting approaches were developed both for identifying link
source and target pages [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and detecting the best entry point inside the
text of the target [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Our work is also directly related to information extraction [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and KG
construction [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] efforts. In that context, a number of systems have
recently been proposed to extract information, often in the form of triples,
from structured or unstructured content and link it to a semi-structured
representation like a knowledge graph. DeepDive [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], for instance, is a
well-known tool that employs statistical learning, inference and declarative
constructs written by the user to build a knowledge base from a large
collection of text. FRED [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a machine-reading tool that automatically
generates RDF/OWL ontologies and linked data from multilingual natural
language text. MapSDI [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is a recent rule-based mapping framework for
integrating heterogeneous data into knowledge graphs. It takes as input a set of
data sources and mapping rules to produce a consolidated knowledge graph.
None of those tools is readily applicable to our problem of linking Wikipedia
sections to Wikidata entities, however.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Relevance and Use Cases</title>
      <p>In Wikidata, entities are characterized by a unique identifier (a sequential
integer prefixed with Q), multilingual labels, descriptions, and aliases when
available. Each entity may have multiple statements to express a property or
a relationship with another entity. An entity can have Sitelinks (https://www.wikidata.org/wiki/Help:Sitelinks) referencing
other Wikimedia projects. These are hyperlinks that establish an identity
mapping between the entity and, for instance, a Wikipedia page. Thanks to
Sitelinks, Wikidata is often utilized as a hub for multilingual data, connecting
a given concept to articles written in a dozen languages.</p>
      <p>To understand Wikidata’s Sitelinks coverage, we collected the number of
labelled entities per language. We focus on the 15 languages having over 1 million</p>
      <p>[Figure 1: Number of orphan entities vs. number of Wikipedia articles (counts in millions, log scale) for English, Dutch, Spanish, French, German, Italian, Russian, Swedish, Ukrainian, Chinese, Portuguese, Polish, Vietnamese, Japanese, and Arabic.]</p>
      <p>Wikipedia articles (see Section 4). We examined the number of orphan
entities (defined above in Section 1) having a label in each language, as shown
in Figure 1, which we contrast to the number of available Wikipedia
articles. We see that the gap between the number of orphans and articles is
much larger for languages having more labels. In fact, English Wikipedia,
the largest and most active project of all Wikis, links to only about 10% of
all Wikidata entities having an English label. This discrepancy signals a
necessity to close this gap using alternate methods.</p>
      <p>This work aims to identify potential orphan entity textual content that
may exist within Wikipedia in the form of sections. This content can be linked
using anchor links to article sections. Currently, Wikidata does not support
using anchor links as Sitelinks, i.e., linking to a specific section of a page.</p>
      <p>[Figure 3: The Wikipedia article on Brexit, linked to the Wikidata entity Brexit (Q7888194), with the orphan entities European Union (Withdrawal) Act 2018 (Q29582790) and exit day (Q59189602) mapped onto its sections.]</p>
      <p>It is worth noting that Wikipedia’s inter-language links can perform this operation:
for example, the Wikidata entity Q2915096 contains a Sitelink to the English
Wikipedia page Survival_function, and all the other Wikidata sitelinks are
listed on this page in the left column (Figure 2). A link to a section can be
added to this list and thus can be mapped to the source Wikidata entity, as is
the case for the French language. Unfortunately, this is done inconsistently
and provides only an indirect mapping to Wikidata, and also assumes that
at least one language has a dedicated Wikipedia page for the entity. Our
proposed resource fills this important gap by building an external resource
to map Wikidata orphans to Wikipedia, without entering a sitelink.</p>
      <p>Figure 3 illustrates what we want to achieve. It depicts the Wikipedia
entry for Brexit. While the Brexit entity (Q7888194) from Wikidata
correctly links to that page, two related Wikidata entities are orphans:
European Union (Withdrawal) Act 2018 (Q29582790) and exit day (Q59189602).
Linking those two entities to their corresponding sections in Wikipedia, as
shown in the figure, would provide important information and context to
Wikidata and greatly improve a number of key downstream tasks such as
ad-hoc object retrieval, joint embeddings, or question answering.</p>
    </sec>
    <sec id="sec-4">
      <title>4 The Dataset</title>
      <p>We developed two different algorithms to derive mappings from Wikidata
entities to Wikipedia sections. We ran both our algorithms on 15 languages
and obtained tens of thousands of new links in the process (see Section 4.2
for details). The two resulting datasets complement each other (i.e., they contain
sitelinks for different sets of entities) and are both available as part of our
resource. The rest of this section describes our methods and results in
detail, and provides performance numbers and illustrative examples to better
assess the usefulness of our resource.</p>
      <sec id="sec-4-1">
        <title>4.1 Data Generation Pipeline</title>
        <p>We consider a bipartite graph G whose vertex set consists of two disjoint
subsets: D, representing Wikidata entities that are missing a Wikipedia link, and
P, representing Wikipedia page sections. Our goal is to correctly match as
many vertices as possible from D to P (i.e., to create as many correct links
as possible between Wikidata entities and Wikipedia sections). To help with
this task, we use existing labels and statements available from each entity,
as well as the section titles that we collect from the 15 Wikis. We proceed in
four steps:
Candidates selection: the first step is to identify candidates, both from
Wikidata (D vertices) and Wikipedia (P vertices), in order to create the
matching graph G.</p>
        <p>Key generation: then, we create a key (or a set of keys) to represent each
vertex in D and P.</p>
        <p>Matching: at this stage, we create candidate links by matching
keys in D with keys in P.</p>
        <p>Filtering: finally, as the matching step may result in many false positive
links, we consider a postprocessing step where each resulting link is
vetted against a set of rules or conditions.</p>
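<p>The four steps above can be sketched as a generic pipeline skeleton. This is an illustrative reconstruction (the function names and signatures are ours, not the authors'), parameterized by the concrete functions that each instantiation plugs in:</p>

```python
def run_pipeline(select_candidates, entity_keys, section_keys, match, keep):
    """Illustrative four-step skeleton: candidate selection, key
    generation, matching, and filtering."""
    entities, sections = select_candidates()        # step 1: D and P vertices
    e_keys = {e: entity_keys(e) for e in entities}  # step 2: keys for D
    s_keys = {s: section_keys(s) for s in sections} # step 2: keys for P
    links = match(e_keys, s_keys)                   # step 3: candidate links
    return [lnk for lnk in links if keep(lnk)]      # step 4: vet each link
```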
        <p>We describe two different instantiations of our data generation pipeline
below: one considering a strict all-to-all matching algorithm, and a second,
graph-based algorithm that takes into account the neighborhood of each
candidate.</p>
      </sec>
      <sec id="sec-4-2">
        <title>All-to-All Matching Algorithm</title>
        <p>Our first approach considers a complete
sections in P. Since we do not apply any restriction on the candidate targets
(Wikipedia pages) and since the number of matches grows quadratically,
the key comparison method and the filtering functions both have to be very
strict, otherwise the algorithm would return a lot of false positive matches.
We achieve this with the requirement that a Wikipedia key has to comprise
all tokens from both the page title and the section title; as such, a Wikipedia
key is specific enough to guarantee with a high probability that it refers to
the same object as a corresponding Wikidata entity.</p>
        <p>Candidates selection First, we identify all orphan Wikidata entities, i.e. all
entities that have a label in a given language but do not have a sitelink to
a corresponding Wikipedia page or section. Orphans are further filtered by
type to exclude service pages like categories or templates, as well as some
types which have homonymous labels but rarely match any Wikipedia
section (for example, an entity of the type painting with the label The
Crucifixion matches the Wikipedia section describing the crucifixion of Jesus, which is
irrelevant to this object).</p>
        <p>Key generation We consider a set of keys for each Wikidata candidate in D.
This set of keys consists of its label and all its aliases for a given language.
For example, for the entity Q63854053 the set of keys will be {“spun silk”,
“noil”, “silk noil”}. To generate keys for Wikipedia page sections in P, we
concatenate the page title with the section title. After all keys are generated, we
split each key into tokens, remove punctuation and stop-words, sort tokens
in alphabetic order and concatenate them back together. We used the stop-word
lists provided by the NLTK package (https://www.nltk.org) in this context.</p>
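<p>The key normalization described above (tokenize, drop punctuation and stop words, sort alphabetically, re-concatenate) can be sketched as follows. The helper names and the tiny inline stop-word list are illustrative; the authors use the NLTK lists:</p>

```python
import re

# Small illustrative stop-word list; the authors use NLTK's lists instead.
STOP_WORDS = {"the", "a", "an", "of", "and", "in"}

def normalize_key(text):
    """Turn a label or title into a canonical matching key:
    lowercase, drop punctuation and stop words, sort tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = sorted(t for t in tokens if t not in STOP_WORDS)
    return " ".join(kept)

def wikidata_keys(label, aliases):
    # One key per label/alias, e.g. {"spun silk", "noil", "silk noil"}.
    return {normalize_key(s) for s in [label] + list(aliases)}

def wikipedia_key(page_title, section_title):
    # Concatenate the page title and section title before normalizing.
    return normalize_key(page_title + " " + section_title)
```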
        <p>Matching The output of the key generation step consists of two key-value
tables: one for Wikidata entities, where the keys are as described above
and each value is an entity id, and another for Wikipedia sections, with
(page_title section_title) pairs as values. These two tables are then joined by
key and grouped by QID (Wikidata entity id). This operation was performed
on a Hadoop cluster.</p>
        <p>Filtering The last step of the pipeline is result filtering. As mentioned above,
this approach considers all possible matches and hence may bring up a lot of
false positives; therefore, the filtering function we use is also strict: we keep
only those QIDs for which exactly one Wikipedia section was found. In more
formal terms, this step of the algorithm checks the output of the groupBy
operation and filters out records which grouped more than one value per
QID. Figure 4 outlines the overall pipeline of our first approach.</p>
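<p>Assuming keys normalized as described, the join-and-groupBy step (run by the authors on a Hadoop cluster) reduces to the following single-machine sketch; the function and variable names are illustrative:</p>

```python
from collections import defaultdict

def match_all_to_all(entity_keys, section_keys):
    """entity_keys: {key: qid}; section_keys: {key: (page, section)}.
    Join both tables on key, group matches by QID, and keep only
    QIDs for which exactly one Wikipedia section was found."""
    by_qid = defaultdict(set)
    for key, qid in entity_keys.items():
        if key in section_keys:
            by_qid[qid].add(section_keys[key])
    # Strict filtering: ambiguous QIDs are dropped as likely false positives.
    return {qid: next(iter(secs))
            for qid, secs in by_qid.items() if len(secs) == 1}
```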
        <sec id="sec-4-2-1">
          <title>Neighbors Matching Algorithm</title>
          <p>Although the above algorithm
demonstrates good performance (over 80% precision for English Wikipedia), a
manual analysis reveals that it has a relatively low recall. The reason is that
in many cases, when a Wikipedia section describes an object, its title is
self-sufficient, i.e. it stands on its own and does not depend on the page title to
identify the object. Hence, matching a Wikidata label strictly with a
combination of a page and section title results in a lot of false negatives, as the page
title introduces redundancy. On the other hand, matching with section titles
only would significantly drop the precision in general. To tackle this problem,
we restrict the set of candidate Wikipedia sections for each Wikidata entity
by leveraging the Wikidata graph structure.</p>
          <p>Candidates selection Our candidate selection algorithm in this case is based
on the assumption that a Wikipedia page that is “semantically” related (e.g.,
through a subclass relation) to a Wikidata entity in D is more likely to contain a
section relevant to that entity.</p>
          <p>We introduce a second condition to further restrict the candidates as
follows: a candidate in P should be related to one and only one source entity
in D for a particular edge type. For example, consider an orphan entity
badminton racket and a triple [(badminton), (uses), (badminton racket)]. Here,
(badminton) is a good candidate, because it is linked to (badminton racket)
with the relation [uses]. On the other hand, in the triple [(Sofia Shinas),
(occupation), (singer)], (singer) is not an interesting candidate for (Sofia
Shinas), as many entities have the occupation singer.</p>
          <p>As such, we developed the following algorithm pipeline for selecting
candidate Wikipedia sections using a graph-based approach:
– Identify an orphan Wikidata entity;
– Collect its neighbor entities following all incoming and outgoing edges;
– Filter out neighbors that do not have a Wikipedia sitelink;
– Filter out neighbors with non-unique edges;
– Extract Wikipedia sitelinks from the remaining neighbors;
– Consider sections of the resulting Wikipedia pages as candidates for
matching.</p>
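<p>The candidate-selection steps above can be sketched as follows. The triple representation and function names are illustrative assumptions, and the "unique edge" test is a simplified reading of the condition described in the text:</p>

```python
from collections import defaultdict

def candidate_pages(orphan_qid, triples, sitelinks):
    """Graph-based candidate selection sketch. triples: list of
    (subject, predicate, object) QID triples; sitelinks: {qid: page_title}
    for entities that have a Wikipedia page. A neighbor is kept only if
    it is connected through an edge of a type it shares with no other
    source entity (e.g. occupation 'singer' is shared by many)."""
    # Count the distinct entities reaching each (neighbor, predicate) pair.
    fan_in = defaultdict(set)
    for s, p, o in triples:
        fan_in[(o, p)].add(s)
        fan_in[(s, p)].add(o)
    candidates = set()
    for s, p, o in triples:
        if s == orphan_qid:
            neighbor = o
        elif o == orphan_qid:
            neighbor = s
        else:
            continue
        # Keep neighbors with a sitelink and a unique edge of this type.
        if neighbor in sitelinks and len(fan_in[(neighbor, p)]) == 1:
            candidates.add(sitelinks[neighbor])
    return candidates
```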
          <p>This algorithm yields excellent results in practice, as we describe below
in Section 4.3.</p>
          <p>Key generation As we significantly limited the set of candidate Wikipedia
sections in P, we consider a different way of constructing the keys. First, we
do not always consider the tokens from the page title for the keys in P
(although they may be included). Second, in addition to removing stop words
and punctuation, we consider a third postprocessing step by stemming the
key tokens, i.e. we remove all affixes that mostly carry morphological
information and keep only their root (e.g. the words works, worked, and working
are all reduced to work). Finally, we remove disambiguation tokens from
Wikipedia page titles: when a title is ambiguous, a disambiguation word or
phrase can be added in parenthesis. For example, titles Mercury (element),
Mercury (planet) and Mercury (mythology) are all reduced to Mercury.
Matching The matching step is similar to the one in our first algorithm, but
instead of running a join of two tables we process each Wikidata entity in D
individually. If one of the Wikidata keys exactly matches a Wikipedia section
key, we consider the section as a potential sitelink for this entity.
Filtering Due to the various manipulations we consider on the keys, we may
end up with situations where different Wikipedia sections have the same
key that matches a Wikidata key. For example, the article Rotterdam Metro
includes two sections: Line D and Lines. After stemming and stop words
removal, both section titles are reduced to line. If we consider a Wikidata
entity with a label Line D (which is also reduced to line), we get two potential
matches. In that case, we consider the edit distance between the Wikidata
label and the section title (in their original forms) and pick the closest match
to break the tie.</p>
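<p>A minimal sketch of the additional key post-processing and the edit-distance tie-break follows. The crude suffix-stripping stemmer stands in for a real stemmer (the paper does not name one), and all helper names are ours:</p>

```python
import re

def strip_disambiguation(page_title):
    # "Mercury (planet)" -> "Mercury"
    return re.sub(r"\s*\([^)]*\)\s*$", "", page_title)

def stem(token):
    # Crude suffix stripping as a stand-in for a real stemmer:
    # works, worked, working -> work.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def break_tie(wikidata_label, section_titles):
    # When several sections collapse to the same stemmed key, pick the
    # section whose original title is closest to the original label.
    return min(section_titles,
               key=lambda t: levenshtein(wikidata_label.lower(), t.lower()))
```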
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.2 Resource Description</title>
        <p>We ran our methods using the dumps of 15 Wikipedias from April
2020 (https://dumps.wikimedia.org/), while the Wikidata graph dump was from February 2020 (https://dumps.wikimedia.org/wikidatawiki/).</p>
        <p>The final resource contains 126,151 sitelinks for 109,734 unique entities
for 15 languages obtained with the two methods described above. The subset
of languages we initially considered were chosen according to the following
criteria:
1. Number of articles in the corresponding Wikipedia (over 1 million, according to https://meta.wikimedia.org/wiki/List_of_Wikipedias)
2. Number of Wikipedia active users (over 1000)</p>
        <p>We plan to run our algorithms on more languages in the future. We report
below the full list of languages we considered as well as detailed statistics
on the datasets (see Table 1).</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.3 Evaluation Results</title>
        <p>To estimate the precision of our resource, we randomly sampled several
hundred matches from each dataset and manually evaluated them as either
true or false. For instance, one algorithm matched the entity Q49001814</p>
        <sec id="sec-4-4-3">
          <p>(Timber Dam, the name of the dam in Montana, USA) to the section Timber
dams of the Wikipedia page Dam, which describes a type of dam made of
timber. This match was labelled as a false positive. An example of a true
positive match is a mapping of the entity Q334415 (security camera) onto
the page Surveillance, section Cameras. We labelled each sample this way
and then divided the number of true positive matches by the sample size to
get a precision value. We then generalized from the sample observations to
the whole dataset using linear extrapolation in order to estimate the dataset
precision. Table 2 reports our results.</p>
          <p>We evaluated 12 samples – one sample per algorithm plus the joint
results, for 4 different languages (Arabic, English, French, Russian). Each
sample contains around 200 mappings. This number was chosen empirically, as
we observed that 200 random examples were enough to stabilize the metric,
and increasing the sample size did not change the resulting value
significantly. Overall we manually labelled 2400 mappings.</p>
          <p>Our evaluation aims to demonstrate that the overall accuracy of the
resource is high enough that it can be used for many tasks that do not require
a perfect dataset (for example, most deep learning algorithms are robust
to errors in the training set). Unfortunately, we cannot provide evaluation results for all
languages, as we chose to focus only on those languages that we
were comfortable evaluating.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Availability and Reusability</title>
      <p>Our resource is available in JSON and RDF formats and complies with the
Wikibase data model (https://www.mediawiki.org/wiki/Wikibase/DataModel). To keep the resource compact and as easy to process
as possible, we only publish the sitelinks discovered using our methods.</p>
      <p>In the JSON representation, an entity contains two fields: id (the unique
identifier of an entity) and sitelinks (links to Wikipedia pages). Each sitelink
record comprises three fields: site, title and url. A section title is appended
to the page title separated with # symbol. Such a compound title is then
URL-encoded and added to the URL path. Following the Wikidata guidelines,
each entity is encoded as a single line.</p>
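<p>Based on the description above, one sitelink record could be assembled as in this sketch; the exact field layout and URL scheme of the published dump should be checked against the resource itself, and the function name is ours:</p>

```python
import json
from urllib.parse import quote

def make_entity_record(qid, lang, page_title, section_title):
    """Build one JSON line in the style described above: an id plus
    sitelinks, with the section title appended to the page title
    after a '#' and the compound title URL-encoded into the link."""
    title = page_title + "#" + section_title
    url = ("https://" + lang + ".wikipedia.org/wiki/"
           + quote(page_title.replace(" ", "_"))
           + "#" + quote(section_title.replace(" ", "_")))
    sitelink = {"site": lang + "wiki", "title": title, "url": url}
    # One entity per line, following the Wikidata dump convention.
    return json.dumps({"id": qid, "sitelinks": {lang + "wiki": sitelink}})
```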
      <p>The RDF dump is serialized using the Turtle format and stores nodes
describing Wikipedia links. Section titles are added in the same manner as
described above (for a detailed description of the Wikidata RDF format, see https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format).</p>
      <p>The resource is published on the Zenodo platform under a CC BY 4.0
license (http://doi.org/10.5281/zenodo.3840622). The canonical citation is available on the Zenodo page. The source
code is also available on our GitHub repository (https://github.com/eXascaleInfolab/WikidataSectionLinks) to help maintain and generate
newer releases in the future.</p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion and Future Work</title>
      <p>We presented a dataset that extends Wikidata orphan entities with Sitelinks
referencing Wikipedia sections for the 15 most prominent languages in
Wikipedia. To generate this resource, we employed string matching and graph
processing methods that leverage multilingual labels and the graph
structure to find corresponding sections in Wikipedia. Since our methods use
heuristics, we computed the accuracy of a subset of the data using manual
judgment. This information can help inform downstream
applications on how to use the data. For instance, for entities with an English
label, we identified 9,834 links with 82% accuracy when using exact label
matching, and 25,469 links with 81% accuracy when using the graph-based
method alone.</p>
      <p>We believe that using this resource can improve both sources in terms
of completeness and freshness, as well as diminish the information gap that
persists between Wikipedia-based entities and tail-entities. For example, one
could build targeted information extraction tools and automatically curate
entities that do not have a dedicated Wikipedia article using our resource.
As future work, we plan to incorporate embedding-based similarity scores
into our mapping method and perform a comprehensive evaluation of the
obtained results in terms of both precision and recall. We also envision building
a section recommendation system that can be offered to Wikidata editors for
relevance judgment.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This project has received funding from the European Research Council (ERC)
under the European Union’s Horizon 2020 research and innovation
programme (grant agreement 683253/GraphInt).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Demartini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Difallah</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : Zencrowd:
          <article-title>Leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking</article-title>
          .
          <source>In: Proceedings of the 21st International Conference on World Wide Web</source>
          . p.
          <fpage>469</fpage>
          -
          <lpage>478</lpage>
          . WWW '
          <volume>12</volume>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA (
          <year>2012</year>
          ). https://doi.org/10.1145/2187836.2187900, https://doi.org/10.1145/ 2187836.2187900
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Presutti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Recupero</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nuzzolese</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Draicchio</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mongiovì</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Semantic web machine reading with FRED</article-title>
          .
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <issue>6</issue>
          ),
          <fpage>873</fpage>
          -
          <lpage>893</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.3233/SW-160240, https://doi.org/ 10.3233/SW-160240
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Geva</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trotman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>L.X.</given-names>
          </string-name>
          :
          <article-title>Link discovery in the wikipedia</article-title>
          .
          <source>Shlomo Geva</source>
          , Jaap Kamps, Andrew Trotman p.
          <volume>326</volume>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Collective entity linking in web text: A graph-based method</article-title>
          . In:
          <source>Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . pp.
          <fpage>765</fpage>
          -
          <lpage>774</lpage>
          . SIGIR '11, Association for Computing Machinery, New York, NY, USA (
          <year>2011</year>
          ). https://doi.org/10.1145/2009916.2010019
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>D.W.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geva</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trotman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Overview of the INEX 2008 link the wiki track</article-title>
          . In: Geva, S., Kamps, J., Trotman, A. (eds.)
          <source>Advances in Focused Retrieval, 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, Dagstuhl Castle, Germany, December 15-18, 2008. Revised and Selected Papers. Lecture Notes in Computer Science</source>
          , vol.
          <volume>5631</volume>
          , pp.
          <fpage>314</fpage>
          -
          <lpage>325</lpage>
          . Springer (
          <year>2008</year>
          ). https://doi.org/10.1007/978-3-642-03761-0_32
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>D.W.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geva</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trotman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Overview of the INEX 2009 link the wiki track</article-title>
          . In: Geva, S., Kamps, J., Trotman, A. (eds.)
          <source>Focused Retrieval and Evaluation, 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, Brisbane, Australia, December 7-9, 2009, Revised and Selected Papers. Lecture Notes in Computer Science</source>
          , vol.
          <volume>6203</volume>
          , pp.
          <fpage>312</fpage>
          -
          <lpage>323</lpage>
          . Springer (
          <year>2009</year>
          ). https://doi.org/10.1007/978-3-642-14556-8_31
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>D.W.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trotman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geva</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Overview of INEX 2007 link the wiki track</article-title>
          . In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds.)
          <source>Focused Access to XML Documents, 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, Dagstuhl Castle, Germany, December 17-19, 2007. Selected Papers. Lecture Notes in Computer Science</source>
          , vol.
          <volume>4862</volume>
          , pp.
          <fpage>373</fpage>
          -
          <lpage>387</lpage>
          . Springer (
          <year>2007</year>
          ). https://doi.org/10.1007/978-3-540-85902-4_32
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Itakura</surname>
            ,
            <given-names>K.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>C.L.A.</given-names>
          </string-name>
          :
          <article-title>University of Waterloo at INEX 2007: Adhoc and link-the-wiki tracks</article-title>
          . In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds.)
          <source>Focused Access to XML Documents, 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, Dagstuhl Castle, Germany, December 17-19, 2007. Selected Papers. Lecture Notes in Computer Science</source>
          , vol.
          <volume>4862</volume>
          , pp.
          <fpage>417</fpage>
          -
          <lpage>425</lpage>
          . Springer (
          <year>2007</year>
          ). https://doi.org/10.1007/978-3-540-85902-4_35
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Jozashoori</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>MapSDI: A scaled-up semantic data integration framework for knowledge graph creation</article-title>
          . In: Panetto, H., Debruyne, C., Hepp, M., Lewis, D., Ardagna, C.A., Meersman, R. (eds.)
          <source>On the Move to Meaningful Internet Systems: OTM 2019 Conferences - Confederated International Conferences: CoopIS, ODBASE, C&amp;TC 2019, Rhodes, Greece, October 21-25, 2019, Proceedings. Lecture Notes in Computer Science</source>
          , vol.
          <volume>11877</volume>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>75</lpage>
          . Springer (
          <year>2019</year>
          ). https://doi.org/10.1007/978-3-030-33246-4_4
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Liben-Nowell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>The link prediction problem for social networks</article-title>
          . In:
          <source>Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management</source>
          , New Orleans, Louisiana, USA, November 2-8, 2003. pp.
          <fpage>556</fpage>
          -
          <lpage>559</lpage>
          . ACM (
          <year>2003</year>
          ). https://doi.org/10.1145/956863.956972
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mausam</surname>
          </string-name>
          ,
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Entity linking at web scale</article-title>
          . In:
          <source>Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-Scale Knowledge Extraction</source>
          . pp.
          <fpage>84</fpage>
          -
          <lpage>88</lpage>
          . AKBC-WEKEX '12, Association for Computational Linguistics, USA (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Martínez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berzal</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talavera</surname>
            ,
            <given-names>J.C.C.</given-names>
          </string-name>
          :
          <article-title>A survey of link prediction in complex networks</article-title>
          .
          <source>ACM Comput. Surv.</source>
          <volume>49</volume>
          (
          <issue>4</issue>
          ),
          <fpage>69:1</fpage>
          -
          <lpage>69:33</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1145/3012704
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pound</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mika</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaragoza</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Ad-hoc object retrieval in the web of data</article-title>
          . In:
          <source>Proceedings of the 19th International Conference on World Wide Web</source>
          . pp.
          <fpage>771</fpage>
          -
          <lpage>780</lpage>
          . WWW '10, ACM, New York, NY, USA (
          <year>2010</year>
          ). https://doi.org/10.1145/1772690.1772769
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Qiao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhiguang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>Knowledge graph construction techniques</article-title>
          .
          <source>Journal of Computer Research and Development</source>
          <volume>53</volume>
          (
          <issue>3</issue>
          ),
          <fpage>582</fpage>
          -
          <lpage>600</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sarawagi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Information extraction</article-title>
          .
          <source>Foundations and Trends in Databases</source>
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <fpage>261</fpage>
          -
          <lpage>377</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Entity linking with a knowledge base: Issues, techniques, and solutions</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng.</source>
          <volume>27</volume>
          (
          <issue>2</issue>
          ),
          <fpage>443</fpage>
          -
          <lpage>460</lpage>
          (
          <year>2015</year>
          ). https://doi.org/10.1109/TKDE.2014.2327028
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Tonon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demartini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Combining inverted indices and structured search for ad-hoc object retrieval</article-title>
          .
          <source>In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . pp.
          <fpage>125</fpage>
          -
          <lpage>134</lpage>
          . SIGIR '12, ACM, New York, NY, USA (
          <year>2012</year>
          ). https://doi.org/10.1145/2348283.2348304
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Tonon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Felder</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Difallah</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>VoldemortKG: Mapping schema.org and Web Entities to Linked Open Data</article-title>
          . Springer International Publishing (
          <year>2016</year>
          ). https://doi.org/10.1007/978-3-319-46547-0_23, https://exascale.info/assets/pdf/voldemort.pdf, http://www.slideshare.net/eXascaleInfolab/voldemortkg-mapping-schemaorgand-web-entities-to-linked-open-data
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ré</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cafarella</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>DeepDive: declarative knowledge base construction</article-title>
          .
          <source>Commun. ACM</source>
          <volume>60</volume>
          (
          <issue>5</issue>
          ),
          <fpage>93</fpage>
          -
          <lpage>102</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1145/3060586
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>