<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting and Correcting Typing Errors in DBpedia</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Caminhas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Cones</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Natalie Hervieux</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denilson Barbosa</string-name>
          <email>denilsong@ualberta.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Alberta</institution>
          ,
          <addr-line>Edmonton, AB</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>DBpedia has long been one of the major hubs of the Linked Open Data ecosystem. It is built by a largely automated process that uses many extractors and manually curated mappings to read information from infoboxes on Wikipedia. Given the complexity of the task, it is not surprising that DBpedia contains different kinds of errors, ranging from mistakes in the source text to errors in the extractors themselves (or in the order in which they are applied). Of particular importance are typing errors, in which an entity is assigned a type from the DBpedia ontology to which it does not belong. These errors propagate very far, given the modern practice of relying on Knowledge Graphs (KGs) such as DBpedia for obtaining training data through distant supervision. We posit that a way to correct these errors is through a post factum analysis of the KG. Thus, we introduce and evaluate a KG refinement approach that uses binary classifiers relying on semantic embeddings of the entities to detect and remove incorrect type assignments. Our initial evaluation is done using a highly curated gold standard of 35 types from the DBpedia ontology and shows that the method is very promising.</p>
      </abstract>
      <kwd-group>
        <kwd>knowledge graphs</kwd>
        <kwd>entity embeddings</kwd>
        <kwd>DBpedia</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Knowledge Graphs (KGs) built from Web sources have been found effective for
end-user applications such as question answering (e.g., YAGO [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] used in the
IBM Watson System [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) and data interlinking in the Linked Open Data (LOD)
ecosystem. Moreover, such KGs are the primary source of training data for NLP
applications following the distant supervision paradigm. Many approaches exist
for building and maintaining KGs: they can be manually curated; collaboratively
edited, like Freebase [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Wikidata [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]; or automatically extracted, like
DBpedia [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Many companies have their own proprietary KGs, including Facebook,
Google, Microsoft, and Diffbot, to mention a few.
      </p>
      <p>Manually curated KGs tend to have high precision, often at the expense
of coverage, while KGs automatically derived from Web sources have a high
coverage but are susceptible to systematic errors. These errors may impact public
image when manifested in user-facing applications such as web or social media
search and can have far-reaching consequences as they propagate. Detecting and
fixing these errors depends on the processes used and the level of human involvement
in creating the KGs.</p>
      <p>
        Despite DBpedia's importance as a general use KG as well as its crucial
role for the LOD movement, about 12% of DBpedia triples have some quality
issues [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Most triples in DBpedia come from parsing the infoboxes of articles
in Wikipedia. In principle, the infoboxes should follow strict templates with a
list of attributes for the type of entity described in the article (e.g., person,
organization, etc.). However, adherence to templates and editorial practices is
hard to enforce, especially over time.
      </p>
      <p>
        Typing Errors. One particularly problematic error in DBpedia concerns
entity types. For example, at the time of writing, DBpedia says that the
entity dbr:Egypt is a dbo:MusicalArtist. Similarly, dbr:United Nations and
dbr:European Union are, among other things, also classified as a dbo:Country,
together with another 7,106 entities, which seems unreasonably high, even
accounting for entities that were historically identified as such. Table 1 shows
other examples of type inconsistencies that can be identified using disjointness
axioms [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Besides incorrect type assignments, DBpedia also suffers from the
problem of missing types for some entities. For example, 27% of the 30,935
entities classified as a dbo:University are not classified as a dbo:Organization.
      </p>
      <p>We note that these errors, although problematic, are the exception rather
than the norm in DBpedia. Moreover, we posit that, given the complexity and
inherently noisy processes through which Wikipedia and DBpedia are created,
the best way to correct these errors is through a post factum analysis of the
entity types, which is what we seek to accomplish.</p>
      <p>Throughout the paper, we use the customary dbr:, dbo:, and dbp: prefixes to
indicate resources (which are entities), ontological predicates (e.g., types), and
properties, respectively.</p>
      <p>
        Wikipedia states that the United Nations have 193 members, while there are 8 other
entities that are not members but are recognized as countries by at least one UN
member.
      </p>
      <p>
        Our Contribution. We propose the use of binary classifiers (one per type)
to predict the type(s) of DBpedia entities. The classifiers rely on two kinds
of semantic embeddings of the entities. From the Wikipedia text, we derive
word2vec-like embeddings, while from DBpedia itself, we use PCA [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to embed
the entities based on their ontological properties. We test our method using a
manually curated partial gold standard with 3876 entities of 35 different types
from the DBpedia ontology. Our experiments show that the approach
automatically finds errors and assigns types for DBpedia entities with
over 97% accuracy.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In the construction of a knowledge graph, there is a trade-off between coverage
and correctness. To address these problems, some effort has been made to refine
knowledge graphs. In contrast to knowledge graph creation methods,
refinement techniques assume the existence of a knowledge graph which can be
improved in a post-processing step by adding missing knowledge or identifying
and removing errors [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>One possible approach is to validate the knowledge graph manually using
human annotators. Besides being costly, this approach does not scale to large
knowledge graphs such as DBpedia. Because of this, most
researchers focus on developing automatic or semi-automatic solutions for
knowledge graph re nement.</p>
      <p>
        Many of the proposed solutions aim at finding erroneous relations (i.e., the
edges of the graph) between pairs of entities [
        <xref ref-type="bibr" rid="ref13 ref2 ref3 ref5 ref9">2, 3, 5, 9, 13</xref>
        ]. Meanwhile, other
works aim to find incorrect literal values, such as numbers and dates. Identifying
incorrect interlinks (links that connect entities representing the same concept in
different graphs) between knowledge graphs has also been attempted [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. A
comprehensive survey on knowledge graph refinement methods is presented by
Paulheim [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        To the best of our knowledge, Ma et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] made the first attempt at identifying
incorrect type assertions. They proposed using disjointness axioms to detect
inconsistencies. To create the axioms, they used association rule mining since
it allows for the discovery of implicit knowledge in massive data. The axioms
are learned from DBpedia and tested on DBpedia and Zhishi.me [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Although
this approach is, in fact, able to identify several inconsistencies, it has a few
limitations. First of all, the association rules are learned from DBpedia, which is
itself a noisy dataset. Thus, there will always be some wrong axioms. Secondly,
some entities on DBpedia are assigned to a single incorrect type. For example,
the only assigned type for dbr:Nail polish is dbo:Person, which is wrong.
However, since there are no other types associated with this entity, there is no
axiom capable of identifying this error, because each rule involves two classes.
      </p>
      <p>
        In this work, we introduce resource2vec embeddings, which are vectors,
similar to word embeddings [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], that represent entities on DBpedia. These
embeddings are used as features for a set of machine learning classifiers that detect whether
the type assigned to an entity is correct. The intuition behind this approach is
that embeddings of entities of the same type will be closer to one another in an
n-dimensional continuous vector space than embeddings of entities of different
types. For example, the similarity between two vectors for entities of the type
Country (e.g., dbr:Canada and dbr:United States) will be greater than the
similarity between a vector of a country and a university (e.g., dbr:Canada and
dbr:Stanford University).
      </p>
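      <p>This intuition can be illustrated with a minimal sketch using cosine similarity; the toy 3-dimensional vectors below are hypothetical stand-ins for the actual high-dimensional resource2vec embeddings:</p>

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two embedding vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy 3-dimensional "embeddings" (hypothetical values, for illustration only)
canada = np.array([0.9, 0.1, 0.2])
united_states = np.array([0.8, 0.2, 0.1])
stanford = np.array([0.1, 0.9, 0.3])

# entities of the same type should lie closer together in the vector space
assert cosine(canada, united_states) > cosine(canada, stanford)
```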
      <p>
        The usage of entity embedding for type detection on DBpedia was also
proposed by Zhou et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. One important difference between our work and
that of Zhou et al. resides in the creation of the embeddings: while
they use only Wikipedia to train their embeddings, ours are trained
using properties from both Wikipedia and DBpedia (as we explain in Section
3). Another important difference is in the dataset used for training and testing.
Zhou et al. use DBpedia itself to create the datasets: they query a public
DBpedia SPARQL endpoint to select, for each DBpedia type, entities as positive
examples of that type, while negative examples are chosen from a random selection of
instances of all the remaining types. We argue that this approach creates a
noisy dataset since, as we discussed, many entities on DBpedia have incorrectly
assigned types, and that is exactly the problem we are attempting to solve.
In this work, we use a manually curated partial gold standard for training and
testing.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <sec id="sec-3-1">
        <title>Representing DBpedia Entities</title>
        <p>
          Our approach consists of creating a semantic mapping of DBpedia resources,
which is used as features for a set of binary machine learning classifiers. For that,
we concatenate wikipedia2vec and DBpedia2vec embeddings. The wikipedia2vec
vectors are word2vec-like embeddings that represent Wikipedia entities. They are
created using Wikipedia2Vec [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], a tool that allows learning embeddings of words
and entities simultaneously, and places similar words and entities close to one
another in a continuous vector space.
        </p>
        <p>DBpedia2vec are embeddings that help to represent a DBpedia entity. They
are created using the entity's properties (i.e., predicates in the RDF tuples) on
DBpedia. Our intuition is that most entities of the same type share the same
properties. For example, countries usually have properties such as dbo:areaTotal,
dbo:capital, and dbo:largestCity, while people are more likely to have
properties like dbo:birthDate, dbo:birthPlace, and
dbo:nationality.</p>
        <p>
          To create DBpedia2vec, we compile a list of all distinct properties existing
in DBpedia (ignoring properties that are common across all types on DBpedia
ontology, such as dbo:wikiPageID, dbo:wikiPageWikiLink, dbo:abstract, and
dbo:sameAs). A one-hot encoding vector is created for the entity. Each dimension
of this vector represents one of the 3480 properties of DBpedia. Then, we apply
a probabilistic principal component analysis (PCA) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] to linearly reduce the
dimensionality of the embeddings using singular value decomposition (SVD) of the
data. In this way, we are able to project the 3480-dimensional embeddings onto a
lower-dimensional continuous space with n = 300 dimensions.
        </p>
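        <p>A minimal numpy sketch of this step, assuming a toy one-hot property matrix (plain SVD-based PCA is shown for simplicity, rather than the probabilistic variant used in the paper):</p>

```python
import numpy as np

rng = np.random.default_rng(42)
n_entities, n_props, target_dims = 60, 3480, 300

# hypothetical one-hot matrix: X[i, j] = 1 if entity i has property j
X = (rng.random((n_entities, n_props)) > 0.99).astype(float)

# PCA via SVD of the column-centered matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = min(target_dims, Vt.shape[0])  # rank is limited by this toy sample size
embeddings = Xc @ Vt[:k].T         # project onto the top-k principal axes

assert embeddings.shape == (n_entities, k)
```

With the full DBpedia entity set, k would reach the target of 300 dimensions; here the toy sample of 60 entities caps the rank.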
      </sec>
      <sec id="sec-3-2">
        <title>Identifying and correcting erroneous types</title>
        <p>The resource2vec embeddings are used as features by a binary classifier which
is trained to determine if the type assigned to a resource is correct. One classifier
is trained for each type, using resource2vec embeddings of resources that belong
to that type as positive examples and resource2vec embeddings of randomly
selected resources from all other types as negative examples.</p>
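        <p>The per-type training-set construction described above can be sketched as follows; the entity lists and the helper name are hypothetical, for illustration only:</p>

```python
import random

# hypothetical mapping from DBpedia type to entities that have embeddings
entities_by_type = {
    "Country": ["dbr:Canada", "dbr:France", "dbr:Japan"],
    "University": ["dbr:Stanford_University", "dbr:University_of_Alberta"],
    "Song": ["dbr:Hey_Jude", "dbr:Imagine_(song)"],
}

def build_training_set(target_type, seed=0):
    """Positives: entities of target_type. Negatives: an equally sized
    random sample drawn from entities of all other types."""
    rng = random.Random(seed)
    positives = entities_by_type[target_type]
    pool = [e for t, ents in entities_by_type.items()
            if t != target_type for e in ents]
    negatives = rng.sample(pool, min(len(positives), len(pool)))
    return [(e, 1) for e in positives] + [(e, 0) for e in negatives]
```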
        <p>
          This approach allows us to not only identify erroneous type assignments
but also to assign the correct type to any DBpedia resource for which the
resource2vec embedding is created, even if no type has been assigned yet on
DBpedia. We tested the classification using three algorithms: Naive Bayes, K-nearest
neighbours (K-NN), and nearest centroids, which represents each class (i.e., each
type) by its centroid and assigns the class of the nearest centroid to test
samples [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
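        <p>The nearest-centroid rule can be illustrated with a toy 2-dimensional sketch (scikit-learn's NearestCentroid implements the same idea; the vectors below are made-up stand-ins for resource2vec embeddings):</p>

```python
import numpy as np

def fit_centroids(examples):
    # examples: label -> array of embedding vectors for that label
    return {label: vecs.mean(axis=0) for label, vecs in examples.items()}

def predict(x, centroids):
    # assign the label of the nearest centroid (Euclidean distance)
    return min(centroids, key=lambda lbl: np.linalg.norm(x - centroids[lbl]))

# toy 2-d embeddings standing in for real resource2vec vectors
examples = {
    "Country": np.array([[1.0, 1.0], [1.2, 0.8]]),
    "not-Country": np.array([[-1.0, -1.0], [-0.9, -1.2]]),
}
centroids = fit_centroids(examples)
assert predict(np.array([0.9, 1.1]), centroids) == "Country"
```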
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Setup and Results</title>
      <p>The experiments were performed using resource2vec embeddings created by
concatenating 500-dimensional wikipedia2vec embeddings trained on a Wikipedia
dump extracted Feb. 2019 and 300-dimensional dbpedia2vec trained on the
2016 release of DBpedia. In an attempt to obtain high-quality embeddings, the
wikipedia2vec embeddings were trained with 10 iterations over the articles, a
window size of 10, and a minimum of 10 occurrences for words and 5
occurrences for entities.</p>
      <p>Gold Standard. To better evaluate our classifiers, we created a gold
standard encompassing the following 35 types from the DBpedia ontology:
Aircraft, Airline, Airport, Album, AmericanFootballPlayer, Animal, Automobile,
Bacteria, Bank, Book, Building, City, Country, Currency, Food, Galaxy,
HorseTrainer, Language, MilitaryConflict, Murderer, MusicalArtist,
MythologicalFigure, Planet, Plant, President, Software, Song, Sport, Swimmer, Theatre,
TimePeriod, Train, University, Volcano, and Weapon. We chose these types with the
goal of maximizing the diversity of entities while minimizing inter-type overlap
(which could potentially confuse our analysis and preliminary conclusions). If
the approach works well in this simplified setting, it may be worth scaling the
solution to consider all types on DBpedia.</p>
      <p>To build the gold standard, annotators were asked to use any resources at
their disposal (e.g., Wikipedia's own entity lists or categories) to find examples
of entities in each of the 35 types. In total, we selected 3876 entities. The number
of entities per type varied from 94 (for the types Sport and Software) to 112 (for
the type Aircraft). All of our testing and evaluation data can be downloaded
from https://bit.ly/2FcqQQW.</p>
      <p>
        The Need For Manual Annotations. An alternative to the manual
annotation and evaluation that we followed here would be to exploit an independent
KG (e.g., Google's knowledge graph) for the evaluation. In principle, such an
external KG could be used to correct typing errors on its own. We attempted
such an approach but ran into several difficulties. First, there is the issue of the
incompleteness of the external KG itself. We found that generally only high-level
types are assigned to entities in the Google KG: for instance, most entities of
type Aircraft in DBpedia are labeled simply as Thing in the Google KG.
Second, DBpedia interlinks to other KGs are wrong or missing up to 20% of the
time [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], so finding equivalent entities in different knowledge graphs is
a challenging task in itself. Finally, the ontologies of the KGs may be significantly
different; for example, we noticed that other KGs (e.g., YAGO) have significantly
more types than DBpedia.
      </p>
      <sec id="sec-4-1">
        <title>Comparing classifiers</title>
        <p>Our first experiment consisted of comparing popular binary classifiers suitable
for the task. We used 70% of the entities of the gold standard for training the
classifiers and the remaining 30% for testing them. Hyperparameter tuning for
k-NN was performed using 5-fold cross-validation. Table 2 shows the results. The
reported values are an average of 10 runs on different training/testing splits of
the gold standard. For each run, we averaged the precision, recall, F1-score, and
accuracy of the 35 binary classifiers. The Nearest Centroid approach leads to the
best classifiers, achieving more than 97% on all metrics, while
the Naive Bayes classifiers achieved the lowest performance.</p>
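        <p>The evaluation protocol, 10 random 70/30 splits with metrics averaged across runs, can be sketched as follows; the evaluate callback and the dummy metric are placeholders, not the authors' code:</p>

```python
import random

def split_70_30(items, seed):
    # shuffle deterministically, then take a 70/30 train/test split
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def averaged_metric(items, evaluate, runs=10):
    # evaluate(train, test) returns a score for one split; average over runs
    return sum(evaluate(*split_70_30(items, seed)) for seed in range(runs)) / runs

# sanity check with a dummy metric: the test fraction is always 30%
items = list(range(100))
frac = averaged_metric(items, lambda train, test: len(test) / len(items))
assert abs(frac - 0.3) < 1e-9
```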
      </sec>
      <sec id="sec-4-2">
        <title>Predicting types of unseen entities</title>
        <p>Motivated by the high accuracy of the binary classifiers, we tested whether the
proposed supervised approach could detect incorrect type assignments among
other entities of the 35 classes in our gold standard. For this, we used the best
performing algorithm (Nearest Centroid). In total, 364,791 entity-type pairs were
checked by the classifier: a positive classification confirmed the type prediction
while a negative classification disproved it. Human annotators verified the output
of the classifiers for a random sample of 3699 predictions. Table 3 shows the
results.</p>
        <p>Upon further inspection of the false negatives, we noticed some
discrepancies in the way entities are classified in DBpedia which were not reflected in the
way we created our gold standard. The most notable example concerns the class
dbo:Animal, for which most instances correspond to Wikipedia articles
describing a species (e.g., dbr:American black bear). In fact, all instances in our gold
standard correspond to species. In our test sample, however, we found many
individual racehorses also classified as dbo:Animal (e.g., dbr:Fusaichi Pegasus).
Not surprisingly, all such instance-type pairs were (correctly, in our opinion)
rejected by our classifier. To further illustrate our claim, we note that
racehorses have properties like dbo:honours, dbo:owner, dbo:sex, dbo:trainer,
and dbp:earnings, while most other instances with the dbo:Animal type have
properties such as dbo:family, dbo:genus, dbo:kingdom, dbo:order, dbo:phylum,
and dbo:conservationStatus.</p>
        <p>We found other similar cases involving other ontology types. In order to
more accurately evaluate the effectiveness of the classifier, we removed from our
analysis the following cases:
- Animals that are also an instance of type racehorse.
- Cities that are fictional or medieval cities.
- Automobiles that are buses or trucks.
- Songs that are rhymes, prayers, hymns, lullabies, or marches.
- Countries that are fictional countries, former sovereign states, or former
kingdoms.</p>
        <p>The manual inspection showed that the proposed approach has a very low
false positive rate of less than 5%, which is very encouraging. Moreover, the
method is correct about 75% of the times it claims a type assignment is wrong,
for a false negative rate below 25%. The performance of the classifier varies across
classes: for example, both the false positive and false negative rates for entities tagged
as dbo:HorseTrainer are 0%. On the other hand, the false positive rate for the
class dbo:President is 13%. That is probably because dbo:President is a more
generic class, which can include presidents of countries, universities, companies,
institutes, associations, councils, etc. This could be addressed by increasing the
diversity of entities in the training data.</p>
        <p>Some entities had multiple types.</p>
        <p>Since DBpedia entities are annotated with types from YAGO, we also attempted
to leverage these links for identifying and correcting typing errors in DBpedia.
However, these two ontologies cannot be easily aligned: we found 419,297 unique
objects with a YAGO prefix for which there was a predicate rdf:type associated
with a DBpedia entity. Thus, our attempt to use YAGO boiled down to the following: for
each DBpedia entity in our gold standard, we created a one-hot encoding vector
that represents the YAGO types assigned to that entity, and then applied PCA to
reduce the dimensionality of the one-hot encoding vectors. From those vectors,
we created binary classifiers as in Section 4.1. The results are shown in Table 4.</p>
        <p>Although the embeddings created using all YAGO types seem to carry a
strong signal, it is clear we need a way to filter out a large number of types
before we can obtain embeddings for entities other than those in the gold standard.
Furthermore, we observed several entities in the gold standard with identical sets
of YAGO types, which means that the classifiers may be overfitting, rendering
the numbers in Table 4 unreliable.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>This paper presented an effective approach for detecting erroneous type
assignments in a KG by leveraging an annotated corpus of text and the properties in
the KG itself. The assumption behind the method is that the input KG is of
sufficient quality that the initial type assignments are not random. Moreover,
our method can be applied post factum, without requiring any changes to the
already complex KG generation process. We evaluated the approach on a large
and carefully created gold standard, and obtained very encouraging results. We
used the best performing binary classifiers to verify the type assignments for
thousands of unseen entity-type pairs and found that the false positive rate of
the method is below 5%, while the false negative rate is below 25%. The method
also found some inconsistencies in the way types are assigned to entities.</p>
      <p>There are several interesting directions for future work. First, the method
described here is supervised, and although expanding our gold standard to
encompass the 537 classes in DBpedia certainly seems within reach of its
community, we seek to develop a fully unsupervised method for selecting representative
entities for each class to be used to derive the centroids. Another interesting idea
would be to perform a detailed assessment of the DBpedia ontology and try to
identify types that are too broad and should be split into multiple subtypes, and
types that are too specific and could be merged with others. Also, we believe it
would be interesting to evaluate our method for the task of identifying missing
types (as opposed to incorrect ones). Finally, although we tested on DBpedia
only, we believe our method could be easily adapted to find errors in other
knowledge graphs, provided one can find an annotated text corpus.</p>
      <p>We were not able to obtain embeddings for all entities among the 35 types, let
alone embeddings for all entities in DBpedia, due to the time complexity of the
PCA analysis and the size of the input matrix.</p>
      <sec id="sec-5-1">
        <title>Acknowledgements</title>
        <p>This work was done with the support of the Natural Sciences and Engineering
Research Council of Canada and gifts from NVIDIA and Diffbot Inc.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Kurt</given-names>
            <surname>Bollacker</surname>
          </string-name>
          , Colin Evans, Praveen Paritosh, Tim Sturge, and
          <string-name>
            <given-names>Jamie</given-names>
            <surname>Taylor</surname>
          </string-name>
          . Freebase:
          <article-title>a collaboratively created graph database for structuring human knowledge</article-title>
          .
          <source>In Proceedings of the 2008 ACM SIGMOD international conference on Management of data</source>
          , pages
          <volume>1247</volume>
          -
          <fpage>1250</fpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Carlson</surname>
          </string-name>
          , Justin Betteridge, Richard C Wang,
          <string-name>
            <surname>Estevam R Hruschka Jr</surname>
          </string-name>
          , and Tom M Mitchell.
          <article-title>Coupled semi-supervised learning for information extraction</article-title>
          .
          <source>In Proceedings of the third ACM international conference on Web search and data mining</source>
          , pages
          <volume>101</volume>
          -
          <fpage>110</fpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Xin</given-names>
            <surname>Dong</surname>
          </string-name>
          , Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang.
          <article-title>Knowledge vault: A webscale approach to probabilistic knowledge fusion</article-title>
          .
          <source>In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <volume>601</volume>
          -
          <fpage>610</fpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>David</given-names>
            <surname>Ferrucci</surname>
          </string-name>
          ,
          <string-name>
            <surname>Eric Brown</surname>
          </string-name>
          , Jennifer Chu-Carroll,
          <string-name>
            <given-names>James</given-names>
            <surname>Fan</surname>
          </string-name>
          , David Gondek, Aditya A Kalyanpur,
          <string-name>
            <given-names>Adam</given-names>
            <surname>Lally</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J William</given-names>
            <surname>Murdock</surname>
          </string-name>
          , Eric Nyberg, John Prager, et al.
          <article-title>Building Watson: An overview of the DeepQA project</article-title>
          .
          <source>AI magazine</source>
          ,
          <volume>31</volume>
          (
          <issue>3</issue>
          ):
          <volume>59</volume>
          -
          <fpage>79</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Jens</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , Daniel Gerber, Mohamed Morsey, and Axel-Cyrille Ngonga Ngomo.
          <article-title>DeFacto - deep fact validation</article-title>
          .
          <source>In International semantic web conference</source>
          , pages
          <volume>312</volume>
          -
          <fpage>327</fpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Jens</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N.
Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van Kleef, Soren Auer, et al.
          <article-title>DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <volume>167</volume>
          -
          <fpage>195</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Yanfang</given-names>
            <surname>Ma</surname>
          </string-name>
          , Huan Gao, Tianxing Wu, and
          <string-name>
            <given-names>Guilin</given-names>
            <surname>Qi</surname>
          </string-name>
          .
          <article-title>Learning disjointness axioms with association rule mining and its application to inconsistency detection of linked data</article-title>
          .
          <source>In Chinese Semantic Web and Web Science Conference</source>
          , pages
          <volume>29</volume>
          -
          <fpage>41</fpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <volume>3111</volume>
          -
          <fpage>3119</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Ndapandula</given-names>
            <surname>Nakashole</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Theobald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Scalable knowledge harvesting with high precision and high recall</article-title>
          .
          <source>In Proceedings of the fourth ACM international conference on Web search and data mining</source>
          , pages
          <fpage>227</fpage>
          –
          <lpage>236</lpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Xing</given-names>
            <surname>Niu</surname>
          </string-name>
          , Xinruo Sun, Haofen Wang, Shu Rong, Guilin Qi, and
          <string-name>
            <given-names>Yong</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>Zhishi.me – weaving Chinese linking open data</article-title>
          .
          <source>In International Semantic Web Conference</source>
          , pages
          <fpage>205</fpage>
          –
          <lpage>220</lpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>Identifying wrong links between datasets by multi-dimensional outlier detection</article-title>
          .
          <source>In WoDOOM</source>
          , pages
          <fpage>27</fpage>
          –
          <lpage>38</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>Knowledge graph refinement: A survey of approaches and evaluation methods</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>489</fpage>
          –
          <lpage>508</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Improving the quality of linked data using statistical distributions</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems (IJSWIS)</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ):
          <fpage>63</fpage>
          –
          <lpage>86</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Fabian M</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gjergji</given-names>
            <surname>Kasneci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Yago: a core of semantic knowledge</article-title>
          .
          <source>In Proceedings of the 16th international conference on World Wide Web</source>
          , pages
          <fpage>697</fpage>
          –
          <lpage>706</lpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Robert</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          , Trevor Hastie, Balasubramanian Narasimhan, and
          <string-name>
            <given-names>Gilbert</given-names>
            <surname>Chu</surname>
          </string-name>
          .
          <article-title>Diagnosis of multiple cancer types by shrunken centroids of gene expression</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>99</volume>
          (
          <issue>10</issue>
          ):
          <fpage>6567</fpage>
          –
          <lpage>6572</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Michael E</given-names>
            <surname>Tipping</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christopher M</given-names>
            <surname>Bishop</surname>
          </string-name>
          .
          <article-title>Probabilistic principal component analysis</article-title>
          .
          <source>Journal of the Royal Statistical Society: Series B (Statistical Methodology)</source>
          ,
          <volume>61</volume>
          (
          <issue>3</issue>
          ):
          <fpage>611</fpage>
          –
          <lpage>622</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Denny</given-names>
            <surname>Vrandecic</surname>
          </string-name>
          and
          <string-name>
            <given-names>Markus</given-names>
            <surname>Krotzsch</surname>
          </string-name>
          .
          <article-title>Wikidata: a free collaborative knowledge base</article-title>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Ikuya</given-names>
            <surname>Yamada</surname>
          </string-name>
          , Akari Asai, Hiroyuki Shindo, Hideaki Takeda, and Yoshiyasu Takefuji.
          <article-title>Wikipedia2Vec: An optimized tool for learning embeddings of words and entities from Wikipedia</article-title>
          .
          <source>arXiv preprint arXiv:1812.06280</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Amrapali</given-names>
            <surname>Zaveri</surname>
          </string-name>
          , Dimitris Kontokostas, Mohamed A Sherif, Lorenz Buhmann, Mohamed Morsey, Soren Auer, and Jens Lehmann.
          <article-title>User-driven quality evaluation of DBpedia</article-title>
          .
          <source>In Proceedings of the 9th International Conference on Semantic Systems</source>
          , pages
          <fpage>97</fpage>
          –
          <lpage>104</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Hanqing</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Amal Zouaq, and
          <string-name>
            <given-names>Diana</given-names>
            <surname>Inkpen</surname>
          </string-name>
          .
          <article-title>DBpedia entity type detection using entity embeddings and n-gram models</article-title>
          .
          <source>In International Conference on Knowledge Engineering and the Semantic Web</source>
          , pages
          <fpage>309</fpage>
          –
          <lpage>322</lpage>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>