<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Knowledge Graph Construction from Entity Co-occurrence</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Data and Web Science Group, University of Mannheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The construction of knowledge graphs from resources on the Web is a topic that has gained a lot of attention in recent years, especially with the rise of large-scale cross-domain knowledge graphs like DBpedia and YAGO. Their successful exploitation of Wikipedia's structural elements like infoboxes and categories suggests that there is still huge potential for approaches focused on structural elements of web documents. In this work, we present our research idea towards further exploitation of semi-structured data with a focus on entity co-occurrence. We want to explore the potential of co-occurrence patterns in varying contexts and test their generality when applying them to the Document Web. An overview of the state of the art is given and we show how our three-phased approach for the extraction of co-occurrence patterns fits in. Two planned experiments for the construction of knowledge graphs based on Wikipedia and the Document Web are sketched. Finally, potentials and limitations of the approach are discussed.</p>
      </abstract>
      <kwd-group>
<kwd>Knowledge Acquisition</kwd>
        <kwd>Knowledge Graph Construction</kwd>
        <kwd>Entity Co-occurrence</kwd>
        <kwd>Information Extraction</kwd>
        <kwd>Pattern Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
The Web is a vast source of structured and unstructured data. Unfortunately,
extracting knowledge from this pool of data is a non-trivial task. In recent
years, however, extraordinary progress has been made in extracting knowledge
from the Web and persisting it in a machine-readable format. Google coined the
term "Knowledge Graph" (KG) in 2012 to describe such stores of knowledge
that contain ontological information describing a certain domain as well as facts
describing the state of the world with respect to that domain.1 Many application
areas like question answering [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], entity disambiguation [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], and text
categorization [
        <xref ref-type="bibr" rid="ref9">9</xref>
] as well as concrete applications like search engines and AI assistants
profit heavily from the availability of domain-specific as well as cross-domain
KGs.
      </p>
      <p>
        Large-scale cross-domain KGs like DBpedia [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and YAGO [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] contain
millions of entities and several hundred million facts. Both of them rely on
Wikipedia as their prime source of knowledge (with YAGO additionally using
WordNet [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] to build its ontology). Another commonality is their exploitation of
Wikipedia's structural properties to extract high-quality information. While
DBpedia depends on infoboxes to gain information about articles, YAGO exploits
the category system to build a taxonomy of articles and discover relationships
between them. This is, of course, no coincidence: extracting
information from (semi-)structured sources yields results that are as yet unmatched by
approaches working on unstructured data only (YAGO reports accuracy values
of about 95% [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]). Being aware that only a small part of the information on the
Web is available in a (semi-)structured format, we believe that there is still huge
potential in exploiting the structuredness of data with the aim of constructing
large-scale KGs.

1 https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-thingsnot.html
      </p>
      <p>In this paper, we describe our research idea for the construction of a KG
using entity co-occurrence. In principle, we want to discover patterns in web
documents that indicate relationships between the included entities. Instead of
focusing on relationships between two entities at a time, however, the aim is to
identify patterns including groups of entities that are related twofold: they are
connected on the document surface (e.g. they appear in the same table column)
and have a semantic connection (e.g. they are both persons who live in Berlin).</p>
<p>We define the problem as follows: rsur(e1, e2) denotes a relation between
entities e1 and e2 that manifests in the surface of the underlying text corpus (i.e.
they are somehow connected on the document level) and rsem(e1, e2) describes
a semantic relationship between e1 and e2 (i.e. they have an arbitrary common
property). We denote the document corpus with D, entities in a document d ∈ D
with Ed, and entities used to extract a co-occurrence pattern p with Ep. A
pattern p is uniquely defined by specific relations rsur and rsem, and the source
document d. For every document d ∈ D we want to find a set of patterns Pd
with</p>
      <p>Pd = { p(d, rsur, rsem) | ∀ e1, e2 ∈ Ep : e1, e2 ∈ Ed ∧ rsur(e1, e2) ∧ rsem(e1, e2) }</p>
      <p>To be able to extract information from arbitrary documents, the extracted
document-level sets are fused into a document-independent set P.</p>
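<p>To make the definition concrete, the following minimal Python sketch (the helper names, toy entities, and both relations are our own illustrative assumptions, not part of the formalization) keeps a candidate pattern only if every pair of entities in its group satisfies both rsur and rsem:</p>

```python
# Minimal sketch of the pattern-set definition: a pattern is kept for
# document d if every entity pair it covers is connected both on the
# surface (r_sur) and semantically (r_sem).
from itertools import combinations

def pattern_set(doc_entities, candidate_patterns, r_sur, r_sem):
    """Return the ids of patterns whose entity groups satisfy both relations.

    doc_entities       -- set of entities E_d found in document d
    candidate_patterns -- list of (pattern_id, entity_group E_p) tuples
    r_sur, r_sem       -- predicates over entity pairs
    """
    kept = []
    for pattern_id, group in candidate_patterns:
        if not set(group).issubset(doc_entities):  # E_p must lie within E_d
            continue
        if all(r_sur(a, b) and r_sem(a, b)
               for a, b in combinations(group, 2)):
            kept.append(pattern_id)
    return kept

# Toy example: surface relation = same enumeration, semantic = same type.
enum_of = {"Clark Kent": 1, "Peter Parker": 1, "Berlin": 2}
type_of = {"Clark Kent": "journalist", "Peter Parker": "journalist", "Berlin": "city"}
r_sur = lambda a, b: enum_of.get(a) == enum_of.get(b)
r_sem = lambda a, b: type_of.get(a) == type_of.get(b)

docs = {"Clark Kent", "Peter Parker", "Berlin"}
cands = [("p1", ["Clark Kent", "Peter Parker"]), ("p2", ["Clark Kent", "Berlin"])]
print(pattern_set(docs, cands, r_sur, r_sem))  # ['p1']
```

<p>Only p1 survives: its entities share both the enumeration (rsur) and the type (rsem), while p2 fails on both counts.</p>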
<p>Based on our problem definition, we pose the following research questions:
RQ1: Is it possible to discover arbitrary entity co-occurrence patterns locally
(i.e. within a bounded context like Wikipedia) as well as globally (on the Web)?
RQ2: Can co-occurrence patterns be grouped into different types of patterns
and if so, how do these groups differ in their performance?
RQ3: How well can (groups of) co-occurrence patterns be generalized so that
they can be applied to arbitrary web documents?</p>
      <p>The remainder of this paper is organized as follows: In the following section
we describe the current state of the art. In Section 3 we elaborate on our research
idea based on two specific examples. Section 4 describes the research
methodology in detail and in Section 5 we sketch our planned experiments. Finally, we
conclude with a discussion of the general research idea in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>State of the Art</title>
      <p>
A large number of publications tackles the problem of KG construction [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
identify four groups of approaches that can be characterized by their choice of
data sources and ontology: (1) approaches that exploit the structuredness of
Wikipedia and either use a predefined ontology (like DBpedia [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]) or extract
their ontology from the underlying structured data (like YAGO [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]); (2) open
information extraction approaches that work without an ontology and extract
information from the whole web (e.g. [
        <xref ref-type="bibr" rid="ref6">6</xref>
]); (3) approaches that use a fixed
ontology and also target the whole web (e.g. KnowledgeVault [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], NELL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]); and
finally (4) approaches that target the whole web, but construct taxonomies (is-a
hierarchies) instead (e.g. [
        <xref ref-type="bibr" rid="ref12 ref24">12, 24</xref>
        ]).
      </p>
      <p>
        While inspired by approaches from (1), our research idea can best be
categorized into group (3) as the aim is to extract knowledge from the whole web and
use an existing ontology. Consequently, we will focus on approaches from those
groups in the remainder of this section. Besides DBpedia [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and YAGO [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], two
more Wikipedia-based approaches are WiBi [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and DBTax [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. While the authors
of WiBi generate a taxonomy by iteratively extracting hypernymy-relationships
from Wikipedia's article and category network, the authors of DBTax use an
unsupervised approach to scan the category tree of Wikipedia for prominent
nodes which by themselves already form a complete taxonomy. Inspired by an
analysis about list pages in Wikipedia from Paulheim and Ponzetto [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
strive to augment DBpedia by exploiting the fact that entities in Wikipedia's
list pages are all instances of a common concept. They use statistical methods
to discover the common type of a list page in order to assign it to the entities
which are lacking it. In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] the authors use Wikipedia's tables to extract
several million facts. By bootstrapping their approach with data from DBpedia,
they are able to apply machine learning in order to extract facts with a precision
of about 81.5%. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] exploit the structuredness of abstracts of Wikipedia pages
to extract facts related to the subject of the page and mentioned entities. They
use a supervised classification approach using only language-independent
features, like the position of the mentioned entity in the abstract or the types
of the mentioned entity.
      </p>
      <p>
        The never-ending language learner NELL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] cyclically crawls a corpus of one
billion web pages to continuously learn new facts by harvesting text patterns.
KnowledgeVault [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] gathers facts from web documents by extracting them from
text, HTML tables, the DOM tree, and schema.org annotations. To verify their
validity, they compare the extracted facts with existing knowledge in Freebase
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Most of their extractors are designed to discover relations between two
entities (e.g. for the extraction from the DOM tree the lexicalized path between two
entities is used as feature vector). Only for the extraction of facts from HTML
tables do they consider groups of entities, as relations in tables are usually
expressed between whole columns. Various other approaches use HTML tables
(e.g. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]) or structured markup (e.g. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]) for the extraction of facts.
Nevertheless, none of these defines a generic approach for the extraction of facts between
multiple entities using arbitrary structures of a web document.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <p>Figure 1 shows two exemplary settings for an application of the approach. With
background knowledge from an existing KG, it can be applied to any corpus
of web documents. Figure 1a displays a Wikipedia list page of persons who are
all related on the surface level (i.e. they all appear in the same enumeration)
and on the semantic level (i.e. they are all fictional journalists). As Wikipedia
entities are closely linked to DBpedia, the entities referenced in these lists can
be linked to their respective counterparts in DBpedia automatically, thus making
it easy to automatically find semantic commonalities between them. By using
the information about the entities, we can identify groups of entities and extract
a pattern for the recognition of such entities on a Wikipedia list page. In this
case such a pattern could identify the first entity of every enumeration point as
a fictional journalist.</p>
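<p>Such an enumeration pattern can be sketched in a few lines of Python; the wiki markup snippet and the extraction heuristic below are illustrative assumptions, not the actual implementation:</p>

```python
# Sketch: applying the example pattern from Figure 1a to wiki markup.
# Pattern: the first linked entity of every enumeration item on a list
# page is typed according to the page's semantic relation.
import re

WIKI_LIST = """
* [[Clark Kent]] from ''Superman''
* [[Peter Parker]], photojournalist in ''Spider-Man''
* See also: [[Journalism]]
"""

def first_entities(markup):
    """Return the first [[...]] link of each enumeration item."""
    entities = []
    for line in markup.splitlines():
        if line.startswith("*"):
            m = re.search(r"\[\[([^\]|]+)", line)
            if m:
                entities.append(m.group(1))
    return entities

print(first_entities(WIKI_LIST))
# ['Clark Kent', 'Peter Parker', 'Journalism'] -- the last hit shows why
# extracted patterns need ranking and filtering against the seed KG.
```

<p>The spurious "Journalism" hit illustrates why the extracted patterns later need to be ranked by accuracy and support.</p>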
      <p>
        Figure 1b shows a more generic setting where the extraction of the pattern is
rather difficult, as entities on this page are not yet linked and a navigation bar
is not as standardized as an enumeration. Nevertheless, there are various entity
recognition and linking tools that can help with the former problem (e.g. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]).
Regarding the latter problem it is worth noting that there is a steady increase in
adoption of Open Source software [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and especially Web Content Management
Systems, thus making it likely to find more and more websites with standardized
components.
      </p>
<p>In both scenarios the whole underlying document corpus can be scanned
for semantically related entities within specific documents in order to discover
their relation on the surface of the document. Fusing and generalizing the
extracted patterns can then yield (a) a pattern for the extraction of persons from
Wikipedia list pages and (b) a pattern for the extraction of persons from
navigation bars. When additional contextual information is included in the pattern
(e.g. information about the domain) it may even be possible to define general
patterns for the extraction of more specific types like journalists or scientists,
respectively.</p>
    </sec>
    <sec id="sec-4">
      <title>Methodology</title>
      <sec id="sec-4-1">
        <title>Knowledge Graph Construction</title>
<p>The foundation of our processing pipeline is formed by a KG that is used as seed (KGs)
and a corpus of web documents D. The extraction itself can be separated into
three phases: Pattern Extraction, Pattern Fusion, and Pattern Application.</p>
        <p>
          Pattern Extraction: If necessary, entities in D are located and linked to
KGs. Applying distant supervision [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and the local closed world assumption [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
to KGs and D, we can gather data for the extraction of patterns. Specifically,
we want to find patterns comprising entities that are related through specific
relations rsur and rsem. Paths in the DOM tree can serve as feature vectors for
arbitrary web documents, but depending on the corpus D it might make sense
to use more specific feature vectors (like Wiki markup when using Wikipedia as
data corpus). The output of this phase is a (possibly empty) set of patterns Pd
for every d ∈ D.

[Figure 1: (a) Excerpt from Wikipedia's List of fictional journalists
(https://en.wikipedia.org/wiki/List_of_fictional_journalists);
(b) Navigation bar for researchers of the DWS group
(https://dws.informatik.uni-mannheim.de/en/people/researchers)]
        </p>
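<p>As a rough illustration of DOM-tree paths serving as surface feature vectors, the following sketch (our own assumption, not the authors' implementation) records the tag path to every text node using only the standard library:</p>

```python
# Illustrative sketch: the path in the DOM tree to an entity mention is
# used as its surface feature vector. Mentions sharing a path become
# candidates for a common surface relation r_sur.
from html.parser import HTMLParser

class DomPathExtractor(HTMLParser):
    """Record the tag path from the root to every piece of text."""
    def __init__(self):
        super().__init__()
        self.stack, self.paths = [], {}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.paths[text] = "/".join(self.stack)

page = "<ul><li><a>Clark Kent</a></li><li><a>Peter Parker</a></li></ul>"
p = DomPathExtractor()
p.feed(page)
print(p.paths)  # {'Clark Kent': 'ul/li/a', 'Peter Parker': 'ul/li/a'}
```

<p>Both mentions share the path ul/li/a, so they would be grouped as candidates for one co-occurrence pattern.</p>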
<p>Pattern Fusion: Two patterns p1 and p2 may be merged if their
respective relations rsur and rsem are equal, subsume one another, or can both
be subsumed by a more general relation. For rsur, a subsumption can mean
that only the part of the DOM tree that p1 and p2 have in common is used as
pattern. Regarding rsem, a pattern that identifies scientists and a pattern that
identifies journalists are merged into a pattern that identifies persons. Patterns
can then be ranked by their accuracy on the extraction data and their support
in D in order to filter out irrelevant patterns. As an output we have a set P with
generalized patterns for D.</p>
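<p>Under simplifying assumptions (rsur rendered as a DOM path whose subsumption is a shared prefix, rsem as a type in a toy hierarchy; all names are our own illustrations), the fusion step could be sketched as:</p>

```python
# Sketch of the fusion step: merge two (dom_path, type) patterns when both
# the surface dimension (path prefix) and the semantic dimension (common
# supertype) can be subsumed.
SUPERTYPE = {"scientist": "person", "journalist": "person", "person": None}

def generalize_type(t1, t2):
    """Walk up the toy hierarchy until both types meet (or return None)."""
    ancestors = set()
    while t1:
        ancestors.add(t1)
        t1 = SUPERTYPE.get(t1)
    while t2 and t2 not in ancestors:
        t2 = SUPERTYPE.get(t2)
    return t2

def fuse(p1, p2):
    """Merge two (dom_path, type) patterns if both dimensions subsume."""
    path1, type1 = p1
    path2, type2 = p2
    common = []
    for a, b in zip(path1.split("/"), path2.split("/")):
        if a != b:
            break
        common.append(a)
    merged_type = generalize_type(type1, type2)
    if common and merged_type:
        return "/".join(common), merged_type
    return None

print(fuse(("ul/li/a", "scientist"), ("ul/li/b/a", "journalist")))
# ('ul/li', 'person')
```

<p>The scientist pattern and the journalist pattern collapse into a person pattern on the shared DOM prefix, mirroring the subsumption described above.</p>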
        <p>Pattern Application: As a last step, the patterns in P can either be applied
to D or, depending on the generality of the patterns, to any other corpus of web
documents to extract new entities and facts. Note that while new entities can
be extracted with any co-occurrence pattern, facts are always extracted in the
context of the relation rsem of the respective pattern.</p>
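<p>The application step can be sketched in the same toy setting, assuming a pattern is a (DOM path, type) pair and the helper names are hypothetical:</p>

```python
# Sketch: applying a generalized pattern to a new document. New entities can
# be extracted from any match; facts are always typed within the pattern's
# semantic relation r_sem (here, a type assertion).
def apply_pattern(pattern, dom_paths):
    """dom_paths: mapping of mention text to its DOM path."""
    path, sem_type = pattern
    extracted = {}
    for text, p in dom_paths.items():
        if p.startswith(path):          # surface relation matches
            extracted[text] = sem_type  # fact in the context of r_sem
    return extracted

print(apply_pattern(("ul/li", "person"),
                    {"Clark Kent": "ul/li/a", "Footer": "div/p"}))
# {'Clark Kent': 'person'}
```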
        <p>
          Finally, an approach similar to the one applied in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] could be used to
transform this sequential extraction pipeline into an iterative one. After an initial
pattern extraction, the discovered surface relations are applied to D in order to
find new semantic relations, and vice versa.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Knowledge Graph Evaluation</title>
        <p>
We plan to evaluate our resulting KGs on three levels: To get a first impression of
the results, we evaluate intrinsically by comparing metrics like size or coverage
with other KGs of the domain (see [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] for a comprehensive overview of KG
metrics). An extrinsic evaluation is conducted with the help of human judges
in order to get an absolute estimation of the quality of our results. Due to the
tremendous size of large-scale KGs, we plan to use crowd-sourced evaluation tools
like Amazon Mechanical Turk.2 Finally, we will perform a task-based evaluation
by analyzing whether the performance of applications increases when our KG is
used instead of others.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Planned Experiments</title>
      <p>
For an exploration of the potential of co-occurrence patterns, our first prototype
will be implemented in a constrained environment where rsur is restricted to a
specific type of pattern. Using DBpedia as seed KG and Wikipedia as document
corpus, the idea is to construct a KG from the category tree and connected
list pages. While the category tree has already served as the backbone of a
taxonomy in many publications, list pages have been used only on few occasions (see
Section 2). This may be due to the fact that list pages, unlike categories, have no
precisely defined structure and their hierarchical organization within Wikipedia
is rather implicit. In general, a list page is a Wikipedia page with a title starting
with List of. [
        <xref ref-type="bibr" rid="ref13">13</xref>
] have analyzed 2,000 list pages and identified three common
layouts: they appear either as enumeration, as table, or in an arbitrary unstructured
format, with the latter appearing less frequently. Hence, we will focus on the former
two types due to their structured nature and frequency. Co-occurrence patterns
can then be derived as explained in Section 3. The English version of DBpedia
contains 212,175 list pages in its latest release3, so we are positive that a lot of
still hidden knowledge can be extracted with this approach.
      </p>
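<p>The distinction between the two targeted layouts could be approximated with simple markup heuristics; the sketch below is our own assumption, not the classification procedure of the cited analysis:</p>

```python
# Minimal sketch: classify a Wikipedia list page into the layouts we focus
# on (enumeration vs. table) by counting characteristic markup lines.
def classify_list_page(wiki_markup):
    lines = [l for l in wiki_markup.splitlines() if l.strip()]
    enum = sum(1 for l in lines if l.startswith("*") or l.startswith("#"))
    table = sum(1 for l in lines if l.startswith("{|") or l.startswith("|"))
    if enum == 0 and table == 0:
        return "unstructured"
    return "enumeration" if enum >= table else "table"

print(classify_list_page("* [[A]]\n* [[B]]"))                     # enumeration
print(classify_list_page("{| class=wikitable\n|-\n| [[A]]\n|}"))  # table
```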
      <p>
Using the insights gained from our first prototype, we will subsequently
perform an experiment in an unconstrained environment using the whole web as
document corpus. Here, we strive to extract patterns where rsur and rsem can
have arbitrary forms. The Common Crawl4 will serve as our source of documents.
Instead of linking entities in the crawl on our own, we plan to use semantic
annotations on web pages (e.g. using Microdata or RDFa format [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]). Consequently,
the pipeline described in Section 4 will be applicable. The most recent Web Data
Commons crawl for semantic annotations5 contains almost 40 billion triples in
several million domains and various approaches (cf. [
        <xref ref-type="bibr" rid="ref25 ref5">5, 25</xref>
        ]) have successfully
utilized them for their experiments. Hence, we see this as a promising setup for the
large-scale extraction of co-occurrence patterns.
2 https://www.mturk.com/
3 Pages starting with List of in http://downloads.dbpedia.org/2016-10/core-i18n/en/labels_en.ttl.bz2
4 http://commoncrawl.org/
5 http://webdatacommons.org/structureddata/#results-2017-1
      </p>
    </sec>
    <sec id="sec-6">
      <title>Discussion</title>
      <p>
        Our approach for the construction of KGs from entity co-occurrence is designed
to exploit arbitrary document structures that contain related entities. It thus
extends the state of the art as existing approaches either focus on relations
between two entities ([
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ]) or treat only special cases of document structures
like tables ([
        <xref ref-type="bibr" rid="ref19 ref21">19, 21</xref>
        ]).
      </p>
      <p>The approach bears potential as it works orthogonally to the existing
approaches by focusing on harvesting patterns formed by multiple entities.
Consequently, it might be possible to extract information that is yet untouched since,
as soon as a co-occurrence pattern is found, no evidence for a certain fact in
the immediate surroundings of an entity is necessary in order to extract it. The
main limitation of our approach is the inability to extend the ontology of the
seed KG. Depending on the richness of the ontology, some relations might not
be representable, resulting in a potential loss of information. Furthermore, it is
yet unexplored how efficiently co-occurrence patterns can be extracted (on a large
scale) and whether it is necessary to include additional contextual information
in the patterns in order to create document-independent ones.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>I would like to thank Heiko Paulheim for his guidance and support in the
realization of this work and Stefan Dietze for his elaborate review and valuable
suggestions for the general direction of my research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bollacker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al.:
          <article-title>Freebase: a collaboratively created graph database for structuring human knowledge</article-title>
          .
          <source>In: 2008 ACM SIGMOD international conference on Management of data</source>
          . pp.
          <volume>1247</volume>
–
          <fpage>1250</fpage>
          .
          <string-name>
<surname>ACM</surname>
          </string-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Brewster</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , et al.:
<article-title>Data driven ontology evaluation</article-title>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Carlson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Betteridge</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kisiel</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Settles</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
<string-name>
            <surname>Hruschka Jr.</surname>
            ,
            <given-names>E.R.</given-names>
          </string-name>
          , Mitchell, T.M.:
          <article-title>Toward an architecture for never-ending language learning</article-title>
          .
          <source>In: AAAI</source>
          . vol.
          <volume>5</volume>
          , p.
          <fpage>3</fpage>
          .
Atlanta
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
<string-name>
            <surname>Macredie</surname>
            ,
            <given-names>R.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mijinyawa</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A theory-grounded framework of open source software adoption in SMEs</article-title>
          .
          <source>European Journal of Information Systems</source>
          <volume>20</volume>
          (
          <issue>2</issue>
          ),
          <volume>237</volume>
–
          <fpage>250</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gabrilovich</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , et al.:
          <article-title>Knowledge vault: A web-scale approach to probabilistic knowledge fusion</article-title>
          .
          <source>In: 20th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <volume>601</volume>
–
          <fpage>610</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Fader</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Identifying relations for open information extraction</article-title>
          .
          <source>In: Conference on empirical methods in natural language processing</source>
          . pp.
          <volume>1535</volume>
–
          <issue>1545</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Flati</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vannella</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
<article-title>Two is bigger (and better) than one: the Wikipedia Bitaxonomy project</article-title>
          .
          <source>In: 52nd Annual</source>
          <article-title>Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</article-title>
          .
          <source>vol. 1</source>
          , pp.
          <volume>945</volume>
–
          <issue>955</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fossati</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontokostas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
          </string-name>
          , J.:
<article-title>Unsupervised learning of an extensive and usable taxonomy for DBpedia</article-title>
          .
          <source>In: 11th International Conference on Semantic Systems</source>
          . pp.
          <volume>177</volume>
–
          <fpage>184</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gabrilovich</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markovitch</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Wikipedia-based semantic interpretation for natural language processing</article-title>
          .
<source>Journal of Artificial Intelligence Research</source>
          <volume>34</volume>
          ,
          <volume>443</volume>
–
          <fpage>498</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Harabagiu</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moldovan</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          , et al.:
          <article-title>Falcon: Boosting knowledge for answer engines</article-title>
          .
          <source>In: TREC</source>
          . vol.
          <volume>9</volume>
          , pp.
          <volume>479</volume>
–
          <issue>488</issue>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Heist</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
<article-title>Language-agnostic relation extraction from Wikipedia abstracts</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          . pp.
          <volume>383</volume>
–
          <fpage>399</fpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
<article-title>WebIsALOD: providing hypernymy relations extracted from the web as linked open data</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          . pp.
          <volume>111</volume>
–
          <fpage>119</fpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kuhn</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mischkewitz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ring</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Windheuser</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Type inference on Wikipedia list pages</article-title>
          .
          <source>Informatik 2016</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isele</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakob</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia</article-title>
          .
          <source>Semantic Web</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
          <fpage>167</fpage>
          –
          <lpage>195</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakob</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>DBpedia Spotlight: shedding light on the web of documents</article-title>
          .
          <source>In: 7th international conference on semantic systems</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
          .
          ACM
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Meusel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrovski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The WebDataCommons microdata, RDFa and microformat dataset series</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          . pp.
          <fpage>277</fpage>
          –
          <lpage>292</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          :
          <article-title>WordNet: a lexical database for English</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>38</volume>
          (
          <issue>11</issue>
          ),
          <fpage>39</fpage>
          –
          <lpage>41</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Mintz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bills</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.:
          <article-title>Distant supervision for relation extraction without labeled data</article-title>
          .
          <source>In: Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2</source>
          . pp.
          <fpage>1003</fpage>
          –
          <lpage>1011</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Muñoz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mileo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Using linked data to mine RDF from Wikipedia's tables</article-title>
          .
          <source>In: 7th ACM international conference on Web search and data mining</source>
          . pp.
          <fpage>533</fpage>
          –
          <lpage>542</lpage>
          .
          ACM
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponzetto</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          :
          <article-title>Extending DBpedia with Wikipedia list pages</article-title>
          .
          <source>NLP-DBPEDIA@ISWC</source>
          <volume>13</volume>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Ritze</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmberg</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , et al.:
          <article-title>Profiling the potential of web tables for augmenting cross-domain knowledge bases</article-title>
          .
          <source>In: 25th international conference on world wide web</source>
          . pp.
          <fpage>251</fpage>
          –
          <lpage>261</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasneci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>YAGO: a core of semantic knowledge</article-title>
          .
          <source>In: 16th international conference on World Wide Web</source>
          . pp.
          <fpage>697</fpage>
          –
          <lpage>706</lpage>
          .
          ACM
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Theobald</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>From information to knowledge: harvesting entities and relationships from web sources</article-title>
          .
          <source>In: 29th ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems</source>
          . pp.
          <fpage>65</fpage>
          –
          <lpage>76</lpage>
          .
          ACM
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          :
          <article-title>Probase: A probabilistic taxonomy for text understanding</article-title>
          .
          <source>In: 2012 ACM SIGMOD International Conference on Management of Data</source>
          . pp.
          <fpage>481</fpage>
          –
          <lpage>492</lpage>
          .
          ACM
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gadiraju</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          , et al.:
          <article-title>KnowMore – knowledge base augmentation with structured web markup</article-title>
          .
          <source>Semantic Web Journal</source>
          , IOS Press (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Zaveri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rula</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maurino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Quality assessment for linked data: A survey</article-title>
          .
          <source>Semantic Web</source>
          <volume>7</volume>
          (
          <issue>1</issue>
          ),
          <fpage>63</fpage>
          –
          <lpage>93</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>