<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Scalable and Plug-in Based System to Construct a Production-Level Knowledge Base</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tomoya Yamazaki</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kentaro Nishi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takuya Makabe</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mei Sasaki</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chihiro Nishimoto</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hiroki Iwasawa</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Masaki Noguchi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yukihiro Tagami</string-name>
        </contrib>
        <aff>Yahoo Japan Corporation</aff>
        <email>{tomoyama, kentnish, tmakabe, mesasaki, cnishimo, hiwasawa, manoguch, yutagami}@yahoo-corp.jp</email>
      </contrib-group>
      <abstract>
        <p>Knowledge bases play crucial roles in a wide variety of information systems, such as web search engines and intelligent personal assistants. To respond to constantly fluctuating user information demands, we aim to construct a large-scale, well-structured, and comprehensive knowledge base from the world's evolving data. To maintain the quality of our large knowledge base at the production level, we carefully design the system not only to match entities but also to incorporate various automatic and manual validation methods, because in practice it is difficult to filter out all incorrect facts automatically. In this paper, we propose a novel plugin-based system architecture that offers both the ability to rapidly identify mistakes and system extensibility. Our constructed knowledge base is already used in Japanese Web services, and the number of entities in it keeps growing steadily.</p>
      </abstract>
      <kwd-group>
        <kwd>knowledge base</kwd>
        <kwd>entity matching</kwd>
        <kwd>data integration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>al. [5] argued that industrial systems differ considerably from academic ones
in their objectives; therefore, we cannot merely adopt such one-shot
KB construction systems. The main objective of our KB construction system is
not to maximize F1 but to maintain high precision at all times while
improving recall over time, because displaying incorrect information costs us
the trust of our users. Since the quality of a KB is measured in various ways,
such as the precision and recall of entity matching, the ratio of invalid or
incorrect data, and the ratio of missing relations, we design our KB
construction system so that it can control each of these measurements and
evaluate each algorithm or software component one by one. Therefore, a
plugin-based system architecture is critical to satisfying many system
requirements, such as the ability to fix incorrect data immediately and the
interchangeability of each method. This plugin-based system allows us to
implement various algorithms and also satisfy business requirements, while the
system itself focuses on ensuring our SLAs and scaling computation.</p>
      <p>WOO [1] is one of the plugin-based KB construction systems at Yahoo! Inc.
and is designed to enable various types of products to synthesize KBs. WOO
produces KBs from input data as follows: (1) import data in various schemes
and normalize them to a common format, (2) match entities with entity matching
algorithms, (3) assign persistent IDs, and (4) export the KB in any output
format. In this paper, we extend WOO to maintain high precision from multiple
measurement perspectives and explore loosely coupled, plugin-based
architectures. In addition to the above four primary functions of WOO, our KB
construction system validates data to remove incorrect facts and completes
missing relations and facts. We carefully design the system architecture so
that it is easily extensible.</p>
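      <p>The first three WOO steps listed above can be illustrated with a minimal plugin pipeline. This is only a sketch under our own assumptions: the record shape, the plugin names (normalize, match_by_name, assign_ids), and the toy matching rule are hypothetical and are not taken from WOO or from our production system. The export step is omitted for brevity.</p>
      <preformat><![CDATA[
```python
from typing import Callable, Dict, List

Record = Dict[str, object]                      # hypothetical record shape
Plugin = Callable[[List[Record]], List[Record]]


def normalize(records: List[Record]) -> List[Record]:
    """Step (1): bring every input record into one common format."""
    return [{**r, "name": str(r["name"]).strip().lower()} for r in records]


def match_by_name(records: List[Record]) -> List[Record]:
    """Step (2): toy matcher that groups records sharing a normalized name."""
    clusters: Dict[str, List[Record]] = {}
    for r in records:
        clusters.setdefault(str(r["name"]), []).append(r)
    return [{"name": n, "members": m} for n, m in clusters.items()]


def assign_ids(clusters: List[Record]) -> List[Record]:
    """Step (3): toy ID assignment; a real system must keep IDs persistent."""
    return [{**c, "id": f"kb:{i}"} for i, c in enumerate(clusters, start=1)]


def run_pipeline(records: List[Record], plugins: List[Plugin]) -> List[Record]:
    """Apply each plugin in order; any stage can be swapped independently."""
    for plugin in plugins:
        records = plugin(records)
    return records


raw = [
    {"name": " Tokyo Tower ", "source": "wikidata"},
    {"name": "tokyo tower", "source": "dbpedia"},
    {"name": "Kinkaku-ji", "source": "wikidata"},
]
kb = run_pipeline(raw, [normalize, match_by_name, assign_ids])
```
]]></preformat>
      <p>Because every stage shares one signature, replacing the matcher with a stricter one only changes the list passed to run_pipeline.</p>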
    </sec>
    <sec id="sec-2">
      <title>Our KB Construction System Architecture</title>
      <p>Our KB construction system is designed to handle hundreds of millions of
entities covering our wide-ranging domain services (e.g., books, movies,
companies, and landmarks) and constructs a large production-level KB every day
by using Apache Spark. Our system is mainly composed of the twelve components
shown in Figure 1, which we roughly divide into two groups according to whether
or not the input data are Web-crawled. Every day we collect and update
structured or semi-structured data such as linked open data (LOD; e.g.,
Wikidata, Wikipedia, DBpedia, or Freebase), content provider (CP) data (e.g.,
landmark, movie, or book data), and Web-crawled data gathered by our Web
crawler. These three types of data differ in their accuracy and in the
complexity of extracting key-value information, as shown in Table 1.</p>
      <p>Our system processes structured and semi-structured data other than
Web-crawled data through a pipeline that runs from the Importer to the Exporter
via ten other primary components. As shown in Table 1, extracting key-value
information from Web-crawled data is a separate challenge because of the
variety of input documents and extraction targets [3]; thus, we design our
information extraction (IE) method separately from the main components and
incorporate the IE results as additional data.</p>
      <sec id="sec-2-1">
        <title>The JKB Scheme</title>
        <p>We introduce a new scheme for KBs and explain its advantages. The JKB is a
set of entities. An entity is composed of a unique ID, types (e.g., Person and
Written Work), and a set of Triples. A Triple is similar to an RDF triple
(https://www.w3.org/TR/rdf11-concepts/), which is composed of a subject, a
predicate, and an object. In addition, each Triple in the JKB scheme has a
certainty score, as in YAGO [8], and a data type. We can maintain the quality
of the JKB by filtering Triples according to whether the certainty score is
higher than a threshold and whether the data type is consistent with our
ontology. Since the JKB scheme allows arbitrary meta information, such as the
data source, to be added, we can easily debug by tracing the data source.</p>
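        <p>As an illustration of the scheme described above, the following sketch models entities and Triples and applies both filters. The class names, field names, and the ONTOLOGY_RANGE table are assumptions made for this example and do not reproduce the actual JKB format.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    certainty: float          # certainty score in [0, 1]
    data_type: str            # e.g. "string" or "date"
    source: str = "unknown"   # meta information used for tracing


@dataclass
class Entity:
    entity_id: str
    types: List[str]
    triples: List[Triple] = field(default_factory=list)


# Hypothetical ontology fragment: expected data type per predicate.
ONTOLOGY_RANGE: Dict[str, str] = {"birthDate": "date", "name": "string"}


def filter_triples(entity: Entity, threshold: float) -> List[Triple]:
    """Keep triples that are certain enough and consistent with the ontology."""
    return [
        t for t in entity.triples
        if t.certainty > threshold
        and ONTOLOGY_RANGE.get(t.predicate) == t.data_type
    ]


person = Entity("jkb:300", ["Person"], [
    Triple("jkb:300", "name", "Natsume Soseki", 0.9, "string", "wikidata"),
    Triple("jkb:300", "birthDate", "1867-02-09", 0.4, "date", "crawl"),
    Triple("jkb:300", "birthDate", "Tokyo", 0.95, "string", "crawl"),
])
kept = filter_triples(person, threshold=0.5)
```
]]></preformat>
        <p>Only the name Triple survives here: the second Triple falls below the threshold, and the third has a data type that disagrees with the assumed ontology. The source field of each dropped Triple tells us where to look when debugging.</p>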
      </sec>
      <sec id="sec-2-2">
        <title>Primary Functions of Each Component</title>
        <p>Importer is a data feed component. It supports arbitrary input data schemes,
except for Web-crawled data, as shown in Figure 1, and unifies them into the JKB
scheme with certainty scores. We manually assign the certainty score for every
data source or predicate, taking into account its update frequency or estimated
precision.</p>
        <p>Attribute Converter converts the types and predicates of the input data to our
ontology by using a mapping file. For example, we provide a mapping from the
person type in Wikidata (https://www.wikidata.org/wiki/Q215627) to the Person
type in our ontology. We traverse the type hierarchy of each LOD and
semi-automatically construct type mappings from LODs to the JKB. The
semi-automatic steps for constructing person mappings from Wikidata to the JKB
are as follows:
          <list list-type="order">
            <list-item><p>We judge whether the mapping from the type in Wikidata (https://www.wikidata.org/wiki/Q215627) to the type in our ontology is correct.</p></list-item>
            <list-item><p>If the previous step passes, we automatically collect candidate mappings between all subclasses of the type in Wikidata and the type in our ontology.</p></list-item>
            <list-item><p>We sample the mapping results and manually check the correctness of the above candidate mappings.</p></list-item>
          </list>
        </p>
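        <p>The automatic part of these steps (step 2) amounts to a traversal of the subclass hierarchy. A minimal sketch follows, assuming the hierarchy is available as an in-memory dictionary; the SUBCLASSES excerpt and the function name are hypothetical, and the manual checks of steps 1 and 3 are outside the code.</p>
        <preformat><![CDATA[
```python
from collections import deque
from typing import Dict, List

# Hypothetical excerpt of a subclass hierarchy (parent -> direct subclasses).
SUBCLASSES: Dict[str, List[str]] = {
    "person": ["artist", "athlete"],
    "artist": ["painter"],
    "athlete": [],
    "painter": [],
}


def candidate_mappings(root: str, target: str) -> Dict[str, str]:
    """Map the root type and all of its transitive subclasses to the target."""
    mapping = {root: target}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in SUBCLASSES.get(current, []):
            if child not in mapping:      # guards against cycles
                mapping[child] = target
                queue.append(child)
    return mapping


candidates = candidate_mappings("person", "Person")
```
]]></preformat>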
        <p>Entity Matcher outputs entity clusters, each of which groups records of the
same entity. It consists of three steps: rule-based matching, graph-based
matching, and filtering of unnecessary entity clusters and attributes.
First, accurate matches are made by rule-based matching. Second, we
create blocks of candidate entity clusters with weak matching methods (e.g.,
name matching) to reduce the computation time. Within each block, we connect
related entity clusters by edges using matching methods stricter than those
used for blocking (e.g., name and birthdate) and extract cliques from the
resulting graph of entity clusters. Third, the Entity Matcher removes entity
clusters whose attribute certainties are zero, as well as attributes left
unmapped by the Attribute Converter.</p>
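        <p>The graph-based step can be sketched as blocking, edge construction, and clique extraction. The code below is an illustration only: the records, the name-based blocking key, the strict_match rule, and the use of the basic Bron-Kerbosch recursion for maximal cliques are our assumptions for the example, not a description of the production implementation.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from itertools import combinations
from typing import Dict

# Hypothetical input records keyed by source ID.
RECORDS: Dict[str, Dict[str, str]] = {
    "wd:1": {"name": "taro yamada", "birthdate": "1980-01-01"},
    "db:7": {"name": "taro yamada", "birthdate": "1980-01-01"},
    "cp:3": {"name": "taro yamada", "birthdate": "1975-05-05"},
    "wd:9": {"name": "hanako suzuki", "birthdate": "1990-03-03"},
}


def build_blocks(records):
    """Weak matching: records with the same name share a block."""
    blocks = defaultdict(list)
    for rid, rec in records.items():
        blocks[rec["name"]].append(rid)
    return list(blocks.values())


def strict_match(a, b):
    """Stricter matching used for edges: same name and same birthdate."""
    return a["name"] == b["name"] and a["birthdate"] == b["birthdate"]


def maximal_cliques(nodes, adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return found


def match_entities(records):
    clusters = []
    for block in build_blocks(records):
        adj = {rid: set() for rid in block}
        for a, b in combinations(block, 2):   # comparisons stay inside a block
            if strict_match(records[a], records[b]):
                adj[a].add(b)
                adj[b].add(a)
        clusters.extend(maximal_cliques(block, adj))
    return sorted(sorted(c) for c in clusters)


clusters = match_entities(RECORDS)
```
]]></preformat>
        <p>Blocking keeps the pairwise comparisons local to each block, which is where the reduction in computation time comes from. Note that maximal cliques may overlap in general, so a real system needs a policy for records that appear in more than one clique.</p>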
        <p>ID Assigner assigns a unique ID to each entity cluster based on the set of
data sources of the entities in the cluster. For example, if the ID of an
entity cluster composed of two entities derived from "Wikidata ID: 100" and
"DBpedia ID: 2000" is "JKB ID: 300", we save two relations as follows:
"Wikidata ID: 100" → "JKB ID: 300" and "DBpedia ID: 2000" → "JKB ID: 300".</p>
        <p>We preserve the relations between the ID and the set of data sources on
Apache HBase (https://hbase.apache.org/), which is the Hadoop database, as the
ID Table. The ID Assigner ensures that an entity cluster that is the same as a
past one inherits the past ID, in the following steps:
          <list list-type="order">
            <list-item><p>Get the past JKB IDs of the entities in the given entity cluster by cross-referencing the data sources of the entities in the entity cluster and the past ID Table. The past JKB IDs are candidate IDs for the given entity cluster.</p></list-item>
            <list-item><p>Get all past data sources by cross-referencing the past JKB IDs and the past ID Table.</p></list-item>
            <list-item><p>Calculate the ratio of the number of data sources shared by the past and current sets to the number of past data sources.</p></list-item>
            <list-item><p>If the ratio is greater than 0.5, assign the past ID to the entity cluster; otherwise, assign a new ID and update the ID Table.</p></list-item>
          </list>
        </p>
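        <p>The four steps above can be sketched as a single function. In this sketch the past ID Table is a plain dictionary from data source to JKB ID rather than an HBase table, the caller supplies the fresh ID, and choosing the candidate with the highest ratio when several exist is our assumption; the text does not specify that case.</p>
        <preformat><![CDATA[
```python
from typing import Dict, Optional, Set


def assign_id(sources: Set[str], past_table: Dict[str, str],
              new_id: str, threshold: float = 0.5) -> str:
    """Return the inherited past ID, or new_id if no candidate qualifies."""
    # Step 1: candidate IDs are the past IDs of the cluster's data sources.
    candidates = {past_table[s] for s in sources if s in past_table}

    best_id: Optional[str] = None
    best_ratio = 0.0
    for cid in sorted(candidates):
        # Step 2: all data sources that pointed to this ID in the past.
        past_sources = {s for s, i in past_table.items() if i == cid}
        # Step 3: shared sources divided by the number of past sources.
        ratio = len(past_sources & sources) / len(past_sources)
        if ratio > best_ratio:
            best_id, best_ratio = cid, ratio

    # Step 4: inherit only when more than half of the past sources remain.
    if best_id is not None and best_ratio > threshold:
        return best_id
    return new_id


past_table = {
    "wikidata:100": "jkb:300",
    "dbpedia:2000": "jkb:300",
    "wikidata:101": "jkb:301",
    "dbpedia:2001": "jkb:301",
    "cp:55": "jkb:301",
}
kept_id = assign_id({"wikidata:100", "dbpedia:2000", "cp:9"}, past_table, "jkb:900")
fresh_id = assign_id({"wikidata:101"}, past_table, "jkb:901")
```
]]></preformat>
        <p>The first cluster keeps jkb:300 because both past sources are still present (ratio 1.0); the second receives a new ID because only one of the three past sources of jkb:301 remains (ratio 1/3).</p>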
        <p>In this way, the ID Assigner keeps IDs persistent over time.</p>
        <p>Additional Data Combiner combines entities from the additional data, whose
scheme is the same as the JKB's; thus, it incorporates these entities into the
entity clusters from the ID Assigner by using their IDs.</p>
        <p>Entity Merger merges the entities in each entity cluster into one entity. We
can define a certainty score for each Triple according to its data source. The
Entity Merger merges identical Triples of the entities in an entity cluster
into one Triple, whose certainty score is computed from the original scores by
a monotonically increasing function. Since the main objective of our KB
construction system is to maintain high precision at all times, the Entity
Merger is a simple but powerful component: thanks to the certainty score, the
Validator can filter unreliable attributes accurately.</p>
        <p>Object Converter converts the object of each Triple to the corresponding
entity ID by referring to the ID Table. If the Triple is derived from linked
data and the object is given as a specific identifier, the conversion is
straightforward. However, many objects of Triples are represented as literals,
and linking such a literal to the corresponding entity is similar to the
entity disambiguation problem.</p>
        <p>Therefore, we resolve the entity ambiguity from two viewpoints: (1) the type
consistency between the predicate and our ontology, and (2) distinguishing
different objects with the same name and the same type, such as persons with
the same name. We solve the first problem by comparing the type of the object
with the range type of the predicate in our ontology, and the second by
converting the object only when exactly one entity in the JKB has that pair of
object type and object name.</p>
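        <p>The two checks described above can be sketched as a lookup keyed by the pair of type and name. The PREDICATE_RANGE table, the entity list, and the function names are hypothetical and serve only to illustrate the rule that a literal is converted when exactly one entity of the expected type carries that name.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical ontology fragment: range type of each predicate.
PREDICATE_RANGE: Dict[str, str] = {
    "author": "Person",
    "publisher": "Organization",
}

# Hypothetical entities as (entity ID, type, name).
ENTITIES = [
    ("jkb:1", "Person", "Haruki Murakami"),
    ("jkb:2", "Person", "Ken Sato"),
    ("jkb:3", "Person", "Ken Sato"),
    ("jkb:4", "Organization", "Shinchosha"),
]


def build_index(entities) -> Dict[Tuple[str, str], List[str]]:
    """Index entity IDs by (type, name)."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for entity_id, entity_type, name in entities:
        index[(entity_type, name)].append(entity_id)
    return index


def link_object(predicate: str, literal: str, index) -> Optional[str]:
    """Convert a literal object to an entity ID, or return None if unsure."""
    range_type = PREDICATE_RANGE.get(predicate)      # check (1): type
    if range_type is None:
        return None
    ids = index.get((range_type, literal), [])
    return ids[0] if len(ids) == 1 else None          # check (2): uniqueness


INDEX = build_index(ENTITIES)
```
]]></preformat>
        <p>With this data, link_object("author", "Haruki Murakami", INDEX) yields jkb:1, whereas the ambiguous name "Ken Sato" is left as a literal. Declining to link in ambiguous cases trades recall for precision.</p>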
        <p>Attribute Completer adds attributes to entities based on our ontology
and their URL attributes. First, it completes attributes by using inverse
properties defined in our ontology with owl:inverseOf
(https://www.w3.org/TR/owl-ref/#inverseOf-def). Second, it extracts useful
information from entity-related URLs; for example, OGP (http://ogp.me/)
images are useful attributes of these entities. The Attribute Completer
partially addresses a well-known challenge, knowledge base completion [7], by
inferring missing inverse Triples.</p>
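        <p>Completion with inverse properties can be sketched as follows. The INVERSE_OF table and the predicate names are hypothetical; in practice the pairs would be read from owl:inverseOf declarations in the ontology.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

TripleT = Tuple[str, str, str]   # (subject, predicate, object)

# Hypothetical inverse-property pairs, stored in both directions.
INVERSE_OF: Dict[str, str] = {"author": "authorOf", "authorOf": "author"}


def complete_inverse(triples: List[TripleT]) -> List[TripleT]:
    """Return the inverse triples that are implied but not yet present."""
    existing = set(triples)
    added: List[TripleT] = []
    for subject, predicate, obj in triples:
        inverse = INVERSE_OF.get(predicate)
        if inverse is None:
            continue
        candidate = (obj, inverse, subject)
        if candidate not in existing:
            existing.add(candidate)
            added.append(candidate)
    return added


triples = [
    ("jkb:10", "author", "jkb:1"),
    ("jkb:1", "authorOf", "jkb:10"),
    ("jkb:11", "author", "jkb:1"),
]
new_triples = complete_inverse(triples)
```
]]></preformat>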
        <p>Validator removes invalid data based on (1) blacklists created via the Manual
Refiner, (2) inconsistency between Triples and our ontology, and (3) the
results of fact checking with crawl data (e.g., whether a URL is a dead link).
It also rewrites values to standardize definitions in the JKB; for example,
phonetic characters are unified to hiragana (one of the writing systems of
Japanese). We describe the details of the component in Section 3.3.</p>
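        <p>Two of the operations above lend themselves to a short sketch: removing blacklisted Triples and unifying phonetic characters to hiragana. The blacklist content is hypothetical, and the conversion shown here only shifts the katakana block U+30A1 to U+30F6 onto the corresponding hiragana code points; it is an assumption about one simple way to perform the unification, not the production rule set.</p>
        <preformat><![CDATA[
```python
from typing import List, Set, Tuple

TripleT = Tuple[str, str, str]   # (subject, predicate, object)

# Hypothetical blacklist produced by manual review.
BLACKLIST: Set[TripleT] = {("jkb:1", "spouse", "jkb:2")}


def apply_blacklist(triples: List[TripleT]) -> List[TripleT]:
    """Drop every triple that appears on the blacklist."""
    return [t for t in triples if t not in BLACKLIST]


def katakana_to_hiragana(text: str) -> str:
    """Shift katakana letters (U+30A1..U+30F6) down by 0x60 to hiragana."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


cleaned = apply_blacklist([
    ("jkb:1", "spouse", "jkb:2"),
    ("jkb:1", "name", "jkb-literal"),
])
reading = katakana_to_hiragana("トウキョウ")
```
]]></preformat>
        <p>Characters outside the shifted range, such as the prolonged sound mark, pass through unchanged in this sketch.</p>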
        <p>Exporter filters and corrects Triples to avoid service-specific issues, such as
copyright problems, and outputs the JKB in a well-known format (e.g., JSON
or N-Triples).</p>
        <p>Information Extractor extracts factual information from a large set of
Web-crawled data. Figure 2 illustrates an overview of the method of the
Information Extractor (IE). There has been extensive work on IE from Web data
[3]. We regard each Web page as a Document Object Model (DOM) tree and collect
all DOM path patterns related to a predicate by using the JKB.</p>
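        <p>To make the idea of DOM path patterns concrete, the sketch below records the tag path of every text node, counts the path pairs under which known subject and object values co-occur, and then applies the most frequent pair to an unseen page. It uses only the Python standard library; the pages, the facts, and the use of raw counts in place of confidence scores are assumptions of this example.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from html.parser import HTMLParser
from typing import List, Tuple


class TextPathParser(HTMLParser):
    """Record the tag path from the root for every non-empty text node."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.nodes: List[Tuple[str, str]] = []   # (path, text)

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag to tolerate unclosed elements.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def text_paths(html: str) -> List[Tuple[str, str]]:
    parser = TextPathParser()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_patterns(pages, known_facts) -> Counter:
    """Count (subject path, object path) pairs supported by known facts."""
    counts: Counter = Counter()
    for html in pages:
        nodes = text_paths(html)
        for subject, obj in known_facts:
            for s_path in [p for p, t in nodes if t == subject]:
                for o_path in [p for p, t in nodes if t == obj]:
                    counts[(s_path, o_path)] += 1
    return counts


def extract(html: str, pattern):
    """Apply one learned pattern to a new page; None if it is ambiguous."""
    subject_path, object_path = pattern
    nodes = text_paths(html)
    subjects = [t for p, t in nodes if p == subject_path]
    objects = [t for p, t in nodes if p == object_path]
    if len(subjects) == 1 and len(objects) == 1:
        return subjects[0], objects[0]
    return None


TEMPLATE = ("<html><body><h1>{0}</h1><table><tr><th>Height</th>"
            "<td>{1}</td></tr></table></body></html>")
pages = [TEMPLATE.format("Tokyo Tower", "333 m"),
         TEMPLATE.format("Tokyo Skytree", "634 m")]
known = [("Tokyo Tower", "333 m"), ("Tokyo Skytree", "634 m")]

patterns = learn_patterns(pages, known)
best_pattern, support = patterns.most_common(1)[0]
new_fact = extract(TEMPLATE.format("Kyoto Tower", "131 m"), best_pattern)
```
]]></preformat>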
        <p>The Information Extractor uses a simple DOM-based method for extracting
information from semi-structured data based on distant supervision [6]. First,
we find path patterns from the subject to the object, together with confidence
scores. Path patterns are extracted from the DOM trees of a large amount of
Web-crawled data and the JKB. Second, we extract new information with the
extracted path patterns and merge the confidence scores with the function
defined in the following.</p>
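        <p>Given two confidence scores <inline-formula><tex-math><![CDATA[P_A]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[P_B]]></tex-math></inline-formula>, we define a new associative binary
operation on these scores as follows:
          <disp-formula><tex-math><![CDATA[P_A \oplus P_B = \frac{P_A P_B}{P_A P_B + \alpha (1 - P_A)(1 - P_B)},]]></tex-math></disp-formula>
where <inline-formula><tex-math><![CDATA[\alpha]]></tex-math></inline-formula> is a hyperparameter for the probability that two paths from two
different Web pages are the same and the extracted fact is incorrect. In
practice, we set the hyperparameter to 0.1 based on preliminary experimental
results. The above binary operation is associative.</p>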
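        <p>The merge operation described above can be checked numerically. The sketch below assumes that the hyperparameter (called alpha here, with the value 0.1 reported in the text) scales the term in which both extractions are incorrect; that placement is our reading of the definition. Under this reading the odds of the merged score equal the product of the input odds divided by alpha, which is why the order of merging does not matter.</p>
        <preformat><![CDATA[
```python
from functools import reduce

ALPHA = 0.1  # value reported in the text; its placement is an assumption


def merge(p_a: float, p_b: float, alpha: float = ALPHA) -> float:
    """Merge two confidence scores taken from the open interval (0, 1)."""
    both_correct = p_a * p_b
    both_wrong = alpha * (1.0 - p_a) * (1.0 - p_b)
    # Undefined when one score is exactly 0 and the other exactly 1.
    return both_correct / (both_correct + both_wrong)


def merge_all(scores, alpha: float = ALPHA) -> float:
    """Fold any number of scores; associativity makes the order irrelevant."""
    return reduce(lambda a, b: merge(a, b, alpha), scores)


pair = merge(0.6, 0.7)                       # 0.42 / (0.42 + 0.012)
left = merge(merge(0.6, 0.7), 0.8)
right = merge(0.6, merge(0.7, 0.8))
```
]]></preformat>
        <p>Two moderately confident and independent observations thus reinforce each other: scores of 0.6 and 0.7 merge to roughly 0.97 under the assumed form.</p>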
        <p>Manual Refiner handles corner cases that are difficult to remove or refine
automatically. First, we find incorrect facts from user feedback and our
quantitative evaluation. Second, we examine whether these facts stem from a
business requirement or a rare case. If these facts are corner cases and
require immediate modification, we add them to the blacklist. Third, we
manually create new Triples for information, such as images, that cannot be
imported because the input data lack it. The quality of the JKB improves
with the Manual Refiner.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Quantitative Results</title>
      <sec id="sec-3-1">
        <title>Overall JKB Results</title>
        <p>We first compared two JKBs built one week apart and confirmed that only
0.0004% of entities changed their IDs within the week. We also observed that
more than 94% of the entities did not change their IDs, as shown in Figure 4.
Since we stopped importing some data sources and the Validator filters more
and more invalid entities, some entities have been deleted from the JKB. This
is why about 6% of entity IDs are inconsistent with the current entity IDs.</p>
        <p>Second, we show the number of matched entities. The Entity Matcher uses
two algorithms: (1) rule-based matching, which matches entities that share a
Wikipedia URL or an identifier such as an IMDb ID or ISBN, and (2) graph-based
matching, which matches entities by their types, names, and reliable
attributes (e.g., birthdate or coordinates). Figure 4 shows the number of
matched entities. Graph-based matching increases the number of matched
entities. To improve precision, graph-based matching does not match entities
derived from the same data source. For this reason, the precision of the
matching results is about 99%.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Automatic Validation and Completion Results</title>
        <p>The Validator automatically removes many invalid Triples to maintain the
quality of the JKB, as shown on the right side of Figure 3. The Validator
filters (1) facts whose domain types are inconsistent with the type of objects,
(2) facts that violate functional properties
(https://www.w3.org/TR/owl2-syntax/#Functional_Data_Properties), (3) facts
whose data types do not match their objects (e.g., if the data type is URL,
the value must start with "http"), and (4) facts whose values do not satisfy
the data type format (e.g., date or ISBN). We observed that 97.7%, 2.0%, 0.2%,
and 0.1% of all filtered data fall under (1), (2), (3), and (4), respectively.
Note that a large amount of the invalid data of (1) is mainly derived from
entities left unmapped from the LOD to our ontology; thus, the number of
filtered entities can be reduced, and this validation is one of the reasons
that the JKB maintains high accuracy.</p>
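        <p>Checks (2) to (4) are simple enough to sketch directly. The code below is an illustration under our own assumptions: the data type names, the choice of ISO 8601 dates and ISBN-13 as formats, and the reading of check (2) as "a functional predicate has more than one distinct value for the same subject" are ours and may differ from the production rules.</p>
        <preformat><![CDATA[
```python
import re
from collections import defaultdict
from datetime import date


def is_valid_url(value: str) -> bool:
    """Check (3): a URL value must start with "http"."""
    return value.startswith("http")


def is_valid_date(value: str) -> bool:
    """Check (4): accept only dates that parse as ISO 8601."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def is_valid_isbn13(value: str) -> bool:
    """Check (4): 13 digits whose weighted sum (1, 3, 1, 3, ...) is 0 mod 10."""
    digits = value.replace("-", "")
    if not re.fullmatch(r"\d{13}", digits):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


VALIDATORS = {"url": is_valid_url, "date": is_valid_date, "isbn": is_valid_isbn13}


def is_valid(value: str, data_type: str) -> bool:
    """Values of data types without a registered checker are accepted."""
    checker = VALIDATORS.get(data_type)
    return True if checker is None else checker(value)


def functional_conflicts(triples, functional_predicates):
    """Check (2): (subject, predicate) pairs with several distinct values."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in functional_predicates:
            values[(subject, predicate)].add(obj)
    return {key for key, objs in values.items() if len(objs) > 1}


conflicts = functional_conflicts(
    [("jkb:1", "birthDate", "1980-01-01"),
     ("jkb:1", "birthDate", "1981-01-01"),
     ("jkb:2", "birthDate", "1990-03-03")],
    {"birthDate"},
)
```
]]></preformat>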
        <p>The Attribute Completer adds about 1.4% of all facts in the JKB.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We presented our plugin-based KB construction system, which constructs a
scalable, production-level KB. We described an overview of our plugin-based
system architecture and each software component. Our plugin-based KB system
allows us to implement various entity-matching and validation algorithms and
also satisfy business requirements, while the system itself focuses on
ensuring our SLAs and scaling computation. Our constructed knowledge base,
JKB, is one of the largest KBs in Japanese and is already used in Japanese
Web services.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. <string-name><surname>Bellare</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Curino</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Machanavajjhala</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Mika</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Rahurkar</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Sane</surname>, <given-names>A.</given-names></string-name>: <article-title>WOO: A Scalable and Multi-tenant Platform for Continuous Knowledge Base Synthesis</article-title>. <source>VLDB '13</source>, vol. <volume>6</volume>, pp. <fpage>1114</fpage>–<lpage>1125</lpage> (Aug <year>2013</year>)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. <string-name><surname>Blanco</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Cambazoglu</surname>, <given-names>B.B.</given-names></string-name>, <string-name><surname>Mika</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Torzec</surname>, <given-names>N.</given-names></string-name>: <article-title>Entity Recommendations in Web Search</article-title>. pp. <fpage>33</fpage>–<lpage>48</lpage>. <source>ISWC '13</source> (<year>2013</year>)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. <string-name><surname>Chang</surname>, <given-names>C.H.</given-names></string-name>, <string-name><surname>Kayed</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Girgis</surname>, <given-names>M.R.</given-names></string-name>, <string-name><surname>Shaalan</surname>, <given-names>K.F.</given-names></string-name>: <article-title>A Survey of Web Information Extraction Systems</article-title>. vol. <volume>18</volume>, pp. <fpage>1411</fpage>–<lpage>1428</lpage> (<year>2006</year>)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. <string-name><surname>Choudhury</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Agarwal</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Purohit</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Pirrung</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Smith</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Thomas</surname>, <given-names>M.</given-names></string-name>: <article-title>NOUS: Construction and Querying of Dynamic Knowledge Graphs</article-title>. pp. <fpage>1563</fpage>–<lpage>1565</lpage>. <source>ICDM '17</source> (<year>2017</year>)</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. <string-name><surname>G.C.</surname>, <given-names>P.S.</given-names></string-name>, <string-name><surname>Sun</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>K.</surname>, <given-names>K.G.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Yang</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Rampalli</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Prasad</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Arcaute</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Krishnan</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Deep</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Raghavendra</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Doan</surname>, <given-names>A.</given-names></string-name>: <article-title>Why Big Data Industrial Systems Need Rules and What We Can Do About It</article-title>. pp. <fpage>265</fpage>–<lpage>276</lpage>. <source>SIGMOD '15</source> (<year>2015</year>)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. <string-name><surname>Mintz</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Bills</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Snow</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Jurafsky</surname>, <given-names>D.</given-names></string-name>: <article-title>Distant Supervision for Relation Extraction Without Labeled Data</article-title>. pp. <fpage>1003</fpage>–<lpage>1011</lpage>. <source>ACL '09</source> (<year>2009</year>)</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. <string-name><surname>Socher</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Manning</surname>, <given-names>C.D.</given-names></string-name>, <string-name><surname>Ng</surname>, <given-names>A.</given-names></string-name>: <article-title>Reasoning With Neural Tensor Networks for Knowledge Base Completion</article-title>. <source>NIPS '13</source></mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. <string-name><surname>Suchanek</surname>, <given-names>F.M.</given-names></string-name>, <string-name><surname>Kasneci</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Weikum</surname>, <given-names>G.</given-names></string-name>: <article-title>YAGO: a core of semantic knowledge unifying WordNet and Wikipedia</article-title>. pp. <fpage>697</fpage>–<lpage>706</lpage>. <source>WWW '07</source> (<year>2007</year>)</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. <string-name><surname>Wu</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Hsiao</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Cheng</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Hancock</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Rekatsinas</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Levis</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Ré</surname>, <given-names>C.</given-names></string-name>: <article-title>Fonduer: Knowledge Base Construction from Richly Formatted Data</article-title>. <source>SIGMOD '18</source> (<year>2018</year>)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>