<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>VKGBuilder - A Tool of Building and Exploring Vertical Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tong Ruan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haofen Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fanghuai Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>East China University of Science &amp; Technology</institution>
          ,
          <addr-line>Shanghai, 200237</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recently, search engine companies like Google and Baidu have been building their own knowledge graphs (KGs) to empower the next generation of Web search. Due to the success of knowledge graphs in search, customers from vertical sectors are eager to embrace KG-related technologies to develop domain-specific semantic platforms or applications. However, they lack the skills or tools to achieve this goal. In this paper, we present an integrated tool, VKGBuilder, to help users manage the life cycle of knowledge graphs. We describe in detail the three modules of VKGBuilder, which construct, store, search and explore knowledge graphs in vertical domains. In addition, we demonstrate the capability and usability of VKGBuilder via a real-world use case in the library industry.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>[Figure 1: architecture of VKGBuilder. Recoverable labels: Knowledge Integration Module (data sources RDB, LOD, UGC and Text; D2R Importer, LOD Linker, UGC Wrapper, Text Information Extractor; Schema Inconsistency or Data Conflict; Schema Alignment; Data Enrichment; Schema Editor; Data Editor; Incremental Schema Design and Data Enrichment), Knowledge Store Module (Virtual Graph Database), Knowledge Access Module (Restful API; Visual Explorer with Card View and Wheel View; Semantic Search with Natural Language Interface).]</p>
      <p>[...] end-to-end solutions to manage the life cycle of knowledge graphs and hide the
technical details as much as possible. To the best of our knowledge, we present
the first suitable tool for vertical industry users, called VKGBuilder. It allows
rapid and continuous VKG construction: it imports and extracts data from
diverse data sources, provides a mechanism to detect intra- and inter-data source
conflicts, and consolidates these data into a consistent KG. It also provides
intuitive and user-friendly interfaces that let novice users with little knowledge of
semantic technologies understand and exploit the underlying VKG.</p>
    </sec>
    <sec id="sec-2">
      <title>Description of VKGBuilder</title>
      <p>VKGBuilder is composed of three modules, namely the Knowledge Integration
module, the Knowledge Store module, and the Knowledge Access module. The
whole architecture is shown in Figure 1. Knowledge Integration is the core
module for VKG construction, with three main components. Knowledge Store is a
virtual graph database which combines RDBs, in-memory stores and inverted
indexes to support fast access to the VKG in different scenarios, and the Knowledge
Access module provides different interfaces for end users and applications.</p>
      <p>Knowledge Integration Module</p>
      <p>- Data Importers and Information Extractors. Structured data from internal
relational databases are imported and converted into RDF triples by D2R
importers (http://d2rq.org/). A LOD Linker is developed to enrich the VKG with domain
ontologies from public linked open data. For user-generated content
(UGC), we mainly consider encyclopaedic sites like Wikipedia, Baidu Baike,
and Hudong Baike. Due to the semi-structured nature of these sites,
wrappers automatically extract properties and values of certain entities. As for
unstructured text, distant-supervised learning methods are adapted to
discover missing relations between entities or to fill property values of a given
entity, where the semantic data extracted above serve as seeds.</p>
      <p>- Schema Inconsistency and Data Conflict Detection. After semantic data are
extracted or imported from various sources, data integration is performed
to build an integrated knowledge graph. During integration, schema-level
inconsistencies and data-level conflicts might occur. Schema editing is used
to define axioms of properties (e.g., functional, inverse, transitive),
concept subsumptions, and concepts of entities. Then a rule-based validator
is triggered to check whether the newly added data or imported ontologies
cause any conflicts with existing ones. The possible conflicts are resolved
by user-defined rules or delivered to domain experts for human intervention.</p>
      <p>- Schema and Data Editor. Knowledge workers can extend or refine a VKG
at both the schema level and the data level with a collaborative editing interface.</p>
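      <p>The import step described above, in which relational rows become RDF triples, can be illustrated with a minimal sketch. It is not the D2RQ mapping language and not VKGBuilder's importer: the function name row_to_triples, the base URI http://example.org/vkg/ and the sample row are assumptions made purely for illustration.</p>
      <preformat>
```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def row_to_triples(table: str, key_column: str, row: Dict[str, object],
                   base: str = "http://example.org/vkg/") -> List[Triple]:
    """Map one relational row to (subject, predicate, object) triples.

    Hypothetical convention: the subject URI is built from the table name
    and the primary key, and every other non-NULL column becomes a property.
    """
    subject = f"{base}{table}/{row[key_column]}"
    triples: List[Triple] = [(subject, "rdf:type", f"{base}{table}")]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key lives in the subject URI; NULLs yield no triple
        triples.append((subject, f"{base}{table}#{column}", str(value)))
    return triples


# Illustrative row; the column names and the id are made up.
fish_row = {"id": 17, "name": "Little Yellow Croaker",
            "class": "Actinopterygii", "habitat": None}
triples = row_to_triples("fish", "id", fish_row)
for triple in triples:
    print(triple)
```
      </preformat>
      <p>Each row yields one type triple plus one triple per populated column, which is the general shape of table-to-RDF conversion; a real mapping would additionally declare datatypes and link foreign keys to other resources.</p>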
      <p>Knowledge Access Module</p>
      <p>- Visual Explorer. It includes three views, namely the Wheel View, the Card
View, and the Detail View. The Wheel View organizes concepts and entities
in two wheels. In the left wheel, the node of interest is displayed in the
center. If it is a concept, its child concepts are neighbors in the same wheel.
If it is an entity, its related entities are connected via properties as outgoing
(or incoming) edges. When a related concept (or entity) is clicked, the right
wheel is expanded with the clicked node in the center, surrounded by its
related information in the VKG. Thus, users can navigate through
the concept hierarchy and traverse between different entities. The Card View
visualizes entities in a force-directed graph layout, which is similar to a
galaxy visualization in 3D space. The Card View also allows users to change the
focus through drag and drop as well as zoom-in and zoom-out. The Detail
View shows all properties and property values of a particular entity. Users can
switch flexibly between the three views.</p>
      <p>- Semantic Search with Natural Language Interface. Users can submit any
keyword query or natural language question. The query is interpreted into
possible SPARQL queries with natural language descriptions. Once a
SPARQL query is selected, the corresponding answers are returned, along with
relevant documents which contain semantic annotations on these answers.
In addition, a summary (a.k.a. knowledge card) of the main entity mentioned in
the query or of the top-ranked answer is shown. Related entities defined in the
VKG as well as correlated entities in the query log are recommended.</p>
      <p>- Restful APIs. They are designed for developers with little knowledge of
semantic technologies to access the VKG easily, using any programming language
on any platform. Internally, these APIs wrap
SPARQL queries that support graph traversal or sub-graph matching on the VKG.</p>
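      <p>The graph traversal that backs both the Wheel View (outgoing and incoming edges of the node in focus) and the Restful APIs can be illustrated with a small in-memory triple-pattern matcher. This is only a sketch of the lookup semantics, not VKGBuilder's implementation, which issues SPARQL against the Knowledge Store; the function match_pattern, the identifiers and the entity ExampleFish are hypothetical.</p>
      <preformat>
```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def match_pattern(triples: Iterable[Triple], pattern: Triple) -> List[Dict[str, str]]:
    """Return one variable binding per triple that matches the pattern.

    Terms starting with '?' are variables; every other term must be equal
    to the corresponding term of the triple.
    """
    results: List[Dict[str, str]] = []
    for triple in triples:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable that occurs twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Toy graph; ExampleFish is a made-up entity used only for illustration.
graph = [
    ("LittleYellowCroaker", "hasNutrient", "Selenium"),
    ("LittleYellowCroaker", "distribution", "ChinaEastSea"),
    ("ExampleFish", "hasNutrient", "Selenium"),
]

# Outgoing edges of an entity (what a wheel shows around its center).
outgoing = match_pattern(graph, ("LittleYellowCroaker", "?p", "?o"))
# Incoming edges: every subject that has Selenium as a nutrient.
incoming = match_pattern(graph, ("?s", "hasNutrient", "Selenium"))
print(outgoing)
print(incoming)
```
      </preformat>
      <p>A single pattern corresponds to one triple pattern in a SPARQL WHERE clause; sub-graph matching would join the bindings of several such patterns on their shared variables.</p>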
    </sec>
    <sec id="sec-3">
      <title>Demonstration</title>
      <p>VKGBuilder is first used in the ZhouShan Library. The current VKG
(a marine-oriented KG) contains more than 32,000 fishes, and each fish has more than 20
properties. Besides fishes, VKGBuilder also captures knowledge about fishing
grounds, fish processing methods, related researchers and local enterprises. An
online demo video of VKGBuilder can be downloaded at
http://202.120.1.49:19155/SSE/video/VKGBuilder.wmv.</p>
      <p>Figure 2 shows a snapshot of the semantic search interface. When a user
enters the query "Distribution of Little Yellow Croaker", VKGBuilder first
segments the query into "Little Yellow Croaker" and "Distribution". Here, "Little
Yellow Croaker" is recognized as a fish, and properties about "distribution" are
returned. Then all sub-graphs connecting the fish with each property are found
as possible SPARQL query interpretations of the input query. The
interpretations whose scores are above a threshold are returned with natural language
descriptions for further selection. Once a user selects a query, the answers (e.g.,
China East Sea) are returned. Related books with these answers as
semantic annotations are returned as well. The related library classification of these books is
displayed on the left, and the knowledge card as well as related concepts and
entities of Little Yellow Croaker are listed in the right panel.</p>
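      <p>The segmentation and interpretation steps just described can be sketched as follows. The sketch assumes a greedy longest-match lookup against a small lexicon and covers only the simplest sub-graph, a single edge from the entity to the property; the lexicon, the vkg: identifiers and the function names are hypothetical, and interpretation scoring is omitted because the paper does not specify it.</p>
      <preformat>
```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: lower-cased surface form -> (kind, VKG identifier).
LEXICON: Dict[str, Tuple[str, str]] = {
    "little yellow croaker": ("entity", "vkg:LittleYellowCroaker"),
    "distribution": ("property", "vkg:distribution"),
}

Segment = Tuple[str, str, str]  # (phrase, kind, identifier)


def segment(query: str) -> List[Segment]:
    """Greedy longest-match segmentation of a query against the lexicon."""
    tokens = query.lower().split()
    segments: List[Segment] = []
    i = 0
    while i != len(tokens):
        for j in range(len(tokens), i, -1):  # try the longest span first
            phrase = " ".join(tokens[i:j])
            if phrase in LEXICON:
                kind, ident = LEXICON[phrase]
                segments.append((phrase, kind, ident))
                i = j
                break
        else:
            i += 1  # token not in the lexicon (e.g. "of"): skip it
    return segments


def interpret(segments: List[Segment]) -> List[str]:
    """Pair each recognized entity with each property as a one-edge query."""
    entities = [ident for _, kind, ident in segments if kind == "entity"]
    properties = [ident for _, kind, ident in segments if kind == "property"]
    return [f"SELECT ?answer WHERE {{ {e} {p} ?answer . }}"
            for e in entities for p in properties]


segments = segment("Distribution of Little Yellow Croaker")
queries = interpret(segments)
print(segments)
print(queries)
```
      </preformat>
      <p>With several matching properties, interpret would return one candidate per property, which corresponds to the list of interpretations offered to the user for selection.</p>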
      <p>In Figure 3, the Wheel View initially shows the root concept (owl:Thing)
in the center of the left wheel (denoted as LC). When the sub-concept Fish is
clicked, it becomes the center of the right wheel (denoted as RC) with its child
concepts (e.g., Chondrichthyes). We can also navigate between entities. For
instance, selenium is one of the nutrients of Little Yellow Croaker. When
selenium is clicked, all fishes containing this nutrient are shown in the right wheel.</p>
      <p>The user experience heavily depends on the quality of the underlying VKG.
Extraction and importing are executed automatically in the back-end, while
we provide a user interface for conflict resolution. For "Little Yellow Croaker",
we extract Ray-finned Fishes and Actinopterygii from different sources as
values of the property Class in the scientific classification. Since Class is defined
as a functional property and the two values do not refer to the same thing, a
conflict occurs. As shown in Figure 4, VKGBuilder accepts Actinopterygii as
the final value because this value is extracted from more trusted sources.</p>
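      <p>The conflict in this example can be reproduced with a short sketch of a functional-property check followed by trust-based resolution. The source names, the trust scores and the rule of choosing the value backed by the most trusted source are assumptions for illustration; the paper states only that the accepted value comes from more trusted sources.</p>
      <preformat>
```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (subject, property, value, source); source names are placeholders.
Statement = Tuple[str, str, str, str]
Candidate = Tuple[str, str]  # (value, source)

FUNCTIONAL = {"Class"}                      # properties allowing one value
TRUST = {"source_a": 0.9, "source_b": 0.6}  # hypothetical trust scores


def find_conflicts(statements: List[Statement]) -> Dict[Tuple[str, str], List[Candidate]]:
    """Group values of functional properties and keep groups with 2+ distinct values."""
    grouped: Dict[Tuple[str, str], List[Candidate]] = defaultdict(list)
    for subject, prop, value, source in statements:
        if prop in FUNCTIONAL:
            grouped[(subject, prop)].append((value, source))
    return {key: candidates for key, candidates in grouped.items()
            if len({value for value, _ in candidates}) > 1}


def resolve(candidates: List[Candidate]) -> str:
    """Pick the value reported by the most trusted source (unknown sources score 0)."""
    return max(candidates, key=lambda c: TRUST.get(c[1], 0.0))[0]


statements = [
    ("Little Yellow Croaker", "Class", "Ray-finned Fishes", "source_b"),
    ("Little Yellow Croaker", "Class", "Actinopterygii", "source_a"),
    ("Little Yellow Croaker", "Nutrient", "Selenium", "source_b"),
]
conflicts = find_conflicts(statements)
resolved = {key: resolve(candidates) for key, candidates in conflicts.items()}
print(resolved)
```
      </preformat>
      <p>Conflicts that such a rule cannot settle, for example two sources with equal trust, would be the ones delivered to domain experts through the conflict resolution interface.</p>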
      <p>Acknowledgements. This work is funded by the National Key Technology
R&amp;D Program through project No. 2013BAH11F03.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>