=Paper= {{Paper |id=None |storemode=property |title=VKGBuilder - A Tool of Building and Exploring Vertical Knowledge Graphs |pdfUrl=https://ceur-ws.org/Vol-1272/paper_141.pdf |volume=Vol-1272 |dblpUrl=https://dblp.org/rec/conf/semweb/RuanWH14 }} ==VKGBuilder - A Tool of Building and Exploring Vertical Knowledge Graphs== https://ceur-ws.org/Vol-1272/paper_141.pdf
VKGBuilder – A Tool of Building and Exploring
        Vertical Knowledge Graphs

                   Tong Ruan, Haofen Wang, and Fanghuai Hu

       East China University of Science & Technology, Shanghai, 200237, China
             {ruantong,whfcarter}@ecust.edu.cn, xiaohuqi@126.com



       Abstract. Recently, search engine companies like Google and Baidu are
       building their own knowledge graphs to empower the next generation of
       Web search. Due to the success of knowledge graphs in search, customers
       from vertical sectors are eager to embrace KG related technologies to
       develop domain specific semantic platforms or applications. However,
       they lack skills or tools to achieve the goal. In this paper, we present
       an integrated tool VKGBuilder to help users manage the life cycle of
       knowledge graphs. We will describe three modules of VKGBuilder in
       detail which construct, store, search and explore knowledge graphs in
       vertical domains. In addition, we will demonstrate the capability and
       usability of VKGBuilder via a real-world use case in the library industry.


1     Introduction
Recently, an increasing amount of semantic data sources are published on the
Web. These sources are further interlinked to form Linking Open Data (LOD).
Search engine companies like Google and Baidu leverage LOD to build their
own semantic knowledge bases (called knowledge graphs 1 ) to empower semantic
search. The success of KGs in search attracts much attention from users in
vertical sectors. They are eager to embrace related technologies to build semantic
platforms in their domains. However, they either lack skills to implement such
platforms from scratch or fail to find sufficient tools to accomplish the goal.
    Compared with general-purpose KGs, knowledge graphs in vertical industries
(denoted as VKG) have the following characteristics: a) More accurate and richer
data of certain domains to be used for business analysis and decision making;
b) Top-down construction to ensure the data quality and stricter schema while
general KGs are built in a bottom-up manner with more emphasis on the wide
coverage of data from different domains; c) Internal data stored in RDBs are
further considered to be integrated into VKGs; and d) Besides search, VKGs
should provide user interfaces especially for KG construction and maintenance.
    While there exist tool suites (e.g., LOD2 Stack2 ) which help to build and
explore LOD, these tools are mainly developed for researchers and developers
of the Semantic Web community. Vertical markets, on the other hand, need
1
    http://en.wikipedia.org/wiki/Knowledge_Graph
2
    http://lod2.eu/
2               Tong Ruan et al.

                                   Schema Inconsistency or Data Conflict
            Knowledge Integration Module
                      D2R
     RDB           Importer               Schema
                      LOD              Expansion and            Schema
     LOD                                 Alignment               Editor
                     Linker
                      UGC
     UGC                                                      Data Editor
                    Wrapper                   Data
                  Information              Enrichment
     Text
                    Extractor
             Incremental Schema Design and Data Enrichment
              Knowledge Store Module
                            Virtual Graph Database
            Knowledge Access Module
     Restful          Visual Explorer           Semantic Search With
      API         (Card View,Wheel View)      Natural Language Interface




    Fig. 1. Architecture of VKGBuilder                                      Fig. 2. Semantic Search Interface


end-to-end solutions to manage the life cycle of knowledge graphs and hide the
technical details as much as possible. To the best of our knowledge, we present
the first suitable tool for vertical industry users called VKGBuilder. It allows
rapid and continuous VKG construction which imports and extracts data from
diverse data sources, provides a mechanism to detect intra- and inter-data source
conflicts, and consolidates these data into a consistent KG. It also provides
intuitive and user-friendly interfaces for novice users with little knowledge of
semantic technologies to understand and exploit the underlying VKG.


2       Description of VKGBuilder
VKGBuilder is composed of three modules namely the Knowledge Integration
module, the Knowledge Store module, and the Knowledge Access module. The
whole architecture is shown in Figure 1. Knowledge Integration is the core mod-
ule for VKG construction with three main components. Knowledge Store is a
virtual graph database which combines RDBs, in-memory stores and inverted
indexes to support fast access of VKG in different scenarios, and the Knowledge
Access module provides different interfaces for end users and applications.

2.1          Knowledge Integration Module
    – Data Importers and Information Extractors. Structured data from internal
      relational database are imported and converted into RDF triples by D2R
      importers3 . A LOD Linker is developed to enrich VKG with domain on-
      tologies from the public linked open data. For the user generated contents
      (UGCs), we mainly consider encyclopaedic sites like Wikipedia, Baidu Baike,
      and Hudong Baike. Due to the semi-structured nature of these sites, wrap-
      pers automatically extract properties and values of certain entities. As for
      unstructured text, distant-supervised learning methods are adapted to dis-
      cover missing relations between entities or fill property values of a given
      entity where the above extracted semantic data serve as seeds.
3
    http://d2rq.org/
VKGBuilder – A Tool of Building and Exploring Vertical Knowledge Graphs           3

 – Schema Inconsistency and Data Conflict Detection. After semantic data are
   extracted or imported from various sources, data integration is performed
   to build an integrated knowledge graph. During integration, schema-level
   inconsistency and data-level conflicts might occur. Schema editing is used
   to define axioms of properties such as (e.g., functional, inverse, transitive),
   concept subsumptions, and concepts of entities. Then a rule-based validator
   is triggered to check whether the newly added data or imported ontologies
   will cause any conflicts with existing ones. The possible conflicts are resolved
   by user defined rules or delivered to domain experts for human intervention.
 – Schema and Data Editor. Knowledge workers can extend or refine a VKG
   in both schema-level and data-level with a collaborative editing interface.


2.2   Knowledge Access Module

 – Visual Explorer. It includes three views namely the Wheel View, the Card
   View, and the Detail View. The Wheel View organizes concepts and entities
   in two wheels. In the left wheel, the node of interest is displayed in the
   center. If it is a concept, its child concepts are neighbors in the same wheel.
   If it is an entity, its related entities are connected via properties as outgoing
   (or incoming) edges. When a related concept (or entity) is clicked, the right
   wheel is expanded with the clicked node in the center surrounded with its
   related information on the VKG. Thus, we allow users to navigate through
   the concept hierarchy and traverse between different entities. The Card View
   visualizes entities in a force-directed graph layout, which is similar to the
   galaxy visualization in a 3D space. The Card View also allows to change the
   focus through drag and drop as well as zoom-in and zoom-out. The Detailed
   View shows all properties and property values of a particular entity. The
   three views can be switched from one to another in a flexible way.
 – Semantic Search with Natural Language Interface. Users can submit any
   keyword query or natural language question. The query is interpreted into
   possible SPARQL queries with natural language descriptions. Once a SPAR-
   QL query is selected, the corresponding answers are returned, along with
   relevant documents which contain semantic annotations on these answers.
   Besides, a summary (a.k.a, knowledge card) of the main entity mentioned in
   the query or the top-ranked answer is shown. Related entities defined in the
   VKG as well as correlated entities in the query log are recommended.
 – Restful APIs. They are designed for developers with little knowledge of se-
   mantic technologies to access the VKG using any programming language
   from any platform at ease. These APIs are actually manipulations of SPAR-
   QL queries to support graph traversal or sub-graph matching on the VKG.


3     Demonstration

VKGBuilder is first used in the ZhouShan Library. The current VKG (marine-
oriented KG) contains more than 32,000 fishes and each fish has more than 20
4       Tong Ruan et al.




      Fig. 3. Wheel View                  Fig. 4. Conflict Resolution


properties. Besides fishes, VKGBuilder also captures knowledge about fishing
grounds, fish processing methods, related researchers and local enterprises. An
online demo video of VKGBuilder can be downloaded at http://202.120.1.49:
19155/SSE/video/VKGBuilder.wmv.
    Figure 2 shows a snapshot of the semantic search interface. When a user
enters a query “Distribution of Little Yellow Croaker”, VKGBuilder first seg-
ments the query into “Little Yellow Croaker” and “Distribution”. Here, “Little
Yellow Croaker” is recognized as a fish, and properties about “distribution” are
returned. Then all sub-graphs connecting the fish with each property are found
as possible SPARQL query interpretations of the input query. Top interpreta-
tions whose scores are above a threshold are returned with natural language
descriptions for further selection. Once a user selects a query, the answers (e.g.,
China East Sea) are returned. Also, related books with these answers as seman-
tic annotations are returned. The related library classification of these books are
displayed in the left, and the knowledge card as well as related concepts and
entities of Little Yellow Croaker are listed in the right panel.
    In Figure 3, the Wheel View initially shows the root concept (owl:Thing)
in the center of the left wheel (denoted as LC). When a sub-concept Fish is
clicked, it becomes the center of the right wheel (denoted as RC) with its child
concepts (e.g., Chondrichthyes). We can also navigate between entities. For
instance, selenium is one of the nutrients of Little Yellow Croaker. When clicking
selenium, all fishes containing this nutrient are shown in the right wheel.
    The user experience heavily depends on the quality of the underlying VKG.
The extraction and importing are executed automatically in the back-end while
we provide a user interface for conflict resolution. For “Little Yellow Croaker”,
we extract Ray-finned Fishes and Actinopterygii from different sources as
values of the property Class in the scientific classification. Since Class is defined
as a functional property and the two values do not refer to the same thing, a
conflict occurs. As shown in Figure 4, VKGBuilder accepts Actinopterygii as
the final value because this value is extracted from more trusted sources.
    Acknowledgements This work is funded by the National Key Technology
R&D Program through project No. 2013BAH11F03.