Towards A Unified Knowledge Graph Data Management System

Baozhu Liu, Xin Wang, Pengkai Liu, Sizhuo Li
College of Intelligence and Computing, Tianjin University, Tianjin, China
{liubaozhu,wangx,liupengkai,lszskye}@tju.edu.cn

ABSTRACT
Knowledge graphs currently have two main data models: RDF graph and property graph. The query language on RDF graphs is SPARQL, while the query language on property graphs is mainly Cypher. Different data models and query languages hinder the wider application of knowledge graphs. In this paper, we propose a unified interoperable knowledge graph database system, which can effectively manage both RDF and property graphs.

Reference Format:
Baozhu Liu, Xin Wang, Pengkai Liu, Sizhuo Li. Towards A Unified Knowledge Graph Data Management System. In the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA Data 2021).

Copyright © 2021 for the individual papers by the papers' authors. Copyright © 2021 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0). Published in the Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021, Copenhagen, Denmark) on CEUR-WS.org.

1 INTRODUCTION
With the proliferation of Knowledge Graphs (KGs), the applications of KGs have grown rapidly in recent years. RDF (Resource Description Framework) graph and property graph are the two mainstream data models of KGs. On one hand, RDF has become the World Wide Web Consortium recommendation to represent KGs, and is widely used by triple stores, such as gStore [1]. On the other hand, property graphs are widely applied to graph databases, such as Neo4j [2]. It has been widely recognized that it is necessary to unify the data models and query languages for KG database management. To this end, we propose a unified KG data management system, which consists of three components, i.e., storage manager, query processing coordinator, and Web interface, making multiple KGs manageable in a unified database management system. The query processing coordinator translates queries into unified semantics denoted by relational algebra. In the storage manager, RDF graphs and property graphs are shredded into relations with specific approaches.

[Figure 1: The Overall Architecture. Layers, top to bottom: Web Interface (Visualization; SPARQL and Cypher input); Unified Knowledge Graph Query Processing (Lexical Parser, Syntax Parser, Semantic Translation); Knowledge Graph Data (Property Graph or RDF Graph); Unified Knowledge Graph Storage (Vertex Tables 1 to 3 with columns id, property; Edge Tables 1 and 2 with columns id, source, target, property).]

2 APPROACH AND NOVELTY
As shown in Fig. 1, to the best of our knowledge, the system proposed in this paper is the first KG database system that realizes a unified storage scheme, facilitates the interoperability of SPARQL and Cypher, and meanwhile provides a Web interface to visualize the query results and explanations. (1) Based on the relational model, a unified storage scheme is utilized to efficiently store RDF graphs and property graphs, and support the query requirements of knowledge graphs. (2) Using the characteristic-set-based method, the storage problem of untyped entities is addressed. (3) The interoperability of SPARQL and Cypher is realized, enabling them to interchangeably operate on the same knowledge graph. (4) With a unified Web interface, users are allowed to query with two different languages over the same KG and visualize query results and explanations.

Due to the unified storage scheme and query processing method that we utilized, it is easier to manage multiple KGs in one database. Users no longer need to switch among different database systems to obtain storage and query support for different data models.

To verify the effectiveness and efficiency of the proposed system, extensive experiments were conducted on several data sets. The experimental results show that our system outperforms gStore [1] and Neo4j [2], which are two state-of-the-art KG database systems. The comparison of the features supported by the systems is shown in Table 1.

Table 1: System Comparison.
Feature columns: Storage (RDF, Property Graph); Query (BGP, Text Search, Graph Analysis, RPQ).
ours: √ for all six features.
gStore: √ for three features, × for three.
Neo4j: √ for four features, × for two.

3 FUTURE WORKS
In order to meet the storage and query requirements of large-scale KG data, we will focus on distributed KG data management systems in the future. Moreover, more query features will be supported in the unified KG management system.

ACKNOWLEDGMENTS
This work is supported by the National Key Research and Development Program of China (2019YFE0198600); the National Natural Science Foundation of China (61972275, 61972402); and the CCF-Huawei Database Innovation Research Plan (CCF-Huawei DBIR2019004B).

REFERENCES
[1] Lei Zou, M. Tamer Özsu, Lei Chen, Xuchuan Shen, Ruizhe Huang, and Dongyan Zhao. gStore: a graph-based SPARQL query engine. The VLDB Journal, 23(4):565–590, 2014.
[2] Justin J. Miller. Graph database applications and concepts with Neo4j. In Proceedings of the Southern Association for Information Systems Conference, Atlanta, GA, USA, volume 2324, 2013.
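
As an illustration of the storage ideas in Section 2, the following minimal Python sketch shreds a handful of RDF triples into vertex and edge rows with the column layout named in Figure 1 (id, property; id, source, target, property), and then groups untyped vertices into tables by the set of property names they carry. It is not the system's implementation. The function names, the four-element triple format with an explicit literal flag, the sample data, and the simplification of a characteristic set to "the set of literal-valued property names of a vertex" are all assumptions made for this example.

```python
from collections import defaultdict


def shred_rdf(triples):
    """Shred (subject, predicate, object, object_is_literal) tuples into rows.

    Assumption for this sketch: literal-valued triples become vertex
    properties, and all other triples become edges labeled by the predicate.
    """
    vertex_props = {}  # vertex id -> {property name: value}
    edges = []         # rows of (edge id, source, target, property dict)
    for s, p, o, is_literal in triples:
        props = vertex_props.setdefault(s, {})
        if is_literal:
            props[p] = o
        else:
            # The object is an entity, so it needs a vertex row as well.
            vertex_props.setdefault(o, {})
            edges.append((len(edges) + 1, s, o, {"label": p}))
    return vertex_props, edges


def group_by_characteristic_set(vertex_props):
    """Put vertices that share the same set of property names in one table."""
    tables = defaultdict(list)
    for vid, props in vertex_props.items():
        tables[frozenset(props)].append((vid, props))
    return dict(tables)


# Hypothetical sample data, invented for this example.
triples = [
    ("alice", "name", "Alice", True),
    ("alice", "age", 30, True),
    ("bob", "name", "Bob", True),
    ("bob", "age", 25, True),
    ("alice", "knows", "bob", False),
    ("tju", "label", "Tianjin University", True),
    ("alice", "studiesAt", "tju", False),
]

vertex_props, edges = shred_rdf(triples)
tables = group_by_characteristic_set(vertex_props)

for char_set, rows in tables.items():
    print(sorted(char_set), [vid for vid, _ in rows])
for edge in edges:
    print(edge)
```

In this sketch, entities without an explicit type still end up in a table with a fixed set of columns, because the table is chosen from the properties the entity actually has. A property graph could be loaded into the same two row shapes by using vertex labels instead of characteristic sets to pick the vertex table, which is one way to read the "unified" storage layer in Figure 1.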