<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On Directly Mapping Relational Databases to Property Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Radu Stoica</string-name>
          <email>fr.a.stoica@student</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Fletcher</string-name>
          <email>g.h.l.fletcher@gtue.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan F. Sequeda</string-name>
          <email>juan@capsenta.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Capsenta</institution>
          ,
          <addr-line>Austin, Texas</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Much of the data found in practice resides in relational DBs. However, many contemporary analytical tasks are performed on graphs. Property graphs are currently one of the most prevalent data models for graph data management in industry. Therefore, a key challenge is to understand the fundamental relationships between relational databases and property graph databases. This paper reports our ongoing work towards understanding these relationships by proposing R2PG-DM, a direct mapping of relational databases to property graphs. Given a relational database schema and instance, a direct mapping generates a corresponding property graph instance. The semantics of our mapping is de ned using Datalog. Our work is inspired by existing approaches for direct mappings of relational databases into earlier graph data models. Future work is to study our mapping with respect to fundamental properties such as information and query preservation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        A major class of contemporary data analytics focuses on gaining insights from the
rich complex patterns found in graph-structured data collections such as social,
communication, nancial, biological, and mobility networks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For example,
investigative journalists have recently found, through graph analytics, surprising
social relationships between executives of companies within the O shore Leaks
nancial social network data set, linking company o cers and their companies
registered in the Bahamas.3 Property graphs (PG) are currently one of the most
prevalent data models for the management of such data in industry [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A PG
is an edge- and node-labeled directed multigraph where both edges and nodes
have associated sets of properties, i.e., key-value pairs.
      </p>
      <p>The O shore Leaks PG was constructed as a mapping from relational database
(RDB) sources.4 Indeed, much of the data found in practice resides in RDBs.
Therefore, a key challenge is to understand the fundamental relationships
between RDBs and PGs. A crucial rst step in exploratory graph analytics on</p>
      <p>RDBs is to transform the database into a PG. Only then can the basic graph
structure be explored and new relationships discovered using contemporary graph
database systems.</p>
      <p>This motivates the study of direct mappings from RDBs to PGs. A direct
mapping is a transformation from RDBs to PGs which (1) is domain and
schemaindependent, i.e., works regardless of the source database schema and instance,
and (2) transforms the content of the source instance into a target instance, i.e.,
given a RDB instance generates a corresponding PG instance.</p>
      <p>
        State of the art. Most approaches to graph analytics over relational data
extract graphs as user-speci ed views on relational data, e.g., [
        <xref ref-type="bibr" rid="ref3 ref8">3,8</xref>
        ]. This requires,
however, that the user already fully understands the desired graph view, which
is overly restrictive in the common case of exploratory graph analytics.
      </p>
      <p>
        The study of direct mappings from relational to property graphs is not as
well-developed. Of the small handful of approaches, the focus has been on
optimized layout for e cient query processing or mappings which are lossy or
obfuscate the input RDB schema [
        <xref ref-type="bibr" rid="ref4 ref5 ref7 ref9">4,5,7,9</xref>
        ]. Furthermore, all current direct mappings
are de ned procedurally (i.e., de ned with pseudo-code), making it di cult to
formally reason about their correctness and other basic properties.
Our contributions. In this short note, we report on our work-in-progress on
R2PG-DM, a declarative direct mapping from RDB to PG. R2PG-DM losslessly
transforms both the source instance and schema while also intuitively preserving
the original structure of the input RDB. Our work is inspired by the approach of
Sequeda et al. to direct mappings from relational to RDF graphs, the W3C
standard for sharing graph-structured data on the web [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We present R2PG-DM by
example and outline the basic research questions we are currently investigating
in our study of the connections between RDB and PG.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 RDBs, PGs, and direct mappings</title>
      <p>Let D be an enumerable set of data values containing the special value null, and
let A be an enumerable set of attribute names containing the special attribute
tid. An RDB schema is a triple S = (R; att; ), where R is a nite set of relation
names, att is a function assigning to each r 2 R a nite set att(r) A n ftidg,
and is a nite set of primary and foreign keys over R and att.5 For r 2 R, an
r-tuple is a function t : att(r) [ ftidg ! D such that t(tid) 6= null. An instance I
of S is an assignment to each r 2 R of a nite set I(r) of r-tuples satisfying ,
such that for distinct t; t0 2 Sr2R I(r) it holds that t(tid) 6= t0(tid). An example
schema and instance is given in Figure 1 (top left).</p>
      <p>A property graph is a structure (V; E; e; `; p) where V is a nite set of vertices,
E is a nite set of edges, e is a function assigning an ordered pair of vertices
to each edge (i.e., the source and target vertices of the edge, resp.), ` assigns a
nite set of labels to each vertex and edge (from some domain of labels), and
p assigns a nite set of key-value pairs to each vertex and edge. An example
property graph is given in Figure 1 (bottom left).
5 https://en.wikipedia.org/wiki/Foreign_key</p>
      <p>On Directly Mapping Relational Databases to Property Graphs</p>
      <p>Let PG denote the set of all PGs and RDB denote the set of all pairs (S; I)
where S is an RDB schema and I is an instance of S. A direct mapping is a total
function from RDB to PG.
3</p>
    </sec>
    <sec id="sec-3">
      <title>The R2PG-DM direct mapping</title>
      <p>We next informally present R2PG-DM, a direct mapping which addresses the
shortcomings of current solutions discussed in Section 1. With R2PG-DM, RDB
relations are interpreted as classes of \things" (i.e., labeled vertices) and
foreignkey relationships are interpreted as edges between these things. In particular:
{ each input r-tuple t is represented in the output graph by a vertex v
(identi ed by t(tid)), which is labeled with r, the name of the relation to which t
belongs; furthermore, v has key-value pair (a; t(a)), for each a 2 att(r).
{ for each foreign key relationship f 2 (from, say, relation r to relation
s), for each pair of tuples t 2 I(r) and t0 2 I(s) participating in f , this
relationship is represented by an edge with label r-s from vertex vt to vertex
vt0 , representing t and t0, respectively.</p>
      <p>Similarly, as there currently does not exist a standard PG schema language,
the input schema is also fully represented in the output PG, e.g., each relation
and each attribute of a relation is represented by a vertex, and foreign keys are
represented by edges between the relevant attributes of relations, etc. Figure 1
illustrates an RDB database (S; I) and PG translation R2PG-DM(S; I). We omit
here the translation of S to PG, and only illustrate the translation of I.</p>
      <p>
        Clearly, each component of the R2PG-DM mapping can be speci ed by a
declarative non-recursive Datalog query [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] over the source DB represented using
a xed set of predicates (see Section 4.1 \Storing relational databases" of Sequeda
et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), with the target PG represented by three predicates V ertex, Edge, and
P roperty capturing the vertices, edges, and the key-value properties associated
with vertices and edges, respectively. This is illustrated in Figure 1 (right).
      </p>
    </sec>
    <sec id="sec-4">
      <title>4 Ongoing research</title>
      <p>
        Our ongoing work proceeds along three lines. First, we are proceeding with a
formal study of R2PG-DM, establishing basic properties such as information and
query preservation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]; we hypothesize that R2PG-DM generates a graph that is
isomorphic as a graph generated by the direct mapping of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Moreover, we are
studying how to extend R2PG-DM in order to generate a graph with properties
on edges, taking full advantage of the data model. Second, we are developing
practical tools for e cient and scalable R2PG-DM, to be made available as
open source code. Third, we aim to extend the R2PG-DM approach to support
customized mappings for the cases where there is a schema de ned on the target
instance (e.g., mapping the relational schema S to an equivalent PG schema).
This builds upon ongoing e orts towards standards for PG schema languages.6
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Angela</given-names>
            <surname>Bonifati</surname>
          </string-name>
          , George Fletcher, Hannes Voigt,
          <string-name>
            <given-names>Nikolay</given-names>
            <surname>Yakovets</surname>
          </string-name>
          . Querying Graphs. Morgan &amp; Claypool,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Todd</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <string-name>
            <surname>Green</surname>
          </string-name>
          et al.
          <article-title>Datalog and recursive query processing</article-title>
          .
          <source>Foundations and Trends in Databases</source>
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>105</fpage>
          -
          <lpage>195</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Mohamed</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Hassan</surname>
          </string-name>
          et al.
          <article-title>Extending in-memory relational database engines with native graph support</article-title>
          .
          <source>In EDBT 2018</source>
          , pages
          <fpage>25</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Ognjen</given-names>
            <surname>Orel</surname>
          </string-name>
          , Slaven Zakosek,
          <string-name>
            <given-names>Mirta</given-names>
            <surname>Baranovic</surname>
          </string-name>
          .
          <article-title>Property oriented relational-tograph database conversion</article-title>
          .
          <source>Automatika</source>
          <volume>57</volume>
          (
          <issue>3</issue>
          ):
          <fpage>836</fpage>
          -
          <lpage>845</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Subhesh</given-names>
            <surname>Pradhan</surname>
          </string-name>
          , Sharma Chakravarthy,
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Telang</surname>
          </string-name>
          .
          <article-title>Modeling relational data as graphs for mining</article-title>
          .
          <source>In COMAD</source>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Juan</surname>
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Sequeda</surname>
          </string-name>
          , Marcelo Arenas, Daniel P. Miranker.
          <article-title>On directly mapping relational databases to RDF and OWL</article-title>
          .
          <source>In WWW 2012</source>
          , pages
          <fpage>649</fpage>
          -
          <lpage>658</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Roberto De Virgilio, Antonio Maccioni, Riccardo Torlone.
          <article-title>R2G: a tool for migrating relations to graphs</article-title>
          .
          <source>In EDBT 2014</source>
          , pages
          <fpage>640</fpage>
          -
          <lpage>643</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Konstantinos</given-names>
            <surname>Xirogiannopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Amol</given-names>
            <surname>Deshpande</surname>
          </string-name>
          .
          <article-title>Extracting and analyzing hidden graphs from relational databases</article-title>
          .
          <source>In SIGMOD 2017</source>
          , pages
          <fpage>897</fpage>
          -
          <lpage>912</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Kang</given-names>
            <surname>Min</surname>
          </string-name>
          <string-name>
            <given-names>Yoo</given-names>
            , Sungchan Park,
            <surname>Sang-Goo Lee</surname>
          </string-name>
          .
          <article-title>RDB2Graph: a generic framework for modeling relational databases as graphs</article-title>
          .
          <source>In JIST 2014</source>
          , pages
          <fpage>148</fpage>
          -
          <lpage>151</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>