<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>openCypher Queries over Combined RDF and LPG Data in Amazon Neptune</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Willem Broekema</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Elzarei</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ora Lassila</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Manuel Lopez Enriquez</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcin Neyman</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florian Schmedding</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Schmidt</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Steigmiller</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Geo Varkey</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gregory Todd Williams</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amanda Xiang</string-name>
        </contrib>
        <aff>Amazon Neptune Team, Amazon Web Services, Seattle</aff>
      </contrib-group>
      <abstract>
        <p>The Semantic Web stack, with the Resource Description Framework (RDF) as its foundation, envisioned enhancing the Web with a globally connected network of knowledge [1]. RDF has proven itself for data serialization, interlinking at global scale, and reasoning. In contrast, Labeled Property Graphs (LPGs) emerged organically from companies and organizations, resulting in different flavors of the LPG data model without built-in semantics and with looser constraints. Users building graph applications today have to weigh the respective characteristics of both graph technologies and then commit to one, with considerable switching costs. The chosen stack determines the supported data formats and restricts the possible query expressivity. At Amazon Neptune we have seen that ontologists and data scientists typically choose RDF, whereas graph projects initiated by software engineers gravitate towards LPG. Our proposed OneGraph [3] metamodel is meant to remove the need to choose, and the feature highlighted here was developed in that context. With support for openCypher over RDF data in Neptune Analytics, users can now combine RDF and LPG data in a single, integrated database and run, for example, path queries and algorithms that they could not run before with SPARQL alone, while keeping the benefit of RDF features like IRIs [2].</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Presentation</title>
      <p>Imagine an engineering company that tracks how parts, grouped into components, are integrated
in products. RDF is suitable for expressing such hierarchies, but SPARQL does not offer paths
as first-class values, as shown below, so users would have to resort to LPG and openCypher
path queries to see how items are connected. The same applies to modelling other sequential
processes like supply chains. Another example is a virtual assistant created by developers using
openCypher, where the assistant’s knowledge should be enriched with internal RDF datasets
maintained by ontologists, or with external open RDF datasets, in order to better answer questions.</p>
      <p>RDF and LPG have different data models that define what a graph is. In RDF a graph is
built from triples, using IRIs for resources. In LPGs a graph consists of vertices, edges between
vertices, and labels and properties for vertices and edges. To overcome this structural difference,
we created an interpretation in which any triple corresponds to the declaration of either a vertex
(with an id and label), a vertex property, or an edge (with a label).</p>
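      <p>As a rough illustration of such an interpretation, the following Python sketch reads a list of triples as LPG declarations. The concrete rules used here are assumptions made for illustration and may differ from what Neptune Analytics implements: an rdf:type triple declares a vertex label, a triple with a literal object declares a vertex property, and any other triple declares an edge labelled by its predicate. The names Iri, Literal, Vertex and interpret, and the example.org data, are hypothetical.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Iri:
    value: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass
class Vertex:
    id: str
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


def interpret(triples):
    """Read (subject, predicate, object) triples as LPG declarations.

    Assumed rules (illustrative only):
      rdf:type with an IRI object -> vertex label
      literal object              -> vertex property
      anything else               -> edge labelled by the predicate
    """
    vertices = {}
    edges = []

    def vertex(iri):
        # Every IRI used as a subject or edge target becomes a vertex id.
        return vertices.setdefault(iri.value, Vertex(id=iri.value))

    for s, p, o in triples:
        subject = vertex(s)
        if p.value == RDF_TYPE and isinstance(o, Iri):
            subject.labels.add(o.value)
        elif isinstance(o, Literal):
            # Properties are multi-valued here, since RDF allows repeated predicates.
            subject.properties.setdefault(p.value, []).append(o.value)
        else:
            vertex(o)  # make sure the target vertex exists
            edges.append((s.value, p.value, o.value))
    return vertices, edges


# Hypothetical example data in the spirit of the parts scenario.
EX = "http://example.org/"
triples = [
    (Iri(EX + "p1"), Iri(RDF_TYPE), Iri(EX + "Part")),
    (Iri(EX + "p1"), Iri(EX + "partId"), Literal("YSH301")),
    (Iri(EX + "p1"), Iri(EX + "isPartOf"), Iri(EX + "c1")),
]
vertices, edges = interpret(triples)
```
]]></preformat>
      <p>In this sketch the three example triples yield one labelled vertex with one property, one further vertex for the edge target, and one edge between them.</p>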
      <p>Another aspect was introducing support for IRIs in the openCypher query language. We reuse
the existing String type for this, by having the query engine recognize specially formatted
strings that denote IRIs. We also support SPARQL’s PREFIX in openCypher, to allow abbreviated
IRI references in queries using prefix::suffix notation. If an RDF dataset contains
blank nodes, they are replaced by unique IRIs at load time (skolemization), which does not
take expressive power away from users and builds on the IRI functionality.</p>
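      <p>The two mechanisms in this paragraph can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Neptune implementation: terms are plain strings, the prefix table and the functions expand and skolemize are hypothetical, blank nodes are recognized by a leading "_:", and the generated IRIs follow the ".well-known/genid" scheme suggested by RDF 1.1 Concepts, which may differ from the IRIs Neptune generates.</p>
      <preformat><![CDATA[
```python
import re
import uuid

# Hypothetical prefix table, playing the role of PREFIX declarations.
PREFIXES = {"ex": "http://example.org/"}

# Matches the abbreviated prefix::suffix notation.
_ABBREVIATED = re.compile(r"^([A-Za-z][\w-]*)::(.+)$")


def expand(name, prefixes=PREFIXES):
    """Expand prefix::suffix to a full IRI string; leave other strings as-is."""
    match = _ABBREVIATED.match(name)
    if match and match.group(1) in prefixes:
        return prefixes[match.group(1)] + match.group(2)
    return name


def skolemize(triples, base="http://example.org/.well-known/genid/"):
    """Replace blank node labels by unique IRIs, one IRI per distinct label.

    Toy representation: every term is a string, and only subjects and
    objects starting with '_:' are treated as blank nodes.
    """
    mapping = {}

    def term(t):
        if isinstance(t, str) and t.startswith("_:"):
            if t not in mapping:
                mapping[t] = base + uuid.uuid4().hex
            return mapping[t]
        return t

    return [(term(s), p, term(o)) for s, p, o in triples]
```
]]></preformat>
      <p>Because a skolemized blank node is an ordinary IRI afterwards, it can be matched, returned and compared like any other IRI-valued string, which is why this step can reuse the IRI support described above.</p>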
      <p>For the existing openCypher functions we had to carefully design how they interact with RDF
values; similarly, equality and ordering had to be defined properly. Finally, the serialization
of all RDF values as openCypher query results had to be defined, where we chose not to
introduce incompatibilities, for example by not serializing custom literal datatypes.</p>
      <p>To come back to the earlier engineering company example, this SPARQL query returns all
Car products that include a specific part. The query won’t give insight into how exactly the part
is embedded in increasingly bigger components:</p>
      <preformat>SELECT ?product {
  ?part ex:partId "YSH301" .
  ?part ex:isPartOf* ?product .
  ?product a ex:Car .
}</preformat>
      <p>In contrast, this openCypher query returns the complete trace from part to product as a sequence of
nodes and edges, easily inspected or processed further:</p>
      <preformat>MATCH p = ( ({ex::partId : "YSH301"})-[: ex::isPartOf*0..]-&gt;(: ex::Car) )
RETURN p</preformat>
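      <p>The difference between the two result shapes can be mimicked outside any database. The following Python sketch uses a small, entirely hypothetical isPartOf hierarchy: reachable_cars returns only the matching end points, analogous to the SPARQL property path, while paths_to_cars returns each full node sequence, analogous to binding a path variable. The sketch avoids revisiting nodes, which is a simplification of openCypher's matching semantics.</p>
      <preformat><![CDATA[
```python
from collections import deque

# Hypothetical isPartOf edges: child -> list of parent components or products.
IS_PART_OF = {
    "part:YSH301": ["comp:brake-caliper"],
    "comp:brake-caliper": ["comp:brake-system"],
    "comp:brake-system": ["car:model-x"],
}
CARS = {"car:model-x"}


def reachable_cars(start):
    """End points only: which cars are reachable via zero or more isPartOf steps."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for parent in IS_PART_OF.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen.intersection(CARS)


def paths_to_cars(start):
    """Paths as values: every node sequence from start to a car."""
    results, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] in CARS:
            results.append(path)
        for parent in IS_PART_OF.get(path[-1], []):
            if parent not in path:  # simplification: never revisit a node
                queue.append(path + [parent])
    return results
```
]]></preformat>
      <p>For the part "part:YSH301" the first function yields just the car, whereas the second yields the chain of intermediate components as well, which is the information the engineering scenario asks for.</p>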
      <p>Our presentation will also include a discussion of challenges and design choices, and the pragmatic
approach taken to get a core feature set that naturally maps RDF to openCypher and LPG. These topics
include: RDF’s notions of blank nodes and named graphs and its rich type system (e.g. numerics, literals),
LPG and RDF composite types [<xref ref-type="bibr" rid="ref4">4</xref>], edge ids and properties, and the IRI syntax in queries.</p>
      <p>To conclude, in this presentation we highlight standardization activities towards bridging RDF and
LPG, like composite types for SPARQL and RDF, and the RDF-star Working Group [<xref ref-type="bibr" rid="ref5">5</xref>], and our effort to
align the OneGraph model and implementation with their outcomes. We believe that interoperability
between data models (with LPG and RDF as a start) and query languages (SPARQL, openCypher, Gremlin,
GQL) will enable customers to use graph database technology in more flexible and powerful ways.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Lassila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <article-title>The semantic web</article-title>
          ,
          <source>Scientific American</source>
          <volume>284</volume>
          (
          <year>2001</year>
          )
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Duerst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Suignard</surname>
          </string-name>
          ,
          <article-title>RFC 3987: Internationalized Resource Identifiers (IRIs)</article-title>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.</given-names>
            <surname>Lassila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bebee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bechberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Broekema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lawrence</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Lopez Enriquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharda</surname>
          </string-name>
          , et al.,
          <article-title>The OneGraph vision: Challenges of breaking the graph model lock-in</article-title>
          ,
          <source>Semantic Web</source>
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>125</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lassila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M. L.</given-names>
            <surname>Enriquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <article-title>Datatypes for lists and maps in RDF literals</article-title>
          ,
          in: <source>European Semantic Web Conference</source>
          , Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <collab>World Wide Web Consortium</collab>
          , RDF-star Working Group (
          <year>2024</year>
          ). URL: <uri>https://www.w3.org/groups/wg/rdf-star/</uri>, last accessed: July 2, 2024.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>