=Paper=
{{Paper
|id=Vol-3279/paper2
|storemode=property
|title=Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches
|pdfUrl=https://ceur-ws.org/Vol-3279/paper2.pdf
|volume=Vol-3279
|authors=Ghadeer Abuoda,Daniele Dell’Aglio,Arthur Keen,Katja Hose
|dblpUrl=https://dblp.org/rec/conf/semweb/AbuodaDKH22
}}
==Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches==
<pdf width="1500px">https://ceur-ws.org/Vol-3279/paper2.pdf</pdf>
<pre>
Transforming RDF-star to Property Graphs:
A Preliminary Analysis of Transformation Approaches
Ghadeer Abuoda1 , Daniele Dell’Aglio1 , Arthur Keen2 and Katja Hose1
1
    Department of Computer Science, Aalborg University, Aalborg, Denmark
2
    ArangoDB, San Francisco, United States


                                         Abstract
                                         RDF and property graph models have many similarities, such as using basic graph concepts like nodes
                                         and edges. However, such models differ in their modeling approach, expressivity, serialization, and
                                         the nature of applications. RDF is the de-facto standard model for knowledge graphs on the Semantic
                                         Web and supported by a rich ecosystem for inference and processing. The property graph model, in
                                         contrast, provides advantages in scalable graph analytical tasks, such as graph matching, path analysis,
                                         and graph traversal. RDF-star extends RDF and allows capturing metadata as a first-class citizen. To
                                         tap on the advantages of alternative models, the literature proposes different ways of transforming
                                         knowledge graphs between property graphs and RDF. However, most of these approaches cannot provide
                                         complete transformations for RDF-star graphs. Hence, this paper provides a step towards transforming
                                         RDF-star graphs into property graphs. In particular, we identify different cases to evaluate transformation
                                         approaches from RDF-star to property graphs. Specifically, we categorize two classes of transformation
                                         approaches and analyze them based on the test cases. The obtained insights will form the foundation for
                                         building complete transformation approaches in the future.


1. Introduction
The most popular models for representing knowledge graphs are: RDF1 (Resource Description
Framework) and property graphs [1] (PG). While RDF represents knowledge graphs as a set of
subject-predicate-object triples, property graphs assign key-value style properties to nodes and
edges.Recently, RDF-star [2] has been proposed as an extension of RDF to enable enriching RDF
triples with metadata information by embedding triples in subjects or objects of other triples,
which allows providing statements about statements and somewhat resembles adding properties
to edges in property graphs. RDF-star is supported by a rich ecosystem of data management
systems and standards, most notably systems such as Stardog, OpenLink’s Virtuoso, Ontotext
GraphDB, AllegroGraph, Apache Jena, and more recently also Oxigraph, but also query stan-
dards, such as SPARQL2 and its extension SPARQL-star3 as well as RDF Schema, which allows


QuWeDa 2022: 6th Workshop on Storing, Querying and Benchmarking Knowledge Graphs at ISWC, October 23, 2022,
virtual
$ gsmas@cs.aau.dk (G. Abuoda); dade@cs.aau.dk (D. Dell’Aglio); arthur@arangodb.com (A. Keen);
khose@cs.aau.dk (K. Hose)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
                  1
                    RDF 1.1 Primer: https://www.w3.org/TR/rdf11-primer/
                  2
                    SPARQL 1.1 Query Language: https://www.w3.org/TR/sparql11-query/
                  3
                    SPARQL-star Query Language: https://w3c.github.io/rdf-star/cg-spec/editors_draft.html#sparql-star
Alex              age         Alex           age
                        25                         25
                                                                               Alex
                                         certainty              Alex
               certainty                                                              (age = 25)
                                                                        (age = 25)
                                       0.5
            0.5
                             (a) RDF-star Graph                (b) A property graph

  Figure 1: Graphical Representation of Listing 1 as (a) RDF-star and (b) Property Graph


  Listing 1: An example RDF-star graph in Turtle star format
  @prefix ex: <http://example.org/> .
  <<ex:Alex ex:age 25>> ex:certainty 0.5 .


  describing classes of RDF resources and properties4 . In contrast, many graph database systems,
  such as Neo4j, TigerGraph, JanusGraph, RedisGraph, and SAP HANA are based on different
  variations of the property graph model [3] and different query languages [4, 5]. Unfortunately,
  RDF-star graphs and property graphs are not entirely compatible with one another. Although
  they both describe data through graphs, their underlying models and semantics are different,
  leading to many data interoperability issues [6, 7, 8]. Metadata or edge properties in RDF-star
  can be modeled as separate nodes or RDF-star triples. In contrast, edge properties can only
  be represented as literal key-value pairs in property graphs. In general, it is challenging to
  transform an RDF-star graph fully into a property graph because of the rich expressiveness of
  the former. The heterogeneity between the two models and their frameworks makes it necessary
  to study their interoperability, i.e., the ability to map one model to another for data exchange
  and sharing [7].
     The mapping between the two models is crucial for data exchange, data integration as well as
  reusability of systems and tools between the frameworks. RDF-star, specifically the RDF model,
  is recognized as a web-native model that supports data exchange and sharing across different
  sources because of its formal semantics and the universal uniqueness of resources using IRIs.
  RDF is a common and flexible model for knowledge representation, and that is exemplified
  by knowledge graphs that cover a broad set of domains, such as DBpedia [9], YAGO [10], and
  Wikidata [11]. On the contrary, even with the wide adoption of property graph engines, property
  graphs lack many essential features, such as a schema language, a standard query language,
  standard data serialization formats, etc. Achieving interoperability and reliable transformations
  between the two frameworks will finally enable us to exploit the benefits of both models.
     The transformation of property graphs to RDF-star has been explored recently [12], and basic
  transformation rules for property graphs to RDF-star were proposed [2, 13]. However, the latter
  does not cover all RDF-star constructs and allows for multiple alternatives.

       4
           RDF Schema 1.1: https://www.w3.org/TR/rdf-schema/
   Consider, for instance, the example illustrated in Figure 1(a). If we start with the triple
(ex:Alex, ex:age, 25) in Listing 1, then we could represent the RDF element (Alex) as a node
in a property graph, as shown in Figure 1(b). This node would then have a property (age,25) and
the RDF triple would be represented by a single node in a property graph. However, if we have
a single node without an edge, we cannot represent the metadata about the original RDF triple
(ex:certainty 0.5) in the property graph. Studying such cases, this paper makes the following
contributions:
     • We identify two alternative approaches of transformations: RDF-topology-preserving and
       Property-Graph transformation.
     • We define a set of test cases capturing the diverse RDF-star constructs that have to be
       considered when transforming RDF-star to property graphs.
     • Using the test cases, we systematically evaluate alternative mapping approaches and
       identify their shortcomings.
This paper is structured as follows: while Section 2 introduces preliminaries, Section 3 discusses
related work. Section 4 presents alternative transformation approaches. Afterwards, Section 5
provides details on our test cases, which we use in Section 6 to identify and discuss shortcomings
of transformation approaches. Section 7 concludes the work with an outlook for future work.


2. Preliminaries
In this section, we formally introduce RDF1 , RDF-star [2], and property graphs [1].

2.1. Resource Description Framework (RDF).
RDF is a W3C standard data model that represents information as a set of statements. Each
statement denotes a typed relation between two resources.

Definition 1 (RDF statement). Let 𝐼, 𝐵 and 𝐿 be the disjoint sets of Internationalized Resource
Identifiers (IRIs), blank nodes and literals. An RDF statement is a triple (𝑠, 𝑝, 𝑜) ∈ (𝐼 ∪ 𝐵) × 𝐼 ×
(𝐼 ∪ 𝐵 ∪ 𝐿), and it indicates that 𝑠 and 𝑜 (subject and object, resp.) are in a relation 𝑝 (predicate).

   In this paper, we consider two types of RDF statements that we distinguish based on whether
the object is an IRI or a literal. Object property statements are RDF statements (𝑠, 𝑝, 𝑜) ∈ (𝐼 ∪𝐵)×
𝐼 × (𝐼 ∪ 𝐵), while datatype property statements are RDF statements (𝑠, 𝑝, 𝑜) ∈ (𝐼 ∪ 𝐵) × 𝐼 × 𝐿.
An RDF graph containing three RDF statements is shown in Listing 2 serialized in Turtle5 , and
visually in Figure 2(a) as a graph. The first two statements are object property statements. The
first statement describes two resources, ex:Apple_Inc and ex:California, related by the predicate
ex:located_in. The second statements indicates that ex:Apple_Inc has ex:Tim_Cook as a ex:CEO.
The last statement is a datatype property statement, and it indicates that ex:Tim_Cook has the
literal "2011" as the value of the ex:start_date predicate.


    5
        RDF Turtle: https://www.w3.org/TR/turtle/
       Listing 2: An RDF graph in Turtle format
      @prefix ex: <http://example.org/> .
      ex:Apple_Inc ex:located_in ex:California .
      ex:Apple_Inc ex:CEO ex:Tim_Cook .
      ex:Tim_Cook ex:start_date 2011 .


                                                       :located_in
                                          :Apple_Inc                     :California
             :located_in                                                                        :located_in
:Apple_Inc                 :California    :CEO                                                                      :California

      :CEO                                                 :start_date                 :Apple_Inc
                                2011      :Tim_Cook                                                   :CEO
                                                                                                                       :Tim_Cook
             :Tim_Cook     :start_date                                                         (start_date= 2011)
                                                                2011

         (a) RDF Graph                           (b) RDF-star Graph                           (c) Property Graph

      Figure 2: Example in RDF, RDF-star, and Property Graphs


      2.2. RDF-star
      Looking at the RDF graph in Figure 2(a), one can spot some imprecise data modelling choices:
      stating that Tim Cook started in 2011 is not totally correct, as he started in 2011 his role as CEO
      of Apple. In other words, one should associate the starting date to the statement (ex:Apple_Inc,
      ex:CEO, ex:Tim_Cook), as depicted in Figure 2(b). There are several ways to implement this idea
      in RDF, such as RDF reification [2], singleton properties [14], and named graphs [15]. However,
      these mechanisms have significant shortcomings [16, 17, 18].

       Listing 3: An RDF-Star Graph in Turtle-Star Format
      @prefix ex: <http://example.org/> .
        ex:Apple_Inc ex:located_in ex:California .
      <<ex:Apple_Inc ex:CEO ex:Tim_Cook>> ex:start_date 2011 .


         A solution to overcome such shortcomings was recently proposed by Hartig et al. with
      RDF-star [2, 12]. RDF-star extends RDF by letting RDF statements be subjects or objects in
      other statements. Listing 3 shows an RDF-star document serialized in Turtle-star [2]. The first
      statement is a compliant RDF statement (it appeared also in Listing 2). The second statement
      indicates that ex:Apple_Inc appointed ex:Tim_Cook as ex:CEO in 2011. We formally define an
      RDF-star statement as follows.

      Definition 2 (RDF-star statement). Let 𝑠 ∈ 𝐼 ∪ 𝐵 , 𝑝 ∈ 𝐼, 𝑜 ∈ 𝐼 ∪ 𝐵 ∪ 𝐿. An RDF-star
      statement is a triple defined recursively as:
          • Any RDF statement (𝑠, 𝑝, 𝑜) is an RDF-star statement;
          • Let 𝑡 and ¯𝑡 be RDF-star statements. Then, (𝑡, 𝑝, 𝑜), (𝑠, 𝑝, 𝑡) and (𝑡, 𝑝, ¯𝑡) are RDF-star state-
            ments, also known as asserted statement. 𝑡 and ¯𝑡 are called embedded or quoted statements.
2.3. Property Graphs
A property graph (PG) is a graph where nodes and edges can have multiple properties, repre-
sented as key-value pairs. Figure 2(c) illustrates the graph described in the above section as a
property graph. In this case, the starting date of Tim Cook as the CEO of Apple Inc is reported
as a key-value property on the :CEO edge. PGs have not a unique and standardized model; each
PG engine proposes its data model. A generic PG model definition is proposed by [3].

Definition 3 (Property Graph). Let 𝐿 be the set of the labels, 𝑃 𝑁 be the set of property names,
and 𝐷 be the set of property values. A property graph 𝐺 is an edge-labeled directed multi-graph
such that 𝐺 = (𝑁, 𝐸, 𝑒𝑑𝑔𝑒, 𝑙𝑏𝑙, 𝑃, 𝜎), where:
    • 𝑁 is a set of nodes,
    • 𝐸 is a set of edges between nodes, such that 𝑁 ∩ 𝐸 = ∅
    • 𝑒𝑑𝑔𝑒 : 𝐸 → (𝑁 × 𝑁 ) is a total function that associates each edge in 𝐸 with a pair of nodes
      in 𝑁 . If 𝑒𝑑𝑔𝑒(𝑒1 ) = (𝑛1 , 𝑛2 ), 𝑛1 is the source node and 𝑛2 is the target node.
    • 𝑙𝑏𝑙 : (𝑁 ∪ 𝐸) → 𝒫(L) is a function that associates each edge or node with a set of labels.
    • 𝜎 : (𝑁 ∪ 𝐸) → 𝒫(𝑃 ) is a function that associates a node or edge with a non-empty set of
      properties 𝑃 defined as a set of key-value pairs (𝑘, 𝑣) where 𝑘 ∈ 𝑃 𝑁 and 𝑣 ∈ 𝐷

To ease all approaches’ output representation, we map any IRI to a distinct string representing
a local name. Given 𝐼, a set of all IRIs, 𝑙𝑜𝑐𝑎𝑙𝑁 𝑎𝑚𝑒 is a function that maps an IRI to a string
that represents the local name of an RDF resource6 . For example, the local name for the RDF
resource (http://example.com/meets) 𝑙𝑜𝑐𝑎𝑙𝑁 𝑎𝑚𝑒("http://example.com/meets") is "meets". We
will use this function in the output representation in Section 6.


3. Related Work
We can distinguish between related work on converting between (i) RDF and PG and (ii) RDF-star
and PG.

RDF and PG. Angles et al. [13] propose three variations of transforming RDF into PG op-
tionally in consideration of schemas: simple, generic, and complete. The authors formally show
that two of the proposed mapping approaches (generic and complete) satisfy the property of
information preservation, i.e., there exist inverse mappings that allow recovering the origi-
nal dataset without information loss. The evaluation in this paper (Section 6) includes the
schema-independent mapping referred to as the Generic Database Mapping using the authors’
implementation (RDF2PG7 ). Although our focus is on RDF-star (instead of RDF), we include
this approach since it provides the basic formalities and implementation that can be extended
to support RDF-star.
   In the opposite direction, Bruyat et al. [19] propose PREC8 , a library that enables tranformation
PGs into RDF graphs. The authors built a uniform graph model to describe the structure of

   6
       In Neo4j, the user can configure the local name of RDF terms such as subPropertyOf, subClassOf, Class, etc.
   7
       https://github.com/renzoar/rdf2pg
   8
       https://bruju.github.io/PREC/
Listing 4: RDF triples
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:book ex:publish_date "1963-03-22"^^xsd:date .
ex:book ex:pages         "100"^^xsd:integer .
ex:book ex:cover         20 .
ex:book ex:index         "55" .


the property graph in RDF terms. PREC uses a context in RDF-star format that describes the
mappings between the terms used in the PG model and IRIs. The user can define a template
representing the different properties and edges in the resulting RDF graph. Despite using
RDF-star internally, the approach does not support mapping PGs to RDF-star graphs.

RDF-star and PG. Hartig et al. [2] propose two approaches for transforming RDF-star to
PG. The first approach maps ordinary RDF triples to edges in the PG. Metadata triples are then
represented as edge properties. Our analysis (Section 6) includes this approach using the authors’
implementation (RDF-star Tools9 ). The second approach treats datatype and object property
statements differently. The former are transformed into node properties and the latter into edges.
This approach, however, is limited in mapping embedded triples, and it is not implemented in
the RDF-star Tools library.
   Neosemantics is a well-known project to import RDF data into Neo4J, implemented as a Neo4j
plug-in10 . The implementation was only recently extended to include an RDF-star importing
feature. As we will see in Section 6 importing RDF-star into PG using this transformation is
lossy and does not cover all cases.
   In the other direction, Khayatbashi et al. [12] present an analysis evaluating three transfor-
mation approaches from PG to RDF, including an RDF-star approach. As a part of the study,
the authors evaluated the performance of querying the generated RDF graphs in multiple
triple stores. They found that there is no clear best mapping in terms of execution time; the
performance of the queries over RDF and RDF-star graphs resulting from the mapping varies
compared to their equivalent pure RDF representations.


4. Transformation Approaches from RDF to Property Graphs
Analyzing the approaches discussed in Section 3, we can extrapolate two principle approaches:
RDF-topology Preserving Transformation (RPT) and Property Graph Transformation (PGT). RPT
tries to preserve the RDF-star graph structure by transforming each RDF statement into an edge
in the PG. PGT, on the other hand, ensures that datatype property statements are mapped to
node properties in the PG. In what follows, we first explain how these approaches transform RDF
triples into PG and afterwards how the basic algorithms can be extended to support RDF-star.
   Consider the example in Listing 4 with multiple datatype property statements describing
the RDF resource (ex:book). Figure 3 shows graphical visualizations of the property graphs

    9
        https://github.com/RDFstar/RDFstarTools
   10
        Neosemantics: https://neo4j.com/labs/neosemantics/
generated by the two approaches: RPT (a) and PGT (b).
   RPT, for example, converts the triple (ex:book,ex:index,55) into two nodes (ex:book) and
(55), connected by an edge (ex:index). All other triples involving RDF resources, blank nodes,
or literal values can be transformed in a similar way so that we obtain the PG in Figure 3(a).
Algorithm 1 formalizes the RPT approach; for each triple it always creates a node for the subject
(line 3) and the object (line 5) with an edge connecting them (line 12) – of course avoiding
duplicate nodes for the same IRIs.
   For the same example (Listing 4), PGT creates the PG in Figure 3(b) consisting of a single node
respresenting the RDF resource (ex:book) with multiple properties representing property-object
pairs from the RDF statements, such as (ex:index,55). Distinguishing between datatype and
object property statements, this approach transforms object property statements to edges and
datatype property statements to properties of the node representing the subject. Unlike RPT,
the resulting PG nodes represent only RDF resources or blank nodes while literal objects will
become properties. PGT is more formally sketched in Algorithm 2, which first checks the type
of the statement’s object (line 5) and based on that decides to either create a node (if it does not
yet exist, line 6) or a property (line 13).
                                                  Algorithm 2: PGT
  Algorithm 1: RPT                                  Input: A set of RDF Triples 𝑇
  Input: A set of RDF Triples 𝑇                            Output: A Property Graph 𝑃 𝑔 = (𝑁, 𝐸, 𝑒𝑑𝑔𝑒, 𝑙𝑏𝑙, 𝑃, 𝜎)
  Output: A Property Graph 𝑃 𝑔 = (𝑁, 𝐸, 𝑒𝑑𝑔𝑒, 𝑙𝑏𝑙, 𝑃, 𝜎)    1: 𝑃 𝑔 ← ∅
   1: 𝑃 𝑔 ← ∅                                               2: for 𝑡 ∈ 𝑇 , such that 𝑡 =< 𝑠, 𝑝, 𝑜 > do
   2: for 𝑡 ∈ 𝑇 , such that 𝑡 =< 𝑠, 𝑝, 𝑜 > do               3:    𝑁 = 𝑁 ∪ {𝑠}
   3:    𝑁 = 𝑁 ∪ {𝑠}                                        4:    𝑙𝑏𝑙(𝑠) = {"RDF resource"}
   4:    𝑙𝑏𝑙(𝑠) = {"RDF resource"}                          5:    if 𝑜 is an RDF resource then
   5:    𝑁 = 𝑁 ∪ {𝑜}                                        6:        𝑁 = 𝑁 ∪ {𝑜}
   6:    if 𝑜 is an RDF resource then                       7:        𝑙𝑏𝑙(𝑜) = {"RDF resource"}
   7:        𝑙𝑏𝑙(𝑜) = {"RDF resource"}                      8:        𝐸 = 𝐸 ∪ {𝑒}
   8:    else                                               9:        𝑒𝑑𝑔𝑒(𝑒) = (𝑠, 𝑜)
   9:        𝑙𝑏𝑙(𝑜) = {"Literal"}                          10:        𝑙𝑏𝑙(𝑒) = 𝑝
  10:    end if                                            11:    else
  11:    𝐸 = 𝐸 ∪ {𝑒}                                       12:        𝑃 = 𝑃 ∪ {𝑝𝑟}
  12:    𝑒𝑑𝑔𝑒(𝑒) = (𝑠, 𝑜)                                  13:        𝑝𝑟 = {(p, o)}
  13:    𝑙𝑏𝑙(𝑒) = 𝑝                                        14:        𝜎(𝑠) = pr
  14: end for                                              15:    end if
  15: return 𝑃 𝑔                                           16: end for
                                                           17: return 𝑃 𝑔


                                              20
                            :cover
                                  :index
                   :book                           55
                                                                 :book
                                                                         publish_date = "1963-03-22"
                           :publish_date
               :pages                                                    index = 55
                                           "1963-03-22"                  pages = 100
                                                                         cover = 20
                    100


                               (a) RPT                                      (b) PGT

Figure 3: RPT and PGT transformations for the example in Listing 4
   Let us now consider the RDF-star example in Listing 5, which contains an asserted triple
for an embedded data property statement – the PGs obtained by applying RPT and PGT are
shown in Figure 4. Algorithms 3 and 4 illustrate the main principle of mapping the embedded
and asserted triples. RPT conversion for RDF-star is identical to RDF triples, then converting
the asserted triple into an edge property (Algorithm 3 lines 5-8). PGT transforms the embedded
triple depending on its object; if it is an RDF resource, PGT converts it to an edge. Otherwise, it
converts the embedded triple into a node with a property (Algorithm 4 lines 6-11) and fails to
transform the asserted triple.
Listing 5: RDF-star triples
@prefix ex: <http://example.org/> .
<<ex:Mark ex:age 28>> ex:certainty 1 .


   In summary, the transformation of the triples from Listing 5 using PGT results in a PG with a
single node that makes it impossible to represent the asserted triple since PGs do not support
properties over other properties. In contrast, RPT transforms the embedded triple into an edge
in the PG and can express the asserted triple as the edge’s property. The Abstracting away from
a few details (see also Section 6), the Neosemantics approach10 basically follows PGT while
RDF-star Tools9 and RDF2PG7 follow RPT.
                                                   Algorithm 4: PGT-star
 Algorithm 3: RPT-star                               Input: A set of RDF-star Triples 𝑇
  Input: A set of RDF-star Triples 𝑇                                   Output: A Property Graph 𝑃 𝑔 = (𝑁, 𝐸, 𝑒𝑑𝑔𝑒, 𝑙𝑏𝑙, 𝑃, 𝜎)
  Output: A Property Graph 𝑃 𝑔 = (𝑁, 𝐸, 𝑒𝑑𝑔𝑒, 𝑙𝑏𝑙, 𝑃, 𝜎)                1: 𝑃 𝑔 ← ∅
   1: 𝑃 𝑔 ← ∅                                                           2: for 𝑡 ∈ 𝑇 , such that 𝑡 =< 𝑠, 𝑝, 𝑜 > do
   2: for 𝑡 ∈ 𝑇 , such that 𝑡 =< 𝑠, 𝑝, 𝑜 > do                           3:     if ¯𝑡 is an embedded triple, such that 𝑡 =< ¯𝑡, 𝑝, 𝑜 > and
                                                                              ¯𝑡 =< ¯   𝑠, 𝑝  𝑜 > then
                                                                                           ¯, ¯
   3:     if ¯𝑡 is an embedded triple, such that 𝑡 =< ¯𝑡, 𝑝, 𝑜 > and
         ¯𝑡 =< ¯   𝑠, 𝑝  𝑜 > then
                      ¯, ¯                                              4:             𝑜 is an RDF resource then
                                                                                    if ¯
   4:          𝑃 𝑔𝑂𝑢𝑡 = RPT(𝑡¯)                                         5:              𝑃 𝑔𝑂𝑢𝑡 = PGT(𝑡¯)
   5:          𝑝𝑟 = {(p, o)}                                            6:          else
   6:          𝑃 = 𝑃 ∪ {𝑝𝑟}                                             7:              𝑃 𝑔𝑂𝑢𝑡 = PGT(𝑡¯)
   7:          𝜎(𝑒) = 𝑝𝑟                                                8:              𝑝𝑟 = {(𝑝¯,𝑜
                                                                                                  ¯)}
   8:     end if                                                        9:              𝑃 = 𝑃 ∪ {𝑝𝑟}
   9: end for                                                          10:               𝜎(𝑠¯) = 𝑝𝑟
  10: return 𝑃 𝑔 ∪ 𝑃 𝑔𝑂𝑢𝑡                                              11:          end if
                                                                       12:     end if
                                                                       13: end for
                                                                       14: return 𝑃 𝑔 ∪ 𝑃 𝑔𝑂𝑢𝑡

5. Test Cases
In this section, we present a systematic list of test cases that transformation approaches need
to fulfill. We distinguish between basic cases that conform to small RDF graphs as well as a
range of RDF-star specific test cases that challenge existing approaches – as we will see in our
evaluation in Section 6. The complete list of test cases with their short titles is shown in Table 1
– subcases represent variations and bold font indicates cases discussed in more detail in this
paper. For the sake of space, we present only part of these cases in this section, more details for
all cases are available on our website11
   11
        https://relweb.cs.aau.dk/rdfstar
                                                      IRI = http://example.com/
                                           Legend     int=http://www.w3.org/2001/XMLSchema\#integer”
                                                      str=http://www.w3.org/2001/XMLSchema\#string"
                                                      date=http://www.w3.org/2001/XMLSchema\#date"
                                                                                                       IRI = http://example.com/
                                                                                              Legend   int=http://www.w3.org/2001/XMLSchem
                                                                                                       str=http://www.w3.org/2001/XMLSchem
                                                                                                       date=http://www.w3.org/2001/XMLSche


                                           :Matt
                             :likes

                         certainty = 0.5
               :Mary                                                                         :Matt
                                                                            :likes
                            :age
                                                                        certainty = 0.5
                       certainty = 1
                                                          :Mary
                                            28               age = 28


                           (a) RPT                                       (b) PGT

Figure 4: RPT and PGT transformations for the example in Listing 5


5.1. Standard RDF

Case 1: Standard RDF statement This case represents an object property statement. Both,
subject and object are RDF resources. Most transformation approaches map this case to two
nodes (subject and object) with an edge (the predicate) connecting them.
@prefix ex: <http://example.org/> .
ex:alice ex:meets ex:bob .


Case 2: The predicate of an RDF statement is subject in another statement Mapping
an RDF statement to two nodes with the predicate as label of the edge between them leads to
problems when the predicate itself is also used as a subject in another RDF statement – Case 2.1
therefore consists of the following statements:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/> .
ex:Sam ex:mentor ex:Lee .
ex:mentor rdfs:label "project supervisor" .
ex:mentor ex:name "mentor's name" .


Other variants of Case 2 include a predicate for a non-literal object, such as rdf:type and
rdfs:subPropertyOf.
Case 3: Data types and language tags It is also important to test the support of different
data types and language tags. Hence, Case 3.1, for instance, contains several datatype property
statements involving different data types and formats for the literal objects:
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:book ex:publish_date "1963-03-22"^^xsd:date .
ex:book ex:pages         "100"^^xsd:integer .
ex:book ex:cover         20 .
ex:book ex:index         "55" .


5.2. RDF-star

Case 8: Embedded object property statement in subject position As the name indicates
and the following listing shows, this test case features an RDF-star statement where the subject
Table 1
Test cases for evaluating RDFstar-to-PG transformation approaches
                                             Standard RDF
    Case   Description
    1      Standard RDF statement
    2      The predicate of an RDF statement is subject in another statement
    2.1    Predicate as subject and literal as object
    2.2    Predicate as subject and RDF resource as object
    2.3    Predicate as subject and RDF property as object - rdfs:subPropertyOf
    2.4    Predicate as subject and RDF class as object - rdf:type
    3      Data types and language tags
    3.1    Datatype property statements with different data types of the literal objects
    3.2    Datatype property statements with different language tags of the literal objects
    4      RDF list
    5      Blank nodes
    6      Named graphs
    7      Multiple types for resources - rdf:type
                                               RDF-Star
    8      Embedded object property statement in subject position
    9      Embedded datatype property statement in subject position
    10     Embedded object propertystatement in object position
    11     Embedded object property statement in subject position and non-literal object
    11.1   Asserted statement with non-literal object
    11.2   Asserted statement with non-literal object that appears in another asserted statement
    12     Embedded statement in subject position - object property with rdf:type predicate
    12.1   Asserted statement with rdf:type as predicate
    12.2   Embedded statement with rdf:type as predicate
    13     Double nested RDF-star statement in subject position
    14     Multi-valued properties
    14.1   RDF statements with same subject and predicate and different objects
    14.2   RDF-star statements with the same subject and predicate and different objects
    15     Multiple instances of embedded statements in a single RDF-star graph
    15.1   Identical embedded RDF-star statements with different asserted statements
    15.2   RDF statements as embedded and asserted statements in the same graph

corresponds to an embedded object property statement and the object is a literal:
@prefix ex: <http://example.org/> .
<<ex:alice ex:likes ex:bob>> ex:certainty 0.5 .


Case 9: Embedded datatype property statement in subject position Similar to the previous
case we again have an RDF-star statement where the subject corresponds to an embedded
statement. In contrast to the Case 8, the embedded statement in Case 9 is a datatype property
statement:
@prefix ex: <http://example.org/> .
<<ex:Mark ex:age 28>> ex:certainty 1 .


Case 10: Embedded object property statement in object position Of course, RDF-star
                        statements can also have embedded statements on object position, which is covered in this case.
                        Similar to Case 8, the embedded statement is an object property statement.
                        @prefix ex: <http://example.org/> .
                        ex:bobhomepage ex:source <<ex:mainPage ex:writer ex:alice>> .


                        Other test cases cover other variations of asserted statements (Case 11), the usage of rdf:type
                        in the embedded and asserted statements (Case 12), the double nesting of RDF-star statements
                        (Case 13), the same RDF-star statement with different asserted statements (Case 14), and multiple
                        occurrences of an RDF-star statement within the same graph (Case 15). As mentioned above,
                        details can be found on our project website11 .


                        6. Analysis and Discussion
                        In this section, we use the test cases identified in Section 5 to evaluate a number of transformation
                        approaches that we have identified in Section 3: RDF2PG7 , RDF-Star Tools9 , and Neosemantics 10
                        (Neo4j Community Edition version 4.3.6). The complete results and analysis can be found in
                        the extended arxiv version of this paper [20]. As we will see and as already mentioned in
                        Section 4, RDF-star Tools and RDF2PG follow the RDF-topology Preservation Transformation
                        (RPT) whereas Neosemantics adopts the Property Graph Transformation (PGT).

                                                                                     Label= IRI/meets
                    Type= "meets"                                                                                                             Type= RI/meets
                                                                       n1                                     n2                              Label= "ObjectProperty"
         n1                                n2
                                                                                                                                     n1                                          n2
"uri" = IRI/alice                                                "kind" = "IRI"                         "kind" = "IRI"
                                    "uri" = IRI/bob
Labels = {"Resource"}                                            "IRI" = IRI/alice                      "IRI" = IRI/bob   "iri" = IRI/alice                             "iri" = IRI/alice
                                    Labels = {"Resource"}                                                                 Labels = {"Resource"}                         Labels = {"Resource"}


                (a) Neosemantics                                              (b) RDF-star Tools                                                  (c) RDF2PG                                Legend   IRI = <http:/

                                                            Legend   IRI = <http://example.com/>                                                                                                     "meets" = "lo

                                                                     "meets" = "localName(http://example.com/meets)"


                        Figure 5: PGs obtained for Case 1


                        6.1. Standard RDF
                        At first, let us discuss our findings for Cases 1 through 7 from Table 1 targeting standard RDF
                        statements. Case 1 corresponds to a simple RDF statement with IRIs as subject and object.
                        Non-surprisingly, all three libraries create a PG with two nodes and one edge (see Figure 5). The
                        main differences are in the way types, labels, and properties are handled. Both Neosemantics and
                        RDF2PG use “Resource” as the label of the two nodes and a key value pair (key=IRI, value=IRI
                        of the subject/object). Additionally, RDF-star Tools uses two additional properties for the nodes:
                        (key=“kind”, value=“IRI”) and (key=“IRI”, value=subject/object IRI). Whereas Neosemantics and
                        RDF2PG use the predicate from the RDF statement as the edge’s type, RDF-star Tools uses the
                        predicate as an edge label. RDF2PG additionally uses “ObjectProperty” as an additional edge
                        label.
                                                                                                                                           str=http://www.w3.org/2001/XMLSchema\#string"
                                                                                                                                           date=http://www.w3.org/2001/XMLSchema\#date"


                                                                                      Legend          IRI = http://example.com/
                                                                                                                            n2             "kind" = "Literal"
                                                                                           Label= IRI/publish_date                         "Literal" = "1963-03-22"
                                                                                                                                           "Datatype" = date
                                                                                          n1                       Label= IRI/index
                                                                  "kind" = "IRI"
n1                                                                "IRI" = IRI/book
                                                                                                                                                  n5
     "uri" = "http://example.com/book"                                                            Label= IRI/cover
                                                                                          IRI = http://example.com/
                                                                                                                                                        "kind" = "Literal"
                                                                                                                                                        "Literal" = "55"
     Labels = {"Resource"}                                                      Legend
                                                                                          int = http://www.w3.org/2001/XMLSchema\#integer”
                                                                               Label= IRI/pages                                                         "Datatype" = str
                                                                                          str = http://www.w3.org/2001/XMLSchema\#string"
     publish_date = "1963-03-22"                                                          date = http://www.w3.org/2001/XMLSchema\#date"
                                                                                                                             n3    "kind" = "Literal"
     index = 55                                                                                                                    "Literal" = 20
     pages = 100                                                                          n4      "kind" = "Literal"
                                                                                                                                   "Datatype" = int

     cover = 20                                                                                   "Literal" = 100
                                                                                                  "Datatype" = int


       (a) Neosemantics                                                                                    (b) RDF-star Tools

                                                                                            n2     Label = {"Literal"}
                                                    Type= IRI/publish_date                         value = "1963-03-22"
                                                    Label="DatatypeProperty"                       Datatype = date

                                             n1                             Type= IRI/index
                     "iri" = IRI/book                                       Label="DatatypeProperty"
                     Labels = {"Resource"}
                                                      Type= IRI/cover
                                                      Label="DatatypeProperty"                                n5   Label = {"Literal"}
                                  Type= IRI/pages                                                                  value = "55"
                                  Label="DatatypeProperty"                                                         Datatype = str
                                                                             n3      Label = {"Literal"}
                                                                                     value = 20
                                             n4    Label = {"Literal"}               Datatype = int
                                                   value = 100
                                                   Datatype = int


                                                                  (c) RDF2PG

Figure 6: PGs obtained for Case 3


   Case 3.1 tests the transformation of datatype property statements. In this case, the differences
between the libraries are more evident, as depicted in Figure 6. RDF-star Tools and RDF2PG
represent each literal using a separate node- The properties of the edge are similar to the
ones described for Case 1 with RDF2PG using “DatatypeProperty” as edge label. Neosemantics
instead, only creates a single node representing the complete set of input RDF statements; the
nodes has one property for each datatype property statement in the input. Additionally, Cases
3.1 and 3.2 also test how transformation approaches support data types and language tags in
RDF statements. While Neosemantics ignores them, both RDF-star Tools and RDF2PG define
the nodes representing the literal objects with type literal and use the XSD schema data type as
annotation.
   Cases 4–6 test different RDF features, such as RDF lists (case 4), blank nodes (case 5), and named
graphs (case 6). All three projects support RDF lists and blank nodes but only Neosemantics can
also import named graphs.

6.2. RDF-star
Let us now discuss over findings for evaluating Cases 8 through 13 from Table 1 targeting
diverse RDF-star constructs. We have observed that many of them are not supported by existing
libraries. RDF2PG does not support embedded RDF-star statements, so none of these cases could
                                                           Legend     IRI = http://example.com/
                                                                           Legend IRI = <http://example.com/>
                                                                      int=http://www.w3.org/2001/XMLSchema\#integer”


                                                                          Label= IRI/age
      n1                                                     n1                                  n2
           "uri" = IRI/Mark                                              certainty = 1         "kind" = "Literal"
           Labels = {"Resource"}                   "kind" = "IRI"
                                                                                               "Literal" = 28
                                                   "IRI" = IRI/Mark
           age = 28                                                                            "Datatype" = int


       (a) Neosemantics                                                (b) RDF-star Tools

 Figure 7: PGs obtained for Case 9


  be converted to a PG.
      Neosemantics ignores RDF-star statements where: (i) the object of the asserted statement
  is a literal value (Case 9) or an embedded statement (Case 10), (ii) all elements of an RDF-star
  statement are RDF resources (Cases 11.1 and 11.2), (iii) the predicate of the embedded or asserted
  statement is "rdf:type"  (Cases
                       Label=       12.1 and 12.2), or (iv) the RDF-star statement is a double embedded
                               IRI/age
  statement n1 (Case 13).                        n2
      RDF-star Tools supports most cases by      creating
                                              "kind"
                                                             nodes in the PG for literal objects. However,
                                                      = "Literal"
                       certainty = 1
"kind"
  Cases = "IRI"
           10 and 13 cause an error in the reading
                                              "Literal" = 28 of the conversion: we believe that this is an
                                                      phase
"IRI" = IRI/Mark
  implementation error, and both cases could         be supported
                                              "Datatype"    = int   in the same manner as the others.
      Among all cases, Case 9 is particularly interesting and represents a natural extension of Case
  3.1. If the embedded statement (mark,age,25) is translated in PGT-style, e.g., by Neosemantics,
  mark is represented as a node with a property having age as key and 25 as value. It is then not
  straightforward how to convert the asserted statement (stating that the certainty of (mark,age,25)
  is 1). Hence, when running Neosemantics, we noticed that it does not transform the asserted
  statement completely. RDF-star Tools (RPT-style transformation), on the other hand, converts
  the statement in Case 9 by transforming the embedded statement to an edge with type age
  between two nodes representing mark and 25; the part of the asserted statement is represented
  as a key-value property associated to the age edge.

 6.3. Edge cases
 Case 2 (resource both as a predicate and a subject/object in another statement) is not supported
 correctly by existing transformation approaches. The result PGs include two separate and
 independent elements for such resources: one node and one edge property. This is different
 from RDF, where an IRI identifies the same resource independently from its position in an RDF
 statement.
    Annotating RDF predicates (both datatype and object properties) is a common way to model
 schema information in RDF. Our finding suggests that there may be more conversion problems
 when looking at RDF graphs that include RDF Schema and OWL axioms. Existing tools do not
 have approaches to work at schema level. This investigation is beyond the scope of this paper
 but we plan to investigate it in our future research.
    Other interesting cases are those where the RDF input contains multiple statements with
 the same subject and predicate but different objects. In Case 14.1, for example, contains two
 datatype property statements with the same subjects and predicates. Neosemantics converts
the literals ("Info_page","aau_page") into a list of strings12 used as a value in a property of the
node representing the subject of the RDF statement.

6.4. RDF-topology Preservation Transformation (RPT) and Property Graph
     Transformation (PGT)
Our analysis confirmed our observation that RDF-star Tools follows the RPT approach; the
approach creates a node for each distinct subject and object of every RDF statement. Predicates
and objects of RDF-star statements are transformed into properties with the key corresponding
to the predicate and the value to the object. Neosemantics, in contrast, follows PGT approach
for most of the datatype property statements, e.g., Cases 2.1, 3.1, 3.2, 11.2, and 14.1, and supports
types and labels (Case 2.4, 5, 6, and 7) slightly differently. Finally, the RDF2PG project adopts
the RPT approach in all cases handled.
   In summary, the two lines of transformation approaches, RPT and PGT, form the core of
start-of-the-art libraries. However, none of the tested libraries supports all test cases. In the end,
the particular requirements for converting RDF-star to PGs vary based on the particular use
cases and application-specific needs. In one use case, a user might favor one transformation
approach over the other based on ontology availability, the application domain, performance,
or additional application-specific needs. And driven by real-world applications, users might
want to adopt a combination of the basic transformation approaches to best fit their needs.

6.5. Discussion

Ontology and schema availability Application domains, such as financial services, life science,
and military, have well-established domain ontologies that are native to the RDF/RDF-star model.
Such use cases often also exhibit computationally expensive queries and algorithms that are
better for PGs, Hence, in such cases, RPT is preferable to convert the ontology and PGT to
convert instance data to reduce the number of nodes and therefore improve runtime.
   Another use case involves harvesting Linked Open Data [21, 22] where the PGT approach is
usually preferable. Likewise, PGT is easier to use in combination with tabular data. However, in
situations, where the ontology is essential and the RDF data is highly complex and heterogeneous,
users may prefer RPT. In the end, some graph-based machine learning models also require
graphs that model literal property values as nodes.
Performance and query complexity Query performance over graphs generally depends on
the number of edges and nodes [23]. Therefore, the user may choose to convert an RDF/RDF-
star graph into a PG using a specific transformation method to comply with performance
requirements. For example, in an application for a smart home, the authors transformed the
RDF graph into a PG based on a custom transformation [24]. The authors evaluated the usage of
a PG generated by the custom transformation versus the PG generated by NSMTX13 . The PG of
the custom transformation had two nodes and one edge compared to 16 nodes and 15 edges for
the PG generated by the NSMTX plugin. As a result, executing queries over the former graph is

   12
        https://neo4j.com/docs/cypher-manual/current/syntax/values/#composite-types
   13
        https://neo4j.com/nsmtx-rdf/
more efficient than over the NSMTX graph. The custom transformation eliminated many edges
(i.e., relations) and converted them into properties. Since RPT tends to convert each triple into
an edge, the transformation generates many nodes compared to PGT that wraps some triples as
properties for nodes.
Data sharing RPT allows RDF-star asserted triples (i.e., edge attributes) to be represented
as nodes in the PG, treating them as individual entities. This approach can make the graph
representation more expressive for other users to understand and re-use than PGT. On the other
hand, the PGT can result in many nodes with properties as literal key-value pairs, not explicit
edges. This representation can be a natural choice to represent descriptions of nodes.


7. Conclusion
In this paper, we have evaluated and discussed how to transform RDF-star graphs into property
graphs. To evaluate existing approaches (Neosemantics, RDF-star Tools, and RDF2PG), we have
identified a number of test cases. Our analysis has shown that none of these three approaches
supports all test cases and that none of them is the best for all applications. None of them con-
siders user requirements for the transformation process. Nevertheless, existing approaches can
roughly be categorized into two lines of transformation approaches: RDF-topology preserving
transformation (RPT) and PG transformation (PGT). In our future work, we plan to expand
our experiments to comprehensive datasets and combining RPT and PGT into a single hybrid
transformation approach. Additionally, we plan to work on the query interoperability between
RDF-star and property graphs.

Acknowledgments
This research was partially funded by the Danish Council for Independent Research (DFF) under
grant agreement no. DFF-8048-00051B and the Poul Due Jensen Foundation.

References
 [1] M. Rodriguez, P. Neubauer, Constructions from Dots and Lines, American Society for
     Information Science and Technology (2010) 8366.
 [2] O. Hartig, Foundations of RDF* and SPARQL*:(An Alternative Approach to Statement-Level
     Metadata in RDF), in: AMW, 2017.
 [3] D. Tomaszuk, RDF Data in Property Graph Model, in: MTSR, 2016, pp. 104–115.
 [4] N. Francis, A. Green, P. Guagliardo, L. Libkin, T. Lindaaker, V. Marsault, S. Plantikow,
     M. Rydberg, P. Selmer, A. Taylor, Cypher: An Evolving Query Language for Property
     Graphs, in: SIGMOD, 2018, pp. 1433–1445.
 [5] A. Deutsch, Y. Xu, M. Wu, V. E. Lee, Aggregation Support for Modern Graph Analytics in
     TigerGraph, in: SIGMOD, 2020, pp. 377–392.
 [6] O. Hartig, Foundations to Query Labeled Property Graphs using SPARQL, in: AMAR,
     2019.
 [7] R. Angles, H. Thakkar, D. Tomaszuk, RDF and Property Graphs Interoperability: Status
     and Issues, in: AMW, 2019.
 [8] O. Lassila, M. Schmidt, B. Bebee, D. Bechberger, W. Broekema, A. Khandelwal, K. Lawrence,
     R. Sharda, B. Thompson, Graph? yes! which one? help!, preprint arXiv:2110.13348 (2021).
 [9] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann,
     M. Morsey, P. Van Kleef, S. Auer, et al., DBpedia – A Large-scale, Multilingual Knowledge
     Base Extracted from Wikipedia, Semantic web (2015) 167–195.
[10] F. M. Suchanek, G. Kasneci, G. Weikum, Yago: A Core of Semantic Knowledge, in: Proc. of
     TheWebConf, 2007, pp. 697–706.
[11] D. Vrandečić, Wikidata: A New Platform for Collaborative Data Collection, in: TheWebConf,
     2012, pp. 1063–1064.
[12] S. Khayatbashi, S. Ferrada, O. Hartig, Converting Property Graphs to RDF: A Preliminary
     Study of the Practical Impact of Different Mappings (2022).
[13] R. Angles, H. Thakkar, D. Tomaszuk, Mapping RDF Databases to Property Graph Databases,
     IEEE Access (2020) 86091–86110.
[14] V. Nguyen, O. Bodenreider, A. Sheth, Don’t Like RDF Reification? Making Statements
     about Statements Using Singleton Property, in: TheWebConf, 2014, pp. 759–770.
[15] J. J. Carroll, C. Bizer, P. Hayes, P. Stickler, Named graphs, Journal of Web Semantics (2005)
     247–267.
[16] J. Frey, K. Müller, S. Hellmann, E. Rahm, M.-E. Vidal, Evaluation of Metadata Representa-
     tions in RDF stores, Semantic Web (2019) 205–229.
[17] D. Hernández, L. Galárraga, K. Hose, Computing How-Provenance for SPARQL Queries
     via Query Rewriting, PVLDB (2021) 3389–3401.
[18] O. Pelgrin, L. Galárraga, K. Hose, Towards Fully-fledged Archiving for RDF Datasets,
     Semantic Web (2021) 903–925.
[19] J. Bruyat, P.-A. Champin, L. Médini, F. Laforest, PREC: semantic translation of property
     graphs, arXiv preprint arXiv:2110.12996 (2021).
[20] G. Abuoda, D. Dell’Aglio, A. Keen, K. Hose, Transforming RDF-star to Property Graphs: A
     Preliminary Analysis of Transformation Approaches - extended version, arXiv preprint
     arXiv:2210.05781 (2022).
[21] A. Harth, K. Hose, R. Schenkel (Eds.), Linked Data Management, Chapman and Hall/CRC,
     2014.
[22] O. Hartig, K. Hose, J. F. Sequeda, Linked data management, in: Encyclopedia of Big Data
     Technologies, Springer, 2019.
[23] S. Das, J. Srinivasan, M. Perry, E. I. Chong, J. Banerjee, A Tale of Two Graphs: Property
     Graphs as RDF in Oracle, in: EDBT, 2014, pp. 762–773.
[24] N. Baken, Linked Data for Smart Homes: Comparing RDF and Labeled Property Graphs,
     in: LDAC2020, 2020, pp. 23–36.

</pre>