Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches

Ghadeer Abuoda1, Daniele Dell’Aglio1, Arthur Keen2 and Katja Hose1
1 Department of Computer Science, Aalborg University, Aalborg, Denmark
2 ArangoDB, San Francisco, United States

Abstract
RDF and property graph models have many similarities, such as using basic graph concepts like nodes and edges. However, the models differ in their modeling approach, expressivity, serialization, and the nature of their applications. RDF is the de facto standard model for knowledge graphs on the Semantic Web and is supported by a rich ecosystem for inference and processing. The property graph model, in contrast, provides advantages in scalable graph analytical tasks, such as graph matching, path analysis, and graph traversal. RDF-star extends RDF and allows capturing metadata as a first-class citizen. To tap into the advantages of alternative models, the literature proposes different ways of transforming knowledge graphs between property graphs and RDF. However, most of these approaches cannot provide complete transformations for RDF-star graphs. Hence, this paper provides a step towards transforming RDF-star graphs into property graphs. In particular, we identify different cases to evaluate transformation approaches from RDF-star to property graphs. Specifically, we categorize two classes of transformation approaches and analyze them based on the test cases. The obtained insights will form the foundation for building complete transformation approaches in the future.

1. Introduction
The most popular models for representing knowledge graphs are RDF1 (Resource Description Framework) and property graphs [1] (PG).
While RDF represents knowledge graphs as a set of subject-predicate-object triples, property graphs assign key-value style properties to nodes and edges. Recently, RDF-star [2] has been proposed as an extension of RDF that enriches RDF triples with metadata by embedding triples in the subject or object position of other triples. This allows statements about statements and somewhat resembles adding properties to edges in property graphs. RDF-star is supported by a rich ecosystem of data management systems and standards, most notably systems such as Stardog, OpenLink’s Virtuoso, Ontotext GraphDB, AllegroGraph, Apache Jena, and more recently also Oxigraph, but also query standards, such as SPARQL2 and its extension SPARQL-star3, as well as RDF Schema, which allows describing classes of RDF resources and properties4.

QuWeDa 2022: 6th Workshop on Storing, Querying and Benchmarking Knowledge Graphs at ISWC, October 23, 2022, virtual
gsmas@cs.aau.dk (G. Abuoda); dade@cs.aau.dk (D. Dell’Aglio); arthur@arangodb.com (A. Keen); khose@cs.aau.dk (K. Hose)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1 RDF 1.1 Primer: https://www.w3.org/TR/rdf11-primer/
2 SPARQL 1.1 Query Language: https://www.w3.org/TR/sparql11-query/
3 SPARQL-star Query Language: https://w3c.github.io/rdf-star/cg-spec/editors_draft.html#sparql-star

Figure 1: Graphical Representation of Listing 1 as (a) RDF-star Graph and (b) Property Graph

Listing 1: An example RDF-star graph in Turtle-star format
@prefix ex: <http://example.com/> .
<< ex:Alex ex:age 25 >> ex:certainty 0.5 .
In contrast, many graph database systems, such as Neo4j, TigerGraph, JanusGraph, RedisGraph, and SAP HANA are based on different variations of the property graph model [3] and different query languages [4, 5]. Unfortunately, RDF-star graphs and property graphs are not entirely compatible with one another. Although they both describe data through graphs, their underlying models and semantics are different, leading to many data interoperability issues [6, 7, 8]. Metadata or edge properties in RDF-star can be modeled as separate nodes or RDF-star triples. In contrast, edge properties can only be represented as literal key-value pairs in property graphs. In general, it is challenging to transform an RDF-star graph fully into a property graph because of the rich expressiveness of the former. The heterogeneity between the two models and their frameworks makes it necessary to study their interoperability, i.e., the ability to map one model to another for data exchange and sharing [7]. The mapping between the two models is crucial for data exchange, data integration as well as reusability of systems and tools between the frameworks. RDF-star, specifically the RDF model, is recognized as a web-native model that supports data exchange and sharing across different sources because of its formal semantics and the universal uniqueness of resources using IRIs. RDF is a common and flexible model for knowledge representation, and that is exemplified by knowledge graphs that cover a broad set of domains, such as DBpedia [9], YAGO [10], and Wikidata [11]. On the contrary, even with the wide adoption of property graph engines, property graphs lack many essential features, such as a schema language, a standard query language, standard data serialization formats, etc. Achieving interoperability and reliable transformations between the two frameworks will finally enable us to exploit the benefits of both models. 
The transformation of property graphs to RDF-star has been explored recently [12], and basic transformation rules for property graphs to RDF-star were proposed [2, 13]. However, these rules do not cover all RDF-star constructs and allow for multiple alternatives. Consider, for instance, the example illustrated in Figure 1(a). If we start with the triple (ex:Alex, ex:age, 25) in Listing 1, then we could represent the RDF element (Alex) as a node in a property graph, as shown in Figure 1(b). This node would then have a property (age, 25), and the RDF triple would be represented by a single node in the property graph. However, with a single node and no edge, we cannot represent the metadata about the original RDF triple (ex:certainty 0.5) in the property graph.

4 RDF Schema 1.1: https://www.w3.org/TR/rdf-schema/

Studying such cases, this paper makes the following contributions:
• We identify two alternative approaches of transformations: RDF-topology-preserving and Property-Graph transformation.
• We define a set of test cases capturing the diverse RDF-star constructs that have to be considered when transforming RDF-star to property graphs.
• Using the test cases, we systematically evaluate alternative mapping approaches and identify their shortcomings.

This paper is structured as follows: Section 2 introduces preliminaries, and Section 3 discusses related work. Section 4 presents alternative transformation approaches. Afterwards, Section 5 provides details on our test cases, which we use in Section 6 to identify and discuss shortcomings of transformation approaches. Section 7 concludes the paper with an outlook on future work.

2. Preliminaries
In this section, we formally introduce RDF1, RDF-star [2], and property graphs [1].

2.1. Resource Description Framework (RDF)
RDF is a W3C standard data model that represents information as a set of statements. Each statement denotes a typed relation between two resources.

Definition 1 (RDF statement).
Let 𝐼, 𝐵 and 𝐿 be the disjoint sets of Internationalized Resource Identifiers (IRIs), blank nodes and literals. An RDF statement is a triple (𝑠, 𝑝, 𝑜) ∈ (𝐼 ∪ 𝐵) × 𝐼 × (𝐼 ∪ 𝐵 ∪ 𝐿), and it indicates that 𝑠 and 𝑜 (subject and object, resp.) are in a relation 𝑝 (predicate).

In this paper, we consider two types of RDF statements, which we distinguish based on whether the object is a resource (an IRI or a blank node) or a literal. Object property statements are RDF statements (𝑠, 𝑝, 𝑜) ∈ (𝐼 ∪ 𝐵) × 𝐼 × (𝐼 ∪ 𝐵), while datatype property statements are RDF statements (𝑠, 𝑝, 𝑜) ∈ (𝐼 ∪ 𝐵) × 𝐼 × 𝐿.

An RDF graph containing three RDF statements is shown in Listing 2, serialized in Turtle5, and visualized as a graph in Figure 2(a). The first two statements are object property statements. The first statement describes two resources, ex:Apple_Inc and ex:California, related by the predicate ex:located_in. The second statement indicates that ex:Apple_Inc has ex:Tim_Cook as its ex:CEO. The last statement is a datatype property statement, and it indicates that ex:Tim_Cook has the literal "2011" as the value of the ex:start_date predicate.

5 RDF Turtle: https://www.w3.org/TR/turtle/

Listing 2: An RDF graph in Turtle format
@prefix ex: <http://example.com/> .
ex:Apple_Inc ex:located_in ex:California .
ex:Apple_Inc ex:CEO ex:Tim_Cook .
ex:Tim_Cook ex:start_date 2011 .

Figure 2: Example in RDF, RDF-star, and Property Graphs: (a) RDF Graph, (b) RDF-star Graph, (c) Property Graph

2.2. RDF-star
Looking at the RDF graph in Figure 2(a), one can spot an imprecise data modelling choice: stating that Tim Cook started in 2011 is not entirely correct, as what started in 2011 is his role as CEO of Apple. In other words, one should associate the starting date with the statement (ex:Apple_Inc, ex:CEO, ex:Tim_Cook), as depicted in Figure 2(b).
There are several ways to implement this idea in RDF, such as RDF reification [2], singleton properties [14], and named graphs [15]. However, these mechanisms have significant shortcomings [16, 17, 18].

A solution to overcome such shortcomings was recently proposed by Hartig et al. with RDF-star [2, 12]. RDF-star extends RDF by letting RDF statements be subjects or objects in other statements. Listing 3 shows an RDF-star document serialized in Turtle-star [2]. The first statement is a compliant RDF statement (it also appears in Listing 2). The second statement indicates that ex:Apple_Inc appointed ex:Tim_Cook as ex:CEO in 2011.

Listing 3: An RDF-star Graph in Turtle-star Format
@prefix ex: <http://example.com/> .
ex:Apple_Inc ex:located_in ex:California .
<< ex:Apple_Inc ex:CEO ex:Tim_Cook >> ex:start_date 2011 .

We formally define an RDF-star statement as follows.

Definition 2 (RDF-star statement). Let 𝑠 ∈ 𝐼 ∪ 𝐵, 𝑝 ∈ 𝐼, 𝑜 ∈ 𝐼 ∪ 𝐵 ∪ 𝐿. An RDF-star statement is a triple defined recursively as:
• Any RDF statement (𝑠, 𝑝, 𝑜) is an RDF-star statement;
• Let 𝑡 and 𝑡̄ be RDF-star statements. Then, (𝑡, 𝑝, 𝑜), (𝑠, 𝑝, 𝑡) and (𝑡, 𝑝, 𝑡̄) are RDF-star statements, also known as asserted statements. 𝑡 and 𝑡̄ are called embedded or quoted statements.

2.3. Property Graphs
A property graph (PG) is a graph where nodes and edges can have multiple properties, represented as key-value pairs. Figure 2(c) illustrates the graph described in the above section as a property graph. In this case, the starting date of Tim Cook as the CEO of Apple Inc is reported as a key-value property on the :CEO edge. There is no unique, standardized PG model; each PG engine proposes its own data model. A generic PG model definition is proposed in [3].

Definition 3 (Property Graph). Let 𝐿 be the set of labels, 𝑃𝑁 be the set of property names, and 𝐷 be the set of property values.
A property graph 𝐺 is an edge-labeled directed multi-graph such that 𝐺 = (𝑁, 𝐸, 𝑒𝑑𝑔𝑒, 𝑙𝑏𝑙, 𝑃, 𝜎), where:
• 𝑁 is a set of nodes,
• 𝐸 is a set of edges between nodes, such that 𝑁 ∩ 𝐸 = ∅,
• 𝑒𝑑𝑔𝑒 : 𝐸 → (𝑁 × 𝑁) is a total function that associates each edge in 𝐸 with a pair of nodes in 𝑁. If 𝑒𝑑𝑔𝑒(𝑒1) = (𝑛1, 𝑛2), 𝑛1 is the source node and 𝑛2 is the target node,
• 𝑙𝑏𝑙 : (𝑁 ∪ 𝐸) → 𝒫(𝐿) is a function that associates each edge or node with a set of labels,
• 𝜎 : (𝑁 ∪ 𝐸) → 𝒫(𝑃) is a function that associates a node or edge with a non-empty set of properties, where 𝑃 is a set of key-value pairs (𝑘, 𝑣) with 𝑘 ∈ 𝑃𝑁 and 𝑣 ∈ 𝐷.

To ease the representation of all approaches’ output, we map any IRI to a distinct string representing a local name. Given the set 𝐼 of all IRIs, 𝑙𝑜𝑐𝑎𝑙𝑁𝑎𝑚𝑒 is a function that maps an IRI to a string that represents the local name of an RDF resource6. For example, 𝑙𝑜𝑐𝑎𝑙𝑁𝑎𝑚𝑒("http://example.com/meets") is "meets". We will use this function in the output representation in Section 6.

3. Related Work
We can distinguish between related work on converting between (i) RDF and PG and (ii) RDF-star and PG.

RDF and PG. Angles et al. [13] propose three variations of transforming RDF into PG, optionally taking schemas into account: simple, generic, and complete. The authors formally show that two of the proposed mapping approaches (generic and complete) satisfy the property of information preservation, i.e., there exist inverse mappings that allow recovering the original dataset without information loss. The evaluation in this paper (Section 6) includes the schema-independent mapping referred to as the Generic Database Mapping, using the authors’ implementation (RDF2PG7). Although our focus is on RDF-star (instead of RDF), we include this approach since it provides the basic formalities and an implementation that can be extended to support RDF-star.

In the opposite direction, Bruyat et al.
[19] propose PREC8, a library that transforms PGs into RDF graphs. The authors built a uniform graph model to describe the structure of the property graph in RDF terms. PREC uses a context in RDF-star format that describes the mappings between the terms used in the PG model and IRIs. The user can define a template representing the different properties and edges in the resulting RDF graph. Despite using RDF-star internally, the approach does not support mapping PGs to RDF-star graphs.

6 In Neo4j, the user can configure the local name of RDF terms such as subPropertyOf, subClassOf, Class, etc.
7 https://github.com/renzoar/rdf2pg
8 https://bruju.github.io/PREC/

Listing 4: RDF triples
@prefix ex: <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:book ex:publish_date "1963-03-22"^^xsd:date .
ex:book ex:pages "100"^^xsd:integer .
ex:book ex:cover 20 .
ex:book ex:index "55" .

RDF-star and PG. Hartig et al. [2] propose two approaches for transforming RDF-star to PG. The first approach maps ordinary RDF triples to edges in the PG. Metadata triples are then represented as edge properties. Our analysis (Section 6) includes this approach using the authors’ implementation (RDF-star Tools9). The second approach treats datatype and object property statements differently. The former are transformed into node properties and the latter into edges. This approach, however, is limited in mapping embedded triples, and it is not implemented in the RDF-star Tools library.

Neosemantics is a well-known project to import RDF data into Neo4j, implemented as a Neo4j plug-in10. The implementation was only recently extended to include an RDF-star importing feature. As we will see in Section 6, importing RDF-star into PG using this transformation is lossy and does not cover all cases.

In the other direction, Khayatbashi et al. [12] present an analysis evaluating three transformation approaches from PG to RDF, including an RDF-star approach.
As a part of the study, the authors evaluated the performance of querying the generated RDF graphs in multiple triple stores. They found that there is no clear best mapping in terms of execution time; the performance of the queries over RDF and RDF-star graphs resulting from the mapping varies compared to their equivalent pure RDF representations.

9 https://github.com/RDFstar/RDFstarTools
10 Neosemantics: https://neo4j.com/labs/neosemantics/

4. Transformation Approaches from RDF to Property Graphs
Analyzing the approaches discussed in Section 3, we can extrapolate two principal approaches: RDF-topology Preserving Transformation (RPT) and Property Graph Transformation (PGT). RPT tries to preserve the RDF-star graph structure by transforming each RDF statement into an edge in the PG. PGT, on the other hand, ensures that datatype property statements are mapped to node properties in the PG. In what follows, we first explain how these approaches transform RDF triples into PG and afterwards how the basic algorithms can be extended to support RDF-star.

Consider the example in Listing 4 with multiple datatype property statements describing the RDF resource (ex:book). Figure 3 shows graphical visualizations of the property graphs generated by the two approaches: RPT (a) and PGT (b). RPT, for example, converts the triple (ex:book, ex:index, 55) into two nodes (ex:book) and (55), connected by an edge (ex:index). All other triples involving RDF resources, blank nodes, or literal values can be transformed in a similar way, so that we obtain the PG in Figure 3(a). Algorithm 1 formalizes the RPT approach: for each triple, it always creates a node for the subject (line 3) and the object (line 5) with an edge connecting them (line 12), while avoiding duplicate nodes for the same IRIs.
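The RPT procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of any of the evaluated tools: the `Literal` wrapper and the `PropertyGraph` container are hypothetical helpers introduced here, terms are plain strings, and datatypes are ignored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper that marks a term as an RDF literal."""
    value: str


@dataclass
class PropertyGraph:
    """Hypothetical minimal PG container (labels only, no properties)."""
    nodes: dict = field(default_factory=dict)  # node term -> set of labels
    edges: list = field(default_factory=list)  # (source, target, label)


def rpt(triples):
    """RPT sketch: one node per distinct subject/object, one edge per triple."""
    pg = PropertyGraph()
    for s, p, o in triples:
        # setdefault avoids duplicate nodes for the same term
        pg.nodes.setdefault(s, set()).add("RDF resource")
        label = "Literal" if isinstance(o, Literal) else "RDF resource"
        pg.nodes.setdefault(o, set()).add(label)
        pg.edges.append((s, o, p))
    return pg


# The four datatype property statements of Listing 4 (datatypes omitted)
listing4 = [
    ("ex:book", "ex:publish_date", Literal("1963-03-22")),
    ("ex:book", "ex:pages", Literal("100")),
    ("ex:book", "ex:cover", Literal("20")),
    ("ex:book", "ex:index", Literal("55")),
]

pg = rpt(listing4)
print(len(pg.nodes), len(pg.edges))  # 5 4
```

Under these assumptions, the four statements yield one node for ex:book, one node per literal, and four edges, which mirrors the shape of the RPT graph in Figure 3(a).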
For the same example (Listing 4), PGT creates the PG in Figure 3(b), consisting of a single node representing the RDF resource (ex:book) with multiple properties representing predicate-object pairs from the RDF statements, such as (ex:index, 55). Distinguishing between datatype and object property statements, this approach transforms object property statements to edges and datatype property statements to properties of the node representing the subject. Unlike RPT, the resulting PG nodes represent only RDF resources or blank nodes, while literal objects become properties. PGT is more formally sketched in Algorithm 2, which first checks the type of the statement’s object (line 5) and based on that decides to either create a node (if it does not yet exist, line 6) or a property (line 13).

Algorithm 1: RPT
Input: A set of RDF Triples 𝑇
Output: A Property Graph 𝑃𝑔 = (𝑁, 𝐸, 𝑒𝑑𝑔𝑒, 𝑙𝑏𝑙, 𝑃, 𝜎)
 1: 𝑃𝑔 ← ∅
 2: for 𝑡 ∈ 𝑇, such that 𝑡 = ⟨𝑠, 𝑝, 𝑜⟩ do
 3:   𝑁 = 𝑁 ∪ {𝑠}
 4:   𝑙𝑏𝑙(𝑠) = {"RDF resource"}
 5:   𝑁 = 𝑁 ∪ {𝑜}
 6:   if 𝑜 is an RDF resource then
 7:     𝑙𝑏𝑙(𝑜) = {"RDF resource"}
 8:   else
 9:     𝑙𝑏𝑙(𝑜) = {"Literal"}
10:   end if
11:   𝐸 = 𝐸 ∪ {𝑒}
12:   𝑒𝑑𝑔𝑒(𝑒) = (𝑠, 𝑜)
13:   𝑙𝑏𝑙(𝑒) = 𝑝
14: end for
15: return 𝑃𝑔

Algorithm 2: PGT
Input: A set of RDF Triples 𝑇
Output: A Property Graph 𝑃𝑔 = (𝑁, 𝐸, 𝑒𝑑𝑔𝑒, 𝑙𝑏𝑙, 𝑃, 𝜎)
 1: 𝑃𝑔 ← ∅
 2: for 𝑡 ∈ 𝑇, such that 𝑡 = ⟨𝑠, 𝑝, 𝑜⟩ do
 3:   𝑁 = 𝑁 ∪ {𝑠}
 4:   𝑙𝑏𝑙(𝑠) = {"RDF resource"}
 5:   if 𝑜 is an RDF resource then
 6:     𝑁 = 𝑁 ∪ {𝑜}
 7:     𝑙𝑏𝑙(𝑜) = {"RDF resource"}
 8:     𝐸 = 𝐸 ∪ {𝑒}
 9:     𝑒𝑑𝑔𝑒(𝑒) = (𝑠, 𝑜)
10:     𝑙𝑏𝑙(𝑒) = 𝑝
11:   else
12:     𝑃 = 𝑃 ∪ {𝑝𝑟}
13:     𝑝𝑟 = {(𝑝, 𝑜)}
14:     𝜎(𝑠) = 𝑝𝑟
15:   end if
16: end for
17: return 𝑃𝑔

Figure 3: RPT and PGT transformations for the example in Listing 4: (a) RPT, (b) PGT

Let us now consider the RDF-star example in Listing 5, which contains an asserted triple for an embedded datatype property statement; the PGs obtained by applying RPT
and PGT are shown in Figure 4. Algorithms 3 and 4 illustrate the main principle of mapping the embedded and asserted triples. RPT first converts the embedded triple exactly as it would an ordinary RDF triple and then converts the asserted triple into an edge property (Algorithm 3 lines 5-8). PGT transforms the embedded triple depending on its object; if it is an RDF resource, PGT converts it to an edge. Otherwise, it converts the embedded triple into a node with a property (Algorithm 4 lines 6-11) and fails to transform the asserted triple.

Listing 5: RDF-star triples
@prefix ex: <http://example.com/> .
<> ex:certainty 1 .

In summary, the transformation of the triples from Listing 5 using PGT results in a PG with a single node, which makes it impossible to represent the asserted triple, since PGs do not support properties on other properties. In contrast, RPT transforms the embedded triple into an edge in the PG and can express the asserted triple as the edge’s property. Abstracting away from a few details (see also Section 6), the Neosemantics approach10 basically follows PGT, while RDF-star Tools9 and RDF2PG7 follow RPT.

Algorithm 3: RPT-star
Input: A set of RDF-star Triples 𝑇
Output: A Property Graph 𝑃𝑔 = (𝑁, 𝐸, 𝑒𝑑𝑔𝑒, 𝑙𝑏𝑙, 𝑃, 𝜎)
 1: 𝑃𝑔 ← ∅
 2: for 𝑡 ∈ 𝑇, such that 𝑡 = ⟨𝑠, 𝑝, 𝑜⟩ do
 3:   if 𝑡̄ is an embedded triple, such that 𝑡 = ⟨𝑡̄, 𝑝, 𝑜⟩ and 𝑡̄ = ⟨𝑠̄, 𝑝̄, 𝑜̄⟩ then
 4:     𝑃𝑔𝑂𝑢𝑡 = RPT(𝑡̄)
 5:     𝑝𝑟 = {(𝑝, 𝑜)}
 6:     𝑃 = 𝑃 ∪ {𝑝𝑟}
 7:     𝜎(𝑒) = 𝑝𝑟
 8:   end if
 9: end for
10: return 𝑃𝑔 ∪ 𝑃𝑔𝑂𝑢𝑡

Algorithm 4: PGT-star
Input: A set of RDF-star Triples 𝑇
Output: A Property Graph 𝑃𝑔 = (𝑁, 𝐸, 𝑒𝑑𝑔𝑒, 𝑙𝑏𝑙, 𝑃, 𝜎)
 1: 𝑃𝑔 ← ∅
 2: for 𝑡 ∈ 𝑇, such that 𝑡 = ⟨𝑠, 𝑝, 𝑜⟩ do
 3:   if 𝑡̄ is an embedded triple, such that 𝑡 = ⟨𝑡̄, 𝑝, 𝑜⟩ and 𝑡̄ = ⟨𝑠̄, 𝑝̄, 𝑜̄⟩ then
 4:     if 𝑜̄ is an RDF resource then
 5:       𝑃𝑔𝑂𝑢𝑡 = PGT(𝑡̄)
 6:     else
 7:       𝑃𝑔𝑂𝑢𝑡 = PGT(𝑡̄)
 8:       𝑝𝑟 = {(𝑝̄, 𝑜̄)}
 9:       𝑃 = 𝑃 ∪ {𝑝𝑟}
10:       𝜎(𝑠̄) = 𝑝𝑟
11:     end if
12:   end if
13: end for
14: return 𝑃𝑔 ∪ 𝑃𝑔𝑂𝑢𝑡

5.
Test Cases
In this section, we present a systematic list of test cases that transformation approaches need to fulfill. We distinguish between basic cases that conform to small RDF graphs and a range of RDF-star specific test cases that challenge existing approaches, as we will see in our evaluation in Section 6. The complete list of test cases with their short titles is shown in Table 1; subcases represent variations, and bold font indicates cases discussed in more detail in this paper. For the sake of space, we present only part of these cases in this section; more details for all cases are available on our website11.

11 https://relweb.cs.aau.dk/rdfstar

Figure 4: RPT and PGT transformations for the example in Listing 5: (a) RPT, (b) PGT (legend: IRI = http://example.com/)

5.1. Standard RDF
Case 1: Standard RDF statement
This case represents an object property statement. Both subject and object are RDF resources. Most transformation approaches map this case to two nodes (subject and object) with an edge (the predicate) connecting them.
@prefix ex: <http://example.com/> .
ex:alice ex:meets ex:bob .

Case 2: The predicate of an RDF statement is subject in another statement
Mapping an RDF statement to two nodes with the predicate as the label of the edge between them leads to problems when the predicate itself is also used as a subject in another RDF statement. Case 2.1 therefore consists of the following statements:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.com/> .
ex:Sam ex:mentor ex:Lee .
ex:mentor rdfs:label "project supervisor" .
ex:mentor ex:name "mentor's name" .
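The difficulty behind Case 2.1 can be made concrete with a small Python sketch. It assumes a simplified RPT-style mapping (a node for every subject and object, one labelled edge per triple); the helper `rpt_edges` is hypothetical and terms are plain strings, so actual tools may differ in detail.

```python
def rpt_edges(triples):
    """Simplified RPT-style mapping: nodes for subjects/objects, one labelled edge per triple."""
    nodes, edges = set(), []
    for s, p, o in triples:
        nodes.update((s, o))
        edges.append((s, p, o))
    return nodes, edges


# The statements of Case 2.1, with literals kept as quoted strings
case_2_1 = [
    ("ex:Sam", "ex:mentor", "ex:Lee"),
    ("ex:mentor", "rdfs:label", '"project supervisor"'),
    ("ex:mentor", "ex:name", '"mentor\'s name"'),
]

nodes, edges = rpt_edges(case_2_1)
edge_labels = {p for _, p, _ in edges}

# Terms that end up both as a node and as an edge label
print(nodes & edge_labels)  # {'ex:mentor'}
```

In this sketch ex:mentor is materialized twice, once as the label of the edge between ex:Sam and ex:Lee and once as a node carrying the label and name statements, and nothing in the property graph connects the two occurrences.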
Other variants of Case 2 use the predicate as the subject of statements with non-literal objects, for example via rdf:type and rdfs:subPropertyOf.

Case 3: Data types and language tags
It is also important to test the support of different data types and language tags. Hence, Case 3.1, for instance, contains several datatype property statements involving different data types and formats for the literal objects:
@prefix ex: <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:book ex:publish_date "1963-03-22"^^xsd:date .
ex:book ex:pages "100"^^xsd:integer .
ex:book ex:cover 20 .
ex:book ex:index "55" .

Table 1: Test cases for evaluating RDF-star-to-PG transformation approaches

Standard RDF
Case   Description
1      Standard RDF statement
2      The predicate of an RDF statement is subject in another statement
2.1    Predicate as subject and literal as object
2.2    Predicate as subject and RDF resource as object
2.3    Predicate as subject and RDF property as object - rdfs:subPropertyOf
2.4    Predicate as subject and RDF class as object - rdf:type
3      Data types and language tags
3.1    Datatype property statements with different data types of the literal objects
3.2    Datatype property statements with different language tags of the literal objects
4      RDF list
5      Blank nodes
6      Named graphs
7      Multiple types for resources - rdf:type

RDF-star
Case   Description
8      Embedded object property statement in subject position
9      Embedded datatype property statement in subject position
10     Embedded object property statement in object position
11     Embedded object property statement in subject position and non-literal object
11.1   Asserted statement with non-literal object
11.2   Asserted statement with non-literal object that appears in another asserted statement
12     Embedded statement in subject position - object property with rdf:type predicate
12.1   Asserted statement with rdf:type as predicate
12.2   Embedded statement with rdf:type as predicate
13     Double nested RDF-star statement in subject position
14     Multi-valued properties
14.1   RDF statements with same subject and predicate and different objects
14.2   RDF-star statements with the same subject and predicate and different objects
15     Multiple instances of embedded statements in a single RDF-star graph
15.1   Identical embedded RDF-star statements with different asserted statements
15.2   RDF statements as embedded and asserted statements in the same graph

5.2. RDF-star
Case 8: Embedded object property statement in subject position
As the name indicates and the following listing shows, this test case features an RDF-star statement where the subject corresponds to an embedded object property statement and the object is a literal:
@prefix ex: <http://example.com/> .
<> ex:certainty 0.5 .

Case 9: Embedded datatype property statement in subject position
Similar to the previous case, we again have an RDF-star statement where the subject corresponds to an embedded statement. In contrast to Case 8, the embedded statement in Case 9 is a datatype property statement:
@prefix ex: <http://example.com/> .
<> ex:certainty 1 .

Case 10: Embedded object property statement in object position
Of course, RDF-star statements can also have embedded statements in object position, which is covered in this case. Similar to Case 8, the embedded statement is an object property statement.
@prefix ex: <http://example.com/> .
ex:bobhomepage ex:source <> .

Other test cases cover other variations of asserted statements (Case 11), the usage of rdf:type in the embedded and asserted statements (Case 12), the double nesting of RDF-star statements (Case 13), the same RDF-star statement with different asserted statements (Case 14), and multiple occurrences of an RDF-star statement within the same graph (Case 15). As mentioned above, details can be found on our project website11.

6. Analysis and Discussion
In this section, we use the test cases identified in Section 5 to evaluate a number of transformation approaches that we have identified in Section 3: RDF2PG7, RDF-star Tools9, and Neosemantics10 (Neo4j Community Edition version 4.3.6).
The complete results and analysis can be found in the extended arXiv version of this paper [20]. As we will see, and as already mentioned in Section 4, RDF-star Tools and RDF2PG follow the RDF-topology Preservation Transformation (RPT), whereas Neosemantics adopts the Property Graph Transformation (PGT).

Figure 5: PGs obtained for Case 1: (a) Neosemantics, (b) RDF-star Tools, (c) RDF2PG

6.1. Standard RDF
At first, let us discuss our findings for Cases 1 through 7 from Table 1, targeting standard RDF statements.

Case 1 corresponds to a simple RDF statement with IRIs as subject and object. Unsurprisingly, all three libraries create a PG with two nodes and one edge (see Figure 5). The main differences are in the way types, labels, and properties are handled. Both Neosemantics and RDF2PG use “Resource” as the label of the two nodes and a key-value pair (key=IRI, value=IRI of the subject/object). RDF-star Tools uses two properties for the nodes: (key=“kind”, value=“IRI”) and (key=“IRI”, value=subject/object IRI). Whereas Neosemantics and RDF2PG use the predicate from the RDF statement as the edge’s type, RDF-star Tools uses the predicate as an edge label. RDF2PG also adds “ObjectProperty” as an edge label.
Figure 6: PGs obtained for Case 3: (a) Neosemantics, (b) RDF-star Tools, (c) RDF2PG

Case 3.1 tests the transformation of datatype property statements. In this case, the differences between the libraries are more evident, as depicted in Figure 6. RDF-star Tools and RDF2PG represent each literal using a separate node. The properties of the edge are similar to the ones described for Case 1, with RDF2PG using “DatatypeProperty” as edge label. Neosemantics, instead, creates only a single node representing the complete set of input RDF statements; the node has one property for each datatype property statement in the input. Additionally, Cases 3.1 and 3.2 also test how transformation approaches support data types and language tags in RDF statements.
While Neosemantics ignores them, both RDF-star Tools and RDF2PG define the nodes representing the literal objects with type literal and use the XSD schema data type as annotation.

Cases 4 through 6 test different RDF features, such as RDF lists (Case 4), blank nodes (Case 5), and named graphs (Case 6). All three projects support RDF lists and blank nodes, but only Neosemantics can also import named graphs.

6.2. RDF-star
Let us now discuss our findings for Cases 8 through 13 from Table 1, targeting diverse RDF-star constructs. We have observed that many of them are not supported by existing libraries.

RDF2PG does not support embedded RDF-star statements, so none of these cases could be converted to a PG. Neosemantics ignores RDF-star statements where: (i) the object of the asserted statement is a literal value (Case 9) or an embedded statement (Case 10), (ii) all elements of an RDF-star statement are RDF resources (Cases 11.1 and 11.2), (iii) the predicate of the embedded or asserted statement is "rdf:type" (Cases 12.1 and 12.2), or (iv) the RDF-star statement is a double embedded statement (Case 13).

RDF-star Tools supports most cases by creating nodes in the PG for literal objects. However, Cases 10 and 13 cause an error in the reading phase of the conversion: we believe that this is an implementation error, and both cases could be supported in the same manner as the others.

Figure 7: PGs obtained for Case 9: (a) Neosemantics, (b) RDF-star Tools

Among all cases, Case 9 is particularly interesting and represents a natural extension of Case 3.1.
If the embedded statement (mark,age,25) is translated in PGT-style, e.g., by Neosemantics, mark is represented as a node with a property having age as key and 25 as value. It is then not straightforward how to convert the asserted statement (stating that the certainty of (mark,age,25) is 1). Indeed, when running Neosemantics, we observed that it does not transform the asserted statement completely. RDF-star Tools (RPT-style transformation), on the other hand, converts the statement in Case 9 by transforming the embedded statement to an edge with type age between two nodes representing mark and 25; the remainder of the asserted statement (its predicate and object) is represented as a key-value property attached to the age edge.

6.3. Edge cases

Case 2 (a resource used both as a predicate and as a subject/object in another statement) is not supported correctly by existing transformation approaches. The resulting PGs include two separate and independent elements for such resources: one node and one edge property. This is different from RDF, where an IRI identifies the same resource independently of its position in an RDF statement. Annotating RDF predicates (both datatype and object properties) is a common way to model schema information in RDF. Our finding suggests that there may be more conversion problems when looking at RDF graphs that include RDF Schema and OWL axioms. Existing tools do not offer approaches that work at the schema level. This investigation is beyond the scope of this paper, but we plan to pursue it in our future research.

Other interesting cases are those where the RDF input contains multiple statements with the same subject and predicate but different objects. Case 14.1, for example, contains two datatype property statements with the same subject and predicate. Neosemantics converts the literals ("Info_page","aau_page") into a list of strings (footnote 12) used as the value of a property of the node representing the subject of the RDF statement.

6.4. RDF-topology Preservation Transformation (RPT) and Property Graph Transformation (PGT)

Our analysis confirmed our observation that RDF-star Tools follows the RPT approach; the approach creates a node for each distinct subject and object of every RDF statement. For RDF-star statements, the predicate and object of the asserted statement are transformed into a property, with the key corresponding to the predicate and the value to the object. Neosemantics, in contrast, follows the PGT approach for most of the datatype property statements, e.g., Cases 2.1, 3.1, 3.2, 11.2, and 14.1, and supports types and labels (Cases 2.4, 5, 6, and 7) slightly differently. Finally, the RDF2PG project adopts the RPT approach in all cases it handles.

In summary, the two lines of transformation approaches, RPT and PGT, form the core of state-of-the-art libraries. However, none of the tested libraries supports all test cases. In the end, the requirements for converting RDF-star to PGs vary with the use case and application-specific needs. A user might favor one transformation approach over the other based on ontology availability, the application domain, performance, or additional application-specific needs. Moreover, driven by real-world applications, users might want to adopt a combination of the basic transformation approaches to best fit their needs.

6.5. Discussion

Ontology and schema availability. Application domains, such as financial services, life science, and military, have well-established domain ontologies that are native to the RDF/RDF-star model. Such use cases often also exhibit computationally expensive queries and algorithms that are better suited to PGs. Hence, in such cases, it is preferable to use RPT to convert the ontology and PGT to convert the instance data, which reduces the number of nodes and therefore improves runtime. Another use case involves harvesting Linked Open Data [21, 22], where the PGT approach is usually preferable. Likewise, PGT is easier to use in combination with tabular data.
However, in situations where the ontology is essential and the RDF data is highly complex and heterogeneous, users may prefer RPT. Finally, some graph-based machine learning models also require graphs that model literal property values as nodes.

Performance and query complexity. Query performance over graphs generally depends on the number of edges and nodes [23]. Therefore, the user may choose to convert an RDF/RDF-star graph into a PG using a specific transformation method to comply with performance requirements. For example, in a smart home application [24], the RDF graph was transformed into a PG using a custom transformation. The authors compared the PG generated by the custom transformation with the PG generated by NSMTX (footnote 13). The PG of the custom transformation had two nodes and one edge, compared to 16 nodes and 15 edges for the PG generated by the NSMTX plugin. As a result, executing queries over the former graph is more efficient than over the NSMTX graph. The custom transformation eliminated many edges (i.e., relations) and converted them into properties. Since RPT tends to convert each triple into an edge, it generates many more nodes than PGT, which wraps some triples as node properties.

Data sharing. RPT allows RDF-star asserted triples (i.e., edge attributes) to be represented as nodes in the PG, treating them as individual entities. This approach can make the graph representation more expressive, and thus easier for other users to understand and re-use, than PGT. On the other hand, PGT can result in many nodes whose properties are literal key-value pairs rather than explicit edges. This representation can be a natural choice for representing descriptions of nodes.

12 https://neo4j.com/docs/cypher-manual/current/syntax/values/#composite-types
13 https://neo4j.com/nsmtx-rdf/

7. Conclusion

In this paper, we have evaluated and discussed how to transform RDF-star graphs into property graphs.
To evaluate existing approaches (Neosemantics, RDF-star Tools, and RDF2PG), we have identified a number of test cases. Our analysis has shown that none of these three approaches supports all test cases and that none of them is the best for all applications. Moreover, none of them considers user requirements for the transformation process. Nevertheless, existing approaches can roughly be categorized into two lines of transformation approaches: RDF-topology preserving transformation (RPT) and PG transformation (PGT). In our future work, we plan to expand our experiments to comprehensive datasets and to combine RPT and PGT into a single hybrid transformation approach. Additionally, we plan to work on the query interoperability between RDF-star and property graphs.

Acknowledgments

This research was partially funded by the Danish Council for Independent Research (DFF) under grant agreement no. DFF-8048-00051B and the Poul Due Jensen Foundation.

References

[1] M. Rodriguez, P. Neubauer, Constructions from Dots and Lines, American Society for Information Science and Technology (2010) 8366.
[2] O. Hartig, Foundations of RDF* and SPARQL*: (An Alternative Approach to Statement-Level Metadata in RDF), in: AMW, 2017.
[3] D. Tomaszuk, RDF Data in Property Graph Model, in: MTSR, 2016, pp. 104–115.
[4] N. Francis, A. Green, P. Guagliardo, L. Libkin, T. Lindaaker, V. Marsault, S. Plantikow, M. Rydberg, P. Selmer, A. Taylor, Cypher: An Evolving Query Language for Property Graphs, in: SIGMOD, 2018, pp. 1433–1445.
[5] A. Deutsch, Y. Xu, M. Wu, V. E. Lee, Aggregation Support for Modern Graph Analytics in TigerGraph, in: SIGMOD, 2020, pp. 377–392.
[6] O. Hartig, Foundations to Query Labeled Property Graphs using SPARQL, in: AMAR, 2019.
[7] R. Angles, H. Thakkar, D. Tomaszuk, RDF and Property Graphs Interoperability: Status and Issues, in: AMW, 2019.
[8] O. Lassila, M. Schmidt, B. Bebee, D. Bechberger, W. Broekema, A. Khandelwal, K. Lawrence, R. Sharda, B. Thompson, Graph? Yes! Which one? Help!, arXiv preprint arXiv:2110.13348 (2021).
[9] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. Van Kleef, S. Auer, et al., DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia, Semantic Web (2015) 167–195.
[10] F. M. Suchanek, G. Kasneci, G. Weikum, Yago: A Core of Semantic Knowledge, in: TheWebConf, 2007, pp. 697–706.
[11] D. Vrandečić, Wikidata: A New Platform for Collaborative Data Collection, in: TheWebConf, 2012, pp. 1063–1064.
[12] S. Khayatbashi, S. Ferrada, O. Hartig, Converting Property Graphs to RDF: A Preliminary Study of the Practical Impact of Different Mappings (2022).
[13] R. Angles, H. Thakkar, D. Tomaszuk, Mapping RDF Databases to Property Graph Databases, IEEE Access (2020) 86091–86110.
[14] V. Nguyen, O. Bodenreider, A. Sheth, Don’t Like RDF Reification? Making Statements about Statements Using Singleton Property, in: TheWebConf, 2014, pp. 759–770.
[15] J. J. Carroll, C. Bizer, P. Hayes, P. Stickler, Named Graphs, Journal of Web Semantics (2005) 247–267.
[16] J. Frey, K. Müller, S. Hellmann, E. Rahm, M.-E. Vidal, Evaluation of Metadata Representations in RDF Stores, Semantic Web (2019) 205–229.
[17] D. Hernández, L. Galárraga, K. Hose, Computing How-Provenance for SPARQL Queries via Query Rewriting, PVLDB (2021) 3389–3401.
[18] O. Pelgrin, L. Galárraga, K. Hose, Towards Fully-fledged Archiving for RDF Datasets, Semantic Web (2021) 903–925.
[19] J. Bruyat, P.-A. Champin, L. Médini, F. Laforest, PREC: Semantic Translation of Property Graphs, arXiv preprint arXiv:2110.12996 (2021).
[20] G. Abuoda, D. Dell’Aglio, A. Keen, K. Hose, Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches - extended version, arXiv preprint arXiv:2210.05781 (2022).
[21] A. Harth, K. Hose, R. Schenkel (Eds.), Linked Data Management, Chapman and Hall/CRC, 2014.
[22] O. Hartig, K. Hose, J. F. Sequeda, Linked Data Management, in: Encyclopedia of Big Data Technologies, Springer, 2019.
[23] S. Das, J. Srinivasan, M. Perry, E. I. Chong, J. Banerjee, A Tale of Two Graphs: Property Graphs as RDF in Oracle, in: EDBT, 2014, pp. 762–773.
[24] N. Baken, Linked Data for Smart Homes: Comparing RDF and Labeled Property Graphs, in: LDAC2020, 2020, pp. 23–36.