JPA Criteria Queries over RDF Data

                            Claus Stadler1 and Jens Lehmann2
                    1
                    Computer Science Institute, University of Leipzig
                         cstadler@informatik.uni-leipzig.de
        2
          Computer Science Institute III, University of Bonn & Fraunhofer IAIS
            jens.lehmann@cs.uni-bonn.de, jens.lehmann@iais.fraunhofer.de


         Abstract. We present the design and implementation of a prototype
         system for querying RDF data via the Java Persistence API (JPA) cri-
         teria query feature. The JPA is a specification for management of (pri-
         marily, but not limited to) relational data and provides a framework
         for uniform storage and retrieval of Java objects using various backends.
         The framework provides the Criteria API, which enables building queries
         programmatically against a Java domain model and executing them on
         any supported backend. In this short paper, we describe our work to-
         wards supporting the Web of Data as a new backend. Our contributions
         comprise (i) a system design for enabling JPA compliant object/RDF
         mappings together with de-/serialization of object graphs as RDF, (ii)
         an approach for rewriting criteria queries to SPARQL queries, and (iii)
         a prototype implementation.


Keywords: RDF, SPARQL, JPA, Criteria Query, Query Rewriting


1     Introduction
A widely adopted practice in object oriented programming is to devise a domain
model together with a data access abstraction for storing and retrieving persis-
tent domain objects, referred to as entities. A main task of this abstraction is to
facilitate the mapping between entities and the model supported by the backend.
Querying and storing RDF data with object oriented programming languages
suffers from similar conceptual and technical difficulties as encountered in the
SQL domain, where these issues have become known as the impedance mismatch.
    One standard solution with the goal of overcoming these issues is the Java
Persistence API (JPA), which is a specification (latest version 2.1 from 2013)3 for
management of (primarily, but not limited to) relational data. Besides facilitating
the mapping of Java entities to and from an underlying data store, it defines the
criteria API which provides a programmatic, database-agnostic way for querying
objects. Criteria queries are expressed over the classes and attributes of the
domain model, and are thus independent of the specifics of the backend. JPA
implementations perform the translation to corresponding queries supported by
the respective backend.
3
    http://download.oracle.com/otndocs/jcp/persistence-2_1-fr-eval-spec/index.html
2          Claus Stadler and Jens Lehmann

    The essence of a pure JPA abstraction for RDF is the possibility to enable
development against that data without having to deal with RDF and SPARQL
specifics. The following advantages result from this: (i) Simplified consumption
of RDF data from the Web of Data in Java applications by means of a declara-
tive - rather than programmatic - mapping approach. Naturally, this limits the
application to cases where the domain and RDF models are sufficiently similar
for such mapping to exist. (ii) Unified querying over RDBs and triple stores
via the criteria API, as well as higher flexibility in exchanging backends. (iii)
De-silo-ification: Data silos of existing applications based on the JPA could be
upgraded to use RDF stores and participate in the Web of Data, without any
change in their application logic.
    In this work, we make the following contributions towards enabling these ben-
efits: (i) A JPA-based system design for enabling object/RDF (short: O/RDF)
mappings, (ii) considerations for rewriting criteria queries to SPARQL via map-
pings, and (iii) a prototype implementation that enables querying over Java
entities backed by RDF data. The prototype is available as Open Source4 as
the mapper module in our Jena-based Semantic Web toolkit. It is published on
Maven Central5 under the license is Apache 2 license.
    The remainder is structured as follows: In Section 2, we present a simple
example demonstrating a criteria query over an annotated Java class. In Sec-
tion 3, we provide more details about the JPA and introduce important notions
for rewriting them to SPARQL. Related work is summarized in Section 4. After-
wards, Section 5 describes the core design of our system, especially the aspect
of establishing a mapping between Java object graphs and their corresponding
RDF graph. In Section 6 we present our approach to rewriting criteria queries.
Finally, we conclude in Section 7.


2     A Mapping and Criteria Query Example
The mapping of Java objects to and from RDF, as well as the criteria query-
ing processing, is based on mapping information associated with classes. In this
section we present an example based on DBpedia. Note, that in principle, map-
pings can be stored separately from classes, and multiple mappings can exist for
a single class. Choosing the appropriate set of mappings is part of the O/RDF
engine configuration. Our system supports a set of Java annotations for this
purpose of which essential ones are demonstrated in Listing 1 and are described
as follows: The @DefaultIri annotation is a non-obstrusive (i.e. requires no ad-
ditional methods or attributes) way for specifying a rule how to generate IRIs
for instances of the class. Its argument is a string in the Spring Expression Lan-
guage. @RdfType causes RDF generated from entities of that class to include
the corresponding rdf:type triple. This annotation also acts as a constraint dur-
ing criteria query processing when requesting entities of that class. @Iri and
@IriNs both associate an attribute with an RDF property, whereas @IriNs is a
4
    https://github.com/AKSW/jena-sparql-api/tree/master/jena-sparql-api-mapper
5
    http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22jena-sparql-api-mapper%22
                                           JPA Criteria Queries over RDF Data       3

     short-hand that constructs the property IRI by appending the attribute name
     to the given namespace. @Lang indicates that a String field maps to an RDF
     term with a language tag, as configured in the O/RDF engine. @Datatype needs
     to be specified if an attribute value’s Java datatype differs from the one in the
     RDF model. A simple criteria query asking for all companies founded after 1950
     having at least 10K locations is shown in Listing 2.
 1   @RdfType("dbo:Company")
 2   @DefaultIri("dbr:#{name}")
 3   public class Company {
 4     @Lang @Iri("rdfs:label")             private String name;
 5     @IriNs("dbo") @Datatype("xsd:gYear") private int foundingYear;
 6     @IriNs("dbo")                        private int numberOfLocations;
 7     /∗ ... ∗/
 8   }
                   Listing 1. An annotated Java Company domain class

 1   CriteriaBuilder cb = em.getCriteriaBuilder();
 2   CriteriaQuery<Company> cq = cb.createQuery(Company.class);
 3
 4   Root<Company> r = cq.from(Company.class);
 5   cq.select(r)
 6       .where(cb.greaterThan(r.get("foundingYear"), 1950))
 7       .where(cb.greaterThanOrEqualTo(r.get("numberOfLocations"), 10000))
 8       .orderBy(cb.asc(r.get("foundingYear")))
 9
10   List<Company> matches = em.createQuery(cq).getResultList();
              Listing 2. A simple criteria query over the Company entity class


     3     Preliminaries
     3.1   Overview of the JPA
     A criteria query comprises the following basic information. For brevity, we do
     not consider grouping, aggregates and sub-queries.
      – The result type, which is the Java class being queried for. Most commonly,
        this is simply an entity class (such as Company), but it can also be the class
        of an entity’s attribute (the company’s name) or that of a computed value
        (the company’s average number of locations).
      – A set of Query Roots (short: root): A root always references an entity class
        and serves two purposes: (i) Roots introduce the initial sets of entities on
        which query evaluation operates. Evaluation of a criteria query conceptually
        constructs the cartesian product among the sets of entities referenced by
        the roots. (ii) Roots serve as starting points for navigation along paths of
        attributes. For example, a query root based on the Company class enables
        obtaining a path referencing the foundingYear attribute. It is important to
        note, that roots and paths are primitive expressions and can thus participate
        in compound ones.
4          Claus Stadler and Jens Lehmann

    – Constraints: A set of predicate expressions constraining the set of objects in
      the query result.
    – Orders: A list of (expression, sort-direction) pairs.
    – The Selection: An expression computing the final values of the result set
      (based on the query root’s cartesian product). Often simply a root.
    – Distinct: Removes duplicates from the result set.
      The most important JPA components are:
    – The EntityManager is the entry point for persistence-related operations on
      Java entities. It provides a standard interface for creating, reading, updating,
      and deleting entites(i.e. CRUD operations), and enables querying over them
      independently of the underlying data store. It provides the getCriteriaBuilder
      method which is the starting point for criteria query construction.
    – The CriteriaBuilder is the factory for all criteria related constructs, namely
      criteria queries, compound selections, expressions, predicates, and orderings.

3.2     SPARQL Concepts and Roles
The process of rewriting criteria queries to SPARQL requires mapping entity
classes and attributes to their counterparts in SPARQL. For this purpose, we
introduce the following notions borrowed form description logics, and adapt them
to SPARQL. Note, that a similar idea for translating OWL class expressions to
SPARQL is presented in [1]. Let GP and V be the infinite sets of SPARQL
graph patterns6 and variables, respectively, and vars be the function that yields
a graph pattern’s variables.
Definition 1. A SPARQL Concept is a pair (gp, v) with gp ∈ GP and v ∈
vars(GP ), and intentionally denotes a set of resources/individuals whose exten-
sion over an RDF graph is obtained by evaluating its graph pattern and projecting
the stated variable.
Definition 2. A SPARQL role is defined as (gp, s, t) with gp ∈ GP and s, t ∈
vars(gp). Its evaluation over an RDF graph denotes a binary relation between a
set of source and target resources.
SPARQL roles are a powerful notion, as they enable relating resources to com-
puted values, such as ({ ?s dbo:foundingYear ?x. BIND(year(?x) As ?o) }, ?s,
?o). This feature is necessary to e.g. correctly process the criteria query in List-
ing 2, where RDF terms of type xsd:gYear are mapped to Java integers. An
empty role represents a zero-length path and is expressed as a role with an
empty group graph pattern, and the same variable for source and target.
Definition 3. SPARQL Role concatenation r1 ◦ r2 → r3 yields a new role start-
ing with the source variable of r1 and the target one of r2 . The graph patterns
of r1 and r2 are grouped into a new one, and a FILTER statement equating the
target of r1 with the source of r2 is appended.
6
    https://www.w3.org/TR/sparql11-query/#GraphPattern
                                       JPA Criteria Queries over RDF Data         5

4   Related Work
Well known implementations of the JPA specification are EclipseLink7 (JPA’s
reference implementation), Hibernate8 and Apache OpenJPA9 , which, to the
best of our knowledge, do not feature RDF support. Yet, dedicated Java O/RDF
mapping frameworks exist, which are based on either one of the two predominant
Java RDF frameworks, namely Apache Jena10 and Eclipse RDF4J11 .
    Eclipse Komma[3]12 is an RDF4J-based framework, which provides its own
non-JPA EntityManager API and distinguishes between interface and behaviour
definitions. The latter implement one or more interfaces. Interfaces can carry
RDF mapping information, similar to that in Listing 1. When loading a given
RDF resource with Komma, the framework will, as of now, always yield a Java
proxy implementing all suitable interfaces, whose method calls will delegate to
all appropriate behaviors in customizable order. However, this approach makes
it difficult to reuse third party code, as classes intended to act as behaviors may
not be derived from a corresponding interface, and proxying, despite being a
powerful feature, is known to sometimes cause subtle issues in regard to equality
and inheritance checks. Historically, Komma evolved from the Alibaba13 project,
which in turn evolved from Elmo.
    EmpireRDF14 implements the JPA EntityManager interface and supports
querying the RDFized data with SPARQL and SERQL. However, it does not
feature support for criteria queries. Therefore, at present, none of the existing
solutions facilitate a pure abstraction that enables querying the domain model
without knowledge of the underlying RDF model.

5   System Architecture
In this section, we present the core design of our O/RDF system. Technically, the
system is designed to account for two main functions: (i) Recursively serializing
and de-serializing object graphs as RDF. For serialization, the process is initiated
by requesting the state of a Java object to be written out as an RDF graph rooted
in a given IRI. For de-serialization, the request is to load an IRI’s RDF data as
an instance of given class, whereas the returned object’s actual type may be
that of a subclass; e.g. a request for Person may yield an Actor. While this
functionality does not differ substantially from related work, it is these parts
of the system that hold RDF mapping information and need to be extended to
support criteria query processing. (ii) Executing criteria queries on a SPARQL
backend. This involves rewriting criteria queries to SPARQL and eventually
deserializing IRIs as Java entities in order to construct the final result set.
7
   http://www.eclipse.org/eclipselink/
8
   http://hibernate.org/
 9
   http://openjpa.apache.org/
10
   https://jena.apache.org/
11
   http://rdf4j.org/
12
   https://github.com/komma/komma
13
   https://bitbucket.org/openrdf/alibaba
14
   https://github.com/mhgrove/Empire
6         Claus Stadler and Jens Lehmann

    As part of the O/RDF mapping process, we need to determine the RDF
graph corresponding to an entity. For this purpose, we introduce the notion of a
Resource Shape, which is a specification to be evaluated over an RDF graph in
regard to a given SPARQL concept. This yields for every resource matched by
the concept the (possibly empty) RDF graph matching the shape, referred to as
shape graph. Upon retrieval of an entity, its corresponding shape graph serves
as the basis for populating its attributes. Upon storage of an entity, the set of
added/removed triples is computed from comparing the shape graph at retrieval
time to the RDF graph obtained from the entity’s latest state. At present, re-
source shape specifications are built using our own API which provides methods
for matching triples via ingoing and outgoing property paths. Relevant related
work in this regard are the ongoing efforts on Shape Expressions (ShEx) [2] and
the Shapes Constraint Language (SHACL)15 .

5.1     O/RDF Mapping Components

In this section, we describe the main components for de-/serializing object graphs
from/to RDF graphs, depicted in Figure 1.


               Fig. 1. Core components of the O/RDF mapping system

TypeDecider An IRI alone is insufficient for determining the appropriate set
of corresponding candidate entity classes, as any class could act as a “view” over
the resource’s RDF data. The purpose of the TypeDecider is to narrow down this
set of candidates – ideally to a single entity class. Note, that this functionality
requires all entity classes to be known to the O/RDF system in advance. The
TypeDecider supports exposing a resource shape for a given base class, whose
results can be passed to the getApplicableTypes method in order to decide on
the applicable sub-classes a resource can be loaded with. Also, for a given entity
and its corresponding IRI, it can write out the triples needed to preserve the
entity type in RDF.

RdfType is the core interface for establishing an O/RDF mapping for an indi-
vidual Java class. The getJavaClass method returns that class and createJavaOb-
ject is used to create fresh, unpopulated instances of it. The latter method takes
15
     https://www.w3.org/TR/shacl/
                                             JPA Criteria Queries over RDF Data                 7

an RDF term as argument in order to support instantiation of primitive/im-
mutable Java types from RDF literals. The getRootNode method returns a Java
object’s RDF term which acts as the root in its RDF serialization. Some Java
classes do not have an identity on their own, in which case hasIdentity returns
false. For example, an instance of a Collection generally neither has an attribute
nor an entry that uniquely identifies a specific instance. Yet, upon creating the
RDF model, there needs to be an IRI that represents the collection in order to
establish links to the contained items’ corresponding RDF terms. If an object’s
class does not provide an IRI by itself, one is created based on the sequence
of attribute names by which that object was reachable from an entity with an
identity.
    The resolvePath method is crucial for rewriting criteria queries to SPARQL.
It returns for a given attribute name a PathFragment object, which holds the
corresponding SPARQL role together with information for the RdfMapperEngine
about how to resolve sub-paths.
    The exposeShape method yields the RdfType’s resource shape. The corre-
sponding shape graph can be passed to the populate method, in order to par-
tially update a given entity’s state and obtain a set of references which need
further resolution. A reference comprises the information of which IRI needs to
be resolved as an entity of which Java class, together with a callback function
that updates the given entity’s state with the result of the resolution.
    The exposeGraphFragment method converts an entity’s state to an RDF
graph fragment and is thus the opposite operation of populate. The exposed
fragment comprises an RDF graph together with a mapping of which of its re-
sources correspond to entities for which further RDF graph fragments can be
obtained.

RDFMapperEngine provides the essential functionality for retrieval and stor-
age of entities and handles all recursive aspects of RDF de-/serialization and role
construction based on the registered RdfTypes. Notably, it provides the Role-
Builder facade for resolving paths of attribute names to SPARQL roles. The
RDFMapperEngine is the core of our JPA EntityManager implementation.

6    Rewriting Criteria Queries
Here, we outline our approach for rewriting criteria queries to SPARQL. The fun-
damental operation is to rewrite primitive criteria expressions, i.e. roots, paths
and constants, to SPARQL. Let P be an initially empty list of graph patterns,
and rewrite a function that yields for a given criteria expression a SPARQL
expression. Then rewrite(path, P ) obtains the SPARQL role from RDFMap-
perEngine, adds the role’s graph pattern to P and returns the target variable
as a SPARQL expression. Alias names become SPARQL variable names, and
during rewriting, every criteria expression is assigned a fresh unique alias if none
was provided at query construction time. As arithmetic, (in)equality, and condi-
tional criteria expressions have direct counterparts in SPARQL, their rewrite is:
rewrite(opcriteria (a1 , . . . , an ), P ) → opsparql (rewrite(a1 , P ), . . . , rewrite(an , P ))
    Let Q be the target SPARQL query. The essential rewriting steps are:
  8         Claus Stadler and Jens Lehmann

      – Add every query root’s corresponding SPARQL concept graph pattern to Q.
      – Rewrite the constraint expressions and add them as FILTERs to Q.
      – Rewrite the sort condition expressions and add their graph patterns as OP-
        TIONAL patterns to Q.
      – Add the graph patterns of selection expressions as OPTIONAL patterns.
      – Apply DISTINCT, LIMIT and OFFSET of the criteria query directly to Q.
  Finally, the criteria query result set is constructed by executing the SPARQL
  query and using each solution binding to retrieve entities from the RdfMap-
  perEngine according to the criteria query’s selection. Listing 3 shows the rewrite
  of Listing 2.
1 SELECT DISTINCT ?s {
2   ?s a dbo:Company .
3   ?s dbo:foundingYear ?a      . BIND(year(?a) As ?x) . FILTER(?x > 1950)
4   ?s dbo:numberOfLocations ?y . FILTER(?y >= 10000)
5   OPTIONAL { ?s dbo:foundingYear ?b . BIND(year(?b) As ?z) }
6 } ORDER BY ASC(?z)
               Listing 3. The example criteria query translated to SPARQL

  7      Conclusion and Future Work
  In this submission, we (i) presented a system architecture for object/RDF map-
  pings, (ii) outlined notions for rewriting JPA criteria queries to SPARQL, and
  (iii) provide an Open Source prototype implementation. We demonstrate by ex-
  ample that our approach enables querying over Java domain models without the
  need to be aware of RDF and SPARQL. For certain use cases, this can greatly
  simplify querying, consumption, creation and modification of RDF data in Java
  applications. Our approach to translating criteria queries to SPARQL can be
  naturally complemented with arbitrary SPARQL (federation) engines in order
  to facilitate querying over the Web of Data. In the future we will extend the
  feature set, work on a more rigorous formalization and clarify semantic aspects.
  Directions for future research in regard to O/RDF systems are query optimiza-
  tion and performance analyses.

  Bibliography
  [1] S. Bin, L. Bühmann, J. Lehmann, and A.-C. Ngonga Ngomo. Towards
      SPARQL-based induction for large-scale RDF data sets. In ECAI 2016 - Pro-
      ceedings of the 22nd European Conference on Artificial Intelligence, volume
      285 of Frontiers in Artificial Intelligence and Applications, pages 1551–1552.
      IOS Press, 2016.
  [2] S. Staworko, I. Boneva, J. E. Labra Gayo, S. Hym, E. G. Prud’hommeaux,
      and H. Solbrig. Complexity and expressiveness of shex for rdf. In
      LIPIcs-Leibniz International Proceedings in Informatics, volume 31. Schloss
      Dagstuhl-Leibniz-Zentrum fuer Informatik, 2015.
  [3] K. Wenzel. Komma: An application framework for ontology-based software
      systems. Semantic Web–Interoperability, Usability, Applicability, 2010.