=Paper=
{{Paper
|id=Vol-3830/paper2
|storemode=property
|title=Generating Transparent and Query-Based RDF Layers
|pdfUrl=https://ceur-ws.org/Vol-3830/paper2.pdf
|volume=Vol-3830
|authors=Nils Rollshausen,Eduard Kamburjan,Martin Giese
|dblpUrl=https://dblp.org/rec/conf/semiim/RollshausenKG24
}}
==Generating Transparent and Query-Based RDF Layers==
Generating Transparent and Query-Based RDF Layers
Nils Rollshausen¹, Eduard Kamburjan² and Martin Giese²
¹ Technical University of Darmstadt, Germany
² University of Oslo, Norway
Abstract
We present inkblot, a software tool that generates object-oriented API code to represent the data found
in a semantic model in a way suitable for programmers not familiar with semantic technologies. The API
interacts with an RDF triple store via a SPARQL endpoint, but hides this connection from the programmer.
The approach generates adequately typed fields and methods, based on the types and cardinalities in the
data. Unlike previous work, the generation is not driven by an OWL ontology or similar declarative
description of the application domain, but by a SPARQL query that accesses the pertinent data.
Keywords
Code generation, Object-orientation, Graph query
1. Introduction
Motivation. Object-oriented (OO) software development strives to produce designs containing classes with fields and relationships corresponding to entities in the application domain.
The same holds true for semantic technologies, where significant effort is invested to produce
ontologies that accurately formalize a conceptualization of the domain.
It is thus inevitable when interfacing between application software and semantic technologies
that domain concepts will be represented twice, once in the form of types in the Resource
Description Framework (RDF) sense, and once in an object-oriented class hierarchy. Moreover,
boilerplate code needs to be written that transfers information between the two representations.
The need for these manually created and maintained abstraction layers is a source of errors and
presents a significant barrier for adoption of semantic web technologies.
This situation naturally leads to the idea of automatically generating program code that fits a
semantic model in the form of a library or API that exposes a class hierarchy that is consistent
with a semantic model of the domain, and that incorporates services such as synchronization
with a triple store, reasoning, etc. Due to the wealth of research on ontology languages like
OWL as a means of expressing domain knowledge, a variety of approaches have been proposed
to generate such code from an ontology [1, 2, 3, 4]. However, we believe that there are some
fundamental difficulties when using an ontology for this purpose.
First, the ontology describes the domain, and not the information available. For example, an
ontology about human beings may rightly express that every human has a mother. Translating
this to a requirement for the OO code stating that the ‘mother’ field may never be null is
SofLiM4KG’24: Software Lifecycle Management for Knowledge Graphs Workshop, November 11–12, 2024, Baltimore, US
nrollshausen@seemoo.tu-darmstadt.de (N. Rollshausen); eduard@ifi.uio.no (E. Kamburjan); martingi@ifi.uio.no (M. Giese)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
nonsensical since any given dataset will not contain information about the mother of every
human mentioned. Typing, cardinalities, and non-nullness in the code should be informed
by what information will have to be represented, and not the realities of the domain as such.
Second, the typing of relations in a semantic model does not fit that of associations in object
orientation. E.g., domain statements are inherited upwards in the class hierarchy: if 𝑅 has
domain 𝐶, then 𝑅 also has domain 𝐶′ for any superclass 𝐶′ of 𝐶. But if an OO class 𝐶 has a field 𝑓,
then any subclass of 𝐶 will also have the field.
Both of these are instances of the so-called impedance mismatch [5, 6] or semantic gap [7, 8]
between the object model of RDF, geared towards data-driven tasks, and the object model of
programming languages, geared towards typeability and modularity.
Approach. We explore a different approach that uses a closed-world formalism to describe
the shape of the information to be represented. Specifically, we require the user to formulate
a SPARQL Protocol and RDF Query Language (SPARQL) query that loads the information
required. Based on this query and some additional configuration information (field names,
multiplicities, etc.), we generate API code that provides an object oriented representation of the
information, as well as implementations to lazily load and store data from and to a SPARQL
endpoint.
The core challenge here lies in automatically generating writeback SPARQL updates given
a single retrieval query, including updates for object creation, modification, and deletion. An
approach for seamless read-only integration of semantic web data into object-oriented languages
in a query-based manner has already been described in [8].
Our proposed API includes a runtime that supports caching and handles consistency of the
loaded objects. Furthermore, it includes an extendable monitoring mechanism that (a) validates
assumptions on the RDF data using SHACL shapes or SPARQL validation queries, and (b) verifies
assumptions on the program data that must hold before the writeback to the RDF store. All
local changes are collected until a commit to improve performance, and the API works with
every write-enabled SPARQL endpoint. The generated API also supports creating completely
new objects, writing them into the store, and deleting existing objects from the RDF data.
Contribution. Our main contribution is a tool for query-based two-way integration of OO
and RDF through API generation from SPARQL queries. The tool is available as open source at
https://github.com/smolang/inkblot and has a modular backend that currently supports Kotlin
code generation, which can be used from any JVM-based language.
2. Motivating Example
We first introduce an example to illustrate our notion of transparent and query-based RDF layers.
Consider a software application in the bicycle domain that is used to manage the inventory of a
shop. The inventory is stored in an RDF store and must be manipulated using software in an
object-oriented language.
The software must manipulate RDF structures, but these structures are optimized for the
domain and data storage, not necessarily the application. More precisely, the data loaded from
and written to the RDF store must be transformed into OO structures. Take, for example, the
three SPARQL queries in Lst. 1 that load all bikes, wheels, and bells from the RDF store.
1 SELECT ?bike ?mfg ?fw ?bw ?bells {
2 ?bike a bk:bike; bk:hasFrame [bk:frontWheel ?fw; bk:backWheel ?bw]
3 OPTIONAL { ?bike bk:mfgYear ?mfg }
4 OPTIONAL { ?bike bk:hasFrame [bk:hasBell ?bells] } }
5 SELECT ?wheel ?diameter {
6 ?wheel a bk:wheel; bk:diameter ?diameter. }
7 SELECT ?bell ?color WHERE {
8 ?bell a bk:bell;
9 bk:color ?color }
Listing 1: SPARQL retrieval queries for bike, wheel and bell objects.
The first query loads all bikes, their wheels, and manufacturing dates. Note that there are bike
frames in the RDF store, to which the wheels are attached. However, they are not loaded because
they are not manipulated by the program, yet they are part of the ontology. Management
applications do not need to consider the semantic connotations of data, and software developers
prefer to manipulate the objects in terms of their programming language (except during I/O).
For example, the code in Lst. 2 implements two business logic operations: The first loads all
bikes lacking a manufacturing date and sets their date to the current year. The second creates a
new bicycle and equips it with a bell removed from an existing bike.
All SPARQL queries in this example are hidden in the special classes Bike, Wheel, and Bell
and their associated Factory classes that manage the I/O of data – only when we have to
provide a filter to extend the load query (l. 4) are semantic technologies accessed directly.
Our tool, inkblot, generates such code from the given SPARQL queries, provides basic operations on the loaded data, and offers a way to extend the generated structures with application-specific code. It furthermore manages unambiguous writeback and consistency, provides an
architecture to monitor its assumptions about the data at runtime, and supports lazy loading of
resources handled through a pointer structure.
1 Inkblot.endpoint = "http://example.com/sparql"
2
3 // add manufacturing year where unset
4 val undated = BikeFactory.commitAndLoadSelected("!bound(?mfg)")
5 undated.forEach{ cycle -> cycle.mfgYear = 2023 }
6
7 // create a tricycle and transfer an existing bell to it
8 val scrapBike = BikeFactory.loadFromUri("...")
9 val bell = scrapBike.bells[0]
10 scrapBike.bells_remove(bell)
11 val wheel1 = WheelFactory.create(diameter=20.0)
12 [...]
13 val newCycle = BikeFactory.create(wheel1, wheel2, 2023, listOf(bell))
14 Inkblot.commit() // commit all changes
Listing 2: Usage examples of the generated library classes
3. Architecture
The workflow of inkblot , pictured in Fig. 1, has three steps that transform a set of SPARQL
queries or configuration files into a set of classes in the target language, together with some
auxiliary structures to use at runtime.
Input The input to inkblot is a set of SPARQL queries. These queries are then translated into
a configuration file in which the user can refine the generated abstraction of the class: its
fields, its connection to other classes, and the multiplicity of fields.
Generation The library generation itself analyzes the input query (Variable Path Analysis) and
synthesizes several other queries for manipulating the RDF related to an object (Query
Synthesizer). The user can override some of the generated queries, especially if different
deletion behavior is required.
Based on analysis results and synthesized queries, the actual generation generates code
in the target language (Semantic Object Generator). To ensure that runtime data indeed
follows the assumptions used during generation (e.g., multiplicity of fields), SPARQL
queries and SHACL constraints for monitoring are generated (Validation Generator and
SHACL Generator). The SPARQL queries are included in the generated code, while the
SHACL constraints are intended for external monitoring.
Output The generated RDF layer consists of the inkblot runtime in the target language, and
a class and class factory for each query, as well as optional wrapper classes that can be
used in the type hierarchy of the application.
We now go over the components of inkblot, as well as its input, in detail.
Figure 1: Workflow of inkblot . Rectangles are inkblot components, rounded rectangles are input/out-
put files. Dashed arrows indicate optional inputs/outputs.
3.1. Input
The input to inkblot is a set of SPARQL queries. A configuration record is generated for each
query. The configuration record for the query for wheels is given in Lst. 3. Each record has a
name and contains (a) the original query, (b) the anchor variable, (c) the RDF type of the anchor
variable, and (d) a list of property records. The anchor variable is used to load the node in the
RDF graph that is loaded into the object, the so-called anchor node. Two objects are considered
to model the same structure if they have the same anchor node.
Each property record has a name and contains (a) a SPARQL variable name from the query,
(b) a type, and (c) a cardinality. The SPARQL query must contain a path from the anchor
variable to each variable used in a property record, but parts of the path may be optional. The
type of a property is either an XSD datatype, the name of a property record, or the constant
inkblot:rawObjectReference . The cardinality is either optional "?", solitary "!", or many "*".
Each property record becomes a field of the class generated for the query, with the name of
the record being the name of the field, loaded from query results using the contained SPARQL
variable. The type decides the type of the field – an XSD datatype becomes a primitive type,
another record name becomes a reference to that class, and anything else is handled as a URI.
Cardinality decides whether the field can have a null value (for ? ) or is a list (for * ).
In the default, unedited configuration, the most generic cardinality is used and the name of
each property is set to the name of the SPARQL variable. A property record is generated for
every variable in the result set of the input query.
1 "Wheel": {
2 "anchor": "wheel",
3 "type": "http://example.com/ns/class",
4 "query": "... SELECT ?wheel ...",
5 "properties": {
6 "diameter": {
7 "sparql": "diameter",
8 "type": "[classnameOrXSD]",
9 "cardinality": "*"
10 }
11 }
12 }
Listing 3: Excerpt of the default JSON configuration for wheels.
3.2. Generation
The configuration file is then processed by inkblot to generate the program code. The generated
code provides access to the fields defined for the class but also generates further queries and
structures for writeback, consistency, and validation.
Analysis. As a preprocessing step to synthesize queries that relay changes between the RDF
store and the program, we need to analyze variable dependencies in the input. The original
query, which we call the retrieval query, must have explicit paths from the anchor variable to
any variable used in a record.
The inkblot Variable Path Analysis component traverses the retrieval query and collects
all triple patterns contained in it, keeping track of potential optional contexts. The analysis
itself ensures that the query conforms to the limitations outlined below and builds a dependency
graph in which nodes are SPARQL variables, constant URIs or values, and edges are the RDF
predicates that are used to relate them to one another in the input query. It then computes the
set of all simple paths leading from the anchor variable to any variable (including those not
in the result set) or literal and verifies that the graph is connected. Paths traversing the same
nodes using different edges are considered to be distinct. The additional non-result variables
and literals are used later to update the RDF graph correctly.
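The core of this analysis can be sketched in plain Kotlin. The sketch below is our minimal re-implementation of the idea, not inkblot's actual component: nodes are variable names, edges carry the connecting predicate, and we enumerate all simple paths from the anchor and check connectivity.

```kotlin
// An edge of the dependency graph: a triple pattern from the retrieval query.
data class Edge(val from: String, val predicate: String, val to: String)

// All simple paths (no repeated node) from the anchor to any reachable node.
// Paths over the same nodes via different edges come out as distinct lists.
fun simplePaths(edges: List<Edge>, anchor: String): List<List<Edge>> {
    val result = mutableListOf<List<Edge>>()
    fun visit(node: String, path: List<Edge>, seen: Set<String>) {
        for (e in edges) {
            if (e.from != node || e.to in seen) continue
            val next = path + e
            result.add(next)
            visit(e.to, next, seen + e.to)
        }
    }
    visit(anchor, emptyList(), setOf(anchor))
    return result
}

// The graph is connected (for our purposes) if every node is reachable
// from the anchor along the directed triple patterns.
fun isConnected(edges: List<Edge>, anchor: String): Boolean {
    val nodes = edges.flatMap { listOf(it.from, it.to) }.toSet()
    val reached = simplePaths(edges, anchor).map { it.last().to }.toSet() + anchor
    return reached == nodes
}
```

For the bike query of Lst. 1, the anchor `bike` reaches the (non-result) frame node and, through it, the wheels and bells.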
Synthesis. The QuerySynthesizer component uses the results of the path analysis to create
additional SPARQL queries to update the RDF store. We distinguish between initialization,
addition, removal, change, and deletion updates. All of these updates can operate on either
literal values or object references without any changes. We will therefore only refer to general
values in the following, which can either be literal values or URIs representing object references.
To create an update that adds a new value to a property, for example, we first restrict the
dependency graph to only edges that are safe (i.e. not optional) in the context of the variable
representing the property. If that variable is defined inside of an Optional block, this includes
all edges found in the same block or any parent blocks. We then build a sub-graph by taking
the neighbourhood of the target variable and recursively extending it by the neighbourhood of
any non-anchor variables contained in it. All edges in this sub-graph are translated into triples
and form the basis of the generated update, substituting the object’s URI for the anchor variable
and the value to be added for the target variable. Triples that are known to exist already, even
if the property in question is not set yet, are included as the where condition of the SPARQL
update while all other triples are inserted.
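The assembly of such an addition update can be sketched as a small string builder. This is our own simplification (the function name and the triple representation are assumptions): triples known to exist go into the WHERE clause, the triples carrying the new value are inserted, and the object's URI is substituted for the anchor variable.

```kotlin
// Sketch: build an addition update from the neighbourhood sub-graph.
// 'existing' are triples known to hold already (WHERE clause),
// 'inserted' are the triples that introduce the new value.
fun additionUpdate(
    anchorUri: String,
    existing: List<Triple<String, String, String>>,
    inserted: List<Triple<String, String, String>>
): String {
    fun fmt(t: Triple<String, String, String>) = "${t.first} ${t.second} ${t.third} ."
    val insertPart = inserted.joinToString(" ") { fmt(it) }
    val wherePart = existing.joinToString(" ") { fmt(it) }
    // substitute the object's URI for the anchor variable
    return "INSERT { $insertPart } WHERE { $wherePart }".replace("?anchor", "<$anchorUri>")
}
```

Adding a bell to a bike, for instance, inserts the `bk:hasBell` triple against the frame node that the WHERE clause locates via `bk:hasFrame`.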
To remove a value, we follow a similar procedure and build the same neighbourhood sub-graph. We include the full sub-graph in the where condition but only delete triples that form
the last edge on a simple path from the anchor variable to the target variable. This ensures that
any annotations belonging to the removed value and used in the retrieval query for filtering
purposes stay intact.
A change update is equivalent to a removal update for the old value followed by an addition
update to set the new value. Together, these three types of updates cover everything needed to
modify the properties of our generated classes. To create and delete entire objects themselves,
we need additional initialization and deletion updates.
For object creation, we distinguish between the base creation update and additional initializer
updates. The base creation update inserts all triples corresponding to safe edges in the dependency graph. Since all variables contained in this subgraph are strictly required, we can bind
them to matching required constructor arguments. For constructor arguments corresponding to
optional properties, we use initializer updates that are only executed if a non-null / non-empty
constructor argument is passed. Initializer updates are identical to addition updates but are
considered distinct to allow overwriting one but not the other.
For object deletion, we delete all triples that contain the anchor node of the deleted object in
either subject or object position. Other nodes are not deleted recursively to avoid unintentional
loss of data that may be required by other applications.1
¹In terms of our ongoing example, while our application may reasonably delete frame nodes when deleting a bike, other applications with a different view of the data may expect the frame to continue to exist even if the bike itself is deleted and used for spare parts.
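The default deletion semantics just described amount to two updates per object; the sketch below (our own simplification, with an assumed function name) makes this explicit.

```kotlin
// Sketch of the default deletion semantics: remove every triple that has the
// anchor node in subject or object position. Nothing is deleted recursively,
// so nodes referenced by the deleted object survive.
fun deletionUpdates(anchorUri: String): List<String> = listOf(
    "DELETE WHERE { <$anchorUri> ?p ?o }",  // anchor as subject
    "DELETE WHERE { ?s ?p <$anchorUri> }"   // anchor as object
)
```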
These updates, particularly the deletion update we have just discussed, assume specific
semantics for their operations. As these semantics may be application-specific, the user has the
option to provide their own update queries and override the ones generated by inkblot . In this
way, users can make adaptations to their specific application while still benefiting from fully
automatic code generation and the same level of runtime support as with the default updates.
Generation. Equipped with the synthesized SPARQL updates, the actual Semantic Object
Generator generates three classes per configuration record: a core class, a factory for this class,
and a wrapper class for the core class, which is a default implementation to add business logic
around the core class. We will describe them in detail in the next section. The factory class
uses slightly modified versions of the retrieval query to load individual instances or subsets
of all instances. For access to the defined properties, we generate standard getter and setter
methods that operate on locally cached property values. For references to other objects, we
only store the URI of the referenced object — the corresponding getter method transparently
lazy-loads and instantiates the referenced object on access. The details of lazy loading are
described by Kamburjan et al. [8]. All setter methods record changes made to the property
using the previously generated SPARQL updates in addition to updating the cached property
value. This creates the local changelog of SPARQL updates that will be applied to the data store
when local changes are committed.
The generated code is statically typed, using the type and cardinality information contained
in the configuration file. Boolean values, strings, and all numeric xsd types are supported
natively and mapped to appropriate Kotlin type equivalents. Other xsd types, such as dates, are
exposed to the user in their string representation. The communication with the RDF store over
the SPARQL endpoint is implemented using Apache Jena2 .
Validation. The annotated type and cardinality information is used to generate well-typed
code. However, this corresponds to assumptions about the actual data in the RDF store, which
may not hold at runtime. We therefore provide two mechanisms to validate the used data: For
one, inkblot generates SPARQL queries selecting instances with invalid types or cardinalities
using variations of the retrieval query with different added filter expressions. These validating queries are embedded into the generated Factory classes and automatically validate the
consistency of the data store before any objects are loaded or created. For another, inkblot
automatically generates SHACL constraints for all classes, encoding all assumptions made in
the configuration. The user can manually check these constraints using any SHACL validator.
Language portability. Currently, only Kotlin is supported as a target language, as it can
use the numerous Java libraries for RDF and be used from other JVM-based languages, but the
generation backend is designed modularly and can be adapted to other languages. Only the
runtime and the semantic object generator components are specific to the target language; all
other components are language-independent and can be reused for different backends. New
backends can be implemented as extensions of the AbstractSemanticObjectGenerator class
and can be selected by the user during generation.
²In our current Kotlin backend, there are no Jena-specific design decisions in inkblot.
Limitations. In terms of the supported SPARQL fragment, inkblot supports queries that
provide unambiguous writeback locations for all variables to allow persisting changes to the
data store. This means that each variable may only be bound in a single location and that the
provided graph patterns form connected graphs around the anchor node. Language features
such as subqueries, explicit variable bindings, and set difference operations also do not allow
unambiguous writeback and are not currently supported.
4. Generated API
The output of inkblot is a set of classes to be used together with the inkblot runtime. The
API is documented in greater detail at https://github.com/smolang/inkblot; we present only its
main capabilities here.
4.1. Runtime
The runtime of inkblot maintains a global cache of all loaded objects to ensure that each
anchor node is represented only by a single object at a given time and local changes remain
consistent. The cache is transparent to the user.
Additionally, it maintains a log of all changes to loaded objects and the SPARQL updates
required to persist these changes to the data store. When the user decides to commit these
local outstanding changes, the gathered updates are batched together and sent to the SPARQL
endpoint. To avoid unexpected behaviour when loading new instances from the data store
in the presence of uncommitted local changes, all methods that load multiple instances also
commit changes to the data store. The API assumes that no other party is modifying the RDF
data.
The API of the runtime permits the following operations to the user:
Loading Using the endpoint property, the user sets the URL of the SPARQL endpoint used
for the RDF connection.
Writing Using the commit() method, all changes in the loaded objects are forced to be mirrored
in the RDF store. This method is also called by several of the generated classes, which we
detail below.
Monitoring The runtime monitors the consistency of local data in addition to the previously
described SPARQL and SHACL validation mechanisms for online data. This includes
checks for data types that do not have a one-to-one mapping to JVM primitive types, such
as the XSD NegativeInteger, verifying that values assigned to such properties conform to
the limitations of the underlying XSD type. If any such check fails, listeners added/re-
moved using the addViolationListener() /removeViolationListener() methods are
notified of this constraint violation, even before any changes are committed.
The runtime also implements lazy loading: whenever an anchor node is loaded, it is first
checked whether the runtime has an object for it in the cache. Similarly, when a loaded object
refers to another object, the second object is only loaded when the access actually happens. If
the object is never accessed, it is not loaded. For example, if the program loads a bike but never
accesses its wheels, then the wheels are not loaded at all.
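The interplay of cache, changelog, and lazy loading can be modelled in a few lines. The following is our own simplified model, not inkblot's actual runtime (MiniRuntime and its members are assumed names): one object per anchor URI, loading deferred to first access, and updates batched until commit.

```kotlin
// Simplified model of the inkblot runtime: a global cache keyed by anchor URI
// and a changelog of pending SPARQL updates, flushed on commit().
object MiniRuntime {
    private val cache = mutableMapOf<String, Any>()
    private val changelog = mutableListOf<String>()  // pending SPARQL updates

    // Lazy loading: 'load' runs only on the first access to a given URI;
    // later accesses return the cached object, ensuring one object per anchor.
    @Suppress("UNCHECKED_CAST")
    fun <T : Any> loadOrCreate(uri: String, load: (String) -> T): T =
        cache.getOrPut(uri) { load(uri) } as T

    // Setters would record their generated SPARQL update here.
    fun record(update: String) { changelog.add(update) }

    // In the real tool the batch is sent to the SPARQL endpoint; here we
    // just return and clear it.
    fun commit(): List<String> {
        val batch = changelog.toList()
        changelog.clear()
        return batch
    }
}
```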
4.2. Generated Classes
Figure 2: UML diagram of the generated classes for a configuration record C. (SemanticObject: merge(other), delete(); SemanticObjectFactory: loadFromURI(uri), commitAndLoadAll(), commitAndLoadSelected(filter), validateOnlineData(), addValidatingQuery(query); C with its properties, CFactory with create(...), and WrappedC.)
For each configuration record C, three classes are generated, as pictured in Fig. 2. The core class
contains all the data, operations for changes, and manages the relation of the object to the RDF
store. The factory is used to create new core class objects either from scratch or by loading
them from the store. A wrapper class is available to include core classes in the type hierarchy
of the software application.
Core Class. A core class object provides read access to an object’s unique URI and full
read/write access to all properties defined in the configuration record. Using Kotlin's built-in support for property getter and setter methods, we render functional properties as first-class Kotlin properties, making changelogging, lazy loading of object references and runtime
constraint checks completely transparent to the user. We also use Kotlin's distinction between
nullable and non-nullable types to offload enforcement of ‘exactly one’ cardinality constraints
to the type system. For non-functional properties that can have multiple values, we still provide
access using first-class language features in the form of an immutable Kotlin list. Addition and
removal of list entries is handled using dedicated methods.
Inspired by the OpenCitations ocdm API [9], we also provide a merge operation that combines
two instances of the same class and redirects all references to either instance to the new unified
instance. All core classes implement the SemanticObject interface.
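The merge semantics can be sketched as follows. This is our own reading of the operation, with assumed names (Thing, redirects, resolve): missing properties on the target are filled from the other instance, and references to the merged-away URI are redirected to the surviving one.

```kotlin
// Sketch of merge: in the real runtime a shared table would redirect
// object references; here a global map stands in for it.
val redirects = mutableMapOf<String, String>()

class Thing(val uri: String, var label: String?) {
    fun merge(other: Thing) {
        if (label == null) label = other.label  // keep own value if present
        redirects[other.uri] = uri              // references to 'other' now resolve here
    }
}

// Resolving a reference follows the redirect, if any.
fun resolve(uri: String): String = redirects[uri] ?: uri
```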
Factory. For each configuration record, inkblot generates the class itself as well as an
associated Factory class used for creating new instances (create() ) as well as loading existing
instances from the data store. The factory provides methods to load all instances found in
the data store (commitAndLoadAll() ), a single instance specified by a URI (loadFromURI() ),
or a subset of instances specified by a SPARQL filter expression (commitAndLoadSelected() ).
These filter expressions are the only place where the user is exposed to the underlying semantic
technology.
1 "Bike": {
2 "anchor": "bike",
3 "type": "http://example.com/ns/Bike",
4 "query": "... SELECT ?bike ...",
5 "properties": {
6 "frontWheel": {"sparql": "fw", "type": "Wheel", "cardinality": "!" },
7 "backWheel": {"sparql": "bw", "type": "Wheel", "cardinality": "!" },
8 "bells": {"sparql": "bell", "type": "Bell", "cardinality": "*" },
9 "mfgYear": {"sparql": "mfg", "type": "xsd:int", "cardinality": "?" }
10 }
11 }
1 <...> sh:targetClass <http://example.com/ns/Bike> ;
2 sh:property [
3 sh:datatype xsd:int ;
4 sh:path bk:mfgYear ;
5 sh:maxCount 1 ;
6 sh:name "..."; sh:message "...";
7 ] .
Listing 4: JSON configuration for class ‘bike’ and an example SHACL shape.
The factory also allows users to perform validation of instances in the remote data store
at runtime using validateOnlineData() , optionally using custom validation queries (added
by addValidatingQuery() ) in addition to automatically generated ones. Contrary to the
monitoring of the runtime, which is performed on the local objects, these checks are performed
on the triplestore. The generated validating SPARQL queries, for example, are used here.
Wrapper. To allow users to extend generated classes with custom application logic, inkblot
can optionally create wrapper classes that contain a reference to a class instance and expose
the same set of properties lifted from that instance without containing any logic on their own.
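A wrapper in this spirit can be sketched as below. This is our illustration under assumed names (the core-class stand-in Bike and the isVintage method are not from the paper): the wrapper holds a core instance, lifts its properties, and is the place for custom application logic.

```kotlin
// Stand-in for the generated core class (the real one manages RDF I/O).
class Bike(val uri: String, var mfgYear: Int?)

// Wrapper: exposes the same properties, delegates to the core instance,
// and hosts application-specific logic without touching generated code.
class WrappedBike(private val core: Bike) {
    var mfgYear: Int?                       // lifted property
        get() = core.mfgYear
        set(value) { core.mfgYear = value }

    // Custom application logic (hypothetical example).
    fun isVintage(currentYear: Int): Boolean =
        mfgYear?.let { currentYear - it >= 30 } ?: false
}
```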
5. Usage
Let us illustrate the overall workflow that connects the queries and the program code in Sec. 2
and follow the query for bike throughout generation. The generated default configuration
record is analogous to the one given for wheels in Sec. 3. After modification to fit our intents,
the configuration record is as shown in Lst. 4. The RDF type of the anchor variable is updated
(l. 3), the names of the properties are different from the SPARQL variables used to retrieve them
(l. 6,7,8,9) and the cardinality is changed. Every bike has exactly one front wheel and one back
wheel (l. 6, l. 7), at most one manufacturing year (l. 9) and arbitrarily many bells (l. 8). The
types of the properties are updated to correspond to an XSD datatype or the name of another
configuration record.
The following artifacts are then generated. First, the SHACL shapes, for example the shape in
Lst. 4 checking the cardinality and type of the manufacturing date. Second, the core class Bike ,
given in Lst. 5. As we can see, it has a property for each variable defined in its configuration
record. The backing collection of bells is not exposed directly; instead, dedicated methods are
generated so that users cannot manipulate internals. Also note the cached loading of the front
wheel at l. 8 – the front wheel is only loaded
via the cache once accessed. Lst. 6 describes the full details of property access and modification
for mfgYear . While the loading is straightforward, changing the value requires generating
updates depending on whether the value is set to null and is deleted, set to a value for the first
time, or changed to a new value. These queries are added to the global changelog and only
executed when the changelog and cache are committed. Until then, the database is not updated.
1 class Bike internal constructor(uri: String, frontWheel: String,
2 backWheel: String, bells: List<String>,
3 mfgYear: Int?) : SemanticObject(uri) {
4 [...]
5 // frontWheel
6 private var _inkbltRef_frontWheel: String = frontWheel
7 var frontWheel: Wheel
8 get() = WheelFactory.loadFromURI(_inkbltRef_frontWheel)
9 set(value) {...}
10 // backWheel
11 private var _inkbltRef_backWheel: String = backWheel
12 var backWheel: Wheel
13 get() = WheelFactory.loadFromURI(_inkbltRef_backWheel)
14 set(value) {...}
15 // bells, extra operations instead of setter
16 private val _inkbltRef_bells = bells.toMutableSet()
17 val bells: List<Bell>
18 get() = _inkbltRef_bells.map { BellFactory.loadFromURI(it) }
19 fun bells_add(obj: Bell) {...}
20 fun bells_remove(obj: Bell) {...}
21 // manufacturing year
22 var mfgYear: Int? = mfgYear
23 set(value) {...}
24 // special merge operation
25 fun merge(other: Bike) {...}
26 }
Listing 5: Excerpt of core class ‘bike’
The BikeFactory handles creation of Bike objects and the connection to the RDF store. Most
of the interface is inherited from the generic SemanticObjectFactory , only the method to
create a Bike from scratch (create() ) is added, as well as an internal helper function to create
a Bike from a query result.
As the Bike class is part of the inkblot type hierarchy and contains all the logic for managing
RDF references and lazy loading, inkblot can optionally generate a wrapper/interface class
WrappedBike (Lst. 7) that has a Bike as a field and provides quick access to its fields as well as
a space to add custom application logic.
1 var mfgYear: Int? = mfgYear
2 set(value) {
3 if(deleted)
4 throw Exception("...")
5 if(value == null) { // Unset value
6 val oldValueNode = ResourceFactory.createTypedLiteral(field).asNode()
7 val cn = CommonPropertyRemove(uri, "/*mfgYear*/", oldValueNode)
8 Inkblot.changelog.add(cn)
9 }
10 else if(field == null) { // Pure insertion
11 val newValueNode = ResourceFactory.createTypedLiteral(value).asNode()
12 val cn = CommonPropertyAdd(uri, "/*mfgYear*/", newValueNode)
13 Inkblot.changelog.add(cn)
14 }
15 else { // Change value
16 val oldValueNode = ResourceFactory.createTypedLiteral(field).asNode()
17 val newValueNode = ResourceFactory.createTypedLiteral(value).asNode()
18 val cn = CommonPropertyChange(uri, "/*mfgYear*/", oldValueNode, newValueNode!!)
19 Inkblot.changelog.add(cn)
20 }
21
22 field = value
23 markDirty()
24 }
Listing 6: Excerpt of managing a property in the core class ‘bike’
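The change-log idea behind Listing 6 can be captured in a self-contained sketch: setters append change nodes, which are later serialized to SPARQL updates. The node names mirror the listing, but the SPARQL strings produced below are an assumption about how such nodes could be flushed, not inkblot's actual output.

```kotlin
// Minimal sketch of the change-log pattern from Listing 6.
sealed class ChangeNode(val uri: String, val prop: String)
class PropertyAdd(uri: String, prop: String, val new: String) : ChangeNode(uri, prop)
class PropertyRemove(uri: String, prop: String, val old: String) : ChangeNode(uri, prop)
class PropertyChange(uri: String, prop: String, val old: String, val new: String) :
    ChangeNode(uri, prop)

// Render one change node as a SPARQL update (assumed serialization)
fun toSparql(cn: ChangeNode): String = when (cn) {
    is PropertyAdd -> "INSERT DATA { <${cn.uri}> <${cn.prop}> ${cn.new} }"
    is PropertyRemove -> "DELETE DATA { <${cn.uri}> <${cn.prop}> ${cn.old} }"
    is PropertyChange ->
        "DELETE DATA { <${cn.uri}> <${cn.prop}> ${cn.old} } ; " +
        "INSERT DATA { <${cn.uri}> <${cn.prop}> ${cn.new} }"
}

fun main() {
    // A setter like the one in Listing 6 would append nodes like this one
    val changelog = mutableListOf<ChangeNode>(
        PropertyChange("ex:bike1", "ex:mfgYear", "2020", "2021")
    )
    changelog.forEach { println(toSparql(it)) }
}
```

Batching the changes in a log, rather than issuing an update per setter call, lets the runtime combine or reorder writes before they reach the endpoint.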
6. Related Work
Many existing works investigate the static generation of library code for RDF data access.
A comprehensive comparison of existing approaches, alongside a detailed discussion of the
semantic gap, can be found in [7]. In terms of Baset and Stoffel's taxonomy, inkblot is an
Active/Static approach that uses SPARQL as its source language.
Most of the existing approaches use OWL ontologies as their input [1, 2, 3, 4]. All of these
works face the issue of the semantic gap between RDF and their respective target languages
and address it to varying degrees.
Among these, Sapphire [4] stands out as significantly narrowing the semantic gap by sup-
porting open-world reasoning and using JVM bytecode manipulation to allow re-typing objects
at runtime, matching RDF semantics.
owlready [3] also includes extensive reasoning support, and addresses the gap by combining
open- and closed-world reasoning. It provides a sophisticated API for accessing and modifying
OWL ontologies in Python. Owlready redefines several Python methods and makes use of the
language’s dynamic nature to approximate RDF semantics. Other approaches for integrating
RDF with dynamic languages exist [10, 11], but are less relevant to our use case of generating
statically typed Kotlin/JVM code.
There are also several open-source tools to transform OWL to Java, implementing the ideas
found in [2] and other works. Two of these are Jastor3 and owl2java4 , both of which generate
APIs that are structurally close to typical inkblot output. The Protégé ontology editor also
provides a Java code generation plugin using a similar OWL-to-Java mapping5 .
3
https://jastor.sourceforge.net/
4
https://github.com/piscisaureus/owl2java
1 object BikeFactory : SemanticObjectFactory(listOf(/*queries*/)) {
2 fun create(frontWheel: Wheel, backWheel: Wheel,
3 bells: List<Bell>, mfgYear: Int?): Bike {...}
4 override fun instantiateSingleResult(lines: List): Bike? {...}
5 }
1 class WrappedBike(private val bike: Bike) {
2 var frontWheel: Wheel
3 get() = bike.frontWheel
4 set(value) { bike.frontWheel = value }
5
6 var backWheel: Wheel
7 get() = bike.backWheel
8 set(value) { bike.backWheel = value }
9
10 val bells: List<Bell>
11 get() = bike.bells
12 fun bells_add(entry: Bell) = bike.bells_add(entry)
13 fun bells_remove(entry: Bell) = bike.bells_remove(entry)
14 ...
15 fun delete() = bike.delete()
16 fun merge(other: WrappedBike) = bike.merge(other.bike)
17 }
Listing 7: Interface of the generated factory for bikes, and the interface class.
Our approach is not the first to generate API code from SPARQL queries: the tool grlc [12]
takes the same route. However, grlc does not connect to a programming language directly,
but merely provides web developers with a more familiar way to access semantic data without
hiding much of the underlying complexity.
Scheglmann et al. [13] identify the need for more abstraction and customization in generated
APIs. To this end, they propose a method that takes OWL ontologies as an input and transforms
them into two intermediate models in custom modeling languages. These models are edited by
the user to customize the Java API that is generated from the model in the final translation step.
Parreiras et al. [14] address similar issues of customization and portability with Agogo. At its
core, Agogo is a domain-specific language (DSL) for describing mappings of semantic concepts
to object-oriented APIs. Similarly to the intermediate models in [13], it can then be used to
generate API code in a given target language. Agogo is also particularly interesting in that
it allows users to define their own SPARQL queries to read and update properties, making it
query-based in a way that is similar to our approach.
Compared to these approaches, inkblot offers similar possibilities for abstraction and cus-
tomization but makes different trade-offs, favouring a higher degree of automation and standard
formats over unlimited customization.
Another approach that could be described as query-based is LITEQ [15], which provides a
custom query language that can be used to load and create semantic data dynamically using
existing RDFS annotations. From a usage perspective, however, LITEQ appears closer to libraries
like Jena than the other presented approaches: while it allows for type-safe access to data, it
does not provide the kind of fully encapsulated API typically generated by other tools.
5
https://protegewiki.stanford.edu/wiki/Protege-OWL_Code_Generator
Still, these last three approaches are closer in spirit to inkblot than purely OWL-based
generation methods, as they allow for deviations between data access APIs and the underlying
data model.
7. Conclusion
Semantic technologies are hard to use for non-experts. This work describes a step toward
enabling the development of applications that use semantic data through an automatically
generated API that almost completely hides the semantic technologies from the programmer. We envision
that inkblot configurations are maintained by the developers of the semantic dataset or
ontology and provided to external programmers. In this way, semantic web technologies are
easier to integrate into larger applications.
We focus on the use of SPARQL queries as the interface between OO and RDF, in contrast to
approaches that focus on either generating OO code from an ontology or manipulating raw
RDF data. We conjecture that this is a more elegant and practical way to connect the worlds
of semantic data and programming than importing concepts from one side into the other.
Future Work. For the next step, we plan to implement a Python backend. As for usability, we
plan to investigate a DSL as part of the runtime to hide the currently used SPARQL filter
expressions, as well as the ability to define subtypes over configurations based on the Liskov
principle for RDF data loading [8]. Furthermore, we expect that the local change log can be
simplified, similar to the ocdm API [9], to improve performance.
Acknowledgments
This work was partially supported by the SM4RTENANCE project, an EU H2020 project under
grant agreement No. 101123423.
References
[1] N. M. Goldman, Ontology-oriented programming: Static typing for the inconsistent
programmer, in: ISWC, volume 2870 of Lecture Notes in Computer Science, Springer, 2003,
pp. 850–865.
[2] A. Kalyanpur, D. J. Pastor, S. Battle, J. A. Padget, Automatic mapping of OWL ontologies
into Java, in: SEKE, 2004, pp. 98–103.
[3] J.-B. Lamy, Owlready: Ontology-oriented programming in python with automatic clas-
sification and high level constructs for biomedical ontologies, Artificial Intelligence
in Medicine 80 (2017) 11–28. URL: https://www.sciencedirect.com/science/article/pii/S0933365717300271. doi:10.1016/j.artmed.2017.07.002.
[4] G. Stevenson, S. Dobson, Sapphire: Generating Java runtime artefacts from OWL ontolo-
gies, in: CAiSE Workshops, volume 83 of Lecture Notes in Business Information Processing,
Springer, 2011, pp. 425–436.
[5] G. P. Copeland, D. Maier, Making Smalltalk a database system, in: B. Yormark (Ed.),
SIGMOD, ACM Press, 1984, pp. 316–325. URL: https://doi.org/10.1145/602259.602300.
doi:10.1145/602259.602300.
[6] V. Eisenberg, Y. Kanza, Ruby on semantic web, in: ICDE, IEEE Computer Society, 2011,
pp. 1324–1327. URL: https://doi.org/10.1109/ICDE.2011.5767945. doi:10.1109/ICDE.2011.5767945.
[7] S. Baset, K. Stoffel, Object-oriented modeling with ontologies around: A survey of existing
approaches, Int. J. Softw. Eng. Knowl. Eng. 28 (2018) 1775–1794.
[8] E. Kamburjan, V. N. Klungre, M. Giese, Never mind the semantic gap: Modular, lazy and
safe loading of RDF data, in: ESWC, volume 13261 of Lecture Notes in Computer Science,
Springer, 2022, pp. 200–216.
[9] S. Persiani, M. Daquino, S. Peroni, A programming interface for creating data according
to the SPAR ontologies and the OpenCitations data model, in: ESWC, volume 13261 of
Lecture Notes in Computer Science, Springer, 2022, pp. 305–322.
[10] M. Babik, L. Hluchy, Deep integration of Python with Web Ontology Language, in: C. Bizer,
S. Auer, L. Miller (Eds.), 2nd Intl. Workshop on Scripting for the Semantic Web, volume
181 of CEUR Workshop Proceedings, 2006. URL: http://CEUR-WS.org/Vol-181/paper1.pdf.
[11] E. Oren, R. Delbru, S. Gerke, A. Haller, S. Decker, ActiveRDF: object-oriented semantic
web programming, in: WWW, ACM, 2007, pp. 817–824.
[12] A. Meroño-Peñuela, R. Hoekstra, grlc makes GitHub taste like linked data APIs, in:
SALAD@ESWC, volume 1629 of CEUR Workshop Proceedings, CEUR-WS.org, 2016.
[13] S. Scheglmann, A. Scherp, S. Staab, Declarative representation of programming access to
ontologies, in: ESWC, volume 7295 of Lecture Notes in Computer Science, Springer, 2012,
pp. 659–673.
[14] F. S. Parreiras, C. Saathoff, T. Walter, T. Franz, S. Staab, APIs à gogo: Automatic generation
of ontology APIs, in: ICSC, IEEE Computer Society, 2009, pp. 342–348.
[15] M. Leinberger, S. Scheglmann, R. Lämmel, S. Staab, M. Thimm, E. Viegas, Semantic web
application development with LITEQ, in: ISWC (2), volume 8797 of Lecture Notes in
Computer Science, Springer, 2014, pp. 212–227.