=Paper=
{{Paper
|id=Vol-3830/paper2
|storemode=property
|title=Generating Transparent and Query-Based RDF Layers
|pdfUrl=https://ceur-ws.org/Vol-3830/paper2.pdf
|volume=Vol-3830
|authors=Nils Rollshausen,Eduard Kamburjan,Martin Giese
|dblpUrl=https://dblp.org/rec/conf/semiim/RollshausenKG24
}}
==Generating Transparent and Query-Based RDF Layers==
Generating Transparent and Query-Based RDF Layers
Nils Rollshausen¹, Eduard Kamburjan² and Martin Giese²
¹ Technical University of Darmstadt, Germany
² University of Oslo, Norway
Abstract
We present inkblot, a software tool that generates object-oriented API code to represent the data found
in a semantic model in a way suitable for programmers not familiar with semantic technologies. The API
interacts with an RDF triple store via a SPARQL endpoint, but hides this connection from the programmer.
The approach generates adequately typed fields and methods, based on the types and cardinalities in the
data. Unlike previous work, the generation is not driven by an OWL ontology or similar declarative
description of the application domain, but by a SPARQL query that accesses the pertinent data.
Keywords
Code generation, Object-orientation, Graph query
1. Introduction
Motivation. Object-oriented (OO) software development strives to produce designs containing classes with fields and relationships corresponding to entities in the application domain.
The same holds true for semantic technologies, where significant effort is invested to produce
ontologies that accurately formalize a conceptualization of the domain.
It is thus inevitable when interfacing between application software and semantic technologies
that domain concepts will be represented twice, once in the form of types in the Resource
Description Framework (RDF) sense, and once in an object-oriented class hierarchy. Moreover,
boilerplate code needs to be written that transfers information between the two representations.
The need for these manually created and maintained abstraction layers is a source of errors and
presents a significant barrier for adoption of semantic web technologies.
This situation naturally leads to the idea of automatically generating program code that fits a
semantic model in the form of a library or API that exposes a class hierarchy that is consistent
with a semantic model of the domain, and that incorporates services such as synchronization
with a triple store, reasoning, etc. Due to the wealth of research on ontology languages like
OWL as a means of expressing domain knowledge, a variety of approaches have been proposed
to generate such code from an ontology [1, 2, 3, 4]. However, we believe that there are some
fundamental difficulties when using an ontology for this purpose.
First, the ontology describes the domain, and not the information available. For example, an
ontology about human beings may rightly express that every human has a mother. Translating
this to a requirement for the OO code stating that the ‘mother’ field may never be null is
SofLiM4KG’24: Software Lifecycle Management for Knowledge Graphs Workshop, November 11–12, 2024, Baltimore, US
nrollshausen@seemoo.tu-darmstadt.de (N. Rollshausen); eduard@ifi.uio.no (E. Kamburjan); martingi@ifi.uio.no (M. Giese)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
nonsensical since any given dataset will not contain information about the mother of every
human mentioned. Typing, cardinalities, and non-nullness in the code should be informed
by what information will have to be represented, and not the realities of the domain as such.
Second, the typing of relations in a semantic model does not fit that of associations in object
orientation. E.g., domain statements are inherited upwards in the class hierarchy: if 𝑅 has
domain 𝐶, then 𝑅 also has domain 𝐶′ for any superclass 𝐶′ of 𝐶. But if an OO class 𝐶 has a field 𝑓,
then any subclass of 𝐶 will also have the field.
Both of these are instances of the so-called impedance mismatch [5, 6] or semantic gap [7, 8]
between the object model of RDF, geared towards data-driven tasks, and the object model of
programming languages, geared towards typeability and modularity.
Approach. We explore a different approach that uses a closed-world formalism to describe
the shape of the information to be represented. Specifically, we require the user to formulate
a SPARQL Protocol and RDF Query Language (SPARQL) query that loads the information
required. Based on this query and some additional configuration information (field names,
multiplicities, etc.), we generate API code that provides an object oriented representation of the
information, as well as implementations to lazily load and store data from and to a SPARQL
endpoint.
The core challenge here lies in automatically generating writeback SPARQL updates given
a single retrieval query, including updates for object creation, modification, and deletion. An
approach for seamless read-only integration of semantic web data into object-oriented languages
in a query-based manner has already been described in [8].
Our proposed API includes a runtime that supports caching and handles consistency of the
loaded objects. Furthermore, it includes an extendable monitoring mechanism that (a) validates
assumptions on the RDF data using SHACL shapes or SPARQL validation queries, and (b) verifies
assumptions on the program data that must hold before the writeback to the RDF store. All
local changes are collected until a commit to improve performance, and the API works with
every write-enabled SPARQL endpoint. The generated API also supports creating completely
new objects, writing them into the store, and deleting existing objects from the RDF data.
Contribution. Our main contribution is a tool for query-based two-way integration of OO
and RDF through API generation from SPARQL queries. The tool is available as open source at
https://github.com/smolang/inkblot and has a modular backend that currently supports Kotlin
code generation, which can be used from any JVM-based language.
2. Motivating Example
We first introduce an example to illustrate our notion of transparent and query-based RDF layers.
Consider a software application in the bicycle domain that is used to manage the inventory of a
shop. The inventory is stored in an RDF store and must be manipulated using software in an
object-oriented language.
The software must manipulate RDF structures, but these structures are optimized for the
domain and data storage, not necessarily the application. More precisely, the data loaded from
and written to the RDF store must be transformed into OO structures. Take, for example, the
three SPARQL queries in Lst. 1 that load all bikes, wheels, and bells from the RDF store.
1 SELECT ?bike ?mfg ?fw ?bw ?bells {
2 ?bike a bk:bike; bk:hasFrame [bk:frontWheel ?fw; bk:backWheel ?bw]
3 OPTIONAL { ?bike bk:mfgYear ?mfg }
4 OPTIONAL { ?bike bk:hasFrame [bk:hasBell ?bells] } }
5 SELECT ?wheel ?diameter {
6 ?wheel a bk:wheel; bk:diameter ?diameter. }
7 SELECT ?bell ?color WHERE {
8 ?bell a bk:bell;
9 bk:color ?color }
Listing 1: SPARQL retrieval queries for bike, wheel and bell objects.
The first query loads all bikes, their wheels, and manufacturing dates. Note that there are bike
frames in the RDF store, to which the wheels are attached. However, they are not loaded because
they are not manipulated by the program, yet they are part of the ontology. Management
applications do not need to consider the semantic connotations of data, and software developers
prefer to manipulate the objects in terms of their programming language (except during I/O).
For example, the code in Lst. 2 implements two business logic operations: The first loads all
bikes lacking a manufacturing date and sets their date to the current year. The second creates a
new bicycle and equips it with a bell removed from an existing bike.
All SPARQL queries in this example are hidden in the special classes Bike, Wheel, and Bell
and their associated Factory classes that manage the I/O of data – only when we have to
provide a filter to extend the load query (l. 4) are semantic technologies accessed directly.
Our tool, inkblot, generates such code from the given SPARQL queries, provides basic operations on the loaded data, and offers a way to extend the generated structures with application-specific code. It furthermore manages unambiguous writeback and consistency, provides an
architecture to monitor its assumptions about the data at runtime, and supports lazy loading of
resources handled through a pointer structure.
1 Inkblot.endpoint = "http://example.com/sparql"
2
3 // add manufacturing year where unset
4 val undated = BikeFactory.commitAndLoadSelected("!bound(?mfg)")
5 undated.forEach{ cycle -> cycle.mfgYear = 2023 }
6
7 // create a tricycle and transfer an existing bell to it
8 val scrapBike = BikeFactory.loadFromUri("...")
9 val bell = scrapBike.bells[0]
10 scrapBike.bells_remove(bell)
11 val wheel1 = WheelFactory.create(diameter=20.0)
12 [...]
13 val newCycle = BikeFactory.create(wheel1, wheel2, 2023, listOf(bell))
14 Inkblot.commit() // commit all changes
Listing 2: Usage examples of the generated library classes
3. Architecture
The workflow of inkblot , pictured in Fig. 1, has three steps that transform a set of SPARQL
queries or configuration files into a set of classes in the target language, together with some
auxiliary structures to use at runtime.
Input The input to inkblot is a set of SPARQL queries. These queries are then translated into
a configuration file in which the user can refine the generated abstraction of the class: its
fields, its connection to other classes, and the multiplicity of fields.
Generation The library generation itself analyzes the input query (Variable Path Analysis) and
synthesizes several other queries for manipulating the RDF related to an object (Query
Synthesizer). The user can override some of the generated queries, especially if different
deletion behavior is required.
Based on analysis results and synthesized queries, the actual generation generates code
in the target language (Semantic Object Generator). To ensure that runtime data indeed
follows the assumptions used during generation (e.g., multiplicity of fields), SPARQL
queries and SHACL constraints for monitoring are generated (Validation Generator and
SHACL Generator). The SPARQL queries are included in the generated code, while the
SHACL constraints are intended for external monitoring.
Output The generated RDF layer consists of the inkblot runtime in the target language, and
a class and class factory for each query, as well as optional wrapper classes that can be
used in the type hierarchy of the application.
We now go over the components of inkblot, as well as its input, in detail.
Figure 1: Workflow of inkblot . Rectangles are inkblot components, rounded rectangles are input/out-
put files. Dashed arrows indicate optional inputs/outputs.
3.1. Input
The input to inkblot is a set of SPARQL queries. A configuration record is generated for each
query. The configuration record for the query for wheels is given in Lst. 3. Each record has a
name and contains (a) the original query, (b) the anchor variable, (c) the RDF type of the anchor
variable, and (d) a list of property records. The anchor variable is used to load the node in the
RDF graph that is loaded into the object, the so-called anchor node. Two objects are considered
to model the same structure if they have the same anchor node.
Each property record has a name and contains (a) a SPARQL variable name from the query,
(b) a type, and (c) a cardinality. The SPARQL query must contain a path from the anchor
variable to each variable used in a property record, but parts of the path may be optional. The
type of a property is either an XSD datatype, the name of a property record, or the constant
inkblot:rawObjectReference . The cardinality is either optional "?", solitary "!", or many "*".
Each property record becomes a field of the class generated for the query, with the name of
the record being the name of the field, loaded from query results using the contained SPARQL
variable. The type decides the type of the field – an XSD datatype becomes a primitive type,
another record name becomes a reference to that class, and anything else is handled as a URI.
Cardinality decides whether the field can have a null value (for ? ) or is a list (for * ).
In the default, unedited configuration, the most generic cardinality is used and the name of
each property is set to the name of the SPARQL variable. A property record is generated for
every variable in the result set of the input query.
1 "Wheel": {
2 "anchor": "wheel",
3 "type": "http://example.com/ns/class",
4 "query": "... SELECT ?wheel ...",
5 "properties": {
6 "diameter": {
7 "sparql": "diameter",
8 "type": "[classnameOrXSD]",
9 "cardinality": "*"
10 }
11 }
12 }
Listing 3: Excerpt of the default JSON configuration for wheels.
3.2. Generation
The configuration file is then processed by inkblot to generate the program code. The generated
code provides access to the fields defined for the class but also generates further queries and
structures for writeback, consistency, and validation.
Analysis. As a preprocessing step to synthesize queries that relay changes between the RDF
store and the program, we need to analyze variable dependencies in the input. The original
query, which we call the retrieval query, must have explicit paths from the anchor variable to
any variable used in a record.
The inkblot Variable Path Analysis component traverses the retrieval query and collects
all triple patterns contained in it, keeping track of potential optional contexts. The analysis
itself ensures that the query conforms to the limitations outlined below and builds a dependency
graph in which nodes are SPARQL variables, constant URIs or values, and edges are the RDF
predicates that are used to relate them to one another in the input query. It then computes the
set of all simple paths leading from the anchor variable to any variable (including those not
in the result set) or literal and verifies that the graph is connected. Paths traversing the same
nodes using different edges are considered to be distinct. The additional non-result variables
and literals are used later to update the RDF graph correctly.
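The core of this analysis can be sketched in plain Kotlin. The sketch below is our minimal re-implementation of the idea, not inkblot's actual component: nodes are variable names, edges carry the connecting predicate, and we enumerate all simple paths from the anchor and check connectivity.

```kotlin
// An edge of the dependency graph: a triple pattern from the retrieval query.
data class Edge(val from: String, val predicate: String, val to: String)

// All simple paths (no repeated node) from the anchor to any reachable node.
// Paths over the same nodes via different edges come out as distinct lists.
fun simplePaths(edges: List<Edge>, anchor: String): List<List<Edge>> {
    val result = mutableListOf<List<Edge>>()
    fun visit(node: String, path: List<Edge>, seen: Set<String>) {
        for (e in edges) {
            if (e.from != node || e.to in seen) continue
            val next = path + e
            result.add(next)
            visit(e.to, next, seen + e.to)
        }
    }
    visit(anchor, emptyList(), setOf(anchor))
    return result
}

// The graph is connected (for our purposes) if every node is reachable
// from the anchor along the directed triple patterns.
fun isConnected(edges: List<Edge>, anchor: String): Boolean {
    val nodes = edges.flatMap { listOf(it.from, it.to) }.toSet()
    val reached = simplePaths(edges, anchor).map { it.last().to }.toSet() + anchor
    return reached == nodes
}
```

For the bike query of Lst. 1, the anchor `bike` reaches the (non-result) frame node and, through it, the wheels and bells.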
Synthesis. The QuerySynthesizer component uses the results of the path analysis to create
additional SPARQL queries to update the RDF store. We distinguish between initialization,
addition, removal, change, and deletion updates. All of these updates can operate on either
literal values or object references without any changes. We will therefore only refer to general
values in the following, which can either be literal values or URIs representing object references.
To create an update that adds a new value to a property, for example, we first restrict the
dependency graph to only edges that are safe (i.e. not optional) in the context of the variable
representing the property. If that variable is defined inside of an Optional block, this includes
all edges found in the same block or any parent blocks. We then build a sub-graph by taking
the neighbourhood of the target variable and recursively extending it by the neighbourhood of
any non-anchor variables contained in it. All edges in this sub-graph are translated into triples
and form the basis of the generated update, substituting the object’s URI for the anchor variable
and the value to be added for the target variable. Triples that are known to exist already, even
if the property in question is not set yet, are included as the where condition of the SPARQL
update while all other triples are inserted.
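The assembly of such an addition update can be sketched as a small string builder. This is our own simplification (the function name and the triple representation are assumptions): triples known to exist go into the WHERE clause, the triples carrying the new value are inserted, and the object's URI is substituted for the anchor variable.

```kotlin
// Sketch: build an addition update from the neighbourhood sub-graph.
// 'existing' are triples known to hold already (WHERE clause),
// 'inserted' are the triples that introduce the new value.
fun additionUpdate(
    anchorUri: String,
    existing: List<Triple<String, String, String>>,
    inserted: List<Triple<String, String, String>>
): String {
    fun fmt(t: Triple<String, String, String>) = "${t.first} ${t.second} ${t.third} ."
    val insertPart = inserted.joinToString(" ") { fmt(it) }
    val wherePart = existing.joinToString(" ") { fmt(it) }
    // substitute the object's URI for the anchor variable
    return "INSERT { $insertPart } WHERE { $wherePart }".replace("?anchor", "<$anchorUri>")
}
```

Adding a bell to a bike, for instance, inserts the `bk:hasBell` triple against the frame node that the WHERE clause locates via `bk:hasFrame`.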
To remove a value, we follow a similar procedure and build the same neighbourhood sub-graph. We include the full sub-graph in the where condition but only delete triples that form
the last edge on a simple path from the anchor variable to the target variable. This ensures that
any annotations belonging to the removed value and used in the retrieval query for filtering
purposes stay intact.
A change update is equivalent to a removal update for the old value followed by an addition
update to set the new value. Together, these three types of updates cover everything needed to
modify the properties of our generated classes. To create and delete entire objects themselves,
we need additional initialization and deletion updates.
For object creation, we distinguish between the base creation update and additional initializer
updates. The base creation update inserts all triples corresponding to safe edges in the dependency graph. Since all variables contained in this subgraph are strictly required, we can bind
them to matching required constructor arguments. For constructor arguments corresponding to
optional properties, we use initializer updates that are only executed if a non-null / non-empty
constructor argument is passed. Initializer updates are identical to addition updates but are
considered distinct to allow overwriting one but not the other.
For object deletion, we delete all triples that contain the anchor node of the deleted object in
either subject or object position. Other nodes are not deleted recursively to avoid unintentional
loss of data that may be required by other applications.1
¹In terms of our ongoing example, while our application may reasonably delete frame nodes when deleting a bike, other applications with a different view of the data may expect the frame to continue to exist even if the bike itself is deleted and used for spare parts.
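The default deletion semantics just described amount to two updates per object; the sketch below (our own simplification, with an assumed function name) makes this explicit.

```kotlin
// Sketch of the default deletion semantics: remove every triple that has the
// anchor node in subject or object position. Nothing is deleted recursively,
// so nodes referenced by the deleted object survive.
fun deletionUpdates(anchorUri: String): List<String> = listOf(
    "DELETE WHERE { <$anchorUri> ?p ?o }",  // anchor as subject
    "DELETE WHERE { ?s ?p <$anchorUri> }"   // anchor as object
)
```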
These updates, particularly the deletion update we have just discussed, assume specific
semantics for their operations. As these semantics may be application-specific, the user has the
option to provide their own update queries and override the ones generated by inkblot . In this
way, users can make adaptations to their specific application while still benefiting from fully
automatic code generation and the same level of runtime support as with the default updates.
Generation. Equipped with the synthesized SPARQL updates, the actual Semantic Object
Generator generates three classes per configuration record: a core class, a factory for this class,
and a wrapper class for the core class, which is a default implementation to add business logic
around the core class. We will describe them in detail in the next section. The factory class
uses slightly modified versions of the retrieval query to load individual instances or subsets
of all instances. For access to the defined properties, we generate standard getter and setter
methods that operate on locally cached property values. For references to other objects, we
only store the URI of the referenced object — the corresponding getter method transparently
lazy-loads and instantiates the referenced object on access. The details of lazy loading are
described by Kamburjan et al. [8]. All setter methods record changes made to the property
using the previously generated SPARQL updates in addition to updating the cached property
value. This creates the local changelog of SPARQL updates that will be applied to the data store
when local changes are committed.
The generated code is statically typed, using the type and cardinality information contained
in the configuration file. Boolean values, strings, and all numeric xsd types are supported
natively and mapped to appropriate Kotlin type equivalents. Other xsd types, such as dates, are
exposed to the user in their string representation. The communication with the RDF store over
the SPARQL endpoint is implemented using Apache Jena2 .
Validation. The annotated type and cardinality information is used to generate well-typed
code. However, this corresponds to assumptions about the actual data in the RDF store, which
may not hold at runtime. We therefore provide two mechanisms to validate the used data: For
one, inkblot generates SPARQL queries selecting instances with invalid types or cardinalities
using variations of the retrieval query with different added filter expressions. These validating queries are embedded into the generated Factory classes and automatically validate the
consistency of the data store before any objects are loaded or created. For another, inkblot
automatically generates SHACL constraints for all classes, encoding all assumptions made in
the configuration. The user can manually check these constraints using any SHACL validator.
Language portability. Currently, only Kotlin is supported as a target language, as it can
use the numerous Java libraries for RDF and be used from other JVM-based languages, but the
generation backend is designed modularly and can be adapted to other languages. Only the
runtime and the semantic object generator components are specific to the target language; all
other components are language-independent and can be reused for different backends. New
backends can be implemented as extensions of the AbstractSemanticObjectGenerator class
and can be selected by the user during generation.
²In our current Kotlin backend, there are no Jena-specific design decisions in inkblot.
Limitations. In terms of the supported SPARQL fragment, inkblot supports queries that
provide unambiguous writeback locations for all variables to allow persisting changes to the
data store. This means that each variable may only be bound in a single location and that the
provided graph patterns form connected graphs around the anchor node. Language features
such as subqueries, explicit variable bindings, and set difference operations also do not allow
unambiguous writeback and are not currently supported.
4. Generated API
The output of inkblot is a set of classes to be used together with the inkblot runtime. The
API is documented in greater detail at https://github.com/smolang/inkblot; we present only its
main capabilities here.
4.1. Runtime
The runtime of inkblot maintains a global cache of all loaded objects to ensure that each
anchor node is represented only by a single object at a given time and local changes remain
consistent. The cache is transparent to the user.
Additionally, it maintains a log of all changes to loaded objects and the SPARQL updates
required to persist these changes to the data store. When the user decides to commit these
local outstanding changes, the gathered updates are batched together and sent to the SPARQL
endpoint. To avoid unexpected behaviour when loading new instances from the data store
in the presence of uncommitted local changes, all methods that load multiple instances also
commit changes to the data store. The API assumes that no other party is modifying the RDF
data.
The API of the runtime permits the following operations to the user:
Loading Using the endpoint property, the user sets the URL of the SPARQL endpoint used
for the RDF connection.
Writing Using the commit() method, all changes in the loaded objects are forced to be mirrored
in the RDF store. This method is also called by several of the generated classes, which we
detail below.
Monitoring The runtime monitors the consistency of local data in addition to the previously
described SPARQL and SHACL validation mechanisms for online data. This includes
checks for data types that do not have a one-to-one mapping to JVM primitive types, such
as the XSD NegativeInteger, verifying that values assigned to such properties conform to
the limitations of the underlying XSD type. If any such check fails, listeners added/re-
moved using the addViolationListener() /removeViolationListener() methods are
notified of this constraint violation, even before any changes are committed.
The runtime also implements lazy loading: whenever an anchor node is loaded, it is first
checked whether the runtime has an object for it in the cache. Similarly, when a loaded object
refers to another object, the second object is only loaded when the access actually happens. If
the object is never accessed, it is not loaded. For example, if the program loads a bike but never
accesses its wheels, then the wheels are not loaded at all.
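The interplay of cache, changelog, and lazy loading can be modelled in a few lines. The following is our own simplified model, not inkblot's actual runtime (MiniRuntime and its members are assumed names): one object per anchor URI, loading deferred to first access, and updates batched until commit.

```kotlin
// Simplified model of the inkblot runtime: a global cache keyed by anchor URI
// and a changelog of pending SPARQL updates, flushed on commit().
object MiniRuntime {
    private val cache = mutableMapOf<String, Any>()
    private val changelog = mutableListOf<String>()  // pending SPARQL updates

    // Lazy loading: 'load' runs only on the first access to a given URI;
    // later accesses return the cached object, ensuring one object per anchor.
    @Suppress("UNCHECKED_CAST")
    fun <T : Any> loadOrCreate(uri: String, load: (String) -> T): T =
        cache.getOrPut(uri) { load(uri) } as T

    // Setters would record their generated SPARQL update here.
    fun record(update: String) { changelog.add(update) }

    // In the real tool the batch is sent to the SPARQL endpoint; here we
    // just return and clear it.
    fun commit(): List<String> {
        val batch = changelog.toList()
        changelog.clear()
        return batch
    }
}
```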
4.2. Generated Classes
Figure 2: UML diagram of the generated classes for a configuration record C. (SemanticObject: merge(other), delete(); SemanticObjectFactory: loadFromURI(uri), commitAndLoadAll(), commitAndLoadSelected(filter), validateOnlineData(), addValidatingQuery(query); C with its properties, CFactory with create(...), and WrappedC.)
For each configuration record C, three classes are generated, as pictured in Fig. 2. The core class
contains all the data, operations for changes, and manages the relation of the object to the RDF
store. The factory is used to create new core class objects either from scratch or by loading
them from the store. A wrapper class is available to include core classes in the type hierarchy
of the software application.
Core Class. A core class object provides read access to an object’s unique URI and full
read/write access to all properties defined in the configuration record. Using Kotlin's built-in support for property getter and setter methods, we render functional properties as first-class Kotlin properties, making changelogging, lazy loading of object references and runtime
constraint checks completely transparent to the user. We also use Kotlin's distinction between
nullable and non-nullable types to offload enforcement of ‘exactly one’ cardinality constraints
to the type system. For non-functional properties that can have multiple values, we still provide
access using first-class language features in the form of an immutable Kotlin list. Addition and
removal of list entries is handled using dedicated methods.
Inspired by the OpenCitations ocdm API [9], we also provide a merge operation that combines
two instances of the same class and redirects all references to either instance to the new unified
instance. All core classes implement the SemanticObject interface.
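The merge semantics can be sketched as follows. This is our own reading of the operation, with assumed names (Thing, redirects, resolve): missing properties on the target are filled from the other instance, and references to the merged-away URI are redirected to the surviving one.

```kotlin
// Sketch of merge: in the real runtime a shared table would redirect
// object references; here a global map stands in for it.
val redirects = mutableMapOf<String, String>()

class Thing(val uri: String, var label: String?) {
    fun merge(other: Thing) {
        if (label == null) label = other.label  // keep own value if present
        redirects[other.uri] = uri              // references to 'other' now resolve here
    }
}

// Resolving a reference follows the redirect, if any.
fun resolve(uri: String): String = redirects[uri] ?: uri
```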
Factory. For each configuration record, inkblot generates the class itself as well as an
associated Factory class used for creating new instances (create() ) as well as loading existing
instances from the data store. The factory provides methods to load all instances found in
the data store (commitAndLoadAll() ), a single instance specified by a URI (loadFromURI() ),
or a subset of instances specified by a SPARQL filter expression (commitAndLoadSelected() ).
These filter expressions are the only place where the user is exposed to the underlying semantic
technology.
1 "Bike": {
2 "anchor": "bike",
3 "type": "http://example.com/ns/Bike",
4 "query": "... SELECT ?bike ...",
5 "properties": {
6 "frontWheel": {"sparql": "fw", "type": "Wheel", "cardinality": "!" },
7 "backWheel": {"sparql": "bw", "type": "Wheel", "cardinality": "!" },
8 "bells": {"sparql": "bell", "type": "Bell", "cardinality": "*" },
9 "mfgYear": {"sparql": "mfg", "type": "xsd:int", "cardinality": "?" }
10 }
11 }
1 <...> sh:targetClass <http://example.com/ns/Bike> ;
2 sh:property [
3 sh:datatype xsd:int ;
4 sh:path bk:mfgYear ;
5 sh:maxCount 1 ;
6 sh:name "..."; sh:message "...";
7 ] .
Listing 4: JSON configuration for class ‘bike’ and an example SHACL shape.
The factory also allows users to perform validation of instances in the remote data store
at runtime using validateOnlineData() , optionally using custom validation queries (added
by addValidatingQuery() ) in addition to automatically generated ones. Contrary to the
monitoring of the runtime, which is performed on the local objects, these checks are performed
on the triplestore. The generated validating SPARQL queries, for example, are used here.
Wrapper. To allow users to extend generated classes with custom application logic, inkblot
can optionally create wrapper classes that contain a reference to a class instance and expose
the same set of properties lifted from that instance without containing any logic on their own.
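A wrapper in this spirit can be sketched as below. This is our illustration under assumed names (the core-class stand-in Bike and the isVintage method are not from the paper): the wrapper holds a core instance, lifts its properties, and is the place for custom application logic.

```kotlin
// Stand-in for the generated core class (the real one manages RDF I/O).
class Bike(val uri: String, var mfgYear: Int?)

// Wrapper: exposes the same properties, delegates to the core instance,
// and hosts application-specific logic without touching generated code.
class WrappedBike(private val core: Bike) {
    var mfgYear: Int?                       // lifted property
        get() = core.mfgYear
        set(value) { core.mfgYear = value }

    // Custom application logic (hypothetical example).
    fun isVintage(currentYear: Int): Boolean =
        mfgYear?.let { currentYear - it >= 30 } ?: false
}
```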
5. Usage
Let us illustrate the overall workflow that connects the queries and the program code in Sec. 2
and follow the query for bike throughout generation. The generated default configuration
record is analogous to the one given for wheels in Sec. 3. After modification to fit our intents,
the configuration record is as shown in Lst. 4. The RDF type of the anchor variable is updated
(l. 3), the names of the properties are different from the SPARQL variables used to retrieve them
(l. 6,7,8,9) and the cardinality is changed. Every bike has exactly one front wheel and one back
wheel (l. 6, l. 7), at most one manufacturing year (l. 9) and arbitrarily many bells (l. 8). The
types of the properties are updated to correspond to an XSD datatype or the name of another
configuration record.
The following artifacts are then generated. First, the SHACL shapes, for example the shape in
Lst. 4 checking the cardinality and type of the manufacturing date. Second, the core class Bike ,
given in Lst. 5. As we can see, it has a property for each variable defined in its configuration
record. The backing collection of bells is not exposed directly; instead, dedicated methods are
generated so that users cannot manipulate internals. Also note the cached loading of the front
wheel at l. 8 – the front wheel is only loaded
via the cache once accessed. Lst. 6 describes the full details of property access and modification
for mfgYear . While the loading is straightforward, changing the value requires generating
updates depending on whether the value is set to null and is deleted, set to a value for the first
time, or changed to a new value. These queries are added to the global changelog and only
executed when the changelog and cache are committed. Until then, the database is not updated.
1 class Bike internal constructor(uri: String, frontWheel: String,
2 backWheel: String, bells: List<String>,
3 mfgYear: Int?) : SemanticObject(uri) {
4 [...]
5 // frontWheel
6 private var _inkbltRef_frontWheel: String = frontWheel
7 var frontWheel: Wheel
8 get() = WheelFactory.loadFromURI(_inkbltRef_frontWheel)
9 set(value) {...}
10 // backWheel
11 private var _inkbltRef_backWheel: String = backWheel
12 var backWheel: Wheel
13 get() = WheelFactory.loadFromURI(_inkbltRef_backWheel)
14 set(value) {...}
15 // bells, extra operations instead of setter
16 private val _inkbltRef_bells = bells.toMutableSet()
17 val bells: List<Bell>
18 get() = _inkbltRef_bells.map { BellFactory.loadFromURI(it) }
19 fun bells_add(obj: Bell) {...}
20 fun bells_remove(obj: Bell) {...}
21 // manufacturing year
22 var mfgYear: Int? = mfgYear
23 set(value) {...}
24 // special merge operation
25 fun merge(other: Bike) {...}
26 }
Listing 5: Excerpt of core class ‘bike’
The BikeFactory handles creation of Bike objects and the connection to the RDF store. Most
of the interface is inherited from the generic SemanticObjectFactory , only the method to
create a Bike from scratch (create() ) is added, as well as an internal helper function to create
a Bike from a query result.
As the Bike class is part of the inkblot type hierarchy and contains all the logic for managing
RDF references and lazy loading, inkblot can optionally generate a wrapper/interface class
WrappedBike (Lst. 7) that has a Bike as a field and provides quick access to its fields as well as
a space to add custom application logic.
1 var mfgYear: Int? = mfgYear
2 set(value) {
3 if(deleted)
4 throw Exception("...")
5 if(value == null) { // Unset value
6 val oldValueNode = ResourceFactory.createTypedLiteral(field).asNode()
7 val cn = CommonPropertyRemove(uri, "/*mfgYear*/", oldValueNode)
8 Inkblot.changelog.add(cn)
9 }
10 else if(field == null) { // Pure insertion
11 val newValueNode = ResourceFactory.createTypedLiteral(value).asNode()
12 val cn = CommonPropertyAdd(uri, "/*mfgYear*/", newValueNode)
13 Inkblot.changelog.add(cn)
14 }
15 else { // Change value
16 val oldValueNode = ResourceFactory.createTypedLiteral(field).asNode()
17 val newValueNode = ResourceFactory.createTypedLiteral(value).asNode()
18 val cn = CommonPropertyChange(uri, "/*mfgYear*/", oldValueNode, newValueNode!!)
19 Inkblot.changelog.add(cn)
20 }
21
22 field = value
23 markDirty()
24 }
Listing 6: Excerpt of managing a property in the core class ‘bike’
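The change-log idea behind Listing 6 can be captured in a self-contained sketch: setters append change nodes, which are later serialized to SPARQL updates. The node names mirror the listing, but the SPARQL strings produced below are an assumption about how such nodes could be flushed, not inkblot's actual output.

```kotlin
// Minimal sketch of the change-log pattern from Listing 6.
sealed class ChangeNode(val uri: String, val prop: String)
class PropertyAdd(uri: String, prop: String, val new: String) : ChangeNode(uri, prop)
class PropertyRemove(uri: String, prop: String, val old: String) : ChangeNode(uri, prop)
class PropertyChange(uri: String, prop: String, val old: String, val new: String) :
    ChangeNode(uri, prop)

// Render one change node as a SPARQL update (assumed serialization)
fun toSparql(cn: ChangeNode): String = when (cn) {
    is PropertyAdd -> "INSERT DATA { <${cn.uri}> <${cn.prop}> ${cn.new} }"
    is PropertyRemove -> "DELETE DATA { <${cn.uri}> <${cn.prop}> ${cn.old} }"
    is PropertyChange ->
        "DELETE DATA { <${cn.uri}> <${cn.prop}> ${cn.old} } ; " +
        "INSERT DATA { <${cn.uri}> <${cn.prop}> ${cn.new} }"
}

fun main() {
    // A setter like the one in Listing 6 would append nodes like this one
    val changelog = mutableListOf<ChangeNode>(
        PropertyChange("ex:bike1", "ex:mfgYear", "2020", "2021")
    )
    changelog.forEach { println(toSparql(it)) }
}
```

Batching the changes in a log, rather than issuing an update per setter call, lets the runtime combine or reorder writes before they reach the endpoint.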
6. Related Work
Many existing works investigate the static generation of library code for RDF data access.
A comprehensive comparison of existing approaches, alongside a detailed discussion of the
semantic gap, can be found in [7]. In terms of Baset and Stoffel's taxonomy, inkblot is an
Active/Static approach that uses SPARQL as its source language.
Most of the existing approaches use OWL ontologies as their input [1, 2, 3, 4]. All of these
works face the issue of the semantic gap between RDF and their respective target languages
and address it to varying degrees.
Among these, Sapphire [4] stands out as significantly narrowing the semantic gap by sup-
porting open-world reasoning and using JVM bytecode manipulation to allow re-typing objects
at runtime, matching RDF semantics.
owlready [3] also includes extensive reasoning support, and addresses the gap by combining
open- and closed-world reasoning. It provides a sophisticated API for accessing and modifying
OWL ontologies in Python. Owlready redefines several Python methods and makes use of the
language’s dynamic nature to approximate RDF semantics. Other approaches for integrating
RDF with dynamic languages exist [10, 11], but are less relevant to our use case of generating
statically typed Kotlin/JVM code.
There are also several open-source tools to transform OWL to Java, implementing the ideas
found in [2] and other works. Two of these are Jastor3 and owl2java4 , both of which generate
APIs that are structurally close to typical inkblot output. The Protégé ontology editor also
provides a Java code generation plugin using a similar OWL-to-Java mapping5 .
3
https://jastor.sourceforge.net/
4
https://github.com/piscisaureus/owl2java
1 object BikeFactory : SemanticObjectFactory(listOf(/*queries*/)) {
2 fun create(frontWheel: Wheel, backWheel: Wheel,
3 bells: List<Bell>, mfgYear: Int?): Bike {...}
4 override fun instantiateSingleResult(lines: List): Bike? {...}
5 }
1 class WrappedBike(private val bike: Bike) {
2 var frontWheel: Wheel
3 get() = bike.frontWheel
4 set(value) { bike.frontWheel = value }
5
6 var backWheel: Wheel
7 get() = bike.backWheel
8 set(value) { bike.backWheel = value }
9
10 val bells: List<Bell>
11 get() = bike.bells
12 fun bells_add(entry: Bell) = bike.bells_add(entry)
13 fun bells_remove(entry: Bell) = bike.bells_remove(entry)
14 ...
15 fun delete() = bike.delete()
16 fun merge(other: WrappedBike) = bike.merge(other.bike)
17 }
Listing 7: Interface of the generated factory for bikes, and the interface class.
Our approach is not the first to generate API code from SPARQL queries: the tool grlc [12]
takes the same route. However, grlc does not connect to a programming language directly,
but merely provides web developers with a more familiar way to access semantic data without
hiding much of the underlying complexity.
Scheglmann et al. [13] identify the need for more abstraction and customization in generated
APIs. To this end, they propose a method that takes OWL ontologies as an input and transforms
them into two intermediate models in custom modeling languages. These models are edited by
the user to customize the Java API that is generated from the model in the final translation step.
Parreiras et al. [14] address similar issues of customization and portability with Agogo. At its
core, Agogo is a domain-specific language (DSL) for describing mappings of semantic concepts
to object-oriented APIs. Similarly to the intermediate models in [13], it can then be used to
generate API code in a given target language. Agogo is also particularly interesting in that
it allows users to define their own SPARQL queries to read and update properties, making it
query-based in a way that is similar to our approach.
Compared to these approaches, inkblot offers similar possibilities for abstraction and cus-
tomization but makes different trade-offs, favouring a higher degree of automation and standard
formats over unlimited customization.
Another approach that could be described as query-based is LITEQ [15], which provides a
custom query language that can be used to load and create semantic data dynamically using
existing RDFS annotations. From a usage perspective, however, LITEQ appears closer to libraries
like Jena than the other presented approaches: while it allows for type-safe access to data, it
does not provide the kind of fully encapsulated API typically generated by other tools.
5
https://protegewiki.stanford.edu/wiki/Protege-OWL_Code_Generator
Still, these last three approaches are closer in spirit to inkblot than purely OWL-based
generation methods, as they allow for deviations between data access APIs and the underlying
data model.
7. Conclusion
Semantic technologies are hard to use for non-experts. This work describes a step toward
enabling the development of applications that use semantic data through an automatically
generated API that almost completely hides the semantic technologies from the programmer. We envision
that inkblot configurations are maintained by the developers of the semantic dataset or
ontology and provided to external programmers. In this way, semantic web technologies are
easier to integrate into larger applications.
We focus on the use of SPARQL queries as the interface between OO and RDF, in contrast to
approaches that focus on either generating OO code from an ontology or manipulating raw
RDF data. We conjecture that this is a more elegant and practical way to connect the worlds
of semantic data and programming than importing concepts from one side into the other.
Future Work. For the next step, we plan to implement a Python backend. As for usability, we
plan to investigate a DSL as part of the runtime to hide the currently used SPARQL filter
expressions, as well as the ability to define subtypes over configurations based on the Liskov
principle for RDF data loading [8]. Furthermore, we expect that the local change log can be
simplified, similar to the ocdm API [9], to improve performance.
Acknowledgments
This work was partially supported by the SM4RTENANCE project, an EU H2020 project under
grant agreement No. 101123423.
References
[1] N. M. Goldman, Ontology-oriented programming: Static typing for the inconsistent
programmer, in: ISWC, volume 2870 of Lecture Notes in Computer Science, Springer, 2003,
pp. 850–865.
[2] A. Kalyanpur, D. J. Pastor, S. Battle, J. A. Padget, Automatic mapping of OWL ontologies
into Java, in: SEKE, 2004, pp. 98–103.
[3] J.-B. Lamy, Owlready: Ontology-oriented programming in python with automatic clas-
sification and high level constructs for biomedical ontologies, Artificial Intelligence
in Medicine 80 (2017) 11–28. URL: https://www.sciencedirect.com/science/article/pii/S0933365717300271. doi:10.1016/j.artmed.2017.07.002.
[4] G. Stevenson, S. Dobson, Sapphire: Generating Java runtime artefacts from OWL ontolo-
gies, in: CAiSE Workshops, volume 83 of Lecture Notes in Business Information Processing,
Springer, 2011, pp. 425–436.
[5] G. P. Copeland, D. Maier, Making Smalltalk a database system, in: B. Yormark (Ed.),
SIGMOD, ACM Press, 1984, pp. 316–325. URL: https://doi.org/10.1145/602259.602300.
doi:10.1145/602259.602300.
[6] V. Eisenberg, Y. Kanza, Ruby on semantic web, in: ICDE, IEEE Computer Society, 2011,
pp. 1324–1327. URL: https://doi.org/10.1109/ICDE.2011.5767945. doi:10.1109/ICDE.2011.5767945.
[7] S. Baset, K. Stoffel, Object-oriented modeling with ontologies around: A survey of existing
approaches, Int. J. Softw. Eng. Knowl. Eng. 28 (2018) 1775–1794.
[8] E. Kamburjan, V. N. Klungre, M. Giese, Never mind the semantic gap: Modular, lazy and
safe loading of RDF data, in: ESWC, volume 13261 of Lecture Notes in Computer Science,
Springer, 2022, pp. 200–216.
[9] S. Persiani, M. Daquino, S. Peroni, A programming interface for creating data according
to the SPAR ontologies and the OpenCitations data model, in: ESWC, volume 13261 of
Lecture Notes in Computer Science, Springer, 2022, pp. 305–322.
[10] M. Babik, L. Hluchy, Deep integration of Python with Web Ontology Language, in: C. Bizer,
S. Auer, L. Miller (Eds.), 2nd Intl. Workshop on Scripting for the Semantic Web, volume
181 of CEUR Workshop Proceedings, 2006. URL: http://CEUR-WS.org/Vol-181/paper1.pdf.
[11] E. Oren, R. Delbru, S. Gerke, A. Haller, S. Decker, ActiveRDF: object-oriented semantic
web programming, in: WWW, ACM, 2007, pp. 817–824.
[12] A. Meroño-Peñuela, R. Hoekstra, grlc makes GitHub taste like linked data APIs, in:
SALAD@ESWC, volume 1629 of CEUR Workshop Proceedings, CEUR-WS.org, 2016.
[13] S. Scheglmann, A. Scherp, S. Staab, Declarative representation of programming access to
ontologies, in: ESWC, volume 7295 of Lecture Notes in Computer Science, Springer, 2012,
pp. 659–673.
[14] F. S. Parreiras, C. Saathoff, T. Walter, T. Franz, S. Staab, APIs à gogo: Automatic generation
of ontology APIs, in: ICSC, IEEE Computer Society, 2009, pp. 342–348.
[15] M. Leinberger, S. Scheglmann, R. Lämmel, S. Staab, M. Thimm, E. Viegas, Semantic web
application development with LITEQ, in: ISWC (2), volume 8797 of Lecture Notes in
Computer Science, Springer, 2014, pp. 212–227.