<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEBD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>API for Ontology-driven LPG Graph DB Management</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Di Pierro</string-name>
          <email>davide.dipierro@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Ferilli</string-name>
          <email>stefano.ferilli@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Bari Aldo Moro</institution>
          ,
          <addr-line>Via Edoardo Orabona, 4, 70125 Bari BA</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>31</volume>
      <fpage>02</fpage>
      <lpage>05</lpage>
      <abstract>
<p>Graph databases are nowadays in widespread use for both scientific and industrial purposes. Graph models are also exploited in Artificial Intelligence to represent Knowledge Bases, in which the mere storage of data is paired with formal ontologies that allow us to interpret the data, express constraints on them, and reason about them. While the standard practice for Knowledge Graphs is based on the RDF graph model, a new framework named GraphBRAIN was recently proposed, based on the Labelled Property Graph model and more oriented towards the DB perspective. Here we describe an API that enacts the new framework and makes it available to serve the needs of independent applications interested in using knowledge bases. The API allows ontology-compliant access to the data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Currently, extensive work is being devoted to enhancing software with technologies and methods for
handling ever-growing and diverse data. Although graph DBs offer interpretability and scalability,
their scheme-less nature hinders access to data semantics, which is crucial in this landscape. Attempts to
address this issue have involved introducing constraints or limitations. However, proper schemes
would have greater power in determining data representation in the DB. An even stronger
solution would involve using formal ontologies as DB schemes, enabling the application
of advanced AI solutions and technologies. Unfortunately, the standard graph model used in
research on ontologies is different from the Labelled Property Graph (LPG) model adopted by
current leading graph DBs such as Neo4j. The GraphBRAIN framework [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] has been proposed
as a solution: it can define ontologies specifically suited for the LPG model, and use them as
schemes for Neo4j DBs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], obtaining a fully-fledged knowledge base.
      </p>
      <p>In this paper, we propose the first prototype of an API by which applications can use
GraphBRAIN technology to consult or manipulate a knowledge base and reason with it. The API allows
users to define, modify or reuse ontologies, to access and manipulate the DB in compliance
with a given ontology, and to run advanced functions on the data, such as automated reasoning
and mining. Specifically concerning data access and manipulation, the API allows users to
issue queries using the standard Cypher language of the Neo4j DB, while automatically checking
their compliance with the specified ontology/scheme in a way that is fully transparent to the
user. We consider this a relevant added value to standard graph DB management, since it allows
users to create their own schemes and query their data as in traditional DBMS settings. Other
features are also available in the API. GraphBRAIN ontologies and instances can be translated
into Semantic Web (SW) standards, mapped onto existing SW resources, and passed to SW
reasoners to perform classical tasks such as consistency checking and instance checking, or to
infer new knowledge. Still, since GraphBRAIN ontologies are independent of SW approaches,
they can also be mapped onto other formalisms, enabling further kinds of automated reasoning
(e.g., rule-based MultiStrategy reasoning).</p>
      <p>The rest of the paper is organized as follows. After reporting related work in Sec. 2 and some
background in Sec. 3, in Sec. 4 we recall the GraphBRAIN formalism to define ontologies to be
used as schemes for LPG-based graph DBs. Then, Sec. 5 describes the API we have developed
and shows examples of its use. Sec. 6 concludes the paper and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>To the best of our knowledge, there is no previous attempt to integrate graph DBs and ontologies
into an API. However, many applications do provide an API interfacing with
a DB to support the integration, management and sharing of knowledge.</p>
      <p>
        There is great interest in the field of medicine, where shared knowledge must be brought
together to work towards a common objective. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] provided a web-based portal and an API to
connect to a relational database for the virology community. The applications Jmol and STRAP
were implemented to visualize and interact with virus molecular structures and to provide
sequence–structure alignment capabilities. Thanks to extended curation practices that maintain
data uniformity, the relational database, based on a scheme for macromolecular structures, and the
APIs provided enhance the ability to perform structural bioinformatics experiments and studies
on virus capsids. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] published different versions of APIs and datasets to help the medical
community study bacteria. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] built a high-performance database of proteins. They provided
an API for querying that simplifies SQL statements by means of object-oriented structures. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
developed an ontology of chemical entities of biological interest, named ChEBI, which contains
a wealth of chemical data. Unlike many other resources, ChEBI is human-curated, providing a
reliable, non-redundant collection of chemical entities and related data.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] developed the Spectral Physics Environment for Advanced Remote Sensing (SPEARS)
application programming interface. It is a local thermal equilibrium (LTE) spectral modelling tool
optimized for synthesizing optical spectra from any combination of fundamental spectroscopic
databases. The API only requires the user to specify the following parameters: the
thermodynamic state of the species (i.e., temperature, pressure, concentration), the model
geometry, and the model resolution.
      </p>
      <p>The added value of APIs is collaboration, that is, the possibility of creating a community
which voluntarily contributes to the expansion of some source of knowledge. The technological</p>
      <sec id="sec-2-1">
        <title>Note 1: in this paper, we use the terms ‘scheme’ and ‘ontology’ interchangeably.</title>
        <p>
          leaps moved the discipline of biology towards a more information-based one, giving rise to new
fields of study like bioinformatics. In parallel, algorithms and hardware need to evolve
accordingly to satisfy these growing needs. The success in solving these issues led to the
development of projects like the 1000 Genomes Project for humans [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], the 1001 Genomes
Project for Arabidopsis [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and Ensembl [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>Apart from APIs, different works are worth mentioning when discussing the
potential of ontologies and graph DBs.</p>
        <p>
          As said, the use of graph structures is a common solution to store ontological definitions.
Elbattah et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] compared the use of Neo4j with other DBs and/or structures, analysing the
pros and cons of this well-known strategy.
        </p>
        <p>
          Gong et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] reported the case of an RDF-based ontology stored in an LPG-based
graph; the results are promising when the focus is on querying, storing and scaling, without
worrying about inference and consistency issues.
        </p>
        <p>
          Neo4j (and graph DBs in general), differently from ontological settings, is not intended
to perform reasoning on the data. As a consequence, a DB alone will never be enough to capture
the peculiarities of ontological representations. To support DBs in this task, additional features (plugins)
may come in handy. Neo4j is strongly optimised to perform queries and navigate the graph
rapidly; hence, moving from one perspective to the other entails losing some advantages. That is
the reason why, in our previous works, we described a separation-of-concerns strategy to
distinguish instances from schemes. In [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] a survey of works related to this topic is provided.
        </p>
        <p>A traditional approach to imposing constraints on the data, e.g. in Neo4j, is to use specific
queries to limit some attributes or make them mandatory [14]. However, from our perspective, this may
be considered limiting, because it cannot express relationships between higher-level concepts,
represented by the labels of nodes in the graph.</p>
        <p>Taking everything into consideration, with this work we aim to abstract from specific fields
of application while leveraging a very flexible database, so as to easily integrate different sources
of knowledge. This API will allow researchers to create their own LPG-based KG, taking
advantage of the integration of schemes without losing performance in the manipulation and
management of instances.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Background</title>
      <p>Graph DBs may adopt different graph models, and various providers offer different
implementations [15]. A major difference is in their structure. A standard graph-based model for Knowledge
Representation in AI is the Resource Description Framework (RDF), where the graph is described
as a set of atomic triples ⟨s, p, o⟩, each establishing a relationship (arc) p between a subject
resource s and an object resource or literal value o (nodes). A Property Graph [16] is a directed,
edge-labelled, attributed, multi-graph [17]:</p>
      <p>G = (V, I_V, id_V, E, I_E, id_E, A_V, A_E, D_V, D_E)
where V is a set of vertices, I_V a set of vertex identifiers, id_V : V → I_V a function from vertices
to their identifiers, E a set of directed edges, I_E a set of edge identifiers, id_E : E → I_E a function
from edges to their identifiers, A_V (resp., A_E) the vertex (resp., edge) attributes domain, and D_V
(resp., D_E) the domain of allowed vertex (resp., edge) attribute values. It allows representing
properties on nodes, in the form of attribute-value pairs. One cannot make any assumption on
the attributes available in a node unless additional constraints are introduced. The idea is to
ensure flexibility: integration of knowledge is facilitated when there is no need to satisfy
strict node or relationship constraints. Indeed, real-world data are often noisy, incorrect or
incomplete, likely making the integration of knowledge a cumbersome process. Labelled PGs
(LPGs) [18] evolve the PG model so that nodes and arcs may have labels. To give an analogy
with the traditional relational model, assigning a label to a node corresponds to stating that it is
an instance of the class specified by the label. A node may be associated with multiple labels,
because in real-world scenarios not all classes are disjoint. One of the most famous graph DBs,
Neo4j, adopts the LPG model.</p>
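      <p>To make the LPG notions above concrete, the following minimal Python sketch (an illustrative data structure, not part of GraphBRAIN) models labelled, attributed nodes and directed edges, each with a unique identifier:</p>

```python
import itertools

class LPG:
    """A minimal Labelled Property Graph: nodes and directed edges
    carry labels and attribute-value pairs, each with a unique id."""

    def __init__(self):
        self._ids = itertools.count(1)  # plays the role of id_V / id_E
        self.nodes = {}                 # id -> (labels, properties)
        self.edges = {}                 # id -> (source, target, label, properties)

    def add_node(self, labels, **props):
        nid = next(self._ids)
        self.nodes[nid] = (set(labels), props)
        return nid

    def add_edge(self, source, target, label, **props):
        eid = next(self._ids)
        self.edges[eid] = (source, target, label, props)
        return eid

g = LPG()
movie = g.add_node(["Movie"], title="The Matrix", url="https://themoviedb.org/movie/603")
actor = g.add_node(["Person", "Actor"], name="Keanu Reeves")
acted = g.add_edge(actor, movie, "ACTED_IN", roles=["Neo"])

# Multiple labels are allowed because classes need not be disjoint.
print(sorted(g.nodes[actor][0]))   # ['Actor', 'Person']
```

      <p>Note how the same pair of nodes could receive a second ACTED_IN edge with a distinct identifier, which plain RDF triples cannot express directly.</p>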
      <p>Among all NoSQL DB models [19], graph DBs are gaining momentum both in academia and
in industry thanks to their flexible, intuitive, and interpretable structure. In graph
DBs, the data are stored in nodes, and nodes (or data blocks) may be associated with one another
via connections called edges [20], relationships which provide index-free adjacency. They are
described by the following main characteristics:
• the data and/or the scheme are represented by graphs, or by data structures generalizing
the notion of a graph (hypergraphs or hypernodes);
• manipulation is obtained by transformations on a graph, or by operations whose main
primitives are on graph features like paths, neighbourhoods, subgraphs, graph patterns,
connectivity, and graph statistics;
• integrity constraints enforce data consistency. These constraints can be grouped into
scheme-instance consistency, identity and referential integrity, and functional and
inclusion dependencies [21].</p>
      <p>An example of data manipulation and constraining is given in Fig. 1, which shows a fragment
of the Movie dataset concerning the actors starring in a specific movie.</p>
      <p>We can add a constraint on nodes labelled ‘Movie’, saying that the URLs of movies must be
unique, using Cypher:</p>
      <p>CREATE CONSTRAINT unique_rel FOR (n:Movie) REQUIRE n.url IS UNIQUE</p>
      <sec id="sec-3-1">
        <title>Notes: Neo4j: https://neo4j.com/; Movie dataset: https://github.com/neo4j-graph-examples/recommendations</title>
        <p>After the activation of this constraint, trying to insert a node having an already present URL
would return the following:</p>
        <p>Node(4987) already exists with label 'Movie' and
property 'url' = 'https://themoviedb.org/movie/618'
In Neo4j there are other types of constraints, also regarding arcs, but they are not nearly as
powerful as ontology definitions.</p>
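        <p>The effect of such a uniqueness constraint can be emulated in a few lines of Python (an illustrative sketch of the behaviour, not how Neo4j implements it internally): an index from property value to node id is consulted before every insertion.</p>

```python
class UniqueConstraintError(Exception):
    pass

class MovieStore:
    """Toy store enforcing: nodes labelled 'Movie' must have a unique 'url'."""

    def __init__(self):
        self.nodes = {}        # node id -> properties
        self.url_index = {}    # url -> node id (the uniqueness index)
        self._next_id = 0

    def create_movie(self, **props):
        url = props.get("url")
        if url in self.url_index:
            # mirrors the shape of the Neo4j error message quoted above
            raise UniqueConstraintError(
                "Node(%d) already exists with label 'Movie' and property 'url' = '%s'"
                % (self.url_index[url], url))
        nid = self._next_id
        self._next_id += 1
        self.nodes[nid] = props
        if url is not None:
            self.url_index[url] = nid
        return nid

store = MovieStore()
store.create_movie(title="First", url="https://themoviedb.org/movie/618")
try:
    store.create_movie(title="Duplicate", url="https://themoviedb.org/movie/618")
except UniqueConstraintError as e:
    print(e)
```
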
        <p>Compared to previous DB models (e.g., the relational one), graph DBs are more suitable when
the relationships matter more than the single data blocks. Also, the way these relationships
become visible and interpretable (by names) is relevant.</p>
        <p>Graph DBs may provide a solution to a major challenge for DB research, which is to provide a
scalable architecture to support Big Data [22] domains and exploit all state-of-the-art approaches
in Artificial Intelligence (AI) leveraging graph structures. The Big Data problem refers not only
to the volume of data, but also to their variety. Indeed, ‘volume’ and ‘variety’ are two of the
five V’s [23] that distinguish problems to be included in big data scenarios from those that are
not. Moreover, graph DBs often represent an obvious choice when dealing with data that are
inherently organized as graphs, such as networks.</p>
        <sec id="sec-3-1-1">
          <title>3.1. GraphBRAIN</title>
          <p>GraphBRAIN is a knowledge base management system aimed at joining the efficiency in data
handling provided by LPG graph DBs, specifically Neo4j, with the expressive power of ontologies.
Since Neo4j is schemaless, the ontologies act as schemes and guide all the general CRUD
operations available in traditional DB settings. They also help to keep consistency in the
graph, as well as to discern the type of information that can be retrieved. Last but not least, using
ontologies as schemes provides high-level and formal interpretations of the data, and enables
advanced and semantic-aware automated reasoning and mining functions on them.</p>
          <p>
            While standard approaches to ontology description in AI adopt the RDF graph model,
GraphBRAIN defined its own formalism specifically built around the more powerful LPG model [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ].
It can express the standard ontological concepts: (a hierarchy of) entities (or ‘classes’ in SW
terminology), (a hierarchy of) relationships (or ‘object properties’ in SW terminology), user-defined
types, and entity attributes (or ‘datatype properties’ in SW terminology). Differently
from standard SW approaches, it can express attributes on both entities and relationships, it
can label the nodes, and it can set several instances of the same relationship between the same
pair of instances (thanks to unique identifiers automatically assigned to every node and arc in
the graph). More specifically, the following mapping is established between elements of the
graph and elements of the ontologies/schemes:
• entity instances are represented as nodes, labelled with the most specific class (only) to
which they belong;
• relationship instances are represented as arcs, labelled with the most specific relationship
(only) they express;
• literal-valued attributes of entity and relationship instances are represented as node/arc
properties.
          </p>
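          <p>The mapping above can be sketched as follows (an illustrative Python fragment; the class and relationship names are invented for the example, and GraphBRAIN's actual internal representation may differ):</p>

```python
# Sketch of the ontology-to-graph mapping described above:
# entity instances -> labelled nodes, relationship instances -> labelled arcs,
# literal attributes -> node/arc properties.  Identifiers are assigned
# automatically, so the same relationship may link the same pair twice.

next_id = 0
nodes, arcs = {}, {}

def new_id():
    global next_id
    next_id += 1
    return next_id

def add_entity_instance(most_specific_class, **attributes):
    nid = new_id()
    nodes[nid] = {"label": most_specific_class, "props": attributes}
    return nid

def add_relationship_instance(subject, obj, most_specific_rel, **attributes):
    aid = new_id()
    arcs[aid] = {"from": subject, "to": obj,
                 "label": most_specific_rel, "props": attributes}
    return aid

# Hypothetical instances from a 'computing' domain:
chip = add_entity_instance("Chip", name="Z80")
firm = add_entity_instance("Company", name="Zilog")
r1 = add_relationship_instance(firm, chip, "produces", since=1976)
r2 = add_relationship_instance(firm, chip, "produces", since=1985)  # distinct id: allowed
```
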
          <p>GraphBRAIN offers many services for handling both schemes and data. Regarding schemes,
it allows users to create, modify and merge ontologies. Merging is based on the unique name
assumption, that is, elements sharing the same name must refer to the same concept.</p>
          <p>Users may search, browse and modify the database content. They can also explore the
available ontologies and define their own ontologies, from scratch or as a variation of existing
ones. They can also run several graph mining algorithms to obtain relevant indications on the
graph content. Examples of the functionalities provided by GraphBRAIN are:
• assess the relevance of nodes and arcs in the graph, and extract the most relevant ones;
• extract a portion of the graph that is relevant to some specified starting points (nodes
and/or arcs);
• extract frequent patterns and associated sub-graphs;
• predict possible links between nodes;
• retrieve complex patterns of nodes and relationships;
• translate portions of the graph into a more understandable form.</p>
          <p>Our schemes aim to also provide an abstract middle layer between representations. Storing
schemes in an intermediate format (e.g. XML) gives us the possibility of moving towards
other semantic directions. As an example, the intermediate format may be exported into other
(formal) languages to exploit, for instance, reasoning capabilities. Formal schemes lend themselves
to exploitation in the field of the SW [24]. In [25], a preliminary mapping between our schemes
and SW schemes (serialised in the Web Ontology Language (OWL) [26]) is shown.</p>
          <p>The SW is just one perspective that we can exploit for reasoning purposes. We are able to
translate schemes into a first-order logical language (Prolog), or into the OWL language. In this
perspective, by adding information about instances in the same formalism, we can implement
all reasoning techniques on Knowledge Bases. All approaches come under the umbrella term
“multistrategy reasoning” [27].</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. GraphBRAIN Scheme Specification</title>
      <p>We have mentioned the capability of our solution to deal with schemes without injecting them
into the graph. Schemes are stored in separate files, written in a standardised formalism.</p>
      <p>Let us now quickly describe how GraphBRAIN ontologies/schemes are formally represented.
More detailed information can be found in [28]. Compared to that description, here we refer
to a newer version that also allows us to specify sub-relationships and user-defined types.
GraphBRAIN schemes need identifiers for the concepts that are represented. By convention,
they consist of uppercase letters, lowercase letters or decimal digits only.</p>
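      <p>The identifier convention just stated can be checked with a one-line regular expression (an illustrative sketch; the paper does not show GraphBRAIN's actual validation code):</p>

```python
import re

# GBS identifiers: uppercase letters, lowercase letters or decimal digits only.
IDENT = re.compile(r"^[A-Za-z0-9]+$")

def is_valid_gbs_identifier(name):
    return bool(IDENT.match(name))

print(is_valid_gbs_identifier("ElectronicComponent"))  # True
print(is_valid_gbs_identifier("mayReplace"))           # True
print(is_valid_gbs_identifier("may-replace"))          # False: hyphen not allowed
```
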
      <p>GraphBRAIN schemes are expressed as XML files, organized according to the specifications
of a so-called GraphBRAIN Scheme (GBS) format. A scheme consists of an XML file whose tags
provide all the kinds of components in GBS ontology. In order to simplify the description of
the components, we will write XML tag names in boldface, XML tag attribute names in italics
and entity or relationship names in small caps. Text in plain typeface reports comments useful
to understand the various elements and their behaviour. For ease of reference, we report in
Figure 2 an excerpt of a GBS scheme defining the ontology for the domain of ‘computing’, and
will use it as a running example in the rest of this section. As visible in Figure 2, the root tag in
the XML is domain and specifies the name of the scheme in its name attribute.</p>
      <p>Schema modularity Each GBS scheme is intended to describe one domain. Still, to enforce
modular knowledge representation, some schemes might want to reuse the elements of other,
more ‘basic’ schemes (e.g., a scheme describing the Cultural Heritage domain might want
to reuse the scheme describing the Library domain). The first (optional) section of a GBS
file, enclosed in tag imports, allows doing this. Each scheme to be imported is specified in
the scheme attribute of an import tag (e.g., in Figure 2 the scheme ‘general’ is imported into
‘computing’). Should some elements of the imported scheme be unnecessary, they might be
removed using delete tags in this section (a feature not used in Figure 2). Imports are transitive,
that is, an imported scheme may in turn import others. Obviously, loops in importing are not
allowed. When defining an element (entity or relationship) with the same name as a previously
imported one, GraphBRAIN will assume they refer to the same concept, and try to merge them
if compatible, or raise an error. Merging involves taking the union of their attributes (attributes
with the same name must be of the same type) and of their sub-classes or sub-relationships (it
is forbidden that a class A is a subclass of a class B in one scheme and vice-versa in the other).</p>
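      <p>The attribute-merging rule just described can be sketched as follows (an illustrative Python fragment under the unique name assumption; the real GraphBRAIN merge also handles hierarchies and deletions):</p>

```python
def merge_entities(a, b):
    """Merge two same-name entity definitions, each a dict mapping
    attribute name -> attribute type, taking the union of attributes.
    Same-name attributes must have the same type."""
    merged = dict(a)
    for name, typ in b.items():
        if name in merged and merged[name] != typ:
            raise ValueError(
                "attribute '%s' has conflicting types: %s vs %s"
                % (name, merged[name], typ))
        merged[name] = typ
    return merged

# Two schemes that both define a (hypothetical) 'Device' entity:
computing = {"name": "string", "year": "integer"}
general = {"name": "string", "description": "text"}

device = merge_entities(computing, general)
print(sorted(device))   # ['description', 'name', 'year']
```
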
      <p>In the next (again, optional) section of a GBS file, enclosed in the types tag, new datatypes
can be defined for use in the entity or relationship attributes. Again, this supports reuse and
modularity. Each user-defined type is enclosed in a type tag. Its name, specified in the name
attribute, must be unique and will be referenced in the entity or relationship attributes. A
user-defined type consists of an enumeration of values of two possible kinds (reported in the
datatype attribute): ‘select’ (representing a plain list of values) or ‘tree’ (representing a hierarchy
of values). The lists of values are enclosed in tag values, with each value specified in a tag value
under attribute name. For ‘tree’ datatypes, tag value may recursively include nested values
tags. In Figure 2, a ‘select’ type logisticOperator is defined to represent logistic operators, and
used for attribute function of entity ElectronicComponent.</p>
      <p>Entities and relationships The last two sections in a GBS file, enclosed in tags entities
and relationships, allow specifying the classes and relationships of the ontology and their
generalization/specialization hierarchies. The universal entity Entity and the universal
relationship Relationship are the roots of their corresponding hierarchies and are abstract. We
call entities and relationships at the first level of these hierarchies top-level. Top-level entities (resp.,
relationships) are specified using tag entity (resp., relationship). Direct specializations of a
class (resp., relationship) can be specified in an optional taxonomy tag (used only if the entity
or relationship has direct specializations), each enclosed in a value tag. The taxonomy tag is
recursive (value tags may in turn include it if they have further direct specializations in the
hierarchy). In Figure 2, top-class Award has no sub-classes, while top-class
ElectronicComponent has a direct sub-class Chip, which in turn has several direct sub-classes. All classes and
relationships, except abstract ones (for which attribute abstract=“true”), may have instances.
Attributes Each entity or relationship may have attributes, enclosed in the attributes tag.
It is mandatory for non-abstract entities (their instances must be described by some attribute)
and optional for relationships (a relationship carries information in its very linking of two
instances). Each attribute is described using an attribute tag, which reports, among other
information, its name (in attribute name) and type (in attribute datatype). It can take values from
a primitive data type (if datatype is one of ‘integer’, ‘real’, ‘boolean’, ‘string’, ‘text’, ‘date’), an
enumeration (if datatype is ‘select’ or ‘tree’, following the same specification as for user-defined
data types), a user-defined type (if datatype is the name of a user-defined type) or an instance
of another class in the ontology (if datatype is ‘entity’, in which case attribute target specifies the
class). Attributes of type ‘entity’ are represented in the graph as arcs connecting the class
instance to the other entity’s instance; the same happens for dates, where the target entity
is one of the GraphBRAIN pre-defined classes Day, Month or Year. These actually represent
1:1 (i.e., functional) relationships between an instance of the entity and an instance of the
target entity. As usual in generalization taxonomies (e.g., in the Object Oriented programming
paradigm [29] or in ontologies [30]), sub-classes and sub-relationships inherit the attributes of
all their generalizations.</p>
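      <p>Attribute inheritance along the generalization hierarchy can be sketched as follows (an illustrative Python fragment; the entity and attribute names are invented, and GraphBRAIN's real data structures may differ):</p>

```python
# Each entity lists its direct attributes and (at most) one parent;
# effective attributes are inherited from all generalizations.
schema = {
    "Entity":              {"parent": None,                  "attrs": {}},
    "ElectronicComponent": {"parent": "Entity",              "attrs": {"function": "select"}},
    "Chip":                {"parent": "ElectronicComponent", "attrs": {"pins": "integer"}},
}

def effective_attributes(entity):
    attrs = {}
    while entity is not None:
        node = schema[entity]
        for name, typ in node["attrs"].items():
            attrs.setdefault(name, typ)   # the most specific definition wins
        entity = node["parent"]
    return attrs

print(sorted(effective_attributes("Chip")))   # ['function', 'pins']
```
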
      <p>Subject-Object pairs Relationships must also express their subject and object entities [31]
(or ‘domain’ and ‘range’, using SW terminology). Relationships with the same name may be
established between different pairs of entities. The set of pairs is enclosed in a references tag,
within which each pair is specified by a reference tag, whose subject and object attributes specify
the two classes. Note that the pairs are independent of each other, so they actually identify
different relationships (in SW approaches they would get different names). Sub-relationships
inherit all references from the parent (clearly, subjects and objects of sub-relationships must
be the same as the subjects or objects of their parents, or subclasses thereof). In Figure 2,
relationship mayReplace may be established between two ElectronicComponents and has a
sub-relationship replace.</p>
      <p>5. API</p>
      <p>While a Web application is available to interactively query, explore and modify a GraphBRAIN
knowledge graph4, our objective is to make the technology available to other applications. If
powered by GraphBRAIN, these applications might boost their effectiveness, and provide new
and advanced support to their users. To this aim, we propose an API providing all GraphBRAIN
functionality, from ontology definition to ontology-based data access and manipulation, from
semantic aware data mining to automated human-like reasoning on the available knowledge.
The first prototype of the API was developed in Java language, obtaining a jar that can be
imported as a library in Java applications and used to obtain the GraphBRAIN functionality.</p>
      <sec id="sec-4-1">
        <title>5.1. Functionality</title>
        <p>Development of the GraphBRAIN API is an ongoing project. We will list and describe here the
functions that are already fully implemented. A first requirement was that the API must provide
at least all the functions that are available in the interface, as a direct query or by a combination
of queries. Every function in the API must guarantee consistency in the DB. Hence, creations
and updates of nodes, relationships, and attributes are guided by the scheme previously created
through the API itself.</p>
        <p>We first implemented the most general CRUD operations, common to all DB applications,
both for the instances (the content of the DB):
• login: log in to a specific instance of Neo4j by providing the URL and credentials (user,
password).
• create node: create a node by assigning it a label (and optionally properties).
• connect two nodes with an arc: create an arc by specifying properties that identify
the subject and object of the relationship.
• create arc(s): same functionality as the previous one, but not limited to a 1-to-1
relationship.
• delete node/arc: delete a node/arc by specifying its properties.
• filter nodes/arcs: return the list of nodes/arcs identified by specific properties.
• get node/relationship info: return info about a specific node/relationship by specifying
known properties.
and for scheme/ontology definition:
• load scheme: load an entire GBS or OWL scheme.
• create/rename/delete class: add, rename or remove a class to/in/from the ontology.
• create/rename/delete relationship: add, rename or remove a relationship to/in/from
the ontology.
• create attribute: create an attribute by specifying at least its name and type.
• add/remove attribute to/from class: add/remove a previously-created attribute
to/from a class.
• add/remove attribute to/from relationship: add/remove a previously-created
attribute to/from a relationship.
(4: the GraphBRAIN Web application is available at http://193.204.187.73:8088/GraphBRAIN/)</p>
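        <p>The CRUD surface above can be illustrated with a toy in-memory facade (a hypothetical Python mock-up of the API's shape; the actual GraphBRAIN API is a Java library, and its method names may differ):</p>

```python
class ToyGraphAPI:
    """In-memory stand-in mirroring the CRUD operations listed above."""

    def __init__(self):
        self.nodes = {}   # node id -> {"label": ..., "props": {...}}
        self.arcs = []    # list of {"from": ..., "to": ..., "label": ...}
        self._next = 0

    def create_node(self, label, **props):
        self._next += 1
        self.nodes[self._next] = {"label": label, "props": props}
        return self._next

    def _find(self, **props):
        return [nid for nid, n in self.nodes.items()
                if all(n["props"].get(k) == v for k, v in props.items())]

    def connect(self, subject_props, object_props, label):
        # create an arc by specifying properties identifying subject and object
        (s,), (o,) = self._find(**subject_props), self._find(**object_props)
        self.arcs.append({"from": s, "to": o, "label": label})

    def delete_node(self, **props):
        for nid in self._find(**props):
            del self.nodes[nid]

    def filter_nodes(self, **props):
        return self._find(**props)

api = ToyGraphAPI()
api.create_node("Chip", name="Z80")
api.create_node("Company", name="Zilog")
api.connect({"name": "Zilog"}, {"name": "Z80"}, "produces")
```
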
        <p>The possibility of loading an external scheme/ontology is of paramount importance, so as to
avoid re-creating large state-of-the-art ontologies from scratch. The API allows importing ontologies in
the GBS formalism or in the standard OWL language. OWL may express more complex constraints,
but in our setting we just import what the GBS scheme allows, ignoring the rest. Moreover, the
API can export the ontology and (a portion of) the graph to OWL/RDF. This allows us to exploit
OWL reasoners to perform common inference operations like instance checking, ontology
consistency checking and so on. For instance, the signatures of the methods to load an entire schema are:
(i) public DomainData(String domainName, File file)
(ii) public static DomainData readOWL(String filePath)
DomainData is the Java class that stores an entire schema. It stores two hierarchies (classes and
relationships) and, for each class (resp., relationship), information about their attributes. Both
methods return a DomainData object: (i) takes the name of the domain and a GBS file, (ii) takes
the path of the OWL source.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Technicalities</title>
        <p>The API requires the (local or online) URL of a Neo4j instance to be queried, and the associated
access credentials. Then, given a GBS schema and a query in the Cypher language, it presents the
result to the user only if the query is scheme-compliant, just like in traditional DBs. For the
sake of simplicity, but without loss of generality, we will show how it works for the following
query, which represents the basic building block of all Cypher queries:
MATCH (n:lN {pN})-[r:tR {pR}]-&gt;(m:lM {pM}) RETURN n.qN, r.qR, m.qM
meaning: “Return properties qN, qR, qM of nodes n, m and arc r, respectively, such that n and m
are connected by arc r, and n, r and m have the specified labels lN, tR, lM and property values
pN, pR, pM, respectively”.</p>
        <p>The general strategy is as follows: the variables (aliases) used in the query to name the
subject, relationship and object are identified, and the labels and properties associated with
them are extracted. Then, the ontology is consulted to check whether relationship tR is valid between
classes lN and lM or their generalizations, in which case the ontology is consulted again to
check whether the properties are valid for the three components (or for their generalizations). If so,
the query is finally issued, and the results are collected and returned.</p>
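<p>As a minimal illustration of this strategy (a toy sketch over a hand-coded schema, not the GraphBRAIN implementation), the validity check of a relationship against the classes or their generalizations might look like:</p>

```java
import java.util.*;

public class SchemaCheck {
    // Toy schema: superclass links, and the valid (subject, object) class pairs per relationship.
    static final Map<String, String> SUPERCLASS = Map.of("Book", "Document");
    static final Map<String, Set<List<String>>> RELATIONSHIPS =
            Map.of("developed", Set.of(List.of("Person", "Document")));

    // Collect a class together with all of its generalizations, walking up the hierarchy.
    static List<String> generalizations(String cls) {
        List<String> out = new ArrayList<>();
        for (String c = cls; c != null; c = SUPERCLASS.get(c)) out.add(c);
        return out;
    }

    // A relationship is valid if it links the two classes or any of their generalizations.
    static boolean validRelationship(String rel, String subj, String obj) {
        Set<List<String>> pairs = RELATIONSHIPS.getOrDefault(rel, Set.of());
        for (String s : generalizations(subj))
            for (String o : generalizations(obj))
                if (pairs.contains(List.of(s, o))) return true;
        return false;
    }

    public static void main(String[] args) {
        // Valid via the generalization Book -> Document.
        System.out.println(validRelationship("developed", "Person", "Book"));
        // Invalid: the pair is reversed.
        System.out.println(validRelationship("developed", "Book", "Person"));
    }
}
```

<p>The property check described above works analogously, looking the attribute up in the class itself and then in each generalization.</p>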
        <p>The API parses the query to extract its building blocks and creates a map structure in which
the keys are the aliases and the attributes store all information about them. So, the first objective
is identifying the aliases. The API first parses the query to distinguish the pattern-matching
clause, introduced by the keyword MATCH, from the results clause, introduced by the keyword
RETURN. The aliases are located in the former, which is therefore further split on the pair of
square brackets that encloses the relationship. In each of the resulting three parts, the alias is
identified as the identifier that precedes the colon :. The labels of the aliases are recognized
because they follow the colon; in turn, they are followed by the properties, recognized because
delimited by braces { and }. Then, additional properties associated with the aliases are located
in the results clause. They are associated with their respective aliases, connected to them by
a dot . and separated by commas ,. These properties are added to the list of properties of the
aliases, if not already present.</p>
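<p>The alias-extraction steps above can be sketched as follows (a simplified, illustrative reimplementation based on regular expressions, not the actual API code):</p>

```java
import java.util.*;
import java.util.regex.*;

// Toy sketch of the alias-extraction strategy: split MATCH from RETURN, split the
// pattern on the square brackets enclosing the relationship, then read alias, label
// and property names in each part, and finally attach the RETURN properties.
public class CypherAliasParser {
    static final Pattern ALIAS = Pattern.compile("(\\w+)\\s*:\\s*(\\w+)");   // alias : Label
    static final Pattern PROPS = Pattern.compile("\\{([^}]*)\\}");           // { ... }
    static final Pattern KEY   = Pattern.compile("(\\w+)\\s*:");             // property name :
    static final Pattern RET   = Pattern.compile("(\\w+)\\s*\\.\\s*(\\w+)"); // alias . property

    /** Maps each alias to a (label, property-name set) pair. */
    public static Map<String, Map.Entry<String, Set<String>>> parse(String query) {
        Map<String, Map.Entry<String, Set<String>>> aliases = new LinkedHashMap<>();
        String match   = query.substring(query.indexOf("MATCH") + 5, query.indexOf("RETURN"));
        String results = query.substring(query.indexOf("RETURN") + 6);

        // The three parts hold subject, relationship and object, respectively.
        for (String part : match.split("[\\[\\]]")) {
            Set<String> props = new LinkedHashSet<>();
            Matcher p = PROPS.matcher(part);
            if (p.find()) {                       // property names precede ':' inside the braces
                Matcher k = KEY.matcher(p.group(1));
                while (k.find()) props.add(k.group(1));
            }
            // The alias precedes the colon, the label follows it (braces stripped first).
            Matcher a = ALIAS.matcher(part.replaceAll("\\{[^}]*\\}", ""));
            if (a.find())
                aliases.put(a.group(1), new AbstractMap.SimpleEntry<>(a.group(2), props));
        }
        // Properties in the RETURN clause are connected to their alias by a dot.
        Matcher r = RET.matcher(results);
        while (r.find())
            if (aliases.containsKey(r.group(1)))
                aliases.get(r.group(1)).getValue().add(r.group(2));
        return aliases;
    }
}
```

<p>On the example query of the next section, this sketch yields n mapped to (Person, {name, surname}), r to (developed, {role}) and m to (Book, {language, title}).</p>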
      </sec>
      <sec id="sec-4-3">
        <title>5.3. Use Examples</title>
<p>Here we show two examples of application of our API. These examples were run on the online
DB available with the GraphBRAIN Web Application, using the retrocomputing ontology/scheme,
also available in the Web Application, which concerns the history of computing.</p>
        <p>The former example regards the execution of a query on an existing DB according to a given
scheme. Using the API we loaded the ontology by specifying the path of its GBS file. This is
the straightforward way when we already have a scheme. After loading, we may modify the
scheme, but here we focus on graph querying. We issued the following example query:
MATCH (n:Person {name: 'Donald', surname: 'Michie'})-[r:developed {role: 'author'}]-&gt;(m:Book
{language: 'ita'}) RETURN n.name, n.surname, r.role, m.title
asking the DB for all the books in the Italian language written by Donald Michie as the author.
The API parses the query and identifies 3 variables: two nodes n, m and an arc r, such that
the label of n is Person, the label of m is Book and the type of r is developed. It looks at the
ontology for a relationship Person.developed.Book and realizes that, while it does not exist,
Person-Document is a valid subject-object pair for relationship developed, where Document
is a superclass of Book. So far, so good. Now, for each such element, the ontology is queried to
check whether the specified attributes are valid. In fact, the ontology confirms that attributes
name and surname are specified for entity Person, attributes language and title are specified
for entity Document (and thus inherited by subclass Book), and attribute role is specified for
relationship developed. So, the query is run and the matching result is returned.</p>
        <p>In the other example, we show how we can move from the graph DB perspective to the SW
one by mapping schemes and instances onto an OWL/RDF-based representation, and executing
an SW reasoner. For the latter task, we used the OWL API 5, which is compatible with Java, and
makes available various SW reasoners, including Pellet [32]. Since mapping a large number of
instances may require a big computational effort, without loss of generality, we exported only
a portion of the whole graph. To demonstrate the versatility of our solution, the second use
case exploits an external ontology regarding tourism 6. To this KB, some Italian instances have
been added in order to create value from reasoning. The exported portion was obtained starting
from a specified node (purposely chosen to belong to the Hotel class specified in the ontology)
and taking all its neighbours within a fixed maximum number of hops from it. Here we
chose 2 hops. Then we performed the following reasoning tasks, concerning both the exported
instances and the scheme: ontology consistency, subclasses of a class, atoms inferences.</p>
        <p>Here is an extract of the results provided by Pellet.</p>
<p>Is the ontology consistent? True
Is it consistent? True
Subclasses of Site:
Nodeset[Node(&lt;http://www.semanticweb.org/ontologies/Hotel.owl#Park&gt;), Node(&lt;http://www.semanticweb.org/ontologies/Hotel.owl#Hotel_3_Stars&gt;), Node(&lt;http://www.semanticweb.org/ontologies/Hotel.owl#Fire_Station&gt;), Node(&lt;http://www.semanticweb.org/ontologies/Hotel.owl#Hotel&gt;), Node(&lt;http://www.semanticweb.org/ontologies/Hotel.owl#Subway_Station&gt;),
...</p>
        <p>Axiom :- SubClassOf(&lt;http://www.semanticweb.org/ontologies/Hotel.owl#WI-FI&gt; &lt;http://www.semanticweb.org/ontologies/Hotel.owl#Public_facility&gt;)
Is axiom entailed by reasoner? :- true
Is axiom contained in ontology? :- true
No. of Explanations :- 1
Explanation :-
[SubClassOf(&lt;http://www.semanticweb.org/ontologies/Hotel.owl#WI-FI&gt; &lt;http://www.semanticweb.org/ontologies/Hotel.owl#Public_facility&gt;)]</p>
<p>The reasoner confirmed the consistency of the KB, built from the online ontology and the
schema-compliant data loaded into GraphBRAIN. It correctly recognized subclasses, and the
axioms it provided were based on the ontological hierarchy only, since no external rules were introduced.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion and Future Work</title>
      <p>Graph DBs are becoming increasingly popular for industrial and research purposes. However,
their lack of schemes hampers data interpretability and semantic operations. To address this, the
GraphBRAIN framework offers a solution by introducing LPG-compatible ontologies for Neo4j,
transforming the DB into a comprehensive knowledge base. The paper presents an API that
allows applications to easily construct, refine, and reuse ontologies, enabling seamless access
and manipulation of data. Future developments include expanding functionalities, language
support (Python), and a full-fledged integration with standard Semantic Web tools.</p>
      <sec id="sec-5-1">
        <title>Notes</title>
        <p>5. https://owlapi.sourceforge.net/</p>
        <p>6. http://www.di.uniba.it/ lisi/ontologies/OnTourism.owl</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
<p>This work was partially supported by the projects FAIR – Future AI Research (PE00000013),
spoke 6 – Symbiotic AI, and CHANGES – Cultural Heritage Active innovation for Next-GEn
Sustainable society (PE00000020), Spoke 3 – Digital Libraries, Archives and Philology, under
the NRRP MUR program funded by the NextGenerationEU.</p>
      <p>[14] J. Pokorný, M. Valenta, J. Kovačič, Integrity constraints in graph databases, Procedia Computer Science 109 (2017) 975–981.</p>
      <p>[15] D. Dominguez-Sal, P. Urbón-Bayes, A. Giménez-Vanó, S. Gómez-Villamor, N. Martínez-Bazan, J. L. Larriba-Pey, Survey of graph database performance on the hpc scalable graph analysis benchmark, in: Web-Age Information Management: WAIM 2010 International Workshops: IWGD 2010, XMLDM 2010, WCMT 2010, Jiuzhaigou Valley, China, July 15-17, 2010 Revised Selected Papers 11, Springer, 2010, pp. 37–48.</p>
      <p>[16] R. Angles, The property graph database model, in: AMW, 2018, pp. 1–10.</p>
      <p>[17] S. Jouili, V. Vansteenberghe, An empirical comparison of graph databases, in: 2013 International Conference on Social Computing, IEEE, 2013, pp. 708–715.</p>
      <p>[18] M. Y. Kpiebaareh, W.-P. Wu, S. Bayitaa, C. R. Haruna, L. Tandoh, User-connection behaviour analysis in service management using bipartite labelled property graph, in: Proceedings of the 16th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, 2019, pp. 318–327.</p>
      <p>[19] S. Sharma, R. Shandilya, S. Patnaik, A. Mahapatra, Leading nosql models for handling big data: a brief review, International Journal of Business Information Systems 22 (2016) 1–25.</p>
      <p>[20] G. Shefali Patil, A. Bhatia, Graph databases – an overview, 2 (2014) 657–660.</p>
      <p>[21] R. Angles, C. Gutierrez, Survey of graph database models, ACM Computing Surveys (CSUR) 40 (2008) 1–39.</p>
      <p>[22] S. Sagiroglu, D. Sinanc, Big data: A review, in: 2013 International Conference on Collaboration Technologies and Systems (CTS), IEEE, 2013, pp. 42–47.</p>
      <p>[23] T. L. Nguyen, A framework for five big v's of big data and organizational culture in firms, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 5411–5413.</p>
      <p>[24] H.-G. Kim, Semantic web, Retrieved from http://semanticweb.org/wiki/Main_Page.html (2003).</p>
      <p>[25] D. Di Pierro, D. Redavid, S. Ferilli, Linking graph databases and semantic web for reasoning in library domains, in: Proceedings of the 18th Italian Research Conference on Digital Libraries, volume 3160, 2022, pp. 1–12. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85134261683&amp;partnerID=40&amp;md5=7da94dc9c8902e78b717199709f28991.</p>
      <p>[26] D. Allemang, J. Hendler, Semantic web for the working ontologist: effective modeling in RDFS and OWL, Elsevier, 2011.</p>
      <p>[27] S. Ferilli, Gear: A general inference engine for automated multistrategy reasoning, Electronics 12 (2023) 256.</p>
      <p>[28] S. Ferilli, Integration strategy and tool between formal ontology and graph database technology, Electronics 10 (2021) 2616.</p>
      <p>[29] T. Rentsch, Object oriented programming, ACM Sigplan Notices 17 (1982) 51–57.</p>
      <p>[30] B. Smith, Ontology, in: The furniture of the world, Brill, 2012, pp. 47–68.</p>
      <p>[31] S. Decker, S. Melnik, F. van Harmelen, D. Fensel, M. Klein, J. Broekstra, M. Erdmann, I. Horrocks, The semantic web: The roles of XML and RDF, IEEE Internet Computing 4 (2000) 63–73.</p>
      <p>[32] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, Y. Katz, Pellet: A practical OWL-DL reasoner, Journal of Web Semantics 5 (2007) 51–53.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Redavid</surname>
          </string-name>
          ,
          <article-title>The graphbrain system for knowledge graph management and advanced fruition</article-title>
          ,
          <source>in: Foundations of Intelligent Systems: 25th International Symposium, ISMIS</source>
          <year>2020</year>
          , Graz, Austria,
          <source>September 23-25</source>
          ,
          <year>2020</year>
          , Proceedings, Springer,
          <year>2020</year>
          , pp.
          <fpage>308</fpage>
          -
          <lpage>317</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Redavid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Pierro</surname>
          </string-name>
          ,
          <article-title>Lpg-based ontologies as schemas for graph dbs</article-title>
          ,
          <source>in: Proceedings of the SEBD 2022: The 30th Italian Symposium on Advanced Database Systems (SEBD</source>
          <year>2022</year>
          ), volume
          <volume>3194</volume>
          of
          <source>Central Europe (CEUR) Workshop Proceedings</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>256</fpage>
          -
          <lpage>267</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Carrillo-Tripp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Shepherd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Borelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Venkataraman</surname>
          </string-name>
          , G. Lander,
          <string-name>
            <given-names>P.</given-names>
            <surname>Natarajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , C. L.
          <string-name>
            <surname>Brooks</surname>
            <given-names>III</given-names>
          </string-name>
          , V. S. Reddy,
          <article-title>Viperdb2: an enhanced and web api enabled relational database for structural virology</article-title>
          ,
          <source>Nucleic acids research</source>
          <volume>37</volume>
          (
          <year>2009</year>
          )
          <fpage>D436</fpage>
          -
          <lpage>D442</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Funke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Renaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Freney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Riegel</surname>
          </string-name>
          ,
          <article-title>Multicenter evaluation of the updated and extended api (rapid) coryne database 2.0</article-title>
          ,
          <source>Journal of Clinical Microbiology</source>
          <volume>35</volume>
          (
          <year>1997</year>
          )
          <fpage>3122</fpage>
          -
          <lpage>3126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E. R.</given-names>
            <surname>Jefferson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. P.</given-names>
            <surname>Walsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Barton</surname>
          </string-name>
          ,
          <article-title>Snappi-db: a database and api of structures, interfaces and alignments for protein-protein interactions</article-title>
          ,
          <source>Nucleic acids research</source>
          <volume>35</volume>
          (
          <year>2007</year>
          )
          <fpage>D580</fpage>
          -
          <lpage>D589</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Swainston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hastings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dekker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muthukrishnan</surname>
          </string-name>
          , J. May,
          <string-name>
            <given-names>C.</given-names>
            <surname>Steinbeck</surname>
          </string-name>
          , P. Mendes,
          <article-title>libchebi: an api for accessing the chebi database</article-title>
          ,
          <source>Journal of Cheminformatics</source>
          <volume>8</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Murzyn</surname>
          </string-name>
          , E. Jans,
          <string-name>
            <given-names>M.</given-names>
            <surname>Clemenson</surname>
          </string-name>
          ,
          <article-title>Spears: A database-invariant spectral modeling api</article-title>
          ,
          <source>Journal of Quantitative Spectroscopy and Radiative Transfer</source>
          <volume>277</volume>
          (
          <year>2022</year>
          )
          <fpage>107958</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Siva</surname>
          </string-name>
          ,
          <article-title>1000 genomes project</article-title>
          ,
          <source>Nature biotechnology</source>
          <volume>26</volume>
          (
          <year>2008</year>
          )
          <fpage>256</fpage>
          -
          <lpage>257</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Weigel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mott</surname>
          </string-name>
          ,
          <article-title>The 1001 genomes project for arabidopsis thaliana</article-title>
          ,
          <source>Genome biology 10</source>
          (
          <year>2009</year>
          )
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M.</given-names>
            <surname>McLaren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Birney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stabenau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Flicek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          ,
          <article-title>A database and api for variation, dense genotyping and resequencing data</article-title>
          ,
          <source>BMC bioinformatics 11</source>
          (
          <year>2010</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Elbattah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roushdy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aref</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-B. M.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <article-title>Large-scale ontology storage and query using graph database-oriented approach: The case of freebase</article-title>
          ,
          <source>in: 2015 IEEE Seventh International Conference on Intelligent Computing and Information Systems (ICICIS)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>39</fpage>
          -
          <lpage>43</lpage>
          .
          <source>doi:10.1109/IntelCIS.2015.7397191</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>Neo4j graph database realizes efficient storage performance of oilfield ontology</article-title>
          ,
          <source>PloS one 13</source>
          (
          <year>2018</year>
          )
          <fpage>e0207595</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Di Pierro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Redavid</surname>
          </string-name>
          ,
          <article-title>Lpg-based knowledge graphs: A survey, a proposal and current trends</article-title>
          ,
          <source>Information</source>
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>154</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>