=Paper=
{{Paper
|id=Vol-2786/Paper2
|storemode=property
|title=Semantic Hybrid Multi-Model Multi-Platform (SHM3P) Databases
|pdfUrl=https://ceur-ws.org/Vol-2786/Paper2.pdf
|volume=Vol-2786
|authors=Sven Groppe
|dblpUrl=https://dblp.org/rec/conf/isic2/Groppe21
}}
==Semantic Hybrid Multi-Model Multi-Platform (SHM3P) Databases==
Sven Groppe
Institute of Information Systems (IFIS), University of Lübeck, Ratzeburger Allee 160, D-23562 Lübeck, Germany
Abstract
Today’s companies have to handle a zoo of data of different models. Multi-model databases promise to simplify data administration for the parallel usage of different data models. Compared to the other data models, semantic data models introduce an additional abstraction layer for reasoning purposes, such that semantic data models provide superior capabilities. Hence semantic multi-model databases use the semantic data model as the main glue between the different data models. Furthermore, applications as well as databases today run on different platforms like mobile devices, web, desktops, servers, clouds and post-clouds (e.g., fog and edge computing). Hybrid multi-model multi-platform (HM3P) databases and their semantic counterpart (SHM3P databases) integrate the different platforms in order to offer their advantages and benefits for data distribution, query processing and transaction handling to their users. In this paper we introduce and discuss the novel concept of SHM3P databases and their open challenges.
Keywords
Semantic Web, databases, multi-platform, multi-model, cloud, post-cloud, edge computing, fog computing, dew computing,
hardware acceleration, Internet-of-Things, mobile database, parallel database, main-memory database
ISIC’21: International Semantic Intelligence Conference, February 25–27, 2021, New Delhi, India
groppe@ifis.uni-luebeck.de (S. Groppe)
ORCID 0000-0001-5196-1117 (S. Groppe)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
1. Introduction
Today, companies have to deal with and process data in various formats: the backends of their web shops, with databases about customers and their orders, are typically connected to relational databases. Product catalogs of companies are often exchanged using XML, JSON or RDF. The boom of social networks leads to a high demand for processing their graph data, while other social media like wikis offer their data as unstructured data. Key-value stores are often used whenever data must be accessed in a simple way just via keys. However, there is also a need for schema-free or schema-less databases like document stores, which do not force the data into the inflexible corset of a schema but still work on complex data formats. The data is hence stored according to, and processed using, different models (multi-model data [1]). The big challenge for today’s companies is the synchronization and integration of their multi-model data into a single view of and for the customer [2]. Multi-Model Database Management Systems (MM-DBMSs) offer the management of different data models in one single database [1] in order to overcome the disadvantages of polyglot persistence, where applications use several databases at the same time to handle multi-model data [3], which hinders optimizations down to the physical layer of the connected DBMSs [4]. Furthermore, we propose the semantic data model in order to unify the other data models, because the semantic data model offers the ontology layer as an additional abstraction layer, which can be utilized for integrating the data of the other data models.
While in the past database management systems (DBMSs) ran mainly on parallel servers, today there are various platforms like mobile devices, web, desktops, servers (possibly hardware-accelerated by GPUs, FPGAs and, in future scenarios, even quantum computing), clouds and post-clouds (e.g., fog and edge computing) offering execution environments for running a DBMS. (Note that clients of DBMSs typically run on different platforms, but we are considering the database server here.)
Multi-platform development (as supported by, e.g., the programming language Kotlin [5]) allows sharing common code between different platforms like desktop, server, web, mobile and IoT. Multi-platform development drastically reduces the development costs for a DBMS running on multiple platforms.
Putting all pieces together, we propose the following definitions ((H)M3P DBMS are defined according to [4]):
Definition 1 (M3P/HM3P/SHM3P DBMS). A Multi-Model Multi-Platform Database Management System (M3P DBMS) is an MM-DBMS that can be executed on different platforms. A hybrid M3P (HM3P) DBMS spans over different platforms in operation. A Semantic HM3P (SHM3P) DBMS supports a (global) semantic layer (for querying and reasoning purposes) over all platforms of an HM3P DBMS.
Whereas today’s M3P DBMSs are typically developed for platforms of the same type (like Windows and Linux servers, see Section 2.1), some others even span over a (locally installed) private cloud and a public cloud (in a so-called hybrid cloud; note that private and public clouds are platforms of the same type). In contrast, we envision SHM3P DBMSs over platforms of different types (like IoT and hardware-accelerated parallel servers), integrating the features of databases developed for these platforms (like energy savings on IoT devices and high throughput on servers) while offering advanced global reasoning capabilities over all platforms. Hence SHM3P databases support any data model at any platform by tightly integrating them with a semantic layer. For an example installation, see Figure 1.
[Figure 1 (diagram): A single instance of an SHM3P database offers the (fully cross-platform optimized) functionality of, and replaces, an IoT DB (on the edge), a main-memory DB (GPU-accelerated parallel server), a quantum DB (quantum computer), a cloud DB (cloud) and a mobile DB (mobile devices and infrastructure). Reasoning annotations: lightweight reasoning on large data sizes of IoT devices; heavyweight reasoning on moderate data sizes; heavyweight reasoning on large data sizes; reasoning on small data sizes of mobile devices. Open question: How to integrate the different reasoning capabilities and requirements into one transparent global reasoner?]
Figure 1: SHM3P database spanning over multiple platforms. Here, an SHM3P database replaces an IoT database in an Industry 4.0 scenario (using edge-computing), a GPU-accelerated parallel database (on a parallel server) for archiving and generating long-term statistics of the IoT data, which is further supported by a quantum computer for query and reasoning optimization, a database in the cloud for natural language processing tasks and a mobile database (on mobile devices and infrastructure) for monitoring and controlling of the production line in the company. Platforms are marked with an italic font. Green text marks discussion about reasoning in these scenarios. Figure is based on [4] and extended by the discussion on reasoning.
Our main contributions are:
• the introduction of the SHM3P DBMS as a new type of DBMS,
• a detailed discussion of the current state of the art and a comparative analysis of DBMSs designed for different platforms, with special attention to Semantic Web DBMSs, and
• a discussion about open research challenges for HM3P DBMSs and SHM3P DBMSs.
The remainder is as follows: Section 2 describes the basics and an analysis of the current state of the art concerning MM-DBMSs, multi-platform development, databases running on different platforms, polyglot persistence and further related work. Section 3 introduces SHM3P DBMSs, explores their advantages, and analyses envisioned platforms and common properties of their combinations. Finally, we summarize the results and provide an overview of future work in Section 4.
2. Basics
2.1. Databases for Multi-Model Data
Polyglot persistence uses different databases supporting different data models (and maybe running on different platforms) within one application [3]. Federated query languages enable polyglot persistence by supporting queries over heterogeneous data stores within one single query. One example of such a query language is CloudMdsQL [6], with which one can formulate queries over SQL and NoSQL databases. The proposed prototype even optimizes the queries globally and pushes operations down to the integrated SQL and NoSQL databases as much as possible. A similar approach is taken by [7], which allows querying cloud-based NoSQL databases like Google’s Bigtable as well as relational databases with the Google Bigtable query language GQL. The focus of Apache Drill (https://drill.apache.org/, accessed on 17.12.2020) is the interactive ad-hoc analysis of large-scale datasets with low latency, handling up to petabytes of data spread across thousands of servers.
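The push-down idea behind federated query processing can be illustrated with a small, self-contained sketch. It is not CloudMdsQL, Drill or any other system named in this section; the two sources (an in-memory SQLite table and a Python dictionary standing in for a document store), their contents and all function names are hypothetical. Each source applies its own selection before returning data, and a mediator joins the reduced results.

```python
import sqlite3

# Hypothetical document source: customer records keyed by customer id.
DOCUMENTS = {
    1: {"name": "Ada", "country": "DE"},
    2: {"name": "Bob", "country": "IN"},
    3: {"name": "Eve", "country": "DE"},
}


def make_sql_source():
    """Hypothetical relational source: an in-memory SQLite table of orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 20.0), (2, 250.0), (3, 75.5), (2, 120.0)],
    )
    return conn


def query_sql_source(conn, min_total):
    """Pushed-down selection: SQLite filters rows before they leave the source."""
    cur = conn.execute(
        "SELECT customer_id, total FROM orders "
        "WHERE total >= ? ORDER BY customer_id, total",
        (min_total,),
    )
    return cur.fetchall()


def query_document_source(country):
    """Pushed-down selection: only matching documents are returned."""
    return {cid: doc for cid, doc in DOCUMENTS.items() if doc["country"] == country}


def federated_query(conn, country, min_total):
    """Mediator: send one filter to each source, then join the reduced results."""
    customers = query_document_source(country)
    orders = query_sql_source(conn, min_total)
    return [
        (customers[cid]["name"], total)
        for cid, total in orders
        if cid in customers
    ]


if __name__ == "__main__":
    print(federated_query(make_sql_source(), country="DE", min_total=50.0))
```

The join itself still runs at the mediator on top of the source APIs, which is the kind of boundary that limits optimization down to the physical layer in polyglot setups.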
Drill optimizes a query plan to leverage the datastore’s internal processing capabilities and by considering data locality. Commercial multi-store products like IBM BigInsights, Microsoft HDInsight and Oracle Bigdata Appliance as well as open source projects like PrestoDB (https://prestodb.io/, accessed on 17.12.2020) integrate diverse data sources by using database connectors (like JDBC drivers). Tatooine [8] uses a semantic layer as glue between databases for different data models, supporting a semantic integration. However, none of these polystores supports fully optimizing queries across the integrated but independent data sources, which limits data processing.
Federation Databases [9] and multidatabases [10] place a mediator between different autonomous databases for integration purposes: queries formulated according to a global schema are reformulated for the native schemas of the integrated databases, which afterwards execute these queries. Today, some research focuses on federating databases following the polyglot persistence approach: For example, DBMS+ [11] provides unified declarative processing for the integration of several processing and database platforms. BigDAWG [12] offers location transparency while running queries against the three different integrated systems PostgreSQL, SciDB and Accumulo.
Multi-Model Databases: A multi-model database is one single database for multiple data models, which fully integrates a backend to offer advanced performance, scalability and fault tolerance [13]. Among the first of this type are Object-Relational DataBase Management Systems (ORDBMSs), which support various data models like relational, text, XML, spatial and object. ORDBMSs use relational technology for implementing the support of their data models, i.e., the relational model is the first-class citizen. In comparison and in general, in multi-model databases the different models can all be first-class citizens and be supported in a native way (utilizing, e.g., specialized indices for them). The authors in [14] propose to use a semantic layer as glue between the different data models in order to support global querying and reasoning over all data. We extend this idea to multi-platform databases integrating the technologies and features of different types of databases.
[4] contains an overview of current state-of-the-art multi-model databases, their type of extension, their supported data models, query languages and platforms. The investigated multi-model databases support at most 5 of 8 data models, such that no multi-model database offers all data models to its users. Of the 21 investigated MM DBMSs, only 5 support RDF as a data model, most of which (4 of these 5) also manage graph data. The graph model seems to be more popular (12 of 21 MM DBMSs). MM DBMSs with RDF support typically do not support reasoning at all or only in a rudimentary way, such that users should look for native semantic DBMSs if reasoning is needed. Hence reasoning seems to be challenging in the MM DBMS context. Most multi-model databases run SQL, SQL-like or extensions of SQL queries. Binaries of these databases are offered in machine code (often compiled from C/C++) or for the Java virtual machine (JVM). They usually run on all or a big subset of the major desktop operating systems Linux, Windows, macOS, Unix and their variants. Few multi-model databases like IBM DB2 run on mainframes operating, e.g., z/OS. While all offer to run in the cloud, some are also enabled for the hybrid cloud. In the hybrid cloud, a (locally installed) private cloud is used together with a public cloud. Hybrid clouds decrease the costs paid to the public cloud provider while still offering on-demand resources, with the illusion of infinite capacity in the public cloud, for a surprisingly high resource demand.
While all multi-model databases run on different platforms, they do not integrate database instances on different types of platforms and different types of databases. Databases in hybrid clouds, combining the resources of a locally installed private cloud with a public cloud, are approximations of the idea of operating on multiple platforms of different types. An HM3P DBMS extends this idea and supports multiple types of platforms like main-memory, cloud, Internet-of-Things (with, e.g., edge computing) and hardware-accelerated databases, using their different advantages at runtime for database tasks like data distribution, transaction handling and query processing. An SHM3P DBMS offers a semantic layer as glue between the different data models and supports global semantic querying and reasoning by tightly integrating local query engines and reasoners.
2.2. Multi-Platform Development
Several programming languages like C/C++ compile to the native machine code of various target platforms, which is best suited for high-performance programs. Calls to the operating system for disk accesses or for developing a (native) graphical user interface must be ported to the different platforms. There is no special support for multi-platform development, like sharing common code and defining platform-specific modules to code the differences between the different platforms. Java was one of the first programming languages for developing one code base running on different platforms, which is still the key to the success of Java. It has been implemented by compiling to bytecode, which is processed in the Java virtual machine (JVM) available for many platforms. The JVM introduces an intermediate abstraction layer, but also some performance overhead, although the bytecode is often just-in-time (JIT) compiled to native machine code. Scripting languages like JavaScript also run on different platforms (i.e., wherever browsers and Node.js environments can be started). JavaScript, besides HTML 5, is the basis of cross-platform libraries like React Native and PhoneGap. Advanced multi-platform support, introducing a module concept for sharing common code between the different platforms and platform-specific modules for coding the remaining differences, is offered by modern programming languages like Kotlin [5]. Kotlin offers multi-platform support for the JVM (Desktop, Server and Android), JavaScript engines (browser and server via Node.js) and, via LLVM, Windows, Linux, Android (arm32/64), MacOS, iOS, Raspberry Pi and WebAssembly.
Many DBMSs are implemented in C/C++ for performance reasons and run in native machine code for operating systems like Windows, Linux, Unix and MacOS (see [4]). Some modern DBMSs and most Semantic Web tools (see [15]) are implemented in Java, further decreasing development costs, but still running on clusters and servers operating Windows, Linux, Unix and MacOS. Real multi-platform tools, e.g., using Kotlin multi-platform projects, are missing so far for Semantic Web tools.
2.3. Databases for different Platforms
Most DBMSs and their clients run on different platforms. There usually also exist numerous language bindings for APIs calling database functionalities from database applications.
Multi-Platform DBMSs are typically implemented either in C/C++ or in Java. Ports are often available for Windows, Linux, Unix (sometimes for Solaris) and MacOS (see [4]). Only few DBMSs still run on mainframes. Modern DBMSs run in the Cloud, and sometimes they are offered only as a managed service in the Cloud (e.g., Cosmos DB). A few also run in a Hybrid Cloud, where the DBMS is running in a local installation of a cluster (private cloud) as well as in a public cloud (of a cloud provider). [15] contains a selection of 18 widely-used Semantic Web tools including triple stores and Semantic Web databases. Over half of these tools are implemented in Java (i.e., 6 of these tools run on any platform that supports Java) or support Java language bindings (4 of these tools). Semantic Web tools with native binaries usually run on any desktop and server computers, some only on Linux operating systems.
Hence these DBMSs can be called Multi-Platform DBMSs, but they do not bring the multi-platform approach to its full potential. They are typically developed for one type of platform: server, cluster or cloud. DBMSs designed for different types of platforms like cluster, mobile, IoT and the web are not considered so far. HM3P DBMSs span over different platforms at runtime, which may be the case for hybrid cloud installations, but these are also not deployed at different platform types. Hence, full-fledged HM3P DBMSs have to consider various different properties (e.g., availability of nodes, storage and computing resources), the data (like security concerns) and queries (like one-time versus continuous queries) of the supported platforms at runtime for data distribution and processing. Reasoning support is not available for all platforms and types of queries [16]: While many contributions exist for RDFS and OWL support during one-time query processing on server and desktop computers, there exist only few approaches for the cloud and for P2P networks. There exist only few approaches for trigger and continuous queries with RDFS and OWL support on server and desktop computers as well as for the cloud. Ontology inference for trigger and continuous queries in P2P networks has not been considered so far. The development of an SHM3P database may help to support ontology inference in trigger and continuous queries with reasonable effort also on these platforms.
Multi-Platform Clients for setting up queries and displaying their results are available for all DBMSs (we consider PostgreSQL and its clients as an example here): DBMSs typically offer clients for platforms like the Web, major desktop operating systems like Windows, Linux, Unix and MacOS, and mobile apps for Android and iOS. Some clients are even implemented as cross-platform applications (for example, DBeaver, available at https://dbeaver.io/, accessed on 17.12.2020), which also support different DBMSs. The situation is quite comfortable for the Semantic Web: The W3C standardized the protocol to query SPARQL endpoints in [17]. The protocol [17] is widely supported, and hence the Semantic Web DBMSs as well as the clients can be easily exchanged.
The user may have the impression that a database is running on different platforms, because (s)he gets in touch with clients for the database that are available for different platforms. However, the DBMS neither stores nor processes the data on the client’s computer, but only transfers the query result to it. We envision an SHM3P DBMS, where the advantages of the different
platforms are utilized for data storing and processing, and the overall best approaches are chosen according to the platform properties.
3. Multi-Platform Multi-Model Databases
Figure 2 provides an overview of the data sizes of different types of data used in companies, devices, databases and platforms. It already becomes obvious that some types of databases fit the considered types of data and company, the used devices and the platforms better than others. Hence the different types of data are stored on and processed at different platforms depending on their size, the devices they are generated at and other properties like their velocity. Integrating these data sets implies supporting multiple models and also different platforms at the same time. This also requires supporting and integrating different types of databases running on different platforms. For example, one might combine the data of IoT devices (stored in an IoT database running on the edge of the network) with the accounting data containing the remaining time for charging off (stored in a main memory database running on an employee’s desktop computer). These different types of databases have different properties and advantages because they have been developed for different application scenarios, devices, properties of their indexed data (velocity, heterogeneity, size etc.) and so on. Table 1 contains a rough evaluation of these databases. Databases have tailored their architectures to the properties of the different platforms, but often also to the properties required by their applications. Distributed databases in particular cannot offer everything: The well-known PAC theorem (more commonly called the CAP theorem) [19] describes trade-offs, where developers of distributed systems (and hence also of distributed databases) can choose to fully support only two out of three features with high efficiency: Partition-tolerance, Availability and Consistency. For example, if the system also works correctly in the case of network partitions and is highly available, then consistency must be relaxed, such that some replicas may contain older states and not the most recent ones. The PACELC theorem [20] refines the PAC theorem and states that in the case of network Partitions only Availability or Consistency is guaranteed. In the case of no failures, when the databases run normally (Else), there is a trade-off between Latency and Consistency, i.e., only low latency or high consistency can be guaranteed, but not both at the same time. Distributed triple stores that are built on top of NoSQL databases inherit the properties of their underlying systems: For example, D-SPARQ [21] supports PA/EC, because it is based on MongoDB (https://www.mongodb.com/, accessed on 17.12.2020). Jena-HBase [22], H2RDF [23] and H2RDF+ [24] inherit the PC/EC properties of HBase (https://hbase.apache.org/, accessed on 17.12.2020). CM-Well (https://github.com/CM-Well/CM-Well, accessed on 17.12.2020) is based on Cassandra (https://cassandra.apache.org/, accessed on 17.12.2020), supporting PA/EL. Remaining research challenges include hybrid approaches supporting PA and PC (as well as EL and EC) for different fragments of the data at the same time according to their applications.
[Figure 2 (diagram): A size scale running from byte over kibi, mebi, gibi, tebi, pebi, exbi, zebi and yobi (binary: 2^10 to 2^80) and kilo, mega, giga, tera, peta, exa, zetta and yotta (decimal: 10^3 to 10^24) up to the estimated number of atoms on Earth (2^167, about 10^50). Positioned along this scale are: Data (Office, IoT, Internet, Big Data*); Company (SMEs, Global Player); Devices (IoT Device, Embedded, Mobile, Historical Home Computer, Desktop, Server, Cluster, Cloud, Multi-Cloud); Databases (Main Memory, Hardware, Centralized, IoT, Mobile, Web Cloud, Cloud); Platforms (Desktop, Web/Mobile, Cloud, Fog/Edge/Dew, P2P). SMEs: Small and medium-sized enterprises. * social media, search engines.]
Figure 2: Data sizes in companies, devices, databases and platforms. See [18] for the estimation of atoms on earth.
Hence there is a need to run these different types of databases at the same time, but there might also be the need to integrate the data of these databases (like in the scenario of combining the data of IoT devices with accounting data). For an advanced processing of these different types of data stored in different databases, and for other database tasks, it is indispensable to break the boundaries of single installations of these DBMSs and to run one single DBMS. Furthermore, it is desirable that this single DBMS provides a semantic layer for advanced processing and reasoning capabilities and for a tight integration of the different data models. This would also allow offering the best features
of the different types of databases to applications and users “under one hood”, either transparently or with an intelligent integration into one query language and API. This single SHM3P DBMS installation runs over all platforms at the same time, offering the advantages of all the different types of DBMSs (to the data that has previously been processed by the single installations) tightly integrated in a semantic layer, while enabling, e.g., a global optimization of data distribution, transaction handling, global queries and reasoning tasks with full potential by having the freedom of processing down to the physical layer (e.g., index accesses). (Note that single installations of DBMSs can only be accessed via their offered APIs or by setting up subqueries (of the global query) to them, which hinders the full potential of optimized processing of, e.g., joins between the data of the different DBMSs.) One single SHM3P DBMS would also reduce the development costs of applications and the familiarization periods of developers by offering one API and query language with an additional semantic layer for all different platforms. A very big challenge for SHM3P DBMSs is to provide a global distributed reasoner that integrates different types of reasoners running on the different platforms, where reasoning is optimized for this heterogeneous environment by minimizing overall costs that combine weighted costs of different types (communication, processing, lifetime of IoT devices etc.).
3.1. Platforms
We briefly describe here the different platforms running execution environments for different types of DBMSs.
Server Platforms are typical platforms for database servers of small to medium-sized enterprises (SMEs). The DBMSs running on servers are usually centralized databases, which operate in parallel on multi-core and sometimes many-core systems, often in virtual machines. Relational DBMSs, most Semantic Web DBMSs and reasoners typically run on server platforms, and all other types of DBMSs usually offer a local mode to run on a single server.
Hardware-Accelerated Servers speed up database tasks by utilizing the massive parallelism of special hardware beyond today’s multi-core CPUs.
Modern Graphical Processing Units (GPUs) consist of several thousand computing cores, which follow the single-instruction multiple-data paradigm, i.e., the same instruction is executed on different data on different cores at the same time. GPUs are often regarded as a special form of many-core CPUs. Hence, not all parallel algorithms are suitable for, or benefit from, GPUs. However, the massively parallel processing of execution plans is ideal for many-core CPUs and GPUs, as is any task where the best possibilities among enumerated ones must be found (like in query optimization and multi-version concurrency control (MVCC)). Complex operations like joins processing large data inputs are very suitable for GPU acceleration, too (see, e.g., [25] for joins especially designed for SPARQL processing on GPUs).
Field-programmable gate arrays (FPGAs) can reconfigure interconnects for connecting programmable logic blocks with each other. This property makes FPGAs ideally suited for data-flow-driven algorithms (like processing an execution plan for evaluating queries in a streaming way without the block-wise materialization of intermediate steps that is needed on many-core CPUs and GPUs), but FPGAs can also offer any arbitrary type of parallelism. FPGA acceleration of SPARQL query processing, as discussed in, e.g., [26], achieves scalable speedups that even increase with larger data sets. Dynamic partial reconfiguration enables FPGAs to dynamically exchange their configurations to process different queries at runtime [26].
Universal quantum computers try to combine the full power of classical computers with quantum computers that manipulate (some few) qubits in superposition by applying quantum logic gates. In comparison, quantum annealers, operating on up to several thousand qubits, only run special types of quantum algorithms to solve adiabatic (as a special form of combinatorial) optimization problems, which is, e.g., the case for traffic control (investigated by Volkswagen, see https://www.volkswagenag.com/en/news/stories/2018/11/intelligent-traffic-control-with-quantum-computers.html, accessed on 17.12.2020), selecting the execution plan with the best estimated costs (from a set of enumerated plans) [27], concurrency control between transactions [28] as well as optimizing transaction schedules [29, 30].
Cloud Databases are designed to be run in the cloud, where (storage and computing) resources can be dynamically allocated and freed according to users’ demands. Hence, cloud databases must consider that nodes (for storing and computing) are joining and leaving, such that it may be necessary to redistribute data and to react to processing jobs running on leaving nodes. Furthermore, as the nodes are typically not high-end hardware like servers with redundant components, and clouds consist of many more nodes (up to several thousand nodes), hardware and communication failures may occur more often. Hence, cloud computing architectures apply simple fault-tolerance mechanisms by repeating crashed jobs. Table 2 contains an overview
Table 1: Rough Evaluation of different Types of Databases.
Feature | Type of DBMS: Main Memory | Parallel | Distributed | Federated | Cloud | Web Cloud | Mobile | IoT
Scalability: – O + + + + + + + + + +
Transaction rates: + + + + + O / + O + + + – – –
Intra-Transaction Parallelism: + + + + + O / + – / O + O – –
Atomicity: + + + + + + + + + + + + +
Durability: + + + + + + + + + – O –
Consistency: + + + + + + + + + + + + +
Extensibility: – + O / + O + + + + + – + + +
Schemaless: – – – – – – – – – – + + + + + + + + + +
Availability: + + + + – – – – – – – – – –
Transparency of Distribution: + + + + + O + + – – – –
Geographical Distribution: – – – + + + + + + + + + + +
Mobility: – – – O O O + + +
Node Autonomy: – – – O + O – – + + +
Heterogeneity of DBMS: – – – – + – – + + + + +
Administration: O O – – / – – – + + – – – – –
Hardware Costs: – – – – – + + + + + – + + +
Reasoning: + + + + + + + – – + + + – – – – –
Table 2: Evolution of Big data analytics engines. Based on [16] and extended by the rows “Impact on Databases” and “Impact on Reasoning”.
Generation | 1 | 2 | 3 | 4
Features | Batch | + Interactive | + Near-Real-Time, + Iterative Processing | + Real-Time Streaming, + Native Iterative Processing
Processing Model | MapReduce | DAG Dataflows | Resilient Distributed Datasets (RDD) | Cyclic Dataflows
Impact on Databases | Long-Running Queries | Query Answering with lower latency | + Continuous Queries | + Real-Time Continuous Queries
Impact on Reasoning | Long-Running Reasoning | Reasoning with lower latency | + Capabilities for Stream Reasoning | + Capabilities for Real-Time Stream Reasoning
Engine | Hadoop | TEZ | Spark | Flink
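As a reading aid, the rows of Table 2 can be encoded as plain data. The sketch below assumes that a leading “+” means the feature is added on top of those of earlier generations; the class and function names are illustrative only and are not part of any of the listed engines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generation:
    number: int
    engine: str
    processing_model: str
    added_features: tuple  # features newly introduced by this generation


# Values transcribed from Table 2.
GENERATIONS = (
    Generation(1, "Hadoop", "MapReduce", ("Batch",)),
    Generation(2, "TEZ", "DAG Dataflows", ("Interactive",)),
    Generation(3, "Spark", "Resilient Distributed Datasets (RDD)",
               ("Near-Real-Time", "Iterative Processing")),
    Generation(4, "Flink", "Cyclic Dataflows",
               ("Real-Time Streaming", "Native Iterative Processing")),
)


def cumulative_features(number):
    """Assumption: each generation keeps all features of its predecessors."""
    return [
        feature
        for gen in GENERATIONS
        if gen.number <= number
        for feature in gen.added_features
    ]


def engines_with(feature):
    """Engines whose generation offers the feature, oldest first."""
    return [
        gen.engine
        for gen in GENERATIONS
        if feature in cumulative_features(gen.number)
    ]


if __name__ == "__main__":
    print(engines_with("Near-Real-Time"))
```

Under this reading, a lookup for "Near-Real-Time" returns the engines of generations 3 and 4, which matches the table's association of continuous queries with Spark and Flink.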
over important state-of-the-art Big Data analytics engines working in cloud environments. In addition to one-time queries, Apache Spark and Apache Flink can process data streams and continuous queries, so that they also belong to the type of stream databases. There exist various examples of Semantic Web databases on top of the different cloud technologies, like [31] (HBase, Pig), [32] (Spark) and [33] (Flink), but also other contributions that avoid the well-known technologies, like [34], in order to support local joining.

Web Cloud Databases rely on a new form of cloud, the web cloud [35]: one just visits a certain webpage with a web browser in order to connect one's computer to the web cloud. In this way, the setup of the web cloud is much easier than that of traditional clouds. Furthermore, the web cloud has a much larger number of potential nodes, as any computer running a browser may connect to and be integrated into the web cloud. New challenges arise when setting up a cloud by web browsers: the nodes may be disconnected more often, and data is processed within the browser, so we must use the technologies offered by the browser for data management purposes. New technologies like WebAssembly [36], which introduces a virtual machine for the browser, may help to speed up processing in the browser. There exist first approaches to distribute SPARQL queries in some kind of web cloud [37].

Mobile Databases [38] involve the technical infrastructure of mobile providers, like base stations (which are near their connected mobile devices), in order to speed up processing, lower communication (and hence also energy) costs, and increase availability and durability (by logging at the base stations instead of on the mobile devices), thereby overcoming limitations of the mobile devices. Some RDF stores like [39] are especially designed to run on mobile devices, but so far they do not consider the backend of mobile providers.

P2P Databases [40, 41] use peer-to-peer (P2P) networks as underlying backend technology to cope with frequent joining and leaving of nodes for data storage and processing. In comparison to clouds, they are designed for much more frequent changes in their topology and for an equal distribution of functionality without distinction of master and slave nodes. P2P databases have to introduce more redundancy in data storage and even in processing in order to overcome the frequent disconnections of their nodes. Furthermore, P2P databases must consider heterogeneity in the connected nodes much more than other types of databases. There already exist quite a few approaches for semantic data processing in P2P networks like [41], but ontology inference is considered only on a rudimentary basis, and not at all for trigger and continuous queries [16].

IoT Databases [42] are especially developed to serve as data stores for large-scale installations of the Internet-of-Things (IoT). IoT databases often operate in the cloud, but the communication bottleneck from the IoT devices to the cloud does not scale, especially for IoT devices with high velocity and for large-scale installations.

In conjunction with the cloud, fog computing [43] stores and processes data and application logic on near-things edge devices with higher capabilities (rather than primarily in cloud data centers), which saves communication by avoiding the route over the internet backbone. However, fog computing is not really scalable in the number of connected things, as the near-things edge devices do not increase in number and capabilities in the same way.

The scalability issue is solved in a better way by edge computing [44], which additionally utilizes all IoT devices for data storage and processing, and for executing application logic: as more IoT devices are deployed, more data needs to be stored and processed, but more IoT devices are also available.

Dew computing [45, 46] overcomes availability problems that occur when the communication between cloud and IoT devices is disturbed. It places an additional local server near the IoT devices, which takes over the tasks of the cloud during downtimes and synchronizes with the cloud at uptimes.

Besides many approaches to semantic IoT like corresponding ontologies [47] and interoperability issues [48], there are not so many contributions to semantic IoT databases. IoT databases are often organized as P2P databases, especially if they work on the fog or edge, or follow the dew computing concept. Hence contributions to P2P networks processing Semantic Web data like [41] are relevant for semantic IoT databases as well. One of the big challenges here is the distribution of data and processing tasks between the cloud and the IoT infrastructure, including the devices themselves. Furthermore, IoT devices often generate data streams, so that organizing the IoT database as a stream database is a reasonable choice: the IoT application design may especially aim to reduce data by aggregation and by focusing on only relevant data, which should be done near the things. One research direction may consider how to use Semantic Web technologies for defining such aggregation tasks. Whether to reason at or near the data sources, or in clouds, is another difficult question, and it is harder to answer than the corresponding question for query processing on the fog or edge, as reasoning consumes many more processing resources.

3.2. (S)HM3P Databases and their Challenges

HM3P databases are single installations of an M3P DBMS, which are not only able to run on multiple platforms, but also run and tightly integrate different types of DBMSs for ease of use and optimization purposes at runtime. SHM3P databases integrate the different types of DBMSs in an additional semantic layer and support global reasoning over all integrated DBMSs.

IoT databases operating at the same time in clouds and on fog, edge or dew computing are reasonable examples of HM3P DBMSs: they span different platforms, the edge of the IoT network and the cloud data centers, and have to distribute functionality like data aggregation at or near the things and complex operations, e.g., natural language processing and reasoning, at the cloud data centers. Furthermore, IoT databases have to consider different types of query processing by dealing with traditional (one-time) queries on static data, continuous queries on data streams and spatial-temporal queries on archived data of data streams.
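The split of functionality described above, with aggregation near the things and archived data queried in the cloud, might be sketched as follows. This is only an illustrative sketch under stated assumptions, not the design of any existing (S)HM3P system: the names `Reading`, `WindowSummary`, `aggregate_at_edge` and `CloudArchive` are hypothetical, the readings are assumed to come from a single sensor in timestamp order, and a tumbling window stands in for whatever continuous query the application would actually define.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Reading:
    """One raw sensor measurement produced at the edge (hypothetical shape)."""
    sensor_id: str
    timestamp: int  # seconds since an arbitrary epoch
    value: float


@dataclass(frozen=True)
class WindowSummary:
    """Aggregate forwarded to the cloud instead of the raw readings."""
    sensor_id: str
    window_start: int
    count: int
    minimum: float
    maximum: float
    average: float


def _summarize(bucket: List[Reading], window_start: int) -> WindowSummary:
    values = [r.value for r in bucket]
    return WindowSummary(bucket[0].sensor_id, window_start, len(values),
                         min(values), max(values), mean(values))


def aggregate_at_edge(readings: Iterable[Reading],
                      window_seconds: int) -> Iterator[WindowSummary]:
    """Continuous query near the things: tumbling-window aggregation.

    Assumption: readings belong to one sensor and arrive ordered by timestamp.
    """
    bucket: List[Reading] = []
    current_start: Optional[int] = None
    for r in readings:
        start = r.timestamp - r.timestamp % window_seconds
        if current_start is not None and start != current_start:
            # The window is complete: emit its summary and start a new one.
            yield _summarize(bucket, current_start)
            bucket = []
        current_start = start
        bucket.append(r)
    if bucket:
        yield _summarize(bucket, current_start)


class CloudArchive:
    """Stands in for the cloud side, which keeps summaries for temporal queries."""

    def __init__(self) -> None:
        self._summaries: List[WindowSummary] = []

    def store(self, summary: WindowSummary) -> None:
        self._summaries.append(summary)

    def query_range(self, start: int, end: int) -> List[WindowSummary]:
        """One-time temporal query: windows starting in [start, end)."""
        return [s for s in self._summaries if start <= s.window_start < end]
```

In this sketch only one summary per window crosses the network, which reflects the idea of reducing data near the things; a real system would also have to handle out-of-order readings, multiple sensors and disconnections.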
IoT devices are often heterogeneous because they are, e.g., developed by different manufacturers: the use of ontologies and hence of semantic databases simplifies the integration of these devices. Semantic IoT databases sometimes manage data at the IoT devices in the traditional way for performance reasons and only support reasoning and semantic querying at the cloud centers after transforming the data of IoT devices to semantic data [16]. Other approaches even support reasoning on streams [16].

Multi-platform DBMSs are already highly ambitious even for large, established database companies, since they require data management skills in an extremely wide spectrum (i.e., data management issues in sensors and smart objects for IoT databases are completely different from the challenges of in-memory databases of P2P data oriented systems and semantic querying and reasoning of Semantic Web databases). Hence current approaches focus more on interoperability between the variety of DBMSs, each one focusing on its specific issues related to its specific functionalities. However, we propose to support a global approach to integrate all these specific functionalities in order to use their different benefits in a uniform way and to increase the overall benefits of the global approach.

New challenges of M3P and HM3P DBMSs in comparison to traditional DBMSs and MM-DBMSs are
• developing only one code base for the different platforms, but not introducing performance overhead in comparison to single platform databases (see footnote 13)
• identifying common properties of several platforms and reusing those approaches (like fault tolerance mechanisms) in different combinations, which are best suited for the considered platforms
• data distribution among different platforms (applying different data distribution approaches as well)
• efficient binary serialization and communication protocols for integrating the different platforms
• data distribution strategies considering overall the different properties of used platforms and models (like fast reads in relational databases on parallel servers and fast updates in cloud databases)
• query optimization and other database tasks across different platforms, which apply different database approaches
• dealing with and integrating different privacy and security mechanisms supporting different privacy and security levels in the different platforms (with research e.g. on querying heterogeneous encrypted data)
• developing multi-platform transaction synchronization approaches and supporting global transaction synchronization approaches over distributed different transaction synchronization approaches running on different platforms
• combining different types of databases (on different platforms) to offer the best of these databases and platforms under one hood to applications and users transparently or via intelligent integration into query language and API, e.g., guaranteeing atomicity and isolation in transactions for the data stored on a parallel server, but not for the data in the cloud supporting fast updates

Footnote 13: We are of the opinion that this is possible by applying Kotlin features like expect and actual declarations for classes and types, and inline functions and classes.

Specific challenges of SHM3P DBMSs are
• integrating different data models in a semantic layer on top of the underlying data models
• efficient transformations from and to the semantic model in an operational system
• developing efficient semantic querying and reasoning over the integrated data of different models
• global reasoning over reasoners running on different platforms supporting some kind of distributed heterogeneous reasoning
• developing a combination of stream reasoning over streaming data (e.g. of IoT devices) with static reasoning over large-scale data sets (stored e.g. in clouds)
• supporting transactions over semantic data by integrating the reasoner in transaction synchronization

We are sure that this is not an exhaustive list of new challenges. Many further challenges will arise while developing (S)(H)M3P DBMSs, especially when considering combinations of different platforms and models at runtime.

4. Summary and Conclusions

Multi-model databases provide the infrastructure to handle the zoo of data models managed in today’s companies. Multi-model databases that are able to run on a variety of platforms, which are typically deployed and in use in parallel in today’s companies, are called multi-model multi-platform database management systems (M3P DBMSs). Hybrid M3P (HM3P) DBMSs span different platforms at runtime. Our focus is on their semantic counterpart: Semantic HM3P (SHM3P) DBMSs offer an additional semantic layer for simple integration of the DBMS technologies of their operational platforms. Furthermore, we describe and analyze different types of DBMSs and platforms concerning their properties, chances and challenges for DBMSs with special focus on Semantic DBMSs. Current state-of-the-art (S)M3P DBMSs do not exploit the multiple platform idea to its full potential, because they typically only tightly integrate one type of platform and database. We see great further optimization possibilities in data and functionality distribution like query processing, reasoning and transaction handling, and ease of usage when different types of platforms and databases are supported in one single installation of an M3P DBMS by tightly integrating them based on a semantic layer.

References

[1] J. Lu, I. Holubová, Multi-model databases: A new journey to handle the variety of data, ACM Computing Surveys (CSUR) 52 (2019).
[2] R. Kotorov, Customer relationship management: strategic lessons and future directions, Business Process Management Journal 9 (2003) 566–571.
[3] S. Leberknight, Polyglot persistence, Scott Leberknight’s Weblog, http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence, 2008.
[4] S. Groppe, J. Groppe, Hybrid multi-model multi-platform (hm3p) databases, in: Proceedings of the 9th International Conference on Data Science, Technology and Applications (DATA), 2020.
[5] JetBrains s.r.o., FAQ - Kotlin Programming Language, 2020. URL: https://kotlinlang.org/docs/reference/faq.html.
[6] B. Kolev, P. Valduriez, C. Bondiombouy, R. Jiménez-Peris, R. Pau, J. Pereira, Cloudmdsql: querying heterogeneous cloud data stores with a common language, Distributed and Parallel Databases 34 (2016) 463–503.
[7] M. Zhu, T. Risch, Querying combined cloud-based and relational databases, in: International Conference on Cloud and Service Computing, 2011, pp. 330–335.
[8] R. Bonaque, T. D. Cao, B. Cautis, F. Goasdoué, J. Letelier, I. Manolescu, O. Mendoza, S. Ribeiro, X. Tannier, M. Thomazo, Mixed-instance querying: a lightweight integration architecture for data journalism, PVLDB 9 (2016) 1513–1516.
[9] M. Hammer, D. McLeod, On Database Management System Architecture., Technical Report, MIT, Cambridge Laboratory for Computer Science, 1979.
[10] J. M. Smith, P. A. Bernstein, U. Dayal, N. Goodman, T. Landers, K. W. T. Lin, E. Wong, Multibase: Integrating heterogeneous distributed database systems, in: AFIPS National Computer Conference, 1981, pp. 487–499.
[11] H. Lim, Y. Han, S. Babu, How to fit when no one size fits., in: CIDR, 2013.
[12] A. Elmore, J. Duggan, M. Stonebraker, M. Balazinska, U. Cetintemel, V. Gadepally, J. Heer, B. Howe, J. Kepner, T. Kraska, S. Madden, D. Maier, T. Mattson, S. Papadopoulos, J. Parkhurst, N. Tatbul, M. Vartak, S. Zdonik, A demonstration of the bigdawg polystore system, Proc. VLDB Endow. 8 (2015) 1908–1911.
[13] J. Lu, Z. H. Liu, P. Xu, C. Zhang, UDBMS: road to unification for multi-model data management, in: ER Workshops, 2018, pp. 285–294.
[14] I. Holubova, S. Scherzinger, Nextgen multi-model databases in semantic big data architectures, Open Journal of Semantic Web (OJSW) 7 (2020) 1–16.
[15] W3C, Semantic Web Development Tools, accessed on 23/4/2020. https://www.w3.org/2001/sw/wiki/Tools.
[16] S. Groppe, Emergent models, frameworks, and hardware technologies for big data analytics, The Journal of Supercomputing 76 (2020) 1800–1827.
[17] L. Feigenbaum, G. T. Williams, K. G. Clark, E. Torres (editors), SPARQL 1.1 Protocol, 2013. W3C Recommendation, https://www.w3.org/TR/sparql11-protocol/.
[18] D. Weisenberger, How many atoms are there in the world?, accessed on 17.12.2020. http://education.jlab.org/qa/mathatom_05.html.
[19] E. A. Brewer, Pushing the CAP: strategies for consistency and availability, Computer 45 (2012) 23–29.
[20] D. Abadi, Consistency tradeoffs in modern distributed database system design: CAP is only part of the story, Computer 45 (2012) 37–42.
[21] R. Mutharaju, S. Sakr, A. Sala, P. Hitzler, D-sparq: Distributed, scalable and efficient rdf query engine, in: Proceedings of the 12th International Semantic Web Conference (Posters & Demonstrations Track), Sydney, Australia, 2013, p. 261–264.
[22] V. Khadilkar, M. Kantarcioglu, B. Thuraisingham, P. Castagna, Jena-hbase: A distributed, scalable and efficient rdf triple store, in: Proceedings of the 2012th International Conference on Posters & Demonstrations Track, Boston, USA, 2012, p. 85–88.
[23] N. Papailiou, I. Konstantinou, D. Tsoumakos, N. Koziris, H2RDF: Adaptive query processing on rdf data in the cloud, in: Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 2012, p. 397–400.
[24] N. Papailiou, I. Konstantinou, D. Tsoumakos, P. Karras, N. Koziris, H2RDF+: high-performance distributed joins over large-scale RDF graphs, in: Proceedings of the 2013 IEEE International Conference on Big Data, Santa Clara, USA, 2013, pp. 255–263.
[25] X. Zhang, M. Zhang, P. Peng, J. Song, Z. Feng, L. Zou, A scalable sparse matrix-based join for sparql query processing, in: International Conference on Database Systems for Advanced Applications, Springer, 2019, pp. 510–514.
[26] S. Werner, D. Heinrich, S. Groppe, C. Blochwitz, T. Pionteck, Runtime adaptive hybrid query engine based on fpgas, Open Journal of Databases (OJDB) 3 (2016) 21–41.
[27] I. Trummer, C. Koch, Multiple query optimization on the d-wave 2x adiabatic quantum computer, Proc. VLDB Endow. 9 (2016).
[28] S. Roy, L. Kot, C. Koch, Quantum databases, in: CIDR, 2013.
[29] T. Bittner, S. Groppe, Avoiding blocking by scheduling transactions using quantum annealing, in: 24th International Database Engineering & Applications Symposium (IDEAS), Seoul, Republic of Korea, 2020.
[30] T. Bittner, S. Groppe, Hardware accelerating the optimization of transaction schedules via quantum annealing by avoiding blocking, Open Journal of Cloud Computing (OJCC) 7 (2020) 1–21. URL: http://nbn-resolving.de/urn:nbn:de:101:1-2020112218332015343957.
[31] S. Groppe, T. Kiencke, S. Werner, D. Heinrich, M. Stelzner, L. Gruenwald, P-luposdate: Using precomputed bloom filters to speed up sparql processing in the cloud, Open Journal of Semantic Web (OJSW) 1 (2014) 25–55.
[32] D. Graux, L. Jachiet, P. Geneves, N. Layaïda, Sparqlgx: Efficient distributed evaluation of sparql with apache spark, in: ISWC, 2016.
[33] A. Azzam, S. Kirrane, A. Polleres, Towards making distributed rdf processing flinker, in: Innovate-Data, IEEE, 2018, pp. 9–16.
[34] S. Groppe, J. Blume, D. Heinrich, S. Werner, A self-optimizing cloud computing system for distributed storage and processing of semantic web data, Open Journal of Cloud Computing (OJCC) 1 (2014) 1–14.
[35] S. Groppe, N. Reimer, Code generation for big data processing in the web using webassembly, Open Journal of Cloud Computing (OJCC) 6 (2019) 1–15.
[36] A. Rossberg (editor), WebAssembly Core Specification, W3C Proposed Recommendation, https://www.w3.org/TR/wasm-core-1/, 2019.
[37] A. Grall, P. Folz, G. Montoya, H. Skaf-Molli, P. Molli, M. Vander Sande, R. Verborgh, Ladda: Sparql queries in the fog of browsers, in: European Semantic Web Conference, Springer, 2017, pp. 126–131.
[38] V. Kumar, Mobile database systems, Wiley Online Library, 2006.
[39] D. Le-Phuoc, J. X. Parreira, V. Reynolds, M. Hauswirth, Rdf on the go: An rdf storage and query processor for mobile devices, in: ISWC, Citeseer, 2010.
[40] K. Graffi, D. Stingl, C. Gross, H. Nguyen, A. Kovacevic, R. Steinmetz, Towards a p2p cloud: Reliable resource reservations in unreliable p2p systems, in: International Conference on Parallel and Distributed Systems, 2010, pp. 27–34.
[41] R. Mietz, S. Groppe, O. Kleine, D. Bimschas, S. Fischer, K. Römer, D. Pfisterer, A p2p semantic query framework for the internet of things, PIK-Praxis der Informationsverarbeitung und Kommunikation 36 (2013) 73–79.
[42] ObjectBox Limited, The best IoT Databases for the Edge – an overview and compact guide, https://objectbox.io/the-best-iot-databases-for-the-edge-an-overview-and-compact-guide/, 2019.
[43] M. Abdelshkour, Iot, from cloud to fog computing, Cisco Blogs, http://blogs.cisco.com/perspectives/iot-from-cloud-to-fog-computing, 2015.
[44] P. Garcia Lopez, A. Montresor, D. Epema, A. Datta, T. Higashino, A. Iamnitchi, M. Barcellos, P. Felber, E. Riviere, Edge-centric computing: Vision and challenges, SIGCOMM Comput. Commun. Rev. 45 (2015) 37–42.
[45] K. Skala, D. Davidovic, E. Afgan, I. Sovic, Z. Sojat, Scalable distributed computing hierarchy: Cloud, fog and dew computing, Open Journal of Cloud Computing (OJCC) 2 (2015) 16–24.
[46] Y. Wang, Definition and categorization of dew computing, Open Journal of Cloud Computing (OJCC) 3 (2016) 1–7.
[47] S. Mishra, S. Jain, Ontologies as a semantic model in iot, International Journal of Computers and Applications 42 (2020) 233–243.
[48] A. Cimmino, M. Poveda-Villalón, R. García-Castro, ewot: A semantic interoperability approach for heterogeneous iot ecosystems based on the web of things, Sensors 20 (2020) 822.