Semantic Hybrid Multi-Model Multi-Platform (SHM3P)
Databases
Sven Groppe
Institute of Information Systems (IFIS), University of Lübeck, Ratzeburger Allee 160, D-23562 Lübeck, Germany


Abstract
Today's companies have to handle a zoo of data of different models. Multi-model databases promise to simplify data administration for the parallel usage of different data models. Compared to the other data models, semantic data models introduce an additional abstraction layer for reasoning purposes, such that semantic data models provide superior capabilities. Hence semantic multi-model databases use the semantic data model as the main glue between the different data models. Furthermore, applications as well as databases today run on different platforms like mobile devices, web, desktops, servers, clouds and post-clouds (e.g., fog and edge computing). Hybrid multi-model multi-platform (HM3P) databases and their semantic counterpart (SHM3P databases) integrate the different platforms in order to offer their advantages and benefits for data distribution, query processing and transaction handling to their users. In this paper we introduce and discuss the novel concept of SHM3P databases and its open challenges.

Keywords
Semantic Web, databases, multi-platform, multi-model, cloud, post-cloud, edge computing, fog computing, dew computing, hardware acceleration, Internet-of-Things, mobile database, parallel database, main-memory database


ISIC'21: International Semantic Intelligence Conference, February 25–27, 2021, New Delhi, India
groppe@ifis.uni-luebeck.de (S. Groppe)
ORCID: 0000-0001-5196-1117 (S. Groppe)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

Today, companies have to deal with and process data in various data formats: The backends of their web shops, with databases about customers and their orders, are typically connected to relational databases. Product catalogs of companies are often exchanged using XML, JSON or RDF. The boom of social networks leads to a high demand for processing their graph data, while other social media like wikis offer their data as unstructured data. Key-value stores are often used whenever data must be accessed in a simple way just via keys. However, there is also a need for schema-free or schema-less databases like document stores, which do not force the data into the inflexible corset of a schema, but still work on complex data formats. The data is hence stored according to and processed using different models (multi-model data [1]). The big challenge for today's companies is the synchronization and integration of their multi-model data into a single view of and for the customer [2]. Multi-Model Database Management Systems (MM-DBMSs) offer the management of different data models in one single database [1] in order to overcome the disadvantages of polyglot persistence, where applications use several databases at the same time to handle multi-model data [3], which hinders optimizations down to the physical layer of the connected DBMSs [4]. Furthermore, we propose the semantic data model in order to unify the other data models, because the semantic data model offers the ontology layer as an additional abstraction layer, which can be utilized for data integration purposes of the other data models.

While in the past database management systems (DBMSs) ran mainly on parallel servers, there are today various different platforms like mobile devices, web, desktops, servers (maybe additionally hardware-accelerated by GPUs, FPGAs and in future scenarios even quantum computing), clouds and post-clouds (e.g., fog and edge computing) offering execution environments for running a DBMS.¹

¹ Note that clients of DBMSs typically run on different platforms, but we are considering the database server here.

Multi-platform development (as supported by e.g. the programming language Kotlin [5]) allows sharing common code between different platforms like desktop, server, web, mobile and IoT. Multi-platform development drastically reduces the development costs for a DBMS running on multiple platforms.

Putting all pieces together, we propose the following definitions ((H)M3P DBMS are defined according to [4]):

Definition 1 (M3P/HM3P/SHM3P DBMS). A Multi-Model Multi-Platform Database Management System (M3P DBMS) is a MM-DBMS that can be executed on different platforms. A hybrid M3P (HM3P) DBMS spans over different platforms in operation. A Semantic HM3P (SHM3P) DBMS supports a (global) semantic layer (for querying and reasoning purposes) over all platforms of an HM3P DBMS.

Whereas today's M3P DBMSs are typically developed for platforms of the same type (like Windows and Linux servers, see Section 2.1), some others even span over a (locally installed) private cloud and a public cloud (in a so-called hybrid cloud²). In contrast, we envision SHM3P DBMSs over platforms of different types (like IoT and hardware-accelerated parallel servers) integrating the features of databases developed for these platforms (like energy savings on IoT devices and high throughput on servers) while offering advanced global reasoning capabilities over all platforms. Hence SHM3P databases support any data model at any platform by tightly integrating them with a semantic layer. For an example installation, see Figure 1.

² Please note that private and public clouds are platforms of the same type.

[Figure 1 (diagram): A single instance of an SHM3P database offers the (fully cross-platform optimized) functionality of, and replaces, the following databases (platform in parentheses): IoT DB (on the edge), main-memory DB (GPU-accelerated parallel server), quantum DB (quantum computer), cloud DB (cloud), mobile DB (mobile devices & infrastructure). Reasoning annotations, from left to right: lightweight reasoning on large data sizes of IoT devices; heavyweight reasoning on moderate data sizes; heavyweight reasoning on large data sizes; reasoning on small data sizes of mobile devices. Open question posed in the figure: How to integrate the different reasoning capabilities and requirements into one transparent global reasoner?]
Figure 1: SHM3P database spanning over multiple platforms. Here, an SHM3P database replaces an IoT database in an Industry 4.0 scenario (using edge-computing), a GPU-accelerated parallel database (on a parallel server) for archiving and generating long-term statistics of the IoT data, which is further supported by a quantum computer for query and reasoning optimization, a database in the cloud for natural language processing tasks and a mobile database (on mobile devices and infrastructure) for monitoring and controlling of the production line in the company. Platforms are marked with an italic font. Green text marks discussion about reasoning in these scenarios. Figure is based on [4] and extended by the discussion on reasoning.

Our main contributions are:
• the introduction of SHM3P DBMS as a new type of DBMS,
• a detailed discussion of the current state of the art about, and a comparative analysis of, DBMS designed for different platforms with special attention to Semantic Web DBMS, and
• a discussion about open research challenges for HM3P DBMS and SHM3P DBMS.

The remainder is as follows: Section 2 describes the basics and an analysis of the current state of the art concerning MM-DBMSs, multi-platform development, databases running on different platforms, polyglot persistence and further related work. Section 3 introduces SHM3P DBMSs, explores their advantages, and analyses envisioned platforms and common properties of their combinations. Finally, we summarize the results and provide an overview of future work in Section 4.

2. Basics

2.1. Databases for Multi-Model Data

Polyglot persistence uses different databases supporting different data models (and maybe running on different platforms) within one application [3]. Federated query languages enable polyglot persistence by supporting queries over heterogeneous data stores within one single query. One example of such a query language is CloudMdsQL [6], with which one can formulate queries over SQL and NoSQL databases. The proposed prototype even optimizes the queries globally and pushes operations down to the integrated SQL and NoSQL databases as much as possible. A similar approach is taken by [7], which offers to query cloud-based NoSQL stores like Google's Bigtable and relational databases with the Google Bigtable query language GQL. The focus of Apache Drill³ is interactive ad-hoc analysis of large-scale datasets with low latency, handling up to petabytes of data spread across thousands of servers.

³ https://drill.apache.org/ (accessed on 17.12.2020)
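The push-down of operations to the integrated stores, as described for federated query processing in this section, can be illustrated with a minimal Python sketch. It is not the API of CloudMdsQL, Apache Drill or any other system named here: the class `Store`, the function `federated_select`, the flag `supports_filter` and the sample rows are all hypothetical and only serve to show the idea that a mediator lets a store evaluate a filter itself where possible and filters locally otherwise.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


@dataclass
class Store:
    """Hypothetical stand-in for one integrated data store."""
    name: str
    rows: List[Row]
    supports_filter: bool   # can the store evaluate predicates itself?
    rows_shipped: int = 0   # rows transferred to the mediator so far

    def scan(self, predicate: Optional[Predicate] = None) -> List[Row]:
        """Return rows; apply the predicate inside the store if possible."""
        if predicate is not None and self.supports_filter:
            result = [row for row in self.rows if predicate(row)]
        else:
            result = list(self.rows)
        self.rows_shipped += len(result)
        return result


def federated_select(stores: List[Store], predicate: Predicate) -> List[Row]:
    """Mediator: push the filter down where supported, else filter locally."""
    matches: List[Row] = []
    for store in stores:
        if store.supports_filter:
            matches.extend(store.scan(predicate))
        else:
            matches.extend(row for row in store.scan() if predicate(row))
    return matches


# Hypothetical sample data: one store with filter support, one without.
sql = Store("relational", [{"id": 1, "price": 5}, {"id": 2, "price": 20},
                           {"id": 3, "price": 30}], supports_filter=True)
kv = Store("key-value", [{"id": 4, "price": 8}, {"id": 5, "price": 15}],
           supports_filter=False)

hits = federated_select([sql, kv], lambda row: row["price"] > 10)
print([row["id"] for row in hits], sql.rows_shipped, kv.rows_shipped)
```

In this sketch the store with filter support ships only its two matching rows to the mediator, whereas the store without filter support has to ship all of its rows, which is the kind of data transfer that push-down is meant to reduce.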


Drill optimizes a query plan to leverage the datastore's internal processing capabilities and to take data locality into account. Commercial multi-store products like IBM BigInsights, Microsoft HDInsight and Oracle Bigdata Appliance as well as open source projects like PrestoDB⁴ integrate diverse data sources by using database connectors (like JDBC drivers). Tatooine [8] uses a semantic layer as glue between databases for different data models, supporting a semantic integration. However, none of these polystores supports fully optimizing queries across the integrated but independent data sources, which limits data processing.

⁴ https://prestodb.io/ (accessed on 17.12.2020)

Federation Databases [9] and multidatabases [10] place a mediator between different autonomous databases for integration purposes by reformulating queries according to a global schema to the native schemas of the integrated databases, which afterwards execute these queries. Today, some research focuses on federating databases following the polyglot persistence approach: For example, DBMS+ [11] provides unified declarative processing for the integration of several processing and database platforms. BigDAWG [12] offers location transparency while running queries against the three different integrated systems PostgreSQL, SciDB and Accumulo.

Multi-Model Databases: A multi-model database is one single database for multiple data models, which fully integrates a backend to offer advanced performance, scalability and fault tolerance [13]. Among the first of this type are Object-Relational DataBase Management Systems (ORDBMSs), which support various data models like relational, text, XML, spatial and object. ORDBMSs use the relational technology for implementing the support of their data models, i.e., the relational model is the first-class citizen. In comparison and in general, in multi-model databases the different models can all be first-class citizens and supported in a native way (utilizing e.g. specialized indices for them). The authors in [14] propose to use a semantic layer as glue between the different data models in order to support global querying and reasoning over all data. We extend this idea to multi-platform databases integrating the technologies and features of different types of databases.

[4] contains an overview of current state-of-the-art multi-model databases, their type of extension, their supported data models, query languages and platforms. The investigated multi-model databases support at most 5 of 8 data models, such that no multi-model database offers all data models to its users. Of the 21 investigated MM-DBMSs, only 5 support RDF as a data model, most of which (4 of these 5 MM-DBMSs with RDF support) also manage graph data. The graph model seems to be more popular (12 of 21 MM-DBMSs). MM-DBMSs with RDF support typically don't support reasoning at all or only in a rudimentary way, such that users should look for native semantic DBMSs if reasoning is needed. Hence reasoning seems to be challenging in the MM-DBMS context. Most multi-model databases run SQL, SQL-like or extensions of SQL queries. Binaries of these databases are offered in machine code (often compiled from C/C++) or for the Java virtual machine (JVM). They usually run on all or a big subset of the major desktop operating systems Linux, Windows, macOS, Unix and their variants. Few multi-model databases like IBM DB2 run on mainframes operating e.g. z/OS. While all offer to run in the cloud, some are also enabled for the hybrid cloud. In the hybrid cloud, a (locally installed) private cloud is used together with a public cloud. Hybrid clouds decrease the costs paid to the public cloud provider while still having on-demand resources with the illusion of infinite capacity at the public cloud for a surprisingly high resource demand.

While all multi-model databases run on different platforms, they don't integrate database instances on different types of platforms and different types of databases. Databases in hybrid clouds, which combine the resources of a locally installed private cloud with a public cloud, are approximations of the idea of operating on multiple platforms of different types. An HM3P DBMS extends this idea and supports multiple types of platforms like main-memory, cloud, Internet-of-Things (with e.g. edge computing) and hardware-accelerated databases, using their different advantages at runtime for database tasks like data distribution, transaction handling and query processing. An SHM3P DBMS offers a semantic layer as glue between the different data models and supports global semantic querying and reasoning by tightly integrating local query engines and reasoners.

2.2. Multi-Platform Development

There are several programming languages like C/C++ that compile to various platform targets in their native machine code, which is best suited for high-performance programs. Calls to the operating system for disk accesses or for developing a (native) graphical user interface must be ported to the different platforms. There is no special support for multi-platform development like sharing of common code and defining platform-specific modules to code the differences between the different platforms. Java was one of the first programming languages for developing one code base running on different platforms, which is still the key to the success of Java. It has been implemented by compiling to bytecode, which is processed in the Java virtual machine (JVM) available for many platforms. The JVM introduces an intermediate abstraction layer, but also some performance overhead, although the bytecode is often just-in-time (JIT) compiled to native machine code. Scripting languages like JavaScript also run on different platforms (i.e., wherever browsers and Node.js environments can be started). JavaScript besides HTML 5 is the basis of cross-platform libraries like React Native and PhoneGap. Advanced multi-platform support, introducing a module concept for sharing common code between the different platforms and platform-specific modules for coding the remaining differences, is offered by modern programming languages like Kotlin [5]. Kotlin offers multi-platform support for the JVM (Desktop, Server and Android), JavaScript engines (browser and server via Node.js) and, via LLVM, Windows, Linux, Android (arm32/64), MacOS, iOS, Raspberry Pi and WebAssembly.

Many DBMSs are implemented in C/C++ for performance reasons and run in native machine code for operating systems like Windows, Linux, Unix and MacOS (see [4]). Some modern DBMSs and most Semantic Web tools (see [15]) are implemented in Java, further decreasing development costs, but still running on clusters and servers operating Windows, Linux, Unix and MacOS. Real multi-platform tools, e.g. using Kotlin multi-platform projects, are missing so far for Semantic Web tools.

2.3. Databases for different Platforms

Most DBMSs and their clients run on different platforms. There usually also exist numerous language bindings for APIs calling database functionalities from database applications.

Multi-Platform DBMSs are typically either implemented in C/C++ or in Java. Ports are often available for Windows, Linux, Unix (sometimes for Solaris) and MacOS (see [4]). Only few DBMSs still run on mainframes. Modern DBMSs run in the Cloud and sometimes they are offered only as a managed service in the Cloud (e.g., Cosmos DB). A few are also running in a Hybrid Cloud, where the DBMS is running in a local installation of a cluster (private cloud) as well as in a public cloud (of a cloud provider). [15] contains a selection of 18 widely-used Semantic Web tools including triple stores and Semantic Web databases. Over half of these tools are implemented in Java (i.e., 6 of these tools run on any platform that supports Java) or support Java language bindings (4 of these tools). Semantic Web tools with native binaries usually run on any desktop and server computers, some only on Linux operating systems.

Hence these DBMSs can be called Multi-Platform DBMSs, but don't bring the multi-platform approach to its full potential. They are typically developed for one type of platform: server, cluster or cloud. DBMSs designed for different types of platforms like cluster, mobile, IoT and the web are not considered so far. HM3P DBMSs span over different platforms at runtime, which may be the case for hybrid cloud installations, but these are also not deployed at different platform types. Hence, full-fledged HM3P DBMSs have to consider various different properties (e.g., availability of nodes, storage and computing resources), the data (like security concerns) and queries (like one-time versus continuous queries) of the supported platforms at runtime for data distribution and processing. Reasoning support is not available for all platforms and types of queries [16]: While many contributions exist for RDFS and OWL support during one-time query processing on server and desktop computers, there exist only few approaches for the cloud and for P2P networks. There exist only few approaches for trigger and continuous queries with RDFS and OWL support on server and desktop computers as well as for the cloud. Ontology inference for trigger and continuous queries in P2P networks hasn't been considered so far. The development of an SHM3P database may help to support ontology inference in trigger and continuous queries with reasonable effort also on these platforms.

Multi-Platform Clients for setting up queries and displaying their results are available for all DBMSs⁵: DBMSs typically offer clients for platforms like the Web, major desktop operating systems like Windows, Linux, Unix and MacOS, and mobile platforms like Android and iOS. Some clients are even implemented as cross-platform applications⁶, which also support different DBMSs. The situation is quite comfortable for the Semantic Web: The W3C standardized the protocol to query SPARQL endpoints in [17]. The protocol [17] is widely supported and hence the Semantic Web DBMSs as well as the clients can be easily exchanged.

⁵ We consider PostgreSQL and its clients as example here.
⁶ For example, DBeaver available at https://dbeaver.io/ (accessed on 17.12.2020).

The user may have the impression that a database is running on different platforms, because (s)he gets in touch with clients for the database available for different platforms. However, the DBMS neither stores nor processes the data on the client's computer, but only transfers the query result to it. We envision a SHM3P DBMS, where the advantages of the different

platforms are utilized for data storing and processing, and the overall best approaches are chosen according to the platform properties.

[Figure 2 (diagram): A size axis with binary prefixes (Byte 2^1, Kibi 2^10, Mebi 2^20, Gibi 2^30, Tebi 2^40, Pebi 2^50, Exbi 2^60, Zebi 2^70, Yobi 2^80, and 2^167) and decimal prefixes (Byte 10, Kilo 10^3, Mega 10^6, Giga 10^9, Tera 10^12, Peta 10^15, Exa 10^18, Zetta 10^21, Yotta 10^24, and atoms on Earth 10^50). Placed along this axis are the rows Data (Office, Internet, IoT, Big Data*), Company (SMEs, Global Player), Devices (IoT Device, Embedded, Mobile, Historical Home Computer, Desktop, Server, Cluster, Cloud, Multi-Cloud), Databases (Main Memory, Centralized, Mobile, Web, Cloud, Hardware, IoT) and Platforms (Desktop, Web/Mobile, Cloud, Fog/Edge/Dew, P2P). SMEs: Small and medium-sized enterprises. * social media, search engines.]
Figure 2: Data sizes in companies, devices, databases and platforms. See [18] for the estimation of atoms on earth.

3. Multi-Platform Multi-Model Databases

Figure 2 provides an overview over data sizes of different types of data used in companies, devices, databases and platforms. It already becomes obvious that some types of databases fit better to the considered types of data and company, used devices and platforms than others. Hence the different types of data are stored on and processed at different platforms dependent on their size, the devices they are generated at and other properties like their velocity. Integrating these data sets implies supporting multiple models and also different platforms at the same time. This also requires supporting and integrating different types of databases running on different platforms. For example, one might combine the data of IoT devices (stored in an IoT database running on the edge of the network) with the accounting data containing the remaining time for charging off (stored in a main memory database running on an employee's desktop computer). These different types of databases have different properties and advantages because they have been developed for different application scenarios, devices, properties of their indexed data (velocity, heterogeneity, size etc.) and so on. Table 1 contains a rough evaluation of these databases. Databases have tailored their architectures according to the properties of the different platforms, but often also to the required properties coming from their applications. Especially distributed databases cannot offer everything: The well-known PAC theorem [19] (commonly called the CAP theorem) describes trade-offs, where developers of distributed systems (and hence also distributed databases) can choose to fully support only two features with high efficiency out of three: Partition-tolerance, Availability and Consistency. For example, if the system works correctly also in the case of network partitions and is highly available, then consistency must be relaxed, such that some replicas may contain older states and not the most recent ones. The PACELC theorem [20] refines the PAC theorem and states that in the case of network Partitions only Availability or Consistency is guaranteed. When there are no failures and the databases run normally (Else), there is a trade-off between Latency and Consistency, i.e., only small latency or high consistency can be guaranteed, but not both at the same time. Distributed triple stores, which are built on top of NoSQL databases, inherit the properties of their underlying systems: For example, D-SPARQ [21] supports PA/EC, because it is based on MongoDB⁷. Jena-HBase [22], H2RDF [23] and H2RDF+ [24] inherit the PC/EC properties of HBase⁸. CM-Well⁹ is based on Cassandra¹⁰ supporting PA/EL. Remaining research challenges include hybrid approaches supporting PA and PC (as well as EL and EC) for different fragments of the data at the same time according to their applications.

⁷ https://www.mongodb.com/ (accessed on 17.12.2020)
⁸ https://hbase.apache.org/ (accessed on 17.12.2020)
⁹ https://github.com/CM-Well/CM-Well (accessed on 17.12.2020)
¹⁰ https://cassandra.apache.org/ (accessed on 17.12.2020)

Hence there is a need to run these different types of databases at the same time, but there might also be the need for integrating the data of these databases (like in the scenario of combining the data of IoT devices with accounting data). For an advanced processing of these different types of data stored in different databases, and for other database tasks, it is indispensable to break the boundaries of single installations of these DBMSs and to run one single DBMS. Furthermore, it is desirable that this single DBMS provides a semantic layer for advanced processing and reasoning capabilities and for a tight integration of the different data models. This would also allow offering the best features


of the different types of databases to applications and users “under one hood” transparently or with
an intelligent integration into one query language and API. This single SHM3P DBMS installation runs
over all platforms at the same time, offering the advantages of all the different types of DBMSs (to
the data that has been previously processed by the single installations) tightly integrated in a
semantic layer, while also enabling, e.g., a global optimization of data distribution, transaction
handling, global queries and reasoning tasks to their full potential through the freedom to process
down to the physical layer (e.g., index accesses)11. One single SHM3P DBMS would also reduce the
development costs of applications and the familiarization periods of developers by offering one API
and query language with an additional semantic layer for all different platforms. A very big
challenge for SHM3P DBMSs is to provide a global distributed reasoner, which integrates different
types of reasoners to be processed on the different platforms, where reasoning is optimized for this
heterogeneous environment by minimizing overall costs that combine weighted costs of different types
(communication, processing, lifetime of IoT devices etc.).

3.1. Platforms

Here we briefly describe the different platforms that provide execution environments for different
types of DBMSs.
   Server Platforms are typical platforms for database servers of small to medium-sized enterprises
(SMEs). The DBMSs running on servers are usually centralized databases, which operate in parallel on
multi-core and sometimes many-core systems, often in virtual machines. Relational DBMSs, most
Semantic Web DBMSs and reasoners typically run on server platforms, and all other types of DBMSs
usually offer a local mode to run on a single server.
   Hardware-Accelerated Servers speed up database tasks by utilizing the massive parallelism of
special hardware beyond today’s multi-core CPUs.
   Modern Graphics Processing Units (GPUs) consist of several thousand computing cores, which follow
the single-instruction multiple-data paradigm, i.e., the same instruction is executed on different
data on different cores at the same time. GPUs are often regarded as a special form of many-core
CPUs. Hence, not all parallel algorithms are suitable for or benefit from GPUs. However, many-core
CPUs and GPUs are ideal for the massively parallel processing of execution plans as well as whenever
the best possibilities among enumerated ones must be found (as in query optimization and
multi-version concurrency control (MVCC)). Complex operations like joins processing large data inputs
are very suitable for GPU acceleration, too (see e.g. [25] for joins especially designed for SPARQL
processing on GPUs).
   Field-programmable gate arrays (FPGAs) can reconfigure interconnects for connecting programmable
logic blocks with each other. This property makes FPGAs ideally suited for data-flow-driven
algorithms (like processing an execution plan for evaluating queries in a streaming way without the
block-wise materialization of intermediate steps that is the case for many-core CPUs and GPUs), but
FPGAs can also offer any arbitrary type of parallelism. FPGA acceleration of SPARQL query processing
as discussed in e.g. [26] achieves scalable speedups that even increase with larger data sets.
Dynamic partial reconfiguration enables FPGAs to dynamically exchange their configurations to process
different queries at runtime [26].
   Universal quantum computers try to combine the full power of classical computers with quantum
computers that manipulate (some few) qubits in superposition by applying quantum logic gates. In
comparison, quantum annealers (operating on up to several thousand qubits) only run special types of
quantum algorithms to solve adiabatic (as a special form of combinatorial) optimization problems,
which is e.g. the case for traffic control12, selecting the execution plan with the best estimated
costs (from a set of enumerated plans) [27], concurrency control between transactions [28] as well as
optimizing transaction schedules [29, 30].
   Cloud Databases are designed to be run in the cloud, where (storage and computing) resources can
be dynamically allocated and freed according to users’ demands. Hence, cloud databases must consider
that nodes (for storing and computing) are joining and leaving, such that it may be necessary to
redistribute data and to react to processing jobs running on leaving nodes. Furthermore, as the nodes
are typically not high-end hardware like servers with redundant components and clouds consist of many
more nodes (up to several thousand nodes), hardware and communication failures may occur more often.
Hence, cloud computing architectures apply simple fault-tolerance mechanisms by repeating crashed
jobs. Table 2 contains an overview

    11 Note that single installations of DBMSs can only be accessed via their offered APIs or by
setting up subqueries (of the global query) to them, which hinders the full potential of optimized
processing of e.g. joins between the data of the different DBMSs.
    12 investigated by Volkswagen, see
https://www.volkswagenag.com/en/news/stories/2018/11/intelligent-traffic-control-with-quantum-computers.html
(accessed on 17.12.2020)


Table 1
Rough Evaluation of different Types of Databases.
              DBMS
                            Main                  Paral-            Distri-                   Fede-               Cloud            Web              Mobile           IoT
  Feature                  Memory                  lel              buted                     rated                                Cloud
  Scalability                       –                   O                   +                         +            +       +       +    +       +        +       +        +

  Transaction rates         +       +       +       +       +       O / +                             O            +       +            +                –       –        –
  Intra-Transaction         +       +       +       +       +       O / +                     –       / O              +                O                –            –
  Parallelism
  Atomicity                 +       +       +      +    +       +       +       +                     +                +                +                +            +
  Durability                        +                   +           +       +       +             +       +        +       +            –                O            –
  Consistency               +       +       +      +    +       +       +       +                     +                +                +                +            +
  Extensibility                     –                   +           O       / +                       O            +       +       +    +       +        –       +    +       +

  Schemaless                –       –       –      –    –       –   –       –       –                 –           +    +       +   +    +       +        +       +    +       +

  Availability                  +       +               +                   +                         –                –           –    –       –    –       –   –    –       –
  Transparency      of          +       +           +       +               +                         O            +       +            –                –       –        –
  Distribution
  Geographical Dis-             –       –               –                   +                         +            +       +       +    +       +    +       +   +        +
  tribution
  Mobility                          –                   –                   –                         O                O                O            +       +        +

  Node Autonomy                 –       –               –                   O                         +                O            –       –        +       +        +

  Heterogeneity of              –       –               –                   –                         +                –                –            +       +   +    +       +
  DBMS
  Administration                    O                   O                   –             –       / –         –        –            +       +        –       –   –    –       –
  Hardware Costs                    –               –       –               –                         –            +       +       +    +       +        –       +    +       +

  Reasoning                 +       +       +      +    +       +           +                     –       –        +       +            +            –       –   –    –       –


Table 2
Evolution of Big Data analytics engines. Based on [16] and extended by the rows “Impact on Databases” and “Impact on
Reasoning”.
   Generation:           1                       2                                   3                                     4
   Features:             Batch                   + Interactive                       + Near-Real-Time                      + Real-Time Streaming
                                                                                     + Iterative Processing                + Native Iterative Processing
   Processing Model:     MapReduce               DAG Dataflows                       Resilient Distributed Datasets (RDD)  Cyclic Dataflows
   Impact on Databases:  Long-Running Queries    Query Answering with lower latency  + Continuous Queries                  + Real-Time Continuous Queries
   Impact on Reasoning:  Long-Running Reasoning  Reasoning with lower latency        + Capabilities for Stream Reasoning   + Cap. for Real-Time Stream Reasoning
   Engine:               Hadoop                  TEZ                                 Spark                                 Flink
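
The row “Impact on Databases” of Table 2 distinguishes long-running (one-time) queries from continuous queries. The following minimal Python sketch illustrates this difference only conceptually; it does not use the API of any of the engines listed in the table, and the function names, the threshold and the window size are our own assumptions for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, List


def one_time_query(readings: List[float], threshold: float) -> List[float]:
    """One-time query: evaluated once over a complete, static data set."""
    return [r for r in readings if r > threshold]


def continuous_query(stream: Iterable[float], window: int) -> Iterator[float]:
    """Continuous query: emits a new result for every arriving reading,
    here the mean over a sliding window of the last `window` readings."""
    buffer = deque(maxlen=window)  # old readings fall out automatically
    for reading in stream:
        buffer.append(reading)
        yield sum(buffer) / len(buffer)


if __name__ == "__main__":
    data = [1.0, 5.0, 3.0, 7.0]  # hypothetical sensor readings
    print(one_time_query(data, threshold=4.0))
    print(list(continuous_query(iter(data), window=2)))
```

The one-time query needs the whole input before it can answer, whereas the continuous query keeps only a bounded window in memory and produces results while data is still arriving.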


of important state-of-the-art Big Data analytics engines working in cloud environments. In addition
to one-time queries, Apache Spark and Apache Flink offer to process data streams and continuous
queries, such that they also belong to the type of stream databases. There exist various examples of
Semantic Web databases on top of the different cloud technologies like [31] (HBase, Pig), [32]
(Spark) and [33] (Flink), but also other contributions like [34] that avoid using the well-known
technologies in order to support local joining. Web Cloud Databases rely on a new form of cloud, the
web cloud [35]: One just visits a certain webpage with one’s web browser in order to connect one’s
computer to the web cloud. In this way the setup of the web cloud is much easier than that of
traditional clouds. Furthermore, the web cloud has a much larger number of potential nodes, as any
computer running a browser may connect to and be integrated in the web cloud. New challenges arise
when setting up a cloud by web browsers: The nodes may be disconnected more often. Data is processed
within the browser and hence we must use the technologies offered by the browser for data management
purposes. New technologies like WebAssembly [36], which introduces a virtual machine for the browser,
may help to speed up processing in the browser. There exist first approaches to distribute SPARQL
queries in some kind of web clouds [37].
   Mobile Databases [38] involve the technical infrastructure of mobile providers like base stations
(which are close to their connected mobile devices) in order to speed up processing, lower
communication (and hence also energy) costs, and increase availability and durability (by logging at
the base stations instead of on mobile devices), and thereby overcome limitations of the mobile
devices. Some RDF stores like [39] are especially designed to run on mobile devices, but they do not
consider the backend of mobile providers so far.
   P2P Databases [40, 41] use peer-to-peer (P2P) networks as underlying backend technology to master
a frequent joining and leaving of nodes for data storing and processing. In comparison to clouds,
they are designed for a much more frequent change in their topology and for an equal distribution of
functionality without distinction of master and slave nodes. P2P databases have to introduce more
redundancy in data storing and even in processing in order to overcome the frequent disconnections
of their nodes. Furthermore, P2P databases must consider heterogeneity in the connected nodes much
more than other types of databases. There already exist quite a few approaches for semantic data
processing in P2P networks like [41], but ontology inference is considered only on a rudimentary
basis and for trigger and continuous queries not at all [16].
   IoT Databases [42] are especially developed to serve as data store for large-scale installations
of the Internet-of-Things (IoT). IoT databases often operate in the cloud, but the communication
bottleneck from the IoT devices to the cloud does not scale, especially for IoT devices with high
velocity and for large-scale installations.
   In combination with the cloud, fog computing [43] stores and processes data and application logic
on near-things edge devices with higher capabilities (rather than primarily in cloud data centers),
which saves communication by avoiding the route over the internet backbone. However, fog computing is
not really scalable in the number of connected things, as the near-things edge devices do not
increase in number and capabilities in the same way.
   The scalability issue is solved in a better way by edge computing [44], which additionally
utilizes all IoT devices for data storage and processing, and for executing application logic: The
more IoT devices are deployed, the more data needs to be stored and processed, but the more IoT
devices are also available.
   Dew computing [45, 46] overcomes availability problems, where the communication between cloud and
IoT devices is disturbed, by placing an additional local server near the IoT devices, which takes
over the tasks of the cloud during downtimes and synchronizes with the cloud at uptimes.
   Besides many approaches to semantic IoT like corresponding ontologies [47] and interoperability
issues [48], there are not so many contributions to semantic IoT databases. IoT databases are often
organized as P2P databases, especially if they work on the fog or edge, or follow the dew computing
concept. Hence contributions to P2P networks processing Semantic Web data like [41] are relevant for
semantic IoT databases as well. One of the big challenges here is the distribution of data and
processing tasks between cloud and IoT infrastructure including the devices themselves. Furthermore,
IoT devices often generate data streams, such that organizing the IoT database as a stream database
is a reasonable choice: The IoT application design may especially consider reducing data by
aggregation and by focusing on only relevant data, which should be done near the things. One research
direction may consider how to use Semantic Web technologies for defining such aggregation tasks.
Whether to reason at the data sources, nearby, or in clouds is another difficult question and not as
easy to answer as for query processing on the fog or edge, as reasoning consumes much more
processing resources.

3.2. (S)HM3P Databases and their Challenges

HM3P databases are single installations of an M3P DBMS, which are not only able to run on multiple
platforms, but also run and tightly integrate different types of DBMSs for ease of use and
optimization purposes at runtime. SHM3P databases integrate the different types of DBMSs in an
additional semantic layer and support global reasoning over all integrated DBMSs.
   IoT databases operating at the same time in clouds and on fog, edge or dew computing are
reasonable examples for HM3P DBMSs: They span over different platforms, the edge of the IoT network
and the cloud data centers, and have to distribute functionality like data aggregation at or near
the things and complex operations, e.g., natural language processing and reasoning, at the cloud data
centers. Furthermore, IoT databases have to consider different types of query processing by dealing
with traditional (one-time) queries on static data, continuous queries on data streams and
spatial-temporal queries on archived data of data streams.
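
How such a distribution of functionality between the things and the cloud data centers might be decided can be sketched as a weighted cost comparison over the cost types named in this section (communication, processing, lifetime of IoT devices). The following Python sketch is only an illustration under our own assumptions: the class `Cost`, the weights, the platform names and all cost estimates are hypothetical and not taken from an existing system.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Cost:
    communication: float
    processing: float
    energy: float  # stands in for the lifetime of battery-powered IoT devices


# Hypothetical weights expressing how important each cost type is.
WEIGHTS = Cost(communication=0.5, processing=0.3, energy=0.2)


def weighted(cost: Cost, weights: Cost = WEIGHTS) -> float:
    """Combine the individual cost types into one overall cost."""
    return (weights.communication * cost.communication
            + weights.processing * cost.processing
            + weights.energy * cost.energy)


def place(estimates: Dict[str, Cost]) -> str:
    """Return the platform with the lowest overall weighted cost."""
    return min(estimates, key=lambda platform: weighted(estimates[platform]))


# Hypothetical cost estimates per task and platform.
ESTIMATES = {
    "aggregation": {"edge": Cost(1, 4, 3), "cloud": Cost(9, 1, 1)},
    "reasoning": {"edge": Cost(2, 20, 15), "cloud": Cost(6, 2, 1)},
}

placement = {task: place(costs) for task, costs in ESTIMATES.items()}
print(placement)
```

With these assumed numbers the aggregation task is placed at the edge, because shipping raw data dominates its cost, while reasoning is placed in the cloud, because its processing and energy demands dominate. A real optimizer would have to estimate these costs at runtime and consider many more factors.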


IoT devices are often heterogeneous because they are e.g. developed by different manufacturers: the
use of ontologies and hence of semantic databases simplifies the integration of these devices.
Semantic IoT databases sometimes manage data at the IoT devices in the traditional way for
performance reasons and only support reasoning and semantic querying at the cloud centers after
transforming the data of IoT devices to semantic data [16]. Other approaches even support reasoning
on streams [16].
   Multi-platform DBMSs are already highly ambitious even for large, established database companies,
since they require data management skills in an extremely wide spectrum (i.e., data management issues
in sensors and smart objects for IoT databases are completely different from the challenges of
in-memory databases of P2P data oriented systems and semantic querying and reasoning of Semantic Web
databases). Hence current approaches focus more on interoperability between the variety of DBMSs,
each one focusing on its specific issues related to its specific functionalities. However, we propose
to support a global approach to integrate all these specific functionalities in order to use their
different benefits in a uniform way and to increase the overall benefits of the global approach.
   New challenges of M3P and HM3P DBMSs in comparison to traditional DBMSs and MM-DBMSs are
• developing only one code base for the different platforms, but not introducing performance overhead
  in comparison to single platform databases13
• identifying common properties of several platforms and reusing those approaches (like fault
  tolerance mechanisms) in different combinations, which are best suited for the considered platforms
• data distribution among different platforms (applying different data distribution approaches as
  well)
• efficient binary serialization and communication protocols for integrating the different platforms
• data distribution strategies considering overall the different properties of used platforms and
  models (like fast reads in relational databases on parallel servers and fast updates in cloud
  databases)
• query optimization and other database tasks across different platforms, which apply different
  database approaches
• dealing with and integrating different privacy and security mechanisms supporting different privacy
  and security levels in the different platforms (with research e.g. on querying heterogeneous
  encrypted data)
• developing multi-platform transaction synchronization approaches and supporting global transaction
  synchronization approaches over distributed different transaction synchronization approaches
  running on different platforms
• combining different types of databases (on different platforms) to offer the best of these
  databases and platforms under one hood to applications and users transparently or via intelligent
  integration into query language and API, e.g., guaranteeing atomicity and isolation in transactions
  for the data stored on a parallel server, but not for the data in the cloud supporting fast updates

   Specific challenges of SHM3P DBMSs are
• integrating different data models in a semantic layer on top of the underlying data models
• efficient transformations from and to the semantic model in an operational system
• developing efficient semantic querying and reasoning over the integrated data of different models
• global reasoning over reasoners running on different platforms supporting some kind of distributed
  heterogeneous reasoning
• developing a combination of stream reasoning over streaming data (e.g. of IoT devices) with static
  reasoning over large-scale data sets (stored e.g. in clouds)
• supporting transactions over semantic data by integrating the reasoner in transaction
  synchronization

   We are sure that this is not an exhaustive list of new challenges. Many further challenges will
arise while developing the (S)(H)M3P DBMSs and especially when considering combinations of different
platforms and models at runtime.

    13 We are of the opinion that this is possible by applying Kotlin features like expect and actual
declarations for classes and types, and inline functions and classes.

4. Summary and Conclusions

Multi-model databases provide the infrastructure to handle the zoo of data models managed in today’s
companies. Multi-model databases that are able to run on a variety of platforms, which are typically
deployed and in use in parallel in today’s companies, are called multi-model multi-platform database
management systems (M3P DBMSs). Hybrid M3P (HM3P) DBMSs span over different platforms at run-time.
Our focus is on their semantic counterpart: Semantic HM3P (SHM3P) DBMSs offer an additional semantic
layer for simple integration of the DBMS technologies of their operational platforms. Furthermore, we
describe and analyze different types of DBMSs and platforms concerning their properties, chances and
challenges for DBMSs with special focus on Semantic DBMSs. Current state-of-the-art (S)M3P DBMSs do
not exploit the multiple platform idea to its full potential, because they typically only tightly
integrate one type of platform and database. We see great further optimization possibilities in data
and functionality distribution like query processing, reasoning and transaction handling, and ease of
usage when different types of platforms and databases are supported in one single installation of an
M3P DBMS by tightly integrating them based on a semantic layer.


References

  [1] J. Lu, I. Holubová, Multi-model databases: A new journey to handle the variety of data, ACM
      Computing Surveys (CSUR) 52 (2019).
  [2] R. Kotorov, Customer relationship management: strategic lessons and future directions, Business
      Process Management Journal 9 (2003) 566–571.
  [3] S. Leberknight, Polyglot persistence, Scott Leberknight’s Weblog,
      http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence, 2008.
  [4] S. Groppe, J. Groppe, Hybrid multi-model multi-platform (hm3p) databases, in: Proceedings of the
      9th International Conference on Data Science, Technology and Applications (DATA), 2020.
  [5] JetBrains s.r.o., FAQ - Kotlin Programming Language, 2020. URL:
      https://kotlinlang.org/docs/reference/faq.html.
  [6] B. Kolev, P. Valduriez, C. Bondiombouy, R. Jiménez-Peris, R. Pau, J. Pereira, Cloudmdsql:
      querying heterogeneous cloud data stores with a common language, Distributed and Parallel
      Databases 34 (2016) 463–503.
  [7] M. Zhu, T. Risch, Querying combined cloud-based and relational databases, in: International
      Conference on Cloud and Service Computing, 2011, pp. 330–335.
  [8] R. Bonaque, T. D. Cao, B. Cautis, F. Goasdoué, J. Letelier, I. Manolescu, O. Mendoza, S. Ribeiro,
      X. Tannier, M. Thomazo, Mixed-instance querying: a lightweight integration architecture for
      data journalism, PVLDB 9 (2016) 1513–1516.
  [9] M. Hammer, D. McLeod, On Database Management System Architecture., Technical Report, MIT,
      Cambridge Laboratory for Computer Science, 1979.
 [10] J. M. Smith, P. A. Bernstein, U. Dayal, N. Goodman, T. Landers, K. W. T. Lin, E. Wong, Multibase:
      Integrating heterogeneous distributed database systems, in: AFIPS National Computer
      Conference, 1981, pp. 487–499.
 [11] H. Lim, Y. Han, S. Babu, How to fit when no one size fits., in: CIDR, 2013.
 [12] A. Elmore, J. Duggan, M. Stonebraker, M. Balazinska, U. Cetintemel, V. Gadepally, J. Heer,
      B. Howe, J. Kepner, T. Kraska, S. Madden, D. Maier, T. Mattson, S. Papadopoulos, J. Parkhurst,
      N. Tatbul, M. Vartak, S. Zdonik, A demonstration of the bigdawg polystore system, Proc. VLDB
      Endow. 8 (2015) 1908–1911.
 [13] J. Lu, Z. H. Liu, P. Xu, C. Zhang, UDBMS: road to unification for multi-model data management,
      in: ER Workshops, 2018, pp. 285–294.
 [14] I. Holubova, S. Scherzinger, Nextgen multi-model databases in semantic big data architectures,
      Open Journal of Semantic Web (OJSW) 7 (2020) 1–16.
 [15] W3C, Semantic Web Development Tools, accessed on 23/4/2020.
      https://www.w3.org/2001/sw/wiki/Tools.
 [16] S. Groppe, Emergent models, frameworks, and hardware technologies for big data analytics, The
      Journal of Supercomputing 76 (2020) 1800–1827.
 [17] L. Feigenbaum, G. T. Williams, K. G. Clark, E. Torres (editors), SPARQL 1.1 Protocol, 2013. W3C
      Recommendation, https://www.w3.org/TR/sparql11-protocol/.
 [18] D. Weisenberger, How many atoms are there in the world?, accessed on 17.12.2020.
      http://education.jlab.org/qa/mathatom_05.html.
 [19] E. A. Brewer, Pushing the CAP: strategies for consistency and availability, Computer 45 (2012)
      23–29.
 [20] D. Abadi, Consistency tradeoffs in modern distributed database system design: CAP is only part
      of the story, Computer 45 (2012) 37–42.
 [21] R. Mutharaju, S. Sakr, A. Sala, P. Hitzler, D-sparq: Distributed, scalable and efficient rdf
      query engine, in: Proceedings of the 12th International Semantic Web Conference (Posters &
      Demonstrations Track), Sydney, Australia, 2013, p. 261–264.
 [22] V. Khadilkar, M. Kantarcioglu, B. Thuraisingham, P. Castagna, Jena-hbase: A distributed, scalable
      and efficient rdf triple store, in: Proceedings of the 2012th International Conference on
      Posters & Demonstrations Track, Boston, USA, 2012, p. 85–88.
 [23] N. Papailiou, I. Konstantinou, D. Tsoumakos, N. Koziris, H2RDF: Adaptive query processing on
      rdf data in the cloud, in: Proceedings of the 21st International Conference on World Wide Web,


     Lyon, France, 2012, p. 397–400.
[24] N. Papailiou, I. Konstantinou, D. Tsoumakos, P. Karras, N. Koziris,
     H2RDF+: high-performance distributed joins over large-scale RDF
     graphs, in: Proceedings of the 2013 IEEE International Conference on
     Big Data, Santa Clara, USA, 2013, pp. 255–263.
[25] X. Zhang, M. Zhang, P. Peng, J. Song, Z. Feng, L. Zou, A scalable
     sparse matrix-based join for sparql query processing, in:
     International Conference on Database Systems for Advanced
     Applications, Springer, 2019, pp. 510–514.
[26] S. Werner, D. Heinrich, S. Groppe, C. Blochwitz, T. Pionteck, Runtime
     adaptive hybrid query engine based on fpgas, Open Journal of
     Databases (OJDB) 3 (2016) 21–41.
[27] I. Trummer, C. Koch, Multiple query optimization on the d-wave 2x
     adiabatic quantum computer, Proc. VLDB Endow. 9 (2016).
[28] S. Roy, L. Kot, C. Koch, Quantum databases, in: CIDR, 2013.
[29] T. Bittner, S. Groppe, Avoiding blocking by scheduling transactions
     using quantum annealing, in: 24th International Database Engineering
     & Applications Symposium (IDEAS), Seoul, Republic of Korea, 2020.
[30] T. Bittner, S. Groppe, Hardware accelerating the optimization of
     transaction schedules via quantum annealing by avoiding blocking,
     Open Journal of Cloud Computing (OJCC) 7 (2020) 1–21. URL:
     http://nbn-resolving.de/urn:nbn:de:101:1-2020112218332015343957.
[31] S. Groppe, T. Kiencke, S. Werner, D. Heinrich, M. Stelzner,
     L. Gruenwald, P-luposdate: Using precomputed bloom filters to speed
     up sparql processing in the cloud, Open Journal of Semantic Web
     (OJSW) 1 (2014) 25–55.
[32] D. Graux, L. Jachiet, P. Geneves, N. Layaïda, Sparqlgx: Efficient
     distributed evaluation of sparql with apache spark, in: ISWC, 2016.
[33] A. Azzam, S. Kirrane, A. Polleres, Towards making distributed rdf
     processing flinker, in: Innovate-Data, IEEE, 2018, pp. 9–16.
[34] S. Groppe, J. Blume, D. Heinrich, S. Werner, A self-optimizing cloud
     computing system for distributed storage and processing of semantic
     web data, Open Journal of Cloud Computing (OJCC) 1 (2014) 1–14.
[35] S. Groppe, N. Reimer, Code generation for big data processing in the
     web using webassembly, Open Journal of Cloud Computing (OJCC)
     6 (2019) 1–15.
[36] A. Rossberg (editor), WebAssembly Core Specification, W3C Proposed
     Recommendation, https://www.w3.org/TR/wasm-core-1/, 2019.
[37] A. Grall, P. Folz, G. Montoya, H. Skaf-Molli, P. Molli,
     M. Vander Sande, R. Verborgh, Ladda: Sparql queries in the fog of
     browsers, in: European Semantic Web Conference, Springer, 2017,
     pp. 126–131.
[38] V. Kumar, Mobile database systems, Wiley Online Library, 2006.
[39] D. Le-Phuoc, J. X. Parreira, V. Reynolds, M. Hauswirth, Rdf on the go:
     An rdf storage and query processor for mobile devices, in: ISWC,
     Citeseer, 2010.
[40] K. Graffi, D. Stingl, C. Gross, H. Nguyen, A. Kovacevic, R. Steinmetz,
     Towards a p2p cloud: Reliable resource reservations in unreliable p2p
     systems, in: International Conference on Parallel and Distributed
     Systems, 2010, pp. 27–34.
[41] R. Mietz, S. Groppe, O. Kleine, D. Bimschas, S. Fischer, K. Römer,
     D. Pfisterer, A p2p semantic query framework for the internet of
     things, PIK-Praxis der Informationsverarbeitung und Kommunikation
     36 (2013) 73–79.
[42] ObjectBox Limited, The best IoT Databases for the Edge – an overview
     and compact guide,
     https://objectbox.io/the-best-iot-databases-for-the-edge-an-overview-and-compact-guide/,
     2019.
[43] M. Abdelshkour, Iot, from cloud to fog computing, Cisco Blogs,
     http://blogs.cisco.com/perspectives/iot-from-cloud-to-fog-computing,
     2015.
[44] P. Garcia Lopez, A. Montresor, D. Epema, A. Datta, T. Higashino,
     A. Iamnitchi, M. Barcellos, P. Felber, E. Riviere, Edge-centric
     computing: Vision and challenges, SIGCOMM Comput. Commun. Rev.
     45 (2015) 37–42.
[45] K. Skala, D. Davidovic, E. Afgan, I. Sovic, Z. Sojat, Scalable
     distributed computing hierarchy: Cloud, fog and dew computing, Open
     Journal of Cloud Computing (OJCC) 2 (2015) 16–24.
[46] Y. Wang, Definition and categorization of dew computing, Open Journal
     of Cloud Computing (OJCC) 3 (2016) 1–7.
[47] S. Mishra, S. Jain, Ontologies as a semantic model in iot,
     International Journal of Computers and Applications 42 (2020)
     233–243.
[48] A. Cimmino, M. Poveda-Villalón, R. García-Castro, ewot: A semantic
     interoperability approach for heterogeneous iot ecosystems based on
     the web of things, Sensors 20 (2020) 822.