Semantic Hybrid Multi-Model Multi-Platform (SHM3P) Databases

Sven Groppe
Institute of Information Systems (IFIS), University of Lübeck, Ratzeburger Allee 160, D-23562 Lübeck, Germany
groppe@ifis.uni-luebeck.de (S. Groppe), ORCID 0000-0001-5196-1117 (S. Groppe)

ISIC’21: International Semantic Intelligence Conference, February 25–27, 2021, New Delhi, India
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org

Abstract

Today’s companies have to handle a zoo of data of different models. Multi-model databases promise to simplify data administration for the parallel usage of different data models. Compared to the other data models, semantic data models introduce an additional abstraction layer for reasoning purposes, such that semantic data models provide superior capabilities. Hence semantic multi-model databases use the semantic data model as main glue between the different data models. Furthermore, applications as well as databases are today running on different platforms like mobile devices, web, desktops, servers, clouds and post-clouds (e.g., fog and edge computing). Hybrid multi-model multi-platform (HM3P) databases and their semantic counterpart (SHM3P databases) integrate the different platforms in order to offer their advantages and benefits for data distribution, query processing and transaction handling to their users. In this paper we introduce and discuss the novel concept of SHM3P databases and its open challenges.

Keywords

Semantic Web, databases, multi-platform, multi-model, cloud, post-cloud, edge computing, fog computing, dew computing, hardware acceleration, Internet-of-Things, mobile database, parallel database, main-memory database

1. Introduction

Today companies have to deal with and process data in various data formats: The backends of their web shops with databases about customers and their orders are typically connected to relational databases. Product catalogs of companies are often exchanged using XML, JSON or RDF. The boom of social networks leads to a high demand to process their graph data, while other social media like wikis offer their data as unstructured data. Key-value stores are often used whenever data must be accessed in a simple way just via keys. However, there is also a need for schema-free or schema-less databases like document stores, which do not force the data into the inflexible corset of a schema, but still work on complex data formats. The data is hence stored according to and processed using different models (multi-model data [1]). The big challenge for today’s companies is the synchronization and integration of their multi-model data into a single view of and for the customer [2]. Multi-Model Database Management Systems (MM-DBMSs) offer the management of different data models in one single database [1] in order to overcome the disadvantages of polyglot persistence, where applications use several databases at the same time to handle multi-model data [3], which hinders optimizations down to the physical layer of the connected DBMSs [4]. Furthermore, we propose the semantic data model in order to unify the other data models, because the semantic data model offers the ontology layer as an additional abstraction layer, which can be utilized for data integration purposes of the other data models.

While in the past database management systems (DBMSs) ran mainly on parallel servers, there are today various different platforms like mobile devices, web, desktops, servers (maybe additionally hardware-accelerated by GPUs, FPGAs and in future scenarios even quantum computing), clouds and post-clouds (e.g., fog and edge computing) offering execution environments for running a DBMS.¹

¹ Note that clients of DBMSs typically run on different platforms, but we are considering the database server here.

Multi-platform development (as supported by e.g. the programming language Kotlin [5]) allows sharing common code between different platforms like desktop, server, web, mobile and IoT. Multi-platform development drastically reduces the development costs for a DBMS running on multiple platforms.

Puzzling all pieces together we propose the following definitions ((H)M3P DBMS are defined according to [4]):

Definition 1 (M3P/HM3P/SHM3P DBMS). A Multi-Model Multi-Platform Database Management System (M3P DBMS) is a MM-DBMS that can be executed on different platforms. A hybrid M3P (HM3P) DBMS spans over different platforms in operation. A Semantic HM3P (SHM3P) DBMS supports a (global) semantic layer (for querying and reasoning purposes) over all platforms of an HM3P DBMS.

Whereas today’s M3P DBMSs are typically developed for platforms of the same type (like Windows and Linux servers, see Section 2.1), some others even span over a (locally installed) private cloud and a public cloud (in a so-called hybrid cloud²). In contrast, we envision SHM3P DBMSs over platforms of different type (like IoT and hardware-accelerated parallel servers) integrating the features of databases developed for these platforms (like energy savings on IoT devices and high throughput on servers) while offering advanced global reasoning capabilities over all platforms. Hence SHM3P databases support any data model at any platform by tightly integrating them with a semantic layer. For an example installation, see Figure 1.

² Please note that private and public clouds are platforms of the same type.

[Figure 1 (diagram): A single instance of an SHM3P database offers the (fully cross-platform optimized) functionality of, and replaces, several databases. Database labels: Mobile DB, Quantum DB, Cloud DB, Main-Memory DB, IoT DB. Platform labels: Mobile Devices & Infrastructure, Quantum Computer, Cloud, On the Edge, GPU-accelerated Parallel Server. Reasoning annotations: lightweight reasoning on large data sizes of IoT devices; heavyweight reasoning on moderate data sizes; heavyweight reasoning on large data sizes; reasoning on small data sizes of mobile devices. Question posed in the figure: How to integrate the different reasoning capabilities and requirements into one transparent global reasoner?]

Figure 1: SHM3P database spanning over multiple platforms. Here, an SHM3P database replaces an IoT database in an Industry 4.0 scenario (using edge computing), a GPU-accelerated parallel database (on a parallel server) for archiving and generating long-term statistics of the IoT data, which is further supported by a quantum computer for query and reasoning optimization, a database in the cloud for natural language processing tasks and a mobile database (on mobile devices and infrastructure) for monitoring and controlling of the production line in the company. Platforms are marked with an italic font. Green text marks discussion about reasoning in these scenarios. Figure is based on [4] and extended by the discussion on reasoning.

Our main contributions are:
• the introduction of SHM3P DBMS as new type of DBMS,
• a detailed discussion of the current state of the art about and comparative analysis of DBMS designed for different platforms with special attention to Semantic Web DBMS, and
• a discussion about open research challenges for HM3P DBMS and SHM3P DBMS.

The remainder is as follows: Section 2 describes the basics and an analysis of the current state of the art concerning MM-DBMSs, multi-platform development, databases running on different platforms, polyglot persistence and further related work. Section 3 introduces SHM3P DBMSs, explores their advantages, and analyses envisioned platforms and common properties of their combinations. Finally we summarize the results and provide an overview of future work in Section 4.

2. Basics

2.1. Databases for Multi-Model Data

Polyglot persistence uses different databases supporting different data models (and maybe running on different platforms) within one application [3]. Federated query languages enable polyglot persistence by supporting queries over heterogeneous data stores within one single query. One example of such a query language is CloudMdsQL [6], with which one can formulate queries over SQL and NoSQL databases. The proposed prototype even optimizes the queries globally and pushes operations down to the integrated SQL and NoSQL databases as much as possible. A similar approach is taken by [7], which offers to query cloud-based NoSQL databases like Google’s Bigtable and relational databases with the Google Bigtable query language GQL. The focus of Apache Drill³ is interactive ad-hoc analysis of large-scale datasets with low latency, handling up to petabytes of data spread across thousands of servers. Drill optimizes a query plan to leverage the datastore’s internal processing capabilities and by considering data locality. Commercial multi-store products like IBM BigInsights, Microsoft HDInsight and Oracle Bigdata Appliance as well as open source projects like PrestoDB⁴ integrate diverse data sources by using database connectors (like JDBC drivers). Tatooine [8] uses a semantic layer as glue between databases for different data models supporting a semantic integration. However, none of these polystores fully optimizes queries across the integrated, but independent data sources, which limits data processing.

³ https://drill.apache.org/ (accessed on 17.12.2020)
⁴ https://prestodb.io/ (accessed on 17.12.2020)

Federation Databases [9] and multidatabases [10] place a mediator between different autonomous databases for integration purposes by reformulating queries according to a global schema to the native schemes of the integrated databases, which afterwards execute these queries. Today, some research focuses on federating databases following the polyglot persistence approach: For example, DBMS+ [11] provides unified declarative processing for the integration of several processing and database platforms. BigDAWG [12] offers location transparency while running queries against the three different integrated systems PostgreSQL, SciDB and Accumulo.

Multi-Model Databases: A multi-model database is one single database for multiple data models, which fully integrates a backend to offer advanced performance, scalability and fault tolerance [13]. Among the first of this type are Object-Relational DataBase Management Systems (ORDBMSs), which support various data models like relational, text, XML, spatial and object. ORDBMSs use the relational technology for implementing the support of their data models, i.e., the relational model is the first-class citizen. In comparison and in general, in multi-model databases the different models can all be first-class citizens and supported in a native way (utilizing e.g. specialized indices for them). The authors in [14] propose to use a semantic layer as glue between the different data models in order to support global querying and reasoning over all data. We extend this idea to multi-platform databases integrating the technologies and features of different types of databases.

[4] contains an overview of current state-of-the-art multi-model databases, their type of extension, their supported data models, query languages and platforms. The investigated multi-model databases support at most 5 of 8 data models, such that no multi-model database offers all data models to its users. Of the investigated 21 MM DBMS only 5 support RDF as data model, and 4 of these 5 MM DBMS with RDF support also manage graph data. The graph model seems to be more popular (12 of 21 MM DBMS). MM DBMS with RDF support typically don’t support reasoning at all or only in a rudimentary way, such that users should look for native semantic DBMS if reasoning is needed. Hence reasoning seems to be challenging in the MM DBMS context. Most multi-model databases run SQL, SQL-like or extensions of SQL queries. Binaries of these databases are offered in machine code (often compiled from C/C++) or for the Java virtual machine (JVM). They usually run on all or a big subset of the major desktop operating systems Linux, Windows, macOS, Unix and their variants. Few multi-model databases like IBM DB2 run on mainframes operating e.g. z/OS. While all offer to run in the cloud, some are also enabled for the hybrid cloud. In the hybrid cloud, a (locally installed) private cloud is used together with a public cloud. Hybrid clouds decrease the costs paid to the public cloud provider while still having on-demand resources with the illusion of infinite capacity at the public cloud for a surprisingly high resource demand.

While all multi-model databases run on different platforms, they don’t integrate database instances on different types of platforms and different types of databases. Databases in hybrid clouds combining the resources of a locally installed private cloud with a public cloud are approximations of the idea of operating on multiple platforms of different types. An HM3P DBMS extends this idea and supports multiple types of platforms like main-memory, cloud, Internet-of-Things (with e.g. edge computing) and hardware-accelerated databases using their different advantages at runtime for database tasks like data distribution, transaction handling and query processing. A SHM3P DBMS offers a semantic layer as glue between the different data models and supports global semantic querying and reasoning by tightly integrating local query engines and reasoners.

2.2. Multi-Platform Development

There are several programming languages like C/C++ that compile to various platform targets in their native machine code, which is best suited for high-performance programs. Calls to the operating system for disk accesses or for developing a (native) graphical user interface must be ported to the different platforms. There is no special support for multi-platform development like code-sharing of common code and allowing to define platform-specific modules to code the differences between the different platforms. Java was one of the first programming languages for developing one code running on different platforms, which is still the key for the success of Java. It has been implemented by compiling to bytecode, which is processed in the Java virtual machine (JVM) available for many platforms. The JVM introduces an intermediate abstraction layer, but also some performance overhead, although the bytecode is often just-in-time (JIT) compiled to native machine code. Scripting languages like JavaScript also run on different platforms (i.e., wherever browsers and Node.js environments can be started). JavaScript besides HTML 5 is the basis of cross-platform libraries like React Native and PhoneGap. Advanced multi-platform support introducing a module concept for sharing common code between the different platforms, and platform-specific modules for coding remaining differences, is introduced by modern programming languages like Kotlin [5]. Kotlin offers multi-platform support for the JVM (Desktop, Server and Android), JavaScript engines (browser and server via Node.js) and via LLVM Windows, Linux, Android (arm32/64), MacOS, iOS, Raspberry Pi and WebAssembly.

Many DBMSs are implemented in C/C++ for performance reasons and run in native machine code for operating systems like Windows, Linux, Unix and MacOS (see [4]). Some modern DBMSs and most Semantic Web tools (see [15]) are implemented in Java, further decreasing development costs, but still running on clusters and servers operating Windows, Linux, Unix and MacOS. Real multi-platform tools, e.g. built as Kotlin multi-platform projects, are missing so far for Semantic Web tools.

2.3. Databases for different Platforms

Most DBMSs and their clients run on different platforms. There usually also exist numerous language bindings for APIs calling database functionalities from database applications.

Multi-Platform DBMSs are typically either implemented in C/C++ or in Java. Ports are often available for Windows, Linux, Unix (sometimes for Solaris) and MacOS (see [4]). Only few DBMSs still run on mainframes. Modern DBMSs run in the Cloud and sometimes they are offered only as managed service in the Cloud (e.g., Cosmos DB). Some few are also running in a Hybrid Cloud, where the DBMS is running in a local installation of a cluster (private cloud) as well as in a public cloud (of a cloud provider). [15] contains a selection of 18 widely-used Semantic Web tools including triple stores and Semantic Web databases. Over half of these tools are either implemented in Java (6 of these tools, which run on any platform that supports Java) or support Java language bindings (4 of these tools). Semantic Web tools with native binaries usually run on any desktop and server computers, some only on Linux operating systems.

Hence these DBMSs can be called Multi-Platform DBMSs, but they don’t bring the multi-platform approach to its full potential. They are typically developed for one type of platform: server, cluster or cloud. DBMSs designed for different types of platforms like cluster, mobile, IoT and the web are not considered so far. HM3P DBMSs span over different platforms at runtime, which may be the case for hybrid cloud installations, but these are also not deployed at different platform types. Hence, full-fledged HM3P DBMSs have to consider various different properties (e.g., availability of nodes, storage and computing resources), the data (like security concerns) and queries (like one-time versus continuous queries) of the supported platforms at runtime for data distribution and processing. Reasoning support is not available for all platforms and types of queries [16]: While many contributions exist for RDFS and OWL support during one-time query processing on server and desktop computers, there exist only few approaches for the cloud and for P2P networks. There exist only few approaches for trigger and continuous queries with RDFS and OWL support on server and desktop computers as well as for the cloud. Ontology inference for trigger and continuous queries in P2P networks has not been considered so far. The development of an SHM3P database may help to support ontology inference in trigger and continuous queries with reasonable efforts also on these platforms.

Multi-Platform Clients offering to set up queries and displaying their results are available for all DBMSs⁵: DBMSs typically offer clients for platforms like the Web, major desktop operating systems like Windows, Linux, Unix and MacOS, and mobile apps for Android and iOS. Some clients are even implemented as cross-platform applications⁶, which also support different DBMSs. The situation is quite comfortable for the Semantic Web: The W3C standardized the protocol to query SPARQL endpoints in [17]. The protocol [17] is widely supported and hence the Semantic Web DBMSs as well as the clients can be easily exchanged.

⁵ We consider PostgreSQL and its clients as example here.
⁶ For example, DBeaver available at https://dbeaver.io/ (accessed on 17.12.2020).

The user may have the impression that a database is running on different platforms, because (s)he gets in touch with clients for the database available for different platforms. However, the DBMS neither stores nor processes the data on the clients’ computer, but only transfers the query result to it. We envision a SHM3P DBMS, where the advantages of the different platforms are utilized for data storing and processing, and the overall best approaches are chosen according to the platform properties.

3. Multi-Platform Multi-Model Databases

Figure 2 provides an overview over data sizes of different types of data used in companies, devices, databases and platforms. It already becomes obvious that some types of databases fit better to the considered types of data and company, used devices and platforms than the others. Hence the different types of data are stored on and processed at different platforms dependent on their size, the devices they are generated at and other properties like their velocity. Integrating these data sets implies supporting multiple models and also different platforms at the same time. This also requires supporting and integrating different types of databases running on different platforms. For example, one might combine the data of IoT devices (stored in an IoT database running on the edge of the network) with the accounting data containing the remaining time for charging off (stored in a main memory database running on an employee’s desktop computer). These different types of databases have different properties and advantages because they have been developed for different application scenarios, devices, properties of their indexed data (velocity, heterogeneity, size etc.) and so on. Table 1 contains a rough evaluation of these databases. Databases have tailored their architectures according to the properties of the different platforms, but often also to the required properties coming from their applications. Especially distributed databases cannot offer everything: The well-known PAC theorem [19] (more commonly called the CAP theorem) describes trade-offs, where developers of distributed systems (and hence also distributed databases) can choose to fully support only two features with high efficiency out of three: Partition-tolerance, Availability and Consistency. For example, if the system works correctly also in the case of network partitions and is highly available, then consistency must be relaxed, such that some replicas may contain older states and not the most recent ones. The PACELC theorem [20] refines the PAC theorem and states that in the case of network Partitions only Availability or Consistency is guaranteed. In case of no failures when the databases run normally (Else), there is a trade-off between Latency and Consistency, i.e., only small latency or high consistency can be guaranteed, but not both at the same time. Distributed triple stores, which are built on top of NoSQL databases, inherit the properties of their underlying systems: For example, D-SPARQ [21] supports PA/EC, because it is based on MongoDB⁷. Jena-HBase [22], H2RDF [23] and H2RDF+ [24] inherit the PC/EC properties of HBase⁸. CM-Well⁹ is based on Cassandra¹⁰ supporting PA/EL. Remaining research challenges include hybrid approaches supporting PA and PC (as well as EL and EC) for different fragments of the data at the same time according to their applications.

⁷ https://www.mongodb.com/ (accessed on 17.12.2020)
⁸ https://hbase.apache.org/ (accessed on 17.12.2020)
⁹ https://github.com/CM-Well/CM-Well (accessed on 17.12.2020)
¹⁰ https://cassandra.apache.org/ (accessed on 17.12.2020)

[Figure 2 (diagram): A horizontal scale of data sizes from Byte over Kibi/Kilo, Mebi/Mega, Gibi/Giga, Tebi/Tera, Pebi/Peta, Exbi/Exa, Zebi/Zetta and Yobi/Yotta (binary 2^10 to 2^80, decimal 10^3 to 10^24) up to the number of atoms on earth (2^167, 10^50). Rows placed along this scale: Data (Office, IoT, Internet, Big Data*), Company (SMEs, Global Player), Devices (IoT Device, Embedded, Mobile, Historical Home Computer, Desktop, Server, Cluster, Cloud, Multi-Cloud), Databases (Main Memory, Centralized, Hardware, IoT, Mobile, Web Cloud, Cloud), Platforms (Desktop, Web/Mobile, Cloud, Fog/Edge/Dew, P2P). SMEs: Small and medium-sized enterprises. * social media, search engines.]

Figure 2: Data sizes in companies, devices, databases and platforms. See [18] for the estimation of atoms on earth.

Table 1: Rough Evaluation of different Types of Databases.
Columns (DBMS types): Main Memory, Parallel, Distributed, Federated, Cloud, Web Cloud, Mobile, IoT
Rows (features, each followed by its ratings in column order):
Scalability: – O + + + + + + + + + +
Transaction rates: + + + + + O / + O + + + – – –
Intra-Transaction Parallelism: + + + + + O / + – / O + O – –
Atomicity: + + + + + + + + + + + + +
Durability: + + + + + + + + + – O –
Consistency: + + + + + + + + + + + + +
Extensibility: – + O / + O + + + + + – + + +
Schemaless: – – – – – – – – – – + + + + + + + + + +
Availability: + + + + – – – – – – – – – –
Transparency of Distribution: + + + + + O + + – – – –
Geographical Distribution: – – – + + + + + + + + + + +
Mobility: – – – O O O + + +
Node Autonomy: – – – O + O – – + + +
Heterogeneity of DBMS: – – – – + – – + + + + +
Administration: O O – – / – – – + + – – – – –
Hardware Costs: – – – – – + + + + + – + + +
Reasoning: + + + + + + + – – + + + – – – – –

Hence there is a need to run these different types of databases at the same time, but there might be also the need for integrating the data of these databases (like in the scenario of combining the data of IoT devices with accounting data). For an advanced processing of these different types of data stored in different databases and for other database tasks it is indispensable to break the boundaries of single installations of these DBMSs and to run one single DBMS. Furthermore, it is desirable that this single DBMS provides a semantic layer for advanced processing and reasoning capabilities and for a tight integration of the different data models. This would also allow offering the best features of the different types of databases to applications and users “under one hood” transparently or with an intelligent integration into one query language and API. This single SHM3P DBMS installation runs over all platforms at the same time, offering the advantages of all the different types of DBMSs (to the data that has been previously processed by the single installations) tightly integrated in a semantic layer, while allowing e.g. a global optimization of data distribution, transaction handling and global queries and reasoning tasks with full potential by having freedom of processing down to the physical layer (e.g., index accesses)¹¹. One single SHM3P DBMS would also reduce development costs of applications and periods of vocational adjustment of developers by offering one API and query language with an additional semantic layer for all different platforms. A very big challenge for SHM3P DBMSs is to provide a global distributed reasoner, which integrates different types of reasoners to be processed on the different platforms, where reasoning is optimized for this heterogeneous environment minimizing overall costs combining weighted costs of different types (communication, processing, lifetime of IoT devices etc.).

¹¹ Note that single installations of DBMSs can only be accessed via their offered APIs or by setting up subqueries (of the global query) to them, which hinders the full potential of optimized processing of e.g. joins between the data of the different DBMSs.

3.1. Platforms

We briefly describe the different platforms running execution environments for different types of DBMSs here.

Server Platforms are typical platforms for database servers of small to medium-sized enterprises (SMEs). The DBMSs running on servers are usually centralized databases, which are operating in parallel on multi-core and sometimes many-core systems, often in virtual machines. Relational DBMSs, most Semantic Web DBMSs and Reasoners are typically running on server platforms, and all other types of DBMSs usually offer a local mode to run on a single server.

Hardware-Accelerated Servers speed up database tasks by utilizing the massive parallelism of special hardware beyond today’s multi-core CPUs.

Modern Graphical Processing Units (GPUs) consist of several thousand computing cores, which follow the single-instruction multiple-data paradigm, i.e., the same instruction is executed on different data on different cores at the same time. GPUs are often regarded as a special form of many-core CPUs. Because of this paradigm, not all parallel algorithms are suitable for or benefit from GPUs. However, the massive parallel processing of execution plans is ideal for many-core CPUs and GPUs, as is any task where the best possibilities among enumerated ones must be found (like in query optimization and multi-version concurrency control (MVCC)). Complex operations like joins processing large data inputs are very suitable for GPU-acceleration, too (see e.g. [25] for joins especially designed for SPARQL processing on GPUs).

Field-programmable gate arrays (FPGAs) can reconfigure interconnects for connecting programmable logic blocks with each other. This property makes FPGAs ideally suited for data-flow-driven algorithms (like processing an execution plan for evaluating queries in a streaming way without the block-wise materialization of intermediate steps that is needed on many-core CPUs and GPUs), but any arbitrary type of parallelism can also be offered by FPGAs. FPGA-acceleration of SPARQL query processing as discussed in e.g. [26] achieves scalable speedups even increasing with larger data sets. Dynamic partial reconfiguration enables FPGAs to dynamically exchange their configurations to process different queries at runtime [26].

Universal quantum computers try to combine the full power of classical computers with quantum computers that manipulate (some few) qubits in superposition by applying quantum logic gates. In comparison, quantum annealers, operating on up to several thousand qubits, only run special types of quantum algorithms to solve adiabatic (as special form of combinatorial) optimization problems, which is e.g. the case for traffic control¹², selecting the execution plan with the best estimated costs (from a set of enumerated plans) [27], concurrency control between transactions [28] as well as optimizing transaction schedules [29, 30].

¹² investigated by Volkswagen, see https://www.volkswagenag.com/en/news/stories/2018/11/intelligent-traffic-control-with-quantum-computers.html (accessed on 17.12.2020)

Cloud Databases are designed to be run in the cloud, where (storage and computing) resources can be dynamically allocated and freed according to users’ demands. Hence, cloud databases must consider that nodes (for storing and computing) are joining and leaving, such that it may be necessary to redistribute data and to react to processing jobs on leaving nodes. Furthermore, as the nodes are typically not high-end hardware like servers with redundant components and clouds consist of many more nodes (up to several thousand nodes), hardware and communication failures may occur more often. Hence, cloud computing architectures apply simple fault-tolerance mechanisms by repeating crashed jobs. Table 2 contains an overview over important state-of-the-art Big Data analytics engines working in cloud environments. In addition to one-time queries, Apache Spark and Apache Flink offer to process data streams and continuous queries, such that they also belong to the type of stream databases. There exist various examples of Semantic Web databases on top of the different Cloud technologies like [31] (HBase, Pig), [32] (Spark) and [33] (Flink), but also other contributions that avoid the well-known technologies, like [34], in order to support local joining.

Table 2: Evolution of Big data analytics engines. Based on [16] and extended by the rows “Impact on Databases” and “Impact on Reasoning”.
Generation: | 1 | 2 | 3 | 4
Features: | Batch Processing | + Interactive | + Near-Real-Time, + Iterative Processing | + Real-Time Streaming, + Native Iterative Processing
Model: | MapReduce | DAG Dataflows | Resilient Distributed Datasets (RDD) | Cyclic Dataflows
Impact on Databases: | Long-Running Queries | Query Answering with lower latency | + Continuous Queries | + Real-Time Continuous Queries
Impact on Reasoning: | Long-Running Reasoning | Reasoning with lower latency | + Capabilities for Stream Reasoning | + Cap. for Real-Time Stream Reasoning
Engine: | Hadoop | TEZ | Spark | Flink

Web Cloud Databases rely on a new form of cloud: the web cloud [35]: One just visits a certain webpage with his/her web browser in order to connect his/her computer to the web cloud. In this way the setup of the web cloud is much easier than that of traditional clouds. Furthermore, the web cloud has a much larger number of potential nodes, as any computer running a browser may connect to and be integrated in the web cloud. New challenges arise when setting up a cloud by web browsers: The nodes may be disconnected more often. Data is processed within the browser and hence we must use the technologies offered by the browser for data management purposes. New technologies like WebAssembly [36] introducing a virtual machine for the browser may help to speed up processing in the browser. There exist first approaches to distribute SPARQL queries in some kind of web clouds [37].

Mobile Databases [38] involve the technical infrastructure of mobile providers like base stations (being near-by to their connected mobile devices) in order to speed up processing, lower communication (and hence also energy) costs, and increase availability and durability (by logging at the base stations instead of on mobile devices), and thereby overcome limitations of the mobile devices. Some RDF stores like [39] are especially designed to run on mobile devices, but they do not consider the backend of mobile providers so far.

P2P Databases [40, 41] use peer-to-peer (P2P) networks as underlying backend technology to master a frequent joining and leaving of nodes for data storing and processing. In comparison to clouds, they are designed for a much more frequent change in their topology and for an equal distribution of functionality without distinction of master and slave nodes. P2P databases have to introduce more redundancy in data storing as well as even in processing in order to overcome the frequent disconnections of their nodes. Furthermore, P2P databases must consider heterogeneity in the connected nodes much more than other types of databases. There already exist quite many approaches for semantic data processing in P2P networks like [41], but ontology inference is considered only on a rudimentary basis and for trigger and continuous queries not at all [16].

IoT Databases [42] are especially developed to serve as data store for large-scale installations of the Internet-of-Things (IoT). IoT databases often operate in the cloud, but the communication bottleneck from the IoT devices to the cloud doesn’t scale especially for IoT devices with high velocity and large-scale installations.

[…] but as more IoT devices are also available.

Dew computing [45, 46] overcomes availability problems, where the communication between cloud and IoT devices is disturbed, by placing an additional local server near to the IoT devices taking over the tasks of the cloud during downtimes and synchronizing with the cloud at uptimes.

Besides many approaches to semantic IoT like corresponding ontologies [47] and interoperability issues [48], there are not so many contributions to semantic IoT databases. IoT databases are often organized as P2P databases, especially if they work on the fog or edge, or follow the dew computing concept. Hence contributions to P2P networks processing Semantic Web data like [41] are relevant for semantic IoT databases as well. One of the big challenges here is the distribution of data and processing tasks between cloud and IoT infrastructure including the devices themselves. Furthermore, IoT devices often generate data streams, such that organizing the IoT database as stream database is a reasonable choice: The IoT application design may especially consider reducing data by aggregation and focusing on only relevant data, which should be done nearby the things. One research direction may consider how to use Semantic Web technologies for defining such aggregation tasks. Whether to reason at the data sources, nearby, or in clouds is another difficult question, and it is harder to answer than the corresponding question for query processing on the fog or edge, as reasoning consumes much more processing resources.

3.2. (S)HM3P Databases and their Challenges

HM3P databases are single installations of a M3P DBMS, which are not only able to run on multiple platforms, but also run and tightly integrate different types of DBMSs for ease of use and optimization purposes at runtime.
SHM3P databases integrate the different types of DBMSs In companion with the cloud, fog computing [43] in an additional semantic layer and supports global stores and processes data and application logic on near- reasoning over all integrated DBMSs. things edge devices with higher capabilities (rather than IoT databases operating at the same time in clouds primarily in cloud data centers), which saves commu- and on fog, edge or dew computing are reasonable ex- nication avoiding the route over the internet backbone. amples for H3MP DBMSs: They span over different However, fog computing is not really scalable in the platforms, the edge of the IoT network and the cloud number of connected things, as the near-things edge data centers, and have to distribute functionality like devices do not increase in number and capabilities in data aggregation at or near to the things and complex the same way. operations, e.g., natural language processing and rea- The scalability issue is solved in a better way by soning, at the cloud data centers. Furthermore, IoT edge computing [44], which utilizes additionally all databases have to consider different types of query pro- IoT devices for data storage and processing, and ex- cessing by dealing with traditional (one-time) queries ecuting application logic: As more IoT devices are de- on static data, continuous queries on data streams and ployed, as more data needs to be stored and processed, spatial-temporal queries on archived data of data streams. 24 IoT devices are often heterogeneous because they are • developing multi-platform transaction synchroniza- e.g. developed by different manufacturers: the use of tion approaches and supporting global transaction ontologies and hence of semantic databases simplifies synchronization approaches over distributed differ- the integration of these devices. 
Semantic IoT databases ent transaction synchronization approaches running sometimes manage data at the IoT devices in the tradi- on different platforms tional way for performance reasons and only support • combining different types of databases (on different reasoning and semantic querying at the cloud centers platforms) to offer the best of these databases and after transforming the data of IoT devices to semantic platforms under one hood to applications and users data [16]. Other approaches support even reasoning transparently or via intelligent integration into query on streams [16]. language and API, e.g., guaranteeing atomicity and Multi-platform DBMSs are already highly ambitious isolation in transactions for the data stored on a par- even for large, established database companies since it allel server, but not for those data in the cloud sup- requires data management skills in an extremely wide porting fast updates spectrum (i.e., data management issues in sensors and smart objects for IoT databases are completely differ- Specific challenges of SHM3P DBMSs are ent from the challenges of in-memory databases of P2P • integrating different data models in a semantic layer data oriented systems and semantic querying and rea- on top of the underlying data models soning of Semantic Web databases). Hence current ap- • efficient transformations from and to the semantic proaches are more on interoperability between the va- model in an operational system riety of DBMSs, each one focusing on its specific issues • developing efficient semantic querying and reason- related to its specific functionalities. 
However, we pro- ing over the integrated data of different models pose to support a global approach to integrate all these • global reasoning over reasoners running on differ- specific functionalities in order to use their different ent platforms supporting some kind of distributed benefits in an uniform way and to increase the overall heterogeneous reasoning benefits of the global approach. • developing a combination of stream reasoning over New challenges of M3P and HM3P DBMSs in streaming data (e.g. of IoT devices) with static rea- comparison to traditional DBMSs and MM-DBMSs are soning over large-scale data sets (stored e.g. in clouds) • developing only one code base for the different plat- • supporting transactions over semantic data by inte- forms, but not introducing performance overhead in grating the reasoner in transaction synchronization comparison to single platform databases13 • identifying common properties of several platforms We are sure that this is not an exhaustive list of new and reusing those approaches (like fault tolerance challenges. Many further challenges will arise during mechanisms) in different combinations, which are developing the (S)(H)M3P DBMSs and considering es- best suitable for these considered platforms pecially combinations of different platforms and mod- • data distribution among different platforms (apply- els at runtime. ing different data distribution approaches as well) • efficient binary serialization and communication pro- tocols for integrating the different platforms 4. Summary and Conclusions • data distribution strategies considering overall the Multi-model databases provide the infrastructure to han- different properties of used platforms and models dle the zoo of data models managed in today’s compa- (like fast reads in relational databases on parallel nies. 
Multi-model databases that are able to run on servers and fast updates in cloud databases) a variety of platforms, which are typically deployed • query optimization and other database tasks across and in use in parallel in today’s companies, are called different platforms, which apply different database multi-model multi-platform database management sys- approaches tems (M3P DBMSs). Hybrid M3P (HM3P) DBMSs span • dealing with and integrating different privacy over different platforms at run-time. Our focus is on its and security mechanisms supporting different pri- semantic counterpart: Semantic HM3P (SHM3P) DBMSs vacy and security levels in the different platforms offer its additional semantic layer for simple integra- (with research e.g. on querying heterogeneous en- tion of the DBMS technologies of its operational plat- crypted data) forms. Furthermore, we describe and analyze different 13 We are of the opinion that this is possible by applying Kotlin types of DBMSs and platforms concerning their prop- features like expected and actual declarations for classes and types, erties, chances and challenges for DBMSs with spe- and inline functions and classes. 25 cial focus on Semantic DBMSs. Current state-of-the- systems, in: AFIPS National Computer Confer- art (S)M3P DBMSs don’t exploit the multiple platform ence, 1981, pp. 487–499. idea to its full potential, because they typically only [11] H. Lim, Y. Han, S. Babu, How to fit when no one tightly integrate one type of platform and database. size fits., in: CIDR, 2013. We see great further optimization possibilities in data [12] A. Elmore, J. Duggan, M. Stonebraker, M. Bal- and functionality distribution like query processing, azinska, U. Cetintemel, V. Gadepally, J. Heer, reasoning and transaction handling, and ease of usage B. Howe, J. Kepner, T. Kraska, S. Mad- when different types of platforms and databases are den, D. Maier, T. Mattson, S. Papadopoulos, supported in one single installation of a M3P DBMS J. 
Parkhurst, N. Tatbul, M. Vartak, S. Zdonik, A by tightly integrating them based on a semantic layer. demonstration of the bigdawg polystore system, Proc. VLDB Endow. 8 (2015) 1908–1911. [13] J. Lu, Z. H. Liu, P. Xu, C. Zhang, UDBMS: road References to unification for multi-model data management, in: ER Workshops, 2018, pp. 285–294. [1] J. Lu, I. Holubová, Multi-model databases: A new [14] I. Holubova, S. Scherzinger, Nextgen multi- journey to handle the variety of data, ACM Com- model databases in semantic big data architec- puting Surveys (CSUR) 52 (2019). tures, Open Journal of Semantic Web (OJSW) 7 [2] R. Kotorov, Customer relationship management: (2020) 1–16. strategic lessons and future directions, Business [15] W3C, Semantic Web Development Tools, ac- Process Management Journal 9 (2003) 566–571. cessed on 23/4/2020. https://www.w3.org/2001/ [3] S. Leberknight, Polyglot persis- sw/wiki/Tools. tence, Scott Leberknight’s Weblog, [16] S. Groppe, Emergent models, frameworks, and http://www.sleberknight.com/blog/sleberkn/ hardware technologies for big data analytics, The entry/polyglot_persistence, 2008. Journal of Supercomputing 76 (2020) 1800–1827. [4] S. Groppe, J. Groppe, Hybrid multi-model multi- [17] L. Feigenbaum, G. T. Williams, K. G. platform (hm3p) databases, in: Proceedings Clark, E. Torres (editors), SPARQL 1.1 of the 9th International Conference on Data Protocol, 2013. W3C Recommendation, Science, Technology and Applications (DATA), https://www.w3.org/TR/sparql11-protocol/. 2020. [18] D. Weisenberger, How many atoms are there [5] JetBrains s.r.o., FAQ - Kotlin Programming Lan- in the world?, accessed on 17.12.2020. http:// guage, 2020. URL: https://kotlinlang.org/docs/ education.jlab.org/qa/mathatom_05.html. reference/faq.html. [19] E. A. Brewer, Pushing the CAP: strategies for [6] B. Kolev, P. Valduriez, C. Bondiombouy, consistency and availability, Computer 45 (2012) R. Jiménez-Peris, R. Pau, J. Pereira, Cloudmdsql: 23–29. 
querying heterogeneous cloud data stores with [20] D. Abadi, Consistency tradeoffs in modern dis- a common language, Distributed and Parallel tributed database system design: CAP is only Databases 34 (2016) 463–503. part of the story, Computer 45 (2012) 37–42. [7] M. Zhu, T. Risch, Querying combined cloud- [21] R. Mutharaju, S. Sakr, A. Sala, P. Hitzler, D- based and relational databases, in: International sparq: Distributed, scalable and efficient rdf Conference on Cloud and Service Computing, query engine, in: Proceedings of the 12th In- 2011, pp. 330–335. ternational Semantic Web Conference (Posters & [8] R. Bonaque, T. D. Cao, B. Cautis, F. Goasdoué, Demonstrations Track), Sydney, Australia, 2013, J. Letelier, I. Manolescu, O. Mendoza, S. Ribeiro, p. 261–264. X. Tannier, M. Thomazo, Mixed-instance query- [22] V. Khadilkar, M. Kantarcioglu, B. Thuraisingham, ing: a lightweight integration architecture for P. Castagna, Jena-hbase: A distributed, scalable data journalism, PVLDB 9 (2016) 1513–1516. and efficient rdf triple store, in: Proceedings of [9] M. Hammer, D. McLeod, On Database Manage- the 2012th International Conference on Posters ment System Architecture., Technical Report, & Demonstrations Track, Boston, USA, 2012, p. MIT, Cambridge Laboratory for Computer Sci- 85–88. ence, 1979. [23] N. Papailiou, I. Konstantinou, D. Tsoumakos, [10] J. M. Smith, P. A. Bernstein, U. Dayal, N. Good- N. Koziris, H2RDF: Adaptive query processing on man, T. Landers, K. W. T. Lin, E. Wong, Multibase: rdf data in the cloud, in: Proceedings of the 21st Integrating heterogeneous distributed database International Conference on World Wide Web, 26 Lyon, France, 2012, p. 397–400. fication, W3C Proposed Recommendation, https: [24] N. Papailiou, I. Konstantinou, D. Tsoumakos, //www.w3.org/TR/wasm-core-1/, 2019. P. Karras, N. Koziris, H2RDF+: high-performance [37] A. Grall, P. Folz, G. Montoya, H. Skaf-Molli, distributed joins over large-scale RDF graphs, in: P. Molli, M. 
Vander Sande, R. Verborgh, Ladda: Proceedings of the 2013 IEEE International Con- Sparql queries in the fog of browsers, in: Euro- ference on Big Data, Santa Clara, USA, 2013, pp. pean Semantic Web Conference, Springer, 2017, 255–263. pp. 126–131. [25] X. Zhang, M. Zhang, P. Peng, J. Song, Z. Feng, [38] V. Kumar, Mobile database systems, Wiley On- L. Zou, A scalable sparse matrix-based join for line Library, 2006. sparql query processing, in: International Con- [39] D. Le-Phuoc, J. X. Parreira, V. Reynolds, ference on Database Systems for Advanced Ap- M. Hauswirth, Rdf on the go: An rdf storage and plications, Springer, 2019, pp. 510–514. query processor for mobile devices, in: ISWC, [26] S. Werner, D. Heinrich, S. Groppe, C. Blochwitz, Citeseer, 2010. T. Pionteck, Runtime adaptive hybrid query en- [40] K. Graffi, D. Stingl, C. Gross, H. Nguyen, A. Ko- gine based on fpgas, Open Journal of Databases vacevic, R. Steinmetz, Towards a p2p cloud: Reli- (OJDB) 3 (2016) 21–41. able resource reservations in unreliable p2p sys- [27] I. Trummer, C. Koch, Multiple query optimiza- tems, in: International Conference on Parallel tion on the d-wave 2x adiabatic quantum com- and Distributed Systems, 2010, pp. 27–34. puter, Proc. VLDB Endow. 9 (2016). [41] R. Mietz, S. Groppe, O. Kleine, D. Bimschas, S. Fis- [28] S. Roy, L. Kot, C. Koch, Quantum databases, in: cher, K. Römer, D. Pfisterer, A p2p semantic CIDR, 2013. query framework for the internet of things, PIK- [29] T. Bittner, S. Groppe, Avoiding blocking by Praxis der Informationsverarbeitung und Kom- scheduling transactions using quantum anneal- munikation 36 (2013) 73–79. ing, in: 24th International Database Engineering [42] ObjectBox Limited, The best IoT Databases for & Applications Symposium (IDEAS), Seoul, Re- the Edge – an overview and compact guide, public of Korea, 2020. https://objectbox.io/the-best-iot-databases-for- [30] T. Bittner, S. 
Groppe, Hardware accelerating the the-edge-an-overview-and-compact-guide/, optimization of transaction schedules via quan- 2019. tum annealing by avoiding blocking, Open [43] M. Abdelshkour, Iot, from cloud Journal of Cloud Computing (OJCC) 7 (2020) 1– to fog computing, Cisco Blogs, 21. URL: http://nbn-resolving.de/urn:nbn:de:101: http://blogs.cisco.com/perspectives/iot-from- 1-2020112218332015343957. cloud-to-fog-computing, 2015. [31] S. Groppe, T. Kiencke, S. Werner, D. Heinrich, [44] P. Garcia Lopez, A. Montresor, D. Epema, M. Stelzner, L. Gruenwald, P-luposdate: Us- A. Datta, T. Higashino, A. Iamnitchi, M. Barcel- ing precomputed bloom filters to speed up sparql los, P. Felber, E. Riviere, Edge-centric comput- processing in the cloud, Open Journal of Seman- ing: Vision and challenges, SIGCOMM Comput. tic Web (OJSW) 1 (2014) 25–55. Commun. Rev. 45 (2015) 37–42. [32] D. Graux, L. Jachiet, P. Geneves, N. Layaïda, Spar- [45] K. Skala, D. Davidovic, E. Afgan, I. Sovic, Z. Sojat, qlgx: Efficient distributed evaluation of sparql Scalable distributed computing hierarchy: Cloud, with apache spark, in: ISWC, 2016. fog and dew computing, Open Journal of Cloud [33] A. Azzam, S. Kirrane, A. Polleres, Towards Computing (OJCC) 2 (2015) 16–24. making distributed rdf processing flinker, in: [46] Y. Wang, Definition and categorization of dew Innovate-Data, IEEE, 2018, pp. 9–16. computing, Open Journal of Cloud Computing [34] S. Groppe, J. Blume, D. Heinrich, S. Werner, A (OJCC) 3 (2016) 1–7. self-optimizing cloud computing system for dis- [47] S. Mishra, S. Jain, Ontologies as a semantic model tributed storage and processing of semantic web in iot, International Journal of Computers and data, Open Journal of Cloud Computing (OJCC) Applications 42 (2020) 233–243. 1 (2014) 1–14. [48] A. Cimmino, M. Poveda-Villalón, R. García- [35] S. Groppe, N. 
Reimer, Code generation for big Castro, ewot: A semantic interoperability ap- data processing in the web using webassem- proach for heterogeneous iot ecosystems based bly, Open Journal of Cloud Computing (OJCC) on the web of things, Sensors 20 (2020) 822. 6 (2019) 1–15. [36] A. Rossberg (editor), WebAssembly Core Speci-