
NoSQL, Open Source MMDMS

Daniel Ritter¹,², Luigi Dell'Aquila², Andrii Lomakin², Emanuele Tagliaferri²

¹Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
²OrientDB / SAP SE, Walldorf (Baden), Germany

CEUR Workshop Proceedings (http://ceur-ws.org), ISSN 1613-0073
ORCID: 0000-0001-6146-3365 (D. Ritter)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

OrientDB is a full-function, NoSQL MMDMS (multi-model database management system), addressing the big data variety problem with one single, multi-model store and a SQL-based, multi-model query language. It combines graph and semi-structured data management with object-oriented, text and spatial capabilities, and features a variety of deployment and distribution / replication options as well as transactional / ACID storage and indexing, making it a commercially successful MMDMS. With that, OrientDB is well-suited for novel adaptations of applications like smart logistics, asset management, social data storage and analysis, and other use cases that require multiple perspectives on the data.

Keywords: Graph data management, Multi-model database, NoSQL, Object-oriented, Semi-structured data

1. Introduction

Modern analytical, business and smart applications (e.g., in the areas of logistics, asset management, and social network analysis) require efficient storage of and access to large amounts of multi-model data [1, 2, 3]. Data management platforms addressing the multi-model or, more generally, the big data variety problem can be referred to as multi-model data platforms [1] (e.g., as in the Q3/2021 Forrester Wave1, which especially stresses the importance of such a polyglot persistence model for modern applications).

In the literature, these multi-model data platforms are commonly differentiated into polystores (e.g., BigDAWG [4]) and multi-model database management systems (MMDMS) [1, 5], according to the number of separate data stores: polystores have multiple, federated data stores, while MMDMSs have one single, integrated data store. In practice, there are several (commercial) multi-model data platforms2 that are (a) more on the polystore side, e.g., from Microsoft, Oracle, and SAP, (b) on the multi-model database side, e.g., from ArangoDB, Couchbase, DataStax, Redis Labs, MarkLogic, or (c) aspiring "best-in-class" NoSQL systems like MongoDB (document) and Neo4j (graph) (all in the Forrester Wave).

[…] a multi-model data design, namely OrientDB3 […]:
(1) […] like inheritance with semi-structured document (schema […]
(2) SQL-like, multi-model query / operations, e.g., traversal, matching, lookup (→ querying data)
(3) […]tence (→ persisting data)
[…]

2Acknowledging that the list is not complete and does not men[…]
3OrientDB, visited 11/2021: https://git.io/JRY6H

OrientDB aims to cover all of them, distinguishing it from existing single-model databases that are growing into multi-model data platforms (in Tab. 1: MongoDB misses (2) and partially (3), Postgres and Redis require extensions / modules for (1), Postgres could ease (4), and Redis lacks (2)). Its multi-model versatility (cf. differentiators (1)–(4)) lets OrientDB compete with single-model document stores [6] and graph databases [7] and helps it to maintain high ranks as a NoSQL database system4. The multi-model concepts and design decisions that OrientDB followed are presented in this work and might further inspire the design of other MMDMSs.

4db-engines.com: "graph+dbms" and "document+store", 8/2021

This paper is organised as follows: Sect. 2 covers the data model (cf. differentiator (1)) in Sect. 2.1 as well as querying data (cf. (2)) in Sect. 2.2. Section 2.3 gives insights into persisting data (cf. (3)). In Sect. 3, the general system architecture is introduced and the distribution model (cf. (4)) is explained. Section 4 shares initial performance numbers on how OrientDB compares to selected systems in Tab. 1, and Sect. 5 concludes the paper.

2. Multi-model data

In this section we describe OrientDB's data model (cf. differentiator (1)), its unified SQL-based query capabilities over multi-model data (cf. (2)), and unified data storage and manipulation (cf. (3)).

2.1. Data definition

Like many NoSQL databases, OrientDB exclusively focuses on the main non-relational data categories (i.e., graph, document, key-value [8]), as shown in Tab. 1. However, for a single, unified, and expressive data model (cf. differentiator (1)), its multi-model data definition uniquely combines semi-structured / document with graph data and object-oriented principles.

Base entities For a better understanding, Tab. 2 sets the main entities of OrientDB's data model into context with their relational, graph and document counterparts. Instead of a relational table, a graph vertex / edge or a document collection, OrientDB defines the class, known from object-oriented programming, as its top-level entity (i.e., object-oriented features like inheritance and polymorphism are usable throughout the entire data model). For a table row, OrientDB follows the graph and document approaches and offers vertex and document entities, while, corresponding to a column, property (i.e., property graph) and attribute (i.e., key-value pairs) entities are available. The equivalent of a table join in graph databases is the edge entity, which is the same in OrientDB. In addition, documents can directly refer to other documents via link entities, which are usually not found in document stores.

Tab. 2:
Relational  | Graph         | Document       | OrientDB
Table       | Vertex / Edge | Collection     | Class
Row         | Vertex        | Document       | Vertex / document
Column      | Property      | Key-value pair | Property / attribute
Ref. / join | Edge          | – (join)       | Edge / link

Logical, physical model entities In Fig. 1, the basic entities are set into context with each other and connected to the most important (physical) storage entity. That storage entity is called cluster and denotes the place where the data of a class is stored, possibly in different physical locations. In fact, usually multiple clusters are automatically created or assigned for each class (e.g., one cluster per CPU core or one per data center). A cluster has an identifier id, information on whether it is the default cluster of a class, and a selection strategy that is considered when adding new data (e.g., round-robin, balanced). Further storage details on clusters are discussed in Sect. 2.3.

Figure 1: [logical storage entities: Record; Document with fields / key-value pairs and links; Vertex and Edge (is-a Document, TinkerPop Blueprints compatible); Class (schema, super class, "collection", "table") with Properties and Constraints / rules; physical storage: 1..n Clusters per class (default, id, selection)]

Classes are object-oriented schema constructs that might have a super class (e.g., for inheritance) and properties with constraints / rules. OrientDB supports flexible / schema-less data, but also strict and mixed schemas; the latter are realized with properties and constraints. OrientDB ships several default classes, like OUser and ORole for security, vertex V and edge E, among others, from which properties can be inherited.

The main entity, in which all data is stored, is the record, which gets assigned a record identifier @rid (RID), a version @version and a class @class. The RID has the format #<CID>:<RP>, where <CID> is the cluster identifier and <RP> is the record's position in that cluster; thus, it is a direct pointer to its physical storage location (cf. Ex. 1), similar to CTID in PostgreSQL.

Example 1 (Clusters, RIDs). Figure 2 shows a Customer class with three clusters (i.e., identified by 11–13) and default cluster 11. A record with RID #13:382 can be found in cluster 13 at position 382. ∎

Figure 2: Clusters and record identifiers @rid

OrientDB has different record types, with document as the most relevant. A document represents semi-structured data by a set of JSON field elements or key-value pairs, denoting properties of the record's class (e.g., for strict / mixed schema). Unlike in other document stores, documents in OrientDB can have links to other documents, where a link is a set of RIDs. While no properties are required in a schema-less setup, the creation of an index for a field requires a property.

From a graph perspective, vertex and edge entities are documents, which allows them to be simple property graphs with key-value pairs or to carry more complex data like JSON documents (cf. Ex. 2).

Example 2 (Class, document, graph). OrientDB's multi-model SQL DDL and DML allow for the creation of a class of type vertex V, as shown in Listing 1. The class extends the built-in V to specify that the documents in person can be vertices in a graph (line 2). As in SQL, there are several ways to insert data. For simplicity, two JSON documents are inserted into the person class, the second one with a RID, indicating a document link with label likes (lines 4–6). Since the documents are vertices in a graph, they can also be related by creating an edge class friend and specifying an edge using the RIDs of the […]

Listing 1.: […]
    1 --- collection as vertex
    2 CREATE CLASS person EXTENDS V;
    3 --- insert documents
    4 INSERT INTO person CONTENT {"name": "Frank"};
    5 INSERT INTO person CONTENT {"name": "Linda",
    6   "likes": "#13:382"};
    7 --- collection as edge
    8 CREATE CLASS friend EXTENDS E;
    9 CREATE EDGE friend FROM #13:382 TO #13:37;

The corresponding JSON records in OrientDB are shown in Listing 2. Notably, the RIDs and class assignments are part of the documents and can be used as shown above. ∎

Listing 2.: Records / documents in OrientDB
    {"@rid": "#13:382", "@class": "person",
     "name": "Frank", ... }

    {"@rid": "#13:37", "@class": "person",
     "name": "Linda", "likes": "#13:382", ... }

Sizing / limits While the number of databases within an OrientDB server is not limited, each database can have 2^15 − 1 clusters. A cluster can store 2^63 − 1 records, which allows up to (2^15 − 1) · (2^63 − 1), i.e., just under 2^78, records per database. The maximum size of a record / document is 2GB (cf. 16MB in MongoDB and 512MB in Redis). There is no limit on the number of properties for schema-less usage, and up to 2 × 10^9 properties per database for schema-full usage.

Summary OrientDB specifies a single, unified data model with an object-oriented class entity at its core. The object-oriented features allow for expressive modeling, including key-value or JSON document records and complex graph data at vertices and edges.

2.2. Querying data

On top of the unified data model, OrientDB specifies OSQL with SQL-like query capabilities and operations (cf. differentiator (2)), which we briefly introduce by example for document, graph, text and geospatial data (including object-oriented capabilities).
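To make the RID addressing of Sect. 2.1 concrete, the following Python sketch models records, clusters and a class spread over several clusters. It is a toy model under stated assumptions, not OrientDB's implementation: the names `Rid`, `Cluster` and `StoredClass` are hypothetical, the per-cluster position map is assumed to be an append-only list of (page identifier, offset) pairs in the spirit of Sect. 2.3, and round-robin is only one of the cluster selection strategies named in Sect. 2.1. The last constant checks the sizing arithmetic: (2^15 − 1) · (2^63 − 1) stays slightly below 2^78 − 1.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, Iterator, List, Tuple

# Limits stated in Sect. 2.1 ("Sizing / limits").
MAX_CLUSTERS = 2**15 - 1             # clusters per database
MAX_RECORDS_PER_CLUSTER = 2**63 - 1  # records per cluster


@dataclass(frozen=True)
class Rid:
    """Record identifier #<CID>:<RP> (hypothetical helper type)."""
    cluster_id: int  # <CID>: the cluster that holds the record
    position: int    # <RP>: the record's position inside that cluster

    @staticmethod
    def parse(text: str) -> "Rid":
        # Accepts "#13:382" as well as "13:382".
        cid, rp = text.lstrip("#").split(":")
        return Rid(int(cid), int(rp))

    def __str__(self) -> str:
        return f"#{self.cluster_id}:{self.position}"


@dataclass
class Cluster:
    """A cluster with an assumed append-only position map.

    Index i of `positions` is the record position <RP> = i and holds a
    (page identifier, offset) pair, so a lookup is a constant-time list
    access that does not depend on the number of stored records.
    """
    cluster_id: int
    positions: List[Tuple[int, int]] = field(default_factory=list)

    def append(self, page: int, offset: int) -> int:
        self.positions.append((page, offset))
        return len(self.positions) - 1  # the new record's <RP>


class StoredClass:
    """A class whose records are spread round-robin over its clusters."""

    def __init__(self, name: str, cluster_ids: List[int]) -> None:
        self.name = name
        self.clusters: Dict[int, Cluster] = {c: Cluster(c) for c in cluster_ids}
        self._next_cluster: Iterator[int] = cycle(cluster_ids)

    def insert(self, page: int, offset: int) -> Rid:
        cluster = self.clusters[next(self._next_cluster)]  # round-robin choice
        return Rid(cluster.cluster_id, cluster.append(page, offset))

    def locate(self, rid: Rid) -> Tuple[int, int]:
        # Follow the RID straight to its (page, offset) without any search.
        return self.clusters[rid.cluster_id].positions[rid.position]


# Upper bound on records per database: (2**15 - 1) * (2**63 - 1) < 2**78 - 1.
MAX_RECORDS_PER_DATABASE = MAX_CLUSTERS * MAX_RECORDS_PER_CLUSTER

if __name__ == "__main__":
    customer = StoredClass("customer", [11, 12, 13])
    rids = [customer.insert(page=0, offset=i * 64) for i in range(4)]
    print([str(r) for r in rids])               # ['#11:0', '#12:0', '#13:0', '#11:1']
    print(customer.locate(Rid.parse("#11:1")))  # (0, 192)
```

Because a RID already names the cluster and the slot, resolving it needs no index lookup; this is the property the text compares to PostgreSQL's CTID.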

Document queries OSQL queries on documents are mostly similar to standard SQL. Notable deviations are the dot-notation for specifying fields in nested JSON objects (e.g., address.city) and working with arrays (e.g., UNWIND returns the entries of a JSON array as individual records). Besides the usual SQL keywords like DISTINCT, object-oriented extensions are added (e.g., to check the type of a class with INSTANCEOF).

Graph queries Apart from the support of TinkerPop / Gremlin queries, OSQL allows for graph traversal and pattern matching (cf. Ex. 3).

Example 3 (Graph traversal, pattern matching). An example of graph traversal on the documents of Listing 2 is depicted in Listing 3, and pattern matching is illustrated in Listing 4.

Listing 3.: Graph traversal: Breadth-first search
    TRAVERSE friend FROM #13:37
      WHILE $depth <= 3 STRATEGY BREADTH_FIRST;

Listing 4.: Graph pattern matching
    MATCH
      {class: Person, as: people, where: (name = 'Linda')}
    RETURN people

The queries in Listings 3 and 4 return records #13:382 and #13:37, respectively. ∎

Graph traversals are defined using TRAVERSE instead of SELECT, but traversals can be embedded into queries as sub-queries. In traversal queries, the FROM-clause can contain classes, clusters, one or more record identifiers and sub-queries. Similar to a WHERE-clause, the WHILE condition limits the traversal, while the result set can be limited by LIMIT (not shown). During a traversal, projections restrict the fields that are followed. While edges of type friend are specified in Listing 3, OSQL also supports *, ALL(), and ANY(). When a class is specified, traversals are polymorphic: for instance, when customer is a person (i.e., customer extends person), specifying person.name also traverses customer vertices. OSQL supports different search strategies like BREADTH_FIRST, and traversals can be limited by conditions on context variables like $depth (nesting depth) or $path (path traversed). Similarly, linked documents are traversed, and traversals can be directed by using keywords like IN() or OUT().

Graph pattern matching requires the MATCH keyword with JSON input for a valid target class, an alias for a node pattern (e.g., people), and a WHERE-clause that matches a node in the pattern.

Text and geospatial queries The text and geospatial features are based on the data model in Sect. 2.1, and both require index creation (cf. Ex. 4).

Example 4 (Text and geospatial search). We recall that for indexed fields, a property has to be added to the class, as shown in Listing 5.

Listing 5.: Text and spatial indexes
    --- property, fulltext index
    CREATE PROPERTY person.name STRING;
    CREATE INDEX name ON Person(name)
      FULLTEXT ENGINE LUCENE;
    --- geospatial index
    CREATE PROPERTY person.location EMBEDDED OPoint;
    UPDATE person SET location = St_GeomFromText(
      "POINT (51.498308 -0.176882)")
      WHERE name = "Frank";
    CREATE INDEX person.location ON person(location)
      SPATIAL ENGINE LUCENE;

Spatial coordinates given as decimal degree values can be parsed from a string using St_GeomFromText (e.g., when updating Frank's record).

Listing 6.: Text and spatial queries
    1 --- text search using SEARCH_CLASS function
    2 SELECT FROM person
    3   WHERE SEARCH_CLASS("+name:Fran*") = true
    4 --- spatial search using NEAR operator
    5 SELECT *,$distance FROM person
    6   WHERE [location,$spatial]
    7   NEAR [51.495449,-0.17625,{"maxDistance":1}]

The syntax for spatial arguments (i.e., $spatial) is taken from Lucene. ∎

In contrast to all other supported index types (e.g., hash, B-tree / range), text and spatial indexes are created in Lucene5, which was therefore embedded into OrientDB (cf. FULLTEXT ENGINE LUCENE and SPATIAL ENGINE LUCENE in Listing 5). When creating indexes in Lucene, additional metadata can be passed to the underlying engines to configure them according to their full capabilities (e.g., analyzers, parsers). Text queries require the function SEARCH_CLASS (Listing 6, line 3) to specify complex fulltext searches (e.g., regular expressions) in a query's WHERE-clause. Further search functions like SEARCH_FIELDS (search index for more than one field) enable the full spectrum of Lucene's search capabilities.

5Apache Lucene, visited 11/2021: https://lucene.apache.org/
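What the breadth-first TRAVERSE of Listing 3 computes can be approximated in plain Python, to show the effect of the WHILE $depth <= 3 bound and of the BREADTH_FIRST strategy. This is a conceptual sketch only: the adjacency map is a hypothetical stand-in for the followed fields or edges, RIDs such as #13:1 are made up for the example, and OrientDB's actual traversal engine is not modelled.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical adjacency: RID -> RIDs reachable via the traversed field/edge.
Graph = Dict[str, List[str]]


def traverse_bfs(graph: Graph, start: str, max_depth: int) -> List[Tuple[str, int]]:
    """Return (rid, depth) pairs in breadth-first order.

    Mirrors the idea of TRAVERSE ... WHILE $depth <= max_depth
    STRATEGY BREADTH_FIRST: the start record has depth 0, every record is
    returned at most once, and nothing deeper than max_depth is visited.
    """
    visited = {start}
    queue = deque([(start, 0)])
    result: List[Tuple[str, int]] = []
    while queue:
        rid, depth = queue.popleft()
        result.append((rid, depth))
        if depth >= max_depth:
            continue  # children would have $depth > max_depth
        for neighbour in graph.get(rid, []):
            if neighbour not in visited:  # avoid revisiting on cycles
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return result


if __name__ == "__main__":
    friends = {
        "#13:37": ["#13:382", "#13:1"],  # #13:1 ... #13:4 are made-up RIDs
        "#13:382": ["#13:2"],
        "#13:2": ["#13:3"],
        "#13:3": ["#13:4"],
    }
    print(traverse_bfs(friends, "#13:37", max_depth=3))
    # [('#13:37', 0), ('#13:382', 1), ('#13:1', 1), ('#13:2', 2), ('#13:3', 3)]
```

Both depth-1 records are returned before any depth-2 record, and #13:4 at depth 4 is never reached; a DEPTH_FIRST strategy would visit the same bounded set in a different order.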

Summary OrientDB defines a SQL-like, multi-model query language that covers all aspects of the unified data model (i.e., including object-oriented, document, graph, key-value, text, and geospatial data). External engines can be added; however, they lose their state if it is not persisted.

2.3. Persisting data

As one of the first MMDMSs, OrientDB provides a single, multi-model persistence with ACID transactions (cf. differentiator (3)), whose main aspects we briefly introduce.

Indexes OrientDB defines several built-in indexes, like Hash- and SB-Tree [9], for fast lookup and sequential record access. The indexes can be set as UNIQUE (i.e., not allowing for duplicate keys), which is also guaranteed in distributed setups. Composite keys allow for searches on multiple fields at the same time. In addition, custom indexes can be specified, configured and loaded into the system. For instance, fulltext and geospatial indexes are provided that way through the Lucene search engine. There can be up to two billion indexes per database, without limitations regarding the number of indexes per class (cf. 64 indexes per collection in MongoDB).

Transactions, Storage Compared to many other NoSQL databases, OrientDB supports ACID transactions with the isolation levels READ COMMITTED (default, distributed) and REPEATABLE READS (single instance). During the commit of a transaction, records are physically stored in the main storage entity (i.e., cluster). Each cluster is split into pages, which contain system information (e.g., checksums for integrity / recovery) and record data (i.e., paginated storage). On disk, the data is stored in variable-size data files, as shown in Fig. 3. The RIDs are mapped through a fixed-size, append-only cluster position map to cluster pointers (i.e., page identifier, record […]

Summary OrientDB uses ACID transactions for change operations on its single multi-model storage. The data is queried through extensible index capabilities, while records are accessed through a RID mapping to data pages on disk (i.e., mostly independent of the database size).

3. System architecture, Distribution / Replication

In this section we introduce the general system architecture and the deployment / distribution options for horizontal scaling (cf. differentiator (4)). Within the architecture, we locate the single, multi-model data and storage components introduced in Sect. 2 (cf. differentiators (1)–(3)).

3.1. Component architecture

The main deployment unit of OrientDB is the ODB Server shown in Fig. 4. The server accepts local OSQL and remote JDBC and REST calls. In addition, specialized integrations like graph / Gremlin are available through modules provided by the open source community. The official database administration tool is OrientDB Studio, which uses the REST API. While business applications usually use JDBC and REST, other clients use OrientDB's binary protocol to access the server. Besides, there are data import and migration tools like ETL, a Neo4j importer (community projects) and Teleporter (relational data migration tool). OrientDB's active open source community contributed several language bindings / connectors for Python, .Net, and NodeJS that complement the standard Java, JDBC, and REST drivers. Subsequently, we briefly introduce the internal layers that were recently significantly reworked: the multi-model API, database, and storage layers. The distribution layer is discussed in Sect. 3.2.

Figure 4: [ODB Server: interfaces REST, OSQL, JDBC, Spark, binary protocol, Graph API / Gremlin; Distribution Layer (coord., repl.); Multi-Model API; Database Layer (e.g., ODB SQL engine, index, transactions, security); Storage Layer (e.g., embedded, in-memory; disk cache, paginated storage, restore / WAL); File System]

Multi-model API OrientDB originally started out with separate document and graph APIs. Since version 3.0, access to the database layer is managed by a unified, multi-model API that we introduced in Sect. 2.2. The new API makes use of the data model from Sect. 2.1 to give a combined perspective on multi-model data. The old document and graph APIs as well as access via TinkerPop 2.x are deprecated, and support for TinkerPop 3.x was added (cf. Graph API / Gremlin in Fig. 4).

Figure 5: Distribution and replication [panels: (1) Embedded, (2) Standalone, (3) Replication with read/write replicas N-2, N-1, N, (4) Mixed with an embedded application]

Database layer The OSQL optimizer and execution are part of the database engine. Incoming OSQL statements are transformed into logical and physical plans that leverage the indexes from Sect. 2.3 (cf. Appendix A). Regarding security, OrientDB supports authentication through Kerberos and LDAP integration, database encryption, and a sophisticated security concept reaching from clients over the server down to record level.

Storage layer The storage components introduced in Sect. 2.3 are seamlessly integrated into the ODB server to support full ACID transactions with disk-based storage. With the exception of the Lucene indexes (text, spatial), which are kept in their native storage, built-in and custom indexes operate on the same ODB storage.

3.2. Distribution / Replication

The ODB server can be deployed in different (distributed) setups that allow applications to scale OrientDB from small to larger installations, as shown in Fig. 5: (1) embedded, (2) standalone, (3) replicated, and (4) mixed.

OrientDB can be embedded into one application (1), either completely in-memory (e.g., for testing / CI) or with local storage (e.g., for simple micro-services). Multiple applications can start with a single, standalone OrientDB instance (2) that can be evolved from the embedded one (cf. variant (1)) and scaled out to several OrientDB instances through replication. Embedded instances can provide replicas, and thus run in a mixed deployment (4).

A seamless evolution or scaling is possible through a Raft-based [13] auto-discovery mechanism that identifies OrientDB instances, stores the runtime cluster configuration and synchronizes certain operations between nodes. After nodes have been added to the distributed system, OrientDB uses a variant of Fast Paxos [14, 15] to support distributed transactions (i.e., fast rounds, but bigger quorums than in classic Paxos). In that way, OrientDB supports multi-leader replication, which allows several nodes to perform change operations (insert, update, delete). To avoid quorum nodes lagging behind and requiring expensive quorum rounds to catch up (especially after a failure), each node records change operations per transaction as persistent version counters. If a node missed some operations, it can easily detect that by comparing its local version with the one in the next received message, and catch up by direct synchronization with another replica (similar to [16]). With that, OrientDB supports distributed, unique indexes and fault tolerance without slowing down a quorum.
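Two mechanisms from this subsection lend themselves to a small Python sketch. First, the quorum arithmetic behind "fast rounds, but bigger quorums": assuming majority classic quorums q_c over n nodes, Lamport's Fast Paxos [14] requires that any classic quorum and any two fast quorums share a node, i.e., q_c + 2·q_f > 2n, which yields fast quorums of about three quarters of the nodes. This is the textbook condition, not OrientDB's actual quorum configuration, which the text does not specify. Second, gap detection via version counters: the hypothetical `Replica` class applies operations in version order and, when an incoming message skips versions, first pulls the missing operations from a peer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def classic_quorum(n: int) -> int:
    """Majority quorum: any two such quorums of n nodes intersect."""
    return n // 2 + 1


def fast_quorum(n: int) -> int:
    """Smallest q_f with q_c + 2*q_f > 2*n for a majority q_c (Fast Paxos)."""
    return (2 * n - classic_quorum(n)) // 2 + 1


@dataclass
class Replica:
    """Hypothetical replica tracking a persistent per-transaction version."""
    name: str
    version: int = 0
    log: Dict[int, str] = field(default_factory=dict)  # version -> operation

    def apply(self, version: int, op: str) -> None:
        self.log[version] = op
        self.version = version

    def receive(self, version: int, op: str, peer: "Replica") -> List[int]:
        """Apply an incoming change; return the versions pulled from `peer`."""
        if version <= self.version:
            return []  # already applied: duplicate or stale message
        pulled: List[int] = []
        # A jump from self.version to `version` reveals missed operations.
        for missing in range(self.version + 1, version):
            self.apply(missing, peer.log[missing])  # direct sync with a replica
            pulled.append(missing)
        self.apply(version, op)
        return pulled


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, classic_quorum(n), fast_quorum(n))  # 3 2 3 / 5 3 4 / 7 4 6
    a, b = Replica("a"), Replica("b")
    for v, op in enumerate(["insert", "update", "delete"], start=1):
        a.apply(v, op)
    b.receive(1, "insert", a)
    print(b.receive(3, "delete", a))  # [2]: version 2 was missed and pulled
```

The catch-up path touches only the one peer that holds the missing versions, which illustrates why a lagging node does not need another quorum round to recover.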

4. Experiments

In this section we conduct a preliminary performance assessment of OrientDB compared to the databases in Tab. 1, i.e., well-established NoSQL databases (MongoDB, Redis) and one extended RDBMS (Postgres/JSON), using the well-known YCSB benchmark [17].

4.1. Setup, Limitations

The YCSB benchmark is an open-source NoSQL database benchmark with a broad coverage of database systems (including all selected databases from Tab. 1). The benchmark features several workloads of JSON documents that cover important operations like read, insert, update and read-modify-write, as well as configuration parameters for scale factors (e.g., number of records and fields per record, field length).

Setup For our measurements, we run all workloads (A–F) and use the default parameters (e.g., single user and record, records with 10 fields of length 100). We set the number of documents per workload to five million records, the operation count to 500k and the batch size to 10k. If a workload requires an index, a hash index is set.

Out of the multitude of polystore and multi-model NoSQL systems, we decided to choose MongoDB as one of the leading document stores, Postgres/JSON as one of the most used RDBMSs with multiple NoSQL extensions, and Redis as a representative key-value store (cf. Tab. 1). For all databases we benchmarked the latest server versions available, for which we updated the OrientDB driver to version 3.1.36 and added support for the scan operations in workload E.

All measurements are conducted on two Intel X5650 CPUs with 2.67GHz (12 cores), 24GB RAM, Windows 10 operating system, and JDK 1.8.

Limitations YCSB originally targeted key-value stores, and thus only works with "flat" key-value pair JSON documents (i.e., no nesting, arrays). However, even wide-column and document stores provide YCSB drivers, due to the lack of open source benchmark alternatives. Recently, MongoDB tried to adapt their workloads to TPC-C [18], but had to admit that neither YCSB nor TPC-C sufficiently address their workloads.

Since OrientDB is an MMDMS, one could expect a suitable multi-model benchmark. In fact, there are polystore benchmarks (e.g., PolyBench [19]) and at least one multi-model benchmark (UniBench [20]), but none of these benchmarks is available as open source (e.g., for database driver development).

Due to the lack of more suitable benchmarks and the rather simple nature of YCSB's data sets and workloads, the results are considered preliminary, but give interesting insights into the relative rather than the absolute performance, which we consider valuable.

6YCSB OrientDB 3.1. port, visited 11/2021: https://github.com/brianfrankcooper/YCSB/pull/1468

4.2. Preliminary results

The preliminary results of our benchmark runs are shown in Fig. 6 as time per request in seconds (lower bar is better) for MongoDB (short mdb), OrientDB (odb), Postgres/JSON (pqj), and Redis (rds), which we briefly discuss for each workload and for concurrent user scaling.

Workload A: Update-heavy, read The first workload is a combination of 50% read and 50% update operations with a Zipfian record selection. Fig. 6a shows that this key-value store workload is best for Redis, while OrientDB is slightly better for read operations compared to MongoDB and Postgres. The update performance is similar for the three non-key-value stores.

Workload B: Mostly read, update Similarly, Redis dominates the read-heavy workload with 95% read operations and Zipfian distribution in Fig. 6b. The focus on read operations leads to better results for OrientDB and Postgres, which have slightly slower updates compared to read operations.

Workload C: Read-only The Zipfian read-only workload underpins the previous observations on read operations for OrientDB and Postgres, which gain ground on Redis, as shown in Fig. 6c. Here, OrientDB might profit from its W-TinyLFU read cache implementation, which keeps a mix of LRU and heavily read older records in memory, and from its physical RID mapping.

Workload D: Read latest Consequently, the 95% read and 5% insert workload for reading the latest inserts shows a similar read performance for all databases. Unlike in the similar update case of workload B, OrientDB is slightly behind MongoDB and Postgres, while Redis shows results similar to its update performance.

Workload E: Short ranges In this scan-dominated workload, the inserts (again only 5%) are similar to those of workload D. The scans, however, show the best performance for Postgres and the worst for Redis, while MongoDB and OrientDB lie closely in between.

Workload F: Read-modify-write For the combined r-m-w workload, all databases perform similarly to the previous workloads, with Redis on top, followed by OrientDB with slightly faster read performance than MongoDB and Postgres, but similar update and r-m-w times.

Figure 6: [caption not recoverable; panels include (d) Read latest (INSERT, READ), (e) Short-range scans (INSERT, SCAN), and (f) Read, modify, write (READ, UPDATE, R-M-W)]

Concurrent user scaling Since the results for workloads A–F were obtained for a single user, and can thus be considered as baseline performance, we now briefly study workloads B and F for 64 concurrent users in Fig. 7. MongoDB and Postgres perform slightly better for read operations than OrientDB and Redis in both workloads, while MongoDB shows the fastest updates, followed by Redis and OrientDB (cf. Figs. 7a and 7b). Notably, for read and update operations, MongoDB's performance remains similar to the single user benchmarks. Redis and OrientDB are factors of 5–10 slower than their single user performance, and Postgres even beyond that for the update and r-m-w workloads.

Figure 7: YCSB workloads B, F (single instance, 64 users) [panels include (b) Workload F: Read, modify, write (READ, UPDATE, R-M-W)]

4.3. Discussion

While it is not surprising that all single user YCSB workloads were best for the key-value store Redis, we made two notable observations: (1) good single instance scaling of MongoDB from one to 64 users, and (2) OrientDB with competitive results compared to "best-in-class" document and key-value stores.

Firstly, the slightly better update and insert performance of MongoDB and Redis could be explained by their slightly more relaxed ACID guarantees (e.g., with single-document focus, compare-and-set operation focus). For single user workloads, MongoDB seems to have additional overhead (i.e., MongoDB's hash index is actually a B-tree, cf. [21]) that amortizes for multiple users in a single database instance (cf. observation (1)). Secondly, despite being implemented in Java, OrientDB's unified data model and query capabilities on one single persistence, combined with proven database technology (e.g., W-TinyLFU buffer cache) and off-heap memory usage, make it competitive with the other databases in our experiments (cf. observation (2)).

5. Conclusions

This paper gives the first comprehensive description of a recently revamped OrientDB, a commercially successful NoSQL, open source MMDMS. While more and more database systems strive to become multi-model data platforms, OrientDB early on addressed key NoSQL multi-model differentiators, such as (1) a single, unified data model, (2) SQL-like, multi-model query and operations, (3) a single, multi-model ACID-transacted store, and (4) seamless scaling. Although OrientDB is one of the earliest NoSQL MMDMSs, we showed that it is competitive compared to "best-in-class" document and key-value stores.

While multi-model data platforms are on the rise, we found that future work should consider standardized, open benchmark initiatives for MMDMSs as well as for single NoSQL areas like document stores (similar to YCSB for key-value). For OrientDB in particular, there are several areas of future improvement, like adding all custom indexes to the ODB storage and further improving the performance of transacted change operations.

Acknowledgements We are especially indebted to Dr. Spranz and Dr. Pepke, who made this work possible, and we thank the fellow OrientDB contributors, mainly Enrico Risa and Colin Leister, for the joint journey through the OrientDB 3.0 to 3.2 releases.

References

[1] J. Lu, I. Holubová, B. Cautis, Multi-model databases and tightly integrated polystores: Current practices, comparisons, and open challenges, in: A. Cuzzocrea, J. Allan, N. W. Paton, D. Srivastava, R. Agrawal, A. Z. Broder, M. J. Zaki, K. S. Candan, A. Labrinidis, A. Schuster, H. Wang (Eds.), ACM International Conference on Information and Knowledge Management (CIKM), ACM, 2018, pp. 2301–2302. doi:10.1145/3269206.3274269.
[2] S. Abiteboul, M. Arenas, P. Barceló, M. Bienvenu, D. Calvanese, C. David, R. Hull, E. Hüllermeier, B. Kimelfeld, L. Libkin, W. Martens, T. Milo, F. Murlak, F. Neven, M. Ortiz, T. Schwentick, J. Stoyanovich, J. Su, D. Suciu, V. Vianu, K. Yi, Research directions for principles of data management (Dagstuhl perspectives workshop 16151), Dagstuhl Manifestos 7 (2018) 1–29. doi:10.4230/DagMan.7.1.1.
[3] D. Abadi, A. Ailamaki, D. Andersen, P. Bailis, M. Balazinska, P. Bernstein, P. Boncz, S. Chaudhuri, A. Cheung, A. Doan, et al., The Seattle report on database research, ACM SIGMOD Record 48 (2020) 44–53.
[4] J. Duggan, A. J. Elmore, M. Stonebraker, M. Balazinska, B. Howe, J. Kepner, S. Madden, D. Maier, T. Mattson, S. B. Zdonik, The BigDAWG polystore system, ACM SIGMOD Record 44 (2015) 11–16. doi:10.1145/2814710.2814713.
[5] Mihai, Multi-model database systems: The state of affairs, Economics and Applied Informatics (2020) 211–215.
[6] M. Besta, E. Peter, R. Gerstenberger, M. Fischer, M. Podstawski, C. Barthels, G. Alonso, T. Hoefler, Demystifying graph databases: Analysis and taxonomy of data organization, system designs, and graph queries, CoRR abs/1910.09017 (2019). arXiv:1910.09017.
[7] D. Fernandes, J. Bernardino, Graph databases comparison: AllegroGraph, ArangoDB, InfiniteGraph, Neo4j, and OrientDB, in: J. Bernardino, C. Quix (Eds.), International Conference on Data Science, Technology and Applications (DATA), SciTePress, 2018, pp. 373–380. doi:10.5220/0006910203730380.
[8] J. Dann, D. Ritter, H. Fröning, Non-relational databases on FPGAs: Survey, design decisions, challenges, CoRR abs/2007.07595 (2020). arXiv:2007.07595.
[9] P. E. O'Neil, The SB-tree: An index-sequential structure for high-performance sequential access, Acta Informatica 29 (1992) 241–265.
[10] G. Einziger, R. Friedman, B. Manes, TinyLFU: A highly efficient cache admission policy, ACM Transactions on Storage 13 (2017) 35:1–35:31. doi:10.1145/3149371.
[11] B. S. Gill, D. S. Modha, WOW: Wise ordering for writes – combining spatial and temporal locality in non-volatile caches, in: Conference on File and Storage Technologies (FAST), USENIX, 2005.
[12] R. Karedla, J. S. Love, B. G. Wherry, Caching strategies to improve disk system performance, Computer 27 (1994) 38–46. doi:10.1109/2.268884.
[13] D. Ongaro, J. K. Ousterhout, In search of an understandable consensus algorithm, in: USENIX Annual Technical Conference (ATC), USENIX Association, 2014, pp. 305–319.
[14] L. Lamport, Fast Paxos, Distributed Comput. 19 (2006) 79–103. doi:10.1007/s00446-006-0005-x.
[15] W. Zhao, Fast Paxos made easy: Theory and implementation, Int. J. Distributed Syst. Technol. 6 (2015) 15–33. doi:10.4018/ijdst.2015010102.
[16] S. Zhou, S. Mu, Fault-tolerant replication with pull-based consensus in MongoDB, in: J. Mickens, R. Teixeira (Eds.), USENIX Symposium on Networked Systems Design and Implementation (NSDI), USENIX Association, 2021, pp. 687–703.
[17] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, R. Sears, Benchmarking cloud serving systems with YCSB, in: ACM Symposium on Cloud Computing (SoCC), 2010, pp. 143–154. doi:10.1145/1807128.1807152.
[18] A. Kamsky, Adapting TPC-C benchmark to measure performance of multi-document transactions in MongoDB, Proc. VLDB Endow. 12 (2019) 2254–2262. doi:10.14778/3352063.3352140.
[19] J. Karimov, T. Rabl, V. Markl, PolyBench: The first benchmark for polystores, in: TPC Technology Conference on Performance Evaluation and Benchmarking (TPCTC), Springer, 2018, pp. 24–41.
[20] C. Zhang, J. Lu, P. Xu, Y. Chen, UniBench: A benchmark for multi-model database management systems, in: R. Nambiar, M. Poess (Eds.), TPC Technology Conference on Performance Evaluation and Benchmarking (TPCTC), volume 11135 of Lecture Notes in Computer Science, Springer, 2018, pp. 7–23. doi:10.1007/978-3-030-11404-6_2.
[21] G. Graefe, Modern B-tree techniques, Found. Trends Databases 3 (2011) 203–402. doi:10.1561/1900000028.

A. OSQL query model

The new OSQL query model from OrientDB version 3.0.x is sketched in Fig. 8. OSQL is defined by a JavaCC7 grammar, which is used to generate an OSQL parser that creates a query AST. To speed up the generated JavaCC parser, recurring queries are cached (i.e., logical plan cache). Several AST-level optimizations are applied (e.g., task push-down, index look-ups on link chains) that (partially) rewrite the AST (e.g., JSON path / chain dot notation becomes a set of sub-queries).

An execution planner creates an execution plan from the (optimized) AST, using pre-defined execution steps. The resulting physical plan is itself "executable" in the sense that its steps (e.g., fetch data, project, filter) can be run directly, and it is cached as well (physical plan cache).

7JavaCC parser generator, visited 11/2021: https://javacc.github.io/javacc/documentation/grammar.html

Figure 8: [JavaCC grammar → JavaCC parser (parser cache) → AST (optimized) → execution plan (steps / tasks) (plan cache)]
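The two caches of Fig. 8 can be sketched in Python as follows. The parser and planner are toy stand-ins, and the names `LruCache`, `parse`, `plan`, `execution_plan` and `on_schema_change` are hypothetical. For illustration only, the sketch assumes that cached plans are dropped when the schema or the indexes change (plans leverage indexes, see Sect. 3.1) while parse results stay valid, which is one reason to keep a separate parser cache next to the plan cache.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, List, Tuple, TypeVar

V = TypeVar("V")


class LruCache(Generic[V]):
    """Bounded least-recently-used cache keyed by the query text."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, V]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[], V]) -> V:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = compute()
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value

    def clear(self) -> None:
        self._items.clear()


def parse(query: str) -> Tuple[str, ...]:
    """Toy stand-in for the generated JavaCC parser: a token tuple as 'AST'."""
    return tuple(query.split())


def plan(ast: Tuple[str, ...]) -> List[str]:
    """Toy stand-in for the execution planner: a list of executable steps."""
    target = ast[ast.index("FROM") + 1] if "FROM" in ast else "?"
    return [f"fetch({target})", "filter", "project"]


parser_cache: "LruCache[Tuple[str, ...]]" = LruCache(capacity=100)
plan_cache: "LruCache[List[str]]" = LruCache(capacity=100)


def execution_plan(query: str) -> List[str]:
    """Return a cached plan; on a plan miss, reuse a cached parse if present."""
    def build() -> List[str]:
        ast = parser_cache.get_or_compute(query, lambda: parse(query))
        return plan(ast)
    return plan_cache.get_or_compute(query, build)


def on_schema_change() -> None:
    # Assumption for illustration: plans depend on indexes, parse trees do not.
    plan_cache.clear()


if __name__ == "__main__":
    q = "SELECT FROM person WHERE name = 'Linda'"
    print(execution_plan(q))  # ['fetch(person)', 'filter', 'project']
    execution_plan(q)         # served from the plan cache
    on_schema_change()
    execution_plan(q)         # re-planned, but the parse is reused
    print(plan_cache.hits, parser_cache.hits)  # 1 1
```

Checking the plan cache first keeps the common path to a single dictionary lookup; the parser cache only pays off when a plan has been invalidated or evicted.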