MMSBench-Net: Scenario-Based Evaluation of Multi-Model Database Systems

David Lengweiler, Marco Vogt and Heiko Schuldt
University of Basel, Department of Mathematics and Computer Science, Spiegelgasse 1, 4051 Basel, Switzerland
david.lengweiler@unibas.ch (D. Lengweiler); marco.vogt@unibas.ch (M. Vogt); heiko.schuldt@unibas.ch (H. Schuldt)
ORCID: 0009-0004-0588-8210 (D. Lengweiler); 0000-0002-2674-2219 (M. Vogt); 0000-0001-9865-6371 (H. Schuldt)

34th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), June 7-9, 2023, Hirsau, Germany.
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Abstract
Multi-model database systems have gained increasing popularity due to their efficient management of diverse types of data and support for complex queries. They offer a unified approach for managing data in various formats, including structured, semi-structured, and unstructured data. However, benchmarking the performance of such systems is a challenging task, given their complexity, mainly due to their support for multiple data models. While significant research exists for benchmarking single-model databases, a comprehensive approach for evaluating multi-model databases is still in an early stage. To address this challenge, we propose MMSBench-Net, a benchmark for evaluating multi-model database systems that support structured relational, semi-structured document, and graph data models. MMSBench-Net enables comparative analysis of database systems and demonstrates how different workloads can reveal the strengths and weaknesses of multi-model database systems. To demonstrate the effectiveness of the benchmark, we compare the performance of two database systems: Polypheny and SurrealDB. Our work is a first step towards a comprehensive evaluation methodology for multi-model database systems.

Keywords
Database benchmark, Polystore, Multi-model database

1. Introduction

The field of data management has experienced a significant transformation in recent years. While relational databases continue to dominate the market, more specialized systems have emerged. Two data models that have gained substantial popularity are the graph and the document model. These data models allow data to be represented and queried differently from the relational model [1]. However, these new data models are by no means an evolution of the relational model. As a result, use cases that could be modeled optimally with the relational model might only be modeled poorly with the graph or the document model. Consequently, database management systems supporting multiple data models have gained popularity. These multi-model database systems allow applications to manage their data in a way that best suits the specific domains, but they also introduce greater complexity. While there are well-established benchmarks like TPC-C [2], TPC-H [3] and YCSB [4] for single-model databases, the set of benchmarks targeting multi-model databases is very limited. Existing benchmarks for multi-model databases often focus on specific data models, which restricts the range of systems that can be evaluated. Moreover, these benchmarks typically involve complex scenarios that lack fine-grained workload adjustments, limiting their usefulness for detailed evaluations and only allowing for broad comparisons.

This paper makes two contributions: Firstly, we propose a benchmark called MMSBench-Net that is tailored to multi-model database systems and based on a real-world scenario that deals with relational, document and graph data. Secondly, we demonstrate the utility of our benchmark by comparing the performance of two multi-model database systems, Polypheny (https://polypheny.com/) and SurrealDB (https://surrealdb.com/), and discuss the results.

The remainder of this paper is structured as follows: In Section 2, we introduce MMSBench-Net, discuss the underlying scenario and present the data and workload that is being generated. In Section 3, we then briefly introduce the two multi-model database systems subject to the benchmark evaluation presented in this paper. Section 4 then presents and discusses the obtained results. The paper concludes with an overview of related work in Section 5, an outlook towards future work in Section 6 and a conclusion in Section 7.

2. Benchmark

To evaluate the performance of multi-model database systems, we propose MMSBench-Net, a benchmark that assesses their ability to manage structured relational, semi-structured document, and graph data models. MMSBench-Net is designed to evaluate the efficiency and versatility of multi-model database systems under different workloads. The "-Net" suffix refers to the first scenario introduced in this paper. We plan to add more scenarios (and thus suffixes) in the future, leading to a complete suite.

Figure 1: Multi-Model Schema of MMSBench-Net (document part: semi-structured device and connection logs; graph part: device nodes and connection edges with additional properties; relational part: User table with primary key id, firstname, lastname, birthday, salary, and Login table with deviceId, userId, accesstime, duration, successful)

The MMSBench-Net benchmark consists of a set of queries that reflect real-world use cases across the three data models. These queries are designed to evaluate various aspects of multi-model database systems, including their ability to handle complex data structures, support complex queries, and efficiently execute transactions.

2.1. Scenario

MMSBench-Net is inspired by a real-world scenario of a company's network monitoring application. Network monitoring plays a vital role in identifying and addressing potential issues, threats and vulnerabilities in the network infrastructure, ensuring smooth operations and preventing data breaches or downtime. The monitoring application continuously collects all kinds of information about the network, including logged-in devices, usage statistics and log messages, resulting in huge amounts of heterogeneous data.

The network monitoring application modeled by MMSBench-Net maintains data in three data models: (i) a graph part, modeling the topology of the network, (ii) a document part, which consists of semi-structured logs produced by the devices, and (iii) a relational part, which holds basic information about the users and recorded data about their access patterns. The complete schema is depicted in Figure 1.

The topology of the network is saved as a graph, where each device in the network is represented as a node, and a network connection between two devices is modeled as an edge. For both the devices modeled as nodes and the connections modeled as edges, additional information is stored, such as the "manufacturer" of a device, its "purchase year" and other relevant information.

In irregular intervals, each device produces a semi-structured log entry containing information about its current state. An example of such a log can be seen in Figure 2. These logs might contain error information, indicating problems with the device. All log entries include the properties deviceId, timestamp and users. However, additional properties with varying levels of nesting are randomly generated for each log entry.

    {
      deviceId: 3,
      timestamp: "2017-07-23:14-03",
      error: {
        message: "Out of Memory",
        type: "Application Error",
        stacktrace: ["Error on start of..."]
      },
      user: {
        id: 34,
        status: "logged in"
      },
      users: [34, 45]
    }

Figure 2: Example Status Log Showing an Error

An important piece of information for monitoring a network is which person is currently associated with which devices. For this scenario, we assume a rather simple user database represented as a relational table containing information on the employees. Furthermore, there is also a table for recording successful and failed login attempts and for accounting the usage of devices. Hence, this scenario necessitates the database system to deal with heterogeneous read and write workloads.

2.2. Schema and Data Generation

To generate the schema and to populate it with realistic but artificially created data, MMSBench-Net starts with building a simulation. This simulation includes the graph representing the network that is being monitored, as well as the users interacting with it. The nodes in the graph represent devices (e.g., computers, mobile phones, switches, and routers). The edges between these nodes represent network connections between these devices. The simulation utilizes the defined topology to generate meaningful workloads. By making changes to this topology, it becomes possible to adjust the distribution of available targets for the queries. This enables an easy alignment with the specific requirements and desired focus of a workload.
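This kind of simulated topology can be sketched in a few lines of Python. This is an illustrative sketch only, not the reference implementation; the parameter names (num_users, device_ranges) and property values are assumptions:

```python
import random

# Sketch: generate a simulated network topology from configurable parameters.
# Using a fixed seed makes the generated network reproducible.
def generate_network(seed=42, num_users=10, device_ranges=None):
    rng = random.Random(seed)  # same seed => identical network every time
    device_ranges = device_ranges or {
        "computer": (20, 40), "switch": (5, 10),
        "router": (2, 5), "phone": (10, 20),
    }
    users = [{"id": i, "name": f"user{i}"} for i in range(num_users)]
    devices, connections = [], []
    # a random number of devices (within a configurable range) per type
    for dtype, (lo, hi) in device_ranges.items():
        for _ in range(rng.randint(lo, hi)):
            devices.append({"id": len(devices), "type": dtype,
                            "manufacturer": rng.choice(["A", "B", "C"])})
    # connect each end device to a randomly chosen switch or router
    infra = [d for d in devices if d["type"] in ("switch", "router")]
    for d in devices:
        if d["type"] not in ("switch", "router"):
            connections.append({"from": d["id"], "to": rng.choice(infra)["id"]})
    return users, devices, connections
```

Changing the ranges or adding device types shifts the distribution of available query targets, which is exactly the knob the benchmark exposes.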
The process of generating this simulated network consists of multiple steps:

User Generation: First, a configurable number of users is generated.

Generation of Devices: For each type of device (e.g., switches, computers), a random number (within a configurable range) of devices is generated.

Device Properties and Logs Generation: For each device, a random set of properties is generated. Furthermore, a set of login logs as well as status logs is added.

Generation of Connections: According to the layout of the network, multiple pairs of devices are selected and connections between them are created.

Connection Properties Generation: In contrast to the devices, connections do not have status logs, but they also carry multiple dynamic properties.

After the generation of the network is done, it is used as a template to create the workload. Each workload consists of queries in a query language supported by the system under evaluation. A distinction is made between the three data models. First, the graph data is handled as already seen in Figure 1: each device is represented as a node and each connection is translated to an edge which connects them. The small set of dynamic properties is inserted directly as part of these nodes and edges (if properties are not supported by the data model, they are handled as if they were unstructured data). Then, all generated device logs (an example can be seen in Figure 2) are translated to document queries. Each entity of type device translates its nested status logs into multiple document queries, each containing a timestamp and the ID of the device. Finally, all login records are collected from the devices and, together with the user data itself, are translated into relational queries. The collected queries are then sequentially executed on the database systems.

2.3. Workload Generation

A workload consists of a collection of randomly chosen queries according to a configurable distribution. Since the order in which queries are executed can impact the performance of a database system (e.g., due to concurrency effects and locking), the implementation needs to make sure that the workload is identical for all systems under evaluation (e.g., by using the same seed). MMSBench-Net uses a variety of queries to build its workloads:

Read Device or Connection: Selects a device or connection and retrieves it partially or fully. One of the static parameters is chosen for this.

Read Log: Selects a device and reads all or parts of its logs. Filters as well as projections of underlying keys are chosen from the target device.

Remove Device: Selects a device and deletes it; all connections to this device are deleted as well. Logs are also deleted, while information on log-in attempts is kept.

Remove Connection: Randomly selects a connection between two network devices and deletes it.

Add Device: Adds a device to the network. Generates new connections to existing devices.

Remove Logs: Randomly selects a device and deletes some of its logs.

Add Logs: Creates a random log message and adds it to existing devices or connections.

Add User: Creates a new user. All attributes are randomly generated.

Remove User: Randomly selects a user who is then deleted.

Change User: Randomly selects a user and adjusts an attribute.

Besides these simple queries, there are also more complex retrieval operations which can be chosen; their frequency is also configurable:

Connectivity Checks: "Find all similar connected devices" or "Find connected devices of a specific type"

Error Analysis: "Identify the top 10 most common errors" or "Calculate the percentage of errors caused by each user"

Login Activity: "Successful logins by user and month" or "Average duration of successful logins by user and hour of the day"
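The seeded, distribution-driven selection described in this section can be illustrated with a short Python sketch. The query names and weights below are hypothetical placeholders; the actual distributions are configurable in the benchmark:

```python
import random

# Hypothetical query-type distribution; the benchmark makes these
# frequencies configurable per workload.
QUERY_DISTRIBUTION = {
    "read_device": 0.25, "read_log": 0.25, "add_logs": 0.20,
    "add_device": 0.10, "remove_logs": 0.10, "connectivity_check": 0.10,
}

def generate_workload(size, seed, distribution=QUERY_DISTRIBUTION):
    # A fixed seed guarantees that every system under evaluation
    # receives the identical query sequence in the identical order.
    rng = random.Random(seed)
    kinds = list(distribution)
    weights = list(distribution.values())
    return rng.choices(kinds, weights=weights, k=size)
```

Because query order can affect performance (locking, concurrency), reusing the same seed across systems is what makes the resulting measurements comparable.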
Firstly, the actions are selected and implemented on the simulated network, while concurrently being captured and converted into queries for the evaluated systems. Once the simulation concludes, the gathered queries are distributed across a configurable number of available threads and executed on the evaluated system. The execution time for each query is measured individually and recorded for subsequent analysis. This facilitates a comprehensive analysis of various aspects of the database systems. Each iteration of this workload generation and execution process is referred to as a cycle; in an evaluation, multiple cycles can be chained together to construct more extensive workloads.

3. Evaluated Systems

To showcase the capabilities of MMSBench-Net, two multi-model databases have been chosen for evaluation: Polypheny and SurrealDB. These two systems have been selected since they follow completely opposite approaches for implementing multiple data models beneath one facade. While Polypheny maintains the individual models independently, SurrealDB follows a more monolithic approach by combining all data models in one unified model.

3.1. Polypheny

Polypheny [5, 6] is a PolyDBMS [7], i.e., a multi-model database system built according to the architectural principle of a polystore and supporting multiple query languages. Data can be represented according to the relational, the document and the labeled-property graph data models. Polypheny utilizes multiple highly optimized database systems like HyperSQL (https://hsqldb.org/), MongoDB, Neo4j, and PostgreSQL as storage and execution engines. To achieve competitive performance, Polypheny uses these underlying data stores to push down queries. Queries not supported by the underlying data store are executed within Polypheny itself. Polypheny also provides support for transactions with ACID guarantees.

3.2. SurrealDB

SurrealDB is a multi-model database management system that provides traditional database guarantees, such as ACID transactions, persistent data storage, and fine-grained data access control. Its primary objective is to provide fast performance while adhering to these guarantees. It also supports unstructured data and basic graph functionality, which makes it a suitable choice for this comparison. SurrealDB was designed with the goal of reducing the number of joins required for retrieval queries. It accomplishes this objective by utilizing a graph structure that allows a tuple to reference any other tuple. SurrealQL, a SQL-like query language, is the primary means of interacting with the system, which can be accessed through either a REST or a WebSocket interface.

4. Evaluation

Our evaluation uses Chronos [8], an "evaluation-as-a-service" framework which allows to easily execute different system evaluations and configurations in parallel. To achieve this, it manages a collection of nodes, which are used to execute these different evaluation configurations. The evaluation machines used for obtaining the results presented in this paper are equipped with an Intel Xeon X5650 24-core CPU and 24 GiB of RAM. All machines run Ubuntu 22.04 LTS (with kernel version 5.15.0-37) and the same patch level. As Java runtime environment, we use OpenJDK version 17.0.3. The presented numbers are the median over three runs.

Each run uses either a SurrealDB instance in a Docker (https://www.docker.com/) container, deployed from scratch and configured to use a persistent on-file configuration, or a fresh Polypheny instance. The Polypheny instance uses a MongoDB (https://www.mongodb.com/) store for the document data, a Neo4j (https://neo4j.com/) store for the graph data, and a PostgreSQL (https://www.postgresql.org/) store for the relational data. Each of these stores is deployed by Polypheny using Docker containers; this requires less setup than bare-metal deployments and achieves similar performance [9]. Both Polypheny and SurrealDB have indexes on their primary keys. We provide a reference implementation of the benchmark, including all configurations and the raw results (https://download-dbis.dmi.unibas.ch/paper/GvDB23.zip).

As a first overview comparison, the default configuration of the benchmark, simulating a network with 10 users and around 65 devices, is used. All scaling parameters are configured to only allow for a slight growth of the network. The different runtimes after multiple cycles of workloads can be seen in Figure 3.

Figure 3: Runtime with Increasing Number of Cycles (total runtime, log scale, ns)

With such a small network and thus a low number of queries, SurrealDB manages to execute the workloads faster than Polypheny, even when the number of queries increases. If one observes the results grouped by query model, Polypheny is faster than SurrealDB for the relational queries; this can be seen in Figure 4.

Figure 4: Mean Relational Query Runtime with Increasing Number of Cycles (log scale, ns)

However, in most real-world scenarios, the network starts with a significantly higher number than the 10 users used by default. Thus, the number of users is adjusted, leading to a higher number of user logins and therefore more relational workload. With an increasing ratio of relational workload, Polypheny is able to perform similarly to SurrealDB. This behavior is similar if the number of devices in the network is increased. While this does not increase the ratio of the relational workload compared to the other data models, it still results in better overall performance of Polypheny, which is depicted in Figure 5. Figure 6 depicts a comparison of different ratios of complex queries in the workload.

Figure 5: Runtime with Increasing Number of Devices (total runtime, log scale, ns)

Figure 6: Runtime for Increasing Ratio of Complex Queries (total runtime, log scale, ns)

The results obtained from the evaluation of the two quite different systems confirm the concepts of the MMSBench-Net benchmark, in particular that it is agnostic to the concrete database under evaluation and has wide applicability for the evaluation of single- and multi-model database systems in realistic settings.
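The measurement methodology of this section — timing every query individually and reporting the median over runs — can be sketched as follows. The function names are illustrative and not taken from the reference implementation:

```python
import statistics
import time

def timed_execute(execute, queries):
    # Measure each query's execution time individually (in ns)
    # and record it for subsequent analysis.
    timings = []
    for query in queries:
        start = time.perf_counter_ns()
        execute(query)
        timings.append(time.perf_counter_ns() - start)
    return timings

def median_total_runtime(runs):
    # One total runtime per run; the reported number is the median
    # over the runs (three runs in the evaluation above).
    return statistics.median(sum(run) for run in runs)
```

Keeping per-query timings, rather than only per-cycle totals, is what allows results to be regrouped afterwards, e.g., by query model as in Figure 4.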
5. Related Work

One of the first prominent benchmarks for evaluating database management systems was the Wisconsin [10] benchmark, introduced in 1983. The space of multi-model database evaluation, in contrast, has a rather short history, one of the first entries being BigBench [11], introduced by TPC as TPCx-BB. BigBench uses a schema which combines structured, semi-structured and unstructured data. But besides TPC, there has been an increasing amount of work providing benchmarks similar to the one proposed in this paper. In [12], a synthetically generated benchmark using key-value, column, document and graph data is proposed and used to compare ArangoDB (https://www.arangodb.com/) and OrientDB (https://orientdb.org/) against a combination of single-model databases. The authors were able to show that, depending on the scenario, multi-model databases can be faster than configurations combining multiple single-model database systems. UniBench [13] targets the same data models as MMSBench-Net, but also considers key-value and XML data. It puts great effort into modeling an as realistic as possible social-commerce scenario. M2Bench [14] relies heavily on existing benchmark datasets and extends the data models used by UniBench by introducing the array model into its evaluation.

6. Future Work

Our goal for MMSBench-Net is to extend it into a benchmarking suite that offers various real-world usage scenarios for multi-model data management. However, there are some limitations of MMSBench-Net that we need to address in the future.

First, the minimal set of queries that we have chosen for evaluation may not be representative of all possible ways to query multi-model systems. In future evaluations, we should include a more diverse set of queries to reflect the range of possibilities when querying these systems. This would provide more comprehensive results and strengthen the obtained conclusions.

Second, the current composition of workloads is too broad and general to allow for nuanced comparisons of multi-model systems. We need to create more fine-grained workloads that focus on specific aspects of data models to capture the subtle differences between these systems.

In addition to the limitations of the benchmark, our evaluation only compared two systems, leaving a lot of unexplored territory. Future evaluations should include additional systems such as ArangoDB and OrientDB to gain more insights into their performance. Although not all multi-model databases support the same data models, it is possible to use parts of unsupported data models or substitute them with other models to expand the range of systems that can be evaluated.

Lastly, we should consider evaluating configurations that use a combination of multiple single-model databases to facilitate interesting comparisons. By addressing these limitations, we can develop a more comprehensive and nuanced benchmarking suite that offers a more accurate evaluation of multi-model systems.

7. Conclusion

In this paper, we introduced MMSBench-Net, a new benchmark specifically tailored to benchmarking multi-model database systems that is based on the scenario of a network monitoring application. Our evaluation of Polypheny and SurrealDB demonstrates the effectiveness and applicability of the proposed benchmark.

Our research represents an important first step towards establishing a comprehensive evaluation methodology for multi-model database systems. The proposed benchmark allows for a fair comparison of different systems, and our results provide insights into the performance of Polypheny and SurrealDB under different workloads. Ultimately, this benchmark will guide the development and evaluation of novel multi-model database systems.

Acknowledgments

This work was partly supported by the SNSF ("Polypheny-DDI: A Flexible Polystore-based Distributed Data Infrastructure", grant no. 200020_213121). The authors would like to thank R. Arnold, R. Gasser, S. Heller, L. Sauter, F. Spiess and A. Mbilinyi for their valuable feedback.

References

[1] E. F. Codd, A relational model of data for large shared data banks, Communications of the ACM 13 (1970) 377–387. doi:10/dwxst4.
[2] Transaction Processing Performance Council, TPC Benchmark C, revision 5.11, 2010. URL: https://tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf.
[3] Transaction Processing Performance Council, TPC Benchmark H, standard revision 3.0.1, 2022. URL: https://tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.1.pdf.
[4] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, R. Sears, Benchmarking cloud serving systems with YCSB, in: Proc. SoCC'10, ACM Press, 2010, pp. 143–154. doi:10/cxjrfd.
[5] M. Vogt, Adaptive Management of Multimodel Data and Heterogeneous Workloads, Ph.D. thesis, University of Basel, 2022. doi:10/j44k.
[6] M. Vogt, N. Hansen, J. Schönholz, D. Lengweiler, I. Geissmann, S. Philipp, A. Stiemer, H. Schuldt, Polypheny-DB: Towards bridging the gap between polystores and HTAP systems, in: Proc. Poly'21, LNCS, Springer, 2020, pp. 25–36. doi:10/gnxv2h.
[7] M. Vogt, D. Lengweiler, I. Geissmann, N. Hansen, M. Hennemann, C. Mendelin, S. Philipp, H. Schuldt, Polystore systems and DBMSs: Love marriage or marriage of convenience?, in: Proc. Poly'21, volume 12921 of LNCS, Springer, 2021, pp. 65–69. doi:10/gn8qvm.
[8] M. Vogt, A. Stiemer, S. Coray, H. Schuldt, Chronos: The swiss army knife for database evaluations, in: Proc. EDBT'20, OpenProceedings.org, 2020, pp. 583–586. doi:10/g8w5.
[9] W. Felter, A. Ferreira, R. Rajamony, J. Rubio, An updated performance comparison of virtual machines and Linux containers, in: Proc. ISPASS'15, 2015, pp. 171–172. doi:10/gfvg6d.
[10] H. Boral, D. J. DeWitt, A methodology for database system performance evaluation, in: Proc. SIGMOD'84, ACM, 1984, pp. 176–185. doi:10/fk5fbn.
[11] C. Baru, M. Bhandarkar, C. Curino, et al., Discussion of BigBench: A proposed industry standard performance benchmark for big data, in: Performance Characterization and Benchmarking. Traditional to Big Data, Springer, 2015, pp. 44–63. doi:10/j44q.
[12] F. R. Oliveira, L. del Val Cura, Performance evaluation of NoSQL multi-model data stores in polyglot persistence applications, in: Proc. IDEAS'16, ACM, 2016, pp. 230–235. doi:10/j44n.
[13] C. Zhang, J. Lu, P. Xu, Y. Chen, UniBench: A benchmark for multi-model database management systems, in: Performance Evaluation and Benchmarking for the Era of Artificial Intelligence, Springer, 2019, pp. 7–23. doi:10/j44m.
[14] B. Kim, K. Koo, U. Enkhbat, S. Kim, J. Kim, B. Moon, M2Bench: A database benchmark for multi-model analytic workloads, Proceedings of the VLDB Endowment 16 (2022) 747–759. doi:10/j44p.