<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Parameter Curation and Data Generation for Benchmarking Multi-Model Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chao Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="supervisor">
          <string-name>Jiaheng Lu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Helsinki</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Unlike traditional database management systems, which are organized around a single data model, a multi-model database is designed to support multiple data models against a single, integrated backend. For instance, a multi-model database may support document, graph, relational, and key-value models. As more and more platforms are proposed to deal with multi-model data, it becomes important to have benchmarks that can be used to evaluate the performance and usability of the next generation of multi-model database systems. In this paper, we discuss the motivations and challenges for benchmarking multi-model databases, and then present our current research on data generation and parameter curation for benchmarking multi-model queries. Our benchmark can be found at http://udbms.cs.helsinki.fi/bench/.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Recently, a new trend has emerged in data management [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref3">12, 11, 13, 3</xref>
        ]: the multi-model approach, which aims to use a single platform to manage data in
different models, e.g., key-value, document, table, and graph.
Compared to polyglot persistence in the NoSQL world, which entails managing separate
data stores to satisfy various use cases, the multi-model approach has been
considered the next generation of data management technology, combining
flexibility, scalability, and consistency.
      </p>
      <p>
        The multi-model query is a distinctive operation in
multi-model databases that allows users to retrieve multi-model
data using a single query language. Figure 1 depicts
a typical multi-model query in social
commerce [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]: for a given person p (id = 56) and product
brand b ("Nike"), find p's friends who have bought products
of brand b, and return their feedback, i.e., their product
reviews with a 5-star rating. This query involves
three data models: customers with friends (Graph), orders
embedded with an item list (JSON), and customers'
feedback (Relation).
      </p>
      <p>[Figure 1: An example multi-model query in social commerce. Graph: persons 33, 56, 101, and 145 connected by friend edges. Order (JSON): {"id": 1, "customer_id": 33, "total_price": 135, "items": [{"product_id": 85, "brand": "Nike"}, {"product_id": 86, "brand": "Adidas"}]}. Feedback (Relation) with columns custID, productID, Rating: (33, 85, 5), (56, 86, 4), (101, 87, 4), (145, 87, 3), (145, 88, 5), (145, 89, 4).]</p>
      <p>Proceedings of the VLDB 2018 PhD Workshop, August
27, 2018. Rio de Janeiro, Brazil. Copyright (C) 2018 for
this paper by its authors. Copying permitted for private
and academic purposes.</p>
      <p>
        Database benchmarks have been an essential tool for the
evaluation and comparison of DBMSs since the advent of the
Wisconsin benchmark [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in the early 1980s. Since then,
many database benchmarks have been proposed by academia
and industry for various evaluation goals, such as the TPC-X
series for RDBMSs and data warehouses, the OO7 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
benchmark for object-oriented DBMSs, and XMark [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for XML
DBMSs. More recently, the NoSQL and big data
movements of the late 2000s brought a new
generation of benchmarks, such as YCSB [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for
cloud serving systems, LDBC [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Rbench [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
for graph and RDF DBMSs, and BigBench [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
for big data systems. Unfortunately, these benchmarks are
not well suited for evaluating multi-model databases
because they do not consider multiple data models, e.g.,
multi-model storage, multi-model query processing, and
multi-model query evaluation. Motivated by this, my Ph.D.
dissertation will focus on the holistic evaluation of multi-model
databases. In general, there are three main challenges in
evaluating multi-model databases:
      </p>
      <p>First, existing data generators that involve only a single
model cannot be directly adopted to evaluate
multi-model databases, and how to design meaningful data
models that mimic most multi-model applications remains
an open question. In this regard, we develop a new data
generator to generate correlated data in diverse data
models, including Graph, JSON, XML, key-value, and tabular.
Furthermore, to simulate real-life data distributions,
we propose a three-phase framework to generate the data
in the social commerce scenario. Our data generator also
scales well because it is implemented on top of
Hadoop and Spark, which enables us to generate the data
in parallel.</p>
      <p>[Figure 2: Runtime (sec) of the example multi-model query on ArangoDB and OrientDB for two pairs of substitution parameters, Pair A and Pair B (Pair A: @PersonId=33, @BrandName="Adidas").]</p>
      <p>
        The second benchmarking challenge is the problem of
parameter curation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], whose goal is to select
substitution parameters for a multi-model query template that yield
stable runtime behavior. The rationale is that different
parameter values for the same query template can result
in high runtime variance. For instance, in Figure 1,
PersonId/56 and BrandName/"Nike", highlighted in orange in
the AQL query, are two substitution parameters that can be
replaced by other values in the example query template.
Figure 2 illustrates our experimental results with two
different pairs of substitution parameters on two representative
multi-model databases: ArangoDB [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and OrientDB [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
As shown in Figure 2, the two pairs lead to
opposite conclusions when comparing the performance of
ArangoDB and OrientDB. Interestingly, we observed that the
query runtime mainly depends on which data model
dominates. For example, pair A involves relatively larger
intermediate JSON results, whereas pair B involves larger
intermediate Graph results. Therefore, parameter curation
for benchmarking multi-model queries requires
answering three questions: (i) how to select parameters
from the data model perspective, (ii) how to cover
different workloads with respect to the data models, and (iii) how to
guarantee a stable distribution of substitution parameters.
In light of this, we formalize the problem as top-k
parameter group curation and propose a new algorithm,
MJFast, to select the ideal parameter groups.
      </p>
      <p>
        The third challenge concerns the benchmark metrics.
Metrics are needed both for evaluating the
multi-model dataset (e.g., how closely the data mimic
real heterogeneous datasets) and for evaluating multi-model queries (e.g.,
to what extent the queries capture diverse multi-model
patterns). However, the disparity between data
models in data structure and workload complexity is
a major hurdle when trying to define these new metrics.
For the dataset evaluation metrics, we intend to use
dataset coherence and relationship specialty [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], as well as
multi-model complexity. Regarding the multi-model query metrics,
we define a unified metric that characterizes
query processing with respect to the data models. For instance,
the metric can measure either the cost of a
nested-loop join in the relational model or the cost
of shortest-path matching in the graph model.
      </p>
      <p>The rest of this paper is organized as follows. Section 2
presents an overview of our approach. Section 3 introduces
the methods and techniques for data generation. Section
4 presents our method for parameter curation. Section 5
shows preliminary experimental results. Finally, Section 6
summarizes the paper and outlines our future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. OVERVIEW OF OUR APPROACH</title>
      <p>
        Figure 3 gives an overview of our benchmarking approach,
which consists of three key components for evaluating
multi-model queries. The metadata in the repository is first passed
into the Data Generation component (Section 3), which
generates data in a unified multi-model form using our
data generator. Next, the Workload
Generation component generates multi-model queries against
the data models. These are complex read-only queries,
each involving at least two data models, that aim to cover
different business cases and technical perspectives. More specifically,
as for business cases, the queries fall into four main levers [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]:
individual, conversation, community, and commerce. Within these four
levers, commonly used business cases at different
granularities are rendered. Regarding technical perspectives, the
queries are designed based on the choke-point concept, which
combines common technical challenges in processing data in
multiple data models, ranging from conjunctive queries
(OLTP) to analytical (OLAP) workloads. The final part is
the Parameter Curation component (Section 4). It first
characterizes a multi-model query by identifying its
parameters and the corresponding data models involved. Then,
a model vector is generated for each parameter value.
Finally, the top-k parameters for the multi-model
query are selected with the proposed MJFast algorithm.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. DATA GENERATION</title>
      <p>
        Data generation is the cornerstone of our benchmark
and comprises two main parts: social network
generation and e-commerce data generation. The former
generates the social graph, including person entities
and knows relations, as well as their activities such as posts,
comments, and likes. It is based on the LDBC
SNB [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] data generator, a representative tool for
generating social network data with rich semantics and
scalability. The latter generates the e-commerce
data. Specifically, we propose a three-phase framework that
generates transactions by taking into account persons'
interests, friendships, and social engagement. The three
phases are purchase, propagation-purchase, and
re-purchase in the context of social commerce.
      </p>
      <p>Purchase. In this phase, we consider two factors when
generating transaction data. First, persons usually buy
products based on their interests. Second, persons with
more interests are more likely to buy products than
others. This phase is implemented on top of Spark SQL
in Scala, using its rich set of APIs and UDFs to
output the various models simultaneously without any
additional operations. Consequently, our data include five
models: social network (Graph), vendor and feedback (Relation),
order (JSON), invoice (XML), and product (Key-value).</p>
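      <p>The two purchase-phase factors can be illustrated with a minimal Python sketch under stated assumptions: products are drawn only from tags matching one of the person's interests (factor one), and the maximum number of purchases grows linearly with the number of interests (factor two). The function <monospace>purchase_phase</monospace>, the <monospace>base_rate</monospace> parameter, and the uniform sampling are hypothetical; the actual generator runs on Spark SQL in Scala and its exact sampling rules are not given here.</p>
      <preformat>
```python
import random


def purchase_phase(interests, products_by_tag, base_rate=1.0, seed=0):
    """Hypothetical purchase phase.

    interests: person id -> list of interest tags
    products_by_tag: tag -> list of product ids
    Returns a list of (person id, product id) purchases.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    purchases = []
    for person, tags in interests.items():
        # Factor 2 (assumed linear rule): more interests -> higher purchase cap.
        cap = round(base_rate * len(tags))
        for _ in range(rng.randint(0, cap)):
            # Factor 1: the product matches one of the person's interests.
            tag = rng.choice(tags)
            purchases.append((person, rng.choice(products_by_tag[tag])))
    return purchases


if __name__ == "__main__":
    interests = {33: ["sports", "music"], 56: ["sports"], 101: []}
    products_by_tag = {"sports": [85, 86], "music": [87]}
    for person, product in purchase_phase(interests, products_by_tag, base_rate=2.0):
        print(person, product)
```
      </preformat>
      <p>A person without interests (101 above) gets a cap of zero and therefore never buys, which is the simplest way to encode the second factor.</p>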
      <p>Propagation-Purchase. In this phase, we incorporate
two ingredients from the previous data generation: (i) persons'
basic demographic data, e.g., gender, age, and location; and (ii) the purchase transactions of each person's friends.</p>
      <p>[Figure 3: Overview of our benchmarking approach (Metadata Repository; LDBC, Interests, and Friendships; Purchase, Propagation-Purchase, and Re-Purchase with the CLVSC Model; Multi-Model Data in Graph, Relation, JSON, XML, and Key-value; Multi-Model Query with Business Factors and Choke Points; Metrics).]</p>
      <p>In the propagation-purchase model, the first part,
<inline-formula><tex-math>\sum_{k} k \Pr(R_{ui} = k \mid A = a_u)</tex-math></inline-formula>,
is the expectation of the probability distribution of the target user u's rating on
the target item i, where <inline-formula><tex-math>A = \{a_1, a_2, \ldots, a_m\}</tex-math></inline-formula> is the user attribute
set; we compute this part with the naive Bayes method.
The latter part, <inline-formula><tex-math>E(R_{vi} : \forall v \in N(u))</tex-math></inline-formula>, is the expectation of
the rating distribution of u's friends on the target item, where
N(u) is the set of u's friends and the item i comes from the
friends' purchase transactions.</p>
      <p>
        Re-Purchase. To make fine-grained predictions that
incorporate customers' social activities, we propose a
new probabilistic model, CLVSC (Customer Lifetime Value
in Social Commerce), to generate transactions based on
the history of customers' purchases and social activities:
        <disp-formula>
          <label>(2)</label>
          <tex-math>CLVSC_{ib} = E(X \mid n^{*}, x', n, m, \alpha, \beta, \gamma, \delta)\,\bigl(E(M \mid p, q, \gamma, m_x, x) + E(S \mid s, \ldots)\bigr)</tex-math>
        </disp-formula>
where i and b are the customer and brand indices,
respectively, <inline-formula><tex-math>E(X \mid \cdot)</tex-math></inline-formula> is the expected number of behaviors, and
<inline-formula><tex-math>E(M \mid \cdot)</tex-math></inline-formula> is the expected monetary value; the parameters in these
two parts are for the beta-geometric/beta-binomial model [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
and <inline-formula><tex-math>E(S \mid \cdot)</tex-math></inline-formula> is the expected number of the customer's social
activities, whose parameters are for the Poisson-gamma
model.
      </p>
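      <p>The two parts of the propagation-purchase model described above can be illustrated with a small Python sketch. The first part, Σ<sub>k</sub> k·Pr(R<sub>ui</sub> = k | A = a<sub>u</sub>), is computed with a categorical naive Bayes classifier (the Laplace smoothing is our assumption), and the second part, E(R<sub>vi</sub> : ∀v ∈ N(u)), as the sample mean of the friends' observed ratings on item i. The function names and data layout are hypothetical, and how the two parts are combined is outside this sketch, so they are returned separately.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def nb_expected_rating(train, attrs):
    """First part: sum_k k * Pr(R = k | A = attrs) under categorical naive Bayes.

    train: list of (attribute tuple, rating) pairs, e.g. (("F", "25-34", "FI"), 4)
    attrs: attribute tuple a_u of the target user
    """
    n = len(train)
    prior = Counter(r for _, r in train)        # rating -> count
    cond = defaultdict(Counter)                 # (rating, position) -> value counts
    distinct = defaultdict(set)                 # position -> observed values
    for a, r in train:
        for j, v in enumerate(a):
            cond[(r, j)][v] += 1
            distinct[j].add(v)
    joint = {}
    for k, n_k in prior.items():
        p = n_k / n                             # Pr(R = k)
        for j, v in enumerate(attrs):           # attributes independent given R
            p *= (cond[(k, j)][v] + 1) / (n_k + len(distinct[j]))  # Laplace smoothing
        joint[k] = p
    z = sum(joint.values())                     # normaliser -> Pr(R = k | A = attrs)
    return sum(k * p / z for k, p in joint.items())


def friends_expected_rating(ratings, friends, item):
    """Second part: mean observed rating of u's friends N(u) on the target item."""
    vals = [ratings[(v, item)] for v in friends if (v, item) in ratings]
    return sum(vals) / len(vals) if vals else None


if __name__ == "__main__":
    train = [(("F",), 5), (("M",), 3)]
    print(nb_expected_rating(train, ("F",)))                     # about 4.33
    ratings = {(33, 85): 5, (101, 85): 3}
    print(friends_expected_rating(ratings, {33, 101, 145}, 85))  # 4.0
```
      </preformat>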
    </sec>
    <sec id="sec-4">
      <title>4. PARAMETER CURATION</title>
    </sec>
    <sec id="sec-5">
      <title>4.1 Preliminaries</title>
      <p>In this section, we describe the preliminaries of the
parameter curation problem. We assume that the
parameters selected for a multi-model query should guarantee the
following properties: (i) the query result should correspond
to the involved data models and their combinations, (ii) the size
of the involved data models should be bounded within each class, and (iii)
the selected parameters should cover different classes in the
whole parameter space.</p>
      <p>To satisfy property (i), we propose a vector-based
approach that represents parameter values from the multi-model
perspective. Specifically, we compute the sizes of all
intermediate results corresponding to a parameter value, based on the
combinations of data models. For instance, given the query
in Section 1 and a parameter pair (p, b), we compute a
non-zero vector (G, J, GJ, GJR), where G stands for Graph,
J for JSON, and R for Relation; GJ refers to the combination
of Graph and JSON, i.e., the persons who are p's friends and
have bought products of brand b, and GJR to the combination of all
three models, i.e., the final result. This method allows us
to represent the model-oriented results independently of the
databases. The model vector is defined as follows:</p>
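      <p>Definition 1. Model Vector: In a multi-model query,
each model vector is defined as <inline-formula><tex-math>\vec{v} = \{c_1, \ldots, c_k, \ldots, c_n\}</tex-math></inline-formula>, where c<sub>k</sub> is the
size of the k-th intermediate result over the involved data models or
their combinations, and c<sub>n</sub> is the final result size. The length of
<inline-formula><tex-math>\vec{v}</tex-math></inline-formula> lies in <inline-formula><tex-math>[3, 2^m - 1]</tex-math></inline-formula>, where m is the number of data
models, since m models have <inline-formula><tex-math>2^m - 1</tex-math></inline-formula> non-empty combinations.</p>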
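      <p>To make Definition 1 concrete, the following Python sketch computes the vector (G, J, GJ, GJR) for the example query of Section 1 on toy data. The order and feedback records follow Figure 1, but the friend edges of person 56 and the choice to count orders (rather than items) for J are assumptions for illustration, and the function <monospace>model_vector</monospace> is hypothetical.</p>
      <preformat>
```python
# Toy data loosely modeled on Figure 1; the friend edges are assumed.
friends = {56: {33, 101, 145}}
orders = [
    {"id": 1, "customer_id": 33, "total_price": 135,
     "items": [{"product_id": 85, "brand": "Nike"},
               {"product_id": 86, "brand": "Adidas"}]},
]
feedback = [  # (custID, productID, rating)
    (33, 85, 5), (56, 86, 4), (101, 87, 4),
    (145, 87, 3), (145, 88, 5), (145, 89, 4),
]


def model_vector(person, brand):
    """Return (G, J, GJ, GJR) intermediate-result sizes for a parameter pair (p, b)."""
    g = friends.get(person, set())                    # Graph: p's friends
    j = [o for o in orders                            # JSON: orders containing brand b
         if any(it["brand"] == brand for it in o["items"])]
    bought = {(o["customer_id"], it["product_id"])    # (customer, product) of brand b
              for o in j for it in o["items"] if it["brand"] == brand}
    gj = g.intersection(c for c, _ in bought)         # friends who bought brand b
    gjr = [f for f in feedback                        # Relation: their 5-star feedback
           if f[0] in gj and (f[0], f[1]) in bought and f[2] == 5]
    return (len(g), len(j), len(gj), len(gjr))


print(model_vector(56, "Nike"))    # (3, 1, 1, 1)
print(model_vector(56, "Adidas"))  # (3, 1, 1, 0)
```
      </preformat>
      <p>Both pairs give the same G, J, and GJ sizes but differ in the final component, which is the kind of difference the model vector is meant to expose without running the query on any particular DBMS.</p>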
      <p>Regarding property (ii), we assume that a representative
class in the whole parameter space has two traits: a
considerable number of model vectors, and a bounded
distance between these vectors. Hence, we define such a qualified
class as a candidate parameter group:</p>
      <p>Definition 2. Candidate parameter group: In the
parameter space, a candidate parameter group is a region
of a given radius that covers at least a given minimum number of model vectors.</p>
      <p>To fulfill property (iii), we find the k farthest
candidate parameter groups. Therefore, the parameter curation
problem boils down to finding the top-k candidate
parameter groups with the maximum number of model vectors
and the maximum distance between groups.</p>
    </sec>
    <sec id="sec-6">
      <title>4.2 Problem Definition and Algorithm</title>
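      <p>We now formalize the problem as follows.
Top-k Parameter Groups Curation: Given a
multi-model query MQ with parameter space P, a set of N
points in <inline-formula><tex-math>\mathbb{R}^d</tex-math></inline-formula> in which each point is a d-dimensional model
vector, and taking the distance between two groups to be the Euclidean
distance between their centroids, the objective is to
select k disjoint candidate parameter groups <inline-formula><tex-math>S_k \subseteq P</tex-math></inline-formula> such
that the score</p>
      <p>
        <disp-formula>
          <label>(3)</label>
          <tex-math>S(S_k) = \alpha \sum_{i=1}^{k} \frac{\mathrm{Density}(S_i)}{N_1} + \beta \sum_{i=1}^{k} \frac{\mathrm{Distance}(S_i)}{N_2}</tex-math>
        </disp-formula>
is maximized, where S<sub>i</sub> is the i-th selected group, and α and β reflect the importance
of density and distance, respectively, with α + β = 1. N<sub>1</sub> and N<sub>2</sub>
are normalization constants that scale the sum of
densities and the sum of distances, respectively, to the range [0, 1].</p>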
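      <p>Equation (3) can be evaluated directly once the groups are chosen. The Python sketch below assumes that the density of a group is the number of model vectors it covers and that the distance term is the sum of pairwise Euclidean distances between group centroids; both readings and the function names are our assumptions rather than the authors' definitions.</p>
      <preformat>
```python
import math
from itertools import combinations


def centroid(group):
    """Component-wise mean of the model vectors in a group."""
    dims = len(group[0])
    return tuple(sum(v[i] for v in group) / len(group) for i in range(dims))


def curation_score(groups, alpha, n1, n2):
    """Score S of Eq. (3) for a list of groups (each a list of model vectors)."""
    beta = 1.0 - alpha                                # alpha + beta = 1
    density = sum(len(g) for g in groups)             # assumed: vectors covered
    cents = [centroid(g) for g in groups]
    distance = sum(math.dist(a, b)                    # pairwise centroid distances
                   for a, b in combinations(cents, 2))
    return alpha * density / n1 + beta * distance / n2


groups = [[(0, 0), (0, 2)], [(3, 1), (3, 1), (3, 1)]]
print(curation_score(groups, alpha=0.5, n1=10, n2=6))  # 0.5
```
      </preformat>
      <p>Moving α toward 1 makes the score favor dense groups, while moving it toward 0 favors well-separated groups.</p>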
      <p>The parameter curation problem is non-trivial because
it subsumes two NP-complete problems: the top-k highest dense
regions problem and the top-k weighted maximum vertex cover
problem. Therefore, we propose a greedy algorithm, called
MJFast, to tackle it. The main idea of MJFast is
to first gather similar candidate parameter groups into
clusters, then choose the top-k densest candidate
parameter groups from each cluster, and finally return the top-k farthest
groups among all the chosen groups. Specifically, we propose
a new data structure, the snowball, to store closely located
candidate parameter groups. A snowball starts with an
arbitrary candidate parameter group whose centroid is
a model vector, and then rolls recursively as long as other qualified
centroid vectors exist within the group. Since each snowball
maintains only the top-k densest groups at each iteration, the
search space is reduced dramatically. In MJFast, we
build a k-d tree to speed up nearest-neighbor searches,
so the average search time is O(log n). In the worst
case, where all points of P fall within the same group, the time
complexity is O(n log n).</p>
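      <p>The core idea, dense candidate groups first and then far-apart groups, can be illustrated with a simplified greedy sketch in Python. This is not MJFast itself: it omits the clustering and the snowball structure, and it counts neighbors by brute force where MJFast uses a k-d tree. The parameters <monospace>radius</monospace> and <monospace>min_size</monospace> stand in for the radius and minimum coverage of Definition 2 and, like the function names, are hypothetical.</p>
      <preformat>
```python
import math


def candidate_groups(points, radius, min_size):
    """Balls centred on each model vector that cover at least min_size vectors."""
    groups = []
    for c in points:
        members = frozenset(i for i, p in enumerate(points)
                            if radius >= math.dist(c, p))
        if len(members) >= min_size:
            groups.append((c, members))
    return groups


def greedy_top_k(points, radius, min_size, k):
    """Pick the densest group, then repeatedly add the disjoint group whose
    centroid is farthest from all centroids chosen so far."""
    groups = sorted(candidate_groups(points, radius, min_size),
                    key=lambda g: len(g[1]), reverse=True)  # densest first (stable)
    if not groups:
        return []
    chosen = [groups[0]]
    used = set(groups[0][1])
    while k > len(chosen):
        best, best_d = None, -1.0
        for centre, members in groups:
            if used.intersection(members):
                continue                                  # keep groups disjoint
            d = min(math.dist(centre, c) for c, _ in chosen)
            if d > best_d:
                best, best_d = (centre, members), d
        if best is None:
            break                                         # no disjoint group left
        chosen.append(best)
        used.update(best[1])
    return [centre for centre, _ in chosen]


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (20, 1)]
print(greedy_top_k(pts, radius=1.5, min_size=2, k=2))  # [(0, 0), (20, 1)]
```
      </preformat>
      <p>Here the neighbor counting is quadratic in the number of model vectors, which is the cost the k-d tree in MJFast is meant to reduce.</p>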
    </sec>
    <sec id="sec-7">
      <title>5. EXPERIMENTAL RESULTS</title>
    </sec>
    <sec id="sec-8">
      <title>5.1 Data Generation</title>
      <p>In terms of efficiency, the experimental results show that the
data generator can generate a 1 GB multi-model dataset in 5
minutes on a single 8-core machine running MapReduce and
Spark in "pseudo-distributed" mode. In terms of scalability,
we successfully generated a 10 GB dataset within 20 minutes on
a three-node cluster.</p>
    </sec>
    <sec id="sec-9">
      <title>5.2 Parameter Curation</title>
      <p>We use two metrics to compare MJFast with the
random method. First, to measure the stability of a method,
we compare the KL-divergence D<sub>KL</sub> between two groups of
parameters for a fixed k, where each distribution is the discrete
distribution of the dominating data models. For example, when k is 5
and the two model-dominating distributions are (G: 4, J: 1, R: 0)
and (G: 1, J: 1, R: 3), D<sub>KL</sub> is 1.11.
Second, we run experiments on ArangoDB to compare
the total runtime variance (TRV) between two groups of
parameters. As shown in Table 1b, the parameters curated by
MJFast not only yield a small D<sub>KL</sub> but
also result in low runtime variance as k increases.</p>
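      <p>The D<sub>KL</sub> value in the example can be reproduced with a few lines of Python. Turning the counts into probabilities gives P = (0.8, 0.2, 0) and Q = (0.2, 0.2, 0.6), so D<sub>KL</sub>(P‖Q) = 0.8 ln 4 + 0.2 ln 1 ≈ 1.11 with the natural logarithm and the convention 0 ln 0 = 0; the reverse direction D<sub>KL</sub>(Q‖P) is infinite because P assigns no mass to R. The function name <monospace>kl_divergence</monospace> is illustrative.</p>
      <preformat>
```python
import math


def kl_divergence(p_counts, q_counts):
    """D_KL(P || Q) for two discrete model-dominating distributions given as counts."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    d = 0.0
    for model, pc in p_counts.items():
        if pc == 0:
            continue                    # convention: 0 * ln(0 / q) = 0
        p = pc / p_total
        q = q_counts.get(model, 0) / q_total
        if q == 0:
            return math.inf             # P has mass where Q has none
        d += p * math.log(p / q)        # natural logarithm
    return d


group_a = {"G": 4, "J": 1, "R": 0}
group_b = {"G": 1, "J": 1, "R": 3}
print(round(kl_divergence(group_a, group_b), 2))  # 1.11
print(kl_divergence(group_b, group_a))            # inf
```
      </preformat>
      <p>Because D<sub>KL</sub> is asymmetric, the order of the two parameter groups should be fixed when comparing methods.</p>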
    </sec>
    <sec id="sec-10">
      <title>5.3 Preliminary Benchmarking Results</title>
      <p>Table 1a shows the preliminary results of
benchmarking multi-model queries on ArangoDB and OrientDB.
We conducted the experiments on a quad-core Xeon E5540
server with 32 GB of RAM and a 500 GB disk. All
benchmark queries involve at least two models; in particular, Q1,
Q2, and Q5 are Graph-dominating workloads, and Q3 and Q4 are
JSON-dominating workloads. The results show that, in the
multi-model context, OrientDB outperforms ArangoDB
on the Graph-dominating workloads, whereas ArangoDB performs
better on the JSON-dominating workloads. This also suggests
that the multi-model capabilities of these two databases
depend on their primary models: ArangoDB was originally
a document-oriented database, whereas OrientDB is a native
graph database.</p>
    </sec>
    <sec id="sec-11">
      <title>6. CONCLUSION AND FUTURE WORK</title>
      <p>Benchmarking multi-model databases is a challenging
task because current public data and workloads do not
match the variety of cases in real applications well. To date, we
have developed a scalable data generator that provides data in
multiple data models, including Graph, JSON, XML,
key-value, and tabular. The MJFast algorithm, which we propose to
address the problem of parameter curation, ensures that our
performance analysis is holistic and valid.</p>
      <p>[Table 1: (a) Mean runtime of multi-model queries (s); (b) D<sub>KL</sub> and total runtime variance for k = 5, k = 10, and k = 20 (recovered D<sub>KL</sub> row: 0.89, 0.19, 0.11).]</p>
      <p>The general plan for completing my Ph.D. dissertation is to
focus on the three components shown in Figure 3. First,
since the data schema and the corresponding models of real
applications may change over time, we will incorporate such changes
into data generation. Second, we will optimize the MJFast
algorithm by incorporating a sampling-based method to
avoid computing over the whole parameter space. Finally,
we will finalize the multi-model query templates and the
unified metric, and then conduct a set of experimental studies on
multi-model databases. Another extension is to investigate
the ACID guarantees of multi-model transactions.</p>
      <p>Acknowledgement. This work is supported by the Academy of
Finland (310321), China Scholarship, and CIMO Fellowship.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>ArangoDB.</surname>
          </string-name>
          <article-title>Highly available multi-model NoSQL database</article-title>
          . https://www.arangodb.com/,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Carey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>DeWitt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Naughton</surname>
          </string-name>
          .
          <article-title>The OO7 benchmark</article-title>
          .
          <source>In ACM SIGMOD</source>
          , pages
          <fpage>12</fpage>
          –
          <lpage>21</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>Big data challenge: a data management perspective</article-title>
          .
          <source>Frontiers Comput. Sci.</source>
          ,
          <volume>7</volume>
          (
          <issue>2</issue>
          ):
          <fpage>157</fpage>
          –
          <lpage>164</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. F.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Silberstein</surname>
          </string-name>
          , E. Tam,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sears</surname>
          </string-name>
          .
          <article-title>Benchmarking cloud serving systems with YCSB</article-title>
          .
          <source>In ACM SoCC</source>
          , pages
          <fpage>143</fpage>
          –
          <lpage>154</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>DeWitt</surname>
          </string-name>
          .
          <article-title>The Wisconsin benchmark: Past, present, and future</article-title>
          .
          <source>In The Benchmark Handbook</source>
          , pages
          <fpage>119</fpage>
          –
          <lpage>165</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>O.</given-names>
            <surname>Erling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Averbuch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Larriba-Pey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gubichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prat-Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pham</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Boncz</surname>
          </string-name>
          .
          <article-title>The LDBC Social Network Benchmark: Interactive Workload</article-title>
          .
          <source>In SIGMOD</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Fader</surname>
          </string-name>
          .
          <article-title>Customer-base analysis with discrete-time transaction data</article-title>
          .
          <source>PhD thesis</source>
          , University of Auckland,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghazal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rabl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Raab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Crolotte</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Jacobsen</surname>
          </string-name>
          .
          <article-title>BigBench: Towards an Industry Standard Benchmark for Big Data Analytics</article-title>
          .
          <source>In ACM SIGMOD</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gubichev</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Boncz</surname>
          </string-name>
          .
          <article-title>Parameter curation for benchmark queries</article-title>
          .
          <source>In TPCTC</source>
          , pages
          <fpage>113</fpage>
          –
          <lpage>129</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Benyoucef</surname>
          </string-name>
          .
          <article-title>From e-commerce to social commerce: A close look at design features</article-title>
          .
          <source>ECRA</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <article-title>Towards Benchmarking Multi-Model Databases</article-title>
          .
          <source>In CIDR</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Holubova</surname>
          </string-name>
          .
          <article-title>Multi-model data management: What's new and what's next?</article-title>
          .
          <source>In EDBT</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>UDBMS: Road to unification for multi-model data management</article-title>
          .
          <source>CoRR, abs/1612.08050</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>OrientDB.</surname>
          </string-name>
          <article-title>Distributed Multi-model and Graph Database</article-title>
          . http://orientdb.com/orientdb/,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Qiao</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z. M.</given-names>
            <surname>Ozsoyoglu</surname>
          </string-name>
          .
          <article-title>Rbench: Application-specific RDF benchmarking</article-title>
          .
          <source>In ACM SIGMOD</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Waas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Kersten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Carey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Manolescu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Busse</surname>
          </string-name>
          .
          <article-title>XMark: A Benchmark for XML Data Management</article-title>
          .
          <source>In VLDB</source>
          , pages
          <fpage>974</fpage>
          –
          <lpage>985</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K. Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Consumer behavior in social commerce: A literature review</article-title>
          .
          <source>Decision Support Systems</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>