<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Monitoring Thread-Related Resource Demands of a Multi-Tenant In-Memory Database in a Cloud Environment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dominik Paluch</string-name>
          <email>dominik.paluch@in.tum.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harald Kienegger</string-name>
          <email>harald.kienegger@in.tum.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Rank</string-name>
          <email>johannes.rank@in.tum.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Helmut Krcmar</string-name>
          <email>krcmar@in.tum.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair for Information Systems, Technical University of Munich</institution>
          ,
          <addr-line>Garching</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Estimating the resource demand of a highly configurable software system like an in-memory database is a difficult task. Many factors such as the workload, flexible resource allocation, multi-tenancy and various configuration settings influence the actual performance behavior of such systems. Cloud providers offering Database-as-a-Service applications need to monitor and account for these factors in order to utilize their systems in an efficient and cost-effective manner. However, only observing the CPU utilization of the database's processes, as done by traditional performance approaches, is not sufficient to accomplish this task. This is especially relevant for environments with multiple active database tenants, which add another level of complexity to the thread handling on multiple layers such as the database management system or the operating system. In this paper, we propose a fine-grained monitoring setup that allows us to analyze the performance of virtualized multi-tenant databases. Our focus is on extensively collecting and analyzing performance data on a thread level. We utilize this setup to show the performance influence of varying database configuration settings, different workload characteristics, multi-tenancy and virtualization features. To this end, we conducted several experiments by applying the TPC-H benchmark to generate OLAP workload on a SAP HANA database in a virtualized environment. In our experiments, we show a strong dependency between the specific type of workload and the performance. Furthermore, we analyze the workload-dependent performance improvements and the performance degradation when changing the runtime configuration.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• General and reference → Measurement; Performance; •
Information systems → Main memory engines; Database
performance evaluation;</p>
    </sec>
    <sec id="sec-2">
      <title>1 INTRODUCTION</title>
      <p>
        Cloud-based services are becoming increasingly attractive to enterprises by offering technical and financial advantages over in-house data centers. Furthermore, the demand for database services in the cloud, also known as Database-as-a-Service (DaaS), has been increasing in recent years. The growing density of memory chips combined with decreasing prices permits the operation of databases holding an enterprise application's complete operational data set in memory. These so-called in-memory databases have performance advantages over traditional disk-based databases when performing big data analytics or processing analytical data in general. For this reason, they are optimally suited for online analytical processing (OLAP) workload. In these cases, parallel computing methods are used to optimize the performance of OLAP workload, resulting in the generation of multiple parallel threads [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Utilizing virtualization and multi-tenancy features allows cloud providers to decrease their operational costs, such as hardware, energy and software licensing costs. Furthermore, these features help them to reduce administration efforts in order to provide their services in a cost-efficient and scalable fashion [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, it is crucial to understand the impact of different influence factors and setups on the performance behavior of in-memory databases when utilizing these features [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Recent work has shown that the efficient usage of threads is an important performance aspect for all in-memory databases that process OLAP workload [
        <xref ref-type="bibr" rid="ref11 ref7">7, 11</xref>
        ]. Current research either focuses on the operation of a multi-tenant database system itself [
        <xref ref-type="bibr" rid="ref12 ref9">9, 12</xref>
        ] or takes a more generic view of the underlying virtualization layer [
        <xref ref-type="bibr" rid="ref10 ref4">4, 10</xref>
        ]. However, cloud providers often utilize both concepts. They use virtualization to increase the efficiency and improve the flexibility of their hardware usage, and multi-tenancy features on the database layer to further reduce the maintenance effort and the overhead of multiple virtual machines (VMs). Thus, it is expedient to consider not only the performance of a multi-tenant database or the virtualization layer in isolation, but both concepts together.
      </p>
      <p>Furthermore, cloud providers often utilize only high-level monitoring, which does not make it possible to monitor the utilization of the individual threads of a process. In our work, we observed a fully utilized CPU even in scenarios with less intense workload. Thus, monitoring only the CPU utilization is not enough to draw conclusions about the current workload intensity. In order to identify critical workload scenarios, a more fine-grained monitoring is necessary to capture all performance-relevant aspects.</p>
      <p>
        This article is an extension of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and addresses the challenge of examining the performance behavior of the in-memory database SAP HANA in the context of a cloud provider's virtualized environment by considering various configuration changes. These include changes regarding the database system, the virtualization layer, the workload parametrization and the multi-tenancy features. Furthermore, a focus is set on thread efficiency, taking into account the high degree of parallelization when processing OLAP workload. Thus, the following aspects were considered when designing our setup:
        Workload Characteristics Regarding the workload, we consider the workload intensity and the definition of a user in the context of our benchmark setup. In this paper, we extend previous work by varying the type and intensity of the workload.
      </p>
      <p>Multi-Tenancy We measure the threading characteristics of the single statements to get additional insights into threading efficiency. We also set various limits on the number of concurrently active threads in the HANA database to show the performance impact on the response time of the individual statements. In addition, we vary the number of concurrently active tenant databases to consider multi-tenant as well as single-tenant scenarios.</p>
      <p>
        Virtualization Aspects When considering threading, it is important to include aspects of the virtualization layer. We consider the dynamic assignment of processing resources to a VM running a multi-tenant in-memory database. Since the simultaneous multithreading (SMT) technology has an influence on the processing time of threads [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we also perform benchmarks to quantify this impact.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2 METHODOLOGY</title>
    </sec>
    <sec id="sec-4">
      <title>2.1 Hardware and Software Setup</title>
      <p>
        We used the established OLAP benchmark suite TPC-H on a SAP HANA database for our experiments [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. To conduct the benchmarks, we used HANA version 2.0 SP 2 in a multi-tenant configuration. In total, five tenants were configured. We populated the database tenants with data sets generated with a scale factor of 30, as proposed by the benchmark guidelines. This resulted in tenant sizes of 30 GB each. To avoid unwanted performance interference between the tenants, we created individual data sets for each tenant.
      </p>
      <p>We chose SUSE Linux Enterprise Server (SLES) 12 SP2 as the underlying operating system. Our experiments were conducted on two VMs on an IBM Power E870 server with four CPU sockets populated with Power8 CPUs. In total, the server offered 40 physical CPU cores operating at a clock frequency of 4.19 GHz. To operate the VMs, the server utilized the firmware-based hypervisor platform IBM PowerVM. In this context, the VMs are also referred to as logical partitions (LPARs). The server was equipped with 4096 GB RAM. We assigned 256 GB RAM to the LPAR running the HANA database. Furthermore, we utilized different CPU assignments and varied between two CPU cores and four CPU cores. Using the IBM tool ppc64_cpu, we modified the configuration of the SLES operating system and varied the SMT configuration. We conducted benchmark runs with SMT turned off, SMT-2, SMT-4 and SMT-8. The different SMT configurations denote a hardware-based multithreading feature of the Power8 CPUs. In this context, the digit of the identifier equals the number of threads per core, i.e. SMT-4 equals four threads per core. In order to minimize the performance impact of our benchmark setup on the database, we utilized another LPAR on the same server for our benchmark driver. In this setup, both LPARs are connected via a virtual switch, which allowed us to exclude any network-related performance impact, since this aspect is not within the scope of this paper. In order to perform the benchmarks, we utilized customized shell scripts as the benchmark driver.</p>
    </sec>
    <sec id="sec-5">
      <title>2.2 Experimental Design</title>
      <p>
        In previous work, we showed that the performance of database tenants is strongly dependent on thread-related parameters [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. These parameters have a major impact on the performance behavior. Thus, we created our experimental design with the objective of extensively collecting thread-related metrics influencing the CPU-related resource demand.
      </p>
      <p>2.2.1 Monitoring setup. Our monitoring setup aims to collect fine-grained information about the thread usage of the HANA database. We utilized the virtual file system /proc to collect performance data regarding relevant database processes and their thread utilization. The database processes an OLAP query via the so-called indexserver process, which is part of every database tenant. In a first step, the indexserver process invokes an SQL executor thread, which performs query optimizations, prepares the execution plan and identifies the query as an OLAP query. After the query has been identified as a complex OLAP query, it is delegated to the job executor thread. For parallel processing, the job executor assigns the query to multiple idle threads from a predefined thread pool. These threads are utilized as jobworker threads. Based on these processing steps, we decided to focus on the three layers described in Table 1 when processing the raw data from our script-based monitoring solution. Monitoring l1 allowed us to achieve fine-grained insights into the thread usage while processing OLAP queries. The jobworker threads consume most of the system resources when processing an OLAP query. Thus, we have put a clear focus on analyzing the resource usage of these jobworker threads. However, in order to consider the resource consumption of other threads, we decided to analyze the monitoring data from l2 and l3 in addition.</p>
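      <p>The thread-level sampling from /proc described above can be sketched as follows. This is a minimal Python sketch, not our actual shell-script solution; the name fragment "JobWrk" used to identify jobworker threads is an assumption, as the real HANA thread names may differ.</p>

```python
from pathlib import Path

def parse_stat(stat_line):
    """Parse a /proc/<pid>/task/<tid>/stat line into (name, state, cpu).

    The comm field is enclosed in parentheses and may itself contain
    spaces, so we split on the last closing parenthesis.
    """
    head, tail = stat_line.rsplit(")", 1)
    comm = head.split("(", 1)[1]
    fields = tail.split()                    # fields[0] is the state (R/S/D/...)
    return comm, fields[0], int(fields[36])  # field 39 of stat: last-used CPU

def sample_jobworkers(pid, name_filter="JobWrk"):
    """Take one monitoring sample: state and CPU of all matching threads."""
    sample = {}
    for stat in Path(f"/proc/{pid}/task").glob("*/stat"):
        comm, state, cpu = parse_stat(stat.read_text())
        if name_filter in comm:
            sample[int(stat.parent.name)] = (state, cpu)
    return sample
```

      <p>Repeatedly calling such a sampler at short intervals yields the raw time series from which the per-thread metrics below are derived.</p>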
      <p>After conducting our benchmarks, we noticed an identical memory usage pattern in the database's various tenants. Furthermore, an in-memory database is generally designed to keep its operational data set in memory. Thus, we decided to exclude memory usage from our work and put a focus on CPU utilization in our resource demand analysis. We decided to extract the following performance-relevant metrics from our raw data:
query: Each TPC-H query reflects individual performance-relevant aspects of the database processing OLAP workload due to differences in the execution plan. Thus, each query has an individual performance behavior and needs to be monitored separately.</p>
      <p>response time: We utilized this metric as the main indicator for the performance behavior of an individual query. We defined it as the time from when the query was sent to the database until a result was returned.</p>
      <p>processing time: In addition to the response time, we also
considered the actual processing time of a query.</p>
      <p>active jobworker threads: We noticed that the jobworker threads were most relevant for processing an OLAP query. Thus, we counted the number of jobworker threads involved in processing a query.</p>
      <p>rs ratio: To get further insights about the CPU utilization of the
jobworker threads, we also analyzed the ratio between the total
count of jobworker threads in the status running and those in the
status sleeping.</p>
      <p>rs jumps: In our analysis, we also counted the number of times
the jobworker threads changed their status from running to sleeping.
This is important, since it allows us to draw conclusions about the
thread-utilization of a query.</p>
      <p>jw cpu: Furthermore, we considered the total CPU time consumed
by the jobworker threads.</p>
      <p>total cpu: We included this metric to compare the CPU time consumed by the jobworker threads with the CPU time consumed by the whole system, to ensure that no major interference by another thread has occurred.</p>
      <p>cpu jumps: Threads get assigned to different CPU cores by the operating system. However, the utilization of different cores by one thread increases the processing time of the thread. Thus, we counted the number of times the OS assigned a jobworker thread to a different CPU core.</p>
      <p>context switches: This metric is only available for the whole system. It describes the switching of the CPU from one thread to another. The system has to perform multiple time-consuming steps when performing a context switch. Thus, this metric indicates a negative impact on the performance.</p>
      <p>2.2.2 Benchmark design. We designed our experiments in order to get fine-grained information about the thread usage of the database tenants in the virtualized environment of a cloud provider. The parameters of the experiment are listed in Table 2. In our first benchmark run, we aimed at getting fine-grained information about the performance behavior of the individual TPC-H queries. We assigned two CPU cores to our database LPAR for the first benchmark run and executed all 22 queries consecutively on a single tenant. Since we wanted to exclude caching effects from our results, we executed the queries as regular, non-prepared statements. To avoid any interference between the queries, we configured a waiting time of 10 seconds after each query execution. Furthermore, we were interested in monitoring data with a very high resolution in order to capture all relevant processing phases. Thus, we configured our monitoring setup to collect data at intervals of 0.0079 seconds on average. Lower monitoring resolutions would result in a lack of accuracy and would not allow us to identify the exact resource demand of each query. This aspect is especially relevant for queries with only a short runtime. In later benchmark runs, we decided to decrease the monitoring resolution to 0.179 seconds to reduce the size of our raw data sets. In scenarios with higher loads the query runtime increased, which allowed us to decrease the monitoring resolution and still collect enough data for a fine-grained analysis.</p>
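      <p>To illustrate how the thread-related metrics listed above can be derived from sampled /proc data, the following sketch computes the rs ratio, rs jumps and cpu jumps from a time series of per-thread observations. The function name and sample layout are our own illustration, not part of our monitoring scripts.</p>

```python
def thread_metrics(samples):
    """Derive rs ratio, rs jumps and cpu jumps from consecutive samples.

    Each sample maps a thread id to a (state, cpu) tuple, where state is
    the /proc status letter ('R' running, 'S' sleeping) and cpu is the
    core the thread last ran on.
    """
    running = sleeping = rs_jumps = cpu_jumps = 0
    prev = {}
    for sample in samples:
        for tid, (state, cpu) in sample.items():
            running += state == "R"
            sleeping += state == "S"
            if tid in prev:
                prev_state, prev_cpu = prev[tid]
                rs_jumps += prev_state == "R" and state == "S"  # running -> sleeping
                cpu_jumps += cpu != prev_cpu                    # moved to another core
        prev = sample
    return {
        "rs_ratio": running / sleeping if sleeping else float("inf"),
        "rs_jumps": rs_jumps,
        "cpu_jumps": cpu_jumps,
    }
```

      <p>Aggregating these counters per query execution window yields the per-query values analyzed in Section 3.</p>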
      <p>
        In order to show the influence of workload characteristics on the performance behavior, we decided to compare benchmark runs in single-tenant scenarios with those in multi-tenant scenarios. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], we showed that performance differences between single- and multi-tenancy scenarios exist. In addition, we showed that the largest performance differences occur in scenarios with high load. However, the performance impact was much smaller when we additionally increased the number of active tenants in our multi-tenant setup. Thus, in this paper we decided to compare only the performance of scenarios with five active tenants to the performance with a single active tenant. This helped us to limit the number of long-running benchmark runs. We varied the number of users in the performed benchmark runs in both scenarios. For a low-load scenario we utilized 5 concurrent users, for a medium-load scenario 20 concurrent users and for a high-load scenario 50 concurrent users. For further performance insights, we decided to vary parameter x1 and thus utilized two different user definitions. With the first definition, we did not conduct the TPC-H benchmark as intended: we ran each TPC-H query in isolation and defined the user as the number of concurrently active executions of the same query. In the second definition, we defined the TPC-H user as intended. Thus, each user represented a different set of queries in a specified sequence, which was unique for each user. Summarizing, we varied parameters x1, x2 and x3 in this second set of benchmark runs.
      </p>
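      <p>A driver loop for the second user definition can be sketched as follows. The <monospace>execute</monospace> callable stands in for the actual SQL client used by our shell-script driver; the 10-second pause between executions matches the waiting time described above.</p>

```python
import time

def run_user(queries, execute, pause=10.0):
    """Run one user's query sequence consecutively, recording the
    response time of each query and pausing between executions to
    avoid interference between queries."""
    response_times = {}
    for name, sql in queries:
        start = time.perf_counter()
        execute(sql)  # placeholder for the SQL client call
        response_times[name] = time.perf_counter() - start
        time.sleep(pause)
    return response_times
```

      <p>Under the first user definition, the same loop would instead be invoked with a list repeating a single query.</p>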
      <p>SAP HANA is a highly configurable software system. Hence, it offers various parameters to optimize the performance of the database. The parameter max_concurrency limits the maximum number of jobworker threads a database tenant can utilize. SAP recommends setting the parameter to a value equal to the number of available CPU cores divided by the number of tenant databases. In this third set of benchmark runs, we varied parameters x2, x3 and x4. We set parameter x4 either to three, which limited the number of jobworker threads, or to no value, which did not set any limit. Furthermore, we set parameter x1 to the user definition according to the TPC-H benchmark for this run and all following runs.</p>
      <p>In the fourth set of benchmark runs, we aimed to analyze the performance behavior in a virtualized cloud environment. In such a setup, administrators can dynamically assign more CPU resources to a VM. For this reason, we increased the CPU assignment from two to four CPUs during this benchmark run. It is also possible to migrate the VM to a server with a different CPU, which may, for example, offer different capabilities regarding SMT. Thus, we considered the performance impact of simultaneous multithreading in this paper. Summarizing, we decided to vary the parameters x2, x3, x5 and x6. Parameter x4 has not been set during these runs to avoid performance restrictions.</p>
    </sec>
    <sec id="sec-6">
      <title>3 RESULTS</title>
    </sec>
    <sec id="sec-7">
      <title>3.1 Analyzing the resource demand of the individual TPC-H queries</title>
      <p>In this section, we analyze the thread-related resource demand of the individual TPC-H queries utilizing our collected monitoring data. Figure 1 shows the thread-related resource demand through the previously described performance metrics in a single-tenant environment. We analyzed each query individually, formed groups and pointed out any anomalies.</p>
      <p>grp1 (Query 1, 18): Both queries stand out due to their high response time. Furthermore, both queries are rather CPU-intensive. The status of the utilized jobworker threads is comparatively rarely set from running to sleeping, which enhances efficiency. These threads are also rarely assigned to a different CPU core. The number of context switches performed during the execution of these queries is also comparatively low. However, query 18 utilizes a much higher number of jobworker threads, indicating a better parallelizability.</p>
      <p>grp2 (Query 9, 13, 21): The resource utilization of these queries is similar to the previous set of queries. In this case, all queries utilize a high number of jobworker threads. In addition, the processing phase of these threads is only rarely interrupted by sleeping phases. The number of context switches is higher than in the previous set.</p>
      <p>grp3 (Query 15, 16, 22): The major difference to the resource demand of the previous query set is the low response times. The number of sleeping phases and the number of context switches rise to a value slightly below average.</p>
      <p>grp4 (Query 6, 13): In contrast to the previous set, these queries
show a much lower utilization of jobworker threads. The number
of sleeping phases also increases.</p>
      <p>grp5 (Query 4, 14, 17, 19, 20): Compared to the previous set, these queries show only an average CPU utilization. All queries utilize only a low number of jobworker threads. Except for queries 4 and 14, the processing phases are often interrupted by sleeping phases. This also results in a higher number of different assignments to the CPU cores.</p>
      <p>grp6 (Query 5, 10, 12): The resource demand of these queries is very similar to the previous query set. However, they utilize a higher number of jobworker threads and are less often assigned to different CPU cores.</p>
      <p>grp7 (Query 2, 3, 7, 8): These queries show the lowest utilization of the CPU. In this case, this also results in a high number of context switches. Query 2 shows a very high variance regarding the assignment of CPU cores, the number of sleeping phases and the CPU time.</p>
      <p>[Figure 1: Thread-related resource demand per TPC-H query class (1–21): (a) Response Time, (b) Utilized CPU Time, (c) Active Threads, (d) Sleeping Phases, along with the normalized numbers of CPU assignments, threads set to sleep and context switches.]</p>
      <p>In most cases, the response time is very close to the actual processing time in this scenario with only one active user. However, queries 2, 3, 12 and 21 show differences between these times. In conclusion, our fine-grained monitoring solution allows cloud providers to examine the resource demand of the specific workload in detail.</p>
    </sec>
    <sec id="sec-8">
      <title>3.2 Performance behavior in different workload scenarios</title>
      <p>In this section, we analyze the thread-related resource demand of the individual queries when changing the workload scenarios. Figure 2a shows the effects of workload changes on the query performance when a user is running all queries in a predefined sequence compared to the repeated execution of only a single query. It is noticeable that only CPU-intensive queries (i.e. grp1 and 2) can benefit from the new workload scenario. However, a low CPU utilization does not necessarily result in a large loss of performance, as for example query 3 shows. The number of sleeping phases also affects the performance. In most cases, the effect is stronger in high-load scenarios with 50 active users. In multi-tenant environments, the effect is also stronger than in single-tenant environments. For CPU-intensive queries, changing the user definition results in a decreased probability of the CPU being blocked by another CPU-intensive query. Thus, these queries benefit from the workload change. However, for less CPU-intensive queries (i.e. grp7) the probability of the CPU being blocked by a more CPU-intensive query increases. In conclusion, these benchmark results show the importance of performance predictions in the cloud context. Changes in the workload can occur, for example, when the usage profile of one tenant changes. Even such simple changes can result in major performance losses depending on the specific type of workload.</p>
    </sec>
    <sec id="sec-9">
      <title>3.3 Performance behavior in different runtime configurations</title>
      <p>In this section, we describe the performance influence of runtime
environment factors. In a first experiment, we limited the number
of jobworker threads the database can utilize and compared the
results to the setup with unlimited jobworker threads.</p>
      <p>Figure 2b shows mixed results in the environment with only one active tenant. In general, queries with an increased number of sleeping phases (i.e. grp5) seem to benefit from the parameter, especially in scenarios with only a low load. With the static parameter max_concurrency set, a single database tenant can no longer fully utilize the CPU resources in many workload scenarios. This results in decreased performance for most OLAP queries. In multi-tenant environments, Figure 2c shows a much clearer picture of the performance behavior with the limiting parameter enabled. In scenarios with low load, queries with a higher number of sleeping phases clearly benefit from the parameter change. However, CPU-intensive queries with fewer sleeping phases (i.e. grp1 and 2) show a performance loss. Under a less intense workload, the probability of the CPU resources being blocked by a CPU-intensive query decreases. Thus, the duration of the sleeping phases can be decreased. In scenarios with higher loads, the effect does not persist, as there are only slight differences in the performance behavior in these cases. Additionally, we noticed a significantly lower difference between the response time and the processing time during the benchmark runs with a limited number of jobworker threads. This can be explained by an increased resource availability for other relevant database threads. In order to analyze the performance improvement through the assignment of more CPU resources, we changed the CPU assignment of the LPAR from two to four cores for the next benchmark runs. Figure 2d shows the performance improvement through the additional CPU resources. In general, queries with more sleeping phases profit especially from the additional resources in the scenario with only five active users. In these scenarios, the huge difference between the performance improvements of the individual queries is noticeable. The effect is much less intense in high-load scenarios. The performance improvement is slightly higher in multi-tenant scenarios. This performance behavior can be explained by a decreasing duration of the sleeping phases. Threads in the sleeping status can be assigned to a processing unit faster due to the increased CPU resources.</p>
      <p>[Figure 2: Average performance change (in percent) per TPC-H query for 5, 20 and 50 users: (a) impact of workload changes in a single-tenant environment; (b) limited number of jobworker threads in a single-tenant environment; (c) limited number of jobworker threads in a multi-tenant environment; (d) assignment of more CPU resources in a single-tenant environment; (e), (f) performance change of SMT-2, SMT-4 and SMT-8 compared to SMT off.]</p>
      <p>To further analyze the performance improvement through hardware multithreading, we changed the setup of our database LPAR to utilize no SMT at all. Afterwards, we set the LPAR to utilize SMT-2, SMT-4 and SMT-8. Figure 2e and Figure 2f show the resulting performance improvements of the different SMT settings compared to the benchmark run with SMT disabled. It is noticeable that queries with a high CPU demand combined with a high count of sleeping phases and a relatively low number of active jobworker threads (i.e. grp5) usually benefit from SMT. This effect is very intense in low-load scenarios, as Figure 2e shows. In multi-tenant scenarios, this effect increases further. Queries with a low CPU demand and a high number of active jobworker threads (i.e. grp6 and 7) show almost no benefit with SMT-2 enabled. They also show a lower performance with SMT-8 compared to SMT-4. This is the result of these queries not being able to benefit from the additional SMT capabilities. Increasing complexity regarding the access of the CPU cache results in a slight performance decrease in such cases. Figure 2f shows the performance improvements in high-load scenarios. In general, the differences between the individual queries regarding their performance improvement through SMT are much lower than in low-load scenarios. Additionally, all queries benefit from SMT-4 and SMT-8 under high workload. This performance behavior can be explained by the higher resource demand related to this workload scenario.</p>
      <p>In conclusion, these results show the dependency of the
performance on multiple workload-related aspects. Depending on the
workload, identical changes in the database configuration can
either improve the performance or result in performance losses. Our
monitoring solution allows cloud providers to analyze their
workload more closely. This analysis is crucial for operating the
databases in an efficient and cost-effective manner. Detailed knowledge about
the resource usage of the specific workload allows cloud providers
to deploy database tenants efficiently. Furthermore, valuable
resources can be assigned dynamically where they are needed.
Assigning too many or too few resources is disadvantageous, since either
performance goals are not met or unneeded resources are allocated.
With detailed knowledge about the workload, cloud providers can
avoid both situations. Changing CPU-related resources, e.g., by
migrating the database to a more powerful server or by assigning
more CPU resources in a virtualized environment, also results in
differing degrees of success depending on the specific workload
scenario.</p>
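To make the kind of thread-level data such a monitoring setup relies on more concrete, the following is a minimal sketch (our own illustration, not the paper's actual tooling) that samples the per-thread CPU times of a database process on Linux from the /proc filesystem:

```python
import os

# Clock ticks per second, needed to convert /proc tick counters to seconds.
CLK_TCK = os.sysconf("SC_CLK_TCK")

def thread_cpu_times(pid):
    """Return {tid: (user_seconds, system_seconds)} for all threads of a process."""
    times = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except FileNotFoundError:
            continue  # thread exited between listdir() and open()
        # The comm field (2) may contain spaces, so split after the closing ')'.
        fields = stat.rsplit(")", 1)[1].split()
        # utime and stime are stat fields 14 and 15, i.e. indices 11 and 12
        # of the remainder after state (field 3).
        utime, stime = int(fields[11]), int(fields[12])
        times[int(tid)] = (utime / CLK_TCK, stime / CLK_TCK)
    return times
```

Sampling such counters at a high frequency and attributing them to the individual jobworker threads is the kind of fine-grained measurement the analysis above depends on.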
    </sec>
    <sec id="sec-10">
      <title>4 RELATED WORK</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], the author provides performance insights into the in-memory
database SAP HANA in a multi-tenant configuration. However, he
considers the database and the applied workload only as a black box
and gives no further insights into performance-relevant factors.
Furthermore, he does not consider the efficiency of thread usage
in his work. His experiments use only small-sized tenants,
which is unlikely in a real-world scenario.
      </p>
      <p>
        The authors in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] provide more fine-grained performance
insights into SAP HANA in a multi-tenant configuration, considering
among other factors differently sized tenant databases,
varying workloads and different CPU assignments. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] they extend
their work by providing new models for the prediction of memory
occupancy. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], they further extend their work and provide
insights into the usage of threads. However, they only measure the
average number of utilized CPU cores to derive the thread usage
of the queries. Furthermore, they utilize a lower monitoring
resolution, resulting in a lower accuracy when considering the
resource demand of the individual queries. In this paper, we have
shown the limitations of this approach by performing benchmarks
in various scenarios. Considering only the utilization of CPU cores
is not sufficient to explain the performance behavior of the TPC-H
queries in our scenarios. Thus, we extended their work by providing
more fine-grained insights into thread usage in varying hardware
environments.
      </p>
    </sec>
    <sec id="sec-11">
      <title>5 CONCLUSION AND FUTURE WORK</title>
      <p>In our work, we provided fine-grained performance insights into the
in-memory database SAP HANA. We built a monitoring setup
that allows a detailed analysis of the thread utilization
of the database. Our setup is also capable of collecting data at
a very high resolution, preventing any losses through inaccurate
monitoring data. We have shown the dependency of the performance
behavior on several metrics in multiple scenarios. Furthermore,
our monitoring setup allowed us to group the TPC-H queries
according to their resource demand. The fine-grained analysis of
the resource demand of different queries allowed us to explain
anomalies observed in their performance behavior in different
workload scenarios as well as in different runtime environments.</p>
      <p>In future work, we plan to create a fine-grained performance
prediction model allowing us to simulate the performance behavior
in different scenarios.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Armbrust</surname>
          </string-name>
          , Armando Fox, Rean Griffith,
          <string-name>
            <given-names>Anthony D.</given-names>
            <surname>Joseph</surname>
          </string-name>
          , Randy Katz, Andy Konwinski,
          <string-name>
            <given-names>Gunho</given-names>
            <surname>Lee</surname>
          </string-name>
          , David Patterson,
          <string-name>
            <given-names>Ariel</given-names>
            <surname>Rabkin</surname>
          </string-name>
          , Ion Stoica, and
          <string-name>
            <given-names>Matei</given-names>
            <surname>Zaharia</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>A View of Cloud Computing</article-title>
          .
          <source>Commun. ACM</source>
          <volume>53</volume>
          ,
          <issue>4</issue>
          (April
          <year>2010</year>
          ),
          <fpage>50</fpage>
          -
          <lpage>58</lpage>
          . https://doi.org/10.1145/1721654.1721672
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Brunnert</surname>
          </string-name>
          , Christian Vögele, Alexandru Danciu, Matthias Pfaff, Manuel Mayer, and
          <string-name>
            <given-names>Helmut</given-names>
            <surname>Krcmar</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Performance Management Work</article-title>
          .
          <source>Business &amp; Information Systems Engineering</source>
          <volume>6</volume>
          ,
          <issue>3</issue>
          (
          <issue>01</issue>
          <year>Jun 2014</year>
          ),
          <fpage>177</fpage>
          -
          <lpage>179</lpage>
          . https://doi.org/10.1007/s12599-014-0323-7
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Dehne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rau-Chaplin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaboli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Scalable realtime OLAP on cloud architectures</article-title>
          .
          <source>J. Parallel and Distrib. Comput</source>
          .
          <volume>79-80</volume>
          (
          <year>2015</year>
          ),
          <fpage>31</fpage>
          -
          <lpage>41</lpage>
          . https://doi.org/10.1016/j.jpdc.2014.08.006 Special Issue on Scalable Systems for Big Data Management and Analytics.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Grund</surname>
          </string-name>
          , Jan Schaffner, Jens Krueger, Jan Brunnert, and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Zeier</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>The Efects of Virtualization on Main Memory Systems</article-title>
          .
          <source>In Proceedings of the Sixth International Workshop on Data Management on New Hardware (DaMoN '10)</source>
          . ACM, New York, NY, USA,
          <fpage>41</fpage>
          -
          <lpage>46</lpage>
          . https://doi.org/10.1145/1869389.1869395
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Barroso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Eggers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gharachorloo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Levy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Parekh</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>An analysis of database workload performance on simultaneous multithreaded processors</article-title>
          .
          <source>In Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)</source>
          .
          <fpage>39</fpage>
          -
          <lpage>50</lpage>
          . https://doi.org/10.1109/ISCA.1998.694761
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Karsten</given-names>
            <surname>Molka</surname>
          </string-name>
          and
          <string-name>
            <given-names>Giuliano</given-names>
            <surname>Casale</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Experiments or simulation? A characterization of evaluation methods for in-memory databases</article-title>
          .
          <source>In 11th International Conference on Network and Service Management (CNSM 2015)</source>
          . IEEE,
          <fpage>201</fpage>
          -
          <lpage>209</lpage>
          . https://doi.org/10.1109/CNSM.2015.7367360
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Karsten</given-names>
            <surname>Molka</surname>
          </string-name>
          and
          <string-name>
            <given-names>Giuliano</given-names>
            <surname>Casale</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Contention-Aware Workload Placement for In-Memory Databases in Cloud Environments</article-title>
          .
          <source>ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS)</source>
          <volume>2</volume>
          ,
          <issue>1</issue>
          , Article 1 (Sept. 2016), 29 pages. https://doi.org/10.1145/2961888
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Molka</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Casale</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Eficient Memory Occupancy Models for In-memory Databases</article-title>
          .
          <source>In 2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS)</source>
          .
          <fpage>430</fpage>
          -
          <lpage>432</lpage>
          . https://doi.org/10.1109/MASCOTS.2016.56
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Molka</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Casale</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Energy-eficient resource allocation and provisioning for in-memory database clusters</article-title>
          .
          <source>In 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM)</source>
          .
          <fpage>19</fpage>
          -
          <lpage>27</lpage>
          . https://doi.org/10.23919/INM.2017.7987260
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Mühlbauer</surname>
          </string-name>
          , Wolf Rödiger, Andreas Kipf, Alfons Kemper, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Neumann</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>High-Performance Main-Memory Database Systems and Modern Virtualization: Friends or Foes?</article-title>
          .
          <source>In Proceedings of the Fourth Workshop on Data Analytics in the Cloud (DanaC'15)</source>
          . ACM, New York, NY, USA, Article 4, 4 pages. https://doi.org/10.1145/2799562.2799643
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Paluch</surname>
          </string-name>
          , Harald Kienegger, and
          <string-name>
            <given-names>Helmut</given-names>
            <surname>Krcmar</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A Workload-Dependent Performance Analysis of an In-Memory Database in a Multi-Tenant Configuration</article-title>
          .
          <source>In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering (ICPE '18)</source>
          . ACM, New York, NY, USA,
          <fpage>131</fpage>
          -
          <lpage>134</lpage>
          . https://doi.org/10.1145/3185768.3186290
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Schaffner</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Multi Tenancy for Cloud-Based In-Memory Column Databases: Workload Management and Data Placement</article-title>
          . Springer International Publishing, Heidelberg. https://doi.org/10.1007/978-3-319-00497-6_1
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          Transaction Processing Performance Council
          .
          <year>2018</year>
          .
          <article-title>TPC-H benchmark specification</article-title>
          . http://www.tpc.org/tpch/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>