Monitoring Thread-Related Resource Demands of a Multi-Tenant In-Memory Database in a Cloud Environment

Dominik Paluch, Johannes Rank, Harald Kienegger, Helmut Krcmar
Chair for Information Systems, Technical University of Munich, Garching, Germany
dominik.paluch@in.tum.de, johannes.rank@in.tum.de, harald.kienegger@in.tum.de, krcmar@in.tum.de

ABSTRACT
Estimating the resource demand of a highly configurable software system like an in-memory database is a difficult task. Many factors such as the workload, flexible resource allocation, multi-tenancy and various configuration settings influence the actual performance behavior of such systems. Cloud providers offering Database-as-a-Service applications need to monitor and account for these factors in order to utilize their systems in an efficient and cost-effective manner. However, only observing the CPU utilization of the database's processes, as done by traditional performance approaches, is not sufficient to accomplish this task. This is especially relevant for environments with multiple active database tenants, which add another level of complexity to the thread handling on multiple layers such as the database management system or the operating system. In this paper, we propose a fine-grained monitoring setup that allows us to analyze the performance of virtualized multi-tenant databases. Our focus is on extensively collecting and analyzing performance data on the thread level. We utilize this setup to show the performance influence of varying database configuration settings, different workload characteristics, multi-tenancy and virtualization features. To this end, we conducted several experiments by applying the TPC-H benchmark, generating OLAP workload on a SAP HANA database in a virtualized environment. In our experiments, we show a strong dependency between the specific type of workload and the performance. Furthermore, we analyze the workload-dependent performance improvements and the performance degradation when changing the runtime configuration.

CCS CONCEPTS
• General and reference → Measurement; Performance; • Information systems → Main memory engines; Database performance evaluation;

KEYWORDS
In-memory Database; Performance Analysis; Cloud Computing; Multi-Tenancy; SAP HANA

31st GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 11.06.2019-14.06.2019, Saarburg, Germany. Copyright is held by the author/owner(s).

1 INTRODUCTION
Cloud-based services are becoming more and more attractive to enterprises by offering technical and financial advantages over in-house data centers. Furthermore, the demand for database services in the cloud, also referred to as Database-as-a-Service (DaaS), has been increasing in recent years. The growing density of memory chips combined with decreasing prices permits the operation of databases holding an enterprise application's complete operational data set in memory. These so-called in-memory databases have advantages over traditional disk-based databases regarding performance when performing big data analytics or processing analytical data in general. For this reason, they are optimally suited for online analytical processing (OLAP) workload. In these cases, parallel computing methods are used to optimize the performance of OLAP workload, resulting in the generation of multiple parallel threads [3].

Utilizing virtualization and multi-tenancy features allows cloud providers to decrease their operational costs, such as hardware, energy and software licensing costs. Furthermore, these features help them to reduce administration efforts in order to provide their services in a cost-efficient and scalable fashion [1]. However, it is crucial to understand the impact of different influence factors and setups on the performance behavior of in-memory databases when utilizing these features [2].
Recent work has shown that the efficient usage of threads is an important performance aspect for all in-memory databases that process OLAP workload [7, 11]. Current research either puts a focus on the operation of a multi-tenant database system itself [9, 12] or takes a more generic view on the underlying virtualization layer [4, 10]. However, cloud providers often utilize both concepts. They use virtualization in order to increase the efficiency and improve the flexibility of their hardware usage, and multi-tenancy features on the database layer to further reduce the maintenance effort and the overhead of multiple virtual machines (VMs). Thus, it is expedient to consider not only the performance of a multi-tenant database or of the virtualization layer, but both concepts together.

Furthermore, cloud providers often rely only on high-level monitoring, which does not allow them to specifically monitor the utilization of threads within a process. In our work, we observed a fully utilized CPU even in scenarios with less intense workload. Thus, monitoring only the CPU utilization is not enough to draw conclusions about the current workload intensity. In order to identify critical workload scenarios, a more fine-grained monitoring is necessary to capture all performance-relevant aspects.

This article is an extension of [11] and addresses the challenge of examining the performance behavior of the in-memory database SAP HANA in the context of a cloud provider's virtualized environment by considering various configuration changes. These include changes regarding the database system, the virtualization layer, the workload parametrization and the multi-tenancy features. Furthermore, a focus is set on thread efficiency, taking into account the high parallelization when processing OLAP workload. Thus, the following aspects were considered when designing our setup:

Workload Characteristics: Regarding the workload, we consider the workload intensity and the definition of a user in the context of our benchmark setup. In this paper, we extend previous work by varying the type and the intensity of the workload.

Multi-Tenancy: We measure the threading characteristics of single statements to get additional insights into threading efficiency. We also set various limits on the number of concurrently active threads in the HANA database to show the performance impact on the response time of the individual statements. In addition, we vary the number of concurrently active tenant databases to consider multi-tenant as well as single-tenant scenarios.

Virtualization Aspects: When considering threading, it is important to include aspects of the virtualization layer. We consider the dynamic assignment of processing resources to a VM running a multi-tenant in-memory database. Since simultaneous multithreading (SMT) technology has an influence on the processing time of threads [5], we also perform benchmarks to quantify this impact.

2 METHODOLOGY

2.1 Hardware and Software Setup
We used the established OLAP benchmark suite TPC-H on a SAP HANA database for our experiments [13]. To conduct the benchmarks, we used HANA version 2.0 SP 2 in a multi-tenant configuration. In total, five tenants were configured. We filled the created database tenants with data sets generated with a scale factor of 30 as proposed by the benchmark guidelines. This resulted in tenant sizes of 30 GB each. To avoid unwanted performance interferences between the tenants, we created individual data sets for each tenant.

We chose SUSE Linux Enterprise Server (SLES) 12 SP2 as the underlying operating system. Our experiments were conducted on two VMs on an IBM Power E870 server with four CPU sockets populated with Power8 CPUs. In total, the server offered 40 physical CPU cores operating at a clock frequency of 4.19 GHz. To operate the VMs, the server utilized the firmware-based hypervisor platform IBM PowerVM. In this context, the VMs are also referred to as logical partitions (LPARs). The server was equipped with 4096 GB RAM, of which we assigned 256 GB to the LPAR running the HANA database. Furthermore, we utilized different CPU assignments and varied between two and four CPU cores. Using the IBM tool ppc64_cpu, we modified the configuration of the SLES operating system and varied the SMT configuration. We conducted benchmark runs with SMT turned off, SMT-2, SMT-4 and SMT-8. The different SMT configurations denote a hardware-based multithreading feature of the Power8 CPUs. In this context, the digit of the identifier equals the number of threads per core, i.e. SMT-4 equals four threads per core.
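The SMT mode was switched on the operating-system level between benchmark runs. Since our automation consisted of shell scripts that are not reproduced here, the following is only a minimal Python sketch of this step; it assumes the standard ppc64_cpu command-line interface on SLES, root privileges, and a hypothetical run_benchmark hook into the actual driver.

```python
import subprocess

# SMT modes covered in our runs: no SMT and 2, 4 or 8 hardware threads per core.
SMT_MODES = ["off", "2", "4", "8"]

def set_smt(mode: str) -> None:
    """Switch the SMT mode of the LPAR via the IBM ppc64_cpu tool (requires root)."""
    subprocess.run(["ppc64_cpu", f"--smt={mode}"], check=True)

def current_smt() -> str:
    """Query the currently active SMT mode for logging purposes."""
    out = subprocess.run(["ppc64_cpu", "--smt"], capture_output=True, text=True, check=True)
    return out.stdout.strip()

if __name__ == "__main__":
    for mode in SMT_MODES:
        set_smt(mode)
        print("SMT setting:", current_smt())
        # run_benchmark(mode)  # hypothetical hook into the shell-script benchmark driver
```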
In order to minimize the performance impact of our benchmark setup on the database, we utilized another LPAR on the same server for our benchmark driver. In this setup, both LPARs are connected via a virtual switch, which allowed us to exclude any network-related performance impact, since this aspect is not within the scope of this paper. To perform the benchmarks, we utilized customized shell scripts as the benchmark driver.

2.2 Experimental Design
In previous work, we showed that the performance of database tenants strongly depends on thread-related parameters [11]. These parameters have a major impact on the performance behavior. Thus, we created our experimental design with the objective of extensively collecting thread-related metrics that influence the CPU-related resource demand.

2.2.1 Monitoring setup. Our monitoring setup aims to collect fine-grained information about the thread usage of the HANA database. We utilized the virtual file system /proc to collect performance data regarding the relevant database processes and their thread utilization. The database processes an OLAP query via the so-called indexserver process, which is part of every database tenant. In a first step, the indexserver process invokes an SQL executor thread, which performs query optimizations, prepares the execution plan and identifies the query as an OLAP query. After the query has been identified as a complex OLAP query, it is delegated to the job executor thread. For parallel processing, the job executor assigns the query to multiple idle threads from a predefined thread pool. These threads are utilized as jobworker threads. Based on these processing steps, we decided to put a focus on the three layers described in Table 1 when processing the raw data from our script-based monitoring solution.

Table 1: Layers of our monitoring solution

Layer  Description
l1     The resource consumption of the jobworker threads
l2     The resource consumption of the indexserver processes of the individual tenant databases
l3     The total resource consumption of the LPAR

Monitoring l1 allowed us to gain fine-grained insights into the thread usage while processing OLAP queries. The jobworker threads consume most of the system resources when processing an OLAP query. Thus, we put a clear focus on analyzing the resource usage of these jobworker threads. However, in order to also consider the resource consumption of other threads, we decided to additionally analyze the monitoring data from l2 and l3.
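Our monitoring solution itself consists of shell scripts that sample /proc. The following Python sketch only illustrates the idea behind layers l1 to l3: per-thread data is read from /proc/<pid>/task/<tid>/stat of a tenant's indexserver process, and the LPAR total from /proc/stat. Field positions follow proc(5); the jobworker thread-name pattern and the sampling interval are assumptions for illustration.

```python
import time
from pathlib import Path

def sample_threads(pid: int):
    """One sample per thread of the given process:
    (tid, thread name, state, utime+stime in clock ticks, last CPU core)."""
    samples = []
    for stat in Path(f"/proc/{pid}/task").glob("*/stat"):
        text = stat.read_text()
        comm = text.split("(", 1)[1].rsplit(")", 1)[0]       # thread name
        fields = text.rsplit(")", 1)[1].split()              # fields after the name
        state = fields[0]                                     # R = running, S = sleeping, ...
        cpu_ticks = int(fields[11]) + int(fields[12])         # utime + stime
        core = int(fields[36])                                # processor last run on
        samples.append((int(stat.parent.name), comm, state, cpu_ticks, core))
    return samples

def total_cpu_ticks() -> int:
    """Layer l3: aggregated CPU time of the LPAR from the first 'cpu' line of /proc/stat."""
    first = Path("/proc/stat").read_text().splitlines()[0].split()
    return sum(int(v) for v in first[1:])

def monitor(indexserver_pid: int, interval: float = 0.008, duration: float = 60.0):
    """Collect raw samples for one tenant's indexserver (l2), restricted to
    jobworker threads for l1. 'JobWrk' is an assumed thread-name pattern."""
    end = time.time() + duration
    raw = []
    while time.time() < end:
        ts = time.time()
        for tid, comm, state, ticks, core in sample_threads(indexserver_pid):
            if "JobWrk" in comm:
                raw.append((ts, tid, state, ticks, core))
        time.sleep(interval)
    return raw
```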
After conducting our benchmarks, we noticed an identical memory usage pattern across the database's various tenants. Furthermore, an in-memory database is generally designed to keep its operational data set in memory. Thus, we decided to exclude memory usage from our work and to focus on CPU utilization in our resource demand analysis. We decided to extract the following performance-relevant metrics from our raw data:

query: Each TPC-H query reflects individual performance-relevant aspects of the database processing OLAP workload due to differences in the execution plan. Thus, each query has an individual performance behavior and needs to be monitored separately.

response time: We utilized this metric as the main indicator for the performance behavior of an individual query. We defined it as the time from which the query was sent to the database until a result has been returned.

processing time: In addition to the response time, we also considered the actual processing time of a query.

active jobworker threads: We noticed that the jobworker threads are most relevant for processing an OLAP query. Thus, we counted the number of jobworker threads that have been involved in processing a query.

rs ratio: To get further insights into the CPU utilization of the jobworker threads, we also analyzed the ratio between the total count of jobworker threads in the status running and those in the status sleeping.

rs jumps: In our analysis, we also counted the number of times the jobworker threads changed their status from running to sleeping. This is important, since it allows us to draw conclusions about the thread utilization of a query.

jw cpu: Furthermore, we considered the total CPU time consumed by the jobworker threads.

total cpu: We included this metric to compare the CPU time consumed by the jobworker threads with the CPU time consumed by the whole system, ensuring that no major interference by another thread has occurred.

cpu jumps: Threads get assigned to different CPU cores by the operating system. However, the utilization of different cores by one thread increases its processing time. Thus, we counted the number of times the OS assigned a jobworker thread to a different CPU core.

context switches: This metric is only available for the whole system. It describes the switching of the CPU from one thread to another. The system has to perform multiple time-consuming steps when performing a context switch. Thus, this metric indicates a negative impact on the performance.
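To illustrate how the thread-related metrics can be obtained, the following sketch derives rs ratio, rs jumps, cpu jumps and jw cpu from consecutive samples in the format produced by the monitoring sketch in Section 2.2.1. The aggregation in our actual shell scripts may differ in detail; the system-wide context switches would additionally be read from the ctxt line of /proc/stat.

```python
from collections import defaultdict

def derive_metrics(raw):
    """raw: list of (timestamp, tid, state, cpu_ticks, core) samples of jobworker
    threads collected while one query was running."""
    by_tid = defaultdict(list)
    for ts, tid, state, ticks, core in sorted(raw):
        by_tid[tid].append((state, ticks, core))

    running = sum(1 for s in raw if s[2] == "R")
    sleeping = sum(1 for s in raw if s[2] == "S")

    rs_jumps = cpu_jumps = jw_cpu = 0
    for samples in by_tid.values():
        for (s0, t0, c0), (s1, t1, c1) in zip(samples, samples[1:]):
            rs_jumps += (s0 == "R" and s1 == "S")   # running -> sleeping transition
            cpu_jumps += (c0 != c1)                  # thread moved to another core
        jw_cpu += samples[-1][1] - samples[0][1]     # CPU ticks consumed by this thread

    return {
        "active jobworker threads": len(by_tid),
        "rs ratio": running / max(sleeping, 1),
        "rs jumps": rs_jumps,
        "cpu jumps": cpu_jumps,
        "jw cpu": jw_cpu,
    }
```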
2.2.2 Benchmark design. We designed our experiments in order to get fine-grained information about the thread usage of the database tenants in the virtualized environment of a cloud provider. The parameters of the experiment are listed in Table 2.

Table 2: Parameters of the experiment

Parameter  Description
x1         Single Query User / Multi Query User
x2         Number of active Users
x3         Number of active Tenants
x4         Thread-Limit On / Off
x5         Number of assigned CPU cores
x6         Number of Threads / Core (SMT)

In our first benchmark run, we aimed at getting fine-grained information about the performance behavior of the individual TPC-H queries. We assigned two CPU cores to our database LPAR for the first benchmark run and executed all 22 queries consecutively on a single tenant. Since we wanted to exclude caching effects from our results, we executed the queries as regular, non-prepared statements. To avoid any interferences between the queries, we configured a waiting time of 10 seconds after each query execution. Furthermore, we were interested in monitoring data with a very high resolution in order to cover all relevant processing phases. Thus, we configured our monitoring setup to collect data at intervals of 0.0079 seconds on average. Lower monitoring resolutions would result in a lack of accuracy and would not allow us to identify the exact resource demand of each query. This aspect is especially relevant for queries with only a short runtime. In later benchmark runs, we decided to decrease the monitoring resolution to 0.179 seconds in order to reduce the size of our raw data sets. In scenarios with higher loads the query runtime increased, which allowed us to decrease the monitoring resolution and still collect enough data for a fine-grained analysis.

In order to show the influence of workload characteristics on the performance behavior, we decided to compare benchmark runs in single-tenant scenarios with those in multi-tenant scenarios. In [11], we showed that performance differences between single- and multi-tenancy scenarios exist. In addition, we showed that the largest performance differences occur in scenarios with high load. However, we also demonstrated that the performance impact is much smaller when we additionally increase the number of active tenants in our multi-tenant setup. Thus, in this paper we decided to compare only the performance of scenarios with five active tenants to the performance with only one active tenant. This helped us to limit the number of long-running benchmark runs. We varied the number of users in the performed benchmark runs in both scenarios. For a low load scenario we utilized 5 concurrent users, for a medium load scenario 20 concurrent users and for a high load scenario 50 concurrent users. For further performance insights, we decided to vary parameter x1 and thus utilized two different user definitions. With the first definition, we did not conduct the TPC-H benchmark as intended: we ran each TPC-H query in isolation and defined the user as the number of concurrently active executions of the same query. In the second definition, we defined the TPC-H user as intended. Thus, each user represented a different set of queries in a specified sequence, which was unique for each user. Summarizing, we varied parameters x1, x2 and x3 in this second set of benchmark runs.
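Our actual driver consisted of customized shell scripts running on the second LPAR. The following Python sketch only illustrates the two user definitions and the load levels of 5, 20 and 50 concurrent users; the run_query helper (e.g. based on SAP's hdbcli client), the placeholder query texts and the per-user random permutation (instead of the official TPC-H ordering tables) are simplifying assumptions.

```python
import random
import threading
import time

QUERIES = {i: f"-- text of TPC-H query {i} --" for i in range(1, 23)}  # placeholders

def run_query(sql: str) -> None:
    """Hypothetical helper: send the statement to a tenant database and fetch the
    complete result, e.g. via SAP's hdbcli Python client."""
    raise NotImplementedError

def make_single_query_user(query_no: int, repetitions: int = 10):
    """User definition 1: a 'user' is one of several concurrent executions of the
    same isolated TPC-H query."""
    def user(user_id: int, results: list) -> None:
        for _ in range(repetitions):
            start = time.time()
            run_query(QUERIES[query_no])
            results.append((query_no, time.time() - start))  # response time per execution
    return user

def stream_user(user_id: int, results: list) -> None:
    """User definition 2 (TPC-H as intended): each user executes all 22 queries in
    its own unique sequence (here simplified to a seeded random permutation)."""
    for q in random.Random(user_id).sample(sorted(QUERIES), k=22):
        start = time.time()
        run_query(QUERIES[q])
        results.append((q, time.time() - start))

def run_scenario(users: int, user_fn=stream_user) -> list:
    """Load levels used in the paper: 5 (low), 20 (medium) or 50 (high) concurrent users."""
    results: list = []
    workers = [threading.Thread(target=user_fn, args=(u, results)) for u in range(users)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```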
SAP HANA is a highly configurable software system. Hence, it offers various parameters to optimize the performance of the database. The parameter max_concurrency limits the maximum number of jobworker threads a database tenant can utilize. SAP recommends setting the parameter to a value equal to the number of available CPU cores divided by the number of tenant databases. In this third set of benchmark runs, we varied parameters x2, x3 and x4. We set parameter x4 either to three, which limited the number of jobworker threads, or to no value, which did not set any limit. Furthermore, we set parameter x1 to the user definition according to the TPC-H benchmark for this run and all following runs.
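For illustration, the following sketch derives the recommended value from the rule of thumb above and emits the corresponding configuration change. The ini-file location of max_concurrency (global.ini, section execution) is our assumption based on the usual HANA configuration layout and should be verified for the concrete HANA revision; the runs in this paper simply used a fixed limit of three jobworker threads.

```python
def recommended_max_concurrency(cpu_cores: int, tenants: int) -> int:
    """SAP's rule of thumb cited above: available CPU cores divided by the number of
    tenant databases (at least one jobworker thread per tenant)."""
    return max(1, cpu_cores // tenants)

def limit_statement(value: int) -> str:
    """Configuration change as we assume it would be applied; verify the ini section
    for the concrete HANA revision before use."""
    return ("ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') "
            f"SET ('execution', 'max_concurrency') = '{value}' WITH RECONFIGURE")

# Example: four assigned cores shared by five tenants yield a recommended limit of 1;
# in our third benchmark set we used a fixed limit of 3 (parameter x4 = 3).
print(limit_statement(recommended_max_concurrency(cpu_cores=4, tenants=5)))
```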
In the fourth set of benchmark runs, we aimed to analyze the performance behavior in a virtualized cloud environment. In such a setup, administrators can assign more CPU resources to a VM dynamically. For this reason, we increased the CPU assignment from two to four CPU cores during this benchmark run. It is also possible to migrate the VM to a server with a different CPU, which, for example, offers different capabilities regarding SMT. Thus, we also considered the performance impact of simultaneous multithreading in this paper. Summarizing, we decided to vary the parameters x2, x3, x5 and x6. Parameter x4 was not set during these runs to avoid performance restrictions.

3 RESULTS

3.1 Analyzing the resource demand of the individual TPC-H queries
In this section, we analyze the thread-related resource demand of the individual TPC-H queries utilizing our collected monitoring data. Figure 1 shows the thread-related resource demand through the previously described performance metrics in a single-tenant environment. We analyzed each query individually, formed groups and pointed out anomalies.

[Figure 1: Resource demand per TPC-H query class. Panels: (a) Response Time (in seconds), (b) Normalized CPU Time Utilized by Jobworkers, (c) Number of Active Jobworkers, (d) Normalized Total of Threads Set to Sleep, (e) Normalized Number of CPU Assignments, (f) Normalized Number of Context Switches.]

grp1 (Query 1, 18): Both queries stand out due to their high response time. Furthermore, both queries are rather CPU intensive. The status of the utilized jobworker threads is comparatively rarely set from running to sleeping, which enhances the efficiency. These threads are also rarely assigned to a different CPU core. The number of context switches performed during the execution of these queries is also comparatively low. However, query 18 utilizes a much higher number of jobworker threads, indicating a better parallelizability.

grp2 (Query 9, 13, 21): The resource utilization of these queries is similar to the previous set of queries. In this case, all queries utilize a high number of jobworker threads. In addition, the processing phase of these threads is only rarely interrupted by sleeping phases. The number of context switches is higher than in the previous set.

grp3 (Query 15, 16, 22): The major difference to the resource demand of the previous query set is the lower response times. The number of sleeping phases and the number of context switches rise to a value slightly below the average.

grp4 (Query 6, 13): In contrast to the previous set, these queries show a much lower utilization of jobworker threads. The number of sleeping phases also increases.

grp5 (Query 4, 14, 17, 19, 20): Compared to the previous set, these queries show only an average CPU utilization. All queries utilize only a low number of jobworker threads. Except for queries 4 and 14, the processing phases are often interrupted by sleeping phases. This also results in a higher number of different assignments to the CPU cores.

grp6 (Query 5, 10, 12): The resource demand of these queries is very similar to the previous query set. However, they utilize a higher number of jobworker threads and are less often assigned to different CPU cores.

grp7 (Query 2, 3, 7, 8): These queries show the lowest utilization of the CPU. In this case, this also results in a high number of context switches. Query 2 shows a very high variance regarding the assignment of CPU cores, the number of sleeping phases and the CPU time.

In most cases, the response time is very close to the actual processing time in this scenario with only one active user. However, queries 2, 3, 12 and 21 show differences between these times. In conclusion, our fine-grained monitoring solution allows cloud providers to examine the resource demand of a specific workload in detail.

3.2 Performance behavior in different workload scenarios
In this section, we analyze the thread-related resource demand of the individual queries when changing the workload scenarios. Figure 2a shows the effects of workload changes on the query performance when a user runs all queries in a predefined sequence compared to the repeated execution of only a single query. It is noticeable that only CPU-intensive queries (i.e. grp1 and 2) can benefit from the new workload scenario. However, a low CPU utilization does not necessarily result in a large loss of performance, as for example query 3 shows. The number of sleeping phases also affects the performance. In most cases, the effect is stronger in high load scenarios with 50 active users. In multi-tenant environments, the effect is also stronger than in single-tenant environments. For CPU-intensive queries, changing the user definition results in a decreased probability of the CPU being blocked by another CPU-intensive query. Thus, these queries benefit from the workload change. However, for less CPU-intensive queries (i.e. grp7), the probability of the CPU being blocked by a more CPU-intensive query increases. In conclusion, these benchmark results show the importance of performance predictions in the cloud context. Changes in the workload can occur, for example, when the usage profile of one tenant changes. However, even such simple changes can result in major performance losses depending on the specific type of workload.

[Figure 2: Resulting performance changes, shown as average performance change in percent per TPC-H query for 5, 20 and 50 users. Panels: (a) Impact of workload changes on the performance in a single tenant environment, (b) Limited number of jobworker threads in a single tenant environment, (c) Limited number of jobworker threads in a multi tenant environment, (d) Assignment of more CPU resources in a single tenant environment, (e) Performance improvement through SMT in a single tenant environment with 5 active users, (f) Performance improvement through SMT in a single tenant environment with 50 active users; panels (e) and (f) are relative to SMT-OFF.]
3.3 Performance behavior in different runtime configurations
In this section, we describe the performance influence of runtime environment factors. In a first experiment, we limited the number of jobworker threads the database can utilize and compared the results to the setup with unlimited jobworker threads.

Figure 2b shows mixed results in the environment with only one active tenant. In general, queries with an increased number of sleeping phases (i.e. grp5) seem to benefit from the parameter, especially in scenarios with only a low load. Due to the static parameter max_concurrency, a single database tenant can no longer fully utilize the CPU resources in many workload scenarios. This results in a decreased performance for most OLAP queries. In multi-tenant environments, Figure 2c shows a much clearer picture of the performance behavior with the limiting parameter enabled. In scenarios with low load, queries with a higher number of sleeping phases clearly benefit from the parameter change. However, CPU-intensive queries with fewer sleeping phases (i.e. grp1 and 2) show a performance loss. Under a less intense workload, the probability of the CPU resources being blocked by a CPU-intensive query decreases. Thus, the duration of the sleeping phases can be decreased. In scenarios with higher loads, the effect does not continue, as there are only slight differences in the performance behavior in these cases. Additionally, we noticed a significantly lower difference between the response time and the processing time during the benchmark runs with a limited number of jobworker threads. This can be explained by an increased resource availability for other relevant database threads.

In order to analyze the performance improvement through the assignment of more CPU resources, we changed the CPU assignment of the LPAR from two to four cores for the next benchmark runs. Figure 2d shows the performance improvement through the additional CPU resources. In general, queries with more sleeping phases profit especially from the additional resources in the scenario with only five active users. In these scenarios, the huge difference between the performance improvements of the individual queries is noticeable. The effect is much less intense in high load scenarios. The performance improvement is slightly higher in multi-tenant scenarios. This performance behavior can be explained by a decreasing duration of the sleeping phases: threads in the sleeping status can be assigned faster to a processing unit due to the increased CPU resources.

To further analyze the performance improvement through hardware multithreading, we changed the setup of our database LPAR to utilize no SMT at all. Afterwards, we set the LPAR to utilize SMT-2, SMT-4 and SMT-8. Figure 2e and Figure 2f show the resulting performance improvements of the different SMT settings compared to the benchmark run with SMT disabled.
It is noticeable that queries with a high CPU demand combined with a high count of sleeping phases and a relatively low number of active jobworker threads (i.e. grp5) usually benefit from SMT. This effect is very intense in low-load scenarios, as Figure 2e shows. In multi-tenant scenarios, this effect increases further. Queries with a low CPU demand and a high number of active jobworker threads (i.e. grp6 and 7) show almost no benefit with SMT-2 enabled. They also show a lower performance with SMT-8 compared to SMT-4. This is the result of these queries not being able to benefit from the additional SMT capabilities, while the increasing complexity regarding access to the CPU cache results in a slight performance decrease in such cases. Figure 2f shows the performance improvements in high load scenarios. In general, the differences between the individual queries regarding their performance improvement through SMT are much lower than in low load scenarios. Additionally, all queries benefit from SMT-4 and SMT-8 under high workload. This performance behavior can be explained by the higher resource demand related to this workload scenario.
In conclusion, these results show the dependency of the performance on multiple workload-related aspects. Depending on the workload, identical changes in the database configuration can either improve the performance or result in performance losses. Our monitoring solution allows cloud providers to analyze their workload more closely. In order to operate the databases in an efficient and cost-effective manner, this analysis is crucial. Detailed knowledge about the resource usage of the specific workload allows cloud providers to deploy database tenants efficiently. Furthermore, valuable resources can be assigned dynamically where they are needed. Assigning too many or too few resources is disadvantageous, since either performance goals are not met or unneeded resources are allocated. With detailed knowledge about the workload, cloud providers can avoid both situations. Changing CPU-related resources, for example by migrating the database to a more powerful server or by assigning more CPU resources in a virtualized environment, also results in differing degrees of success depending on the specific workload scenario.

4 RELATED WORK
In [12], the author provides performance insights into the in-memory database SAP HANA in a multi-tenant configuration. However, he only considers the database and the applied workload as a black box and gives no further insights into performance-relevant factors. Furthermore, he does not consider the efficiency of thread usage in his work. In his experiments, only small-sized tenants are used, which is unlikely in a real-world scenario.

The authors in [6] provide more fine-grained performance insights into SAP HANA in a multi-tenant configuration, considering amongst other factors differently sized tenant databases, a varying workload and different CPU assignments. In [8], they extend their work by providing new models for the prediction of memory occupancy.
In [7], they further extend their work and provide insights into the usage of threads. However, they only measure the average number of utilized CPU cores to obtain the thread usage of the queries. Furthermore, they utilize a lower monitoring resolution, resulting in a lower accuracy when considering the resource demand of the individual queries. In this paper, we could show the limitations of this approach by performing benchmarks in various scenarios. Considering only the utilization of CPU cores is not sufficient to explain the performance behavior of the TPC-H queries in our scenarios. Thus, we extended their work by providing more fine-grained insights into thread usage in varying hardware environments.

5 CONCLUSION AND FUTURE WORK
In our work, we provided fine-grained performance insights into the in-memory database SAP HANA. We built a monitoring setup that allows us to perform a detailed analysis of the thread utilization of the database. Our setup is also capable of collecting data at a very high resolution, preventing losses through inaccurate monitoring data. We have shown the dependency of the performance behavior on several metrics in multiple scenarios. Furthermore, our monitoring setup allowed us to group the TPC-H queries according to their resource demand. The fine-grained analysis of the resource demand of different queries allowed us to explain anomalies when observing their performance behavior in different workload scenarios as well as in different runtime environments.

In future work, we plan to create a fine-grained performance prediction model allowing us to simulate the performance behavior in different scenarios.

REFERENCES
[1] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. 2010. A View of Cloud Computing. Commun. ACM 53, 4 (April 2010), 50–58. https://doi.org/10.1145/1721654.1721672
[2] Andreas Brunnert, Christian Vögele, Alexandru Danciu, Matthias Pfaff, Manuel Mayer, and Helmut Krcmar. 2014. Performance Management Work. Business & Information Systems Engineering 6, 3 (01 Jun 2014), 177–179. https://doi.org/10.1007/s12599-014-0323-7
[3] F. Dehne, Q. Kong, A. Rau-Chaplin, H. Zaboli, and R. Zhou. 2015. Scalable real-time OLAP on cloud architectures. J. Parallel and Distrib. Comput. 79-80 (2015), 31–41. https://doi.org/10.1016/j.jpdc.2014.08.006 Special Issue on Scalable Systems for Big Data Management and Analytics.
[4] Martin Grund, Jan Schaffner, Jens Krueger, Jan Brunnert, and Alexander Zeier. 2010. The Effects of Virtualization on Main Memory Systems. In Proceedings of the Sixth International Workshop on Data Management on New Hardware (DaMoN '10). ACM, New York, NY, USA, 41–46. https://doi.org/10.1145/1869389.1869395
[5] J. L. Lo, L. A. Barroso, S. J. Eggers, K. Gharachorloo, H. M. Levy, and S. S. Parekh. 1998. An analysis of database workload performance on simultaneous multithreaded processors. In Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235). 39–50. https://doi.org/10.1109/ISCA.1998.694761
[6] Karsten Molka and Giuliano Casale. 2015. Experiments or simulation? A characterization of evaluation methods for in-memory databases. In 11th International Conference on Network and Service Management (CNSM 2015). IEEE, 201–209. https://doi.org/10.1109/CNSM.2015.7367360
[7] Karsten Molka and Giuliano Casale. 2016. Contention-Aware Workload Placement for In-Memory Databases in Cloud Environments. ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS) 2, 1, Article 1 (Sept. 2016), 29 pages. https://doi.org/10.1145/2961888
[8] K. Molka and G. Casale. 2016. Efficient Memory Occupancy Models for In-memory Databases. In 2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS). 430–432. https://doi.org/10.1109/MASCOTS.2016.56
[9] K. Molka and G. Casale. 2017. Energy-efficient resource allocation and provisioning for in-memory database clusters. In 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM). 19–27. https://doi.org/10.23919/INM.2017.7987260
[10] Tobias Mühlbauer, Wolf Rödiger, Andreas Kipf, Alfons Kemper, and Thomas Neumann. 2015. High-Performance Main-Memory Database Systems and Modern Virtualization: Friends or Foes?. In Proceedings of the Fourth Workshop on Data Analytics in the Cloud (DanaC'15). ACM, New York, NY, USA, Article 4, 4 pages. https://doi.org/10.1145/2799562.2799643
[11] Dominik Paluch, Harald Kienegger, and Helmut Krcmar. 2018. A Workload-Dependent Performance Analysis of an In-Memory Database in a Multi-Tenant Configuration. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering (ICPE '18). ACM, New York, NY, USA, 131–134. https://doi.org/10.1145/3185768.3186290
[12] Jan Schaffner. 2014. Multi Tenancy for Cloud-Based In-Memory Column Databases: Workload Management and Data Placement. Springer International Publishing, Heidelberg. https://doi.org/10.1007/978-3-319-00497-6_1
[13] Transaction Processing Performance Council. 2018. TPC-H Benchmark Specification. http://www.tpc.org/tpch/