An effective resource management approach in a FaaS environment Andreas Christoforou, Andreas S. Andreou Cyprus University of Technology {andreas.christoforou,andreas.andreou}@cut.ac.cy Abstract cloud providers adopted and currently support and of- fer this architecture. AWS Lambda [3], IBM Cloud Serverless computing introduces a new Cloud service Functions [4], Google Cloud Functions [5] and Mi- which consist an increasingly popular architecture for crosoft Azure Functions [6] are the major serverless building distributed applications. This paper investi- providers. This new cloud paradigm is becoming in- gates and proposes a new resource management ap- creasingly popular and is gaining great attention from proach in a FaaS platform, based on intelligent tech- the software industry and research community. A num- niques. A number of experiments applied through an ber of new technical challenges and open problems [7] indicative framework consisting of a client application have emerged and a number of questions have been and a Lambda function. Three Genetic Algorithms set about the importance and the future of serverless were employed to deliver optimal solutions in a multi- computing. objective environment. Resource management support in such a FaaS envi- ronment is essential for the software development pro- 1 Introduction cess itself, which is directed towards satisfying the SLA and providing QoS assurance. The identification of the Serverless computing is a relatively new processing optimum scenario for resource allocation to serve ade- paradigm or model that emerged through the con- quately a specific workload is a tedious, computation- tinuous and vast development of the Cloud. Server- ally complex and time-consuming process since multi- less computing provides a service in which develop- ple objectives need to be satisfied. ers can write and deploy code without provisioning or This research work is motivated by the following two managing servers or containers. The adoption of this Research Questions (RQ) paradigm has great impact on several software engi- • RQ1: Is it possible to implement easy to use neering aspects such as development process, pricing and efficient resource management algorithms in model and Quality of Sercice (QoS) assurance. a FaaS platform? Despite the various benefits and advantages of serverless computing, like zero server management, no • RQ2: How intelligent techniques can deliver effi- up-front provisioning, high availability, auto-scalability cient resource management to developers in a FaaS and pay only for the resources used, there are also some environment with the minimum possible cost and weaknesses that should also be taken into account: As time? currently offered from providers, it is not suitable for long term tasks because of the limited time a service The Amazon Lambda platform is considered a com- can run; additionally, there is increasing complexity of plete platform as it offers the most features, while the underlying architecture, which is intensified by the presents the greatest market share; for these reasons lack of appropriate operational tools. it was selected to constitute our experimental environ- The main representative of this new service architec- ment. ture is Function as a Service (FaaS) or event-based pro- The rest of this paper is organized as follows: Section gramming [1], where a function may triggered through 2 provides a brief overview of relevant literature focus- an api call or by an event. Since Amazon introduced ing more on resource management approaches. Section Lambda serverless platform in late 2014 [2], many other 3 introduces the proposed approach which addresses 2 the two research questions. The experimental process to the first research question. In this research work an is described in section 4, while section 5 discuses the exhaustive algorithm will be applied on a low demand results obtained. Finally, section 6 concludes the paper environment and a small scale workload with the re- and outlines future research steps. sults obtained being considered as the reference data for the proposed intelligent approaches, the latter be- ing able to reach to solutions faster. Our proposition is 2 Literature Review the employment of a multi-objective genetic algorithms (MOGAs) [15] as our optimization method that will A short literature review has been carried out that is generate near-optimal solutions and their performance not limited to resource management approaches, since will be compared with the reference optimal solutions very few relevant research works were identified. extracted by the exhaustive algorithm. Two research works refer to efficient resource man- agement: In [8] a solution using a well-known re- source allocation strategy for a Lambda platform was presented based on the model predictive controller Genetic algorithms are a type of evolutionary algo- (MPC). This solution designs a resource allocation rithm, which are widely used to solve search-based op- policy through the understanding of the run-time at- timization problems by simulating the theory of natu- tributes of the workload. The authors in [9] performed ral evolution on a population of individuals (candidate a dedicated test to identify if cost optimization is feasi- solutions). Problems like the one this study is dealing ble when utilizing increased resources for lowering pro- with, require the optimization of multiple criteria at cessing times. the same time. In such a case multi-objective genetic Evaluation of main FaaS providers was performed algorithms can be adopted, where the goal is to find in [10] and relevant results were presented in terms of the best solution by otimizing a set of objective func- throughput, network bandwidth, a file I/O and perfor- tions. In case of conflicting or competing objectives, mance computed according to concurrent invocations. a multi-objective genetic algorithm normally delivers a In [11], the authors report results from a comprehen- set of optimal solutions instead of a single one. This sive investigation on the performance of microservices set of optimal solutions is called Pareto optimal set (or hosted by a serveless platform. The investigation dealt Pareto front) and contains those solutions that are not with implications of infrastructure elasticity, load bal- dominated by any other solution yielded during evolu- ancing, provisioning variation, infrastructure retention, tion. Each optimal solution constitutes a specific bal- and memory reservations. A micro-benchmark intro- ance between the objectives under optimization, where duced in [12] was used to evaluate the performance and any improvement in one of them leads to worsening the cost model of popular FaaS providers. other (conflicting targets, e.g. cost vs time in our case). An open-source FaaS tool called Snafu was intro- Therefore, a decision maker is provided with the set of duced in [13] which is employed for managing, execut- optimal solutions and is therefore supported to take de- ing and testing functions across provider-specific inter- cisions as to which values of the decision variables are faces. Finally the work of Hong et al. [14] describes most suited based on the targets and the requirements six serverless design patterns that can be used to build of his/her application. serverless applications and services. In the context of our application, the candidate so- 3 Multi-Objective Optimization lutions, or alternatively the decision variables, consist Approach of two of the available configuration options offered by AWS Lambda platform, namely memory allocation Our general aim is to allocate a sufficient amount of re- and number of maximum concurrency functions, as sources in a FaaS environment that should be able to well as a third variable, the batch size, that represents serve a specific workload. By utilizing an exhaustive the number of inputs that each individual function is algorithm we first aim to identify the optimal solutions required to process. Our approach is graphically rep- for both objectives, cost and performance. Exhaustive resented in Figure 1 where the three decision variables approaches have great demands on resources and, sub- in conjunction with the characteristics of the workload sequently, on cost, and thus they cannot be the answer define the level of Cost and Duration. 3 Figure 1: Proposed Multi-objective Optimization Approach 4 Experimental Process 4.1 Experimental Environment An application that is used to perform the experimen- tal process has been implemented by utilizing services offered by Amazon AWS platform. More specifically, this application is based on the idea introduced in Amazon Big Data Blog 1 where in a map-reduce style it counts the words in files stored in an S3 2 bucket. This application streams the total number of words each function has calculated, returns it back to the user in real-time and demonstrates how a Lambda function Figure 2: Experimental Environment can efficiently process large amounts of data in short time and provide immediate results to users. In our experimentation study our application was Python. The synchronous invocations of lambda func- implemented as follows: On the client side, besides tions is executed and controlled by utilizing Python’s the main function that triggers the whole process, a multi-threading module. The integrated experimental cascade function was also implemented which is re- environment is depicted in Figure 2. sponsible to sense the workload size and accordingly distribute the data in a synchronous way over a num- 4.2 Exhaustive Algorithm ber of lambda functions. This function is also respon- sible to collect and aggregate the results taken from As mentioned before, our multi-objective optimization the lambda functions responses and when all func- approach was adjusted and configured based on the tions are completed it returns the final result to the AWS Lambda platform and taking into account the user. On the AWS platform a lambda function was available options offered. The two objectives are cost created that counts the words of a given batch of files and performance. The cost objective is the minimiza- which are stored in a AWS S3 bucket. Data stored tion of the total cost required for the completion of the in the S3 bucket represents the workload that will be process of the input workload and is calculated using processed with specific features. Both the client and the formulas and rules as these are given by Amazon 4 . lambda sides were implemented in Python 3.7 and the The calculation of the cost depends on the number of integration with the AWS services, S3 and Lambda lambda functions executions, the total duration of all was achieved through the Boto3 3 , the AWS SDK for executed functions and the allocated memory. The per- formance objective is also the minimization of the total 1 https://aws.amazon.com/blogs/big-data/building-scalable- duration needed for the workload process completion and-responsive-big-data-interfaces-with-aws-lambda/ and is calculated as the time from the moment the 2 https://aws.amazon.com/s3/ 3 4 https://aws.amazon.com/sdk-for-python/ https://aws.amazon.com/lambda/pricing/ 4 user sends the start request until the application de- objective functions, Cost and Duration, for values that livers back to the user the total count of words. The belong to the PS set (see equation 2). set of decision variables consists of the memory allo- cation size, the number of maximum concurrent func- tions and the batch size. Memory allocation denotes minimizef (x) = (fduration (x), fcost (x)), x ∈ P S (2) the amount of memory you want to allocate for your lambda function. The values memory can get, ranged Since all three decision variables are real-valued, we from 128MB to 3008 MB with 64MB increment step. accordingly use best practices for setting the MOGAs The concurrent execution limit can be set from 1 to configuration. For crossover and mutation operators, 1000. Finally, the batch size represents the number the Simulated Binary Crossover (SBX) and Polynomial of files that each function will process and in our ex- Mutation (PM) were selected respectively. The same periments its values are relative to the percentage to parameters and settings were used for all executions. the workload size and fall into the following set: [1, 2, As one can easily discern from Figure 3, all algorithms 5, 10, 20, 25, 50, 100]. The S3 bucket which is used yielded very similar solutions which are almost identi- for the workload, contains 100 text files with each file cal to the reference optimal. A more detailed analysis containing 638 words. The exhaustive algorithm that of the results follows in the next section. was used calculated and delivered all possible candi- date solutions. To reduce execution time and cost we discarded solutions for concurrency limit over 100 since 5 Results and Discussion the size of the workload is 100 and it does not make sense to take these solutions into account. The number The execution of the exhaustive algorithm on the ex- of Possible Solutions (PS) is calculated using equation perimental application delivered a complete list of PS. 1 and is calculated to be equal to 36800. The extracted values constitute the aggregation of five different executions in order to minimize or even elim- |P S| = N × M × K (1) inate possible variations between the results under ex- actly the same configuration conditions. As described where, N is the number of the possible values of the above, the results from the exhaustive algorithm were concurrency limit and is equal to 100, M denotes the considered as the reference data and were used for total number of values for memory and is equal to 46 the assessment of the three MOGAs employed. Each and K is the number of values the workload batch size MOGA was run 100 times for different values of fitness can get and is equal to 8. evaluations (FE) ranging from 500 to 4500 with incre- ment step 500. The Pareto optimal front that emerged 4.3 Multi-objective Genetic Algorithms from the reference optimal solutions in contrast to the Pareto near-optimal solutions yielded by each MOGA We selected three well-known and widespread MO- for 1000 fitness evaluations, is depicted in Figure 3. By GAs to assess their ability to solve the problem in observing the Pareto fronts, one can easily conclude hand and compare their results: The Non-dominated that all three MOGAs approached the optimal solu- Sorting Genetic Algorithm II (NSGA-II) [15], the tions to a high degree; in fact, in some cases the dom- Non-dominated Sorting Genetic Algorithm III (NSGA- inant MOGA solutions are exactly the same as those III) [16] and the Strength Pareto Evolutionary Algo- of the reference set. At this point we will utilize some rithm 2 (SPEA2) [17]. The selection of these specific metrics aiming to assess further and compare the per- algorithms was made after performing some prelimi- formance [18] of the three MOGAs we employed on our nary experimentation which indicated that these three approach. present a consistently good performance. The relative configuration of the required parame- ters, as well as the overall implementation of the al- 5.1 MOGAs performance through quality gorithms, were performed using Platypus 5 , a Python- indicators based multi-objective optimization algorithms library. The hypervolume (HV) [19] and the inverted genera- The aim is to minimize a vector consisting of the two tional distance (IGD) [20] quality indicators were se- 5 https://platypus.readthedocs.io/en/latest/index.html lected to assist in comparing the three MOGAs with 5 Table 1: Hypervolume(HV) values FE HV (x10−6 ) NSGA-II NSGA-III SPEA2 500 999667.063 999658.576 999755.043 1000 999755.043 999656.438 999960.329 1500 999960.329 999655.593 999960.329 2000 999960.329 999654.920 999960.329 2500 999960.329 999655.384 999960.329 3000 999960.329 999655.777 999960.329 Figure 3: Pareto front for 1000 fitness evaluations (FE) 3500 999960.329 999655.230 999960.329 4000 999960.329 999655.687 999960.329 respect to performance and scalability, given the abil- 4500 999960.329 999657.071 999960.329 ity of the aforementioned metrics to assess both con- vergence and diversity (uniformity and spread) of the algorithms. Specifically, the HV indicator assesses the volume covered by the non-dominated solutions of a Pareto front in the objective space. Therefore, the larger the volume covered by the solutions generated in a run, the higher the HV value, which indicates a better performance. The IGD indicator assesses how far the elements of the true Pareto front (reference data in our case) are from the non-dominated points of an approximation Pareto front. Therefore, the greater the extent of the true Pareto front that is covered by the non dominated points generated by a run in the objec- Table 2: Inverted Generational Distance(IGD) values tive space, the lower the IGD value, which denotes a better performance. Each algorithm was run 100 times FE IGD (x10−6 ) for each number of FE and the median values of the NSGAII NSGAIII SPEA2 HV and IGD were calculated for each algorithm. These 500 60727.255 61344.367 57375.237 values are presented in Tables 1 and 2 respectively. 1000 60250.148 57791.945 57792.130 The differences observed in the indicators values be- 1500 57792.130 57792.189 57792.130 tween the compared algorithms are too small, and this 2000 57792.130 57787.907 57792.130 most probably is the result of the small complexity 2500 57792.130 57792.125 57792.130 of the workload used; however, in cases of application 3000 57792.130 57791.964 57792.130 workloads with increasingly greater complexity and/or 3500 57792.130 57790.733 57792.130 scale, these differences will become more profound. As 4000 57792.130 57792.097 57792.130 regards the HV indicator, SPEA2 presents the best 4500 57792.130 57792.179 57792.130 performance, that is, the highest hypervolume value, and this is consistent along all fitness evaluation num- bers used. Second best for this indicator is the NSGA- II. It is important to observe that in the case SPEA2 in the 1000 fitness evaluations rub the HV value stabilizes to a constant value from the second measurement on- 6 are able to solve the problem of finding a set of near- Table 3: Pairwise comparison for HV indicator optimal solutions that support developers in a FaaS NSGA-II NSGA-III SPEA2 environment to select an efficient resource allocation NSGA-II 0.074 0.18 scheme with respect to cost and time. In our case NSGA-III 0.005 this was verified through the experimental process fol- lowed; therefore, the question raised now, and will be SPEA2 the source of future research work, is whether the sup- port obtained by the MOGAs is worth investigating further in terms of real-time response, ”execution on Table 4: Pairwise comparison for IGD indicator the fly as workloads come in” and the level to which this support can be generalized to cover also workloads NSGA-II NSGA-III SPEA2 of unknown characteristics. NSGA-II 0.241 0.18 NSGA-III 0.285 SPEA2 Acknowledgments The paper is part of the outcomes of the Twinning project Dossier Cloud. This project has received fund- wards. The same behaviour is also observed for NSGA- ing from the European Union’s Horizon 2020 research II but from the third measurement onwards. On the and innovation programme under grant agreement No. contrary, NSGA-III presents more fluctuations in its 692251. measurements. In the case of the IGD indicator, things appear more complicated since none of the algorithms seems to pre- References vail. A notable point is that for the lowest measure of [1] Fox GC, Ishakian V, Muthusamy V, Slominski A. the fitness evaluations numbers, SPEA2 clearly outper- Status of serverless computing and function-as- forms the others. The values for NSGA-II and SPEA2 a-service (faas) in industry and research. arXiv starting from the third measurement onwards are fully preprint arXiv:170808028. 2017;. identical, while NSGA-III shows ups and downs and in three cases seems better than the other two. [2] Adzic G, Chatley R. Serverless computing: eco- The Wilcoxon signed-rank test was applied on both nomic and architectural impact. In: Proceedings quality indicators to detect whether or not a statisti- of the 2017 11th Joint Meeting on Foundations of cally significant difference exist among the three algo- Software Engineering. ACM; 2017. p. 884–889. rithms. To handle the family-wise error rate accumu- lated, p-values were adjusted using a post-hoc Holm [3] AWS Lambda;. Accessed: 2018-09-30. https: procedure. The p-values resulting from the pairwise //aws.amazon.com/lambda/. comparison for HV and IGD indicators are shown in Tables 3 and 4 respectively where pairs of algorithms [4] IBM Cloud Functions;. Accessed: 2018-09-30. with a statistically significant difference (p <0.05) are https://www.ibm.com/cloud/functions. shown in boldface. According to the pairwise com- parisons, no significant difference is observed between [5] Google Cloud Functions;. Accessed: 2018-09-30. NSGA-II and NSGA-III, NSGA-II and SPEA2, and https://cloud.google.com/functions/. NSGA-III and SPEA2 in either indicator except only for the pair NSGA-III and SPEA2 in the case of HV. [6] Microsoft Azure Functions;. Accessed: 2018- 09-30. https://azure.microsoft.com/en-us/ services/functions/. 6 Conclusions and Future Work [7] Baldini I, Castro P, Chang K, Cheng P, Fink S, This research work performed a preliminary investiga- Ishakian V, et al. Serverless computing: Current tion to assess whether heuristic approaches for multi- trends and open problems. In: Research Advances objective optimization, like the specific MOGAs used, in Cloud Computing. Springer; 2017. p. 1–20. 7 [8] HoseinyFarahabady M, Taheri J, Tari Z, Zomaya [18] Riquelme N, Von Lücken C, Baran B. Perfor- AY. A dynamic resource controller for a lambda mance metrics in multi-objective optimization. In: architecture. In: Parallel Processing (ICPP), 2017 Computing Conference (CLEI), 2015 Latin Amer- 46th International Conference on. IEEE; 2017. p. ican. IEEE; 2015. p. 1–11. 332–341. [19] Zitzler E, Thiele L. Multiobjective evolutionary [9] Hoang A. Analysis of microservices and server- algorithms: a comparative case study and the less architecture for mobile application enable- strength Pareto approach. IEEE transactions on ment (PhD Dissertation). California State Uni- Evolutionary Computation. 1999;3(4):257–271. versity, Northridge; 2017. [20] Van Veldhuizen DA, Lamont GB. Multi- [10] Lee H, Satyam K, Fox G. Evaluation of produc- objective evolutionary algorithms: Analyzing tion serverless computing environments. In: 2018 the state-of-the-art. Evolutionary computation. IEEE 11th International Conference on Cloud 2000;8(2):125–147. Computing (CLOUD). IEEE; 2018. p. 442–450. [11] Lloyd W, Ramesh S, Chinthalapati S, Ly L, Pal- lickara S. Serverless computing: An investigation of factors influencing microservice performance. In: Cloud Engineering (IC2E), 2018 IEEE Inter- national Conference on. IEEE; 2018. p. 159–169. [12] Back T, Andrikopoulos V. Using a Microbench- mark to Compare Function as a Service Solutions. In: European Conference on Service-Oriented and Cloud Computing. Springer; 2018. p. 146–160. [13] Spillner J. Snafu: Function-as-a-service (faas) runtime design and implementation. arXiv preprint arXiv:170307562. 2017;. [14] Hong S, Srivastava A, Shambrook W, Dumitras T. Go serverless: securing cloud via serverless design patterns. In: 10th {USENIX} Workshop on Hot Topics in Cloud Computing (HotCloud 18). USENIX; 2018. . [15] Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary com- putation. 2002;6(2):182–197. [16] Deb K, Jain H. An evolutionary many-objective optimization algorithm using reference-point- based non dominated sorting approach, part I: Solving problems with box constraints. IEEE Trans Evolutionary Computation. 2014;18(4):577–601. [17] Zitzler E, Laumanns M, Thiele L. SPEA2: Im- proving the strength Pareto evolutionary algo- rithm. TIK-report. 2001;103. 8