=Paper=
{{Paper
|id=Vol-2330/paper1
|storemode=property
|title=An effective resource management approach in a FaaS environment
|pdfUrl=https://ceur-ws.org/Vol-2330/paper1.pdf
|volume=Vol-2330
|authors=Andreas Christoforou,Andreas S. Andreou
|dblpUrl=https://dblp.org/rec/conf/ucc/ChristoforouA18
}}
==An effective resource management approach in a FaaS environment==
An effective resource management approach in a FaaS environment
Andreas Christoforou, Andreas S. Andreou
Cyprus University of Technology
{andreas.christoforou,andreas.andreou}@cut.ac.cy
Abstract cloud providers adopted and currently support and of-
fer this architecture. AWS Lambda [3], IBM Cloud
Serverless computing introduces a new Cloud service Functions [4], Google Cloud Functions [5] and Mi-
which consist an increasingly popular architecture for crosoft Azure Functions [6] are the major serverless
building distributed applications. This paper investi- providers. This new cloud paradigm is becoming in-
gates and proposes a new resource management ap- creasingly popular and is gaining great attention from
proach in a FaaS platform, based on intelligent tech- the software industry and research community. A num-
niques. A number of experiments applied through an ber of new technical challenges and open problems [7]
indicative framework consisting of a client application have emerged and a number of questions have been
and a Lambda function. Three Genetic Algorithms set about the importance and the future of serverless
were employed to deliver optimal solutions in a multi- computing.
objective environment. Resource management support in such a FaaS envi-
ronment is essential for the software development pro-
1 Introduction cess itself, which is directed towards satisfying the SLA
and providing QoS assurance. The identification of the
Serverless computing is a relatively new processing optimum scenario for resource allocation to serve ade-
paradigm or model that emerged through the con- quately a specific workload is a tedious, computation-
tinuous and vast development of the Cloud. Server- ally complex and time-consuming process since multi-
less computing provides a service in which develop- ple objectives need to be satisfied.
ers can write and deploy code without provisioning or This research work is motivated by the following two
managing servers or containers. The adoption of this Research Questions (RQ)
paradigm has great impact on several software engi-
• RQ1: Is it possible to implement easy to use
neering aspects such as development process, pricing
and efficient resource management algorithms in
model and Quality of Sercice (QoS) assurance.
a FaaS platform?
Despite the various benefits and advantages of
serverless computing, like zero server management, no • RQ2: How intelligent techniques can deliver effi-
up-front provisioning, high availability, auto-scalability cient resource management to developers in a FaaS
and pay only for the resources used, there are also some environment with the minimum possible cost and
weaknesses that should also be taken into account: As time?
currently offered from providers, it is not suitable for
long term tasks because of the limited time a service The Amazon Lambda platform is considered a com-
can run; additionally, there is increasing complexity of plete platform as it offers the most features, while
the underlying architecture, which is intensified by the presents the greatest market share; for these reasons
lack of appropriate operational tools. it was selected to constitute our experimental environ-
The main representative of this new service architec- ment.
ture is Function as a Service (FaaS) or event-based pro- The rest of this paper is organized as follows: Section
gramming [1], where a function may triggered through 2 provides a brief overview of relevant literature focus-
an api call or by an event. Since Amazon introduced ing more on resource management approaches. Section
Lambda serverless platform in late 2014 [2], many other 3 introduces the proposed approach which addresses
2
the two research questions. The experimental process to the first research question. In this research work an
is described in section 4, while section 5 discuses the exhaustive algorithm will be applied on a low demand
results obtained. Finally, section 6 concludes the paper environment and a small scale workload with the re-
and outlines future research steps. sults obtained being considered as the reference data
for the proposed intelligent approaches, the latter be-
ing able to reach to solutions faster. Our proposition is
2 Literature Review the employment of a multi-objective genetic algorithms
(MOGAs) [15] as our optimization method that will
A short literature review has been carried out that is
generate near-optimal solutions and their performance
not limited to resource management approaches, since
will be compared with the reference optimal solutions
very few relevant research works were identified.
extracted by the exhaustive algorithm.
Two research works refer to efficient resource man-
agement: In [8] a solution using a well-known re-
source allocation strategy for a Lambda platform was
presented based on the model predictive controller Genetic algorithms are a type of evolutionary algo-
(MPC). This solution designs a resource allocation rithm, which are widely used to solve search-based op-
policy through the understanding of the run-time at- timization problems by simulating the theory of natu-
tributes of the workload. The authors in [9] performed ral evolution on a population of individuals (candidate
a dedicated test to identify if cost optimization is feasi- solutions). Problems like the one this study is dealing
ble when utilizing increased resources for lowering pro- with, require the optimization of multiple criteria at
cessing times. the same time. In such a case multi-objective genetic
Evaluation of main FaaS providers was performed algorithms can be adopted, where the goal is to find
in [10] and relevant results were presented in terms of the best solution by otimizing a set of objective func-
throughput, network bandwidth, a file I/O and perfor- tions. In case of conflicting or competing objectives,
mance computed according to concurrent invocations. a multi-objective genetic algorithm normally delivers a
In [11], the authors report results from a comprehen- set of optimal solutions instead of a single one. This
sive investigation on the performance of microservices set of optimal solutions is called Pareto optimal set (or
hosted by a serveless platform. The investigation dealt Pareto front) and contains those solutions that are not
with implications of infrastructure elasticity, load bal- dominated by any other solution yielded during evolu-
ancing, provisioning variation, infrastructure retention, tion. Each optimal solution constitutes a specific bal-
and memory reservations. A micro-benchmark intro- ance between the objectives under optimization, where
duced in [12] was used to evaluate the performance and any improvement in one of them leads to worsening the
cost model of popular FaaS providers. other (conflicting targets, e.g. cost vs time in our case).
An open-source FaaS tool called Snafu was intro- Therefore, a decision maker is provided with the set of
duced in [13] which is employed for managing, execut- optimal solutions and is therefore supported to take de-
ing and testing functions across provider-specific inter- cisions as to which values of the decision variables are
faces. Finally the work of Hong et al. [14] describes most suited based on the targets and the requirements
six serverless design patterns that can be used to build of his/her application.
serverless applications and services.
In the context of our application, the candidate so-
3 Multi-Objective Optimization lutions, or alternatively the decision variables, consist
Approach of two of the available configuration options offered
by AWS Lambda platform, namely memory allocation
Our general aim is to allocate a sufficient amount of re- and number of maximum concurrency functions, as
sources in a FaaS environment that should be able to well as a third variable, the batch size, that represents
serve a specific workload. By utilizing an exhaustive the number of inputs that each individual function is
algorithm we first aim to identify the optimal solutions required to process. Our approach is graphically rep-
for both objectives, cost and performance. Exhaustive resented in Figure 1 where the three decision variables
approaches have great demands on resources and, sub- in conjunction with the characteristics of the workload
sequently, on cost, and thus they cannot be the answer define the level of Cost and Duration.
3
Figure 1: Proposed Multi-objective Optimization Approach
4 Experimental Process
4.1 Experimental Environment
An application that is used to perform the experimen-
tal process has been implemented by utilizing services
offered by Amazon AWS platform. More specifically,
this application is based on the idea introduced in
Amazon Big Data Blog 1 where in a map-reduce style
it counts the words in files stored in an S3 2 bucket.
This application streams the total number of words
each function has calculated, returns it back to the user
in real-time and demonstrates how a Lambda function
Figure 2: Experimental Environment
can efficiently process large amounts of data in short
time and provide immediate results to users.
In our experimentation study our application was Python. The synchronous invocations of lambda func-
implemented as follows: On the client side, besides tions is executed and controlled by utilizing Python’s
the main function that triggers the whole process, a multi-threading module. The integrated experimental
cascade function was also implemented which is re- environment is depicted in Figure 2.
sponsible to sense the workload size and accordingly
distribute the data in a synchronous way over a num- 4.2 Exhaustive Algorithm
ber of lambda functions. This function is also respon-
sible to collect and aggregate the results taken from As mentioned before, our multi-objective optimization
the lambda functions responses and when all func- approach was adjusted and configured based on the
tions are completed it returns the final result to the AWS Lambda platform and taking into account the
user. On the AWS platform a lambda function was available options offered. The two objectives are cost
created that counts the words of a given batch of files and performance. The cost objective is the minimiza-
which are stored in a AWS S3 bucket. Data stored tion of the total cost required for the completion of the
in the S3 bucket represents the workload that will be process of the input workload and is calculated using
processed with specific features. Both the client and the formulas and rules as these are given by Amazon 4 .
lambda sides were implemented in Python 3.7 and the The calculation of the cost depends on the number of
integration with the AWS services, S3 and Lambda lambda functions executions, the total duration of all
was achieved through the Boto3 3 , the AWS SDK for executed functions and the allocated memory. The per-
formance objective is also the minimization of the total
1
https://aws.amazon.com/blogs/big-data/building-scalable- duration needed for the workload process completion
and-responsive-big-data-interfaces-with-aws-lambda/ and is calculated as the time from the moment the
2
https://aws.amazon.com/s3/
3 4
https://aws.amazon.com/sdk-for-python/ https://aws.amazon.com/lambda/pricing/
4
user sends the start request until the application de- objective functions, Cost and Duration, for values that
livers back to the user the total count of words. The belong to the PS set (see equation 2).
set of decision variables consists of the memory allo-
cation size, the number of maximum concurrent func-
tions and the batch size. Memory allocation denotes minimizef (x) = (fduration (x), fcost (x)), x ∈ P S (2)
the amount of memory you want to allocate for your
lambda function. The values memory can get, ranged Since all three decision variables are real-valued, we
from 128MB to 3008 MB with 64MB increment step. accordingly use best practices for setting the MOGAs
The concurrent execution limit can be set from 1 to configuration. For crossover and mutation operators,
1000. Finally, the batch size represents the number the Simulated Binary Crossover (SBX) and Polynomial
of files that each function will process and in our ex- Mutation (PM) were selected respectively. The same
periments its values are relative to the percentage to parameters and settings were used for all executions.
the workload size and fall into the following set: [1, 2, As one can easily discern from Figure 3, all algorithms
5, 10, 20, 25, 50, 100]. The S3 bucket which is used yielded very similar solutions which are almost identi-
for the workload, contains 100 text files with each file cal to the reference optimal. A more detailed analysis
containing 638 words. The exhaustive algorithm that of the results follows in the next section.
was used calculated and delivered all possible candi-
date solutions. To reduce execution time and cost we
discarded solutions for concurrency limit over 100 since 5 Results and Discussion
the size of the workload is 100 and it does not make
sense to take these solutions into account. The number The execution of the exhaustive algorithm on the ex-
of Possible Solutions (PS) is calculated using equation perimental application delivered a complete list of PS.
1 and is calculated to be equal to 36800. The extracted values constitute the aggregation of five
different executions in order to minimize or even elim-
|P S| = N × M × K (1) inate possible variations between the results under ex-
actly the same configuration conditions. As described
where, N is the number of the possible values of the above, the results from the exhaustive algorithm were
concurrency limit and is equal to 100, M denotes the considered as the reference data and were used for
total number of values for memory and is equal to 46 the assessment of the three MOGAs employed. Each
and K is the number of values the workload batch size MOGA was run 100 times for different values of fitness
can get and is equal to 8. evaluations (FE) ranging from 500 to 4500 with incre-
ment step 500. The Pareto optimal front that emerged
4.3 Multi-objective Genetic Algorithms from the reference optimal solutions in contrast to the
Pareto near-optimal solutions yielded by each MOGA
We selected three well-known and widespread MO- for 1000 fitness evaluations, is depicted in Figure 3. By
GAs to assess their ability to solve the problem in observing the Pareto fronts, one can easily conclude
hand and compare their results: The Non-dominated that all three MOGAs approached the optimal solu-
Sorting Genetic Algorithm II (NSGA-II) [15], the tions to a high degree; in fact, in some cases the dom-
Non-dominated Sorting Genetic Algorithm III (NSGA- inant MOGA solutions are exactly the same as those
III) [16] and the Strength Pareto Evolutionary Algo- of the reference set. At this point we will utilize some
rithm 2 (SPEA2) [17]. The selection of these specific metrics aiming to assess further and compare the per-
algorithms was made after performing some prelimi- formance [18] of the three MOGAs we employed on our
nary experimentation which indicated that these three approach.
present a consistently good performance.
The relative configuration of the required parame-
ters, as well as the overall implementation of the al- 5.1 MOGAs performance through quality
gorithms, were performed using Platypus 5 , a Python- indicators
based multi-objective optimization algorithms library.
The hypervolume (HV) [19] and the inverted genera-
The aim is to minimize a vector consisting of the two
tional distance (IGD) [20] quality indicators were se-
5
https://platypus.readthedocs.io/en/latest/index.html lected to assist in comparing the three MOGAs with
5
Table 1: Hypervolume(HV) values
FE HV (x10−6 )
NSGA-II NSGA-III SPEA2
500 999667.063 999658.576 999755.043
1000 999755.043 999656.438 999960.329
1500 999960.329 999655.593 999960.329
2000 999960.329 999654.920 999960.329
2500 999960.329 999655.384 999960.329
3000 999960.329 999655.777 999960.329
Figure 3: Pareto front for 1000 fitness evaluations (FE)
3500 999960.329 999655.230 999960.329
4000 999960.329 999655.687 999960.329
respect to performance and scalability, given the abil- 4500 999960.329 999657.071 999960.329
ity of the aforementioned metrics to assess both con-
vergence and diversity (uniformity and spread) of the
algorithms. Specifically, the HV indicator assesses the
volume covered by the non-dominated solutions of a
Pareto front in the objective space. Therefore, the
larger the volume covered by the solutions generated
in a run, the higher the HV value, which indicates a
better performance. The IGD indicator assesses how
far the elements of the true Pareto front (reference data
in our case) are from the non-dominated points of an
approximation Pareto front. Therefore, the greater the
extent of the true Pareto front that is covered by the
non dominated points generated by a run in the objec- Table 2: Inverted Generational Distance(IGD) values
tive space, the lower the IGD value, which denotes a
better performance. Each algorithm was run 100 times FE IGD (x10−6 )
for each number of FE and the median values of the NSGAII NSGAIII SPEA2
HV and IGD were calculated for each algorithm. These 500 60727.255 61344.367 57375.237
values are presented in Tables 1 and 2 respectively. 1000 60250.148 57791.945 57792.130
The differences observed in the indicators values be- 1500 57792.130 57792.189 57792.130
tween the compared algorithms are too small, and this 2000 57792.130 57787.907 57792.130
most probably is the result of the small complexity 2500 57792.130 57792.125 57792.130
of the workload used; however, in cases of application 3000 57792.130 57791.964 57792.130
workloads with increasingly greater complexity and/or 3500 57792.130 57790.733 57792.130
scale, these differences will become more profound. As 4000 57792.130 57792.097 57792.130
regards the HV indicator, SPEA2 presents the best 4500 57792.130 57792.179 57792.130
performance, that is, the highest hypervolume value,
and this is consistent along all fitness evaluation num-
bers used. Second best for this indicator is the NSGA-
II. It is important to observe that in the case SPEA2 in
the 1000 fitness evaluations rub the HV value stabilizes
to a constant value from the second measurement on-
6
are able to solve the problem of finding a set of near-
Table 3: Pairwise comparison for HV indicator
optimal solutions that support developers in a FaaS
NSGA-II NSGA-III SPEA2 environment to select an efficient resource allocation
NSGA-II 0.074 0.18 scheme with respect to cost and time. In our case
NSGA-III 0.005 this was verified through the experimental process fol-
lowed; therefore, the question raised now, and will be
SPEA2
the source of future research work, is whether the sup-
port obtained by the MOGAs is worth investigating
further in terms of real-time response, ”execution on
Table 4: Pairwise comparison for IGD indicator the fly as workloads come in” and the level to which
this support can be generalized to cover also workloads
NSGA-II NSGA-III SPEA2 of unknown characteristics.
NSGA-II 0.241 0.18
NSGA-III 0.285
SPEA2 Acknowledgments
The paper is part of the outcomes of the Twinning
project Dossier Cloud. This project has received fund-
wards. The same behaviour is also observed for NSGA- ing from the European Union’s Horizon 2020 research
II but from the third measurement onwards. On the and innovation programme under grant agreement No.
contrary, NSGA-III presents more fluctuations in its 692251.
measurements.
In the case of the IGD indicator, things appear more
complicated since none of the algorithms seems to pre-
References
vail. A notable point is that for the lowest measure of [1] Fox GC, Ishakian V, Muthusamy V, Slominski A.
the fitness evaluations numbers, SPEA2 clearly outper- Status of serverless computing and function-as-
forms the others. The values for NSGA-II and SPEA2 a-service (faas) in industry and research. arXiv
starting from the third measurement onwards are fully preprint arXiv:170808028. 2017;.
identical, while NSGA-III shows ups and downs and in
three cases seems better than the other two. [2] Adzic G, Chatley R. Serverless computing: eco-
The Wilcoxon signed-rank test was applied on both nomic and architectural impact. In: Proceedings
quality indicators to detect whether or not a statisti- of the 2017 11th Joint Meeting on Foundations of
cally significant difference exist among the three algo- Software Engineering. ACM; 2017. p. 884–889.
rithms. To handle the family-wise error rate accumu-
lated, p-values were adjusted using a post-hoc Holm [3] AWS Lambda;. Accessed: 2018-09-30. https:
procedure. The p-values resulting from the pairwise //aws.amazon.com/lambda/.
comparison for HV and IGD indicators are shown in
Tables 3 and 4 respectively where pairs of algorithms [4] IBM Cloud Functions;. Accessed: 2018-09-30.
with a statistically significant difference (p <0.05) are https://www.ibm.com/cloud/functions.
shown in boldface. According to the pairwise com-
parisons, no significant difference is observed between [5] Google Cloud Functions;. Accessed: 2018-09-30.
NSGA-II and NSGA-III, NSGA-II and SPEA2, and https://cloud.google.com/functions/.
NSGA-III and SPEA2 in either indicator except only
for the pair NSGA-III and SPEA2 in the case of HV. [6] Microsoft Azure Functions;. Accessed: 2018-
09-30. https://azure.microsoft.com/en-us/
services/functions/.
6 Conclusions and Future Work
[7] Baldini I, Castro P, Chang K, Cheng P, Fink S,
This research work performed a preliminary investiga- Ishakian V, et al. Serverless computing: Current
tion to assess whether heuristic approaches for multi- trends and open problems. In: Research Advances
objective optimization, like the specific MOGAs used, in Cloud Computing. Springer; 2017. p. 1–20.
7
[8] HoseinyFarahabady M, Taheri J, Tari Z, Zomaya [18] Riquelme N, Von Lücken C, Baran B. Perfor-
AY. A dynamic resource controller for a lambda mance metrics in multi-objective optimization. In:
architecture. In: Parallel Processing (ICPP), 2017 Computing Conference (CLEI), 2015 Latin Amer-
46th International Conference on. IEEE; 2017. p. ican. IEEE; 2015. p. 1–11.
332–341.
[19] Zitzler E, Thiele L. Multiobjective evolutionary
[9] Hoang A. Analysis of microservices and server- algorithms: a comparative case study and the
less architecture for mobile application enable- strength Pareto approach. IEEE transactions on
ment (PhD Dissertation). California State Uni- Evolutionary Computation. 1999;3(4):257–271.
versity, Northridge; 2017.
[20] Van Veldhuizen DA, Lamont GB. Multi-
[10] Lee H, Satyam K, Fox G. Evaluation of produc- objective evolutionary algorithms: Analyzing
tion serverless computing environments. In: 2018 the state-of-the-art. Evolutionary computation.
IEEE 11th International Conference on Cloud 2000;8(2):125–147.
Computing (CLOUD). IEEE; 2018. p. 442–450.
[11] Lloyd W, Ramesh S, Chinthalapati S, Ly L, Pal-
lickara S. Serverless computing: An investigation
of factors influencing microservice performance.
In: Cloud Engineering (IC2E), 2018 IEEE Inter-
national Conference on. IEEE; 2018. p. 159–169.
[12] Back T, Andrikopoulos V. Using a Microbench-
mark to Compare Function as a Service Solutions.
In: European Conference on Service-Oriented and
Cloud Computing. Springer; 2018. p. 146–160.
[13] Spillner J. Snafu: Function-as-a-service (faas)
runtime design and implementation. arXiv
preprint arXiv:170307562. 2017;.
[14] Hong S, Srivastava A, Shambrook W, Dumitras
T. Go serverless: securing cloud via serverless
design patterns. In: 10th {USENIX} Workshop on
Hot Topics in Cloud Computing (HotCloud 18).
USENIX; 2018. .
[15] Deb K, Pratap A, Agarwal S, Meyarivan T. A
fast and elitist multiobjective genetic algorithm:
NSGA-II. IEEE transactions on evolutionary com-
putation. 2002;6(2):182–197.
[16] Deb K, Jain H. An evolutionary many-objective
optimization algorithm using reference-point-
based non dominated sorting approach, part
I: Solving problems with box constraints.
IEEE Trans Evolutionary Computation.
2014;18(4):577–601.
[17] Zitzler E, Laumanns M, Thiele L. SPEA2: Im-
proving the strength Pareto evolutionary algo-
rithm. TIK-report. 2001;103.
8