An effective resource management approach in a FaaS environment
                               Andreas Christoforou, Andreas S. Andreou
                                   Cyprus University of Technology
                         {andreas.christoforou,andreas.andreou}@cut.ac.cy


Abstract                                                         cloud providers adopted and currently support and of-
                                                                 fer this architecture. AWS Lambda [3], IBM Cloud
Serverless computing introduces a new Cloud service              Functions [4], Google Cloud Functions [5] and Mi-
which consist an increasingly popular architecture for           crosoft Azure Functions [6] are the major serverless
building distributed applications. This paper investi-           providers. This new cloud paradigm is becoming in-
gates and proposes a new resource management ap-                 creasingly popular and is gaining great attention from
proach in a FaaS platform, based on intelligent tech-            the software industry and research community. A num-
niques. A number of experiments applied through an               ber of new technical challenges and open problems [7]
indicative framework consisting of a client application          have emerged and a number of questions have been
and a Lambda function. Three Genetic Algorithms                  set about the importance and the future of serverless
were employed to deliver optimal solutions in a multi-           computing.
objective environment.                                              Resource management support in such a FaaS envi-
                                                                 ronment is essential for the software development pro-
1    Introduction                                                cess itself, which is directed towards satisfying the SLA
                                                                 and providing QoS assurance. The identification of the
Serverless computing is a relatively new processing              optimum scenario for resource allocation to serve ade-
paradigm or model that emerged through the con-                  quately a specific workload is a tedious, computation-
tinuous and vast development of the Cloud. Server-               ally complex and time-consuming process since multi-
less computing provides a service in which develop-              ple objectives need to be satisfied.
ers can write and deploy code without provisioning or               This research work is motivated by the following two
managing servers or containers. The adoption of this             Research Questions (RQ)
paradigm has great impact on several software engi-
                                                                   • RQ1: Is it possible to implement easy to use
neering aspects such as development process, pricing
                                                                     and efficient resource management algorithms in
model and Quality of Sercice (QoS) assurance.
                                                                     a FaaS platform?
   Despite the various benefits and advantages of
serverless computing, like zero server management, no              • RQ2: How intelligent techniques can deliver effi-
up-front provisioning, high availability, auto-scalability           cient resource management to developers in a FaaS
and pay only for the resources used, there are also some             environment with the minimum possible cost and
weaknesses that should also be taken into account: As                time?
currently offered from providers, it is not suitable for
long term tasks because of the limited time a service               The Amazon Lambda platform is considered a com-
can run; additionally, there is increasing complexity of         plete platform as it offers the most features, while
the underlying architecture, which is intensified by the         presents the greatest market share; for these reasons
lack of appropriate operational tools.                           it was selected to constitute our experimental environ-
   The main representative of this new service architec-         ment.
ture is Function as a Service (FaaS) or event-based pro-            The rest of this paper is organized as follows: Section
gramming [1], where a function may triggered through             2 provides a brief overview of relevant literature focus-
an api call or by an event. Since Amazon introduced              ing more on resource management approaches. Section
Lambda serverless platform in late 2014 [2], many other          3 introduces the proposed approach which addresses

                                                             2
the two research questions. The experimental process        to the first research question. In this research work an
is described in section 4, while section 5 discuses the     exhaustive algorithm will be applied on a low demand
results obtained. Finally, section 6 concludes the paper    environment and a small scale workload with the re-
and outlines future research steps.                         sults obtained being considered as the reference data
                                                            for the proposed intelligent approaches, the latter be-
                                                            ing able to reach to solutions faster. Our proposition is
2 Literature Review                                         the employment of a multi-objective genetic algorithms
                                                            (MOGAs) [15] as our optimization method that will
A short literature review has been carried out that is
                                                            generate near-optimal solutions and their performance
not limited to resource management approaches, since
                                                            will be compared with the reference optimal solutions
very few relevant research works were identified.
                                                            extracted by the exhaustive algorithm.
   Two research works refer to efficient resource man-
agement: In [8] a solution using a well-known re-
source allocation strategy for a Lambda platform was
presented based on the model predictive controller            Genetic algorithms are a type of evolutionary algo-
(MPC). This solution designs a resource allocation rithm, which are widely used to solve search-based op-
policy through the understanding of the run-time at- timization problems by simulating the theory of natu-
tributes of the workload. The authors in [9] performed ral evolution on a population of individuals (candidate
a dedicated test to identify if cost optimization is feasi- solutions). Problems like the one this study is dealing
ble when utilizing increased resources for lowering pro- with, require the optimization of multiple criteria at
cessing times.                                              the same time. In such a case multi-objective genetic
   Evaluation of main FaaS providers was performed algorithms can be adopted, where the goal is to find
in [10] and relevant results were presented in terms of the best solution by otimizing a set of objective func-
throughput, network bandwidth, a file I/O and perfor- tions. In case of conflicting or competing objectives,
mance computed according to concurrent invocations. a multi-objective genetic algorithm normally delivers a
In [11], the authors report results from a comprehen- set of optimal solutions instead of a single one. This
sive investigation on the performance of microservices set of optimal solutions is called Pareto optimal set (or
hosted by a serveless platform. The investigation dealt Pareto front) and contains those solutions that are not
with implications of infrastructure elasticity, load bal- dominated by any other solution yielded during evolu-
ancing, provisioning variation, infrastructure retention, tion. Each optimal solution constitutes a specific bal-
and memory reservations. A micro-benchmark intro- ance between the objectives under optimization, where
duced in [12] was used to evaluate the performance and any improvement in one of them leads to worsening the
cost model of popular FaaS providers.                       other (conflicting targets, e.g. cost vs time in our case).
   An open-source FaaS tool called Snafu was intro- Therefore, a decision maker is provided with the set of
duced in [13] which is employed for managing, execut- optimal solutions and is therefore supported to take de-
ing and testing functions across provider-specific inter- cisions as to which values of the decision variables are
faces. Finally the work of Hong et al. [14] describes most suited based on the targets and the requirements
six serverless design patterns that can be used to build of his/her application.
serverless applications and services.

                                                                  In the context of our application, the candidate so-
3    Multi-Objective                 Optimization lutions, or alternatively the decision variables, consist
     Approach                                                   of two of the available configuration options offered
                                                                by AWS Lambda platform, namely memory allocation
Our general aim is to allocate a sufficient amount of re-       and number of maximum concurrency functions, as
sources in a FaaS environment that should be able to            well as a third variable, the batch size, that represents
serve a specific workload. By utilizing an exhaustive           the number of inputs that each individual function is
algorithm we first aim to identify the optimal solutions        required to process. Our approach is graphically rep-
for both objectives, cost and performance. Exhaustive           resented in Figure 1 where the three decision variables
approaches have great demands on resources and, sub-            in conjunction with the characteristics of the workload
sequently, on cost, and thus they cannot be the answer          define the level of Cost and Duration.

                                                            3
                             Figure 1: Proposed Multi-objective Optimization Approach


4       Experimental Process
4.1     Experimental Environment
An application that is used to perform the experimen-
tal process has been implemented by utilizing services
offered by Amazon AWS platform. More specifically,
this application is based on the idea introduced in
Amazon Big Data Blog 1 where in a map-reduce style
it counts the words in files stored in an S3 2 bucket.
This application streams the total number of words
each function has calculated, returns it back to the user
in real-time and demonstrates how a Lambda function
                                                                               Figure 2: Experimental Environment
can efficiently process large amounts of data in short
time and provide immediate results to users.
   In our experimentation study our application was                 Python. The synchronous invocations of lambda func-
implemented as follows: On the client side, besides                 tions is executed and controlled by utilizing Python’s
the main function that triggers the whole process, a                multi-threading module. The integrated experimental
cascade function was also implemented which is re-                  environment is depicted in Figure 2.
sponsible to sense the workload size and accordingly
distribute the data in a synchronous way over a num-                4.2      Exhaustive Algorithm
ber of lambda functions. This function is also respon-
sible to collect and aggregate the results taken from               As mentioned before, our multi-objective optimization
the lambda functions responses and when all func-                   approach was adjusted and configured based on the
tions are completed it returns the final result to the              AWS Lambda platform and taking into account the
user. On the AWS platform a lambda function was                     available options offered. The two objectives are cost
created that counts the words of a given batch of files             and performance. The cost objective is the minimiza-
which are stored in a AWS S3 bucket. Data stored                    tion of the total cost required for the completion of the
in the S3 bucket represents the workload that will be               process of the input workload and is calculated using
processed with specific features. Both the client and               the formulas and rules as these are given by Amazon 4 .
lambda sides were implemented in Python 3.7 and the                 The calculation of the cost depends on the number of
integration with the AWS services, S3 and Lambda                    lambda functions executions, the total duration of all
was achieved through the Boto3 3 , the AWS SDK for                  executed functions and the allocated memory. The per-
                                                                    formance objective is also the minimization of the total
    1
     https://aws.amazon.com/blogs/big-data/building-scalable-       duration needed for the workload process completion
and-responsive-big-data-interfaces-with-aws-lambda/                 and is calculated as the time from the moment the
   2
     https://aws.amazon.com/s3/
   3                                                                  4
     https://aws.amazon.com/sdk-for-python/                               https://aws.amazon.com/lambda/pricing/


                                                                4
user sends the start request until the application de-          objective functions, Cost and Duration, for values that
livers back to the user the total count of words. The           belong to the PS set (see equation 2).
set of decision variables consists of the memory allo-
cation size, the number of maximum concurrent func-
tions and the batch size. Memory allocation denotes                 minimizef (x) = (fduration (x), fcost (x)), x ∈ P S (2)
the amount of memory you want to allocate for your
lambda function. The values memory can get, ranged                 Since all three decision variables are real-valued, we
from 128MB to 3008 MB with 64MB increment step.                 accordingly use best practices for setting the MOGAs
The concurrent execution limit can be set from 1 to             configuration. For crossover and mutation operators,
1000. Finally, the batch size represents the number             the Simulated Binary Crossover (SBX) and Polynomial
of files that each function will process and in our ex-         Mutation (PM) were selected respectively. The same
periments its values are relative to the percentage to          parameters and settings were used for all executions.
the workload size and fall into the following set: [1, 2,       As one can easily discern from Figure 3, all algorithms
5, 10, 20, 25, 50, 100]. The S3 bucket which is used            yielded very similar solutions which are almost identi-
for the workload, contains 100 text files with each file        cal to the reference optimal. A more detailed analysis
containing 638 words. The exhaustive algorithm that             of the results follows in the next section.
was used calculated and delivered all possible candi-
date solutions. To reduce execution time and cost we
discarded solutions for concurrency limit over 100 since        5      Results and Discussion
the size of the workload is 100 and it does not make
sense to take these solutions into account. The number          The execution of the exhaustive algorithm on the ex-
of Possible Solutions (PS) is calculated using equation         perimental application delivered a complete list of PS.
1 and is calculated to be equal to 36800.                       The extracted values constitute the aggregation of five
                                                                different executions in order to minimize or even elim-
                  |P S| = N × M × K                  (1)        inate possible variations between the results under ex-
                                                                actly the same configuration conditions. As described
where, N is the number of the possible values of the            above, the results from the exhaustive algorithm were
concurrency limit and is equal to 100, M denotes the            considered as the reference data and were used for
total number of values for memory and is equal to 46            the assessment of the three MOGAs employed. Each
and K is the number of values the workload batch size           MOGA was run 100 times for different values of fitness
can get and is equal to 8.                                      evaluations (FE) ranging from 500 to 4500 with incre-
                                                                ment step 500. The Pareto optimal front that emerged
4.3   Multi-objective Genetic Algorithms                        from the reference optimal solutions in contrast to the
                                                                Pareto near-optimal solutions yielded by each MOGA
We selected three well-known and widespread MO-                 for 1000 fitness evaluations, is depicted in Figure 3. By
GAs to assess their ability to solve the problem in             observing the Pareto fronts, one can easily conclude
hand and compare their results: The Non-dominated               that all three MOGAs approached the optimal solu-
Sorting Genetic Algorithm II (NSGA-II) [15], the                tions to a high degree; in fact, in some cases the dom-
Non-dominated Sorting Genetic Algorithm III (NSGA-              inant MOGA solutions are exactly the same as those
III) [16] and the Strength Pareto Evolutionary Algo-            of the reference set. At this point we will utilize some
rithm 2 (SPEA2) [17]. The selection of these specific           metrics aiming to assess further and compare the per-
algorithms was made after performing some prelimi-              formance [18] of the three MOGAs we employed on our
nary experimentation which indicated that these three           approach.
present a consistently good performance.
   The relative configuration of the required parame-
ters, as well as the overall implementation of the al- 5.1 MOGAs performance through quality
gorithms, were performed using Platypus 5 , a Python-            indicators
based multi-objective optimization algorithms library.
                                                          The hypervolume (HV) [19] and the inverted genera-
The aim is to minimize a vector consisting of the two
                                                          tional distance (IGD) [20] quality indicators were se-
   5
     https://platypus.readthedocs.io/en/latest/index.html lected to assist in comparing the three MOGAs with

                                                            5
                                                                          Table 1: Hypervolume(HV) values

                                                                    FE                   HV (x10−6 )
                                                                            NSGA-II      NSGA-III       SPEA2
                                                                   500      999667.063   999658.576    999755.043
                                                                   1000     999755.043   999656.438    999960.329
                                                                   1500     999960.329   999655.593    999960.329
                                                                   2000     999960.329   999654.920    999960.329
                                                                   2500     999960.329   999655.384    999960.329
                                                                   3000     999960.329   999655.777    999960.329
Figure 3: Pareto front for 1000 fitness evaluations (FE)
                                                                   3500     999960.329   999655.230    999960.329
                                                                   4000     999960.329   999655.687    999960.329
respect to performance and scalability, given the abil-            4500     999960.329   999657.071    999960.329
ity of the aforementioned metrics to assess both con-
vergence and diversity (uniformity and spread) of the
algorithms. Specifically, the HV indicator assesses the
volume covered by the non-dominated solutions of a
Pareto front in the objective space. Therefore, the
larger the volume covered by the solutions generated
in a run, the higher the HV value, which indicates a
better performance. The IGD indicator assesses how
far the elements of the true Pareto front (reference data
in our case) are from the non-dominated points of an
approximation Pareto front. Therefore, the greater the
extent of the true Pareto front that is covered by the
non dominated points generated by a run in the objec-           Table 2: Inverted Generational Distance(IGD) values
tive space, the lower the IGD value, which denotes a
better performance. Each algorithm was run 100 times                 FE               IGD (x10−6 )
for each number of FE and the median values of the                           NSGAII NSGAIII SPEA2
HV and IGD were calculated for each algorithm. These                  500    60727.255 61344.367 57375.237
values are presented in Tables 1 and 2 respectively.                 1000    60250.148 57791.945 57792.130
   The differences observed in the indicators values be-             1500    57792.130 57792.189 57792.130
tween the compared algorithms are too small, and this                2000    57792.130 57787.907 57792.130
most probably is the result of the small complexity                  2500    57792.130 57792.125 57792.130
of the workload used; however, in cases of application               3000    57792.130 57791.964 57792.130
workloads with increasingly greater complexity and/or                3500    57792.130 57790.733 57792.130
scale, these differences will become more profound. As               4000    57792.130 57792.097 57792.130
regards the HV indicator, SPEA2 presents the best                    4500    57792.130 57792.179 57792.130
performance, that is, the highest hypervolume value,
and this is consistent along all fitness evaluation num-
bers used. Second best for this indicator is the NSGA-
II. It is important to observe that in the case SPEA2 in
the 1000 fitness evaluations rub the HV value stabilizes
to a constant value from the second measurement on-

                                                            6
                                                             are able to solve the problem of finding a set of near-
    Table 3: Pairwise comparison for HV indicator
                                                             optimal solutions that support developers in a FaaS
                NSGA-II      NSGA-III       SPEA2            environment to select an efficient resource allocation
    NSGA-II                    0.074         0.18            scheme with respect to cost and time. In our case
    NSGA-III                                 0.005           this was verified through the experimental process fol-
                                                             lowed; therefore, the question raised now, and will be
     SPEA2
                                                             the source of future research work, is whether the sup-
                                                             port obtained by the MOGAs is worth investigating
                                                             further in terms of real-time response, ”execution on
    Table 4: Pairwise comparison for IGD indicator           the fly as workloads come in” and the level to which
                                                             this support can be generalized to cover also workloads
                NSGA-II      NSGA-III       SPEA2            of unknown characteristics.
    NSGA-II                    0.241          0.18
    NSGA-III                                 0.285
     SPEA2                                                   Acknowledgments

                                                          The paper is part of the outcomes of the Twinning
                                                          project Dossier Cloud. This project has received fund-
wards. The same behaviour is also observed for NSGA- ing from the European Union’s Horizon 2020 research
II but from the third measurement onwards. On the and innovation programme under grant agreement No.
contrary, NSGA-III presents more fluctuations in its 692251.
measurements.
   In the case of the IGD indicator, things appear more
complicated since none of the algorithms seems to pre-
                                                          References
vail. A notable point is that for the lowest measure of [1] Fox GC, Ishakian V, Muthusamy V, Slominski A.
the fitness evaluations numbers, SPEA2 clearly outper-         Status of serverless computing and function-as-
forms the others. The values for NSGA-II and SPEA2             a-service (faas) in industry and research. arXiv
starting from the third measurement onwards are fully          preprint arXiv:170808028. 2017;.
identical, while NSGA-III shows ups and downs and in
three cases seems better than the other two.               [2] Adzic G, Chatley R. Serverless computing: eco-
   The Wilcoxon signed-rank test was applied on both           nomic and architectural impact. In: Proceedings
quality indicators to detect whether or not a statisti-        of the 2017 11th Joint Meeting on Foundations of
cally significant difference exist among the three algo-       Software Engineering. ACM; 2017. p. 884–889.
rithms. To handle the family-wise error rate accumu-
lated, p-values were adjusted using a post-hoc Holm [3] AWS Lambda;. Accessed: 2018-09-30. https:
procedure. The p-values resulting from the pairwise            //aws.amazon.com/lambda/.
comparison for HV and IGD indicators are shown in
Tables 3 and 4 respectively where pairs of algorithms [4] IBM Cloud Functions;. Accessed: 2018-09-30.
with a statistically significant difference (p <0.05) are      https://www.ibm.com/cloud/functions.
shown in boldface. According to the pairwise com-
parisons, no significant difference is observed between [5] Google Cloud Functions;. Accessed: 2018-09-30.
NSGA-II and NSGA-III, NSGA-II and SPEA2, and                   https://cloud.google.com/functions/.
NSGA-III and SPEA2 in either indicator except only
for the pair NSGA-III and SPEA2 in the case of HV.         [6] Microsoft Azure Functions;. Accessed: 2018-
                                                               09-30. https://azure.microsoft.com/en-us/
                                                               services/functions/.
6    Conclusions and Future Work
                                                              [7] Baldini I, Castro P, Chang K, Cheng P, Fink S,
This research work performed a preliminary investiga-             Ishakian V, et al. Serverless computing: Current
tion to assess whether heuristic approaches for multi-            trends and open problems. In: Research Advances
objective optimization, like the specific MOGAs used,             in Cloud Computing. Springer; 2017. p. 1–20.

                                                         7
 [8] HoseinyFarahabady M, Taheri J, Tari Z, Zomaya [18] Riquelme N, Von Lücken C, Baran B. Perfor-
     AY. A dynamic resource controller for a lambda          mance metrics in multi-objective optimization. In:
     architecture. In: Parallel Processing (ICPP), 2017      Computing Conference (CLEI), 2015 Latin Amer-
     46th International Conference on. IEEE; 2017. p.        ican. IEEE; 2015. p. 1–11.
     332–341.
                                                        [19] Zitzler E, Thiele L. Multiobjective evolutionary
 [9] Hoang A. Analysis of microservices and server-          algorithms: a comparative case study and the
     less architecture for mobile application enable-        strength Pareto approach. IEEE transactions on
     ment (PhD Dissertation). California State Uni-          Evolutionary Computation. 1999;3(4):257–271.
     versity, Northridge; 2017.
                                                        [20] Van Veldhuizen DA, Lamont GB.              Multi-
[10] Lee H, Satyam K, Fox G. Evaluation of produc-           objective evolutionary algorithms: Analyzing
     tion serverless computing environments. In: 2018        the state-of-the-art. Evolutionary computation.
     IEEE 11th International Conference on Cloud             2000;8(2):125–147.
    Computing (CLOUD). IEEE; 2018. p. 442–450.

[11] Lloyd W, Ramesh S, Chinthalapati S, Ly L, Pal-
     lickara S. Serverless computing: An investigation
     of factors influencing microservice performance.
     In: Cloud Engineering (IC2E), 2018 IEEE Inter-
     national Conference on. IEEE; 2018. p. 159–169.

[12] Back T, Andrikopoulos V. Using a Microbench-
     mark to Compare Function as a Service Solutions.
     In: European Conference on Service-Oriented and
     Cloud Computing. Springer; 2018. p. 146–160.

[13] Spillner J. Snafu: Function-as-a-service (faas)
     runtime design and implementation.       arXiv
     preprint arXiv:170307562. 2017;.

[14] Hong S, Srivastava A, Shambrook W, Dumitras
     T. Go serverless: securing cloud via serverless
     design patterns. In: 10th {USENIX} Workshop on
     Hot Topics in Cloud Computing (HotCloud 18).
     USENIX; 2018. .

[15] Deb K, Pratap A, Agarwal S, Meyarivan T. A
     fast and elitist multiobjective genetic algorithm:
     NSGA-II. IEEE transactions on evolutionary com-
     putation. 2002;6(2):182–197.

[16] Deb K, Jain H. An evolutionary many-objective
     optimization algorithm using reference-point-
     based non dominated sorting approach, part
     I: Solving problems with box constraints.
     IEEE     Trans    Evolutionary  Computation.
     2014;18(4):577–601.

[17] Zitzler E, Laumanns M, Thiele L. SPEA2: Im-
     proving the strength Pareto evolutionary algo-
     rithm. TIK-report. 2001;103.

                                                          8