Towards Splitting Monolithic Workflows Into
Serverless Functions and Estimating Their Run-Time
in the Earth Observation Domain
Dennis Kaiser¹, Bohdan Dovhan¹, André Bauer, and Samuel Kounev
University of Würzburg, Sanderring 2, 97070 Würzburg, Germany
¹ Both authors contributed equally to this paper

Keywords
Serverless, serverless functions, runtime estimation, monolithic workflows, splitting monoliths


   In the Earth observation domain, monolithic legacy workflows are still prevalent to this
day. As a result, runtimes of such scientific workflows range from hours to weeks or months,
depending on the area being investigated and/or the architecture of the algorithms. As even
minor optimizations and improved scalability can reduce runtimes considerably, we envision,
in the long term, a novel full-stack execution platform for Earth observation scientific
workflows. This work introduces two cornerstones of our vision: (i) Splitting monolithic
workflows into smaller, more scalable, and more manageable functions. More precisely, we aim
for workflows consisting of serverless functions, since these free the user from the
operational part (NoOps). (ii) Estimating the runtime of the extracted functions based on
different aspects to enable optimal scheduling.
   Firstly, we have to investigate the different aspects of porting workflows to serverless
functions. However, before we can extract serverless functions from these legacy monoliths,
which mainly comprise serial program code, we have to solve the essential intermediate step
of splitting the scientific workflow into parallelizable parts or applying other measures to
improve scalability and runtime. Parallelization is sought as part of the solution because it
offers many possibilities for speedup and allows for easier distribution onto different
threads and/or processor cores during execution. In recent decades, the per-generation speedup
of single CPU cores has flattened, while the number of cores has increased [1, 2, 3], further
promoting the trend to invest in the development of parallel software. Another essential part
of a possible solution is the optimization or reduction of serial code. As Amdahl’s law [4]
shows, code that must be processed serially, even if it represents only 10% of the overall
workflow code, has significant implications for the speedup of the system. Therefore, we
primarily have to shrink the percentage of serial code and assess optimizations for the
remainder while keeping parallelization techniques in mind. After optimizing, parallelizing, and reducing the serial code,
our vision is to construct graphs for the specific workflow, split off nodes by using a predefined
ruleset that correlates with the previously used strategies, and in a final step map these new
resulting nodes to functions that can be executed according to the serverless paradigm.

SSP’21: Symposium on Software Performance, November 09–10, 2021, Leipzig, Germany
dennis.kaiser@uni-wuerzburg.de (D. Kaiser); bohdan.dovhan@uni-wuerzburg.de (B. Dovhan);
andre.bauer@uni-wuerzburg.de (A. Bauer); samuel.kounev@uni-wuerzburg.de (S. Kounev)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

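
To make the effect of the serial fraction under Amdahl’s law [4] concrete, the following minimal sketch computes the theoretical speedup bound 1 / (s + (1 - s) / n) for a serial fraction s and n parallel workers. The function name and the chosen worker counts are illustrative assumptions and not part of our platform.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Theoretical speedup bound according to Amdahl's law.

    serial_fraction: share of the work that cannot be parallelized (0..1).
    workers: number of parallel workers (threads, cores, or functions).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Illustrative worker counts (assumption); 10% serial code as in the text.
    for n in (2, 8, 64, 1024):
        print(f"{n:5d} workers -> speedup <= {amdahl_speedup(0.10, n):.2f}")
```

With 10% serial code, the bound can never reach 10 regardless of how many workers are added, which is why reducing the serial share comes first.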
   Secondly, we have to examine approaches to estimate the runtime of serverless functions
by leveraging decision-making procedures based on workload evaluation, custom metrics,
and system feedback, followed by a time-versus-cost estimation analysis. The importance of
workload evaluation and serverless function runtime estimation in emerging serverless
computing was presented in [5, 6]. Runtime estimation requires constant and holistic
scheduling and validation of input data, both of which play a crucial role in reliable
performance. We will review and analyze the metrics of estimation processes as well as
research and compare decision-making procedures. These procedures play a crucial role in
ensuring flawless function execution. By combining decision-making feedback with metrics and,
where applicable, third-party parameters, the estimation procedure forms a self-sufficient
cycle. Continuous feedback improvement and the integration of third-party AI prediction
engines and/or static parameters in accordance with specific industry needs are potential
areas for future research. The initial bootstrap procedure can have multiple implementations.
However, it should follow the superfluous principle, that is, err on the side of surplus
capacity, to avoid losing function calls or being unable to accommodate the requested
workload. We have to ensure that the runtime estimation process does not cause any significant
delay in the overall function execution time. Thus, the estimation process should be an
independent, parallel process that might require dedicated resources. We will also look at the
cost prediction of serverless computing and its importance for business decision-making and
cost/time analysis [7]. In addition, we aim to show the challenges that we are facing and
discuss potential remedies, future work, and the general applicability of our approach.
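
The feedback cycle described above can be illustrated with a small sketch. It assumes an exponentially weighted moving average as the estimator, which is a stand-in chosen for brevity and not the estimation method we propose. The class name, the function name "tile_ndvi", and all numeric values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RuntimeEstimator:
    """Per-function runtime estimate refined by measured feedback (sketch)."""

    bootstrap_seconds: float  # deliberately generous initial guess (assumption)
    alpha: float = 0.3        # weight of the newest measurement (assumption)
    _estimates: Dict[str, float] = field(default_factory=dict, init=False)

    def estimate(self, function_name: str) -> float:
        # Before any feedback exists, fall back to the bootstrap value.
        return self._estimates.get(function_name, self.bootstrap_seconds)

    def observe(self, function_name: str, measured_seconds: float) -> None:
        # Feed a measured runtime back into the estimate.
        previous = self._estimates.get(function_name)
        if previous is None:
            self._estimates[function_name] = measured_seconds
        else:
            self._estimates[function_name] = (
                self.alpha * measured_seconds + (1.0 - self.alpha) * previous
            )


if __name__ == "__main__":
    estimator = RuntimeEstimator(bootstrap_seconds=60.0, alpha=0.5)
    print(estimator.estimate("tile_ndvi"))  # bootstrap value, no feedback yet
    estimator.observe("tile_ndvi", 10.0)
    estimator.observe("tile_ndvi", 20.0)
    print(estimator.estimate("tile_ndvi"))  # blended from both measurements
```

Because the update is a constant-time dictionary operation, such an estimator could run alongside function execution without adding noticeable delay. A real system would also have to account for input size, cold starts, and resource configuration.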


References
[1] M. M. Waldrop, More than Moore, Nature 530 (2016) 144–148.
[2] P. Gepner, M. F. Kowalik, Multi-core processors: New way to achieve high system
    performance, in: International Symposium on Parallel Computing in Electrical Engineering
    (PARELEC’06), IEEE, 2006, pp. 9–13.
[3] J. S. Vetter, E. P. DeBenedictis, T. M. Conte, Architectures for the post-Moore era, IEEE
    Micro 37 (2017) 6–8. doi:10.1109/MM.2017.3211127.
[4] G. M. Amdahl, Validity of the single processor approach to achieving large scale computing
    capabilities, in: Proceedings of the April 18-20, 1967, Spring Joint Computer Conference,
    AFIPS ’67 (Spring), Association for Computing Machinery, New York, NY, USA, 1967,
    pp. 483–485. doi:10.1145/1465482.1465560.
[5] A. Chirkin, A. Belloum, S. Kovalchuk, M. Makkes, M. Melnik, A. Visheratin, D. Nasonov,
    Execution time estimation for workflow scheduling, Future Generation Computer Systems
    75 (2017). doi:10.1016/j.future.2017.01.011.
[6] N. Akhtar, A. Raza, V. Isahagian, I. Matta, COSE: Configuring serverless functions using
    statistical learning, 2020, pp. 129–138. doi:10.1109/INFOCOM41043.2020.9155363.
[7] S. Eismann, J. Grohmann, E. Eyk, N. Herbst, S. Kounev, Cost prediction of serverless
    workflows, 2020. doi:10.1145/3358960.3379133.