<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Splitting Monolithic Workflows Into Serverless Functions and Estimating Their Run-Time in the Earth Observation Domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dennis Kaiser</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bohdan Dovhan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>André Bauer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samuel Kounev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Würzburg</institution>
          ,
          <addr-line>Sanderring 2, 97070 Würzburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the Earth observation domain, monolithic legacy workflows are still prevalent today. As a result, the runtimes of such scientific workflows range from hours to weeks or even months, depending on the area under investigation and/or the architecture of the algorithms. Since minor optimizations and improved scalability can reduce runtimes considerably, we envision, in the long term, a novel full-stack execution platform for Earth observation scientific workflows. This work introduces two cornerstones of our vision: (i) Splitting monolithic workflows into smaller, more scalable, and more manageable functions. More precisely, we aim for workflows composed of serverless functions, since the serverless paradigm relieves the user of operational concerns (NoOps). (ii) Estimating the runtime of the extracted functions based on different aspects to enable optimal scheduling. Firstly, we have to investigate different aspects of porting workflows to serverless functions. However, before we can extract serverless functions from these legacy monoliths, which mainly comprise serial program code, we have to solve the essential intermediate step of splitting the scientific workflow into parallelizable parts or applying other measures to improve scalability and runtime. As part of the solution, we pursue parallelization because it offers many opportunities for speedup and eases distribution onto different threads and/or processor cores during execution. In recent decades, the per-generation speedup of single CPU cores has flattened, whereas the number of cores has increased [1, 2, 3], further promoting the trend to invest in parallel software development. Another essential part of a possible solution is the optimization or reduction of serial code. As Amdahl's law [4] shows, code that must be processed serially has significant implications for the speedup of the system, even if it accounts for only 10% of the workflow's execution time: the achievable speedup is then capped at 10, no matter how many cores are added. 
Therefore, we primarily have to shrink the share of serial code and optimize the remainder while keeping parallelization techniques in mind. After optimizing, parallelizing, and reducing the serial code, our vision is to construct a graph for the specific workflow, split off nodes using a predefined ruleset that corresponds to the previously applied strategies, and, in a final step, map the resulting nodes to functions that can be executed according to the serverless paradigm.</p>
      </abstract>
      <kwd-group>
        <kwd>Serverless</kwd>
        <kwd>serverless functions</kwd>
        <kwd>runtime estimation</kwd>
        <kwd>monolithic workflows</kwd>
        <kwd>splitting monoliths</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title></title>
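      <p>As a worked illustration of the Amdahl's law argument in the abstract, the following minimal Python sketch evaluates the classic bound S(N) = 1 / (s + (1 - s) / N). Here s is the serial fraction of the single-core execution time and N is the number of cores. The function name amdahl_speedup is our own and is not part of the envisioned platform.</p>
```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Return the theoretical speedup predicted by Amdahl's law.

    serial_fraction: share of the single-core execution time that cannot be
                     parallelized (between 0 and 1).
    cores:           number of cores available to the parallel share.
    """
    if not 1.0 >= serial_fraction >= 0.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if not cores >= 1:
        raise ValueError("cores must be a positive integer")
    parallel_fraction = 1.0 - serial_fraction
    # The serial share always takes the same time; only the parallel share shrinks with more cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # A workflow whose execution time is 10% serial, as in the example from the text.
    for n in (1, 8, 64, 1024):
        print(f"{n:5d} cores -> speedup {amdahl_speedup(0.1, n):.2f}")
    # Upper bound as the number of cores grows without limit: 1 / serial_fraction.
    print(f"limit -> {1 / 0.1:.1f}")
```
      <p>For a serial fraction of 10%, the computed speedup is about 4.7 on 8 cores and about 8.8 on 64 cores. It can never exceed 1 / 0.1 = 10, which is why reducing the serial share matters more than adding cores.</p>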
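      <p>The graph-based splitting step from the abstract could, for instance, be approached as in the following Python sketch. It uses the standard library's graphlib.TopologicalSorter (Python 3.9 or later). Everything workflow-specific is hypothetical: the task names, the dependency graph, and the single rule applied here. That rule says that tasks whose dependencies are all satisfied at the same point form one stage and can run as parallel serverless invocations. It merely stands in for the predefined ruleset, which the text does not specify.</p>
```python
from graphlib import TopologicalSorter

# Hypothetical Earth observation workflow: each task maps to the set of tasks it depends on.
WORKFLOW = {
    "download_scenes": set(),
    "cloud_mask_tile_a": {"download_scenes"},
    "cloud_mask_tile_b": {"download_scenes"},
    "ndvi_tile_a": {"cloud_mask_tile_a"},
    "ndvi_tile_b": {"cloud_mask_tile_b"},
    "mosaic": {"ndvi_tile_a", "ndvi_tile_b"},
}


def split_into_stages(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; tasks within one stage have no mutual dependencies.

    Each stage can be fanned out as independent serverless function invocations,
    and a stage starts only after all functions of the previous stages have finished.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(split_into_stages(WORKFLOW)):
        print(f"stage {i}: {stage}")
```
      <p>For the hypothetical graph, the sketch yields four stages: the download, the two cloud-masking tasks, the two NDVI tasks, and the final mosaic. Each task of a stage would become a separate function invocation. The per-tile work therefore fans out, while the serial download and mosaic steps remain visible as the bottlenecks that Amdahl's law warns about.</p>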
      <p>
        Secondly, we have to examine approaches to estimate the runtime of serverless functions
by leveraging decision-making procedures based on workload evaluation, custom metrics,
and system feedback, followed by a time-versus-cost analysis. The importance of
workload evaluation and runtime estimation for serverless functions in emerging serverless
computing has been highlighted in [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. Runtime estimation requires continuous and holistic scheduling and validation
of input data, which are essential for reliable performance. We will review and analyze
the metrics used in estimation processes and compare existing decision-making procedures.
These procedures play a crucial role in ensuring flawless function execution. By
combining the decision-making feedback with metrics and, where applicable, third-party
parameters, the estimation procedure forms a self-sufficient cycle. Continuously improving
this feedback and integrating third-party AI prediction engines and/or static parameters
tailored to specific industry needs are potential directions for future research.
The initial bootstrap procedure can have multiple implementations. However, it should
follow a surplus (over-provisioning) principle to avoid losing function calls or being unable to
accommodate the requested workload. We have to ensure that the runtime estimation
process does not significantly delay the overall function execution. Thus,
the estimation should run as an independent, parallel process, which might require
dedicated resources. We will also examine cost prediction for serverless computing and its
importance for business decision-making and cost/time analysis [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In addition, we aim to
outline the challenges we are facing and discuss their potential remedies, future work,
and the general applicability of our approach.
      </p>
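      <p>The estimation cycle described above (estimate, execute, feed the observed runtime back) can be combined with a conservative bootstrap phase and a time-versus-cost decision. It might be sketched in Python as follows. This is only an illustration under our own assumptions. The exponentially weighted moving average (EWMA) update, the safety factor, the minimum number of samples, the memory sizes, and the price per GB-second are hypothetical choices, not values taken from the cited works. The cost model follows the common pay-per-use scheme of billed memory multiplied by duration.</p>
```python
from dataclasses import dataclass, field

# Hypothetical price per GB-second; real providers publish their own rates.
PRICE_PER_GB_SECOND = 0.0000166667


@dataclass
class RuntimeEstimator:
    """Runtime estimate for one function configuration, refined by execution feedback."""

    bootstrap_runtime_s: float  # generous initial guess used before enough feedback exists
    safety_factor: float = 2.0  # over-provisioning multiplier during the bootstrap phase
    alpha: float = 0.3  # weight of the newest observation in the EWMA
    min_samples: int = 3  # observations required before trusting the EWMA alone
    _estimate_s: float = field(default=0.0, init=False)
    _samples: int = field(default=0, init=False)

    def observe(self, runtime_s: float) -> None:
        """Feed back the measured runtime of one finished invocation."""
        if self._samples == 0:
            self._estimate_s = runtime_s
        else:
            self._estimate_s = self.alpha * runtime_s + (1 - self.alpha) * self._estimate_s
        self._samples += 1

    def estimate(self) -> float:
        """Runtime in seconds to plan with."""
        if self.min_samples > self._samples:
            # Bootstrap phase: stay on the safe side to avoid timeouts and lost calls.
            return max(self.bootstrap_runtime_s, self._estimate_s) * self.safety_factor
        return self._estimate_s


def estimated_cost(runtime_s: float, memory_mb: int) -> float:
    """Pay-per-use cost: billed memory (GB) times duration (s) times unit price."""
    return (memory_mb / 1024) * runtime_s * PRICE_PER_GB_SECOND


def choose_configuration(estimators: dict[int, RuntimeEstimator], deadline_s: float) -> int:
    """Return the cheapest memory size (MB) whose estimated runtime meets the deadline."""
    feasible = {
        memory_mb: estimated_cost(est.estimate(), memory_mb)
        for memory_mb, est in estimators.items()
        if deadline_s >= est.estimate()
    }
    if not feasible:
        # No configuration meets the deadline: fall back to the fastest one.
        return min(estimators, key=lambda memory_mb: estimators[memory_mb].estimate())
    return min(feasible, key=feasible.get)


if __name__ == "__main__":
    # Hypothetical observed runtimes (s) per memory size (MB) from earlier invocations.
    observations = {512: [40.0, 42.0, 41.0], 1024: [21.0, 22.0, 20.5], 2048: [12.0, 11.5, 12.5]}
    estimators = {}
    for memory_mb, runtimes in observations.items():
        estimator = RuntimeEstimator(bootstrap_runtime_s=60.0)
        for runtime_s in runtimes:
            estimator.observe(runtime_s)
        estimators[memory_mb] = estimator
    print(choose_configuration(estimators, deadline_s=30.0))
```
      <p>With the hypothetical observations, both the 1024 MB and the 2048 MB configurations meet the 30 s deadline. The sketch selects 1024 MB because roughly 21 s at 1 GB is cheaper than roughly 12 s at 2 GB. During the bootstrap phase, the doubled estimate deliberately trades cost for a lower risk of timeouts.</p>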
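      <p>To keep the estimation off the critical path, the execution side can merely hand its measurements to an independent estimator. The following Python sketch assumes a single background thread fed through a queue.Queue. The class name BackgroundEstimator and the stand-in function body are hypothetical. In a real deployment, the thread could be replaced by a separate process or service with its own dedicated resources.</p>
```python
import queue
import threading
import time
from typing import Optional


class BackgroundEstimator:
    """Keeps per-function runtime estimates, updated in a separate worker thread."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha  # weight of the newest observation in the EWMA
        self._estimates: dict[str, float] = {}
        self._lock = threading.Lock()
        self._feedback: queue.Queue = queue.Queue()
        # Daemon thread: the estimator never keeps the interpreter alive on its own.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def report(self, function_name: str, runtime_s: float) -> None:
        """Called on the execution path; only enqueues the measurement."""
        self._feedback.put_nowait((function_name, runtime_s))

    def estimate(self, function_name: str) -> Optional[float]:
        """Latest runtime estimate in seconds, or None if no feedback arrived yet."""
        with self._lock:
            return self._estimates.get(function_name)

    def wait_until_idle(self) -> None:
        """Block until all queued measurements have been processed."""
        self._feedback.join()

    def _run(self) -> None:
        while True:
            name, runtime_s = self._feedback.get()
            with self._lock:
                previous = self._estimates.get(name)
                if previous is None:
                    self._estimates[name] = runtime_s
                else:
                    self._estimates[name] = self.alpha * runtime_s + (1 - self.alpha) * previous
            self._feedback.task_done()


def run_function(name: str, payload: list[float], estimator: BackgroundEstimator) -> float:
    """Hypothetical function wrapper: executes the work, then reports its runtime."""
    start = time.perf_counter()
    result = sum(payload)  # stand-in for the actual serverless function body
    estimator.report(name, time.perf_counter() - start)
    return result


if __name__ == "__main__":
    estimator = BackgroundEstimator()
    for _ in range(5):
        run_function("ndvi_tile", [0.1] * 100_000, estimator)
    estimator.wait_until_idle()
    print(f"estimated runtime of ndvi_tile: {estimator.estimate('ndvi_tile'):.6f} s")
```
      <p>The only work added to each invocation is a non-blocking put_nowait call. The EWMA update happens in the worker thread, and wait_until_idle exists mainly to make the sketch testable.</p>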
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Waldrop</surname>
          </string-name>
          ,
          <article-title>More than Moore</article-title>
          ,
          <source>Nature</source>
          <volume>530</volume>
          (
          <year>2016</year>
          )
          <fpage>144</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gepner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Kowalik</surname>
          </string-name>
          ,
          <article-title>Multi-core processors: New way to achieve high system performance</article-title>
          , in:
          <source>International Symposium on Parallel Computing in Electrical Engineering (PARELEC'06)</source>
          , IEEE,
          <year>2006</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Vetter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>DeBenedictis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Conte</surname>
          </string-name>
          ,
          <article-title>Architectures for the post-Moore era</article-title>
          ,
          <source>IEEE Micro</source>
          <volume>37</volume>
          (
          <year>2017</year>
          )
          <fpage>6</fpage>
          -
          <lpage>8</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/MM.2017.3211127</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Amdahl</surname>
          </string-name>
          ,
          <article-title>Validity of the single processor approach to achieving large scale computing capabilities</article-title>
          , in:
          <source>Proceedings of the April 18-20, 1967, Spring Joint Computer Conference</source>
          , AFIPS '67 (Spring),
          <publisher-name>Association for Computing Machinery</publisher-name>
          ,
          <publisher-loc>New York, NY, USA</publisher-loc>
          ,
          <year>1967</year>
          , pp.
          <fpage>483</fpage>
          -
          <lpage>485</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1145/1465482.1465560</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chirkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Belloum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kovalchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Makkes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Melnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Visheratin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nasonov</surname>
          </string-name>
          ,
          <article-title>Execution time estimation for workflow scheduling</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>75</volume>
          (
          <year>2017</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1016/j.future.2017.01.011</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Akhtar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ishakian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Matta</surname>
          </string-name>
          ,
          <article-title>COSE: Configuring serverless functions using statistical learning</article-title>
          ,
          <year>2020</year>
          , pp.
          <fpage>129</fpage>
          -
          <lpage>138</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/INFOCOM41043.2020.9155363</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Eismann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grohmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Eyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Herbst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kounev</surname>
          </string-name>
          ,
          <article-title>Cost prediction of serverless workflows</article-title>
          ,
          <year>2020</year>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3358960.3379133</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>