<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Serverless Execution of Scientific Workflows - HyperFlow Case Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maciej Malawski</string-name>
          <email>malawski@agh.edu.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AGH University of Science and Technology Department of Computer Science Krakow</institution>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>25</fpage>
      <lpage>33</lpage>
      <abstract>
        <p>Scientific workflows consisting of a high number of dependent tasks represent an important class of complex scientific applications. Recently, a new type of serverless infrastructure has emerged, represented by such services as Google Cloud Functions or AWS Lambda. In this paper we take a look at such serverless infrastructures, which are designed mainly for processing background tasks of Web applications. We evaluate their applicability to more compute- and data-intensive scientific workflows and discuss possible ways to repurpose serverless architectures for execution of scientific workflows. A prototype workflow executor function has been developed using Google Cloud Functions and coupled with the HyperFlow workflow engine. The function can run workflow tasks on the Google infrastructure, and features such capabilities as data staging to/from Google Cloud Storage and execution of custom application binaries. We have successfully deployed and executed the Montage astronomical workflow, often used as a benchmark, and we report on initial results of performance evaluation. Our findings indicate that the simple mode of operation makes this approach easy to use, although there are costs involved in preparing portable application binaries for execution in a remote environment. While our evaluation uses a pre-production (alpha) version of the Google Cloud Functions platform, we find the presented approach highly promising. We also discuss possible future steps related to execution of scientific workflows in serverless infrastructures, and the implications with regard to resource management for scientific applications in general.</p>
      </abstract>
      <kwd-group>
        <kwd>Scientific workflows</kwd>
        <kwd>cloud</kwd>
        <kwd>serverless infrastructures</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Scientific workflows consisting of a high number of
dependent tasks represent an important class of complex
scientific applications that have been successfully deployed and
executed in traditional cloud infrastructures, including
Infrastructure as a Service (IaaS) clouds. Recently, a new type
of serverless infrastructure has emerged, represented by such
services as Google Cloud Functions (GCF) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or AWS Lambda [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
These services allow deployment of software in the form of
functions that are executed in the provider's infrastructure
in response to specific events such as new files being
uploaded to a cloud data store, messages arriving in queue
systems, or direct HTTP calls. This approach frees the user
from having to maintain a server, including configuration
and management of virtual machines, while resource
management is provided by the platform in an automated and
scalable way.
      </p>
      <p>In this paper we take a look at such serverless
infrastructures. Although designed mainly for processing
background tasks of Web applications, we nevertheless
investigate whether they can be applied to more compute- and
data-intensive scientific workflows. The main objectives of
this paper are as follows:</p>
      <list list-type="bullet">
        <list-item>
          <p>To present the main features of serverless
infrastructures, comparing them to traditional
Infrastructure-as-a-Service clouds,</p>
        </list-item>
        <list-item>
          <p>To discuss the options of using serverless
infrastructures for execution of scientific workflows,</p>
        </list-item>
        <list-item>
          <p>To present our experience with a prototype implemented
using the HyperFlow [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] workflow engine and Google Cloud
Functions (alpha version),</p>
        </list-item>
        <list-item>
          <p>To evaluate our approach using the Montage
workflow [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], a real-world astronomical application,</p>
        </list-item>
        <list-item>
          <p>To discuss the costs and benefits of this approach,
together with its implications for resource management
of scientific workflows in emerging infrastructures.</p>
        </list-item>
      </list>
      <p>The paper is organized as follows. We begin with an
overview of serverless infrastructures in Section 2. In
Section 3 we propose and discuss alternative options for
serverless architectures of scientific workflow systems. Our
prototype implementation, based on HyperFlow and GCF, is
described in Section 4. This is followed by evaluation
using the Montage application, presented in Section 5. We
discuss implications for resource management in Section 6
and present related work in Section 7. Section 8 provides a
summary and description of future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. OVERVIEW OF SERVERLESS CLOUDS</title>
      <p>Writing "serverless" applications is a recent trend, mainly
addressing Web applications. It frees programmers from
having to maintain a server; instead they can use a set
of existing cloud services directly from their application.
Examples of such services include cloud databases such as
Firebase or DynamoDB, messaging systems such as Google
Cloud Pub/Sub, notification services such as Amazon SNS,
and so on. When there is a need to execute custom
application code in the background, special "cloud functions"
(hereafter simply referred to as functions) can be called.
Examples of such function services are AWS Lambda and Google Cloud
Functions (GCF).</p>
      <p>Both Lambda and GCF are based on the functional
programming paradigm: a function is a piece of software that
can be deployed on the providers' cloud infrastructure and it
performs a single operation in response to an external event.</p>
      <p>Functions can be triggered by:</p>
      <list list-type="bullet">
        <list-item>
          <p>an event generated by the cloud infrastructure, e.g.
a change in a cloud database, a file being uploaded
to a cloud object store, a new item appearing in a
messaging system, or an action scheduled at a specified
time,</p>
        </list-item>
        <list-item>
          <p>a direct request from the application via HTTP or
cloud API calls.</p>
        </list-item>
      </list>
      <p>The cloud infrastructure which hosts the functions is
responsible for automatic provisioning of resources (including
CPU, memory, network and temporary storage), automatic
scaling when the number of function executions varies over
time, as well as monitoring and logging. The user is
responsible for providing executable code in a format required
by the framework. Typically, the execution environment is
limited to a set of supported languages: Node.js, Java and
Python in the case of AWS Lambda, and Node.js in the
case of GCF. The user has no control over the execution
environment, such as underlying operating system, version
of the runtime libraries, etc., but can use custom libraries
with package managers and even upload binary code to be
executed.</p>
      <p>Functions are thus different from Virtual Machines in IaaS
clouds, where the users have full control over the OS
(including root access) and can customize the execution
environment to their needs. On the other hand, functions free the
developers from the need to configure, maintain, and
manage server resources.</p>
      <p>Cloud providers impose certain limits on the amount of
resources a function can consume. In the case of AWS Lambda
these limits are as follows: temporary disk space: 512 MB,
number of processes and threads: 1024, maximum execution
duration per request: 300 seconds. There is also a limit of
100 concurrent executions per region, but this limit can be
increased on request. GCF, in its alpha version, does not
specify limit thresholds. There is, however, a timeout
parameter that can be provided when deploying a function and
the default value is 60 seconds.</p>
      <p>Functions are thus different from permanent and
stateful services, since they are not long-running processes, but
rather serve individual tasks. Resource limits indicate that
such cloud functions are not currently suitable for large-scale
HPC applications, but can be useful for high-throughput
computing workflows consisting of many fine-grained tasks.</p>
      <p>Functions have a fine-grained pricing model associated
with them. In the case of AWS Lambda, the price is $0.20
per 1 million requests and $0.00001667 for every GB-second
used, defined as CPU time multiplied by the amount of
memory used. There are also additional charges for data
transfer and storage (when DynamoDB or S3 is used). The
alpha version of Google Cloud Functions does not have a
public pricing policy.</p>
      <p>Serverless infrastructures can be cost-effective compared
to standard VMs. For example, the aggregate cost of
running AWS Lambda functions with 1 GB memory for 1 hour
is $0.060012. This is more expensive than the t2.micro
instance, which also has 1 GB of RAM but costs $0.013 per
hour. A t2.micro instance, however, offers only burstable
performance, which means only a fraction of CPU time per
hour is available. The smallest standard instance at AWS is
m3.medium, which costs $0.067 per hour, but gives 3.75 GB
of RAM. Cloud functions are thus more suitable for
variable load conditions, while standard instances can be more
economical for applications with stable workloads.</p>
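      <p>The $0.060012 figure can be reproduced directly from the quoted prices, as the short sketch below shows (prices are the ones cited above and change over time):</p>

```javascript
// Reproducing the cost comparison above (prices as quoted in the text).
const gbSecondPrice = 0.00001667;            // $ per GB-second (AWS Lambda)
const lambdaCost = 1 * 3600 * gbSecondPrice; // 1 GB for 1 hour = 3600 GB-s
console.log(lambdaCost.toFixed(6));          // "0.060012"

const t2MicroHourly = 0.013;   // burstable instance, 1 GB RAM
const m3MediumHourly = 0.067;  // smallest standard instance, 3.75 GB RAM
console.log(lambdaCost > t2MicroHourly);  // true
console.log(lambdaCost < m3MediumHourly); // true
```
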
    </sec>
    <sec id="sec-3">
      <title>3. OPTIONS FOR EXECUTION OF SCIENTIFIC WORKFLOWS IN SERVERLESS INFRASTRUCTURES</title>
      <p>In light of the identified features and limitations of
serverless infrastructures and cloud functions, we can discuss the
options of using them for execution of scientific workflows.
We will start with a traditional execution model in an IaaS
cloud with no cloud functions (1), then present the queue
model (2), direct executor model (3), bridge model (4), and
decentralized model (5). These options are schematically
depicted in Fig. 1, and discussed in detail further on.</p>
    </sec>
    <sec id="sec-6">
      <title>3.1 Traditional model</title>
      <p>The traditional model assumes the work ow is running
in a standard IaaS cloud. In this model the work ow
execution follows the well-known master-worker architecture,
where the master node runs a work ow engine, tasks that are
ready for execution are submitted to a queue, and worker
nodes process these tasks in parallel when possible. The
master node can be deployed in the cloud or outside of the
cloud, while worker nodes are usually deployed as VMs in
a cloud infrastructure. The worker pool is typically created
on demand and can be dynamically scaled up or down
depending on resource requirements.</p>
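      <p>The master-worker loop described above can be sketched in a few lines of Node.js. This is an in-memory stand-in: in HyperFlow the queue is RabbitMQ and the workers are AMQP Executors on separate VMs.</p>

```javascript
// Toy in-memory sketch of the master-worker pattern: the master pushes
// ready tasks to a queue; workers pull and process them until it drains.
const queue = [];
const results = [];

function master(readyTasks) {
  for (const t of readyTasks) queue.push(t); // submit tasks ready to run
}

async function worker(id) {
  while (queue.length > 0) {
    const task = queue.shift();
    results.push({ task, worker: id }); // stand-in for actual execution
  }
}

master(['mProjectPP-1', 'mProjectPP-2', 'mDiffFit-1']);
Promise.all([worker(1), worker(2)]).then(() => console.log(results.length)); // 3
```
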
      <p>
        Such a model is represented e.g. by Pegasus and
HyperFlow. The Pegasus Workflow Management System [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] uses
HTCondor [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to maintain its queue and manage
workers. HyperFlow [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a lightweight workflow engine based
on Node.js; it uses RabbitMQ as its queue and AMQP
Executors on worker nodes. The deployment options of
HyperFlow on grids and clouds are discussed in detail in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        In this model the user is responsible for management of
resources comprising the worker pool. The pool can be
provisioned statically, which is commonly done in practice, but
there is also ongoing research on automatic or dynamic
resource provisioning for workflow applications [
        <xref ref-type="bibr" rid="ref13 ref15">13, 15</xref>
        ], which
is a non-trivial task.
      </p>
      <p>
        In the traditional cloud work ow processing model there
is a need for some storage service to store input, output
and temporary data. There are multiple options for data
sharing [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], but one of the most widely used approaches is to
rely on existing cloud storage, such as Amazon S3 or Google
Cloud Storage. This option has the advantage of providing
a permanent store so that data is not lost after the workflow
execution is complete and the VMs are terminated.
      </p>
    </sec>
    <sec id="sec-7">
      <title>3.2 Queue model</title>
      <p>This model is similar to the traditional model: the master
node and the queue remain unchanged, but the worker is
replaced by a cloud function. Instead of running a pool
of VMs with workers, a set of cloud functions is spawned.
Each function fetches a task from a queue and processes it,
returning results via the queue.</p>
      <p>The main advantage of this model is its simplicity, since
it only requires changes in the worker module. This may
be simple if the queue uses a standard protocol, such as
AMQP in the case of the HyperFlow Executor, but in the case
of Pegasus and HTCondor a Condor daemon (condor_startd)
must run on the worker node and communicate using a
proprietary Condor protocol. In this scenario, implementing a
worker as a cloud function would require more effort.</p>
      <p>Another advantage of the presented model is the ability
to combine the workers implemented as functions with other
workers running e.g. in a local cluster or in a traditional
cloud. This would also enable concurrent usage of cloud
functions from multiple providers (e.g. AWS and Google)
when such a multi-cloud scenario is required.</p>
      <p>An important issue associated with the queue model is
how to trigger the execution of the functions. If a native
implementation of the queue is used (e.g. RabbitMQ as in
HyperFlow), it is necessary to trigger a function for each task
added to the queue. This can be done by the workflow engine
or by a dedicated queue monitoring service. Other options
include periodic function execution or recursive execution:
a function can itself trigger other functions once it finishes
processing data.</p>
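      <p>The "one function invocation per queued task" option can be sketched as follows. Both the queue and the trigger are stubbed in memory here; in a real deployment the trigger would be an HTTP call to the cloud function.</p>

```javascript
// Sketch of a queue-monitoring service that triggers one cloud function
// invocation per task added to the queue (trigger is stubbed in memory).
function makeQueueMonitor(triggerFunction) {
  const queue = [];
  return {
    enqueue(task) {        // called when the workflow engine submits a task
      queue.push(task);
      triggerFunction(queue.shift()); // one invocation per queued task
    },
  };
}

const invoked = [];
const monitor = makeQueueMonitor(task => invoked.push(task)); // HTTP-call stub
monitor.enqueue('mBackground-1');
monitor.enqueue('mBackground-2');
console.log(invoked); // ['mBackground-1', 'mBackground-2']
```
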
      <p>To ensure a clean serverless architecture another option is
to implement the queue using a native cloud service which is
already integrated with cloud functions. In the case of AWS
Lambda one could implement the queue using DynamoDB:
here, a function could be triggered by adding a new item to
a task table. In the case of GCF, the Google Cloud Pub/Sub
service can be used for the same purpose. Such a solution,
however, would require more changes in the workflow engine
and would not be easy to deploy in multi-cloud scenarios.</p>
    </sec>
    <sec id="sec-8">
      <title>3.3 Direct executor model</title>
      <p>This is the simplest model and requires only a workflow
engine and a cloud function that serves as a task executor.
It eliminates the need for a queue, since the workflow
engine can trigger the cloud function directly via API/HTTP
calls. Regarding development effort, it requires changes in
the master and a new implementation of the worker.</p>
      <p>An advantage of this model is its cleanness and
simplicity, but these come at the cost of tight master-worker
coupling. Accordingly, it becomes more difficult to implement
the multi-cloud scenario, since the workflow engine would
need to be able to dispatch tasks to multiple cloud function
providers.</p>
    </sec>
    <sec id="sec-9">
      <title>3.4 Bridge model</title>
      <p>This solution is more complex but it preserves the
decoupling of the master from the worker, using a queue. In
this case the master and the queue remain unchanged, but
a new type of bridge worker is added. It fetches tasks from
the queue and dispatches them to the cloud functions. Such
a worker needs to run as a separate service (daemon) and
can trigger cloud functions using the provider-specific API.</p>
      <p>The decoupling of the master from the worker allows for
more complex and flexible scenarios, including multi-cloud
deployments. A set of bridge workers can be spawned, each
dispatching tasks to a different cloud function provider.
Moreover, a pool of workers running in external distributed
platforms, such as third-party clouds or clusters, can be used
together with cloud functions.</p>
    </sec>
    <sec id="sec-10">
      <title>3.5 Decentralized model</title>
      <p>This model re-implements the whole workflow engine in a
distributed way using cloud functions. Each task of a
workflow is processed by a separate function. These functions
can be triggered by (a) new data items uploaded to cloud
storage, or (b) other cloud functions, i.e. predecessor tasks
triggering their successor tasks following completion.
Option (a) can be used to represent data dependencies in a
workflow, while option (b) can be used to represent control
dependencies.</p>
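      <p>Option (b) amounts to each completing task decrementing its successors' remaining-predecessor counts and triggering those that become ready. A toy sketch (graph and counts are illustrative; in a real deployment each task would be a separate cloud function):</p>

```javascript
// Toy sketch of control dependencies in the decentralized model: a
// completed task "triggers" each successor whose predecessors are all done.
const successors = { A: ['B', 'C'], B: ['D'], C: ['D'], D: [] };
const preds = { A: 0, B: 1, C: 1, D: 2 }; // remaining predecessor counts
const order = [];

function complete(task) {
  order.push(task); // stand-in for the task's cloud function finishing
  for (const s of successors[task]) {
    if (--preds[s] === 0) complete(s); // last predecessor triggers successor
  }
}

complete('A');
console.log(order); // ['A', 'B', 'C', 'D']
```
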
      <p>In the decentralized model the structure and state of
workflow execution have to be preserved in the system. The
system can be implemented in a fully distributed way, by
deploying a unique function for each task in the workflow. In
this way, the workflow structure is mapped to a set of
functions and the execution state propagates by functions being
triggered by their predecessors. Another option is to deploy
a generic task executor function and maintain the workflow
state in a database, possibly one provided as a cloud service.</p>
      <!-- Figure 2 (prototype architecture) component labels: HyperFlow Engine, GCF Command, GCF Executor, Storage Client, Google Cloud Functions, Worker, Storage -->
      <p>The advantages of the decentralized approach include fully
distributed and serverless execution, without the need to
maintain a workflow engine. The required development
effort is extensive, since it requires re-implementation of the
whole workflow engine. A detailed design of such an
engine is out of scope of this paper, but remains an interesting
subject of future research.</p>
    </sec>
    <sec id="sec-11">
      <title>3.6 Summary of options</title>
      <p>As we can see, cloud functions provide multiple
integration options with scientific workflow engines. The users need
to decide which option is best for them based on their
requirements, most notably the allowed level of coupling
between the workflow engine and the infrastructure and the
need to run hybrid or cross-cloud deployments where
resources from more than one provider are used in parallel.
We consider the fully decentralized option an interesting
direction for future research, while in the following sections we
focus on our experience with a prototype implemented
using the direct executor model.</p>
    </sec>
    <sec id="sec-12">
      <title>4. PROTOTYPE BASED ON HYPERFLOW</title>
      <p>To evaluate the feasibility of our approach we decided to
develop a prototype using the HyperFlow engine and Google
Cloud Functions, applying the direct executor model. This
decision has several reasons. First, HyperFlow is
implemented in Node.js, while GCF supports Node.js as a native
function execution environment. This good match
simplifies development and debugging, which is always non-trivial
in a distributed environment. Our selection of the direct
executor model was motivated by the extensible design of
HyperFlow, which can associate with each task in a
workflow a specific executor function responsible for handling
command-line tasks. Since GCF provides a direct
triggering mechanism for cloud functions using HTTP calls, we can
apply existing HTTP client libraries for Node.js, plugging
support for GCF into HyperFlow as a natural extension.</p>
    </sec>
    <sec id="sec-13">
      <title>4.1 Architecture and components</title>
      <p>The schematic diagram of the prototype is shown in Fig. 2.
The HyperFlow engine is extended with the GCFCommand
function, which is responsible for communication with GCF.
It is a replacement for the AMQPCommand function, which is used
in the standard HyperFlow distributed deployment with the AMQP
protocol and RabbitMQ. GCFCommand sends the task
description in a JSON-encoded message to the cloud function.
The GCF Executor is the main cloud function which needs
to be deployed on the GCF platform. It processes the
message, and uses the Storage Client for staging in and out the
input and output data. It uses Google Cloud Storage and
requests parallel transfers to speed up download and upload
of data. The GCF Executor calls the executable, which needs to
be deployed together with the function. GCF supports
running the user's own Linux-based custom binaries, but the user has to
make sure that the binary is portable, e.g. by statically
linking all of its dependencies. Our architecture is thus purely
serverless, with the HyperFlow engine running on a client
machine and directly relying only on cloud services such as
GCF and Cloud Storage.</p>
    </sec>
    <sec id="sec-14">
      <title>4.2 Fault tolerance</title>
      <p>Transient failures are a common risk in cloud
environments. Since execution of a possibly large volume of
concurrent HTTP requests in a distributed environment is
always prone to errors caused by various layers of network and
middleware stacks, the execution engine needs to be able to
handle such failures gracefully and attempt to retry failed
requests.</p>
      <p>In the case of HyperFlow, the Node.js ecosystem appears
very helpful in this context. We used the requestretry
library for implementing the HTTP client, which allows for
automatic retry of failed requests with a configurable
number of retries (default: 5) and delay between retries (default:
5 seconds). Our prototype uses these default settings, but
in the future it will be possible to explore more advanced
error handling policies taking into account error types and
patterns.</p>
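      <p>The retry policy (up to 5 attempts, 5 seconds apart) can be made explicit with a small generic helper. This is a sketch only; the prototype itself delegates this behavior to the requestretry library's defaults.</p>

```javascript
// Generic sketch of the retry policy the prototype relies on
// (requestretry defaults: up to 5 attempts, 5 seconds between them).
function retry(fn, attempts = 5, delayMs = 5000) {
  return new Promise((resolve, reject) => {
    const attempt = remaining => {
      fn().then(resolve).catch(err => {
        if (remaining === 1) return reject(err);           // retries exhausted
        setTimeout(() => attempt(remaining - 1), delayMs); // wait, then retry
      });
    };
    attempt(attempts);
  });
}

// Example: a transient failure that succeeds on the third attempt
let calls = 0;
const flakyRequest = () => ++calls < 3
  ? Promise.reject(new Error('transient'))
  : Promise.resolve('ok');

retry(flakyRequest, 5, 10).then(r => console.log(r, 'after', calls, 'attempts'));
// → "ok after 3 attempts"
```
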
    </sec>
    <sec id="sec-15">
      <title>5. EVALUATION USING MONTAGE WORKFLOW</title>
      <p>Based on our prototype which combines HyperFlow and
Google Cloud Functions, we performed several experiments
to evaluate our approach. The goals of the evaluation are as
follows:</p>
      <p>To validate the feasibility of our approach, i.e. to
determine whether it is practical to execute scientific
workflows in serverless infrastructures.</p>
      <p>To measure performance characteristics of the
execution environment in order to provide hints for resource
management.</p>
      <p>Details regarding our sample application, experiment setup
and results are provided below.</p>
    </sec>
    <sec id="sec-17">
      <title>5.1 Montage workflow and experiment setup</title>
      <p>Montage application.</p>
      <p>
        For our study we selected the Montage [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] application,
which is an astronomical workflow. It is often used for various
benchmarks and performance evaluation, since it is
open-source and has been widely studied by the research
community. The application processes a set of input images
from astronomical sky surveys and constructs a single
large-scale mosaic image. The structure of the workflow is shown
in Fig. 3: it consists of several stages, which include
parallel processing sections, reduction operations and sequential
processing.
      </p>
      <!-- Figure 3 (structure of the Montage workflow) task labels: mProjectPP, mDiffFit, mConcatFit, mBgModel, mBackground, mImgtbl, mAdd, mShrink, mJPEG -->
      <p>The size of the workflow, i.e. the number of tasks,
depends on the size of the area of the target image, which
is measured in angular degrees. For example, a small-scale
0.25-degree Montage workflow consists of 43 tasks, with 10
parallel mProjectPP tasks and 17 mDiffFit tasks, while more
complex workflows can involve thousands of tasks. In our
experiments we used the Montage 0.25 workflow with 43
tasks, and the Montage 0.4 workflow with 107 tasks.</p>
      <p>Experiment setup.</p>
      <p>We used a recent version of HyperFlow and an alpha
version of Google Cloud Functions. The HyperFlow engine was
installed on a client machine with Ubuntu 14.04 LTS Linux
and Node.js 4.5.0. For staging the input and output data,
as well as for temporary storage, we used a Google Cloud
Storage bucket with standard options. Both Cloud
Functions and Cloud Storage were located in the us-central1
region, while the client machine was located in Europe.</p>
      <p>Data preparation and handling.</p>
      <p>To run the Montage workflow in our experiments, all
input data needs to be uploaded to the cloud storage first. For
each workflow run, a separate subfolder in the Cloud Storage
bucket is created. The subfolder is then used for exchange of
intermediate data and for storing the final results. Data can
be conveniently uploaded using a command-line tool which
supports parallel transfers. The web-based Google Cloud
Console is useful for browsing results and displaying the
resulting JPEG images.</p>
    </sec>
    <sec id="sec-18">
      <title>5.2 Feasibility</title>
      <p>To assess the feasibility of our approach we tested our
prototype using the Montage 0.25 workflow. We collected
task execution start and finish timestamps, which give the
total duration of cloud function execution. This execution
time also includes data transfers. Based on the collected
task execution traces we plotted Gantt charts. Altogether,
several runs were performed and an example execution trace
(representative of all runs) is shown in Fig. 4.</p>
      <!-- Figure 4 (Gantt chart of a Montage 0.25 run): x-axis "Time in seconds" (0-100); tasks include mProjectPP, mDiffFit, mBackground, mImgtbl, mAdd -->
      <p>Montage 0.25 is a relatively small-scale workflow, but the
resulting plot clearly reveals that the cloud function-based
approach works well in this case. We can observe that the
parallel tasks of the workflow (mProjectPP, mDiffFit and
mBackground) are indeed short-running and can be
processed in parallel. The user has no control over the level of
parallelism, but the cloud platform is able to process tasks
in a scalable way, as stated in the documentation. We also
observe no significant delays between task executions, even
though the requests between the HyperFlow engine and the
cloud functions are transmitted using HTTP over a wide-area
network, including a trans-Atlantic connection.</p>
      <p>
        Similar results were obtained for the Montage 0.4
workflow, which consists of 107 tasks; the corresponding
detailed plots are not reproduced here for reasons of
readability. It should be noted that while the parallel tasks of
Montage are relatively fine-grained, the execution time of
sequential processing tasks such as mImgTbl and mAdd grows
with the size of the workflow and can exceed the default limit
of 60 seconds imposed on cloud function execution. This
limit can be extended when deploying the cloud function,
but there is currently no information regarding the
maximum duration of such requests. We can only expect that
such limits will increase as the platforms become more
mature. This was indeed the case with Google App Engine,
where the initial request limit was increased from 30
seconds to 10 minutes [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-19">
      <title>5.3 Deployment size and portability</title>
      <p>Our current approach requires us to deploy the cloud
function together with all the application binaries. The Google
Cloud Functions execution environment enables inclusion of
dependencies as Node.js libraries packaged using the Node
Package Manager (NPM), which are automatically installed when
the function is deployed. Moreover, the user can provide a
set of JavaScript source files, configuration and binary
dependencies to be uploaded together with the function.</p>
      <p>In the case of the Montage application, the users need
to prepare application binaries in a portable format. Since
Montage is distributed in source format, it can be compiled
and statically linked with all libraries, making it portable to
any Linux distribution.</p>
      <p>The size of the Montage binaries is 50 MB in total, and 20
MB in a compressed format, which is used for deployment.
We consider this deployment size practical in most cases.
We should note that deployment of the function is performed
only once, prior to workflow execution. Of course, when the
execution environment needs to instantiate the function or
create multiple instances for scale-out scenarios, the size of
each instance may affect performance, so users should try
to minimize the volume of the deployment package. It is
also worth noting that such binary distributions are usually
more compact than the full images of virtual machines used
in traditional IaaS clouds. Unfortunately, if the source
distribution or a portable binary is not available, then it may
not be possible to deploy the application as a cloud function. One
useful option would be to allow deployment of container-based
images, such as Docker images, but this is currently not
supported.</p>
    </sec>
    <sec id="sec-20">
      <title>5.4 Variability</title>
      <p>Variability is an important metric of cloud infrastructures,
since distribution and resource sharing often hamper
consistent performance. To measure the variability of GCF while
executing scientific workflows, we collected the durations of
parallel task executions in the Montage (0.25 degree)
workflow (specifically, mBackground, mDiffFit and mProjectPP),
running 10 workflow instances over a period of one day.</p>
      <p>Results are shown in Fig. 5. We can see that the
distribution of task durations is moderately wide, with an
inter-quartile range about 1 second wide. The distribution is skewed
towards longer execution times, up to 7 seconds, while the
median is about 4 seconds. It is important that we do not
observe any significant outliers. We have to note that the
execution times of the tasks themselves vary (they are not
identical) and that the task duration includes data transfer
to/from cloud storage. Taking this into account, we
can conclude that the execution environment behaves
consistently in terms of performance, since the observed
variation is rather low. Further studies and long-term monitoring
would be required to determine whether such consistency is
preserved over time.</p>
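      <p>The reported summary statistics (median, inter-quartile range) can be computed as follows; the durations below are illustrative numbers in the reported range, not our measured samples.</p>

```javascript
// Computing the summary statistics used above: median and inter-quartile
// range of task durations (numbers are illustrative, not measured data).
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo); // interpolate
}

const durations = [3.2, 3.8, 4.0, 4.1, 4.3, 4.9, 5.4, 7.0]; // seconds, sorted
const median = quantile(durations, 0.5);
const iqr = quantile(durations, 0.75) - quantile(durations, 0.25);
console.log('median:', median.toFixed(1), 'IQR:', iqr.toFixed(1));
// prints "median: 4.2 IQR: 1.1"
```
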
      <!-- Figure 5 (duration of tasks, distributions): mBackground, mDiffFit, mProjectPP -->
    </sec>
    <sec id="sec-21">
      <title>6. DISCUSSION</title>
      <p>The experiments conducted with our prototype
implementation confirm the feasibility of our approach to execution of
scientific workflows in serverless infrastructures. There are,
however, some limitations that need to be emphasized here,
and some interesting implications for resource management
of scientific workflows in such infrastructures.</p>
      <p>Granularity of tasks is a crucial issue which determines
whether a given application is well suited for processing
using serverless infrastructures. It is obvious that for
long-running HPC applications a dedicated supercomputer is a
better option. On the other hand, for high-throughput
computing workloads distributed infrastructures such as grids
and clouds have proven useful. Serverless infrastructures can
be considered similar to these high-throughput
infrastructures, but they usually impose tighter limits on task execution
time (300 seconds in the case of AWS Lambda, a 60-second
default timeout for GCF). While these limits may vary, may be
configurable or may change over time, we must assume that
each infrastructure will always impose some kind of limit,
which will constrain the types of supported workflows to
those consisting of relatively fine-grained tasks. Many
high-throughput workflows can fit into these constraints; for
the rest, other solutions should be developed, such as hybrid
approaches that offload tasks with special requirements, e.g.
custom binaries or longer execution times, to traditional VMs.</p>
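      <p>This constraint can be sketched as a simple pre-screening of workflow tasks against a platform's execution-time limit. The limits below are those quoted above; the task names, duration estimates and safety factor are hypothetical, not part of any provider's API:</p>

```python
# Sketch: pre-screen workflow tasks against a platform execution-time
# limit. Limits are those quoted in the text (300 s AWS Lambda, 60 s GCF
# default); task duration estimates below are illustrative only.
LIMITS = {"aws_lambda": 300, "gcf_default": 60}

def partition(tasks, limit, safety=0.8):
    """Split tasks into those that fit a cloud function and the rest.

    tasks: iterable of (name, estimated_seconds). The safety factor
    leaves headroom for data staging to/from cloud storage.
    """
    fit, rest = [], []
    for name, est in tasks:
        (fit if est <= safety * limit else rest).append(name)
    return fit, rest

tasks = [("mProjectPP", 5), ("mDiffFit", 4), ("mBackground", 4),
         ("mAdd", 120)]  # duration estimates are illustrative
ok, too_long = partition(tasks, LIMITS["gcf_default"])
```

<p>At a 60-second limit the fine-grained Montage tasks fit easily, while a longer task such as the final mosaic assembly would have to be offloaded elsewhere, which is exactly the hybrid case discussed above.</p>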
      <p>
        The hybrid approach can also be used to minimize costs
and optimize throughput. Such optimization should be based
on a cost analysis of leasing a VM versus calling a cloud
function, assuming that longer-term lease of resources typically
corresponds to lower unit costs. This idea is generally
applicable to hybrid cloud solutions [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. For example, it may
be more economical to lease VMs for long-running
sequential parts of the workflow and to trigger cloud functions for
parallel stages, where spawning VMs billed on an
hourly basis would be more costly. It may also prove
interesting to combine cloud functions with spot instances or
burstable [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] instances, which are cheaper but have varying
performance and reliability characteristics.
      </p>
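      <p>A back-of-the-envelope version of this cost analysis can be sketched as follows. The rates are assumptions chosen only to illustrate the trade-off (roughly the order of magnitude of 2016 prices), not actual provider quotes:</p>

```python
import math

# Back-of-the-envelope cost model: hourly-billed VMs vs. per-GB-second
# cloud functions. The rates are illustrative assumptions, not quotes.
VM_HOURLY = 0.05          # $ per VM-hour, billed per started hour
FN_GB_SECOND = 0.0000167  # $ per GB-second of function execution

def vm_cost(task_seconds, n_tasks, n_vms):
    """Cost of running the tasks spread evenly over leased VMs."""
    per_vm = task_seconds * n_tasks / n_vms
    return n_vms * math.ceil(per_vm / 3600) * VM_HOURLY

def fn_cost(task_seconds, n_tasks, mem_gb=1.0):
    """Cost of one function invocation per task."""
    return n_tasks * task_seconds * mem_gb * FN_GB_SECOND

# Wide parallel stage: 1000 ten-second tasks.
wide_fn = fn_cost(10, 1000)             # pay only for 10,000 task-seconds
wide_vm = vm_cost(10, 1000, n_vms=100)  # 100 mostly idle VM-hours

# Long sequential stage: one hour-long task.
long_fn = fn_cost(3600, 1)
long_vm = vm_cost(3600, 1, n_vms=1)
```

<p>At these illustrative rates, functions win on the wide stage (the VMs would sit mostly idle within their billed hour), while the leased VM wins on the long sequential task thanks to its lower unit cost, matching the hybrid strategy described above.</p>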
      <p>The hybrid approach can also help resolve issues caused
by the statelessness and transience of cloud functions, where
no local data is preserved between function calls. By adding
a traditional VM as one of the executor units, data transfers
can be significantly reduced for tasks that need
to access the same set of data multiple times.</p>
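      <p>One simple way to exploit such a stateful VM executor is to pin tasks whose inputs are shared to the VM, so each shared dataset is staged to its local disk only once. The planner below is a hypothetical sketch (the function and file names are not part of HyperFlow's API):</p>

```python
from collections import Counter

def plan_executors(tasks):
    """tasks: mapping of task name -> set of input files.

    Tasks whose inputs are read by more than one task are pinned to a
    long-lived VM executor (the data stays on its local disk between
    calls); single-use tasks go to stateless cloud functions.
    """
    uses = Counter(f for inputs in tasks.values() for f in inputs)
    return {name: "vm" if any(uses[f] > 1 for f in inputs) else "function"
            for name, inputs in tasks.items()}

plan = plan_executors({
    "mDiffFit-1": {"p1.fits", "p2.fits"},  # shares p2.fits
    "mDiffFit-2": {"p2.fits", "p3.fits"},  # shares p2.fits
    "mJPEG":      {"mosaic.fits"},         # unique input
})
```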
    </sec>
    <sec id="sec-21">
      <title>6.3 Resource management and autoscaling</title>
      <p>The core idea behind serverless infrastructures is that they
free users from having to manage servers, and this
also extends to clusters of servers. Decisions concerning
resource management and autoscaling are thus made by the
platform based on the current workload, history, etc. This
is useful for typical Web or mobile applications that have
interactive usage patterns and whose workload depends on
user behavior. With regard to scientific workflows, which
have a well-defined structure, there is ongoing research on
scheduling algorithms for clusters, grids and clouds. The
goal of these algorithms is to optimize such criteria as the time
or cost of workflow execution, assuming that the user has
some control over the infrastructure. In the case of
serverless infrastructures the user does not have any control over
the execution environment. The providers would need to
change this policy by adding more control or the ability to
specify user preferences regarding performance.</p>
      <p>For example, users could specify priorities when deploying
cloud functions, and a higher priority would mean faster
response time, quicker autoscaling, etc., but at an additional
price. Lower-priority functions could have longer execution
times, possibly relying on resource scavenging, but at a lower
cost. Another option would be to allow users to provide
hints regarding expected execution times or anticipated
parallelism level. Such information could be useful for internal
resource managers to better optimize the execution
environment and prepare for demand spikes, e.g. when many
parallel tasks are launched by a workflow.</p>
      <p>Adding support for cooperation between the application
and the internal resource manager of the cloud platform
would open an interesting area for research and
optimization of applications and infrastructures, from which both
users and providers could potentially benefit.</p>
    </sec>
    <sec id="sec-22">
      <title>7. RELATED WORK</title>
      <p>Although scientific workflows in clouds have been widely
studied, research typically focuses on IaaS, and there is little
related work regarding serverless or other alternative types of
infrastructures.</p>
      <p>
        An example of using AWS Lambda for analyzing genomics
data comes from the AWS blog [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The authors show how
to use R, AWS Lambda and the AWS API gateway to
process a large number of tasks. Their use case is to compute
some statistics for every gene in the genome, which gives
about 20,000 tasks in an embarrassingly parallel problem.
This work is similar to ours, but our approach is more
general, since we show how to implement generic support for
scientific workflows.
      </p>
      <p>
        A detailed performance and cost comparison of traditional
clouds with microservices and the AWS Lambda serverless
architecture is presented in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. An enterprise application
was benchmarked, and the results show that serverless
infrastructures can introduce significant savings without
impacting performance. Similarly, in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] the authors discuss the
advantages of using cloud services and AWS Lambda for
systems that require higher resilience. They show how
serverless infrastructures can reduce costs in comparison to
traditional IaaS resources and the spot market. Although these
use cases are different from our scientific scenario, we believe
that serverless infrastructures offer an interesting option for
scientific workflows.
      </p>
      <p>
        An interesting general discussion on the economics of
hybrid clouds is presented in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. The author shows that even
when a private cloud is strictly cheaper (per unit) than
public clouds, a hybrid solution can result in a lower
overall cost in the case of a variable workload. We expect that
a similar effect can be observed in the case of a hybrid
solution combining traditional and serverless infrastructures
for scientific applications, which often have a wide range of
task granularities.
      </p>
      <p>
        Regarding the use of alternative cloud solutions for
scientific applications, there is work on the evaluation of Google
App Engine for scientific applications [
        <xref ref-type="bibr" rid="ref14 ref16">16, 14</xref>
        ]. Google App
Engine is a Platform-as-a-Service cloud, designed mostly
for Web applications, but with additional support for
processing of background tasks. App Engine can be used for
running parameter-study high-throughput computing
workloads, and there are similar task processing time limits as in
the case of serverless infrastructures. The difference is that
the execution environment is more constrained, e.g. only one
application framework is allowed (such as Java or Python),
and there is no support for native code or access to local
disk. For these reasons, we consider cloud functions such as
AWS Lambda or Google Cloud Functions a more
interesting option for scientific applications.
      </p>
      <p>
        The concept of cloud functions can be considered as an
evolution of former remote procedure call concepts, such
as GridRPC [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], proposed and standardized for Grid
computing. The difference between these solutions and current
cloud functions is that the latter are supported by
commercial cloud providers with an emphasis on ease of use and
development productivity. Moreover, the granularity of tasks
processed by current cloud functions tends to be finer, so
we need to follow the development of these technologies to
further assess their applicability to scientific workflows.
      </p>
      <p>
        A recently developed approach to decentralized workflow
execution in clouds is represented by Flowbster [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which
also aims at serverless infrastructures. We can expect that
more such solutions will emerge in the near future.
      </p>
      <p>
        The architectural concepts of scientific workflows are
discussed in the context of component and service
architectures [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Cloud functions can be considered a specific
class of services or components, which are stateless and can
be deployed in cloud infrastructures. They do not impose
any rules of composition, giving more freedom to developers.
The most important distinction is that they are backed by
the cloud infrastructure which is responsible for automatic
resource provisioning and scaling.
      </p>
      <p>
        The architectures of cloud workflow systems are also
discussed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We believe that such architectures need to be
re-examined as new serverless infrastructures become more
widespread.
      </p>
      <p>Based on the discussion of related work we conclude that
our paper is likely the first attempt to use serverless clouds
for scientific workflows, and we expect that more research in
this area will be needed as platforms become more mature.</p>
    </sec>
    <sec id="sec-23">
      <title>8. SUMMARY AND FUTURE WORK</title>
      <p>In this paper we have presented our approach to
combining scientific workflows with the emerging serverless clouds.
We believe that such infrastructures based on the concept
of cloud functions, such as AWS Lambda or Google Cloud
Functions, provide an interesting alternative not only for
typical enterprise applications, but also for scientific
workflows. We have discussed several options for designing
serverless workflow execution architectures, including queue-based,
direct executor, hybrid (bridged) and decentralized ones.</p>
      <p>To evaluate the feasibility of our approach, we implemented
a prototype based on the HyperFlow engine and Google
Cloud Functions, and evaluated it with the real-world
Montage application. Experiments with small-scale workflows
consisting of 43 and 107 tasks confirm that the GCF
platform can be successfully used, and that it does not introduce
significant delays. We have to note that the application
needs to be prepared in a portable way to facilitate execution
on such an infrastructure, and that this may be an issue for more
complex scientific software packages.</p>
      <p>Our paper also presents some implications of serverless
infrastructures for resource management of scientific
workflows. First, we observed that not all workloads are suitable
due to execution time limits, e.g. 5 minutes in the case of
AWS Lambda; accordingly, the granularity of tasks has to
be taken into account. Next, we discussed how hybrid
solutions combining serverless and traditional infrastructures
can help optimize the performance and cost of scientific
workflows. We also suggest that adding more control or
the ability to provide priorities or hints to cloud platforms
could benefit both providers and users in terms of optimizing
performance and cost.</p>
      <p>Since this is a fairly new topic, we see many options for
future work. Further implementation work on the development
and evaluation of various serverless architectures for
scientific workflows is needed, with the decentralized option
regarded as the greatest challenge. A more detailed
performance evaluation of different classes of applications on
various emerging infrastructures would also prove useful to
better understand the possibilities and limitations of this
approach. Finally, interesting research can be conducted in
the field of resource management for scientific workflows, to
design strategies and algorithms for optimizing the time or cost
of workflow execution in the emerging serverless clouds.</p>
    </sec>
    <sec id="sec-24">
      <title>Acknowledgments</title>
      <p>This work is partially supported by the National Centre
for Research and Development (NCBiR), Poland, project
PBS1/B9/18/2013. AGH grant no. 11.11.230.124 is also
acknowledged. The author would like to thank the Google
Cloud Functions team for the opportunity to use the alpha
version of their service.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] AWS Lambda - Serverless Compute,
          <year>2016</year>
          . https://aws.amazon.com/lambda/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] Cloud Functions - Serverless Microservices | Google Cloud Platform,
          <year>2016</year>
          . https://cloud.google.com/functions/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Balis</surname>
          </string-name>
          .
          <article-title>HyperFlow: A model of computation, programming approach and enactment engine for complex distributed workflows</article-title>
          .
          <source>Future Generation Computer Systems</source>
          ,
          <volume>55</volume>
          :
          <fpage>147</fpage>
          -162, Sep
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Balis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Figiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Malawski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pawlik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Bubak</surname>
          </string-name>
          .
          <article-title>A Lightweight Approach for Deployment of Scientific Workflows in Cloud Infrastructures</article-title>
          . In R. Wyrzykowski,
          <string-name>
            <given-names>E.</given-names>
            <surname>Deelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dongarra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Karczewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kitowski</surname>
          </string-name>
          ,
          and K. Wiatr, editors,
          <source>Parallel Processing and Applied Mathematics: 11th International Conference, PPAM</source>
          <year>2015</year>
          , Krakow, Poland, September 6-9,
          <year>2015</year>
          . Revised Selected Papers, Part I, pages
          <fpage>281</fpage>
          -
          <lpage>290</lpage>
          , Cham,
          <year>2016</year>
          . Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Bryan</given-names>
            <surname>Liston</surname>
          </string-name>
          .
          <article-title>Analyzing Genomics Data at Scale using R, AWS Lambda, and Amazon API Gateway</article-title>
          .
          <source>AWS Compute Blog</source>
          ,
          <year>2016</year>
          . http://tinyurl.com/h7vyboo.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Deelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Vahi</surname>
          </string-name>
          , G. Juve,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rynge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Callaghan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Maechling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mayani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          , R. Ferreira da Silva,
          <string-name>
            <given-names>M.</given-names>
            <surname>Livny</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Wenger</surname>
          </string-name>
          .
          <article-title>Pegasus, a workflow management system for science automation</article-title>
          .
          <source>Future Generation Computer Systems</source>
          ,
          <volume>46</volume>
          :
          <fpage>17</fpage>
          -35, May
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gannon</surname>
          </string-name>
          .
          <article-title>Component Architectures and Services: From Application Construction to Scientific Workflows</article-title>
          , pages
          <fpage>174</fpage>
          -
          <lpage>189</lpage>
          . Springer London, London,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Jacob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Berriman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Good</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Laity</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Deelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kesselman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Prince</surname>
          </string-name>
          , and
          <string-name>
            <surname>Others</surname>
          </string-name>
          .
          <article-title>Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking</article-title>
          .
          <source>International Journal of Computational Science and Engineering</source>
          ,
          <volume>4</volume>
          (
          <issue>2</issue>
          ):
          <fpage>73</fpage>
          -
          <lpage>87</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Juve</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Deelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Vahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Berriman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Berman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Maechling</surname>
          </string-name>
          .
          <article-title>Data Sharing Options for Scientific Workflows on Amazon EC2</article-title>
          .
          <source>In SC '10 Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing</source>
          ,
          <article-title>Networking, Storage and Analysis</article-title>
          ,
          <source>SC '10</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . IEEE Computer Society,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kacsuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kovacs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Farkas</surname>
          </string-name>
          .
          <article-title>Flowbster: Dynamic creation of data pipelines in clouds</article-title>
          . In Digital Infrastructures for Research event, Krakow, Poland, 28-30 September
          <year>2016</year>
          .
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Leitner</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Scheuner</surname>
          </string-name>
          .
          <article-title>Bursting with Possibilities - An Empirical Study of Credit-Based Bursting Cloud Instance Types</article-title>
          , Dec
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yuan</surname>
          </string-name>
          , G. Zhang,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <source>The Design of Cloud Workflow Systems</source>
          . Springer New York, New York, NY,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Malawski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Juve</surname>
          </string-name>
          , E. Deelman, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Nabrzyski</surname>
          </string-name>
          .
          <article-title>Algorithms for cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds</article-title>
          .
          <source>Future Generation Computer Systems</source>
          ,
          <volume>48</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Malawski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kuzniar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wojcik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Bubak</surname>
          </string-name>
          .
          <article-title>How to Use Google App Engine for Free Computing</article-title>
          .
          <source>IEEE Internet Computing</source>
          ,
          <volume>17</volume>
          (
          <issue>1</issue>
          ):
          <fpage>50</fpage>
          -
          <lpage>59</lpage>
          , Jan
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mao</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Humphrey</surname>
          </string-name>
          .
          <article-title>Auto-scaling to minimize cost and meet application deadlines in cloud workflows</article-title>
          .
          <source>In SC '11 Proceedings of 2011 International Conference for High Performance Computing</source>
          ,
          <article-title>Networking, Storage and Analysis</article-title>
          ,
          <source>SC '11</source>
          , Seattle, Washington,
          <year>2011</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Prodan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sperk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ostermann</surname>
          </string-name>
          .
          <article-title>Evaluating High-Performance Computing on Google App Engine</article-title>
          .
          <source>IEEE Software</source>
          ,
          <volume>29</volume>
          (
          <issue>2</issue>
          ):
          <fpage>52</fpage>
          -
          <lpage>58</lpage>
          , Mar
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Seymour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Nakada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Matsuoka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dongarra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Casanova</surname>
          </string-name>
          .
          <article-title>Overview of GridRPC: A remote procedure call API for grid computing</article-title>
          .
          <source>In International Workshop on Grid Computing</source>
          , pages
          <fpage>274</fpage>
          -
          <lpage>278</lpage>
          . Springer,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Thain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tannenbaum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Livny</surname>
          </string-name>
          .
          <article-title>Distributed computing in practice: the Condor experience</article-title>
          .
          <source>Concurrency and Computation: Practice and Experience</source>
          ,
          <volume>17</volume>
          (
          <issue>2-4</issue>
          ):
          <fpage>323</fpage>
          -
          <lpage>356</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Villamizar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Garces</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ochoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Salamanca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Verano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Casallas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Valencia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zambrano</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lang</surname>
          </string-name>
          .
          <article-title>Infrastructure cost comparison of running web applications in the cloud using AWS Lambda and monolithic and microservice architectures</article-title>
          .
          <source>In 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)</source>
          , pages
          <fpage>179</fpage>
          -
          <lpage>182</lpage>
          , May
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wagner</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sood</surname>
          </string-name>
          .
          <article-title>Economics of Resilient Cloud Services</article-title>
          .
          <source>In 1st IEEE International Workshop on Cyber Resilience Economics</source>
          ,
          <year>Aug 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Weinman</surname>
          </string-name>
          .
          <article-title>Hybrid Cloud Economics</article-title>
          .
          <source>IEEE Cloud Computing</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>18</fpage>
          -
          <lpage>22</lpage>
          ,
          Jan
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>