<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1145/2898442.2898444</article-id>
      <title-group>
        <article-title>Workflows on the Cloud-to-Edge Continuum</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Angelo Marchese</string-name>
          <email>angelo.marchese@phd.unict.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Orazio Tomarchio</string-name>
          <email>orazio.tomarchio@unict.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Electrical Electronic and Computer Engineering, University of Catania</institution>
          ,
          <addr-line>Catania</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>14</volume>
      <issue>2016</issue>
      <fpage>11</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>Orchestrating data streaming and analytics applications presents challenges due to the increasing data volume and time-sensitive requirements. The combination of cloud and edge computing paradigms attempts to avoid their pitfalls while taking the best of both worlds: cloud scalability and compute closer to the edge where data is typically generated. However, placing microservices in such heterogeneous environments while meeting QoS constraints is a challenging task due to the geo-distribution of nodes and varying computational resources. In this paper we propose to extend Kubernetes to enable dynamic DAG workflow orchestration, taking into account both infrastructure and application states. Our approach aims to reduce QoS violations and improve application response time in Cloud-to-Edge continuum scenarios.</p>
      </abstract>
      <kwd-group>
        <kwd>DAG workflows</kwd>
        <kwd>Containers technology</kwd>
        <kwd>Orchestration</kwd>
        <kwd>Kubernetes</kwd>
        <kwd>Kubernetes scheduler</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        The orchestration of modern data streaming and analytics applications is a complex problem
to deal with, considering the increasing amount of data that needs to be processed and the
requirement for deadline-constrained response times [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. These applications are typically
implemented as DAG (Directed Acyclic Graph) workflows, where data collected from
geographically distributed sources, like users or sensors, is moved between different processing
microservices. Today, Cloud Computing offers a reliable and scalable environment to execute
these applications. However, Cloud data centers are far away from the network edge and
hence from end users and devices. This can lead to high application response times and limited
throughput. The Edge Computing paradigm has emerged as a promising technology for mitigating
this problem by moving computation towards the network edge [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. However, Edge
environments are characterized by resource-constrained nodes, which can also have a negative
impact on the application response time. Thus, to take advantage of both the high computational
capacity of Cloud resources and the reduced network distance between data sources and Edge nodes,
Cloud and Edge infrastructures are combined to form the Cloud-to-Edge continuum, an
environment for executing distributed DAG workflows on multiple nodes organized in clusters.
However, establishing where to place each microservice of a workflow is a complex problem,
considering the geo-distribution of nodes and their heterogeneity in terms of computational
resources [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
      <p>
        Kubernetes<sup>1</sup> is today the de-facto orchestration platform for the deployment, scheduling
and management of containerized applications in Cloud environments. Kubernetes was
initially conceived for the orchestration of general-purpose web services, but today it is also
used for data analytics and AI training workflows [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. However, Kubernetes has not been
designed for the orchestration of complex DAG workflows in geo-distributed and heterogeneous
environments such as the aforementioned Cloud-Edge infrastructures [
        <xref ref-type="bibr" rid="ref9">9, 10</xref>
        ]. In particular, the
default Kubernetes scheduling and orchestration strategy presents some limitations because
it does not consider the ever-changing resource availability on cluster nodes, the node-to-node
network latencies, or the current application state, in terms of the current resource usage of each
microservice and the communication relationships between microservices [11]. Considering
these factors when scheduling DAG workflows is critical in order to reduce QoS violations on
the application response time.
      </p>
      <p>To deal with these limitations, in this work we propose to extend the Kubernetes platform to
adapt its usage to environments distributed in the Cloud-to-Edge continuum. Our approach
enhances Kubernetes by implementing a dynamic DAG workflow orchestration and scheduling
strategy that considers the current infrastructure state when determining a placement for
each microservice and continuously tunes the microservice placement based on the ever-changing
infrastructure and application states.</p>
      <p>The rest of the paper is organized as follows. Section 2 provides some background information
about the Kubernetes platform and discusses in more detail some of its limitations that motivate
our work. In Section 3 the proposed approach is presented, providing some implementation
details of its components, while Section 4 provides results of our prototype evaluation in a
testbed environment. Section 5 examines some related works and, finally, Section 6 concludes
the work.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Background And Motivation</title>
      <p>Kubernetes is a container orchestration platform which automates the lifecycle management
of distributed applications deployed on large-scale node clusters [12]. A Kubernetes cluster
consists of a control plane and a set of worker nodes. The control plane is made up of
different management services that run inside one or many master nodes. The worker nodes
represent the execution environment of containerized application workloads. In Kubernetes,
minimal deployment units consist of Pods, which in turn contain one or more containers. In a
microservices-based application, each Pod corresponds to a single microservice instance.</p>
      <p>Among control plane components, the kube-scheduler<sup>2</sup> is in charge of selecting an optimal
cluster node for each Pod to run on, taking into account Pod requirements and node
resource availability. Each Pod scheduling attempt is split into two phases: the scheduling
cycle and the binding cycle, which in turn are divided into different sub-phases. During the
scheduling cycle a suitable node for the Pod to schedule is selected, while during the binding
cycle the scheduling decision is applied to the cluster by reserving the necessary resources and
deploying the Pod to the selected node. Each sub-phase of both cycles is implemented by one or
more plugins, which in turn can implement one or more sub-phases. The Kubernetes scheduler
is meant to be extensible. In particular, each scheduling phase represents an extension point
at which one or more custom plugins can be registered.
<fn><label>1</label><p>https://kubernetes.io</p></fn>
<fn><label>2</label><p>https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler</p></fn></p>
      <p>Kubernetes scheduler placement decisions are influenced by the cluster state at the point in
time when a new Pod appears for scheduling. As Kubernetes clusters are very dynamic and
their state changes over time, better placement decisions may become available with respect to the
initial scheduling of Pods. Several reasons can motivate the migration of a Pod from one node
to another, for example node under-utilization or over-utilization, Pod or node affinity
requirements that are no longer satisfied, and node failure or addition.</p>
      <p>To this end, a descheduler component has recently been proposed as a Kubernetes sub-project<sup>3</sup>.
This component is in charge of evicting running Pods so that they can be rescheduled onto more
suitable nodes. The descheduler does not itself schedule replacements for evicted Pods but relies on
the default scheduler for that. The descheduler’s policy is configurable and includes strategies
that can be enabled or disabled.</p>
      <p>While the default Kubernetes scheduler and descheduler implementations are suitable for
the orchestration of DAG workflows on centralized Cloud data centers, characterized by high
and uniform computational resources and low network latencies, they present some limitations
when dealing with node clusters distributed across the Cloud-to-Edge continuum. The Kubernetes
scheduler does not place microservices based on their resource and communication requirements
or on the current infrastructure state in terms of node resource availability and node-to-node
network latencies. This means that microservices with high computational resource
requirements could be placed on resource-constrained nodes, and microservices that exchange
traffic with each other are not always placed on nearby nodes. In the same way, the Kubernetes
descheduler does not reschedule Pods based on the ever-changing infrastructure and application
states. This can lead to higher application response times and more frequent QoS violations.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Proposed Approach</title>
      <sec id="sec-4-1">
        <title>3.1. Overall Design</title>
        <p>Considering the limitations described in Section 2, in this work we propose to extend the
default Kubernetes orchestration strategy in order to adapt its usage to dynamic
Cloud-to-Edge continuum environments. Leveraging our previous work presented in [13, 14], the main
idea of the proposed approach is that in this context the orchestration and scheduling of
complex DAG workflows should consider the dynamic state of the infrastructure where the
workflow is executed and also the run-time resource and communication
requirements of the workflow microservices. In particular, in our approach, current node resource availability, node-to-node
network latencies, resource usage of microservices and communication intensity between
microservices are continuously monitored and taken into account during workflow scheduling.
<fn><label>3</label><p>https://github.com/kubernetes-sigs/descheduler</p></fn></p>
        <p>[Figure 1: General model of the proposed approach. Diagram labels: workflow telemetry, infrastructure telemetry, Metrics Server, Kubernetes Cluster (Pods P1, P2, P3 and nodes N1, N2, N3), Cluster Monitoring Operator, Workflow Graph, Cluster Graph, Custom Scheduler, Custom Descheduler.]</p>
        <p>
Furthermore, a key point of our approach is the requirement to continuously tune the placement
of the workflow based on the ever-changing infrastructure state of Cloud-Edge environments, the
run-time resource usage of microservices and their communication interactions.</p>
        <p>Figure 1 shows a general model of the proposed approach. The current infrastructure and
application states are monitored and all the telemetry data are collected by a metrics server. For
the infrastructure, node resource availability and node-to-node latencies are monitored, while
for the application, CPU and memory usage of microservices and the amount of traffic exchanged
between them are monitored. Based on the infrastructure telemetry data, the cluster monitoring
agent determines a cluster graph with the set of available resources on each cluster node and the
network latencies between them. Similarly, the workflow monitoring agent uses microservice
telemetry data to determine a workflow graph whose nodes represent microservices with their
current resource usage and whose edges represent the communication channels between them, each with
a specific weight that indicates the amount of traffic sent through that channel. The
cluster and workflow graphs are then used by the custom scheduler to determine a placement
for each application Pod, and by the custom descheduler to take Pod rescheduling actions if better
scheduling decisions can be made. Further details on the components of the proposed approach
are provided in the following subsections.</p>
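        <p>As an illustration of the two graphs described above, the following minimal Python sketch models a cluster graph (available resources per node, network cost per node pair) and a workflow graph (resource usage per microservice, traffic per communication channel). The class and method names are hypothetical and chosen only for this example, and treating network costs as symmetric is an assumption of the sketch rather than something stated in the paper.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ClusterGraph:
    """Nodes carry available resources; edges carry a node-to-node network cost."""
    available_cpu: Dict[str, float] = field(default_factory=dict)
    available_memory: Dict[str, float] = field(default_factory=dict)
    network_cost: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def cost(self, a: str, b: str) -> float:
        # Assumption: a node reaches itself at zero cost.
        if a == b:
            return 0.0
        # Assumption: costs are symmetric, so either key order is accepted.
        if (a, b) in self.network_cost:
            return self.network_cost[(a, b)]
        return self.network_cost[(b, a)]


@dataclass
class WorkflowGraph:
    """Nodes carry current resource usage; edges carry exchanged traffic."""
    cpu_usage: Dict[str, float] = field(default_factory=dict)
    memory_usage: Dict[str, float] = field(default_factory=dict)
    traffic: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def peers(self, service: str) -> Dict[str, float]:
        """Traffic exchanged by `service` with each peer, in either direction."""
        result: Dict[str, float] = {}
        for (src, dst), amount in self.traffic.items():
            if src == service:
                result[dst] = result.get(dst, 0.0) + amount
            elif dst == service:
                result[src] = result.get(src, 0.0) + amount
        return result
```
]]></preformat>
        <p>In this sketch a scheduler or descheduler would read both structures: the workflow graph tells it which peers a microservice talks to and how much, and the cluster graph tells it how expensive each candidate placement makes that communication.</p>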
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Workflow and Infrastructure Monitoring</title>
        <p>The cluster monitoring agent runs as a controller in the Kubernetes control plane. It periodically
determines the cluster graph with the currently available CPU and memory resources on each
cluster node and the node-to-node network latencies.</p>
        <p>During each execution the list of cluster Node resources is fetched from the Kubernetes API
server. First, for each node <italic>n<sub>i</sub></italic> the CPU and memory values currently available on it, <italic>cpu<sub>n<sub>i</sub></sub></italic> and
<italic>mem<sub>n<sub>i</sub></sub></italic> respectively, are determined. These values are fetched by the agent from a Prometheus<sup>4</sup>
metrics server, which in turn collects them from node exporters executed on each cluster
node. The <italic>cpu<sub>n<sub>i</sub></sub></italic> and <italic>mem<sub>n<sub>i</sub></sub></italic> parameters are then assigned as values for the <monospace>available-cpu</monospace> and
<monospace>available-memory</monospace> annotations of the node <italic>n<sub>i</sub></italic>.</p>
        <p>Then, for each pair of nodes <italic>n<sub>i</sub></italic> and <italic>n<sub>j</sub></italic> their network cost <italic>nc<sub>i,j</sub></italic> is determined. The <italic>nc<sub>i,j</sub></italic> parameter
is proportional to the network latency between nodes <italic>n<sub>i</sub></italic> and <italic>n<sub>j</sub></italic>. Network latency metrics are
fetched by the operator from the Prometheus metrics server, which in turn collects them from
network probe agents executed on each cluster node as Pods managed by a DaemonSet. These
agents are configured to periodically send ICMP traffic to all the other cluster nodes in order to
measure the round-trip time. To each node <italic>n<sub>i</sub></italic> the operator assigns a set of annotations
<monospace>network-cost-</monospace><italic>n<sub>j</sub></italic>, with values equal to those of the corresponding <italic>nc<sub>i,j</sub></italic> parameters. Finally, the
cluster graph with the updated available CPU and memory resources and the network cost
values for each node is submitted to the Kubernetes API server.</p>
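        <p>The node annotations described above can be sketched as a plain mapping from annotation name to value. The function below is a minimal illustration in Python, not the authors' implementation: the function name and the <monospace>cost_per_ms</monospace> proportionality constant are assumptions introduced here, and the metric values are passed in directly instead of being queried from Prometheus.</p>
        <preformat><![CDATA[
```python
from typing import Dict


def node_annotations(node: str,
                     available_cpu: float,
                     available_memory: float,
                     rtt_ms: Dict[str, float],
                     cost_per_ms: float = 1.0) -> Dict[str, str]:
    """Build the annotation map for one cluster node.

    rtt_ms maps each other node name to the measured round-trip time.
    cost_per_ms is a hypothetical constant: the text only says that the
    network cost is proportional to the latency.
    """
    # Kubernetes annotation values are strings, so numbers are serialised.
    annotations = {
        "available-cpu": str(available_cpu),
        "available-memory": str(available_memory),
    }
    for peer, rtt in sorted(rtt_ms.items()):
        if peer == node:
            continue  # no annotation for the node's cost towards itself
        annotations[f"network-cost-{peer}"] = str(rtt * cost_per_ms)
    return annotations
```
]]></preformat>
        <p>A monitoring controller would recompute this mapping on every execution and patch it onto the corresponding Node object, so that scheduler plugins can read it without querying the metrics server themselves.</p>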
        <p>The workflow monitoring agent also runs as a controller in the Kubernetes control plane and
periodically determines the workflow graph with the current CPU and memory usage for each
microservice and the amounts of traffic exchanged between them.</p>
        <p>During each execution the list of Deployment resources that constitute a DAG workflow is
fetched from the Kubernetes API server. First, for each Deployment <italic>d<sub>i</sub></italic> its CPU and memory
usage, <italic>cpu<sub>d<sub>i</sub></sub></italic> and <italic>mem<sub>d<sub>i</sub></sub></italic> respectively, are determined. These values are equal to the average CPU
and memory consumption of all the Pods managed by the Deployment <italic>d<sub>i</sub></italic> and are fetched by
the agent from the Prometheus metrics server, which in turn collects them from cAdvisor<sup>5</sup> agents.
These agents are executed on each cluster node and monitor the current CPU and memory usage
of the Pods executed on that node. The <italic>cpu<sub>d<sub>i</sub></sub></italic> and <italic>mem<sub>d<sub>i</sub></sub></italic> parameters are then assigned as values
for the <monospace>cpu-usage</monospace> and <monospace>memory-usage</monospace> annotations of the Deployment <italic>d<sub>i</sub></italic>.</p>
        <p>Then, for each Deployment <italic>d<sub>i</sub></italic>, the traffic amounts <italic>traffic<sub>i,j</sub></italic> with all the other Deployments <italic>d<sub>j</sub></italic>
of the workflow are determined. The <italic>traffic<sub>i,j</sub></italic> parameter is proportional to the amount of traffic
exchanged between microservices <italic>d<sub>i</sub></italic> and <italic>d<sub>j</sub></italic>. Traffic metrics are fetched by the agent from the
Prometheus metrics server, which in turn collects them from the Istio<sup>6</sup> platform. Istio is a service
mesh implementation, whose control plane is installed in the Kubernetes cluster. The Istio
control plane injects a sidecar container running an Envoy proxy into each Pod when it is
created. All the traffic between Pods is intercepted by their corresponding Envoy proxies, which in
turn expose traffic statistics through metrics exporters that can be queried by the Prometheus
server.</p>
        <p>Each <italic>traffic<sub>i,j</sub></italic> parameter is assigned by the agent as the value for the annotation <monospace>traffic-</monospace><italic>d<sub>j</sub></italic> of
the Deployment <italic>d<sub>i</sub></italic>. The workflow graph with the CPU and memory usage and the traffic
amounts for each Deployment is then submitted to the Kubernetes API server.</p>
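        <p>The Deployment annotations can be sketched in the same way as the node annotations. The following Python function is an illustrative assumption, not the authors' code: it receives the per-Pod usage samples and the per-peer traffic amounts as arguments (in the described system they would come from Prometheus), averages the usage over the Pods of the Deployment, and returns the annotation map.</p>
        <preformat><![CDATA[
```python
from statistics import mean
from typing import Dict, List


def deployment_annotations(pod_cpu: List[float],
                           pod_memory: List[float],
                           traffic_to: Dict[str, float]) -> Dict[str, str]:
    """Build the annotation map for one Deployment of the workflow.

    pod_cpu and pod_memory hold one sample per Pod managed by the
    Deployment and must be non-empty; traffic_to maps each other
    Deployment name to the traffic amount exchanged with it.
    """
    annotations = {
        # Usage is the average over all Pods of the Deployment.
        "cpu-usage": str(mean(pod_cpu)),
        "memory-usage": str(mean(pod_memory)),
    }
    for peer, amount in sorted(traffic_to.items()):
        annotations[f"traffic-{peer}"] = str(amount)
    return annotations
```
]]></preformat>
        <p>Keeping one annotation per peer Deployment mirrors the one-annotation-per-peer-node layout used for network costs, which lets a scoring plugin join the two by simple key lookups.</p>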
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Custom Scheduler</title>
        <p>The proposed custom scheduler extends the default Kubernetes scheduler by implementing two
additional plugins, ResourceAware and NetworkAware, which extend the node scoring
phase of the default Kubernetes scheduler. For each Pod to be scheduled, each of the two plugins
assigns a partial score to each candidate node of the cluster that has passed the filtering phase.
The ResourceAware plugin takes into account the values of the <monospace>cpu-usage</monospace> and <monospace>memory-usage</monospace>
annotations of the Deployment associated with the Pod to be scheduled and the values of the
<monospace>available-cpu</monospace> and <monospace>available-memory</monospace> annotations of the node to be scored. The NetworkAware
plugin takes into account the values of the <monospace>traffic</monospace> annotations of the Deployment associated
with the Pod to be scheduled and the values of the <monospace>network-cost</monospace> annotations of the node to be
scored. The node scores calculated by the ResourceAware and NetworkAware plugins are added
to the scores of the other scoring plugins of the default Kubernetes scheduler.
<fn><label>5</label><p>https://github.com/google/cadvisor</p></fn>
<fn><label>6</label><p>https://istio.io</p></fn></p>
        <sec id="sec-4-3-1">
          <title>Algorithm 1 Custom scheduler node scoring function</title>
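          <preformat>Input: p, cpu<sub>p</sub>, mem<sub>p</sub>, n, cpu<sub>n</sub>, mem<sub>n</sub>, N, nc<sub>n</sub>, traffic<sub>p</sub>
Output: score
rscore ← α × (cpu<sub>n</sub> − cpu<sub>p</sub>) / cpu<sub>n</sub> × 100 + β × (mem<sub>n</sub> − mem<sub>p</sub>) / mem<sub>n</sub> × 100
cost ← 0
for m in N do
    mcost ← 0
    for q in m.pods do
        mcost ← mcost + traffic<sub>p,q</sub> × nc<sub>n,m</sub>
    end for
    cost ← cost + mcost
end for
nscore ← −cost
score ← γ × rscore + δ × nscore</preformat>
          <p>The algorithm takes as inputs the following arguments:</p>
          <list list-type="bullet">
            <list-item><p><italic>p</italic>: the Pod to be scheduled.</p></list-item>
            <list-item><p><italic>cpu<sub>p</sub></italic>: the CPU usage of Pod <italic>p</italic>.</p></list-item>
            <list-item><p><italic>mem<sub>p</sub></italic>: the memory usage of Pod <italic>p</italic>.</p></list-item>
            <list-item><p><italic>n</italic>: the node to be scored.</p></list-item>
            <list-item><p><italic>cpu<sub>n</sub></italic>: the CPU available on node <italic>n</italic>.</p></list-item>
            <list-item><p><italic>mem<sub>n</sub></italic>: the memory available on node <italic>n</italic>.</p></list-item>
            <list-item><p><italic>N</italic>: the set of nodes in the cluster, including node <italic>n</italic>.</p></list-item>
            <list-item><p><italic>nc<sub>n</sub></italic>: the network costs between node <italic>n</italic> and all the other nodes <italic>m</italic>.</p></list-item>
            <list-item><p><italic>traffic<sub>p</sub></italic>: the traffic amounts between the Pod <italic>p</italic> and all the other Pods in the workflow.</p></list-item>
          </list>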
        </sec>
        <sec id="sec-4-3-2">
          <p>The algorithm starts by calculating the value of the variable <italic>rscore</italic>. This variable represents
the partial contribution to the final node score given by the ResourceAware scheduler plugin.
Its value is given by the weighted sum of the percentage differences between the CPU and
memory currently available on node <italic>n</italic> and the respective usage values of Pod <italic>p</italic>. The α and
β parameters are in the range between 0 and 1 and their sum is equal to 1. By changing the
values of these parameters, a different contribution to the <italic>rscore</italic> variable value is given by
the respective CPU and memory percentage differences. The higher the difference between
available resources on node <italic>n</italic> and those used by Pod <italic>p</italic>, the greater the score assigned to node <italic>n</italic>.
This makes it possible to effectively balance the load between cluster nodes and thus to reduce the shared
resource interference between Pods resulting from incorrect node resource usage estimation,
and hence its impact on application performance.</p>
          <p>Then the partial contribution to the final node score given by the NetworkAware scheduler
plugin is calculated. First the variable <italic>cost</italic> is initialized to zero. This variable represents the total
cost of communication between the Pod <italic>p</italic> and all the other Pods in the application when the Pod
<italic>p</italic> is placed on node <italic>n</italic>. The algorithm
iterates through the list of cluster nodes <italic>N</italic>. For each cluster node <italic>m</italic> the <italic>mcost</italic> variable value is
calculated. This variable represents the cost of communication between the Pod <italic>p</italic> and all the
other Pods <italic>m.pods</italic> currently running on node <italic>m</italic> when the Pod <italic>p</italic> is placed on node <italic>n</italic>. For each
Pod <italic>q</italic> running on node <italic>m</italic> the <italic>traffic<sub>p,q</sub></italic> parameter value is multiplied by the network cost
<italic>nc<sub>n,m</sub></italic> between node <italic>n</italic> and node <italic>m</italic> and added to the <italic>mcost</italic> variable. The <italic>mcost</italic> variable value is
then added to the <italic>cost</italic> variable. The final partial node score contribution of the NetworkAware
scheduler plugin is assigned to the variable <italic>nscore</italic> as the opposite of the <italic>cost</italic> variable value.
The <italic>nscore</italic> variable value is assigned in such a way that the Pod <italic>p</italic> is placed on the node, or on
a nearby node in terms of network latencies, where the Pods with which the Pod <italic>p</italic> exchanges
the greatest amount of traffic are executed.</p>
          <p>Then the final node score <italic>score</italic> is calculated as the weighted sum of the <italic>rscore</italic> and <italic>nscore</italic>
variable values, where the γ and δ parameters are in the range between 0 and 1 and their sum
is equal to 1. By changing the values of these parameters, a different contribution to the <italic>score</italic>
variable value is given by the <italic>rscore</italic> and <italic>nscore</italic> values.</p>
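          <p>The scoring steps of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the plugin code: the function and argument names are chosen here, the percentage difference is assumed to be normalised by the amount available on the node, and the network cost of a node towards itself is assumed to be zero. The real plugins would run inside the Kubernetes scheduler and read these values from annotations.</p>
          <preformat><![CDATA[
```python
from typing import Dict, List


def node_score(cpu_p: float, mem_p: float,
               cpu_n: float, mem_n: float,
               node: str,
               pods_on_node: Dict[str, List[str]],
               network_cost: Dict[str, float],
               traffic: Dict[str, float],
               alpha: float = 0.5, beta: float = 0.5,
               gamma: float = 0.5, delta: float = 0.5) -> float:
    """Score candidate `node` for a Pod with usage (cpu_p, mem_p).

    pods_on_node maps every cluster node to the Pods running on it,
    network_cost maps each other node to its cost from `node`, and
    traffic maps each peer Pod to the traffic exchanged with it.
    """
    # ResourceAware part: weighted percentage headroom on the node.
    # Assumption: the difference is normalised by the available amount.
    rscore = (alpha * (cpu_n - cpu_p) / cpu_n * 100
              + beta * (mem_n - mem_p) / mem_n * 100)

    # NetworkAware part: total communication cost if placed on `node`.
    cost = 0.0
    for other, pods in pods_on_node.items():
        # Assumption: co-located Pods communicate at zero network cost.
        hop = 0.0 if other == node else network_cost[other]
        mcost = 0.0
        for q in pods:
            mcost += traffic.get(q, 0.0) * hop
        cost += mcost
    nscore = -cost

    # Final score: weighted sum of the two partial contributions.
    return gamma * rscore + delta * nscore
```
]]></preformat>
          <p>With equal weights, a node that already hosts the Pod's heaviest communication peer receives a higher score than an otherwise identical remote node, because the remote placement adds traffic multiplied by network cost to the total cost.</p>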
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>3.4. Custom Descheduler</title>
        <p>The custom descheduler runs as a controller in the Kubernetes control plane. The main business
logic of the custom descheduler is implemented by a descheduling function that is called
periodically for each application Pod to establish whether that Pod should be rescheduled. Inside
the descheduling function, the same node scoring function implemented by the custom scheduler
and shown in Algorithm 1 is invoked for each cluster node in order to assign it a score based
on the current cluster and workflow graphs determined by the cluster and workflow monitoring
agents, respectively. If there is at least one node with a higher score than that of the node where
the Pod is currently executed, the descheduler evicts the Pod. As in the case of the default
Kubernetes descheduler, the proposed custom descheduler does not schedule a replacement
for evicted Pods but relies on the custom scheduler for that. The proposed custom
descheduler gives the running application Pods the possibility to be rescheduled on
the basis of the current cluster network latencies and computational resource availability on
each node, as well as the traffic exchanged between microservices and their computational resource
usage, thus allowing the application placement to be optimized at run time. By evicting currently
running Pods and then forcing them to be rescheduled, application scheduling can take into
account the ever-changing cluster and application states, with the latter mainly influenced by the
user request load and patterns.</p>
        <p>One limitation of the proposed approach is that Pod eviction can
cause downtime in the overall application. However, it should be considered that cloud-native
microservices are typically replicated, so the temporary shutdown of one instance generally
causes only a graceful degradation of the application quality of service. To reduce the impact of
Pod rescheduling, during each execution the descheduler evicts at most one Pod among the replicas
of a single Deployment.</p>
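        <p>The descheduling decision described above can be sketched as follows. This Python function is an illustrative assumption rather than the authors' controller: the names are hypothetical, the scoring function is passed in as a callable standing in for Algorithm 1, and the function only returns the Pods that would be evicted instead of calling the Kubernetes eviction API.</p>
        <preformat><![CDATA[
```python
from typing import Callable, Dict, List, Set


def pods_to_evict(placement: Dict[str, str],
                  deployment_of: Dict[str, str],
                  nodes: List[str],
                  score: Callable[[str, str], float]) -> List[str]:
    """Return the Pods to evict during one descheduler execution.

    placement maps each Pod to its current node, deployment_of maps each
    Pod to its owning Deployment, and score(pod, node) stands in for the
    node scoring function of Algorithm 1.
    """
    evicted: List[str] = []
    touched: Set[str] = set()
    for pod, current in sorted(placement.items()):
        deployment = deployment_of[pod]
        if deployment in touched:
            continue  # at most one eviction per Deployment per execution
        current_score = score(pod, current)
        # Evict if at least one other node scores strictly higher.
        if any(score(pod, n) > current_score for n in nodes if n != current):
            evicted.append(pod)
            touched.add(deployment)
    return evicted
```
]]></preformat>
        <p>Iterating in sorted order is only a choice made to keep the sketch deterministic; the text does not say which replica of a Deployment is preferred when several could be moved.</p>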
        <p>[Figure 2: Structure of the sample DAG workflow application, composed of microservices m0 to m12 and database servers db1 to db4.]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Evaluation</title>
      <p>The proposed solution has been validated using a sample DAG workflow application executed
on a test bed environment. The application, whose structure is depicted in Figure 2, is composed
of different microservices and database servers. Microservice <italic>m</italic><sub>0</sub> represents the entry point for
external traffic coming from end users and input data sources. Incoming requests are served by backend microservices that interact with
each other by means of network communication.</p>
      <p>The test bed environment for the experiments consists of a Kubernetes cluster with one master
node and five worker nodes. These nodes are deployed as virtual machines on a Proxmox<sup>7</sup>
physical node and configured with 8 GB of RAM and 2 vCPUs. In order to simulate a realistic
Cloud-to-Edge continuum environment with geo-distributed nodes, network latencies between
cluster nodes are simulated by using the Linux traffic control (<monospace>tc</monospace><sup>8</sup>) utility. With this utility,
network latency delays are configured on the virtual network cards of the cluster nodes.</p>
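      <p>As a rough illustration of how such delays can be configured, the Python sketch below builds the <monospace>tc</monospace> command line that attaches a netem queueing discipline with a fixed delay to a network interface. The interface name <monospace>eth0</monospace> and the helper name are assumptions made for this example; the paper does not state the exact commands used, nor how the delay is split between the two endpoints of a node pair.</p>
      <preformat><![CDATA[
```python
def netem_delay_command(interface: str, delay_ms: int) -> str:
    """Return a tc command adding a fixed egress delay on `interface`.

    The command is only built as a string here; running it requires root
    privileges on the target node.
    """
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    return f"tc qdisc add dev {interface} root netem delay {delay_ms}ms"


# One command per latency value; "eth0" is a hypothetical interface name.
SCENARIO_COMMANDS = {
    delay: netem_delay_command("eth0", delay) for delay in (10, 100, 200)
}
```
]]></preformat>
      <p>Because netem delays outgoing packets on the interface it is attached to, applying the same delay on both endpoints roughly doubles the measured round-trip time, which is worth keeping in mind when mapping a target latency to a configured delay.</p>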
      <p>We conduct black-box experiments by evaluating the end-to-end response time of the workflow
application when HTTP requests are sent to the microservice <italic>m</italic><sub>0</sub> with a specified number of
virtual users, each sending one request every second in parallel. Requests to the application
are sent through the k6 load testing utility<sup>9</sup>. Each experiment consists of 10 trials, during
which the k6 tool sends requests to the microservice <italic>m</italic><sub>0</sub> for 40 minutes. For each trial, statistics
about the end-to-end application response time are measured and averaged with those of the
other trials of the same experiment. For each experiment we compare two cases: one in which our
cluster and workflow monitoring agents and custom scheduler and descheduler components
are deployed on the cluster, and one in which only the default Kubernetes scheduler is present. We
consider three different scenarios based on the network latency between the cluster nodes:
10 ms, 100 ms and 200 ms. In all the scenarios the α and β parameters of the ResourceAware
plugin of the custom scheduler are assigned the same value of 0.5, so that the CPU and
memory percentage differences between the respective resource availability on cluster nodes
and the resource usage of Pods contribute equally to the <italic>rscore</italic> variable value. Similarly, the γ
and δ parameters are assigned the same value of 0.5, so that the <italic>rscore</italic> and <italic>nscore</italic>
variable values, determined by the ResourceAware and NetworkAware plugins respectively,
contribute equally to the final node score.</p>
      <p>[Figure 3: 95th percentile of the application response time (ms) as a function of the number of virtual users, for the proposed approach and the default scheduler, in the three network latency scenarios.]</p>
      <p>Figure 3 illustrates the results of the three experiments performed, one for each scenario,
showing the 95th percentile of the application response time as a function of the number of
virtual users that send requests to the application in parallel. In all cases, the proposed
approach performs better than the default Kubernetes scheduler, with average improvements of
39%, 56% and 66% in the three scenarios respectively. In the first scenario, network
communication has no significant impact on the application response time because of the low node-to-node
network latencies. Thus, the proposed network-aware scheduling strategy does not lead to large
improvements in the application response time. Furthermore, for a low number of virtual users
the proposed approach performs similarly to the default scheduler. This is because the
shared resource interference between Pods is limited, even though they are placed on the same nodes
by the default scheduler. However, when the number of virtual users increases, the proposed
approach performs better than the default scheduler, with larger improvements for higher
numbers of virtual users. The response time in the case of the default scheduler grows faster
than in the case of the proposed approach. This is due to the proposed resource-aware
scheduling strategy, which distributes Pods on cluster nodes based on their run-time resource
usage, thereby reducing the shared resource interference between Pods. In the other scenarios,
network communication becomes a bottleneck for the application response time and the lack
of a network-aware scheduling strategy leads to high response times. In these scenarios our
approach performs better than the default scheduler even for low numbers of virtual users,
with larger improvements for higher node-to-node network latencies. One consequence of
combining a resource-aware and a network-aware scheduling strategy in our approach
is that, when the number of virtual users increases, the response time grows much faster for high
network latencies, though it remains lower than the response time in the case of the default
scheduler. This can be explained by the fact that, for a higher number of virtual users and hence
a higher request load, the average resource usage of microservices increases, so the
resource-aware scheduling strategy spreads Pods across more cluster nodes.
This leads to an increase in the network latency between application microservices and
therefore in the end-to-end application response times.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Related Work</title>
      <p>In the literature, a variety of works propose to extend the Kubernetes platform
in order to adapt its usage to the orchestration of microservices-based applications, and in
particular DAG workflows, on the Cloud-to-Edge continuum [15, 16].</p>
      <p>NetMARKS [17] is a Kubernetes scheduler extender that uses dynamic network metrics
collected with the Istio service mesh to ensure an efficient placement of Service Function Chains,
based on the historical amount of traffic exchanged between services. The proposed scheduler,
however, does not consider run-time cluster network conditions in its placement decisions.</p>
      <p>The authors of [18] propose to leverage application-level telemetry information during the
lifetime of an application to create service communication graphs that represent the internal
communication patterns of all components. The graph-based representations are then used to
generate colocation policies of the application workload in such a way that the cross-server
internal communication is minimized. However, in this work scheduling decisions are not
influenced by the cluster network state.</p>
      <p>In [19] a scheduling framework is proposed which enables edge-sensitive and Service-Level
Objectives (SLO) aware scheduling in the Cloud-Edge-IoT Continuum. The proposed scheduler
extends the base Kubernetes scheduler and makes scheduling decisions based on a service
graph, which models application components and their interactions, and a cluster topology
graph, which maintains current cluster and infrastructure-specific states. However, this work
does not consider historical information about the traffic exchanged between microservices in
order to determine their run-time communication affinity.</p>
      <p>In [20] Nautilus is presented, a run-time system that includes, among its modules, a
communication-aware microservice mapper. This module divides the microservice graph
into multiple partitions based on the communication overhead between microservices and
maps the partitions to the cluster nodes so that frequent data interactions complete
in memory. While the proposed solution migrates application Pods if computational resource
utilization is unbalanced among nodes, there is no Pod rescheduling in the case of degradation
in the communication between microservices.</p>
      <p>In [21] Pogonip, an edge-aware scheduler for Kubernetes designed for asynchronous
microservices, is presented. The authors formulate the placement problem as an Integer Linear Programming
optimization problem and define a heuristic to quickly find an approximate solution for
real-world execution scenarios. The heuristic is implemented as a set of Kubernetes scheduler
plugins. In this work too, there is no Pod rescheduling if network conditions change over time.</p>
      <p>In [22] an extension to the Kubernetes default scheduler is proposed that uses information
about the status of the network, such as bandwidth and round-trip time, to optimize batch job
scheduling decisions. The scheduler predicts whether an application can be executed within
its deadline and rejects applications if their deadlines cannot be met. Although information
about current network conditions and historical job execution times is used during scheduling
decisions, communication interactions between microservices are not considered in this work.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions</title>
      <p>In this work we proposed to extend the Kubernetes platform to adapt it to the
orchestration of complex DAG workflows executed on the Cloud-to-Edge continuum. The main goal
is to overcome the limitations of the Kubernetes static scheduling policies when dealing with
the placement of DAG workflows in highly distributed environments. The idea is to make the
Kubernetes scheduler aware of the run-time communication intensity between the workflow
microservices, their resource usage, and the cluster network conditions, so that its scheduling
decisions aim to reduce the overall workflow response time. Furthermore, a descheduler is
proposed to dynamically reschedule microservices if better scheduling decisions can be made
based on the ever-changing application and cluster network states. As future work we plan
to improve the proposed scheduling and descheduling strategies by using AI techniques, in
particular those in the field of Reinforcement Learning, in order to design more sophisticated
algorithms that take into account historical information about both the infrastructure and
application run-time states.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Salaht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Desprez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lebre</surname>
          </string-name>
          ,
          <article-title>An overview of service placement problem in fog and edge computing</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>53</volume>
          (
          <year>2020</year>
          ). doi:10.1145/3391196.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Calcaterra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Di Modica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tomarchio</surname>
          </string-name>
          ,
          <article-title>Cloud resource orchestration in the multi-cloud landscape: a systematic review of existing frameworks</article-title>
          ,
          <source>Journal of Cloud Computing</source>
          <volume>9</volume>
          (
          <year>2020</year>
          ). doi:10.1186/s13677-020-00194-7.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Varghese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>de Lara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonomi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dustdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Harvey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hewkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thiele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Willis</surname>
          </string-name>
          ,
          <article-title>Revisiting the arguments for edge computing research</article-title>
          ,
          <source>IEEE Internet Computing</source>
          <volume>25</volume>
          (
          <year>2021</year>
          )
          <fpage>36</fpage>
          -
          <lpage>42</lpage>
          . doi:10.1109/MIC.2021.3093924.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <article-title>Edge computing for internet of everything: A survey</article-title>
          ,
          <source>IEEE Internet of Things Journal</source>
          <volume>9</volume>
          (
          <year>2022</year>
          )
          <fpage>23472</fpage>
          -
          <lpage>23485</lpage>
          . doi:10.1109/JIOT.2022.3200431.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Goudarzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palaniswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Buyya</surname>
          </string-name>
          ,
          <article-title>Scheduling iot applications in edge and fog computing environments: A taxonomy and future directions</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>55</volume>
          (
          <year>2022</year>
          ). doi:10.1145/3544836.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W. Z.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hakak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Yaqoob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <article-title>Edge computing: A survey</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>97</volume>
          (
          <year>2019</year>
          )
          <fpage>219</fpage>
          -
          <lpage>235</lpage>
          . doi:10.1016/j.future.2019.02.050.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedlinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bernijazov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hanke</surname>
          </string-name>
          ,
          <article-title>AI Marketplace: Serving Environment for AI Solutions using Kubernetes</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on Cloud Computing and Services Science - CLOSER</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>269</fpage>
          -
          <lpage>276</lpage>
          . doi:10.5220/0000172900003488.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.-A.</given-names>
            <surname>Corodescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Soylu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matskin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Payberah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roman</surname>
          </string-name>
          ,
          <article-title>Big data workflows: Locality-aware orchestration using software containers</article-title>
          ,
          <source>Sensors</source>
          <volume>21</volume>
          (
          <year>2021</year>
          ). doi:10.3390/s21248212.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kayal</surname>
          </string-name>
          ,
          <article-title>Kubernetes in fog computing: Feasibility demonstration, limitations and improvement scope: Invited paper</article-title>
          ,
          <source>in: 2020 IEEE 6th World Forum on Internet of Things (WF-IoT)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi:10.1109/WF-IoT48130.2020.9221340.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>