<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Weighted Load Balancing Mechanisms over Streaming Big Data for Online Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petros Petrou</string-name>
          <email>ppetrou@ubitech.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sophia Karagiorgou</string-name>
          <email>skaragiorgou@ubitech.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitrios Alexandrou</string-name>
          <email>dalexandrou@ubitech.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UBITECH LTD</institution>
          ,
          <addr-line>Thessalias 8 and Etolias 10, Chalandri, 15231</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A growing number of complex applications, such as cloud and / or mobile computing, video on-demand and streaming big data analytics are influenced by the growth of users, devices and connections. At the same time, the microservices architecture paradigm enables software developers divide their application into small, independent, and loosely coupled services that can be hosted on multiple machines, thus enabling horizontal scale up. In this paper, we study how a weighted load balancing approach can improve application performance and online machine learning over streaming big data, specifically in Kubernetes-based environments. We introduce an automated process which prioritizes events by eficiently managing the communication among the interacting services (i.e. pods) through adaptive trafic routing and dynamic rules enforcement allowing to control the flow of data and API calls among them. This process guarantees services stability and autoscaling at runtime. We demonstrate the proposed approach in a prototype microservices application consisting of containerized and deployed pods on Kubernetes, named as Information Aware Networking Mechanisms. These mechanisms have been integrated into a sophisticated framework which takes into account the number of requests per second or the volume of data per hour and supports weighted load balancing mechanisms to minimize inter-pods communication and prioritize important events to an online machine learning model which is crucial for the examined application. The Information Aware Networking Mechanisms have been openly made available for deployment and experimentation to the research community to build upon.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Globally, devices, connections and applications are growing faster
(10% CAGR) than both the population (1% CAGR) and the Internet
users (6% CAGR) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This increasing use of services functioning
over online and ofline processing modes in COVID-19 era and
the rising demand for big data analysis require optimizations and
eficient trafic management mechanisms.
      </p>
      <p>
        The existing and emerging trend in Information and
Communication Technology (ICT) is towards high performance and
robust applications featuring higher network speeds at the
infrastructure level in order to fulfill the daily communication and
business needs. This trend is accelerating the increase in the
average number of devices and connections per user. Each year,
new devices with various functionalities, increased capabilities
and intelligence are introduced and adopted in the market. In
addition to this, a growing number of complex applications, such as
cloud-mobile computing serving web-mobile applications,
UltraHigh-Definition (UHD) video on-demand and big data analytic
applications are influenced by the growth of users, devices and
connections. Therefore, it is important to multiplex the changing
variance of user growth, devices, connections and data trafic in
multi-device ownership. Multimodal web-mobile or UHD video
applications, in particular, may have a multiplier efect on
trafifc which can only be controlled by high-capacity engineering
mechanisms at both network and application layers. For instance,
a video on-demand application in a household generates on an
average more streaming content than a SME comprised of 2-3
employees. At the same time, the pandemic pushed 41% of
connected consumers to stay home and shop online from their sofas,
avoiding crowded stores and waiting in lines at malls [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Recently, network engineering mechanisms [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] have matured
in cloud platforms [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] providing “network connectivity as a
service”. The latter facilitates a very rich ecosystem of
networking solutions and services, such as Load-Balancer-as-a-Service
(LBaaS), Firewall-as-a-Service (FWaaS) and VPN-as-a-Service
(VPNaaS). The current shortcoming with respect to networking
in such platforms includes standalone services and containers
networking. The issue is that every new networking solution
proposes and tries to reinvent how containers [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] will be
interacting by adding more complexity or multiple encapsulation among
services.
      </p>
      <p>
        Cloud-native applications rely on characteristics such as
horizontal / vertical scaling and elastic services ofered by the
underlying platforms. Interest in ofloading application level networking
functionality such as trafic and security management onto
service meshes is increasing in the community and becoming a
critical element of big data cloud infrastructures [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For instance,
Istio [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is an open source service mesh management platform
that layers transparently onto existing distributed cloud
applications. This service mesh management can be further enriched
by Kiali [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] which provides observability through dashboards
and enables to operate the mesh with robust configuration and
validation capabilities.
      </p>
      <p>To accommodate the fast expanding trafic and the demanding
nature of streaming data analytics in multimodal web-mobile (i.e.
connected consumers) or video on-demand applications,
automated weighted load balancing mechanisms are required over
the underlying resources to enhance the capacity of existing
computer networks, cloud infrastructures or data centres serving
computationally intensive tasks. To enable heterogeneous and
converging big data clouds being dynamically adaptive to changes,
research is needed to identify and quantify the performance
metrics which impact these environments. At the same time,
understanding containerized platforms and how their
characteristics afect the application performance enables to predictably
achieve cost-efective and resource-aware deployments.</p>
      <p>
        This paper is concentrated on an automated process which
enables to prioritize events by eficiently managing the
communication among the interacting pods through adaptive routing
and policies enforcement allowing to control the flow of trafic
and API calls between microservices. This process guarantees at
runtime services stability, autoscaling and advanced performance.
We demonstrate the proposed approach in a prototype
microservices application consisting of containerized and deployed pods
on a Kubernetes cluster, named as Information Aware Networking
Mechanisms consisting of two logical units, i.e., a Load Generator
and an Autoscaler. Istio has been deployed on the cluster to
establish pod level communications, delegate trafic flows and filter
requests. We dive into the characteristics and functionalities of
the Information Aware Networking Mechanisms delivered as an
open source deployment and experimental framework, through
which cloud administrators and big data engineers may benefit
from in implementing their own network engineering
automations. The Information Aware Networking Mechanisms are used
to conduct experiments related with the velocity of requests, the
volume of requests and the response time. The benefit is that the
Autoscaler results in eficiently handling more requests per
second or more data volume per hour through runtime adaptations
by scaling up more pods without afecting the response time of
the application. Our contributions in this paper are as follows:
• A presentation of how network trafic is routed among
Kubernetes pods with Istio service mesh and how network
policies are enforced to realize trafic management and
observability in order to achieve incremental ML Model
training and update over big streaming data.
• An experimental exploration of how to correctly
configure the Istio service mesh enabled with sidecar injection
featuring pods telemetry monitoring and visualization
through Kiali.
• A data experimentation using Apache JMeter [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and the
Prometheus [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] monitoring system in order to analyse
the enforcement of network policies and prioritization
schemes (i.e. response time, requests per second, data
volume per hour) based on Service-Level Objectives (SLOs)
defined as concrete metrics.
• A fully open source deployment and experimental
framework released under the Apache License1,2,3,4, enabling
trafic prioritization through weighted load balancing,
access control and rate limit across diverse protocols and
runtimes which is designed for scalability and
reproducibility.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>Many circumstances necessarily force changes in modern
applications, such as the fast increase in the use of cloud computing,
the extensive use of multimedia streaming, the time
consuming task of ML Model training and update over big data or the
high-speed, high-throughput and low-latency of user application
requirements.</p>
      <p>
        Cloud environments are based on predefined service, network
and security policies and are dificult to be configured
dynamically. In principle, these are time-shared environments with high
variance, and thus, performance testing requires repeated
experiments, consideration of multiple objectives (i.e. cloud scalability,
maximized usage of memory, reduced disk I/O, minimized data
transfer over the network or parallel processing to fully
leverage multi-processors) and statistical analysis [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The proposed
approach takes advantage of inter-pods communication to
dynamically configure network policies and data trafic to targeted
pods constituting the end-to-end application during runtime
1http://bigdatastack-tasks.ds.unipi.gr/ppetrouubi/istioyaml
2http://bigdatastack-tasks.ds.unipi.gr/ppetrouubi/istioproxy
3http://bigdatastack-tasks.ds.unipi.gr/ppetrouubi/istiopod
4http://bigdatastack-tasks.ds.unipi.gr/ppetrouubi/istiopodconsumer
based on defined SLOs (i.e. response time, requests per second,
data volume per hour).
      </p>
      <p>
        Although there is a growing interest in and rapid adoption of
containers, running them in production requires a steep learning
curve, due to technological immaturity and lack of operational
know-how [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Henning and Hasselbring [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] introduced a method
for benchmarking the scalability of distributed stream processing
engines. Within their method, they presented use cases where
the stream processing microservices need to fulfill multiple
constraints along with an in advance knowledge w.r.t. how the
demand of resources is evolving with increasing workloads. Our
approach is diferent because the defined SLOs are monitored
throughout the application life cycle to drive runtime adaptations
via pods control in the service mesh and prioritization schemes
enforced by the network policies.
      </p>
      <p>
        Eismann et. al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] presented the benefits and challenges of
microservices from a performance tester’s point of view. Through
a series of experiments, they demonstrated how heterogeneous
microservices afect the application performance. They also
presented that it is not trivial to achieve reliable performance testing
and that the process of rating cloud infrastructure oferings based
on their achieved elastic scaling still remains uncharted. The
latter served both as a challenge and motivation for us to elaborate
over the novel concept of runtime adaptations, network policies
and pods prioritization schemes.
      </p>
      <p>
        Herbst [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proposed a descriptive load profile modeling
framework together with automated model extraction from recorded
traces to enable reproducible workload generation with realistic
load intensity variations for improved applications elasticity. The
proposed mechanisms in this paper exploit load profiling to
dynamically trigger adaptations and ML Model update at runtime.
      </p>
      <p>The essential problem of dealing with big data is, in fact, a
resource issue. Because the larger the volume of the data, the
more the resources are required, in terms of memory, processors,
and disks. The goal of performance optimization is to either
reduce resource usage or make it more eficient to fully utilize
the available resources, in order to take less time to read, write,
or process the data.</p>
      <p>
        Grohmann et. al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] use machine learning techniques to
automatically recommend the best suitable approach for services
demand estimation. Their approach works in an online fashion
and incorporates new measurement data and changing
characteristics on-the-fly. Lin et. al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] propose and answer two research
questions regarding the prediction and optimization of
performance and cost of serverless applications. They propose a new
construct to formally define a serverless application workflow,
and then implement analytical models to predict the average
end-to-end response time and the cost of the workflow.
      </p>
      <p>Compared to the above mentioned approaches, the proposed
framework difers by introducing Information Aware Networking
Mechanisms contributing in autoscaling and resulting in
incremental ML Model training and update of a real-world connected
consumer application at runtime. In an environment under high
load, service demands normally cannot be directly measured,
and therefore a number of estimation approaches exist based on
high-level performance metrics. We show that service demands
by means of velocity, volume and real-time constraints can be
eficiently handled by weighted load balancing approaches which
give us encouraging indications for further research on the
application of service demand routing. The metrics that we take into
consideration refer to the number of requests per second, the data
volume per hour and the desired response time. The experiments
over the proposed Information Aware Networking Mechanisms
show that while Istio does cost in terms of adding some more
resources, it is able to support a better overall performance and
scalability for highly loaded environments in terms of
successfully served requests, without requests rejection especially for
data intensive operations in real-time.</p>
    </sec>
    <sec id="sec-3">
      <title>3 INFORMATION AWARE NETWORKING</title>
    </sec>
    <sec id="sec-4">
      <title>MECHANISMS</title>
    </sec>
    <sec id="sec-5">
      <title>3.1 Preliminaries</title>
      <p>To facilitate the reader follow some basic concepts, we briefly
present the core features of Istio. The Istio service mesh is divided
in two logical units: a control plane and a data plane. The control
plane manages and configures proxies to route trafic. It also
conifgures components to enforce policies and collect telemetry data.
In the data plane, Istio support is added to a service by
deploying an intelligent sidecar proxy, called Envoy. This sidecar proxy
routes requests to and from other proxies, mediates and controls
all network communication between microservices by forming
the mesh network. Figure 1 shows the diferent components that
make up each plane, as well as the high-level architecture of Istio.</p>
      <p>3.2.1 Load Generator. Data is harvested by two main sources.
The Load Generator using JMeter generates variant workloads
and simulates how the environment under test behaves from
the perspective of an external observer. In addition to this, the
Prometheus monitoring system deployed in the Kubernetes
cluster reports on the internally observable behavior. The Load
Generator has been deployed in a distributed mode as it is depicted
in Figure 2. In this figure, JMeter initiates one controller node
which launches tests on multiple worker nodes.</p>
      <p>The configuration of the experiments was made by setting
properties in a .jmx file through JMeter, which generates external
trafic mimicking the behaviour of multiple HTTP clients. In this
ifle, we configure the number of concurrent users accessing the
application, the velocity / number of requests per second and
the volume of requests measured in GB/hour. A sample .jmx file
containing: a) important events (i.e. REC_RMV) which are getting
higher priority by the Autoscaler because they are considered
more crucial for the ML Model training and update; and b) events
(i.e. VIZ_PROD) which are equally treated as they are arriving, is
presented as follows. More technical details about the users (i.e.
JMeter Workers) are presented in Section 4.
&lt;events&gt;
&lt;elementProp name="productId" elementType="HTTP"&gt;
&lt;strProp name="Arg.name"&gt;productId&lt;/stringProp&gt;
&lt;strProp name="Arg.value"&gt;1&lt;/stringProp&gt;
&lt;/elementProp&gt;
&lt;elementProp name="customerId" elementType="HTTP"&gt;
&lt;strProp name="Arg.name"&gt;customerId&lt;/stringProp&gt;
&lt;strProp name="Arg.value"&gt;14&lt;/stringProp&gt;
&lt;/elementProp&gt;
&lt;elementProp name="eventType" elementType="HTTP"&gt;
&lt;strProp name="Arg.name"&gt;eventType&lt;/stringProp&gt;
&lt;strProp name="Arg.value"&gt;REC_RMV&lt;/stringProp&gt;
&lt;/elementProp&gt;
&lt;/events&gt;
&lt;events&gt;
&lt;elementProp name="productId" elementType="HTTP"&gt;
&lt;strProp name="Arg.name"&gt;productId&lt;/stringProp&gt;
&lt;strProp name="Arg.value"&gt;2&lt;/stringProp&gt;
&lt;/elementProp&gt;
&lt;elementProp name="customerId" elementType="HTTP"&gt;
&lt;strProp name="Arg.name"&gt;customerId&lt;/stringProp&gt;
&lt;strProp name="Arg.value"&gt;12&lt;/stringProp&gt;
&lt;/elementProp&gt;
&lt;elementProp name="eventType" elementType="HTTP"&gt;
&lt;strProp name="Arg.name"&gt;eventType&lt;/stringProp&gt;
&lt;strProp name="Arg.value"&gt;VIZ_PROD&lt;/stringProp&gt;
&lt;/elementProp&gt;
&lt;/events&gt;</p>
      <p>
        3.2.2 Autoscaler. With the convergence of all data operations
and services in the same network mesh, the Autoscaler
manages trafic by taking into consideration the network utilization,
the pods requirements and the communication latency without
compromising the eficiency of the serving application. Using
policy statements, the end users can specify which kinds of pods
they desire to give weighted load priority, at what times and
on what part of their communication protocol (i.e. TCP, HTTP,
etc.). By deploying and configuring the Istio service mesh [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
and the Autoscaler at the experimental testbed all the data
metrics are collated by Prometheus Mixer [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and stored in the
Prometheus [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] monitoring system. Kiali [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] uses the data
stored in Prometheus to show the service mesh topology, metrics,
trafic information and more. The beneficiaries of the Autoscaler
can be cloud administrators and / or big data engineers with the
ability to either configure their big data flows or the application
requirements with specific networking primitives which fulfill
the desired SLOs (i.e. bursty or voluminous pods demand and
requests able to be eficiently served). The objective may refer to
the eficient management of various kinds of trafic (i.e. streams,
batches and micro batches) by getting the availability isolation
and bandwidth priority that are needed to serve the end-to-end
application. A common configuration as part of the architecture
design which concretizes the data flow events and the respective
logical interaction of the pods is depicted as follows.
underlying pods (i.e. distributed object storage, ML Model
training and update as well as other serving pods) without the
restriction to only use HTTP. The workloads in the experimental
testbed communicate without IP encapsulation or network
address translation for improved bare metal performance, which
enables easy troubleshooting and better interoperability.
      </p>
      <p>Through Kiali, we visualize mesh network health between the
interacting pods where the Autoscaler routes the weighted loads
(i.e. events A vs. other events) to the ML Model. For instance,
through the Kiali dashboard we can see in Figure 4 the request
duration of the Proxy Service stressing the Autoscaler by means
of average, median (i.e p50) and maximum (i.e. p95) quantiles.</p>
      <p>The deployed microservices / pods and their interaction are
described in YAML files. In order to enable Istio service mesh for
pods at the experimental testbed, we add "sidecar.istio.io/inject:
"true" in the YAML file. The Proxy Service based on Istio
policies acts as a gateway which receives external trafic. Then, the
Autoscaler Service through the Istio service mesh sidecar splits
the requests / trafic based on the application requirements by
ifltering accordingly the type of events. In Figure 3, the important
events (i.e. real-time A events) get more priority (i.e. 90%) and are
assigned a greater bandwidth within the cluster compared to the
events (i.e. real-time other events) related with more trivial user
interactions, e.g. to visualize a product or equivalent. As the
endto-end application under experimentation serves a connected
consumer cluster of pods, the real-time events of type A relate
with transactions directly afecting the ML Model. Specifically,
the real-time events of type A contribute in the online training of
the ML Model which implements the Alternating Least Squares
matrix factorization algorithm in the Apache Spark MLlib. This
ML Model supports a recommender pod which posts
personalized suggestions to the end users of the connected consumer
application.</p>
      <p>To address the current challenges, the SLOs and the respective
network policies have been implemented and are supported by
the Autoscaler. At the same time, to generalize the appropriate
attributes in order to weight the trafic towards concrete
microservices / pods, we give priority to events of interest according to
their type. This operation implements the policy enforcement
endpoint inside the pod as sidecar container in the same network
namespace. This approach is highly flexible and facilitates to
apply policies in the support of operational goals, such as
service routing, prioritization schemes over data flows, retries and
circuit-breaking.</p>
      <p>The Autoscaler operates at the application layer by taking into
consideration metrics from the Kubernetes cluster and
specifically from the Prometheus monitoring system. The latter gives
the advantage of being universal. Our focus is to address the
challenges arising from the diverse data processing modes (i.e.,
stream, micro-batch, batch) to eficiently enforce policies to the
4</p>
    </sec>
    <sec id="sec-6">
      <title>EXPERIMENTS</title>
      <p>In this section, we describe our approach to experiment over the
behaviour of the Information Aware Networking Mechanisms,
consisting of two logical units which generate and prioritize
big data flows based on their characteristics and requirements.
Benchmarking with variant volumes and velocity of data is
fundamental to assess the feasibility of the examined application.
Our goal is to validate the eficiency of the proposed mechanisms
based on the data volume of requests per hour, the velocity of
requests per second and the response time. To this end, we have
defined experiments with the following goals:
• Data generation with variant volumes and velocity;
• Experimental scalability and reproducibility;
• Data control and exploration analysis before and after the
Autoscaler execution resulting in more eficient ML Model
updates.
4.1</p>
    </sec>
    <sec id="sec-7">
      <title>Configuration and Deployment</title>
      <p>To evaluate the performance of the Information Aware
Networking Mechanisms, we used a cluster of one master with 3 workers.
Each node has a 4 core QEMU Virtual CPU at 2.4GHz, and 16
GB of RAM. The Load Generator is being executed through 3
distributed pods lying in each worker and several network,
nodelevel and pod-level metrics such as latency, throughput, CPU
&amp; memory usage, disk I/O, etc. within the cluster are measured
using the Prometheus monitoring system. The configuration and
the deployment of the respective pods acting as enablers of the
Information Aware Networking Mechanisms are described through
the following YAML file.
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
name: autoscaler
namespace: istioapp
spec:
hosts:</p>
      <p>- autoscaler.istioapp.svc.cluster.local
http:
- match:
- uri:</p>
      <p>exact: /poll
route:
- destination:
host: autoscaler.istioapp.svc.cluster.local
subset: v1
weight: 90
- destination:
host: autoscaler.istioapp.svc.cluster.local
subset: v2
weight: 10
- fault:
abort:
httpStatus: 500
percentage:</p>
      <p>value: 0
match:
- uri:</p>
      <p>exact: /feedbacks/event_A
route:
- destination:
host: autoscaler.istioapp.svc.cluster.local
subset: v1
weight: 100
- fault:
abort:
httpStatus: 500
percentage:</p>
      <p>value: 0
match:
- uri:</p>
      <p>exact: /feedbacks/event_B
- uri:</p>
      <p>exact: /feedbacks/event_C
- uri:</p>
      <p>exact: /feedbacks/event_D
route:
- destination:
host: autoscaler.istioapp.svc.cluster.local
subset: v2
weight: 100
4.2</p>
    </sec>
    <sec id="sec-8">
      <title>Performance Evaluation of Autoscaler</title>
      <p>The Autoscaler plays the role of eficiently filtering, splitting and
load balancing in a weighted manner the big data flows targeting
at the ML Model. The Kiali system interacts with the Prometheus
monitoring system to collect metrics w.r.t. the data volume of
requests per hour, the velocity of requests per second and the
response time. The experimental settings of the Autoscaler are
broken into 3 phases, as follows:
• Configuration of variable data volumes and velocity of
requests, which includes the initialization of the
respective pods at the JMeter and Istio contexts. The Proxy
Service receives data from the Load Generator based on
defined metrics w.r.t. the velocity (i.e. the number of
requests per second, 1K, 2.2K, 4.5K and 9.4K) and the data
volume (i.e. 1.44GB/hour, 2.81GB/hour, 5.62GB/hour and
10.44GB/hour), while the Autoscaler splits the data flows
in other serving pods. Then, the ML Model performs an
HTTP GET request based on the defined prioritization
scheme (e.g. real-time A events = 90% and real-time other
events = 10%). The prioritization scheme can be adjusted
according to the application constraints (i.e. SLOs), while
in the examined case it achieves to automatically scale up
the serving pods contributing in the ML Model update.
• The decomposition and routing of the respective requests
is being held through the Istio service mesh. Kiali
communicates with Prometheus and gets metrics (i.e. % weighted
requests) about how the requests received by the Proxy
Service and prioritized by the Autoscaler are routed to the
ML Model pod. According to the metrics received by each
ML Model pod, the Autoscaler dynamically scales up more
(ML Model) pods before a pod’s queue saturation reaches
95%.
• The visualization of the experimental results including
the velocity of requests in total number per second, the
volume of requests in GB/hour and the response time in
seconds. These results validate that the weighted load
balancing mechanisms have been correctly activated and that
resources are eficiently scaled up to serve the application
without rejecting new data / requests, as needed.</p>
      <p>The experimental results in Figure 5 demonstrate that by
increasing the velocity of the streaming data flows (i.e. 1K, 2.2K,
4.5K and 9.4K requests per second), the Autoscaler needs to
merely scale up to 8 pods in order to eficiently serve and
prioritize more requests to the ML Model without rejecting newcoming
data. The Autoscaler itself is served by the cloud infrastructure
without adding considerable overhead in the underlying
computing resources, while it handles all the type of events and
determines to add more resources when the velocity or the volume
of important events increases. Then, the Autoscaler ensures that
the real-time prioritized events of type A will get the required
network bandwidth and computing resources. The real-time A
events are considered important because they contribute in
updating the ML Model which calculates the recommendations for
the personalized suggestions delivered to the end users of the
connected consumer application.</p>
      <p>The experimental results in Figure 6 depict that the end-to-end
response time between the Autoscaler and the ML Model remains
constant although the number of requests has been increased by
a factor of X1000 or more in each iteration without considerable
resources cost or waste, i.e. by only scaling up few pods. The
additional resources allocated to the Autoscaler permit to eficiently
prioritize more requests to the ML Model in a parallel manner
without discharging events, while maintaining the response time
low and constant by satisfying the real-time SLOs and constraints
of the streaming application.</p>
      <p>The experimental results in Figure 7 present that the
Autoscaler can eficiently manage huge volumes of data (i.e. up
to tens of GB per hour) by scaling up more pods allowing more
trafic to be served on-demand by the ML Model.</p>
    </sec>
    <sec id="sec-9">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>The purpose of this work is to present a systematic analysis in the
direction of Information Aware Networking Mechanisms which
enable eficient trafic prioritization through weighted load
balancing, access control and rate limit across diverse protocols
and runtimes. The Information Aware Networking Mechanisms
are ofered as a fully open source deployment and experimental
framework released under the Apache License which is designed
for big data clouds in the support of massive data generation,
scalability, reproducibility and data exploration. In order to analyse
the enforcement of network policies and prioritization schemes
(i.e. response time, velocity and volume of requests) based on
Service-Level Objectives (SLOs), we used Apache JMeter and the
Prometheus monitoring system. We also presented an
experimental deployment of the Istio service mesh enabled with sidecar
injection featuring pods trafic prioritization, telemetry
monitoring and visualizations through Kiali. The Autoscaler takes into
consideration several network, node-level and pod-level metrics
such as latency, throughput, CPU &amp; memory usage, disk I/O, etc.
within the cluster in order to draw decisions, enforce policies and
scale up more pods to facilitate the online training and update of
the ML Model, and therefore the streaming connected consumer</p>
      <p>In the near future, we plan to extend the Information Aware
Networking Mechanisms in order to optimize the time the route
policies need to take efect. As the modification of a policy takes
some time (i.e. seconds) in order to be propagated to all the
sidecars, we plan to intelligently enhance the networking
mechanisms. Especially in large deployments, the respective policies
need to be enforced well in advance an automatic process, like
the Autoscaler, takes any decision or actuates the addition of
more resources.</p>
    </sec>
    <sec id="sec-10">
      <title>ACKNOWLEDGMENTS</title>
      <p>The research leading to these results has received funding by
the European Commission project H2020 BigDataStack “Holistic
stack for big data applications and operations” (https://bigdatastack.
eu/) under grant agreement No. 779747.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Anne</given-names>
            <surname>Thomas Andrew Lerner</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Innovation insight for service mesh</article-title>
          .
          <source>Market report. Technical Report.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Apache</surname>
            <given-names>JMeter</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>The Apache JMeter application is open source software, a 100% pure Java application designed to load test functional behavior and measure performance</article-title>
          .
          <source>Retrieved December 20</source>
          ,
          <year>2020</year>
          from https://jmeter.apache.org/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Arun</surname>
            <given-names>Chandrasekaran</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Best Practices for Running Containers and Kubernetes in Production</article-title>
          .
          <source>Retrieved December 20</source>
          ,
          <year>2020</year>
          from https://www.gartner.com/en/documents/3902966/ best-practices
          <article-title>-for-running-containers-and-kubernetes-in-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>CISCO</given-names>
            <surname>2020. Cisco Annual Internet Report (2018-2023) White</surname>
          </string-name>
          <article-title>Paper</article-title>
          .
          <source>Retrieved December 10</source>
          ,
          <year>2020</year>
          from https://www.cisco.com/c/en/us/solutions/collateral/ executive-perspectives/annual-internet
          <source>-report/white-paper-c11-741490</source>
          . html
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Docker</surname>
            <given-names>Container</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>What is a Container? A standardized unit of software</article-title>
          .
          <source>Retrieved December 10</source>
          ,
          <year>2020</year>
          from https://www.docker.com/resources/ what-container
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Simon</given-names>
            <surname>Eismann</surname>
          </string-name>
          ,
          <string-name>
            <surname>Cor-Paul Bezemer</surname>
          </string-name>
          , W. Shang, Dusan Okanovic,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Hoorn</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Microservices: A Performance Tester's Dream or Nightmare?</article-title>
          <source>Proceedings of the ACM/SPEC International Conference on Performance Engineering</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Grohmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Herbst</surname>
          </string-name>
          , Simon Spinner, and
          <string-name>
            <given-names>Samuel</given-names>
            <surname>Kounev</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Using Machine Learning for Recommending Service Demand Estimation Approaches - Position Paper</article-title>
          .
          <source>In CLOSER.</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Henning</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Hasselbring</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines</article-title>
          . ArXiv abs/
          <year>2009</year>
          .00304 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Herbst</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Methods and Benchmarks for Auto-Scaling Mechanisms in Elastic Cloud Environments</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <article-title>Istio 2020</article-title>
          .
          <article-title>Connect, secure, control, and observe services</article-title>
          .
          <source>Retrieved December 10</source>
          ,
          <year>2020</year>
          from https://istio.io/
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <article-title>Kiali 2020</article-title>
          .
          <article-title>Service mesh management for Istio</article-title>
          .
          <source>Retrieved December 10</source>
          ,
          <year>2020</year>
          from https://kiali.io/
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Lauren</surname>
            <given-names>Thomas</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Black Friday 2020 online shopping surges 22% to record $9 billion, Adobe says</article-title>
          .
          <source>Retrieved December 10</source>
          ,
          <year>2020</year>
          from https://cutt.ly/ph0IMfI
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lin</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Khazaei</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Modeling and Optimization of Performance and Cost of Serverless Applications</article-title>
          .
          <source>IEEE Transactions on Parallel and Distributed Systems</source>
          <volume>32</volume>
          ,
          <issue>3</issue>
          (
          <year>2021</year>
          ),
          <fpage>615</fpage>
          -
          <lpage>632</lpage>
          . https://doi.org/10.1109/TPDS.
          <year>2020</year>
          .3028841
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>OpenStack</surname>
          </string-name>
          <year>2020</year>
          .
          <article-title>Cloud Infrastructure for Virtual Machines, Bare Metal, and Containers</article-title>
          .
          <source>Retrieved December 10</source>
          ,
          <year>2020</year>
          from https://www.openstack.org/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>OpenStack</surname>
          </string-name>
          <article-title>Neutron 2020</article-title>
          .
          <article-title>Neutron's documentation</article-title>
          .
          <source>Retrieved December 10</source>
          ,
          <year>2020</year>
          from https://docs.openstack.org/neutron/pike
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Vittorio</surname>
          </string-name>
          <string-name>
            <surname>Papadopoulos</surname>
          </string-name>
          , Laurens Versluis, André Bauer, Nikolas Herbst, Jóakim Von Kistowski, Ahmed Ali-Eldin, Cristina Abad, José Nelson Amaral, Petr Tma, and
          <string-name>
            <given-names>Alexandru</given-names>
            <surname>Iosup</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Methodological principles for reproducible performance evaluation in cloud computing</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <article-title>Prometheus 2020</article-title>
          .
          <article-title>An open-source monitoring system with a dimensional data model, flexible query language, eficient time series database and modern alerting approach</article-title>
          .
          <source>Retrieved December 20</source>
          ,
          <year>2020</year>
          from https://prometheus.io/
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Prometheus</surname>
            <given-names>Mixer</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Prometheus mixer is a query and remote read API proxy that reads raw samples from multiple Prometheus remote read endpoints, mixes them into time-series normalized to a single sample per interval and runs a query on the resulting data</article-title>
          .
          <source>Retrieved December 20</source>
          ,
          <year>2020</year>
          from https: //github.com/mz-techops/prometheus-mixer
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>