<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Tuning Browser-to-Browser Offloading for Heterogeneous Stream Processing Web Applications</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Informatics, University of Lugano (USI)</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <fpage>31</fpage>
      <lpage>36</lpage>
      <abstract>
        <p>Software that runs on the Cloud may be offloaded to clients to ease the computational effort on the server side, while clients may likewise offload computations back to the Cloud or to other clients if they become too taxing on their machines. In this paper we present how we autonomically deal with browser-to-browser operator offloading in Web Liquid Streams, a stream processing framework that lets developers implement streaming topologies on any Web-enabled device. We show how we first implemented the offloading of streaming operators, and how we subsequently improved our approach. Our experimental results show how the new approach takes advantage of additional resources to reduce the end-to-end message delay and the queue sizes under heavy load.</p>
      </abstract>
      <kwd-group>
        <kwd>Stream Processing</kwd>
        <kwd>Peer-to-Peer Offloading</kwd>
        <kwd>Autonomic Controller</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years the Web has become an important platform to develop any kind of
application. With the requirement that applications show results updated in real
time, it has become important for developers to use suitable stream processing
abstractions and frameworks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this paper we explore the opportunity to
offload part of the computation across the different devices on which the stream
processing topology is distributed. For example, offloading and migrating part of
the computation relieves Web servers deployed in the Cloud from computational
effort [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], while reducing the response time on clients that can locally perform
part of the computation instead of simply rendering the result. Conversely, energy
consumption may become a concern and thus the reverse offloading may happen,
from clients back to the Cloud [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], or – as we are going to discuss in this paper –
to other clients nearby.
      </p>
      <p>
        In this paper, we take the offloading concept and apply it in a distributed
streaming infrastructure [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] where clients and the cloud are tightly coupled to
form stream processing topologies built using the Web Liquid Streams (WLS [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ])
framework. WLS lets Web developers set up topologies of distributed streaming
operators written in JavaScript and run them across a variety of heterogeneous
Web-enabled devices.
      </p>
      <p>
        WLS can be used to develop streaming topologies for data analytics, complex
event processing [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], real-time sensor monitoring, and so on. Makers [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] that want
to automate their homes and offices with Web-enabled sensors and
microcontrollers can use WLS to deploy full JavaScript applications across their home
server and house sensors without dealing with different programming languages.
      </p>
      <p>Streaming operators are implemented in JavaScript using the WLS primitives,
and are deployed by the WLS runtime across the pool of available devices.
Users may let the runtime decide where to distribute the operators, or manually
constrain where each operator has to run. This is done through a topology
description file which is used to deploy and start a topology. The WLS runtime
is in charge of binding the operators through the appropriate channels and starting
the data stream. Depending on the hosting platform, we developed different
communication infrastructures. For server-to-server communication we make use
of ZeroMQ, for server-to-browser and browser-to-server we use WebSockets, while
for browser-to-browser communication we use the recently developed WebRTC.</p>
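<p>As an illustration, the transport selection described above can be sketched as a small helper (a hypothetical function; the actual WLS internals may differ):</p>

```javascript
// Hypothetical sketch: pick the transport for a channel between two
// operators based on where each endpoint runs, following the rules in
// the text. This is not the actual WLS implementation.
function pickTransport(fromHost, toHost) {
  if (fromHost === toHost) {
    // Same platform on both ends: ZeroMQ between servers,
    // WebRTC data channels between browsers.
    return fromHost === 'server' ? 'zeromq' : 'webrtc';
  }
  // Mixed endpoints (server-to-browser or browser-to-server) use WebSockets.
  return 'websocket';
}
```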
      <p>
        WLS implements an autonomic controller in charge of increasing or decreasing
parallelism at the operator level by adding WebWorkers in bottleneck situations,
and removing them when they become idle. At the topology level, the controller
parallelizes the execution of operators across multiple devices by splitting the
operator, or fusing it back together when bottlenecks are solved, depending on
the variable data load [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        In this paper we focus on the controller running on each Web browser and
how it can be tuned to make operator offloading decisions based on different
policies and threshold settings. This work builds on our previous work on the
distributed controller infrastructure [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Operator Offloading Controller</title>
      <p>The Web Liquid Streams controller deals with bottlenecks and failures by
migrating streaming operators to available machines, effectively offloading the work
from the overloaded machines.</p>
      <p>The controller constantly monitors the topology and its operators in order to
detect bottlenecks and/or failures and solve them. In particular, it queries the
streaming operators as the topology runs and, by taking into account the length
of the queues, the input and output rates, as well as the CPU consumption, it
makes decisions to improve the throughput.</p>
      <p>
        We currently have two distinct implementations of the controller
infrastructure: one dedicated to Web server and microcontroller operators (Node.JS
implementation) and one running on Web browsers. We decided to have two
different implementations because of the different kinds of environmental
performance metrics we can access. In Node.JS our controller has access to many more
details regarding the underlying OS, such as the available memory, CPU utilization, and
network bandwidth. For example, to trigger a migration or an operator split on
a Web server, the controller checks the CPU utilization of the machine
and asks for help when it reaches a given threshold [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>The Web browser controller only has access to a subset of the environment
metrics, and this subset also heavily depends on the Web browser being used. Thus, the
decision policy needs to be adapted accordingly. In a Web browser
environment we only know how many CPUs the machine has available, through the
window.navigator.hardwareConcurrency API. We thus decided to use
this number as a cap on the maximum number of concurrent WebWorkers on
the Web browser.</p>
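<p>The worker cap can be sketched as follows (the fallback value for browsers that do not expose the API is our assumption):</p>

```javascript
// Sketch: cap the WebWorker pool at the number of logical CPUs reported
// by the browser. The fallback of 2 for browsers that do not expose
// hardwareConcurrency is an assumption, not documented WLS behavior.
function maxWorkers(nav) {
  return nav.hardwareConcurrency || 2;
}

function canSpawnWorker(currentWorkers, nav) {
  return maxWorkers(nav) > currentWorkers;
}

// In a browser: canSpawnWorker(pool.length, window.navigator)
```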
      <sec id="sec-2-1">
        <title>First Iteration of the Controller</title>
        <p>The first implementation of the Web browser controller was designed to behave
very similarly to its Web server counterpart. All the functionalities related to
fault tolerance and load balancing were preserved, although they had to be
implemented differently given the differences in environment. The controller
cycle was left at 500 milliseconds per cycle.
        <p>To compute the CPU usage we relied on the hardwareConcurrency API.</p>
        <p>P(t) &gt; T_CPU ∗ hardwareConcurrency</p>
        <p>When the number of WebWorker threads P(t) on the machine exceeded T_CPU times the
number of concurrent CPUs available on the machine (with T_CPU set to 100% of CPU capacity), the
controller raised a CPU overload flag to the central WLS runtime, which in turn
contacted a free host to make it start running a copy of the overloaded operator,
thus parallelizing its execution across different machines.</p>
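<p>The overload condition can be expressed as a small predicate (a sketch using the notation above; the threshold handling in WLS itself may differ):</p>

```javascript
// Sketch of the overload rule: raise the CPU flag when the number of
// WebWorker threads P(t) exceeds T_CPU * hardwareConcurrency.
// In the first controller iteration T_CPU = 1.0 (100% CPU capacity).
function cpuOverloaded(workerCount, hardwareConcurrency, tCpu) {
  return workerCount > tCpu * hardwareConcurrency;
}

cpuOverloaded(5, 4, 1.0); // true: more workers than logical CPUs
cpuOverloaded(4, 4, 1.0); // false: at capacity but not over it
```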
        <p>WLS also supports flow control to avoid overfilling the queues of overloaded
operators. This "slow mode" is also triggered by the controller, using a
two-threshold rule:</p>
        <p>Q(t) &gt; T_qh → SlowMode ON</p>
        <p>Q(t) &lt; T_ql → SlowMode OFF</p>
        <p>
          The idea behind the slow mode is to slow down the input rate of a given
(overloaded) operator to help it dispatch the messages in its queue Q, while
increasing the input rate on other instances of said operator. Once the queue is
consumed below a given threshold T_ql, the controller removes the slow mode,
re-enabling the normal stream flow. In [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] we tuned many aspects of the controller,
including the slow mode, for three different families of experiments. Results
suggested that T_qh = 20 messages in the queue were enough to trigger the
slow mode, which was released the moment the queue reached T_ql = 10 or fewer
elements.
        </p>
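<p>The two-threshold rule amounts to a small state machine with hysteresis, which can be sketched as follows (illustrative only):</p>

```javascript
// Sketch: slow-mode hysteresis. Enter slow mode when the queue length
// exceeds T_qh; leave it once the queue drains below T_ql. Two separate
// thresholds avoid rapid toggling around a single cutoff.
function makeSlowModeController(tQh, tQl) {
  let slowMode = false;
  return function update(queueLength) {
    if (queueLength > tQh) slowMode = true;
    else if (tQl > queueLength) slowMode = false;
    return slowMode;
  };
}

// Tuning from [9]: T_qh = 20, T_ql = 10.
const update = makeSlowModeController(20, 10);
update(25); // true:  queue above T_qh, slow mode enters
update(15); // true:  between thresholds, state unchanged
update(5);  // false: queue below T_ql, slow mode released
```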
      </sec>
      <sec id="sec-2-2">
        <title>Improving the Controller</title>
        <p>By stressing the controller through further experiments, we noticed that the
metrics given by the hardwareConcurrency API were not sufficiently precise
to make the controller behave correctly. Under very high throughputs and big message
sizes, the controller took too long to notice the
bottleneck and ask for help, resulting in very high delays and big queues. We
addressed the problem by tuning some controller parameters. We reduced the
cycle of the controller from 500 to 300 milliseconds per cycle. We then halved
the threshold that triggers the CPU flag, to T_CPU = 50%.</p>
        <p>Finally, we also halved the thresholds that trigger and release the slow mode,
while maintaining the same formula, obtaining T_qh = 10 and T_ql = 5.</p>
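<p>For reference, the two parameter sets can be summarized as plain configuration objects (values taken from the text; the object shape is illustrative, not the actual WLS configuration format):</p>

```javascript
// Parameter sets of the two controller iterations (Sections 2.1 and 2.2).
// The object shape is illustrative, not the actual WLS configuration format.
const firstController = {
  cycleMs: 500, // controller cycle period
  tCpu: 1.0,    // CPU flag at 100% of hardwareConcurrency
  tQh: 20,      // queue length that triggers slow mode
  tQl: 10       // queue length that releases slow mode
};

const improvedController = {
  cycleMs: 300, // faster cycle: bottlenecks noticed sooner
  tCpu: 0.5,    // CPU threshold halved
  tQh: 10,      // slow-mode thresholds halved as well
  tQl: 5
};
```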
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>To evaluate the performance improvement of the various Web browser controller
configurations, we developed a Web cam streaming application. The proposed
topology is a simple three-staged pipeline where the producer (first stage) gathers
data from the user’s Web cam. The filter (second stage) receives the image,
runs a face detection algorithm and draws a decoration over the heads of the
detected faces, then forwards the result downstream. This stage is intentionally
made heavy in terms of CPU time in order to create bottlenecks and stress
the controller. The consumer (third stage) shows the image on screen. All the
operators run on a Web browser.</p>
      <p>The machines used are three MacBook Pros, one with 16GB RAM 2.3 GHz
Intel Core i7 (peer 1), one with 4GB RAM 2GHz Intel Core i7 (peer 2), and one
with 4GB RAM 2.53GHz Intel Core i5 (peer 3). All the machines run macOS
Sierra 10.12 and Chrome version 45.0.2454.101 (64-bit). We used an
old Chrome version because of recent restrictions related to the use of WebRTC.
The WLS server side runs on a machine with 48-core Intel Xeon 2.00GHz processors
and 128GB of RAM.</p>
      <p>For this experiment we deployed the producer and consumer on peer 3, while
using peer 1 (P1) and peer 2 (P2) as free machines where the WLS runtime can
run and eventually offload the computation of the filter. By default, the WLS
runtime picks the strongest machine (peer 1) to run the filter at the beginning,
and subsequently offloads the computation to peer 2. We decided to send a total
of 6000 messages for this experiment at the rate of one message every 75 milliseconds
(about 13 messages per second). The size of the Web cam image is fixed at
400x300 pixels; each image is converted into a message, for a total weight
of about 800Kb per message. We used WiFi (averaging around 200Mbit per
second) to simulate a normal use-case environment.</p>
      <p>For this kind of experiment, we decided to focus on the delay as the main
metric to measure. In streaming applications that process real-time
sensor data, we want to be able to handle the data as soon as possible, with as
little delay as possible. We also make use of the queue size to show how, by
transitioning from one implementation of the controller to the other, we are able
to keep the queue size small by parallelizing faster and more efficiently.</p>
      <p>The results cover three different implementations of the controller (C1, C2,
C3): the first and the third represent the two controller
implementations described in Section 2. The second is a compromise between the
two: a similar controller with a cycle frequency of 300 milliseconds and with
the concurrency check halved with respect to the number of processes on the
hosting machine, but with the first slow mode implementation. Table 1 shows
the differences between the three configurations.</p>
      <p>[Figure 1: number of messages (top) and queue size boxplots (bottom) plotted
against the end-to-end message delay (ms · 10^4) for configurations C1, C2, C3
on peers P1 and P2]</p>
      <p>Figure 1 shows the distribution of the end-to-end message delay of the three
configurations, as well as boxplots (min-max range, mean, and quartiles) of the
queue size. We cut the delay axis at 100 seconds for readability.</p>
      <p>The first configuration shows two curves that grow slowly in terms of
number of messages, while the delay keeps increasing. This is due to the fact that,
with our original configuration, under such circumstances the controller was too
slow to raise a CPU flag to the WLS runtime, while the slow mode was triggered
too late. The queue boxplots show that the queues were filled and remained so, inducing
message loss and resulting in these small and slowly increasing curves.</p>
      <p>The second configuration shows two distinct curves that correctly processed
the messages in the topology, with the vast majority of messages delivered with a
delay of 3 to 35 seconds. Both curves follow a similar trend, one for peer 1
(the strongest machine), which processes more messages, and the other for peer
2. By increasing the frequency of the controller cycle and raising the CPU flag
sooner, we are able to deal with the increasing queue sizes, keeping them short,
and to parallelize the work sooner.</p>
      <p>A similar trend can be found in the third configuration, where we notice that
the majority of messages are processed with less than 10 seconds of delay. Both curves
grow with the same shape, keeping the delay lower than in the second configuration.
The further improvement in this configuration shows how, by triggering the slow
mode earlier, we are able to keep the queues even emptier, and thus lower the
delays of the majority of the messages.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>
        In this paper we have introduced how we approach browser-to-browser operator
offloading in the Web Liquid Streams framework. We described our initial control
infrastructure and how we improved it to be more responsive in the case of increasing
workloads. The experimental evaluation shows the benefits of the approach by
demonstrating the improvements in the measured end-to-end message delay and
operator queue sizes. By increasing the workload even more we may eventually
end up filling the queues and all the processors available on every machine. To
solve this problem we are implementing support for load shedding [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] that should
be triggered by the controller.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hochreiner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schulte</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dustdar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lecue</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Elastic stream processing for distributed environments</article-title>
          .
          <source>IEEE Internet Computing</source>
          <volume>19</volume>
          (
          <issue>6</issue>
          ) (
          <year>2015</year>
          )
          <fpage>54</fpage>
          -
          <lpage>59</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Appmobicloud: Improving mobile web applications by mobile-cloud convergence</article-title>
          .
          <source>In: Proceedings of the 5th Asia-Pacific Symposium on Internetware. Internetware '13</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2013</year>
          )
          <volume>14</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          :
          <fpage>10</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gopalakrishnan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Der Merwe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Moca: A lightweight mobile cloud offloading architecture</article-title>
          .
          <source>In: Proceedings of the Eighth ACM International Workshop on Mobility in the Evolving Internet Architecture</source>
          .
          <source>MobiArch '13</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2013</year>
          )
          <fpage>11</fpage>
          -
          <lpage>16</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Golab</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , Özsu, M.T.:
          <article-title>Data Stream Management</article-title>
          .
          <source>Synthesis Lectures on Data Management</source>
          . Morgan &amp; Claypool Publishers (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Babazadeh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gallidabino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pautasso</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Decentralized stream processing over web-enabled devices</article-title>
          .
          <source>In: 4th European Conference on Service-Oriented and Cloud Computing</source>
          . Volume
          <volume>9306</volume>
          ., Taormina, Italy, Springer (
          <year>September 2015</year>
          )
          <fpage>3</fpage>
          -
          <lpage>18</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cugola</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Margara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Processing flows of information: From data stream to complex event processing</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>44</volume>
          (
          <issue>3</issue>
          ) (
          <year>June 2012</year>
          )
          <volume>15</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          :
          <fpage>62</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Makers : the new industrial revolution</article-title>
          .
          <source>Random House Business Books</source>
          , London (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hirzel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soulé</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gedik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grimm</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A catalog of stream processing optimizations</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>46</volume>
          (
          <issue>4</issue>
          ) (
          <year>March 2014</year>
          )
          <volume>46</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>46</lpage>
          :
          <fpage>34</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Babazadeh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gallidabino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pautasso</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Liquid stream processing across web browsers and web servers</article-title>
          .
          <source>In: 15th International Conference on Web Engineering (ICWE 2015)</source>
          , Rotterdam, NL, Springer (
          <year>June 2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gedik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>K.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>P.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Adaptive load shedding for windowed stream joins</article-title>
          .
          <source>In: Proceedings of the 14th ACM International Conference on Information and Knowledge Management. CIKM '05</source>
          , New York, NY, USA, ACM (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>