<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Designing an Interactive Dashboard for Automated Cloud Resource Management</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Kanak Mahadik Adobe Research San Jose</institution>
          ,
          <addr-line>CA 95113</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sana Malik Adobe Research San Jose</institution>
          ,
          <addr-line>CA 95113</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Many applications today are deployed on the cloud and require their owners to decide how many resources (e.g., CPUs, memory) to allocate, a task known as provisioning. Because variations in load are complex to understand, many applications are either under-provisioned or over-provisioned. We developed an automated resource configuration recommender system but found that application owners were hesitant to trust an automated system that might affect their applications' performance. To increase trust, we built an interactive dashboard that allows owners to understand their resource usage, review the automated system's recommendations, and control when the recommendations are applied. We iteratively designed and piloted the system with the owners of twenty cloud applications and discuss seven design needs for dashboards that support automated resource management systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Workshop proceedings: Automation Experience across Domains.
In conjunction with CHI'20, April 26th, 2020, Honolulu, HI, USA
Copyright © 2020 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0).
Website: http://everyday-automation.tech-experience.at</p>
    </sec>
    <sec id="sec-2">
      <title>Author Keywords</title>
      <p>automation; resource allocation; dashboard</p>
    </sec>
    <sec id="sec-3">
      <title>CCS Concepts</title>
      <p>• Human-centered computing → Interactive systems
and tools;</p>
    </sec>
    <sec id="sec-4">
      <title>Introduction</title>
      <p>The performance of an application executing in the public cloud
invariably depends on its provisioned resources.
Under-provisioning can result in performance degradation and
costly service-level agreement (SLA) violations, while
over-provisioning leads to low resource utilization and wasted
money. Designing cloud applications such that they can
deliver on resource efficiency without performance
degradation is key to their success. However, deciding these
resource requirements is not straightforward for application
owners. Cloud applications undergo striking variations in
load and application owners don’t always have tools to
understand how their resources are being used over time.
To address these problems, we developed an automated
resource configuration recommendation algorithm that
provides recommendations for right-sized resource
provisioning. However, during piloting, we found that application
owners were hesitant to change their
applications’ configurations because they did not understand how
their allocated resources were being used and did not trust that
the automated recommendations would preserve their
applications’ performance. Based on previous literature in
the trust and automation domain, we decided to develop an
interactive dashboard for application owners to build trust in
the recommendations.</p>
      <p>In this paper, we (1) distill seven design needs for an
interactive dashboard for application owners through iterative
prototyping and expert reviews, and (2) present our final
system for automated cloud resource management (AutoCRM).</p>
    </sec>
    <sec id="sec-5">
      <title>Background</title>
      <sec id="sec-5-1">
        <title>Resource Management</title>
        <p>
          Previous work to aid right-provisioning falls into reactive and
predictive approaches. Reactive approaches, such as
autoscaling with predefined heuristics, are often used to adapt
to changes in load [
          <xref ref-type="bibr" rid="ref4 ref7">7, 4</xref>
          ]. However, these heuristics are
difficult to tune and can lead to poor quality-of-service
(QoS) if resource demands change faster than the
system can reconfigure. Predictive approaches, such
as artificial neural networks and reinforcement learning [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ],
ease QoS issues, but they are known to capture only simpler
workload behavior or do not scale in production
environments, respectively. Our approach is also predictive but uses
a closed-loop design that not only predicts usage but also
models the scaling behavior over time to generate
configuration parameters.
        </p>
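        <p>To illustrate why threshold heuristics struggle when demand changes faster than the system can reconfigure, the following minimal Python sketch replays a demand trace through a hypothetical reactive rule. The thresholds (scale out above 80% CPU utilization, scale in below 30%), the one-container step size, and the names reactive_scale and replay_reactive are illustrative assumptions, not details of the cited systems.</p>

```python
def reactive_scale(containers: int, utilization: float,
                   scale_out_at: float = 0.8, scale_in_at: float = 0.3,
                   min_containers: int = 1) -> int:
    """Hypothetical threshold rule: add or remove one container per step."""
    if utilization > scale_out_at:
        return containers + 1
    if scale_in_at > utilization:  # utilization fell below the scale-in threshold
        return max(min_containers, containers - 1)
    return containers


def replay_reactive(demand_cpus: list, containers: int,
                    cpus_per_container: int) -> tuple:
    """Replay a CPU demand trace; return (overloaded steps, final container count).

    A step is overloaded when demand exceeds the CPUs allocated at that step,
    i.e. the heuristic has not yet caught up with a change in load.
    """
    overloaded = 0
    for cpus_needed in demand_cpus:
        utilization = cpus_needed / (containers * cpus_per_container)
        if utilization > 1.0:
            overloaded += 1
        containers = reactive_scale(containers, utilization)
    return overloaded, containers


if __name__ == "__main__":
    # Demand jumps from 4 to 12 CPUs; each container provides 2 CPUs.
    print(replay_reactive([4, 4, 12, 12, 12, 12, 12], containers=4, cpus_per_container=2))
    # prints (2, 8)
```

        <p>Under these assumptions, demand exceeds the allocation for two steps while the rule adds one container at a time; this lag is what predictive approaches try to avoid by scaling ahead of forecasted load.</p>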
      </sec>
      <sec id="sec-5-2">
        <title>Trust and Automation</title>
        <p>
          Much work has been done surrounding trust in automated
systems, including determining trust [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], building trust [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ],
and understanding the role of trust [
          <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
          ]. Carlson et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
surveyed factors that influence trust in the autonomous vehicle and
medical domains and provided guidelines for building trust in
systems. The authors found both similarities and differences in
which factors matter in the two domains, such as the
ability to stay up to date, past performance, and verification;
we use these factors as a starting point for our system.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Understanding Application Owners’ Needs</title>
      <sec id="sec-6-1">
        <title>Method</title>
        <p>The initial recommender was piloted with six teams owning
twenty applications total. Each team saw CPU and
memory line charts showing allocated resources versus actual
usage, as well as the recommended allocation over time.
Based on feedback, we iteratively developed a prototype
in close collaboration with engineering teams and project
managers who were responsible for collecting resource
usage, identifying pilot teams, and overall coordination. The
development phase took about six months. We then
synthesized the feedback into seven design needs.
        </p>
        <p>[Figure 1: AutoCRM backend]</p>
      </sec>
      <sec id="sec-6-2">
        <title>Design Needs for Application Owners</title>
        <p>
          (1) Add automation in stages. Because the process had
previously been fully manual and no similar system existed,
automation was a large shift in application owners’ workflows,
which resulted in a lack of trust. Developing the system in
stages helped ensure the tool’s accuracy and increased users’
confidence. For example, although a fully automated version
of the system was possible, it was not deployed immediately.
Instead, we use a mixed-initiative approach in which users
manually apply recommendations.
        </p>
        <p>
          (2) Perform formative analysis to model user needs.
Because users were primarily concerned with not disrupting
application performance, it was essential to be conservative
in the recommendations and provide appropriate error
margins, since the application’s real future demands
are unknown [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. For example, instead of recommending
the optimal configuration, the recommender skews
towards slightly lower utilization to balance users’
expectations and comfort levels.
        </p>
        <p>(3) Provide accountability for the system. Users needed
to monitor system behavior easily, so we log
every change that the system makes and give users a way
to override (undo) any change made by the system.
        </p>
        <p>(4) Be aware of coupling between data availability and
system automation. Because of the need for
accountability, we limited the system’s automation based on the
availability of data. For example, usage data arrives with a
one-day delay, so the system makes changes at most once
a day so that users can keep up with and verify what the system is doing.
The system cannot make multiple changes
without the application owner reviewing them, and the rate of
changes can be limited further when necessary.
        </p>
        <p>(5) Increase explainability where possible. Because
application owners did not understand their resource
needs, they were hesitant to lower provisioning and risk
compromising application performance. The charts allowed users
to see gaps between their allocations and actual utilization.
        </p>
        <p>(6) Build trust through simulation. Similarly, users needed
to see the expected benefits clearly, so the simulator
lets them preview provisioning under the
system’s recommendations. Additionally, the forecaster lets
them see future usage, and the simulator projects the
allocations over that forecast to assure users that the recommended
configurations will not be under-provisioned.
        </p>
        <p>(7) Provide appropriate user controls. In its first iteration,
the algorithm recommended only the number of
containers, based on the total CPUs that the user selected in the
sidebar. We presented the CPU selection as a slider, expecting
that letting users experiment with different CPU amounts and
see how they affected utilization and cost would encourage them
to interact with the recommender and increase trust.
However, most application owners relied on
the pre-configured number of CPUs and did not experiment.
The next iteration removed the slider and instead showed
the projected utilization for the five nearest configurations;
however, this was crowded and confusing to users. Hence,
we refined the controller and simulator algorithms to generate
optimal CPU values directly, which could then be visualized.
        </p>
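        <p>Design needs (3) and (4) can be made concrete with a small apply/undo wrapper. The following Python code is a minimal sketch, assuming a configuration is a plain dictionary and a "day" is a calendar date; the class ConfigManager and its methods are hypothetical and do not describe AutoCRM’s actual implementation. It keeps an append-only audit log of every change, lets users undo the most recent change, and refuses to apply more than one recommendation per day.</p>

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ConfigManager:
    """Hypothetical wrapper that applies recommendations with an audit trail."""
    current: dict
    # Append-only log of (day, action, old_config, new_config) tuples.
    audit_log: list = field(default_factory=list)
    # Stack of earlier configurations, used to undo changes.
    history: list = field(default_factory=list)
    last_apply_day: Optional[date] = None

    def apply(self, recommended: dict, today: date) -> bool:
        """Apply a recommendation unless one was already applied today."""
        if self.last_apply_day == today:
            return False  # rate limit: at most one applied change per day
        self.history.append(dict(self.current))
        self.audit_log.append((today, "apply", dict(self.current), dict(recommended)))
        self.current = dict(recommended)
        self.last_apply_day = today
        return True

    def undo(self, today: date) -> bool:
        """Revert the most recent applied change; the undo itself is logged."""
        if not self.history:
            return False
        previous = self.history.pop()
        self.audit_log.append((today, "undo", dict(self.current), dict(previous)))
        self.current = previous
        return True


if __name__ == "__main__":
    manager = ConfigManager(current={"cpus": 8})
    print(manager.apply({"cpus": 6}, date(2020, 4, 1)))  # True
    print(manager.apply({"cpus": 4}, date(2020, 4, 1)))  # False: second change on the same day
    print(manager.undo(date(2020, 4, 1)))                # True
    print(manager.current)                               # {'cpus': 8}
```

        <p>Keeping the audit log separate from the undo stack means an undo is itself recorded, so the log remains a complete history of what both the system and the user changed.</p>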
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Description of System</title>
      <p>Here we describe the final version of the system in two
parts: (1) the recommender backend and (2) the dashboard
UI.</p>
      <sec id="sec-7-1">
        <title>Recommender Backend</title>
        <p>Figure 2 (cont’d): The sidebar displays the recommendations
with cost and efficiency benefits. The line charts display
historically allocated (red) and used (grey) CPUs and memory,
so users can understand their applications’ resource needs.
The simulated CPU allocation based on the recommended
configuration is also shown on the CPU chart (green).</p>
        <p>The main components of AutoCRM (Fig. 1) are an
ARIMA-enabled forecaster, a simulator, and a controller that
employs a carefully designed optimization function to arrive at
efficient application resource sizing values.</p>
        <p>
          Forecaster. The Forecaster predicts future usage of an
application by applying the ARIMA model [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] to the input data
(Fig. 1a); ARIMA was chosen for its lower prediction error than other
time-series models. The Forecaster uses the
Hyndman-Khandakar algorithm [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to fit the model.
        </p>
        <p>Simulator. The Simulator models resource resizing over
time using the forecasted usage data (Fig. 1b) and
computes the cost function value for a configuration and interval
specified by the controller (Fig. 1c).</p>
        <p>Controller. The Controller receives the predicted resource
usage values (Fig. 1c) and compares two sets of resource
configurations for their utility in terms of utilization and
overheads using a cost function. The cost function is a weighted
sum of resource wastage and overheads (number of scaling
events) for an application. The configuration that minimizes
the cost function is recommended.</p>
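        <p>The forecaster, simulator, and controller can be sketched end to end in a few dozen lines of Python. This is a simplified illustration under stated assumptions, not AutoCRM’s implementation: a least-squares AR(1) model stands in for the ARIMA model fitted with the Hyndman-Khandakar algorithm; a configuration is reduced to a hypothetical per-container CPU size; at each step the simulator picks the smallest container count that keeps utilization at or below a target (the headroom parameter, a nod to the conservative skew in design need (2)); and the weights w_waste and w_scale are placeholders. The cost is the weighted sum of wasted CPUs and scaling events, and the controller returns the candidate with the lowest cost.</p>

```python
import math


def fit_ar1(series: list) -> tuple:
    """Least-squares fit of x_t = c + phi * x_(t-1); a simple stand-in for ARIMA."""
    prev, curr = series[:-1], series[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    var = sum((a - mean_prev) ** 2 for a in prev)
    phi = cov / var if var else 0.0  # a flat history has no autocorrelation to fit
    return mean_curr - phi * mean_prev, phi


def forecast(series: list, steps: int) -> list:
    """Forecaster: roll the fitted AR(1) model forward, clamping usage at zero."""
    c, phi = fit_ar1(series)
    predictions, last = [], series[-1]
    for _ in range(steps):
        last = max(0.0, c + phi * last)
        predictions.append(last)
    return predictions


def simulate(usage: list, cpus_per_container: float, headroom: float) -> tuple:
    """Simulator: replay usage under one configuration.

    Returns (wasted CPU-steps, number of scaling events). The container count is
    the smallest one that keeps utilization at or below `headroom`.
    """
    wastage, events, prev = 0.0, 0, None
    for cpus_used in usage:
        containers = max(1, math.ceil(cpus_used / (cpus_per_container * headroom)))
        wastage += containers * cpus_per_container - cpus_used
        if prev is not None and containers != prev:
            events += 1
        prev = containers
    return wastage, events


def cost(wastage: float, events: int, w_waste: float = 1.0, w_scale: float = 3.0) -> float:
    """Weighted sum of resource wastage and scaling overhead (weights are placeholders)."""
    return w_waste * wastage + w_scale * events


def recommend(history: list, candidate_sizes: list, horizon: int = 24,
              headroom: float = 0.75) -> float:
    """Controller: forecast usage, simulate each candidate, return the cheapest."""
    predicted = forecast(history, horizon)
    return min(candidate_sizes,
               key=lambda size: cost(*simulate(predicted, size, headroom)))


if __name__ == "__main__":
    print(simulate([2.0, 6.0, 2.0], cpus_per_container=1, headroom=0.75))  # (4.0, 2)
    print(recommend([3.0] * 8, candidate_sizes=[8, 4]))                      # 4
```

        <p>In the second call, the forecast is a flat 3 CPUs; an 8-CPU container wastes 5 CPUs per step while a 4-CPU container wastes 1, so the controller picks 4. With bursty forecasts, larger sizes tend to trade extra wastage for fewer scaling events, and the weights control that balance.</p>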
      </sec>
      <sec id="sec-7-2">
        <title>User Interface</title>
        <p>The UI (Fig. 2) was designed to display the
recommendations with cost and efficiency benefits and help application
owners understand and compare their prior resource
utilization against the simulated utilization.</p>
        <p>Recommendation Display. The sidebar displays the
recommended configurations and projected utilization and cost
savings. We provide both “Before” and “After”
measurements so application owners can directly gauge any
cost- and resource-saving benefits.</p>
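        <p>The “Before” and “After” figures reduce to simple arithmetic. The sketch below assumes that utilization is used CPU-hours divided by allocated CPU-hours and that cost is allocated CPU-hours times a flat price; the price and the function name before_after are hypothetical, since the dashboard’s actual cost model is not specified here.</p>

```python
def before_after(used_cpu_hours: float, allocated_before: float,
                 allocated_after: float, price_per_cpu_hour: float) -> dict:
    """Compute utilization and cost for the current and recommended allocations."""
    before = {"utilization": used_cpu_hours / allocated_before,
              "cost": allocated_before * price_per_cpu_hour}
    after = {"utilization": used_cpu_hours / allocated_after,
             "cost": allocated_after * price_per_cpu_hour}
    savings = before["cost"] - after["cost"]
    return {"before": before, "after": after,
            "savings": savings, "savings_pct": 100 * savings / before["cost"]}


if __name__ == "__main__":
    # 300 CPU-hours used; allocation drops from 1000 to 500 CPU-hours at a hypothetical $0.05 each.
    summary = before_after(300, 1000, 500, 0.05)
    print(round(summary["before"]["utilization"], 2), round(summary["after"]["utilization"], 2))  # 0.3 0.6
    print(round(summary["savings"], 2), round(summary["savings_pct"], 1))  # 25.0 50.0
```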
        <p>User Controls. Many application owners have multiple
applications, so a dropdown menu allows them to choose the
application of interest. Its repository name and region are
shown to the right. The date picker allows users to choose
from pre-defined date ranges (last week, last month, last
three months, last six months) or select a custom date
range. Most importantly, users can manually Apply
recommendations or Undo the most recently applied one.</p>
        <p>Utilization Charts. As discussed, the most important
aspect was for users to examine their utilization history. Thus,
we provide CPU and memory charts that display historical
allocated versus actual usage for each resource. Users also
needed to see how the resources would scale given their utilization
over a particular period, so the CPU chart also shows the simulator
results, which made users more comfortable reducing allocation
without sacrificing performance or uptime. Users can pan and
zoom on the charts to inspect the data more closely.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>In this paper, we present design needs and a system for
automated resource management. We developed a backend
recommender and an accompanying dashboard UI to
increase trust in the automated recommendations, and we share our most important lessons
learned that can benefit the community. Future work
includes deploying the system to the company at large and
evaluating the adoption rates for the recommendations, as
well as fully automating the system in the long-term.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>We would like to thank Israel Derdik, Travis Borovatz, and
Chandler Allphin for their valuable feedback and support
while designing, deploying, and evaluating the system.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>George E.P.</given-names>
            <surname>Box</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gwilym M.</given-names>
            <surname>Jenkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gregory C.</given-names>
            <surname>Reinsel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Greta M.</given-names>
            <surname>Ljung</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Time series analysis: forecasting and control</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.S.</given-names>
            <surname>Carlson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Desai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.L.</given-names>
            <surname>Drury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kwak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.A.</given-names>
            <surname>Yanco</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Identifying factors that influence trust in automated cars and medical diagnosis systems</article-title>
          .
          <source>AAAI Spring Symp. - Tech. Report</source>
          (
          <year>2014</year>
          ),
          <fpage>20</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Sylvain</given-names>
            <surname>Daronnat</surname>
          </string-name>
          , Leif Azzopardi,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Halvey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mateusz</given-names>
            <surname>Dubiel</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Human-agent collaborations: trust in negotiating control</article-title>
          .
          <source>CHI</source>
          <year>2019</year>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Dutreilh</surname>
          </string-name>
          , Aurélien Moreau, Jacques Malenfant, Nicolas Rivierre, and
          <string-name>
            <given-names>Isis</given-names>
            <surname>Truck</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>From data center resource allocation to control theory and back</article-title>
          .
          <source>In IEEE Intl. Conf. on Cloud Computing</source>
          .
          <fpage>410</fpage>
          -
          <lpage>417</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Zhenhuan</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiaohui</given-names>
            <surname>Gu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>John</given-names>
            <surname>Wilkes</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>PRESS: Predictive elastic resource scaling for cloud systems</article-title>
          .
          <source>In Intl. Conf. on Network and Service Management</source>
          .
          <fpage>9</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Rob J.</given-names>
            <surname>Hyndman</surname>
          </string-name>
          , Yeasmin Khandakar, and others.
          <year>2007</year>
          .
          <article-title>Automatic time series forecasting: the forecast package for R</article-title>
          . Number 6/07. Monash University.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Tania</given-names>
            <surname>Lorido-Botran</surname>
          </string-name>
          , Jose Miguel-Alonso, and Jose A Lozano.
          <year>2014</year>
          .
          <article-title>A review of auto-scaling techniques for elastic applications in cloud environments</article-title>
          .
          <source>J. of Grid Computing</source>
          <volume>12</volume>
          ,
          <issue>4</issue>
          (
          <year>2014</year>
          ),
          <fpage>559</fpage>
          -
          <lpage>592</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Alexander G.</given-names>
            <surname>Mirnig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Wintersberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christine</given-names>
            <surname>Sutter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jürgen</given-names>
            <surname>Ziegler</surname>
          </string-name>
          .
          <article-title>A Framework for Analyzing and Calibrating Trust in Automated Vehicles</article-title>
          .
          <source>In Adjunct Proc. of the Intl. Conf. on Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI '16 Adjunct)</source>
          .
          <fpage>33</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Holly A.</given-names>
            <surname>Yanco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Munjal</given-names>
            <surname>Desai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jill L.</given-names>
            <surname>Drury</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Steinfeld</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Methods for Developing Trust Models for Intelligent Systems</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>