<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Online Performance Model Extraction in Virtualized Environments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simon Spinner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samuel Kounev</string-name>
          <email>kounev@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaoyun Zhu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mustafa Uysal</string-name>
          <email>muysal@vmware.com</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Karlsruhe Institute of Technology</institution>
          ,
          <addr-line>KIT</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Virtualization increases the complexity and dynamics of modern software architectures, making it a major challenge to manage the end-to-end performance of applications. Architecture-level performance models can help here, as they provide the modeling power and analysis flexibility to predict the performance behavior of applications under varying workloads and configurations. However, the construction of such models is a complex and time-consuming task. In this position paper, we discuss how the existing concept of virtual appliances can be extended to automate the extraction of architecture-level performance models during system operation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        and viewed as a black box. This hinders fine-grained performance predictions
that are necessary for efficient resource management (e.g., predicting the effect
on the response time if a virtual machine of an application tier is replicated
or migrated). Therefore, newer approaches to online performance and resource
management (e.g., DMM [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ]) are based on the more powerful architecture-level
performance models for fine-grained performance predictions. However, building
architecture-level performance models that accurately capture the different
aspects of system behavior is a time-consuming and challenging task when applied
manually to large real-world systems [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Often, no explicit architecture
documentation of the system exists, and hence, the model must be built from scratch.
Additionally, experiments and measurements must be conducted to
parameterize and calibrate the model so that it reflects the system behavior accurately.
Moreover, a major challenge is to ensure that models derived from
measurements of the system in an offline setting are representative of the actual
system behavior in the real production environment. Given the high costs of
building performance models, techniques for automated model extraction based
on observation of the system at run-time are highly desirable.
      </p>
      <p>The contributions of this position paper are: a) we describe an extension
of the notion of a virtual appliance with integrated logic for performance model
extraction, b) we propose an approach for how an end-to-end architecture-level
performance model can be obtained in virtualized environments with a
heterogeneous software stack, and c) we present a research roadmap for implementing
the proposed approach. The paper is structured as follows: Section 2 gives a brief
overview of related work in the field of automatic model extraction and of our
preliminary work. Section 3 describes the vision and the approach in detail and
identifies research challenges.
</p>
    </sec>
    <sec id="sec-2">
      <title>State-of-the-Art</title>
      <p>
        Related Work. Current performance monitoring and management tools in
industry (e.g., Hyperic or Dynatrace Diagnostics) can provide large amounts of raw
performance data; however, they lack the ability to generate performance
abstractions of the monitored systems and applications. Approaches such as [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ]
use systematic measurements to build black-box mathematical models. However,
such models serve only as interpolations of the measurements. Predictive performance
models are extracted, for example, in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where run-time monitoring data is used
to derive the model parameters of predefined queueing Petri net models.
Extraction of structural information is considered, for example, for UML sequence
diagrams [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and for layered queueing networks (LQNs) [
        <xref ref-type="bibr" rid="ref8 ref9">8,9</xref>
        ].
      </p>
      <p>
        Existing work on extracting architecture-level performance models is either
based on static code analysis or assumes a strictly controlled environment.
In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], behavior models are extracted via static and dynamic analysis; however,
this is done in an offline setting requiring fine-grained manual instrumentation of
applications. The described approaches are focused on the application level and
do not explicitly consider the influences of the lower system layers. To quantify
the impact of the virtualization platform on the application performance,
microbenchmarks are used in [
        <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
        ]. However, no explicit model of the performance
influence of the virtualization platform is proposed.
      </p>
      <p>
        Preliminary Work. The approach we propose in this position paper is based on
the experience we gained in our preliminary work. In [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ], we describe the
Descartes Meta-Model (DMM), an architecture-level modeling language
for online performance and resource management. DMM makes it possible to describe the
performance influence of different system layers in independent sub-models, which
can be automatically composed at run time, enabling online performance
prediction. The combined model can be automatically transformed into different
underlying stochastic models (queueing networks, stochastic Petri nets, and
fine-grained custom simulation models), which in turn can be solved using
different solution techniques (exact analytical techniques, numerical approximation
techniques, simulation, and bounding techniques). While DMM provides a
powerful and flexible tool for online predictions, the manual creation of these models
can be complex and time-consuming. Therefore, in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], we investigated the
feasibility of extracting architecture-level performance models at system run-time.
We used low-level monitoring data obtained through application instrumentation
to extract an architecture-level performance model of the SPECjEnterprise2010
standard benchmark. While the resulting models were able to predict the system
performance within an acceptable error margin (mostly 10-20 percent) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], this
approach has two major drawbacks that limit its practical applicability: (i) the
extraction is focused on the application level and does not construct detailed
models of the lower layers of a system (e.g., virtualization and middleware), and
(ii) the approach is restricted to a specific software stack (i.e., a Java EE
application server with the WebLogic Diagnostic Framework for application
instrumentation). The former limits the prediction accuracy of the extracted models in
virtualized environments and their usage for configuration-based what-if analysis.
The latter hinders its application in heterogeneous environments with different
software stacks. The goal of the proposed approach is to overcome these
limitations and enable automatic model extraction in virtualized environments
with heterogeneous software stacks.
      </p>
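      <p>To make the notion of an exact analytical solution technique more concrete, the following is a minimal sketch of exact Mean Value Analysis (MVA), a standard textbook algorithm for closed, single-class, product-form queueing networks. It only illustrates the kind of solver a transformed model could be handed to; it is not the DMM transformation or one of its actual solvers, and the function name and its inputs (per-station service demands, number of customers, client think time) are assumptions made for this example.</p>

```python
def mva(
    demands: list[float], customers: int, think_time: float = 0.0
) -> tuple[float, float, list[float]]:
    """Exact Mean Value Analysis for a closed, single-class queueing network.

    demands:    service demand D_k (seconds) of one request at queueing station k
    customers:  number of requests N circulating in the closed network (N >= 1)
    think_time: client think time Z (a pure delay, no queueing)
    Returns (throughput X, response time R, mean queue lengths Q_k) for N.
    """
    queue = [0.0] * len(demands)  # Q_k(0) = 0: the network starts empty
    throughput = 0.0
    response = 0.0
    for n in range(1, customers + 1):
        # Arrival theorem: an arriving request sees the mean queue lengths
        # of the same network with n - 1 requests.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        # Response time law: X = n / (Z + R).
        throughput = n / (think_time + response)
        # Little's law applied to each station: Q_k = X * R_k.
        queue = [throughput * r for r in residence]
    return throughput, response, queue
```

      <p>For instance, <monospace>mva([0.02, 0.05], customers=50, think_time=1.0)</monospace> would evaluate a hypothetical two-station system with 50 concurrent users. The cost of the recursion is proportional to the number of customers times the number of stations, which is one reason such exact techniques are attractive when predictions must be made online.</p>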
    </sec>
    <sec id="sec-3">
      <title>Vision and Approach</title>
      <p>Vision. To simplify the creation and maintenance of architecture-level
performance models, we envision a novel class of virtualization platforms with
integrated capabilities for the automatic extraction of such models at system
run time. We assume an environment where a virtual machine monitor (VMM) hosts
a set of virtual appliances (VAs) with heterogeneous software stacks. VAs are
prepackaged VM images, each containing a complete software stack ready to run
on a virtualization platform. VAs are becoming increasingly popular in system
management since they significantly reduce the effort and knowledge needed for
deploying software systems. For instance, there are VAs available providing a
pre-configured Tomcat application server or Zimbra collaboration server. These
VAs are built by experts of the respective system and can then be shared with
others (e.g., through online marketplaces, such as VMware Solution Exchange at https://solutionexchange.vmware.com/store).</p>
      <p>We argue that the notion of a VA should be extended to include additional
logic for extracting performance models of the application as well as the
middleware layers at run time. A performance engineer with expertise in
performance modeling can specifically design the extraction logic for the
respective software stack. When such a VA is deployed in a virtualized environment,
the model extraction logic starts to monitor the application serving real
production workloads and automatically builds a performance model of the VA.</p>
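      <p>The extension can be pictured as an interface that every VA implements and that the platform can rely on without knowing the software stack inside. The Python sketch below is purely hypothetical: the class and method names (<monospace>ModelExtractionLogic</monospace>, <monospace>probes</monospace>, <monospace>extract</monospace>), the probe name, and the dictionary-based model are illustrative assumptions, not part of any existing VA format or of DMM.</p>

```python
from abc import ABC, abstractmethod


class ModelExtractionLogic(ABC):
    """Hypothetical extraction logic packaged inside a virtual appliance (VA)."""

    @abstractmethod
    def probes(self) -> list[str]:
        """Names of the monitoring probes the logic needs at run time."""

    @abstractmethod
    def extract(self, measurements: dict[str, list[float]]) -> dict[str, float]:
        """Derive performance model parameters from monitoring data."""


class WikiApplianceExtraction(ModelExtractionLogic):
    """Toy logic for a VA whose architecture is fixed and known in advance."""

    def probes(self) -> list[str]:
        # Hypothetical probe: CPU seconds consumed per page view.
        return ["page_view_cpu_seconds"]

    def extract(self, measurements: dict[str, list[float]]) -> dict[str, float]:
        samples = measurements["page_view_cpu_seconds"]
        if not samples:
            raise ValueError("no samples observed yet")
        # Mean CPU time per request as a simple resource demand estimate.
        return {"page_view_cpu_demand": sum(samples) / len(samples)}
```

      <p>In this design, the performance engineer who knows the software stack writes the concrete subclass, while the virtualization platform depends only on the abstract interface, which is what would allow heterogeneous VAs to be handled uniformly.</p>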
      <p>A virtualization platform that is aware of the model extraction logic within
the VA can then exploit the extracted performance models for online
performance and resource management. However, to evaluate the performance impact
of changes at the VMM level, we also need a model of the performance-relevant
factors of the VMM and their influence on the VAs. Therefore, the VMM also
needs to be extended with the capability of creating such models, so that an
end-to-end performance model of the VA and the VMM can be extracted.</p>
      <p>[Figure 1: Overview of the proposed architecture. Labels: web server VA (HTTPD, OS), application server VA (Java EE, OS), and database VA (DB, OS), each with a VA Model Extraction component; VMM with Controller, Instrumentation, Model Extraction Coordinator, and VMM Model Extraction; VA Model; VMM Model.]</p>
      <p>
Approach. Figure 1 gives an overview of the proposed architecture. The
major components are VA Model Extraction, VMM Model Extraction, and Model
Extraction Coordinator. The VA Model Extraction component is delivered as
part of each VA. It contains predefined model skeletons that capture
structural knowledge of the software stack, a set of monitoring probes, and an
executable extraction process. The extraction process describes how to compose
and parameterize an end-to-end performance model of the VA based on model
skeletons, static configuration data (e.g., server configuration files or deployment
descriptors), and dynamic monitoring data (e.g., call path traces). Typically, an
extraction process consists of the extraction of the static and dynamic
architecture (e.g., application components, active and passive resources, inter- and
intra-component control flow) and the model parameterization (e.g., resource
demands and branching probabilities). The degree to which this information is
known beforehand and can be integrated into model skeletons heavily depends
on the type of VA. For instance, if the VA contains a complete application (e.g.,
a wiki or a mail server), the architecture can be provided beforehand, and at
run time it is only necessary to estimate the model parameters for the current
environment. In contrast, in the case of a Java EE application server, the creator
of the VA does not know the applications that will run on top of it. Therefore,
the creator needs to integrate logic to determine the current application components and
instrumentation probes to observe the control flow of the application.</p>
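      <p>As an illustration of the parameterization step, branching probabilities can be estimated from call path traces by counting how often each operation is directly followed by each other operation. The sketch below assumes that every trace has already been reduced to an ordered list of operation names for one request; the function name and the trace format are assumptions, and real extraction logic would also need to handle loops, forks, and asynchronous calls.</p>

```python
from collections import Counter, defaultdict


def branching_probabilities(traces: list[list[str]]) -> dict[str, dict[str, float]]:
    """Estimate P(next operation | current operation) from call path traces.

    Each trace is the ordered list of operations observed for one request.
    The estimate is the relative frequency of each observed transition.
    """
    transitions: dict[str, Counter] = defaultdict(Counter)
    for trace in traces:
        # Pair each operation with its direct successor in the same trace.
        for current, successor in zip(trace, trace[1:]):
            transitions[current][successor] += 1
    probabilities: dict[str, dict[str, float]] = {}
    for current, successors in transitions.items():
        total = sum(successors.values())
        probabilities[current] = {op: count / total for op, count in successors.items()}
    return probabilities
```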
      <p>The VMM Model Extraction component is tightly coupled with the VMM and observes
its internal state and configuration to build a model that captures the overhead of
the VMM and contention effects due to the sharing of physical resources. We plan
to derive regression-based models describing the overhead and contention effects
depending on the current utilization of the physical resources and the VMM
configuration (e.g., caps, priority, and affinity settings of the scheduler). The
models will be extracted based on online monitoring data provided by the VMM.
If necessary, we also consider using micro-benchmarks in order to determine
certain performance characteristics of the VMM (e.g., to determine the overhead
for certain workload mixes). Such micro-benchmarks can either be run in an
initial step, when installing a new virtual host, or during system operation in
phases of low workload intensity.</p>
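      <p>A regression-based VMM model of the kind described above could, in its simplest form, relate an observed overhead metric to host utilization through a straight line. The following sketch fits overhead = a + b * utilization by ordinary least squares; the single predictor and the function name are simplifying assumptions, and a model that also covers scheduler settings such as caps, priorities, and affinity would need additional predictors (multiple regression).</p>

```python
def fit_overhead_model(utilization: list[float], overhead: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of overhead = a + b * utilization.

    Returns the intercept a and the slope b.
    """
    n = len(utilization)
    if n == 0 or n != len(overhead):
        raise ValueError("need the same, non-zero number of samples")
    mean_u = sum(utilization) / n
    mean_o = sum(overhead) / n
    # Sum of squared deviations of the predictor.
    s_uu = sum((u - mean_u) ** 2 for u in utilization)
    if s_uu == 0:
        raise ValueError("utilization samples must vary")
    # Sum of cross-deviations between predictor and response.
    s_uo = sum((u - mean_u) * (o - mean_o) for u, o in zip(utilization, overhead))
    slope = s_uo / s_uu
    intercept = mean_o - slope * mean_u
    return intercept, slope
```

      <p>Given fitted values <monospace>a, b = fit_overhead_model(util_samples, overhead_samples)</monospace>, the predicted overhead at a hypothetical host utilization of 0.8 would be <monospace>a + b * 0.8</monospace>.</p>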
      <p>The Model Extraction Coordinator controls the model extraction
components in the VMM and VAs and triggers the initial extraction or the update of
the performance models. It also validates the extracted models continuously by
comparing the model predictions with observations on the real system. If the
predictions deviate significantly from the actual performance, the current model
will be updated by repeating the model parameterization or changing the model
structure. Furthermore, it monitors the state of the environment and triggers the
model extraction process if it observes any changes (e.g., configuration changes
in the VA or the VMM).</p>
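      <p>The continuous validation performed by the Model Extraction Coordinator can be sketched as a comparison of predicted and measured response times. In the sketch below, the use of the mean relative error, the default threshold of 0.2, and all names are assumptions chosen for illustration; the approach itself does not prescribe a particular metric or threshold.</p>

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    mean_relative_error: float
    needs_update: bool


def validate(predicted: list[float], measured: list[float], threshold: float = 0.2) -> ValidationResult:
    """Compare model predictions with observations of the real system.

    The relative error of each pair is |predicted - measured| / measured,
    so measured values are assumed to be positive. If the mean relative error
    exceeds the (hypothetical) threshold, the model should be updated by
    re-parameterizing it or changing its structure.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need equally many, non-empty observations")
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    mean_error = sum(errors) / len(errors)
    return ValidationResult(mean_error, mean_error > threshold)
```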
      <p>
        The extracted models of the VMM and VA are based on the Descartes
Meta-Model (DMM) [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ]. DMM makes it possible to dynamically compose the automatically
extracted submodels of the VA and the VMM in order to answer
configuration-based what-if questions. Using DMM as the output model for the model
extraction offers the flexibility to employ different analysis techniques for model
solution depending on the required accuracy and speed.
      </p>
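      <p>The benefit of composing separately extracted submodels can be illustrated with a deliberately simplified, hypothetical example: the VA submodel contributes a CPU demand per request, the VMM submodel contributes a relative overhead that grows linearly with host utilization (as in a regression-based VMM model), and the composed model is solved with the open product-form queueing formula R = D / (1 - U) for a single shared CPU. This is not how DMM represents or solves models; all names and the form of the overhead term are assumptions.</p>

```python
def predict_response_time(
    va_demand: float,
    overhead_intercept: float,
    overhead_slope: float,
    arrival_rate: float,
    background_utilization: float = 0.0,
) -> float:
    """Compose a VA submodel with a VMM submodel and solve it analytically.

    va_demand: CPU demand per request (seconds), taken from the VA submodel.
    overhead_intercept, overhead_slope: VMM submodel, relative overhead
        a + b * u as a function of host CPU utilization u.
    arrival_rate: request arrival rate (requests/second), the what-if input.
    background_utilization: host CPU utilization caused by other VMs.
    """
    # Host utilization without virtualization overhead drives the VMM submodel.
    native_util = background_utilization + arrival_rate * va_demand
    effective_demand = va_demand * (1.0 + overhead_intercept + overhead_slope * native_util)
    utilization = background_utilization + arrival_rate * effective_demand
    if utilization >= 1.0:
        raise ValueError("predicted host utilization reaches 100%: system saturated")
    # Open product-form queueing network: R = D / (1 - U) at a single shared CPU.
    return effective_demand / (1.0 - utilization)
```

      <p>A what-if question such as "how would the response time change if the request rate grew from 50 to 80 requests per second?" then amounts to evaluating the function for both arrival rates, while a configuration change at the VMM level would instead be reflected in different overhead parameters.</p>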
      <p>Research Challenges. The described approach raises a number of research
challenges, which we target as part of our ongoing work:</p>
      <p>A generic mechanism to package the model extraction logic in the VAs needs
to be defined, including an interface for exchanging information between the
VMM and VAs during model extraction.</p>
      <p>New languages to simplify the implementation of the model extraction for
various VAs will be designed (e.g., for specifying instrumentation probes in a
technology-agnostic manner, or for specifying rules for abstracting the control
flow of an application).</p>
      <p>Techniques for reliably estimating resource demands in virtualized systems are
necessary (e.g., accounting for the influence of virtualization effects and of parallel
processing on multi-core processors).</p>
      <p>Methods for autonomic online validation and calibration of performance
models are crucial to ensure the representativeness of the extracted models.</p>
      <p>Methods to quantify the performance influence of the virtualization platform
during system operation are necessary to extract the VMM models.</p>
      <p>Automatic techniques to detect configuration changes and to determine their
influence on the performance models are desirable.</p>
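      <p>For the resource demand challenge, a common baseline is the Service Demand Law, D = U / X, which divides the measured utilization of a resource by the measured throughput over the same interval. The sketch below applies it to a VM with several virtual CPUs; it assumes that the utilization reported for the VM is accurate, which is precisely what virtualization effects (for example, CPU time consumed by the VMM or taken by other VMs) can undermine, so it should be read as a starting point rather than a reliable estimator. The function name is an assumption.</p>

```python
def service_demand(utilization: float, throughput: float, cores: int = 1) -> float:
    """Estimate the CPU demand per request with the Service Demand Law.

    D = U * cores / X, where U is the average utilization of the VM's CPUs
    over a monitoring interval (0..1, averaged across `cores` virtual CPUs)
    and X is the number of requests completed per second in that interval.
    """
    if not throughput > 0:
        raise ValueError("throughput must be positive")
    # Total busy CPU time per second divided by completions per second.
    return utilization * cores / throughput
```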
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Brosig</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huber</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kounev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Architecture-Level Software Performance Abstractions for Online Performance Prediction</article-title>
          .
          <source>Elsevier Science of Computer Programming Journal (SciCo)</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Huber</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Hoorn</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koziolek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brosig</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kounev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Modeling Run-Time Adaptation at the System Architecture Level in Dynamic Service-Oriented Environments</article-title>
          .
          <source>Service Oriented Computing and Applications</source>
          (
          <year>2013</year>
          )
          <comment>In print</comment>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kounev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Performance Modeling and Evaluation of Distributed Component-Based Systems Using Queueing Petri Nets</article-title>
          .
          <source>IEEE Trans. on Softw. Eng</source>
          .
          <volume>32</volume>
          (
          <issue>7</issue>
          ) (
          <year>2006</year>
          )
          <fpage>486</fpage>
          –
          <lpage>502</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Westermann</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Happe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Towards Performance Prediction of Large Enterprise Applications Based on Systematic Measurements</article-title>
          .
          <source>In: Proc. of the 15th Intl. Workshop on Component-Oriented Programming</source>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Courtois</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woodside</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Using Regression Splines for Software Performance Analysis</article-title>
          .
          <source>In: Proc. of the 2nd Intl. Works. on Software and Performance</source>
          . (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kounev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bender</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brosig</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huber</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Okamoto</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Automated Simulation-Based Capacity Planning for Enterprise Data Fabrics</article-title>
          .
          <source>In: 4th Intl. ICST Conf. on Simul. Tools and Techniques</source>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Briand</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labiche</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leduc</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Toward the Reverse Engineering of UML Sequence Diagrams for Distributed Java Software</article-title>
          .
          <source>IEEE Trans. on Softw. Eng</source>
          .
          <volume>32</volume>
          (
          <issue>9</issue>
          ) (
          <year>2006</year>
          )
          <fpage>642</fpage>
          –
          <lpage>663</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hrischuk</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woodside</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rolia</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iversen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Trace-Based Load Characterization for Generating Performance Software Models</article-title>
          .
          <source>IEEE Trans. on Softw. Eng</source>
          .
          <volume>25</volume>
          (
          <issue>1</issue>
          ) (
          <year>1999</year>
          )
          <fpage>122</fpage>
          –
          <lpage>135</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Israr</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woodside</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franks</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Interaction Tree Algorithms to Extract Effective Architecture and Layered Performance Models from Traces</article-title>
          .
          <source>J. Syst. Softw</source>
          .
          <volume>80</volume>
          (
          <issue>4</issue>
          ) (
          <year>2007</year>
          )
          <fpage>474</fpage>
          –
          <lpage>492</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Krogmann</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuperberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reussner</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Using Genetic Search for Reverse Engineering of Parametric Behaviour Models for Performance Prediction</article-title>
          .
          <source>IEEE Trans. on Softw. Eng</source>
          .
          <volume>36</volume>
          (
          <issue>6</issue>
          ) (
          <year>2010</year>
          )
          <fpage>865</fpage>
          –
          <lpage>877</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Wood</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cherkasova</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozonat</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shenoy</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Profiling and Modeling Resource Usage of Virtualized Applications</article-title>
          .
          <source>In: Proc. of the 9th ACM/IFIP/USENIX Intl. Conf. on Middleware</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshihira</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smirni</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Untangling Mixed Information to Calibrate Resource Utilization in Virtual Machines</article-title>
          .
          <source>In: Proc. of the 8th ACM Intl. Conf. on Autonomic Computing</source>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Brosig</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huber</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kounev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Automated Extraction of Architecture-Level Performance Models of Distributed Component-Based Systems</article-title>
          .
          <source>In: 26th IEEE/ACM Intl. Conf. on Automated Softw. Eng</source>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>