=Paper=
{{Paper
|id=Vol-2066/emls2018paper03
|storemode=property
|title=Extracting Realistic User Behavior Models
|pdfUrl=https://ceur-ws.org/Vol-2066/emls2018paper03.pdf
|volume=Vol-2066
|authors=Reiner Jung,Marc Adolf
|dblpUrl=https://dblp.org/rec/conf/se/JungA18
}}
==Extracting Realistic User Behavior Models==
<pdf width="1500px">https://ceur-ws.org/Vol-2066/emls2018paper03.pdf</pdf>
<pre>
Extracting Realistic User Behavior Models

Reiner Jung (rju@informatik.uni-kiel.de), Kiel University, Germany
Marc Adolf (mad@informatik.uni-kiel.de), Kiel University, Germany


Abstract

Workloads play a central role in assessing software qualities, like performance and privacy. They are characterized by intensity and user behavior patterns. Multiple intensities and behaviors are combined to create workload profiles which, among others, serve to evaluate software designs and to predict system utilization. The central challenge for workload profiles is their fit to real workloads and in particular the match to specific behaviors. This is especially relevant for understanding and identifying specific user groups and for supporting workload composition by operators.

In this paper, we address the identification of such realistic user behaviors by utilizing domain-specific attributes, report on our evaluation of the fitness of behavior clustering approaches, and discuss our setup to evaluate further clustering approaches.

1 Introduction

Service quality of software systems is influenced by workload intensity and user behavior. Both factors play a vital role in characterizing the system workload [7], which is relevant to understand past workloads and to construct workload profiles to estimate future system utilization and performance. For example, the resource consumption of browsing a catalog, searching the inventory, and purchasing items can be quite different. Therefore, it is necessary to be able to distinguish specific kinds of user behavior to characterize the workload sufficiently.

State-of-the-art workload characterization approaches, such as WESSBAS [8], use a behavior mix, where different workload intensities are combined with specific user behavior models to construct a workload model. These approaches collect user sessions and aggregate them to behavior models. WESSBAS estimates behavior models utilizing X-means clustering [3]. Such behavior models have three key shortcomings: (I) They reflect the observed behavior of the past, but might not represent specific user groups correctly, harming predictability. For example, a detergent shopper might reappear frequently while a sunscreen shopper has a different seasonal profile. Unfortunately, current approaches cannot distinguish between them. (II) The behavior models use cyclic graphs with edge counts or probabilities to create compact representations. However, in this process we may lose behavioral information, as two visits of a page might differ in purpose and a looping behavior might actually be dissimilar to another loop along the same pages. (III) X-means only yields acceptable results for small parameter vectors of at least ordinal values, but current behavior models are mapped to vectors. More model transitions imply more parameters, which harm clustering [3].

To mitigate these issues, we want (a) to advance behavior models to better capture the specific properties of different kinds of users based on domain knowledge of the observed software system and other parameters, e.g., time, and (b) to improve clustering and classification of observed user behaviors. Therefore, we extend classic behavior models with domain knowledge and evaluate different aggregation approaches capable of handling large parameter sets, or find ways to reduce the parameter sets to be able to use aggregation approaches well suited for a limited number of parameters.

In this paper, we report on our preliminary findings regarding the two clustering approaches X-means and Expectation-Maximization (EM) [2] in the context of realistic user behaviors, present additional approaches which we are currently investigating, and formulate key questions regarding the identification of behavior models and their quality.

The paper is structured as follows: Section 2 discusses user behavior models. Section 3 introduces clustering and aggregation approaches. Section 4 presents the concept of realistic user behaviors. Section 5 describes the evaluation, and Section 6 discusses preliminary results for X-means and EM clustering. Finally, Section 7 summarizes our findings, discusses further research, and describes our key questions.

2 Behavior Models

Behavior models describe kinds of users and are aggregations of single user behaviors with similar behavior patterns. A single user behavior comprises all system invocations (entry-level events) of a user during a session. It can be modeled as a path over visited pages or transformed into a behavior graph or a Markov chain, which may contain loops for repetitive behavior. These paths or graphs are then grouped by similarity and merged into a behavior model.

Figure 1 depicts an annotated behavior graph of a user interacting with the JPetStore [12], an example application resembling a shop system for pets.
Figure 1: Behavior graph representing a shopper from our JPetStore [12] use case.

In this graph, nodes represent page visits and edges express the transitions between pages. The numbers at the edges indicate the number of transitions between pages. In addition, we added domain-specific information, like the viewed category and product, which can be used to support the behavior clustering.
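As an illustration, a minimal sketch of such an annotated behavior graph could look as follows: nodes are visited pages, edges carry transition counts, and nodes can carry domain-specific annotations such as the viewed category or product. The class and field names are illustrative assumptions, not the actual iObserve data model.

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of an annotated behavior graph (illustrative names, not the iObserve model).
public class BehaviorGraph {

    // Node: a visited page plus domain-specific annotations (e.g., category=CATS, product=Manx).
    public static class Node {
        final String page;
        final Map<String, String> annotations = new HashMap<>();
        Node(String page) { this.page = page; }
    }

    private final Map<String, Node> nodes = new HashMap<>();
    // Edge counts: "fromPage->toPage" -> number of observed transitions.
    private final Map<String, Integer> edgeCounts = new HashMap<>();

    public Node node(String page) {
        return nodes.computeIfAbsent(page, Node::new);
    }

    // Record one observed transition between two pages, creating nodes on demand.
    public void addTransition(String from, String to) {
        node(from);
        node(to);
        edgeCounts.merge(from + "->" + to, 1, Integer::sum);
    }

    public static void main(String[] args) {
        BehaviorGraph g = new BehaviorGraph();
        g.addTransition("home", "category");
        g.addTransition("category", "product");
        g.addTransition("product", "category");          // loop back for repeated browsing
        g.node("category").annotations.put("category", "CATS");
        g.node("product").annotations.put("product", "Manx");
        System.out.println(g.edgeCounts);
    }
}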
3 Clustering Approaches

Clustering can be used to identify groups of data points which share similarities. In our context, we use clustering to identify user behavior models, as WESSBAS does with X-means. Clustering is affected by density, distances, and the distribution of data points. Depending on the clustering approach, the dimension of the data points can have a significant impact on the quality of the clusters. We employ clustering methods provided by Weka [4]:

X-Means X-means builds on the K-means clustering algorithm [3], which consists of three steps [1]: (1) For every expected cluster (K), a center point, called centroid, is randomly chosen from the data points. (2) According to a chosen distance metric, each point x of the data set is assigned to the closest centroid. (3) Each centroid is recomputed as the center of mass of the points assigned to it. (2) and (3) are repeated until a convergence criterion is met.

In contrast to K-means, X-means searches over a range (e.g., 2 to 10) for the set of clusters which provides the best fit. Therefore, X-means starts with computing K-means for the lower bound (e.g., K=2). Subsequently, each cluster is split into two using 2-means to try to improve the fit. Both steps are iterated while incrementing K until the upper bound is reached or an iteration is worse than the one before [3].
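The following sketch illustrates the K-means core that X-means builds on: points are assigned to the closest centroid and centroids are recomputed until the assignment stabilizes. The surrounding X-means model-selection loop (trying a range of K and splitting clusters) is only indicated in a comment; all names are illustrative and not Weka's implementation.

import java.util.Arrays;
import java.util.Random;

// Illustrative K-means core (steps (1)-(3) from the text); not the Weka implementation.
public class KMeansSketch {

    static double manhattan(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += Math.abs(a[i] - b[i]);
        return d;
    }

    // Returns the cluster index of each point after convergence.
    static int[] kMeans(double[][] points, int k, long seed) {
        Random rnd = new Random(seed);
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++)                       // (1) pick random points as centroids
            centroids[c] = points[rnd.nextInt(points.length)].clone();

        int[] assignment = new int[points.length];
        boolean changed = true;
        while (changed) {                                 // repeat (2) and (3) until stable
            changed = false;
            for (int p = 0; p < points.length; p++) {     // (2) assign to the closest centroid
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (manhattan(points[p], centroids[c]) < manhattan(points[p], centroids[best]))
                        best = c;
                if (assignment[p] != best) { assignment[p] = best; changed = true; }
            }
            for (int c = 0; c < k; c++) {                 // (3) recompute centroid as center of mass
                double[] sum = new double[points[0].length];
                int n = 0;
                for (int p = 0; p < points.length; p++)
                    if (assignment[p] == c) { n++; for (int i = 0; i < sum.length; i++) sum[i] += points[p][i]; }
                if (n > 0) { for (int i = 0; i < sum.length; i++) sum[i] /= n; centroids[c] = sum; }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] pts = { {1, 1}, {1, 2}, {8, 8}, {9, 8}, {8, 9} };
        // An X-means-style search would call kMeans for K in a range (e.g., 2..10)
        // and keep the K whose clustering scores best, e.g., according to a criterion such as BIC.
        System.out.println(Arrays.toString(kMeans(pts, 2, 42)));
    }
}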
Expectation-Maximization EM is an iterative method consisting of two phases (E- and M-step) that are repeated until the convergence criterion is met and a final set of clusters is identified. Initially, a random set of cluster-identifying data points is defined; these are the initial parameters for EM. The E-step uses these parameters to compute the expected values of each data point. The M-step uses the E-step results to compute a new maximum likelihood for the data points regarding the parameters. This way, the new parameters for the next iteration are computed. These two steps are repeated until the convergence criterion is reached [2].
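A minimal sketch of the E-/M-step structure, assuming a two-component one-dimensional Gaussian mixture for readability; the behaviors in our setting are higher-dimensional vectors and Weka's EM clusterer differs in its details, so this only illustrates the alternation of the two phases.

// Minimal EM sketch for a two-component 1D Gaussian mixture; only illustrates
// the E-/M-step structure, not Weka's EM clusterer used in the paper.
public class EmSketch {

    static double gauss(double x, double mean, double var) {
        return Math.exp(-(x - mean) * (x - mean) / (2 * var)) / Math.sqrt(2 * Math.PI * var);
    }

    public static void main(String[] args) {
        double[] x = { 1.0, 1.2, 0.8, 7.9, 8.1, 8.0 };
        // Initial (randomly chosen) parameters: means, variances, mixture weights.
        double[] mean = { 0.0, 5.0 }, var = { 1.0, 1.0 }, weight = { 0.5, 0.5 };
        double[][] resp = new double[x.length][2];        // responsibilities (expected memberships)

        for (int iter = 0; iter < 50; iter++) {           // fixed iterations instead of a convergence test
            // E-step: expected cluster membership of each data point under the current parameters.
            for (int i = 0; i < x.length; i++) {
                double p0 = weight[0] * gauss(x[i], mean[0], var[0]);
                double p1 = weight[1] * gauss(x[i], mean[1], var[1]);
                resp[i][0] = p0 / (p0 + p1);
                resp[i][1] = p1 / (p0 + p1);
            }
            // M-step: maximum-likelihood re-estimation of the parameters from the responsibilities.
            for (int c = 0; c < 2; c++) {
                double sumR = 0, sumX = 0, sumSq = 0;
                for (int i = 0; i < x.length; i++) { sumR += resp[i][c]; sumX += resp[i][c] * x[i]; }
                mean[c] = sumX / sumR;
                for (int i = 0; i < x.length; i++) sumSq += resp[i][c] * (x[i] - mean[c]) * (x[i] - mean[c]);
                var[c] = Math.max(sumSq / sumR, 1e-6);    // avoid degenerate zero variance
                weight[c] = sumR / x.length;
            }
        }
        System.out.printf("means: %.2f %.2f%n", mean[0], mean[1]);
    }
}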
                                                                         rate group from those buying sunscreen. This is help-
Expectation-Maximization EM is an iterative                              ful to better understand seasonal behavioral changes
method consisting of two phases (E- and M-step) that                     and allow to create and modify workloads more real-
are repeated until the convergence criteria is met and                   istically. This is relevant in scenarios, where workload
a final set of clusters is identified. Initially, a ran-                 characterizations can be modified to provide the sys-
Similarity Matching In contrast to the other approaches, similarity matching uses two metrics to compare graphs: one based on structural similarity and one based on the distance of parameter values, i.e., their semantic similarity. The algorithm initially computes the distance between each pair of graphs in a set of behavior graphs, creating for each graph a vector containing its distances to all other graphs. The distance between two vectors is the sum of differences \sum_{i=1}^{n} |d_{a,i} - d_{b,i}|, where n is the number of graphs and d_{j,i} refers to the values in vector j. Graphs where the distance is lower than a defined threshold are then considered similar and grouped together.

These groups are further divided based on their semantic difference. For example, in the JPetStore, categories, products, and items form a tree. The distance between two values in the tree determines their semantic difference, e.g., the two cats Fritz and Felix both belong to the product male cartoon cat, having a distance of one, while the cat Amber belongs to the product female cat, and therefore the distance is two.
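A sketch of the structural grouping step: given a precomputed pairwise graph-distance matrix, each graph's row serves as its distance vector, two vectors are compared by the sum of absolute differences from the formula above, and graphs below a threshold are grouped. The distance values and the threshold are placeholders, not measured data.

import java.util.ArrayList;
import java.util.List;

// Sketch of the structural grouping step of similarity matching.
// dist[i][j] is assumed to be a precomputed structural distance between graphs i and j.
public class SimilarityMatchingSketch {

    // Distance between two graph-distance vectors: sum over i of |d_a,i - d_b,i|.
    static double vectorDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    public static void main(String[] args) {
        // Placeholder pairwise distances for four behavior graphs (row i = distance vector of graph i).
        double[][] dist = {
            { 0, 1, 6, 7 },
            { 1, 0, 6, 6 },
            { 6, 6, 0, 1 },
            { 7, 6, 1, 0 }
        };
        double threshold = 4.0;                           // placeholder threshold

        List<List<Integer>> groups = new ArrayList<>();
        boolean[] grouped = new boolean[dist.length];
        for (int i = 0; i < dist.length; i++) {
            if (grouped[i]) continue;
            List<Integer> group = new ArrayList<>();
            group.add(i);
            grouped[i] = true;
            for (int j = i + 1; j < dist.length; j++)     // group graphs whose vectors are close enough
                if (!grouped[j] && vectorDistance(dist[i], dist[j]) < threshold) {
                    group.add(j);
                    grouped[j] = true;
                }
            groups.add(group);
        }
        System.out.println(groups);                       // e.g., [[0, 1], [2, 3]]
        // A second pass would split these groups further by semantic (tree-based) distance.
    }
}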
4 Realistic User Behaviors

We define realistic user behavior models as behavior models which reflect real groups of users, in contrast to approximated groups, i.e., groups solely defined by their transitions, neglecting domain-specific data. For example, our detergent shoppers should form a separate group from those buying sunscreen. This is helpful to better understand seasonal behavioral changes and allows workloads to be created and modified more realistically. This is relevant in scenarios where workload characterizations can be modified to provide the system with knowledge of upcoming events, like a sunscreen shopper just before the holiday season.

A key ingredient for realistic user behaviors is domain-specific data, like the products or categories. With this additional data, user behavior can be classified into different groups. To be able to use such values in a clustering approach, a suitable metric must be defined, e.g., products of a similar type should be closer together than products which are in another category.
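One possible such metric, sketched below under the assumption that categories, products, and items form a tree as in the JPetStore example from Section 3: two values are compared by how far up the tree one has to climb until they share an ancestor, so items of the same product are closer than items from different categories. The concrete tree and the class API are illustrative assumptions.

import java.util.HashMap;
import java.util.Map;

// Sketch of a domain-specific distance: categories, products, and items form a tree, and
// two items are compared by how far up the tree one has to go until they share an ancestor
// (1 = same product, 2 = same category, ...), matching the Fritz/Felix/Amber example.
public class DomainTreeDistance {

    private final Map<String, String> parent = new HashMap<>(); // child -> parent in the domain tree

    void add(String child, String par) { parent.put(child, par); }

    // Assumes both values are leaves on the same tree level (items under products under categories).
    int distance(String a, String b) {
        int steps = 0;
        while (!a.equals(b)) {                            // climb both leaves upwards until they meet
            a = parent.get(a);
            b = parent.get(b);
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        DomainTreeDistance tree = new DomainTreeDistance();
        tree.add("CATS", "root");
        tree.add("male cartoon cat", "CATS");
        tree.add("female cat", "CATS");
        tree.add("Fritz", "male cartoon cat");
        tree.add("Felix", "male cartoon cat");
        tree.add("Amber", "female cat");
        System.out.println(tree.distance("Fritz", "Felix")); // 1: same product
        System.out.println(tree.distance("Fritz", "Amber")); // 2: same category, different product
    }
}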
5 Evaluation

Our evaluation uses the iObserve analysis service with different aggregation filters to examine clustering and aggregation approaches. Figure 2 depicts an excerpt of the pipe-and-filter setup for our analysis. The Session Collector collects EntryEvents caused by users and creates sessions (collections of EntryEvents). The filter sends out a session event either when it receives a SessionEndEvent or when a timeout is reached and the filter is triggered by the time trigger filter. The AnnotatedGraphBuilder creates an annotated graph from the SessionEvent and sends it to an AggregationFilter. Depending on the aggregation and clustering approach, a specific filter is inserted there. Finally, the aggregated graphs are serialized with the GraphOutputFilter or alternatively transformed into a behavior model suitable for the Palladio Component Model.

Figure 2: Analysis Pipe-and-Filter setup for user behavior aggregation (Session Collector, Annotated Graph Builder, Aggregation Filter, Graph Output Filter / PCM Model Translation, and timer stages).
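A simplified stand-in for the Session Collector stage described above: it groups incoming EntryEvents by session and emits a session either when a SessionEndEvent arrives or when the timer trigger detects a timeout. This is not the actual iObserve filter implementation; the event and class names are assumptions for illustration.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the Session Collector stage: groups EntryEvents into sessions
// and emits a session on SessionEndEvent or when a timeout expires. Illustrative only.
public class SessionCollectorSketch {

    record EntryEvent(String sessionId, long timestamp, String operation) {}

    private final long timeoutMs;
    private final Map<String, List<EntryEvent>> openSessions = new HashMap<>();
    private final Map<String, Long> lastSeen = new HashMap<>();

    SessionCollectorSketch(long timeoutMs) { this.timeoutMs = timeoutMs; }

    void onEntryEvent(EntryEvent e) {
        openSessions.computeIfAbsent(e.sessionId(), id -> new ArrayList<>()).add(e);
        lastSeen.put(e.sessionId(), e.timestamp());
    }

    void onSessionEnd(String sessionId) {                 // explicit end of a session
        emit(sessionId);
    }

    void onTimerTrigger(long now) {                       // periodic trigger from the timer filter
        for (String id : new ArrayList<>(openSessions.keySet()))
            if (now - lastSeen.get(id) > timeoutMs) emit(id);
    }

    private void emit(String sessionId) {
        List<EntryEvent> session = openSessions.remove(sessionId);
        lastSeen.remove(sessionId);
        if (session != null)
            System.out.println("SessionEvent " + sessionId + " with " + session.size() + " entry events");
        // In the real pipeline, the session would be passed on to the AnnotatedGraphBuilder.
    }

    public static void main(String[] args) {
        SessionCollectorSketch collector = new SessionCollectorSketch(30_000);
        collector.onEntryEvent(new EntryEvent("s1", 0, "viewCategory"));
        collector.onEntryEvent(new EntryEvent("s1", 1_000, "viewProduct"));
        collector.onSessionEnd("s1");                      // emits the session immediately
        collector.onEntryEvent(new EntryEvent("s2", 2_000, "searchProducts"));
        collector.onTimerTrigger(60_000);                  // emits s2 via timeout
    }
}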

The input for this analysis is provided by two software systems. The first observed system is a JPetStore [12] instance instrumented with Kieker [5]. The second one is an instrumented instance of our research group's JIRA [11], which is used by students during a four-week practical course. We use the JPetStore to evaluate whether a specific, previously defined setup of realistic behaviors can be detected. The JIRA experiment is used to apply the approaches to another domain, where we want to explore whether they can produce reasonable results in a realistic scenario.

5.1 JPetStore Experiment

For the JPetStore experiments, we modeled seven realistic (ideal) user behaviors [9], which utilize all functions of the JPetStore. We created workloads with Selenium [13] that represent these behavior models. We executed the JPetStore together with our workload and collected monitoring data. This data is then processed by the iObserve analysis [6] using different clustering algorithms provided by their respective filters. Then, we compare the detected behaviors with the mentioned set of seven ideal behavior models (IBM).

Workload The behaviors are tailored to share common behavior, but also include significant differences regarding pages, transitions, and request parameters, e.g., whether the person shops for cats or fish:

Account manager (AM) Changes contact information after login. Inspects one of the prior orders.
Browsing user (BU) Searches products and only browses categories, products, and items.
Product lover (*L) Visits the CATS (CL) or FISH (FL) category and selects one product. Repeats this 8 times and concludes shopping.
Single product buyer (S*) Goes to a category (REPTILES (SR) or CATS (SC)) and buys one item.
New customer (NC) Registers as a new customer, logs in, and buys a reptile.

Experiment Execution At the end of each analysis run, we compare the detected clustered behavior models with the prepared IBMs. First, we identify which detected behavior model best matches an IBM. Second, we determine the distance of the matching model. In case a match can be identified, this counts as a hit (score=0). The match between two models is computed in three steps: (1) We remove nodes which are not connected to the behavior graph, as they are created by mapping graphs to matrices and back. (2) We identify missing and additional nodes and edges, and compute ratios between these differences and the IBM. The lower the ratios, the better the fit of the detected behavior model. (3) We compare the request parameters in the behavior models. For example, if in the IBM the parameter CATS appears once, but the detected behavior model includes REPTILES, then the behaviors do not match.
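To make the match computation more concrete, the sketch below compares a detected behavior model with an IBM on node and edge sets: it computes the ratios of missing and additional nodes and edges relative to the IBM (step 2) and checks the request parameters (step 1, the removal of unconnected nodes, is assumed to have happened already). How the ratios are combined into the scores of Table 1 is an assumption for illustration.

import java.util.Set;

// Sketch of the IBM matching steps (2) and (3): ratios of missing/additional nodes and
// edges relative to the IBM, plus a parameter check. Combining the ratios into a single
// score is illustrative; step (1), removing unconnected nodes, is assumed done.
public class IbmMatchSketch {

    static double ratioOfDifferences(Set<String> ibm, Set<String> detected) {
        long missing = ibm.stream().filter(e -> !detected.contains(e)).count();
        long additional = detected.stream().filter(e -> !ibm.contains(e)).count();
        return (missing + additional) / (double) ibm.size();   // lower is better, 0 = identical
    }

    public static void main(String[] args) {
        // Ideal behavior model (IBM): single cat buyer (illustrative pages and parameters).
        Set<String> ibmNodes = Set.of("home", "category", "product", "item", "cart", "order");
        Set<String> ibmEdges = Set.of("home->category", "category->product", "product->item",
                                      "item->cart", "cart->order");
        Set<String> ibmParams = Set.of("CATS");

        // Detected behavior model (after removing unconnected nodes).
        Set<String> detNodes = Set.of("home", "category", "product", "item", "cart");
        Set<String> detEdges = Set.of("home->category", "category->product", "product->item",
                                      "item->cart");
        Set<String> detParams = Set.of("REPTILES");

        double nodeRatio = ratioOfDifferences(ibmNodes, detNodes);   // (2) structural differences
        double edgeRatio = ratioOfDifferences(ibmEdges, detEdges);
        boolean paramsMatch = detParams.containsAll(ibmParams);      // (3) request parameters
        System.out.printf("node ratio=%.2f, edge ratio=%.2f, parameters match=%b%n",
                nodeRatio, edgeRatio, paramsMatch);
    }
}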
5.2 JIRA Experiment

The JIRA experiment utilizes real-world monitoring data which we collect every semester during a four-week practical course in which multiple groups of students plan and develop a small software system. Therefore, we do not have predefined IBMs for this experiment. However, we want to detect and isolate specific behaviors. To examine which computed behavior models match reality, we will discuss the aggregated behavior models with the students. Beside this qualitative evaluation, we will also gain insight into how students use JIRA and whether we have to introduce the functionality differently.

6 Preliminary Results

We already evaluated X-means and EM clustering. The X-means setup is based on preliminary work, where we tested different configuration parameters for the algorithm [9]. We chose a configuration for X-means which provided the best fit to the JPetStore scenario: we set the range for the number of expected clusters to [6..12] and use the Manhattan metric.

For the EM clustering, we decided to go with the standard settings in Weka and did not set any parameters, including the pre-estimated number of clusters. Since both algorithms start with randomly chosen values, the results may differ between executions.
Table 1: Comparison between the generated clusters and the IBMs (scores and content).

  Scores     AM     BU    CL    FL    SC    SR    NC
  EM         0.25   0     0     0     0.8   0.8   0.9
  X-means    0.25   0     0     0     0.1   0.1   0.4

  Content    AM     BU    CL    FL    SC    SR    NC
  EM                +     +     +     (a)   (c)   (e)
  X-means           +     +     +     (b)   (d)   (f)
Therefore, we execute each clustering five times to avoid results solely based on arbitrary starting values. In X-means, the resulting user behaviors are the computed centroids of each cluster. They do not necessarily correspond to a real behavior. In contrast, the EM clustering only groups measured behaviors. In this evaluation, we simply took one representative behavior of each group. This can be improved, e.g., by creating a mean vector of every cluster.
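The improvement mentioned above, taking the mean vector of each cluster as its representative instead of a single member, could look like the following sketch, with plain arrays standing in for vectorized behavior models.

import java.util.Arrays;
import java.util.List;

// Sketch: represent a cluster of vectorized behaviors by its mean vector instead of
// picking a single member, as suggested as an improvement for the EM-based grouping.
public class ClusterRepresentativeSketch {

    static double[] meanVector(List<double[]> cluster) {
        double[] mean = new double[cluster.get(0).length];
        for (double[] behavior : cluster)
            for (int i = 0; i < mean.length; i++) mean[i] += behavior[i];
        for (int i = 0; i < mean.length; i++) mean[i] /= cluster.size();
        return mean;
    }

    public static void main(String[] args) {
        // Three measured behaviors of one cluster, e.g., transition counts per edge.
        List<double[]> cluster = List.of(
                new double[] { 2, 0, 1 },
                new double[] { 3, 1, 1 },
                new double[] { 4, 0, 2 });
        System.out.println(Arrays.toString(meanVector(cluster)));  // [3.0, 0.33..., 1.33...]
    }
}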
Both approaches could not detect all 7 IBMs (EM=4 and X-means=5 clusters), but some of the detected models match an IBM. Table 1 depicts scores and parameter matches of behaviors to IBMs.

EM and X-means both detected the account manager behavior, but there were minor discrepancies between the aggregations and the IBMs. As we did not record parameters for these pages, we could not compare the behaviors content-wise. The browsing user, cat lover, and fish lover were detected correctly by both algorithms. The single cat buyer, however, could not be detected by EM; the closest match was the single reptile buyer behavior (a). X-means created a merged cluster of the cat buyer, the reptile buyer, and the new customer, identifiable by a 1/3 probability for a cat and 2/3 for reptiles (b). Similarly, the single reptile buyer was identified by EM (c), and the closest X-means match was the same as for the single cat buyer (d). Finally, the new customer detection failed, as the returned cluster deviated significantly from the IBM. Also, the found graph better matches a single buyer or product lover than the new customer.

7 Conclusion

We presented our efforts towards realistic user behavior clustering to improve the understanding of workloads and their composition, which can improve software quality assessment and support more precise workload alterations. All experiment data, notes, and artifacts can be found in our replication package [10].

We reported on the detection quality of the two clustering approaches EM and X-means for behavior models. Our current findings are that both clustering approaches are able to differentiate some behaviors based on parameter information, which is an improvement in comparison to clustering without this information. However, they are unable to detect all behaviors correctly. Key issues are, for X-means, larger vectors resulting in less precise clustering [3] and, for EM, the behavior model merge, which might be amendable.

In the future, we will evaluate further aggregation algorithms, including hierarchical clustering and similarity matching. We want to include other factors, like seasonal factors, into the clustering to improve detection. In the context of EMLS'18, our key questions are: (a) Are these specific approaches useful to improve the quality of aggregated behaviors? (b) Are cyclic graphs the best possible way to describe realistic user behaviors? (c) How can we improve our evaluation, especially in scenarios where we cannot predefine ideal behavior models?

References

[1]  J. B. MacQueen. "Some Methods for Classification and Analysis of MultiVariate Observations". In: 5th Berkeley Symp. on Math. Statistics and Probability. Vol. 1. UC Press, 1967.
[2]  T. K. Moon. "The expectation-maximization algorithm". In: IEEE Signal Process. Mag. 13.6 (Nov. 1996).
[3]  D. Pelleg et al. "X-means: Extending K-means with Efficient Estimation of the Number of Clusters." In: ICML. Vol. 1. 2000.
[4]  M. Hall et al. "The WEKA Data Mining Software: An Update". In: SIGKDD Explor. Newsl. 11.1 (Nov. 2009).
[5]  A. v. Hoorn et al. "Kieker: A Framework for Application Performance Monitoring and Dynamic Software Analysis". In: Proceedings of ICPE 2012. ACM, Apr. 2012.
[6]  R. Heinrich et al. "Architectural Run-Time Models for Operator-in-the-Loop Adaptation of Cloud Applications". In: Proceedings of MESOCA. IEEE Computer Society, Sept. 2015.
[7]  M. C. Calzarossa et al. "Workload Characterization: A Survey Revisited". In: CSUR 48.3 (Feb. 2016).
[8]  C. Vögele et al. "WESSBAS: extraction of probabilistic workload specifications for load testing and performance prediction – a model-driven approach for session-based application systems". In: SoSyM (Oct. 2016).
[9]  C. Dornieden. "Knowledge-Driven User Behavior Model Extraction for iObserve". Master thesis. Kiel University, June 2017.
[10] Replication package. doi.org/10.5281/zenodo.883095. 2017.
[11] JIRA – Issue Tracking System. https://de.atlassian.com/software/jira. 2017.
[12] MyBatis JPetStore application. http://www.mybatis.org/spring/sample.html. 2017.
[13] Selenium browser automation. http://www.seleniumhq.org. 2017.

</pre>