=Paper=
{{Paper
|id=Vol-2066/emls2018paper03
|storemode=property
|title=Extracting Realistic User Behavior Models
|pdfUrl=https://ceur-ws.org/Vol-2066/emls2018paper03.pdf
|volume=Vol-2066
|authors=Reiner Jung,Marc Adolf
|dblpUrl=https://dblp.org/rec/conf/se/JungA18
}}
==Extracting Realistic User Behavior Models==
<pdf width="1500px">https://ceur-ws.org/Vol-2066/emls2018paper03.pdf</pdf>
<pre>
Extracting Realistic User Behavior Models

Reiner Jung (rju@informatik.uni-kiel.de), Kiel University, Germany
Marc Adolf (mad@informatik.uni-kiel.de), Kiel University, Germany


Abstract

Workloads play a central role in assessing software qualities, like performance and privacy. They are characterized by intensity and user behavior patterns. Multiple intensities and behaviors are combined to create workload profiles which, among others, serve to evaluate software designs and to predict system utilization. The central challenge for workload profiles is their fit to real workloads and in particular the match to specific behaviors. This is especially relevant for understanding and identifying specific user groups and for supporting workload composition by operators.

In this paper, we address the identification of such realistic user behaviors by utilizing domain-specific attributes, report on our evaluation of the fitness of behavior clustering approaches, and discuss our setup to evaluate further clustering approaches.

1 Introduction

Service quality of software systems is influenced by workload intensity and user behavior. Both factors play a vital role in characterizing the system workload [7], which is relevant to understand past workloads and to construct workload profiles to estimate future system utilization and performance. For example, the resource consumption of browsing a catalog, searching the inventory, and purchasing items can be quite different. Therefore, it is necessary to be able to distinguish specific kinds of user behavior to characterize the workload sufficiently.

State-of-the-art workload characterization approaches, such as WESSBAS [8], use a behavior mix, where different workload intensities are combined with specific user behavior models to construct a workload model. These approaches collect user sessions and aggregate them to behavior models. WESSBAS estimates behavior models utilizing X-means clustering [3]. Such behavior models have three key shortcomings: (I) They reflect the observed behavior of the past, but might not represent specific user groups correctly, harming predictability. For example, a detergent shopper might reappear frequently while a sunscreen shopper has a different seasonal profile. Unfortunately, current approaches cannot distinguish between them. (II) The behavior models use cyclic graphs with edge counts or probabilities to create compact representations. However, in this process we may lose behavioral information, as two visits of a page might differ in purpose and a looping behavior might actually be dissimilar to another loop along the same pages. (III) X-means only yields acceptable results for small parameter vectors of at least ordinal values, but current behavior models are mapped to vectors. More model transitions imply more parameters, which harm clustering [3].

To mitigate these issues, we want (a) to advance behavior models to better capture the specific properties of different kinds of users based on domain knowledge of the observed software system and other parameters, e.g., time, and (b) to improve clustering and classification of observed user behaviors. Therefore, we extend classic behavior models with domain knowledge and evaluate different aggregation approaches capable of handling large parameter sets, or find ways to reduce the parameter sets to be able to use aggregation approaches well suited for a limited number of parameters.

In this paper, we report on our preliminary findings regarding the two clustering approaches X-means and Expectation-Maximization (EM) [2] in the context of realistic user behaviors, present additional approaches which we are currently investigating, and formulate key questions regarding the identification of behavior models and their quality.

The paper is structured as follows: Section 2 discusses user behavior models. Section 3 introduces clustering and aggregation approaches. Section 4 presents the concept of realistic user behaviors. Section 5 describes the evaluation, and Section 6 discusses preliminary results for X-means and EM clustering. Finally, Section 7 summarizes our findings, discusses further research, and describes our key questions.

2 Behavior Models

Behavior models describe kinds of users and are aggregations of single user behaviors with similar behavior patterns. A single user behavior comprises all system invocations (entry-level events) of a user during a session. It can be modeled as a path over visited pages or transformed into a behavior graph or a Markov chain, which may contain loops for repetitive behavior. These paths or graphs are then grouped by similarity and merged into a behavior model.

Figure 1 depicts an annotated behavior graph of a user interacting with the JPetStore [12], an example application resembling a shop system for pets.
Figure 1: Behavior graph representing a shopper from our JPetStore [12] use case.

In this graph, nodes represent page visits and edges express the transitions between pages. The numbers at the edges indicate the number of transitions between pages. In addition, we added domain-specific information, like the viewed category and product, which can be used to support the behavior clustering.
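As an illustration, a minimal sketch of such an annotated behavior graph could look as follows: nodes are visited pages, edges carry transition counts, and nodes can carry domain-specific annotations such as the viewed category or product. The class and field names are illustrative assumptions, not the actual iObserve data model.

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of an annotated behavior graph (illustrative names, not the iObserve model).
public class BehaviorGraph {

    // Node: a visited page plus domain-specific annotations (e.g., category=CATS, product=Manx).
    public static class Node {
        final String page;
        final Map<String, String> annotations = new HashMap<>();
        Node(String page) { this.page = page; }
    }

    private final Map<String, Node> nodes = new HashMap<>();
    // Edge counts: "fromPage->toPage" -> number of observed transitions.
    private final Map<String, Integer> edgeCounts = new HashMap<>();

    public Node node(String page) {
        return nodes.computeIfAbsent(page, Node::new);
    }

    // Record one observed transition between two pages, creating nodes on demand.
    public void addTransition(String from, String to) {
        node(from);
        node(to);
        edgeCounts.merge(from + "->" + to, 1, Integer::sum);
    }

    public static void main(String[] args) {
        BehaviorGraph g = new BehaviorGraph();
        g.addTransition("home", "category");
        g.addTransition("category", "product");
        g.addTransition("product", "category");          // loop back for repeated browsing
        g.node("category").annotations.put("category", "CATS");
        g.node("product").annotations.put("product", "Manx");
        System.out.println(g.edgeCounts);
    }
}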
3 Clustering Approaches

Clustering can be used to identify groups of data points which share similarities. In our context, we use clustering to identify user behavior models, as WESSBAS does with X-means. Clustering is affected by density, distances, and the distribution of data points. Depending on the clustering approach, the dimension of the data points can have a significant impact on the quality of the clusters. We employ clustering methods provided by Weka [4]:

X-Means X-means builds on the K-means clustering algorithm [3], which consists of three steps [1]: (1) For every expected cluster (K), a center point, called centroid, is randomly chosen from the data points. (2) According to a chosen distance metric, each point x of the data set is assigned to the closest centroid. (3) Each centroid is recomputed as the center of mass of the points assigned to it. (2) and (3) are repeated until a convergence criterion is met.

In contrast to K-means, X-means searches over a range (e.g., 2 to 10) for the set of clusters which provides the best fit. Therefore, X-means starts with computing K-means for the lower bound (e.g., K=2). Subsequently, each cluster is split into two using 2-means to try to improve the fit. Both steps are iterated while incrementing K until the upper bound is reached or an iteration is worse than the one before [3].
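The following sketch illustrates the K-means core that X-means builds on: points are assigned to the closest centroid and centroids are recomputed until the assignment stabilizes. The surrounding X-means model-selection loop (trying a range of K and splitting clusters) is only indicated in a comment; all names are illustrative and not Weka's implementation.

import java.util.Arrays;
import java.util.Random;

// Illustrative K-means core (steps (1)-(3) from the text); not the Weka implementation.
public class KMeansSketch {

    static double manhattan(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += Math.abs(a[i] - b[i]);
        return d;
    }

    // Returns the cluster index of each point after convergence.
    static int[] kMeans(double[][] points, int k, long seed) {
        Random rnd = new Random(seed);
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++)                       // (1) pick random points as centroids
            centroids[c] = points[rnd.nextInt(points.length)].clone();

        int[] assignment = new int[points.length];
        boolean changed = true;
        while (changed) {                                 // repeat (2) and (3) until stable
            changed = false;
            for (int p = 0; p < points.length; p++) {     // (2) assign to the closest centroid
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (manhattan(points[p], centroids[c]) < manhattan(points[p], centroids[best]))
                        best = c;
                if (assignment[p] != best) { assignment[p] = best; changed = true; }
            }
            for (int c = 0; c < k; c++) {                 // (3) recompute centroid as center of mass
                double[] sum = new double[points[0].length];
                int n = 0;
                for (int p = 0; p < points.length; p++)
                    if (assignment[p] == c) { n++; for (int i = 0; i < sum.length; i++) sum[i] += points[p][i]; }
                if (n > 0) { for (int i = 0; i < sum.length; i++) sum[i] /= n; centroids[c] = sum; }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] pts = { {1, 1}, {1, 2}, {8, 8}, {9, 8}, {8, 9} };
        // An X-means-style search would call kMeans for K in a range (e.g., 2..10)
        // and keep the K whose clustering scores best, e.g., according to a criterion such as BIC.
        System.out.println(Arrays.toString(kMeans(pts, 2, 42)));
    }
}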
Expectation-Maximization EM is an iterative method consisting of two phases (E- and M-step) that are repeated until the convergence criterion is met and a final set of clusters is identified. Initially, a random set of cluster-identifying data points is defined; these are the initial parameters for EM. The E-step uses these parameters to compute the expected values of each data point. The M-step uses the E-step results to compute a new maximum likelihood for the data points regarding the parameters. This way, the new parameters for the next iteration are computed. These two steps are repeated until the convergence criterion is reached [2].
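A minimal sketch of the E-/M-step structure, assuming a two-component one-dimensional Gaussian mixture for readability; the behaviors in our setting are higher-dimensional vectors and Weka's EM clusterer differs in its details, so this only illustrates the alternation of the two phases.

// Minimal EM sketch for a two-component 1D Gaussian mixture; only illustrates
// the E-/M-step structure, not Weka's EM clusterer used in the paper.
public class EmSketch {

    static double gauss(double x, double mean, double var) {
        return Math.exp(-(x - mean) * (x - mean) / (2 * var)) / Math.sqrt(2 * Math.PI * var);
    }

    public static void main(String[] args) {
        double[] x = { 1.0, 1.2, 0.8, 7.9, 8.1, 8.0 };
        // Initial (randomly chosen) parameters: means, variances, mixture weights.
        double[] mean = { 0.0, 5.0 }, var = { 1.0, 1.0 }, weight = { 0.5, 0.5 };
        double[][] resp = new double[x.length][2];        // responsibilities (expected memberships)

        for (int iter = 0; iter < 50; iter++) {           // fixed iterations instead of a convergence test
            // E-step: expected cluster membership of each data point under the current parameters.
            for (int i = 0; i < x.length; i++) {
                double p0 = weight[0] * gauss(x[i], mean[0], var[0]);
                double p1 = weight[1] * gauss(x[i], mean[1], var[1]);
                resp[i][0] = p0 / (p0 + p1);
                resp[i][1] = p1 / (p0 + p1);
            }
            // M-step: maximum-likelihood re-estimation of the parameters from the responsibilities.
            for (int c = 0; c < 2; c++) {
                double sumR = 0, sumX = 0, sumSq = 0;
                for (int i = 0; i < x.length; i++) { sumR += resp[i][c]; sumX += resp[i][c] * x[i]; }
                mean[c] = sumX / sumR;
                for (int i = 0; i < x.length; i++) sumSq += resp[i][c] * (x[i] - mean[c]) * (x[i] - mean[c]);
                var[c] = Math.max(sumSq / sumR, 1e-6);    // avoid degenerate zero variance
                weight[c] = sumR / x.length;
            }
        }
        System.out.printf("means: %.2f %.2f%n", mean[0], mean[1]);
    }
}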
                                                                         rate group from those buying sunscreen. This is help-
Expectation-Maximization EM is an iterative                              ful to better understand seasonal behavioral changes
method consisting of two phases (E- and M-step) that                     and allow to create and modify workloads more real-
are repeated until the convergence criteria is met and                   istically. This is relevant in scenarios, where workload
a final set of clusters is identified. Initially, a ran-                 characterizations can be modified to provide the sys-
Similarity Matching In contrast to the other approaches, similarity matching uses two metrics to compare graphs: one based on structural similarity and one based on the distance of parameter values, i.e., their semantic similarity. The algorithm initially computes the distance between each pair of graphs in a set of behavior graphs, creating for each graph a vector containing its distances to all other graphs. The distance between two vectors is the sum of differences \sum_{i=1}^{n} |d_{a,i} - d_{b,i}|, where n is the number of graphs and d_{j,i} refers to the values in vector j. Graphs where the distance is lower than a defined threshold are then considered similar and grouped together.

These groups are further divided based on their semantic difference. For example, in the JPetStore, categories, products, and items form a tree. The distance between two values in the tree determines their semantic difference, e.g., the two cats Fritz and Felix both belong to the product male cartoon cat, having a distance of one, while the cat Amber belongs to the product female cat, and therefore the distance is two.
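A sketch of the structural grouping step: given a precomputed pairwise graph-distance matrix, each graph's row serves as its distance vector, two vectors are compared by the sum of absolute differences from the formula above, and graphs below a threshold are grouped. The distance values and the threshold are placeholders, not measured data.

import java.util.ArrayList;
import java.util.List;

// Sketch of the structural grouping step of similarity matching.
// dist[i][j] is assumed to be a precomputed structural distance between graphs i and j.
public class SimilarityMatchingSketch {

    // Distance between two graph-distance vectors: sum over i of |d_a,i - d_b,i|.
    static double vectorDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    public static void main(String[] args) {
        // Placeholder pairwise distances for four behavior graphs (row i = distance vector of graph i).
        double[][] dist = {
            { 0, 1, 6, 7 },
            { 1, 0, 6, 6 },
            { 6, 6, 0, 1 },
            { 7, 6, 1, 0 }
        };
        double threshold = 4.0;                           // placeholder threshold

        List<List<Integer>> groups = new ArrayList<>();
        boolean[] grouped = new boolean[dist.length];
        for (int i = 0; i < dist.length; i++) {
            if (grouped[i]) continue;
            List<Integer> group = new ArrayList<>();
            group.add(i);
            grouped[i] = true;
            for (int j = i + 1; j < dist.length; j++)     // group graphs whose vectors are close enough
                if (!grouped[j] && vectorDistance(dist[i], dist[j]) < threshold) {
                    group.add(j);
                    grouped[j] = true;
                }
            groups.add(group);
        }
        System.out.println(groups);                       // e.g., [[0, 1], [2, 3]]
        // A second pass would split these groups further by semantic (tree-based) distance.
    }
}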
4 Realistic User Behaviors

We define realistic user behavior models as behavior models which reflect real groups of users, in contrast to approximated groups, i.e., groups solely defined by their transitions, neglecting domain-specific data. For example, our detergent shoppers should form a separate group from those buying sunscreen. This is helpful to better understand seasonal behavioral changes and allows workloads to be created and modified more realistically. This is relevant in scenarios where workload characterizations can be modified to provide the system with knowledge of upcoming events, like a sunscreen shopper just before the holiday season.

A key ingredient for realistic user behaviors is domain-specific data, like the products or categories. With this additional data, user behavior can be classified into different groups. To be able to use such values in a clustering approach, a suitable metric must be defined, e.g., products of a similar type should be closer together than products which are in another category.
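One possible such metric, sketched below under the assumption that categories, products, and items form a tree as in the JPetStore example from Section 3: two values are compared by how far up the tree one has to climb until they share an ancestor, so items of the same product are closer than items from different categories. The concrete tree and the class API are illustrative assumptions.

import java.util.HashMap;
import java.util.Map;

// Sketch of a domain-specific distance: categories, products, and items form a tree, and
// two items are compared by how far up the tree one has to go until they share an ancestor
// (1 = same product, 2 = same category, ...), matching the Fritz/Felix/Amber example.
public class DomainTreeDistance {

    private final Map<String, String> parent = new HashMap<>(); // child -> parent in the domain tree

    void add(String child, String par) { parent.put(child, par); }

    // Assumes both values are leaves on the same tree level (items under products under categories).
    int distance(String a, String b) {
        int steps = 0;
        while (!a.equals(b)) {                            // climb both leaves upwards until they meet
            a = parent.get(a);
            b = parent.get(b);
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        DomainTreeDistance tree = new DomainTreeDistance();
        tree.add("CATS", "root");
        tree.add("male cartoon cat", "CATS");
        tree.add("female cat", "CATS");
        tree.add("Fritz", "male cartoon cat");
        tree.add("Felix", "male cartoon cat");
        tree.add("Amber", "female cat");
        System.out.println(tree.distance("Fritz", "Felix")); // 1: same product
        System.out.println(tree.distance("Fritz", "Amber")); // 2: same category, different product
    }
}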
5 Evaluation

Our evaluation uses the iObserve analysis service with different aggregation filters to examine clustering and aggregation approaches. Figure 2 depicts an excerpt of the pipe-and-filter setup for our analysis. The Session Collector collects EntryEvents caused by users and creates sessions (collections of EntryEvents). The filter sends out a session event either when it receives a SessionEndEvent or when a timeout is reached and the filter is triggered by the time trigger filter. The AnnotatedGraphBuilder creates an annotated graph from the SessionEvent and sends it to an AggregationFilter. Depending on the aggregation and clustering approach, a specific filter is inserted there. Finally, the aggregated graphs are serialized with the GraphOutputFilter or alternatively transformed into a behavior model suitable for the Palladio Component Model.

Figure 2: Analysis Pipe-and-Filter setup for user behavior aggregation (Session Collector, Annotated Graph Builder, Aggregation Filter, Graph Output Filter / PCM Model Translation, and timer stages).
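A simplified stand-in for the Session Collector stage described above: it groups incoming EntryEvents by session and emits a session either when a SessionEndEvent arrives or when the timer trigger detects a timeout. This is not the actual iObserve filter implementation; the event and class names are assumptions for illustration.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the Session Collector stage: groups EntryEvents into sessions
// and emits a session on SessionEndEvent or when a timeout expires. Illustrative only.
public class SessionCollectorSketch {

    record EntryEvent(String sessionId, long timestamp, String operation) {}

    private final long timeoutMs;
    private final Map<String, List<EntryEvent>> openSessions = new HashMap<>();
    private final Map<String, Long> lastSeen = new HashMap<>();

    SessionCollectorSketch(long timeoutMs) { this.timeoutMs = timeoutMs; }

    void onEntryEvent(EntryEvent e) {
        openSessions.computeIfAbsent(e.sessionId(), id -> new ArrayList<>()).add(e);
        lastSeen.put(e.sessionId(), e.timestamp());
    }

    void onSessionEnd(String sessionId) {                 // explicit end of a session
        emit(sessionId);
    }

    void onTimerTrigger(long now) {                       // periodic trigger from the timer filter
        for (String id : new ArrayList<>(openSessions.keySet()))
            if (now - lastSeen.get(id) > timeoutMs) emit(id);
    }

    private void emit(String sessionId) {
        List<EntryEvent> session = openSessions.remove(sessionId);
        lastSeen.remove(sessionId);
        if (session != null)
            System.out.println("SessionEvent " + sessionId + " with " + session.size() + " entry events");
        // In the real pipeline, the session would be passed on to the AnnotatedGraphBuilder.
    }

    public static void main(String[] args) {
        SessionCollectorSketch collector = new SessionCollectorSketch(30_000);
        collector.onEntryEvent(new EntryEvent("s1", 0, "viewCategory"));
        collector.onEntryEvent(new EntryEvent("s1", 1_000, "viewProduct"));
        collector.onSessionEnd("s1");                      // emits the session immediately
        collector.onEntryEvent(new EntryEvent("s2", 2_000, "searchProducts"));
        collector.onTimerTrigger(60_000);                  // emits s2 via timeout
    }
}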

The input for this analysis is provided by two software systems. The first observed system is a JPetStore [12] instance instrumented with Kieker [5]. The second one is an instrumented instance of our research group's JIRA [11], which is used by students during a four-week practical course. We use the JPetStore to evaluate whether a specific, previously defined setup of realistic behaviors can be detected. The JIRA experiment is used to apply the approaches to another domain, where we want to explore whether they can produce reasonable results in a realistic scenario.

5.1 JPetStore Experiment

For the JPetStore experiments, we modeled seven realistic (ideal) user behaviors [9], which utilize all functions of the JPetStore. We created workloads with Selenium [13] that represent these behavior models. We executed the JPetStore together with our workload and collected monitoring data. This data is then processed by the iObserve analysis [6] using different clustering algorithms provided by their respective filters. Then, we compare the detected behaviors with the mentioned set of seven ideal behavior models (IBM).

Workload The behaviors are tailored to share common behavior, but also include significant differences regarding pages, transitions, and request parameters, e.g., whether the person shops for cats or fish:

Account manager (AM) Changes contact information after login. Inspects one of the prior orders.
Browsing user (BU) Searches products and only browses categories, products, and items.
Product lover (*L) Visits the CATS (CL) or FISH (FL) category and selects one product. Repeats this 8 times and concludes shopping.
Single product buyer (S*) Goes to a category (REPTILES (SR) or CATS (SC)) and buys one item.
New customer (NC) Registers as a new customer, logs in, and buys a reptile.

Experiment Execution At the end of each analysis run, we compare the detected clustered behavior models with the prepared IBMs. First, we identify which detected behavior model best matches an IBM. Second, we determine the distance of the matching model. In case a match can be identified, this counts as a hit (score=0). The match between two models is computed in three steps: (1) We remove nodes which are not connected to the behavior graph, as they are created by mapping graphs to matrices and back. (2) We identify missing and additional nodes and edges, and compute ratios between these differences and the IBM. The lower the ratios, the better the fit of the detected behavior model. (3) We compare the request parameters in the behavior models. For example, if in the IBM the parameter CATS appears once, but the detected behavior model includes REPTILES, then the behaviors do not match.
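To make the match computation more concrete, the sketch below compares a detected behavior model with an IBM on node and edge sets: it computes the ratios of missing and additional nodes and edges relative to the IBM (step 2) and checks the request parameters (step 1, the removal of unconnected nodes, is assumed to have happened already). How the ratios are combined into the scores of Table 1 is an assumption for illustration.

import java.util.Set;

// Sketch of the IBM matching steps (2) and (3): ratios of missing/additional nodes and
// edges relative to the IBM, plus a parameter check. Combining the ratios into a single
// score is illustrative; step (1), removing unconnected nodes, is assumed done.
public class IbmMatchSketch {

    static double ratioOfDifferences(Set<String> ibm, Set<String> detected) {
        long missing = ibm.stream().filter(e -> !detected.contains(e)).count();
        long additional = detected.stream().filter(e -> !ibm.contains(e)).count();
        return (missing + additional) / (double) ibm.size();   // lower is better, 0 = identical
    }

    public static void main(String[] args) {
        // Ideal behavior model (IBM): single cat buyer (illustrative pages and parameters).
        Set<String> ibmNodes = Set.of("home", "category", "product", "item", "cart", "order");
        Set<String> ibmEdges = Set.of("home->category", "category->product", "product->item",
                                      "item->cart", "cart->order");
        Set<String> ibmParams = Set.of("CATS");

        // Detected behavior model (after removing unconnected nodes).
        Set<String> detNodes = Set.of("home", "category", "product", "item", "cart");
        Set<String> detEdges = Set.of("home->category", "category->product", "product->item",
                                      "item->cart");
        Set<String> detParams = Set.of("REPTILES");

        double nodeRatio = ratioOfDifferences(ibmNodes, detNodes);   // (2) structural differences
        double edgeRatio = ratioOfDifferences(ibmEdges, detEdges);
        boolean paramsMatch = detParams.containsAll(ibmParams);      // (3) request parameters
        System.out.printf("node ratio=%.2f, edge ratio=%.2f, parameters match=%b%n",
                nodeRatio, edgeRatio, paramsMatch);
    }
}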
5.2 JIRA Experiment

The JIRA experiment utilizes real-world monitoring data which we collect every semester during a four-week practical course in which multiple groups of students plan and develop a small software system. Therefore, we do not have predefined IBMs for this experiment. However, we want to detect and isolate specific behaviors. To examine which computed behavior models match reality, we will discuss the aggregated behavior models with the students. Beside this qualitative evaluation, we will also gain insight into how students use JIRA and whether we have to introduce the functionality differently.

6 Preliminary Results

We already evaluated X-means and EM clustering. The X-means setup is based on preliminary work, where we tested different configuration parameters for the algorithm [9]. We chose a configuration for X-means which provided the best fit to the JPetStore scenario: we set the range for the number of expected clusters to [6..12] and use the Manhattan metric.

For the EM clustering, we decided to go with the standard settings in Weka and did not set any parameters, including the pre-estimated number of clusters. Since both algorithms start with randomly chosen values, the results may differ between executions.
Table 1: Comparison between the generated clusters and the IBMs (scores and content).

  Scores     AM     BU    CL    FL    SC    SR    NC
  EM         0.25   0     0     0     0.8   0.8   0.9
  X-means    0.25   0     0     0     0.1   0.1   0.4

  Content    AM     BU    CL    FL    SC    SR    NC
  EM                +     +     +     (a)   (c)   (e)
  X-means           +     +     +     (b)   (d)   (f)
Therefore, we execute each clustering five times to avoid results solely based on arbitrary starting values. In X-means, the resulting user behaviors are the computed centroids of each cluster. They do not necessarily correspond to a real behavior. In contrast, the EM clustering only groups measured behaviors. In this evaluation, we simply took one representative behavior of each group. This can be improved, e.g., by creating a mean vector of every cluster.
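The improvement mentioned above, taking the mean vector of each cluster as its representative instead of a single member, could look like the following sketch, with plain arrays standing in for vectorized behavior models.

import java.util.Arrays;
import java.util.List;

// Sketch: represent a cluster of vectorized behaviors by its mean vector instead of
// picking a single member, as suggested as an improvement for the EM-based grouping.
public class ClusterRepresentativeSketch {

    static double[] meanVector(List<double[]> cluster) {
        double[] mean = new double[cluster.get(0).length];
        for (double[] behavior : cluster)
            for (int i = 0; i < mean.length; i++) mean[i] += behavior[i];
        for (int i = 0; i < mean.length; i++) mean[i] /= cluster.size();
        return mean;
    }

    public static void main(String[] args) {
        // Three measured behaviors of one cluster, e.g., transition counts per edge.
        List<double[]> cluster = List.of(
                new double[] { 2, 0, 1 },
                new double[] { 3, 1, 1 },
                new double[] { 4, 0, 2 });
        System.out.println(Arrays.toString(meanVector(cluster)));  // [3.0, 0.33..., 1.33...]
    }
}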
Both approaches could not detect all 7 IBMs (EM=4 and X-means=5 clusters), but some of the detected models match an IBM. Table 1 depicts scores and parameter matches of behaviors to IBMs.

EM and X-means both detected the account manager behavior, but there were minor discrepancies between the aggregations and the IBMs. As we did not record parameters for these pages, we could not compare the behaviors content-wise. The browsing user, cat lover, and fish lover were detected correctly by both algorithms. The single cat buyer, however, could not be detected by EM; the closest match was the single reptile buyer behavior (a). X-means created a merged cluster of the cat buyer, the reptile buyer, and the new customer, identifiable by a 1/3 probability for a cat and 2/3 for reptiles (b). Similarly, the single reptile buyer was identified by EM (c), and the closest X-means match was the same as for the single cat buyer (d). Finally, the new customer detection failed, as the returned cluster deviated significantly from the IBM. Also, the found graph better matches a single buyer or product lover than the new customer.

7 Conclusion

We presented our efforts towards realistic user behavior clustering to improve the understanding of workloads and their composition, which can improve software quality assessment and support more precise workload alterations. All experiment data, notes, and artifacts can be found in our replication package [10].

We reported on the detection quality of the two clustering approaches EM and X-means for behavior models. Our current findings are that both clustering approaches are able to differentiate some behaviors based on parameter information, which is an improvement in comparison to clustering without this information. However, they are unable to detect all behaviors correctly. Key issues are, for X-means, larger vectors resulting in less precise clustering [3] and, for EM, the behavior model merge, which might be amendable.

In the future, we will evaluate further aggregation algorithms, including hierarchical clustering and similarity matching. We want to include other factors, like seasonal factors, into the clustering to improve detection. In the context of EMLS'18, our key questions are: (a) Are these specific approaches useful to improve the quality of aggregated behaviors? (b) Are cyclic graphs the best possible way to describe realistic user behaviors? (c) How can we improve our evaluation, especially in scenarios where we cannot predefine ideal behavior models?

References

[1]  J. B. MacQueen. "Some Methods for Classification and Analysis of MultiVariate Observations". In: 5th Berkeley Symp. on Math. Statistics and Probability. Vol. 1. UC Press, 1967.
[2]  T. K. Moon. "The expectation-maximization algorithm". In: IEEE Signal Process. Mag. 13.6 (Nov. 1996).
[3]  D. Pelleg et al. "X-means: Extending K-means with Efficient Estimation of the Number of Clusters." In: ICML. Vol. 1. 2000.
[4]  M. Hall et al. "The WEKA Data Mining Software: An Update". In: SIGKDD Explor. Newsl. 11.1 (Nov. 2009).
[5]  A. v. Hoorn et al. "Kieker: A Framework for Application Performance Monitoring and Dynamic Software Analysis". In: Proceedings of ICPE 2012. ACM, Apr. 2012.
[6]  R. Heinrich et al. "Architectural Run-Time Models for Operator-in-the-Loop Adaptation of Cloud Applications". In: Proceedings of MESOCA. IEEE Computer Society, Sept. 2015.
[7]  M. C. Calzarossa et al. "Workload Characterization: A Survey Revisited". In: CSUR 48.3 (Feb. 2016).
[8]  C. Vögele et al. "WESSBAS: extraction of probabilistic workload specifications for load testing and performance prediction – a model-driven approach for session-based application systems". In: SoSyM (Oct. 2016).
[9]  C. Dornieden. "Knowledge-Driven User Behavior Model Extraction for iObserve". Master thesis. Kiel University, June 2017.
[10] Replication package. doi.org/10.5281/zenodo.883095. 2017.
[11] JIRA – Issue Tracking System. https://de.atlassian.com/software/jira. 2017.
[12] MyBatis JPetStore application. http://www.mybatis.org/spring/sample.html. 2017.
[13] Selenium browser automation. http://www.seleniumhq.org. 2017.

</pre>