Towards an Evaluation Visualization with Color

Megan H. Varnum, Kate M. B. Spencer, and Alicia M. Grubb
Department of Computer Science, Smith College, Northampton, MA, USA
{mvarnum, kspencer, amgrubb}@smith.edu

Abstract. Goal models help stakeholders understand project scenarios and make decisions. In prior work, we used Tropos evaluation semantics to allow for automated analysis over time; however, formal evaluation labels (e.g., (F, ⊥)) are difficult for users to interpret across a large model. In this paper, we present our work towards understanding the extent to which using colors in goal modeling affects users' ability to make decisions. Specifically, we are interested in studying if coloring intentions with evaluation information allows for better comparisons of initial states and simulations of future paths. To address this question, we developed a color visualization extension to BloomingLeaf, a goal model analysis tool, where the color of each node is changed based on either the initial evaluations or the resulting analysis over time. This then allows us to explore if and how color visualization assists the user with decision making. We present our implementation and initial evaluation of this extension.

1 Introduction

Goal modeling allows stakeholders to model and visualize their domain and analyze trade-offs [1]. Recent work extends goal modeling and analysis to allow for project scenarios with changing evaluations [5]. Users can run simulations to examine relationships over time, with tooling and analysis provided for both iStar [7] and Tropos [6]. We believe these tools have potential for problem-solving among teams of stakeholders. However, the reality of working with large models is that it is challenging to view trends and make decisions because of the large volume of data and the technical nature of analysis results. In our work, we aim to improve the interpretability of model evaluations and analysis.
We investigate the use of color to convey information about the fulfillment of evaluations, both statically and as they change over time. Previous work in the iStar community looks at improving the clarity of goal models through visualizations and making them more user-friendly [3, 10]. We present our work in the context of BloomingLeaf, which uses Tropos semantics, but the lessons learned have similar implications for the iStar language.

Running Example: Social Distancing Dinner. In the midst of the COVID-19 pandemic [9], Emma must decide how to acquire dinner for the week. She constructs a goal model to evaluate the trade-offs between picking up takeout or cooking her dinner at home (see Fig. 1(i)). She runs simulations for various scenarios in BloomingLeaf, one where minimize economic impact is prioritized (i.e., satisfied over time) and another where minimize spread of COVID-19 is prioritized. For each scenario, BloomingLeaf returns evaluation labels for all time points in a week, and Emma is unsure how to proceed. How can she easily know whether cook at home or order takeout is satisfied more often? What about the proportion of time that practice social distancing is partially denied or fully denied? While traversing the simulation path, Emma is unsure how to interpret the data and unclear about the optimal course of action.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Fig. 1: Goal Model and Evaluation Lattice. (i) Running Example: Social Distancing Dinner. (ii) A lattice of evidence pairs in Tropos with colored labels.

Contributions. In this paper, we explore our research question: To what extent does applying colors to individual intentions in models and simulation paths improve model understanding and allow for better decision making?
We present Evaluation Visualization Overlay (EVO), an extension to BloomingLeaf which colors the initial satisfaction values (i.e., evidence pairs) of each intention in a goal model and, after running analysis, overlays the proportion of time points that each intention holds each evaluation. With EVO, users can better visualize future states of their model, make comparisons between intentions, and understand trends in fulfillment over time.

The remainder of this paper is organized as follows. Sect. 2 introduces relevant background about evaluation labels and BloomingLeaf. Sect. 3 explores our visualization approach. Sect. 4 discusses our preliminary evaluation and compares our work with prior visualizations. We conclude in Sect. 5.

2 Background

BloomingLeaf is an online tool used to construct and analyze goal models using the Tropos language [4]. Goal models consist of actors and intentions (i.e., goals, soft goals, tasks, and resources). For example, in Fig. 1(i), Emma is an actor, have dinner is a goal, practice social distancing is a soft goal, and cook at home is a task.

In BloomingLeaf, when EVO is not enabled, elements are solely colored based on their intention type. For instance, fresh groceries in Fig. 1(i) is colored light blue because it is a resource. These colors, combined with different shapes, allow users to distinguish intention types.

In Tropos, each intention can be assigned a qualitative evaluation label (i.e., evidence pair), which is a pair (s, d) where s ∈ {F, P, ⊥} is the level of evidence for and d ∈ {F, P, ⊥} is the level of evidence against the fulfillment of an intention g [4]. F [resp. P] means there is full [resp. partial] evidence for or against the fulfillment of g, while ⊥ represents null evidence. Fig. 1(ii) gives a lattice of evidence pairs, with all possible combinations of evidence for and against.
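The lattice of evidence pairs can be sketched in a few lines of JavaScript (the language of BloomingLeaf's front-end). This is our illustration only, not BloomingLeaf's actual code; the names `LEVELS`, `ALL_PAIRS`, and `classify` are our assumptions.

```javascript
// Illustrative sketch of Tropos evidence pairs (s, d), where s is the
// evidence for and d the evidence against fulfillment. Names and
// structure are ours, not BloomingLeaf's actual implementation.
const LEVELS = ["F", "P", "⊥"];

// Enumerate the full lattice of Fig. 1(ii): nine pairs in total.
const ALL_PAIRS = LEVELS.flatMap(s => LEVELS.map(d => [s, d]));

// Classify a pair: evidence on both axes is a conflict; otherwise
// the non-null axis decides (partial and full grouped together).
function classify([s, d]) {
  if (s === "⊥" && d === "⊥") return "none";
  if (s !== "⊥" && d !== "⊥") return "conflict";
  return s !== "⊥" ? "satisfied" : "denied";
}

classify(["F", "⊥"]); // → "satisfied"
classify(["P", "F"]); // → "conflict"
```

Enumerating the lattice this way confirms the counts used throughout the paper: nine pairs overall, of which four are conflicts.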
Initially, an intention can have one of five evaluation labels: (Fully) Satisfied (F, ⊥), Partially Satisfied (P, ⊥), Partially Denied (⊥, P), (Fully) Denied (⊥, F), and None (⊥, ⊥). Propagation-based analysis techniques may result in intentions being assigned one of four conflicting evaluation labels: (F, F), (F, P), (P, F), and (P, P).

Interactions between intentions are expressed through links. Contribution links (e.g., +, -, ++, --) indicate an element's influence on another. For example, in Fig. 1(i), in order for cook at home to be satisfied, fresh groceries must also be satisfied, indicated by a ++, while cook at home helps practice social distancing, indicated by a +. Decomposition links (and/or) decompose an intention into child goals. An intention with an and [resp. or] decomposition requires all [resp. only one] of its children to be satisfied. Functions indicate an intention's evolution. For example, Fig. 1(i) assigns fresh groceries a Monotonic Negative function (MN) because Emma's groceries will expire throughout the week.

After creating a model of intentions, links, initial values, and functions, users can analyze their model over time by simulating single paths (not shown). These are sequences of states consisting of evidence pairs for each element in the model over an ordered set of time points [5]. Users can step through these time points to see the changing evidence pairs of each intention, based on its assigned function. Unassigned intentions receive evidence pairs via propagation and the Stochastic function. In Fig. 1(i), simulating a single path helps Emma see how the satisfaction of fresh groceries impacts the evaluation of not get or transmit COVID-19 (not shown).

3 Results

The goal of Evaluation Visualization Overlay (EVO) is to visualize both the initial state and path analysis of a model over time.
Each evidence pair is assigned a color, with bluer shades closer to satisfied and redder shades closer to denied, as illustrated in the background of Fig. 1(ii). Thus, (F, ⊥) and (P, ⊥) are shades of blue and (⊥, F) and (⊥, P) are shades of red. Conflicting evaluations with evidence for both satisfaction and denial are shades of purple, with (F, P) [resp. (P, F)] assigned shades of purple closer to blue [resp. red]. These colors are used in both modeling and analysis mode when EVO is enabled.

Implementation. Working within the architecture of BloomingLeaf, we implemented EVO as an on-off toggle option in the top toolbar, with its mechanics implemented in the JavaScript front-end. For the visualization of a single path in analysis mode, BloomingLeaf's back-end provides the necessary data, which is then stored in an encapsulating object. This object contains an array that corresponds to the evidence pairs of each intention at each time point.

EVO while Modeling. Activating EVO in the modeling mode changes the color of each intention to correspond to its user-set initial satisfaction value. This provides an overall visualization of the model's initial state, allowing users to understand the color scheme and initial evaluations.

In our running example, Emma first prioritizes minimize spread of COVID-19, assigning it as (F, ⊥) with a Constant function. Knowing her groceries deplete over time, she sets the resource fresh groceries to (F, ⊥) with a Monotonic Negative function. Interested in the initial state of the model, she activates EVO. She sees that the two intentions above are blue (not shown), while the rest remain their original color. Because the majority of her model doesn't change, Emma knows she cannot answer her dinner question with only the model's initial state.

EVO while Analyzing the Resulting Model. After running a single-path simulation in analysis mode, activating EVO displays colored stripes on each element.
The width of each colored stripe corresponds to the percentage of the path that the intention is assigned a given evidence pair, ordered left to right from most to least satisfied. Thus, the user can gauge the evaluations of intentions over time without needing to walk through the simulation state by state.

In our example, Emma simulates a path in analysis mode and sees the evidence pairs in the corners of the intentions while stepping through time points. However, she is unable to remember the data from each previous time point and so finds it impossible to make sense of all the evidence pairs at once. She then turns on EVO to see how prioritizing the satisfaction of minimize spread of COVID-19 impacts her dinner decision (see Fig. 2). While there are variations in the evaluations, Emma sees that cook at home is more satisfied than order takeout, and that it is never denied. This indicates that to minimize the transmission of COVID-19 in her life she should choose to cook at home. Emma also observes that not get or transmit COVID-19 and practice social distancing are mostly satisfied, but notices that minimize economic impact has the majority of evidence towards denied.

In a second scenario, Emma wants to reduce job loss and is concerned about her local economy. She clears the model and sets minimize economic impact to fully satisfied with a Constant function. Starting with EVO off, she is able to discern that while order takeout is fully satisfied, minimize spread is denied for several time points. However, toggling EVO reveals the whole picture of her results (see Fig. 2). While order takeout and have dinner are overwhelmingly satisfied, the other intentions have mixed results, and many are denied. Given this information, Emma understands the consequences of interacting with others and decides to cook at home to keep herself and others the most safe.
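The stripe computation described above can be sketched as follows, assuming the back-end supplies one evidence pair per time point. The helper names, the exact most-to-least-satisfied ordering among conflict labels, and the hex shades are our assumptions for illustration, not EVO's actual values.

```javascript
// Illustrative sketch of EVO's stripe computation (names, conflict
// ordering, and hex shades are assumptions, not the tool's values).
// Pairs are ordered from most to least satisfied: bluer shades toward
// satisfied, redder toward denied, purple for conflicts.
const PALETTE = [
  ["(F, ⊥)", "#1144cc"], ["(P, ⊥)", "#6699ee"], ["(F, P)", "#7755cc"],
  ["(P, P)", "#885599"], ["(F, F)", "#884488"], ["(⊥, ⊥)", "#bbbbbb"],
  ["(P, F)", "#aa4477"], ["(⊥, P)", "#ee8866"], ["(⊥, F)", "#cc2211"],
];

// Given one intention's evidence pairs across a simulated path,
// return its colored stripes with widths as percentages of the path.
function stripes(pairsOverTime) {
  const counts = new Map();
  for (const pair of pairsOverTime) {
    counts.set(pair, (counts.get(pair) ?? 0) + 1);
  }
  return PALETTE
    .filter(([pair]) => counts.has(pair))
    .map(([pair, color]) => ({
      pair, color,
      percent: (100 * counts.get(pair)) / pairsOverTime.length,
    }));
}

// e.g., cook at home over a four-time-point path:
stripes(["(F, ⊥)", "(F, ⊥)", "(P, ⊥)", "(⊥, ⊥)"]);
// → 50% blue (satisfied), 25% light blue, 25% grey (none)
```

Because every intention's stripes follow the same fixed ordering, proportions can be compared at a glance across intentions and across simulation paths, which is the "big picture" benefit discussed in Sect. 4.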
EVO helps Emma understand trends in the model and answer her trade-off question: Which dinner option is fully or partially satisfied for the greatest proportion of time points under her chosen constraints? She can also draw conclusions about the unpredictability of her scenario with EVO by seeing discrepancies among multiple single paths.

Fig. 2: Two single-path simulations for the running example. The left prioritizes minimize spread of COVID-19, and the result favors Emma cooking at home. The right prioritizes minimize economic impact, and the result favors ordering takeout.

4 Discussion and Related Work

To explore EVO's versatility, we took nine models ranging in size from 14 to 121 intentions, some more deterministic and others more stochastic, and analyzed them with EVO enabled. For each model, we considered what information was clear before activating EVO and what we could not understand without it. We consolidated our observations into benefits of EVO and areas of improvement.

Benefits. The toggle operation makes the user interface easy to use. In the modeling mode, EVO shows initial evaluations to visualize the initial state of the model. The colors are intuitive, with bluer shades equating to more satisfied and redder shades to more denied. In analysis mode, EVO provides the user with the "big picture" of their model by visualizing the proportion of time points the intentions hold each evaluation. Since the order of colors is consistent, it's easy to compare evaluation proportions across different intentions and simulation paths.

Improvements. The chosen color palette is not optimal for colorblind users. EVO can be improved by enabling a colorblind mode. An intention that starts (F, ⊥) and evolves to (⊥, F) has the same visualization as the reverse, because the colors are always in the same order. An additional view showing changes in order would differentiate these two cases.
EVO can also be improved to visualize any partial point of interest after simulating a single path. EVO is limited in the modeling view; it's only useful for select models with well-defined initial states.

Plan for Validation. We are confident that EVO benefits users in drawing conclusions from goal models, given the distinct differences observed between using it and not; however, we only provide an initial evaluation. To quantify and validate the functionality of EVO, we must test it among a broader range of users. We plan to conduct a study comparing BloomingLeaf's usability in decision making with and without EVO by providing different models to two randomized groups of students trained in goal modeling. One group's tool will have EVO enabled, while the other group will only use the simulated path. The students will then answer questions about optimal decisions and evaluations over time.

Related Work. Aprajita proposed using heatmaps to depict overall satisfaction (see Fig. 11 of the associated thesis [2]). In both Aprajita's work and this paper, we address the difficulty of allowing the user to visualize trends in time-based goal models [2]. Our research differs in that Aprajita's heatmaps are separate from the goal model, whereas our work integrates the color visualization and the model, allowing users to see both at the same time. Aprajita's system of red to green (for quantitative evaluations) does not apply in our work; thus, we develop our two-axis color scheme (see Fig. 1(ii)).

5 Summary and Future Work

EVO visualizes evaluation trends in single-path simulations as well as models' initial states, which allows for easy comparison across intentions within models. This work could have implications for decision making in other iStar tools [8]. Before drawing conclusions to our research question, a more thorough analysis across a broad range of potential users is required.
Additionally, future work will ensure our color palette is appropriately clear for all users. We are also developing a variation of EVO that visualizes the evaluations at every time point over a single path. This variation allows users to see changes of intentions over time instead of their unordered overall evaluation.

Acknowledgments. Varnum was supported by a Smith College STRIDE Scholarship.

References

1. D. Amyot et al. Evaluating Goal Models Within the Goal-Oriented Requirement Language. International Journal of Intelligent Systems, 25(8):841–877, 2010.
2. Aprajita. TimedGRL: Specifying Goal Models Over Time. Master's thesis, McGill University, 2017.
3. R. F. de Oliveira et al. A Critical View Over iStar Visual Constructs. In Proc. of iStar'19, 2019.
4. P. Giorgini, J. Mylopoulos, E. Nicchiarelli, and R. Sebastiani. Formal Reasoning Techniques for Goal Models. Journal on Data Semantics, 1:1–20, 2003.
5. A. M. Grubb. Evolving Intentions: Support for Modeling and Reasoning about Requirements that Change over Time. PhD thesis, University of Toronto, 2019.
6. A. M. Grubb and M. Chechik. BloomingLeaf: A Formal Tool for Requirements Evolution over Time. In Proc. of RE'18 Posters & Tool Demos, 2018.
7. A. M. Grubb, G. Song, and M. Chechik. GrowingLeaf: Supporting Requirements Evolution over Time. In Proc. of iStar'16, pages 31–36, 2016.
8. T. Li, A. M. Grubb, and J. Horkoff. Understanding Challenges and Tradeoffs in iStar Tool Development. In Proc. of iStar'16, pages 49–54, 2016.
9. WHO. WHO Statement regarding cluster of pneumonia cases in Wuhan, China. https://www.who.int/china/news/detail/09-01-2020-who-statement-regarding-cluster-of-pneumonia-cases-in-wuhan-china, Jan. 9 2020.
10. A. Yasin and L. Liu. Recent Studies on i*: A Survey. In Proc. of iStar'17, pages 79–84, 2017.