Comparing the comprehensibility of numeric versus symbolic contribution labels in goal models: an experimental design

Sotirios Liaskos, School of Information Technology, York University, Toronto, Canada, liaskos@yorku.ca
Wisal Tambosi, School of Information Technology, York University, Toronto, Canada, w.tambosi@gmail.com

Abstract—Goal models have been suggested to be an effective way to support decision making in early requirements engineering. Such models are capable of representing a large number of alternative ways to solve stakeholder problems and comparing them against each other with respect to higher level objectives. Core to the realization of such analysis is the concept of the contribution link that represents how satisfaction of one goal affects satisfaction of another. Many ways for representing and assigning precise meaning to contribution links have been proposed, each with different properties and advantages. But which one agrees more with user preferences on how such links should be used? In this paper, we present an experimental design for comparing two ways for representing contribution links, symbolic versus numeric, with respect to how accurately and quickly users identify optimal decisions using each representation format. Apart from comparing the two representation techniques and advising the modeling practice accordingly, the study aims at showing how a quality construct we call intuitiveness can be added to the range of criteria a modeling language designer has at her disposal for evaluating her language design decisions.

I. INTRODUCTION

Goal models [1]–[3] have long been proposed as an effective means for representing intentional structures and their relationship to decision problems in early requirements engineering [4]–[6]. Using such models, business analysts can capture the variety of ways by which stakeholders can solve their business problems and compare them with one another with respect to set criteria.

Many representational and semantic frameworks have been proposed within the goal modeling community to allow such analysis [5]–[8] ([9] for a survey). One of the fundamental constituents of goal models that allow such analysis is the concept of the contribution link, which is a representation of a relationship between two goals signifying how satisfaction of one affects the satisfaction of the other. Different goal modeling and analysis frameworks propose different ways to visually represent and assign meaning to the contribution concept. The traditional/de-facto representation choice is qualitative (symbolic) labels signifying the quality of contribution (positive or negative) and crudely characterizing the size of the contribution. However, the use of quantitative (numeric) values has also been proposed, whereby, e.g., sign and absolute value are used to represent quality and size of contribution.

These representational options have been studied from a theoretical point of view and different formal semantics have been proposed, each showing how the representations allow inference of satisfaction status of one goal from that of other goals.

However, limited work has been done in terms of how users of the models perceive what the symbols and/or numbers mean and how they expect to use them in order to make inferences pertinent to decision making. It is particularly useful to understand how users intuitively assign meaning to signifiers within the language, when no prior training and/or experience with the language can be assumed for them. Knowing what the untrained user's intuition is, language designers can settle for representations and semantics that are closer to the user's expectations and, as such, easier to learn and more accurate to use.

In this paper we present an experimental design aimed at comparing the intuitiveness of qualitative versus quantitative contribution labels in goal models, having assumed specific semantics for each. Our design aims at showing which of the two visualization-meaning pairs leads to more accurate decisions in the least amount of time.

The rest of the paper is organized as follows. In Section II we offer some background on goal models, contribution links and their semantics.
In Section III we describe the proposed experimental design and in Section IV we summarize and review some of the related work.

II. BACKGROUND

A. Goal Models and Contribution Links

The goal models we consider in this study look like the ones in Figure 1. The nodes (ovals and clouds) are goals that describe states of the world that the actor in question (circular shape) has within their scope (large shaded dashed circle) and wants to achieve or maintain. The ovals describe hard-goals, which are goals that come with a clear way to decide when they are satisfied, while soft-goals (the clouds) are goals for which this is not the case.

Fig. 1. Goal models with symbolic (left) and numeric (right) contribution links.

Goal modeling languages define a variety of relationships between goals and allow for great structural freedom [10]. However, in our study we restrict our focus to goal models that have specific structural characteristics. Thus, through means-ends and decomposition links, hard-goals form an AND/OR decomposition tree whose solutions describe alternative ways by which the root hard-goal can be satisfied. Soft-goals, on the other hand, form their own hierarchy using contribution links, the curved directed lines. Similar lines connect some hard-goals with some soft-goals.
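To make the structural restrictions concrete, the following is a minimal sketch of how such a goal structure could be encoded. It is our own illustration rather than part of the proposed instrument, and all type and field names (Goal, Contribution, GoalModel, root_softgoal, etc.) are assumptions made for the example.

```python
# A minimal sketch (ours, not the paper's tooling) of the restricted goal
# structures described above: one OR-decomposed hard-goal (the decision),
# a soft-goal hierarchy, and labelled contribution links between them.
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Goal:
    name: str
    hard: bool = False               # True: hard-goal (oval); False: soft-goal (cloud)

@dataclass
class Contribution:
    origin: str                      # name of the origin goal
    destination: str                 # name of the destination goal
    label: Union[str, float]         # "++", "+", "-", "--" or a numeric weight in [0.0, 1.0]

@dataclass
class GoalModel:
    goals: Dict[str, Goal]
    alternatives: List[str]          # the hard-goals of the single OR-decomposition
    contributions: List[Contribution]
    root_softgoal: str               # the unique root of the soft-goal hierarchy
```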
A contribution link shows in what way satisfaction (or not) of the origin of the link affects satisfaction (or not) of the destination of the link. This way of affecting the other goal is described through the label of the contribution link. Typically the label will show whether the effect is positive or negative and/or how large it is. Nevertheless, there is more than one way to represent contribution labels and, for each, multiple ways to define their semantics.

The original and seemingly most popular approach to modeling contribution labels is through symbols (diagram on the left in Figure 1). Thus "+", "++", "−" and "−−" denote respectively positive ("helps"), very positive ("makes"), negative ("hurts") and very negative ("breaks") contribution. Alternatively, numbers can be used to convey this information (diagram on the right in Figure 1). Two distinct numeric approaches have been introduced in the literature. The approach by Giorgini et al. [8], [11] assigns a number in the real interval [0.0,1.0] to represent size of contribution and a sign to represent positive or negative contribution¹. The AHP-inspired "linear" interpretation [12], also adopted by URN [7], simply assigns a number in the real interval [0.0,1.0] denoting the share of contribution of the origin goal to the destination goal.

¹ Giorgini et al.'s expressive framework also includes a subscript representing what is being contributed, between satisfaction and denial; both their quantitative and qualitative versions include this dimension. Presentation of this dimension is outside our scope.

B. Contribution Semantics

Informal descriptions such as the above about the meaning of the contribution link allow a model reader/user (henceforth simply user) to perform some very basic inferences by looking at the model. For example, she can compare two contributions with respect to which one is larger or she can even choose between alternatives in the hard-goal decomposition with respect to a soft-goal of interest. For example, in the symbolic model on the left side of Figure 1, if to Reduce Scheduling Effort is an important soft-goal, then we know that (Choose Schedule) Automatically is preferable to doing so Manually, by simply looking at the contribution labels and without knowing precisely what they mean. However, more detailed semantics need to be given in order to perform more complex inferences such as deciding on the satisfaction status of a goal that receives multiple incoming contribution links or, as we will see below, deciding the optimal alternative by considering all contribution links in the structure.

Giorgini et al. have developed the most expressive semantics for both symbolic and numeric links [8], [11]. According to their framework each goal in the diagram can be associated with two variables: one that measures satisfaction and one that measures denial. In the qualitative (symbolic) framework each of these variables can take one of three values: Full Evidence (denoted with prefix F), Partial Evidence (P) and No Evidence (N) – of, respectively, satisfaction (suffix S) or denial (D). For example, for a goal we may have partial evidence of satisfaction and no evidence of denial (denoted {PS,ND}) and, for another, full evidence of satisfaction and partial evidence of denial ({FS,PD}); the inconsistency is perfectly acceptable and the framework's ability to represent it is one of its strengths. A set of rules, seen in Table I, combine the satisfaction and denial values of the origin goal with the contribution label to decide the satisfaction and denial values of the destination. Returning to Figure 1 (qualitative model on the left), if we know that the satisfaction and denial values of Minimal Conflicts are {FS,PD} then, based on the rules of Table I, Quality of Schedule must be {PS,PD} – assuming no other influence.

TABLE I
SYMBOLIC CONTRIBUTION SEMANTICS

Label  Effect                              Label  Effect
++     FS→FS, PS→PS, PD→PD, FD→FD          −−     FS→FD, PS→PD, PD→PS, FD→FS
+      FS→PS, PS→PS, PD→PD, FD→PD          −      FS→PD, PS→PD, PD→PS, FD→PS
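As an illustration of how the rules of Table I operate, the sketch below encodes them as a simple lookup over the evidence levels. It is our own reading of the table rather than code from any of the cited frameworks, and the function and variable names are assumptions.

```python
# A sketch (assumed encoding, not from the paper) of the Table I rules.
# Evidence levels are ordered N < P < F; a goal's value is a (satisfaction, denial) pair.
from typing import Tuple

LEVELS = {"N": 0, "P": 1, "F": 2}

def cap_partial(level: str) -> str:
    """Limit evidence to at most Partial (used by the single '+'/'-' labels)."""
    return "P" if LEVELS[level] > LEVELS["P"] else level

def propagate_symbolic(label: str, sat: str, den: str) -> Tuple[str, str]:
    """Return the (satisfaction, denial) evidence contributed to the destination goal."""
    if label == "++":
        return sat, den                              # e.g. {FS,PD} --++--> {FS,PD}
    if label == "+":
        return cap_partial(sat), cap_partial(den)    # e.g. {FS,PD} --+--> {PS,PD}
    if label == "--":
        return den, sat                              # satisfaction and denial are swapped
    if label == "-":
        return cap_partial(den), cap_partial(sat)
    raise ValueError(f"unknown label {label}")

# The running example: Minimal Conflicts is {FS,PD}; through a '+' link,
# Quality of Schedule receives {PS,PD}, assuming no other influence.
assert propagate_symbolic("+", "F", "P") == ("P", "P")
```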
In the quantitative (numeric) framework the rules are replaced by algebraic formulae. The researchers allude to three possible ways by which this formula can be structured, seen in the top three rows of Table II; in practice their framework is open to the adoption of many other ways. Given a set of goals g' ∈ O_g, each with satisfaction value s(g') ∈ [0.0, 1.0], targeting goal g with contribution links weighted as w(g', g), the satisfaction value of goal g is expected to be s(g) as defined in each of the formulae. In all the proposed formulae ("Bayesian", "Min-Max" and "Serial-Parallel") aggregation is implemented through maximization. Note that in this semantic framework, users are supposed to understand the numbers of the contribution links as absolute contribution values, potentially elicited and understood in isolation from the other ones.

A different interpretation of numeric contributions, which is of particular interest here, is the de-facto approach followed by URN [7], which has been studied by Liaskos et al. [12]. According to that interpretation, a unique numeric satisfaction value is assigned to each goal with values in the real interval [0.0,1.0] – so no distinct satisfaction and denial values. Then, the number on the contribution link denotes the share of contribution of the satisfaction of the origin goal to the satisfaction of the destination goal. This implies also a different formula for satisfaction propagation, the last one in Table II; the formula is labeled "Linear" for it calculates the satisfaction of the destination goal through linearly combining the satisfaction value of each goal that influences it, using the numbers on the contribution links as weights for the linear combination.

TABLE II
NUMERIC CONTRIBUTION SEMANTICS

Bayesian:        $s(g) = \max_{g' \in O_g} \{ s(g') \times w(g', g) \}$
Min-max:         $s(g) = \max_{g' \in O_g} \{ \min(s(g'), w(g', g)) \}$
Serial-parallel: $s(g) = \max_{g' \in O_g} \left\{ \frac{s(g') \times w(g', g)}{s(g') + w(g', g)} \right\}$
Linear:          $s(g) = \sum_{g' \in O_g} s(g') \times w(g', g)$
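A minimal executable rendering of the four rows of Table II follows, assuming each origin goal is given as a (satisfaction, weight) pair; the function names are ours and the formulae are transcribed directly from the table.

```python
# A sketch of the four propagation formulae of Table II (our reading of the
# notation; O_g is the set of goals contributing to g, w their link weights).
def bayesian(contribs):        # contribs: list of (s(g'), w(g', g)) pairs
    return max(s * w for s, w in contribs)

def min_max(contribs):
    return max(min(s, w) for s, w in contribs)

def serial_parallel(contribs):
    return max((s * w) / (s + w) for s, w in contribs if s + w > 0)

def linear(contribs):
    # The interpretation adopted in this study: weights are shares of contribution.
    return sum(s * w for s, w in contribs)

# Example: two origins with satisfaction 1.0 and 0.5, link weights 0.7 and 0.3.
incoming = [(1.0, 0.7), (0.5, 0.3)]
print(linear(incoming))   # 0.85
```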
While the linear interpretation is arguably less expressive and imposes structural limitations on the models (the soft-goal sub-graph must be acyclic), it has been found [12] to be amenable to systematic elicitation through an established decision making technique, the Analytic Hierarchy Process (AHP) [13]. Following AHP, contribution values are not assigned directly but through pairwise comparisons, followed by transformation of the output of these comparisons into the final values, controlling also for the consistency of the input via calculation of a Consistency Ratio (CR). Given this promise of the linear interpretation for practical use, we adopt it as the quantitative interpretation of choice in the study we propose here.
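For readers unfamiliar with the AHP mechanics referred to above, the sketch below shows one standard way (the eigenvector method with Saaty's random indices, implemented with numpy) to turn a pairwise comparison matrix into weights and a Consistency Ratio. It illustrates the general technique rather than the specific elicitation tooling of [12] or [13], and all names in it are ours.

```python
# A sketch of standard AHP mechanics (not code from the paper): derive weights
# from a pairwise comparison matrix and check its Consistency Ratio (CR),
# with CR < 0.1 being the usual acceptability threshold.
import numpy as np

RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}  # Saaty's random indices

def ahp_weights(A):
    """Return the priority (weight) vector and the Consistency Ratio of matrix A."""
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                             # normalized priority vector
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1)                # consistency index
    cr = ci / RI[n] if RI[n] > 0 else 0.0       # consistency ratio
    return w, cr

# Example: "criterion 1 is 3x as important as criterion 2, 5x as criterion 3, ..."
A = np.array([[1, 3, 5],
              [1/3, 1, 2],
              [1/5, 1/2, 1]])
w, cr = ahp_weights(A)
print(w.round(3), round(cr, 3))   # weights sum to 1.0; CR well below 0.1
```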
C. (A case for) the Intuitiveness Construct

Given the above options for visually representing and understanding the use of contribution labels for inferring satisfaction propagation, it is natural to ask which one is more "friendly" to users of the models. One aspect of "friendliness" is the level by which the intended meaning and use of the contribution aligns with the users' intuition.

We use the (working) theoretical construct "intuitiveness" of a model construct to describe the ability of untrained users of a conceptual model to readily understand what the construct means and how it should be used to make inferences in the model. The concept is analogous to the idea of an intuitive human-machine interface: the more intuitive an interface is, the more readily first-time users can use it without the need to resort to help, a manual, etc. The term is akin to that of learnability, which is a quality of an interface that allows users to learn how to use it easily and quickly [14]. One can think of intuitiveness as a facilitator of learnability. Design principles such as consistency and compliance to standards [15] are understood here to facilitate intuitiveness: users will likely find intuitive a user interface that uses conventions with which the user is already familiar.

With this user-machine interface analogy in mind, we can reasonably claim that conceptual models are also artifacts to be efficiently used by people, where "use" here is "understanding and communication" [16]. Further, as design artifacts themselves, modeling languages are results of design decisions at two levels: at the level of the concepts they consider (e.g., hard-goals and soft-goals) and at the level of the visualization of those concepts (e.g., ovals and clouds). It appears that there might be better and worse decisions at each of those levels. For example, would we instead of ovals and clouds use animal pictures (e.g. elephants and dolphins) to represent hard-goals and soft-goals? Likewise, are the concepts "upper-goal" and "lower-goal" more successful choices for representing human intention than the currently used concepts "hard-goal" and "soft-goal"?

Intuitiveness, as we conceptualize and apply it here, measures the entire package of a concept and its visualization: the visualization evokes a meaning, which, in turn, is used to make inferences. When a user is exposed to a visualization and ends up performing an inference that is not intended by the designers, a sub-optimal decision may be claimed at either of the levels: either the users did not map the visualization to the right concept (e.g. confused a "goal" for an "event", both otherwise being clearly understood concepts), or they did so correctly but did not understand the concept as the language designers intended them to (e.g., they correctly mapped a symbol to an "upper-goal" but didn't know what to do with the latter). While training may arguably establish correct bridging between visualization and inference in the long term, intuitiveness is exhibited when limited such training is necessary.

In the context of contribution links in goal models, the inference we are interested in is how users assign satisfaction to goals given the satisfaction of other goals, based on their own interpretation of what contribution labels seem to mean. Reversely, their observed inferences reveal their perceived meaning of the links, and, as such, the former can be used to develop empirical operationalizations of the latter. In the experiment we describe below, we ask the users to make decisions using goal models. To do so, they need to adopt a way of using the contribution link and, implicitly, a semantics for those links. The alignment of the semantics implied by how users use the models with the designed semantics (i.e. the semantics intended by the designers), as exhibited by whether the results of the inference match, is, we claim, a possible indication of the intuitiveness of the designed semantics.

III. EXPERIMENTAL DESIGN

A. Overview and Research Question

In the proposed study we pick two approaches for modeling and assigning meaning to contribution links and compare them with regards to measures of intuitiveness and efficiency. We specifically compare the symbolic against the numeric approach, the latter under the linear interpretation. There is one main research question we wish to address:

RQ. Which of the two methods for modeling contribution links is the most (a) intuitive and (b) efficient for the task of identifying optimal alternatives in goal models?

We address the above through a controlled experiment with human participants.

B. Experimental Tasks and Measurements

1) Measures: The two constructs we are considering are intuitiveness and efficiency. We theoretically defined intuitiveness as the degree by which untrained users can make accurate inferences with models they are exposed to. Operationally, we will measure intuitiveness by exposing the experimental participants to a sample of goal models and asking them to perform an inference, which we then compare with the "correct" inference as dictated by the adopted contribution modeling approach. Perception of intuitiveness is also included as a possible measure, via self-reporting of participants' confidence about the aforementioned inferences they perform. Efficiency will be, in this context, measured as the total time it takes for participants to perform this inference, independent of correctness.
2) Experimental Units: We develop a number of goal models such as those in Figure 1. We specifically develop two (2) sets of models: qualitative, in which contribution labels are symbolic, and quantitative, where contribution labels are numeric following the "linear" semantics. All models contain one OR-decomposition of hard-goals (so one decision) together with a hierarchy of soft-goals that are used as criteria for choosing the optimal alternative within the decomposition. By having a unique root goal in the soft-goal hierarchy, the goal model implies that, generally, one of the depicted alternatives is optimal compared to the others.

To show how this is possible let us go back to Figure 1 and consider the decomposition Manually versus Automatically. We can assume that whenever we pick one of the alternatives the corresponding hard-goal is assigned maximum satisfaction and, if applicable, minimum denial value. Thus, to choose the alternative Manually we assign to it maximum satisfaction values {FS, ND} (qualitative case) or s(Manually) = 1 (quantitative case), and to all other alternatives (in our case only Automatically) values {NS, ND} or s(·) = 0. We then perform recursive bottom-up application of the propagation rules of Tables I and II (depending on case), in order to calculate the satisfaction of the root goal Overall Scheduling Quality. For the quantitative models specifically we follow the linear interpretation of the last row of Table II. Different choices of alternative will result in different satisfaction levels for the root goal. The alternative that results in the highest satisfaction value for the root goal is the optimal one.

In the quantitative case, satisfaction is a unique value and the comparison is straightforward. In the example of Figure 1 (model on the right), Manually causes satisfaction of Overall Scheduling Quality by approx. 0.6 compared to approx. 0.4 implied by selection of Automatically. Thus, Manually is the optimal alternative².

² To simulate the experience of our experimental participants, the reader can look at the diagram and verify if the assertion that Manually is optimal can be inferred intuitively, by roughly comparing the numbers and without performing precise calculations.

In the qualitative case, calculation is less straightforward in that there are two variables to consider, satisfaction and denial. To make different satisfaction levels comparable we aggregate the two values into one, the aggregated satisfaction value. To calculate the aggregated satisfaction values, we firstly associate the qualitative satisfaction labels {N, P, F} with numeric values 0, 1, 2, respectively. We denote the resulting numeric satisfaction and denial of a goal g as sat(g) and den(g), respectively. The aggregated satisfaction value is then sat(g) − den(g), which results in an integer in [-2,2]. Thus, the aggregated satisfaction value of a goal g1 with {PS, FD} is sat(g1) − den(g1) = 1 − 2 = −1 and of a goal g2 with {FS, ND}, sat(g2) − den(g2) = 2 − 0 = 2. For the qualitative model on the left of Figure 1, it can be verified that Overall Scheduling Quality is {PS, PD} for Manually and also {PS, PD} for Automatically. Hence, both alternatives lead to the same aggregated satisfaction value for the root goal, that is 0, and as such they are equally optimal.
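The procedure just described can be summarized in a short sketch. It is our own illustration of the bottom-up evaluation and of the aggregated satisfaction value; the function and parameter names are assumptions made for the example.

```python
# A sketch of identifying the optimal alternative in the quantitative (linear) case:
# set the chosen alternative's satisfaction to 1 (all others to 0), propagate
# bottom-up, and compare the root soft-goal's satisfaction across alternatives.
def root_satisfaction(alternative, alternatives, order, incoming):
    """
    order    : soft-goals in bottom-up (topological) order, root soft-goal last
    incoming : goal -> list of (origin goal, weight) contribution links
    """
    s = {a: (1.0 if a == alternative else 0.0) for a in alternatives}
    for g in order:
        s[g] = sum(s[origin] * w for origin, w in incoming[g])   # "Linear" row of Table II
    return s[order[-1]]

# Usage: best = max(alternatives, key=lambda a: root_satisfaction(a, alternatives, order, incoming))

def aggregated_satisfaction(sat_label: str, den_label: str) -> int:
    """Qualitative case: map {N, P, F} to 0, 1, 2 and take sat(g) - den(g), an integer in [-2, 2]."""
    value = {"N": 0, "P": 1, "F": 2}
    return value[sat_label] - value[den_label]

# E.g. a goal at {PS,FD} aggregates to 1 - 2 = -1, and one at {FS,ND} to 2 - 0 = 2.
assert aggregated_satisfaction("P", "F") == -1 and aggregated_satisfaction("F", "N") == 2
```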
hence aggregated value 1−1 = 0, qualifies for inclusion to our For each domain we develop two (2) structures (one with sample as the distance of the two top alternatives is 2. A goal two and one with three alternatives) and for each structure model, on the other hand, in which the top two alternatives we sample two (2) labels-sets (i.e., sets of labels for the are both {FS, PD} have both an aggregated value of 1 and contributions) sampled as described above. To produce quali- hence distance of zero; so they do not qualify. tative counterparts we simply copy the twelve (12) quantitative In quantitative models we also randomly sample while structures and replace the numbers with randomly sampled ensuring that the first alternative has a distance of 0.4 from the symbolic labels – again, as described above. second; again, in terms of the satisfaction they imply for the We then present the resulting twelve (12) models of each root goal. For example a set of weights that gives satisfaction type (qualitative and quantitative) to the participants one after value 0.7 to the first alternative and 0.3 to the second qualifies the other asking each time what they believe the optimal for inclusion to our sample. The model of Figure 1 (right), alternative is. Domains are presented in random order and focussing on the Choose Schedule decision, does not qualify models within the domains in random order as well. Three as the distance is 0.6 − 0.4 = 0.2 video presentations precede these tasks: one describes decision The choice of 0.4 is made to match the corresponding choice problems in general, another introduces goal models and a in the qualitative models. Observe that in qualitative models third one introduces the three domains. The second video the maximum distance between alternatives is 4 ({FS, ND} specifically, describes the intuition behind the contribution versus {NS, FD} so 2 - (-2)). The distance we demand is 2, links of each type carefully without getting into the mechanics thus half of this space. Respectively in the quantitative models of satisfaction propagation. The videos are scripted and are the maximum theoretical distance is 1.0, so half the space the same in the two cases (qualitative and quantitative) except would be 0.5. However we end-up to 0.4 – biasing slightly obviously for the places where the numbers or symbols are against numeric models – as for some of our structures there presented. does not seem to exist combinations of numeric labels that The videos are chosen as the instruction method for three yield a distance of exactly 0.5. reasons (a) allow for repeatability of the procedure, (b) control To remain consistent with the claim that linear interpretation for biases in training, and (c) allow for remote administration is chosen due to the systematic elicitation approach that is or administration by non-experts. afforded by it, namely AHP, all numeric sampling is done A simple demographics questionnaire (age, sex, education, through simulated AHP pair comparison processes and subse- prior knowledge of goal models) precedes the main test. quent profile calculations, such that the consistency ratio (CR) Participants are unlikely to be familiar with goal models, and is less than 0.1. input coming from those who actually are will be discarded. 
4) Instrument and Tasks: Using the sampling procedure described above we develop a total of twelve (12) quantitative models. The goal structures refer to three (3) different domains describing intentional structures in the context of decisions: Choosing an Apartment, Choosing a Course, and Choosing a Means of Transportation. We develop the models based on specific domains, rather than using dummy names (A, B, C etc.), for the purpose of making the tasks more realistic. This introduces the threat that participants may use their own opinion of how goals are related to each other, ignoring the information provided in the contribution link. To avoid this bias, participants are told that the structures represent decision problems of a third party and that their task is to help that party make the decision based on the priorities of that party, as these priorities are represented in the goal structure.

For each domain we develop two (2) structures (one with two and one with three alternatives) and for each structure we sample two (2) label-sets (i.e., sets of labels for the contributions), sampled as described above. To produce qualitative counterparts we simply copy the twelve (12) quantitative structures and replace the numbers with randomly sampled symbolic labels – again, as described above.

We then present the resulting twelve (12) models of each type (qualitative and quantitative) to the participants one after the other, asking each time what they believe the optimal alternative is. Domains are presented in random order and models within the domains in random order as well. Three video presentations precede these tasks: one describes decision problems in general, another introduces goal models and a third one introduces the three domains. The second video, specifically, describes the intuition behind the contribution links of each type carefully, without getting into the mechanics of satisfaction propagation. The videos are scripted and are the same in the two cases (qualitative and quantitative) except, obviously, for the places where the numbers or symbols are presented.

The videos are chosen as the instruction method for three reasons: (a) they allow for repeatability of the procedure, (b) they control for biases in training, and (c) they allow for remote administration or administration by non-experts.

A simple demographics questionnaire (age, sex, education, prior knowledge of goal models) precedes the main test. Participants are unlikely to be familiar with goal models, and input coming from those who actually are will be discarded. However, if familiarity with goal models turns out to be more prevalent, treating familiarity as a covariate is another option.

5) Participant Sample: We plan to consider the University student pool as the population to opportunistically sample from, specifically intermediate/senior students from various disciplines. We claim that this does not harm the generalizability of the particular study. Firstly, having a valid noteworthy intuition about how the particular conceptual modeling construct works does not seem to require experience and skill in any specific field: goal models refer to concepts (goals and their fulfillment) that should be accessible to anyone who has successfully entered post-secondary education – compared to, for instance, component diagrams describing software designs. Secondly, it seems to be the implicit ambition of goal modeling language designers that goal models are artifacts that not only analysts but also stakeholders are able to comprehend and use to their benefit [17]. If this is the case, then the population we should be drawing participants from is, roughly speaking, the population of all people who might serve as decision making stakeholders in a systems development project. While there is no authoritative data about the characteristics of this population, we believe that the breadth of educational and skill profiles in it can be credibly approximated by a sample of intermediate/senior University undergraduate students.
6) Variables and Analysis Approach: It becomes obvious from the above that the experiment is a simple comparison between two levels (qualitative vs. quantitative) of one independent variable/factor (contribution link representation method), arranged in a between-subjects fashion. Dependent variables are the accuracy, measured as the number of correct responses per participant, hence a number in [0,12], as well as the response time, which is the average time participants need in order to provide a response.

It is also possible to measure confidence in each participant's response as a measure of perceived intuitiveness. In earlier studies [18], [19], we augmented each exercise with a 5-level Likert-style question "how confident are you of your answer above", with possible answers Very Unconfident, Unconfident, Neutral, etc. The higher the confidence the higher the perceived intuitiveness, i.e., how intuitive the participants think the representation is. However, this additional question increases experimental time and fatigue. Addition of this variable would depend on our ability to keep the instrument short, i.e., around 30 to 40 minutes.

Simple comparisons between means appear to be sufficient as a statistical procedure, with the expected deviation from normality kept in view – the scale [0,12] is particularly inviting for ceiling effects.
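For concreteness, a comparison of this kind could be run as in the sketch below, pairing a means comparison with a non-parametric check because of the bounded accuracy scale. This is our illustration of the intended analysis, not an analysis script from the study; the library choice (scipy) is an assumption and the data shown are invented.

```python
# A sketch of comparing accuracy between the two groups (symbolic vs. numeric):
# Welch's t-test for the means comparison, plus a Mann-Whitney U test as a
# robustness check given the bounded [0,12] scale and possible ceiling effects.
from scipy import stats

def compare_groups(acc_symbolic, acc_numeric):
    t, p_t = stats.ttest_ind(acc_symbolic, acc_numeric, equal_var=False)   # Welch's t-test
    u, p_u = stats.mannwhitneyu(acc_symbolic, acc_numeric, alternative="two-sided")
    return {"welch_t": (t, p_t), "mann_whitney": (u, p_u)}

# Example with made-up accuracy scores (number of correct answers out of 12):
print(compare_groups([9, 10, 12, 8, 11], [7, 8, 10, 6, 9]))
```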
IV. SUMMARY AND RELATED WORK

We presented an experimental design for comparing the intuitiveness of symbolic versus numeric contribution links in goal models. We use intuitiveness as our main comparison construct, defined as the ability of novice users of the notation to correctly understand how they can use it. We operationalize intuitiveness by measuring agreement between authoritative inferences and inferences participants make, as well as the time it takes for the latter to take place. We also include the option of measuring perceived intuitiveness via self-reporting of the confidence of participants in their inferences. Our design relies on random sampling of a number of goal structures depicting a decision problem and asking participants to choose the optimal of the available choices, thereby making intuitive inferences about contribution links. The decision problems are carefully sampled to allow for a controlled distance between the optimal and second optimal choice.

Empirically evaluating the effectiveness of diagrammatic notations has been widely studied in the literature. Much of the research in the field has been dedicated towards understanding the comprehensibility of (various aspects of) UML and ER diagrams – e.g., [20]–[24] – or process models [25]–[28]. Although understandability is a popular construct of study, it has been argued that there is little agreement on how it is to be measured. Indeed, in their survey, Houy et al. [29] find variability in how understandability is operationalized in the literature. The concept of intuitiveness, as a specialization of understandability, is less frequently focused on explicitly, as in the work by Jošt et al., for example, where the intuitive understandability of various methods for modeling processes is empirically compared [30].

Work that relates to understanding the comprehensibility of goal models specifically is more limited. Horkoff et al. evaluate an interactive evaluation technique for goal models [31]. The way various concepts within goal models are visualized has also been the matter of investigation and empirical evaluation. Moody et al. offer an assessment of the i* visual syntax based on established rules ("Physics of Notations") [32]. An empirical analysis was followed by Caire et al. [33], in which experimental participants evaluate visualization choices of the language's primitives. Elsewhere, Hadar et al. [34] compare goal diagrams with use case diagrams on a variety of user tasks. Measures include text-model mapping, model reading (extracting information from the model), and model modification (performing targeted modifications to models). Carvallo and Franch have also studied, in the context of a case study, how non-technical stakeholders performed in developing strategic dependency i* diagrams [17].

Compared to the above, our work is more targeted to a specific construct of goal models, that is, contribution links. In earlier work on the subject [18] we set out to investigate the intuitiveness of the rules in Table I. In that experiment, we presented to experimental participants a series of contribution links, each connecting two goals in which the satisfaction value of the origin is known. As we also propose here, we operationalized intuitiveness by asking participants what their "hunch" is with respect to the satisfaction of the destination goal and comparing their input to the authoritative one of Table I. Among our findings were that rules involving positive labels and goal satisfaction are more intuitive than ones with negative labels and goal denial.

We also endeavored to compare the quantitative rules of Table II [35]. In that work we simply presented to participants hierarchies of soft-goals with known satisfaction values at the leaf level and asked them to choose the satisfaction of the root goal from a set of four values, each representing one of the possibilities of Table II. We found that the serial-parallel method was not preferred, while the most preferred one depended on whether the contribution weights added up to 1.0, in which case a linear interpretation was evoked. In general, our fundamental null hypothesis that the answers would be uniformly random was rejected, indicating that more research should be done on the matter.

Finally, in a different effort [19] and in a vein somewhat similar to that of Caire et al. [33], we focused on visualizations of contribution measures that are alternative to the diagrammatic one. We specifically employed bar-charts, pie-charts and tree-maps to represent quantitative goal diagrams such as those of Figure 1 – following again the "linear" interpretation. Exactly as we propose here, we presented users with decision problems and asked them to pick the optimal alternative using each of the visualizations under comparison. We found that the combination of pie-charts and bar-charts leads to more accurate identification of the optimal alternative and that diagrams were not better in any of the tests or measures.

The difference between the above effort [19] and the current work is that, while in that paper the semantics are assumed and the visualization is in question, in the study proposed here both the visualization and its meaning are under comparison. The result can thus be interpretable at either level. For the future, we are interested in exploring theoretical and methodological approaches through which these two aspects can be separately evaluated.
The endeavour is not a simple one, as understanding of any communication of a concept can be argued to be affected by the way it is communicated – through words, visualizations or other methods. Thus it may prove difficult to measure comprehension of a concept as a "pure" abstraction. Such a problematic demonstrates how empirical investigation, even at the conceptualization stage, forces us to think more deeply about the substance of the process of conceptual modeling and the nature of its artifacts.

REFERENCES

[1] E. S. K. Yu, "Towards Modelling and Reasoning Support for Early-Phase Requirements Engineering," in Proceedings of the 3rd IEEE International Symposium on Requirements Engineering (RE'97), Annapolis, MD, 1997, pp. 226–235.
[2] D. Amyot and G. Mussbacher, "User Requirements Notation: The First Ten Years, The Next Ten Years (Invited Paper)," Journal of Software (JSW), vol. 6, no. 5, pp. 747–768, 2011.
[3] A. Dardenne, A. van Lamsweerde, and S. Fickas, "Goal-Directed Requirements Acquisition," Science of Computer Programming, vol. 20, no. 1-2, pp. 3–50, 1993.
[4] J. Mylopoulos, L. Chung, S. Liao, H. Wang, and E. Yu, "Exploring Alternatives During Requirements Analysis," IEEE Software, vol. 18, no. 1, pp. 92–96, 2001.
[5] S. Liaskos, S. M. Khan, M. Soutchanski, and J. Mylopoulos, "Modeling and Reasoning with Decision-Theoretic Goals," in Proceedings of the 32nd International Conference on Conceptual Modeling (ER'13), Hong Kong, China, 2013, pp. 19–32.
[6] S. Liaskos, S. McIlraith, S. Sohrabi, and J. Mylopoulos, "Representing and reasoning about preferences in requirements engineering," Requirements Engineering Journal (REJ), vol. 16, no. 3, pp. 227–249, 2011.
[7] D. Amyot, S. Ghanavati, J. Horkoff, G. Mussbacher, L. Peyton, and E. S. K. Yu, "Evaluating goal models within the goal-oriented requirement language," International Journal of Intelligent Systems, vol. 25, no. 8, pp. 841–877, 2010.
[8] P. Giorgini, J. Mylopoulos, E. Nicchiarelli, and R. Sebastiani, "Reasoning with Goal Models," in Proceedings of the 21st International Conference on Conceptual Modeling (ER'02), London, UK, 2002, pp. 167–181.
[9] J. Horkoff and E. Yu, "Comparison and evaluation of goal-oriented satisfaction analysis techniques," Requirements Engineering (REJ), pp. 1–24, 2011.
[10] E. S. Yu, "GRL - Goal-oriented Requirement Language." [Online]. Available: http://www.cs.toronto.edu/km/GRL/
[11] P. Giorgini, J. Mylopoulos, E. Nicchiarelli, and R. Sebastiani, Formal Reasoning Techniques for Goal Models. Springer Berlin Heidelberg, 2003, pp. 1–20. [Online]. Available: https://doi.org/10.1007/978-3-540-39733-5_1
[12] S. Liaskos, R. Jalman, and J. Aranda, "On Eliciting Preference and Contribution Measures in Goal Models," in Proceedings of the 20th International Requirements Engineering Conference (RE'12), Chicago, IL, 2012, pp. 221–230.
[13] T. L. Saaty, "Decision making with the analytic hierarchy process," International Journal of Services Sciences (IJSSCI), vol. 1, no. 1, pp. 83–98, 2008.
[14] Y. Rogers, H. Sharp, and J. Preece, Interaction Design: beyond human-computer interaction. Wiley, 2011.
[15] J. Nielsen, "Ten Usability Heuristics." [Online]. Available: https://tfa.stanford.edu/download/TenUsabilityHeuristics.pdf
[16] J. Mylopoulos, "Conceptual Modeling and Telos," in Conceptual Modelling, Databases and CASE: An Integrated View of Information Systems Development. Wiley, 1992.
[17] J. P. Carvallo and X. Franch, "An empirical study on the use of i* by non-technical stakeholders: the case of strategic dependency diagrams," Requirements Engineering (REJ), pp. 1–27, 2018.
[18] S. Liaskos, A. Ronse, and M. Zhian, "Assessing the Intuitiveness of Qualitative Contribution Relationships in Goal Models: an Exploratory Experiment," in Proceedings of the 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM'17), 2017, pp. 466–471. [Online]. Available: http://www.yorku.ca/liaskos/Docs/ESEM17.pdf
[19] S. Liaskos, T. Dundjerovic, and G. Gabriel, "Comparing Alternative Goal Model Visualizations for Decision Making: an Exploratory Experiment," in Proceedings of the 33rd Annual ACM Symposium on Applied Computing (SAC'18), Pau, France, 2018, pp. 1272–1281. [Online]. Available: http://www.yorku.ca/liaskos/Papers/SAC2018/Visualizations/SAC2018.pdf
[20] J. A. Cruz-Lemus, M. Genero, M. E. Manso, S. Morasca, and M. Piattini, "Assessing the understandability of UML statechart diagrams with composite states—A family of empirical studies," Empirical Software Engineering, vol. 14, no. 6, pp. 685–719, 2009.
[21] H. C. Purchase, R. Welland, M. McGill, and L. Colpoys, "Comprehension of diagram syntax: an empirical study of entity relationship notations," International Journal of Human-Computer Studies, vol. 61, no. 2, pp. 187–203, 2004.
[22] P. Shoval and I. Frumermann, "OO and EER Conceptual Schemas: A Comparison of User Comprehension," Journal of Database Management (JDM), vol. 5, no. 4, pp. 28–38, 1994.
[23] A. De Lucia, C. Gravino, R. Oliveto, and G. Tortora, "Data model comprehension: an empirical comparison of ER and UML class diagrams," in Proceedings of the 16th IEEE International Conference on Program Comprehension (ICPC 2008), 2008, pp. 93–102.
[24] M. Genero, G. Poels, and M. Piattini, "Defining and validating metrics for assessing the understandability of entity-relationship diagrams," Data and Knowledge Engineering, vol. 64, no. 3, pp. 534–557, 2008.
[25] D. Q. Birkmeier, S. Klockner, and S. Overhage, "An Empirical Comparison of the Usability of BPMN and UML Activity Diagrams for Business Users," in Proceedings of the 18th European Conference on Information Systems (ECIS'10), 2010, pp. 51–62.
[26] K. Figl, J. Recker, and J. Mendling, "A study on the effects of routing symbol design on process model comprehension," Decision Support Systems, vol. 54, no. 2, pp. 1104–1118, 2013.
[27] K. Figl and R. Laue, "Cognitive Complexity in Business Process Modeling," in Proceedings of the 23rd International Conference on Advanced Information Systems Engineering (CAiSE 2011), London, UK, 2011, pp. 452–466.
[28] J. Mendling and M. Strembeck, "Influence Factors of Understanding Business Process Models," in Proceedings of the 11th International Conference on Business Information Systems, 2008, pp. 142–153.
[29] C. Houy, P. Fettke, and P. Loos, "Understanding understandability of conceptual models – What are we actually talking about?" in Proceedings of the 31st International Conference on Conceptual Modeling (ER 2012), LNCS 7532, 2012, pp. 64–77.
[30] G. Jošt, J. Huber, M. Heričko, and G. Polančič, "An empirical investigation of intuitive understandability of process diagrams," Computer Standards and Interfaces, vol. 48, pp. 90–111, 2016.
[31] J. Horkoff and E. S. K. Yu, "Interactive goal model analysis for early requirements engineering," Requirements Engineering, vol. 21, no. 1, pp. 29–61, 2016.
[32] D. L. Moody, P. Heymans, and R. Matulevičius, "Visual syntax does matter: improving the cognitive effectiveness of the i* visual notation," Requirements Engineering, vol. 15, no. 2, pp. 141–175, 2010.
[33] P. Caire, N. Genon, P. Heymans, and D. L. Moody, "Visual notation design 2.0: Towards user comprehensible requirements engineering notations," in Proceedings of the 21st IEEE International Requirements Engineering Conference (RE'13), Jul. 2013, pp. 115–124.
[34] I. Hadar, I. Reinhartz-Berger, T. Kuflik, A. Perini, F. Ricca, and A. Susi, "Comparing the comprehensibility of requirements models expressed in Use Case and Tropos: Results from a family of experiments," Information and Software Technology, vol. 55, no. 10, pp. 1823–1843, 2013.
[35] N. Alothman, M. Zhian, and S. Liaskos, "User Perception of Numeric Contribution Semantics for Goal Models: an Exploratory Experiment," in Proceedings of the 36th International Conference on Conceptual Modeling (ER'17), 2017, pp. 451–465. [Online]. Available: http://www.yorku.ca/liaskos/Docs/ER17.pdf