Comparing the comprehensibility of numeric versus symbolic contribution labels in goal models: an experimental design

Sotirios Liaskos, School of Information Technology, York University, Toronto, Canada, liaskos@yorku.ca
Wisal Tambosi, School of Information Technology, York University, Toronto, Canada, w.tambosi@gmail.com

Abstract—Goal models have been suggested to be an effective way to support decision making in early requirements engineering. Such models are capable of representing a large number of alternative ways to solve stakeholder problems and comparing them against each other with respect to higher level objectives. Core to the realization of such analysis is the concept of the contribution link that represents how satisfaction of one goal affects satisfaction of another. Many ways for representing and assigning precise meaning to contribution links have been proposed, each with different properties and advantages. But which one agrees more with user preferences on how such links should be used? In this paper, we present an experimental design for comparing two ways for representing contribution links, symbolic versus numeric, with respect to how accurately and quickly users identify optimal decisions using each representation format. Apart from comparing the two representation techniques and advising the modeling practice accordingly, the study aims at showing how a quality construct we call intuitiveness can be added to the range of criteria a modeling language designer has at her disposal for evaluating her language design decisions.

I. INTRODUCTION

Goal models [1]–[3] have long been proposed as an effective means for representing intentional structures and their relationship to decision problems in early requirements engineering [4]–[6]. Using such models, business analysts can capture the variety of ways by which stakeholders can solve their business problems and compare them with one another with respect to set criteria.

Many representational and semantic frameworks have been proposed within the goal modeling community to allow such analysis [5]–[8] ([9] for a survey). One of the fundamental constituents of goal models that allow such analysis is the concept of the contribution link, which is a representation of a relationship between two goals signifying how satisfaction of one affects the satisfaction of the other. Different goal modeling and analysis frameworks propose different ways to visually represent and assign meaning to the contribution concept. The traditional/de-facto representation choice is qualitative (symbolic) labels signifying the quality of contribution (positive or negative) and crudely characterizing the size of the contribution. However, the use of quantitative (numeric) values has also been proposed, whereby, e.g., sign and absolute value are used to represent quality and size of contribution.

These representational options have been studied from a theoretical point of view and different formal semantics have been proposed, each showing how the representations allow inference of satisfaction status of one goal from that of other goals.

However, limited work has been done in terms of how users of the models perceive what the symbols and/or numbers mean and how they expect to use them in order to make inferences pertinent to decision making. It is particularly useful to understand how users intuitively assign meaning to signifiers within the language, when no prior training and/or experience with the language can be assumed for them. Knowing what the untrained user's intuition is, language designers can settle for representations and semantics that are closer to the user's expectations and, as such, easier to learn and more accurate to use.

In this paper we present an experimental design aimed at comparing the intuitiveness of qualitative versus quantitative contribution labels in goal models, having assumed specific semantics for each. Our design aims at showing which of the two visualization-meaning pairs leads to more accurate decisions in the least amount of time.

The rest of the paper is organized as follows. In Section II we offer some background on goal models, contribution links and their semantics.
In Section III we describe the proposed experimental design and in Section IV we summarize and review some of the related work.

II. BACKGROUND

A. Goal Models and Contribution Links

The goal models we consider in this study look like the ones in Figure 1. The nodes (ovals and clouds) are goals that describe states of the world that the actor in question (circular shape) has within their scope (large shaded dashed circle) and wants to achieve or maintain. The ovals describe hard-goals, which are goals that come with a clear way to decide when they are satisfied, while soft-goals (the clouds) are goals for which this is not the case.

Fig. 1. Goal models with symbolic (left) and numeric (right) contribution links.

Goal modeling languages define a variety of relationships between goals and allow for great structural freedom [10]. However, in our study we restrict our focus to goal models that have specific structural characteristics. Thus, through means-ends and decomposition links, hard-goals form an AND/OR decomposition tree whose solutions describe alternative ways by which the root hard-goal can be satisfied. Soft-goals, on the other hand, form their own hierarchy using contribution links, the curved directed lines. Similar lines connect some hard-goals with some soft-goals.
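To make the structural restrictions concrete, the following is a minimal sketch of how such a goal structure could be encoded. It is our own illustration rather than part of the proposed instrument, and all type and field names (Goal, Contribution, GoalModel, root_softgoal, etc.) are assumptions made for the example.

```python
# A minimal sketch (ours, not the paper's tooling) of the restricted goal
# structures described above: one OR-decomposed hard-goal (the decision),
# a soft-goal hierarchy, and labelled contribution links between them.
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Goal:
    name: str
    hard: bool = False               # True: hard-goal (oval); False: soft-goal (cloud)

@dataclass
class Contribution:
    origin: str                      # name of the origin goal
    destination: str                 # name of the destination goal
    label: Union[str, float]         # "++", "+", "-", "--" or a numeric weight in [0.0, 1.0]

@dataclass
class GoalModel:
    goals: Dict[str, Goal]
    alternatives: List[str]          # the hard-goals of the single OR-decomposition
    contributions: List[Contribution]
    root_softgoal: str               # the unique root of the soft-goal hierarchy
```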
A contribution link shows in what way satisfaction (or not) of the origin of the link affects satisfaction (or not) of the destination of the link. This way of affecting the other goal is described through the label of the contribution link. Typically the label will show whether the effect is positive or negative and/or how large it is. Nevertheless, there is more than one way to represent contribution labels and, for each, multiple ways to define their semantics.

The original and seemingly most popular approach to modeling contribution labels is through symbols (diagram on the left in Figure 1). Thus "+", "++", "−" and "−−" denote respectively positive ("helps"), very positive ("makes"), negative ("hurts") and very negative ("breaks") contribution. Alternatively, numbers can be used to convey this information (diagram on the right in Figure 1). Two distinct numeric approaches have been introduced in the literature. The approach by Giorgini et al. [8], [11] assigns a number in the real interval [0.0,1.0] to represent size of contribution and a sign to represent positive or negative contribution¹. The AHP-inspired "linear" interpretation [12], also adopted by URN [7], simply assigns a number in the real interval [0.0,1.0] denoting the share of contribution of the origin goal to the destination goal.

¹ Giorgini et al.'s expressive framework also includes a subscript representing what is being contributed, between satisfaction and denial; both their quantitative and qualitative versions include this dimension. Presentation of this dimension is outside our scope.

B. Contribution Semantics

Informal descriptions such as the above about the meaning of the contribution link allow a model reader/user (henceforth simply user) to perform some very basic inferences by looking at the model. For example, she can compare two contributions with respect to which one is larger or she can even choose between alternatives in the hard-goal decomposition with respect to a soft-goal of interest. For example, in the symbolic model on the left side of Figure 1, if to Reduce Scheduling Effort is an important soft-goal, then we know that (Choose Schedule) Automatically is preferable to doing so Manually, by simply looking at the contribution labels and without knowing precisely what they mean. However, more detailed semantics need to be given in order to perform more complex inferences such as deciding on the satisfaction status of a goal that receives multiple incoming contribution links or, as we will see below, deciding the optimal alternative by considering all contribution links in the structure.

Giorgini et al. have developed the most expressive semantics for both symbolic and numeric links [8], [11]. According to their framework each goal in the diagram can be associated with two variables: one that measures satisfaction and one that measures denial. In the qualitative (symbolic) framework each of these variables can take one of three values: Full Evidence (denoted with prefix F), Partial Evidence (P) and No Evidence (N) – of, respectively, satisfaction (suffix S) or denial (D). For example, for a goal we may have partial evidence of satisfaction and no evidence of denial (denoted {PS,ND}) and, for another, full evidence of satisfaction and partial evidence of denial ({FS,PD}); the inconsistency is perfectly acceptable and the framework's ability to represent it is one of its strengths. A set of rules, seen in Table I, combine the satisfaction and denial values of the origin goal with the contribution label to decide the satisfaction and denial values of the destination. Returning to Figure 1 (qualitative model on the left), if we know that the satisfaction and denial values of Minimal Conflicts are {FS,PD} then, based on the rules of Table I, Quality of Schedule must be {PS,PD} – assuming no other influence.

TABLE I
SYMBOLIC CONTRIBUTION SEMANTICS

Label  Effect                              Label  Effect
++     FS→FS, PS→PS, PD→PD, FD→FD          −−     FS→FD, PS→PD, PD→PS, FD→FS
+      FS→PS, PS→PS, PD→PD, FD→PD          −      FS→PD, PS→PD, PD→PS, FD→PS
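As an illustration of how the rules of Table I operate, the sketch below encodes them as a simple lookup over the evidence levels. It is our own reading of the table rather than code from any of the cited frameworks, and the function and variable names are assumptions.

```python
# A sketch (assumed encoding, not from the paper) of the Table I rules.
# Evidence levels are ordered N < P < F; a goal's value is a (satisfaction, denial) pair.
from typing import Tuple

LEVELS = {"N": 0, "P": 1, "F": 2}

def cap_partial(level: str) -> str:
    """Limit evidence to at most Partial (used by the single '+'/'-' labels)."""
    return "P" if LEVELS[level] > LEVELS["P"] else level

def propagate_symbolic(label: str, sat: str, den: str) -> Tuple[str, str]:
    """Return the (satisfaction, denial) evidence contributed to the destination goal."""
    if label == "++":
        return sat, den                              # e.g. {FS,PD} --++--> {FS,PD}
    if label == "+":
        return cap_partial(sat), cap_partial(den)    # e.g. {FS,PD} --+--> {PS,PD}
    if label == "--":
        return den, sat                              # satisfaction and denial are swapped
    if label == "-":
        return cap_partial(den), cap_partial(sat)
    raise ValueError(f"unknown label {label}")

# The running example: Minimal Conflicts is {FS,PD}; through a '+' link,
# Quality of Schedule receives {PS,PD}, assuming no other influence.
assert propagate_symbolic("+", "F", "P") == ("P", "P")
```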
In the quantitative (numeric) framework the rules are replaced by algebraic formulae. The researchers allude to three possible ways by which this formula can be structured, seen in the top three rows of Table II; in practice their framework is open to the adoption of many other ways. Given a set of goals g' ∈ O_g, each with satisfaction value s(g') ∈ [0.0, 1.0], targeting goal g with contribution links weighted as w(g', g), the satisfaction value of goal g is expected to be s(g) as defined in each of the formulae. In all the proposed formulae ("Bayesian", "Min-Max" and "Serial-Parallel") aggregation is implemented through maximization. Note that in this semantic framework, users are supposed to understand the numbers of the contribution links as absolute contribution values, potentially elicited and understood in isolation from the other ones.

A different interpretation of numeric contributions, which is of particular interest here, is the de-facto approach followed by URN [7], which has been studied by Liaskos et al. [12]. According to that interpretation, a unique numeric satisfaction value is assigned to each goal with values in the real interval [0.0,1.0] – so no distinct satisfaction and denial values. Then, the number on the contribution link denotes the share of contribution of the satisfaction of the origin goal to the satisfaction of the destination goal. This implies also a different formula for satisfaction propagation, the last one in Table II; the formula is labeled "Linear" for it calculates the satisfaction of the destination goal through linearly combining the satisfaction value of each goal that influences it, using the numbers on the contribution links as weights for the linear combination.

TABLE II
NUMERIC CONTRIBUTION SEMANTICS

Bayesian:        $s(g) = \max_{g' \in O_g} \{ s(g') \times w(g', g) \}$
Min-max:         $s(g) = \max_{g' \in O_g} \{ \min(s(g'), w(g', g)) \}$
Serial-parallel: $s(g) = \max_{g' \in O_g} \left\{ \frac{s(g') \times w(g', g)}{s(g') + w(g', g)} \right\}$
Linear:          $s(g) = \sum_{g' \in O_g} s(g') \times w(g', g)$
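A minimal executable rendering of the four rows of Table II follows, assuming each origin goal is given as a (satisfaction, weight) pair; the function names are ours and the formulae are transcribed directly from the table.

```python
# A sketch of the four propagation formulae of Table II (our reading of the
# notation; O_g is the set of goals contributing to g, w their link weights).
def bayesian(contribs):        # contribs: list of (s(g'), w(g', g)) pairs
    return max(s * w for s, w in contribs)

def min_max(contribs):
    return max(min(s, w) for s, w in contribs)

def serial_parallel(contribs):
    return max((s * w) / (s + w) for s, w in contribs if s + w > 0)

def linear(contribs):
    # The interpretation adopted in this study: weights are shares of contribution.
    return sum(s * w for s, w in contribs)

# Example: two origins with satisfaction 1.0 and 0.5, link weights 0.7 and 0.3.
incoming = [(1.0, 0.7), (0.5, 0.3)]
print(linear(incoming))   # 0.85
```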
While the linear interpretation is arguably less expressive and imposes structural limitations on the models (the soft-goal sub-graph must be acyclic), it has been found [12] to be amenable to systematic elicitation through an established decision making technique, the Analytic Hierarchy Process (AHP) [13]. Following AHP, contribution values are not assigned directly but through pairwise comparisons, followed by transformation of the output of these comparisons into the final values, controlling also for the consistency of the input via calculation of a Consistency Ratio (CR). Given this promise of the linear interpretation for practical use, we adopt it as the quantitative interpretation of choice in the study we propose here.
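For readers unfamiliar with the AHP mechanics referred to above, the sketch below shows one standard way (the eigenvector method with Saaty's random indices, implemented with numpy) to turn a pairwise comparison matrix into weights and a Consistency Ratio. It illustrates the general technique rather than the specific elicitation tooling of [12] or [13], and all names in it are ours.

```python
# A sketch of standard AHP mechanics (not code from the paper): derive weights
# from a pairwise comparison matrix and check its Consistency Ratio (CR),
# with CR < 0.1 being the usual acceptability threshold.
import numpy as np

RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}  # Saaty's random indices

def ahp_weights(A):
    """Return the priority (weight) vector and the Consistency Ratio of matrix A."""
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                             # normalized priority vector
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1)                # consistency index
    cr = ci / RI[n] if RI[n] > 0 else 0.0       # consistency ratio
    return w, cr

# Example: "criterion 1 is 3x as important as criterion 2, 5x as criterion 3, ..."
A = np.array([[1, 3, 5],
              [1/3, 1, 2],
              [1/5, 1/2, 1]])
w, cr = ahp_weights(A)
print(w.round(3), round(cr, 3))   # weights sum to 1.0; CR well below 0.1
```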
C. (A case for) the Intuitiveness Construct

Given the above options for visually representing and understanding the use of contribution labels for inferring satisfaction propagation, it is natural to ask which one is more "friendly" to users of the models. One aspect of "friendliness" is the level by which the intended meaning and use of the contribution aligns with the users' intuition.

We use the (working) theoretical construct "intuitiveness" of a model construct to describe the ability of untrained users of a conceptual model to readily understand what the construct means and how it should be used to make inferences in the model. The concept is analogous to the idea of an intuitive human-machine interface: the more intuitive an interface is, the more readily first-time users can use it without the need to resort to help, a manual, etc. The term is akin to that of learnability, which is a quality of an interface that allows users to learn how to use it easily and quickly [14]. One can think of intuitiveness as a facilitator of learnability. Design principles such as consistency and compliance to standards [15] are understood here to facilitate intuitiveness: users will likely find intuitive a user interface that uses conventions with which the user is already familiar.

With this user-machine interface analogy in mind, we can reasonably claim that conceptual models are also artifacts to be efficiently used by people, where "use" here is "understanding and communication" [16]. Further, as design artifacts themselves, modeling languages are results of design decisions at two levels: at the level of the concepts they consider (e.g., hard-goals and soft-goals) and at the level of the visualization of those concepts (e.g., ovals and clouds). It appears that there might be better and worse decisions at each of those levels. For example, would we instead of ovals and clouds use animal pictures (e.g. elephants and dolphins) to represent hard-goals and soft-goals? Likewise, are the concepts "upper-goal" and "lower-goal" more successful choices for representing human intention than the currently used concepts "hard-goal" and "soft-goal"?

Intuitiveness, as we conceptualize and apply it here, measures the entire package of a concept and its visualization: the visualization evokes a meaning, which, in turn, is used to make inferences. When a user is exposed to a visualization and ends up performing an inference that is not intended by the designers, a sub-optimal decision may be claimed at either of the levels: either the users did not map the visualization to the right concept (e.g. confused a "goal" for an "event", both otherwise being clearly understood concepts), or they did so correctly but did not understand the concept as the language designers intended them to (e.g., they correctly mapped a symbol to an "upper-goal" but didn't know what to do with the latter). While training may arguably establish correct bridging between visualization and inference in the long term, intuitiveness is exhibited when limited such training is necessary.

In the context of contribution links in goal models, the inference we are interested in is how users assign satisfaction to goals given the satisfaction of other goals, based on their own interpretation of what contribution labels seem to mean. Reversely, their observed inferences reveal their perceived meaning of the links, and, as such, the former can be used to develop empirical operationalizations of the latter. In the experiment we describe below, we ask the users to make decisions using goal models. To do so, they need to adopt a way of using the contribution link and, implicitly, a semantics for those links. The alignment of the semantics implied by how users use the models with the designed semantics (i.e. the semantics intended by the designers), as exhibited by whether the results of the inference match, is, we claim, a possible indication of the intuitiveness of the designed semantics.

III. EXPERIMENTAL DESIGN

A. Overview and Research Question

In the proposed study we pick two approaches for modeling and assigning meaning to contribution links and compare them with regards to measures of intuitiveness and efficiency. We specifically compare the symbolic against the numeric approach, the latter under the linear interpretation. There is one main research question we wish to address:

RQ. Which of the two methods for modeling contribution links is the most (a) intuitive and (b) efficient for the task of identifying optimal alternatives in goal models?

We address the above through a controlled experiment with human participants.

B. Experimental Tasks and Measurements

1) Measures: The two constructs we are considering are intuitiveness and efficiency. We theoretically defined intuitiveness as the degree by which untrained users can make accurate inferences with models they are exposed to. Operationally, we will measure intuitiveness by exposing the experimental participants to a sample of goal models and asking them to perform an inference, which we then compare with the "correct" inference as dictated by the adopted contribution modeling approach. Perception of intuitiveness is also included as a possible measure, via self-reporting of participants' confidence about the aforementioned inferences they perform. Efficiency will be, in this context, measured as the total time it takes for participants to perform this inference, independent of correctness.
2) Experimental Units: We develop a number of goal models such as those in Figure 1. We specifically develop two (2) sets of models: qualitative, in which contribution labels are symbolic, and quantitative, where contribution labels are numeric following the "linear" semantics. All models contain one OR-decomposition of hard-goals (so one decision) together with a hierarchy of soft-goals that are used as criteria for choosing the optimal alternative within the decomposition. By having a unique root goal in the soft-goal hierarchy, the goal model implies that, generally, one of the depicted alternatives is optimal compared to the others.

To show how this is possible let us go back to Figure 1 and consider the decomposition Manually versus Automatically. We can assume that whenever we pick one of the alternatives the corresponding hard-goal is assigned maximum satisfaction and, if applicable, minimum denial value. Thus, to choose the alternative Manually we assign to it maximum satisfaction values {FS, ND} (qualitative case) or s(Manually) = 1 (quantitative case), and to all other alternatives (in our case only Automatically) values {NS, ND} or s(·) = 0. We then perform recursive bottom-up application of the propagation rules of Tables I and II (depending on case), in order to calculate the satisfaction of the root goal Overall Scheduling Quality. For the quantitative models specifically we follow the linear interpretation of the last row of Table II. Different choices of alternative will result in different satisfaction levels for the root goal. The alternative that results in the highest satisfaction value for the root goal is the optimal one.

In the quantitative case, satisfaction is a unique value and the comparison is straightforward. In the example of Figure 1 (model on the right), Manually causes satisfaction of Overall Scheduling Quality by approx. 0.6 compared to approx. 0.4 implied by selection of Automatically. Thus, Manually is the optimal alternative².

² To simulate the experience of our experimental participants, the reader can look at the diagram and verify if the assertion that Manually is optimal can be inferred intuitively, by roughly comparing the numbers and without performing precise calculations.

In the qualitative case, calculation is less straightforward in that there are two variables to consider, satisfaction and denial. To make different satisfaction levels comparable we aggregate the two values into one, the aggregated satisfaction value. To calculate the aggregated satisfaction values, we firstly associate the qualitative satisfaction labels {N, P, F} with numeric values 0, 1, 2, respectively. We denote the resulting numeric satisfaction and denial of a goal g as sat(g) and den(g), respectively. The aggregated satisfaction value is then sat(g) − den(g), which results in an integer in [-2,2]. Thus, the aggregated satisfaction value of a goal g1 with {PS, FD} is sat(g1) − den(g1) = 1 − 2 = −1 and of a goal g2 with {FS, ND}, sat(g2) − den(g2) = 2 − 0 = 2. For the qualitative model on the left of Figure 1, it can be verified that Overall Scheduling Quality is {PS, PD} for Manually and also {PS, PD} for Automatically. Hence, both alternatives lead to the same aggregated satisfaction value for the root goal, that is 0, and as such they are equally optimal.
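The procedure just described can be summarized in a short sketch. It is our own illustration of the bottom-up evaluation and of the aggregated satisfaction value; the function and parameter names are assumptions made for the example.

```python
# A sketch of identifying the optimal alternative in the quantitative (linear) case:
# set the chosen alternative's satisfaction to 1 (all others to 0), propagate
# bottom-up, and compare the root soft-goal's satisfaction across alternatives.
def root_satisfaction(alternative, alternatives, order, incoming):
    """
    order    : soft-goals in bottom-up (topological) order, root soft-goal last
    incoming : goal -> list of (origin goal, weight) contribution links
    """
    s = {a: (1.0 if a == alternative else 0.0) for a in alternatives}
    for g in order:
        s[g] = sum(s[origin] * w for origin, w in incoming[g])   # "Linear" row of Table II
    return s[order[-1]]

# Usage: best = max(alternatives, key=lambda a: root_satisfaction(a, alternatives, order, incoming))

def aggregated_satisfaction(sat_label: str, den_label: str) -> int:
    """Qualitative case: map {N, P, F} to 0, 1, 2 and take sat(g) - den(g), an integer in [-2, 2]."""
    value = {"N": 0, "P": 1, "F": 2}
    return value[sat_label] - value[den_label]

# E.g. a goal at {PS,FD} aggregates to 1 - 2 = -1, and one at {FS,ND} to 2 - 0 = 2.
assert aggregated_satisfaction("P", "F") == -1 and aggregated_satisfaction("F", "N") == 2
```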
hence aggregated value 1−1 = 0, qualifies for inclusion to our For each domain we develop two (2) structures (one with sample as the distance of the two top alternatives is 2. A goal two and one with three alternatives) and for each structure model, on the other hand, in which the top two alternatives we sample two (2) labels-sets (i.e., sets of labels for the are both {FS, PD} have both an aggregated value of 1 and contributions) sampled as described above. To produce quali- hence distance of zero; so they do not qualify. tative counterparts we simply copy the twelve (12) quantitative In quantitative models we also randomly sample while structures and replace the numbers with randomly sampled ensuring that the first alternative has a distance of 0.4 from the symbolic labels – again, as described above. second; again, in terms of the satisfaction they imply for the We then present the resulting twelve (12) models of each root goal. For example a set of weights that gives satisfaction type (qualitative and quantitative) to the participants one after value 0.7 to the first alternative and 0.3 to the second qualifies the other asking each time what they believe the optimal for inclusion to our sample. The model of Figure 1 (right), alternative is. Domains are presented in random order and focussing on the Choose Schedule decision, does not qualify models within the domains in random order as well. Three as the distance is 0.6 − 0.4 = 0.2 video presentations precede these tasks: one describes decision The choice of 0.4 is made to match the corresponding choice problems in general, another introduces goal models and a in the qualitative models. Observe that in qualitative models third one introduces the three domains. The second video the maximum distance between alternatives is 4 ({FS, ND} specifically, describes the intuition behind the contribution versus {NS, FD} so 2 - (-2)). The distance we demand is 2, links of each type carefully without getting into the mechanics thus half of this space. Respectively in the quantitative models of satisfaction propagation. The videos are scripted and are the maximum theoretical distance is 1.0, so half the space the same in the two cases (qualitative and quantitative) except would be 0.5. However we end-up to 0.4 – biasing slightly obviously for the places where the numbers or symbols are against numeric models – as for some of our structures there presented. does not seem to exist combinations of numeric labels that The videos are chosen as the instruction method for three yield a distance of exactly 0.5. reasons (a) allow for repeatability of the procedure, (b) control To remain consistent with the claim that linear interpretation for biases in training, and (c) allow for remote administration is chosen due to the systematic elicitation approach that is or administration by non-experts. afforded by it, namely AHP, all numeric sampling is done A simple demographics questionnaire (age, sex, education, through simulated AHP pair comparison processes and subse- prior knowledge of goal models) precedes the main test. quent profile calculations, such that the consistency ratio (CR) Participants are unlikely to be familiar with goal models, and is less than 0.1. input coming from those who actually are will be discarded. 
4) Instrument and Tasks: Using the sampling procedure described above we develop a total of twelve (12) quantitative models. The goal structures refer to three (3) different domains describing intentional structures in the context of decisions: Choosing an Apartment, Choosing a Course, and Choosing a Means of Transportation. We develop the models based on specific domains, rather than using dummy names (A, B, C etc.), for the purpose of making the tasks more realistic. This introduces the threat that participants may use their own opinion of how goals are related to each other, ignoring the information provided in the contribution link. To avoid this bias, participants are told that the structures represent decision problems of a third party and that their task is to help that party make the decision based on the priorities of that party, as these priorities are represented in the goal structure.

For each domain we develop two (2) structures (one with two and one with three alternatives) and for each structure we sample two (2) label-sets (i.e., sets of labels for the contributions), sampled as described above. To produce qualitative counterparts we simply copy the twelve (12) quantitative structures and replace the numbers with randomly sampled symbolic labels – again, as described above.

We then present the resulting twelve (12) models of each type (qualitative and quantitative) to the participants one after the other, asking each time what they believe the optimal alternative is. Domains are presented in random order and models within the domains in random order as well. Three video presentations precede these tasks: one describes decision problems in general, another introduces goal models and a third one introduces the three domains. The second video, specifically, describes the intuition behind the contribution links of each type carefully, without getting into the mechanics of satisfaction propagation. The videos are scripted and are the same in the two cases (qualitative and quantitative) except, obviously, for the places where the numbers or symbols are presented.

The videos are chosen as the instruction method for three reasons: (a) they allow for repeatability of the procedure, (b) they control for biases in training, and (c) they allow for remote administration or administration by non-experts.

A simple demographics questionnaire (age, sex, education, prior knowledge of goal models) precedes the main test. Participants are unlikely to be familiar with goal models, and input coming from those who actually are will be discarded. However, if familiarity with goal models turns out to be more prevalent, treating familiarity as a covariate is another option.

5) Participant Sample: We plan to consider the University student pool as the population to opportunistically sample from, specifically intermediate/senior students from various disciplines. We claim that this does not harm the generalizability of the particular study. Firstly, having a valid noteworthy intuition about how the particular conceptual modeling construct works does not seem to require experience and skill in any specific field: goal models refer to concepts (goals and their fulfillment) that should be accessible to anyone who has successfully entered post-secondary education – compared to, for instance, component diagrams describing software designs. Secondly, it seems to be the implicit ambition of goal modeling language designers that goal models are artifacts that not only analysts but also stakeholders are able to comprehend and use to their benefit [17]. If this is the case, then the population we should be drawing participants from is, roughly speaking, the population of all people who might serve as decision making stakeholders in a systems development project. While there is no authoritative data about the characteristics of this population, we believe that the breadth of educational and skill profiles in it can be credibly approximated by a sample of intermediate/senior University undergraduate students.
6) Variables and Analysis Approach: It becomes obvious from the above that the experiment is a simple comparison between two levels (qualitative vs. quantitative) of one independent variable/factor (contribution link representation method), arranged in a between-subjects fashion. Dependent variables are the accuracy, measured as the number of correct responses per participant, hence a number in [0,12], as well as the response time, which is the average time participants need in order to provide a response.

It is also possible to measure confidence in each participant's response as a measure of perceived intuitiveness. In earlier studies [18], [19], we augmented each exercise with a 5-level Likert-style question "how confident are you of your answer above", with possible answers Very Unconfident, Unconfident, Neutral, etc. The higher the confidence the higher the perceived intuitiveness, i.e., how intuitive the participants think the representation is. However, this additional question increases experimental time and fatigue. Addition of this variable would depend on our ability to keep the instrument short, i.e., around 30 to 40 minutes.

Simple comparisons between means appear to be sufficient as a statistical procedure, with the expected deviation from normality kept in view – the scale [0,12] is particularly inviting for ceiling effects.
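For concreteness, a comparison of this kind could be run as in the sketch below, pairing a means comparison with a non-parametric check because of the bounded accuracy scale. This is our illustration of the intended analysis, not an analysis script from the study; the library choice (scipy) is an assumption and the data shown are invented.

```python
# A sketch of comparing accuracy between the two groups (symbolic vs. numeric):
# Welch's t-test for the means comparison, plus a Mann-Whitney U test as a
# robustness check given the bounded [0,12] scale and possible ceiling effects.
from scipy import stats

def compare_groups(acc_symbolic, acc_numeric):
    t, p_t = stats.ttest_ind(acc_symbolic, acc_numeric, equal_var=False)   # Welch's t-test
    u, p_u = stats.mannwhitneyu(acc_symbolic, acc_numeric, alternative="two-sided")
    return {"welch_t": (t, p_t), "mann_whitney": (u, p_u)}

# Example with made-up accuracy scores (number of correct answers out of 12):
print(compare_groups([9, 10, 12, 8, 11], [7, 8, 10, 6, 9]))
```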
IV. SUMMARY AND RELATED WORK

We presented an experimental design for comparing the intuitiveness of symbolic versus numeric contribution links in goal models. We use intuitiveness as our main comparison construct, defined as the ability of novice users of the notation to correctly understand how they can use it. We operationalize intuitiveness by measuring agreement between authoritative inferences and inferences participants make, as well as the time it takes for the latter to take place. We also include the option of measuring perceived intuitiveness via self-reporting of the confidence of participants in their inferences. Our design relies on random sampling of a number of goal structures depicting a decision problem and asking participants to choose the optimal of the available choices, thereby making intuitive inferences about contribution links. The decision problems are carefully sampled to allow for a controlled distance between the optimal and second optimal choice.

Empirically evaluating the effectiveness of diagrammatic notations has been widely studied in the literature. Much of the research in the field has been dedicated towards understanding the comprehensibility of (various aspects of) UML and ER diagrams – e.g., [20]–[24] – or process models [25]–[28]. Although understandability is a popular construct of study, it has been argued that there is little agreement on how it is to be measured. Indeed, in their survey, Houy et al. [29] find variability in how understandability is operationalized in the literature. The concept of intuitiveness, as a specialization of understandability, is less frequently focused on explicitly, as in the work by Jošt et al., for example, where the intuitive understandability of various methods for modeling processes is empirically compared [30].

Work that relates to understanding the comprehensibility of goal models specifically is more limited. Horkoff et al. evaluate an interactive evaluation technique for goal models [31]. The way various concepts within goal models are visualized has also been the matter of investigation and empirical evaluation. Moody et al. offer an assessment of the i* visual syntax based on established rules ("Physics of Notations") [32]. An empirical analysis was followed by Caire et al. [33], in which experimental participants evaluate visualization choices of the language's primitives. Elsewhere, Hadar et al. [34] compare goal diagrams with use case diagrams on a variety of user tasks. Measures include text-model mapping, model reading (extracting information from the model), and model modification (performing targeted modifications to models). Carvallo and Franch have also studied, in the context of a case study, how non-technical stakeholders performed in developing strategic dependency i* diagrams [17].

Compared to the above, our work is more targeted to a specific construct of goal models, that is, contribution links. In earlier work on the subject [18] we set out to investigate the intuitiveness of the rules in Table I. In that experiment, we presented to experimental participants a series of contribution links, each connecting two goals in which the satisfaction value of the origin is known. As we also propose here, we operationalized intuitiveness by asking participants what their "hunch" is with respect to the satisfaction of the destination goal and comparing their input to the authoritative one of Table I. Among our findings were that rules involving positive labels and goal satisfaction are more intuitive than ones with negative labels and goal denial.

We also endeavored to compare the quantitative rules of Table II [35]. In that work we simply presented to participants hierarchies of soft-goals with known satisfaction values at the leaf level and asked them to choose the satisfaction of the root goal from a set of four values, each representing one of the possibilities of Table II. We found that the serial-parallel method was not preferred, while the most preferred one depended on whether the contribution weights added up to 1.0, in which case a linear interpretation was evoked. In general, our fundamental null hypothesis that the answers would be uniformly random was rejected, indicating that more research should be done on the matter.

Finally, in a different effort [19] and in a vein somewhat similar to that of Caire et al. [33], we focused on visualizations of contribution measures that are alternative to the diagrammatic one. We specifically employed bar-charts, pie-charts and tree-maps to represent quantitative goal diagrams such as those of Figure 1 – following again the "linear" interpretation. Exactly as we propose here, we presented users with decision problems and asked them to pick the optimal alternative using each of the visualizations under comparison. We found that the combination of pie-charts and bar-charts leads to more accurate identification of the optimal alternative and that diagrams were not better in any of the tests or measures.

The difference between the above effort [19] and the current work is that, while in that paper the semantics are assumed and the visualization is in question, in the study proposed here both the visualization and its meaning are under comparison. The result can thus be interpretable at either level. For the future, we are interested in exploring theoretical and methodological approaches through which these two aspects can be separately evaluated.
The endeavour is not a simple one, as understanding of any communication of a concept can be argued to be affected by the way it is communicated – through words, visualizations or other methods. Thus it may prove difficult to measure comprehension of a concept as a "pure" abstraction. Such a problematic demonstrates how empirical investigation, even at the conceptualization stage, forces us to think more deeply about the substance of the process of conceptual modeling and the nature of its artifacts.

REFERENCES

[1] E. S. K. Yu, "Towards Modelling and Reasoning Support for Early-Phase Requirements Engineering," in Proceedings of the 3rd IEEE International Symposium on Requirements Engineering (RE'97), Annapolis, MD, 1997, pp. 226–235.
[2] D. Amyot and G. Mussbacher, "User Requirements Notation: The First Ten Years, The Next Ten Years (Invited Paper)," Journal of Software (JSW), vol. 6, no. 5, pp. 747–768, 2011.
[3] A. Dardenne, A. van Lamsweerde, and S. Fickas, "Goal-Directed Requirements Acquisition," Science of Computer Programming, vol. 20, no. 1-2, pp. 3–50, 1993.
[4] J. Mylopoulos, L. Chung, S. Liao, H. Wang, and E. Yu, "Exploring Alternatives During Requirements Analysis," IEEE Software, vol. 18, no. 1, pp. 92–96, 2001.
[5] S. Liaskos, S. M. Khan, M. Soutchanski, and J. Mylopoulos, "Modeling and Reasoning with Decision-Theoretic Goals," in Proceedings of the 32nd International Conference on Conceptual Modeling (ER'13), Hong Kong, China, 2013, pp. 19–32.
[6] S. Liaskos, S. McIlraith, S. Sohrabi, and J. Mylopoulos, "Representing and reasoning about preferences in requirements engineering," Requirements Engineering Journal (REJ), vol. 16, no. 3, pp. 227–249, 2011.
[7] D. Amyot, S. Ghanavati, J. Horkoff, G. Mussbacher, L. Peyton, and E. S. K. Yu, "Evaluating goal models within the goal-oriented requirement language," International Journal of Intelligent Systems, vol. 25, no. 8, pp. 841–877, 2010.
[8] P. Giorgini, J. Mylopoulos, E. Nicchiarelli, and R. Sebastiani, "Reasoning with Goal Models," in Proceedings of the 21st International Conference on Conceptual Modeling (ER'02), London, UK, 2002, pp. 167–181.
[9] J. Horkoff and E. Yu, "Comparison and evaluation of goal-oriented satisfaction analysis techniques," Requirements Engineering (REJ), pp. 1–24, 2011.
[10] E. S. Yu, "GRL - Goal-oriented Requirement Language." [Online]. Available: http://www.cs.toronto.edu/km/GRL/
[11] P. Giorgini, J. Mylopoulos, E. Nicchiarelli, and R. Sebastiani, Formal Reasoning Techniques for Goal Models. Springer Berlin Heidelberg, 2003, pp. 1–20. [Online]. Available: https://doi.org/10.1007/978-3-540-39733-5_1
[12] S. Liaskos, R. Jalman, and J. Aranda, "On Eliciting Preference and Contribution Measures in Goal Models," in Proceedings of the 20th International Requirements Engineering Conference (RE'12), Chicago, IL, 2012, pp. 221–230.
[13] T. L. Saaty, "Decision making with the analytic hierarchy process," International Journal of Services Sciences (IJSSCI), vol. 1, no. 1, pp. 83–98, 2008.
[14] Y. Rogers, H. Sharp, and J. Preece, Interaction Design: beyond human-computer interaction. Wiley, 2011.
[15] J. Nielsen, "Ten Usability Heuristics." [Online]. Available: https://tfa.stanford.edu/download/TenUsabilityHeuristics.pdf
[16] J. Mylopoulos, "Conceptual Modeling and Telos," in Conceptual Modelling, Databases and CASE: An Integrated View of Information Systems Development. Wiley, 1992.
[17] J. P. Carvallo and X. Franch, "An empirical study on the use of i* by non-technical stakeholders: the case of strategic dependency diagrams," Requirements Engineering (REJ), pp. 1–27, 2018.
[18] S. Liaskos, A. Ronse, and M. Zhian, "Assessing the Intuitiveness of Qualitative Contribution Relationships in Goal Models: an Exploratory Experiment," in Proceedings of the 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM'17), 2017, pp. 466–471. [Online]. Available: http://www.yorku.ca/liaskos/Docs/ESEM17.pdf
[19] S. Liaskos, T. Dundjerovic, and G. Gabriel, "Comparing Alternative Goal Model Visualizations for Decision Making: an Exploratory Experiment," in Proceedings of the 33rd Annual ACM Symposium on Applied Computing (SAC'18), Pau, France, 2018, pp. 1272–1281. [Online]. Available: http://www.yorku.ca/liaskos/Papers/SAC2018/Visualizations/SAC2018.pdf
[20] J. A. Cruz-Lemus, M. Genero, M. E. Manso, S. Morasca, and M. Piattini, "Assessing the understandability of UML statechart diagrams with composite states—A family of empirical studies," Empirical Software Engineering, vol. 14, no. 6, pp. 685–719, 2009.
[21] H. C. Purchase, R. Welland, M. McGill, and L. Colpoys, "Comprehension of diagram syntax: an empirical study of entity relationship notations," International Journal of Human-Computer Studies, vol. 61, no. 2, pp. 187–203, 2004.
[22] P. Shoval and I. Frumermann, "OO and EER Conceptual Schemas: A Comparison of User Comprehension," Journal of Database Management (JDM), vol. 5, no. 4, pp. 28–38, 1994.
[23] A. De Lucia, C. Gravino, R. Oliveto, and G. Tortora, "Data model comprehension: an empirical comparison of ER and UML class diagrams," in Proceedings of the 16th IEEE International Conference on Program Comprehension (ICPC 2008), 2008, pp. 93–102.
[24] M. Genero, G. Poels, and M. Piattini, "Defining and validating metrics for assessing the understandability of entity-relationship diagrams," Data and Knowledge Engineering, vol. 64, no. 3, pp. 534–557, 2008.
[25] D. Q. Birkmeier, S. Klockner, and S. Overhage, "An Empirical Comparison of the Usability of BPMN and UML Activity Diagrams for Business Users," in Proceedings of the 18th European Conference on Information Systems (ECIS'10), 2010, pp. 51–62.
[26] K. Figl, J. Recker, and J. Mendling, "A study on the effects of routing symbol design on process model comprehension," Decision Support Systems, vol. 54, no. 2, pp. 1104–1118, 2013.
[27] K. Figl and R. Laue, "Cognitive Complexity in Business Process Modeling," in Proceedings of the 23rd International Conference on Advanced Information Systems Engineering (CAiSE 2011), London, UK, 2011, pp. 452–466.
[28] J. Mendling and M. Strembeck, "Influence Factors of Understanding Business Process Models," in Proceedings of the 11th International Conference on Business Information Systems, 2008, pp. 142–153.
[29] C. Houy, P. Fettke, and P. Loos, "Understanding understandability of conceptual models – What are we actually talking about?" in Proceedings of the 31st International Conference on Conceptual Modeling (ER 2012), LNCS 7532, 2012, pp. 64–77.
[30] G. Jošt, J. Huber, M. Heričko, and G. Polančič, "An empirical investigation of intuitive understandability of process diagrams," Computer Standards and Interfaces, vol. 48, pp. 90–111, 2016.
[31] J. Horkoff and E. S. K. Yu, "Interactive goal model analysis for early requirements engineering," Requirements Engineering, vol. 21, no. 1, pp. 29–61, 2016.
[32] D. L. Moody, P. Heymans, and R. Matulevičius, "Visual syntax does matter: improving the cognitive effectiveness of the i* visual notation," Requirements Engineering, vol. 15, no. 2, pp. 141–175, 2010.
[33] P. Caire, N. Genon, P. Heymans, and D. L. Moody, "Visual notation design 2.0: Towards user comprehensible requirements engineering notations," in Proceedings of the 21st IEEE International Requirements Engineering Conference (RE'13), Jul. 2013, pp. 115–124.
[34] I. Hadar, I. Reinhartz-Berger, T. Kuflik, A. Perini, F. Ricca, and A. Susi, "Comparing the comprehensibility of requirements models expressed in Use Case and Tropos: Results from a family of experiments," Information and Software Technology, vol. 55, no. 10, pp. 1823–1843, 2013.
[35] N. Alothman, M. Zhian, and S. Liaskos, "User Perception of Numeric Contribution Semantics for Goal Models: an Exploratory Experiment," in Proceedings of the 36th International Conference on Conceptual Modeling (ER'17), 2017, pp. 451–465. [Online]. Available: http://www.yorku.ca/liaskos/Docs/ER17.pdf