    An Empirical Evaluation Roadmap for iStar 2.0

Lidia López1, Fatma Başak Aydemir2, Fabiano Dalpiaz2, and Jennifer Horkoff3

    1 Universitat Politècnica de Catalunya, Barcelona, Spain, llopez@essi.upc.edu
    2 Utrecht University, Utrecht, Netherlands, {f.b.aydemir, f.dalpiaz}@uu.nl
    3 City University London, London, UK, horkoff@city.ac.uk


          Abstract. The iStar 2.0 modeling language is the result of a two-year
          community effort aimed at providing a solid, unified basis for teaching
          and conducting research with i*. The language was released with important
          qualities in mind, such as keeping a core set of primitives, providing
          a clear meaning for those primitives, and flattening the learning curve
          for new users. In this paper, we propose a list of qualities against which
          we intend iStar 2.0 to be evaluated. Furthermore, we describe an empirical
          evaluation plan, which we devise in order to assess the extent to which
          the language meets the identified qualities and to inform the development
          of further versions of the language. Besides explaining the objectives
          and steps of our planned empirical studies, we make a call for involving
          the research community in our endeavor.

          Keywords: i* Framework, iStar 2.0, empirical engineering, evaluation


1           Introduction
Many dialects and extensions of the i* modeling language have been proposed
since its introduction in the 1990s. Although these proposals demonstrate the
popularity of the language in the research community and allow adaptation of
the framework to a variety of domains (e.g., security, law, service-oriented
architectures), they have also created difficulties in learning, teaching, and
applying i* consistently.
     iStar 2.0 [2] is the result of a collective effort of the i* community aimed
at overcoming these difficulties by defining a standard core set of concepts.
Given the objectives of iStar 2.0, our aim is twofold: (a) to measure how well
the language achieves these objectives, and (b) to inform further developments
with empirical evidence.
     More specifically, our research question is the following: Does iStar 2.0
provide a solid and unified basis for teaching and supporting ongoing research
on goal-oriented requirements engineering? Toward answering this question, we
identify several relevant qualities and provide an initial roadmap for the
empirical studies to be conducted to evaluate iStar 2.0 against those qualities.
     The remainder of the paper is structured as follows. Section 2 includes a
brief literature review of empirical evaluations of modeling languages and of
i*. In Section 3, we define the set of qualities to be empirically evaluated and
a roadmap defining the timeline for conducting these evaluations. Finally, we
draw some conclusions in Section 4.

 Copyright © 2016 for this paper by its authors. Copying permitted for private and academic purposes.
Proceedings of the Ninth International i* Workshop (iStar 2016), CEUR Vol-1674

2      Empirical Evaluation of Modeling Languages
There is a variety of empirical evaluations in the area of modeling languages in
general, and of the i* modeling language in particular. This section provides a
brief summary of these studies, focusing on the qualities they evaluate.
    Lindland et al. [7] propose a framework that identifies three categories of
qualities related to modeling languages (syntactic, semantic, and pragmatic),
quality goals for each category, and the means for achieving these goals. The
semantic qualities refer to the validity and completeness of the language and
of the models generated using the language, syntactic qualities are related to
the syntax of the language, and pragmatic qualities concern the understandability
of the language and its application.
    Guizzardi et al. [5] suggest domain and comprehensibility appropriateness as
key qualities of a modeling language, relying on verifying lucidity, soundness,
laconicity, and completeness properties of model instances. These properties are
then related to corresponding language properties: construct overload, construct
excess, construct redundancy, and ontological expressiveness.
    Frank [4] proposes a method to evaluate reference models, where the evaluation
concerns both the general qualities of conceptual models and reusability within
the reference domain. The framework defines four evaluation perspectives:
economic, deployment, engineering, and epistemological. Each perspective is
structured into multiple aspects, for each of which a success criterion is provided.
    Interest in i* evaluation appears to be on the rise, with studies covering
both the evaluation of the language itself and the applicability of i* in
industry. We distinguish between different kinds of studies. Some works evaluate
the use of an i* extension by comparing it to the use of plain i* [10]. Other
approaches compare i* with other goal-oriented modeling languages such as
KAOS [8] or Techne [6]. Finally, other studies evaluate specific characteristics
of the language, such as visual effectiveness [9].
    The majority of the studies providing empirical evidence in the literature
evaluate the applicability of i* for different purposes in industrial settings.
Elahi et al. [3] studied the use of i* for gathering and understanding
knowledge in an organization, concluding that some constructs are not used by
practitioners. Carvallo et al. [1] focus on socio-technical systems and conclude
that some models are too difficult to read and modify due to their complexity.
A variety of real use cases were presented at the i* Showcase in 2011
(http://www.city.ac.uk/centre-for-human-computer-interaction-design/istar11).


3      iStar 2.0 Evaluation Roadmap
In order to evaluate iStar 2.0, we need to define the set of language qualities
that we want to assess. Based on the review in Section 2, we present a number of
qualities to evaluate, then discuss suitable empirical methods, and finally devise
an initial roadmap for the empirical evaluation.



3.1   Qualities to be evaluated
As iStar 2.0 was not defined as a new language, but as a set of core concepts
refining the original i* [12], backward compatibility is critically important.
As a community, we need to collect evidence to determine whether iStar 2.0 meets
the needs of the users of i*. The open nature of i* comes with a drawback that
iStar 2.0 tries to mitigate: a steep learning curve that makes it hard to employ
the language in industry. Therefore, learnability is also a priority quality to
be evaluated. Keeping the open nature of i* was also one of the main objectives
during the definition of iStar 2.0. Consequently, we also need to consider the
extensibility quality, i.e., to evaluate whether iStar 2.0 is a suitable baseline
for extensions.
    In addition to these qualities, we consider qualities that concern the
language itself, such as expressiveness and syntactic correctness. Regarding
expressiveness, we are interested in evaluating whether iStar 2.0 has a suitable
set of constructs (i.e., no missing, excess, or overloaded constructs). Syntactic
correctness evaluates whether modelers can easily detect and correct syntactic
errors using iStar 2.0, and whether the language can prevent syntactic errors.
    We have also included qualities not directly assessed during the definition
of iStar 2.0, such as scalability. The detailed set of qualities to be evaluated
is included in Table 1. We categorize the qualities based on the classification
provided in [7].

3.2   Empirical Methods: Design Dimensions
In order to evaluate the qualities listed in Table 1, several empirical studies
must be designed and conducted. We envision the application of several empirical
methods, including experiments, surveys, and case studies. We can enumerate a
number of dimensions that must be considered when designing such studies.
    The choice of subjects participating in the studies is a dimension that must
be determined for each study. To classify the subjects, we can use two categories:
expertise and background (industry or academia). We need to clearly define a set
of i* experts for inclusion in the backward compatibility evaluation. For practical
applicability, we need to involve practitioners from industry. For other qualities,
we can treat the expertise and the background of participants as variables in
the study.
    We also need to decide when to evaluate the iStar 2.0 language in isolation
and when a comparative analysis of iStar 2.0 against i* is needed. The same
reasons that lead us to pay special attention to backward compatibility and
learnability suggest that, for these specific qualities, we should conduct
comparative analyses. Meanwhile, the evaluation of the other qualities can
focus only on iStar 2.0.
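    To make these dimensions concrete, the following minimal sketch shows one
possible way to encode the design choices of a planned study as data. All class,
enumeration, and field names are our own illustrative assumptions, not part of
the roadmap.

from dataclasses import dataclass
from enum import Enum


class Expertise(Enum):
    NOVICE = "novice"
    ISTAR_EXPERT = "i* expert"


class Background(Enum):
    ACADEMIA = "academia"
    INDUSTRY = "industry"


class Mode(Enum):
    ISOLATED = "iStar 2.0 only"
    COMPARATIVE = "iStar 2.0 vs. i*"


@dataclass
class StudyDesign:
    """Design choices for one empirical study in the roadmap."""
    quality: str                  # quality under evaluation (see Table 1)
    mode: Mode                    # evaluate in isolation or compare against i*
    expertise: list[Expertise]    # subject expertise levels, treated as a variable
    background: list[Background]  # subject backgrounds, treated as a variable


# Example: backward compatibility calls for i* experts and a comparative design.
backward_compatibility_study = StudyDesign(
    quality="Backward compatibility",
    mode=Mode.COMPARATIVE,
    expertise=[Expertise.ISTAR_EXPERT],
    background=[Background.ACADEMIA, Background.INDUSTRY],
)
print(backward_compatibility_study)

Encoding planned studies this way would let the community keep a machine-readable
registry of which qualities have been covered and under which designs.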

3.3   Roadmap
From an empirical software engineering standpoint, we can identify two main
phases for the evaluation of iStar 2.0: formative and summative.



                     Table 1: iStar 2.0 qualities to be evaluated

| Category  | Quality                 | Definition                                                                                       |
|-----------|-------------------------|--------------------------------------------------------------------------------------------------|
| Syntactic | Syntactic correctness   | Does iStar 2.0 facilitate ensuring and maintaining syntactic correctness?                         |
| Semantic  | Expressiveness          | Does iStar 2.0 allow one to capture a sufficient number of concepts in a socio-technical domain?  |
| Semantic  | Unambiguous models      | Do iStar 2.0 models have only one interpretation?                                                 |
| Semantic  | Backward compatibility  | Is iStar 2.0 able to represent the same phenomena as i*?                                          |
| Pragmatic | Comprehensibility       | Can iStar 2.0 models be understood?                                                               |
| Pragmatic | Cost-effectiveness      | Is the effort required to use iStar 2.0 worth the benefits?                                       |
| Pragmatic | Extensibility           | Is it easy to add new concepts to iStar 2.0?                                                      |
| Pragmatic | Learnability            | What does the learning curve of iStar 2.0 look like?                                              |
| Pragmatic | Modifiability           | Does iStar 2.0 facilitate changing and updating models?                                           |
| Pragmatic | Practical applicability | Can iStar 2.0 be successfully applied to real-world cases?                                        |
| Pragmatic | Scalability             | Does iStar 2.0 support the creation and analysis of large problems?                               |




The formative phase corresponds to the tasks related to the development of the
proposal, providing partial empirical validation for the resulting artifact,
while the summative phase evaluates whether the proposal can be applied in the
real world. We are currently in the formative phase, specifically in the
treatment validation step of Wieringa's design science methodology [11].
    We organize the proposed empirical evaluation plan into three phases,
comprising a total of five stages. The first two phases correspond to the
formative and summative phases in empirical research, while the third phase
describes complementary activities:
  – In the formative phase, the evaluation will concern the qualities that
    guided the design decisions for iStar 2.0. These qualities include keeping
    a core set of primitives (stage 1), providing a clear meaning for such
    primitives (stage 2), and flattening the learning curve for new users (stage 3).
  – In the summative phase, the proposal (in our case, iStar 2.0) should be
    tested for applicability to real-world cases (stage 4).
  – The third phase includes the study of additional properties that do not
    directly relate to the use of iStar 2.0 itself, but rather to its capability
    to be adapted for specific cases or domains (stage 5).



Figure 1 shows the three phases, including the qualities to be evaluated in each
stage. Cost-effectiveness is a quality that should be evaluated as part of all
stages: the cost can be evaluated in terms of time in every stage, and also in
terms of money in stage 4. Note that stages 1 to 3 and stage 5 can be executed
in any order, while stage 4 should be executed only after stages 1 to 3 have
been conducted (see the sketch after Figure 1).




                      Fig. 1: iStar 2.0 Evaluation Roadmap
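This ordering constraint can be read as a partial order over the five stages.
The snippet below is a minimal sketch, under our own encoding of the dependencies
(only stage 4 has prerequisites), that checks whether a proposed execution order
respects the roadmap.

# Stage dependencies in the roadmap: stage 4 requires stages 1-3;
# stages 1-3 and stage 5 may run in any order.
DEPENDENCIES = {1: set(), 2: set(), 3: set(), 4: {1, 2, 3}, 5: set()}


def is_valid_schedule(order: list[int]) -> bool:
    """Return True if every stage appears after all of its prerequisites."""
    position = {stage: i for i, stage in enumerate(order)}
    return all(
        position[prereq] < position[stage]
        for stage, prereqs in DEPENDENCIES.items()
        for prereq in prereqs
    )


print(is_valid_schedule([2, 5, 1, 3, 4]))  # True: stage 4 runs after stages 1-3
print(is_valid_schedule([4, 1, 2, 3, 5]))  # False: stage 4 precedes its prerequisites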




4   Conclusions
During the last few years, the i* community has been working on the definition
of a standard core version of the language, resulting in iStar 2.0. The main
goal of this effort was to facilitate the consistent learning, teaching, and
application of i*. After the definition of iStar 2.0, the natural next step is
evaluating the proposal to gather evidence of whether it achieves the expected
qualities.
    In this paper, we emphasize the necessity of evaluating iStar 2.0 through
empirical studies. Our first step is the identification of a set of qualities against
which iStar 2.0 should be evaluated. We also discuss some key dimensions that
need to be defined when conducting these empirical studies. Interestingly, many
of the qualities we identified are pragmatic; we surmise this is linked to the
limited adoption of i* in industry.
    We prioritize the evaluation tasks for these qualities by grouping them into
five stages. Some of these tasks are labeled as formative evaluation, others are
part of summative evaluation, and the remaining tasks are additional studies of
the extensibility and customizability of iStar 2.0. Based on this grouping, we
define a roadmap proposing an order of execution for the various evaluation stages.
    The next steps consist of conducting empirical studies addressing one or more
of the identified qualities of iStar 2.0. Although we plan to design and conduct
several studies ourselves, an effective evaluation of the language will require a
community-wide effort. We encourage i* community members to use and evaluate
iStar 2.0, keeping in mind the qualities presented here, and to report the



results publicly. Our hope is that, as a community, we will build evidence both
to support the usefulness of iStar 2.0 and to shape the future versions of the
language.
Acknowledgments. This work is supported by the EOSSAC project, funded by the
Ministry of Economy and Competitiveness of the Spanish government (TIN2013-
44641-P), by an ERC Marie Skłodowska-Curie Intra-European Fellowship (PIEF-
GA-2013-627489), and by a Natural Sciences and Engineering Research Council of
Canada Postdoctoral Fellowship (Sept. 2014 – Aug. 2016). The second and third
authors have received funding from the SESAR Joint Undertaking under grant
agreement No. 699306 under the European Union's Horizon 2020 research and
innovation programme.


References
 1. Carvallo, J.P., Franch, X.: On the Use of i* for Architecting Hybrid Systems:
    A Method and an Evaluation Report. In: Lecture Notes in Business Information
    Processing, pp. 38–53. Springer (2009)
 2. Dalpiaz, F., Franch, X., Horkoff, J.: iStar 2.0 Language Guide. CoRR
    abs/1605.07767 (2016)
 3. Elahi, G., Yu, E., Annosi, M.C.: Modeling Knowledge Transfer in a Software
    Maintenance Organization: An Experience Report and Critical Analysis. In:
    Lecture Notes in Business Information Processing, pp. 15–29 (2008)
 4. Frank, U.: Evaluation of Reference Models. In: Reference Modeling for Business
    Systems Analysis, pp. 118–140. IGI Global (2006)
 5. Guizzardi, G., Pires, L.F., van Sinderen, M.: An Ontology-Based Approach for
    Evaluating the Domain Appropriateness and Comprehensibility Appropriateness
    of Modeling Languages. In: Model Driven Engineering Languages and Systems,
    pp. 691–705. Springer (2005)
 6. Horkoff, J., Aydemir, F.B., Li, F., Li, T., Mylopoulos, J.: Evaluating Modeling
    Languages: An Example from the Requirements Domain. In: Proceedings of the
    International Conference on Conceptual Modeling (ER) (2014)
 7. Lindland, O.I., Sindre, G., Sølvberg, A.: Understanding Quality in Conceptual
    Modeling. IEEE Software 11(2), 42–49 (1994)
 8. Matulevičius, R., Heymans, P.: Comparing Goal Modelling Languages: An
    Experiment. In: Requirements Engineering: Foundation for Software Quality,
    pp. 18–32. Springer (2007)
 9. Moody, D.L., Heymans, P., Matulevičius, R.: Improving the Effectiveness of
    Visual Representations in Requirements Engineering: An Evaluation of i*
    Visual Syntax. In: Proceedings of the IEEE International Requirements
    Engineering Conference (RE) (2009)
10. Teruel, M.A., Navarro, E., López-Jaquero, V., Montero, F., González, P.:
    Comparing Goal-Oriented Approaches to Model Requirements for CSCW. In:
    Communications in Computer and Information Science, pp. 169–184 (2013)
11. Wieringa, R.J.: Design Science Methodology for Information Systems and
    Software Engineering. Springer (2014)
12. Yu, E.: Modelling Strategic Relationships for Process Reengineering. PhD
    thesis, University of Toronto (1996)



