<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Empirical Evaluation Roadmap for iStar 2.0</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lidia López</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fatma Başak Aydemir</string-name>
          <email>f.b.aydemir@uu.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabiano Dalpiaz</string-name>
          <email>f.dalpiaz@uu.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jennifer Horkoff</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>City University London</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universitat Politècnica de Catalunya</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Utrecht University</institution>
          ,
          <addr-line>Utrecht</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1674</volume>
      <fpage>55</fpage>
      <lpage>60</lpage>
      <abstract>
        <p>The iStar 2.0 modeling language is the result of a two-year community effort aimed at providing a solid, unified basis for teaching and conducting research with i*. The language was released with important qualities in mind, such as keeping a core set of primitives, providing a clear meaning for those primitives, and flattening the learning curve for new users. In this paper, we propose a list of qualities against which we intend iStar 2.0 to be evaluated. Furthermore, we describe an empirical evaluation plan, devised to assess the extent to which the language meets the identified qualities and to inform the development of further versions of the language. Besides explaining the objectives and steps of our planned empirical studies, we make a call for involving the research community in our endeavor.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Many dialects and extensions of the i* modeling language have been proposed since its introduction in the 1990s. Although these proposals demonstrate the popularity of the language in the research community and allow adaptation of the framework to a variety of domains (e.g., security, law, service-oriented architectures), they have also created difficulties in learning, teaching, and applying i* consistently.</p>
      <p>
        iStar 2.0 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is the result of a collective effort of the i* community aimed at overcoming these difficulties by defining a standard core set of concepts. Given the objectives of iStar 2.0, our aim is twofold: (a) to measure how well the language achieves those objectives, and (b) to inform further developments with empirical evidence.
      </p>
      <p>More specifically, our research question is the following: Does iStar 2.0 provide a solid and unified basis for teaching and supporting ongoing research on goal-oriented requirements engineering? Toward answering this question, we identify several relevant qualities and provide an initial roadmap for the empirical studies to be conducted to evaluate iStar 2.0 against those qualities.</p>
      <p>The remainder of the paper is structured as follows. Section 2 includes a brief literature review of empirical evaluations of modeling languages and of i*. In Section 3, we define the set of qualities to be empirically evaluated and a roadmap defining the timeline for implementing these evaluations. Finally, we draw some conclusions in Section 4.</p>
      <p>Copyright © 2016 for this paper by its authors. Copying permitted for private and academic purposes.</p>
    </sec>
    <sec id="sec-2">
      <title>Empirical Evaluation of Modeling Languages</title>
      <p>There is a variety of empirical evaluations in the area of modeling languages in general, and of the i* modeling language in particular. This section provides a brief summary of these studies, focusing on the qualities they evaluate.</p>
      <p>
        Lindland et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] propose a framework that identifies three categories of qualities related to modeling languages (syntactic, semantic, and pragmatic), quality goals for each category, and the means for achieving these goals. The semantic qualities refer to the validity and completeness of the language and of the models generated using the language, syntactic qualities are related to the syntax of the language, and pragmatic qualities concern the understandability of the language and its application.
      </p>
      <p>
        Guizzardi et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] suggest domain and comprehensibility appropriateness as
key qualities of a modeling language, relying on verifying lucidity, soundness,
laconicity, and completeness properties of model instances. These properties are
then related to corresponding language properties: construct overload, construct
excess, construct redundancy, and ontological expressiveness.
      </p>
      <p>
        Frank [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] proposes a method to evaluate reference models, where the evaluation concerns both the general qualities of conceptual models and the reusability of the reference domain. The framework distinguishes four evaluation perspectives: economic, deployment, engineering, and epistemological. Each perspective is structured into multiple aspects, for each of which a success criterion is provided.
      </p>
      <p>
        Interest in i* evaluation appears to be on the rise, with studies covering both the evaluation of the language and the applicability of i* in industry. We distinguish between different kinds of studies. Some works evaluate the use of an i* extension by comparing it to the use of i* [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Other approaches compare i* with other goal-oriented modeling languages such as KAOS [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or Techne [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Finally, other studies evaluate specific characteristics of the language such as
visual effectiveness [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Most of the studies providing empirical evidence in the literature evaluate the applicability of i* for different purposes in industrial environments. Elahi et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] studied the use of i* for gathering and understanding knowledge in an organization, concluding that some constructs are not used by practitioners. Carvallo et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] focus on socio-technical systems and conclude that some models are too difficult to read and modify due to their complexity. A variety of real use cases were presented at the i* Showcase in 2011 (http://www.city.ac.uk/centre-for-human-computer-interaction-design/istar11).
      </p>
    </sec>
    <sec id="sec-3">
      <title>iStar 2.0 Evaluation Roadmap</title>
      <p>In order to evaluate iStar 2.0, we need to define the set of language qualities that we want to assess. Based on the review of Section 2, we present a number of qualities to evaluate, then discuss suitable empirical methods, and finally devise an initial roadmap for the empirical evaluation.</p>
      <sec id="sec-2-1">
        <title>Qualities to be evaluated</title>
        <p>
          As iStar 2.0 was not defined as a new language, but rather as a set of core concepts refining the original i* [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], backward compatibility is critically important. As a community, we need to collect evidence to determine whether iStar 2.0 meets the needs of the users of i*. The open nature of i* comes with a drawback that iStar 2.0 tries to mitigate: a steep learning curve that makes it hard to employ the language in industry. Therefore, learnability is also a priority quality to be evaluated. Keeping the open nature of i* was also one of the main objectives during the definition of iStar 2.0. Consequently, we also need to consider the extensibility quality, i.e., to evaluate whether iStar 2.0 is a suitable baseline for extensions.
        </p>
        <p>In addition to these qualities, we consider qualities that assess the language itself, such as expressiveness and syntactic correctness. Regarding expressiveness, we are interested in evaluating whether iStar 2.0 has a suitable set of constructs (checking for missing, excess, or overloaded constructs). Syntactic correctness evaluates whether modelers can easily detect and correct syntactic errors using iStar 2.0 and whether the language can prevent syntactic errors.</p>
        <p>
          We have also included qualities not directly assessed during the definition
of iStar 2.0, such as scalability. The detailed set of qualities to be evaluated
is included in Table 1. We categorize the qualities based on the classification
provided in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Empirical Methods: Design Dimensions</title>
        <p>In order to evaluate the qualities listed in Table 1, several empirical studies must
be designed and conducted. We envision the application of several empirical
methods, including experiments, surveys and case studies. We can enumerate a
number of dimensions that must be considered when designing such studies.</p>
        <p>The choice of subjects participating in the studies is a dimension that must be determined for each study. To classify the subjects, we can use two categories: expertise and background (industry or academia). We need to clearly define a set of i* experts for inclusion in the backward compatibility evaluation. For practical applicability, we need to involve practitioners from industry. For the other qualities, we can treat the expertise and the background of participants as variables in the study.</p>
        <p>We also need to decide when to evaluate the iStar 2.0 language in isolation and when a comparative analysis against i* is needed. The same reasons that lead us to pay special attention to backward compatibility and learnability suggest that, for these specific qualities, we should conduct comparative analyses. Meanwhile, the evaluation of the other qualities can focus on iStar 2.0 alone.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Roadmap</title>
        <p>
          From an empirical software engineering standpoint, we can identify two main phases for the evaluation of iStar 2.0: formative and summative. The formative phase corresponds to the tasks related to the development of the proposal, providing partial empirical validation for the resulting proposal, while the summative phase evaluates whether the proposal can be implemented in the real world. We are currently in the formative phase, and more precisely in the treatment validation step of Wieringa’s design science methodology [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <table-wrap id="table-1">
          <label>Table 1</label>
          <caption>
            <p>Qualities to be evaluated, expressed as evaluation questions</p>
          </caption>
          <table>
            <tbody>
              <tr><td>Does iStar allow one to capture a sufficient number of concepts in a socio-technical domain?</td></tr>
              <tr><td>Do iStar 2.0 models have only one interpretation?</td></tr>
              <tr><td>Is iStar 2.0 able to represent the same phenomena as i*?</td></tr>
              <tr><td>Is it easy to add new concepts to iStar 2.0?</td></tr>
              <tr><td>What does the learning curve of iStar 2.0 look like?</td></tr>
              <tr><td>Does iStar 2.0 facilitate changing and updating models?</td></tr>
              <tr><td>Can iStar 2.0 be successfully applied to real-world cases?</td></tr>
              <tr><td>Does iStar 2.0 support the creation and analysis of large problems?</td></tr>
              <tr><td>Comprehensibility: Can iStar 2.0 models be understood?</td></tr>
              <tr><td>Is the effort required to use iStar 2.0 worth the benefits?</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>We divide the proposed empirical evaluation plan into three phases, comprising a total of five stages. The first two phases correspond to the formative and summative phases in empirical research, while the third phase describes complementary activities:</p>
        <list list-type="bullet">
          <list-item>
            <p>In the formative phase, the evaluation will concern the qualities that guided the design decisions for iStar 2.0. These qualities include keeping a core set of primitives (stage 1), providing a clear meaning for such primitives (stage 2), and flattening the learning curve for new users (stage 3).</p>
          </list-item>
          <list-item>
            <p>In the summative phase, the proposal (in our case, iStar 2.0) should be tested for applicability to real-world cases (stage 4).</p>
          </list-item>
          <list-item>
            <p>The third phase includes the study of additional properties that do not directly relate to the use of iStar 2.0 itself, but rather to its capability to be adapted for specific cases or domains (stage 5).</p>
          </list-item>
        </list>
      </sec>
      <sec id="sec-4">
        <title>Conclusions</title>
        <p>During the last few years, the i* community has been working on the definition of a standard, core version of the language, resulting in iStar 2.0. The main goal of this effort was to facilitate the consistent learning, teaching, and application of i*. After the definition of iStar 2.0, the natural next step is evaluating the proposal to gather evidence of whether it achieves the expected qualities.</p>
        <p>In this paper, we emphasize the necessity of evaluating iStar 2.0 through empirical studies. Our first step is the identification of a set of qualities against which iStar 2.0 should be evaluated. We also discuss some key dimensions that need to be defined when conducting these empirical studies. Interestingly, many of the qualities we identified are pragmatic; we surmise this is linked to the limited adoption of i* in industry.</p>
        <p>We prioritize the evaluation tasks for the qualities by grouping them into five stages. Some of these tasks are labeled as formative evaluation, others are part of the summative evaluation, and the remaining tasks are additional studies of the extensibility and customizability of iStar 2.0. Based on this grouping, we define a roadmap proposing an order of execution for the various evaluation stages.</p>
        <p>The next steps consist of conducting empirical studies addressing one or more of the identified qualities of iStar 2.0. Although we plan to design and conduct several studies ourselves, an effective evaluation of the language will require a community-wide effort. We encourage i* community members to use and evaluate iStar 2.0, keeping in mind the qualities presented here, and to report the results publicly. Our hope is that, as a community, we build evidence both to support the usefulness of iStar 2.0 and to shape future versions of the language.</p>
        <p>Acknowledgments. This work is supported by the EOSSAC project, funded by the Ministry of Economy and Competitiveness of the Spanish government (TIN2013-44641-P), an ERC Marie Skłodowska-Curie Intra-European Fellowship (PIEF-GA-2013-627489), and a Natural Sciences and Engineering Research Council of Canada Postdoctoral Fellowship (Sept. 2014 – Aug. 2016). The second and third authors have received funding from the SESAR Joint Undertaking under grant agreement No. 699306, under the European Union’s Horizon 2020 research and innovation programme.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Carvallo</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Franch</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <article-title>: “On the Use of i* for Architecting Hybrid Systems: A Method and an Evaluation Report”</article-title>
          .
          <source>In: Lecture Notes in Business Information Processing. Springer Science + Business Media</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dalpiaz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franch</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Horkoff</surname>
          </string-name>
          , J.:
          <article-title>iStar 2.0 Language Guide</article-title>
          .
          <source>CoRR abs/1605.07767</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Elahi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Annosi</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          <article-title>: “Modeling Knowledge Transfer in a Software Maintenance Organization - An Experience Report and Critical Analysis”</article-title>
          .
          <source>In: Lecture Notes in Business Information Processing</source>
          .
          <year>2008</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>“Evaluation of Reference Models”</article-title>
          . In:
          <source>Reference modeling for business systems analysis. IGI Global</source>
          ,
          <year>2006</year>
          , pp.
          <fpage>118</fpage>
          -
          <lpage>140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Guizzardi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pires</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>van Sinderen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : “
          <article-title>An Ontology-Based Approach for Evaluating the Domain Appropriateness and Comprehensibility Appropriateness of Modeling Languages”</article-title>
          .
          <source>In: Model Driven Engineering Languages and Systems</source>
          . Springer Science + Business Media,
          <year>2005</year>
          , pp.
          <fpage>691</fpage>
          -
          <lpage>705</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Horkoff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aydemir</surname>
            ,
            <given-names>F.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mylopoulos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Evaluating Modeling Languages: An Example from the Requirements Domain</article-title>
          . In: ER (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lindland</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sindre</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Solvberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Understanding Quality in Conceptual Modeling</article-title>
          .
          <source>IEEE Softw</source>
          .
          <volume>11</volume>
          (
          <issue>2</issue>
          ),
          <fpage>42</fpage>
          -
          <lpage>49</lpage>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Matulevičius</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Heymans</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : “
          <article-title>Comparing Goal Modelling Languages: An Experiment”</article-title>
          . In:
          <source>Requirements Engineering: Foundation for Software Quality</source>
          . Springer Science + Business Media,
          <year>2007</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Moody</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heymans</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Matulevicius</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Improving the Effectiveness of Visual Representations in Requirements Engineering: An Evaluation of i* Visual Syntax</article-title>
          . In: RE (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Teruel</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navarro</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>López-Jaquero</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montero</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>González</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : “
          <article-title>Comparing Goal-Oriented Approaches to Model Requirements for CSCW”</article-title>
          .
          <source>In: Communications in Computer and Information Science</source>
          .
          <year>2013</year>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Wieringa</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <source>Design science methodology for information systems and software engineering</source>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Modelling strategic relationships for process reengineering</article-title>
          . University of Toronto (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>