<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Empirical Evaluation Roadmap for iStar 2.0</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lidia López</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fatma Başak Aydemir</string-name>
          <email>f.b.aydemir@uu.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabiano Dalpiaz</string-name>
          <email>f.dalpiaz@uu.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jennifer Horkoff</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>City University London</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universitat Politècnica de Catalunya</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Utrecht University</institution>
          ,
          <addr-line>Utrecht</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1674</volume>
      <fpage>55</fpage>
      <lpage>60</lpage>
      <abstract>
        <p>The iStar 2.0 modeling language is the result of a two-year community effort aimed at providing a solid, unified basis for teaching and conducting research with i*. The language was released with important qualities in mind, such as keeping a core set of primitives, providing a clear meaning for those primitives, and flattening the learning curve for new users. In this paper, we propose a list of qualities against which we intend iStar 2.0 to be evaluated. Furthermore, we describe an empirical evaluation plan, devised to assess the extent to which the language meets the identified qualities and to inform the development of further versions of the language. Besides explaining the objectives and steps of our planned empirical studies, we make a call for involving the research community in our endeavor.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Many dialects and extensions of the i* modeling language have been proposed since its introduction in the 1990s. Although these proposals demonstrate the popularity of the language in the research community and allow adaptation of the framework to a variety of domains (e.g., security, law, service-oriented architectures), they have also created difficulties in learning, teaching, and applying i* consistently.</p>
      <p>
        iStar 2.0 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is the result of a collective effort of the i* community aimed at overcoming these difficulties by defining a standard core set of concepts. Given the objectives of iStar 2.0, our aim is twofold: (a) to measure how well the language achieves those objectives, and (b) to inform further developments with empirical evidence.
      </p>
      <p>More specifically, our research question is the following: Does iStar 2.0 provide a solid and unified basis for teaching and supporting ongoing research on goal-oriented requirements engineering? Toward answering this question, we identify several relevant qualities and provide an initial roadmap for the empirical studies to be conducted to evaluate iStar 2.0 against those qualities.</p>
      <p>The remainder of the paper is structured as follows. Section 2 includes a brief literature review of empirical evaluations of modeling languages and of i*. In Section 3, we define the set of qualities to be empirically evaluated and a roadmap defining the timeline for implementing these evaluations. Finally, we draw some conclusions in Section 4.</p>
      <p>Copyright © 2016 for this paper by its authors. Copying permitted for private and academic purposes.</p>
    </sec>
    <sec id="sec-2">
      <title>Empirical Evaluation of Modeling Languages</title>
      <p>There is a variety of empirical evaluations in the area of modeling languages in general, and of the i* modeling language in particular. This section provides a brief summary of these studies, focusing on the qualities they evaluate.</p>
      <p>
        Lindland et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] propose a framework that identifies three categories of qualities related to modeling languages (syntactic, semantic, and pragmatic), quality goals for each category, and the means for achieving these goals. The semantic qualities refer to the validity and completeness of the language and of the models generated using the language, syntactic qualities are related to the syntax of the language, and pragmatic qualities concern the understandability of the language and its application.
      </p>
      <p>
        Guizzardi et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] suggest domain and comprehensibility appropriateness as
key qualities of a modeling language, relying on verifying lucidity, soundness,
laconicity, and completeness properties of model instances. These properties are
then related to corresponding language properties: construct overload, construct
excess, construct redundancy, and ontological expressiveness.
      </p>
      <p>
        Frank [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] proposes a method to evaluate reference models, where the evaluation concerns both the general qualities of conceptual models and the reusability of the reference domain. The framework distinguishes four evaluation perspectives: economic, deployment, engineering, and epistemological. Each perspective is structured into multiple aspects, for each of which a success criterion is provided.
      </p>
      <p>
        Interest in i* evaluation appears to be on the rise, with studies covering both the evaluation of the language and the applicability of i* in industry. We distinguish between different kinds of studies. Some works evaluate the use of an i* extension by comparing it to the use of i* [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Other approaches compare i* with other goal-oriented modeling languages such as KAOS [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or Techne [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Finally, other studies evaluate specific characteristics of the language such as
visual effectiveness [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Most of the studies providing empirical evidence in the literature evaluate the applicability of i* for different purposes in industrial environments. Elahi et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] studied the use of i* for gathering and understanding knowledge in an organization, concluding that some constructs are not used by practitioners. Carvallo et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] focus on socio-technical systems and conclude that some models are too difficult to read and modify due to their complexity. A variety of real use cases were presented at the i* Showcase in 2011 (http://www.city.ac.uk/centre-for-human-computer-interaction-design/istar11).
      </p>
    </sec>
    <sec id="sec-3">
      <title>iStar 2.0 Evaluation Roadmap</title>
      <p>In order to evaluate iStar 2.0, we need to define the set of language qualities that we want to assess. Based on the review of Section 2, we present a number of qualities to evaluate, then discuss suitable empirical methods, and finally devise an initial roadmap for the empirical evaluation.</p>
      <sec id="sec-2-1">
        <title>Qualities to be evaluated</title>
        <p>
          As iStar 2.0 was not defined as a new language, but rather as a set of core concepts refining the original i* [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], backward compatibility is critically important. As a community, we need to collect evidence to determine whether iStar 2.0 meets the needs of the users of i*. The open nature of i* comes with a drawback that iStar 2.0 tries to mitigate: a steep learning curve that makes it hard to employ the language in industry. Therefore, learnability is also a priority quality to be evaluated. Keeping the open nature of i* was also one of the main objectives during the definition of iStar 2.0. Consequently, we also need to consider the extensibility quality, i.e., to evaluate whether iStar 2.0 is a suitable baseline for extensions.
        </p>
        <p>In addition to these qualities, we consider qualities that assess the language itself, such as expressiveness and syntactic correctness. Regarding expressiveness, we are interested in evaluating whether iStar 2.0 has a suitable set of constructs (checking for missing, excess, or overloaded constructs). Syntactic correctness evaluates whether modelers can easily detect and correct syntactic errors using iStar 2.0 and whether the language can prevent syntactic errors.</p>
        <p>
          We have also included qualities not directly assessed during the definition
of iStar 2.0, such as scalability. The detailed set of qualities to be evaluated
is included in Table 1. We categorize the qualities based on the classification
provided in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Empirical Methods: Design Dimensions</title>
        <p>In order to evaluate the qualities listed in Table 1, several empirical studies must
be designed and conducted. We envision the application of several empirical
methods, including experiments, surveys and case studies. We can enumerate a
number of dimensions that must be considered when designing such studies.</p>
        <p>The choice of subjects participating in the studies is a dimension that must be determined for each study. To classify the subjects, we can use two categories: expertise and background (industry or academia). We need to clearly define a set of i* experts for inclusion in the backward compatibility evaluation. For practical applicability, we need to involve practitioners from industry. For the other qualities, we can treat the expertise and the background of participants as variables in the study.</p>
        <p>We also need to decide when to evaluate the iStar 2.0 language in isolation and when a comparative analysis against i* is needed. The same reasons that lead us to pay special attention to backward compatibility and learnability suggest that, for these specific qualities, we should conduct comparative analyses. Meanwhile, the evaluation of the other qualities can focus on iStar 2.0 alone.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Roadmap</title>
        <p>
          From an empirical software engineering standpoint, we can identify two main phases for the evaluation of iStar 2.0: formative and summative. The formative phase corresponds to the tasks related to the development of the proposal, providing partial empirical validation for the resulting proposal, while the summative phase evaluates whether the proposal can be implemented in the real world. We are currently in the formative phase, and more precisely in the treatment validation step of Wieringa’s design science methodology [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <table-wrap id="table-1">
          <label>Table 1</label>
          <caption>
            <p>Qualities to be evaluated, expressed as evaluation questions</p>
          </caption>
          <table>
            <tbody>
              <tr><td>Does iStar allow one to capture a sufficient number of concepts in a socio-technical domain?</td></tr>
              <tr><td>Do iStar 2.0 models have only one interpretation?</td></tr>
              <tr><td>Is iStar 2.0 able to represent the same phenomena as i*?</td></tr>
              <tr><td>Is it easy to add new concepts to iStar 2.0?</td></tr>
              <tr><td>What does the learning curve of iStar 2.0 look like?</td></tr>
              <tr><td>Does iStar 2.0 facilitate changing and updating models?</td></tr>
              <tr><td>Can iStar 2.0 be successfully applied to real-world cases?</td></tr>
              <tr><td>Does iStar 2.0 support the creation and analysis of large problems?</td></tr>
              <tr><td>Comprehensibility: Can iStar 2.0 models be understood?</td></tr>
              <tr><td>Is the effort required to use iStar 2.0 worth the benefits?</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>We divide the proposed empirical evaluation plan into three phases, comprising a total of five stages. The first two phases correspond to the formative and summative phases in empirical research, while the third phase describes complementary activities:</p>
        <list list-type="bullet">
          <list-item>
            <p>In the formative phase, the evaluation will concern the qualities that guided the design decisions for iStar 2.0. These qualities include keeping a core set of primitives (stage 1), providing a clear meaning for such primitives (stage 2), and flattening the learning curve for new users (stage 3).</p>
          </list-item>
          <list-item>
            <p>In the summative phase, the proposal (in our case, iStar 2.0) should be tested for applicability to real-world cases (stage 4).</p>
          </list-item>
          <list-item>
            <p>The third phase includes the study of additional properties that do not directly relate to the use of iStar 2.0 itself, but rather to its capability to be adapted for specific cases or domains (stage 5).</p>
          </list-item>
        </list>
      </sec>
      <sec id="sec-4">
        <title>Conclusions</title>
        <p>During the last few years, the i* community has been working on the definition of a standard, core version of the language, resulting in iStar 2.0. The main goal of this effort was to facilitate the consistent learning, teaching, and application of i*. After the definition of iStar 2.0, the natural next step is evaluating the proposal to gather evidence of whether it achieves the expected qualities.</p>
        <p>In this paper, we emphasize the necessity of evaluating iStar 2.0 through empirical studies. Our first step is the identification of a set of qualities against which iStar 2.0 should be evaluated. We also discuss some key dimensions that need to be defined when conducting these empirical studies. Interestingly, many of the qualities we identified are pragmatic; we surmise this is linked to the limited adoption of i* in industry.</p>
        <p>We prioritize the evaluation tasks for the qualities by grouping them into five stages. Some of these tasks are labeled as formative evaluation, others are part of the summative evaluation, and the remaining tasks are additional studies of the extensibility and customizability of iStar 2.0. Based on this grouping, we define a roadmap proposing an order of execution for the various evaluation stages.</p>
        <p>The next steps consist of conducting empirical studies addressing one or more of the identified qualities of iStar 2.0. Although we plan to design and conduct several studies ourselves, an effective evaluation of the language will require a community-wide effort. We encourage i* community members to use and evaluate iStar 2.0, keeping in mind the qualities presented here, and to report the results publicly. Our hope is that, as a community, we build evidence both to support the usefulness of iStar 2.0 and to shape future versions of the language.</p>
        <p>Acknowledgments. This work is supported by the EOSSAC project, funded by the Ministry of Economy and Competitiveness of the Spanish government (TIN2013-44641-P), an ERC Marie Skłodowska-Curie Intra-European Fellowship (PIEF-GA-2013-627489), and a Natural Sciences and Engineering Research Council of Canada Postdoctoral Fellowship (Sept. 2014 – Aug. 2016). The second and third authors have received funding from the SESAR Joint Undertaking under grant agreement No. 699306, under the European Union’s Horizon 2020 research and innovation programme.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Carvallo</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Franch</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <article-title>: “On the Use of i* for Architecting Hybrid Systems: A Method and an Evaluation Report”</article-title>
          .
          <source>In: Lecture Notes in Business Information Processing. Springer Science + Business Media</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dalpiaz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franch</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Horkoff</surname>
          </string-name>
          , J.:
          <article-title>iStar 2.0 Language Guide</article-title>
          .
          <source>CoRR abs/1605.07767</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Elahi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Annosi</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          <article-title>: “Modeling Knowledge Transfer in a Software Maintenance Organization - An Experience Report and Critical Analysis”</article-title>
          .
          <source>In: Lecture Notes in Business Information Processing</source>
          .
          <year>2008</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>“Evaluation of Reference Models”</article-title>
          . In:
          <source>Reference modeling for business systems analysis. IGI Global</source>
          ,
          <year>2006</year>
          , pp.
          <fpage>118</fpage>
          -
          <lpage>140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Guizzardi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pires</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>van Sinderen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : “
          <article-title>An Ontology-Based Approach for Evaluating the Domain Appropriateness and Comprehensibility Appropriateness of Modeling Languages”</article-title>
          .
          <source>In: Model Driven Engineering Languages and Systems</source>
          . Springer Science + Business Media,
          <year>2005</year>
          , pp.
          <fpage>691</fpage>
          -
          <lpage>705</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Horkoff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aydemir</surname>
            ,
            <given-names>F.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mylopoulos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Evaluating Modeling Languages: An Example from the Requirements Domain</article-title>
          . In: ER (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lindland</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sindre</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Solvberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Understanding Quality in Conceptual Modeling</article-title>
          .
          <source>IEEE Softw</source>
          .
          <volume>11</volume>
          (
          <issue>2</issue>
          ),
          <fpage>42</fpage>
          -
          <lpage>49</lpage>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Matulevičius</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Heymans</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : “
          <article-title>Comparing Goal Modelling Languages: An Experiment”</article-title>
          . In:
          <source>Requirements Engineering: Foundation for Software Quality</source>
          . Springer Science + Business Media,
          <year>2007</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Moody</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heymans</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Matulevicius</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Improving the Effectiveness of Visual Representations in Requirements Engineering: An Evaluation of i* Visual Syntax</article-title>
          . In: RE (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Teruel</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navarro</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>López-Jaquero</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montero</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>González</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : “
          <article-title>Comparing Goal-Oriented Approaches to Model Requirements for CSCW”</article-title>
          .
          <source>In: Communications in Computer and Information Science</source>
          .
          <year>2013</year>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Wieringa</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <source>Design science methodology for information systems and software engineering</source>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Modelling strategic relationships for process reengineering</article-title>
          . University of Toronto (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>