Extension of iStar for Big data projects

Chabane Djeddi, LIRE Laboratory, Constantine 2 - A. Mehri University, Algeria, chabane.djeddi@univ-constantine2.dz
Nacer eddine Zarour, LIRE Laboratory, Constantine 2 - A. Mehri University, Algeria, nasro.zarour@univ-constantine2.dz
Pierre-Jean Charrel, IRIT Laboratory, Toulouse 2 Jean Jaurès University, France, charrel@univ-tlse2.fr

Abstract

Big data is characterized by the volume, variety, velocity, and complexity of the data, which make it very difficult to handle. Requirements engineering (RE), for its part, is very important for the success of any software system. The importance of requirements engineering for Big data projects is therefore evident, yet no RE method addresses them. We have analyzed the fields of Big data and RE to determine how RE can take the properties of Big data into account. This paper presents BiStar, an extension of iStar for Big data projects that supports these properties in the elicitation step of the RE process. Our extension covers the characteristics of Big data, which allows a better elicitation of the requirements and therefore facilitates data analysis.

Keywords: Big data, Requirements engineering, iStar, iStar extension

1. Introduction

Companies store a large amount of data every day as transactions that are important to them. Over time, however, managing these data with traditional systems becomes impossible. In terms of analysis time as well, it becomes challenging to guarantee efficient data processing in a short time. The data are structured, but also semi-structured and even unstructured. This heterogeneity generates data incompatibility issues that threaten integrity and consistency.

Big data has its own properties (Volume, Velocity, Variety, Veracity, and Value) [ChML14], [KaWG13], [Madd12], [OtPe15], [ShOt16], which make it crucial and specific. The authors of [OtPe15], [ShOt16] confirmed that Big data software must include all three parameters (functional feature, time constraint, and verifiability during some period) to completely define the requirement specification for Big data projects. However, none of the existing models includes the time constraint or the verification period; they specify requirements only in terms of functional features. Until now, no work has created an RE method for Big data projects or adapted an existing one.

Modeling languages are classified into two classes: (i) Domain Specific Modeling Languages (DSML), which model only one given domain, and (ii) General Purpose Modeling Languages (GPML), which model any domain [GCAH18]. iStar [Ref96] is a GPML that supports the elicitation step in RE for any domain. It is widely used and adopted by the research community [GCAH18].

iStar has been extended for adoption in many domains (security, data warehouses, socio-technical systems, etc.) [GCAH18]. In our work, we propose BiStar (for Big data iStar), a DSML dedicated to modeling requirements for Big data. BiStar is essentially based on iStar, so as not to recreate everything from scratch. We extend iStar to cover the properties of Big data. In this way, we benefit from iStar and add to it what is needed and specific to Big data.

2. Literature reviews

In this section, we briefly describe the two domains related to our work: requirements engineering and Big data.

2.1 Requirements engineering

The primary criterion for the success of any software is the degree to which it satisfies the goals fixed by the stakeholders. Requirements engineering (RE) is the process of discovering these goals [NuEa00].

Copyright © by the paper's authors. Copying permitted only for private and academic purposes.
In: Proceedings of the 3rd Edition of the International Conference on Advanced Aspects of Software Engineering (ICAASE18), Constantine, Algeria, 1-2 December 2018, published at http://ceur-ws.org

"Requirements engineering is the branch of software engineering concerned with the real-world goals for, functions of, and constraints on software systems. It is also concerned with the relationship of these factors to precise specifications of software behavior, and to their evolution over time and across software families." [IeAI97].

The objective of RE is to know the requirements of the stakeholders and to verify them in order to arrive at an agreement on the requirements. To achieve this, we perform the activities of elicitation, negotiation, documentation, validation, and management of the requirements. One of the difficult parts of building a software program is deciding exactly what the software should do. RE helps us to understand the problem. By studying the RE specifications precisely, we can even estimate the cost of the project. Moreover, RE also helps to know the limits of our system [MiNa11].

2.1.1 The steps of RE

RE is usually divided into four steps [KoSo98]: (i) Requirements elicitation, (ii) Requirements analysis and negotiation, (iii) Requirements documentation, (iv) Requirements validation.

Requirements elicitation: serves to capture the requirements. It is usually divided into five sub-steps [ZoCo05]: (i) understanding the application domain, (ii) identifying the sources of requirements, (iii) analyzing the stakeholders, (iv) selecting the techniques, approaches, and tools to use, (v) eliciting the requirements from stakeholders and other sources.

Requirements analysis and negotiation: focuses on reviewing and understanding the elicited requirements and verifying their quality in terms of accuracy, completeness, clarity, and consistency.

Requirements documentation: the requirements obtained from the previous steps are documented. The requirements document can be considered a basis for controlling changes and evaluating future products and processes (system design, system test cases, and validation) [MiNa11].

Requirements validation: serves to control quality. It means confirming that the requirements are complete and well written, and that they meet the needs of the customer. This step may require repeating other requirements development phases because of identified deficiencies, gaps between requirements, additional information, and other issues. The implemented software product is validated in the test phase of the software life cycle on the basis of its requirements.

In this work, we are interested in the first step, requirements elicitation, because every other RE step depends on it and none can be carried out without it.

2.1.2 The approaches of RE

The literature [ZoCo05] identifies three basic approaches to RE: (i) Goal Based Approaches, (ii) Scenarios Based Approaches, (iii) Viewpoints Based Approaches. These approaches can be modified or mixed to create new approaches.

The fundamental premise of goal based approaches (GORE) is high-level goals. These goals represent objectives for the system. They are decomposed (usually using AND and OR relationships) and elaborated (e.g. with "Why" and "How" questioning) into sub-goals, which are then further refined until elementary requirements are elicited [ZoCo05]. Several methods can be considered as belonging to GORE: the iStar Framework [Ref96], NFR [ChPr09], and KAOS [Vanl01]. Among all GORE methods, KAOS and iStar have been the most cited [WeOP09]. In our work, we choose to extend iStar because it is widely used in academic research and is readily extensible [GCAH18].

Scenarios Based Approaches use narrative and specific descriptions of current and future processes, including actions and interactions between the users and the system. Like use cases, scenarios do not typically consider the internal structure of the system, and they require an incremental and interactive approach to their development. Naturally, it is important when using scenarios to collect all the potential exceptions for each step [ZoCo05].

Viewpoints Based Approaches aim to model the domain from different perspectives in order to develop a complete and consistent description of the target system [ZoCo05]. Initially, the requirements are opaque, informal, and expressed only through personal views. These views reflect the skills, objectives, and roles of each participant. The elicitation activity is, therefore, a collective activity. The expression of multiple views allows for better elicitation of requirements.

2.2 Big data

In this section, we present Big data, passing briefly through its definitions, properties, and importance.

There is no exact definition of Big data, even though several definitions have appeared. Big data means a large dataset that cannot be processed by traditional tools [ChML14]. Big data can be seen from several perspectives: (i) from the infrastructure perspective, Big data is seen as a significant amount of data characterized by Volume, Velocity, Variety, Veracity, and Value; (ii) from the analysis perspective, Big data is seen as events; (iii) from the business perspective, Big data can be considered as the output that can be used directly to improve the work [OtPe15], [ShOt16]. The most crucial problem is not how to store data, but rather how to analyze heterogeneous data in a short time [Madd12].

There is a solid relationship between Big data and other technologies such as Cloud and IoT. Cloud can be an infrastructure for Big data, and IoT is considered the most massive source of Big data [ChML14]. Consequently, our contribution to Big data will influence other technologies.

2.2.1 The properties of Big data

The variety: the data manipulated today do not come in a single representation. There are structured data, but also semi-structured and even unstructured data such as web pages and social networks, which makes these data very difficult to manipulate using traditional systems [ChML14], [KaWG13].

The volume: the name Big data itself indicates that volume plays an important role in the creation of the Big data concept, since the data handled today are on the order of zettabytes at most large companies. This is, of course, one of the limitations of traditional systems [ChML14], [KaWG13].

The velocity: the speed of data arriving from various sources is critical, which makes it difficult for traditional systems to cope with the situation [KaWG13].

The value: the stored data is important. A user can execute queries against stored data or may misuse existing data, and this can cause false results for decision makers [KaWG13].

The complexity: this concerns how to ensure the correlation and the links between the data, because Big data is collected from several heterogeneous sources. This is very important to guarantee the integrity of the data and to avoid unmanageable situations [KaWG13].

2.2.2 The importance of Big data

Big data is of great importance in many fields (industry, risk analysis, social networks).

In industry: companies store a large amount of data every day as transactions that are important to them. Over time, however, managing these data with traditional systems becomes impossible, because traditional systems cannot support such a large amount of data. Here we can clearly see the utility of Big data for businesses, because it allows them to store a significant amount of data for a long time [KaWG13].

In risk analysis: many companies use their data to calculate risk. Without Big data technologies, they can use only small amounts of data. With the arrival of these technologies, it becomes possible to analyze a large amount of data, which allows better risk management [KaWG13].

In social networks: the most common use of Big data is in the areas of social networks and user preferences. Social networks use a large amount of data collected from user reviews and choices. In this way, they can analyze the data and determine the preferences of the users in a short time, in order to improve their products and adapt their decisions to hold a good position in the market [KaWG13].

3. Case study

We present the case study in this section so that we can use it when modeling with iStar and with BiStar (the extended iStar) that we propose in the following sections. This example will accompany us throughout the paper.

We take the example of the 2019 presidential elections in Algeria. The supporters of one political camp want to increase their candidate's chances of success. To that end, they want to create a Big data project to study the opinions of the people. This lets them identify the key points on which to focus in order to launch targeted advertisements that improve their candidate's chances of success.
To do this, they collect data from social networks and analyze them to identify the essential points in the opinions of the different categories of the population. On the basis of these points, they draw up a presidential plan and present it to the people. After that, they collect the opinions of the people in order to revise the plan and produce targeted advertising.

This example is a Big data project, because a large amount of data of different natures (structured, semi-structured, and even unstructured) must be manipulated within a limited time. Therefore, these data cannot be processed using traditional systems.

4. iStar

In this section, we explain the iStar method and its diagrams. iStar [DaFH16], [I*wi00], [Ref96] is a goal-oriented RE method that is widely used for requirements elicitation. We first identify the actors and the strategic dependencies between them, and then we detail the reasoning of each actor. iStar consists of two models: the Strategic Dependency (SD) Model and the Strategic Rationale (SR) Model.

4.1 The Strategic Dependency (SD) Model

The strategic dependency model represents a network of strategic dependencies between the different actors of the future system. One actor (the depender) depends on another (the dependee) to accomplish a goal. The model is made of nodes and links: the nodes represent the actors, and the links represent the dependencies. There are four types of dependencies: (i) a goal dependency, in which the depender depends on the dependee to accomplish a goal; (ii) a task dependency, in which the depender depends on the dependee to carry out a task; (iii) a resource dependency, in which the depender depends on the dependee to provide it with a resource; (iv) a softgoal dependency, which expresses a performance dependency between two actors.

Figure 1 represents the application of the strategic dependency (SD) model of the iStar method to the case study of the presidential elections. The "candidate" depends on the "elector" for the goal of winning the elections. The "system to be developed" depends on the "candidate" to accomplish the task of offering him its information. The "elector" depends on the "system to be developed" to launch targeted advertising. The "social networks" depend on the "elector" to collect information about their preferences. The "system to be developed" depends on the "social networks" to receive the elector information resource. The "advertising manager" depends on the "system to be developed" to provide the summary information on electors. The "system to be developed" depends on the "advertising manager" to accomplish the goal of developing targeted advertising.

Figure 1: Strategic dependency (SD) model for elections

4.2 The Strategic Rationale (SR) Model

The Strategic Rationale (SR) Model is used to detail the reasoning of each actor separately. It represents what happens inside an actor, which allows a deep understanding of the process.

Figure 2 shows the application of the strategic rationale model to the case study of the presidential elections.

Figure 2: Strategic rationale model for elections

5. BiStar: An extension of iStar for Big data projects

In this section, we present BiStar (Big data iStar), an extension of the iStar method for Big data projects. We start by clarifying the need for an extension of iStar to support the elicitation of requirements for Big data projects. We then explain the concepts to add, and finally we apply BiStar to the case study of the presidential elections.

5.1. The needs for an extension of iStar

In this part, we explain the situation and the points that we consider critical.

Elicitation is the most crucial step in RE. If it is not done well, it can lead to projects that do not respond well to the needs of stakeholders. In the case of Big data projects, elicitation becomes even more complicated. A Big data project must not only meet a need, but also respond in a very short time while processing a large amount of data of a specific nature (structured, semi-structured, unstructured).

During the RE phase for Big data projects, we are interested in what to collect rather than how to collect it, because current RE techniques and approaches remain valid for Big data projects. Big data projects are like traditional projects with respect to how requirements are collected. It is on the Big data properties (Volume, Velocity, Variety) that we focus, in order to collect and model them in the RE phase with BiStar. In addition, the papers [OtPe15], [ShOt16] confirmed that Big data software must include all three parameters (functional feature, time constraint, verifiability during some period) to completely define the requirement specification for Big data projects.

5.2. The Concepts added to iStar

Based on the requirements needs for Big data reported in the literature [ChML14], [KaWG13], [Madd12], [OtPe15], [ShOt16], we have chosen to add the concepts of execution time, volume of data to process, variety of data, and durability of a goal. In the rest of this subsection, we explain each concept and clarify why we added it.

5.2.1. The execution time

In a Big data project, the execution time must be exact. A late result is considered a wrong one.

Consider the case study of the presidential elections presented in section 3. The stakeholder needs the goal "Generate synthesized information on the profiles of electors" but does not specify the time within which it should be achieved. The project is carried out and completed, but the goal had to be achieved within 15 days and the result arrives later than that. The project has therefore failed to satisfy the stakeholder's need. We conclude that the execution time of each goal must be specified at the beginning of the project.

5.2.2. The volume of data to process

The volume of data is one of the most important features of Big data projects. The volume is often large, but stakeholders are not aware of what can and cannot be done. Even with Big data technologies (such as Hadoop and NoSQL systems), volume remains a crucial point when talking about Zettabytes [KaWG13].

In the case study of the presidential elections presented in section 3, the stakeholder needs the goal "Generate synthesized information on the profiles of electors" but does not specify the volume of data that must be processed. However, the goal requires analyzing 100 Zettabytes of data, which the project was not designed to handle, so the project has failed. The volume of data of each goal must also be specified at the beginning of the project.

5.2.3. The variety of data

In Big data projects, data come in different representations (structured, semi-structured, and unstructured). Building a Big data project that manipulates semi-structured data is different from building one that manipulates unstructured data.

In the above example (see section 3), the stakeholder does not specify the nature of the data that must be processed, whereas the goal requires analyzing semi-structured and unstructured data. Consequently, the nature of the data of each goal must also be specified at the beginning of the project.

5.2.4. The durability of a goal

Big data projects are built to meet needs during specified periods. Over time, their goals may cease to satisfy the stakeholders, so the period during which a requirement must remain satisfied has to be agreed on from the beginning.

In the case study considered in section 3, the stakeholder does not specify the durability of its goal. When we validate the project with the stakeholder, he says it is not what he wants: the goal must be satisfied during the whole election. The project has therefore failed to satisfy the need of the stakeholder. The durability of a goal must also be specified at the beginning of the project.

iStar does not support the properties presented above, which prevents a complete and refined elicitation of the requirements for Big data. To support Big data projects with the iStar method, we must make sure that the goals are attached to their properties (execution time, volume of data to be processed, variety of data, and durability of the goal). Figure 3 shows graphically the concepts added to the Strategic Dependency (SD) Model and the Strategic Rationale (SR) Model.

Figure 3: Concepts added to BiStar

5.3. The application of BiStar on the case study

Figure 4 shows the application of the BiStar strategic dependency model to the example of the presidential elections.

Figure 4: Strategic Dependency model of BiStar for the elections

The meaning explained in section 4.1 is unchanged. In BiStar, however, new concepts are linked to the goal "Develop targeted advertising": this goal must be achieved within 10 days, by analyzing 100 Zettabytes of unstructured and semi-structured data, and it must be in operation during the elections. In this way, the requirements become more complete and more refined.

Figure 5 shows the application of BiStar's strategic rationale model to the example of the presidential elections.
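Before reading the SR model in detail, the four properties that BiStar attaches to a goal can be sketched as a small data structure. This is only an illustrative encoding, not part of BiStar or iStar: the class name `BiStarGoal`, its field names, and the `missing_properties` helper are hypothetical, and the volume is kept as free text because the paper does not define a unit system. The example values are the ones stated for the goal "Develop targeted advertising" in the SD model.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class BiStarGoal:
    """A goal plus the four properties BiStar attaches to it (hypothetical encoding)."""

    name: str
    execution_time_days: Optional[int] = None  # time within which the goal must be achieved
    volume: Optional[str] = None               # volume of data to process, kept as free text
    variety: FrozenSet[str] = frozenset()      # e.g. {"semi-structured", "unstructured"}
    durability: Optional[str] = None           # period during which the goal must stay satisfied

    def missing_properties(self) -> List[str]:
        """Return the names of the BiStar properties that are still unspecified."""
        missing = []
        if self.execution_time_days is None:
            missing.append("execution time")
        if self.volume is None:
            missing.append("volume of data")
        if not self.variety:
            missing.append("variety of data")
        if self.durability is None:
            missing.append("durability")
        return missing


# A plain iStar goal: only the functional feature is recorded.
istar_goal = BiStarGoal(name="Develop targeted advertising")

# The same goal with the annotations given in the text for the SD model.
bistar_goal = BiStarGoal(
    name="Develop targeted advertising",
    execution_time_days=10,
    volume="100 Zettabytes",
    variety=frozenset({"unstructured", "semi-structured"}),
    durability="during the elections",
)

print(istar_goal.missing_properties())   # all four properties are reported as missing
print(bistar_goal.missing_properties())  # nothing is missing
```

A check of this kind mirrors the argument of section 5.2: a goal elicited without these properties is under-specified, and that can be detected at the beginning of the project rather than at validation.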
The meaning explained in section 4.2 is unchanged, but in BiStar new concepts are linked to the goal "Design an election program" and to the goal "Generate synthesized information on the profiles of electors". The goal "Design an election program" must be achieved within 2 days, by analyzing 30 Petabytes of unstructured and semi-structured data, and it must be functional during the elections. The goal "Generate synthesized information on the profiles of electors" must be achieved within 15 days, by analyzing 100 Zettabytes of unstructured and semi-structured data, and it must be functional during the elections.

Figure 5: Strategic Rationale model of BiStar for elections

6. Conclusion

In this work, we have proposed BiStar (Big data iStar), a new extension of iStar to elicit the requirements for Big data projects. This extension takes into account the properties of Big data projects to ensure a proper elicitation of the requirements.

We applied iStar and BiStar to the same case study of the presidential elections to show the utility of BiStar. We note that without BiStar, some important requirements can be missed. This modest research is a first attempt to fill the gap in the adaptation of RE methods for Big data projects.

We are completing this work by applying the rest of the life cycle activities of RE (specification, validation...). We hope that the research community will give more attention to this field.

7. References

[ChML14] CHEN, MIN ; MAO, SHIWEN ; LIU, YUNHAO: Big Data: A Survey. In: Mobile Networks and Applications vol. 19 (2014), no. 2, pp. 171–209

[ChPr09] CHUNG, LAWRENCE ; DO PRADO LEITE, JULIO CESAR SAMPAIO: On non-functional requirements in software engineering. In: Conceptual modeling: Foundations and applications : Springer, 2009, pp. 363–379

[DaFH16] DALPIAZ, FABIANO ; FRANCH, XAVIER ; HORKOFF, JENNIFER: istar 2.0 language guide. In: arXiv preprint arXiv:1605.07767 (2016)

[GCAH18] GONÇALVES, ENYO ; CASTRO, JAELSON ; ARAÚJO, JOÃO ; HEINECK, TIAGO: A Systematic Literature Review of iStar extensions. In: Journal of Systems and Software vol. 137 (2018), pp. 1–33

[IeAI97] IEEE COMPUTER SOCIETY ; ACM SIGSOFT ; IFIP WORKING GROUP 2.9 (eds.): Classification of Research Efforts in Requirements Engineering. Los Alamitos, Calif : IEEE Computer Society Press, 1997. ISBN 978-0-8186-7740-3

[I*wi00] i* Wiki | i* Guide. URL http://istar.rwth-aachen.de/tiki-index.php?page=i*+Guide. Accessed 2017-12-05

[KaWG13] KATAL, AVITA ; WAZID, MOHAMMAD ; GOUDAR, R. H.: Big data: issues, challenges, tools and good practices. In: Contemporary Computing (IC3), 2013 Sixth International Conference on : IEEE, 2013, pp. 404–409

[KoSo98] KOTONYA, GERALD ; SOMMERVILLE, IAN: Requirements engineering: processes and techniques : Wiley Publishing, 1998

[Madd12] MADDEN, SAM: From databases to big data. In: IEEE Internet Computing vol. 16 (2012), no. 3, pp. 4–6

[MiNa11] MINA ATTARHA ; NASSER MODIRI: Focusing on the Importance and the Role of Requirement Engineering. In: Interaction Sciences (ICIS), 2011 4th International Conference on : IEEE, 2011, pp. 181–184

[NuEa00] NUSEIBEH, BASHAR ; EASTERBROOK, STEVE: Requirements engineering: a roadmap. In: Proceedings of the Conference on the Future of Software Engineering : ACM, 2000, pp. 35–46

[OtPe15] OTERO, CARLOS E. ; PETER, ADRIAN: Research Directions for Engineering Big Data Analytics Software. In: IEEE Intelligent Systems vol. 30 (2015), no. 1, pp. 13–19

[Ref96] ERIC SIU-KWONG YU: Modelling strategic relationships for process reengineering, University of Toronto, PhD Thesis, 1996

[ShOt16] SHARMA, KAPIL ; OTHERS: Quality issues with big data analytics. In: Computing for Sustainable Global Development (INDIACom), 2016 3rd International Conference on : IEEE, 2016, pp. 3589–3591

[Vanl01] VAN LAMSWEERDE, AXEL: Goal-oriented requirements engineering: A guided tour. In: Requirements Engineering, 2001. Proceedings. Fifth IEEE International Symposium on : IEEE, 2001, pp. 249–262

[WeOP09] WERNECK, VERA MARIA BENJAMIM ; OLIVEIRA, ANTONIO DE PADUA ALBUQUERQUE ; DO PRADO LEITE, JULIO CESAR SAMPAIO: Comparing GORE Frameworks: i-star and KAOS. In: WER, 2009

[ZoCo05] ZOWGHI, DIDAR ; COULIN, CHAD: Requirements elicitation: A survey of techniques, approaches, and tools. In: Engineering and managing software requirements : Springer, 2005, pp. 19–46