Towards Intelligent Technology for Error Detection and Quality Evaluation of Business Process Models

Andrii Kopp and Dmytro Orlovskyi

National Technical University “Kharkiv Polytechnic Institute”, Kyrpychova str. 2, Kharkiv, 61002, Ukraine

Abstract
Business process modeling is an essential technique of business process management, used to align the business and information technology sides of an organization. Business process models are graphical diagrams used to capture, analyze, and improve organizational activities. High-quality business process models are used to detect inefficiencies in enterprise workflows and gather requirements for supporting software systems. However, poor business process models are less understandable, hard to maintain, error-prone, and may lead to expenses and time losses caused by the resulting errors. Hence, continuous quality analysis of created business process models should be introduced as a part of the business process management lifecycle in order to detect and eliminate modeling errors. In this study, we propose a connectionist system based on reinforcement learning principles that takes into account the occurrence of various modeling errors and their impact on the overall quality estimates. A software tool implementing this intelligent system was created and used to perform experiments on a large collection of business process models; the obtained results are analyzed and discussed.

Keywords
Business Process Model, Intelligent Technology, Connectionist System, Quality Evaluation, Error Detection, Reinforcement Learning

1. Introduction
Business Process Management (BPM) is an approach used to align Information Technology (IT) and business in an organization. According to [1], BPM combines management and IT approaches to achieve excellence in organizational activities. The essential technique of BPM is business process modeling. It simplifies communication and interaction between business users (i.e.
managers, process owners, and other stakeholders) and IT service providers responsible for the design, development, and maintenance of information systems in the organization [2].

In general, business processes are structured collections of activities and decision points driven by events, which take resources or information as inputs and produce products or services valuable to consumers as outputs. For example, the authors of [3] describe business processes as “chains of events, activities and decisions”, while business process models are considered descriptions of such chains.

Business process models are graphical diagrams similar in some ways to workflow charts. The goal of business process modeling is to describe organizational activities in a way that is convenient for further analysis. Well-designed business process models can help to find bottlenecks and other “weak spots” in organizational workflows, find opportunities for improving enterprise IT systems, or even introduce new IT solutions for activities that are not yet automated.

Therefore, business process models that depict organizational activities must be of high quality to ensure they are understandable and maintainable by all parties involved in BPM projects. Poorly designed business process models are not only useless for the analysis and improvement of enterprise workflows, they can even lead to new mistakes or inefficiencies when used to plan new or improved business processes, capture software requirements for enterprise IT systems, etc. Furthermore, poorly designed business process models may reflect inefficiencies of real processes. Hence, the quality of created business process models should be carefully controlled for the early detection and prevention of errors at the design stage, before they become real errors in organizational workflows and supporting IT systems, causing unpredicted expenses, time losses, or even a dangerous impact on people and the environment in critical industries.

IntelITSIS’2023: 4th International Workshop on Intelligent Information Technologies and Systems of Information Security, March 22–24, 2023, Khmelnytskyi, Ukraine
EMAIL: kopp93@gmail.com (A. Kopp); orlovskyi.dm@gmail.com (D. Orlovskyi)
ORCID: 0000-0002-3189-5623 (A. Kopp); 0000-0002-8261-2988 (D. Orlovskyi)
2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

This paper is organized in the following way: Sub-section 1.1 describes the state of the art in the field of business process modeling and quality analysis of business process models, Sub-section 1.2 introduces the problem statement, Section 2 proposes an intelligent technology for error detection and quality evaluation of business process models, and Section 3 presents and discusses experimental results obtained using the proposed approach.

1.1. Related Work
Numerous business process modeling notations, languages, and standards have been proposed since the 1980s and are still used today. Among the most popular and widely used process descriptions were IDEF0, based on the IDEF (Integrated Definition) standards proposed by the U.S. Air Force, and DFDs (Data Flow Diagrams). These structural analysis models were popular in the 1980s until they were gradually replaced by EPC (Event-driven Process Chain) models proposed in the 1990s by the IDS Scheer company. Today, the leading and de facto standard business process modeling notation is BPMN (Business Process Model and Notation), proposed in 2005 by the Business Process Management Initiative and updated by the Object Management Group in 2011 (BPMN 2.0).
According to one of the latest surveys in this field [4], only 4% of respondents use IDEF-based notations, 18% use EPC, and 64% use BPMN to describe organizational business activities. The authors of [3] characterize BPMN models as workflow descriptions that depict sequences of tasks and events using control flows. The primitives (or symbols) of the BPMN notation are shown in Fig. 1 [5].

Figure 1: Essential BPMN symbols

According to Fig. 1, BPMN business process diagrams mark the beginning of business process instances with Start Events and their completion with End Events. Other things that happen instantaneously during process execution are represented by Intermediate Events. Process activities, represented by Tasks and Sub-Processes (i.e. expanding workflows), describe work units with a given duration. Sequence Flows are used to logically connect business process elements in a chain [3].

Complex BPMN models describe various workflow scenarios through branching and merging with Exclusive Gateways (XOR logic of process paths), Inclusive Gateways (OR logic of process paths), and Parallel Gateways (AND logic of process paths). Process boundaries are described by Pools, workflow participants are described by Lanes, interactions between pools are described by Message Flows, and documents or information resources are described by Data Objects linked to activities using Associations [6].

Numerous studies and frameworks are devoted to business process model quality, such as GoM (The Guidelines of Modeling), SEQUAL (SEmiotic QUALity), 7PMG (7 Process Modeling Guidelines), the quality framework for conceptual modeling, and many others [7].
Taking into account the definition of quality in the ISO 9001 (International Organization for Standardization) standards as the “degree to which a set of inherent characteristics fulfills requirements” [8], business process model quality should be understood and quantitatively measured as the “degree to which a model fulfills requirements of modeling rules”.

Therefore, the quantitative evaluation of business process models is possible using metrics. Some of them are based on size measurement (i.e. the number of various elements, the longest path between business process elements, etc.), gateway mismatch measurement (i.e. each split gateway should have a corresponding join gateway, similarly to brackets in mathematical expressions), connectivity analysis (i.e. the ratio between arcs and nodes), and control flow complexity analysis (i.e. the possible combinations of process states after split gateways) [9].

These structural metrics of business process models are used to evaluate their quality from the understandability and maintainability viewpoints. For example, the authors of [10] have shown that poor usage of business process symbols in BPMN models leads to poor model quality. Another paper [11] suggests thresholds to linguistically estimate business process model quality (i.e. “very good”, “good”, “average”, and “poor”) using the Control Flow Complexity (CFC) metric. The CFC and other complexity metrics, mostly originating from software engineering, were also considered in [12] to evaluate and improve maintainability as one of the business process model quality attributes.

1.2. Problem Statement
Poorly designed business process models are sources of implementation errors and of further costs associated with these errors, such as monetary expenses, time losses, or even harmful impacts on people or the environment if faulty business process models are related to critical industries.
The BPM lifecycle typically consists of business process design, implementation, monitoring, and control [12]; however, it lacks continuous quality control of the created BPMN models. Therefore, in this paper, we propose extending the BPM process with quality analysis of designed BPMN models, as shown in Fig. 2.

Figure 2: The BPM process extended by continuous quality analysis of BPMN models

However, the manual quality analysis of BPMN diagrams to detect and eliminate modeling errors can be a challenging problem. Just as software developers have compilers that detect code errors, and writers have text editors that highlight misspellings, business process designers should have dedicated tools for BPMN validation. Furthermore, such tools should take into account previous experience in business process modeling error detection.

Therefore, an intelligent system for quality evaluation of BPMN diagrams should be proposed to prevent cost and time losses, as well as other negative consequences, through early detection of business process modeling errors.

2. Materials and Methods
Let us formally represent a BPMN business process model as a directed graph structure [13]:

$$BPModel = (N, l, A), \tag{1}$$

where:
• $N = T \cup E \cup G$ is the set of business process elements: tasks (and sub-processes) $T$, events $E$, and gateways $G$;
• $G = S \cup J$ is the subset of gateways including split $S$ and join $J$ gateways;
• $l: G \to \{and, or, xor\}$ is the mapping that defines gateway types;
• $A \subseteq N \times N$ is the binary relation representing sequence flows of the business process.

Such a graph (1) can be obtained by processing the BPMN 2.0 file as an XML (eXtensible Markup Language) file, which includes the respective tags (Fig. 3).

Figure 3: The generic BPMN 2.0 file structure

Let us describe the already processed business process model as the vector of metrics:

$$X = (x_1, x_2, \ldots, x_m), \tag{2}$$

where $m$ is the number of metrics.
Hence, when starting the processing of the BPMN 2.0 file, the number of all start events $x_1$ and the number of correct start events $x_2$ (with one outgoing flow) should be found:

$$x_1 = |E_s|, \qquad x_2 = \left|\{e_s \mid out(e_s) = 1 \wedge e_s \in E_s\}\right|, \tag{3}$$

where:
• $E_s \subseteq E$ is the subset of start events;
• $out(e_s)$ is the number of outgoing sequence flows of each start event.

Then, we should find the number of all end events $x_3$ and the number of correct end events $x_4$ (with one incoming flow):

$$x_3 = |E_e|, \qquad x_4 = \left|\{e_e \mid in(e_e) = 1 \wedge e_e \in E_e\}\right|, \tag{4}$$

where:
• $E_e \subseteq E$ is the subset of end events;
• $in(e_e)$ is the number of incoming sequence flows of each end event.

The number of all tasks $x_5$ and the number of correct tasks $x_6$ (with one incoming flow and one outgoing flow) can be found using the following equations:

$$x_5 = |T|, \qquad x_6 = \left|\{t \mid in(t) = 1 \wedge out(t) = 1 \wedge t \in T\}\right|, \tag{5}$$

where:
• $in(t)$ is the number of incoming sequence flows of each task;
• $out(t)$ is the number of outgoing sequence flows of each task.

The number of all intermediate events $x_7$ and the number of correct intermediate events $x_8$ (with one incoming flow and one outgoing flow) can be found using the following equations:

$$x_7 = |E_i|, \qquad x_8 = \left|\{e_i \mid in(e_i) = 1 \wedge out(e_i) = 1 \wedge e_i \in E_i\}\right|, \tag{6}$$

where:
• $E_i \subseteq E$ is the subset of intermediate events;
• $in(e_i)$ is the number of incoming sequence flows of each intermediate event;
• $out(e_i)$ is the number of outgoing sequence flows of each intermediate event.

Finally, we should find the number of all gateways $x_9$, the number of correct gateways $x_{10}$ (that either split or join a workflow into several scenarios), and the number of inclusive (OR) gateways $x_{11}$:

$$x_9 = |G|,$$
$$x_{10} = \left|\{g \mid \big((in(g) = 1 \wedge out(g) > 1) \vee (in(g) > 1 \wedge out(g) = 1)\big) \wedge g \in G\}\right|, \tag{7}$$
$$x_{11} = \left|\{g \mid l(g) = or \wedge g \in G\}\right|,$$

where:
• $in(g)$ is the number of incoming sequence flows of each gateway;
• $out(g)$ is the number of outgoing sequence flows of each gateway.
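The metrics (3)–(7) can be illustrated with a short Python sketch. It is not the authors' script: the function names (`local_name`, `compute_metrics`), the grouping of BPMN 2.0 tags into tasks, intermediate events, and gateways, and the tiny `SAMPLE` process are assumptions made for illustration. The sketch also assumes that a correct gateway is either a split (one incoming, several outgoing flows) or a join (several incoming, one outgoing flow), and it only looks at direct children of a `process` element.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed groupings of BPMN 2.0 tags; the paper does not list them explicitly.
TASK_TAGS = {"task", "userTask", "serviceTask", "scriptTask", "manualTask",
             "sendTask", "receiveTask", "businessRuleTask",
             "subProcess", "callActivity"}
INTERMEDIATE_TAGS = {"intermediateCatchEvent", "intermediateThrowEvent"}
GATEWAY_TAGS = {"exclusiveGateway", "inclusiveGateway", "parallelGateway"}


def local_name(element):
    """Strip the XML namespace from a tag, e.g. '{ns}task' -> 'task'."""
    return element.tag.rsplit("}", 1)[-1]


def compute_metrics(process):
    """Return the metric vector [x1, ..., x11] for one <process> element."""
    children = list(process)
    incoming, outgoing = Counter(), Counter()
    for element in children:
        if local_name(element) == "sequenceFlow":
            outgoing[element.get("sourceRef")] += 1
            incoming[element.get("targetRef")] += 1

    def ids(tags):
        return [el.get("id") for el in children if local_name(el) in tags]

    def one_in_one_out(node):
        return incoming[node] == 1 and outgoing[node] == 1

    def split_or_join(node):
        # Assumed reading of "correct gateway": a pure split or a pure join.
        return ((incoming[node] == 1 and outgoing[node] > 1)
                or (incoming[node] > 1 and outgoing[node] == 1))

    starts, ends = ids({"startEvent"}), ids({"endEvent"})
    tasks, inter = ids(TASK_TAGS), ids(INTERMEDIATE_TAGS)
    gateways = ids(GATEWAY_TAGS)
    return [
        len(starts), sum(outgoing[n] == 1 for n in starts),      # x1, x2
        len(ends), sum(incoming[n] == 1 for n in ends),          # x3, x4
        len(tasks), sum(one_in_one_out(n) for n in tasks),       # x5, x6
        len(inter), sum(one_in_one_out(n) for n in inter),       # x7, x8
        len(gateways), sum(split_or_join(n) for n in gateways),  # x9, x10
        len(ids({"inclusiveGateway"})),                          # x11
    ]


# Hypothetical process: an OR split whose branches both end in one end event.
SAMPLE = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="p1">
    <startEvent id="s"/>
    <task id="t1"/>
    <inclusiveGateway id="g1"/>
    <task id="t2"/>
    <endEvent id="e"/>
    <sequenceFlow id="f1" sourceRef="s" targetRef="t1"/>
    <sequenceFlow id="f2" sourceRef="t1" targetRef="g1"/>
    <sequenceFlow id="f3" sourceRef="g1" targetRef="t2"/>
    <sequenceFlow id="f4" sourceRef="g1" targetRef="e"/>
    <sequenceFlow id="f5" sourceRef="t2" targetRef="e"/>
  </process>
</definitions>"""

root = ET.fromstring(SAMPLE)
processes = [el for el in root.iter() if local_name(el) == "process"]
metrics = compute_metrics(processes[0])
print(metrics)
```

For this sample the sketch prints `[1, 1, 1, 0, 2, 2, 0, 0, 1, 1, 1]`: the single end event has two incoming flows, so $x_4 = 0$, and the one gateway is a valid split but is inclusive, so $x_{11} = 1$.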
Then, using the vector $X$ (2) of $m = 11$ elements defined by equations (3)–(7), we should obtain the following binary vector of errors in a BPMN model:

$$R = (r_1, r_2, r_3, r_4, r_5), \tag{8}$$

where:
• $r_1 \in \{0, 1\}$ signals the presence of invalid start events;
• $r_2 \in \{0, 1\}$ signals the presence of invalid end events;
• $r_3 \in \{0, 1\}$ signals the presence of invalid tasks and sub-processes;
• $r_4 \in \{0, 1\}$ signals the presence of invalid intermediate events;
• $r_5 \in \{0, 1\}$ signals the presence of invalid gateways.

Another vector, of error weights, should be introduced as well:

$$W = (w_1, w_2, w_3, w_4, w_5), \tag{9}$$

where $w_j \in [0, 1]$ are the weights of each type of business process model error, $j = \overline{1,5}$.

Let us initialize all weights with equal values $w_j = \frac{1}{5} = 0.2$, $j = \overline{1,5}$, so that $\sum_{j=1}^{q} w_j = 1$.

Hence, the following quality value signals errors in a business process model if it falls below 1:

$$Q = 1 - \sum_{j=1}^{q} w_j r_j, \tag{10}$$

where $q = 5$ is the number of elements of the vector $R$ (8).

In order to solve the problem of error detection in business process models, we have designed a connectionist system inspired by artificial neural networks, i.e. computational systems that simulate the structure of biological brains [14]. The structure of the BPMN Correctness Validation Network (BPMN-CVN) is given in Fig. 4.

Figure 4: The structure of BPMN Correctness Validation Network

In order to calculate the elements of the vector $R$ (8), we suggest using:
• the indicator (characteristic) function [15], which checks whether an element $u$ of some set $U$ belongs to a subset $B \subseteq U$:

$$\mathbf{1}_B(u) = \begin{cases} 1, & u \in B, \\ 0, & u \notin B; \end{cases} \tag{11}$$

• the Heaviside (unit) step function [15], the value of which is 1 for positive arguments and 0 otherwise:

$$H(u) = \begin{cases} 1, & u > 0, \\ 0, & u \leq 0. \end{cases} \tag{12}$$

In the equations below, $\mathbf{1}(\cdot)$ denotes the indicator of the condition in parentheses.

Then, the calculations for the detection of invalid and missing start events using the respective inputs can be given by the following computational nodes within the BPMN-CVN (Fig. 5):

$$r_1 = H\big(\mathbf{1}(x_1 \neq 1) + \mathbf{1}(x_2 \neq x_1)\big). \tag{13}$$

Figure 5: The detection of invalid start events

The calculations for the detection of invalid and missing end events using the respective inputs can be given by the following computational nodes within the BPMN-CVN (Fig. 6):

$$r_2 = H\big(\mathbf{1}(x_3 \neq 1) + \mathbf{1}(x_4 \neq x_3)\big). \tag{14}$$

Figure 6: The detection of invalid end events

The calculations for the detection of invalid tasks and intermediate events using the respective inputs can be given by the following computational nodes within the BPMN-CVN (Fig. 7):

$$r_3 = \mathbf{1}(x_6 \neq x_5), \qquad r_4 = \mathbf{1}(x_8 \neq x_7). \tag{15}$$

Figure 7: The detection of invalid tasks and intermediate events

The calculations for the detection of invalid gateways and of ambiguous inclusive (OR) gateways, which are not recommended for process modeling [16], can be given by the following computational nodes within the BPMN-CVN (Fig. 8):

$$r_5 = H\big(\mathbf{1}(x_{10} \neq x_9) + \mathbf{1}(x_{11} \geq 1)\big). \tag{16}$$

Figure 8: The detection of invalid gateways

When processing BPMN 2.0 files of business process models, the error weights represented by the vector $W$ (9) should be re-calculated taking into account the relevance of these errors: the more often they occur in business process models, the more urgent it is to identify and eliminate them. The proposed algorithm (Fig. 9) is inspired by the reinforcement learning technique, where an intelligent system learns from interaction with the environment [17].

Figure 9: The algorithm of proposed intelligent system

As can be seen from Fig. 9, the system processes BPMN models and re-calculates error weights:

$$w_j^k = \frac{c_j^k}{C^k}, \quad j = \overline{1, q}, \tag{17}$$

where $K$ is the number of business process models, $k = \overline{1, K}$. As the counts and weights reported in Section 3 suggest, $c_j^k$ is the number of errors of type $j$ detected in the first $k$ models, and $C^k = \sum_{j=1}^{q} c_j^k$ is the total number of errors detected so far.

3. Results and Discussion
The experimental calculations were performed using Camunda’s collection of BPMN models, freely available in its GitHub repository [18].
The software, written as a Python script, uses the “os” and “xml” packages to read and parse BPMN 2.0 files, and the “csv” package to store the calculation results in CSV (Comma-Separated Values) format for further discussion. As shown in Fig. 10, the software detects errors in, and calculates the quality of, each process described by a BPMN model.

Figure 10: The software workflow

In the experiments, 3729 BPMN 2.0 files containing 6137 business process descriptions were processed. Using the error detection algorithms (Fig. 5–8), 6868 errors of different types were detected in the BPMN models. The numbers of models by error type are given in Fig. 11.

Figure 11: Number of models by error types

As can be seen from Fig. 11, the models are distributed across the business process modeling error types as follows:
• 736 models with multiple start events or improperly connected start events;
• 2501 models with multiple end events or improperly connected end events;
• 1776 models with improperly connected tasks or sub-processes;
• 715 models with improperly connected intermediate events;
• 1140 models with improperly connected gateways.

Fig. 12 demonstrates the changes of the error weights $W$ (9), adjusted after processing each of the 6137 business process descriptions.

Figure 12: The changes of error weights

After the processing was over, the final error weights took the following values:
• $w_1 = 0.11$ for start events;
• $w_2 = 0.36$ for end events;
• $w_3 = 0.26$ for tasks and sub-processes;
• $w_4 = 0.10$ for intermediate events;
• $w_5 = 0.17$ for gateways.

The obtained error weights $w_j$, $j = \overline{1,5}$, reflect the following idea: the more frequently the considered errors occur in business process models, the greater the negative impact they should have on the overall quality assessment of BPMN models [19]. The proposed BPMN-CVN can now be used with the defined weights to detect errors and evaluate the quality of business process models.
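The detection nodes (13)–(16), the quality value (10), and the weight re-calculation (17) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the weight update is read as the share of each error type among all detected errors, and the metric vector `x` is an invented example. The error counts are the ones reported above.

```python
def heaviside(u):
    """Unit step (12): 1 for positive arguments, 0 otherwise."""
    return 1 if u > 0 else 0


def indicator(condition):
    """Indicator of a condition: 1 if it holds, 0 otherwise."""
    return 1 if condition else 0


def detect_errors(x):
    """Map the metric vector [x1, ..., x11] to the error vector R (8)."""
    x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11 = x
    r1 = heaviside(indicator(x1 != 1) + indicator(x2 != x1))    # (13)
    r2 = heaviside(indicator(x3 != 1) + indicator(x4 != x3))    # (14)
    r3 = indicator(x6 != x5)                                    # (15)
    r4 = indicator(x8 != x7)                                    # (15)
    r5 = heaviside(indicator(x10 != x9) + indicator(x11 >= 1))  # (16)
    return [r1, r2, r3, r4, r5]


def quality(r, w):
    """Quality value (10): Q = 1 - sum of w_j * r_j."""
    return 1 - sum(wj * rj for wj, rj in zip(w, r))


def update_weights(counts):
    """Assumed reading of (17): share of each error type among all errors."""
    total = sum(counts)
    if total == 0:
        return [1 / len(counts)] * len(counts)  # keep the equal initial weights
    return [c / total for c in counts]


# Invented metric vector: one end event with two incoming flows, one OR split.
x = [1, 1, 1, 0, 2, 2, 0, 0, 1, 1, 1]
r = detect_errors(x)
q_equal = quality(r, [0.2] * 5)

# Error counts by type as reported in the text (start, end, task,
# intermediate event, gateway).
counts = [736, 2501, 1776, 715, 1140]
final_weights = update_weights(counts)
q_final = quality(r, final_weights)

print(r, round(q_equal, 2), [round(w, 2) for w in final_weights])
```

Under this reading, the reported counts reproduce the reported final weights after rounding to two decimals (0.11, 0.36, 0.26, 0.10, 0.17), since the five counts sum to the 6868 detected errors. For the example vector, the error vector is `[0, 1, 0, 0, 1]` and the quality with equal weights is 0.6.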
The calculation results demonstrate that end events are the most vulnerable to BPMN modeling errors: 41% of the analyzed business process descriptions contain multiple end events or suffer from improperly connected end events with missing incoming sequence flows (i.e. end events are detached from the workflow) or multiple incoming sequence flows (i.e. end events are used to synchronize or merge workflow scenarios instead of the corresponding gateways).

The second most frequent BPMN modeling errors are caused by improperly connected tasks and sub-processes: 29% of business process descriptions contain tasks or sub-processes with missing incoming or outgoing sequence flows (i.e. activities are detached from the workflow), as well as tasks or sub-processes with multiple incoming or outgoing sequence flows (i.e. activities are used to split or join workflow scenarios instead of the corresponding gateways).

Almost 19% of the analyzed business process descriptions contain gateways that are neither splits nor joins: some are used to join and split the workflow at the same time, and some do not have enough incoming or outgoing sequence flows to be considered splits or joins.

For example, one of the analyzed BPMN models describes an insurance recourse business process. According to the obtained results, it appears to contain all the considered business process modeling errors (Fig. 13):
• (a) a start event is detached from the workflow;
• (b) a task starts the workflow instead of the start event;
• (c) a task splits the workflow into several scenarios instead of the corresponding gateway;
• (d) an end event merges workflow scenarios instead of the corresponding gateway;
• (e) a gateway does not reflect a workflow split or join;
• (f) an inclusive (OR) gateway is used;
• (g) a task ends the workflow instead of the end event.
Figure 13: The insurance recourse business process model with detected errors

As part of the conducted experiments, quality values (10) were calculated for all of the 6137 business process descriptions using different error weights (9): equal weights ($Q_k^1$), error weights changing dynamically during re-calculation when the BPMN-CVN was used for the first time ($Q_k^2$), and error weights obtained after the initial processing of BPMN models ($Q_k^3$), $k = \overline{1, K}$. Also, the differences between the quality values calculated using the initial and final weights were calculated:

$$\Delta_k = \left| Q_k^1 - Q_k^3 \right|, \quad k = \overline{1, K}, \tag{18}$$

where $K$ is the number of business process models.

Let us apply exploratory data analysis [20] to the calculated quality values of business process models. The resulting values show that the quality of 25% of business process models falls below 0.60 for the initially equal error weights ($Q_k^1$), 0.52 for the dynamically changing error weights ($Q_k^2$), and 0.53 for the final error weights ($Q_k^3$). The upper 25% of business process models have the perfect quality of 1.00, which means these models are free of errors (or at least that any errors they contain have not been detected). The remaining 50% of business process models lie between the first and third quartiles, with a median value of 0.80 for the initially equal error weights ($Q_k^1$), 0.73 for the dynamically changing error weights ($Q_k^2$), and 0.74 for the final error weights ($Q_k^3$).

The differences $\Delta_k$ between the quality values calculated using the initial ($Q_k^1$) and final ($Q_k^3$) weights vary between 0.16 and 0.22 for 25% of business process models and do not exceed 0.16 for the remaining 75%. Moreover, for 25% of business process models the differences $\Delta_k$ are equal to zero, which means the quality of these models remains the same for different weights; most likely these are the so-called “perfect” business process models with a quality of 1.00.

The minimum, first quartile, median, third quartile, and maximum values are given in Table 1.
Table 1
Quartile values of experimental results

Quartile values    $Q_k^1$   $Q_k^2$   $Q_k^3$   $\Delta_k$
Minimum value      0.00      0.00      0.00      0.00
First quartile     0.60      0.52      0.53      0.00
Median value       0.80      0.73      0.74      0.06
Third quartile     1.00      1.00      1.00      0.16
Maximum value      1.00      1.00      1.00      0.22

The box (whisker) plot [20] created using the quartile values (Table 1) is shown in Fig. 14 below.

Figure 14: The boxplot

The box plot (Fig. 14) means that in the set of real business process models created by different authors, 25% are of poor quality, 50% are of moderate quality, and the remaining 25% are of high quality. The distribution of differences between quality values (before and after error weights were adjusted) shows significant changes in quality estimations for 25% of models, moderate changes for 50% of models, and no changes for the remaining 25% of models.

4. Conclusion
In this paper, we discussed the relevance of high-quality business process modeling and proposed the connectionist computational system called BPMN Correctness Validation Network (BPMN-CVN) capable of business process modeling errors detection and quality evaluation. This system is inspired by the architecture of artificial neural networks and the reinforcement learning technique allowing the intelligent system to learn by interacting with the environment – in our case, by analyzing business process models.

The proposed intelligent system structure and its algorithm were implemented using Python programming language to perform necessary calculations. The large collection of 3729 BPMN models that contain 6137 business process descriptions was used as the experimental dataset.
The obtained experimental results allow us to draw the following conclusions:
• the most frequent business process modeling errors are connected with the poor structure of End Events and Tasks (or Sub-Processes);
• less frequent business process modeling errors are caused by the poor structure of Gateways or the usage of ambiguous Inclusive (OR) Gateways, which is not recommended by many studies;
• Start Events and Intermediate Events are less error-prone but still impact business process model quality;
• after the initial processing of the experimental collection of BPMN models created by various authors during Camunda’s training sessions for goods dispatch, credit scoring, insurance recourse, and restaurant business processes [18], we obtained the error weights related to start events, end events, activities (tasks and sub-processes), intermediate events, and gateways;
• the obtained error weights reflect the error frequencies and are adjusted when new BPMN models are processed according to the proposed BPMN-CVN algorithm; this allows the system to learn over time, which matters because otherwise some errors that have not occurred for a while could keep significant weights and “hide” more relevant errors that occur frequently [19];
• the exploratory analysis of experimental results (Table 1) yields first quartile, median, and third quartile values, which can be used as thresholds for the classification of analyzed BPMN models: $0 \leq Q < 0.53$ for low-quality diagrams, $0.53 \leq Q < 0.74$ for moderate-quality diagrams, and $0.74 \leq Q \leq 1.00$ for high-quality diagrams;
• the proposed quality thresholds may be slightly adjusted in real time during the processing of business process models with certain trends in errors.
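The threshold-based classification from the list above can be sketched as a small Python function. The function name `classify_quality` and the choice of which interval contains each boundary value (0.53 and 0.74) are assumptions made for illustration; only the threshold values themselves come from Table 1.

```python
def classify_quality(q, low=0.53, high=0.74):
    """Classify a quality value Q in [0, 1] using the quartile thresholds.

    Assumption: boundary values belong to the higher class,
    i.e. Q = 0.53 is 'moderate' and Q = 0.74 is 'high'.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("quality value must lie in [0, 1]")
    if q < low:
        return "low"
    if q < high:
        return "moderate"
    return "high"


# Example quality values: two errors (end event and task) under the final
# weights, one gateway error, and an error-free model.
labels = [classify_quality(q) for q in (0.38, 0.83, 1.00)]
print(labels)
```

Keeping the thresholds as parameters rather than constants reflects the remark that they may be adjusted as more business process models are processed.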
In the future, the BPMN-CVN with defined error weights can be used to create a quality analysis tool for business analysts, process designers, and other authors of BPMN diagrams, helping them detect errors in their models and achieve better quality by eliminating such errors, making diagrams more understandable and maintainable. Furthermore, this software tool should provide a collaborative environment for authors of business process models, where they can share and search for best-practice BPMN diagrams or look up cases of error fixes. In addition, when processing organizational repositories of BPMN models, which may contain hundreds or even thousands of files, Big Data and Business Intelligence technologies should be used for efficient data processing, visualization, and reporting.

5. References
[1] N. Ahrend, Opportunities and limitations of BPM initiatives in public administrations across levels and institutions, Doctoral thesis, Humboldt University, Berlin, 2014.
[2] F. Kahloun, S. A. Ghannouchi, Classification Algorithm for Assessing the Quality Criteria for Business Process Models, in: Hybrid Intelligent Systems, HIS 2017, Advances in Intelligent Systems and Computing, vol. 734, Springer, Cham, 2017, pp. 71–81. doi:10.1007/978-3-319-76351-4_8
[3] M. Dumas, M. La Rosa, J. Mendling, H. A. Reijers, Introduction to Business Process Management, Springer, Heidelberg, 2018. doi:10.1007/978-3-662-56509-4
[4] P. Harmon, The State of Business Process Management, in: The State of the BPM Market, volume 2016, BPTrends, 2016, pp. 1–50.
[5] Business Process Model and Notation (BPMN). Version 2.0. URL: https://www.omg.org/spec/BPMN/2.0/PDF
[6] H. G. Ceballos, V. Flores-Solorio, J. P. Garcia, A Probabilistic BPMN Normal Form to Model and Advise Human Activities, in: International Workshop on Engineering Multi-Agent Systems, Springer, Cham, 2015, pp. 51–69. doi:10.1007/978-3-319-26184-3_4
[7] J. Pavlicek, R. Hronza, K. Jelinkova, The business process model quality metrics, in: Enterprise and Organizational Modeling and Simulation, EOMAS 2017, Lecture Notes in Business Information Processing, vol. 298, Springer, Cham, 2017, pp. 134–148. doi:10.1007/978-3-319-68185-6_10
[8] R. Tricker, ISO 9001:2015 for Small Business, Taylor & Francis, 2016.
[9] F. Corradini, F. Fornari, S. Gnesi, A. Polini, B. Re, Quality assessment strategy: Applying business process modelling understandability guidelines, University of Camerino, Italy, 2015. URL: https://openportal.isti.cnr.it/data/2017/380283/2017_380283.pdf
[10] L. Henrik, J. Mendling, O. Günther, Learning from quality issues of BPMN models from industry, IEEE software 4(33) (2015) 26–33. doi:10.1109/MS.2015.81
[11] W. Kbaier, S. A. Ghannouchi, Determining the threshold values of quality metrics in BPMN process models using data mining techniques, Procedia Computer Science 164 (2019) 113–119. doi:10.1016/j.procs.2019.12.161
[12] H. B. H. Ayech, S. A. Ghannouchi, E. A. E. H. Amor, Extension of the BPM lifecycle to promote the maintainability of BPMN models, Procedia Computer Science 181 (2021) 852–860. doi:10.1016/j.procs.2021.01.239
[13] A. Martens, P. Fettke, P. Loos, Inductive Development of Reference Process Models Based on Factor Analysis, in: International Conference on Wirtschaftsinformatik, vol. 12, Osnabrueck, 2015, pp. 438–452.
[14] S. Walczak, Artificial neural networks, in: Advanced methodologies and technologies in artificial intelligence, computer simulation, and human-computer interaction, IGI Global, 2019, pp. 40–53. doi:10.4018/978-1-5225-7368-5.ch004
[15] Z. Weihong, Z. Ying, Level-set functions and parametric functions, in: The Feature-Driven Method for Structural Optimization, Elsevier, 2021, pp. 9–46. doi:10.1016/B978-0-12-821330-8.00002-X
[16] J. Krogstie, Quality of Business Process Models, in: Quality in Business Process Modeling, Springer, 2016, pp. 53–102. doi:10.1007/978-3-319-42512-2
[17] Y. Yu, Towards Sample Efficient Reinforcement Learning, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), 2018, pp. 5739–5743.
[18] BPMN for research. URL: https://github.com/camunda/bpmn-for-research
[19] A. Kopp, D. Orlovskyi, Towards the Method and Information Technology for Evaluation of Business Process Model Quality, in: ICTERI 2020, Communications in Computer and Information Science, vol. 1308, Springer, Cham, 2021, pp. 93–118. doi:10.1007/978-3-030-77592-6_5
[20] C. R. Pandian, M. Kumar, Simple Statistical Methods for Software Engineering: Data and Patterns, CRC Press, 2015.