Machine learning techniques for predicting software code properties using design metrics

Vira Liubchenko
Odesa Polytechnic National University, Shevchenka av. 1, 65044 Odesa, Ukraine
lvv@op.edu.ua (V. Liubchenko), ORCID 0000-0002-4611-7832

IntelITSIS'2024: 5th International Workshop on Intelligent Information Technologies and Systems of Information Security, March 28, 2024, Khmelnytskyi, Ukraine

Abstract
This paper proposes an information technology for predicting code properties from software design metrics, underscoring the critical interplay between design metrics and software code properties. A case study on data from 39 open-source Java projects demonstrates the efficacy of machine learning methods, including random forest and artificial neural networks, in predicting code properties from selected design metrics. The study provides insights into the correlation between design metrics and lines of code (LOC), suggesting the feasibility of forecasting LOC, and by extension various software characteristics, from design metrics. The findings emphasize the importance of prioritizing generalizability over specificity to enhance the model's reliability across diverse software engineering contexts. Overall, the paper advances our understanding of the role of design metrics in forecasting code properties and provides insights into their application in software engineering practice to mitigate risks and enhance software quality. Through these contributions, the research lays a foundation for further exploration and use of design metrics in software development processes.

Keywords
software quality assurance, predictive modelling, design metrics, performance prediction, machine learning, software engineering, regression analysis, classification techniques, open-source Java projects

1. Introduction

Software quality assurance is becoming increasingly important as software spreads into new spheres of industry and life and as the variety of software types grows. The quest for quality assurance therefore remains paramount. Achieving it, however, requires more than post hoc debugging and testing; it demands proactive measures that predict and preempt potential issues before they manifest. This is where quantitative prediction emerges as a pivotal tool in the software engineer's arsenal. By leveraging design metrics, architectural insights, and empirical data, engineers can anticipate how a software system will behave under different conditions, enabling them to optimize its design, allocate resources judiciously, and mitigate potential bottlenecks before deployment.

The importance of performance prediction cannot be overstated in today's software development landscape. In modern IT environments, the traditional approach of reactive troubleshooting proves inadequate, if not impractical. Performance prediction offers a proactive strategy to address these challenges, empowering stakeholders to make informed decisions at every stage of the software development lifecycle. At the heart of prediction is the premise that specific design metrics are reliable software quality indicators [1].
These metrics encompass various aspects of the software architecture, including its modularity, coupling, cohesion, complexity, and extensibility. By analyzing them, researchers and practitioners seek to discern patterns and correlations that shed light on the system's future performance characteristics. For instance, studies have shown that high coupling between modules tends to increase the propagation of defects and decrease the system's resilience to change [2]. Similarly, excessive complexity, as measured by metrics like cyclomatic complexity or code churn, often correlates with higher defect density and lower maintainability. By identifying such patterns, developers can proactively refactor the source code and streamline its structure.

Advances in machine learning and data analytics enable the development of predictive models that leverage historical data to forecast software quality metrics. By training these models on repositories of code artefacts, bug reports, and performance logs, we can identify underlying trends and accurately predict future outcomes. Predicting software performance characteristics from design metrics holds immense significance: it offers tangible benefits in cost reduction, risk mitigation, and stakeholder satisfaction. By identifying potential performance issues early in the software lifecycle, software engineers can avoid costly rework, delays, and customer dissatisfaction.

However, limited or absent historical data typically hinders this endeavour. First, existing datasets detailing software defects usually lack measurements of the design aspect. Second, datasets focusing on design components are often gathered on a per-project basis and lack extensive measurements. This paper explores methodologies for utilizing indirect metrics to forecast software performance characteristics during the software design phase.

The paper is structured as follows. Section 2 provides an overview of the published research on which our work is based. Section 3 describes the information technology for prediction. Section 4 describes the application of the proposed information technology. Finally, the general conclusions of the work are collected in Section 5.

2. Literature review

Given the paper's focus on software design metrics, we thoroughly analyzed articles about this category of metrics. Our review includes only articles published in 2018 or later.

The paper [3] compared five common design patterns with respect to code metrics and time efficiency, discussing the advantages and disadvantages of each pattern. It concluded that some metrics may be the same or better with simple solutions, while others may benefit from design patterns. Notably, the experiments demonstrated dependencies between design metrics and performance characteristics.

In [4], the authors concluded that object-oriented design metrics can serve as early indicators of software reliability and that a neural network model can effectively capture the relationship between metrics and reliability, which can help reduce software development and testing costs and effort. The authors used various metrics to measure the design complexity, coupling, cohesion, inheritance, and size of object-oriented software.

The authors of [5] described using guide cards based on software metrics to identify four design smells and spectral clustering to group similar smells across different software versions.
They analyzed four large open-source Java projects and demonstrated that design smells can be detected from software metrics.

The paper [6] investigates the alignment between quality improvement and software design metrics, focusing on eight internal quality attributes and 27 structural metrics. It finds that most design metrics map to the main quality attributes; however, some attributes, such as encapsulation, abstraction, polymorphism, and design size, are poorly covered by the metrics. Silva et al. [7] also studied the relationship with software quality attributes, but for software architecture metrics. Their study confirmed the existence of such relationships and produced a catalogue of them.

The research in [8] deserves special consideration. It introduces a novel approach that leverages neural networks to assess software quality characteristics from quality attributes. Internal metrics, which can be estimated from the software design, serve as the basis for these quality attributes.

The paper [9] applied software design metrics, such as coupling, cohesion, complexity, and size, to the developer story, using them as input parameters for supervised machine learning algorithms to predict a class's source code size. It conducted a case study of 30 open-source Java systems and demonstrated the usefulness and effectiveness of the metrics for predicting a class's source code size.

The paper [10] investigated the relations among the parameters of object-oriented software metrics using complex network analysis. The authors collected a dataset of object-oriented metrics and their parameters and created a network of parameters based on their co-occurrence in metrics. They found that some parameters were more frequently used and more strongly related than others, and that the distance between parameters was short regardless of the property they belonged to. They concluded that understanding the relationships among metric parameters can help developers select metrics during the design phase, and that network measurements are helpful tools for analyzing these relations.

The paper [11] proposed a solution for automating the evaluation of Java project design quality in an educational setting using object-oriented metrics and neural networks. Using manually assigned points as labels, the authors trained a neural network model to predict a design quality score from the metric values. The model was evaluated on two datasets of student projects and homework, and satisfactory results were reported.

The paper [12] proposed a framework to assess the impact of design patterns on software metrics. The results showed no consistent behaviour of the metrics between pattern and non-pattern versions. We note, however, that the authors do not deny that the metrics behave consistently with the characteristics of the software code.

In summary, all of the reviewed studies acknowledged the dependency between design metrics and software code properties; they differ in the approaches employed to leverage or model this dependency. Armed with this understanding of the correlation, we can develop an information technology for predicting code properties based on design metrics. However, we must point out that few studies link design metrics directly to software characteristics.
The main body of publications is instead devoted to using code characteristics, primarily lines-of-code metrics, to predict different software properties. Lines of code (LOC) is a fundamental software measure widely used as a proxy for software development effort or as a normalization factor in many other software-related measures [13]. The most developed topic is software defect prediction (SDP). Pradhan et al. [14] used software size in kilo lines of code (KLOC) as an attribute for an SDP model. Effort-aware SDP ranks software modules according to their defect density, which allows testers to find more defects while reviewing a certain amount of code [15]; this method uses LOC as input data. In [16], the authors proposed a deep-learning-based method for predicting potential code defects in software modules, which used semantic features and LOC simultaneously in a hierarchical LSTM architecture.

Other applications have been described as well. The experimental results in [17] showed a relationship between testing and LOC. The authors of [18] successfully used LOC for measuring the Maintainability Index. The research [19] used LOC for effort prediction on projects developed with agile methods and a microservice-based architecture.

Based on this analysis, we can assume the feasibility of forecasting LOC from software design metrics. Predicting various software characteristics from LOC is, in turn, a well-studied task.

3. Information technology description

Let us generalize the published insights. The information technology aims to predict significant software quality features during the design phase, reducing uncertainty in software projects. The input data consists of measurements of metrics that describe the software design. Various authors have proposed different sets of metrics, with the most comprehensive list presented in [20]. The results are the quality features that we predict. A natural constraint on the choice of input and output data is the availability of historical data on the selected metrics: a predictive machine learning model must be trained on historical data.

The information technology comprises two core procedures: model identification and prediction using the identified model. Both procedures are widely recognized and nearly standard in machine learning. Figure 1 outlines a schematic representation of these procedures. For any specific conditions, the accuracy level that a quality-feature prediction must reach can be determined, and a particular prediction technique should be chosen accordingly. It is important to note that while the technology itself is significant, the primary focus of our work lies in the results yielded by the case study.

Figure 1: Schemas of the model identification (a) and prediction (b) procedures that compose the information technology.

4. Case study

The developed information technology underwent testing using data from 39 open-source Java projects. Measurements were collected for individual components across 86 metrics within each project. Some metrics exhibit explicit dependencies, such as the correlation between LOC and KLOC. Figure 2 illustrates data from three randomly chosen projects plotted in the space defined by the first two principal components. The component characteristics of these projects are notably similar, indicating that the datasets from individual projects can be effectively merged (a minimal sketch of this check is shown below).

Figure 2: Visualization of projects' data in the principal-component space.
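The following listing is a minimal sketch of how such a principal-component check can be performed, assuming pandas, scikit-learn, and matplotlib, and assuming each project's per-component measurements sit in a CSV file with one numeric column per metric; the file names are hypothetical placeholders, not artefacts of the study.

    # Minimal sketch of the principal-component merging check
    # (hypothetical file names; assumes pandas, scikit-learn, matplotlib).
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Load per-component measurements for three projects
    # (one row per component, one numeric column per metric).
    projects = {name: pd.read_csv(f"{name}.csv").select_dtypes("number")
                for name in ("project_a", "project_b", "project_c")}

    # Fit the scaler and the PCA on the pooled data so that every project
    # is projected into the same two-dimensional component space.
    pooled = pd.concat(projects.values(), ignore_index=True)
    scaler = StandardScaler().fit(pooled)
    pca = PCA(n_components=2).fit(scaler.transform(pooled))

    for name, df in projects.items():
        pts = pca.transform(scaler.transform(df))
        plt.scatter(pts[:, 0], pts[:, 1], alpha=0.5, label=name)
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.legend()
    plt.show()  # overlapping point clouds suggest the datasets can be merged

If the per-project point clouds occupy the same region of the component space, pooling the data for model training is a reasonable choice.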
From the available set of metrics, we used only those related to the software design, namely:
• Number of Attributes (NOA),
• Number of Parameters (NOP),
• Number of Children (NOC),
• Coupling Between Objects (CBO),
• Depth of Inheritance Tree (DIT),
• Response for a Class (RFC),
• Lack of Cohesion of Methods (LCOM5).

The datasets did not include measurements of metrics directly tied to the quality of a software product, such as the quantity or presence of defects. Nevertheless, it has been established that the number of defects tends to be proportional to LOC, and we relied on this dependency in our analysis.

We first explored linear dependencies by examining the correlation matrix and identified only two metrics with a significant linear relationship to LOC: CBO and RFC. We therefore opted against linear regression and instead employed random forest (RF), AdaBoost, and an artificial neural network (ANN) to model the regression dependence, assessing prediction quality with the Mean Absolute Percentage Error (MAPE).

In our experiment, AdaBoost yielded the poorest result, with an error exceeding 100%. RF achieved a MAPE of 40.64%, while the ANN reduced the MAPE to 30.49%. These results alone, however, do not establish neural networks as the superior prediction method. It is well known that achieving quality results with neural networks often requires training on extensive datasets, and our experiment validated this: when the neural network was trained solely on data from one project, the prediction error surpassed 100%, comparable to AdaBoost's MAPE. Conversely, RF and AdaBoost exhibited consistent performance across different training dataset sizes, whether trained on data from one, several, or all projects.

Summarizing these results, the regression model yielded forecasts of low quality. However, our primary concern was not forecasting LOC itself but using LOC to infer quality characteristics. Consequently, we could define "risk" levels based on module size measured in LOC and transition to a classification task. Figure 3 depicts the frequency distribution of LOC values.

Figure 3: Frequency distribution of LOC values.

Most modules contain up to 75 LOC, suggesting that modules of this size typically exhibit acceptable quality. Conversely, the few very large modules are likely to present challenges. Hence, we propose categorizing modules with LOC > 75 as risky. As for the regression task, we used the RF, AdaBoost, and ANN methods for the classifier. The results obtained were almost identical (Table 1). The ANN demonstrated slightly inferior performance, likely attributable to the inadequate sample size. For practical applications, RF is preferable due to its consistent performance across datasets of varying sizes and its simplicity compared to AdaBoost.

Table 1
Performance measures

Method     Accuracy   Precision   Recall
RF         0.95       0.96        0.98
AdaBoost   0.95       0.96        0.98
ANN        0.95       0.92        0.89

Therefore, reducing the specificity of the outcome criteria enables the development of a model with satisfactory prediction accuracy; a minimal sketch of both experiments is given below.
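The following listing sketches both experiments, assuming scikit-learn and assuming the merged per-component measurements are in a single table; the file name, train/test split, and hyperparameters are hypothetical illustrations rather than the exact study setup. MAPE here is the mean of |y - y_hat| / |y| over the test set, expressed as a percentage.

    # Minimal sketch of the regression and classification experiments
    # (hypothetical file name and hyperparameters; assumes scikit-learn).
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from sklearn.metrics import (accuracy_score, mean_absolute_percentage_error,
                                 precision_score, recall_score)
    from sklearn.model_selection import train_test_split

    METRICS = ["NOA", "NOP", "NOC", "CBO", "DIT", "RFC", "LCOM5"]
    data = pd.read_csv("merged_projects.csv")  # one row per software component
    X, y = data[METRICS], data["LOC"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=42)

    # Regression: predict LOC from the design metrics and report MAPE.
    reg = RandomForestRegressor(n_estimators=200, random_state=42)
    reg.fit(X_tr, y_tr)
    mape = mean_absolute_percentage_error(y_te, reg.predict(X_te)) * 100
    print(f"RF regression MAPE: {mape:.2f}%")

    # Classification: reduce specificity by labelling modules with
    # LOC > 75 as "risky" and predicting only this binary outcome.
    risky_tr, risky_te = (y_tr > 75).astype(int), (y_te > 75).astype(int)
    clf = RandomForestClassifier(n_estimators=200, random_state=42)
    clf.fit(X_tr, risky_tr)
    pred = clf.predict(X_te)
    print(f"accuracy:  {accuracy_score(risky_te, pred):.2f}")
    print(f"precision: {precision_score(risky_te, pred):.2f}")
    print(f"recall:    {recall_score(risky_te, pred):.2f}")

The same pattern applies to scikit-learn's AdaBoostRegressor/AdaBoostClassifier and MLPRegressor/MLPClassifier, which makes the three methods directly comparable on identical splits.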
This inference remains valid irrespective of the programming language used or the type of project under consideration. By broadening the scope of the outcome criteria, the model becomes more adaptable and capable of providing reliable predictions across various contexts. This flexibility enhances the model's utility and ensures its applicability in diverse software engineering scenarios. Consequently, prioritizing generalizability over specificity enhances the model's effectiveness and reliability in predicting outcomes.

5. Conclusion

The importance of performance prediction cannot be overstated in modern software development environments. Traditional reactive troubleshooting approaches often prove inadequate or impractical. Performance prediction offers a proactive strategy to address these challenges, empowering stakeholders to make informed decisions at every stage of the software development lifecycle.

Advances in machine learning and data analytics have facilitated the development of predictive models that leverage historical data to forecast software quality metrics. By training these models on repositories of code artefacts, bug reports, and performance logs, we can identify underlying trends and accurately predict future outcomes. Predicting software quality based on design metrics offers tangible benefits, including cost reduction, risk mitigation, and stakeholder satisfaction. By identifying potential performance issues early in the software lifecycle, software engineers can avoid costly rework, delays, and customer dissatisfaction.

Reviewing the literature, we find a wealth of research exploring the relationship between design metrics and software quality. Various studies have investigated the efficacy of different metrics and modelling techniques in predicting software reliability, identifying design smells, and assessing the impact of design patterns on software metrics. Our information technology aims to generalize these insights and provide a framework for predicting significant quality features during the design phase, thereby reducing uncertainty in software projects. By utilizing historical data on design metrics, we can develop predictive models that offer valuable insights into the future performance of software systems.

In our case study, we tested the developed information technology using data from 39 open-source Java projects. By analyzing design metrics and employing regression and classification techniques, we sought to predict software quality characteristics. Our results highlight the importance of training predictive models on extensive datasets and the potential limitations of specific modelling techniques.

Ultimately, our findings underscore the value of predictive modelling in software engineering, offering a proactive approach to software quality assurance. By leveraging design metrics and historical data, engineers can make informed decisions and optimize software designs for improved performance and reliability.

References

[1] J. Rashid, T. Mahmood, M. W. Nisar, A Study on Software Metrics and its Impact on Software Quality, arXiv preprint, 2019, abs/1905.12922. URL: https://api.semanticscholar.org/CorpusID:170078652.
[2] P. Skiada, A. Ampatzoglou, E.-M. Arvanitou, A. Chatzigeorgiou, I. Stamelos, Exploring the Relationship between Software Modularity and Technical Debt, in: 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), Prague, Czech Republic, 2018, pp. 404–407. doi:10.1109/SEAA.2018.00072.
[3] A. Karavokyris, E. Alepis, Software Measures for Common Design Patterns Using Visual Studio Code Metrics, in: 9th International Conference on Information, Intelligence, Systems and Applications (IISA), Zakynthos, Greece, 2018, pp. 1–7. doi:10.1109/IISA.2018.8633694.
[4] C. H. Madhav, K. S. V. Kumar, A method for predicting software reliability using object oriented design metrics, in: International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 2019, pp. 679–682. doi:10.1109/ICCS45141.2019.9065541.
[5] A. Imran, Design Smell Detection and Analysis for Open Source Java Software, in: IEEE International Conference on Software Maintenance and Evolution (ICSME), Cleveland, OH, USA, 2019, pp. 644–648. doi:10.1109/ICSME.2019.00104.
[6] E. A. AlOmar, M. W. Mkaouer, A. Ouni, M. Kessentini, On the Impact of Refactoring on the Relationship between Quality Attributes and Design Metrics, in: ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Porto de Galinhas, Brazil, 2019, pp. 1–11. doi:10.1109/ESEM.2019.8870177.
[7] S. Silva, A. Tuyishime, T. Santilli, P. Pelliccione, L. Iovino, Quality Metrics in Software Architecture, in: 2023 IEEE 20th International Conference on Software Architecture (ICSA), L'Aquila, Italy, 2023, pp. 58–69. doi:10.1109/ICSA56044.2023.00014.
[8] M. Lebiga, T. Hovorushchenko, M. Kapustian, Neural-Network Model of Software Quality Prediction Based on Quality Attributes, Computer Systems and Information Technologies 1 (2022) 69–74. doi:10.31891/CSIT-2022-1-9.
[9] A. Algarni, K. Magel, Applying Software Design Metrics to Developer Story: A Supervised Machine Learning Analysis, in: IEEE First International Conference on Cognitive Machine Intelligence (CogMI), Los Angeles, CA, USA, 2019, pp. 156–159. doi:10.1109/CogMI48466.2019.00030.
[10] M. M. A. Dabdawb, B. Mahmood, A Network of Object-Oriented Software Metrics' Parameters, in: IEEE International Conference on Communication, Networks and Satellite (COMNETSAT), Purwokerto, Indonesia, 2021, pp. 172–178. doi:10.1109/COMNETSAT53002.2021.9530822.
[11] S. Ćelosmanović, V. Ljubović, JMetricGrader: A software for evaluating student projects using design object-oriented metrics and neural networks, in: 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO), Opatija, Croatia, 2022, pp. 532–537. doi:10.23919/MIPRO55190.2022.9803776.
[12] M. G. Al-Obeidallah, Towards a Framework to Assess the Impact of Design Patterns on Software Metrics, in: International Conference on Multimedia Computing, Networking and Applications (MCNA), Valencia, Spain, 2023, pp. 67–72. doi:10.1109/MCNA59361.2023.10185865.
[13] M. Ochodek, K. Durczak, J. Nawrocki, M. Staron, Mining Task-Specific Lines of Code Counters, IEEE Access 11 (2023) 100218–100233. doi:10.1109/ACCESS.2023.3314572.
[14] S. Pradhan, V. Nanniyur, P. K. Vissapragada, On the Defect Prediction for Large Scale Software Systems – From Defect Density to Machine Learning, in: 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), Macau, China, 2020, pp. 374–381. doi:10.1109/QRS51102.2020.00056.
[15] J. Rao, X. Yu, C. Zhang, J. Zhou, J. Xiang, Learning to rank software modules for effort-aware defect prediction, in: 2021 IEEE 21st International Conference on Software Quality, Reliability and Security Companion (QRS-C), Hainan, China, 2021, pp. 372–380. doi:10.1109/QRS-C55045.2021.00062.
[16] H. Wang, W. Zhuang, X. Zhang, Software Defect Prediction Based on Gated Hierarchical LSTMs, IEEE Transactions on Reliability 70(2) (2021) 711–727. doi:10.1109/TR.2020.3047396.
[17] S. Ahmed, L. Sadath, J. Nagaria, Software Testing and Lines of Codes – A Study on Software Engineering Design Patterns, in: 2019 International Conference on Automation, Computational and Technology Management (ICACTM), London, UK, 2019, pp. 389–394. doi:10.1109/ICACTM.2019.8776688.
[18] G. H. Kencana, A. Saleh, H. A. Darwito, R. R. Rachmadi, E. M. Sari, Comparison of Maintainability Index Measurement from Microsoft CodeLens and Line of Code, in: 2020 7th International Conference on Electrical Engineering, Computer Sciences and Informatics (EECSI), Yogyakarta, Indonesia, 2020, pp. 235–239. doi:10.23919/EECSI50503.2020.9251901.
[19] H. Ünlü, T. Hacaloglu, F. Büber, K. Berrak, O. Leblebici, O. Demirörs, Utilization of Three Software Size Measures for Effort Estimation in Agile World: A Case Study, in: 2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), Gran Canaria, Spain, 2022, pp. 239–246. doi:10.1109/SEAA56994.2022.00045.
[20] E. Y. Hernandez-Gonzalez, A. J. Sanchez-Garcia, M. K. Cortes-Verdin, J. C. Perez-Arriaga, Quality Metrics in Software Design: A Systematic Review, in: 7th International Conference in Software Engineering Research and Innovation (CONISOFT), Mexico City, Mexico, 2019, pp. 80–86. doi:10.1109/CONISOFT.2019.00021.