Improving Software Maintenance Ticket Resolution Using Process Mining (Extended Abstract)

Monika Gupta
Indraprastha Institute of Information Technology Delhi, India
monikag@iiitd.ac.in

Software maintenance is a crucial activity in the software industry and consumes a major portion of the expenditure on software. Software maintenance refers to the modification of a software product after delivery and is required to correct faults, to improve performance or other attributes, or to adapt the product to a modified environment. Ever-changing customer needs and rapid technical progress highlight the need to continuously improve the software maintenance process to make it more effective and efficient.

The work in this thesis focuses on analyzing and improving the software maintenance process by exploring novel applications of process mining and predictive analytics. While process mining helps to discover the process reality, predictive analytics helps to recommend suitable actions that mitigate inefficiencies proactively.

To identify the potential opportunities for improvement in software process management by mining data repositories, we first conducted qualitative interviews and surveys of over 40 managers in a large global IT company. The survey provided us with a list of over 10 maintenance process challenges encountered by practitioners, and the benefits that may accrue by addressing them. The survey is published in MSR 2015 [10]. This thesis addresses a few of the identified challenges pertaining to the software maintenance process. We have conducted a series of case studies on large real-world data (commercial and open source) to evaluate the usefulness of the proposed solution approaches. The overall approach of the thesis is published as doctoral symposium papers [2][3].
The main contributions of the thesis are as follows:

– Analyzing the Maintenance Ticket Resolution Process to Identify the Process Inefficiencies

Ticket resolution is an important part of the software maintenance process. As identified from the survey, there is a need to analyze the data generated during the ticket resolution process to capture process reality and identify the process inefficiencies.

We have proposed a framework for analyzing software repositories for ticket resolution from diverse perspectives by applying process mining. The framework has three main steps: 1. data extraction from multiple repositories and integration, 2. transformation of the data to an event log, and 3. multi-perspective process mining from the event log. Using multi-perspective process mining, we discover the process model, which captures the control flow, timing and frequency information about events. We then study inefficiencies such as self-loops, back-forth, ticket reopen, timing issues, and effort consumption. We also analyze the degree of conformance between the designed and the run-time (discovered) process model.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

We conducted a series of case studies on the open-source Firefox browser, the Core project, and the open-source Google Chromium project. The ticket data was obtained from the Issue Tracking System (ITS) of each project (e.g., Bugzilla). We also used repositories of the Peer Code Review (PCR) system and the Version Control System (VCS), where available. For each project, a separate analysis was done, from which we also made some general observations.
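The three framework steps and some of the inefficiency patterns named above can be pictured with a minimal sketch. It is not the thesis implementation: the record layout, the activity names, and the helper functions are all hypothetical, and only the control-flow and frequency perspectives are covered (timing, effort, and conformance are left out).

```python
from collections import Counter, defaultdict

# Hypothetical integrated records (step 1): (ticket_id, activity, timestamp).
# Activity names and values are illustrative, not taken from the case studies.
RECORDS = [
    ("T1", "NEW", 1), ("T1", "ASSIGNED", 2), ("T1", "ASSIGNED", 3),
    ("T1", "RESOLVED", 5), ("T1", "REOPENED", 8), ("T1", "ASSIGNED", 9),
    ("T1", "RESOLVED", 10),
    ("T2", "NEW", 1), ("T2", "ASSIGNED", 4), ("T2", "NEW", 6),
    ("T2", "ASSIGNED", 7), ("T2", "RESOLVED", 9),
]


def build_event_log(records):
    """Step 2: group events by ticket and order each trace by timestamp."""
    traces = defaultdict(list)
    for ticket_id, activity, timestamp in records:
        traces[ticket_id].append((timestamp, activity))
    return {t: [a for _, a in sorted(evts)] for t, evts in traces.items()}


def directly_follows(log):
    """Step 3 (control flow and frequency): count each a -> b transition."""
    counts = Counter()
    for trace in log.values():
        counts.update(zip(trace, trace[1:]))
    return counts


def find_inefficiencies(log, reopen_activity="REOPENED"):
    """Flag tickets showing self-loops, back-forth patterns, or reopens."""
    flagged = defaultdict(set)
    for ticket_id, trace in log.items():
        # Self-loop: the same activity occurs twice in a row.
        if any(a == b for a, b in zip(trace, trace[1:])):
            flagged["self_loop"].add(ticket_id)
        # Back-forth: a -> b -> a with a different activity in between.
        if any(a == c and a != b
               for a, b, c in zip(trace, trace[1:], trace[2:])):
            flagged["back_forth"].add(ticket_id)
        if reopen_activity in trace:
            flagged["reopen"].add(ticket_id)
    return dict(flagged)


log = build_event_log(RECORDS)
dfg = directly_follows(log)
issues = find_inefficiencies(log)
```

On this toy data, `dfg` holds the transition frequencies that a discovered process map would display on its edges, and `issues` maps each pattern name to the set of affected ticket identifiers.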
In Google Chrome, for example, we observed that in around 14% of cases the ticket is instantiated in the ITS after patch submission in PCR or commit in VCS (ideally, for traceability reasons, a ticket’s life cycle should start from issue reporting in ITS, followed by patch submission in PCR and commit in VCS), and for these tickets the number of patch revisions, and thus the resolution time, is higher. In Firefox and Core, we found that a significant percentage of tickets undergo multiple developer reassignments, causing delays in resolution. Also, we identified two categories of tickets (wontfix and worksforme) which consume the maximum ticket resolution effort. We noted that several issues in these categories get reopened, signaling the need for improvement in identifying such tickets.

The proposed multi-perspective process mining framework and the case studies that evaluate it are presented in the thesis and are published in ISEC, APSEC and MSR [7][8][9].

– Reducing User Input Requests in the Maintenance Ticket Resolution Process

A ticket is required to be resolved within the defined service level resolution time, measured using the service level clock. Failure to meet this requirement leads to a penalty on the service provider. After a ticket is assigned to an analyst (the person responsible for servicing the tickets), the analyst can ask for user inputs to resolve the ticket. When user input is requested, the service level clock stops in order to prevent a spurious penalty on the service provider. However, this waiting time adds to the user-experienced resolution time and degrades user experience. Therefore, in this work, we aim to reduce the user input requests to make ticket resolution faster.
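The gap between the two notions of resolution time can be made concrete with a small sketch. The status names, the event layout, and the `resolution_times` helper are assumptions made for illustration; actual ticketing systems record clock pauses in their own way.

```python
from datetime import datetime, timedelta

# Hypothetical status names; real ticketing systems use their own labels.
AWAITING_USER = "AWAITING_USER_INPUT"
CLOSED = "CLOSED"


def resolution_times(events):
    """Return (measured, user_experienced) durations for one ticket.

    `events` is a time-ordered list of (timestamp, status) pairs that starts
    when the ticket is assigned and ends with CLOSED. Time spent in
    AWAITING_USER is excluded from the measured (service level clock) time
    but still counts toward what the user experiences.
    """
    paused = timedelta(0)
    # Each status lasts from its own timestamp until the next event.
    for (start, status), (end, _) in zip(events, events[1:]):
        if status == AWAITING_USER:
            paused += end - start
    user_experienced = events[-1][0] - events[0][0]
    return user_experienced - paused, user_experienced


ticket = [
    (datetime(2020, 1, 6, 9, 0), "ASSIGNED"),
    (datetime(2020, 1, 6, 11, 0), AWAITING_USER),  # analyst asks the user
    (datetime(2020, 1, 6, 15, 0), "IN_PROGRESS"),  # user replies
    (datetime(2020, 1, 6, 17, 0), CLOSED),
]
measured, experienced = resolution_times(ticket)
```

In this invented ticket the clock is paused for four of the eight hours, so the user waits twice as long as the service level clock reports.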
We first applied the multi-perspective process mining framework to the tickets of a large global IT company and found that around 57% of the tickets have user input requests in their life cycle, causing the user-experienced resolution time to be almost twice as long as the measured service resolution time. We observed that user input requests are broadly of two types: real, which seek information from the user to process the ticket, and tactical, where no information is asked for and the request is raised merely to pause the service level clock. We propose a machine learning based system that pre-emptively asks the user, at the time of ticket submission, to provide the additional information that the analyst is likely to ask for, thus reducing real user input requests. We also propose a rule-based detection system to identify tactical user input requests.

The proposed system that predicts the information needs has an average accuracy of 94–99% across five cross validations, while traditional approaches such as logistic regression and naive Bayes have accuracy in the range of 50–60%. The detection system identifies around 15% of the total user input requests as tactical with high precision. Together, the proposed pre-emptive and detection systems bring down the number of user input requests and improve the user-experienced resolution time. This work is published in the Empirical Software Engineering journal [5].

– Discovering Underlying Maintenance Ticket Resolution Process Interactions using Unstructured Data from Execution Logs

Process mining largely uses structured data, viz. event logs, and does not leverage the rich information in unstructured data such as comments and emails. This work is motivated by the need to explore unstructured data generated during process execution to capture underlying process interactions and to help in making effective process improvement decisions.
To achieve this, we extract topical phrases (keyphrases) from the unstructured data using an unsupervised graph-based approach. Keyphrases are then integrated into the event log and are thereby reflected in the discovered process model. This provides insights that cannot be obtained solely from structured data and that can be used to identify process improvement opportunities.

To evaluate the usefulness of the approach, we conducted case studies on the publicly available ticket data from a Dutch insurance company, and on the ticket data of a large global IT company. Our approach extracts keyphrases from the comments associated with the tickets with an average accuracy of around 80% across different data sets. This enabled us to succinctly capture the additional information in the comments regarding issues that influence the ticket resolution process and often cause delays, such as extra information required, priority, and severity. This allows managers or process analysts to make decisions about how to speed up the resolution process, e.g., implementing a bot to capture the information or adding a mandatory field to the initial ticket template, thus reducing the delays incurred while waiting for information. This work is published at AI4BPM [4].

– Runtime Monitoring in Changed Software as Compared to Previous Version

To resolve a ticket, some code changes are made, which can lead to anomalies such as regression bugs. In this work, we aim to monitor and compare the execution behaviour of the new version (after the code change) with that of the previously deployed version, to detect whether ticket resolution has caused anomalous behaviour and thus reduce post-release bugs.

We propose an approach to discover the execution behaviour of the deployed and the new version using the execution logs (which contain the outputs of all the print statements along with related information such as time, thread ID, statement number, etc.).
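As a rough illustration of discovering an execution behaviour model from such logs, the sketch below treats each log statement as a node and each per-thread succession of statements as an edge, then lists the edges present in only one version. The log format, the sample lines, and the `behaviour_model` helper are hypothetical, the lines are assumed to be in time order, and no attempt is made to filter out spurious differences.

```python
# Hypothetical log format: "<time> <thread_id> <statement_id> <message>".
OLD_LOG = """\
10:00:01 t1 S1 fetch start
10:00:02 t1 S2 parse page
10:00:03 t2 S1 fetch start
10:00:04 t1 S3 index page
10:00:05 t2 S2 parse page
10:00:06 t2 S3 index page
"""

NEW_LOG = """\
10:00:01 t1 S1 fetch start
10:00:02 t1 S2 parse page
10:00:03 t1 S2 parse page
10:00:04 t1 S3 index page
"""


def behaviour_model(log_text):
    """Return the set of statement-to-statement transitions seen per thread."""
    last_statement = {}  # thread_id -> most recent statement_id
    edges = set()
    for line in log_text.splitlines():
        _time, thread_id, statement_id, _message = line.split(maxsplit=3)
        # Only statements logged by the same thread are linked, so
        # interleaved threads do not create artificial transitions.
        if thread_id in last_statement:
            edges.add((last_statement[thread_id], statement_id))
        last_statement[thread_id] = statement_id
    return edges


old_model = behaviour_model(OLD_LOG)
new_model = behaviour_model(NEW_LOG)
added = new_model - old_model    # transitions seen only in the new version
removed = old_model - new_model  # transitions seen only in the old version
```

Here the new version repeats statement `S2`, which appears as one added edge; a programmer would then check whether that repetition is consistent with the code change.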
Differences between the two models are then identified and refined such that spurious differences, e.g., those due to logging statement modifications, are eliminated. The differences are presented graphically as regions within the discovered behaviour model. This allows programmers to identify anomalous behaviour changes that are not consistent with the code changes, thereby identifying potential bugs that may have been introduced during the code change.

To evaluate the proposed approach, we conducted case studies on Nutch (an open-source application) and an industrial application. We discovered the execution behaviour models for the two versions of each application and identified the differences between them. By manually analysing the regions, we were able to detect bugs introduced in the new versions of these applications. The bugs were reported and later fixed by the developers, confirming the effectiveness of our approach. This work is published in ICSOC [6].

In the thesis we have explored the potential of combining process mining over various data sources with predictive analytics to improve various aspects of the maintenance process. We have applied the proposed approaches in a series of case studies on data sets of commercial and open-source projects. Although we believe that the case studies are representative, the proposed approaches should be applied to different data sets to establish generalizability. To support the reproducibility of our case studies, a large part of the data (with the data from the industrial partners being the only exception) has been made publicly available [1].

We believe that leveraging diverse data sources and applying analytics intelligently has more potential for process improvement. Information from other sources such as emails, chat logs, and screen recordings can further enhance process improvement.
Such analyses usually focus on identifying inefficiencies, but as we observed in the thesis, they can also lead to automation opportunities that make the process more efficient.

References

1. Link to publicly available artifact. https://github.com/Mining-multiple-repos-data/TicketExperimentalDataset.
2. Monika Gupta. Nirikshan: process mining software repositories to identify inefficiencies, imperfections, and enhance existing process capabilities. In Companion Proceedings of the 36th International Conference on Software Engineering, pages 658–661, 2014.
3. Monika Gupta. Improving software maintenance using process mining and predictive analytics. In 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 681–686. IEEE, 2017.
4. Monika Gupta, Prerna Agarwal, Tarun Tater, Sampath Dechu, and Alexander Serebrenik. Analyzing comments in ticket resolution to capture underlying process interactions. In Artificial Intelligence for Business Process Management, 2020.
5. Monika Gupta, Allahbaksh Asadullah, Srinivas Padmanabhuni, and Alexander Serebrenik. Reducing user input requests to improve IT support ticket resolution process. Empirical Software Engineering, 23(3):1664–1703, 2018.
6. Monika Gupta, Atri Mandal, Gargi Dasgupta, and Alexander Serebrenik. Runtime monitoring in continuous deployment by differencing execution behavior model. In International Conference on Service-Oriented Computing, pages 812–827. Springer, 2018.
7. Monika Gupta and Ashish Sureka. Nirikshan: Mining bug report history for discovering process maps, inefficiencies and inconsistencies. In Proceedings of the 7th India Software Engineering Conference, pages 1–10, 2014.
8. Monika Gupta and Ashish Sureka. Process cube for software defect resolution. In 2014 21st Asia-Pacific Software Engineering Conference, volume 1, pages 239–246. IEEE, 2014.
9. Monika Gupta, Ashish Sureka, and Srinivas Padmanabhuni. Process mining multiple repositories for software defect resolution from control and organizational perspective. In Proceedings of the 11th Working Conference on Mining Software Repositories, pages 122–131, 2014.
10. Monika Gupta, Ashish Sureka, Srinivas Padmanabhuni, and Allahbaksh Mohammedali Asadullah. Identifying software process management challenges: Survey of practitioners in a large global IT company. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pages 346–356. IEEE, 2015.