8 Unit and Performance Testing of Scientific Software Using MATLAB R BOJANA KOTESKA, MONIKA SIMJANOSKA, IVANA JACHEVA, FROSINA KRSTESKA and ANASTAS MISHEV, Ss. Cyril and Methodius University In this paper we report our activities on performing testing of scientific software for calculating ECG-derived heart rate (HR)and respiratory rate (RR) by using Matlab R . The aim of this software is to aid the triage process in the emergency medicine which is crucial for ranging the priority of the injured victims in mass casualty situations based on the severity of their condition. One challenge of this paper is to perform unit testing in Matlab by using the Input Space Partitioning method. For that purpose, we created sets of test values for the ECG signals and we modified them according to our needs. By using the Profiler tool we tested the performance of the algorithm functions. 1. INTRODUCTION Scientific software solves problems in various (scientific) fields by applying computational practices [Kelly et al. 2008]. Its multidisciplinary nature makes it more complex, but it provides great opportu- nities and advantages for scientists in many different scientific fields. Scientists usually have a large amount of data to process [Wilson et al. 2014], many calculations, lots of requests to handle, and au- tomating the process by creating software makes their work easier, increases productivity, quality and sustainability [Wiedemann 2013]. Scientific software is based on models, experimentation and observation of the results [Joppa et al. 2013]. Tests are very much like experiments and the obtained results are observed later. That is how the scientists test their hypotheses. They run experiments, measure results and analyze the data. Scientific software testing is a hard and challenging task due to the complexity and the lack of test oracles [Kanewala and Chen 2018; Lin et al. 2018]. Challenges are categorized according to the specific testing activities: test case development, producing expected test case output values, test execution, test result interpretation, cultural differences between scientists and the software engineering com- munity, limited understanding of testing process, not applying known testing methods, etc [Kanewala and Bieman 2014]. In this paper, we report our activities on performing testing of a scientific software for calculating ECG-derived heart rate (HR) and respiratory rate (RR) by using Matlab R . We try to identify specific challenges, proposed solutions, and unsolved problems faced when testing scientific software. The aim of this software is to aid the triage process in the emergency medicine which is crucial for ranging the priority of the injured victims in mass casualty situations based on the severity of their condition This work is supported by the Faculty of Computer Science and Engineering, Skopje, North Macedonia. Ivana Jacheva, Bojana Koteska, Frosina Krsteska, Monika Simjanoska, and Anastas Mishev, FCSE, Rugjer Boshkovikj 16, P.O. Box 393 1000, Skopje; email: ivana.jaceva@students.finki.ukim.mk; bojana.koteska@finki.ukim.mk; frosina.krsteska@students.finki.ukim.mk; monika.simjanoska@finki.ukim.mk; anastas.mishev@finki.ukim.mk. Copyright c 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 Inter- national (CC BY 4.0). In: Z. Budimac and B. Koteska (eds.): Proceedings of the SQAMIA 2019: 8th Workshop on Software Quality, Analysis, Mon- itoring, Improvement, and Applications, Ohrid, North Macedonia, 22–25. September 2019. Also published online by CEUR Workshop Proceedings (http://ceur-ws.org, ISSN 1613-0073) 8:2 • Bojana Koteska et al. [Hogan and Brown 2014], whether they are man-made, natural or hybrid disasters. When performed manually, an efficient triage process takes less than 30 seconds. In order to optimize the process when there are hundreds of injured people, the challenge is to reduce the triage time and the number of medical persons needed. The software we describe in Section 2 extracts heart rate and respiratory rate from ECG signal. This optimization can be achieved by using the benefits of the biosensor technologies to extract the vital signs needed for the triage. The other sections are organized as follows. Section 3 provides a comprehensive explanation of the testing methodology and results: definition of test cases, testing preparations and execution and anal- yses of the results of the executed tests. The final Section 4 concludes the paper. 2. DESCRIPTION OF SOFTWARE FOR CALCULATING ECG-DERIVED HR AND RR The software is developed as a part of the triage procedure for determining a patient’s condition [Gursky and Hrečkovski 2012]. It is designed for low-power wearable biosensor and it uses only an ECG signal to estimate automatically the HR and RR as crucial parts of the primary triage. The goal of this software is to perform efficient real-time processing of ECG data in terms of the power-demanding Bluetooth connection with the biosensor and the data transmission. The accuracy of the algorithm is published in [Simjanoska et al. 2018]. Fig. 1. Software Methodology As depicted in Fig. 1, the input of the algorithm is a raw ECG signal. The calculation of the HR is per- formed by R-peak detection performed by using the the Pan Tompkins algorithm [Pan and Tompkins 1985]. HR is calculated according to the following equation: Unit and Performance Testing of Scientific Software Using MATLAB R • 8:3 signal length HR = ( )−1 ∗ 60 (1) number of R peaks The obtained R-peaks are used to estimate the RR. The kurtosis computation technique is used for measuring the peakedness of the signal’s distribution. The locations of the local maxima are needed for the smoothing method upon which a peak finder method is applied to find the local maxima (peaks) of the ECG signal. The peaks represent a number of respirations according to which the respiratory rate is calculated: signal length RR = ( )−1 ∗ 60 (2) number of respirations The implementation of the proposed algorithm is realized in Matlab. It has 493 lines of code in total. The software contains two main functions: h3r and pan tompkin. The first one, h3r, has two argu- ments needed for the calculation of the estimated values of RR and HR. The first argument is a raw ECG signal, represented in vector format, which contains an array of decimal values and the second argument is the measurement frequency (number of measurements in a second). The second function, pan tompkin, uses three arguments for calculating the qrs amp raw - ampli- tude of R-waves, qrs i raw - the index of R-waves and delay - a number of samples in which the signal is delayed due to the filtering. The arguments used by pan tompkin are ECG - raw ECG vector signal, fs - sampling frequency and gr-flag - flag for plotting. 3. METHODOLOGY AND RESULTS 3.1 Unit Testing h3r and pan tompkin functions are the two main software units. Unit testing purpose is to assess the software units produced by the implementation phase. It represents the ”lowest” level of testing [Ammann and Offutt 2016]. Unit testing improves the quality of science and engineering software and it focuses on small units of code (class, module, method, or function). In order to have effective unit testing, the procedure should be automated so that entire test suites can be run quickly and easily. The verifying of the individual units is helpful for verifying overall system behavior. The focus on the smaller software part leads to modular and more maintainable code [Eddins 2009]. When testing a scientific software, it is very important to think about the nature of the software and to be familiar with the scientific field. The metrics should be well defined and most often it is easily noticed that they are surreal. For example, a respiratory rate can’t be 1000 breaths per minute. That’s why the testers should know the possible result value ranges, so that they can determine the input domain. The input domain should consists of as much as possible inputs (valid and invalid) that could be taken by the program. The tester should choose test cases wisely since the input values could be infinite. Program correctness can be proved by testing all possible input values, but we can only test limited set of inputs (known as test cases). One method to determine the inputs for a specific variable is to use Input Space Partitioning (ISP) - input or output data is grouped or partitioned into sets of data that we expect to behave similarly and will help us with creating test cases. We used Functionality- based ISP [Ammann and Offutt 2016]. Matlab provides multiple tools for unit testing [The MathWorks, Inc 2019]. h3r function’s first argument is the raw ECG signal vector. According to the database of ECG signals, the usual values are floating points values in range 0-10. The purpose of ISP is to create partitions of the input values and to test the software behavior when the input values are outside the usual range also. For the ECG vector, the following test cases were tested: 8:4 • Bojana Koteska et al. Fig. 2. Unit Test Results —empty vector; —vector of zeros; —vector with negative float values; —vector with positive floating values; —vector with positive floating values greater than 10; —vectors with combinations of negative values and 0s; —vectors with combinations of positive values and 0s; —vectors with combinations of values (negative, positive and 0). We used the same strategy to test the other function as well, because both functions use the same type of argument - the raw ECG signal vector. For the frequency parameter and for the gr flag we also made ISP (for the frequency we have the correct value of 125, then 0, negative number and value Unit and Performance Testing of Scientific Software Using MATLAB R • 8:5 greater than 125). For the gr flag we have 0, 1, negative value and value greater than 1. Each one of these checks represents a single test, which can be run separately. In Matlab it is possible to define unit test suites. We combined all the tests in one file (test suite), and we ran the file once which ran all the tests automatically one by one. By using the assertion functions from the matlab.unittest.qualifications.Assertable class, we com- pared the output HR and RR values with the expected ones and if the test passes, that means the assertion is true and the values from the test are the values we’ve been expecting to get. Fig. 3. Performance testing of the hr3 function. To make test running easy, Matlab provides function runtests which runs all the tests in the current folder, gathers them into a test suite, runs the test suite and returns the results as a TestResult object. In our project, we have several test files: one with the test cases for signal vector values, one for testing the functions with different values for frequency, etc. Figure 2 shows the output from the first test file - preallocationTest. As the figure shows, the output is shown in the Command Window. In Matlab, if the test case passed, it doesn’t show any output. So, the output shown here is only by the test cases 8:6 • Bojana Koteska et al. that failed. The output is presented in details, telling where and why the test case failed: function h3r can not provide results for signal contains only 0s, it does not work with negative values, combination of 0s and negative values, combination of 0s and positive values, etc. Fig. 4. Performance testing of the pan tompkin function. 3.2 Performance testing Performance testing is a non-functional testing which checks the behavior of the system when it is under significant load. In a case of performance testing the software system is evaluated from a user’s perspective, and is typically assessed in terms of throughput, stimulus response time, or both. Per- formance testing could be used to assess the level of system availability also [Vokolos and Weyuker 1998]. The goal of performance testing is to identify the performance bottleneck, to make comparison Unit and Performance Testing of Scientific Software Using MATLAB R • 8:7 Fig. 5. Perfromance evaluation of each code line. of the performance, etc. Usually, the performance testing is made by using a benchmark - a program or workload designed to be representative of the typical software system usage [Pan 1999]. In the earlier versions of Matlab, in order to measure the code’s performance, there was a testing framework, which included different performance measurement-oriented features. The newest version comes with a tool called Profiler, which automates the work. It provides details about the performance of a specific function: how much time did it take to execute the function, the percentage of the summed up time of executing the part of the program which that separate unit used, whether a line has been executed and if so, how many times, the full code coverage etc. Figure 3 shows the results of the execution time of the h3r function. It gives information about the number of calls and total execution time for each children function call. 8:8 • Bojana Koteska et al. Figure 4 shows the executing time of the pan tompkin function. With the Profiler Tool, we can analyse each function call separately and see more detailed informa- tion about its executing, like it’s shown in Fig. 5. The results show that pan tompkin function took most of the execution time (0.184s or 73.1% of the total execution time). 4. CONCLUSION Testing of the scientific software must become a standard part of the development process. Not because we only want to implement the correct way of development process as specified in the literature, but also we should consider the fact that many of the scientific software programs are connected to people’s health and can be categorized as critical software programs. Many of the testing methods designed for commercial scientific software can be adapted to scientific software testing. Testing should be consid- ered from different aspects also. For example, even if the scientific program code produces accurate results, problems with performance can exist. In this paper we made unit and performance testing of a scientific software for calculating ECG- derived heart rate (HR) and respiratory rate (RR) designed to aid the triage process in the emergency medicine which is crucial for ranging the priority of the injured victims in mass casualty situations based on the severity of their condition. The testing was done in Matlab. Multiple unit tests were created to test the functionality of the algorithm. Matlab provides a very user friendly testing interface. Most of the things are fully automated and only function parameters are required for testing. Unit testing and input space partitioning method helped us to find several function errors, especially with test cases with boundary values. For e.g. ECG signal with zeros, signal with negative values and signals with different lengths helped us to add exceptions in the code. Performance testing helped us to check the execution times of the functions. We performed tests with different signal lengths in order to increase the data load. That was useful to think about the code optimization which is left as our future work. REFERENCES Paul Ammann and Jeff Offutt. 2016. Introduction to software testing. Cambridge University Press. Steven L Eddins. 2009. Automated software testing for matlab. Computing in science & engineering 11, 6 (2009), 48–55. Elin A Gursky and Boris Hrečkovski. 2012. Handbook for Pandemic and Mass-casualty Planning and Response. Vol. 100. IOS Press. David E Hogan and Travis Brown. 2014. Utility of vital signs in mass casualty-disaster triage. Western journal of emergency medicine 15, 7 (2014), 732. Lucas N Joppa, Greg McInerny, Richard Harper, Lara Salido, Kenji Takeda, Kenton O’hara, David Gavaghan, and Stephen Emmott. 2013. Troubling trends in scientific software use. Science 340, 6134 (2013), 814–815. Upulee Kanewala and James M Bieman. 2014. Testing scientific software: A systematic literature review. Information and software technology 56, 10 (2014), 1219–1232. Upulee Kanewala and Tsong Yueh Chen. 2018. Metamorphic Testing: A Simple Yet Effective Approach for Testing Scientific Software. Computing in Science & Engineering 21, 1 (2018), 66–72. Diane Kelly, Rebecca Sanders, and others. 2008. Assessing the quality of scientific software. In First International Workshop on Software Engineering for Computational Science and Engineering. Citeseer. Xuanyi Lin, Michelle Simon, and Nan Niu. 2018. Hierarchical metamorphic relations for testing scientific software. In Proceed- ings of the International Workshop on Software Engineering for Science. ACM, 1–8. Jiantao Pan. 1999. Software testing. Dependable Embedded Systems 5 (1999), 2006. Jiapu Pan and Willis J Tompkins. 1985. A real-time QRS detection algorithm. IEEE Trans. Biomed. Eng 32, 3 (1985), 230–236. Monika Simjanoska, Bojana Koteska, Ana Madevska Bogdanova, Nevena Ackovska, Vladimir Trajkovik, and Magdalena Kos- toska. 2018. Automated triage parameters estimation from ECG. Technology and Health Care Preprint (2018), 1–4. The MathWorks, Inc. 2019. Testing Frameworks in Matlab. (2019). https://www.mathworks.com/help/matlab/ matlab-unit-test-framework.html Unit and Performance Testing of Scientific Software Using MATLAB R • 8:9 Filippos I Vokolos and Elaine J Weyuker. 1998. Performance testing of software systems. In Proceedings of the 1st International Workshop on Software and Performance. ACM, 80–87. Christin Wiedemann. 2013. Applying the scientific method to software testing. (2013). https://searchsoftwarequality.techtarget. com/feature/Applying-the-scientific-method-to-software-testing Greg Wilson, Dhavide A Aruliah, C Titus Brown, Neil P Chue Hong, Matt Davis, Richard T Guy, Steven HD Haddock, Kathryn D Huff, Ian M Mitchell, Mark D Plumbley, and others. 2014. Best practices for scientific computing. PLoS biology 12, 1 (2014), e1001745.