Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021

PERFORMANCE ANALYSIS AND OPTIMIZATION OF MPDROOT

J. Buša Jr.1,3, S. Hnatič1,a, O.V. Rogachevsky2

1 Mescheryakov Laboratory of Information Technologies, Joint Institute for Nuclear Research, 20 Joliot-Curie, Dubna, Moscow Region, 141980, Russia
2 Veksler and Baldin Laboratory of High Energy Physics, Joint Institute for Nuclear Research, 4 Baldin St., Dubna, Moscow Region, 141980, Russia
3 Institute of Experimental Physics, Slovak Academy of Sciences, Watsonova 47, Košice, 04001, Slovakia

E-mail: a hnatics@jinr.ru

MPDRoot is the software framework for simulation, reconstruction and physics analysis of simulated and experimental data for the MPD experiment at NICA. It is planned to obtain ~10^8 events of heavy ion collisions for physics analysis, hence it is crucial to have an effective and efficient methodology for the systematic performance improvement of MPDRoot’s backend. In this work, we present an analysis of the timing and instruction performance of MPDRoot’s reconstruction using benchmarks and the Callgrind profiler. We evaluate the feasibility of speeding up reconstruction by reducing method-call overhead and the possible benefit of optimizing the math library. Based on the obtained results, we draw conclusions about the steps to be taken in the near future of MPDRoot’s development.

Keywords: MPD, MPDRoot, benchmarking, code review, code quality, optimization

Ján Buša Jr, Slavomír Hnatič, Oleg Rogachevsky
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. 
Introduction

The architecture of MPDRoot is shown in Fig. 1 and is composed of three main parts [1]:

• Root – a set of building blocks and primitives written in the C++ language, tailored for physics experiments
• FairRoot – a simulation, reconstruction and analysis framework built on top of Root and the other packages encapsulated in FairSoft
• MPDRoot – the specific implementation for the MPD experiment at NICA

Figure 1. Architecture of MPDRoot

2. Instruction and run-time profiling of MPDRoot

The process of optimization requires finding runtime bottlenecks, which is commonly achieved by measuring the performance of various software entities in units of time and instructions. For this purpose, MPDRoot and the FairSoft/FairRoot suite must be built with debug symbols.

The instruction profile of the various parts of MPDRoot’s reconstruction is obtained by running the Callgrind tool from the Valgrind suite [2]. The output of the Callgrind profiler can then be visualized in the KCachegrind tool.

However, from a practical point of view, it is the physical time the software spends in a given entity that represents the true performance measure. Accurate measurement of this time is somewhat problematic, as it relies on expensive system clock calls. If called often, these skew the results, and direct timings of low-cost software entities are usually completely invalid.

The pie charts in Fig. 2 compare the instruction profiler results with the true physical time for the various tasks of MPDRoot’s reconstruction. The comparison indicates that the instruction profile obtained from Callgrind is suitable for finding performance bottlenecks.

Figure 2. 
Time vs Instructions cost of MPDRoot’s reconstruction tasks

MPDRoot spends most of the time in digitizing (~45%) and clusterization (~48%), followed by the Kalman filter (~5%) and the Fair engine (less than 1%). Hence, it is logical to look into the digitizer and clusterization algorithms for speedup optimization at the current stage of MPDRoot’s development.

3. Reducing method-call overhead

It is possible to speed up the code by reducing the method-call overhead with the inline keyword (or the flatten attribute); however, the result of code inlining is often counterproductive [3]. In the most common case, replacing calls with the method body results in a larger executable binary, which in turn slows down execution. The table in Fig. 3 shows the effect of reducing the method-call overhead by inlining the most frequently called methods from the Digitizer and ClusterFinder tasks.

Method (task)                 % instructions   Instructions   % of calls   Task      Total
                              out of total     per call       inlined      speedup   speedup
CalcOrigin (Digitizer task)   4.8              18             100          4.2%      1.9%
GetCij (ClusterFinder task)   12               380            90           -1.2%     -6.3%

Figure 3. Effect of method inlining on MPDRoot’s performance

While inlining the cheap CalcOrigin method (18 instructions per call) results in an overall speedup of 1.9% in time, inlining the more expensive GetCij method (380 instructions per call) slows down the total runtime. Despite the obvious positive effect, inlining the CalcOrigin method at the current stage of development is not a good idea, as this method may still be modified in the future.

4. Benchmarking and optimization of math methods

A well-tested software entity used in MPDRoot that is already closed for modification is the TMath library. The results of the benchmark of TMath methods running in MPDRoot are presented in the upper plot of Fig. 4.
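A benchmark of this kind can be sketched as follows. This is only an illustration and not the benchmark used in this work: it times standard `<cmath>` functions as a stand-in for TMath so that it compiles without Root, and the helper names `ns_per_call` and `cost_in_additions` are hypothetical. Each operation is timed over one long loop, so that the two clock reads are amortized over many calls rather than wrapped around every call.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of MPDRoot or Root): average wall-clock
// nanoseconds per call of `op`, measured over a single long loop.
template <typename Op>
double ns_per_call(Op op, std::size_t n_calls = 5000000) {
    assert(n_calls > 0);
    volatile double sink = 0.0;  // volatile store keeps each result alive
    double x = 1.000001;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n_calls; ++i) {
        sink = op(x);
        x += 1e-9;               // vary the argument slightly on each call
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count()
           / static_cast<double>(n_calls);
}

// Cost of `op` in units of the cost of one addition. Loop overhead is
// included in both measurements, so the ratio is a rough lower estimate.
template <typename Op>
double cost_in_additions(Op op) {
    const double add_ns = ns_per_call([](double v) { return v + 1.5; });
    return ns_per_call(op) / add_ns;
}
```

For example, `cost_in_additions([](double v) { return std::log(v); })` gives the relative cost of the logarithm. The numbers depend strongly on the compiler, optimization flags and hardware, so they are meaningful only as a comparison within one build.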
The power, logarithm, and trigonometric methods are the most expensive.

Figure 4. Math methods benchmarks in MPDRoot.
Upper plot: cost of TMath methods in units of the ‘+’ operation cost
Bottom plot: cost of double vs float precision methods

If one is careful with cumulative errors, the quickest way to speed up math methods is to replace the default double precision methods with their float precision analogues. In this way, one can get up to a 45% speedup (bottom plot of Fig. 4).

Figure 5. TMath methods instructions percentage in MPDRoot

The plot in Fig. 5 shows the instruction percentage of the five most frequently occurring math methods in MPDRoot’s reconstruction. Overall, the whole TMath library takes less than 2.5% of the total number of instructions; therefore, optimization of the math library is currently not justifiable, as it would not lead to any significant reconstruction speedup.

5. Conclusions and near future perspectives

We have shown that an effective methodology for MPDRoot’s optimization consists of first isolating bottlenecks by timing benchmarks and instruction profiling. Of these, the reasonable candidates for optimization are entities already closed for modification in MPDRoot’s development lifecycle. Having such software entities in MPDRoot’s codebase is necessary for effective optimization. This means it is crucial to focus on improving software quality, which will in turn result in well-tested, modularized, professional-grade code [4], [5], [6]. We will implement the following changes to MPDRoot’s software development process:

1. Implementation of the code ownership feature – essential for the code review process
2. Implementation of the QA tests engine – to minimize risks associated with algorithm logic changes or extensions
3. 
Implementation of the unit test engine – to minimize risks associated with system changes or extensions

6. Acknowledgements

The work was supported by the RFBR grant (“Megascience – NICA”) No. 18-02-40102.

References

[1] Rogachevsky, O.V., Bychkov, A.V., Krylov, A.V. et al. Software Development and Computing for the MPD Experiment. Phys. Part. Nuclei 52, 817–820 (2021).
[2] https://valgrind.org/docs/manual/cl-manual.html (accessed 09.09.2021)
[3] https://isocpp.org/wiki/faq/inline-functions (accessed 09.09.2021)
[4] Robert C. Martin (“Uncle Bob”), Principles of OOD. Available at: http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod (accessed 09.09.2021)
[5] K.J. Lieberherr, I.M. Holland, Assuring good style for object-oriented programs. IEEE Software, September 1989, vol. 6, pp. 38-48.
[6] Andy Hunt, David Thomas, The Art of Enbugging. IEEE Software, January/February 2003, vol. 20, pp. 10-11.