V. Kůrková et al. (Eds.): ITAT 2014 with selected papers from Znalosti 2014, CEUR Workshop Proceedings Vol. 1214, pp. 97–101 http://ceur-ws.org/Vol-1214, Series ISSN 1613-0073, c 2014 F. Zavoral, J. Yaghob, D. Bednárek, M. Kruliš Contemplating Efficiency of the ROOT File Format for Data Intensive Simulations in Particle Physics Filip Zavoral, Jakub Yaghob, David Bednárek, and Martin Kruliš Parallel Architectures/Algorithms/Applications Research Group Faculty of Mathematics and Physics, Charles University in Prague Malostranské nám. 25, Prague, Czech Republic {zavoral,yaghob,bednarek,krulis}@ksi.mff.cuni.cz Abstract: A vast majority of nuclear and particle physi- 2 Technical Background cists in the world are currently using the ROOT framework (developed in CERN) as a software platform for simula- 2.1 Belle II and Basf2 tions and data evaluations. Some of the simulations and experiments performed in this domain are very data inten- Belle II [2] is a project headed by Ko Enerugi Kasukoki sive and they need to be designed with particular emphasis Kenkyu Kiko (KEK) centre in Tsukuba, Japan, that stud- on the processing performance. We have analyzed the ef- ies the physical process called Charge Parity violation ficiency of the Background Mixer component employed in (CP violation). Such process is responsible for breaking particle simulations of the Belle II project and identified, the matter-antimatter symmetry, which causes the domi- that the ROOT file format is unacceptably inefficient. The nance of matter over antimatter in our universe. The par- structure of the file and the API presented in ROOT frame- ticular task of the project is to search for new sources of work prevents efficient data retrieval for parallel process- CP-violation that could generate the observed asymmetry. ing. We propose a data format which overcomes these lim- Since the data flow generated by the KEK particle de- itations and which is much more suitable for high perfor- tector is enormous (up to 1.8 GB/s), a distributed comput- mance computing. We also demonstrate how to integrate ing model comprising high-performance grid sites located this novel format into processing pipeline of the Belle II all over the globe has been adopted [1] (Figure 1) in order experiments and present its initial evaluation. to ensure better scalability, redundancy, and sharing hard- ware resources of experiment participants. Keywords: particle physics, experiments, ROOT, data for- mat, HPC, Belle II, performance 1 Introduction Many empirical sciences, especially physics, have been in- creasingly dependent on the computer science support. In this paper, we focus on a particular problem that have been observed by the particle physicists working on Belle II ex- periment [1]. Our objective was to improve performance Figure 1: Belle II computing model of a Background Mixer component, which is used for par- ticle simulations related to the experiment. We have de- A software framework BASF2 (Belle II Analysis tected that the bottleneck was the ROOT file format, which Framework) [3] is being developed for the experimental is being used for storing the experimental and simulated and data evaluation purposes of the Belle II experiment. data, and its API. In this paper, we propose a novel for- It is designed to massively process the data using parallel mat, which is expected to improve the performance of the event processing on the multi-core CPUs. The software Background Mixer module significantly. is written in C++ to meet the requirements of high per- The paper is organized as follows. Section 2 briefly in- formance processing and to interface well with other tools troduces the Belle II experiment, its hardware and software used in particle physics. Python scripting language is used support, and the related problems of the data processing. for the configuration scripts. Section 3 addresses the performance issues of accessing BASF2 is based on a software bus architecture. A large objects in the ROOT files. Our new data format (includ- scale application is designed as a set of pluggable mod- ing implementation and integration details) is proposed in ules [4], each of which serves as a building block of the Section 4 and Section 5 concludes our work. application. The modules are assembled in an execution 98 F. Zavoral, J. Yaghob, D. Bednárek, M. Kruliš pipeline (which the framework denotes path), where out- by which particles). The data file usually holds thousands put of each module is interconnected with input of the sub- of SimHits per each particle in the stream. sequent module. The data are passed between the modules When the Background Mixer is initialized, it receives as ROOT objects (described in Section 3). a list of generated files with background data. Each file was generated with a different combination of Genera- tor, Component, and Detector modules. The Background 2.2 Background Mixer Mixer then works as a filter of the main event stream. For each particle event in the main stream it loads a frame of One of the essential parts of the experiment is an ac- particles and SimHits from each of the input files and adds curate simulation of the background signal. The back- them to the main stream. ground is generated by Monte Carlo method and the pro- duced dataset is mixed with other simulated events. The BASF2 module responsible for this task is called Back- 2.4 ROOT File Format ground Mixer [5]. ROOT [6] is an object-oriented C++ framework conceived in the high-energy physics community. It was designed for storing and analyzing large volumes of data. It uses a proprietary ROOT file format for data representation. Basically any C++ class/structure can be serialized into a ROOT file in a machine-independent compressed binary representation. Figure 2: Belle II background mixer module The presimulated background is added as SimHits (sim- ulated events on particular detectors) to the already ex- isting SimHits from the main signal. Both signals are then mixed together, giving a realistic result. A simpli- fied scheme of the background mixing process is depicted in Figure 2. 2.3 Structure of Data Processing Each BASF2 module has standardized component inter- face which is mainly responsible for controlling the input and output data streams. When the module path is exe- cuted, all relevant datasets are initialized, i.e., their ROOT objects are fetched in the main memory for onward pro- Figure 3: Structure of a ROOT file cessing. The input data for the Background Mixer are generated Internally, the ROOT files are organized as TTrees - con- by three module types: Generator, Component, and De- tainers optimized for I/O and memory usage. A TTree tector. The Generator and Component generate main par- consists of branches (as depicted in Figure 4 and Figure 3) ticles and background particles respectively. The Detec- which can contain complete serialized object of a specified tor module simulates the particle detector and generates class or which can be split up into sub-branches contain- SimHit events as they would have been generated by the ing individual data members of the original object. The real hardware. Various types of generators, components, splitting can be done recursively. Branch-based storage is and detectors exist and the Background Mixer input ROOT an example of column-wise (vertical) storage. file keeps the types of the modules that were used for gen- Much more detailed description of the ROOT file format erating the data. can be found in [6] and [7]. The Background Mixer input file contains three main data streams – mcParticles, mcSimHits, and mcPartRels. The mcParticles stream represents the particles generated 3 ROOT Format Efficiency by Generator and Component modules. The mcSimHits stream holds the SimHits produced by the Detector. The The ROOT file format and associated ROOT libraries have msPartRels stream provides the n-to-m relation between exhibited poor I/O performance in some situations, espe- particles and SimHits (i.e., which SimHits were produced cially in case of the Background Mixer component. Contemplating Efficiency of the ROOT File Format for Data Intensive Simulations in Particle Physics 99 0 50 100 time 150 200 0 500000 1000000 1500000 file position Figure 4: Internal tree structure of a ROOT file Figure 5 shows how a ROOT file is accessed when its Figure 5: Access pattern to a ROOT binary format in case logical structure is read sequentially. The file consists of of serial processing 216 logical entries which were read in the same order as when they were written. The vertical axis represents the progress of the processing of the logical entries. The hori- zontal axis represents the data in the binary file image (ap- prox. 1.8 MB large). The graph marks, which portions 0 of the binary image were accessed (I/O operations) when each logical entry was processed. 50 The figure indicates that the processing of the very first entry induces access to approximately 75% of the file data. However, the processing of subsequent 70 entries were 100 time satisfied from this data cached in memory. Some of the following entries induced additional reading of the file, 150 including minor portions that were already read. Never- theless, the amount of repetitive read operations is almost negligible and some parts of the file were never accessed. 200 Figure 6 displays the effect of random access to the log- ical entries of the same file. In this case, entries were pro- 0 500000 1000000 1500000 cessed in randomly permuted order. Almost every event induced significant access to the file. The accessed areas file position have similar size as in the sequential case; however, there is almost no effect of caching. Consequently, the total disk Figure 6: Access pattern to a ROOT binary format in case traffic was 53 MB (i.e. about 30 times the size of the file), of random processing which significantly exceeds the 1.7 MB, which were read in the sequential case. Unfortunately, a random access pattern is currently em- 4 Proposed Solution ployed by the Background Mixer since it is mixing data from several independent files. The ROOT API permits the user to handle multiple files at once, but internal limita- The ROOT format is tightly coupled with many essential tions and function invariants prevent efficient data caching. parts of the ROOT framework, so it would be quite imprac- According to reports generated by the cluster task manage- tical to significantly alter its structure or API. To solve the ment framework, the computational workload uses only problems observed in the Background Mixer, we propose 25% of the assigned CPU, which is unacceptably low. a novel GFT data format, where GFT stands for GraFT as The cost of data processing is a substantial part of the we have grafted this new format from the ROOT format. Belle II project expenses. Significant improvement of the It is designed specifically to preserve all the benefits of the efficiency would enable more extensive and influential ex- ROOT format and to optimize data I/O operations in many periments. scenarios, especially in case of the Ground Mixer. 100 F. Zavoral, J. Yaghob, D. Bednárek, M. Kruliš 4.1 Format Description The original logical structure of a ROOT file comprised only three data streams (Section 2.3). We have decided to make the format more general, so it can possibly contain Figure 9: Schema of the HOT data structure arbitrary number of data streams. Since we are describing binary file format, technical de- Each HOT data structure (Figure 9) contains data for tails regarding representation of atomic values have to be one frame. The data from the streams are restructured into observed. All numbers are stored in IEEE 754 represen- a column-oriented format, which is much more suitable for tation with little-endian byte ordering. Strings are stored modern computing architectures including GPGPU. Each as zero-terminated sequences of 8-bit chars (C string con- data column is stored in a COLD (COLumn Data) struc- vention). In the remainder of this paper we denote basic ture and represents one data field of one stream. integer types as iXX (signed) and uXX (unsigned), where XX is the number of bits (e.g., u64 stands for 64-bit un- signed integer). One GFT file is produced by a combination of three modules: Generator, Component, and Detector (as de- scribed in Section 2.3). It is possible to have multiple files generated with the same combination of these three mod- ules, but data in one file has to be generated by a fixed Figure 10: Header of the HOT data structure combination. The header of the HOT structure (Figure 10) contains only the number of records in each data stream and an index to the COLD data structures. The index is imple- mented as a sequence of file offsets, which are relative to the beginning of the HOT data structure. Figure 7: Structure of GFT file A GFT file (Figure 7) contains one main header and a sequence of HOT (Hierarchial Object Table) data struc- tures. Each HOT represents one set (originally called frame) of data, which is mixed with one event from the main stream. Figure 11: COLD header of GFT The COLD data structure comprises a COLD header (Figure 11) and a data payload in column-oriented for- mat. The column data representation permits using spe- cific compression techniques (e.g., delta encoding com- bined with RLE compression), which is an important com- ponent of big data processing. Some forms of compression require additional data (like the compression dictionary), so we have introduced header-specific data (entries hdsize Figure 8: Main header of GFT file and hdata) to each COLD structure. The column-data pay- load is stored directly after the header as a vector of values The GFT main header (Figure 8) comprise global identi- of given basic data type. The vector is properly aligned to fication entries followed by the description of data streams the size of the data type of its items. stored in the file. Each stream is defined by its name, num- ber of data fields in the stream record, and the data type of 4.2 Implementation Details the data values in the stream. The header can optionally contain a HOT index, which We have implemented a prototype library for manipulation can be used for random access to the HOT data structures. with data in GFT files. The library has two interfaces (gft- If the index is present, its offset from the beginning of the reader and gft-writer), which are responsible for reading file is stored in the idxoffs entry. and writing the data respectively. There is currently no Contemplating Efficiency of the ROOT File Format for Data Intensive Simulations in Particle Physics 101 API for direct data manipulation, since the expected usage In the future work, we are going to integrate the back- does not modify the data once they are generated. ground processing chain and thoroughly evaluate its ef- The metadata contained in the headers (main header, ficiency. We expect a significant speedup of the particle HOT headers, and COLD headers) are sufficiently small, simulations and better utilization of the Belle II hardware thus the API caches them for the entire time the file is infrastructure. opened. The payload data are read and written in the gran- ularity of HOT structures. A HOT structure usually takes kilobytes or tens of kilobytes, thus the read/write opera- Acknowledgment tions have acceptable overhead. Current implementation uses standard synchronous I/O This paper was supported by Czech Science Foundation of the operating system for reading and writing the file (GACR) projects P103/13/08195 and P103/14/14292P and data, since it was easiest for implementation and our first by SVV-2014-260100. objective was to create a proof of concept only. In the fu- ture implementations, we are planning to experiment with References the asynchronous I/O and memory mapped files. Our preliminary experiments indicate that the reading [1] T. Kuhr, “Computing at Belle II,” Journal of Physics: Con- overhead is below 10%, which is much more acceptable ference Series, vol. 396, 2012. than 3000% overhead observed for the ROOT files. [2] Ko Enerugi Kasokuki Kenkyu Kiko (KEK) - High Energy Accelerator Research Organization. (2014) Belle II Project. [Online]. Available: http://belle2.kek.jp/ 4.3 Integration in Belle Framework [3] R. Itoh, S. Lee, N.Katayama, S.Mineo, A.Moll, T.Kuhr, The integration of the proposed format into the processing and M.Heck, “Implementation of Parallel Processing in the chain of BASF2 is quite straightforward. We have identi- Basf2 Framework for Belle II,” Journal of Physics: Conf. fied two places where the proposed format could be used. Ser., vol. 396, 2012. First, when the background are generated, they will be [4] D. Y. Kim, “The Software Library of the Coming Belle II Experiment and its Simulation Package,” in Proceedings of directly written to the GFT format by the MCGenerator the 2013 IEEE Nuclear Science Symposium. IEEE, 2013. module. The module has to be modified to use gft-writer [5] P. Kvasnicka, “Background mixing status and plans,” in 15th instead of the original ROOT file API. Since the existing OM of the Belle II Collaboration, Blacksburgh. KEK, 2013. pregenerated data for the Background Mixer are intended [6] R Brun et al. (2014) Root. [Online]. Available: http: for the evaluation purposes only, they may be discarded //root.cern.ch/ and generated again (no conversion tool is required). [7] I. Antcheva et al, “ROOT — A C++ Framework for Petabyte Second, the Background Mixer has to be mod- Data Storage, Statistical Analysis and Visualization,” Com- ified to use GFT format instead ROOT for- puter Physics Communications, vol. 182, no. 6, pp. 1384 – mat. The template class 1385, 2011. Belle2::background::Generator has to be reimple- mented using gft-reader. The performance improvement of this modification should be twofold. The data stored in the GFT format will be accessed almost sequentially and read only once. Furthermore, an additional speedup caused by non-interleaving access to the ROOT library is expected, since the ROOT API will be used solely for accessing the main particle stream. 5 Conclusion The analysis of the Belle II Background Mixer data flows showed that the underlying ROOT file format is unaccept- ably inefficient for the purposes of combining data from multiple files together. We have proposed a novel data for- mat GTF which is more suitable for efficient data stream processing and adopts some design features of high perfor- mance data formats. At present, the reader and writer APIs of the proposed GFT format are implemented in a form of library and our preliminary performance experiments in- dicate, that the new format has much lower overhead then ROOT format.