V. Kůrková et al. (Eds.): ITAT 2014 with selected papers from Znalosti 2014, CEUR Workshop Proceedings Vol. 1214, pp. 97–101
http://ceur-ws.org/Vol-1214, Series ISSN 1613-0073, c 2014 F. Zavoral, J. Yaghob, D. Bednárek, M. Kruliš


        Contemplating Efficiency of the ROOT File Format for Data Intensive
                          Simulations in Particle Physics

                                Filip Zavoral, Jakub Yaghob, David Bednárek, and Martin Kruliš

                                    Parallel Architectures/Algorithms/Applications Research Group
                                   Faculty of Mathematics and Physics, Charles University in Prague
                                             Malostranské nám. 25, Prague, Czech Republic
                                    {zavoral,yaghob,bednarek,krulis}@ksi.mff.cuni.cz

Abstract: A vast majority of nuclear and particle physi-                2     Technical Background
cists in the world are currently using the ROOT framework
(developed in CERN) as a software platform for simula-                  2.1    Belle II and Basf2
tions and data evaluations. Some of the simulations and
experiments performed in this domain are very data inten-               Belle II [2] is a project headed by Ko Enerugi Kasukoki
sive and they need to be designed with particular emphasis              Kenkyu Kiko (KEK) centre in Tsukuba, Japan, that stud-
on the processing performance. We have analyzed the ef-                 ies the physical process called Charge Parity violation
ficiency of the Background Mixer component employed in                  (CP violation). Such process is responsible for breaking
particle simulations of the Belle II project and identified,            the matter-antimatter symmetry, which causes the domi-
that the ROOT file format is unacceptably inefficient. The              nance of matter over antimatter in our universe. The par-
structure of the file and the API presented in ROOT frame-              ticular task of the project is to search for new sources of
work prevents efficient data retrieval for parallel process-            CP-violation that could generate the observed asymmetry.
ing. We propose a data format which overcomes these lim-                   Since the data flow generated by the KEK particle de-
itations and which is much more suitable for high perfor-               tector is enormous (up to 1.8 GB/s), a distributed comput-
mance computing. We also demonstrate how to integrate                   ing model comprising high-performance grid sites located
this novel format into processing pipeline of the Belle II              all over the globe has been adopted [1] (Figure 1) in order
experiments and present its initial evaluation.                         to ensure better scalability, redundancy, and sharing hard-
                                                                        ware resources of experiment participants.
Keywords: particle physics, experiments, ROOT, data for-
mat, HPC, Belle II, performance


1    Introduction


Many empirical sciences, especially physics, have been in-
creasingly dependent on the computer science support. In
this paper, we focus on a particular problem that have been
observed by the particle physicists working on Belle II ex-
periment [1]. Our objective was to improve performance                               Figure 1: Belle II computing model
of a Background Mixer component, which is used for par-
ticle simulations related to the experiment. We have de-                   A software framework BASF2 (Belle II Analysis
tected that the bottleneck was the ROOT file format, which              Framework) [3] is being developed for the experimental
is being used for storing the experimental and simulated                and data evaluation purposes of the Belle II experiment.
data, and its API. In this paper, we propose a novel for-               It is designed to massively process the data using parallel
mat, which is expected to improve the performance of the                event processing on the multi-core CPUs. The software
Background Mixer module significantly.                                  is written in C++ to meet the requirements of high per-
   The paper is organized as follows. Section 2 briefly in-             formance processing and to interface well with other tools
troduces the Belle II experiment, its hardware and software             used in particle physics. Python scripting language is used
support, and the related problems of the data processing.               for the configuration scripts.
Section 3 addresses the performance issues of accessing                    BASF2 is based on a software bus architecture. A large
objects in the ROOT files. Our new data format (includ-                 scale application is designed as a set of pluggable mod-
ing implementation and integration details) is proposed in              ules [4], each of which serves as a building block of the
Section 4 and Section 5 concludes our work.                             application. The modules are assembled in an execution
98                                                                                    F. Zavoral, J. Yaghob, D. Bednárek, M. Kruliš


pipeline (which the framework denotes path), where out-          by which particles). The data file usually holds thousands
put of each module is interconnected with input of the sub-      of SimHits per each particle in the stream.
sequent module. The data are passed between the modules             When the Background Mixer is initialized, it receives
as ROOT objects (described in Section 3).                        a list of generated files with background data. Each file
                                                                 was generated with a different combination of Genera-
                                                                 tor, Component, and Detector modules. The Background
2.2 Background Mixer                                             Mixer then works as a filter of the main event stream. For
                                                                 each particle event in the main stream it loads a frame of
One of the essential parts of the experiment is an ac-
                                                                 particles and SimHits from each of the input files and adds
curate simulation of the background signal. The back-
                                                                 them to the main stream.
ground is generated by Monte Carlo method and the pro-
duced dataset is mixed with other simulated events. The
BASF2 module responsible for this task is called Back-           2.4   ROOT File Format
ground Mixer [5].
                                                                 ROOT [6] is an object-oriented C++ framework conceived
                                                                 in the high-energy physics community. It was designed
                                                                 for storing and analyzing large volumes of data. It uses
                                                                 a proprietary ROOT file format for data representation.
                                                                 Basically any C++ class/structure can be serialized into
                                                                 a ROOT file in a machine-independent compressed binary
                                                                 representation.


       Figure 2: Belle II background mixer module

   The presimulated background is added as SimHits (sim-
ulated events on particular detectors) to the already ex-
isting SimHits from the main signal. Both signals are
then mixed together, giving a realistic result. A simpli-
fied scheme of the background mixing process is depicted
in Figure 2.


2.3 Structure of Data Processing

Each BASF2 module has standardized component inter-
face which is mainly responsible for controlling the input
and output data streams. When the module path is exe-
cuted, all relevant datasets are initialized, i.e., their ROOT
objects are fetched in the main memory for onward pro-                       Figure 3: Structure of a ROOT file
cessing.
   The input data for the Background Mixer are generated            Internally, the ROOT files are organized as TTrees - con-
by three module types: Generator, Component, and De-             tainers optimized for I/O and memory usage. A TTree
tector. The Generator and Component generate main par-           consists of branches (as depicted in Figure 4 and Figure 3)
ticles and background particles respectively. The Detec-         which can contain complete serialized object of a specified
tor module simulates the particle detector and generates         class or which can be split up into sub-branches contain-
SimHit events as they would have been generated by the           ing individual data members of the original object. The
real hardware. Various types of generators, components,          splitting can be done recursively. Branch-based storage is
and detectors exist and the Background Mixer input ROOT          an example of column-wise (vertical) storage.
file keeps the types of the modules that were used for gen-         Much more detailed description of the ROOT file format
erating the data.                                                can be found in [6] and [7].
   The Background Mixer input file contains three main
data streams – mcParticles, mcSimHits, and mcPartRels.
The mcParticles stream represents the particles generated        3     ROOT Format Efficiency
by Generator and Component modules. The mcSimHits
stream holds the SimHits produced by the Detector. The           The ROOT file format and associated ROOT libraries have
msPartRels stream provides the n-to-m relation between           exhibited poor I/O performance in some situations, espe-
particles and SimHits (i.e., which SimHits were produced         cially in case of the Background Mixer component.
Contemplating Efficiency of the ROOT File Format for Data Intensive Simulations in Particle Physics                                    99


                                                                                      0
                                                                                      50
                                                                                      100
                                                                              time

                                                                                      150
                                                                                      200
                                                                                            0    500000        1000000    1500000

                                                                                                          file position
       Figure 4: Internal tree structure of a ROOT file
   Figure 5 shows how a ROOT file is accessed when its                    Figure 5: Access pattern to a ROOT binary format in case
logical structure is read sequentially. The file consists of              of serial processing
216 logical entries which were read in the same order as
when they were written. The vertical axis represents the
progress of the processing of the logical entries. The hori-
zontal axis represents the data in the binary file image (ap-
prox. 1.8 MB large). The graph marks, which portions
                                                                                      0


of the binary image were accessed (I/O operations) when
each logical entry was processed.
                                                                                      50


   The figure indicates that the processing of the very first
entry induces access to approximately 75% of the file data.
However, the processing of subsequent 70 entries were
                                                                                      100
                                                                              time


satisfied from this data cached in memory. Some of the
following entries induced additional reading of the file,
                                                                                      150


including minor portions that were already read. Never-
theless, the amount of repetitive read operations is almost
negligible and some parts of the file were never accessed.
                                                                                      200


   Figure 6 displays the effect of random access to the log-
ical entries of the same file. In this case, entries were pro-
                                                                                            0    500000        1000000    1500000
cessed in randomly permuted order. Almost every event
induced significant access to the file. The accessed areas                                                file position

have similar size as in the sequential case; however, there
is almost no effect of caching. Consequently, the total disk              Figure 6: Access pattern to a ROOT binary format in case
traffic was 53 MB (i.e. about 30 times the size of the file),             of random processing
which significantly exceeds the 1.7 MB, which were read
in the sequential case.
   Unfortunately, a random access pattern is currently em-                4          Proposed Solution
ployed by the Background Mixer since it is mixing data
from several independent files. The ROOT API permits
the user to handle multiple files at once, but internal limita-           The ROOT format is tightly coupled with many essential
tions and function invariants prevent efficient data caching.             parts of the ROOT framework, so it would be quite imprac-
According to reports generated by the cluster task manage-                tical to significantly alter its structure or API. To solve the
ment framework, the computational workload uses only                      problems observed in the Background Mixer, we propose
25% of the assigned CPU, which is unacceptably low.                       a novel GFT data format, where GFT stands for GraFT as
   The cost of data processing is a substantial part of the               we have grafted this new format from the ROOT format.
Belle II project expenses. Significant improvement of the                 It is designed specifically to preserve all the benefits of the
efficiency would enable more extensive and influential ex-                ROOT format and to optimize data I/O operations in many
periments.                                                                scenarios, especially in case of the Ground Mixer.
100                                                                                    F. Zavoral, J. Yaghob, D. Bednárek, M. Kruliš


4.1   Format Description

The original logical structure of a ROOT file comprised
only three data streams (Section 2.3). We have decided to
make the format more general, so it can possibly contain                Figure 9: Schema of the HOT data structure
arbitrary number of data streams.
   Since we are describing binary file format, technical de-
                                                                    Each HOT data structure (Figure 9) contains data for
tails regarding representation of atomic values have to be
                                                                 one frame. The data from the streams are restructured into
observed. All numbers are stored in IEEE 754 represen-
                                                                 a column-oriented format, which is much more suitable for
tation with little-endian byte ordering. Strings are stored
                                                                 modern computing architectures including GPGPU. Each
as zero-terminated sequences of 8-bit chars (C string con-
                                                                 data column is stored in a COLD (COLumn Data) struc-
vention). In the remainder of this paper we denote basic
                                                                 ture and represents one data field of one stream.
integer types as iXX (signed) and uXX (unsigned), where
XX is the number of bits (e.g., u64 stands for 64-bit un-
signed integer).
   One GFT file is produced by a combination of three
modules: Generator, Component, and Detector (as de-
scribed in Section 2.3). It is possible to have multiple files
generated with the same combination of these three mod-
ules, but data in one file has to be generated by a fixed               Figure 10: Header of the HOT data structure
combination.
                                                                   The header of the HOT structure (Figure 10) contains
                                                                 only the number of records in each data stream and an
                                                                 index to the COLD data structures. The index is imple-
                                                                 mented as a sequence of file offsets, which are relative to
                                                                 the beginning of the HOT data structure.
              Figure 7: Structure of GFT file

   A GFT file (Figure 7) contains one main header and
a sequence of HOT (Hierarchial Object Table) data struc-
tures. Each HOT represents one set (originally called
frame) of data, which is mixed with one event from the
main stream.


                                                                              Figure 11: COLD header of GFT

                                                                    The COLD data structure comprises a COLD header
                                                                 (Figure 11) and a data payload in column-oriented for-
                                                                 mat. The column data representation permits using spe-
                                                                 cific compression techniques (e.g., delta encoding com-
                                                                 bined with RLE compression), which is an important com-
                                                                 ponent of big data processing. Some forms of compression
                                                                 require additional data (like the compression dictionary),
                                                                 so we have introduced header-specific data (entries hdsize
            Figure 8: Main header of GFT file                    and hdata) to each COLD structure. The column-data pay-
                                                                 load is stored directly after the header as a vector of values
   The GFT main header (Figure 8) comprise global identi-        of given basic data type. The vector is properly aligned to
fication entries followed by the description of data streams     the size of the data type of its items.
stored in the file. Each stream is defined by its name, num-
ber of data fields in the stream record, and the data type of
                                                                 4.2   Implementation Details
the data values in the stream.
   The header can optionally contain a HOT index, which          We have implemented a prototype library for manipulation
can be used for random access to the HOT data structures.        with data in GFT files. The library has two interfaces (gft-
If the index is present, its offset from the beginning of the    reader and gft-writer), which are responsible for reading
file is stored in the idxoffs entry.                             and writing the data respectively. There is currently no
Contemplating Efficiency of the ROOT File Format for Data Intensive Simulations in Particle Physics                                   101


API for direct data manipulation, since the expected usage                   In the future work, we are going to integrate the back-
does not modify the data once they are generated.                         ground processing chain and thoroughly evaluate its ef-
   The metadata contained in the headers (main header,                    ficiency. We expect a significant speedup of the particle
HOT headers, and COLD headers) are sufficiently small,                    simulations and better utilization of the Belle II hardware
thus the API caches them for the entire time the file is                  infrastructure.
opened. The payload data are read and written in the gran-
ularity of HOT structures. A HOT structure usually takes
kilobytes or tens of kilobytes, thus the read/write opera-                Acknowledgment
tions have acceptable overhead.
   Current implementation uses standard synchronous I/O                   This paper was supported by Czech Science Foundation
of the operating system for reading and writing the file                  (GACR) projects P103/13/08195 and P103/14/14292P and
data, since it was easiest for implementation and our first               by SVV-2014-260100.
objective was to create a proof of concept only. In the fu-
ture implementations, we are planning to experiment with                  References
the asynchronous I/O and memory mapped files.
   Our preliminary experiments indicate that the reading                  [1] T. Kuhr, “Computing at Belle II,” Journal of Physics: Con-
overhead is below 10%, which is much more acceptable                          ference Series, vol. 396, 2012.
than 3000% overhead observed for the ROOT files.                          [2] Ko Enerugi Kasokuki Kenkyu Kiko (KEK) - High Energy
                                                                              Accelerator Research Organization. (2014) Belle II Project.
                                                                              [Online]. Available: http://belle2.kek.jp/
4.3    Integration in Belle Framework
                                                                          [3] R. Itoh, S. Lee, N.Katayama, S.Mineo, A.Moll, T.Kuhr,
The integration of the proposed format into the processing                    and M.Heck, “Implementation of Parallel Processing in the
chain of BASF2 is quite straightforward. We have identi-                      Basf2 Framework for Belle II,” Journal of Physics: Conf.
fied two places where the proposed format could be used.                      Ser., vol. 396, 2012.
   First, when the background are generated, they will be                 [4] D. Y. Kim, “The Software Library of the Coming Belle II
                                                                              Experiment and its Simulation Package,” in Proceedings of
directly written to the GFT format by the MCGenerator
                                                                              the 2013 IEEE Nuclear Science Symposium. IEEE, 2013.
module. The module has to be modified to use gft-writer
                                                                          [5] P. Kvasnicka, “Background mixing status and plans,” in 15th
instead of the original ROOT file API. Since the existing
                                                                              OM of the Belle II Collaboration, Blacksburgh. KEK, 2013.
pregenerated data for the Background Mixer are intended
                                                                          [6] R Brun et al. (2014) Root. [Online]. Available: http:
for the evaluation purposes only, they may be discarded
                                                                              //root.cern.ch/
and generated again (no conversion tool is required).
                                                                          [7] I. Antcheva et al, “ROOT — A C++ Framework for Petabyte
   Second, the Background Mixer has to be mod-
                                                                              Data Storage, Statistical Analysis and Visualization,” Com-
ified to use GFT format instead ROOT for-
                                                                              puter Physics Communications, vol. 182, no. 6, pp. 1384 –
mat.        The template <class SIMHITS> class                                1385, 2011.
Belle2::background::Generator has to be reimple-
mented using gft-reader. The performance improvement
of this modification should be twofold. The data stored
in the GFT format will be accessed almost sequentially
and read only once. Furthermore, an additional speedup
caused by non-interleaving access to the ROOT library
is expected, since the ROOT API will be used solely for
accessing the main particle stream.


5 Conclusion

The analysis of the Belle II Background Mixer data flows
showed that the underlying ROOT file format is unaccept-
ably inefficient for the purposes of combining data from
multiple files together. We have proposed a novel data for-
mat GTF which is more suitable for efficient data stream
processing and adopts some design features of high perfor-
mance data formats. At present, the reader and writer APIs
of the proposed GFT format are implemented in a form of
library and our preliminary performance experiments in-
dicate, that the new format has much lower overhead then
ROOT format.