<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Series</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Contemplating Efficiency of the ROOT File Format for Data Intensive Simulations in Particle Physics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Filip Zavoral</string-name>
          <email>zavoral@ksi.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jakub Yaghob</string-name>
          <email>yaghob@ksi.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Bednárek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Kruliš</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Parallel Architectures/Algorithms/Applications Research Group Faculty of Mathematics and Physics, Charles University in Prague Malostranské nám.</institution>
          <addr-line>25, Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>1214</volume>
      <fpage>97</fpage>
      <lpage>101</lpage>
      <abstract>
        <p>A vast majority of nuclear and particle physicists in the world are currently using the ROOT framework (developed in CERN) as a software platform for simulations and data evaluations. Some of the simulations and experiments performed in this domain are very data intensive and they need to be designed with particular emphasis on the processing performance. We have analyzed the efficiency of the Background Mixer component employed in particle simulations of the Belle II project and identified, that the ROOT file format is unacceptably inefficient. The structure of the file and the API presented in ROOT framework prevents efficient data retrieval for parallel processing. We propose a data format which overcomes these limitations and which is much more suitable for high performance computing. We also demonstrate how to integrate this novel format into processing pipeline of the Belle II experiments and present its initial evaluation.</p>
      </abstract>
      <kwd-group>
        <kwd>particle physics</kwd>
        <kwd>experiments</kwd>
        <kwd>ROOT</kwd>
        <kwd>data format</kwd>
        <kwd>HPC</kwd>
        <kwd>Belle II</kwd>
        <kwd>performance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Many empirical sciences, especially physics, have been
increasingly dependent on the computer science support. In
this paper, we focus on a particular problem that have been
observed by the particle physicists working on Belle II
experiment [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Our objective was to improve performance
of a Background Mixer component, which is used for
particle simulations related to the experiment. We have
detected that the bottleneck was the ROOT file format, which
is being used for storing the experimental and simulated
data, and its API. In this paper, we propose a novel
format, which is expected to improve the performance of the
Background Mixer module significantly.
      </p>
      <p>The paper is organized as follows. Section 2 briefly
introduces the Belle II experiment, its hardware and software
support, and the related problems of the data processing.
Section 3 addresses the performance issues of accessing
objects in the ROOT files. Our new data format
(including implementation and integration details) is proposed in
Section 4 and Section 5 concludes our work.
2
2.1</p>
    </sec>
    <sec id="sec-2">
      <title>Technical Background</title>
      <sec id="sec-2-1">
        <title>Belle II and Basf2</title>
        <p>
          Belle II [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is a project headed by Ko Enerugi Kasukoki
Kenkyu Kiko (KEK) centre in Tsukuba, Japan, that
studies the physical process called Charge Parity violation
(CP violation). Such process is responsible for breaking
the matter-antimatter symmetry, which causes the
dominance of matter over antimatter in our universe. The
particular task of the project is to search for new sources of
CP-violation that could generate the observed asymmetry.
        </p>
        <p>
          Since the data flow generated by the KEK particle
detector is enormous (up to 1.8 GB/s), a distributed
computing model comprising high-performance grid sites located
all over the globe has been adopted [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] (Figure 1) in order
to ensure better scalability, redundancy, and sharing
hardware resources of experiment participants.
        </p>
        <p>
          A software framework BASF2 (Belle II Analysis
Framework) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is being developed for the experimental
and data evaluation purposes of the Belle II experiment.
It is designed to massively process the data using parallel
event processing on the multi-core CPUs. The software
is written in C++ to meet the requirements of high
performance processing and to interface well with other tools
used in particle physics. Python scripting language is used
for the configuration scripts.
        </p>
        <p>
          BASF2 is based on a software bus architecture. A large
scale application is designed as a set of pluggable
modules [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], each of which serves as a building block of the
application. The modules are assembled in an execution
pipeline (which the framework denotes path), where
output of each module is interconnected with input of the
subsequent module. The data are passed between the modules
as ROOT objects (described in Section 3).
2.2
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Background Mixer</title>
        <p>
          One of the essential parts of the experiment is an
accurate simulation of the background signal. The
background is generated by Monte Carlo method and the
produced dataset is mixed with other simulated events. The
BASF2 module responsible for this task is called
Background Mixer [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>The presimulated background is added as SimHits
(simulated events on particular detectors) to the already
existing SimHits from the main signal. Both signals are
then mixed together, giving a realistic result. A
simplified scheme of the background mixing process is depicted
in Figure 2.
2.3</p>
      </sec>
      <sec id="sec-2-3">
        <title>Structure of Data Processing</title>
        <p>Each BASF2 module has standardized component
interface which is mainly responsible for controlling the input
and output data streams. When the module path is
executed, all relevant datasets are initialized, i.e., their ROOT
objects are fetched in the main memory for onward
processing.</p>
        <p>The input data for the Background Mixer are generated
by three module types: Generator, Component, and
Detector. The Generator and Component generate main
particles and background particles respectively. The
Detector module simulates the particle detector and generates
SimHit events as they would have been generated by the
real hardware. Various types of generators, components,
and detectors exist and the Background Mixer input ROOT
file keeps the types of the modules that were used for
generating the data.</p>
        <p>The Background Mixer input file contains three main
data streams – mcParticles, mcSimHits, and mcPartRels.
The mcParticles stream represents the particles generated
by Generator and Component modules. The mcSimHits
stream holds the SimHits produced by the Detector. The
msPartRels stream provides the n-to-m relation between
particles and SimHits (i.e., which SimHits were produced
by which particles). The data file usually holds thousands
of SimHits per each particle in the stream.</p>
        <p>When the Background Mixer is initialized, it receives
a list of generated files with background data. Each file
was generated with a different combination of
Generator, Component, and Detector modules. The Background
Mixer then works as a filter of the main event stream. For
each particle event in the main stream it loads a frame of
particles and SimHits from each of the input files and adds
them to the main stream.
2.4</p>
      </sec>
      <sec id="sec-2-4">
        <title>ROOT File Format</title>
        <p>
          ROOT [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] is an object-oriented C++ framework conceived
in the high-energy physics community. It was designed
for storing and analyzing large volumes of data. It uses
a proprietary ROOT file format for data representation.
Basically any C++ class/structure can be serialized into
a ROOT file in a machine-independent compressed binary
representation.
        </p>
        <p>Internally, the ROOT files are organized asTTrees -
containers optimized for I/O and memory usage. A TTree
consists of branches (as depicted in Figure 4 and Figure 3)
which can contain complete serialized object of a specified
class or which can be split up into sub-branches
containing individual data members of the original object. The
splitting can be done recursively. Branch-based storage is
an example of column-wise (vertical) storage.</p>
        <p>
          Much more detailed description of the ROOT file format
can be found in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
3
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>ROOT Format Efficiency</title>
      <p>The ROOT file format and associated ROOT libraries have
exhibited poor I/O performance in some situations,
especially in case of the Background Mixer component.
Figure 5 shows how a ROOT file is accessed when its
logical structure is read sequentially. The file consists of
216 logical entries which were read in the same order as
when they were written. The vertical axis represents the
progress of the processing of the logical entries. The
horizontal axis represents the data in the binary file image
(approx. 1.8 MB large). The graph marks, which portions
of the binary image were accessed (I/O operations) when
each logical entry was processed.</p>
      <p>The figure indicates that the processing of the very first
entry induces access to approximately 75% of the file data.
However, the processing of subsequent 70 entries were
satisfied from this data cached in memory. Some of the
following entries induced additional reading of the file,
including minor portions that were already read.
Nevertheless, the amount of repetitive read operations is almost
negligible and some parts of the file were never accessed.</p>
      <p>Figure 6 displays the effect of random access to the
logical entries of the same file. In this case, entries were
processed in randomly permuted order. Almost every event
induced significant access to the file. The accessed areas
have similar size as in the sequential case; however, there
is almost no effect of caching. Consequently, the total disk
traffic was 53 MB (i.e. about 30 times the size of the file),
which significantly exceeds the 1.7 MB, which were read
in the sequential case.</p>
      <p>Unfortunately, a random access pattern is currently
employed by the Background Mixer since it is mixing data
from several independent files. The ROOT API permits
the user to handle multiple files at once, but internal
limitations and function invariants prevent efficient data caching.
According to reports generated by the cluster task
management framework, the computational workload uses only
25% of the assigned CPU, which is unacceptably low.</p>
      <p>The cost of data processing is a substantial part of the
Belle II project expenses. Significant improvement of the
efficiency would enable more extensive and influential
experiments.
0
5
0
5
1
0
0
2
0
0
5
0
5
1
0
0
2
0
500000
1000000</p>
      <p>1500000
file position
The ROOT format is tightly coupled with many essential
parts of the ROOT framework, so it would be quite
impractical to significantly alter its structure or API. To solve the
problems observed in the Background Mixer, we propose
a novel GFT data format, where GFT stands for GraFT as
we have grafted this new format from the ROOT format.
It is designed specifically to preserve all the benefits of the
ROOT format and to optimize data I/O operations in many
scenarios, especially in case of the Ground Mixer.
4.1</p>
      <sec id="sec-3-1">
        <title>Format Description</title>
        <p>The original logical structure of a ROOT file comprised
only three data streams (Section 2.3). We have decided to
make the format more general, so it can possibly contain
arbitrary number of data streams.</p>
        <p>Since we are describing binary file format, technical
details regarding representation of atomic values have to be
observed. All numbers are stored in IEEE 754
representation with little-endian byte ordering. Strings are stored
as zero-terminated sequences of 8-bit chars (C string
convention). In the remainder of this paper we denote basic
integer types as iXX (signed) and uXX (unsigned), where
XX is the number of bits (e.g., u64 stands for 64-bit
unsigned integer).</p>
        <p>One GFT file is produced by a combination of three
modules: Generator, Component, and Detector (as
described in Section 2.3). It is possible to have multiple files
generated with the same combination of these three
modules, but data in one file has to be generated by a fixed
combination.</p>
        <p>A GFT file (Figure 7) contains one main header and
a sequence of HOT (Hierarchial Object Table) data
structures. Each HOT represents one set (originally called
frame) of data, which is mixed with one event from the
main stream.</p>
        <p>The GFT main header (Figure 8) comprise global
identification entries followed by the description of data streams
stored in the file. Each stream is defined by its name,
number of data fields in the stream record, and the data type of
the data values in the stream.</p>
        <p>The header can optionally contain a HOT index, which
can be used for random access to the HOT data structures.
If the index is present, its offset from the beginning of the
file is stored in theidxoffs entry.</p>
        <p>Each HOT data structure (Figure 9) contains data for
one frame. The data from the streams are restructured into
a column-oriented format, which is much more suitable for
modern computing architectures including GPGPU. Each
data column is stored in a COLD (COLumn Data)
structure and represents one data field of one stream.</p>
        <p>The header of the HOT structure (Figure 10) contains
only the number of records in each data stream and an
index to the COLD data structures. The index is
implemented as a sequence of file offsets, which are relative to
the beginning of the HOT data structure.</p>
        <p>The COLD data structure comprises a COLD header
(Figure 11) and a data payload in column-oriented
format. The column data representation permits using
specific compression techniques (e.g., delta encoding
combined with RLE compression), which is an important
component of big data processing. Some forms of compression
require additional data (like the compression dictionary),
so we have introduced header-specific data (entrieshdsize
and hdata) to each COLD structure. The column-data
payload is stored directly after the header as a vector of values
of given basic data type. The vector is properly aligned to
the size of the data type of its items.
4.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Implementation Details</title>
        <p>We have implemented a prototype library for manipulation
with data in GFT files. The library has two interfaces
(gftreader and gft-writer), which are responsible for reading
and writing the data respectively. There is currently no
API for direct data manipulation, since the expected usage
does not modify the data once they are generated.</p>
        <p>The metadata contained in the headers (main header,
HOT headers, and COLD headers) are sufficiently small,
thus the API caches them for the entire time the file is
opened. The payload data are read and written in the
granularity of HOT structures. A HOT structure usually takes
kilobytes or tens of kilobytes, thus the read/write
operations have acceptable overhead.</p>
        <p>Current implementation uses standard synchronous I/O
of the operating system for reading and writing the file
data, since it was easiest for implementation and our first
objective was to create a proof of concept only. In the
future implementations, we are planning to experiment with
the asynchronous I/O and memory mapped files.</p>
        <p>Our preliminary experiments indicate that the reading
overhead is below 10%, which is much more acceptable
than 3000% overhead observed for the ROOT files.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Integration in Belle Framework</title>
        <p>The integration of the proposed format into the processing
chain of BASF2 is quite straightforward. We have
identified two places where the proposed format could be used.</p>
        <p>First, when the background are generated, they will be
directly written to the GFT format by the MCGenerator
module. The module has to be modified to use gft-writer
instead of the original ROOT file API. Since the existing
pregenerated data for the Background Mixer are intended
for the evaluation purposes only, they may be discarded
and generated again (no conversion tool is required).</p>
        <p>Second, the Background Mixer has to be
modified to use GFT format instead ROOT
format. The template &lt;class SIMHITS&gt; class
Belle2::background::Generator has to be
reimplemented using gft-reader. The performance improvement
of this modification should be twofold. The data stored
in the GFT format will be accessed almost sequentially
and read only once. Furthermore, an additional speedup
caused by non-interleaving access to the ROOT library
is expected, since the ROOT API will be used solely for
accessing the main particle stream.
5</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The analysis of the Belle II Background Mixer data flows
showed that the underlying ROOT file format is
unacceptably inefficient for the purposes of combining data from
multiple files together. We have proposed a novel data
format GTF which is more suitable for efficient data stream
processing and adopts some design features of high
performance data formats. At present, the reader and writer APIs
of the proposed GFT format are implemented in a form of
library and our preliminary performance experiments
indicate, that the new format has much lower overhead then
ROOT format.</p>
      <p>In the future work, we are going to integrate the
background processing chain and thoroughly evaluate its
efficiency. We expect a significant speedup of the particle
simulations and better utilization of the Belle II hardware
infrastructure.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgment</title>
      <p>This paper was supported by Czech Science Foundation
(GACR) projects P103/13/08195 and P103/14/14292P and
by SVV-2014-260100.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kuhr</surname>
          </string-name>
          , “Computing at
          <string-name>
            <surname>Belle</surname>
            <given-names>II</given-names>
          </string-name>
          ,
          <source>” Journal of Physics: Conference Series</source>
          , vol.
          <volume>396</volume>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ko</given-names>
            <surname>Enerugi Kasokuki Kenkyu Kiko (KEK) - High Energy</surname>
          </string-name>
          Accelerator Research Organization. (
          <year>2014</year>
          )
          <article-title>Belle II Project</article-title>
          . [Online]. Available: http://belle2.kek.jp/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Itoh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Katayama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mineo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kuhr</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Heck</surname>
          </string-name>
          , “
          <article-title>Implementation of Parallel Processing in the Basf2 Framework for Belle II,”</article-title>
          <source>Journal of Physics: Conf. Ser.</source>
          , vol.
          <volume>396</volume>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          , “
          <article-title>The Software Library of the Coming Belle II Experiment and</article-title>
          its Simulation Package,”
          <source>in Proceedings of the 2013 IEEE Nuclear Science Symposium. IEEE</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kvasnicka</surname>
          </string-name>
          , “
          <article-title>Background mixing status and plans,” in 15th OM of the Belle II Collaboration, Blacksburgh</article-title>
          . KEK,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R</given-names>
            <surname>Brun</surname>
          </string-name>
          et al. (
          <year>2014</year>
          )
          <article-title>Root</article-title>
          . [Online]. Available: http: //root.cern.ch/
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I. Antcheva</given-names>
            et al, “
            <surname>ROOT - A C+</surname>
          </string-name>
          <article-title>+ Framework for Petabyte Data Storage, Statistical Analysis</article-title>
          and Visualization,”
          <source>Computer Physics Communications</source>
          , vol.
          <volume>182</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>1384</fpage>
          -
          <lpage>1385</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>