<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ANALYSIS OF THE EFFECTIVENESS OF VARIOUS METHODS FOR PARALLELIZING DATA PROCESSING IMPLEMENTED IN THE ROOT PACKAGE</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>T.M. Solovjeva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tatiana Solovjeva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Joint Institute for Nuclear Research</institution>
          ,
          <addr-line>6 Joliot-Curie St., Dubna, Moscow Region, 141980</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>5</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>The ROOT software package is currently being upgraded in several ways to improve data processing performance. This paper considers several tools implemented in this framework for calculations on modern heterogeneous computing architectures. PROOF (Parallel ROOT Facility) divides the overall work into small chunks called packets. The size of the first packet used for calibration, the configured minimum and maximum packet size, and the degree of data structuring all affect the processing speed. When processing large amounts of data, the read and write speed can be crucial. The new asynchronous file merge feature in the TBufferMerger class allows data to be written in parallel from multiple threads to a single output file. Our calculations show that the macro execution time scales well with the number of processor cores used.</p>
      </abstract>
      <kwd-group>
        <kwd>ROOT</kwd>
        <kwd>PROOF</kwd>
        <kwd>parallelization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The ROOT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] software package plays a central role in the analysis of high-energy physics
data. ROOT was originally developed as a single-threaded application. However,
as the volume of processed data grew and computing technology advanced, it
became obvious that the framework needed to be modernized to take advantage of modern
computing architectures. At present, multiprocessor computing systems, which consist of fairly
powerful computers with distributed memory and distributed control, are widespread. To use them
effectively, one must take into account how the data is structured and the sequence
of processing stages for each specific analysis. This article explores various parallelization
techniques implemented in the ROOT package to improve data processing performance. Performance
tests were carried out on the HybriLIT heterogeneous cluster [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] of the Meshcheryakov Laboratory of
Information Technologies of the Joint Institute for Nuclear Research.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Data structuring for processing</title>
      <p>
        PROOF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the Parallel ROOT Facility, is a ROOT tool that exploits the natural parallelism of
data structures stored in files of a special format that provides direct access to any individual value.
In our previous work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we compared the efficiency of data processing using PROOF, a dedicated
threading class, and the OpenMP technology. That study showed that PROOF performs well when
the analysis involves a large number of mathematical operations and is run on a large
amount of data. The present article examines whether PROOF performance
depends on the way the data file is organized, and also considers the possibilities of speeding up data
writing by parallelizing it.
      </p>
      <p>The calculations were performed using ROOT 6.13 and PROOF-Lite. Six CPU cores
were reserved on the HybriLIT cluster for these runs. PROOF was run with different
numbers of workers. As long as the number of workers did not exceed the
number of cores, each worker process fully occupied one processor core. If
PROOF was started with more workers than reserved cores, several workers
naturally shared one processor core. The acceleration factor (the ratio of the execution time of a macro in
the non-parallel version to its execution time in the parallel version) grew steadily as the
number of workers increased from two to six, and then leveled off. When PROOF was started with more than
12 workers, the acceleration factor decreased smoothly. The data shown in the graphs was obtained with
six workers.</p>
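      <p>The acceleration factor defined above, together with a derived per-worker efficiency, can be computed as in the following minimal Python sketch. The function names and the timing values are hypothetical placeholders chosen for illustration; they are not measurements from this work.</p>
      <preformat><![CDATA[
```python
def acceleration_factor(t_serial, t_parallel):
    """Ratio of the non-parallel run time to the parallel run time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_workers):
    """Acceleration factor divided by the number of workers (1.0 is ideal)."""
    return acceleration_factor(t_serial, t_parallel) / n_workers


# Placeholder timings in seconds (NOT measured data): workers -> run time.
t_serial = 120.0
timings = {2: 60.0, 4: 40.0, 6: 30.0}

for n_workers, t_parallel in sorted(timings.items()):
    s = acceleration_factor(t_serial, t_parallel)
    e = efficiency(t_serial, t_parallel, n_workers)
    print(f"workers={n_workers}  acceleration={s:.2f}  efficiency={e:.2f}")
```
]]></preformat>
      <p>An efficiency that falls well below one as workers are added indicates that the extra workers are sharing cores or waiting on input and output rather than computing.</p>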
      <p>To represent data, ROOT provides a special class, TTree, which is optimized to reduce
disk space and increase the data access rate. All variables are represented as leaves, which
are combined into branches, and the collection of branches forms a tree. The tree storage approach
suits parallel architectures, since each branch can be read independently of the others. A tree can
have different structures. We were interested in how the organization of data in a tree
affects the effectiveness of PROOF. We considered a simple tree with a list of variables, a tree
with structures, and a tree with class objects.</p>
      <p>All of our trees contained twelve variables structured in different ways. The simple tree had
twelve branches, each with a single floating-point variable as its leaf. In the second case, the tree had three
branches: the first two each contained a structure of three leaves, and the third
branch was a structure of six float variables. In the third case, the tree had one branch that contained
an object consisting of all twelve variables. The data was contained in ten files, each of which was
approximately 1.6 GB in size. We performed two analyses. In the first, data was read from all
branches and processed, and the result was written to a new file. In the second analysis, the data
of only two variables was read. For the tree with structures, we considered two reading
options: in the first, the two variables belonged to the same structure, and in the second, they
belonged to different structures. The processing
included calculations of various physical quantities commonly used in different types of physical
analysis.</p>
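      <p>The three layouts and the two-variable reading options described above can be written down as in the following Python sketch. The variable and branch names (v0 to v11, s1 to s3, obj) are hypothetical, since the text does not name them. The sketch assumes that a whole branch is the smallest unit that can be read; ROOT may split an object branch into sub-branches depending on the split level, so the sketch only describes the layouts and does not predict the measured acceleration factors.</p>
      <preformat><![CDATA[
```python
# Hypothetical variable names v0..v11; the paper does not name them.
VARIABLES = [f"v{i}" for i in range(12)]

# Each layout maps a branch name to the leaves (variables) it holds.
LAYOUTS = {
    # Simple tree: twelve branches with one float leaf each.
    "simple": {name: [name] for name in VARIABLES},
    # Tree with structures: two structures of three floats, one of six.
    "structures": {
        "s1": VARIABLES[0:3],
        "s2": VARIABLES[3:6],
        "s3": VARIABLES[6:12],
    },
    # Tree with an object: one branch holding all twelve variables.
    "object": {"obj": list(VARIABLES)},
}


def branches_to_read(layout, wanted):
    """Return the branches that hold at least one wanted variable.

    Assumes a whole branch is the smallest unit that can be read.
    """
    wanted = set(wanted)
    return sorted(b for b, leaves in layout.items() if wanted & set(leaves))


for label, layout in LAYOUTS.items():
    total = sum(len(leaves) for leaves in layout.values())
    print(label, len(layout), "branches,", total, "variables")

# Two variables from the same structure versus from different structures.
print(branches_to_read(LAYOUTS["structures"], ["v0", "v1"]))
print(branches_to_read(LAYOUTS["structures"], ["v0", "v3"]))
```
]]></preformat>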
      <p>As seen from the graphs, when all data is read and processed using PROOF, the acceleration
factor is highest if the data is organized into structures. If only individual columns of ROOT
files are read and processed, the highest acceleration factor is achieved when the data is organized
as a tree containing objects.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Parallel reading and writing</title>
      <p>
        Different types of analysis spend different fractions of their time on the individual processing
steps. When large volumes of data are processed, the time spent on reading or writing
can become dominant. New possibilities for parallelizing these operations are actively used in the
optimization of HEP software [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5-7</xref>
        ].
      </p>
      <p>A distinctive feature of ROOT, which makes it easy to use, is its columnar data format:
users can read only the columns that are of interest to them. During
reading, the data is unpacked and deserialized. Starting from version 6.08, ROOT provides functions that perform these
actions in parallel. The user only has to specify the number of
threads. All implementation details are hidden from the user, which is why this approach is called
Implicit Multithreading.</p>
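      <p>In ROOT itself, this mode is switched on with a single call to ROOT::EnableImplicitMT, which accepts the number of threads. The following Python sketch is not ROOT code; it only illustrates the idea with the standard library: each column is stored as a separately compressed buffer, only the requested columns are read, and their unpacking and deserialization run in a thread pool whose size is the only parameter the caller sets. The column names and helper functions are hypothetical.</p>
      <preformat><![CDATA[
```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def pack_column(values):
    """Serialize a list of floats to 32-bit values and compress the bytes."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return zlib.compress(raw)


def unpack_column(blob):
    """Decompress a column buffer and deserialize it back to floats."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def read_columns(store, names, n_threads):
    """Read only the requested columns, unpacking them in a thread pool."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = pool.map(unpack_column, [store[n] for n in names])
        return dict(zip(names, results))


# Toy columnar store; the values are exactly representable as 32-bit floats.
store = {
    "px": pack_column([0.5 * i for i in range(1000)]),
    "py": pack_column([0.25 * i for i in range(1000)]),
    "pz": pack_column([float(i) for i in range(1000)]),
}

# Only two of the three columns are decompressed and deserialized.
columns = read_columns(store, ["px", "pz"], n_threads=2)
print(sorted(columns), len(columns["px"]))
```
]]></preformat>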
      <p>For parallel writing, the TBufferMerger class, available in ROOT since version 6.10, is
used. It implements the following scheme. The user specifies how many threads
writing data to a single file should be created. The worker threads divide the data into separate
buffers, in each of which the data is serialized and compressed. These steps proceed
independently in each buffer, so they can be performed in parallel. The buffers are merged before
the output file is closed. In the original version, the merge was done by a separate output thread. Later, a scheme
in which the worker threads themselves perform the merge on demand was implemented. This
reduced the required computing resources.</p>
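      <p>The scheme described above can be illustrated with a standard-library Python sketch: every worker thread serializes and compresses its own buffer independently, and only the short merge step into the shared output is protected by a lock and performed by the worker itself. The class ToyBufferMerger and all function names are hypothetical and do not reproduce the interface of TBufferMerger; the length-prefixed output format is likewise an assumption made for this example.</p>
      <preformat><![CDATA[
```python
import io
import struct
import threading
import zlib


class ToyBufferMerger:
    """Collects compressed buffers from several threads into one output."""

    def __init__(self, output):
        self._output = output          # any binary file-like object
        self._lock = threading.Lock()  # serializes only the merge step

    def merge(self, blob):
        """Append one length-prefixed compressed buffer to the output."""
        with self._lock:
            self._output.write(struct.pack("<I", len(blob)))
            self._output.write(blob)


def worker(merger, events):
    """Serialize and compress a private buffer, then merge it on demand."""
    raw = struct.pack(f"<{len(events)}i", *events)  # serialize
    merger.merge(zlib.compress(raw))                # compress, then merge


def read_all(data):
    """Read every buffer back and return the events it contains."""
    events, pos = [], 0
    while pos < len(data):
        (size,) = struct.unpack_from("<I", data, pos)
        raw = zlib.decompress(data[pos + 4:pos + 4 + size])
        events.extend(struct.unpack(f"<{len(raw) // 4}i", raw))
        pos += 4 + size
    return events


n_threads, n_events = 4, 1000
output = io.BytesIO()
merger = ToyBufferMerger(output)
# Each thread receives its own share of the event numbers.
chunks = [list(range(i, n_events, n_threads)) for i in range(n_threads)]
threads = [threading.Thread(target=worker, args=(merger, c)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

events = read_all(output.getvalue())
print(len(events), sorted(events) == list(range(n_events)))
```
]]></preformat>
      <p>Because the buffers arrive in whatever order the threads finish, the events in the output are not globally ordered; the check above therefore compares the sorted contents.</p>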
      <p>Our performance tests consisted of generating data, organizing it into a tree with twelve
branches, and then writing it to a file. The left panel of Figure 2 shows the dependence of the running time of
a macro on the number of worker threads and the number of generated events. The right panel shows the dependence of the
acceleration factor on the same factors.</p>
      <p>The results presented in the graphs show that the processing time decreases as
the number of created threads increases. Once the number of worker threads exceeds the number
of cores in use, the macro runtime stops decreasing. The acceleration factor is proportional to the
number of available processor cores and does not depend on the number of generated events.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>Parallel computing is currently the main resource for increasing the speed of calculations;
the parallelization of the ROOT package is therefore of great importance. Modern
computing architectures make it possible to use various methods of parallelizing the
processing of experimental data. Choosing the optimal method in each specific case significantly
reduces the task execution time. Our research has shown that when processing data using PROOF, it is
desirable to structure the primary data as highly as possible. When working with PROOF-Lite,
it is reasonable to set the number of workers equal to the number of reserved cores.
Writing to a file in parallel is easy to implement, and the results depend on the number of
cores used.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Acknowledgement</title>
      <p>The author is grateful to colleagues for fruitful discussions, as well as to the
HybriLIT group of MLIT JINR for the resources and technical support provided.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Brun</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rademakers</surname>
            <given-names>F.</given-names>
          </string-name>
          <article-title>ROOT - An object oriented data analysis framework</article-title>
          //
          <source>Nuclear Instruments and Methods in Physics Research A</source>
          .
          <year>1997</year>
          . V.
          <volume>389</volume>
          . P.
          <fpage>81</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Adam</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korenkov</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Podgainy</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Streltsova</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strizh</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zrelov</surname>
            <given-names>P.</given-names>
          </string-name>
          <article-title>HybriLIT - The main component of the MICC for heterogeneous computations at JINR</article-title>
          //
          <source>CEUR Workshop Proceedings (CEUR-WS.org)</source>
          .
          <year>2017</year>
          . V.
          <volume>2023</volume>
          . P.
          <fpage>351</fpage>
          -
          <lpage>356</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Ballintijn</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roland</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brun</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rademakers</surname>
            <given-names>F.</given-names>
          </string-name>
          <article-title>The PROOF distributed parallel analysis framework based on ROOT</article-title>
          //
          <conf-name>Computing in High Energy and Nuclear Physics</conf-name>
          ,
          <conf-date>24-28 March 2003</conf-date>
          ,
          <conf-loc>La Jolla, California</conf-loc>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Solovjeva</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soloviev</surname>
            <given-names>A.</given-names>
          </string-name>
          <article-title>Comparative study of the effectiveness of PROOF with other parallelization methods implemented in the ROOT software package</article-title>
          //
          <source>Computer Physics Communications</source>
          .
          <year>2018</year>
          . V.
          <volume>233</volume>
          . P.
          <fpage>41</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Amadio</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bockelman</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Canal</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piparo</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tejedor</surname>
            <given-names>E.</given-names>
          </string-name>
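          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>Z.</given-names>
          </string-name>
          <article-title>Increasing Parallelism in the ROOT I/O Subsystem</article-title>
          //
          <source>JoP: Conf. Series</source>
          .
          <year>2018</year>
          . V.
          <volume>1085</volume>
          . P.
          <elocation-id>032014</elocation-id>
          .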
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Riley</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            <given-names>C.</given-names>
          </string-name>
          <article-title>Multi-threaded Output in CMS using ROOT</article-title>
          //
          <source>EPJ Web of Conferences</source>
          .
          <year>2019</year>
          . V.
          <volume>214</volume>
          . P.
          <elocation-id>02016</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Amadio</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Canal</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guiraud</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piparo</surname>
            <given-names>D.</given-names>
          </string-name>
          <article-title>Writing ROOT Data in Parallel with TBufferMerger</article-title>
          //
          <source>EPJ Web of Conferences</source>
          .
          <year>2019</year>
          . V.
          <volume>214</volume>
          . P.
          <elocation-id>05037</elocation-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>