<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MULTI-GPU TRAINING AND PARALLEL CPU COMPUTING FOR THE MACHINE LEARNING EXPERIMENTS USING ARIADNE LIBRARY</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>P. Goncharov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Nikolskaia</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>G. Ososkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E. Rezvaya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D. Rusov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E. Shchavelev</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Joint Institute for Nuclear Research</institution>
          ,
          <addr-line>6 Joliot-Curie street, 141980, Dubna, Moscow region</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Pavel Goncharov</institution>
          ,
          <addr-line>Anastasiia Nikolskaia, Gennady Ososkov, Ekaterina Rezvaya, Daniil Rusov, Egor Shchavelev</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Saint Petersburg State University</institution>
          ,
          <addr-line>7-9 Universitetskaya emb., Saint Petersburg, 199034</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>5</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>Modern machine learning (ML) tasks and neural network (NN) architectures require substantial GPU computing resources and a high degree of CPU parallelization for data preprocessing. At the same time, the Ariadne library, which aims to solve complex high-energy physics tracking tasks with the help of deep neural networks, lacked multi-GPU training and efficient parallel data preprocessing on the CPU. In this work, we present our approach to multi-GPU training in the Ariadne library. We also present efficient data caching, parallel CPU data preprocessing, and a generic ML experiment setup for prototyping, training, and inference of deep neural network models. Speed-up and performance results for the existing neural network approaches, obtained with the GOVORUN computing resources, are presented.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Tracking</kwd>
        <kwd>Python library</kwd>
        <kwd>CPU optimizations</kwd>
        <kwd>GPU optimizations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation</title>
      <p>
        Modern high-energy physics (HEP) experiments produce large amounts of data and require
specialized software to operate. Particle tracking is an important part of the software of HEP
experiments. There are many tracking algorithms, and one of the most well-proven approaches is
based on the Kalman filter. Unfortunately, it does not scale well enough to run efficiently on
modern hardware such as graphics processing units (GPUs). At the same time, studies [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ] indicate that machine learning (ML) and deep neural networks (NN) can be an
efficient replacement for the well-known tracking algorithms: their authors achieve competitive
track reconstruction accuracy while being orders of magnitude faster in terms of processing speed.
Modern ML approaches are mostly developed in the Python programming language and use
specific tensor-based libraries to implement NN models and deploy them to the GPU. Given the
novelty of ML-based tracking, there is no generally known Python library dedicated to studying
deep learning in HEP tracking tasks. For these reasons, we started the development of the Ariadne [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] library, the first open-source Python library for particle tracking
based on deep learning methods. The goal of Ariadne is to help researchers investigate their ML-based
tracking methods with a simple but standardized setup. Ariadne is still in development but has already
provided great benefits for our tasks. The initial description and motivation of Ariadne can be found
in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Current state of Ariadne</title>
      <p>The current Ariadne application programming interface (API), from the researcher's point of view,
is shown in Figure 1.</p>
      <p>For an experimental run, the researcher implements the following components: preprocessor,
model, and dataset. The researcher can then override already implemented components such as parse,
transforms, criterion, and optimizer.</p>
      <p>After the implementation, the user runs the ‘prepare’ phase, which computes the required
preprocessing steps. The initial data processing steps are shown in Figure 2a. The NN model can then
be trained on the preprocessed data.</p>
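      <p>The division of work described above can be illustrated with a minimal, hypothetical Python sketch. None of its names (<monospace>Preprocessor</monospace>, <monospace>CenterHits</monospace>, <monospace>ScaleModel</monospace>, <monospace>prepare</monospace>, <monospace>train</monospace>) belong to Ariadne's actual API; they are assumptions made for illustration only. The toy preprocessor, the one-parameter model, the mean squared error, and the hand-written gradient step merely stand in for a real preprocessor, NN model, criterion, and optimizer, and a plain list plays the role of the dataset. The point is the order of the phases: ‘prepare’ runs the preprocessing once, and training then consumes only the preprocessed samples.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical sketch: none of these names are part of Ariadne's real API.
from abc import ABC, abstractmethod
from typing import Dict, List

Event = Dict[str, List[float]]


class Preprocessor(ABC):
    """Researcher-implemented component: maps one raw event to one sample."""

    @abstractmethod
    def __call__(self, event: Event) -> Event: ...


class CenterHits(Preprocessor):
    """Toy preprocessor: shifts the x-coordinates of an event to zero mean."""

    def __call__(self, event: Event) -> Event:
        mean = sum(event["x"]) / len(event["x"])
        return {"x": [x - mean for x in event["x"]], "y": list(event["y"])}


class ScaleModel:
    """Toy stand-in for an NN model with one trainable parameter: y = w * x."""

    def __init__(self, w: float = 0.0) -> None:
        self.w = w

    def __call__(self, xs: List[float]) -> List[float]:
        return [self.w * x for x in xs]


def prepare(raw_events: List[Event], preprocessor: Preprocessor) -> List[Event]:
    """'prepare' phase: run the preprocessing once and keep the results."""
    return [preprocessor(event) for event in raw_events]


def train(dataset: List[Event], model: ScaleModel,
          lr: float = 0.1, epochs: int = 100) -> ScaleModel:
    """Training phase on already preprocessed samples.

    The mean squared error plays the role of the criterion and a plain
    gradient-descent update plays the role of the optimizer.
    """
    for _ in range(epochs):
        for sample in dataset:
            xs, ys = sample["x"], sample["y"]
            preds = model(xs)
            # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
            grad = sum(2 * (p - y) * x for p, x, y in zip(preds, xs, ys)) / len(xs)
            model.w -= lr * grad
    return model


raw_events = [{"x": [1.0, 2.0, 3.0], "y": [-2.0, 0.0, 2.0]}]
samples = prepare(raw_events, CenterHits())
print(samples[0]["x"])  # [-1.0, 0.0, 1.0]
model = train(samples, ScaleModel())
print(round(model.w, 3))  # 2.0
```
]]></preformat>
      <p>Because training only reads the output of the ‘prepare’ phase, changing the model or the criterion does not require rerunning the preprocessing, whereas any change to the preprocessor does.</p>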
    </sec>
    <sec id="sec-3">
      <title>3. Caching and Multi-CPU prepare</title>
      <p>
        Since the previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], five different NN approaches have been developed with the
help of Ariadne. Every approach shares the common library API but implements its own
preprocessing and training components. While implementing and investigating a potential
approach, researchers often run the parsing, preprocessing, and training phases sequentially, one by one,
in a single Python process. For example, after any change in the preprocessing algorithms, all training
data (which can occupy hundreds of gigabytes of disk space) must be recomputed from scratch.
Running such scripts in a single Python process is very time-consuming and does not scale with the
available computing hardware. In this work, we reimplemented the core ‘prepare’ scripts using
multiprocessing. The comparison of the old and new implementations is shown in Figure 2.
The implementation consists of three main parts:
● Caching module – real-time memoization of any processing unit (such as parsing, coordinate
transformations, or any other data mutation procedure);
● Multiprocessing of the target preprocessing routine – the preprocessor runs in parallel in a worker
pool with the help of the Python multiprocessing framework;
● Data serialization – with the help of the HDF5 format [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], data can be efficiently read from and written to
the disk.
      </p>
      <p>[Figure 2: (a) initial and (b) revised ‘prepare’ implementations]</p>
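      <p>A minimal sketch of how the first two parts can work together is given below. It assumes a POSIX host, since it requests the <monospace>fork</monospace> start method explicitly. The decorator <monospace>disk_cache</monospace>, the processing unit <monospace>to_cylindrical</monospace>, and the helper <monospace>prepare_parallel</monospace> are hypothetical names, not Ariadne's implementation: the decorator memoizes a processing unit on disk under a hash of its name and pickled arguments, and a process pool distributes the events over workers. HDF5 serialization (for example with the h5py package) is left out to keep the sketch dependency-free.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical sketch of a disk cache plus a multiprocessing 'prepare' step;
# the names are illustrative and not taken from Ariadne.
import hashlib
import math
import multiprocessing
import os
import pickle
import tempfile
from functools import wraps

# Assumption: a per-run temporary cache directory; a real setup would make it configurable.
CACHE_DIR = tempfile.mkdtemp(prefix="prepare_cache_")


def disk_cache(func):
    """Memoize a processing unit on disk, keyed by its name and pickled arguments."""
    @wraps(func)  # keeps the module-level name, so the wrapper stays picklable
    def wrapper(*args):
        key = hashlib.sha256(pickle.dumps((func.__qualname__, args))).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):  # cache hit: skip the computation
            with open(path, "rb") as f:
                return pickle.load(f)
        result = func(*args)
        # Write to a temporary file and rename it atomically, so that another
        # worker never reads a half-written cache entry.
        fd, tmp_path = tempfile.mkstemp(dir=CACHE_DIR)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(result, f)
        os.replace(tmp_path, path)
        return result
    return wrapper


@disk_cache
def to_cylindrical(event):
    """Example processing unit: convert (x, y, z) hits to (r, phi, z)."""
    return tuple((math.hypot(x, y), math.atan2(y, x), z) for x, y, z in event)


def prepare_parallel(events, processes=2):
    """Apply the cached processing unit to all events in a worker pool."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX "fork" start method
    with ctx.Pool(processes) as pool:
        return pool.map(to_cylindrical, events)


if __name__ == "__main__":
    events = [((1.0, 0.0, 5.0), (0.0, 2.0, 6.0)), ((3.0, 4.0, 7.0),)]
    print(prepare_parallel(events)[1][0][0])  # 5.0
```
]]></preformat>
      <p>On macOS and Windows, where <monospace>fork</monospace> is discouraged or unavailable, the default start method would be used instead. Pickle-based cache keys may also change between Python versions, so a persistent cache would need an explicit invalidation policy.</p>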
    </sec>
    <sec id="sec-4">
      <title>4. Batch bucketing and Multi-GPU training</title>
      <p>
        Using the new caching module, we implemented a batch bucketing routine. Batch
bucketing is a common technique [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for efficient data processing that places NN inputs
of equal dimensions into the same training batch. Since inputs of equal dimensions can be stacked
without padding, this can noticeably speed up training on a single GPU device and allows batch sizes
that would otherwise not fit into GPU memory. We also enabled multi-GPU training with the help of
the PyTorch Lightning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] library. Researchers can now train their NN models on up to 8 GPUs in parallel, which
also greatly reduces model training times.
      </p>
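      <p>A minimal sketch of batch bucketing follows. The function <monospace>bucket_batches</monospace> is hypothetical and is not Ariadne's implementation. It assumes that the size of every input (for example, the number of hits in an event) is already known, groups sample indices by that size, and splits each group into batches of at most <monospace>batch_size</monospace> indices, so every batch contains inputs of a single shape.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical batch-bucketing helper (not Ariadne's implementation).
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def bucket_batches(sizes: Sequence[Hashable], batch_size: int,
                   shuffle: bool = True, seed: int = 0) -> List[List[int]]:
    """Return batches of sample indices where every batch has one input size.

    sizes[i] is the size (or shape tuple) of sample i, e.g. its number of hits.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buckets = defaultdict(list)
    for index, size in enumerate(sizes):
        buckets[size].append(index)

    rng = random.Random(seed)
    batches = []
    for indices in buckets.values():
        if shuffle:
            rng.shuffle(indices)  # vary batch composition inside a bucket
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])
    if shuffle:
        rng.shuffle(batches)  # avoid feeding the buckets in size order
    return batches


print(bucket_batches([3, 5, 3, 3, 5, 4], batch_size=2, shuffle=False))
# [[0, 2], [3], [1, 4], [5]]
```
]]></preformat>
      <p>In PyTorch, such a list of index lists could, for example, be passed as the <monospace>batch_sampler</monospace> argument of <monospace>torch.utils.data.DataLoader</monospace>; calling the helper with a different seed for each epoch would reshuffle the batches.</p>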
    </sec>
    <sec id="sec-5">
      <title>5. Measured performance impact</title>
      <p>After adding the new functionality described above, we measured typical researcher
workflows on two target hardware platforms:
● Laptop (MacBook Pro 13) // Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz (8 logical cores)
● HybriLIT (JINR GOVORUN) // Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (80 logical cores)</p>
      <p>Table 1 shows a more than 25-fold speed-up in event processing compared to the initial
implementation and an up to 6 times faster ‘prepare’ phase. Figure 3 shows a substantial improvement
in NN training speed, in terms of the f1_score metric, for the revised implementation compared to the
original: for the same 1-hour training on the same data, the same GraphNet model converges much
more rapidly with multi-GPU or batch bucketing training.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>
        In this work, we implemented a new ‘prepare’ module for Ariadne. The module
can now run in parallel, utilizing all CPU cores of the target hardware, which led to up to 25 times faster event
processing for the GraphNet NN model. For the ‘training’ module, we enabled multi-GPU training
and the batch bucketing algorithm, which greatly reduce training time for the existing NN model
implementation. These results show great potential for future implementations of other NN
approaches within the Ariadne library: users can now utilize more hardware resources and thereby
handle more complex neural network models and preprocessing routines.
The source code is available at [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgment</title>
      <p>The reported study was funded by RFBR according to the research project № 18-02-40101.</p>
      <p>
        The calculations were carried out on the basis of the HybriLIT heterogeneous computing
platform (LIT, JINR) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Goncharov</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shchavelev</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ososkov</surname>
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Baranov</surname>
            <given-names>D.</given-names>
          </string-name>
          <article-title>BM@N Tracking with Novel Deep Learning Methods</article-title>
          //
          <source>EPJ Web Conf.</source>
          ,
          <volume>226</volume>
          (
          <year>2020</year>
          ) 03009. DOI: https://doi.org/10.1051/epjconf/202022603009
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Farrell</surname>
            <given-names>S.</given-names>
          </string-name>
          et al.
          <article-title>Novel deep learning methods for track reconstruction</article-title>
          //
          <source>4th International Workshop Connecting The Dots 2018 (CTD2018)</source>
          , Seattle, Washington, USA, March 20-22,
          <year>2018</year>
          . arXiv:1810.06111 [hep-ex].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Goncharov</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schavelev</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolskaya</surname>
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ososkov</surname>
            <given-names>G.</given-names>
          </string-name>
          <article-title>Ariadne: PyTorch library for particle track reconstruction using deep learning</article-title>
          //
          <source>AIP Conference Proceedings</source>
          . AIP Publishing LLC,
          <year>2021</year>
          . Vol.
          <volume>2377</volume>
          , No. 1. P.
          <fpage>040004</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] The HDF Group. Hierarchical Data Format, version 5, 1997-NNNN. https://www.hdfgroup.org/HDF5/.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Khomenko</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shyshkov</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radyvonenko</surname>
            <given-names>O.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bokhan</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization</article-title>
          (
          <year>2016</year>
          ). DOI: 10.1109/DSMP.2016.7583516.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Falcon</surname>
            <given-names>W.</given-names>
          </string-name>
          and The PyTorch Lightning team.
          <source>PyTorch Lightning (Version 1.4) [Computer software]</source>
          ,
          <year>2019</year>
          . DOI: https://doi.org/10.5281/zenodo.3828935
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Ariadne</surname>
          </string-name>
          , GitHub: https://github.com/t3hseus/ariadne
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Adam</surname>
          </string-name>
          et al.,
          <source>CEUR Workshop Proc.</source>
          , Vol.
          <volume>2267</volume>
          ,
          <fpage>638</fpage>
          -
          <lpage>644</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>