<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SIMULATION OF DATA PROCESSING FOR THE BM@N EXPERIMENT OF THE NICA COMPLEX</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>D. Priakhina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V. Korenkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>K. Gertsenberger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V. Trofimov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Daria Priakhina</institution>
          ,
          <addr-line>Vladimir Korenkov, Konstantin Gertsenberger, Vladimir Trofimov</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dubna State University</institution>
          ,
          <addr-line>Russia, Moscow region, Dubna, 141980, 19 Universitetskaya</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Joint Institute for Nuclear Research</institution>
          ,
          <addr-line>Russia, Moscow region, Dubna, 141980, 6 Joliot-Curie</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Plekhanov Russian University of Economics</institution>
          ,
          <addr-line>Russia, Moscow, 115093, 36 Stremyanny per</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>5</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>The paper considers the application of a software complex for data processing simulation in computing systems. The BM@N experiment data storage and processing system of the NICA complex is used as a simulated infrastructure. Simulation is performed in order to obtain recommendations on the organization of experimental data processing during a session with available allocated resources. The paper presents the results of simulating the processing of data that will be received during the BM@N experiment session, with several scenarios for distributing job flows across existing processing centers. In addition, some recommendations for organizing experimental data processing are proposed. The status of the work and future plans for the development of the software complex are formulated.</p>
      </abstract>
      <kwd-group>
        <kwd>simulation</kwd>
        <kwd>data center</kwd>
        <kwd>BM@N experiment</kwd>
        <kwd>NICA complex</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A software complex for simulating the processing of data coming from the experimental
facility of the NICA complex is being developed at the Meshcheryakov Laboratory of Information
Technologies of the Joint Institute for Nuclear Research [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The software complex makes it possible to find out
how the data storage and processing system will work with the available computing power, and to
calculate the load on computing farms and communication links for specified data flow and job flow
parameters. Unlike the previously developed simulation program [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ], the software
complex consists of a database, a module for setting the simulated structure and equipment
configurations, a stable core for simulating data transmission and processing, and a module for
presenting the results as graphs. The simulation core is based on an approach that represents
information processes as byte streams [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Work on simulating the data processing of the BM@N experiment is currently in progress
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The facility session is simulated for various hardware configurations, data flow
and job flow parameters, and scenarios for running data processing jobs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The simulation results
presented in this article made it possible to draw conclusions and, on their basis, to formulate
recommendations for organizing experimental data processing during the BM@N experiment
session with the available allocated resources.
      </p>
      <p>Requirements for improving the software complex are formulated in the course of its
use. The main aim of these improvements is to make the complex applicable to data processing
simulation in any data storage and processing center.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Simulation input parameters</title>
      <p>Before starting to simulate data processing, it is necessary to determine the structure of the
data storage and processing center, the configurations of various equipment, as well as the
characteristics of data flows and job flows. All these parameters are input data for simulation.</p>
      <sec id="sec-2-1">
        <title>2.1 Distributed computing system for storing and processing data</title>
        <p>The simulated distributed data storage and processing system with the equipment parameters
is shown in Figure 1.</p>
        <p>The system consists of two centers and includes several levels of data storage. The data of
particle collision events with a fixed target (raw data) selected by the trigger (Trigger BM@N) is
first written to the Data reception buffer and then moved to the Intermediate data storage level, after
which the full volume of raw data is sent to the local buffers (EOS LHEP and EOS LIT) of the
respective data processing centers. All storage devices have a limited volume, and data transmission
channels have a limited bandwidth.</p>
        <p>Data processing jobs are executed on the compute nodes of these processing centers (NCX
LHEP, T2 LIT and Supercomputer). A job arriving at a compute node starts if there is a free slot, i.e. a
processor core; if all slots are occupied, the job is placed in the queue and waits until a slot becomes
free. Each job needs a certain amount of data, which is transferred from the storage to the compute
node before the job starts.</p>
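        <p>The slot and queue behaviour described above can be illustrated with a small discrete-event sketch. It is not the simulation core of the software complex; it is a minimal Python model under stated assumptions: identical slots on one node, a first-come-first-served queue, a job that holds its slot while its input data is transferred, and transfers that do not compete for link bandwidth. The names <monospace>Job</monospace> and <monospace>simulate</monospace> and all parameter values are hypothetical.</p>
        <preformat>
```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival_s: float   # time the job arrives at the compute node
    input_gb: float    # input data copied from storage before the job starts
    run_s: float       # processing time once the input data is on the node


def simulate(jobs, slots, bandwidth_gb_s):
    """Return {job name: finish time in s} for one node with `slots` cores.

    Jobs are served first come, first served: each job takes the slot that
    becomes free earliest, waiting in the queue if all slots are busy. The
    job holds its slot while its input is transferred at `bandwidth_gb_s`
    (transfers are assumed not to compete for bandwidth), then runs.
    """
    free_at = [0.0] * slots  # a list of zeros is already a valid min-heap
    finish = {}
    for job in sorted(jobs, key=lambda j: j.arrival_s):
        slot_free = heapq.heappop(free_at)           # earliest free slot
        start = max(slot_free, job.arrival_s)        # queue until the slot is free
        transfer_s = job.input_gb / bandwidth_gb_s   # data arrives before the job runs
        end = start + transfer_s + job.run_s
        finish[job.name] = end
        heapq.heappush(free_at, end)                 # slot is busy until `end`
    return finish


jobs = [Job("a", 0, 10, 100), Job("b", 0, 10, 100), Job("c", 5, 10, 100)]
print(simulate(jobs, slots=2, bandwidth_gb_s=1.0))
```
        </preformat>
        <p>In this example jobs a and b start immediately and finish at 110 s (10 s of transfer plus 100 s of processing), while job c, arriving at 5 s, waits in the queue until a slot frees at 110 s and finishes at 220 s.</p>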
      </sec>
      <sec id="sec-2-2">
        <title>2.2 BM@N Run workload and job stream</title>
        <p>A 30-day session of the BM@N experiment is considered in this article. It is assumed that the
session runs continuously, except for one-minute breaks between runs; each run lasts 2 minutes.
During each run, experimental data is written to files, and each file is recorded continuously until it
reaches the maximum size of 35 GB. The BM@N trigger selects 10 000 events per second, and the
average size of a single event is about 0.2 MB. Taking these characteristics into account, the maximum
planned volume of experimental data per session is approximately 350 TB.</p>
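        <p>Some derived quantities follow directly from these parameters. The short Python sketch below, assuming decimal units (1 GB = 1000 MB, 1 TB = 1000 GB), computes the write rate at the nominal trigger rate (2 GB/s), the number of events in a full 35 GB file (175 000), and the number of such files in a 350 TB session (10 000). It only restates the parameters above and does not reproduce how the 350 TB estimate itself was obtained.</p>
        <preformat>
```python
# Session parameters from Section 2.2; decimal units assumed
# (1 GB = 1000 MB, 1 TB = 1000 GB).
TRIGGER_RATE_EVENTS_S = 10_000   # events per second selected by the trigger
EVENT_SIZE_MB = 0.2              # average size of one event
FILE_SIZE_GB = 35                # maximum size of one raw data file
SESSION_VOLUME_TB = 350          # maximum planned raw data volume per session

peak_rate_gb_s = TRIGGER_RATE_EVENTS_S * EVENT_SIZE_MB / 1000
events_per_file = FILE_SIZE_GB * 1000 / EVENT_SIZE_MB
files_per_session = SESSION_VOLUME_TB * 1000 / FILE_SIZE_GB

print(f"write rate at the nominal trigger rate: {peak_rate_gb_s:.1f} GB/s")
print(f"events in a full file: {events_per_file:,.0f}")
print(f"full files per session: {files_per_session:,.0f}")
```
        </preformat>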
        <p>The raw binary data received from the facility is processed: it is digitized (the resulting
format is digit) and converted into reconstruction data (DST format), after which the physical analysis
of the experimental data is performed. In addition to the experimental data, simulated data (sim) is
also to be processed; it will likewise be converted into reconstruction data (DST) for subsequent
physical analysis.</p>
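        <p>The two processing chains can be written down as a small lookup structure from which the sequence of job classes for a data set is derived. In this hypothetical Python sketch the job-class names are those of Table 1, while the format labels <monospace>gen</monospace> (generator output, inferred from the class name GenToSim) and <monospace>ana</monospace> (analysis result) are assumptions.</p>
        <preformat>
```python
# Each job class converts one data format into the next.
# "gen" and "ana" are hypothetical labels for generator output and analysis results.
JOB_CLASSES = {
    "RawToDigit": ("raw", "digit"),
    "DigitToDst": ("digit", "dst"),
    "GenToSim": ("gen", "sim"),
    "SimToDst": ("sim", "dst"),
    "DstToAna": ("dst", "ana"),
}


def jobs_to_analysis(start_format):
    """Return the job classes, in order, that take `start_format` data to analysis."""
    by_input = {src: (name, dst) for name, (src, dst) in JOB_CLASSES.items()}
    chain, fmt = [], start_format
    while fmt != "ana":
        name, fmt = by_input[fmt]  # raises KeyError for a format with no job class
        chain.append(name)
    return chain


print(jobs_to_analysis("raw"))  # experimental data
print(jobs_to_analysis("gen"))  # simulated data
```
        </preformat>
        <p>For raw experimental data this yields RawToDigit, DigitToDst and DstToAna; for simulated data it yields GenToSim, SimToDst and DstToAna.</p>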
        <p>In accordance with this processing chain, several classes of jobs are distinguished; their
characteristics are given in Table 1. The job flows for the data processing simulation were formed on
the basis of the data in this table.</p>
        <p>Table 1. Job classes (RawToDigit, DigitToDst, GenToSim, SimToDst, DstToAna) with the
average event processing time on one processor (ms).</p>
      </sec>
    </sec>
    <sec>
      <title>3. Simulation</title>
      <p>In the example considered below, the simulation is performed in order to provide
recommendations on organizing experimental data processing during the session with the available
allocated resources (Figure 1). First, three scenarios were simulated in which the data processing jobs
described in Table 1 are distributed across the compute nodes in different ways (Table 2).</p>
      <p>The results of simulating the three scenarios showed that one of the computing resources could
be fully allocated to processing simulated data up to reconstruction data (GenToSim and SimToDst
jobs). As for the experimental data, at best only 20% of the raw data can be processed to digit data,
and less than 10% of the jobs processing digit data to reconstruction data will complete before the end
of the 720-hour session. Figure 2 shows the number of completed experimental data processing jobs
(RawToDigit and DigitToDst) on the computing resources for Scenario 2. DstToAna jobs are not
shown in Figure 2 because very few of them complete, approximately 1%. This is due to the full load
of all computing resources.</p>
      <p>Clearly, this result is unsatisfactory: only a small amount of experimental data can be processed
by the end of the session, and after the session ends one would have to wait several more months until
all the data acquired in the experiment is processed. It should be noted that fully processing a single
data file in RawToDigit and DigitToDst jobs takes a large amount of time (Table 1). Therefore, it was
decided to simulate a scenario similar to Scenario 2 with express file processing added. Express
processing runs simultaneously with full processing and consists of processing 1% of each file in
RawToDigit jobs to obtain preliminary results.</p>
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion</title>
      <p>A software complex for simulating the processing of data coming from the experimental
facility of the NICA complex is being developed at MLIT JINR. The presented simulation results for
the data processing of the BM@N experiment showed that, with the described parameters of
equipment, data flows and job flows, no more than 20% of the raw data could be processed by the end
of the 30-day session. It is also proposed to perform express data processing simultaneously with full
processing, which makes it possible to obtain preliminary results every hour throughout the
experiment. It can be concluded that the measurements obtained are correct.</p>
      <p>Requirements for improving the software complex are formulated in the course of its use. The
main aim of these improvements is to make the complex applicable to data processing simulation in
any data storage and processing center. To this end, at the next stage it is planned to develop a module
for launching jobs in the manner of pilot jobs, and to conduct computational experiments that take
into account that the equipment is not perfectly reliable, i.e. to calculate the probability of equipment
failure and the recovery time.</p>
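      <p>As an illustration of the kind of reliability model such experiments might use, the Python sketch below applies a common textbook assumption that is not taken from this work: a device alternates between exponentially distributed up times (mean time between failures, MTBF) and repair times (mean time to repair, MTTR). Its long-run availability is then MTBF/(MTBF + MTTR), and a seeded Monte Carlo run can be checked against that value. The function names and the numbers in the example are hypothetical.</p>
      <preformat>
```python
import random


def availability(mtbf_h, mttr_h):
    """Long-run fraction of time a device is up: MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def simulate_uptime(mtbf_h, mttr_h, horizon_h, seed=0):
    """Monte Carlo estimate of the up-time fraction over `horizon_h` hours.

    Times to failure and repair times are drawn from exponential
    distributions with means `mtbf_h` and `mttr_h`.
    """
    rng = random.Random(seed)
    t, up = 0.0, 0.0
    while horizon_h > t:
        run = rng.expovariate(1 / mtbf_h)       # time until the next failure
        up += min(run, horizon_h - t)           # count up time inside the horizon only
        t += run + rng.expovariate(1 / mttr_h)  # failure followed by recovery
    return up / horizon_h


# Hypothetical device: a failure every 1000 h on average, 10 h to recover.
print(f"analytic: {availability(1000, 10):.4f}")
print(f"simulated: {simulate_uptime(1000, 10, horizon_h=10_000_000):.4f}")
```
      </preformat>
      <p>Over a single 720-hour session the simulated fraction varies strongly between seeds, so per-session estimates would need many repeated runs.</p>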
    </sec>
    <sec id="sec-4">
      <title>5. Acknowledgement</title>
      <p>This work was supported by the JINR grant for young scientists № 21-602-02.</p>
      <p>The authors express their gratitude to Gennady Ososkov, Doctor of Physical and Mathematical
Sciences, Professor, chief researcher of the Meshcheryakov Laboratory of Information Technologies,
for his assistance in the work, valuable advice and fruitful discussions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Kekelidze</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalenko</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lednicky</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matveev</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meshkov</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sorin</surname>
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Trubnikov</surname>
            <given-names>G.</given-names>
          </string-name>
          <article-title>Status of the NICA project at JINR</article-title>
          //
          <source>EPJ Web Conf.</source>
          <year>2017</year>
          . V.
          <volume>138</volume>
          . P.
          <elocation-id>01027</elocation-id>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Korenkov</surname>
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nechaevskiy</surname>
            <given-names>A.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ososkov</surname>
            <given-names>G.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pryahina</surname>
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trofimov</surname>
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uzhinskiy</surname>
            <given-names>A.V.</given-names>
          </string-name>
          <article-title>Simulation concept of NICA-MPD-SPD Tier0-Tier1 computing facilities</article-title>
          //
          <source>Physics of Particles and Nuclei Letters</source>
          .
          <year>2016</year>
          . V.
          <volume>13</volume>
          , No.
          <issue>5</issue>
          . P.
          <fpage>693</fpage>
          -
          <lpage>699</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Nechaevskiy</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ososkov</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pryahina</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trofimov</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>W.D.</given-names>
          </string-name>
          <article-title>Simulation approach for improving the computing network topology and performance of the China IHEP Data Center</article-title>
          // 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018),
          <source>EPJ Web of Conferences</source>
          .
          <year>2019</year>
          . V.
          <volume>214</volume>
          . P.
          <elocation-id>08018</elocation-id>
          . DOI: 10.1051/epjconf/201921408018
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Korenkov</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nechaevskiy</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ososkov</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Priakhina</surname>
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Trofimov</surname>
            <given-names>V.</given-names>
          </string-name>
          <article-title>A Probabilistic Approach to the Simulation of Data Processing Centers</article-title>
          //
          <source>EPJ Web Conf.</source>
          <year>2020</year>
          . V.
          <volume>226</volume>
          . P.
          <elocation-id>03012</elocation-id>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Kapishin</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Studies of baryonic matter at the BM@N experiment (JINR)</article-title>
          //
          <source>Nuclear Physics A</source>
          .
          <year>2019</year>
          . V.
          <volume>982</volume>
          . P.
          <fpage>967</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Priakhina</surname>
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trofimov</surname>
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ososkov</surname>
            <given-names>G.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gertsenberger</surname>
            <given-names>K.V.</given-names>
          </string-name>
          <article-title>Data center simulation for the BM@N experiment of the NICA project</article-title>
          //
          <source>AIP Conference Proceedings</source>
          .
          <year>2021</year>
          . V.
          <volume>2377</volume>
          . P.
          <elocation-id>040007</elocation-id>
          . https://doi.org/10.1063/5.0063338
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>