      Proceedings of the 27th International Symposium Nuclear Electronics and Computing (NEC’2019)
                         Budva, Becici, Montenegro, September 30 – October 4, 2019




                                       GRID AT JINR
A.S. Baginyan, A.I. Balandin, A.G. Dolbilov, A.O. Golunov, N.I. Gromova,
      I.S. Kadochnikov, I.A. Kashunin, V.V. Korenkov, V.V. Mitsyn,
D.A. Oleynik, I.S. Pelevanyuk, A.S. Petrosyan, S.V. Shmatov, T.A. Strizh,
       V.V. Trofimov, A.S. Vorontsov, N.N. Voytishin, V.E. Zhiltsov
                 Joint Institute for Nuclear Research, 6 Joliot-Curie St., 141980 Dubna, Russia

                                        E-mail: strizh@jinr.ru


The JINR grid infrastructure is represented by the Tier1 centre for the CMS experiment at the LHC
and the Tier2 centre. The grid centre resources of JINR are part of the global WLCG (Worldwide
LHC Computing Grid) infrastructure developed for the LHC experiments. JINR LIT actively
participates in the WLCG global project. The work on the use of the grid infrastructure within the
WLCG project is carried out in cooperation with the CMS, ATLAS and Alice collaborations, with
the major international centres that operate as Tier1 centres for the CMS experiment (CH-CERN,
DE-KIT, ES-PIC, FR-CCIN2P3, IT-INFN-CNAF, US-FNAL-CMS), and with Tier2 grid centres
located in more than 170 computing centres in 42 countries worldwide. Since the beginning of 2015,
a full-scale WLCG Tier1 centre for the CMS experiment at the LHC has been operating at JINR LIT.
The CMS Tier1 centre at JINR has demonstrated stable operation throughout the entire period since
its launch into full operation and is ranked second in performance among the world's Tier1 centres
for CMS. The Tier2 centre supports a number of virtual organizations, in particular Alice, ATLAS,
CMS, LHCb, BES, BIOMED, COMPASS, MPD, NOvA, STAR and others.

Keywords: WLCG, CMS, Tier1, Tier2



            Andrey Baginyan, Anton Balandin, Andrey Dolbilov, Alexey Golunov, Natalia Gromova,
             Ivan Kadochnikov, Ivan Kashunin, Vladimir Korenkov, Valery Mitsyn, Danila Oleynik,
              Igor Pelevanyuk, Artem Petrosyan, Sergei Shmatov, Tatiana Strizh, Alexey Vorontsov,
                                           Vladimir Trofimov, Nikolay Voytishin, Victor Zhiltsov



                                                           Copyright © 2019 for this paper by its authors.
                   Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).








1. Introduction
        Over the past 15 years, computing in high-energy and nuclear physics has been mainly
determined by the distributed computing infrastructure created for processing and storing the data of
the LHC experiments. Each of the four experiments, i.e. ATLAS, CMS, Alice and LHCb, is not only
a unique and sophisticated scientific tool, but also a collaboration comprising several thousand
scientists from several hundred institutes located all around the world.




 Figure 1. Worldwide LHC Computing Grid (WLCG) global infrastructure. The JINR Tier1 centre is
                                  marked with a white circle

         Within the framework of the WLCG infrastructure (Fig. 1), about 1,000,000 processors,
geographically distributed over more than 170 data centres in 42 countries, are combined into a single
computing environment [1]. The resources of the JINR grid centres are part of this global WLCG
infrastructure. JINR LIT plays an active role in the WLCG project. Developments related to the use of
the grid infrastructure are carried out in cooperation with the CMS, ATLAS and Alice collaborations
and major international centres, which operate as Tier1 centres for the CMS experiment (CH-CERN,
DE-KIT, ES-PIC, FR-CCIN2P3, IT-INFN-CNAF, US-FNAL-CMS), as well as Tier2 level grid
centres. Since 2000, JINR has been actively involved in the creation of the grid segment in Russia and
its integration into the global grid infrastructure of the LCG (LHC Computing Grid) and EGEE
projects. During these years, the largest Tier2 centre in Russia was created at JINR for processing data
from all four experiments at the LHC. Since the beginning of 2015, a full-scale Tier1 WLCG centre
has been operating at JINR LIT for the CMS experiment at the LHC [2].


2. Network infrastructure
       A high-performance network infrastructure is crucial for the development of WLCG centres.
Last year, projects were carried out at JINR to increase the bandwidth of the Moscow-JINR
telecommunication channel to 3x100 Gb/s, to upgrade the equipment of the Institute backbone
computing network to 2x100 Gb/s and to bring the distributed computing cluster network between
the JINR facilities to 400 Gb/s. In addition, a new direct channel to CERN was put into operation.
The JINR external distributed network is now represented by a direct JINR-CERN communication
channel and a backup channel passing through MMTS-9 in Moscow and Amsterdam, which ensure
the operation of LHCOPN (JINR-CERN) for the communication of the Tier0 (CERN) and Tier1
(JINR) centres and of the external LHCONE overlay network serving the JINR Tier2 centre, as well
as by direct communication channels, based on the RU-VRF technology, to the collaborating RUHEP
research centres and the RUNNet and ReTN networks (Fig. 2).

                  Figure 2. Logical structure of the JINR external network


3. JINR grid resources
        The JINR grid infrastructure is represented by the Tier1 centre for the CMS experiment at the
LHC and the Tier2 centre, which supports a number of virtual organizations (VOs), in particular
Alice, ATLAS, CMS, LHCb, BES, BIOMED, COMPASS, MPD, NOvA, STAR and others.
        The data processing system of the JINR CMS Tier1 consists of 400 64-bit machines with
2 CPUs of 6–16 cores each, which together provide 10,688 cores for batch processing and a
performance of 151.97 kHS06. The storage system comprises disk arrays and long-term tape storage
and is supported by the dCache-5.2 and Enstore 4.2.2 software. The total usable capacity of the disk
servers is 10.4 PB; the IBM TS3500 tape robot provides 11 PB. The evolution of the main resource
parameters of the JINR CMS Tier1 over the last three years is presented in Figure 3.




                Figure 3. Evolution of the JINR CMS Tier1 main resource parameters







         The Torque 4.2.10/Maui 3.3.2 software (custom-built) is used as the resource manager and
task scheduler. The PhEDEx software is used for managing CMS data placement. The standard
WLCG software stack (2 x CREAM, 4 x ARGUS, BDII top, BDII site, APEL parsers, APEL
publisher, EMI-UI, 220 x EMI-WN with gLExec-wn, 4 x FTS3, LFC, WMS, L&B and
glite-proxyrenewal) is used for data processing. The service support system provides the operation of
a computing service, a data storage service, a grid service, a data transfer service (FTS, File Transfer
Service), a distributed computing management system (Torque) and an information service
(monitoring of services, servers, storage, data transfers and information sites).
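
The data transfer service in this stack is FTS3. As an illustration only, the following is a minimal
sketch of how a single transfer could be submitted to an FTS3 instance using the fts3-rest Python
"easy" bindings; the endpoint and storage URLs are placeholders, not actual JINR service addresses,
and a valid X.509 grid proxy is assumed to be available.

```python
# Minimal sketch: submitting one transfer to an FTS3 service via the
# fts3-rest "easy" Python bindings (a valid X.509 proxy is assumed).
# The endpoint and storage URLs below are placeholders, not real JINR hosts.
import fts3.rest.client.easy as fts3

endpoint = 'https://fts3.example.org:8446'        # hypothetical FTS3 REST endpoint
context = fts3.Context(endpoint)                  # authenticates with the user's proxy

transfer = fts3.new_transfer(
    'gsiftp://se-source.example.org/data/run123/file.root',   # source replica
    'gsiftp://se-dest.example.org/data/run123/file.root')     # destination

job = fts3.new_job([transfer], verify_checksum=True, retry=3)
job_id = fts3.submit(context, job)                # returns the FTS3 job identifier
print(fts3.get_job_status(context, job_id)['job_state'])
```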
         The data processing system of the JINR Tier2 consists of 239 64-bit machines with 2 CPUs
of 6–16 cores each, which together provide 4,128 cores for batch processing and a performance of
55.489 kHS06. The storage system contains disk arrays and is supported by dCache-5.2 and EOS.
One of the dCache instances is used by CMS and ATLAS; the second is used by JINR users and user
groups and also stores data of several third-party experiments (BIOMED, BESIII, FUSION). One
XRootD installation is used by Alice. The total usable capacity of the disk servers is 2,789 TB for
ATLAS, CMS and Alice, and 140 TB for the other VOs. The WLCG services are the same as at the
Tier1, and the software stack is largely the same as well.
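
For orientation, the quoted core counts and kHS06 capacities imply the average per-core performance
of the two centres; the short calculation below derives these figures from the numbers above (the
per-core values themselves are not stated in the text).

```python
# Average per-core performance implied by the resource figures quoted above.
# Input numbers come from the text; the per-core values are derived.
resources = {
    'Tier1': {'cores': 10_688, 'khs06': 151.97},
    'Tier2': {'cores': 4_128,  'khs06': 55.489},
}

for site, r in resources.items():
    per_core = r['khs06'] * 1000 / r['cores']     # HS06 per core
    print(f"{site}: {r['cores']} cores, {r['khs06']} kHS06 "
          f"-> about {per_core:.1f} HS06 per core")
# Tier1: about 14.2 HS06 per core; Tier2: about 13.4 HS06 per core
```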


4. JINR Tier1 for CMS
         The CMS Tier1 centre at JINR has demonstrated stable operation throughout the entire
period since its launch into full operation [3,4]. Since the beginning of 2019, 284,762 pilot jobs have
been completed, which accounts for 18% of the Sum CPU Work (HS06 hours) of CMS experiment
data processed by all Tier1 centres. Figure 4 shows the contribution of the world's first-level centres
to the processing of CMS experiment data in 2019; the JINR centre is ranked second in the world in
performance.

     Figure 4. World Tier1 centres for CMS: Sum CPU Work (HS06 hours) by Tier1 node in 2019

         Safekeeping of a proportional share of the raw and reconstructed data, large-scale
reprocessing and safekeeping of the corresponding output, the distribution of data to the Tier2s and
safekeeping of a share of the simulated data produced at these Tier2s are among the most important
tasks of the Tier1 centre. In 2019, 157 WLCG centres downloaded 3.5 PB of data from our tape robot
and 140 centres uploaded 0.969 PB to it. The largest data exchange occurred with FNAL (USA),
CERN, KIT (Germany), INFN-CNAF (Italy), CCIN2P3 (France) and RAL (UK). The Tier1 online
disk storage was also heavily used: within the year, 16.8 PB of data was uploaded and 36.4 PB was
downloaded.
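
To relate these yearly disk-traffic figures to sustained network rates, the sketch below converts them
into average throughput; it assumes 1 PB = 10^15 bytes and a full 365-day year, which is an
illustrative assumption rather than a statement from the text.

```python
# Convert the yearly Tier1 online disk traffic quoted above into average rates.
# Assumes 1 PB = 1e15 bytes and a full 365-day year (illustrative assumption).
SECONDS_PER_YEAR = 365 * 24 * 3600

def average_rate_gbps(petabytes: float) -> float:
    """Average sustained rate in Gbit/s for a given yearly volume in PB."""
    bits = petabytes * 1e15 * 8
    return bits / SECONDS_PER_YEAR / 1e9

for label, volume_pb in [('uploaded', 16.8), ('downloaded', 36.4)]:
    print(f"{volume_pb} PB {label} per year "
          f"~ {average_rate_gbps(volume_pb):.1f} Gbit/s on average")
# ~4.3 Gbit/s uploaded and ~9.2 Gbit/s downloaded, averaged over the year
```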

5. JINR Tier2
        Tier2 at JINR currently provides the majority of JINR users and user groups, as well as users
of the virtual organizations (VOs) of the grid environment, with computing power, storage systems
and access to data. To support the batch processing system, a dedicated server with a cluster resource
allocation system and a job scheduler was installed; it allows processing data for all four LHC
experiments (ALICE, ATLAS, CMS, LHCb) and supports a number of virtual organizations working
within the grid (BESIII, BIOMED, COMPASS, FUSION, MPD, NOvA, STAR and others). The
JINR Tier2 centre is the leading one in the Russian Data Intensive Grid (RDIG) federation and
provides 52% of the total CPU work of RDIG. More than 3 million jobs were processed in 2019.

6. Integration of distributed resources
        In recent years, a number of projects have been carried out at JINR LIT to apply grid
technologies to the processing of data from non-LHC experiments.
        A data processing management system for the COMPASS experiment at the SPS has been
developed and is in operation. It provides automatic data processing and management on different
types of computing resources [5]. Currently, the computing resources available to the collaboration
are the CERN Batch Service, the JINR computing centre and the Frontera supercomputer of the
Texas Advanced Computing Center.
        The DIRAC Interware was used to integrate distributed heterogeneous computing and storage
resources for the simulation of the NICA MPD experiment. More than 50,000 Monte Carlo
simulation jobs of the NICA MPD experiment were run at the JINR Tier1 and Tier2 using DIRAC.
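
As an illustration of how jobs are handed to DIRAC-managed resources, below is a minimal sketch
using the public DIRAC Python API; the executable, job name and requested CPU time are
hypothetical placeholders and do not describe the actual MPD production workflow.

```python
# Minimal sketch of submitting a job through the DIRAC Python API.
# Requires an installed and configured DIRAC client with a valid proxy;
# the executable and job parameters below are hypothetical placeholders.
from DIRAC.Core.Base.Script import parseCommandLine
parseCommandLine()  # initialize the DIRAC environment

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName('mpd-mc-simulation-test')
job.setExecutable('run_simulation.sh', arguments='--events 1000')
job.setCPUTime(3600)          # requested CPU time in seconds

dirac = Dirac()
result = dirac.submitJob(job)
if result['OK']:
    print('Submitted DIRAC job with ID', result['Value'])
else:
    print('Submission failed:', result['Message'])
```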

7. Monitoring
         To ensure the correct and reliable operation of the computing complex equipment and to
provide relevant and timely information about its work, a dedicated monitoring system has been
designed [6]. The system keeps track of the climate control and power supply systems, the local
network equipment, telecommunication links, computing nodes, running jobs, and the disk and tape
storage systems. An operation centre has been launched to allow real-time monitoring of the whole
JINR grid infrastructure. More than 1,200 nodes are included in the monitoring. To support
monitoring at this scale, a cluster monitoring system based on the Icinga2 software is used;
visualization is done with Grafana and NagVis.
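
One way to retrieve the current state of the monitored nodes programmatically from such an Icinga2
installation is its REST API; the sketch below is a generic example with a placeholder endpoint and
credentials, not the actual JINR configuration, and it assumes the API feature is enabled.

```python
# Sketch: querying host states from an Icinga2 instance via its REST API.
# Host name, port and credentials are placeholders; the API feature must be
# enabled in the Icinga2 configuration for this to work.
import requests

ICINGA_URL = 'https://icinga2.example.org:5665/v1/objects/hosts'
AUTH = ('apiuser', 'apipassword')          # hypothetical API user

response = requests.get(
    ICINGA_URL,
    auth=AUTH,
    headers={'Accept': 'application/json'},
    verify=False,                          # self-signed certificates are common here
)
response.raise_for_status()

for obj in response.json().get('results', []):
    # Host state 0 means UP, 1 means DOWN in Icinga2.
    print(f"{obj['name']}: state={obj['attrs']['state']}")
```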

8. Future plans
        In terms of hardware, a linear increase in the Tier1 and Tier2 characteristics is planned for the
coming years: the computing resources of the CMS Tier1 are to grow to 350 kHS06 and those of the
Tier2 to 170 kHS06, while the disk storage of the CMS Tier1 is to be expanded to 16 PB and the tape
storage to 42 PB.
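
For scale, these targets can be compared with the current figures from Section 3; the growth factors
below are derived from the numbers in this paper and are given for illustration only.

```python
# Planned growth factors implied by the figures above: current values are
# taken from Section 3, targets from Section 8; derived for illustration only.
current = {'Tier1 CPU (kHS06)': 151.97, 'Tier2 CPU (kHS06)': 55.489,
           'Tier1 disk (PB)': 10.4,     'Tier1 tape (PB)': 11.0}
planned = {'Tier1 CPU (kHS06)': 350.0,  'Tier2 CPU (kHS06)': 170.0,
           'Tier1 disk (PB)': 16.0,     'Tier1 tape (PB)': 42.0}

for key in current:
    factor = planned[key] / current[key]
    print(f"{key}: {current[key]} -> {planned[key]}  (about x{factor:.1f})")
# CPU grows by roughly 2.3x (Tier1) and 3.1x (Tier2); disks by ~1.5x, tapes by ~3.8x
```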


References
[1] WLCG (The Worldwide LHC Computing Grid): http://wlcg.web.cern.ch/LCG
[2] Baginyan A.S., Balandin A.I., Belov S.D. et al. The CMS Tier1 at JINR: five years of operations,
CEUR-WS.org/Vol-2267/1-10-paper-1.pdf, 2018.
[3] Astakhov N.S., Baginyan A.S., Belov S.D. et al. JINR Tier-1 centre for the CMS experiment at
the LHC, Particles and Nuclei Letters, vol. 13, no. 5, pp. 1103-1107, 2016.
[4] Astakhov N.S., Baginyan A.S., Balandin A.I. et al., CEUR-WS.org/Vol-2023/68-74-paper-10.pdf,
2017.
[5] Petrosyan A.Sh. COMPASS Production System Overview, EPJ Web of Conf., vol. 214, 2019,
https://doi.org/10.1051/epjconf/201921403039.
[6] Baginyan A.S., Balashov N.A., Baranov A.V. et al. Multi-level monitoring system for the
multifunctional information and computing complex at JINR, CEUR-WS.org/Vol-2023/226-233-paper-36.pdf, 2017.



