Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021



JINR DISTRIBUTED INFORMATION AND COMPUTING ENVIRONMENT: PARTICIPANTS, FEATURES AND CHALLENGES
N.A. Balashov1, I.S. Kuprikov2, N.A. Kutovskiy1,a, A.N. Makhalkin1, Ye. Mazhitova1,3, I.S. Pelevanyuk1, R.N. Semenov1,4

1 Meshcheryakov Laboratory of Information Technologies, Joint Institute for Nuclear Research, 6 Joliot-Curie, Dubna, Moscow region, 141980, Russia
2 Dubna State University, 19 Universitetskaya str., Dubna, Moscow region, 141980, Russia
3 Institute of Nuclear Physics, 1 Ibragimova str., Almaty, 050032, Kazakhstan
4 Plekhanov Russian University of Economics, 36 Stremyanny per., Moscow, 117997, Russia

E-mail: a kut@jinr.ru

The JINR Distributed Information and Computing Environment (DICE) was created to combine resources for solving common scientific tasks, as well as to distribute peak loads across the resources of partner organizations from the JINR Member States. To monitor the hardware resources and services of the growing DICE infrastructure, a system based on Prometheus and Thanos was designed and deployed. The collected metrics, including the geographical location of the JINR DICE participants, are visualized using Grafana. Software distribution is carried out with the help of the CERN Virtual Machine File System (CVMFS). All these topics, as well as the challenges encountered and possible ways to overcome them, are covered in detail.

Keywords: cloud computing, OpenNebula, cloud integration, DIRAC, Ceph



Nikita Balashov, Igor Kuprikov, Nikolay Kutovskiy, Alexandr Makhalkin, Yelena Mazhitova, Igor Pelevanyuk, Roman Semenov

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).







1. Participants and resources
Table 1 lists the JINR DICE participants (excluding the JINR cloud [1]) and the resources they provide.
Table 1. JINR DICE participants and resources

Organization | Country | Status | Non-HT CPU cores | RAM, GB | Storage, TB
Plekhanov Russian University of Economics | RU | integrated | 132 | 608 | 51.1
Astana branch of the Institute of Nuclear Physics | KZ | integrated | 84 | 840 | 6.8 (SSD)
Institute of Physics of the National Academy of Sciences of Azerbaijan | AZ | integrated | 16 | 96 | 56
North Ossetian State University | RU | integrated | 84 | 672 | 17
Academy of Scientific Research & Technology - Egyptian National STI Network | EG | integrated | 98 | 704 | 13.8
Institute for Nuclear Research and Nuclear Energy | BG | integrated | 35 | 100 | 4
Sofia University "St. Kliment Ohridski" | BG | integrated | 48 | 250 | 4.7
Scientific Research Institute for Nuclear Problems of Belarusian State University | BY | integrated | 132 | 290 | 127
Institute of Nuclear Physics | UZ | in progress | - | - | -
Georgian Technical University | GE | in progress | 50 | 208 | 20
Total | | | 679 | 3,868 |

        The amount of JINR cloud resources contributed to the JINR DICE varies depending on the load of the former: the more the JINR cloud resources are utilized, the smaller its DICE share. At the time of writing, the JINR cloud share amounts to 200 CPU cores.


2. Monitoring
         To monitor network connectivity among the JINR DICE participants, the perfSONAR (PS) toolkit [2] is used. Deploying a PS instance is a challenge at some sites, since all cloud VMs there are behind NAT. Low external network bandwidth (e.g. 100 Mbps shared by the whole organization) is the main contributor to the long wall time of jobs. The most suitable type of job for resources with such low network bandwidth is Monte-Carlo simulation with negligible input data.
        There is ongoing work on implementing metrics aggregation for each participating cloud. These data are then used for resource monitoring and accounting. All software components involved are shown in Fig. 1 and listed in the text below.








Figure 1. Scheme of metrics aggregation, preservation and visualization for each JINR DICE participating cloud

        All JINR DICE clouds are based on the OpenNebula software [3]. A custom collector for gathering metrics of OpenNebula-based clouds was developed several years ago at the Meshcheryakov Laboratory of Information Technologies (MLIT) of JINR. It stores the collected data in the Prometheus time series database (TSDB) [4] via the node_exporter component [5]. With the help of the Thanos sidecar [6], the collected metrics are sent to the JINR Ceph object storage via an S3 gateway with a public IP address (the JINR Ceph storage itself has only private IP addresses and is not accessible from outside the JINR network). Grafana is used for data visualization and alerting; it queries the object storage via Thanos Querier and Thanos Store.
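        As an illustration of this pipeline, below is a minimal Python sketch of a collector feeding Prometheus through node_exporter's textfile collector. The actual MLIT collector is not described in detail here, so the endpoint, credentials, metric names and output path are illustrative assumptions rather than the production setup.

# Minimal sketch of an OpenNebula metrics collector (NOT the actual MLIT
# implementation): it polls the OpenNebula XML-RPC API with the official
# pyone binding and writes metrics in the Prometheus exposition format to
# a file picked up by node_exporter's textfile collector.
import os

import pyone  # official OpenNebula Python binding

# Hypothetical endpoint, credentials and output directory.
ONE = pyone.OneServer("http://cloud.example.org:2633/RPC2",
                      session="monitor:secret")
TEXTFILE_DIR = "/var/lib/node_exporter/textfile"


def main() -> None:
    lines = []
    # hostpool.info() returns every hypervisor host with its usage counters;
    # the HOST_SHARE field names vary slightly across OpenNebula versions.
    for host in ONE.hostpool.info().HOST:
        share = host.HOST_SHARE
        lines.append(f'opennebula_host_cpu_used{{host="{host.NAME}"}} '
                     f'{share.CPU_USAGE}')
        lines.append(f'opennebula_host_cpu_total{{host="{host.NAME}"}} '
                     f'{share.TOTAL_CPU}')
    # Write atomically so node_exporter never reads a half-written file.
    tmp = os.path.join(TEXTFILE_DIR, "opennebula.prom.tmp")
    with open(tmp, "w") as f:
        f.write("\n".join(lines) + "\n")
    os.rename(tmp, os.path.join(TEXTFILE_DIR, "opennebula.prom"))


if __name__ == "__main__":
    main()  # run periodically, e.g. from cron once a minute

        For the sketch to work, node_exporter must be started with --collector.textfile.directory pointing at the same directory; Prometheus then scrapes node_exporter as usual, and the Thanos sidecar ships the resulting TSDB blocks to the S3 gateway.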
       At the time of writing, metrics aggregation, preservation, visualization and alerting based on the described approach are implemented for the JINR and NOSU clouds. Having all the data about the JINR DICE participating clouds in one place made it possible to visualize the geographical location of the resources, as well as their statuses, with the Grafana Worldmap Panel plugin [7]. Work in this direction is in progress.
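        One way to attach the geographical data, sketched below under the assumption that site coordinates are carried as a geohash label on a per-site status metric (the Worldmap Panel can place points from such a label); the metric and label names are illustrative, not the production ones.

# Hypothetical per-site status metric with a geohash label for the
# Worldmap Panel; the geohash values below are placeholders.
import time

from prometheus_client import Gauge, start_http_server

SITE_UP = Gauge("dice_site_up", "1 if the participating cloud is reachable",
                ["site", "geohash"])

# Placeholder geohashes; real ones would encode each site's coordinates
# (they can be computed, e.g., with the pygeohash package from PyPI).
SITE_UP.labels(site="JINR", geohash="placeholder1").set(1)
SITE_UP.labels(site="NOSU", geohash="placeholder2").set(1)

if __name__ == "__main__":
    start_http_server(9101)  # scraped by Prometheus like any other target
    while True:
        time.sleep(60)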


3. Utilization
         At present, the major consumer of the JINR DICE resources is the Baikal-GVD scientific experiment. Its cumulative number of jobs completed from November 2020 to July 2021 amounts to 67.5 thousand (Fig. 2). All these jobs are Monte-Carlo simulations. The main issue with them lies in the 2.4 GB input file, which has to be downloaded for each job. This limits the number of resources usable for such jobs, since some participating organizations have very limited network bandwidth. One possible solution is to put the input file into a CVMFS repository: each job (except the first one on a given site) would then fetch it at local network speed from the site's CVMFS caching node.
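        A minimal sketch of that staging logic is shown below; the repository name and fallback URL are hypothetical placeholders, since the experiment's real paths are not given here.

# Job prologue sketch: prefer the copy published on CVMFS (served at
# local-network speed by the site's caching node) and fall back to a
# remote download only if the repository is not mounted. Both paths are
# hypothetical placeholders.
import os
import shutil
import urllib.request

CVMFS_COPY = "/cvmfs/dice.example.org/baikal-gvd/input.dat"      # hypothetical
REMOTE_URL = "https://storage.example.org/baikal-gvd/input.dat"  # hypothetical


def stage_input(dest: str = "input.dat") -> str:
    if os.path.exists(CVMFS_COPY):
        shutil.copy(CVMFS_COPY, dest)  # served from the local CVMFS cache
    else:
        urllib.request.urlretrieve(REMOTE_URL, dest)  # slow wide-area fallback
    return dest


if __name__ == "__main__":
    print("input staged at", stage_input())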








Figure 2. Cumulative number of Baikal-GVD jobs completed from November 2020 to July 2021

         Idle JINR DICE resources are used for SARS-CoV-2 research under the Folding@Home project [8]. A dedicated team called "Joint Institute for Nuclear Research" [9] was created within the project, with each team member corresponding to a single JINR DICE participant, so that the contribution of each participant to the Folding@Home project can be accounted for. Figure 3 shows the cumulative number of jobs completed as part of the SARS-CoV-2 research.
       Apart from that, Spin Physics Detector (SPD) simulation and reconstruction jobs, as well as the BM@N workflow with simulation jobs, were successfully tested.




Figure 3. Cumulative number of SARS-CoV-2 jobs completed from May 2020 to July 2021







4. Conclusion
         New members, as well as new users, keep joining the JINR Distributed Information and Computing Environment. Only Monte-Carlo simulation jobs with negligible input data are suitable for resources with low external network bandwidth (about 100 Mbps). The technical implementation of OpenNebula metrics aggregation, preservation and visualization is completed; this experience should be disseminated to the other JINR DICE clouds. The migration from the hand-drawn JINR DICE map to the Grafana Worldmap Panel plugin is in progress.


References
[1] Balashov N.A. et al. Present Status and Main Directions of the JINR Cloud Development // Proceedings of the 27th International Symposium Nuclear Electronics and Computing (NEC'2019), CEUR Workshop Proceedings, ISSN 1613-0073, vol. 2507 (2019), pp. 185-189
[2] perfSONAR toolkit web portal. Available at: https://www.perfsonar.net (accessed 22.09.2021)
[3] Milojicic D., Llorente I.M., Montero R.S. OpenNebula: A cloud management tool // IEEE Internet Computing, vol. 15, March 2011
[4] Prometheus web portal. Available at: https://prometheus.io (accessed 22.09.2021)
[5] Prometheus node_exporter GitHub page. Available at: https://github.com/prometheus/node_exporter (accessed 23.09.2021)
[6] Thanos sidecar GitHub page. Available at: https://github.com/thanos-io/thanos (accessed 23.09.2021)
[7] Worldmap Panel Grafana plugin web page. Available at: https://grafana.com/grafana/plugins/grafana-worldmap-panel (accessed 23.09.2021)
[8] SARS-CoV-2 web page on the Folding@Home web portal. Available at: https://foldingathome.org/diseases/infectious-diseases/covid-19/?lng=en (accessed 23.09.2021)
[9] JINR team web page on the Folding@Home web portal. Available at: https://stats.foldingathome.org/team/265602 (accessed 23.09.2021)



