=Paper=
{{Paper
|id=Vol-3041/280-284-paper-52
|storemode=property
|title=JINR Distributed Information and Computing Environment: Participants, Features and Challenges
|pdfUrl=https://ceur-ws.org/Vol-3041/280-284-paper-52.pdf
|volume=Vol-3041
|authors=Nikita Balashov,Igor Kuprikov,Nikolay Kutovskiy,Alexandr Makhalkin,Yelena Mazhitova,Igor Pelevanyuk,Roman Semenov
}}
==JINR Distributed Information and Computing Environment: Participants, Features and Challenges==
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021

JINR DISTRIBUTED INFORMATION AND COMPUTING ENVIRONMENT: PARTICIPANTS, FEATURES AND CHALLENGES

N.A. Balashov (1), I.S. Kuprikov (2), N.A. Kutovskiy (1,a), A.N. Makhalkin (1), Ye. Mazhitova (1,3), I.S. Pelevanyuk (1), R.N. Semenov (1,4)

(1) Meshcheryakov Laboratory of Information Technologies, Joint Institute for Nuclear Research, 6 Joliot-Curie, Dubna, Moscow region, 141980, Russia
(2) Dubna State University, 19 Universitetskaya str., Dubna, Moscow region, 141980, Russia
(3) Institute of Nuclear Physics, 1 Ibragimova str., Almaty, 050032, Kazakhstan
(4) Plekhanov Russian University of Economics, 36 Stremyanny per., Moscow, 117997, Russia

E-mail: (a) kut@jinr.ru

The JINR Distributed Information and Computing Environment (DICE) was created to combine resources for solving common scientific tasks, as well as to distribute peak loads across the resources of partner organizations from the JINR Member States. To monitor the hardware resources and services of the growing DICE infrastructure, a system based on Prometheus and Thanos was designed and deployed. The collected metrics, including the geographical location of the JINR DICE participants, are visualized using Grafana. Software distribution is carried out with the help of the CERN Virtual Machine File System (CVMFS). All these topics, as well as the challenges encountered and possible ways to overcome them, are covered in detail.

Keywords: cloud computing, OpenNebula, cloud integration, DIRAC, ceph

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Participants and resources

Table 1 lists the JINR DICE participants (excluding the JINR cloud [1]), as well as the resources they provide.

{| class="wikitable"
|+ Table 1. JINR DICE participants and resources
! Organization !! Country !! Status !! Non-HT CPU cores !! RAM, GB !! Storage, TB
|-
| Plekhanov Russian University of Economics || RU || integrated || 132 || 608 || 51.1
|-
| Astana branch of the Institute of Nuclear Physics || KZ || integrated || 84 || 840 || 6.8 (SSD)
|-
| Institute of Physics of the National Academy of Sciences of Azerbaijan || AZ || integrated || 16 || 96 || 56
|-
| North Ossetian State University || RU || integrated || 84 || 672 || 17
|-
| Academy of Scientific Research & Technology - Egyptian National STI Network || EG || integrated || 98 || 704 || 13.8
|-
| Institute for Nuclear Research and Nuclear Energy || BG || integrated || 35 || 100 || 4
|-
| Sofia University "St. Kliment Ohridski" || BG || integrated || 48 || 250 || 4.7
|-
| Scientific Research Institute for Nuclear Problems of Belarusian State University || BY || integrated || 132 || 290 || 127
|-
| Institute of Nuclear Physics || UZ || in progress || || ||
|-
| Georgian Technical University || GE || in progress || 50 || 208 || 20
|-
| Total || || || 679 || 3,868 ||
|}

The amount of JINR cloud resources contributed to the JINR DICE varies depending on the load of the former: the more the JINR cloud resources are utilized, the smaller its DICE share. At the time of writing, the JINR cloud share amounts to 200 CPU cores.

2. Monitoring

To monitor network connectivity among the JINR DICE participants, the perfSONAR (PS) toolkit [2] is used. Deploying a PS instance at some sites is a challenge, since all cloud VMs there are behind NAT. Low external network bandwidth (e.g. 100 Mbps shared with the whole organization) is the main contributor to the high wall time of jobs. The most suitable type of job for resources with such low network bandwidth is Monte-Carlo simulation with negligible input data.

There is ongoing work on implementing metrics aggregation for each participating cloud. These data are then used for resource monitoring and accounting. All software components involved are shown in Fig. 1 and listed in the text below.

Figure 1. Schema of metrics aggregation, preservation and visualization for each JINR DICE participating cloud

All JINR DICE clouds are based on the OpenNebula software [3]. A custom collector for gathering OpenNebula-based cloud metrics was developed several years ago at the Meshcheryakov Laboratory of Information Technologies (MLIT) of JINR. It stores the collected data in the Prometheus time series database (TSDB) [4] via the node_exporter component [5]. With the help of the Thanos sidecar [6], the collected metrics are sent to the JINR ceph object storage via an S3 gateway with a public IP address (the JINR ceph storage itself has only private IP addresses and is not accessible from outside the JINR network). Grafana is used for data visualization and alerting; it queries the object storage for data via Thanos Querier and Thanos Store.
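The collector itself is not shown in the paper, so the following Python sketch only illustrates one common way such a collector can feed OpenNebula metrics into Prometheus through node_exporter: writing them in the Prometheus text exposition format into the node_exporter textfile collector directory. The metric names, labels, placeholder values and directory path are assumptions made for illustration and do not describe the actual MLIT collector.

<pre>
#!/usr/bin/env python3
# Hypothetical sketch: export OpenNebula cloud metrics for Prometheus via the
# node_exporter "textfile" collector. Metric names, labels and the way the
# values are obtained are illustrative assumptions, not the MLIT collector.
import os
import tempfile

# Directory that node_exporter is started with:
#   node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile
TEXTFILE_DIR = "/var/lib/node_exporter/textfile"


def collect_cloud_metrics():
    """Return cloud-level metrics as {metric_name: (help, value, labels)}.

    In a real collector the values would be taken from the OpenNebula
    XML-RPC API (e.g. the VM and host pools); here they are placeholders.
    """
    labels = {"cloud": "JINR"}
    return {
        "opennebula_vms_running": ("Number of running VMs", 42, labels),
        "opennebula_cpu_cores_used": ("Non-HT CPU cores allocated to VMs", 200, labels),
        "opennebula_memory_used_bytes": ("RAM allocated to VMs", 512 * 2**30, labels),
    }


def write_prom_file(metrics, path):
    """Write metrics in the Prometheus text exposition format, atomically."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        for name, (help_text, value, labels) in metrics.items():
            label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
            f.write(f"# HELP {name} {help_text}\n")
            f.write(f"# TYPE {name} gauge\n")
            f.write(f"{name}{{{label_str}}} {value}\n")
    os.rename(tmp, path)  # atomic replace so node_exporter never serves a partial file


if __name__ == "__main__":
    write_prom_file(collect_cloud_metrics(), os.path.join(TEXTFILE_DIR, "opennebula.prom"))
</pre>

Prometheus then scrapes node_exporter as usual, and the Thanos sidecar ships the resulting TSDB blocks to the ceph S3 endpoint, as described above.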
At the time of writing, metrics aggregation, preservation, visualization and alerting based on the described approach are implemented for the JINR and NOSU clouds.

Having all the data about the JINR DICE participating clouds in one place made it possible to visualize the geographical location of the resources, as well as their statuses, with the Worldmap Panel Grafana plugin [7]. Work in this direction is in progress.

3. Utilization

At present, the major consumer of the JINR DICE resources is the Baikal-GVD scientific experiment. The cumulative number of its jobs completed from November 2020 to July 2021 amounts to 67.5 thousand (Fig. 2). All these jobs are Monte-Carlo simulation. The main issue with them lies in the 2.4 GB input file, which needs to be downloaded for each job. This limits the number of resources that can be used for such jobs, since some participating organizations have very limited network bandwidth. One possible solution is to put this input file into a CVMFS repository: in that case, each job (except the first one on a site) will download it at local network speed from the CVMFS caching node.

Figure 2. Plot with the number of cumulative Baikal-GVD jobs completed from November 2020 to July 2021
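As a rough sketch of the CVMFS-based staging idea described above (and not the actual Baikal-GVD workflow), the Python fragment below shows how a job could prefer the copy of the input file published in a CVMFS repository and fall back to a remote download when the repository is not mounted. The repository name, file path and URL are hypothetical.

<pre>
#!/usr/bin/env python3
# Hypothetical job prologue: take the large input file from CVMFS when the
# repository is mounted on the worker node, otherwise download it remotely.
# Repository name, file name and URL are illustrative assumptions.
import os
import urllib.request

CVMFS_COPY = "/cvmfs/baikalgvd.example.org/input/background_2.4GB.dat"      # assumed path
REMOTE_URL = "https://storage.example.org/baikal-gvd/background_2.4GB.dat"  # assumed URL
LOCAL_COPY = os.path.join(os.getcwd(), "background.dat")


def locate_input():
    if os.path.isfile(CVMFS_COPY):
        # The file can be opened directly from /cvmfs: the client fetches and
        # caches the needed chunks via the site-local caching node, so only
        # the first job on a site pays the wide-area download cost.
        return CVMFS_COPY
    # Fallback for sites without the repository mounted: pull the file over
    # the WAN, which is exactly the slow path the CVMFS approach avoids.
    urllib.request.urlretrieve(REMOTE_URL, LOCAL_COPY)
    return LOCAL_COPY


if __name__ == "__main__":
    print("input file available at", locate_input())
</pre>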
Idle JINR DICE resources are used for SARS-CoV-2 research within the Folding@Home project [8]. A dedicated team called "Joint Institute for Nuclear Research" [9] was created within the project. Each member of this team corresponds to a single JINR DICE member, so the contribution of each participant to the Folding@Home project can be taken into account. Figure 3 illustrates the cumulative number of jobs completed as part of the SARS-CoV-2 research.

Apart from that, Spin Physics Detector (SPD) simulation and reconstruction jobs, as well as the BM@N workflow with simulation jobs, were successfully tested.

Figure 3. Plot with the number of cumulative SARS-CoV-2 jobs completed from May 2020 to July 2021

4. Conclusion

New members, as well as new users, keep joining the JINR Distributed Information and Computing Environment. Only Monte-Carlo simulation jobs with negligible input data are suitable for resources with low external network bandwidth (about 100 Mbps). The technical implementation of OpenNebula metrics aggregation, preservation and visualization is completed; this experience should be disseminated to the other JINR DICE clouds. The migration from the hand-drawn JINR DICE map to the Worldmap Panel Grafana plugin is in progress.

References

[1] Balashov N.A. et al. Present Status and Main Directions of the JINR Cloud Development // Proceedings of the 27th International Symposium Nuclear Electronics and Computing (NEC'2019), CEUR Workshop Proceedings, ISSN 1613-0073, vol. 2507 (2019), pp. 185-189

[2] perfSONAR toolkit web portal. Available at: https://www.perfsonar.net (accessed 22.09.2021)

[3] Milojicic D., Llorente I.M., Montero R.S. OpenNebula: A cloud management tool // IEEE Internet Computing, vol. 15, March 2011

[4] Prometheus web portal. Available at: https://prometheus.io (accessed 22.09.2021)

[5] Prometheus node_exporter GitHub page. Available at: https://github.com/prometheus/node_exporter (accessed 23.09.2021)

[6] Thanos sidecar GitHub page. Available at: https://github.com/thanos-io/thanos (accessed 23.09.2021)

[7] Worldmap Panel Grafana plugin web page. Available at: https://grafana.com/grafana/plugins/grafana-worldmap-panel (accessed 23.09.2021)

[8] SARS-CoV-2 web page on the Folding@Home web portal. Available at: https://foldingathome.org/diseases/infectious-diseases/covid-19/?lng=en (accessed 23.09.2021)

[9] JINR team web page on the Folding@Home web portal. Available at: https://stats.foldingathome.org/team/265602 (accessed 23.09.2021)