=Paper=
{{Paper
|id=Vol-3041/612-618-paper-113
|storemode=property
|title=IT Solutions for JINR Tasks on the “GOVORUN” Supercomputer
|pdfUrl=https://ceur-ws.org/Vol-3041/612-618-paper-113.pdf
|volume=Vol-3041
|authors=Dmitriy Podgainy,Dmitriy Belaykov,Andrey Nechaevsky,Oksana Streltsova,Aleksey Vorontsov,Maksim Zuev
}}
==IT Solutions for JINR Tasks on the “GOVORUN” Supercomputer==
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021

IT SOLUTIONS FOR JINR TASKS ON THE “GOVORUN” SUPERCOMPUTER

D.V. Podgainy, D.V. Belaykov, A.V. Nechaevsky, O.I. Streltsova, A.V. Vorontsov, M.I. Zuev

Meshcheryakov Laboratory of Information Technologies, JINR

E-mail: podgainy@jinr.ru

The “Govorun” supercomputer is a heterogeneous computing system that combines computing architectures of different types, including graphics accelerators. This architecture allows users to choose the computing facilities best suited to their tasks. To enhance the efficiency of solving user tasks and of utilizing both computing resources and data processing and storage resources, a number of special IT solutions have been implemented on the “Govorun” supercomputer. The first type of IT solution is a hierarchical hyperconverged data processing and storage system with a software-defined architecture. Its implementation reflects the fact that modern supercomputers are used not only as traditional environments for massively parallel calculations, but also as systems for Big Data analysis and artificial intelligence tasks arising in various scientific and applied problems. The second type of IT solution is resource orchestration, whereby computing elements (CPU cores and graphics accelerators) and data storage elements (SSDs) form independent computing and data storage fields. Owing to this, the user can allocate for a task the required number and type of compute nodes (including the required number of graphics accelerators), as well as the required volume and type of data storage systems.

Keywords: high-performance platforms, data processing and storage systems, computing for high-energy physics

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

The “Govorun” supercomputer is an integral part of the HybriLIT heterogeneous computing platform (http://hlit.jinr.ru/) of the Meshcheryakov Laboratory of Information Technologies of the Joint Institute for Nuclear Research (MLIT JINR), which also comprises the HybriLIT training and testing polygon [1, 2]. The “Govorun” supercomputer is designed for resource-intensive, massively parallel calculations. It is an innovative hyperconverged software-defined system whose flexibility in tailoring resources to the user task ensures the most efficient use of the supercomputer's computing resources. The “Govorun” supercomputer encompasses a GPU component, a CPU component and a hierarchical data processing and storage system. The GPU component is implemented on the basis of five NVIDIA DGX-1 servers, each of which contains eight Tesla V100 graphics accelerators. The CPU component is implemented on the “RSC Tornado” high-density architecture with direct liquid cooling, which ensures a high density of compute nodes (150 nodes per rack) and a high energy efficiency of about 10 GFlops/W.
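For reference, the energy-efficiency figure quoted here and the PUE value discussed in the next paragraph follow the standard definitions; the formulas below are a brief reminder and are not taken from the paper itself.

```latex
% Standard figures of merit for an HPC installation (not specific to "Govorun"):
\[
\text{Energy efficiency} \;=\; \frac{\text{performance [GFlops]}}{\text{power drawn by the IT equipment [W]}},
\qquad
\text{PUE} \;=\; \frac{E_{\text{total facility}}}{E_{\text{IT equipment}}} \;\ge\; 1 .
\]
```

A PUE close to 1 therefore means that almost all of the facility's energy goes to the computing hardware rather than to cooling and other overheads.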
The average annual PUE indicator of the system, reflecting its level of energy efficiency, is less than 1.06. The CPU and GPU components of the “Govorun” supercomputer rank 12th and 21st, respectively, in the TOP50 list of supercomputers of the CIS countries (http://top50.supercomputers.ru/).

The operation experience of the “Govorun” supercomputer has demonstrated the relevance and effectiveness of the latest hyperconverged computing architectures that are part of it, which is reflected in 109 user publications from July 2018 to January 2021; of these, 11 articles were published in Q1 journals and 15 in Q2 journals. Thus, on average, one publication corresponds to 8.5 days of operation of the “Govorun” supercomputer. In 2020, 65 articles were published, and a booklet compiled from user reports was posted on the website (http://hlit.jinr.ru/users_publications/). In the first six months of 2021, four articles have already been published in Q1 journals, and a paper of the BM@N collaboration has been prepared for the Nature Physics journal; thus, over three years of operation, two publications have been prepared for this prestigious journal using the resources of the “Govorun” supercomputer.

At the same time, the number of supercomputer users is growing. After commissioning in July 2018, the total number of users was 46 (41 from JINR and 5 from other organizations of the Russian Federation and JINR's Member States); in 2019, the number increased to 133 (93 from JINR and 40 from other organizations); and in 2020, it reached 161 (95 from JINR and 66 from other organizations). It is noteworthy that access to the resources of the “Govorun” supercomputer is provided only to users directly involved in the implementation of the JINR Topical Plan.

It should be pointed out that the range of tasks solved on the “Govorun” supercomputer is constantly expanding, and their specificity requires not only scaling supercomputer resources, but also introducing new IT solutions that are not characteristic of traditional HPC systems. An example of such a task is the computing being created for the NICA megascience project, for which the “Govorun” supercomputer, owing to the flexibility of its architecture, is a key resource for developing and testing IT solutions. In particular, a hierarchical data processing and storage system has been developed and implemented for this task on the “Govorun” supercomputer together with RSC Group.

Let us single out several fundamental features of the computing being created for the NICA project, which are also characteristic of other high-energy physics projects. First of all, it is necessary to work with large amounts of data, and different stages of the event reconstruction and simulation workflows require different access speeds: access speed is not an important factor for long-term storage tasks, but it is essential for reconstruction tasks (see the sketch below). In addition, a number of tasks require a large amount of RAM, which leads to the need for specialized nodes with large memory in the supercomputer architecture.
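To make the access-speed point concrete, the sketch below maps workflow stages to the kind of storage layer they would favour. It is purely illustrative: the stage names and tier labels are assumptions made for this example and are not identifiers from the “Govorun” software.

```python
# Illustrative only: map NICA-style workflow stages to the storage layer
# whose access speed matters most for them (stage and tier names are hypothetical).
STAGE_TO_TIER = {
    "event_reconstruction": "very_hot",   # low-latency, high-IOPS storage is essential
    "event_simulation": "hot",            # fast parallel file system
    "physics_analysis": "hot",
    "long_term_archive": "warm",          # access speed is not a critical factor here
}

def preferred_tier(stage: str) -> str:
    """Return the storage tier an illustrative workflow planner might prefer."""
    return STAGE_TO_TIER.get(stage, "warm")

if __name__ == "__main__":
    for stage in STAGE_TO_TIER:
        print(f"{stage:22s} -> {preferred_tier(stage)}")
```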
Thus, methodologically, to support all work processes for the tasks of the NICA megaproject, a system combining computing architectures of different types with a developed hierarchical data processing and storage system is being created on the “Govorun” supercomputer. The approach to the implementation of new IT solutions is schematically presented in Fig. 1.

Figure 1. Venn diagram of the approaches to the development of the “Govorun” supercomputer.

2. Architecture of the “Govorun” supercomputer

The supercomputer includes computing modules, network infrastructure modules, the RSC BasIS software and an infrastructure module (https://rscgroup.ru/). The computing modules comprise CPU and GPU components. The GPU component is based on five NVIDIA DGX-1 servers, each of which comprises eight NVIDIA Tesla V100 graphics accelerators. The CPU component is implemented on 21 RSC Tornado compute nodes containing Intel® Xeon Phi™ processors and 88 RSC Tornado nodes, each containing two Intel® Xeon® Platinum 8268 processors and two Intel® SSD DC P4511 drives (NVMe, M.2) with a capacity of 2 TB each.

Additionally, the “Govorun” supercomputer has the RSC Storage-on-Demand system, a single centrally managed system with several levels of data storage: very hot data, hot data and warm data. The very hot data storage system is built on four RSC Tornado TDN511S blade servers. Each server has 12 high-speed, low-latency Intel® Optane™ SSD DC P4801X 375GB M.2 Series solid-state drives with Intel® Memory Drive Technology (IMDT), which allows for 4.2 TB of very hot data per server. The hot and warm data storage system consists of a static storage system with the Lustre parallel file system, created on the basis of 14 RSC Tornado TDN511S blade servers, and a dynamic RSC Storage-on-Demand system on top of 84 RSC Tornado TDN511 blade servers with support for the Lustre parallel file system and the DAOS distributed object storage system. Low-latency Intel® Optane™ SSD DC P4801X 375GB M.2 Series drives are used for fast access to Lustre file system metadata, while Intel® SSD DC P4511 drives (NVMe, M.2) are used to store Lustre hot data.

The network infrastructure module encompasses a communication and transport network, a control and monitoring network and a task control network. The NVIDIA DGX-1 servers are interconnected by a communication and transport network based on InfiniBand 100 Gbps technology, and this component communicates with the CPU module via Intel OmniPath 100 Gbps. The communication and transport network of the CPU module is built with a fat tree topology based on 48-port Intel OmniPath Edge 100 Series switches with full liquid cooling. The control and monitoring network unites all compute nodes and the control node into a single Fast Ethernet network built on Fast Ethernet HP 2530-48 switches. The task control network connects all compute nodes and the control node into a single Gigabit Ethernet network built on HPE Aruba 2530 48G switches.

An equally important part of the “Govorun” supercomputer architecture is the RSC BasIS supercomputer control software.
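Before turning to the control software, the storage hierarchy described above can be summarized compactly. The sketch below is only a restatement of the figures quoted in the text; the dictionary keys and labels are chosen for this example.

```python
# A compact summary of the "Govorun" storage hierarchy as described above.
# Keys and labels are chosen for this example; figures are those quoted in the text.
STORAGE_HIERARCHY = {
    "very hot data": {
        "servers": "4 x RSC Tornado TDN511S blade servers",
        "media": "12 x Intel Optane SSD DC P4801X 375GB (M.2) per server, with IMDT",
        "capacity": "about 4.2 TB of very hot data per server",
    },
    "hot and warm data": {
        "static subsystem": {
            "servers": "14 x RSC Tornado TDN511S blade servers",
            "file_system": "Lustre",
        },
        "dynamic subsystem (RSC Storage-on-Demand)": {
            "servers": "84 x RSC Tornado TDN511 blade servers",
            "file_systems": ["Lustre", "DAOS"],
        },
    },
}

if __name__ == "__main__":
    import json
    # Print the hierarchy in a readable, nested form.
    print(json.dumps(STORAGE_HIERARCHY, indent=2))
```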
RSC BasIS uses the CentOS Linux 7.8 operating system on all compute nodes (CNs) and performs the following functions:
− monitors compute nodes, with emergency shutdown functionality in the case of critical malfunctions (such as CN overheating);
− collects indicators of the functioning of the communication and transport network components;
− collects performance indicators of compute nodes, i.e., the load of processors and RAM;
− stores the monitored indicators with the ability to view statistics for a given time interval (at least one year);
− collects the readings of the integral indicator of the state of the CNs and displays them on a schematic view of the computing rack;
− shows the status of the leak detection system (moisture control sensors on compute nodes) on the schematic view of the computing rack;
− shows the cluster user the efficiency of the resources allocated via the SLURM scheduler for a specific task, as the average load (%) of the CPUs allocated to the user;
− displays the availability of CNs on the computing network and the control network on the schematic view of the computing rack.

The supercomputer control software includes:
− a system for managing user profiles and environments;
− a node software control system, including installation and updating of the operating system and application software packages;
− the SLURM scheduler;
− software for secure remote access to the supercomputer;
− tools for parallel administration and supercomputer control.

In addition, RSC BasIS provides the management of the RSC Storage-on-Demand system, which ensures the following:
− configuration of logical storage system instances using drives installed inside the compute nodes intended for user tasks;
− configuration of the parameters and hierarchy of a storage system instance;
− monitoring of the key parameters of the storage system;
− automatic launch, via the task control system, of tasks on nodes that provide drives for use by other CNs;
− management of groups of CNs acting as clients of storage systems;
− automatic mounting and unmounting of storage systems on a group of CN clients after configuration changes;
− a graphical interface for creating static storage systems on demand, with the ability to verify the scheme of a future storage system;
− manual and automatic replacement of drives in degraded RAID arrays;
− the ability to connect drives over a network with RDMA and Ethernet support using the NVMe-over-Fabrics and NVMe-over-TCP protocols, respectively.

The infrastructure module comprises the refrigeration and power supply subsystems and an automated remote control system. The refrigeration subsystem provides the absorption of thermal energy, its removal from the compute nodes and its dissipation into the atmosphere. Thermal energy dissipation is ensured by a liquid cooler, a dry closed cooling tower. To transfer thermal energy from the nodes to the liquid cooler, circulation pumps and heat exchange units assembled in a rack form factor, i.e., a pumping unit, are used. All computing racks are connected to a single refrigeration system. The refrigeration system is built on manifolds of the internal and external circuits, which allows new liquid coolers, pumping units and computing racks to be connected without stopping the system. Pumping units are installed with N+1 redundancy. The refrigeration subsystem provides cooling of compute nodes with a total electrical power of up to 200 kW.
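The node-availability and per-task load indicators listed above are surfaced through the BasIS interfaces; from the user side, a comparable raw view can be obtained with standard SLURM commands. The sketch below is generic SLURM usage, not the RSC BasIS API, and assumes the SLURM client tools are on the PATH.

```python
import subprocess

def run(cmd):
    """Run a command and return its stdout as text (raises if the command fails)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Node name, state and CPU usage (allocated/idle/other/total) for each node.
nodes = run(["sinfo", "-h", "-N", "-o", "%N %t %C"])

# Job id, user, state and node list for currently queued or running jobs.
jobs = run(["squeue", "-h", "-o", "%i %u %T %N"])

print("Nodes:\n" + nodes)
print("Jobs:\n" + jobs)
```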
The power supply subsystem performs the distribution and metering of electrical power. The computing racks, network infrastructure cabinets, refrigeration subsystem and automated remote control system are connected to the power supply subsystem. The automated remote control system manages the operation parameters of the refrigeration subsystem; it also monitors the current states of the subsystems and provides the emergency shutdown of hardware.

3. IT solutions: hierarchical hyperconverged data processing and storage system with a software-defined architecture and resource orchestration

At present, there is a tendency in the development of supercomputer technologies to use supercomputers not only for massively parallel calculations, but also for working with Big Data [4]. This is related to the fact that the volume of Big Data is growing exponentially. Among the sources of Big Data in science, the main one is the Large Hadron Collider at CERN, which generates hundreds of petabytes of data per year [5]. Among upcoming experiments, the SKA project, a radio telescope with a collecting area of one square kilometer, is expected to produce a raw data flow of 1 exabyte per year [6]. When considering work with Big Data, two directions can be distinguished: the first is associated with tools for manipulating data (MapReduce algorithms, Hadoop libraries, etc.) [7], and the second with the development of software and hardware solutions that enable data manipulation tasks to be solved efficiently. The NICA accelerator complex currently under construction at JINR will generate tens of petabytes of data per year in its experiments. This trend, together with the creation of computing for the NICA project, has defined the development vector of the “Govorun” supercomputer, which plays the role of a research testbed for the elaboration of software, hardware and IT solutions for the NICA project.

To work with Big Data, a hierarchical hyperconverged data processing and storage system with a software-defined architecture has been implemented on the “Govorun” supercomputer. According to data access speed, the system is divided into layers that are available for the user's choice. Each layer of the developed data storage system can be used both independently and as part of data processing workflows. The “Govorun” supercomputer ranks 31st in the IO500 list (https://io500.org/) with a bandwidth of over 35 GiB/s and a metadata access rate of over 230 kIOP/s.

An equally important property of the “Govorun” supercomputer is the hyperconvergence of its compute nodes, which allows computing resources and data storage elements to be orchestrated and computing systems to be created on demand using the RSC BasIS software. The notion of “orchestration” means the software disaggregation of a compute node, i.e., the separation of its computing elements and data storage elements (SSDs) with their subsequent integration in accordance with the requirements of user tasks. Thus, computing elements (CPU cores and graphics accelerators) and data storage elements (SSDs) form independent fields.
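The user-facing side of this resource allocation, described in the next paragraph, can be illustrated with an ordinary SLURM batch request. This is a hedged sketch: the sbatch options are standard SLURM, but the partition name, script contents and application name are hypothetical and not taken from the “Govorun” configuration.

```python
import subprocess
import tempfile

# Hypothetical request: 2 nodes with 4 GPUs each on a GPU partition.
# Partition and application names are examples only.
job_script = """#!/bin/bash
#SBATCH --job-name=reco_example
#SBATCH --partition=gpu            # hypothetical partition name
#SBATCH --nodes=2
#SBATCH --gres=gpu:4               # GPUs requested per node
#SBATCH --time=02:00:00
srun ./reconstruct_events          # hypothetical user application
"""

# Write the batch script to a temporary file and submit it with sbatch.
with tempfile.NamedTemporaryFile("w", suffix=".sbatch", delete=False) as f:
    f.write(job_script)
    path = f.name

# sbatch prints the assigned job id on success.
print(subprocess.run(["sbatch", path], capture_output=True, text=True).stdout)
```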
Due to orchestration, the user can allocate for a task the required number and type of compute nodes (including the required number of graphics accelerators), as well as the required volume and type of data storage systems. After the task is completed, the compute nodes and storage elements are returned to their corresponding fields and are ready for the next use. This feature makes it possible to solve user tasks of different types effectively, to enhance the confidentiality of working with data and to avoid system errors that occur when the resources of different user tasks overlap.

The use of these IT solutions makes it possible to formulate the concept of the development of the “Govorun” supercomputer as a mapping of the main characteristics of Big Data, V³ (Volume, large amounts of data for processing and storage; Velocity, the need for high-speed data processing; Variety, data of different types), onto the hardware and software characteristics of the supercomputer, H³ (Heterogeneity, Hierarchy, Hyperconvergence). Heterogeneity, i.e., the presence of computing devices of different types, namely, central processors, both classical and with a large number of cores (Intel Xeon Phi), as well as graphics accelerators, allows the most suitable type of device to be chosen for efficient work with a given type of data. Hierarchy refers to the use of file systems with different read/write speeds, from ultra-fast (DAOS, Lustre) to distributed systems intended for long-term storage (EOS, tapes). Hyperconvergence stands for the ability of the system to scale to the required capacity: the same compute nodes can participate in data processing and be used for storage systems.

Figure 2. Concept of ensuring work with Big Data on the “Govorun” supercomputer: V³ → H³.

4. Conclusion

The implementation of the above technologies on the “Govorun” supercomputer has made it possible to perform a number of complex resource-intensive calculations in the field of lattice quantum chromodynamics to study the properties of hadronic matter at high energy density and baryon charge and in the presence of supramaximal electromagnetic fields; to qualitatively increase the efficiency of modeling the dynamics of relativistic heavy-ion collisions; to speed up event generation and reconstruction for the experiments of the NICA megaproject; to carry out computations of the radiation safety of JINR's experimental facilities; and to significantly accelerate studies in radiation biology and other applied tasks solved at JINR within international scientific cooperation.

Figure 3. Intended usage of DAOS for the NICA project.

It is noteworthy that the “Govorun” supercomputer is an actively developing platform that combines state-of-the-art computing architectures and IT solutions. At present, one such direction of development is research related to working with Big Data using the latest DAOS (Distributed Asynchronous Object Storage) technology [8].
For this purpose, an eight-node DAOS testbed has been deployed on the “Govorun” supercomputer; it has demonstrated high read/write speeds and ranks 16th in the “10 node challenge” category of the current edition of the IO500 list (https://io500.org/list/isc21/ten). Great prospects for this technology are associated with its application to the NICA project at all stages of its work, from experimental data acquisition to final physical analysis (Fig. 3). It is expected that the use of DAOS for the NICA project will make it possible to store and read multi-dimensional data structures of TB scale in a single address space, to create a multi-user presentation layer for analyzing physics results, and to easily integrate DAOS with other hot/warm storage systems. The latter point is especially important because a distributed, heterogeneous computing environment based on the DIRAC system has recently been developed for the MPD experiment to launch event generation and reconstruction tasks [9]. In the course of computing tasks, data is written to ultra-fast file storage and gradually transferred to slower storage, down to distributed or tape storage [10]. In addition, the DAOS technology looks promising for other types of Big Data tasks, primarily ML/DL tasks, including computer vision, as well as the actively developing field of quantum computing.

5. Acknowledgments

The studies in this direction were supported by the RFBR special grant (“Megascience – NICA”), No. 18-02-40101.

References

[1] Gh. Adam, M. Bashashin, D. Belyakov, M. Kirakosyan, M. Matveev, D. Podgainy, T. Sapozhnikova, O. Streltsova, Sh. Torosyan, M. Vala, L. Valova, A. Vorontsov, T. Zaikina, E. Zemlyanaya, M. Zuev. IT-ecosystem of the HybriLIT heterogeneous platform for high-performance computing and training of IT-specialists. Selected Papers of the 8th International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2018), Dubna, Russia, September 10-14, 2018, CEUR-WS.org/Vol-2267.

[2] Dmitry Belyakov, Andrey Nechaevskiy, Igor Pelevanuk, Dmitry Podgainy, Alexey Stadnik, Oksana Streltsova, Aleksey Vorontsov, Maxim Zuev. “Govorun” Supercomputer for JINR Tasks. CEUR Workshop Proceedings, 2020, 2772, pp. 1-12.

[3] Albrecht, J., Alves, A. A., Amadio, G., Andronico, G., Anh-Ky, N., Aphecetche, L., et al. (2019). A Roadmap for HEP Software and Computing R&D for the 2020s. Comput. Softw. Big Sci. 3 (1), 7. doi:10.1007/s41781-018-0018-8.

[4] Semin A. DAOS: Data storage system for HPC/BigData/AI applications in the era of exascale computing. Storage News, No. 2 (74), 2019 (in Russian).

[5] https://home.cern/news/news/computing/cern-data-storage-gets-ready-run-3.

[6] https://www.spiedigitallibrary.org/journals/Journal-of-Astronomical-Telescopes-Instruments-and-Systems/volume-8/issue-1/011004/Toward-a-Spanish-SKA-Regional-Centre-fully-engaged-with-open/10.1117/1.JATIS.8.1.011004.full.

[7] CMS Big Data Project: https://cms-big-data.github.io/.

[8] https://docs.daos.io/.

[9] N. Kutovskiy, V. Mitsyn, A. Moshkin, I. Pelevanyuk, D. Podgayny, O. Rogachevsky, B. Shchinov, V. Trofimov, A. Tsaregorodtsev. Integration of distributed heterogeneous computing resources for the MPD experiment with DIRAC Interware. PEPAN, v. 52, No. 4, pp. 999-1005, 2021.

[10] A.A. Moshkin, I.S. Pelevanyuk, D.V. Podgainy, O.V. Rogachevsky, O.I. Streltsova, M.I. Zuev. Approaches, services, and monitoring in a distributed heterogeneous computing environment for the MPD experiment. In Proceedings of the International Conference, September 27–28, 2021, Moscow, Russia / Ed. by Vl. Voevodin. Moscow: MAKS Press, 2021. DOI: https://doi.org/10.29003/m2454.RussianSCDays2021.