=Paper=
{{Paper
|id=Vol-2023/351-356-paper-57
|storemode=property
|title=HybriLIT - the main component of the MICC for heterogeneous computations at JINR
|pdfUrl=https://ceur-ws.org/Vol-2023/351-356-paper-57.pdf
|volume=Vol-2023
|authors=Gheorge Adam,Vladimir Korenkov,Dmitry Podgainy,Oksana Streltsova,Tatiana Strizh,Peter Zrelov
}}
==HybriLIT - the main component of the MICC for heterogeneous computations at JINR==
Proceedings of the XXVI International Symposium on Nuclear Electronics & Computing (NEC’2017)
Becici, Budva, Montenegro, September 25 - 29, 2017
Gh. Adam (1,2), V.V. Korenkov (1,3), D.V. Podgainy (1,a), O.I. Streltsova (1,3), T.A. Strizh (1),
P.V. Zrelov (1)
(1) Laboratory of Information Technologies, Joint Institute for Nuclear Research, 6 Joliot-Curie,
Dubna, Moscow region, 141980, Russia
(2) Horia Hulubei National Institute for Physics and Nuclear Engineering (IFIN-HH), 30 Reactorului
St., Măgurele - Bucharest, 077125, Romania
(3) Peoples' Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya str., Moscow,
117198, Russia
E-mail: (a) podgainy@jinr.ru
The article presents the principles and approaches for developing an IT-ecosystem based on the
heterogeneous computing cluster HybriLIT that satisfies the requirements of users solving the
scientific and engineering tasks covered by the JINR research program. Developing the
IT-ecosystem requires new services and new computational architectures, as well as adaptation of
the software. It is also important to train IT specialists in modern methods and tools for solving
specific tasks.
Keywords: high performance computing, heterogeneous cluster, hybrid architectures, HPC
software and information environment, parallel programming technologies, computational
science.
© 2017 Gheorge Adam, Vladimir V. Korenkov, Dmitry V. Podgainy, Oksana I. Streltsova,
Tatiana A. Strizh, Peter V. Zrelov
1. IT-ecosystem for HPC
The decision to extend the JINR MICC [1] with a heterogeneous computing cluster devoted to
high performance computing (HPC) at JINR and in the JINR Member States follows modern trends
in worldwide computing technologies. Integrating the heterogeneous cluster HybriLIT [2], which
includes nodes with CPU and GPU architectures, into the JINR MICC infrastructure makes it
possible to follow the two main directions in the development of computation acceleration and
gives users of the cluster access to the available HPC tools. An analysis of the needs for high
performance computing at JINR resulted in the definition of three basic tasks to be solved by the
heterogeneous HPC cluster: design and implementation of parallel software for computing-intensive
research; porting to the cluster of open software packages and numerical libraries that are already
tuned for hybrid architectures; and development of new mathematical methods and parallel
algorithms adapted to heterogeneous architectures. To use efficiently the computational
architectures developed by the leading HPC companies, Intel and NVIDIA, and to satisfy the
requirements of users solving the tasks mentioned above, an IT-ecosystem needed to be created.
This IT-ecosystem is intended to provide access to an information space that includes
services for organizing the efficient work of JINR employees, a software and hardware
environment for solving various tasks, management and processing of data of various volume and
structure, and training and organization of scientific and research processes.
To support the development of the IT-ecosystem, it is necessary to provide information and
communication support based on modern network solutions that offer higher capacity and
operational reliability of the network.
Figure 1. IT-ecosystem for HPC
The developed IT-ecosystem (see Fig. 1) unites the hardware and the information-software
environment, which include modern computation architectures, applied software, information services,
services for working with applied software packages through various GUIs, tools for debugging and
profiling parallel applications, and tools for visualizing computations. One of the most important
aspects here is the training of IT specialists in modern methods and tools for solving problems
connected with the use of HPC in theoretical, experimental and applied tasks [3].
2. HybriLIT current status
At present, HybriLIT consists of ten computational nodes: four nodes with NVIDIA Tesla K80
graphics processors, four nodes with NVIDIA Tesla K40 graphics processors, one node with an
Intel Xeon Phi 7120P coprocessor, and one node with an NVIDIA Tesla K20x and an Intel Xeon Phi
5110P coprocessor. Each computational node includes two 12-core Intel Xeon E5-2695v2
processors. In total, the cluster comprises 252 CPU cores, 77184 GPU cores and 182 PHI cores,
2.4 TB of RAM and 57.6 TB of HDD storage. Its peak performance is 142 TFlops in single precision
and 50 TFlops in double precision. The cluster is intended for massively parallel computations on
modern computing architectures such as GPUs (NVIDIA Tesla K40, K80) and Intel Xeon Phi
coprocessors/processors.
To make application development and computations on the HybriLIT cluster more efficient, the
heterogeneous computation team develops and supports an information-software environment that
can be divided into two levels.
The first level is the system level; it includes the basic software:
• OS Scientific Linux 7.4 and the Extreme Cluster Administration Toolkit xCAT;
• NFS4 and EOS file systems;
• SLURM, a workload manager and task scheduler that distributes tasks among the
nodes of the heterogeneous cluster;
• Modules, which enables dynamic modification (interrogation, allocation, and deallocation) of the
user's environment;
• CernVM-FS software repository.
The following software for developing, debugging and profiling parallel applications and for
carrying out resource-demanding computations is installed:
• OpenMPI 1.8.8, 2.1.2;
• CUDA 8.0, 9.0;
• GNU compilers 4.4.7, 4.8.4, 4.9.3;
• Intel Parallel Studio XE 2018;
• GEANT4, ROOT, FairSoft, MpdROOT et al.
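As an illustration of how the system-level components fit together, the following minimal Python sketch composes the text of a SLURM batch script that loads software through Modules and optionally requests GPUs. It is only a sketch under stated assumptions: the partition name, the module names, the command and the helper `render_batch_script` are hypothetical placeholders and do not describe the actual HybriLIT configuration.

```python
def render_batch_script(job_name, partition, modules, command,
                        gpus=0, time_limit="00:30:00"):
    """Return the text of a SLURM batch script as a string.

    All argument values are supplied by the caller; nothing here is
    specific to HybriLIT (partition and module names are hypothetical).
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus > 0:
        # Generic resource request for GPUs on the allocated node.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines.append("")
    # Environment Modules: load each requested package before running.
    lines.extend(f"module load {name}" for name in modules)
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    print(render_batch_script("demo", "gpu", ["cuda/8.0"], "srun ./app", gpus=1))
```

The resulting text would be saved to a file and submitted with `sbatch`; keeping the script generation in a function makes it easy to produce variants for CPU-only or GPU jobs.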
The second level includes a set of services that help users organize their work more efficiently
and give them access to required and useful information while working on the cluster. Some of
these services provide necessary information about the cluster and about upcoming events held by
the HybriLIT team. Such services include:
• HybriLIT web-page: it contains detailed information about the resources provided to users
of the cluster, with particular emphasis on the hardware (the cluster's specifications) and on
the software installed on the cluster. Because the registration procedure is very important, a
detailed description of it is provided on the web-page. Users can also consult the instructions
for running tasks, which include basic information about the cluster.
• Indico system: it is used by the HybriLIT team to organize conferences, seminars
and meetings dedicated to parallel programming technologies. The system allows creating
events that will take place at the Institute. Each created event provides basic
information about the event itself, its time and place. Lecture materials can also be uploaded
so that every user can download them.
Alongside the services that provide detailed information about the cluster and its capabilities,
there are services aimed at close interaction between users and the HybriLIT team. Such
services include:
• HybriLIT User Support: it is a project developed in the Project Management Service system
[6] that makes it possible to answer users' questions, upload useful materials, publish news, etc.
This project was developed for more efficient interaction between users and the HybriLIT team.
This system makes it possible to distribute information about upcoming events
and the current state of the cluster. Users can also create tasks for upgrades and debugging.
Thus, interaction between users of the cluster and the developers is carried out quickly and
efficiently.
• GitLab: it is a service for collaborative parallel development of applications. It is built on
the Git version control system and makes it possible to track changes in the code of projects.
Its functionality is broad and includes access separation among users, a task management
system, a wiki, code review, etc. The HybriLIT team has developed several projects within this
system. One of the main projects, «Parallel features» [7], was created within the GitLab
framework with a threefold purpose:
- to carry out co-development of software packages and programs by independent user
groups;
- to provide more effective consultancy to the cluster users and to people planning the future
development and implementation of parallel programs for calculations on multicore systems
(based on MPI, OpenMP, OpenCL) and on coprocessors (CUDA for NVIDIA graphics processing
units (GPUs), OpenMP for Intel Xeon Phi coprocessors);
- to develop parallel applications and methodological examples based on them.
The report discusses an example of parallel computations that uses the SLURM scheduling
system to run sequential tasks requiring massive calculations for different values of the input
parameters. The need for such an algorithm arose from the investigation of Josephson
junctions in high temperature superconductors subject to external radiation [4]. The project
has already solved the problem of obtaining optimization keys for the different compilers
installed on the HybriLIT cluster.
The project «Paralleling features» can serve as a template for the systematic development
of parallel applications implemented on different computing platforms containing NVIDIA
GPUs and Intel Xeon Phi coprocessors. It may also be used for the development of more
efficient applications by cluster users from various JINR laboratories, by students and by
participants of training courses conducted on the heterogeneous cluster HybriLIT.
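Running the same sequential program for many values of an input parameter under SLURM, as described above, is commonly done with a job array. The Python sketch below generates the parameter file and the batch script for such a scan. It is a generic illustration and not the code of the project discussed here; the program name `./junction`, the file name `params.txt` and both helper functions are hypothetical.

```python
def parameter_lines(values):
    """Return the contents of a parameter file: one value per line."""
    return "".join(f"{value}\n" for value in values)


def render_array_script(n_tasks, program, param_file="params.txt"):
    """Return a SLURM job-array script that runs `program` once per parameter.

    Array task i reads line i + 1 of `param_file` and passes it to the
    program as its only argument.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --array=0-{n_tasks - 1}",
        "#SBATCH --ntasks=1",
        # %A is the array's master job ID, %a the array task index.
        "#SBATCH --output=scan_%A_%a.out",
        "",
        # SLURM sets SLURM_ARRAY_TASK_ID for every task of the array.
        f'VALUE=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" {param_file})',
        f'{program} "$VALUE"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    values = [0.1, 0.2, 0.3]  # hypothetical input parameter values
    print(parameter_lines(values))
    print(render_array_script(len(values), "./junction"))
```

Each array task is an independent sequential run, so the scheduler can spread the tasks over whatever cores are free without any change to the program itself.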
Since the users of the cluster are scientists and specialists from different countries,
bilingual support (Russian and English) of all resources is important.
Fig. 2 shows the software and information environment of the cluster.
Figure 2. Software and information environment of HybriLIT
3. New services
A new service, HLIT-VDI, has been developed for shared use of applied software
packages on the HybriLIT cluster through a GUI (graphical user interface). With this service it is
now possible to work with applied software packages such as Wolfram Mathematica, Maple, Matlab,
COMSOL, GEANT4, etc. via remote access to virtual machines (VMs) within the
HybriLIT cluster (Fig. 3). The service allows carrying out computations inside the VMs as well as
massive computations using the resources of the cluster. HLIT-VDI was developed to solve
the following problems, which are closely connected with the use of HPC for applied software
packages.
Figure 3. Workflow of HLIT-VDI service
First, scientific and engineering tasks rely on software packages with floating licenses
held by the organization; the developed service will therefore be optimized for the
distribution of computation time among users.
Secondly, solving resource-intensive tasks requires computation clusters, including those
with GPUs (if the software packages support GPUs), in order to reduce computation time.
For example, the geometry and/or computation grids are prepared by means of the GUI, while
the computations are carried out on HPC platforms. The developed service will have a
user-friendly interface, which will make it easier to accelerate computations.
Thirdly, since software packages for engineering computations require a large amount of
work with graphical data in order to provide efficient data analysis, a graphical station with an
NVIDIA M60 has been added.
A mobile application, MobiLIT, has been developed in collaboration with Universität Kassel.
It reports the resources available on the cluster, the characteristics of the nodes, and the
status of launched tasks, which helps users organize their computations more efficiently.
The application offers the following possibilities:
• monitoring of the user's jobs;
• quick view of the user's files (*.out, *.dat, *.in or *.sh);
• killing jobs directly via the app;
• monitoring of the expected computation time;
• monitoring of the resources available on the different nodes and partitions of the cluster,
which are meant for using different computing architectures;
• full information on the jobs launched on the cluster, the resources they require, etc.
MobiLIT was developed using the NativeScript framework. It is available in the Google Play
Store [4] for users of Android smartphones and tablets; for iOS users the application is available
in the App Store in test mode. To sign in, users need their authorization data for the HybriLIT
cluster.
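The paper does not describe how MobiLIT obtains its data. As a hedged illustration of the kind of job monitoring listed above, the Python sketch below parses the output of the standard SLURM command `squeue` into records and builds the `scancel` command that would cancel a job. The `Job` type and all function names are hypothetical and are not part of MobiLIT.

```python
from dataclasses import dataclass

# squeue format fields: job ID, partition, job name, state,
# time used, time limit, separated by "|".
SQUEUE_FORMAT = "%i|%P|%j|%T|%M|%l"


@dataclass
class Job:
    job_id: str
    partition: str
    name: str
    state: str
    elapsed: str
    time_limit: str


def squeue_command(user):
    """Build the argument list that lists the jobs of one user."""
    return ["squeue", "--noheader", f"--user={user}",
            f"--format={SQUEUE_FORMAT}"]


def cancel_command(job_id):
    """Build the argument list that cancels one job."""
    return ["scancel", job_id]


def parse_squeue(output):
    """Turn squeue output produced with SQUEUE_FORMAT into Job records.

    Blank lines and lines without exactly six fields (for example a job
    name that itself contains "|") are skipped.
    """
    jobs = []
    for line in output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 6:
            continue
        jobs.append(Job(*fields))
    return jobs


if __name__ == "__main__":
    # Made-up sample output, for illustration only.
    sample = "101|gpu|scan|RUNNING|1:02|30:00\n102|cpu|fit|PENDING|0:00|1:00:00\n"
    for job in parse_squeue(sample):
        print(job.job_id, job.state, job.elapsed, "of", job.time_limit)
```

On a real cluster the argument lists would be passed to `subprocess.run`; separating command construction from parsing keeps the parsing logic testable without access to a scheduler.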
The MobiLIT mobile application enriches the information-software environment of HybriLIT
with a convenient and modern IT service that makes it simpler to carry out parallel
computations and helps users make better use of the computation resources of the cluster.
4. Acknowledgement
The work was supported by the Russian Foundation for Basic Research (RFBR), grant
No. 15-29-01217, and by the Ministry of Education and Science of the Russian Federation
(Agreement number 02.a03.21.0008).
5. Conclusion and future plans
The HPC concept described above is aimed at solving scientific and applied tasks
that are crucial for JINR by implementing and developing a wide range of new IT solutions
integrated into the information-software environment of the IT-ecosystem, including various
technological solutions, concepts and approaches.
The development of such an IT-ecosystem reduces the time needed to develop JINR's
experimental projects, which require theoretical and applied computations at the stages of
engineering the experimental facilities, the development of computing models for
experiments, the solution of tasks connected with data analysis and processing, and the
development of an information environment for managing all components of the projects.
This IT-ecosystem is the basic platform for training IT specialists to develop
algorithmic and software solutions in the fields of system and applied programming and of
mathematical, information and simulation models.
It is planned to develop the IT-ecosystem as a distributed HPC platform that unites
(heterogeneous) supercomputers with grid and cloud technologies [5], in order to provide optimal
approaches to the solution of various scientific and applied tasks that require both massively
parallel computations and Big Data methods and technologies; this will allow modern
computation architectures to be used much more efficiently.
References
[1] MICC LIT JINR: https://miccom.jinr.ru/ru/
[2] Heterogeneous computing cluster HybriLIT: http://hybrilit.jin.ru
[3] Korenkov V.V., Podgainy D.V., Streltsova O.I. Educational program on HPC technologies on
the basis of the HybriLIT heterogeneous cluster (LIT JINR) // Modern Information Technology and
IT-education, Vol. 13, No. 4, 2017, pp. 141-146 (in Russian)
[4] Streltsov A.I. MobiLIT@HPC in Google Play Market:
https://play.google.com/store/apps/details?id=org.nativescript.mobilit
[5] Korenkov V.V., Strizh T. A., Gh. Adam, Podgainy D.V. The development of distributed
computing technologies and BigData in LIT-JINR // Proceedings of the 8-th ROLCG 2015
Conference “Grid, Cloud and High Performance Computing in Science”, Cluj-Napoca, Romania,
Editura U.T. (2016)