=Paper=
{{Paper
|id=Vol-3041/321-325-paper-59
|storemode=property
|title=Design and Development of Application Software for the MPD Distributed Computing Infrastructure
|pdfUrl=https://ceur-ws.org/Vol-3041/321-325-paper-59.pdf
|volume=Vol-3041
|authors=Andrey Moshkin,Igor Pelevanyuk,Oleg Rogachevskiy
}}
==Design and Development of Application Software for the MPD Distributed Computing Infrastructure==
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021

A.A. Moshkin (1), I.S. Pelevanyuk (1, a), O.V. Rogachevskiy (1)

(1) Joint Institute for Nuclear Research, 6 Joliot-Curie St., Dubna, Moscow Region, Russia, 141980

E-mail: (a) pelevanyuk@jinr.ru

The Multi-Purpose Detector collaboration began using distributed computing for centralized Monte-Carlo generation in mid-2019. The DIRAC Interware is used as a platform for the integration of heterogeneous distributed computing resources. Since then, workflows for job submission, data transfer, and storage have been designed, tested, and successfully applied. Moreover, user interest in access to the computing system has been growing. One way to provide such access is to allow users to submit jobs to DIRAC directly. However, direct access to the resources places a high level of responsibility on users and must be restricted. For this reason, another approach was chosen: to design and develop a dedicated application that collects requirements from users and starts the required number of jobs. This approach entails additional effort, namely the elaboration of requirements, application design, and development. Nevertheless, it allows for greater control over the workload submitted by users, reducing possible failures and the inefficient usage of resources.

Keywords: data processing, distributed computing, GRID applications

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

The Multi-Purpose Detector (MPD) is the first experiment at the NICA complex. The MPD apparatus has been designed as a 4π spectrometer capable of detecting charged hadrons, electrons, and photons in heavy-ion collisions at high luminosity in the energy range of the NICA collider. To reach this goal, the detector will comprise a precise 3D tracking system and a high-performance particle identification (PID) system based on time-of-flight measurements and calorimetry [1]. At present, the NICA complex and the MPD experimental setup are under construction. However, there is already a need to generate hundreds of millions of events in order to develop algorithms for recognizing and reconstructing the tracks of elementary particles and to work out algorithms for physics data analysis. To solve these problems, and later to analyze real data from the MPD experiment, DIRAC is used.

2. Use of the DIRAC Interware for MPD distributed computing

For efficient data processing of the MPD experiment, a heterogeneous, geographically distributed computing environment is currently being created on top of the DIRAC Interware [2]. The DIRAC Interware is an open-source development platform for the integration of heterogeneous computing and storage resources. It was originally developed for the LHCb computing infrastructure, but was later released as a general-purpose solution for scientific groups. It is now used by many experiments in high-energy physics, particle physics, and astronomy: LHCb, Belle-II, BES-III, CTA, CLIC, ILC.
The purpose of DIRAC is to provide access to various computing and storage resources through standard interfaces, namely web, command line, API, and REST. Another purpose is to provide a set of standard tools for workload management, data management, accounting, workflow management, user management, and so on.

A service based on the DIRAC platform was deployed and configured at the Joint Institute for Nuclear Research in 2016, and various tests were performed to evaluate its capabilities and performance [3]. By 2019, standard workflow tests related to general Monte-Carlo generation had been successfully carried out, which proved the possibility of using DIRAC for real scientific computation. At that time, MPD began to need massive computing resources for centralized Monte-Carlo generation for its scientific groups. At first, only the Tier1 and Tier2 sites were used for this work, but later the “Govorun” supercomputer and the VBLHEP cluster were integrated into the system. The JINR EOS storage system was also integrated and is accessed via the root protocol. Authentication is based on X.509 certificates, and authorization is regulated by both DIRAC and VOMS. At the moment, the heterogeneous, geographically distributed computing environment includes the Tier1 and Tier2 grid sites, the “Govorun” supercomputer, the cloud component of the MICC JINR, and the computing clusters of VBLHEP JINR and UNAM Mexico [4].

The extensive use of different distributed resources made it possible to successfully complete more than 800 thousand computing jobs. Each job ran for approximately 6-7 hours, and the total amount of computing work is around 4 MHS06 days. The requests for all these jobs came from four MPD Physics Working Groups, which form requests with all the information relevant for data production in a special information system. The production manager uses this information to form job descriptions and submit them to DIRAC. Each job consists of two parts: the Job Description and a Shell script that is executed on the worker node. The Job Description specifies the resource on which the job should run, the executable to be run on the worker node (in most cases for MPD jobs it is the shell, /usr/bin/sh), the arguments to be passed to the executable (the name of a Shell script), and the files to be passed with the job. The Shell script comprises the initial data download, configuration, execution of the physics software, and upload of the resulting data. In most cases, the DIRAC API is used to generate and submit thousands of similar jobs that differ only by an index number; a sketch of this pattern is shown below. This process requires special expertise related to the physics software, the use of DIRAC, and the features of different computing resources.
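The following is a minimal sketch of this submission pattern using the DIRAC Python API. The script name, sandbox contents, job count, and error handling are assumptions made for the example and do not reproduce the actual MPD production code; in practice, additional Job Description parameters, such as the destination resource, are set in the same way before submission.

<syntaxhighlight lang="python">
# Minimal sketch: submitting many similar jobs that differ only by an index.
# The script name, sandbox files, and job count are illustrative assumptions.
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)  # initialize the DIRAC client environment

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

dirac = Dirac()

for index in range(1, 1001):  # one job per index number
    job = Job()
    job.setName("mpd_mc_%05d" % index)
    # The executable is the shell; the Shell script name and the job index
    # are passed to it as arguments.
    job.setExecutable("/usr/bin/sh", arguments="mpd_production.sh %d" % index)
    # Files shipped together with the job.
    job.setInputSandbox(["mpd_production.sh"])
    job.setOutputSandbox(["std.out", "std.err"])
    result = dirac.submitJob(job)
    if not result["OK"]:
        print("Submission of job %d failed: %s" % (index, result["Message"]))
</syntaxhighlight>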
There are currently three main ways to give users access to distributed computing resources. The first is to perform extensive training for a user and gradually expand their access to different resources. This is expensive in terms of time, both for the DIRAC specialists who teach and for the users who learn. Nevertheless, once the training is complete, the user becomes very flexible in terms of job definition and workflow design, although users are then required to keep up to date with the changing and evolving infrastructure. The second way is to find a DIRAC specialist who submits physics jobs on behalf of the scientists. Users explain the task to the specialist, and the specialist designs and submits jobs to the resources. This is less time-consuming for both users and DIRAC specialists; however, the DIRAC specialist needs to become more proficient in the applied software. This approach has been used successfully by MPD and proved to be effective. The third way is to develop a dedicated DIRAC application that provides an interface defined in terms of user tasks (required software, type of job, energy, number of events needed). It is highly convenient for users. Nevertheless, it requires much more effort from DIRAC development specialists, and the developed application subsequently needs to be supported.

The third approach was chosen to provide physicists with the possibility of using the distributed infrastructure, and a special dedicated application was designed and developed.

3. Design and development of the application for MPD users

The DIRAC Interware provides a convenient web interface suitable for small workload submission, job monitoring, accounting checks, and some other activities. The frontend is built with the ExtJS framework. Essentially, the web interface is a set of different applications related to different activities. Each application requires two code files: one with the JavaScript code responsible for visualization and web logic, and another with the Python code working as a backend and responsible for getting data from the database or from various DIRAC services. It is possible to create additional custom applications simply by adding new modules to the corresponding frontend and backend code directories.

Table 1. Input parameters for the MPD application

{| class="wikitable"
! Name of parameter !! Type of parameter !! Possible values
|-
| InputTemplate || string || urqmd-BiBi-09.2GeV-mb-eos0-500
|-
| Generator || enum || UrQMD, SMASH, DCQGSM, etc.
|-
| EventsNumber || int || 1000000
|-
| Beam || enum || Au, Bi, Ag, p, C, Pb
|-
| Target || enum || Au, Bi, Ag, p, C, Pb
|-
| Centrality || string || mb
|-
| RecMod || string || dst-BiBi-09.2GeV-mb-eos0-500ev
|-
| Energy || double || 12.2
|-
| InputExtention || string || f14
|-
| InputSandbox || file || Several uploaded files
|}

The creation of a custom DIRAC application responsible for production job submission should provide easier access to computing resources with a lower risk of errors. This can potentially bring new users to the system and improve resource utilization. Users will not need to learn all the details of DIRAC, its commands, and its API. It was decided to design an application for MPD Monte-Carlo jobs.

The first step is the design of the future application. At this step, it is necessary to define the schema of the future application, the set of fields that will form the input, and the set of fields that will form the output. The best way to do this is to analyze the Shell scripts of previously submitted jobs. In our case, the main task of the application is to submit thousands of jobs with the same physics parameters and a different index parameter for each job. There should be no output other than the result of the massive job submission. The list of the physics parameters is shown in Table 1; a hypothetical sketch of how such a request might be handled on the server side is given below.
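The following is a purely hypothetical sketch of how a request built from the Table 1 parameters might be represented and fanned out into per-job parameters. The field values, the events-per-job constant, and the helper functions are assumptions made for the example and are not taken from the actual MPD application; the sketch only illustrates that the physics parameters are shared by all jobs while the index varies.

<syntaxhighlight lang="python">
# Hypothetical sketch: a request with the Table 1 parameters and its fan-out
# into per-job parameters. Values and helper names are illustrative assumptions.
EVENTS_PER_JOB = 500  # assumed number of events generated by a single job

ALLOWED_IONS = {"Au", "Bi", "Ag", "p", "C", "Pb"}

request = {
    "InputTemplate": "urqmd-BiBi-09.2GeV-mb-eos0-500",
    "Generator": "UrQMD",
    "EventsNumber": 1000000,
    "Beam": "Bi",
    "Target": "Bi",
    "Centrality": "mb",
    "RecMod": "dst-BiBi-09.2GeV-mb-eos0-500ev",
    "Energy": 9.2,                # illustrative value, in GeV
    "InputExtention": "f14",      # parameter name spelled as in Table 1
}

def validate(req):
    """Basic validation mirroring the enum and int fields of Table 1."""
    if req["Beam"] not in ALLOWED_IONS or req["Target"] not in ALLOWED_IONS:
        raise ValueError("Unknown beam or target species")
    if req["EventsNumber"] <= 0:
        raise ValueError("EventsNumber must be positive")

def per_job_parameters(req):
    """Yield (index, parameters): the physics parameters are shared by all
    jobs in the pack, and only the job index differs."""
    n_jobs = -(-req["EventsNumber"] // EVENTS_PER_JOB)  # ceiling division
    for index in range(1, n_jobs + 1):
        yield index, dict(req, JobIndex=index)

validate(request)
print(sum(1 for _ in per_job_parameters(request)))  # 2000 jobs for 1,000,000 events
</syntaxhighlight>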
Figure 1. Schema of the MPD application for Monte-Carlo production

The user should open the dedicated application, fill in all the physics parameters, define the number of collision events to be generated, and attach two files with ROOT code: run.mc and reco.mc. These files are written by physicists and contain the code related to the generation and reconstruction processes, respectively. On the server side, a special service receives this user request and forms a temporary file with the user's parameters. This file is passed with each job, since all jobs should use the same physics parameters. The Job Description parameters are set by the service itself; they are the same for all jobs in one pack. To enable this, the Shell script was taken from previous productions and modified to use the physics configuration file. The variable index of each job is passed separately as an additional argument to the Shell script. The required number of jobs is then submitted to the DIRAC job queue. The application usage schema is shown in Figure 1.

During the development stage, the major part of the code was taken from the standard DIRAC JobLaunchpad application, and several major changes were made to it. The resulting MPDJobLaunchpad application first checks that the user who opens it belongs to the MPD group in DIRAC; only MPD users are allowed to submit jobs via this interface. Custom input fields were placed on the interface and configured to accept only valid parameter values. Most of the logic was put into the server-side code: it reads the user parameters, calculates the number of jobs to be submitted, and forms the parameters for all submitted jobs. The Shell script for execution now resides on the server; it was changed to accept the user parameters and use them during its execution.

4. Conclusion

The developed application allows ordinary users, who are not specialists in the DIRAC Interware, to run massive Monte-Carlo productions. The web interface (Fig. 2) is simple to understand without additional training. It is certainly possible to design and develop new applications of this kind if needed. However, the main requirement for using this approach is that only the parameters should change, not the scientific processing schema. Otherwise, frequent changes will be required, and processing will shift to more flexible approaches that involve the help of a DIRAC specialist. The second issue is that the proposed application will not work well if the scientific software is unstable: frequent crashes can complicate workload execution and require a lot of additional work from users to resubmit jobs. It is, however, possible to improve the developed application by adding a resubmission mode, in which it would check for failed jobs and resubmit them.

Figure 2. MPD application for Monte-Carlo production on the DIRAC web interface

The main advantage of the proposed approach is that it simplifies the process of submitting jobs to the DIRAC system and allows us to provide distributed resources to a wider range of users. Another advantage is that it helps to eliminate user errors in the execution scripts (as opposed to the scientific software) and brings more order to result file placement and metadata assignment. If the Monte-Carlo generation process does not change over time, the application requires no support, since its operation is stable. The development of dedicated applications for different user groups is a viable option for administrators of DIRAC instances in different organizations.
It is not too difficult to design and develop this kind of application, although it requires some expertise in DIRAC development. Developed applications can be used in parallel with other job submission approaches during the testing stage and, if successful, can then be offered to other users.

5. Acknowledgments

The software described in this work was created with the support of RFBR research project No. 18-02-40101. The application of this software to automate the task running processes was supported by the RFBR grant (“Megascience – NICA”) No. 18-02-40102.

References

[1] NICA – Nuclotron-based Ion Collider fAcility. URL: https://nica.jinr.ru (accessed 30.08.2021).

[2] Tsaregorodtsev A. and the DIRAC Project. DIRAC Distributed Computing Services // J. Phys.: Conf. Ser. 2014. Vol. 513, No. 3. DOI: 10.1088/1742-6596/513/3/032096.

[3] Korenkov V., Pelevanyuk I., Tsaregorodtsev A. Integration of the JINR Hybrid Computing Resources with the DIRAC Interware for Data Intensive Applications // Data Analytics and Management in Data Intensive Domains. 2020. P. 31-46. DOI: 10.1007/978-3-030-51913-1_3.

[4] Kutovskiy N., Mitsyn V., Moshkin A. et al. Integration of Distributed Heterogeneous Computing Resources for the MPD Experiment with DIRAC Interware // PEPAN. 2021. Vol. 52, No. 4.