=Paper= {{Paper |id=Vol-3041/439-444-paper-81 |storemode=property |title=Development of the Event Metadata System for the NICA Experiments |pdfUrl=https://ceur-ws.org/Vol-3041/439-444-paper-81.pdf |volume=Vol-3041 |authors=Evgeny Alexandrov,Igor Alexandrov,Artyom Degtyarev,Irina Filozova,Konstantin Gertsenberger,Peter Klimai,Alexander Yakovlev }} ==Development of the Event Metadata System for the NICA Experiments== https://ceur-ws.org/Vol-3041/439-444-paper-81.pdf
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021



       DEVELOPMENT OF THE EVENT METADATA SYSTEM
               FOR THE NICA EXPERIMENTS
 E.I. Alexandrov¹, I.N. Alexandrov¹,ᵃ, A.G. Degtyarev², I.A. Filozova¹, K.V. Gertsenberger¹,
                          P.A. Klimai², A.V. Yakovlev¹

   ¹ Joint Institute for Nuclear Research, 6 Joliot-Curie st., Dubna, 141980, Russia
   ² Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, 141701, Russia

                                          E-mail: ᵃ alexand@jinr.ru


Particle collision experiments are known to generate a substantial amount of data that must be stored
and then analyzed. Typically, only a small subset of all collected events is relevant when performing a
particular physics analysis task. Although it is possible to obtain the required subset of event records
directly, by iterating through the whole volume of the collected data, the process is very time- and
resource-consuming. Instead, a more convenient approach widely used in modern particle collision
experiments is to have an event metadata or indexing system that stores summary properties of all
events and allows fast search for and retrieval of only the necessary events based on given criteria.
Such a system, called the Event Metadata System, has been developed for the current fixed-target and
future collider experiments of the NICA project. The design of the system, its components, user
interfaces and the REST API service, its integration with existing experiment systems and software, as
well as associated challenges, are presented.


Keywords: particle collision experiment, data management, event metadata, event index,
event database, NICA experiments, information system



                    Evgeny Alexandrov, Igor Alexandrov, Artyom Degtyarev, Irina Filozova, Konstantin
                                                    Gertsenberger, Peter Klimai, Alexander Yakovlev

                                                               Copyright © 2021 for this paper by its authors.
                      Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).








1. Introduction
         The aim of the NICA (Nuclotron-based Ion Collider fAcility) project is to study hot and dense
strongly interacting matter in heavy-ion collisions at center-of-mass energies of up to 11 GeV for gold
ions, and to study the spin structure and spin-related phenomena with protons at energies of up to
27 GeV [1]. The NICA complex is currently under construction and will include, in its final form, both
the fixed-target experiment, BM@N (Baryonic Matter at Nuclotron), and the collider experiments, MPD
(MultiPurpose Detector) and SPD (Spin Physics Detector). At present, experimental data from BM@N
collected since 2015 (about 500 million events overall), as well as simulated data of both BM@N and the
collider experiments, are available in the distributed storage of the NICA computing system.
         A large amount of data produced by particle collision experiments, including those of NICA,
must be stored, then processed and analyzed. The data stream from the detector electronics up to the
data storage of the NICA experiments is organized as a sequential data-driven pipeline [2], which
includes data acquisition, online processing, and offline processing stages on distributed computing
platforms. Physics analysis, which is a final stage of offline processing, is performed to obtain the
desired physics results of an experiment. Physics analysis is typically carried out for a subset of events
satisfying given criteria, such as events for a selected beam particle, or that lead to the production of
certain particles, etc. To avoid a full scan of all files with the collected event data, an event metadata
or indexing system is required.


2. Goals and requirements for the Event Metadata System
         The main goal of the developed system is to provide a searchable event catalogue with basic
information (metadata) on all reconstructed experimental collision events. Full experimental data are
contained in ROOT files of the experiments, while the Event Metadata System (EMS) stores only the
necessary event metadata, such as the number of primary tracks and of all tracks, the number of tracks
of positively (negatively) charged particles, the primary and secondary particles found in the event,
the number of hits in the detectors, the total input and output charge in the event, etc. The exact set
of stored summary event attributes (metadata) varies per experiment, so the system should be
configurable to store, in addition to a basic set, an arbitrary set of experiment-specific parameters per
event. In addition, the software version used to obtain event data and a reference to the storage
location are recorded for each event in the Event Catalogue.
         Additional features of the EMS are providing statistics on the collected data and checking the
quality of the catalogue of physics events. The requirements for the Event Metadata System include
scalability (several billion events per year are expected for the NICA experiments), performance (an
important characteristic is the processing time of one request), availability, fail safety, and role-based
access control. Access to metadata should be provided via several application programming and user
interfaces, including a Web service, a REST API (application programming interface), and ROOT C++
methods.


3. Event Metadata System architecture and its developed components
         The overall architecture of the Event Metadata System is shown in Fig. 1. The information
system is based on the Event Database, called the Event Catalogue, which contains specific information
on particle collision events, allowing the user to quickly search by various criteria for the set of events
required for a particular physics analysis. Each stored event is identified by a file pointer (a reference
to the data file) together with the event number in the file (a unique integer within a run). This pair
uniquely identifies the event and allows it to be found in the distributed file systems of the experiments.
The Catalogue has been built on top of the PostgreSQL database management system (DBMS). The
choice of the DBMS has been made for both performance and manageability reasons [3].






        Run metadata for the NICA experiments, such as period and run numbers, beam and target
particles, and beam energy, are stored in the developed Condition Database [4]. To improve data query
performance, run metadata are used to preselect the experimental run numbers matching given criteria
before a search in the Event Catalogue is performed.
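
        The two-stage selection described above can be sketched as follows. This is a minimal Python
illustration over in-memory lists, not the actual PostgreSQL-based implementation; the class names,
field names, and sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Run metadata, as it might be held in the Condition Database (hypothetical fields)."""
    period_number: int
    run_number: int
    beam_particle: str


@dataclass(frozen=True)
class Event:
    """Event metadata, as it might be held in the Event Catalogue (hypothetical fields)."""
    period_number: int
    run_number: int
    event_number: int
    track_number: int


def preselect_runs(runs, beam_particle):
    """Stage 1: pick the (period, run) pairs whose run metadata match the criteria."""
    return {(r.period_number, r.run_number) for r in runs if r.beam_particle == beam_particle}


def select_events(events, run_keys, min_tracks, max_tracks):
    """Stage 2: search only the events that belong to the preselected runs."""
    return [
        e for e in events
        if (e.period_number, e.run_number) in run_keys
        and min_tracks <= e.track_number <= max_tracks
    ]


# Illustrative data only; these are not real BM@N runs or events.
runs = [Run(7, 100, "Kr"), Run(7, 101, "Ar")]
events = [
    Event(7, 100, 1, 12),
    Event(7, 100, 2, 40),
    Event(7, 101, 1, 11),
]

krypton_runs = preselect_runs(runs, "Kr")
selected = select_events(events, krypton_runs, 10, 15)
```

Restricting the second stage to the preselected runs means that events from non-matching runs are
never compared against the event-level criteria.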
         The Web user interface (UI) and REST API service, discussed in detail in the next section, are
used to query the Event Metadata System for required events using various search criteria. The
developed interfaces provide not only event metadata: full event data, merged into ROOT files
containing a tree with the necessary events, can also be requested. A special Gathering Service is used
to solve this task. Another service, Dispatcher, is employed to transfer event metadata directly to the
workload management system of the experiments for further data processing or physics analysis on the
NICA computing platform.
        Authentication and authorization in the Event Metadata System have been implemented on the
basis of the FreeIPA system, which is also used for the other information systems of the experiment.
Each request to the EMS must be authenticated, and the available operations depend on the assigned
user role: Index Administrator (full access, including the deletion of possibly erroneous data), Index
Writer (can write new events, but not delete existing ones), Event Consumer (read-only user).
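
        The role model can be summarized as a small permission table. The sketch below is only an
illustration in Python: the role names come from the text, while the enum, the operation names, and the
`is_allowed` helper are hypothetical. It assumes that every role may read, which matches the GET
methods listed in Section 4.

```python
from enum import Enum


class Role(Enum):
    INDEX_ADMINISTRATOR = "Index Administrator"
    INDEX_WRITER = "Index Writer"
    EVENT_CONSUMER = "Event Consumer"


class Operation(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"


# Operations permitted for each role, following the role descriptions above.
PERMISSIONS = {
    Role.INDEX_ADMINISTRATOR: {Operation.READ, Operation.WRITE, Operation.DELETE},
    Role.INDEX_WRITER: {Operation.READ, Operation.WRITE},
    Role.EVENT_CONSUMER: {Operation.READ},
}


def is_allowed(role, operation):
    """Return True if the given role may perform the given operation."""
    return operation in PERMISSIONS[role]
```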
        It is imperative that the system is ready and available to handle user requests at any time, so a
monitoring service for the Event Metadata System has been implemented. The service is based on the
following software stack: Telegraf, InfluxDB, Grafana, as well as custom Python scripts.




                         Figure 1. Common architecture of the Event Metadata System







4. Implementation of the REST API and Web UI for the EMS
        Both the Web UI and REST API for the Event Metadata System have been implemented using
the Kotlin programming language, the Ktor framework [5], and the Java Virtual Machine runtime. For
ease of provisioning, the components of the EMS architecture are packaged in Docker containers.
        The configuration for the REST API and Web UI services is provided in YAML format,
specifying database connection parameters (for both the Event Catalogue and Condition database),
base URLs for the Web UI and REST API, and custom metadata fields (defined by their name and
type for each field).
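
        To illustrate how custom metadata fields defined by name and type could be used, the
following Python sketch checks an event's custom parameters against such definitions. It is an
assumption-laden illustration, not the EMS code: the type names ("int", "float", "string", "bool"), the
`validate_parameters` helper, and the sample field are all hypothetical.

```python
# Hypothetical mapping from type names in the configuration to Python types.
FIELD_TYPES = {"int": int, "float": float, "string": str, "bool": bool}


def validate_parameters(custom_fields, parameters):
    """Return a list of problems found in `parameters`.

    `custom_fields` maps a field name to its configured type name;
    `parameters` maps a field name to the value supplied for one event.
    """
    problems = []
    for name, value in parameters.items():
        if name not in custom_fields:
            problems.append(f"unknown field: {name}")
            continue
        type_name = custom_fields[name]
        # Exact type comparison, so that a bool is not accepted as an int.
        if type(value) is not FIELD_TYPES[type_name]:
            problems.append(f"{name}: expected {type_name}, got {type(value).__name__}")
    return problems


# Illustrative field definition, as it might appear after loading the configuration.
custom_fields = {"track_number": "int"}

ok = validate_parameters(custom_fields, {"track_number": 12})
bad = validate_parameters(custom_fields, {"track_number": "12", "vertex_z": 0.5})
```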
        The first version of the Web UI view (deployed for the BM@N experiment) is given in Fig. 2.
As the figure shows, base metadata (period number, run number, software version), Condition Database
fields (beam and target particles, beam energy), and experiment-specific custom event attributes (for
instance, the number of positively charged tracks) can be used to select the physics events required
for a particular analysis or data processing task.




                     Figure 2. First version of the Web UI for the Event Metadata System

         The developed REST API service employs JSON formatting illustrated in Fig. 3 to transfer
event metadata. The parameters object includes the configurable part of event metadata, while the
other attributes are the same for events of all the NICA experiments, regardless of configuration.








                        Figure 3. Example of an event metadata record in JSON format
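
        A record with this general structure, a fixed set of common attributes plus a configurable
parameters object, might be handled as in the Python sketch below. Apart from the parameters object,
all field names and values are assumptions for illustration and are not taken from Fig. 3.

```python
import json

# Hypothetical event metadata record: common attributes plus custom parameters.
record = {
    "reference": {"file": "/path/to/data_file.root", "event_number": 42},
    "period_number": 7,
    "run_number": 100,
    "software_version": "19.1",
    "parameters": {"track_number": 12},
}

# Serialize for transfer and parse it back, as a client or the service would.
encoded = json.dumps(record)
decoded = json.loads(encoded)

# Separate the experiment-independent part from the configurable part.
common = {key: value for key, value in decoded.items() if key != "parameters"}
custom = decoded["parameters"]
```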

       The implemented API methods allow performing the following operations on event
metadata:
    ●    GET /count/ to obtain only the number of events matching given criteria (available for any
         user role).
    ●    GET /emd/ to obtain metadata of the events matching given criteria (available for any user
         role).
    ●    POST /emd/ to add event metadata to the Event Catalogue (for the Index Writer and Index
         Administrator roles).
    ●    DELETE /emd/ to remove event metadata (for the Index Administrator role only).
    ●    GET /eventFile to obtain a ROOT file with physics events satisfying given criteria
         (synchronous operation).
    ●    GET /eventFileRef to obtain a reference to a ROOT file with physics events satisfying given
         criteria (asynchronous operation).
         The above REST API methods support the same filtering criteria that are available in the Web
UI. For example, to get event metadata for a given period number (7), software version (19.1), range
of track numbers (from 10 to 15), and beam particle (Kr), the user sends an HTTP GET request to the
following URL:
        /emd?period_number=7&software_version=19.1&track_number=10-15&beam_particle=Kr
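
        A client could assemble such a URL programmatically. The following Python sketch uses only
the standard library; the `build_emd_query` helper and the base URL are hypothetical, and the
"low-high" range encoding simply mirrors the example above.

```python
from urllib.parse import urlencode


def build_emd_query(base_url, **criteria):
    """Build a GET URL for the /emd method from keyword search criteria.

    A (low, high) tuple is encoded as the range "low-high"; other values
    are passed through as strings.
    """
    params = {}
    for name, value in criteria.items():
        if isinstance(value, tuple):
            low, high = value
            params[name] = f"{low}-{high}"
        else:
            params[name] = str(value)
    return f"{base_url}/emd?{urlencode(params)}"


# The base URL is a placeholder, not the address of a real EMS deployment.
url = build_emd_query(
    "https://ems.example.org",
    period_number=7,
    software_version="19.1",
    track_number=(10, 15),
    beam_particle="Kr",
)
```

Since each request to the EMS must be authenticated, a real client would also have to supply
credentials, which this sketch leaves out.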
       In addition, a specialized C++ interface has been developed to write new metadata to the
Event Catalogue while processing input data files with physics events in CERN ROOT format.


5. Conclusion
        The first version of the Event Metadata System for the NICA experiments has been developed,
and the information system is actively evolving to simplify user access and event metadata
management. The Event Catalogue has been built on PostgreSQL and stores event metadata of the
experiments so that only the events needed for a particular physics analysis can be selected. The Web
UI and REST API have been implemented for interaction with NICA collaboration members, as well as
with the other software systems of the experiments. All the services are available, and the Event Catalogue
has been deployed for experimental physics events of the BM@N experiment, the first experiment of
the NICA project. In addition, the Event Metadata System has been integrated with the Condition
Database developed for the NICA experiments.







6. Acknowledgement
       The work was supported by the Russian Foundation for Basic Research (RFBR), research
project No. 18-02-40125.


References
[1] M. Kapishin, Heavy Ion BM@N and MPD experiments at NICA // JPS Conf. Proc. 32, 010093,
2020.
[2] A. Baskakov, S. Bazylev, A. Fediunin and I. Filippov, MPD Data Acquisition System: Technical
Design Report // JINR, Dubna, 74 p., 2018.
[3] E. Alexandrov, I. Alexandrov, A. Degtyarev, et al., Design of the event metadata system for the
experiments at NICA // Physics of Particles and Nuclei Letters, Vol. 18, 603-616, 2021.
[4] K. Gertsenberger, I. Alexandrov, I. Filozova, et al., Development of Information Systems for
Online and Offline Data Processing in the NICA Experiments // Physics of Particles and Nuclei, Vol.
52, 801–807, 2021.
[5] Ktor: https://ktor.io/ (accessed 09.09.2021).



