Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021



ACCOUNTING AND MONITORING INFRASTRUCTURE FOR DISTRIBUTED COMPUTING IN THE ATLAS EXPERIMENT

A. Alekseev1,a, D. Barberis2, T. Beermann3
on behalf of the ATLAS Computing and Software activity

1 Institute for System Programming of the RAS, Russia
2 Università di Genova and INFN, Via Dodecaneso 33, I-16146 Genova, Italy
3 University of Wuppertal, Germany

E-mail: a Aleksandr.Alekseev@cern.ch


The ATLAS experiment uses various tools to monitor and analyze the metadata of the main
distributed computing applications. One of the tools is fully based on the Unified Monitoring
Infrastructure provided by the CERN-IT MonIT group. The Unified Monitoring Infrastructure uses
modern and efficient open-source solutions such as Kafka, InfluxDB, ElasticSearch, Kibana and
Grafana to collect, store and visualize metadata produced by data and workflow management systems.
This software stack is adapted for the ATLAS experiment and allows the development of dedicated
monitoring and accounting dashboards in the Grafana visualization environment. The current state of
the monitoring infrastructure and an overview of the core monitoring and accounting dashboards in
ATLAS are presented in this contribution.

Keywords: Scientific computing, BigData, Hadoop, InfluxDB, ElasticSearch, Grafana,
Kibana.



                                                Aleksandr Alekseev, Dario Barberis, Thomas Beermann

                                                             Copyright © 2021 for this paper by its authors.
                    Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).







1. Introduction
         During LHC Run 1 and Run 2 (2009-2018), the ATLAS Collaboration [1] used custom
frameworks to collect and display the monitoring and accounting data related to the worldwide
operation of its distributed computing system [2]. These frameworks and dashboards served well
during that time, but they started to show their age in several areas; in particular, data retrieval was
slow because of the large amount of data stored in Oracle databases. In 2016 the CERN-IT MonIT
group started developing a new unified infrastructure and environment for monitoring and accounting
applications based on a modern and efficient open-source software stack [3]. This stack is adapted for
the ATLAS experiment and allows the development of dedicated monitoring and accounting
applications in the Grafana [4] and Kibana [5] visualization environments [6].


2. The CERN MonIT Unified Monitoring Infrastructure
        The CERN MonIT Unified Monitoring Infrastructure (UMI) can receive data from many
sources of information through appropriate interfaces based on Apache Flume [7]. The data are fed
through a buffer and pipeline in Kafka [8]; this allows the execution of Spark [9] jobs to aggregate
and/or enrich the data, and each type of data is then stored in the appropriate technology:
ElasticSearch [10], the time-oriented database InfluxDB [11] for aggregated data, or directly in the
Hadoop file system HDFS [12] for web-based analyses and long-term archival. Several data display
frameworks can then be used to create different representations and dashboards: Kibana for the data
in ElasticSearch, Grafana for the data in InfluxDB or in ElasticSearch, and SWAN [13] for interactive
web-based analyses and displays of the data in HDFS. Figure 1 gives a graphical representation of the
data flow through the UMI.




  Figure 1. Graphical representation of the data flow through the CERN MonIT Unified Monitoring
                                             Infrastructure
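
        As a minimal sketch of the ingestion step of this flow, the example below pushes a JSON-encoded monitoring record into a Kafka topic with the kafka-python client. The broker address, topic name and record fields are illustrative assumptions, not the actual MonIT configuration.

import json
from kafka import KafkaProducer

# Broker address, topic name and record fields are illustrative assumptions.
producer = KafkaProducer(
    bootstrap_servers="monit-kafka.example.cern.ch:9092",
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

record = {
    "producer": "atlas_jobs",   # data source identifier (assumed)
    "timestamp": 1625486400,    # UNIX time of the measurement
    "site": "CERN-PROD",
    "jobstatus": "running",
    "njobs": 1234,
}

# Send asynchronously, then wait for the broker acknowledgement.
producer.send("atlas.jobs.raw", record).get(timeout=10)
producer.flush()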

         The ATLAS Distributed Computing (ADC) activities use the UMI to feed several different
dashboards, some of which take input information from different data sources and provide correlated
information to the users. The users are normally shifters who monitor the performance of the ATLAS
distributed computing components, activity managers who need numbers and figures for resource
accounting purposes, and Grid site managers who wish to check the performance of their sites for
ATLAS computing activities.







        There are three major groups of ATLAS dashboards:
    1. Job monitoring and accounting dashboards, showing the number of jobs pending, running or
       completed at each Grid site and their status, possible errors, CPU and wall-clock time etc.
    2. Data management monitoring and accounting dashboards, showing file transfer status, transfer
       data volumes by origin and destination, transfer results, data volumes at each site etc.
    3. Site status dashboards, showing the status of the Grid services at each site, the number and
       efficiency of the running or completed jobs, the ongoing data transfers, the data storage
       volume for each data category etc.
        The next sections describe some of these dashboards in more detail.


3. Job monitoring and accounting dashboards
        For the job dashboards, Flume JDBC extracts information from the jobs tables in the database
of the PanDA [14] workload management system every ten minutes, while pledge information is taken
from the WLCG REBUS [15] database and site topology information is taken from the ATLAS Grid
information system CRIC [16]. Spark jobs then process the data in the following steps: first additional
information is computed for each job (ADC activities, error messages, execution time), then the data is
enriched with topology information from CRIC, and finally the processed data is written back to
Kafka. From Kafka the aggregated data are stored in five separate ElasticSearch indices for the
different job statuses (submitted, pending, running, finalizing and completed). Aggregated
information about jobs is also kept in long-term HDFS storage. Figure 2 shows the data flow related to
the jobs dashboards in UMI.




           Figure 2. Graphical representation of the data flow related to the jobs dashboards
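
        A simplified sketch of this enrichment step is given below: job records are read, a derived field is added, CRIC topology information is joined in, and the result is written back out. The paths, column names and the activity rule are assumptions made for illustration; they do not reproduce the production Spark jobs.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("job-enrichment-sketch").getOrCreate()

# Raw job records and CRIC topology snapshots; paths and schemas are assumed.
jobs = spark.read.json("hdfs:///monit/raw/panda_jobs/")
topology = spark.read.json("hdfs:///monit/cric/site_topology/")

enriched = (
    jobs
    # Derive a coarse ADC activity label from a PanDA job attribute (assumed rule).
    .withColumn(
        "adc_activity",
        F.when(F.col("prodsourcelabel") == "user", "Analysis")
         .otherwise("Production"),
    )
    # Attach cloud, tier and federation information from CRIC.
    .join(topology, on="computingsite", how="left")
)

# Write the enriched records back for the Kafka -> ElasticSearch/HDFS stage.
enriched.write.mode("overwrite").json("hdfs:///monit/enriched/jobs/")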

         The Job Accounting dashboard shows aggregated information about Grid and Tier-0 jobs run
since the start of data-taking in 2010. Eighty-seven histograms and other charts can be displayed for
selected time ranges, with additional selections based on job types, execution sites, input data types,
status etc. It is used by shifters, experts and management to spot problems with the operation of the
workflow management system and the Grid sites. Different display styles can be selected, and the
binning can be set to one hour (the default aggregation) or one, seven or thirty days.
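
        The selectable binning corresponds to a date-histogram aggregation over the completed-jobs data. The sketch below shows the kind of query involved, using the elasticsearch Python client; the endpoint, index and field names are assumptions for illustration.

from elasticsearch import Elasticsearch

# Endpoint, index and field names are illustrative assumptions.
es = Elasticsearch("https://es-atlas.example.cern.ch:9203")

query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-30d"}}},
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "1d"},
            "aggs": {"by_activity": {"terms": {"field": "adc_activity"}}},
        }
    },
}

result = es.search(index="atlas_jobs_completed", body=query)
for day in result["aggregations"]["per_day"]["buckets"]:
    print(day["key_as_string"], day["doc_count"])
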
         The Job Monitoring dashboard provides extended information about completed jobs for the
last two months. It is useful to monitor individual jobs, tasks or requests. The data are processed in the
same way as for the Job Accounting dashboard; the aggregated data are written from Kafka into a
dedicated ElasticSearch index for completed jobs. The binning can be set to 10 minutes, 30 minutes,
one, six or 12 hours, or one, seven, 14 or 30 days.
       The HS06 Report dashboard is used by management to generate resource usage reports, where
the CPU consumption is shown in HepSpec2006 (HS06) normalized units [17]. The dashboard uses
aggregated data from the ElasticSearch index for completed jobs (as in the Job Accounting
dashboard), splitting the information into separate tables for computing sites, federations and tiers.
The data from the tables can be exported to CSV reports using the Grafana API and Python scripts.
Figure 3 shows some
examples of plots from the Job Accounting dashboard: the number of CPU cores used by running jobs
in HS06 units, split by ADC activity, and the CPU usage efficiency based on successful and all
completed jobs grouped by site.




Figure 3. Examples of data representations provided by the job monitoring and accounting dashboards
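
        The production export scripts talk to the Grafana HTTP API; as a simplified sketch, the example below instead aggregates the same completed-jobs index directly and writes a per-site HS06 report as CSV. The endpoint, index, field names and time range are assumptions for illustration.

import csv
from elasticsearch import Elasticsearch

# Endpoint, index and field names are illustrative assumptions.
es = Elasticsearch("https://es-atlas.example.cern.ch:9203")

query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "2021-01-01", "lt": "2021-07-01"}}},
    "aggs": {
        "by_site": {
            "terms": {"field": "computingsite", "size": 500},
            # Sum of HS06-normalized CPU seconds consumed by completed jobs.
            "aggs": {"hs06_sec": {"sum": {"field": "hs06sec"}}},
        }
    },
}

result = es.search(index="atlas_jobs_completed", body=query)

with open("hs06_report.csv", "w", newline="") as report:
    writer = csv.writer(report)
    writer.writerow(["site", "hs06_hours"])
    for site in result["aggregations"]["by_site"]["buckets"]:
        writer.writerow([site["key"], site["hs06_sec"]["value"] / 3600.0])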

4. Data monitoring and accounting dashboards
       The MonIT infrastructure collects events and traces sent by the data management system
Rucio [18] and injects them into a Kafka pipeline; furthermore, the infrastructure also regularly reads
topology information from CRIC and adds it to Kafka.
         The Spark jobs that process the data consist of the following steps: first the events and traces
are formatted into a common structure, then they are enriched with topology information from CRIC,
and finally everything is aggregated in one-minute bins by source/destination, activity, cloud,
federation, country, tier, etc., and the numbers of failed and successful transfers are counted.
These data are written back to Kafka and then stored in ElasticSearch, InfluxDB and/or HDFS. Figure
4 shows the data flow related to the DDM dashboards in UMI.








     Figure 4. Graphical representation of the data flow related to the Data Management (DDM)
                                              dashboards
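
        A simplified sketch of this one-minute aggregation step is shown below: transfer events are grouped into one-minute windows per source, destination and activity, and the successful and failed transfers are counted. The paths, column names and state values are assumptions for illustration.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ddm-aggregation-sketch").getOrCreate()

# Formatted and CRIC-enriched transfer events; the path and schema are assumed.
events = spark.read.json("hdfs:///monit/enriched/rucio_events/")

aggregated = (
    events
    .groupBy(
        # One-minute bins; cast in case the timestamp is read as a string.
        F.window(F.col("event_time").cast("timestamp"), "1 minute"),
        "src_site", "dst_site", "activity",
    )
    .agg(
        F.count(F.when(F.col("state") == "DONE", True)).alias("transfers_ok"),
        F.count(F.when(F.col("state") == "FAILED", True)).alias("transfers_failed"),
        F.sum("bytes").alias("volume_bytes"),
    )
)

# Written back to Kafka in production; stored on HDFS in this sketch.
aggregated.write.mode("overwrite").json("hdfs:///monit/aggregated/transfers/")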

          There are two Global Accounting dashboards. The “historical” dashboard is based on datasets.
For each dataset, it counts how many file replicas are available split by primary, secondary, tier,
disk/tape and dataset metadata, and shows the evolution of replicas over time. It includes 13 plots, 6
filters and 5 options to group the data.
      The “snapshot” dashboard includes the same data as the historical dashboard but displays the
numbers for the last week.
        The Site Accounting dashboard is based on information provided by the Rucio Storage
Elements (RSEs) and displays information useful for central operations and for local site
administrators. It counts the files and datasets for each RSE and can display the numbers of files or
datasets and their volume according to selections set by the users.
        The Data Transfer dashboards are mainly used by shifters and experts to monitor the
worldwide transfers on the Grid and spot problems. The main dashboard has a granularity of 1 minute
for the last 30 days; it uses both InfluxDB and ElasticSearch at the same time, with time-series
information in the plots coming from InfluxDB and the transfer details coming from ElasticSearch. It
includes 35 plots with 18 filters and 17 options to group data. The binning can be set to 10 minutes, 1
hour, 6 hours, 12 hours or 1 day.
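
        The time-series part can be retrieved from InfluxDB with a query such as the one sketched below, using the influxdb Python client; the host, database, measurement and field names are assumptions for illustration.

from influxdb import InfluxDBClient

# Host, database, measurement and field names are illustrative assumptions.
client = InfluxDBClient(
    host="influx-atlas.example.cern.ch",
    port=8086,
    database="monit_ddm",
)

# Hourly transferred volume over the last day, as plotted in the dashboard.
query = (
    'SELECT sum("bytes") AS volume FROM "transfers" '
    "WHERE time > now() - 1d GROUP BY time(1h)"
)

for point in client.query(query).get_points():
    print(point["time"], point["volume"])
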
        The Historical Transfer dashboard has a minimum granularity of 1 hour for up to 5 years, and
displays data in 39 plots with 16 filters and 23 options to group data. Some examples of data
representations from data management dashboards are shown in Figure 5: the data volume per data
class plot from the Data Accounting dashboard and the data transfer efficiency by site from the Data
Transfer dashboard.








Figure 5. Examples of data representations provided by DDM monitoring and accounting dashboards


5. Site monitoring dashboards
         Other dashboards have been developed to provide comprehensive and coherent views of each
site status to local system administrators and managers, as well as to ATLAS central operators and
shifters. The Site-oriented dashboard combines information from the Job Accounting and Data
Transfer dashboards to allow monitoring and analysis of the efficiency of the computing sites; it
contains 8 plots (6 from the Job and 2 from the Data dashboards), 5 filters and 55 options to group
data. Figure 6 shows some examples of the plots from this dashboard: the transfer efficiency by
activity and the efficiency based on successful and all completed jobs grouped by activity.








                 Figure 6. Some of the plots provided by the Site-oriented dashboard

         The Site Status Board groups information provided by ATLAS tools (PanDA and Rucio) as
well as by tests run regularly by the WLCG team to check the status and response times of the Grid
sites with simple remote commands [19]. It includes several dashboards displaying
different data types, all containing links to the primary source of information. Figure 7 shows the
information from the Site Status Board dashboard.




                           Figure 7. The entry page of the Site Status Board
6. Conclusions
         Since 2016 the CERN-IT MonIT group has been developing a new Unified Monitoring
Infrastructure (UMI). The infrastructure uses modern and efficient open-source solutions such as
Kafka, InfluxDB, ElasticSearch, Kibana and Grafana to collect, store and visualize metadata produced
by data (Rucio) and workflow management (PanDA) systems. This software stack is adapted for the
ATLAS experiment and allows the development of dedicated monitoring and accounting dashboards
in the Grafana visualization environment. ATLAS currently has over a dozen production dashboards
for job, DDM and site monitoring; other dashboards relate to the status of central computing
servers. All parts of the monitoring based on UMI are constantly being improved and adapted to
rapidly evolving technologies.
7. Acknowledgement
       This work was partially funded by the Russian Science Foundation under contract
No. 19-71-30008 (the research was conducted at the Plekhanov Russian University of Economics).

References
[1] ATLAS Collaboration 2008 The ATLAS Experiment at the CERN Large Hadron Collider, JINST
3 S08003 doi:10.1088/1748-0221/3/08/S08003
[2] Sargsyan L et al. 2012 ATLAS job monitoring in the Dashboard Framework, J. Phys. Conf. Ser.
396 032094 doi:10.1088/1742-6596/396/3/032094
[3] Aimar A et al. 2019 MONIT: Monitoring the CERN Data Centres and the WLCG Infrastructure,
EPJ Web Conf. 214 08031 doi:10.1051/epjconf/201921408031
[4] Grafana: https://grafana.com/
[5] Kibana: https://www.elastic.co/kibana
[6] Beermann T et al. 2020 Implementation of ATLAS Distributed Computing monitoring dashboards
using InfluxDB and Grafana, EPJ Web Conf. 245 03031 doi:10.1051/epjconf/202024503031
[7] Apache Flume: https://flume.apache.org/
[8] Apache Kafka: https://kafka.apache.org/
[9] Apache Spark: https://spark.apache.org/
[10] ElasticSearch: https://www.elastic.co/elasticsearch
[11] InfluxDB: https://www.influxdata.com/
[12] Hadoop: http://hadoop.apache.org/
[13] CERN SWAN service: https://swan.web.cern.ch/
[14] Barreiro Megino F et al. 2016 PanDA: Exascale Federation of Resources for the ATLAS
Experiment at the LHC, EPJ Web Conf. 108 01001 doi:10.1051/epjconf/201610801001
[15] WLCG REBUS: http://wlcg-rebus.cern.ch/
[16] ATLAS CRIC: https://atlas-cric.cern.ch/
[17] HepSpec2006: http://www.spec.org/cpu2006
[18] Barisits M et al. 2019 Rucio: Scientific Data Management, Computing and Software for Big
Science 3:11 doi:10.1007/s41781-019-0026-3
[19] WLCG ETF: http://etf.cern.ch/docs/latest/user/overview.html



