Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 USAGE OF TIME SERIES DATABASES IN THE GRAFANA PLATFORM FOR THE NETIS SERVICE E.I. Alexandrov1,a, M. E. Pozo Astigarraga2, G. Avolio2 on behalf of the ATLAS Software and Computing Activity 1 Joint Institute for Nuclear Research, Joliot-Curie 6, RU-141980 Dubna, Russia 2 European Organization for Nuclear Research, CERN E-mail: a aleksand@jinr.ru NetIs is a service used to monitor the Data Acquisition network of the ATLAS experiment. The first version was developed at CERN in 2010. Over the years, the need to replace NetIs with an improved service emerged. Indeed, the effort to maintain NetIs has considerably increased together with the size and complexity of the network system; additionally, the Round Robin Database used to store the data results in a loss of granularity over time that makes the tool unsuitable for retrieving accurate values from the past. The graphs produced by NetIs are generated by the backend server and they are quite static, though the GUI is familiar to many users. The main idea was to exploit the recent advancements in time series databases and visualization tools like Grafana in order to present data to the users in a more dynamic way. The Persistent Back-End for the ATLAS Information System, developed in ATLAS for permanent storage of operational data, was already integrated with Grafana and successfully collecting network monitoring statistics. Grafana, despite being a very popular visualization web application, does not support some GUI elements that are used in NetIs such as a tree or position of drop-down. Javascript code integrated with Grafana was used to overcome these limitations. Keywords: ATLAS, network, monitoring, Grafana Evgeny Alexandrov, Mikel Eukeni Pozo Astigarraga, Giuseppe Avolio Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 326 Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 1. First version of the ATLAS network monitoring tool and motivation for upgrade The ATLAS experiment is one of four LHC accelerator experiments at CERN [1]. The computing and network infrastructure of the experiment consists of 4,020 computing nodes, 285 network switches and 14,778 switch interfaces (ports). The NetIs system was designed to monitor this network [2]. The first version of NetIs was developed in 2010 based on the cyclic time series Round Robin Database (RRD) [3]. Over the years, the need to replace NetIs with an improved service emerged. Indeed, the effort to maintain NetIs has considerably increased together with the size and complexity of the network system; additionally, the RRD used to store the data results in a loss of granularity over time that makes the tool unsuitable for retrieving accurate values from the past. Another inconvenience of using the first version is that graphs displayed by NetIs are generated by the server backend and they are quite static. This interface has a dynamic tree for selecting nodes, switches or interfaces and information panels as depicted in the image (Figure 1). A new Grafana- based monitoring version using the Persistent Back-End for the ATLAS information system (P- BEAST) [4] database has been developed in recent years. Figure 1. A screenshot of the legacy NetIs [2] 2. The common structure of the NetIs monitoring The common structure of the legacy NetIs monitoring had two data sources: the first for creating dashboards and the second for obtaining the network topology (Figure 2). In this version, the monitoring data was retrieved from switches using the Simple Network Management Protocol (SNMP) and was stored in a MySQL database to be later converted into RRD files by a separate service. The Matplotlib [5] library generated images from the RRD data sources and the Django framework [6] passed these images to the client. The second data source was used for navigation and it was implemented with a JavaScript tree object from the dhtmlxTree library [7]. The data source of the tree was the Central DB (CDB) that contains a description of the whole system. The new NetIS 327 Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 version has similar structure, but the monitoring data fetched with SNMP are placed in the P-BEAST. P-BEAST was developed in ATLAS for permanent storage of operational data, and it has already been integrated with Grafana [8]. Grafana was chosen as a new data visualization service, but it has some limitations in the GUI implementation that makes the navigation less intuitive than in the legacy version. The main problem is that Grafana doesn't support the visualization of hierarchical data structures. Figure 2. General structure of the NetIs Monitoring 3. The Graphics interface implementation of the Tree The tree is the main GUI navigator in NetIs, but as mentioned Grafana does not support the Tree element. It is possible to add a tree to Grafana in the following ways: create a custom plugin or inject JavaScript code dashboards using the Grafana text panel widget. Both approaches have their pros and cons. Creating a new panel is the main way to add new functionality to Grafana. This path requires the presence or acquisition of skills to improve Grafana itself. The plugin must be compiled as part of the Grafana developer workflow and installed on the server side. Within this path, to add any new function, for example, in our case we need to set the position for the dropdown menu, a new plugin is required. Unfortunately, this path does not guarantee compatibility with new versions. Sometimes some plugin does not work with new version. The most effective way to add a tree into Grafana is to add JavaScript code to the text panel. This approach does not require a Grafana developer workflow because all the code will be placed in the Grafana text panel. The tree is not an element of Grafana itself and cannot use the data retrieval methods. It should directly use the JavaScript objects of the Grafana library to interact with other Grafana elements. This method can be easily adopted for any HTML/JavaScript object, including a dropdown menu. The new version of NetIs uses JavaScript code in a text panel to implement the tree (Figure 3) and some other elements. The text panel is a basic element of Grafana. It supports HTML code and JavaScript inside it. This external JavaScript tree interacts with the just using the Grafana JavaScript code downloaded by the browser, which is included in the main HTML page. Using this library, the user can receive data from other panels and update dashboards. The NetIs uses tree as the navigation panel. 328 Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 Figure 3. The scheme of interaction of the external JavaScript tree with Grafana The initial data to build the tree for NetIs is stored in the CDB MySQL database. Grafana has a plugin for MySQL, but as already mentioned, the tree is not an element of Grafana so it cannot directly interact with a Grafana data source. Instead, the tree receives data using the Grafana template element. The template stores the data as a list of strings that needs to be created first out of the data retrieved from several tables in the MySQL database. The strings have the following structure: Function:Device::Linecard::Interface. Some devices do not have a linecard component, so in this case when converting data for such devices a special token, PORTS or LAG, needs to be generated. The NetIs uses standard edition of the dhtmlxTree library for the tree implementation in HTML. The Initial method gets all strings of the template, parses them and generates the tree. NetIs uses special ID of tree node for detection the level of tree and getting the data required for the dashboards. The ID has the following format: ● Root level (function in CDB): _RR_FunctionName ● Level 1 (device in CDB): DeviceId ● Level 2 (linecard in CDB): _LL_DeviceId:LinecardId ● Level 3 (interface in CDB): _II_DeviceId:LinecardId:InterfaceId Different levels of the NetIs tree have different context structures, with the exception of the tree area, which is always present in the upper left area. The tree uses the ID of selected node to generate content of the page using Grafana‘s JavaScript library. This approach allows the developers to make deeper changes in the structure of the monitoring page than those that can be done using standard Grafana tools. 4. View of NetIs monitoring Figure 4 presents the NetIs view for a device (level 1). It has two dashboards with aggregated data for all the interfaces belonging to the device and two mini dashboards with aggregated data for each linecard of selected device. The first dashboard can change the network metrics displayed (e.g. from Packet/s to Octets/s) using the dropdown button on top of it. The following types of metrics are available: Discards/s, Errors/s, Link load, Octets/s and Packets/s. The second dashboard show “packets detailed” data, which includes a breakdown of the traffic into Unicast, Broadcast and Multicast packets. For the mini dashboards the same set of metrics as for the uppermost dashboard is available with its own dropdown button. The second mini dashboard displays links speed and status. To the left of the miniplots there is a table summarizing the host and linecard information. 329 Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 Figure 4. NetIs view for linecard (level 2) The linecard view (level 2) has the same structure but uses the metrics gathered for the selected linecard and its interfaces. The interface view (level 3) does not display aggregated data (Figure 5), but instead it has a text area with additional information about the interface itself. Figure 5. The interface view 5. Conclusions The new version of NetIs was successfully implemented, tested and put into production. These services are based on Grafana and P-BEAST. The use the P-BEAST time-series database avoided the loss of data over time of the stored samples and does not degrade the resolution of the dashboards. Grafana makes the NetIs page navigation more dynamic and flexible. Maintaining the new system should be easier than before because only knowledge of Web and JavaScript technologies is required to support the NetIs service. The monitoring system will be evolved and updated following operational experience. 330 Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 References [1] ATLAS Collaboration 2008 The ATLAS Experiment at the CERN Large Hadron Collider, JINST 3 S08003 doi:10.1088/1748-0221/3/08/S08003 [2] D. Savu, A. Al-Shabibi, B. Martin, R. Sjoen, S. Batraneanu and S. Stancu 2010 Integrated System for Performance Monitoring of the ATLAS TDAQ Network, Journal of Physics: Conference Series. 331 052031 doi:10.1088/1742-6596/331/5/052031 [3] RRDtool: https://oss.oetiker.ch/rrdtool/index.en.html [4] A. Sicoe, G. Lehmann, Luca Magnoni, S. Kolos, I. Soloviev 2012 A persistent back-end for the ATLAS TDAQ online information service (P-BEAST), Journal of Physics: Conference Series 368 (2012) 012002 doi:10.1088/1742-6596/368/1/012002 [5] Matplotlib: https://matplotlib.org/ [6] Django: https://www.djangoproject.com/ [7] JavaScript Tree: https://dhtmlx.com/docs/products/dhtmlxTree/ [8] Grafana: https://grafana.com/ 331