Development of a technology for collecting and analyzing data for monitoring based on an ontological approach

Yuri I. Molorodov1,2, Oleg V. Kasatkin2

1 Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia
2 Novosibirsk State University, Novosibirsk, Russia

Abstract
One of the ways to build information models is ontological modeling. The use of ontologies greatly facilitates data exchange between the embedded models and utility programs of a digital representation of a real-world object or system, sometimes called a “digital twin” (DT). It is also important to establish a correspondence between the DT and people or external programs. Based on a dictionary of the main terms, classes and objects of the subject area and the relations between them, we have built an ontology of the DT of a hydroelectric dam.

Keywords
Data collection and analysis system, monitoring of the technical condition of hydroelectric dams, dynamic characteristics, processing of seismometric monitoring data, digital twin ontology of hydroelectric dams.

1. Introduction
A digital twin is a software analogue of a physical device that simulates the internal processes, technical characteristics and behavior of a real object under environmental influences. The concept of digital twins implies connecting the physical and digital worlds through the interaction of information models. In other words, a mathematical model is created for a physical object, a piece of equipment or an entire process and is then used to analyze the behavior of the object. Moreover, the digital model is continuously updated so that it fully corresponds to the current state of the real object. This makes it possible to detect unexpected changes in processes, optimize the operating modes of equipment, and prevent breakdowns and accidents, which ultimately significantly increases the reliability and efficiency of operation.
According to regulatory documents, hydroelectric power plants are designed for at least 100 years of operation, but with proper maintenance the actual service life can be significantly longer and can reach several hundred years [8]. The main equipment and structures naturally wear out and require major repairs, modernization and replacement. The entire life cycle of a hydroelectric power plant is a series of continual updates, upgrades and reconstructions associated with replacing automation systems or main or auxiliary equipment. The use of a digital twin of a hydroelectric dam will significantly increase operational efficiency and help determine the most appropriate time for maintenance.

SDM-2021: All-Russian conference, August 24–27, 2021, Novosibirsk, Russia
yumo@ict.sbras.ru (Y. I. Molorodov); o.kasatkin@g.nsu.ru (O. V. Kasatkin)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Yuri I. Molorodov et al. CEUR Workshop Proceedings 212–221

2. The use of a digital twin
The digital twin uses information from sensors installed on the real object to update its current digital copy. It also uses this information to monitor the state of the object, analyze its residual service life and predict defects. To build a digital twin of the dam, an information model must be built first. One of the ways to build information models is ontological modeling. The use of ontologies greatly facilitates data exchange between the embedded models and utility programs of the digital twin. It also facilitates data exchange between the digital twin and people or external programs.

There are many definitions of ontology, with varying degrees of formalization. The definition is often based on the methods used to construct the ontology.
Nevertheless, despite the abundance of approaches to the definition, the same set of components is usually used:
1) classes or concepts, which are usually interpreted quite broadly and can include entities of any kind;
2) instances, i.e. individual entities whose totality forms the classes;
3) attributes: specific internal properties of classes and instances;
4) relations between the classes of the subject area, for example, the taxonomic relation;
5) axioms, or inference rules, which define statements that are always true and serve, for example, to check information for correctness.

Figure 1: Ontological description of the digital twin of the dam.

According to [6], the ontology is formally defined as

$$ \mathit{OR} = \langle C, R, A, T, D \rangle, $$

where $C = \{C_1, \dots, C_n\}$ is the set of domain classes; $R = \{R_1, \dots, R_m\}$, $R_i \subseteq C \times C$, is the set of relations defined on the classes of the subject area; $A = \{a_1, \dots, a_w\}$ is the set of attributes describing the properties of the concepts $C$ and the relations $R$; $T$ is the set of standard attribute values; and $D = \{d_1, \dots, d_n\}$ is the set of domains.

To build the ontology of the digital twin of a hydroelectric dam, we use the CmapTools program. Based on [2, 3, 12] and following the guidelines of the methodological manual [5], we constructed the ontology shown in Figure 1. Figure 1 shows that the model of a digital twin of a hydroelectric dam requires constant updating of the simulation models. It also requires accumulating data obtained from the real object in order to monitor the object's current technical condition and predict its remaining service life. Solving these problems requires an information system for collecting and processing data. Together with the calculation and information models, this system can become the central link in the construction of the dam's digital twin.

3. Functionality and structure of the system

In [1], a method for assessing the technical condition of buildings and structures is proposed. The method analyzes changes in dynamic characteristics that are determined under the microseismic background of natural and man-made origin, i.e. without special excitation sources. These characteristics include natural frequencies and mode shapes, attenuation decrements and statistical characteristics. They manifest themselves under dynamic loads and reflect the technical condition of the structure as a whole. By analyzing trends in the dynamic characteristics and accounting for seasonal environmental factors, irreversible changes in these characteristics can be detected and the current technical condition of the entire structure can be monitored.

To monitor the state of the object, this work adopts periodic (scheduled) registration of dam vibrations. The vibrations are caused by the microseismic background and by the dynamic loads from the equipment operating at the HPP. An analysis of regulatory documents identified the following main functional requirements for the system:
1) scheduled, periodic registration of microseismic vibrations of the dam;
2) storage of seismometric monitoring records and of information about the functioning of the system;
3) data analysis using various information processing algorithms, and calculation of the studied parameters and characteristics;
4) logging of error messages, and notification of the user when observed values exceed their permissible limits or when problems occur in the system;
5) visualization of the analysis results.

Figure 2: System functionality.
Figure 3: The basic structure of the system for collecting, storing, analyzing and visualizing data arrays.
The overall operating procedure and functionality of the system under development are shown in Figure 2. The technical condition of hydraulic structures is assessed according to the criteria adopted in the current building codes and regulations for the safe operation of buildings and structures. These criteria are based on the theory of limit states. In the developed system, the safety criteria for hydraulic structures in terms of seismometric monitoring are to be established as functional dependencies between the dynamic characteristics and the parameters of external influences and loads. These dependencies are determined from a statistical analysis of the data collected at the accumulation stage.

Structurally, the system consists of interrelated subsystems for data collection, storage of the received data arrays, information processing and analysis, and data visualization. Figure 3 shows the basic structure of the system for collecting, storing, analyzing and visualizing data arrays. Decomposing the system into functional modules with well-defined data flows between them can significantly reduce the complexity of the system and of its development. It can also increase the versatility of the modules.

3.1. Development of a data processing and analysis module

The data processing and analysis module was written in C++ in the Microsoft Visual Studio 2019 development environment using the Standard Template Library (STL).

3.2. Input data format

As input, this paper considers the CIBF seismometric monitoring file format. This format is used by the automated earthquake registration and dam technical condition monitoring system (PAK-MZ) of the Krasnoyarsk HPP.
This system periodically registers stationary micro-vibrations of the dam at ten observation points in three directions: longitudinal, transverse and vertical. CIBF is a format developed specifically for seismometric monitoring data. The file name contains the start time of data registration (the time of the first sample) in the format dd_mm_yyyy_hh_mm_ss.cibf. All data are stored in little-endian byte order (the least significant byte comes first).

A CIBF file consists of data packets, and each packet holds data for all channels. Each packet contains a 32-byte header followed by the data. The header includes the serial number of the packet (used to monitor data transmission), the parameters of the registration profile and the length of the data. The header structure is shown in Table 1. The SamplesResived field gives the number of samples in the data that follow the header, so the headers can be used to iterate over the packets of the file.

The data are a sequence of samples from the channels. Each sample consists of two bytes of data proper (the ADC code) and two bytes of service information (Table 2). The bit fields F1, F2 and F3 indicate channel error, negative saturation and positive saturation, respectively. If bit 8 (F1) is 0, a channel error has occurred. If bit 9 (F2) is not 0, negative saturation has occurred. If bit 10 (F3) is not 0, positive saturation has occurred. The low byte holds a cyclic code in the range from 0 to 254.

Table 1
Packet header structure.
Byte  Field name      Type                Length  Comment
0     Command         word                2       1: test, 2: data, 3: service
2     SamplesResived  word                2       Number of samples
4     DateTime        TDateTime (double)  8       Start time
12    PackedSize      integer             4       Size of the data packet after the header
16    PackedNmb       integer             4       Packet number
20    SendDateTime    TDateTime (double)  8       Time of sending (possibly empty; reserved)
28    Reserved0       word                2       0
30    Reserved1       word                2       0

Table 2
Format of the service information word ctrl.
Bit:    15 14 13 12 11 | 10 | 9  | 8  | 7 6 5 4 3 2 1 0
Field:  Channel        | F3 | F2 | F1 | Cyclic code

3.3. Data extraction and processing

To read CIBF files, we use the facilities of the fstream header file. We create an object of the std::ifstream class and associate it with the CIBF file to be read (hereinafter simply “the file”). The data must be read in binary form, so the file is opened in binary mode. If the specified file cannot be opened, an exception of type std::exception is thrown.

Recall that the smallest addressable unit of the computer's RAM is a byte. First, we determine the exact length of the file in bytes. The seekg function moves the file pointer to the end of the file, and the tellg function returns its position (in bytes). Then seekg is called again to return the pointer to the beginning of the file for further work.

Note that the central processor uses a machine word as its main unit of memory access, and the size of this word depends on the processor architecture. When an arbitrary object is stored in memory, a field consisting of several bytes may cross the “natural boundary” of machine words. Some processors cannot access such data at all, and others take longer to access it than data located within a single machine word.
Therefore, some C++ compilers apply automatic “data alignment”: when a data structure is placed in memory, they insert padding bytes between its fields to speed up access to the structure. For convenient reading, we describe the data packets using structures (the C++ struct user-defined type). Each packet is a structure whose members are other structures: the packet header and an array of samples. When describing the structures, we use the #pragma pack(push, 1) and #pragma pack(pop) preprocessor directives to disable data alignment. As a result, each structure's memory layout matches the byte layout of the file.

We create an integer variable, initialized to zero, that tracks the current position (in bytes) from the beginning of the file. Hereinafter, “reading” means byte-by-byte extraction of a structure into a variable of the corresponding type using the read function. Data packets are read in the body of the main while loop. The loop terminates when the current position reaches or exceeds the file length. The current position is incremented after each call to read.

In the body of the main loop, we first read the packet header. The header of each packet begins with a number that characterizes the contents of the packet, and we use this number to control reading. If the number is 2, we determine how many samples the packet contains and read them into a buffer. Otherwise, we notify the user that the packet contains service or test information. We then process the received data in a loop over the samples read. First, we determine from the ctrl field the number of the channel on which the sample was received.
Next, the value $N$ obtained from the ADC must be converted to an acceleration amplitude:

$$ A = \frac{20N}{2^{16}} \cdot \frac{1}{K} \cdot \frac{1}{K_{EMS}}, $$

where $A$ is the acceleration amplitude, $K$ is the total signal gain and $K_{EMS}$ is the electromechanical coupling coefficient. The $K_{EMS}$ of a sensor determines its sensitivity. At the Krasnoyarsk HPP dam, $K_{EMS} = 2$ V·s²/m. The total signal gain is the product of the preliminary gain $K_1$ and the main gain $K_2$. The values of the gain $K$ are shown in Table 3.

Table 3
Gain factors K.
Channel  Pavilion, mark, section, component  Gain K = K1 × K2
1k       PV.1, mark.223, sec.Ba, Z           100 × 4 = 400
2k       PV.1, mark.223, sec.Ba, Y           100 × 4 = 400
3k       PV.1, mark.223, sec.Ba, X           100 × 4 = 400
4k       PV.2, mark.223, sec.8, Z            100 × 2 = 200
5k       PV.2, mark.223, sec.8, Y            100 × 2 = 200
6k       PV.2, mark.223, sec.8, X            100 × 1 = 100
7k       PV.3, mark.132, sec.8, Z            100 × 4 = 400
8k       PV.3, mark.132, sec.8, Y            100 × 4 = 400
9k       PV.3, mark.132, sec.8, X            100 × 4 = 400
10k      PV.4, mark.223, sec.22, Z           100 × 1 = 100
11k      PV.4, mark.223, sec.22, Y           100 × 1 = 100
12k      PV.4, mark.223, sec.22, X           100 × 1 = 100
13k      PV.5, mark.132, sec.22, Z           100 × 4 = 400
14k      PV.5, mark.132, sec.22, Y           100 × 4 = 400
15k      PV.5, mark.132, sec.22, X           100 × 4 = 400
16k      Reserved
17k      PV.6, mark.244, sec.37, Z           100 × 1 = 100
18k      PV.6, mark.244, sec.37, Y           100 × 1 = 100
19k      PV.6, mark.244, sec.37, X           10 × 4 = 40
20k      PV.7, mark.132, sec.37, Z           100 × 2 = 200
21k      PV.7, mark.132, sec.37, Y           100 × 2 = 200
22k      PV.7, mark.132, sec.37, X           100 × 2 = 200
23k      PV.8, mark.223, sec.54, Z           100 × 1 = 100
24k      PV.8, mark.223, sec.54, Y           100 × 1 = 100
25k      PV.8, mark.223, sec.54, X           10 × 4 = 40
26k      PV.9, mark.132, sec.54, Z           100 × 2 = 200
27k      PV.9, mark.132, sec.54, Y           100 × 2 = 200
28k      PV.9, mark.132, sec.54, X           100 × 2 = 200
29k      PV.10, mark.223, sec.71, Z          100 × 4 = 400
30k      PV.10, mark.223, sec.71, Y          100 × 4 = 400
31k      PV.10, mark.223, sec.71, X          100 × 4 = 400
32k      Reserved
The obtained acceleration amplitudes are distributed over the channels using std::vector containers.

3.4. Presentation and analysis of the results of the module

The implemented module was tested on CIBF files. A specialist of the PAK-MZ system of the Krasnoyarsk HPP confirmed that the module works correctly and that the results obtained are correct. The statistical characteristics of the oscillations are calculated from the obtained acceleration amplitudes. Table 4 shows an example of the statistical characteristics calculated for one of the monitoring sessions.

Table 4
Statistics for the session 03/04/2011/19:00.
Channel No.  Coordinate  Average, mm/s²  Variance, mm²/s⁴  Frequency, Hz
1            Z           −0.02           0.01              4.63
2            Y           −0.01           0.01              4.87
3            X           −0.02           0.04              4.33
4            Z           −0.04           0.16              4.73
5            Y           −0.03           0.09              5.81
6            X           −0.08           4.84              4.32
7            Z           −0.02           0.01              4.5
8            Y           −0.01           0.01              4.89
9            X           −0.02           0.04              4.12
10           Z           −0.09           0.11              4.71
11           Y           −0.07           0.1               5.2
12           X           −0.06           0.63              4.31
13           Z           −0.02           0.04              4.53
14           Y           −0.02           0.02              4.03
15           X           −0.02           0.14              4.36
17           Z           −0.08           0.12              4.72
18           Y           −0.06           0.13              6.02
19           X           −0.12           3.9               4.38
20           Z           −0.04           0.09              4.22
21           Y           −0.03           0.07              5.11
22           X           −0.03           0.12              4.6
23           Z           −0.09           0.12              6.17
24           Y           −0.09           0.13              4.97
25           X           −0.2            3.24              4.53
26           Z           −0.03           0.05              4.81
27           Y           0.07            0.03              5.54
28           X           −0.05           0.1               4.34
29           Z           −0.02           0.01              7.31
30           Y           −0.03           0.02              5.98
31           X           −0.02           0.04              4.87

One way to visualize how the acceleration amplitudes change over time is to plot seismic traces. A seismic trace is a graph of the level of a seismic wave or noise signal versus the time of its registration. We use the Gnuplot plotting program. Building a seismic trace requires the duration of vibration registration (512 seconds) and the number of recorded samples, which is the length of the std::vector object that holds the data after extraction. Time in seconds is plotted on the horizontal axis and the acceleration amplitudes (mm/s²) on the vertical axis. The resulting graph visualizes the oscillation of the dam over time at the given observation point and in the given direction. An example of a seismic trace is shown in Figure 4.

Figure 4: Seismic trace of the 2nd channel of the session 03/04/2011/07:00.

4. Conclusion
The subject area was investigated on the basis of the regulatory documents governing the maintenance of hydroelectric dams. An information system for collecting and analyzing seismometric monitoring data was then developed. The design of the information system is based on an ontological approach, which was used to construct the ontology of the digital twin of a hydroelectric dam. The main functional capabilities of the information system are formulated and its structure is described. The data processing and analysis module is implemented in C++ and has been tested on CIBF-format seismometric monitoring data packets.

References
[1] Baryshev V.G., Kuzmenko A.P., Saburov V.S. et al. Dynamic test inspection of dams under the influence of operational dynamic loads // Hydrotechnical Construction. 2002. No. 10. P. 26–36. (In Russ.)
[2] Borovkov A.I., Ryabov Yu.A. Digital twins: Definition, approaches and methods development / A.V. Babkin (Ed.) // Digital Transformation of Economy and Industry: Proceedings of the Scientific-Practical Conference. June 20–22, 2019. SPb.: Polytechnic Press, 2019. P. 234–245. (In Russ.)
[3] Borovkov A.I., Ryabov Yu.A., Maruseva V.M. A new paradigm of digital design and modeling of globally competitive products of a new generation // Digital Production: Methods, Ecosystems, Technologies. MSU Skolkovo, 2018. P. 24–44. (In Russ.)
[4] Bulletin of RusHydro: Science and Technology. The future digital everyday life of RusHydro. Available at: https://vestnik-rushydro.ru/articles/2-fevral-2019/nauka-i-tekhnologii/budushchie-tsifrovye-budni-rusgidro (In Russ.)
[5] Muromtsev D.I. Conceptual modeling of knowledge in the Concept Map system. St. Petersburg: SPb GU ITMO, 2009. 83 p. (In Russ.)
[6] Zagorulko Yu.A., Sidorova E.A., Borovikova O.I. Ontological approach to the construction of information support systems for scientific and industrial activities // Materials of the All-Russian Conference with International Participation “Knowledge — Ontology — Theory” (ZONT-09). Novosibirsk, 2009. Vol. 2. P. 93–102. (In Russ.)
[7] Korolenko D.B., Kuzmenko A.P., Moskvichev V.V., Saburov V.S. Information system of seismometric monitoring of the technical condition of hydraulic structures: Experience of modeling, development and implementation // Computational Technologies. 2019. Vol. 24. No. 5. P. 13–37. DOI:10.25743/ICT.2019.24.5.003. (In Russ.)
[8] How many years can hydroelectric power plants last? Available at: http://blog.rushydro.ru/?p=9950 (accessed April 06, 2021). (In Russ.)
[9] STO 70238424.27.140.003-2010. Hydrotechnical structures of hydroelectric power plants and hydroelectric power plants. Organization of operation and maintenance. Norms and requirements. Introduced 2010-09-30. Moscow: NP “Invel”, 2010. 15 p. (In Russ.)
[10] STO 70238424.27.140.035-2009. Hydroelectric power stations. Monitoring and evaluation of technical structures during operation. Norms and requirements. Introduced 2009-12-31. Moscow: NP “Invel”, 2009. 59 p. (In Russ.)
[11] STO RusHydro 02.01.80-2012. Hydrotechnical structures of hydroelectric power plants and hydroelectric power plants. Operating rules. Norms and requirements. Introduced 2012-10-29. Moscow: JSC “RusHydro”, 2012. 181 p. (In Russ.)
[12] Parrott A., Warshaw L. Industry 4.0 and the digital twin technology. Deloitte Insights (12-05-2017). Manufacturing meets its match. Available at: https://www2.deloitte.com/us/en/insights/focus/industry-4-0/digital-twin-technology-smart-factory.html