Finding Anomalies in the Operation of Automated Control Systems Using Machine Learning

Yurii Hodlevskyi a, Tetiana Vakaliuk b,c,d, Oleksii Chyzhmotria b, Olena Chyzhmotria b and Oleh Vlasenko b

a Infopulse Ukraine, 13a, Trypilska Street, Zhytomyr, 10003, Ukraine
b Zhytomyr Polytechnic State University, 103 Chudnivska Str., Zhytomyr, 10005, Ukraine
c Institute for Digitalisation of Education of the NAES of Ukraine, 9 M. Berlynskoho Str., Kyiv, 04060, Ukraine
d Kryvyi Rih State Pedagogical University, 54 Gagarin Ave., Kryvyi Rih, 50086, Ukraine

Abstract
This article deals with the problem of detecting anomalies in the operation of automated systems. Anomaly detection helps prevent breakdowns and improve the performance of automated systems. Sensors in automated systems make it possible to read information about the state of certain system parameters, which in turn helps to monitor the state of the system at any moment. But simply watching system indicators is far from optimal, as human resources are wasted. It is also possible to set limit values for the sensors and show a message whenever an indicator goes beyond them. Still, not all graphs suit this solution: anomalous values can occur within the limits and be ignored by such a system, or a sensor's operating range can shift, in which case operators will receive a large number of false messages. Instead, it is possible to implement a system that detects anomalies automatically using artificial intelligence, which learns from existing historical data and notifies the operator of a malfunction.

Keywords
Machine Learning, Gradient Descent, Learning Algorithm, Adaptive Movement Estimation, Long Short-term Memory, Range.

1. Introduction
The problem of detecting anomalies in complex automated control systems is widespread in various spheres of human activity.
These problems can be approached in different ways, regardless of the budget and the ability to maintain various systems. Take as an example electricity generating stations in hard-to-reach places on our planet. Controlling their operation is complicated, for example, by weather conditions and by keeping personnel in hard-to-reach places, as well as by the difficulty of handling data coming back from many sensors that return different values for different components. This problem can be addressed in several ways. The most obvious is 24/7 monitoring by staff. But solving the problem this way increases the number of personnel, complicates the work schedule, and introduces human-factor risk. Another solution is fixed minimum and maximum values for the sensors. The problems of this approach are the difficulty of setting the values for each sensor separately and of updating the values for each of them whenever the operating range of a sensor changes, which slows down and complicates the operation of such systems. One option is to create an application for finding anomalies in the operation of the automated control system using machine learning.

IntelITSIS'2023: 4th International Workshop on Intelligent Information Technologies and Systems of Information Security, March 22–24, 2023, Khmelnytskyi, Ukraine
EMAIL: godlevskiy.yuriy@gmail.com (Yurii Hodlevskyi); tetianavakaliuk@gmail.com (Tetiana Vakaliuk); chov@ztu.edu.ua (Oleksii Chyzhmotria); ch-o-g@ztu.edu.ua (Olena Chyzhmotria); oleg@ztu.edu.ua (Oleh Vlasenko)
ORCID: 0000-0003-4094-0788 (Yurii Hodlevskyi); 0000-0001-6825-4697 (Tetiana Vakaliuk); 0000-0002-5515-6550 (Oleksii Chyzhmotria); 0000-0001-8597-1292 (Olena Chyzhmotria); 0000-0001-6697-2150 (Oleh Vlasenko)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
Thanks to machine learning, the system will automatically adapt to the variety of ranges and values under observation. The application is planned to use data analysis methods based on the LSTM neural network architecture to search for anomalous values in the operation of automated systems that return values in different ranges with different periodicity. The peculiarity of this anomaly search is the program's adaptation to different ranges and its search for anomalies in various zones, which extends the applicability of the software to various areas. Objective: development of a system that can be used to detect anomalies in the operation of automated control systems.

2. Related Works
Benjamin Lindemann, Benjamin Maschler, Nada Sahlab, and Michael Weyrich in their research give an overview of promising LSTM-based approaches for anomaly detection, with an additional focus on upcoming graph-based and transfer learning approaches. All approaches are evaluated against a set of application-oriented criteria, such as detection capabilities for temporal anomalies, achieved accuracies, and the use cases addressed in the original publications. They present some use cases that can be useful in different areas, but not examples of applications that can solve the real problem of detecting anomalies at different types of power stations [13]. Gian Antonio Susto, Matteo Terzi, and Alessandro Beghi describe anomaly detection strategies tested on a real industrial dataset related to a semiconductor manufacturing etching process. The results show that their application is a powerful tool, but only for one limited area: it is not scalable to other processes and cannot be customized for different manufacturing processes [14].
Zhe Li, Jingyue Li, Yi Wang, and Kesheng Wang proposed a novel deep-learning-based method for anomaly detection in mechanical equipment that combines two types of deep learning architectures, stacked autoencoders (SAE) and long short-term memory (LSTM) neural networks, to identify anomalous conditions in a completely unsupervised manner. Their experiment on anomaly detection in rotary machinery, using wavelet packet decomposition (WPD) and data-driven models, demonstrated the efficiency and stability of the proposed approach. But their work is useful only for rotary machinery and is not scalable to other types of equipment [16]. Yujie Wang, Xin Du, Zhihui Lu, Qiang Duan, and Jie Wu improved an LSTM model to detect anomalies in equipment in rail transit systems. But their solution was implemented for only one problem and is not scalable to other fields [18]. Mahe Zabin, Ho-Jin Choi, and Jia Uddin presented a hybrid DTL architecture comprising a deep convolutional neural network and long short-term memory layers for extracting both temporal and spatial features, enhanced by Hilbert-transform 2D images. They proposed a new customization of the model, but not an application that is ready and scalable enough to solve practical problems [21]. Preeti Rajni Bala and Ram Pal Singh in their research considered the theoretical material for the analysis of time series and also elaborated on neural networks with long short-term memory [3]. Sheng Xiang, Yi Qin, Caichao Zhu, Yangyang Wang, and Haizhou Chen presented material on use cases of long short-term memory networks, considering as an example the prediction of the remaining life of mechanical equipment [4]. The same scientists investigated long short-term memory networks at a deeper level for monitoring gear wear in mechanical equipment and, in addition, considered the basic concepts of this neural network's algorithms [5].
Anuraganand Sharma conducted a review of existing neural network optimization methods. Analyzing the advantages and disadvantages of various modifications of gradient descent, he gave an overview of one variation, stochastic gradient descent [6]. Farajtabar M., Azizan N., Mott A., and Li A. considered variations of gradient descent and investigated orthogonal gradient descent for continual learning [7]. Azizan Navid and Hassibi Babak reviewed and investigated stochastic gradient descent and mirror descent [8]. Simon Du, Jason Lee, Haochuan Li, Liwei Wang, and Xiyu Zhai investigated the problem of local minima, as well as the possibility of using various methods to find the global minimum [9]. Lee, J.D., Simchowitz, M., Jordan, M.I., and Recht, B. introduced the problem of saddle points in the gradient descent method and tried to optimize ordinary gradient descent [10]. Soukup D., Cejka T., and Hynek K. listed options for using machine learning methods to detect anomalies in various areas and gave an overview of anomaly detection in computer networks [11]. Grcić, M., Bevandić, P., and Šegvić, S. provided an introduction to hybrid anomaly detection for dense open-set recognition and a review of existing methods for dataset anomaly detection [12]. Based on the reviewed articles, the main problem is that all related works offer a solution for one area and are not scalable. As a result, it is necessary to create a common solution with scalable functionality that can be used in different areas and is ready to detect anomalies in almost any automated system.

3. Models & Methods & Technology
3.1. Analysis of problems and features of the anomaly detection process
Anomaly detection (or outlier detection) is the identification of rare items, events, or observations that are suspicious because they differ significantly from the majority of the data.
Typically, abnormal data can be related to some problem or rare event, such as bank fraud, medical problems, structural defects, malfunctioning equipment, etc. This relationship makes it very useful to be able to identify which data points can be considered anomalies, as detecting these events is usually very interesting from a business perspective. Any machine, whether rotating (pump, compressor, gas or steam turbine, etc.) or non-rotating (heat exchanger, distillation column, valve, etc.), will eventually reach a point of breakdown. This point may not be an actual failure or shutdown, but the point at which the equipment no longer operates in an optimal state, meaning that some maintenance may be required to restore its full operating potential. Simply put, determining the "health" of equipment is the realm of condition monitoring. The most common way to perform condition monitoring is to look at each sensor measurement of the machine and set minimum and maximum limits for it. If the current value is within the limits, the machine is considered healthy; if it is out of range, the machine is considered faulty and an alarm is sent. But this procedure of hard-coded alarm limits is known to produce a large number of false alarms, i.e. alarms for situations that are actually healthy for the machine, as well as missed alarms, i.e. situations that are problematic but raise no alarm. The first problem leads to wasted time and effort. The second problem is more important, because it leads to real damage along with repair costs and lost productivity. Both problems can result from the same cause: human error. Even if several operators are assigned to view all sensor values, the station will not achieve efficient monitoring of the equipment, because a human may miss a value, and viewing voluminous data takes a lot of time. That is why using automated methods of finding anomalies is the optimal choice.
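The fixed-limit procedure just described can be sketched in a few lines of Python; the sensor names and limit values here are illustrative, not taken from a real station:

```python
# Minimal sketch of hard-coded alarm limits for condition monitoring.
# Sensor names and limits below are illustrative assumptions.
def check_limits(readings, limits):
    """Return alarm messages for readings outside their [min, max] limits."""
    alarms = []
    for sensor, value in readings.items():
        lo, hi = limits[sensor]
        if not (lo <= value <= hi):
            alarms.append(f"{sensor}: {value} outside [{lo}, {hi}]")
    return alarms

limits = {"temperature": (10.0, 90.0), "pressure": (0.5, 2.0)}
readings = {"temperature": 95.3, "pressure": 1.2}
print(check_limits(readings, limits))
```

As the text notes, such a check misses anomalous values that stay inside the limits and raises false alarms when a sensor's operating range shifts, which is what motivates the learned approach.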
In this way, human error can be avoided, and the attention of service personnel can be focused on the anomalies already found by the application instead of reviewing the entire volume of data. Consider a set of stations: each of them, with high probability, will have equipment with completely different parameters, units of measurement, numbers of sensors, etc. Controlling all the equipment, keeping reports, and detecting abnormal values thus becomes complicated, given that the number of stations grows in proportion to the personnel. As a result, with more equipment, detecting incorrect behavior of automated systems becomes more and more difficult. Usually, in this case, more personnel are hired, and more operators keep reports and try to detect visually the incorrect operation of this or that automated system; or various applications are developed that are configured for specific automated systems only. Therefore, versatility plays a very important role in the further design and development of the software product, as the different kinds of equipment mentioned above have completely different sets of sensors, from simple temperature sensors to complex sensors that can measure pressure, voltage, speed, etc.

3.2. Selection of implementation tools
Many different methods are used for data analysis, but the choice should be approached deliberately: for each problem there is a more suitable solution, which affects the further operation of the software application and how its results can affect the operation of the business itself. An LSTM neural network is well suited for anomaly detection. LSTM (long short-term memory) is an artificial neural network used in the field of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections.
Such a recurrent neural network (RNN) can process not only individual data points (such as images), but also entire data sequences (such as speech or video). Because this neural network can remember sequences of data, it helps to determine the behavior of a given sensor and then check whether subsequent data matches this behavior. This means the network can be used for many completely different graphs that have a certain consistent behavior, which helps to unify the application for use with completely different equipment. Compared with other data analysis methods, LSTM is very good at detecting outliers in a graph. But before conducting data analysis, the data must first be prepared. First, the data needs to be standardized; this is necessary for the correct operation of the neural network. The standard score of a sample x is calculated by the formula z = (x − u) / s, where u is the mean of the training samples and s is the standard deviation of the training samples. Centering and scaling are performed independently for each feature by computing the appropriate statistics on the samples in the training set. The mean and standard deviation are then stored for later use in transforming new data. Dataset standardization is a common requirement for many machine learning estimators: they can perform poorly if individual features do not more or less resemble standard normally distributed data (e.g., a Gaussian with zero mean and unit variance). After standardization, the dataset should be divided into training and test samples. Usually, the training sample is 70% of the total and the test sample 30%. The training data is used to train the model; the test data is used to check the trained model. In Figure 1, we can see sensor data that cannot be analyzed for anomalies using a fixed minimum and maximum.
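The standardization z = (x − u) / s and the 70/30 split just described can be sketched with scikit-learn, one of the modules chosen for the implementation; the ten-point series below is purely illustrative:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative sensor series; a real deployment would read it from the station.
values = np.arange(1.0, 11.0).reshape(-1, 1)  # 1.0 ... 10.0 as a single feature

# 70% for training, 30% for testing (time-ordered, so no shuffling).
split = int(len(values) * 0.7)
train, test = values[:split], values[split:]

# Fit z = (x - u) / s on the training part only, then reuse the stored
# mean u and deviation s to transform the test part.
scaler = StandardScaler()
train_std = scaler.fit_transform(train)
test_std = scaler.transform(test)

print(scaler.mean_[0])  # u computed from the training samples
```

One detail worth noting: `StandardScaler` divides by the population standard deviation, while formula (8) in Section 3.6 uses the sample version with n − 1; for large datasets the difference is negligible.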
Thanks to this neural network, we can work with different graphs, even graphs like those shown in Figure 1, where it is impossible to simply set a minimum and a maximum beyond which the sensor value cannot go, since there is a certain interval sequence with the interval decreasing and increasing. Examples of anomalies in such a sequence are shown in Figure 2.

Figure 1: Data before standardization
Figure 2: Anomaly example

3.3. Development of an anomaly analyzer using a neural network
Python should be used as the main programming language. For implementation, it is suggested to use the following modules:
• scikit-learn — a Python module for machine learning, based on SciPy.
• pandas — a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
• NumPy — a module that adds support for large multidimensional arrays and matrices, along with a large library of high-level mathematical functions for manipulating these arrays.
• Plotly — for data visualization.
• TensorFlow — for model implementation.
Let's take the LSTM neural network as a basis. It is an artificial neural network whose advantage is that it avoids the problems of exploding or vanishing gradients, since it does not change the weights of the paths as in the usual method of error backpropagation. In this way, we can analyze quite different data graphs and prepare a fairly universal software application that will be useful in many areas.

3.4. Neural network
To understand the work of the LSTM neural network, let's first get acquainted with its main components; Figure 3 shows the basic structure of the LSTM.

Figure 3: Structure of LSTM

Consider the main elements of the structure shown in Figure 3. "σ" refers to the sigmoid.
A sigmoid is a continuously differentiable monotonic non-linear S-shaped function that is often used to "smooth" the values of some quantity. It is determined by the formula:

$S(x) = \frac{1}{1 + e^{-x}}$ (1)

With the help of the sigmoid, it is easy to get a number in the range from 0 to 1 depending on the number given as input x. The graph of the sigmoid is shown in Figure 4.

Figure 4: Sigmoid function

The notation "tanh" means the hyperbolic tangent function, which returns a result in the range from −1 to 1. The hyperbolic tangent is the hyperbolic counterpart of the circular tangent function used throughout trigonometry. Its graph is shown in Figure 5. It is determined by the formula:

$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ (2)

The conventional notations "+" and "×" denote the operations of addition and multiplication, respectively. Having analyzed the symbols in the general diagram, let's move on to a detailed review of the operation of the LSTM neural network.

Figure 5: Hyperbolic tangent function

Figure 6 shows the first stage of the neural network. The first layer can also be called the forget gate layer. It is in this layer that it is determined which information can be forgotten and which should be kept. The value of the previous output $h_{t-1}$ and the current input $x_t$ are passed through the sigmoid layer. The obtained values lie in the range [0; 1]. Values closer to 0 will be forgotten, and values closer to 1 will be saved. The formula of the current step:

$f_t = \sigma(W_f \times [h_{t-1}, x_t] + b_f)$ (3)

Next, it is decided what new information will be stored in the cell state. This stage consists of two parts. First, a sigmoid layer called the input gate layer determines which values should be updated. The hyperbolic tangent layer then constructs a vector of new candidate values that can be added to the cell. Figure 7 schematically depicts the work of the second stage.
Figure 6: First step of LSTM

The formulas of the current step:

$i_t = \sigma(W_i \times [h_{t-1}, x_t] + b_i)$ (4)
$\tilde{C}_t = \tanh(W_C \times [h_{t-1}, x_t] + b_C)$ (4)

At the third stage, the old cell state $C_{t-1}$ is replaced with the new state $C_t$. The old state is multiplied by $f_t$, forgetting what it was decided to forget earlier, and then the expression $i_t \times \tilde{C}_t$ is added. As a result, we have the new candidate values. Figure 8 schematically depicts the work of the third stage.

Figure 7: Second step of LSTM

The formula of the current step:

$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$ (5)

At the last stage, it is determined what information will be produced at the output. The output is based on the cell state, with some filters applied to it.

Figure 8: Third step of LSTM

First, the value of the previous output $h_{t-1}$ and the current input $x_t$ are passed through a sigmoid layer that decides what cell state information will be output. Then the cell state values are passed through the hyperbolic tangent layer to obtain output values in the range from −1 to 1, and these are multiplied by the output values of the sigmoid layer, which allows only the necessary information to be output. Figure 9 schematically depicts the work of the last, fourth stage.

Figure 9: Fourth step of LSTM

The formulas of the current step:

$o_t = \sigma(W_o \times [h_{t-1}, x_t] + b_o)$ (6)
$h_t = o_t \times \tanh(C_t)$ (6)

The obtained $h_t$ and $C_t$ are passed down the chain. Iterative gradient descent with error backpropagation could be used to minimize the total error, but the main problem with gradient descent for standard recurrent neural networks is that the error gradients decrease at an exponential rate as the time delay between important events increases, which was discovered in 1991 [1, 2]. With LSTM blocks, however, as the error magnitudes propagate backward from the output layer, the error remains locked in the block's memory.
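The gate equations (3)–(6) above can be checked with a short NumPy sketch of one cell step; the weights here are random placeholders, not trained values, and the dimensions are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following equations (3)-(6)."""
    v = np.concatenate([h_prev, x_t])        # concatenated [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ v + b["f"])         # (3) forget gate
    i = sigmoid(W["i"] @ v + b["i"])         # (4) input gate
    c_tilde = np.tanh(W["c"] @ v + b["c"])   # (4) candidate values
    c = f * c_prev + i * c_tilde             # (5) new cell state
    o = sigmoid(W["o"] @ v + b["o"])         # (6) output gate
    h = o * np.tanh(c)                       # (6) new hidden state
    return h, c

n_in, n_hid = 1, 4                           # assumed sizes
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = lstm_step(np.array([0.5]), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h.shape, c.shape)
```

Since the output gate lies in (0, 1) and tanh in (−1, 1), every component of the hidden state $h_t$ stays strictly inside (−1, 1), matching the range discussed in the text.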
Thus, regular error backpropagation is effective for training an LSTM block to remember values for very long time intervals.

3.5. Dataset creation
For a real check of the algorithm, it would be desirable to have real equipment returning actual values; but since equipment can differ widely, it was decided to analyze Internet resources and find out what types of graphs automated systems can return. After this analysis, it was decided to generate, as examples, data in a certain range, periodic data with repeating oscillations, and sinusoidal data. Data in a certain range are shown in Figure 10. An example of periodic data is shown in Figure 11. An example of sinusoidal data is shown in Figure 12.

Figure 10: Dataset example
Figure 11: An example of a periodic dataset
Figure 12: Sinusoidal dataset

3.6. Preparation for data analysis
To work with artificial intelligence algorithms, the data must be prepared in the appropriate format. After generating the dataset and saving the data in a CSV file, only the necessary rows should be selected and data standardization performed. Data standardization is the process of converting data into a common format so that analysts can process and analyze it. Most organizations use data from multiple sources; this can include on-premises data storage, cloud storage, and various databases. However, data from different sources can be problematic if it is not homogeneous, leading to difficulties later (for example, when using this data to create dashboards, visualizations, etc.). Data standardization is critical for many reasons. Above all, it helps establish clear, coherently defined elements and attributes, providing a complete catalog of the data. No matter what statistics we are trying to compute or what problems we are trying to solve, getting the data right is an important starting point. This requires converting the data into a single format with logical and consistent definitions.
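For illustration, the three generated graph types of Section 3.5 (range-bounded, periodic, and sinusoidal) might be produced as follows; all sizes, bounds, and noise levels are assumed values, not the paper's exact generator:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 500  # samples per sensor; an illustrative choice

# Data bounded in a fixed range, as in Figure 10.
ranged = rng.uniform(-2.0, 2.0, N)

# Periodic data: a repeating pattern of a wide band and a narrow band of
# oscillation, roughly like Figure 11.
wide = rng.uniform(-4.0, 3.0, 50)
narrow = rng.uniform(-1.0, 1.0, 50)
periodic = np.tile(np.concatenate([wide, narrow]), N // 100)

# Sinusoidal data with light noise, as in Figure 12.
sinusoidal = np.sin(np.linspace(0.0, 20.0 * np.pi, N)) + rng.normal(0.0, 0.05, N)

print(len(ranged), len(periodic), len(sinusoidal))
```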
These definitions will form metadata: labels that identify different aspects of the data. This is the basis of the data standardization process. In terms of accuracy, standardizing the way data is labeled improves access to the most relevant pieces of information, which simplifies analytics and reporting. Standardization is calculated according to the formula:

$Z = \frac{x - \mu}{\sigma}$ (7)

In the formula above, x is a point from the dataset, μ is the arithmetic mean of the dataset, and σ is the standard deviation of the dataset. The formula for calculating the standard deviation can be written as:

$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ (8)

where $\bar{x}$ is the arithmetic mean of the dataset, n is the number of values in the dataset, and $x_i$ is the i-th value of the dataset. After standardization, the dataset should be divided into training and test samples. Usually, the training sample is 70% of the total and the test sample 30%. The training data is used to train the model; the test data is used to check the trained model.

3.7. Interface
After successful registration, the user can become familiar with the main functionality of the software application. On the stations page, a user with manager rights can create stations and change or delete any of them. The stations page is shown in Figure 13.

Figure 13: Stations page

Before the user creates station equipment and equipment sensors, equipment types must be created; a type represents a certain automated system of the station. Since different equipment at different stations can have similar properties, these common features should be captured in the equipment type. The same logic applies when creating types for sensors, since, for example, temperature sensors can be installed on different equipment and it makes no sense to recreate the same information for each kind of equipment every time. The equipment types page is shown in Figure 14.
The sensor types page is similar to the previous one.

Figure 14: Equipment type page
Figure 15: Equipment type context menu

After creating equipment types and sensor types, the user can start creating equipment and sensors using the pre-prepared types. The user can select the appropriate type from the drop-down menu shown in Figure 15. The logic of creating sensors is similar, with the exception of choosing the type of graph and the presence or absence of anomalies, since the dataset is generated rather than taken from a real automated system, there being no access to one at the development stage. After that, the user can check the equipment list with all the necessary details and edit or delete entries; the equipment page is shown in Figure 16.

Figure 16: Equipment page

Inside every piece of equipment, the user can create sensors. Among the graph types, the user can choose periodic, with a certain period of repetition of data ranges; normal, where there is a certain range and values do not go beyond it; and sine wave. The type can also be chosen in the drop-down menu shown in Figure 17. The sensors page is similar to the equipment page.

Figure 17: Selection of graph types

4. Experiments
For the experiments, Internet resources were researched to check the data that different equipment can return from different sensors. We picked a regular graph that should return data in the range from −2 to 2, shown in Figure 18; a repeating graph with a repeating trend in which the larger part has values between −4 and 3 and the smaller part between −1 and 1, shown in Figure 19; and a sinusoid graph, shown in Figure 20. After that, anomalies were injected, as shown in Figure 21. For application training, the data was split into train and test batches. After training and testing, the application analyzes the data, finds anomalies, and shows the results; anomalies are highlighted as orange dots, as shown in Figure 21.
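The final step of marking anomalies is not described in detail; a plausible sketch, assuming points are flagged when their prediction error exceeds the mean error by three standard deviations, could look like this (the data and the injected anomalies below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative: actual readings vs. model predictions for a test window.
actual = np.sin(np.linspace(0.0, 4.0 * np.pi, 200))
predicted = actual + rng.normal(0.0, 0.02, 200)  # near-perfect predictions
actual[50] += 3.0   # injected anomalies, analogous to the experiments
actual[120] -= 2.5

error = np.abs(actual - predicted)
# Assumed rule: flag points whose error exceeds mean + 3 standard deviations.
threshold = error.mean() + 3.0 * error.std()
anomalies = np.where(error > threshold)[0]
print(anomalies)
```

The indices returned here are the points that would be highlighted as orange dots on the sensor graph; the three-sigma rule is an assumption, and any comparable error threshold could be substituted.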
Figure 18: Regular graph
Figure 19: Repeating graph with a repeating trend
Figure 20: Sinusoid graph
Figure 21: Experiment results

The application is ready to analyze and return results for radically different graphs, which solves the problem of the previously considered studies that were prepared only for certain equipment. This application works for completely different equipment and is therefore flexible for further use at various stations with different machinery, from pumps and turbines to nuclear reactors. If the system did not detect any anomalies, the user is informed by a message that there are no anomalies on any sensor of the specific equipment, as shown in Figure 22.

Figure 22: Equipment without anomalies

If anomalies were detected, the status changes to "Exist". To study in detail where exactly an anomaly occurred, the user clicks on the equipment to be examined and, after going to the page of all sensors of this equipment, analyzes the sensors for the presence of the "Exist" status, as shown in Figure 23.

Figure 23: Page of sensors of the equipment with present anomalies

After that, the user can go to the sensor to be investigated and analyze the results of anomaly detection in this automated system. A graph of such results is shown in Figure 24. If anomalies were detected, they are indicated by orange dots, as shown in Figure 24. If there are no anomalies, the graph consists only of the readings of the current sensor of this equipment, as shown in Figure 25. In this way, the program can analyze quite different graphs; an example is shown in Figure 21.

Figure 24: The sensor page of the equipment with existing anomalies
Figure 25: The sensor page of the equipment without anomalies

5. Conclusions
In this work, a user-friendly software application was implemented.
It provides convenient record keeping for various automated station systems, and a mechanism for automatic detection of anomalies in them has been implemented. The technical specification was prepared and the appropriate tools for implementing the program were chosen. Different research works were analyzed with their advantages and disadvantages. A scalable application was implemented that can be used with different equipment, which is the main novelty. The application was tested in different experiments, which showed acceptable results for future use. The main role in the detection of anomalies is played by the LSTM neural network. This work describes the process of preparing data for its use, and the network's creation and training. To train the neural network, a proprietary dataset generator was created after analyzing various Internet resources. The process of choosing the specific neural network architecture is described, and a number of algorithms for preliminary data preparation were also implemented. Prospects for further research include in-depth investigation of the performance of an LSTM neural network in this context. The current application should be checked in various areas, and it is worth conducting a study of the success of this software application using large amounts of real data from equipment sensors of automated systems.

6. References
[1] X. Yang, M. Xu, S. Xu and X. Han, "Day-ahead forecasting of photovoltaic output power with similar cloud space fusion based on incomplete historical data mining", Appl. Energy, vol. 206, pp. 683-696, 2017. https://www.sciencedirect.com/science/article/abs/pii/S0306261917312564?via%3Dihub
[2] M. Ceci, R. Corizzo, F. Fumarola, D. Malerba and A. Rashkovska, "Predictive modeling of PV energy production: How to set up the learning task for a better prediction?", IEEE Trans. Ind. Informat., vol. 13, no. 3, pp. 956-966, Jun. 2017. https://ieeexplore.ieee.org/document/7556989
[3] Preeti Rajni Bala, Ram Pal Singh.
A dual-stage advanced deep learning algorithm for long-term and long-sequence prediction for multivariate financial time series. Applied Soft Computing, Volume 126, 2022, 109317. https://doi.org/10.1016/j.asoc.2022.109317
[4] Sheng Xiang, Yi Qin, Caichao Zhu, Yangyang Wang, Haizhou Chen. LSTM networks based on attention ordered neurons for gear remaining life prediction. ISA Transactions, Volume 106, 2020, pp. 343-354. https://doi.org/10.1016/j.isatra.2020.06.023
[5] Sheng Xiang, Yi Qin, Caichao Zhu, Yangyang Wang, Haizhou Chen. Long short-term memory neural network with weight amplification and its application into gear remaining useful life prediction. Engineering Applications of Artificial Intelligence, Volume 91, 2020, 103587. https://doi.org/10.1016/j.engappai.2020.103587
[6] Anuraganand Sharma. Guided Stochastic Gradient Descent Algorithm for inconsistent datasets. Applied Soft Computing, Volume 73, 2018, pp. 1068-1080. https://doi.org/10.1016/j.asoc.2018.09.038
[7] Farajtabar M., Azizan N., Mott A., Li A. Orthogonal gradient descent for continual learning. Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020, Palermo, Italy. PMLR: Volume 108. https://core.ac.uk/download/pdf/345075797.pdf
[8] Azizan, Navid & Hassibi, Babak. (2018). Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization. ICLR 2019. https://openreview.net/pdf?id=HJf9ZhC9FX
[9] Simon Du, Jason Lee, Haochuan Li, Liwei Wang, Xiyu Zhai. Gradient Descent Finds Global Minima of Deep Neural Networks. Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1675-1685, 2019. http://proceedings.mlr.press/v97/du19c.html
[10] Lee, J.D., Simchowitz, M., Jordan, M.I. & Recht, B. (2016). Gradient Descent Only Converges to Minimizers. 29th Annual Conference on Learning Theory, Proceedings of Machine Learning Research, 49: 1246-1257. https://proceedings.mlr.press/v49/lee16.html
[11] Soukup D., Cejka T., Hynek K. (2020). Behavior Anomaly Detection in IoT Networks. Lecture Notes on Data Engineering and Communications Technologies, 49, pp. 465-473. DOI: 10.1007/978-3-030-43192-1_53
[12] Grcić, M., Bevandić, P., Šegvić, S. (2022). DenseHybrid: Hybrid Anomaly Detection for Dense Open-Set Recognition. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. Lecture Notes in Computer Science, vol 13685. Springer, Cham. https://doi.org/10.1007/978-3-031-19806-9_29
[13] Benjamin Lindemann, Benjamin Maschler, Nada Sahlab, Michael Weyrich. (2021). A survey on anomaly detection for technical systems using LSTM networks. Computers in Industry, Volume 131. https://doi.org/10.1016/j.compind.2021.103498
[14] Gian Antonio Susto, Matteo Terzi, Alessandro Beghi. (2017). Anomaly Detection Approaches for Semiconductor Manufacturing. Procedia Manufacturing, Volume 11, pp. 2018-2024. https://doi.org/10.1016/j.promfg.2017.07.353
[15] Bashar M. Haddad, Sen Yang, Lina J. Karam, Jieping Ye, Nital S. Patel, Martin W. Braun. (2018). Multifeature, Sparse-Based Approach for Defects Detection and Classification in Semiconductor Units. IEEE Transactions on Automation Science and Engineering, vol. 15, no. 1, pp. 145-159, Jan. 2018. DOI: 10.1109/TASE.2016.2594288
[16] Li, Z., Li, J., Wang, Y. et al. (2019). A deep learning approach for anomaly detection based on SAE and LSTM in mechanical equipment. The International Journal of Advanced Manufacturing Technology, volume 103, pp. 499-510. https://doi.org/10.1007/s00170-019-03557-w
[17] Yaguo Lei, Jing Lin, Zhengjia He, Ming J. Zuo (2013). A review on empirical mode decomposition in fault diagnosis of rotating machinery. Mechanical Systems and Signal Processing, Volume 35, Issues 1-2, pp. 108-126. https://doi.org/10.1016/j.ymssp.2012.09.015
[18] Y. Wang, X. Du, Z. Lu, Q. Duan and J. Wu. (2022).
Improved LSTM-Based Time-Series Anomaly Detection in Rail Transit Operation Environments. IEEE Transactions on Industrial Informatics, vol. 18, no. 12, pp. 9027-9036. DOI: 10.1109/TII.2022.3164087
[19] M. Abdel-Nasser, K. Mahmoud and M. Lehtonen (2021). Reliable Solar Irradiance Forecasting Approach Based on Choquet Integral and Deep LSTMs. IEEE Transactions on Industrial Informatics, vol. 17, no. 3, pp. 1873-1881. DOI: 10.1109/TII.2020.2996235
[20] R. Quan, L. Zhu, Y. Wu and Y. Yang (2021). Holistic LSTM for pedestrian trajectory prediction. IEEE Trans. Image Process., vol. 30, pp. 3229-3239. https://ieeexplore.ieee.org/document/9361440
[21] Zabin, M., Choi, HJ. & Uddin, J. (2023). Hybrid deep transfer learning architecture for industrial fault diagnosis using Hilbert transform and DCNN–LSTM. J Supercomput 79, pp. 5181-5200. https://doi.org/10.1007/s11227-022-04830-8