Model of Graphic Object Identification in a Video
                         Surveillance System based on a Neural Network
                         Andii Sahun1, Vladyslav Khaidurov2, and Viktor Bobkov2
                         1 National University of Life and Environmental Sciences of Ukraine, 15 Heroyiv Oborony str., Kyiv, 03041, Ukraine
                         2 National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute,” 37 Peremohy ave., Solomyanskyi district, Kyiv, 03056, Ukraine

                                          Abstract
                                          With a suitable model and settings, the object identification system enables
                                          accurate and fast identification of graphical objects in video data. The system is
                                          built on a deep learning neural network. Training the neural network model on the
                                          CamVid benchmark video dataset provides ground truth labels that associate each
                                          pixel with one of the 32 semantic classes of the identification system. A total of
                                          421 images were used for training and 280 for testing. With suitably chosen
                                          parameters of the training function and, to support identification, a method for
                                          measuring the distance between feature vectors, the system achieves the required
                                          result: the identification of objects from the video data stream of perimeter IP
                                          cameras demonstrated an average accuracy of 99.7% across all cameras in the test
                                          examples, consisting of 12 video fragments with a duration of 70 seconds each. The
                                          developed algorithm can identify objects of 11 classes in the graphical content of
                                          IP cameras.

                                          Keywords
                                          Deep machine learning, neural network, classifier, clustering, cluster analysis, pattern
                                          recognition.

1. Introduction

The creation and implementation of object identification systems within various technical systems is driven by demand from a large number of contemporary information systems. Today, effective and high-quality identification of various objects can only be achieved by intelligent systems based on artificial intelligence or machine learning algorithms. Neural networks in their many variations frequently serve as the basis for object identification systems. When using neural networks, it is crucial both to determine the type of network correctly and to choose a suitable test video database for training it [1, 2]. Otherwise, the classification accuracy of the resulting object identification system may be low. Another important aspect is the formation of the classifier’s gradation scale in the identification system.
    To analyze video data, it is critical to create a function that analyzes video frames and uses test databases for the initial identification of objects in individual static frames of the video sequence [3]. A separate challenge in object identification models based on neural networks is the classification of recognized objects [4].
    Significant contributions to creating algorithms and methods for object identification have been made by V. Kornienko, L. Budkova, A. Korobov, A. Korobov [1], V. Lakhno, V. Chubaievskyi, K. Palaguta [6], V. Kornienko, I. Gulina, L. Budkova [7], O. Kryvoruchko, A. Desiatko, A. Blozva, V. Semidotska [8]. Publications on the use of neural networks and machine learning

                         CPITS-2024: Cybersecurity Providing in Information and Telecommunication Systems, February 28, 2024, Kyiv, Ukraine
                         EMAIL: avd29@ukr.net (A. Sahun); allif0111@gmail.com (V. Khaidurov); vbb.wlp@gmail.com (V. Bobkov)
                         ORCID: 0000-0002-5151-9203 (A. Sahun); 0000-0002-4805-8880 (V. Khaidurov); 0000-0002-1567-8186 (V. Bobkov)
                                      ©️ 2024 Copyright for this paper by its authors.
                                      Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

                                      CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
technologies for solving applied tasks in object identification and cybersecurity include the work of the following scholars: H. Schuster [9], L. Ljung, C. Andersson, K. Tiels [10], O. Nelles [11], I. Goodfellow, Y. Bengio, A. Courville [12], C.-J. Lin [13], T. Schön [14], D. Kandamali, X. Cao, M. Tian, Z. Jin, H. Dong, K. Yu [15], S. Bickler [16], B. Akhmetov [7], and others.
    In most cases, overcoming the aforementioned challenges makes it possible to obtain a correct mathematical model of the object identification system. Such systems can be used for computer vision or for other practical applications. To build an object identification system, it is crucial to obtain the mathematical model on which the system will operate [17]. To achieve this, it is necessary to analyze existing algorithms, models, and methods applied in intelligent object identification systems.

2. Graphic Object Identification

According to the task conditions, the fundamental requirement for the identification system is to distinguish and identify objects in individual frames of video content. It is rational to base such models on the mathematical framework and algorithms of neural networks.
    The advantage of a mathematical model based on neural networks is its ability to learn. In intelligent computer systems for object identification, machine learning significantly improves the adequacy and accuracy of recognition and identification algorithms. The most rational type of learning for the mathematical model of the created system is supervised learning. This type of learning uses labeled datasets that contain input data and expected output results. During the learning process, the model performs internal iterations to bring its error within the specified bound. Once the training error no longer exceeds the specified limit, the model is considered trained.
    Deep learning allows training a model to predict outcomes based on a set of input data. Both supervised and unsupervised learning can be used to train the network. The difference between these two learning methods is illustrated in Figure 1.

Figure 1: The difference between two learning methods (Machine Learning and Deep Learning)

    The neural network used as the basis for the object identification system consists of multiple layers of neurons. The input layer of the neural network takes the initial input data. In this case, there are four neurons in the input layer: the intensity of each pixel and Haar features for each of the considered objects (trees, cars, roads, sky, pedestrians, etc.). The input layer passes this data to the first hidden layer. Hidden layers perform mathematical computations on the input data. The term “depth” in “deep learning” refers to having more than one hidden layer. The output layer produces the final result, which in this case is the type of object present in a specific image. The diagram of the obtained neural network model is shown in Figure 2.

Figure 2: The best forecasting models obtained by the method of group accounting of arguments

    Identification of objects based on this machine learning method occurs through the re-evaluation of the weights of connections between neurons. A weight determines the importance of an input data element. When classifying objects, the weights of the interneuronal connections are the most important factor.
    For the practical implementation of the created model, the vgg16() function in MATLAB was chosen as the basis. This function


represents the architecture of a deep neural network for image classification. It has 16 weight layers: 13 convolutional layers and 3 fully connected layers. The overall architecture of VGG-16 includes:
    1. Input layer: accepts images of 224×224 pixels.
    2. Convolutional layers: each convolutional layer has 3×3 filters and ReLU activation. The number of filters per convolutional layer increases from 64 to 512. MaxPooling (2×2) layers are used after each block of convolutional layers.
    3. Fully connected layers: of the three fully connected layers, the first two have 4096 neurons each. ReLU activation is applied to the outputs of these two layers, and dropout (random deactivation) may be applied to prevent overfitting.
    4. Output layer: the final fully connected layer has 1000 neurons (the number of classes in ImageNet). The softmax activation function is used to obtain class probabilities.
    The vgg16 function in MATLAB returns a neural network object but does not include a specific method for measuring the distance between feature vectors.
    Several methods exist for measuring the distance between feature vectors: Euclidean distance, squared Euclidean distance, Manhattan distance, Chebyshev distance, and Hamming distance [18]. The main advantage of the Euclidean distance is its simplicity: it is the length of the straight line between two points and is cheap to compute. In the task of graphic image identification, we use the Euclidean distance to compare the model’s output values with expected values during training and to classify objects based on their features.
    In this research, the classical Euclidean distance is used. It is calculated as follows:

$$\rho(x, x') = \sqrt{\sum_{i=1}^{n} (x_i - x_i')^2},$$

where $x_i$ and $x_i'$ are the components of the first and second $n$-dimensional vectors, respectively. We can define these two vectors as follows:

$$x = (x_1; x_2; \ldots; x_n), \quad x' = (x_1'; x_2'; \ldots; x_n').$$

    To train a neural network, prepared data must be fed into it, and the generated outputs must be compared with the reference results of a test dataset. For training the neural network, examples from the Cambridge-driving Labeled Video Database (CamVid) were used. This database is the first collection of videos with semantic labels of object classes, complemented with metadata. It provides ground truth labels associating each pixel with one of 32 semantic classes and is freely available online.
    This database addresses the need for experimental data with which to evaluate identification and classification algorithms quantitatively. For each pair of objects, the “distance” between them is measured, representing their degree of similarity.
    A model that minimizes the external criteria is considered optimal. With an increase in the number of variables in the model and the degree of the reference polynomial, the effort required to obtain the best forecasting model can increase significantly.
    The obtained model is implemented in the MATLAB environment. The aforementioned video database of test samples was used for training the neural network. This video database contains 10 minutes of footage recorded at 30 Hz. Images are segmented using corresponding semantic labels at a frequency of 1 Hz and, in part, at 15 Hz. The CamVid database has four components relevant to the studied objects:
    1. Pixel-level semantic segmentation of more than 700 images, performed manually and later verified and confirmed by a second person for accuracy.
    2. High-quality, high-resolution color video images. These are digital video recordings with a long duration of video content.
    3. Calibrated sequences for the color response and internal characteristics of the camera, taking into account the camera viewpoint for each frame in the sequences, as for a typical surveillance camera.


    4. Labeling software, provided to support the expansion of the database and to assist users who want to label classes accurately in other images and videos.
    The relevance of the database is evaluated by measuring algorithm performance in three different areas: object recognition in multiple classes, pedestrian detection, and label propagation.

3. Training and Testing the Identification System Model

To ensure the operational functionality of the model, the following steps were taken: loading the test datasets from the CamVid database and preparing a repository for the loaded test samples.
    The individual classes of identified objects are declared: “Sky,” “Building,” “Pole,” “Road,” “Pavement,” “Tree,” “SignSymbol,” “Fence,” “Car,” “Pedestrian,” “Bicyclist”.
    The resolution of the training frames is set to 360×480 pixels of the video stream: imageSize = [360 480 3].
    A neural network model for the identification of graphical objects returns a specific set of numerical values for the identified objects. To obtain the central value of an ordered set of such data, we use the median() function in the MATLAB environment. The parameters initialized for neural network training and for the definition of the error function are provided in Table 1.

Table 1
Parameters (arguments) of the neural network training function
 Argument name          Value
 Momentum               0.9
 InitialLearnRate       1×10^-3
 L2Regularization       5×10^-4

    The trainingOptions() function is called with 9 arguments, three of whose values are listed in Table 1:
    • sgdm (Stochastic Gradient Descent with Momentum).
    • Momentum (momentum helps accelerate the optimization process by incorporating information from previous iterations).
    • InitialLearnRate (initial neural network learning rate).
    • L2Regularization (L2 regularization of neural network training, i.e., weight decay).
    • MaxEpochs (number of full passes through the entire dataset during network training).
    • MiniBatchSize (mini-batch size, the number of examples used to update the gradient at each iteration).
    • Shuffle with the value every-epoch (shuffling the data at each epoch during training).
    • VerboseFrequency (the frequency of displaying training progress information in the command window).
    Further, it is necessary to define the classifier’s grading scales to perform cluster analysis and the final classification of identified graphical objects. As reference data, we take the color channel components of the identified image {R ∈ [0; 255], G ∈ [0; 255], B ∈ [0; 255]} as follows:
    • Reference information vector for sky identification:
        128 128 128; ... % “Sky”.
    • Reference information vector for building and structure identification:
        000 128 064; ... % “Bridge”
        128 000 000; ... % “Building”
        064 192 000; ... % “Wall”
        064 000 064; ... % “Tunnel”
        192 000 128; ... % “Archway”.
    • Reference information vector for identifying columns, pillars, etc.:
        192 192 128; ... % “Column_Pole”
        000 000 064; ... % “TrafficCone”.
    • Reference information vector for identifying road surface:
        128 064 128; ... % “Road”
        128 000 192; ... % “LaneMkgsDriv”
        192 000 064; ... % “LaneMkgsNonDriv”.
    • Reference information vector for identifying sidewalks, pavement, and pedestrian paths:
        000 000 192; ... % “Sidewalk”
        064 192 128; ... % “ParkingBlock”
        128 128 192; ... % “RoadShoulder”.
    • Reference information vector for identifying trees, shrubs, and other significant vegetation areas:
        128 128 000; ... % “Tree”
        192 192 000; ... % “VegetationMisc”.


    • Reference information vector for identifying road signs, informational signs, etc.:
        192 128 128; ... % “SignSymbol”
        128 128 064; ... % “Misc_Text”
        000 064 064; ... % “TrafficLight”.
    • Reference information vector for identifying fences and barriers:
        064 064 128; ... % “Fence”.
    • Reference information vector for identifying vehicles:
        064 000 128; ... % “Car”
        064 128 192; ... % “SUVPickupTruck”
        192 128 192; ... % “Truck_Bus”
        192 064 128; ... % “Train”
        128 064 064; ... % “OtherMoving”.
    • Reference information vector for identifying pedestrians, animals, and light means of manual cargo transportation:
        064 064 000; ... % “Pedestrian”
        192 128 064; ... % “Child”
        064 000 192; ... % “CartLuggagePram”
        064 128 064; ... % “Animal”.
    • Reference information vector for identifying light mechanized means of transportation for people and cargo (motorcycles/bicycles):
        000 128 192; ... % “Bicyclist”
        192 000 192; ... % “MotorcycleScooter”.
    From the provided reference information vectors, it can be seen that some classes of the identified and subsequently clustered data contain subclasses, namely:
    • The ‘Building’ class contains 5 subclasses.
    • The ‘Pole’ class contains 2 subclasses.
    • The ‘Road’ class contains 3 subclasses.
    • The ‘Pavement’ class contains 3 subclasses.
    • The ‘Tree’ class contains 2 subclasses.
    • The ‘SignSymbol’ class contains 3 subclasses.
    • The ‘Car’ class contains 5 subclasses.
    • The ‘Pedestrian’ class contains 4 subclasses.
    • The ‘Bicyclist’ class contains 2 subclasses.
    The color channel reference map for highlighting the classes of identified objects contains a color channel vector for each of the 11 basic classes. It is shown in Table 2. Defining a reference map for the color channels is critical.

Table 2
Basic classes of the model
 Class name     Class characteristics (RGB features)
 Sky            128 128 128
 Building       128 0 0
 Pole           192 192 192
 Road           128 64 128
 Pavement       60 40 222
 Tree           128 128 0
 SignSymbol     192 128 128
 Fence          64 64 128
 Car            64 0 128
 Pedestrian     64 64 0
 Bicyclist      0 128 192

    When training the model, the total number of training images is 421, and the number of test images is 280. As a result of training, the weight of each class can be determined (Table 3, column ‘Class weight’).

Table 3
The weight of each class
 Class name     Class weight            IoU
 Sky            0.318184709354742       0.9266
 Building       0.208197860785155       0.7987
 Pole           5.092367332938507       0.1698
 Road           0.174381825257403       0.9518
 Pavement       0.710338097812948       0.4188
 Tree           0.417518560687874       0.4340
 SignSymbol     4.537074815482926       0.3251
 Fence          1.838648261914560       0.4920
 Car            1.000000000000000       0.0688
 Pedestrian     6.605878573155874       0
 Bicyclist      5.113338416059593       0

    The ‘IoU’ column of Table 3 gives the per-class intersection over union obtained for the individual classes on specific frames. As the column shows, no pedestrians or bicyclists were identified in the test image (IoU of 0).
    Through training on the training set, the algorithm based on a deep learning neural network distinguishes the background from the informational content of identification (the object, a car) (Figure 3).

Figure 3: The algorithm of the created identification model separates the background in a graphical video frame
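As an illustration of how the reference color map can act as a classifier scale, the following minimal Python sketch assigns an RGB triple to the nearest of the 11 basic classes in Table 2 using the Euclidean distance from Section 2. It is not the authors' MATLAB implementation: the names euclidean_distance and nearest_class and the sample pixel values are hypothetical and chosen only for illustration.

```python
import math

# Reference colors copied from Table 2 (class name -> RGB triple).
REFERENCE_COLORS = {
    "Sky": (128, 128, 128),
    "Building": (128, 0, 0),
    "Pole": (192, 192, 192),
    "Road": (128, 64, 128),
    "Pavement": (60, 40, 222),
    "Tree": (128, 128, 0),
    "SignSymbol": (192, 128, 128),
    "Fence": (64, 64, 128),
    "Car": (64, 0, 128),
    "Pedestrian": (64, 64, 0),
    "Bicyclist": (0, 128, 192),
}


def euclidean_distance(x, x_prime):
    """Classical Euclidean distance between two n-dimensional vectors."""
    if len(x) != len(x_prime):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prime)))


def nearest_class(rgb, references=REFERENCE_COLORS):
    """Return the class whose reference color is closest to rgb.

    Assumption for this sketch: a feature vector is a plain RGB triple,
    and the smallest Euclidean distance decides the class.
    """
    return min(references, key=lambda name: euclidean_distance(rgb, references[name]))


if __name__ == "__main__":
    # Hypothetical pixel values, slightly perturbed from the reference colors.
    for pixel in [(130, 60, 125), (10, 120, 200), (130, 130, 125)]:
        print(pixel, "->", nearest_class(pixel))
```

In a real pipeline the compared vectors would be the features returned by the network rather than raw colors, but the decision rule, choosing the reference with the smallest distance, stays the same.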


As an example, to showcase the operation of the developed object identification system, the graphical frame shown in Figure 4 was loaded.

Figure 4: Original graphical frame for testing the identification model

As a result of the developed algorithm, we observe the identification results on the demonstration test frame for objects subjected to further cluster analysis. The legend for the identification algorithm classes is provided in Figure 5.

Figure 5: The legend for the identification algorithm classes

The practical application of the developed identification system requires that the identification results be presented not in visual (graphic) form but as a numerical data array, so that these results can be used further in more complex systems. To achieve this, we construct a histogram of the frequency of occurrence of identified classes and subclasses of objects in the identification zone of video surveillance cameras (Figure 6).

Figure 6: A histogram of the frequency of occurrence of identified classes and subclasses of objects in the identification zone of video surveillance cameras

By comparing deviations of the obtained numerical arrays from the reference ones, the developed system for identifying graphic objects can make decisions regarding the detection of specific incidents or determine certain reactions of the system.

4. Conclusions
As a result of testing the implemented algorithm in the MATLAB environment, the accuracy of the graphic information identification system model was shown to be high. The identification of objects from the video data stream of perimeter surveillance cameras demonstrated an average accuracy of 96.38% across all cameras in the test examples, consisting of 12 video fragments with a duration of 68.35 seconds each. These results were achieved due to several factors, including:
   • The type of neural network and the carefully chosen parameters of the neural network training function.
   • The method for determining the similarity measure of an object to existing classes (distance measure for clustering) through the Euclidean distance.
   • Using the Cambridge-driving Labeled Video Database (CamVid) as a collection of videos with semantic labels of object classes, accompanied by metadata. This


      database facilitated effective training of the foundation of the identification system, the neural network model.
   • Correct definition of the classifier’s grading scales to perform cluster analysis and the final classification of identified graphical objects, among others.
   As seen from the results of the image identification model, incorporating Haar features into the neural network model yields excellent results in the classification and identification of images of various types.

References

[1]   B. Bebeshko, et al., Application of Game Theory, Fuzzy Logic and Neural Networks for Assessing Risks and Forecasting Rates of Digital Currency, J. Theor. Appl. Inf. Technol. 100(24) (2022) 7390–7404.
[2]   K. Khorolska, et al., Application of a Convolutional Neural Network with a Module of Elementary Graphic Primitive Classifiers in the Problems of Recognition of Drawing Documentation and Transformation of 2D to 3D Models, J. Theor. Appl. Inf. Technol. 100(24) (2022) 7426–7437.
[7]   O. Herasina, V. Korniienko, The Algorithms of Global and Local Optimization in Tasks of Identification of Difficult Dynamic Systems, Inf. Processing Syst. 6(87) (2010) 73–77.
[8]   V. Lakhno, et al., Development Strategy Model of the Informational Management Logistic System of a Commercial Enterprise by Neural Network Apparatus, in: Cybersecurity Providing in Inf. and Telecom. Syst., vol. 2746 (2020) 87–98.
[9]   H. Schuster, Deterministic Chaos: Introduction and Recent Results, Nonlinear Dyn. Solids (1992) 22–30. doi: 10.1007/978-3-642-95650-8_2.
[10]  L. Ljung, et al., Deep Learning and System Identification, IFAC-PapersOnLine 53(2) (2020) 1175–1181. doi: 10.1016/j.ifacol.2020.12.1329.
[11]  O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural and Fuzzy Models, Springer (2001). doi: 10.1007/978-3-662-04323-3.
[12]  I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, The MIT Press (2016).
[13]  C.-J. Lin, SISO Nonlinear System Identification Using a Fuzzy-Neural Hybrid System, Int. J. Neural Syst. 08(03) (1997) 325–337. doi: 10.1142/s0129065797000331.
[3]   V. Sokolov, P. Skladannyi, A. Platonenko,
                                                   [14]   C. Andersson, N. Wahlström, T. Schön,
      Video Channel Suppression Method of
                                                          Learning Deep Autoregressive Models
      Unmanned Aerial Vehicles, in: IEEE 41st
                                                          for       Hierarchical    Data,       IFAC-
      International Conf. on Electronics and
                                                          PapersOnLine 54(7) (2021) 529–534.
      Nanotechnology (2022) 473–477. doi:
                                                          doi: 10.1016/j.ifacol.2021.08.414.
      10.1109/ELNANO54667.2022.9927105.
                                                   [15]   D. Kandamali, et al., Machine Learning
[4]   V. Zhebka, et al., Optimization of Machine
                                                          Methods       for    Identification     and
      Learning Method to Improve the
                                                          Classification of Events in ϕ-OTDR
      Management Efficiency of Hetero-
                                                          Systems: a review, Appl. Opt. 61(11)
      geneous Telecommunication Network,
                                                          (2022) 2975. doi: 10.1364/ao.444811.
      in: Cybersecurity Providing in Inf. and
                                                   [16]   S. Bickler, (2018). Machine Learning
      Telecom. Syst., vol. 3288 (2022) 149–155.
                                                          Identification and Classification of
[5]   V. Моskalenko, A. Korobov, Optimization
                                                          Historic Ceramics. Archaeology in New
      Parameters of Intellectual Identification
                                                          Zealand 61 (2018) 48–58.
      System of Objects on the Terrain,
                                                   [17]   V. Buriachok, et al., Invasion Detection
      Radioelectron. Comput. Syst. 2 (2019)
                                                          Model using Two-Stage Criterion of
      32–39. doi: 10.32620/reks.2016. 2.05.
                                                          Detection of Network Anomalies, in:
[6]   V. Lakhno, et al., Information Security
                                                          Cybersecurity Providing in Inf. and
      Audit Method Based on the Use of a
                                                          Telecom. Syst., vol. 2746 (2020) 23–32.
      Neuro-Fuzzy          System,      Software
                                                   [18]   J.-H. Lee, Minimum Euclidean Distance
      Engineering Application in Informatics,
                                                          Evaluation Using Deep Neural Networks,
      LNNS 232 (2021) 171–184. doi:
                                                          AEU – Int. J. Electron. Commun. 112
      10.1007/978-3-030-90318-3_17.
                                                          (2019) 152964. doi: 10.1016/j.aeue.
                                                          2019.152964.


                                               367