=Paper=
{{Paper
|id=Vol-3654/short2
|storemode=property
|title=Model of Graphic Object Identification in a Video Surveillance System based on a Neural Network (short paper)
|pdfUrl=https://ceur-ws.org/Vol-3654/short2.pdf
|volume=Vol-3654
|authors=Andii Sahun,Vladyslav Khaidurov,Viktor Bobkov
|dblpUrl=https://dblp.org/rec/conf/cpits/SahunKB24
}}
==Model of Graphic Object Identification in a Video Surveillance System based on a Neural Network (short paper)==
Andii Sahun1, Vladyslav Khaidurov2, and Viktor Bobkov2
1 National University of Life and Environmental Sciences of Ukraine, 15 Heroyiv Oborony str., Kyiv, 03041,
Ukraine
2 National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute,” 37 Peremohy ave.,
Solomyanskyi district, Kyiv, 03056, Ukraine
Abstract
The object identification system, given correct model selection and settings, enables
accurate and fast identification of graphical objects in video data. A deep learning neural
network is the basis of the identification system. Training the neural network model on the
CamVid benchmark video dataset makes it possible to use ground truth labels that associate
each pixel with one of 32 semantic classes. The training set contains 421 images and the
test set contains 280. With optimal parameters for the training function and a suitable
method for measuring the distance between feature vectors, the system gives the required
result: the identification of objects from the video data stream of perimeter IP cameras
demonstrated an average accuracy of 96.38% across all cameras in the test examples,
consisting of 12 video fragments with a duration of 68.35 seconds each. The developed
algorithm is capable of identifying objects of 11 classes from the graphical
information content of IP cameras.
Keywords
Deep machine learning, neural network, classifier, clustering, cluster analysis, pattern
recognition.
1. Introduction

The creation and implementation of object identification systems within various technical
systems is explained by their demand in a large number of contemporary information systems.
Today, effective and high-quality identification of various objects can only be achieved by
intelligent systems. These systems are based on artificial intelligence or machine learning
algorithms. Frequently, neural networks in their various forms serve as the basis for object
identification systems. When using neural networks, it is crucial to correctly determine the
type of network and to choose a test video database for proper training of the neural
network [1, 2]. Otherwise, the accuracy of the classification in the created object
identification system may be low. Another important aspect is the formation of the
classifier’s gradation scale in the identification system.

If we need to analyze video data, it is critical to create a function that analyzes video
frames and utilizes test databases for initial object identification in individual static
frames of the video sequence [3]. A separate challenge in object identification models based
on neural networks is the classification of recognized objects [4].

Significant contributions to creating algorithms and methods for object identification have
been made by: V. Kornienko, L. Budkova, A. Korobov, A. Korobov [1], V. Lakhno,
V. Chubaievskyi, K. Palaguta [6], V. Kornienko, I. Gulina, L. Budkova [7], O. Kryvoruchko,
A. Desiatko, A. Blozva, V. Semidotska [8]. Publications on the use of neural networks and
machine learning
CPITS-2024: Cybersecurity Providing in Information and Telecommunication Systems, February 28, 2024, Kyiv, Ukraine
EMAIL: avd29@ukr.net (A. Sahun); allif0111@gmail.com (V. Khaidurov); vbb.wlp@gmail.com (V. Bobkov)
ORCID: 0000-0002-5151-9203 (A. Sahun); 0000-0002-4805-8880 (V. Khaidurov); 0000-0002-1567-8186 (V. Bobkov)
©️ 2024 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
technologies for solving applied tasks in object identification and cybersecurity include
the work of the following scholars: S. Schuster [9], L. Ljung, C. Andersson, K. Tiels [10],
O. Nelles [11], I. Goodfellow, Y. Bengio, A. Courville [12], C.-J. Lin [13], T. Schön [14],
D. Kandamali, X. Cao, M. Tian, Z. Jin, H. Dong, K. Yu [15], S. Bickler [16],
B. Akhmetov [7], and others.

In most cases, overcoming the aforementioned challenges makes it possible to obtain a
correct mathematical model of the object identification system. The same systems can be
used as a computer vision system or for other practical applications. To obtain an object
identification system, it is crucial to obtain a mathematical model on which such a system
will operate [17]. To achieve this, it is necessary to analyze existing algorithms, models,
and methods applied in intelligent object identification systems.

2. Graphic Object Identification

According to the task conditions, the fundamental property of the identification system is
the need to distinguish and identify objects in individual frames of video content. It is
rational to base such models on the mathematical framework and algorithms of neural
networks.

The advantage of a mathematical model based on neural networks is the ability to learn. In
intelligent computer systems for object identification, machine learning is a factor that
significantly improves the adequacy and accuracy of recognition and identification
algorithms. The most rational type of learning for the mathematical model of the created
system is supervised learning. This type of learning involves using labeled or selected
datasets that contain input data and expected output results. Thus, during the learning
process, the model can perform internal iterations to approach the specified error
boundary. Once the learning error ceases to exceed the specified limits, the model is
considered trained.

Deep learning allows training a model to predict outcomes based on a set of input data. For
network training, both supervised and unsupervised learning can be used. The difference
between these two learning methods is illustrated in Figure 1.

Figure 1: The difference between two learning methods (Machine Learning and Deep Learning)

In the neural network used as the basis for the object identification system, multiple
layers of neurons are envisaged. The input layer of the neural network takes initial input
data. In this case, there are four neurons in the input layer: intensity of each pixel,
Haar features for each of the considered objects (trees, cars, roads, sky, pedestrians,
etc.). The input layer passes this data to the first hidden layer. Hidden layers perform
mathematical computations with the input data. The term “depth” in “deep learning” refers
to having more than one hidden layer. The output layer produces the final result. In this
case, it is the identification of the type of object present in a specific image. The
diagram of the obtained neural network model is shown in Figure 2.

Figure 2: The best forecasting models obtained by the method of group accounting of
arguments

Identification of objects based on this machine learning method occurs through the
reevaluation of the weights of connections between neurons. The weight factor determines
the importance of the input data element. When classifying objects, the weight coefficient
in interneuronal connections is the most crucial.

For the practical implementation of the created model, the vgg16() function in MATLAB was
chosen as the basis. This function
represents the architecture of a deep neural network for image classification. It has 16
weight layers: 13 convolutional layers and 3 fully connected layers. The overall
architecture of VGG-16 includes:
1. Input Layer: an image of 224x224 pixels.
2. Convolutional Layers: each convolutional layer has 3x3 filters and ReLU activation. The
number of filters in each convolutional layer increases from 64 to 512. MaxPooling (2x2)
layers are used after each block of convolutional layers.
3. Fully Connected Layers: the first two of the three fully connected layers have 4096
neurons each. ReLU activation is applied to the output neurons of each of these layers.
Dropout (random deactivation) may be applied to prevent overfitting.
4. Output Layer: the third fully connected layer is the output layer with 1000 neurons (the
number of classes in ImageNet). The softmax activation function is used to obtain class
probabilities.

The vgg16 function in MATLAB returns a neural network object but does not include a
specific method for measuring the distance between feature vectors.

There are several methods for measuring the distance between feature vectors: Euclidean
distance; squared Euclidean distance; Manhattan distance; Chebyshev distance; and Hamming
distance [18]. The biggest advantage of the Euclidean distance is its simplicity: it is the
easiest and computationally lightest way to measure the direct path between two points. In
the task of graphic image identification, we use the Euclidean distance to compare the
model’s output values with expected values during training and for classifying objects
based on their features.

In this research, the classical Euclidean distance is utilized. To calculate it, we use the
following expression:

$$\rho(x, x') = \sqrt{\sum_{i=1}^{n} (x_i - x_i')^2},$$

where $x$ is the first n-dimensional vector and $x'$ is the second n-dimensional vector. We
can define these two vectors as follows:

$$x = (x_1; x_2; \ldots; x_n), \quad x' = (x_1'; x_2'; \ldots; x_n').$$

To train a neural network, prepared data needs to be fed into it, and the generated outputs
should be compared with the results from a test dataset. For the training of the neural
network, test examples from the Cambridge-driving Labeled Video Database (CamVid) were
used. This database represents the first collection of videos with semantic labels of
object classes, complemented with metadata. The database provides ground truth labels
associating each pixel with one of 32 semantic classes. It was obtained from a freely
available internet source.

This database addresses the need for experimental data to quantitatively evaluate
identification and classification algorithms. For each pair of objects, the “distance”
between them is measured, representing the degree of similarity.

A model that provides a minimum of the external criterion is considered optimal. With an
increase in the number of variables in the model and the degree of the reference
polynomial, the cost of obtaining the best forecasting model can increase significantly.

The obtained model is practically implemented in the MATLAB environment. The aforementioned
video database of test samples was used for training the neural network. This video
database contains 10 minutes of footage at a frame rate of 30 Hz. Images are segmented
using corresponding semantic labels at a frequency of 1 Hz and partially at 15 Hz. The
CamVid database has four datasets corresponding to the studied objects. They include:
1. Pixel-level semantic segmentation for more than 700 images (segmentation performed
manually), later verified and confirmed by a second person for accuracy.
2. High-quality color video images in high resolution, collected in the database. These
images represent digital video recordings with a long duration of video content.
3. Calibration sequences for the color response and internal camera characteristics,
considering the point of view of a typical surveillance camera and the fixation of each
frame in the sequences.
4. Software for labeling, proposed to support the expansion of the database (it assists
users who want to perform accurate labeling of classes for other images and videos).

The relevance of the database is evaluated by measuring the algorithm’s performance in each
of three different areas: object recognition in multiple classes, pedestrian detection, and
label propagation.

3. Training and Testing the Identification System Model

To ensure the operational functionality of the model, the following steps were taken:
loading test datasets from the CamVid database and preparing a repository for the loaded
test samples.

The individual classes of identified objects are declared: “Sky,” “Building,” “Pole,”
“Road,” “Pavement,” “Tree,” “SignSymbol,” “Fence,” “Car,” “Pedestrian,” “Bicyclist”.

The resolution of training frames is set to 360×480 points of the video stream:
imageSize = [360 480 3].

A neural network model for the identification of graphical objects returns a specific set
of numerical values for the identified objects. To obtain the central value of an ordered
set of such data, we use the median() function in the MATLAB environment.

The initialization of parameters for neural network training and the definition of the
error function for the neural network are provided in Table 1.

Table 1
Parameters (arguments) of the neural network training function
Argument name       Value
Momentum            0.9
InitialLearnRate    1×10^-3
L2Regularization    5×10^-4

As we see from Table 1, the trainingOptions() function has 9 arguments:
• sgdm (Stochastic Gradient Descent with Momentum).
• Momentum (momentum helps accelerate the optimization process by incorporating information
from previous iterations).
• InitialLearnRate (initial neural network learning rate).
• L2Regularization (L2 regularization of neural network training, i.e. weight decay).
• MaxEpochs (number of full passes through the entire dataset during network training).
• MiniBatchSize (mini-batch size, the number of examples used to update the gradient at
each iteration).
• Shuffle with the value every-epoch (shuffling data at each epoch during training).
• VerboseFrequency (the frequency of displaying training progress information in the
command window).

Further, it is necessary to define the classifier’s grading scales to perform cluster
analysis and the final classification of identified graphical objects. As reference data,
we take the informational component of the color channels of the identified image
{R ∈ [0; 255], G ∈ [0; 255], B ∈ [0; 255]} as follows:
• Reference information vector for sky identification:
128 128 128; ... % “Sky”.
• Reference information vector for building and structure identification:
000 128 064; ... % “Bridge”
128 000 000; ... % “Building”
064 192 000; ... % “Wall”
064 000 064; ... % “Tunnel”
192 000 128; ... % “Archway”.
• Reference information vector for identifying columns, pillars, etc.:
192 192 128; ... % “Column_Pole”
000 000 064; ... % “TrafficCone”.
• Reference information vector for identifying road surface:
128 064 128; ... % “Road”
128 000 192; ... % “LaneMkgsDriv”
192 000 064; ... % “LaneMkgsNonDriv”.
• Reference information vector for identifying sidewalks, pavement, and pedestrian paths:
000 000 192; ... % “Sidewalk”
064 192 128; ... % “ParkingBlock”
128 128 192; ... % “RoadShoulder”.
• Reference information vector for identifying trees, shrubs, and other significant
vegetation areas:
128 128 000; ... % “Tree”
192 192 000; ... % “VegetationMisc”.
• Reference information vector for identifying road signs, informational signs, etc.:
192 128 128; ... % “SignSymbol”
128 128 064; ... % “Misc_Text”
000 064 064; ... % “TrafficLight”.
• Reference information vector for identifying fences and barriers:
064 064 128; ... % “Fence”.
• Reference information vector for identifying vehicles:
064 000 128; ... % “Car”
064 128 192; ... % “SUVPickupTruck”
192 128 192; ... % “Truck_Bus”
192 064 128; ... % “Train”
128 064 064; ... % “OtherMoving”.
• Reference information vector for identifying pedestrians, animals, and light means of
manual cargo transportation:
064 064 000; ... % “Pedestrian”
192 128 064; ... % “Child”
064 000 192; ... % “CartLuggagePram”
064 128 064; ... % “Animal”.
• Reference information vector for identifying light mechanized means of transportation for
people and cargo (motorcycles/bicycles):
000 128 192; ... % “Bicyclist”
192 000 192; ... % “MotorcycleScooter”.

From the provided reference information vectors, it can be seen that some classes of the
identified and subsequently clustered data contain subclasses, namely:
• The ‘Building’ class contains 5 subclasses.
• The ‘Pole’ class contains 2 subclasses.
• The ‘Road’ class contains 3 subclasses.
• The ‘Pavement’ class contains 3 subclasses.
• The ‘Tree’ class contains 2 subclasses.
• The ‘SignSymbol’ class contains 3 subclasses.
• The ‘Car’ class contains 5 subclasses.
• The ‘Pedestrian’ class contains 4 subclasses.
• The ‘Bicyclist’ class contains 2 subclasses.

The color channel reference map for highlighting the classes of identified objects contains
a color channel vector for 11 basic classes. Its representation is shown in Table 2. It is
critical to define a reference map for color channels.

Table 2
Basic classes of model
Class name    Class characteristics (RGB features)
Sky           128 128 128
Building      128 0 0
Pole          192 192 192
Road          128 64 128
Pavement      60 40 222
Tree          128 128 0
SignSymbol    192 128 128
Fence         64 64 128
Car           64 0 128
Pedestrian    64 64 0
Bicyclist     0 128 192

When training the model, the total number of training images is 421, and the number of
test images is 280. As a result of training, the weight of each class can be determined
(Table 3, column ‘Class weight’).

Table 3
The weight of each class
Class name    Class weight         IoU
Sky           0.318184709354742    0.9266
Building      0.208197860785155    0.7987
Pole          5.092367332938507    0.1698
Road          0.174381825257403    0.9518
Pavement      0.710338097812948    0.4188
Tree          0.417518560687874    0.4340
SignSymbol    4.537074815482926    0.3251
Fence         1.838648261914560    0.4920
Car           1.000000000000000    0.0688
Pedestrian    6.605878573155874    0
Bicyclist     5.113338416059593    0

Frequency characteristics of the occurrence of individual classes on specific frames are
also given in column ‘IoU’ of Table 3. As shown in column ‘IoU’, no pedestrians or
bicyclists were identified in the test image.

Through training on the training set, the algorithm based on a deep learning neural network
distinguishes the background from the informational content of identification (here the
object is a car) (Figure 3).

Figure 3: The algorithm of the created identification model separates the background in a
graphical video frame
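As a hedged illustration of how the reference color vectors of Table 2 and the classical Euclidean distance can work together, the following Python sketch assigns an RGB triple to the basic class whose reference vector is nearest. It is not the authors' MATLAB implementation: the names REFERENCE_COLORS, euclidean_distance, and nearest_class are hypothetical, and classifying a single pixel by color alone is a simplifying assumption made only for this example.

```python
import math

# Reference RGB vectors of the 11 basic classes, copied from Table 2.
REFERENCE_COLORS = {
    "Sky": (128, 128, 128),
    "Building": (128, 0, 0),
    "Pole": (192, 192, 192),
    "Road": (128, 64, 128),
    "Pavement": (60, 40, 222),
    "Tree": (128, 128, 0),
    "SignSymbol": (192, 128, 128),
    "Fence": (64, 64, 128),
    "Car": (64, 0, 128),
    "Pedestrian": (64, 64, 0),
    "Bicyclist": (0, 128, 192),
}


def euclidean_distance(x, x_prime):
    """Classical Euclidean distance between two n-dimensional vectors."""
    if len(x) != len(x_prime):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prime)))


def nearest_class(rgb):
    """Return the basic class whose reference vector is closest to rgb.

    Hypothetical helper: a real system would compare richer feature vectors.
    """
    return min(
        REFERENCE_COLORS,
        key=lambda name: euclidean_distance(rgb, REFERENCE_COLORS[name]),
    )
```

For example, `nearest_class((130, 60, 125))` selects "Road", because that triple lies only a few units away from the reference vector 128 64 128 and far from every other row of Table 2.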
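Table 3 reports an IoU value for each class. The Python sketch below shows one common way to compute a per-class intersection over union, IoU = |P ∩ G| / |P ∪ G|, from a predicted label map P and a ground truth label map G. It is an illustration under stated assumptions, not the authors' code: the function name class_iou is hypothetical, the label maps are tiny made-up examples rather than CamVid frames, and returning 0.0 for a class absent from both maps is a convention chosen for this sketch.

```python
def class_iou(predicted, ground_truth, class_name):
    """Intersection over union of one class for two equally sized label maps."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        for pred_label, true_label in zip(pred_row, true_row):
            in_pred = pred_label == class_name
            in_true = true_label == class_name
            if in_pred and in_true:
                intersection += 1
            if in_pred or in_true:
                union += 1
    # Assumption: a class absent from both maps gets IoU 0.0 instead of raising.
    return intersection / union if union else 0.0


# Tiny 2x3 toy label maps (made-up data, not taken from CamVid).
ground_truth = [["Sky", "Sky", "Tree"],
                ["Road", "Road", "Car"]]
predicted = [["Sky", "Tree", "Tree"],
             ["Road", "Road", "Road"]]

iou_per_class = {name: class_iou(predicted, ground_truth, name)
                 for name in ("Sky", "Tree", "Road", "Car")}
```

In this toy example the car pixel is never predicted, so the "Car" entry is 0.0, which is the same situation as the zero IoU rows for Pedestrian and Bicyclist in Table 3.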
As an example, to showcase the operation of the developed object identification system, the
graphical frame depicted in Figure 4 has been loaded.

Figure 4: Original graphical frame for testing the identification model

As a result of the developed algorithm, we observe the identification results on the
demonstration test frame of objects subjected to further cluster analysis. The legend for
the identification algorithm classes is provided in Figure 5.

Figure 5: The legend for the identification algorithm classes

The practical application of the developed identification system implies that the
identification results are presented not in a visual (graphic) form but in the form of a
numerical data array, allowing these results to be further used in more complex systems. To
achieve this, we construct a histogram of the frequency of occurrence of identified classes
and subclasses of objects in the identification zone of video surveillance cameras
(Figure 6).

Figure 6: A histogram of the frequency of occurrence of identified classes and subclasses
of objects in the identification zone of video surveillance cameras

By comparing deviations of the obtained numerical arrays with reference ones, the developed
system for identifying graphic objects can make decisions regarding the detection of
specific incidents or determine certain reactions of the system.

4. Conclusions

As a result of testing the implemented algorithm in the MATLAB environment, the accuracy of
the graphic information identification system model was proven to be high. The
identification of objects from the video data stream of perimeter surveillance cameras
demonstrated an average accuracy of 96.38% across all cameras in the test examples,
consisting of 12 video fragments with a duration of 68.35 seconds each. These results were
achieved due to several factors, including:
• The type of neural network and thoughtfully chosen parameters of the neural network
training function.
• The method for determining the similarity measure of an object to existing classes
(distance measure for clustering) through Euclidean distance.
• Using the Cambridge (CamVid) labeled video database as a collection of videos with
semantic labels of object classes, accompanied by metadata. This
database facilitated effective training of the foundation of the identification system, the
neural network model.
• Correct definition of the classifier’s grading scales to perform cluster analysis and the
final classification of identified graphical objects, and others.

As seen from the results of the image identification model, incorporating Haar features
into the neural network model yields excellent results in the classification and
identification of images of various types.

References

[1] B. Bebeshko, et al., Application of Game Theory, Fuzzy Logic and Neural Networks for
Assessing Risks and Forecasting Rates of Digital Currency, J. Theor. Appl. Inf. Technol.
100(24) (2022) 7390–7404.
[2] K. Khorolska, et al., Application of a Convolutional Neural Network with a Module of
Elementary Graphic Primitive Classifiers in the Problems of Recognition of Drawing
Documentation and Transformation of 2D to 3D Models, J. Theor. Appl. Inf. Technol. 100(24)
(2022) 7426–7437.
[3] V. Sokolov, P. Skladannyi, A. Platonenko, Video Channel Suppression Method of Unmanned
Aerial Vehicles, in: IEEE 41st International Conf. on Electronics and Nanotechnology (2022)
473–477. doi: 10.1109/ELNANO54667.2022.9927105.
[4] V. Zhebka, et al., Optimization of Machine Learning Method to Improve the Management
Efficiency of Heterogeneous Telecommunication Network, in: Cybersecurity Providing in Inf.
and Telecom. Syst., vol. 3288 (2022) 149–155.
[5] V. Moskalenko, A. Korobov, Optimization Parameters of Intellectual Identification
System of Objects on the Terrain, Radioelectron. Comput. Syst. 2 (2019) 32–39. doi:
10.32620/reks.2016.2.05.
[6] V. Lakhno, et al., Information Security Audit Method Based on the Use of a Neuro-Fuzzy
System, Software Engineering Application in Informatics, LNNS 232 (2021) 171–184. doi:
10.1007/978-3-030-90318-3_17.
[7] O. Herasina, V. Korniienko, The Algorithms of Global and Local Optimization in Tasks of
Identification of Difficult Dynamic Systems, Inf. Processing Syst. 6(87) (2010) 73–77.
[8] V. Lakhno, et al., Development Strategy Model of the Informational Management Logistic
System of a Commercial Enterprise by Neural Network Apparatus, in: Cybersecurity Providing
in Inf. and Telecom. Syst., vol. 2746 (2020) 87–98.
[9] H. Schuster, Deterministic Chaos: Introduction and Recent Results, Nonlinear Dyn. Solids
(1992) 22–30. doi: 10.1007/978-3-642-95650-8_2.
[10] L. Ljung, et al., Deep Learning and System Identification, IFAC-PapersOnLine 53(2)
(2020) 1175–1181. doi: 10.1016/j.ifacol.2020.12.1329.
[11] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural and
Fuzzy Models, Springer (2001). doi: 10.1007/978-3-662-04323-3.
[12] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, The MIT Press (2016).
[13] C.-J. Lin, SISO Nonlinear System Identification Using a Fuzzy-Neural Hybrid System,
Int. J. Neural Syst. 08(03) (1997) 325–337. doi: 10.1142/s0129065797000331.
[14] C. Andersson, N. Wahlström, T. Schön, Learning Deep Autoregressive Models for
Hierarchical Data, IFAC-PapersOnLine 54(7) (2021) 529–534. doi:
10.1016/j.ifacol.2021.08.414.
[15] D. Kandamali, et al., Machine Learning Methods for Identification and Classification
of Events in ϕ-OTDR Systems: a review, Appl. Opt. 61(11) (2022) 2975. doi:
10.1364/ao.444811.
[16] S. Bickler, Machine Learning Identification and Classification of Historic Ceramics,
Archaeology in New Zealand 61 (2018) 48–58.
[17] V. Buriachok, et al., Invasion Detection Model using Two-Stage Criterion of Detection
of Network Anomalies, in: Cybersecurity Providing in Inf. and Telecom. Syst., vol. 2746
(2020) 23–32.
[18] J.-H. Lee, Minimum Euclidean Distance Evaluation Using Deep Neural Networks, AEU – Int.
J. Electron. Commun. 112 (2019) 152964. doi: 10.1016/j.aeue.2019.152964.