Model of Graphic Object Identification in a Video Surveillance System based on a Neural Network

Andii Sahun1, Vladyslav Khaidurov2, and Viktor Bobkov2

1 National University of Life and Environmental Sciences of Ukraine, 15 Heroyiv Oborony str., Kyiv, 03041, Ukraine
2 National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute," 37 Peremohy ave., Solomyanskyi district, Kyiv, 03056, Ukraine

CPITS-2024: Cybersecurity Providing in Information and Telecommunication Systems, February 28, 2024, Kyiv, Ukraine
EMAIL: avd29@ukr.net (A. Sahun); allif0111@gmail.com (V. Khaidurov); vbb.wlp@gmail.com (V. Bobkov)
ORCID: 0000-0002-5151-9203 (A. Sahun); 0000-0002-4805-8880 (V. Khaidurov); 0000-0002-1567-8186 (V. Bobkov)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

Abstract
Given the correct model selection and settings, the object identification system enables accurate and fast identification of graphical objects in video data. The identification system is based on a deep learning neural network. Training the neural network model on the CamVid benchmark video dataset makes it possible to use ground truth labels that associate each pixel with one of 32 semantic classes. In total, 421 images are used for training and 280 for testing. Selecting optimal parameters for the training function, and supporting identification with a method for measuring the distance between feature vectors, gives the required result: the identification of objects from the video data stream of perimeter IP cameras demonstrated an average accuracy of 99.7% across all cameras in the test examples, which consisted of 12 video fragments of 70 seconds each. The developed algorithm can identify objects of 11 classes in the graphical content of IP camera streams.

Keywords
Deep machine learning, neural network, classifier, clustering, cluster analysis, pattern recognition.

1. Introduction
The creation and implementation of object identification systems within various technical systems is driven by demand from a large number of contemporary information systems. Today, effective and high-quality identification of various objects can only be achieved by intelligent systems based on artificial intelligence or machine learning algorithms. Neural networks, in various forms, frequently serve as the basis for object identification systems. When using neural networks, it is crucial to correctly determine the type of network and to choose a test video database for proper training of the network [1, 2]. Otherwise, the classification accuracy of the resulting object identification system may be low. Another important aspect is the formation of the classifier's gradation scale in the identification system.

When video data must be analyzed, it is critical to create a function that analyzes video frames and uses test databases for initial object identification in individual static frames of the video sequence [3]. A separate challenge in object identification models based on neural networks is the classification of recognized objects [4].

Significant contributions to the creation of algorithms and methods for object identification have been made by V. Kornienko, L. Budkova, A. Korobov, A. Korobov [1], V. Lakhno, V. Chubaievskyi, K. Palaguta [6], V. Kornienko, I. Gulina, L. Budkova [7], and O. Kryvoruchko, A. Desiatko, A. Blozva, V. Semidotska [8]. Publications on the use of neural networks and machine learning
technologies for solving applied tasks in object identification and cybersecurity include the work of the following scholars: H. Schuster [9], L. Ljung, C. Andersson, K. Tiels [10], O. Nelles [11], I. Goodfellow, Y. Bengio, A. Courville [12], C.-J. Lin [13], T. Schön [14], D. Kandamali, X. Cao, M. Tian, Z. Jin, H. Dong, K. Yu [15], S. Bickler [16], B. Akhmetov [7], and others.

In most cases, overcoming the aforementioned challenges makes it possible to obtain a correct mathematical model of the object identification system. Such systems can also be used as computer vision systems or for other practical applications. To build an object identification system, it is crucial to obtain the mathematical model on which the system will operate [17]. To achieve this, it is necessary to analyze the existing algorithms, models, and methods applied in intelligent object identification systems.

2. Graphic Object Identification
According to the task conditions, the fundamental property of the identification system is the need to distinguish and identify objects in individual frames of video content. It is therefore rational to base such models on the mathematical framework and algorithms of neural networks.

The advantage of a mathematical model based on neural networks is its ability to learn. In intelligent computer systems for object identification, machine learning significantly improves the adequacy and accuracy of recognition and identification algorithms. The most rational type of learning for the mathematical model of the created system is supervised learning, which uses labeled (selected) datasets containing both the input data and the expected output results. During the learning process, the model performs internal iterations to approach the specified error boundary. Once the training error no longer exceeds the specified limit, the model is considered trained.

Deep learning makes it possible to train a model to predict outcomes from a set of input data. Both supervised and unsupervised learning can be used to train the network. The difference between these two learning methods is illustrated in Figure 1.

Figure 1: The difference between two learning methods (Machine Learning and Deep Learning)

The neural network used as the basis for the object identification system has multiple layers of neurons. The input layer takes the initial input data; in this case, there are four neurons in the input layer: the intensity of each pixel and the Haar features for each of the considered objects (trees, cars, roads, sky, pedestrians, etc.). The input layer passes these data to the first hidden layer. The hidden layers perform mathematical computations on the input data; the term "depth" in "deep learning" refers to having more than one hidden layer. The output layer produces the final result, in this case, the identification of the type of object present in a specific image. The diagram of the obtained neural network model is shown in Figure 2.

Figure 2: The best forecasting models obtained by the method of group accounting of arguments (group method of data handling, GMDH)

Object identification based on this learning method occurs through the re-evaluation of the weights of the connections between neurons. A weight determines the importance of an input data element. When classifying objects, the weight coefficients of the interneuronal connections are the most crucial factor.

For the practical implementation of the created model, the vgg16() function in MATLAB was chosen as the basis. This function represents the architecture of a deep neural network for image classification. It includes 16 convolutional and fully connected layers.
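The layer count can be made concrete with a short sketch. It is written in Python rather than MATLAB, does not call vgg16(), and simply encodes the standard VGG-16 configuration (configuration "D" of Simonyan and Zisserman) as an assumption; the names VGG16_CONV_BLOCKS, FC_LAYERS, and describe_vgg16 are hypothetical. Pooling layers carry no weights, so they are not counted among the 16 layers.

```python
# Standard VGG-16 layer configuration (assumed: configuration "D" of the
# original VGG paper). Each inner list is one convolutional block; the numbers
# are the filter counts of its 3x3 convolutions (ReLU after each). A 2x2
# max-pooling layer with stride 2 follows every block.
VGG16_CONV_BLOCKS = [
    [64, 64],
    [128, 128],
    [256, 256, 256],
    [512, 512, 512],
    [512, 512, 512],
]
# Two hidden fully connected layers and the 1000-class ImageNet output layer.
FC_LAYERS = [4096, 4096, 1000]


def describe_vgg16(input_size=224):
    """Return (conv layers, FC layers, final feature-map side, flattened size)."""
    conv_count = sum(len(block) for block in VGG16_CONV_BLOCKS)
    side = input_size
    for _ in VGG16_CONV_BLOCKS:
        # Padded 3x3 convolutions keep the spatial size; each pooling halves it.
        side //= 2
    channels = VGG16_CONV_BLOCKS[-1][-1]
    return conv_count, len(FC_LAYERS), side, side * side * channels


if __name__ == "__main__":
    conv, fc, side, flat = describe_vgg16()
    print(f"{conv} convolutional + {fc} fully connected = {conv + fc} weight layers")
    print(f"final feature map {side}x{side}x512 -> {flat} inputs to the first FC layer")
```

For a 224×224 input, the five pooling stages reduce the feature map to 7×7×512, that is, 25,088 values feeding the first fully connected layer.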
The given function has 16 layers, including 13 convolutional layers and 3 fully connected layers. The overall architecture of VGG-16 includes:
1. Input layer: a 224×224 pixel image.
2. Convolutional layers: each convolutional layer has 3×3 filters and ReLU activation. The number of filters in each convolutional layer increases from 64 to 512. MaxPooling (2×2) layers are used after each block of convolutional layers.
3. Fully connected layers: three fully connected layers, the first two with 4096 neurons each. ReLU activation is applied to the outputs of these two layers, and dropout (random deactivation) may be applied to prevent overfitting.
4. Output layer: the last fully connected layer has 1000 neurons (the number of classes in ImageNet). The softmax activation function is used to obtain class probabilities.

The vgg16 function in MATLAB returns a neural network object but does not include a specific method for measuring the distance between feature vectors.

Commonly used measures of the distance between feature vectors include the Euclidean distance, the squared Euclidean distance, the Manhattan distance, the Chebyshev distance, and the Hamming distance [18]. The main advantage of the Euclidean distance is its simplicity: it is a simple, inexpensive way to compute the length of the straight path between two points. In the task of graphic image identification, we use the Euclidean distance to compare the model's output values with the expected values during training and to classify objects based on their features.

In this research, the classical Euclidean distance is used. It is calculated with the following expression:

$$\rho(x, x') = \sqrt{\sum_{i=1}^{n} (x_i - x_i')^2},$$

where x_i and x_i′ are the i-th components of the two n-dimensional feature vectors being compared, x and x′. These two vectors are defined as follows:

$$x = (x_1, x_2, \dots, x_n), \qquad x' = (x_1', x_2', \dots, x_n').$$

To train a neural network, prepared data must be fed into it, and the generated outputs must be compared with the results from a test dataset. The neural network was trained on examples from the labeled Cambridge video database (CamVid). This database was the first collection of videos with semantic labels of object classes, complemented with metadata. It provides ground truth labels that associate each pixel with one of 32 semantic classes, and it is freely available on the internet.

This database addresses the need for experimental data to quantitatively evaluate identification and classification algorithms. For each pair of objects, the "distance" between them is measured, representing their degree of similarity. The model that provides the minimum of the external criteria is considered optimal. As the number of variables in the model and the degree of the reference polynomial increase, the chance of obtaining the best forecasting model can increase significantly.

The obtained model is implemented in practice in the MATLAB environment. The aforementioned video database of test samples was used to train the neural network. This video database contains 10 minutes of video recorded at 30 Hz. The images are segmented with the corresponding semantic labels at a frequency of 1 Hz and partially at 15 Hz. The CamVid database has four datasets corresponding to the studied objects. They include:
1. Pixel-level semantic segmentation of more than 700 images; the segmentation was performed manually and later verified and confirmed by a second person for accuracy.
2. High-quality, high-resolution color video images collected in the database. These images are digital video recordings of long duration.
3. Calibrated sequences for the color response and internal characteristics of the camera, accounting for the point of view (as for a typical surveillance camera) and recording each frame in the sequences.
4. To support the expansion of the database, labeling software is proposed to assist users who want to accurately label training classes in other images and videos.

The relevance of the database is evaluated by measuring an algorithm's performance in each of three different areas: multi-class object recognition, pedestrian detection, and label propagation.
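A minimal Python sketch of the Euclidean distance ρ(x, x′) defined above, alongside the alternative measures listed earlier, is given below for comparison. It is illustrative only (the authors work in MATLAB), and the function names are hypothetical; in Python 3.8 and later, the standard-library function math.dist returns the same Euclidean distance.

```python
import math
from typing import Sequence


def _pairs(x: Sequence[float], x_prime: Sequence[float]):
    """Pair up components, rejecting vectors of different dimension n."""
    if len(x) != len(x_prime):
        raise ValueError("vectors must have the same dimension n")
    return zip(x, x_prime)


def euclidean(x, x_prime):
    """rho(x, x') = sqrt(sum_i (x_i - x'_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in _pairs(x, x_prime)))


def squared_euclidean(x, x_prime):
    return sum((a - b) ** 2 for a, b in _pairs(x, x_prime))


def manhattan(x, x_prime):
    return sum(abs(a - b) for a, b in _pairs(x, x_prime))


def chebyshev(x, x_prime):
    return max(abs(a - b) for a, b in _pairs(x, x_prime))


def hamming(x, x_prime):
    """Number of positions in which the two vectors differ."""
    return sum(1 for a, b in _pairs(x, x_prime) if a != b)
```

For x = (1, 2, 3) and x′ = (4, 6, 3), the Euclidean distance is 5, the squared Euclidean distance 25, the Manhattan distance 7, the Chebyshev distance 4, and the Hamming distance 2.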
3. Training and Testing the Identification System Model
To ensure the operational functionality of the model, the following steps were taken: the test datasets were loaded from the CamVid database, and a repository for the loaded test samples was prepared.

The individual classes of identified objects were declared: "Sky," "Building," "Pole," "Road," "Pavement," "Tree," "SignSymbol," "Fence," "Car," "Pedestrian," "Bicyclist."

The resolution of the training frames is set to 360×480 pixels of the video stream (three color channels):
    imageSize = [360 480 3];

A neural network model for the identification of graphical objects returns a specific set of numerical values for the identified objects. To obtain the central value of an ordered set of such data, we use the median() function in the MATLAB environment.

The initialization of the parameters for neural network training and the definition of the error function for the neural network are provided in Table 1.

Table 1
Parameters (arguments) of the neural network training function
Argument name        Value
Momentum             0.9
InitialLearnRate     1×10⁻³
L2Regularization     5×10⁻⁴

As Table 1 and the list below show, the trainingOptions() function is called with the following arguments:
• 'sgdm' (stochastic gradient descent with momentum);
• 'Momentum' (momentum accelerates the optimization process by incorporating information from previous iterations);
• 'InitialLearnRate' (initial learning rate);
• 'L2Regularization' (L2 regularization of the neural network, i.e., weight decay);
• 'MaxEpochs' (number of full passes through the entire dataset during network training);
• 'MiniBatchSize' (mini-batch size, the number of examples used to update the gradient at each iteration);
• 'Shuffle', 'every-epoch' (shuffling the data at every epoch during training);
• 'VerboseFrequency' (the frequency of displaying training progress information in the command window).

Further, it is necessary to define the classifier's grading scales to perform cluster analysis and the final classification of the identified graphical objects. As reference data, we take the information in the color channels of the identified image, {R ∈ [0, 255], G ∈ [0, 255], B ∈ [0, 255]}, as follows:
• Reference information vector for sky identification:
    128 128 128; ... % "Sky"
• Reference information vector for building and structure identification:
    000 128 064; ... % "Bridge"
    128 000 000; ... % "Building"
    064 192 000; ... % "Wall"
    064 000 064; ... % "Tunnel"
    192 000 128; ... % "Archway"
• Reference information vector for identifying columns, pillars, etc.:
    192 192 128; ... % "Column_Pole"
    000 000 064; ... % "TrafficCone"
• Reference information vector for identifying the road surface:
    128 064 128; ... % "Road"
    128 000 192; ... % "LaneMkgsDriv"
    192 000 064; ... % "LaneMkgsNonDriv"
• Reference information vector for identifying sidewalks, pavement, and pedestrian paths:
    000 000 192; ... % "Sidewalk"
    064 192 128; ... % "ParkingBlock"
    128 128 192; ... % "RoadShoulder"
• Reference information vector for identifying trees, shrubs, and other significant vegetation areas:
    128 128 000; ... % "Tree"
    192 192 000; ... % "VegetationMisc"
• Reference information vector for identifying road signs, informational signs, etc.:
    192 128 128; ... % "SignSymbol"
    128 128 064; ... % "Misc_Text"
    000 064 064; ... % "TrafficLight"
• Reference information vector for identifying fences and barriers:
    064 064 128; ... % "Fence"
• Reference information vector for identifying vehicles:
    064 000 128; ... % "Car"
    064 128 192; ... % "SUVPickupTruck"
    192 128 192; ... % "Truck_Bus"
    192 064 128; ... % "Train"
    128 064 064; ... % "OtherMoving"
• Reference information vector for identifying pedestrians, animals, and light means of manual cargo transportation:
    064 064 000; ... % "Pedestrian"
    192 128 064; ... % "Child"
    064 000 192; ... % "CartLuggagePram"
    064 128 064; ... % "Animal"
• Reference information vector for identifying light mechanized means of transportation for people and cargo (motorcycles/bicycles):
    000 128 192; ... % "Bicyclist"
    192 000 192; ... % "MotorcycleScooter"

From the reference information vectors above, it can be seen that some classes of the identified and subsequently clustered data contain subclasses, namely:
• The 'Building' class contains 4 embedded subclasses.
• The 'Pole' class contains 2 subclasses.
• The 'Road' class contains 3 subclasses.
• The 'Pavement' class contains 3 subclasses.
• The 'Tree' class contains 2 subclasses.
• The 'SignSymbol' class contains 3 subclasses.
• The 'Car' class contains 5 subclasses.
• The 'Pedestrian' class contains 4 subclasses.
• The 'Bicyclist' class contains 2 subclasses.

The color channel reference map used to highlight the classes of identified objects contains one color channel vector for each of the 11 basic classes; it is shown in Table 2. Defining this reference map for the color channels correctly is critical.

Table 2
Basic classes of the model
Class name     Class characteristics (RGB features)
Sky            128 128 128
Building       128 0 0
Pole           192 192 192
Road           128 64 128
Pavement       60 40 222
Tree           128 128 0
SignSymbol     192 128 128
Fence          64 64 128
Car            64 0 128
Pedestrian     64 64 0
Bicyclist      0 128 192

When training the model, the total number of training images is 421, and the number of test images is 280. As a result of training, the weight of each class can be determined (Table 3, column "Class weight").

Table 3
The weight of each class
Class name     Class weight          IoU
Sky            0.318184709354742     0.9266
Building       0.208197860785155     0.7987
Pole           5.092367332938507     0.1698
Road           0.174381825257403     0.9518
Pavement       0.710338097812948     0.4188
Tree           0.417518560687874     0.4340
SignSymbol     4.537074815482926     0.3251
Fence          1.838648261914560     0.4920
Car            1.000000000000000     0.0688
Pedestrian     6.605878573155874     0
Bicyclist      5.113338416059593     0

Column "IoU" of Table 3 gives, for each class, the intersection over union between the identified regions and the ground truth on the test frames. As this column shows, no pedestrians or bicyclists in the test image were identified (IoU = 0).

Through training on the training set, the algorithm based on a deep learning neural network learns to distinguish the background from the informational content of identification (object: car) (Figure 3).

Figure 3: The algorithm of the created identification model separates the background in a graphical video frame

As an example, to showcase the operation of the developed object identification system, the graphical frame depicted in Figure 4 was loaded.

Figure 4: Original graphical frame for testing the identification model

Running the developed algorithm, we observe the identification results on the demonstration test frame for the objects that are subjected to further cluster analysis. The legend for the identification algorithm classes is provided in Figure 5.

Figure 5: The legend for the identification algorithm classes

In the practical application of the developed identification system, the identification results are presented not in a visual (graphic) form but as a numerical data array, which allows these results to be used further in more complex systems. To achieve this, we construct a histogram of the frequency of occurrence of the identified classes and subclasses of objects in the identification zone of the video surveillance cameras (Figure 6).

Figure 6: A histogram of the frequency of occurrence of identified classes and subclasses of objects in the identification zone of video surveillance cameras

By comparing the deviations of the obtained numerical arrays from the reference ones, the developed system for identifying graphic objects can make decisions regarding the detection of specific incidents or determine certain reactions of the system.
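The effect of the Table 1 values on a single training update can be illustrated with a toy sketch. It assumes the textbook rule for stochastic gradient descent with momentum, with L2 weight decay added to the gradient, θ ← θ − α(∇E(θ) + λθ) + γ(θ − θ_prev), where α is InitialLearnRate, γ is Momentum, and λ is L2Regularization. For simplicity the decay is applied to every parameter, and the gradient is the exact gradient of a one-parameter quadratic loss rather than a mini-batch estimate. The function names are hypothetical, and this is not the MATLAB implementation behind trainingOptions('sgdm', ...).

```python
def sgdm_l2_step(theta, velocity, grad, lr=1e-3, momentum=0.9, l2=5e-4):
    """One SGDM update with L2 weight decay; defaults are the Table 1 values.

    Equivalent to theta_new = theta - lr * (grad + l2 * theta)
                              + momentum * (theta - theta_previous).
    """
    # L2 regularization adds l2 * theta to the gradient of the loss.
    reg_grad = [g + l2 * t for g, t in zip(grad, theta)]
    # The velocity keeps a momentum-weighted memory of the previous steps.
    velocity = [momentum * v - lr * g for v, g in zip(velocity, reg_grad)]
    theta = [t + v for t, v in zip(theta, velocity)]
    return theta, velocity


def minimize_quadratic(target=3.0, steps=3000):
    """Toy run: minimize E(theta) = 0.5 * (theta - target)**2 from theta = 0."""
    theta, velocity = [0.0], [0.0]
    for _ in range(steps):
        grad = [theta[0] - target]  # exact dE/dtheta (no mini-batch noise)
        theta, velocity = sgdm_l2_step(theta, velocity, grad)
    return theta[0]
```

Starting from θ = 0, the iterate settles at 3/(1 + 5×10⁻⁴) ≈ 2.9985 rather than at 3: the L2 term pulls the solution slightly toward zero, which is the intended weight-decay effect.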
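One way to combine the reference map of Table 2 with the Euclidean distance is to assign an RGB triple to the class whose reference vector is nearest. The paper does not spell out this step, so the sketch below is an illustration under that assumption. It is written in Python rather than MATLAB, CLASS_COLORS and nearest_class are hypothetical names, and math.dist (Python 3.8+) computes the Euclidean distance.

```python
import math

# Reference RGB vectors of the 11 basic classes (values from Table 2).
CLASS_COLORS = {
    "Sky": (128, 128, 128),
    "Building": (128, 0, 0),
    "Pole": (192, 192, 192),
    "Road": (128, 64, 128),
    "Pavement": (60, 40, 222),
    "Tree": (128, 128, 0),
    "SignSymbol": (192, 128, 128),
    "Fence": (64, 64, 128),
    "Car": (64, 0, 128),
    "Pedestrian": (64, 64, 0),
    "Bicyclist": (0, 128, 192),
}


def nearest_class(rgb, colors=CLASS_COLORS):
    """Return (class name, Euclidean distance) of the closest reference vector."""
    if len(rgb) != 3 or any(not 0 <= c <= 255 for c in rgb):
        raise ValueError("expected an (R, G, B) triple with channels in [0, 255]")
    return min(
        ((name, math.dist(rgb, ref)) for name, ref in colors.items()),
        key=lambda item: item[1],
    )
```

For example, nearest_class((70, 10, 120)) returns "Car" at a distance of √200 ≈ 14.1, because (64, 0, 128) is the closest reference vector.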
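The paper does not say how the class weights in Table 3 were obtained. The value of exactly 1 for "Car" is consistent with median frequency balancing. In that scheme, each weight is median(f)/f_c, where f_c is the pixel frequency of class c, so the class holding the median frequency receives weight 1. The MathWorks CamVid semantic segmentation example computes weights this way, using pixel counts from countEachLabel followed by median(). The sketch below assumes this scheme; the function name and the pixel counts in the usage note are hypothetical.

```python
from statistics import median


def median_frequency_weights(pixel_count, image_pixel_count):
    """Median frequency balancing: weight_c = median(freq) / freq_c.

    pixel_count[c]       -- number of pixels labeled with class c
    image_pixel_count[c] -- total pixels of the images in which class c occurs
                            (every class is assumed to occur at least once)
    """
    freq = {c: pixel_count[c] / image_pixel_count[c] for c in pixel_count}
    med = median(freq.values())
    return {c: med / f for c, f in freq.items()}
```

With made-up counts of 500, 100, and 10 pixels for Road, Car, and Pedestrian, each out of 1,000 image pixels, the weights are 0.2, 1.0, and 10.0. Rare classes are up-weighted, which mirrors the large Pedestrian and Bicyclist weights in Table 3.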
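Assuming the IoU column of Table 3 uses the standard definition IoU_c = TP_c / (TP_c + FP_c + FN_c), the metric compares the pixels predicted as class c with the pixels labeled c in the ground truth. The small Python sketch below computes it over aligned label lists; the function name is hypothetical. In MATLAB, the Computer Vision Toolbox function evaluateSemanticSegmentation reports this metric per class.

```python
def per_class_iou(predicted, ground_truth, classes):
    """IoU_c = TP / (TP + FP + FN) over aligned per-pixel label sequences."""
    if len(predicted) != len(ground_truth):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(predicted, ground_truth))
    result = {}
    for c in classes:
        tp = sum(1 for p, g in pairs if p == c and g == c)  # correctly labeled c
        fp = sum(1 for p, g in pairs if p == c and g != c)  # wrongly labeled c
        fn = sum(1 for p, g in pairs if p != c and g == c)  # missed c pixels
        union = tp + fp + fn
        # A class absent from both prediction and ground truth is undefined.
        result[c] = tp / union if union else float("nan")
    return result
```

For the ground truth ["Road", "Road", "Car", "Pedestrian"] and the prediction ["Road", "Car", "Car", "Road"], the IoU is 1/3 for Road, 0.5 for Car, and 0 for Pedestrian. A class that is never predicted correctly gets an IoU of 0, as Pedestrian and Bicyclist do in Table 3.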
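The histogram-and-deviation step described above can be sketched as follows. The paper does not specify the deviation measure or the decision threshold. This sketch assumes the Euclidean distance between normalized class histograms, which is consistent with the distance measure used elsewhere in the model, together with a hypothetical threshold of 0.2. All function names are hypothetical.

```python
import math
from collections import Counter


def class_histogram(labels, classes):
    """Relative frequency of each class among the identified labels."""
    counts = Counter(labels)
    total = sum(counts[c] for c in classes) or 1  # avoid division by zero
    return [counts[c] / total for c in classes]


def deviation(histogram, reference):
    """Euclidean distance between the observed and reference histograms."""
    return math.dist(histogram, reference)


def is_incident(labels, reference, classes, threshold=0.2):
    """Flag an incident when the class histogram deviates beyond the threshold."""
    return deviation(class_histogram(labels, classes), reference) > threshold
```

Consider classes ["Road", "Car", "Pedestrian"] with a reference histogram of [0.7, 0.3, 0.0]. A frame labeled 7 × Road and 3 × Car deviates by 0 and is not flagged. A frame labeled 5 × Road, 1 × Car, and 4 × Pedestrian deviates by √0.24 ≈ 0.49 and is flagged.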
4. Conclusions
Testing the implemented algorithm in the MATLAB environment showed that the graphic information identification system model achieves high accuracy. The identification of objects from the video data stream of perimeter surveillance cameras demonstrated an average accuracy of 96.38% across all cameras in the test examples, which consisted of 12 video fragments with a duration of 68.35 seconds each. These results were achieved due to several factors, including:
• The type of neural network and the thoughtfully chosen parameters of the neural network training function.
• The method for determining the similarity measure of an object to existing classes (the distance measure for clustering) through the Euclidean distance.
• The use of the Cambridge (CamVid) labeled video database, a collection of videos with semantic labels of object classes accompanied by metadata. This database facilitated effective training of the foundation of the identification system, the neural network model.
• The correct definition of the classifier's grading scales to perform cluster analysis and the final classification of identified graphical objects, among others.

As the results of the image identification model show, incorporating Haar features into the neural network model yields excellent results in the classification and identification of images of various types.

References
[1] B. Bebeshko, et al., Application of Game Theory, Fuzzy Logic and Neural Networks for Assessing Risks and Forecasting Rates of Digital Currency, J. Theor. Appl. Inf. Technol. 100(24) (2022) 7390–7404.
[2] K. Khorolska, et al., Application of a Convolutional Neural Network with a Module of Elementary Graphic Primitive Classifiers in the Problems of Recognition of Drawing Documentation and Transformation of 2D to 3D Models, J. Theor. Appl. Inf. Technol. 100(24) (2022) 7426–7437.
[3] V. Sokolov, P. Skladannyi, A. Platonenko, Video Channel Suppression Method of Unmanned Aerial Vehicles, in: IEEE 41st International Conf. on Electronics and Nanotechnology (2022) 473–477. doi: 10.1109/ELNANO54667.2022.9927105.
[4] V. Zhebka, et al., Optimization of Machine Learning Method to Improve the Management Efficiency of Heterogeneous Telecommunication Network, in: Cybersecurity Providing in Inf. and Telecom. Syst., vol. 3288 (2022) 149–155.
[5] V. Moskalenko, A. Korobov, Optimization Parameters of Intellectual Identification System of Objects on the Terrain, Radioelectron. Comput. Syst. 2 (2019) 32–39. doi: 10.32620/reks.2016.2.05.
[6] V. Lakhno, et al., Information Security Audit Method Based on the Use of a Neuro-Fuzzy System, Software Engineering Application in Informatics, LNNS 232 (2021) 171–184. doi: 10.1007/978-3-030-90318-3_17.
[7] O. Herasina, V. Korniienko, The Algorithms of Global and Local Optimization in Tasks of Identification of Difficult Dynamic Systems, Inf. Processing Syst. 6(87) (2010) 73–77.
[8] V. Lakhno, et al., Development Strategy Model of the Informational Management Logistic System of a Commercial Enterprise by Neural Network Apparatus, in: Cybersecurity Providing in Inf. and Telecom. Syst., vol. 2746 (2020) 87–98.
[9] H. Schuster, Deterministic Chaos: Introduction and Recent Results, Nonlinear Dyn. Solids (1992) 22–30. doi: 10.1007/978-3-642-95650-8_2.
[10] L. Ljung, et al., Deep Learning and System Identification, IFAC-PapersOnLine 53(2) (2020) 1175–1181. doi: 10.1016/j.ifacol.2020.12.1329.
[11] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural and Fuzzy Models, Springer (2001). doi: 10.1007/978-3-662-04323-3.
[12] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, The MIT Press (2016).
[13] C.-J. Lin, SISO Nonlinear System Identification Using a Fuzzy-Neural Hybrid System, Int. J. Neural Syst. 08(03) (1997) 325–337. doi: 10.1142/s0129065797000331.
[14] C. Andersson, N. Wahlström, T. Schön, Learning Deep Autoregressive Models for Hierarchical Data, IFAC-PapersOnLine 54(7) (2021) 529–534. doi: 10.1016/j.ifacol.2021.08.414.
[15] D. Kandamali, et al., Machine Learning Methods for Identification and Classification of Events in ϕ-OTDR Systems: a review, Appl. Opt. 61(11) (2022) 2975. doi: 10.1364/ao.444811.
[16] S. Bickler, Machine Learning Identification and Classification of Historic Ceramics, Archaeology in New Zealand 61 (2018) 48–58.
[17] V. Buriachok, et al., Invasion Detection Model using Two-Stage Criterion of Detection of Network Anomalies, in: Cybersecurity Providing in Inf. and Telecom. Syst., vol. 2746 (2020) 23–32.
[18] J.-H. Lee, Minimum Euclidean Distance Evaluation Using Deep Neural Networks, AEU – Int. J. Electron. Commun. 112 (2019) 152964. doi: 10.1016/j.aeue.2019.152964.