Obstacles and Traffic Signs Tracking System

Anatolii Amarii 1, Stepan Melnychuk 2 and Yuliya Tanasyuk 3
1,2,3 Yuriy Fedkovych Chernivtsi National University, Kotsyubynsky 2, Chernivtsi, 58012, Ukraine

Abstract
The analysis, development, software implementation and testing of methodologies for tracking obstacles and road signs have been performed. The created system utilizes the DeepLab artificial neural network, built on MobileNetV2, for semantic segmentation of the car camera images in order to identify obstacles and to select traffic sign segments. The TrafficSignNet artificial neural network is subsequently used for traffic sign classification. The software is implemented in the Python programming language using the TensorFlow machine learning platform and the OpenCV, SciPy and Skimage computer vision libraries.

Keywords
Artificial neural networks, semantic segmentation, classification, computer vision

ISIT 2021: II International Scientific and Practical Conference «Intellectual Systems and Information Technologies», September 13–19, 2021, Odesa, Ukraine
EMAIL: amarii.anatolii@chnu.edu.ua (A. 1); s.melnychuk@chnu.edu.ua (A. 2); y.tanasyuk@chnu.edu.ua (A. 3)
ORCID: 0000-0001-8650-0521 (A. 3)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Nowadays there are many systems of human assistance in different areas, and the ability to recognize images is increasingly becoming a requirement for such systems. The problem of image recognition is to identify certain patterns in a picture and relate them to predefined classes.

The driver behind the wheel needs to monitor not only the road conditions, but also the indications of the car sensors, such as current speed, engine RPM and position on the GPS map. Although modern cars are designed so that all the necessary information is available in the driver's field of vision, even occasional distraction from the road to a device can lead to unpredictable consequences.

To solve this problem, road tracking assistant systems are developed. Their operation is primarily based on algorithms and methods of road situation analysis with the use of computer vision. The capabilities of such systems include the detection of various obstacles and road signs in the path of the vehicle.

Similar obstacle and road tracking systems have been produced by Continental (in collaboration with DigiLens Inc.) [1] and WayRay [2]. Both companies have implemented full-fledged hardware and software solutions with the use of augmented reality.

The aim of the given research is to develop a methodology for determining obstacles and road signs in the direction of the car movement, with the entities detected from the video stream designated on a computer screen in real time. So, it was decided to explore this area and develop a methodology for the analysis of physical objects located within the car route with the use of edge computing, which will assist in further informing the vehicle driver and facilitate decision-making.

2. System development methodology

To develop an obstacle and road sign tracking system, classification and semantic segmentation by artificial neural networks were used. Artificial neural networks are commonly applied for image processing and perform well in terms of both accuracy and computing speed.

The task of image classification is to determine whether the content of an image belongs to a certain class. In contrast, semantic segmentation is designed to label each pixel of an image correspondingly. Therefore, instead of belonging to one certain class, an image can be related to several categories. As shown in Figure 1, classification would determine that there is a cat in the picture, while segmentation would identify in the same image not only the cat, but also the sky, trees and grass.

Figure 1: Difference between image classification (left side of the picture, where the image of a cat is recognized as a cat) and image segmentation (right side of the picture, where the image is split up into segments of sky, trees, the actual cat and grass) [3]

In the developed system, semantic segmentation by a neural network is used for identifying different traffic objects such as other vehicles, people, buildings, trees, etc. The created software also utilizes the capabilities of a neural network to detect traffic signs, which are recognized by means of classification.
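The practical difference between the two tasks can be illustrated by the shape of the outputs they produce. The short sketch below is illustrative only; the class names follow the cat/sky/tree/grass example of Figure 1, and the array sizes and probability values are arbitrary assumptions.

```python
# Illustrative sketch (not the authors' code): a classifier predicts one label
# for the whole image, while semantic segmentation predicts a label per pixel.
import numpy as np

CLASSES = ["cat", "sky", "tree", "grass"]

# Classification output: one probability vector per image -> a single label.
class_probs = np.array([0.86, 0.05, 0.06, 0.03])               # e.g. softmax output
print("classification:", CLASSES[int(np.argmax(class_probs))])  # -> "cat"

# Segmentation output: one probability vector per pixel (H x W x num_classes)
# -> a dense label map where cat, sky, trees and grass each form a region.
seg_logits = np.random.rand(240, 320, len(CLASSES))  # dummy network output
seg_map = np.argmax(seg_logits, axis=-1)             # shape (240, 320)
print("segmentation map shape:", seg_map.shape)
print("classes present:", {CLASSES[i] for i in np.unique(seg_map)})
```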
2.1. Model of neural network for classification

The analysis of neural network models was performed among those presented in the official GitHub repository of the open machine learning platform TensorFlow [4]. Most of the models considered either require high-power computing systems (such as ResNet and EfficientNet) or were developed for a specific purpose (MARCO). Therefore, it was decided to use a third-party model: the TrafficSignNet network, specially created for road sign recognition [5]. This network benefits from a ready-made training data set and from a simple structure that does not require any complex calculations. The architecture of the deployed neural network is shown in Figure 2.

Figure 2: Architecture of TrafficSignNet

An image (Input image) with a size of 32x32 pixels and 3 color channels is fed to the network's input. The first convolutional layer uses eight 5x5 filters with the ReLU activation function, followed by 2x2 max pooling. Each following layer may differ in the number of filters and their dimensions: in the next two layers, 16 3x3 filters are used, and the number of filters is subsequently increased to 32. The last three layers are fully connected, with the final one containing 43 neurons, corresponding to the number of road sign classes in the training data set.
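The architecture described above can be approximated in Keras as follows. This is a minimal sketch reconstructed from the description in this subsection rather than the authors' code (see [5] for the reference implementation); the exact count of 32-filter convolutional layers and the width of the hidden fully connected layers (128 neurons) are assumptions.

```python
# A minimal Keras sketch of a TrafficSignNet-like classifier, reconstructed from
# the textual description above; layer counts and hidden widths are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_traffic_sign_net(num_classes: int = 43) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(32, 32, 3)),              # 32x32 RGB sign crop
        layers.Conv2D(8, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),                  # 2x2 max pooling
        layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),         # fully connected layers
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # 43 road sign classes
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_traffic_sign_net()
model.summary()
```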
2.2. Model of neural network for semantic segmentation

For image semantic segmentation it was decided to use the DeepLab model [6], which is an example of the "encoder-decoder" architecture. The encoder is a pre-trained classification network. The MobileNetV2 model was chosen as the encoder network; its architecture can be seen in Figure 3.

Figure 3: Architecture of MobileNetV2

The MobileNetV2 architecture contains an initial fully convolutional layer with 32 filters, followed by 19 residual bottleneck layers. The ReLU6 activation function is used to provide nonlinearity because of its reliability in low-precision calculations. In addition, a 3x3 kernel size is used as standard for modern networks, and dropout and batch normalization are applied during training. In Figure 3 the blocks corresponding to the layers of the neural network contain the following notations: the dimension of the input data (h, w, k); the type of layer (conv2d for convolutional, avgpool for average pooling); the output number of channels (c), which determines the parameter k of the next layer; and the stride (s), which determines the parameters h and w of the next layer and the structure of the bottleneck. Below the layers, the number of repetitions of layers with identical parameters is given.

DeepLab applies some modifications to this model, changing the ordinary convolution (Fig. 4a) to an atrous convolution by adding a kernel dilation rate (Fig. 4b) in order to obtain the features computed by deep convolutional neural networks at an arbitrary resolution. This reduces the calculation time without degrading the accuracy.

Figure 4: Types of convolutions: a) ordinary (without dilation rate); b) atrous (with dilation rate) [7]

The task of the decoding network is to semantically project the discriminative features (lower resolution) learned by the encoder network onto the pixel space (higher resolution) to obtain a dense classification.

3. Algorithm of traffic signs and obstacles recognition

The algorithm of traffic signs and obstacles recognition implemented in the developed software system is shown in Figure 5. The description of the algorithm is as follows.

Figure 5: Block diagram of the algorithm

The video camera captures images along the car route (Fig. 6). Each image is pre-processed and fed to the input of the DeepLab segmentation model, which returns a segmentation map (Fig. 7). The resulting segmentation map is divided into segments, and each set of segments is passed for processing to the corresponding module.

Figure 6: Image from car camera

Figure 7: Segmentation map

When the road sign classification module receives a sample (Fig. 8), it breaks it into separate segments, omitting objects that are too small. After that, each of the remaining segments is further processed and applied as an input to the classification network, resulting in the road sign class definition and its corresponding designation in the frame (Fig. 9).

Figure 8: Segments of road signs

Figure 9: The road sign class definition (speed limit)

In the vehicle and pedestrian selection modules, all segments related to these objects are highlighted on the segmentation map (Fig. 10). Then the segments are split up additionally and their spatial characteristics are found. When the ratio of a segment's size to the original image size is greater than a value defined by the spatial characteristics of the segment, an outline in the shape of an ellipse (Fig. 11) is superimposed on the original image (Fig. 12), with its brightness depending on the aspect ratio.

Figure 10: Segments of pedestrians

Figure 11: Ellipse, which highlights pedestrians

Figure 12: Resulting image with highlighted pedestrians and detected traffic sign
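To make the per-frame processing chain of this section concrete, the following sketch wires the two networks from Sections 2.1 and 2.2 together for a single frame. It is a minimal illustration only: the model file names (deeplab_mobilenetv2.h5, traffic_sign_net.h5), the traffic-sign class ID, the 513x513 input size, the assumption that the segmentation map matches the input resolution, and the minimum-size threshold are assumptions, not the authors' actual implementation.

```python
# Simplified sketch of the per-frame pipeline described above; file names,
# class IDs and thresholds are illustrative assumptions.
import cv2
import numpy as np
import tensorflow as tf
from scipy import ndimage

SIGN_CLASS_ID = 12        # hypothetical "traffic sign" label in the segmentation map
MIN_SEGMENT_AREA = 100    # omit segments that are too small to classify

seg_model = tf.keras.models.load_model("deeplab_mobilenetv2.h5")  # assumed file name
sign_model = tf.keras.models.load_model("traffic_sign_net.h5")    # assumed file name

def process_frame(frame_bgr: np.ndarray) -> np.ndarray:
    """Segment one camera frame and classify every detected traffic-sign segment."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    inp = cv2.resize(rgb, (513, 513)).astype(np.float32) / 255.0
    seg_logits = seg_model.predict(inp[None, ...])    # (1, 513, 513, num_classes)
    seg_map = np.argmax(seg_logits[0], axis=-1)       # per-pixel class labels

    # Split the traffic-sign mask into connected components (separate signs).
    labels, _ = ndimage.label(seg_map == SIGN_CLASS_ID)
    for box in ndimage.find_objects(labels):
        crop = inp[box]                               # bounding-box crop of one sign
        if crop.shape[0] * crop.shape[1] < MIN_SEGMENT_AREA:
            continue                                  # skip objects that are too small
        sign_inp = cv2.resize(crop, (32, 32))[None, ...]
        sign_class = int(np.argmax(sign_model.predict(sign_inp)))
        print("detected sign class:", sign_class)     # drawing on the frame omitted

    # Vehicle and pedestrian segments would be highlighted from seg_map similarly.
    return seg_map
```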
4. Testing results

The developed method of tracking obstacles and road signs was tested on a personal computer equipped with an Intel Core i5-2400 CPU and 8 GB of RAM by processing the car's video recordings. The test results have shown that the developed system provides a rather small computing time of 0.5 s, which, at an average city driving speed of 30 km/h, is enough to understand the general road conditions and even make decisions. Identification of real pedestrians and cars in the image, distinguishing them from other objects, is performed quite accurately. Although a separate sign cannot be clearly distinguished when several traffic signs are placed too close to each other, the system still successfully highlights the found segment.

However, several shortcomings have also been revealed. Namely, due to the small depth of the artificial neural network for semantic segmentation, extraneous noise objects that do not belong to the specified classes are often distinguished. Moreover, the data set for training the artificial neural network for road sign classification contains a fairly limited number of classes (43 entities). For comparison, the Ukrainian traffic rules define 201 sign classes, excluding plates.

5. Conclusions

The given research considers the application of the means and methods of artificial neural networks for semantic segmentation and image classification with the intention to identify obstacles and perform road sign recognition. For this purpose, two neural networks have been trained. One of them provides semantic segmentation of images, enabling one to define several entities of different classes as well as their location in a given image. The second neural network is used to recognize road signs.

The test results proved the developed method to be sufficiently effective in the identification of physical objects and single road signs located within the car route. Object recognition time is less than 0.5 s, which allows the proposed method to be used for obstacle detection both in real time and on recorded car videos.

Taking into account the achieved results of testing and utilization, the developed software can be further combined with the facilities of edge computing to provide the driver with a notification and decision-making system.

6. References

[1] Continental Group, Augmented-Reality HUD. URL: https://www.continental-automotive.com/en-gl/Passenger-Cars/Information-Management/Head-Up-Displays/Augmented-Reality-HUD-(1)
[2] WayRay. URL: https://wayray.com/
[3] Jasmin Kurtanović, Deep Learning – Semantic Segmentation. URL: https://serengetitech.com/tech/deep-learning-semantic-segmentation/
[4] TensorFlow Model Garden. URL: https://github.com/tensorflow/models
[5] Adrian Rosebrock, Traffic Sign Classification with Keras and Deep Learning. URL: https://www.pyimagesearch.com/2019/11/04/traffic-sign-classification-with-keras-and-deep-learning/
[6] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille, DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 40, no. 4, 2018, pp. 834-848.
[7] Paul-Louis Pröve, An Introduction to different Types of Convolutions in Deep Learning. URL: https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d