=Paper=
{{Paper
|id=Vol-2280/paper-09
|storemode=property
|title=Comparison of Support Vector Machines and Deep Learning for Vehicle Detection
|pdfUrl=https://ceur-ws.org/Vol-2280/paper-09.pdf
|volume=Vol-2280
|authors=Özgür Kaplan,Ediz Saykol
|dblpUrl=https://dblp.org/rec/conf/rtacsit/KaplanS18
}}
==Comparison of Support Vector Machines and Deep Learning for Vehicle Detection==
Comparison of Support Vector Machines and Deep Learning for Vehicle Detection

Özgür Kaplan and Ediz Şaykol

Beykent University, Department of Computer Engineering, Ayazağa Campus, 34396, Istanbul, Turkey
ozgurkaplan@outlook.com, ediz.saykol@beykent.edu.tr

Abstract

The main goal of this paper is to compare different vehicle detection algorithms and to provide an effective comparison technique for developers and researchers. During this study, fine tunings are suggested to improve the implementations of these algorithms. Our focus is first on Support Vector Machines (SVM) and then on deep learning based approaches. The SVM-based vehicle detection implementation utilizes Histogram of Oriented Gradients (HOG). The deep learning approach we consider is the YOLO implementation. Our evaluation employs 400 random frames extracted from a real-world driving video. As the experimental results show, YOLO is more accurate, with an 81.9% success rate against SVM's 57.8%.

1. Introduction

According to the World Health Organization (WHO), 1.25 million people die each year from traffic accidents [WHO15]. Preventing traffic accidents and punishing drivers who violate the rules are of great importance for humanity. Many municipalities use automated camera systems to detect cars that violate traffic rules. Proper operation of these systems is crucial to avoid further traffic accidents.

It is estimated that 1 in every 10 cars in traffic will be capable of self-driving by 2030, and 1 in every 3 by 2050 [MKG+16]. It is obvious that the safety of future traffic will be directly linked to the quality of the software these vehicles use. For both kinds of software, detecting vehicles that violate the rules and self-driving vehicles, it is critical to identify images correctly.

There are several techniques in the literature suited for the vehicle detection task. Existing papers suggest using acoustic sensor networks [WCH17], wavelet and interest-point based feature extraction [KB13], edge-based constraint filters for vehicle segmentation [Sri02], and time-spatial data [MRR10].

Besides, there are also simulation-based approaches to help drivers. For example, [PYL+15] proposed an augmented reality system to increase the driver's immediate attention. The system calculates the probability of collision with the vehicles in the same lane as the driver and colorizes the lanes according to the risk ratio. In [RSL+14], the aim is to help middle-aged and older drivers make a left turn in running traffic. Drivers are given augmented reality hints for their left turns in different scenarios, such as heavy traffic and flowing traffic.

In this paper, we focus on an SVM-based vehicle detection technique, where we simplify the vehicle detection problem to a classification problem by examining individual sections of an image and classifying each section as a vehicle or a non-vehicle. On the other side, we consider deep learning approaches that automatically learn the image features required for vehicle detection. There are different deep learning techniques, such as Region-based Convolutional Networks (R-CNN) [GDD+14], Fast R-CNN [Gir15], Faster R-CNN [RHG+17], and You Only Look Once (YOLO) [RDG+16]. We choose YOLO as it is the fastest among these techniques [Du18]. The basic aim is to provide an effective comparison technique for developers and researchers.

The organization of the paper is as follows: in Section 2, we present the vehicle detection techniques that we employ in our study. Section 3 provides the comparative study and Section 4 concludes the paper.

2. Vehicle Detection Approaches

Our study contains two major components. The first phase is detecting vehicles using SVM and deep learning with YOLO. The second phase is comparing the accuracy and performance of these two methods. As our starting point we utilized the implementation of [Fu17], which includes a classical SVM approach using OpenCV and Histogram of Oriented Gradients (HOG) feature extraction, as well as a deep neural network based on a TensorFlow implementation of YOLO with pre-trained weights [Cho16]. To simplify the problem, we only considered cars as vehicles.

2.1 SVM-based Vehicle Detection using HOG

In this approach, we used supervised learning with pre-categorized images. We used the images provided by the GTI vehicle database [GTI18]. There are 3425 images of vehicle rears and 3900 images of road sequences not containing vehicles. All images are 64x64 pixels. Figure 1 shows a vehicle image and a non-vehicle image.

Figure 1: A vehicle image (left) and a non-vehicle image (right) in [GTI18].

The first step in preparing the images for the SVM classifier is to extract HOG features. The main purpose of HOG is to represent the image as a group of local histograms. HOG is not scale invariant, so the training images must all be the same size; since our training set images were already 64x64, we were able to use them directly.

The HOG technique counts the occurrences of gradient orientations in localized regions of an image. First, the image is divided into small connected cells, and gradient directions are calculated for each cell. Each cell is split into angular bins according to the gradient orientation, and each pixel of a cell adds its gradient weight to the corresponding angular bin. Groups of adjacent cells are called blocks; the grouping of cells into blocks is the basis for histogram normalization. Each normalized group of histograms forms a block histogram, and the set of block histograms represents the descriptor [Int18]. Figure 3 shows a car image and the corresponding HOG transformation.
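As an aside, the cell/bin/block pipeline just described can be sketched in a few lines of NumPy. This is an illustrative toy, not the OpenCV-based implementation used in our study; the parameter values (8x8 cells, 9 bins, 2x2 blocks) are the conventional HOG defaults rather than necessarily ours, and the function name is ours.

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9, block=2):
    """Minimal HOG sketch: gradients -> per-cell orientation histograms
    -> block-wise L2 normalization -> concatenated descriptor.
    Real implementations add interpolation and other refinements."""
    img = img.astype(np.float64)
    # 1. Gradients and their magnitude/unsigned orientation (0..180 deg).
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0

    # 2. Per-cell histograms: each pixel votes its gradient magnitude
    #    into one of `bins` angular bins.
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for i in range(ch):
        for j in range(cw):
            sl = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            for b in range(bins):
                hist[i, j, b] = mag[sl][bin_idx[sl] == b].sum()

    # 3. Group cells into overlapping blocks, L2-normalize each block,
    #    and concatenate everything into the final descriptor.
    feats = []
    for i in range(ch - block + 1):
        for j in range(cw - block + 1):
            v = hist[i:i + block, j:j + block].ravel()
            feats.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(feats)

# A 64x64 image with 8x8 cells, 9 bins and 2x2 blocks yields
# 7 * 7 * 2 * 2 * 9 = 1764 features.
demo = np.random.default_rng(0).random((64, 64))
print(hog_descriptor(demo).shape)  # (1764,)
```

Each 64x64 training image thus becomes a fixed-length feature vector, which is what the SVM actually classifies.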
Figure 2 shows our HOG implementation details.

Figure 2: HOG implementation. (Image source: https://software.intel.com/en-us/ipp-dev-reference-histogram-of-oriented-gradients-hog-descriptor)

Figure 3: HOG transformation of a car image.

To train the model, we randomly split the data into two parts with respect to 5-fold cross-validation: 80% is used for training and 20% for testing, as suggested by most learning schemes. We explored different parameters and color spaces for our images to get better results. After some trial and error, we reached an 86.7% success rate.

We then wanted to test the classifier on larger-scale images. We found raw driving video data in the KITTI Vision Benchmark Suite [KIT18] and modified the code to work on 1392x512 pixels, which is the resolution of the raw driving video data. A 64x64-pixel search window is used to scan the entire frame, with 50% overlap between search windows. For each window, our SVM classifier decides whether there is a vehicle or not. Figure 4 shows how search windows are used.

Figure 4: Search windows. (Image source: https://raw.githubusercontent.com/JunshengFu/vehicle-detection/master/examples/search_windows.png)

To get the exact locations of vehicles, a heat map is generated from the windows that may contain a vehicle. Figure 5 shows a successfully detected vehicle using SVM.

Figure 5: Vehicle detected with SVM.

2.2 Deep Learning based Vehicle Detection

We chose the YOLO technique for this purpose. YOLO uses a deep neural network to detect objects; it approaches object detection as a single regression problem that finds bounding boxes and class probabilities. YOLO is trained on full images and predicts multiple bounding boxes using a single neural network. First, the YOLO model accepts an image as input and divides it into an SxS grid. Each cell of this grid estimates B bounding boxes and C class probabilities. A bounding box has 5 components: x, y, w, h, and confidence. The (x, y) values are the center coordinates of the box and (w, h) are its dimensions. The confidence score tells us whether there is any object in the box; if the score is zero, there should be no object in the cell. Each grid cell makes B predictions, so there are S x S x B x 5 box outputs in total. The network also predicts one class probability per class per cell, which results in S x S x C total class probabilities. Figure 6 shows YOLO object detection and Figure 7 shows the YOLO execution pipeline.

Figure 6: YOLO object detection [RDG+16].

Figure 7: YOLO pipeline [RDG+16].

We utilized the [Fu17] code, which employs a TensorFlow implementation of YOLO [Cho16]. It uses the pre-trained YOLO_small network, which has 20 classes: "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "dining table", "dog", "horse", "motorbike", "person", "potted plant", "sheep", "sofa", "train", "tv monitor". Since a vehicle is already known as "car", we were able to use the precomputed settings and apply them directly to our inputs. We used a 30% threshold: cells whose car class score is 0.3 or more are selected. Figure 8 shows a successfully detected vehicle using YOLO.

Figure 8: Vehicle detected with YOLO.
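The output bookkeeping above can be made concrete with a small sketch. The tensors below are random stand-ins for a real network output, using the YOLO_small grid parameters (S=7, B=2, C=20); combining box confidence with the class probability is the standard YOLO scoring, and we assume here that this is the "car class score" the 30% threshold is applied to.

```python
import numpy as np

# Shape bookkeeping for YOLO's output: an S x S grid, B boxes of
# (x, y, w, h, confidence) per cell, plus C class probabilities per
# cell. Random values stand in for a real network's predictions.
S, B, C = 7, 2, 20
rng = np.random.default_rng(0)
boxes = rng.random((S, S, B, 5))   # S*S*B*5 box outputs
classes = rng.random((S, S, C))    # S*S*C class probabilities

# Index of "car" in the 20-class YOLO_small list.
CAR = 6

# Class-specific score = class probability * box confidence
# (standard YOLO scoring; assumed to be the thresholded quantity).
car_scores = classes[:, :, CAR, None] * boxes[:, :, :, 4]  # (S, S, B)
keep = car_scores.max(axis=2) >= 0.3                       # 30% threshold

print(boxes.size + classes.size)   # 7*7*(2*5 + 20) = 1470 outputs
```

Cells where `keep` is true contribute their best-scoring box to the final detections (after non-maximum suppression in the real pipeline).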
3. Comparing SVM and YOLO

We tested both algorithms on the same randomly selected raw driving video data. The random data contains 400 images with a total of 266 vehicles in them. Test results are evaluated in 3 categories:

Positive: a vehicle is detected correctly.
Negative: a vehicle is not detected.
False Positive: a non-vehicle is detected as a vehicle.

Figure 9 shows a sample snapshot of our experimental study employing the SVM-based technique. SVM generated a lot of false positives; in some frames, it identified road signs, trees, and pedestrians as vehicles. These false positives might be avoided if road line detection were performed beforehand, limiting the SVM search windows to the actual road.

Figure 9: Sample snapshot of the SVM-based technique.

Figure 10 shows a sample snapshot of our experimental study employing the deep learning based YOLO technique. YOLO downscales images to 448x448, which causes distortions. With prior road or lane detection, we could send smaller and more useful segments to YOLO to avoid distortions and get better results.

Figure 10: Sample snapshot of the YOLO technique.

Both algorithms failed to detect vehicles that are further ahead. We might be able to detect these further vehicles with SVM using higher-resolution data; however, higher-resolution data would require more computation, and therefore the performance would be worse. As for YOLO, which downscales pictures to 448x448 pixels, using higher-resolution data is pointless.

Our test results show that YOLO's success rate is 81.9% (218 out of 266), while SVM's is 57.8% (154 out of 266). YOLO also has significantly fewer false positives, with 5 against SVM's 84. When the false positives are taken into account, YOLO's success rate drops to 80.4% and SVM's drops to 44%.

One reason for SVM's high false positive count is that, unlike YOLO, which can classify 20 different types, SVM can only identify vehicles. YOLO lowers the class probability when there is another possible object in the frame, which lowers the chance of false positives.

We processed the entire video using an Asus Nvidia GeForce 1060 GPU. YOLO performed at up to 42 fps while SVM reached just 4 fps. This is because YOLO is lightweight and uses a single neural network pass on a given frame, whereas SVM needs repeated calculations over multiple sliding windows.

Figure 11: Test results for SVM.

Figure 12: Test results for YOLO.

Figure 13: Overall test results.

4. Conclusion and Future Work

In this paper, we looked into two different vehicle detection algorithm implementations and tested them against real-world traffic data. Our results showed that the deep neural network with YOLO produced more accurate results. YOLO also performed faster, which makes it more suitable for real-time tasks. With our approach we were able to find the strong and weak sides of both models, and thus we managed to implement and suggest fine tunings. Our approach has the potential to be used to compare different algorithms and to find further fine tunings.

References

[WHO15] World Health Organization. (2015). Global status report on road safety 2015. World Health Organization. http://www.who.int/iris/handle/10665/189242

[MKG+16] D. Mohr, H. Kaas, P. Gao, D. Wee, and T. Möller, "Automotive revolution—Perspective towards 2030: How the convergence of disruptive technology-driven trends could transform the auto industry," McKinsey & Company, Washington, DC, USA, Tech. Rep., Jan. 2016.

[WCH17] Wang, R., Cao, W., & He, Z. (2017). Vehicle recognition in acoustic sensor networks using multiple kernel sparse representation over learned dictionaries. International Journal of Distributed Sensor Networks. https://doi.org/10.1177/1550147717701435

[KB13] Kumar Mishra, Pradeep, & Banerjee, Biplab. (2013). Multiple Kernel based KNN Classifiers for Vehicle Classification. International Journal of Computer Applications, 71, 1-7. doi: 10.5120/12359-8673

[Sri02] Srinivasa, Narayan. (2002). Vision-based vehicle detection and tracking method for forward collision warning in automobiles. doi: 10.1109/IVS.2002.1188021

[MRR10] N. C. Mithun, N. U. Rashid and S. M. M. Rahman, "Detection and Classification of Vehicles From Video Using Multiple Time-Spatial Images," IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 3, pp. 1215-1225, Sept. 2012. doi: 10.1109/TITS.2012.2186128

[PYL+15] B. J. Park, C. Yoon, J. W. Lee and K. H. Kim, "Augmented reality based on driving situation awareness in vehicle," 2015 17th International Conference on Advanced Communication Technology (ICACT), Seoul, 2015, pp. 593-595. doi: 10.1109/ICACT.2015.7224865

[RSL+14] Rusch, Michelle, Schall, Mark, Lee, John, Dawson, Jeffrey, & Rizzo, Matthew. (2014). Augmented reality cues to assist older drivers with gap estimation for left-turns. Accident Analysis & Prevention, 71, 210-221. doi: 10.1016/j.aap.2014.05.020

[GDD+14] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, 2014, pp. 580-587. doi: 10.1109/CVPR.2014.81

[Gir15] R. Girshick, "Fast R-CNN," 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015, pp. 1440-1448. doi: 10.1109/ICCV.2015.169

[RHG+17] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, June 2017. doi: 10.1109/TPAMI.2016.2577031

[RDG+16] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 779-788.

[Du18] Juan Du. (2018). Understanding of Object Detection Based on CNN Family and YOLO. Journal of Physics: Conference Series, 1004, 012029.

[Cho16] Jinyoung Choi. (2016). Tensorflow Implementation of 'YOLO: Real-Time Object Detection'. Retrieved from https://github.com/gliese581gg/YOLO_tensorflow

[Fu17] Junsheng Fu. (2017, March 17). Vehicle Detection for Autonomous Driving. Retrieved from https://github.com/JunshengFu/vehicle-detection

[GTI18] Vehicle Image Database. (2018, May 12). http://www.gti.ssr.upm.es/data/Vehicle_database.html

[Int18] Histogram of Oriented Gradients (HOG) Descriptor. (2018, April 2). https://software.intel.com/en-us/ipp-dev-reference-histogram-of-oriented-gradients-hog-descriptor

[KIT18] KITTI Vision Benchmark Suite. (2018, May 13). http://www.cvlibs.net/datasets/kitti/raw_data.php