<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">YORES: An Ensemble YOLO and Resnet Network for Vehicle Detection and Classification</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Akansha</forename><surname>Singh</surname></persName>
							<email>akanshasing@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">SCSET</orgName>
								<orgName type="institution">Bennett University</orgName>
								<address>
									<settlement>Greater Noida</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Krishna</forename><forename type="middle">Kant</forename><surname>Singh</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Delhi Technical Campus</orgName>
								<address>
									<settlement>Greater Noida</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<address>
									<postCode>2023</postCode>
									<settlement>Waterloo</settlement>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">YORES: An Ensemble YOLO and Resnet Network for Vehicle Detection and Classification</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">E34D76B62A6D1ABDF8811B7BEB105B67</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:26+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Deep Learning</term>
					<term>ResNet</term>
					<term>YOLO</term>
					<term>Vehicle Detection</term>
					<term>Intelligent Transportation System</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Vehicle identification is a significant process in Intelligent Transportation Systems (ITS). The growing number of vehicles on the road has led to the need for automated methods of traffic monitoring and control. Autonomous vehicles and driver assistance systems require efficient vehicle detection methods with high real-time performance. Existing methods for vehicle identification have significant drawbacks, such as complex computations, poor performance, and an inability to detect vehicles in traffic videos. Thus, in this research we offer an ensemble strategy for vehicle detection in traffic videos that combines the advantages of YOLO and Resnet. YOLO is utilized for coarse object detection, whereas Resnet is used for fine-grained detection. The final detection result is generated by averaging the results of the two algorithms. We evaluate our method on a publicly available collection of traffic videos and demonstrate that it outperforms both YOLO and Resnet used alone. The YOLO network uses a multipart loss function that combines the classification and vehicle localization losses, while the ResNet network uses a cross-entropy loss function. A global ensemble loss function takes a weighted average of these two loss functions. The method thus identifies the vehicle through classification and produces a bounding box through localization. A detailed comparative analysis is also carried out, showing that the proposed method outperforms the other methods.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The increasing number of vehicles, and the corresponding increase in traffic on roads, has raised the demand for traffic monitoring and control to reduce the number of fatalities. Intelligent transportation systems (ITS) have become an important area of research in the last decade. Such systems can track suspicious conditions on the roads and report them, reducing the number of accidents and mishaps <ref type="bibr" target="#b22">(Xiao et al., 2020)</ref>. The key component in designing an ITS is vehicle detection. Once a vehicle is detected, the information can be used to classify vehicles, analyse congestion, track vehicles, remove occlusions, detect foreign objects, detect suspicious activities, and so on <ref type="bibr" target="#b21">(Xiao et al., 2021;</ref><ref type="bibr" target="#b23">Xu et al., 2022)</ref>.</p><p>Automatic vehicle detection and categorisation is a hot topic in computer vision and machine learning. Several objects, including other cars, buildings, and trees, can obscure a vehicle in a road scene, and designing algorithms that can accurately recognise and classify partially visible or occluded vehicles is difficult. Detecting vehicles in real time is crucial for several uses, including autonomous driving, traffic monitoring, and surveillance, and the creation of real-time detection and classification systems remains a difficult area of study. Vehicle identification and categorization algorithms can also be hampered by inclement weather, and making algorithms robust to bad weather is challenging. Classifying vehicles is not a simple yes/no question but a multiclass problem: creating algorithms that can correctly categorise vehicles including cars, trucks, buses, and motorcycles is a difficult research area. An unbalanced dataset can further lower the quality of results obtained by vehicle identification and classification algorithms, and designing algorithms that correct for dataset bias is itself a difficult research topic. The difficulties listed above are only a small sample of those studied in the field of automatic vehicle recognition and categorization. By resolving these issues, the performance of these algorithms can be enhanced so that they can be more widely used in practical settings.</p><p>The literature review reveals that vehicle detection in traffic videos is difficult because of the dynamic nature of the scenes and the large number of possible vehicle types. Vehicle detection algorithms relying on Haar features or HOG descriptors can struggle in such settings.</p><p>Thus, in this paper YORES, an ensemble YOLO-Resnet model, is proposed. Recent years have seen significant progress in this area thanks to deep learning-based object detection systems such as YOLO and Resnet.</p><p>The YOLO algorithm is a well-known example of a single-neural-network object detection system. YOLO can detect objects of varying sizes and aspect ratios quickly and precisely. Resnet, a deep convolutional neural network, is commonly used for image classification and object detection. Resnet is well known for its versatility and adaptability, as it can handle complicated visual elements and learn from new data.</p><p>By combining YOLO and Resnet, we may overcome the shortcomings of both algorithms and improve vehicle detection. In this research, we propose an ensemble method for vehicle detection in traffic videos by combining the advantages of YOLO and Resnet. Detecting small objects, or objects with low contrast against their background, may be difficult for YOLO. YOLO predicts the bounding box and class of each object using a single grid cell, which may not be precise enough for localizing small objects. Combining YOLO with another object detection model, ResNet, can help alleviate this shortcoming by leveraging the advantages of each.</p><p>A further shortcoming of standalone object detection models is their potential inability to distinguish between background clutter and actual objects. By combining the benefits of YOLO and ResNet with alternative architectural designs, training data, or input representations, model ensembles can help overcome this restriction. The resulting object detection system may be better equipped to deal with the wide variety of conditions found in the real world.</p><p>There are two phases to our ensemble method. In the first phase, coarse object detection is carried out with the help of YOLO. YOLO can swiftly detect the presence of automobiles in an image or video because it has been trained on a vast collection of traffic footage. When cars are identified, YOLO generates bounding boxes around those locations.</p><p>Resnet is utilized for fine-grained detection in the second phase. Resnet, which was trained on a more limited set of traffic recordings than YOLO, is able to improve upon the latter's detections by pinpointing the exact position and orientation of the vehicles. Resnet's output is also a set of bounding boxes associated with the observed vehicle locations.</p><p>To arrive at a conclusive detection result, the outputs of both algorithms are integrated via a weighted average. Each algorithm's performance on a validation dataset is used to determine the weights, which can be tweaked to give more weight to speed or accuracy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Proposed Method</head><p>The videos are converted to frames for further processing and identification of vehicles. Noise may be present in the frames due to different illumination, weather, and camera calibrations. Filtering techniques are applied during the pre-processing stage and all the frames are converted into a normalized size of 224 × 224 × 3. The details of the complete method are described in the sections below (figure <ref type="figure" target="#fig_0">1</ref>). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Conversion of Video Data to Frames</head><p>The traffic scenes to be processed are generally captured by CCTV cameras installed on roads. These cameras capture the vehicles as a video, which cannot be processed directly. Thus, the videos must be converted to image frames captured at different points in time. A video taken over a time interval T may be represented as shown in equation (1).</p><formula xml:id="formula_0">𝑣_𝑇 ∈ {𝑓_1, 𝑓_2, 𝑓_3, … , 𝑓_𝑛}<label>(1)</label></formula><p>where 𝑣_𝑇 = traffic video recorded over time interval 𝑇, 𝑓_𝑛 = image frame, and 𝑛 = number of frames per second.</p></div>
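Equation (1) above amounts to sampling the recording at fixed frame indices. A minimal sketch in Python follows; the function name and the sampling stride are illustrative assumptions (not from the paper), and actual frame decoding would typically use a library such as OpenCV:

```python
# Sketch of the video-to-frame decomposition in eq. (1): a video v_T recorded
# over an interval T at n frames per second is treated as the ordered set
# {f_1, ..., f_N}. Names here are illustrative, not from the paper.

def frame_indices(duration_s: float, fps: int, sample_every: int = 1):
    """Return the indices of the frames {f_1 ... f_N} representing the video.

    duration_s   -- length T of the recording in seconds
    fps          -- frames captured per second (n in eq. 1)
    sample_every -- keep every k-th frame to thin out near-duplicate frames
    """
    total = int(duration_s * fps)          # N = T * n frames in the video
    return list(range(0, total, sample_every))

# A 10 s clip at 30 fps, keeping every 15th frame, yields 20 frames to process
idx = frame_indices(10, 30, sample_every=15)
```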
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Pre-processing</head><p>Pre-processing of the retrieved video frames is important, as the frames may suffer from poor quality due to varying capture conditions and may contain noise caused by problems in the image sensors. Both issues lead to poor results, so some pre-processing is required before the data is input to the model. The main issue is the presence of noise. The input frames are therefore filtered using a Butterworth low-pass filter <ref type="bibr" target="#b1">(Basu, 2002)</ref> to remove the noise and smooth the images. The filter is given in equation (2).</p><formula xml:id="formula_1">𝐵(𝑥, 𝑦) = 1 / (1 + [𝐷(𝑥, 𝑦)/𝐷_0]^{2𝑚})<label>(2)</label></formula><p>where 𝐷_0 is the cut-off frequency, 𝑚 is the filter order, and 𝐷(𝑥, 𝑦) = √(𝑥² + 𝑦²), with 𝑥 and 𝑦 the coordinates of individual pixels of the frame.</p></div>
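The filter of eq. (2) can be applied in the frequency domain with NumPy. This is a sketch under the assumption of a single-channel frame; the values chosen for the cut-off D_0 and the order m are illustrative:

```python
import numpy as np

def butterworth_lowpass(frame: np.ndarray, d0: float = 30.0, m: int = 2) -> np.ndarray:
    """Apply the Butterworth low-pass filter of eq. (2) to one channel.

    d0 -- cut-off frequency D_0; m -- filter order (illustrative values).
    """
    rows, cols = frame.shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    V, U = np.meshgrid(v, u)                  # frequency-plane coordinates
    D = np.sqrt(U ** 2 + V ** 2)              # D(x, y) = sqrt(x^2 + y^2)
    B = 1.0 / (1.0 + (D / d0) ** (2 * m))     # eq. (2)

    F = np.fft.fftshift(np.fft.fft2(frame))   # centre the spectrum
    smoothed = np.fft.ifft2(np.fft.ifftshift(F * B))
    return np.real(smoothed)

noisy = np.random.default_rng(0).normal(size=(224, 224))
smooth = butterworth_lowpass(noisy)
```

On white noise the filter suppresses the high-frequency content, so the output has visibly lower variance than the input.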
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Proposed Network Architecture</head><p>In recent years deep learning has shown very good results for object detection and classification in images and videos. In this paper, we use a Resnet-50 network for detecting various vehicles on the road. The network comprises a convolutional part, which extracts important features from the image by applying convolutions, and a feature localization part, which comprises region proposal networks and pooling combined with non-maximum suppression to detect the bounding boxes around the vehicles. The backbone network used in the proposed work for initial feature extraction is the ZF network <ref type="bibr" target="#b24">(Zeiler &amp; Fergus, 2014)</ref>. The network has very fast training and testing speed and is very useful for designing real-time object detection. It uses small kernels, which preserve even lower-level details in the frames, with max pooling. This reduces the time and complexity of network processing.</p><p>The second network used is YOLO, an efficient and fast object detection network <ref type="bibr" target="#b28">(Diwan et al., 2023)</ref>. The network architecture for YOLO is as follows:</p><p>1. Input layer: This layer receives the input video frames (RGB) from the traffic videos.</p><p>2. Backbone network: The EfficientNet design serves as the foundation for the backbone network, which is made up of numerous convolutional layers and includes the following: a. Convolutional layers: the backbone contains a total of 9 convolutional layers, each with a different number of filters and kernel sizes. b. Bottleneck layers: the backbone comprises 2 bottleneck layers, each of which utilizes a combination of 1x1 and 3x3 convolutional layers to reduce the total number of input channels. c. Depthwise separable convolutions: the backbone also includes two depthwise separable convolutional layers, which use a combination of depthwise and pointwise convolutions to reduce the number of computations necessary for feature extraction.</p><p>3. Neck network: The neck network connects the backbone network to the head network. It is made up of a few convolutional layers and includes the following components: a. SPP layer: the neck incorporates a spatial pyramid pooling (SPP) layer, which applies max pooling at multiple scales to capture features at various granularities of detail. b. Convolutional layers: the neck also incorporates a number of convolutional layers, which further refine the features extracted by the backbone.</p><p>4. Head network: The head network produces bounding boxes and detects objects. It is made up of a number of convolutional layers, including: a. Anchor-based prediction levels: the head has three anchor-based prediction layers, each of which predicts the class and location of objects using anchor boxes of varying scale. b. Convolutional layers: the head also includes a number of convolutional layers, which further refine the predictions provided by the anchor-based prediction layers.</p><p>5. Output layer: The output layer generates the final detection results, which include the category and position of each detected object.</p></div>
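The SPP layer in the neck (item 3a) max-pools the feature map over grids of several sizes and concatenates the results into one fixed-length vector, whatever the input resolution. A minimal NumPy sketch of this idea; the function name and pooling levels are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def spp_pool(feature_map: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Spatial pyramid pooling sketch: max-pool an (H, W, C) map over l x l
    grids for each level l and concatenate the pooled channel vectors."""
    H, W, C = feature_map.shape
    pooled = []
    for l in levels:
        for i in range(l):
            for j in range(l):
                # one cell of the l x l grid
                patch = feature_map[i * H // l:(i + 1) * H // l,
                                    j * W // l:(j + 1) * W // l, :]
                pooled.append(patch.max(axis=(0, 1)))   # per-channel max
    return np.concatenate(pooled)   # length (1 + 4 + 16) * C for these levels

fm = np.random.default_rng(1).random((8, 8, 3))
vec = spp_pool(fm)   # fixed-length descriptor: 21 regions x 3 channels
```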
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Ensembling Technique</head><p>Let 𝐼 be an input video frame and let 𝑌𝑂𝐿𝑂(𝐼) be the output of 𝑌𝑂𝐿𝑂 on 𝐼, which consists of a set of bounding boxes 𝐵 = {𝑏 1 , 𝑏 2 , 𝑏 3 , … 𝑏 𝑛 }, where each 𝑏 𝑖 = (𝑥 𝑖 , 𝑦 𝑖 , 𝑤 𝑖 , ℎ 𝑖 ) represents the location and size of a detected vehicle.</p><p>Let 𝑅𝑒𝑠𝑁𝑒𝑡(𝐼) be the output of 𝑅𝑒𝑠𝑁𝑒𝑡 on 𝐼, which also consists of a set of bounding boxes 𝐵 ′ = {𝑏 1 ′ , 𝑏 2 ′ , 𝑏 3 ′ , … 𝑏 𝑛 ′ }, where each 𝑏 𝑖 ′ = (𝑥 𝑖 ′ , 𝑦 𝑖 ′ , 𝑤 𝑖 ′ , ℎ 𝑖 ′ ) represents the location and size of a detected vehicle.</p><p>We can combine the outputs of YOLO and Resnet using a weighted average:</p><formula xml:id="formula_2">𝐵 𝑓𝑖𝑛𝑎𝑙 = 𝑤 1 𝐵 + 𝑤 2 𝐵 ′<label>(3)</label></formula><p>where 𝐵 𝑓𝑖𝑛𝑎𝑙 = {𝑏 1 𝑓𝑖𝑛𝑎𝑙 , 𝑏 2 𝑓𝑖𝑛𝑎𝑙 , 𝑏 3 𝑓𝑖𝑛𝑎𝑙 , … 𝑏 𝑛 𝑓𝑖𝑛𝑎𝑙 } is the final set of bounding boxes, and 𝑤 1 and 𝑤 2 are the weights assigned to YOLO and Resnet, respectively. We can choose these weights based on the performance of each algorithm on a validation dataset, and we can adjust them to prioritize speed or accuracy depending on the application. Both the YOLO and ResNet models produce a significant number of proposals for each vehicle. This abundance of proposals makes it challenging to filter them down to a single bounding box per vehicle. Therefore, non-maximum suppression is applied to filter the bounding boxes and reduce them to a single box per vehicle. The NMS algorithm takes as input a set of proposal boxes B, their corresponding confidence scores (S), and a user-selected threshold value (N). The filtered proposals (D) are the output of the method.</p></div>
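A minimal sketch of the weighted-average combination of eq. (3) and the greedy NMS step described above, assuming (x, y, w, h) boxes with (x, y) the top-left corner; the function names and example weights are illustrative:

```python
import numpy as np

def combine_boxes(b, b_prime, w1=0.6, w2=0.4):
    """Eq. (3): element-wise weighted average of a matched YOLO/ResNet pair."""
    return tuple(w1 * u + w2 * v for u, v in zip(b, b_prime))

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, n_thresh=0.45):
    """Greedy NMS: keep the best-scoring box, drop overlaps above N, repeat."""
    order = list(np.argsort(scores)[::-1])    # highest confidence first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(int(best))
        order = [i for i in order if iou(boxes[best], boxes[i]) < n_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
kept = nms(boxes, [0.9, 0.8, 0.7])   # the heavily overlapping box is dropped
```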
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.">YOLO Loss Function</head><p>The loss function employed in this study is a multipart loss function. The individual losses are mean squared error losses, with the confidence target incorporating the IoU score between predicted and ground-truth boxes. The loss function comprises three components, namely coordinate loss, confidence loss, and classification loss.</p><formula xml:id="formula_3">𝐿_𝑌𝑂𝐿𝑂 = 𝑓_coordinate-loss + 𝑓_confidence-loss + 𝑓_classification-loss<label>(4)</label></formula><p>where, with 𝑆² grid cells, 𝐵 boxes per cell, and 𝟙_ij^obj indicating that box 𝑗 of cell 𝑖 is responsible for an object,</p><formula xml:id="formula_4">𝑓_coordinate-loss = 𝜆_coord ∑_{i=0}^{S²} ∑_{j=0}^{B} 𝟙_ij^obj [(𝑥_i − 𝑥̂_i)² + (𝑦_i − 𝑦̂_i)² + (√𝑤_i − √𝑤̂_i)² + (√ℎ_i − √ℎ̂_i)²]<label>(5)</label></formula><formula xml:id="formula_5">𝑓_confidence-loss = ∑_{i=0}^{S²} ∑_{j=0}^{B} 𝟙_ij^obj (𝐶_i − 𝐶̂_i)² + 𝜆_noobj ∑_{i=0}^{S²} ∑_{j=0}^{B} 𝟙_ij^noobj (𝐶_i − 𝐶̂_i)²<label>(6)</label></formula><formula xml:id="formula_8">𝑓_classification-loss = ∑_{i=0}^{S²} 𝟙_i^obj ∑_{c∈classes} (𝑝_i(𝑐) − 𝑝̂_i(𝑐))²<label>(7)</label></formula></div>
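A sketch of the multipart loss of eq. (4) for a flattened grid of cells, following the standard YOLO formulation with λ_coord and λ_noobj weights (the array layout, mask name, and default weights are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def yolo_multipart_loss(pred, true, obj, lam_coord=5.0, lam_noobj=0.5):
    """Sum the coordinate, confidence and classification terms of eq. (4).

    pred, true -- arrays of shape (cells, 5 + classes): x, y, w, h, C, p(c)...
    obj        -- boolean mask, True where a cell is responsible for an object
    """
    noobj = ~obj
    # coordinate term: centre errors plus square-rooted width/height errors
    coord = lam_coord * np.sum(
        obj * ((pred[:, 0] - true[:, 0]) ** 2 + (pred[:, 1] - true[:, 1]) ** 2
               + (np.sqrt(pred[:, 2]) - np.sqrt(true[:, 2])) ** 2
               + (np.sqrt(pred[:, 3]) - np.sqrt(true[:, 3])) ** 2))
    # confidence term: objectness error, down-weighted for empty cells
    conf = (np.sum(obj * (pred[:, 4] - true[:, 4]) ** 2)
            + lam_noobj * np.sum(noobj * (pred[:, 4] - true[:, 4]) ** 2))
    # classification term: squared class-probability error in object cells
    cls = np.sum(obj[:, None] * (pred[:, 5:] - true[:, 5:]) ** 2)
    return float(coord + conf + cls)

truth = np.array([[0.5, 0.5, 0.2, 0.3, 1.0, 1.0, 0.0]])
mask = np.array([True])
perfect = yolo_multipart_loss(truth, truth, mask)   # exact match -> zero loss
```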
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.6.">ResNet Loss Function</head><p>The cross-entropy loss function measures the difference between the true probability distribution y and the predicted probability distribution ŷ.</p><formula xml:id="formula_6">𝐿_𝑅𝑒𝑠𝑁𝑒𝑡 = − ∑ 𝑦_𝑖 log(𝑦̂_𝑖)<label>(8)</label></formula><p>where 𝑦_𝑖 is the ith element of the true probability distribution y and 𝑦̂_𝑖 is the corresponding element of the predicted probability distribution ŷ. The summation is taken over all elements i of the distributions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.7.">Global Ensemble Loss Function</head><p>When combining YOLO with ResNet, we can use a loss function that is a weighted sum of the losses from both models, and the model parameters can be updated against this combined objective. The relative success of each model on the validation data informs the decision of how much weight to give each loss.</p><p>YORES ensembles YOLO and ResNet, assigning their loss functions 𝐿_𝑌𝑂𝐿𝑂 and 𝐿_𝑅𝑒𝑠𝑁𝑒𝑡 weights 𝛼 and 𝛽, respectively. The ensemble loss function is then calculated as:</p><formula xml:id="formula_7">𝐿_𝑒𝑛𝑠𝑒𝑚𝑏𝑙𝑒 = 𝛼𝐿_𝑌𝑂𝐿𝑂 + 𝛽𝐿_𝑅𝑒𝑠𝑁𝑒𝑡<label>(9)</label></formula><p>Here, 𝛼 and 𝛽 are scalar weights that specify how much emphasis is placed on each of the two loss functions. These weights can be determined by looking at how well each model performs on validation data, giving more weight to the model that does better.</p><p>The target of the optimization is to minimize the global loss function 𝐿_𝑒𝑛𝑠𝑒𝑚𝑏𝑙𝑒 as a function of the model parameters. Backpropagation is used to compute the gradients of the global loss function with respect to the model parameters during training, and an optimization technique such as stochastic gradient descent (SGD), Adam, or RMSProp is used to update the parameters.</p><p>The loss functions of YOLO and ResNet are thus combined in the ensembled model by weighting the individual loss functions and summing them to obtain the final loss function.</p></div>
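A minimal sketch of eqs. (8) and (9), assuming NumPy arrays for the class distributions; the default weight values are illustrative, standing in for weights chosen on a validation split:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Eq. (8): L_ResNet = -sum_i y_i * log(y_hat_i), clipped for stability."""
    return float(-np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0))))

def ensemble_loss(l_yolo, l_resnet, alpha=0.6, beta=0.4):
    """Eq. (9): weighted sum of the two model losses. alpha and beta would be
    set from each model's performance on validation data (values here are
    placeholders)."""
    return alpha * l_yolo + beta * l_resnet

# A confident, correct prediction incurs zero cross-entropy, so the ensemble
# loss reduces to the weighted YOLO term.
y = np.array([0.0, 1.0, 0.0])
total = ensemble_loss(l_yolo=1.25, l_resnet=cross_entropy(y, y))
```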
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experiments and Results</head><p>In this section the experiments are discussed. The proposed model is implemented in the Python programming language, using the keras and tensorflow modules along with other supporting modules. Model training is done with GPU support, as the dataset is very large and training would not be possible on a CPU alone. The dataset contains two subsets, localization and classification. From these, an annotated csv file containing the bounding box position of each object is created and used as the ground truth for training. The model is trained for 10000 iterations on the selected dataset. Once fully trained, the network can identify the vehicles. The non-maximum suppression threshold value is selected as 0.45. The model is then applied to test videos and images. Each detected vehicle is enclosed in a bounding box, with the name of the vehicle shown on the box. The results of the various steps of the proposed method are shown below. Classification results for all categories of vehicles are shown in figure <ref type="figure" target="#fig_2">3</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data Set Used</head><p>The experiments are conducted using publicly available datasets. Numerous public datasets of vehicle classes exist, but in this work one of the largest vehicle datasets, MIO-TCD <ref type="bibr" target="#b8">(Luo et al., 2018)</ref>, is used. This dataset is divided into two parts, classification and localization: the localization part provides object positions and the classification part provides vehicle classes. The distribution of the MIO-TCD dataset is shown in table 1. The dataset contains vehicles captured under different fields of view, illumination conditions, and weather. Some sample images from the dataset are shown in figure <ref type="figure" target="#fig_1">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Evaluation Metric</head><p>To quantitatively analyse the performance of the classification and detection method, the following metrics are used.</p><p>Total accuracy: the total accuracy gives the percentage of vehicles that are correctly identified as vehicles.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><formula>𝐴𝐶 = 𝑇𝐶𝐼 / 𝑡𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑣𝑒ℎ𝑖𝑐𝑙𝑒𝑠<label>(10)</label></formula><p>where 𝑇𝐶𝐼 = total number of correctly identified vehicles.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Mean Recall and Mean Precision</head><p>Since the dataset contains a different number of images or frames for each category, two further metrics, mean recall and mean precision, are used to rectify this imbalance. These are obtained by averaging precision and recall over all categories of vehicles. The method was also compared with other state-of-the-art classification methods, all of which use feature extraction followed by a classifier network. The comparative results for all the methods are shown in Table <ref type="table" target="#tab_3">3</ref>. Graphical representations of the same are shown in figure <ref type="figure" target="#fig_3">4</ref>. The average accuracy comparison with different methods is shown in figure <ref type="figure" target="#fig_4">5</ref>.</p></div>
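The per-category precision and recall that feed the mean-recall and mean-precision metrics can be computed from the 𝑇𝑃_𝑖, 𝐹𝑃_𝑖 and 𝐹𝑁_𝑖 counts per class. A minimal sketch; the counts and function names are illustrative:

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Per-class precision PR_i and recall RE_i from TP_i, FP_i, FN_i counts."""
    pr = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    return pr, re

def mean_metrics(per_class):
    """Mean recall and mean precision: average RE_i and PR_i over all
    categories (11 vehicle classes in this paper's setting)."""
    prs, res = zip(*(precision_recall(*counts) for counts in per_class))
    return sum(res) / len(res), sum(prs) / len(prs)

# Two illustrative classes with (TP, FP, FN) counts
mre, mpr = mean_metrics([(8, 2, 2), (9, 1, 1)])
```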
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions and Future Work</head><p>In this research, we offer an ensemble approach to the problem of vehicle detection in traffic videos by combining the advantages of the YOLO and Resnet algorithms. In our approach, YOLO is used for coarse object detection and Resnet is used for fine-grained detection. A weighted average is used to aggregate the results of these two processes. The YOLO and ResNet loss functions are combined in the ensembled model by first assigning weights to each loss function and then computing their weighted sum as the overall loss function. By optimizing the ensembled model with this overall loss function, we effectively combine the strengths of YOLO and ResNet and increase the performance of vehicle detection from traffic videos. Some of the limitations of standalone object detection models can thus be circumvented with the help of ensembling. The proposed approach accurately identifies eleven classes of vehicles, achieving state-of-the-art results. The comparison of the proposed approach with six other methods demonstrates its superiority in terms of accuracy, speed, and robustness. Therefore, the proposed approach has significant potential for practical applications in traffic surveillance and management, such as traffic flow optimization and accident detection. Further studies can investigate the scalability and generalizability of the proposed approach to various traffic scenarios and different environments. Overall, this research contributes to the development of intelligent transportation systems and paves the way for future research in this field.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Proposed vehicle detection method</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Example frames from video dataset.</figDesc><graphic coords="7,236.65,71.33,139.85,108.70" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Detection results for different vehicle categories</figDesc><graphic coords="7,72.00,206.62,451.00,345.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Comparative Results</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Comparative analysis with other methods</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 . Distribution of dataset</head><label>1</label><figDesc></figDesc><table><row><cell>Category</cell><cell>Training</cell><cell>Testing</cell></row><row><cell>Articulated Truck</cell><cell>10346</cell><cell>2587</cell></row><row><cell>Bicycle</cell><cell>2284</cell><cell>571</cell></row><row><cell>Bus</cell><cell>10316</cell><cell>2579</cell></row><row><cell>Car</cell><cell>260518</cell><cell>65131</cell></row><row><cell>Motorcycle</cell><cell>1982</cell><cell>495</cell></row><row><cell>Non-Motorized Vehicle</cell><cell>1751</cell><cell>438</cell></row><row><cell>Pick up Truck</cell><cell>50906</cell><cell>12727</cell></row><row><cell>Single Unit Truck</cell><cell>5120</cell><cell>1280</cell></row><row><cell>Work Van</cell><cell>9679</cell><cell>2422</cell></row><row><cell>Background</cell><cell>160000</cell><cell>40000</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 Accuracy of method for all classes of vehicles and background</head><label>2</label><figDesc>where 𝑇𝑃_𝑖, 𝐹𝑁_𝑖 and 𝐹𝑃_𝑖 are the true positives, false negatives and false positives for each category. The overall results for all categories of vehicle are shown in Table 2.</figDesc><table><row><cell>𝑀𝑅𝐸 = (∑_{𝑖=1}^{11} 𝑅𝐸_𝑖) / 11</cell><cell>(11)</cell></row><row><cell>𝑀𝑃𝑅 = (∑_{𝑖=1}^{11} 𝑃𝑅_𝑖) / 11</cell><cell>(12)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 Comparative analysis of accuracy</head><label>3</label><figDesc></figDesc><table><row><cell>Method</cell><cell>Articulated Truck</cell><cell>Bicycle</cell><cell>Bus</cell><cell>Car</cell><cell>Motorcycle</cell><cell>Motorized Vehicle</cell><cell>Pickup Truck</cell><cell>Non-Motorized Vehicle</cell><cell>Single Unit Truck</cell><cell>Van</cell><cell>Average Accuracy</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Video vehicle detection algorithm based on virtual-line group</title>
		<author>
			<persName><forename type="first">L</forename><surname>Anan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhaoxuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jintao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">APCCAS 2006-2006 IEEE Asia Pacific Conference on Circuits and Systems</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="1148" to="1151" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Gaussian-based edge-detection methods-a survey</title>
		<author>
			<persName><forename type="first">Mitra</forename><surname>Basu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="252" to="260" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Vehicle type classification using a semisupervised convolutional neural network</title>
		<author>
			<persName><forename type="first">Zhen</forename><surname>Dong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Intelligent Transportation Systems</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="2247" to="2256" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Visual tracking based on improved foreground detection and perceptual hashing</title>
		<author>
			<persName><forename type="first">Mengjuan</forename><surname>Fei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jing</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Honghai</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">152</biblScope>
			<biblScope unit="page" from="413" to="428" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Local binary pattern-based on-road vehicle detection in urban traffic scene</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hassaballah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mourad</forename><forename type="middle">A</forename><surname>Kenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ibrahim</forename><forename type="middle">M</forename><surname>El-Henawy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Analysis and Applications</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1505" to="1521" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The Improvement of Road Driving Safety Guided by Visual Inattentional Blindness</title>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<idno type="DOI">10.1109/TITS.2020.3044927</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Intelligent Transportation Systems</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="4972" to="4981" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">ResNet-based vehicle classification and localization in traffic surveillance systems</title>
		<author>
			<persName><forename type="first">Heechul</forename><surname>Jung</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition workshops</title>
				<meeting>the IEEE conference on computer vision and pattern recognition workshops</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Video-based traffic data collection system for multiple vehicle types</title>
		<author>
			<persName><forename type="first">Shuguang</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IET Intelligent Transport Systems</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="164" to="174" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">MIO-TCD: A new benchmark dataset for vehicle classification and localization</title>
		<author>
			<persName><forename type="first">Zhiming</forename><surname>Luo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Image Processing</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="5129" to="5141" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The whale optimization algorithm</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mirjalili</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lewis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Engineering Software</title>
		<imprint>
			<biblScope unit="volume">95</biblScope>
			<biblScope unit="page" from="51" to="67" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Object detection binary classifiers methodology based on deep learning to identify small objects handled similarly: Application in video surveillance</title>
		<author>
			<persName><forename type="first">Francisco</forename><surname>Pérez-Hernández</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">194</biblScope>
			<biblScope unit="page">105590</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M M</forename><surname>Rahman</surname></persName>
		</author>
		<ptr target="https://mahbubur.buet.ac.bd/resources/DatabaseEBVT.htm" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Moving object area detection using normalized self adaptive optical flow</title>
		<author>
			<persName><forename type="first">Sandeep</forename><forename type="middle">Singh</forename><surname>Sengar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Susanta</forename><surname>Mukhopadhyay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Optik</title>
		<imprint>
			<biblScope unit="volume">127</biblScope>
			<biblScope unit="page" from="6258" to="6267" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Automatic vehicle detection using spatial time frame and object based classification</title>
		<author>
			<persName><forename type="first">Poonam</forename><surname>Sharma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Intelligent &amp; Fuzzy Systems</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="8147" to="8157" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Vehicle identification using modified region based convolution network for intelligent transportation system</title>
		<author>
			<persName><forename type="first">Poonam</forename><surname>Sharma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Multimedia Tools and Applications</title>
		<imprint>
			<biblScope unit="page" from="1" to="25" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Active learning based robust monocular vehicle detection for on-road safety systems</title>
		<author>
			<persName><forename type="first">Sayanan</forename><surname>Sivaraman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohan</forename><forename type="middle">Manubhai</forename><surname>Trivedi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Intelligent Vehicles Symposium</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos</title>
		<author>
			<persName><forename type="first">Andrews</forename><surname>Sobral</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Antoine</forename><surname>Vacavant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Vision and Image Understanding</title>
		<imprint>
			<biblScope unit="volume">122</biblScope>
			<biblScope unit="page" from="4" to="21" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">EDeN: Ensemble of deep networks for vehicle classification</title>
		<author>
			<persName><forename type="first">Rajkumar</forename><surname>Theagarajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Federico</forename><surname>Pala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bir</forename><surname>Bhanu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition workshops</title>
				<meeting>the IEEE conference on computer vision and pattern recognition workshops</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Vehicle Detection with HOG and Linear SVM</title>
		<author>
			<persName><forename type="first">Nikola</forename><surname>Tomikj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrea</forename><surname>Kulakov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Emerging Computer Technologies</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="6" to="9" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Vehicle detection using normalized color and edge map</title>
		<author>
			<persName><forename type="first">Luo-Wei</forename><surname>Tsai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jun-Wei</forename><surname>Hsieh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kuo-Chin</forename><surname>Fan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Image Processing</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="850" to="864" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Real-time vehicle type classification with deep convolutional neural networks</title>
		<author>
			<persName><forename type="first">Xinchen</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Real-Time Image Processing</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="5" to="14" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Electric vehicle routing problem: A systematic review and a new comprehensive model with nonlinear energy recharging and consumption</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kaku</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Pan</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.rser.2021.111567</idno>
	</analytic>
	<monogr>
		<title level="j">Renewable &amp; Sustainable Energy Reviews</title>
		<imprint>
			<biblScope unit="volume">151</biblScope>
			<biblScope unit="page">111567</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">The continuous pollution routing problem</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Konak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xu</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.amc.2020.125072</idno>
	</analytic>
	<monogr>
		<title level="j">Applied Mathematics and Computation</title>
		<imprint>
			<biblScope unit="volume">387</biblScope>
			<biblScope unit="page">125072</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">The alleviation of perceptual blindness during driving in urban areas guided by saccades recommendation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">H</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Guo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Intelligent Transportation Systems</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="16386" to="16396" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Visualizing and understanding convolutional networks</title>
		<author>
			<persName><forename type="first">Matthew</forename><forename type="middle">D</forename><surname>Zeiler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rob</forename><surname>Fergus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European conference on computer vision</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Vehicle detection in urban traffic surveillance images based on convolutional neural networks with feature concatenation</title>
		<author>
			<persName><forename type="first">Fukai</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ce</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Feng</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sensors</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">594</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">Maff-net: Filter false positive for 3d vehicle detection with multi-modal adaptive feature fusion</title>
		<author>
			<persName><forename type="first">Zehan</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2009.10945</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">EDA approach for model based localization and recognition of vehicles</title>
		<author>
			<persName><forename type="first">Zhaoxiang</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Conference on Computer Vision and Pattern Recognition</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Object detection using YOLO: Challenges, architectural successors, datasets and applications</title>
		<author>
			<persName><forename type="first">T</forename><surname>Diwan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Anirudh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">V</forename><surname>Tembhurne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Multimedia Tools and Applications</title>
		<imprint>
			<biblScope unit="volume">82</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
