<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Visual Objects Tracking on Road Sequences Using Information about Scene Perspective Transform</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nikolay</forename><surname>Nemcev</surname></persName>
							<email>nicknemcev@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">ITMO University</orgName>
								<address>
									<postCode>197101</postCode>
									<settlement>Saint Petersburg</settlement>
									<country key="RU">Russian Federation</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nickolay</forename><surname>Kozyrev</surname></persName>
							<email>kozyrevkoly@mail.ru</email>
							<affiliation key="aff0">
								<orgName type="institution">ITMO University</orgName>
								<address>
									<postCode>197101</postCode>
									<settlement>Saint Petersburg</settlement>
									<country key="RU">Russian Federation</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Visual Objects Tracking on Road Sequences Using Information about Scene Perspective Transform</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">B64E268340D068CBAF50F282819FAD3B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:35+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Visual Data Processing</term>
					<term>Visual Object Tracking</term>
					<term>Convolutional Neural Networks</term>
					<term>Perspective Transform</term>
					<term>Vanishing Point</term>
					<term>RANSAC</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper reviews existing approaches and methods for object tracking in video, one of the most important tasks facing both visual data analysis systems as a whole and road traffic control systems mounted directly on moving participants of the scene (including self-driving vehicles). The proposed approach estimates the perspective transform of the road scene, refines the location of the search area, and works in conjunction with a convolutional neural network for object tracking. It significantly increases tracking efficiency (by 10% on average, and up to 20% for certain object classes) on a subset of road scene videos shot from a moving vehicle, and can be used in practice in environment perception modules mounted directly on vehicles.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In the automotive industry, computer vision algorithms are used to solve various problems: object and lane detection, velocity and free-space estimation, environment understanding, and motion planning for autonomous vehicles.</p><p>The task of tracking an object between two frames of a video sequence can be represented as a search for the position of the object 𝑅(𝐹 𝑖 ) at some frame 𝐹 𝑖 , given the known state of the object at the previous frame of the sequence 𝑅(𝐹 𝑖−1 ), which is specified by a rectangular bounding box.</p><p>Object tracking technology is widely used in road environment understanding systems, in particular in the perception and motion planning modules of unmanned vehicles. Such extensive use imposes additional requirements: real-time data processing under changing weather and illumination conditions, and robustness to the specific nature of the motion of tracked objects, which is characterized by high speed, frequent occlusions, and significant frame-to-frame changes in object size caused both by the objects' own motion and by the motion of the camera.</p><p>In general, real-time object tracking algorithms can be divided, according to how the model of the tracked object is obtained and described, into two types: classical algorithms and algorithms based on the principles of machine learning.</p><p>Classical algorithms include the basic template search algorithm <ref type="bibr" target="#b0">[1]</ref>. These algorithms estimate the position of an object at the next frame by searching for the area most similar to the object template (the object image from the previous frame) according to the minimum matching error criterion (SAD, sum of absolute differences) or the maximum of the correlation coefficient. 
Algorithms based on the principles of contour tracking use as template information not the entire pixel field of an object but its shape and boundaries <ref type="bibr" target="#b1">[2]</ref>. Also worth noting are approaches based on extracting the object's key points and subsequently matching them against the key points of the search area in the next frame <ref type="bibr" target="#b2">[3]</ref>. Key point estimation can be performed with various approaches described in <ref type="bibr" target="#b3">[4]</ref>, <ref type="bibr" target="#b4">[5]</ref>, <ref type="bibr" target="#b5">[6]</ref>, <ref type="bibr" target="#b6">[7]</ref>. The task of tracking objects in video can also be solved through the related task of object motion estimation <ref type="bibr" target="#b7">[8]</ref>.</p><p>The advantages of classical algorithms include the ability to work without a preliminary training stage for the tracking module, low computational complexity, and the high speed of the baseline approaches. Their disadvantages include sensitivity to changes in scene illumination and problems with tracking objects in scenes with a non-static background. It should be noted that these problems are inherent in the baseline algorithms of this class; there are algorithms based on classical computer vision principles that are free of these shortcomings. However, such approaches are usually computationally complex and unable to work in real time <ref type="bibr" target="#b8">[9]</ref>, all the more so when inter-machine exchange has to be organized over a network <ref type="bibr" target="#b9">[10]</ref>, <ref type="bibr" target="#b10">[11]</ref>, <ref type="bibr" target="#b11">[12]</ref>.</p><p>Algorithms based on the principles of machine learning use various neural network architectures <ref type="bibr" target="#b12">[13]</ref>, <ref type="bibr" target="#b13">[14]</ref>. 
Algorithms can also use other machine learning methods, for example, random forests <ref type="bibr" target="#b13">[14]</ref>, <ref type="bibr" target="#b14">[15]</ref>. These machine learning principles make it possible to extract a set of features of the tracked object, which are later used to search for the object position in the next frame of the sequence. Some of these approaches search for the position of the object in the next frame by examining candidate regions within a certain area <ref type="bibr" target="#b15">[16]</ref>. Other approaches solve the tracking problem as a one-shot detection task <ref type="bibr" target="#b16">[17]</ref>. The need for preliminary training of the feature extraction modules is a hallmark of algorithms that use machine learning methods <ref type="bibr" target="#b16">[17]</ref>.</p><p>ML-based (machine learning based) algorithms are more robust to changes in scene parameters and extract the features of partially occluded scene objects more reliably, which makes them more applicable in object tracking modules mounted on moving vehicles.</p><p>It should also be noted that modifications of the Kalman filter <ref type="bibr" target="#b17">[18]</ref> are often used when tracking objects in video, both for filtering object trajectories and for predicting the position of an object in the next frame based on the history of its motion <ref type="bibr" target="#b15">[16]</ref>.</p><p>The proposed approach combines a method for estimating the parameters of the perspective transformation of the scene, used to refine the search region in the next frame of the sequence, with a modified convolutional Siamese network for estimating the object position within the given search region <ref type="bibr" target="#b16">[17]</ref>. 
The proposed method for refining the parameters of the search region is motivated by the need to compensate for the displacement and resizing of objects moving longitudinally with respect to the camera (in this case, the apparent motion of these objects is produced both by their own movement and by the displacement of the camera mounted on the vehicle).</p></div>
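The classical template search mentioned above can be illustrated with a minimal NumPy sketch (an illustrative example, not code from the paper): it slides the object template over a grayscale frame and returns the position minimizing the sum of absolute differences (SAD).

```python
import numpy as np

def sad_template_search(frame, template):
    """Exhaustive template search: slide the template over the frame and
    return the top-left corner (x, y) minimizing the sum of absolute
    differences between the template and the frame window."""
    fh, fw = frame.shape
    th, tw = template.shape
    best_pos, best_sad = (0, 0), np.inf
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            window = frame[y:y + th, x:x + tw].astype(np.int64)
            sad = np.abs(window - template.astype(np.int64)).sum()
            if sad < best_sad:
                best_sad, best_pos = sad, (x, y)
    return best_pos
```

A maximum-correlation criterion would simply replace the SAD score with a normalized cross-correlation and take the argmax instead of the argmin.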
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">The general scheme of the proposed approach</head><p>Conventionally, the task of tracking an object between two frames can be represented as a search for the state of the object 𝑅(𝐹 𝑖 ) at some frame 𝐹 𝑖 , based on the known state of the object at the previous frame of the sequence 𝑅(𝐹 𝑖−1 ), specified by a rectangular bounding box. The proposed approach can be divided into two separate modules: a module that determines the parameters of the object search region in the next frame, used to calculate the assumed position and scale of the object, and a modified convolutional neural network that searches for the position of the object within the given region of interest <ref type="bibr" target="#b15">[16]</ref>. The general diagram of the approach is given in figure <ref type="figure" target="#fig_0">1</ref>.</p></div>
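The two-module scheme can be sketched as a per-frame loop; `refine_search_region` and `siamese_search` are illustrative placeholders for the two modules described above, not the authors' API.

```python
def track(frames, initial_box, refine_search_region, siamese_search):
    """Hypothetical tracking loop tying the two modules together.

    frames: sequence of images; initial_box: object box on frames[0].
    refine_search_region(prev_frame, frame, box) -> (region_box, scale):
        module 1, proposes a search region and scale hypothesis.
    siamese_search(frame, template_box, region_box, scale) -> box:
        module 2, localizes the object inside the proposed region.
    Returns the list of per-frame object boxes."""
    boxes = [initial_box]
    prev = frames[0]
    for frame in frames[1:]:
        region, scale = refine_search_region(prev, frame, boxes[-1])
        boxes.append(siamese_search(frame, boxes[-1], region, scale))
        prev = frame
    return boxes
```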
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Method for refining search region parameters</head><p>To estimate the parameters of the search region (the center of the area (𝑥, 𝑦) and the scale 𝑆 𝑖 ) on the frame 𝐹 𝑖 , used by the tracker for object position estimation, a procedure is applied that is based on the random sample consensus method <ref type="bibr" target="#b17">[18]</ref> (RANSAC) and on estimating the parameters of the scene perspective transform via vanishing point search. At the first step of the perspective transformation estimation, object boundaries are found using the Canny edge detector <ref type="bibr" target="#b0">[1]</ref>, and a set of linear boundary segments whose length exceeds 3 pixels is extracted with the Hough transform. Each segment 𝐸 = (𝑝𝑜𝑠, 𝑑𝑖𝑟, 𝑙𝑒𝑛) is described by the combination of the position of its center 𝑝𝑜𝑠, the slope of the segment 𝑑𝑖𝑟, and its length 𝑙𝑒𝑛; segments whose angle of inclination to the vertical axis of the frame did not belong to the range from 10 to 70 degrees were removed.</p><p>The RANSAC algorithm consists of two stages. At the first stage, a set of hypotheses is generated; here a hypothesis is a vanishing point model 𝑀 (𝐸 1 , 𝐸 2 ), chosen as the intersection of two random segments 𝐸 1 and 𝐸 2 obtained at the previous stage. 
Finally, the votes for each model are counted, and the model with the most votes is the output of the algorithm (the target vanishing point).</p><p>To count the votes for some hypothesis 𝑀 (𝐸 1 , 𝐸 2 ), we iterate over all available segments 𝐸 𝑖 and calculate the weight of each vote using the following expression:</p><formula xml:id="formula_0">𝑣𝑜𝑡𝑒(𝐸 𝑖 , 𝑀 (𝐸 1 , 𝐸 2 )) = {︃ 1−𝑒 −𝛼 cos 2 𝜃 1−𝑒 −𝛼 • 𝛽 • 𝑙𝑒𝑛(𝐸 𝑖 ), if 𝜃 ≤ 5 ∘ , 0, otherwise ,<label>(1)</label></formula><p>where 𝜃 is the smaller angle between the voting segment and the line connecting the hypothetical vanishing point to the center of the given segment, 𝛼 is the parameter describing the dependence of the vote weight on the degree of angle similarity, and 𝛽 is the coefficient describing the influence of the segment length on the weight of its vote.</p><p>The model with the most votes gives the approximate position of the vanishing point, which describes the parameters of the perspective transformation of the scene. After finding this model, a point refinement procedure is performed according to the approach described in <ref type="bibr" target="#b18">[19]</ref>.</p><p>Knowing the parameters (corner coordinates) of the bounding box of the object 𝑅(𝐹 𝑖−1 ) on frame 𝐹 𝑖−1 and the coordinates of the vanishing point 𝑉 𝑃 = (𝑥, 𝑦), we can construct a set of estimated search region parameters (position and scale) based on the hypothesis of longitudinal object motion <ref type="bibr" target="#b0">[1]</ref>. Hypothetical search regions are obtained by shifting (taking the perspective transformation parameters into account) the bounding box of the object 𝑅(𝐹 𝑖−1 ) on the frame 𝐹 𝑖 along the line connecting the center of the given object and the vanishing point, while the object scale coefficient (the ratio of the area of the hypothesized bounding box to the size of the object's box at the frame 𝑅(𝐹 𝑖−1 )) varies in the range from 0.75 to 1.25 with step 0.1. 
The illustration is shown in Figure <ref type="figure" target="#fig_1">2</ref>.</p><p>At the next step, the most appropriate region 𝑅 ′ (𝐹 𝑖 ) is selected from the set of hypothesized bounding boxes, based on the criterion of the maximum correlation with the area of frame 𝐹 𝑖−1 corresponding to the object image 𝑅(𝐹 𝑖−1 ).</p></div>
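The voting scheme of equation (1) and the scale sweep described above can be sketched in Python/NumPy as follows. This is a minimal illustration on synthetic segments: in the paper's pipeline the segments would come from the Canny detector and the Hough transform, and the constants 𝛼 and 𝛽 are assumed values, not taken from the paper.

```python
import numpy as np

ALPHA, BETA = 4.0, 1.0   # assumed weighting constants (not specified in the paper)

def intersect(p1, d1, p2, d2):
    """Intersection of two lines given by point + direction (a VP hypothesis)."""
    A = np.column_stack([np.asarray(d1), -np.asarray(d2)])
    if abs(np.linalg.det(A)) < 1e-9:
        return None                    # (near-)parallel segments: no hypothesis
    t = np.linalg.solve(A, np.asarray(p2) - np.asarray(p1))[0]
    return np.asarray(p1) + t * np.asarray(d1)

def vote(seg, vp):
    """Vote weight of segment seg = (pos, dir, len) for vanishing point vp, eq. (1)."""
    pos, d, length = seg
    to_vp = np.asarray(vp) - np.asarray(pos)
    denom = np.linalg.norm(d) * np.linalg.norm(to_vp)
    if denom < 1e-9:
        return BETA * length           # segment center coincides with the VP
    cos_t = min(abs(np.dot(d, to_vp)) / denom, 1.0)
    theta = np.degrees(np.arccos(cos_t))
    if theta > 5.0:
        return 0.0
    return (1 - np.exp(-ALPHA * cos_t ** 2)) / (1 - np.exp(-ALPHA)) * BETA * length

def ransac_vanishing_point(segments, iters=100, seed=0):
    """RANSAC: sample segment pairs as VP hypotheses, keep the best-voted one."""
    rng = np.random.default_rng(seed)
    best_vp, best_score = None, -1.0
    for _ in range(iters):
        i, j = rng.choice(len(segments), size=2, replace=False)
        vp = intersect(segments[i][0], segments[i][1],
                       segments[j][0], segments[j][1])
        if vp is None:
            continue
        score = sum(vote(s, vp) for s in segments)
        if score > best_score:
            best_vp, best_score = vp, score
    return best_vp

def region_hypotheses(box, vp):
    """Shift box (cx, cy, w, h) along the line to the vanishing point; the area
    scale coefficient sweeps 0.75..1.25 with step 0.1 as in the paper.  The
    exact shift geometry here is an assumption of this sketch."""
    cx, cy, w, h = box
    out = []
    for s in np.arange(0.75, 1.26, 0.1):
        # s < 1: object shrinks and its center moves toward the VP; s > 1: away
        ncx = vp[0] + (cx - vp[0]) * s
        ncy = vp[1] + (cy - vp[1]) * s
        out.append(((ncx, ncy, w * np.sqrt(s), h * np.sqrt(s)), s))
    return out
```

With noiseless synthetic segments all pointing at one point, every valid hypothesis lands on that point and receives the maximum total vote, so the recovered vanishing point matches it exactly.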
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Neural network model</head><p>After determining the hypothetical search region 𝑅 ′ (𝐹 𝑖 ), the position of the object on the frame 𝐹 𝑖 is estimated using a Siamese neural network for object tracking. The architecture is almost identical to the network described in <ref type="bibr" target="#b16">[17]</ref>. The main difference is that, in addition to the object template and the search area centered at the previous object position 𝑅(𝐹 𝑖−1 ), the hypothetical search region is also supplied to the network input. The hypothetical search region is described by the bounding box center 𝑅 ′ (𝐹 𝑖 ) and the size change factor 𝑆 𝑖 . The network solves the tracking problem as detection with a template, operates in parallel on both search areas, and describes the obtained results with the rectangular bounding boxes 𝐵 𝑝𝑟𝑜𝑝 (𝐹 𝑖 , 𝑅(𝐹 𝑖−1 ), 1) and 𝐵 𝑝𝑟𝑜𝑝 (𝐹 𝑖 , 𝑅 ′ (𝐹 𝑖 ), 𝑆 𝑖 ) and the detection probabilities 𝑃 (𝐹 𝑖 , 𝑅(𝐹 𝑖−1 ), 1) and 𝑃 (𝐹 𝑖 , 𝑅 ′ (𝐹 𝑖 ), 𝑆 𝑖 ) corresponding to the two search areas. The resulting object state is selected as follows: </p><formula xml:id="formula_1">𝑅(𝐹 𝑖 ) = {︂ 𝐵 𝑝𝑟𝑜𝑝 (𝐹 𝑖 , 𝑅 ′ (𝐹 𝑖 ), 𝑆 𝑖 ), if 𝑃 (𝐹 𝑖 , 𝑅 ′ (𝐹 𝑖 ), 𝑆 𝑖 ) − 0.1|1 − 𝑆 𝑖 | &gt; 𝑃 (𝐹 𝑖 , 𝑅(𝐹 𝑖−1 ), 1), 𝐵 𝑝𝑟𝑜𝑝 (𝐹 𝑖 , 𝑅(𝐹 𝑖−1 ), 1), otherwise ,<label>(2)</label></formula></div>
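Equation (2) amounts to a simple comparison of the two network proposals, which can be stated as a small helper (argument names are illustrative; the 0.1 penalty factor is the one appearing in equation (2)):

```python
def select_state(b_prev, p_prev, b_hyp, p_hyp, s_i, penalty=0.1):
    """Choose between the two proposals per eq. (2): the proposal from the
    refined search region wins only if its detection probability, penalized
    by how far the scale hypothesis s_i is from 1, beats the proposal from
    the search region centered at the previous object position."""
    if p_hyp - penalty * abs(1.0 - s_i) > p_prev:
        return b_hyp
    return b_prev
```

The |1 − 𝑆 𝑖 | penalty biases the choice toward the baseline proposal when the scale hypothesis is aggressive, so the refined region must win by a margin.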
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Assessment of the effectiveness of the proposed approach</head><p>At this stage, a comparative analysis of the proposed approach and the classical implementation of the Siam-RPN tracker <ref type="bibr" target="#b16">[17]</ref>, which became the basis of the proposed approach, was performed according to the expected average overlap criterion (EAO), calculated in compliance with the procedure described in <ref type="bibr" target="#b18">[19]</ref>:</p><formula xml:id="formula_2">𝜑 = 1 𝑁 ℎ𝑖 − 𝑁 𝑙𝑜 𝑁 ℎ𝑖 ∑︁ 𝑁𝑠=𝑁 𝑙𝑜 𝜑 𝑁𝑠<label>(3)</label></formula><p>In equation (<ref type="formula" target="#formula_2">3</ref>), 𝑁 𝑙𝑜 is the minimum and 𝑁 ℎ𝑖 the maximum length of the frame sequences on which the tracked object is present, and 𝜑 𝑁𝑠 is calculated according to the following formula:</p><formula xml:id="formula_3">𝜑 𝑁 𝑠 = 1 𝑁 𝑠 𝑁𝑠 ∑︁ 𝑖=1 𝜑 𝑖<label>(4)</label></formula><p>In equation (4), 𝜑 𝑁𝑠 is the average overlap for a sequence of length 𝑁 𝑠 , and 𝜑 𝑖 is the overlap coefficient between the predicted position of the object and its true position on frame 𝑖 (IOU, intersection over union <ref type="bibr" target="#b0">[1]</ref>). A subset of the BDD100K dataset <ref type="bibr" target="#b19">[20]</ref>, consisting of 61 road scene sequences shot from a moving car in various weather conditions and at different times of day, was used as the test set. Tracking was performed for each video object from its first appearance to the end of the video. The initial state (position of the bounding box) of each object was taken directly from the annotations of the dataset. 
The results of the effectiveness analysis of the proposed approach for different object classes are given in Table <ref type="table" target="#tab_0">1</ref>.</p><p>The obtained results show that the proposed approach makes it possible to significantly increase object tracking efficiency compared to the classical implementation of Siam-RPN <ref type="bibr" target="#b16">[17]</ref>. It should be noted that the proposed approach operates with the same parameters (weights) of the neural network as the classical implementation. At the same time, using the Kalman filter <ref type="bibr" target="#b17">[18]</ref> to predict the position of the object on the next frame (and to select the corresponding search region) does not give a noticeable increase in tracking quality (Siam-RPN <ref type="bibr" target="#b16">[17]</ref> + UKF <ref type="bibr" target="#b17">[18]</ref> and Proposed + UKF <ref type="bibr" target="#b17">[18]</ref>); this is primarily due to the short length of the video sequences used, over which the filter often does not have enough time to build a model of the object's motion.</p></div>
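The evaluation metrics of equations (3)-(4) can be reproduced with a few lines of Python, together with a standard IoU helper for the per-frame overlaps 𝜑 𝑖 (an illustrative sketch assuming per-frame IoU values grouped by sequence length; the data layout is an assumption of this example):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def eao(iou_by_length, n_lo, n_hi):
    """Expected average overlap, eq. (3)-(4).

    iou_by_length: dict mapping sequence length Ns -> list of per-frame IoU
    values phi_i for a tracked sequence of that length.  phi_Ns is the mean
    per-frame overlap (eq. 4); EAO averages phi_Ns over Ns in [n_lo, n_hi]
    with the 1/(N_hi - N_lo) normalization written in eq. (3)."""
    phis = []
    for ns in range(n_lo, n_hi + 1):
        ious = iou_by_length.get(ns)
        if ious:
            phis.append(sum(ious) / ns)          # eq. (4)
    return sum(phis) / max(n_hi - n_lo, 1)       # eq. (3)
```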
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>The approach to tracking objects in video described in this article is based on a method for refining the parameters of the search region combined with a modified neural network for object tracking. The proposed refinement method estimates the perspective transformation of the scene by searching for the vanishing point and is used to compensate for the displacement and scaling of objects caused by their longitudinal motion. It significantly increases the efficiency of the neural network tracker (by 10% on average, and up to 20% for some object classes) on a subset of road scene video sequences taken from a moving camera. The modified network performs the object search simultaneously in two search areas using the same object template. It should be noted that the search region refinement module slightly increases the computational complexity of the tracking process and its duration. However, the information about the perspective transformation may be reused by other unmanned vehicle modules, such as the road marking detection and tracking module. The modified neural network also places higher demands on the computational capabilities of the graphics accelerator used (primarily its memory). Nevertheless, the relative simplicity of the original Siam-RPN architecture <ref type="bibr" target="#b16">[17]</ref> allows the proposed approach to work in real time on devices mounted directly on moving unmanned vehicles.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The general scheme of the proposed approach</figDesc><graphic coords="3,89.29,226.39,416.70,227.04" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: The procedure for determining the parameters of the search region</figDesc><graphic coords="5,89.29,84.19,416.69,219.79" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Assessment of the effectiveness of the proposed approach</figDesc><table><row><cell></cell><cell cols="3">Expected Average Overlap, EAO</cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell cols="3">Siam-RPN [17] Proposed Diff, %</cell><cell>Siam-RPN [17] + UKF [18]</cell><cell>Proposed + UKF [18]</cell><cell>Diff, %</cell></row><row><cell>Car</cell><cell>0.36</cell><cell>0.41</cell><cell>13.89</cell><cell>0.37</cell><cell>0.42</cell><cell>13.51</cell></row><row><cell>Pedestrian</cell><cell>0.32</cell><cell>0.38</cell><cell>18.75</cell><cell>0.33</cell><cell>0.36</cell><cell>9.09</cell></row><row><cell>Rider</cell><cell>0.35</cell><cell>0.43</cell><cell>22.86</cell><cell>0.35</cell><cell>0.41</cell><cell>17.14</cell></row><row><cell>Bus</cell><cell>0.44</cell><cell>0.42</cell><cell>-2.22</cell><cell>0.45</cell><cell>0.45</cell><cell>0.00</cell></row><row><cell>Truck</cell><cell>0.44</cell><cell>0.46</cell><cell>4.5</cell><cell>0.43</cell><cell>0.47</cell><cell>9.3</cell></row><row><cell>Motorcycle</cell><cell>0.24</cell><cell>0.29</cell><cell>20.83</cell><cell>0.28</cell><cell>0.26</cell><cell>-7.14</cell></row><row><cell>Bicycle</cell><cell>0.21</cell><cell>0.24</cell><cell>14.29</cell><cell>0.2</cell><cell>0.25</cell><cell>25</cell></row><row><cell>Average</cell><cell>0.34</cell><cell>0.38</cell><cell>11.81</cell><cell>0.34</cell><cell>0.37</cell><cell>8.7</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Digital image processing using MATLAB</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Gonzalez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Woods</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">L</forename><surname>Eddins</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004">2004</date>
			<publisher>Pearson Education India</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Contour tracking by stochastic propagation of conditional density</title>
		<author>
			<persName><forename type="first">M</forename><surname>Isard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Blake</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European conference on computer vision</title>
				<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="343" to="356" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Efficient mean-shift tracking via a new similarity measure</title>
		<author>
			<persName><forename type="first">C</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Duraiswami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Davis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR&apos;05)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="176" to="183" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">An extended set of haar-like features for rapid object detection</title>
		<author>
			<persName><forename type="first">R</forename><surname>Lienhart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Maydt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings. international conference on image processing</title>
				<meeting>international conference on image processing</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Object tracking using sift features and mean shift</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Shi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computer vision and image understanding</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="345" to="352" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Orb: An efficient alternative to sift or surf</title>
		<author>
			<persName><forename type="first">E</forename><surname>Rublee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2011 International conference on computer vision</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="2564" to="2571" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Pedestrian detection and tracking using temporal differencing and hog features</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">T</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers Electrical Engineering</title>
		<imprint>
			<biblScope unit="page" from="1072" to="1079" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Pyramidal implementation of the affine lucas kanade feature tracker description of the algorithm</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Y</forename><surname>Bouguet</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
			<publisher>Intel Corporation</publisher>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Handcrafted and deep trackers: Recent visual object tracking approaches and trends</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fiaz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM Computing Surveys (CSUR)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="44" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Multipath redundant transmission with packet segmentation</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">A</forename><surname>Bogatyrev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Bogatyrev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Bogatyrev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Wave Electronics and its Application in Information and Telecommunication Systems</title>
				<imprint>
			<publisher>WECONF</publisher>
			<date type="published" when="2019">2019. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Back up data transmission in real-time duplicated computer systems</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">V</forename><surname>Arustamov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Bogatyrev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">A</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Intelligent Systems and Computing</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="103" to="109" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Exchange of duplicated computing complexes in fault-tolerant systems</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">A</forename><surname>Bogatyrev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Automatic Control and Computer Sciences</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="268" to="276" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Learning to track at 100 fps with deep regression networks</title>
		<author>
			<persName><forename type="first">D</forename><surname>Held</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thrun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Savarese</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Conference on Computer Vision</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="749" to="765" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Spatially supervised recurrent convolutional neural networks for visual object tracking</title>
		<author>
			<persName><forename type="first">G</forename><surname>Ning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Symposium on Circuits and Systems (ISCAS)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2017">2017. 2017</date>
			<biblScope unit="page" from="1" to="4" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Classification and regression by randomforest</title>
		<author>
			<persName><forename type="first">A</forename><surname>Liaw</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiener</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">R news</title>
		<imprint>
			<biblScope unit="page" from="18" to="22" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Tracking-learning-detection</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Kalal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mikolajczyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Matas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE transactions on pattern analysis and machine intelligence</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1409" to="1422" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">High-performance visual tracking with siamese region proposal network</title>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="8971" to="8980" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">The unscented kalman filter for nonlinear estimation</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Wan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van Der Merwe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium</title>
				<meeting>the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium</meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="153" to="158" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Auto-rectification of user photos</title>
		<author>
			<persName><forename type="first">K</forename><surname>Chaudhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Diverdi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ioffe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Image Processing (ICIP)</title>
				<imprint>
			<date type="published" when="2014">2014. 2014</date>
			<biblScope unit="page" from="3479" to="3483" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Bdd100k: A diverse driving video database with scalable annotation tooling</title>
		<author>
			<persName><forename type="first">F</forename><surname>Yu</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
