<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Panorama Stitching Method Using Sensor Fusion</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Aleksei</forename><surname>Goncharov</surname></persName>
							<email>goncharov.aleshka@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">ITMO University</orgName>
								<address>
									<addrLine>49 Kronverksky Pr., St. Petersburg</addrLine>
									<postCode>197101</postCode>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sergei</forename><surname>Bykovskii</surname></persName>
							<email>sergei_bykovskii@itmo.ru</email>
							<affiliation key="aff0">
								<orgName type="institution">ITMO University</orgName>
								<address>
									<addrLine>49 Kronverksky Pr., St. Petersburg</addrLine>
									<postCode>197101</postCode>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Panorama Stitching Method Using Sensor Fusion</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">00A94D01A88F8D9429F1A207997EF064</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Computer vision</term>
					<term>Embedded systems</term>
					<term>Cyber-Physical systems</term>
					<term>Panoramic photography</term>
					<term>Sensor fusion</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A commonly used solution for stitching a set of images into a panorama is to apply computer vision algorithms. The greatest computational complexity in these algorithms lies in the image analysis stage, specifically in the methods for finding key points. Many key point detection methods now exist, suited to different conditions and shooting parameters of the initial set of frames. Choosing the right method avoids stitching defects and produces the final image faster. This article introduces a method that analyzes the initial set of images using data from various sensors and selects a suitable key point detection algorithm accordingly. The method yields final panoramic images without significant defects and performs better than the compared key point detection methods. On the PASSTA dataset, the developed method produced a final panoramic image of about 1.33 MB in roughly 16 seconds, regardless of whether 11, 8 or 6 frames were used, with angular displacements of 25, 35 and 45 degrees, respectively.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Panoramic images are widely used as a means of information support for various control systems. They can be created with specialized hardware, but this entails high financial costs and requires serious professional photography skills. As an alternative to specialized devices, an ordinary camera can be used: the camera is turned in the desired direction to capture a sequence of frames, and computer vision methods then stitch the original frames into a panoramic image. In the general case, computer vision methods produce panoramic images according to the following algorithm <ref type="bibr" target="#b1">[2]</ref>:</p><p>1. Detect key points in the images and match them; 2. Construct a projective transformation to align the images and transfer them to a common plane; 3. Stitch the aligned images.</p><p>To construct a feature description of an image, its characteristic parts must be selected, for example, corners, edges, regions corresponding to intensity extrema, etc. Algorithms that extract such features (key points) should be invariant to various transformations: displacement, rotation, scale and illumination changes of the original image, as well as changes in the position of the camera relative to the captured object (changes in perspective). Searching for interpretable information in an image therefore requires anchoring to its local features.</p><p>Different key point detection algorithms do not provide universal solutions for all images because of the specifics of how they determine local features. This study proposes using various sensor data to select the most appropriate key point detection algorithm depending on the scene in the image, the angular displacement between frames, the illumination, and other parameters.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related works</head><p>The technical literature is rich in new feature detection and image description algorithms <ref type="bibr" target="#b0">[1]</ref>. However, to this day there is no ideal detector <ref type="bibr" target="#b3">[4]</ref>. This is mainly due to the almost infinite number of possible computer vision applications (which may require one or several features) <ref type="bibr" target="#b4">[5]</ref>, the variability of imaging conditions (zoom, viewpoint, lighting and contrast, image quality, compression, etc.) <ref type="bibr" target="#b1">[2]</ref> and of the possible scene <ref type="bibr" target="#b5">[6]</ref>. The computational efficiency of such detectors becomes even more important for real-time applications <ref type="bibr" target="#b2">[3]</ref>.</p><p>Three algorithms (SIFT, SURF, ORB) were studied in detail, and the following conclusions were made:</p><p>1. ORB is the fastest algorithm, but has the lowest matching percentage of the three. 2. SIFT is the slowest, but surpasses the other algorithms in matching percentage in most of the considered cases of frame distortion. 3. SURF is close to SIFT in matching percentage and close to ORB in speed. 4. Notably, ORB finds key points mainly in the center of the image, while SIFT and SURF distribute them evenly over the entire image.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Proposal</head><p>When creating panoramas, various auxiliary data from the capturing device can be used: a timer, an encoder, a gyroscope, or other sensors. Smartphones are often used to capture panoramic images; the image below shows the various sensors found in most smartphones. To provide general control of the initial set of images (overlap between frames, offsets along the axes, and angular displacements), the basic algorithm based on the OpenCV library can be modified as shown in the block diagram below. As the block diagram shows, before the algorithm starts, the camera positions between images are analyzed to warn the user if processing the set of frames would be pointless. During subsequent stitching, the displacement between frames, and hence the total overlap area between frames, is estimated. This approach provides full control over whether the original images are suitable for stitching into a panoramic image, as well as control between individual frames: the algorithm stops when overlap between images is lost, so the user receives a panorama that may not be complete but is correct in terms of image integrity. Figure <ref type="figure" target="#fig_0">1</ref> below shows the minimum equipment required to use the developed method and briefly outlines the algorithm.  </p></div>
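The sensor-driven choice of a key point detector can be sketched as a simple decision rule. The thresholds and the light-sensor input below are illustrative assumptions, not the values used in the developed method; they merely encode the trade-offs from Section 2 (ORB when frames overlap heavily, SIFT for wide angular steps or dark scenes).

```python
def select_detector(angular_offset_deg, lux):
    """Pick a key point detector from gyroscope and light-sensor readings.

    Thresholds are hypothetical, for illustration only.
    """
    dim = not (lux >= 50.0)            # low light favours the most robust detector
    wide = angular_offset_deg >= 40.0  # little overlap left between frames
    if wide or dim:
        return "SIFT"                  # slowest, best matching percentage
    if angular_offset_deg >= 25.0:
        return "SURF"                  # middle ground in speed and quality
    return "ORB"                       # fastest when frames overlap heavily
```

A rule of this shape also enables the early-exit behaviour described above: if the gyroscope reports an angular step larger than the lens field of view, no detector is selected and the user is warned before any stitching is attempted.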
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Evaluation</head><p>The developed method was tested on the PASSTA datasets (i.e., image sets) of Linköping University. These image sets have several distinctive features:</p><p>1. The images were taken with a camera mounted on a tripod. 2. Between consecutive images, the camera rotates around the vertical axis through the optical center. 3. The angular step between images is small.</p><p>The dataset includes three sets of images, of which the following two were used in the experiment:</p><p>1. Blue Dining Room ("LunchRoomBlue"): contains 72 images captured with a Canon DS50 with a rectilinear lens, at a resolution of 1280 x 1920 pixels, under low lighting. A panoramic head was used to rotate the camera approximately 5 degrees around the vertical axis through its optical center. 2. Dining Room ("LunchRoom"): consists of 72 images captured with a Canon DS70 with a Samyang 2.8/10 mm wide-angle lens (about 105 degrees), at a resolution of 5740 x 3780 pixels. A panoramic head was used to rotate the camera approximately 5 degrees around the vertical axis through its optical center.</p><p>Figure <ref type="figure" target="#fig_11">11</ref> shows a comparison diagram with different input data for the developed method and the standard OpenCV library method using different key point detection methods. Defects in the final images are indicated separately. Table <ref type="table" target="#tab_0">1</ref> below shows the results obtained with various input data and methods.</p><p>The operating time of the developed method was estimated for various sets of initial images. Based on the data obtained, the following conclusions can be drawn:</p><p>1. Regardless of the displacement between frames, the developed method creates a panoramic image in approximately the same time. 2. With an angular displacement between frames of up to 45 degrees for light scenes and up to 40 degrees for dark scenes, the developed method does not create obvious defects in the final image.</p></div>
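The relationship between the angular step and the remaining overlap in Table 1 can be checked with a small approximation. This assumes the roughly 105 degree horizontal field of view quoted for the LunchRoom lens and ignores projection nonlinearity, so it is a rough estimate rather than the paper's overlap-control computation.

```python
def overlap_fraction(fov_deg, step_deg):
    """Approximate horizontal overlap between consecutive frames when the
    camera rotates step_deg between shots with a fov_deg field of view.
    Pinhole approximation; projection nonlinearity is ignored."""
    return max(0.0, 1.0 - step_deg / fov_deg)

# With a 105-degree lens, a 45-degree step still leaves roughly 57% overlap,
# while a step equal to or beyond the field of view leaves none, which is the
# condition under which the proposed algorithm stops stitching.
```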
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>An analysis of existing methods for creating panoramic images was carried out, and computer vision methods for finding key points in images were analyzed in detail. The analysis led to the conclusion that the variety of methods is dictated by the variety of applied problems and, accordingly, of the objects in the images for which key points must be found.</p><p>A method for creating panoramic images using multisensor data was developed on the basis of the OpenCV library in the Python programming language. To improve the quality of the created panoramas by selecting the key point detection algorithm best suited to the scenes in the original frames, and to control the mutual overlap, shifts and displacements between frames, it was proposed to add data from position sensors (gyroscope and accelerometer) to the algorithm. Choosing the optimal key point detection algorithm also reduces the total running time of the algorithm without loss of quality. It can be concluded that multisensor data is useful for creating panoramic images. Moreover, thanks to the reduced operating time obtained by using an optimal key point detection algorithm, the developed method can be implemented in an embedded system. On the PASSTA dataset, the developed method produced a final panoramic image of about 1.33 MB in roughly 16 seconds, regardless of whether 11, 8 or 6 frames were used, with angular displacements of 25, 35 and 45 degrees, respectively.</p><p>The developed method is designed to be extended with other key point detection algorithms and with various additional sensors. 
</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Proposed algorithm with a set of sensors and a camera required for use</figDesc><graphic coords="3,130.96,150.32,333.36,222.86" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2</head><label>2</label><figDesc>Figure 2 shows the developed algorithm based on OpenCV in detail. The flowchart details the stages of creating a panoramic image: the possible key point detection algorithms, the methods for matching the found key points, and the necessary sensor information and its influence on the algorithm. Notably, the developed algorithm can be extended for use with other key point detection algorithms.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Proposed algorithm based on OpenCV algorithm</figDesc><graphic coords="4,89.29,84.18,416.68,588.54" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figures 3 and 6</head><label>6</label><figDesc>Figures 3 and 6 below show the results of the developed application at various angular displacements between frames on the proposed datasets. For comparison, the results of the basic algorithm with the same initial data are presented in Figures 4 and 5 for the "LunchRoomBlue" set and in Figures 7 and 8 for the "LunchRoom" set. Figure 9 shows the defects and distortions arising in panoramas when the algorithm operates at angular values that are at the limit for key point search, with the overlap requirement violated. Figure 10 compares the work of the developed method and OpenCV tools using various key point detection methods; defects arising in the final images are marked separately.</figDesc><graphic coords="5,130.96,291.03,333.37,91.59" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Panoramic image obtained from 8 frames of the "LunchRoomBlue" image set with an angular displacement of 35 degrees between frames using the developed algorithm.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Panoramic image obtained from 8 frames of the "LunchRoomBlue" image set with an angular offset of 35 degrees between frames using the OpenCV algorithm (SURF).</figDesc><graphic coords="5,130.96,438.14,333.38,143.53" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Panoramic image obtained from 8 frames of the "LunchRoomBlue" image set with an angular offset of 35 degrees between frames using the OpenCV algorithm (ORB).</figDesc><graphic coords="6,130.96,84.19,333.36,112.51" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Panoramic image obtained from 6 frames of the "LunchRoomBlue" image set with an angular displacement of 45 degrees between frames using the developed algorithm.</figDesc><graphic coords="6,130.96,242.08,333.37,123.34" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Panoramic image obtained from 6 frames of the "LunchRoomBlue" image set with an angular displacement of 45 degrees between frames using the OpenCV algorithm (SURF).</figDesc><graphic coords="6,151.80,410.81,291.69,108.96" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 8 :Figure 9 :</head><label>89</label><figDesc>Figure 8: Panoramic image obtained from 6 frames of the "LunchRoomBlue" image set with an angular displacement of 45 degrees between frames using the OpenCV algorithm (ORB).</figDesc><graphic coords="7,151.80,85.15,291.69,134.83" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figure 10 :</head><label>10</label><figDesc>Figure 10: Method test on the PASSTA dataset.</figDesc><graphic coords="7,89.29,455.03,416.69,194.32" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Figure 11 :</head><label>11</label><figDesc>Figure 11: Testing the method on the PASSTA dataset (LunchRoom).</figDesc><graphic coords="8,89.29,84.19,458.37,205.31" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Method test on the PASSTA dataset (LunchRoom).</figDesc><table><row><cell>Panorama stitching method</cell><cell>Angular offset between frames, degrees</cell><cell>Number of frames</cell><cell>Panorama stitching time, seconds</cell><cell>Final image size, MB</cell><cell>Presence of obvious defects</cell></row><row><cell>OpenCV (ORB)</cell><cell>25</cell><cell>11</cell><cell>16.04</cell><cell>1.27</cell><cell>No</cell></row><row><cell>OpenCV (SURF)</cell><cell>25</cell><cell>11</cell><cell>18.49</cell><cell>1.3</cell><cell>No</cell></row><row><cell>OpenCV (SIFT)</cell><cell>25</cell><cell>11</cell><cell>19.23</cell><cell>1.29</cell><cell>No</cell></row><row><cell>Developed method</cell><cell>25</cell><cell>11</cell><cell>16.04</cell><cell>1.27</cell><cell>No</cell></row><row><cell>OpenCV (ORB)</cell><cell>35</cell><cell>8</cell><cell>14.11</cell><cell>1.33</cell><cell>Yes</cell></row><row><cell>OpenCV (SURF)</cell><cell>35</cell><cell>8</cell><cell>16.27</cell><cell>1.38</cell><cell>No</cell></row><row><cell>OpenCV (SIFT)</cell><cell>35</cell><cell>8</cell><cell>16.65</cell><cell>1.32</cell><cell>No</cell></row><row><cell>Developed method</cell><cell>35</cell><cell>8</cell><cell>16.27</cell><cell>1.38</cell><cell>No</cell></row><row><cell>OpenCV (ORB)</cell><cell>45</cell><cell>6</cell><cell>13.41</cell><cell>1.35</cell><cell>Yes</cell></row><row><cell>OpenCV (SURF)</cell><cell>45</cell><cell>6</cell><cell>14.93</cell><cell>1.31</cell><cell>Yes</cell></row><row><cell>OpenCV (SIFT)</cell><cell>45</cell><cell>6</cell><cell>15.58</cell><cell>1.34</cell><cell>No</cell></row><row><cell>Developed method</cell><cell>45</cell><cell>6</cell><cell>15.58</cell><cell>1.34</cell><cell>No</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A survey of recent advances in visual feature detection</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ding</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">149</biblScope>
			<biblScope unit="page" from="736" to="751" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Extracting semantic information from visual data: A survey</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Robotics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Automated real-time video surveillance algorithms for soc implementation: A survey</title>
		<author>
			<persName><forename type="first">E</forename><surname>Salahat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Saleh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mohammad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Al-Qutayri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sluzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ismail</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Electronics, Circuits, and Systems</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A K</forename><surname>Tareen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Saleem</surname></persName>
		</author>
	<title level="m">A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK. 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET). IEEE</title>
		<imprint>
			<biblScope unit="page" from="1" to="10" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Karami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Prasad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Shehata</surname></persName>
		</author>
	<idno type="arXiv">arXiv:1710.02726</idno>
		<title level="m">Image matching using SIFT, SURF, BRIEF and ORB: performance comparison for distorted images</title>
				<imprint/>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Comparison of image matching techniques</title>
		<author>
			<persName><forename type="first">N</forename><surname>Jayanthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Indu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Latest Trends in Engineering and Technology</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="396" to="401" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
