Assessment of camera orientation in Manhattan scenes using information from optical and inertial sensors

Evgeny Myasnikov
Geoinformatics and Information Security department, Samara National Research University; Image Processing Systems Institute of RAS - Branch of the FSRC "Crystallography and Photonics" RAS, Samara, Russia
mevg@geosamara.ru

Abstract—In the present paper, the problem of assessing the orientation of a camera is solved under two main limitations. The first limitation is the analysis of Manhattan scenes only. The second one is the presence of an accelerometer in a mobile device. To assess the characteristics of the proposed solution, a data set was prepared containing both photos and accelerometer readings, as well as information about the true orientation of the device. Experimental studies were carried out using the prepared data set.

Keywords—camera orientation, vanishing point, Manhattan scenes, accelerometer, inertial sensor

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

I. INTRODUCTION

Assessing the camera orientation is one of the most important tasks in three-dimensional computer vision. Typically, camera orientation is estimated using calibration patterns, which requires human interaction. For this reason, automatic methods for assessing the orientation are of particular interest.

Despite the presence of various sensors in modern mobile devices, such as an accelerometer, compass, etc., their use for orientation estimation is limited by their low accuracy and susceptibility to noise [1]. For this reason, both optical information and information from the sensors of mobile devices are used to determine the orientation of the camera.

In this paper, we consider a method for assessing the orientation of a camera based on the analysis of the positions of vanishing points [2], i.e. the points in the plane of a perspective image at which the projections of mutually parallel lines of three-dimensional space converge. The problem is solved under two main limitations. The first limitation is the restriction of the class of analyzed scenes to Manhattan scenes [3], in which the lines are aligned along three mutually orthogonal directions. Vivid examples of such scenes are photographs of city buildings (the lines of building facades may possess these characteristics), road scenes (borders of the roadway, markings, poles), and indoor scenes (borders of rooms, furniture lines, decoration elements such as panels and tiles). The second limitation is the presence of an accelerometer in the mobile device.

The orientation of the camera in this paper is determined sequentially in several stages. At the first stage, using the inertial sensor readings, the direction to the first vanishing point, corresponding to the direction of gravity, is determined. After that, the position of the first vanishing point is refined along vertical lines in the optical image. At the second stage, the vanishing points of the horizontal lines of the main and side facades are determined. The found vanishing points, taking into account the data of the inertial sensor, determine the orientation of the camera. The method for determining vanishing points described in this paper is based on the idea described in [4], according to which the search for horizontal vanishing points can be performed along the horizon line defined by a plane orthogonal to the direction of the vertical vanishing point.

Unfortunately, common data sets for the evaluation of vanishing point estimation methods (see, for example, [5]) do not contain information from inertial sensors. For this reason, their use for evaluating methods similar to the one described in this paper is possible only in the mode of sensor emulation, as was done, for example, in [6]. Therefore, to evaluate the characteristics of the proposed solution, we prepared our own data set containing both photos and accelerometer readings, as well as information about the true camera orientation. Experimental studies were carried out using the prepared data set.

It should be noted that the initial implementation of the algorithm for determining vanishing points was previously described in [6]. Thus, in the present work, the previously proposed approach is further developed and studied using the data set prepared as part of the work.

The work is organized as follows. Section 2 describes the developed method for assessing camera orientation. Section 3 describes the modeling technique and presents the experimental studies. The work ends with a conclusion and a list of references.

II. METHOD

As mentioned in the introduction, the described method consists of sequentially determining three vanishing points, followed by finding the orientation of the camera. The general scheme of the method is presented in Fig. 1.

First, preliminary processing of the image received from the camera is performed. In particular, the image is scaled and rotated to within 90 degrees in accordance with the information received from the inertial sensor. If necessary, the vector received from the sensor is transformed so as to correspond to the direction of gravity for the rotated image.

After preliminary processing, contours are extracted from the image by one of the known methods, for example, the Canny method [7]. The extracted contours are traced, and segments of straight lines are searched for. The found segments form the set L, which is used subsequently to find the vanishing points.

Further, the information obtained from the inertial sensor is used for a preliminary assessment of the first vanishing point VP1.
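Under a pinhole camera model, this preliminary estimate amounts to projecting the gravity direction through the camera intrinsics. The following is a minimal sketch of that idea; the function name, the sign convention, and the assumption that the accelerometer axes coincide with the camera axes are ours, not the paper's:

```python
import numpy as np

def preliminary_vp1(K, gravity):
    """Project the (sign-normalized) gravity direction through the
    intrinsic matrix K to obtain a preliminary vertical vanishing point."""
    g = np.asarray(gravity, float)
    g = g / np.linalg.norm(g)
    if g[2] < 0:                      # resolve the sign ambiguity: keep the
        g = -g                        # component pointing away from the camera
    vp = K @ g                        # homogeneous image point of the direction g
    if abs(vp[2]) < 1e-9:             # gravity parallel to the image plane:
        return None                   # VP1 lies at infinity
    return vp[:2] / vp[2]
```

For a camera held level, gravity is nearly parallel to the image plane, so the preliminary VP1 lies far from the image center, which is the usual situation for photographs of building facades.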
It is assumed that the direction to the first vanishing point corresponds, up to a sign, to the gravity vector. In the direction obtained, a set L1 of segments is selected such that the lines corresponding to this set deviate from the direction to VP1 by no more than a predefined angle.

If there are enough selected segments, the first vanishing point is refined by a weighted summation of the points determined by all possible segments from L1; segments of greater length have greater weight. If the assessment of VP1 from L1 is not possible, the initial estimate of VP1 is used for further processing.

Fig. 1. General scheme of the method: pre-processing of the image and inertial sensor readings; extraction and tracing of contours, search for line segments L; preliminary assessment of the first vanishing point VP1 using information from the inertial sensor; formation of the set L1 of segments corresponding to VP1; refinement of the position of VP1 using L1 (if finding VP1 by L1 is possible); estimation of the horizon plane and horizon line Г in the image plane; search for the points pi as intersections of the extracted lines from L' = L \ L1 with the horizon line Г; search for the interval h on Г containing the maximum number of intersection points pi; formation of the set L2 of line segments corresponding to these intersection points; estimation of the second vanishing point VP2 from L2 (or, if that is not possible, from the intersection points pi ∈ h); calculation of the third vanishing point VP3; assessment of the camera orientation R.
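The selection of L1 and the length-weighted refinement can be sketched as follows. This is an illustrative fragment, not the author's implementation: the deviation threshold, the function names, and the use of pairwise segment-line intersections as the candidate points are assumptions:

```python
import numpy as np

def seg_line(p, q):
    """Homogeneous line through segment endpoints p, q (pixels)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def select_l1(segments, vp1, max_dev_deg=5.0):
    """Keep segments whose direction deviates from the direction
    towards the preliminary VP1 by less than max_dev_deg."""
    keep = []
    for p, q in segments:
        d = np.asarray(q, float) - np.asarray(p, float)          # segment direction
        m = (np.asarray(p, float) + np.asarray(q, float)) / 2.0  # segment midpoint
        to_vp = np.asarray(vp1, float) - m                       # direction to VP1
        c = abs(np.dot(d, to_vp)) / (np.linalg.norm(d) * np.linalg.norm(to_vp))
        if np.degrees(np.arccos(np.clip(c, -1.0, 1.0))) < max_dev_deg:
            keep.append((p, q))
    return keep

def refine_vp(segments):
    """Length-weighted average of all pairwise line intersections."""
    pts, w = [], []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            li, lj = seg_line(*segments[i]), seg_line(*segments[j])
            x = np.cross(li, lj)                      # intersection (homogeneous)
            if abs(x[2]) < 1e-9:                      # near-parallel pair, skip
                continue
            wi = np.linalg.norm(np.subtract(*segments[i]))
            wj = np.linalg.norm(np.subtract(*segments[j]))
            pts.append(x[:2] / x[2])
            w.append(wi * wj)
    if not pts:
        return None                                   # fall back to the sensor estimate
    return np.average(pts, axis=0, weights=w)
```

Returning None when no valid intersections exist mirrors the branch in Fig. 1 where the initial sensor-based estimate of VP1 is kept.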
At the next stage, the direction to VP1 is used to determine the horizon plane as the plane passing through the origin of the modeled optical system and orthogonal to the direction to VP1 (the refined gravity vector). In addition to the plane, the horizon line Г is determined as the projection onto the image plane of a line that lies in the horizon plane and is not orthogonal to the image.

Further, for all the lines li ∈ L' extracted in the image, with the exception of the lines used earlier to find the vanishing point VP1 (L' = L \ L1), the intersection points with the horizon line Г are determined. A search is made for a segment h (of a predetermined angular size) on the horizon line into which the maximum number of intersection points pi falls. After this, we form the set L2 of line segments whose intersections pi with the horizon line Г fall into the interval h. If there are enough selected segments, the second vanishing point is estimated by a weighted summation of the intersection points determined by all possible segments from L2. If the estimation of VP2 from L2 is not possible, the weighted sum of the points of intersection of the corresponding lines with the horizon line Г is taken as the position of VP2. In both cases, segments of greater length have greater weight when determining VP2.
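The search for the densest angular interval on the horizon line can be sketched as below. The parametrization of the intersection points by their viewing angle about the principal direction, the window width, and the names are our assumptions for illustration:

```python
import numpy as np

def horizon_angles(xs, focal_px):
    """Viewing angles (degrees) of intersection points along the
    horizon line, measured from the principal direction."""
    return np.degrees(np.arctan(np.asarray(xs, float) / focal_px))

def densest_window(angles_deg, width_deg=4.0):
    """Indices of the largest subset of angles covered by one sliding
    window of the given angular width."""
    order = np.argsort(angles_deg)
    a = np.asarray(angles_deg, float)[order]
    best, lo = [], 0
    for hi in range(len(a)):
        while a[hi] - a[lo] > width_deg:
            lo += 1
        if hi - lo + 1 > len(best):
            best = [int(k) for k in order[lo:hi + 1]]
    return best
```

The segments whose indices are returned would form the set L2, and VP2 would then be a length-weighted average of their intersection points with Г, in the same way as the refinement of VP1.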
After determining two vanishing points, the third is found as a vector orthogonal to the vectors corresponding to the first and second points: V3 = V1 × V2.

After finding the vanishing points, the camera orientation can be found as R = [r1 r2 r3], where R is the rotation matrix and the vectors r1, r2, r3 are calculated as r1 = mK-1VP1, r2 = mK-1VP2, r3 = r1 × r2. Here m is a scale factor and K is the matrix of internal parameters of the camera [8], containing information on the focal length, pixel size, skew, and the shift of the image center relative to the optical axis.

In general, the proposed method is a development of the previously described method [6], the main idea of which [4] is to search for horizontal vanishing points along the horizon line defined by a plane orthogonal to the direction of the vertical vanishing point. Compared with the previous implementation, both the individual steps of the method were changed (the search for segments on the contours is now carried out according to the criterion of maximum deviation; the weighted summation takes into account the lengths of segments of lines that are separated from each other by a sufficient distance; the second vanishing point is refined without using histograms) and the general scheme of the method (it now contains branches that increase the reliability of determining the vanishing points, as well as the actual orientation estimation stage).

An example demonstrating the various stages of the proposed method is shown in Fig. 2.

Fig. 2. An example of the method: a) extracted contours (white) and the set of line segments corresponding to the first vanishing point (blue); b) the horizon line (red) and the set of lines defining the second vanishing point (red); c) directions to the true (dashed lines) and estimated (solid lines) vanishing points.

III. EXPERIMENTS

To study the method described above, we used our own specially prepared data set. This set was collected using the Huawei Honor 9 Lite smartphone [9]. Its camera has a CMOS BSI sensor with an f/2.2 aperture and a focal length of 3.46 mm, and produces a color image of 12.98 MP. To collect the images and inertial sensor data, we developed an Android application that stores both the captured images and a specified number of accelerometer readings recorded prior to the shot.

To obtain information about the true position of the camera, several (from 3 to 7 for each vanishing point) lines were manually selected that reliably determine the directions to the true vanishing points. This procedure was performed at 2x magnification, and the normalized vanishing points obtained using the selected lines were considered the true vanishing points. At the moment, the described data set consists of 40 images of buildings with the corresponding inertial sensor data and true orientation data.

To assess the quality of the developed method, modeling was performed according to the following scheme:
- for each image from the prepared data set, three vanishing points and the camera orientation relative to the building depicted in the photograph were determined;
- using the information about the true position, for each vanishing point the error was calculated as the angular deviation of the direction to the estimated vanishing point from the true direction;
- based on the data obtained for each vanishing point, a histogram of the angular deviations of the found points from their true values was constructed, and the average value of the deviation was also calculated.

The experimental results are shown in Fig. 3. Each histogram in the figure shows the angular deviation of the estimated vanishing point from its true position. In the ideal case, such a histogram should have a single column on the left (the first one), which means the minimum deviation of the vanishing point from the true values for all test images. As can be seen from the figures, in most cases the positions of the three vanishing points were estimated with a deviation of up to 2º, while a deviation exceeding 4º was observed for only 3 of the 40 images.
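The assembly of the orientation matrix from two vanishing points and the angular error used in this evaluation can be sketched as follows. This is an illustrative fragment under the pinhole model: the explicit re-orthogonalization of r2 is our addition (the paper does not describe one), and the names are assumptions:

```python
import numpy as np

def rotation_from_vps(K, vp1, vp2):
    """R = [r1 r2 r3]: back-project the two vanishing points through
    K^-1, normalize (absorbing the scale factor m), and complete the
    basis with a cross product."""
    Kinv = np.linalg.inv(K)
    r1 = Kinv @ np.array([vp1[0], vp1[1], 1.0])
    r2 = Kinv @ np.array([vp2[0], vp2[1], 1.0])
    r1 /= np.linalg.norm(r1)
    r2 -= r1 * np.dot(r1, r2)          # enforce orthogonality to r1
    r2 /= np.linalg.norm(r2)
    r3 = np.cross(r1, r2)
    return np.column_stack([r1, r2, r3])

def angular_error_deg(v_est, v_true):
    """Angle between two direction vectors, insensitive to sign."""
    c = abs(np.dot(v_est, v_true)) / (np.linalg.norm(v_est) * np.linalg.norm(v_true))
    return np.degrees(np.arccos(np.clip(c, 0.0, 1.0)))
```

The sign-insensitive form of the error matches the fact that a vanishing direction is defined only up to a sign.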
The average error values were 1.69º, 1.54º, and 1.88º for the first, second, and third points, respectively.

It should be noted that using only the information from the inertial sensor (see the histogram in Fig. 3(a)) leads to a greater level of errors in determining the direction to the first vanishing point: the average error was 3.7º when using only the inertial sensor versus 1.69º when the estimate was refined with the optical image. Thus, the accuracy of the algorithm can be improved under noisy readings of the gravity vector by tuning the parameters. Another way to increase the accuracy may be to use previously obtained estimates when processing a video stream, which is the subject of future research.

Fig. 3. Estimation of the method quality. Histograms of the angular deviations of the directions to the vanishing points from their true values (in degrees): a) the first vanishing point, estimated by the inertial sensor readings; b) the first vanishing point, refined by the optical image; c) the second vanishing point; d) the third vanishing point.

IV. CONCLUSION

A method for the automatic assessment of the orientation of a camera in Manhattan scenes using information from optical and inertial sensors is proposed and investigated. To study the developed technique, a data set was created containing digital images of buildings, readings of inertial sensors, and information about the true positions of the vanishing points obtained by careful manual marking of the source images. The described method is simple to implement and undemanding of computing resources. Its use reduces the average error in determining the orientation by more than 2 times compared with the inertial sensor alone. As a direction for further work, it is planned to extend the method to assessing the orientation and position of the camera when working with a video stream.

ACKNOWLEDGMENT

The work was partly funded by RFBR according to the research project 17-29-03190-ofi-m in parts «2. Method» - «3. Experiments», and by the Russian Federation Ministry of Science and Higher Education within a state contract with the «Crystallography and Photonics» Research Center of the RAS in parts «1. Introduction» and «4. Conclusion».

REFERENCES

[1] V.V. Myasnikov and E.A. Dmitriev, "The accuracy dependency investigation of simultaneous localization and mapping on the errors from mobile device sensors," Computer Optics, vol. 43, no. 3, pp. 492-503, 2019. DOI: 10.18287/2412-6179-2019-43-3-492-503.
[2] B. Caprile and V. Torre, "Using vanishing points for camera calibration," International Journal of Computer Vision, vol. 4, no. 2, pp. 127-139, 1990.
[3] J.M. Coughlan and A.L. Yuille, "Manhattan World: compass direction from a single image by Bayesian inference," Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 941-947, 1999.
[4] V. Angladon, S. Gasparini and V. Charvillat, "The Toulouse vanishing points dataset," Proceedings of the 6th ACM Multimedia Systems Conference (MMSys '15), Portland, United States, 2015.
[5] P. Denis, J.H. Elder and F. Estrada, "Efficient Edge-Based Methods for Estimating Manhattan Frames in Urban Imagery," Proc. European Conference on Computer Vision, vol. 5303, pp. 197-211, 2008.
[6] E. Myasnikov, "Automatic search for vanishing points on mobile devices," CEUR Workshop Proceedings, vol. 2391, pp. 216-221, 2019.
[7] J. Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, 1986.
[8] R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision," Cambridge: Cambridge University Press, 2004.
[9] Huawei.com, "HONOR 9 Lite," 2020. [Online]. URL: https://consumer.huawei.com/ru/support/phones/honor-9-lite.
Image Processing and Earth Remote Sensing
VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020)