Adaptive algorithm of conforming image matching V A Fursov1,2, Ye V Goshin1,2 and K G Pugachev1 1 Samara National Research University, Moskovskoe Shosse 34А, Samara, Russia, 443086 2 Image Processing Systems Institute of RAS - Branch of the FSRC "Crystallography and Photonics" RAS, Molodogvardejskaya street 151, Samara, Russia, 443001 e-mail: pugachev_k.g@mail.ru Abstract. This paper presents an adaptive algorithm of conforming image matching based on the principle of conformity. The algorithm consists of several main stages. At the first stage, we find the corresponding points using a minimum value of conformity as the measure of points’ similarity. We define a conformity function as the sum of all possible combinations of squared differences of pixel intensity values on the fragments that are matched. Then, we perform an adaptive procedure of errors correction considering an intensity gradient distribution. An important feature of the algorithm is the finding of error points using a criterion of maximum value of samples’ conformity for every fragment of the disparity map. The results of experiments on the "Teddy" test images are shown. 1. Introduction Image matching is the most important and difficult step in computer vision applications, such as image stitching, the creation of three-dimensional models of objects or scenes by a sequence of frames etc. At least three approaches to solve image matching problem are known: area-based (1), feature-based (2), and symbolic (3) approach. In the framework of the area-based approach [1], which is used in this paper, for the fragments of the first image, the most similar fragment is need to be found. The choice of fragments similarity measure is very important, when the area-based approach is used. There are many similarity measures. The absolute differences (AD) and squared differences (SD) and the sum of absolute differences (SAD) are the most commonly used to compute matching cost. Also, the sum of squared differences (SSD), normalized cross correlation (NCC), rank transformation (RT), and census transformation (CT) are the popular methods to aggregate matching cost. In detail, these methods were described in paper [2]. In paper [3] Wang et al. applied the AD algorithm to real-time image matching using GPU. The results show that AD algorithm is not capable of producing a smooth disparity map in highly textured images. To overcome it, Min et al. [4] and Pham and Jeon [5] implemented truncated absolute differences (TAD) algorithm, which uses the colour and gradient values in pixels matching to improve its robustness against variations in illumination. SD algorithm was applied to subpixel image matching by Yang et al. [6]. To improve the flattening of edges and to smooth areas, they used a bilateral filter. In paper [7] Miron et al. researched various matching cost functions. They concluded that the SD algorithm produces the largest error. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) Data Science V A Fursov, Ye V Goshin and K G Pugachev Lee and Sharma [8] implemented the real time image matching algorithm using the SAD algorithm to calculate matching costs via GPU. In paper [9] Gupta and Cho proposed a technique, which uses two different sizes of correlation windows in the SAD algorithm. As denoted in [2], the SAD algorithm is fast, but the quality of the initial disparity map is low because of the noise at object boundaries and in homogeneous regions. In paper [10] Fusiello et al. used the SSD algorithm with multiple fixed window blocks to reduce the incidence of occlusion errors. In paper [11] the SSD algorithm with multiple windows was implemented by Yang and Pollefeys using GPU. Low-dimensional image features matching method using NCC was proposed in [12] by Satoh. This method achieves a high accuracy, but requires a lot of computational resources. Cheng et al. [13] proposed to calculate matching cost using a zero mean normalized cross correlation (ZNCC). An image matching algorithm using the RT algorithm was proposed by Gac et al. [14]. The method provides a reliable initial disparity map by using a careful selection of the window sizes. A new RT algorithm was developed in paper [15] to reduce matching ambiguity using a Bayesian model. This model considers similarity between the first and the second image pixels and the level of ambiguity within each image independently. To improve image matching accuracy, some researchers tried using combinations of various cost functions. For instance, Mei et al. [16] implemented a combination of the AD and CT algorithms to compensate their limitations. They showed that CT algorithms produce incorrect matches in regions with repetitive local structures, and the AD algorithm does not work well in the large regions without texture. The combination of the SAD and CT algorithms led to higher performance, but computational complexity has become higher as well [17]. The SAD and CT cost measures were obtained individually. The final cost function was calculated as a linear combination of SAD and CT cost measures based on a weighting factor. Zhang et al. [18] has improved the matching accuracy by using a combination of SAD and the arm length differences (ALD) methods. Lee et al. [19] used combination of CT and gradient difference methods. However, due to similar or repetitive texture patterns, matching ambiguity can appear in some regions. In paper [20] Perez-Patricio et al. has proposed a fuzzy logic method using the various cost functions simultaneously and decision-making based on membership functions. The method uses both local and global information to reduce errors’ count in homogeneous areas and to preserve the edges near discontinuities. When the area-based approach is used, many points with wrong disparity value can appear in disparity map. In our previous papers we have tried to solve this problem by using a new criterion of similarity. In particular, in papers [21, 22] a new fragments’ similarity criterion was presented. Although we have obtained the results comparable and in some cases are better than results of semi- global block matching, however, this problem was not fully solved. The traditional way of noise removal is smoothing filtering of image representing disparity map. This method can give visually better image, however this smoothing procedure unavoidably distorts into disparity map points with true value of disparity. Consequently, the three-dimensional coordinates of the scene, which are calculated using these points, can also be distorted. So, the problem of filtering procedure creating, which performs error correction without affecting the other points with high reliability, is actual task. 2. Problem statement In this paper we develop an approach based on the principle of fragments conformity. The method was proposed in [21, 22]. Let us state that F1 and F2 are two images, which are obtained by multi-view shooting. For every point of these images we can specify a rectangular fragment with size of K  L , K and L are odd. In general case, K  L. Every fragment of the first image F1 can be presented in the form of vector f1 , obtained by scanning by rows or columns. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 27 Data Science V A Fursov, Ye V Goshin and K G Pugachev Let’s specify rectangular search area with size of N  M on the second image F2 for every point of the first image F1 . For every point in search area we can form vector f 2 with the same size in a similar way. The task is to find corresponding point (central point of f 2 , the most similar for some measure) in the search area of the second image F2 for every point (central point of f1 ) of first image F1 . For certainty, we assume that fragments sizes are odd and the search area is a set of points, which can be used as centres of fragments. For points of the first image with coordinates (k, l) and N  M points in search area in second image vectors of differences are calculated: f (n, m)  f2 (n, m)  f1 (k , l ), n  1, N , m  1, M , (1) where N and M are count of rows and columns of the search area. Then conformity function is calculated as: S W (n, m)   (fi (n, m)  f j (n, m))2 , i , j 1, i j (2) where f i is i th element of vector (1), S  K  L is elements count of f1 and f 2 vectors, n  1, N , m  1, M . Point (n* , m* ) in search area of the second image with minimum value of conformity function is chosen as corresponding point for point (k, l) at the first image: W (n* , m* )  min W (n, m) n , m (3) We have used this criterion in papers [21, 22], where we have shown that this criterion allows improving an accuracy of image matching. The important feature is that all possible combinations of fragments samples differences are used. So, this criterion gives much more information about pixel intensity distribution of image fragment and provides a high reliability of image matching. However, as we mentioned above r, the points with wrong calculated disparity can occur in the areas with low values of gradient function. Proposed filter, which allows overcoming it, is also based on the principle of samples conformity. The main idea is to use the conformity function like (2) to find error point and replace its value by values of neighbouring points. But the points without errors are not will be affected. This method allows avoiding errors in image areas with low values of pixel intensity gradient. 3. Method of error correction We develop an error correction algorithm for areas of disparity map image with low values of pixel intensity gradient. Let us state that Fd is image representing obtained disparity map. Let’s choose fragment of this image with size of P  Q , P and Q are odd. In general case P  Q . For every point  p, q  we calculate value of local conformity with other points of this fragment: S 1 W ( p, q )   ( f ( p, q)  f (i, j)) , i 1, P , 2 j 1,Q i p, j q (4) where f ( p, q), f (i, j ) are values of fragments samples in points ( p, q) and (i, j ) , and S  P * Q . * * Now S functions of local conformity are calculated. We search point ( p , q ) with the maximum value of local conformity and assume it as error point: W ( p* , q* )  max W ( p, q) p , q . (5) V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 28 Data Science V A Fursov, Ye V Goshin and K G Pugachev This statement is bases on an assumption that all points of this fragment are in areas with low value of pixel intensity gradient. So, the three-dimensional coordinates of elements of the same objects can’t be too different from neighbouring points, because the objects don’t contain any tears of surface. To correct the value of error point ( p* , q* ) , an error value is replaced by mean value of variational range of fragment samples. In case, when count of errors in disparity map is low, it’s useful to compare an error value with mean value of variational range. If the difference is low and can be compared with possible differences in coordinates of surface elements, this replacement will not change anything. It can be seen that all points will remain the same. So, the coordinates of surface can be calculated accurately using the values of obtained disparity map. On the fragments with objects edges this procedure will not work because the values of conformity function (4) will be high due to the difference between fragments samples of different objects. So, these image areas should be excluded from consideration. Moreover, this filter work efficiently, when count of error points on the fragment is low. In view of the above, the next section proposes a multistage technology with adaptation to the features of the various parts of the disparity map, ensuring the efficient correction of error points. 4. Overall technology of disparity map forming The technology consists of several main stages: 1. Disparity map forming using a method of conforming estimates; 2. Preliminary disparity image processing, single errors’ correction; 3. Object edges detection and binary template construction; 4. Error points correction on the areas with the low value of gradient function (on the objects surface). Let us describe briefly these stages of technology. At the first stage we form disparity map using method and algorithm which are described in [21, 22]. The main idea of this method was described in Section 2. Preliminary disparity image processing is performed to correct values of single error points. The algorithm is implemented by several iterations (by rows and columns) on the image. For each point of the image difference from neighbouring points (by vertical or horizontal) and the difference between neighbouring points are calculated. If the first difference is higher than threshold value and the second value is low, this point is chosen as an error point. Then, the value of the central pixel is replaced by half-sum of neighbouring pixels values. Similar procedure can be used to correct double, triple and etc. errors. To construct a binary template with detected edges, various methods can be used to find the differences, similarities [23] and image segmentation [24]. In this paper we use the simplest and widely used approach. We use a median filter to blur images and Sobel operator to form the field of gradients. Then we perform the contrasting and binarization procedures. At the last stage, we perform iterative error correction in the areas of disparity map with low value of gradient. We divide disparity image into disjoint fragments with size of P  Q and iteratively perform an error correction using method, which is described in Section 3. If the fragment contains objects edges, an error correction is not performed. When the fourth stage is performed, some clusters of failed points can be “transformed” into the single error points. Therefore, the second stage correction can be repeated without loss of accuracy. 5. Results of experiment The described technology was used to form a disparity map for a set of multi-view test images «Cones» [25]. Test images are presented at Figure 1. For these test images, the disparity map shown at Figure 2a was obtained using method and algorithm described in [21, 22]. Search window with the size of 5  7 and search area with the size of 1 65 were used. Then median intensity of pixels was set as 128. It can be seen that disparity map contains many points with the wrong disparity. The most of them, as expected, are in the area without any local features (textures, symbols etc.). V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 29 Data Science V A Fursov, Ye V Goshin and K G Pugachev Figure 2b shows a disparity map obtained after several iterations of single, double and triple error’s removal (Stage 2). Also, correction of big areas with errors was performed. We can see, that this disparity map is still consists of many errors. Moreover, most of these points are contained in the areas of the test images with a small value of gradient. a) b) a) left image; b) right image Figure 1. Test images. a) b) a) Obtained disparity map (first stage); b) Processed disparity map (second stage) Figure 2. Disparity maps. Figure 3a shows disparity map obtained using our error correction algorithm, which is described in Section 3. To compare the obtained results with the other methods and algorithms, we have constructed a disparity map using semi-global block matching method [26]. We have used the implementation of this method from OpenCV library [27] with parameters: 1) minDisparity=0; 2) numberOfDisparities=80; 3) SADWindowSize=5; 4) disp12MaxDiff=1; 5) preFilterCap=31; 6) uniquenessRatio=5; 7) speckleWindowSize=5; 8) speckleRange=5. Disparity map obtained by semi-global block matching is shown at Figure 3b. Figure 3c shows the ground-truth disparity map. It can be seen that semi-global matching smooth errors. However, this method also smooths edges of objects. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 30 Data Science V A Fursov, Ye V Goshin and K G Pugachev a) b) c) a) Filtered disparity map (fourth stage); b) Disparity map, obtained using Semi-Global Block Matching; c) Ground-truth disparity map Figure 3. Disparity maps. Accuracy comparison for semi-global matching method and our performed technology was performed. For both methods an error of image matching was calculated using the formula:  1 N ,M 1    2  NM i , j 1 I i , j I i , j . Calculation was performed for points, where disparity value is known. So, for our proposed method error value is 5,22 before processing. After processing error value became 3,98. For semi-global matching error value is equal to 4,00. We can see that proposed method of error correction can significantly reduce the number of errors in the disparity map. 6. Conclusion A new method and algorithm of error correction on the areas with low value of pixel intensity gradient are proposed. The important feature is that an algorithm allows removing only point with the wrong disparity value. The other points of fragment are not affected. This feature is very useful for disparity maps formation, which affects the three-dimensional models calculation of coordinates. The constructed technology, including this algorithm, has a high accuracy. 7. References [1] Guk A P and Altsyntsev M A 2017 Automatic identification of corresponding points for aerial images of forest areas Vestnik of SSUGT 22 68-77 [2] Rostam A H and Haidi I 2016 Literature Survey on Stereo Vision Disparity Map Algorithms J. Sens. 8742920 V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 31 Data Science V A Fursov, Ye V Goshin and K G Pugachev [3] Wang L, Gong M, Gong M and Yang R 2006 How far can we go with local optimization in real-time stereo matching Proc. 3rd Int. Symp. 3D Data Proc. Vis. Trans. (Chapel Hill, NC, USA/ IEEE) 129-136 [4] Min D, Lu J and Do M N 2011 A revisit to cost aggregation in stereo matching: how far can we reduce its computational redundancy? Proc. IEEE Int. Conf. Comp.Vis. (ICCV ’11) (Barcelona, Spain/IEEE) 1567-1574 [5] Pham C C and Jeon J W 2013 Domain transformation-based efficient cost aggregation for local stereo matching IEEE Trans. Circ. Sys. Vid. Tech. 23 1119-1130 [6] Yang Q, Yang R, Davis J and Nistrer D 2007 Spatial-depth super resolution for range images Proc. IEEE Conf. Comp. Vis. Pat. Rec. (CVPR) (Minneapolis, Minn, USA/IEEE) 1-8 [7] Miron A, Ainouz S, Rogozan A and Bensrhair A 2014 A robust cost function for stereomatching of road scenes Pat. Rec. Let. 38 70-77 [8] Lee S H and Sharma S 2011 Real-time disparity estimation algorithm for stereo camera systems IEEE Trans. Cons. Electr. 57 1018-1026 [9] Gupta R K and Cho S Y 2013 Window-based approach for fast stereo correspondence IET Comp. Vis. 7 123-134 [10] Fusiello A, Castellani U and Murino V 2001 Relaxing symmetric multiple windows stereo using Markov Random Fields Computer Vision and Pattern Recognition of Lecture Notes in Computer Science 2134 91-105 [11] Yang R and Pollefeys M 2003 Multi-resolution real-time stereo on commodity graphics hardware Proc. IEEE Comp. Soc. Conf. Comp. Vis. Patt. Rec. 1 211-217 [12] Satoh S 2011 Simple low-dimensional features approximating NCC-based image matching Pat. Rec. Let. 32 1902-1911 [13] Cheng F, Zhang H, Yuan D and Sun M 2014 Stereo matching by using the global edge constraint Neurocomp. 131 217-226 [14] Gac N, Mancini S, Desvignes M and Houzet D 2008 High speed 3D tomography on CPU, GPU, and FPGA EURASIP J. Emb. Sys. 930250 [15] Zhao G, Du Y K and Tang Y D 2011 Anew extension of the rank transform for stereo matching Adv. Eng. Forum 2-3 182-187 [16] Mei X, Sun X, Zhou M, Jiao S, H. Wang and Zhang X 2011 On building an accurate stereo matching system on graphics hardware Proc. IEEE Int. Conf. Comp. Vis. Work. (Barcelona, Spain/IEEE) 467-474 [17] Cigla C and Alatan A A 2013 Information permeability for stereo matching Sig. Proc.: Im. Com. 28 1072-1088 [18] Zhang N, Wang H and Cr J 2013 A near real-time color stereo matching method for GPU Proc. 3rd Int. Conf. on Adv. Comm. Comp. (Lisbon, Portugal/ IARIA) 27-32 [19] Lee S, Lee J H, Lim J and Suh I H 2015 Robust stereo matching using adaptive random walk with restart algorithm Im. Vis. Comp. 37 1-11 [20] Pérez-Patricio M, Aguilar-González A, Camas-Anzueto J-L and Arias-Estrada M 2015 A Fuzzy Logic Approach for Stereo Matching Suited for Real-Time Proc. Int. J. Comp. App. 113 9 [21] Fursov V A, Gavrilov A V, Goshin Ye V and Pugachev K G 2018 The technology of image matching by the criterion of conformity of image fragments samples Inf. Tech. and Nanotech. (Samara, Russia/ New Engineering Ltd.) 2299-2305 [22] Fursov V A, Gavrilov A V, Goshin Ye V and Pugachev K G 2018 The technology of image matching by the criterion of conformity of image fragments samples J. Phys.: Conf. Ser. 1096 012084 [23] Plotnikov D E, Kolbudaev P A and Bartalev S A 2018 Identification of dynamically homogeneous areas with time series segmentation of remote sensing data Computer Optics 42(3) 447-456 DOI: 10.18287/2412-6179-2018-42-3-447-456 [24] Lebedev M A, Rubis A Yu, Vizilter Yu V and Vigolov O V 2018 Detecting image differences based on reference EMD-filters Computer Optics 42(2) 291-296 DOI: 10.18287/2412-6179- 2018-42-2-291-296 V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 32 Data Science V A Fursov, Ye V Goshin and K G Pugachev [25] University of Tsukuba Computer Vision Laboratory URL: http://www.cvlab.cs.tsukuba.ac.jp/ dataset/tsukubastereo.php [26] Hirschmüller H 2008 Accurate and efficient stereo processing by semi-global matching and mutual information CVPR, PAMI 30(2) 328-341 [27] OpenCV Library URL: http://opencv.org Acknowledgments The work was funded by the Russian Federation Ministry of Education and Science (project # 2.891.2017) and RFBR (projects # 17-29-03112). V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 33