=Paper=
{{Paper
|id=Vol-2623/paper6
|storemode=property
|title=Effective Object Localization in Images by Calculating Ratio and Distance Between Pixels
|pdfUrl=https://ceur-ws.org/Vol-2623/paper6.pdf
|volume=Vol-2623
|authors=Rykhard Bohush,Sergey Ablameyko,Yahor Adamovskiy,Dmitry Glukhov
|dblpUrl=https://dblp.org/rec/conf/intelitsis/BohushAAG20
}}
==Effective Object Localization in Images by Calculating Ratio and Distance Between Pixels==
Rykhard Bohush¹ [0000-0002-6609-5810], Sergey Ablameyko² [0000-0001-9404-1206], Yahor Adamovskiy¹ [0000-0003-1044-8741], Dmitry Glukhov¹ [0000-0003-4983-2919]

¹ Polotsk State University, Blokhina st. 29, Novopolotsk, 211440, Republic of Belarus
{r.bogush, adamovskiy.y, d.gluhov}@psu.by
² Belarusian State University, Nezavisimosti avenue 4, Minsk, Republic of Belarus
ablameyko@bsu.by

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IntelITSIS-2020.

Abstract. In this paper, two novel similarity functions for object localization in images are presented; they take into account both the spatial and the brightness relations between pixels. We explore the advantages of our functions and compare them to others that use only the spatial connection between pixels. It is shown that one of them is robust to a linear change in the pixel brightness levels of the compared images. A comparison of the computational cost and the shifted-object localization accuracy of our similarity functions against existing ones is given. The presented experimental results confirm the effectiveness of the proposed approach for object localization.

Keywords: similarity functions, object localization, distance

1 Introduction

Object localization is one of the critical tasks of computer vision. It arises in applied problems such as the automatic search for defects in images in industrial and medical diagnostics, the search for specified objects in inquiry and communication systems, the discovery and localization of benchmarks in satellite images of the Earth's surface, the establishment of correspondence between the conjugate points of two or more images during their binding, the tracking of targets in airborne radar systems, etc. [1, 2]. For this reason, many works are devoted to the development of methods for the localization of objects in static images and video sequences. Localization is notably more challenging than image classification because it involves the generation of precise object locations [3]. Furthermore, the magnitude of the problem increases manifold in real-world situations where objects vary in position, scale and appearance and are surrounded by cluttered scenes.

Unfortunately, there currently exists no optimal distance measure or similarity function whose application would provide the maximum efficiency for object localization by different features in images. Therefore, traditional similarity metrics are improved or new functions are introduced as the applications of image processing expand, and different similarity functions are selected for different applications [4, 5]. Image recognition applications require measures that are more robust to shift and rotation, whereas registration and tracking applications require better localization and noise tolerance.

In the general case, for object detection and localization in an image, the similarity value is calculated between the object's features and the features of all image fragments. The decision about the presence of the object is made by comparing these values with a threshold: if the comparison condition is met, the fragment is judged to correspond to the reference. Obviously, precise localization requires that the similarity function value exceed the threshold only at the correct location of the object.
However, to keep the probability of missing an object at a minimum, it is necessary to decrease the threshold value; this, in turn, increases the number of detected image fragments that do not correspond to the sought object, including fragments near its correct location in the image. For this reason, the next step consists in determining the precise coordinates of the object's location. Inaccuracy complicates the precise estimation of the coordinates of the true location, because the function value exceeds the threshold both for the reference position and for a set of neighbouring fragments. Ambiguity is characterized by the similarity function exceeding the threshold for a limited number of image fragments that essentially overlap each other at a certain distance between their centres. Therefore, the localization accuracy also depends on the similarity function.

We propose new similarity functions for object detection and localization in images and video based on calculating the ratio and the distance between pixels. These functions can be used with arbitrary image features and produce a normalized similarity value. It is shown that they make it possible to improve the exact localization of objects. One of the functions is robust to a linear change of the analysed features.

2 Similarity functions based on ratios or spatial relations between the pixels

To compare two images $O = \{o_{ij}\}$ and $B = \{b_{ij}\}$ of size $N \times N$, several similarity functions are often used. There exist known functions that calculate the ratios of compared features and are applied to estimate similarity in other applied areas, e.g., environmental science [6]. Calculating the relationship between descriptors emphasizes local differences better than subtraction does. Among such functions are Wave Hedges, Ruzicka, and Czekanowski.

The Wave Hedges function obtains an unnormalized value by calculating the ratio between the minimum and maximum values of all pairs, with further summation of the obtained results over the entire space of compared features, and can be calculated for a pair of images as:

$$S_{WH} = \sum_{i=1}^{N}\sum_{j=1}^{N}\left(1 - \frac{\min(o_{ij}, b_{ij})}{\max(o_{ij}, b_{ij})}\right) \qquad (1)$$

The Ruzicka function differs from the previous one in that the ratio is calculated not between each pair of compared features, but only once, after the summation of their minimum and maximum values:

$$S_{Rzk} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N}\min(o_{ij}, b_{ij})}{\sum_{i=1}^{N}\sum_{j=1}^{N}\max(o_{ij}, b_{ij})} \qquad (2)$$

A specific trait of the Czekanowski function is the calculation of differences and sums for the corresponding pairs of features, with further summation of the obtained results and the estimation of the ratio between their final values. For compared images, this function can be written as:

$$S_{Cz} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N}\left|o_{ij} - b_{ij}\right|}{\sum_{i=1}^{N}\sum_{j=1}^{N}\left(o_{ij} + b_{ij}\right)} \qquad (3)$$

In [7], two similarity functions based on calculating the relationship between the minimum and maximum values for all pairs of analysed features are presented:

─ the normalized minimax additive similarity p-function (MMADDP):

$$S_{MMADD}^{p} = \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{\min(o_{ij}^{p}, b_{ij}^{p})}{\max(o_{ij}^{p}, b_{ij}^{p})}, \quad p \in \mathbb{Z},\; p \geq 1 \qquad (4)$$

─ the normalized zero-mean minimax additive similarity p-function (ZMMADDP):

$$S_{ZMMADD}^{p} = \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}
\begin{cases}
\left(\dfrac{b_{ij} - \bar{b}}{o_{ij} - \bar{o}}\right)^{p}, & \text{if } \left|o_{ij} - \bar{o}\right| \geq \left|b_{ij} - \bar{b}\right| \\[2ex]
\left(\dfrac{o_{ij} - \bar{o}}{b_{ij} - \bar{b}}\right)^{p}, & \text{if } \left|o_{ij} - \bar{o}\right| < \left|b_{ij} - \bar{b}\right|
\end{cases} \qquad (5)$$

where $\bar{o}$ is the mean value of image $O$ and $\bar{b}$ is the mean value of image $B$. Searching for a relationship requires determining which attribute is the minimum and which is the maximum; for this reason, these functions are called normalized minimax similarity functions. High resistance to noise is achieved through the use of summation when obtaining the complex normalized value.
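To make the definitions concrete, here is a minimal NumPy sketch of equations (1)–(5) for equal-size greyscale images. The function names and the zero-division guard (a 0/0 ratio is treated as 1, i.e., identical pixels) are our conventions, not part of the original definitions.

```python
import numpy as np

def _safe_ratio(num, den):
    """Elementwise num/den; a 0/0 pair is treated as a perfect match (ratio 1)."""
    den = den.astype(float)
    return np.divide(num, den, out=np.ones_like(den), where=den != 0)

def wave_hedges(o, b):
    """Eq. (1): unnormalized sum of 1 - min/max over all pixel pairs."""
    return float(np.sum(1.0 - _safe_ratio(np.minimum(o, b), np.maximum(o, b))))

def ruzicka(o, b):
    """Eq. (2): a single ratio of the summed minima to the summed maxima."""
    return np.minimum(o, b).sum() / np.maximum(o, b).sum()

def czekanowski(o, b):
    """Eq. (3): ratio of the summed absolute differences to the summed values."""
    o, b = o.astype(float), b.astype(float)  # avoid uint8 overflow in o + b
    return np.abs(o - b).sum() / (o + b).sum()

def mmadd(o, b, p=1):
    """Eq. (4): normalized minimax additive similarity p-function (MMADD^p)."""
    op, bp = o.astype(float) ** p, b.astype(float) ** p
    return float(np.mean(_safe_ratio(np.minimum(op, bp), np.maximum(op, bp))))

def zmmadd(o, b, p=1):
    """Eq. (5): zero-mean variant (ZMMADD^p); per pixel, the smaller centred
    deviation is divided by the larger one, so each ratio lies in [-1, 1]."""
    do, db = o - o.mean(), b - b.mean()
    swap = np.abs(do) >= np.abs(db)
    num, den = np.where(swap, db, do), np.where(swap, do, db)
    return float(np.mean(_safe_ratio(num, den) ** p))
```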
To decrease the sensitivity to deformation and displacement of images, it is reasonable to take into account not only the brightness levels of corresponding pixels but also the spatial distance between them. The authors of [8] propose an approach in which a function of the distance between all pixels is calculated and used to determine the Euclidean distance for compared images (here the images are vectorized, and the indices run over all $N^2$ pixels):

$$S_{IMED} = 1 - \sum_{i=1}^{N^{2}}\sum_{j=1}^{N^{2}} g_{ij}\,(o_i - b_i)(o_j - b_j) \qquad (6)$$

where $G = \{g_{ij}\}$ is a symmetric matrix of size $N^{2} \times N^{2}$ found by calculating the distances $d$ between pixels $P_i$ and $P_j$; its elements monotonically decrease with increasing distance and are determined by the Gaussian function:

$$g_{ij} = \frac{1}{2\pi\sigma^{2}}\exp\left(-\frac{d(P_i, P_j)^{2}}{2\sigma^{2}}\right) \qquad (7)$$

It is obvious that this approach appreciably increases the computational effort. With this distance measure, a smaller deformation causes a smaller change in the distance. In [9], it is proposed to take the spatial distances between pixels into account when finding the normalized correlation function:

$$S_{IMNCC} = \frac{\sum_{i=1}^{N^{2}}\sum_{j=1}^{N^{2}} g_{ij}\, o_i b_j}{\sqrt{\sum_{i=1}^{N^{2}}\sum_{j=1}^{N^{2}} g_{ij}\, o_i o_j}\;\sqrt{\sum_{i=1}^{N^{2}}\sum_{j=1}^{N^{2}} g_{ij}\, b_i b_j}} \qquad (8)$$

The normalized averaged (zero-mean) correlation function with consideration for the distance between pixels is:

$$S_{IMZNCC} = \frac{\sum_{i=1}^{N^{2}}\sum_{j=1}^{N^{2}} g_{ij}\,(o_i - \bar{o})(b_j - \bar{b})}{\sqrt{\sum_{i=1}^{N^{2}}\sum_{j=1}^{N^{2}} g_{ij}\,(o_i - \bar{o})(o_j - \bar{o})}\;\sqrt{\sum_{i=1}^{N^{2}}\sum_{j=1}^{N^{2}} g_{ij}\,(b_i - \bar{b})(b_j - \bar{b})}} \qquad (9)$$

The IMED, IMNCC and IMZNCC functions are smoother in comparison with the basic ones.

3 New similarity functions based on ratio and distance

A distinctive trait of the two proposed similarity functions is the calculation of the ratios between the minimum and maximum values for all pixel pairs together with the calculation of the distance between pixels. We use locally weighted normalized ratios for each pixel pair, where the weight decreases with increasing distance from the current pixel. If we change the elements of the matrix $G$, the influence of the distances on the similarity result will also change. The image normalized minimax additive similarity function (IMMMADDP) is then calculated as:

$$S_{IMMMADD}^{p} = W^{-1}\sum_{i=1}^{N^{2}}\sum_{j=1}^{N^{2}} g_{ij}\,\frac{\min(o_i^{p}, b_i^{p})\,\min(o_j^{p}, b_j^{p})}{\max(o_i^{p}, b_i^{p})\,\max(o_j^{p}, b_j^{p})} \qquad (10)$$

where $g_{ij}$ is defined as in IMED, and $W$ is the normalizing coefficient determined as:

$$W = \sum_{i=1}^{N^{2}}\sum_{j=1}^{N^{2}} g_{ij} \qquad (11)$$

The normalized minimax averaged additive similarity function (IMZMMADDP), taking the distances between features into account, is calculated as:

$$S_{IMZMMADD}^{p} = W^{-1}\sum_{i=1}^{N^{2}}\sum_{j=1}^{N^{2}} g_{ij}
\begin{cases}
\left(\dfrac{b_i - \bar{b}}{o_i - \bar{o}}\cdot\dfrac{b_j - \bar{b}}{o_j - \bar{o}}\right)^{p}, & \text{if } |o_i - \bar{o}| \geq |b_i - \bar{b}| \;\&\; |o_j - \bar{o}| \geq |b_j - \bar{b}| \\[2ex]
\left(\dfrac{b_i - \bar{b}}{o_i - \bar{o}}\cdot\dfrac{o_j - \bar{o}}{b_j - \bar{b}}\right)^{p}, & \text{if } |o_i - \bar{o}| \geq |b_i - \bar{b}| \;\&\; |b_j - \bar{b}| > |o_j - \bar{o}| \\[2ex]
\left(\dfrac{o_i - \bar{o}}{b_i - \bar{b}}\cdot\dfrac{o_j - \bar{o}}{b_j - \bar{b}}\right)^{p}, & \text{if } |b_i - \bar{b}| > |o_i - \bar{o}| \;\&\; |b_j - \bar{b}| > |o_j - \bar{o}| \\[2ex]
\left(\dfrac{o_i - \bar{o}}{b_i - \bar{b}}\cdot\dfrac{b_j - \bar{b}}{o_j - \bar{o}}\right)^{p}, & \text{if } |b_i - \bar{b}| > |o_i - \bar{o}| \;\&\; |o_j - \bar{o}| \geq |b_j - \bar{b}|
\end{cases} \qquad (12)$$

The presented functions are universal, as they produce a normalized value in the range from 0 to 1 for IMMMADDP and from -1 to 1 for IMZMMADDP for arbitrarily selected features of a pair of images.
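The following NumPy sketch shows one way the proposed functions (10)–(12) could be implemented, assuming row-major vectorized images of identical size. The constant factor 1/(2πσ²) of eq. (7) is dropped when building G, since it cancels after division by W (the worked example in the next section also has diagonal entries equal to 1); the per-pixel ratio r collapses the four cases of eq. (12) into a single expression.

```python
import numpy as np

def gaussian_pixel_matrix(h, w, sigma=1.0):
    """Matrix G of eq. (7) for an h-by-w pixel grid, without the 1/(2*pi*sigma^2)
    factor (it cancels after division by W): g_ij = exp(-d(P_i, P_j)^2 / (2*sigma^2))."""
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def immmadd(o, b, g, p=1):
    """Eq. (10): IMMMADD^p; the per-pair term factorizes as r_i * r_j with r = min/max."""
    op, bp = o.ravel().astype(float) ** p, b.ravel().astype(float) ** p
    mx = np.maximum(op, bp)
    r = np.divide(np.minimum(op, bp), mx, out=np.ones_like(mx), where=mx != 0)
    return float((g * np.outer(r, r)).sum() / g.sum())  # W = g.sum(), eq. (11)

def imzmmadd(o, b, g, p=1):
    """Eq. (12): IMZMMADD^p; the four cases reduce to one ratio per pixel, where
    the smaller centred deviation is always divided by the larger one."""
    do = o.ravel() - o.mean()
    db = b.ravel() - b.mean()
    swap = np.abs(do) >= np.abs(db)
    num, den = np.where(swap, db, do), np.where(swap, do, db)
    r = np.divide(num, den, out=np.ones_like(den), where=den != 0)
    return float((g * np.outer(r ** p, r ** p)).sum() / g.sum())
```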
As an example, two pixel sets are considered: an image B of size 4×4 and its fragment O of size 3×4 (three columns by four rows):

O =
120 120 120
120 120 120
100 100 100
 80  80  80

B =
120 120 120 140
120 120 120 140
100 100 100 120
 80  80  80 100

For these data there are two alignments: O coincides with the first (leftmost) subimage of B, or with the second subimage obtained by a relative displacement of one pixel. Thus, in the first case the maximum value of the functions is expected; in the second case the function value shows its resistance to displacement.

For this object O, the matrix G is an array of 16×16 elements, i.e., the square of the larger side of the object O. Its values correspond to eq. (7) with σ = 1 and the constant factor omitted (the diagonal entries equal 1; the constant cancels after division by W in any case):

G =
1     0.606 0.135 0.011 0.606 0.367 0.082 0.006 0.135 0.082 0.018 0.001 0.011 0.006 0.001 10⁻⁴
0.606 1     0.606 0.135 0.367 0.606 0.367 0.082 0.082 0.135 0.082 0.018 0.006 0.011 0.006 0.001
0.135 0.606 1     0.606 0.082 0.367 0.606 0.367 0.018 0.082 0.135 0.082 0.001 0.006 0.011 0.006
0.011 0.135 0.606 1     0.006 0.082 0.367 0.606 0.001 0.018 0.082 0.135 10⁻⁴ 0.001 0.006 0.011
0.606 0.367 0.082 0.006 1     0.606 0.135 0.011 0.606 0.367 0.082 0.006 0.135 0.082 0.018 0.001
0.367 0.606 0.367 0.082 0.606 1     0.606 0.135 0.367 0.606 0.367 0.082 0.082 0.135 0.082 0.018
0.082 0.367 0.606 0.367 0.135 0.606 1     0.606 0.082 0.367 0.606 0.367 0.018 0.082 0.135 0.082
0.006 0.082 0.367 0.606 0.011 0.135 0.606 1     0.006 0.082 0.367 0.606 0.001 0.018 0.082 0.135
0.135 0.082 0.018 0.001 0.606 0.367 0.082 0.006 1     0.606 0.135 0.011 0.606 0.367 0.082 0.006
0.082 0.135 0.082 0.018 0.367 0.606 0.367 0.082 0.606 1     0.606 0.135 0.367 0.606 0.367 0.082
0.018 0.082 0.135 0.082 0.082 0.367 0.606 0.367 0.135 0.606 1     0.606 0.082 0.367 0.606 0.367
0.001 0.018 0.082 0.135 0.006 0.082 0.367 0.606 0.011 0.135 0.606 1     0.006 0.082 0.367 0.606
0.011 0.006 0.001 10⁻⁴ 0.135 0.082 0.018 0.001 0.606 0.367 0.082 0.006 1     0.606 0.135 0.011
0.006 0.011 0.006 0.001 0.082 0.135 0.082 0.018 0.367 0.606 0.367 0.082 0.606 1     0.606 0.135
0.001 0.006 0.011 0.006 0.018 0.082 0.135 0.082 0.082 0.367 0.606 0.367 0.135 0.606 1     0.606
10⁻⁴ 0.001 0.006 0.011 0.001 0.018 0.082 0.135 0.006 0.082 0.367 0.606 0.011 0.135 0.606 1

The normalizing coefficient is W = 67.285.

The function value for the second subimage should differ as much as possible from the maximum value; for this example (p = 1): IMED = 0.972; IMNCC = 0.999; IMZNCC = 0.909; MMADDP = 0.9456; ZMMADDP = 0.465; IMMMADDP = 0.906; IMZMMADDP = 0.23. We see that the proposed similarity functions allow us to obtain the lowest similarity values when the object in the image is shifted by one pixel.
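For reference, the matrix above and the coefficient W can be regenerated with the gaussian_pixel_matrix() sketch from Section 3 (σ = 1; the printed values above are rounded):

```python
import numpy as np

# Regenerate the 16x16 example matrix G for a 4x4 pixel grid with sigma = 1,
# reusing gaussian_pixel_matrix() from the sketch in Section 3.
g = gaussian_pixel_matrix(4, 4, sigma=1.0)

print(np.round(g[0], 4))         # first row: 1.0, 0.6065, 0.1353, 0.0111, 0.6065, ...
print(round(float(g.sum()), 3))  # normalizing coefficient W = 67.285, cf. eq. (11)
```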
Table 1 compares the computational costs of the similarity functions for images of size N × N pixels. It shows that taking the spatial distance between features into account leads to an appreciable increase in computational expenditure, from O(N²) to O(N⁴), for all the functions that use this approach.

Table 1. Computational costs in estimating the similarity between two images

| Function type | Addition/subtraction operations | Multiplication/division operations | Comparison operations |
|---|---|---|---|
| Wave Hedges | N²+N(N−1) | N² | N² |
| Ruzicka | 2N(N−1) | N² | − |
| Czekanowski | 2N(N−1) | 1 | N² |
| IMED | 2N²+N²(N²−1)+1 | 2N⁴+1 | − |
| IMNCC | 3N²(N²−1) | 6N⁴+3 | − |
| IMZNCC | 3N²(N²−1)+6N² | 6N⁴+7 | − |
| MMADDp (p=2) | N(N−1) | 3N²+2 | N² |
| ZMMADDp (p=1) | 2N²+3N(N−1) | N²+4 | N² |
| IMMMADDp (p=1) | N²(N²−1) | 2N⁴+2N²+3 | 2N² |
| IMZMMADDp (p=1) | N²(N²−1)+2N(N−1) | 2N⁴+2N²+6 | N² |

4 Experimental results

For the analytical assessment of the similarity functions when detecting and localizing objects in an image, the following parameters are used (a sketch of their computation is given after the list):

─ the function value calculated for the object and the subimage (A, the main peak); the value should tend to 1;
─ the main peak variance (D_A); the value should tend to zero, meaning smaller deviations of A from the expected value;
─ the amplitude coefficient (Q); this parameter determines the maximum amplitude outlier of the similarity function with respect to its root-mean-square value S_rms when comparing the object O with all subimages:

$$Q = \frac{S_{max}}{S_{rms}} \qquad (13)$$

where S_max is the maximum value of the similarity function;
─ the maximum function value over all side peaks (S_L), which can be used to determine the threshold T; it should be less than T;
─ the maximum side peak variance (D_SL), which should also be as small as possible;
─ the number of side peaks at levels higher than 0.95 (N_SL); this parameter can be used to evaluate the possible number of false-positive detections if the threshold value is selected incorrectly.
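A possible computation of these parameters from a similarity map is sketched below; the brute-force window scan and the exclusion window that separates the main peak from the side peaks are our assumptions, not prescriptions from the paper.

```python
import numpy as np

def similarity_map(image, obj, sim, **kw):
    """Evaluate a similarity function `sim` between the object and every subimage."""
    H, W = image.shape
    h, w = obj.shape
    s = np.empty((H - h + 1, W - w + 1))
    for y in range(s.shape[0]):
        for x in range(s.shape[1]):
            s[y, x] = sim(obj, image[y:y + h, x:x + w], **kw)
    return s

def peak_statistics(s, exclude=1, level=0.95):
    """Q = S_max / S_rms (eq. (13)), the maximum side-peak value S_L, and the
    side-peak count N_SL above `level`. Side peaks are all map values outside a
    small window around the main peak; the half-size `exclude` is our convention."""
    q = s.max() / np.sqrt(np.mean(s ** 2))
    y, x = np.unravel_index(np.argmax(s), s.shape)
    side = s.copy()
    side[max(0, y - exclude):y + exclude + 1, max(0, x - exclude):x + exclude + 1] = -np.inf
    return q, side.max(), int(np.sum(side >= level))
```

For example, s = similarity_map(img, tpl, immmadd, g=gaussian_pixel_matrix(15, 15), p=2) builds the map for a 15×15 template, the size used in the experiments below, after which peak_statistics(s) returns Q, S_L and N_SL.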
All parameters were calculated for 20 different images of size 150×150 without noise or distortion. For each of them, 20 reference objects of size 15×15 were used, and the obtained values were averaged. Fig. 1 shows some examples of images and objects. The values A = 1 and D_A = 0 were obtained for all similarity functions. The resulting values of the remaining parameters are given in Table 2.

Table 2. Function characteristics

| Function type | Q | S_L | D_SL | N_SL |
|---|---|---|---|---|
| Wave Hedges | 1.51715 | 0.859 | 0.00055 | 277 |
| Ruzicka | 1.52919 | 0.85987 | 0.00077 | 256 |
| Czekanowski | 1.31457 | 0.86406 | 0.00152 | 1644 |
| IMED | 1.19798 | 0.94581 | 0.00027 | 554 |
| IMNCC | 1.06136 | 0.9921 | 0.00005 | 10396 |
| IMZNCC | 4.20704 | 0.86102 | 0.00293 | 3 |
| MMADDp (p=2) | 2.16488 | 0.64973 | 0.00496 | 263 |
| ZMMADDp (p=1) | 11.54347 | 0.47403 | 0.01489 | 6 |
| IMMMADDp (p=2) | 2.22317 | 0.7404 | 0.00133 | 63 |
| IMZMMADDp (p=1) | 7.63841 | 0.3079 | 0.01148 | 337 |

Fig. 1. Some test images: (a–c) input images; (d–f) object templates.

The dependences of the similarity value on the shift of the object with respect to the central location for the studied functions are shown in Fig. 2. These values were found by averaging the results for 10 templates from the image shown in Fig. 1a.

Fig. 2. Dependence of the similarity function values on the shift of objects.

The analysis of Table 2 and Fig. 2 shows that the proposed IMMMADDP and IMZMMADDP functions have the best characteristics for accurately determining object coordinates in images. For them, the amplitude coefficient is among the largest of all the functions, and the maximum side-peak value is the smallest. The other parameters are satisfactory and less important. This means that the function level is less likely to exceed the specified threshold at incorrect locations, and localization will be more accurate.

Some 2D similarity functions for the image in Fig. 1a and the object in Fig. 1d are presented in Fig. 3.

Fig. 3. Similarity between the object and all subimages for Fig. 1a and 1d based on: (a) the Czekanowski function; (b) the Wave Hedges function; (c) MMADDP; (d) IMMMADDP; (e) ZMMADDP; (f) IMZMMADDP.

Visual analysis of Fig. 3e and 3f shows that the IMZMMADDP main peak is more contrasted than the ZMMADDP main peak; however, according to Table 2, ZMMADDP has a larger Q value. This contradiction is explained by the behaviour of the functions outside the object location: the ZMMADDP function has a wave-like shape, which reduces its root-mean-square value, whereas IMZMMADDP remains uniform over almost the whole range of definition.

5 Conclusion

Two new similarity functions for object detection and localization in images and video, based on calculating the ratio and the distance between pixels, have been presented. The functions have good characteristics for accurately determining object coordinates in images. One limitation is that taking the spatial distance between features into account leads to an appreciable increase in computational cost. Our future research directions in this area are the construction of an efficient algorithm to reduce the computations and the estimation of the stability of the functions to noise.

References

1. Tsechpenakis, G., Xirouhakis, Y., Delopoulos, A.: Main Mobile Object Detection and Localization in Video Sequences. In: Advances in Visual Information Systems, Lecture Notes in Computer Science, vol. 1929, pp. 84–95 (2000)
2. Long, Y., Gong, Y., Xiao, Z., Liu, Q.: Accurate Object Localization in Remote Sensing Images Based on Convolutional Neural Networks. IEEE Transactions on Geoscience and Remote Sensing 55(5), pp. 2486–2498 (2017)
3. Choudhuri, S., Das, N., Sarkhel, R., Nasipuri, M.: Object Localization on Natural Scenes: A Survey. International Journal of Pattern Recognition and Artificial Intelligence 32(2) (2017)
4. Lv, G.: A Novel Similarity Measure for Matching Local Image Descriptors. IEEE Access 6, pp. 55315–55325 (2018)
5. Kim, J., Hyun, C., Han, H., Kim, H.: Evaluation of Matching Costs for High-Quality Sea-Ice Surface Reconstruction from Aerial Images. Remote Sensing 11(9), pp. 1055–1072 (2019)
6. Deza, M.M., Deza, E.: Encyclopedia of Distances. Springer, Berlin (2009)
7. Bohush, R., Ablameyko, S., Adamovskiy, Y.: Robust Object Detection in Images Corrupted by Impulse Noise. In: Workshop on Computer Modeling and Intelligent Systems, Zaporizhzhia, Ukraine, April 27 – May 1 (2020)
8. Wang, L., Zhang, Y., Feng, J.: On the Euclidean Distance of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), pp. 1334–1339 (2005)
9. Nakhmani, A., Tannenbaum, A.: A New Distance Measure Based on Generalized Image Normalized Cross-Correlation for Robust Video Tracking and Image Recognition. Pattern Recognition Letters 34(3), pp. 315–321 (2013)