=Paper= {{Paper |id=Vol-2623/paper6 |storemode=property |title=Effective Object Localization in Images by Calculating Ratio and Distance Between Pixels |pdfUrl=https://ceur-ws.org/Vol-2623/paper6.pdf |volume=Vol-2623 |authors=Rykhard Bohush,Sergey Ablameyko,Yahor Adamovskiy,Dmitry Glukhov |dblpUrl=https://dblp.org/rec/conf/intelitsis/BohushAAG20 }} ==Effective Object Localization in Images by Calculating Ratio and Distance Between Pixels== https://ceur-ws.org/Vol-2623/paper6.pdf
    Effective Object Localization in Images by Calculating
              Ratio and Distance Between Pixels

        Rykhard Bohush1 [0000-0002-6609-5810], Sergey Ablameyko2 [0000-0001-9404-1206],
        Yahor Adamovskiy1 [0000-0003-1044-8741], Dmitry Glukhov1 [0000-0003-4983-2919],
     1Polotsk State University, Blokhina st., 29, Novopolotsk, Republic of Belarus, 211440

                { r.bogush, adamovskiy.y, d.gluhov}@psu.by
      2 Belarusian State University, Nezavisimosti avenue 4, Minsk, Republic of Belarus

                                   ablameyko@bsu.by



       Abstract. In this paper, two novel similarity functions for object localization in
       images are presented that consider both the spatial and the brightness relations
       between pixels. We explore the advantages of our functions and compare them
       to others that use only the spatial connection between pixels. It is shown that
       one of them is robust to a linear change in the pixel brightness levels of the
       compared images. The computational cost and the localization accuracy for a
       shifted object are compared with those of other similarity functions. The pre-
       sented experimental results confirm the effectiveness of the proposed approach
       for object localization.

       Keywords: similarity functions, object localization, distance


1      Introduction

Object localization is one of the critical tasks of computer vision. It is used to solve
applied problems such as the automatic search for defects in images in industrial and
medical diagnostics, the search for specified objects in inquiry and communication
systems, the detection and localization of landmarks in satellite images of the Earth's
surface, the establishment of correspondence between the conjugate points of two or
more images during their registration, the tracking of targets in airborne radar
systems, etc. [1, 2]. For this reason, many works are devoted to the development of
methods for the localization of objects in static images and video sequences.
Localization is notably more challenging than image classification because it involves
the generation of precise object locations [3]. Furthermore, the difficulty of the
problem increases manifold in real-world situations, where objects vary in position,
scale and appearance and are surrounded by cluttered scenes.
   Unfortunately, there currently does not exist an optimal distance measure or simi-
larity function whose application would provide the maximum efficiency for object
localization by different features in images. Therefore, traditional similarity metrics
are improved or new functions are provided considering expansion of applications of
image processing, and different similarity functions are selected for various
applications [4, 5]. Image recognition applications require measures that are more
robust to shift and rotation, whereas registration and tracking applications require
better localization accuracy and noise tolerance.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0). IntelITSIS-2020
   In the general case, for object detection and localization in an image, the similarity
value is calculated between the object features and the features of all image frag-
ments. The decision about the presence of an object is made by comparing these
values with a threshold: if the comparison condition is met, the fragment is considered
to correspond to the reference. Precise localization obviously requires that the simi-
larity function exceed the threshold only at the correct location of an object. Howev-
er, to minimize the probability of missing an object, the threshold must be decreased,
which increases the number of detected image fragments that do not correspond to
the sought object, including fragments near its correct location. For this reason, the
next step consists in determining the precise coordinates of the object. Inaccuracy
arises when the function exceeds the threshold both for the reference location and for
a set of neighbouring fragments, which complicates the precise estimation of the true
object coordinates. Ambiguity is characterized by the similarity function exceeding
the threshold for a limited number of image fragments that essentially overlap each
other at a certain distance between their centres. Therefore, the localization accuracy
also depends on the similarity function.
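The threshold-based localization procedure described above can be sketched as follows (a minimal Python illustration; the fragment scan, the `localize` helper and the toy similarity `neg_mad` are our own names and simplifications, not the paper's):

```python
def fragments(image, h, w):
    """Yield the top-left corner and pixels of every h-by-w fragment of `image`."""
    rows, cols = len(image), len(image[0])
    for i in range(rows - h + 1):
        for j in range(cols - w + 1):
            yield (i, j), [row[j:j + w] for row in image[i:i + h]]

def localize(image, template, sim, threshold):
    """Return the fragment position with the highest similarity above `threshold`,
    or (None, threshold) if no fragment exceeds it (object considered absent)."""
    h, w = len(template), len(template[0])
    best_pos, best_val = None, threshold
    for pos, frag in fragments(image, h, w):
        s = sim(frag, template)
        if s > best_val:
            best_pos, best_val = pos, s
    return best_pos, best_val

# Toy similarity for the demo: negative mean absolute difference.
def neg_mad(a, b):
    n = len(a) * len(a[0])
    return -sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

img = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
tpl = [[9, 9],
       [9, 9]]
print(localize(img, tpl, neg_mad, threshold=-1.0))  # -> ((1, 1), 0.0)
```

Any of the similarity functions discussed below can be passed in place of `neg_mad`.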
   We propose new similarity functions for object detection and localization in im-
ages and video based on the calculation of ratios and distances between pixels. These
functions can be used with arbitrary image features and produce a normalized simi-
larity value. It is shown that they make it possible to improve the precise localization
of objects. One of the functions is robust to a linear change of the analysed features.


2      Similarity functions based on ratios or spatial relations
       between the pixels

To compare two images O = {o_ij} and B = {b_ij} of size N × N, several similarity
functions are often used. There exist some known functions that calculate the ratios
of compared features and are applied to estimate similarity in other applied areas,
e.g., environmental science [6]. Calculating the relationship between descriptors
emphasizes local differences better than subtraction does. Among such functions are
Wave Hedges, Ruzicka, and Czekanowski.
   The Wave Hedges function obtains an unnormalized value by calculating the ratio
between the minimum and maximum values of all pairs with further summation of the
obtained results over the entire space of compared features and can be calculated for a
pair of images as:

                  S_{WHs} = \sum_{i=1}^{N} \sum_{j=1}^{N} \left( 1 - \frac{\min(o_{ij}, b_{ij})}{\max(o_{ij}, b_{ij})} \right)        (1)
The Ruzicka function differs from the previous one in that the ratio is calculated not
between each pair of compared features, but only once after the summation of their
minimum and maximum values:

                  S_{Rzk} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} \min(o_{ij}, b_{ij})}{\sum_{i=1}^{N} \sum_{j=1}^{N} \max(o_{ij}, b_{ij})}        (2)



A specific trait of the Czekanowski function is the calculation of differences and sums
for the corresponding pairs of features with further summation of obtained results and
the estimation of the ratio between their final values. For compared images, this func-
tion can be written as:
                  S_{Cz} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} |o_{ij} - b_{ij}|}{\sum_{i=1}^{N} \sum_{j=1}^{N} (o_{ij} + b_{ij})}        (3)
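A minimal Python sketch of functions (1)-(3), assuming images given as nested lists of positive brightness values (all function and variable names are ours):

```python
def _pairs(o, b):
    """Iterate over corresponding pixel pairs of two equally sized images."""
    for row_o, row_b in zip(o, b):
        yield from zip(row_o, row_b)

def wave_hedges(o, b):
    # Eq. (1): unnormalized sum of 1 - min/max over all pixel pairs
    return sum(1 - min(x, y) / max(x, y) for x, y in _pairs(o, b))

def ruzicka(o, b):
    # Eq. (2): a single ratio of the summed minima to the summed maxima
    return (sum(min(x, y) for x, y in _pairs(o, b))
            / sum(max(x, y) for x, y in _pairs(o, b)))

def czekanowski(o, b):
    # Eq. (3): summed absolute differences over summed sums
    return (sum(abs(x - y) for x, y in _pairs(o, b))
            / sum(x + y for x, y in _pairs(o, b)))

o = [[120, 120], [100, 80]]
b = [[120, 140], [100, 100]]
print(round(ruzicka(o, b), 3))  # -> 0.913
```

Identical images give ruzicka = 1 and wave_hedges = czekanowski = 0, so (1) and (3) behave as dissimilarities here.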



In [7], two similarity functions based on calculating the relationship between the
minimum and maximum values of all pairs of analyzed features are presented:

─ normalized minimax additive similarity p-function (MMADDP):

                  S_{MMADD}^{p} = \frac{1}{NN} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{\min(o_{ij}^{p}, b_{ij}^{p})}{\max(o_{ij}^{p}, b_{ij}^{p})}, \quad p \in \mathbb{Z}, \; p \geq 1        (4)


─ normalized zero-mean minimax additive similarity p-function (ZMMADDP):

                  S_{ZMMADD}^{p} =
                  \begin{cases}
                  \dfrac{1}{NN} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \dfrac{b_{ij} - \bar{b}}{o_{ij} - \bar{o}} \right)^{p}, & \text{if } |o_{ij} - \bar{o}| \geq |b_{ij} - \bar{b}| \\[2ex]
                  \dfrac{1}{NN} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \dfrac{o_{ij} - \bar{o}}{b_{ij} - \bar{b}} \right)^{p}, & \text{if } |o_{ij} - \bar{o}| < |b_{ij} - \bar{b}|
                  \end{cases}        (5)
where \bar{o} is the mean value of image O and \bar{b} is the mean value of image B.
   When searching for a relationship between two attributes, it is necessary to deter-
mine which of them is the minimum and which is the maximum; therefore, the pro-
posed functions are called normalized minimax similarity functions. High resistance
to noise is achieved through the use of summation when obtaining the composite
normalized value.
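Under our reading of (4) and (5), the minimax functions can be sketched as follows (pure Python; nonzero brightness is assumed for (4), and the zero-deviation guard in (5) is our own convention, not the paper's):

```python
def mmadd(o, b, p=1):
    # Eq. (4): mean ratio of min to max of p-th powers over all pixel pairs
    n = sum(len(row) for row in o)
    total = sum(min(x ** p, y ** p) / max(x ** p, y ** p)
                for ro, rb in zip(o, b) for x, y in zip(ro, rb))
    return total / n

def mean(img):
    vals = [x for row in img for x in row]
    return sum(vals) / len(vals)

def zmmadd(o, b, p=1):
    # Eq. (5): ratio of the smaller to the larger zero-mean deviation, per pixel
    om, bm = mean(o), mean(b)
    n, total = 0, 0.0
    for ro, rb in zip(o, b):
        for x, y in zip(ro, rb):
            do, db = x - om, y - bm
            if do == db == 0:
                term = 1.0                 # our guard: both deviations zero
            elif abs(do) >= abs(db):
                term = (db / do) ** p
            else:
                term = (do / db) ** p
            total += term
            n += 1
    return total / n

o = [[120, 120], [100, 80]]
b = [[120, 140], [100, 100]]
print(round(mmadd(o, b), 3), round(zmmadd(o, b), 3))  # -> 0.914 0.467
```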
   To decrease the sensitivity to the deformation and displacement of images, it is
reasonable to take into account not only the brightness levels of corresponding pixels,
but also the spatial distance between them. The authors of [8] propose an approach in
which a function of the distance between all pixels is calculated and used to deter-
mine the Euclidean distance between the compared images:
                  S_{IMED} = 1 - \sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} (o_i - b_i)(o_j - b_j)        (6)



where G = {g_{ij}} is a symmetric matrix of size NN × NN whose elements are found
by calculating the distance d between pixels P_i and P_j; they decrease monotonically
with increasing distance and are determined by the Gaussian function:

                  g_{ij} = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{d(P_i, P_j)^2}{2\sigma^2} \right)        (7)

   It is obvious that this approach appreciably increases computational efforts. With
this distance measure, smaller deformation causes smaller changes in the distance.
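The weight matrix G can be sketched as below. Note that in the worked example of Section 3 the entries appear to be normalized so that g_ii = 1, i.e. the 1/(2πσ²) prefactor of (7) is dropped; we follow that convention here, and σ = 1 is our assumption:

```python
import math

def gaussian_weights(rows, cols, sigma=1.0):
    """NN x NN matrix with g_ij = exp(-d(Pi, Pj)^2 / (2 sigma^2)),
    where d is the Euclidean distance between 2D pixel coordinates."""
    coords = [(i, j) for i in range(rows) for j in range(cols)]
    g = []
    for (i1, j1) in coords:
        g.append([math.exp(-((i1 - i2) ** 2 + (j1 - j2) ** 2)
                           / (2 * sigma ** 2))
                  for (i2, j2) in coords])
    return g

g = gaussian_weights(4, 4)
print(round(g[0][1], 3), round(g[0][5], 3))  # -> 0.607 0.368
```

These values agree, up to truncation, with the entries 0.606 and 0.367 of the example matrix G in Section 3.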
   In the paper [9], it is proposed to take into account the spatial distances between
pixels to find the normalized correlation function:
                  S_{IMNCC} = \frac{\sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} o_i b_j}{\sqrt{\sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} o_i o_j \; \sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} b_i b_j}}        (8)


   The normalized averaged correlation function with consideration for the distance
between pixels is:
                  S_{IMZNCC} = \frac{\sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} (o_i - \bar{o})(b_j - \bar{b})}{\sqrt{\sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} (o_i - \bar{o})(o_j - \bar{o}) \; \sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} (b_i - \bar{b})(b_j - \bar{b})}}        (9)
  IMED, IMNCC and IMZNCC functions are smoother in comparison with the basic
ones.
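Given such a weight matrix, (6) and (8) can be sketched directly (a literal transcription; we assume brightness values normalized to [0, 1] so that S_IMED stays bounded, which is our caveat, not the paper's):

```python
import math

def imed(o, b, g):
    # Eq. (6): 1 minus the G-weighted quadratic form of the pixel differences
    d = [x - y for ro, rb in zip(o, b) for x, y in zip(ro, rb)]  # flattened diff
    nn = len(d)
    return 1 - sum(g[i][j] * d[i] * d[j]
                   for i in range(nn) for j in range(nn))

def imncc(o, b, g):
    # Eq. (8): G-weighted normalized cross-correlation
    ov = [x for row in o for x in row]
    bv = [x for row in b for x in row]
    nn = len(ov)
    def qf(u, v):
        return sum(g[i][j] * u[i] * v[j]
                   for i in range(nn) for j in range(nn))
    return qf(ov, bv) / math.sqrt(qf(ov, ov) * qf(bv, bv))
```

For identical images, imed returns exactly 1 and imncc returns 1 up to floating-point error.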


3      New similarity function based on ratio and distance

A distinctive trait of the two proposed similarity functions is the calculation of the
ratios between the minimum and maximum values for all pixel pairs together with the
calculation of the distance between pixels. We use locally weighted normalized ratios
for each pixel pair, where the weight decreases with increasing distance from the
current pixel. If the elements of the matrix G are changed, the influence of the dis-
tances on the similarity result also changes.
   Then the image normalized minimax additive similarity function (IMMMADDP) is
calculated as:
                  S_{IMMMADD}^{p} = W^{-1} \sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} \frac{\min(o_i^p, b_i^p) \min(o_j^p, b_j^p)}{\max(o_i^p, b_i^p) \max(o_j^p, b_j^p)}        (10)



where g ij is defined as in IMED; W is the normalizing coefficient determined as:

                  W = \sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij}        (11)
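The IMMMADDP function (10) with the normalization (11) can be sketched as follows (pure Python; positive brightness values assumed, and the per-pixel factorization of the min/max products is our observation):

```python
def immmadd(o, b, g, p=1):
    # Eqs. (10)-(11): G-weighted mean of products of per-pixel minimax ratios
    ov = [x for row in o for x in row]
    bv = [x for row in b for x in row]
    nn = len(ov)
    # The fraction in (10) factors into r_i * r_j with this per-pixel ratio:
    r = [min(ov[k] ** p, bv[k] ** p) / max(ov[k] ** p, bv[k] ** p)
         for k in range(nn)]
    w = sum(g[i][j] for i in range(nn) for j in range(nn))  # Eq. (11)
    return sum(g[i][j] * r[i] * r[j]
               for i in range(nn) for j in range(nn)) / w
```

For identical images every ratio r_k equals 1, so the weighted sum equals W and the result is 1.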



   The normalized minimax averaged additive similarity function (IMZMMADDP)
taking into account the distances between features is calculated as:

   S_{IMZMMADD}^{p} = W^{-1}
   \begin{cases}
   \sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} \left( \dfrac{(b_i - \bar{b})(b_j - \bar{b})}{(o_i - \bar{o})(o_j - \bar{o})} \right)^{p}, & \text{if } |o_i - \bar{o}| \geq |b_i - \bar{b}| \text{ and } |o_j - \bar{o}| \geq |b_j - \bar{b}| \\[2ex]
   \sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} \left( \dfrac{(b_i - \bar{b})(o_j - \bar{o})}{(o_i - \bar{o})(b_j - \bar{b})} \right)^{p}, & \text{if } |o_i - \bar{o}| \geq |b_i - \bar{b}| \text{ and } |b_j - \bar{b}| > |o_j - \bar{o}| \\[2ex]
   \sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} \left( \dfrac{(o_i - \bar{o})(o_j - \bar{o})}{(b_i - \bar{b})(b_j - \bar{b})} \right)^{p}, & \text{if } |b_i - \bar{b}| > |o_i - \bar{o}| \text{ and } |b_j - \bar{b}| > |o_j - \bar{o}| \\[2ex]
   \sum_{i=1}^{NN} \sum_{j=1}^{NN} g_{ij} \left( \dfrac{(o_i - \bar{o})(b_j - \bar{b})}{(b_i - \bar{b})(o_j - \bar{o})} \right)^{p}, & \text{if } |b_i - \bar{b}| > |o_i - \bar{o}| \text{ and } |o_j - \bar{o}| \geq |b_j - \bar{b}|
   \end{cases}        (12)

   The presented functions are universal: for arbitrarily selected features of a pair of
images, IMMMADDP produces a normalized value in the range (0, 1), and
IMZMMADDP produces a value in the range (-1, 1). Table 1 shows a comparison of
the computational costs of the similarity functions for images of size N × N pixels.
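Under our reading, the four branches of (12) factor into a per-pixel ratio of the smaller to the larger zero-mean deviation, so IMZMMADDP can be sketched compactly (the zero-deviation guard is our own convention):

```python
def imzmmadd(o, b, g, p=1):
    # Eq. (12): G-weighted sum of (r_i * r_j)^p, where r_k is the ratio of the
    # smaller-magnitude zero-mean deviation to the larger-magnitude one.
    ov = [x for row in o for x in row]
    bv = [x for row in b for x in row]
    om = sum(ov) / len(ov)
    bm = sum(bv) / len(bv)

    def ratio(do, db):
        if do == db == 0:
            return 1.0                 # our guard for 0/0; not from the paper
        return db / do if abs(do) >= abs(db) else do / db

    r = [ratio(x - om, y - bm) for x, y in zip(ov, bv)]
    nn = len(r)
    w = sum(g[i][j] for i in range(nn) for j in range(nn))
    return sum(g[i][j] * (r[i] * r[j]) ** p
               for i in range(nn) for j in range(nn)) / w
```

The case split of (12) is reproduced exactly by this factorization: each branch is the product of two per-pixel ratios raised to the power p.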
   Two sets are considered as an example: an image B of size 4×4 and its fragment
O of size 3×4. For these data there are two configurations: coincidence and a rela-
tive displacement by one pixel. In the first case, the maximum value of the functions
is expected; in the second case, the function value shows their resistance to displace-
ment.
   For this object O, the matrix G is an array of 16×16 elements, as the square of
the larger side of the object O. The normalizing coefficient is W = 67,285.

        O = | 120 120 120 |        B = | 120 120 120 140 |
            | 120 120 120 |            | 120 120 120 140 |
            | 100 100 100 |            | 100 100 100 120 |
            |  80  80  80 |            |  80  80  80 100 |
     1     0.606 0.135 0.011 0.606 0.367 0.082 0.006 0.135 0.082 0.018 0.001 0.011 0.006 0.001 10 -4 
                                                                                                     
     0.606   1   0.606 0.135 0.367 0.606 0.367 0.082 0.082 0.135 0.082 0.018 0.006 0.011 0.006 0.001
    0.135 0.606    1   0.606 0.082 0.367 0.606 0.367 0.018 0.082 0.135 0.082 0.001 0.006 0.011 0.006
     0.011 0.135 0.606   1   0.006 0.082 0.367 0.606 0.001 0.018 0.082 0.135 10 -4 0.001 0.006 0.011
                                                                                                     
    0.606 0.367 0.082 0.006    1   0.606 0.135 0.011 0.606 0.367 0.082 0.006 0.135 0.082 0.018 0.001
    0.367 0.606 0.367 0.082 0.606    1   0.606 0.135 0.367 0.606 0.367 0.082 0.082 0.135 0.082 0.018
    0.082 0.367 0.606 0.367 0.135 0.606    1   0.606 0.082 0.367 0.606 0.367 0.018 0.082 0.135 0.082
                                                                                                     
G = 0.006 0.082 0.367 0.606 0.011 0.135 0.606    1   0.006 0.082 0.367 0.606 0.001 0.018 0.082 0.135
    0.135 0.082 0.018 0.001 0.606 0.367 0.082 0.006    1   0.606 0.135 0.011 0.606 0.367 0.082 0.006
    0.082 0.135 0.082 0.018 0.367 0.606 0.367 0.082 0.606    1   0.606 0.135 0.367 0.606 0.367 0.082
                                                                                                     
    0.018 0.082 0.135 0.082 0.082 0.367 0.606 0.367 0.135 0.606    1   0.606 0.082 0.367 0.606 0.367 
     0.001 0.018 0.082 0.135 0.006 0.082 0.367 0.606 0.011 0.135 0.606   1   0.006 0.082 0.367 0.606
     0.011 0.006 0.001 10 -4 0.135 0.082 0.018 0.001 0.606 0.367 0.082 0.006   1   0.606 0.135 0.011
    
     0.006 0.011 0.006 0.001 0.082 0.135 0.082 0.018 0.367 0.606 0.367 0.082 0.606   1   0.606 0.135
     0.001 0.006 0.011 0.006 0.018 0.082 0.135 0.082 0.082 0.367 0.606 0.367 0.135 0.606   1   0.606
     10 -4 0.001 0.006 0.011 0.001 0.018 0.082 0.135 0.006 0.082 0.367 0.606 0.011 0.135 0.606   1 
    
   For these data, the object O either coincides with the first (left) subimage of B
or matches the second subimage after a shift of one pixel to the left. In the first case,
the maximum value of each function is expected. In the second case, the function
value should differ as much as possible from the maximum; for this example (p=1):
IMED=0,972; IMNCC=0,999; IMZNCC=0,909; MMADDP=0,9456; ZMMADDP=0,465;
IMMMADDP=0,906; IMZMMADDP=0,23. We see that the proposed similarity func-
tions yield the lowest similarity values when the object is shifted by one pixel in the
image.
   Table 1 shows that taking the spatial distance between features into account leads
to an appreciable increase in computational expenditures, from O(N²) to O(N⁴), for
all the functions that use this approach.

         Table 1. Computational costs in estimating the similarity between two images

Function type        Number of addition /      Number of multiplication /   Number of comparison
                     subtraction operations    division operations          operations
Wave Hedges          N²+N(N-1)                 N²                           N²
Ruzicka              2N(N-1)                   N²                           -
Czekanowski          2N(N-1)                   1                            N²
IMED                 2N²+N²(N²-1)+1            2N⁴+1                        -
IMNCC                3N²(N²-1)                 6N⁴+3                        -
IMZNCC               3N²(N²-1)+6N²             6N⁴+7                        -
MMADDp (p=2)         N(N-1)                    3N²+2                        N²
ZMMADDp (p=1)        2N²+3N(N-1)               N²+4                         N²
IMMMADDp (p=1)       N²(N²-1)                  2N⁴+2N²+3                    2N²
IMZMMADDp (p=1)      N²(N²-1)+2N(N-1)          2N⁴+2N²+6                    N²
4       Experimental results

For the analytical assessment of the similarity functions when detecting and localiz-
ing objects in an image, the following parameters are used:
─ the function value calculated for the object and the subimage ( A, the main peak);
  it should tend to 1;
─ the main peak variance ( D_A ); it should tend to zero, meaning smaller deviations
  of A from the expected value;
─ the amplitude coefficient ( Q ); this parameter determines the maximum amplitude
  outlier of the similarity function with respect to its root-mean-square value S_rms
  when comparing the object O with all subimages:

                                        Q = S_max / S_rms                          (13)

  where S_max is the maximum value of the similarity function;
─ the maximum value among all side peaks ( S_L ); it can be used to determine the
  threshold T and should be less than T;
─ the maximum side peak variance ( D_SL ); it should also be as small as possible;
─ the number of side peaks at levels higher than 0.95 ( N_SL ); this parameter can be
  used to evaluate the number of possible false-positive detections if the threshold
  value is incorrectly selected.
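The amplitude coefficient (13) and the side-peak statistics can be sketched from a full similarity map (the map layout and all names here are ours; we read S_rms as the root-mean-square over the whole map, which is one plausible interpretation):

```python
import math

def amplitude_coefficient(sim_map):
    # Eq. (13): maximum value over the root-mean-square value of the map
    vals = [s for row in sim_map for s in row]
    s_max = max(vals)
    s_rms = math.sqrt(sum(s * s for s in vals) / len(vals))
    return s_max / s_rms

def max_side_peak(sim_map, main_pos):
    # Largest similarity value outside the main peak location (parameter S_L)
    return max(s for i, row in enumerate(sim_map)
                 for j, s in enumerate(row) if (i, j) != main_pos)

sim = [[0.2, 0.3, 0.2],
       [0.3, 1.0, 0.4],
       [0.2, 0.3, 0.2]]
print(round(amplitude_coefficient(sim), 3))  # -> 2.379
```

A larger Q and a smaller S_L both indicate a sharper, easier-to-threshold main peak.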
All parameters were calculated for 20 different images of size 150 × 150 without
noise and distortion. For each of them, 20 reference objects of size 15×15 were used,
and the obtained values were averaged. Fig. 1 shows some examples of images and
objects.
   Values A = 1 and D_A = 0 were obtained for all similarity functions. The resulting
values of the remaining parameters are given in Table 2.

                             Table 2. Function characteristics

     Function type                 Q                SL              DS L         N SL
Wave Hedges                   1,51715            0,859           0,00055       277
Ruzicka                       1,52919           0,85987          0,00077       256
Czekanowski                   1,31457           0,86406          0,00152      1644
IMED                          1,19798           0,94581          0,00027       554
IMNCC                         1,06136           0,9921           0,00005      10396
IMZNCC                        4,20704           0,86102          0,00293        3
MMADDp(p=2)                   2,16488           0,64973          0,00496       263
ZMMADDp(p=1)                  11,54347          0,47403          0,01489        6
IMMMADDp (p=2)                2,22317           0,7404           0,00133       63
IMZMMADDp (p=1)               7,63841           0,3079           0,01148       337
             Fig. 1. Some test images: (a-c) input images; (d-f) object templates

The dependences of the similarity value on the shift of objects with respect to the
central location for the studied functions are shown in Fig. 2. These values were
found by averaging the results for 10 templates from the image shown in Fig. 1a.




         Fig. 2. Dependence of the similarity functions values on the shift of objects

   The analysis of Table 2 and Fig. 2 shows that the proposed IMMMADDP and
IMZMMADDP functions have the best characteristics for accurately determining
object coordinates in images. For them, the amplitude coefficient is the largest among
all functions, and the maximum side peak value is the smallest. The other parameters
are satisfactory and less important. This means that the function level is less likely to
exceed the specified threshold away from the object, and localization will be more
accurate.
   Some 2D similarity functions for the image (Fig. 1a) and the object (Fig. 1d) are
presented in Fig. 3.




Fig. 3. Similarity between the object and all subimages of Fig. 1a and 1d based on: (a) Czeka-
nowski function; (b) Wave Hedges function; (c) MMADDP; (d) IMMMADDP; (e) ZMMADDP;
(f) IMZMMADDP
Visual analysis of Fig. 3e and 3f shows that the IMZMMADDP main peak is more
contrasting than the ZMMADDP main peak; however, according to Table 2,
ZMMADDP has the larger Q value. This contradiction is explained by the behavior of
the functions outside the object location: the ZMMADDP function has a wave-like
shape, which reduces its root-mean-square value, while IMZMMADDP remains uni-
form over almost its entire domain.


5      Conclusion

Two new similarity functions for object detection and localization in images and
video, based on the calculation of ratios and distances between pixels, are presented.
The functions have good characteristics for accurately determining object coordi-
nates in images. One limitation is that taking the spatial distance between features
into account leads to an appreciable increase in computational cost. Our future re-
search directions in this area are the construction of an efficient algorithm to reduce
the computations and the estimation of the robustness of the functions to noise.


References
 1. Tsechpenakis G., Xirouhakis Y., Delopoulos A.: Main Mobile Object Detection and Lo-
    calization in Video Sequences. Advances in Visual Information Systems, Lecture Notes in
    Computer Science 1929, pp. 84-95 (2000)
 2. Long Y., Gong Y., Xiao Z., Liu Q.: Accurate Object Localization in Remote Sensing Im-
    ages Based on Convolutional Neural Networks. IEEE Transactions on Geoscience and
    Remote Sensing 55 (5), pp. 2486-2498 (2017)
 3. Choudhuri S., Das N., Sarkhel R., Nasipuri M.: Object Localization on Natural Scenes:
    A Survey. International Journal of Pattern Recognition and Artificial Intelligence 32 (2)
    (2017)
 4. Lv G.: A Novel Similarity Measure for Matching Local Image Descriptors. IEEE Access
    6, pp. 55315-55325 (2018)
 5. Kim J., Hyun Ch., Han H., Kim H.: Evaluation of Matching Costs for High-Quality Sea-
    Ice Surface Reconstruction from Aerial Images. Remote Sensing 11 (9), pp. 1055-1072
    (2019)
 6. Deza M. M., Deza E.: Encyclopedia of Distances. Springer, Berlin (2009)
 7. Bohush R., Ablameyko S., Adamovskiy Y.: Robust Object Detection in Images Corrupted
    by Impulse Noise. Workshop on Computer Modeling and Intelligent Systems, Zapo-
    rizhzhia, Ukraine, April 27 - May 1 (2020)
 8. Wang L., Zhang Y., Feng J.: On the Euclidean Distance of Images. IEEE Transactions on
    Pattern Analysis and Machine Intelligence 27 (8), pp. 1334-1339 (2005)
 9. Nakhmani A., Tannenbaum A.: New Distance Measure Based on Generalized Image
    Normalized Cross-Correlation for Robust Video Tracking and Image Recognition. Pattern
    Recognition Letters 34 (3), pp. 315-321 (2013)