Deformation Field Estimate for Image Sequence by Applying Stochastic Adaptation in the Block Method

Roman Kovalenko
Radio Engineering Department, Ulyanovsk State Technical University
Ulyanovsk, Russia
r.kovalenko.o@gmail.com

Pavel Smirnov
Ventra
Moscow, Russia
rtcis@mail.ru

Radik Ibragimov
Radio Engineering Department, Ulyanovsk State Technical University
Ulyanovsk, Russia
ibragimow.it@gmail.com

Alexander Tashlinskiy
Radio Engineering Department, Ulyanovsk State Technical University
Ulyanovsk, Russia
tag@ulstu.ru



Abstract—The paper investigates a block method based on stochastic adaptation, which is used to estimate the deformation field of an image sequence. The similarity model was selected as the deformation model. The method was implemented for two target functions: the mean square inter-frame difference and the inter-frame correlation coefficient. The results of the proposed method were compared with the Motion Vector Field Adaptive Search Technique. The proposed method has high noise resistance and allows one to reduce the influence of global inter-frame geometric changes.

Keywords—stochastic adaptation, mean square difference, correlation coefficient, image sequence, block method, deformation field.

I. INTRODUCTION

Detection of the area of a moving object is commonly used in machine vision systems to highlight areas of interest in images and to improve subsequent analysis. The task of detecting a moving object in complex cases has not yet received a general solution. The complexity of this task is caused by the possibility of various dynamic changes in the scene (smooth, sharp or local changes in lighting conditions, weather changes, repetitive movement, etc.). A more complex case arises when the background is similar to the moving object. Therefore, the development of algorithms analyzing scene movement in difficult conditions remains a relevant subject.

The task of detecting the area of a moving object is considered as the task of dividing image pixels into two groups: background and foreground, where the foreground is the moving object. The foreground may consist of one or several objects. In both cases, the foreground objects must be detected, and if there are several objects, the moving objects must also be separated from each other.

As with many other image processing tasks, moving object detection can be implemented in both the spatial and the frequency domain.

In the frequency domain, most moving object detection methods are based on wavelet transformations [1] and low-order fractional statistics [2]. Background changes have less effect on the result of moving object detection in the frequency domain than in the spatial domain, but with this approach problems with shadows appear [3]. Frequency-domain methods also have high computational complexity, so in practice they are used much less often than spatial-domain methods.

There are different approaches to detecting the area of a moving object in the spatial domain: methods based on inter-frame difference estimation [4, 5], background subtraction [4, 6], statistical methods [5, 7], block methods [8], and optical flow analysis [9, 10]. In this paper, we develop an algorithm based on a block method.

II. PROBLEM STATEMENT

Most methods for estimating the deformation field H use inter-frame image processing. In this case, the image of a moving object can be represented as one or several regions of the current image that have inter-frame geometric changes (IGC). Thus, by dividing the image into nonoverlapping areas (blocks) and estimating their inter-frame deformation parameters, the deformation field H is obtained. The obtained field is then used to determine the image blocks that correspond to a moving object, e.g. by applying a threshold. This approach corresponds to the general principle of block methods for motion detection [11], which are based on finding the corresponding location of the blocks of the current (deformed) frame on the previous (reference) frame. To do this, the current frame Z_t of the image sequence is divided into many nonoverlapping blocks B_{i,j}, where (i, j) are the block center coordinates. The block size is selected based on the size of the objects whose movement needs to be detected. The solution comes down to finding the motion vector h̄_{i,j} of each block B_{i,j} on frame Z_{t-1}:

\bar{h}_{i,j} = \arg \operatorname*{extremum}_{\bar{v}_{i,j} \in O} Q(i, j, \bar{v}_{i,j}),   (1)

where O is the search area and Q(i, j, v̄_{i,j}) is the target function for matching blocks of the current and the previous frames. By assigning the shift h̄_{i,j} to the nodes of the reference grid included in block B_{i,j}, we obtain the deformation field H = {h̄_{i,j}} between the deformed image and the reference image. This approach provides high efficiency at a relatively low computational complexity [8, 11].
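As a minimal illustration of the block-matching formulation (1), the NumPy sketch below exhaustively searches a square area O and uses the mean square inter-frame difference as the target function Q. The block size, search radius and function name are assumptions of this sketch, not values taken from the paper.

```python
import numpy as np

def match_block(prev_frame, cur_frame, i, j, block=16, radius=8):
    """Find the motion vector h_{i,j} of the block of cur_frame centred at (i, j)
    by exhaustively minimising the mean square inter-frame difference over the
    search area O = [-radius, radius]^2 on the previous (reference) frame."""
    half = block // 2
    cur = cur_frame[i - half:i + half, j - half:j + half].astype(float)
    best_q, best_v = np.inf, (0, 0)
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            ref = prev_frame[i - half + dx:i + half + dx,
                             j - half + dy:j + half + dy].astype(float)
            if ref.shape != cur.shape:          # candidate block leaves the frame
                continue
            q = np.mean((cur - ref) ** 2)       # target function Q(i, j, v)
            if q < best_q:
                best_q, best_v = q, (dx, dy)
    return best_v                               # estimated shift h_{i,j}
```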


Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

Block methods assume a static background on which moving objects are to be detected. In practice, consecutive frames can have global mutual spatial deformations, e.g. due to camera movement. In this case, an algorithm based on the block method will detect motion in almost the entire frame. To solve this problem, a more complex model for determining the location of the blocks B_{i,j}, such as the similarity model [12], can be chosen. This model includes the following parameters α_{t,t-1} = (h̄, φ, κ)^T: the shift h̄ = (h_x, h_y)^T along the basic axes, the rotation angle φ and the scale κ. The paper proposes to estimate the location of the blocks B_{i,j} by a stochastic adaptation procedure [13] that finds the parameters α_{t,t-1}. The algorithm is resistant to impulse noise and requires a small computational cost that is virtually independent of the block size. Block sizes are usually significantly smaller than the size of the object to be detected.

III. ALGORITHM DESCRIPTION

For each block B_{i,j} of the reference frame, the stochastic block method performs a recurrent estimation of the parameter vector α̂^{t,t-1}_{i,j} describing the block position on the deformed frame, in accordance with the procedure [13]:

\hat{\alpha}^{t,t-1}_{i,j}[n] = \hat{\alpha}^{t,t-1}_{i,j}[n-1] - \Lambda[n]\,\beta[n]\bigl(J(\hat{\alpha}^{t,t-1}_{i,j}[n-1], Z[n])\bigr),   (2)

where β is the stochastic gradient of the target function J(·); Λ[n] is the matrix of learning rates; Z[n] is the local sample used to find β at iteration n, n = 0, 1, ..., N-1; and N is the number of iterations. Note that the local sample Z[n] is selected independently at each estimation iteration.

The method was implemented for the two most common target functions: the mean square inter-frame difference (MSID) and the inter-frame correlation coefficient [14]. When MSID is used, the stochastic gradient at the n-th iteration is [15]:

\beta_i[n] = \frac{1}{2\Delta x}\sum_{l=1}^{\mu}\bigl(\tilde{z}^{t}_{x_l+\Delta x,y_l} - \tilde{z}^{t}_{x_l-\Delta x,y_l}\bigr)\bigl(\tilde{z}^{t}_{x_l+\Delta x,y_l} + \tilde{z}^{t}_{x_l-\Delta x,y_l} - 2z^{t-1}_{i_l,j_l}\bigr)\frac{\partial x}{\partial\alpha_i}
          + \frac{1}{2\Delta y}\sum_{l=1}^{\mu}\bigl(\tilde{z}^{t}_{x_l,y_l+\Delta y} - \tilde{z}^{t}_{x_l,y_l-\Delta y}\bigr)\bigl(\tilde{z}^{t}_{x_l,y_l+\Delta y} + \tilde{z}^{t}_{x_l,y_l-\Delta y} - 2z^{t-1}_{i_l,j_l}\bigr)\frac{\partial y}{\partial\alpha_i},   (3)

where (x_l, y_l) are coordinates on image Z_t; (i_l, j_l) are coordinates on image Z_{t-1}; z̃^t_{x_l,y_l} is the brightness of the resampled image Z_t computed with the estimates α̂^{t,t-1}_{i,j}[n-1] obtained at the previous iteration; Δx and Δy are the steps used to find the derivatives ∂z̃^t_{x_l,y_l}/∂x and ∂z̃^t_{x_l,y_l}/∂y by finite differences [14]; and μ is the size of the local sample Z[n]. The partial derivatives ∂x/∂α and ∂y/∂α are found analytically.

When the inter-frame correlation coefficient is used as the target function, the expression for the stochastic gradient at the n-th iteration takes the form:

\beta_i[n] = \frac{1}{2\mu\hat{\sigma}_{t-1}}\sum_{l=1}^{\mu}\Biggl[\frac{\bigl(z^{t-1}_{i_l,j_l} - \bar{z}^{t-1}_{m}\bigr)\bigl(\tilde{z}^{t}_{x_l+\Delta x,y_l} - \tilde{z}^{t}_{x_l-\Delta x,y_l}\bigr)}{\hat{\sigma}_x\,\Delta x}\frac{\partial x}{\partial\alpha_i}
          + \frac{\bigl(z^{t-1}_{i_l,j_l} - \bar{z}^{t-1}_{m}\bigr)\bigl(\tilde{z}^{t}_{x_l,y_l+\Delta y} - \tilde{z}^{t}_{x_l,y_l-\Delta y}\bigr)}{\hat{\sigma}_x\,\Delta y}\frac{\partial y}{\partial\alpha_i}\Biggr],   (4)

where \hat{\sigma}_x^2 = \frac{1}{\mu}\sum_{l=1}^{\mu}\bigl(\tilde{z}^{t}_{x_l+\Delta x,y_l} - \bar{\tilde{z}}^{t}_{x_m}\bigr)^2 and \hat{\sigma}_{t-1}^2 = \frac{1}{\mu}\sum_{l=1}^{\mu}\bigl(z^{t-1}_{i_l,j_l} - \bar{z}^{t-1}_{m}\bigr)^2; \bar{\tilde{z}}^{t}_{x_m} and \bar{z}^{t-1}_{m} = \frac{1}{\mu}\sum_{l=1}^{\mu}z^{t-1}_{i_l,j_l} are the mean values of \tilde{z}^{t}_{x_l+\Delta x,y_l} and z^{t-1}_{i_l,j_l}, respectively.

The method based on MSID requires less computation and can work already with a local sample size μ = 1, which allows it to be implemented in pixel-by-pixel processing. Therefore, in the proposed method, the choice of MSID as the main target function is appropriate.

If the similarity model is used as the model of geometric deformations between the reference and deformed frames, the derivatives ∂x/∂α_i and ∂y/∂α_i are given by the expressions:

\partial x/\partial h_x = 1, \qquad \partial x/\partial h_y = 0,
\partial x/\partial\kappa = (a_l - x_o)\cos\varphi - (b_l - y_o)\sin\varphi,
\partial x/\partial\varphi = -\kappa\bigl((a_l - x_o)\sin\varphi + (b_l - y_o)\cos\varphi\bigr),
\partial y/\partial h_x = 0, \qquad \partial y/\partial h_y = 1,
\partial y/\partial\kappa = (a_l - x_o)\sin\varphi + (b_l - y_o)\cos\varphi,
\partial y/\partial\varphi = \kappa\bigl((a_l - x_o)\cos\varphi - (b_l - y_o)\sin\varphi\bigr),

where (x_o, y_o) are the coordinates of the rotation center and (a_l, b_l) are the node coordinates on the reference frame.

Usually, to represent the deformation field, every reference pixel with coordinates (x, y) is set in correspondence with a shift vector h̄ = (h_x, h_y)^T. To obtain such a representation of the deformation field, the estimates α̂_{i,j} of the deformation parameters must be recalculated using the accepted deformation model. In particular, for the similarity model we get:

\hat{h}_{i,j}(x) = x_o + \hat{\kappa}[n-1]\bigl((i - x_o)\cos\hat{\varphi}[n-1] - (j - y_o)\sin\hat{\varphi}[n-1]\bigr) + \hat{h}_x[n-1] - i,   (5)
\hat{h}_{i,j}(y) = y_o + \hat{\kappa}[n-1]\bigl((i - x_o)\sin\hat{\varphi}[n-1] + (j - y_o)\cos\hat{\varphi}[n-1]\bigr) + \hat{h}_y[n-1] - j.   (6)

In simplified form, the algorithm can be described as follows. For neighboring frames without mutual global IGC, the parameter estimates of blocks without motion remain close to zero (for the scale κ, close to one), whereas the estimates of blocks with motion converge to nonzero values. This rule is the criterion for assigning a block to motion. If neighboring frames do have mutual global IGC, the deformation parameter estimates of all blocks differ from zero. In this case, the blocks corresponding to the moving object form compact clusters, while blocks affected only by the global deformation are located throughout the frame, which is used as the criterion for detecting global deformations [16]. The deformation parameters of moving objects are then determined by subtracting the global deformations.
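To make the recurrence (2) with the MSID gradient (3) and the similarity-model derivatives more concrete, the following minimal NumPy sketch performs one estimation iteration for a single block. It is an illustration only: the bilinear resampling, the random selection of the local sample, the learning rates, the sample size and all function names are assumptions of this sketch, not part of the paper.

```python
import numpy as np

def bilinear(img, x, y):
    """Resampled brightness z~_t(x, y) at non-integer coordinates
    (a stand-in for the resampling of the deformed frame Z_t)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[x0, y0] + dx * (1 - dy) * img[x0 + 1, y0]
            + (1 - dx) * dy * img[x0, y0 + 1] + dx * dy * img[x0 + 1, y0 + 1])

def msid_step(z_prev, z_cur, block_nodes, est, centre, lam, mu=5, delta=1.0):
    """One iteration of (2) with the MSID stochastic gradient (3) for the
    similarity-model parameters alpha = (h_x, h_y, phi, kappa).
    block_nodes -- reference-grid nodes (a_l, b_l) of block B_{i,j},
    est -- current estimate of alpha, centre -- rotation centre (x_o, y_o),
    lam -- per-parameter learning rates Lambda[n]."""
    hx, hy, phi, kappa = est
    xo, yo = centre
    sel = block_nodes[np.random.choice(len(block_nodes), mu, replace=False)]  # local sample Z[n]
    grad = np.zeros(4)
    for a, b in sel.astype(float):
        # similarity mapping of the node onto the deformed frame
        x = xo + kappa * ((a - xo) * np.cos(phi) - (b - yo) * np.sin(phi)) + hx
        y = yo + kappa * ((a - xo) * np.sin(phi) + (b - yo) * np.cos(phi)) + hy
        zp = float(z_prev[int(a), int(b)])
        # finite-difference factors of (3) along x and y
        zx1, zx2 = bilinear(z_cur, x + delta, y), bilinear(z_cur, x - delta, y)
        zy1, zy2 = bilinear(z_cur, x, y + delta), bilinear(z_cur, x, y - delta)
        gx = (zx1 - zx2) * (zx1 + zx2 - 2 * zp) / (2 * delta)
        gy = (zy1 - zy2) * (zy1 + zy2 - 2 * zp) / (2 * delta)
        # analytic derivatives dx/dalpha_i and dy/dalpha_i of the similarity model
        dxda = np.array([1.0, 0.0,
                         -kappa * ((a - xo) * np.sin(phi) + (b - yo) * np.cos(phi)),
                         (a - xo) * np.cos(phi) - (b - yo) * np.sin(phi)])
        dyda = np.array([0.0, 1.0,
                         kappa * ((a - xo) * np.cos(phi) - (b - yo) * np.sin(phi)),
                         (a - xo) * np.sin(phi) + (b - yo) * np.cos(phi)])
        grad += gx * dxda + gy * dyda
    # MSID is minimised, so the step is taken against the gradient
    return np.asarray(est, dtype=float) - np.asarray(lam) * grad
```

A full estimation run would repeat msid_step for n = 0, ..., N-1 for every block and then recalculate the final estimates into per-node shifts according to (5)-(6).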

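For the correlation-coefficient target function, a hedged sketch of the bracketed factor of (4) along one axis is given below; the analytic derivative ∂x/∂α_i (or ∂y/∂α_i) would be applied to its result separately. The function name, the small guard constant and the input layout are assumptions of this sketch.

```python
import numpy as np

def cc_gradient_factor(z_prev_vals, z_plus, z_minus, delta):
    """Bracketed factor of the correlation-coefficient gradient (4) along one axis.
    z_prev_vals     -- reference-frame brightness z^{t-1} at the local-sample points,
    z_plus, z_minus -- resampled brightness of Z_t at the points shifted by +/-delta."""
    zp = np.asarray(z_prev_vals, dtype=float)
    zpl = np.asarray(z_plus, dtype=float)
    zmi = np.asarray(z_minus, dtype=float)
    mu = len(zp)
    sigma_prev = zp.std() + 1e-12          # sigma_{t-1}, guarded against flat samples
    sigma_x = zpl.std() + 1e-12            # sigma_x of the resampled values
    num = (zp - zp.mean()) * (zpl - zmi)   # (z^{t-1} - mean) * finite difference of z~
    return np.sum(num) / (2.0 * mu * sigma_prev * sigma_x * delta)
```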

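As an illustration of (5)-(6), the short sketch below recalculates a block's similarity-model estimates into per-node shift vectors; the helper name and argument layout are hypothetical and chosen only for this example.

```python
import numpy as np

def shift_field(nodes, est, centre):
    """Recalculate the similarity-model estimates of a block into per-node
    shift vectors according to (5)-(6).
    nodes  -- reference-grid nodes (i, j) belonging to the block,
    est    -- (h_x, h_y, phi, kappa) estimated for the block,
    centre -- rotation centre (x_o, y_o)."""
    hx, hy, phi, kappa = est
    xo, yo = centre
    i, j = nodes[:, 0].astype(float), nodes[:, 1].astype(float)
    h_x = xo + kappa * ((i - xo) * np.cos(phi) - (j - yo) * np.sin(phi)) + hx - i
    h_y = yo + kappa * ((i - xo) * np.sin(phi) + (j - yo) * np.cos(phi)) + hy - j
    return np.stack([h_x, h_y], axis=1)    # deformation field H restricted to the block
```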



IV. EXPERIMENTAL RESULTS

Fig. 1 shows an example of two consecutive frames of an image sequence Z_t obtained with a microscope at a magnification of 400 times. The figure shows two unicellular Sonderia organisms. The organism that is completely within the frame is in motion; its motion parameters in the similarity model are h̄ = (3.6, -2.1)^T, φ = 3°, κ = 1. The second organism is almost motionless. At the same time, the frames have global IGC with parameters h̄ = (1, -2.2)^T, φ = -1°, κ = 1.01. To obtain a complex case, unbiased additive Gaussian noise with a signal-to-noise ratio of 14 dB was also added to the images.

Fig. 1. An example of an image sequence.

Fig. 2 shows the comparative results of the inter-frame difference algorithm (Fig. 2(a)), background subtraction (Fig. 2(b)) and the proposed stochastic block method (Fig. 2(c)). For ease of comparison, the contour of the organism is drawn on each image.

Fig. 2. The result of motion detection by different algorithms.

Fig. 2 shows that the inter-frame difference and background subtraction algorithms mark the second organism as moving because of the global geometric changes between consecutive frames. These two algorithms detect the area of the moving object with a large number of gaps, especially in low-contrast places where the gradient of image brightness is small. The proposed stochastic block method highlights the region of motion with almost no gaps. The remaining gaps correspond only to blocks in which most of the pixels belong to the background and only a few belong to the moving object.

Fig. 3. Example of an image sequence with a moving object.

As already noted, the proposed method also works for pixel-by-pixel estimation of the deformation field. In this case, each element of the deformation field contains information about the direction and magnitude of the pixel shift in the reference image relative to its position on the deformed image. For example, Fig. 3 shows two consecutive frames of an image sequence in which the car in the center is moving and the car on the right is stationary. The image of the moving car has the following inter-frame spatial shift parameters: h_x = 3, h_y = 2.95.

The results of estimating the deformation field with the proposed method are compared below with the results obtained with a well-known block method, the Motion Vector Field Adaptive Search Technique (MVFAST) [17]. MVFAST also allows a pixel-by-pixel estimate of the deformation field. In this case, the estimates ĥ_{i,j}(x), ĥ_{i,j}(y) are recalculated into the vector module and its angle:

|\bar{h}| = \sqrt{\hat{h}_{i,j}(x)^2 + \hat{h}_{i,j}(y)^2},   (7)
\varphi_h = \operatorname{arctg}\bigl(\hat{h}_{i,j}(x)/\hat{h}_{i,j}(y)\bigr).   (8)

Fig. 4 shows typical shift estimates for the image pixels corresponding to the nodes of one row of the reference image: Fig. 4(a) corresponds to the MVFAST method, Fig. 4(b) to the proposed method. For the MVFAST method, in contrast to the proposed one, errors are visible at the borders of the object image and in areas inside it. Gaps inside the object occur in low-contrast areas. Owing to the inertia of changes in the estimates, the proposed method does not have this disadvantage.

Fig. 4. Example of shift estimates for one row.

Fig. 5. Deformation field visualization.

Table I shows the estimates of the expected value m̂ and the variance D̂ of the shift-vector errors, both for one row and for the entire image. The table shows that the expected value of the MVFAST estimates in the motion area is several times greater (about 5 times for a row and 8 times for the whole image) than for the proposed method. The variance of the MVFAST estimates in the motion area is many times greater than that of the proposed method. For the motionless area, the MVFAST method shows slightly better results over the entire image in the absence of noise. Deformation field estimates for the entire image are shown in Fig. 5: Fig. 5(a) for the MVFAST method, Fig. 5(b) for the proposed method. Fig. 5 shows significant errors of the MVFAST method at the boundaries of the object, as well as in low-contrast areas within the object.
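For illustration, a small sketch of the recalculation (7)-(8) of per-pixel shift estimates into module and angle, together with a simple thresholding of the module as a motion criterion, is given below; the threshold value and function names are assumptions, not taken from the paper.

```python
import numpy as np

def field_to_polar(h_x, h_y):
    """Recalculate per-pixel shift estimates into vector module and angle, as in (7)-(8)."""
    module = np.hypot(h_x, h_y)
    angle = np.arctan2(h_x, h_y)      # arctg(h_x / h_y) resolved over all quadrants
    return module, angle

def motion_mask(module, threshold=1.0):
    """Illustrative criterion: pixels whose shift module exceeds the threshold are
    assigned to the moving object; the value 1.0 is an assumed example threshold."""
    return module > threshold
```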




TABLE I. THE ESTIMATION ERRORS OF SHIFT VECTORS

  Algorithm               Motion area        Motionless area
                          m̂       D̂          m̂       D̂
  One-row processing results:
  Proposed algorithm      0.01    26         0.02    3
  MVFAST                  0.05    2530       0.01    1
  Average results over the entire image:
  Proposed algorithm      0.01    140        0.09    4
  MVFAST                  0.08    1860       0.02    5

V. CONCLUSION

The developed method, based on identificationless stochastic adaptation, has high noise immunity and makes it possible to eliminate the influence of global IGC, as well as to discard small moving objects that are not of interest. In this paper, such objects were small organisms and particles; in other situations they can be rain, snow, falling leaves, etc. The detection of small objects is achieved by reducing the block size, down to one pixel.

ACKNOWLEDGMENT

The work was supported by the RFBR and the Government of Ulyanovsk Region according to research projects № 18-41-730011 and 19-29-09048.

REFERENCES

[1] B. Antic, V. Crnojevic and D. Culibrk, "Efficient wavelet based detection of moving objects," 16th International Conference on Digital Signal Processing, 2009.
[2] A.M. Bagci, Y. Yardimci and A. Çetin, "Moving object detection using adaptive subband decomposition and fractional lower-order statistics in video sequences," Signal Processing, vol. 82, no. 12, pp. 1941-1947, 2002.
[3] B.U. Töreyin, A.E. Çetin, A. Aksay and M.B. Akhan, "Moving object detection in wavelet compressed video," Signal Processing: Image Communication, vol. 20, no. 3, pp. 255-264, 2005.
[4] S. Ahmed, K. El-Sayed and S. Elhabian, "Moving object detection in spatial domain using background removal techniques," Recent Patents on Computer Science, vol. 1, no. 1, pp. 32-54, 2008.
[5] B. Karasulu and S. Korukoglu, "Moving object detection and tracking in videos," Performance Evaluation Software, SpringerBriefs in Computer Science, pp. 7-30, 2013.
[6] L. Wang and N. Yung, "Extraction of moving objects from their background based on multiple adaptive thresholds and boundary evaluation," IEEE Transactions on Intelligent Transportation Systems, vol. 11, no. 1, pp. 40-51, 2010.
[7] R.V. Kutsov and A.P. Trifonov, "Detection of a moving object in the image," Journal of Computer and Systems Sciences International, vol. 45, no. 3, pp. 459-468, 2006.
[8] S.V. Grishin, D.S. Vatolin, A.S. Lukin, S.Iu. Putilin and K.N. Strelnikov, "A review of block-based methods for estimating motion in digital video signals," Software Systems and Tools: Thematic Collection, vol. 9, pp. 50-62, 2008.
[9] N.Iu. Zolotykh, V.D. Kustikova and I.B. Meerov, "An overview of the methods for searching and tracking vehicles on the video stream," Vestnik of the N.I. Lobachevsky Nizhny Novgorod University, vol. 5, no. 2, pp. 348-358, 2012.
[10] H. Chen, S. Ye, A. Nedzvedz, O. Nedzvedz, H. Lv and S. Ablameyko, "Traffic extreme situations detection in video sequences based on integral optical flow," Computer Optics, vol. 43, no. 4, pp. 647-652, 2019. DOI: 10.18287/2412-6179-2019-43-4-647-652.
[11] I.S. Zaqout, "An efficient block-based algorithm for hair removal in dermoscopic images," Computer Optics, vol. 41, no. 4, pp. 521-527, 2017. DOI: 10.18287/2412-6179-2017-41-4-521-527.
[12] D.A. Forsyth and J. Ponce, "Computer Vision: A Modern Approach" (Russian translation), Moscow: Viliams, 2004.
[13] A.G. Tashlinskii, "Estimation of spatial deformation parameters of image sequences," Ulyanovsk: UlSTU, 2000.
[14] A.G. Tashlinskii, P.V. Smirnov and L.S. Biktimirov, "Methods of finding gradient estimates of target function for measurement of images parameters," Pattern Recognition and Image Analysis, vol. 21, no. 2, pp. 339-342, 2011.
[15] A.G. Tashlinskii, P.V. Smirnov and S.S. Zhukov, "Analysis of methods of estimating objective function gradient during recurrent measurements of image parameters," Pattern Recognition and Image Analysis, vol. 22, no. 3, pp. 399-405, 2012.
[16] A.G. Tashlinskii, S.V. Voronov and P.V. Smirnov, "A way to predict parameters of image registration by estimating inter-frame deformation of local fragments," Pattern Recognition and Image Analysis, vol. 24, no. 1, pp. 179-184, 2014.
[17] P.I. Hosur and K.K. Ma, "Motion vector field adaptive fast motion estimation," Second International Conference on Information, Communications and Signal Processing, pp. 7-10, 1999.



