Adaptation of the mathematical apparatus of the Markov
chain theory for the probabilistic analysis of recurrent
estimation of image inter-frame geometric deformations

                G L Safina1, A G Tashlinskii2 and M G Tsaryov2


                1
                  National Research Moscow State University of Civil Engineering, Yaroslavskoe Shosse, 26,
                Moscow, Russia, 129337
                2
                  Ulyanovsk State Technical University, Severnii Venetz, 32, Ulyanovsk, Russia, 432027


                e-mail: tag54@mail.ru

                Abstract. The paper is devoted to the analysis of the possibilities of using Markov chains for
                analyzing the accuracy of stochastic gradient relay estimation of image geometric
                deformations. One of the ways to reduce computational costs is to discretize the domain of
                studied parameters. This approach allows to choose the dimension of transition probabilities
                matrix a priori. However, such a matrix has a rather complicated structure. It does not
                significantly reduce the number of computations. A modification of the transition probabilities
                matrix is proposed, it’s dimension does not depend on the dimension of estimated parameters
                vector. In this case, the obtained relations determine a recurrent algorithm for calculating the
                matrix at the estimation iterations. For the one-step transitions matrix, the calculated
                expressions for the probabilities of image deformation parameters estimates drift are given.


1. Introduction
At estimation of the parameters of image inter-frame geometric deformations (IID) under conditions
of a priori uncertainty the non-identification stochastic gradient procedures (SGP) [1-2] are widely
used [3-6]:
                                αˆ t = αˆ t −1 − Λ t βt (Q (Z (1) , Z ( 2 ) , αˆ t −1 )) ,       (1)
where Z (1) and Z ( 2 ) are images; α = (α1 , ... , α m )T is a vector of estimated parameters of geometric
deformations; Λ t is gain matrix; α̂ 0 is an initial of the parameter vector; βt is a stochastic gradient of
the cost function (CF) Q (⋅) , characterizing the estimation quality.
   The most practical application have relay procedures                                                   [8],   when   in   (1)
     (sign ( ∂Q (⋅) ∂α ) , sign (( ∂Q (⋅) ∂α )) , ..., sign (( ∂Q (⋅) ∂α ))) . At a good accuracy of the
                                                                                               T
βt
=                       1                           2                                     m

estimates, they have high velocity and better impulse noise resistance compared to gradient procedures
[9]. The next parameter estimate α i is determined as:

                                                     (                                )
                                αˆ i ,t − λi ,t −1 sign ∂Qˆ i ( Ζ1 , Ζ 2 , αˆ t −1 ) ∂α i , αˆ i 0 ∈ Ω(α ) ,
                        αˆ i ,t =                                                                                            (2)
where Ω(α ) is the domain of α . The estimates sequence


                    V International Conference on "Information Technology and Nanotechnology" (ITNT-2019)
Image Processing and Earth Remote Sensing
G L Safina, A G Tashlinskii and M G Tsaryov


                                             αˆ 0 , αˆ 1 , ... , αˆ t , ... , αˆ T ,                       (3)
obtained with SGP in the form of (1), is m -dimensional sequence without aftereffect and is a vector
Markov process. At the same time, the joint probability density (PD) of IID parameters at any iteration
can be expressed through the PD w(αˆ 0 ) of the initial approximation α̂ 0 and one-step conditional
PD π t (αˆ t | αˆ t −1 ) of the Markov sequence transitions from (t − 1) -th iteration to the t -th iteration,
t = 1, T [9]:
                                                                                 T
                                         w(αˆ 0 , αˆ 1 , ..., αˆ T ) = w(αˆ 0 ) ∏ π t (αˆ t | αˆ t −1 ) .
                                                                                t =1
    The apparatus for analysis of Markov sequences makes it possible to take into account the
finiteness of the domain Ω α̂ of possible values of IID parameter estimates for different rules of PD
behavior at the boundaries of parameter domain.
    If the domain Ω α̂ of α̂ t is continuous, then the sequence (3) is a simple Markov sequence; if it is
discrete, then (3) is a Markov chain [10]. The latter is true, in particular, in the relay procedure (2).
Attracting a well-developed mathematical apparatus of Markov sequences [10, 11] and chains [10, 12]
for analyzing the effectiveness of PD with a finite number of iterations allows to obtain a number of
useful results.
    Let us consider the possibility of using the apparatus of the Markov chain theory for modeling the
process of stochastic gradient estimation of IID parameters.

2. The relationship of the matrix of transition probabilities with the probabilities of drift
estimates
If in (1) the gain matrix Λ t is diagonal, then the matrix of conditional probabilities
Π i (l , t ) = P{αˆ it = aik | αˆ il = aij } can be expressed in terms of estimates drift probabilities (EDP) of the
IID parameters [13, 14]. The EDP of the parameter is the probability of the estimate improvement
after the SGP iteration (taking into account possible changes in the estimates of other parameters).
Then, with a variable step λt of the increment in (2), the elements π jk (l , t ) of matrix Π(l, t ) :
                                               ρα+ (ε j ), if l = t − 1, k = j + ∆ t sign ε j ,
                                               ρ o (ε ), if l = t , k = j; ,
                              π jk (l , t ) =  α− j
                                                ρα (ε j ), if l = t − 1, k = j − ∆ t sign ε j ,
                                               0,          other case,
where π jk (l , t ) is the probability that the estimate α̂ i will be equal to value aik , if at an earlier
iteration l < t the estimate had a value aij ; ρα+ (ε j ) is the probability of changing the parameter to α iт ;
ρα− (ε j ) is probability of estimate change from α iт ; ρα0 (ε j ) - probability of no change of the estimate;
ε ij = (α iт − aij ) ; α iт is the exact value of the parameter α i ; ∆ t is the number of possible states of
parameter α in the interval from a j to a k , including the state a k . In the future, to simplify the
recording, the index «i» will be omitted when considering one parameter.
   At a constant step λt the elements π jk (l , t ) are also directly expressed through the EDP:
                                                  ρα+i (ε j ), if l = t − 1, k = j + sign ε j ;
                                                  ρ o (ε ), if l = t , k = j;
                                 π jk (l , t ) =  α−i j                                                        (4)
                                                   ραi (ε j ), if l = t − 1, k = t − sign ε j ;
                                                  0,           other case.
  Under these conditions, we obtain a homogeneous Markov chain (3), for which
                                            Π(t ) = Π t ,                                          (5)
where Π is the matrix of one-step transition probabilities. At t → ∞ such a chain becomes stationary.


V International Conference on "Information Technology and Nanotechnology" (ITNT-2019)                           104
Image Processing and Earth Remote Sensing
G L Safina, A G Tashlinskii and M G Tsaryov


   However, with an increasing the number of estimated parameters, the application of the classical
mathematical apparatus of Markov chains becomes problematic due to the sharp increase in the size of
the transition probability matrix.
   One of the main parameters determining computing costs, when using the Makovsky chain
apparatus, is the number of possible values of IID parameter estimates. A priori, choose the size of the
matrix Π allows the discretization of the estimated parameters domain. However, the use of the
classical apparatus of the Markov chain theory remains reasonable only when estimating one
parameter, since an increase in the number m of estimated parameters by one leads to an increase in
computational costs at least by K 2 , where K is the number of possible discrete values of estimates of
the ( m + 1) -th parameter. In problems of measuring the IID parameters, the value K reaches several
orders of magnitude. In this case, the determination of PD of parameter estimates, based on the use of
a matrix of one-step transitions, becomes an obstacle for probabilistic modeling of the stochastic
gradient process of measuring the parameters of inter-frame deformations.

3. Adaptation of the Markov chains apparatus to the solved problem
To reduce computational costs, we use the fact that at the t -th iteration, regardless of the state of the
estimates of other parameters, transitions from the j -th state of parameter α i estimate are possible
only to the known k -th state, where k ∈ { j + ν it + 1, j + ν it , j, j − ν it , j − ν it − 1,} , ν it = int (λit / ∆αi ) . In
this case, the transition probabilities are determined by the state of other parameter estimates. The
integral probability of the transition of estimate α̂ i from the j -th state (αˆ = aij ) to the k -th state
(αˆ = aik ) is determined by the sum of the transition probabilities from the subdomains ωi k of the
parameter space, i = 1, m , k = 1, K i . For example, to use relay-type procedures when evaluating three
parameters α1 , α 2 and α 3 the overall probability ρ~ij of deterioration of the estimate αˆ1 = a1 j at the
t -th iteration can be written as [15]:
                                                       K1
                                                                         K3
                                                                                                              
                                               ρ~1−j = ∑  p2 k (t − 1)∑ p3l (t − 1)ρ − (ε 1 j , ε 2 k , ε 3l ),
                                                       k =1             l =1                                 
where plk (t − 1) = P(αˆ l = alk | αˆ1 = aij ) is the probability that at the (t − 1) -th iteration for α̂ l = alk ,
k = 1, K l the value α̂1 is equal to a1 j ; ε lk = alk − α lт is deviation of the estimate α̂ l from the exact
value α lт , l = 1, 2, 3 . For m parameters we have:
                                                                                                                   
                     ρ~1−j = ∑  p2 k (t − 1)∑  p3l (t − 1)...∑ ( pmn (t − 1)ρ − (ε 1 j , ε 2 k ,..., ε mn ))...  .
                             K2                        K3                  Km
                                                                                                                           (6)
                             k =1                    l =1                n =1                                      
Similarly, the expressions for probabilities ρ~1oj and ρ~1+j can be written.
   With this approach, the matrices of one-step transitions are also change, which for this case we
                          ( )
denote Πi (t ) = π (jkt ) i, ρ~i*j , where i is the number of parameter; π (jkt ) (i, ρ~ij* ) is probability of transition
       ~

of i -th estimate from the state a j at the (t − 1) -th iteration to the state a k to the t -th iteration:
                                        K2
                                                                 Km
                                                                                                              
                           ρ~ij* = ∑  p1ν (t − 1)...∑ ( pml (t − 1)ρ i* (ε 1ν ,..., ε ij ,..., ε ml )) ,                 (7)
                                        ν =1                     l =1                                        
where ρ i* (⋅) are values of probabilities ρ i+ (⋅) , ρ io (⋅) and ρ i− (⋅) at a vector of estimates mismatch
ε i = (ε1ν ,..., ε ij ,..., ε ml )T , ε ik = aik − α iт . At the same time, the size of the matrix with m parameters
compared with the traditional approach is reduced from ∑ K i × ∑ K i to K i × K i . Computational costs
                                                                                           m        m


                                                                                          i =1     i =1

are about as much reduced. This reduction occurs due to the loss of information about the probability
of belonging to the estimation of the parameter vector of each of the subdomains of the parameter
space. Only information about the projections of the distribution in this space is saved. In this case,

V International Conference on "Information Technology and Nanotechnology" (ITNT-2019)                                      105
Image Processing and Earth Remote Sensing
G L Safina, A G Tashlinskii and M G Tsaryov


this information is sufficient for calculating the PD of IID estimates for a finite number of iterations.
To calculate the PD of the estimate of the i -th parameter at the t -th iteration of the SGP, you need to
know the PD of the estimates of all parameters on the (t − 1) -th iteration:
                                                                    t
                                                                      ~
                                                  p i (t ) = p i (0)∏ Π i ( s ) .                            (8)
                                                     T          T

                                                                    s =1
                                                ~
     Thus, determining the matrix Π i (t ) allows only a recurrent method of calculation. Note also that,
           ~
when Π i (t ) is used even at λi = const the Markov chain of estimates formed by the SGP, it can no
longer be considered homogeneous. Accordingly, the expression (5) for this case is not true. To
calculate the discrete probability distribution pTi (t ) of estimates of the i -th parameter using the matrix
 ~
 Π i (t ) , we obtain the recurrent procedure:
                                                                      ~
                                               pTi (t ) = pTi (t − 1)Π i (t ), i = 1, m .                 (9)
                                                                                          ~
     For a variable step λit with adopted simplifications, the matrix Π i (t ) for the parameter α i can be
determined as:
                           ρ~i−1 + ρ~io1    0       ... ρ~i+1−1 ρ~i+1−2 ...        0        0
                                ~
                               ρ i2
                                  −        ~
                                          ρ i2 ... 0 ρ i2−1 ...
                                              o                    ~ +             0        0
                              ~
                             ρ i3−2
                                 −        ~
                                          ρ i3−1 ... 0
                                             −                      0      ...     0        0
                 ~
                 Πi (t ) =       0       ρ~i−
                                            4−2
                                                    ...    0        0      ...     0        0 .         (10)
                              ...          ...    ...     ...      ...     ...   ...             ...
                               0            0     ...      0        0      ... ρ~ioKi −1       ρ~i+Ki −1
                               0            0     ...      0        0      ...     0       ρ~ioKi + ρ~i+Ki
     For constant λi the expression, determining the matrix element is significantly simplified:
                                                 ~      ~
                                              ρ i− + ρ io , if j = k = 1,
                                              ρ~ − ,
                                                   j      j

                                                              if j = k − 1,1 < j < K i ,
                                              ~i  j

                                              ρ i ,         if j = k , 1 < j < K i ,
                                    ( )
                                                   o

                         π (jki ) t , ρ~i* =  ~ + j
                                                                                                      (11)
                                                ρi ,          if j = k + 1,1 < j < K i ,
                                              
                                       j
                                                   j

                                               ρ~io + ρ~i+ , if j = k = K i ,
                                              
                                                   j      j


                                              0,            another case.
                                                                                         ~
     Relations (7)-(11) determine the recurrent algorithms for calculating the matrix Π i (t ) and the PD
of the parameter estimation errors for the required iteration of estimation, starting from the initial
                                          ~
approximation. The size of Π i (t ) does not depend on the dimension of the vector α and is
determined only by the discretization parameters of the domain of definition of a specific parameter
α i . Computational costs with an increase in the dimension of the vector of parameters grow in
                  m
proportion to ∑ K i , which allows to find a compromise between the accuracy of the calculation of PD
                  i =1
and the requirements for computational resources. Further reduction of computational costs is possible
due to the imposition of restrictions on the range of allowable values of the estimated parameters and
the introduction of rules for taking into account probabilities beyond the boundaries of this area.

4. Conclusion
It is shown that the sequence of estimates of the parameters of IID, obtained using GSP, is a sequence
without aftereffect and is a vector Markov process. With one estimated parameter for probabilistic
modeling of the stochastic gradient estimation process, it is advisable to use the mathematical
apparatus of the Markov chain theory. The study of expressions that allow calculating the transition
probabilities of the matrix of one-step transitions through the probabilities of parameter estimation


V International Conference on "Information Technology and Nanotechnology" (ITNT-2019)                        106
Image Processing and Earth Remote Sensing
G L Safina, A G Tashlinskii and M G Tsaryov


drift showed that for relay algorithms with one estimated parameter, the single-step transition matrix
has a five-diagonal structure and does not depend on the iteration number, determining the Markov
chain uniformity. However, for the vector of parameters, the use of the classical apparatus of the
Markov chain theory becomes problematic due to a sharp increase in the size of the transition
probability matrix. A priori, the matrix size can be chosen to discretize the domain of the parameters
to be estimated, but even with this approach, the use of the classical apparatus makes sense only when
evaluating one parameter.
    In order to reduce computational costs, one-step transition matrices are proposed, the dimension of
which is determined only by discretization of the domain of the corresponding parameters and does
not depend on their number. The resulting matrix allows only a recurrent method of its calculation
from the initial approximation of the parameters to the required iteration. Moreover, the Markov chain
of estimates loses the property of homogeneity. Also, accurate information about the probability
distribution in the parameter space is lost. Only the projections of this spatial distribution are saved.
However, this is enough to solve the problem of finding the PD of parameter estimates for inter-frame
deformations with a finite number of iterations.

5. References
[1] Tashlinskii A G 2003 Computational expenditure reduction in pseudo-gradient image parameter
      estimation Lecture Notes in Computer Science 2658 456-462
[2] Tashlinskii A G 2007 Pseudogradient estimation of digital images interframe geometrical
      deformations Vision Systems: Segmentation and Pattern Recognition (Vienna: I Tech Education
      and Publishing) 25 465-494
[3] Shalev-Shwartz S and Zhang T 2016 Accelerated proximal stochastic dual coordinate ascent for
      regularized loss minimization Mathematical Programming 155(1) 105-145
[4] Borisova I V, Legkiy V N and Kravets S A 2017 Application of the gradient orientation for
      systems of automatic target detection Computer Optics 41(6) 931-937 DOI: 10.18287/2412-
      6179-2017-41-6-931-937
[5] Tashlinskii A G, Smirnov P V and Tsaryov M G 2017 Pixel-by-pixel estimation of scene
      motion in video International Archives of the Photogrammetry Remote Sensing and Spatial
      Information XLII-2/W4 61-65
[6] Fursov V A, Gavrilov A V, Goshin Ye V and Pugachev K G 2017 Conforming identification of
      the fundamental matrix in the image matching problem Computer Optics 41(4) 559-563 DOI:
      10.18287/2412-6179-2017-41-4-559-563
[7] Maksimov A I and Gashnikov M V 2018 Adaptive interpolation of multidimensional signals for
      differential compression Computer Optics 42(4) 679-687 DOI: 10.18287/2412-6179-2018-42-4-
      679-687
[8] Tsypkin Ya Z 1995 Information theory of identification (Moscow: Fizmatlit) p 336
[9] Tashlinskii A G 2008 Optimization of goal function pseudogradient in the problem of
      interframe geometrical deformations estimation Pattern Recognition Techniques, Technology
      and Applications (Vienna: I Tech Education and Publishing) 10 249-280
[10] Tichonov V I and Mironov M A 1977 Markov processes (Moscow: Sovetskoe Radio) p 488
[11] Neveu J 1964 Mathematical foundations of probability theory (Paris: Masson et cie) p 309
[12] Dynkin E B 1959 Foundations of the Markov processes theory (Moscow: Fizmatlit) p 227
[13] Tashlinskii A G and Tichonov V O 2001 Methods for analyzing the error of pseudogradient
      measurement of the parameters of multidimensional processes Izvestiya vuzov:
      Radioelectronika 44 75-80
[14] Tashlinskii A G and Voronov I V 2014 The probability of the demolition of estimates of the
      parameters of image interframe geometric deformations with a pseudogradient measurement
      Izvestiya Samarskogo nauchnogo tsentra Rossiiskoi akademii nauk 6(2) 612-615
[15] Tashlinskii A G, Gorin A A, Muratkhanov D S and Tikhonov V O Priority approach to the
      estimation of the parameters of the spatial image distortions 2001 Pattern Recognition and
      Image Analysis 11(1) 251-253


V International Conference on "Information Technology and Nanotechnology" (ITNT-2019)                 107
Image Processing and Earth Remote Sensing
G L Safina, A G Tashlinskii and M G Tsaryov


Acknowledgments
The reported study was supported by RFBR and Government of Ulyanovsk region, project 18-41-
730006.


V International Conference on "Information Technology and Nanotechnology" (ITNT-2019)   108