Adaptation of the mathematical apparatus of the Markov chain theory for the probabilistic analysis of recurrent estimation of image inter-frame geometric deformations G L Safina1, A G Tashlinskii2 and M G Tsaryov2 1 National Research Moscow State University of Civil Engineering, Yaroslavskoe Shosse, 26, Moscow, Russia, 129337 2 Ulyanovsk State Technical University, Severnii Venetz, 32, Ulyanovsk, Russia, 432027 e-mail: tag54@mail.ru Abstract. The paper is devoted to the analysis of the possibilities of using Markov chains for analyzing the accuracy of stochastic gradient relay estimation of image geometric deformations. One of the ways to reduce computational costs is to discretize the domain of studied parameters. This approach allows to choose the dimension of transition probabilities matrix a priori. However, such a matrix has a rather complicated structure. It does not significantly reduce the number of computations. A modification of the transition probabilities matrix is proposed, it’s dimension does not depend on the dimension of estimated parameters vector. In this case, the obtained relations determine a recurrent algorithm for calculating the matrix at the estimation iterations. For the one-step transitions matrix, the calculated expressions for the probabilities of image deformation parameters estimates drift are given. 1. Introduction At estimation of the parameters of image inter-frame geometric deformations (IID) under conditions of a priori uncertainty the non-identification stochastic gradient procedures (SGP) [1-2] are widely used [3-6]: αˆ t = αˆ t −1 − Λ t βt (Q (Z (1) , Z ( 2 ) , αˆ t −1 )) , (1) where Z (1) and Z ( 2 ) are images; α = (α1 , ... , α m )T is a vector of estimated parameters of geometric deformations; Λ t is gain matrix; α̂ 0 is an initial of the parameter vector; βt is a stochastic gradient of the cost function (CF) Q (⋅) , characterizing the estimation quality. The most practical application have relay procedures [8], when in (1) (sign ( ∂Q (⋅) ∂α ) , sign (( ∂Q (⋅) ∂α )) , ..., sign (( ∂Q (⋅) ∂α ))) . At a good accuracy of the T βt = 1 2 m estimates, they have high velocity and better impulse noise resistance compared to gradient procedures [9]. The next parameter estimate α i is determined as: ( ) αˆ i ,t − λi ,t −1 sign ∂Qˆ i ( Ζ1 , Ζ 2 , αˆ t −1 ) ∂α i , αˆ i 0 ∈ Ω(α ) , αˆ i ,t = (2) where Ω(α ) is the domain of α . The estimates sequence V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) Image Processing and Earth Remote Sensing G L Safina, A G Tashlinskii and M G Tsaryov αˆ 0 , αˆ 1 , ... , αˆ t , ... , αˆ T , (3) obtained with SGP in the form of (1), is m -dimensional sequence without aftereffect and is a vector Markov process. At the same time, the joint probability density (PD) of IID parameters at any iteration can be expressed through the PD w(αˆ 0 ) of the initial approximation α̂ 0 and one-step conditional PD π t (αˆ t | αˆ t −1 ) of the Markov sequence transitions from (t − 1) -th iteration to the t -th iteration, t = 1, T [9]: T w(αˆ 0 , αˆ 1 , ..., αˆ T ) = w(αˆ 0 ) ∏ π t (αˆ t | αˆ t −1 ) . t =1 The apparatus for analysis of Markov sequences makes it possible to take into account the finiteness of the domain Ω α̂ of possible values of IID parameter estimates for different rules of PD behavior at the boundaries of parameter domain. If the domain Ω α̂ of α̂ t is continuous, then the sequence (3) is a simple Markov sequence; if it is discrete, then (3) is a Markov chain [10]. The latter is true, in particular, in the relay procedure (2). Attracting a well-developed mathematical apparatus of Markov sequences [10, 11] and chains [10, 12] for analyzing the effectiveness of PD with a finite number of iterations allows to obtain a number of useful results. Let us consider the possibility of using the apparatus of the Markov chain theory for modeling the process of stochastic gradient estimation of IID parameters. 2. The relationship of the matrix of transition probabilities with the probabilities of drift estimates If in (1) the gain matrix Λ t is diagonal, then the matrix of conditional probabilities Π i (l , t ) = P{αˆ it = aik | αˆ il = aij } can be expressed in terms of estimates drift probabilities (EDP) of the IID parameters [13, 14]. The EDP of the parameter is the probability of the estimate improvement after the SGP iteration (taking into account possible changes in the estimates of other parameters). Then, with a variable step λt of the increment in (2), the elements π jk (l , t ) of matrix Π(l, t ) :  ρα+ (ε j ), if l = t − 1, k = j + ∆ t sign ε j ,  ρ o (ε ), if l = t , k = j; , π jk (l , t ) =  α− j  ρα (ε j ), if l = t − 1, k = j − ∆ t sign ε j , 0, other case, where π jk (l , t ) is the probability that the estimate α̂ i will be equal to value aik , if at an earlier iteration l < t the estimate had a value aij ; ρα+ (ε j ) is the probability of changing the parameter to α iт ; ρα− (ε j ) is probability of estimate change from α iт ; ρα0 (ε j ) - probability of no change of the estimate; ε ij = (α iт − aij ) ; α iт is the exact value of the parameter α i ; ∆ t is the number of possible states of parameter α in the interval from a j to a k , including the state a k . In the future, to simplify the recording, the index «i» will be omitted when considering one parameter. At a constant step λt the elements π jk (l , t ) are also directly expressed through the EDP:  ρα+i (ε j ), if l = t − 1, k = j + sign ε j ;  ρ o (ε ), if l = t , k = j; π jk (l , t ) =  α−i j (4)  ραi (ε j ), if l = t − 1, k = t − sign ε j ; 0, other case. Under these conditions, we obtain a homogeneous Markov chain (3), for which Π(t ) = Π t , (5) where Π is the matrix of one-step transition probabilities. At t → ∞ such a chain becomes stationary. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 104 Image Processing and Earth Remote Sensing G L Safina, A G Tashlinskii and M G Tsaryov However, with an increasing the number of estimated parameters, the application of the classical mathematical apparatus of Markov chains becomes problematic due to the sharp increase in the size of the transition probability matrix. One of the main parameters determining computing costs, when using the Makovsky chain apparatus, is the number of possible values of IID parameter estimates. A priori, choose the size of the matrix Π allows the discretization of the estimated parameters domain. However, the use of the classical apparatus of the Markov chain theory remains reasonable only when estimating one parameter, since an increase in the number m of estimated parameters by one leads to an increase in computational costs at least by K 2 , where K is the number of possible discrete values of estimates of the ( m + 1) -th parameter. In problems of measuring the IID parameters, the value K reaches several orders of magnitude. In this case, the determination of PD of parameter estimates, based on the use of a matrix of one-step transitions, becomes an obstacle for probabilistic modeling of the stochastic gradient process of measuring the parameters of inter-frame deformations. 3. Adaptation of the Markov chains apparatus to the solved problem To reduce computational costs, we use the fact that at the t -th iteration, regardless of the state of the estimates of other parameters, transitions from the j -th state of parameter α i estimate are possible only to the known k -th state, where k ∈ { j + ν it + 1, j + ν it , j, j − ν it , j − ν it − 1,} , ν it = int (λit / ∆αi ) . In this case, the transition probabilities are determined by the state of other parameter estimates. The integral probability of the transition of estimate α̂ i from the j -th state (αˆ = aij ) to the k -th state (αˆ = aik ) is determined by the sum of the transition probabilities from the subdomains ωi k of the parameter space, i = 1, m , k = 1, K i . For example, to use relay-type procedures when evaluating three parameters α1 , α 2 and α 3 the overall probability ρ~ij of deterioration of the estimate αˆ1 = a1 j at the t -th iteration can be written as [15]: K1  K3  ρ~1−j = ∑  p2 k (t − 1)∑ p3l (t − 1)ρ − (ε 1 j , ε 2 k , ε 3l ), k =1  l =1  where plk (t − 1) = P(αˆ l = alk | αˆ1 = aij ) is the probability that at the (t − 1) -th iteration for α̂ l = alk , k = 1, K l the value α̂1 is equal to a1 j ; ε lk = alk − α lт is deviation of the estimate α̂ l from the exact value α lт , l = 1, 2, 3 . For m parameters we have:     ρ~1−j = ∑  p2 k (t − 1)∑  p3l (t − 1)...∑ ( pmn (t − 1)ρ − (ε 1 j , ε 2 k ,..., ε mn ))...  . K2 K3 Km (6) k =1  l =1  n =1   Similarly, the expressions for probabilities ρ~1oj and ρ~1+j can be written. With this approach, the matrices of one-step transitions are also change, which for this case we ( ) denote Πi (t ) = π (jkt ) i, ρ~i*j , where i is the number of parameter; π (jkt ) (i, ρ~ij* ) is probability of transition ~ of i -th estimate from the state a j at the (t − 1) -th iteration to the state a k to the t -th iteration: K2  Km  ρ~ij* = ∑  p1ν (t − 1)...∑ ( pml (t − 1)ρ i* (ε 1ν ,..., ε ij ,..., ε ml )) , (7) ν =1  l =1  where ρ i* (⋅) are values of probabilities ρ i+ (⋅) , ρ io (⋅) and ρ i− (⋅) at a vector of estimates mismatch ε i = (ε1ν ,..., ε ij ,..., ε ml )T , ε ik = aik − α iт . At the same time, the size of the matrix with m parameters compared with the traditional approach is reduced from ∑ K i × ∑ K i to K i × K i . Computational costs m m i =1 i =1 are about as much reduced. This reduction occurs due to the loss of information about the probability of belonging to the estimation of the parameter vector of each of the subdomains of the parameter space. Only information about the projections of the distribution in this space is saved. In this case, V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 105 Image Processing and Earth Remote Sensing G L Safina, A G Tashlinskii and M G Tsaryov this information is sufficient for calculating the PD of IID estimates for a finite number of iterations. To calculate the PD of the estimate of the i -th parameter at the t -th iteration of the SGP, you need to know the PD of the estimates of all parameters on the (t − 1) -th iteration: t ~ p i (t ) = p i (0)∏ Π i ( s ) . (8) T T s =1 ~ Thus, determining the matrix Π i (t ) allows only a recurrent method of calculation. Note also that, ~ when Π i (t ) is used even at λi = const the Markov chain of estimates formed by the SGP, it can no longer be considered homogeneous. Accordingly, the expression (5) for this case is not true. To calculate the discrete probability distribution pTi (t ) of estimates of the i -th parameter using the matrix ~ Π i (t ) , we obtain the recurrent procedure: ~ pTi (t ) = pTi (t − 1)Π i (t ), i = 1, m . (9) ~ For a variable step λit with adopted simplifications, the matrix Π i (t ) for the parameter α i can be determined as: ρ~i−1 + ρ~io1 0 ... ρ~i+1−1 ρ~i+1−2 ... 0 0 ~ ρ i2 − ~ ρ i2 ... 0 ρ i2−1 ... o ~ + 0 0 ~ ρ i3−2 − ~ ρ i3−1 ... 0 − 0 ... 0 0 ~ Πi (t ) = 0 ρ~i− 4−2 ... 0 0 ... 0 0 . (10) ... ... ... ... ... ... ... ... 0 0 ... 0 0 ... ρ~ioKi −1 ρ~i+Ki −1 0 0 ... 0 0 ... 0 ρ~ioKi + ρ~i+Ki For constant λi the expression, determining the matrix element is significantly simplified: ~ ~  ρ i− + ρ io , if j = k = 1,  ρ~ − , j j if j = k − 1,1 < j < K i ,  ~i j  ρ i , if j = k , 1 < j < K i , ( ) o π (jki ) t , ρ~i* =  ~ + j (11) ρi , if j = k + 1,1 < j < K i ,  j j  ρ~io + ρ~i+ , if j = k = K i ,  j j 0, another case. ~ Relations (7)-(11) determine the recurrent algorithms for calculating the matrix Π i (t ) and the PD of the parameter estimation errors for the required iteration of estimation, starting from the initial ~ approximation. The size of Π i (t ) does not depend on the dimension of the vector α and is determined only by the discretization parameters of the domain of definition of a specific parameter α i . Computational costs with an increase in the dimension of the vector of parameters grow in m proportion to ∑ K i , which allows to find a compromise between the accuracy of the calculation of PD i =1 and the requirements for computational resources. Further reduction of computational costs is possible due to the imposition of restrictions on the range of allowable values of the estimated parameters and the introduction of rules for taking into account probabilities beyond the boundaries of this area. 4. Conclusion It is shown that the sequence of estimates of the parameters of IID, obtained using GSP, is a sequence without aftereffect and is a vector Markov process. With one estimated parameter for probabilistic modeling of the stochastic gradient estimation process, it is advisable to use the mathematical apparatus of the Markov chain theory. The study of expressions that allow calculating the transition probabilities of the matrix of one-step transitions through the probabilities of parameter estimation V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 106 Image Processing and Earth Remote Sensing G L Safina, A G Tashlinskii and M G Tsaryov drift showed that for relay algorithms with one estimated parameter, the single-step transition matrix has a five-diagonal structure and does not depend on the iteration number, determining the Markov chain uniformity. However, for the vector of parameters, the use of the classical apparatus of the Markov chain theory becomes problematic due to a sharp increase in the size of the transition probability matrix. A priori, the matrix size can be chosen to discretize the domain of the parameters to be estimated, but even with this approach, the use of the classical apparatus makes sense only when evaluating one parameter. In order to reduce computational costs, one-step transition matrices are proposed, the dimension of which is determined only by discretization of the domain of the corresponding parameters and does not depend on their number. The resulting matrix allows only a recurrent method of its calculation from the initial approximation of the parameters to the required iteration. Moreover, the Markov chain of estimates loses the property of homogeneity. Also, accurate information about the probability distribution in the parameter space is lost. Only the projections of this spatial distribution are saved. However, this is enough to solve the problem of finding the PD of parameter estimates for inter-frame deformations with a finite number of iterations. 5. References [1] Tashlinskii A G 2003 Computational expenditure reduction in pseudo-gradient image parameter estimation Lecture Notes in Computer Science 2658 456-462 [2] Tashlinskii A G 2007 Pseudogradient estimation of digital images interframe geometrical deformations Vision Systems: Segmentation and Pattern Recognition (Vienna: I Tech Education and Publishing) 25 465-494 [3] Shalev-Shwartz S and Zhang T 2016 Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization Mathematical Programming 155(1) 105-145 [4] Borisova I V, Legkiy V N and Kravets S A 2017 Application of the gradient orientation for systems of automatic target detection Computer Optics 41(6) 931-937 DOI: 10.18287/2412- 6179-2017-41-6-931-937 [5] Tashlinskii A G, Smirnov P V and Tsaryov M G 2017 Pixel-by-pixel estimation of scene motion in video International Archives of the Photogrammetry Remote Sensing and Spatial Information XLII-2/W4 61-65 [6] Fursov V A, Gavrilov A V, Goshin Ye V and Pugachev K G 2017 Conforming identification of the fundamental matrix in the image matching problem Computer Optics 41(4) 559-563 DOI: 10.18287/2412-6179-2017-41-4-559-563 [7] Maksimov A I and Gashnikov M V 2018 Adaptive interpolation of multidimensional signals for differential compression Computer Optics 42(4) 679-687 DOI: 10.18287/2412-6179-2018-42-4- 679-687 [8] Tsypkin Ya Z 1995 Information theory of identification (Moscow: Fizmatlit) p 336 [9] Tashlinskii A G 2008 Optimization of goal function pseudogradient in the problem of interframe geometrical deformations estimation Pattern Recognition Techniques, Technology and Applications (Vienna: I Tech Education and Publishing) 10 249-280 [10] Tichonov V I and Mironov M A 1977 Markov processes (Moscow: Sovetskoe Radio) p 488 [11] Neveu J 1964 Mathematical foundations of probability theory (Paris: Masson et cie) p 309 [12] Dynkin E B 1959 Foundations of the Markov processes theory (Moscow: Fizmatlit) p 227 [13] Tashlinskii A G and Tichonov V O 2001 Methods for analyzing the error of pseudogradient measurement of the parameters of multidimensional processes Izvestiya vuzov: Radioelectronika 44 75-80 [14] Tashlinskii A G and Voronov I V 2014 The probability of the demolition of estimates of the parameters of image interframe geometric deformations with a pseudogradient measurement Izvestiya Samarskogo nauchnogo tsentra Rossiiskoi akademii nauk 6(2) 612-615 [15] Tashlinskii A G, Gorin A A, Muratkhanov D S and Tikhonov V O Priority approach to the estimation of the parameters of the spatial image distortions 2001 Pattern Recognition and Image Analysis 11(1) 251-253 V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 107 Image Processing and Earth Remote Sensing G L Safina, A G Tashlinskii and M G Tsaryov Acknowledgments The reported study was supported by RFBR and Government of Ulyanovsk region, project 18-41- 730006. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 108