Fast Full-Search Motion Estimation Method Based On Fast Fourier Transform Algorithm Elena I. Zakharenko, Evgeniy A. Altman Omsk State Transport University {zaxarenko.elena, altmanea}@gmail.com Abstract. Motion estimation (ME) is used extensively in video codecs based on MPEG-4 standards to remove interframe redundancy. Motion estimation is based on the block matching method which evaluates block mismatch by the sum of squared differences (SSD) measure. Winograd’s Fourier transform is applied and the redundancy of the overlapped area computation among refer- ence blocks is eliminated in order to reduce the computational amount of the ME. When the block size is N × N and the number of reference blocks in a search window is the same as the current block, this method reduces the compu- tational amount (additions and multiplications) by 58 % of the straightforward approach for N = 8 and to 81 % for N = 16 without degrading motion tracking capability. The proposed fast full-search ME method enables more accurate mo- tion estimation in comparison to conventional fast ME methods, thus it can be applied in video systems. Keywords: motion estimation, fast Fourier transform, correlation, convolution theorem, full-search, video encoding. 1 Introduction The popularity of video as a mean of data representation and transmission is in- creasing. Hence the requirements for a quality and size of video are growing. High visual quality of video is provided by coding. In 1960s the motion estimation (ME) and compensation were proposed to improve the efficiency of video coding [1]. The current frame is divided into non-overlapping blocks. For each block of the current frame the most similar block of the reference frame within the limited search area is found. The criterion of the similarity of the two blocks is called a metric com- parison of the two blocks. The position of the block, for which an extremum of metric is founded, determines the coordinates of the motion vector of the current block. The full search algorithm is the most accurate method of the block ME, i.e. the proportion of true motion vectors found is the highest [2]. The current block is com- pared to all candidate blocks within the restricted search area in order to find the best match. This ME algorithm requires a lot of computing resources. Therefore, a lot of alter- native fast motion estimation algorithms were developed. In 1981 T. Koga and other authors proposed a three-step search algorithm (TTS) [3]. 183 The disadvantage of fast search methods is finding a local extremum of a function of the difference of two blocks. Consequently motion estimation degrades by half degradation in some sequences compared to brute-force and visual quality of video degrades as well [4]. 2 The Criterion To Compare Blocks The standards of video coding do not regulate the choice of criterion for matching two blocks (metric). One of the most popular metrics is the sum of square difference (SSD): N h 1 N w 1 SSD(i, j )    ( B( x, y )  S ( x  i, y  j )) 2 , (1) y 0 x 0 where i, j – the coordinates of the motion vector of the current block, i ϵ (–Vw/2; Vw/2), j ϵ (–Vh/2; Vh/2), where Vw × Vh – size of the area which can be is the upper left corner of the title block on the reference frame; x, y – coordinates of the current block B; Nw × Nh – block size B; S – reference area of size Sw × Sh, where Sw = Nw + Vw, Sh = = Nh + Vh; B and S – luminance images in color format YUV. Inside the search area size Sw × Sh is the minimum value of SSD criterion for the current block B, which determines the coordinates of the motion vector in order. SSD can be calculated through fewer number of operations by decomposition into three components [5]: N h 1 N w 1   B ( x, y )  y 0 x 0 2 (2) N h 1 N w 1    B ( x, y ) S ( x  i , y  j )  (3) y 0 x 0 N h 1 N w 1    S 2 ( x  i, y  j ), (4) y 0 x 0 In [5, 6] for the (3) computation were used Fast Fourier Transform (FFT). We propose to replace this algorithm by other fast transforms: Winograd algorithm and the number-theoretic transform of Farm (NTT). 3 Research Results A programming model for block motion estimation using SSD metrics was imple- mented in Matlab in order to analyze the effectiveness of the fast Fourier transform algorithm for computing the block matching criterion. Expression (4) is calculated using the algorithm described in [2]. 184 To analyze the effectiveness of the algorithms of fast Fourier transform (FFT) for motion estimation were considered Cooley-Tukey and Winograd algorithms. We selected B block 16 × 16 and 8 × 8 pixels, a reference area S is twice as much as B block, i.e. is 32 × 32 and 16 × 16, respectively. Operating time of algorithm is significant for processing and analysis of video, it depends on its computational complexity. The number of arithmetic operations (mul- tiplications (×) and additions (+)) is significant when the complexity of the method for motion estimation is measured. Therefore, this criterion was chosen as the metric calculation efficiency of SSD. Table 1 shows the results for a block size of 16 × 16 pixels and the reference area S 32 × 32 dots. Table 1. The computational complexity of algorithms SSD when S = 2B = 32 for one of the current block Real arithmetic operations plexity, % The com- expres- expression expression In Algorithms in total sion (2) (3) (4) total × + × + × + × + SSD according to (1), FS – – – – – – 65 536 130 816 196 352 100 SSD according to (1), TTS – – – – – – 2 560 5 110 7 670 4 SSD according 256 255 65 536 65 280 65 536 65 280 131 328 130 815 262 143 134 to (2), (3) and (4) Winograd FFT 0 0 7 203 25 923 1 426 2 115 8 629 28 038 36 667 19 Cooley-Tukey 0 0 23 552 33 792 1 426 2 115 24 978 35 907 60 885 31 FFT The results for block 8×8 and the search area 16×16 similar to presented in Table 1. Winograd algorithms reduces the computational complexity of the ME algorithm by 58 % for block size 8×8. Having analyzed the results shown in Tables 1, it is obvious that the choice of con- volution algorithm has a significant impact on the computational complexity of moti- on estimation. For the hardware implementation of a significant impact on the area occupied on the chip, providing multiplication. As can be seen from Table 1 application of the Winograd algorithms for calculating SSD reduces number of the real multiplications. The algorithms examined operate with complex numbers in order to get the result. For identifying more efficient ways to compute the two-dimensional convolution the number-theoretic transform of Farm (NTT) was considered. This algorithm is based on modular arithmetic and uses only integer real numbers. It is hard to assess the computational efficiency by the number of arithmetic opera- tions of NTT relative to considered algorithms, because besides addition and multipli- cation there are operations modulo Farm. In this case the criterion of comparison of these methods is the estimated performance. This criterion evaluates the time required 185 to calculate the coordinates of the moving blocks with each of the FFT algorithms considered in the developed model. The results of Matlab simulation are presented in Table 2. Table 2. The estimated performance of the algorithms SSD when S=2B=16 for one of the current block Algorithms Time, ms Winograd FFT 5,5 Cooley-Tukey FFT 6,7 NTT 6,1 As shown in Table 2 motion estimation method based on block matching using Winograd algorithm has the lowest complexity. 4 Conclusions A new fast full search algorithm for block motion estimation based on convolution theorem and Winograd’s Fourier transform is presented. Proposed method reduces the computational complexity of the ME algorithm by 58 % the size of block 8 × 8, and – 81 % the size of block 16 × 16 pixels for the search area, exceeding twice the size of the current block. The developed motion estimation algorithm of full search can be used in the environment with a high demand for the quality of video, and the compu- tational complexity of the algorithm is not critical. The further direction of research in this area will be the analysis of fast algorithm for computing two-dimensional convo- lution through its decomposition into several convolutions shorter length. References 1. Haskell B.G., Limb J.O.: Predictive video encoding using measured subjective velocity. U.S. Patent No. 3,632,865 (1972) 2. Lee C.-H., Chen L.-H.: A Fast Motion Estimation Algorithm Based on the Block Sum Pyramid. J. IEEE Trans. Image Processing, vol. 6, pp. 1587–1591 (1997) 3. Koga T., Iinuma K., Hirano A., Iijima Y., Ishiguro T.: Motion compensated interframe coding for video conferencing. In: Proc. Nat. Telecommun. Conf., New Orleans, LA, pp. G5.3.1–5.3.5 (1981) 4. Cheung C. H., Po L. M.: A novel cross-diamond search algorithm for fast block motion es- timation. J. IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 12, pp. 1168–1177 (2002) 5. Kilthau S. L., Drew M. S., Moller T. Full search content independent block matching based on the fast Fourier transform. IEEE 1CIP, I, pp. 669–672 (2002) 6. Zubarev Iu.B., Dvorkovich V.P., Nechepaev V.V., Sokolov A.Iu.: Metody analiza i kompensatsii dvizheniia v dinamicheskikh izobrazheniiakh. J. Elektrosviaz', no. 11, pp. 15–21 (1998) 186