Fast Multigrid Pattern Search for Motion Estimation in Hybrid Compression Systems Nguyen Van Truong and Andrey A. Tropchenko ITMO University, Saint Petersburg, 197101, Russian Federation, thientruong.mars@gmail.ru Abstract. This paper deals with the motion estimation algorithms for the video sequences analysis in hybrid compression standards H.264/AVC and H.265/HEVC. Based on the analysis of the advantages and disad- vantages of existing algorithms has been offered a new algorithm, which is called Fast Multigrid Pattern Search (FMPS). This new algorithm in- cludes the Fast Interger-Pel Search FIPS, Adaptive Rood Pattern Search ARPS and hierarchical search MP (Hierarchical search or Mean pyra- mid). All motion estimation algorithms have been implemented using MI- CROSOFT VISUAL STUDIO and tested with several video sequences. The criteria for evaluating the algorithms were: speed, peak signal to noise ratio and RD-curves. The proposed method showed a much better performance at a comparable error and deviation (about 4 times faster). The average loss of the RD curve value (PSNR versus bitrate) is up to 3.75% in all. Application of this algorithm in hybrid codecs (H.264/AVC and H.265/HEVC) instead of the standard can significantly reduce com- pression time. This feature enables to recommend it in telecommunica- tion systems for multimedia data storing, transmission and processing. Keywords: motion estimation, H.264/AVC, H.265/HEVC, ARPS, FIPS, Fast Multigrid Pattern Search. 1 Introduction Interframe predictive coding is used to eliminate the large amount of temporal and spatial redundancy that exists in video sequences and helps in compressing them. In conventional predictive coding the difference between the current frame and the predicted frame (based on the previous frame) is coded and transmit- ted. The better the prediction, the smaller the error and therefore lowered the transmission bit rate. If a scene is still, then a good prediction for a particular pixel in the current frame is the same pixel in the previous frame and the error is zero. However, when there is motion in a sequence, then a pixel on the same part of the moving object is a better prediction for the current pixel [1]. There are many motion estimation algorithms for interframe prediction cod- ing. This work is oriented on algorithms that are called "block matching algo- rithms" [2-5]. Block matching algorithm is a method of finding block matching in a video sequence for motion estimation. The algorithm includes the division of the current frame into blocks and comparing each of them with a corresponding block in the adjacent frame. It creates a vector which describes the motion of the unit from one place to another. This process is performed for all the frame blocks. Motion estimation has a fairly large amount of computation and can consume up to 80% of the processing power of the encoder, when a full search (FS) is used. It evaluates all possible candidate blocks within the search window. Outgoing of this disadvantage, looking for other effective algorithms is started. The Unsymmetrical-cross Multihexagon-grid Search (UMHexagonS) or FIPS was proposed for the fast integer pel motion estimation in H. 264/AVC [6]. Com- pared to FS, the UMHexagonS algorithm claims that it can reduce 90% of motion estimation time, drop less than 0.05dB PSNR, and maintain the low bit rate, in order to make the initial search point close to the best prediction point. And a novel Center Biased Fractional-pel Search (CBFPS) algorithm or ARPS is pro- posed for the fast fractional pel motion estimation in [6], which can save 30–50% computation compared with the Full Fractional-pel Search scheme. However, in the UMHexagonS algorithm compared to ARPS and Enhanced Predictive Zonal Search EPZS, the computational complexity is very high, because the search pat- tern shape has more search candidate. Therefore, in this work we propose fast motion estimation algorithm with name Fast Multigrid Pattern Search FMPS. The proposed method showed that it works about 4 times faster. This is achieved by reducing the number of search pixels on the blocks and the number of coded blocks in the frame. The peak signal to noise ratio in different video sequences shows better and worse results than characteristics of known algorithms (up to 3.75%) so it requires further investigation. 2 Fast Multigrid Pattern Search FMPS To overcome the disadvantages of existing algorithms, we propose the follow- ing algorithm called “Fast Multigrid Pattern Search”, which includes the FIPS algorithm for integer pel estimation, ARPS algorithm for fractional pel estima- tion and Hierarchical Search MP. The whole motion estimation process of FMPS is exemplified in the flowchart in fig. 1: Component algorithms are discussed in the following sections. 3 Fast Interger-Pel Search FIPS FIPS is a hybrid method because it includes four steps with different kinds of search pattern [6] (fig. 2): – Step 1: Initial search point prediction: Spatial median prediction, upper layer prediction, neighboring reference frame prediction, and temporal prediction are used to predict current motion vector (MV) of block; – Step 2: Asymmetrical cross search. It is followed by an early termination scheme; Fig. 1. The flowchart for FMPS algorithm – Step 3: Uneven multi-hexagon-grid search (step 3-1). Two sub-steps include a square search pattern and a 16 points hexagon search pattern (step 3-2); – Step 4: Extended hexagon-based search (step 4-1). Two sub-steps include a hexagon search pattern and a diamond search pattern (Small diamond search pattern SDSP) (step 4-2). Early termination scheme is also applied during the search process. For each step mentioned above, the best search point (which means a point with minimum cost so far) generated by the previous step is used as a search center for the current search step. Initial search point prediction is an important technique introduced by many fast motion estimation algorithms [7,8] setting the search area around the MBD (Minimum Block Distortion) point of the whole search window in order to improve the performance of motion estimation. Median prediction as described in [9] is frequently used in many algorithms and standard. Motion vectors of the collocated block in the previous frame and of the spatially adjacent blocks are also used in [7] as initial search point predictors. four kinds of prediction modes in this proposal: Fig. 2. Search process of FIPS algorithm, W=16 3.1 Spatial median Prediction As fig. 3 shows, median predictor is used in median prediction of MV, the median value of the adjacent blocks on the left, up, and up-right (or up-left) of the current block is used to predict the motion vector of the current block: M Vpredict = median(M Vlef t , M Vup , M Vup−right ). (1) Therefore we can predict the SAD (Sum of Absolute Differences) by: SADpredict = min(SADXpredict , SADY predict ). (2) 3.2 Upper layer Prediction As fig. 4 shows, there are seven inter prediction block modes defined in H.264. 8x8 modes (mode 4, 5, 6, 7) are first searched followed by 16x16 modes (mode 3, 2, 1). Such a strategy is not beneficial in utilizing the motion relationship between different modes, therefore the search order of the modes here is changed according to the size of the block mode, a hierarchically search order from mode 1 to 7 is chosen as our mode search order and the motion vector of the up layer block (for example, mode 5 or 6 is the up layer of mode 7, and mode 4 is the up layer of mode 5 or 6, etc.) is used as one of the prediction candidates of lower layer, just as fig. 5 demonstrates. Fig. 3. Spatial median prediction of motion vectors Fig. 4. Seven prediction blocks modes in H.264 M Vpredict = median(M VU pLayer ). (3) Therefore we can predict the SAD by: Fig. 5. Spatial upper layer prediction of motion vector SADU pLayer SADpredict = . (4) 2 3.3 Neighboring reference frame Prediction Multi-reference frames motion compensation is adopted in JVT to increase prediction accuracy and coding efficiency. For the same current block, motion vectors in different reference frames exhibit a strong correlation in our exper- iment. Therefore current block’s motion vector in reference frame tp can be predicted by scaling of current block’s motion vector in reference frame tp+1, as fig. 6 shows: tc − tp M Vpredict = M Vneigh . (5) tc − tp − 1 Therefore we can predict the SAD by: SADpredict = SADneigh . (6) 3.4 Temporal Prediction For natural video sequence, the motion track of a moving object is continuous except scene change occur, therefore there is strong correlation of motion vector in the temporal domain, and then we utilize this property to give an accurate starting search position. In this prediction mode, the motion vector of the cor- responding block in the last frame is used as one motion vector candidate, as fig. 7 shows: Fig. 6. Temporal Neighboring Ref-frame Prediction of motion vector Fig. 7. Temporal Corresponding-block Prediction of motion vector M Vpredict = M Vcorres . (7) Therefore we can predict the SAD by: SADpredict = SADcorres . (8) The prediction with the minimum cost among these prediciencies will be chosen as the initial search position of next search step. 4 Adaptive Rood Pattern Search ARPS The algorithm uses the fact that the total motion of the frame is usually co- herent, i.e. if the blocks around the current block moving in a certain direction then there is a high probability that the current block will also have a similar motion vector. This algorithm uses the motion vector of the block to its imme- diate left to predict its own motion vector [10]. The algorithm summarized as follows (fig. 8): Fig. 8. Example of the ARPS algorithm – Step 1: Find the predicted motion vector of the block. Set step size as max (|x|, |y|), where (x, y) – coordinates of predicted motion vector. Find points (around the center) are located at a distance of step size from the center. Find the point with the minimum distortion, which then to be the new center. – Step 2: Perform a search on SDSP around the new center. Repeat SDSP search until point with the minimum distortion is at the center of SDSP. 5 Hierarchical search MP (Mean pyramid) MP algorithm is described as: At the beginning, to eliminate the effect of noise low-resolution image is obtained by low-pass filter [11, 12]. The scheme of the MP algorithm is shown in fig. 9. Fig. 9. The scheme of the MP algorithm 6 Experimental results The proposed algorithm was implemented using Microsoft Visual Studio and tested with several video sequences1 . In the experiment, we will compare the proposed algorithm with the algo- rithm FS, FIPS with ARPS[6] and EPZS, which are used in JM 19.0 (H.264/14496- 10 AVC Reference Software) by the following criteria: the coding time and the quality of the video sequence (according to PSNR and RD curve). The exper- iment is carried on Window 10 OS platform with Intel(R) Core(TM) i5-4210U CPU @1.70GHz 2.40 GHz and 6GB RAM. The coding time is estimated by comparing the average coding time for en- coding each video sequence (shown in Table 1). It can be seen that the proposed algorithm reduces the coding time by about 4 times in comparison with the other algorithms. The quality of the video sequence is estimated by comparing the average PSNR video exponent (Table 2) and the RD curve values (Fig. 10). The results of the research show that the proposed algorithm loses not more than 0.87 to 3.75% of the PSNR ratio in comparison with other algorithms. In Fig. 10 shows the curves of the dependence of the distortion rate RD of the proposed algorithm and others for the difference video sequences with QP (quantization parameter) equal to 37, 32, 27, 22, vertical axes are PSNR (dB), horizontal axes correspond to bitrate (kbps), and each point on the curves represents the parameter QP. From Fig. 10 that the RD-curves for the considered algorithms (FS, FIPS with ARPS, EPZS and proposed algorithms) are close for each sub-step. This can explain the fact that the proposed change has little impact on both PSNR and bitrate. 1 Test video sequences ftp://ftp.tnt.uni-hannover.de/pub/svc/testsequences/. Table 1. Average coding time for encoding video sequences The relative decrease in the time Video Sequences Average coding time, s of the proposed algorithm FIPS with Proposed FIPS with EPZS EPZS ARPS[6] algorithm ARPS BUS_352x288_15_avc_384,yuv 71.228 66.460 15.794 4.51 4.21 CITY_704x576_30_avc_1024,yuv 239.785 232.819 55.505 4.32 4.19 CREW_704x576_30_avc_1500,yuv 187.483 176.579 46.063 4.07 3.83 FOOTBALL_352x288_15_avc_384,yuv 62.276 59.535 13.085 4.76 4.55 FOREMAN_352x288_30_avc_256,yuv 56.930 54.734 13.208 4.31 4.14 HARBOUR_704x576_30_avc_1500,yuv 276.431 253.878 65.817 4.20 3.86 MOBILE_352x288_30_avc_384,yuv 73.816 71.245 17.961 4.11 3.97 SOCCER_704x576_60_avc_3000,yuv 263.043 245.680 57.433 4.56 4.28 Table 2. Average PSNR for encoding video sequences The relative decrease in the PSNR Video Sequences Average PSNR, dB of the proposed algorithm to the FIPS with Proposed FIPS with FS EPZS FS EPZS ARPS algorithm ARPS BUS_352x288_15_avc_384,yuv 36.865 36.864 36.864 36.432 1.17 1.17 1.17 CITY_704x576_30_avc_1024,yuv 38.222 38.220 38.221 37.560 1.73 1.72 1.73 CREW_704x576_30_avc_1500,yuv 41.250 41.246 41.249 39.704 3.75 3.73 3.75 FOOTBALL_352x288_15_avc_384,yuv 38.512 38.508 38.511 37.845 1.73 1.72 1.73 FOREMAN_352x288_30_avc_256,yuv 39.153 39.149 39.153 38.031 2.87 2.85 2.86 HARBOUR_704x576_30_avc_1500,yuv 38.680 38.678 38.679 37.383 3.35 3.34 3.35 MOBILE_352x288_30_avc_384,yuv 34.453 34.451 34.452 33.656 2.31 2.30 2.31 SOCCER_704x576_60_avc_3000,yuv 37.915 37.911 37.914 37.584 0.87 0.86 0.87 7 Conclusion A new algorithm for interframe encoding of the hybrid codecs is proposed, which includes well-known algorithms FIPS, ARPS and MP. The algorithm was tested with several video sequences. Experimental results showed that the pro- posed algorithm works faster 4 times, while the PSNR coefficient is only 0.86- 3.75% below the other algoritms. Fig. 10. RD-curves with QP = 37, 32, 27, 22 for video sequence BUS References 1. Tauraga, D. Search algorithms for Block Matching Estimation // Mid-term Project, spring 1998. 2. Moschetti, F., Kunt, M., Debes, E. A Statistical Adaptive Block-Matching Motion Estimation // IEEE transactions on circuits and systems for video technology. 2003. Vol. 13, No. 4. P. 417–431. 3. Babu, D.V., Subramanian, P., Karthikeyan, C. Performance Analysis of Block Matching Algorithms for Highly Scalable Video Compression // 1-42440731-2006 IEEE. P. 179-181. 4. Barjatya, A. Block Matching Algorithms For Motion Estimation // DIP 6620 Final Project Paper in Digital Image Processing, Utah State University. P. 1–6. 5. Cuevas, E., Zaldívar, D., Pérez-Cisneros, M., Oliva, D. Block-matching algorithm based on differential evolution for motion estimation // Engineering Applications of Artificial Intelligence. 2013. Vol. 26, No. 1. P. 488–498. 6. Zhibo , C., Jianfeng, X., Yun, H., Junli, Z. Fast integer-pel and fractional-pel mo- tion estimation for H.264/AVC // Journal of Visual Communication and Image Representation. 2006, Vol. 17, No. 2, P. 264-290. 7. Tourapis, H. Y., Tourapis, A., Topiwala, P. Fast Motion Estimation within the JVT codec // JVT-E023, 5th Meeting: Geneva, Switzerland, 09-17 October, 2002. 8. Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG // JVT-F100d2.doc, 6th meeting, Awaji, Island, JP, 5-13 December, 2002. 9. Jain, A. K. Fundamentals of Digital Image Processing // Englewood Cliffs, NJ: Prentice-Hall, 1989. 10. Nguyen, V. T., Tropchenko, A. A. Hierarchical adaptive rood pattern search for motion estimation at video sequence analysis // Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2016. Vol. 16, No. 3. P. 474–481. 11. Nguyen, V. T., Tropchenko, A. A. Methods And Algorithms For Reducing Tem- poral Redundancy Of Video Data / Sbornik statyei II-oy Mezdunarodnoy nauchno- prakticheskoy konferensyy “Aktualnye problem nauky XXI veka”. 2015. Vol. 2. P. 36- 41. — Moscow: Cognitio, 2015. 12. Tropchenko, A. A., Tropchenko, A. U., Nguyen, V.T. Research of Block-Based Motion Estimation Methods for Video Compression // TEM Jornal - 2016. Vol. 5, No. 3, P. 187-194.