Image Processing, Geoinformatics and Information Security COPY-MOVE DETECTION ALGORITHM EFFICIENCY INCREASE USING BINARY SPACE PARTITIONING TREES A. Kuznetsov1,2, E. Myasnikov1 1 Samara National Research University, Samara, Russia 2 Image Processing Systems Institute - Branch of the Federal Scientific Research Centre “Crys- tallography and Photonics” of Russian Academy of Sciences, Samara, Russia Abstract. Duplicates embedding is one of the most frequently used type of digital image change. The copy-move procedure includes copying an image fragment from one part and pasting it to another part of the same image. More- over, this fragment can be changed using some transformations, like affine or contrast enhancement. Existing copy-move detection methods consist of two main steps: feature calculation in overlapping processing windows and the nearest feature search in Euclidean space using lexicographic sorting or kd-tree search. In this paper we suggest to use binary space partitioning trees on the second step of the detection algorithm – vp-tree (vantage-point tree). We pre- sent the search speed comparison using vp-tree and kd-tree. The results show advantage of the proposed solution over kd-tree use. Keywords: duplicate, copy-move, forgery, kd-tree, vp-tree, binary space parti- tioning tree Citation: Kuznetsov A, Myasnikov E. Copy-move detection algorithm efficien- cy increase using binary space partitioning trees. CEUR Workshop Proceed- ings, 2016; 1638: 373-378. DOI: 10.18287/1613-0073-2016-1638-373-378 1 Introduction In the digital age it is impossible to imagine a more popular method of presenting and transmitting information than digital images and video. They are used in all spheres of life, and are applied in many fields of science and technology. Such a huge amount of data cannot be missed by users with malicious intent whose aim is to provide forgery data to the end user. Depending on the image content, it can be used in the compro- mising form and cause serious political and economic consequences if the data changes will not be detected. That is why in today's world the urgent task is to devel- op methods and algorithms for artificial digital image changes detection. The urgency of the problem is also due to the dynamic growth of the number of software products for digital imaging. This software does not require special skills or training. Information Technology and Nanotechnology (ITNT-2016) 373 Image Processing, Geoinformatics and Information Security Kuznetsov A, Myasnikov E… Among all the existing image forgery creation methods one of the most frequently used is to copy and paste an image fragment. Such attacks are called copy-move at- tacks, and the copied fragment – a duplicate. Between copying and pasting different transformations can be applied to the duplicate: affine (scale, rotation), brightness (contrast enhancement, additive noise), and others. The widespread use of this partic- ular attack is caused by its implementation simplicity. Nowadays one can find a large number of papers on development of distorted and plain copy-move detection algorithms [1-5]: the common step used in them is the development of some features that are invariant to duplicates distortions that appear during the transformation step. These features are usually calculated in an overlapping processing window mode [2]. Previously there were developed plain copy-move de- tection algorithms, which were based on the hash values calculation in a sliding win- dow and their frequencies calculation using a hash table [4-5]. The proposed solution has shown high efficiency at a low computational complexity. In this paper, we assume the operation of feature calculation is solved and focus on the study of the nearest features search procedure. The paper is structured as follows. The first part describes the general scheme of copy-move detection algorithms. The second part is devoted to the binary space partitioning trees description. The third part contains the results of conducted experiments on the computational complexity of the proposed method using a variety of sample volumes. 2 General copy-move detection algorithm scheme A large number of works devoted to the development of copy-move detection algo- rithms exists nowadays [1-3]. Most of them adhere the general scheme shown in Fig. 1. Fig. 1. General scheme of a copy-move detection algorithm The first stage of the algorithm includes preprocessing of the analyzed image. During this step, noise filtering or image channels combining operation is applied. This step is followed by two alternative procedures. The first is to calculate the key points of the image (for example, in the work [3]), the second characterizes a partitioning scheme of the image into blocks to apply a processing window further. The third step of the algorithm is dedicated to the calculation of features, based on the key points or calculated using pixel values of the processing window according to the previous step. The next stage is to find the nearest features, which are then considered as potential Information Technology and Nanotechnology (ITNT-2016) 374 Image Processing, Geoinformatics and Information Security Kuznetsov A, Myasnikov E… duplicates. This procedure is the most resource-intensive in the whole scheme of im- age analysis, therefore, the processing time of the whole algorithm depends on its choice. The final stage of the algorithm is post-filtering and post-processing. This step reduces the number of false detected duplicates and missed duplicates, which obvi- ously increases the accuracy of the copy-move detection algorithm. Most studies on copy-move detection algorithms development [1-3] pay a lot of atten- tion to the research of new features invariant to various distortions. These features lead to detection accuracy improvement. The task of computational complexity re- duce is not considered by most of the authors. The obvious solution to the nearest feature vectors search is their pairwise comparison. This approach is computationally inefficient and has complexity O(N2), which does not allow to use it for large data analysis, for example, in remote sensing data analysis tasks. To reduce the computa- tional complexity, the great majority of authors use two main methods: lexicograph- ical sorting and kd-tree search. The lexicographic sorting means that all the feature vector are placed in the feature matrix in a row-wise way, and then the sorting of rows is applied starting with the first feature. As a result, the nearest feature vectors will be located on the adjacent rows of the matrix. An example of this approach is demonstrated in the paper [6]. In most of the studies [7-8] kd-tree [9] is used for the nearest feature vectors search. It is used in conjunction with the k nearest neighbors search algorithm, which allows to generate a list of k nearest vectors to every feature vector in Euclidean space. In com- parison with the lexicographic sorting this approach leads to better search results and, consequently, to the detection accuracy increase. In this paper, we propose to use an alternative binary space partitioning tree – vp-tree [10]. It is assumed that the use of vp-tree will lead to a better analysis time perfor- mance in comparison with kd-tree. This assumption is based on the results of previous studies in which vp-tree showed better results in search tasks [10], as well as reducing the size of multi-dimensional data [11]. 3 Binary space partitioning trees used in the study Kd-tree [9] is balanced binary space partitioning tree, the construction of which for a set of vectors is as follows. One axis among all coordinate axes is selected that will be used in partitioning of the set of input vectors. Selection of the coordinate axis can be carried out in a cyclical order (first coordinate then second coordinate, etc.) or based on the maximum variation of the vectors in the set along a selected axis (the latter option was used in this paper). After selecting a coordinate axis, having subset of vectors we choose vector with a median value along the coordinate axis selected for partitioning. The chosen vector is associated with the root of the tree, and all the other vectors are divided into two sub- sets. In the first subset we place vectors with values of the corresponding coordinate below the median value. In the second subset we place vectors with values of the corresponding coordinate above the median value. So formed subsets are associated with the left and the right subtrees of the root, respectively. After this, the above pro- Information Technology and Nanotechnology (ITNT-2016) 375 Image Processing, Geoinformatics and Information Security Kuznetsov A, Myasnikov E… cess is repeated recursively for the left and the right subtrees. Fig. 2a shows an exam- ple of kd-tree construction in two-dimensional space. a) b) Fig. 2. Binary space partitioning trees in two-dimensional space: (a) kd-tree, (b) vp-tree. Different levels of decomposition are shown with different lines: solid line (dark blue) – first level; dash-dot line (red) – second level; dash line (green) – third level Vp-tree [10] is also a balanced binary space partitioning tree. Its construction for a set of vectors is as follows. We choose one vector from a set of vectors (vantage point), which becomes the root of the tree. Then all the other vectors are divided into two subsets, so that the first subset contains vectors located by a distance not exceeding a value of R from the vantage point, and the second subset contains vectors located by a distance greater then R from the vantage point. The threshold value R is selected so that the number of elements in the first and the second subsets differed by no more than one. Formed in this way subsets are associated with the left and the right subtrees of the root, respectively. Described process is repeated recursively for the left and the right subtrees. Fig. 2b shows an example of vp-tree construction in two-dimensional space. Information Technology and Nanotechnology (ITNT-2016) 376 Image Processing, Geoinformatics and Information Security Kuznetsov A, Myasnikov E… Trees built using the algorithms described above can be used to find the nearest neighbor using the branch and bound algorithm as described in [9, 10]. It should be noted that the considered structures perform decomposition in essentially different ways: with hyperplanes orthogonal to the coordinate axes in the kd-trees and with hyperspheres with centers at the vantage points in the vp-trees. 4 Experiments For experimental study, we used a standard PC (Intel Core i5-3470 3.2. GHz, 4 Gb RAM). The algorithms were implemented in C ++ using Microsoft Visual Studio 2013 development environment. The objects of the study are feature sets of different size containing from 5,000 to 15,000 vectors. In the experiment, for each feature vector we search the nearest vector among the remaining vectors of the sample. Upon completion of the algorithm, a list containing pairs of indices of closest vectors is generated, and the run time of the search algo- rithm is evaluated in seconds. Comparison of the search time is conducted for the three solutions: elementwise comparison of feature vectors (brute force), kd-tree which is the most commonly used structure in the duplicate detection problem, and vp-tree which is the binary space partitioning tree proposed to be used for the dupli- cate detection in images in this paper. Results of the experiments shown in Table 1, show that the proposed approach performs search for several seconds faster than the kd-tree, and significantly faster than the brute force approach with increasing sample size. Table 1. Comparison of the nearest feature vector search time (sec.) for different sample sizes from 5,000 to 15,000 5000 7500 10000 12500 15000 brute force 9,6 22,7 39,0 62,0 90,0 kd-tree 5,7 13,4 24,5 36,3 49,0 vp-tree 4,6 11,0 22,1 34,6 47,2 5 Conclusion In this study, an alternative approach using vp-trees was proposed to search for simi- lar feature vectors in the problem of duplicate detection in digital images. The pro- posed solution shown best nearest vector search time in comparison with the most frequently used solution based on kd-trees. It is planned to conduct a study on the effectiveness of different vantage point selection algorithms to solve the problem of duplicate detection in digital images. Information Technology and Nanotechnology (ITNT-2016) 377 Image Processing, Geoinformatics and Information Security Kuznetsov A, Myasnikov E… Acknowledgements This work was supported by the Russian Foundation for Basic Research (RFBR) grants № 16-37-00056 мол_а and № 16-37-00202 мол_а. References 1. Christlein V, Riess C, Jordan J, Angelopoulou E. An evaluation of popular copy-move forgery detection approaches. IEEE Trans. Inf. Forensic Secur, 2012; 7(6): 1841–1854. 2. Popescu A, Farid H. Exposing digital forgeries by detecting duplicated image regions. URL: http://www.ists.dartmouth.edu/library/102.pdf 3. Fridrich J, Soukal D, Lukas J. Detection of copy-move forgery in digital images. URL: http://www.ws.binghamton.edu/fridrich/Research/copymove.pdf 4. Glumov N, Kuznetsov A, Myasnikov V. The algorithm for copy-move detection on digital images. Computer Optics, 2013; 37(3): 360–367. 5. Vladimirovich KA, Valerievich MV. A fast plain copy-move detection algorithm based on structural pattern and 2D Rabin-Karp rolling hash. In: Campilho A, Kamel M. (eds.) ICIAR 2014, Springer, Heidelberg, 2014, Part I. LNCS, 8814: 461–468. 6. Bayram S, Sencar H, Memon H. An efficient and robust method for detecting copy-move forgery. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, 2009: 1053–1056. 7. Huang H, Guo W, Zhang Y. Detection of copy-move forgery in digital images using SIFT algorithm. In: Pacific-Asia Workshop on Computational Intelligence and Industrial Appli- cation, 2008; 2: 272–276. 8. Ju S, Zhou J, He K. An Authentication Method for Copy Areas of Images. International Conference on Image and Graphics, 2007: 303–306. 9. Bentley JL. Multidimensional binary search trees used for associative searching. Commu- nications of the ACM, 1975; 18(9): 509. 10. Yianilos: Data structures and algorithms for nearest neighbor search in general metric spaces. Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, 1993: 311–321. 11. Myasnikov EV. Evaluation of Space Partitioning Data Structures for Nonlinear Mapping. 23rd International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, WSCG 2015 – Full Papers Proceedings, 2015: 109-118. Information Technology and Nanotechnology (ITNT-2016) 378