<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.18287/1613-0073-2016-1638-373-378</article-id>
      <title-group>
        <article-title>COPY-MOVE DETECTION ALGORITHM EFFICIENCY INCREASE USING BINARY SPACE PARTITIONING TREES</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>A. Kuznetsov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E. Myasnikov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Image Processing Systems Institute - Branch of the Federal Scientific Research Centre “Crystallography and Photonics” of Russian Academy of Sciences</institution>
          ,
          <addr-line>Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>373</fpage>
      <lpage>378</lpage>
      <abstract>
        <p>Duplicates embedding is one of the most frequently used type of digital image change. The copy-move procedure includes copying an image fragment from one part and pasting it to another part of the same image. Moreover, this fragment can be changed using some transformations, like affine or contrast enhancement. Existing copy-move detection methods consist of two main steps: feature calculation in overlapping processing windows and the nearest feature search in Euclidean space using lexicographic sorting or kd-tree search. In this paper we suggest to use binary space partitioning trees on the second step of the detection algorithm - vp-tree (vantage-point tree). We present the search speed comparison using vp-tree and kd-tree. The results show advantage of the proposed solution over kd-tree use.</p>
      </abstract>
      <kwd-group>
        <kwd>duplicate</kwd>
        <kwd>copy-move</kwd>
        <kwd>forgery</kwd>
        <kwd>kd-tree</kwd>
        <kwd>vp-tree</kwd>
        <kwd>binary space partitioning tree</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>In the digital age it is impossible to imagine a more popular method of presenting and
transmitting information than digital images and video. They are used in all spheres of
life, and are applied in many fields of science and technology. Such a huge amount of
data cannot be missed by users with malicious intent whose aim is to provide forgery
data to the end user. Depending on the image content, it can be used in the
compromising form and cause serious political and economic consequences if the data
changes will not be detected. That is why in today's world the urgent task is to
develop methods and algorithms for artificial digital image changes detection. The urgency
of the problem is also due to the dynamic growth of the number of software products
for digital imaging. This software does not require special skills or training.
Among all the existing image forgery creation methods one of the most frequently
used is to copy and paste an image fragment. Such attacks are called copy-move
attacks, and the copied fragment – a duplicate. Between copying and pasting different
transformations can be applied to the duplicate: affine (scale, rotation), brightness
(contrast enhancement, additive noise), and others. The widespread use of this
particular attack is caused by its implementation simplicity.</p>
      <p>
        Nowadays one can find a large number of papers on development of distorted and
plain copy-move detection algorithms [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1-5</xref>
        ]: the common step used in them is the
development of some features that are invariant to duplicates distortions that appear
during the transformation step. These features are usually calculated in an overlapping
processing window mode [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Previously there were developed plain copy-move
detection algorithms, which were based on the hash values calculation in a sliding
window and their frequencies calculation using a hash table [
        <xref ref-type="bibr" rid="ref4 ref5">4-5</xref>
        ]. The proposed solution
has shown high efficiency at a low computational complexity.
      </p>
      <p>In this paper, we assume the operation of feature calculation is solved and focus on
the study of the nearest features search procedure. The paper is structured as follows.
The first part describes the general scheme of copy-move detection algorithms. The
second part is devoted to the binary space partitioning trees description. The third part
contains the results of conducted experiments on the computational complexity of the
proposed method using a variety of sample volumes.
2</p>
    </sec>
    <sec id="sec-2">
      <title>General copy-move detection algorithm scheme</title>
      <p>
        A large number of works devoted to the development of copy-move detection
algorithms exists nowadays [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ]. Most of them adhere the general scheme shown in Fig.
1.
The first stage of the algorithm includes preprocessing of the analyzed image. During
this step, noise filtering or image channels combining operation is applied. This step
is followed by two alternative procedures. The first is to calculate the key points of
the image (for example, in the work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), the second characterizes a partitioning
scheme of the image into blocks to apply a processing window further. The third step
of the algorithm is dedicated to the calculation of features, based on the key points or
calculated using pixel values of the processing window according to the previous step.
The next stage is to find the nearest features, which are then considered as potential
duplicates. This procedure is the most resource-intensive in the whole scheme of
image analysis, therefore, the processing time of the whole algorithm depends on its
choice. The final stage of the algorithm is post-filtering and post-processing. This step
reduces the number of false detected duplicates and missed duplicates, which
obviously increases the accuracy of the copy-move detection algorithm.
      </p>
      <p>
        Most studies on copy-move detection algorithms development [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ] pay a lot of
attention to the research of new features invariant to various distortions. These features
lead to detection accuracy improvement. The task of computational complexity
reduce is not considered by most of the authors. The obvious solution to the nearest
feature vectors search is their pairwise comparison. This approach is computationally
inefficient and has complexity O(N2), which does not allow to use it for large data
analysis, for example, in remote sensing data analysis tasks. To reduce the
computational complexity, the great majority of authors use two main methods:
lexicographical sorting and kd-tree search.
      </p>
      <p>
        The lexicographic sorting means that all the feature vector are placed in the feature
matrix in a row-wise way, and then the sorting of rows is applied starting with the
first feature. As a result, the nearest feature vectors will be located on the adjacent
rows of the matrix. An example of this approach is demonstrated in the paper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
In most of the studies [
        <xref ref-type="bibr" rid="ref7 ref8">7-8</xref>
        ] kd-tree [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is used for the nearest feature vectors search. It
is used in conjunction with the k nearest neighbors search algorithm, which allows to
generate a list of k nearest vectors to every feature vector in Euclidean space. In
comparison with the lexicographic sorting this approach leads to better search results and,
consequently, to the detection accuracy increase.
      </p>
      <p>
        In this paper, we propose to use an alternative binary space partitioning tree – vp-tree
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. It is assumed that the use of vp-tree will lead to a better analysis time
performance in comparison with kd-tree. This assumption is based on the results of previous
studies in which vp-tree showed better results in search tasks [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], as well as reducing
the size of multi-dimensional data [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Binary space partitioning trees used in the study</title>
      <p>
        Kd-tree [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is balanced binary space partitioning tree, the construction of which for a
set of vectors is as follows. One axis among all coordinate axes is selected that will be
used in partitioning of the set of input vectors. Selection of the coordinate axis can be
carried out in a cyclical order (first coordinate then second coordinate, etc.) or based
on the maximum variation of the vectors in the set along a selected axis (the latter
option was used in this paper).
      </p>
      <p>After selecting a coordinate axis, having subset of vectors we choose vector with a
median value along the coordinate axis selected for partitioning. The chosen vector is
associated with the root of the tree, and all the other vectors are divided into two
subsets. In the first subset we place vectors with values of the corresponding coordinate
below the median value. In the second subset we place vectors with values of the
corresponding coordinate above the median value. So formed subsets are associated
with the left and the right subtrees of the root, respectively. After this, the above
process is repeated recursively for the left and the right subtrees. Fig. 2a shows an
example of kd-tree construction in two-dimensional space.</p>
      <p>
        Fig. 2. Binary space partitioning trees in two-dimensional space: (a) kd-tree, (b) vp-tree.
Different levels of decomposition are shown with different lines: solid line (dark blue) – first
level; dash-dot line (red) – second level; dash line (green) – third level
Vp-tree [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is also a balanced binary space partitioning tree. Its construction for a set
of vectors is as follows. We choose one vector from a set of vectors (vantage point),
which becomes the root of the tree. Then all the other vectors are divided into two
subsets, so that the first subset contains vectors located by a distance not exceeding a
value of R from the vantage point, and the second subset contains vectors located by a
distance greater then R from the vantage point. The threshold value R is selected so
that the number of elements in the first and the second subsets differed by no more
than one. Formed in this way subsets are associated with the left and the right subtrees
of the root, respectively. Described process is repeated recursively for the left and the
right subtrees. Fig. 2b shows an example of vp-tree construction in two-dimensional
space.
Trees built using the algorithms described above can be used to find the nearest
neighbor using the branch and bound algorithm as described in [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. It should be
noted that the considered structures perform decomposition in essentially different
ways: with hyperplanes orthogonal to the coordinate axes in the kd-trees and with
hyperspheres with centers at the vantage points in the vp-trees.
4
      </p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>For experimental study, we used a standard PC (Intel Core i5-3470 3.2. GHz, 4 Gb
RAM). The algorithms were implemented in C ++ using Microsoft Visual Studio
2013 development environment. The objects of the study are feature sets of different
size containing from 5,000 to 15,000 vectors.</p>
      <p>In the experiment, for each feature vector we search the nearest vector among the
remaining vectors of the sample. Upon completion of the algorithm, a list containing
pairs of indices of closest vectors is generated, and the run time of the search
algorithm is evaluated in seconds. Comparison of the search time is conducted for the
three solutions: elementwise comparison of feature vectors (brute force), kd-tree
which is the most commonly used structure in the duplicate detection problem, and
vp-tree which is the binary space partitioning tree proposed to be used for the
duplicate detection in images in this paper. Results of the experiments shown in Table 1,
show that the proposed approach performs search for several seconds faster than the
kd-tree, and significantly faster than the brute force approach with increasing sample
size.
In this study, an alternative approach using vp-trees was proposed to search for
similar feature vectors in the problem of duplicate detection in digital images. The
proposed solution shown best nearest vector search time in comparison with the most
frequently used solution based on kd-trees. It is planned to conduct a study on the
effectiveness of different vantage point selection algorithms to solve the problem of
duplicate detection in digital images.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work was supported by the Russian Foundation for Basic Research (RFBR)
grants № 16-37-00056 мол_а and № 16-37-00202 мол_а.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Christlein</surname>
            <given-names>V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riess</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angelopoulou</surname>
            <given-names>E.</given-names>
          </string-name>
          <article-title>An evaluation of popular copy-move forgery detection approaches</article-title>
          .
          <source>IEEE Trans. Inf. Forensic Secur</source>
          ,
          <year>2012</year>
          ;
          <volume>7</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1841</fpage>
          -
          <lpage>1854</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Popescu</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farid</surname>
            <given-names>H</given-names>
          </string-name>
          .
          <article-title>Exposing digital forgeries by detecting duplicated image regions</article-title>
          . URL: http://www.ists.dartmouth.edu/library/102.pdf
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fridrich</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soukal</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lukas</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Detection of copy-move forgery in digital images</article-title>
          . URL: http://www.ws.binghamton.edu/fridrich/Research/copymove.pdf
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Glumov</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Myasnikov</surname>
            <given-names>V.</given-names>
          </string-name>
          <article-title>The algorithm for copy-move detection on digital images</article-title>
          .
          <source>Computer Optics</source>
          ,
          <year>2013</year>
          ;
          <volume>37</volume>
          (
          <issue>3</issue>
          ):
          <fpage>360</fpage>
          -
          <lpage>367</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Vladimirovich</surname>
            <given-names>KA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valerievich</surname>
            <given-names>MV</given-names>
          </string-name>
          .
          <article-title>A fast plain copy-move detection algorithm based on structural pattern and 2D Rabin-Karp rolling hash</article-title>
          . In:
          <string-name>
            <surname>Campilho</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kamel</surname>
            <given-names>M</given-names>
          </string-name>
          . (eds.)
          <source>ICIAR</source>
          <year>2014</year>
          , Springer, Heidelberg,
          <year>2014</year>
          ,
          <string-name>
            <surname>Part</surname>
            <given-names>I. LNCS</given-names>
          </string-name>
          ,
          <volume>8814</volume>
          :
          <fpage>461</fpage>
          -
          <lpage>468</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bayram</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sencar</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Memon H</surname>
          </string-name>
          .
          <article-title>An efficient and robust method for detecting copy-move forgery</article-title>
          .
          <source>In: IEEE International Conference on Acoustics, Speech, and Signal Processing</source>
          ,
          <year>2009</year>
          :
          <fpage>1053</fpage>
          -
          <lpage>1056</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Huang</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            <given-names>W</given-names>
          </string-name>
          , Zhang Y.
          <article-title>Detection of copy-move forgery in digital images using SIFT algorithm</article-title>
          .
          <source>In: Pacific-Asia Workshop on Computational Intelligence and Industrial Application</source>
          ,
          <year>2008</year>
          ;
          <volume>2</volume>
          :
          <fpage>272</fpage>
          -
          <lpage>276</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ju</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>An Authentication Method for Copy Areas of Images</article-title>
          .
          <source>International Conference on Image and Graphics</source>
          ,
          <year>2007</year>
          :
          <fpage>303</fpage>
          -
          <lpage>306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Bentley</given-names>
            <surname>JL</surname>
          </string-name>
          .
          <article-title>Multidimensional binary search trees used for associative searching</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <year>1975</year>
          ;
          <volume>18</volume>
          (
          <issue>9</issue>
          ):
          <fpage>509</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Yianilos</surname>
          </string-name>
          :
          <article-title>Data structures and algorithms for nearest neighbor search in general metric spaces</article-title>
          .
          <source>Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms</source>
          ,
          <year>1993</year>
          :
          <fpage>311</fpage>
          -
          <lpage>321</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Myasnikov</surname>
            <given-names>EV</given-names>
          </string-name>
          .
          <article-title>Evaluation of Space Partitioning Data Structures for Nonlinear Mapping</article-title>
          .
          <source>23rd International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision</source>
          , WSCG 2015 - Full Papers Proceedings,
          <year>2015</year>
          :
          <fpage>109</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>