<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Spatio-Temporal Slices for Frame Cut Detection in Video</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>N A Sorokina</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V A Fedoseev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Image Processing Systems Institute - Branch of the Federal Scientific Research Centre “Crystallography and Photonics” of Russian Academy of Sciences</institution>
          ,
          <addr-line>Molodogvardeyskaya str. 151, Samara, Russia, 443001</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Moskovskoe Shosse 34, Samara, Russia, 443086</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>266</fpage>
      <lpage>272</lpage>
      <abstract>
<p>The paper proposes an approach to detecting unauthorized inter-frame video changes using spatio-temporal slices. This approach can significantly reduce the amount of data processed and replace video processing with image processing, which can be performed much faster. To test the efficiency of this approach, we consider a simple algorithm that analyzes adjacent rows of a slice and then classifies the rows based on the result. Experimental studies have revealed that this algorithm shows moderate results in terms of quality, but it has great potential for improvement, which confirms the promise of spatio-temporal slices for the given problem.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        1.1. Problem statement
Today digital video plays an increasingly important role in society. In 2015, according to the
Sandvine report [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], video and audio accounted for more than 70% of North American traffic. In
addition, according to the Ericsson report [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], by 2019 video should exceed 50% of mobile traffic
(it is already above 40%). The reasons for this are not only the development of the entertainment
industry, but also the growing market for video surveillance systems (up to 20% per annum,
according to the MarketsandMarkets analysts' report [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) and their widespread introduction in both
large business structures and small companies. As a consequence, the data received by surveillance
systems are increasingly used in investigative activities or as evidence in legal proceedings. For this
reason, such data must be reliably protected from unauthorized alteration by intruders.
      </p>
      <p>One of the most common methods of unauthorized video alteration is inter-frame modification,
which includes the removal of video fragments or their replacement with copies of other ones. Such
changes can remove crime evidence or data about movements of persons or vehicles that are important
in a particular context. In video signals obtained from a stationary camera, such changes can be
practically invisible. In the case of a moving camera, such changes can also be hard to detect when an
intruder cuts or replaces short-term fragments.</p>
      <p>
        In this paper, we consider the problem of detecting artificial inter-frame changes in video signals
taken from a moving or stationary camera. The detection method should work well with video signals
stored in various formats, and should combine high detection accuracy with high speed.
1.2. Review of related studies
In practice, the detection of artificial changes in video can be carried out using digital
forensics methods developed since the second half of the 2000s [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4-6</xref>
        ]. The main achievements in
this direction are associated with H. Farid, A.C. Popescu, S. Prasad, J. Fridrich, A. Piva, and M. Barni.
In particular, the latter two, in the review paper [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], classified digital video forensics methods into the following
groups:
      </p>
      <p>1) camera-based methods that analyze various video artifacts to determine the optical system of the
camera;
2) coding-based methods that identify artifacts resulting from encoding video using certain codecs;
3) geometry, or physics-based methods that detect violations in the physical or geometric
parameters of the objects observed;
4) pixel-based methods based on detecting changes at the pixel level of the video.</p>
      <p>
        Examples of algorithms aimed at detecting inter-frame modifications can be found, in particular, in
[8-14]. Most of them are coding-based methods [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8-11</xref>
        ] or geometry / physics-based methods [
        <xref ref-type="bibr" rid="ref12 ref13 ref14">12-14</xref>
        ].
      </p>
      <p>
        The coding-based methods [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8-11</xref>
        ] rely on the properties of certain video formats
(usually different versions of MPEG) and assume the separation of frames into different types
(P-frames, I-frames, etc.). Therefore, these methods do not satisfy the claimed universality
requirement regarding the data format. In addition, a significant number of such methods (in
particular, [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8-10</xref>
        ]) can establish the fact of a video change but cannot determine
the exact location of the changes (in the time domain).
      </p>
      <p>
        As for the geometry / physics-based methods, many of them (in particular, [
        <xref ref-type="bibr" rid="ref12 ref13">12-13</xref>
        ]) rely on
an optical flow to track changes frame by frame. This technique works well for a stationary camera,
but for a moving camera it needs to take into account camera movements, which can be unknown.
Moreover, the methods based on optical flow do not provide high computational performance.
Another method from this group [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] takes a different approach. As an illustrative
example, that paper considers a simple scene with moving balls. To detect the removal of a
video fragment, the method first tracks ball positions and then detects physically unjustified
deviations of the found trajectories, which is possible evidence of unauthorized changes. The main
drawback of this method is the need to solve the complex problem of tracking several moving objects.
However, the basic idea of this method is rather attractive: to detect unauthorized alterations, we
can try to analyze video data over a long time interval.
      </p>
      <p>
        In this paper, we test a method based on the same idea of detecting deviations in time. However, as
the analyzed data, we propose to use so-called spatio-temporal slices of the video: slices of the
video data cube along the time axis and one of the spatial axes (for example, the horizontal one).
If we build several such horizontal slices at a certain vertical interval, the resulting set of
images gives quite enough information about object movements, although the data have a much
smaller volume compared with the original video. Moreover, to process these data, we can use
computationally efficient image processing methods, in particular, parallel-recursive
FIR filters [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ].
      </p>
      <p>
        In papers [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ], a similar approach is used for road object detection in the problem
of autonomous navigation. The algorithms of [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ] also process not all the pixels of each frame, but
only horizontal lines spaced at equal intervals. This allows the authors to solve
the problem with satisfactory accuracy in real time.
      </p>
      <p>The paper is organized as follows. Section 2 illustrates the traces of natural events in
spatio-temporal slices and outlines the principles for detecting inter-frame video changes with their help.
Section 3 describes a simple cut detection method based on these principles. Finally, Section 4
describes the experimental studies.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Object movements at spatio-temporal slices</title>
      <p>Figure 1 shows an example of a spatio-temporal slice of a video obtained by a stationary camera. In
this figure, the following events are marked with numbers: (1) – movements by the hands of
a standing person, (2) – movements of a person who emerged from one door and entered another, (3) –
the appearance and stopping of a car, (4) – the appearance of a person from the left border of the frame and
their movement. As the figure shows, these events produce smooth curves in the slices that
characterize object movements. One can also note local background shifts due to camera
deviations or fluctuations of observable objects (trees, advertising signs, etc.). Figure 2 illustrates four
typical types of object movements observed in spatio-temporal slices.</p>
      <p>In the case of unauthorized video alteration in the time domain, such changes appear in video slices
as clear horizontal offsets, as shown in Figure 3. These offsets can appear over the entire frame width.
Furthermore, if we consider a set of slices made at different vertical positions, such offsets will be
observed in the same lines of the slices, corresponding to the same time shift. Thus, we assume that the
considered video slices contain enough information to detect inter-frame changes.</p>
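<p>A toy illustration of this effect: in the synthetic slice below (an invented drifting-bar scene, not data from the paper), deleting a block of rows produces a horizontal offset whose position is revealed by adjacent-row differences:</p>

```python
import numpy as np

# Toy spatio-temporal slice: one row per frame, a 3-pixel bright bar
# drifting right by one column per frame traces a smooth diagonal.
n_frames, width = 120, 200
slice_img = np.zeros((n_frames, width))
for t in range(n_frames):
    slice_img[t, t:t + 3] = 1.0

# Deleting frames 50..79 removes the corresponding slice rows and
# leaves a horizontal offset in the trace at the cut line.
cut = np.vstack([slice_img[:50], slice_img[80:]])

# The count of changed pixels between adjacent rows peaks at the cut:
# 2 pixels change per normal row, 6 at the junction.
d = np.abs(np.diff(cut, axis=0)).sum(axis=1)
print(int(np.argmax(d)))  # 49
```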
      <p>Figure 2. Four types of object movement in spatio-temporal slices: (a) an object enters the camera
view and goes beyond it, (b) an object appears in the camera view and goes beyond it, (c) an object
enters the camera view and disappears, (d) an object appears and disappears within the camera view.</p>
      <p>Next, we can use the obtained ρ_k and δ_k(ρ_k) (for simplicity, we will denote the latter as δ_k, i.e.
without the argument) to detect artificial changes based on the following assumptions. Low values of
ρ_k and δ_k indicate unaltered videos, while sharp leaps in either ρ_k or δ_k may give evidence of
artificial changes (see Fig. 4-5).</p>
      <p>To detect artificial changes using ρ_k and δ_k, we used an algorithm based on supervised learning.
It classified slice image rows into two classes: “Cut” and “Non-cut”. To train the algorithm, we used
the following features of the rows:
p1 = ρ_k, (3)
p2 = δ_k, (4)
p3 = min(ρ_k − ρ_{k−1}, ρ_k − ρ_{k+1}), (5)
p4 = min(δ_k − δ_{k−1}, δ_k − δ_{k+1}), (6)
p5 = ρ_k − med(ρ_k), (7)
p6 = δ_k − med(δ_k). (8)</p>
      <p>We should note especially that we calculated the features (3)-(8) at the local maxima only. The
function med(x) in equations (7)-(8) denotes the median of the 4-point neighborhood of x,
excluding x itself.</p>
      <p>To speed up the algorithm, we classified only the rows corresponding to the local maxima of p3
and p4.</p>
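<p>A minimal sketch of computing features (3)-(8) and selecting the candidate rows at local maxima, assuming the two per-row statistics are precomputed 1-D arrays; the array names rho and delta and the peak-style sign convention in p3 and p4 are our assumptions for this example:</p>

```python
import numpy as np

def row_features(rho, delta, k):
    """Features (3)-(8) for slice row k.

    rho, delta: 1-D arrays of the per-row statistics, assumed precomputed.
    med() is the median of the 4-point neighborhood excluding k itself.
    """
    med_rho = np.median(np.r_[rho[k - 2:k], rho[k + 1:k + 3]])
    med_delta = np.median(np.r_[delta[k - 2:k], delta[k + 1:k + 3]])
    feats = [
        rho[k],                                                 # p1, eq. (3)
        delta[k],                                               # p2, eq. (4)
        min(rho[k] - rho[k - 1], rho[k] - rho[k + 1]),          # p3, eq. (5)
        min(delta[k] - delta[k - 1], delta[k] - delta[k + 1]),  # p4, eq. (6)
        rho[k] - med_rho,                                       # p5, eq. (7)
        delta[k] - med_delta,                                   # p6, eq. (8)
    ]
    return [float(f) for f in feats]

def candidate_rows(p):
    """Rows at local maxima of a feature sequence (the rows classified)."""
    p = np.asarray(p)
    return [k for k in range(1, len(p) - 1)
            if p[k] > p[k - 1] and p[k] > p[k + 1]]

# A single spike at row 2 yields large values of all six features there.
rho = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0])
delta = np.array([0.0, 0.0, 3.0, 0.0, 0.0, 0.0])
print(row_features(rho, delta, 2))  # [5.0, 3.0, 5.0, 3.0, 5.0, 3.0]
print(candidate_rows(rho))          # [2]
```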
      <p>Figure 4. Dependence of ρ_k (a) and δ_k (b) on k for an unaltered video.</p>
      <p>Figure 5. Dependence of ρ_k (a) and δ_k (b) on k for a video with a cut at frame 130.</p>
    </sec>
    <sec id="sec-3">
      <title>4. Experimental investigations</title>
      <p>To test the proposed method, we used two types of video: DVR recordings (made from a moving car)
and model recordings (made mainly by a stationary camera and containing typical pedestrian
movements).</p>
      <p>To conduct the experiments, we made spatio-temporal slices of the source videos and then divided
them into 100-line fragments. Half of these fragments were obtained from 100 consecutive frames,
while the other half were composite and contained a frame cut at the 50th line. The length of the gap in
frames was a parameter of the experiment. Then, for each image, we calculated the feature sets. 70% of
the data obtained was used as the training set, whereas the remaining 30% formed the test sample. During
testing, the method did not use information about the line of the gap. The experimental studies
were carried out in two stages. The first stage was aimed at selecting the most appropriate feature set and
classifier model. We considered two models: a linear SVM and a non-linear SVM with a radial basis
function kernel. At the second stage, we investigated the algorithm performance for various gap lengths and
for different types of video. In addition, at the second stage, we analyzed the efficiency of combining
data from different lines of several slices, performed by summing the corresponding lines.</p>
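<p>The two-model comparison can be sketched with scikit-learn (an assumption: the paper does not name its SVM implementation). The feature data below are synthetic stand-ins, not the paper's measurements:</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in data: one (p3, p4) feature pair per 100-line
# fragment; label 1 = "Cut", 0 = "Non-cut".  Two well-separated
# Gaussian clusters replace the real slice-derived features here.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),   # "Non-cut" fragments
               rng.normal(4.0, 1.0, (200, 2))])  # "Cut" fragments
y = np.r_[np.zeros(200), np.ones(200)]

# 70% training / 30% test split, as in the experiments.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

# The two classifier models compared at the first stage.
accs = {k: SVC(kernel=k).fit(X_tr, y_tr).score(X_te, y_te)
        for k in ("linear", "rbf")}
print(accs)
```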
      <p>The first stage of the experiments was carried out on DVR recordings with a gap length of 30 frames
and without slice summation. The results of this stage in terms of classification accuracy (equal to the
fraction of correct classifications) are given in Figure 6. The diagram in Figure 6 shows that the
best accuracy values result from using the (p3, p4) feature set and the linear SVM. Therefore,
these options were used in the second stage (see the results in Table 1). The obtained results show
that slice summation significantly improves the classification quality. We may also notice that the
algorithm works better on DVR videos, which contain a rapidly changing background. In general, the
final results allow us to conclude that the proposed method is able to solve the considered problem,
even in the simplified version of the algorithm described in Section 2.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>In this paper, we have tested an approach based on spatio-temporal video slices for detecting
unauthorized inter-frame changes in video. This method is theoretically capable of solving the problem
at high speed, since it processes only a part of the video signal and can use fast image processing
techniques. To test the efficiency of this approach, we proposed a simple algorithm for detecting
inter-frame changes and performed numerical experiments. Our studies showed that the algorithm
provides an accuracy of not less than 0.8 and works better for video captured with a moving camera.
These results allow us to conclude that the method of spatio-temporal image slices looks promising,
but the algorithm should be significantly improved in terms of accuracy at short gap lengths and on
video from a stationary camera.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the Russian Foundation for Basic Research (grants 16-29-09494,
16-41-630676), by the Ministry of Education and Science (grant МК-1907.2017.9), and by the
Federal Agency for Scientific Organizations (Agreement 007-GZ/43363/26).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Sandvine: Over 70% of North American traffic is now streaming video and audio (Access mode: http://www.newswire.ca/news-releases/sandvine-over-70-of-north-americantraffic-is-now-streaming-video-and-audio-560769981.html)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Internet Video Streaming to Dominate Mobile Data Traffic by 2019 - ISPreview UK (Access mode: http://www.ispreview.co.uk/index.php/2014/06/internet-video-streamingdominate-mobile-data-traffic-2019.html)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Video Surveillance Market by Applications &amp; Management Services 2015 MarketsandMarkets (Access mode: http://www.marketsandmarkets.com/Market-Reports/surveillance-277.html)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Poisel R and Tjoa S 2011 Forensics Investigations of Multimedia Data: A Review of the State-of-the-Art Sixth International Conference on IT Security Incident Management and IT Forensics 48-61</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Rocha A, Scheirer W, Boult T and Goldenstein S 2011 Vision of the Unseen: Current Trends and Challenges in Digital Image and Video Forensics ACM Comput. Surv. 43 26:1-26:42</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Gashnikov M V, Glumov N I, Kuznetsov A V, Mitekin V A, Myasnikov V V and Sergeev V V 2016 Hyperspectral remote sensing data compression and protection Computer Optics 40(5) 689-712 DOI: 10.18287/2412-6179-2016-40-5-689-712</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Bestagini P, Fontani M, Milani S, Barni M, Piva A, Tagliasacchi M and Tubaro S 2012 An overview on video forensics Proceedings of the 20th European Signal Processing Conference (EUSIPCO) 1229-1233</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Shanableh T 2013 Detection of frame deletion for digital video forensics Digital Investigation 10 350-360</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Stamm M C, Lin W S and Liu K J R 2012 Temporal Forensics and Anti-Forensics for Motion Compensated Video IEEE Transactions on Information Forensics and Security 7 1315-1329</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Gironi A, Fontani M, Bianchi T, Piva A and Barni M 2014 A video forensic technique for detecting frame deletion and insertion IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 6226-6230</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Wu Y, Jiang X, Sun T and Wang W 2014 Exposing video inter-frame forgery based on velocity field consistency IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2674-2678</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Chao J, Jiang X and Sun T 2013 A Novel Video Inter-frame Forgery Model Detection Scheme Based on Optical Flow Consistency The International Workshop on Digital Forensics and Watermarking 267-281</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Wang W and Farid H 2006 Exposing Digital Forgeries in Video by Detecting Double MPEG Compression Proceedings of the 8th Workshop on Multimedia and Security (New York, NY, USA: ACM) 37-47</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Zhang J, Su Y and Zhang M 2009 Exposing Digital Video Forgery by Ghost Shadow Artifact Proceedings of the First ACM Workshop on Multimedia in Forensics 49-54</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Glumov N I, Myasnikov V V and Sergeyev V V 1996 Parallel-recursive local image processing and polynomial bases Proceedings of Third International Conference on Electronics, Circuits, and Systems 2 696-699</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Myasnikov V V 2007 Fast algorithm for recursive computation of the convolution of an image with a two-dimensional inseparable polynomial FIR filter Pattern Recognit. Image Anal. 17 421-427</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Kiy K I and Dickmanns E D 2004 A color vision system for real-time analysis of road scenes IEEE Intelligent Vehicles Symposium 54-59</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Kiy K I 2015 A New Real-Time Method of Contextual Image Description and Its Application in Robot Navigation and Intelligent Control Computer Vision in Control Systems 2 109-133</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>