<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Shot Boundary Detection: Fundamental Concepts and Survey</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>1st Benoughidene Abdel halim</string-name>
          <email>benouhalim@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>2nd Titouna Faiza</string-name>
          <email>ftitouna@yahoo.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of computer science, University of Batna 2</institution>
          ,
          <addr-line>Batna</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>1</volume>
      <fpage>119</fpage>
      <lpage>127</lpage>
      <abstract>
<p>A great part of the Big Data surge in our digital environments is in the form of video information; automatic management of this massive growth in video content is therefore a pressing necessity. Current research topics in automatic video analysis include video abstraction or summarization, video classification, video annotation, and content-based video retrieval. All of these applications require shot boundary detection. Video shot boundary detection (SBD) is the process of segmenting a video sequence into smaller temporal units called shots, and it is the primary step for any further video analysis. This paper presents the fundamental theory of video shot boundaries and gives a brief overview of shot boundary detection approaches and their development. The advantages and disadvantages of each approach are comprehensively explored and the open challenges are presented. In addition, we highlight machine learning technologies, such as deep learning approaches, as promising new directions for future SBD research. Index Terms: Shot Boundary Detection (SBD), Cut Transition (CT), Gradual Transition (GT), Temporal Video Segmentation, Video Content Analysis, Content-Based Video Indexing and Retrieval (CBVIR), Feature Extraction, Machine Learning, Deep Learning, Convolutional Neural Networks (CNN), Multimedia Big Data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>With the rapid development of computer networks and
multimedia technology, the amount of multimedia data available
every day is enormous and is growing at a high rate. Together with
the ease of access to multimedia sources, this growth is driving
a multimedia big data revolution.</p>
      <p>
        Video is the most consumed data type on the Internet, on
platforms such as YouTube, Vimeo, Dailymotion, and Yahoo Video, and on
social networking sites such as Facebook, Twitter, and Instagram. The
explosive growth in video content leads to a content management
problem: people spend a great deal of time uploading and
browsing videos to determine whether they are
relevant, which is a difficult and tedious task for humans
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In such a scenario, automated
video analysis applications are needed to represent the information stored in
large multimedia collections. Such techniques are grouped under the
single concept of Content-Based Video Indexing and Retrieval
(CBVIR) systems. These applications include browsing of
video folders, news event analysis, intelligent management of
videos, video surveillance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], key frame extraction [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and
video event partitioning [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In addition, video summarization
is among the most effective solutions for converting large,
amorphous videos into structured, concise, clear, and
meaningful information. The main task of summarizing a video
is to segment the original video into shots and extract key
frames from the shots that are the most representative
and concise description of the entire video [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Video shot boundary detection (SBD), also called shot
segmentation, is the first process in video summarization, and
its output significantly affects the subsequent processes. The
main idea is to extract features
from the video frames and then detect the transition type according
to the differences between those features. There are two kinds of
shot transitions: the Cut Transition (CT) and the Gradual
Transition (GT) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In general, the performance of a shot
boundary detection algorithm depends on its ability to detect
transitions (shot boundaries) in the video sequence. Its
accuracy generally
depends on the extracted features and their effectiveness in
representing the visual content of video frames, as well as on the
computational cost of the algorithm, which needs to be kept low [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
In practice, several effects can appear within a video shot,
such as flashes or lighting variations, object/camera motion,
camera operations (zooming, panning, and tilting), and
similar backgrounds. Currently, no single algorithm solves
all, or even most, of these problems. In
other words, a reliable and effective method for detecting
transitions between shots is still not available, despite the
increased attention devoted to shot boundary detection over the
last two decades. This gap is due to the randomness and
size of raw video data. Hence, a robust, efficient, automated shot
boundary detection method remains a necessary requirement [8].
Most existing reviews do not cover recent
advances in the field of shot boundary detection,
such as deep learning. This paper reviews and
analyzes different kinds of shot boundary detection algorithms
implemented in the uncompressed domain, comparing
their accuracy, computational load, feature extraction
techniques, advantages, and disadvantages. Future research
directions are also discussed.
      </p>
      <p>II. BASIC CONCEPTS OF SHOT BOUNDARY DETECTION
Partitioning a video sequence into shots is the first step
toward video summarization. A video shot is defined as a
series of interrelated consecutive frames taken contiguously
by a single camera and representing a continuous action in
time and space. A shot boundary is the transition
between two shots. This section presents the main concepts
of shot boundary detection in videos [9].</p>
      <p>1) Video definition : A video is a collection of image
frames arranged in a time-sequenced manner. The number
of frames depends on the length of the video, and these
frames occupy a large amount of memory. The frame rate
is typically about 20 to 30 frames per second [10].
2) Video hierarchy : A video can be broken down
into scenes, shots, and frames. A scene is a logical grouping
of shots into a semantic unit. A shot is a sequence
of frames captured by a single camera in a single
continuous action. The frames within a shot (intra-shot
frames) contain similar information and visual features
with temporal variations. A frame is the smallest unit
that constitutes a shot [10]. (see Fig 1)
3) Shot transition types : The transition between one
shot and the next can be cut or gradual. A cut
occurs when two successive shots are concatenated
directly without any editing (special effects). This type
of transition is also known as an abrupt or hard transition,
and is considered a sudden change from one shot
to another. By contrast, a gradual transition occurs when two
shots are combined using special effects
during production. A gradual transition may span two
or more frames that are visually interdependent and
contain blended information [11]. Depending on the
editing effect, there are several kinds
of gradual transitions, such as fade in/fade out, dissolve, and
wipe [12]. (see Fig 2)</p>
    </sec>
    <sec id="sec-2">
      <title>a) A Cut : Is a sudden change from a video shot to</title>
      <p>
        another one [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. (see Fig 3)
      </p>
    </sec>
    <sec id="sec-3">
      <title>b) A fade out : Occurs when the shot gradually turns</title>
      <p>
        into a single monochrome frame, usually dark [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
(see Fig 4)
      </p>
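<p>As a rough illustration (not a method from the surveyed papers), the endpoint of a fade can be located by flagging near-monochrome frames, for instance via the pixel standard deviation; the threshold used here is an arbitrary assumption:</p>

```python
import numpy as np

def is_monochrome(frame, std_thresh=5.0):
    """A frame whose pixel standard deviation is near zero is (almost) monochrome."""
    return float(frame.std()) < std_thresh

def find_fade_frames(frames, std_thresh=5.0):
    """Indices of near-monochrome frames: candidate endpoints of fade-out/fade-in."""
    return [i for i, f in enumerate(frames) if is_monochrome(f, std_thresh)]

# Toy example: a textured frame, a black frame, and another textured frame.
textured = np.arange(64, dtype=np.uint8).reshape(8, 8)
black = np.zeros((8, 8), dtype=np.uint8)
print(find_fade_frames([textured, black, textured]))  # → [1]
```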
    </sec>
    <sec id="sec-4">
      <title>c) A fade in : Takes place when the scene gradually</title>
      <p>
        appears on screen [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. (see Fig 5)
      </p>
    </sec>
    <sec id="sec-5">
      <title>d) A dissolve : Happens when a shot gradually replaces another one</title>
      <p>
        One shot disappears as the
following appears, and for a few seconds, they overlap,
and both are visible. In the process of dissolve, two
adjacent shots are temporally as well as spatially
associated [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. (see Fig 6)
e) The wipe : Is more dynamic and is considered
the most difficult transition to model and to detect. It
happens when a shot pushes the other one off
the screen. In this case, two adjacent shots are
spatially separated at any time, but not temporally
separated. Its difficulty lies in the number of types
of wipe transitions that exist. Indeed, when a shot
is moving off the screen (i.e. leaving place to
the incoming shot), the movement can be
horizontal (e.g. from left to right), vertical (e.g. from
bottom to top or vice versa), oblique (i.e.
from a corner to the opposite one), starting from
the center, going towards the center, etc.
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. (see Fig 7)
4) Feature extraction : Is the process of representing a raw
image in a reduced form to facilitate decision making
such as pattern detection, classification, or recognition.
The features extracted from the video frames may be
low-level, mid-level, or high-level features [13].
      </p>
    </sec>
    <sec id="sec-6">
      <title>a) Low-level features : The low-level features are</title>
      <p>minor details of the image, like lines or dots,
that do not take the visual
or semantic content into consideration. The low-level features include
RGB values/histograms, intensity values, and the mean,
variance, and entropy of the pixel values [10].
b) Mid-level features : The mid-level features are
intermediate between the low-level features and
high-level semantics. They consist of
feature point detectors and descriptors. Although
the feature points may be used for object
identification in an image, they are not appropriate for a high-level
semantic description of the content depicted
in an image [10].
c) High-level features : High-level features are built
to detect objects and larger shapes in the image, the
trajectories of paths followed by objects, motion
vectors, etc. These may be used for a high-level
description of the content of an image [10].</p>
      <p>Because of the importance of SBD, many researchers have
presented algorithms to boost the accuracy of SBD for Cut
Transition (CT) and Gradual Transition (GT). We introduce a
survey on various SBD approaches below.</p>
    </sec>
    <sec id="sec-7">
      <title>III. SHOT BOUNDARY DETECTION METHODS</title>
      <p>Nowadays, many researchers are working to develop
more reliable and accurate algorithms that can produce
more precise shot boundaries. There are several common
families of methods that deal with CT and/or GT:</p>
      <sec id="sec-7-1">
        <title>A. Pixel-Based Methods</title>
        <p>
          In these methods, pixel intensities are evaluated by taking two
consecutive video frames and comparing them pixel by pixel, or by
computing the percentage of pixels that changed between the two successive
frames. When this difference exceeds a
threshold, a shot change is declared [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>The main drawback of such intensity-based approaches,
whatever the metric used, is their sensitivity to fast object and camera
movement, and to camera panning or zooming. A further limitation is
that the threshold must be set manually.</p>
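<p>The pixel-comparison scheme above can be sketched as follows; the frame and pixel thresholds are illustrative assumptions, not values from the surveyed papers:</p>

```python
import numpy as np

def pixel_diff_ratio(frame_a, frame_b, pixel_thresh=25):
    """Fraction of pixels whose intensity changed by more than pixel_thresh."""
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    return float(np.mean(diff > pixel_thresh))

def detect_cuts(frames, frame_thresh=0.5):
    """Declare a cut between frames i and i+1 when the changed-pixel ratio exceeds frame_thresh."""
    return [i + 1 for i in range(len(frames) - 1)
            if pixel_diff_ratio(frames[i], frames[i + 1]) > frame_thresh]

# Toy example: four synthetic grayscale "frames"; frame 2 switches content abruptly.
frames = [np.full((4, 4), 10, np.uint8), np.full((4, 4), 12, np.uint8),
          np.full((4, 4), 200, np.uint8), np.full((4, 4), 202, np.uint8)]
print(detect_cuts(frames))  # → [2]
```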
      </sec>
      <sec id="sec-7-2">
        <title>B. Histogram-Based Methods</title>
        <p>The most popular metric for cut transition detection is
the difference between the histograms of two consecutive frames.
A histogram describes the distribution of gray levels, color, shape, or
texture without taking position into account, so the
similarity of two images can be estimated through their
histogram similarity. These methods first extract the histograms of
the video frames and then compute the distance between the
histograms; when the distance exceeds a threshold, a
shot change is declared. Several distance measures can be used,
such as the Manhattan distance,
the Euclidean distance, and the chi-square distance. Several
histogram-based variants have been proposed in the literature. Lu et
al. [14] employed Singular Value Decomposition (SVD)
with Hue-Saturation-Value (HSV) histograms to propose an SBD
scheme of low computational complexity. Candidate
segments are selected using an adaptive threshold,
color histograms are extracted in HSV
space from all frames in each candidate segment to form
a frame feature matrix, and SVD is then performed
on the frame feature matrices of all candidate segments to
reduce the feature dimension. Bendraou et al. [15]
formulated a new approach for detecting both hard (CT) and
gradual (GT) transitions. Their approach processes
the video segment by segment and is composed of two main parts:
static segment verification (for candidate segments that do not contain
a transition) and shot transition identification (for candidate
segments that may contain a CT or GT). Features
are extracted from Concatenated Block-Based Histograms
(CBBH); for each non-static segment, the features of all its frames
form a frame feature matrix on which the economy
SVD is performed. An adaptive double
thresholding process is employed to detect hard cuts.
For gradual transition detection, the folding-in technique,
known as SVD-updating, is used for the first time in video shot
boundary detection. Hong Shao et al. [16] exploited the HSV
color histogram and Histogram of Oriented Gradients
(HOG) features to detect cut transitions. The HSV
color histogram is used to measure the difference between two
adjacent frames, while the HOG feature is adopted for a secondary
detection stage to improve performance.</p>
        <p>Studies confirm that the histogram difference is less
sensitive to object motion than pair-wise pixel comparison, since it
ignores spatial changes within a frame. However, histogram
methods may miss shots when two frames with similar
histograms have different content.</p>
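<p>A minimal sketch of histogram-based cut detection, using the chi-square distance mentioned above; the bin count and threshold are illustrative assumptions:</p>

```python
import numpy as np

def gray_histogram(frame, bins=16):
    """Normalised gray-level histogram, so the distance is independent of frame size."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def chi_square_distance(h1, h2, eps=1e-10):
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def histogram_cuts(frames, thresh=0.5):
    """Declare a cut where the chi-square distance of consecutive histograms exceeds thresh."""
    return [i + 1 for i in range(len(frames) - 1)
            if chi_square_distance(gray_histogram(frames[i]),
                                   gray_histogram(frames[i + 1])) > thresh]

# Toy example: near-constant dark frames followed by near-constant bright frames.
frames = [np.full((4, 4), 10, np.uint8), np.full((4, 4), 12, np.uint8),
          np.full((4, 4), 200, np.uint8), np.full((4, 4), 202, np.uint8)]
print(histogram_cuts(frames))  # → [2]
```

Note how the small intensity change between the first two frames lands in the same histogram bin, so only the true content change is flagged, illustrating the motion robustness discussed above.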
      </sec>
      <sec id="sec-7-3">
        <title>C. Edge-Based Methods</title>
        <p>Another choice for characterizing an image is its edge
information. An edge is the boundary between an object and the
background, or between overlapping
objects. In edge-based approaches, a transition is declared when
the edge locations in the current frame exhibit a large
difference from those of the previous frame. For example, Heng et al. [17] proposed a
method based on edges. They introduced the concept of
an object edge by considering the pixels close to the edge. The
edges of an object are matched between two consecutive
frames, and a transition is declared based on
the ratio of object edges that persist over
time to the total number of edges. Zheng et al. [18] proposed an
approach based on the Roberts edge detector for detecting
fade-in and fade-out transitions. First, the authors
identify frame edges by comparing gradients with a fixed
threshold; second, they count the edges
that appear. When a frame without edges occurs, a fade-in
or fade-out is declared.</p>
        <p>The advantage of this feature is that it is fairly
invariant to illumination changes and to several types of motion,
and it is related to human visual perception of a scene. Its
main disadvantages are computational cost and noise sensitivity.</p>
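<p>A simplified edge-change-ratio sketch of the idea above, assuming a plain gradient-magnitude edge detector (real implementations typically use Canny or Roberts operators and dilate the edge maps before comparison); the gradient threshold is an illustrative assumption:</p>

```python
import numpy as np

def edge_map(frame, grad_thresh=30.0):
    """Binary edge map: pixels whose gradient magnitude exceeds grad_thresh."""
    gy, gx = np.gradient(frame.astype(float))
    return np.hypot(gx, gy) > grad_thresh

def edge_change_ratio(e1, e2):
    """Fraction of edge pixels entering or exiting between two frames (the larger of the two)."""
    n1, n2 = int(e1.sum()), int(e2.sum())
    if n1 == 0 or n2 == 0:
        return 0.0 if n1 == n2 else 1.0
    entering = np.logical_and(e2, ~e1).sum() / n2
    exiting = np.logical_and(e1, ~e2).sum() / n1
    return float(max(entering, exiting))

# Toy example: a vertical step edge that jumps to a different column.
a = np.zeros((6, 6)); a[:, 3:] = 100.0   # edge around column 3
b = np.zeros((6, 6)); b[:, 5:] = 100.0   # edge around column 5
print(edge_change_ratio(edge_map(a), edge_map(a)))  # → 0.0
print(edge_change_ratio(edge_map(a), edge_map(b)))  # → 1.0
```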
      </sec>
      <sec id="sec-7-4">
        <title>D. Motion-Based Methods</title>
        <p>Motion is a key feature in videos and forms an integral part
of it. Because shots with camera motion can be incorrectly
classified as gradual transitions, detecting zooms and pans
increases the accuracy of a shot boundary detection algorithm.
Bruno et al. In [19] proposed a linear motion prediction
method based on wavelet coefficients, which were computed
directly from two successive frames.</p>
        <p>For accurate motion estimation, each block must be
matched against all blocks of the next frame, which leads to a
large and often unreasonable computational cost.</p>
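<p>To make the cost argument concrete, here is a sketch of exhaustive block matching by sum of absolute differences (SAD); this illustrates the brute-force search the paragraph refers to, not the wavelet-based prediction of [19], and the block size and search radius are illustrative assumptions:</p>

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(block_a.astype(int) - block_b.astype(int)).sum())

def best_match_cost(block, frame, top, left, radius=2):
    """Exhaustively search a (2*radius+1)^2 window around (top, left) for the lowest SAD."""
    h, w = block.shape
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + h <= frame.shape[0] and x + w <= frame.shape[1]:
                cost = sad(block, frame[y:y + h, x:x + w])
                if best is None or cost < best:
                    best = cost
    return best

# Toy example: the second frame is the first shifted right by one pixel,
# so a perfect match (cost 0) is found inside the search window.
f1 = np.arange(64, dtype=np.uint8).reshape(8, 8)
f2 = np.roll(f1, 1, axis=1)
print(best_match_cost(f1[2:6, 2:6], f2, top=2, left=2))  # → 0
```

Even this toy search evaluates 25 candidate positions per block; over all blocks of a full-resolution frame the cost grows quickly, which is the drawback noted above.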
      </sec>
      <sec id="sec-7-5">
        <title>E. Deep Learning-Based Methods</title>
        <p>Recently, deep learning algorithms in the field
of computer vision have received much attention from academics.
The Convolutional Neural Network (CNN) is one of the most
important deep learning models due to its significant ability
to extract high-level features from images and video frames
[20].</p>
        <p>Tong et al. [21] used a CNN model to extract
high-level interpretable features from the frames; the method is capable of
detecting both CT and GT boundaries. An adaptive threshold
process is employed as a preprocessing stage to select
candidate segments. Taking one frame as input, the output of
the network is a probability distribution over 1000 classes.
The five classes with the highest probabilities are selected as
the high-level features of the frame and, for simplicity, called the
TAGs of the frame. However, when the changes
in a GT are small and the background is similar, the semantics do
not change at all, so such methods cannot achieve high detection
accuracy.</p>
        <p>Jingwei Xu et al. [22] used convolutional neural networks
(CNNs) to extract representative features of frames. They adopted a
candidate segment selection method to coarsely locate the positions of
shot boundaries using adaptive thresholds and
eliminate most non-boundary frames. Cut and gradual transitions
are then obtained by a novel pattern-matching method
based on a new similarity strategy, partially inspired
by [14].</p>
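<p>Several of the surveyed methods select candidate segments with an adaptive threshold over the inter-frame difference signal. A common sketch of such a threshold (mean plus a multiple of the standard deviation; the multiplier k is an illustrative assumption, and individual papers use their own variants) is:</p>

```python
import numpy as np

def adaptive_threshold(distances, k=3.0):
    """Global adaptive threshold: mean + k * standard deviation of the difference signal."""
    d = np.asarray(distances, dtype=float)
    return float(d.mean() + k * d.std())

def candidate_boundaries(distances, k=3.0):
    """Frame indices whose inter-frame difference exceeds the adaptive threshold."""
    t = adaptive_threshold(distances, k)
    return [i + 1 for i, d in enumerate(distances) if d > t]

# Toy example: a flat difference signal with one spike at a cut.
dists = [0.1] * 20 + [5.0] + [0.1] * 9
print(candidate_boundaries(dists))  # → [21]
```

Because the threshold adapts to the statistics of each video, no manual per-video tuning is needed, which is the advantage these methods exploit.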
        <p>Hassanien et al. [23] presented a shot boundary detection
method for huge video data sets based on a spatio-temporal CNN.
The technique, named DeepSBD, takes
segments of fixed length as input and classifies each into 3 categories
(cut, gradual, no transition); its output is fed to an SVM
classifier, which gives the first labeling estimate. Consecutive
segments with the same label are merged and the result is
passed to a post-processing step that reduces false alarms
for gradual transitions through a histogram-driven temporal
differential measurement. However, the C3D ConvNet is more
complex than a 2D ConvNet and requires considerable computational
resources; moreover, the lengths of gradual transitions vary,
but DeepSBD is not designed for multi-scale detection.</p>
        <p>Michael Gygli et al. [24] proposed to learn shot detection
end to end, from pixels to final shot boundaries. A fully convolutional
neural network is used for the shot boundary detection
task. To train this model, they generated all the shot
boundaries automatically: they created a dataset of one
million frames with automatically generated transitions such
as cuts, dissolves, and fades. They cast the task as a
binary classification problem: predict whether a frame
is part of the same shot as the previous frame or not. Their
method obtains state-of-the-art results on the RAI data set
while running at an unprecedented speed of more than 120x
real-time. Currently, their model makes three main kinds of errors: (i)
missing long dissolves, which it was not trained on, (ii)
partial scene changes, and (iii) fast scenes with motion blur.</p>
        <p>Shitao Tang et al. [25] presented a new cascade
framework as a fast and accurate approach to shot boundary
detection. The first stage applies adaptive thresholding to
filter the whole video and select candidate segments for
acceleration. In the second stage, a well-designed 2D
ConvNet learns a similarity function between two images
to locate cut transitions. The third stage utilizes a novel
C3D ConvNet model to locate the positions of gradual transitions.</p>
        <p>Lifang Wu et al. [26] presented a two-stage method for
shot boundary detection (TSSBD) that identifies cut shots
by fusing a color histogram (HSV) with deep features (CNN)
and divides the complete video into segments containing
gradual transitions. Over these video segments, gradual
shot change detection is implemented using a 3D convolutional
neural network, which classifies clips into specific gradual
shot change types with a majority voting strategy; a gap-filling
step is conducted to effectively distinguish the shot types of frames and
locate shot boundaries.</p>
        <p>Rui Liang et al. [27] proposed a new video shot boundary
detection method based on CNN features. The method extracts
features for each frame using the AlexNet and ResNet-152 models
and computes the cosine similarity to describe the
similarity of a pair of frames. For cut boundary detection, they
use the similarity of local frames to improve accuracy, and
they propose a dual-threshold sliding window for gradual transition
detection.</p>
        <p>Lifang Wu et al. [28] proposed a method for shot
boundary detection combining spatio-temporal convolutional neural
network based gradual shot detection with histogram-based shot
filtering. Cut shots are extracted from the whole video
with histogram-based shot filtering. Then, a C3D deep model
is constructed to extract frame features and distinguish
the shot types dissolve, wipe, fade-in, fade-out, and
normal. For untrimmed videos, a frame-level merging strategy
helps locate shot boundaries from
neighboring frames.</p>
        <p>However, these methods only use the CNN for feature
extraction and then apply traditional classifiers to detect the
scene change. Recently, with the development and popularity
of deep learning, many efficient networks for a variety of
applications have been proposed. For example, ResNet-based
networks can achieve very high accuracy
in image classification and object detection on many large-scale
image data sets, and can therefore be adopted to address
shot change detection. The downsides of these
methods revolve around the need for large annotated
datasets. Moreover, real data can contain cuts between shots
of the same scene, which rarely occur in synthetic data sets
because of the way they are generated.</p>
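<p>The common pipeline shared by the CNN-feature methods above (extract a feature vector per frame, then threshold the cosine similarity of consecutive vectors) can be sketched as follows. In practice the extractor would be a pretrained CNN such as AlexNet or ResNet; here a trivial flatten-and-normalise function stands in so the sketch is self-contained, and the similarity threshold is an illustrative assumption:</p>

```python
import numpy as np

def extract_features(frame):
    # Stand-in for a pretrained CNN feature extractor (e.g. a ResNet penultimate layer);
    # here we simply flatten and L2-normalise the frame.
    v = frame.astype(float).ravel()
    return v / (np.linalg.norm(v) + 1e-10)

def cosine_similarity(f1, f2):
    return float(np.dot(f1, f2))

def feature_cuts(frames, sim_thresh=0.8):
    """Declare a cut where consecutive feature vectors are insufficiently similar."""
    feats = [extract_features(f) for f in frames]
    return [i + 1 for i in range(len(feats) - 1)
            if cosine_similarity(feats[i], feats[i + 1]) < sim_thresh]

# Toy example: the diagonal and anti-diagonal patterns share no nonzero pixels,
# so their cosine similarity is 0 and a cut is declared between them.
shot1 = np.eye(8) * 255
shot2 = np.fliplr(shot1)
print(feature_cuts([shot1, shot1, shot2, shot2]))  # → [2]
```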
      </sec>
      <sec id="sec-7-6">
        <title>F. Others approaches</title>
        <p>Thounaojam et al. [29] proposed a shot detection
approach based on a genetic algorithm (GA) and fuzzy logic. A
fuzzy system is used to classify the video frames into
different types of transitions (cut and gradual). The Color Histogram
Difference is used for feature extraction and for finding the
differences between two consecutive frames in a video. The GA
is used as an optimizer to find the optimal ranges of the
fuzzy membership functions. The results show that
this combination of features is efficient and that the accuracy
increases with the number of iterations/generations of the GA.</p>
        <p>Jialei Bi et al. [30] proposed a novel cut detection method
based on information theory and an SVM. They first compute
a dissimilarity measure using information theory and construct a
discriminative feature vector based on mutual information.
Then a support vector machine is trained to classify frames
as cut or non-cut frames without using a traditional global or
adaptive threshold.</p>
        <p>In the method proposed by Junaid Baber et al. [31], shot
boundaries are extracted from videos using frame entropy and
SURF descriptors. Cut boundaries are detected from the difference
in entropy of the gray-scale intensity of adjacent frames,
and fade boundaries are detected based
on temporal changes in the entropy of the pixel intensities
across frames. False detections are then eliminated
effectively using SURF local descriptors.</p>
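<p>The entropy-difference criterion can be sketched as follows (the SURF verification stage is omitted, and the entropy threshold is an illustrative assumption):</p>

```python
import numpy as np

def frame_entropy(frame, bins=256):
    """Shannon entropy (bits) of the gray-level distribution of a frame."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def entropy_cuts(frames, thresh=1.0):
    """Declare a cut where the entropy of adjacent frames differs by more than thresh."""
    return [i + 1 for i in range(len(frames) - 1)
            if abs(frame_entropy(frames[i + 1]) - frame_entropy(frames[i])) > thresh]

# Toy example: flat frames (entropy 0) followed by a frame of 64 distinct
# gray levels (entropy log2(64) = 6 bits).
flat = np.zeros((8, 8), dtype=np.uint8)
varied = np.arange(64, dtype=np.uint8).reshape(8, 8)
print(entropy_cuts([flat, flat, varied]))  # → [2]
```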
        <p>Sawitchaya Tippaya et al. [32] proposed an SBD framework
based on multi-modal visual features. They adopted a
candidate segment selection step that works without threshold
calculation. The discontinuity signal is computed from
the SURF matching score and the cosine distance between RGB
histograms.</p>
        <p>Finally, TABLE I presents a comparison among
different SBD algorithms in terms of the features employed, frame
skipping, the data set used, and accuracy (precision, recall, and F1
score). From the table, it can be observed that
algorithms using a frame skipping technique have a low
computational cost with acceptable accuracy, as in [14]. Although
some algorithms utilize frame skipping, they show a moderate
computational cost because of the complexity of
the features used, such as SURF in [32]. Notably, the
CNN-based SBD algorithms, which show a high computational cost,
such as [27, 28, 29, 32, 36], gain remarkable accuracy
compared to the other algorithms.</p>
        <p>IV. SHOT BOUNDARY DETECTION EVALUATION METRICS</p>
        <p>Two complementary metrics are used
to evaluate the performance of SBD algorithms:
accuracy and computational complexity.
Usually, improving one comes at the cost of the
other. Also, for an evaluation to be truly representative
and reliable for comparing techniques, it must be done
under similar conditions and with very similar data sets. In this
section, we discuss the common accuracy metrics (recall, precision, and
F1 score) and the computational
complexity [33].</p>
        <p>
          1) Precision : the ratio of correctly detected transitions
to all detected transitions (correct plus false):
precision = Nc / (Nc + Nf)    (1)
2) Recall : the ratio of correctly detected transitions
to all actual transitions (correct plus missed):
recall = Nc / (Nc + Nm)    (2)
3) F1 score : combines precision and recall into
a single score. It varies in the range [0, 1], where a score
of 1 indicates the best efficacy of a system:
F1 = 2 × (precision × recall) / (precision + recall)    (3)
        </p>
        <p>where Nc is the number of transitions correctly reported, Nm is
the number of transitions missed, and Nf is the number
of falsely reported transitions.</p>
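<p>Equations (1)-(3) translate directly into code; the counts in the example are hypothetical:</p>

```python
def sbd_metrics(n_correct, n_missed, n_false):
    """Precision, recall, and F1 from the transition counts Nc, Nm, Nf."""
    precision = n_correct / (n_correct + n_false)
    recall = n_correct / (n_correct + n_missed)
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1

# Hypothetical example: 90 transitions correctly reported, 10 missed, 30 false alarms.
p, r, f1 = sbd_metrics(n_correct=90, n_missed=10, n_false=30)
print(round(p, 3), round(r, 3), round(f1, 3))  # → 0.75 0.9 0.818
```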
      </sec>
    </sec>
    <sec id="sec-8">
      <title>V. OPEN CHALLENGES</title>
      <p>Although a large amount of work has been done on shot
boundary detection, many issues are still open and deserve
further research. We can conclude from this state of the art
that a good video shot detection method depends strongly on the
features, similarity measure, and thresholds used. We found
that the major challenges for detection techniques are
illumination changes and object and camera motion. For example,
color histograms are robust to small camera motions, but they
cannot differentiate shots within the same scene
and are sensitive to large camera motions. Edge features
are more invariant to illumination changes and motion than
color histograms, and motion features can effectively handle
the influence of object and camera motion. Using a single
kind of feature to detect shot boundaries may
not give satisfactory results, but using many kinds of features
slows detection down. Another major challenge is the
problem of determining a threshold automatically from the
characteristics of the video: the difficulty is how to choose the
optimal threshold. The efforts to replace thresholding
with machine learning have begun only recently, and the importation
of these ideas may provide new drives for the advance of SBD.</p>
    </sec>
    <sec id="sec-9">
      <title>VI. CONCLUSION AND FUTURE SCOPE</title>
      <p>Video shot boundary detection is the first and most important
step of video processing, and there have been
many studies of shot boundaries to date. In this work,
a comprehensive survey of SBD algorithms was performed. Video definitions,
transition types, and hierarchies were presented, and the different
techniques for detecting a shot boundary, depending
on the contents of the video and how they change, were discussed. Despite
the extensive research on concrete SBD techniques, SBD still
has problems that are relevant in practice for different
video scenarios and need to be studied. These challenges
include sudden illumination changes, dimly lit
frames, similar background frames, object and camera
motion, and changes in small regions. Solving these challenges
will surely improve the performance of SBD algorithms.
Finally, machine learning approaches have become popular
and received much attention in computer vision
applications; in the field of SBD, however, the efforts to
replace thresholding with machine learning have begun only
recently, and the amount of research carried out on
SBD using machine learning is still quite small. Exploring the
benefits of new machine learning technologies, such as
deep learning approaches, is a promising
direction for future SBD research.</p>
      <p>In the sequential case, comparing frames to detect
shot boundaries sounds simple, but it can take
an impractically long time on multimedia big data. Performance on
lengthy video data remains an open area of research. Our
future work will focus on deep learning approaches for SBD
using technologies for analyzing multimedia big data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Deepika</given-names>
            <surname>Bajaj</surname>
          </string-name>
          and
          <string-name>
            <given-names>Shanu</given-names>
            <surname>Sharma</surname>
          </string-name>
          .
          <article-title>Video depiction of key frames- a review</article-title>
          .
          <source>In Proceedings of the Sixth International Conference on Computer and Communication Technology (ICCCT '15)</source>
          , pages
          <fpage>183</fpage>
          -
          <lpage>187</lpage>
          , New York, NY, USA,
          <year>2015</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Weiming</given-names>
            <surname>Hu</surname>
          </string-name>
          , Nianhua Xie,
          <string-name>
            <surname>Li</surname>
            <given-names>Li</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Xianglin</given-names>
            <surname>Zeng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Maybank</surname>
          </string-name>
          .
          <article-title>A survey on visual content-based video indexing and retrieval</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)</source>
          ,
          <volume>41</volume>
          (
          <issue>6</issue>
          ):
          <fpage>797</fpage>
          -
          <lpage>819</lpage>
          , nov
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Tiecheng</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>John R.</given-names>
            <surname>Kender</surname>
          </string-name>
          .
          <article-title>Computational approaches to temporal sampling of video sequences</article-title>
          .
          <source>ACM Transactions on Multimedia Computing, Communications, and Applications</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <fpage>7</fpage>
          -es, may
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Remi</given-names>
            <surname>Trichet</surname>
          </string-name>
          , Ramakant Nevatia, and
          <string-name>
            <given-names>Brian</given-names>
            <surname>Burns</surname>
          </string-name>
          .
          <article-title>Video event classification with temporal partitioning</article-title>
          .
          <source>In 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)</source>
          . IEEE, aug
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Shayok</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          , Omesh Tickoo, and
          <string-name>
            <given-names>Ravi</given-names>
            <surname>Iyer</surname>
          </string-name>
          .
          <article-title>Adaptive keyframe selection for video summarization</article-title>
          .
          <source>In 2015 IEEE Winter Conference on Applications of Computer Vision</source>
          . IEEE, jan
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Youssef</given-names>
            <surname>Bendraou</surname>
          </string-name>
          .
          <article-title>Video shot boundary detection and key-frame extraction using mathematical models</article-title>
          . Thesis, Université du Littoral Côte d'Opale, November
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jaydeb</given-names>
            <surname>Mondal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Malay Kumar</given-names>
            <surname>Kundu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sudeb</given-names>
            <surname>Das</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Manish</given-names>
            <surname>Chowdhury</surname>
          </string-name>
          .
          <article-title>Video shot boundary detection using multiscale geometric analysis of NSCT and least squares support vector machine</article-title>
          .
          <source>Multimedia Tools and Applications</source>
          ,
          <volume>77</volume>
          (
          <issue>7</issue>
          ):
          <fpage>8139</fpage>
          -
          <lpage>8161</lpage>
          , apr
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>