Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Shot Boundary Detection: Fundamental Concepts and Survey

Benoughidene Abdel halim, Department of Computer Science, University of Batna 2, Batna, Algeria (benouhalim@gmail.com)
Titouna Faiza, Department of Computer Science, University of Batna 2, Batna, Algeria (ftitouna@yahoo.fr)

Abstract: A large part of the Big Data surge in our digital environments takes the form of video information. Hence, automatic management of this massive growth in video content is clearly necessary. Current research topics in automatic video analysis include video abstraction or summarization, video classification, video annotation, and content-based video retrieval. All of these applications require shot boundary detection. Video shot boundary detection (SBD) is the process of segmenting a video sequence into smaller temporal units called shots, and it is the primary step for any further video analysis. This paper presents the fundamental theory of video shot boundaries and a brief overview of shot boundary detection approaches and their development. The advantages and disadvantages of each approach are explored comprehensively, and open challenges are presented. In addition, we focus on machine learning technologies, such as deep learning approaches for SBD, as possible new directions for future work.

Index Terms: Shot Boundary Detection (SBD), Cut Transition (CT), Gradual Transition (GT), Temporal Video Segmentation, Video Content Analysis, Content-Based Video Indexing and Retrieval (CBVIR), Feature Extraction, Machine Learning, Deep Learning, Convolutional Neural Networks (CNN), Multimedia Big Data.

I. INTRODUCTION

With the rapid development of computer networks and multimedia technology, the amount of multimedia data available every day is enormous and is increasing at a high rate, as is the ease of access to multimedia sources; this has led to the multimedia big data revolution.

Video is the most consumed data type on the Internet, on platforms such as YouTube, Vimeo, Dailymotion, and Yahoo Video, and on social networking sites such as Facebook, Twitter, and Instagram. The explosive growth in video content leads to the problem of content management. People spend their time uploading and browsing huge videos to determine whether these videos are relevant or not, which is a difficult and stressful task for humans [1]. In such a scenario, automated video analysis applications are needed to represent the information stored in large multimedia data. Such techniques are grouped into a single concept, Content-Based Video Indexing and Retrieval (CBVIR) systems. These applications include browsing of video folders, news event analysis, intelligent management of videos, video surveillance [2], key frame extraction [3], and video event partitioning [4]. In addition, video summarization is the best and most effective solution for converting large, amorphous videos into structured, concise, clear, and meaningful information. The main task of summarizing a video is to segment the original video into shots and to extract from them the key frames that most representatively and concisely describe the entire video [5].

Video shot boundary detection (SBD), also called shot segmentation, is the first process in video summarization, and its output significantly affects the subsequent processes. The main idea of video shot boundary detection is to extract features from the video frames and then detect the type of shot transition according to the difference between those features. There are two kinds of video shot boundaries: Cut Transition (CT) and Gradual Transition (GT) [6]. In general, the performance of a shot boundary detection algorithm depends on its ability to detect transitions (shot boundaries) in the video sequence. The detection accuracy generally depends on the extracted features and on how effectively they represent the visual content of the video frames, while the computational cost of the algorithm needs to be kept low [7].

In practice, several effects can appear within a video shot, such as flashlights or light variations, object/camera motion, camera operations (such as zooming, panning, and tilting), and similar backgrounds. Currently, no single algorithm offers a complete solution to all or most of these problems. In other words, a favorable and effective method of detecting transitions between shots is still not available, despite the increased attention devoted to shot boundary detection in the last two decades. This is due to the randomness and the size of raw video data. Hence, a robust, efficient, and automated shot boundary detection method is a necessary requirement [8].

Most existing reviews do not cover recent advances in shot boundary detection, such as deep learning. This paper mainly focuses on reviewing and analyzing different kinds of shot boundary detection algorithms implemented in the uncompressed domain, according to their accuracy, computational load, feature extraction technique, advantages, and disadvantages. Future research directions are also discussed.

II. BASIC CONCEPTS OF SHOT BOUNDARY DETECTION

Partitioning a video sequence into shots is the first step toward video summarization. A video shot is defined as a series of interrelated consecutive frames taken contiguously by a single camera and representing a continuous action in time and space. As such, a shot boundary is the transition between two shots. This section presents the main concepts of shot boundary detection in videos [9].

1) Video Definition: A video is a collection of image frames arranged in temporal order. The number of frames depends on the size of the video, and these frames occupy a large amount of memory. The frame rate is typically about 20 to 30 frames per second [10].

2) Video Hierarchy: A video can be broken down into scenes, shots, and frames. A scene is a logical grouping of shots into a semantic unit. A shot is a sequence of frames captured by a single camera in a single continuous action. The frames within a shot (intra-shot frames) contain similar information and visual features, with temporal variations. A frame is the smallest unit that constitutes a shot [10] (see Fig. 1).

Fig. 1: Video hierarchy.

3) Shot Transition Types: The transition between one shot and the next can be a cut or a gradual transition. A cut occurs when two successive shots are concatenated directly without any editing (special effects). This type of transition is also known as an abrupt or hard transition; it is a sudden change from one shot to another. By contrast, a gradual transition occurs when two shots are combined using special effects during production. A gradual transition may span two or more frames that are visually interdependent and contain truncated information [11]. Depending on the editing effects, there are several kinds of gradual transitions, such as fade in/fade out, dissolve, and wipe [12] (see Fig. 2).

Fig. 2: Video shot transition types.

a) A cut: is a sudden change from one video shot to another [6] (see Fig. 3).

Fig. 3: (a) Cut transition.

b) A fade out: occurs when the shot gradually turns into a single monochrome frame, usually a dark one [6] (see Fig. 4).

Fig. 4: (b) Fade out.

c) A fade in: takes place when the scene gradually appears on screen [6] (see Fig. 5).

Fig. 5: (c) Fade in.

d) A dissolve: happens when one shot gradually replaces another. One shot disappears as the following one appears, and for a few seconds they overlap and both are visible. During a dissolve, the two adjacent shots are associated both temporally and spatially [6] (see Fig. 6).

Fig. 6: (d) Dissolve.

e) A wipe: is more dynamic and is considered the most difficult transition to model and detect. It happens when one shot pushes the other off the screen. In this case, the two adjacent shots are spatially separated at any time, but not temporally separated. The difficulty lies in the number of types of wipe transitions that exist. Indeed, when a shot is leaving the screen (i.e., making room for the incoming shot), the movement can be horizontal (e.g., from left to right), vertical (e.g., from bottom to top or vice versa), oblique (i.e., from one corner to the opposite one), starting from the center, going towards the center, etc. [6] (see Fig. 7).

Fig. 7: (e) Various types of wipe transitions.

4) Feature Extraction: is the process of representing a raw image in a reduced form to facilitate decision making, such as pattern detection, classification, or recognition. The features extracted from video frames may be low-level, mid-level, or high-level [13].

a) Low-level features: are minor details of the image, such as lines or dots, that do not take visual or semantic content into consideration. Low-level features include RGB values/histograms, intensity values, and the mean, variance, and entropy of pixel values [10].

b) Mid-level features: are intermediate between low-level features and high-level semantics. Mid-level features consist of feature point detectors and descriptors. Although feature points may be used for object identification in an image, they are not appropriate for a high-level semantic description of the content depicted in the image [10].

c) High-level features: are built to detect objects and larger shapes in the image, the trajectories followed by objects, motion vectors, etc. They may be used for a high-level description of the content of an image [10].

Because of the importance of SBD, many researchers have presented algorithms to improve its accuracy for both Cut Transitions (CT) and Gradual Transitions (GT). We survey various SBD approaches below.

III. SHOT BOUNDARY DETECTION METHODS

Many researchers are currently working to develop more reliable and accurate algorithms that yield more precise shot boundaries. There are several common families of methods that deal with CT and/or GT:

A. Pixel-Based Methods

In this method, the pixel intensities of two consecutive video frames are compared pixel by pixel, or the percentage of pixels that changed between the two frames is computed. When the difference exceeds a threshold, a shot change is declared [6]. The main drawback of such intensity-based approaches, whatever metric is used, is their sensitivity to fast object and camera movement, camera panning, or zooming. Another limitation of this method is that the threshold is set manually.

B. Histogram-Based Methods

The most popular metric for cut transition detection is the difference between the histograms of two consecutive frames. A histogram describes the distribution of gray levels, color, shape, or texture without taking their positions into account, so the similarity between two images can be estimated through the similarity of their histograms. This method first extracts the histograms of the video frames and then calculates the distance between them. When the distance exceeds a threshold, a shot change is declared. Several distances can be used to compare histograms, such as the Manhattan distance, the Euclidean distance, and the chi-square distance. Several histogram-based variants have been proposed in the literature.

Lu et al. [14] employed Singular Value Decomposition (SVD) with Hue Saturation Value (HSV) histograms to propose an SBD scheme with low computational complexity. Candidate segments are first selected using an adaptive threshold. HSV color histograms are then extracted from all frames in each candidate segment, forming a frame feature matrix, and SVD is performed on the frame feature matrices of all candidate segments to reduce the feature dimension.

Bendraou Youssef et al. [15] formulated a new approach for detecting both hard (CT) and gradual (GT) transitions. The approach processes the video segment by segment and is composed of two main parts: static segment verification (a candidate segment that does not contain a transition) and shot transition identification (a candidate segment that may contain a CT or GT). Features are extracted from Concatenated Block-Based Histograms (CBBH). For each non-static segment, the features of all its frames form a frame feature matrix, on which an economy SVD is performed. An adaptive double-thresholding process is employed to detect hard cuts. For gradual transition detection, the folding-in technique, known as SVD-updating, is used for the first time in video shot boundary detection.

Hong Shao et al. [16] exploited HSV color histogram and Histogram of Oriented Gradients (HOG) features to detect cut transitions. The HSV color histogram is used to detect the difference between two adjacent frames, while the HOG feature is adopted for a secondary detection step to improve the performance of the algorithm.

It has been confirmed that the histogram difference is less sensitive to object motion than pair-wise pixel comparison, since it ignores spatial changes within a frame. However, histograms may also produce missed shots when two frames with similar histograms have different content.

C. Edge-Based Methods

Another way to characterize an image is its edge information. An edge is the boundary between an object and the background, or between overlapping objects. In edge-based approaches, a transition is declared when the edge locations in the current frame differ strongly from those in the previous frame, i.e., when many of the previous edges have disappeared. For example, Heng et al. [17] proposed an edge-based method. They introduced the concept of an object's edge by considering the pixels close to the edge. The edges of an object were matched between two consecutive frames, and a transition was declared using the ratio between the number of object edges that persisted over time and the total number of edges. Zheng et al. [18] proposed an approach based on the Roberts edge detector for detecting fade-in and fade-out transitions. First, the authors identified the frame edges by comparing gradients with a fixed threshold. Second, they determined the total number of edges that appeared. When a frame without edges occurred, a fade in or fade out was declared.

The advantage of this feature is that it is sufficiently invariant to illumination changes and several types of motion, and that it is related to human visual perception of a scene. Its main disadvantages are computational cost and noise sensitivity.

D. Motion-Based Methods

Motion is a key feature of videos and an integral part of them. Because shots with camera motion can be incorrectly classified as gradual transitions, detecting zooms and pans increases the accuracy of a shot boundary detection algorithm. Bruno et al. [19] proposed a linear motion prediction method based on wavelet coefficients computed directly from two successive frames. For accurate motion estimation, each block should be matched with all blocks of the next frame, which leads to a large and unreasonable computational cost.

E. Deep Learning-Based Methods

Recently, deep learning algorithms for computer vision have received much attention from academics. Convolutional Neural Networks (CNNs) are among the most important deep learning algorithms because of their significant ability to extract high-level features from images and video frames [20].

Tong et al. [21] used a CNN model to extract high-level interpretable features from the frames; the method can detect both CT and GT boundaries. An adaptive thresholding process is employed as a preprocessing stage to select candidate segments. Taking one frame as input, the network outputs a probability distribution over 1000 classes. The five classes with the highest probabilities are selected as the high-level features of the frame and, for simplicity, are called the TAGs of the frame. However, in some cases, when the changes in a GT are small and the background is similar, the semantics do not change at all, so the method cannot achieve high detection accuracy.

Jingwei Xu et al. [22] used CNNs to extract typical features of frames. They adopted a candidate segment selection method that coarsely locates the positions of shot boundaries using adaptive thresholds and eliminates most non-boundary frames. Cut and gradual transitions are then obtained with a novel pattern-matching method based on a new similarity strategy partially inspired by [14].

Hassanien et al. [23] presented a shot boundary detection method for a huge video data set based on a spatio-temporal CNN. The network, named DeepSBD, takes fixed-length segments as input and classifies them into three categories (cut, gradual, no transition); its output is fed to an SVM classifier, which gives the first labeling estimate. Consecutive segments with the same label are merged, and the result is passed to a post-processing step that reduces false alarms of gradual transitions through a histogram-driven temporal differential measurement. However, the C3D ConvNet is more complex than a 2D ConvNet and requires much more computational resources; moreover, the lengths of gradual transitions vary, but DeepSBD is not designed for multi-scale detection.

Michael Gygli [24] proposed learning shot detection end to end, from pixels to final shot boundaries, using a fully convolutional neural network. For training, all shot boundaries were generated synthetically: a dataset with one million frames was created, with automatically generated transitions such as cuts, dissolves, and fades. The task is treated as a binary classification problem: predicting whether or not a frame belongs to the same shot as the previous frame. The method obtains state-of-the-art results on the RAI data set while running at an unprecedented speed of more than 120x real time. Currently, the model makes three main errors: (i) missing long dissolves, which it was not trained with, (ii) partial scene changes, and (iii) fast scenes with motion blur.

Shitao Tang et al. [25] presented a new cascade framework for fast and accurate shot boundary detection. The first stage applies adaptive thresholding to filter the whole video and selects candidate segments for acceleration. In the second stage, a well-designed 2D ConvNet learns the similarity function between two images to locate cut transitions. The third stage uses a novel C3D ConvNet model to locate the positions of gradual transitions.

Lifang Wu et al. [26] presented a two-stage method for shot boundary detection (TSSBD). First, cuts are distinguished by fusing color histogram (HSV) and deep (CNN) features, and the complete video is divided into segments containing gradual transitions. Then, gradual shot change detection is performed on these segments using a 3D convolutional neural network, which classifies clips into specific gradual shot change types; a majority voting strategy and gap filling are used to distinguish the shot types of frames and to locate shot boundaries.

Rui Liang et al. [27] proposed a new video shot boundary detection method based on CNN features. The method extracts the features of each frame using the AlexNet and ResNet-152 models and computes the cosine similarity between a pair of frames. For cut boundary detection, they used the similarity of local frames to improve accuracy, and they proposed a dual-threshold sliding window for gradual transition detection.

Lifang Wu et al. [28] proposed a shot boundary detection method that combines spatio-temporal CNN-based gradual shot detection with histogram-based shot filtering. Cuts are extracted from the whole video with histogram-based shot filtering. Then, a C3D deep model is constructed to extract frame features and to distinguish the shot types dissolve, swipe, fade in, fade out, and normal. For untrimmed videos, a frame-level merging strategy helps locate shot boundaries from neighboring frames.

However, these methods use the CNN only for feature extraction and then rely on traditional classifiers to detect scene changes. Recently, with the development and popularity of deep learning, many efficient networks have been proposed for various applications. For example, ResNet-based deep learning models obtain very high accuracy in image classification and object detection on many large-scale image data sets, so they can be adopted to address shot change detection. The downside of these methods revolves around the need for large annotated data sets. Moreover, real data can contain cuts between shots of the same scene, which rarely occur in synthetic data sets because of the way those data sets are generated.

F. Other Approaches

Thounaojam et al. [29] proposed a shot detection approach based on a genetic algorithm (GA) and fuzzy logic. A fuzzy system classifies the video frames into different transition types (cut and gradual). The color histogram difference is used for feature extraction and for measuring the difference between two consecutive frames. The GA is used as an optimizer to find the optimal ranges of values of the fuzzy membership functions. The results show that this combination is efficient and that accuracy increases with the number of GA iterations/generations.

Jialei Bi et al. [30] proposed a novel cut detection method based on information theory using an SVM. They first compute the dissimilarity using information theory and construct a discriminative feature vector based on mutual information. A support vector machine is then trained to classify frames as cut or non-cut frames without using a traditional global or adaptive threshold.

Junaid Baber et al. [31] extracted shot boundaries from videos using frame entropy and SURF descriptors. Cut boundaries were detected from the difference in the entropy of gray-scale intensity between adjacent frames, and fade boundaries were detected indiscriminately, based on temporal changes in the entropy of pixel intensity across images. False detections are then eliminated effectively by using SURF local descriptors.

Sawitchaya Tippaya et al. [32] proposed a multi-modal visual-feature-based SBD framework. They adopted a candidate segment selection that does not require threshold calculation. The discontinuity signal is computed from the SURF matching score and the cosine distance between RGB histograms.

Finally, Table I compares different SBD algorithms in terms of the features employed, frame skipping, the data set used, and accuracy (precision, recall, and F1 score). From the table, it can be observed that algorithms using a frame-skipping technique have a low computational cost with acceptable accuracy, as in [14]. Although some algorithms use frame skipping, they show a moderate computational cost because of the complexity of the features they use, such as SURF in [32]. CNN-based SBD algorithms such as [27, 28, 29, 32, 36] have a high computational cost but achieve remarkable accuracy compared to other algorithms.

TABLE I: Comparison of different state-of-the-art SBD algorithms (P = precision, R = recall, "-" = not reported)

| Ref | Method | Dataset | CT P | CT R | CT F1 | GT P | GT R | GT F1 |
|---|---|---|---|---|---|---|---|---|
| [14] | SVD and HSV histogram-based | TRECVID 2001 | 0.91 | 0.85 | 0.88 | 0.83 | 0.81 | 0.81 |
| [15] | Histogram-based | TRECVID 2001 | 0.97 | 0.95 | 0.96 | 0.87 | 0.93 | 0.90 |
| [21] | Deep learning-based (CNN) | TRECVID 2001 | 0.99 | 0.87 | 0.92 | 0.87 | 0.83 | 0.87 |
| [22] | Deep learning-based (CNN) | TRECVID 2001 | 1.00 | 0.98 | 0.99 | 0.99 | 0.95 | 0.97 |
| [23] | Deep learning-based (CNN) | UCF101-SBD | 0.98 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99 |
| [25] | Deep learning-based (CNN) | TRECVID 2007 | 0.98 | 1.00 | 0.99 | 0.84 | 0.84 | 0.84 |
| [27] | Deep learning-based (CNN) | Other | 0.95 | 0.97 | 0.96 | 0.86 | 0.91 | 0.87 |
| [29] | Genetic algorithm (GA) and fuzzy logic based | TRECVID 2001 | 0.88 | 0.92 | 0.90 | 0.86 | 0.78 | 0.82 |
| [30] | Information theory and SVM based | TRECVID 2002 | 0.98 | 0.97 | 0.98 | - | - | - |
| [32] | SURF matching score and RGB histogram based | Golf Video | 1.00 | 0.98 | 0.99 | 0.89 | 0.81 | 0.85 |

IV. SHOT BOUNDARY DETECTION EVALUATION METRICS

Two aspects need to be considered when evaluating the performance of SBD algorithms: accuracy and computational complexity. Improving one of them usually comes at the cost of the other. In addition, for the evaluation to be truly representative and reliable when comparing various techniques, it must be carried out under similar conditions and on very similar data sets. In this section, we discuss the common metrics (recall, precision, and F1 score) for measuring accuracy and computational complexity [33].

1) Precision: the ratio of correctly detected transitions to all detected transitions (correct and false):

$$\mathrm{precision} = \frac{N_c}{N_c + N_f} \tag{1}$$

2) Recall: the ratio of correctly detected transitions to all actual transitions (correct and missed):

$$\mathrm{recall} = \frac{N_c}{N_c + N_m} \tag{2}$$

3) F1 score: combines precision and recall into a single score. It varies in the range [0, 1], where a score of 1 indicates the best performance of a system:

$$F_1 = \frac{2 \times \mathrm{recall} \times \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}} \tag{3}$$

where $N_c$ is the number of transitions correctly reported, $N_m$ is the number of transitions missed, and $N_f$ is the number of falsely reported transitions.

V. OPEN CHALLENGES

Although a large amount of work has been done in shot boundary detection, many issues are still open and deserve further research. We can conclude from this state of the art that a good video shot detection method depends highly on the features, similarity measure, and thresholds used. We found that the major challenges for detection techniques are illumination changes and object and camera motion. For example, color histograms are robust to small camera motion, but they cannot differentiate shots within the same scene, and they are sensitive to large camera motion. Edge features are more invariant to illumination changes and motion than color histograms, and motion features can effectively handle the influence of object and camera motion. Using a single kind of feature to detect shot boundaries may give unsatisfactory results, while using many kinds of features slows detection down. Another major challenge is determining a threshold automatically from the characteristics of the video; the difficulty is how to choose the optimal threshold. Efforts to replace thresholding with machine learning have begun only recently, and importing these ideas may be a new driver for the advancement of SBD.

VI. CONCLUSION AND FUTURE SCOPE

Video shot boundary detection is the first step of video processing, and also the most important one. Many studies on shot boundary detection exist at present. In this work, a comprehensive survey of SBD algorithms was performed. Video definitions, transition types, and hierarchies were presented, and different techniques for detecting shot boundaries, depending on the contents of the video and the changes in those contents, were discussed. Despite extensive research on SBD techniques, SBD still has problems that are relevant in practice for different video scenarios and need to be studied. These challenges include sudden illumination changes, dimly lit frames, frames with comparable backgrounds, object and camera motion, and changes in small regions. Solving these challenges will surely improve the performance of SBD algorithms.

Finally, machine learning approaches have become popular and have received much attention in computer vision applications. However, in the field of SBD, efforts to replace thresholding with machine learning have begun only recently, and the amount of research on SBD using machine learning is still quite small. Exploring the benefits of new machine learning technologies, such as deep learning approaches for SBD, could be a new direction for the future. In the sequential case, comparing frames and detecting shot boundaries sounds simple, but processing multimedia big data in this way can take an impractically long time. Performance on lengthy video data therefore remains an open area of research. Our future work will focus on deep learning approaches for SBD using technologies for analyzing multimedia big data.

REFERENCES

[1] Deepika Bajaj and Shanu Sharma. Video depiction of key frames: a review. In Proceedings of the Sixth International Conference on Computer and Communication Technology 2015 (ICCCT '15), pages 183–187, New York, NY, USA, 2015. ACM.
[2] Weiming Hu, Nianhua Xie, Li Li, Xianglin Zeng, and S. Maybank. A survey on visual content-based video indexing and retrieval. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 41(6):797–819, Nov. 2011.
[3] Tiecheng Liu and John R. Kender. Computational approaches to temporal sampling of video sequences. ACM Transactions on Multimedia Computing, Communications, and Applications, 3(2):7–es, May 2007.
[4] Remi Trichet, Ramakant Nevatia, and Brian Burns. Video event classification with temporal partitioning. In 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, Aug. 2015.
[5] Shayok Chakraborty, Omesh Tickoo, and Ravi Iyer. Adaptive keyframe selection for video summarization. In 2015 IEEE Winter Conference on Applications of Computer Vision. IEEE, Jan. 2015.
[6] Youssef Bendraou. Video shot boundary detection and key-frame extraction using mathematical models. Thesis, Université du Littoral Côte d'Opale, Nov. 2017.
[7] Jaydeb Mondal, Malay Kumar Kundu, Sudeb Das, and Manish Chowdhury. Video shot boundary detection using multiscale geometric analysis of NSCT and least squares support vector machine. Multimedia Tools and Applications, 77(7):8139–8161, Apr. 2017.
[8] Gautam Pal, Dwijen Rudrapaul, Suvojit Acharjee, Ruben Ray, Sayan Chakraborty, and Nilanjan Dey. Video shot boundary detection: A review. In Advances in Intelligent Systems and Computing, pages 119–127. Springer International Publishing, 2015.
[9] A. Hanjalic. Shot-boundary detection: unraveled and resolved? IEEE Transactions on Circuits and Systems for Video Technology, 12(2):90–105, 2002.
[10] Hrishikesh Bhaumik, Siddhartha Bhattacharyya, and Susanta Chakraborty. Content coverage and redundancy removal in video summarization. In Intelligent Analysis of Multimedia Information, pages 352–374. IGI Global.
[11] Guangyu Gao and Huadong Ma. To accelerate shot boundary detection by reducing detection region and scope. Multimedia Tools and Applications, 71(3):1749–1770. Springer Science and Business Media, Dec. 2012.
[12] Zhonglan Wu and Pin Xu. Shot boundary detection in video retrieval. In 2013 IEEE 4th International Conference on Electronics Information and Emergency Communication. IEEE, Nov. 2013.
[13] Heba Ahmed Elnemr, Nourhan Mohamed Zayed, and Mahmoud Abdelmoneim Fakhreldein. Feature extraction techniques. In Handbook of Research on Emerging Perspectives in Intelligent Pattern Recognition, Analysis, and Image Processing, pages 264–294. IGI Global, 2016.
[14] Zhe-Ming Lu and Yong Shi. Fast video shot boundary detection based on SVD and pattern matching. IEEE Transactions on Image Processing, 22(12):5136–5145, Dec. 2013.
[15] Bendraou Youssef, Essannouni Fedwa, Aboutajdine Driss, and Salam Ahmed. Shot boundary detection via adaptive low rank and SVD-updating. Computer Vision and Image Understanding, 161:20–28, Aug. 2017.
[16] Hong Shao, Yang Qu, and Wencheng Cui. Shot boundary detection algorithm based on HSV histogram and HOG feature. In Proceedings of the 2015 International Conference on Advanced Engineering Materials and Technology, pages 951–957. Atlantis Press, 2015.
[17] Wei Jyh Heng and King N. Ngan. An object-based shot boundary detection using edge tracing and tracking. Journal of Visual Communication and Image Representation, 12(3):217–239, Sep. 2001.
[18] Jie Zheng, Fengmei Zou, and Mandel Shi. An efficient algorithm for video shot boundary detection. In Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing. IEEE, 2004.
[19] E. Bruno and D. Pellerin. Video shot detection based on linear prediction of motion. In Proceedings of the IEEE International Conference on Multimedia and Expo 2002, volume 1, pages 289–292. IEEE, 2002.
[20] Eralda Nishani and Betim Cico. Computer vision approaches based on deep learning and neural networks: Deep neural networks for video analysis of human pose estimation. In 2017 6th Mediterranean Conference on Embedded Computing (MECO). IEEE, Jun. 2017.
[21] Wenjing Tong, Li Song, Xiaokang Yang, Hui Qu, and Rong Xie. CNN-based shot boundary detection and video annotation. In 2015 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting. IEEE, Jun. 2015.
[22] Jingwei Xu, Li Song, and Rong Xie. Shot boundary detection using convolutional neural networks. In 2016 Visual Communications and Image Processing (VCIP). IEEE, Nov. 2016.
[23] Ahmed Hassanien, Mohamed A. Elgharib, Ahmed Selim, Mohamed Hefeeda, and Wojciech Matusik. Large-scale, fast and accurate shot boundary detection through spatio-temporal convolutional neural networks. CoRR, abs/1705.03281, 2017.
[24] Michael Gygli. Ridiculously fast shot boundary detection with fully convolutional neural networks. In 2018 International Conference on Content-Based Multimedia Indexing (CBMI). IEEE, Sep. 2018.
[25] Shitao Tang, Litong Feng, Zhanghui Kuang, Yimin Chen, and Wei Zhang. Fast video shot transition localization with deep structured models. In Computer Vision – ACCV 2018, pages 577–592, Cham, 2019. Springer International Publishing.
[26] Lifang Wu, Shuai Zhang, Meng Jian, Zhe Lu, and Dong Wang. Two stage shot boundary detection via feature fusion and spatial-temporal convolutional neural networks. IEEE Access, 7:77268–77276, 2019.
[27] Rui Liang, Qingxin Zhu, Honglei Wei, and Shujiao Liao. A video shot boundary detection approach based on CNN feature. In 2017 IEEE International Symposium on Multimedia (ISM). IEEE, Dec. 2017.
[28] Lifang Wu, Shuai Zhang, Meng Jian, Zhijia Zhao, and Dong Wang. Shot boundary detection with spatial-temporal convolutional neural networks. In Pattern Recognition and Computer Vision, pages 479–491. Springer International Publishing, 2018.
[29] Dalton Meitei Thounaojam, Thongam Khelchandra, Kh. Manglem Singh, and Sudipta Roy. A genetic algorithm and fuzzy logic approach for video shot boundary detection. Computational Intelligence and Neuroscience, 2016:1–11, 2016.
[30] Jialei Bi, Xianglong Liu, and Bo Lang. A novel shot boundary detection based on information theory using SVM. In 2011 4th International Congress on Image and Signal Processing. IEEE, Oct. 2011.
[31] Junaid Baber, Nitin Afzulpurkar, Matthew N. Dailey, and Maheen Bakhtyar. Shot boundary detection from videos using entropy and local descriptor. In 2011 17th International Conference on Digital Signal Processing (DSP). IEEE, Jul. 2011.
[32] Sawitchaya Tippaya, Suchada Sitjongsataporn, Tele Tan, Masood Mehmood Khan, and Kosin Chamnongthai. Multi-modal visual features-based video shot boundary detection. IEEE Access, 5:12563–12575, 2017.
[33] Amr Ahmed. Video representation and processing for multimedia data mining. In Semantic Mining Technologies for Multimedia Databases. IGI Global, 2009.