=Paper=
{{Paper
|id=Vol-2203/100
|storemode=property
|title=Semisupervised Segmentation of UHD Video
|pdfUrl=https://ceur-ws.org/Vol-2203/100.pdf
|volume=Vol-2203
|authors=Oliver Kerul-Kmec,Petr Pulc,Martin Holena
|dblpUrl=https://dblp.org/rec/conf/itat/Kerul-KmecPH18
}}
==Semisupervised Segmentation of UHD Video==
S. Krajči (ed.): ITAT 2018 Proceedings, pp. 100–107, CEUR Workshop Proceedings Vol. 2203, ISSN 1613-0073, © 2018 Oliver Kerul’-Kmec, Petr Pulc, and Martin Holeňa

Oliver Kerul’-Kmec¹, Petr Pulc¹·², Martin Holeňa²

¹ Faculty of Information Technology, Czech Technical University, Thákurova 7, Prague, Czech Republic
² Institute of Computer Science, Czech Academy of Sciences, Pod vodárenskou věží 2, Prague, Czech Republic

Abstract: One of the key preprocessing tasks in information retrieval from video is the segmentation of the scene, primarily its segmentation into foreground objects and the background. This is actually a classification task, but with the specific property that it is very time-consuming and costly to obtain human-labelled training data for classifier training. That suggests using semisupervised classifiers to this end. The presented work in progress reports an investigation of semisupervised classification methods based on cluster regularization and on fuzzy c-means in connection with the foreground / background segmentation task. To classify as many video frames as possible using only a single human-labelled frame, the semisupervised classification is combined with a frequently used keypoint detector based on a combination of a corner detection method with a visual descriptor method. The paper experimentally compares both methods and, for the first of them, also classifiers with different delays between the human-labelled video frame and classifier training.

1 Introduction

For the indexing of multimedia content, it is beneficial to have annotations of actors, objects or any other information that can occur in a video. A vital preprocessing task in preparing such annotations is the segmentation of the scene into foreground objects and the background.

Traditional methods, such as Gaussian mixture modelling, work on the pixel level and are time-consuming on higher-resolution video [1].
Another simple method models the background through image averaging; however, it requires a static camera [6]. Our approach, on the other hand, works on the level of detected interest points and uses semi-supervised classification to assign those points as belonging either to the foreground objects or to the background.

In the next section, we introduce the keypoint detector we employed for the detection of points of interest. Section 3 recalls the two methods of semi-supervised classification used in our approach. The approach itself is outlined in Section 4. Finally, Section 5 presents the results of its experimental validation performed so far.

2 Scene Segmentation in the Context of Video Preprocessing

In each frame of the video, a keypoint detector is used to detect points of interest and to compute their descriptors. In our research, a combination of the corner detection method FAST (Features from Accelerated Segment Test) with the visual descriptor method BRIEF (Binary Robust Independent Elementary Features), known as ORB (oriented FAST and rotated BRIEF) [7], is used to this end. An attempt is always made to match the points of interest detected in a frame to those detected in the next frame. Such matching points are searched for in a two-step fashion:

(i) Only the points of interest in the spatial neighbourhood of the expected position are considered. That position is based on the last known position of the interest point and on its past motion (if available).

(ii) Among the points of interest resulting from (i), as well as among all points detected in the current frame for which no information about their past motion is available, points in the previous frame are searched for based on the Hamming distance between the descriptors of both points.

Whereas the dependence of matching success on the difference between the positions of the points and on the movement of the first point has a straightforward geometric meaning, its dependence on the Hamming distance between their descriptors has a probabilistic character. In [7], this dependence was investigated, and it was found that if the Hamming distance between the 256-bit binary descriptors of the points is greater than 64, then the probability of a successful match is less than 5%.

If two points of interest in subsequent frames are considered matching, the point in the later frame is added to the history vector of the point in the previous frame. In this way, we get the motion description of each point of interest.
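To illustrate this detection and matching step, the following minimal Python sketch uses OpenCV's ORB implementation and a brute-force Hamming matcher with the 64-bit threshold from [7]. It is our own illustration, not the authors' code: the spatial prefiltering of step (i) and the per-point motion bookkeeping of Section 4.2 are omitted, and the function name is ours.

```python
import cv2

MAX_HAMMING = 64  # matches above this descriptor distance are rejected, cf. [7]

orb = cv2.ORB_create(nfeatures=500)        # 500 points per frame, as in Section 4.2
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)  # brute-force matcher for binary descriptors

def match_frames(prev_gray, curr_gray):
    """Detect ORB points of interest in two consecutive grayscale frames and
    return the pairs of keypoints whose Hamming distance is at most 64."""
    kp_prev, des_prev = orb.detectAndCompute(prev_gray, None)
    kp_curr, des_curr = orb.detectAndCompute(curr_gray, None)
    if des_prev is None or des_curr is None:
        return []  # no descriptors in one of the frames, nothing to match
    matches = matcher.match(des_prev, des_curr)
    return [(kp_prev[m.queryIdx], kp_curr[m.trainIdx])
            for m in matches if m.distance <= MAX_HAMMING]
```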
3 Semi-supervised Classification

Traditional supervised classification techniques use only labelled instances in the learning phase. In situations where the number of available labelled instances is insufficient, and labelling is expensive and time-consuming, semi-supervised classification can be employed, which uses both labelled and unlabelled instances for learning. In the reported research, we used the following two methods of semisupervised classification.

3.1 Semisupervised Classification with Cluster Regularization

The principle of this method, described in detail in [8], consists in clustering all labelled and unlabelled instances and estimating, for each instance x_k, k = 1, \dots, N, its probability distribution q_k on the set of clusters. In addition, the following penalty function is proposed for the differences between the pairs (q_k, q_n) of probability distributions of the instances:

P(q_k, q_n) = \sin\left(\frac{\pi}{2}\,\bigl(r(q_k, q_n)\, s(q_k, q_n)\bigr)^{\kappa}\right), \quad k, n = 1, \dots, N,\ k \neq n,   (1)

where r(q_k, q_n) denotes the Pearson correlation coefficient between q_k and q_n, κ is a parameter controlling the steepness of the mapping from similarity to penalty, and s(q_k, q_n) is a normalized similarity of the probability distributions q_k and q_n, defined as

s(q_k, q_n) = 1 - \frac{\|q_k - q_n\| - d_{\min}}{d_{\max} - d_{\min}}   (2)

using the notation

d_{\min} = \min Q, \quad d_{\max} = \max Q, \quad \text{with } Q = \{\|q_k - q_n\| \,:\, k, n = 1, \dots, N,\ k \neq n\}.   (3)

The results of the clustering allow assigning pseudolabels to the unlabelled instances. In particular, the pseudolabel assigned for the j-th among the M considered classes to an unlabelled instance x_n in a cluster Ψ is

\hat{y}_{n,j} = \frac{\exp\bigl(\sum_{x_k \in \Psi \text{ is labelled}} y_{k,j}\bigr)}{\sum_{i=1}^{M} \exp\bigl(\sum_{x_k \in \Psi \text{ is labelled}} y_{k,i}\bigr)},   (4)

where y_{k,i}, i = 1, \dots, M, is a crisp or fuzzy label of the labelled instance x_k for the class i. For uniformity of notation, the symbol \hat{y}_{k,j}, j = 1, \dots, M, can also be used for y_{k,j} if x_k is labelled.

The penalty function (1) can be used as a regularization modifier in some loss function L : [0,1]^2 \to [0, +\infty) measuring the discrepancy between the classifier outputs F(x_n) = ((F(x_n))_1, \dots, (F(x_n))_M) for an instance x_n and the corresponding labels (y_{n,1}, \dots, y_{n,M}) or pseudolabels (\hat{y}_{n,1}, \dots, \hat{y}_{n,M}):

E = \frac{1}{N}\sum_{j=1}^{M}\left(\sum_{x_n \text{ labelled}} L\bigl((F(x_n))_j, y_{n,j}\bigr) + \lambda \sum_{x_n \text{ unlabelled}} \frac{\max(q_n)}{|\varphi(x_n)|} \sum_{x_k \in \varphi(x_n)} P(q_k, q_n)\, L\bigl((F(x_k))_j, \hat{y}_{k,j}\bigr)\right),   (5)

where λ > 0 is a given parameter determining the tradeoff between the supervised loss and the unsupervised regularization, and φ(x_n) denotes the set of instances x_k ≠ x_n with the highest values of P(q_k, q_n).

In [8], the following design decisions have been made for the loss function and the classifier in (5):

1. The employed loss function is derived from D_{KL}((\hat{y}_{n,1}, \dots, \hat{y}_{n,M}) \,\|\, F(x_n)), the Kullback–Leibler divergence from the classifier outputs to the labels or pseudolabels. If both the labels or pseudolabels and the classifier outputs form probability distributions on the classes, then

D_{KL}\bigl((\hat{y}_{n,1}, \dots, \hat{y}_{n,M}) \,\|\, F(x_n)\bigr) = \sum_{j=1}^{M} \hat{y}_{n,j} \ln \frac{\hat{y}_{n,j}}{(F(x_n))_j}, \quad n = 1, \dots, N.   (6)

Therefore, the considered loss function is

L\bigl((F(x_n))_j, \hat{y}_{n,j}\bigr) = \hat{y}_{n,j} \ln \frac{\hat{y}_{n,j}}{(F(x_n))_j}, \quad n = 1, \dots, N,\ j = 1, \dots, M.   (7)

2. As the classifier, a multilayer perceptron with one hidden layer is used, such that the activation function g in its hidden layer is smooth and includes no bias, and its output layer performs the softmax normalization of the hidden layer. Hence,

(F(x))_j = \frac{\exp\bigl(g(w_{j\cdot}^{\top} x)\bigr)}{\sum_{i=1}^{M} \exp\bigl(g(w_{i\cdot}^{\top} x)\bigr)}.   (8)

The weight vectors w_{1\cdot}, \dots, w_{M\cdot} in (8) are learned through the minimization of the error function (5).
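The cluster-based quantities (1)–(4) are simple to evaluate. The following NumPy sketch is our own illustration, not the implementation of [8]; it assumes κ = 2 (the value later used in Section 4.3) and rows of Q that are not constant, since otherwise the Pearson correlation is undefined.

```python
import numpy as np

def penalty_matrix(Q, kappa=2.0):
    """Eqs. (1)-(3): pairwise penalties P(q_k, q_n) for an (N, M) array Q
    whose rows are the per-instance probability distributions over clusters."""
    N = Q.shape[0]
    D = np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=2)  # ||q_k - q_n||
    off_diagonal = ~np.eye(N, dtype=bool)                      # exclude k = n
    d_min, d_max = D[off_diagonal].min(), D[off_diagonal].max()
    S = 1.0 - (D - d_min) / (d_max - d_min)                    # Eq. (2)
    R = np.corrcoef(Q)                                         # Pearson r(q_k, q_n)
    return np.sin(np.pi / 2.0 * (R * S) ** kappa)              # Eq. (1)

def cluster_pseudolabel(label_sums):
    """Eq. (4): softmax of the per-class sums of the (crisp or fuzzy) labels
    of the labelled instances falling into one cluster."""
    e = np.exp(label_sums - label_sums.max())  # shift for numerical stability
    return e / e.sum()
```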
3.2 Semi-supervised Kernel-Based Fuzzy C-means

This method, described in detail in [9], originated from the fuzzy c-means clustering algorithm [2]. Similarly to the original fuzzy c-means, the method is parametrized by a parameter m > 1. What makes this method more general than the original fuzzy c-means is its dependence on the choice of some kernel K, i.e., a symmetric function on pairs (x, y) of clustered vectors that has positive semidefinite Gram matrices (e.g., a Gaussian or polynomial kernel). In fact, the fuzzy c-means algorithm corresponds to the choice K(x, y) = x^{\top} y.

First, the membership matrix U^l is constructed for clustering the n_l labelled instances x_1^l, \dots, x_{n_l}^l into as many clusters as there are classes, i.e., M. For j = 1, \dots, M, k = 1, \dots, n_l,

U^l_{j,k} = \begin{cases} 1 & \text{if the instance } x_k^l \text{ is labelled with the class } j, \\ 0 & \text{else.} \end{cases}   (9)

From U^l, the initial cluster centers are constructed as

v_j^0 = \frac{\sum_{k=1}^{n_l} U^l_{j,k}\, x_k^l}{\sum_{k=1}^{n_l} U^l_{j,k}}, \quad j = 1, \dots, M.   (10)

If for some t = 0, 1, \dots the cluster centers v_1^t, \dots, v_M^t are available, such as from (10), they are used together with the chosen kernel K to construct the membership matrix U^{u,t} for clustering the n_u unlabelled instances x_1^u, \dots, x_{n_u}^u, as follows:

U^{u,t}_{j,k} = \frac{\bigl(1 - K(x_k^u, v_j^t)\bigr)^{-\frac{1}{m-1}}}{\sum_{i=1}^{M} \bigl(1 - K(x_k^u, v_i^t)\bigr)^{-\frac{1}{m-1}}}, \quad j = 1, \dots, M,\ k = 1, \dots, n_u.   (11)

Finally, the cluster centers are updated, for t = 0, 1, \dots, by calculating

v_j^{t+1} = \frac{\sum_{k=1}^{n_l} (U^l_{j,k})^m K(x_k^l, v_j^t)\, x_k^l + \sum_{k=1}^{n_u} (U^{u,t}_{j,k})^m K(x_k^u, v_j^t)\, x_k^u}{\sum_{k=1}^{n_l} (U^l_{j,k})^m K(x_k^l, v_j^t) + \sum_{k=1}^{n_u} (U^{u,t}_{j,k})^m K(x_k^u, v_j^t)}.   (12)

The computations (11)–(12) are iterated until at least one of the following termination criteria is reached:

(i) \|U^{u,t} - U^{u,t-1}\| < ε, t ≥ 1, for a given matrix norm \|\cdot\| and a given ε > 0;

(ii) a given maximal number of iterations t_{\max}.
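Under the assumption of a Gaussian kernel (the choice actually made in Section 4.3), the whole iteration (9)–(12) fits into a short NumPy routine. The sketch below is ours rather than the reference implementation of [9]; it further assumes integer class labels 0, …, M−1, at least one labelled instance per class, and no unlabelled instance coinciding exactly with a center (which would make 1 − K vanish in (11)).

```python
import numpy as np

def gaussian_kernel(X, V, sigma):
    """K(x, v) = exp(-||x - v||^2 / sigma^2) for all rows of X and V."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def ss_kernel_fcm(Xl, yl, Xu, M, m=2.0, sigma=1.0, eps=1e-3, t_max=50):
    """Semi-supervised kernel-based fuzzy c-means, Eqs. (9)-(12)."""
    nl = len(Xl)
    Ul = np.zeros((M, nl))
    Ul[yl, np.arange(nl)] = 1.0                        # Eq. (9)
    V = (Ul @ Xl) / Ul.sum(axis=1, keepdims=True)      # Eq. (10)
    Uu_prev = None
    for t in range(t_max):                             # criterion (ii)
        Ku = gaussian_kernel(Xu, V, sigma)             # shape (n_u, M)
        W = (1.0 - Ku) ** (-1.0 / (m - 1.0))
        Uu = (W / W.sum(axis=1, keepdims=True)).T      # Eq. (11), shape (M, n_u)
        Kl = gaussian_kernel(Xl, V, sigma)             # shape (n_l, M)
        num = (Ul ** m * Kl.T) @ Xl + (Uu ** m * Ku.T) @ Xu
        den = (Ul ** m * Kl.T).sum(axis=1) + (Uu ** m * Ku.T).sum(axis=1)
        V = num / den[:, None]                         # Eq. (12)
        if Uu_prev is not None and np.linalg.norm(Uu - Uu_prev) < eps:
            break                                      # criterion (i)
        Uu_prev = Uu
    return V, Uu  # class of the k-th unlabelled instance: Uu[:, k].argmax()
```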
4 Proposed Approach

4.1 Overall Strategy

Our methodology for the segmentation of video frames into foreground objects and background relies on the assumption that the user typically assigns the corresponding labels to points of interest only in the first frame, and even there not necessarily to all detected points of interest. No matter whether the considered method of semisupervised classification is semisupervised classification with cluster regularization or semi-supervised kernel-based fuzzy c-means, the methodology always proceeds in the following steps:

1. In the first frame, the user labels some of the points of interest detected by the ORB detector.

2. Using the considered method of semisupervised classification, the remaining detected points of interest are labelled.

3. Matching points detected in the next frame are assigned the same labels as the points to which they are matched.

4. Using the considered method of semisupervised classification, the remaining points of interest detected in the next frame are labelled.

5. Steps 3 and 4 are repeated till either the points of interest in all frames have been classified or the scene has been so much disrupted between two frames that no points of interest could be matched between them (in such a case, new labelling by the user is needed).

4.2 Implementation of Object Segmentation

The Cartesian coordinates ([p]_1, [p]_2) of a point of interest p are expressed with respect to the top left corner of the frame, using the frame height and width as units. Due to that, [p]_1 and [p]_2 are normalized to [0, 1].

For a match between points of interest p_k and p_{k+1} in the subsequent frames k and k+1, the following criteria have been used (criterion (i) is sketched in code at the end of this subsection):

(i) The point p_{k+1} must lie within the radius r_k^p of the estimated new position \hat{p}_k of the point:

\|p_{k+1} - \hat{p}_k\| < r_k^p.   (13)

Here, the estimated position \hat{p}_k is calculated as

\hat{p}_k = \begin{cases} p_k + c_1 (p_k - p_{k-1}) & \text{if } p_{k-1} \text{ is available}, \\ p_k & \text{else}, \end{cases}   (14)

where c_1 > 0, and the radius r_k^p is calculated as

r_k^p = (u_k^p W)^2,   (15)

where u_k^p quantifies the uncertainty pertaining to the point p_k in the k-th frame and W denotes the frame width (in the units in which point positions are expressed). The uncertainty u^p is set to u_1^p = c_2 > 0 in the first frame and is then evolved from frame to frame through linear scaling above a lower limit c_3 > 0:

u_{k+1}^p = \begin{cases} \max(c_3, c_4 u_k^p) & \text{if } p_k \text{ is matched}, \\ c_5 u_k^p & \text{if } p_k \text{ is not matched}, \end{cases}   (16)

where 0 < c_4 < 1 and c_5 > 1. Moreover, if the evolution (16) leads to u_{k+1}^p > c_6 for some c_6 > c_3, then the point p is deactivated and no longer considered for matching.

(ii) The Hamming distance between the 256-bit binary descriptors of the points is at most 64.

The choice of the real-valued constants in criterion (i) has been based on the resolution of the video (4K), on the frame rate (25 frames per second) and on the defaults of the ORB implementation based on [7]. They have been set to the following values: c_1 = 0.6, c_2 = 0.02, c_3 = 0.009, c_4 = 0.9, c_5 = 1.1, c_6 = 0.03.

In each frame, the described implementation was used to find the 500 most interesting points. On a Linux computer with a 3.3 GHz Intel Xeon E3-1230 processor, this took 95.32 ms.
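The position-based criterion (i) referred to above reduces to a few arithmetic rules. The following Python helpers are a direct transcription of Eqs. (13)–(16) with the constants just listed; the helper names and the deactivation signalling are ours.

```python
import numpy as np

# Constants of Section 4.2, chosen for 4K resolution at 25 fps.
C1, C2, C3, C4, C5, C6 = 0.6, 0.02, 0.009, 0.9, 1.1, 0.03
W = 1.0  # frame width; positions are already normalized to [0, 1]

def predicted_position(p_k, p_prev=None):
    """Eq. (14): linear extrapolation of the last motion, if one is known."""
    return p_k + C1 * (p_k - p_prev) if p_prev is not None else p_k

def search_radius(u_k):
    """Eq. (15): the radius grows quadratically with the uncertainty u_k."""
    return (u_k * W) ** 2

def satisfies_criterion_i(p_next, p_k, u_k, p_prev=None):
    """Eq. (13): the candidate must lie within the search radius of the
    predicted position. A newly detected point starts with u_1 = C2."""
    p_hat = predicted_position(np.asarray(p_k, float),
                               None if p_prev is None else np.asarray(p_prev, float))
    return np.linalg.norm(np.asarray(p_next, float) - p_hat) < search_radius(u_k)

def updated_uncertainty(u_k, matched):
    """Eq. (16): shrink towards the lower limit C3 on a match, grow otherwise;
    None signals that the uncertainty exceeded C6 and the point is deactivated."""
    u_next = max(C3, C4 * u_k) if matched else C5 * u_k
    return None if u_next > C6 else u_next
```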
4.3 Implementation of Semi-supervised Classifiers

As input features for both classification methods, the Cartesian coordinates ([p_k]_1, [p_k]_2) of the point in the k-th frame and the polar coordinates ([p_k - p_{k-1}]_{\|}, [p_k - p_{k-1}]_{\varphi}) of its movement with respect to the previous frame, i.e., the length and the angle of the motion vector, are used.

In the implementation of the semisupervised classification with cluster regularization described in Section 3.1, we used k-means clustering for the initial clustering of all instances. Although this method allows choosing the number of clusters independently of the number of classes, we have set it to the same value for comparability with the semi-supervised kernel-based fuzzy c-means, i.e., to the value 2 corresponding to the classes of foreground objects and background. Hence, we performed k-means clustering with k = 2. Since the k-means algorithm does not output a probability distribution on the set of clusters, we employed a simple procedure proposed in [8] to transform the distances from an instance x_n to the cluster centers v_1, \dots, v_k into a probability distribution q_n, which assures that x_n more likely belongs to the clusters whose centers it is closer to:

(q_n)_i = \frac{1 - \frac{\|x_n - v_i\|}{\sum_{j=1}^{k} \|x_n - v_j\|}}{k - 1}.   (17)

Consequently, for our case k = 2:

(q_n)_1 = \frac{\|x_n - v_2\|}{\|x_n - v_1\| + \|x_n - v_2\|},   (18)

(q_n)_2 = \frac{\|x_n - v_1\|}{\|x_n - v_1\| + \|x_n - v_2\|}.   (19)

The remaining parameters pertaining to the semisupervised classification with cluster regularization were set as proposed in [8]: λ = 0.2, κ = 2, |φ(x_n)| = 10.

For the semi-supervised kernel-based fuzzy c-means algorithm described in Section 3.2, we used a Gaussian kernel function for updating the membership matrix, K(x, y) = \exp(-\|x - y\|^2 / \sigma^2), where the parameter σ is computed as proposed in [9]:

\sigma = \sqrt{\frac{1}{M} \cdot \frac{\sum_{n=1}^{N} \|x_n - v\|^2}{N}},   (20)

where v is the center of all instances. The remaining parameters were set as follows: m = 2, ε = 0.001, t_{\max} = 50.

5 Experimental Validation

5.1 Employed Data

For the validation of the proposed approach, we prepared 12 short videos. In all videos, there is a yellow or blue balloon as a foreground object and a green background. On the background, there are a few small red sticky notes to help detect some key points. The videos were recorded in a UHD resolution. Here is a brief characterization of all employed videos:

• a handheld camera, both the foreground object and the background are sharp,
• a handheld camera, only the foreground object is sharp (2 videos),
• a static camera, only the background is sharp (2 videos),
• a static camera, only the background is sharp, the foreground object is close to the camera,
• a static camera, only the foreground object is sharp, a hand is interfering with the background (2 videos),
• a static camera, only the foreground object is sharp, it is moving towards the camera,
• a static camera, only the foreground object is sharp, it is moving away from the camera,
• a static camera, only the foreground object is sharp, it passes the scene multiple times (2 videos).

For the testing, labels were available for all points of interest. Unfortunately, those labels were often unreliable.

5.2 Results and Their Analysis

On all the employed videos, we measured the quality of classification by means of the accuracy, sensitivity, specificity and F-measure of both implemented classification methods.

For the fuzzy c-means method, the accuracy and specificity on the unlabelled data are illustrated for four particular videos in Figure 1.

[Figure 1: The evolution of accuracy (top) and specificity (bottom) of the c-means method on the unlabelled data for four particular videos.]

For the cluster-regularization method, we compared the values of the considered four quality measures obtained with five classifiers trained in each of the five first video frames, with respect to the delay between classifier training and measuring its quality. The results of their comparison are summarized, for three particular delays of 1 frame, 5 frames and 10 frames, in Table 1. In addition, for delays up to 50 frames, they are again illustrated for accuracy and sensitivity on the four videos used already in connection with the fuzzy c-means classifier, in Figures 2–5.

[Figure 2: The evolution of accuracy (top) and specificity (bottom) of the classifiers trained in each of the 5 first video frames for a handheld-camera video with both the foreground object and the background sharp.]
[Figure 3: The evolution of accuracy (top) and specificity (bottom) of the classifiers trained in each of the 5 first video frames for a handheld-camera video with only the foreground object sharp.]
[Figure 4: The evolution of accuracy (top) and specificity (bottom) of the classifiers trained in each of the 5 first video frames for a static-camera video, in which only the foreground object is sharp and is moving towards the camera.]
[Figure 5: The evolution of accuracy (top) and specificity (bottom) of the classifiers trained in each of the 5 first video frames for a static-camera video, in which only the foreground object is sharp and passes the scene multiple times.]

Figures 2–5 indicate that classifiers trained in a later frame tend to have higher accuracy and specificity, but in general, the differences between classifiers trained in different frames are small. This is confirmed by the Friedman test for the delays of 1, 5 and 10 frames between classifier training and measuring its quality and for all four considered quality measures. The hypothesis of equality of all five classifiers is rejected (p-value < 5%) only for the delay 1 frame and the F-measure, and weakly rejected (p-value < 10%) for the delay 1 frame and the sensitivity, as well as for the delay 5 frames and the F-measure. A post-hoc test expectedly reveals that the equality of all five classifiers was rejected mainly due to differences between classifiers trained in the early and in the later frames; in particular, between those trained in the 1st and 4th frames (delay 1, both sensitivity and F-measure) and between classifiers trained in the 1st–3rd frames and in the 5th frame (delay 5, F-measure).
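The four quality measures are the standard ones for binary classification. For completeness, the sketch below shows how they can be computed per video, taking the foreground as the positive class (an assumption of ours, not stated in the paper), and how the Friedman test can be invoked; scipy.stats.friedmanchisquare yields the uncorrected p-value, while the Holm correction [5] and the post-hoc procedure of [3, 4] are not shown.

```python
import numpy as np
from scipy.stats import friedmanchisquare

def binary_quality(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F-measure of binary labels,
    with True standing for the (positive) foreground class."""
    y_true = np.asarray(y_true, bool)
    y_pred = np.asarray(y_pred, bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "F-measure":   2 * tp / (2 * tp + fp + fn),
    }

# Friedman test over the 12 videos: one argument per classifier, each being a
# hypothetical vector of 12 per-video values of the considered quality measure.
# statistic, p_value = friedmanchisquare(m1, m2, m3, m4, m5)
```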
Table 1: Comparison of the values of the considered quality measures obtained with classifiers trained in each of the 5 first video frames, for different delays between classifier training and testing, obtained on data from the 12 employed videos. The entry in a cell indicates on how many videos the considered measure of classifier quality (accuracy, sensitivity, specificity, F-measure) was higher for the row classifier : on how many videos it was higher for the column classifier. A result in italic, respectively bold italic, indicates that after the Friedman test at least weakly rejected (p-value < 10%) the hypothesis that the considered quality measure is equal for all classifiers (cf. Table 2), the post-hoc test according to [3, 4] weakly rejects, respectively rejects (p-value < 5%), the hypothesis that it is equal for the particular row and column classifiers. All simultaneously tested hypotheses were corrected in accordance with Holm [5]. Rows (#1) and columns (#2) are indexed by the frame in which the compared classifier was trained.

             Delay 1 frame                                    Delay 5 frames                                   Delay 10 frames
#1 \ #2      1        2        3        4        5       |    1        2        3        4        5       |    1        2        3        4        5
Accuracy
1            -        5:7      5:7      7:5      6:6     |    -        4:8      7:5      7:5      10:2    |    -        4:8      4:8      4:8      2:10
2            7:5      -        4:8      8:4      6:6     |    8:4      -        6:6      7:5      10:2    |    8:4      -        7:5      5:7      5:7
3            7:5      8:4      -        9:3      6:6     |    5:7      6:6      -        7:5      11:1    |    8:4      5:7      -        6:6      5:7
4            5:7      4:8      3:9      -        5:7     |    5:7      5:7      5:7      -        10:2    |    8:4      7:5      6:6      -        5:7
5            6:6      6:6      6:6      7:5      -       |    2:10     2:10     1:11     2:10     -       |    10:2     7:5      7:5      7:5      -
Sensitivity
1            -        8:4      8.5:3.5  9.5:2.5  8.5:3.5 |    -        8:4      7:5      7:5      9:3     |    -        6.5:5.5  8:4      8.5:3.5  8.5:3.5
2            4:8      -        8:4      10:2     9:3     |    4:8      -        6:6      7.5:4.5  9.5:2.5 |    5.5:6.5  -        6:6      8:4      8:4
3            3.5:8.5  4:8      -        10.5:1.5 9.5:2.5 |    5:7      6:6      -        7.5:4.5  8.5:3.5 |    4:8      6:6      -        8.5:3.5  9.5:2.5
4            2.5:9.5  2:10     1.5:10.5 -        6.5:5.5 |    5:7      4.5:7.5  4.5:7.5  -        8:4     |    3.5:8.5  4:8      3.5:8.5  -        8.5:3.5
5            3.5:8.5  3:9      2.5:9.5  5.5:6.5  -       |    3:9      2.5:9.5  3.5:8.5  4:8      -       |    3.5:8.5  4:8      2.5:9.5  3.5:8.5  -
Specificity
1            -        7.5:4.5  6.5:5.5  7:5      7:5     |    -        3.5:8.5  5:7      5:7      4:8     |    -        6.5:5.5  3.5:8.5  2:10     3.5:8.5
2            4.5:7.5  -        4.5:7.5  6:6      6.5:5.5 |    8.5:3.5  -        5:7      4.5:7.5  4:8     |    5.5:6.5  -        4:8      4:8      4:8
3            5.5:6.5  7.5:4.5  -        6:6      7:5     |    7:5      7:5      -        7:5      6.5:5.5 |    8.5:3.5  8:4      -        4.5:7.5  6:6
4            5:7      6:6      6:6      -        4.5:7.5 |    7:5      7.5:4.5  5:7      -        4:8     |    10:2     8:4      7.5:4.5  -        6.5:5.5
5            5:7      5.5:6.5  5:7      7.5:4.5  -       |    8:4      8:4      5.5:6.5  8:4      -       |    8.5:3.5  8:4      6:6      5.5:6.5  -
F-measure
1            -        8:4      9:3      10:2     8:4     |    -        6:6      7:5      8:4      11:1    |    -        5.5:6.5  9:3      8.5:3.5  9.5:2.5
2            4:8      -        7:5      12:0     9:3     |    6:6      -        6.5:5.5  7:5      10:2    |    6.5:5.5  -        6.5:5.5  7.5:4.5  9.5:2.5
3            3:9      5:7      -        11:1     8:4     |    5:7      5.5:6.5  -        8:4      11:1    |    3:9      5.5:6.5  -        8:4      9:3
4            2:10     0:12     1:11     -        6:6     |    4:8      5:7      4:8      -        8.5:3.5 |    3.5:8.5  4.5:7.5  4:8      -        9:3
5            4:8      3:9      4:8      6:6      -       |    1:11     2:10     1:11     3.5:8.5  -       |    2.5:9.5  2.5:9.5  3:9      3:9      -

Table 2: Results of the Friedman test of the hypothesis that, for a given delay between classifier training and measuring its quality, a given quality measure is equal for the classifiers trained in each of the 5 first video frames, for the 12 combinations of delays and quality measures considered in Table 1. The single combination for which the tested hypothesis was rejected (p-value < 5%) is marked with **; the combinations for which it was weakly rejected (p-value < 10%) are marked with *. All simultaneously tested hypotheses were corrected in accordance with Holm [5].

Quality measure   Delay (frames)   p-value
accuracy          1                1
accuracy          5                0.117
accuracy          10               1
sensitivity       1                0.052 *
sensitivity       5                0.428
sensitivity       10               0.238
specificity       1                1
specificity       5                1
specificity       10               0.25
F-measure         1                0.043 **
F-measure         5                0.089 *
F-measure         10               0.238

6 Conclusion

The presented research integrates two comparatively recent approaches: the keypoint detector ORB, which is a combination of a corner detection method with a visual descriptor method, and two semi-supervised classification methods. To our knowledge, this is the first time these approaches have been used together for the task of scene segmentation into foreground objects and the background.

On the other hand, this is a work in progress, and the presented results are still rather preliminary, being obtained on 12 artificially created videos with a quite simple scene segmentation. Both approaches should be investigated in the context of more complex segmentations and more realistic scenes. To this end, however, especially the interplay of the ORB detector with the methods of semisupervised classification needs to be elaborated more deeply.
Acknowledgement

The research reported in this paper has been supported by the Czech Science Foundation (GAČR) grant 18-18080S.

References

[1] M.S. Allili, N. Bouguila, and D. Ziou. Finite general Gaussian mixture modeling and application to image and video foreground segmentation. Journal of Electronic Imaging, 17: paper 013005, 2008.
[2] J.C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, 1981.
[3] J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006.
[4] S. Garcia and F. Herrera. An extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all pairwise comparisons. Journal of Machine Learning Research, 9:2677–2694, 2008.
[5] S. Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65–70, 1979.
[6] L. Li, W. Huang, I.Y.H. Gu, and Q. Tan. Foreground object detection from videos containing complex background. In 11th ACM Conference on Multimedia, pages 2–10, 2003.
[7] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. ORB: An efficient alternative to SIFT or SURF. In International Conference on Computer Vision, pages 2564–2571, 2011.
[8] R.G.F. Soares, H. Chen, and X. Yao. Semisupervised classification with cluster regularization. IEEE Transactions on Neural Networks and Learning Systems, 23:1779–1792, 2012.
[9] D. Zhang, K. Tan, and S. Chen. Semi-supervised kernel-based fuzzy c-means. In ICONIP'04, pages 1229–1234. Springer, 2004.