<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Online Video Summarization with the Kohonen SOM in Real Time Oleksii Gorokhovatskyi [0000-0003-3477-2132] , Oleh Teslenko [0000-0003-3105-9323] ,Volodymyr Zatkhei [0000-0003-4426-7789]</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Simon</forename><surname>Kuznets</surname></persName>
							<email>oleksii.gorokhovatskyi@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Kharkiv National University of Economics</orgName>
								<address>
									<addrLine>9a Nauka Ave</addrLine>
									<postCode>61166</postCode>
									<settlement>Kharkiv</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Online Video Summarization with the Kohonen SOM in Real Time Oleksii Gorokhovatskyi [0000-0003-3477-2132] , Oleh Teslenko [0000-0003-3105-9323] ,Volodymyr Zatkhei [0000-0003-4426-7789]</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">8F25FC2152D04A47E387FD1FF1FB9B72</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T04:20+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>video summarization</term>
					<term>keyframe</term>
					<term>self-organizing map</term>
					<term>clustering</term>
					<term>image features</term>
					<term>matching</term>
					<term>summary</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we propose an algorithm for the automatic creation of a video summary. It is based on Kohonen's Self-Organizing Map as a method for training on and clustering frame features in online mode. The decision about whether a frame should appear in the summary depends on the stability of the most recent sequential clustering results. A three-way matching of images between the automatic summary and the corresponding user summary is proposed and tested. The Open Video and SumMe datasets were used for accuracy and performance comparison. We show that the proposed approach achieves real-time summarization while retaining its online properties, without the requirement to see the whole video. Its accuracy (measured by F1 scores) can compete with batch processing methods. We also compared its performance to existing state-of-the-art methods for online real-time processing.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction and related work</head><p>In recent years, the tremendous development of information, computer, and communication technologies has made it impossible for humans to process all available and newly appearing data themselves. This is especially true for video content: every minute, 300-500 hours of video (according to different sources) are uploaded to YouTube, and users watch over a billion hours every day. Video summarization is the process of selecting a specific subset of keyframes (still images) or keyshots (short sequences of frames) from a video stream that preserves the main idea of the video <ref type="bibr" target="#b0">[1]</ref>. A summary should keep the important frames of the initial video, which creates the core summarization challenge: the same frames may be important to some users and unimportant to others, making a video summary a rather subjective notion.</p><p>Many studies <ref type="bibr" target="#b1">[2]</ref><ref type="bibr" target="#b2">[3]</ref><ref type="bibr" target="#b3">[4]</ref><ref type="bibr" target="#b4">[5]</ref> divide summarization methods into two classes: unsupervised <ref type="bibr" target="#b5">[6]</ref><ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref> and supervised <ref type="bibr" target="#b8">[9]</ref><ref type="bibr" target="#b9">[10]</ref><ref type="bibr" target="#b10">[11]</ref><ref type="bibr" target="#b11">[12]</ref>. A brief description of some known methods follows. Some researchers <ref type="bibr" target="#b12">[13]</ref><ref type="bibr" target="#b13">[14]</ref><ref type="bibr" target="#b14">[15]</ref> used self-organizing maps or closely related approaches. Hierarchical Growing Cell Structures (GCS) methods are proposed in <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref> as an extension of Kohonen's Self-Organizing Maps (SOM) that allows building a flexible structure without knowing the number of classes a priori. A graphical user interface for investigating the construction of a two-dimensional SOM is proposed in <ref type="bibr" target="#b14">[15]</ref>. Unfortunately, these studies do not report significant modeling results on a dataset of at least medium size.</p><p>One of the most popular summarization approaches is VSUMM <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b15">16]</ref>. Its authors proposed a method based on the extraction of color features from video frames and unsupervised classification. Additionally, they presented a new measure to compare automatic and user-defined summaries (Comparison of User Summaries, CUS).</p><p>A variety of other video summarization methods have been proposed, based on generative adversarial networks <ref type="bibr" target="#b16">[17]</ref>, long short-term memory networks <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12]</ref>, attention mechanisms <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19]</ref>, deep learning <ref type="bibr" target="#b19">[20]</ref>, the use of text annotations together with visual features <ref type="bibr" target="#b20">[21]</ref>, and fuzzy-based incremental clustering <ref type="bibr" target="#b21">[22]</ref>.</p><p>Paper <ref type="bibr" target="#b7">[8]</ref> describes the idea of generating video summaries online, immediately and without seeing the entire video, in quasi real time. The method builds a dictionary via group sparse coding for some initial video frames and then attempts to reconstruct each unseen frame. If the reconstruction error is significant enough, the dictionary is updated and the current frame is added to the summary.</p><p>The ideas proposed in <ref type="bibr" target="#b22">[23]</ref> extend dictionary learning with a prediction of interestingness based on global camera motion analysis and colorfulness. This approach does not seem to be designed for online processing; it focuses mainly on real-time processing.</p><p>Implementations of the abovementioned approaches <ref type="bibr" target="#b7">[8]</ref> and <ref type="bibr" target="#b22">[23]</ref> do not appear to be available, nor do the test videos for <ref type="bibr" target="#b7">[8]</ref>.</p><p>The contributions of this paper include:</p><p>- a video summarization method based on self-organizing maps that works in online mode (without seeing the whole video) in real time;</p><p>- a new three-stage matching of two sets of frames that includes both keypoint and raw image pixel comparison;</p><p>- selection of keyframes based on the stability of Kohonen SOM clustering;</p><p>- a quality assessment of the proposed online summary generation method using a dataset that was also created by volunteers in an online fashion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Problem statement</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Self-organizing maps</head><p>Kohonen's self-organizing map (SOM) <ref type="bibr" target="#b23">[24]</ref> is one of the most popular unsupervised neural network clustering approaches. This type of network preserves the topology between input and output values and maps multidimensional inputs onto low-dimensional (typically 1D or 2D) outputs.</p><p>An important property of the Kohonen network is that it is capable of online data processing: input samples arrive one by one, each followed by an immediate clustering (classification) decision.</p><p>Training of a Kohonen SOM implies updating all weights after processing each training sample one by one (online training) and may be done according to the stages below. Let n be the quantity of outputs (known a priori), and let m denote the quantity of features.</p><p>1. Initialize the weights of each neuron with small random values.</p><p>2. Select an input feature vector x.</p><p>3. Train on the feature vector while its error is bigger than the training epsilon (0.0001):</p><p>3.1. Find the closest neuron (Best Matching Unit, BMU) in terms of some distance, e.g. Euclidean.</p><p>3.2. Update the weights w_ij of all n neurons according to (1):</p><formula xml:id="formula_0">w_ij = w_ij + η(x, t) (x_i − w_ij), <label>(1)</label></formula><p>where η(x, t) is the learning rate, combining a time-decaying factor and the value of a Gaussian neighborhood function, and x is the current input vector.</p><p>4. Repeat from step 2 for all feature vectors, increasing t each time.</p><p>One of the most important parameters to set up for the SOM is the size of the one-dimensional or two-dimensional map. In this work, we used a one-dimensional map with 20 possible clusters to reach the required real-time performance. Our experiments showed that a bigger quantity of clusters requires more training time, while a smaller quantity does not capture the differences between frames well enough. A successful choice of this parameter balances the quality and the performance of the proposed method.</p><p>The training stage of the SOM continues until all clusters are trained, as long as the quantity of already processed feature vectors is less than 1000.</p></div>
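<div xmlns="http://www.tei-c.org/ns/1.0"><p>For illustration, the following is a minimal Python sketch of such an online 1-D SOM. It is our own illustrative code, not the authors' implementation; the exponential decay schedule, the neighborhood width, and the iteration cap are assumptions consistent with update rule (1) and the training epsilon above.</p><code>
import numpy as np

class OnlineSOM1D:
    """Minimal 1-D Kohonen SOM for online clustering of feature vectors (a sketch)."""

    def __init__(self, n_clusters=20, n_features=210, lr0=0.5, sigma0=3.0, seed=0):
        rng = np.random.default_rng(seed)
        # step 1: small random initial weights
        self.w = rng.uniform(0.0, 0.01, size=(n_clusters, n_features))
        self.lr0, self.sigma0, self.t = lr0, sigma0, 0

    def bmu(self, x):
        # step 3.1: Best Matching Unit, the neuron closest to x in Euclidean distance
        return int(np.argmin(np.linalg.norm(self.w - x, axis=1)))

    def train_step(self, x, eps=1e-4, tau=1000.0, max_iter=100):
        self.t += 1
        lr = self.lr0 * np.exp(-self.t / tau)                  # time-decaying learning rate (assumed schedule)
        sigma = max(self.sigma0 * np.exp(-self.t / tau), 0.5)  # shrinking Gaussian neighborhood (assumed)
        for _ in range(max_iter):
            b = self.bmu(x)
            if np.linalg.norm(x - self.w[b]) > eps:            # step 3: repeat while error exceeds epsilon
                idx = np.arange(len(self.w))
                h = np.exp(-((idx - b) ** 2) / (2.0 * sigma ** 2))  # Gaussian neighborhood around the BMU
                self.w += (lr * h)[:, None] * (x - self.w)          # step 3.2: update all n neurons, rule (1)
            else:
                break
        return self.bmu(x)
</code></div>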
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Features and keyframes selection</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Image features</head><p>The selection of features is an important step in representing the specific properties of each frame. From our point of view, the best choice is features that can be computed quickly and/or in parallel to preserve real-time processing. We selected common color features obtained from non-overlapping windows sliding over an image. The averaged R, G and B color components in the range [0; 1] of a single window are used as its features.</p><p>We used square windows of size w = 16 or w = 32 in the experiments and rescaled images accordingly to preserve the same length of the feature vector. Building the feature vector for the entire image was performed in two parallel independent threads, whose results were merged at the end.</p></div>
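<div xmlns="http://www.tei-c.org/ns/1.0"><p>A sketch of this feature extraction in Python follows. It assumes OpenCV (cv2) for resizing, BGR channel order as cv2 delivers it, and a fixed target size matching the dataset's 352x240 frames; the two-thread split described above is omitted for brevity.</p><code>
import numpy as np
import cv2  # assumption: OpenCV is used for decoding and resizing frames

def frame_features(frame_bgr, w=32, target=(352, 240)):
    """Average B, G, R in [0; 1] over non-overlapping w-by-w windows (illustrative sketch)."""
    img = cv2.resize(frame_bgr, target).astype(np.float32) / 255.0
    h_img, w_img, _ = img.shape
    feats = []
    # walk the image in non-overlapping w-by-w windows, dropping the ragged border
    for y in range(0, h_img - h_img % w, w):
        for x in range(0, w_img - w_img % w, w):
            window = img[y:y + w, x:x + w]
            feats.extend(window.reshape(-1, 3).mean(axis=0))  # mean color of one window
    return np.asarray(feats)
</code></div>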
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">The selection of keyframes</head><p>We define a keyframe as a frame that varies only slightly during some quantity T of previously examined frames in a video stream. We define the quantitative measure of this variability with a two-step procedure. The first step is the clustering of a frame by the SOM: we cluster the video stream frame by frame, counting the quantity Q of frames belonging to the same cluster in a row. When this quantity becomes bigger than a predefined threshold T_0, we consider this part of the video stream stable enough and select a keyframe candidate from its middle, with index</p><formula xml:id="formula_1">k* = i − Q/2,</formula><p>where i is the number of the frame being processed.</p><p>At the next step we compare the keyframe candidate k* with the previously added frame, in order to avoid accidentally adding a very similar frame that belongs to the same cluster but is separated from it by some frame of another cluster. The two frames are compared by the average CIEDE2000 difference over their corresponding colors:</p><formula xml:id="formula_2">d = (1/m) Σ_{i=1}^{m} ΔE_00(c_i^1, c_i^2), <label>(2)</label></formula><p>where ΔE_00(c_i^1, c_i^2) is the CIEDE2000 difference between two colors <ref type="bibr" target="#b24">[25]</ref>. We claim the images to be similar if d is less than 1.5, where 1.5 is the just noticeable difference (JND) for this metric according to <ref type="bibr" target="#b25">[26]</ref>.</p><p>The entire scheme of the suggested summarization method is presented in Fig. <ref type="figure" target="#fig_2">1</ref>. We denote by T_0 = 42 the quantity of frames required to be classified into the same class in a row by the SOM.</p></div>
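<div xmlns="http://www.tei-c.org/ns/1.0"><p>A compact Python sketch of this selection loop is shown below, tying together the SOM and feature sketches above (som.train_step and frame_features are our illustrative names, not the authors' API). It assumes RGB frames as float arrays in [0; 1] and uses scikit-image for CIEDE2000; holding frames in a list stands in for true online buffering, where a bounded buffer of the last T_0 or so frames would suffice.</p><code>
from skimage.color import rgb2lab, deltaE_ciede2000  # assumption: scikit-image provides CIEDE2000

def mean_de2000(img1_rgb, img2_rgb):
    """Average CIEDE2000 difference (2) between corresponding colors of two images."""
    return float(deltaE_ciede2000(rgb2lab(img1_rgb), rgb2lab(img2_rgb)).mean())

def summarize(frames, som, features, T0=42, jnd=1.5):
    """Online keyframe selection: a run of T0 identical SOM labels yields a candidate."""
    summary, run, prev_label, last_key = [], 0, None, None
    for i, frame in enumerate(frames):
        label = som.train_step(features(frame))   # cluster the incoming frame
        run = run + 1 if label == prev_label else 1
        prev_label = label
        if run > T0:                              # stable segment detected
            k = i - run // 2                      # candidate index k* = i - Q/2
            candidate = frames[k]
            # second step: reject near-duplicates of the previously added keyframe
            if last_key is None or mean_de2000(candidate, last_key) > jnd:
                summary.append(k)
                last_key = candidate
            run = 0
    return summary
</code></div>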
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Image matching</head><p>In order to check the quality of summarization with the suggested approach, we need to compare two sets of images: the first contains images from our automatic summary, the second (ground truth, or etalon) contains images proposed by humans.</p><p>We implemented the matching between the two sets in three consecutive stages. The first stage is a rough estimate of match candidates with the Fast Retina Keypoint (FREAK, <ref type="bibr" target="#b26">[27]</ref>) descriptor. FREAK is built on the FAST keypoint detector <ref type="bibr" target="#b27">[28]</ref> and requires an initial threshold t, the difference between a central pixel and its surroundings required to identify the central pixel as a corner. A bigger value of t leads to fewer keypoints being detected. We found t = 20 to be a good default value for the experiments.</p><p>FREAK descriptors contain 512 bits; we compare them in cascades of 128 bits each, as proposed in <ref type="bibr" target="#b26">[27]</ref>. The comparison of chains requires another threshold F_0, which allows some differences, since identical bit arrays are rare even for close images. If the corresponding bit chains of two descriptors differ by more than the threshold, the descriptors are considered different and the comparison stops. In our experiments we found the threshold F_0 = 32 to be a good choice (25% of bit values may differ between descriptors).</p><p>We suggest that two images probably match if the overall quantity of matched descriptors between them is greater than the quantity of non-matched ones.</p><p>When the list of matching candidates is ready, we compare each pair of candidates using comparison <ref type="formula" target="#formula_2">(2)</ref>. We perform these comparisons on feature vectors built on full-size images with window size w = 32.</p><p>The impact of different F_0 values on the matching results is shown in Fig. <ref type="figure" target="#fig_3">2</ref>. The first row contains some frames from the user summary. The three rows below contain some frames from an automatic summary for the corresponding F_0 value (24, 32 or 40). As one can see from Fig. <ref type="figure" target="#fig_3">2</ref>, we have one successful match for F_0 = 24 and three correct matches for F_0 = 32. In the last case, F_0 = 40, we have five matches, one of which is absolutely false (third frame) and one of which is quite subjective (last frame).</p><p>The last stage is independent of the previous ones and is used for frames for which the FREAK descriptor is not effective enough; some frames may be skipped earlier because they contain too few FREAK keypoints. We process the closest (in terms of Manhattan distance between frame numbers) frames from the automatic and user summaries that were not matched before, searching for the difference between the average R, G and B values of the entire full-size image. Images are assumed to be similar if the average difference over all three of R, G and B is less than 20.</p></div>
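<div xmlns="http://www.tei-c.org/ns/1.0"><p>A sketch of the first-stage cascade comparison in Python follows. It assumes 64-byte uint8 descriptors, such as those produced by OpenCV's cv2.xfeatures2d.FREAK_create() from opencv-contrib; the early-exit cascade and the majority rule for images follow the description above, while the function names are ours.</p><code>
import numpy as np

def descriptors_match(d1, d2, F0=32, cascade_bits=128):
    """Cascade comparison of two 512-bit FREAK descriptors (64 uint8 bytes each)."""
    step = cascade_bits // 8                      # 16 bytes per 128-bit cascade
    for k in range(0, len(d1), step):
        diff = np.bitwise_xor(d1[k:k + step], d2[k:k + step])
        hamming = int(np.unpackbits(diff).sum())  # differing bits in this cascade
        if hamming > F0:
            return False                          # early reject: chains differ too much
    return True

def images_match(desc_a, desc_b, F0=32):
    """Two images 'probably match' if matched descriptors outnumber non-matched ones."""
    matched = sum(
        any(descriptors_match(d1, d2, F0) for d2 in desc_b) for d1 in desc_a
    )
    return matched > len(desc_a) - matched
</code></div>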
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Measuring quality of the automatic summary</head><p>We estimate the quality of the suggested approach by measuring the CUS values, as suggested in <ref type="bibr" target="#b6">[7]</ref>, and F1 scores, as described in detail in <ref type="bibr" target="#b17">[18]</ref> and applied in <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12]</ref> and elsewhere.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Two summary quality metrics</head><p>The CUS_A (accuracy) and CUS_E (error) metrics are proposed in <ref type="bibr" target="#b6">[7]</ref>:</p><formula xml:id="formula_4">CUS_A = n_mAS / n_US, CUS_E = n_nmAS / n_US, <label>(3)</label></formula><p>where n_mAS is the quantity of matching keyframes from the automatic summary, n_nmAS is the quantity of non-matching keyframes from the automatic summary, and n_US is the total quantity of keyframes in the user summary.</p><p>The F1 score is calculated as</p><formula xml:id="formula_6">F1 = 2PR / (P + R),</formula><p>where P is the precision and R is the recall, calculated from the true positives (quantity of matched frames), false positives (quantity of frames present in the automatic summary but absent from the user one), and false negatives (quantity of frames present in the user summary but missing from the automatic one).</p></div>
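<div xmlns="http://www.tei-c.org/ns/1.0"><p>For clarity, a small Python sketch of metrics (3) and F1, computed directly from match counts (argument names are ours):</p><code>
def cus_and_f1(n_matched, n_auto, n_user):
    """CUS_A, CUS_E (3) and F1 from match counts.

    n_matched: automatic keyframes matched to user keyframes;
    n_auto: total automatic keyframes; n_user: total user keyframes.
    """
    cus_a = n_matched / n_user
    cus_e = (n_auto - n_matched) / n_user          # non-matching automatic frames
    precision = n_matched / n_auto                 # TP / (TP + FP)
    recall = n_matched / n_user                    # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return cus_a, cus_e, f1

# e.g. 7 matches, 12 automatic and 10 user keyframes: CUS_A = 0.7, CUS_E = 0.5, F1 ~ 0.64
</code></div>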
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Experiments</head><p>SumMe <ref type="bibr" target="#b9">[10]</ref> and TVSum <ref type="bibr" target="#b5">[6]</ref> are the most popular datasets in video summarization, but they are not suitable for us. They assign an importance score to each frame as the result of user summaries performed on the entire video. This makes it convenient to build etalon summaries from the user annotations by solving a 0/1 knapsack problem for a fixed summary length, typically 15%. In our case we make decisions in online mode, so we naturally do not know the final length. Although we could extend an automatic summary to the required length, the criteria of frame selection in batch and online modes are very different. So we used the dataset gathered from the Open Video project by <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b15">16]</ref>, which contains 50 color videos in MPEG-1 format (30 fps, 352x240 pixels), approximately 75 minutes in total.</p><p>User summaries were generated by 2 volunteers in online mode. We asked them to select important frames (from their point of view, no explanations required) while watching the video sequences without sound. Modification of previously selected summary frames was not allowed. The users decided about the importance of a frame without knowing the content of the entire video.</p><p>A specific trait of human behavior, called chronological bias, is described in <ref type="bibr" target="#b5">[6]</ref>: humans sometimes claim frames that appear earlier to be more important simply because of their chronological position, regardless of frame content. This effect leads to the selection of very close frames as important ones. To avoid the influence of chronological bias, we eliminate duplicates in the user summaries by applying the frame matching described above.</p><p>Comparing automatic summaries produced by the proposed approach with the user summaries created by volunteers in online mode and cleaned of duplicates, we obtained the following average values: CUS_A = 0.58, CUS_E = 0.38, F1 = 0.61. A feature vector of length 210 was built on images downscaled to half the original size.</p><p>We recalculated these scores for the OV <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b28">29]</ref>, DT <ref type="bibr" target="#b29">[30]</ref> and VSUMM <ref type="bibr" target="#b6">[7]</ref> methods using our matching approach; the results are presented in Table <ref type="table" target="#tab_2">1</ref> and are close to the ones shown in <ref type="bibr" target="#b6">[7]</ref>. As one can see, when compared to the existing summaries our approach has a much bigger CUS_E value than when compared to our own user summaries created in online mode (CUS_E = 0.38). This means that the rules for the creation of user summaries (online, without seeing the entire video, versus selection of keyframes after watching the entire video) matter.</p><p>The total duration of the 50 videos <ref type="bibr" target="#b15">[16]</ref> from the Open Video dataset is about 75 minutes; we built the summaries for all videos in 17 minutes. The average length of a summary is 0.3% of the entire video. A one-dimensional map with 20 clusters was used.</p><p>We used the SumMe <ref type="bibr" target="#b9">[10]</ref> dataset to evaluate the processing speed. The duration of the 25 videos from the SumMe dataset is approximately 70 minutes; the processing time, the quantity of keyframes in the summary and the length of the feature vector are shown in Table <ref type="table">2</ref>.</p><p>The first column contains the name of each video and its duration in minutes and seconds. Values in the second column were calculated for frames downscaled by a factor of two, and values in the third column for frames downscaled by a factor of three. Values in the fourth column were calculated with a video-specific downscaling factor that limits the quantity of features (maximum allowed frame width 200, height 140); an empty value in this column means that the result corresponds to one of the previous columns.</p><p>Values in the last column of Table <ref type="table">2</ref> show the ratio between the best processing time and the total duration of a video. The processing of four videos (Air_Force_One, Eiffel Tower, Notre_Dame and Scuba) did not satisfy the real-time requirement. For one video (Bearpark_climbing), the length of the summary decreases significantly as the feature vector length decreases.</p><p>We also tested the performance of the proposed online summarization method on two long movies. The first contains 260222 frames and lasts 3 hrs. 00 min. 53 sec. (10853 seconds in total); the second has 203882 frames and lasts 2 hrs. 21 min. 43 sec. (8503 seconds in total).</p><p>The online summarization of the first movie took 3800 seconds with 336 features per frame; the length of the summary was 705 frames. The second movie required 2209 seconds to build a summary containing 585 frames using 252 features per frame.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">Conclusion</head><p>We proposed an approach that generates a video summary in the form of keyframes, which are still images. It is based on the clustering of separate frames in online mode, without analysis of the whole video, using Kohonen's self-organizing maps. A frame of the video stream is selected as a summary frame if some quantity T_0/2 of frames before and after it were classified into the same cluster.</p><p>The proposed method was tested on the Open Video dataset, and its performance was evaluated on the SumMe dataset. We showed that its quality is comparable to some batch video summarization methods, and that its performance, combined with the flexibility in selecting the quantity of frame features, allows achieving real-time processing in most cases. The quality of the suggested method is higher when it is compared to user summaries created in online mode.</p><p>The investigation of how quality and performance depend on the size of the SOM map may be a topic of future research.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. The whole scheme of the proposed summarization method</figDesc><graphic coords="5,124.80,239.40,345.60,155.64" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 2.</head><label>2</label><figDesc>Fig. 2. Frame matching with different F_0 values</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1.</head><label>1</label><figDesc>CUS_A, CUS_E and F1 scores for the known approaches, the suggested matching, and the known user summaries</figDesc><table><row><cell>Method</cell><cell>CUS_A</cell><cell>CUS_E</cell><cell>F1</cell></row><row><cell>OV</cell><cell>0.63</cell><cell>0.61</cell><cell>0.56</cell></row><row><cell>DT</cell><cell>0.44</cell><cell>0.37</cell><cell>0.48</cell></row><row><cell>VSUMM1</cell><cell>0.76</cell><cell>0.47</cell><cell>0.71</cell></row><row><cell>VSUMM2</cell><cell>0.62</cell><cell>0.35</cell><cell>0.65</cell></row><row><cell>Our (compared to existing summaries)</cell><cell>0.58</cell><cell>0.53</cell><cell>0.57</cell></row><row><cell>Our (compared to our user summaries, created in online fashion)</cell><cell>0.58</cell><cell>0.38</cell><cell>0.61</cell></row></table></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Rethinking the Evaluation of Video Summaries</title>
		<author>
			<persName><forename type="first">M</forename><surname>Otani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Nakashima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rahtu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Heikkilä</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2019.00778</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<imprint>
			<date type="published" when="2019-06">2019. June 2019. 2019</date>
			<biblScope unit="page" from="7588" to="7596" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Weakly Supervised Summarization of Web Videos</title>
		<author>
			<persName><forename type="first">R</forename><surname>Panda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ernst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Roy-Chowdhury</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2017 IEEE International Conference on Computer Vision (ICCV)</title>
				<imprint>
			<date type="published" when="2017-10">October 2017. 2017</date>
			<biblScope unit="page" from="3677" to="3686" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A bottom-up summarization algorithm for videos in the wild</title>
		<author>
			<persName><forename type="first">G</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Qu</surname></persName>
		</author>
		<idno type="DOI">10.1186/s13634-019-0611-y</idno>
	</analytic>
	<monogr>
		<title level="j">EURASIP Journal on Advances in Signal Processing</title>
		<imprint>
			<biblScope unit="volume">2019</biblScope>
			<biblScope unit="issue">15</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Video Summarization by Learning from Unpaired Data</title>
		<author>
			<persName><forename type="first">M</forename><surname>Rochan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2019.00809</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<imprint>
			<date type="published" when="2019-06">2019. June 2019. 2019</date>
			<biblScope unit="page" from="7894" to="7903" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Weakly-supervised Video Summarization using Variational Encoder-Decoder and Web Prior</title>
		<author>
			<persName><forename type="first">S</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">S</forename><surname>Davis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-01264-9_12</idno>
	</analytic>
	<monogr>
		<title level="m">Computer Vision -ECCV 2018</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">V</forename><surname>Ferrari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Hebert</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Sminchisescu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Weiss</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">11218</biblScope>
			<biblScope unit="page" from="193" to="210" />
		</imprint>
	</monogr>
	<note>ECCV 2018</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">TVSum: Summarizing Web Videos Using Titles</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vallmitjana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Stent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jaimes</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2015.7299154</idno>
	</analytic>
	<monogr>
		<title level="m">2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<imprint>
			<date type="published" when="2015-06">June 2015. 2015</date>
			<biblScope unit="page" from="5179" to="5187" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Vsumm: A mechanism designed to produce static video summaries and a novel evaluation method</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">E F</forename><surname>De Avila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P B</forename><surname>Lopes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Da Luz</surname><genName>Jr</genName></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D L</forename><surname>Araujo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition Letters</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="56" to="68" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Quasi Real-Time Summarization for Consumer Videos</title>
		<author>
			<persName><forename type="first">B</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">P</forename><surname>Xing</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2014.322</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Conference on Computer Vision and Pattern Recognition</title>
				<imprint>
			<date type="published" when="2014-06">2014. June 2014. 2014</date>
			<biblScope unit="page" from="2513" to="2520" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Video summarization by learning submodular mixtures of objectives</title>
		<author>
			<persName><forename type="first">M</forename><surname>Gygli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Grabner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van Gool</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2015.7298928</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2015-06">June 2015. 2015</date>
			<biblScope unit="page" from="3090" to="3098" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Creating summaries from user videos</title>
		<author>
			<persName><forename type="first">M</forename><surname>Gygli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Grabner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Riemenschneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van Gool</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-10584-0_33</idno>
	</analytic>
	<monogr>
		<title level="m">Computer Vision -ECCV 2014</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">D</forename><surname>Fleet</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Pajdla</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Schiele</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Tuytelaars</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">8695</biblScope>
			<biblScope unit="page" from="505" to="520" />
		</imprint>
	</monogr>
	<note>ECCV 2014</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Video Summarization with Long Short-Term Memory</title>
		<author>
			<persName><forename type="first">K</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">L</forename><surname>Chao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Sha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Grauman</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-46478-7_47</idno>
	</analytic>
	<monogr>
		<title level="m">Computer Vision -ECCV 2016</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">B</forename><surname>Leibe</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Matas</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Sebe</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Welling</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">9911</biblScope>
			<biblScope unit="page" from="766" to="782" />
		</imprint>
	</monogr>
	<note>ECCV 2016</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Unsupervised video summarization with adversarial LSTM networks</title>
		<author>
			<persName><forename type="first">B</forename><surname>Mahasseni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Todorovic</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2017.318</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<imprint>
			<date type="published" when="2017-07">2017. July 2017. 2017</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Video Summarization and Browsing Using Growing Cell Structures</title>
		<author>
			<persName><forename type="first">I</forename><surname>Koprinska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<idno type="DOI">10.1109/IJCNN.2004.1381056</idno>
	</analytic>
	<monogr>
		<title level="m">International Joint Conference on Neural Networks (IJCNN)</title>
				<imprint>
			<date type="published" when="2004-07">July 2004. 2004</date>
			<biblScope unit="page" from="2601" to="2606" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">VideoGCS -A Clustering-Based System for Video Summarization and Browsing</title>
		<author>
			<persName><forename type="first">I</forename><surname>Koprinska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Carrato</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Proceedings of the 6th COST 276 Workshop</title>
				<meeting><address><addrLine>Thessaloniki, Greece</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004-05">May 2004. 2014</date>
			<biblScope unit="page" from="34" to="40" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Summarizing Video Information Using Self-Organizing Maps</title>
		<author>
			<persName><forename type="first">T</forename><surname>Baerecke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kijak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nuernberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Detyniecki</surname></persName>
		</author>
		<idno type="DOI">10.1109/FUZZY.2006.1681764</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Fuzzy Systems</title>
				<imprint>
			<date type="published" when="2006-07">2006. July 2006. 2006</date>
			<biblScope unit="page" from="16" to="21" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<ptr target="https://sites.google.com/site/vsummsite/home" />
		<title level="m">VSUMM (Video SUMMarization</title>
				<imprint>
			<date type="published" when="2020-03-01">01 March 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Dilated Temporal Relational Adversarial Network for Generic Video Summarization</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kampffmeyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Xing</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11042-019-08175-y</idno>
	</analytic>
	<monogr>
		<title level="j">Multimedia Tools and Applications</title>
		<imprint>
			<biblScope unit="volume">78</biblScope>
			<biblScope unit="page" from="35237" to="35261" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Summarizing Videos with Attention</title>
		<author>
			<persName><forename type="first">J</forename><surname>Fajtl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Sokeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Argyriou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Monekosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Remagnino</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-21074-8_4</idno>
	</analytic>
	<monogr>
		<title level="m">Computer Vision -ACCV 2018 Workshops. ACCV 2018</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">G</forename><surname>Carneiro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>You</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">11367</biblScope>
			<biblScope unit="page" from="39" to="54" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Video Summarization with Attention-Based Encoder-Decoder Networks</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/1708.09545.pdf" />
		<imprint>
			<date type="published" when="2020-03-01">01 March 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Video Summarization Using Deep Semantic Features</title>
		<author>
			<persName><forename type="first">M</forename><surname>Otani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Nakashima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rahtu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Heikkilä</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Yokoya</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-54193-8_23</idno>
	</analytic>
	<monogr>
		<title level="m">Computer Vision -ACCV 2016. ACCV 2016</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">S</forename><forename type="middle">H</forename><surname>Lai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Lepetit</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Nishino</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Sato</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">10115</biblScope>
			<biblScope unit="page" from="361" to="377" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Enhancing Video Summarization via Vision-Language Embedding</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">A</forename><surname>Plummer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lazebnik</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2017.118</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<imprint>
			<date type="published" when="2017-07">2017. July 2017. 2017</date>
			<biblScope unit="page" from="1052" to="1060" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Video Summarization Based on a Fuzzy Based Incremental Clustering</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pournazari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Mahmoudi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M E</forename><surname>Moghadam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Electrical and Computer Engineering (IJECE)</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="593" to="602" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Real-time video summarization on mobile</title>
		<author>
			<persName><forename type="first">S</forename><surname>Marvaniya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Damoder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Gopalakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">N</forename><surname>Iyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Soni</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICIP.2016.7532342</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Image Processing (ICIP)</title>
				<imprint>
			<date type="published" when="2016-09">2016. September 2016. 2016</date>
			<biblScope unit="page" from="176" to="180" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Self-Organizing Maps</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kohonen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1995">1995</date>
			<publisher>Springer-Verlag</publisher>
			<pubPlace>Berlin</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">The CIEDE2000 Color-Difference Formula: Implementation Notes, Supplementary Test Data, and Mathematical Observations</title>
		<author>
			<persName><forename type="first">G</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">N</forename><surname>Dalal</surname></persName>
		</author>
		<idno type="DOI">10.1002/col.20070</idno>
	</analytic>
	<monogr>
		<title level="j">Color Research &amp; Application</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="21" to="30" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Color Image Quality Assessment Based on CIEDE2000</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ming</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Yu</surname></persName>
		</author>
		<idno type="DOI">10.1155/2012/273723</idno>
	</analytic>
	<monogr>
		<title level="j">Advances in Multimedia</title>
		<imprint>
			<biblScope unit="volume">2012</biblScope>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
	<note>ID of paper 273723</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">FREAK: Fast Retina Keypoint</title>
		<author>
			<persName><forename type="first">A</forename><surname>Alahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ortiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vandergheynst</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2012.6247715</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Conference on Computer Vision and Pattern Recognition</title>
				<imprint>
			<date type="published" when="2012-06">2012. June 2012. 2012</date>
			<biblScope unit="page" from="510" to="517" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Machine Learning for High-Speed Corner Detection</title>
		<author>
			<persName><forename type="first">E</forename><surname>Rosten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Drummond</surname></persName>
		</author>
		<idno type="DOI">10.1007/11744023_34</idno>
	</analytic>
	<monogr>
		<title level="m">Computer Vision -ECCV 2006</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">A</forename><surname>Leonardis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Bischof</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Pinz</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="volume">3951</biblScope>
			<biblScope unit="page" from="430" to="443" />
		</imprint>
	</monogr>
	<note>ECCV 2006</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Video summarization by curve simplification</title>
		<author>
			<persName><forename type="first">D</forename><surname>Dementhon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kobla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Doermann</surname></persName>
		</author>
		<idno type="DOI">10.1145/290747.290773</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the sixth ACM international conference on Multimedia</title>
				<meeting>the sixth ACM international conference on Multimedia</meeting>
		<imprint>
			<date type="published" when="1998-09">September 1998. 1998</date>
			<biblScope unit="page" from="211" to="218" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Keyframe-based video summarization using Delaunay clustering</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mundur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Rao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yesha</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00799-005-0129-9</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal on Digital Libraries</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="219" to="232" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
