<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Graph-Based Approach and Analysis Framework for Hierarchical Content Browsing</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Markus</forename><surname>Rickert</surname></persName>
							<email>markus.rickert@cs.tu-chemnitz.de</email>
							<affiliation key="aff0">
								<orgName type="institution">Technische Universität Chemnitz</orgName>
								<address>
									<addrLine>Straße der Nationen 62</addrLine>
									<postCode>D-09111</postCode>
									<settlement>Chemnitz</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Benedikt</forename><surname>Etzold</surname></persName>
							<email>benedikt.etzold@cs.tu-chemnitz.de</email>
							<affiliation key="aff1">
								<orgName type="institution">Technische Universität Chemnitz</orgName>
								<address>
									<addrLine>Straße der Nationen 62</addrLine>
									<postCode>D-09111</postCode>
									<settlement>Chemnitz</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Maximilian</forename><surname>Eibl</surname></persName>
							<email>maximilian.eibl@cs.tu-chemnitz.de</email>
							<affiliation key="aff2">
								<orgName type="institution">Technische Universität Chemnitz</orgName>
								<address>
									<addrLine>Straße der Nationen 62</addrLine>
									<postCode>D-09111</postCode>
									<settlement>Chemnitz</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Graph-Based Approach and Analysis Framework for Hierarchical Content Browsing</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">DA9D9F80F1464032BE8C9F518226FB88</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T06:19+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Content-browsing</term>
					<term>video analysis</term>
					<term>video retrieval</term>
					<term>graph-based analysis</term>
					<term>visualization</term>
					<term>algorithm</term>
					<term>user interface</term>
					<term>Graphs and networks</term>
					<term>H.5.1 [Information Interfaces and Presentation]: Multimedia</term>
					<term>I.2.10 [Vision and Scene Understanding]: Video analysis</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Systems for multimedia retrieval have been an object of scientific research for many years. When it comes to presenting results to the user, however, many solutions disregard the set of problems connected to content delivery. Especially the time-constrained results of video retrieval systems need a different visualization. In this paper we present our solution for hierarchical content browsing of video files. Our workflow covers the phases of ingest, transcoding, automatic analysis, intellectual annotation and data aggregation. We describe an algorithm for the graph-based analysis of the content structure in videos. By identifying the requirements of professional users, we developed a user interface that enables access to retrieval results at different hierarchical abstraction levels.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>INTRODUCTION</head><p>Compared to other areas of information retrieval, the content browsing of audiovisual media bears special challenges. Videos are time-dependent. Usually, the user's intention is to find an element inside a video that depicts a certain semantic concept such as a person, topic, location or event. When querying a video database, the returned result is either a complete video item or a single element inside a video item, determined by its time position. Professional users are not mainly interested in finding only a single occurrence of the queried semantic concept. They want to gather the whole sequence related to their search query, e.g. to reuse it in a news report or for historical research. The user usually sees the retrieval result as a starting point for a further manual searching process inside the video item, operated by using the playback and seek functions of the player software.</p><p>In this paper, we present our approach to providing a hierarchical presentation of video items to support professional users while browsing and consuming the content of a media retrieval system. Given the primary focus on video content from television programs, this solution works best on video material edited in a post-production workflow. It is not intended for use on, e.g., surveillance videos. Our framework has been developed to provide automatic and intellectual annotation for historical television recorded on video tapes. The digitized master copies and their metadata can be searched and displayed in a web-based user interface (UI). Video shots and sequences can be explored as a hierarchical structure in the UI. The system is in use in a pilot project by the "media state authority of Saxony" (Sächsische Landesmedienanstalt) in Germany.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>USER REQUIREMENTS &amp; EXISTING WORKFLOWS</head><p>Our use case focuses on user groups in professions that rely heavily on reviewing large amounts of video data on a daily basis, such as journalists, editors or historians.</p><p>In a set of interviews, we asked a group of experts to describe their daily work. Thereby, we especially focused on those areas that deal with the examination of the results of archive queries. Other fields of interest were the process of querying, preferred software solutions and the planning of new reports or videos. Our findings were subsequently merged into an extensive workflow that was used for identifying different problem areas.</p><p>Altogether, we spoke to three experts from three different German TV stations, who all work in the field of TV journalism. Their similar statements and their reports on the workflows of other professionals and institutions give reason to believe that our workflow is representative of a significant part of this field of work. Conducting surveys and interviews <ref type="bibr" target="#b14">[17,</ref><ref type="bibr" target="#b15">18]</ref>, we identified some of the main problems they face as part of their working routine: • Metadata is often either fragmentary or missing completely. While standards or recommendations exist in most professions, they are usually ignored due to bottlenecks in time and personnel. • Video data is normally stored in its final state, e.g. a film that has already been edited in post-production. In the case of search queries returning more than one result, users often receive a single file containing a queue of all relevant video files.</p><p>• In TV production, time pressure is always high because of narrow schedules and the need for instant coverage of current events.</p><p>Specific software solutions addressing these issues do not yet exist in professional scenarios. This leads to a highly inefficient workflow: precision rates are usually low because of the described storage modalities and the lack of precise metadata. Therefore, numerous files of comparatively large size have to be inspected in a short period of time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Classical User Interfaces</head><p>The software in use is normally designed either for the simple consumption of video content (e.g. VLC Media Player or Apple QuickTime) or for the tasks of professional post-production (e.g. Avid Media Composer or Adobe Premiere). Both approaches are based on a perspective that emphasizes the linear structure of the completed video during or after the process of editing. By showing an ordered sequence of single shots, they present the content in consideration of the editor's intention, but not of the needs of an expert using a retrieval system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Requirements</head><p>Based on these findings, we compiled a list of requirements that have to be met by a user interface to improve the user experience significantly:</p><p>• Metadata is usable for both video processing and visualization. • Information can be displayed based on the video's structure. • Richness of detail can be increased for single segments of the video. • The video itself can be accessed through any bit of information displayed in the UI. • Relevant segments of the video can be used in later steps of the user's workflow, e.g. editing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>FRAMEWORK</head><p>Our framework provides functionalities for audio and video analysis, manual annotation, data warehousing, retrieval and visualization. It uses specialized components for each aspect. The core "dispatcher" controls the analysis process, the allocation of work units and data aggregation. As deduced from <ref type="bibr" target="#b5">[8]</ref>, the requirements for a scalable analysis system based on heterogeneous scientific algorithms in the field of audio and video analysis are complex. The framework is presented here in its complete workflow for the first time. Earlier publications covered only aspects of distinct components. A predecessor partial framework was presented in <ref type="bibr" target="#b1">[3]</ref>.</p><p>Our framework needs to support individual solutions, programmed in varying languages, based on different operating system environments and requesting various quantities of resources. Therefore, it runs in an environment of virtual machines on a cluster of five Intel Xeon dual-quad-core host servers. The main components are written in C#/.NET and make use of a service-oriented architecture and web services. This provides a redundant and hardware-independent service, while supporting a variety of separate execution environments for each component. It also allows for a possible scale-out with additional hardware if needed.</p><p>The execution workflow for an individual video tape or file consists of five phases, as depicted in Figure <ref type="figure" target="#fig_0">1</ref>. Within each stage, a maximum of concurrency is intended. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. Digitization and Transcoding</head><p>The very first step is an incoming inspection of each video tape and the generation of a unique identifier. Our ID system consists of a 12-byte block and can be represented and displayed for human reading as hexadecimal digits in four segments plus a calculated check character (e.g. 0000-0074-0000-0026-Z). After the initial logging, the video tape is digitized with an automatic robot ingest system as described in <ref type="bibr" target="#b9">[12]</ref>. It runs batch jobs in parallel on up to six tape players.</p><p>The resulting digital master file is encoded with the broadband IMX50 video codec in an MXF container for archiving and data exchange. As defined by <ref type="bibr" target="#b5">[8]</ref>, we create proxy versions of the archive file by transcoding it. For automatic annotation, analysis and as a preview video for the web UI, we use an H.264 codec at level 4.1 wrapped in an MP4 container.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. Automatic Analysis and Annotation</head><p>The created analysis proxy video is transferred to the analysis cluster. The dispatcher schedules the analysis of each video file as a sequence of consecutive analysis steps. For performance reasons, each component can be instantiated multiple times. In the common configuration, the system runs with up to 12 individual virtual machines. The analysis components are controlled by the dispatcher via web-service interfaces.</p></div>
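The scheduling pattern described above, consecutive analysis steps per video while several videos are processed concurrently, can be sketched as follows. This is a minimal illustration with hypothetical step names; the actual dispatcher is a C#/.NET web-service component and is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor

# Ordered analysis steps each video must pass through (Section II).
# The step names are stand-ins for the real web-service calls.
ANALYSIS_STEPS = ["shot_detection", "face_detection",
                  "text_extraction", "speech_recognition"]

def analyze_video(video_id, run_step):
    """Run all analysis steps for one video, strictly in order."""
    results = {}
    for step in ANALYSIS_STEPS:
        results[step] = run_step(step, video_id)
    return results

def dispatch(video_ids, run_step, max_workers=12):
    """Videos are processed concurrently, up to the number of analysis
    instances; the steps of a single video stay sequential."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {v: pool.submit(analyze_video, v, run_step) for v in video_ids}
        return {v: f.result() for v, f in futures.items()}
```

The per-video sequencing reflects that later components (e.g. face detection) consume the output of earlier ones (e.g. key frames from shot detection).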
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Shot detection component</head><p>The shot detection is the first component in the workflow. It provides a segmentation of the continuous video stream into parts of uninterrupted camera recordings (shots). The algorithms developed by <ref type="bibr" target="#b6">[9]</ref> and <ref type="bibr" target="#b7">[10]</ref> are based on calculating the cumulative error rate of individual motion vectors for each image block between two successive frames. The component's output is a list of metadata for every detected shot. Key frames of the shot are extracted for use in the UI and successive components.</p></div>
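A much simplified sketch of this kind of block-based cut detection is given below. It replaces the motion-compensated error rate of [9, 10] with a plain per-block difference measure, so the thresholds and the detector itself are illustrative only, not the published algorithm.

```python
def block_error(prev, curr, y0, x0, bs):
    """Mean absolute pixel difference for one image block."""
    total = 0
    for y in range(y0, y0 + bs):
        for x in range(x0, x0 + bs):
            total += abs(curr[y][x] - prev[y][x])
    return total / (bs * bs)

def detect_cuts(frames, block_size=8, block_thresh=30.0, frame_ratio=0.5):
    """Return indices i where a hard cut between frame i-1 and frame i
    is assumed: a cut is declared when the share of blocks with a high
    error rate exceeds frame_ratio. Frames are 2-D grayscale arrays."""
    cuts = []
    if not frames:
        return cuts
    h, w = len(frames[0]), len(frames[0][0])
    n_blocks = (h // block_size) * (w // block_size)
    for i in range(1, len(frames)):
        failing = 0
        for y0 in range(0, h - block_size + 1, block_size):
            for x0 in range(0, w - block_size + 1, block_size):
                if block_error(frames[i - 1], frames[i], y0, x0, block_size) > block_thresh:
                    failing += 1
        if failing > frame_ratio * n_blocks:
            cuts.append(i)
    return cuts
```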
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Face detection component</head><p>The face detection component uses the key frames from the shot detection to mark bounding boxes around each detected face. The algorithm, developed by <ref type="bibr" target="#b7">[10]</ref>, is optimized for high precision and specialized for data corpora from local television broadcasts. Its output is the metadata of the bounding box around each detected face and a sample image for each detected face.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Text extraction component</head><p>The text extraction component detects areas of overlay text boxes within the video stream. The algorithm by <ref type="bibr" target="#b8">[11]</ref> uses a weighted discrete cosine transform (DCT) to detect macroblocks by finding regions located in the medium frequency spectrum. By normalizing the eigenvalues, a mask is calculated which is used to separate the text box from the rest of the image. For the transformation of text into characters, the software tesseract-ocr is used (https://code.google.com/p/tesseractocr/). The component creates key frame samples of the detected text boxes, metadata about the locations of the text boxes and the text extracted by OCR (optical character recognition).</p></div>
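The frequency-domain idea can be illustrated with a naive DCT: overlay text produces strong medium-frequency coefficients in an image block, while flat background does not. The weighting and eigenvalue normalization of [11] are not reproduced here; the mid-band energy measure below is a simplified stand-in with illustrative band limits.

```python
import math

def dct_1d(v):
    """Naive (unnormalized) DCT-II of a 1-D sequence."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def dct_2d(block):
    """Separable 2-D DCT-II: transform rows first, then columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[y][x] for y in range(len(block))])
            for x in range(len(block[0]))]
    # transpose back so result[row_freq][col_freq] indexes like the input
    return [[cols[x][y] for x in range(len(cols))] for y in range(len(cols[0]))]

def mid_band_energy(block, lo=2, hi=6):
    """Energy of the medium-frequency DCT coefficients of a block;
    high-contrast overlay text yields a large value, flat areas do not."""
    c = dct_2d(block)
    return sum(c[y][x] ** 2
               for y in range(len(c)) for x in range(len(c[0]))
               if (x + y) >= lo and hi >= (x + y))
```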
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Speech Recognition</head><p>The speech recognition component makes use of the speaker change recognition method described by <ref type="bibr" target="#b3">[6]</ref> and extended by <ref type="bibr" target="#b1">[3]</ref>. It provides data for the differentiation of individual voices and pauses. By applying Gaussian Mixture Models, individual speakers can be trained and recognized. The detected utterances of individual speakers are transferred to an automatic speech recognition (ASR) software. The resulting data provides not only the recognized words but also metadata about the time position and duration of each utterance and an ID code for the identification and re-recognition of the speaker.</p></div>
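A strongly simplified sketch of speaker-change detection is shown below. Instead of the Gaussian Mixture Models over spectral features used by the component, it models each analysis window of a 1-D feature stream as a single Gaussian and flags a change when the symmetric divergence between adjacent windows exceeds a threshold; window size and threshold are illustrative assumptions.

```python
from statistics import mean, pvariance

def gauss_divergence(a, b):
    """Symmetric KL divergence between two univariate Gaussians,
    each estimated from a window of feature values."""
    m1, v1 = mean(a), pvariance(a) + 1e-9
    m2, v2 = mean(b), pvariance(b) + 1e-9
    return 0.5 * ((v1 / v2) + (v2 / v1) - 2.0
                  + (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2))

def speaker_changes(features, window=50, threshold=8.0):
    """Slide two adjacent windows over the feature stream and report
    frame indices where the Gaussian statistics diverge strongly."""
    changes = []
    i = window
    while len(features) >= i + window:
        left = features[i - window:i]
        right = features[i:i + window]
        if gauss_divergence(left, right) > threshold:
            changes.append(i)
            i += window          # skip past the detected change point
        else:
            i += 1
    return changes
```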
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. Intellectual Annotation</head><p>This framework is not only used for the demonstration of our solutions. It is in productive use for archiving historical tape-based material. This constitutes the need for additional intellectual annotation, since today's automatic annotation can provide support, but it cannot entirely substitute the intellectual work of a human. Secondly, the manually annotated metadata is used as training and test sets for the development of new algorithms. Therefore, we collect metadata for each video tape in the form of classical intellectual annotation as it is already implemented in media archives.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Scene &amp; Topic Annotation</head><p>We developed a web-based annotation tool for the intellectual annotation of the analyzed video files. To support the professional user, the tool makes use of the detected video shots. The video is presented in slices of camera shots, and the video player repeats the current shot in a loop. This makes it easier for the user to fill out all input fields without constantly dealing with the player controls. When the user is finished with a shot, they can jump to the next one. The user marks the boundaries of storyline sequences as collections of multiple shots and adds a variety of bibliographical metadata such as the title and type of video content, topics and subjects in terms of individuals, locations, institutions, things, creations (like art), and other metadata useful either for information retrieval or as development test data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. Data Aggregation</head><p>In the past, we analyzed video assets only for isolated scientific experiments. To process large quantities of videos now, the integration of results from different analysis algorithms becomes a key challenge. For our environment of use cases, a data-warehouse solution is needed to aggregate more than only the results of video analysis. On the one hand, it needs to incorporate the metadata supplied by its sources, like production information and data from TV broadcasters. On the other hand, it has to provide its data as an export artifact which is compatible with the formats and conventions used by archiving facilities and institutes.</p><p>A special challenge was to find a scheme which complies with the way video content producers and archives structure their data, and which includes technical data such as feature vectors and audiovisual classifications. Our selected database scheme is adapted from a common standard for video documentation (REM, http://rmd.dra.de/remid.php?id=REM_APR_3) developed for the German public television. We combined mandatory and optional metadata classes with the goal of maintaining a maximum of compatibility.</p></div>
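The aggregation step can be pictured as merging the supplied source metadata and the analysis results into one export record with mandatory and optional field classes. The field names below are hypothetical; the real scheme follows the referenced documentation standard and is far richer.

```python
import json

# Hypothetical field sets; the real scheme is adapted from a German
# public-television documentation standard and is not reproduced here.
MANDATORY = {"tape_id", "title", "duration_s"}
OPTIONAL = {"topics", "speakers", "shot_count"}

def aggregate(source_metadata, analysis_results):
    """Merge supplied production metadata with analysis output into one
    export record; unknown fields are dropped, missing mandatory
    fields raise an error."""
    record = {k: v for k, v in source_metadata.items() if k in MANDATORY | OPTIONAL}
    record.update({k: v for k, v in analysis_results.items() if k in OPTIONAL})
    missing = MANDATORY - set(record)
    if missing:
        raise ValueError("mandatory metadata missing: %s" % sorted(missing))
    return json.dumps(record, sort_keys=True)
```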
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. Content Delivery and Visualization</head><p>For data exchange and archiving, the digital master file, the proxy files and the metadata are exported to an LTO tape library. Search and content access for the user are provided by a web server. The user interface is used for web-based intellectual annotation, controlling the analysis process, information retrieval and content browsing. The UI is able to handle multiple tenants and has a scalable interface for different display resolutions and devices. Each function runs in its own web app.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>GRAPH-BASED VIDEO CLUSTERING</head><p>During the analysis and automatic annotation, we extract segments of camera shots from the video stream. This shot segmentation is helpful for content browsing, but it suffers from over-segmentation. The structure is too detailed for the visualization of the actions inside a video. The user needs to be able to search for scenes or sequences as basic units.</p><p>Different approaches for the clustering or grouping of related shots have been published. A detailed survey on the field of video segmentation is given by <ref type="bibr" target="#b10">[13]</ref>.</p><p>A common strategy in many clustering approaches is to find structures and similarities in the given video. The similarity measurement can be based on the classification of e.g. motion vectors, dominant color, edge histogram and editing tempo. By calculating the similarity of consecutive shots, groups can be identified. "Overlapping links", introduced by <ref type="bibr" target="#b13">[16]</ref>, was one of the early strategies to find structures inside videos. It was extended by <ref type="bibr" target="#b16">[19,</ref><ref type="bibr" target="#b17">20]</ref>. The algorithm can cluster similar shots and the shots lying in between as Logical Shot Units (LSUs) <ref type="bibr" target="#b18">[21]</ref>.</p><p>Our solution was inspired by overlapping links, the concept of a Scene-Transition-Graph (STG) <ref type="bibr" target="#b11">[14,</ref><ref type="bibr" target="#b19">22]</ref> and the scene detection solution published by <ref type="bibr" target="#b12">[15]</ref>. These approaches are still the subject of current publications and optimizations like <ref type="bibr" target="#b20">[23,</ref><ref type="bibr" target="#b21">24,</ref><ref type="bibr" target="#b22">25]</ref>. Shots are represented as nodes, transitions as edges. Shots with a high similarity are clustered into group nodes. This process leads to a digraph with cycles.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Data Structure</head><p>Our proposed solution is derived from the concept of shot-transition-graphs. We use a weighted directed graph for the representation of hierarchical sequence structures in a video. Edges represent transitions between distinct shots or sequences. Nodes represent single shots or sub-segments containing a new graph of shots inside. See Table <ref type="table">1</ref>.</p></div>
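The node and edge types of Table 1 can be sketched as follows; the attribute names are our own shorthand, and only a subset of the listed metadata is modeled. An aggregated node (Va) carries a sub-graph of child nodes and edges, a singular node (Vs) does not.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """Vs (a single camera shot) or Va (an aggregated sequence/group).
    A node with children acts as an aggregated node with a sub-graph."""
    node_id: int
    start: float                     # start time of the earliest sub-element
    end: float                       # end time of the latest sub-element
    keyframe: Optional[str] = None   # representative keyframe reference
    children: List["Node"] = field(default_factory=list)
    child_edges: List["Edge"] = field(default_factory=list)

    @property
    def is_aggregated(self):
        return bool(self.children)

@dataclass
class Edge:
    """Es/Ea: directed transition between two nodes; aggregated edges
    carry a weight counting merged parallel transitions."""
    source: int
    target: int
    weight: int = 1
```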
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sequence-Graph-Algorithm</head><p>In order to access the video content in a graph-based hierarchical structure, we create a directed graph representing the video's shots and sequences. For each annotated sequence, the shots belonging to it are aggregated into a group vertex; shots not belonging to any sequence remain as singular vertices. Inside each group vertex, a temporary graph of the contained shots and their similarity links is built and its strongly connected components are calculated. The shots of each component are aggregated into a similarity vertex, whose edges are reconnected to the corresponding predecessors and successors. Finally, duplicate edges sharing the same source and target vertex are merged by incrementing the weight of the remaining edge. The resulting Sequence-Graph (Algorithm in Figure <ref type="figure" target="#fig_1">2</ref>) represents all content sequences as aggregated nodes, together with the remaining singular nodes not belonging to a sequence, on the first level. Inside each aggregated node, a sub-graph is created on the second level, representing the chain of shots forming a sequence.</p></div>
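The core of the similarity grouping, collapsing each strongly connected component of the shot graph into one group vertex and merging parallel edges by incrementing a weight, can be sketched in a simplified, self-contained form. The function names and the edge-list representation are ours, not the framework's.

```python
def strongly_connected_components(n, edges):
    """Kosaraju's algorithm on vertices 0..n-1; edges is a list of
    (source, target) pairs. Returns a component id per vertex."""
    fwd = [[] for _ in range(n)]
    rev = [[] for _ in range(n)]
    for s, t in edges:
        fwd[s].append(t)
        rev[t].append(s)

    order, seen = [], [False] * n
    def postorder(v):
        # iterative DFS collecting vertices in finish order
        stack = [(v, 0)]
        seen[v] = True
        while stack:
            u, i = stack.pop()
            if i == len(fwd[u]):
                order.append(u)
                continue
            stack.append((u, i + 1))
            w = fwd[u][i]
            if not seen[w]:
                seen[w] = True
                stack.append((w, 0))
    for v in range(n):
        if not seen[v]:
            postorder(v)

    comp, c = [-1] * n, 0
    for v in reversed(order):
        if comp[v] == -1:
            stack = [v]
            comp[v] = c
            while stack:
                u = stack.pop()
                for w in rev[u]:
                    if comp[w] == -1:
                        comp[w] = c
                        stack.append(w)
            c += 1
    return comp

def group_shots(n, edges):
    """Collapse each strongly connected component into one group vertex
    and merge parallel edges by incrementing their weight."""
    comp = strongly_connected_components(n, edges)
    weights = {}
    for s, t in edges:
        if comp[s] != comp[t]:
            key = (comp[s], comp[t])
            weights[key] = weights.get(key, 0) + 1
    groups = {}
    for v in range(n):
        groups.setdefault(comp[v], []).append(v)
    return groups, weights
```

Recurring shots (e.g. shot/reverse-shot patterns) form cycles in the similarity-linked shot graph, so they end up in one strongly connected component and thus in one group vertex.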
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Similarity-Graph-Algorithm</head><p>One important feature of videos from film and television is the presence of recurring images. This occurs especially when interviews or dialogs are recorded and the same individuals are shown several times. In terms of film grammar, this is called the shot/reverse-shot method. See Figure <ref type="figure" target="#fig_4">4</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Resulting Graph Structure</head><p>The final resulting graph represents the video in a hierarchical structure. On the first level, all sequences and all standalone shots can be accessed. By selecting a sequence, all shots and similarity groups inside the selected sequence can be accessed. If a shot shows a similar image multiple times, each instance of this image is aggregated to a group. Recurring shots are recognizable by cyclic structures of the edges. On selecting a similarity group, the individual instances of the similar shots can be accessed. The results of the two clustering steps and the final 3-layer graph are shown in Figure <ref type="figure" target="#fig_5">5</ref>. Figure <ref type="figure" target="#fig_7">6</ref> shows the visualization of a single layer as used in the UI. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>GRAPH-BASED USER INTERFACE</head><p>UI approaches with the purpose of addressing structures in video content have been developed mainly in the fields of film studies and human-computer interaction (HCI). They normally focus on certain key aspects like the analysis, description <ref type="bibr" target="#b2">[4]</ref> or summarization <ref type="bibr" target="#b0">[1]</ref> of content. From their perspective, the temporal order of a video's single sequences is an important bit of information and therefore one of the fundamental principles of their modus operandi.</p><p>By shifting the main focus to the video's structure, we managed to design a user interface that makes it possible to quickly get an overview of a whole file without losing any detail.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Graph-based User Interface</head><p>In order to avoid the issues reported by our user groups, as described above, we decided to organize all available information in a way that emphasizes the video's structure. Richness of detail is increased from top (overview) to bottom (all details and metadata). The presented metadata types are summarized in Table <ref type="table" target="#tab_1">2</ref>. The following interface description is connected to the layers presented in the Figures <ref type="figure" target="#fig_7">6 and 7</ref>.</p><p>I. Video player - The player can be used to examine the single segments in any intended way. In order to provide permanent availability, it remains at the top of the screen when scrolling to the lower parts of the UI. II. Current graph - Its nodes represent either a single shot group or a cluster of related groups. By using a simple directed graph for the top level, we were able to display all nodes in a familiar left-to-right order. Every node contains a representative image sample and some basic information on its content. The existence of child graphs is color-coded (blue) on this level of detail. III. Collapsible container - It is used to display a more granular child graph belonging to a certain top-level node. IV. Queue - Nodes can be transferred into a drag-and-drop-operated queue of cards that offer a more detailed view of their content. Furthermore, the queue can be used to manage a collection of shots or shot groups that can be watched directly or exported for further use, e.g. in editing software.</p><p>V. Details view - It shows all data that is available for one of the cards. It consists of several lines displaying key frames, detected faces, spoken text (voice-over) and text overlays.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>EVALUATION</head><p>We performed a first evaluation of our approach by using a combination of baseline tests and questionnaires. To this end, we designed a set of tasks comparable to those described by our group of experts. A screenshot of the graph-based UI is depicted in Figure <ref type="figure">7</ref>. The content used for the evaluation consists of real television news programs produced during the early to mid-1990s. It was archived on VHS video tapes. The actual test set was composed by randomly selecting 1377 minutes of this video material.</p><p>Four expert users were asked to perform searching tasks. They were given short descriptions of 27 randomly picked video sequences with durations between 5 seconds and 10 minutes. The task was to find the described sequences in the corresponding video file and to write down the time codes of the sequence boundaries. Searching tasks like these are quite comparable to the real-life work of video editors, because video content in tape-based archives is only marginally documented. Manual content browsing in a video player and non-linear editing software (NLE) is used to find sequences of video content reusable in new video clips. For comparison, the searching tasks were performed by using our graph-based user interface, VLC Media Player and Adobe Premiere Pro (CS 6). For each task, the time needed for completion was recorded. Overall, 108 different search operations were performed. Furthermore, differences in the accuracy of the time codes were taken into account. With the graph-based UI, the average duration per searching task was 93 seconds. When searching with VLC (average: 122 s) and Premiere Pro (average: 179 s), significantly more time was needed (Figure <ref type="figure" target="#fig_8">8</ref>). As a result, our graph-based solution outperformed VLC and Premiere Pro. In VLC, 27.8% more time was needed. Searching in Premiere Pro needed 48.5% more time. 
One reason for the weak performance of Premiere Pro could be its zoom function: it was heavily used by the testers but led to longer searching times.</p><p>One disadvantage of the graph-based UI turned out to be the fact that entities and events inside a video shot cannot be isolated. They are bound to the boundaries of the surrounding shot and cannot be exported independently. In terms of perceiving the actual structure of the video, all users reported gaining a deeper understanding when using our approach than when using VLC or Premiere Pro.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>FUTURE WORK</head><p>The next step for the analysis and graph-based clustering will be the substitution of the manual annotation of video sequences by an automatic sequence segmentation algorithm. Surveys on the state of the art in video segmentation indicate that a multimodal fusion of the analysis results can be used to cluster successive shots into video sequences. Most approaches use visual similarity features. But as discussed in <ref type="bibr" target="#b4">[7]</ref>, concepts and rules from the production of video content can be useful for finding sequences or scenes inside video content.</p><p>The graph-based user interface will be evaluated in additional user tests, exploring whether its use is beneficial for non-professional users as well. A second study will evaluate which text-based metadata should be presented at the different elements to meet the needs of the users. Currently, extensions of the UI are under development to enable a sync function. It will allow adapting the presented graph elements when the current position in the video shifts to the next sequence. This will give the UI a two-sided interaction between the video player and the graph structure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>CONCLUSION</head><p>In this paper, we presented our concept of a hierarchical presentation of video items in a graph-based structure. We described our framework, which incorporates video and audio analysis, intellectual annotation and graph analysis to construct a multi-layer structure for content consumption. Our web-based UI shows how classical sequential content browsing in videos can be extended to incorporate the inner structures and relations of the video's sub-elements. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Framework Workflow in its five Phases.</figDesc><graphic coords="2,314.28,172.20,234.96,218.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Procedure to create a Sequence-Graph.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>𝐸𝑠 Singular Edge - Directed edge between two Singular Nodes (𝑉𝑠), representing the transition from a camera shot to its successor in the sequence of the video.
𝑉𝑠 Singular Node - A single continuous camera shot.
𝐸𝑎 Aggregated Edge - Directed edge between two Aggregated Nodes (𝑉𝑎), or between a Singular Node and an Aggregated Node. It represents a set of interrelated Singular Nodes, i.e. a sub-graph containing a scene in the video.
𝑉𝑎 Aggregated Node - A group of Singular or Aggregated Nodes forming a sequence or sub-graph.
𝐶 Color-Similarity-Group - A list of shots grouped by visual similarity. The similarity is measured by a combination of the MPEG-7 descriptors Edge Histogram (EHD) and Color Layout (CLD) [10 pp. 169].
𝑆𝑞 Sequence-List - A list of shots grouped by their affiliation to a sequence, found by intellectual annotation. A sequence represents a segment of continuous action or location in a video.</figDesc></figure>
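The vertex types listed above can be sketched as plain records. A minimal Python illustration under assumed field names taken from the metadata in Table 1 (the authors' actual data structures are not published):

```python
# Hedged sketch of the Sequence-Graph node types; field names are
# assumptions drawn from the metadata listed in Table 1.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SingularNode:
    """Vs: a single continuous camera shot."""
    shot_number: int
    start: float                               # start time in seconds
    end: float                                 # end time in seconds
    keyframes: List[str] = field(default_factory=list)

@dataclass
class AggregatedNode:
    """Va: a group of shots forming a sequence or sub-graph."""
    members: List[SingularNode] = field(default_factory=list)

    def start(self) -> float:
        # start-time of the earliest sub-element (cf. Table 1)
        return min(m.start for m in self.members)

    def end(self) -> float:
        # end-time of the latest sub-element
        return max(m.end for m in self.members)

seq = AggregatedNode([SingularNode(1, 0.0, 4.2), SingularNode(2, 4.2, 9.8)])
# seq.start() -> 0.0, seq.end() -> 9.8
```

The aggregate derives its start and end times from its members, matching the rule in Table 1 that a 𝑉𝑎 carries the start of its earliest and the end of its latest sub-element.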
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Table 1 :</head><label>1</label><figDesc>Data structures: metadata and parameters.
𝐸𝑠:
- Duration of the transition.
- Type of transition (cut, wipe, dissolve, fade), as described in the taxonomy by [7].
𝑉𝑠:
- Number of the shot.
- Times of start, duration and end of the shot.
- Extracted keyframes of the first and last frame.
- Extracted keyframes from face detection.
- Data from text extraction.
𝐸𝑎:
- Weight.
𝑉𝑎:
- A representative keyframe.
- Start-time of the earliest sub-element.
- End-time of the latest sub-element.
- Metadata of the speech recognition.
- Annotation: topic, location, subjects, individuals etc.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Similarity-Graph Procedure</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: 𝑮 𝟏 (Sequence-Graph), 𝑮 𝟐 (Similarity-Graph and Sequence-Graph)</figDesc><graphic coords="5,311.16,417.84,242.52,201.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Visualization of vertices and edges of the Sequence-Graph</figDesc><graphic coords="5,51.72,72.00,491.40,69.96" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Multilayer-View of a Graph-based UI. Figure 7: Schematic View of the UI.</figDesc><graphic coords="6,59.76,489.96,233.04,213.12" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Evaluation results.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>1. for each detected shot and transition 𝑠ℎ 𝑖 from 𝑆ℎ do
2.   add new vertex 𝑉𝑠 𝑖 to 𝐺 1
3.   add new edge 𝐸𝑠 𝑖 to 𝐺 1 connecting 𝑉𝑠 𝑖 and 𝑉𝑠 𝑖+1
4. end for each
5. for each sequence 𝑠𝑞 𝑗 from 𝑆𝑞 do
6.   create new 𝑉𝑎 𝑗 in 𝐺 1
7. end for each
8. for each 𝑉𝑠 𝑖 from 𝐺 1 do</figDesc><table /><note>9.   if 𝑉𝑠 𝑖 belongs to sequence 𝑠𝑞 𝑗 then
10.    remove 𝑉𝑠 𝑖 and its out-edges and in-edges from 𝐺 1
11.    add 𝑉𝑠 𝑖 and its edges as sub-elements to the 𝑉𝑎 𝑗
12.  end if
13. end for each
14. for each 𝐸𝑠 𝑖 removed from 𝐺 1 do
15.   add new edge 𝐸𝑎 𝑖 in 𝐺 1 connecting 𝑉𝑎 𝑗 with the predecessors resp. successors of 𝑉𝑠 𝑖
16. end for each
17. for each 𝐸𝑎 𝑘 from 𝐺 1 do
18.   if more than one 𝐸𝑎 exists with the same source-vertex and the same target-vertex as 𝐸𝑎 𝑘 then
19.     remove all duplicates and increment the weight of 𝐸𝑎 𝑘
20.   end if
21. end for each</note></figure>
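The aggregation steps above can be sketched in ordinary code. A hedged Python sketch, assuming a simple edge-list graph representation (the helper names and data layout are illustrative, not the authors' implementation):

```python
# Sketch of the Sequence-Graph procedure: shots become singular nodes,
# sequence members are absorbed into aggregated nodes, edges are
# re-routed and duplicates merged into a weight (steps 1-21 above).
from collections import defaultdict

def build_sequence_graph(shots, sequences):
    """shots: ordered list of shot ids; sequences: seq_id -> set of shot ids."""
    # Steps 1-4: one singular node per shot, one edge per transition.
    nodes = list(shots)
    edges = [(shots[i], shots[i + 1]) for i in range(len(shots) - 1)]

    # Steps 5-13: map each shot that belongs to a sequence onto an
    # aggregated node; keep only shots that were not absorbed.
    owner = {}
    for seq_id, members in sequences.items():
        for shot in members:
            owner[shot] = ('agg', seq_id)
    nodes = [('agg', s) for s in sequences] + \
            [n for n in nodes if n not in owner]

    # Steps 14-16: re-route edges that touched an absorbed shot to its
    # aggregate; steps 17-21: merge duplicate edges, counting weight.
    weight = defaultdict(int)
    for src, dst in edges:
        src, dst = owner.get(src, src), owner.get(dst, dst)
        if src != dst:                  # edges inside one aggregate vanish
            weight[(src, dst)] += 1
    return nodes, dict(weight)

nodes, edges = build_sequence_graph(['s1', 's2', 's3', 's4'],
                                    {'q1': {'s2', 's3'}})
# nodes: [('agg', 'q1'), 's1', 's4']
# edges: {('s1', ('agg', 'q1')): 1, (('agg', 'q1'), 's4'): 1}
```

Edges that both start and end inside the same aggregate disappear, while parallel edges between the same pair of nodes collapse into one weighted 𝐸𝑎, as in steps 17-21.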
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 : Metadata available in the data structures.</head><label>2</label><figDesc></figDesc><table /><note>The vertices belonging to a sequence are aggregated to build a second level in the hierarchy. Metadata created during the intellectual annotation drives the aggregation.
Procedure Similarity-Graph
input: List of Color-Similarity-Groups 𝐶, graph 𝐺 1
output: Sequence-Graph with Similarity-Subgraphs 𝐺 2
1. for each 𝑉𝑎 𝑖 from 𝐺 1 do
2.   create new temporal graph 𝐺𝑡 𝑖
3.   for each similarity group 𝑠𝑞 𝑗 from 𝑆𝑞 do
4.     if one or more sub-vertices 𝑉𝑠 𝑖 of 𝑉𝑎 𝑖 are ∈ 𝑠𝑞 𝑖 then
5.       add new group-vertex 𝑉𝑎 𝑗 to 𝐺𝑡 𝑗
6.     end if
7.</note></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGMENTS</head><p>Parts of this work were accomplished in the research project validAX funded by the German Federal Ministry of Education and Research.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An overview of video shot clustering and summarization techniques for mobile applications</title>
		<author>
			<persName><forename type="first">N</forename><surname>Adami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Benini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Leonardi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc MobiMedia &apos;06</title>
				<meeting>MobiMedia &apos;06</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">produce. annotate. archive. repurpose --: accelerating the composition and metadata accumulation of tv content</title>
		<author>
			<persName><forename type="first">R</forename><surname>Knauf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kürsten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kurze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ritter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Berger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Heinich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eibl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. AIEMPro&apos;11</title>
				<meeting>AIEMPro&apos;11</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="30" to="36" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Einführung in die systematische Filmanalyse</title>
		<author>
			<persName><forename type="first">H</forename><surname>Korte</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
			<publisher>Schmidt</publisher>
			<biblScope unit="volume">40</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Speaker change detection and tracking in real-time news broadcasting analysis</title>
		<author>
			<persName><forename type="first">L</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-J</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc MULTIMEDIA&apos;02</title>
				<meeting>MULTIMEDIA&apos;02</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="602" to="610" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A proposal for a taxonomy of semantic editing devices to support semantic classification</title>
		<author>
			<persName><forename type="first">M</forename><surname>Rickert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eibl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. RACS 2014</title>
				<meeting>RACS 2014</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="34" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Evaluation of media analysis and information retrieval solutions for audio-visual content through their integration in realistic workflows of the broadcast industry</title>
		<author>
			<persName><forename type="first">M</forename><surname>Rickert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eibl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. RACS 2013</title>
				<meeting>RACS 2013</meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="118" to="121" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">An Extensible Tool for the Annotation of Videos Using Segmentation and Tracking</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ritter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eibl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Lecture Notes in Computer Science. Springer</title>
		<imprint>
			<biblScope unit="page" from="295" to="304" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">Marc</forename><surname>Ritter</surname></persName>
		</author>
		<idno>119-144</idno>
		<title level="m">Optimierung von Algorithmen zur Videoanalyse</title>
				<meeting><address><addrLine>Chemnitz</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="187" to="213" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Textdetektion und -extraktion mit gewichteter DCT und mehrwertiger Bildzerlegung</title>
		<author>
			<persName><forename type="first">S</forename><surname>Heinich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. WAM 2009</title>
				<meeting>WAM 2009<address><addrLine>TU-Chemnitz</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="151" to="162" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A Support Framework for Automated Video and Multimedia Workflows for Production and Archive</title>
		<author>
			<persName><forename type="first">R</forename><surname>Manthey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Herms</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ritter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Storz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eibl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. HCI International</title>
				<meeting>HCI International</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013. 2013</date>
			<biblScope unit="page" from="336" to="341" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">State-of-the-art and future challenges in video scene detection: a survey</title>
		<author>
			<persName><forename type="first">M</forename><surname>Del Fabro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Böszörmenyi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Multimedia Systems</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="427" to="454" />
			<date type="published" when="2013">2013</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Time-constrained clustering for segmentation of video into story units</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Yeung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B.-L</forename><surname>Yeo</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICPR.1996.546973</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. 13th International Conference on Pattern Recognition</title>
				<meeting>13th International Conference on Pattern Recognition</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="375" to="380" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Video summarization and scene detection by graph modeling</title>
		<author>
			<persName><forename type="first">C.-W</forename><surname>Ngo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-F</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-J</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Circuits and Systems for Video Technology</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="296" to="305" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Automated high-level movie segmentation for advanced video-retrieval systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hanjalic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Lagendijk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Biemond</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Circuits and Systems for Video Technology</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="580" to="585" />
			<date type="published" when="1999">1999</date>
			<publisher>IEEE</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Visual String of Reformulation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Berger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kürsten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eibl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc HCI International</title>
				<meeting>HCI International</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="volume">5618</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Design Thinking for Search User Interface Design</title>
		<author>
			<persName><forename type="first">A</forename><surname>Berger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc EuroHCIR2011</title>
				<meeting>EuroHCIR2011<address><addrLine>Newcastle</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="38" to="41" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A new approach for high level video structuring</title>
		<author>
			<persName><forename type="first">Y.-M</forename><surname>Kwon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-J</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I.-J</forename><surname>Kim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Multimedia and Expo</title>
				<meeting>Multimedia and Expo</meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="773" to="776" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Automatic segmentation of news items based on video and audio features</title>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Gao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Advances in Multimedia Information Processing, PCM 2001</title>
				<meeting>Advances in Multimedia Information Processing, PCM 2001</meeting>
		<imprint>
			<biblScope unit="volume">2195</biblScope>
			<biblScope unit="page" from="498" to="505" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Systematic evaluation of logical story unit segmentation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Vendrig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Worring</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Multimedia</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="492" to="499" />
			<date type="published" when="2002">2002</date>
			<publisher>IEEE</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Normalized Cuts and Image Segmentation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Malik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="888" to="905" />
			<date type="published" when="2000">2000</date>
			<publisher>IEEE</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Multi-modal scene segmentation using scene transition graphs</title>
		<author>
			<persName><forename type="first">P</forename><surname>Sidiropoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Mezaris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kompatsiaris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Meinedo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Trancoso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. MM &apos;09 of the 17th ACM international conference on Multimedia</title>
				<meeting>MM &apos;09 of the 17th ACM international conference on Multimedia</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="665" to="668" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Graph-based multi-modal scene detection for movie and teleplay</title>
		<author>
			<persName><forename type="first">S</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Acoustics, Speech and Signal Processing (ICASSP) 2012</title>
				<meeting>Acoustics, Speech and Signal Processing (ICASSP) 2012</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2012">2012. 2012</date>
			<biblScope unit="page" from="1413" to="1416" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Interactive storytelling via video content recombination</title>
		<author>
			<persName><forename type="first">J</forename><surname>Porteous</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Benini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Canini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Charles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cavazza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Leonardi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. MM &apos;10 of the 17th ACM international conference on Multimedia</title>
				<meeting>MM &apos;10 of the 17th ACM international conference on Multimedia</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010. 2010</date>
			<biblScope unit="page" from="1715" to="1718" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
