<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Idiap at MediaEval 2013: Search and Hyperlinking Task</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Chidansh</forename><surname>Bhatt</surname></persName>
							<email>cbhatt@idiap.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Idiap Research Institute Martigny</orgName>
								<address>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nikolaos</forename><surname>Pappas</surname></persName>
							<email>npappas@idiap.ch</email>
							<affiliation key="aff1">
								<orgName type="institution" key="instit1">Idiap</orgName>
								<orgName type="institution" key="instit2">EPFL Martigny</orgName>
								<address>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Maryam</forename><surname>Habibi</surname></persName>
							<email>mhabibi@idiap.ch</email>
							<affiliation key="aff2">
								<orgName type="institution" key="instit1">Idiap</orgName>
								<orgName type="institution" key="instit2">EPFL Martigny</orgName>
								<address>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andrei</forename><surname>Popescu-Belis</surname></persName>
							<affiliation key="aff3">
								<orgName type="institution">Idiap Research Institute Martigny</orgName>
								<address>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Idiap at MediaEval 2013: Search and Hyperlinking Task</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">847C676EB17CA1DE923E8904DA251523</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T17:57+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing</term>
					<term>H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems</term>
					<term>Topic segmentation</term>
					<term>video search</term>
					<term>video hyperlinking</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Idiap system for the Search and Hyperlinking Task uses topic-based segmentation, content-based recommendation algorithms, and multimodal re-ranking. For both sub-tasks, our system performs better with automatic speech recognition output than with manual subtitles. For linking, the results benefit from the fusion of text and visual concepts detected in the anchors.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>This paper outlines the Idiap system for the MediaEval 2013 Search and Hyperlinking Task <ref type="bibr" target="#b2">[3]</ref>. The search sub-task required finding a specific segment of a show (from 1,260 hours of broadcast TV material provided by the BBC) based on a query that had been built with this "known item" in mind. The hyperlinking sub-task required finding items from the collection that are related to "anchors" within known items. We propose a unified approach to both sub-tasks, based on techniques inspired by content-based recommender systems <ref type="bibr" target="#b5">[6]</ref>, which retrieve the segments most similar, in terms of words, to a given text query or to another segment. For hyperlinking, we also use the visual concepts detected in the anchor in order to re-rank answers based on visual similarity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">SYSTEM OVERVIEW</head><p>The Idiap system makes use of three main components, shown at the center of Fig. <ref type="figure" target="#fig_0">1</ref>. We generate the data units, namely topic-based segments, from the subtitles or the ASR transcripts (either from LIMSI/Vocapia <ref type="bibr" target="#b3">[4]</ref> or from LIUM <ref type="bibr" target="#b6">[7]</ref>) using TextTiling in NLTK <ref type="bibr" target="#b0">[1]</ref>. For search, we compute word-based similarity (from transcript and metadata) between queries and all segments in the collection, using a vector space model with TF-IDF weighting. Similarly, for hyperlinking, we first rank all segments based on their similarity with the anchor. In addition, we use the visual concept detection provided by the organizers (key frames from Technicolor <ref type="bibr" target="#b4">[5]</ref>, concepts detected by Visor <ref type="bibr" target="#b1">[2]</ref>) to generate a score matrix and then the list of nearest neighbors. Scores from text and visual similarity are fused to re-rank the final linking results.</p><p>Copyright is held by the author/owner(s). MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain.</p><p>Topic segmentation was performed over subtitles or transcripts using TextTiling as implemented in the NLTK toolkit. Topic shifts are detected from lexical co-occurrence patterns computed over 20-word pseudo-sentences. (This value was chosen to satisfy the requirement of the hyperlinking task that segments be on average shorter than 2 minutes.) Similarity scores are then assigned at sentence gaps using block comparison; gaps where the score dips markedly below the neighboring peaks are marked as boundaries, which we fit to the closest speech segment break.
The total number of segments for subtitles / LIMSI / LIUM is respectively 114,448 / 111,666 / 84,783, with average segment sizes of 53 / 53 / 68 seconds and standard deviations of 287 / 68 / 64 seconds. We found mismatches between the durations given in the metadata files and the timings in the subtitle or LIMSI transcript files, and discarded the mismatching segments (respectively 488 and 956 cases). For instance, "20080510 212500 bbcthree two pints of lager and" has a duration of 1,800 seconds according to the metadata file, while the last subtitle segment ends at 00:55:26.2 and the last segment of the LIMSI transcript ends at 3325.36.</p><p>Segment search was performed by indexing the text segments in a word vector space with TF-IDF weights, representing each textual query (and the words from the "visual cues") in the same space, and retrieving the segments most similar to the query by cosine similarity. We first tokenized the text and removed stop words. We tested several parameters on the small development set with the LIMSI transcript: the order of n-grams (1, 2, or 3) and the size of the vocabulary (10k, 20k, 30k, 40k, or 50k words). The best scores (ranks of known items in the results) were reached for 50k words with unigrams, bigrams, and trigrams. With these features, the LIMSI transcript performed best on the development set, followed by LIUM, LIUM with metadata, and subtitles. We submitted 4 runs for the search sub-task: three used the words of each transcript or subtitle, and the fourth used the LIUM transcript but appended to each segment the words from the metadata (cast, synopsis, series, and episode name).</p><p>For hyperlinking segments from anchors, indexing is performed as above, though using only unigrams and a vocabulary of 20,000 words. For scenario A (anchor information only), we extended the anchor text with the text of segments containing or overlapping the anchor boundaries.
For scenario C, we considered the text between the start and end times of the provided known item, along with the text of segments containing or overlapping the known-item boundaries. We enriched the subtitle/ASR text using the textual metadata (title, series, episode) and web data (cast, synopsis). The segments and anchors were indexed into a vector space with TF-IDF weights, and the top N most similar segments were found by cosine similarity.</p><p>Then, we re-ranked the results based on visual feature similarity, using the visual concept detection scores per keyframe (provided by the organizers). Keyframes were first aligned to topic-based segments using shot information <ref type="bibr" target="#b4">[5]</ref>, with an average of 5 keyframes per segment. The same was done for the anchors (8 frames) and anchors plus contexts (55 frames). For each segment, we generated a visual feature vector using the concepts with the highest scores from the keyframes of the segment. Using KNN, we ranked all segments by decreasing similarity to an anchor. We then re-ranked the text-based results by combining the text and visual scores with weights W and 1 − W respectively. We chose W = 0.8 for subtitles (assuming their higher accuracy) and W = 0.6 for transcripts. Finally, we ignored segments shorter than 10 s and split longer segments into 2-minute chunks. We submitted 3 runs: two with the subtitle words (scenarios A and C) and one with the LIMSI transcript (C).</p></div>
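The TextTiling procedure described above (20-word pseudo-sentences, block comparison of lexical co-occurrence, boundaries at similarity dips) can be sketched as follows. This is a minimal pure-Python stand-in for NLTK's TextTilingTokenizer, with illustrative parameter values, not the exact configuration used in our system.

```python
# TextTiling-style topic segmentation sketch: split a word stream into
# 20-word pseudo-sentences, score each gap by the cosine similarity of the
# word counts in the blocks on either side, and mark boundaries where the
# score dips well below the neighboring peaks (the "depth score").
from collections import Counter
from math import sqrt

def pseudo_sentences(words, w=20):
    """Chop the word stream into fixed-size pseudo-sentences of w words."""
    return [words[i:i + w] for i in range(0, len(words), w)]

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def gap_scores(pseudo, k=2):
    """Similarity at each gap between blocks of k pseudo-sentences."""
    scores = []
    for g in range(1, len(pseudo)):
        left = Counter(w for s in pseudo[max(0, g - k):g] for w in s)
        right = Counter(w for s in pseudo[g:g + k] for w in s)
        scores.append(cosine(left, right))
    return scores

def boundaries(scores, cutoff=1.0):
    """Mark gaps whose depth below the surrounding peaks exceeds cutoff."""
    bounds = []
    for i, s in enumerate(scores):
        lpeak = max(scores[:i + 1])
        rpeak = max(scores[i:])
        if (lpeak - s) + (rpeak - s) > cutoff:
            bounds.append(i + 1)  # boundary after pseudo-sentence i+1
    return bounds
```

In the actual system the detected boundaries are additionally snapped to the closest speech segment break, which this sketch omits.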
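The segment search step (TF-IDF vector space, cosine similarity between query and segment vectors) can be sketched as below. This is a self-contained illustration over toy documents, not the actual indexing pipeline, which also handled n-grams, stop-word removal, and a bounded vocabulary.

```python
# TF-IDF retrieval sketch: index segment texts as sparse TF-IDF vectors
# and rank them by cosine similarity to a query projected into the same space.
from collections import Counter
from math import log, sqrt

def tfidf_index(docs):
    """Build IDF weights and one TF-IDF vector (dict) per document."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.split()))
    idf = {t: log(n / df[t]) for t in df}
    vecs = [{t: tf * idf[t] for t, tf in Counter(d.split()).items()} for d in docs]
    return idf, vecs

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, idf, vecs, top=10):
    """Return indices of the `top` segments most similar to the query."""
    q = Counter(query.split())
    qv = {t: q[t] * idf.get(t, 0.0) for t in q}  # out-of-vocabulary terms get 0
    ranked = sorted(range(len(vecs)), key=lambda i: cosine(qv, vecs[i]), reverse=True)
    return ranked[:top]
```

For example, with segments ["cooking show with chef", "news about politics today", "chef prepares fresh pasta"], the query "chef pasta" ranks the third segment first, since "pasta" occurs in only one document and so carries a high IDF weight.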
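The late-fusion re-ranking amounts to a per-segment weighted combination of the text and visual similarity scores, with weight W on text and 1 − W on visual (W = 0.8 for subtitles, 0.6 for transcripts, as stated above). A minimal sketch, with invented segment identifiers and scores:

```python
# Late-fusion re-ranking sketch: combine a text-similarity score and a
# visual-concept-similarity score per candidate segment, then sort by the
# fused score. Segments missing from one modality default to a zero score.
def fuse_and_rerank(text_scores, visual_scores, w=0.8):
    """text_scores / visual_scores: dicts mapping segment id -> score in [0, 1]."""
    fused = {
        seg: w * text_scores.get(seg, 0.0) + (1 - w) * visual_scores.get(seg, 0.0)
        for seg in set(text_scores) | set(visual_scores)
    }
    return sorted(fused, key=fused.get, reverse=True)
```

Lowering W (as done for ASR transcripts) shifts the ranking toward the visual evidence: a segment that is only moderately similar in words but visually close to the anchor can then overtake a textually stronger candidate.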
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">RESULTS</head><p>The official search results (Table <ref type="table" target="#tab_2">1</ref>) show the same ranking as on the development set. The LIMSI transcript outperforms the LIUM one, which is not helped by metadata (this might be due to low-frequency features in the metadata). Surprisingly, subtitles yield the lowest scores.</p><p>The overall low scores (especially on mGAP and MASP) could be due to the short average size of our segments, which were not calibrated to match the average size of known items.</p><p>Analyzing results per query, our best run places the known item among the top 10 answers for 12 out of 50 test queries. These queries are not "easy", as they vary across runs (with exceptions like item 18). Conversely, for 14 queries the known item is not found among the top 1000 results.</p><p>The linking runs (Table <ref type="table" target="#tab_3">2</ref>) were scored after the deadline, separately from the other submissions, due to a time conversion problem undetected at submission. Here too, the LIMSI transcript (first line) outperforms subtitles. This might be due to the higher weight of visual concepts when using transcripts (0.4) vs. subtitles (0.2). When using subtitles (2nd and 3rd rows), a higher MAP value was found when context was used, indicating that context may actually add useful information, especially with our strategy of extending context boundaries to the closest segments. We therefore hypothesize that using the LIMSI transcript for scenario A would lead to an even lower MAP than for scenario C.</p><p>The precision of our system increases from top 5 to top 10 and decreases slightly at top 20. Our best system reaches close-to-average MAP on anchors 31 and 39 (respectively 0.80 and 0.50), while the MRR of the corresponding search queries (item 23 for anchor 31, item 25 for anchor 39) is close to zero. This is an indication that the visual features may be helpful.
</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of the Idiap system.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1 :</head><label>1</label><figDesc>Official Idiap results for the search task.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2 :</head><label>2</label><figDesc>Idiap results for hyperlinking: precision at top 5, 10 and 20, and mean average precision.</figDesc><table /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">ACKNOWLEDGMENTS</head><p>This work was supported by the Swiss National Science Foundation (AROLES project n. 51NF40-144627) and by the European Union (inEvent project FP7-ICT n. 287872). We would like to thank Maria Eskevich and Robin Aly for their valuable help with the task.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">NLTK: the Natural Language Toolkit</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bird</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">COLING/ACL Interactive Presentations</title>
				<meeting><address><addrLine>Sydney</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The devil is in the details: an evaluation of recent feature encoding methods</title>
		<author>
			<persName><forename type="first">K</forename><surname>Chatfield</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">British Machine Vision Conference</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The Search and Hyperlinking Task at MediaEval 2013</title>
		<author>
			<persName><forename type="first">M</forename><surname>Eskevich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Aly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ordelman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MediaEval 2013 Workshop</title>
				<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">October 18-19 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">The LIMSI Broadcast News Transcription System</title>
		<author>
			<persName><forename type="first">J.-L</forename><surname>Gauvain</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Speech Communication</title>
				<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A video fingerprint based on visual digest and local fingerprints</title>
		<author>
			<persName><forename type="first">A</forename><surname>Massoudi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICIP</title>
				<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Combining content with user preferences for ted lecture recommendation</title>
		<author>
			<persName><forename type="first">N</forename><surname>Pappas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Popescu-Belis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Content Based Multimedia Indexing (CBMI)</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">LIUM&apos;s SMT machine translation systems for WMT</title>
		<author>
			<persName><forename type="first">H</forename><surname>Schwenk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">6th Workshop on Statistical Machine Translation</title>
				<meeting><address><addrLine>Edinburgh</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="464" to="469" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
