<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Searching and Hyperlinking using Word Importance Segment Boundaries in MediaEval 2013</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Kim</forename><surname>Schouten</surname></persName>
							<email>schouten@ese.eur.nl</email>
							<affiliation key="aff0">
								<orgName type="institution">Erasmus University Rotterdam</orgName>
								<address>
									<settlement>Rotterdam</settlement>
									<country key="NL">the Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Robin</forename><surname>Aly</surname></persName>
							<email>r.aly@ewi.utwente.nl</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Twente</orgName>
								<address>
									<settlement>Enschede</settlement>
									<country key="NL">the Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Roeland</forename><surname>Ordelman</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">University of Twente</orgName>
								<address>
									<settlement>Enschede</settlement>
									<country key="NL">the Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Searching and Hyperlinking using Word Importance Segment Boundaries in MediaEval 2013</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">2362E59BAA6B96434A4C3BC3C79FFDE4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T17:57+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper reports a set of experiments performed in the context of the Searching and Hyperlinking task of the MediaEval Benchmark Initiative 2013. The Searching sub-task requires returning a ranked list of video segments that are relevant to a textual user query, while the Hyperlinking sub-task requires returning a ranked list of video segments that are relevant to a given video segment. Our main focus is on finding a way to compute flexible segment boundaries. We do so by extending the term frequency part of tf-idf to include the temporal dimension of videos. Although the contribution is theoretically sound, its performance is relatively poor, which we attribute to the focus on speech data and to the hyperlinking process. We plan to refine our method in the future to overcome these limitations.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>Videos can be long, and current search approaches that return whole videos waste the time of users searching for specific information. The Search and Hyperlinking Task in the MediaEval Benchmark Initiative 2013 <ref type="bibr" target="#b2">[3]</ref>, which is a refinement of its previous instance <ref type="bibr" target="#b0">[1]</ref>, models the situation where users should be pointed directly to relevant segments within videos and be offered links to video segments relevant to the segment already found. State-of-the-art search methods use fixed video segments as the return unit. However, fixed video segments have the limitation that evidence for a relevant passage can be divided between two segments or that the segments are too long. Therefore, in this paper we propose a method to determine segment boundaries that are suitable for individual queries and to rank these segments accordingly.</p><p>The paper is organized as follows. Section 2 describes our methods for video search and hyperlinking. Section 3 describes the details of our experiments and shows the evaluation results of our submitted runs, and we conclude in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">METHOD</head><p>In the following we describe the methods we used for the searching and hyperlinking sub-tasks. To this end, we extend the well-known tf-idf ranking function to incorporate the temporal dimension of videos when computing scores.</p><p>Copyright is held by the author/owner(s).</p><p>MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain</p><p>Our intuition is that terms are important not only at the moment they are uttered but also in a time window around the utterance. We model the posterior probability of a query term q being important at time t, given its utterance oq, as a double sigmoid function:</p><formula xml:id="formula_0">p(q_t = 1 \mid o_q) = \frac{1}{1 + \exp\left(-\frac{t - b + c}{s}\right)} - \frac{1}{1 + \exp\left(-\frac{t - e - c}{s}\right)}<label>(1)</label></formula><p>where t is the video time in milliseconds; qt is the binary variable of q being important at time t; oq is the utterance of term q (in milliseconds); b is the begin time of query term q being uttered; e is the end time of query term q being uttered; c is a correction term that shifts the two sigmoids to the left and to the right, respectively, such that every t with b &lt; t &lt; e yields a probability close to 1; and s is the steepness of the sigmoid, which models the reach of the importance around oq and is a multiple of idf(q).</p><p>Given our probabilistic model of the importance of query terms, we now describe how we integrate this model into the tf-idf ranking scheme. Instead of term frequencies in a document, we calculate the score of time t as the frequency of utterances that are important to t times its idf factor. 
However, because we are not certain about the actual importance qt, we calculate the expected frequency of important terms:</p><formula xml:id="formula_1">score(t) = \sum_{q \in Q} \sum_{o_q \in O_q} idf_q \, E[q_t \mid o_q] = \sum_{q \in Q} idf_q \sum_{o_q \in O_q} p(q_t = 1 \mid o_q)<label>(2)</label></formula><p>where Q is the set of query terms and Oq is the set of all occurrences of term q in the video.</p><p>Based on this ranking scheme, we determine the extent of a result segment seg as the set of adjacent time points t whose score is above a certain threshold (set to 0.01 in our experiments). The score of a segment seg is the maximum score(t) over all time points t in seg (we break ties by selecting the shorter segment):</p><formula xml:id="formula_2">score(seg) = \max_{t \in seg} score(t)<label>(3)</label></formula><p>where t iterates over the time interval of the segment. Figure <ref type="figure" target="#fig_0">1</ref> shows an example of our ranking scheme using a term with a large idf and a term with a low idf, each with one utterance, as well as the score(t) function that combines both.</p><p>For the hyperlinking task, we construct a textual query from the given anchor segment. To this end, we consider the words uttered during the anchor segment and select the ten words with the largest Kullback-Leibler divergence compared to the language model of the whole collection, loosely following the method proposed in <ref type="bibr" target="#b3">[4]</ref>. Using this query, we apply the same ranking scheme as for the search task.</p></div>
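<div xmlns="http://www.tei-c.org/ns/1.0"><p>The ranking scheme of Equations 1 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the submitted runs: the one-second sampling grid (step_ms), the function names, the dictionary-based inputs, and the adjustable scale argument are hypothetical, and the sigmoid is evaluated in a numerically stable form to avoid overflow far away from an utterance.</p>
<p><code lang="python"><![CDATA[
```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def importance(t, b, e, idf, scale=100_000.0, eps=0.01):
    """Equation 1: probability that a term uttered in [b, e] is important at time t (ms).

    Assumes idf > 0, otherwise the steepness s would be zero.
    """
    s = scale * idf                       # steepness, a multiple of idf(q)
    c = s * math.log(1.0 / eps - 1.0)     # correction term as set in Section 3.1
    return sigmoid((t - b + c) / s) - sigmoid((t - e - c) / s)


def score(t, occurrences, idf, scale=100_000.0):
    """Equation 2: idf-weighted expected number of important utterances at time t."""
    return sum(
        idf[q] * importance(t, b, e, idf[q], scale)
        for q, spans in occurrences.items()
        for (b, e) in spans
    )


def segments(occurrences, idf, duration_ms, step_ms=1000, threshold=0.01, scale=100_000.0):
    """Equation 3: maximal runs of sampled time points scoring above the threshold.

    Returns (start_ms, end_ms, score) tuples, best first; ties go to the shorter segment.
    """
    found, start, best, t = [], None, 0.0, 0
    for t in range(0, duration_ms + 1, step_ms):
        value = score(t, occurrences, idf, scale)
        if value > threshold:
            if start is None:
                start, best = t, value      # a new segment opens here
            else:
                best = max(best, value)     # segment score is the maximum score(t)
        elif start is not None:
            found.append((start, t - step_ms, best))
            start = None
    if start is not None:                   # segment still open at the end of the video
        found.append((start, t, best))
    return sorted(found, key=lambda seg: (-seg[2], seg[1] - seg[0]))


if __name__ == "__main__":
    # Hypothetical toy data: one rare and one common term, one utterance each.
    occurrences = {"rare": [(60_000, 60_500)], "common": [(600_000, 600_500)]}
    idf = {"rare": 2.0, "common": 0.2}
    for seg in segments(occurrences, idf, duration_ms=900_000, scale=1000.0):
        print(seg)
```
]]></code></p>
<p>The toy example uses a smaller scale value than the setting of Section 3.1, because with s = 100,000 · idf(q) and idf values around 1 the importance windows already span several minutes, which makes short examples hard to read.</p></div>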
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">EXPERIMENTS</head><p>In this section, we describe the experimental setup and the submitted runs for the searching and hyperlinking sub-tasks. For both search queries and constructed hyperlinking queries, we return a list of video segments ranked by decreasing score.</p></div>
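<div xmlns="http://www.tei-c.org/ns/1.0"><p>The construction of hyperlinking queries described in Section 2 can be illustrated with a small sketch. The estimators below are assumptions made for the example and are not necessarily those of <ref type="bibr" target="#b3">[4]</ref> or of the submitted runs: the anchor model is a maximum-likelihood estimate, the collection model uses add-one smoothing, each word is ranked by its pointwise contribution to the Kullback-Leibler divergence, and the function name kl_query is hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
import math
from collections import Counter


def kl_query(anchor_words, collection_counts, k=10):
    """Return the k anchor words with the largest pointwise KL contribution.

    anchor_words: list of words uttered during the anchor segment.
    collection_counts: mapping from word to its count in the whole collection.
    """
    anchor_counts = Counter(anchor_words)
    anchor_total = sum(anchor_counts.values())
    collection_total = sum(collection_counts.values())
    # Assumption: add-one smoothing, with one extra slot for unseen words.
    vocabulary_size = len(collection_counts) + 1

    contributions = {}
    for word, count in anchor_counts.items():
        p_anchor = count / anchor_total
        p_collection = (collection_counts.get(word, 0) + 1) / (collection_total + vocabulary_size)
        # Pointwise term of KL(anchor || collection) for this word.
        contributions[word] = p_anchor * math.log(p_anchor / p_collection)

    # Largest contribution first; alphabetical order breaks ties deterministically.
    ranked = sorted(contributions, key=lambda w: (-contributions[w], w))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy collection statistics and anchor transcript.
    collection = Counter({"the": 1000, "and": 900, "of": 800, "ice": 5, "glacier": 2})
    anchor = "the glacier and the ice of the glacier".split()
    print(kl_query(anchor, collection, k=2))
```
]]></code></p>
<p>Words that are frequent in the anchor but rare in the collection receive the largest contributions, whereas common function words contribute little or even negatively and are therefore unlikely to enter the query.</p></div>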
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Setup</head><p>For an overview of the dataset used, please refer to <ref type="bibr" target="#b2">[3]</ref>. Transcripts from LIUM are provided in blocks of 20 seconds, which we chose as the duration of each term. For the manual subtitles and the LIMSI transcripts we used the time annotation of the individual words.</p><p>Our ranking scheme requires two parameters (see Equation 1). We set c = s · log(1.0 / 0.01 − 1), which ensures that the sigmoids yield a value close to 0.99 between b and e. This corresponds to the intuition that a term is almost certainly important close to the time it was uttered. For the steepness parameter, we use s = 100,000 · idf(q), causing rare words to have a wider influence.</p></div>
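<div xmlns="http://www.tei-c.org/ns/1.0"><p>The effect of these parameter choices can be checked numerically. The following sketch, which uses hypothetical helper names and treats all times as milliseconds, computes c from s, evaluates the rising sigmoid of Equation 1 at t = b, and reports how far before b that sigmoid reaches a given level.</p>
<p><code lang="python"><![CDATA[
```python
import math


def steepness(idf, scale=100_000.0):
    """s = 100,000 * idf(q), in milliseconds."""
    return scale * idf


def correction(s, eps=0.01):
    """c = s * log(1.0 / 0.01 - 1) for the default eps = 0.01."""
    return s * math.log(1.0 / eps - 1.0)


def rising_sigmoid(t, b, s, c):
    """First term of Equation 1: the sigmoid that rises before the utterance begins."""
    return 1.0 / (1.0 + math.exp(-(t - b + c) / s))


def reach_ms(idf, level=0.5):
    """Distance before b (in ms) at which the rising sigmoid equals `level`."""
    s = steepness(idf)
    return correction(s) - s * math.log(level / (1.0 - level))


if __name__ == "__main__":
    for idf in (0.5, 1.0, 2.0):
        s = steepness(idf)
        c = correction(s)
        # Columns: idf, c in ms, rising sigmoid at t = b, half-maximum reach in ms.
        print(idf, round(c), round(rising_sigmoid(0.0, 0.0, s, c), 4), round(reach_ms(idf)))
```
]]></code></p>
<p>Because c is proportional to s, and s is proportional to idf(q), the reach grows linearly with idf(q). For idf(q) = 1 the rising sigmoid reaches its half-maximum roughly 460 seconds before b, which illustrates how wide the influence of a single utterance is under this setting.</p></div>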
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Results</head><p>The results for the searching sub-task are reported in terms of Mean Reciprocal Rank (MRR), mean Generalized Average Precision (mGAP), and Mean Average Segment Precision (MASP) (cf. <ref type="bibr" target="#b1">[2]</ref>). We see that the subtitles perform the poorest in both tasks. The LIUM transcripts outperform the ones from LIMSI, albeit with only small differences, which were not statistically significant (p = 0.05). In general, these results are in line with our previous work <ref type="bibr" target="#b4">[5]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Discussion</head><p>The described method shows poor performance. We identify the following possible reasons. First, our ranking scheme only uses speech information to find video segments, while their relevance is sometimes determined by their visual content or by the metadata (we found several instances where this is the case). Second, we used only one parameter setting, which was the most intuitive to us. We believe other settings can improve the performance. Finally, for the query generation in the hyperlinking task, we only used the text uttered during the anchor segments. As some of them are relatively short (with a minimum of 10 seconds), we believe that considering their surrounding context can improve performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">CONCLUSIONS</head><p>We have described a method to rank video segments based on the uncertain importance of query terms at a particular time in a video, given the term's utterances in the transcript. The time interval in which a term is important was described by a probability distribution modeled as a double sigmoid function. We proposed that the points at which none of the terms is likely to be important are intuitive segment boundaries specific to the current query, a property that many methods lack. To rank segments, we extended the standard tf-idf weighting scheme to the situation where the importance of query terms at a given time is uncertain. The final score used to rank segments was the maximum expected tf-idf score.</p><p>The described method showed relatively poor performance. We plan to pursue the following paths to improve the results in the future. First, we plan to extend our scheme to incorporate visual content and metadata. Second, we will investigate multiple parameter settings to tune our method. Finally, for hyperlinking we only extracted words from the given segment, which we believe may not provide enough information. In the future we plan to include the surroundings of the anchor segment and the initial query through which a user arrived at the current segment.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: An example plot of two query keyword occurrences in a video.</figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Search and Hyperlinking Task at MediaEval 2012</title>
		<author>
			<persName><forename type="first">M</forename><surname>Eskevich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Aly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ordelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Larson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MediaEval 2012 Workshop</title>
				<meeting><address><addrLine>Pisa, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012-10">October 4-5 2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">New metrics for meaningful evaluation of informally structured speech retrieval</title>
		<author>
			<persName><forename type="first">M</forename><surname>Eskevich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Magdy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 34th European conference on Advances in Information Retrieval, ECIR&apos;12</title>
				<meeting>the 34th European conference on Advances in Information Retrieval, ECIR&apos;12<address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="170" to="181" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The Search and Hyperlinking Task at MediaEval 2013</title>
		<author>
			<persName><forename type="first">Maria</forename><surname>Eskevich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gareth</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shu</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robin</forename><surname>Aly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roeland</forename><surname>Ordelman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MediaEval 2013 Workshop</title>
				<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">October 18-19 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Relevance models in information retrieval</title>
		<author>
			<persName><forename type="first">V</forename><surname>Lavrenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Bruce</forename><surname>Croft</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Language Modeling for Information Retrieval</title>
		<imprint>
			<date type="published" when="2003">2003</date>
			<publisher>Kluwer Academic Publishers</publisher>
			<biblScope unit="page" from="11" to="56" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">UTwente does brave new tasks for MediaEval 2012: Searching and hyperlinking</title>
		<author>
			<persName><forename type="first">D</forename><surname>Nadeem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Aly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ordelman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MediaEval 2012 Multimedia Benchmark Workshop</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">927</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
