<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Out-of-the-box strategy for Rich Speech Retrieval @ MediaEval 2011</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Wouter</forename><surname>Alink</surname></persName>
							<email>wouter@spinque.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Spinque Utrecht</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Roberto</forename><surname>Cornacchia</surname></persName>
							<email>roberto@spinque.com</email>
							<affiliation key="aff1">
								<orgName type="institution">Spinque Utrecht</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Out-of-the-box strategy for Rich Speech Retrieval @ MediaEval 2011</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">62E225ABC685F5CD674A9BA765B917E6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T01:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Evaluation tracks offer valuable opportunities to measure scientific and technological advances. Spinque approaches challenges such as the MediaEval Rich Speech Retrieval task with the additional goal of developing solutions that can easily be transferred from academic labs to industry. The system used during this evaluation was obtained with minimal effort and no manual optimisation, and yet it provides a reasonably good baseline to improve upon. More importantly, it is by nature an extensible approach, based on the concept of declarative search strategies, rather than an ad-hoc search system.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>Our participation in the MediaEval Rich Speech Retrieval task, described in <ref type="bibr" target="#b3">[3]</ref>, has been inspired by the quest to find a simple, fast, robust, and effective approach to searching in speech transcripts. We used our generic search framework to instantiate a specific search solution for this task, with the explicit goal of producing reasonable results in the space of a few hours, including index creation, search strategy modelling, and evaluation. As argued for example in <ref type="bibr" target="#b2">[2]</ref>, standard textual IR techniques can be applied to speech transcripts, even when the transcripts are not perfect. Our runs focus on textual search with different query keyword combinations and with rank refinement at different levels of retrieval unit granularity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">SPINQUE FRAMEWORK</head><p>We modelled and executed our runs as search strategies within the Spinque framework. This is a prototype environment where search processes are divided into two phases: search strategy definition and the actual search.</p><p>Modelling search strategies in this framework corresponds to designing graph structures, where edges represent dataflows consisting of terms, documents (e.g. speech transcripts), and document sections. The nodes connected by such edges are pre-defined, general-purpose operational blocks that either provide source data (the speech transcripts and the topics) or modify their input dataflow by applying operations such as extracting specific sections from documents or ranking sections and documents, to name a few.</p><p>Search strategies defined in this framework are automatically translated into a probabilistic relational query language and executed on top of an SQL database engine.</p><p>The same framework has also been used to participate in other evaluation tracks, such as CLEF-IP <ref type="bibr" target="#b1">[1]</ref>.</p><p>Copyright is held by the author/owner(s). MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy.</p></div>
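The paper states that strategy graphs are compiled to a probabilistic relational query language and run on an SQL engine; the actual Spinque compilation is not shown. As a hedged illustration only, the sketch below expresses a simple keyword-ranking step as plain SQL over a toy inverted-index table (the schema and data are hypothetical, not Spinque's):

```python
import sqlite3

# Toy inverted index: one row per (document, term) with a term frequency.
# This schema is illustrative, not the framework's actual relational model.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE posting (doc TEXT, term TEXT, tf INTEGER)")
con.executemany("INSERT INTO posting VALUES (?, ?, ?)", [
    ("d1", "speech", 3), ("d1", "retrieval", 1),
    ("d2", "speech", 1), ("d2", "video", 2),
])

# A "ranking block" rendered as SQL: sum term frequencies of matching
# query terms per document and order documents by that score.
rows = con.execute("""
    SELECT doc, SUM(tf) AS score
    FROM posting
    WHERE term IN ('speech', 'retrieval')
    GROUP BY doc
    ORDER BY score DESC
""").fetchall()
```

A real strategy would chain several such relational steps (section extraction, re-ranking) and propagate probabilities rather than raw counts; this only shows the general "declarative search as SQL" idea.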
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">DESCRIPTION</head><p>The speech transcripts were indexed at two levels of granularity: as whole documents as well as individual SpeechSegment sections. We did not use the provided tags and video keyframes, nor any other source of evidence.</p><p>Our runs can be described as follows:</p><p>run1 First, all words from title (weight 0.2) and all words from short-title (weight 0.8) are used to search all documents in the collection. Then, all the SpeechSegment sections within those documents are searched using the same keywords. The start of the section is returned as the result. This strategy is depicted in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>run2 The same as run1, except that all terms from title get a weight of 0.0 and all terms from short-title get a weight of 1.0. This effectively discards the terms from title.</p><p>run3 The same as run1, except that all terms from title get a weight of 1.0 and all terms from short-title get a weight of 0.0. This effectively discards the terms from short-title. Run3 should be considered the "required run".</p><p>Textual ranking is performed with the BM25 <ref type="bibr" target="#b4">[4]</ref> retrieval method, with standard parameters b = 0.75 and k1 = 1.2. The weights 0.2 (words from title) and 0.8 (words from short-title) were found as a local optimum using a hill-climbing approach.</p></div>
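The ranking described above uses BM25 with b = 0.75 and k1 = 1.2. A minimal self-contained sketch of that scoring function (the toy corpus and token lists are invented for illustration; the paper's actual index and probabilistic pipeline are not reproduced here):

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenised document against a query, BM25-style.
    Parameters match those reported in the paper (b=0.75, k1=1.2)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue
        # Smoothed IDF (the +1 keeps the value positive for common terms).
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        tf = doc.count(term)  # term frequency in this document
        denom = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf * (k1 + 1) / denom
    return score

# Hypothetical two-document corpus of token lists.
docs = [["speech", "retrieval", "task"], ["video", "genre", "tags"]]
s0 = bm25_score(["speech"], docs[0], docs)
s1 = bm25_score(["speech"], docs[1], docs)
```

The title / short-title weighting of the runs could then be obtained by mixing two such scores, e.g. `0.2 * s_title + 0.8 * s_short_title`.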
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">RESULTS AND FINDINGS</head><p>The average time for retrieving results for a topic was 230 ms. This time includes "compiling" the search strategy (i.e. translating it into SQL queries) out-of-the-box and without manual optimisations, and the overhead for generating the run files. A glitch later found in our indexer may have altered results marginally: a few documents were not included in our index and therefore not retrieved.</p><p>The evaluation scores for the three submitted runs are shown in Table <ref type="table" target="#tab_0">1</ref>. Scores have been measured with window sizes of 10, 30, and 60 seconds. Overall, the scores are reasonably satisfying for a simple keyword-search approach. As expected, combining the title and the short-title yields better results than either alone. The best results on the test set were obtained by assigning a larger weight to short-title keywords, which suggests that full titles may carry off-topic words that lower precision.</p><p>We found that searching short sections produced disappointing rankings, probably due to document-length normalisation that was not fine-tuned. Both parameter configurations used (for BM25 and for the title / short-title keyword mixture) could be improved with a more exhaustive exploration of their search space. The simplicity of the strategies used and the small size of the corpus at hand would indeed make such an exploration feasible here, which is not the case in general.</p><p>One more direction for possible improvement is to experiment with more fine-grained zooming in, with search windows of, for example, entire documents followed by 10-minute, 1-minute, and 5-second speech segments. Such a multi-stage strategy would likely retain recall and improve precision at every iteration.</p></div>
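The multi-stage "zooming in" idea above (rank whole documents first, then rescore only the segments of the top-ranked ones) can be sketched as follows. All names, the toy data, and the overlap scorer are hypothetical; the paper proposes but does not implement this strategy:

```python
def coarse_to_fine(query, docs, score, top_k=2):
    """Two-stage retrieval sketch.
    docs: {doc_id: {"text": [tokens], "segments": {seg_id: [tokens]}}}
    score: any (query, tokens) -> float ranking function."""
    # Stage 1: rank whole documents and keep the top_k (cheap, retains recall).
    top = sorted(docs, key=lambda d: score(query, docs[d]["text"]),
                 reverse=True)[:top_k]
    # Stage 2: rescore only the segments inside those documents
    # (finer granularity, aims to improve precision).
    hits = []
    for d in top:
        for seg_id, tokens in docs[d]["segments"].items():
            hits.append((score(query, tokens), d, seg_id))
    return sorted(hits, reverse=True)

def overlap(query, tokens):
    """Trivial stand-in scorer: count query-term occurrences."""
    return sum(tokens.count(t) for t in query)

# Hypothetical two-document collection with SpeechSegment-like sections.
docs = {
    "d1": {"text": ["speech"] * 3 + ["retrieval"],
           "segments": {"s1": ["speech", "retrieval"], "s2": ["noise"]}},
    "d2": {"text": ["video", "tags"],
           "segments": {"s1": ["video"]}},
}
hits = coarse_to_fine(["speech", "retrieval"], docs, overlap, top_k=1)
```

A deeper cascade (document, then 10-minute, 1-minute, and 5-second windows) would simply repeat stage 2 at each narrower granularity.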
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">CONCLUSIONS</head><p>The main contribution of this paper is to show how a specific search engine for speech transcripts of reasonable quality can be instantiated with minimal effort. While out-of-the-box text search is not unique to Spinque's framework, the ability to play with retrieval units of different granularities and to combine query and/or data sources easily is not common.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Search strategy using both title and shorttitle as input, first searching the whole transcript documents, then refining into sections.</figDesc><graphic coords="2,53.80,165.14,255.13,397.74" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>mGAP scores for the runs on the test-set with 50 topics (step size is 10 seconds)</figDesc><table><row><cell></cell><cell cols="2">Weights for</cell><cell cols="3">Window size (seconds)</cell></row><row><cell></cell><cell>title</cell><cell>short-title</cell><cell>10</cell><cell>30</cell><cell>60</cell></row><row><cell>Run 1</cell><cell>0.2</cell><cell>0.8</cell><cell>0.1320</cell><cell>0.2210</cell><cell>0.2724</cell></row><row><cell>Run 2</cell><cell>0.0</cell><cell>1.0</cell><cell>0.1164</cell><cell>0.1816</cell><cell>0.2231</cell></row><row><cell>Run 3</cell><cell>1.0</cell><cell>0.0</cell><cell>0.1054</cell><cell>0.1630</cell><cell>0.1968</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We plan to improve on our first speech retrieval evaluation in two ways: firstly, by automating as much as possible the optimisation of search strategies' free parameters, including the choice of unit retrieval granularities; secondly, by building on top of this optimised baseline with the addition of more sources of evidence that may be available (such as tags and video material).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b1">
	<analytic>
	<title level="a" type="main">Searching CLEF-IP by strategy</title>
		<author>
			<persName><forename type="first">W</forename><surname>Alink</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cornacchia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>de Vries</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2009, Revised Selected Papers, Part I</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Perspectives on information retrieval and speech</title>
		<author>
			<persName><forename type="first">James</forename><surname>Allan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Information Retrieval Techniques for Speech Applications</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Berlin / Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="volume">2273</biblScope>
			<biblScope unit="page" from="323" to="326" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Overview of MediaEval 2011 Rich Speech Retrieval Task and Genre Tagging Task</title>
		<author>
			<persName><forename type="first">M</forename><surname>Larson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eskevich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ordelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kofler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schmiedeke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MediaEval 2011 Workshop</title>
				<meeting><address><addrLine>Pisa, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011-09">September 1-2, 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Okapi at TREC-3</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">E</forename><surname>Robertson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Walker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hancock-Beaulieu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gatford</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Third Text REtrieval Conference</title>
				<imprint>
			<publisher>TREC</publisher>
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
