<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">CEA LIST&apos;s Participation at the MediaEval 2014 Retrieving Diverse Social Images Task</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Alexandru</forename><forename type="middle">Lucian</forename><surname>Ginsca</surname></persName>
							<email>alexandru.ginsca@cea.fr</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">CEA</orgName>
								<orgName type="institution" key="instit2">LIST, Vision &amp; Content Engineering Laboratory</orgName>
								<address>
									<postCode>91190</postCode>
									<settlement>Gif-sur-Yvette</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">TELECOM Bretagne</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Adrian</forename><surname>Popescu</surname></persName>
							<email>adrian.popescu@cea.fr</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">CEA</orgName>
								<orgName type="institution" key="instit2">LIST, Vision &amp; Content Engineering Laboratory</orgName>
								<address>
									<postCode>91190</postCode>
									<settlement>Gif-sur-Yvette</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Navid</forename><surname>Rekabsaz</surname></persName>
							<email>rekabsaz@ifs.tuwien.ac.at</email>
							<affiliation key="aff2">
								<orgName type="department">Faculty of Informatics</orgName>
								<orgName type="institution">Vienna University of Technology</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">CEA LIST&apos;s Participation at the MediaEval 2014 Retrieving Diverse Social Images Task</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">C58E1327F26774923FDB0F1B257C0DCC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T16:10+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The MediaEval 2014 Retrieving Diverse Social Images Task aims to improve result diversity while keeping precision high in a social image retrieval setting. We base our approach on the retrieval performance of recently introduced visual descriptors, coupled with a mixed diversification method that combines social cues with a classic clustering setting. As a novelty, this year's task introduced user credibility features. We also describe how to use credibility in the diversification process and how to improve on individual features by means of a regression model.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>Social image retrieval is a natural setting for multimodal approaches that improve both the relevance and the diversity of results. Recent works propose the use of social cues alongside visual and textual data.</p><p>Our efforts focus on exploiting visual information and on using credibility in the diversification process. We first describe two pre-filtering techniques, followed by an image retrieval method that boosts precision. Next, we describe how to predict a user's credibility score and propose a user-based image filtering approach. After showing how we improve diversity through clustering and cluster ranking, we describe the submitted runs and discuss the results obtained on the testset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">AIMING FOR PRECISION</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Initial pre-filtering</head><p>We use two filtering steps to eliminate noise from the image lists. Similar to <ref type="bibr" target="#b2">[2]</ref>, we eliminate geotagged images that lie more than 1 km from the POI. The second filter restricts the presence of faces in images. We use the standard OpenCV<ref type="foot" target="#foot_0">1</ref> algorithm to perform face detection and eliminate images with a face coverage ratio higher than 0.4. Both the distance threshold and the face coverage threshold are determined on the devset. We keep the same pre-filtering steps for all the runs.</p><p>MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain</p></div>
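<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two pre-filters described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the image and POI records are hypothetical dictionaries, the face coverage ratio is assumed to be precomputed by a face detector, and the haversine formula stands in for whatever distance computation was actually used.</p><p><code lang="python">
```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def keep_image(image, poi, max_km=1.0, max_face_ratio=0.4):
    """Return True if the image survives both pre-filters.

    `image` is a hypothetical dict with optional 'lat'/'lon' keys and a
    precomputed 'face_ratio' (face area divided by image area).
    Images without a geotag are never removed by the distance filter.
    """
    if image.get("lat") is not None and image.get("lon") is not None:
        dist = haversine_km(image["lat"], image["lon"], poi["lat"], poi["lon"])
        if dist > max_km:
            return False
    # Drop only images whose face coverage is strictly above the threshold.
    return max_face_ratio >= image.get("face_ratio", 0.0)
```
</code></p><p>Both thresholds are exposed as parameters because the text states that they were tuned on the devset.</p></div>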
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Image retrieval</head><p>Following the latest advances in computer vision, we use powerful CNN-based features extracted with Caffe <ref type="bibr" target="#b3">[3]</ref> to represent the images in the collection, as well as the Wikipedia example images. Following a standard content-based image retrieval approach, we rank the images for each topic by the average cosine similarity between the retrieved image and all of the example images. On the devset, we obtain a P@20 of 0.966 when doing retrieval with the Caffe features. This represents a significant improvement over the Flickr ranking (P@20 = 0.831) and LBP3x3 (P@20 = 0.816), the descriptor provided by the organizers that gives the best performance in visual retrieval. One drawback of this method is the strong trade-off between precision and cluster recall. Although P@20 on the devset is high, we get a CR@20 of 0.293, leading to an F1@20 of 0.438. We address this problem directly by first selecting images found in different clusters, as described in Section 4.</p></div>
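<div xmlns="http://www.tei-c.org/ns/1.0"><p>The ranking step described above, which scores each image by its average cosine similarity to the example images, can be sketched in plain Python. The function names and the dictionary layout are assumptions made for illustration, and feature extraction is taken as given.</p><p><code lang="python">
```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_examples(candidates, examples):
    """Rank candidate ids by mean cosine similarity to the example vectors.

    `candidates` maps an image id to its feature vector; `examples` is a
    list of feature vectors for the example images of the topic.
    """
    def score(image_id):
        vec = candidates[image_id]
        return sum(cosine(vec, ex) for ex in examples) / len(examples)

    # Highest average similarity first.
    return sorted(candidates, key=score, reverse=True)
```
</code></p></div>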
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">LISTENING TO SOCIAL CUES</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Predicting user credibility</head><p>We exploit the credibility set to train a regression model that predicts a user's credibility score from the provided features. We perform model selection and parameter tuning by 5-fold cross-validation (cv) on the credibility set, and we evaluate the predictions by Spearman's rank correlation coefficient with the ground truth credibility values. The highest cv correlation (0.47) is obtained using gradient boosting regression trees with a Huber loss and 100 estimators. By comparison, the highest correlation of an individual feature (visual score) is 0.36. The gain in Spearman correlation is also reflected in the competition metrics. When fixing the rest of the parameters and using the predicted credibility scores instead of the provided visual credibility feature, F1@20 increases from 0.61 to 0.632 on the devset.</p></div>
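<div xmlns="http://www.tei-c.org/ns/1.0"><p>The evaluation measure used above, Spearman's rank correlation, is the Pearson correlation computed on ranks. The sketch below is a dependency-free illustration that assigns average ranks to ties. It covers only the evaluation step, not the gradient boosting model itself, and all function names are assumptions.</p><p><code lang="python">
```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while len(order) > start:
        end = start
        # Extend the run while the next sorted value equals the current one.
        while len(order) > end + 1 and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted, truth):
    """Spearman's rank correlation between predictions and ground truth."""
    return pearson(average_ranks(predicted), average_ranks(truth))
```
</code></p><p>In a cross-validation setting, this score would be computed on the held-out fold of each split and then averaged.</p></div>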
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">User selection</head><p>For each topic, we first keep the subset of users that have contributions among the top n images in the ranking produced by the image retrieval process described in Section 2.2. Then, as an extra filter, our final ranking retains only images coming from the selected user set. Given the good precision of image retrieval, we have high confidence that the images at the top of the ranking are relevant. This gives us an ad hoc insight into the topical expertise of the users responsible for those images. We tune n on the devset and fix it at 20. For comparison, when not using a user-based filter, the F1@20 score drops from 0.632 to 0.597. We also tried a similar approach that retains contributions from the top users ranked by credibility score, but this did not improve the results. This result hints at the need for a topic-specific credibility score.</p></div>
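<div xmlns="http://www.tei-c.org/ns/1.0"><p>The user selection filter described above amounts to a few lines of code. In this sketch the ranking is a list of image identifiers and the owner lookup is a dictionary. Both structures are hypothetical and chosen for illustration.</p><p><code lang="python">
```python
def filter_by_top_users(ranking, owner, n=20):
    """Keep only images whose owner appears among the top-n ranked images.

    `ranking` is a list of image ids, best first; `owner` maps each image
    id to a user id. The relative order of the kept images is preserved.
    """
    selected_users = {owner[image_id] for image_id in ranking[:n]}
    return [image_id for image_id in ranking if owner[image_id] in selected_users]
```
</code></p><p>The default of 20 for n mirrors the value that the text reports as tuned on the devset.</p></div>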
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">IMPROVING DIVERSITY</head><p>Building on previous works, we combine a more traditional clustering approach for diversification with the use of social cues <ref type="bibr" target="#b5">[5]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Clustering</head><p>We first perform k-Means clustering on the complete set of images. To ensure a stable cluster distribution, we initialize the centroids by uniformly selecting images from the ranking produced after image retrieval. For example, the i-th cluster will have as initial centroid the image found on the position (i − 1) * n/k, where k is the desired number of clusters and n is the number of images in the ranking. After validation on the devset, k is set to 30.</p></div>
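<div xmlns="http://www.tei-c.org/ns/1.0"><p>The centroid initialization described above can be made concrete with a short sketch. It assumes that ranking positions are 0-based and that fractional positions are rounded down. The text specifies neither detail.</p><p><code lang="python">
```python
def initial_centroid_positions(n, k):
    """0-based ranking positions of the k initial centroids.

    For the i-th cluster (i = 1..k) this returns floor((i - 1) * n / k),
    so the seeds are spread uniformly over a ranking of n images.
    """
    return [(i - 1) * n // k for i in range(1, k + 1)]
```
</code></p><p>For a ranking of 300 images and k = 30, the seeds are the images at positions 0, 10, 20, and so on up to 290.</p></div>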
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Cluster ranking</head><p>We leverage the social component of this task by ordering the clusters by the average credibility score of the users that contribute images to the cluster. For the runs that do not permit the use of credibility, we rank the clusters by the number of unique users represented in each cluster. In the case of a tie, we prefer the cluster that has the best ranked image after visual retrieval. Our final ranked list is obtained by visiting the clusters in this order and selecting, from one cluster at a time, the image that is best placed in the visual retrieval ranking.</p></div>
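<div xmlns="http://www.tei-c.org/ns/1.0"><p>The cluster ordering and the one-image-per-cluster selection described above can be sketched as follows. The cluster score is passed in as a dictionary, so it can hold either the average credibility or the unique user count. The data structures, and the choice to keep cycling over the clusters until every image is placed, are assumptions of this sketch.</p><p><code lang="python">
```python
def diversify(clusters, cluster_score, visual_rank):
    """Order clusters, then pick one image per cluster in repeated passes.

    `clusters` maps a cluster id to a list of image ids, `cluster_score`
    maps a cluster id to its score (higher is better), and `visual_rank`
    maps an image id to its position in the retrieval ranking (0 is best).
    """
    # Sort each non-empty cluster's images by visual retrieval position.
    queues = {c: sorted(imgs, key=visual_rank.get)
              for c, imgs in clusters.items() if imgs}
    # Higher score first; ties go to the cluster with the best-ranked image.
    order = sorted(queues,
                   key=lambda c: (-cluster_score[c], visual_rank[queues[c][0]]))
    result = []
    while any(queues[c] for c in order):
        for c in order:
            if queues[c]:
                result.append(queues[c].pop(0))
    return result
```
</code></p></div>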
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">RESULTS AND DISCUSSION</head><p>We submitted five different runs at this year's Retrieving Diverse Social Images Task <ref type="bibr" target="#b1">[1]</ref>. Our submissions are briefly described below:</p><p>• RUN1 uses the provided LBP3x3 visual descriptor for image retrieval and clustering. The clusters are then ranked based on the number of users represented in each cluster.</p><p>• RUN2 is purely textual. We concatenated the title, tags and description of the photos to calculate the text similarity. As a text pre-processing phase, we decompounded the terms with a greedy approach, using a dictionary built from all the words in the text. In the next step, in order to disambiguate the places, we expanded the queries using the first sentence from Wikipedia. After testing several language models, a semantic similarity approach based on Word2Vec <ref type="bibr" target="#b4">[4]</ref> gave the best result. We trained a model on Wikipedia and then used the vector representation of words to calculate the text similarity of the query to each photo. In addition to the text similarity, we extracted three binary attributes: (1) whether the photo had any views, (2) whether the distance between the photo and the POI is greater than 8 kilometers, and (3) whether the description is longer than 2000 characters. All features were then used in a Linear Regression model in order to re-rank the list. Finally, following <ref type="bibr" target="#b5">[5]</ref>, in order to diversify the ranking, we iterate over the initial re-ranked list and keep one image from each user at each iteration.</p><p>• RUN3 is a fusion between RUN1 and RUN2. Since the scores for visual and textual rankings are not in the same range, fusion is performed based on the ranks of the images in the two initial rankings. More specifically, we perform a linear weighting in which the individual ranks are given a weight of 0.5. Other weightings have been tested, but the results remain quite stable in the range 0.3-0.7, which indicates the robustness of the proposed fusion.</p><p>• RUN4 is similar to RUN1, the single difference lying in the use of credibility for cluster ranking.</p><p>• RUN5 is obtained using the Caffe visual descriptor for image retrieval and clustering and predicted credibility scores for cluster ranking.</p><p>Our textual run (RUN2) is the only one in which we do not use clustering to improve diversity. This is reflected across metrics, as can be seen in Table <ref type="table" target="#tab_0">1</ref>. Although it performs well in terms of F1@20, this run sits at opposite extremes on the other two metrics. It has the highest P@20 and the lowest CR@20.</p><p>The usefulness of credibility can be best observed when comparing RUN1 and RUN4. They share the same configuration, with the sole exception being the use of the predicted credibility scores for cluster ranking in RUN4. Although the difference is not as significant as on the devset, we can see a slight improvement in F1@20.</p></div>
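<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rank-based fusion used for RUN3 can be illustrated with a short sketch. The weighted sum of rank positions follows the description in the text. The handling of images that appear in only one ranking is an assumption, as are the function and parameter names.</p><p><code lang="python">
```python
def fuse_rankings(visual, textual, w_visual=0.5):
    """Fuse two rankings by a weighted sum of rank positions (lower is better).

    Images missing from one ranking are assigned that list's length as
    their rank there, which is an assumption of this sketch.
    """
    pos_v = {img: r for r, img in enumerate(visual)}
    pos_t = {img: r for r, img in enumerate(textual)}
    # Union of both lists, keeping first-seen order for stable tie-breaking.
    images = list(dict.fromkeys(visual + textual))

    def fused(img):
        rv = pos_v.get(img, len(visual))
        rt = pos_t.get(img, len(textual))
        return w_visual * rv + (1 - w_visual) * rt

    return sorted(images, key=fused)
```
</code></p><p>Working on ranks rather than raw scores sidesteps the mismatch in score ranges between the visual and textual runs noted above.</p></div>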
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">ACKNOWLEDGMENT</head><p>This research was supported by the MUCKE project, partly funded within the FP7 CHIST-ERA scheme.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1:</head><label>1</label><figDesc>Run performances with three official metrics</figDesc><table><row><cell>Run name</cell><cell>F1@20</cell><cell>P@20</cell><cell>CR@20</cell></row><row><cell>RUN1</cell><cell>0.5182</cell><cell>0.7313</cell><cell>0.4103</cell></row><row><cell>RUN2</cell><cell>0.5346</cell><cell>0.8089</cell><cell>0.4084</cell></row><row><cell>RUN3</cell><cell>0.5525</cell><cell>0.798</cell><cell>0.4335</cell></row><row><cell>RUN4</cell><cell>0.5243</cell><cell>0.7378</cell><cell>0.4157</cell></row><row><cell>RUN5</cell><cell>0.571</cell><cell>0.7931</cell><cell>0.4563</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://opencv.org/ Copyright is held by the author/owner(s).</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Retrieving diverse social images at MediaEval 2014: Challenge, dataset and evaluation</title>
		<author>
			<persName><forename type="first">B</forename><surname>Ionescu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MediaEval 2014 Workshop</title>
				<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">October 16-17 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Experiments in diversifying Flickr result sets</title>
		<author>
			<persName><forename type="first">N</forename><surname>Jain</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MediaEval 2013 Workshop</title>
				<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">October 18-19 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Caffe: An open source convolutional architecture for fast feature embedding</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Jia</surname></persName>
		</author>
		<ptr target="http://caffe.berkeleyvision.org" />
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>CoRR</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">CEA LIST&apos;s participation at the MediaEval 2013 retrieving diverse social images task</title>
		<author>
			<persName><forename type="first">A</forename><surname>Popescu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MediaEval 2013 Workshop</title>
				<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">October 18-19 2013</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
