<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">CERTH at MediaEval 2014 Synchronization of Multi-User Event Media Task</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Konstantinos</forename><surname>Apostolidis</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Information Technologies Institute</orgName>
								<orgName type="institution">CERTH</orgName>
								<address>
									<settlement>Thessaloniki</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Christina</forename><surname>Papagiannopoulou</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Information Technologies Institute</orgName>
								<orgName type="institution">CERTH</orgName>
								<address>
									<settlement>Thessaloniki</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Vasileios</forename><surname>Mezaris</surname></persName>
							<email>bmezaris@iti.gr</email>
							<affiliation key="aff0">
								<orgName type="department">Information Technologies Institute</orgName>
								<orgName type="institution">CERTH</orgName>
								<address>
									<settlement>Thessaloniki</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">CERTH at MediaEval 2014 Synchronization of Multi-User Event Media Task</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">629B33F37F7EC7522E925AC543BF4E4A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T16:12+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes the results of the CERTH participation in the Synchronization of Multi-User Event Media Task of MediaEval 2014. We used a near-duplicate image detector to identify very similar photos, which allowed us to temporally align the photo galleries. We then used time, geolocation and visual information, including the results of visual concept detection, to cluster all photos into different events.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>People attending large-scale social events collect dozens of photos and video clips with their smartphones, tablets and cameras. These are later exchanged and shared in a number of different ways. Aligning and presenting the photo galleries of different users consistently, so as to preserve the temporal evolution of the event, is not straightforward: the time information attached to some of the captured media may be wrong (because the capture devices are not synchronized with each other), and geolocation information may be missing. The 2014 MediaEval Synchronization of Multi-User Event Media (SEM) task tackles exactly this problem <ref type="bibr" target="#b1">[1]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">SYSTEM OVERVIEW</head><p>The main goal of our system is to temporally align photo galleries created by different digital photo capture devices and to cluster their photos into events. In the first stage, similar photos across the different galleries are identified and used to construct a graph whose nodes represent galleries and whose edges represent the discovered links between them. Time alignment of the galleries is achieved by traversing this graph. We then apply clustering techniques to split the collection into different events. Figure <ref type="figure" target="#fig_0">1</ref> shows the pipeline of our system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">TIME SYNCHRONIZATION</head><p>Time synchronization makes use of a Near Duplicate Detector (NDD) that extracts SIFT descriptors from the photos, forms a visual vocabulary and encodes the descriptor-based representation of each photo using VLAD encoding. The nearest neighbours returned for a query image are refined by checking the geometric consistency of their SIFT keypoints using geometric coding (GC) <ref type="bibr" target="#b4">[4]</ref>.</p><p>Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain</p><p>We further modified this NDD process to also use color information (HSV histograms), so that near-duplicate candidates that are very similar in color are not discarded even if their GC score is relatively low.</p><p>We apply the modified NDD to the union of all galleries. We then filter the identified pairs of near duplicates according to the following rules:</p><p>• Reject a pair when geolocation information is available for both photos and their location distance is greater than a distance threshold.</p><p>• Reject a pair when the time difference between the photos is above an extreme time threshold (a difference this large is unlikely to be due to a time synchronization error alone).</p><p>The remaining near-duplicate pairs whose photos belong to different galleries are considered links between those galleries.</p><p>It is then straightforward to construct a graph whose nodes represent the galleries and whose edges represent these links. Each edge has a weight equal to the number of links between the two galleries. Having constructed the graph, we compute the time offset of each gallery by traversing it, as follows. Starting from the node corresponding to the reference gallery, we select the edge with the highest weight. We compute the time offset of the node at the other end of this edge as the median of the time differences of the pairs of near-duplicate photos that this edge represents, and add this node to the set of visited nodes. The selection of the highest-weight edge (leading to a not-yet-visited node) is then repeated, considering any member of the set of visited nodes as a possible starting point, and the corresponding time offset is computed, until all nodes are visited. Alternatively, the graph can be traversed and the nodes' time offsets computed by simultaneously considering the weights of multiple edges.</p></div>
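<div xmlns="http://www.tei-c.org/ns/1.0"><p>The pair filtering and the greedy graph traversal described above can be sketched in Python as follows. This is a hypothetical illustration, not the code used for the submitted runs: all threshold values, the photo record fields (gallery, time, geo), the use of histogram intersection as the HSV color similarity, and every function name are assumptions, and the SIFT/VLAD retrieval and GC scoring are assumed to have already produced the candidate pairs and their scores. Offsets are in seconds and are meant to be added to a gallery's timestamps. Composing each new offset with that of the already-visited gallery (o_b = o_a + median of t_a - t_b) is an interpretation of the traversal rather than a detail stated above.</p>
<p>
```python
import math
import statistics
from collections import defaultdict

# Hypothetical values: the paper does not report the thresholds it used.
GC_MIN = 0.5               # minimum geometric-coding score for a match
HSV_MIN = 0.9              # color similarity that rescues a low-GC candidate
MAX_DIST_KM = 1.0          # geolocation distance threshold
MAX_TIME_DIFF_S = 86400.0  # "extreme" time-difference threshold (one day)


def hsv_similarity(h1, h2):
    """Histogram intersection of two L1-normalized HSV histograms (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def is_near_duplicate(gc_score, h1, h2):
    """Modified NDD rule: accept on GC score, or on very high color similarity."""
    return gc_score >= GC_MIN or hsv_similarity(h1, h2) >= HSV_MIN


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def keep_pair(p, q):
    """Apply the two rejection rules to a near-duplicate pair of photo records."""
    if p["geo"] is not None and q["geo"] is not None:
        if haversine_km(p["geo"], q["geo"]) > MAX_DIST_KM:
            return False
    return not abs(p["time"] - q["time"]) > MAX_TIME_DIFF_S


def align_galleries(pairs, reference):
    """Greedy highest-weight-edge traversal; returns {gallery: offset in seconds}."""
    # links[(a, b)] holds t_a - t_b for every surviving pair linking a and b;
    # the edge weight is the number of such links.
    links = defaultdict(list)
    for p, q in pairs:
        if p["gallery"] == q["gallery"] or not keep_pair(p, q):
            continue
        d = p["time"] - q["time"]
        links[(p["gallery"], q["gallery"])].append(d)
        links[(q["gallery"], p["gallery"])].append(-d)

    offsets = {reference: 0.0}
    while True:
        # Edges leading from a visited gallery to a not-yet-visited one.
        candidates = [(len(diffs), a, b) for (a, b), diffs in links.items()
                      if a in offsets and b not in offsets]
        if not candidates:
            break  # every gallery reachable from the reference is aligned
        _, a, b = max(candidates)  # highest weight; ties broken by gallery id
        # Aligned pairs satisfy t_a + o_a == t_b + o_b, so o_b = o_a + (t_a - t_b).
        offsets[b] = offsets[a] + statistics.median(links[(a, b)])
    return offsets


if __name__ == "__main__":
    # Three hypothetical galleries: B runs one hour ahead of A, C is 600 s behind B.
    a1 = {"gallery": "A", "time": 1000.0, "geo": None}
    a2 = {"gallery": "A", "time": 1400.0, "geo": None}
    b1 = {"gallery": "B", "time": 4600.0, "geo": None}
    b2 = {"gallery": "B", "time": 5000.0, "geo": None}
    c1 = {"gallery": "C", "time": 4000.0, "geo": None}
    print(align_galleries([(a1, b1), (a2, b2), (b1, c1)], "A"))
    # {'A': 0.0, 'B': -3600.0, 'C': -3000.0}
```
</p>
<p>In this toy example gallery B is aligned first, through its two links to the reference gallery A, and C is then reached through B. Galleries that are not connected to the reference by any surviving link are left without an offset. A priority queue keyed on edge weight would avoid rescanning all edges at every step; the linear scan simply keeps the sketch short.</p></div>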
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">MEDIA CLUSTERING OF EVENTS</head><p>Following time synchronization, we cluster all photos into events. Two different approaches are adopted: the first treats all photo galleries as a single photo collection, exploiting the synchronization results, while the second first performs a pre-clustering within each gallery separately.</p><p>In the first approach, we use the method of <ref type="bibr" target="#b2">[2]</ref>, which yields temporally distinct clusters corresponding to different events. Each of these clusters is then split based on the geolocation information. Photos that lack geolocation information are assigned to the most similar geo-cluster according to color information (e.g. HSV histograms). In the second approach, we detect time gaps between the events of each gallery. Specifically, we find the minimum time difference between dissimilar photos that is greater than the maximum time difference between near-duplicate photos (based on the GC similarity matrix). The clusters thus formed are merged according to time and geolocation similarity. For clusters that lack geolocation information, merging continues by considering either time and low-level feature similarity, or time and concept detector (CD) confidence similarity scores <ref type="bibr" target="#b3">[3]</ref>.</p></div>
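<div xmlns="http://www.tei-c.org/ns/1.0"><p>The per-gallery gap rule of the second approach and the color-based assignment of non-geotagged photos in the first approach can be sketched as follows. This is a hypothetical illustration, not the code used for the submitted runs. It assumes one reading of the gap rule (the threshold is the smallest dissimilar-pair time difference that exceeds every near-duplicate time difference, and a gallery is split wherever consecutive photos are at least that far apart), takes histogram intersection as the HSV similarity measure, and represents each geo-cluster by a single histogram. The function names and the is_near_dup helper are made up for the example; the temporal clustering method of [2] and the subsequent cluster merging are not reproduced.</p>
<p>
```python
from itertools import combinations


def gap_threshold(times, is_near_dup):
    """Smallest dissimilar-pair time difference exceeding every near-duplicate one.

    times: timestamps (seconds) of one gallery's photos.
    is_near_dup(i, j): True if photos i and j are near duplicates, e.g. as read
    off a thresholded GC similarity matrix (hypothetical helper).
    Returns None if no such difference exists (the gallery is not split).
    """
    pairs = list(combinations(range(len(times)), 2))
    max_nd = max((abs(times[i] - times[j]) for i, j in pairs if is_near_dup(i, j)),
                 default=0.0)
    larger = [abs(times[i] - times[j]) for i, j in pairs
              if not is_near_dup(i, j) and abs(times[i] - times[j]) > max_nd]
    return min(larger, default=None)


def split_gallery(times, threshold):
    """Split a gallery's photos (time-ordered indices) at gaps of at least threshold."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    clusters = []
    for k, idx in enumerate(order):
        new_cluster = k == 0 or (threshold is not None
                                 and times[idx] - times[order[k - 1]] >= threshold)
        if new_cluster:
            clusters.append([])
        clusters[-1].append(idx)
    return clusters


def hsv_similarity(h1, h2):
    """Histogram intersection of two L1-normalized HSV histograms (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def assign_to_geo_cluster(photo_hist, cluster_hists):
    """Pick the geo-cluster whose representative HSV histogram is most similar."""
    return max(cluster_hists, key=lambda c: hsv_similarity(photo_hist, cluster_hists[c]))


if __name__ == "__main__":
    times = [0.0, 10.0, 20.0, 1000.0, 1010.0]
    near_dups = {(0, 1), (3, 4)}  # index pairs from combinations() are ordered
    thr = gap_threshold(times, lambda i, j: (i, j) in near_dups)
    print(thr, split_gallery(times, thr))  # 20.0 [[0, 1, 2], [3, 4]]
    print(assign_to_geo_cluster([0.5, 0.5, 0.0],
                                {"g1": [0.0, 0.5, 0.5], "g2": [0.5, 0.4, 0.1]}))  # g2
```
</p>
<p>Deriving the threshold from the near-duplicate pairs makes it adapt to each gallery: bursts of near-identical shots never cause a split, while the first larger jump between dissimilar photos marks a candidate event boundary.</p></div>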
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">EXPERIMENTS AND RESULTS</head><p>We submitted 5 runs in total, combining 3 methods for time synchronization and 3 methods for event clustering:</p><p>• Run1: aNDD-perGallery-mergeCD. Compute gallery time offsets using our modified NDD. CD scores are used to merge clusters in the second approach of Section 4.</p><p>• Run2: aNDD-perGallery-mergeHSV. Compute gallery time offsets using our modified NDD. HSV histogram similarity is used to merge clusters in the second approach of Section 4.</p><p>• Run3: aNDD-concat. Compute gallery time offsets using our modified NDD. Clustering is performed using the first approach of Section 4.</p><p>• Run4: aNDD-multiT-perGallery-mergeCD. Compute gallery time offsets using our modified NDD, traversing the graph by simultaneously considering the weights of multiple edges. CD scores are used to merge clusters in the second approach of Section 4.</p><p>• Run5: NDD-perGallery-mergeCD. Compute gallery time offsets using the NDD without HSV color information. CD scores are used to merge clusters in the second approach of Section 4.</p><p>The results of all 5 runs for the Vancouver and London test sets are listed in Tables <ref type="table" target="#tab_1">1 and 2</ref>, respectively. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">CONCLUSIONS</head><p>This paper presented our framework and results for the MediaEval 2014 Synchronization of Multi-User Event Media Task. Our modified NDD approach gives the best time alignment results on the Vancouver test set, while the standard NDD yields slightly better time synchronization on the London test set. In sub-event clustering, exploiting the consistent timestamps within each gallery together with CD confidence scores performs well on the Vancouver test set (highest Rand Index), whereas HSV histogram similarity gives the best clustering results on the London test set.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: System overview</figDesc><graphic coords="2,53.80,53.80,502.13,119.06" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Time synchronization and clustering metrics for each run for the Vancouver test set.</figDesc><table><row><cell></cell><cell>run1</cell><cell>run2</cell><cell>run3</cell><cell>run4</cell><cell>run5</cell></row><row><cell>Sync. Precision</cell><cell>0.9118</cell><cell>0.9118</cell><cell>0.9118</cell><cell>0.5294</cell><cell>0.9118</cell></row><row><cell>Sync. Accuracy</cell><cell>0.7375</cell><cell>0.7375</cell><cell>0.7375</cell><cell>0.7014</cell><cell>0.7279</cell></row><row><cell>Rand Index</cell><cell>0.9770</cell><cell>0.9734</cell><cell>0.9526</cell><cell>0.9601</cell><cell>0.9656</cell></row><row><cell>Jaccard Index</cell><cell>0.2581</cell><cell>0.2315</cell><cell>0.2856</cell><cell>0.1782</cell><cell>0.2861</cell></row><row><cell>F-Measure</cell><cell>0.2052</cell><cell>0.1880</cell><cell>0.2222</cell><cell>0.1512</cell><cell>0.2225</cell></row></table></figure>
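<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Time synchronization and clustering metrics for each run for the London test set.</figDesc><table><row><cell></cell><cell>run1</cell><cell>run2</cell><cell>run3</cell><cell>run4</cell><cell>run5</cell></row><row><cell>Sync. Precision</cell><cell>0.6111</cell><cell>0.6111</cell><cell>0.6111</cell><cell>0.2222</cell><cell>0.6389</cell></row><row><cell>Sync. Accuracy</cell><cell>0.7127</cell><cell>0.7127</cell><cell>0.7127</cell><cell>0.6996</cell><cell>0.7299</cell></row><row><cell>Rand Index</cell><cell>0.9885</cell><cell>0.9910</cell><cell>0.9838</cell><cell>0.9829</cell><cell>0.9863</cell></row><row><cell>Jaccard Index</cell><cell>0.5051</cell><cell>0.5614</cell><cell>0.3232</cell><cell>0.2739</cell><cell>0.4849</cell></row><row><cell>F-Measure</cell><cell>0.3356</cell><cell>0.3596</cell><cell>0.2443</cell><cell>0.2150</cell><cell>0.3266</cell></row></table></figure>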
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">ACKNOWLEDGMENTS</head><p>This work was supported by the EC under contracts FP7-287911 LinkedTV and FP7-600826 ForgetIT.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Synchronization of Multi-User Event Media (SEM) at MediaEval 2014: Task Description, Datasets, and Evaluation</title>
		<author>
			<persName><forename type="first">N</forename><surname>Conci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>De Natale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Mezaris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. MediaEval Workshop</title>
				<meeting>MediaEval Workshop</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Temporal event clustering for digital photo collections</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cooper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Foote</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Girgensohn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wilcox</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="269" to="288" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Concept-based Image Clustering and Summarization of Event-related Image Collections</title>
		<author>
			<persName><forename type="first">C</forename><surname>Papagiannopoulou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Mezaris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Int. Workshop on Human Centered Event Understanding from Multimedia (HuEvent14) of ACM Multimedia (MM14)</title>
				<meeting>Int. Workshop on Human Centered Event Understanding from Multimedia (HuEvent14) of ACM Multimedia (MM14)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">SIFT match verification by geometric coding for large-scale partial-duplicate web image search</title>
		<author>
			<persName><forename type="first">W</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Tian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">18</biblScope>
			<date type="published" when="2013-02">Feb. 2013</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
