=Paper=
{{Paper
|id=Vol-1263/paper40
|storemode=property
|title=CERTH at MediaEval 2014 Synchronization of Multi-User Event Media Task
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_40.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/ApostolidisPM14
}}
==CERTH at MediaEval 2014 Synchronization of Multi-User Event Media Task==
Konstantinos Apostolidis, Christina Papagiannopoulou, Vasileios Mezaris
Information Technologies Institute, CERTH, Thessaloniki, Greece
{kapost, cppapagi, bmezaris}@iti.gr
Copyright is held by the author/owner(s).
MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain

ABSTRACT
This paper describes the results of the CERTH participation in the Synchronization of Multi-User Event Media Task of MediaEval 2014. We used a near duplicate image detector to identify very similar photos, which allowed us to temporally align photo galleries; we then used time, geolocation and visual information, including the results of visual concept detection, to cluster all photos into different events.

1. INTRODUCTION
People attending large-scale social events collect dozens of photos and video clips with their smartphones, tablets and cameras. These are later exchanged and shared in a number of different ways. The alignment and presentation of the photo galleries of different users in a consistent way, so as to preserve the temporal evolution of the event, is not straightforward, considering that the time information attached to some of the captured media may be wrong (due to different photo capturing devices not being synchronized) and geolocation information may be missing. The 2014 MediaEval Synchronization of Multi-user Event Media (SEM) task tackles this exact problem [1].

2. SYSTEM OVERVIEW
The main goal of our system is the time alignment of photo galleries that are created by different digital photo capture devices, and the clustering of these into event-related clusters. In the first stage, similar photos of the different galleries are identified and are used for constructing a graph, whose nodes represent galleries and whose edges represent discovered links between them. Time alignment of the galleries is achieved by traversing the graph. After that, we apply clustering techniques in order to split our collection into different events. Figure 1 shows the pipeline of our system.

3. TIME SYNCHRONIZATION
Time synchronization makes use of a Near Duplicate Detector (NDD) that extracts SIFT descriptors from the photos, forms a visual vocabulary and encodes the descriptor-based representation of each photo using VLAD encoding. The nearest neighbours that are returned for a query image are refined by checking the geometrical consistency of SIFT keypoints using geometric coding (GC) [4]. We further modified this NDD process to also use color information (HSV histograms), so that near duplicate candidates that are very similar in color are not discarded even if the GC score is relatively low.

We apply the modified NDD on the union of all galleries. Subsequently, we filter out identified pairs of near duplicates according to the following rules:
• Reject pairs when geolocation information is available and the location distance of the two photos is greater than a distance threshold.
• Reject pairs when the time difference between the photos is above an extreme time threshold (which indicates that this time difference is unlikely to be due to a time synchronization error alone).

The remaining near duplicate photos belonging to different galleries are considered as links between those galleries. It is now straightforward to construct a graph whose nodes represent the galleries and whose edges represent these links between galleries. Each edge has a weight which is equal to the number of links between the two galleries. Having constructed the graph, we compute the time offset of each gallery by traversing it, as follows. Starting from the node corresponding to the reference gallery, we select the edge with the highest weight. We compute the time offset of the node on the other end of this edge as the median of the time differences of the pairs of near duplicate photos that this edge represents, and add this node to the set of visited nodes. The selection of the edge with the highest weight is repeated, considering as possible starting point any member of the set of visited nodes, and the corresponding time offset is computed, until all nodes are visited. Alternatively, we can traverse the graph and compute the nodes' time offsets by simultaneously considering the weights of multiple edges.

4. MEDIA CLUSTERING OF EVENTS
Following time synchronization, we cluster all photos into events. Two different approaches are adopted: the first one considers all photo galleries as a single photo collection, exploiting the synchronization results, while the second one first performs a pre-clustering within each gallery separately.

In the first approach, we use the method of [2], resulting in clusters that are time distinct, comprising different events. Subsequently, each of these clusters is split based on the geolocation information. The photos that do not have geolocation information are assigned to the geo-cluster that is most similar according to the color information (e.g. HSV histogram).
Figure 1: System overview
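The greedy graph traversal described in Section 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `gallery_offsets`, the tuple layout of `links`, and the use of seconds as the time unit are assumptions made for illustration. The sketch also assumes that the offset of a gallery not directly linked to the reference is obtained by adding the median time difference along the chosen edge to the offset of the already visited gallery, which the text does not state explicitly.

```python
from collections import defaultdict
from statistics import median


def gallery_offsets(links, reference):
    """Estimate one time offset per gallery, relative to `reference`.

    `links` is an iterable of (gallery_a, time_a, gallery_b, time_b) tuples,
    one per retained near-duplicate pair (hypothetical layout; times in seconds).
    Adding the returned offset to a gallery's timestamps aligns it with the
    reference gallery's clock.
    """
    # diffs[(u, v)] holds time_u - time_v for every link between u and v;
    # the edge weight is the number of links, i.e. len(diffs[(u, v)]).
    diffs = defaultdict(list)
    for ga, ta, gb, tb in links:
        if ga == gb:
            continue  # only links between different galleries are used
        diffs[(ga, gb)].append(ta - tb)
        diffs[(gb, ga)].append(tb - ta)

    offsets = {reference: 0.0}  # visited galleries and their offsets
    while True:
        # Pick the heaviest edge leading from a visited to an unvisited gallery.
        best = None
        for (u, v), d in diffs.items():
            if u in offsets and v not in offsets:
                if best is None or len(d) > len(diffs[best]):
                    best = (u, v)
        if best is None:
            break  # every reachable gallery has been visited
        u, v = best
        # Assumption: chain the median difference onto the visited node's offset.
        offsets[v] = offsets[u] + median(diffs[best])
    return offsets


if __name__ == "__main__":
    links = [
        ("A", 110, "B", 100), ("A", 212, "B", 200), ("A", 400, "B", 300),
        ("B", 305, "C", 300), ("B", 407, "C", 400),
        ("A", 550, "C", 500),
    ]
    print(gallery_offsets(links, reference="A"))
```

In this toy input, gallery C is reached through B (two links) rather than directly from A (one link), and the median keeps the outlying A-B pair from dominating the offset of B. Galleries that share no link path with the reference are simply absent from the result.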
In the second approach, we detect time gaps between events of each gallery. Specifically, we find the minimum time difference of dissimilar photos which is greater than the maximum time difference of the near-duplicate photos (based on the similarity matrix of GC). The clusters that are formed are merged according to time and geolocation similarity. For the clusters that do not have geolocation information, the merging is continued by considering the time and low-level feature similarity or the time and the concept detector (CD) confidence similarity scores [3].

5. EXPERIMENTS AND RESULTS
We submitted 5 runs in total, combining 3 methods for time synchronization and 3 methods for event clustering:
• Run1: aNDD-perGallery-mergeCD: Compute gallery time offsets using our modified NDD. CD scores are used to merge clusters using the second approach of Section 4.
• Run2: aNDD-perGallery-mergeHSV: Compute gallery time offsets using our modified NDD. HSV histogram similarity is used to merge clusters using the second approach of Section 4.
• Run3: aNDD-concat: Compute gallery time offsets using our modified NDD. Clustering is performed using the first approach of Section 4.
• Run4: aNDD-multiT-perGallery-mergeCD: Compute gallery time offsets using our modified NDD and traversal of the graph by simultaneously considering the weights of multiple edges. CD scores are used to merge clusters using the second approach of Section 4.
• Run5: NDD-perGallery-mergeCD: Compute gallery time offsets using NDD without HSV color information. CD scores are used to merge certain events using the second approach of Section 4.

The results of our approach for all 5 runs, for the Vancouver testset and the London testset, are listed in Tables 1 and 2 respectively.

Table 1: Time Synchronization and Clustering metrics for each run for the Vancouver testset.
                   run1    run2    run3    run4    run5
Sync. Precision    0.9118  0.9118  0.9118  0.5294  0.9118
Sync. Accuracy     0.7375  0.7375  0.7375  0.7014  0.7279
Rand Index         0.9770  0.9734  0.9526  0.9601  0.9656
Jaccard Index      0.2581  0.2315  0.2856  0.1782  0.2861
F-Measure          0.2052  0.1880  0.2222  0.1512  0.2225

Table 2: Time Synchronization and Clustering metrics for each run for the London testset.
                   run1    run2    run3    run4    run5
Sync. Precision    0.6111  0.6111  0.6111  0.2222  0.6389
Sync. Accuracy     0.7127  0.7127  0.7127  0.6996  0.7299
Rand Index         0.9885  0.9910  0.9838  0.9829  0.9863
Jaccard Index      0.5051  0.5614  0.3232  0.2739  0.4849
F-Measure          0.3356  0.3596  0.2443  0.2150  0.3266

6. CONCLUSIONS
This paper presented our framework and results at the MediaEval 2014 Synchronization of Multi-User Event Media Task. Our modified NDD approach gives the best results in time alignment for the Vancouver testset, while the standard NDD yields a slightly better time synchronization for the London testset. In sub-event clustering, the exploitation of consistent timestamps in a gallery and the use of CD confidence scores give good performance for the Vancouver testset, whereas HSV histogram similarity seems to give the best clustering results for the London testset.

7. ACKNOWLEDGMENTS
This work was supported by the EC under contracts FP7-287911 LinkedTV and FP7-600826 ForgetIT.

8. REFERENCES
[1] N. Conci, F. De Natale, and V. Mezaris. Synchronization of Multi-User Event Media (SEM) at MediaEval 2014: Task Description, Datasets, and Evaluation. In Proc. MediaEval Workshop, 2014.
[2] M. Cooper, J. Foote, A. Girgensohn, and L. Wilcox. Temporal event clustering for digital photo collections. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 1(3):269–288, 2005.
[3] C. Papagiannopoulou and V. Mezaris. Concept-based Image Clustering and Summarization of Event-related Image Collections. In Proc. Int. Workshop on Human Centered Event Understanding from Multimedia (HuEvent14) of ACM Multimedia (MM14), 2014.
[4] W. Zhou, H. Li, Y. Lu, and Q. Tian. SIFT match verification by geometric coding for large-scale partial-duplicate web image search. ACM Trans. Multimedia Comput. Commun. Appl., 9(1):4:1–4:18, Feb. 2013.