=Paper=
{{Paper
|id=Vol-1263/paper58
|storemode=property
|title=UPC at MediaEval 2014 Social Event Detection Task
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_58.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/Manchon-VizueteGN14
}}
==UPC at MediaEval 2014 Social Event Detection Task==
Daniel Manchon-Vizuete, Irene Gris-Sarabia and Xavier Giro-i-Nieto
Universitat Politecnica de Catalunya
Barcelona, Catalonia
xavier.giro@upc.edu
Copyright is held by the author/owner(s).
MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain

ABSTRACT
This working notes paper presents the contribution of the UPC team to the Social Event Detection (SED) Subtask 1 in MediaEval 2014. This contribution extends the solution tested in the previous year with a better optimization of the parameters that determine the clustering algorithm, and by introducing an additional pass that considers the merges of all pairs of mini-clusters generated during the first two passes. Our proposal also addresses the problem of incomplete metadata by generating additional textual tags based on geolocation and natural language processing techniques.

1. MOTIVATION
This document describes the algorithms tested by the UPC team in the MediaEval 2014 Social Event Detection (SED) Subtask 1, which addressed the problem of full clustering of a photo collection. The reader is referred to [6] for the task description in MediaEval 2014 and further details about the study case, dataset and metrics.

The proposed approach extends our submission to the previous MediaEval SED 2013 [5]. That work solved the Full Clustering subtask by first generating a temporal-based over-segmentation of the photo collection as proposed by PhotoTOC [7]. In a second pass, the mini-clusters in a close temporal neighbourhood were clustered based on geolocation, textual tags and user ID metadata. That solution presented difficulties when addressing the realistic scenario where many photos had missing metadata and noisy tags.

We observed that the images in the provided collection are heterogeneous in terms of the type of metadata associated with them. Some of the photos in the dataset contained geolocation data, while others did not. Users also show diverse behaviour regarding photo tagging: while some provide textual tags for their pictures, others just share the photos with no tags at all.

This situation generates an unbalanced description of the photos in the dataset. In our previous approach [5], we applied different similarity functions when assessing the merging of mini-clusters, depending on which metadata was available for the mini-clusters.

In addition, when tags are present, there is also a problem related to natural language processing. Textual tags have a semantic dimension that was not captured in our previous approach, where we compared tag similarity solely based on the string of characters, with no semantic interpretation at all. That approach fails to detect related tags and synonyms, as they are considered completely different.

In this MediaEval SED 2014 edition, we addressed these problems by assessing the merges between mini-clusters based on the textual tags only. This way it was not necessary to define and train different merging strategies depending on the available metadata. This approach required the introduction of strategies to populate the tags associated with the photos from the available data. In addition, in this year's submission, we also optimized the merging of mini-clusters by adopting a three-pass approach, instead of the two-pass one used in the previous year.

This paper is structured as follows. Section 2 presents the novelties that have been introduced this year compared to our previous submission. The performance of the solution is assessed in Section 3 with the results obtained on the MediaEval SED 2014 task. Finally, Section 4 provides the insights learned from this work.

2. APPROACH
This section presents the new advances we have introduced in our algorithm with respect to the solution submitted to MediaEval SED 2013 [5].

2.1 Text-based cluster merging
While in the previous year mini-clusters were merged with a similarity that combined geolocation, tags and user IDs, in 2014 we have used only the textual tags of the mini-clusters to assess their merges. In addition, instead of using the Jaccard Index on the tags to assess the similarity between mini-clusters, this year we have switched to TF-IDF descriptors, compared with the cosine distance. These changes allow a better estimate of the relevance of each term with respect to the others.

2.2 Reverse geocoding
A first solution to increase the number of textual tags was to use the geolocation coordinates to obtain the names of the locations where the photos were taken. This strategy is known as reverse geocoding, and it transforms numerical geolocation coordinates into a readable text description. Our system has used the MapQuest Open Geocoding API Web Service, which provides easy access to data stored under the OpenStreetMap (OSM) project [3].
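
As a rough illustration of Sections 2.1 and 2.2, the following sketch turns a reverse-geocoded address into extra tags and compares mini-clusters with TF-IDF vectors and cosine similarity (one minus the cosine distance). It is not the code used in the submission. No request is sent to a geocoding service: the address dictionaries stand in for a geocoder response, and their field names are assumptions rather than the MapQuest schema. The smoothed IDF weighting is one common variant chosen for the example, and all function names are hypothetical.

```python
import math
from collections import Counter


def address_to_tags(address):
    """Turn a reverse-geocoded address (hypothetical dict) into lowercase tags."""
    # The field names below are assumptions made for this illustration only.
    fields = ("city", "state", "country")
    return [address[f].lower() for f in fields if address.get(f)]


def tfidf_vectors(clusters):
    """Map each mini-cluster (a list of tags) to a sparse TF-IDF vector."""
    n = len(clusters)
    # Document frequency: number of mini-clusters that contain each tag.
    df = Counter(tag for tags in clusters for tag in set(tags))
    vectors = []
    for tags in clusters:
        tf = Counter(tags)
        # Smoothed IDF (assumed variant): log((1 + n) / (1 + df)) + 1.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy mini-clusters: the first two share an event but no user-provided tag.
user_tags = [["concert", "rock"], ["live", "music"], ["wedding"]]
# Made-up geocoder answers, one per mini-cluster.
addresses = [
    {"city": "Barcelona", "country": "Spain"},
    {"city": "Barcelona", "country": "Spain"},
    {"city": "Paris", "country": "France"},
]

vecs = tfidf_vectors(user_tags)
before = cosine_similarity(vecs[0], vecs[1])

enriched = [tags + address_to_tags(a) for tags, a in zip(user_tags, addresses)]
vecs = tfidf_vectors(enriched)
after = cosine_similarity(vecs[0], vecs[1])

print("similarity without place tags:", before)
print("similarity with place tags:", after)
```

In this toy case the first two mini-clusters are unrelated under the user tags alone and become comparable once the place names are added, which is the effect that reverse geocoding is intended to have on photos with few or no tags.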
2.3 Tag expansion with hypernyms
One of the main challenges when dealing with textual tags is their semantic interpretation. The case of synonyms cannot be captured with a character-based descriptor. It requires the introduction of some semantic information, typically coded in an ontology.

Another novelty this year has been the enrichment of the tag set with hypernyms, that is, those terms that provide a generalization of each original term. In this way we have tried to better capture synonymous terms, as related terms often share a hypernym. The Natural Language Toolkit (NLTK) [4] has been used for this task, which exploits the knowledge contained in WordNet [2].

2.4 Three clustering passes
The creation of clusters begins with a sequential process in the temporal dimension, as in our previous submission in 2013, which was inspired by the original PhotoTOC algorithm [7]. This approach allows a fast convergence of the problem, as clusters are initially only merged in a local time-based neighbourhood.

In particular, our approach defines three passes. In the first pass, the photos of each user are treated separately, comparing the time stamps of the photos in a local neighbourhood. In a second pass, the mini-clusters generated from the first pass are also compared within a small temporal neighbourhood, using a relaxed similarity threshold assessed with the TF-IDF descriptors of each mini-cluster. At this stage, the user ID associated with each photo is also added as an additional textual tag. Finally, a third pass considers all pairs of existing mini-clusters in a global neighbourhood. In this case, however, the similarity threshold for merging is much stricter.

3. EXPERIMENTS AND RESULTS
The UPC participated in Subtask 1 with the results shown in Table 1. The runs are described as follows:

Run 1: No tags were added. The parameters that optimized the clustering algorithm were found with the hyperopt Python package [1].

Run 2: Configuration from Run 1 was modified by adding textual tags obtained with reverse geocoding, as described in Section 2.2.

Run 3: Configuration from Run 1 was modified by adding textual tags obtained with the hypernyms provided by the NLTK Python package [4], as described in Section 2.3.

Run 4: Configurations from Run 2 and Run 3 were combined.

{| class="wikitable"
|+ Table 1: UPC results in Subtask 1.
! Run !! GeoTag !! Hypernym !! F1 !! NMI !! Div. F1
|-
| Run 1 || || || 0.9240 || 0.9820 || 0.9231
|-
| Run 2 || • || || 0.9165 || 0.9793 || 0.9155
|-
| Run 3 || || • || 0.8141 || 0.9432 || 0.8127
|-
| Run 4 || • || • || 0.8112 || 0.9393 || 0.8097
|}

4. CONCLUSIONS
The UPC contribution to the Social Event Detection task in MediaEval 2014 has been an extension of the approach previously tested in the 2013 edition [5]. The implemented algorithm provides a fast solution for photo clustering given its time-based first and second passes, which compare photos and mini-clusters in a local neighbourhood. A first improvement over the previous submission has been the optimization of the parameters which control the creation of these mini-clusters. The main novelty has been introduced in the merging of these mini-clusters which, instead of combining different modalities (tags, geolocation and user ID), has been based solely on textual tags. These textual tags have been extended in two directions: by reverse geocoding and with hypernyms. However, the results obtained on the test dataset do not show any improvement from the introduction of the new tags. In fact, performance decreases, especially when using the hypernyms (F1 drops from 0.9240 in Run 1 to 0.9165 with the geocoded tags and to 0.8141 with the hypernyms). This behaviour on the test dataset was unexpected, as an improvement had been observed during the development of our solution with the training dataset.

5. ACKNOWLEDGMENTS
This work has been developed in the framework of the project TEC2013-43935-R, funded by the Spanish Ministerio de Economia y Competitividad and the European Regional Development Fund (ERDF).

6. REFERENCES
[1] J. Bergstra, D. Yamins, and D. D. Cox. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In ICML (1), volume 28 of JMLR Proceedings, pages 115–123. JMLR.org, 2013.

[2] C. Fellbaum. WordNet. Wiley Online Library, 1998.

[3] M. Haklay and P. Weber. OpenStreetMap: User-generated street maps. Pervasive Computing, IEEE, 7(4):12–18, 2008.

[4] E. Loper and S. Bird. NLTK: The natural language toolkit. In Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics - Volume 1, ETMTNLP '02, pages 63–70, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.

[5] D. Manchon-Vizuete, I. Gris-Sarabia, and X. Giro-i-Nieto. Photo clustering of social events by extending PhotoTOC to a rich context. In Proc. Social Events in Web Multimedia (SEWM), 2014.

[6] G. Petkos, S. Papadopoulos, V. Mezaris, and Y. Kompatsiaris. Social Event Detection at MediaEval 2014: Challenges, datasets, and evaluation. In MediaEval 2014 Workshop, Barcelona, Spain, October 16-17 2014.

[7] J. C. Platt, M. Czerwinski, and B. Field. PhotoTOC: automatic clustering for browsing personal photographs. In Proc. 4th Pacific Rim Conference on Multimedia, volume 1, pages 6–10, 2003.