CERTH @ MediaEval 2011 Social Event Detection Task

Symeon Papadopoulos, Christos Zigkolis
1 CERTH, Thessaloniki; 2 Aristotle University, Thessaloniki, Greece
{papadop,chzigkol}@iti.gr

Yiannis Kompatsiaris
Informatics & Telematics Institute, CERTH, Thessaloniki, Greece
ikom@iti.gr

Athena Vakali
Informatics Department, Aristotle University, Thessaloniki, Greece
avakali@csd.auth.gr

Copyright is held by the author/owner(s).
MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy

ABSTRACT
This paper describes the participation of CERTH in the “Social Event Detection Task @ MediaEval 2011”, which aims at discovering social events in a large photo collection. The task comprises two challenges: (i) identification of soccer events in the cities of Barcelona and Rome, and (ii) identification of events taking place in two specific venues. We adopt an approach that combines spatial and temporal filters with tag-based location classification models and an efficient photo clustering method. In our best runs, we achieve F-measure and NMI scores of 77.4% and 0.63 respectively for Challenge 1, and 64% and 0.38 for Challenge 2.

Categories and Subject Descriptors
H.3 [Information Search and Retrieval]

1. INTRODUCTION
In this paper we present our system, experiments, and conclusions in the context of the MediaEval 2011 Social Event Detection (SED) Task. The SED Task, which is described in detail in [1], provides a collection of 73,645 tagged photos from Flickr and requests the detection of two types of social events. Challenge 1 pertains to the detection of soccer events in the cities of Barcelona and Rome. Challenge 2 asks for events taking place in Paradiso (Amsterdam) and Parc del Forum (Barcelona). The task considers a social event as a group of photos capturing some aspect of a certain event. Formally, given the collection P = {p} of photos, the task asks for the detection of K events {E_i | E_i ⊂ P}, i = 1, ..., K.

2. SED APPROACH
We employed a common approach for tackling both challenges. Figure 1 illustrates its main steps: (a) photo filtering, (b) event partitioning, and (c) event expansion.

Figure 1: Proposed SED approach

2.1 Photo filtering
This step is implemented through the cascaded combination of two classifiers. The first classifier is a city-level classifier that is focused on the five cities of the SED dataset. For those photos that have geotagging information associated with them (approximately 20% of the photos [1]), the classifier simply assigns the nearest city to the photo (geodesic distance is used for ranking). For the non-geotagged photos, the classifier employs a tag-based matching scheme: it counts the number of city-specific tags in the textual metadata (title, description) of the photo for each dataset city and selects the city with which the photo shares the maximum number of city-specific tags. The city tags are automatically derived from statistical analysis of tags of city photos collected independently from Flickr. If the classifier assigns a photo to a city that is not of interest for the challenge at hand, then this photo is not further considered (but is not excluded from the event expansion step described in subsection 2.3).

Subsequently, a finer-grained classifier is employed for selecting only the photos that are related to the topic/entity of interest. For Challenge 1, a soccer classifier was created, while a venue classifier was employed for Challenge 2. Both classifiers rely on an approach similar to the one described for the city classification. In both cases, appropriate tag models were used (a soccer model and venue models, one for each of the venues specified by the task); these are further described in Section 3.

2.2 Event partitioning
We define a single event by a date-place combination. For that reason, we first enumerate all unique dates that appear in the set of photos collected from the photo filtering step described above. For each unique date, we consider a distinct event, except for the dates for which there are photos classified to more than one city of interest. For these dates, one distinct event is considered for each different city with which at least one photo is associated. At the end of this step, a list of events is available and each event of this list is associated with a set of photos.

2.3 Event expansion
Each event produced by the event partitioning step is enriched by making use of the metadata of the photos associated with it. A first expansion is carried out by adding photos of the same user on the same day as the event. Next, photos with geotagging information that are located in the vicinity of the event (within a radius of 200 m) are also added to the event under consideration. Finally, an additional list of photos related to the event is discovered by clustering the photo collection and selecting the photos of the same cluster, under the constraint that their creators/owners are already associated with the event through at least one photo. The photo collection is clustered by means of a community detection scheme that is applied on a visual, tag or hybrid similarity graph [2].

3. EXPERIMENTS
We present a set of 10 experiments that evaluate the performance of the system under a variety of configurations. Tables 1 and 2 summarize the results obtained from the official submission of the five runs to Challenges 1 and 2 respectively. In both tables, the run number and the selected parameters are listed together with the achieved performance scores: Normalized Mutual Information (NMI), Precision (P), Recall (R) and F-measure (F), which are described in [1]. For Challenge 1, all three parameters appearing in Figure 1 are studied, while for Challenge 2, only the first and third parameters are studied.

The first parameter (p_1) pertains to the tag model used for filtering out irrelevant photos. Two different soccer tag models were used for Challenge 1, i.e. p_1 ∈ {m_{1,b}, m_{1,+}}, where m_{1,b} is the baseline soccer tag model containing generic soccer tags as well as tags consisting of Spanish and Italian football club names. The extended soccer tag model (m_{1,+}) additionally contains alternative team names (e.g. “Blaugrana” for Barcelona FC) and stadium names. For Challenge 2, a similar selection was available: the baseline venue tag model (m_{2,b}) consisted of a few tags with generic music event terms (e.g. “concert”, “gig”) as well as the names of the two venues of interest. The extended model (m_{2,+}) was enriched with the names of the bands playing in these venues in May 2009, which were retrieved by use of the last.fm API.

The second parameter (p_2) regards the use of description along with the photo title/tags (p_2 ∈ {tt, ttd}), with ttd denoting the use of description in addition to title/tags (tt). The third parameter (p_3) regards the use of clustering for event expansion (p_3 ∈ {∅, T, V, H}), where the available options were tag-based (T), visual (V) or hybrid (H) clusters (the latter produced by graph-based clustering on graphs comprising both tag-based and visual similarities), or no clusters at all (∅).

Table 1: Results for Challenge 1

run  p_1      p_2  p_3  NMI     P (%)  R (%)  F (%)
1    m_{1,b}  ttd  T    0.3742  57.66  62.50  59.98
2    m_{1,+}  tt   ∅    0.5707  90.58  67.58  77.40
3    m_{1,+}  tt   T    0.6180  90.58  67.58  77.40
4    m_{1,+}  tt   V    0.5748  89.18  67.58  76.89
5    m_{1,+}  ttd  T    0.6301  94.63  65.43  77.37

Table 2: Results for Challenge 2

run  p_1      p_3  NMI     P (%)  R (%)  F (%)
1    m_{2,b}  ∅    0.2516  51.36  48.85  50.08
2    m_{2,b}  T    0.2629  50.58  48.85  49.70
3    m_{2,b}  V    0.2527  51.27  48.85  50.03
4    m_{2,b}  H    0.2646  50.58  48.85  49.70
5    m_{2,+}  H    0.3796  54.31  77.90  64.00

4. DISCUSSION
The first observation from Tables 1 and 2 is the importance of using an appropriate tag model for photo classification. A significant improvement in all evaluation measures is achieved by use of an enriched tag model. In Challenge 1, this is clearly visible by comparing runs 1 and 5, while in Challenge 2 it is demonstrated by comparing runs 4 and 5. This highlights the value of rich domain knowledge in the reliable detection of social events in photo collections. In addition, when the extended soccer tag model is used, adding the description (ttd) to the title and tags of photos appears to improve the NMI and precision of our system, at the cost of slightly lower recall (compare runs 3 and 5 in Table 1).

Finally, the use of image clustering appears to be of limited value to the system. In Challenge 1, there is a modest improvement in the obtained NMI when using the tag-based clusters (run 3 versus run 2), and a marginal improvement when using the visual clusters. However, a slight drop in precision is observed when using the visual clusters (run 4 versus run 2). Similar results are obtained for Challenge 2. We attribute this result to the fact that most of the potential gains of clustering are already captured by the user-based event expansion (used in all runs).

In conclusion, the experiments indicate the importance of textual metadata of photos in combination with rich domain knowledge for the effective detection of events in large photo collections. In the future, we plan a more comprehensive analysis of false positives and false negatives in order to further improve the system performance.

Acknowledgments
This work has been supported by the GLOCAL EC project under contract number FP7-248984.

5. REFERENCES
[1] S. Papadopoulos, R. Troncy, V. Mezaris, B. Huet, and I. Kompatsiaris. Social event detection at MediaEval 2011: Challenges, dataset and evaluation. In MediaEval 2011 Workshop, Pisa, Italy, September 1-2 2011.
[2] S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, and A. Vakali. Cluster-based landmark and event detection for tagged photo collections. Multimedia, IEEE, 18(1):52–63, Jan. 2011.
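
As an illustration of the city-level photo filtering (Section 2.1) and the date-place event partitioning (Section 2.2), the following is a minimal Python sketch and not the system used for the submitted runs. Several things in it are assumptions made for the example: the photo dictionary fields (id, date, geo, title, description, tags), the approximate city-centre coordinates, the tiny hand-written city tag sets (the paper derives its tag models statistically from Flickr), the inclusion of photo tags in the matched text, and the use of the haversine great-circle formula as a stand-in for geodesic distance. Only three of the five dataset cities are listed, and the second-stage soccer/venue classifier is omitted.

```python
import math
from collections import defaultdict

# Approximate city-centre coordinates (lat, lon); illustrative assumptions.
CITY_CENTRES = {
    "amsterdam": (52.37, 4.90),
    "barcelona": (41.39, 2.17),
    "rome": (41.90, 12.50),
}

# Hypothetical city tag models; stand-ins for the statistically derived ones.
CITY_TAGS = {
    "amsterdam": {"amsterdam", "amstel"},
    "barcelona": {"barcelona", "bcn", "ramblas"},
    "rome": {"rome", "roma", "colosseum"},
}


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def classify_city(photo):
    """Return the city assigned to a photo dict, or None if nothing matches."""
    if photo.get("geo") is not None:
        # Geotagged photo: assign the nearest city centre.
        return min(CITY_CENTRES,
                   key=lambda c: great_circle_km(photo["geo"], CITY_CENTRES[c]))
    # Non-geotagged photo: count city-specific tags in the textual metadata.
    text = " ".join([photo.get("title", ""), photo.get("description", "")])
    tokens = set(text.lower().split()) | {t.lower() for t in photo.get("tags", [])}
    best = max(CITY_TAGS, key=lambda c: len(tokens & CITY_TAGS[c]))
    return best if tokens & CITY_TAGS[best] else None


def partition_events(photos, cities_of_interest):
    """Group photos of the cities of interest into (date, city) events."""
    events = defaultdict(list)
    for photo in photos:
        city = classify_city(photo)
        if city in cities_of_interest:
            events[(photo["date"], city)].append(photo["id"])
    return dict(events)
```

Keying events by (date, city) reproduces the rule of Section 2.2: a date with photos from a single city of interest yields one event, while a date with photos from several cities yields one event per city. Calling partition_events(photos, {"barcelona", "rome"}) corresponds to the city filter of Challenge 1.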