=Paper=
{{Paper
|id=None
|storemode=property
|title=MediaEval Benchmark: Social Event Detection in collaborative photo collections
|pdfUrl=https://ceur-ws.org/Vol-807/Brenner_SED_QMUL_me11wn.pdf
|volume=Vol-807
|dblpUrl=https://dblp.org/rec/conf/mediaeval/BrennerI11
}}
==MediaEval Benchmark: Social Event Detection in collaborative photo collections==
MediaEval Benchmark: Social Event Detection in Collaborative Photo Collections

Markus Brenner, Ebroul Izquierdo
School of Electronic Engineering and Computer Science
Queen Mary University of London, London E1 4NS, UK
{markus.brenner, ebroul.izquierdo}@eecs.qmul.ac.uk

ABSTRACT
In this paper, we present an approach to detect social events in collaboratively annotated photo collections as part of the MediaEval Benchmark. We combine various information from tagged photos with external data sources to train a classification model. Experiments based on the MediaEval Social Event Detection Dataset demonstrate the effectiveness of our approach.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; H.3.3 [Information Systems]: Information Storage and Retrieval

General Terms
Design, Experimentation, Performance

Keywords
Benchmark, Photo Collections, Classification, Event Detection

1. INTRODUCTION
The Internet enables people to host, access and share their photos online, e.g. through websites like Flickr. Collaborative annotations and tags are commonplace on such services. The information people assign varies greatly, but often seems to include some kind of reference to what happened where and who was involved. In other words, such references describe observed experiences or occurrences that are simply referred to as events [7]. In order to enable users to explore such events in their photo collections, effective approaches to detect events and group the corresponding photos are needed. The MediaEval Social Event Detection (SED) Benchmark [4] provides a platform to compare different such approaches.

1.1 Background
There is increasing research in the area of event detection in web resources in general. The subdomain we focus on is photo websites, where users can collaboratively annotate their photos. Recent research like [1] puts emphasis on detecting events from Flickr photos by primarily exploiting user-supplied tags. [6] and [5] extend this to place semantics, the latter incorporating the visual similarity among photos as well. Our aim, however, is to also use information from external sources to find photos corresponding to the same events. [2] is an example that goes further in our direction by exploiting Wikipedia classes.

1.2 Objective
In this work we present an approach that utilizes external sources to detect social events and group the applicable photos in collaborative photo collections such as Flickr. The approach is tailored to the two challenges laid out by the MediaEval SED Benchmark: the goal of Challenge I relates to soccer events taking place in two given cities, and that of Challenge II to events at two given (music) venues during a given month.

The remainder of this paper is structured as follows: In the next section we set forth how we gather relevant external information and describe the feature extraction from the photos. Then, we explain the design of our classifier-based approach. Using experiments, we test and discuss the overall framework and present our conclusions.

2. GATHERING EXTERNAL DATA

2.1 Challenge I: Soccer Matches
Our strategy for detecting soccer events (or matches) is to first find all soccer clubs and associated stadiums for the given cities in the challenge query. We automatically retrieve this information from DBpedia by means of its SPARQL interface. For each soccer club, we also gather its club name and nickname. Similarly, we request alternative names for the stadiums as well as any location information available. For simplicity, we limit ourselves to bigger soccer events by considering only those clubs whose home stadiums have a capacity of at least 20,000 people.

To our knowledge, there is no public dataset or web service that provides all-encompassing statistics related to the world of sports. For soccer alone, there are a few dedicated websites, one of which is playerhistory.com. The website does not provide an API, and thus we manually navigate and parse its webpages to retrieve the date and opposing team of all matches against any of the home teams found earlier on.
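The paper does not publish its actual queries; the following minimal Python sketch, using the SPARQLWrapper library, illustrates how such a DBpedia lookup could be issued. The endpoint URL and the ontology terms (dbo:SoccerClub, dbo:ground, dbo:capacity, dbo:location, dbp:nickname) are assumptions drawn from the present-day DBpedia vocabulary, not the paper's own query.

<pre>
# Illustrative sketch only: look up soccer clubs, their stadiums and
# stadium capacities for a given city from DBpedia via SPARQL. The class
# and property names are assumed from the current DBpedia ontology; the
# paper does not list its exact queries.
from SPARQLWrapper import SPARQLWrapper, JSON

def soccer_clubs_in(city_uri, min_capacity=20000):
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        SELECT ?club ?nickname ?stadium ?capacity WHERE {{
            ?club a dbo:SoccerClub ;
                  dbo:ground ?stadium .
            OPTIONAL {{ ?club dbp:nickname ?nickname . }}
            ?stadium dbo:location <{city_uri}> ;
                     dbo:capacity ?capacity .
            FILTER (xsd:integer(?capacity) >= {min_capacity})
        }}""")
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()["results"]["bindings"]

# Example: soccer_clubs_in("http://dbpedia.org/resource/London")
</pre>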
2.2 Challenge II: Music Performances
We define a venue as a place (usually with a physical location) at which events can occur. There are web services like Foursquare that compile and maintain venue directories. We use Last.fm, which specializes in music-related venues and events, to retrieve data such as venue location and performances (date and time, title, artists, etc.) associated with the venues given in Challenge II.

2.3 Generic Terms and Location
For each challenge, we compile a list of generic words relating to the challenge. Examples are goal or stadium for Challenge I, and music or concert for Challenge II. We utilize both DBpedia and WordNet for this task. Depending on the country the venue is located in, we additionally obtain corresponding translations via the Google Translate API.

For each venue, we also gather location-centric information such as suburb, region and the geographic coordinates. We employ the Google Maps API to query this information based on initial evidence from DBpedia (Challenge I) and the venue location available through Last.fm (Challenge II).

3. DETECTING IN- AND OUTLIERS
As geo-tagged photos become more and more popular, we can identify photos as belonging or not belonging to a venue (and thus to an event, when also considering the time). Prior to discarding all photo outliers from the dataset at this stage, we extract features of them as well as of the inliers. We later incorporate both in a classification process to train the appropriate classes.

In general, the date and time a photo was captured is an effective cue to bound the search and classification space. The MediaEval Benchmark defines an event as a distinct combination of location and date (but not time). As such, we can limit our approach to at most one event per day at the same location. Note that we also try to retrieve the event time so that we can further tighten the bound to within a certain margin.

We do not further classify photos which match both the venue and the time of an event. If we find multiple photos (at least five) that match only a venue's location but do not fall into any of that venue's events (e.g. gathered through external sources), we consider them part of another, new event.
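As a concrete illustration of this geographic test (a sketch under assumptions, not the authors' implementation), a photo can be bound to a venue via the great-circle distance between its geo-tag and the venue coordinates gathered in Section 2.3, combined with the capture date; the venue radius below is an assumed threshold.

<pre>
# Minimal sketch of the in-/outlier test described above (not the authors'
# code). A photo counts as an inlier of a venue if its geo-tag lies within
# radius_km of the venue's coordinates; together with the capture date this
# bounds candidates to at most one event per venue and day.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def is_inlier(photo, venue, radius_km=1.0):  # radius is an assumed threshold
    return haversine_km(photo["lat"], photo["lon"],
                        venue["lat"], venue["lon"]) <= radius_km

def matches_event(photo, venue, event, radius_km=1.0):
    # The benchmark defines an event by location and date (not time),
    # so only the capture day is compared here.
    return is_inlier(photo, venue, radius_km) \
        and photo["taken"].date() == event["date"]
</pre>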
4. COMPOSING FEATURES
We compose text features from each photo's title, description, keywords and username (the latter potentially linking the photos of a user's collection). In our training step, we also include the generic terms compiled previously as well as the event information.

Then, we apply a romanizing preprocessor that converts text to lower case, strips punctuation as well as whitespace, and removes accents from Unicode characters. It also eliminates common (stop) words like and, cannot, you, etc. Moreover, we discard all words that are fewer than three characters in length. We also ignore numbers and terms commonly associated with photography; examples are Canon, Nikon, 80mm and IMG_8152. Finally, photos with fewer than two words overall are filtered out.

In the next step, we split the words into tokens. The text assigned to photos by users on online services such as Flickr is often not clean: words have spelling errors and different suffixes and prefixes. Furthermore, traditional natural language processing steps, e.g. word stemming, are often tailored to the English language. To accommodate other languages, we do not apply a word-based tokenizer but a language-agnostic character-based tokenizer (minimum three, maximum seven characters). However, we exclude the username from this step (it is an ID and has no alternative word forms). We also take all preprocessed words in their full, non-tokenized form into account.

We then use a vectorizer to convert the tokens into a matrix of occurrences. To compensate for photos with a large amount of textual annotations, we also consider the total number of tokens. This approach is commonly referred to as Term Frequencies (TF).

5. CLASSIFICATION
After composing the features, we train a Linear Support Vector Classifier [3]. Based on brief internal tests, we use a value of 100 for the parameter C and otherwise the recommended default parameters. For each event, we train a separate classifier. As mentioned earlier, in the prediction step we only consider testing samples falling on the same day as each event. In essence, we perform binary classification: photos are either related or not related to an event. However, introducing a third class reflecting other events from the same challenge seems to perform better. Assuming that the two challenges are mutually exclusive, we additionally include the features of the respective other challenge under the appropriate class label. We aggregate the features of the location in- and outliers into single samples (starting as a set of distinct terms), as this seems to perform better than considering multiple samples (with the same class label).
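The paper specifies the feature scheme (character 3-7-grams, term frequencies) and the classifier (linear SVC with C=100) but not the software used; a minimal scikit-learn sketch of one such per-event classifier, with the class labels and data handling assumed, could look as follows.

<pre>
# Illustrative sketch (the paper does not name its toolchain): character
# 3-7-gram term-frequency features feeding a linear SVC with C=100, roughly
# matching Sections 4 and 5. Assumed labels: 1 = related to the event,
# 0 = unrelated, 2 = other event from the same challenge (the third class).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def build_event_classifier():
    vectorizer = TfidfVectorizer(
        analyzer="char",      # language-agnostic character-based tokenizer
        ngram_range=(3, 7),   # minimum three, maximum seven characters
        lowercase=True,
        use_idf=False,        # plain term frequencies (TF), no IDF weighting
        norm="l1",            # normalize by the total token count per photo
    )
    return make_pipeline(vectorizer, LinearSVC(C=100))

# One classifier is trained per event on the preprocessed annotations:
# clf = build_event_classifier()
# clf.fit(train_texts, train_labels)
# predicted = clf.predict(same_day_test_texts)
</pre>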
6. EXPERIMENTS AND RESULTS
We perform experiments on the MediaEval SED Dataset, which consists of 73,645 Flickr photos with accompanying metadata. For Challenge I, we identify two soccer clubs (we discard several smaller ones) for each given city. We detect a total of twelve events (two if not considering external event sources as outlined in Section 2) at their respective venues (stadiums). For Challenge II, we compile a total of 37 events (six without external event sources).

We find about 14,300 geographic outliers not associated with any venue (across both challenges), thus substantially reducing the testing candidates while providing a large number of training samples for the non-relating class.

Certain samples in our experiments suggest that the number of false positives could potentially be reduced by considering terms reflecting geographic places, like Paris or London, that do not correspond to an event's venue location. We also notice the special case where the term London is part of a particular event's title (with its venue being in Amsterdam) and thus actually leads to numerous incorrect classifications.

In Table 1 we present our test results (as evaluated by the organizers of the MediaEval Benchmark).

Table 1: Results depending on configuration
{| class="wikitable"
|-
! rowspan="2" | Configuration !! colspan="2" | Challenge I !! colspan="2" | Challenge II
|-
! F-Score !! NMI !! F-Score !! NMI
|-
| Complete configuration || 45.5 || 0.28 || 25.9 || 0.36
|-
| Without generic terms || 68.7 || 0.41 || 33.0 || 0.50
|-
| Without other challenge || 60.3 || 0.38 || 25.6 || 0.20
|-
| Without outlier features || 43.1 || 0.19 || 19.0 || 0.28
|}

As expected, we see a notable performance gain when using the geographic outlier features. This is also true for externally sourced events (omitted above). Surprisingly, the generic terms have a negative impact (lower precision).

7. CONCLUSION
We present an approach to find and detect social events in tagged photo collections. We combine external information with (mostly textual) data extracted from photos to train a classifier. Based on our experiments, we conclude that external information and identified outliers can aid classification, but challenges such as finding and linking structured external data remain. For future experiments, we intend to additionally detect events from the photos' textual annotations as well as include visual features to further improve results.

8. REFERENCES
[1] Chen, L. and Roy, A. 2009. Event detection from Flickr data through wavelet-based spatial analysis. ACM CIKM (2009), 523–532.
[2] Firan, C.S. et al. 2010. Bringing order to your photos: Event-driven classification of Flickr images based on social knowledge. ACM CIKM (2010), 189–198.
[3] Keerthi, S.S. et al. 2008. A sequential dual method for large scale multi-class linear SVMs. ACM KDD (2008), 408–416.
[4] Papadopoulos, S. et al. 2011. Social Event Detection at MediaEval 2011: Challenges, Dataset and Evaluation. MediaEval 2011 Workshop (Pisa, Italy, Sep. 2011).
[5] Papadopoulos, S. et al. 2010. Cluster-based landmark and event detection on tagged photo collections. IEEE Multimedia (2010), 1–1.
[6] Rattenbury, T. et al. 2007. Towards automatic extraction of event and place semantics from Flickr tags. ACM SIGIR (2007), 103–110.
[7] Troncy, R. et al. 2010. Linking events with media. I-Semantics (2010), 1–4.

Copyright is held by the author/owner(s). MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy.