=Paper= {{Paper |id=None |storemode=property |title=MediaEval Benchmark: Social Event Detection in collaborative photo collections |pdfUrl=https://ceur-ws.org/Vol-807/Brenner_SED_QMUL_me11wn.pdf |volume=Vol-807 |dblpUrl=https://dblp.org/rec/conf/mediaeval/BrennerI11 }} ==MediaEval Benchmark: Social Event Detection in collaborative photo collections== https://ceur-ws.org/Vol-807/Brenner_SED_QMUL_me11wn.pdf
                 MediaEval Benchmark: Social Event Detection
                      in Collaborative Photo Collections
                                          Markus Brenner, Ebroul Izquierdo
                              School of Electronic Engineering and Computer Science
                               Queen Mary University of London, London E1 4NS, UK
                               {markus.brenner, ebroul.izquierdo}@eecs.qmul.ac.uk
ABSTRACT
In this paper, we present an approach to detect social events in collaboratively annotated photo collections as part of the MediaEval Benchmark. We combine various information from tagged photos with external data sources to train a classification model. Experiments based on the MediaEval Social Event Detection Dataset demonstrate the effectiveness of our approach.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms
Design, Experimentation, Performance

Keywords
Benchmark, Photo Collections, Classification, Event Detection

1. INTRODUCTION
The Internet enables people to host, access and share their photos online, e.g. through websites like Flickr. Collaborative annotations and tags are commonplace on such services. The information people assign varies greatly, but often includes some kind of reference to what happened where and who was involved. In other words, such references describe observed experiences or occurrences that are simply referred to as events [7]. In order to enable users to explore such events in their photo collections, effective approaches to detect events and group corresponding photos are needed. The MediaEval Social Event Detection (SED) Benchmark [4] provides a platform to compare such approaches.

1.1 Background
There is increasing research in the area of event detection in web resources in general. The subdomain we focus on is photo websites, where users can collaboratively annotate their photos. Recent research like [1] puts emphasis on detecting events from Flickr photos by primarily exploiting user-supplied tags. [6] and [5] extend this to place semantics, the latter incorporating the visual similarity among photos as well. Our aim, however, is to also use information from external sources to find photos corresponding to the same events. [2] is an example that goes further in our direction by exploiting Wikipedia classes.

1.2 Objective
In this work we present an approach that utilizes external sources to detect social events and group applicable photos in collaborative photo collections such as Flickr. The approach is tailored to the two challenges laid out by the MediaEval SED Benchmark: Challenge I relates to soccer events taking place in two given cities, and Challenge II to events at two given (music) venues during a given month.
   The remainder of this paper is structured as follows: In the next section we set forth how we gather relevant external information and describe the feature extraction from the photos. Then, we explain the design of our classifier-based approach. Finally, we test and discuss the overall framework in experiments and present our conclusions.

2. GATHERING EXTERNAL DATA
2.1 Challenge I: Soccer Matches
Our strategy for detecting soccer events (or matches) is to first find all soccer clubs and associated stadiums for the given cities in the challenge query. We automatically retrieve this information from DBpedia by means of its SPARQL interface. For each soccer club, we also gather its club name and nickname. Similarly, we request alternative names for the stadiums as well as any available location information. For simplicity, we limit ourselves to bigger soccer events by considering only those clubs whose home stadiums have a capacity of at least 20000 people.
   To our knowledge, there is no public dataset or web service that provides all-encompassing statistics related to the world of sports. For soccer alone, there are a few dedicated websites, one of which is playerhistory.com. The website does not provide an API, and thus we manually navigate and parse its webpages to retrieve the date and opposing team of all matches against any of the home teams found earlier on.

2.2 Challenge II: Music Performances
We define a venue as a place (usually with a physical location) at which events can occur. There are web services like Foursquare that compile and maintain venue directories. We use Last.fm, which specializes in music-related venues and events, to retrieve data such as venue location and performances (date and time, title, artists, etc.) associated with the venues given in Challenge II.

2.3 Generic Terms and Location
For each challenge, we compile a list of generic words relating to the challenge. Examples are goal or stadium for Challenge I, and music or concert for Challenge II. We utilize both DBpedia and WordNet for this task. Depending on the country a venue is located in, we additionally obtain corresponding translations via the Google Translate API.
   For each venue, we also gather location-centric information like suburb, region and the geographic coordinates. We employ the Google Maps API to query this information based on initial evidence from DBpedia (Challenge I) and the venue location available through Last.fm (Challenge II).

Copyright is held by the author/owner(s).
MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy

3. DETECTING IN- AND OUTLIERS
                                                                      As geo-tagged photos become more and more popular, we can
                                                                      identify photos as belonging and not belonging to a venue (and
                                                                      thus an event when also considering the time). Prior to discarding
all photo outliers from the dataset at this stage, we extract features of them as well as of the inliers. We later incorporate both in a classification process to train the appropriate classes.
   In general, the date and time a photo was captured is an effective cue to bound the search and classification space. The MediaEval Benchmark defines an event as a distinct combination of location and date (but not time). As such, we can limit our approach to at most one event per day at the same location. Note that we also try to retrieve the event time so that we can further tighten the bound to within a certain margin.
   We do not further classify photos which match both the venue and time of an event. If we find multiple photos (at least five) that match only a venue's location but do not fall into any of that venue's events (e.g. those gathered through external sources), we consider them part of a new event.

4. COMPOSING FEATURES
We compose text features from each photo's title, description, keywords and username (perhaps linking a user's collection). In our training step, we also include the generic terms we compiled previously as well as the event information.
   Then, we apply a Roman-script preprocessor that converts text into lower case, strips punctuation as well as whitespace and removes accents from Unicode characters. It also eliminates common (stop) words like and, cannot, you etc. Moreover, we discard all words that are fewer than three characters in length. We also ignore numbers and terms commonly associated with photography; examples are Canon, Nikon, 80mm and IMG_8152. Finally, photos with fewer than two words overall are filtered out.
   In the next step, we split the words into tokens. The text assigned to photos by users on online services such as Flickr is often not clean: words have spelling errors and varying suffixes and prefixes. Furthermore, traditional natural language processing steps, e.g. word stemming, are often tailored to the English language. To accommodate other languages, we do not apply a word-based tokenizer but a language-agnostic character-based tokenizer (minimum three, maximum seven characters). However, we exclude the username from this step (it is an ID and has no alternative word forms). We also take all preprocessed words in their full, non-tokenized form into account.
   We then use a vectorizer to convert the tokens into a matrix of occurrences. To compensate for photos with a large amount of textual annotations, we also consider the total number of tokens. This approach is commonly referred to as Term Frequencies (TF).

5. CLASSIFICATION
After composing the features, we train a Linear Support Vector Classifier [3]. Based on brief internal tests, we use a value of 100 for the parameter C and otherwise the recommended default parameters.
   For each event, we train a separate classifier. As mentioned earlier, in the prediction step we only consider test samples falling on the same day as the respective event. In essence, we perform binary classification: photos are either related or not related to an event. However, introducing a third class reflecting events from the same challenge seems to perform better.
   Assuming that the two challenges are mutually exclusive, we include the features of the respective other challenge under the appropriate class label. We aggregate the features of the location in- and outliers into single samples (starting as a set of distinct terms), as this seems to perform better than considering multiple samples (with the same class label).

6. EXPERIMENTS AND RESULTS
We perform experiments on the MediaEval SED Dataset, which consists of 73645 Flickr photos with accompanying metadata.
   For Challenge I, we identify two soccer clubs (we discard several smaller ones) for each given city. We find and detect a total of twelve events (two if not considering external event sources as outlined in Section 2) at the corresponding venues (stadiums). For Challenge II, we compile a total of 37 events (six without external event sources).
   We find about 14300 geographic outliers not associated with any venue (across both challenges), thus substantially reducing the number of test candidates while providing a large number of training samples for the non-related class.
   Certain samples in our experiments suggest that the number of false positives could be reduced further by considering terms reflecting geographic places like Paris or London that do not correspond to an event's venue location. We also notice the special case where the term London, for example, is part of a particular event's title (with the venue being in Amsterdam) and thus actually leads to numerous incorrect classifications.
   In Table 1 we present our test results (as evaluated by the organizers of the MediaEval Benchmark).

Table 1: Results depending on configuration

Configuration              Challenge I       Challenge II
                           F-Score  NMI      F-Score  NMI
Complete configuration     45.5     0.28     25.9     0.36
Without generic terms      68.7     0.41     33.0     0.50
Without other challenge    60.3     0.38     25.6     0.20
Without outlier features   43.1     0.19     19.0     0.28

As expected, we see a notable performance gain when using geographic outlier features. This is also true for externally sourced events (omitted above). Surprisingly, generic terms have a negative impact (lower precision).

7. CONCLUSION
We present an approach to find and detect social events in tagged photo collections. We combine external information with (mostly textual) data extracted from photos to train a classifier. Based on our experiments, we conclude that external information and identified outliers can aid classification, but challenges such as finding and linking structured external data remain. For future experiments, we intend to additionally detect events from the photos' textual annotations as well as include visual features to further improve results.

8. REFERENCES
[1] Chen, L. and Roy, A. 2009. Event detection from Flickr data through wavelet-based spatial analysis. ACM CIKM (2009), 523–532.
[2] Firan, C.S. et al. 2010. Bringing order to your photos: Event-driven classification of Flickr images based on social knowledge. ACM CIKM (2010), 189–198.
[3] Keerthi, S.S. et al. 2008. A sequential dual method for large scale multi-class linear SVMs. ACM KDD (2008), 408–416.
[4] Papadopoulos, S. et al. 2011. Social Event Detection at MediaEval 2011: Challenges, Dataset and Evaluation. MediaEval 2011 Workshop (Pisa, Italy, Sep. 2011).
[5] Papadopoulos, S. et al. 2010. Cluster-based landmark and event detection on tagged photo collections. IEEE Multimedia. 99 (2010), 1–1.
[6] Rattenbury, T. et al. 2007. Towards automatic extraction of event and place semantics from Flickr tags. ACM SIGIR (2007), 103–110.
[7] Troncy, R. et al. 2010. Linking events with media. I-Semantics (2010), 1–4.
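APPENDIX: As an illustration, the annotation preprocessing and the language-agnostic character-based tokenization described in Section 4 might be sketched as follows. This is a minimal sketch, not the authors' implementation: the stop-word and photography-term lists here are small illustrative stand-ins for the actual (larger, multilingual) lists, and the exact normalization rules are assumptions.

```python
import re
import unicodedata
from collections import Counter

# Illustrative stand-ins only; the actual lists are larger and multilingual.
STOP_WORDS = {"and", "cannot", "you", "the", "with"}
PHOTO_TERMS = {"canon", "nikon"}

def preprocess(text):
    """Normalize a photo annotation (Section 4): strip accents, lower-case,
    drop punctuation, stop words, photography terms, words containing
    digits (e.g. img_8152, 80mm) and words shorter than three characters."""
    # Remove accents via NFKD decomposition, then drop combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    words = re.findall(r"[a-z0-9_]+", text.lower())
    kept = []
    for w in words:
        if len(w) < 3 or w in STOP_WORDS or w in PHOTO_TERMS:
            continue
        if any(ch.isdigit() for ch in w):  # numbers and camera-style tags
            continue
        kept.append(w)
    return kept

def char_ngrams(word, lo=3, hi=7):
    """Language-agnostic character tokenizer (3- to 7-grams)."""
    grams = []
    for n in range(lo, hi + 1):
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams

def term_frequencies(words):
    """Raw token counts (TF): character n-grams plus each preprocessed
    word in its full, non-tokenized form."""
    tokens = list(words)  # full words are kept as features, too
    for w in words:
        tokens.extend(char_ngrams(w))
    return Counter(tokens)
```

In the full approach, these counts would be assembled into a term-occurrence matrix and fed, per event, to a Linear Support Vector Classifier (C = 100) as described in Section 5.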