=Paper= {{Paper |id=None |storemode=property |title=Leveraging linked data in Social Event Detection |pdfUrl=https://ceur-ws.org/Vol-807/Hintsa_SED_VTT_me11wn.pdf |volume=Vol-807 |dblpUrl=https://dblp.org/rec/conf/mediaeval/HintsaVM11 }} ==Leveraging linked data in Social Event Detection== https://ceur-ws.org/Vol-807/Hintsa_SED_VTT_me11wn.pdf
              Leveraging Linked Data in Social Event Detection
                  Timo Hintsa                                    Sari Vainikainen                              Magnus Melin
                     VTT                                                 VTT                                          VTT
                 P.O. Box 1000                                       P.O. Box 1000                               P.O. Box 1000
            FI-02044, VTT, Finland                              FI-02044, VTT, Finland                       FI-02044, VTT, Finland
               +358 40 837 2723                                    +358 50 525 5794                             +358 40 589 6384
              timo.hintsa@vtt.fi                           sari.vainikainen@vtt.fi                        magnus.melin@vtt.fi



ABSTRACT
In this paper, we present our approach and results for the MediaEval 2011
Social Event Detection task. VTT participated in Challenge 2, where a given
dataset of Flickr photos was matched to events in certain places. We used
Linked Data to enhance the dataset by adding event information and other
related data, and then searched the enhanced dataset. Additional information
relating to venues and places was used for creating a subset of photos for
each place: Barcelona and Amsterdam. The event profiles, including
semantically enhanced metadata, were used in media retrieval. The approach of
combining additional data from the Internet and restricting the queries to
limited subsets improved the relevance of photos relating to the events.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing;
H.3.3 Information Search and Retrieval; H.5.3 On-line Information Services

General Terms
Experimentation

Keywords
events, Linked Data, metadata enhancement, media retrieval

1. MOTIVATION AND RELATED WORK
The challenges, dataset and evaluation methods of the Social Event Detection
task are described in [3]. VTT participated in Challenge 2, where the task was
to find all events that took place in May 2009 at defined venues, Parc del
Forum in Barcelona and Paradiso in Amsterdam, and to find all photos
associated with the events.

In our earlier research [4] we worked with personalized recommendations where
events were recommended to the end user based on the user’s interests. The
approach here was to test similar methods for “recommending” relevant media
items to the event. In our earlier work with user profiles we used Linked
Data¹ and publicly available semantic databases such as Freebase², DBpedia³
and GeoNames⁴ for enhancing the user profile with additional semantic
information [2,4]. In this challenge we used Linked Data for enhancing the
event descriptions and for multi-language support. The information was used
for creating the “profile” for the event and matching it with the metadata of
photos.

2. DESCRIPTION OF THE APPROACH
The main point of the approach was to connect the given photos to events that
were found using the Linked Data sources on the Internet. Linked Data was used
to get additional information relating to events, artists, venues and places.

2.1 Enhancing Dataset with Linked Data
First we used publicly available event services such as Last.fm⁵ and
Upcoming⁶ to find information about the relevant events. The event
descriptions, including title, description, artist, time and venue
information, were stored in a database.

Using Freebase, we looked up the unique identifiers for the artists and bands.
Based on these URIs, additional information such as genre and band members was
collected and stored in a database. This additional information was used for
updating the “profile” of the event.

We used Freebase and GeoNames for getting additional information relating to
places. This included getting coordinates for the venues and cities, as well
as different language versions of the names of the cities and countries. We
used Freebase for getting information about the tourist attractions in
Barcelona and Amsterdam, and GeoNames for getting places near venues by
utilizing coordinates. An assumption was that these were things that users
commonly use for describing the photos.

We created a limited dataset for each place based on the photo location
information. The tourist attractions, nearby places and coordinates that were
too far from the venues were used to exclude irrelevant photos from the
limited dataset. The goal was to be able to create more relevant matches
between the events and photos.

2.2 Run Configurations
2.2.1 First Run
In the first run, the search for photos that matched the relevant events was
made against the datasets in which the photos were limited based on the
places.

The run consisted of a set of queries that matched the artist name and the
time of the event, the venue name and the time of the event, and the event
name and the time of the event with the metadata (title, tags, description and
time taken) of the photos in the dataset.
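
The first-run matching described above can be illustrated with a minimal Python sketch. It is not the implementation used in the submitted runs: the Event and Photo classes, their field names, the six-hour slack around the event time and the sample data are all assumptions made for illustration, and the photos are assumed to have already been restricted to the place-specific subset.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Event:
    # Hypothetical event profile; field names are illustrative only.
    name: str
    artist: str
    venue: str
    start: datetime
    end: datetime


@dataclass
class Photo:
    # Hypothetical photo record with the metadata fields named in the text.
    title: str
    description: str
    time_taken: datetime
    tags: list = field(default_factory=list)


def photo_text(photo):
    """Join title, tags and description into one lowercase string."""
    return " ".join([photo.title, " ".join(photo.tags), photo.description]).lower()


def taken_during(photo, event, slack=timedelta(hours=6)):
    """True if the photo timestamp lies in the event window plus an assumed slack."""
    return event.start - slack <= photo.time_taken <= event.end + slack


def first_run_match(photo, event):
    """Require a time match and at least one of artist, venue or event name."""
    if not taken_during(photo, event):
        return False
    text = photo_text(photo)
    names = (event.artist, event.venue, event.name)
    return any(name and name.lower() in text for name in names)


# Made-up sample data; "Example Band" is not a real act from the dataset.
event = Event(
    name="Example Band live",
    artist="Example Band",
    venue="Paradiso",
    start=datetime(2009, 5, 10, 20, 0),
    end=datetime(2009, 5, 10, 23, 0),
)
photos = [
    Photo("Example Band on stage", "", datetime(2009, 5, 10, 21, 15), ["concert"]),
    Photo("Canal walk", "", datetime(2009, 5, 10, 21, 30), ["amsterdam"]),
    Photo("Paradiso facade", "", datetime(2009, 5, 20, 12, 0), ["paradiso"]),
]
matched = [p.title for p in photos if first_run_match(p, event)]
print(matched)
```

In this sketch only the first photo is matched: the second has a suitable timestamp but mentions none of the names, and the third mentions the venue but was taken outside the time window. Requiring both a name and a time match favours precision over recall, in line with the stated goal of this run.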
¹ http://linkeddata.org
² http://www.freebase.com
³ http://dbpedia.org
⁴ http://www.geonames.org
⁵ http://www.last.fm
⁶ http://upcoming.yahoo.com

    Copyright is held by the author/owner(s).
    MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy
The goal of this run was to get a set of highly relevant matches between
events and photos.

2.2.2 Second Run
In the second run we used the results of the first run, but we created
additional searches over the total dataset of photos in order to find more
relevant photos.

Event names without time restriction were queried against the metadata of
photos. In the case of Parc del Forum, event names were quite unique, such as
Primavera Sound 2009, and the queries found relevant images. In the case of
Paradiso, the event names were often the same as the names of the artists
performing at the event. If the time restriction was not used together with
the event name, quite a lot of irrelevant photos were attached to the events.
We therefore used this query only in the case of Parc del Forum.

The event profiles and their tag clouds were enhanced with the results of the
first run, namely the tags from the photos that were found relevant to the
event. In this phase, the event profiles consisted of the event name, venue,
city, artists, genre, band member information, and the photo tags from the
previous run.

Apache Solr⁷ and Lucene⁸ were used for free-text indexing and searching of the
textual photo metadata, namely tags and photo descriptions. The photo index
was searched with the information in the event profile. The Lucene score limit
for an accepted result was set relatively high (0.5) so that irrelevant photos
would be left out. To further increase the relevance, the searches were run on
the limited datasets of the Barcelona photos and the Amsterdam photos, as
described in Section 2.1.

3. RESULTS AND DISCUSSION
The results of our submitted runs are shown in Table 1. The evaluation
measures are described in [3].

Table 1. The results of the submitted runs

Run   Precision (%)   Recall (%)   F-score (%)   NMI
1     72.18           48.41        57.96         0.5839
2     73.79           64.21        68.67         0.6782

As expected, the recall of the first run was low due to the use of the limited
set of photos; however, the photos that were found were quite relevant. Our
additions to the second run improved the results, and more relevant photos
were found.

Our approach of limiting searches to the subset of photos, which was created
based on additional information gathered from Linked Data, increased the
relevance of photos.

One challenge in the development was the unreliability of the photo metadata.
We could see that the photo timestamps created by different cameras were not
always reliable. This made it difficult to match different images to events
using the time information. The same problem was noted with the GPS
coordinates, where even the inherent error in location precision in city
environments is tens of meters [1]. This is particularly evident in the
Paradiso case, where distances as low as 100 meters from the center of the
building yield false positives.

When analysing the irrelevant photos in the results of the second run, we
found that more logic should be developed for checking the reliability of the
results. To enhance the quality of the second run, the event profile created
from the users’ tags should have been cleaned of tags that are irrelevant to
the image content, e.g. camera makers and models. Further analysis of tag
relevance based on occurrence and co-occurrence could have been made to better
define the relevance of tags to the images and the event.

We planned to perform a semantic analysis [2] of the users’ tags, but did not
do so due to the time needed to analyse all the images and the seemingly high
variance in the quality of the tags themselves. However, the analysis would
have helped to better determine the place-related tags and remove false
positives from the result sets.

A search for other photos from the same user within the same timeframe as the
ones found in the first run was not conducted. This search would have helped
to find photosets where only one or a few of the photos are tagged, but the
rest of the photos are from the same event.

Solr parameters, like the score parameter, can be adjusted further, and more
logic can be added to find irrelevant photos, especially when the score
parameter value is lowered. Other Lucene functionality like MoreLikeThis would
also be worth exploring.

4. ACKNOWLEDGMENTS
The work presented in this paper was partially funded by the OpenSEM project
funded by EIT ICT Labs. We would like to thank Onni Ojutkangas, Asko Ollila,
Johannes Peltola, Antti Nummiaho and Mika Timonen for code snippets, thoughts
and ideas while planning and realizing this task.

5. REFERENCES
[1] Modsching M., Kramer R. and ten Hagen K. 2006. Field trial on GPS Accuracy
    in a medium size city: The influence of builtup. 3rd Workshop on
    Positioning, Navigation and Communication 2006, WPNC’06, Hannover,
    Germany, March 16 2006. Proceedings.
[2] Nummiaho A., Vainikainen S., Melin M. 2010. Utilizing Linked Open Data
    Sources for Automatic Generation of Semantic Metadata. Metadata and
    Semantic Research 4th International Conference, MTSR 2010, Alcalá de
    Henares, Spain, October 20-22, 2010. Proceedings. Metadata and Semantic
    Research, Communications in Computer and Information Science, 2010,
    Volume 108, 78-83, DOI: 10.1007/978-3-642-16552-8_8.
[3] Papadopoulos S., Troncy R., Mezaris V., Huet B. and Kompatsiaris I. Social
    Event Detection at MediaEval 2011: Challenges, Dataset and Evaluation. In
    MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy.
[4] Vainikainen S., Laakko T., Giesecke R., Vesikivi P. 2011. Context
    awareness – portable profiles, HTML5 and advertiser’s metadata. Next Media
    deliverable D3.0.1.2.




⁷ http://lucene.apache.org/solr
⁸ http://lucene.apache.org