Leveraging Linked Data in Social Event Detection

Timo Hintsa, VTT, P.O. Box 1000, FI-02044, VTT, Finland, +358 40 837 2723, timo.hintsa@vtt.fi
Sari Vainikainen, VTT, P.O. Box 1000, FI-02044, VTT, Finland, +358 50 525 5794, sari.vainikainen@vtt.fi
Magnus Melin, VTT, P.O. Box 1000, FI-02044, VTT, Finland, +358 40 589 6384, magnus.melin@vtt.fi

Copyright is held by the author/owner(s). MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy

ABSTRACT
In this paper, we present our approach and results for the MediaEval 2011 Social Event Detection task. VTT participated in Challenge 2, where a given dataset of Flickr photos was matched to events in certain places. We used Linked Data to enhance the dataset by adding event information and other related data, and then searched the enhanced dataset. Additional information relating to venues and places was used for creating a subset of photos for each place, Barcelona and Amsterdam. The event profiles, including semantically enhanced metadata, were used in media retrieval. The approach of combining additional data from the Internet and restricting the queries to these subsets improved the relevance of the photos retrieved for the events.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.5 On-line Information Services

General Terms
Experimentation

Keywords
events, Linked Data, metadata enhancement, media retrieval

1. MOTIVATION AND RELATED WORK
The challenges, dataset and evaluation methods of the Social Event Detection task are described in [3]. VTT participated in Challenge 2, where the task was to find all events that took place in May 2009 at defined venues, Parc del Forum in Barcelona and Paradiso in Amsterdam, and to find all photos associated with the events.

In our earlier research [4] we have worked with personalized recommendations where events were recommended to the end user based on the user’s interests. The approach here was to test similar methods for “recommending” relevant media items to the event.

In our earlier work with user profiles we have used Linked Data¹ and publicly available semantic databases such as Freebase², DBpedia³ and GeoNames⁴ for enhancing the user profile with additional semantic information [2,4]. In this challenge we used Linked Data for enhancing the event descriptions and for multi-language support. The information was used for creating the “profile” for the event and matching it with the metadata of photos.

2. DESCRIPTION OF THE APPROACH
The main point of the approach was to connect the given photos to events that were found using the Linked Data sources on the Internet. Linked Data was used to get additional information relating to events, artists, venues and places.

2.1 Enhancing Dataset with Linked Data
First we used publicly available event services such as Last.fm⁵ and Upcoming⁶ to find information about the relevant events. The event descriptions, including title, description, artist, time and venue information, were stored in a database.

By using Freebase we looked up the unique identifiers for the artists and bands. Based on these URIs, additional information such as genre and band members was collected and stored in a database. This additional information was used for updating the “profile” of the event.

We used Freebase and GeoNames for getting additional information relating to places. This included getting coordinates for the venues and cities, as well as different language versions of the names of the cities and countries. We used Freebase for getting information about the tourist attractions in Barcelona and Amsterdam, and GeoNames for getting places near venues by utilizing coordinates. An assumption was that these were things that users commonly use for describing the photos.

We created a limited dataset for each place based on the photo location information. The tourist attractions, nearby places and coordinates that were too far from the venues were used to exclude irrelevant photos from the limited dataset. The goal was to be able to create more relevant matches between the events and photos.

2.2 Run Configurations

2.2.1 First Run
In the first run, the search for photos that matched the relevant events was made against the datasets in which the photos were limited based on the places.

The run consisted of a set of queries that matched the artist name and the time of the event, the venue name and the time of the event, and the event name and the time of the event against the metadata (title, tags, description and time taken) of the photos in the dataset.

The goal of this run was to get a set of highly relevant matches between events and photos.

2.2.2 Second Run
In the second run we used the results of the first run, but we created additional searches against the total dataset of photos for finding more relevant photos.

Event names without time restriction were queried against the metadata of photos. In the case of Parc del Forum, event names such as Primavera Sound 2009 were quite unique, and the queries found relevant images. In the case of Paradiso, the event names were often the same as the names of the artists performing in the event. If the time restriction was not used together with the event name, quite a lot of irrelevant photos were attached to the events. We therefore used this query only in the case of Parc del Forum.

The event profiles and their tag clouds were enhanced with the results of the first run, namely the tags from the photos that were found relevant to the event. In this phase, the event profiles consisted of the event name, venue, city, artists, genre, band member information, and the photo tags from the previous run.

Apache Solr⁷ and Lucene⁸ were used for free-text indexing and searching of the textual photo metadata, namely tags and photo descriptions. The photo index was searched with the information in the event profile. The Lucene score limit for an accepted result was set relatively high (i.e. 0.5) so that the irrelevant photos would be left out. To further increase the relevance, the searches were run on the limited datasets of the Barcelona photos and the Amsterdam photos as described in Section 2.1.

3. RESULTS AND DISCUSSION
The results of our submitted runs are shown in Table 1. The evaluation measures are described in [3].

Table 1. The results of the submitted runs

Run   Precision (%)   Recall (%)   F-score (%)   NMI
1     72.18           48.41        57.96         0.5839
2     73.79           64.21        68.67         0.6782

As expected, the recall of the first run was low due to the use of the limited set of photos; however, the photos found were quite relevant. Our additions in the second run improved the results, and more relevant photos were found.

Our approach of limiting searches to the subset of photos, which was created based on additional information gathered from Linked Data, increased the relevance of photos.

One challenge in the development was the unreliability of the photo metadata. We could see that the photo timestamps created by different cameras were not always reliable. This made it difficult to match different images to events using the time information. The same problem was noted with the GPS coordinates, where even the inherent error in location precision in city environments is tens of meters [1]. This shows particularly in the Paradiso case, where distances as low as 100 meters from the center of the building yield false positives.

When analysing the irrelevant photos in the results of the second run we found that more logic should be developed for checking the reliability of the results. To enhance the quality of the second run, the event profile created from the users’ tags should have been cleaned of tags that are irrelevant to the image content, e.g. camera makers and models. Further analysis of tag relevance based on occurrence and co-occurrence could have been made to further define the relevance of tags to the images and the event.

We planned to make a semantic analysis [2] of the users’ tags, but did not do it due to the time needed to analyse all the images and the seemingly high variance in the quality of the tags themselves. However, the analysis would have helped to better determine the place-related tags and to remove false positives from the result sets.

A search for other photos from the same user within the same timeframe as the ones found in the first run was not conducted. This search would have helped to find photosets where only one or a few of the photos are tagged, but the rest of the photos are from the same event.

Solr parameters, like the score parameter, can be adjusted further, and more logic can be added to find irrelevant photos, especially when the score parameter value is lowered. Other Lucene functionality like MoreLikeThis would also be worth exploring.

4. ACKNOWLEDGMENTS
The work presented in this paper was partially funded by the OpenSEM project funded by EIT ICT Labs. We would like to thank Onni Ojutkangas, Asko Ollila, Johannes Peltola, Antti Nummiaho and Mika Timonen for code snippets, thoughts and ideas while planning and realizing this task.

5. REFERENCES
[1] Modsching M., Kramer R. and ten Hagen K. 2006. Field trial on GPS Accuracy in a medium size city: The influence of built-up. 3rd Workshop on Positioning, Navigation and Communication 2006, WPNC’06, Hannover, Germany, March 16, 2006. Proceedings.
[2] Nummiaho A., Vainikainen S. and Melin M. 2010. Utilizing Linked Open Data Sources for Automatic Generation of Semantic Metadata. Metadata and Semantic Research, 4th International Conference, MTSR 2010, Alcalá de Henares, Spain, October 20-22, 2010. Proceedings. Communications in Computer and Information Science, 2010, Volume 108, 78-83. DOI: 10.1007/978-3-642-16552-8_8.
[3] Papadopoulos S., Troncy R., Mezaris V., Huet B. and Kompatsiaris I. Social Event Detection at MediaEval 2011: Challenges, Dataset and Evaluation. In MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy.
[4] Vainikainen S., Laakko T., Giesecke R. and Vesikivi P. 2011. Context awareness – portable profiles, HTML5 and advertiser’s metadata. Next Media deliverable D3.0.1.2.

¹ http://linkeddata.org
² http://www.freebase.com
³ http://dbpedia.org
⁴ http://www.geonames.org
⁵ http://www.last.fm
⁶ http://upcoming.yahoo.com
⁷ http://lucene.apache.org/solr
⁸ http://lucene.apache.org
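
The place-based limited dataset of Section 2.1 can be illustrated with a small sketch. This is not the code used in the submitted runs; it only shows one plausible way to keep photos near a venue and to drop photos tagged with far-away attractions. The function names, the photo fields (`lat`, `lon`, `tags`), the radius `max_km` and the choice to drop photos without coordinates are all assumptions made for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def place_subset(photos, venue_lat, venue_lon, max_km, excluded_terms=()):
    """Return photos within max_km of the venue that carry no excluded tag.

    excluded_terms stands in for names of attractions and places judged
    too far from the venue (hypothetical input for this sketch).
    """
    excluded = {term.lower() for term in excluded_terms}
    subset = []
    for photo in photos:
        lat, lon = photo.get("lat"), photo.get("lon")
        if lat is None or lon is None:
            continue  # assumption: photos without coordinates are dropped
        if haversine_km(lat, lon, venue_lat, venue_lon) > max_km:
            continue  # too far from the venue
        tags = {tag.lower() for tag in photo.get("tags", [])}
        if tags & excluded:
            continue  # tagged with a place known to be elsewhere
        subset.append(photo)
    return subset
```

Given the GPS error discussed in Section 3, the radius would in practice have to be chosen with some care: a small radius loses photos with imprecise coordinates, while a large one lets in photos from neighbouring places.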
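
The first-run queries of Section 2.2.1 pair a name (artist, venue or event) with the time of the event and match both against the photo metadata. The sketch below shows that idea with plain substring matching. The paper does not specify how the first-run queries were implemented, so the dictionary fields, the case-insensitive substring test and the `slack` tolerance around the event time are assumptions of this example only.

```python
from datetime import datetime, timedelta


def photo_text(photo):
    """Concatenate title, description and tags into one lower-case string."""
    parts = [photo.get("title", ""), photo.get("description", "")]
    parts.extend(photo.get("tags", []))
    return " ".join(parts).lower()


def matches_event(photo, event, slack=timedelta(hours=12)):
    """True if the photo was taken during the event (plus or minus slack)
    and mentions the artist, venue or event name in its metadata."""
    taken = photo["taken"]
    if not (event["start"] - slack <= taken <= event["end"] + slack):
        return False
    text = photo_text(photo)
    names = [event.get("artist"), event.get("venue"), event.get("name")]
    return any(name and name.lower() in text for name in names)


# Hypothetical event and photo records, for illustration only.
event = {
    "name": "Example Festival 2009",
    "artist": "Example Band",
    "venue": "Example Hall",
    "start": datetime(2009, 5, 10, 20, 0),
    "end": datetime(2009, 5, 10, 23, 0),
}
photo = {
    "title": "Example Band live",
    "description": "",
    "tags": ["concert"],
    "taken": datetime(2009, 5, 10, 21, 30),
}
print(matches_event(photo, event))
```

Requiring both a name and a time match is what keeps this run precise. As noted in Section 2.2.2, dropping the time condition attaches many irrelevant photos when the event name coincides with an artist name.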
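
As a consistency check on Table 1, the F-score can be recomputed from the reported precision and recall. The sketch assumes that the F-score in the table is the usual balanced harmonic mean of precision and recall; the evaluation measures themselves are defined in [3].

```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in Table 1.
for run, precision, recall in [(1, 72.18, 48.41), (2, 73.79, 64.21)]:
    print(run, round(f_score(precision, recall), 2))
```

Under this assumption the recomputed values agree with the table to within about 0.01, a difference that is consistent with the table having been computed from unrounded precision and recall.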