<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Leveraging Linked Data in Social Event Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Timo Hintsa</string-name>
          <email>timo.hintsa@vtt.fi</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sari Vainikainen</string-name>
          <email>sari.vainikainen@vtt.fi</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Magnus Melin</string-name>
          <email>magnus.melin@vtt.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>VTT</institution>
          ,
          <addr-line>P.O. Box 1000, FI-02044, VTT, Finland, +358 40 589 6384</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>VTT</institution>
          ,
          <addr-line>P.O. Box 1000, FI-02044, VTT, Finland, +358 40 837 2723</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>VTT</institution>
          ,
          <addr-line>P.O. Box 1000, FI-02044, VTT, Finland, +358 50 525 5794</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <fpage>1</fpage>
      <lpage>2</lpage>
      <abstract>
        <p>In this paper, we present our approach and results for the MediaEval 2011 Social Event Detection task. VTT participated in Challenge 2, where a given dataset of Flickr photos was matched to events in certain places. We used Linked Data to enhance the dataset by adding event information and other related data, and then searched the enhanced dataset. Additional information relating to venues and places was used to create a subset of photos for each place: Barcelona and Amsterdam. The event profiles, including semantically enhanced metadata, were used in media retrieval. Combining additional data from the Internet with queries restricted to these subsets improved the relevance of the photos retrieved for the events.</p>
      </abstract>
      <kwd-group>
        <kwd>events</kwd>
        <kwd>Linked Data</kwd>
        <kwd>metadata enhancement</kwd>
        <kwd>media retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Experimentation</p>
    </sec>
    <sec id="sec-2">
      <title>1. MOTIVATION AND RELATED WORK</title>
      <p>
        The challenges, dataset and evaluation methods of the Social
Event detection task are described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. VTT participated in
Challenge 2, where the task was to find all events that took place in
May 2009 at two defined venues, Parc del Forum in Barcelona and
Paradiso in Amsterdam, and to find all photos associated with those
events.
      </p>
      <p>
        In our earlier research [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] we worked with personalized
recommendations, where events were recommended to the end
user based on the user’s interests. In this task, our approach was to test similar
methods for “recommending” relevant media items to an event.
In our earlier work with user profiles we used Linked Data<sup>1</sup>
and publicly available semantic databases such as Freebase<sup>2</sup>,
DBpedia<sup>3</sup> and GeoNames<sup>4</sup> to enhance the user profile with
additional semantic information [
        <xref ref-type="bibr" rid="ref2 ref4">2,4</xref>
        ]. In this challenge we used
Linked Data for enhancing the event descriptions and for
multilanguage support. The information was used for creating the
“profile” for the event and matching it with the metadata of
photos.
      </p>
      <p><sup>1</sup> http://linkeddata.org</p>
      <p><sup>2</sup> http://www.freebase.com</p>
      <p><sup>3</sup> http://dbpedia.org</p>
      <p><sup>4</sup> http://www.geonames.org</p>
    </sec>
    <sec id="sec-3">
      <title>2. DESCRIPTION OF THE APPROACH</title>
      <p>The main point of the approach was to connect the given photos to
events that were found using the Linked Data sources on the
Internet. Linked Data was used to get additional information
relating to events, artists, venues and places.</p>
    </sec>
    <sec id="sec-4">
      <title>2.1 Enhancing Dataset with Linked Data</title>
      <p>First we used publicly available event services such as Last.fm<sup>5</sup>
and Upcoming<sup>6</sup> to find information about the relevant events. The
event descriptions, including title, description, artist, time and
venue information, were stored in a database.</p>
      <p>Using Freebase, we looked up the unique identifiers (URIs) of the
artists and bands. Based on these URIs, additional information
such as genre and band members was collected and stored in a
database. This additional information was used for updating the
“profile” of the event.</p>
      <p>We used Freebase and GeoNames to get additional
information relating to places. This included coordinates
for the venues and cities, as well as the names of the cities and
countries in different languages. We used Freebase to get information
about the tourist attractions in Barcelona and Amsterdam, and
GeoNames to find places near the venues based on their coordinates.
We assumed that users commonly use such place names
when describing their photos.</p>
      <p>We created a limited dataset for each place based on the photo
location information. Tourist attractions, nearby places and
coordinates that were too far from the venues were used to
exclude irrelevant photos from the limited dataset. The goal was
to produce more relevant matches between the events and the
photos.</p>
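      <p>The location-based limiting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the task: the photo dictionaries, the function names <monospace>haversine_m</monospace> and <monospace>limit_photos</monospace>, the distance threshold, and the choice to keep photos without coordinates are all hypothetical.</p>
      <preformat>
```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def limit_photos(photos, venue_lat, venue_lon, max_distance_m):
    """Keep photos that lie within max_distance_m of the venue.

    Assumption: photos without coordinates are kept, because their
    location cannot be used to rule them out.
    """
    kept = []
    for photo in photos:
        lat, lon = photo.get("lat"), photo.get("lon")
        if lat is None or lon is None:
            kept.append(photo)
            continue
        if max_distance_m >= haversine_m(lat, lon, venue_lat, venue_lon):
            kept.append(photo)
    return kept
```
      </preformat>
      <p>A great-circle distance is used here only because it is simple to state; how distances were actually computed for the runs is not specified in this paper.</p>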
    </sec>
    <sec id="sec-5">
      <title>2.2 Run Configurations</title>
      <sec id="sec-5-1">
        <title>2.2.1 First Run</title>
        <p>In the first run, we searched for photos matching the relevant
events in the datasets that had been
limited by place.</p>
        <p>The run consisted of a set of queries that matched the
artist name and the time of the event, the venue name and the time
of the event, and the event name and the time of the event against the
metadata (title, tags, description and time taken) of the photos in
the dataset.</p>
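        <p>The name-plus-time queries of the first run can be sketched as below. This is only an illustration under assumptions: the dictionary layout, the function names, the plain substring matching and the six-hour slack around the event time are hypothetical, and the example artist name and times are made up.</p>
        <preformat>
```python
from datetime import datetime, timedelta


def photo_text(photo):
    """Concatenate title, description and tags of a photo, lower-cased."""
    parts = [photo.get("title", ""), photo.get("description", "")]
    parts.extend(photo.get("tags", []))
    return " ".join(parts).lower()


def matches_event(photo, name, start, end, slack=timedelta(hours=6)):
    """True if `name` occurs in the photo text and the photo was taken
    between `start` and `end`, widened by `slack` on both sides."""
    taken = photo["taken"]
    in_window = taken >= start - slack and end + slack >= taken
    return in_window and name.lower() in photo_text(photo)


def first_run(photos, event):
    """Ids of photos that match the artist, venue or event name plus time."""
    names = list(event["artists"]) + [event["venue"], event["name"]]
    return [
        photo["id"]
        for photo in photos
        if any(matches_event(photo, n, event["start"], event["end"]) for n in names)
    ]


# Illustrative data only; the artist name and all times are made up.
example_event = {
    "name": "Primavera Sound 2009",
    "venue": "Parc del Forum",
    "artists": ["Example Band"],
    "start": datetime(2009, 5, 28, 16, 0),
    "end": datetime(2009, 5, 31, 6, 0),
}
example_photos = [
    {"id": 1, "title": "Example Band live", "tags": ["concert"],
     "taken": datetime(2009, 5, 29, 22, 0)},
    {"id": 2, "title": "Example Band live", "tags": [],
     "taken": datetime(2009, 7, 1, 22, 0)},
    {"id": 3, "title": "sunset", "tags": ["Parc del Forum"],
     "taken": datetime(2009, 5, 30, 1, 0)},
]
print(first_run(example_photos, example_event))
```
        </preformat>
        <p>Requiring the time condition in every query is what keeps this run precise: a name match alone is not enough to attach a photo to an event.</p>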
        <p>The goal of this run was to get a set of highly relevant matches
between events and photos.</p>
        <p><sup>5</sup> http://www.last.fm</p>
        <p><sup>6</sup> http://upcoming.yahoo.com</p>
      </sec>
      <sec id="sec-5-2">
        <title>2.2.2 Second Run</title>
        <p>In the second run we used the results of the first run, but
ran additional searches against the full dataset of photos to
find more relevant photos.</p>
        <p>Event names without a time restriction were queried against the
metadata of the photos. In the case of Parc del Forum, event names
such as Primavera Sound 2009 were distinctive, and the queries
found relevant images. In the case of Paradiso, the names of the events
were often the same as the names of the performing artists. Without
the time restriction, the event name query attached many
irrelevant photos to the events, so we used this
query only for Parc del Forum.</p>
        <p>The event profiles and their tag clouds were enhanced with the
results of the first run, namely the tags from the photos that were
found relevant to the event. In this phase, the event profiles
consisted of the event name, venue, city, artists, genre, band
member information, and the photo tags from the previous run.
Apache Solr and Lucene were used for free-text indexing and
searching of the textual photo metadata, namely tags and photo
descriptions. The photo index was searched with the information
in the event profile. The Lucene score limit for accepted results
was set relatively high (0.5) so that irrelevant photos
would be left out. To further increase the relevance, the searches
were run on the limited datasets of the Barcelona photos and the
Amsterdam photos as described in Section 2.1.</p>
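        <p>The profile enhancement and score filtering described above can be sketched as follows. The sketch does not call Solr or Lucene; it assumes the search engine has already returned (photo id, score) pairs. The function names, the dictionary keys and the cut-off of the most frequent tags are hypothetical; only the score limit of 0.5 comes from the text.</p>
        <preformat>
```python
from collections import Counter

SCORE_LIMIT = 0.5  # score limit stated in the text


def build_profile(event, first_run_photos, top_tags=20):
    """Collect profile terms: event fields plus frequent first-run photo tags.

    Assumption: only the `top_tags` most frequent tags are added.
    """
    terms = [event["name"], event["venue"], event["city"]]
    terms += event.get("artists", [])
    terms += event.get("genres", [])
    terms += event.get("members", [])
    tag_counts = Counter(
        tag.lower() for photo in first_run_photos for tag in photo.get("tags", [])
    )
    terms += [tag for tag, _ in tag_counts.most_common(top_tags)]
    # Drop exact duplicates while keeping the original order.
    return list(dict.fromkeys(terms))


def accept_hits(hits, score_limit=SCORE_LIMIT):
    """Keep the ids of (photo_id, score) pairs whose score reaches the limit."""
    return [photo_id for photo_id, score in hits if score >= score_limit]
```
        </preformat>
        <p>Feeding first-run tags back into the profile widens the query, which is why a comparatively strict score limit is paired with it.</p>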
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3. RESULTS AND DISCUSSION</title>
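      <p>
        The results of our submitted runs are shown in Table 1. The
evaluation measures are described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>Run</th>
              <th>Precision (%)</th>
              <th>Recall (%)</th>
              <th>F-score (%)</th>
              <th>NMI</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Run 1</td>
              <td>72.18</td>
              <td>48.41</td>
              <td>57.96</td>
              <td>0.5839</td>
            </tr>
            <tr>
              <td>Run 2</td>
              <td></td>
              <td>64.21</td>
              <td>68.67</td>
              <td>0.6782</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>As expected, the recall of the first run was low because only
the limited set of photos was searched; the photos that were found, however, were quite relevant.
Our additions to the second run improved the results, and more
relevant photos were found.</p>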
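      <p>As a consistency check on the reported figures, the sketch below recomputes the F-score of the first run. It assumes the standard definition of the F-score as the harmonic mean of precision and recall, and that 72.18 and 48.41 are the precision and recall of the first run; the exact definition used in the evaluation is given in [3]. Under these assumptions the recomputed value lies within 0.01 of the reported 57.96.</p>
      <preformat>
```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values of the first run, in percent (assumed to be precision and recall).
run1_f = f_score(72.18, 48.41)
print(round(run1_f, 2))
```
      </preformat>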
      <p>Our approach of limiting searches to the subset of photos, which
was created based on additional information gathered from Linked
Data, increased the relevance of photos.</p>
      <p>
        One challenge in the development was the unreliability of the
photo metadata. The photo timestamps
created by different cameras were not always reliable, which made
it difficult to match images to events using the time
information. The same problem was noted with the GPS
coordinates, where even the inherent error in location precision in
city environments is tens of meters [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This is particularly evident
in the Paradiso case, where distances as small as 100 meters from
the center of the building yield false positives.
      </p>
      <p>When analysing the irrelevant photos in the results of the second
run, we found that more logic should be developed for checking
the reliability of the results. To improve the quality of the second
run, the event profile created from the users’ tags should have
been cleaned of tags that are irrelevant to the image content,
e.g. camera makers and models. Further analysis of tag
occurrence and co-occurrence could have been used to
refine the relevance of the tags to the images and the event.</p>
      <p>
        We planned to perform a semantic analysis [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] of the users’ tags, but
did not do so because of the time needed to analyse all the images and
the seemingly high variance in the quality of the tags themselves.
However, the analysis would have helped to better determine the
place-related tags and to remove false positives from the result sets.
      </p>
      <p>A search for other photos from the same user within the same
timeframe as the ones found in the first run was not conducted.
This search would have helped to find photosets where only one
or a few of the photos are tagged, but the rest of the photos are from
the same event.</p>
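      <p>The same-user search mentioned above was not part of the submitted runs. The sketch below only illustrates how such an expansion might look; the function name, the dictionary keys and the three-hour window are hypothetical, and the example users and times are made up.</p>
      <preformat>
```python
from datetime import datetime, timedelta


def expand_by_user(all_photos, matched_photos, window=timedelta(hours=3)):
    """Return unmatched photos taken by the same user within `window`
    of any already matched photo."""
    matched_ids = {photo["id"] for photo in matched_photos}
    extra = []
    for photo in all_photos:
        if photo["id"] in matched_ids:
            continue
        for seed in matched_photos:
            same_user = photo["user"] == seed["user"]
            close_in_time = window >= abs(photo["taken"] - seed["taken"])
            if same_user and close_in_time:
                extra.append(photo)
                break
    return extra


# Illustrative data only: user ids and times are made up.
matched = [{"id": "a", "user": "u1", "taken": datetime(2009, 5, 29, 22, 0)}]
dataset = matched + [
    {"id": "b", "user": "u1", "taken": datetime(2009, 5, 29, 23, 30)},
    {"id": "c", "user": "u1", "taken": datetime(2009, 5, 31, 12, 0)},
    {"id": "d", "user": "u2", "taken": datetime(2009, 5, 29, 22, 5)},
]
print([photo["id"] for photo in expand_by_user(dataset, matched)])
```
      </preformat>
      <p>Because the timestamps were found to be unreliable, any such window would need to be chosen with care.</p>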
      <p>Solr parameters, such as the score limit, can be tuned further,
and more logic can be added to detect irrelevant photos, especially
when the score limit is lowered. Other Lucene
functionality, such as MoreLikeThis, would also be worth exploring.</p>
    </sec>
    <sec id="sec-7">
      <title>4. ACKNOWLEDGMENTS</title>
      <p>The work presented in this paper was partially funded by the
OpenSEM project funded by EIT ICT Labs. We would like to
thank Onni Ojutkangas, Asko Ollila, Johannes Peltola, Antti
Nummiaho and Mika Timonen for code snippets, thoughts and
ideas while planning and realizing this task.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Modsching</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kramer</surname>
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>ten Hagen</surname>
            <given-names>K.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>Field trial on GPS Accuracy in a medium size city: The influence of built-up</article-title>
          .
          <source>3rd Workshop on Positioning, Navigation and Communication 2006 (WPNC'06)</source>
          , Hannover, Germany, March 16,
          <year>2006</year>
          . Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Nummiaho</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vainikainen</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melin</surname>
            <given-names>M.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Utilizing Linked Open Data Sources for Automatic Generation of Semantic Metadata</article-title>
          .
          <source>Metadata and Semantic Research, 4th International Conference, MTSR 2010</source>
          , Alcalá de Henares, Spain, October 20-22,
          <year>2010</year>
          . Proceedings. Communications in Computer and Information Science, Volume
          <volume>108</volume>
          ,
          <fpage>78</fpage>
          -
          <lpage>83</lpage>
          . DOI:
          <pub-id pub-id-type="doi">10.1007/978-3-642-16552-8_8</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Papadopoulos</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troncy</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mezaris</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huet</surname>
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kompatsiaris</surname>
            <given-names>I.</given-names>
          </string-name>
          <article-title>Social Event Detection at MediaEval 2011: Challenges, Dataset and Evaluation</article-title>
          . In
          <source>MediaEval 2011 Workshop</source>
          , September 1-2,
          <year>2011</year>
          , Pisa, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Vainikainen</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laakko</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giesecke</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vesikivi</surname>
            <given-names>P.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Context awareness - portable profiles, HTML5 and advertiser's metadata</article-title>
          .
          <source>Next Media deliverable D3.0.1.2.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>