<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SemWebVid - Making Video a First Class Semantic Web Citizen and a First Class Web Bourgeois*</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Steiner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Google Germany GmbH</institution>
          ,
          <addr-line>ABC-Straße 19, 20354 Hamburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>SemWebVid is an online Ajax application that allows for the automatic generation of Resource Description Framework (RDF) video descriptions. These descriptions rest on two pillars: first, user-generated metadata such as title, summary, and tags; and second, closed captions, which can be user-generated or auto-generated via speech recognition. The plaintext contents of both pillars are analyzed by multiple Natural Language Processing (NLP) Web services in parallel; the results are then merged and, where possible, matched back to concepts in the sense of Linking Open Data (LOD). The final result is a deep-linkable RDF description of the video, together with a “scroll-along” view of the video as an example of a video visualization format.</p>
      </abstract>
      <kwd-group>
        <kwd>RDF</kwd>
        <kwd>LOD</kwd>
        <kwd>Linked Data</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>NLP</kwd>
        <kwd>Video</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Over recent years, the use of the Resource Description Framework (RDF) in documents
has gained massive popularity, with even mainstream media picking up stories of big
companies deploying RDF on their Web presences. However, these efforts have
mainly concentrated on textual documents, annotating concepts such as shop
opening hours, prices, or contact data. Far fewer occurrences of RDF video
descriptions can be found on the public Web. Related efforts are automatic video
content extraction and the W3C Ontology for Media Resources.</p>
<p>The development of SemWebVid was driven by the following objectives:
• Improve the searchability of video content by extracting the contained entities
and disambiguating them (for queries like “videos of Barack
Obama where he talks about Afghanistan while being abroad”).
• Enable graphical representations of video content through the symbolization
of entities (e.g., for video archives of keynote speeches, where one could
graphically skim through long video sections at a glance).</p>
<p>These goals can be reached through RDF video descriptions, and we thus developed
SemWebVid to create RDF video descriptions in a potentially automatable way, based
on live data found on YouTube.</p>
    </sec>
    <sec id="sec-2">
      <title>SemWebVid Dataflow</title>
<p>The raw data for the RDF video descriptions consist of the aforementioned two
pillars: the user-generated video title, (plaintext) video description, and tags on the
one hand, and the user- or auto-generated closed captions on the other (YouTube
allows for auto-generation of closed captions through speech recognition:
http://youtube-global.blogspot.com/2010/03/future-will-be-captioned-improving.html).</p>
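<p>The two pillars can be captured in a simple record before processing; the following is a minimal Python sketch, with field names that are illustrative rather than SemWebVid's actual data model:</p>

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical container for the raw data of one video; the real
# application holds equivalent data fetched from the YouTube APIs.
@dataclass
class RawVideoData:
    # Pillar 1: user-generated metadata
    title: str
    description: str
    tags: List[str] = field(default_factory=list)
    # Pillar 2: user- or auto-generated closed captions
    # (plaintext; cues and syntax noise are stripped during curation)
    captions: str = ""
```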
    </sec>
    <sec id="sec-3">
      <title>Curation of Raw Data</title>
<p>Before the entity mapping and extraction steps, the raw data need to be curated.
While the video titles are typically trouble-free, as they are usually very descriptive,
the main problem with the plaintext video descriptions is that they sometimes get abused
for unrelated spam-like messages or comments rather than providing a proper summary
of the video content. Unfortunately, this is hard to detect, so in the end we decided to
simply use them as is. With regard to tags, the main issue is the variety of tagging styles.
As an example, consider potential tags for the concept of the person Barack Obama:
• "barack", "obama" (all words separated, 2 tags)
• "barack obama" (space-separated, 1 tag)
• "barackobama" (separate words concatenated, 1 tag)
The first style is especially critical if complete phrase segments are expressed in
tag form:
• "one", "small", "step", "for", "a", "man"
For our demonstration, we use an API from Bueda (http://www.bueda.com/developers)
in order to split concatenated tags into their components and to try to make sense of
split tags. We use Common Tag to represent tags in the application.</p>
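<p>The concatenated-tag case can be illustrated with a greedy dictionary-based segmentation; this is a minimal sketch, where the vocabulary is a hard-coded stand-in for the dictionary the Bueda service resolves against server-side:</p>

```python
# Illustrative word list; the real Bueda API resolves tags remotely.
VOCAB = {"barack", "obama", "global", "game", "jam"}

def split_concatenated_tag(tag):
    """Greedily split a concatenated tag like 'barackobama' into known words."""
    words, rest = [], tag.lower()
    while rest:
        # Try the longest prefix first so "barack" wins over shorter matches.
        for end in range(len(rest), 0, -1):
            if rest[:end] in VOCAB:
                words.append(rest[:end])
                rest = rest[end:]
                break
        else:
            return None  # no full segmentation found (e.g. "ggj09")
    return words
```

A tag such as "barackobama" splits into its two components, while an opaque tag like "ggj09" yields no segmentation and falls through to the entity mapping step unchanged.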
<p>The curation step for the closed captions mainly consists of removing speaker markers
and hearable-event syntax noise from the plaintext contents, and obviously the cues (the
time markers for each caption). This can easily be done using regular expressions, the
syntax being a variation of "&gt;&gt;Speaker:" and "[Hearable Event]".</p>
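<p>This curation step can be sketched with a few regular expressions; the patterns below assume SRT-style cues and the speaker/event syntax variants named above, and are a simplification of what the application actually matches:</p>

```python
import re

def curate_captions(raw):
    """Strip cues, speaker markers, and hearable events from caption text."""
    # Remove bare caption indices ("1", "2", ...) on their own line.
    text = re.sub(r"(?m)^\d+\s*$", "", raw)
    # Remove cue lines such as "00:00:01,000 --> 00:00:04,000".
    text = re.sub(
        r"(?m)^\d{2}:\d{2}:\d{2}[.,]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[.,]\d{3}.*$",
        "", text)
    # Remove speaker markers like ">>Obama:" and events like "[Applause]".
    text = re.sub(r">>[^:]*:", "", text)
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Collapse the leftover whitespace into single spaces.
    return " ".join(text.split())
```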
    </sec>
    <sec id="sec-4">
      <title>Entity Extraction and Mapping</title>
      <p>
        We try to map the list of curated tags back to entities using plaintext entity mapping
Web services from DBpedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (http://dbpedia.org), Sindice [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] (http://sindice.com), Uberblic (http://platform.uberblic.org), and Freebase
(http://www.freebase.com). This works quite well for very popular tags (the samples
below are from the DBpedia URI Lookup Web service; all results are prefixed with
http://dbpedia.org/resource/):
• "barack obama" =&gt; Barack_Obama
It somewhat succeeds for very generic tags (though with obvious ambiguity issues):
• "obama" =&gt; Obama,_Fukui
It fails for specific tags ("ggj09" was a tag for the event "Global Game Jam 2009"):
• "ggj09" =&gt; N/A
It is thus very important to preserve provenance data in order to judge and estimate
the quality of the mapped entities. With regard to the curated closed captions,
description, and title, we work with NLP Web services, namely OpenCalais
(http://opencalais.com), Zemanta (http://zemanta.com), and AlchemyAPI
(http://alchemyapi.com). For the test cases we used (famous speeches, keynotes),
the results were relatively accurate in our judgment. In a final step, the detected
entities are merged, and a symbolization for each entity is retrieved by means of a
heuristic approach, including Google image search.
      </p>
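<p>The provenance-preserving merge of the per-service results can be sketched as follows; the service results here are hard-coded stand-ins for live API responses, and the data shapes are illustrative, not SemWebVid's actual internal model:</p>

```python
def merge_entities(results_by_service):
    """Merge per-service tag-to-URI mappings, recording which services
    proposed each URI so the quality of a mapping can be judged later."""
    merged = {}
    for service, mappings in results_by_service.items():
        for tag, uri in mappings.items():
            if uri is None:  # the service could not map this tag
                continue
            merged.setdefault(tag, {}).setdefault(uri, []).append(service)
    return merged

# Stand-in lookup results for two of the services.
results = {
    "dbpedia": {"barack obama": "http://dbpedia.org/resource/Barack_Obama",
                "ggj09": None},
    "freebase": {"barack obama": "http://dbpedia.org/resource/Barack_Obama"},
}
```

A URI proposed by several independent services carries stronger evidence than one proposed by a single service, which is exactly why the provenance list is kept per URI.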
    </sec>
    <sec id="sec-5">
      <title>Description of the SemWebVid Demonstration</title>
<p>SemWebVid is designed to be an online Ajax application for interactive use.
Unfortunately, the terms and conditions of some of the NLP Web services involved do
not allow for a SemWebVid API. However, due to its design, both on-the-fly RDF
description generation and permanent linking to previously generated descriptions are
possible.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and Future Work</title>
<p>While we are not the first [
<xref ref-type="bibr" rid="ref3">3</xref>
] to connect RDF (and thus Linked Data) with video,
SemWebVid's contribution is an automatic, text-based way to generate RDF
video descriptions. Future work includes, among other things, determining whether the
expected searchability improvements pay off against the high processing efforts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>DBpedia: A Nucleus for a Web of Open Data</article-title>
          .
          <source>In: Proc. of 6th Int. Semantic Web Conf., 2nd Asian Semantic Web Conf., November</source>
          <year>2008</year>
          , pp.
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Oren</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delbru</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Catasta</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stenzhorn</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tummarello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Sindice.com - A Document-oriented Lookup Index for Open Linked Data</article-title>
          .
          <source>International Journal of Metadata, Semantics and Ontologies</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Sack</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          : http://www.hpi.uni-potsdam.de/fileadmin/hpi/FG_ITS/papers/Harald/DSMSA09.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>