<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enterprise Multimedia Integration and Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>José-Manuel López-Cobo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katharina Siorpaes</string-name>
          <email>katharina.siorpaes@playence.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>STI Innsbruck, University of Innsbruck</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>playence</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation</title>
      <p>"The next generation Web should not be based on the false assumption that
text is pre-dominant (…). The Web is a multimedia environment, which makes
for complex semantics." [1]</p>
      <p>With increasing bandwidth, cheaper data storage, and improved
hardware, multimedia content is gaining importance. This is evident not only
in Web portals but also in any other content-intensive environment across many
verticals. As a rich medium, video can transport and preserve more information
than text ever could. This new type of content also raises new issues with
respect to search, integration, management and preservation: until now,
heterogeneous data formats in text-based resources were a major challenge;
now the integration of non-textual content has to be tackled as well. Naturally,
a rich media asset is related to information and data residing in various
information systems. While text analysis has been researched for years and
mature solutions for text annotation exist [2], the semantic annotation of
multimedia is still a hard problem. This can be traced back not only to the
problems of image and audio analysis, but also to the fact that the
combination of both can lead to entirely new high-level semantics. The
automatic analysis of multimedia is feasible only to a very limited extent:
human contribution is still required to interpret meaning and create metadata
for rich media. In this paper, we describe playence Media, a system that aims at
the semi-automatic annotation, search, and integration of multimedia and
textual resources across information systems, taking semantics beyond text.
We first describe the system and then discuss future challenges of this work.</p>
    </sec>
    <sec id="sec-2">
      <title>Multimedia holistic view in the enterprise</title>
      <p>When dealing with multimedia assets, current information systems can only
rely on a weak annotation process due to the fact that only a limited set of
low-level metadata can be extracted: file title, resolution, size, length, and
other technical properties. In the best case, in specialized domains like media
production and consumption, annotation or tagging is done manually, using a
shared vocabulary and/or thesaurus.</p>
      <p>However, the level of integration of these multimedia assets with the rest of
the organization’s information systems is limited, due to the cost and the lack
of a holistic semantic model that could facilitate integration.</p>
      <p>In an organization, maintaining different processes for each data
format is not affordable.</p>
      <p>Therefore, we envision a holistic view on multimedia involving five
information-related processes (Figure 1).</p>
      <p>The most important feature of this conceptual architecture is that it
is based on semantic technologies, enabling scalable data integration [3].
playence Media relies on ontologies, which provide shared vocabularies and
terminologies. These models support all information-intensive processes:
annotation, interlinking, integration and search.</p>
    </sec>
    <sec id="sec-3">
      <title>Annotation process</title>
      <p>The annotation process in playence Media can be performed automatically
by the system or manually. The accuracy and precision of the automatic process
depend on the type of content.</p>
      <p>In the case of text, the system is able to annotate content using
domain ontologies, pinpointing each occurrence of a concept or an instance in
the text. The annotation process can draw on several ontologies, thus
empowering users to have more than one point of view (e.g. the sales and
engineering departments have different foci and priorities regarding the
minutes of the same meeting).</p>
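      <p>The idea of annotating one text with several domain ontologies can be
illustrated with a minimal sketch. It is not the playence Media
implementation: the <monospace>annotate</monospace> function, the
<monospace>Annotation</monospace> record, the two toy ontologies and their
concept identifiers are all hypothetical, and plain label matching stands in
for whatever analysis the real system performs.</p>
      <preformat>
```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    ontology: str   # name of the ontology the match came from
    concept: str    # identifier of the matched concept or instance
    start: int      # character offset where the match begins
    end: int        # character offset just past the match


def annotate(text, ontologies):
    """Return every occurrence of an ontology label in the text.

    `ontologies` maps an ontology name to a {label: concept id} dict.
    """
    found = []
    for name, labels in ontologies.items():
        for label, concept in labels.items():
            # Whole-word, case-insensitive match of the label.
            pattern = r"\b" + re.escape(label) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append(Annotation(name, concept, m.start(), m.end()))
    # Order by position in the text, then by ontology name.
    return sorted(found, key=lambda a: (a.start, a.ontology))


# Hypothetical ontologies for two departments reading the same minutes.
ONTOLOGIES = {
    "sales": {"customer": "sales:Customer", "contract": "sales:Contract"},
    "engineering": {"prototype": "eng:Prototype", "contract": "eng:Spec"},
}

minutes = "The customer signed the contract after seeing the prototype."
annotations = annotate(minutes, ONTOLOGIES)
for a in annotations:
    print(a.ontology, a.concept, minutes[a.start:a.end])
```
      </preformat>
      <p>In this sketch the word "contract" receives one annotation per
ontology, so each department can read the same minutes through its own
vocabulary.</p>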
      <p>In the case of video or audio, automatic analysis is less accurate and
can contribute only partially to the annotation. For content containing speech,
ASR (Automatic Speech Recognition) techniques can be applied, achieving
60-80% accuracy depending on the technical conditions of the audio
channel and enabling speaker diarization [4] (e.g. “who said what and when”).
For video or image analysis, high-level features can be obtained, such as face
detection, detection of objects (persons or any other recognizable
object), daylight classification and other basic features.</p>
      <p>In those cases, collaborative manual contribution is still needed when
addressing media annotation. playence Media supports manual annotation in
three ways: informal tags, free text, and formal concepts and instances.
Concepts and instances can be chosen from a drop-down list following
predictive search or can be selected from a tree structure.</p>
    </sec>
    <sec id="sec-4">
      <title>Integration</title>
      <p>Annotation produces a corpus of documents, regardless of the type of
content, annotated with the same set of domain ontologies. Integration
is thus facilitated because content residing in all information systems
(documents, multimedia assets, legacy systems and databases) can be
accessed using the same queries. Data can easily be located, mashed up and
displayed regardless of its original source.</p>
      <p>In playence Media, we approach integration from a holistic point of view,
allowing users to find and relate assets with the same set of tools (domain
ontologies). A user of playence Media trying to find information about a
specific project (e.g. in a company where meetings are video recorded and
annotated through playence Media) will be able to locate not only project
meeting minutes (in text and video), but also other videos related to the
project (similar topics, similar people), memos and related documents, as well
as previous meetings where the project was discussed.</p>
    </sec>
    <sec id="sec-5">
      <title>Search</title>
      <p>playence Media performs semantic search, using annotations and applying
faceted search and semantic navigation to narrow the set of results (Figure 3).
These techniques are complementary, allowing hybrid search and filtering
through concepts and tags. This empowers the user to find the specific asset
she is looking for regardless of the technique used.
When searching, playence Media makes use of Natural Language Processing
techniques such as lemmatization and spell checking. Semantic features are used
for query expansion: synonym expansion through SKOS,
generalization-specialization expansion using the “is-a” relationship and the
instances of the concepts involved, and expansion along more complex relations.
Thus, precision is enhanced, as relevant assets (e.g. those in which concepts
and instances from the query or its expansion appear) are presented among the
first results.</p>
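      <p>Query expansion with synonyms and the “is-a” hierarchy can be sketched
as follows. This is an illustration under stated assumptions, not the playence
Media implementation: the <monospace>THESAURUS</monospace> dictionary is a
hypothetical stand-in for a SKOS vocabulary (alternative labels and narrower
concepts), the <monospace>expand</monospace> and
<monospace>search</monospace> functions and the sample documents are invented
for the example, and lemmatization and spell checking are left out.</p>
      <preformat>
```python
# Hypothetical mini-thesaurus in the spirit of SKOS: each concept has
# alternative labels (synonyms) and narrower concepts ("is-a" children).
THESAURUS = {
    "vehicle": {"alt": ["conveyance"], "narrower": ["car", "truck"]},
    "car": {"alt": ["automobile"], "narrower": []},
    "truck": {"alt": ["lorry"], "narrower": []},
}


def expand(term, thesaurus):
    """Expand a query term with its synonyms and all narrower concepts."""
    terms = []
    seen = set()
    stack = [term]
    while stack:
        concept = stack.pop()
        if concept in seen:
            continue  # guard against cycles in the hierarchy
        seen.add(concept)
        # Unknown terms are kept as they are but not expanded further.
        entry = thesaurus.get(concept, {"alt": [], "narrower": []})
        for label in [concept] + entry["alt"]:
            if label not in terms:
                terms.append(label)
        # Follow the "is-a" hierarchy downwards (specialization).
        stack.extend(reversed(entry["narrower"]))
    return terms


def search(query, documents, thesaurus):
    """Rank documents by how many expanded query terms they contain."""
    terms = expand(query.lower(), thesaurus)
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        score = sum(1 for t in terms if t in words)
        if score:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document id.
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [doc_id for score, doc_id in ranked]


# Hypothetical assets, reduced to their textual annotations.
DOCS = {
    "memo-1": "the lorry arrived late",
    "video-7": "a new automobile and a truck were shown",
    "memo-2": "quarterly budget review",
}

print(expand("vehicle", THESAURUS))
print(search("vehicle", DOCS, THESAURUS))
```
      </preformat>
      <p>None of the sample assets mentions the word "vehicle", yet two of them
are retrieved through the expanded terms, and the asset matching more of them
is ranked first.</p>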
    </sec>
    <sec id="sec-6">
      <title>Discussion and conclusion</title>
      <p>The challenges associated with multimedia integration in the enterprise
are manifold. Videos are dynamic, and often only one portion of a video is
relevant; search must point to exactly these pieces of information.
When text extracted from speech serves as one source for annotation, the
quality of the transcription is a major concern. The domain vocabulary or
ontology underlying the semantic annotations and search must be created and
evolved so that it reflects the vocabulary relevant in the
organization. Additionally, the evolution and consistency of annotations are
issues that have to be addressed. As multimedia analysis is not yet advanced,
manual input is required to a certain extent. Therefore, unobtrusive,
workflow-integrated and collaborative annotation and validation must be
supported. Creating high-quality links to the relevant documents, or even to
portions of documents, is a must: media assets cannot be treated as isolated
information islands. Making use of linked data for multimedia annotation raises
questions such as how to select useful data sets and nodes and how to ensure
the quality of these annotations. Interlinking of content, regardless of
whether it is rich media or text, is a crucial requirement for efficient
and comprehensive knowledge management.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Tim</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          et al.
          “
          <article-title>A Framework for Web Science</article-title>
          ” in:
          <source>Foundations and Trends in Web Science</source>
          , Vol.
          <volume>1</volume>
          , No 1, pp.
          <fpage>1</fpage>
          -
          <lpage>130</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Reeve</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hyoil</given-names>
            <surname>Han</surname>
          </string-name>
          . “
          <article-title>Survey of semantic annotation platforms</article-title>
          ” in:
          <source>SAC '05: Proceedings of the 2005 ACM symposium on Applied computing, ACM</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Tom</given-names>
            <surname>Heath</surname>
          </string-name>
          and
          <string-name>
            <given-names>Enrico</given-names>
            <surname>Motta</surname>
          </string-name>
          . “
          <article-title>Ease of interaction plus ease of integration: Combining Web2.0 and the Semantic Web in a reviewing site</article-title>
          ”.
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          , Volume
          <volume>6</volume>
          , Issue
          <issue>1</issue>
          , February
          <year>2008</year>
          , Pages
          <fpage>76</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Friedland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Müller</surname>
          </string-name>
          . “
          <article-title>Prosodic and other Long-Term Features for Speaker Diarization</article-title>
          ”,
          <source>IEEE Transactions on Audio, Speech, and Language Processing</source>
          , Vol
          <volume>17</volume>
          , No 5, pp
          <fpage>985</fpage>
          -
          <lpage>993</lpage>
          , July
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>