<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Connecting Foundational Ontologies with MPEG-7 Ontologies for Multimodal QA</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Massimo Romanelli</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Sonntag</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Norbert Reithinger</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>In the SMARTWEB project [1] we aim at developing a context-aware, mobile, and multimodal interface to the Semantic Web. To reach this goal we provide an integrated ontological framework offering coverage for deep semantic content, including an ontological representation of multimedia based on the MPEG-7 standard<sup>1</sup>. A discourse ontology covers concepts for multimodal interaction by means of an extension of the W3C standard EMMA<sup>2</sup>. To realize multimodal/multimedia dialog applications, we link the deep semantic level with the media-specific semantic level to operationalize multimedia information in the system. Through this link between multimedia representation and the semantics of specific domains we address the Semantic Gap.</p>
      </abstract>
      <kwd-group>
        <kwd>Multimedia systems</kwd>
        <kwd>Knowledge representation</kwd>
        <kwd>Multimodal ontologies</kwd>
        <kwd>ISO standards</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Working with multimodal, multimedia dialog applications with question
answering (QA) functionality assumes the presence of a knowledge model that
ensures appropriate representation of the different levels of descriptions.
Ontologies provide instruments for the realization of a well
modeled knowledge base with specific concepts for different
domains. For related work, see e.g. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]–[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>This research was funded by the German Federal Ministry for Education
and Research under grant number 01IMD01A.</p>
      <p>M. Romanelli, D. Sonntag and N. Reithinger are with DFKI GmbH –
German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3,
D-66123 Saarbrücken, Germany {romanell,sonntag,bert}@dfki.de</p>
      <p><sup>1</sup> http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm</p>
      <p><sup>2</sup> http://www.w3.org/TR/emma/</p>
      <p><sup>3</sup> SMARTWEB aims to realize a mobile and multimodal interface to
Semantic Web Services and ontological knowledge bases [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The project moves
through three scenarios: handheld, car, and motorbike. In the handheld
scenario the user is able to pose multimodal closed- and open-domain questions
using speech and gesture. The system reacts with a concise and short answer
and the possibility to browse pictures, videos, or additional text information
found on the Web or in Semantic Web sources
(http://www.smartwebproject.de/).</p>
      <p>
        Within the scope of the SMARTWEB project<sup>3</sup> we
realized a multi-domain ontology in which a media ontology based
on MPEG-7 supports meta-data descriptions for multimedia
audio-visual content, and a discourse ontology based on the W3C
standard EMMA covers multimodal annotation. In our
approach we assign conceptual ontological labels according to
the ontological framework (Fig. 1) to either complete
multimedia documents or entities identified therein. We employ
an abstract foundational ontology as a means to facilitate
domain ontology integration (combined integrity, modeling
consistency, and interoperability between the domain
ontologies). The ontological infrastructure of SMARTWEB,
SWINTO (SmartWeb Integrated Ontology), is based on an
upper model ontology realized by merging well chosen concepts
from two established foundational ontologies, DOLCE [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
and SUMO [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], into the SMARTWEB foundational ontology
SMARTSUMO [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The domain-specific knowledge like
sportevent, navigation, or webcam is defined in dedicated
ontologies modeled as sub-ontologies of SMARTSUMO. Semantic
integration takes place for heterogeneous information sources:
extraction results from semi-structured data such as tabular
structures which are stored in an ontological knowledge base
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and hand-annotated multimedia instances such as images
of football teams. In addition, Semantic Web Services deliver
MPEG-7 annotated city maps with points of interest.
      </p>
      <p>Fig. 1. SMARTWEB’s Ontological Framework for Multimodal QA</p>
    </sec>
    <sec id="sec-2">
      <title>II. THE DISCONTO AND SMARTMEDIA ONTOLOGIES</title>
      <p>
        SWINTO adds QA-specific knowledge in a
discourse ontology (DISCONTO) and represents multimodal
information in a media ontology (SMARTMEDIA).
DISCONTO provides concepts for dialogical interaction with the
user and with the Semantic Web sub-system. It
models multimodal dialog management using SWEMMA,
the SMARTWEB extension of EMMA, together with dialog acts, lexical
rules for syntactic-semantic mapping, HCI concepts (a pattern
language for interaction design), and semantic question/answer
types. Concepts for QA functionality are realized with the
discourse:Query concept specifying emma:interpretation. This concept
models the user query to the system in the form of a partially
filled ontology instance. The discourse:Result concept
references the information the user is asking for [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. To search and browse multimedia content
efficiently, SWINTO
specifies a multimedia sub-ontology called SMARTMEDIA.
SMARTMEDIA is an ontology for semantic annotations based
on MPEG-7. It offers an extensive set of audio-visual
descriptions for the semantics of multimedia [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The
SMARTMEDIA ontology uses the MPEG-7 multimedia content
description and multimedia content management (see [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] for
details on description schemes in MPEG-7) and enriches them to
account for integration with domain-specific ontologies.
A relevant contribution of MPEG-7 to SMARTMEDIA is the
representation of multimedia decomposition in space, time,
and frequency, as in the case of the general
mpeg7:SegmentDecomposition concept. In addition, we use file format and
coding parameters (mpeg7:MediaFormat, mpeg7:MediaProfile,
etc.).
      </p>
      <p>Fig. 2. The SWINTO - DISCONTO - SMARTMEDIA Connection</p>
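      <p>The notion of a query as a partially filled ontology instance can be illustrated with a minimal Python sketch. It is not the DISCONTO implementation: the names Instance, Query, answer, unfilled_slots and the slot names are assumptions made for illustration, the restaurant data are invented, and only the concept names discourse:Query and navigation:ChineseRestaurant are taken from this paper.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Instance:
    """An ontology instance whose slots may be left unfilled (None)."""
    concept: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def unfilled_slots(self) -> List[str]:
        # Slots with value None mark the information the user asks for.
        return [name for name, value in self.slots.items() if value is None]


@dataclass
class Query:
    """Stands in for discourse:Query: wraps a partially filled instance."""
    content: Instance


def answer(query: Query, knowledge_base: List[Instance]) -> Optional[Instance]:
    """Return the first stored instance that agrees with every filled slot."""
    filled = {k: v for k, v in query.content.slots.items() if v is not None}
    for candidate in knowledge_base:
        if candidate.concept != query.content.concept:
            continue
        if all(candidate.slots.get(k) == v for k, v in filled.items()):
            return candidate
    return None


# Hypothetical data: the name and phone number are invented for the example.
kb = [
    Instance("navigation:ChineseRestaurant",
             {"name": "Golden Dragon", "phoneNumber": "0681-000000"}),
]
query = Query(Instance("navigation:ChineseRestaurant",
                       {"name": "Golden Dragon", "phoneNumber": None}))
result = answer(query, kb)
```
]]></preformat>
      <p>In this sketch the matching instance plays the role that the text assigns to discourse:Result: it carries the value for the slot that the query left open.</p>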
    </sec>
    <sec id="sec-3">
      <title>III. CLOSING THE SEMANTIC GAP</title>
      <p>The Semantic Gap derives from the different levels of media
representation: the surface level, which refers to the properties of
realized media as in SMARTMEDIA, and the deep semantic representation of
these objects. To close it, the smartmedia:aboutDomainInstance property with
range smartdolce:entity has been added to the top level class
smartmedia:ContentOrSegment (see Fig. 2). In this way the link to the upper
model ontology is inherited by all segments of a media instance
decomposition, so that we can guarantee a deep semantic representation of
the SMARTMEDIA instances referencing the specific media object or the
segments making up the decomposition. Conversely, the discourse:hasMedia
property with range smartmedia:ContentOrSegment is located in the
smartdolce:entity top level class and inherited by each concept in the
ontology, which realizes a pointer back to the SMARTMEDIA ontology.</p>
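      <p>The two-way link between media and domain instances can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SMARTWEB implementation: the Python classes stand in for the ontology concepts, the attributes stand in for the two properties, and the link helper as well as the Segment and DomainConcept subclasses are hypothetical names introduced only to show that the properties are inherited.</p>
      <preformat><![CDATA[
```python
from typing import List, Optional


class Entity:
    """Stands in for smartdolce:entity, the top level class of the upper model."""

    def __init__(self, label: str) -> None:
        self.label = label
        # discourse:hasMedia, range smartmedia:ContentOrSegment
        self.has_media: List["ContentOrSegment"] = []


class ContentOrSegment:
    """Stands in for smartmedia:ContentOrSegment."""

    def __init__(self, media_id: str) -> None:
        self.media_id = media_id
        # smartmedia:aboutDomainInstance, range smartdolce:entity
        self.about_domain_instance: Optional[Entity] = None
        # Segments making up the decomposition of this media object
        self.segments: List["ContentOrSegment"] = []

    def link(self, entity: Entity) -> None:
        # Hypothetical helper: set both directions so the pointers agree.
        self.about_domain_instance = entity
        entity.has_media.append(self)


class Segment(ContentOrSegment):
    """Hypothetical subclass; it inherits aboutDomainInstance unchanged."""


class DomainConcept(Entity):
    """Hypothetical domain class; it inherits hasMedia unchanged."""


# A media object with one segment, each linked to its own domain instance.
picture = ContentOrSegment("picture")
part = Segment("picture/segment-1")
picture.segments.append(part)

scene = DomainConcept("whole scene")
thing = DomainConcept("object shown in segment 1")
picture.link(scene)
part.link(thing)
```
]]></preformat>
      <p>Because both attributes are declared once on the top level classes, every subclass instance can be navigated from media to domain and back without further declarations.</p>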
      <p>This type of representation is useful for pointing gesture
interpretation and co-reference resolution. Consider a map that is
obtained from the Web Services and displayed on the screen with
selectable objects (e.g., restaurants, hotels). The map is represented
as an mpeg7:StillRegion instance, which is decomposed into further
mpeg7:StillRegion instances, one for each object segment of the image.
Each MPEG-7 instance is linked to a domain-specific instance, i.e., the
deep semantic description of the picture (in this case smartsumo:Map)
or of a segment of the picture (e.g., navigation:ChineseRestaurant).
In this way the user can refer to the restaurant by touching it on the
displayed map. A multimodal fusion component can then directly process
the referenced navigation:ChineseRestaurant instance when performing
linguistic co-reference resolution for an utterance such as "What’s
the phone number here?"</p>
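      <p>The map example can be sketched in Python as a lookup from a touch position to a domain instance. The sketch rests on several assumptions that are not stated in this paper: regions are modeled as axis-aligned pixel rectangles, the function resolve_touch and the attribute names are hypothetical, and navigation:Hotel is an invented concept name used only to provide a second selectable object.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StillRegion:
    """Stands in for mpeg7:StillRegion with an assumed rectangular extent."""
    domain_instance: str                 # linked deep semantic instance
    box: Tuple[int, int, int, int]       # (left, top, right, bottom) in pixels
    segments: List["StillRegion"] = field(default_factory=list)

    def contains(self, x: int, y: int) -> bool:
        left, top, right, bottom = self.box
        return left <= x <= right and top <= y <= bottom


def resolve_touch(region: StillRegion, x: int, y: int) -> Optional[str]:
    """Return the domain instance of the innermost region under the touch."""
    if not region.contains(x, y):
        return None
    for segment in region.segments:
        hit = resolve_touch(segment, x, y)
        if hit is not None:
            return hit
    # No segment was hit, so the touch refers to the region itself.
    return region.domain_instance


# Hypothetical screen layout: coordinates are invented for the example.
city_map = StillRegion("smartsumo:Map", (0, 0, 320, 240), [
    StillRegion("navigation:ChineseRestaurant", (40, 60, 70, 90)),
    StillRegion("navigation:Hotel", (200, 100, 230, 130)),
])

# The referent that a fusion component could bind to the word "here".
referent = resolve_touch(city_map, 50, 75)
```
]]></preformat>
      <p>A touch outside every object segment falls back to the map itself, which mirrors the link from the whole picture to smartsumo:Map described in the text.</p>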
    </sec>
    <sec id="sec-4">
      <title>IV. CONCLUSION</title>
      <p>We presented the connection of our foundational ontology
with an MPEG-7 ontology for multimodal QA in the context
of the SMARTWEB project. The foundational ontology is
based on two upper model ontologies and offers coverage
for deep semantic ontologies in different domains. To capture
low-level multimedia semantics we adopted an MPEG-7 based
ontology that we connected to our domain-specific concepts
by means of relations in the top level classes of SWINTO
and SMARTMEDIA. This work enables the system to use
multimedia in a multimodal context, as in the case of mixed
gesture and speech interpretation, where every object that is
visible on the screen must have a comprehensive ontological
representation in order to be identified on the discourse level.</p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENT</title>
      <p>We would like to thank our partners in SmartWeb. The
responsibility for this article lies with the authors.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Wahlster</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Smartweb: Mobile applications of the semantic web</article-title>
          . In: P. Dadam and M. Reichert, editors,
          <source>GI Jahrestagung</source>
          <year>2004</year>
          , Springer,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Reyle</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saric</surname>
          </string-name>
          , J.:
          <source>Ontology Driven Information Extraction, Proceedings of the 19th Twente Workshop on Language Technology</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
          </string-name>
          , E.:
          <article-title>Ontology-driven Question Answering in AquaLog</article-title>
          .
          <source>In Proceedings of 9th International Conference on Applications of Natural Language to Information Systems (NLDB)</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Niekrasz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Purver</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A multimodal discourse ontology for meeting understanding</article-title>
          . In Bourlard, H. and
          <string-name>
            <surname>Bengio</surname>
          </string-name>
          , S., editors,
          <source>Proceedings of MLMI'05. LNCS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Nirenburg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raskin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          : Ontological Semantics, MIT Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Reithinger</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergweiler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Engel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herzog</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfleger</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romanelli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sonntag</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A Look Under the Hood - Design and Development of the First SmartWeb System Demonstrator</article-title>
          .
          <source>In: Proc. ICMI</source>
          <year>2005</year>
          , Trento,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guarino</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masolo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oltramari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Sweetening Ontologies with DOLCE</article-title>
          .
          <source>In Proc. of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW02)</source>
          , volume
          <volume>2473</volume>
          of Lecture Notes in Computer Science, Sigüenza, Spain,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Niles</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pease</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Towards a Standard Upper Ontology</article-title>
          .
          <source>In Proc. of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-</source>
          <year>2001</year>
          ),
          <string-name>
            <given-names>C.</given-names>
            <surname>Welty</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          , Ogunquit, Maine,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eberhart</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oberle</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Studer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>The smartweb foundational ontology</article-title>
          .
          <source>Technical report, Institute for Applied Informatics and Formal Description Methods</source>
          (AIFB) University of Karlsruhe, SmartWeb Project, Karlsruhe, Germany,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Buitelaar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Racioppa</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siegel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Ontology-based Information Extraction with SOBA</article-title>
          .
          <source>In Proc. of the 5th Conference on Language Resources and Evaluation (LREC</source>
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Sonntag</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romanelli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A Multimodal Result Ontology for Integrated Semantic Web Dialogue Applications</article-title>
          .
          <source>In Proc. of the 5th Conference on Language Resources and Evaluation (LREC</source>
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Benitez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rising</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jorgensen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leonardi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bugatti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasida</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehrotra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tekalp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ekin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Semantics of Multimedia in MPEG-7</article-title>
          .
          <source>In Proc. of IEEE International Conference on Image Processing (ICIP)</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Hunter</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adding Multimedia to the Semantic Web - Building an MPEG-7 Ontology</article-title>
          .
          <source>In Proc. of the International Semantic Web Working Symposium (SWWS)</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>