<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MultiMedia Metadata Management: a Proposal for an Infrastructure</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Patrizia Asirelli</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Suzanne Little</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Massimo Martinelli</string-name>
          <email>Massimo.Martinelli@isti.cnr.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ovidio Salvetti</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2006</year>
      </pub-date>
      <abstract>
        <p>The management and exchange of multimedia data is a challenging area of research due to the variety of formats, standards and intended applications. Semantic Web technologies are very promising for enabling interoperability and integration of media, and many research groups are active in finding and proposing solutions or standards. Within the MUSCLE NoE, research is focusing on standards, technologies and techniques for integrating, exchanging and enhancing the use of multimedia within a variety of research areas. At CNR ISTI, we are developing an infrastructure for MultiMedia Metadata Management (4M) to support the integration of media from different sources. This infrastructure enables the collection, analysis and integration of media for semantic annotation, search and retrieval. In this paper we discuss the independent units that make up the infrastructure and the Semantic Web technologies that are being used to support them.</p>
      </abstract>
      <kwd-group>
        <kwd>Multimedia</kwd>
        <kwd>Metadata</kwd>
        <kwd>Semantic Annotations</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>Information Integration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>The production of multimedia data is rapidly increasing due to the availability of off-the-shelf, modern digital devices that can be used by even inexperienced users. It is likely that this volume of information will only increase in the future.</p>
      <p>
        Multimedia management on the Web is a hot topic and
many research teams, projects and working groups are active
in this area. To mention only a few within the European
framework, see for example W3C [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], DELOS [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], aceMedia
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], MUSCLE [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], etc. In particular, MUSCLE (Multimedia
Understanding through Semantics, Computation and
Learning) is a Network of Excellence (NoE) that aims at
establishing closer collaboration between research groups in
multimedia data mining and machine learning. Within
MUSCLE we are working to establish possible strategies for
the interoperability of multimedia groups, mainly focusing on
the representation and communication of data and metadata
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in order to enable interaction and exchange of metadata
emanating from different multimedia modalities.
      </p>
      <p>Facilitating the exchange of documents within the signals
and imaging domain is an interesting and challenging problem
due to the variety of content, formats, modes and standards
used. The challenge is to provide an infrastructure that enables
disparate groups to integrate, combine and disseminate
research data. The achievement of this goal requires the use of
standards and the development of tools to assist in the
extraction and conversion of multimedia metadata.</p>
      <p>Thus, our activity is mostly concerned with the setting up of
a methodology for the NoE to develop, maintain, and facilitate
the exchange of multimedia metadata and data sets. To this
purpose, we aim to provide an integrated metadata
environment to support different metadata standards and tools
for browsing, search, media transformation and dissemination.</p>
      <p>In this paper we present an infrastructure for the integration
of multimedia metadata and their management that we have
proposed within the MUSCLE NoE.</p>
      <p>While designing this infrastructure we have considered its
use in two main contexts. These applications cover a range of
requirements in both personal and professional management
of multimedia information:
a. The management of personal multimedia collections of
data (e.g., photos, videos, music, etc.) that includes the
archiving and the retrieval of specific items under
particular semantic conditions (e.g., photos showing
smiling persons, etc.);
b. The management of professional multimedia data
within a network to share multimedia resources and
related semantic information, where ownership and
authorization rights should be taken into account.</p>
      <p>Therefore, the following capabilities should be provided:
• to store, organize and retrieve distributed multimedia
resources;
• to manage algorithms for information processing;
• to add semantic annotations;
• to access, protect and/or share information.</p>
      <p>Our proposed infrastructure has been designed taking into
account the use of (i) Semantic Web technology; (ii)
multimedia metadata standards; (iii) existing tools; and (iv)
open-source software. In particular, we propose an infrastructure
composed of five main units: an MPEG-7 feature extraction
and processing unit, an XML database management unit, an
algorithm unit, a multimedia semantic annotation unit and an
integration unit. In Section III, the characteristics of the
infrastructure are presented and each unit is discussed with
respect to its purpose and the issues it raises. In Section IV, we
discuss the use of ontologies and their integration to facilitate
interoperability. Section V presents possible future work and
conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>II. RELATED WORK</title>
      <p>
        The multi-dimensional nature of multimedia metadata and
the challenges this presents when integrating media,
particularly in a web-based system, is a well-known problem
[
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. A variety of standards to describe and define
multimedia objects and their contents have been proposed
such as MARC [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Dublin Core [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], VRA Core [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], LOM
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], DIG35 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], MPEG-7 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and MPEG-21 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. A
general comparison and review of these standards can be
found at http://muscle.isti.cnr.it/.
      </p>
      <p>
        The use of Semantic Web technologies, promoted by the
W3C, could facilitate the overall vision of distributed,
machine-readable metadata on the Internet. To enable this
scenario, standardized frameworks have been developed to
express semantic relationships between
resources (RDF [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]), ontologies describing domain classes
and their properties (RDFS [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and OWL [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]).
      </p>
      <p>
        Multimedia on the Semantic Web is a topic of some interest
with the chartering of a W3C Incubator Group [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to discuss
issues relating to multimedia integration using semantic web
technologies. In addition Van Ossenbruggen et al. [
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ]
discuss some of the specific requirements for integrating and
applying multimedia within a semantic web infrastructure.
      </p>
      <p>
        Related work is also being conducted by Dasiopoulou et al.
[
        <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
        ] who have proposed a similar framework for analysing
and integrating image-based data only.
      </p>
    </sec>
    <sec id="sec-2x">
      <title>III. CHARACTERISTICS OF THE INFRASTRUCTURE</title>
      <p>Fig. 1 illustrates the 4M infrastructure at a high level. The
infrastructure consists of five units: an MPEG-7 feature
extraction and processing unit (M), an XML database
management unit (X), an algorithms ontology unit (O), a
multimedia semantic annotation unit (A) and an integration
unit (I).</p>
      <p>
        In particular:
• Unit “M” is devoted to MPEG-7 feature extraction
and processing from multimedia objects;
• Unit “X”, based on an XML [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] database, is a
repository of MPEG-7 features organized as XML
files;
• Unit “O” is based on an ontology of algorithms
describing processes and procedures that can be used
to produce and elaborate multimedia objects;
• Unit “A” offers tools for the annotation of multimedia
objects to describe specific semantic information;
• Unit “I” provides interfaces and tools to integrate and
access the overall set of units.
      </p>
    </sec>
    <sec id="sec-6">
      <p>The implementation of the proposed 4M infrastructure is
being conducted in parallel on the different units. Each unit is
composed of multiple sub-components (tools, inferencing
engines, ontologies, etc.). This section describes the intended
purpose and challenges of each of these units.</p>
      <p>A. MPEG-7 Features Extraction and Processing Unit
MPEG-7 is the most mature and widely recognized
standard for multimedia description. Furthermore, its XML
format facilitates interoperability with other metadata
standards. Our need to handle different kinds of multimedia
objects led us to adopt a system able to extract features from
multimedia objects with a high level of interoperability.</p>
      <p>We began by collecting information on metadata schemas
and framework standards and settled on MPEG-7: among the
programs allowing MPEG-7 feature extraction, few are
completely open-source and fewer are able to extract features
from different kinds of multimedia objects. Thus we decided
to build an integrated system able to extract MPEG-7 features
from audio, video, images and text by combining and
extending existing open-source programs.</p>
      <p>Currently, we are able to extract almost all MPEG-7
audio features, and color and texture features from still
images, while for video we are still investigating a solution.</p>
      <p>A tool has been implemented that extracts all of these
features from a multimedia object and builds the XML files of
MPEG-7 descriptors, ready to be used to populate the XML
database.</p>
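      <p>As a rough illustration of this step, the following Python sketch (not the actual 4M tool; the file name and values follow the Fig. 2 example, and the element hierarchy is abbreviated relative to the full MPEG-7 schema) builds a minimal media-format descriptor with only the standard library:</p>

```python
import xml.etree.ElementTree as ET

def media_format_descriptor(file_name, file_size, width, height):
    # Build a bare-bones MPEG-7 fragment; the hierarchy is abbreviated
    # relative to the full schema for readability.
    root = ET.Element("Mpeg7", {"xmlns": "urn:mpeg:mpeg7:schema:2001"})
    fmt = ET.SubElement(root, "MediaFormat")
    content = ET.SubElement(fmt, "Content", {"href": "MPEG7ContentCS"})
    ET.SubElement(content, "Name").text = "image"
    ET.SubElement(fmt, "FileSize").text = str(file_size)
    ET.SubElement(fmt, "Frame", {"height": str(height), "width": str(width)})
    inst = ET.SubElement(root, "MediaInstance")
    loc = ET.SubElement(inst, "MediaLocator")
    ET.SubElement(loc, "MediaUri").text = file_name
    return ET.tostring(root, encoding="unicode")

# Values taken from the Fig. 2 example.
xml_text = media_format_descriptor("pict2.jpg", 16367, 500, 400)
print(xml_text)
```

      <p>In the real unit, the extracted low-level descriptors would then be appended to the same document before it is stored in the database.</p>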
      <p>An example of MPEG-7 XML feature extraction from a
still image is shown in Fig. 2.</p>
      <p>As is now widely recognized, MPEG-7, despite its
advantages (its completeness in representing metadata for
images, video and sound, and its suitability for use in
connection with Semantic Web technology), still presents
important limitations. Thus, we worked on possible
approaches for extending it by adding some semantics, in
particular for annotation, (intelligent) retrieval and, possibly,
reasoning.</p>
      <p>To this aim we looked at existing tools, in particular
tools to define and manage ontologies and to integrate
existing ones.
&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
&lt;Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mpeg7="urn:mpeg:mpeg7:schema:2001"
xsi:schemaLocation="urn:mpeg:mpeg7:schema:2001
Mpeg72001.xsd"&gt;
&lt;Description xsi:type="ContentEntityType"&gt;
&lt;MultimediaContent xsi:type="ImageType"&gt;
&lt;Image&gt;
&lt;MediaInformation&gt;
&lt;MediaProfile&gt;
&lt;MediaFormat&gt;
&lt;Content href="MPEG7ContentCS"&gt;
&lt;Name&gt;image&lt;/Name&gt;
&lt;/Content&gt;
&lt;FileFormat
href="urn:mpeg:mepg7:cs:VisualCodingFormatCS:2001:1"&gt;
&lt;Name&gt;jpg&lt;/Name&gt;
&lt;/FileFormat&gt;
&lt;FileSize&gt;16367&lt;/FileSize&gt;
&lt;Frame height="400" width="500" /&gt;
&lt;/MediaFormat&gt;
&lt;MediaInstance&gt;
&lt;MediaLocator&gt;
&lt;MediaUri&gt;pict2.jpg&lt;/MediaUri&gt;
&lt;/MediaLocator&gt;
&lt;/MediaInstance&gt;
&lt;/MediaProfile&gt;
&lt;/MediaInformation&gt;
&lt;/Image&gt;
&lt;/MultimediaContent&gt;
&lt;/Description&gt;
. . .
&lt;VisualDescriptor xsi:type="DominantColorType"&gt;
&lt;SpatialCoherenchy&gt;1&lt;/SpatialCoherenchy&gt;
&lt;Values&gt;
&lt;Percentage&gt;0&lt;/Percentage&gt;
&lt;ColorValueIndex&gt;6 4 16&lt;/ColorValueIndex&gt;
&lt;Variance&gt;0 0 0&lt;/Variance&gt;
&lt;/Values&gt;
. . .
&lt;/VisualDescriptor&gt;
&lt;VisualDescriptor xsi:type="ScalableColorType"</p>
      <p>numOfCoeff="64" numOfBitplanesDiscarded="3"&gt;
&lt;Coeff&gt;-143 40 57 27 20 18 16 9 54 31 21 23 -5 -1 &lt;/Coeff&gt;
&lt;/VisualDescriptor&gt;
&lt;VisualDescriptor xsi:type="ColorLayoutType"&gt;
&lt;YDCCoeff&gt;56&lt;/YDCCoeff&gt;
&lt;CbDCCoeff&gt;36&lt;/CbDCCoeff&gt;
&lt;CrDCCoeff&gt;51&lt;/CrDCCoeff&gt;
&lt;YACCoeff5 xmlns=""&gt;28 20 20 12 11&lt;/YACCoeff5&gt;
&lt;CbACCoeff2 xmlns=""&gt;12 15&lt;/CbACCoeff2&gt;
&lt;CrACCoeff2 xmlns=""&gt;1 10&lt;/CrACCoeff2&gt;
&lt;/VisualDescriptor&gt;
. . .
&lt;/Mpeg7&gt;</p>
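      <p>Descriptors stored this way are namespace-qualified, which matters when reading values back out of them. The following Python sketch (illustrative only; it builds a tiny in-memory stand-in for a stored descriptor, using the dominant-colour values from Fig. 2) shows the namespace handling:</p>

```python
import xml.etree.ElementTree as ET

NS = {"mpeg7": "urn:mpeg:mpeg7:schema:2001"}
q = "{urn:mpeg:mpeg7:schema:2001}"  # Clark-notation prefix for building tags

# Build a tiny descriptor in memory, standing in for a stored document.
root = ET.Element(q + "Mpeg7")
vd = ET.SubElement(root, q + "VisualDescriptor")
values = ET.SubElement(vd, q + "Values")
ET.SubElement(values, q + "ColorValueIndex").text = "6 4 16"

# Round-trip through text, as if reading the descriptor back from storage.
doc = ET.fromstring(ET.tostring(root))
node = doc.find("mpeg7:VisualDescriptor/mpeg7:Values/mpeg7:ColorValueIndex", NS)
rgb = [int(v) for v in node.text.split()]
print(rgb)  # [6, 4, 16]
```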
      <sec id="sec-6-1">
        <title>B. XML Database Management Unit</title>
        <p>In order to manage the information extracted by the previous
unit we have decided to use an XML database. This way, the
internal representation of MPEG-7 descriptors can be inserted
directly into the database, and the data structures need to be
extended only to include additional descriptors. Furthermore,
our priorities were also to fulfil the following requirements:
fine-grained representation, access and update; typed
representation and access; structured indexing; a Java interface;
multi-user access; and extensibility.</p>
        <p>
          We examined four possible open-source projects –
Berkeley DB XML [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], eXist [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], Ozone XML [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] and
Xindice [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. All of these solutions have pros and cons;
however, no single solution fulfils all of our requirements.
Overall, we found that eXist [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] provided the most stable
implementation and the critical features we desired. In
addition, eXist has a very active support community.
        </p>
        <p>
          eXist is an open source, native XML database featuring
efficient, index-based XQuery processing [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], automatic
indexing, extensions for full-text search, XUpdate and a Java
interface. At present, eXist has been installed and tested within
the database unit. To support the second use case
(professional multimedia collections), we extended eXist to
give the administrator better control over user groups.
        </p>
        <p>We created collections of MPEG-7 XML documents based
on the multimedia content type. Java classes have been
implemented that query the collections using the XQuery
language in order to extract low-level features and select
multimedia objects by similarity. An interface has also been
implemented to search for images in the database (see
Integration). Given that a URI (Uniform Resource Identifier)
is a basic building block for Semantic Web applications, we
denote every multimedia object by a unique identifier, named
MediaURI, that includes the type of the object and a hash of
the object content. Through a MediaURI, any multimedia
object is univocally identified and can be accessed in our
XML database.</p>
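        <p>A minimal sketch of the MediaURI idea follows, assuming a hypothetical media:// scheme and SHA-1 as the content hash (the concrete format used in 4M is not specified here):</p>

```python
import hashlib

def media_uri(media_type, content):
    # Identifier = media type plus a hash of the raw object bytes, so the
    # same content always yields the same URI. The "media://" scheme is an
    # assumption for illustration, not 4M's concrete format.
    digest = hashlib.sha1(content).hexdigest()
    return "media://%s/%s" % (media_type, digest)

uri = media_uri("image", b"raw JPEG bytes would go here")
print(uri)
```

        <p>Because the hash depends only on the object's bytes, the identifier is stable across hosts and can act as a key into the XML database.</p>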
      </sec>
      <sec id="sec-6-2">
        <title>C. Algorithm Unit</title>
        <p>Algorithms for image analysis (e.g., edge detection, noise
reduction, segmentation etc.) are difficult to manage,
understand and apply, particularly for non-expert users. For
instance, a researcher needs to reduce the noise and improve
the contrast in a radiology image prior to analysis and
interpretation but is unfamiliar with the specific algorithms
that could apply in this instance. This unit aims to provide
user support for the discovery, orchestration and application
of media analysis algorithms. This enables users to define,
store and retrieve the procedures by which multimedia objects
have been produced or processed.</p>
        <p>
          Quantifying and integrating knowledge related to analysis
algorithms for media, particularly describing visual outcomes,
is a challenging problem. Currently there exists a
taxonomy/thesaurus for image analysis algorithms [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] but
this is insufficient to support the required functionality. We
are collaborating on expanding and converting this taxonomy
to an OWL ontology. Challenges include:
• articulating and quantifying the ‘visual’ result of
applying algorithms;
• finding and associating practical example media
with the algorithms specified;
• integrating and harmonizing the ontologies;
• reasoning with and applying the knowledge in the
algorithm ontology (e.g., using input and output
formats to align processes).
        </p>
        <p>Our proposed solution is to use the algorithm ontology to
record and describe available algorithms for application to
image analysis. This ontology can then be used to
interactively build sequences of algorithms to achieve
particular user outcomes or goals in accordance with the
user’s preferences. In addition, the record of processes applied
to the source image can be used to define the history and
provenance of data.</p>
        <p>An example of a problem that could be addressed by the
algorithm ontology is the suggestion of possible
clinical descriptors (e.g., pneumothorax) given a chest x-ray.</p>
        <p>A possible solution could be:
1) Get a digital chest x-ray of patient P (image A).
2) Apply to image A a digital filter to improve the
signal-to-noise ratio (image B).
3) Apply to image B a region detection algorithm. This
algorithm segments image B according to a partition of
homogeneous regions (image C).
4) Apply to image C an algorithm that sorts the regions,
according to a given criterion, by their geometrical and
densitometric properties (from largest to smallest, from
darkest to lightest, etc.) (array D).
5) Apply to array D an algorithm that searches a
database of clinical descriptors and detects the one that best
fits the similarity criterion (result E).</p>
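        <p>The ontology-driven sequencing sketched in steps 1-5 can be illustrated as a format-compatibility check over declared algorithm inputs and outputs; the algorithm names and formats below are hypothetical, not drawn from an actual 4M ontology:</p>

```python
# Each algorithm declares an input and output format; a pipeline is valid
# only when adjacent steps align. All names and formats are illustrative.
ALGORITHMS = {
    "wiener_filter":    {"inp": "grayscale_image", "out": "grayscale_image"},
    "region_detection": {"inp": "grayscale_image", "out": "region_set"},
    "region_sort":      {"inp": "region_set",      "out": "region_array"},
    "descriptor_match": {"inp": "region_array",    "out": "clinical_descriptor"},
}

def valid_sequence(steps, start_format):
    # Walk the chain, checking each step accepts the previous step's output.
    fmt = start_format
    for name in steps:
        algo = ALGORITHMS[name]
        if algo["inp"] != fmt:
            return False
        fmt = algo["out"]
    return True

pipeline = ["wiener_filter", "region_detection", "region_sort", "descriptor_match"]
print(valid_sequence(pipeline, "grayscale_image"))  # True
```

        <p>In the full design, such a check would be carried out by reasoning over the algorithm ontology rather than over a hard-coded table.</p>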
        <p>However, we should consider the following aspects:
Step 2) Which digital filter should be applied to image A?
We can consider different kinds of filters (Fourier, Wiener,
smoothing, etc.), each one having different input-output
formats and giving slightly different results.
Step 3) Which segmentation algorithm should be used? We
can consider different algorithms (clustering, histogram,
homogeneity criterion, etc.).
Step 4) How can we define the geometrical and densitometric
properties of the regions? There are several possibilities
depending on the mathematical models considered for
describing closed curves (regions) and the grey-level
distribution inside each region (histogram, Gaussian-like,
etc.).
Step 5) How can we define similarity between
patterns? There are multiple approaches that can be applied
(vector distance, probability, etc.).</p>
        <p>Each step could be influenced by the previous ones.
Finally, there are two types, or levels, of interoperability to be
considered:
1) low-level interoperability, concerning data formats and
algorithms, their transition or selection among the
different steps, and consequently the possible related
ontologies (algorithm ontology, media ontology, etc.);
2) high-level interoperability, concerning the semantics
underlying the domain problem, that is, how similar
problems (segment this image; improve image quality) can
be faced or even solved using codified 'experience'
extracted from well-known case studies.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <p>We focused our attention mainly on the latter.</p>
      <sec id="sec-7-1">
        <title>D. Multimedia Semantic Annotation Unit</title>
        <p>This unit addresses the issue of (semi-)automatic semantic
annotation of multimedia. It aims to exploit the standardized
media analysis data produced by the MPEG-7 Feature
Extraction and Processing Unit (M) and integrate technologies
such as semantic inferencing rules and machine-learning
approaches to associate domain terms with media objects.
Semantic annotations are highly valuable but generally
expensive to create manually and can be overly subjective.
Quality semantic annotation of media can facilitate
sophisticated semantic search and retrieval, re-use of media
objects and support advanced reasoning applications.</p>
        <p>
          The gap between the automatically extracted, low-level
feature metadata and the difficult-to-generate, high-level
semantic metadata is often termed the “semantic gap”.
Smeulders et al. define it as “the discrepancy between the
information that one can extract from the visual data, and the
interpretation that the same data has for a user” [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Bridging
or otherwise mitigating this divide is an area of great interest
within the multimedia field (e.g., [
          <xref ref-type="bibr" rid="ref31 ref32">31, 32</xref>
          ]). Existing
multimedia annotation tools, such as IBM Multimodal
Annotation Tool (alphaWorks) [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ], aceMedia M-Ontomat
Annotizer [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] and Caliph-Emir [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ], support user annotation
and use multimedia standards or models such as MPEG-7 or
the aceMedia ontology. However, these tools are limited when
it comes to integration with a Java, web-based infrastructure
and do not provide the necessary level of automation.
        </p>
        <p>
          Therefore, previous work by Hunter and Little [
          <xref ref-type="bibr" rid="ref36 ref37">36, 37</xref>
          ] is
being extended with machine-learning approaches to relate
low-level media analysis data to high-level semantic terms
defined in an ontology. Domain terms are defined by specific
ontologies such as GO [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ], MeSH [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ], FOAF [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ], Wordnet
[
          <xref ref-type="bibr" rid="ref41">41</xref>
          ] etc. for particular application areas. Semantic inferencing
rules can be used to define relationships between features
(color, shape, texture etc.) and domain concepts within the
ontologies. We are investigating a hybrid approach involving
the use of Multi-level Artificial Neural Networks (MANN) to
specialize the rules and exploit the relationships defined in the
ontologies.
        </p>
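        <p>As an illustration of the kind of rule such a hybrid approach could start from (the thresholds and the domain term are invented for this example, not taken from the cited work):</p>

```python
def annotate_region(dominant_rgb, texture_energy):
    # Hand-written rule of the kind the annotation unit could later
    # specialize with learning: predominantly blue, smooth regions
    # suggest "sky". Thresholds and the term are invented examples.
    r, g, b = dominant_rgb
    if b > r and b > g and texture_energy > 0.8:
        return "sky"
    return "unknown"

print(annotate_region((60, 90, 200), 0.9))   # sky
print(annotate_region((200, 40, 30), 0.2))   # unknown
```

        <p>A learning component would then tune such thresholds, and an ontology would let the system generalize the inferred term (e.g., from "sky" to "outdoor scene").</p>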
        <p>Finally, annotations can also include user-created,
natural-language, subjective comments relating to media objects or
possibly media objects themselves (e.g., an audio
commentary) that can be associated with a media segment.
Future work in this area will investigate how to record, store
and manage other annotation types in conjunction with the 4M
infrastructure.</p>
      </sec>
      <sec id="sec-7-2">
        <title>E. Integration Unit</title>
        <p>
          All of the units previously discussed interact and can be
accessed and managed by an Integration unit, which supports
the retrieval and insertion of information through suitable
tools and interfaces. The principal purpose of this unit is to
provide interfaces and controllers between the individual units
and the user. To assist in this we are investigating inference
engines based on OWL and SPARQL [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ], and Java tools,
such as Jena [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ] and Jess [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ].
        </p>
        <p>At present, a web-based interface has been implemented so
that a user can select a sound or image from the database,
choose a set of features and then extract from the database all
the sounds and images that are similar to the given one
according to the selected features. The interface has been
realized using JavaServer Pages (JSP) forms that send the
selected parameters to a Java servlet. Apache Tomcat is used
as the application server.</p>
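        <p>The similarity retrieval described above can be sketched as a nearest-neighbour search over feature vectors; the vectors below are made-up stand-ins for descriptors such as ScalableColor:</p>

```python
import math

def distance(a, b):
    # Euclidean distance between two feature vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical feature vectors keyed by stored object names.
database = {
    "pict1.jpg": [10, 20, 30],
    "pict2.jpg": [12, 19, 33],
    "pict3.jpg": [90, 5, 60],
}

def most_similar(query_vec, db):
    # Return the stored object whose vector is closest to the query.
    return min(db, key=lambda name: distance(query_vec, db[name]))

print(most_similar([12, 20, 32], database))  # pict2.jpg
```

        <p>In the implemented system this ranking is carried out server-side over descriptors retrieved from the XML database, rather than over an in-memory table as above.</p>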
        <p>Overall the independence of the units provides a scalable
infrastructure that allows new technologies to be easily
integrated.</p>
      </sec>
      <sec id="sec-7-2x">
        <title>IV. INTEGRATION AND EXTENSION OF ONTOLOGIES</title>
        <p>Within the infrastructure described in Section III, a number
of requirements for structured, formal definitions of concepts
can be identified. The best way to approach this is by means
of an ontology, which provides a method for structuring a universe
of discourse and the possibility of increasing this
knowledge through inference engines and inferred knowledge.
The use of ontologies is necessary to support: interoperability
of multimedia metadata; advanced reasoning using low-level
data (e.g., for pattern detection, semantic annotation, etc.);
semantic search and retrieval and integration and application
of analysis algorithms. Therefore not only one, but five
distinct ontologies are required:
• Multimedia ontology – media types, descriptions of
low-level features, creation metadata, etc.;
• Algorithm ontology (described in section III.C);
• Possibly a web-services ontology for the algorithm
unit (e.g., OWL-S);
• Domain ontologies – recording domain specific terms
and concepts (e.g., MeSH, FOAF, etc.);
• Upper or core ontologies for integration.
We are now working on defining, extending and integrating
these ontologies.</p>
      </sec>
      <sec id="sec-7-3">
        <title>A. Multimedia Ontology – MPEG-7</title>
        <p>
          A number of projects have used the MPEG-7 standard to
derive a multimedia ontology [
          <xref ref-type="bibr" rid="ref45 ref46">45, 46</xref>
          ]. However, extensions to the
MPEG-7 standard are required to define specific
low-level analysis features such as ‘eccentricity’, ‘ColorRange’,
etc. Within the 4M infrastructure, this is important to integrate
with the output definitions in the algorithm ontology. Previous
work by Hollink et al. [
          <xref ref-type="bibr" rid="ref47">47</xref>
          ] describes some extensions to
Hunter’s MPEG-7 ontology by creating subproperties of the
visual descriptor to incorporate analysis terms.
        </p>
      </sec>
      <sec id="sec-7-4">
        <title>B. Algorithm Ontology</title>
        <p>The existing taxonomy lacks the specific, formal details
required to integrate the algorithms within the 4M
infrastructure: for example, detailed definitions of the required
input formats, such as ‘binary’, ‘JPG’, etc., and structured
descriptions of the goals or outcomes of applying the
algorithm, which may include the association of example
media. The challenge is to develop methods for quantifying
‘visual’ characteristics to assist users (or agents) in evaluating
the usefulness of an algorithm for their particular purpose.</p>
      </sec>
      <sec id="sec-7-5">
        <title>C. Integration through a Core Ontology</title>
        <p>We focused on extending the available technology towards
multimedia ontologies to add semantics in order to handle
applications that require annotation, retrieval, and
summarization of multimedia documents. Such an extension is
being done in line with the Semantic Web technology, so that
integration and interoperability with other existing
applications and tools can be provided.</p>
        <p>
          Research communities working on standards are developing
upper ontologies in order to achieve interoperability among
metadata, and integration of multimedia data. An upper level
ontology defines structures and concepts upon which single
domain ontologies could be implemented. An upper ontology
is defined through abstract concepts, which are generic
enough to be exploited by a wide range of domains. Indeed,
upper ontologies are especially suitable for multimedia data
interoperability and integration, as demonstrated in [
          <xref ref-type="bibr" rid="ref48 ref49 ref50">48, 49,
50</xref>
          ].
        </p>
        <p>The use of an upper ontology facilitates the integration of
multi-source multimedia information. By combining metadata
from various initiatives (Dublin Core, MPEG-7, MPEG-21,
CIDOC/CRM, etc.), an upper ontology also provides a basis
for semantic interoperability and the development of services
based on deductive inferencing. Moreover, providing a
common model with a single set of semantic definitions
facilitates the efficiency and interoperability of multimedia
systems based on the lower-level integrated standards.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>V. FUTURE WORK AND CONCLUSIONS</title>
      <p>Currently work is continuing in parallel on the algorithm
and semantic annotation units. In addition there is ongoing
development on the functionality and implementation of the
integration unit. Possible future extensions to the
infrastructure include: distributed data storage and access in
the database unit; enhanced functionality for fine-grained,
role-based access control and incorporating reasoning
capabilities into the integration unit to further improve search
and retrieval capabilities.</p>
      <p>An initial prototype version of the infrastructure has been
developed that integrates the prototype versions of the
MPEG-7 feature extraction and database units. This prototype
demonstrates some of the technical challenges faced in
integrating multimedia metadata.</p>
      <p>Overall, the architecture proposed here enables media to be
combined and managed. In addition valuable semantic
services can be supported, such as semantic search and
retrieval, algorithm discovery and application and semantic
annotation.</p>
    </sec>
    <sec id="sec-9">
      <title>ACKNOWLEDGMENT</title>
      <p>The authors would like to thank Marco Tampucci for his
valuable contribution. This work has been partially supported
by the EU MUSCLE Network of Excellence.</p>
      <p>MultiMedia Metadata Management: a Proposal for an Infrastructure, submitted to SWAP 2006.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] W3C World Wide Web Consortium, http://www.w3.org/</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>DELOS</given-names>
            <surname>Network</surname>
          </string-name>
          of Excellence http://www.delos.info/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] aceMedia project http://www.acemedia.org/aceMedia/</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] MUSCLE Network of Excellence 'Multimedia Understanding through Semantics, Computation and Learning' http://www.muscle-noe.org/</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] MUSCLE Workpackage 9 http://muscle.isti.cnr.it/</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Boll, S.; Klas, W.; Sheth, A. Overview on Using Metadata to Manage Multimedia Data. McGraw Hill, 1998</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Subrahmanian, V.S. Principles of Multimedia Database Systems. Morgan Kaufmann, 1998</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] MARC</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Dublin Core Metadata Initiative http://dublincore.org</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Visual Resources Association Data Standards Committee, VRA Core http://www.vraweb.org/vracore3.htm</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] IEEE Standard for Learning Technology - Extensible Markup Language (XML) Schema Definition Language Binding for Learning Object Metadata, IEEE Std 1484.12.3-2005</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] I3A DIG35 Initiative Group http://www.i3a.org/i_dig35.html</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] MPEG-7 http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg7.htm</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] MPEG-21 http://www.chiariglione.org/mpeg/standards/mpeg-21/mpeg21.htm</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] RDF http://www.w3.org/RDF/</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] RDF Schema http://www.w3.org/TR/rdf-schema/</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] OWL http://www.w3.org/2004/OWL/</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] W3C Incubator Group on Multimedia Semantics http://www.w3.org/2005/Incubator/mmsem/</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] van Ossenbruggen, J.; Nack, F.; Hardman, L. That Obscure Object of Desire: Multimedia Metadata on the Web, Part 1. IEEE MultiMedia, IEEE Computer Society, 2004, 11, 38-48</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] Nack, F.; van Ossenbruggen, J.; Hardman, L. That Obscure Object of Desire: Multimedia Metadata on the Web, Part 2. IEEE MultiMedia, IEEE Computer Society, 2005, 12, 54-63</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] Dasiopoulou, S.; Papastathis, V.K.; Mezaris, V.; Kompatsiaris, I.; Strintzis, M.G. An Ontology Framework For Knowledge-Assisted Semantic Video Analysis and Annotation. Proc. 4th International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot 2004) at the 3rd International Semantic Web Conference, 2004</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] Dasiopoulou, S.; Kompatsiaris, I.; Strintzis, M. Semantic Processing of Color Images. In: Color Image Processing: Methods and Applications. Lukac, R. &amp; Plataniotis, K. (Editors), CRC Press, 2006</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] XML eXtensible Markup Language http://www.w3.org/XML/</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] Berkeley DB (subsumed by Oracle) see http://www.xml.com/pub/a/2003/05/07/bdb.html</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] eXist http://exist.sourceforge.net/</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] Ozone XML http://www.ozone-db.org</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] Xindice http://xml.apache.org/xindice/</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] XQuery http://www.w3.org/XQuery/</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] Beloozerov, V.N.; Gurevich, I.B.; Gurevich, N.G.; Murashov, D.M.; Trusova, Y.O. Thesaurus for Image Analysis: Basic Version. Pattern Recognition and Image Analysis, Vol. 13, No. 4, 556-569, 2003</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] Smeulders, A.; Worring, M.; Santini, S.; Gupta, A.; Jain, R. Content-Based Image Retrieval at the End of the Early Years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] Zhao, R.; Grosky, W. Negotiating The Semantic Gap: From Feature Maps to Semantic Landscapes. Pattern Recognition, 2002, 35, 51-58</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] Hollink, L.; Schreiber, A.; Wielemaker, J.; Wielinga, B. "Semantic Annotation of Image Collections." In KCAP'03 Workshop on Knowledge Capture and Semantic Annotation, Florida, USA, 2003</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] alphaWorks, IBM Multimodal Annotation Tool http://www.alphaworks.ibm.com/tech/multimodalannotation</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[34] aceMedia, M-OntoMat-Annotizer http://www.acemedia.org/aceMedia/results/software/m-ontomatannotizer.html</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>[35] Caliph-Emir http://www.semanticmetadata.net/</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>[36] Hunter, J.; Drennan, J.; Little, S. Realizing the Hydrogen Economy through Semantic Web Technologies. IEEE Intelligent Systems Journal - Special Issue on eScience, 2004</mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>[37] Little, S.; Hunter, J. Rules-By-Example - A Novel Approach to the Semantic Indexing and Querying of Images. International Semantic Web Conference (ISWC2004), 2004</mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>[38] Smith, B.; Williams, J.; Schulze-Kremer, S. "The Ontology of the Gene Ontology." In Proceedings of AMIA Symposium, 2003</mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>[39] National Library of Medicine. "Medical Subject Headings (MeSH)." http://www.nlm.nih.gov/mesh/</mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>[40] FOAF http://www.foaf-project.org</mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>[41] Fellbaum, C. WordNet: An Electronic Lexical Database. MIT Press, 1998</mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>[42] SPARQL http://www.w3.org/TR/rdf-sparql-query/</mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>[43] Jena http://jena.sourceforge.net/</mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>[44] Jess http://herzberg.ca.sandia.gov/jess/</mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>[45] Hunter, J. Adding Multimedia to the Semantic Web - Building and Applying an MPEG-7 Ontology. In: Multimedia Content and the Semantic Web: Standards, Methods and Tools, Giorgos Stamou and Stefanos Kollias (Editors), Wiley, 2005</mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>[46] Garcia, R.; Celma, O. A Complete MPEG-7 OWL Ontology based on an XML Schema to OWL Mapping www.acemedia.org/aceMedia/files/multimedia_ontology/presentations_1st_meeting/special_event_EWIMT05-celma.ppt</mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>[47] Hollink, L.; Little, S.; Hunter, J. "Evaluating the Application of Semantic Inferencing Rules to Image Annotation." In Proceedings of the Third International Conference on Knowledge Capture, KCAP05, Banff, Canada, 2005</mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>[48] Doerr, M.; Hunter, J.; Lagoze, C. "Towards a Core Ontology for Information Integration." Journal of Digital Information, 4(1), 2003</mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>[49] Hunter, J. "Enhancing the Semantic Interoperability of Multimedia through a Core Ontology." IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Conceptual and Dynamical Aspects of Multimedia Content Description, February 2003</mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>[50] Petridis, K.; Bloehdorn, S.; Saathoff, C.; Simou, N.; Dasiopoulou, S.; Tzouvaras, V.; Handschuh, S.; Avrithis, Y.; Kompatsiaris, Y. &amp; Staab, S. Knowledge Representation and Semantic Annotation of Multimedia Content. IEE Proceedings on Vision, Image and Signal Processing - Special Issue on the Integration of Knowledge, Semantics and Digital Media Technology, 2006, 153, 255-262</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>