Why Real-World Multimedia Assets Fail to Enter the
                         Semantic Web
                                             ∗
                          Tobias Bürger                                        Michael Hausenblas
           Digital Enterprise Research Institute (DERI)                          Joanneum Research
                       Technikerstrasse 21a                                        Steyrergasse 17
                      6020 Innsbruck, Austria                                     8010 Graz, Austria
                    tobias.buerger@deri.at                             michael.hausenblas@joanneum.at

ABSTRACT                                                           More recently, popular attention has shifted away from
Making multimedia assets on the one hand first-class objects       image sharing to richer content sharing of videos. This can
on the Semantic Web, while keeping them on the other hand          be seen in the launch of video portals like iFilm.com, Zid-
conforming to existing multimedia standards is a non-trivial       dio.com or the dozens of other portals that appeared recently
task. Most proprietary media asset formats are binary, op-         to compete with YouTube.
timized for streaming or storage. However, the semantics
carried by the media assets are not accessible directly. In        Unsurprisingly there is already a portal called VideoRonk3
addition, multimedia description standards lack the expres-        trying to combine other portals by providing a MetaSearch
siveness to gain a semantic understanding of the media as-         interface, which is quite a help, as one does not want
sets. An array of requirements already exists regarding both       to search on ten or more different sites. However, what is
media assets and the Semantic Web. Based on a                      missing is the link between the contents of all these sites,
critical review of these requirements we investigate how on-       enabling distributed recommendations, cross-linking, etc.
tology languages fit into the picture. We finally analyse the
usefulness of formal accounts to describe spatio-temporal as-      Still, a cross-site search on the semantic level, for example,
pects of multimedia assets in a practical context.                 is close to impossible. The most obvious reason is the
                                                                   lack of metadata accompanying the content. The
Categories and Subject Descriptors                                 power of providing metadata along with content on the Web
                                                                   can be seen in thriving mashups that not only combine
H.5.1 [Information Systems]: Multimedia Information
                                                                   APIs, provided by parties such as Google4, but also try to
Systems; I.7.4 [Document and Text Processing]: Elec-
                                                                   mash up things on a semantic level. This can be observed for
tronic Publishing
                                                                   example at Joost [40]. Having metadata about everything,
                                                                   such as video content, blog posts, news feeds and the users
General Terms                                                      of the system, makes this new experience of watching TV over
Multimedia Semantics, Semantic Web                                 the Internet possible. To take this one step further:
                                                                   if every stream or video available on the Internet were
Keywords                                                           described in more detail, content on the Internet could
Multimedia Assets for the Semantic Web, Multimedia Mod-            be matched with user profiles from applications like Joost
els, Requirements Analysis                                         and offered for viewing.

                                                                   As pointed out in [46, 36], high-quality metadata is essential
1.     INTRODUCTION                                                for multimedia applications. Our recent work within initia-
Today, the Web is experiencing an explosion of content             tives [47] and research projects5 has shown that there is a
generated by and for home users [33]: an in-                       need to go beyond current metadata standards to annotate
creasing number of people produce media assets (such as photos,    media assets.
video clips, etc.) and share them on popular sites such as Flickr1
and YouTube2.                                                      Current XML-based standards [24] are diverse, often pro-
∗Tobias Bürger is also affiliated with Salzburg Research          prietary and not ad hoc interoperable; cf. also [45]. In
1
    http://www.flickr.org                                          SALERO, for example, we face the problem of offering
2
    http://www.youtube.com                                         a semantic search facility over a diverse set of multimedia
                                                                   assets, e.g., images, videos, 3D objects or character anima-
                                                                   tions. The same is true for the Austrian project GRISINO6
                                                                   where we aim to realize a semantic search facility for cultural
                                                                   heritage collections.

                                                                   3
                                                                     http://www.videoronk.com
                                                                   4
                                                                     http://code.google.com/apis/
                                                                   5
                                                                     such as in the EU project SALERO,
                                                                   http://www.salero.info
                                                                   6
                                                                     http://www.grisino.at
Automating the handling of metadata for these collections         2.    REQUIREMENTS FOR THE DESCRIP-
and automating linkage between parts of these collections is            TION OF MULTIMEDIA ASSETS
hard as the vocabularies to describe them are mostly diverse      Requirements for multimedia content descriptions have been
and do not offer facilities to attach formal descriptions.        researched in a number of papers [17, 46, 36, 6] before and
                                                                  investigations of the combination of multimedia descriptions
A Motivating Scenario. Imagine a person who wants to              with features of the Semantic Web are already numerous [27,
watch recent clips similar to those of his favourite ex-          3, 42, 44, 2]. In the following, we summarise the
perimental artist. Tons of clips are potentially distributed      proposed requirements and add two additional ones
on the Web, making a search for them time-consuming and           (Authoring & Consumption and Performance & Scalability).
laborious. Thus a central facility to search for and negotiate
content is needed. This facility should allow the user to formulate a
search goal, including the characteristics, the subject mat-
ter, a maximum price, and the preferred encoding and file
                                                                  Representational Issues. A basic prerequisite is the for-
                                                                  mal grounding and neutral representation of the format used
format of the clip. In a next step, all portal offerings will
                                                                  to describe multimedia assets.
be scanned in order to retrieve and negotiate content that
matches the user’s intention. Note that parts of a video
may also match his intention, which means that videos need to          • Neutral Representation: The ideal multimedia meta-
be described at a fine granularity and in sufficient detail.             data format has a platform and application indepen-
                                                                         dent representation, and is both human and machine
In order for this scenario to work, the descriptions of (1) the          processable;
goal formulation, (2) the media content offered by
all content owners and (3) the negotiation semantics have              • Formal Grounding: Knowledge about media assets must
to be compatible. Three important focal points of these                  be represented in formal languages, as it must be in-
semantic descriptions are:                                               terpretable by machines to allow for automation.


   • Expressivity for high level semantic descriptions of con-
     tent, as typical users do not think in terms of colour       Extensibility & Reusability. It is required that the for-
     histograms and spatial / temporal constructs. The            mat at hand is extensible, e.g., via an extension mechanism
     characteristics of the media should be described in          as found in MPEG-7. It should be possible to integrate or
     sufficient detail.                                           reference existing vocabularies [24].

   • The need for rules: To effectively identify the part of
     the content that matches the user’s intention, rules are     Multimedia Characteristics and Linking. The format
     needed to map high level semantic concepts to spatial        should reflect the characteristics of media assets and hence
     and temporal segments of the video (e.g., because rat-       allow linking between data and annotations:
     ings and classifications may apply only to parts of the
     content, i.e., a scene including crime is only suitable for
     adults).                                                          • Description Structures. The format should support de-
                                                                         scription structures at various levels of detail, includ-
   • Fine-grained semantic descriptions, as transferring the             ing a rich set of structural, cardinality, and multimedia
     whole content is often not possible for reasons of band-            data-typing constraints;
     width, user effort, or cost. Thus parts of the content
     should be described in sufficient detail.                         • Granularity. The language has to support the defi-
                                                                         nition of the various spatial, temporal, and concep-
                                                                         tual relationships between media assets in a commonly
To reach out, we want to provide answers to the question:                agreed-upon format;
Why do we need rich semantic descriptions of media assets
on the Web, and (why) is there a need to bundle these de-              • Linking. It has to facilitate a diverse set of linking
scriptions together with the multimedia assets? Simultane-               mechanisms between the annotations and the data be-
ously, we want to provide answers to the questions: How can              ing described, including a way to segment temporal
descriptions be provided? Why are the metadata features                  media.
of multimedia standards not enough?
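
To make the motivating scenario more concrete, the following minimal Python sketch matches a search goal against fine-grained segment descriptions. All class names, fields, URLs and values are our own illustrative assumptions and are not part of any standard; the single rule shown (a rating attached to a segment restricts only that segment) merely mirrors the "need for rules" item above.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A temporal part of a clip with its own high-level description (hypothetical model)."""
    start_s: float
    end_s: float
    concepts: set
    adults_only: bool = False


@dataclass
class Clip:
    url: str
    price_eur: float
    encoding: str
    segments: list = field(default_factory=list)


@dataclass
class SearchGoal:
    concept: str
    max_price_eur: float
    encoding: str
    viewer_is_adult: bool


def matching_segments(goal, clips):
    """Return (clip URL, segment) pairs that satisfy the search goal."""
    hits = []
    for clip in clips:
        # Clip-level constraints: price and encoding.
        if clip.price_eur > goal.max_price_eur or clip.encoding != goal.encoding:
            continue
        for seg in clip.segments:
            # Rule: a rating attached to a segment restricts only that segment.
            if seg.adults_only and not goal.viewer_is_adult:
                continue
            if goal.concept in seg.concepts:
                hits.append((clip.url, seg))
    return hits


# Illustrative data only.
clips = [
    Clip("http://example.org/clip1", 2.0, "ogg", [
        Segment(0, 30, {"experimental", "dance"}),
        Segment(30, 60, {"experimental", "crime"}, adults_only=True),
    ]),
    Clip("http://example.org/clip2", 9.0, "ogg", [
        Segment(0, 20, {"experimental"}),
    ]),
]
goal = SearchGoal("experimental", 5.0, "ogg", viewer_is_adult=False)
result = matching_segments(goal, clips)
```

In this sketch only the first segment of the first clip is returned: the second segment is excluded by the rating rule and the second clip by the price limit. A real system would of course need shared vocabularies for the concepts instead of plain strings.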

Consequently, we elaborate on the answer to the question
                                                                  Authoring & Consumption. A major drawback of exist-
stated in the title of this paper “Why Multimedia Assets
                                                                  ing metadata approaches is their lack of support for authors
Fail to Enter the Semantic Web?” by first collecting the re-
                                                                  in creating annotations, along with the lack of benefits from
quirements for the description of multimedia assets (section
                                                                  generated annotations.
2), secondly by analysing the environment (section 3), and
thirdly by collecting requirements for multimedia assets on
the Semantic Web (section 4). In section 5 we analyse ex-              • Engineering support. Appropriate tools are a prerequi-
isting ontology languages for their usefulness regarding the             site for uptake of new vocabularies. There is the need
requirements and conclude in section 6 with a discussion of              for at least authoring and consumption environments
the question stated in the title of this paper and give a brief          making use of the vocabularies to demonstrate their
outlook on the open issues.                                              usefulness.
      • Deployment. Multimedia Assets need to be exchange-         of the existing metadata (sub-symbolic level - symbolic level
        able, and there must be ways to deploy descriptions        - semantic level) is a necessary prerequisite for multimedia
        along with the assets.                                     assets to enter the Semantic Web successfully. Secondly,
                                                                   from the requirements gathered in section 2 and the envi-
                                                                   ronmental analysis done in section 3 we deduce the following
                                                                   characteristics for multimedia assets on the Semantic Web:
Performance & Scalability. The language should yield
descriptions that can be stored, processed, exchanged and
queried effectively and efficiently.                               Formality of Descriptions. Formal descriptions are the
                                                                   basic building blocks of the Semantic Web. To enable auto-
MPEG-7. MPEG-7 [35] is a powerful and flexible way to              matic handling, such as retrieval and negotiation of multimedia
describe media assets at several levels of granularity; on the     assets, formality of descriptions is a prerequisite.
other hand MPEG-7 bears some intrinsic complexity and              Three different (semantic) levels of multimedia metadata
interoperability issues [4, 46, 36, 43]. Because the               can be identified [17]: (1) At the subsymbolic layer, covering
MPEG-7 standard is not grounded on formal semantics for            the raw multimedia information, binary formats are typically
the descriptions, variability in the syntactic representation      used, which are optimized for storage or streaming and which
of the descriptions may cause interoperability issues.             mostly do not provide metadata. (2) The symbolical layer
                                                                   provides an additional structural layer for the binary essence
3.     ENVIRONMENT ANALYSIS:                                       stream. For this level standards like MPEG-7, Dublin Core
       THE SEMANTIC WEB                                            or MPEG-21 can be used. The semantics of the informa-
A good starting point for the analysis of our targeted host-       tion encoded with these standards are only specified within
ing environment, the Semantic Web, is the Architecture of          each standard’s framework. (3) Therefore the semantic and
the World Wide Web [28], in which its three main building          logical layer is needed to provide the semantics for the sym-
blocks are discussed: identification, interaction, and data        bolical layer. This layer should be formally described using
formats. The Semantic Web, as an extension of the well-            ontology languages as proposed in this paper.
known Web roughly has the following characteristics:

                                                                   Efficient layering and referencing of descriptions. It is
      • It is a highly distributed system. Identification of re-   necessary to support different levels of meaning attached
        sources is based on URIs—for both data and services;       to multimedia assets, i.e., meaning at the bit-level, tradi-
                                                                   tional metadata and semantic (high-level) information. As
      • There is no single, central “registry”, viz. authorities
                                                                   there are already widely adopted standards available for the
        are decentralised; data and metadata are under the control
                                                                   description of multimedia assets, the semantic layer must
        of many distinct parties (companies, standardis-
                                                                   be efficiently put upon those traditional description layers
        ation bodies, private individuals, etc.);
                                                                   and should not aim to replace them. Furthermore, semantic
      • As in the Web, fundamental building blocks are re-         descriptions from these traditional layers shall be re-used.
        lations between data, whereas the relations in the Se-     As content, parts of content, and traditional and semantic
        mantic Web are named, may be of any granularity and        descriptions may be distributed, efficient referencing mech-
        allow the automatic interchange of data;                   anisms for multimedia content must be present.

      • Contribusers7 inhabit it; each participant may play dif-   Based on recent discussions8 we summarise the
        ferent roles at once: consuming content and contribut-     possible approaches in the following. The multimedia asset
        ing via comments, links, etc.                              is denoted with A; for the multimedia metadata (M3) format,
                                                                   such as MPEG-7, we write M; the ontology (language) is
      • Finally, there exist a number of standards, such as
                                                                   written as O; and finally an external reference mechanism9
        RDF for formal definitions of the intended mean-
                                                                   is labelled with R. The linking is depicted with ↪:
        ing, SPARQL for querying, RDF(S), OWL or SKOS to
        classify content and OWL, WSML, or RIF for describ-           • M ↪ A. The content is referenced from the M3 format;
        ing logical relationships.                                      the ontology layer has to deal with it separately;
                                                                      • M ↪ O. The M3 format references the ontology layer;
Any multimedia metadata format that aims at successful
application on the Semantic Web has to be in line with the            • O ↪ M. The ontology layer references the M3 format;
characteristics listed above. While some requirements, such as        • O ↪ A. The ontology layer references the content di-
formats (e.g. XML), are rather easy to meet, others can pose            rectly;
serious problems regarding the integration into the Semantic
Web.                                                                  • O, M ↪R A. The ontology layer and the M3 format use
                                                                        a common reference mechanism to link to the content.
4.     MULTIMEDIA ASSETS ON THE                                    However, it has to be noted that there is no standardised
       SEMANTIC WEB                                                way for the layering or the referencing, yet.
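
The linking approaches listed above can be illustrated with a small Python sketch that stores each approach as a set of directed links and checks whether a given layer can reach the content by following them. The URIs, the dictionary keys and the `#t=10,20` temporal-fragment syntax are assumptions made purely for illustration; they are not prescribed by any of the approaches discussed.

```python
# Hypothetical identifiers; none of them are taken from a real deployment.
ASSET = "http://example.org/video.ogg"           # A: the multimedia asset
M3_DOC = "http://example.org/video.mpeg7.xml"    # M: multimedia metadata document
ONTO_DESC = "http://example.org/video.rdf"       # O: ontology-based description
FRAGMENT = ASSET + "#t=10,20"                    # R: assumed temporal-fragment reference

# Each approach is a list of (source, target) links.
LINKING_PATTERNS = {
    "M->A": [(M3_DOC, ASSET)],
    "M->O": [(M3_DOC, ONTO_DESC)],
    "O->M": [(ONTO_DESC, M3_DOC)],
    "O->A": [(ONTO_DESC, ASSET)],
    "O,M->R->A": [(ONTO_DESC, FRAGMENT), (M3_DOC, FRAGMENT)],
}


def reaches_asset(links, start):
    """Follow links transitively; True if `start` reaches the asset or a fragment of it."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        if node == ASSET or node.startswith(ASSET + "#"):
            return True
        if node in seen:
            continue
        seen.add(node)
        todo.extend(target for source, target in links if source == node)
    return False


# With M->A alone, the ontology layer has no path to the content.
onto_alone = reaches_asset(LINKING_PATTERNS["M->A"], ONTO_DESC)
# Combining O->M with M->A lets the ontology layer reach the content indirectly.
onto_layered = reaches_asset(
    LINKING_PATTERNS["O->M"] + LINKING_PATTERNS["M->A"], ONTO_DESC
)
```

The sketch only models reachability; it says nothing about how the links are serialised in MPEG-7 or RDF, which is exactly the part for which no standardised solution exists yet.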
Firstly, addressing the environmental requirements together        8
with an efficient layering of the semantic descriptions on top       http://lists.w3.org/Archives/Public/
                                                                   public-xg-mmsem/2007Apr/0002.html
7                                                                  9
    a portmanteau word; contributor and user                         http://www.annodex.net/TR/URI_fragments.html
Interoperability among descriptions. Many formats used           ing, and deployment of multimedia assets along with their
in various communities cause interoperability problems when      associated metadata. In the following the most important
dealing with multimedia content. To overcome this, an RDF        areas of engineering support are listed:
based semantic layer should be added on top of these numer-
ous formats to ease their semantic and syntactic integration.
However, there are some open problems regarding the inte-              • Edit & Visualise. To aid the engineer in handling
gration of existing annotation standards and semantic ap-                the annotations, editor tools and IDEs10 are needed.
proaches [46, 36]: The stack of Semantic Web languages                   These may include validator services11, converters or
and technologies provided by the W3C is well suited to the               mappers, and visualisation modules.
formal, semantic descriptions of the terms in a multimedia             • Libraries & Applications. When developing applica-
document’s annotation. But, as also pointed out in [41], the             tions, the availability of APIs is a core requirement.
Semantic Web based languages lack the structural advan-                  In particular for Semantic Web applications, interface and
tages of the XML-based approaches. Additionally, there is                mapping issues are of importance [19].
a huge amount of work already done on multimedia docu-
ment annotation within the framework of other standards.               • Deployment. Multimedia containers such as HTML, SMIL,
This is why a combination of the existing standards is the               etc. require the metadata either to be referenced from
most promising path for multimedia document description                  within the media assets or to be embedded into them.
in the near future.                                                      As the data model needs to be RDF (in contrast to
                                                                         existing, flat technologies such as tags), upcoming ap-
                                                                         proaches such as RDFa [1] need to be utilised thoroughly.
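
As an illustration of the embedding variant of deployment, the following Python sketch renders an HTML fragment that carries metadata about an asset in RDFa-style `about` and `property` attributes. The helper name `embed_metadata`, the example URL and the choice of Dublin Core properties are our own assumptions; the sketch is not a complete or validated RDFa serialiser.

```python
from html import escape


def embed_metadata(asset_url, properties):
    """Render an HTML fragment whose RDFa-style attributes describe `asset_url`.

    `properties` maps prefixed property names (e.g. "dc:title") to literal values.
    """
    lines = [
        '<div xmlns:dc="http://purl.org/dc/elements/1.1/" '
        f'about="{escape(asset_url, quote=True)}">'
    ]
    for prop, value in properties.items():
        # Attribute values and text content are escaped separately.
        lines.append(
            f'  <span property="{escape(prop, quote=True)}">{escape(value)}</span>'
        )
    lines.append("</div>")
    return "\n".join(lines)


# Illustrative values only.
snippet = embed_metadata(
    "http://example.org/video.ogg",
    {"dc:title": "Test clip", "dc:creator": "A. Author"},
)
```

The referencing variant would instead keep the description in a separate document and only point to it from the container; which of the two is preferable depends on who controls the container and the description.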
Subjectivity and granularity of descriptions. Opinions
and views on the content differ among users because of their     5. FORMAL DESCRIPTIONS OF MULTI-
personal background, culture or previous experiences. As
many users are potential contributors to descriptions of as-
                                                                    MEDIA ASSETS
                                                                 This part introduces the ontology languages that are candidates
sets, opinions may differ. Often these opinions cannot be
                                                                 for meeting the advanced requirements identified in the
merged into a single consistent view. This is why it should
                                                                 previous sections. At its core, it comprises a
be possible to separately attach these opinions to multime-
                                                                 comparison of two families of ontology languages against the
dia assets and keep them separate.
                                                                 requirements postulated in section 4.
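
The requirement on subjectivity, i.e., attaching differing opinions to an asset while keeping them separate, can be sketched as a store that keys statements by asset and contributor. The class and method names below are hypothetical and chosen only for illustration; in an RDF setting a comparable separation could be achieved by keeping one graph per contributor.

```python
class AnnotationStore:
    """Keeps each contributor's statements about an asset in a separate set."""

    def __init__(self):
        # (asset, contributor) -> set of (property, value) statements
        self._statements = {}

    def annotate(self, asset, contributor, prop, value):
        self._statements.setdefault((asset, contributor), set()).add((prop, value))

    def view(self, asset, contributor):
        """Return only what this contributor stated about the asset."""
        return set(self._statements.get((asset, contributor), set()))

    def contributors(self, asset):
        return sorted(c for (a, c) in self._statements if a == asset)


# Illustrative data only.
store = AnnotationStore()
clip = "http://example.org/clip1"
store.annotate(clip, "alice", "mood", "calm")
store.annotate(clip, "bob", "mood", "disturbing")
```

Because the two conflicting "mood" statements live under different keys, neither overwrites the other, and a consumer can decide whose view to trust.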

                                                                 The reader is invited to note that not all of the existing
Trust and IPR issues. The Web consists of decentralized          languages have the same expressiveness and not all have the
authorities and a huge number of contribusers. As descrip-       same inferential capabilities. Further, the underlying knowl-
tions of content, especially in the new changing Web 2.0         edge representation paradigms can differ (e.g., Description
environment, are subject to vandalism, there need to be          Logics, Logic Programming, etc.). Corcho and Gomez-Perez
ways to guarantee the validity of the descriptions and to        [20] present a framework that allows for analysing and com-
secure descriptions that are just read-only for a user group.    paring the expressiveness and reasoning capabilities of on-
Popular portals like Flickr or YouTube show that there is no     tology languages, which can be used in the decision process.
need to own content in order to annotate it. Furthermore         The process of choosing and selecting the appropriate ontol-
copyright is critical when dealing with multimedia content.      ogy language includes questions about e.g. the expressive-
                                                                 ness, inference mechanisms, translators or exchange formats
                                                                 offered for an ontology language. We are going to take these
Functional Descriptions. Sometimes the fact that meta-           questions into consideration and simultaneously verify if the
data is created to support some specific function is forgotten   languages meet the requirements discussed in section 4.
when summarizing the requirements for a metadata schema.
For the metadata creator it should be clear beforehand for
what purpose the metadata will be used and what benefits
                                                                 5.1      Ontology Languages
                                                                 A number of logical languages have been used for the de-
he gains from it [34], i.e., using this part of the metadata
                                                                 scription of different kinds of knowledge (i.e. ontologies and
scheme enhances retrieval, raises social attention or helps
                                                                 rules) on the Semantic Web: First Order Logic, Description
you protect your assets.
                                                                 Logics, Logic Programming and Frame-based Logics. Each
This in turn also applies to the consumer of the metadata:
                                                                 of these allows the description of different statements, and
functional descriptions of what type of information can be
                                                                 each implies different complexity results for certain reasoning
inferred from the attached metadata or what type of ac-
                                                                 tasks with these languages.
tions can be performed on the content are essential: this is
especially true for information that is obfuscated prior to a
                                                                 In this section we want to introduce two of the most promis-
possible negotiation phase of the content.
                                                                 ing ontology language families, i.e., the OWL- and the WSML-
                                                                 family of languages. The OWL family of languages is a
                                                                 standardisation effort of the W3C and the WSML family of
Engineering Support. The presence of metadata is a pre-          languages is an effort of the WSMO working group, whereas
requisite to make multimedia assets accessible, and deploy-      WSML is a formal language for the description of ontologies
able on the Semantic Web, hence to enable their automated        and Semantic Web Services. Other ontology languages like
processing. From a developer’s perspective, there must be
                                                                 10
tools and standards enabling an integrated authoring, test-           as for example http://www.topbraidcomposer.com/
                                                                 11
                                                                      http://phoebus.cs.man.ac.uk:9999/OWL/Validator
F-Logic [30], OIL [7] or DAML+OIL12 were not taken into          semantics; from a theoretical perspective, RIF Core corre-
consideration because of their lack of support for recent W3C    sponds to the language of definite Horn rules. As standardi-
recommendations like RDF.                                        sation is still in its infancy, we will not go further into detail
                                                                 regarding rules, but one has to note that the careful inte-
 5.1.1    Web Ontology Language (OWL) Family                     gration of ontology languages is an issue to be addressed;
The Web Ontology Language (OWL) family was designed              for example the usage of DL concepts in a rule has to be
in a W3C standardisation process because of the need for         well-defined.
an ontology language that can be used to formally describe
the meaning of terminology used in Web documents, thus,          5.3      Comparing Formal Descriptions Regard-
making it easier for machines to automatically process and
integrate information available on the Web. This language                 ing the Requirements
should be layered on top of XML and RDF (W3C’s Resource          In the following a high-level comparison of formal descrip-
Description Framework13 ) in order to build on XML’s ability     tion paradigms for multimedia assets is performed. We chose
to define customized tagging schemes and RDF’s approach          OWL+RIF on the one side, and WSML/OWL-Flight on the
to representing data.                                            other to achieve a reasonably realistic scenario; the result can
                                                                 be found in Table 117 : The table indicates for which re-
Currently OWL 1.114 is under development; it extends OWL         quirement an ontology language (resp. OWL / WSML) can
DL in several ways: the underlying DL now is SROIQ,              be utilised to overcome the identified shortcomings of tradi-
which provides increased expressive power with respect to        tional approaches and thus fulfill the requirements stated in
properties and cardinality restrictions. Further, OWL 1.1        4.
has user-defined datatypes and restrictions involving datatype
predicates, and a weak form of meta-modelling known as                Requirement                OWL 1.1 + RIF    WSML-/OWL-Flight
punning.

The usage of rules in combination with DL has been inves-             Formal Description         ++               ++
tigated for some time [14, 21]—in the Semantic Web stack,             Layering of Descriptions   +                +
it is expected that a rule language will complement the on-           Interoperability           ++               +
tology layer.                                                         Granularity                -                -
                                                                      Trust & IPR issues         -                -
                                                                      Functional Descriptions    -                *
 5.1.2    The WSML family of languages                                Engineering Support        ++               +
The activities of the WSMO Working group15 have yielded               Datatype Support           +                ++
proposals of new ontology languages, namely WSML (WSML-
Core, WSML-DL, WSML-Flight, WSML-Rule, WSML-Full),               Table 1: Comparison of Formal Descriptions for Me-
OWL- (”OWL minus”) [8] and OWL Flight [10]. In [16]              dia Assets.
unique key features of WSML in comparison of other lan-
guage proposals are presented. Compared to OWL key fea-
tures include (1) WSML offers one syntactic framework for
a set of layered languages, and (2) it separates between con-    In the following, we elaborate in detail on each of the items
ceptual and logical modelling. An overview of the different      in Table 1, and argue therefore our findings regarding the
variants of the WSML framework can be found in [32]. One         comparison of OWL 1.1 + RIF vs. WSML/OWL-Flight.
has to note that WSML-Flight incroporates a rule langage
while still allowing efficint decidable reasoning and WSML-
Rule allows unsafe rules. The relation of WSML to OWL is          5.3.1     Formal Description
discussed in [9].                                                Both OWL and WSML provide a framework for the formal
                                                                 (machine-processable) description of ontologies. An ontol-
5.2      Rules                                                   ogy in WSML consists of the elements concept, relation, in-
Due to well-known limitation of the expressive power of the      stance, relationInstance and axiom. The primary elements
Description Logics language family [25, 26], the need for a      of an OWL ontology concern classes and their instances,
richer set of descriptions w.r.t. properties emerges. As rule    properties, and relationships between these instances. The
systems are widely deployed, the harmonisation efforts have      formality of the descriptions is based on logics that allow
not been successful so far. A relatively new W3C initia-         machines to reason on the information. Whereas OWL is
tive, the Rule Interchange Format Working Group, is now          based on Description Logics, the WSML family members
after defining a core rule language for exchanging rules. This   are based on different logic languages (ie. Description Log-
Rule Interchange Format Core16 (RIF Core) language aims          ics, Logic Programming or First Order Logic).
at achieving maximum interoperability while preserving rule
                                                                 Despite the fact, that OWL is more widely adopted and
12
   DAML+OIL        Reference      Description,            see:   used we believe that WSML with its layered framework is
 http://www.daml.org/2001/03/reference                           conceptually superior to OWL. A major difference between
13
   http://www.w3.org/TR/rdfprimer/                               ontology modeling in WSML and ontology modeling in OWL
14
   http://owl1_1.cs.manchester.ac.uk/owl_
 specification.html                                              17
                                                                  ++ . . . good support, + . . . available , - . . . not supported,
15
   http://www.wsmo.org                                           * . . . supported because of WSML’s constructs for the de-
16
   http://www.w3.org/TR/rif-core/                                scription of Semantic Web Services
is that WSML separates conceptual modelling for non-expert users from logical
modelling for expert users, as it (unlike OWL) uses an epistemology which
abstracts from the underlying logical language, making the surface syntax more
readable. Even if an application later requires OWL, one is able to use WSML
tools to convert ontologies that reside in popular logic/language fragments
automatically into equivalent OWL ontologies. Furthermore, the WSML family
framework enables one to choose exactly the language with the needed
expressiveness, and later allows an easy switch to another family member as a
consequence of its common grounding. WSML-Rule and WSML-Flight also include
rule support. Thus, unlike with OWL, no additional rule language is needed.

5.3.2      Layering of Descriptions
An array of existing multimedia metadata (M3) formats has been used for years
in diverse application areas. However, when one aims at using these formats
(such as MPEG-7, ID3, etc.) in the context of the Semantic Web, the options are
limited. Hence, to enable an efficient layering of RDF-based vocabularies on
top of existing multimedia metadata, one may use hybrid techniques.

As a result of our work in the media semantics area, we recently proposed the
RDFa-deployed Multimedia Metadata (ramm.x) specification [22]. ramm.x is a
light-weight framework allowing existing multimedia metadata to hook into the
Semantic Web using RDFa [1]. Ontologies based on WSML and OWL are typically
used in ramm.x to formalise an M3 format; this is especially important due to
their interoperability features (see 5.3.3).

A different but equally Web-compatible approach is described in [31]. There,
the authors propose the concept of semantic documents; semantic documents
include any information regarding the document and its relationships to other
documents. The concept is realised by including XMP descriptions in PDF
documents which can be rendered in any browser with available plugins. XMP is a
format for embedding metadata in documents using RDF.

5.3.3      Interoperability
To adhere to the architecture of the WWW, OWL uses (1) URIs for naming and (2)
RDF to provide extensible descriptions. (3) OWL builds on RDF and RDF Schema
and adds additional vocabulary for describing properties and classes. (4) The
datatype support for OWL is grounded on XML Schema.

WSML has a number of features which allow it to be integrated seamlessly into
the Web: (1) WSML uses IRIs (footnote 18) [15] for the identification of
resources. (2) WSML adopts the namespace mechanism of XML, and WSML and XML
Schema datatypes are compatible. (3) WSML has an XML- and RDF-based syntax for
exchange over the Web.
To achieve compatibility between WSML and OWL, a set of translators between OWL
and WSML has been defined [11, 12].

18 IRIs are the successors of URIs

5.3.4      Granularity
As stated above, by granularity we mean support for defining various spatial,
temporal, and conceptual relationships regarding annotations. In this sense,
OWL and WSML meet the minimal requirements, but do not explicitly address this
issue. Depending on the granularity, scalability and performance issues
obviously arise. In this respect, again, OWL and WSML can be regarded as
comparable.

5.3.5      Trust and IPR
In an interdependent, interconnected environment such as the Semantic Web, two
important aspects immediately arise: data provenance and trust [5].
Requirements regarding trust issues gathered from [37, 18] include costs and
benefits w.r.t. implementation, technology-driven vs. social networking, etc.

Neither WSML nor OWL has explicit provisions for handling trust or IPR issues.

5.3.6      Functional Descriptions
Neither OWL nor the part of WSML used for describing ontologies supports this
kind of description.
However, WSML is a language for the specification of ontologies and different
aspects of Web services. As such, it provides means not only for modelling and
describing ontologies but also for functional (service) descriptions, i.e. the
description of a service capability by means of preconditions, assumptions,
postconditions and effects [29].

5.3.7      Engineering Support
Tool support for WSML, and especially for OWL, is constantly growing. However,
the number of tools available for OWL [48] far exceeds that for WSML [13]. As
OWL is a W3C Recommendation, the support for it is very broad.

5.3.8      Data Type Support
Note that both OWL and WSML ground their datatype support on XML Schema. In
WSML, XML Schema primitive datatypes, simple types and XML Schema derived
datatypes are supported [39]; OWL adopts the RDF(S) specification of datatypes
[38], though some XML Schema built-ins are problematic.

6.     CONCLUSIONS & OUTLOOK
The first question we left open is “What are real-world multimedia assets?”
Real-world multimedia assets are multimedia objects which can currently be
found embedded in HTML pages on the Web, such as images, videos, etc. We see
three main reasons why media assets fail to enter the Semantic Web:

     1. There is a lack of a critical mass of annotated content, mainly because
        large-scale automation of (semantic) visual analysis has not yet
        advanced far enough. This is why the user remains central to the
        process, providing manual annotations. Motivating users to attach
        complex annotations to content is not easy.
     2. Current traditional and Web 2.0 based approaches to multimedia
        annotation are not sufficient to achieve the goals of the Semantic
        Web. The most important aspects that the Semantic Web intends to solve
        are (i) Annotation (i.e., how to associate metadata with a resource),
        (ii) Information Integration (i.e., how to integrate information about
        resources), and (iii) Inference (i.e., reasoning over known facts to
        uncover hidden facts).
        Existing multimedia metadata standards such as MPEG-7 can be used to
        annotate, but leave a certain amount of ambiguity in these
        annotations. Being a standard, MPEG-7 allows easy integration
        (provided that everyone adheres to the standard!), but inference is
        not possible with the information attachable to an MPEG-7 file. The
        problems with tagging are manifold: there are open issues such as
        consistency among tags, reconciliation of tags, and how to associate
        tags with parts of the tagged content. This large amount of
        uncertainty allows neither reliable information integration nor
        reasoning over the tags.

     3. As we argued in this paper, more requirements have to be fulfilled
        which cannot be met by traditional or Web 2.0 based approaches alone
        and which make more formalised descriptions of content necessary.
        However, as long as these descriptions cannot be attached directly to
        the media being described, multimedia assets will not be able to enter
        the Semantic Web.

Regarding the deployment of M3 formats on the Semantic Web, we recently
proposed to use ramm.x in the Cultural Heritage domain [23].

Acknowledgements
The research leading to this paper was partially supported by the European
Commission under contract FP6-027026, “Knowledge Space of semantic inference
for automatic annotation and retrieval of multimedia content - K-Space” and
SALERO (contract number FP6-027122).

7.     REFERENCES
 [1] B. Adida and M. Birbeck. RDFa Primer 1.0 - Embedding RDF in XHTML. W3C
     Working Draft, W3C RDF in XHTML Taskforce, 2007.
 [2] R. Arndt, R. Troncy, S. Staab, L. Hardman, and M. Vacura. COMM: Designing
     a Well-Founded Multimedia Ontology for the Web. In Proceedings of the 6th
     International Semantic Web Conference (ISWC’2007), Busan, Korea,
     November 11-15, 2007, (forthcoming), 2007.
 [3] T. Athanasiadis, V. Tzouvaras, K. Petridis, F. Precioso, Y. Avrithis, and
     Y. Kompatsiaris. Using a Multimedia Ontology Infrastructure for Semantic
     Annotation of Multimedia Content. In Proc. of 5th International Workshop
     on Knowledge Markup and Semantic Annotation (SemAnnot ’05), Galway,
     Ireland, November 2005, 2005.
 [4] W. Bailer, P. Schallauer, M. Hausenblas, and G. Thallinger. MPEG-7 Based
     Description Infrastructure for an Audiovisual Content Analysis and
     Retrieval System. In Proceedings of SPIE - Storage and Retrieval Methods
     and Applications for Multimedia, pages 284–295, San Jose, California,
     USA, 2005.
 [5] C. Bizer and R. Oldakowski. Using context- and content-based trust
     policies on the Semantic Web. In Proceedings of the 13th international
     World Wide Web conference on Alternate track papers & posters, pages
     228–229. ACM Press, 2004.
 [6] T. Bürger and R. Westenthaler. Mind the gap - requirements for the
     combination of content and knowledge. In Poster Proceedings of the SAMT
     2006 Conference, Athens, Greece, 2006.
 [7] D. Fensel, F. van Harmelen, I. Horrocks, D. L. McGuinness, and
     P. F. Patel-Schneider. OIL: An ontology infrastructure for the semantic
     web. IEEE Intelligent Systems and their applications, 16(2):38–44, 2001.
 [8] J. de Bruijn and A. Polleres (eds.). OWL-. WSML Deliverable D20.1v0.2
     WSML Working Draft 05-15-2005, http://www.wsmo.org/TR/d20/d20.1/v0.2/,
     2005.
 [9] J. de Bruijn, H. Lausen, A. Polleres, and D. Fensel. The Web Service
     Modeling Language WSML: An Overview. In ESWC, pages 590–604, 2006.
[10] J. de Bruijn (ed.). OWL Flight. D20.3v0.1 OWL Flight WSML Working Draft
     23-08-2004, http://www.wsmo.org/2004/d20/d20.3/v0.1/, 2004.
[11] DERI. OWL - WSML Translator v1.0.
     http://tools.deri.org/wsml/owl2wsml-translator/v0.1/, 2007.
[12] DERI. WSML - OWL Translator v1.0.
     http://tools.deri.org/wsml/wsml2owl-translator/v0.1/, 2007.
[13] DERI. WSML Tools. http://tools.deri.org/wsml/, 2007.
[14] F. M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf. AL-log: Integrating
     Datalog and Description Logics. Journal of Intelligent Information
     Systems, 10(3):227–252, 1998.
[15] M. Duerst and M. Suignard. Internationalized Resource Identifiers (IRIs).
     IETF RFC 3987, 2005. http://www.ietf.org/rfc/rfc3987.txt.
[16] D. Fensel, H. Lausen, A. Polleres, J. de Bruijn, M. Stollberg, D. Roman,
     and J. Domingue. Enabling Semantic Web Services: The Web Service Modeling
     Ontology. Springer, November 2006.
[17] J. Geurts, J. van Ossenbruggen, and L. Hardman. Requirements for
     practical multimedia annotation. In Proceedings of the Workshop on
     Multimedia and the Semantic Web, May 2005, Heraklion, Crete, pages 4–1,
     2005.
[18] J. Golbeck, B. Parsia, and J. Hendler. Trust Networks on the Semantic
     Web. In Proceedings of Cooperative Intelligent Agents 2003, 2003.
[19] N. M. Goldman. Ontology-Oriented Programming: Static Typing for the
     Inconsistent Programmer. In Proceedings of the Second International
     Semantic Web Conference - ISWC 2003, pages 850–865, 2003.
[20] Gomez-Perez, Fernandez-Lopez, and Corcho-Garcia. Ontological Engineering.
     Springer, Berlin, 2004.
[21] B. Grosof, I. Horrocks, R. Volz, and S. Decker. Description Logic
     Programs: Combining Logic Programs with Description Logics. In 12th
     International World Wide Web Conference
     (WWW’03), Budapest, Hungary, 2003.
[22] M. Hausenblas, W. Bailer, and T. Bürger. Deploying Multimedia Metadata on
     the Semantic Web - RDFa-deployed Multimedia Metadata (ramm.x).
     Specification, ramm.x Working Group, 2007.
[23] M. Hausenblas, W. Bailer, and H. Mayer. Deploying Multimedia Metadata in
     Cultural Heritage on the Semantic Web. In First International Workshop on
     Cultural Heritage on the Semantic Web, collocated with the 6th
     International Semantic Web Conference (ISWC07), Busan, South Korea, 2007.
[24] M. Hausenblas, S. Boll, T. Bürger, O. Celma, C. Halaschek-Wiener,
     E. Mannens, and R. Troncy. Multimedia Vocabularies on the Semantic Web.
     W3C Incubator Group Report, W3C Multimedia Semantics Incubator Group,
     2007.
[25] I. Horrocks, P. F. Patel-Schneider, S. Bechhofer, and D. Tsarkov. OWL
     rules: A proposal and prototype implementation. Journal of Web Semantics,
     3(1):23–40, 2005.
[26] I. Horrocks, P. F. Patel-Schneider, and F. van Harmelen. From SHIQ and
     RDF to OWL: The making of a web ontology language. Journal of Web
     Semantics, 1(1):7–26, 2003.
[27] J. Hunter. Adding Multimedia to the Semantic Web - Building an MPEG-7
     Ontology. In First International Semantic Web Working Symposium
     (SWWS’01), Stanford, California, USA, 2001.
[28] I. Jacobs and N. Walsh. Architecture of the World Wide Web, Volume One.
     http://www.w3.org/TR/webarch/, 2004.
[29] U. Keller, H. Lausen, and M. Stollberg. On the Semantics of Functional
     Descriptions of Web Services. In The Semantic Web: Research and
     Applications (Proceedings of ESWC 2006), pages 605–619, 2006.
[30] M. Kifer and G. Lausen. F-logic: a higher-order language for reasoning
     about objects, inheritance, and scheme. In J. Clifford, B. Lindsay, and
     D. Maier, editors, Proceedings of the 1989 ACM SIGMOD international
     Conference on Management of Data, pages 134–146, New York, NY, 1989. ACM
     Press.
[31] H. Kim, H. Kim, J. H. Choi, and S. Decker. Translating Documents into
     Semantic Documents using Semantic Web and Web 2.0. In Proceedings of the
     1st Semantic Authoring and Annotation Workshop (SAAW2006), 2006.
[32] H. Lausen, J. de Bruijn, A. Polleres, and D. Fensel. WSML - a Language
     Framework for Semantic Web Services. In Proceedings of the W3C Workshop
     on Rule Languages for Interoperability, 2005.
[33] J. Markoff. Web content by and for the masses. New York Times Online,
     June 2005.
[34] M. Ames and M. Naaman. Why we tag: motivations for annotation in mobile
     and online media. In CHI ’07: Proceedings of the SIGCHI conference on
     Human factors in computing systems, pages 971–980, New York, NY, USA,
     2007. ACM Press.
[35] MPEG-7. Multimedia Content Description Interface. Standard No. ISO/IEC
     15938, 2001.
[36] F. Nack, J. van Ossenbruggen, and L. Hardman. That Obscure Object of
     Desire: Multimedia Metadata on the Web (Part II). IEEE Multimedia, 12(1),
     2005.
[37] K. O’Hara, H. Alani, Y. Kalfoglou, and N. Shadbolt. Trust Strategies for
     the Semantic Web. In ISWC Workshop on Trust, Security, and Reputation on
     the Semantic Web, 2004.
[38] J. Z. Pan. Description Logics: Reasoning Support for the Semantic Web.
     PhD thesis, School of Computer Science, The University of Manchester,
     2004.
[39] D. Roman, H. Lausen, and U. Keller. Web Service Modeling Ontology (WSMO),
     WSMO Deliverable D2v1.0, WSMO Working Draft, 20 September 2004.
[40] L. Simons. RDF at the Venice Project.
     http://www.leosimons.com/2006/rdf-at-the-venice-project.html, 2006. Blog
     Post.
[41] G. Stamou, J. van Ossenbruggen, J. Z. Pan, and G. Schreiber. Multimedia
     annotations on the semantic web. IEEE MultiMedia, 13(1):86–90, 2006.
[42] R. Troncy. Integrating Structure and Semantics into Audio-visual
     Documents. In Proceedings of the 2nd International Semantic Web
     Conference (ISWC’03), volume LNCS 2870, pages 566–581, 2003.
[43] R. Troncy, W. Bailer, M. Hausenblas, P. Hofmair, and R. Schlatte.
     Enabling Multimedia Metadata Interoperability by Defining Formal
     Semantics of MPEG-7 Profiles. In 1st International Conference on
     Semantics And digital Media Technology (SAMT’06), pages 41–55, Athens,
     Greece, 2006.
[44] C. Tsinaraki, P. Polydoros, and S. Christodoulakis. Integration of OWL
     ontologies in MPEG-7 and TVAnytime compliant Semantic Indexing. In
     Proceedings of the 16th International Conference on Advanced Information
     Systems Engineering (CAiSE), 2004.
[45] V. Tzouvaras (ed.). Multimedia Annotation Interoperability Framework;
     MMSEM XG Report.
     http://www.w3.org/2005/Incubator/mmsem/wiki/Semantic_Interoperability,
     2007.
[46] J. van Ossenbruggen, F. Nack, and L. Hardman. That Obscure Object of
     Desire: Multimedia Metadata on the Web (Part I). IEEE Multimedia, 11(4),
     2004.
[47] W3C. Multimedia Semantics Incubator Group.
     http://www.w3.org/2005/Incubator/mmsem/, 2007.
[48] W3C. Semantic Web Development Tools.
     http://esw.w3.org/topic/SemanticWebTools, 2007.
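
The three aspects named in the conclusions (annotation, information
integration and inference) can be made concrete with a small sketch. It
represents annotations as subject-predicate-object triples named by URIs,
merges triples from two sources by set union, and derives additional type
statements from a subclass hierarchy. All URIs, class names and function names
are hypothetical and chosen for illustration only; the sketch uses plain Python
sets rather than an OWL or WSML reasoner, and it covers only subclass-based
type inference.

```python
"""Toy illustration of annotation, integration and inference over triples.

All URIs and names below are hypothetical; this is not an OWL/WSML reasoner.
"""

TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# (i) Annotation: metadata attached to resources that are named by URIs.
source_a = {
    ("http://example.org/media/clip1", TYPE, "ex:GoalScene"),
}
source_b = {
    ("http://example.org/media/clip2", TYPE, "ex:Interview"),
    ("ex:GoalScene", SUBCLASS_OF, "ex:SportsScene"),
    ("ex:SportsScene", SUBCLASS_OF, "ex:VideoSegment"),
    ("ex:Interview", SUBCLASS_OF, "ex:VideoSegment"),
}


def integrate(*sources):
    """(ii) Integration: shared identifiers let triples be merged by union."""
    merged = set()
    for triples in sources:
        merged |= triples
    return merged


def infer_types(triples):
    """(iii) Inference: add rdf:type triples implied by the subclass hierarchy.

    Repeats until no new triple is derived (a simple fixpoint computation).
    """
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in closed if p == SUBCLASS_OF]
        for s, p, o in list(closed):
            if p != TYPE:
                continue
            for sub, sup in subclass_pairs:
                derived = (s, TYPE, sup)
                if o == sub and derived not in closed:
                    closed.add(derived)
                    changed = True
    return closed


def instances_of(triples, cls):
    """Return the sorted subjects typed (directly or by inference) as cls."""
    return sorted(s for s, p, o in triples if p == TYPE and o == cls)


if __name__ == "__main__":
    graph = infer_types(integrate(source_a, source_b))
    print(instances_of(graph, "ex:VideoSegment"))
```

In this sketch, querying the merged but uninferred triples for
ex:VideoSegment returns nothing, while the inferred graph returns both clips.
Free-text tags carry no such subclass statements, which is one way to read the
gap between tagging and formal descriptions discussed in Section 6.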