<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Ontology Framework For Knowledge-Assisted Semantic Video Analysis and Annotation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S. Dasiopoulou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V. K. Papastathis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V. Mezaris</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I. Kompatsiaris</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. G. Strintzis</string-name>
          <email>strintzi@iti.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Informatics and Telematics Institute (ITI)/ Centre for Research and Technology Hellas (CERTH)</institution>
          ,
          <addr-line>1st Km Thermi-Panorama Rd, Thessaloniki 57001</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Information Processing Laboratory, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki</institution>
          ,
          <addr-line>Thessaloniki 54124</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
<p>An approach for knowledge-assisted semantic analysis and annotation of video content, based on an ontology infrastructure, is presented. Semantic concepts in the context of the examined domain are defined in an ontology, enriched with qualitative attributes of the semantic objects (e.g. color homogeneity), multimedia processing methods (respectively, color clustering), and numerical data or low-level features generated via training (e.g. color models, also defined in the ontology). Semantic Web technologies are used for knowledge representation in the RDF/RDFS language. Rules in F-logic are defined to describe how tools for multimedia analysis should be applied according to different object attributes and low-level features, aiming at the detection of video objects corresponding to the semantic concepts defined in the ontology. This supports flexible and managed execution of various application- and domain-independent multimedia analysis tasks. This ontology-based approach provides the means of generating semantic metadata; as a consequence, Semantic Web services and applications have a greater chance of discovering and exploiting the information and knowledge in multimedia data. The proposed approach is demonstrated in the Formula One and Football domains and shows promising results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        As a result of recent progress in hardware and telecommunication technologies,
multimedia has become a major source of content on the World Wide Web, used
in a wide range of applications in areas such as content production and
distribution, telemedicine, digital libraries, distance learning, tourism, distributed
CAD/CAM, GIS, etc. The usefulness of all these applications is largely
determined by their accessibility and portability; as such, multimedia data sets
present a great challenge in terms of storing, querying, indexing and retrieval. In
addition, the rapid increase of the available amount of multimedia information
has revealed an urgent need for developing intelligent methods for
understanding and managing the conveyed information. To face such challenges,
developing faster hardware or more sophisticated algorithms is no longer sufficient.
Rather, a deeper understanding of the information at the semantic level is
required [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This results in a growing demand for efficient methods for extracting
semantic information from such content, since this is the key enabling factor for
the management and exploitation of multimedia content.
      </p>
      <p>
        Although new multimedia standards, such as MPEG-4 and MPEG-7 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
provide the needed functionalities in order to manipulate and transmit objects and
metadata, their extraction, most importantly at a semantic level, is outside
the scope of these standards and is left to the content developer. Extraction
of features and object recognition are important phases in developing general
purpose multimedia database management systems [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Significant results have
been reported in the literature over the last two decades, with successful
implementation of several prototypes [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, the lack of precise models and
formats for object and system representation and the high complexity of
multimedia processing algorithms make the development of fully automatic semantic
multimedia analysis and management systems a challenging task.
      </p>
      <p>
This is due to the difficulty, often referred to as the semantic gap, in
capturing concepts mapped into a set of image and/or spatiotemporal features that
can be automatically extracted from video data without human intervention
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The use of domain knowledge is probably the only way by which higher
level semantics can be incorporated into techniques that capture the semantics
through automatic parsing. Such techniques are turning to knowledge
management approaches, including Semantic Web technologies, to solve this problem [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
A priori knowledge representation models are used as a knowledge base that
assists semantic-based classification and clustering [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] automatic
associations between media content and formal conceptualizations are performed
based on the similarity of visual features extracted from a set of pre-annotated
media objects and the examined media objects. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], semantic entities, in the
context of the MPEG-7 standard, are used for knowledge-assisted video
analysis and object detection, thus allowing for semantic level indexing. In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], the
problem of bridging the gap between low-level representation and high-level
semantics is formulated as a probabilistic pattern recognition problem. In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], an
object ontology, coupled with a relevance feedback mechanism, is introduced to
facilitate the mapping of low-level to high-level features and allow the definition
of relationships between pieces of multimedia information.
      </p>
      <p>In this paper, an approach for knowledge assisted semantic content analysis
and annotation, based on a multimedia ontology infrastructure, is presented.
Content-based analysis of multimedia requires methods which will
automatically segment video sequences and key frames into image areas corresponding to
salient objects, track these objects in time, and provide a flexible framework for
object recognition, indexing, retrieval and for further analysis of their relative
motion and interactions.</p>
      <p>[Fig. 1. Overall system architecture: the Domain Knowledge Base (RDFS ontology, F-logic rules) and the Algorithm Repository feed the Main Processing Module, which transforms the input Multimedia Content into a Semantic Multimedia Content Description.]</p>
      <p>
        This problem can be viewed as relating symbolic terms
to visual information by utilizing syntactic and semantic structure in a manner
related to approaches in speech and language processing [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In the proposed
approach, semantic and low-level attributes of the objects to be detected in
combination with appropriately defined rules determine the set of algorithms and
parameters required for the objects' detection. Semantic concepts within the
context of the examined domain are defined in an ontology, enriched with qualitative
attributes of the semantic objects, multimedia processing methods, and
numerical data or low-level features generated via training. Semantic Web technologies
are used for knowledge representation in RDF/RDFS language. Processing may
then be performed by using the necessary processing tools and by relating
high-level symbolic representations to extracted features in the signal (image and
temporal feature) domain. F-logic rules are defined to describe how tools for
multimedia analysis should be applied according to different object attributes
and low-level features, aiming at the detection of video objects corresponding
to the semantic concepts defined in the ontology. The proposed approach, by
exploiting the domain knowledge modelled in the ontology, enables the
recognition of the underlying semantics of the examined video, providing a first level
semantic annotation. The general system architecture is shown in Fig. 1.
      </p>
      <p>Following this approach, the multimedia analysis and annotation process
largely depends on the knowledge base of the system and as a result the method
can easily be applied to different domains provided that the knowledge base is
enriched with the respective domain ontology. Extending the knowledge base
with spatial and temporal objects interrelations would be an important step
towards the detection of semantically important events for the particular domain,
thus achieving a finer, high-level semantic annotation. In addition, the
ontology-based approach also ensures that Semantic Web services and applications have a
greater chance of discovering and exploiting the information and knowledge in
multimedia data.</p>
<p>The remainder of the paper is organized as follows: in section 2, a detailed
description of the ontology and rules developed is given, while in section 3, its
application to the Formula One domain is described. Experimental results are
presented in section 4. Finally, conclusions are drawn in section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Multimedia Analysis Ontology Development and Rule</title>
    </sec>
    <sec id="sec-3">
      <title>Construction</title>
      <p>In order to realize the knowledge-assisted multimedia content semantic analysis
and annotation technique explained in the previous section, an analysis and
a domain ontology are constructed. The multimedia analysis ontology is used
to support the detection process of the corresponding domain specific objects.
Knowledge about the domain under discourse is also represented in the form
of an ontology, namely the domain specific ontology. The domain-independent,
primitive classes comprising the analysis ontology serve as attachment points
allowing the integration of the two ontologies. Practically, each domain ontology
comprises a specific instantiation of the multimedia analysis ontology providing
the corresponding color models, restrictions, etc., as will be demonstrated in more
detail in section 3.</p>
<p>Object detection in general involves the exploitation of objects'
characteristic features in order to apply the most appropriate detection steps for the
analysis process in the form of algorithms and numerical data generated off-line
by training (e.g. color models). Consequently, the development of the proposed
analysis ontology deals with the following concepts (RDFS classes) and their
corresponding properties, as illustrated in Fig. 2 (an illustrative RDFS sketch is given after this list):
– Class Object: the superclass of all video objects to be detected through
the analysis process. Each object instance is related to appropriate feature
instances by the hasFeature property and to one or more other objects
through a set of appropriately defined spatial properties.
– Class Feature: the superclass of multimedia low-level features associated
with each object.
– Class Feature Parameter: denotes the actual qualitative descriptions
of each corresponding feature. It is subclassed according to the defined
features, i.e. to Connectivity Feature Parameter, Homogeneity Feature
Parameter, etc.
– Class Limit: it is subclassed to Minimum and Maximum and allows the
definition of value restrictions to the various feature parameters.
– The Color Model and Color Component classes are used for the
representation of the color information, encoded in the form of the Y, Cb, Cr
components of the MPEG color space.
– Class Distribution and Distribution Parameter represent information
regarding the defined Feature Parameter models.
– Class Motion Norm: used to represent information regarding the object
motion.
– Class Algorithm: the superclass of the available processing algorithms (A1,
A2,. . . ,An) to be used during the analysis procedure. This class is linked to
the FeatureParameter class through the usesFeatureParameter property
in order to represent the potential argument list for each algorithm.
– Class Detection: used to model the detection process, which in our
framework consists of two stages. The CandidateRegionSelection involves
finding a set of regions which are potential matches for the object to be detected,
while FinalRegionSelection leads to the selection of only one region that
best matches the criteria predefined for this object (e.g. size specifications).
– Class Dependency: this concept addresses the possibility that the
detection of one object may depend on the detection of another, due to possible
spatial or temporal interrelations between the two objects. For example in
the Formula One domain, the detection of the car could be assisted and
improved if the more dominant and characteristic region of road is detected
first. In order to differentiate between the case where the detection of object
O1 requires the detection of the candidate regions of object O2 and the case
where the entire final region of object O2 is required, PartialDependency
and TotalDependency are introduced.</p>
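      <p>To make the class structure above concrete, the following is a minimal, illustrative sketch of how a fragment of such an RDFS analysis ontology could be assembled programmatically. The namespace URI, the use of the Python rdflib library and the exact identifiers are assumptions made for the example, not the authors' published schema.</p>
      <preformat preformat-type="code">
# Minimal sketch (Python + rdflib) of a fragment of the analysis ontology;
# the namespace and identifiers are hypothetical.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

AN = Namespace("http://example.org/multimedia-analysis#")  # assumed URI
g = Graph()
g.bind("an", AN)

# Core classes of the analysis ontology.
for cls in ("Object", "Feature", "FeatureParameter", "Limit",
            "Algorithm", "Detection", "DetectionPart", "Dependency"):
    g.add((AN[cls], RDF.type, RDFS.Class))

# Example subclass axioms.
g.add((AN.Homogeneity, RDFS.subClassOf, AN.Feature))
g.add((AN.CandidateRegionSelection, RDFS.subClassOf, AN.DetectionPart))
g.add((AN.FinalRegionSelection, RDFS.subClassOf, AN.DetectionPart))

# Example properties linking the classes.
for prop, domain, rng in (
    ("hasFeature", "Object", "Feature"),
    ("hasFeatureParameter", "Feature", "FeatureParameter"),
    ("usesFeatureParameter", "Algorithm", "FeatureParameter"),
    ("hasDetection", "Object", "Detection"),
    ("hasDetectionPart", "Detection", "DetectionPart"),
):
    g.add((AN[prop], RDF.type, RDF.Property))
    g.add((AN[prop], RDFS.domain, AN[domain]))
    g.add((AN[prop], RDFS.range, AN[rng]))

print(g.serialize(format="turtle"))
      </preformat>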
      <p>
        As mentioned before, the choice of algorithms employed for the detection
of each object is directly dependent on its available characteristic features. This
association is determined by a set of properly defined rules represented in F-logic.
F-logic is a language that enables both ontology representation and reasoning
about concepts, relations and instances [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ].
      </p>
      <p>The rules required for the presented approach are: rules to define the mapping
between algorithms and features (which implicitly define the object detection
steps), rules to determine algorithms' input parameters, if any, and rules to deal
with object interdependencies as explained above. The rules defined for each
category have the following form (an illustrative sketch of their application is given at the end of this section):
– “IF an object O has features F1 F2 . . . Fn as part of its qualitative
description THEN algorithm A1 is a step for the detection of O.”
– “IF an object O has feature F AND O has algorithm A as detection step</p>
      <p>AND A uses feature F THEN A has as input the parameter values of F .”
– “IF an object O1 has partial dependency on object O2 AND object O2
has as CandidateRegionSelection part the set S = {A1, A2, . . . , Am}
THEN execute the set of algorithms included in S before proceeding with
the detection of O1.”
– “IF an object O1 is totally dependent on object O2 THEN execute all
detection steps for O2 before proceeding with the execution of O1 detection.”
</p>
      <p>In order for the described multimedia analysis ontology to be applied, a
domain specific ontology is needed. This ontology provides the vocabulary and
background knowledge of the domain i.e. the semantically significant concepts
and the properties among them. In the context of video understanding it maps
to the important objects, their qualitative and quantitative attributes and their
interrelations.</p>
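      <p>As a purely illustrative companion to the rule categories above, the following Python sketch mimics their effect procedurally: a feature-to-algorithm table stands in for the first category, a parameter-passing helper for the second, and a dependency-ordering routine for the third and fourth. All names and the representation of objects as dictionaries are assumptions made for the example; the actual system evaluates F-logic rules over the ontology.</p>
      <preformat preformat-type="code">
# Illustrative stand-in for the F-logic rule categories; not the authors' rule base.

# Rule category 1: feature combinations imply detection algorithms (assumed names).
FEATURE_TO_ALGORITHMS = {
    frozenset({"color_homogeneity", "connectivity"}):
        ["k_means", "component_labelling", "emd_selection"],
    frozenset({"motion_homogeneity", "connectivity"}):
        ["motion_segmentation", "component_labelling"],
}

# Rule category 2: which feature parameters each algorithm consumes (assumed).
ALGORITHM_USES = {"emd_selection": ["color_homogeneity"]}

def detection_steps(obj):
    """Rule 1: pick algorithms whose feature conditions the object satisfies."""
    steps = []
    for features, algorithms in FEATURE_TO_ALGORITHMS.items():
        if features.issubset(obj["features"]):
            steps.extend(a for a in algorithms if a not in steps)
    return steps

def algorithm_inputs(obj, algorithm):
    """Rule 2: hand an algorithm the parameter values of the features it uses."""
    return {f: obj["features"][f]
            for f in ALGORITHM_USES.get(algorithm, []) if f in obj["features"]}

def processing_order(objects):
    """Rules 3 and 4: objects that an object depends on are processed first."""
    ordered, seen = [], set()
    def visit(name):
        if name not in seen:
            seen.add(name)
            for dep in objects[name].get("depends_on", []):
                visit(dep)
            ordered.append(name)
    for name in objects:
        visit(name)
    return ordered
      </preformat>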
    </sec>
    <sec id="sec-4">
      <title>Domain Knowledge Ontology</title>
      <p>As previously mentioned, for the demonstration of the proposed approach the
Formula One and Football domains were used. The detection of semantically
significant objects, such as the road area and the cars in racing video for example,
is an important step towards understanding and extracting the semantics of a
temporal segment of the video by efficiently modelling the events captured in
it. The set of features associated with each object comprises their definitions in
terms of low-level features as used in the context of video analysis. The selection
of the attributes to be included is based on their ability to act as distinctive
features for the analysis to follow, i.e. the differences in their definitions indicate
the different processing methods that should be employed for their identification.
As a consequence, the definitions used for the Formula One domain are as follows (an illustrative encoding is sketched after this list):
– Car: a motion homogeneous (i.e. comprising elementary parts characterized
by similar motion), fully connected region whose motion norm must be above
a minimum value and whose size cannot exceed a predefined maximum
value.
– Road: a color homogeneous, fully connected region, whose size has to exceed
a predefined minimum value and additionally to be the largest such region
in the video.
– Grass: a color homogeneous, partly connected region with the requirement
that each of its components has a minimum predefined size.
– Sand: a color homogeneous, partly connected region with the requirement
that each of its components has a size exceeding a predefined minimum.</p>
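      <p>The sketch below shows one possible, purely illustrative encoding of the four Formula One object definitions as plain data that a detection scheduler could consume. The field names and all numeric thresholds (given here as fractions of the frame area and an arbitrary motion-norm unit) are invented placeholders; the actual values live in the trained domain ontology.</p>
      <preformat preformat-type="code">
# Hypothetical encoding of the Formula One object definitions; thresholds are placeholders.
FORMULA_ONE_OBJECTS = {
    "car": {
        "homogeneity": "motion",
        "connectivity": "full",
        "constraints": {"min_motion_norm": 1.5, "max_size": 0.10},
    },
    "road": {
        "homogeneity": "color",
        "connectivity": "full",
        "constraints": {"min_size": 0.15, "largest_region": True},
    },
    "grass": {
        "homogeneity": "color",
        "connectivity": "partial",
        "constraints": {"min_component_size": 0.01},
    },
    "sand": {
        "homogeneity": "color",
        "connectivity": "partial",
        "constraints": {"min_component_size": 0.01},
    },
}
      </preformat>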
      <p>In a similar fashion, the corresponding definitions for the Football domain
include the concepts Player, Field and Spectators and their respective visual
descriptions. As can be seen, the developed domain ontologies focus mainly on
the representation of the object attributes and positional relations and, in their
current version, do not include event definitions. For the same object, multiple
instances of the Color Model class are supported, since the use of more than
one color model for a single object may be advantageous in some cases.</p>
      <sec id="sec-4-1">
        <title>Compressed-domain Video Processing and Rules</title>
        <p>
The proposed knowledge-based approach is applied to MPEG-2 compressed
streams. The information used by the proposed algorithms is extracted from
MPEG sequences during the decoding process. Specifically, the extracted color
information is restricted to the DC coefficients of the macroblocks of I-frames,
corresponding to the Y, Cb and Cr components of the MPEG color space.
Additionally, motion vectors are extracted for the P-frames and are used for
generating motion information for the I-frames via interpolation. P-frame motion
vectors are also necessary for the temporal tracking, in P-frames, of the objects
detected in the I-frames [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
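        <p>A rough, illustrative sketch of the kind of interpolation meant above is given below: the I-frame macroblock motion field is approximated from the motion-vector fields of the neighbouring P-frames. The function name, array layout and the simple averaging are assumptions made for the example; the actual compressed-domain procedure follows [17].</p>
        <preformat preformat-type="code">
import numpy as np

def interpolate_iframe_motion(prev_p_motion, next_p_motion):
    """Approximate macroblock motion vectors for an I-frame by averaging the
    fields of the surrounding P-frames (arrays of shape rows x cols x 2).
    Simplified stand-in for the interpolation used in the compressed-domain
    analysis; real implementations also account for frame distances."""
    return 0.5 * (np.asarray(prev_p_motion, float) + np.asarray(next_p_motion, float))
        </preformat>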
      <p>The procedure for detecting the desired objects starts by performing a set of
initial clusterings, using up to eight dominant colors in each frame to initialize a
K-means algorithm. From the resulting mask, which contains a number of
nonconnected color-homogeneous regions, the non-connected semantic objects can
be identified by color-model based selection. The application of a four
connectivity component labelling algorithm results in a new mask featuring connected
color-homogeneous components. The color-model-based selection of an area
corresponding to a color-homogeneous semantic object is performed using a suitable
mask and the Earth Mover's Distance (EMD). EMD computes the distance
between two distributions represented as signatures and is defined as the minimum
amount of work needed to change one signature into the other. Additional
requirements, as imposed by the models represented in the ontology, are checked to
lead to the desired object detection. For motion-homogeneous objects, a similar
process is followed. At first, a mask containing motion-homogeneous regions is
generated. Subsequently, the model-based selection depends on the information
contained in the ontology (e.g. size restrictions, motion requirements).</p>
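        <p>The following sketch strings the steps just described into a single routine for a colour-homogeneous object: K-means clustering of the macroblock-level Y/Cb/Cr values, four-connectivity component labelling, and a model-based final selection. It is an illustration under stated assumptions, not the system's implementation: a simple per-channel distance to the Gaussian colour model replaces the Earth Mover's Distance, the default K-means initialisation replaces the dominant-colour initialisation, and a largest-component criterion stands in for the ontology's size restrictions.</p>
        <preformat preformat-type="code">
import numpy as np
from scipy.ndimage import label          # default structure = 4-connectivity in 2-D
from sklearn.cluster import KMeans

def detect_color_object(dc_ycbcr, color_model, n_clusters=8, max_distance=25.0):
    """Simplified detection chain for a colour-homogeneous object.
    dc_ycbcr    : H x W x 3 array of macroblock DC values (Y, Cb, Cr).
    color_model : (means, stds) per-channel Gaussian model from training.
    Returns a boolean H x W mask of the selected region."""
    h, w, _ = dc_ycbcr.shape
    pixels = dc_ycbcr.reshape(-1, 3).astype(float)

    # 1. Initial clustering into (up to) n_clusters colour-homogeneous groups.
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pixels)

    # 2. Keep clusters whose mean colour is close to the object's colour model
    #    (a crude substitute for the EMD-based signature comparison).
    means = np.asarray(color_model[0], float)
    candidate = np.zeros(h * w, dtype=bool)
    for c in range(n_clusters):
        member = clusters == c
        if member.any():
            dist = np.abs(pixels[member].mean(axis=0) - means).mean()
            if max_distance > dist:
                candidate |= member
    candidate = candidate.reshape(h, w)

    # 3. Four-connectivity component labelling of the candidate mask.
    labelled, n_regions = label(candidate)

    # 4. Final region selection: largest connected component (placeholder for
    #    the size and uniqueness restrictions stored in the ontology).
    if n_regions == 0:
        return np.zeros((h, w), dtype=bool)
    sizes = np.bincount(labelled.ravel())[1:]
    return labelled == (int(np.argmax(sizes)) + 1)
        </preformat>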
      <p>The construction of the domain specific rules derives directly from the
aforementioned video processing methodology. For example, since color clustering is
the first step for the detection of any of the three objects, a rule of the first
category without any feature-matching condition is used to add the K-means
algorithm as the first detection step to all objects. A set of different algorithms
could have been used as long as the respective instantiations are defined.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental results</title>
      <p>
        The proposed approach was tested in two different domains: the Formula One
and the Football domain. In both cases, the exploitation of the knowledge
contained in the respective system ontology and the associated rules resulted in
the application of the appropriate analysis algorithms using suitable parameter
values, for the detection of the domain specific objects. For ontology creation the
OntoEdit ontology engineering environment [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] was used, having F-logic as the
output language. A variety of MPEG-2 videos of 720 × 576 pixels were used for
testing and evaluation of the knowledge assisted semantic annotation system.
      </p>
      <p>For the Formula One domain our approach was tested on a one-hour video.
As was discussed in section 3, four objects were defined for this domain. For
those objects whose homogeneity attribute is described in the ontology by the
Color Homogeneity class, the corresponding color models were extracted from
a training set of approximately 5 minutes of manually annotated Formula One
video. Since we assume the model to be a Gaussian distribution for each one of
the three components of the color space, the color models were calculated from
the annotated regions of the training set accordingly. Results for the Formula
One domain are presented both in terms of sample segmentation masks showing
the different objects detected in the corresponding frames (Fig. 3) and in terms of
a numerical evaluation of the results over a ten-minute segment of the test set
(Table 1). For the Football domain, the proposed semantic analysis and
annotation framework was tested on a half-hour video, following a procedure similar
to the one illustrated for the Formula One domain. Segmentation masks for this
domain are shown in Fig. 4, while the numerical evaluation of the results over a
ten-minute segment of the test set for this domain is given in Table 1.</p>
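      <p>For reference, the per-channel Gaussian colour models mentioned above reduce to a mean and standard deviation of the Y, Cb and Cr values over the manually annotated regions of the training set. The sketch below shows that estimation step; the function name, data layout and the use of plain NumPy are assumptions made for illustration.</p>
      <preformat preformat-type="code">
import numpy as np

def train_color_model(frames, masks):
    """Estimate a per-channel Gaussian colour model (mean, std) from annotated
    training data.  frames: list of H x W x 3 Y/Cb/Cr arrays; masks: matching
    boolean arrays marking the annotated object's macroblocks."""
    samples = np.concatenate(
        [frame[mask].astype(float) for frame, mask in zip(frames, masks)])
    return samples.mean(axis=0), samples.std(axis=0)
      </preformat>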
      <p>For the numerical evaluation, the semantic objects appearing on each I-frame
were manually annotated and compared with the results produced by the
proposed system. It is important to note that the regions depicted in the generated
segmentation masks correspond to semantic concepts and this mapping is
defined according to the domain specific knowledge (i.e. object models) provided
in the ontology.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
<p>In this paper we have presented an ontology-based approach for
knowledge-assisted domain-specific semantic video analysis. Knowledge involves qualitative
object attributes, quantitative low-level features generated by training as well
as multimedia processing methods. The proposed approach aims at formulating
a domain specific analysis model with the additional information provided by
rules, appropriately defined to address the inherent algorithmic issues.</p>
      <p>Future work includes the enhancement of the domain ontology with more
complex model representations, including spatial and temporal relationships,
and the definition of semantically important events in the domain of discourse.
Further exploration of low-level multimedia features (e.g. use of the MPEG-7
standardized descriptors) is expected to lead to more accurate and thus efficient
representations of semantic content. The above mentioned enhancements will
allow more meaningful reasoning, thus improving the efficiency of multimedia
content understanding. Another possibility under consideration is the use of a
more expressive language, e.g. OWL, in order to capture a more realistic model
of the specific domain semantics.
</p>
      <p>[Table 1. Numerical evaluation per object: correct detections, false detections and missed detections, over the Formula One and Football test segments.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.-F.</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <article-title>The holy grail of content-based media analysis</article-title>
          .
          <source>IEEE Multimedia</source>
          ,
          <volume>9</volume>
          (
          <issue>2</issue>
          ):
          <fpage>6</fpage>
          -
          <lpage>10</lpage>
          ,
          Apr.-Jun.
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.-F.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sikora</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Puri</surname>
          </string-name>
          .
          <article-title>Overview of the MPEG-7 standard</article-title>
          .
          <source>IEEE Trans. on Circuits and Systems for Video Technology</source>
          ,
          <volume>11</volume>
          (
          <issue>6</issue>
          ):
          <fpage>688</fpage>
          -
          <lpage>695</lpage>
          ,
          <year>June 2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Yoshitaka</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ichikawa</surname>
          </string-name>
          .
          <article-title>A survey on content-based retrieval for multimedia databases</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>93</lpage>
          , Jan/Feb
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>P.</given-names>
            <surname>Salembier</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Marques</surname>
          </string-name>
          .
          <article-title>Region-Based Representations of Image and Video: Segmentation Tools for Multimedia Services</article-title>
          .
          <source>IEEE Trans. Circuits and Systems for Video Technology</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1147</fpage>
          -
          <lpage>1169</lpage>
          ,
          <year>December 1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>W.</given-names>
            <surname>Al-Khatib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.F.</given-names>
            <surname>Day</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghafoor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.B.</given-names>
            <surname>Berra</surname>
          </string-name>
          .
          <article-title>Semantic modeling and knowledge representation in multimedia databases</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>64</fpage>
          -
          <lpage>80</lpage>
          , Jan/Feb
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>S.</given-names>
            <surname>Little</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Drennan</surname>
          </string-name>
          .
          <article-title>Realizing the hydrogen economy through semantic web technologies</article-title>
          .
          <source>IEEE Intelligent Systems Journal - Special</source>
          Issue on eScience,
          <volume>19</volume>
          :
          <fpage>40</fpage>
          -
          <lpage>47</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Yoshitaka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kishida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hirakawa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ichikawa</surname>
          </string-name>
          .
          <article-title>Knowledge-assisted content-based retrieval for multimedia databases</article-title>
          .
          <source>IEEE Multimedia</source>
          ,
          <volume>1</volume>
          (
          <issue>4</issue>
          ):
          <fpage>12</fpage>
          -
          <lpage>21</lpage>
          ,
          <year>Winter 1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>V.</given-names>
            <surname>Mezaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.G.</given-names>
            <surname>Strintzis</surname>
          </string-name>
          .
          <article-title>An Ontology Approach to Objectbased Image Retrieval</article-title>
          .
          <source>In Proc. IEEE Int. Conf. on Image Processing (ICIP03)</source>
          , Barcelona, Spain, Sept.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>A.B.</given-names>
            <surname>Benitez</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.F.</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <article-title>Image Classification Using Multimedia Knowledge Networks</article-title>
          .
          <source>In Proc. IEEE Int. Conf. on Image Processing (ICIP03)</source>
          , Barcelona, Spain, Sept.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>R.</given-names>
            <surname>Tansley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Weal</surname>
          </string-name>
          .
          <article-title>Automating the linking of content and concept</article-title>
          .
          <source>In Proc. ACM Int. Multimedia Conf. and Exhibition</source>
          (ACM MM-
          <year>2000</year>
          ), Oct./Nov.
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. G. Tsechpenakis, G. Akrivas, G. Andreou, G. Stamou, and
          <string-name>
            <given-names>S.D.</given-names>
            <surname>Kollias</surname>
          </string-name>
          .
          <article-title>Knowledge-Assisted Video Analysis and Object Detection</article-title>
          .
          <source>In Proc. European Symposium on Intelligent Technologies, Hybrid Systems and their implementation on Smart Adaptive Systems (Eunite02)</source>
          , Algarve, Portugal,
          <year>September 2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>M. Ramesh</given-names>
            <surname>Naphade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.V.</given-names>
            <surname>Kozintsev</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.S.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>A factor graph framework for semantic video indexing</article-title>
          .
          <source>IEEE Trans. on Circuits and Systems for Video Technology</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ):
          <fpage>40</fpage>
          -
          <lpage>52</lpage>
          , Jan.
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mezaris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Strintzis</surname>
          </string-name>
          .
          <article-title>Multimedia content indexing and retrieval using an object ontology</article-title>
          .
          <source>Multimedia Content and Semantic Web - Methods</source>
          , Standards and Tools, Editor G.Stamou, Wiley, New York, NY,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>C.</given-names>
            <surname>Town</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Sinclair</surname>
          </string-name>
          .
          <article-title>A self-referential perceptual inference framework for video interpretation</article-title>
          .
          <source>In Proceedings of the International Conference on Vision Systems</source>
          , volume
          <volume>2626</volume>
          , pages
          <fpage>54</fpage>
          -
          <lpage>67</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>J.</given-names>
            <surname>Angele</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lausen</surname>
          </string-name>
          .
          <article-title>Ontologies in F-logic</article-title>
          .
          <source>International Handbooks on Information Systems</source>
          . Springer,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>M.</given-names>
            <surname>Kifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lausen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Logical foundations of object-oriented and framebased languages</article-title>
          .
          <source>J. ACM</source>
          ,
          <volume>42</volume>
          (
          <issue>4</issue>
          ):
          <fpage>741</fpage>
          -
          <lpage>843</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>V.</given-names>
            <surname>Mezaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.V.</given-names>
            <surname>Boulgouris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.G.</given-names>
            <surname>Strintzis</surname>
          </string-name>
          .
          <article-title>Real-time compressed-domain spatiotemporal segmentation and ontologies for video indexing and retrieval</article-title>
          .
          <source>IEEE Trans. on Circuits and Systems for Video Technology</source>
          ,
          <volume>14</volume>
          (
          <issue>5</issue>
          ):
          <fpage>606</fpage>
          -
          <lpage>621</lpage>
          , May
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sure</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Angele</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          .
          <source>OntoEdit: Guiding Ontology Development by Methodology and Inferencing</source>
          . Springer-Verlag,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>