<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Thinking of a System for Image Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giovanna Castellano</string-name>
          <email>castellano@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianluca Sforza</string-name>
          <email>gsforza@di.uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra Torsello</string-name>
          <email>torsello@di.uniba.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Bari “Aldo Moro”</institution>
          ,
          <addr-line>via Orabona 4, Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi di Bari “Aldo Moro”</institution>
          ,
          <addr-line>via Orabona 4, Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università degli Studi di Bari “Aldo Moro”</institution>
          ,
          <addr-line>via Orabona 4, Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>Increasing numbers of applications demand effective and efficient support for retrieval in large collections of digital images. The work presented here is early-stage research focusing on the integration of text-based and content-based image retrieval. The main objective is to find a valid solution to the problem of reducing the so-called semantic gap, i.e. the lack of coincidence between the visual information contained in an image and the interpretation that a user can give of it. To address the semantic gap problem, we intend to use a combination of several approaches. First, a link between low-level features and text descriptions is obtained by a semi-automatic annotation process, which makes use of shape prototypes generated by clustering. Precisely, the system indexes objects based on shape and groups them into a set of clusters, with each cluster represented by a prototype. Then, a taxonomy of objects described by both visual ontologies and textual features is attached to the prototypes, forming a visual description of a subset of the objects. The paper outlines the architecture of the system and briefly describes the algorithms underpinning the proposed approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H [Information Storage and Retrieval]</p>
      <sec id="sec-1-1">
        <title>Image retrieval</title>
        <p>Content-based image retrieval, Semantic image retrieval</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>By the end of the last century the question was no longer whether
digital image archives are technically and economically
viable, but rather how these archives could be made efficient and
informative. The attempt has been to develop intelligent
and efficient human-computer interaction systems, enabling</p>
      <sec id="sec-2-1">
        <p>the user to access vast amounts of heterogeneous image sets,
stored in different sites and archives. Additionally, the
continuously increasing number of people that need to access
such collections further dictates that more emphasis be put
on attributes such as user-friendliness and flexibility in
any multimedia content retrieval scheme.</p>
        <p>
          The very first attempts at image retrieval were based on
exploiting existing image captions to classify images
according to predetermined classes or to create a restricted
vocabulary [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Although relatively simple and computationally
efficient, this approach has several restrictions, mainly
deriving from the use of a restricted vocabulary that neither
allows for unanticipated queries nor can be extended without
re-evaluating the possible connection between each item in
the database and each new addition to the vocabulary.
Additionally, such keyword-based approaches assume either the
pre-existence of textual annotations (e.g. captions) or that
annotation using the predetermined vocabulary is performed
manually. In the latter case, inconsistency of keyword
assignments among different indexers can also hamper
performance. Recently, a methodology for computer-assisted
annotation of image collections was presented [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
        <p>
          To overcome the limitations of the keyword-based
approach, the use of the visual content has been proposed,
leading to Content-Based Image Retrieval (CBIR) approaches
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. CBIR systems utilize the visual content of images to
perform indexing and retrieval, by extracting low-level
indexing features, such as color, shape, and texture. In this
case, pre-processing of images is necessary as the basis on
which features are extracted. The pre-processing is of coarse
granularity if it involves processing of images as a whole,
whereas it is of fine granularity if it involves detection of
objects within an image [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Then, relevant images are
retrieved by comparing the low-level features of each item in
the database with those of a user-supplied sketch or, more
often, a key image that is either selected from a restricted
image set or is supplied by the user (query-by-example).
Several approaches have appeared in the literature which
perform visual querying by examples, taking into account
different facets of pictorial data to express the image
contents, such as color [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], object shape [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], texture [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], or
a combination of them [
          <xref ref-type="bibr" rid="ref18 ref20 ref8">8, 18, 20</xref>
          ]. Among these, searching by
matching shapes of image portions is one of the most natural
ways to pose a query in image databases.
        </p>
        <p>
          Though many sophisticated algorithms have been designed
to describe color, shape, and texture features, these
algorithms cannot adequately model image semantics. Indeed,
extensive experiments on CBIR show that low-level contents
often fail to describe the high-level semantic concepts in
the user's mind [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. Also, CBIR systems have limitations when
dealing with broad-content image databases [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]; indeed, in
order to start a query, the availability of an appropriate key
image is assumed; occasionally, this is not feasible,
particularly for classes of images that are underrepresented in the
database. Therefore, the performance of CBIR systems is
still far from users' expectations.
        </p>
        <p>
          Summarizing, current indexing schemes for image retrieval
employ descriptors ranging from low-level features to
higher-level semantic concepts [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. So far, significant work has
been presented on unifying keywords and visual contents in
image retrieval, and several hybrid methods exploiting both
keywords and the visual content have been proposed [
          <xref ref-type="bibr" rid="ref12 ref17 ref26">17,
12, 26</xref>
          ]. Depending on how low-level and high-level
descriptors are employed and/or combined together, different levels
of image retrieval can be achieved. According to [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], three
levels of image retrieval can be considered:
        </p>
        <p>Level 1: Low-level features such as color, texture, shape
or the spatial location of image elements are exploited
in the retrieval process. At this level, the system
supports queries like “find pictures like this” or “find pictures
containing blue squares”.</p>
        <p>Level 2: Objects of a given type, identified by low-level
features, are retrieved with some degree of logical
inference. An example of a query is “find pictures in which
my father appears”.</p>
        <p>Level 3: Abstract attributes associated with objects are
used for retrieval. This involves a significant amount
of high-level reasoning about the meaning of the
objects or scenes depicted. An example of a query is “find
pictures of a happy woman”.</p>
        <p>
          Retrieval including both Level 2 and Level 3 is
referred to as semantic image retrieval. The gap between
Level 1 and Level 2 is known as the semantic gap, which is "the
lack of coincidence between the information that one can
extract from the visual data and the interpretation that the
same data have for a user in a given situation" [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Retrieval
at Level 3 is quite difficult; therefore current systems mostly
perform retrieval at Level 2, which requires three
fundamental steps: (1) extraction of low-level image features, (2)
definition of proper similarity measures to perform matching,
(3) reduction of the semantic gap. Clearly, step (3) is the most
challenging one, since it requires providing a link between
low-level features (visual data) and high-level concepts
(semantic interpretation of visual data).
        </p>
        <p>
          Currently, various approaches have been proposed to
reduce the semantic gap between the low-level features of
images and the high-level concepts that are understandable by
humans. According to [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], they can be broadly grouped into
four main categories:
        </p>
        <p>
          Use of ontologies [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Ontologies can be used to
provide an explicit, simplified and abstract specification
of knowledge about the domain of interest; this is
obtained by defining concepts and relationships between
them, according to the specific purpose of the
considered problem. This approach exploits the
possibility of deriving semantics directly from everyday
language. Then, different descriptors can be related to
the low-level features of images in order to form a
vocabulary that provides a qualitative definition of
high-level query concepts. Finally, these descriptors can be
mapped to high-level semantics, based on our
knowledge. This approach works fine with small databases
containing specifically collected images. With large
collections of images with various contents, more
powerful tools are required to learn the semantics.
        </p>
        <p>
          Automatic image annotation [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. This approach
consists in exploiting supervised or unsupervised learning
techniques to derive high-level concepts from images.
In particular, supervised learning techniques are used
to predict the values of a semantic category based on a
set of training samples. However, supervised learning
algorithms present some disadvantages strictly related
to their nature: they require a
large amount of labeled data to provide effective
learning results. This represents a problem when the
application domain changes and new labeled samples have
to be provided. Clustering is the typical unsupervised
learning technique used for retrieval purposes. In this
approach, images are grouped on the basis of some
similarity measure, and a class label is associated
with each derived cluster. Images in the same cluster
are supposed to be similar to each other (i.e. to have
similar semantic content). Thus, a new untagged
image that is added to the database can be indexed by
assigning it to the cluster that best matches the
image.
        </p>
        <p>
          Relevance feedback [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. This approach concerns the
possibility of learning the intentions of users and their
specific needs by exploiting information obtained
during their interactions with the system. In
particular, when the system provides the initial retrieval
results, the user judges them, indicating whether they are
relevant or irrelevant (and possibly the degree of
relevance/irrelevance). Then, a learning algorithm is used
to learn from the user feedback, which is exploited in
order to provide results that better satisfy the user's
needs.
        </p>
        <p>
          Generating semantic templates [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. This method is
based on the concept of a visual semantic template, which
includes a set of icons or objects denoting a
personalized view of concepts. Feature vectors of these objects
are extracted for the query process. Initially, the user has
to define the template of a concept by specifying, for
example, the objects, their spatial and temporal
constraints, and the weights assigned to each feature
of each object. Finally, through interaction with
users, the system moves toward a set of queries that
better express the concept in the user's mind. Since this
method requires the user to know the image features,
it can be quite difficult for ordinary users.
        </p>
        <p>In line with state-of-the-art directions in the field of IR, in
this paper we present the idea of an IR system supporting
retrieval at Level 2. Precisely, we intend to provide a
solution to the problem of the semantic gap in IR by designing a
methodology based on a combination of several approaches,
oriented to exploit both the visual and the semantic
content of images. This is achieved by making use of clustering
and visual ontologies. In the following, all the approaches
underpinning the proposed IR methodology are briefly
described and the architecture of the system is outlined.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. OVERVIEW OF THE IR SYSTEM</title>
      <p>The proposed system is intended to perform image
retrieval by exploiting both the visual and the semantic
content of images. As concerns the visual content, in this
preliminary phase of the research we focus only on shape
content. In fact, we aim to deal with specific-domain images
containing objects that have a distinguishable shape
meaning. Therefore, we assume that indexing and querying are
based only on shape matching. The system will allow the
user to query the image database not only by shape sketches
and by keywords but also by “concepts describing shapes”.
The general architecture of the proposed IR system is
reported in fig. 1.</p>
      <p>As it can be seen, several tasks are carried out in order
to derive visual and textual features of shapes contained in
images. These tasks are:</p>
      <sec id="sec-3-1">
        <p>1. Feature extraction: detecting shapes in images;
2. Clustering: grouping similar shapes into prototypes;
3. Semi-automatic annotation: associating keywords to
prototypes;
4. Search.</p>
        <p>In the following we describe how each task is carried out.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>2.1 Feature extraction</title>
      <p>In the proposed system, each image in the database is
stored as the collection of objects' shapes contained in it. In
order to be stored in the database, every image is processed
to identify the objects appearing in it. Image processing starts
with an edge detection process that extracts all contours in
the image. Then, using the derived edges, a shape detection
process is performed to identify the different objects included
in the image and determine their contours. Finally, Fourier
descriptors are computed on each contour and retained as
visual signatures of the objects in a separate database.</p>
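      <p>As an illustrative sketch of such a signature (the exact normalization used by the system is not specified in the paper, so the invariance choices below are our assumptions), a closed contour can be treated as a sequence of complex numbers and transformed with an FFT; dropping the DC term and dividing by the first harmonic yields translation- and scale-invariant descriptors, and taking magnitudes discards rotation and starting point:</p>
      <preformat>
```python
import numpy as np

def fourier_descriptors(contour, n_desc=16):
    """Shape signature from a closed contour given as an (N, 2)
    array of (x, y) points (hypothetical normalization choices)."""
    z = contour[:, 0] + 1j * contour[:, 1]  # contour points as complex numbers
    coeffs = np.fft.fft(z)
    coeffs[0] = 0.0                         # drop DC term: translation invariance
    coeffs = coeffs / np.abs(coeffs[1])     # scale by first harmonic: scale invariance
    # magnitudes of the lowest frequencies: rotation/start-point invariance
    return np.abs(coeffs[1:n_desc + 1])

# A circle sampled at 64 points concentrates all energy in the first harmonic.
t = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
sig = fourier_descriptors(circle)           # sig[0] == 1, the rest ~ 0
```
      </preformat>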
    </sec>
    <sec id="sec-5">
      <title>2.2 Clustering</title>
      <p>Once all shapes have been detected in the images and
represented as visual signature vectors, a set of shape prototypes
is automatically defined by an unsupervised learning
process that performs clustering on the visual signatures (Fourier
descriptors) of the shapes, so as to categorize similar shapes into
clusters. Each resulting cluster Ci is represented by a shape
prototype pi, which is computed by averaging the visual
signatures of all shapes belonging to the cluster. We intend to
apply hierarchical clustering, in order to generate a
hierarchy of prototypical shapes. Each node of the
hierarchical tree is associated with one prototypical shape. Root
nodes of the tree represent general prototypes, intermediate
nodes represent general shapes, and leaf nodes represent specific
shapes.</p>
      <p>
        During the interaction of the user with the system, the
hierarchical tree is incrementally updated. Whenever a new
shape is considered (i.e. each time a new image containing
relevant object shapes is added to the database), we evaluate
its matching against all existing prototypes, from root nodes
to pre-leaf (final) nodes, according to a similarity measure
defined on visual signatures. If the new shape matches a final
prototype with a sufficient degree, then the corresponding
prototype is updated by averaging the features of the shapes
that belong to the corresponding cluster [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Otherwise, a
new prototype is created, corresponding to the new shape.
      </p>
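      <p>A minimal flat (non-hierarchical) sketch of this incremental step, with an illustrative Euclidean matching rule and an arbitrary threshold, could look as follows; the actual similarity measure and hierarchy traversal are still open design choices:</p>
      <preformat>
```python
import numpy as np

def assign_shape(prototypes, counts, signature, threshold=0.5):
    """Match a new shape signature against the existing prototypes.
    If the closest one is within `threshold`, update it as the running
    mean of its cluster; otherwise create a new prototype."""
    if prototypes:
        dists = [np.linalg.norm(signature - p) for p in prototypes]
        i = int(np.argmin(dists))
        if dists[i] <= threshold:
            counts[i] += 1
            prototypes[i] += (signature - prototypes[i]) / counts[i]  # running mean
            return i
    prototypes.append(np.asarray(signature, dtype=float).copy())
    counts.append(1)
    return len(prototypes) - 1

protos, counts = [], []
assign_shape(protos, counts, np.array([0.0, 0.0]))        # new prototype 0
assign_shape(protos, counts, np.array([0.2, 0.0]))        # close: prototype 0 -> [0.1, 0.0]
idx = assign_shape(protos, counts, np.array([5.0, 5.0]))  # far: new prototype 1
```
      </preformat>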
      <p>The use of shape prototypes, which represent an
intermediate level of visual signatures, facilitates the subsequent
tasks 3. and 4. First, prototypes facilitate the
annotation process, since only a reduced number of shapes (the
prototypical ones) need to be manually annotated. Second,
the use of prototypes simplifies the search process. Indeed,
since only a small number of objects is likely to match any
single user query, a large number of unnecessary
comparisons is avoided during search by performing matching with
shape prototypes rather than with specific shapes. In other
words, prototypes act as a filter that quickly reduces the search
space while discriminating the objects.</p>
    </sec>
    <sec id="sec-6">
      <title>2.3 Semi-automatic annotation</title>
      <p>Once shape prototypes have been derived, a semi-automatic
annotation process is applied to associate text descriptions
with the identified object shapes. The process is semi-automatic
since it involves manual annotation only for prototypes:
shapes subsequently attached to the hierarchy are
automatically annotated, since they inherit the descriptions of their
prototypes.</p>
      <p>
        Every semantic class that is of interest in the considered
image domain (e.g., in our case, glasses, bottles, etc.) will be
described by a visual ontology (VO), which is intended as
a textual description, made of concepts and relationships
among them, of the visual content of a prototypical shape
[
        <xref ref-type="bibr" rid="ref4 ref9">9, 4</xref>
        ]. We intend the lexicon used to define the VOs to be
as intuitive as possible, so as to evoke the particular
shape it describes. We plan for the system to be
supplied with a basic set of domain-dependent VOs, one for each
considered semantic class.
      </p>
      <p>Of course, different prototypical shapes may convey the
same semantic content (e.g., several different shapes may
convey the concept of glass). We consider such prototypes
to belong to the same semantic class. Shape prototypes
belonging to the same semantic class will share roughly the same
VO structure, obviously with the appropriate differences.</p>
      <p>As an illustrative example, we sketch some possible
relationships included in a VO that refers to the semantic class
glass:
wine glass IS SPECIALIZATION OF glass;
bottom IS PART OF wine glass;
wavy shape IS PROPERTY OF bottom.</p>
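      <p>For illustration only (the concrete ontology representation has not been fixed), such relationships could be stored as plain subject-relation-object triples and queried directly; the names below simply transcribe the glass example above:</p>
      <preformat>
```python
# Hypothetical encoding of the glass VO fragment as triples.
glass_vo = [
    ("wine glass", "IS_SPECIALIZATION_OF", "glass"),
    ("bottom", "IS_PART_OF", "wine glass"),
    ("wavy shape", "IS_PROPERTY_OF", "bottom"),
]

def related(vo, relation, obj):
    """Return all subjects linked to `obj` by `relation`."""
    return [s for (s, r, o) in vo if r == relation and o == obj]

parts = related(glass_vo, "IS_PART_OF", "wine glass")  # -> ["bottom"]
```
      </preformat>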
      <p>The combined use of prototypes and VOs provides a
powerful mechanism for the automatic annotation of shapes. Every
time the user adds a new shape to the database, the system
associates the shape with the most similar prototype, which
is related to a semantic class and linked to a VO. Thus the
new shape automatically inherits all the semantic descriptions
associated with the selected prototype. Then,
feedback from the user is considered. Namely, the user
may accept the choice made by the system, or reject it.
In the latter case, there are two possibilities: the user can
select the proper prototype with the related VO from the
existing ones, or, if none can be associated with the shape,
the user can create a new prototype (using the new shape)
and manually annotate it by modifying the VO incorrectly
assigned by the system.</p>
    </sec>
    <sec id="sec-7">
      <title>2.4 Search</title>
      <p>The search engine is designed to allow users to submit
sketch-based, text-based and concept-based queries.</p>
      <p>The results of the sketch-based search emerge from a
matching between the submitted sample shape and the created
prototypes. Precisely, when the user presents a query in the
form of an object sketch, the system formulates the query by
performing feature extraction, translating that object into
a shape model. The extracted query features are first used
to compute the similarity between the query and the prototypes.
This is done by considering shapes as points of a
feature space. Having characterized each shape as a vector
of Fourier descriptors, we simply evaluate the dissimilarity
between two shapes in terms of the Euclidean distance between
the two vectors of descriptors. Of course, other similarity
measures can be considered, encapsulating the human
perception of shape similarity (an interesting issue that we
would like to explore in the future). After sorting the prototypes
in terms of similarity, the system returns the images containing
objects indexed by the prototypes with the highest similarities.</p>
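      <p>The ranking step described above reduces to a nearest-neighbour search in the descriptor space; a minimal sketch follows, where top_k and the array layout are illustrative choices rather than system specifics:</p>
      <preformat>
```python
import numpy as np

def rank_prototypes(query_sig, prototypes, top_k=3):
    """Sort prototype signatures by Euclidean distance to the query
    and return the indices of the top_k closest ones."""
    dists = np.linalg.norm(np.asarray(prototypes) - query_sig, axis=1)
    return [int(i) for i in np.argsort(dists)[:top_k]]

protos = np.array([[0.0, 1.0], [1.0, 1.0], [4.0, 0.0]])
best = rank_prototypes(np.array([0.1, 1.0]), protos, top_k=2)  # -> [0, 1]
```
      </preformat>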
      <p>The results of the text-based search emerge from a
matching between the submitted textual query and the textual
descriptions associated with the prototypes. Namely, when a query
is formulated in terms of keywords, the system simply
returns the images including the objects indexed by the
prototypes labeled with those keywords. As before, high-matching
prototypes are selected to provide the shapes to be visualized as
search results.</p>
      <p>Finally, when both visual and textual content are
exploited by the user in querying the image database, the images
returned by the two approaches separately are merged
into a single output set.</p>
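      <p>The merging rule is left unspecified here; one simple interpretation is a duplicate-free union that preserves first-seen rank order, sketched below with hypothetical image identifiers:</p>
      <preformat>
```python
def merge_results(sketch_hits, text_hits):
    """Merge two ranked lists of image ids into a single output set,
    keeping first-seen order and dropping duplicates."""
    seen, merged = set(), []
    for img_id in sketch_hits + text_hits:
        if img_id not in seen:
            seen.add(img_id)
            merged.append(img_id)
    return merged

out = merge_results(["img3", "img7"], ["img7", "img1"])  # -> ["img3", "img7", "img1"]
```
      </preformat>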
    </sec>
    <sec id="sec-9">
      <title>3. FIRST STEPS TOWARD THE SYSTEM DEVELOPMENT</title>
      <p>
        In this preliminary phase of the research, only the main
functions for tasks 1. and 4. described above have been
implemented in the system. For tests during the development
of the system, we considered an image database from the art
domain. The database, used in other IR works [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], includes
digitized images representing still-life paintings by the
Italian artist Giorgio Morandi.
      </p>
      <p>As concerns task 1., various image processing tools that
are necessary to extract shape features from the image
objects have been developed, including edge detection
methods, as well as enhancement and reconstruction
functionalities. Basic image processing methods were included from
the ImageJ image analysis software (http://rsbweb.nih.gov/ij), such as
edge-detection operators (e.g. Canny, Prewitt and Sobel) for the automatic
detection of object boundaries in images. Having the
possibility to act on contrast and brightness properties, the
user can adjust the image appearance to refine the extraction
of the shapes of objects. Shape identification is performed
automatically through an edge-following algorithm. When
the result of shape identification is not satisfying, the user
is given the possibility to correct boundaries or to manually
draw boundaries directly on the image.</p>
      <p>As concerns task 4., the retrieval graphical interface has
been developed, which enables users to query the system and
to inspect search results (fig. 2). Also, the computation of
the Euclidean dissimilarity measure for shape prototype
matching has been included in the system.</p>
      <p>Currently, the system also provides interfaces for
browsing the database and inserting new images.</p>
    </sec>
    <sec id="sec-10">
      <title>4. CONCLUSIONS</title>
      <p>In this paper a preliminary proposal of an IR system has
been presented. The system is intended to address the problem
of the semantic gap by exploiting clustering and visual
ontologies. The use of a visual ontology is motivated by the
necessity of reproducing the capacity of a human to describe her
visual perception by means of the visual concepts she
possesses. From the human-computer interaction point of view,
visual ontologies provide a bridge between the low-level features
of images and the visual representation of the semantics contained in
images. Compared to symbolic ontologies, visual ontologies
can represent complex image knowledge in a more detailed
and intuitive way, so that no expert knowledge is needed to
process a complicated knowledge representation of images.
The binding created by visual ontologies between image
objects and their descriptions enables the proposed IR system
to perform conceptual reasoning on the collection of
images, even when dealing with pure content-based queries.
Thus, different forms of retrieval become possible with the
proposed system:</p>
      <p>1. text-based: queries are lexically motivated, i.e. they
express objects by their names (keywords);
2. content-based: queries are perceptually motivated, i.e.
they express objects by their visual appearance;
3. semantic retrieval: queries are semantically motivated,
since they express objects by their intended meaning,
i.e. in terms of concepts and their relationships.</p>
      <p>Currently, we are continuing to develop the proposed IR
system. To this aim, we are looking for the most appropriate
clustering algorithm to derive significant shape prototypes
and analyzing methods to create visual ontologies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>Al-Khatib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. F.</given-names>
            <surname>Day</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghafoor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. B.</given-names>
            <surname>Berra</surname>
          </string-name>
          .
          <article-title>Semantic modeling and knowledge representation in multimedia databases</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <volume>64</volume>
          –
          <fpage>80</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Arivazhagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ganesan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Selvanidhyananthan</surname>
          </string-name>
          .
          <article-title>Image retrieval using shape features</article-title>
          .
          <source>International journal of imaging science and engineering (IJISE)</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ):
          <volume>101</volume>
          –
          <fpage>103</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Bimbo</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pala</surname>
          </string-name>
          .
          <article-title>Visual image retrieval by elastic matching of user sketches</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>19</volume>
          :
          <fpage>121</fpage>
          –
          <fpage>132</fpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bouet</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Aufaure</surname>
          </string-name>
          .
          <article-title>Multimedia Data Mining and Knowledge Discovery, chapter New Image Retrieval Principle: Image Mining and Visual Ontology</article-title>
          , pages
          <volume>168</volume>
          –
          <fpage>184</fpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Christodoulakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theodoridou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Papa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Pathria</surname>
          </string-name>
          .
          <article-title>Multimedia document presentation, information extraction, and document formation in minos: a model and a system</article-title>
          .
          <source>ACM Trans. Inf. Syst.</source>
          ,
          <volume>4</volume>
          (
          <issue>4</issue>
          ):
          <volume>345</volume>
          –
          <fpage>383</fpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Image retrieval: Ideas, influences, and trends of the new age</article-title>
          .
          <source>ACM Comput. Surv.</source>
          ,
          <volume>40</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>60</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Eakins</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Graham</surname>
          </string-name>
          .
          <article-title>Content-based image retrieval</article-title>
          .
          <source>University of Northumbria Technical Report</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Jain</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Vailaya</surname>
          </string-name>
          .
          <article-title>Image retrieval using color and shape</article-title>
          .
          <source>Pattern Recognition</source>
          ,
          <volume>29</volume>
          :
          <fpage>1233</fpage>
          -
          <lpage>1244</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Huang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Gao</surname>
          </string-name>
          .
          <article-title>An ontology-based approach to retrieve digitized art images</article-title>
          .
          <source>In WI '04: Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence</source>
          , pages
          <fpage>131</fpage>
          -
          <lpage>137</lpage>
          , Washington, DC, USA,
          <year>2004</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.-M.</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Street</surname>
          </string-name>
          .
          <article-title>Cluster-driven refinement for content-based digital image retrieval</article-title>
          .
          <source>IEEE Transactions on Multimedia</source>
          ,
          <volume>6</volume>
          (
          <issue>6</issue>
          ):
          <fpage>817</fpage>
          -
          <lpage>827</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.-Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          .
          <article-title>A survey of content-based image retrieval with high-level semantics</article-title>
          .
          <source>Pattern Recogn.</source>
          ,
          <volume>40</volume>
          (
          <issue>1</issue>
          ):
          <fpage>262</fpage>
          -
          <lpage>282</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>A unified framework for semantics and feature based relevance feedback in image retrieval systems</article-title>
          .
          <source>In MULTIMEDIA '00: Proceedings of the eighth ACM international conference on Multimedia</source>
          , pages
          <fpage>31</fpage>
          -
          <lpage>37</lpage>
          , New York, NY, USA,
          <year>2000</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>MacArthur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Brodley</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.-R.</given-names>
            <surname>Shyu</surname>
          </string-name>
          .
          <article-title>Relevance feedback decision trees in content-based image retrieval</article-title>
          .
          <source>In IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVLS00)</source>
          , pages
          <fpage>68</fpage>
          -
          <lpage>72</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Manjunath</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          .
          <article-title>Texture features for browsing and retrieval of image data</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
          ,
          <volume>18</volume>
          (
          <issue>8</issue>
          ):
          <fpage>837</fpage>
          -
          <lpage>842</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mezaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Strintzis</surname>
          </string-name>
          .
          <article-title>An ontology approach to object-based image retrieval</article-title>
          .
          <source>In ICIP</source>
          <year>2003</year>
          ,
          volume
          <volume>II</volume>
          , pages
          <fpage>511</fpage>
          -
          <lpage>514</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mojsilovic</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Rogowitz</surname>
          </string-name>
          .
          <article-title>Capturing image semantics with low-level descriptors</article-title>
          .
          <source>In Proc. of ICIP</source>
          , pages
          <fpage>18</fpage>
          -
          <lpage>21</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Naphade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kristjansson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Frey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems</article-title>
          .
          <source>International Conference on Image Processing</source>
          ,
          <volume>3</volume>
          :
          <fpage>536</fpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pala</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Santini</surname>
          </string-name>
          .
          <article-title>Image retrieval by shape and texture</article-title>
          .
          <source>Pattern Recognition</source>
          ,
          <volume>32</volume>
          :
          <fpage>517</fpage>
          -
          <lpage>527</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. W. M.</given-names>
            <surname>Smeulders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Worring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Santini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Jain</surname>
          </string-name>
          .
          <article-title>Content-based image retrieval at the end of the early years</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
          ,
          <volume>22</volume>
          (
          <issue>12</issue>
          ):
          <fpage>1349</fpage>
          -
          <lpage>1380</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Smith</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <article-title>Local color and texture extraction and spatial query</article-title>
          .
          <source>In Proc. of IEEE Int. Conf. on Image Processing</source>
          , volume
          <volume>3</volume>
          , pages
          <fpage>1011</fpage>
          -
          <lpage>1014</lpage>
          , Sep
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Smith</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.-F.</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <article-title>Tools and techniques for color image retrieval</article-title>
          . In IS&amp;T/SPIE Proceedings,
          <article-title>Storage &amp; Retrieval for Image and Video Databases</article-title>
          , volume
          <volume>2670</volume>
          , pages
          <fpage>426</fpage>
          -
          <lpage>437</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vailaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Figueiredo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jain</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Image classification for content-based indexing</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          ,
          <volume>10</volume>
          (
          <issue>1</issue>
          ):
          <fpage>117</fpage>
          -
          <lpage>130</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yoshitaka</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ichikawa</surname>
          </string-name>
          .
          <article-title>A survey on content-based retrieval for multimedia databases</article-title>
          .
          <source>IEEE Trans. on Knowl. and Data Eng.</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>93</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>An active learning framework for content-based information retrieval</article-title>
          .
          <source>IEEE Transactions on Multimedia</source>
          ,
          <volume>4</volume>
          :
          <fpage>260</fpage>
          -
          <lpage>268</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>X. S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          and
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>CBIR: from low-level features to high-level semantics</article-title>
          .
          <source>Image and Video Communications and Processing</source>
          <year>2000</year>
          ,
          <volume>3974</volume>
          (
          <issue>1</issue>
          ):
          <fpage>426</fpage>
          -
          <lpage>431</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>X. S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          and
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Unifying keywords and visual contents in image retrieval</article-title>
          .
          <source>IEEE MultiMedia</source>
          ,
          <volume>9</volume>
          (
          <issue>2</issue>
          ):
          <fpage>23</fpage>
          -
          <lpage>33</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pan</surname>
          </string-name>
          .
          <article-title>Apply semantic template to support content-based image retrieval</article-title>
          .
          <source>In Storage and Retrieval for Media Databases</source>
          , volume
          <volume>3972</volume>
          , pages
          <fpage>442</fpage>
          -
          <lpage>449</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>