=Paper= {{Paper |id=None |storemode=property |title=Thinking of a System for Image Retrieval |pdfUrl=https://ceur-ws.org/Vol-560/paper17.pdf |volume=Vol-560 |dblpUrl=https://dblp.org/rec/conf/iir/CastellanoST10 }}
Thinking of a System for Image Retrieval

Giovanna Castellano, Gianluca Sforza∗, Alessandra Torsello
Università degli Studi di Bari “Aldo Moro”
via Orabona 4, Bari, Italy
castellano@di.uniba.it, gsforza@di.uniba.it, torsello@di.uniba.it

ABSTRACT

A growing number of applications demands effective and efficient support for retrieval in large collections of digital images. The work presented here is early-stage research focusing on the integration of text-based and content-based image retrieval. The main objective is to find a valid solution to the problem of reducing the so-called semantic gap, i.e. the lack of coincidence between the visual information contained in an image and the interpretation that a user can give of it. To address the semantic gap problem, we intend to use a combination of several approaches. First, a link between low-level features and text descriptions is obtained by a semi-automatic annotation process that makes use of shape prototypes generated by clustering. Precisely, the system indexes objects based on shape and groups them into a set of clusters, each cluster represented by a prototype. Then, a taxonomy of objects described by both visual ontologies and textual features is attached to the prototypes, forming a visual description of a subset of the objects. The paper outlines the architecture of the system and briefly describes the algorithms underpinning the proposed approach.

Categories and Subject Descriptors

H [Information Storage and Retrieval]

General Terms

Image retrieval

Keywords

Content-based image retrieval, Semantic image retrieval

∗Corresponding author

Appears in the Proceedings of the 1st Italian Information Retrieval Workshop (IIR’10), January 27–28, 2010, Padova, Italy. http://ims.dei.unipd.it/websites/iir10/index.html Copyright owned by the authors.

1. INTRODUCTION

By the end of the last century the question was no longer whether digital image archives are technically and economically viable, but rather how to make these archives efficient and informative. The aim has been to develop intelligent and efficient human-computer interaction systems, enabling the user to access vast amounts of heterogeneous image sets stored in different sites and archives. Additionally, the continuously increasing number of people who need access to such collections further dictates that more emphasis be put on attributes such as the user-friendliness and flexibility of any multimedia content retrieval scheme.

The very first attempts at image retrieval were based on exploiting existing image captions to classify images according to predetermined classes or to create a restricted vocabulary [5]. Although relatively simple and computationally efficient, this approach has several restrictions, mainly deriving from the use of a restricted vocabulary that neither allows for unanticipated queries nor can be extended without re-evaluating the possible connection between each item in the database and each new addition to the vocabulary. Additionally, such keyword-based approaches assume either the pre-existence of textual annotations (e.g. captions) or that annotation using the predetermined vocabulary is performed manually. In the latter case, inconsistency of the keyword assignments among different indexers can also hamper performance. Recently, a methodology for computer-assisted annotation of image collections was presented [24].

To overcome the limitations of the keyword-based approach, the use of the visual content has been proposed, leading to Content-Based Image Retrieval (CBIR) approaches [6]. CBIR systems utilize the visual content of images to perform indexing and retrieval, extracting low-level indexing features such as color, shape, and texture. In this case, pre-processing of images is necessary as the basis on which features are extracted. The pre-processing is of coarse granularity if it involves processing of images as a whole, whereas it is of fine granularity if it involves detection of objects within an image [1]. Relevant images are then retrieved by comparing the low-level features of each item in the database with those of a user-supplied sketch or, more often, a key image that is either selected from a restricted image set or supplied by the user (query-by-example). Several approaches have appeared in the literature that perform visual querying by example, taking into account different facets of pictorial data to express the image contents, such as color [21], object shape [2], texture [14], or a combination of them [8, 18, 20]. Among these, search by matching shapes of image portions is one of the most natural ways to pose a query in image databases.

Though many sophisticated algorithms have been designed to describe color, shape, and texture features, these algorithms cannot adequately model image semantics. Indeed, extensive experiments on CBIR show that low-level contents
often fail to describe the high-level semantic concepts in the user’s mind [25]. Also, CBIR systems have limitations when dealing with broad-content image databases [16]; indeed, in order to start a query, the availability of an appropriate key image is assumed; occasionally this is not feasible, particularly for classes of images that are underrepresented in the database. Therefore, the performance of CBIR systems is still far from users’ expectations.

Summarizing, current indexing schemes for image retrieval employ descriptors ranging from low-level features to higher-level semantic concepts [23]. So far, significant work has been presented on unifying keywords and visual contents in image retrieval, and several hybrid methods exploiting both keywords and the visual content have been proposed [17, 12, 26]. Depending on how low-level and high-level descriptors are employed and/or combined, different levels of image retrieval can be achieved. According to [7], three levels of image retrieval can be considered:

• Level 1: Low-level features such as color, texture, shape or the spatial location of image elements are exploited in the retrieval process. At this level, the system supports queries like find pictures like this or find pictures containing blue squares.

• Level 2: Objects of a given type, identified by low-level features, are retrieved with some degree of logical inference. An example of a query is find pictures in which my father appears.

• Level 3: Abstract attributes associated to objects are used for retrieval. This involves a significant amount of high-level reasoning about the meaning of the objects or scenes depicted. An example of a query is find pictures of a happy woman.

Retrieval including both Level 2 and Level 3 is referred to as semantic image retrieval. The gap between Level 1 and Level 2 is known as the semantic gap, which is ”the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation” [19]. Retrieval at Level 3 is quite difficult; therefore current systems mostly perform retrieval at Level 2, which requires three fundamental steps: (1) extraction of low-level image features, (2) definition of proper similarity measures to perform matching, (3) reduction of the semantic gap. Clearly, step (3) is the most challenging one, since it requires providing a link between low-level features (visual data) and high-level concepts (semantic interpretation of visual data).

Various approaches have been proposed to reduce the semantic gap between the low-level features of images and the high-level concepts that are understandable by humans. According to [11], they can be broadly grouped into four main categories:

• Use of ontologies [15]. Ontologies can be used to provide an explicit, simplified and abstract specification of knowledge about the domain of interest; this is obtained by defining concepts and relationships between them, according to the specific purpose of the considered problem. This approach exploits the possibility of simply deriving semantics from our daily language. Different descriptors can then be related to the low-level features of images in order to form a vocabulary that provides a qualitative definition of high-level query concepts. Finally, these descriptors can be mapped to high-level semantics, based on our knowledge. This approach works well with small databases containing specifically collected images. With large collections of images with various contents, more powerful tools are required to learn the semantics.

• Automatic image annotation [22]. This approach consists in exploiting supervised or unsupervised learning techniques to derive high-level concepts from images. In particular, supervised learning techniques are used to predict the values of a semantic category based on a set of training samples. However, supervised learning algorithms have a disadvantage strictly related to the nature of this kind of technique: they require a large amount of labeled data to provide effective learning results. This becomes a problem when the application domain changes and new labeled samples have to be provided. Clustering is the typical unsupervised learning technique used for retrieval purposes. In this approach, images are grouped on the basis of some similarity measure, so that a class label is associated to each derived cluster. Images in the same cluster are supposed to be similar to each other (i.e. to have similar semantic content). Thus, a new untagged image added to the database can be indexed by assigning it to the cluster that best matches the image.

• Relevance feedback [13]. This approach concerns the possibility of learning the intentions of users and their specific needs by exploiting information obtained during their interactions with the system. In particular, when the system provides the initial retrieval results, the user judges them by indicating whether they are relevant or irrelevant (and possibly the degree of relevance/irrelevance). Then, a learning algorithm is used to learn the user feedback, which is exploited to provide results that better satisfy the user’s needs.

• Generating semantic templates [27]. This method is based on the concept of a visual semantic template, which includes a set of icons or objects denoting a personalized view of concepts. Feature vectors of these objects are extracted for the query process. Initially, the user has to define the template of a concept by specifying, for example, the objects, their spatial and temporal constraints, and the weights assigned to each feature of each object. Finally, through the interaction with users, the system moves toward a set of queries that better express the concept in the user’s mind. Since this method requires the user to know the image features, it can be quite difficult for ordinary users.

Along with these state-of-the-art directions in the field of IR, in this paper we present the idea of an IR system supporting retrieval at Level 2. Precisely, we intend to provide a solution to the problem of the semantic gap in IR by designing a methodology based on a combination of several approaches, oriented to exploit both the visual and the semantic content of images. This is achieved by making use of clustering and visual ontologies. In the following, all the approaches
underpinning the proposed IR methodology are briefly described and the architecture of the system is outlined.

2. OVERVIEW OF THE IR SYSTEM

The proposed system is intended to perform image retrieval by exploiting both the visual and the semantic content of images. As concerns the visual content, in this preliminary phase of the research we focus only on shape. In fact, we aim to deal with domain-specific images containing objects that have a distinguishable shape meaning. Therefore, we assume that indexing and querying are based only on shape matching. The system will allow the user to query the image database not only by shape sketches and by keywords but also by “concepts describing shapes”. The general architecture of the proposed IR system is reported in fig. 1.

Figure 1: The system architecture. (The diagram shows the image collection feeding a feature extraction and clustering stage that produces prototype visual signatures; a semi-automatic annotation stage attaches textual descriptions to the prototypes; an indexing and prototype-matching component links both to the search engine, composed of a query processor, the ontologies and an interface, through which the user performs indexing and retrieval.)

As can be seen, several tasks are carried out in order to derive visual and textual features of the shapes contained in images. These tasks are:

1. Feature extraction: detecting shapes in images;
2. Clustering: grouping similar shapes into prototypes;
3. Semi-automatic annotation: associating keywords to prototypes;
4. Search.

In the following we describe how each task is carried out.

2.1 Feature extraction

In the proposed system, each image in the database is stored as the collection of the objects’ shapes contained in it. In order to be stored in the database, every image is processed to identify the objects appearing in it. Image processing starts with an edge detection process that extracts all contours in the image. Then, using the derived edges, a shape detection process is performed to identify the different objects included in the image and determine their contours. Finally, Fourier descriptors are computed on each contour and retained as visual signatures of the objects in a separate database.

2.2 Clustering

Once all shapes have been detected in the images and represented as visual signature vectors, a set of shape prototypes is automatically defined by an unsupervised learning process that performs clustering on the visual signatures (Fourier descriptors) of the shapes, so as to categorize similar shapes into clusters. Each resulting cluster Ci is represented by a shape prototype pi, computed by averaging the visual signatures of all shapes belonging to the cluster. We intend to apply hierarchical clustering, in order to generate a hierarchy of prototypical shapes. Each node of the hierarchical tree is associated with one prototypical shape. Root nodes of the tree represent general prototypes, intermediate nodes represent general shapes, and leaf nodes represent specific shapes.

During the interaction of the user with the system, the hierarchical tree is incrementally updated. Whenever a new shape is considered (i.e. each time a new image containing relevant object shapes is added to the database), we evaluate its matching against all existing prototypes, from root nodes to pre-leaf (final) nodes, according to a similarity measure defined on the visual signatures. If the new shape matches a final prototype to a sufficient degree, the corresponding prototype is updated by averaging the features of the shapes belonging to the corresponding cluster [10]. Otherwise, a new prototype is created, corresponding to the new shape.

The use of shape prototypes, which represent an intermediate level of visual signatures, facilitates the subsequent tasks 3 and 4. First, prototypes facilitate the annotation process, since only a reduced number of shapes (the prototypical ones) need to be manually annotated. Secondly, the use of prototypes simplifies the search process. Indeed, since only a small number of objects is likely to match any single user query, a large number of unnecessary comparisons is avoided during search by matching against shape prototypes rather than against specific shapes. In other words, prototypes act as a filter that quickly reduces the search space while discriminating the objects.

2.3 Semi-automatic annotation

Once the shape prototypes have been derived, a semi-automatic annotation process is applied to associate text descriptions to the identified object shapes. The process is semi-automatic since it involves manual annotation only for the prototypes: shapes attached to the hierarchy are automatically annotated, since they inherit descriptions from their prototypes.

Every semantic class of interest in the considered image domain (e.g., in ours, glasses, bottles, etc.) will be described by a visual ontology (VO), intended as a textual description, made of concepts and relationships among them, of the visual content of a prototypical shape [9, 4]. We intend the lexicon used to define the VOs to be as intuitive as possible, so as to evoke the particular shape it describes. We plan for the system to be supplied with a basic set of domain-dependent VOs, one for each considered semantic class.

Of course, different prototypical shapes may convey the same semantic content (e.g., several different shapes may convey the concept of glass). We consider such prototypes to belong to the same semantic class. Shape prototypes belonging to the same semantic class will share approximately the same VO structure, obviously with the appropriate differences.
As an illustrative example, we sketch some possible relationships included in a VO that refers to the semantic class glass:

• wine glass IS SPECIALIZATION OF glass;
• bottom IS PART OF wine glass;
• wavy shape IS PROPERTY OF bottom.
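One minimal way to make such relationships machine-readable is to store the VO as a set of typed triples; the sketch below is illustrative, with concept and relation names taken directly from the bullets above:

```python
# The glass VO above, stored as (subject, relation, object) triples.
GLASS_VO = {
    ("wine glass", "IS_SPECIALIZATION_OF", "glass"),
    ("bottom", "IS_PART_OF", "wine glass"),
    ("wavy shape", "IS_PROPERTY_OF", "bottom"),
}

def related_concepts(vo, concept):
    """Return every concept directly linked to `concept` by some relation."""
    related = set()
    for subject, _relation, obj in vo:
        if subject == concept:
            related.add(obj)
        if obj == concept:
            related.add(subject)
    return related
```

A new shape matched to a wine-glass prototype would then inherit these triples as its textual description.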
The combined use of prototypes and VOs provides a powerful mechanism for the automatic annotation of shapes. Every time the user adds a new shape to the database, the system associates the shape with the most similar prototype, which is related to a semantic class and linked to a VO. Thus the new shape inherits, in an automatic fashion, all the semantic descriptions associated with the selected prototype. Then, feedback from the user is considered. Namely, the user may accept the choice made by the system, or reject it. In the latter case, there are two possibilities: the user can select the proper prototype, with the related VO, from the existing ones, or, if none can be associated with the shape, the user can create a new prototype (using the new shape) and manually annotate it by modifying the VO incorrectly assigned by the system.

2.4 Search

The search engine is designed to allow users to submit sketch-based, text-based and concept-based queries.

The results of the sketch-based search emerge from a matching between the submitted sample shape and the created prototypes. Precisely, when the user presents a query in the form of an object sketch, the system formulates the query by performing feature extraction, translating that object into a shape model. The extracted query features are first used to compute the similarity between the query and the prototypes. This is done by considering shapes as points of a feature space. Having characterized each shape as a vector of Fourier descriptors, we simply evaluate the dissimilarity between two shapes as the Euclidean distance between their two vectors of descriptors. Of course, other similarity measures can be considered, encapsulating the human perception of shape similarity (an interesting issue that we would like to explore further in the future). After sorting the prototypes by similarity, the system returns the images containing objects indexed by the prototypes with the highest similarities.

The results of the text-based search emerge from a matching between the submitted textual query and the textual descriptions associated with the prototypes. Namely, when a query is formulated in terms of keywords, the system simply returns the images including the objects indexed by the prototypes labeled with those keywords. As before, high-matching prototypes are selected to provide the shapes to be visualized as search results.

Finally, when both visual and textual content are exploited by the user querying the image database, the images returned by the two approaches separately are merged into a single output set.

3. FIRST STEPS TOWARD THE SYSTEM DEVELOPMENT

In this preliminary phase of the research, only the main functions for tasks 1 and 4 described above have been implemented in the system. For tests during the development of the system, we considered an image database from the art domain. The database, used in other IR works [3], includes digitalized images representing still-life paintings by the Italian artist Giorgio Morandi.

As concerns task 1, various image processing tools necessary to extract shape features from the image objects have been developed, including edge detection methods as well as enhancement and reconstruction functionalities. Basic image processing methods were included from the ImageJ image analysis software¹, such as edge detection operators (e.g. Canny, Prewitt and Sobel) for the automatic detection of object boundaries in images. Having the possibility to act on contrast and brightness properties, the user can adjust the image appearance to refine the extraction of the objects’ shapes. The shape identification is performed automatically through an edge-following algorithm. When the result of shape identification is not satisfactory, the user is given the possibility to correct the boundaries or to draw boundaries manually, directly on the image.

As concerns task 4, the retrieval graphical interface has been developed, enabling users to query the system and to inspect the search results (fig. 2). Also, the computation of Euclidean dissimilarity measures for shape prototype matching has been included in the system.

Figure 2: An initial search engine interface.

Currently, the system also provides the interfaces for browsing the database and inserting new images.

4. CONCLUSIONS

In this paper a preliminary proposal of an IR system has been presented. The system is intended to address the problem of the semantic gap by exploiting clustering and visual ontologies. The use of a visual ontology is motivated by the necessity of reproducing the capacity of a human to describe her visual perception by means of the visual concepts she possesses. From the point of view of human-computer interaction, visual ontologies provide a bridge between the low-level features of images and the visual representation of the semantics contained in images. Compared to a symbolic ontology, visual ontologies can represent complex image knowledge in a more detailed and intuitive way, so that no expert knowledge is needed to process a complicated knowledge representation of images.

¹http://rsbweb.nih.gov/ij
   The binding created by visual ontologies between image objects and their descriptions enables the proposed IR system to perform conceptual reasoning on the collection of images, even when dealing with pure content-based queries. Thus, different forms of retrieval become possible with the proposed system:

   1. text-based: queries are lexically motivated, i.e. they express objects by their names (keywords);

   2. content-based: queries are perceptually motivated, i.e. they express objects by their visual appearance;

   3. semantic retrieval: queries are semantically motivated, since they express objects by their intended meaning, i.e. in terms of concepts and their relationships.
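A single entry point could route a query to one of these three modes; a minimal sketch, assuming queries arrive as dictionaries with optional keyword, sketch, and concept fields (all names are hypothetical, not part of the described system):

```python
# Illustrative sketch: dispatching a query to one of the three
# retrieval modes (text-based, content-based, semantic).
def retrieve(query):
    """Route a query dict to the matching retrieval mode."""
    if query.get("keywords"):            # text-based: names of objects
        return ("text", query["keywords"])
    if query.get("sketch") is not None:  # content-based: visual example
        return ("content", query["sketch"])
    if query.get("concepts"):            # semantic: concepts/relations
        return ("semantic", query["concepts"])
    raise ValueError("empty query")

print(retrieve({"keywords": ["flower"]}))
# -> ('text', ['flower'])
```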
   Currently, we are continuing to develop the proposed IR system. To this aim, we are looking for the most appropriate clustering algorithm to derive significant shape prototypes and analyzing methods to create visual ontologies.
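As an illustration of the prototype-derivation step, one candidate is k-means over shape descriptors, taking each cluster centre as a prototype; this is only a sketch under assumed toy descriptors, not the algorithm the system will finally adopt:

```python
import random

# Illustrative sketch: deriving shape prototypes by clustering shape
# descriptors (here: toy 2-D feature vectors) with a plain k-means.
# k-means is just one candidate; the choice of algorithm is still open.
def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        # assign each descriptor to its nearest centre
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centres[c])))
            clusters[i].append(p)
        # recompute each centre as the mean of its cluster;
        # the final centres act as the shape prototypes
        for i, cl in enumerate(clusters):
            if cl:
                centres[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centres

descriptors = [(0.1, 0.2), (0.15, 0.1), (0.9, 0.8), (0.95, 0.9)]
prototypes = kmeans(descriptors, k=2)
print(prototypes)  # two prototype descriptors, one per cluster
```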