=Paper=
{{Paper
|id=None
|storemode=property
|title=Thinking of a System for Image Retrieval
|pdfUrl=https://ceur-ws.org/Vol-560/paper17.pdf
|volume=Vol-560
|dblpUrl=https://dblp.org/rec/conf/iir/CastellanoST10
}}
==Thinking of a System for Image Retrieval==
Thinking of a System for Image Retrieval∗

Giovanna Castellano, Gianluca Sforza, Alessandra Torsello
Università degli Studi di Bari “Aldo Moro”, via Orabona 4, Bari, Italy
castellano@di.uniba.it, gsforza@di.uniba.it, torsello@di.uniba.it

∗Corresponding author

ABSTRACT

An increasing number of applications demands effective and efficient support for retrieval in large collections of digital images. The work presented here is early-stage research focusing on the integration of text-based and content-based image retrieval. The main objective is to find a valid solution to the problem of reducing the so-called semantic gap, i.e. the lack of coincidence between the visual information contained in an image and the interpretation that a user can give of it. To address the semantic gap problem, we intend to use a combination of several approaches. Firstly, a link between low-level features and text descriptions is obtained by a semi-automatic annotation process, which makes use of shape prototypes generated by clustering. Precisely, the system indexes objects based on shape and groups them into a set of clusters, with each cluster represented by a prototype. Then, a taxonomy of objects described by both visual ontologies and textual features is attached to the prototypes, forming a visual description of a subset of the objects. The paper outlines the architecture of the system and briefly describes the algorithms underpinning the proposed approach.

Categories and Subject Descriptors: H [Information Storage and Retrieval]

General Terms: Image retrieval

Keywords: Content-based image retrieval, Semantic image retrieval
Appears in the Proceedings of the 1st Italian Information Retrieval Workshop (IIR’10), January 27–28, 2010, Padova, Italy. http://ims.dei.unipd.it/websites/iir10/index.html Copyright owned by the authors.

1. INTRODUCTION

By the end of the last century the question was no longer whether digital image archives are technically and economically viable, but rather how these archives can be made efficient and informative. The aim has been to develop intelligent and efficient human-computer interaction systems, enabling the user to access vast amounts of heterogeneous image sets stored in different sites and archives. Additionally, the continuously increasing number of people needing access to such collections further dictates that more emphasis be put on attributes such as the user-friendliness and flexibility of any multimedia content retrieval scheme.

The very first attempts at image retrieval were based on exploiting existing image captions to classify images according to predetermined classes or to create a restricted vocabulary [5]. Although relatively simple and computationally efficient, this approach has several restrictions, mainly deriving from the use of a restricted vocabulary that neither allows for unanticipated queries nor can be extended without re-evaluating the possible connection between each item in the database and each new addition to the vocabulary. Additionally, such keyword-based approaches assume either the pre-existence of textual annotations (e.g. captions) or that annotation using the predetermined vocabulary is performed manually. In the latter case, inconsistency of the keyword assignments among different indexers can also hamper performance. Recently, a methodology for computer-assisted annotation of image collections was presented [24].

To overcome the limitations of the keyword-based approach, the use of the visual content has been proposed, leading to Content-Based Image Retrieval (CBIR) approaches [6]. CBIR systems utilize the visual content of images to perform indexing and retrieval, by extracting low-level indexing features such as color, shape, and texture. In this case, pre-processing of images is necessary as the basis on which features are extracted. The pre-processing is of coarse granularity if it involves processing of images as a whole, whereas it is of fine granularity if it involves detection of objects within an image [1]. Then, relevant images are retrieved by comparing the low-level features of each item in the database with those of a user-supplied sketch or, more often, a key image that is either selected from a restricted image set or supplied by the user (query-by-example). Several approaches have appeared in the literature which perform visual querying by example, taking into account different facets of pictorial data to express the image contents, such as color [21], object shape [2], texture [14], or a combination of them [8, 18, 20]. Among these, search by matching shapes of image portions is one of the most natural ways to pose a query in image databases.

Though many sophisticated algorithms have been designed to describe color, shape, and texture features, these algorithms cannot adequately model image semantics. Indeed, extensive experiments on CBIR show that low-level contents often fail to describe the high-level semantic concepts in the user’s mind [25]. Also, CBIR systems have limitations when dealing with broad-content image databases [16]; indeed, in order to start a query, the availability of an appropriate key image is assumed; occasionally, this is not feasible, particularly for classes of images that are underrepresented in the database. Therefore, the performance of CBIR systems is still far from users’ expectations.

Summarizing, current indexing schemes for image retrieval employ descriptors ranging from low-level features to higher-level semantic concepts [23]. So far, significant work has been presented on unifying keywords and visual contents in image retrieval, and several hybrid methods exploiting both keywords and the visual content have been proposed [17, 12, 26]. Depending on how low-level and high-level descriptors are employed and/or combined, different levels of image retrieval can be achieved. According to [7], three levels of image retrieval can be considered:

• Level 1: Low-level features such as color, texture, shape or the spatial location of image elements are exploited in the retrieval process. At this level, the system supports queries like find pictures like this or find pictures containing blue squares.

• Level 2: Objects of a given type identified by low-level features are retrieved with some degree of logical inference. An example of a query is find pictures in which my father appears.

• Level 3: Abstract attributes associated to objects are used for retrieval. This involves a significant amount of high-level reasoning about the meaning of the objects or scenes depicted. An example of a query is find pictures of a happy woman.

Retrieval including both Level 2 and Level 3 is referred to as semantic image retrieval. The gap between Level 1 and Level 2 is known as the semantic gap, which is ”the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation” [19]. Retrieval at Level 3 is quite difficult, therefore current systems mostly perform retrieval at Level 2, which requires three fundamental steps: (1) extraction of low-level image features, (2) definition of proper similarity measures to perform matching, (3) reduction of the semantic gap. Clearly, step (3) is the most challenging one, since it requires providing a link between low-level features (visual data) and high-level concepts (semantic interpretation of visual data).

Currently, various approaches have been proposed to reduce the semantic gap between the low-level features of images and the high-level concepts that are understandable by humans. According to [11], they can be broadly grouped into four main categories:

• Use of ontologies [15]. Ontologies can be used to provide an explicit, simplified and abstract specification of knowledge about the domain of interest; this is obtained by defining concepts and relationships between them, according to the specific purpose of the considered problem. This approach exploits the possibility of simply deriving semantics from our daily language. Different descriptors can then be related to the low-level features of images in order to form a vocabulary that provides a qualitative definition of high-level query concepts. Finally, these descriptors can be mapped to high-level semantics, based on our knowledge. This approach works well with small databases containing specifically collected images. With large collections of images with various contents, more powerful tools are required to learn the semantics.

• Automatic image annotation [22]. This approach consists in exploiting supervised or unsupervised learning techniques to derive high-level concepts from images. In particular, supervised learning techniques are used to predict the values of a semantic category based on a set of training samples. However, supervised learning algorithms have a disadvantage strictly related to the nature of this kind of technique: they require a large amount of labeled data to provide effective learning results. This becomes a problem when the application domain changes and new labeled samples have to be provided. Clustering is the typical unsupervised learning technique used for retrieval purposes. In this approach, images are grouped on the basis of some similarity measure, so that a class label is associated to each derived cluster. Images in the same cluster are supposed to be similar to each other (i.e. to have similar semantic content). Thus, a new untagged image added to the database can be indexed by assigning it to the cluster that best matches it.

• Relevance feedback [13]. This approach concerns the possibility of learning the intentions of users and their specific needs by exploiting information obtained during their interactions with the system. In particular, when the system provides the initial retrieval results, the user judges these by indicating whether they are relevant or irrelevant (and possibly the degree of relevance/irrelevance). Then, a learning algorithm is used to learn from the user feedback, which is exploited in order to provide results that better satisfy the user’s needs.

• Generating semantic templates [27]. This method is based on the concept of a visual semantic template, which includes a set of icons or objects denoting a personalized view of concepts. Feature vectors of these objects are extracted for the query process. Initially, the user has to define the template of a concept by specifying, for example, the objects with their spatial and temporal constraints and the weights assigned to each feature of each object. Then, through the interaction with users, the system moves toward a set of queries that better express the concept in the user’s mind. Since this method requires the user to know the image features, it could be quite difficult for ordinary users.

Along with these state-of-the-art directions in the field of IR, in this paper we present the idea of an IR system supporting retrieval at Level 2. Precisely, we intend to provide a solution to the problem of the semantic gap in IR by designing a methodology based on a combination of several approaches, oriented to exploit both the visual and the semantic content of images. This is achieved making use of clustering and visual ontologies.
In the following, all the approaches underpinning the proposed IR methodology are briefly described and the architecture of the system is outlined.

2. OVERVIEW OF THE IR SYSTEM

The proposed system is intended to perform image retrieval by exploiting both the visual and the semantic content of images. As concerns the visual content, in this preliminary phase of the research we focus only on shape content. In fact, we aim to deal with specific-domain images containing objects that have a distinguishable shape meaning. Therefore, we assume that indexing and querying are based only on shape matching. The system will allow the user to query the image database not only by shape sketches and by keywords but also by “concepts describing shapes”. The general architecture of the proposed IR system is reported in fig. 1.

Figure 1: The system architecture (image collection; feature extraction and clustering; semi-automatic annotation; indexing by prototype matching; search engine with query processor, ontologies and user interface).

As can be seen, several tasks are carried out in order to derive the visual and textual features of the shapes contained in images. These tasks are:

1. Feature extraction: detecting shapes in images;

2. Clustering: grouping similar shapes into prototypes;

3. Semi-automatic annotation: associating keywords to prototypes;

4. Search.

In the following we describe how each task is carried out.

2.1 Feature extraction

In the proposed system, each image in the database is stored as a collection of the objects’ shapes contained in it. In order to be stored in the database, every image is processed to identify the objects appearing in it. Image processing starts with an edge detection process that extracts all contours in the image. Then, using the derived edges, a shape detection process is performed to identify the different objects included in the image and determine their contours. Finally, Fourier descriptors are computed on each contour and retained as visual signatures of the objects in a separate database.
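As a concrete illustration of the visual signatures just described, the sketch below computes Fourier descriptors from a closed contour. The specific normalization steps (discarding the DC term, keeping only coefficient magnitudes, dividing by the first harmonic) and the signature length k are standard options that we assume here for illustration; the paper does not fix them.

```python
import numpy as np

def fourier_descriptors(contour, k=16):
    """Compute a k-dimensional visual signature from a closed contour.

    contour: (N, 2) array of boundary points (x, y), e.g. produced by an
    edge-following step. Returns the magnitudes of the first k Fourier
    coefficients, normalized for translation, scale, rotation and
    starting-point invariance (assumes the first harmonic is non-zero).
    """
    z = contour[:, 0] + 1j * contour[:, 1]  # boundary as a complex sequence
    F = np.fft.fft(z)
    F[0] = 0.0          # drop the DC term -> translation invariance
    mag = np.abs(F)     # keep magnitudes -> rotation/start-point invariance
    mag = mag / mag[1]  # divide by the first harmonic -> scale invariance
    return mag[1:k + 1]  # low-order coefficients as the signature

# a square-like contour and a scaled, shifted copy yield the same signature
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
square = np.c_[np.sign(np.cos(t)), np.sign(np.sin(t))]
big = 3.0 * square + 10.0
```

These invariances are what make such signatures usable for the shape matching performed throughout the system.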
2.2 Clustering

Once all shapes have been detected in the images and represented as visual signature vectors, a set of shape prototypes is automatically defined by an unsupervised learning process that performs clustering on the visual signatures (Fourier descriptors) of the shapes, so as to categorize similar shapes into clusters. Each resulting cluster Ci is represented by a shape prototype pi, computed by averaging the visual signatures of all shapes belonging to the cluster. We intend to apply hierarchical clustering, in order to generate a hierarchy of prototypical shapes. Each node of the hierarchical tree is associated with one prototypical shape: root nodes of the tree represent general prototypes, intermediate nodes represent general shapes, and leaf nodes represent specific shapes.

During the interaction of the user with the system, the hierarchical tree is incrementally updated. Whenever a new shape is considered (i.e. each time a new image containing relevant object shapes is added to the database), we evaluate its matching against all existing prototypes, from root nodes to pre-leaf (final) nodes, according to a similarity measure defined on visual signatures. If the new shape matches a final prototype with a sufficient degree, then the corresponding prototype is updated by averaging the features of the shapes that belong to the corresponding cluster [10]. Otherwise, a new prototype is created, corresponding to the new shape.

The use of shape prototypes, which represent an intermediate level of visual signatures, facilitates the subsequent tasks 3. and 4. Firstly, prototypes facilitate the annotation process, since only a reduced number of shapes (the prototypical ones) need to be manually annotated. Secondly, the use of prototypes simplifies the search process. Indeed, since only a small number of objects is likely to match any single user query, a large number of unnecessary comparisons is avoided during search by performing matching with shape prototypes rather than with specific shapes. In other words, prototypes act as a filter that quickly reduces the search space while discriminating the objects.
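The incremental prototype maintenance described above can be sketched as follows. This is a deliberately flat simplification: the hierarchical tree of prototypes is omitted, and the matching threshold is a free parameter introduced here only for illustration.

```python
import numpy as np

class PrototypeIndex:
    """Flat sketch of the incremental prototype maintenance step.

    Each prototype is the running mean of the visual signatures assigned to
    its cluster; `threshold` (an assumed parameter) decides whether a new
    shape joins an existing cluster or founds a new prototype. The
    hierarchical variant described in the text would organize these
    prototypes in a tree from general (root) to specific (leaf) shapes.
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.prototypes = []  # one mean signature per cluster
        self.counts = []      # cluster sizes, for the running averages

    def add(self, signature):
        """Assign a new shape to the best-matching prototype, or create one."""
        sig = np.asarray(signature, dtype=float)
        if self.prototypes:
            dists = [np.linalg.norm(sig - p) for p in self.prototypes]
            i = int(np.argmin(dists))
            if dists[i] <= self.threshold:
                # sufficient match: update the prototype as the new cluster mean
                self.counts[i] += 1
                self.prototypes[i] += (sig - self.prototypes[i]) / self.counts[i]
                return i
        # no prototype matches well enough: the new shape founds its own cluster
        self.prototypes.append(sig.copy())
        self.counts.append(1)
        return len(self.prototypes) - 1
```

Calling `add` with a signature returns the index of the cluster the shape was filed under, mirroring how a newly inserted image is either absorbed by an existing prototype or generates a new one.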
2.3 Semi-automatic annotation

Once the shape prototypes have been derived, a semi-automatic annotation process is applied to associate text descriptions to the identified object shapes. The process is semi-automatic since it involves manual annotation only for prototypes: shapes immediately attached in the hierarchy are automatically annotated, since they inherit the descriptions of their prototypes.

Every semantic class that is of interest in the considered image domain (e.g., for ours, glasses, bottles, etc.) will be described by a visual ontology (VO), which is intended as a textual description, made of concepts and relationships among them, of the visual content of a prototypical shape [9, 4]. We intend the lexicon used to define the VOs to be as intuitive as possible, so as to evoke the particular shape it describes. We plan for the system to be supplied with a basic set of domain-dependent VOs, one for each considered semantic class.

Of course, different prototypical shapes may convey the same semantic content (e.g., several different shapes may convey the concept of glass). We consider such prototypes to belong to the same semantic class. Shape prototypes belonging to the same semantic class will share roughly the same VO structure, obviously with the appropriate differences. As an illustrative example, we sketch some possible relationships included in a VO that refers to the semantic class glass:

• wine glass IS SPECIALIZATION OF glass;

• bottom IS PART OF wine glass;

• wavy shape IS PROPERTY OF bottom.
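Purely as an illustration, the glass example above could be encoded as plain subject-relation-object triples, with annotations inherited from prototype to shape; the data structures and identifiers below are our own and not part of the proposed system.

```python
# A visual ontology (VO) for the semantic class "glass", written as
# subject-relation-object triples mirroring the bullet list above.
glass_vo = [
    ("wine glass", "IS_SPECIALIZATION_OF", "glass"),
    ("bottom", "IS_PART_OF", "wine glass"),
    ("wavy shape", "IS_PROPERTY_OF", "bottom"),
]

# Hypothetical prototype table: prototype id -> (semantic class, VO).
prototypes = {"p_glass_01": ("glass", glass_vo)}

def annotate(shape_id, prototype_id, index):
    """Semi-automatic annotation: a shape attached to a prototype inherits
    the prototype's semantic class and VO without any manual labeling."""
    semantic_class, vo = prototypes[prototype_id]
    index[shape_id] = {"class": semantic_class, "vo": vo}
    return index[shape_id]

shape_index = {}
record = annotate("shape_42", "p_glass_01", shape_index)
```

Only the prototype entry needs manual work; every shape filed under it gets its description for free, which is exactly what makes the annotation effort proportional to the number of prototypes rather than the number of shapes.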
The combined use of prototypes and VOs provides a powerful mechanism for the automatic annotation of shapes. Every time the user adds a new shape to the database, the system associates the shape to the most similar prototype, which is related to a semantic class and linked to a VO. Thus the new shape inherits, in an automatic fashion, all the semantic descriptions associated to the selected prototype. Then, feedback from the user is considered: the user may accept the choice operated by the system, or reject it. In the latter case, there are two possibilities: the user can select the proper prototype with the related VO from the existing ones, or, if none can be associated to the shape, the user can create a new prototype (using the new shape) and manually annotate it by modifying the VO previously assigned, incorrectly, by the system.

2.4 Search

The engine mechanism is designed to allow users to submit sketch-based, text-based and concept-based queries.

The results of the sketch-based search emerge from a matching between the submitted sample shape and the created prototypes. Precisely, when the user presents a query in the form of an object sketch, the system formulates the query by performing feature extraction, translating that object into a shape model. The extracted query feature is used to compute the similarity between the query and the prototypes first. This is done by considering shapes as points of a feature space. Having characterized each shape as a vector of Fourier descriptors, we simply evaluate the dissimilarity between two shapes in terms of the Euclidean distance between the two vectors of descriptors. Of course, other similarity measures could be considered, encapsulating the human perception of shape similarity (an interesting issue that we would like to investigate further in the future). After sorting the prototypes in terms of similarity, the system returns the images containing objects indexed by the prototypes with the highest similarities.

The results of the text-based search emerge from a matching between the submitted textual query and the textual descriptions associated to prototypes. Namely, when a query is formulated in terms of keywords, the system simply returns the images including the objects indexed by the prototypes labeled with those keywords. As before, high-matching prototypes are selected to provide the shapes to be visualized as search results.

Finally, when both visual and textual content are exploited by the user querying the image database, the images returned by the two approaches separately are merged into a single output set.

3. FIRST STEPS TOWARD THE SYSTEM DEVELOPMENT

In this preliminary phase of the research, only the main functions for tasks 1. and 4. described above have been implemented in the system. For tests during the development of the system, we considered an image database from the art domain. The database, used in other IR works [3], includes digitalized images representing still-object paintings by the Italian artist Giorgio Morandi.

As concerns task 1., various image processing tools that are necessary to extract shape features from the image objects have been developed, including edge detection methods as well as enhancement and reconstruction functionalities. Basic image processing methods were included from the ImageJ image analysis software (http://rsbweb.nih.gov/ij), such as thresholding and edge-detection methods (e.g. Canny, Prewitt and Sobel) for the automatic detection of object boundaries lying in images. Having the possibility to act on contrast and brightness properties, the user can adjust the image appearance to refine the extraction of the shapes of objects. The shape identification is performed automatically through an edge-following algorithm. When the result of shape identification is not satisfying, the user is given the possibility to correct boundaries or to manually draw boundaries directly on the image.

As concerns task 4., the retrieval graphical interface, which enables users to query the system and to inspect the search results (fig. 2), has been developed. Also, the computation of the Euclidean dissimilarity measure for shape prototype matching has been included in the system. Currently, the system also provides the interfaces for browsing the database and inserting new images.

Figure 2: An initial search engine interface.
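A minimal sketch of the search mechanism described above, under the same assumptions as before: prototypes are ranked by Euclidean distance between Fourier-descriptor vectors, the images indexed by the closest prototypes are returned, and sketch-based and text-based result lists are merged into a single output set. All identifiers here are hypothetical.

```python
import numpy as np

def sketch_query(query_signature, prototypes, images_by_prototype, top=3):
    """Rank prototypes by Euclidean distance between Fourier-descriptor
    vectors and return the images indexed by the closest prototypes."""
    q = np.asarray(query_signature, dtype=float)
    ranked = sorted(prototypes, key=lambda pid: np.linalg.norm(q - prototypes[pid]))
    results = []
    for pid in ranked[:top]:
        results.extend(images_by_prototype.get(pid, []))
    return results

def combined_query(sketch_results, text_results):
    """Merge the sketch-based and text-based result lists into a single
    output set, preserving order and dropping duplicates."""
    seen, merged = set(), []
    for img in sketch_results + text_results:
        if img not in seen:
            seen.add(img)
            merged.append(img)
    return merged

# toy index with two prototypes and one indexed image each
protos = {"cup": np.array([1.0, 0.0]), "bottle": np.array([0.0, 1.0])}
images = {"cup": ["img1"], "bottle": ["img2"]}
```

Matching against the handful of prototypes instead of every stored shape is what keeps the comparison count low, as discussed for the clustering step.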
4. CONCLUSIONS

In this paper a preliminary proposal of an IR system has been presented. The system is intended to address the problem of the semantic gap by exploiting clustering and visual ontologies. The use of a visual ontology is motivated by the necessity of reproducing the capacity of a human to describe her visual perception by means of the visual concepts she possesses. From the human-computer interaction point of view, visual ontologies provide a bridge between the low-level features of images and the visual representation of the semantics contained in images. Compared to a symbolized ontology, visual ontologies can represent complex image knowledge in a more detailed and intuitive way, so that no expert knowledge is needed to process a complicated knowledge representation of images. The binding created by visual ontologies between image objects and their descriptions enables the proposed IR system to perform conceptual reasoning on the collection of images, also when dealing with pure content-based queries. Thus, different forms of retrieval become possible with the proposed system:

1. text-based: queries are lexically motivated, i.e. they express objects by their names (keywords);

2. content-based: queries are perceptually motivated, i.e. they express objects by their visual appearance;

3. semantic retrieval: queries are semantically motivated, since they express objects by their intended meaning, i.e. in terms of concepts and their relationships.

Currently, we are continuing to develop the proposed IR system. To this aim, we are looking for the most appropriate clustering algorithm to derive significant shape prototypes and analyzing methods to create visual ontologies.

5. REFERENCES

[1] W. Al-Khatib, Y. F. Day, A. Ghafoor, and P. B. Berra. Semantic modeling and knowledge representation in multimedia databases. IEEE Transactions on Knowledge and Data Engineering, 11(1):64–80, 1999.
[2] S. Arivazhagan, L. Ganesan, and S. Selvanidhyananthan. Image retrieval using shape features. International Journal of Imaging Science and Engineering (IJISE), 1(3):101–103, 2007.
[3] A. D. Bimbo and P. Pala. Visual image retrieval by elastic matching of user sketches. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:121–132, 1997.
[4] M. Bouet and M.-A. Aufaure. Multimedia Data Mining and Knowledge Discovery, chapter New Image Retrieval Principle: Image Mining and Visual Ontology, pages 168–184. Springer, 2007.
[5] S. Christodoulakis, M. Theodoridou, F. Ho, M. Papa, and A. Pathria. Multimedia document presentation, information extraction, and document formation in MINOS: a model and a system. ACM Trans. Inf. Syst., 4(4):345–383, 1986.
[6] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Image retrieval: Ideas, influences, and trends of the new age. ACM Comput. Surv., 40(2):1–60, 2008.
[7] J. Eakins and M. Graham. Content-based image retrieval. University of Northumbria Technical Report, 1999.
[8] A. K. Jain and A. Vailaya. Image retrieval using color and shape. Pattern Recognition, 29:1233–1244, 1996.
[9] S. Jiang, T. Huang, and W. Gao. An ontology-based approach to retrieve digitized art images. In WI ’04: Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence, pages 131–137, Washington, DC, USA, 2004. IEEE Computer Society.
[10] K.-M. Lee and W. Street. Cluster-driven refinement for content-based digital image retrieval. IEEE Transactions on Multimedia, 6(6):817–827, 2004.
[11] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma. A survey of content-based image retrieval with high-level semantics. Pattern Recogn., 40(1):262–282, 2007.
[12] Y. Lu, C. Hu, X. Zhu, H. Zhang, and Q. Yang. A unified framework for semantics and feature based relevance feedback in image retrieval systems. In MULTIMEDIA ’00: Proceedings of the eighth ACM international conference on Multimedia, pages 31–37, New York, NY, USA, 2000. ACM.
[13] S. MacArthur, C. Brodley, and C.-R. Shyu. Relevance feedback decision trees in content-based image retrieval. In IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL ’00), pages 68–72, 2000.
[14] B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell., 18(8):837–842, 1996.
[15] V. Mezaris, I. Kompatsiaris, and M. Strintzis. An ontology approach to object-based image retrieval. In ICIP 2003, volume II, pages 511–514, 2003.
[16] R. Mojsilovic and B. Rogowitz. Capturing image semantics with low-level descriptors. In Proc. of ICIP, pages 18–21, 2001.
[17] M. Naphade, T. Kristjansson, B. Frey, and T. Huang. Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems. In Proc. of the International Conference on Image Processing, volume 3, page 536, 1998.
[18] P. Pala and S. Santini. Image retrieval by shape and texture. Pattern Recognition, 32:517–527, 1999.
[19] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell., 22(12):1349–1380, 2000.
[20] J. R. Smith and S. Chang. Local color and texture extraction and spatial query. In Proc. of IEEE Int. Conf. on Image Processing, volume 3, pages 1011–1014, Sep 1996.
[21] J. R. Smith and S.-F. Chang. Tools and techniques for color image retrieval. In IS&T/SPIE Proceedings, Storage & Retrieval for Image and Video Databases, volume 2670, pages 426–437, 1996.
[22] A. Vailaya, M. Figueiredo, A. Jain, and H. Zhang. Image classification for content-based indexing. IEEE Transactions on Image Processing, 10(1):117–130, 2001.
[23] A. Yoshitaka and T. Ichikawa. A survey on content-based retrieval for multimedia databases. IEEE Trans. on Knowl. and Data Eng., 11(1):81–93, 1999.
[24] C. Zhang and T. Chen. An active learning framework for content-based information retrieval. IEEE Transactions on Multimedia, 4:260–268, 2002.
[25] X. S. Zhou and T. S. Huang. CBIR: from low-level features to high-level semantics. Image and Video Communications and Processing 2000, 3974(1):426–431, 2000.
[26] X. S. Zhou and T. S. Huang. Unifying keywords and visual contents in image retrieval. IEEE MultiMedia, 9(2):23–33, 2002.
[27] Y. Zhuang, X. Liu, and Y. Pan. Apply semantic template to support content-based image retrieval. In Storage and Retrieval for Media Databases, volume 3972, pages 442–449, 1999.