=Paper=
{{Paper
|id=Vol-1173/CLEF2007wn-ImageCLEF-TushabeEt2007
|storemode=property
|title=Content-based Image Retrieval Using Shape-Size Pattern Spectra
|pdfUrl=https://ceur-ws.org/Vol-1173/CLEF2007wn-ImageCLEF-TushabeEt2007.pdf
|volume=Vol-1173
|dblpUrl=https://dblp.org/rec/conf/clef/TushabeW07a
}}
==Content-based Image Retrieval Using Shape-Size Pattern Spectra==
Content-based Image Retrieval Using Shape-Size Pattern Spectra

Florence Tushabe and Michael H. F. Wilkinson
University of Groningen, The Netherlands
(m.h.f.wilkinson, florence)@cs.rug.nl

Abstract

This paper presents the results of using the shape, size and color features of an image for content-based image retrieval. Granulometries are used to model the size and shape of the connected components of the image. A granulometry is computed by successively sieving an image with filters of increasing size parameter, so that information is obtained about the components that filter through. A shape-size granulometry, which uses sieves of increasing shape and size parameters, represents the shape and size distributions of an image. The results of the granulometry are stored in a 2-D pattern spectrum that is implemented using attribute thinnings and openings. Additionally, the pattern spectra are extracted from the red, green and blue color bands of the images, concatenated, and compared using the L1 norm and the Jensen-Shannon divergence (JSD). Our results show that incorporating color information improves performance by 23% and that the L1 norm outperforms JSD by 20%.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; I.5 [Pattern Recognition]: I.5.3 Clustering—Similarity Measures; G.2 [Discrete Mathematics]: G.2.2 Graph Theory

General Terms

Measurement, Performance, Experimentation

Keywords

Max-tree, pattern spectrum, shape-size pattern spectra, granulometry, attribute filtering, shape representation, IAPR TC-12

1 Introduction

The most recent advances in image retrieval show that text-based methods still outperform content-based ones by approximately 30% [3]. Yet the question "What sentence can describe an image?" is still being researched by linguists, computer scientists and other academics. This is because different people describe an image quite differently, depending on their interests and prior knowledge. Some users look at an image in terms of the shape of its components, others the size, color, texture, orientation or spatial location. In theory, using the features of an image as its signature presents users with a precise and focused way of describing an image according to their interests. User satisfaction is enhanced when one is presented with a variety of ways to define one's need, because the retrieval engine is customised according to the selected choice or requirement. This study retrieves images based upon the shapes of their components by using mathematical morphology techniques. Attribute filtering is used to identify subsets of the image that satisfy the requirements of a particular property or attribute. For example, attributes such as moments and length (or width / volume) are useful for spatial and size information respectively [10]. The selected components are then used for further analysis, such as the formation of a pattern spectrum. Pattern spectra have been used to obtain information about the contents and distribution of an image, including its spatial location [10, 9, 5]. We present the results of using pattern spectra for content-based image retrieval in large-scale databases. Our method is an extension of the one proposed by Urbach et al. [8], combined with color information and applied within the image retrieval context.
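To make the idea of attribute filtering concrete, the following minimal sketch (not from the paper; the function name, the threshold value and the toy image are hypothetical) removes the connected components of a binary image whose area attribute falls below a threshold. The surviving components are the kind of filtered subsets that can then feed further analysis such as a pattern spectrum.

```python
# Hypothetical illustration of attribute filtering: keep only connected
# components of a binary image whose area is at least min_area.
import numpy as np
from scipy import ndimage

def area_attribute_filter(binary_image, min_area):
    """Keep only connected components with area >= min_area."""
    labels, num = ndimage.label(binary_image)   # label connected components
    areas = ndimage.sum(binary_image, labels, index=np.arange(1, num + 1))
    keep = np.zeros(num + 1, dtype=bool)
    keep[1:] = areas >= min_area                 # label 0 is the background
    return keep[labels]

if __name__ == "__main__":
    img = np.zeros((10, 10), dtype=bool)
    img[1:3, 1:3] = True    # small 2x2 component (area 4)
    img[4:9, 4:9] = True    # larger 5x5 component (area 25)
    filtered = area_attribute_filter(img, min_area=10)
    print(int(filtered.sum()))   # 25: only the large component survives
```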
The objective was to identify the 1000 most similar images from the 20,000-image IAPR TC-12 database by using a purely visual, fully automated modality [2]. Section 2 briefly describes the theory of our method, the experiments and results are contained in Sections 3 and 4, and Section 5 has the concluding remarks and on-going work.

2 2-D Shape-Size Pattern Spectra

A pattern spectrum is a quantitative description of the contents of an image after the application of a granulometric operation [5]. A granulometry successively sieves an image using filters of increasing size parameter r, so that information can be obtained about the components that satisfy the properties associated with r. A size granulometry obtains the size distribution of an image by using sieves of increasing sizes, herein represented by area. Let X and Y represent images; then a size granulometry is a family of filters {Γr} with r from some totally ordered set Λ (usually Λ ⊂ R or Z) satisfying the properties:

    Γr(X) ⊆ X,    (1)
    X ⊆ Y ⇒ Γr(X) ⊆ Γr(Y),    (2)
    Γs(Γr(X)) = Γmax(r,s)(X),    (3)

for all r, s ∈ Λ. Equations (1), (2) and (3) state that Γr is anti-extensive, increasing and idempotent, respectively. Attribute openings are size granulometries since they share the same properties [1, 10]. The pattern spectrum sΓ(X), obtained by applying the size granulometry {Γr} to a binary image X, is defined by [5] as

    (sΓ(X))(u) = − dA(Γr(X))/dr |r=u,    (4)

where A(X) is the area of X. Similarly, a shape granulometry filters image contents using sieves of different shapes [8, 9]. Urbach and Wilkinson [9] defined a shape granulometry of X as a family of filters {Φr} with shape parameter r from some totally ordered set Λ (usually Λ ⊂ R or Z) with the following properties:

    Φr(X) ⊆ X,    (5)
    Φr(tX) = t(Φr(X)),    (6)
    Φs(Φr(X)) = Φmax(r,s)(X),    (7)

for all r, s ∈ Λ and t > 0, in which tC stands for a scaling of set C by a factor t. Equations (5), (6) and (7) state that Φr is anti-extensive, scale invariant and idempotent, respectively. Therefore attribute thinnings are shape granulometries, since they share the same properties, including the non-increasing criterion [8]. The shape pattern spectrum sΦ(X) is obtained by applying the shape granulometry {Φr} to a binary image X and is defined as [8]

    (sΦ(X))(u) = − dA(Φr(X))/dr |r=u.    (8)

[Figure 1: Peak components Phk of a 1-D signal, the corresponding nodes Chk, the resultant max-tree, and the corresponding 2-D pattern spectrum.]

In our case we use a shape granulometry based on the ratio I(C)/A²(C), with I(C) the moment of inertia of C and A(C) its area. This shape parameter is equivalent to Hu's first moment invariant [8]. The final feature vector, resulting from the combination of size and shape granulometries, forms the 2-D shape-size pattern spectrum that we used as shape descriptor. Further details about this are found in [8].
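As a concrete illustration of equations (4) and (8), the sketch below (not the paper's implementation; the brute-force threshold loop, the helper names and the bin-edge parameters are assumptions made for illustration) accumulates the area of every connected component of every threshold set of a grey-scale image into a 2-D (size class, shape class) histogram, using the same I(C)/A²(C) shape attribute. For images with integer grey values it should accumulate essentially the same totals as the max-tree procedure described in Section 2.0.1 (up to the treatment of the background/root level), only far less efficiently.

```python
# Brute-force sketch of a 2-D shape-size pattern spectrum (illustrative only):
# threshold the image at every grey level, treat each connected component of the
# threshold set as a peak component, and add its area to a (size, shape) bin.
import numpy as np
from scipy import ndimage

def moment_of_inertia(ys, xs):
    """I(C): sum of squared distances of the component's pixels to their centroid."""
    return float(np.sum((ys - ys.mean()) ** 2 + (xs - xs.mean()) ** 2))

def shape_size_spectrum(image, size_edges, shape_edges):
    """2-D pattern spectrum S[size class, shape class]; bin edges are assumed inputs."""
    S = np.zeros((len(size_edges) - 1, len(shape_edges) - 1))
    for h in range(1, int(image.max()) + 1):
        # Connected components of the threshold set {x : image(x) >= h}
        # are the peak components at grey level h.
        labels, num = ndimage.label(image >= h)
        for k in range(1, num + 1):
            ys, xs = np.nonzero(labels == k)
            area = float(len(ys))
            shape = moment_of_inertia(ys, xs) / area ** 2   # I(C)/A^2(C)
            r = np.searchsorted(size_edges, area, side="right") - 1
            s = np.searchsorted(shape_edges, shape, side="right") - 1
            if 0 <= r < S.shape[0] and 0 <= s < S.shape[1]:
                S[r, s] += area   # this level's grey-level contribution
    return S
```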
2.0.1 Computing the shape-size pattern spectra

The same method as was used in [8] has been adopted for application within CBIR. The max-tree approach [6, 7] was used to implement the attribute thinnings and openings in grey scale. The subsets (connected components) of the image are arranged into a tree structure and filtered by removing the nodes that do not satisfy a given criterion. A max-tree is a rooted tree in which each of its nodes, Chk, at grey level h corresponds to a peak component, Phk [6]. An example is shown in Figure 1, which illustrates the peak components Phk of a 1-D signal, the corresponding Chk at levels h = 0, 1, 2, 3, the resultant max-tree and the corresponding spectrum.

Let {Γr} be a size distribution with r from some finite set Λr and {Φs} a shape distribution with s from some index set Λs. If S is the 2-D array that stores the final 2-D spectrum, then each cell S(r, s) contains the sum of the grey levels of the nodes Chk whose size falls between the class boundaries r− and r and whose shape falls between s− and s. The 2-D pattern spectrum is then computed from the max-tree as follows:

• Set all elements of the array S to zero.
• Compute the max-tree according to the algorithm in [7].
• As the max-tree is built, compute the area A(Phk) and moment of inertia I(Phk) of each node.
• For each node Chk:
  – compute the size class r from the area A(Phk);
  – compute the shape class s from I(Phk)/A²(Phk);
  – compute the grey-level difference δh between the current node and its parent;
  – add the product of δh and A(Phk) to S(r, s).

3 Experiments

The objective of our experiments was: given a sample image, find as many relevant images as possible from the IAPR TC-12 photographic collection [3]. The procedure that was undertaken is as follows.

[Figure 2: Performance of component sizes (MAP for different component-size classes).]

1. Images are first pre-processed by filtering off all components whose area is less than 30% of the total image size. This was done after comparisons with other size ranges indicated that the discriminative power lies in the large particles. Figure 2 gives detailed MAP results for the different class sizes: the smaller particles performed worse than the larger ones, and using all particles without any filtering performs worse than using only particles above 30% or 75% of the total area.

2. The shape-size pattern spectrum is then extracted from all images, including the query images. The size and shape ranges were 50,000−172,800 and 1−53 respectively. A 20 by 15 bin histogram, which translates into a 1 × 600 array representation per image, was chosen. Additionally, the images were converted into the XYZ and YUV color spaces as well as grayscale (PGM), to test whether additional information is captured from the color matrices. Visual analysis showed that, unlike YUV and XYZ, which performed worse than the grayscale images, the RGB representation improved the results.

3. One of the three topic images was selected as the query image, according to which one the authors thought provided the most meaningful retrievals. The list of the chosen topic images is provided in the appendix.

4. The similarity measure used is the reciprocal of the L1 norm:

       Sim(X, Y) = 1 / Σi |xi − yi|.    (9)

3.1 Submitted Runs

Four runs were submitted:

1. PGM2007 - the shape-size pattern spectra of the grayscale images are compared.
2. Max2007 - the maximum of the red, green and blue bands of the images is compared.
3. Concatenate2007 - the concatenated red, green and blue bands of the images are compared (a 1 × 1800 array).
4. JSD2007 - the same as (3) above, but compared using the Jensen-Shannon divergence [4] (both similarity measures are sketched after Table 1).

Table 1: Performance of the submitted runs

Run           MAP      P20      Relevant   % improvement
Concatenate   0.0337   0.1050   724        23
Max           0.0288   0.0933   699        5.5
JSD           0.0281   0.0925   701        2.9
PGM           0.0273   0.0925   693        -
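The two comparison measures used in the runs, the reciprocal of the L1 norm from equation (9) and the Jensen-Shannon divergence [4], can be sketched as follows. This is an illustrative sketch rather than the submitted system; the epsilon guard, the normalisation of the spectra inside the JSD, and the toy vectors are assumptions of the sketch.

```python
# Illustrative comparison of two concatenated RGB pattern-spectrum vectors
# with the reciprocal L1 similarity (equation 9) and the Jensen-Shannon divergence.
import numpy as np

def l1_similarity(x, y, eps=1e-12):
    """Reciprocal of the L1 norm of the difference (higher = more similar)."""
    return 1.0 / (np.abs(x - y).sum() + eps)

def jensen_shannon_divergence(x, y, eps=1e-12):
    """JSD between the spectra, treated as discrete probability distributions."""
    p = x / (x.sum() + eps)          # normalise to probability vectors
    q = y / (y.sum() + eps)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0                 # 0 * log 0 contributes nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy usage with two random 1 x 1800 concatenated spectra:
rng = np.random.default_rng(0)
a, b = rng.random(1800), rng.random(1800)
print(l1_similarity(a, b), jensen_shannon_divergence(a, b))
```

Note that the JSD requires the spectra to be normalised into distributions, whereas the reciprocal L1 similarity compares the raw vectors directly; Section 4 suggests this normalisation as a possible cause of the JSD run's lower scores.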
4 Results

We observed that most queries returned images based on their shapes. For example, an image of a hill returns images of hills and mountains, and the same holds for images of people, beaches, skies, buildings and so on. An image with more buildings than people returns more results of buildings (some of them without people) than of people. A group of people standing in front of a mountain landscape returns groups of people in front of buildings, mountains, in a city square or anywhere else; but if the mountains cover a larger percentage of the picture, then the returned results contain more mountains than people. Similarly, a duck swimming in water returns many images of birds flying in the sky and hens eating on a road, because the shapes of ducks, birds and hens are similar. The texture of the image also affects the performance of this method. For example, the image of a bird flying in a clear sky returns many images of birds or aeroplanes flying in the sky, but one of a bird flying in a cloudy sky returns pictures of beds, clouds without birds, mountains covered in snow and some beach pictures. One of the poorest performances was for the meat dishes, which returned buildings, landscapes, beds and people, none of them containing food.

The overall performance of the submitted runs is shown in Table 1. The RGB concatenated version using the L1 norm produces the best results and improves performance by 23%. As expected, all methods that incorporate color information perform better than the one without (PGM). This shows that more information has been obtained from the color matrices, so that images with similar colors are grouped closer together. For example, images taken at night return more black pictures, just as the ones taken in deserts return more brown ones. Interestingly, images with two dominant colors seemed to outperform those with only one dominant color or with many colors. Table 1 also gives the surprising result that the maximum of the RGB bands using the L1 norm outperformed the concatenated version using JSD. This could be due to the loss of information during normalization. It clearly shows that the use of an appropriate similarity measure contributes significantly to the performance of our system.

5 Conclusion

This paper uses the shape-size pattern spectrum as the fundamental image signature for content-based image retrieval. The pattern spectrum is obtained from the red, green and blue color bands of the image and compared using the L1 norm. Our experiments have shown that this produces very logical and interesting retrievals, although the MAP scores remain low. We believe that one way of improving the MAP scores is to represent a single topic by the combination of the three provided sample images. This can be achieved by using relevance learning techniques already in place. Secondly, the results have shown that textural information, especially within the background, affects the performance of this system. We intend to incorporate both textural and spatial information into the pattern spectrum. Other work in progress includes identifying the most appropriate shape parameters, attributes and similarity measures.

References

[1] E. J. Breen and R. Jones. Attribute openings, thinnings and granulometries. Computer Vision and Image Understanding, 64(3):377–389, 1996.

[2] M. Grubinger, P. Clough, and L. Clement. The IAPR TC-12 benchmark for visual information search. IAPR Newsletter, 28(2):10–12, 2006.

[3] M. Grubinger, P. Clough, A. Hanbury, and H. Müller. Overview of the ImageCLEF 2007 photographic retrieval task. In Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.

[4] J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151, 1991.
[5] P. Maragos. Pattern spectrum and multiscale shape representation. IEEE Trans. Pattern Anal. Mach. Intell., 11(7):701–715, 1989.

[6] A. Meijster and M. H. F. Wilkinson. A comparison of algorithms for connected set openings and closings. IEEE Trans. Pattern Anal. Mach. Intell., 24(4):484–494, 2002.

[7] P. Salembier, A. Oliveras, and L. Garrido. Antiextensive connected operators for image and sequence processing. IEEE Trans. Image Proc., 7(4):555–570, 1998.

[8] E. R. Urbach, J. B. T. M. Roerdink, and M. H. F. Wilkinson. Connected shape-size pattern spectra for rotation and scale-invariant classification of gray-scale images. IEEE Trans. Pattern Anal. Mach. Intell., 29(2):272–285, 2007.

[9] E. R. Urbach and M. H. F. Wilkinson. Shape-only granulometries and grey-scale shape filters. In Proc. Int. Symp. Math. Morphology (ISMM) 2002, Sydney, Australia, April 2002.

[10] M. H. F. Wilkinson. Generalized pattern spectra sensitive to spatial information. In Proceedings of the 16th International Conference on Pattern Recognition, volume 1, pages 701–715, Quebec City, Canada, August 2002.

Appendix - Selected topic images and corresponding MAP

Topic  Image             MAP     Topic  Image             MAP
1      topics/01/3793    0.0019  31     topics/31/3616    0.001
2      topics/02/16432   0       32     topics/32/14973   0.0016
3      topics/03/31      0.028   33     topics/33/32317   0.0703
4      topics/04/23221   0.0012  34     topics/34/11601   0.0189
5      topics/05/30823   0.1956  35     topics/35/25004   0.1554
6      topics/06/37754   0.0655  36     topics/36/13033   0.0192
7      topics/07/15515   0.0455  37     topics/37/16336   0.0027
8      topics/08/1541    0.0134  38     topics/38/18059   0.0288
9      topics/09/1082    0       39     topics/39/14560   0.0002
10     topics/10/4072    0.0089  40     topics/40/11645   0.0037
11     topics/11/37367   0.0094  41     topics/41/2721    0.0224
12     topics/12/31646   0.069   42     topics/42/39273   0.0033
13     topics/13/26719   0.0421  43     topics/43/23909   0.0364
14     topics/14/31609   0.0665  44     topics/44/31871   0.0055
15     topics/15/2125    0.2106  45     topics/45/12477   0.0001
16     topics/16/38220   0.0185  46     topics/46/31312   0.0085
17     topics/17/33458   0.0113  47     topics/47/35974   0.001
18     topics/18/38195   0.0184  48     topics/48/35705   0.0491
19     topics/19/35791   0.009   49     topics/49/40214   0.0594
20     topics/20/38894   0.0022  50     topics/50/7301    0.003
21     topics/21/26437   0.0003  51     topics/51/23792   0.0058
22     topics/22/31715   0.0694  52     topics/52/32350   0.0029
23     topics/23/37505   0.0128  53     topics/53/2174    0.0601
24     topics/24/34159   0.0061  54     topics/54/35879   0.1016
25     topics/25/37874   0.0535  55     topics/55/14097   0.2525
26     topics/26/17051   0.0019  56     topics/56/4614    0.0362
27     topics/27/32195   0.0599  57     topics/57/32629   0.0036
28     topics/28/16432   0       58     topics/58/31189   0.003
29     topics/29/32759   0.0065  59     topics/59/17327   0.0133
30     topics/30/17057   0.0016  60     topics/60/27037   0.0007