Image Recognition Using Kullback-Leibler Information Discrimination Andrey Savchenko, National Research University Higher School of Economics, B. Pecherskaya St. 25/12, 603155 Nizhniy Novgorod, Russian Federation avsavchenko@hse.ru Abstract. The problem of automatic image recognition based on the minimum information discrimination principle is formulated and solved. Color histograms comparison in the Kullback–Leibler information metric is proposed. It’s combined with method of directed enumeration alternatives as opposed to complete enumeration of competing hypotheses. Results of an experimental study of the Kullback-Leibler discrimination in the problem of face recognition with a large database are presented. It is shown that the proposed algorithm is characterized by increased accuracy and reliability of image recognition. Keywords: Image recognition, method of directed enumeration alternatives, Kullback-Leibler information discrimination 1 Introduction The well-known challenging problem of the image recognition [1, 2] is processing of large image databases [3, 4, 5]. Traditional pattern recognition methods [6] based on exhaustive search can’t be implemented in real-time applications. For this case a method of directed enumeration alternatives (NDEA) [3] has been proposed as opposed to the traditional method of complete enumeration of competing hypotheses [6]. MDEA can be exploited with different metrics, but the efficiency of that method highly depends on the quality of applied metric in terms of given image database [3]. Hence the choice of metric to compare images becomes quite significant [8]. One of the most perspective methods of image recognition is based on image color histogram comparison [9], [10]. The histogram-based methods are very suitable for color image recognition, because such methods are unaffected by geometrical characteristics of the images, such as translation and rotation [10]. It’s shown [11] that histogram-based methods could provide quality comparable with special methods based on specific information (i.e. methods for faces recognition [12]). In this paper, we propose a novel histogram-based method which applies minimum discrimination information (MDI) principle [13]. It is known to be an effective tool for solving various problems of pattern recognition. Meanwhile, its capabilities have not been fully exploited. In particular, almost no studies have addressed the advantages of the MDI principle over traditional methods and approaches in problems of automatic image recognition, especially, for half-tone images as one of the most complex cases in the theory and practice of pattern recognition [2, 3]. The present paper seeks to fill this gap. The rest of the paper is organized as follows: Section 1 introduces image recognition problem and the usage of minimum discrimination information principle and Kullback-Leibler discrimination [14] to compare color histograms. In Section 3, we present metric properties of proposed decision statistics. In Section 4, we introduce new modification of MDEA [3], which can be used to reduce the amount of calculations needed for recognition. In Section 5, we present the experimental results and analyze the proposed method in the acute problem of face images recognition. Finally, concluding comments are presented in Section 6. 2. Minimum Information Discrimination Criterion r , ( u = 1, U , v = 1, V ) be specified. Here Let a set of R>1 half-tone images X r = xuv { r ∈ 0,1, K , x U and V are the image height and width, respectively, xuv max } is the intensity of an image point with coordinates (u, v); r is the reference number (r = 1,…,R), and x max is the maximum intensity. It is assumed that the templates X r define some classes of images, for example, as a method of protection against noise. Furthermore, the objects belonging to each class have some common features or similar characteristics. The common feature that unites objects in a class is called a pattern. It is required to assign a new input image X = xuv to one of the R classes. The procedures for constructing decision rules for the problem are developed in accordance with some deterministic or statistical approach. The deterministic approach is currently the most widely used. This approach seeks to determine a certain distance (measure of similarity) between any pairs of objects. For image recognition, one often uses the criterion based on the standard metric l1: 1 U V r ρ1 ( X / X r ) = ∑ ∑ x − xuv → min (1) U ⋅ V u = 1 v = 1 uv However, this approach does not always provide satisfactory results. The first reason is well-known [15] variability of visual patterns. The second reason is the presence of noise in the input image X, such as unknown intensity of light sources or simply random distortions of some points of the image. In the deterministic approach, the indicated problems are usually solved by adding new images to database, which, in turn, leads to a dramatic increase in its size. The above-mentioned difficulties are overcome by invoking a statistical approach [7, 8]. Let's consider a random variable - template X r color of a point. It’s distribution   H r = h1r , h2r , K , h xr r  could be evaluated based on matrix xuv  max  1 U V h xr = ( ∑ ∑ δ x −x U ⋅ V u = 1 v = 1 uv ) (2) 1, x = 0, Here δ (x ) =  -discrete Dirac delta-function. H r is often called “color 0, otherwise histogram” [10], [11] of image X r . The same procedure for color histogram H definition is applied for the input image X. Color histograms’ comparison has widely been using in the image recognition problem [9], [10]. The common way to compare histograms is “merged histogram method” [10] x max  r  ∑ min h x , h x  → max (3) x =1   r Unfortunately, images with the same visual information, but with shifted color intensity, may significantly degrade if the conventional method of direct comparison of histograms is used. Actually, if the input image X is one of the images X r from database but all pixels are decolourized, its color histogram H will be quite different from H r . The common approach for solving the problem is shifting histograms [11] after their evaluation using (2). This procedure may cause the increase of decision time. Hence in this paper we use normalization of images’ intensities [16] which could be done really efficient and doesn’t influence recognition’s average time According to statistical approach, recognizing image X is supposed to be a received signal of noisy communication channel where transmitted signal is one of {X r } image. Hence the problem is to minimize the mutual-entropy (or Kullback-Leibler information discrimination) [13] between color distributions of template image and recognizing image. Based on this approach we assume that histogram H r stands for the distribution of discrete certain random variable – image color. This interpretation seems justified considering the common properties of discrete distribution are valid for Hr: xmax ∑ hxr = 1 (normalization condition) and h xr ≥ 0, x = 1, x max (regularization condition x =1 [13]). Based on such image probability model, it is required to verify R hypotheses on the distribution H r , r = 1, R of the input image signal X. It’s well-known [8], that the optimal decision of the problem of statistical check of hypotheses about distribution of discrete stochastic variable in Bayesian terms is equivalent to the minimum discrimination information principle and the optimal decision rule x max ρ KL X / X r = ∑ h x ln h x / h xr  ( ) (4) x =1   ( ) Here the statistic ρ KL X / X r defines the Kullback–Leibler information discrimination [14] between the observed image X and its rth template from set {X r } . Thus, the image recognition procedure in this case involves a multichannel processing scheme in which the number of channels R is given by images’ database size. Decision making is based on the statistic minimum criterion from expressions (1) or (3) for traditional image recognition methods [1] or from expression (4) when using proposed criterion and the Kullback-Leibler discrimination [3], [14]. 3. Metric Properties of Information Discrimination Decision Statistics We consider the most important and difficult case R>>1, where the image recognition problem is solved with a template images’ set containing hundreds and thousands of images. For the specified conditions, practical implementation of the optimal decision rule (4) by the R-channel processing scheme encounters the obvious problem of its computational complexity and even feasibility, especially considering the labor-consuming procedure of image equalization in accordance with multiple parameters: size, color, project view, etc. The present work seeks to develop methods other than complete enumeration of reference images’ set to solve the above problem. We first note the metric properties of the Kullback-Leibler discrimination decision statistic ρ KL ( X / X r ) ≥ 0 , which is equal to zero only in the ideal case of coincident input and template signals. Therefore, we first transform the minimum discrimination information criterion (4) to a simplified form, suitable for practical implementation [3]: Wν ( X ) : ρ KL ( X / Xν ) < ρ = const (5) 0 Here ρ 0 is the threshold for the admissible information discrimination on the set of similarly-named images due to their known variability. The value of this threshold is easy to find experimentally by fixation of the beta error probability. { β = P ρ KL ( X / Xν ) < ρ 0 W ν } It’s known [13], [17], that if recognizing image distribution is the same as for template Xν then 2UV ⋅ ρ KL ( X / Xν ) is distributed according to the chi-square distribution with x max − 1 degrees of freedom. For other classes X r , r ≠ ν , random variable 2UV ⋅ ρ KL ( X / Xν ) is distributed according to the noncentral chi-square distribution with x max − 1 degrees of freedom and noncentral parameter λ = 2UV ⋅ ρ KL ( X r / Xν ) . Thus, in practice ( U ⋅V >> 1 ), threshold could be determined from the following expression 1 R β= ∑ H  ρ − min ρ ( X / Xν )  R r =1  0 ν ≠ r KL r (6) 0, x < 0 where H ( x ) =  is a Heaviside step function. 1, x ≥ 0 In fact, expression (5) defines the termination condition for the enumeration procedure using the MDI criterion (4). Thus, in decision-making process based on the minimum discrimination information principle (4), instead of looking through all templates, one needs to calculate the value of Kullback-Leibler divergence only until it becomes smaller than a certain threshold level. It is easy to see that this circumstance should reduce the amount of enumeration by 50% in average A natural development of this idea is the MDEA described below, which fully exploits the metric properties of the Kullback-Leibler decision statistic (4). 4. Method of Directed Enumeration Alternatives Following the general computation scheme (2), (4), we reduce the image X recognition problem to a check of the first N variants X 1 ,..., X N from the specified database {X r } subject to the condition N<> p 0 = (10)   R This is the effect of the directed enumeration. The difference in the steps’ count K can be explained by the depending probability p from applied metric and properties of input image and give images’ set. Thus, the system of expressions (2), (4)–(9) defines the proposed MDEA modification in the image recognition problem. This modification differs from the (M ) original method [3] in the way of evaluation X set (6). In the initial version iN discrimination extrapolation based on autoregressive model [20] was used causing increase in the amount of calculations. 5. Experimental Results The experiment deals with the problem of face images recognition. We use a real large database of photographs of people [21]. The photos were preliminary processed to detect faces using OpenCV library. Then detected faces were spited into 16 (4x4) parts for information discrimination computation. Such fragmentation is used to take into account heterogeneous illumination of images. The discrimination between images was calculated as a sum of discriminations (4) between these parts. The 6000 photographs of 400 different people were selected as templates R=900 of the most different images using standard clusterization [7]. Thus, the number of elements involved in search procedures at the next stages of the algorithms was decreased in 5 times. In addition, 1000 more photographs of the same people were used in the test. These test images were modified by reducing the intensity of all of their points (blanking) to demonstrate the illumination’s influence. In each case, the problem of image X recognition was solved. In the first case the metric (1) was used and the following method parameters were chosen: N=9 and M=64. The threshold ρ 0 = 30 was chosen experimentally. Here the exact solution X * (the same person photograph as from input image X) was obtained in 90.5% of the cases. For each solution, it was required to check, in average, 51% of the total size of images database R. Fig. 1. Histogram of the Number of Checks per Template Images’ Database Size In the second case illuminated test images and the Kullback–Leibler metric (4) were used. Threshold ρ 0 = 0.019 was determined from (5) by fixation beta-error probability to 5%. Using the MDEA (2), (4)–(9) with the parameters N, M from the previous experiment, we obtained an average number of checks equal to 13% of R. A histogram of the number of checks carried out by MDEA for this case is shown in Fig. 1. With a probability of 90%, the number of template images to check does not exceed 15% of R. In this case, condition (4) was not satisfied for any template from the given database for 7.1% of the initial images; therefore, all R alternatives were checked. In 96% of the cases, the exact solution was obtained.  (M )    Fig. 2. Dependence of Probability p = P  X * ∈ X iN  on ρ KL  X / X i N  .     (M ) The probability of desired image X * containing in X from (10) for iN Kullback-Leibler discrimination was equal to p=33% which is quite greater then M 64 p0 = = = 7% . Dependence of probability p on the discrimination R 900   ρ KL  X / X i  is shown at Fig.2.  N  There is an unquestionable advantage in using the Kullback–Leibler discrimination in this problem and this is due to the fact that, in our example, the input images were artificially distorted (blanked). Let us now consider the case where all images are equally illuminated. Use of the metric l1 from expression (1) and the directed enumeration method with a threshold ρ 0 = 15 gives much better results then it was for the first experiment. The error probability reduces to 6%, and the average number of checks decreases to 21% of R. However, the Kullback–Leibler metric (3) seems to be more efficient even in that case of equal illumination. Those metric and proposed methods show practically the same result as for the previous experiment. The error probability reduces to 3,2 with 12% of average number of checks. For comparison, merged histogram method (3) was used. The error probability for the second experiment (non-illuminated images) was estimated to 3,1% with 11% average number of checks according to MDEA. However, in the first experiment with illuminated test images the quality of (3) was much worse – 6,5% error with 20% of average distance (3) calculation. 6. Conclusion The problem of increasing the computation speed has attracted considerable interest of experts in both the theory and practice of image recognition. Despite huge number of approaches, most of the algorithms compare an input image with each template image, and unavoidably cannot be implemented in real-time mode for large databases. For solution of that problem directed enumeration method [3] may be used. This method is based on an information theoretical approach which uses the metric properties of the Kullback-Leibler decision statistic [3], [13] and possesses wide functional capabilities and high operational properties. The point of fundamental importance in this enumeration method is the termination criterion (4). Even in the most unprofitable case, the method almost halves the amount of computations. The use of the proposed method in the formulation (2)–(8) reduces the computational complexity by 10–20%. If proposed discrimination is combined with clustering [7] of the database, the performance increases in 40-50 times. It’s shown that efficiency of proposed method depends on the quality of applied metric in terms of given database. Thus, the metric choice becomes very important. In this paper, the new histogram-based method using minimum discrimination information principle is proposed for histogram-based image recognition. It’s based on the dominant colors in images. The method is very suitable for color image recognition because it is unaffected by geometrical changes in images, such as translation and rotation. However, images with similar visual information but with shifted color intensity may result in a significant degradation in the similarity level, if the conventional histogram intersection method is used. To solve the problem, the histogram method in Kullback-Leibler metric was proposed. Our experimental results show that histogram methods (3), (4) of image recognition have significantly higher recognition effectiveness than the standard methods using l1. metric for pixels comparison. Our method of directed enumeration alternatives is more straightforward then traditional recognition methods used with large databases [4]. It doesn't have special requirements or extra restrictions for images to recognize. Meanwhile, the quality of the solution reached by the method is comparable to that obtained by continuous enumeration of given database [12] using special methods of faces recognition. The main advantage is that we reach the same quality with 50-times reduce of the computational complexity using proposed method. Our further work on image recognition will continue in the following directions: - investigating more effective halftone image models (for example, models based on gradient direction features), - finding metric thresholds to increase recognition accuracy, - presenting experiments with recognition for most popular image databases (FERET, Yale, AT&T, etc.). References 1. Rui Y., Huang T., Chang S.F.: Image retrieval: current techniques, promising directions and open issues, Visual Communication and Image Representation 10, 39–62 (1999) 2. Flickner M., et al.: Query by image and video content: The QBIC system. IEEE Computer 28(9), 23–32 (1995) 3. A. V. Savchenko: Method of directed enumeration of alternatives in the problem of automatic recognition of half-tone images, Optoelectronics, Instrumentation and Data Processing 45(3), 83–91 (2009). 4. Jia Z., Amselang L., Gros P.: Content-based image retrieval from a large image database, Pattern Recognition 11(5), 1479-1495 (2008). 5. Huet B., Hancock E.R.: Shape recognition from large image libraries by inexact graph matching, Pattern Recognition Letters 20(11-13), 1259-1269 (1999). 6. Cover T.M., Hart P.E.: Nearest Neighbor Pattern Classification, IEEE Trans. Information Theory 13, 1968, 21-27 7. Theodoridis S., Koutroumbas C.: Pattern Recognition, 4th edn. Elsevier Amsterdam (2009) 8 Eickeler S., Jabs M., Rigoll G.: Comparison of Confidence Measures for Face Recognition, Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), 257-263 (2000), 9 Min R., Cheng H.D.: Effective image retrieval using dominant color descriptor and fuzzy support vector machine, Pattern Recognition 42(1), 147-157 (2009). 10. Wong K.M., Cheung C.H., Po L.M.: Dominant Color Image Retrieval using Merged Histogram. Proc. the 2003 Int. Symposium, (2003). 11. Yoo G.-H., Kim B.K., You K.S.: Content-based image retrieval using shifted histogram, ICCS, LNCS 4489, 894–897 (2007). 12. Foon N. H., Jin A. T. B., Ling D. N. C.: Face Recognition Using Wavelet Transform and Non-negative Matrix Factorization, Proc. 7th Australian Joint Conference on Artificial Intelligence, Cairns, 192-202 (2004) 13. Kullback S.: Information Theory and Statistics, Dover Pub., New York (1978). 14. Kullback S., Leibler R.A.: On information and sufficiency, Annals of Mathematical. Statistics 22, 79–86 (1951). 15. Oppenheim A. V.: Discrete-time signal processing, Prentice-Hall, (1989). 16. Zhao W., Chellappa R. ed. Face Processing: Advanced Modeling and Methods Elsevier/Academic Press, (2005) 17. Kupperman M.: Further applications of information theory to multivariate analysis and statistical inference, Dissertation, Graduate Council of George Washington University, (1957). 18. Gonsalvesh M., Papa J., Zhang B. etc: A genetic programming framework for content- based image retrieval, Pattern Recognition 42(2), 283-292 (2009). 19. Voskoboinikov Yu.E., Litvinov L.A.: Choosing the stopping iteration in iterative algorithms of image and signal reconstruction, Optoelectronics, Instrumentation and Data Processing 40(4), 3–9 (2004). 20. Marpl S L (Jr): Digital Spectral Analysis with Applications, Englewood Cliffs, N.J.: Prentice-Hall, (1987) 21. Essex Faces database, http://cswww.essex.ac.uk/mv/allfaces/index.html