Ethnicity Prediction Based on Iris Texture Features

Stephen Lagree and Kevin W. Bowyer
Department of Computer Science and Engineering
University of Notre Dame
Notre Dame, Indiana 46556 USA
slagree@nd.edu, kwb@cse.nd.edu

Abstract

This paper examines the possibility of predicting ethnicity from iris texture. Such prediction is possible only if the iris textures of persons of a given ethnicity share similarities, and these similarities differ from ethnicity to ethnicity. This sort of "soft biometric" prediction could be used, for example, to narrow the search of an enrollment database for a match to a probe sample. Using an iris image dataset representing 120 persons and ten-fold person-disjoint cross-validation, we obtain 91% correct Asian / Caucasian ethnicity classification.

Introduction

Iris texture has been shown to be useful for biometric identification and verification (Bowyer, Hollingsworth, and Flynn 2008; Phillips et al. 2005; Phillips et al. 2010; Daugman 2006). Studies have also asked whether iris texture contains information that can determine "soft biometric" attributes of a person, such as ethnicity (Qiu, Sun, and Tan 2006; Qiu, Sun, and Tan 2007a) or gender (Thomas et al. 2007). This paper analyzes the possibility of ethnicity prediction based on iris texture. The ability of a biometric system to recognize the ethnicity of a subject would allow automatic classification without human input. Also, in an iris recognition system, an identification request includes a "probe" iris, which is checked against a "gallery" of enrolled images to find the identity of the requested iris. In a system with millions of enrolled subjects, comparing a probe against every enrolled subject could take an extremely long time; narrowing the gallery to only irises of the same ethnicity as the probe could give a substantial speed improvement.

Figure 1 – Example LG 4000 iris images from subjects with Caucasian ethnicity (top: image 02463d1892; middle: image 04327d1264; bottom: image 04397d1461).

Figure 2 – Example LG 4000 iris images from subjects with Asian ethnicity (top: image 04815d908; middle: image 04629d1385; bottom: image 05404d80).

In a study of how human observers categorize images, Stark, Bowyer, and Siena (2010) found that humans perceive general differences in iris texture that can be used to classify iris textures into categories of similar texture pattern. Observers grouped a set of 100 iris images into categories of similar texture. The 100 images represented 100 different persons, balanced on gender and on Asian / Caucasian ethnicity; the observers did not know the gender or ethnicity of the persons in the images. The grouping of images into categories of similar iris texture resulted in categories that were, on average, split 80% / 20% on ethnicity, whereas the same categories were on average divided much closer to 50% / 50% on gender. Thus, one result of (Stark, Bowyer, and Siena 2010) is that human observers perceive consistent ethnicity-related differences in iris texture. In this paper, we train a classifier to explicitly perform the sort of ethnicity classification that emerged as a side effect of the texture-similarity grouping done by humans in (Stark, Bowyer, and Siena 2010), and that was previously explored in (Qiu, Sun, and Tan 2006; Qiu, Sun, and Tan 2007a).

Related Work

The CASIA biometrics research group has performed research on iris texture elements, including studies (Qiu, Sun, and Tan 2006; Qiu, Sun, and Tan 2007a; Qiu, Sun, and Tan 2007b) on determining ethnicity from iris texture. To our knowledge, this is the only other work on predicting ethnicity from iris texture. In (Qiu, Sun, and Tan 2006), they report 86% accuracy in Asian / Caucasian classification. Thomas et al. (2007) suggest that the work in (Qiu, Sun, and Tan 2006) may be biased by illumination differences between the two datasets the images were taken from, the Asian subject images coming from one dataset and the Caucasian subject images from another. If one dataset was generally brighter or darker than the other, the learned algorithm could be separating the subjects based on lighting, not iris texture. In the results presented in this paper, we eliminate this issue by using images taken from a single database to build our classifier, so that any acquisition-setup differences are equally likely to appear in either ethnicity class. In (Qiu, Sun, and Tan 2007a), the CASIA group reports 91% accuracy in Asian / non-Asian ethnicity classification, using support vector machines and texton features. The dataset in that work comprises 2,400 images representing 60 different persons, so that there are 20 images per iris. They divide the dataset into a 1,200-image training set and a 1,200-image test set, with the training and test sets not specified to be person-disjoint. In general, if iris images from the same person appear in both the training and the test set, the resulting performance estimate is optimistically biased. In the results presented in this paper, we eliminate this issue by using person-disjoint ten-fold cross-validation.

Dataset

We want to see how accurately we can identify ethnicity based on iris texture. For this study we use two ethnicity classes, Caucasian and Asian. The study used 1,200 iris images selected from the University of Notre Dame's iris image database. (This is a newer database than the one released to the iris biometrics research community for the government's Iris Challenge Evaluation (ICE) program (Phillips et al. 2005; Phillips et al. 2010).) All images were obtained using an LG 4000 sensor at Notre Dame. As with all commercial iris biometrics systems that we are aware of, the images are acquired under near-infrared illumination, and are 480x640 in size. Half of the images, 600, were of persons whose ethnicity is classified as Asian, and the other half were of persons classified as Caucasian. For each ethnicity, the 600 images represented 60 different persons, with 5 left-iris images and 5 right-iris images per person. This 1,200-image dataset was randomly divided into 10 folds of 120 images each, with 6 persons of each ethnicity in each fold. Thus the images in the folds are person-disjoint; that is, each person's images appear in just one fold.
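To make the fold construction concrete, the following is a minimal sketch (our illustration, not the authors' code) of person-disjoint, ethnicity-stratified fold assignment; the subject-identifier and label formats are hypothetical.

```python
import random
from collections import defaultdict

def person_disjoint_folds(subject_ethnicity, num_folds=10, seed=1):
    """Assign whole subjects (never individual images) to folds,
    balancing each fold across the two ethnicity classes.

    subject_ethnicity: dict mapping subject_id -> "asian" / "caucasian"
                       (hypothetical format).
    Returns: dict mapping subject_id -> fold index in [0, num_folds).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for subject, ethnicity in subject_ethnicity.items():
        by_class[ethnicity].append(subject)

    fold_of = {}
    for subjects in by_class.values():
        rng.shuffle(subjects)
        # Deal subjects round-robin: 60 subjects per class over 10
        # folds gives 6 subjects of each ethnicity per fold.
        for i, subject in enumerate(subjects):
            fold_of[subject] = i % num_folds
    return fold_of
```

Because assignment happens at the subject level, all ten images of a given person fall in the same fold, so no person can appear in both the training and the test data.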
Segmentation

For this iris texture prediction study, we want to base our findings solely on iris texture. Therefore we exclude periocular clues that might serve as an indicator of ethnicity. We segment the images to obtain the region of interest and mask out the eyelid-occluded portions of the iris. We use Notre Dame's IrisBEE software to perform the segmentations (Phillips et al. 2005). The output from IrisBEE that we use for texture examination is a 240x40 pixel normalized iris image, along with the corresponding bitmask of eyelid and eyelash occlusion locations. The segmentation and masking are exactly those that would be used by IrisBEE in processing the images for biometric recognition of a person's identity. However, the normalized images are not processed by the log-Gabor filters that IrisBEE uses to create the "iris code" for biometric recognition; we create a different texture feature vector for ethnicity prediction.

Figure 3 – Examples of segmented, normalized iris images (top: normalized image derived from image 02463d1892 above; bottom: normalized image derived from image 04815d908 above). The green regions indicate the "mask" of where the iris texture is occluded by eyelid / eyelash / specular highlights.

Feature Generation

After an image is segmented and normalized, we compute texture features that can be used to train a classifier to categorize images according to ethnicity. To do this we apply different filters to the image at every non-masked pixel location and use the filter responses to build a feature vector. Six of the filters we have used are "spot detectors" and "line detectors" of various sizes, as depicted in Tables I to VI. For a given point in the image, if applying a given filter would use any pixel that is masked, then that filter application is skipped for that point. The remaining filters were created using Laws' Texture Measures (Laws 1980), which are designed to respond to various types of texture when convolved with an image; the S5S5 and R5R5 kernels are depicted in Tables VII and VIII. Each Laws kernel is the outer product of two of Laws' 1-D vectors; the third Laws filter we use, E5E5, is built the same way from the E5 vector.

A feature vector describing the texture is computed for each iris image. We divide the normalized image array into smaller sections and compute statistics for each sub-region, so that classification can be based on, for example, relative differences between the band of the iris nearer the pupil and the band furthest from the pupil. The regions are ten four-pixel horizontal bands and four 60-pixel vertical bands of neighboring pixels in the normalized iris image. The ten horizontal bands correspond to concentric circular bands of the iris, running from the pupil out to the sclera (white) of the eye. The four vertical bands correspond roughly to the top, right, bottom, and left parts of the iris. Since the filters look for different phenomena in the image, we compute statistics of each filter's response over each region. Each image thus yields 630 features: 5 statistics for each of the 9 filters on each of the 14 regions. The five statistics are: (1) average of the filter response, (2) standard deviation of the filter response, (3) 90th percentile of the filter response, (4) 10th percentile of the filter response, and (5) range between the 90th and 10th percentiles. The motivation for the average is to represent the strength of a given spot size or line width in the texture. The motivation for the standard deviation is to represent the degree of variation in the response. The motivation for the percentiles and range is to provide an alternate representation of the variation that is not affected by small amounts of image segmentation error.

TABLE I: Small Spot Detector Filter
-1/8  -1/8  -1/8
-1/8   +1   -1/8
-1/8  -1/8  -1/8

TABLE II: Large Spot Detector Filter
-1/16 -1/16 -1/16 -1/16 -1/16
-1/16 +1/9  +1/9  +1/9  -1/16
-1/16 +1/9  +1/9  +1/9  -1/16
-1/16 +1/9  +1/9  +1/9  -1/16
-1/16 -1/16 -1/16 -1/16 -1/16

TABLE III: Vertical Line Detector Filter
-1/20 -1/20 +1/5  -1/20 -1/20
-1/20 -1/20 +1/5  -1/20 -1/20
-1/20 -1/20 +1/5  -1/20 -1/20
-1/20 -1/20 +1/5  -1/20 -1/20
-1/20 -1/20 +1/5  -1/20 -1/20

TABLE IV: Wide Vertical Line Detector Filter
-1/10 +1/15 +1/15 +1/15 -1/10
-1/10 +1/15 +1/15 +1/15 -1/10
-1/10 +1/15 +1/15 +1/15 -1/10
-1/10 +1/15 +1/15 +1/15 -1/10
-1/10 +1/15 +1/15 +1/15 -1/10

TABLE V: Horizontal Line Detector Filter
-1/20 -1/20 -1/20 -1/20 -1/20
-1/20 -1/20 -1/20 -1/20 -1/20
+1/5  +1/5  +1/5  +1/5  +1/5
-1/20 -1/20 -1/20 -1/20 -1/20
-1/20 -1/20 -1/20 -1/20 -1/20

TABLE VI: Wide Horizontal Line Detector Filter
-1/10 -1/10 -1/10 -1/10 -1/10
+1/15 +1/15 +1/15 +1/15 +1/15
+1/15 +1/15 +1/15 +1/15 +1/15
+1/15 +1/15 +1/15 +1/15 +1/15
-1/10 -1/10 -1/10 -1/10 -1/10

TABLE VII: S5S5
+1   0  -2   0  +1
 0   0   0   0   0
-2   0  +4   0  -2
 0   0   0   0   0
+1   0  -2   0  +1

TABLE VIII: R5R5
+1   -4   +6   -4  +1
-4  +16  -24  +16  -4
+6  -24  +36  -24  +6
-4  +16  -24  +16  -4
+1   -4   +6   -4  +1
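As a concrete illustration of the feature computation described above, the sketch below (our reconstruction under stated assumptions, not the authors' code) builds a Laws kernel as an outer product, applies a filter only where its full support is unmasked, and computes the five statistics for each of the 14 regions. How the authors handled placements near the image border is not specified; here those placements are simply skipped.

```python
import numpy as np

def laws_kernel(v1, v2):
    """2-D Laws kernel as the outer product of two 1-D vectors,
    e.g., S5 = [-1, 0, 2, 0, -1] and R5 = [1, -4, 6, -4, 1]."""
    return np.outer(np.asarray(v1, float), np.asarray(v2, float))

def filter_responses(image, mask, kernel):
    """Apply `kernel` at every pixel whose full support is unmasked.
    image: 40x240 normalized iris image; mask: True where occluded.
    Returns a response array with NaN where application was skipped."""
    kh, kw = kernel.shape
    rh, rw = kh // 2, kw // 2
    resp = np.full(image.shape, np.nan)
    for y in range(rh, image.shape[0] - rh):
        for x in range(rw, image.shape[1] - rw):
            if mask[y-rh:y+rh+1, x-rw:x+rw+1].any():
                continue  # any masked pixel in the window -> skip
            win = image[y-rh:y+rh+1, x-rw:x+rw+1]
            resp[y, x] = float((win * kernel).sum())
    return resp

def region_stats(resp):
    """The five statistics of the valid responses in one region."""
    vals = resp[~np.isnan(resp)]
    if vals.size == 0:
        return [0.0] * 5
    p10, p90 = np.percentile(vals, [10, 90])
    return [vals.mean(), vals.std(), p90, p10, p90 - p10]

def feature_vector(image, mask, kernels):
    """630 features: 5 stats x 9 filters x 14 regions
    (ten 4-pixel horizontal bands + four 60-pixel vertical bands)."""
    feats = []
    for k in kernels:
        resp = filter_responses(image, mask, k)
        for r in range(10):                    # horizontal bands
            feats += region_stats(resp[4*r:4*(r+1), :])
        for c in range(4):                     # vertical bands
            feats += region_stats(resp[:, 60*c:60*(c+1)])
    return np.array(feats)
```

With the six spot/line kernels of Tables I to VI and the three Laws kernels, `feature_vector` returns the 630-dimensional representation described in the text.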
Results

We tried a variety of classification algorithms included in the WEKA package (Weka), including meta-algorithms such as Bagging combined with other classifiers. By changing parameters, we achieved performance gains for some of the algorithms. However, we found our best results using the SMO algorithm with WEKA's default parameters. The SMO algorithm implements "Sequential Minimal Optimization", John Platt's algorithm for training a support vector machine classifier (Weka). The input to the SMO algorithm is the feature vectors of all 1,200 iris images. To assess the classifier we use ten-fold cross-validation, with the folds stratified by ethnicity. The folds are also person-disjoint, ensuring that the persons whose images are in the test data have not been seen by the classification algorithm in the training data.

The SMO classifier yields higher accuracy than a broad range of other classifiers, including decision-tree-based algorithms and bagging (Table IX). Using Bagging with the top two classifiers, SMO and Random Forest, did not improve performance. Running the experiment with the SMO classifier and the feature vector described above gives an accuracy of 90.58%. This is an improvement over the 86% reported in (Qiu, Sun, and Tan 2006), and close to the 91% reported in (Qiu, Sun, and Tan 2007a) for a train-test split that was not person-disjoint. When we do not enforce person-disjoint folds, we see an accuracy of 96.17%, significantly higher than Qiu, Sun, and Tan (2006; 2007a) reported; the gap illustrates the optimistic bias introduced when the same persons appear in both training and test data.

We also computed the classification accuracy for each feature type separately to see the impact of individual features. Table X shows that some single feature types achieve almost the performance of all features together; however, none does as well as the combination of all features. Some filters may be redundant, and a combination of a few might reproduce the performance of all nine filters.

To ensure that the size of our training dataset was not limiting our accuracy, we ran the classifier with different numbers of folds. Table XI shows the results using 5-, 10-, and 20-fold cross-validation. The accuracy levels are all within one percent of each other, indicating that our performance is not limited by our dataset size.

TABLE IX: Results for Different Classifiers
Algorithm                           Accuracy (%)
SMO                                 90.58
RandomForest (100 Trees/Features)   89.50
Bagged FT                           89.33
FT                                  87.67
ADTree                              85.25
J48Graft                            83.67
J48                                 83.08
Naïve Bayes                         68.42

TABLE X: Feature Performance with SMO
Feature                             Accuracy (%)
Small Spot Detector                 85.58
Large Spot Detector                 85.67
Vertical Line Detector              87.42
Wide Vertical Line Detector         85.50
Horizontal Line Detector            78.92
Wide Horizontal Line Detector       78.33
S5S5                                78.17
R5R5                                73.33
E5E5                                88.00
All Features                        90.58

TABLE XI: SMO Accuracy by Number of Folds Used in Cross-Validation
Folds   Accuracy (%)
5       90.00
10      90.58
20      90.17

TABLE XII: SMO Accuracy by Fold, Using 10-Fold Cross-Validation
Fold      Accuracy (%)
1         91.667
2         100.000
3         88.333
4         90.833
5         97.500
6         82.500
7         98.333
8         90.000
9         87.500
10        79.167
Average   90.583
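The classification itself was done with WEKA's SMO; as a rough stand-in for readers who prefer Python, the sketch below (our illustration, not the authors' pipeline) runs the same person-disjoint evaluation with scikit-learn's linear-kernel SVM, which approximates SMO's default degree-1 polynomial kernel with C=1.

```python
import numpy as np
from sklearn.svm import SVC

def evaluate(features, labels, fold_ids, num_folds=10):
    """Person-disjoint cross-validation: train on nine folds, test
    on the held-out fold, and average the per-fold accuracies.

    features: (n_images, 630) array of texture features
    labels:   (n_images,) 0/1 ethnicity labels
    fold_ids: (n_images,) fold index per image, derived from the
              subject-level assignment sketched earlier
    """
    per_fold = []
    for f in range(num_folds):
        test = fold_ids == f
        # Linear-kernel SVM with C=1, an approximation of WEKA
        # SMO's default (degree-1 polynomial kernel, C=1).
        clf = SVC(kernel="linear", C=1.0)
        clf.fit(features[~test], labels[~test])
        per_fold.append(clf.score(features[test], labels[test]))
    return per_fold, float(np.mean(per_fold))
```

The per-fold accuracies and their mean correspond to the entries and the Average row of Table XII.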
Future Work

To achieve greater accuracy, we intend to implement additional and more sophisticated features, and to look at the effects of the size of the training set. We envision that the number of different persons represented in the training data is likely more important than the number of images; that is, doubling the training set by using twice as many images per person is likely not as powerful as doubling the number of persons. For this experiment, we looked only at very broad ethnicity classifications. More work could be done to examine finer categories, such as Indian and Southeast Asian. The performance of a classifier such as this also has not been tested on subjects of multiple ethnic backgrounds.

Acknowledgments

This work is supported by the Technical Support Working Group under US Army contract W91CRB-08-C-0093, and by the Central Intelligence Agency. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of our sponsors.

References

Bowyer, K. W.; Hollingsworth, K.; and Flynn, P. J. Image understanding for iris biometrics: a survey. Computer Vision and Image Understanding, 110(2):281–307, May 2008.

Daugman, J. Probing the uniqueness and randomness of iris codes: results from 200 billion iris pair comparisons. Proceedings of the IEEE, 94(11):1927–1935, Nov. 2006.

Laws, K. Textured Image Segmentation. Ph.D. dissertation, University of Southern California, January 1980.

Phillips, P. J.; Bowyer, K. W.; Flynn, P. J.; Liu, X.; and Scruggs, T. W. The Iris Challenge Evaluation 2005. In IEEE Int. Conf. on Biometrics: Theory, Applications and Systems (BTAS 08), Washington, DC, September 2008.

Phillips, P. J.; Scruggs, W. T.; O'Toole, A.; Flynn, P. J.; Bowyer, K. W.; Schott, C. L.; and Sharpe, M. FRVT 2006 and ICE 2006 large-scale experimental results. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5):831–846, May 2010.

Qiu, X. C.; Sun, Z. A.; and Tan, T. N. Global texture analysis of iris images for ethnic classification. In Springer LNCS 3832: Int. Conf. on Biometrics, pages 411–418, June 2006.

Qiu, X. C.; Sun, Z. A.; and Tan, T. N. Learning appearance primitives of iris images for ethnic classification. In Int. Conf. on Image Processing, pages II:405–408, 2007a.

Qiu, X. C.; Sun, Z. A.; and Tan, T. N. Coarse iris classification by learned visual dictionary. In Springer LNCS 4642: Int. Conf. on Biometrics, pages 770–779, Aug 2007b.

Stark, L.; Bowyer, K. W.; and Siena, S. Human perceptual categorization of iris texture patterns. In IEEE Int. Conf. on Biometrics: Theory, Applications and Systems (BTAS), September 2010.

Thomas, V.; Chawla, N.; Bowyer, K. W.; and Flynn, P. J. Learning to predict gender from iris images. In Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems (BTAS), September 2007.

Weka 3 machine learning software. http://www.cs.waikato.ac.nz/ml/weka/.