Ethnicity Prediction Based on Iris Texture Features

Stephen Lagree and Kevin W. Bowyer
Department of Computer Science and Engineering
University of Notre Dame
Notre Dame, Indiana 46556 USA
slagree@nd.edu, kwb@cse.nd.edu

Abstract

This paper examines the possibility of predicting ethnicity from iris texture. Such prediction is possible only if the iris textures of persons of a given ethnicity share similarities, and these similarities differ from ethnicity to ethnicity. This sort of "soft biometric" prediction could be used, for example, to narrow the search of an enrollment database for a match to a probe sample. Using an iris image dataset representing 120 persons and ten-fold person-disjoint cross-validation, we obtain 91% correct Asian / Caucasian ethnicity classification.

Introduction

Iris texture has been shown to be useful for biometric identification and verification (Bowyer, Hollingsworth, and Flynn 2008; Phillips et al. 2005; Phillips et al. 2010; Daugman 2006). Studies have also asked whether iris texture contains information that can determine "soft biometric" attributes of a person, such as ethnicity (Qiu, Sun, and Tan 2006; Qiu, Sun, and Tan 2007a) or gender (Thomas et al. 2007). This paper analyzes the possibility of ethnicity prediction based on iris texture. The ability of a biometric system to recognize the ethnicity of a subject would allow automatic classification without human input. Also, in an iris recognition system, an identification request includes a "probe" iris, which is checked against a "gallery" of enrolled images to find the identity of the requested iris. In a system with millions of enrolled subjects, comparing a probe against every enrolled subject could take an extremely long time; narrowing the gallery to only irises of the same ethnicity as the probe could give a substantial speed improvement.

Figure 1 – Example LG 4000 iris images from subjects with Caucasian ethnicity (top: image 02463d1892; middle: image 04327d1264; bottom: image 04397d1461).

Figure 2 – Example LG 4000 iris images from subjects with Asian ethnicity (top: image 04815d908; middle: image 04629d1385; bottom: image 05404d80).

In a study of how human observers categorize images, Stark, Bowyer, and Siena (2010) found that humans perceive general differences in iris texture that can be used to classify iris textures into categories of similar texture pattern. Observers grouped a set of 100 iris images into categories of similar texture. The 100 images represented 100 different persons, balanced on gender and on Asian / Caucasian ethnicity; the observers did not know the gender or ethnicity of the persons in the images. The grouping of images into categories of similar iris texture resulted in categories that were, on average, split 80% / 20% on ethnicity, whereas the same categories were on average divided much closer to 50% / 50% on gender. Thus, one result of (Stark, Bowyer, and Siena 2010) is that human observers perceive consistent ethnicity-related differences in iris texture. In this paper, we train a classifier to explicitly perform the sort of ethnicity classification that emerged as a side effect of the texture-similarity grouping done by humans in (Stark, Bowyer, and Siena 2010), and that was previously explored in (Qiu, Sun, and Tan 2006; Qiu, Sun, and Tan 2007a).

Related Work

The CASIA biometrics research group has performed research on iris texture elements, including studies (Qiu, Sun, and Tan 2006; Qiu, Sun, and Tan 2007a; Qiu, Sun, and Tan 2007b) on determining ethnicity from iris texture. To our knowledge, this is the only other work on predicting ethnicity from iris texture. In (Qiu, Sun, and Tan 2006), they report 86% accuracy in Asian / Caucasian classification. Thomas et al. (2007) suggest that the work in (Qiu, Sun, and Tan 2006) may be biased by illumination differences between the two datasets the images were taken from, the Asian subject images coming from one dataset and the Caucasian subject images from another. If one dataset was generally brighter or darker than the other, the learned algorithm could be separating the subjects based on lighting, not iris texture. In the results presented in this paper, we eliminate this issue by using images taken from a single database to build our classifier, so that any acquisition-setup differences are equally likely to appear in either ethnicity class. In (Qiu, Sun, and Tan 2007a), the CASIA group reports 91% accuracy in Asian / non-Asian ethnicity classification, using support vector machines and texton features. The dataset in that work comprises 2,400 images representing 60 different persons, so that there are 20 images per iris. They divide the dataset into a 1,200-image training set and a 1,200-image test set, with the training and test sets not specified to be person-disjoint. In general, if iris images from the same person appear in both the training and the test set, the resulting performance estimate is optimistically biased. In the results presented in this paper, we eliminate this issue by using person-disjoint ten-fold cross-validation.

Dataset

We want to see how accurately we can identify ethnicity based on iris texture. For this study we use two ethnicity classes, Caucasian and Asian. The study used 1,200 iris images selected from the University of Notre Dame's iris image database. (This is a newer database than the one released to the iris biometrics research community for the government's Iris Challenge Evaluation (ICE) program (Phillips et al. 2005; Phillips et al. 2010).) All images were obtained using an LG 4000 sensor at Notre Dame. As with all commercial iris biometrics systems that we are aware of, the images are acquired under near-infrared illumination, and are 480x640 in size. Half of the images, 600, were of persons whose ethnicity is classified as Asian, and the other half were of persons classified as Caucasian. For each ethnicity, the 600 images represented 60 different persons, with 5 left-iris images and 5 right-iris images per person. This 1,200-image dataset was randomly divided into 10 folds of 120 images each, with 6 persons of each ethnicity in each fold. Thus the images in the folds are person-disjoint; that is, each person's images appear in just one fold.
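To make the fold construction concrete, the following is a minimal sketch (our illustration, not the authors' code) of person-disjoint, ethnicity-stratified fold assignment; the subject-identifier and label formats are hypothetical.

```python
import random
from collections import defaultdict

def person_disjoint_folds(subject_ethnicity, num_folds=10, seed=1):
    """Assign whole subjects (never individual images) to folds,
    balancing each fold across the two ethnicity classes.

    subject_ethnicity: dict mapping subject_id -> "asian" / "caucasian"
                       (hypothetical format).
    Returns: dict mapping subject_id -> fold index in [0, num_folds).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for subject, ethnicity in subject_ethnicity.items():
        by_class[ethnicity].append(subject)

    fold_of = {}
    for subjects in by_class.values():
        rng.shuffle(subjects)
        # Deal subjects round-robin: 60 subjects per class over 10
        # folds gives 6 subjects of each ethnicity per fold.
        for i, subject in enumerate(subjects):
            fold_of[subject] = i % num_folds
    return fold_of
```

Because assignment happens at the subject level, all ten images of a given person fall in the same fold, so no person can appear in both the training and the test data.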
Segmentation

For this iris texture prediction study, we want to base our findings solely on iris texture. Therefore we exclude periocular clues that might serve as an indicator of ethnicity. We segment the images to obtain the region of interest and mask out the eyelid-occluded portions of the iris. We use Notre Dame's IrisBEE software to perform the segmentations (Phillips et al. 2005). The output from IrisBEE that we use for texture examination is a 240x40 pixel normalized iris image, along with the corresponding bitmask of eyelid and eyelash occlusion locations. The segmentation and masking are exactly those that would be used by IrisBEE in processing the images for biometric recognition of a person's identity. However, the normalized images are not processed by the log-Gabor filters that IrisBEE uses to create the "iris code" for biometric recognition; we create a different texture feature vector for ethnicity prediction.

Figure 3 – Examples of segmented, normalized iris images (top: normalized image derived from image 02463d1892 above; bottom: normalized image derived from image 04815d908 above). The green regions indicate the "mask" of where the iris texture is occluded by eyelid / eyelash / specular highlights.

Feature Generation

After an image is segmented and normalized, we compute texture features that can be used to train a classifier to categorize images according to ethnicity. To do this we apply different filters to the image at every non-masked pixel location and use the filter responses to build a feature vector. Six of the filters we have used are "spot detectors" and "line detectors" of various sizes, as depicted in Tables I to VI. For a given point in the image, if applying a given filter would use any pixel that is masked, then that filter application is skipped for that point. The remaining filters were created using Laws' Texture Measures (Laws 1980), which are designed to respond to various types of texture when convolved with an image; the S5S5 and R5R5 kernels are depicted in Tables VII and VIII. Each Laws kernel is the outer product of two of Laws' 1-D vectors; the third Laws filter we use, E5E5, is built the same way from the E5 vector.

A feature vector describing the texture is computed for each iris image. We divide the normalized image array into smaller sections and compute statistics for each sub-region, so that classification can be based on, for example, relative differences between the band of the iris nearer the pupil and the band furthest from the pupil. The regions are ten four-pixel horizontal bands and four 60-pixel vertical bands of neighboring pixels in the normalized iris image. The ten horizontal bands correspond to concentric circular bands of the iris, running from the pupil out to the sclera (white) of the eye. The four vertical bands correspond roughly to the top, right, bottom, and left parts of the iris. Since the filters look for different phenomena in the image, we compute statistics of each filter's response over each region. Each image thus yields 630 features: 5 statistics for each of the 9 filters on each of the 14 regions. The five statistics are: (1) average of the filter response, (2) standard deviation of the filter response, (3) 90th percentile of the filter response, (4) 10th percentile of the filter response, and (5) range between the 90th and 10th percentiles. The motivation for the average is to represent the strength of a given spot size or line width in the texture. The motivation for the standard deviation is to represent the degree of variation in the response. The motivation for the percentiles and range is to provide an alternate representation of the variation that is not affected by small amounts of image segmentation error.

TABLE I: Small Spot Detector Filter
-1/8  -1/8  -1/8
-1/8   +1   -1/8
-1/8  -1/8  -1/8

TABLE II: Large Spot Detector Filter
-1/16 -1/16 -1/16 -1/16 -1/16
-1/16 +1/9  +1/9  +1/9  -1/16
-1/16 +1/9  +1/9  +1/9  -1/16
-1/16 +1/9  +1/9  +1/9  -1/16
-1/16 -1/16 -1/16 -1/16 -1/16

TABLE III: Vertical Line Detector Filter
-1/20 -1/20 +1/5  -1/20 -1/20
-1/20 -1/20 +1/5  -1/20 -1/20
-1/20 -1/20 +1/5  -1/20 -1/20
-1/20 -1/20 +1/5  -1/20 -1/20
-1/20 -1/20 +1/5  -1/20 -1/20

TABLE IV: Wide Vertical Line Detector Filter
-1/10 +1/15 +1/15 +1/15 -1/10
-1/10 +1/15 +1/15 +1/15 -1/10
-1/10 +1/15 +1/15 +1/15 -1/10
-1/10 +1/15 +1/15 +1/15 -1/10
-1/10 +1/15 +1/15 +1/15 -1/10

TABLE V: Horizontal Line Detector Filter
-1/20 -1/20 -1/20 -1/20 -1/20
-1/20 -1/20 -1/20 -1/20 -1/20
+1/5  +1/5  +1/5  +1/5  +1/5
-1/20 -1/20 -1/20 -1/20 -1/20
-1/20 -1/20 -1/20 -1/20 -1/20

TABLE VI: Wide Horizontal Line Detector Filter
-1/10 -1/10 -1/10 -1/10 -1/10
+1/15 +1/15 +1/15 +1/15 +1/15
+1/15 +1/15 +1/15 +1/15 +1/15
+1/15 +1/15 +1/15 +1/15 +1/15
-1/10 -1/10 -1/10 -1/10 -1/10

TABLE VII: S5S5
+1   0  -2   0  +1
 0   0   0   0   0
-2   0  +4   0  -2
 0   0   0   0   0
+1   0  -2   0  +1

TABLE VIII: R5R5
+1   -4   +6   -4  +1
-4  +16  -24  +16  -4
+6  -24  +36  -24  +6
-4  +16  -24  +16  -4
+1   -4   +6   -4  +1
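As a concrete illustration of the feature computation described above, the sketch below (our reconstruction under stated assumptions, not the authors' code) builds a Laws kernel as an outer product, applies a filter only where its full support is unmasked, and computes the five statistics for each of the 14 regions. How the authors handled placements near the image border is not specified; here those placements are simply skipped.

```python
import numpy as np

def laws_kernel(v1, v2):
    """2-D Laws kernel as the outer product of two 1-D vectors,
    e.g., S5 = [-1, 0, 2, 0, -1] and R5 = [1, -4, 6, -4, 1]."""
    return np.outer(np.asarray(v1, float), np.asarray(v2, float))

def filter_responses(image, mask, kernel):
    """Apply `kernel` at every pixel whose full support is unmasked.
    image: 40x240 normalized iris image; mask: True where occluded.
    Returns a response array with NaN where application was skipped."""
    kh, kw = kernel.shape
    rh, rw = kh // 2, kw // 2
    resp = np.full(image.shape, np.nan)
    for y in range(rh, image.shape[0] - rh):
        for x in range(rw, image.shape[1] - rw):
            if mask[y-rh:y+rh+1, x-rw:x+rw+1].any():
                continue  # any masked pixel in the window -> skip
            win = image[y-rh:y+rh+1, x-rw:x+rw+1]
            resp[y, x] = float((win * kernel).sum())
    return resp

def region_stats(resp):
    """The five statistics of the valid responses in one region."""
    vals = resp[~np.isnan(resp)]
    if vals.size == 0:
        return [0.0] * 5
    p10, p90 = np.percentile(vals, [10, 90])
    return [vals.mean(), vals.std(), p90, p10, p90 - p10]

def feature_vector(image, mask, kernels):
    """630 features: 5 stats x 9 filters x 14 regions
    (ten 4-pixel horizontal bands + four 60-pixel vertical bands)."""
    feats = []
    for k in kernels:
        resp = filter_responses(image, mask, k)
        for r in range(10):                    # horizontal bands
            feats += region_stats(resp[4*r:4*(r+1), :])
        for c in range(4):                     # vertical bands
            feats += region_stats(resp[:, 60*c:60*(c+1)])
    return np.array(feats)
```

With the six spot/line kernels of Tables I to VI and the three Laws kernels, `feature_vector` returns the 630-dimensional representation described in the text.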
Results

We tried a variety of classification algorithms included in the WEKA package (Weka), including meta-algorithms such as Bagging combined with other classifiers. By changing parameters, we achieved performance gains for some of the algorithms. However, we found our best results using the SMO algorithm with WEKA's default parameters. The SMO algorithm implements "Sequential Minimal Optimization", John Platt's algorithm for training a support vector machine classifier (Weka). The input to the SMO algorithm is the feature vectors of all 1,200 iris images. To assess the classifier we use ten-fold cross-validation, with the folds stratified by ethnicity. The folds are also person-disjoint, ensuring that the persons whose images are in the test data have not been seen by the classification algorithm in the training data.

The SMO classifier yields higher accuracy than a broad range of other classifiers, including decision-tree-based algorithms and bagging (Table IX). Using Bagging with the top two classifiers, SMO and Random Forest, did not improve performance. Running the experiment with the SMO classifier and the feature vector described above gives an accuracy of 90.58%. This is an improvement over the 86% reported in (Qiu, Sun, and Tan 2006), and close to the 91% reported in (Qiu, Sun, and Tan 2007a) for a train-test split that was not person-disjoint. When we do not enforce person-disjoint folds, we see an accuracy of 96.17%, significantly higher than Qiu, Sun, and Tan (2006; 2007a) reported; the gap illustrates the optimistic bias introduced when the same persons appear in both training and test data.

We also computed the classification accuracy for each feature type separately to see the impact of individual features. Table X shows that some single feature types achieve almost the performance of all features together; however, none does as well as the combination of all features. Some filters may be redundant, and a combination of a few might reproduce the performance of all nine filters.

To ensure that the size of our training dataset was not limiting our accuracy, we ran the classifier with different numbers of folds. Table XI shows the results using 5-, 10-, and 20-fold cross-validation. The accuracy levels are all within one percent of each other, indicating that our performance is not limited by our dataset size.

TABLE IX: Results for Different Classifiers
Algorithm                           Accuracy (%)
SMO                                 90.58
RandomForest (100 Trees/Features)   89.50
Bagged FT                           89.33
FT                                  87.67
ADTree                              85.25
J48Graft                            83.67
J48                                 83.08
Naïve Bayes                         68.42

TABLE X: Feature Performance with SMO
Feature                             Accuracy (%)
Small Spot Detector                 85.58
Large Spot Detector                 85.67
Vertical Line Detector              87.42
Wide Vertical Line Detector         85.50
Horizontal Line Detector            78.92
Wide Horizontal Line Detector       78.33
S5S5                                78.17
R5R5                                73.33
E5E5                                88.00
All Features                        90.58

TABLE XI: SMO Accuracy by Number of Folds Used in Cross-Validation
Folds   Accuracy (%)
5       90.00
10      90.58
20      90.17

TABLE XII: SMO Accuracy by Fold, Using 10-Fold Cross-Validation
Fold      Accuracy (%)
1         91.667
2         100.000
3         88.333
4         90.833
5         97.500
6         82.500
7         98.333
8         90.000
9         87.500
10        79.167
Average   90.583
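The classification itself was done with WEKA's SMO; as a rough stand-in for readers who prefer Python, the sketch below (our illustration, not the authors' pipeline) runs the same person-disjoint evaluation with scikit-learn's linear-kernel SVM, which approximates SMO's default degree-1 polynomial kernel with C=1.

```python
import numpy as np
from sklearn.svm import SVC

def evaluate(features, labels, fold_ids, num_folds=10):
    """Person-disjoint cross-validation: train on nine folds, test
    on the held-out fold, and average the per-fold accuracies.

    features: (n_images, 630) array of texture features
    labels:   (n_images,) 0/1 ethnicity labels
    fold_ids: (n_images,) fold index per image, derived from the
              subject-level assignment sketched earlier
    """
    per_fold = []
    for f in range(num_folds):
        test = fold_ids == f
        # Linear-kernel SVM with C=1, an approximation of WEKA
        # SMO's default (degree-1 polynomial kernel, C=1).
        clf = SVC(kernel="linear", C=1.0)
        clf.fit(features[~test], labels[~test])
        per_fold.append(clf.score(features[test], labels[test]))
    return per_fold, float(np.mean(per_fold))
```

The per-fold accuracies and their mean correspond to the entries and the Average row of Table XII.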
Future Work

To achieve greater accuracy, we intend to implement additional and more sophisticated features, and to look at the effects of the size of the training set. We envision that the number of different persons represented in the training data is likely more important than the number of images; that is, doubling the training set by using twice as many images per person is likely not as powerful as doubling the number of persons. For this experiment, we looked only at very broad ethnicity classifications. More work could be done to examine finer categories, such as Indian and Southeast Asian. The performance of a classifier such as this also has not been tested on subjects of multiple ethnic backgrounds.

Acknowledgments

This work is supported by the Technical Support Working Group under US Army contract W91CRB-08-C-0093, and by the Central Intelligence Agency. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of our sponsors.

References

Bowyer, K. W.; Hollingsworth, K.; and Flynn, P. J. Image understanding for iris biometrics: a survey. Computer Vision and Image Understanding, 110(2):281–307, May 2008.

Daugman, J. Probing the uniqueness and randomness of iris codes: results from 200 billion iris pair comparisons. Proceedings of the IEEE, 94(11):1927–1935, Nov. 2006.

Laws, K. Textured Image Segmentation. Ph.D. dissertation, University of Southern California, January 1980.

Phillips, P. J.; Bowyer, K. W.; Flynn, P. J.; Liu, X.; and Scruggs, T. W. The Iris Challenge Evaluation 2005. In IEEE Int. Conf. on Biometrics: Theory, Applications and Systems (BTAS 08), Washington, DC, September 2008.

Phillips, P. J.; Scruggs, W. T.; O'Toole, A.; Flynn, P. J.; Bowyer, K. W.; Schott, C. L.; and Sharpe, M. FRVT 2006 and ICE 2006 large-scale experimental results. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5):831–846, May 2010.

Qiu, X. C.; Sun, Z. A.; and Tan, T. N. Global texture analysis of iris images for ethnic classification. In Springer LNCS 3832: Int. Conf. on Biometrics, pages 411–418, June 2006.

Qiu, X. C.; Sun, Z. A.; and Tan, T. N. Learning appearance primitives of iris images for ethnic classification. In Int. Conf. on Image Processing, pages II:405–408, 2007a.

Qiu, X. C.; Sun, Z. A.; and Tan, T. N. Coarse iris classification by learned visual dictionary. In Springer LNCS 4642: Int. Conf. on Biometrics, pages 770–779, Aug 2007b.

Stark, L.; Bowyer, K. W.; and Siena, S. Human perceptual categorization of iris texture patterns. In IEEE Int. Conf. on Biometrics: Theory, Applications and Systems (BTAS), September 2010.

Thomas, V.; Chawla, N.; Bowyer, K. W.; and Flynn, P. J. Learning to predict gender from iris images. In Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems (BTAS), September 2007.

Weka 3 machine learning software. http://www.cs.waikato.ac.nz/ml/weka/.