Pixel Consistency, K-Tournament Selection, and Darwinian-Based Feature Extraction

Joseph Shelton, Melissa Venable, Sabra Neal, Joshua Adams, Aniesha Alford, Gerry Dozier
Computer Science Department, North Carolina Agricultural and Technical State University
1601 East Market St., Greensboro, NC 27411, United States of America
jashelt1@ncat.edu, mdvenabl@ncat.edu, saneal@ncat.edu, jcadams2@ncat.edu, aalford@ncat.edu, gvdozier@ncat.edu

Abstract

In this paper, we present a two-stage process for developing feature extractors (FEs) for facial recognition. In this process, a genetic algorithm is used to evolve a number of Local Binary Pattern (LBP) based FEs, each consisting of a number of (possibly) overlapping patches from which features are extracted from an image. These FEs are then overlaid to form what is referred to as a hyper FE. The hyper FE is then used to create a probability distribution function (PDF). The PDF is a two-dimensional matrix that records the number of patches within the hyper FE that a particular pixel is contained within; the PDF matrix thus records the consistency with which pixels fall within patches of the hyper FE. Darwinian-based FEs (DFEs) are then constructed by sampling the PDF via k-tournament selection to determine which pixels of a set of images will be used to extract features. Our results show that DFEs have a higher recognition rate as well as a lower computational complexity than other LBP-based feature extractors.

Introduction

Genetic & Evolutionary Biometrics (GEB) is the field of study devoted to the development, analysis, and application of Genetic & Evolutionary Computation (GEC) to the area of biometrics (Ramadan and Abdel-Kader 2009; Galbally et al. 2007; Alford et al. 2012; Shelton et al. 2012c). Over the past few years there has been a growing interest in GEB. To date, GEB has been applied to biometrics in the form of feature extraction (Shelton et al. 2011a; Adams et al. 2010), feature selection (Kumar, Kumar and Rai 2009; Dozier et al. 2011), and feature weighting (Popplewell et al. 2011; Alford et al. 2011), as well as to cyber security (Shelton et al. 2012a; Shelton et al. 2012b).

GEFEML (Genetic and Evolutionary Feature Extraction – Machine Learning) (Shelton et al. 2012c) is a GEB method that uses GECs to evolve feature extractors (FEs) that achieve high recognition accuracy while using a small subset of pixels from a biometric image. The results of Shelton et al. (2012c) show that FEs evolved via GEFEML outperform the FEs developed via the traditional Local Binary Pattern (LBP) (Ojala and Pietikainen 2002) approach.

In this paper, we present a two-stage process for facial recognition (Tsekeridou and Pitas 1998; Zhao et al. 2003) known as Darwinian-based feature extraction (DFE). The first stage takes a set of FEs evolved by GEFEML and superimposes them to create a hyper FE. From this hyper FE, a probability distribution function (PDF) is created. The PDF is represented as a two-dimensional matrix in which each position corresponds to a pixel within a set of images, and each value records the number of patches the associated pixel is contained within. In the second stage of the process, a Darwinian feature extractor (dFE) is developed by sampling the PDF via k-tournament selection (Miller and Goldberg 1996). The selected pixels are then grouped into clusters by randomly selecting α pixels to serve as centers.

Our results show that the computational cost of dFEs (in terms of the total number of pixels being processed) is far lower than that of the FEs evolved via GEFEML. The dFEs also outperform GEFEML-evolved FEs in terms of recognition accuracy.

The remainder of this paper is organized as follows. Section 2 provides an overview of the LBP feature extraction method, GECs, and GEFEML. Section 3 describes the two-stage process for developing dFEs. Sections 4 and 5 present our experimental setup and our results, respectively. Finally, in Section 6, we present our conclusions and future work.

Background

Local Binary Pattern Method

The LBP method (Ojala and Pietikainen 2002; Ahonen, Hadid and Pietikainen 2006) extracts texture patterns from images in an effort to build a feature vector (FV).
It does this by segmenting an image into rectangular regions, referred to as patches, and comparing the grey-scale intensity value of each pixel with the intensity values of that pixel's nearest neighbors. After pixels are compared with their nearest neighbors, a pattern is extracted. This pattern is represented by a binary string. A histogram is built using the frequency of the patterns occurring in a patch, and the histograms for every patch are concatenated to form an FV.

In the LBP method, images are traditionally partitioned into uniform-sized, non-overlapping patches. Within each patch, pixels are sought out that have d neighboring pixels on all sides at a distance of r pixels. Each such pixel is referred to as a center pixel, cp, because it is surrounded by a complete neighborhood of pixels. A texture pattern can be extracted using Equations 1 and 2, where N is the set of intensity values of the neighboring pixels. In Equation 1, the difference between each neighboring pixel n_t and cp is calculated and passed to Equation 2, which returns either a 1 or a 0 depending on the sign of the difference. The d bits returned are concatenated to form the texture pattern:

    LBP(N, cp) = ||_{t=0}^{d-1} s(n_t - cp)        (1)

where || denotes bit concatenation and

    s(x) = 1 if x >= 0,  0 if x < 0.               (2)

Each patch has a histogram that stores the frequencies of the texture patterns extracted from it. The histograms for all patches of an image are concatenated to create an FV for the image. This FV can be compared to the FV of another image using a distance measure such as the Manhattan distance or the Euclidean distance.
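The patch segmentation and neighborhood comparison of Equations 1 and 2 can be sketched as follows. This is a minimal illustration, assuming d = 8 neighbors at radius r = 1 and a grey-scale image stored as a 2D array; the function names are ours, not part of the original method.

```python
import numpy as np

# Offsets of the d = 8 neighbors at radius r = 1, in a fixed order.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_pattern(image, row, col):
    """Equations 1 and 2: threshold each neighbor against the center
    pixel and concatenate the d resulting bits into a binary string."""
    center = image[row, col]
    bits = ["1" if image[row + dr, col + dc] >= center else "0"
            for dr, dc in NEIGHBORS]
    return "".join(bits)

def patch_histogram(image, top, left, height, width):
    """Histogram of pattern frequencies over one patch; patch histograms
    are concatenated across patches to form the feature vector (FV)."""
    hist = {}
    for r in range(top + 1, top + height - 1):      # interior pixels only,
        for c in range(left + 1, left + width - 1): # so every neighbor exists
            p = lbp_pattern(image, r, c)
            hist[p] = hist.get(p, 0) + 1
    return hist
```

Concatenating the per-patch histograms, in a fixed patch order, yields the FV compared under the Manhattan or Euclidean distance.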
GECs

GEFEML uses GECs to evolve FEs (Shelton et al. 2012c). The resulting FEs have been shown to have high recognition rates. A GEC uses artificial evolution to evolve a population of candidate solutions (CSs) to a particular problem. Initially, a population of CSs is randomly generated. Each CS in the population is then assigned a fitness based on a user-specified evaluation function. Parent CSs are then selected based on their fitness and allowed to create offspring using a number of recombination and mutation techniques (Spears and DeJong 1991). After the offspring are created, they are evaluated and typically replace the weaker members of the previous population. The process of selecting parents, creating offspring, and replacing weaker CSs is repeated until a user-specified stopping condition is met.

GEFEML

GEFEML evolves LBP-based FEs using a GEC, so each FE must be represented as a CS. GEFEML represents an FE, fe_i, as a six-tuple <X_i, Y_i, W_i, H_i, M_i, f_i>. The set X_i = {x_i,0, x_i,1, ..., x_i,n-1} represents the x-coordinates of the center pixels of the n possible patches, and Y_i = {y_i,0, y_i,1, ..., y_i,n-1} represents the y-coordinates. The widths and heights of the n patches are represented by W_i = {w_i,0, w_i,1, ..., w_i,n-1} and H_i = {h_i,0, h_i,1, ..., h_i,n-1}. Because the patches are uniform, w_k,0 = w_k,1 = ... = w_k,n-1 and h_k,0 = h_k,1 = ... = h_k,n-1, meaning that the widths and heights of every patch are the same. Uniform-sized patches are used because they outperformed non-uniform-sized patches in (Shelton et al. 2011b). M_i = {m_i,0, m_i,1, ..., m_i,n-1} represents the masking values for the patches, and f_i represents the fitness of fe_i. A masking value determines whether a patch is activated or deactivated. If a patch is deactivated, by setting m_i,j = 0, then its sub-histogram is not considered in the distance measure, and the number of features used in comparisons is reduced. Otherwise, the patch is activated, with m_i,j = 1.

The fitness f_i is determined by how many incorrect matches fe_i makes on a training dataset D and by how much of the image is processed by fe_i. The dataset D is composed of multiple snapshots of subjects and is divided into two subsets, a probe set and a gallery set. The fe_i is applied to both the probe set and the gallery set to create FVs for each set. A distance measure is used to compare FVs in the probe to FVs in the gallery, and the smallest distance is considered a match. If the FV of an individual in the probe is incorrectly matched with the FV of another individual in the gallery, that is counted as an error. The fitness, shown in Equation 3, is the number of errors, ε(D), multiplied by 10, plus the percentage of the image space being processed, π(fe_i):

    f_i = 10 ε(D) + π(fe_i)                        (3)

To prevent FEs from overfitting the training set during the evolutionary process, cross-validation is used to identify the FEs that generalize well to a dataset of unseen subjects. While offspring are applied to the training dataset to be evaluated, they are also applied to a mutually exclusive validation dataset that does not affect the evolutionary process. The offspring with the best performance on the validation dataset is recorded regardless of its performance on the training set.

The Two-stage Process for Developing a Hyper FE and a PDF

Stage I: Hyper FE/PDF

The hyper FE is constructed by taking a set of FEs from GEFEML and overlaying them. Figure 1a shows a set of sample FEs, while Figure 1b shows a sample hyper FE. After the hyper FE is constructed, a PDF, in the form of a matrix, is created. Each position in the matrix contains the number of patches a pixel was contained in. When patches in an FE overlap a position multiple times, the overlap is counted as well, so if each of the κ FEs had n patches, the greatest number of times a pixel could be contained in a patch would be n * κ. Figure 1c shows a 3D plot of a PDF, while Figure 1d shows the 3D plot laid over a facial image.
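Stage I can be sketched as follows: a minimal illustration, assuming each FE is given as a list of patches (x, y, w, h) centered at (x, y); build_pdf is a hypothetical helper name, not the authors' code.

```python
import numpy as np

def build_pdf(fes, height, width):
    """Overlay the patches of a set of FEs (the hyper FE) and count,
    for each pixel position, how many patches contain it; overlapping
    patches are counted once per patch."""
    pdf = np.zeros((height, width), dtype=int)
    for fe in fes:
        for (x, y, w, h) in fe:
            top, left = max(0, y - h // 2), max(0, x - w // 2)
            pdf[top:y + h // 2 + 1, left:x + w // 2 + 1] += 1
    return pdf
```

With κ FEs of n patches each, no entry of the matrix can exceed n * κ, matching the bound stated above.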
Figure 1: (a) Set of FEs, (b) hyper FE, (c) 3D plot of PDF, and (d) overlay of the 3D plot on a facial image

Stage II: Developing dFEs

A dFE can be defined by the number of clusters it has, α, the selection pressure of tournament selection, µ, and the patch resolution, ρ. The variables µ and ρ are expressed as percentages, i.e., values between 0 and 1. Assuming that β represents the number of pixels a user wants per cluster, α*ρ*β positions will be selected to be clustered. Tournament selection selects µ*σ pixels to compete for clustering, where σ represents the total number of positions in the PDF that have been processed at least once. When performing tournament selection, the position with the greatest consistency is the winner; if there is a tie, the first selected position wins. Winning pixels are selected without replacement.

After the α*ρ*β pixels have been selected via tournament selection, α random cluster centers are placed within the PDF. The distance between each selected position and each center is computed, and the pixel is clustered towards the closest center. After pixels have been assigned to clusters, those pixels undergo LBP feature extraction to extract texture patterns for each cluster. Due to the random placement of the centers, different clusters may have different numbers of pixels assigned to them. The clusters are similar to patches: a histogram is associated with each cluster, the extracted patterns are used to build the histogram, and the histograms ultimately form FVs for images.

Experiments

Two hyper FEs were used in this experiment: (a) a hyper FE composed of a set of FEs that performed well on the training set, HFEtrn, and (b) a hyper FE composed of a set of the best performing FEs on the validation set, HFEval. The FEs were evolved using the experimental setup in Shelton et al. (2012c), which used GEFEML. GEFEML was run 30 times using increments of 1000, 2000, 3000 and 4000 evaluations. An EDA instance (Larranaga and Lozano 2002) of GEFEML was used with a population of 20 FEs and an elite of 1, meaning that every generation starting from the second contained the single best performing FE of the previous generation. On each run, GEFEML returned the best performing FE on the training set and the best performing FE with respect to the validation set. The FEs were trained and validated on two mutually exclusive sets and were then applied to a test set. The datasets were composed of subjects from the Facial Recognition Grand Challenge database (FRGC) (Phillips et al. 2005). The training set was composed of 100 subjects (FRGC-100trn), the validation set of 109 subjects (FRGC-109), and the test set of 100 subjects (FRGC-100tst).

The average number of patches used by the set of generalizing FEs, as well as the average number of pixels processed per patch, were calculated in order to set a starting point for this experiment. On average, 12 patches were activated and 504 pixels were processed per patch using GEFEML.

In this experiment, instances of 16, 12, 8 and 4 clusters were tested, along with different patch resolutions, which control the number of pixels that can belong to a cluster. In this experiment, β = 504, the average number of pixels in the patches of the set of FEs from GEFEML. Instances of DFE with patch resolutions of 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2 and 0.1 were run.
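The Stage II sampling and clustering described in the previous section can be sketched as follows. This is a minimal illustration under stated assumptions (a NumPy PDF matrix, Euclidean distance to the centers, and parameter names chosen by us); it is not the authors' implementation.

```python
import numpy as np

def k_tournament_sample(pdf, n_pixels, mu, rng):
    """Select n_pixels PDF positions without replacement. Each tournament
    draws mu * sigma candidates (at least 2) from the positions processed
    at least once; the most consistent candidate wins, ties going to the
    first one drawn."""
    positions = np.argwhere(pdf > 0)
    sigma = len(positions)                    # positions processed >= once
    k = max(2, int(mu * sigma))               # pressure 0.0 still compares 2
    chosen = []
    available = np.ones(sigma, dtype=bool)
    for _ in range(n_pixels):
        idx = rng.choice(np.flatnonzero(available),
                         size=min(k, int(available.sum())), replace=False)
        counts = pdf[tuple(positions[idx].T)]
        winner = idx[np.argmax(counts)]       # first maximum wins ties
        chosen.append(tuple(positions[winner]))
        available[winner] = False             # without replacement
    return chosen

def cluster_pixels(pixels, alpha, rng):
    """Assign each selected pixel to the nearest of alpha random centers."""
    pts = np.array(pixels, dtype=float)
    centers = pts[rng.choice(len(pts), size=alpha, replace=False)]
    dists = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return [pts[labels == c] for c in range(alpha)]
```

Here n_pixels would be α·ρ·β; because the centers are placed randomly, the resulting clusters can hold different numbers of pixels, as noted in the text.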
Each resolution used selection pressures from 0.0 (where the number of pixels compared in tournament selection is actually 2) to 1.0, at every tenth percentage in between. A dFE instance is defined by a cluster count, patch resolution, and selection pressure, giving a total of 880 dFEs (4 cluster counts * 10 patch resolutions * 11 selection pressures * 2 hyper FEs), and each dFE instance was run 30 times. For each run, the dFE was applied to FRGC-100tst.

Results

The results were obtained by running each dFE listed in Section 4 on FRGC-100tst. To compare the effectiveness of each method, we compare the results of the different selection pressures within a given cluster count and resolution. The results of the best selection pressure for a resolution are compared to the best selection pressures of every other resolution within the cluster group, and this is done for every cluster count. After the best performing FEs are obtained for each cluster count, they are compared to each other as well as to the results of GEFEML. Results are compared using an ANOVA test and a t-test on the recognition accuracies for each cluster-resolution-selection pressure instance.

Table I shows the results of this experiment. The first column shows the methods used. The method DFEval represents dFEs that sampled the HFEval, while the method DFEtrn represents dFEs that sampled the HFEtrn. The two methods are compared to the original GEFEML method, shown as GEFEML. The second column, Feature Extractor, shows the number of clusters used, the resolution, and the selection pressure for a dFE. The third and fourth columns show the computational complexity (CC) and the average recognition accuracy (Acc), respectively, for each method. The computational complexity is the number of pixels processed, or extracted, by each method. Though 880 dFEs were tested, the only ones shown are those that produced superior results to GEFEML.

Table I: Results of DFEval, DFEtrn and GEFEML

Method   Feature Extractor   CC       Acc
DFEval   <16,1.0,0.8>        8064.0   99.82%
         <16,0.9,0.5>        7257.6   99.69%
         <16,0.8,0.4>        6451.2   99.85%
         <16,0.7,0.5>        5644.8   99.60%
         <12,1.0,0.2>        6048.0   99.66%
         <12,0.9,1.0>        5443.2   99.39%
DFEtrn   <12,0.5,0.2>        3024.0   98.70%
GEFEML   ----                6048.0   99.10%

For DFEval, each dFE shown in Table I outperformed GEFEML in terms of recognition accuracy. For DFEtrn, the dFE <12,0.5,0.2> was statistically equivalent to the FEs evolved using GEFEML. Though we compare results based on recognition accuracy, we also considered computational complexity. The results show that the <12,0.5,0.2> dFE (of DFEtrn) outperforms GEFEML in terms of computational complexity, and that the <12,0.9,0.1> instance of DFEval outperformed DFEtrn and GEFEML in terms of recognition accuracy as well as computational complexity. These results are promising in terms of both the recognition performance and the feature reduction of DFE.

Conclusion and Future Work

The results of the experiment suggest that the HFEval produces dFEs that generalize well to unseen subjects. The dFEs resulting from the HFEtrn also generalized well, but were not as effective as the dFEs resulting from the HFEval. The dFEs built from both hyper FEs performed better than the set of generalizing FEs from GEFEML. Future work will be devoted to using additional GECs for the DFE.

Acknowledgment

This research was funded by the Office of the Director of National Intelligence (ODNI), Center for Academic Excellence (CAE) for the multi-university Center for Advanced Studies in Identity Sciences (CASIS), NSF S-STEM, Lockheed Martin, and the National Science Foundation (NSF) Science & Technology Center: Bio/computational Evolution in Action CONsortium (BEACON). The authors would like to thank the ODNI, the NSF, and Lockheed Martin for their support of this research.

References

Adams, J.; Woodard, D.L.; Dozier, G.; Miller, P.; Bryant, K.; Glenn, G. "Genetic-Based Type II Feature Extraction for Periocular Biometric Recognition: Less is More," Proceedings of the 20th International Conference on Pattern Recognition (ICPR), pp. 205-208, 23-26 Aug. 2010.

Ahonen, T.; Hadid, A.; Pietikainen, M. "Face Description with Local Binary Patterns: Application to Face Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, Dec. 2006.

Alford, A.; Popplewell, K.; Dozier, G.; Bryant, K.; Kelly, J.; Adams, J.; Abegaz, T.; Shelton, J. "GEFeWS: A Hybrid Genetic-Based Feature Weighting and Selection Algorithm for Multi-Biometric Recognition," Proceedings of the 2011 Midwest Artificial Intelligence and Cognitive Science Conference (MAICS), April 16-17, Cincinnati, OH, 2011.

Alford, A.; Steed, C.; Jeffrey, M.; Sweet, D.; Shelton, J.; Small, L.; Leflore, D.; Dozier, G.; Bryant, K.; Abegaz, T.; Kelly, J.C.; Ricanek, K. "Genetic & Evolutionary Biometrics: Hybrid Feature Selection and Weighting for a Multi-Modal Biometric System," IEEE SoutheastCon, 2012.

Dozier, G.; Purrington, K.; Popplewell, K.; Shelton, J.; Bryant, K.; Adams, J.; Woodard, D.L.; Miller, P. "GEFeS: Genetic & Evolutionary Feature Selection for Periocular Biometric Recognition," Proceedings of the 2011 IEEE Workshop on Computational Intelligence in Biometrics and Identity Management (CIBIM-2011), April 11-15, Paris, France, 2011.

Galbally, J.; Fierrez, J.; Freire, M.R.; Ortega-Garcia, J. "Feature Selection Based on Genetic Algorithms for On-Line Signature Verification," 2007 IEEE Workshop on Automatic Identification Advanced Technologies, pp. 198-203, 7-8 June 2007.

Kumar, D.; Kumar, S.; Rai, C.S. "Feature selection for face recognition: a memetic algorithmic approach," Journal of Zhejiang University Science A, vol. 10, no. 8, pp. 1140-1152, 2009.

Larranaga, P.; Lozano, J.A. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, 2002.

Miller, B.L.; Goldberg, D.E. "Genetic algorithms, selection schemes, and the varying effects of noise," Evolutionary Computation, vol. 4, no. 2, pp. 113-131, June 1996.

Ojala, T.; Pietikainen, M. "Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 971-987, 2002.

Phillips, P.J.; Flynn, P.J.; Scruggs, T.; Bowyer, K.W.; Chang, J.; Hoffman, K.; Marques, J.; Min, J.; Worek, W. "Overview of the face recognition grand challenge," IEEE Conference on Computer Vision and Pattern Recognition, 2005.

Popplewell, K.; Dozier, G.; Bryant, K.; Alford, A.; Adams, A.; Abegaz, T.; Purrington, K.; Shelton, J. "A Comparison of Genetic Feature Selection and Weighting Techniques for Multi-Biometric Recognition," Proceedings of the 2011 ACM Southeast Conference, March 24-26, Kennesaw, GA, 2011.

Ramadan, R.M.; Abdel-Kader, R.F. "Face Recognition Using Particle Swarm Optimization-Based Selected Features," International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 2, no. 2, June 2009.

Shelton, J.; Dozier, G.; Bryant, K.; Smalls, L.; Adams, J.; Popplewell, K.; Abegaz, T.; Woodard, D.; Ricanek, K. "Comparison of Genetic-based Feature Extraction Methods for Facial Recognition," Proceedings of the 2011 Midwest Artificial Intelligence and Cognitive Science Conference (MAICS-2011), April 16-17, Cincinnati, OH, 2011a.

Shelton, J.; Dozier, G.; Bryant, K.; Smalls, L.; Adams, J.; Popplewell, K.; Abegaz, T.; Woodard, D.; Ricanek, K. "Genetic and Evolutionary Feature Extraction via X-TOOLSS," Proceedings of the 8th Annual International Conference on Genetic and Evolutionary Methods (GEM), 2011b.

Shelton, J.; Bryant, K.; Abrams, S.; Small, L.; Adams, J.; Leflore, D.; Alford, A.; Ricanek, K.; Dozier, G. "Genetic & Evolutionary Biometric Security: Disposable Feature Extractors for Mitigating Biometric Replay Attacks," Proceedings of the 10th Annual Conference on Systems Engineering Research (CSER), 2012a.

Shelton, J.; Dozier, G.; Adams, J.; Alford, A. "Permutation-Based Biometric Authentication Protocols for Mitigating Replay Attacks," Proceedings of the 2012 IEEE World Congress on Computational Intelligence (WCCI 2012), 2012b.

Shelton, J.; Alford, A.; Abegaz, T.; Small, L.; Leflore, D.; Williams, J.; Adams, J.; Dozier, G.; Bryant, K. "Genetic & Evolutionary Biometrics: Feature Extraction from a Machine Learning Perspective," Proceedings of IEEE SoutheastCon, 2012c.

Spears, W.M.; DeJong, K.A. "An analysis of multi-point crossover," Proceedings of the 1990 Workshop on the Foundations of Genetic Algorithms, pp. 301-315, 1991.

Tsekeridou, S.; Pitas, I. "Facial feature extraction in frontal views using biometric analogies," Proceedings of the IX European Signal Processing Conference, I:315-318, 1998.

Zhao, W.; Chellappa, R.; Phillips, P.J.; Rosenfeld, A. "Face recognition: A literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399-458, December 2003.