=Paper=
{{Paper
|id=Vol-1584/paper21
|storemode=property
|title=Comparison of Recent Machine Learning Techniques for Gender Recognition from Facial Images
|pdfUrl=https://ceur-ws.org/Vol-1584/paper21.pdf
|volume=Vol-1584
|authors=Joseph Lemley,Sami Abdul-Wahid,Dipayan Banik,Răzvan Andonie
|dblpUrl=https://dblp.org/rec/conf/maics/LemleyABA16
}}
==Comparison of Recent Machine Learning Techniques for Gender Recognition from Facial Images==
Joseph Lemley et al. MAICS 2016 pp. 97–102 Comparison of Recent Machine Learning Techniques for Gender Recognition from Facial Images Joseph Lemley Sami Abdul-Wahid Dipayan Banik Răzvan Andonie Computer Science Department Computer Science Department Computer Science Department Computer Science Department Central Washington University Central Washington University Central Washington University Central Washington University Ellensburg, WA, USA Ellensburg, WA, USA Ellensburg, WA, USA Ellensburg, WA, USA and Electronics and Computers Department Transilvania University Braşov, Romania Abstract hibit the diversity of subjects, settings, and qualities typical of everyday scenes. Recently, several machine learning methods for gender classi- fication from frontal facial images have been proposed. Their The diversity of the methods and benchmarks makes variety suggests that there is not a unique or generic solution a comparison between gender classification a challenging to this problem. In addition to the diversity of methods, there task, and this gave us the motivation for our work. We is also a diversity of benchmarks used to assess them. This compare state-of-the-art methods used in automatic gender gave us the motivation for our work: to select and compare in recognition on two benchmarks: the most popular standard a concise but reliable way the main state-of-the-art methods dataset Facial Recognition Technology (FERET) (Phillips et used in automatic gender recognition. As expected, there is al. 2000) and a more challenging data set of “in the wild” no overall winner. The winner, based on the accuracy of the images (Adience) (Eidinger, Enbar, and Hassner 2014). classification, depends on the type of benchmarks used. We only compare the accuracy of the classification and not other performance measures (precision, recall, F1 score, Introduction etc). The main reason is that the misclassification cost in this A major goal of computer vision and artificial intelligence is particular problem is the same, regardless if we misclassify a to build computers that can understand or classify concepts male or a female. We also do not compare the running time, such as gender in the same way humans do. since the experiments are performed on different computer Automatic classification of gender from frontal face im- architectures (the CNN is implemented on a GPU). ages taken under contrived conditions has been well studied with impressive results. The variety of methods published Related work: recent gender classification in the literature show that there is not a unique or generic methods solution to the gender classification problem. Applications of gender classification include, image Classifiers such as SVMs and feedforward NNs are often search, automatic annotation of images, security systems, used to classify images after the faces have been cropped out face recognition, and real time image acquisition on smart from the rest of the image, and possibly aligned and normal- phones and mobile devices. ized. Various feature extraction methods such as Principal The state-of-the-art gender classification methods gen- Component Analysis (PCA), independent component analy- erally fall into the following main categories: Convolu- sis, Fischer linear discriminants (Belhumeur, Hespanha, and tional Neural Networks (CNN), Dual Tree Complex Wavelet Kriegman 1997) (Wu et al. 2015), and edge detection al- Transform (DTCWT) + a Support Vector Machine (SVM) gorithms can be used to encode useful information from classifier, and feature extraction techniques such as Principal the image that is fed into the classifier, leading to high lev- Component Analysis (PCA), Histograms of Oriented Gradi- els of accuracy on many benchmarks. Other approaches use ents (HOG) and others with a classifier (SVM, kNN, etc). hand-crafted template features to find facial keypoints such The SVM approach is natural, since we have a two class as nose, eyes etc, while also using edge detection methods problem. The CNN is related to the well-known deep learn- (Sobel) and line intensities to separate facial edges from ing paradigm. The DTCWT provides approximate shift in- wrinkles. The resulting feature information, when fed into variance and directionally selective filters (properties lack- a feedforward neural network, allows age and gender to be ing in the traditional wavelet transform) while preserving classified with overall 85% accuracy on two test sets with the usual properties of perfect reconstruction and compu- a total of 172 images in the FERET and FGNET databases tational efficiency with good well-balanced frequency re- (Kalansuriya and Dharmaratne 2014). sponses (Kingsbury 2001). LDA (Linear Discriminant Analysis) based approaches to To assess gender classification techniques, two types of the face recognition task promise invariance to differing il- benchmarks may be used: standard posed datasets (with well luminations (Belhumeur, Hespanha, and Kriegman 1997). defined backgrounds, lighting and photographic characteris- This has been further studied in (Bekios-Calfa, Buena- tics) and datasets containing “In the wild” images that ex- posada, and Baumela 2011). Fisher linear discriminant max- 97 Joseph Lemley et al. MAICS 2016 pp. 97–102 imizes the ratio of between- class scatter to that of within- deal of success in dealing with images of subjects and ob- class scatter. Independent component analysis has been used jects in natural non-contrived settings, along with handling on a small subset (500 images) of the FERET dataset, lead- the rich diversity that these images entail. One investigation ing to 96% accuracy with an SVM classifier (Jain, Huang, of CNN fundamentals involved training a CNN to classify and Fang 2005). Likewise, PCA has been used in conjunc- gender on images collected on the Internet. 88% classifica- tion with a genetic algorithm that eliminated potentially un- tion accuracy was achieved after incorporating L2 regular- necessary features. The remaining features were then fed ization into training, and filters were shown to respond to to a feedforward neural network for training, and an over- the same features that neuroscientists have identified as fun- all 85% accuracy was obtained over 3 data sets (Sun et al. damental cues humans use in gender classification (Verma 2002). Various information theory based metrics were also and Vig 2014). Another experiment (Levi and Hassner 2015) fused together to produce 99.13% gender classification ac- uses a convolutional neural network on the Adience dataset curacy on the FERET (Perez et al. 2012). To overcome the for gender and age recognition. They used data augmenta- challenge of inadequate contrast among facial features using tion and face cropping to achieve 86% accuracy for gender histogram analysis, Haar wavelet transformation and Ad- classification. This is the only paper we know of that uses aboost learning techniques have been employed, resulting in CNN on Adience. a 97.3% accuracy on the Extended Yale face database which A method recently proposed by (Eidinger, Enbar, and contains 17 subjects under 576 viewing conditions (Laytner, Hassner 2014) uses an SVM with dropout, a technique in- Ling, and Xiao 2014). Another experiment describes how spired from newer deep learning methods, that has shown various transformations, such as noise and geometric trans- promise for age and gender estimation. Dropout involves formations, were fed in combination into a series of RBFs dropping a certain percent of features randomly during train- (Radial Basis Functions). RBF outputs were forwarded into ing. They also introduce the Adiance dataset to fulfill the a symbolic decision tree that outputs gender and ethnic class. need for a set of realistic labeled images for gender and 94% classification accuracy was obtained using the hybrid age recognition in quantities needed to prevent overfitting architecture on the FERET database (Gutta, Wechsler, and and allow true generalization (Eidinger, Enbar, and Hassner Phillips 1998). 2014). HOG (Histogram of Oriented Gradients) is commonly As we can see, most of the state-of-the-art methods for used as a global feature extraction technique that expresses gender classification fall into the categories described in information about the directions of curvatures of an image. Section . HOG features can capture information about local edge and gradient structures while maintaining degrees of invariance Data sets to moderate changes in illumination, shadowing, object lo- A number of databases exist that can be used to benchmark cation, and 2D rotation. HOG descriptors, combined with gender classification algorithms. Most image sets that con- SVM classifiers, can be used as a global feature extraction tain gender labels suffer from insufficient size, and because mechanism (Torrione et al. 2014), while HOG descriptors of this we chose two of the larger publicly available datasets: can be used on locations indicated by landmark-finding soft- Color-FERET (Phillips et al. 2000) and Adience (Eidinger, ware in areas such as facial expression classification (Déniz Enbar, and Hassner 2014). et al. 2011). One useful application of variations in HOG de- scriptors is the automatic detection of pedestrians, which is made easier in part because of their predominantly upright pose (Dalal and Triggs 2005). In addition, near perfect re- sults were obtained in facial expression classification when HOG descriptors were used to extract features from faces that were isolated through face-finding software (Carcagnı̀ et al. 2015). A recent technique proposed for face recognition is the Figure 1: Randomly selected images from the Adience DTCWT, due to its ability to improve operation under vary- dataset illustrating the wider range of photographic condi- ing illumination and shift conditions when compared to Ga- tions found. bor Wavelets and DWT (Discrete Wavelet Transform). The Extended Yale B and AR face databases were used, contain- Color FERET Version 2 was collected between Decem- ing a total 16128 images of 38 human subjects under 9 poses ber 1993 and August 1996 and made freely available with and 64 illumination conditions. It achieved 98% classifica- the intent of promoting the development of face recognition tion accuracy in the best illumination condition, while low algorithms. Images in the FERET Color database are 512 frequency subband image at scale one (L1) achieved 100% by 768 pixels and are in PPM format. They are labeled with (Sultana et al. 2014). gender, pose, name, and other useful labels. Recent years have seen great success in image related Although FERET contains a large number of high qual- problems through the use of CNN, thereby seeing the prolif- ity images in different poses and with varying face obstruc- eration of a scalable and more or less universal algorithmic tions (beards, glasses, etc), they all have certain similarities approach to solving general image processing problems, if in quality, background, pose, and lighting which make them enough training data is available. CNNs have had a great very easy for modern machine learning methods to correctly 98 Joseph Lemley et al. MAICS 2016 pp. 97–102 and Adience, a set of labeled unfiltered images intended to be especially challenging for modern machine learning al- gorithms. Adience was designed to present all variations in appearance, noise, pose, and lighting, that can be expected of images taken without careful preparation or posing. (Ei- dinger, Enbar, and Hassner 2014) We use the following steps in conducting our experi- ments: Figure 2: Randomly selected images from the FERET dataset show similarities in lighting, pose, subject, back- • Uniformly shuffle the order of images. ground, and other photographic conditions. • Use 70% as training set. 30% as testing set. • Train with training set. classify. We used all 11338 images in FERET for which gen- • Record correct classification rate on testing set. der labels exist in our experiments. As machine learning algorithms are increasingly used to Steps 1-4 are repeated 10 times for each experiment using process images of varying quality with vast differences in freshly initialized classifiers. scale, obstructions, focus, and which are often acquired with We report the results of 18 experiments, 16 of which use consumer devices such as web cams or cellphones, bench- SVMs and two of which use CNN. marks such as FERET have become less useful. To address this issue, datasets such as LWS (labeled faces in the wild) SVM classification and most recently Adience have emerged. LWS lacks gen- Both linear and RBF kernels were used, with each consti- der labels but it has accurate names from which gender can tuting a separate experiment using the SVC implementation often be deduced automatically with reasonable accuracy. included as part of scikit-learn(Pedregosa et al. 2011) with Adience is a recently released benchmark that contains C = 100 parameter set. gender and approximate age labels separated into 5 folds to In one experiment, raw pixels are fed into the SVM. Other allow duplication of results published by the database au- experiments used the following feature extraction methods: thors. It was created by collecting Flickr images and is in- PCA, HOG, and DTCWT. Feature extraction was applied tended to capture all variations of pose, noise, lighting, and to images uniformly without using face finding software to image quality. Each image is labeled with age and gender. isolate and align the face. It is designed to mimic the challenges of ”real world” im- age classification tasks where faces can be partly obscured Histogram of Oriented Gradients or even partly overlapping (for example in a crowd or when HOG descriptors, combined with SVM classifiers, can be an adult is holding a child and both are looking at the cam- used as a global feature extraction mechanism (Torrione et era). Eidlinger, et al published a paper where they used a al. 2014), while HOG descriptors can be used on locations SVM and filtering to classify age and gender along with the indicated by landmark-finding software in areas such as fa- release of Adience (Eidinger, Enbar, and Hassner 2014). cial expression classification (Déniz et al. 2011). We used all 19370 of the aligned images from Adience One application of HOG descriptors is the automatic de- that had gender labels, to create our training and testing sets tection of pedestrians, which is made easier in part because for all experiments that used Adience. of their predominantly upright pose (Dalal and Triggs 2005). Using the included labels and meta data in the FERET and We use the standard HOG implementation from the scikit- Adience datasets, we generated two files containing reduced image library (van der Walt et al. 2014). size 51x51 pixel data with values normalized between 0 and For every image in the Adience and FERET databases, 1, followed by a gender label. We choose to resize the im- HOG descriptors were uniformly calculated. 9 orientation ages to 51x51 because this produced the best quality images bins were used, and each histogram was calculated based on after Anti-Aliasing gradient orientations in the 7x7 pixel non-overlapping cells. Normalization was done within each cell (i.e., 1 x 1). The Classification methods result was fed into a SVM (SVC class from scikit-learn). We compare the accuracy of CNN and several SVM based Training and testing on both Adience and FERET was classifiers. We limit ourselves to methods involving these performed separately. 30% of images in each database were two approaches because they are among the most effective used for testing, and the rest for training. For each database, and most prevalently used methods reported in the literature after reading the data into arrays, the arrays were shuffled for Gender Classification. and then the testing and training set were separated. Train- Gender classifications with SVM perform on the raw im- ing and testing were repeated 10 times with freshly shuffled age pixels along with different well known feature extrac- data. tion methods, namely DTCWT, PCA, and HOG. Training is done separately on two widely differing datasets consisting Principal Component Analysis of gender labeled human faces: Color FERET, a set of im- PCA is a statistical method for finding correlations between ages taken under similar conditions with good image quality, features in data. When used on images of faces the resulting 99 Joseph Lemley et al. MAICS 2016 pp. 97–102 images are often referred to as Eigenfaces. PCA is used for Table 1: Mean Classification accuracy and Standard Devi- reducing dimensionality of data by eliminating non-essential ation for different methods on the Adience dataset over 10 information from the dataset and is frequently used in both runs. 70% of images used for training and 30% used for test- image processing and machine learning. ing. To create the Eigenfaces we used the RandomizedPCA tool within scikit-learn, which is based on work by (Halko, Method Mean SD Martinsson, and Tropp 2011) and (Martinsson, Rokhlin, and CNN 96.1% 0.0029 Tygert 2011). The resulting Eigenfaces were then used in a PCA+SVM[RBF] 77.4% 0.0071 linear and RBF SVM. SVM[RBF] 77.3% 0.0046 HOG + SVM[RBF] 75.8% 0.006 Convolutional Neural Network HOG+SVM[linear] 75% 0.0053 For the learning stage we used a convolutional neural net- PCA+ SVM[linear] 72 % 0.0032 work with 3 hidden convolutional layers and one softmax SVM[linear] 70.2% 0.0052 layer. The training was done using a GTX Titan X GPU DTCWT on SVM[RBF] 68.5% 0.0059 using the Theano based library Pylearn2 and CUDNN li- DTCWT on SVM[linear] 59% 0.0046 braries. Stochastic gradient descent was used as the training algorithm with a momentum of 0.95, found by trial and error. Table 2: Mean Classification accuracy and Standard Devi- Learning rates under 0.001 did not show any improvement. ation for different methods on the FERET dataset over 10 Increasing the learning rate above around 0.005 results in runs. 70% of images used for training and 30% used for test- decreased classification accuracy. ing. A general outline of the structure of our CNN is: Method Mean SD • Hidden layer 1: A Rectified Linear Convolutional Layer CNN 97.9% 0.0058 using a kernel shape of 4x4, a pool shape of 2x2, a pool DTCWT on SVM[RBF] 90.7% 0.0047 stride of 2x2 and 128 output channels. Initial weights are PCA+SVM[RBF] 90.2% 0.0063 randomly selected with a range of 0.5. SVM[RBF] 87.1% 0.0053 • Hidden layer 2: A Rectified Linear Convolutional Layer HOG+SVM[RBF] 85.6% 0.0042 using a kernel shape of 4x4, a pool shape of 2x2, a pool HOG+SVM[linear] 84.6% 0.0024 stride of 2x2 and 256 output channels. Initial weights are DTCWT on SVM[linear] 83.3% 0.0047 randomly selected with a range of 0.5. PCA+SVM[linear] 81% 0.0071 • Hidden layer 3: A Rectified Linear Convolutional Layer SVM[linear] 76.5% 0.0099 using a kernel shape of 3x3, a pool shape of 2x2, a pool stride of 2x2 and 512 output channels. Initial weights are randomly selected with a range of 0.5. PCA ties with DTCWT on the best performance on FERET but performs better than DTCWT on Adiance. As expected • Softmax layer: Initial weights randomly set between 0 and RBF methods performed better than linear SVM classifiers, 0.5. Output is the class (male or female). however unexpectedly this did not hold true for Adiance, where differences in filters were enough to cancel out the Experimental results effect of RBF in some cases. Every time we used a filter Tables 1 and 2 summarize the classification accuracy of each on FERET RBF was better than linear with filters. This did approach on each data-set after random shuffling and sep- not hold for Adience. None of the filters worked particularly aration into 70% training and 30% testing sets. For each well on Adience, with only PCA slightly outperforming raw method the grayscale pixels were used as the features, ei- pixels for the RBF classifier. ther directly to the classifier, or to the filter mentioned. For On the FERET dataset DTCWT is better (90% vs 86%). example, HOG+SVM[RBF] indicates that we use the pixels On Adience, it is worse (6̃7% vs 77%). This would lend as input to a HOG filter, the output of which is used as the support to the idea that DTCWT seems to work better (in input to a SVM with an RBF kernel. theory) on images that are more similar to FERET (uniform DTCWT was both the second best method (after CNN) lighting, no complex backgrounds, no extreme warping, pix- and the very worst method we examined; its performance elation, or blurring ). has the greatest degree of variability depending on the Using an initial momentum of 0.95 tended to promote fast dataset. It performs very well when objects are consistent convergence without getting stuck in local minimum. We in location and scale. CNN outperformed all methods. Even use a momentum of 0.95 and a learning rate of 0.001. the worst CNN experiment on the most difficult dataset per- Using this setup we have achieved an average valid classi- formed better than the best of any other method on the eas- fication rate of 98% on FERET and 96% on Adience which iest dataset. This is not a surprising outcome. We wanted is better than the previous highest reported results according to see if HOG alone was sufficient to increase classification to (Levi and Hassner 2015) on Adience, but we do not rec- accuracy as a filter. We found that HOG filters with SVM, ommend direct comparison of our results with theirs because without the usual additional models, provide no benefit on of different experimental protocols used. their own over raw pixel values for this experimental setup. One of our aims is to investigate the use of the dual tree 100 Joseph Lemley et al. MAICS 2016 pp. 97–102 complex wavelet transform (DTCWT) on the face feature Déniz, O.; Bueno, G.; Salido, J.; and De la Torre, F. 2011. classification task. Several recent papers report success in Face recognition using histograms of oriented gradients. using DTCWT in gender recognition from frontal face im- Pattern Recognition Letters 32(12):1598–1603. ages citing the benefits of partial rotation invariance. It is Eidinger, E.; Enbar, R.; and Hassner, T. 2014. Age and somewhat unclear how to best use this for “In the wild” im- gender estimation of unfiltered faces. Information Forensics ages. and Security, IEEE Transactions on 9(12):2170–2179. Gutta, S.; Wechsler, H.; and Phillips, P. J. 1998. Gender and Conclusions ethnic classification of face images. In Automatic Face and Much of the previous work on automatic gender classifica- Gesture Recognition, 1998. Proceedings. Third IEEE Inter- tion use differing datasets and experimental protocols that national Conference on, 194–199. IEEE. can make direct comparisons between reported results mis- Halko, N.; Martinsson, P.-G.; and Tropp, J. A. 2011. Finding leading. We have compared nine different machine learning structure with randomness: Probabilistic algorithms for con- methods used in gender recognition on two benchmarks, us- structing approximate matrix decompositions. SIAM review ing identical research methodology to allow a direct com- 53(2):217–288. parison between the efficacies of the different classifiers and Jain, A.; Huang, J.; and Fang, S. 2005. Gender identifi- feature extraction methods. In addition to providing updated cation using frontal facial images. In Multimedia and Expo, information on the effectiveness of these algorithms, we pro- 2005. ICME 2005. IEEE International Conference on, 4–pp. vide directly comparable results. IEEE. The aim of our study was to explore gender classifica- Kalansuriya, T. R., and Dharmaratne, A. T. 2014. Neural tion using recent learning algorithms. We carried out experi- network based age and gender classification for facial im- ments on several state-of-the-art gender classification meth- ages. ICTer 7(2). ods. We compared the accuracy of these methods on two very different data sets (“In the wild” verses posed images). Kingsbury, N. 2001. Complex wavelets for shift invariant To the extent of our knowledge, this is the first use of analysis and filtering of signals. Applied and Computational DTCWT on a large 15, 000 database of “in the wild” im- Harmonic Analysis 10(3):234 – 253. ages, specifically addressing gender classification. We have Laytner, P.; Ling, C.; and Xiao, Q. 2014. Robust face de- achieved an average accuracy of 98% (FERET) and 96% tection from still images. In Computational Intelligence in (Adience), which is better than the previous highest reported Biometrics and Identity Management (CIBIM), 2014 IEEE results (according to (Levi and Hassner 2015)) on Adience Symposium on, 76–80. IEEE. using a CNN. Levi, G., and Hassner, T. 2015. Age and gender classifica- The DTCWT seems to work better (⇡ 90%) on images tion using convolutional neural networks. In Proceedings that are more similar to FERET (uniform lighting, no com- of the IEEE Conference on Computer Vision and Pattern plex backgrounds, no extreme warping, pixelation, or blur- Recognition Workshops, 34–42. ring). Martinsson, P.-G.; Rokhlin, V.; and Tygert, M. 2011. A ran- The Adience and FERET data sets are relatively large and domized algorithm for the decomposition of matrices. Ap- this may explain why the CNN method generally outper- plied and Computational Harmonic Analysis 30(1):47–68. forms other methods: it is known that deep learning per- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; forms well when large training sets are being used. It is Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, interesting to determine in this particular application what R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; “large” actually is. Brucher, M.; Perrot, M.; and Duchesnay, E. 2011. Scikit- learn: Machine learning in Python. Journal of Machine References Learning Research 12:2825–2830. Bekios-Calfa, J.; Buenaposada, J. M.; and Baumela, L. Perez, C.; Tapia, J.; Estévez, P.; and Held, C. 2012. Gen- 2011. Revisiting linear discriminant techniques in gender der classification from face images using mutual information recognition. Pattern Analysis and Machine Intelligence, and feature fusion. International Journal of Optomechatron- IEEE Transactions on 33(4):858–864. ics 6(1):92–119. Belhumeur, P. N.; Hespanha, J. P.; and Kriegman, D. J. 1997. Phillips, P. J.; Moon, H.; Rizvi, S. A.; and Rauss, P. J. 2000. Eigenfaces vs. fisherfaces: Recognition using class specific The feret evaluation methodology for face-recognition algo- linear projection. Pattern Analysis and Machine Intelli- rithms. Pattern Analysis and Machine Intelligence, IEEE gence, IEEE Transactions on 19(7):711–720. Transactions on 22(10):1090–1104. Carcagnı̀, P.; Del Coco, M.; Leo, M.; and Distante, C. 2015. Sultana, M.; Gavrilova, M.; Alhajj, R.; and Yanushkevich, S. Facial expression recognition and histograms of oriented 2014. Adaptive multi-stream score fusion for illumination gradients: a comprehensive study. SpringerPlus 4(1):1–25. invariant face recognition. In Computational Intelligence in Dalal, N., and Triggs, B. 2005. Histograms of oriented gra- Biometrics and Identity Management (CIBIM), 2014 IEEE dients for human detection. In Computer Vision and Pat- Symposium on, 94–101. IEEE. tern Recognition, 2005. CVPR 2005. IEEE Computer Soci- Sun, Z.; Yuan, X.; Bebis, G.; and Loui, S. J. 2002. Neural- ety Conference on, volume 1, 886–893. IEEE. network-based gender classification using genetic search 101 Joseph Lemley et al. MAICS 2016 pp. 97–102 for eigen-feature selection. In Neural Networks, 2002. IJCNN’02. Proceedings of the 2002 International Joint Con- ference on, volume 3, 2433–2438. IEEE. Torrione, P. A.; Morton, K. D.; Sakaguchi, R.; and Collins, L. M. 2014. Histograms of oriented gradients for landmine detection in ground-penetrating radar data. Geoscience and Remote Sensing, IEEE Transactions on 52(3):1539–1550. van der Walt, S.; Schönberger, J. L.; Nunez-Iglesias, J.; Boulogne, F.; Warner, J. D.; Yager, N.; Gouillart, E.; Yu, T.; and the scikit-image contributors. 2014. scikit-image: image processing in Python. PeerJ 2:e453. Verma, A., and Vig, L. 2014. Using convolutional neural networks to discover cogntively validated features for gen- der classification. In Soft Computing and Machine Intelli- gence (ISCMI), 2014 International Conference on, 33–37. IEEE. Wu, Y.; Zhuang, Y.; Long, X.; Lin, F.; and Xu, W. 2015. Human gender classification: A review. arXiv preprint arXiv:1507.05122. 102