=Paper=
{{Paper
|id=Vol-2936/paper-133
|storemode=property
|title=Snake Species Classification using Transfer Learning Technique
|pdfUrl=https://ceur-ws.org/Vol-2936/paper-133.pdf
|volume=Vol-2936
|authors=Karthik Desingu,Mirunalini Palaniappan,Jitesh Kumar
|dblpUrl=https://dblp.org/rec/conf/clef/DesinguPK21
}}
==Snake Species Classification using Transfer Learning Technique==
Karthik Desingu¹, Mirunalini Palaniappan¹ and Jitesh Kumar²

¹ Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, India
² Department of Computer Science and Engineering, Sri Venkateshwara College of Engineering, India

CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
karthik19047@cse.ssn.edu.in (K. Desingu); miruna@ssn.edu.in (M. Palaniappan); 2018cse0716@svce.ac.in (J. Kumar)
ORCID: 0000-0001-6433-8842 (M. Palaniappan)

Abstract

Transfer learning is a technique that reuses the knowledge of previously trained machine learning models by extending them to solve a related problem. It is predominantly used when computational resources are scarce or labelled data is limited. Categorizing snakes at the species level can be instrumental in the treatment of snake bites and in clinical management. We propose a deep learning model based on transfer learning to build a snake species classifier that uses photographic images of snakes in combination with their geographic location. We use Inception ResNet V2 as a feature extractor, extract a feature vector for each input image, and concatenate it with geographic feature information. The concatenated features are classified using a lightweight gradient boosting classifier.

Keywords: Transfer Learning, Inception ResNet, Gradient Boosting, Snake Species Classification, Metadata Inclusion

1. Introduction

Snake species identification is essential for biodiversity, conservation and global health. Millions of snake bites occur globally every year, half of which cause snakebite envenoming (SBE), killing people and disabling many more in different regions across the globe [1]. Taxonomic identification of the species helps healthcare providers articulate symptoms, treatment responses and antivenom efficacy, and also aids clinical management [2, 3]. Identifying a snake's species is difficult because of similarity in appearance between species, situational stress and fear of potential danger [4]. An automated system that recognizes the snake species from a photographic image and geographic information can be paramount in overcoming these problems. Hence, we propose an automated system based on transfer learning that utilizes the pre-trained weights of Inception ResNet V2 [5] to extract input image features. The extracted features, in combination with the geographic features, are classified using LightGBM [6], a gradient boosting classifier.

Inception ResNet V2 incorporates residual connections into the Inception architecture to perform enhanced feature extraction from images. It is a convolutional neural network, 164 layers deep, in which multi-sized convolution filters are combined through residual connections; this not only avoids the degradation caused by very deep layers but also reduces training time. The knowledge the model acquired by training on the ImageNet data set [7] is utilized through transfer learning, with the network serving as a feature extractor.
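As a concrete rendering of this feature-extraction setup, the sketch below loads Inception ResNet V2 with ImageNet weights, drops the fully connected output layer, and pools to a 1536-dimensional vector per image. It assumes tf.keras as the framework and uses a dummy image batch; the paper does not name its toolchain.

```python
# Minimal sketch of the transfer-learning feature extractor, assuming
# tf.keras as the framework (the paper does not name its toolchain).
import numpy as np
import tensorflow as tf

# Pre-trained Inception ResNet V2 without its fully connected output layer;
# global average pooling yields a 1536-dimensional vector per image.
extractor = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(299, 299, 3),
)
extractor.trainable = False  # transfer learning: reuse frozen ImageNet weights

# A dummy batch of pre-processed images with values in [0, 1] (see Section 4.1).
images = np.random.rand(4, 299, 299, 3).astype("float32")
features = extractor.predict(images)
print(features.shape)  # (4, 1536)
```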
Gradient boosting [8] is a machine learning technique that produces a prediction model for supervised classification problems. It is an ensemble of weak prediction models, typically decision trees, known for its prediction speed and accuracy on large and complex data sets. It minimizes the overall prediction error by iteratively generating optimized new models based on the loss of the previous models. After concatenating the representation vectors of the input images with the geographic information, we trained a lightweight gradient boosting classifier to predict the snake species.

2. Dataset

As part of LifeCLEF-2021 [9], an evaluation campaign aimed at data-oriented challenges related to the identification and prediction of biodiversity, SnakeCLEF-2021 [10] is an image-based snake identification task. For this challenge, a large data set of 414,424 RGB photographic images belonging to 772 distinct snake species, taken in 188 countries, is provided. Additionally, geographic metadata comprising country and continent information is provided to facilitate classification. The data set is split into a training subset with 347,406 images and a validation subset with 38,601 images, both having the same class distribution. The data set is highly imbalanced, with a heavy long-tailed distribution: the most frequent class is represented by 22,163 images, the least frequent by a mere 10. The large number of classes, in combination with high intra-class variance (depicted in Figure 1) and low inter-class variance, makes this an exigent machine learning classification task.

Figure 1: Four images of the Dispholidus typus snake species with high visual deviation characterized by age and gender, depicting an instance of high intra-class variance in the data set.

3. Related Work

An investigation of the accuracy of five machine learning techniques (decision tree J48, nearest neighbors, k-nearest neighbors (k-NN), back-propagation neural network, and naive Bayes) for the image-based snake species identification problem was performed in [11]. It revealed the efficacy of back-propagation neural networks, which achieved greater than 87% classification accuracy. A Siamese network with three main components, namely a twin network, a similarity function and an output neuron, was proposed in [12] to classify snake species. A pair of deep neural networks was used, where one network extracts features from the test image and the other from a reference image. The features were compared using L1 distance similarity, and the final output layer predicted the probability of the test image belonging to the same class as the reference image (a minimal sketch of this pairwise setup appears at the end of this section). Four region-based convolutional neural network (R-CNN) architectures (Inception V2, MobileNet, ResNet and VGG16) were used in [13] for object detection and image recognition of 9 snake species of the Pseudalsophis genus. Among them, VGG16 and ResNet achieved the highest accuracy of 75%. A detailed quantitative comparison between a computer vision algorithm trained to identify 45 species and human experts was performed in [14]. The algorithm used an EfficientNet-based model, fine-tuned on preprocessed images, to achieve an accuracy between 72% and 87% depending on the test data set. The study also recognized the significant impact of geographic data, in addition to visual information, on snake species classification.
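The twin-network comparison of [12] is compact enough to sketch. The small CNN backbone, embedding size and input shape below are illustrative assumptions rather than details from [12]; only the weight-shared twin, the L1 distance and the single sigmoid output neuron follow the description above.

```python
# Hedged sketch of a Siamese comparator in the spirit of [12]; the backbone
# and all shapes are assumptions made for illustration.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_twin(input_shape=(128, 128, 3)):
    # Shared embedding network applied to both the test and reference image.
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    return Model(inp, layers.Dense(128)(x))

twin = build_twin()                       # one set of weights, used twice
test_img = layers.Input(shape=(128, 128, 3))
ref_img = layers.Input(shape=(128, 128, 3))

# L1 distance between the two embeddings, then one output neuron scoring the
# probability that the pair belongs to the same species.
l1 = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([twin(test_img), twin(ref_img)])
same_class = layers.Dense(1, activation="sigmoid")(l1)
siamese = Model([test_img, ref_img], same_class)
siamese.compile(optimizer="adam", loss="binary_crossentropy")
```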
4. Methodology

A transfer learning method is adopted to classify the snake species using the data set of snake images and geographic location metadata provided by SnakeCLEF-2021 [9, 10]. The pre-trained Inception ResNet V2, a deep convolutional neural network, is used to extract image features. These features are concatenated with the categorical geographic features and finally classified using a gradient boosting classifier.

4.1. Preprocessing

The input images were resized to 299 × 299 × 3 using bi-linear interpolation. To counter the effect of factors irrelevant to the task, such as variation in lighting conditions among the photographs, the images were linearly normalized to values between 0 and 1. Scale and rotation transformations, along with contrast and saturation variations, were applied to make the model more generic, immune to positional and orientation-based features, and resistant to memorization through enhanced image diversity. RandAugment [15] was used to apply these transformations to the input images. RandAugment is parameterized by two values: the number of augmentation transformations to apply sequentially (N), and the magnitude of all transformations (M). The values used in [15] for the ResNet model, i.e. N=3 and M=4, were chosen.

4.2. Feature Extraction

The Inception ResNet V2 model was used to perform feature extraction. The model is loaded with weights obtained from pre-training on the ImageNet data set, and the fully connected output layer is excluded from the base model. A 2D average pooling layer is appended to produce the representation vector of the input image. The pre-processed images are fed to the resulting convolutional neural network, which produces a feature vector of 1536 features per input image. This vector is then augmented with the geographic metadata, containing country and continent information, to perform the snake species classification.

4.3. Gradient Boosting Classifier

A decision tree ensemble classifier is trained on the geographic location metadata of each photograph along with the image feature vector obtained from Inception ResNet V2. The gradient boosting algorithm is used to train the classifier, and its parameters are tuned over several runs to improve classification results. Five-fold cross-validation is used to obtain a reliable evaluation of model performance for each parameter configuration: the classifier is trained five times per run, each time selecting a different fold as the cross-validation set and training on the remaining four folds. The average of the performance metrics (accuracy and F1-score) over the five iterations is considered while tuning the parameters. Cross-entropy loss is used to monitor the model's convergence towards the objective in each fold, and early stopping halts the boosting process if the loss starts to diverge.

5. Implementation Details

The training subset of 347,406 images was split into five folds for cross-validation while training the classifier. Since the data set has a long-tailed distribution across classes, stratified sampling was used to ensure a proportional split and the inclusion of images from every class (see the sketch below). The pre-processed images from the training and validation sets are fed into the proposed convolutional neural network, which produces a 1536-dimensional feature vector for each image.
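The following is a minimal sketch of the stratified five-fold split just described, using scikit-learn's StratifiedKFold (an assumed tooling choice; the paper does not name its libraries). Dummy arrays stand in for the real 1536-dimensional feature vectors and species labels.

```python
# Hedged sketch of the stratified five-fold split; scikit-learn is an
# assumed tooling choice, and the arrays below are placeholders.
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
features = rng.random((1000, 1536))       # placeholder image feature vectors
labels = rng.integers(0, 10, size=1000)   # placeholder species labels

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(features, labels)):
    # Each fold preserves the long-tailed class proportions of the full set,
    # so even rare species appear in every training split.
    X_tr, X_val = features[train_idx], features[val_idx]
    y_tr, y_val = labels[train_idx], labels[val_idx]
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation")
```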
The geographic location at which each photograph was taken, specifically the continent and country, is encoded into numeric labels and used as categorical features in the classifier. For images in which this data is unavailable, the features are encoded as 'nan'; the classifier imputes the missing values with the mode of the corresponding feature. The resulting representation vector of 1538 features per image is used to train the decision tree ensemble classifier by gradient boosting.

It was observed that learning rates higher than 0.05 lead to quicker divergence, suggesting that a slower learning rate combined with more decision trees is better suited. A grid search was performed by varying the learning rate in the range 0.001 to 0.05 and the number of decision trees in the range 100 to 1000. The combinations with the lowest losses were chosen for further tuning of the tree-level parameters. The maximum depth of each tree is not set strictly but is determined by the training progress of the classifier: a branch grows until its leaves are pure (all samples in a leaf belong to the same class) or the minimum number of samples required for a further split is reached. Due to the long-tailed distribution of the data set, some classes may require deeper branches to capture more information from the features. The potential over-fitting this may cause is controlled by setting an upper limit on the number of leaves, tuned by a grid search over values in the range 32 to 256. Other notable tree-level parameters tuned were the sub-sampling rate and the column-sampling rate. The sub-sampling rate determines the fraction of training samples randomly sampled per tree and was tuned between 0.6 and 1.0. The column-sampling rate specifies the fraction of features used to fit each decision tree and was tuned between 0.5 and 0.9. Both parameters help prevent over-fitting; they are kept sufficiently above 0.1 to prevent under-fitting. A hedged sketch of this training configuration is given below.
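The sketch below renders this configuration with the LightGBM scikit-learn interface. The concrete parameter values are single points picked from the tuning ranges reported above, not the final tuned configuration, and the dummy arrays stand in for the real feature matrices.

```python
# Hedged sketch of the gradient boosting setup; parameter values are single
# points from the reported tuning ranges, not the final configuration.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
# Dummy stand-ins: 1536 image features plus 2 encoded geographic columns
# (country, continent) per image, with integer species labels.
X_train = rng.random((500, 1538)); X_train[:, 1536:] = rng.integers(0, 10, (500, 2))
X_val = rng.random((100, 1538));   X_val[:, 1536:] = rng.integers(0, 10, (100, 2))
y_train = rng.integers(0, 5, 500); y_val = rng.integers(0, 5, 100)

clf = lgb.LGBMClassifier(
    objective="multiclass",
    learning_rate=0.01,     # grid-searched over 0.001 to 0.05
    n_estimators=1000,      # grid-searched over 100 to 1000
    num_leaves=128,         # leaf cap tuned over 32 to 256
    max_depth=-1,           # depth not strictly bounded, as in the text
    subsample=0.8,          # sub-sampling rate, tuned between 0.6 and 1.0
    subsample_freq=1,       # enable bagging on every iteration
    colsample_bytree=0.7,   # column-sampling rate, tuned between 0.5 and 0.9
)
clf.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    eval_metric="multi_logloss",        # cross-entropy convergence monitor
    categorical_feature=[1536, 1537],   # country and continent columns
    callbacks=[lgb.early_stopping(stopping_rounds=50)],
)
```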
6. Results

The country and continent metadata, used as categorical features in the classifier, had a significant impact on the classification. Without the categorical data, the testing accuracy of the best run was 40.16%; this improved to 42.96% when the contextual data was encoded as categorical features. Country information had the highest impact, while continent information also had a notable influence on the classification. Figure 2 depicts the relative importance of the 20 most significant of the 1538 features used for classification. The feature importance values are normalized and scaled between 0 and 100 to show their relative impact. Features named f1, f2, etc. denote features extracted from the convolutional neural network.

Figure 2: Relative importance, on a scale of 0-100, of the 20 most impactful features used to train the classifier. The first two bars represent the feature importance of country and continent, respectively.

Through parameter tuning, the classifier's performance was improved over several runs. The F1-scores macro-averaged across countries and macro-averaged over all classes were the prescribed metrics [10]. The eight best runs were selected based on the prescribed metrics evaluated on the prescribed validation set; the metrics are computed as an average over the five iterations (for 5-fold cross-validation) performed in each run. We achieved a training accuracy of 71.32%, a validation accuracy of 44.16% and a testing accuracy of 42.96% on the best run. The results are summarized in Tables 1 and 2 below.

Table 1: Prediction metrics of the five best runs on the validation set

Run | F1-Score (Country) | F1-Score (Overall) | Accuracy
1   | 0.455              | 0.456              | 0.531
2   | 0.482              | 0.469              | 0.554
3   | 0.509              | 0.481              | 0.569
4   | 0.522              | 0.488              | 0.583
5   | 0.536              | 0.497              | 0.622

Table 2: Prediction metrics of the five best runs on the test set

Run | F1-Score (Country) | F1-Score (Overall) | Accuracy
1   | 0.246              | 0.164              | 0.428
2   | 0.247              | 0.166              | 0.430
3   | 0.249              | 0.159              | 0.428
4   | 0.249              | 0.162              | 0.432
5   | 0.252              | 0.162              | 0.432

7. Conclusion and Future Work

The results demonstrate the positive impact of integrating contextual country and continent data for snake species classification. Introducing further contextual data, such as population counts of species by region as class-wise probability priors [16], or climate information such as temperature and humidity, may contribute to better classification results. Due to the unavailability of sufficient computational resources during the SnakeCLEF-2021 contest period, the results were submitted before the classifier's training process had fully converged. After the deadline, significant improvements in classification accuracy were observed even with a slight increase in the number of boosting iterations. This suggests that the transfer learning approach adopted here is promising and that further parameter tuning and complete training can greatly improve model performance. Further experiments with input image resolutions, alternative pre-trained weights [17], and custom trainable layers added to the frozen base model before feature extraction [18] could also contribute to classification performance.

Acknowledgments

Thanks to the Machine Learning Research Group (MLRG), Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, Chennai, India (https://www.ssn.edu.in/) for providing the GPU resources to implement the model.

References

[1] J. M. Gutiérrez, J. J. Calvete, A. G. Habib, R. A. Harrison, D. J. Williams, D. A. Warrell, Snakebite envenoming, Nature Reviews Disease Primers 3 (2017) 1–21.
[2] Z. Yang, R. Sinnott, Snake detection and classification using deep learning, in: Proceedings of the 54th Hawaii International Conference on System Sciences, 2021.
[3] A. Garg, D. Leipe, P. Uetz, The disconnect between DNA and species names: lessons from reptile species in the NCBI taxonomy database, Zootaxa 4706 (2019).
[4] S. W. Corbett, B. Anderson, et al., Most lay people can correctly identify indigenous venomous snakes, American Journal of Emergency Medicine (2005).
[5] C. Szegedy, S. Ioffe, V. Vanhoucke, A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, Proceedings of the AAAI Conference on Artificial Intelligence 31 (2017). URL: https://ojs.aaai.org/index.php/AAAI/article/view/11231.
[6] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, T.-Y. Liu, LightGBM: A highly efficient gradient boosting decision tree, Advances in Neural Information Processing Systems 30 (2017) 3146–3154.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: A large-scale hierarchical image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255. doi:10.1109/CVPR.2009.5206848.
[8] J. H. Friedman, Greedy function approximation: a gradient boosting machine, Annals of Statistics (2001) 1189–1232.
[9] A. Joly, H. Goëau, S. Kahl, L. Picek, T. Lorieul, E. Cole, B. Deneu, M. Servajean, R. Ruiz De Castañeda, I. Bolon, H. Glotin, R. Planqué, W.-P. Vellinga, A. Durso, P. Bonnet, I. Eggel, H. Müller, Overview of LifeCLEF 2021: a system-oriented evaluation of automated species identification and species distribution prediction, in: Proceedings of the Twelfth International Conference of the CLEF Association (CLEF 2021), 2021.
[10] L. Picek, A. M. Durso, R. Ruiz De Castañeda, I. Bolon, Overview of SnakeCLEF 2021: Automatic snake species identification with country-level focus, in: Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, 2021.
[11] A. Amir, N. A. H. Zahri, N. Yaakob, R. B. Ahmad, Image classification for snake species using machine learning techniques, in: S. Phon-Amnuaisuk, T.-W. Au, S. Omar (Eds.), Computational Intelligence in Information Systems, Springer International Publishing, Cham, 2017, pp. 52–59.
[12] C. Abeysinghe, A. Welivita, I. Perera, Snake image classification using Siamese networks, in: Proceedings of the 2019 3rd International Conference on Graphics and Signal Processing, ICGSP '19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 8–12. URL: https://doi.org/10.1145/3338472.3338476. doi:10.1145/3338472.3338476.
[13] A. Patel, L. Cheung, N. Khatod, I. Matijosaitiene, A. Arteaga, J. W. Gilkey, Revealing the unknown: Real-time recognition of Galápagos snake species using deep learning, Animals 10 (2020). URL: https://www.mdpi.com/2076-2615/10/5/806.
[14] A. M. Durso, G. K. Moorthy, S. P. Mohanty, I. Bolon, M. Salathé, R. Ruiz De Castañeda, Supervised learning computer vision benchmark for snake species identification from photographs: Implications for herpetology and global health, Frontiers in Artificial Intelligence 4 (2021) 17.
[15] E. D. Cubuk, B. Zoph, J. Shlens, Q. V. Le, RandAugment: Practical automated data augmentation with a reduced search space (2019). arXiv:1909.13719.
[16] J. Wang, Y. Yang, J. Mao, Z. Huang, C. Huang, W. Xu, CNN-RNN: A unified framework for multi-label image classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[17] L. Picek, R. Ruiz De Castañeda, A. M. Durso, P. Sharada, Overview of the SnakeCLEF 2020: Automatic snake species identification challenge, CLEF task overview (2020).
[18] M. Zhong, J. LeBien, M. Campos-Cerqueira, R. Dodhia, J. Lavista Ferres, J. P. Velev, T. M. Aide, Multispecies bioacoustic classification using transfer learning of deep convolutional neural networks with pseudo-labeling, Applied Acoustics 166 (2020) 107375. URL: https://www.sciencedirect.com/science/article/pii/S0003682X20304795. doi:10.1016/j.apacoust.2020.107375.