Automatic Snake Classification using Deep Learning Algorithm

Lekshmi Kalinathan¹, Prabavathy Balasundaram¹, Pradeep Ganesh¹, Sandeep Sekhar Bathala¹ and Rahul Kumar Mukesh¹

¹ Department of CSE, SSN College of Engineering, Rajiv Gandhi Salai, Chennai, Tamil Nadu, India

CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
lekshmik@ssn.edu.in (L. Kalinathan); prabavathyb@ssn.edu.in (P. Balasundaram); pradeep19077@cse.ssn.edu.in (P. Ganesh); sandeepsekhar19096@cse.ssn.edu.in (S. S. Bathala); rahulkumar19086@cse.ssn.edu.in (R. K. Mukesh)
ORCID: 0000-0003-3903-0410 (P. Balasundaram)

Abstract
Automatic snake classification is the process of identifying snake species using image processing techniques. Such a system can help reduce deaths from snake bites by suggesting the appropriate anti-venom for the victim within a short span of time. Previous works built systems on relatively small databases using machine learning and older deep architectures; those systems could identify only a few snake species and/or achieved lower accuracies, so snake classification can be further improved to make such systems more robust. The proposed system identifies 772 classes of snake species and is built on a relatively large dataset using the newer deep learning architecture ResNeXt50-V2. An ensembled model further improves the system, achieving an accuracy of 85.7% and an F1-score of 0.68.

Keywords
Snake Classification, Deep Learning, ResNet-50, ResNeXt, Keras

1. Introduction

Death and amputation caused by snake bites are a major cause of concern for health care institutions. There are approximately 1.8 to 2.7 million cases of envenoming each year, of which 435,000 to 580,000 snake bites need treatment because they can cause permanent disability and disfigurement. Although only about 20% of snake species worldwide are medically important, identifying the biting snake is challenging, especially due to:

• the high diversity of snake species in snakebite-endemic countries (e.g., 310 snake species in India);
• the limited herpetological knowledge of communities and healthcare providers confronted with snakebite;
• incomplete knowledge of their epidemiological importance.

Often, people who are bitten by snakes are unable to get the required treatment immediately, as they can neither identify the snake nor give adequate information about the snake that bit them. This forces the physician to guess the type of snake involved. If the snake identified by the medical practitioner from the available clues is wrong, it leads to further complications and, unfortunately, can lead to the death of the patient. This is one of the reasons for the reduced survival rate in snake bite cases.

Hence, if we are able to identify a snake from pictures taken even in low resolution, it would improve the survival rate of patients. Moreover, the system would also be useful for wildlife conservation, as it can be used to identify endangered snake species. Automating snake species identification could ameliorate public health, as there are currently only a few initiatives that seek to identify snakes using computer vision techniques.
So far, only a handful of computer vision and machine learning approaches specific to snakes have been developed. The best existing model as of 2020 achieved an F1-score of 0.625 using a pre-trained vanilla ResNet50-V2 architecture. However, none of these are yet usable in real-world situations where lives may be at stake. Hence, creating an automatic and robust system for snake species identification is an important goal for biodiversity, conservation, and global health.

2. Related Work

Alex James et al. [1] presented a parallel-processed, inter-feature product similarity fusion based automatic classification of snakes such as the Spectacled Cobra, Russell's Viper and King Cobra, to name a few. The authors used a database of 88 images of cobras and 48 images of vipers for the initial feature taxonomy analysis and identified 31 taxonomically relevant features from snake images for automated snake classification studies. They used a nearest-neighbour classifier, which identifies the class of an unknown data sample from its nearest neighbour, whose class is already known. For automatic classification, the taxonomically relevant features are selected from the snake images and normalized using mean-variance filtering. Histograms of the gradients and orientations of these normalized features are used as feature vectors, which are evaluated using a proposed minimum distance product similarity metric classifier. The proposed system achieved an F-score of 0.91 on the snake image database when 5% of the class samples were used as the gallery and the remaining 95% as the test set. The authors analysed the scalability and real-time implementation of the classifier in a GPU-enabled parallel computing environment. The developed system finds application in wildlife studies, analysis of snake bites and management of snake populations.

Alex Pappachen James et al. [2] addressed the automatic snake identification problem by developing a taxonomy-based feature set targeted at computer scientists and herpetologists. The feature database contained 38 taxonomically relevant features per sample. Out of these 38 features, the top features with the highest impact on classification were determined using twelve attribute evaluators (ChiSquaredAttributeEval and CfsSubsetEval, to name a few), combined with search methods such as Genetic Search, Greedy Stepwise and Linear Forward Selection. The feature-subset analysis of the dataset indicated that only 15 features are sufficient for snake identification, and that these features were almost equally distributed across the logical groupings of top, side and body views of the snake images; features from the bottom view played the least role in identification. For automated snake classification, 13 classifiers (Bayes Net, Naïve Bayes and Multilayer Perceptron, to name a few) were used, and the best F-score obtained was about 0.94.

Louise Bloch et al. [3] implemented a machine learning workflow that uses a Mask Region-based Convolutional Neural Network (Mask R-CNN) for object detection and EfficientNets for classification. The best model they submitted to the SnakeCLEF 2020 [4] challenge achieved an F1-score of 0.404.
After the challenge deadline, they found that the model could be improved to achieve an F1-score of 0.594. Their main improvements in snake species classification were based on increasing the image size, combining location and image information, and upscaling the model architecture.

Moorthy Gokula Krishnan [5] reported on improving the F1-score by pre-training the network on a large dataset. A vanilla ResNet50-V2 was used, and the F1-scores of a model with a pretrained network and a model without one were compared. The model with the pretrained network performed better, improving the F1-score from 0.5813 to 0.6018. The test dataset on which the final scores were calculated follows a data distribution similar to the validation dataset, and the model achieved an F1-score of 0.625 when tested on the AIcrowd platform [6].

Patel A et al. [7] used deep learning methods to develop a smartphone application that distinguishes images of 9 different snake species that dwell on the Galápagos Islands in Ecuador, combining object detection and classification algorithms. The images in the dataset were collected from the Tropical Herping image collection and by web scraping from Google and Flickr. Different combinations of architectures such as Faster R-CNN, Inception V2, ResNet, MobileNet and VGG16 were tested for object detection and image classification. The model based on Faster R-CNN with ResNet achieved the best classification accuracy of 75%.

3. Methodology

This section discusses the dataset and the base methodology used to implement the snake classification task.

3.1. Dataset

The dataset provided by the SnakeCLEF 2021 [8, 9] Snake Species Identification Challenge consists of a total of 412,537 images. It contains images of 772 different snake species from 188 countries. The majority of the data was gathered from online biodiversity platforms such as iNaturalist and HerpMapper, and was further extended by data scraped from Flickr and images collected from private collections and museums. The final dataset has a heavy long-tailed class distribution: the most frequent species (Thamnophis sirtalis) is represented by 22,163 images and the least frequent (Achalinus formosanus) by just 10 images. The final dataset was split into training and validation sets of 347,405 and 38,601 images respectively. Both subsets have the same class distribution, with a minimum of one validation image per class. A set of 26,531 images covering all 772 classes with a similar class distribution was used as the test set. The images are associated with metadata giving the continent and country where each image was taken; for some snake depictions this information is not given and only "UNKNOWN" is provided in the metadata.

3.2. Deep Network Architecture used

A baseline Convolutional Neural Network (CNN) can be scaled up in the hope of achieving better performance and solving more complex problems. However, it has been found that adding more layers can actually degrade performance. This may be due to the optimization function, the initialization of the network and, more specifically, the vanishing gradient problem. This performance degradation has been addressed by the ResNet architecture [10]. The core idea of ResNet is the residual network block. This block contains a skip connection, which skips some layers in the neural network by feeding the output of one layer to a subsequent, not necessarily adjacent, layer. The skip connection provides an alternative path for the gradient, which helps avoid the vanishing gradient problem.
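As a concrete illustration of this idea, the following is a minimal sketch of a residual block with an identity skip connection. It is written in PyTorch for illustration only; the class name and layer sizes are assumptions, not taken from the paper's implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the skip connection adds the input to the
    output of the convolutional path, giving the gradient a direct route
    through the network (illustrative sketch, not the paper's code)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                       # saved for the skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity               # skip connection: alternative gradient path
        return self.relu(out)
```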
Figure 1: ResNet and ResNeXt network blocks.

To improve the effectiveness of the ResNet architecture without increasing the number of parameters, the ResNeXt architecture [11] was proposed. In ResNeXt, each residual network block, shown in Figure 1 (a), is split into n paths; the number of paths inside a ResNeXt block is called the cardinality. In Figure 1 (b), the cardinality is 32, and all paths share the same topology. The ResNet and ResNeXt blocks differ in width: layer 1 of the ResNet block has one convolution layer of width 64, while layer 1 of the ResNeXt block has 32 different convolution layers of width 4 (32 × 4). The validation error decreases as the cardinality increases, and hence the performance of the network improves.

4. Implementation

The proposed automatic snake classification system was developed through the following steps:

• Preprocessing the data
• Batch accumulation
• Building the model
• Ensemble model for testing

The system was developed using a ResNeXt50-V2 model with the Keras framework [12]. The model was trained on a machine with an Intel Xeon W-2145 processor (11M cache, 3.70 GHz), 2 × 16 GB DDR4-2666 MHz ECC REG DIMM memory, 2 TB SATA 7.2K RPM 3.5" HDDs, a 240 GB SATA 2.5" SSD, and an NVIDIA GeForce RTX 2080 Ti GPU with 11 GB of memory.

4.1. Preprocessing the data

Table 1: Details of the SnakeCLEF 2021 dataset

Subset           | # of images | % of data | min. # of images/class
Training data    | 347,405     | 84.21%    | 9
Validation data  | 38,601      | 9.36%     | 1
Test data        | 26,531      | 6.43%     | 1

In our implementation, following the split in Table 1, the training data is used to train the model, the validation data is used to evaluate the model after every epoch to fine-tune the weight parameters, and the test data is used to evaluate the final model obtained after the entire training process.

The images provided in the SnakeCLEF 2021 dataset are of varied sizes. The training images are first resized to 224 × 224 × 3 using a random resized crop operation. They are then horizontally and vertically flipped with probability 0.5, and shifted, scaled and rotated with probability 0.5. The images are normalized with a mean of 0.5 and a standard deviation of 0.23. Similarly, the validation and test images are resized to 224 × 224 × 3 using a resize function and normalized with the same mean and standard deviation. Both the training and test images were augmented using the Albumentations [13] Python library with the RandomResizedCrop, Transpose, Resize, HorizontalFlip, VerticalFlip, ShiftScaleRotate and Normalize methods.
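A sketch of how this augmentation pipeline could be assembled with Albumentations is shown below. Only the image size (224), the probability of 0.5, and the normalization statistics (mean 0.5, std 0.23) come from the description above; the exact transform arguments, and the RandomResizedCrop signature (which varies across Albumentations versions), are assumptions.

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Sketch of the training-time augmentations described above.
train_transform = A.Compose([
    A.RandomResizedCrop(height=224, width=224),   # random crop resized to 224 x 224
    A.Transpose(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.ShiftScaleRotate(p=0.5),                    # shift, scale and rotate with probability 0.5
    A.Normalize(mean=(0.5, 0.5, 0.5), std=(0.23, 0.23, 0.23)),
    ToTensorV2(),
])

# Validation/test images are only resized and normalized.
eval_transform = A.Compose([
    A.Resize(height=224, width=224),
    A.Normalize(mean=(0.5, 0.5, 0.5), std=(0.23, 0.23, 0.23)),
    ToTensorV2(),
])
```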
4.2. Batch Accumulation

Table 2: Model summary of the ResNeXt50-V2 architecture

Conv1: L1 – [224, 7 × 7, 64]

Conv2 – 3 stacked residual blocks:
  Block 1:    L1 – [64, 1 × 1, 128];   L2 – [128, 3 × 3, 128];   L3 – [128, 1 × 1, 256]
  Blocks 2–3: L1 – [256, 1 × 1, 128];  L2 – [128, 3 × 3, 128];   L3 – [128, 1 × 1, 256]

Conv3 – 4 stacked residual blocks:
  Block 1:    L1 – [256, 1 × 1, 256];  L2 – [256, 3 × 3, 256];   L3 – [256, 1 × 1, 512];  downsampling L0 – [256, 1 × 1, 512]
  Blocks 2–4: L1 – [512, 1 × 1, 256];  L2 – [256, 3 × 3, 256];   L3 – [256, 1 × 1, 512]

Conv4 – 6 stacked residual blocks:
  Block 1:    L1 – [512, 1 × 1, 512];  L2 – [512, 3 × 3, 512];   L3 – [512, 1 × 1, 1024]; downsampling L0 – [512, 1 × 1, 1024]
  Blocks 2–6: L1 – [1024, 1 × 1, 512]; L2 – [512, 3 × 3, 512];   L3 – [512, 1 × 1, 1024]

Conv5 – 3 stacked residual blocks:
  Block 1:    L1 – [1024, 1 × 1, 1024]; L2 – [1024, 3 × 3, 1024]; L3 – [1024, 1 × 1, 2048]; downsampling L0 – [1024, 1 × 1, 2048]
  Blocks 2–3: L1 – [2048, 1 × 1, 1024]; L2 – [1024, 3 × 3, 1024]; L3 – [1024, 1 × 1, 2048]

Softmax: L1 – [2048, –, 772]

The training data is split into 3,860 batches, where each batch consists of 5 folds (mini-batches). Training is inefficient due to noisy gradients if a mini-batch is small and does not include images from all classes. Hence, the images in each mini-batch are chosen so that all classes are covered. Stratified k-fold cross-validation with k = 5 was adopted to make the class distribution in each split of the data match the distribution of the complete training dataset. A cosine annealing with warm restarts scheduler was used to update the learning rate, and a learning rate of 1e-4 with the Adam optimizer [14] was used to fine-tune the weight matrices of the architecture.

Figure 2: Performance measures of the ResNeXt50-V2 architecture: (a) epoch vs. F1-score, (b) epoch vs. accuracy, (c) epoch vs. loss.

4.3. Building the model

The ResNeXt model consists of a Conv1 layer with 64 filters of size 7 × 7. The 64 feature maps output by Conv1 are the input to the 3 stacked Conv2 blocks; the 256 feature maps output by Conv2 are the input to the 4 stacked Conv3 blocks; the 512 feature maps output by Conv3 are the input to the 6 stacked Conv4 blocks; and the 1024 feature maps output by Conv4 are the input to the 3 stacked Conv5 blocks. The 2048 feature maps output by Conv5 are the input to the softmax layer, which outputs the 772 classes. The model summary of the implemented ResNeXt50-V2 is shown in Table 2.

The following hyperparameters were used to train the model (a sketch of this configuration follows the list):

• Input image size of 224 × 224 × 3
• Image augmentations during training: RandomResizedCrop, Transpose, Resize, HorizontalFlip, VerticalFlip, ShiftScaleRotate and Normalize from the Albumentations Python library
• The model was trained for 17 epochs
• CrossEntropyLoss with default parameters
• Learning rate of 1e-4 with the "CosineAnnealingWarmRestarts" scheduler (T_max=10, eta_min=1e-6, last_epoch=-1)
• Adam optimizer
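The component names above (CrossEntropyLoss, Adam, CosineAnnealingWarmRestarts) match PyTorch's API, so the following sketch assembles them in PyTorch. The torchvision backbone and the replacement of the final layer with a 772-way head are assumptions for illustration, since the paper itself cites the Keras framework.

```python
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts
from torchvision.models import resnext50_32x4d  # assumed stand-in backbone

# Assumed backbone: torchvision's ResNeXt-50 (32x4d) with the final
# fully connected layer replaced by a 772-way classification head.
model = resnext50_32x4d(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 772)

criterion = nn.CrossEntropyLoss()                # default parameters, as stated
optimizer = Adam(model.parameters(), lr=1e-4)    # learning rate from the paper

# The paper reports "T_max=10"; CosineAnnealingWarmRestarts takes T_0
# rather than T_max, so that value is mapped onto T_0 here.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, eta_min=1e-6, last_epoch=-1)
```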
5. Results and Analysis

This section analyses the training and testing processes through various performance measures.

5.1. Analysis of the training process

The training process was analysed to determine whether the model is stable and free from overfitting. During training, the F1-score and training accuracy were measured for every epoch and plotted as shown in Figures 2a and 2b respectively. Figure 2a shows that the F1-score increases from the very first epoch and stabilizes at 0.6008 from the 15th epoch; Figure 2b shows that the training accuracy stabilizes from the 11th epoch. Similarly, the training and validation losses were measured for every epoch and plotted as shown in Figure 2c. The graph clearly shows that the validation loss for every epoch is lower than the training loss, from which we infer that the generated model is free from overfitting.

5.2. Analysis of the testing process

Table 3: Performance measures of different models

ResNeXt50-V2 model | F1-Country Score | F1-Score | Accuracy
Model-13           | 0.42             | 0.38     | 0.66
Model-14           | 0.51             | 0.40     | 0.69
Model-15           | 0.44             | 0.40     | 0.68
Model-16           | 0.40             | 0.38     | 0.68
Model-17           | 0.37             | 0.37     | 0.68
Ensembled Model    | 0.67             | 0.68     | 0.86

During the testing process, instead of generating predictions with the most recent model alone, our proposed ensemble model was used. For ensembling, the models from the last 5 epochs, i.e. models 13, 14, 15, 16 and 17, were considered. These were chosen as the top models because the accuracy and F1-score had stabilized during the corresponding epochs. In the ensembling process, the predictions generated by every model are averaged. Experiments were conducted to generate predictions from each individual model and from the ensembled model, as shown in Table 3. Models 13 to 17 generate different predictions for each image because they have different weight matrices, and each model may have specialised in predictions for a distinct set of images. The ensembled model averages the predictions of all the models in consideration, which increases the accuracy considerably, as can be seen from the table. The ensembled model achieved an accuracy of 85.77% and an F1-Country score of 0.724.
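One plausible reading of this averaging step is to average the per-class probabilities of the five checkpoints and take the arg max, as in the sketch below; the function name and the use of softmax probabilities (rather than raw logits) are assumptions.

```python
import torch

# `models` holds the checkpoints from epochs 13-17; `images` is a
# preprocessed batch of test images of shape [batch, 3, 224, 224].
@torch.no_grad()
def ensemble_predict(models, images):
    probs = []
    for model in models:
        model.eval()
        logits = model(images)                        # [batch, 772]
        probs.append(torch.softmax(logits, dim=1))    # per-model class probabilities
    avg_probs = torch.stack(probs).mean(dim=0)        # average over the 5 checkpoints
    return avg_probs.argmax(dim=1)                    # predicted species index per image
```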
6. Conclusions and Future Work

In conclusion, snake species identification is a challenging task, primarily because of the high diversity of snake species and the heavy long-tailed class distribution of the dataset. An automatic classifier was built using the ResNeXt50-V2 architecture, and the model achieved an F1-country score of 0.724. In the future, the accuracy of the system can be enhanced through better preprocessing of the images. Subsequently, alternative deep architectures can be implemented and tested for better accuracies.

7. Acknowledgement

We thank the CSE department of SSN College of Engineering for letting us use the GPU server machine extensively to implement this task.

References

[1] A. James, Snake classification from images, PeerJ Preprints 5 (2017).
[2] A. P. James, B. Mathews, S. Sugathan, D. K. Raveendran, Discriminative histogram taxonomy features for snake species identification, Human-Centric Computing and Information Sciences 4 (2014).
[3] L. Bloch, A. Boketta, C. Keibel, E. Mense, A. Michailutschenko, O. Pelka, J. Rückert, L. Willemeit, C. Friedrich, Combination of image and location information for snake species identification using object detection and EfficientNets, in: CLEF Working Notes, 2020.
[4] L. Picek, R. Ruiz De Castaneda, A. M. Durso, P. Sharada, Overview of the SnakeCLEF 2020: Automatic snake species identification challenge, in: Working Notes of CLEF 2020 – Conference and Labs of the Evaluation Forum, 2020.
[5] M. G. Krishnan, Impact of pretrained networks for snake species classification, in: Working Notes of CLEF 2020 – Conference and Labs of the Evaluation Forum, 2020.
[6] A. M. Durso, G. K. Moorthy, S. P. Mohanty, I. Bolon, M. Salathé, R. Ruiz De Castañeda, Supervised learning computer vision benchmark for snake species identification from photographs: Implications for herpetology and global health, Frontiers in Artificial Intelligence 4 (2021) 17.
[7] A. Patel, L. Cheung, N. Khatod, I. Matijosaitiene, A. Arteaga, J. W. Gilkey, Revealing the unknown: Real-time recognition of Galápagos snake species using deep learning, Animals 10 (2020) 806.
[8] L. Picek, A. M. Durso, R. Ruiz De Castañeda, I. Bolon, Overview of SnakeCLEF 2021: Automatic snake species identification with country-level focus, in: Working Notes of CLEF 2021 – Conference and Labs of the Evaluation Forum, 2021.
[9] A. Joly, H. Goëau, S. Kahl, L. Picek, T. Lorieul, E. Cole, B. Deneu, M. Servajean, R. Ruiz De Castañeda, I. Bolon, H. Glotin, R. Planqué, W.-P. Vellinga, A. Dorso, H. Klinck, T. Denton, I. Eggel, P. Bonnet, H. Müller, Overview of LifeCLEF 2021: A system-oriented evaluation of automated species identification and species distribution prediction, in: Proceedings of the Twelfth International Conference of the CLEF Association (CLEF 2021), 2021.
[10] C. Szegedy, S. Ioffe, V. Vanhoucke, A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
[11] S. Xie, R. Girshick, P. Dollár, Z. Tu, K. He, Aggregated residual transformations for deep neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[12] A. Gulli, S. Pal, Deep Learning with Keras, Packt Publishing Ltd, 2017.
[13] A. Buslaev, V. I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, A. A. Kalinin, Albumentations: Fast and flexible image augmentations, Information 11 (2020) 125.
[14] D. Kingma, J. Ba, Adam: A method for stochastic optimization, in: International Conference on Learning Representations, 2014.