<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Conference and Labs of the Evaluation Forum, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Incorporation of object detection models and location data into snake species classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Regő Borsodi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dávid Papp</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics</institution>
          ,
          <addr-line>Budapest</addr-line>
          ,
          <country country="HU">Hungary</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>2</volume>
      <fpage>1</fpage>
      <lpage>24</lpage>
      <abstract>
<p>Photo-based automatic snake species identification could assist in the clinical management of snakebites. LifeCLEF announced the SnakeCLEF 2021 challenge, which aimed attention at this task and provided location metadata for most snake images. This paper describes the participation of the BME-TMIT team in this challenge. In order to reduce clutter and drop the unnecessary background, we employed the state-of-the-art EfficientDet object detector, which was fine-tuned on manually annotated images. Detected snakes were then classified by EfficientNet, with the predictions weighted by the likelihood of the location information. Based on the official evaluation of SnakeCLEF 2021, our solution achieved an F1 country score of 0.903, which placed our team at the rank 1 position in the challenge.</p>
      </abstract>
      <kwd-group>
        <kwd>classification</kwd>
        <kwd>object detection</kwd>
        <kwd>convolutional neural networks</kwd>
        <kwd>snake species identification</kwd>
        <kwd>SnakeCLEF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Snakebite envenoming is a potentially life-threatening disease caused by toxins in the bite of a
venomous snake. Envenoming can also be caused by having venom sprayed into the eyes by
certain species of snakes that have the ability to spit venom as a defense measure [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Most of
these occur in Africa, Asia, and Latin America. In Asia, up to 2 million people are envenomed by
snakes each year, while in Africa, there are an estimated 435 000 to 580 000 snakebites annually
that need treatment. Identification of the snake is essential in planning treatment in certain
areas of the world, but it is a difficult task and not always possible [
        <xref ref-type="bibr" rid="ref2">2</xref>
]. Ideally, the snake would
be brought in with the person, but attempting to catch or kill the offending snake puts
one at risk of re-envenomation or of a second person being bitten. On the other hand, taking
a photo of the snake is much more feasible, less dangerous, and generally recommended.
The three types of venomous snakes that cause the majority of injuries are vipers, kraits, and
cobras. Knowledge of what species are present locally can be crucial in searching for adequate
medicine [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Developing a robust system for identifying species of snakes from photographs and
geographical information could significantly improve snakebite eco-epidemiological data and
correct antivenom administration [
        <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
        ]. In 2021, the fifth round of SnakeCLEF [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] challenge
was announced by the LifeCLEF [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] campaign, where the goal was to create a system that is
capable of automatically categorizing snakes on the species level. In this paper, we describe the
solution of our team (BME-TMIT) in detail, which achieved rank 1 in the challenge. We first
perform object detection on the images to separate the snake(s) from the background; then, the
next step is to classify each detected snake. Finally, classification data is fused with location
information using a likelihood weighting method. These steps are specified in Sections 4, 5,
and 6, respectively. Section 2 briefly outlines the corresponding literature in the field; lastly,
Section 7 summarizes our conclusion.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Common tasks in computer vision, and therefore in image-based snake species identification,
include object detection and object classification. The former aims to locate the region of
an image where the object of interest may appear [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], while the latter involves categorizing
an input image into several predefined classes [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In the past decade, Convolutional Neural
Networks (CNN) became the most popular approach for visual recognition due to their superior
performance. Architectures of CNN that are often used for classification in practice are among
others VGGNet [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Inception [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Residual Network (ResNet) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], EfficientNet [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and
MobileNet [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>Regarding object detection, numerous deep learning frameworks have been proposed in
the literature, and they can be organized into two categories: (i) two-stage detectors and (ii)
one-stage detectors. Two-stage detection models first generate region proposals, and then
they are forwarded to a specific network for further classification. The most prominent
two-stage detectors are Region-based CNN (R-CNN) [14] and its successors: Fast R-CNN [15],
Faster R-CNN [16], and Mask R-CNN [17]. In contrast, one-stage detection models have no separate region
proposal procedure, as they are designed to predict the bounding boxes
and the corresponding class probabilities at once [18]. Popular one-stage detectors include You
Only Look Once (YOLO) [19, 20, 21, 22], Single Shot Detector (SSD) [23], RetinaNet [24], and
EfficientDet [25].</p>
      <p>Automated image-based snake species identification is challenging due to small inter-class
variance, high intra-class variance, and a large number of categories (species) [26]. Furthermore,
labeled snake image collections usually contain only several hundred images. In [27], 38 different
taxonomically relevant features were manually identified to categorize six snake species,
while Amir et al. [28] used automated feature extraction to obtain texture features and distinguish
22 different snake species. Other researchers applied deep learning techniques. For example,
Faster R-CNN and ResNet were used to classify nine different snake species occurring on the
Galápagos Islands of Ecuador [29]; the authors of [30] experimented with three different-sized CNN
architectures; recently, Yang and Sinnott classified Australian snakes using deep networks [31].
In cases when only a small amount of labeled snake images is available, it could be beneficial
to use a few-shot learning approach, such as the Siamese network [32] or the recently proposed
Double-View Matching Network (DVMN) [33], although the latter was tested on X-ray images.
Abeysinghe et al. [34] performed single-shot learning by applying the Siamese network to
categorize 84 snake species, where only 3 to 16 training images were available per species. On
the other hand, Putra et al. [35] aimed to build a system that recognizes the bite points
in a snakebite image and then classifies the bite as venomous or non-venomous using
Chain Code and K-Nearest Neighbor (KNN) classification. Their bite mark recognition method
achieved 65% accuracy, while distinguishing venomous from non-venomous bites was possible
with 80% accuracy.</p>
      <p>For the SnakeCLEF 2020 challenge, 287 632 photographs were prepared that belong to 783
snake species and were taken in 145 countries [26]. The qualified teams used object detection
on the images as a preprocessing step. Gokula Krishnan used an object detection model with
ResNet-50 backbone [36], and team FHDO-BCSG utilized a Mask R-CNN model reaching 39.0
mAP (mean average precision) [37] on the COCO (Common Objects in Context) dataset (available at https://cocodataset.org/#download).</p>
      <p>The dataset was expanded to a size of 414 424 images in SnakeCLEF 2021, which is
approximately a 44% increase compared to the previous challenge, while the number of snake
species was slightly lowered to 772; however, the photographs were taken in 188 countries. The
full dataset was split into a training subset with 347 405 images, a validation subset with
38 601 images, and a test subset with 28 418 images. Each subset has the same class distribution,
while the minimum number of training, validation, and test images per class is nine, one, and
one, respectively. Furthermore, the test subset contains all 772 classes and observations from
almost all the countries present in the training and validation sets. Similarly to the teams of
SnakeCLEF 2020, we applied object detection and then object classification on the images. The
main contributions of our work are (i) training EfficientDet-D1 for accurate snake detection,
(ii) using double thresholding to categorize the predicted bounding boxes, and (iii) the fusion of
class membership vectors with the likelihood of location data.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation metrics</title>
      <p>Various metrics were used to measure the performance of classification. The F1 score is
computed for the $i$-th species as:
\[ F_1^i = \frac{2 \cdot \mathrm{precision}_i \cdot \mathrm{recall}_i}{\mathrm{precision}_i + \mathrm{recall}_i} \tag{1} \]
The macro F1 is calculated by averaging the $F_1^i$ values over all the species ($N$ is the number of
species):
\[ \mathrm{macro}\; F_1 = \frac{1}{N} \sum_{i=1}^{N} F_1^i \tag{2} \]
Let $S_c$ be the set containing the indices of species that might appear in the $c$-th country (these
data were available, provided by the organizers). Then the country-averaged F1 for the given
country is defined as:
\[ F_1^c = \frac{1}{|S_c|} \sum_{i \in S_c} F_1^i \tag{3} \]
The F1 country score is the average of the $F_1^c$ values for all countries (let their number be $C$):
\[ F_1^{\mathrm{country}} = \frac{1}{C} \sum_{c=1}^{C} F_1^c \tag{4} \]
The primary metric in the challenge is the F1 country score, and the secondary is the macro F1
value. For evaluating the models, we also use classification accuracy, defined as:
\[ \mathrm{Accuracy} = \frac{\#\,\text{correct predictions}}{\#\,\text{samples}} \tag{5} \]
Categorical cross entropy is used both as an evaluation metric and as the function to be optimized
during training the network (loss). If the output of the network for a sample is $\mathbf{y}$ and the ground
truth is $\hat{\mathbf{y}}$, the categorical cross entropy is calculated as:
\[ \Gamma(\hat{\mathbf{y}}, \mathbf{y}) = -\sum_{i=1}^{N} \hat{y}_i \cdot \log y_i \tag{6} \]
When evaluating multiple samples, their losses are averaged.</p>
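      <p>As an illustration of the metrics above, the macro F1 and F1 country scores can be computed from integer-encoded predictions as in the following minimal sketch; the variable and helper names (y_true, y_pred, species_in_country) are ours and not part of the challenge code.</p>
      <preformat>
import numpy as np
from sklearn.metrics import f1_score

# y_true, y_pred: integer class indices for every test image
# species_in_country: dict mapping a country index to the set of species
#   indices that may occur there (provided by the organizers)

def macro_f1(y_true, y_pred, num_classes):
    # per-species F1 averaged with equal weight over all species (Eq. 1-2)
    return f1_score(y_true, y_pred, labels=list(range(num_classes)), average="macro")

def f1_country(y_true, y_pred, species_in_country, num_classes):
    # per-species F1 values (Eq. 1)
    per_class = f1_score(y_true, y_pred, labels=list(range(num_classes)), average=None)
    country_scores = []
    for country, species in species_in_country.items():
        # country-averaged F1 over the species that may appear in this country (Eq. 3)
        country_scores.append(np.mean([per_class[i] for i in species]))
    # average over all countries (Eq. 4)
    return float(np.mean(country_scores))
      </preformat>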
    </sec>
    <sec id="sec-4">
      <title>4. Object detection</title>
      <p>Object detection can improve snake classification accuracy as results from earlier rounds of the
SnakeCLEF competition show. As mentioned in Section 2, top-scoring teams applied ResNet-50
and Mask R-CNN in round 4 (SnakeCLEF 2020). We conducted experiments with a lightweight
SSD MobileNet v2 and a state-of-the-art EficientDet-D1 model. More complex models (D2 to
D7) might improve the results in the presence of suficient training data; however, they are
prone to overfitting. Table 1 shows the comparison of the described models. Both SSD and
EficientDet-D1 are one-stage networks, which generally run faster than the two-stage Mask
R-CNN.</p>
      <p>A subset of the train and val datasets consisting of 4700 images was annotated with the help
of the labelImg utility [38]. These samples were split in an 85% − 15% ratio to construct a
training and a validation input for the object detection models. The models were initialized with
weights pre-trained on the COCO 2017 dataset and then fine-tuned on the annotated part of the
SnakeCLEF dataset using the Tensorflow Object Detection API. Both models were trained to
detect two classes (snake and background) and used with mostly the default settings. For data
augmentation, only vertical flipping was used. The validation mAPs plateaued before reaching
30 000 steps in both cases. The best results are shown in Table 1. The EfficientDet recorded
a higher mAP; however, the difference is not as big as in the case of the COCO dataset, possibly
because there are only two classes. Some examples of detected snakes are shown in Figure 1.</p>
      <p>As the EfficientDet-D1 performed better, only this model was put to use in image
classification. After running the inference on a picture, the model's output consists of the coordinates
of bounding boxes and the $p_0$, $p_1$ probabilities of the box containing background or snake,
respectively. One image could contain multiple snakes (most likely of the same species) or none
at all; therefore, our method of using the boxes (ordered descending by $p_1$) is the following:
• If for the first box $p_1 \geq t_{high}$, then we take this box, and the next boxes having $p_1 \geq t_{high}$
(up to a maximum of three boxes, although there was not a single image in the dataset
for which at least 3 boxes met this constraint for $t_{high} = 0.75$)2.
• If for the first box $t_{low} \leq p_1 &lt; t_{high}$, we take only the first box.
• If $p_1 &lt; t_{low}$ for the first box, then we take the whole image. During training, we
experimented with dropping such images, but it is not possible in the case of validation
or testing.</p>
      <p>2 The reason for this is that one-stage object detection networks use non-max suppression to prevent overlapping
boxes containing the same class in the result.</p>
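      <p>The double-thresholding rule above can be written down compactly. The sketch below is only an illustration: it assumes the boxes are already sorted in descending order of $p_1$ and given in the normalized (ymin, xmin, ymax, xmax) format of the TF Object Detection API; the helper crop_box is our own.</p>
      <preformat>
def crop_box(image, box):
    # box is a normalized (ymin, xmin, ymax, xmax) tuple; image is a NumPy array
    h, w = image.shape[:2]
    ymin, xmin, ymax, xmax = box
    return image[int(ymin * h):int(ymax * h), int(xmin * w):int(xmax * w)]

def select_regions(image, boxes, p1_scores, t_low=0.2, t_high=0.75, max_boxes=3):
    """Return the image regions to classify, following the double-threshold rule."""
    if p1_scores[0] >= t_high:
        # keep every box above t_high, up to max_boxes
        selected = [b for b, p in zip(boxes, p1_scores) if p >= t_high][:max_boxes]
        return [crop_box(image, b) for b in selected]
    if p1_scores[0] >= t_low:
        # a single, moderately confident box
        return [crop_box(image, boxes[0])]
    # no confident detection: fall back to the whole image
    return [image]
      </preformat>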
      <p>If multiple boxes are cropped from training or validation images, all of them are taken to the
dataset with the ground truth label. In the case of testing, the classifier is run for all boxes, and
the results are combined by first summing the prediction vectors and then normalizing the sum to a
unit vector. More precisely, if the prediction vectors for the $n$ different boxes are $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$,
then the combined prediction is (where $\lVert \cdot \rVert$ denotes the Euclidean norm):
\[ \mathbf{y} = \frac{\sum_{i=1}^{n} \mathbf{y}_i}{\lVert \sum_{i=1}^{n} \mathbf{y}_i \rVert} \tag{7} \]</p>
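      <p>Combining the per-box prediction vectors according to Equation (7) is a sum followed by L2 normalization, for example:</p>
      <preformat>
import numpy as np

def combine_predictions(box_predictions):
    """box_predictions: list of class-probability vectors, one per detected box (Eq. 7)."""
    summed = np.sum(box_predictions, axis=0)
    return summed / np.linalg.norm(summed)  # normalize to unit Euclidean length
      </preformat>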
      <p>The EfficientDet detector was run on the whole dataset after training. Figure 2 shows the
cumulative distribution functions (CDF) of the $p_1$ probabilities of the first bounding boxes on
the test data and on the combined training and validation datasets (trainval). The medians are
0.87 and 0.629, respectively. For the second bounding boxes, the medians are 0.068 and 0.146.
Comparing these numbers under the assumption that the vast majority of the images contain a
single snake, the network was more accurate on the test dataset.</p>
      <p>Another conclusion is that the straightforward $t_{low} = 0.5$ choice (i.e., only keeping boxes
for which the network predicts a higher probability of containing a snake than background) is
not necessarily the best. For this value, 30% of the training images would be preserved without
cropping, as the CDF shows. In most cases, these images contained a snake that is harder to
recognize or has a bounding box with an unusual aspect ratio (as the network was fitting boxes
with aspect ratios of 0.5, 1.0, and 2.0).</p>
    </sec>
    <sec id="sec-5">
      <title>5. Classification</title>
      <p>During classification, intermediate results were evaluated on the validation dataset, while scores
on the test set are only available for those attempts that were submitted for the challenge
and evaluated by the organizers. Accuracy, F1 country, and macro F1 scores were calculated in
both cases (as defined in Section 3), while loss values are only available on the validation data.</p>
      <sec id="sec-5-1">
        <title>5.1. Preprocessing</title>
        <p>The first preprocessing step in the case of the Full-train networks (see Tables 2 and 3) was running
the EfficientDet object detector on the trainval dataset as described in Section 4 and saving the
results in the TFRecord format. For the validation and test datasets, $t_{low} = 0.2$ and $t_{high} = 0.75$
were used, while for the training data, $t_{low} = 0.5$ was the default.</p>
        <p>Before running the network on a batch of images, some preprocessing steps were needed.
The following steps were executed each time an image was read into memory (the images in
the dataset were not modified/overwritten); items marked with a † symbol were only executed during training, but not during validation or testing:
1. Grayscale to RGB conversion, if needed
2. Rescaling to 224 × 224 × 3 (the standard input format for EfficientNet-B0)
†3. Vertical and horizontal flip with probabilities of $p = 0.5$
†4. Brightness and contrast modification with a factor of 0.2
5. Float conversion (scaling from [0, 255] to [0.0, 1.0] and normalization were not executed,
as these are part of the EfficientNet model in Keras [39])</p>
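        <p>A possible realization of these steps with the TensorFlow image ops is sketched below; the exact jitter magnitudes of the brightness and contrast step are our assumed interpretation of the 0.2 factor, not taken from the training script.</p>
        <preformat>
import tensorflow as tf

def preprocess(image, training):
    # 1. grayscale to RGB conversion, if needed
    if image.shape[-1] == 1:
        image = tf.image.grayscale_to_rgb(image)
    # 2. rescaling to the EfficientNet-B0 input size
    image = tf.image.resize(image, (224, 224))
    if training:
        # 3. random vertical and horizontal flips (p = 0.5 each)
        image = tf.image.random_flip_up_down(image)
        image = tf.image.random_flip_left_right(image)
        # 4. brightness and contrast jitter
        image = tf.image.random_brightness(image, max_delta=0.2)
        image = tf.image.random_contrast(image, 0.8, 1.2)
    # 5. float conversion; rescaling/normalization is left to the EfficientNet layers
    return tf.cast(image, tf.float32)
        </preformat>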
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Models</title>
        <p>The parameters used to train the different models are shown in Table 2 (note that EfficientNet-B0
networks were used in each case). The Baseline model, trained on a dataset of a reduced size,
was provided by the organizers. Min-train was trained with the Tensorflow library using
mostly the same parameters and the same dataset as the Baseline, without using object detection.
The Full-train networks were trained on the complete training dataset with object detection
included. The loss function was categorical cross entropy in all cases.</p>
        <p>The Adam optimizer was used for training all the models, with the learning rate decayed from an initial
to a minimum value. The values $\beta_1 = 0.9$ and $\beta_2 = 0.999$ were used as the exponential decay rates for the
first and second moment estimates, respectively. The parameter epochs shows the number of
epochs before the results on the validation data plateaued and the training was terminated.
Some models started the training process with frozen layers, leaving only the classification head
trainable to prevent the big initial gradients from destroying the pre-trained weights. However,
all Batch Normalization layers were kept frozen in all the models, even after unfreezing the
others. This is a common technique suggested by the Keras developers in order to prevent the
pre-trained weights from being destroyed [39, 40], which would not be preferable with a dataset of this
size, due to the risk of overfitting.</p>
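        <p>A minimal Keras sketch of this setup (a classification head on a pre-trained EfficientNet-B0 with the Batch Normalization layers kept frozen), following the pattern of the Keras fine-tuning guides [39, 40]; apart from the hyperparameters stated above, the values are illustrative:</p>
        <preformat>
import tensorflow as tf

NUM_CLASSES = 772

base = tf.keras.applications.EfficientNetB0(include_top=False, weights="imagenet",
                                            input_shape=(224, 224, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(base.input, outputs)

# optionally start with the backbone frozen, training only the classification head
base.trainable = False
# ... after a few epochs, unfreeze everything except the Batch Normalization layers
base.trainable = True
for layer in base.layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3,
                                                 beta_1=0.9, beta_2=0.999),
              loss="categorical_crossentropy", metrics=["accuracy"])
        </preformat>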
        <p>As Table 3 shows, the Full-train networks performed quite similarly. Full-train 1 converged
to the worst optimum, possibly due to the low initial learning rate and the frozen layers at the
beginning. Interestingly, its F1 scores were not inferior to those of the other networks on the validation
data. This might be caused by some noise in the validation data; another option is that
the loss does not correlate well with the F1 scores. However, the networks with lower loss
generally predict a higher probability for the correct species4; thus, they are better candidates
for reweighting using location data, as described in Section 6.</p>
        <p>Submission 1 The first submission included the Full-train 2 network. The learning rate was
0.001 in the first 6 epochs, then multiplied by 0.5 or 0.75 in alternation. During object detection,
the parameters were $t_{low} = 0.5$ and $t_{high} = 0.75$, and images whose best bounding box fell
under $t_{low}$ were preserved without cropping. Despite recording the highest F1 country score
on the validation data, the loss remained higher than in the subsequent attempts.</p>
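        <p>The alternating decay described above can be expressed with a Keras learning-rate scheduler; the sketch below reproduces the stated schedule (0.001 for six epochs, then alternating factors of 0.5 and 0.75) and is not the exact training script:</p>
        <preformat>
import tensorflow as tf

def alternating_decay(epoch, lr, initial_lr=0.001, warm_epochs=6):
    """Keep initial_lr for the first warm_epochs, then multiply by 0.5 and 0.75 in alternation."""
    if epoch >= warm_epochs:
        factor = 0.5 if (epoch - warm_epochs) % 2 == 0 else 0.75
        return lr * factor
    return initial_lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(alternating_decay)
# model.fit(train_ds, validation_data=val_ds, epochs=..., callbacks=[lr_callback])
        </preformat>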
        <p>Submission 3 The Full-train 3 model was trained with a similar approach as in the case of
Submission 1. A notable difference was that the batch size was increased to 128 and the initial
learning rate to 0.003. On the validation data, this resulted in a major difference in the loss
values, and a top 92.13% accuracy on the test data.</p>
        <p>Submission 4 The fourth submission was the model Full-train 4. In this attempt, $t_{low}$ was
modified to 0.2 on the training data, and images falling below this value during object detection
(5008 altogether) were dropped. The learning rate schedule followed the same pattern as in
Submission 1, but started from a higher value of 0.01. The results were lower on the validation
and the test set than those of the previous submission.</p>
        <p>4 Considering the categorical cross entropy loss, the $\hat{\mathbf{y}}$ vector contains a single nonzero element referring to the
correct class, where the value is 1. Substituting into the formula, the loss is $-\log y_i$ if the ground truth class is $i$.
As $y_i \leq 1$, the $-\log y_i$ loss value decreases as the $y_i$ prediction increases.</p>
        <p>Submission 5 This submission included the Full-train 3 network (as Submission 3), but
during the evaluation of the test set, object detection was not used. This attempt performed
the worst in all metrics (see Table 4), except accuracy on the test set. The difference from the other
submissions is larger on the validation set, which can be explained by the lower quality of its
images, where object detection might greatly improve the visibility of the snake.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Use of location data</title>
      <p>The models described in Section 5.2 did not utilize the location data provided with the images.
This information can be used to predict the class label independently, using frequentist statistics.
Let $Y$ be the random variable of the image label (class) and $L$ that of the country; then the probability
that a sample belongs to class $i$ can be approximated from the training set as:
\[ P(Y = i \mid L = c) \approx \frac{\#\,\text{images in the training set from class } i \text{ and country } c}{\#\,\text{images in the training set from country } c} \tag{8} \]</p>
      <p>If the prediction of the neural network for the $i$-th class is $p_i$ and the image is from country $c$,
a new prediction can be created by multiplication with the probabilities estimated from the location data:
\[ p_i^* = p_i \cdot P(Y = i \mid L = c) \tag{9} \]
Normalizing $\mathbf{p}^*$ to unit length gives the new prediction vector. However, one could apply Bayes'
Theorem to split the probabilities in the following way:
\[ P(Y = i \mid L = c) = \frac{P(L = c \mid Y = i) \cdot P(Y = i)}{P(L = c)} \tag{10} \]</p>
      <p>As $P(L = c)$ is only a normalizing constant (the same for all $i$), and the prediction vectors are
normalized to unit length anyway, it can be omitted. The $P(Y = i)$ factor is the prior probability
of the $i$-th class. It puts more weight on more frequent classes and less on the least frequent
ones, which might not harm classification accuracy, but is clearly not preferable for the macro F1
and F1 country scores, which put equal weights on the classes. Therefore, only the $P(L = c \mid Y = i)$
likelihood remains, which is approximated as:
\[ P(L = c \mid Y = i) \approx \frac{\#\,\text{images in the training set from class } i \text{ and country } c}{\#\,\text{images in the training set from class } i} \tag{11} \]</p>
      <p>The more training images are present for a class, the more accurate the approximation is.
However, multiplication by 0 is not preferred, as (i) a snake of a particular species might appear
anywhere with a small probability (e.g., a captive snake), and (ii) for classes with only a few training
samples, there is a high chance that we do not have samples from each country where the
species is native. Therefore, a small $\varepsilon$ constant is used:
\[ p_i^* = p_i \cdot \max \{ P(L = c \mid Y = i),\ \varepsilon \} \tag{12} \]</p>
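      <p>The likelihood weighting of Equations (11) and (12) only requires class/country co-occurrence counts from the training metadata; a minimal sketch (the variable names are ours):</p>
      <preformat>
import numpy as np

def location_likelihood(train_classes, train_countries, num_classes, num_countries):
    """Estimate P(L = c | Y = i) from training-set counts (Eq. 11)."""
    counts = np.zeros((num_classes, num_countries))
    for i, c in zip(train_classes, train_countries):
        counts[i, c] += 1
    # guard against empty rows, although every class has at least nine training images
    return counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

def reweight(prediction, country, likelihood, eps=7e-6):
    """Multiply the network output by max(P(L = c | Y = i), eps) and renormalize (Eq. 12)."""
    weighted = prediction * np.maximum(likelihood[:, country], eps)
    return weighted / np.linalg.norm(weighted)
      </preformat>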
      <p>The method was tested for the predictions of the Full-train 3 network, as this model had the
lowest loss on the validation dataset. The predictions were evaluated for three different $\varepsilon$ values:
$\varepsilon = 0$ (submission 7), $\varepsilon = 7 \times 10^{-6}$ (submission 2), and $\varepsilon = 7 \times 10^{-4}$ (submission 6). The results
are shown in Table 5. Adding location data to the model increased all the metrics to a great
extent, affecting F1 country the most, which saw an increase of 0.089 on the test data.</p>
      <p>On the test data, there was another major increase in the F1 country score when $\varepsilon$ was
not set to 0, while the macro F1 and accuracy scores also rose. The results did not differ much
between $\varepsilon = 7 \times 10^{-4}$ and $\varepsilon = 7 \times 10^{-6}$; however, $7 \times 10^{-6}$ seems to be better in general.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>We elaborated an automated snake species identification system that first applies object
detection to separate the snake(s) from the background, then incorporates visual information and
location metadata into a classification algorithm to categorize the detected snakes. EfficientDet
and EfficientNet were used for object detection and classification, respectively. Based on our
experiments, we can conclude that object detection can positively influence snake identification;
moreover, the inclusion of geographical data brought further significant improvement. Our best
submission achieved an F1 country score of 0.903 and almost 95% classification accuracy in
the official evaluation of SnakeCLEF 2021.</p>
      <p>
[14] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object
detection and semantic segmentation, in: Proceedings of the IEEE conference on
computer vision and pattern recognition, 2014, pp. 580–587. doi:https://doi.org/10.1109/
cvpr.2014.81.
[15] R. Girshick, Fast r-cnn, in: Proceedings of the IEEE international conference on computer
vision, 2015, pp. 1440–1448. doi:https://doi.org/10.1109/iccv.2015.169.
[16] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detection with
region proposal networks, arXiv preprint arXiv:1506.01497 (2015). doi:https://doi.org/
10.1109/tpami.2016.2577031.
[17] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask r-cnn, in: Proceedings of the IEEE
international conference on computer vision, 2017, pp. 2961–2969. doi:https://doi.org/
10.1109/iccv.2017.322.
[18] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, M. Pietikäinen, Deep learning for
generic object detection: A survey, International journal of computer vision 128 (2020)
261–318.
[19] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time
object detection, in: Proceedings of the IEEE conference on computer vision and pattern
recognition, 2016, pp. 779–788. doi:https://doi.org/10.1109/cvpr.2016.91.
[20] J. Redmon, A. Farhadi, Yolo9000: better, faster, stronger, in: Proceedings of the IEEE
conference on computer vision and pattern recognition, 2017, pp. 7263–7271. doi:https:
//doi.org/10.1109/cvpr.2017.690.
[21] J. Redmon, A. Farhadi, Yolov3: An incremental improvement, arXiv preprint
arXiv:1804.02767 (2018).
[22] A. Bochkovskiy, C.-Y. Wang, H.-Y. M. Liao, Yolov4: Optimal speed and accuracy of object
detection, arXiv preprint arXiv:2004.10934 (2020).
[23] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, Ssd: Single shot
multibox detector, in: European conference on computer vision, Springer, 2016, pp. 21–37.
doi:https://doi.org/10.1007/978-3-319-46448-0_2.
[24] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in:
Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988.
doi:https://doi.org/10.1109/iccv.2017.324.
[25] M. Tan, R. Pang, Q. V. Le, Efficientdet: Scalable and efficient object detection, in:
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp.
10781–10790. doi:https://doi.org/10.1109/cvpr42600.2020.01079.
[26] L. Picek, I. Bolon, A. M. Durso, R. R. de Castañeda, Overview of the SnakeCLEF 2020:</p>
      <p>Automatic Snake Species Identification Challenge, 2020.
[27] A. P. James, B. Mathews, S. Sugathan, D. K. Raveendran, Discriminative histogram
taxonomy features for snake species identification, Human-Centric Computing and Information
Sciences 4 (2014) 1–11. doi:https://doi.org/10.1186/s13673-014-0003-0.
[28] A. Amir, N. A. H. Zahri, N. Yaakob, R. B. Ahmad, Image classification for snake species using
machine learning techniques, in: International Conference on Computational Intelligence
in Information System, Springer, 2016, pp. 52–59.
[29] A. Patel, L. Cheung, N. Khatod, I. Matijosaitiene, A. Arteaga, J. W. Gilkey, Revealing the
unknown: real-time recognition of galápagos snake species using deep learning, Animals
10 (2020) 806. doi:https://doi.org/10.3390/ani10050806.
[30] I. S. Abdurrazaq, S. Suyanto, D. Q. Utama, Image-based classification of snake species
using convolutional neural network, in: 2019 International Seminar on Research
of Information Technology and Intelligent Systems (ISRITI), IEEE, 2019, pp. 97–102.
doi:https://doi.org/10.1109/isriti48646.2019.9034633.
[31] Z. Yang, R. Sinnott, Snake detection and classification using deep learning, in: Proceedings
of the 54th Hawaii International Conference on System Sciences, 2021, p. 1212. doi:https:
//doi.org/10.24251/hicss.2021.148.
[32] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, R. Shah, Signature verification using a
"siamese" time delay neural network, Advances in neural information processing systems
6 (1993) 737–744. doi:https://doi.org/10.1142/s0218001493000339.
[33] G. Szűcs, M. Németh, Double-view matching network for few-shot learning to
classify covid-19 in x-ray images, INFOCOMMUNICATIONS JOURNAL 13 (2021) 26–34.
doi:https://doi.org/10.36244/icj.2021.1.4.
[34] C. Abeysinghe, A. Welivita, I. Perera, Snake image classification using siamese networks, in:
Proceedings of the 2019 3rd International Conference on Graphics and Signal Processing,
2019, pp. 8–12. doi:https://doi.org/10.1145/3338472.3338476.
[35] R. M. Putra, D. Q. Utama, et al., Snake bite classification using chain code and k nearest
neighbour, in: Journal of Physics: Conference Series, volume 1192, IOP Publishing, 2019,
p. 012015. doi:https://doi.org/10.1088/1742-6596/1192/1/012015.
[36] GokulaKrishnan, Diving into deep learning — part 3 — a deep learning
practitioner’s attempt to build state of the art snake-species image classifier, 2019. URL:
https://medium.com/@Stormblessed/diving-into-deep-learning-part-3-a-deep-learningpractitioners-attempt-to-build-state-of-the-2460292bcfb.
[37] L. Bloch, A. Boketta, C. Keibel, E. Mense, A. Michailutschenko, O. Pelka, J. Rückert,
L. Willemeit, C. Friedrich, Combination of image and location information for snake
species identification using object detection and efficientnets, in: CLEF working notes
2020, CLEF: Conference and Labs of the Evaluation Forum (2020).
[38] Tzutalin, Labelimg, 2015. URL: https://github.com/tzutalin/labelImg.
[39] Keras documentation: Image classification via fine-tuning with EfficientNet, 2020. URL:
https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/.
[40] Keras documentation: Transfer learning &amp; fine-tuning, 2020. URL: https://keras.io/guides/
transfer_learning/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>W. H. O.</surname>
          </string-name>
          (WHO),
          <source>Snakebite envenoming - Key Facts</source>
          <year>2021</year>
          ,
          <year>2021</year>
          . URL: https://www.who.int/news-room/fact-sheets/detail/snakebite-envenoming.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I.</given-names>
            <surname>Bolon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Durso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Botero</given-names>
            <surname>Mesa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Alcoba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chappuis</surname>
          </string-name>
          , R. Ruiz de Castañeda,
          <article-title>Identifying the snake: First scoping review on practices of communities and healthcare providers confronted with snakebite across the world</article-title>
          ,
          <source>PLOS ONE 15</source>
          (
          <year>2020</year>
          )
          <article-title>e0229989</article-title>
          . doi:https://doi.org/10.1371/journal.pone.
          <volume>0229989</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pathmeswaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kasturiratne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fonseka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nandasena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lalloo</surname>
          </string-name>
          , H. De Silva,
          <article-title>Identifying the biting species in snakebite by clinical features: an epidemiological tool for community surveys</article-title>
          ,
          <source>Transactions of the Royal Society of Tropical Medicine and Hygiene</source>
          <volume>100</volume>
          (
          <year>2006</year>
          )
          <fpage>874</fpage>
          -
          <lpage>878</lpage>
          . doi:https://doi.org/10.1016/j.trstmh.
          <year>2005</year>
          .
          <volume>10</volume>
          .003.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>R. R. de Castañeda</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          <string-name>
            <surname>Durso</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Ray</surname>
            ,
            <given-names>J. L.</given-names>
          </string-name>
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Alcoba</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Chappuis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Salathé</surname>
            ,
            <given-names>I. Bolon</given-names>
          </string-name>
          ,
          <article-title>Snakebite and snake identification: empowering neglected communities and health-care providers with ai</article-title>
          ,
          <source>The Lancet Digital Health</source>
          <volume>1</volume>
          (
          <year>2019</year>
          )
          <fpage>e202</fpage>
          -
          <lpage>e203</lpage>
          . doi:https://doi.org/10.1016/s2589-
          <volume>7500</volume>
          (
          <issue>19</issue>
          )
          <fpage>30086</fpage>
          -
          <lpage>x</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Durso</surname>
          </string-name>
          , R. Ruiz De Castañeda,
          <string-name>
            <surname>I. Bolon</surname>
          </string-name>
          , Overview of snakeclef 2021:
          <article-title>Automatic snake species identification with country-level focus</article-title>
          ,
          <source>in: Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lorieul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cole</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          , R. Ruiz De Castañeda,
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Bolon</surname>
          </string-name>
          , Isabelle,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dorso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          , I. Eggel,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          , Overview of lifeclef
          <year>2021</year>
          :
          <article-title>a system-oriented evaluation of automated species identification and species distribution prediction</article-title>
          ,
          <source>in: Proceedings of the Twelfth International Conference of the CLEF Association (CLEF</source>
          <year>2021</year>
          ),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Druzhkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kustikova</surname>
          </string-name>
          ,
          <article-title>A survey of deep learning methods and software tools for image classification and object detection</article-title>
          ,
          <source>Pattern Recognition and Image Analysis</source>
          <volume>26</volume>
          (
          <year>2016</year>
          )
          <fpage>9</fpage>
          -
          <lpage>15</lpage>
          . doi:https://doi.org/10.1134/s1054661816010065.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Rawat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Deep convolutional neural networks for image classification: A comprehensive review</article-title>
          ,
          <source>Neural computation 29</source>
          (
          <year>2017</year>
          )
          <fpage>2352</fpage>
          -
          <lpage>2449</lpage>
          . doi:https://doi.org/ 10.1162/neco_a_
          <fpage>00990</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          ,
          <source>arXiv preprint arXiv:1409.1556</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          , W. Liu,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sermanet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Reed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Anguelov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Erhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vanhoucke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rabinovich</surname>
          </string-name>
          ,
          <article-title>Going deeper with convolutions</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . doi:https://doi.org/10.1109/ cvpr.
          <year>2015</year>
          .
          <volume>7298594</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Ren,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          . doi:https://doi.org/10.1109/cvpr.
          <year>2016</year>
          .
          <volume>90</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          , Efficientnet:
          <article-title>Rethinking model scaling for convolutional neural networks</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>6105</fpage>
          -
          <lpage>6114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kalenichenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weyand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Andreetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Adam</surname>
          </string-name>
          , Mobilenets:
          <article-title>Efficient convolutional neural networks for mobile vision applications</article-title>
          ,
          <source>arXiv preprint arXiv:1704.04861</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>