<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multibranch Co-training to Mine Venomous Feature Representation: A Solution to SnakeCLEF2024</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peng Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yangyang Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bao-Feng Tan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yi-Chao Zhou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yong Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiu-Shen Wei</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science and Engineering, Nanjing University of Science and Technology</institution>
          ,
          <addr-line>Nanjing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science and Engineering, and Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Southeast University</institution>
          ,
          <addr-line>Nanjing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The SnakeCLEF2024 competition aims to develop an advanced algorithm capable of automatically identifying snake species from images. Accurate identification of snake species in snakebite cases can assist doctors in administering targeted antivenom, which is crucial for effective treatment. In this paper, we propose a multibranch co-training strategy based on Convolutional Neural Networks (CNNs) as our solution. During the training phase, our method consists of three branches that can be trained end-to-end. The first branch is used for the classification of all species and generates a gating coefficient. The second branch specifically focuses on venomous snakes, while the third branch concentrates on harmless species. The gating coefficient determines which of these branches will be utilized. During the inference phase, we retain only the first branch. Our solution significantly enhances the model's ability to distinguish between venomous and harmless snake species, achieving an accuracy of 69.83% and a track1 score of 83.57% on the private leaderboard, ranking 1st among all participants. The code is available at https://huggingface.co/pengdadaaa/SnakeCLEF2024.</p>
      </abstract>
      <kwd-group>
<kwd>Snake Species Identification</kwd>
        <kwd>Fine-grained image recognition</kwd>
        <kwd>Long-tailed</kwd>
        <kwd>SnakeCLEF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The SnakeCLEF2024 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] competition, co-hosted as part of the LifeCLEF2024 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] within the CLEF2024
conference and the FGVC11 workshop in conjunction with the CVPR2024 conference, aims to advance
the development of robust algorithms for snake species identification from images. Snakebites cause between 81,000 and 138,000 deaths each year, and an additional 400,000 victims
suffer from incurable physical and psychological disabilities [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Accurate identification of snake species is crucial for administering the correct antivenom, which can significantly reduce the number of fatalities and disabilities caused by snakebites. Furthermore, snake species identification can improve the protection of snakes, reducing the number of snakes killed out of fear. This objective is profoundly significant for biodiversity conservation and is a crucial aspect of human health preservation.
      </p>
      <p>
        Compared to SnakeCLEF2023 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the test data of SnakeCLEF2024 contains only image information
without metadata, making it more practical but also more challenging for accurate recognition. Unlike
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we focus on enhancing the model’s capacity to mine distinguishable features for recognizing venomous and harmless species, and provide an efficient solution. Specifically, we use the first three stages of the CNN as the basic feature extractor. The fourth stage and a fully connected layer are treated as experts responsible for making predictions, constructing a model similar to a mixture of experts. Experimental results show that, through end-to-end co-training, our method effectively improves model performance and achieves significant improvements on multiple metrics.
      </p>
      <p>The remainder of this paper is organized as follows: Section 2 reviews related work. Section 3 analyzes the competition data and challenges in detail. Section 4 describes our method. Section 5 provides detailed experimental settings and results. Finally, we briefly summarize this work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The problem of automatic snake recognition has been studied for a long time. Early research relied on manually designed rules and hand-crafted features beneficial for snake classification, intended for use by computer scientists and herpetologists [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. A. Amir et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] were the first to use texture-based features along with various machine learning algorithms for automatic snake recognition. With the development of deep learning, CNNs have made tremendous progress in image classification tasks [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">9, 10, 11</xref>
        ].
I. S. Abdurrazaq et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] successfully developed a CNN-based automatic snake classification algorithm,
achieving high accuracy. During the same period, many other snake recognition algorithms based on
deep learning were also proposed.
      </p>
      <p>
        The winning method of SnakeCLEF 2021 [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ] combined an object detection model, EfficientDet-D1 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], with an EfficientNet-B0 classifier, as well as likelihood weighting to fuse image and
location information. The best model reached a macro-averaging F1 score of 90.30%. In SnakeCLEF
2022 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], one team [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] used YOLOv5 [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to first detect the specific location of the snake in the
image, and then used a CNN network for classification, while also utilizing metadata to statistically
determine the regional distribution of snake species. They also employed various strategies such as
test-time augmentation and model ensembling. In SnakeCLEF 2023 [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], the winning team [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] used
CLIP [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] to process metadata and leveraged intermediate layer features from CNNs to aid in the final
classification decision. Additionally, they designed a post-processing strategy to determine whether the
snake was venomous. In previous competitions, some teams also used attention-based models such as
MetaFormer [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], ViT [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], and VOLO [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Competition Description</title>
      <p>Understanding the dataset and metrics is essential for participating in this competition. In this section, we present our understanding of the dataset and provide an overview of the evaluation metrics employed by the competition organizers.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The organizers provide a dataset consisting of 103,404 recorded snake observations, supplemented by
182,261 high-resolution images. These observations encompass a diverse range of 1,784 distinct snake
species.</p>
        <p>Fine-grained Images This dataset presents a challenging fine-grained image classification task, as illustrated in Figure 1 (panels: (a) Ahaetulla_malabarica, (b) Ahaetulla_nasuta, (c) Ahaetulla_oxyrhynca). Our objective is to accurately identify different species. While these species share many visual similarities, they exhibit only subtle differences in fine-grained features. Accurately distinguishing these species demands models capable of identifying subtle yet significant differences.</p>
        <p>Long-tailed Distribution It is worth noting that the provided training dataset follows a heavily long-tailed distribution, as shown in Figure 2. In this distribution, the most frequently encountered species is represented by 1,891 images, whereas the least frequently encountered species is captured by a mere 3 images, highlighting its exceptional rarity within the dataset. Moreover, there are 32,379 images of venomous snakes but 135,348 images of harmless snakes, a further imbalance.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Evaluation Metric</title>
        <p>To motivate research in recognition scenarios with uneven costs for different errors, such as mistaking a venomous snake for a harmless one, this competition again goes beyond the 0-1 loss common in classification. This year’s competition incorporates an evaluation metric, denoted as “track1” on the leaderboard, that combines the macro F1-score with an assessment of the confusion errors related to venomous species. It is calculated as a weighted average of the macro F1-score and the weighted accuracy of the various types of confusion:
$$ \mathrm{track1} = \frac{w_1 F_1 + w_2 (100 - P_1) + w_3 (100 - P_2) + w_4 (100 - P_3) + w_5 (100 - P_4)}{\sum_{i=1}^{5} w_i}, \qquad (1) $$
where $w_1 = 1.0$, $w_2 = 1.0$, $w_3 = 2.0$, $w_4 = 5.0$, $w_5 = 2.0$ are the weights of the individual terms. The metric incorporates several percentages: $F_1$ is the macro F1-score; $P_1$ is the percentage of harmless species misclassified as another harmless species; $P_2$ is the percentage of harmless species misclassified as a venomous species; $P_3$ is the percentage of venomous species misclassified as a harmless species; and $P_4$ is the percentage of venomous species misclassified as another venomous species.</p>
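        <p>For clarity, Eq. (1) can be computed as in the following short sketch; the function is our own illustration rather than official organizer code.</p>
        <preformat>
def track1_score(f1, p1, p2, p3, p4):
    """Track1 metric from Eq. (1). All arguments are percentages in [0, 100]:
    f1 is the macro F1-score and p1..p4 are the four confusion percentages."""
    w = [1.0, 1.0, 2.0, 5.0, 2.0]
    terms = [f1, 100 - p1, 100 - p2, 100 - p3, 100 - p4]
    return sum(wi * t for wi, t in zip(w, terms)) / sum(w)

# A perfect classifier (macro F1 of 100, no confusions) scores 100.
assert track1_score(100, 0, 0, 0, 0) == 100.0
        </preformat>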
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Challenges of the Competition</title>
        <p>
          Past iterations of this competition have witnessed remarkable accomplishments by deep learning
models [
          <xref ref-type="bibr" rid="ref13 ref14 ref16 ref24 ref25 ref26 ref27">13, 14, 24, 16, 25, 26, 27</xref>
          ]. To achieve a better solution, we summarize the competition challenges
this year based on the above analysis:
• Fine-grained image recognition: The field of fine-grained image analysis [
          <xref ref-type="bibr" rid="ref28 ref29 ref30">28, 29, 30</xref>
          ] has long
posed a challenging problem within the FGVC workshop, meriting further investigation and study.
This year’s competition lacks available metadata for the test images, increasing the requirements
for understanding subtle image features and making the task more challenging.
• Long-tailed distribution: This dataset has a heavily long-tailed distribution. The imbalance of data in the tail classes leads to insufficient generalization ability of models on these categories, making it difficult for models to effectively learn and recognize tail-class instances.
• Identification of venomous and harmless species: The distinction between venomous and harmless snake species is meaningful, as venomous snake bites lead to a large number of deaths each year.
• Limited computational resources: We need to process approximately 10,000 images within one hour on a server with an Nvidia T4 GPU (16GB VRAM), 4 vCPUs, and 15GB RAM.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Method</title>
      <p>In this section, we provide a detailed description of our method.</p>
      <sec id="sec-4-1">
        <title>4.1. Data Preprocessing</title>
        <p>Data preprocessing plays a crucial role in machine learning, as it not only influences the final performance but also affects the feasibility of problem resolution. Upon obtaining the dataset provided by the competition organizers, we encountered several issues. For instance, certain images listed in the metadata CSV file did not exist in the corresponding image folders. To address this, we generated a new CSV file by eliminating the affected rows from the original file.</p>
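        <p>A minimal sketch of this cleaning step, assuming pandas and illustrative file and column names (the actual CSV schema may differ):</p>
        <preformat>
import os

import pandas as pd

# Hypothetical paths and column name, for illustration only.
df = pd.read_csv("SnakeCLEF2024-TrainMetadata.csv")
exists = df["image_path"].apply(
    lambda p: os.path.exists(os.path.join("train_images", p))
)
print(f"Dropping {(~exists).sum()} rows whose images are missing")
df[exists].to_csv("train_metadata_clean.csv", index=False)
        </preformat>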
        <p>
          Data augmentation plays a vital role in image classification tasks by expanding the scale and diversity of training data through a series of algorithms and techniques, effectively addressing the issue of overfitting. By applying a variety of image transformation operations, data augmentation significantly enhances the diversity of datasets, enabling models to learn more robust and comprehensive feature representations. In our method, we leverage fundamental image augmentation methods from Albumentations [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ], including RandomResizedCrop, Transpose, HorizontalFlip, VerticalFlip, ShiftScaleRotate, RandomBrightnessContrast, PiecewiseAffine, HueSaturationValue, OpticalDistortion, ElasticTransform, Cutout, and GridDistortion. Furthermore, we incorporate data mixing augmentation techniques such as CutMix [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] and TokenMix [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] during the competition. These methods provide strong regularization to models by mixing images and softening labels, thus preventing model overfitting on the training
dataset. During the inference stage, we also employ Test-Time Augmentation (TTA) by applying various
augmentation methods to each input image, generating multiple augmented versions. These augmented
images are then individually processed by the model to obtain multiple sets of predictions. Finally,
these predictions are averaged to produce the final prediction.
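        </p>
        <p>To make this concrete, below is a minimal sketch of an augmentation pipeline and a TTA loop of the kind described above, assuming Albumentations and PyTorch. The transforms follow the list above, but the parameters are illustrative rather than our exact competition configuration, and the Albumentations API may differ slightly across versions.</p>
        <preformat>
import albumentations as A
import torch
from albumentations.pytorch import ToTensorV2

# Training-time pipeline; parameters are illustrative, not the competition config.
train_tfms = A.Compose([
    A.RandomResizedCrop(height=512, width=512),
    A.Transpose(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.ShiftScaleRotate(p=0.5),
    A.RandomBrightnessContrast(p=0.5),
    A.HueSaturationValue(p=0.5),
    ToTensorV2(),
])

def tta_predict(model, image, tta_tfms, device="cuda"):
    """Average softmax predictions over several augmented views of one image."""
    model.eval()
    probs = []
    with torch.no_grad():
        for tfm in tta_tfms:
            x = tfm(image=image)["image"].unsqueeze(0).to(device)
            probs.append(torch.softmax(model(x), dim=1))
    return torch.stack(probs).mean(dim=0)
        </preformat>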
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Model</title>
        <p>
          Throughout the competition, we explored various models, incorporating both classical and state-of-the-art architectures such as Convolutional Neural Networks and Vision Transformers. The models employed during the competition included ConvNeXt [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], ConvNeXt-v2 [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ], and EVA-02 [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ]. The
implementation of these models was facilitated by the use of the timm library [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ]. Considering the
limitations on model parameters and the need for robust model representation capabilities, we selected
ConvNeXt [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] or ConvNeXt-v2 [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] as the backbone architectures for our final method.
        </p>
        <p>
          However, relying solely on the visual backbone and training it with the classical classification strategy is insufficient for effectively addressing the task at hand. To make the model focus on distinguishable features that can differentiate between venomous and harmless species, we propose a multibranch co-training method as our final submission. The model architecture is illustrated in Figure 3. Inspired by [
          <xref ref-type="bibr" rid="ref38 ref39 ref40">38, 39, 40</xref>
          ], our method primarily involves three branches, which are processed sequentially from top to bottom as shown in Figure 3. Each branch uses the same residual network structure and shares weights for the first three stages. (Figure 3 shows the architecture: image data passes through the shared ConvNeXt stages 1-3, then through three separate ConvNeXt stage-4 branches, whose GMP and GAP features feed the classification heads and the gating coefficient.)</p>
        <p>Given an image $x$, after processing it through the first branch we obtain feature maps from the first three stages and from the fourth stage, denoted as $F_3$ and $F_4$ respectively. The feature map from the fourth stage ($F_4$) undergoes global average pooling (GAP) and is passed through a classification head to obtain logits $z_1$. Additionally, we concatenate the features obtained from global max pooling (GMP) of $F_3$ and $F_4$, and pass them through a fully connected layer and a sigmoid function to obtain $\alpha$, which acts as a gating coefficient to select between the 2nd and 3rd branches. In our method, the 1st branch, serving as the primary branch, can identify all snake species and generates the gating coefficient $\alpha$.</p>
        <p>The 2nd branch focuses on venomous species (venomous branch), while the 3rd branch focuses on harmless species (harmless branch). To make these branches concentrate on their respective tasks, we use binary masks generated from the coarse labels (venomous or harmless) to stop the gradient. We combine the outputs of the 2nd and 3rd branches according to the gating coefficient $\alpha$ to obtain $z_2$, and by summing $z_1$ and $z_2$ we directly derive $z_3$. All three logits are utilized during the training phase. However, in the inference phase, we select only one of them, which is then passed through the softmax($\cdot$) function to produce the predicted probability for an input image (refer to Table 3 for the specific selection).</p>
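        <p>As a concrete illustration, the following is a minimal PyTorch sketch of the forward pass described above, under our reading of Figure 3. The constructor arguments (stages1_3, make_stage4, feat3_dim, feat4_dim) are hypothetical, the gradient-stopping masks are omitted for brevity, and the gated combination $z_2 = \alpha z_{ven} + (1 - \alpha) z_{harm}$ is an assumption consistent with the discussion in Section 6.</p>
        <preformat>
import torch
import torch.nn as nn

class MultiBranchNet(nn.Module):
    """Sketch of the three-branch architecture; stages1_3 is the shared
    ConvNeXt stem (stages 1-3) and make_stage4() builds one stage-4 copy."""

    def __init__(self, stages1_3, make_stage4, feat3_dim, feat4_dim, num_classes):
        super().__init__()
        self.shared = stages1_3                  # stages 1-3, shared weights
        self.stage4_main = make_stage4()         # branch 1: all species
        self.stage4_ven = make_stage4()          # branch 2: venomous species
        self.stage4_harm = make_stage4()         # branch 3: harmless species
        self.head_main = nn.Linear(feat4_dim, num_classes)
        self.head_ven = nn.Linear(feat4_dim, num_classes)
        self.head_harm = nn.Linear(feat4_dim, num_classes)
        self.gate_fc = nn.Linear(feat3_dim + feat4_dim, 1)

    def forward(self, x):
        f3 = self.shared(x)                              # feature map F3
        f4 = self.stage4_main(f3)                        # feature map F4
        z1 = self.head_main(f4.mean(dim=(2, 3)))         # GAP then head: z1
        gmp = torch.cat([f3.amax(dim=(2, 3)), f4.amax(dim=(2, 3))], dim=1)
        alpha = torch.sigmoid(self.gate_fc(gmp))         # gating coefficient
        z_ven = self.head_ven(self.stage4_ven(f3).mean(dim=(2, 3)))
        z_harm = self.head_harm(self.stage4_harm(f3).mean(dim=(2, 3)))
        z2 = alpha * z_ven + (1.0 - alpha) * z_harm      # assumed gated mix
        z3 = z1 + z2
        return z1, z2, z3, alpha
        </preformat>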
      </sec>
      <sec id="sec-4-2">
        <title>4.3. Optimization Procedure</title>
        <p>For the classification, the most widely adopted Cross-Entropy (CE) loss can be written as:
$$ \mathcal{L}_{ce}(\mathbf{z}) = -\sum_{i=1}^{C} y_i \log(\sigma_i), \quad \text{with} \quad \sigma_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \qquad (2) $$
where $\mathbf{z} = [z_1, z_2, \ldots, z_C]$ and $\boldsymbol{\sigma} = [\sigma_1, \sigma_2, \ldots, \sigma_C]$ are the predicted logits and probabilities of the classifier, respectively, and $y_i \in \{0, 1\}$, $1 \leq i \leq C$, is the one-hot ground-truth label. However, a classifier trained with the widely applied CE loss is highly biased on long-tailed datasets, resulting in much lower accuracy on tail classes than on head classes. To tackle this challenge, we extensively explored various techniques implemented in [
          <xref ref-type="bibr" rid="ref41 ref42">41, 42, 43</xref>
          ]. In our final submission, we incorporated the Seesaw loss [44] as a key component. The Seesaw loss can be expressed as:
$$ \mathcal{L}_{seesaw}(\mathbf{z}) = -\sum_{i=1}^{C} y_i \log(\widehat{\sigma}_i), \quad \text{with} \quad \widehat{\sigma}_i = \frac{e^{z_i}}{\sum_{j \neq i} \mathcal{S}_{ij} e^{z_j} + e^{z_i}}, \qquad (3) $$
where the hyper-parameters $\mathcal{S}_{ij}$ are carefully set based on the distribution characteristics inherent in the dataset.</p>
        <p>As shown in Figure 3, for an input image processed by the model, we obtain three predicted logit vectors. For each of them, we calculate the loss using either the CE loss or the Seesaw loss (refer to Table 3 for the specific configurations). The final loss is:
$$ \mathcal{L} = \frac{\mathcal{L}_1(z_1) + \mathcal{L}_2(z_2) + \mathcal{L}_3(z_3)}{3}. \qquad (4) $$
        </p>
        <p>In addition to the choice of loss functions, the selection of an optimizer and an appropriate learning rate decay strategy are important in the training of our models. For optimization, we adopt the AdamW optimizer [45]. To enhance convergence speed and overall performance, we implement cosine learning rate decay [46] coupled with warmup during training. These strategies collectively facilitate more effective and efficient model convergence.</p>
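        <p>As a minimal illustration of Eq. (4), the sketch below averages the three branch losses; the choice of CE or Seesaw loss per branch (Table 3) is abstracted behind a list of loss functions, and all names are illustrative.</p>
        <preformat>
import torch.nn.functional as F

def co_training_loss(z1, z2, z3, target, loss_fns):
    """Eq. (4): average the three branch losses. `loss_fns` holds one loss
    function per branch (CE or Seesaw, per the Table 3 configuration)."""
    losses = [fn(z, target) for fn, z in zip(loss_fns, (z1, z2, z3))]
    return sum(losses) / 3.0

# Example with plain cross-entropy on every branch (z1, z2, z3 come from the
# model's forward pass):
# loss = co_training_loss(z1, z2, z3, labels, [F.cross_entropy] * 3)
        </preformat>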
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>In this section, we will introduce our implementation details and main results.</p>
      <sec id="sec-5-1">
        <title>5.1. Experiment Settings</title>
        <p>
          The proposed method was developed using the PyTorch framework [47]. All the pretrained weights used
in our experiments come from the timm library [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ]. Fine-tuning of these models was conducted across
four Nvidia RTX 3090 GPUs. The total number of training epochs was set to 15, with the first epoch
dedicated to warm-up. To optimize the model parameters, we utilized the AdamW optimizer [45] in
conjunction with a cosine learning rate scheduler [46]. During inference on the test dataset, considering
that an observation may consist of multiple images, we average the predicted probabilities from different
images of the same ID to obtain the final prediction for each observation.
        </p>
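        <p>A minimal sketch of this observation-level averaging, assuming the per-image probabilities are collected in a pandas DataFrame with an 'observation_id' column (names are illustrative):</p>
        <preformat>
import pandas as pd

def predict_per_observation(image_probs: pd.DataFrame) -> pd.Series:
    """image_probs: one row per image, with an 'observation_id' column and one
    probability column per species. Returns one predicted species per ID."""
    obs_probs = image_probs.groupby("observation_id").mean()
    return obs_probs.idxmax(axis=1)
        </preformat>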
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Main Results</title>
        <p>
          In this section, we present our primary experimental results. Unless otherwise specified, models are trained with the Seesaw loss and perform inference in float32. First, we present some basic experimental results. Table 1 and Table 2 respectively show the results of different backbones on the validation set and the public leaderboard. Based on our experimental results and the experience of past winners, we chose the ConvNeXt series models [
          <xref ref-type="bibr" rid="ref34 ref35">34, 35</xref>
          ] as our backbone and used a resolution of 512×512.
        </p>
        <p>After selecting the basic backbone, we conducted experiments using the multibranch co-training strategy proposed in Section 4. We used $z_1$ and $z_3$ with softmax($\cdot$) to obtain the final prediction results, respectively. The experimental results are shown in Table 3. Based on these results, we use $z_1$ for the final prediction and directly drop the last two branches during the inference stage to reduce computational overhead. Multiple evaluation metrics indicate that our solution effectively improves model performance.</p>
        <p>We ensemble the two best-performing models from Table 3, using the average of the output
probabilities from the two models as the final submission result. Considering the limited computational
resources, we use half-precision (float16) during inference. The ensembled result achieved first place on
both the public leaderboard and the private leaderboard.</p>
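      <p>A minimal sketch of this half-precision ensembling step, assuming PyTorch models on a CUDA device; names are illustrative, and casting every module to float16 is a simplification of how mixed precision might be handled in practice:</p>
      <preformat>
import torch

@torch.no_grad()
def ensemble_predict(models, x):
    """Average the softmax outputs of several models, running in float16."""
    x = x.half().cuda()
    probs = [torch.softmax(m.half().cuda().eval()(x), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)
      </preformat>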
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Further Discussion</title>
      <p>Here, we briefly discuss our method. Our primary motivation is to enhance the model’s ability to distinguish between venomous and harmless snake species. Building on this motivation, in addition to classifying all snake species (1,784 in total), our method also indirectly addresses a binary classification problem. The mask in our method serves as the supervisory information for this binary classification task. Specifically, when a venomous image is input, optimizing $z_2$ using the CE loss or the Seesaw loss will increase $\alpha$ (for a harmless image, it increases $1 - \alpha$). By generating $\alpha$ with GMP and introducing the binary classification supervisory information, we in effect apply a constraint to the parameters of the first branch, ensuring that the maximum activation value is directly associated with being venomous or harmless. We conjecture that this constraint enables the network to effectively mine feature representations indicative of venomousness, leading to the improvement in performance. We did not explore the method in greater depth to demonstrate its interpretability. However, we believe that further exploration into fully utilizing the binary classification supervisory information is worthwhile.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This paper focused on addressing the snake classification problem. In our solution, we used the GMP operation and a fully connected layer to generate the gating coefficient $\alpha$, which ties the maximum activation value of the feature map to whether a snake is venomous, and we trained three branches end-to-end. Our multibranch co-training strategy demonstrated significant effectiveness in this competition, achieving a track1 score of 83.57% on the private leaderboard.</p>
      <p>of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 235–244.
[43] X.-S. Wei, S.-L. Xu, H. Chen, L. Xiao, Y. Peng, Prototype-based classifier learning for long-tailed visual recognition, Science China Information Sciences 65 (2022) 160105.
[44] J. Wang, W. Zhang, Y. Zang, Y. Cao, J. Pang, T. Gong, K. Chen, Z. Liu, C. C. Loy, D. Lin, Seesaw loss for long-tailed instance segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 9695–9704.
[45] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, arXiv preprint arXiv:1711.05101 (2017).
[46] I. Loshchilov, F. Hutter, SGDR: Stochastic gradient descent with warm restarts, arXiv preprint arXiv:1608.03983 (2016).
[47] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems, 2019, pp. 8024–8035.
[48] M. Tan, Q. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 6105–6114.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Durso</surname>
          </string-name>
          , Overview of SnakeCLEF 2024:
          <article-title>Revisiting snake species identification in medically important scenarios</article-title>
          ,
          <source>in: Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Espitalier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Estopinan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hrúz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          , et al.,
          <source>Overview of lifeclef</source>
          <year>2024</year>
          :
          <article-title>Challenges on species distribution prediction and identification</article-title>
          ,
          <source>in: International Conference of the CrossLanguage Evaluation Forum for European Languages</source>
          , Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Gutiérrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Calvete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Habib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Warrell</surname>
          </string-name>
          , Snakebite envenoming,
          <source>Nature reviews Disease primers 3</source>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bagherifar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <article-title>Joint feature learning of image data with embedded metadata to leverage snake species classification (</article-title>
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Estopinan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          , et al.,
          <source>Overview of lifeclef</source>
          <year>2023</year>
          <article-title>: evaluation of ai models for the identification and prediction of birds, plants, snakes and fungi</article-title>
          ,
          <source>in: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>416</fpage>
          -
          <lpage>439</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>Watch out venomous snake species: A solution to snakeclef2023</article-title>
          ,
          <source>arXiv preprint arXiv:2307.09748</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>James</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mathews</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sugathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Raveendran</surname>
          </string-name>
          ,
          <article-title>Discriminative histogram taxonomy features for snake species identification</article-title>
          ,
          <source>Human-Centric Computing and Information Sciences</source>
          <volume>4</volume>
          (
          <year>2014</year>
          )
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Amir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A. H.</given-names>
            <surname>Zahri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yaakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <article-title>Image classification for snake species using machine learning techniques</article-title>
          ,
          <source>in: Computational Intelligence in Information Systems: Proceedings of the Computational Intelligence in Information Systems Conference (CIIS</source>
          <year>2016</year>
          ), Springer,
          <year>2017</year>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>60</volume>
          (
          <year>2017</year>
          )
          <fpage>84</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          ,
          <source>arXiv preprint arXiv:1409.1556</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Ren,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Abdurrazaq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suyanto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Q.</given-names>
            <surname>Utama</surname>
          </string-name>
          ,
          <article-title>Image-based classification of snake species using convolutional neural network</article-title>
          , in: 2019
          <source>International Seminar on Research of Information Technology and Intelligent Systems (ISRITI)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Chamidullin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <article-title>A deep learning method for visual recognition of snake species</article-title>
          , Working Notes of CLEF (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Durso</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Bolon</surname>
          </string-name>
          , R. R. de Castañeda, Overview of snakeclef 2021:
          <article-title>Automatic snake species identification with country-level focus</article-title>
          , Working Notes of CLEF (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
<article-title>Efficientdet: Scalable and efficient object detection</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>10781</fpage>
          -
          <lpage>10790</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hrúz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Durso</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Bolon</surname>
          </string-name>
          , Overview of snakeclef 2022:
          <article-title>Automated snake species identification on a global scale</article-title>
          , Working Notes of CLEF (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Böckmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <article-title>Combination of object detection, geospatial data, and feature concatenation for snake species identification</article-title>
          .,
          <source>in: CLEF (Working Notes)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1982</fpage>
          -
          <lpage>2013</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Jocher</surname>
          </string-name>
          , Yolov5 by ultralytics,
          <year>2020</year>
          . URL: https://github.com/ultralytics/yolov5. doi:10.5281/zenodo.3908559.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chamidullin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Durso</surname>
          </string-name>
          , Overview of snakeclef 2023:
          <article-title>snake identification in medically important scenarios</article-title>
          ,
          <source>CLEF</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hallacy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          , G. Goh,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          , et al.,
          <article-title>Learning transferable visual models from natural language supervision</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>8748</fpage>
          -
          <lpage>8763</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Diao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>Metaformer: A unified meta framework for fine-grained recognition</article-title>
          ,
          <source>arXiv preprint arXiv:2203.02751</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Beyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kolesnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weissenborn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Unterthiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Minderer</surname>
          </string-name>
          , G. Heigold,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gelly</surname>
          </string-name>
          , et al.,
          <article-title>An image is worth 16x16 words: Transformers for image recognition at scale</article-title>
          ,
          <source>Proceedings of the International Conference on Learning Representations</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Volo: Vision outlooker for visual recognition</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Bolon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Durso</surname>
          </string-name>
          , R. R. de Castañeda,
          <article-title>Overview of the snakeclef 2020: Automatic snake species identification challenge</article-title>
          , Working Notes of CLEF (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Boketta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Keibel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mense</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Michailutschenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Willemeit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
<article-title>Combination of image and location information for snake species identification using object detection and efficientnets</article-title>
          , Working Notes of CLEF (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          , Y. Cheng,
          <article-title>Solutions for fine-grained and long-tailed snake species recognition in snakeclef 2022</article-title>
          , arXiv preprint arXiv:2207.01216 (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>A deep learning based solution to fungiclef2023</article-title>
          ,
          <string-name>
            <surname>Aliannejadi</surname>
          </string-name>
          et al.[
          <volume>1</volume>
          ] (
          <year>2023</year>
          )
          <fpage>2051</fpage>
          -
          <lpage>2059</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-Z.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. Mac</given-names>
            <surname>Aodha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <article-title>Fine-grained image analysis with deep learning: A survey</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>44</volume>
          (
          <year>2021</year>
          )
          <fpage>8927</fpage>
          -
          <lpage>8948</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <article-title>Attribute-aware deep hashing with self-consistency for large-scale fine-grained image retrieval</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Selective convolutional descriptor aggregation for fine-grained image retrieval</article-title>
          ,
          <source>IEEE Transactions on Image Processing</source>
          <volume>26</volume>
          (
          <year>2017</year>
          )
          <fpage>2868</fpage>
          -
          <lpage>2881</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Buslaev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. I.</given-names>
            <surname>Iglovikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Khvedchenya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Parinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Druzhinin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Kalinin</surname>
          </string-name>
          ,
          <article-title>Albumentations: Fast and flexible image augmentations</article-title>
          ,
          <source>Information</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>125</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yoo</surname>
          </string-name>
          ,
          <article-title>CutMix: Regularization strategy to train strong classifiers with localizable features</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>6023</fpage>
          -
          <lpage>6032</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>TokenMix: Rethinking image mixing for data augmentation in vision transformers</article-title>
          ,
          <source>in: European Conference on Computer Vision</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>455</fpage>
          -
          <lpage>471</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Feichtenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>A convnet for the 2020s</article-title>
          ,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>11976</fpage>
          -
          <lpage>11986</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S.</given-names>
            <surname>Woo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Debnath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Kweon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>ConvNeXt V2: Co-designing and scaling convnets with masked autoencoders</article-title>
          ,
          <source>arXiv preprint arXiv:2301.00808</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <article-title>EVA-02: A visual representation for neon genesis</article-title>
          ,
          <source>arXiv preprint arXiv:2303.11331</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wightman</surname>
          </string-name>
          , PyTorch Image Models, https://github.com/rwightman/pytorch-image-models,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition</article-title>
          ,
          <source>in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>9716</fpage>
          -
          <lpage>9725</lpage>
          . doi:10.1109/CVPR42600.2020.00974.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. X.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Long-tailed recognition by routing diverse distribution-aware experts</article-title>
          ,
          <source>arXiv preprint arXiv:2010.01809</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Nowlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Adaptive mixtures of local experts</article-title>
          ,
          <source>Neural Computation</source>
          <volume>3</volume>
          (
          <year>1991</year>
          )
          <fpage>79</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Bag of tricks for long-tailed visual recognition with deep convolutional neural networks</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>3447</fpage>
          -
          <lpage>3455</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Y.-Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>Distilling virtual examples for long-tailed recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>