1. Introduction

Matěj Sieber

Tomáš Železný

University of West Bohemia, Faculty of Applied Sciences, Univerzitni 2732/8, 301 00 Pilsen, Czech Republic

This paper presents our participation in the SnakeCLEF 2024 challenge, which aims to automate the identification of snake species. We explore several custom loss functions that incorporate the venomousness of snakes. These loss functions are used to train the Swin-v2 tiny model with the same training specification as the baseline solution, so that the impact of each loss function can be measured in isolation. The Swin-v2 tiny model has a low computational demand, which opens the possibility of running it on handheld devices. Our results show that the best approach for maximising performance on the custom competition metrics is to apply a soft target set according to the venomousness of the snake. The best accuracy is achieved by the model trained with a loss that weights the classes according to the number of their instances.

Keywords: SnakeCLEF, Snake Bite, Computer Vision, Classification, Snake Species Identification, Imbalanced dataset

2. Dataset

The SnakeCLEF dataset [4] is a comprehensive collection of snake images used for the classification of snake species. The dataset consists of three parts: training, validation, and a private test set. The training and validation sets are publicly available, while the private test set is held back for evaluating the competition entries. The dataset is available in multiple variants, differing in the size of the images, to accommodate various computational capabilities and research needs. As stated in the SnakeCLEF2023 report [5], all subsets combined result in roughly 110,000 real snake observations with community-verified species labels, ensuring high-quality and reliable data for training models. The dataset contains 1,784 snake species, and the classes exhibit a long-tailed distribution: a small number of classes have a large number of images, while many classes have only a few. Figure 1 shows an illustrative representation of medically important snake species.

3. Evaluation

The competition evaluates competitors' models using four different metrics. The first two are the macro-averaged F1 score and accuracy, which are the standard metrics for classification tasks. In the real world, however, misclassifying different snakes does not have the same consequences. In the worst-case scenario, a deadly venomous snake is misclassified as a harmless one, which may result in death. With this in mind, the organisers introduced two further metrics that take into account whether the snake is venomous or not. These metrics are given in Equations (2) and (3). In practice, different venomous snake bites cannot all be treated in the same way either, as some snakes are more venomous than others and each bite requires a different type of serum, which may vary in side effects, price or availability. As it would be very expensive to achieve this level of granularity, a generalisation was made in the form of a universal, harmless, free and always-accessible antivenom.

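$$
L(y, \hat{y}) =
\begin{cases}
0 & \text{if } y = \hat{y} \\
1 & \text{if } y \neq \hat{y} \text{ and } p(y) = 0 \text{ and } p(\hat{y}) = 0 \\
2 & \text{if } y \neq \hat{y} \text{ and } p(y) = 0 \text{ and } p(\hat{y}) = 1 \\
2 & \text{if } y \neq \hat{y} \text{ and } p(y) = 1 \text{ and } p(\hat{y}) = 1 \\
5 & \text{if } y \neq \hat{y} \text{ and } p(y) = 1 \text{ and } p(\hat{y}) = 0
\end{cases}
\tag{1}
$$

$$ L = \sum_{i} L(y_i, \hat{y}_i), \tag{2} $$

where the correct species is $y$ and the predicted species is $\hat{y}$, and $p(s) = 1$ if species $s$ is venomous, otherwise $p(s) = 0$.

$$ M = \frac{w_1 F_1 + w_2 C_{hh} + w_3 C_{hv} + w_4 C_{vv} + w_5 C_{vh}}{w_1 + w_2 + w_3 + w_4 + w_5}, \tag{3} $$

where $w_1 = 1$, $w_2 = 1$, $w_3 = 2$, $w_4 = 2$, and $w_5 = 5$ are the weights of individual confusions, $C_{vh}$ is the percentage of venomous species wrongly classified as a harmless species, $C_{hv}$ is the percentage of harmless species wrongly classified as a venomous species, $C_{vv}$ is the percentage of venomous species wrongly classified as another venomous species, $C_{hh}$ is the percentage of harmless species wrongly classified as another harmless species, and $F_1$ is the macro-averaged F1 score.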
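As a minimal sketch of Equations (1) and (2), the penalty can be computed per prediction and summed over a set of predictions. The function names, the integer species ids and the `venomous` lookup table are illustrative assumptions and are not taken from the competition code.

```python
def penalty(y, y_hat, venomous):
    """Per-sample penalty of Equation (1); `venomous` maps species id -> bool."""
    if y == y_hat:
        return 0
    if venomous[y]:
        # True species is venomous: predicting a harmless species is the worst case.
        return 2 if venomous[y_hat] else 5
    # True species is harmless.
    return 2 if venomous[y_hat] else 1


def total_penalty(y_true, y_pred, venomous):
    """Sum of the per-sample penalties over all predictions, Equation (2)."""
    return sum(penalty(y, y_hat, venomous) for y, y_hat in zip(y_true, y_pred))


# Toy example with four hypothetical species: 0 and 1 harmless, 2 and 3 venomous.
venomous = {0: False, 1: False, 2: True, 3: True}
y_true = [0, 0, 2, 2, 3]
y_pred = [0, 1, 3, 0, 3]
print(total_penalty(y_true, y_pred, venomous))  # 0 + 1 + 2 + 5 + 0 = 8
```

In this toy case the single venomous-as-harmless confusion contributes more than all other errors combined, which is the behaviour the metric is designed to expose.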

4. Experiments

Given the practical limitations and the need for widespread accessibility, we aimed to develop a model that could run on handheld devices such as smartphones and tablets. Real-time identification of snake species has the potential to significantly improve the speed and effectiveness of medical response, potentially saving lives and reducing the incidence of serious snakebite complications. This capability is particularly important in remote areas where snakebite is most common and access to high-performance computing resources is limited.

We use the Swin-v2 tiny model [6], which is suitable for this task due to its small size and efficiency. Such lightweight models are less prone to overfitting, which is particularly important when dealing with diverse and unbalanced datasets.

We aim to maximise the performance of the model on those metrics that take into account whether the snake is venomous or not. Primarily, we focus on minimising the L metric (Equation 2), as this closely matches real-world scenarios where accurate identification of snake species is paramount. The main goal of our work is to optimise the loss functions. Specifically, we created four different custom losses to meet our objectives. All results are presented in Table 1.

To measure the impact of the custom losses effectively, we use the same training parameters as the baseline: RandResizedCrop and RandAugment augmentations, an input resolution of 256x256, a learning rate of 0.01 and the SGD optimizer. The full training pipeline for the baseline solution is available at the BVRA GitHub (footnote 1). Code for the proposed methods can be found at our GitHub (footnote 2).

4.1. Dual-head

The aim of this experiment is to improve the performance of the model by incorporating snake venom information through a combination of two classification losses. We add a second head consisting of one neuron with a Sigmoid activation function. In addition to the Categorical Cross Entropy loss, we also train the model on the binary classification of venomous/harmless classes using the Binary Cross Entropy loss, resulting in the loss given in Equation (4).

$$ \mathcal{L}_{\text{Dual-head}} = \mathcal{L}_{\text{BCE}} + \mathcal{L}_{\text{CE}} \tag{4} $$
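The combined loss of Equation (4) can be sketched for a single sample in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the venom head is assumed to output one raw logit that is passed through a sigmoid, and the two terms are summed with equal weight as in the equation.

```python
import math


def cross_entropy(logits, target):
    """Categorical cross entropy for one sample, computed from raw logits."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def binary_cross_entropy_with_logit(logit, label):
    """Binary cross entropy of sigmoid(logit) against a 0/1 label."""
    # softplus(z) = log(1 + exp(z)) computed stably; BCE = softplus(z) - label * z
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return softplus - label * logit


def dual_head_loss(species_logits, venom_logit, species, is_venomous):
    """Sum of the binary (venomous/harmless) and species losses, Equation (4)."""
    return (binary_cross_entropy_with_logit(venom_logit, is_venomous)
            + cross_entropy(species_logits, species))


# Uninformative outputs for three hypothetical species and the venom head.
loss = dual_head_loss([0.0, 0.0, 0.0], 0.0, species=1, is_venomous=1)
print(round(loss, 4))  # log(3) + log(2) = 1.7918
```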

4.2. Rare class boost

Given the nature of the data, there are several strategies to address a long-tailed distribution, such as uneven sampling of training data or assigning greater weight to rarer classes. Our approach utilizes the weight strategy, which demonstrates superior accuracy, aside from ensemble models. In this experiment, we compute the loss with SeeSawLoss [7], multiplying it by the rarity of each class. Although SeeSawLoss was originally developed for long-tailed distribution data, our modification of incorporating dynamic class rarity across the whole dataset further improves the results. These results can be directly compared to the Baseline solution, which also utilizes SeeSawLoss.

$$ r_{\text{batch}} = \sum_{i=1}^{B} r_{y_i}, \qquad \mathcal{L}_{\text{ClsBoost}} = r_{\text{batch}} \times \mathcal{L}_{\text{SeeSaw}}(x, y), \tag{5} $$

where $r_c$ is the rarity of class $c$, given by a ratio of $n_c$ and $N$, $n_c$ is the number of instances of class $c$, $N$ is the dataset size, and $B$ is the batch size.

1 BVRA GitHub: https://github.com/BohemianVRA/FGVC-Competitions/tree/feat/baselineTrainingForSnakeCLEF2024/SnakeCLEF2024

2 Authors' GitHub: https://github.com/sieberm111/snakeclef2024

4.3. CE + VenomousPenalty

Another approach to taking the venom of the snake into account is to simply add a venomous penalty, according to Equation (1), to the classical Categorical Cross Entropy loss. Although this method does not perform notably better than the baseline method, it achieves the best results in the F1 metric.

$$ \mathcal{L}_{\text{CE+VP}} = \mathcal{L}_{\text{CE}} + P, \tag{6} $$

where $P$ is the penalty defined in Equation (1).

4.4. Soft target

Instead of the classical approach of setting the Cross Entropy target as a one-hot vector, we explore using a soft target. In this method, our goal is to set the negative targets to values according to the venomous penalties in Equation (1), while ensuring that the sum of the target values remains 1. First, we linearly transform these penalties using Equation (7). This results in a target value of 1 for the least penalized classification, i.e., $y = \hat{y}$, and a target value of 0 for the most penalized classification, i.e., $y \neq \hat{y}$ and the venomous snake is classified as harmless. The values are then normalised, as in Equation (8), and used as a soft target.

$$ T = -0.2 \cdot P + 1, \tag{7} $$

$$ \text{Targets} = \mathrm{Norm}(T), \tag{8} $$

where $P$ are the penalties in Equation (1), and $T$ are the targets before normalisation.
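The linear soft target of Equations (7) and (8) can be sketched as follows. The helper names are hypothetical, the normalisation is assumed to be a division by the sum (so that the targets sum to 1, as required above), and `soft_cross_entropy` shows one common way such a target can be plugged into a cross-entropy loss.

```python
import math


def penalty(y, y_hat, venomous):
    """Penalty of Equation (1); `venomous[c]` is True for a venomous class c."""
    if y == y_hat:
        return 0
    if venomous[y]:
        return 2 if venomous[y_hat] else 5
    return 2 if venomous[y_hat] else 1


def linear_soft_target(y, venomous):
    """Soft target over all classes for the true class y, Equations (7) and (8)."""
    raw = [1.0 - 0.2 * penalty(y, c, venomous) for c in range(len(venomous))]
    total = sum(raw)  # assumed normalisation: divide by the sum
    return [t / total for t in raw]


def soft_cross_entropy(logits, target):
    """Cross entropy between a soft target and softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_sum_exp) for t, z in zip(target, logits))


# Four hypothetical classes: 0 and 1 harmless, 2 and 3 venomous.
venomous = [False, False, True, True]
target = linear_soft_target(0, venomous)
print([round(t, 3) for t in target])  # [0.333, 0.267, 0.2, 0.2]
```

Even in this four-class toy case the correct class receives only about a third of the probability mass, so the positive and negative targets end up close to each other.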

The linear soft target results in poor performance, as the positive and negative target values are close after normalisation. This motivates us to create a soft target method that sets the positive target to a significantly higher value than the negative targets. Our aim is to set the positive target value in the range ⟨0.5; 1.0). We conducted empirical experiments, which resulted in the target given in Equation (9). The best-performing models are denoted SoftT-3 for temperature parameter $t = 3$ and SoftT-4 for $t = 4$.

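$$ \text{Target} = -t \cdot \mathrm{Softmax}(\log(w)), \tag{9} $$

where $w = 0.1$ for $y = \hat{y}$, $w = 1$ for $y \neq \hat{y}$ and $p(y) = 0$ and $p(\hat{y}) = 0$, $w = 10$ for $y \neq \hat{y}$ and $p(y) = 0$ and $p(\hat{y}) = 1$, $w = 10$ for $y \neq \hat{y}$ and $p(y) = 1$ and $p(\hat{y}) = 1$, $w = 100$ for $y \neq \hat{y}$ and $p(y) = 1$ and $p(\hat{y}) = 0$, and $t$ is the temperature parameter.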

4.5. Model Ensemble

The competition sets a limit of 60 minutes for the maximum model inference time on the test set. Since our model is able to process the test set within a few minutes, we decided to use the remaining time for additional experiments. We created an ensemble of our models by averaging the logits. The ensemble of models performed noticeably better. Since there was still a lot of time left, we also tested averaging the logits of each image and its horizontally flipped version for each model, as the flip was not part of the training augmentations. This doubled our inference time. However, we did not gain any improvement by using this method, so we do not report it.
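Logit averaging can be sketched in a few lines. The function names and the toy logit vectors are illustrative assumptions; in practice each vector would come from one trained model (or, for the flip experiment, from one model applied to the original and the horizontally flipped image).

```python
def average_logits(per_model_logits):
    """Element-wise mean of logit vectors, one vector per model (or per view)."""
    n_models = len(per_model_logits)
    return [sum(values) / n_models for values in zip(*per_model_logits)]


def predict(per_model_logits):
    """Index of the class with the highest averaged logit."""
    mean = average_logits(per_model_logits)
    return max(range(len(mean)), key=mean.__getitem__)


# Three hypothetical models scoring the same image over three classes.
logits_a = [2.0, 1.0, 0.0]
logits_b = [0.5, 1.5, 0.0]
logits_c = [0.5, 2.0, 0.0]
print(predict([logits_a, logits_b, logits_c]))  # 1
```

Here the first model alone would predict class 0, while the averaged logits favour class 1, which is the kind of disagreement an ensemble is meant to resolve.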

5. Conclusion

This work presents our participation in the SnakeCLEF2024 competition. Our approach is based on the compact Swin-v2 tiny model, known for its speed and suitability for running on mobile devices such as smartphones and tablets. Instead of focusing on conventional methods, we decided to experiment exclusively with custom loss functions tailored to our specific scenario. In particular, we focused on the L metric, which is designed to penalise misclassification of snakes based on their venomousness.

The results (see Tables 1 and 2) show that the Dual-head approach improved on the baseline solution, and that these improvements were stable on both the public and private datasets. The ClsBoost loss was a viable idea that maintained the best accuracy on both the public and private datasets. Since this loss function did not target the custom metrics, but rather aimed to reduce the effect of the long-tailed distribution, its accuracy, although correlated with the other metrics, was not sufficient to achieve the best custom metric scores. The CE-VP loss function showed that incorporating the L metric into the loss function helped. However, there are notable differences between its performance on the public and private test sets. The SoftT loss function achieved the best results, namely on the M and L metrics on the public dataset and on the M and F1 metrics on the private dataset. Since this method uses a hyperparameter chosen empirically, future work could tune it within an ablation study to find the best-performing setting.

Since the competition allows a maximum inference time of 60 minutes, and our model requires only a few minutes to infer the entire test set, we decided to create an ensemble of our best models, resulting in better scores.

As an extension to our methods, we propose to use location data in the recognition process. By using GPS information, which is available on handheld devices such as smartphones, we can improve the accuracy of species identification by taking into account the geographical distribution of different snake species.

Acknowledgments

Computational resources were provided by the e-INFRA CZ project (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.

References

[1] World Health Organization, Snakebite envenoming, 2023. https://www.who.int/news-room/fact-sheets/detail/snakebite-envenoming.

[2] Joly, Picek, Kahl, Goëau, Espitalier, Botella, Deneu, Marcos, Estopinan, Leblanc, Larcher, Šulc, Hrúz, Servajean, et al., Overview of LifeCLEF 2024: Challenges on species distribution prediction and identification, in: International Conference of the Cross-Language Evaluation Forum for European Languages, Springer, 2024.

[3] Picek, Hruz, A. M. Durso, Overview of SnakeCLEF 2024: Revisiting snake species identification in medically important scenarios, in: Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, 2024.

[4] LifeCLEF, SnakeCLEF2024, 2024. https://huggingface.co/spaces/BVRA/SnakeCLEF2024.

[5] Picek, Šulc, Chamidullin, Durso, Overview of SnakeCLEF 2023: Snake identification in medically important scenarios, CLEF, 2023.

[6] Liu, Hu, Lin, Yao, Xie, Wei, Ning, Cao, Zhang, Dong, Wei, Guo, Swin Transformer V2: Scaling up capacity and resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 12009-12019.

[7] Wang, Zhang, Zang, Cao, Pang, Gong, Chen, Liu, C. C. Loy, Lin, Seesaw loss for long-tailed instance segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 9695-9704.